Tuesday, November 24, 2020

[ Article Collection ] Pandas dataframe filter with multiple conditions

 Source From Here

Preface
Selecting or filtering rows from a dataframe can sometimes be tedious if you don’t know the exact methods for filtering rows on multiple conditions. In this post we are going to see the different ways to select rows from a dataframe using multiple conditions.

Let’s create a dataframe with 6 rows and 4 columns, i.e. Name, Age, Salary_in_1000 and FT_Team (Football Team):
import pandas as pd
df = pd.DataFrame({'Name': ['JOHN', 'ALLEN', 'BOB', 'NIKI', 'CHARLIE', 'CHANG'],
                   'Age': [35, 42, 63, 29, 47, 51],
                   'Salary_in_1000': [100, 93, 78, 120, 64, 115],
                   'FT_Team': ['STEELERS', 'SEAHAWKS', 'FALCONS', 'FALCONS', 'PATRIOTS', 'STEELERS']})
df
Output:


Selecting Dataframe rows on multiple conditions using these 5 functions
In this section we are going to see how to filter the rows of a dataframe on multiple conditions using these five methods:
a) loc
b) numpy where
c) Query
d) Boolean Indexing
e) eval

What’s the Condition or Filter Criteria ?
Get all rows where the salary is greater than or equal to 100K, the Age is less than 60, and the favourite football team name starts with ‘S’.

Using loc with multiple conditions
loc is used to access a group of rows and columns by label(s) or by a boolean array. As input it accepts a single label, a list or array of labels, or a boolean array with one value per row.
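A minimal sketch of the two kinds of input loc accepts, rebuilding the example dataframe so the snippet runs on its own. The names by_label, is_young and by_mask are introduced here only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['JOHN', 'ALLEN', 'BOB', 'NIKI', 'CHARLIE', 'CHANG'],
    'Age': [35, 42, 63, 29, 47, 51],
    'Salary_in_1000': [100, 93, 78, 120, 64, 115],
    'FT_Team': ['STEELERS', 'SEAHAWKS', 'FALCONS', 'FALCONS', 'PATRIOTS', 'STEELERS'],
})

# Selection by label: the rows whose index labels are 0 and 5, one column.
by_label = df.loc[[0, 5], 'Name']

# Selection by boolean array: one True/False value per row.
is_young = df['Age'] < 40
by_mask = df.loc[is_young, ['Name', 'Age']]

print(by_label.tolist())
print(by_mask)
```

Each comparison on a column produces a boolean Series, and it is these Series that get combined with & when several conditions are needed.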

Wrap each condition in parentheses and join them with & as the logical operator:
df.loc[(df['Salary_in_1000'] >= 100) & (df['Age'] < 60) & (df['FT_Team'].str.startswith('S')), ['Name', 'FT_Team']]
Output:


Using np.where with multiple conditions
numpy.where can be used to filter the array or get the index or elements in the array where conditions are met. You can read more about np.where in this post.

With multiple conditions joined by &, np.where returns the positions of the matching rows:
import numpy as np
idx = np.where((df['Salary_in_1000'] >= 100) & (df['Age'] < 60) & (df['FT_Team'].str.startswith('S')))
Output:
(array([0, 5], dtype=int64),)
The output of np.where, a tuple holding an array of the row positions that match all the conditions, is then fed to the dataframe’s loc function:
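The call itself is not shown in this copy of the post. A minimal sketch of that step, assuming the array inside the tuple is what gets passed to loc. This works here because the dataframe has the default 0..n-1 index, so row positions and row labels coincide; with any other index, iloc would be the safer choice:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['JOHN', 'ALLEN', 'BOB', 'NIKI', 'CHARLIE', 'CHANG'],
    'Age': [35, 42, 63, 29, 47, 51],
    'Salary_in_1000': [100, 93, 78, 120, 64, 115],
    'FT_Team': ['STEELERS', 'SEAHAWKS', 'FALCONS', 'FALCONS', 'PATRIOTS', 'STEELERS'],
})

mask = (df['Salary_in_1000'] >= 100) & (df['Age'] < 60) & (df['FT_Team'].str.startswith('S'))

# np.where returns a tuple with one array of matching row positions.
idx = np.where(mask)

# Positions equal labels for the default index, so loc accepts them directly.
rows_by_label = df.loc[idx[0]]

# iloc selects by position and does not depend on the index labels.
rows_by_position = df.iloc[idx[0]]

print(rows_by_label)
```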
Output:


Using Query with multiple Conditions
query filters the rows of a DataFrame with a boolean expression written as a string:
df.query('Salary_in_1000 >= 100 & Age < 60 & FT_Team.str.startswith("S").values')
Output:


pandas boolean indexing multiple conditions
It is a standard way to select a subset of the data by applying conditions to the values in the dataframe.

We use the same conditions here to filter the rows of our original dataframe: salary >= 100, football team starting with the letter ‘S’, and Age less than 60:
df[(df['Salary_in_1000'] >= 100) & (df['Age'] < 60) & df['FT_Team'].str.startswith('S')][['Name', 'Age', 'Salary_in_1000']]
Output:


Pandas Eval multiple conditions
eval evaluates a string describing operations on DataFrame columns. It operates on columns only, not on specific rows or elements:
df[df.eval("Salary_in_1000 >= 100 & (Age < 60) & FT_Team.str.startswith('S').values")]
Output:










Saturday, November 21, 2020

[ Python FAQ ] Use of python getattr/setattr for current function

 Source From Here

Question
Is it possible to use getattr/setattr to access a variable in a class function?

Example below. Say I have a class A, that has two methods, func1 and func2 both of which define a variable count of a different type. Is there a way to use getattr in func2 to access the local variable count?

In reality, I have quite a few variables in func2 (that are also defined differently in func1) that I want to loop through using getattr and I'm looking to shorten up my code a bit by using a loop through the variable names.
class A(object):

    def __init__(self):
        pass

    def func1(self):
        count = {"A": 1, "B": 2}

    def func2(self):
        count = [1, 2]
        mean = [10, 20]
        for attr in ("count", "mean"):
            xattr = getattr(self, attr)   # What do I put in here in place of "self"?
            xattr.append(99)

HowTo
import sys

getattr(sys.modules[__name__], attr)
You can also look up and update the dict returned by globals() directly; for example, globals()[attr] is roughly equivalent to the getattr() call above.
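A minimal sketch of the globals() lookup, assuming count and mean are module-level variables rather than locals of a method (names local to a function are not reachable this way). The helper append_to_named is hypothetical and exists only for this illustration:

```python
# Module-level variables that the loop below reaches by name.
count = [1, 2]
mean = [10, 20]


def append_to_named(namespace, names, value):
    """Append value to every list stored under the given names in namespace."""
    for attr in names:
        namespace[attr].append(value)


# globals() returns the module's namespace as a dict keyed by variable name.
append_to_named(globals(), ("count", "mean"), 99)

print(count)
print(mean)
```

Passing the namespace in explicitly also makes it easy to swap in an ordinary dict, which is often a cleaner way to group variables that need to be looped over by name.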

Friday, November 20, 2020

[ Article Collection ] Docker Tips: Clean Up Your Local Machine

 Source From Here

Preface
Understanding disk space usage and reclaiming the unused part

In this piece, we’ll go back to basics. We will look at how Docker uses the disk space of the host machine and how to reclaim that space when it is no longer in use.


Overall Consumption
Docker is great, there’s no doubt about that. A couple of years ago, it provided a new way to build, ship and run any workload by democratizing the usage of containers and hugely simplifying the management of their lifecycle. It also gave developers the ability to run any application without polluting the local machine. But when we run containers, pull images, deploy complex application stacks, and build our own images, the footprint on our host filesystem can grow significantly.

If we have not cleaned up our local machine for a while we might be surprised by the result of this command:
$ docker system df



This command shows Docker’s disk usage in several categories:
* Images:
The size of the images that have been pulled from a registry and the ones built locally.

* Containers:
The disk space used by the containers running on the system, meaning the space of each container’s read-write layer.

* Local Volumes:
Storage persisted on the host but outside of a container’s filesystem.

* Build Cache:
The cache generated by the image build process (only if using BuildKit, available from Docker 18.09).

From the output above, we can see that quite a lot of disk space is reclaimable. In other words, as it’s not in use by Docker, it can be given back to the host machine.

Containers Disk Usage
Each time a container is created, several folders and files are created under /var/lib/docker on the host machine. Among them:
* the /var/lib/docker/containers/ID folder (ID being the container’s unique identifier)
If the container uses the default logging driver, all its logs will be persisted in a JSON file within this folder. In this context, generating too many logs might impact the filesystem of the host machine.

* a folder within /var/lib/docker/overlay2
which contains the container’s read-write layer (overlay2 being the preferred storage driver on most Linux distributions). If the container persists data in its own filesystem, that data is stored under /var/lib/docker/overlay2 on the host machine.
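To see how much space one of these folders takes, a small sketch along the following lines could be used. The helper dir_size is hypothetical, and reading /var/lib/docker normally requires root privileges, so the snippet demonstrates itself on a temporary directory instead:

```python
import os
import tempfile


def dir_size(path):
    """Return the total apparent size, in bytes, of the regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue  # do not follow symbolic links
            try:
                total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


# Demonstration on a throwaway directory holding one 2048-byte file.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "demo.log"), "wb") as fh:
        fh.write(b"x" * 2048)
    demo_size = dir_size(tmp)

print(demo_size)
```

On a Docker host the same function could be pointed at /var/lib/docker/containers or /var/lib/docker/overlay2. Note that os.path.getsize reports the apparent file size, which can exceed the blocks actually allocated for sparse files.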

Let’s imagine we have a brand new system where Docker has just been installed:
$ docker system df
TYPE           TOTAL      ACTIVE     SIZE       RECLAIMABLE
Images         0          0          0B         0B
Containers     0          0          0B         0B
Local Volumes  0          0          0B         0B
Build Cache    0          0          0B         0B

First, we start an NGINX container:
$ docker container run --name www -d -p 8000:80 nginx:1.16

Running the df command again, we can now see:
* one image with a size of 126MB. This is the nginx:1.16 image pulled when we launched the container.
* one container — the www container run from the NGINX image.


$ docker system df
TYPE           TOTAL      ACTIVE     SIZE       RECLAIMABLE
Images         1          1          126M       0B (0%)
Containers     1          1          2B         0B (0%)
Local Volumes  0          0          0B         0B
Build Cache    0          0          0B         0B

There is no reclaimable space yet as the container is running and the image is currently in use. As the size of the container (2B) is negligible and thus not easy to track on the filesystem, let’s create an empty 100MB file in the container’s filesystem. For this purpose, we use the handy dd command from within the www container.
$ docker exec -ti www dd if=/dev/zero of=test.img bs=1024 count=0 seek=$[1024*100]

This file is created in the read-write layer associated with this container. If we check the output of the df command again, we can see that the container now takes up some additional disk space:
$ docker system df
TYPE           TOTAL      ACTIVE     SIZE       RECLAIMABLE
Images         1          1          126M       0B (0%)
Containers     1          1          104.9MB    0B (0%)
Local Volumes  0          0          0B         0B
Build Cache    0          0          0B         0B
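The 104.9MB figure follows from the dd arguments. A short worked check, assuming the size is reported in decimal megabytes (1MB = 1,000,000 bytes):

```python
block_size = 1024             # bs=1024, in bytes
blocks_skipped = 1024 * 100   # seek=$[1024*100], in blocks

# With count=0, dd writes nothing and only extends the file to the seek offset.
size_bytes = block_size * blocks_skipped
size_mb = size_bytes / 1000 ** 2

print(size_bytes)         # 104857600
print(round(size_mb, 1))  # 104.9
```

So the "100MB" file is 100 MiB (104,857,600 bytes), which is about 104.9 decimal megabytes.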

Where is this file located on the host? Let’s take a look:
$ find /var/lib/docker -type f -name test.img
/var/lib/docker/overlay2/83f177...630078/merged/test.img
/var/lib/docker/overlay2/83f177...630078/diff/test.img

Without going too deep into the details, this file was created in the container’s read-write layer which is managed by the overlay2 driver. If we stop the container, the disk space used by the container becomes reclaimable. Let’s take a look:
# Stopping the www container
$ docker stop www

# Visualizing the impact on the disk usage
$ docker system df
TYPE           TOTAL      ACTIVE     SIZE       RECLAIMABLE
Images         1          1          126M       0B (0%)
Containers     1          0          104.9MB    104.9MB (100%)
Local Volumes  0          0          0B         0B
Build Cache    0          0          0B         0B
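When this table needs to be read from a script, a rough sketch like the following can turn it into dictionaries. It assumes columns are separated by two or more spaces, as in the table above; the function name parse_system_df is hypothetical, and the sample text is copied from this article rather than produced by a live Docker daemon:

```python
import re

SAMPLE = """\
TYPE           TOTAL      ACTIVE     SIZE       RECLAIMABLE
Images         1          1          126M       0B (0%)
Containers     1          0          104.9MB    104.9MB (100%)
Local Volumes  0          0          0B         0B
Build Cache    0          0          0B         0B
"""


def parse_system_df(text):
    """Parse the tabular text into a list of dicts keyed by column header."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Split on runs of 2+ spaces so "Local Volumes" and "0B (0%)" stay whole.
    header = re.split(r"\s{2,}", lines[0])
    return [dict(zip(header, re.split(r"\s{2,}", line))) for line in lines[1:]]


rows = parse_system_df(SAMPLE)
reclaimable = {row["TYPE"]: row["RECLAIMABLE"] for row in rows}
print(reclaimable["Containers"])
```

Splitting on whitespace runs is fragile by nature; it is shown here only to make the structure of the table explicit.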

How can this space be reclaimed? By deleting the container, which also deletes its associated read-write layer. The following command deletes all stopped containers at once and reclaims the disk space they’re using:
$ docker container prune
WARNING! This will remove all stopped containers.
Are you sure you want to continue? [y/N]
 y
Deleted Containers:
5e7f8e5097ace9ef5518ebf0c6fc2062ff024efb495f11ccc89df21ec9b4dcc2
Total reclaimed space: 104.9MB

From the output, we can see there is no more space used by containers and, as the image is not used anymore (no container is running), the space it uses on the host filesystem can be reclaimed:
$ docker system df
TYPE           TOTAL      ACTIVE     SIZE       RECLAIMABLE
Images         1          0          126M       126M (100%)
Containers     0          0          0B         0B
Local Volumes  0          0          0B         0B
Build Cache    0          0          0B         0B

Note: As long as an image is used by at least one container, the disk space it uses cannot be reclaimed.

The prune subcommand we used above removes the stopped containers. If we need to remove all containers, both the running ones and the stopped ones, we can use one of the following commands (both are equivalent):
# Historical command
$ docker rm -f $(docker ps -aq)

# More recent command
$ docker container rm -f $(docker container ls -aq)

Note: It’s often useful to use the --rm flag when running a container so that it is automatically removed when its PID 1 process stops, thus releasing the unused disk space immediately.

Images Disk Usage
A couple of years ago, it was common to have several hundred MB per image. Ubuntu was around 600MB, Microsoft .Net images weighed several GB (true story). At that time, pulling only a couple of images could quickly impact the disk space of the host machine, even if the layers are shared between images. This is less true today — base images are much lighter — but after a certain amount of time, piling up images will definitely have an impact if we’re not careful.

There are several kinds of images that are not directly visible to the end-user:
* Intermediate images are referenced by other images (child images) and cannot be removed.
* Dangling images are images that are no longer referenced. They take up some disk space and can be deleted.

The following command lists the dangling images that exist on the system:
$ docker image ls -f dangling=true
REPOSITORY  TAG      IMAGE ID         CREATED             SIZE
<none>      <none>   21e658fe5351     12 minutes ago      71.3MB

To remove the dangling image we can go the long way:
$ docker image rm $(docker image ls -f dangling=true -q)

Or we can use the prune subcommand:
$ docker image prune
WARNING! This will remove all dangling images.
Are you sure you want to continue? [y/N]
 y
Deleted Images:
deleted: sha256:143407a3cb7efa6e95761b8cd6cea25e3f41455be6d5e7cda
deleted: sha256:738010bda9dd34896bac9bbc77b2d60addd7738ad1a95e5cc
deleted: sha256:fa4f0194a1eb829523ecf3bad04b4a7bdce089c8361e2c347
deleted: sha256:c5041938bcb46f78bf2f2a7f0a0df0eea74c4555097cc9197
deleted: sha256:5945bb6e12888cf320828e0fd00728947104da82e3eb4452f
Total reclaimed space: 12.9kB

In case we need to remove all images at once (not only the dangling ones) we can run the following command. This will not be able to remove the images currently used by a container though:
$ docker image rm $(docker image ls -q)


Volumes Disk Usage
Volumes are used to store data outside of a container filesystem. For instance, when a container runs a stateful application we want the data to be persisted outside of the container so they are decoupled from the container life-cycle. Volumes are also used because heavy filesystem operations inside the container are bad for performance.

Say we run a container based on MongoDB and then use it to test a backup we previously did (available locally in the bck.json file):
# Running a mongo container
$ docker run --name db -v $PWD:/tmp -p 27017:27017 -d mongo:4.0

# Importing an existing backup (from a huge bck.json file)
$ docker exec -ti db mongoimport \
--db 'test' \
--collection 'demo' \
--file /tmp/bck.json \
--jsonArray

The data within the backup file will be stored on the host in the /var/lib/docker/volumes folder. Why is this data not saved within the container’s layer? Because in the mongo image’s Dockerfile, the location /data/db (where mongo stores its data by default) is declared as a volume with a VOLUME instruction.

Note: Many images, often related to stateful applications, define volumes to manage data outside of the container’s layer.

When we are done testing the backup, we stop or remove the container. The volume is not removed, however: it stays there consuming disk space unless we explicitly remove it. To remove the volumes that are no longer used, we can go the long way:
$ docker volume rm $(docker volume ls -qf dangling=true)

Or we can use the prune subcommand:
$ docker volume prune
WARNING! This will remove all local volumes not used by at least one container.
Are you sure you want to continue? [y/N] y
Deleted Volumes:
d50b6402eb75d09ec17a5f57df4ed7b520c448429f70725fc5707334e5ded4d5
8f7a16e1cf117cdfddb6a38d1f4f02b18d21a485b49037e2670753fa34d115fc
599c3dd48d529b2e105eec38537cd16dac1ae6f899a123e2a62ffac6168b2f5f
...
732e610e435c24f6acae827cd340a60ce4132387cfc512452994bc0728dd66df
9a3f39cc8bd0f9ce54dea3421193f752bda4b8846841b6d36f8ee24358a85bae
045a9b534259ec6c0318cb162b7b4fca75b553d4e86fc93faafd0e7c77c79799
c6283fe9f8d2ca105d30ecaad31868410e809aba0909b3e60d68a26e92a094da
Total reclaimed space: 25.82GB

Build Cache Disk Usage
The Docker 18.09 release introduces enhancements for the build process through BuildKit. Using this tool can improve performance, storage management, feature functionality, and security. We won’t detail BuildKit in this piece, but just look at how to enable it and how it affects disk usage.

Let’s consider the following dummy Node.js application and its associated Dockerfile. The index.js file defines a simple HTTP server which exposes the ‘/’ endpoint and replies with a string to each request received:
var express = require('express');
var util    = require('util');
var app = express();
app.get('/', function(req, res) {
  res.setHeader('Content-Type', 'text/plain');
  res.end(util.format("%s - %s", new Date(), 'Got Request'));
});
app.listen(process.env.PORT || 80);
package.json defines the dependencies: only expressjs here, to set up the HTTP server:
{
  "name": "testnode",
  "version": "0.0.1",
  "main": "index.js",
  "scripts": {
    "start": "node index.js"
  },
  "dependencies": {
    "express": "^4.14.0"
  }
}
Dockerfile defines how to build an image from the code above:
FROM node:13-alpine
COPY package.json /app/package.json
RUN cd /app && npm install
COPY . /app/
WORKDIR /app
EXPOSE 80
CMD ["npm", "start"]
Let’s build an image as we usually do, without BuildKit enabled:
$ docker build -t app:1.0 .

If we check the disk usage, we only see the base image (node:13-alpine pulled at the beginning of the build) and the final image of the build (app:1.0):
$ docker system df
TYPE           TOTAL      ACTIVE     SIZE       RECLAIMABLE
Images         2          0          109.3MB    109.3MB (100%)
Containers     0          0          0B         0B
Local Volumes  0          0          0B         0B
Build Cache    0          0          0B         0B

Let’s now build version 2.0 of the image using BuildKit. We just need to set the DOCKER_BUILDKIT environment variable to 1:
$ DOCKER_BUILDKIT=1 docker build -t app:2.0 .

If we check the disk usage once more, we can see build-cache was created:
$ docker system df
TYPE           TOTAL      ACTIVE     SIZE       RECLAIMABLE
Images         2          0          109.3MB    109.3MB (100%)
Containers     0          0          0B         0B
Local Volumes  0          0          0B         0B
Build Cache    11         0          8.949kB    8.949kB
To remove the build cache, we can use the following command:
$ docker builder prune
WARNING! This will remove all dangling build cache.
Are you sure you want to continue? [y/N]
 y
Deleted build cache objects:
rffq7b06h9t09xe584rn4f91e
ztexgsz949ci8mx8p5tzgdzhe
3z9jeoqbbmj3eftltawvkiayi
Total reclaimed space: 8.949kB

Cleaning Everything at Once
As we saw in the examples above, each of the container, image, volume and builder commands provides a prune subcommand to reclaim disk space. The prune subcommand is also available at Docker’s system level, where it cleans up several kinds of unused resources at once:
$ docker system prune
WARNING! This will remove:
- all stopped containers
- all networks not used by at least one container
- all dangling images
- all dangling build cache
Are you sure you want to continue? [y/N]

Running this command once in a while to clean up the disk is a good habit to have.
