2020年11月24日 星期二

[ 文章收集 ] Pandas dataframe filter with Multiple conditions

 Source From Here

Preface
Selecting or filtering rows from a dataframe can be sometime tedious if you don’t know the exact methods and how to filter rows with multiple conditions. In this post we are going to see the different ways to select rows from a dataframe using multiple conditions

Let’s create a dataframe with 5 rows and 4 columns i.e. Name, Age, Salary_in_1000 and FT_Team(Football Team):
  1. import pandas as pd  
  2. df=pd.DataFrame({'Name':['JOHN','ALLEN','BOB','NIKI','CHARLIE','CHANG'],  
  3.               'Age':[35,42,63,29,47,51],  
  4.               'Salary_in_1000':[100,93,78,120,64,115],  
  5.              'FT_Team':['STEELERS','SEAHAWKS','FALCONS','FALCONS','PATRIOTS','STEELERS']})  
  6. df  
Output:


Selecting Dataframe rows on multiple conditions using these 5 functions
In this section we are going to see how to filter the rows of a dataframe with multiple conditions using these five methods
a) loc
b) numpy where
c) Query
d) Boolean Indexing
e) eval

What’s the Condition or Filter Criteria ?
Get all rows having salary greater or equal to 100K and Age < 60 and Favourite Football Team Name starts with ‘S’

Using loc with multiple conditions
loc is used to Access a group of rows and columns by label(s) or a boolean array. As an input to label you can give a single label or it’s index or a list of array of labels.

Enter all the conditions and with & as a logical operator between them:
  1. df.loc[(df['Salary_in_1000']>=100) & (df['Age']< 60) & (df['FT_Team'].str.startswith('S')),['Name','FT_Team']]  
Output:


Using np.where with multiple conditions
numpy.where can be used to filter the array or get the index or elements in the array where conditions are met. You can read more about np.where in this post.

Numpy where with multiple conditions and & as logical operators outputs the index of the matching rows:
import numpy as np
  1. idx = np.where((df['Salary_in_1000']>=100) & (df['Age']< 60) & (df['FT_Team'].str.startswith('S')))  
Output:
  1. (array([05], dtype=int64),)  
The output from the np.where, which is a list of row index matching the multiple conditions is fed to dataframe loc function:
Output:


Using Query with multiple Conditions
It is used to Query the columns of a DataFrame with a boolean expression:
  1. df.query('Salary_in_1000 >= 100 & Age < 60 & FT_Team.str.startswith("S").values')  
Output:


pandas boolean indexing multiple conditions
It is a standrad way to select the subset of data using the values in the dataframe and applying conditions on it.

We are using the same multiple conditions here also to filter the rows from pur original dataframe with salary >= 100 and Football team starts with alphabet ‘S’ and Age is less than 60:
  1. df[(df['Salary_in_1000']>=100) & (df['Age']<60) & df['FT_Team'].str.startswith('S')][['Name','Age','Salary_in_1000']]  
Output:


Pandas Eval multiple conditions
Evaluate a string describing operations on DataFrame column. It Operates on columns only, not specific rows or elements:
  1. df[df.eval("Salary_in_1000>=100 & (Age <60) & FT_Team.str.startswith('S').values")]  
Output:










沒有留言:

張貼留言

[Git 常見問題] error: The following untracked working tree files would be overwritten by merge

  Source From  Here 方案1: // x -----删除忽略文件已经对 git 来说不识别的文件 // d -----删除未被添加到 git 的路径中的文件 // f -----强制运行 #   git clean -d -fx 方案2: 今天在服务器上  gi...