Friday, May 10, 2019

[ Python FAQ ] How to implement 'in' and 'not in' for a Pandas DataFrame

Source From Here 
Question 
How can I achieve the equivalents of SQL's IN and NOT IN? I have a list with the required values. Here's the scenario: 
>>> import pandas as pd 
>>> df = pd.DataFrame({'countries':['US', 'UK', 'Germany', 'China'], 'population':[100, 200, 300, 400]}) 
>>> countries = ['UK', 'China']

My current way of doing this is as follows: 
rows = []
for ri, row in df.iterrows():
    if row.countries in countries:
        rows.append(ri)

print(df.loc[rows])
How-To 
You can use pd.Series.isin: 
* For "IN": somewhere.isin(list_of_place) 
* For "NOT IN": ~somewhere.isin(list_of_place)

As an example: 
>>> df.head()
  countries  population
0        US         100
1        UK         200
2   Germany         300
3     China         400
>>> countries
['UK', 'China']
>>> df[df.countries.isin(countries)]
  countries  population
1        UK         200
3     China         400
>>> df[~df.countries.isin(countries)]
  countries  population
0        US         100
2   Germany         300
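
For convenience, the session above can be collected into one standalone script. This is only a minimal sketch: the variable names mask, in_rows, not_in_rows, and loop_rows are made up for illustration and do not come from the original answer. It builds the boolean mask once, reuses it for both the "IN" and the "NOT IN" selection, and checks that the result matches the loop from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'countries': ['US', 'UK', 'Germany', 'China'],
    'population': [100, 200, 300, 400],
})
countries = ['UK', 'China']

# Boolean Series: True where the 'countries' value appears in the list.
# (mask is an illustrative name, not part of the original answer.)
mask = df['countries'].isin(countries)

in_rows = df[mask]        # equivalent of SQL's IN
not_in_rows = df[~mask]   # equivalent of SQL's NOT IN

# The loop-based approach from the question, kept only for comparison.
loop_rows = [ri for ri, row in df.iterrows() if row['countries'] in countries]
assert df.loc[loop_rows].equals(in_rows)

print(in_rows)
print(not_in_rows)
```

Storing the mask in a variable avoids calling isin twice when both the matching and the non-matching rows are needed.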
