Friday, May 10, 2019

[ Python FAQ ] How to implement 'in' and 'not in' for a Pandas DataFrame

Source From Here 
Question 
How can I achieve the equivalents of SQL's IN and NOT IN? I have a list with the required values. Here's the scenario: 
>>> import pandas as pd 
>>> df = pd.DataFrame({'countries':['US', 'UK', 'Germany', 'China'], 'population':[100, 200, 300, 400]}) 
>>> countries = ['UK', 'China']

My current way of doing this is as follows: 
    rows = []
    for ri, row in df.iterrows():
        if row.countries in countries:
            rows.append(ri)

    print(df.loc[rows])
How-To 
You can use pd.Series.isin: 
* For "IN": somewhere.isin(list_of_place) 
* For "NOT IN": ~somewhere.isin(list_of_place)

As an example: 
>>> df.head()
  countries  population
0        US         100
1        UK         200
2   Germany         300
3     China         400
>>> countries
['UK', 'China']
>>> df[df.countries.isin(countries)]
  countries  population
1        UK         200
3     China         400
>>> df[~df.countries.isin(countries)]
  countries  population
0        US         100
2   Germany         300
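
The session above can also be written as a standalone script. The following is a minimal sketch, assuming the same sample data as in the question; the helper names `rows_in` and `rows_not_in` are hypothetical and are not part of pandas, they only wrap `pd.Series.isin` and its negation.

```python
import pandas as pd


def rows_in(df, column, values):
    """Keep rows whose `column` value appears in `values` (like SQL's IN)."""
    return df[df[column].isin(values)]


def rows_not_in(df, column, values):
    """Keep rows whose `column` value does not appear in `values` (like SQL's NOT IN)."""
    # `~` inverts the boolean mask produced by isin()
    return df[~df[column].isin(values)]


# Same sample data as in the question
df = pd.DataFrame({
    'countries': ['US', 'UK', 'Germany', 'China'],
    'population': [100, 200, 300, 400],
})
countries = ['UK', 'China']

selected = rows_in(df, 'countries', countries)
excluded = rows_not_in(df, 'countries', countries)

print(selected)
print(excluded)
```

Both helpers build a boolean mask in one vectorized call, so they replace the explicit `iterrows()` loop from the question, and the original row index is preserved in the result.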
