程式扎記: [ Python FAQ ] How to implement 'in' and 'not in' for Pandas dataframe

Friday, May 10, 2019

Source From Here
Question
How can I achieve the equivalents of SQL's IN and NOT IN? I have a list with the required values. Here's the scenario:

>>> import pandas as pd
>>> df = pd.DataFrame({'countries':['US', 'UK', 'Germany', 'China'], 'population':[100, 200, 300, 400]})
>>> countries = ['UK', 'China']

My current way of doing this is as follows:

rows = []  
for ri, row in df.iterrows():  
    if row.countries in countries:  
        rows.append(ri)  
  
print(df.loc[rows])  

How-To
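You can use pd.Series.isin, which returns a boolean mask that can be used to index the DataFrame:

* For "IN": somewhere.isin(list_of_place)
* For "NOT IN": ~somewhere.isin(list_of_place) (the ~ operator inverts the mask)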

As an example:

>>> df.head()  
  countries  population  
0        US         100  
1        UK         200  
2   Germany         300  
3     China         400  
>>> countries  
['UK', 'China']  
>>> df[df.countries.isin(countries)]  
  countries  population  
1        UK         200  
3     China         400  
>>> df[~df.countries.isin(countries)]  
  countries  population  
0        US         100  
2   Germany         300  
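
The same idea can be put into a standalone script. The following is only a minimal sketch that reuses the df and countries from the question; the names mask, in_rows, not_in_rows and loop_rows are introduced here for illustration and are not part of the original answer. It builds the boolean mask once, reuses it for both filters, and cross-checks the result against the iterrows loop from the question:

```python
import pandas as pd

df = pd.DataFrame({
    'countries': ['US', 'UK', 'Germany', 'China'],
    'population': [100, 200, 300, 400],
})
countries = ['UK', 'China']

# Boolean Series: True where the row's country appears in the list.
mask = df['countries'].isin(countries)

in_rows = df[mask]        # rows matching SQL-style IN
not_in_rows = df[~mask]   # rows matching SQL-style NOT IN

# Cross-check: the loop from the question should pick the same row labels.
loop_rows = [ri for ri, row in df.iterrows() if row['countries'] in countries]
assert in_rows.index.tolist() == loop_rows

print(in_rows)
print(not_in_rows)
```

Computing the mask once and negating it with ~ avoids the Python-level loop entirely, which is usually the main reason to prefer isin over iterrows on larger frames.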
