Wednesday, July 25, 2018

[ Python FAQ ] Pandas - How do you filter pandas DataFrames by multiple columns

Source From Here 
Question 
To filter a DataFrame (df) by a single column, for example in data that contains both males and females, we might write: 
  males = df[df['Gender'] == 'Male']  
Question 1 - But what if the data spanned multiple years and I wanted to see only the males for 2014? In other languages I might do something like: 
  if A = "Male" and B = "2014" then   
(except that I want to get the matching subset of the original DataFrame as a new DataFrame object) 

Question 2 - How do I do this in a loop and create a DataFrame object for each unique combination of year and gender (i.e. a df for 2013-Male, 2013-Female, 2014-Male, and 2014-Female)? 

How-To 
Use the & operator, and don't forget to wrap each sub-expression in (), because & binds more tightly than == in Python: 
  males = df[(df['Gender'] == 'Male') & (df['Year'] == 2014)]  
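To make the combined filter concrete, here is a minimal sketch on a small made-up DataFrame. The sample rows and the Score column are assumptions for illustration only; Gender and Year follow the question. It also shows DataFrame.query as an equivalent way to write the same condition.

```python
import pandas as pd

# Hypothetical sample data; only the Gender and Year columns come from the question.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
    "Year": [2013, 2013, 2014, 2014, 2014],
    "Score": [70, 82, 91, 65, 78],
})

# Each comparison produces a boolean Series; & combines them element-wise.
mask = (df["Gender"] == "Male") & (df["Year"] == 2014)
males_2014 = df[mask]

# The same filter expressed with DataFrame.query.
males_2014_q = df.query("Gender == 'Male' and Year == 2014")

print(males_2014)
```

Both forms return a new DataFrame that keeps the original index labels of the matching rows.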
To store your DataFrames in a dict using a for loop: 
  dic = {}  
  for g in ['Male', 'Female']:  # must match the values in the Gender column exactly  
    dic[g] = {}  
    for y in [2013, 2014]:  
      # store each filtered DataFrame in a dict of dicts, keyed by gender and then year  
      dic[g][y] = df[(df['Gender'] == g) & (df['Year'] == y)]  
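As an alternative to the nested loop, groupby can produce one DataFrame per gender and year combination without listing the values by hand. This is only a sketch under assumptions: the sample rows and the Score column are made up, and the name subsets is hypothetical.

```python
import pandas as pd

# Hypothetical sample data; only the Gender and Year columns come from the question.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
    "Year": [2013, 2013, 2014, 2014, 2014],
    "Score": [70, 82, 91, 65, 78],
})

# Iterating over a groupby yields ((gender, year), sub_frame) pairs,
# one for each combination that actually occurs in the data.
subsets = {key: group for key, group in df.groupby(["Gender", "Year"])}

males_2014 = subsets[("Male", 2014)]
print(males_2014)
```

Unlike the explicit loop, this only creates entries for combinations present in the data, so looking up a missing combination raises a KeyError instead of returning an empty DataFrame.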

