Tuesday, March 2, 2021

[ Python FAQ ] Pandas - How to collect frequency from two columns of a dataframe

 Question

Suppose we have a DataFrame (data) that contains, among others, the columns horsepower and drive-wheels.


Two questions here:
* Question1
How to create another column, hp_rank, that maps the horsepower column to High (150+), Medium (100-150), and Low (0-100)?

* Question2
How to count the frequency of each combination of hp_rank and drive-wheels, with hp_rank as rows and drive-wheels as columns?


HowTo
For Question1, the intuitive approach is to apply a row-wise lambda:
    # Label each row by its horsepower; values of exactly 100 and 150
    # fall into the lower of the two neighboring ranks
    data['hp_rank'] = data.apply(
        lambda r: 'Low (0-100)' if r.horsepower <= 100
        else 'Medium (100-150)' if r.horsepower <= 150
        else 'High (150+)',
        axis=1
    )
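A better approach is to use the pandas.cut API:
    import pandas as pd

    bin_labels = ['Low (0-100)', 'Medium (100-150)', 'High (150+)']
    # Bins are right-closed by default: (0, 100], (100, 150], (150, max].
    # The maximum horsepower must be above 150, or the edges are not increasing.
    data['hp_rank'] = pd.cut(
        data['horsepower'],
        bins=[0, 100, 150, data['horsepower'].max()],
        labels=bin_labels
    )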
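To try the pandas.cut approach without the original dataset, here is a minimal sketch on a small made-up DataFrame. The sample values are assumptions for illustration only, and the last bin edge is float("inf") rather than data['horsepower'].max(), an assumed variation that keeps the bin edges increasing even when no value exceeds 150.

```python
import pandas as pd

# Made-up sample rows (not the dataset from the post).
data = pd.DataFrame({
    "horsepower": [68, 100, 111, 150, 154, 262],
    "drive-wheels": ["fwd", "fwd", "rwd", "rwd", "4wd", "rwd"],
})

bin_labels = ["Low (0-100)", "Medium (100-150)", "High (150+)"]

# Bins are right-closed by default: (0, 100], (100, 150], (150, inf].
# float("inf") as the last edge is an assumption, used instead of the column max.
data["hp_rank"] = pd.cut(
    data["horsepower"],
    bins=[0, 100, 150, float("inf")],
    labels=bin_labels,
)

print(data)
```

Because the bins are right-closed, 100 lands in Low and 150 lands in Medium, which matches the `<=` comparisons of the lambda approach.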
For Question2, you can leverage another API, pandas.pivot_table:
    # Helper column of ones: summing it counts the rows in each combination
    data['value'] = 1
    pd.pivot_table(data, values='value', columns=['drive-wheels'], index='hp_rank', fill_value=0, aggfunc='sum')
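As a possible alternative for Question2, pandas.crosstab builds the same kind of frequency table without a helper column. This is only a sketch on made-up sample rows (the values and the float("inf") bin edge are assumptions, not taken from the post):

```python
import pandas as pd

# Made-up sample rows (not the dataset from the post).
data = pd.DataFrame({
    "horsepower": [68, 100, 111, 150, 154, 262],
    "drive-wheels": ["fwd", "fwd", "rwd", "rwd", "4wd", "rwd"],
})
data["hp_rank"] = pd.cut(
    data["horsepower"],
    bins=[0, 100, 150, float("inf")],  # assumed open-ended top bin
    labels=["Low (0-100)", "Medium (100-150)", "High (150+)"],
)

# crosstab counts how often each (hp_rank, drive-wheels) pair occurs;
# pairs that never occur show up as 0.
counts = pd.crosstab(data["hp_rank"], data["drive-wheels"])
print(counts)
```

Rows follow the category order produced by pandas.cut (Low, Medium, High) rather than alphabetical order, which is one reason to keep hp_rank categorical.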


