程式扎記: [ Python FAQ ] Pandas - How to collect frequency from two columns of a dataframe

Tuesday, March 2, 2021

Question

Suppose we have a DataFrame data that contains, among others, the columns horsepower and drive-wheels.

Two questions here:
* Question1

How to create another column hp_rank that maps column horsepower to High (150+), Medium (100-150) and Low (0-100)?

* Question2

How to count the frequency of each combination of columns hp_rank and drive-wheels and show the result as a table?

HowTo
For Question1, the intuitive approach is a row-wise apply:

# Row-wise mapping: values <= 100 are Low, values <= 150 are Medium, the rest High
data['hp_rank'] = data.apply(
    lambda r: 'Low (0-100)' if r.horsepower <= 100 else 'Medium (100-150)' if r.horsepower <= 150 else 'High (150+)',
    axis=1
)
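One thing worth checking with the row-wise version is how it treats missing values. The sketch below uses a small made-up frame (the values are hypothetical, not the original dataset) and a named function rank_hp, introduced here only for readability. Because any comparison with NaN is False, a missing horsepower falls through both conditions and ends up labelled High.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not the original dataset
sample = pd.DataFrame({'horsepower': [70, 100, 101, 150, 200, np.nan]})

def rank_hp(hp):
    # Same conditions as the lambda above
    if hp <= 100:
        return 'Low (0-100)'
    if hp <= 150:
        return 'Medium (100-150)'
    # Reached for values above 150, and also for NaN
    return 'High (150+)'

sample['hp_rank'] = sample['horsepower'].apply(rank_hp)
print(sample)
```

If the horsepower column may contain missing values, it is probably safer to handle them explicitly before ranking.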

A better approach is to use pandas.cut, which bins the whole column at once:

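import numpy as np
import pandas as pd

bin_labels = ['Low (0-100)', 'Medium (100-150)', 'High (150+)']
# Right-closed bins: [0, 100], (100, 150], (150, inf]
# np.inf keeps the edges increasing even if no value exceeds 150
data['hp_rank'] = pd.cut(
    data['horsepower'],
    bins=[0, 100, 150, np.inf],
    labels=bin_labels,
    include_lowest=True
)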
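To see how pd.cut assigns values that sit exactly on a bin edge, here is a minimal sketch on made-up data (the frame sample and its values are hypothetical). Using np.inf as the top edge and include_lowest=True are choices of this sketch: the first keeps the edges strictly increasing regardless of the data, the second puts a value of 0 into the first bin instead of leaving it unassigned.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not the original dataset
sample = pd.DataFrame({'horsepower': [0, 70, 100, 101, 150, 151, 262]})

labels = ['Low (0-100)', 'Medium (100-150)', 'High (150+)']
sample['hp_rank'] = pd.cut(
    sample['horsepower'],
    bins=[0, 100, 150, np.inf],  # open-ended top bin
    labels=labels,
    include_lowest=True,         # so that 0 falls into the first bin
)
print(sample)
```

By default the bins are closed on the right, so 100 lands in Low and 150 in Medium, the same as the <= comparisons in the row-wise version.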

For Question2, you can use pandas.pivot_table:

# Helper column of ones, so that summing it counts the rows in each group
data['value'] = 1
pd.pivot_table(data, values='value', columns=['drive-wheels'], index='hp_rank', fill_value=0, aggfunc='sum')
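As an alternative that is not part of the original answer, pandas.crosstab counts co-occurrences of two columns directly, so no helper column is needed. A minimal sketch on made-up data (the frame sample and its values are hypothetical):

```python
import pandas as pd

# Hypothetical sample data, not the original dataset
sample = pd.DataFrame({
    'hp_rank': ['Low (0-100)', 'Low (0-100)', 'Medium (100-150)',
                'High (150+)', 'Low (0-100)'],
    'drive-wheels': ['fwd', 'fwd', 'rwd', 'rwd', '4wd'],
})

# Rows: hp_rank, columns: drive-wheels, cells: number of matching rows
freq = pd.crosstab(sample['hp_rank'], sample['drive-wheels'])
print(freq)
```

Combinations that never occur show up as 0. The rows here are sorted alphabetically because hp_rank is a plain string column in this sketch.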
