Wednesday, March 10, 2021

[ Python FAQ ] Pandas - Groupby: Count and mean combined

 Source From Here

Question
I am working with pandas to summarise a dataframe as a count of certain categories, along with the mean sentiment score for each of those categories. The table is full of strings that have different sentiment scores, and I want to group the rows by text source, showing how many posts each source has and the average sentiment of those posts.

My (simplified) dataframe looks like this:
```python
import pandas as pd

# One row per post: its source, its text and its sentiment score
df = pd.DataFrame(data=[
    ['bar', 'some string', 0.13],
    ['foo', 'alt string',  -0.8],
    ['bar', 'another str',  0.7],
    ['foo', 'some text',   -0.2],
    ['foo', 'more text',   -0.5]],
    columns=['source', 'text', 'sent']
)
```


My expected output will look like this:
```
source    count     mean_sent
-----------------------------
foo       3         -0.5
bar       2         0.415
```
HowTo
You can use groupby with agg (aggregate), then rename the aggregated columns and reset the index:
```python
# 'size' counts the rows in each group; 'mean' averages the sentiment scores
(df.groupby('source')
   .agg({'text': 'size', 'sent': 'mean'})
   .rename(columns={'text': 'count', 'sent': 'mean_sent'})
   .reset_index())
```
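A minimal sketch of an equivalent approach, assuming pandas 1.0 or later (named aggregation was added in 0.25, and the ignore_index argument of sort_values in 1.0). With named aggregation each keyword argument becomes an output column, given as an (input column, aggregation) pair, so the separate rename step is not needed. The final sort_values call is an addition not in the original answer: groupby sorts the group keys alphabetically by default, so sorting by count in descending order is one way to reproduce the order shown in the expected output (foo before bar). The variable name summary is illustrative.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame(
    data=[
        ['bar', 'some string', 0.13],
        ['foo', 'alt string', -0.8],
        ['bar', 'another str', 0.7],
        ['foo', 'some text', -0.2],
        ['foo', 'more text', -0.5],
    ],
    columns=['source', 'text', 'sent'],
)

# Named aggregation: output column name = (input column, aggregation)
summary = (
    df.groupby('source')
      .agg(count=('text', 'size'), mean_sent=('sent', 'mean'))
      .reset_index()
      # Added step: order sources by number of posts, largest first
      .sort_values('count', ascending=False, ignore_index=True)
)
print(summary)
```

Note that 'size' counts every row in a group, while 'count' would skip rows where text is NaN; choose whichever matches what counts as a post in your data.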


