Question
I am working with pandas to summarise a dataframe as a count of certain categories, together with the mean sentiment score for each category. There is a table full of strings that have different sentiment scores, and I want to group the rows by text source, reporting how many posts each source has as well as the average sentiment of those posts.
My (simplified) dataframe looks like this:
- import pandas as pd
- import numpy as np
- df = pd.DataFrame(data=[
- ['bar', 'some string', 0.13],
- ['foo', 'alt string', -0.8],
- ['bar', 'another str', 0.7],
- ['foo', 'some text', -0.2],
- ['foo', 'more text', -0.5]],
- columns=['source', 'text', 'sent']
- )
My expected output will look like this:
- source   count   mean_sent
- -----------------------------
- foo      3       -0.5
- bar      2       0.415
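Answer
You can use groupby with agg: 'size' counts the rows in each group and 'mean' averages the sent column. Then rename the columns and reset the index so that source becomes a regular column again. Note that groupby sorts by the group key by default, so bar is listed before foo: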
- df.groupby('source') \
- .agg({'text':'size', 'sent':'mean'}) \
- .rename(columns={'text':'count','sent':'mean_sent'}) \
- .reset_index()
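As an alternative sketch, the same summary can be written with named aggregation, which names the output columns directly and so avoids the separate rename step. This assumes pandas 0.25 or later; the variable name summary and the extra sort_values call (added only to reproduce the foo-then-bar order of the expected output) are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame(
    data=[
        ['bar', 'some string', 0.13],
        ['foo', 'alt string', -0.8],
        ['bar', 'another str', 0.7],
        ['foo', 'some text', -0.2],
        ['foo', 'more text', -0.5],
    ],
    columns=['source', 'text', 'sent'],
)

# Named aggregation: output_column=(input_column, aggregation function).
summary = (
    df.groupby('source')
      .agg(count=('text', 'size'), mean_sent=('sent', 'mean'))
      .sort_values('count', ascending=False)  # most posts first (assumed ordering)
      .reset_index()
)

print(summary)
```

Here 'size' counts every row in the group, including rows where text is missing; if missing values should be excluded from the count, 'count' could be used as the aggregation instead.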