Wednesday, March 11, 2020

[ FAQ ] Concatenate strings from several rows using Pandas groupby

Source From Here
Question
I want to merge several strings in a dataframe based on a groupby in Pandas.

This is my code so far:
    from io import StringIO

    import pandas as pd

    data = StringIO("""
    "name1","hej","2014-11-01"
    "name1","du","2014-11-02"
    "name1","aj","2014-12-01"
    "name1","oj","2014-12-02"
    "name2","fin","2014-11-01"
    "name2","katt","2014-11-02"
    "name2","mycket","2014-12-01"
    "name2","lite","2014-12-01"
    """)

    # load string as stream into dataframe
    # (the data has no header row, so use header=None; with header=0 the
    # first record would be consumed as a header and replaced by `names`)
    df = pd.read_csv(data, header=None, names=["name", "text", "date"], parse_dates=[2])

    # add column with month
    df["month"] = df["date"].dt.month


I don't get how I can use groupby and apply some sort of concatenation of the strings in the column "text". Any help appreciated!

How-To
Group by the 'name' and 'month' columns, then call transform with a lambda that joins the text entries of each group. transform returns a result aligned to the original df, so every row receives the joined string of its group:
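    df['text'] = df[['name','text','month']].groupby(['name','month'])['text'].transform(lambda x: ','.join(x))

Every row in a group now carries the same joined string, so select the columns of interest and drop the duplicate rows:

    df = df[['name','text','month']].drop_duplicates()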
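
The transform approach can be sketched end to end as follows. This is a minimal illustration, assuming the same rows as in the question; the dataframe is built directly rather than parsed from CSV, and the variable name `result` is introduced here only for the example.

```python
import pandas as pd

# Same rows as in the question, built directly for brevity (an assumption
# of this sketch; the question loads them with read_csv).
df = pd.DataFrame({
    "name": ["name1"] * 4 + ["name2"] * 4,
    "text": ["hej", "du", "aj", "oj", "fin", "katt", "mycket", "lite"],
    "date": pd.to_datetime([
        "2014-11-01", "2014-11-02", "2014-12-01", "2014-12-02",
        "2014-11-01", "2014-11-02", "2014-12-01", "2014-12-01",
    ]),
})
df["month"] = df["date"].dt.month

# transform broadcasts each group's joined string back to every row
# of that group, so the result has the same length as df.
df["text"] = df.groupby(["name", "month"])["text"].transform(lambda x: ",".join(x))

# Rows within a group are now identical on these columns; keep one each.
result = df[["name", "text", "month"]].drop_duplicates()
print(result)
```

Note that drop_duplicates keeps the original index labels of the first row in each group, so the result is not renumbered unless reset_index(drop=True) is called afterwards.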


Actually, it is simpler to call apply on the grouped column and then reset_index, which yields one row per (name, month) group directly:
    df.groupby(['name','month'])['text'].apply(lambda x: ','.join(x)).reset_index()
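
As a rough sketch of the apply variant, assuming the same rows as in the question (built directly here instead of parsed from CSV), the snippet below also shows agg as an equivalent spelling. The names `result` and `result_agg` are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Same rows as in the question, built directly for brevity (an assumption
# of this sketch; the question loads them with read_csv).
df = pd.DataFrame({
    "name": ["name1"] * 4 + ["name2"] * 4,
    "text": ["hej", "du", "aj", "oj", "fin", "katt", "mycket", "lite"],
    "date": pd.to_datetime([
        "2014-11-01", "2014-11-02", "2014-12-01", "2014-12-02",
        "2014-11-01", "2014-11-02", "2014-12-01", "2014-12-01",
    ]),
})
df["month"] = df["date"].dt.month

# apply reduces each group to a single joined string; reset_index turns
# the (name, month) index levels back into ordinary columns.
result = df.groupby(["name", "month"])["text"].apply(lambda x: ",".join(x)).reset_index()

# agg with a plain callable is another way to express the same reduction.
result_agg = df.groupby(["name", "month"])["text"].agg(",".join).reset_index()

print(result)
```

Unlike the transform version, this produces exactly one row per group with a fresh 0..n-1 index, so no drop_duplicates step is needed.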
