Wednesday, March 11, 2020

[ FAQ ] Concatenate strings from several rows using Pandas groupby

Source From Here
I want to merge several strings in a dataframe based on a groupby in Pandas.

This is my code so far:
    from io import StringIO
    import pandas as pd

    data = StringIO("""
    "name1","hej","2014-11-01"
    "name1","du","2014-11-02"
    "name1","aj","2014-12-01"
    "name1","oj","2014-12-02"
    "name2","fin","2014-11-01"
    "name2","katt","2014-11-02"
    "name2","mycket","2014-12-01"
    "name2","lite","2014-12-01"
    """)

    # load string as stream into dataframe
    # header=None: the data has no header row, so header=0 would silently drop the first record
    df = pd.read_csv(data, header=None, names=["name","text","date"], parse_dates=[2])

    # add column with month
    df["month"] = df["date"].dt.month

I don't get how I can use groupby and apply some sort of concatenation of the strings in the column "text". Any help appreciated!

You can group by the 'name' and 'month' columns and call transform, which returns a result aligned to the original df, passing a lambda that joins the text entries of each group:

    df['text'] = df[['name','text','month']].groupby(['name','month'])['text'].transform(lambda x: ','.join(x))

Every row in a group now carries the same joined text, so keep one row per group by dropping the duplicates:

    df = df[['name','text','month']].drop_duplicates()
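To make the "aligned to the original df" point concrete, here is a minimal sketch on a made-up four-row frame (the rows are a hypothetical subset of the question's data, with the month column already filled in). It shows that transform returns one value per original row, which is why drop_duplicates is needed afterwards.

```python
import pandas as pd

# Hypothetical miniature of the question's data; "month" is assumed to be precomputed.
df = pd.DataFrame({
    "name": ["name1", "name1", "name1", "name2"],
    "text": ["hej", "du", "aj", "fin"],
    "month": [11, 11, 12, 11],
})

# transform broadcasts each group's joined string back onto every row of that group,
# so the result has the same length and index as df.
joined = df.groupby(["name", "month"])["text"].transform(lambda x: ",".join(x))
print(joined.tolist())

# Rows belonging to the same group are now identical; drop_duplicates keeps the first of each.
result = df.assign(text=joined)[["name", "text", "month"]].drop_duplicates()
print(result)
```

Note that the surviving rows keep their original index labels, so the index of the result has gaps unless you also call reset_index(drop=True).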

Alternatively, you can skip transform and just call apply followed by reset_index, which yields one row per group directly:

    df.groupby(['name','month'])['text'].apply(lambda x: ','.join(x)).reset_index()
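For reference, the apply approach can be run end to end on the question's data as sketched below. The sketch assumes the CSV text has no header row (hence header=None) and refers to the date column by name; the variable name `out` is only illustrative.

```python
from io import StringIO

import pandas as pd

# The question's sample data; assumed to have no header row.
data = StringIO('''"name1","hej","2014-11-01"
"name1","du","2014-11-02"
"name1","aj","2014-12-01"
"name1","oj","2014-12-02"
"name2","fin","2014-11-01"
"name2","katt","2014-11-02"
"name2","mycket","2014-12-01"
"name2","lite","2014-12-01"
''')

df = pd.read_csv(data, header=None, names=["name", "text", "date"], parse_dates=["date"])
df["month"] = df["date"].dt.month

# One joined string per (name, month) group; reset_index turns the group keys back into columns.
out = (
    df.groupby(["name", "month"])["text"]
      .apply(lambda x: ",".join(x))
      .reset_index()
)
print(out)
```

Compared with transform plus drop_duplicates, this leaves the original df untouched and returns a fresh frame with a clean 0..n-1 index.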

This message was edited 7 times. Last update was at 11/03/2020 19:28:44


