Tuesday, March 2, 2021

[ Python FAQ ] Pandas - aggregate count distinct (pandas.Series.nunique)

 Source From Here

Question
Let's say I have a log of user activity and I want to generate a report of total duration and the number of unique users per day:
  import numpy as np
  import pandas as pd

  # Activity log: one row per session, with the day, the user and the duration
  df = pd.DataFrame({'date': ['2013-04-01', '2013-04-01', '2013-04-01', '2013-04-02', '2013-04-02'],
      'user_id': ['0001', '0001', '0002', '0002', '0002'],
      'duration': [30, 15, 20, 15, 30]})
Aggregating duration is pretty straightforward:
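A minimal sketch of that first step, assuming the log is grouped by the `date` column; the name `agg_sum_df` is made up for illustration, and the string `'sum'` is used in place of `np.sum` because newer pandas releases prefer it:

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2013-04-01', '2013-04-01', '2013-04-01', '2013-04-02', '2013-04-02'],
    'user_id': ['0001', '0001', '0002', '0002', '0002'],
    'duration': [30, 15, 20, 15, 30],
})

# Group the log by day, then sum the duration within each day
group = df.groupby('date')
agg_sum_df = group.aggregate({'duration': 'sum'})
print(agg_sum_df)
```

With the sample data this should give a total duration of 65 for 2013-04-01 and 45 for 2013-04-02.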


What I'd like to do is sum the duration and count distinct at the same time, but I can't seem to find an equivalent for count_distinct:
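  group = df.groupby('date')
  # count_distinct is a placeholder for the function being asked about; it does not exist
  agg_sum_dist_df = group.aggregate({'duration': np.sum, 'user_id': count_distinct})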
Below works, but surely there's a better way, no?
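A two-step workaround of this kind might look like the sketch below. It is an assumption about the general shape, not a verbatim copy of the asker's code, and the column name `unique_users` is made up for illustration: the sum is computed first, and the distinct count is attached afterwards as a separate column.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2013-04-01', '2013-04-01', '2013-04-01', '2013-04-02', '2013-04-02'],
    'user_id': ['0001', '0001', '0002', '0002', '0002'],
    'duration': [30, 15, 20, 15, 30],
})

# Step 1: total duration per day
agg = df.groupby('date').aggregate({'duration': 'sum'})

# Step 2: distinct users per day, computed in a second groupby and
# aligned to the first result on the shared 'date' index
agg['unique_users'] = df.groupby('date')['user_id'].nunique()
print(agg)
```

It works, but the data is grouped twice and the result is assembled by hand, which is what the question hopes to avoid.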


HowTo
Check pandas.Series.nunique
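A minimal sketch of how `pandas.Series.nunique` can be passed to `aggregate` next to the sum, so both statistics come out of a single call. The sample data repeats the question's log, and `'sum'` is used in place of `np.sum` as an assumption about what newer pandas releases prefer:

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2013-04-01', '2013-04-01', '2013-04-01', '2013-04-02', '2013-04-02'],
    'user_id': ['0001', '0001', '0002', '0002', '0002'],
    'duration': [30, 15, 20, 15, 30],
})

# One aggregate call: sum the durations and count distinct users per day
agg_sum_dist_df = df.groupby('date').aggregate({
    'duration': 'sum',
    'user_id': pd.Series.nunique,
})
print(agg_sum_dist_df)
```

For the sample data, 2013-04-01 should show a duration of 65 with 2 distinct users, and 2013-04-02 a duration of 45 with 1 distinct user. The string alias `'nunique'` can be used in place of `pd.Series.nunique` with the same result.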



