Source From Here
Question
Let's say I have a log of user activity, and I want to generate a report of the total duration and the number of unique users per day:
- import numpy as np
- import pandas as pd
- df = pd.DataFrame({'date': ['2013-04-01','2013-04-01','2013-04-01','2013-04-02', '2013-04-02'],
- 'user_id': ['0001', '0001', '0002', '0002', '0002'],
- 'duration': [30, 15, 20, 15, 30]})
What I'd like to do is sum the duration and count distinct at the same time, but I can't seem to find an equivalent for count_distinct:
- agg_sum_dist_df = group.aggregate({ 'duration': np.sum, 'user_id': count_distinct})
HowTo
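Use pandas.Series.nunique, which returns the number of distinct values in a Series (ignoring NaN by default). Passing it, or the string 'nunique', as the aggregation for user_id gives the count of unique users in each group.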
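As a minimal sketch of how this could look with the sample data from the question: it assumes the undefined `group` in the question was meant to be `df.groupby('date')`, and it uses the string aggregation names 'sum' and 'nunique' rather than `np.sum`.

```python
import pandas as pd

# Sample activity log from the question.
df = pd.DataFrame({
    'date': ['2013-04-01', '2013-04-01', '2013-04-01', '2013-04-02', '2013-04-02'],
    'user_id': ['0001', '0001', '0002', '0002', '0002'],
    'duration': [30, 15, 20, 15, 30],
})

# Assumption: the question's `group` is a groupby on the 'date' column.
# Sum the durations and count distinct user_id values per date in one call.
agg_sum_dist_df = df.groupby('date').agg({'duration': 'sum', 'user_id': 'nunique'})

print(agg_sum_dist_df)
```

For this data, 2013-04-01 comes out with a total duration of 65 across 2 unique users, and 2013-04-02 with a total duration of 45 from 1 unique user.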