程式扎記: [ Python FAQ ] Calculate the time difference between two consecutive rows in pandas

Wednesday, January 23, 2019


Source From Here
Question
I have a pandas DataFrame as follows:

Dev_id     Time  
88345      13:40:31  
87556      13:20:33  
88955      13:05:00  
.....      ........  
85678      12:15:28  

The DataFrame above has 83,000 rows. I want to compute the time difference between each pair of consecutive rows and store it in a separate column. The desired result would be:

Dev_id     Time          Time_diff (in min)
88345      13:40:31      20
87556      13:20:33      15
88955      13:05:00      15

I tried df['Time_diff'] = df['Time'].diff(-1), but it raises the error shown below:

TypeError: unsupported operand type(s) for -: 'datetime.time' and 'datetime.time'

How can I solve this?

How-To
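The problem is that pandas needs datetimes or timedeltas for the diff function; a column of datetime.time objects cannot be subtracted. So first convert the column with to_timedelta (for example pd.to_timedelta(df['Time'].astype(str))), then call total_seconds and divide by 60 to get minutes. The session below uses a column that already holds datetimes, so no conversion is needed there: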

>>> import pandas as pd  
>>> from datetime import datetime, timedelta  
>>> df = pd.DataFrame([[1, datetime.now()], [2, datetime.now()- timedelta(hours = 1)]], columns = ['id', 'time'])  
>>> df  
   id                       time  
0   1 2019-01-24 09:10:19.732798  
1   2 2019-01-24 08:10:19.732864  
>>> df['time_diff'] = df['time'].diff(-1).dt.total_seconds().div(60)  
>>> df  
   id                       time  time_diff  
0   1 2019-01-24 09:10:19.732798  59.999999  
1   2 2019-01-24 08:10:19.732864        NaN  
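
The session above starts from a datetime column, so it never exercises the to_timedelta step that the question's data needs. A minimal sketch of that step, assuming the Time column holds datetime.time objects as the error message suggests (the column name Time_diff_min and the three sample rows are only illustrative):

```python
import datetime as dt

import pandas as pd

# Sample rows taken from the question; 'Time' holds datetime.time objects.
df = pd.DataFrame({
    "Dev_id": [88345, 87556, 88955],
    "Time": [dt.time(13, 40, 31), dt.time(13, 20, 33), dt.time(13, 5, 0)],
})

# datetime.time does not support subtraction, so turn each value into a
# Timedelta (time elapsed since midnight) by going through its string form.
as_delta = pd.to_timedelta(df["Time"].astype(str))

# diff(-1) subtracts the NEXT row from the current one; the last row has no
# successor and therefore ends up as NaN.
df["Time_diff_min"] = as_delta.diff(-1).dt.total_seconds().div(60)

print(df)
```

With these rows the first two differences come out as roughly 19.97 and 15.55 minutes, and the last row is NaN because nothing follows it.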

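To floor or round to whole minutes, apply dt.floor('T') (or dt.round('T')) to the differences before calling total_seconds; 'T' is the minute alias, which newer pandas releases spell 'min':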

>>> df['time'].diff(-1).dt
<pandas.core.indexes.accessors.TimedeltaProperties object at 0x7ff17c4bad90>
>>> df['time'].diff(-1).dt.floor('T')
0   00:59:00
1        NaT
>>> df['time_diff'] = df['time'].diff(-1).dt.floor('T').dt.total_seconds().div(60)
>>> df
   id                       time  time_diff
0   1 2019-01-24 09:10:19.732798       59.0
1   2 2019-01-24 08:10:19.732864        NaN
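
The post only demonstrates floor. As a rough side-by-side of floor and round, here is a small sketch with hypothetical timestamps that sit 59 minutes 40 seconds apart, using the 'min' alias instead of 'T' (the column names floor_min and round_min are made up for the example):

```python
import pandas as pd

# Hypothetical timestamps, 59 min 40 s apart.
df = pd.DataFrame({
    "id": [1, 2],
    "time": pd.to_datetime(["2019-01-24 09:10:00", "2019-01-24 08:10:20"]),
})

# Row 0 minus row 1; the last row has no successor, so it is NaT.
delta = df["time"].diff(-1)

# floor drops the partial minute, round goes to the nearest whole minute.
df["floor_min"] = delta.dt.floor("min").dt.total_seconds().div(60)
df["round_min"] = delta.dt.round("min").dt.total_seconds().div(60)

print(df)
```

For the first row, floor gives 59.0 and round gives 60.0, so the choice matters whenever the seconds part is 30 or more.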
