Sunday, November 1, 2020

[ Python FAQ ] Pandas: recalculate the index after a concatenation

 Source From Here

Question
I have a problem where I produce a pandas dataframe by concatenating along the row axis (stacking vertically).

Each of the constituent dataframes has an autogenerated index (ascending numbers). After concatenation, my index is screwed up: it counts up to n (where n is the shape[0] of the corresponding dataframe), and restarts at zero at the next dataframe.

I am trying to "re-calculate the index, given the current order", or "re-index" (or so I thought). Turns out that isn't exactly what DataFrame.reindex seems to be doing. Here is what I tried to do:
  train_df = pd.concat(train_class_df_list)
  train_df = train_df.reindex(index=[i for i in range(train_df.shape[0])])
It failed with "cannot reindex from a duplicate axis". I don't want to change the order of my data; I just need to delete the old index and set up a new one, with the order of rows preserved.
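The failure described above can be reproduced with a minimal sketch. The two small frames and the variable names here are hypothetical stand-ins for train_class_df_list, and the exact wording of the error message may differ between pandas versions. The point is that reindex looks rows up by their existing labels, so it cannot work when the labels repeat:

```python
import pandas as pd

# Hypothetical stand-ins for the constituent dataframes,
# each with its own autogenerated 0..n-1 index.
parts = [pd.DataFrame({"a": [1, 2]}), pd.DataFrame({"a": [3, 4]})]
stacked = pd.concat(parts)

# After concatenation the labels repeat: 0, 1, 0, 1.
has_duplicates = stacked.index.has_duplicates

# reindex selects rows by label, so duplicate labels are ambiguous
# and pandas refuses with a ValueError.
try:
    stacked.reindex(index=range(len(stacked)))
    reindex_failed = False
except ValueError as err:
    reindex_failed = True
    print("reindex failed:", err)
```

In other words, reindex conforms the data to a new set of labels rather than renumbering the rows, which is why it is not the right tool here.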

HowTo
After vertical concatenation, if you get an index of [0, n) followed by [0, m), all you need to do is call reset_index:
  train_df = train_df.reset_index(drop=True)
Because reset_index returns a new DataFrame, assign the result back as above, or do it in place using inplace=True:
  >>> import pandas as pd
  >>> cat_df = pd.concat([pd.DataFrame({'a':[1,2]}), pd.DataFrame({'a':[1,2]})])
  >>> cat_df.index
  Int64Index([0, 1, 0, 1], dtype='int64')
  >>> cat_df.reset_index(drop=True, inplace=True)
  >>> cat_df.index
  RangeIndex(start=0, stop=4, step=1)
  >>> cat_df
     a
  0  1
  1  2
  2  1
  3  2
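As an alternative that is not part of the original answer, pd.concat accepts an ignore_index parameter that discards the incoming indexes and numbers the rows from zero during the concatenation itself. A minimal sketch, using the same toy data as above:

```python
import pandas as pd

# Same toy frames as in the session above.
parts = [pd.DataFrame({"a": [1, 2]}), pd.DataFrame({"a": [1, 2]})]

# ignore_index=True drops the original labels and builds a fresh
# 0..n-1 index while keeping the rows in concatenation order.
cat_df = pd.concat(parts, ignore_index=True)

print(cat_df.index)
print(cat_df)
```

This avoids the separate reset_index step when the old index carries no information worth keeping.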

