Source From Here
Question
I build a pandas DataFrame by concatenating several DataFrames along the row axis (stacking them vertically).
Each constituent DataFrame has an autogenerated index (ascending integers). After concatenation the index is no longer unique: it counts from 0 to n - 1 (where n is shape[0] of the corresponding DataFrame) and then restarts at zero for the next DataFrame.
What I want is to "recalculate the index, given the current order", which I assumed was what "re-indexing" meant. It turns out that is not what DataFrame.reindex does. Here is what I tried:
    train_df = pd.concat(train_class_df_list)
    train_df = train_df.reindex(index=[i for i in range(train_df.shape[0])])
It failed with "cannot reindex from a duplicate axis." I don't want to change the order of my data; I just need to drop the old index and create a new one, with the row order preserved.
HowTo
After vertical concatenation, if you get an index of [0, n) followed by [0, m), all you need to do is call reset_index. It returns a new DataFrame, so assign the result:
    train_df = train_df.reset_index(drop=True)
You can also do this in place by passing inplace=True:
    >>> import pandas as pd
    >>> cat_df = pd.concat([pd.DataFrame({'a': [1, 2]}), pd.DataFrame({'a': [1, 2]})])
    >>> cat_df.index
    Int64Index([0, 1, 0, 1], dtype='int64')
    >>> cat_df.reset_index(drop=True, inplace=True)
    >>> cat_df.index
    RangeIndex(start=0, stop=4, step=1)
    >>> cat_df
       a
    0  1
    1  2
    2  1
    3  2
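If the old index is not needed at all, the renumbering can probably be done during concatenation itself. The sketch below is not from the original answer: it assumes pd.concat's ignore_index parameter, and the frames list is a hypothetical stand-in for train_class_df_list. It also shows why reindex is the wrong tool here: reindex looks rows up by their existing labels, so duplicated labels are ambiguous and pandas raises a ValueError.

```python
import pandas as pd

# Hypothetical stand-ins for the per-class frames from the question.
frames = [pd.DataFrame({"a": [1, 2]}), pd.DataFrame({"a": [3, 4]})]

# ignore_index=True discards the incoming indexes and numbers the rows 0..n-1.
combined = pd.concat(frames, ignore_index=True)

# Equivalent two-step form: concatenate first, then drop the old index.
two_step = pd.concat(frames).reset_index(drop=True)

# reindex selects rows by existing label; with duplicate labels it cannot
# decide which row label 0 refers to, so it raises ValueError.
try:
    pd.concat(frames).reindex(index=range(4))
    reindex_failed = False
except ValueError:
    reindex_failed = True

print(combined.index)
print(reindex_failed)
```

Both forms keep the rows in their concatenated order; ignore_index=True simply saves the extra reset_index call.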