程式扎記: [ Pandas Article Collection ] Tips for randomly shuffling a Pandas DataFrame

Saturday, September 12, 2020

Source From Here

Preface
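Before training a model, you often need to shuffle the row order of a pandas DataFrame. There are several ways to do this, and I am noting them here briefly. Personally, I prefer the sklearn method.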

HowTo
Run the following in a Jupyter notebook or a Python script, assuming pandas is already installed and imported.

The method used in mlcc

import numpy as np  # needed for np.random.permutation
california_housing_dataframe = california_housing_dataframe.reindex(np.random.permutation(california_housing_dataframe.index))
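
If the shuffle needs to be repeatable, the permutation can come from a seeded generator instead of the global NumPy state. The following is only a sketch: the small `df` stands in for `california_housing_dataframe`, and the seed value 42 is an arbitrary choice, not something from the original post. Note that `reindex` needs unique index labels, so this approach does not suit a frame with duplicated labels.

```python
import numpy as np
import pandas as pd

# Stand-in data; the post itself uses california_housing_dataframe.
df = pd.DataFrame({"name": ["ken", "john", "mary"], "age": [21, 37, 18]})

# A seeded Generator gives the same permutation on every run.
rng = np.random.default_rng(42)
shuffled = df.reindex(rng.permutation(df.index.to_numpy()))

# reindex moves whole rows, so each label still points at the same data.
print(shuffled)
```

The original index labels are kept, which makes it easy to trace a shuffled row back to its source row.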

The sample method

california_housing_dataframe = california_housing_dataframe.sample(frac=1).reset_index(drop=True)  
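
To see what each half of that one-liner does, the two steps can be separated. This is a minimal sketch on a stand-in `df`; the `random_state=0` argument is an addition here for repeatability and is not part of the original line.

```python
import pandas as pd

# Stand-in data; the post itself uses california_housing_dataframe.
df = pd.DataFrame({"name": ["ken", "john", "mary"], "age": [21, 37, 18]})

# frac=1 draws every row without replacement, i.e. a full shuffle.
# random_state fixes the order so repeated runs agree.
shuffled = df.sample(frac=1, random_state=0)

# reset_index(drop=True) throws away the old labels and renumbers from 0.
renumbered = shuffled.reset_index(drop=True)

print(shuffled.index.tolist())    # some ordering of the original labels
print(renumbered.index.tolist())  # always 0..n-1
```

Skip `reset_index(drop=True)` if you want to keep the original labels, for example to join the shuffled rows back to another table later.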

The sklearn shuffle method

from sklearn.utils import shuffle  
california_housing_dataframe = shuffle(california_housing_dataframe)  
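
One practical reason to reach for `sklearn.utils.shuffle` is that it can shuffle several objects with the same permutation, which keeps features and labels aligned. A hedged sketch follows: `X`, `y`, and the label values are made up for illustration, and `random_state=0` is an arbitrary seed.

```python
import pandas as pd
from sklearn.utils import shuffle

# Hypothetical features and labels, for illustration only.
X = pd.DataFrame({"name": ["ken", "john", "mary"], "age": [21, 37, 18]})
y = pd.Series([0, 1, 0], name="label")

# Passing several objects applies one shared permutation to all of them,
# so each row of X stays paired with its label in y.
X_shuffled, y_shuffled = shuffle(X, y, random_state=0)

print(X_shuffled)
print(y_shuffled)
```

Both results are shuffled copies; `X` and `y` themselves keep their original order.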

Example:

>>> import pandas as pd
>>> df = pd.DataFrame({'name': ['ken', 'john', 'mary'], 'age': [21, 37, 18]})
>>> df
   name  age
0   ken   21
1  john   37
2  mary   18

>>> import numpy as np
>>> df.reindex(np.random.permutation(df.index))
   name  age
2  mary   18
0   ken   21
1  john   37

>>> df.sample(frac=1).reset_index(drop=True)
   name  age
0   ken   21
1  mary   18
2  john   37

>>> from sklearn.utils import shuffle
>>> shuffle(df)
   name  age
0   ken   21
1  john   37
2  mary   18

>>> df
   name  age
0   ken   21
1  john   37
2  mary   18
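
The final `df` output above shows that the original frame is left untouched: each of these calls returns a new, shuffled object rather than modifying `df` in place. A small check of that, as a sketch, which also shows that sorting by index undoes the shuffle as long as the labels have not been reset:

```python
import pandas as pd

df = pd.DataFrame({"name": ["ken", "john", "mary"], "age": [21, 37, 18]})

# sample returns a new DataFrame; df keeps its original order.
shuffled = df.sample(frac=1)
assert df.index.tolist() == [0, 1, 2]

# Because the labels were not reset, sorting by index restores the order.
restored = shuffled.sort_index()
assert restored.equals(df)
```

To keep the shuffled order, assign the result back to a variable, as the one-liners in the HowTo section do.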
