Source From Here
Question
What's a simple and efficient way to shuffle a dataframe in pandas, by rows or by columns? I.e. how to write a function shuffle(df, n, axis=0) that takes a dataframe, a number of shuffles n, and an axis (axis=0 is rows, axis=1 is columns) and returns a copy of the dataframe that has been shuffled n times.
HowTo
- >>> import pandas as pd
- >>> df = pd.DataFrame({'A':range(10), 'B':range(10)})
- >>> df
- A B
- 0 0 0
- 1 1 1
- 2 2 2
- 3 3 3
- 4 4 4
- 5 5 5
- 6 6 6
- 7 7 7
- 8 8 8
- 9 9 9
- >>> df.apply(lambda r: print(r), axis=0)
- 0 0
- 1 1
- 2 2
- 3 3
- 4 4
- 5 5
- 6 6
- 7 7
- 8 8
- 9 9
- Name: A, dtype: int64
- 0 0
- 1 1
- 2 2
- 3 3
- 4 4
- 5 5
- 6 6
- 7 7
- 8 8
- 9 9
- Name: B, dtype: int64
- >>> import numpy as np
- >>> def shuffle(df, n=1, axis=0):
- ...     df = df.copy()
- ...     for _ in range(n):
- ...         df.apply(np.random.shuffle, axis=axis)
- ...     return df
- ...
- >>> shuffle(df)
- A B
- 0 3 8
- 1 0 3
- 2 7 0
- 3 2 7
- 4 9 9
- 5 8 1
- 6 6 4
- 7 1 5
- 8 5 6
- 9 4 2
- >>> df
- A B
- 0 0 0
- 1 1 1
- 2 2 2
- 3 3 3
- 4 4 4
- 5 5 5
- 6 6 6
- 7 7 7
- 8 8 8
- 9 9 9
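The version above relies on `np.random.shuffle` mutating each column in place through `apply`, which only works when pandas hands the function a view on the frame's data. On recent pandas versions that may no longer hold, and the frame can come back unchanged. Below is a minimal sketch of the same idea that does not depend on in-place mutation. It assumes the frame holds a single numeric dtype (it round-trips through one NumPy array) and NumPy 1.20 or later for `Generator.permuted`. The `seed` parameter is a hypothetical addition for reproducibility and is not part of the original answer.

```python
import numpy as np
import pandas as pd


def shuffle(df, n=1, axis=0, seed=None):
    """Return a shuffled copy of df.

    axis=0 shuffles the values within each column independently,
    axis=1 shuffles the values within each row independently.
    Assumes all columns share one numeric dtype.
    """
    rng = np.random.default_rng(seed)  # `seed` is a hypothetical extra argument
    values = df.to_numpy(copy=True)    # work on a copy; the input frame is untouched
    for _ in range(n):
        # permuted() shuffles every 1-D slice along `axis` independently
        values = rng.permuted(values, axis=axis)
    return pd.DataFrame(values, index=df.index, columns=df.columns)


df = pd.DataFrame({"A": range(10), "B": range(10, 20)})
shuffled = shuffle(df, n=3, axis=0, seed=0)
print(shuffled)
```

As in the original answer, each column (or row) is shuffled on its own, so values that started on the same row do not stay together. To reorder whole rows or whole columns while keeping each one intact, `df.sample(frac=1)` or `df.sample(frac=1, axis=1)` is the usual idiom.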