Question
Suppose I have a DataFrame like this:
- >>> import pandas as pd
- >>> import numpy as np
- >>> df = pd.DataFrame([[1, 2], [3, 4]], columns=['f1', 'f2'], index=['r1', 'r2'])
- >>> df
-     f1  f2
- r1   1   2
- r2   3   4
How can I efficiently calculate the absolute difference between rows r1 and r2 and store the result in a new row, r3? In other words, I want this result:
- >>> diff_dat = []
- >>> for cn in df.columns:
- ...     diff_dat.append(abs(df[cn]['r1'] - df[cn]['r2']))
- ...
- >>> diff_dat
- [2, 2]
- >>> pd.concat([df, pd.DataFrame([diff_dat], index=['r3'], columns=df.columns)])
-     f1  f2
- r1   1   2
- r2   3   4
- r3   2   2
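One option is to take the absolute value of `df.diff()` and append it to the frame (`DataFrame.append` was removed in pandas 2.0, so `pd.concat` is used here):
- In [1]: pd.concat([df, df.diff().dropna().abs()])
- Out[1]:
-      f1   f2
- r1  1.0  2.0
- r2  3.0  4.0
- r2  2.0  2.0
Note that the new row keeps the label r2 (the label `diff` gives its result) rather than r3, and all values become floats because `diff` first produces a NaN row.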
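A minimal sketch of the `diff` approach that also relabels the new row as r3, assuming pandas 2.x (where `pd.concat` takes the place of the removed `DataFrame.append`). The variable names `diff_row` and `result` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['f1', 'f2'], index=['r1', 'r2'])

# diff() subtracts the previous row from each row; the first row becomes NaN
# and is dropped, leaving only the r2 - r1 row.
diff_row = df.diff().dropna().abs()

# diff labels its result with the later row ('r2'), so rename it to 'r3'.
diff_row = diff_row.rename(index={'r2': 'r3'})

# Stack the original rows and the new difference row.
result = pd.concat([df, diff_row])
print(result)
```

If integer columns matter, calling `diff_row.astype(int)` before concatenating avoids the float upcast.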
Alternatively, subtract the two rows directly and assign the result to a new label with `loc`:
- df.loc['r3'] = (df.loc['r1'] - df.loc['r2']).abs()
- print(df)
-     f1  f2
- r1   1   2
- r2   3   4
- r3   2   2
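The `loc` assignment above modifies `df` in place. A minimal sketch of wrapping it in a reusable function that works on a copy instead; `add_abs_diff_row` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd


def add_abs_diff_row(df: pd.DataFrame, a, b, new_label) -> pd.DataFrame:
    """Return a copy of df with an extra row equal to |df.loc[a] - df.loc[b]|.

    Hypothetical helper for illustration; df itself is not modified.
    """
    out = df.copy()
    # Row subtraction aligns on column labels; abs() is applied element-wise.
    # Assigning to a label that does not exist yet appends a new row.
    out.loc[new_label] = (out.loc[a] - out.loc[b]).abs()
    return out


df = pd.DataFrame([[1, 2], [3, 4]], columns=['f1', 'f2'], index=['r1', 'r2'])
result = add_abs_diff_row(df, 'r1', 'r2', 'r3')
print(result)
print(df.index.tolist())  # the original frame still has only r1 and r2
```

If `new_label` already exists, `loc` overwrites that row instead of adding a duplicate, which differs from the `concat` route.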
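For a rough speed comparison, here are IPython timings on a larger frame (2 rows × 1000 columns), measured with a pandas version older than 2.0 (the first solution still uses `DataFrame.append`):
- np.random.seed(123)
- df = pd.DataFrame(np.random.randint(10, size=(2, 1000)), index=['r1', 'r2']).add_prefix('f') - 5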
- # Mayank Porwal's solution
- In [40]: %timeit df.append(df.diff().dropna().abs())
- 1.51 ms ± 19.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
- # jezrael's solution
- In [41]: %timeit df.loc['r3'] = (df.loc['r1'] - df.loc['r2']).abs()
- 663 µs ± 54.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
- # NaT3z's solution
- In [42]: %timeit df.loc["r3"] = df.apply(lambda c: abs(c["r1"] - c["r2"]), axis=0)
- 967 µs ± 80.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
- # NumPy variant: subtract the underlying arrays directly
- In [49]: %timeit df.loc['r3'] = np.abs(df.loc['r1'].values - df.loc['r2'].values)
- 414 µs ± 1.68 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In this run, subtracting the raw NumPy arrays was fastest, plain row subtraction with `loc` came next, the per-column `apply` was slower, and the `diff` + `append` route was slowest.
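To check that the approaches agree and to re-time them outside IPython, a minimal sketch using the standard `timeit` module. The wrapper functions (`with_loc`, `with_apply`, `with_numpy`, `with_diff`) are hypothetical names; each works on a copy so runs do not interfere, which means the numbers include copy overhead and will not match the figures above. The `diff` route uses `pd.concat`, assuming pandas 2.x.

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randint(10, size=(2, 1000)),
                  index=['r1', 'r2']).add_prefix('f') - 5


def with_loc(frame):
    out = frame.copy()
    out.loc['r3'] = (out.loc['r1'] - out.loc['r2']).abs()
    return out


def with_apply(frame):
    out = frame.copy()
    out.loc['r3'] = out.apply(lambda c: abs(c['r1'] - c['r2']), axis=0)
    return out


def with_numpy(frame):
    out = frame.copy()
    out.loc['r3'] = np.abs(out.loc['r1'].values - out.loc['r2'].values)
    return out


def with_diff(frame):
    # diff labels its row 'r2'; rename so every method produces an 'r3' row.
    diff_row = frame.diff().dropna().abs().rename(index={'r2': 'r3'})
    return pd.concat([frame, diff_row])


methods = (with_loc, with_apply, with_numpy, with_diff)
results = {f.__name__: f(df) for f in methods}

# Every method should yield the same r3 values (the diff route as floats).
reference = results['with_loc'].loc['r3'].to_numpy()
for name, res in results.items():
    assert np.array_equal(res.loc['r3'].to_numpy(), reference), name

runs = 50
for f in methods:
    seconds = timeit.timeit(lambda: f(df), number=runs)
    print(f"{f.__name__}: {seconds / runs * 1e6:.0f} µs per call")
```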
Supplement
* Pandas Doc - Indexing and Selecting Data