Source From Here
Question
I want to apply scaling (using StandardScaler) to a pandas DataFrame. The following code returns a numpy array, so I lose all the column names and indices. This is not what I want:
A "solution" I found online is:
How do I apply scaling to the pandas DataFrame while leaving the DataFrame intact, without copying the data if possible?
How-To
You could convert the numpy array to a DataFrame as below:
Now fit_transform the DataFrame to get the scaled_features array:
This gives:
Assign the scaled data to a DataFrame (Note: use the index and columns keyword arguments to keep your original indices and column names):
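A self-contained sketch of the whole round trip may help here. The small DataFrame and its values are made up for illustration (they are not from the original post); only `StandardScaler` and the `index`/`columns` keyword arguments come from the steps above.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical data: four numeric columns and a non-default index.
df = pd.DataFrame(
    {
        "col1": [1.0, 2.0, 3.0, 4.0],
        "col2": [10.0, 20.0, 30.0, 40.0],
        "col3": [0.5, 0.1, 0.9, 0.3],
        "col4": [100.0, 50.0, 75.0, 25.0],
    },
    index=["a", "b", "c", "d"],
)

scaler = StandardScaler()
# fit_transform returns a plain NumPy array by default, so the labels are gone.
scaled_features = scaler.fit_transform(df)

# Rebuild a DataFrame, reusing the original index and column names.
scaled_features_df = pd.DataFrame(
    scaled_features, index=df.index, columns=df.columns
)
print(scaled_features_df)
```

Note that this builds a new DataFrame and leaves `df` untouched, so it does copy the data; it keeps the labels but does not satisfy the "without copying" wish from the question.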
I came across the sklearn-pandas package, which focuses on making scikit-learn easier to use with pandas. It is especially useful when you need to apply more than one type of transformation to column subsets of the DataFrame, which is the more common scenario. It's documented, but this is how you'd achieve the transformation we just performed:
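- # Code from the question: fit_transform returns a numpy array, so column names and indices are lost
- features = df[["col1", "col2", "col3", "col4"]]
- autoscaler = StandardScaler()
- features = autoscaler.fit_transform(features)
- # The "solution" found online, meant to be applied to the DataFrame instead of the line above;
- # each column arrives as a 1-D Series, which recent scikit-learn versions reject
- features = features.apply(lambda x: autoscaler.fit_transform(x))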
- # Code for the How-To steps
- import pandas as pd
- from sklearn.preprocessing import StandardScaler
- # Scale the values; the result is a plain numpy array
- scaled_features = StandardScaler().fit_transform(df.values)
- print(scaled_features[:3, :])  # lost the indices
- # Output:
- array([[-1.89007341, 0.05636005, 1.74514417, 0.46669562],
- [ 1.26558518, -1.35264122, 0.82178747, 0.59282958],
- [ 0.93341059, 0.37841748, -0.60941542, 0.59282958]])
- # Wrap the array in a DataFrame, reusing the original index and column names
- scaled_features_df = pd.DataFrame(scaled_features, index=df.index, columns=df.columns)
- # The same transformation with sklearn-pandas
- from sklearn_pandas import DataFrameMapper
- mapper = DataFrameMapper([(df.columns, StandardScaler())])
- scaled_features = mapper.fit_transform(df.copy(), 4)
- scaled_features_df = pd.DataFrame(scaled_features, index=df.index, columns=df.columns)
This message was edited 3 times. Last update was at 12/09/2020 15:17:04