Friday, July 21, 2017

[ Python FAQ ] Pandas - Adding new column to existing DataFrame in Python pandas

Source From Here 
Question 
I have the following indexed DataFrame with named columns and non-consecutive row numbers:
            a         b         c         d
  2  0.671399  0.101208 -0.181532  0.241273
  3  0.446172 -0.243316  0.051767  1.577318
  5  0.614758  0.075793 -0.451460 -0.012493
I would like to add a new column, 'e', to the existing DataFrame without changing anything else in it (i.e., the new column always has the same length as the DataFrame). The values for 'e' are in this Series:
  0   -0.335485
  1   -1.166658
  2   -0.385571
  dtype: float64
How-To 
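Create the Series with the original DataFrame's index. Column assignment aligns a Series on its index labels, so the values then land on the intended rows: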
>>> import pandas as pd 
>>> datas = [{'a':1, 'b':2, 'c':3}, {'a':4, 'b':5, 'c':6}, {'a':7, 'b':8, 'c':9}] 
>>> df = pd.DataFrame(datas) 
>>> df  # Check content
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9

>>> import numpy as np
>>> df['e'] = pd.Series(np.random.randn(df.shape[0]), index=df.index)  # Add column 'e'
>>> df
   a  b  c         e
0  1  2  3 -0.271183
1  4  5  6 -0.853137
2  7  8  9  1.444576
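The transcript above uses a default 0..n-1 index, so index alignment has no visible effect there. Index alignment only becomes visible with the question's DataFrame. The following is a small sketch that rebuilds that DataFrame (row labels 2, 3, 5) from the values in the question and attaches the question's Series in two ways. The variable names `aligned` and `fixed` are illustrative only. `.values` strips the Series' own index so that the values are relabelled by position.

```python
import pandas as pd

# DataFrame shaped like the one in the question: non-consecutive row labels 2, 3, 5.
df = pd.DataFrame(
    {
        "a": [0.671399, 0.446172, 0.614758],
        "b": [0.101208, -0.243316, 0.075793],
        "c": [-0.181532, 0.051767, -0.451460],
        "d": [0.241273, 1.577318, -0.012493],
    },
    index=[2, 3, 5],
)

# The Series from the question, which has the default index 0, 1, 2.
e = pd.Series([-0.335485, -1.166658, -0.385571])

# Plain assignment aligns on index labels: only label 2 exists in both
# indexes, so rows 3 and 5 receive NaN.
aligned = df.copy()
aligned["e"] = e

# Rebuilding the Series with df.index makes the values line up by position.
fixed = df.copy()
fixed["e"] = pd.Series(e.values, index=df.index)

print(aligned["e"].tolist())  # [-0.385571, nan, nan]
print(fixed["e"].tolist())    # [-0.335485, -1.166658, -0.385571]
```

Assigning `e.values` directly (`fixed["e"] = e.values`) would also place the values by position, because a plain array carries no index to align on.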

Some users reported getting a SettingWithCopyWarning with this code. However, the code still runs fine with pandas 0.16.1, the current version when the original answer was written. The SettingWithCopyWarning is meant to flag a possibly invalid assignment to a copy of a DataFrame. It doesn't necessarily mean you did something wrong (it can produce false positives), but since 0.13.0 it lets you know that there are more appropriate methods for the same purpose. So if you get the warning, just follow its advice, "Try using .loc[row_indexer,col_indexer] = value instead":
>>> df.loc[:,'f'] = pd.Series(np.random.randn(df.shape[0]), index=df.index)
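The warning typically appears when a column is added to a DataFrame obtained by filtering another one. Whether it appears at all depends on the pandas version. The sketch below is hypothetical: the filter `df["a"] > 1` and the name `subset` are assumptions for illustration. It takes an explicit `.copy()` of the filtered rows so the new column clearly belongs to an independent object. It then uses the `.loc[:, col] = value` form on both the subset and the original DataFrame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 4, 7], "b": [2, 5, 8], "c": [3, 6, 9]})

# Filtering produces a new object derived from df. Writing subset["f"] = ...
# on such a result is the usual source of SettingWithCopyWarning, so take an
# explicit copy to make the subset independent of df.
subset = df[df["a"] > 1].copy()

# Label-based assignment: ':' selects every row, 'f' is the new column.
subset.loc[:, "f"] = subset["a"] * 10

# The same idiom on the original DataFrame, with a Series built on df.index.
df.loc[:, "f"] = pd.Series(np.arange(len(df)), index=df.index)

print(subset)
print(df)
```

The explicit copy is a design choice. It states the intent (a separate table) rather than relying on pandas to guess whether `subset` is a view or a copy.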

In fact, this is currently the more efficient method, as described in the pandas docs. At present, the best way to add the values of a Series as a new column of a DataFrame may be assign. Note that assign returns a new DataFrame with the column added and leaves the original unchanged:
>>> pd.__version__ 
u'0.20.3' 
>>> df.assign(g = lambda x: x.a * 10) 
   a  b  c         e         f   g
0  1  2  3 -0.271183  0.647920  10
1  4  5  6 -0.853137  0.564964  40
2  7  8  9  1.444576 -1.576647  70

>>> df
   a  b  c         e         f
0  1  2  3 -0.271183  0.647920
1  4  5  6 -0.853137  0.564964
2  7  8  9  1.444576 -1.576647
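The transcript shows `df` unchanged after the `assign` call, so the result has to be bound to a name to keep the new column. Here is a minimal sketch with made-up values that applies this to the question's situation, a Series whose index does not match the DataFrame's. It passes `.values` so the values are placed by position. It then adds two columns in one call. The second callable refers to the column created by the first. That dependent form needs pandas 0.23 or later on Python 3.6+, so it would not work on the 0.20.3 shown above.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 4, 7], "b": [2, 5, 8], "c": [3, 6, 9]}, index=[2, 3, 5])
values = pd.Series([0.5, -1.0, 2.0])  # default index 0, 1, 2 (does not match df)

# assign() returns a new DataFrame, so rebind df to keep the column.
# .values drops the Series index, so the values are placed by position.
df = df.assign(e=values.values)

# Several columns at once: each callable receives the DataFrame built so far,
# so 'h' can use the 'g' column created just before it.
df = df.assign(g=lambda x: x["a"] * 10, h=lambda x: x["g"] + x["b"])

print(df)
```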
