Friday, July 21, 2017

[ Python FAQ ] Pandas - Adding new column to existing DataFrame in Python pandas

Source From Here 
Question 
I have the following indexed DataFrame with named columns and row labels that are not contiguous numbers:
          a         b         c         d
2  0.671399  0.101208 -0.181532  0.241273
3  0.446172 -0.243316  0.051767  1.577318
5  0.614758  0.075793 -0.451460 -0.012493
I would like to add a new column, 'e', to the existing DataFrame without changing anything else in it (i.e., the new column always has the same length as the DataFrame). The new values are in the following Series:
0   -0.335485
1   -1.166658
2   -0.385571
dtype: float64
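To see why the index matters here, the following minimal sketch rebuilds the question's data (values copied from above; the variable names df and s are assumptions) and assigns the Series directly. pandas aligns on index labels rather than on position, so only label 2 finds a match:

```python
import numpy as np
import pandas as pd

# DataFrame from the question: row labels 2, 3, 5 (not contiguous).
df = pd.DataFrame(
    {
        "a": [0.671399, 0.446172, 0.614758],
        "b": [0.101208, -0.243316, 0.075793],
        "c": [-0.181532, 0.051767, -0.451460],
        "d": [0.241273, 1.577318, -0.012493],
    },
    index=[2, 3, 5],
)

# Series from the question: default labels 0, 1, 2.
s = pd.Series([-0.335485, -1.166658, -0.385571])

# Direct assignment aligns on labels, not on position.
df["e"] = s

# Only label 2 exists in both indexes; rows 3 and 5 have no match and become NaN.
print(df["e"])
```

The column has the right length, but two of its three values are missing, which is the problem the answer below addresses.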
How-To 
Use the original DataFrame's index to create the Series, so that the values align with the existing rows:
>>> import pandas as pd 
>>> datas = [{'a':1, 'b':2, 'c':3}, {'a':4, 'b':5, 'c':6}, {'a':7, 'b':8, 'c':9}] 
>>> df = pd.DataFrame(datas) 
>>> df  # Check content
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9
 
>>> import numpy as np 
>>> df['e'] = pd.Series(np.random.randn(df.shape[0]), index=df.index)  # Add column 'e'
>>> df
   a  b  c         e
0  1  2  3 -0.271183
1  4  5  6 -0.853137
2  7  8  9  1.444576
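The same idea applies to a DataFrame whose index is not 0..n-1, as in the question. A minimal sketch, assuming the new values already exist as a Series with a default index (the names df and s and the sample values are illustrative): either rebuild the Series on df.index, or pass the raw values so that no label alignment takes place.

```python
import pandas as pd

# Row labels 2, 3, 5 mimic the non-contiguous index from the question.
df = pd.DataFrame({"a": [1, 4, 7], "b": [2, 5, 8]}, index=[2, 3, 5])
s = pd.Series([10.0, 20.0, 30.0])  # default labels 0, 1, 2

# Option 1: re-label the values with the DataFrame's own index.
df["e"] = pd.Series(s.values, index=df.index)

# Option 2: assign the underlying array; values are placed by position.
df["e2"] = s.values

print(df)
```

Both options rely on the values being in the same order as the rows of the DataFrame; neither checks that for you.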

Some users reported getting a SettingWithCopyWarning with this code, although it still ran correctly with pandas 0.16.1, the version current when that was written. The SettingWithCopyWarning informs you of a possibly invalid assignment on a copy of the DataFrame. It does not necessarily mean you did something wrong (it can produce false positives), but since 0.13.0 it lets you know that there are more appropriate methods for the same purpose. If you get the warning, just follow its advice: "Try using .loc[row_index,col_indexer] = value instead":
>>> df.loc[:,'f'] = pd.Series(np.random.randn(df.shape[0]), index=df.index)
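The warning typically shows up when the DataFrame being modified was itself obtained by slicing another one. A minimal sketch, assuming a filtered subset named sub (hypothetical, not part of the original example): taking an explicit copy first makes it unambiguous that only the subset receives the new column. Whether the warning is emitted at all depends on the pandas version.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 4, 7], "b": [2, 5, 8], "c": [3, 6, 9]})

# Filtering returns an object derived from df; .copy() makes it independent.
sub = df[df["a"] > 1].copy()

# Add the column through .loc, as the warning message suggests.
sub.loc[:, "f"] = pd.Series([0.5, 1.5], index=sub.index)

print(sub)
print(df.columns.tolist())  # the original frame has no 'f' column
```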

In fact, the pandas docs describe this as the more efficient method. Currently, the best way to add the values of a Series as a new column of a DataFrame may be assign. Note that assign returns a new DataFrame and leaves the original unchanged, as the second output below shows:
>>> pd.__version__ 
u'0.20.3' 
>>> df.assign(g = lambda x: x.a * 10) 
   a  b  c         e         f   g
0  1  2  3 -0.271183  0.647920  10
1  4  5  6 -0.853137  0.564964  40
2  7  8  9  1.444576 -1.576647  70

>>> df
   a  b  c         e         f
0  1  2  3 -0.271183  0.647920
1  4  5  6 -0.853137  0.564964
2  7  8  9  1.444576 -1.576647
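Because assign returns a new DataFrame, the result has to be bound to a name if the new column should persist. A minimal sketch (the names out, g and h and the sample values are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 4, 7], "b": [2, 5, 8], "c": [3, 6, 9]})

# assign accepts a callable (evaluated on the DataFrame) or ready-made values.
out = df.assign(
    g=lambda x: x.a * 10,
    h=pd.Series([0.1, 0.2, 0.3], index=df.index),
)

print(out.columns.tolist())
print(df.columns.tolist())  # unchanged: assign did not modify df in place
```

Writing df = df.assign(...) instead would replace the original name with the extended frame.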
