Friday, August 18, 2017

[Python Article Collection] Effective Python Notes - Make pickle Reliable with copyreg

Source From Here 
Preface 
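This post introduces copyreg, a built-in module (Python 3+; it is named copy_reg in Python 2), and the situations where it is used together with pickle. Using pickle on its own is simple. Suppose we have a class: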
    class GameState(object):
        def __init__(self):
            self.level = 0
            self.lives = 4

    state = GameState()
    state.level += 1  # Player beat a level
    state.lives -= 1  # Player had to try again
We can save and load the object with pickle as follows:
    import pickle

    state_path = '/tmp/game_state.bin'
    with open(state_path, 'wb') as f:
        pickle.dump(state, f)

    with open(state_path, 'rb') as f:
        state_after = pickle.load(f)

    print(state_after.__dict__)
    # {'lives': 3, 'level': 1}
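However, if a new field (points) is later added to the class, the object loaded back from game_state.bin does not have it, because pickle restores the saved attributes without calling __init__. The object is still an instance of GameState, which causes confusion.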
    class GameState(object):
        def __init__(self):
            self.level = 0
            self.lives = 4
            self.points = 0

    with open(state_path, 'rb') as f:
        state_after = pickle.load(f)

    print(state_after.__dict__)
    # {'lives': 3, 'level': 1}
    assert isinstance(state_after, GameState)
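To make the pitfall concrete, here is a minimal in-memory sketch. It assumes the code runs as a script, simulates the code change by redefining GameState in the same file, and uses pickle.dumps()/pickle.loads() in place of the file on disk.

```python
import pickle


class GameState:
    """Old version of the class: no points field."""
    def __init__(self):
        self.level = 0
        self.lives = 4


state = GameState()
state.level += 1
state.lives -= 1
serialized = pickle.dumps(state)  # bytes produced by the old class


class GameState:  # simulated code change: same name, one new field
    def __init__(self):
        self.level = 0
        self.lives = 4
        self.points = 0


# pickle restores the saved __dict__ without calling __init__,
# so the new default for points is never applied.
state_after = pickle.loads(serialized)
print(isinstance(state_after, GameState))  # True
print(hasattr(state_after, 'points'))      # False
```

Any code that later reads state_after.points would raise AttributeError even though the isinstance check passes.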
copyreg solves this problem: it lets you register the functions responsible for serializing Python objects.

Default Attribute Values 
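pickle_game_state() returns a tuple containing the function used to unpickle and the arguments passed to that function. Because unpickling now goes through the GameState constructor, any field missing from old data receives its default value.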
    import copyreg

    class GameState(object):
        def __init__(self, level=0, lives=4, points=0):
            self.level = level
            self.lives = lives
            self.points = points

    def pickle_game_state(game_state):
        kwargs = game_state.__dict__
        return unpickle_game_state, (kwargs,)

    def unpickle_game_state(kwargs):
        return GameState(**kwargs)

    copyreg.pickle(GameState, pickle_game_state)
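The following sketch shows the effect end to end. It is an illustration under assumptions rather than the book's exact code: it runs as a single script, the "old" and "new" class versions are simulated by redefining GameState, and the attribute dict is copied with dict() before being handed to pickle.

```python
import copyreg
import pickle


class GameState:
    """Old version: no points field yet."""
    def __init__(self, level=0, lives=4):
        self.level = level
        self.lives = lives


def pickle_game_state(game_state):
    # Hand pickle a reconstruction function plus its arguments.
    return unpickle_game_state, (dict(game_state.__dict__),)


def unpickle_game_state(kwargs):
    # GameState is looked up when this runs, so it uses the current class.
    return GameState(**kwargs)


copyreg.pickle(GameState, pickle_game_state)
serialized = pickle.dumps(GameState(level=1, lives=3))


class GameState:  # simulated code change: points added with a default
    def __init__(self, level=0, lives=4, points=0):
        self.level = level
        self.lives = lives
        self.points = points


copyreg.pickle(GameState, pickle_game_state)

# Old data goes through the constructor, so points falls back to 0.
state_after = pickle.loads(serialized)
print(state_after.__dict__)
```

The restored object now carries all three attributes, with points set to its constructor default.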
Versioning Classes 
copyreg can also record a version number in order to stay backward compatible. Suppose the original class looks like this:
    class GameState(object):
        def __init__(self, level=0, lives=4, points=0, magic=5):
            self.level = level
            self.lives = lives
            self.points = points
            self.magic = magic

    state = GameState()
    state.points += 1000
    serialized = pickle.dumps(state)
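Later the class is changed and lives is removed. The default-argument approach no longer works, because the old data still carries a lives key that the constructor rejects: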
    class GameState(object):
        def __init__(self, level=0, points=0, magic=5):
            self.level = level
            self.points = points
            self.magic = magic

    pickle.loads(serialized)
    # TypeError: __init__() got an unexpected keyword argument 'lives'
Add a version number when serializing, and check it when deserializing:
    def pickle_game_state(game_state):
        # Copy the attributes so that adding 'version' does not modify the live object
        kwargs = dict(game_state.__dict__)
        kwargs['version'] = 2
        return unpickle_game_state, (kwargs,)

    def unpickle_game_state(kwargs):
        # Data saved before versioning has no 'version' key, so treat it as version 1
        version = kwargs.pop('version', 1)
        if version == 1:
            kwargs.pop('lives')
        return GameState(**kwargs)

    copyreg.pickle(GameState, pickle_game_state)
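As a rough end-to-end check of the versioning idea, the sketch below pickles an object with the version-1 functions, then swaps in the version-2 class and functions and loads the old bytes. It assumes everything lives in one script, so the two versions are simulated by redefining the same names. It also uses kwargs.pop('lives', None), a slightly more tolerant variant chosen here for illustration.

```python
import copyreg
import pickle


# ---- Version 1 of the code ----
class GameState:
    def __init__(self, level=0, lives=4, points=0, magic=5):
        self.level = level
        self.lives = lives
        self.points = points
        self.magic = magic


def pickle_game_state(game_state):
    return unpickle_game_state, (dict(game_state.__dict__),)


def unpickle_game_state(kwargs):
    return GameState(**kwargs)


copyreg.pickle(GameState, pickle_game_state)
state = GameState()
state.points += 1000
serialized_v1 = pickle.dumps(state)


# ---- Version 2 of the code: lives removed, version number added ----
class GameState:
    def __init__(self, level=0, points=0, magic=5):
        self.level = level
        self.points = points
        self.magic = magic


def pickle_game_state(game_state):
    kwargs = dict(game_state.__dict__)  # copy; leave the object untouched
    kwargs['version'] = 2
    return unpickle_game_state, (kwargs,)


def unpickle_game_state(kwargs):
    version = kwargs.pop('version', 1)  # unversioned data counts as version 1
    if version == 1:
        kwargs.pop('lives', None)       # drop the field that no longer exists
    return GameState(**kwargs)


copyreg.pickle(GameState, pickle_game_state)

state_after = pickle.loads(serialized_v1)           # old data still loads
round_trip = pickle.loads(pickle.dumps(state_after))  # new data also loads
print(state_after.__dict__)
print(round_trip.__dict__)
```

The old bytes only name unpickle_game_state, so whichever definition is current at load time decides how each version is migrated.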
Stable Import Paths 
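When refactoring, if a class is renamed, loading objects that were serialized under the old name would normally fail. copyreg can handle this case as well, provided the data was pickled through the registered function.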
    class BetterGameState(object):
        def __init__(self, level=0, points=0, magic=5):
            self.level = level
            self.points = points
            self.magic = magic

    copyreg.pickle(BetterGameState, pickle_game_state)
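Serializing an instance shows that the import path of unpickle_game_state(), rather than the class name, is written into the dumped data. The drawback is that the module containing unpickle_game_state() can no longer be moved to a different path.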
    state = BetterGameState()
    serialized = pickle.dumps(state)
    print(serialized[:35])
Output:
b'\x80\x03c__main__\nunpickle_game_state\nq\x00}'
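The sketch below walks through a rename in one script. It is only an illustration: the rename is simulated with del, and the redefined unpickle_game_state() that returns BetterGameState is an assumption added here, since the snippets above do not show that step.

```python
import copyreg
import pickle


class GameState:
    def __init__(self, level=0, points=0, magic=5):
        self.level = level
        self.points = points
        self.magic = magic


def pickle_game_state(game_state):
    return unpickle_game_state, (dict(game_state.__dict__),)


def unpickle_game_state(kwargs):
    return GameState(**kwargs)


copyreg.pickle(GameState, pickle_game_state)
serialized = pickle.dumps(GameState(points=1000))

# ---- Simulated refactoring: the class is renamed ----
del GameState


class BetterGameState:
    def __init__(self, level=0, points=0, magic=5):
        self.level = level
        self.points = points
        self.magic = magic


def unpickle_game_state(kwargs):
    # Same function name and module as before, new target class (assumed step).
    return BetterGameState(**kwargs)


copyreg.pickle(BetterGameState, pickle_game_state)

# The data refers to unpickle_game_state, so it loads despite the rename.
state_after = pickle.loads(serialized)
print(type(state_after).__name__)            # BetterGameState
print(b'unpickle_game_state' in serialized)  # True
```

The stable part of the contract is the location of unpickle_game_state(); the class behind it is free to change.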

Full Example 
The complete example follows:
- demo.py 
    #!/usr/bin/env python3
    import copyreg
    import pickle

    state_path = '/tmp/game_state.bin'

    class GameState(object):
        def __init__(self, level=0, lives=4, points=0):
            self.level = level
            self.lives = lives
            self.points = points

        def __str__(self):
            return "Level{}; Lives={}; Points={}".format(self.level, self.lives, self.points)

    def pickle_game_state(game_state):
        kwargs = game_state.__dict__
        print('Call pickle_game_state:\n{}\n'.format(kwargs))
        return unpickle_game_state, (kwargs,)

    def unpickle_game_state(kwargs):
        print('Call unpickle_game_state:\n{}\n'.format(kwargs))
        return GameState(**kwargs)

    copyreg.pickle(GameState, pickle_game_state)

    gs = GameState()
    gs.points += 1000

    print('Serialized GameState...')
    with open(state_path, 'wb') as fh:
        pickle.dump(gs, fh)

    print('Unserialized GameState...')
    with open(state_path, 'rb') as fh:
        ngs = pickle.load(fh)

    print("New GameState:\n{}\n".format(str(ngs)))
Execution result:
# ./demo.py 
Serialized GameState... 
Call pickle_game_state: 
{'level': 0, 'points': 1000, 'lives': 4} 

Unserialized GameState... 
Call unpickle_game_state: 
{'level': 0, 'lives': 4, 'points': 1000} 

New GameState: 
Level0; Lives=4; Points=1000

