Source From Here
Preface
Most Python programmers don’t spend a lot of time thinking about how equality and hashing work. They usually just work. However, there are quite a few gotchas and edge cases that can lead to subtle and frustrating bugs once you start to customize their behavior – especially if the rules on how they interact aren’t understood.
Object Equality
Equality in Python is more complicated than most people realize, but at its core you have to implement an __eq__(self, other) method. It should return a boolean value if your class knows how to compare itself to other, or NotImplemented if it doesn’t. For inequality checks using !=, the corresponding method is __ne__(self, other).
By default, those methods are inherited from the object class that compares two instances by their identity – therefore instances are only equal to themselves. A common mistake in Python 2 was to override only __eq__() and forget about __ne__(). Python 3 is friendly enough to implement an obvious __ne__() for you, if you don’t yourself.
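A minimal sketch of these rules, using a hypothetical Point class that is not part of the original text. It shows __eq__ returning NotImplemented for foreign types and Python 3 deriving != from __eq__:

```python
class Point:
    """Hypothetical value class, used only for illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Return NotImplemented (not False) for unknown types so that
        # Python can try the reflected comparison on the other operand.
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


a = Point(1, 2)
b = Point(1, 2)

print(a == b)   # True: same data
print(a != b)   # False: Python 3 derives __ne__ from __eq__
print(a == 42)  # False: neither side can compare, so identity decides
```

Note that this class defines no __hash__, so on Python 3 its instances are not hashable.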
Object Hashes
An object hash is an integer representing the value of the object and can be obtained using the hash() function if the object is hashable. To be hashable, a class has to implement both the __hash__(self) method and the aforementioned __eq__(self, other) method. As with equality, the inherited object.__hash__ method works by identity only: barring the unlikely event of a hash collision, two instances of the same class will always have different hashes, no matter what data they carry.
Since this is usually good enough, most Pythonistas don’t realize there’s even a thing called hashing until they try to add an unhashable object into a set or a dictionary:
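A small sketch of both situations. The Plain class and the variable names are assumptions made for illustration: a plain class is hashable by identity out of the box, while a list is rejected as a set element.

```python
class Plain:
    """Hypothetical class that inherits __eq__ and __hash__ from object."""


p, q = Plain(), Plain()
print(p == q)    # False: default equality is identity
print(p in {p})  # True: default hashing is by identity, so this works

unhashable = [1, 2]  # lists are mutable and define no usable hash
error = None
try:
    {unhashable}  # the same happens for {unhashable: "value"}
except TypeError as exc:
    error = exc
    print(type(exc).__name__)  # TypeError
```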
So hashes are important because sets and dictionaries use them in their lookup tables to find their keys quickly. To do that effectively, they make an important assumption that leads to our first gotcha: the hash of an object must never change during its lifetime.
So if you decide to do the perfectly sensible thing and define the equality and hash of your object by the hash and equality of a tuple of the instance’s attributes, you have to make sure those attributes never change lest weird things happen:
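A minimal sketch of the problem, reusing the names C, c, d and s from the surrounding text. The class body is an assumption: it has a single attribute x, so it hashes x directly, where a class with several attributes would hash a tuple of them. The results shown are those of CPython.

```python
class C:
    def __init__(self, x):
        self.x = x

    def __repr__(self):
        return f"C({self.x})"

    def __hash__(self):
        # With several attributes this would be hash((self.x, self.y, ...)).
        return hash(self.x)

    def __eq__(self, other):
        if other.__class__ is not self.__class__:
            return NotImplemented
        return self.x == other.x


c = C(1)
d = {c: 42}
s = {c}

found_before = c in s and c in d
print(found_before)  # True: c is found in both containers

c.x = 2  # changes hash(c) after c was stored under the old hash

found_after = c in s or c in d
print(found_after)   # False: the lookup probes with the new hash
print(d, s)          # {C(2): 42} {C(2)}
```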
Although our mutated c clearly is in both d and s, Python claims it never heard of it! This explains why immutable data structures like strings or tuples (of hashable elements) are hashable while mutable ones like lists or dictionaries aren’t.
To make matters even more confusing, creating an object with the same hash value will also not work because Python is going to throw a call to __eq__ into the mix and C(1) is clearly not equal to C(2):
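Continuing under the same assumptions (the class C is repeated so the snippet runs on its own, and the name probe is hypothetical), a fresh C(1) has the hash that the containers stored, yet the lookup still fails on CPython because the stored object now compares as C(2):

```python
class C:
    def __init__(self, x):
        self.x = x

    def __repr__(self):
        return f"C({self.x})"

    def __hash__(self):
        return hash(self.x)

    def __eq__(self, other):
        if other.__class__ is not self.__class__:
            return NotImplemented
        return self.x == other.x


c = C(1)
d = {c: 42}
s = {c}
c.x = 2  # the containers still remember the hash of the old value

probe = C(1)                     # same hash as the one stored for c
found = probe in s or probe in d
print(found)                     # False: the hash matches, but __eq__ says no
print(probe == c)                # False: C(1) != C(2)
```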
Why the equality check? As we’ve established before, a hash is an integer. And even though we have 64 bits to splurge on modern architectures, there’s still the possibility that two objects have the same hash. Given this behavior, we’ve found another assumption made by sets and dictionaries: hashable objects that compare equal must have the same hash value.
In other words: if x == y it must follow that hash(x) == hash(y). Since that’s not true in our case, we can’t access that object by its hash anymore.
What Does All of This Mean?
You can’t base your hash on mutable values. If an attribute can change in the lifetime of an object, you can’t use it for hashing or very funky things happen. Generally speaking, immutable objects are the cleanest approach and they come with many other upsides your FP-loving friends will happily explain to you at length.
Practically speaking though, that’s not always possible – for performance reasons alone. Python simply wasn’t designed with immutability in mind the way, say, Clojure was.
Hashes can be less picky than equality checks. Since key lookups are always followed by an equality check, your hashes don’t have to be unique. That means that you can compute your hash over an immutable subset of attributes that may or may not be a unique “primary key” for the instance.
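One way this could look, sketched with a hypothetical User class in which user_id never changes after construction while email may. Equality looks at both fields, the hash only at the stable one:

```python
class User:
    """Hypothetical record: user_id is read-only, email is mutable."""

    def __init__(self, user_id, email):
        self._user_id = user_id
        self.email = email

    @property
    def user_id(self):
        # Read-only property: the hashed attribute cannot be reassigned.
        return self._user_id

    def __eq__(self, other):
        if not isinstance(other, User):
            return NotImplemented
        return (self.user_id, self.email) == (other.user_id, other.email)

    def __hash__(self):
        # Equal users share a user_id, so equal users share a hash.
        return hash(self.user_id)


u = User(7, "old@example.com")
seen = {u}

u.email = "new@example.com"  # does not touch the hashed attribute
still_found = u in seen
print(still_found)           # True: the hash is unchanged

other_found = User(7, "other@example.com") in seen
print(other_found)           # False: same hash, but not equal
```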
You can take this property to the extreme by returning a constant hash, which makes the set or dictionary work purely on equality checks. However, that would effectively degrade them into lists and lead to terrible performance.
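A rough sketch of the constant-hash extreme, using a hypothetical Const class with a counter that records how often __eq__ runs. It only illustrates that lookups still succeed while leaning entirely on equality checks. It is not a recommendation.

```python
class Const:
    eq_calls = 0  # class-level counter, for illustration only

    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        if not isinstance(other, Const):
            return NotImplemented
        Const.eq_calls += 1
        return self.x == other.x

    def __hash__(self):
        return 1  # every instance collides with every other instance


items = {Const(i) for i in range(5)}
print(len(items))  # 5: equality alone keeps the elements apart

Const.eq_calls = 0
found = Const(4) in items
print(found)           # True
print(Const.eq_calls)  # number of __eq__ calls needed for one lookup
```

Because every element lands in the same collision chain, the number of comparisons per lookup grows with the size of the container.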
You shouldn’t compare by value but hash by identity. This approach fails to take the second assumption into account. That said, it usually works perfectly fine because it takes a rather unlikely hash collision to become a problem.
However, it’s still a violation of the contract with the Python runtime and may lead to problems, even if only at some point in the future. Python feels so strongly about this that as of Python 3, it automatically makes classes unhashable if you implement __eq__ but not __hash__. Python 2 happily lets you shoot yourself in the foot.
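The Python 3 behavior can be observed with a short sketch. The class name OnlyEq is an assumption made for illustration:

```python
class OnlyEq:
    """Hypothetical class that defines __eq__ but no __hash__."""

    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        if not isinstance(other, OnlyEq):
            return NotImplemented
        return self.x == other.x


# Defining __eq__ without __hash__ sets __hash__ to None on Python 3.
print(OnlyEq.__hash__)  # None

error = None
try:
    hash(OnlyEq(1))
except TypeError as exc:
    error = exc
    print(type(exc).__name__)  # TypeError
```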