Preface
I was asking around to see if anyone knew a good, short explanation of Ruby’s object and method dispatch system the other day, and the response from several people was, “no, you should write one.” So, here we are. I’m going to explain how Ruby’s object system works, including method lookup, inheritance, super calls, classes, mixins, and singleton methods. My understanding comes not from reading the MRI source but from reimplementing this system, once in JavaScript and once in Ruby. If you want to read a minimal but almost correct implementation, the Ruby gist is not a bad place to start.
Because I’ve not actually read the source, this will explain what happens logically but it might not be what actually happens inside of Ruby. It’s just a model you can use to understand things.
How Ruby method dispatch works
Right, let’s start at the start. You can build almost all of Ruby’s object system out of Module. Think of a module as a bag of methods. For example, module A contains methods foo and bar.
When you write def foo ... end inside a Ruby module, you are adding that method to the module; that’s all. Now, a module can have any number of ‘parents’, and include is how you add one. Here B gets A as a parent:
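```ruby
# module A: a bag holding the methods foo and bar
module A
  def foo; puts "A's foo"; end
  def bar; puts "A's bar"; end
end

module B
  include A # A becomes a parent of B; no methods are copied

  def hello
    puts "B's hello"
  end

  def bye
    puts "B's bye"
  end
end
```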
Because a module can have many parents, these parent links form a tree. Take these modules:
```ruby
module C
  include B

  def start
    puts "C's start"
  end

  def stop
    puts "C's stop"
  end
end

module D
  include A
  include C
end
```
An important concept that affects how methods are dispatched is a module’s ‘ancestry’. You can ask a module for its ancestors and it will give you an array of modules; here, D.ancestors returns [D, C, B, A].
The important thing about this list is that it’s flat, rather than being a tree. It determines the order that we search modules in to find a method. To build this list, we start at D and run a depth-first, right-to-left search of its tree, where right-to-left means the most recently included parent is visited first. This is why the order of include calls is important: a module’s parents are ordered and this determines the order they are searched in.
When we want to dispatch a method, we look at each one of a module’s ancestors in turn, and stop at the first module that contains a method with the name we want. If none of the modules contain this method, we perform the search again but this time looking for the method called method_missing. If none of the modules contain that method, Ruby raises a NoMethodError. (In practice BasicObject supplies a default method_missing, and that default is what raises the NoMethodError.)
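To make the model concrete, here is a toy sketch in plain Ruby of modules as bags of methods with ordered parents, plus the two-pass lookup just described. ToyModule, define, include_module and dispatch are hypothetical names invented for this sketch, not Ruby APIs, and dispatch searches the ancestry of the module it is called on, standing in for a receiver’s class. Duplicate ancestors keep their first position, which is a simplification of Ruby’s real rule.

```ruby
# Toy model only: ToyModule and its methods are hypothetical, not part of Ruby.
class ToyModule
  attr_reader :name, :method_table

  def initialize(name)
    @name = name
    @method_table = {} # method name => Proc implementing it
    @parents = []      # parents in the order they were included
  end

  # Add a method to this bag of methods.
  def define(method_name, &body)
    @method_table[method_name] = body
  end

  # Record a parent; later includes are searched first.
  def include_module(mod)
    @parents << mod
  end

  # Depth-first, right-to-left walk of the parent tree. Repeats keep their
  # first position (a simplification of what Ruby really does).
  def ancestors
    ([self] + @parents.reverse.flat_map(&:ancestors)).uniq
  end

  # Call the first matching method; fall back to method_missing, then raise.
  def dispatch(method_name, *args)
    owner = ancestors.find { |mod| mod.method_table.key?(method_name) }
    return owner.method_table[method_name].call(*args) if owner

    fallback = ancestors.find { |mod| mod.method_table.key?(:method_missing) }
    return fallback.method_table[:method_missing].call(method_name, *args) if fallback

    raise NoMethodError, "undefined method #{method_name} for #{name}"
  end
end

# Rebuild the A, B, C, D tree from above.
a = ToyModule.new("A")
a.define(:foo) { "A's foo" }
b = ToyModule.new("B")
b.include_module(a)
b.define(:hello) { "B's hello" }
c = ToyModule.new("C")
c.include_module(b)
c.define(:start) { "C's start" }
d = ToyModule.new("D")
d.include_module(a)
d.include_module(c)

p d.ancestors.map(&:name) # ["D", "C", "B", "A"]
p d.dispatch(:hello)      # "B's hello"
```

Flattening the tree first and then doing a simple linear search is the whole trick: once the list exists, lookup never has to think about the tree again.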
We can use Ruby’s reflection capabilities through instance_method to find out which method will be used when we invoke certain names:
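```
>> D.instance_method(:foo)
=> #<UnboundMethod: D(A)#foo>
>> D.instance_method(:hello)
=> #<UnboundMethod: D(B)#hello>
>> D.instance_method(:start)
=> #<UnboundMethod: D(C)#start>
```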
An UnboundMethod is just an object representing a method from a module, before it’s been bound to an object. When you see D(A)#foo, it means D has inherited the #foo method from A. If you dispatch #foo to an object whose class includes D, you’ll get the method defined in A. (Newer Ruby versions print UnboundMethod objects in a different format, but the owner they report is the same.)
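A version-independent way to ask the same question is UnboundMethod#owner, which returns the module a method was actually defined in. This sketch rebuilds A, B, C and D with bodies that return strings rather than printing; that change is only there to make the results easy to check.

```ruby
module A
  def foo; "A's foo"; end
  def bar; "A's bar"; end
end

module B
  include A
  def hello; "B's hello"; end
  def bye; "B's bye"; end
end

module C
  include B
  def start; "C's start"; end
  def stop; "C's stop"; end
end

module D
  include A
  include C
end

p D.ancestors                       # [D, C, B, A]
# owner reports which ancestor actually supplies each method.
p D.instance_method(:foo).owner     # A
p D.instance_method(:hello).owner   # B
p D.instance_method(:start).owner   # C
```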
Speaking of objects, why haven’t we made any yet? What good is a bag of methods with no objects to invoke them on? Well, that’s where Class comes in. In Ruby, Class is a subclass of Module, which sounds weird but just remember they’re data structures that hold methods. A Class is like a Module, in that it’s a thing that stores methods and can include other modules, but it also has some additional capabilities, the first of which is that it can create objects.
```ruby
class K
  include D
end

k = K.new
```

```
>> k.method(:start)
=> #<Method: K(C)#start>
```
This shows that when we invoke k.start, we’ll get the #start method from module C. You’ll notice that while calling instance_method on a module gets us an UnboundMethod, calling method on an object gets us a Method. The difference is that a Method is bound to an object; it’s a callable that, when you invoke #call on it, will do the same thing as calling k.start. An UnboundMethod cannot be called directly since it has no object to be invoked on; you first have to bind it to one.
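Here is a small sketch of the difference, using a trimmed-down C that only defines start. UnboundMethod#bind supplies the missing receiver and returns an ordinary Method, and Method#unbind goes the other way.

```ruby
module C
  def start; "C's start"; end
end

class K
  include C
end

k = K.new

bound = k.method(:start)            # a Method: already tied to k
unbound = K.instance_method(:start) # an UnboundMethod: no receiver yet

p bound.call                 # "C's start", same as k.start
p unbound.bind(k).call       # bind supplies the receiver, then we can call it
p bound.unbind.class         # UnboundMethod
```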
So it looks like we dispatch method calls by finding the class the object belongs to, then looking through that class’s ancestors until we find a matching method. That’s almost true, but Ruby has another trick up its sleeve: singleton methods. You can add new methods to any object, and only that object, without adding them to a class. For example, writing def k.mart ... end gives k, and no other instance of K, a method called mart.
We can add singleton methods to modules too, since modules are just another kind of object; the syntax is the same, with the module as the receiver.
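A minimal sketch of both cases follows. The mart method on k matches the name used in this section; roll on module B is a hypothetical name chosen for illustration.

```ruby
class K; end
module B; end

k = K.new
other = K.new

# A singleton method: defined on k alone, not on K.
def k.mart
  "only k has this"
end

# Modules are objects too, so they can have singleton methods.
# (roll is a hypothetical name used only for this example.)
def B.roll
  "B's roll"
end

p k.mart                    # "only k has this"
p other.respond_to?(:mart)  # false: other instances of K don't get it
p k.singleton_methods       # [:mart]
p B.roll                    # "B's roll"
```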
When a Method’s inspected name has a dot (.) instead of a hash (#) in it, it means the method exists only on that object instead of being contained in a module. But we said earlier that modules are the thing Ruby uses to store methods; plain old objects don’t have this power. So where are singleton methods stored?
Every object in Ruby (and remember, modules and classes are objects too) has what’s called a metaclass, also known as a singleton class, eigenclass or virtual class. The job of this class is simply to store the object’s singleton methods; by default it contains no methods and has the object’s class as its only parent. So for our object k, method lookup starts at k’s metaclass, whose only parent is K, and then continues through K and its ancestors: D, C, B, A and so on.
We can ask Ruby for an object’s metaclass with singleton_class, and reflect on it just like any other class. Doing that for k shows that the metaclass is an anonymous Class attached to the object k, and that it has an instance method #mart that doesn’t exist in the K class.
One gotcha to look out for is that in older Rubies (before 2.1) metaclasses don’t appear in their own #ancestors lists, though newer versions do list them; either way, you should think of them as being there for the purposes of finding methods. When we invoke methods on k, it asks its metaclass to find the method, and this uses the metaclass’s ancestry to locate the required method. Singleton methods live in the metaclass itself, so they are preferred over methods inherited from the object’s class or any of its ancestors.
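To see this preference in action, here is a small sketch with a hypothetical greet method defined both as an instance method of K and directly on k. The singleton version is stored in k’s metaclass and wins; other instances of K are unaffected.

```ruby
class K
  def greet; "K's greet"; end
end

k = K.new

# Singleton method with the same name as K's instance method.
def k.greet; "k's own greet"; end

meta = k.singleton_class

p meta.instance_methods(false)     # [:greet]: stored in the metaclass itself
p meta.superclass                  # K: the object's class is the metaclass's parent
p k.greet                          # "k's own greet": the metaclass is searched first
p K.new.greet                      # "K's greet"
p k.method(:greet).owner == meta   # true
```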
Now we come to the second special property of classes, beyond their ability to create objects. Classes have a special form of inheritance called ‘subclassing’. Every class has one and only one superclass, the default being Object. In terms of method lookup, you can think of a superclass as just being the class’s first parent module: writing class Foo < Bar; include Extras; end is, as far as lookup is concerned, the same as writing class Foo; include Bar; include Extras; end.
So Foo.ancestors gives us [Foo, Extras, Bar] in both cases, and this determines method lookup order as usual. (Actually it gives us [Foo, Extras, Bar, Object, Kernel, BasicObject] but we’ll get to those latter modules in a minute.) Note that Ruby violates the Liskov substitution principle by not allowing classes to be given to include; only modules can be used this way, not their subtypes. The include-based version simply expresses what subclassing means for method lookup; it will not actually run if Bar is a Class, because include raises a TypeError when given a class.
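The sketch below checks that claim. Foo, Bar and Extras come from the text; BarModule and Foo2 are hypothetical names for the include-only version, which only works because BarModule is a module rather than a class.

```ruby
class Bar; end
module Extras; end

# Subclassing Bar, then mixing in Extras.
class Foo < Bar
  include Extras
end

# The include-only shape, legal here because BarModule is a module.
module BarModule; end

class Foo2
  include BarModule
  include Extras
end

p Foo.ancestors.take(3)    # [Foo, Extras, Bar]
p Foo2.ancestors.take(3)   # [Foo2, Extras, BarModule]

# Giving a Class to include is rejected.
begin
  Class.new { include Bar }
rescue TypeError => e
  puts "TypeError: #{e.message}"
end
```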
If subclassing is the same as including, why do we need it at all? Well, it does one extra thing: classes inherit their superclass’s class methods, but not those of included modules.
We can model this in terms of parent relationships by saying that the subclass’s metaclass has the superclass’s metaclass as a parent, so looking up a class method on Foo searches Foo’s metaclass and then Bar’s metaclass.
And indeed if we reflect on Foo, we see that its bar class method originates from Bar’s metaclass.
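Here is a sketch of that relationship, assuming Bar defines a class method bar and Extras defines a hypothetical module-level method helper. Foo picks up bar through its superclass’s metaclass, but not helper, because including Extras links Foo to Extras and not Foo’s metaclass to Extras’s metaclass.

```ruby
class Bar
  def self.bar; "Bar.bar"; end
end

module Extras
  def self.helper; "Extras.helper"; end # hypothetical module-level method
end

class Foo < Bar
  include Extras
end

p Foo.bar                                                # "Bar.bar": inherited from the superclass
p Foo.respond_to?(:helper)                               # false: not inherited from the mixin
p Foo.singleton_class.superclass == Bar.singleton_class  # true
p Foo.method(:bar).owner == Bar.singleton_class          # true
```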
We’ve seen how inheritance and method lookup in Ruby can be modelled as a tree of modules, with include and subclassing creating various parent relationships. This describes single and multiple inheritance of instance and singleton methods pretty well. Now let’s look at a few things that piggy-back on this model.
The first is the Object#extend method. Calling object.extend(M) makes the methods in module M available on object. It doesn’t copy the methods, it just adds M as a parent of the object’s metaclass. If object has class Thing, the object’s metaclass ends up with two parents: Thing, which it always had, and M, which was added later and is therefore searched first.
So extending an object with a module is just the same thing as including that module in the object’s metaclass. (Actually there are some differences but they’re not relevant to the present discussion.) Given this tree, we see that when we invoke methods on object, the lookup process will prefer methods contained in M to those defined in Thing, and will prefer methods defined directly in the object’s metaclass over both of them.
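A minimal sketch of that lookup order, reusing the Thing and M names from the text with a hypothetical describe method defined in both. Method#super_method (available since Ruby 2.2) is used in the check to confirm that M sits directly behind the metaclass.

```ruby
class Thing
  def describe; "from Thing"; end
end

module M
  def describe; "from M"; end
end

object = Thing.new
object.extend(M)

p object.describe                      # "from M": M is searched before Thing
p Thing.new.describe                   # "from Thing": other instances are unaffected
p object.singleton_class.include?(M)   # true: M is a parent of the metaclass
p Thing.include?(M)                    # false: Thing itself is untouched

# A method defined directly on the object beats both.
def object.describe; "from the metaclass"; end
p object.describe                      # "from the metaclass"
```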
This context is important: we cannot say methods in M take precedence over Thing in general, only when we’re talking about method calls to object. The method receiver’s ancestry is what’s important, and this shows up when we investigate how super works. Take this set of modules:
```ruby
module X
  def call; [:x]; end
end

module Y
  def call; super + [:y]; end
end

class Test
  include X
  include Y
end
```
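Test.ancestors is [Test, Y, X, Object, Kernel, BasicObject], so the implementations of #call we can reach, in order, are the one in Y and then the one in X. To dispatch the method, we invoke the first method in this list. If that method calls super, we jump to the second, and so on until we run out of methods to invoke. If Test didn’t include module X, there would be no implementation of #call after the one from Y, so that call to super would raise a NoMethodError.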
Sure enough, in our case Test.new.call returns [:x, :y].
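Because super follows the receiver’s ancestry rather than where Y happens to be defined, the same Y#call can succeed or fail depending on the object it runs on. In this sketch OnlyX and OnlyY are hypothetical classes: an OnlyX instance extended with Y reaches X through its class, while an OnlyY instance has nothing after Y to call.

```ruby
module X
  def call; [:x]; end
end

module Y
  def call; super + [:y]; end
end

class Test
  include X
  include Y
end

p Test.ancestors.take(3)   # [Test, Y, X]
p Test.new.call            # [:x, :y]

class OnlyX                # hypothetical: includes X only
  include X
end

class OnlyY                # hypothetical: includes Y only
  include Y
end

# Y comes from the metaclass, X from OnlyX, so super still finds X.
obj = OnlyX.new.extend(Y)
p obj.call                 # [:x, :y]

begin
  OnlyY.new.call           # Y's super has nowhere to go
rescue NoMethodError => e
  puts "NoMethodError: #{e.message}"
end
```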
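We’re almost done, but I promised I’d explain what Object, Kernel and BasicObject are. BasicObject is the root class of the whole system; it’s a Class with no superclass. Object inherits from BasicObject, and is the default superclass of all user-defined classes. The difference between the two is that BasicObject has almost no methods defined in it (just a handful, such as #==, #__send__ and #instance_eval), while Object has loads: core Ruby methods like #dup, #inspect, #is_a?, #method, #respond_to?, and #to_s. Well, actually Object doesn’t define those methods itself, it gets them from Kernel, which it includes. Kernel is just the module with most of Ruby’s core object methods in it. So when we map out Ruby’s core object system, we get the following: Object is a subclass of BasicObject and includes Kernel, Module is a subclass of Object, and Class is a subclass of Module. On the metaclass side, each class’s metaclass has its superclass’s metaclass as a parent, so Object’s metaclass inherits from BasicObject’s metaclass, and BasicObject’s metaclass has Class as its superclass.
Those are the core modules and classes in Ruby: BasicObject, Kernel, Object, Module and Class, plus their metaclasses. Yes, BasicObject.singleton_class.superclass is Class, even though Class itself inherits from BasicObject. Ruby does some voodoo internally to make this circular relationship work. Anyway, if you want to understand Ruby method dispatch, just remember: modules are bags of methods with ordered parents; classes are modules that can create objects and have exactly one superclass; every object has a metaclass whose parent is its class; and a method call searches the receiver’s metaclass ancestry, depth-first and right-to-left, stopping at the first module that defines the method.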
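Each of those relationships can be checked directly in Ruby. This sketch simply evaluates one expression per claim and prints a label; the checks hash and its labels are illustrative names of my own, not anything Ruby defines.

```ruby
# Each entry checks one edge of Ruby's core object graph; all should be true.
checks = {
  "BasicObject has no superclass"   => BasicObject.superclass.nil?,
  "Object < BasicObject"            => Object.superclass == BasicObject,
  "Object includes Kernel"          => Object.include?(Kernel),
  "Module < Object"                 => Module.superclass == Object,
  "Class < Module"                  => Class.superclass == Module,
  "#to_s comes from Kernel"         => Object.instance_method(:to_s).owner == Kernel,
  "#== comes from BasicObject"      => Object.instance_method(:==).owner == BasicObject,
  "metaclasses mirror subclassing"  => Object.singleton_class.superclass == BasicObject.singleton_class,
  "BasicObject's metaclass < Class" => BasicObject.singleton_class.superclass == Class,
  "Class is an instance of itself"  => Class.class == Class
}

checks.each { |label, ok| puts "#{ok ? 'ok  ' : 'FAIL'} #{label}" }
```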