Saturday, September 11, 2010

Adventures in Ruby Land: a tutorial on ruby

  • Updated: 15-Sep-2010; section on modules and constants
escher like hands drawing hands
A Penrose Impossible Triangle around which circle the 3 main objects in ruby land
This is a tutorial on the ruby programming language and how it's structured which I'll refer to as ruby land. Ruby is a very dynamic programming language with a particular emphasis on object oriented programming but with some nice functional aspects.
This article might be useful to anyone who wants to get a bit of a roadmap on how ruby is structured. It could also be wrong. But if it's wrong, I'd like to think that it might still help to provide a framework or starting point from which to get the right picture.
You should be familiar with things like object oriented programming, basic ruby programming and irb (ruby's repl) which is very useful for trying things out as you think and experiment.

In the beginning... there was Object

The first thing you need to know in ruby is that everything is an object. Even classes are objects.
So what is an object? An object is an instance of a class. Objects don't contain methods - these are defined by the class of the object1. All an object does is keep track of the particular state of a particular instance of that class 2. Consequences of this will be discussed further on.
1 Ok, we could say that classes "have" or probably better, "define" instance methods. But these methods are intended for instances of the class. However an object that is a class has a class in turn which defines instance methods for it.
2 state is stored in instance variables in ruby; these are variables that start with a single `@` in their name.

Class and Superclass

We said everything in ruby is an object and this can be seen by inspecting the class method. Every object has one, even the classes because every object is an instance of some class when you get down to it.
Let's create a class:
class Foo
end
Now let's create an object:
foo = Foo.new
Now we can look at the class of foo
foo.class # => Foo
This tells us that foo's class is Foo. foo gets its "instance methods" from Foo. But we can also look at the class of Foo
Foo.class # => Class
They're both objects and objects must have a class that they belong to.
Objects that have a class of Class are clearly special objects because they are classes and they can be used to instantiate new instances (foo = Foo.new).
  • We'll call any object with a class of Class, a class-like object.
  • We'll call any other object as an instance or object instance.
  • We'll refine this a little when we get to modules.

Superclass

Another thing that makes class-like objects stick out from their peers is the fact that they have a superclass. This is no surprise. If you're a class, you're probably a subclass of some other class. That's how classes work. Instances of a class may have access not only to the methods of their class, but also any superclasses of that class.
Back to our example of Foo
  Foo.superclass # => Object
But foo, being just a lowly object instance, doesn't:
  foo.superclass # => Error!
We could make a new class SubFoo and make it a subclass of Foo like this:
  class SubFoo < Foo
  end
Now we have:
  SubFoo.superclass # => Foo
  SubFoo.superclass.superclass # => Object
In the land of class-like objects and by extension, the instances that are built from them, all roads eventually lead back to Object.

Instance Land

Here is Instance Land:
ruby instance land
This is the land of object instances. foo is an instance of Foo (foo = Foo.new); bar is an instance of Bar. obj is an instance of Object (obj = Object.new).
A lot of the basic work gets done in Instance Land because ruby encourages you to define and then instantiate classes like Foo in order to create object instances that store specific state information that can be readily modified as your program goes about solving its problems.

Class Land

Now let's add in Class Land. Like the men in Plato's cave staring at the shadows cast on the wall by the fire behind them, the object instances in Instance Land look up to Class Land, the land of Platonic ideas. This is where we define the idea of a Dog, or an Animal, or in our case a Foo or a Bar. Once those ideas have been fashioned we can then instantiate them to create object instances.
ruby class land
We draw the dotted lines from object instances to the classes in Class Land which define them. These dotted lines represent the class relationship that stamps a particular object instance as being an instance of a certain class.
In Class Land we can also trace out the relationships that classes maintain between themselves. The thick black lines with white arrow tips are the superclass lines that tell us that a certain class is a subclass of another.
SubFoo is a subclass of Foo. We write it like this:
  class SubFoo < Foo
  end
  SubFoo.superclass # => Foo
  Foo.superclass # => Object
But, this being ruby, there is yet another level. A level that the class-like objects themselves look up to. Welcome to...

Strange Land

Strange Land is strange. We have come to the axis of the ruby world; this is the backbone on which ruby is built. To stretch the Plato cave allegory just a little bit too far, it is as if we climbed out of the cave, past the fire and up into the daylight to see the real world. There are essentially three main protagonists: Class which is a subclass of Module, Module, which is a subclass of Object and of course Object itself.
ruby strange land
The class-like objects in Class Land look up to these objects in Strange Land the way object instances in Instance Land look up to classes in Class Land. (Well, excluding Object which we include in Class Land to keep the diagram slightly this side of sane.)
The denizens of Strange Land are class-like objects. Their class is Class. Even Class has a class of Class. And as just noted they form a class hierarchy that can be explored using superclass.
What makes things slightly strange is that Object is at the top of the tree but its class is one of its subclasses Class.
escher like hands drawing hands
Object has class of Class; Class has superclass of Object. A small homage to M.C Escher's "Drawing Hands".
Object itself is a funny beast. As noted earlier, all roads lead back to Object because, well, everything is an object.
self is always the same in Object
  class Object
    def self.self1
      self
    end
    def self2
      self
    end
  end
  Object.self1==Object.self2  
Object allows you to extend it with methods which will then become available to all other objects at all levels of ruby land.
  class Object
    def everywhere
      "hi I'm #{self}"
    end
  end
Whilst Object defines the notion of class, Class gives it the ability to instantiate objects and the idea of class hierarchies (superclass).

Module Land: Modules and instance methods

We need to briefly step down from Strange Land to look at modules.
Module Land is a strange little adjunct to Class Land. Members of Module Land have a class of Module, not Class. They are instances of a thing that precedes the notion of Class itself. So the denizens of Module Land can't be instantiated like Foo or Bar. This is because in Strange Land, Module sits in-between Object and Class and appears to represent a primordial Class; a thing that is not able to confer the ability of instantiation upon its instances, but which encapsulates the idea of instance methods.
Unlike Object where self is always the same, self is most definitely not always the same in an instance of Module. This is because Module introduces the idea of instance methods, methods that are defined in a class-like object but which are used only by instances of that class-like object.
  m = Module.new do
    # Singleton method
    def self.self1
      self
    end
    # Instance method
    def self2
      self
    end
    self
  end
  m == m.self1       # => true
  # At this point is already clear self1 and self2
  # are not the same because self2 does not apply to m.
  # Instead we could try to add self2 to some other object:
  o = Object.new.extend(m)
  o.self2 == m.self1 # => false
In the above, self1 is a singleton method which can be called on m - more on singleton methods later. self2 cannot be called on m because self2 is an instance method. The only we can use this is to extend another object with it or include it into a class-like object at which point self2 will return the self of that object.
So modules (the instances of Module) can't create instances of themselves but they can provide instance methods to be mixed in to other classes or used to extend other objects.
Classes (instances of Class) inherit the idea of instance methods from modules but go one step further, allowing themselves to be instantiated so that these instance methods can be used.

Modules and Constants

Ruby constants are an interesting construct. Use a variable name that starts with a capital letter and you have a constant, a very different thing to a non-capitalised variable. Constants can be seen by the rest of your program.
You can create constants as easy as this
Foo=1
You can't nest constants in constants:
Foo::Bar=2  # Error
makes no sense. This is where modules come in again. You can set constants inside a module and access them accordingly. Instances of Module have instance methods for handling constants contained by a module.
If our module is Bar, we can define a constant inside it:
module Bar
  Foo=1
end
Bar::Foo # => 1
The module that houses the constant doesn't have to be referenced by a constant:
m = Bar
m::Foo # => 1
Here m is a local variable.
You can see the instance methods for handling constants that Module defines for its instances like this:
irb(main):013:0> Module.instance_methods(false).sort.grep(/const/)
=> [:const_defined?, :const_get, :const_missing, :const_set, :constants]
For example:
irb(main):019:0> m.constants
=> [:Foo]
Modules and constants work hand in hand to allow you to house and namespace your code. Because classes are instances of a subclass of Module, they inherit the same capabilities.

Where do class-like objects get their methods from?

Modules and classes define instance methods which ultimately get used by instances of classes. So where do classes and modules themselves as objects get their instance methods from?
Take for example the include method. This is a private instance method of Module:
    irb(main):010:0> Module.private_instance_methods(false).sort
    => [:alias_method, :append_features, :attr, :attr_accessor,
    :attr_reader, :attr_writer, :define_method, :extend_object,
    :extended, :include, :included, :initialize, :initialize_copy,
    :method_added, :method_removed, :method_undefined,
    :module_function, :private, :protected, :public,
    :remove_const, :remove_method, :undef_method]
What does that mean?
It means that instances of Module (ie modules) will have an include method. Because Class is a subclass of Module, instances of Class (ie classes in Class Land) will also have include.
But because Module has a class of Class, it too will have include as a method, inherited, oddly enough, from itself since Class is a subclass of Module1. :)
Perhaps another name for Strange Land might be "Metaclass Land".
1Note, the underlying implementation of ruby (in C) may be somewhat more straightforward (I don't actually know). We are just looking at the "logic" that ruby presents to us as ruby programmers.

Singleton Methods

It gets stranger.
Recall that we said that everything is an object and that objects don't contain their own methods1 but derive their methods from the class that they belong to.
1 Class-like objects "contain" instance methods, but these are for their instances.
Well, if you've programmed for any time in ruby you might be aware that you can add singleton methods to objects:
  foo = Foo.new
  foo2 = Foo.new
  def foo.a
    'a'
  end
  foo.a # => 'a'
  foo2.a # => Error!
foo and foo2 are both instances of Foo and have the instance methods that are defined by Foo. But foo now has a method that foo2 does not.
We can do the same thing with class-like objects of course:
  def Foo.a
    'a'
  end
  Foo.a # => 'a'
Often, singleton methods for classes are defined like this:
  class Foo
    def self.a
      'a'
    end
  end
Singleton methods for class are like class methods in other programming languages like java.

Extend and include

As a small digression:
Yet another way to add singleton methods to an object is extend. extend is a method defined by ruby's Kernel module which is mixed in to Object and so is available to any other object in ruby land. extend takes the instance methods defined by a module and adds them as singleton methods to the object being extended.
Compare this to include which is a private instance method defined by Module which injects instance methods into a class-like object.
So where do these mysterious singleton methods belong if objects always derive their "instance methods" from a class?

Behold, Shadow Land!

Well, the shadowy answer to this conundrum is that they get it from a class, just not the usual class that we've been talking about up till now.

Virtual Classes

In fact, this is going to come as a shock, but every object in ruby land has two classes. It has a class - the bright shiny platonic things in Class Land we discussed earlier. But also something called a virtual class (some people refer to it as an eigenclass), a shadowy denizen that resides in a kind of bottomless underworld of usually unnamed and often unseen classes.
shadow land
The dotted lines are `class`; the white arrow lines are `superclass`.
One way to think about virtual classes is that every object in ruby, class-like or instance or whatever, has its own special, unique class that shadows it and can be called upon to stash instance methods that are unique to that object. Such methods are best referred to as singleton methods.

Getting the virtual class

Getting the virtual class is a tricky business. (It may have gotten less tricky in recent times - I don't know - I'm just referring to the state of play with the now bewhiskered ruby 1.8.6 as an example).
One way to get it is by using ruby's class syntax in a form that is both totally ungoogleable and rather unintuitve. We can open one up like this:
  class << Foo
    ... do something with Foo's eigenclass ...
  end
The self inside this class-statement is our virtual class.
We can make life easy for ourselves here and stash a method directly in Object to get the above self
  class Object
    def eigenclass
      class << self; self; end
    end
  end
Adding a method to Object will make it accessible to all other objects in ruby land; self in Object is always the same. This technique was mentioned some years ago on the ruby mailing list.
Back to our example of def Foo.a, the singleton method we defined above. We note that:
  Foo.instance_methods(false).sort
does not show a; but
  Foo.eigenclass.instance_methods(false).sort
does.
You'll note that Shadow Land has several layers in the above diagram. In fact those layers just keep going down:
  irb(main):015:0> Object.class
  => Class
  irb(main):016:0> Object.eigenclass
  => #<Class:Object>
  irb(main):017:0> Object.eigenclass.eigenclass
  => #<Class:#<Class:Object>>
  irb(main):018:0> Object.eigenclass.eigenclass.eigenclass
  => #<Class:#<Class:#<Class:Object>>>
  ...
And so it goes...

Practicalities

So how does mapping out ruby land help in the real world?
Well, if we look back at the above map of ruby land we can see that it is segmented into the different "lands" and that this separation revolves around the act of instantiation.
We can see, for instance, that
  Class.new
will give use an anonymous Class-like object in class land. We can further see that
  Class.new.new
will give us an anonymous instance of an anonymous class.
We can also create anonymous modules:
  m = Module.new
and it should be clear that modules can't be instantiated any further.
We can see as a result that:
  class Foo
  end
is just a nice way to assign a new class to the constant Foo
  Foo = Class.new
Similarly for modules:
  module M
  end
becomes
  M = Module.new
We're really only a hop, skip and a jump away from meta programming - building classes and object oriented structures on the fly.
We can also see that to understand ruby to any deep level including metaprogramming we must first inspect and appreciate the 3 central objects that form the backbone of ruby land: Object, Module and Class and the roles they play.

Final Thoughts...

Personally, I think one can go a little too far by viewing the world exclusively in an object oriented way. This is a static world where things like to be isolated and named and allotted associated behaviours and where potentially they can be specialisations (subclass) of more general things. Ruby sits in an interesting space with its highly dynamic approach to this static world.

Final notes

A word on variables and scope

There are at least 5 different types of variables. Ruby's scoping rules vary depending on the type.
  • Globals
    • These start with a $
    • Ruby has many in-built globals
    • Globals are accessible everywhere
  • Constants
    • Any variable name that starts with a capital becomes a constant
    • like globals, constants can be seen anywhere
    • however, constants can be nested
    • if you define a constant within the context of a module or class, that constant will be nested within that context
          class Foo
            Bar='bar'
          end
          Foo::Bar # => 'bar'
      
    • constants are usually camel-cased where as instance, class and local variables are usually underscored
    • constants are often used to store modules or classes
  • Instance variables
    • Any variable that starts with @ is an instance variable.
    • Every object will have a bunch of these.
    • One gotcha for ruby novices is to assume that @inst1 is an instance variable of an instance of Foo (eg Foo.new).
        class Foo
          attr_reader :inst1
          @inst1='inst1'
        end
      
      It is not.
        Foo.new.inst1 # => nil
      
      We can show that it is an instance variable of the object Foo by defining an instance method to access it in Foo's virtual class:
        class << Foo
          attr_reader :inst1
        end
        Foo.inst1 # => 'inst1'
      
  • Class variables
    • Any variable that starts with @@ is an class variable.
    • class variables allow instances of a class to access state that is associated with that class; these variables are per-class whereas instance variables are per-instance.
  • local variables
    • anything that isn't prefixed by @ or @@ or $ or a capital letter is a local variable (or possibly the name of a method available within the context you are working in)

Ruby 1.9

Ruby 1.9 introduced a superclass to Object called BasicObject. I've omitted it here.

No comments: