Python augmented assignment (+=) (__iadd__)

Another wonderful day in python land and another wonderful inconsistency in the language (at least in my opinion).

You might say, hey, take a look at that sweet += operator; it sure is a nice, concise way to write x = x + y. This is correct sometimes and incorrect other times, depending on the mutability of the structure you are working on in python. (In ruby, however, it always works the same way.)

Here is the SO post on it: http://stackoverflow.com/questions/2347265/why-does-behave-unexpectedly-on-lists

Here is my quick little demo in python:

# Lists with __add__
s = [1]
r = s
s = s + [2]

print s # => [1, 2]
print r # => [1]   ... reasonable

# Lists with __iadd__
w = [1]
t = w
w += [2]

print w  # => [1, 2]
print t  # => [1, 2] ... what?! += mutated the list in place, like .extend()

# Tuples (no __iadd__, so += falls back to __add__)
u = (1,)
p = u
u += (2,)  # note the comma -- (2) is just the int 2

print u # => (1, 2)
print p # => (1,)   ... reasonable

If you try similar things in ruby you will get consistent results. Its (+=) operator is just shorthand for x = x + y everywhere; no inconsistencies as far as I can tell.
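To see the mechanism behind this, here is a minimal sketch of a made-up class implementing __iadd__ the way list does: mutate in place and return self, which the statement x += y then rebinds to x.

```python
class Box(object):
    """Toy container (hypothetical) showing what += really does:
    x += y becomes x = type(x).__iadd__(x, y) when __iadd__ exists."""
    def __init__(self, items):
        self.items = items

    def __iadd__(self, other):
        self.items.extend(other)  # mutate in place, like list.__iadd__
        return self               # rebind the name to the same object


b = Box([1])
alias = b
b += [2]

assert alias.items == [1, 2]  # the alias sees the mutation
assert b is alias             # no new object was created
```

If Box had no __iadd__, python would fall back to __add__ and rebind the name to a new object, which is exactly the tuple behavior above.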

MRO with bases and metaclasses

Once upon a time I wanted to have an object inherit something from its metaclass that had a conflicting entry somewhere in its MRO. It turns out that class-level attributes on metaclasses are resolved after the entire set of normal bases. Here is a small example illustrating that point.


class A(object):
  c = 1
  
class B(type):
  c = 2
  d = 5
  
class C(A):
  __metaclass__ = B
  
class D(C):
  __metaclass__ = B
  
C.c # => 1
C.d # => 5
D.c # => 1
D.d # => 5

delattr(A, 'c')
C.c # => 2
C.d # => 5
D.c # => 2
D.d # => 5

Interesting. So if you wanted to force an override of something in a class from a metaclass, you would need to write it on the class directly or carefully override __getattribute__.
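The same lookup order can be checked by calling the metaclass directly to build the class, which is a portable sketch of the example above (no __metaclass__ attribute needed):

```python
class A(object):
    c = 1

class B(type):
    c = 2
    d = 5

# equivalent to "class C(A): __metaclass__ = B"
C = B('C', (A,), {})

assert C.c == 1   # the normal bases win first
assert C.d == 5   # not found anywhere in C's MRO, so the metaclass supplies it

delattr(A, 'c')
assert C.c == 2   # with A.c gone, the metaclass attribute is finally used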

Ruby’s singleton class and python’s metaclass

A couple posts back I was kind of amazed that class methods in python collide with instance methods of the same name despite being "classmethods". I did a quick check in Ruby and found that what I was used to calling "class methods" were really just "singleton methods". Needless to say, I never really looked into how my class methods worked in ruby before; I never had to, as things Just Worked™. Let's go over how they work and see how we can get them to work for us in Python.

Note: In ruby 1.9+ metaclasses are called singleton_classes. They are also known as eigenclasses in other languages.

Ruby’s singleton class

First, I highly recommend reading these three posts, in order, to truly understand the metaclass in ruby:

  1. http://rubymonk.com/learning/books/4-ruby-primer-ascent/chapters/39-ruby-s-object-model/lessons/131-singleton-methods-and-metaclasses
  2. http://yehudakatz.com/2009/11/15/metaprogramming-in-ruby-its-all-about-the-self/
  3. http://yugui.jp/articles/846

Mind blown right? Also pretty amazing how well abstracted ruby made their metaclasses.

The important part, though, is that when method definitions are created on an "instance", the methods get written as instance methods on the metaclass, and in ruby that is handled for you under the covers.

class A

  p self  # A -- This is the class A
  p self.singleton_class?  # false -- It is NOT a singleton class/metaclass


  # Defining a method on this class is actually writing to the
  # metaclass but its abstracted away for you
  def self.bar
    'bar'
  end

end

p A.singleton_class  # #<Class:A>
p A.singleton_class == A  # false
p A.methods(false)  # [:bar]
p A.singleton_class.instance_methods - Object.methods # [:bar]

You can see from the output that when you write a method to “self” inside the class body what you are really doing under the covers is writing an instance method to the class’s singleton/metaclass. 

This is how ruby is able to prevent "class methods" from colliding with instance methods. It also allows you to call "class methods" before the class is finished being defined.
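Python can approximate this with a metaclass: an instance method on the metaclass acts like a ruby singleton method, callable on the class but invisible to instances. A small sketch (Meta and bar are made-up names):

```python
class Meta(type):
    def bar(cls):  # an instance method on the metaclass
        return 'bar'

# equivalent to "class A(object): __metaclass__ = Meta"
A = Meta('A', (object,), {})

assert A.bar() == 'bar'  # resolved on the metaclass, bound to the class

# instances never see metaclass methods, so no collision is possible
try:
    A().bar
    raised = False
except AttributeError:
    raised = True
assert raised
```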

Python’s metaclass

Now in python metaclasses are mysterious, hard to understand, and almost never used by normal developers. 

To quote Tim Peters:

Metaclasses are deeper magic than 99% of users should ever worry about. If you wonder whether you need them, you don't (the people who actually need them know with certainty that they need them, and don't need an explanation about why).

Just look at the top rated SO post: http://stackoverflow.com/questions/100003/what-is-a-metaclass-in-python

I believe this is why most people don’t use them. The language doesn’t abstract them well and the power is lost in the confusion. Well no more!

Making metaclasses work like ruby

If you want to make use of metaclasses in python and gain the wonderful declarative class behavior that ruby gives you (instead of being stuck with python's defaults), you can, but it takes some work.

  1. Create a metaclass factory class that can generate metaclasses on demand
  2. When you define a class create a metaclass with your factory
  3. Assign it to the __metaclass__ variable
  4. Use the instance as you would the class

class metaclass(object):
    def __new__(cls, name, bases):
        type_name = "{0}_meta".format(name)
        metas = [type(base) for base in bases]
        metas = [meta for index, meta in enumerate(metas)
                 if not [later for later in metas[index + 1:]
                         if issubclass(later, meta)]]

        return type(type_name, tuple(metas), {})


class Mixin(object):
    self = metaclass('Mixin', [object])
    __metaclass__ = self

    def foo(mcs):
        print mcs

    self.foo = foo  # This could be done in a decorator


class TestClass(Mixin):
    self = metaclass('TestClass', [Mixin])
    __metaclass__ = self 
    self.wee = 1
    self.foo()  # Not bound correctly (functools needed)

Is it a good idea in python?

I don't really think so. The language isn't really set up for anything more than a one-off usage of this tool.

  1. vars() and dir() don't report what you might expect (they miss anything defined on the metaclasses)
  2. You have to consciously manage two inheritance hierarchies (the class and its metaclass)
  3. You can get something similar (uglier, but simpler) with class decorators and some conventions.
  4. It's hard to find "python developers" who understand this

The main reasons I was looking into it were:

  • Correctly isolating my class variables.
  • Being able to call class setup methods in the class body
  • Preventing having to create a class decorator that calls a method for class setup.

However, it would be an amazing language feature to have namespaced class methods, to be able to initialize the class inside the class body, and to have the other tools in the language work with it.

Inherited class variables vs cached instance variables

Despite classes and instances both being objects, they don't both benefit from inheritance in the same way. Classes (which are instances of some metaclass) only have a list of "bases" that are shared amongst all subclasses. Instances of a class, when instantiated, each get their own unique copy of the state their super-classes set up. But what if you wanted that instance-style inheritance behavior for variables loaded onto classes?

Creating inherited non-shared variables for classes

There seem to be two ways of going about this:

  1. Manually managing memory in subclasses (copying) on declaration
  2. Converting class variables to instance variables and caching on the class at instantiation time

The first method I will call "inherited class variables" and the second I will call "cached instance variables".

Inherited class variables

To create inherited class variables you need to manually copy memory from the super classes you care about. Doing this is tricky: in many cases you want a deep copy (which python provides via copy.deepcopy), and that may not be compatible with everything you would want to store in a class variable. Also, if there was any side effect of creating the variables, you will only see it happen on the base, which might be confusing in some cases.

  • Allows for inherited variable behavior on classes
  • Copies are created immediately after class definition
  • Can be used with class directly
  • Requires calls on class after class has been defined (bottom of file or metaclass/class decorator)
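A sketch of the first approach using a class decorator (inherit_class_vars is a hypothetical helper name; it deep-copies the named variables onto the subclass immediately after definition):

```python
import copy

def inherit_class_vars(*names):
    # hypothetical class decorator: deep-copy the named class variables
    # onto the decorated class so subclasses stop sharing them
    def decorate(cls):
        for name in names:
            setattr(cls, name, copy.deepcopy(getattr(cls, name)))
        return cls
    return decorate

class Base(object):
    registry = {'base': True}

@inherit_class_vars('registry')
class Child(Base):
    pass

Child.registry['child'] = True
assert 'child' not in Base.registry    # Child owns its own copy now
assert Child.registry['base'] is True  # but it inherited the initial contents
```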

Cached instance variables

Cached instance variables essentially reuse object instantiation to generate the variables and then leave it up to the programmer to intelligently cache and trigger side effects as necessary. This gives the programmer flexibility to choose what to copy and how to initialize the variables for a subclass but increases the complexity of the __init__ process for instances.

  • Allows for inherited variable behavior on classes
  • How/what and side effects are controlled with first object instantiation
  • Can’t be used with class directly until an instance is created.
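A sketch of the second approach, where the first instantiation caches a per-class copy of the variable (the class names here are made up):

```python
class Base(object):
    defaults = {'colour': 'red'}

    def __init__(self):
        cls = type(self)
        # cache a per-class copy the first time this class is instantiated
        if 'defaults' not in vars(cls):
            cls.defaults = dict(cls.defaults)

class Child(Base):
    pass

Child()                            # first instantiation triggers the copy
Child.defaults['colour'] = 'blue'
assert Base.defaults['colour'] == 'red'  # the base class is unaffected
```

Note the downside called out above: Child.defaults is still shared with Base until the first Child instance is created.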

Python class methods are bound instance methods…

Why did anyone think it was ok to transform python class methods into bound instance methods?

When i think “class method” i think of a method that is bound to a given class that you must reference from a class. You should not be able to call that method from an instance and it should not conflict with instance methods of the same name.

class A(object):
  
  def m(self):
    return self

  @classmethod  # silently replaces the instance method above
  def m(cls):
    return cls


a = A()

print a.m()  # This will print the cls, not the instance

That's not explicit, not expected, and not cool. Why?

  1. The class methods may have collisions with instance methods
  2. There is no way to tell when you are calling a class method or instance method just by looking at the code.

Python getting classmethod references

Classmethod decorator and getattr

Classmethods were a bit mysterious to me when I started working with them in python. If you got them via the normal method (getattr), you got something that was statically bound to the class where it was declared.

class Foo(object):
  @classmethod
  def meth(cls):
    pass
    
getattr(Foo, 'meth')  # Returns bound method instance...

For a while I thought that python bound the classmethod when it was declared/decorated and there wasn’t a way to get access to a pure classmethod with dynamic binding.

Classmethods via constructor call

However, sometime later I noticed that if I didn't create the classmethod with a decorator, I could get access to a true dynamic classmethod instance.

class Foo(object):
  @staticmethod
  def meth(cls):
    pass
    
m = getattr(Foo, 'meth') 
classmethod(m)  # Creates proper classmethod instance 

After realizing that there was such a thing as dynamic classmethod instances I figured there was just some magic going on with getattr that was assuming I wanted a bound method and not the classmethod instance.

Classmethods via vars (or __dict__)

Indeed if you go direct (not through the “descriptors”) via vars or __dict__ you can get the classmethod instances directly.


class Foo(object):

  @classmethod
  def meth(cls):
    pass
    
print vars(Foo)['meth'] # Returns proper classmethod instance 

So there you go. If you wanted to get access to those classmethods in their unbound form you can grab them directly off the __dict__ or through the vars call.
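Once you have the raw classmethod object, you can also rebind it dynamically to another class via the descriptor protocol:

```python
class Foo(object):
    @classmethod
    def meth(cls):
        return cls.__name__

class Bar(Foo):
    pass

raw = vars(Foo)['meth']                   # the classmethod object itself
assert raw.__get__(None, Bar)() == 'Bar'  # bound to Bar, not Foo
```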

Python MRO vs Ruby ancestors

Ruby and python both have a method for resolving the linearization of ancestors/superclasses; however, they behave differently, and surprisingly, in many cases.

Shared beginnings

Here is a seemingly simple use case and its resolution order in python

class Mixin1(object):
  def m(self):
    return "Mixin1"

class C(Mixin1):
  def m(self):
    return "C"

class B(Mixin1):
  pass

class A(B,C):
  pass

import inspect
print inspect.getmro(A)
print A().m()

# MRO: A, B, C, Mixin1
# method m(): "C"
    

In ruby the ancestor order is the same:

module Mixin1
  def m
    "Mixin1"
  end
end

module C
  include Mixin1

  def m
    "C"
  end

end

module B
  include Mixin1
end

class A
  include C
  include B
end

p A.ancestors
p A.new.m()

# ancestors: A, B, C, Mixin1, ...
# method m(): "C"

From the looks of it, it seems like you can just interchange mixins for multiple inheritance and be on your way. You would be right for the most part, but there are some things to consider, like order disagreements.

Ruby allows “serious order disagreements”

To see the difference between the two systems you have to dig a little deeper. 

module Mixin1
  def m
    "Mixin1"
  end
end

module Mixin2
  def m
    "Mixin2"
  end
end

module C
  include Mixin1
  include Mixin2
end

module B
  include Mixin2
  include Mixin1
  
  def mixin_1_first
    raise "Error!" unless self.m == 'Mixin1'
  end

end

class A
  include C
  include B
end

class Test
  include B
end

p A.ancestors
t = Test.new
a = A.new
p a.m()
t.mixin_1_first  # No exception
a.mixin_1_first  # Exception


# [A, B, C, Mixin2, Mixin1, Object, Kernel, BasicObject]    
# "Mixin2"

You can see that we have an order disagreement between modules C and B. Ruby's implementation picks C's dependency order (Mixin2 first and then Mixin1) over B's. If B relied on that order in any of its mixed-in methods, things would now stop working, or worse, keep working but incorrectly. You can see this in action with the exception on the last line.

Python MRO prevents “serious order disagreements”

Python got MRO "wrong" twice in the past and has settled on the C3 algorithm. C3 explicitly prevents the errors found in the ruby example by raising an error.


class Mixin1(object):
  def m(self):
    print "Mixin1"

class Mixin2(object):
  def m(self):
    print "Mixin2"


class C(Mixin2, Mixin1):
  pass

class B(Mixin1, Mixin2):
  pass

class A(B,C):
  pass

import inspect
print inspect.getmro(A)
A().m()

# TypeError: Error when calling the metaclass bases
#    Cannot create a consistent method resolution
#    order (MRO) for bases Mixin1, Mixin2
    

Awesome! I love it when the language prevents surprising results from happening.

There are still gotchas in python

However, C3 can't guarantee unsurprising behavior when dealing with mixins (perhaps it isn't possible).

class Mixin1(object):
  def m(self):
    return "Mixin1"

class Mixin2(object):
  def m(self):
    return "Mixin2"

class Mixin3(object):
  def m(self):
    return "Mixin3"


class C(Mixin1, Mixin3):
  def m(self):
    return "C"

class B(Mixin2, Mixin1):
  pass

class A(B,C):
  pass

import inspect
print inspect.getmro(A)
print A().m()

# MRO: A, B, Mixin2, C, Mixin1, Mixin3
# Method m(): "Mixin2"

This ordering is correct and valid, but it might not be what I would expect as the programmer of A. In order to reason about this code, I have to load the entire class hierarchy into my head. Nothing as simple as "B and then C" works in this case.

In fact this issue is worse than it appears above because it also creates an implicit dependency on the order of the mixins for all subclasses.

For example, if as the maintainer of B you decided to swap the order of Mixin1 and Mixin2 because they have no impact on B itself, you would change the MRO for everything that subclasses B.

So try switching the order:

class B(Mixin1, Mixin2):
  pass

and then run your code again:

import inspect
print inspect.getmro(A)
print A().m()

# MRO: A, B, C, Mixin1, Mixin2, Mixin3
# Method m(): "C"

You will see that method m has changed and the MRO has COMPLETELY changed, all because of a relatively innocuous order change in B.

What about in ruby?

module Mixin1
  def m
    "Mixin1"
  end
end

module Mixin2
  def m
    "Mixin2"
  end
end

module Mixin3
  def m
    "Mixin3"
  end
end

module C
  include Mixin3
  include Mixin1
  def m
    "C"
  end
end

module B
  include Mixin1 # Switch the order here to see the same results as python
  include Mixin2
end

class A
  include C
  include B
end

class Test
  include B
end

p A.ancestors
t = Test.new
a = A.new
p a.m()

# Before
# Ancestors: A, B, Mixin2, C, Mixin1, Mixin3
# Method m(): "Mixin2"

# After
# Ancestors: A, B, C, Mixin1, Mixin2, Mixin3
# Method m(): "C"

Ruby has the same resolution order as python in this case and I believe the issue to be an intrinsic surprise/limitation of using mixins/multiple inheritance.

What to do to avoid the problem?

Don’t create cycles with your mixins or classes. Aka no diamonds.


#  Good State 1
#  C------------> B ----------> A
#  |-> Mixin1     |-> Mixin3    |-> (Don't include Mixin1-3)
#  |-> Mixin2

# Bad State 2
#  C------------> B ----------> A
#  |-> Mixin1     |-> Mixin3    |-> (Still don't include Mixin1-3)
#  |-> Mixin2
#  |-> Mixin3
    

# Fixed State 2 by removing the diamond
#  C------------> B ----------> A
#  |-> Mixin1     |-> Removed   |-> (Still don't include Mixin1-3)
#  |-> Mixin2
#  |-> Mixin3

The issue in both cases (multiple inheritance and mixins) is that you are no longer working with a dependency "tree" but a "graph". The key criterion is that a simple cycle has been introduced into the resolution ordering, and this creates the complex and unintuitive MROs seen above.

In new-style classes there is a trivial simple cycle, since all classes inherit from the "object" type, but given that it is always the last item in the MRO, you shouldn't have to worry about it unless you are doing something really weird.

MRO in detail:

If you are in python you should read and understand the MRO algorithm. Perhaps even implement the algorithm yourself so you deeply understand the details of how classes get resolved.

http://www.phyast.pitt.edu/~micheles/mro.pdf or 

https://www.python.org/download/releases/2.3/mro/

I would then advise you to play with super() and understand how you can override the default MRO method when you need to.
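As a starting point, here is a small sketch of the C3 merge over a hierarchy described as a plain dict (the function names are mine, not from the papers):

```python
def c3_merge(seqs):
    """The C3 merge step: repeatedly take a head that appears in no tail."""
    result = []
    seqs = [list(s) for s in seqs if s]
    while seqs:
        for seq in seqs:
            head = seq[0]
            # a candidate head must not appear in the tail of any sequence
            if not any(head in s[1:] for s in seqs):
                break
        else:
            raise TypeError("Cannot create a consistent method resolution order")
        result.append(head)
        seqs = [[x for x in s if x != head] for s in seqs]
        seqs = [s for s in seqs if s]
    return result

def linearize(cls, bases):
    """L(C) = C + merge(L(B1)..L(Bn), [B1..Bn]) over a {class: [bases]} dict."""
    parents = bases.get(cls, [])
    return [cls] + c3_merge([linearize(p, bases) for p in parents] + [list(parents)])

# the "shared beginnings" hierarchy from earlier: A(B, C), B(Mixin1), C(Mixin1)
bases = {'A': ['B', 'C'], 'B': ['Mixin1'], 'C': ['Mixin1'], 'Mixin1': ['object']}
assert linearize('A', bases) == ['A', 'B', 'C', 'Mixin1', 'object']
```

Feeding it the "serious order disagreement" hierarchy from earlier (B(Mixin1, Mixin2) vs C(Mixin2, Mixin1)) raises the same TypeError python itself raises.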

Postgresql backup plan

Here is a quick note on a postgresql backup plan

First read this excellent post on how Heroku does their postgresql backups: https://devcenter.heroku.com/articles/heroku-postgres-data-safety-and-continuous-protection

Second let’s define what we are trying to achieve. 

We would like a reliable, low-cost method for attaining automated backups on a regular basis. This means we will always want binary replication running. Logical backups will happen less frequently and on a standby, as they are relatively costly operations, but they are still good to have if you need to repair a corrupted database manually.

Retention Plan:

  1. Daily backups retained for 1 week (7 backups)
  2. Weekly backups retained for 1 month ( 4 backups)
  3. Monthly backups for a quarter (3 backups)
  4. Yearly backups for 2 years (2 backups)

To keep things simple and achieve high redundancy, each retention line will be unique. This means that even though you could program one of the daily backups to act as the most recent weekly backup, we will not. For additional simplicity, weekly backups will simply mean every Sunday, monthly will mean the end of the current month, quarterly will be every 3 months, and yearly every 4 quarters. This allows for some variation between backups on leap years but keeps all the backup labels very simple.

This plan gives you a rather clean setup of backups that are easy to reason about:

  • daily: mon, tues, wed, thurs, fri, sat, sun
  • weekly: sun #1, sun #2, sun #3, sun #4
  • Monthly: Jan 31st, Feb 28th (or 29th), March 31st
  • Quarterly: end of Q1, end of Q2, end of Q3, end of Q4
  • Yearly: 2013, 2012
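The calendar rules above can be sketched as a small labeling function (backup_labels is a hypothetical name; it just applies the simple rules described):

```python
import datetime

def backup_labels(day):
    """Retention lines a backup taken on `day` belongs to."""
    labels = ['daily']
    if day.weekday() == 6:                  # weekly simply means every Sunday
        labels.append('weekly')
    if (day + datetime.timedelta(days=1)).month != day.month:
        labels.append('monthly')            # monthly means end of the month
        if day.month in (3, 6, 9, 12):
            labels.append('quarterly')      # quarterly means every 3 months
        if day.month == 12:
            labels.append('yearly')
    return labels

assert backup_labels(datetime.date(2013, 12, 29)) == ['daily', 'weekly']
assert backup_labels(datetime.date(2013, 12, 31)) == ['daily', 'monthly', 'quarterly', 'yearly']
```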

Backup types

We will want to create two kinds of backups:

  • Logical (created with pg_dump)
  • Binary (created with pg_basebackup)

Logical backups will take place weekly and above on the slave, while binary replication will take place on separate hard drives on the master and will use the full retention plan noted above. Binary backups will also be retained locally for 1 week on the master (in addition to being available on shared storage).

Failure cases to check

  1. Binary corruption via either software or hardware
  2. Logical corruption usually via software (deleting wrong rows etc)
  3. Physical failure (hard drive crashes, power goes out)

Why daily binary replication on the master?

First, we want to make sure that replication occurs off the main disks used for table space on the master. If we have this flexibility, then:

  • Usually binary replication provides faster recovery at the cost of space.
  • Most disaster recovery occurs on a relatively recent basis
  • If the data is logically corrupted, any replicated standby will also be logically corrupted, so having a backup on the standby provides little value in this case.
  • If the data is corrupted due to physical failure, a standby can spin up to master without need for a binary backup.

Why weekly logical backups on slaves?

Logical backups are expensive, mess with the operating system cache, and aren't needed in all but the most catastrophic database failures. Keeping them is good insurance, but their load can be pushed onto a slave instead.

Python: Broken imports

One of the things I really like about ruby is the auto-import system and global namespace. With python it can be implemented, but it's not pythonic and it's not "given" to you by the language.

While exploring the subtleties of the import system I noticed a slight difference between two types of similar import statements.

First form

import a
a.A

Second form

from a import A

Example:

Define a module “a” that imports module “b” and then defines a class A:

#In module a 
import b 
class A(object): 
  pass 
  

In module “b” we try to access “a” and its class A via the two import methods:

  # In module b
  from a import A  # This produces an ImportError
  import a
  a.A  # This produces an AttributeError
  
  class B(object): 
    pass 

In the example above, when using the "import a; a.A" form you get a reference to the partially constructed module "a" that doesn't have the class A defined yet. However, when you use "from a import A", python prevents you from even getting the partially constructed "a" at all.
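You can reproduce this without two source files lying around by generating the modules in a temp directory (the module names cyc_a/cyc_b are stand-ins for "a" and "b" above):

```python
import os
import sys
import tempfile

tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, 'cyc_a.py'), 'w') as f:
    f.write('import cyc_b\nclass A(object):\n    pass\n')
with open(os.path.join(tmp, 'cyc_b.py'), 'w') as f:
    # "from cyc_a import A" here would raise ImportError instead
    f.write('import cyc_a\nhas_A = hasattr(cyc_a, "A")\n')

sys.path.insert(0, tmp)
import cyc_a
import cyc_b

assert cyc_a.A is not None   # fully constructed by the time we look
assert cyc_b.has_A is False  # cyc_b saw the partially constructed module
```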

Conclusion

It's a little frustrating that what seems like an innocuous shorthand notation has dramatic implications for importing and circular imports.

In the first case you can work with the module "a" in method/function bodies, and in the second case you can't load your program at all.

I would prefer that the import system be consistent: either enforce that modules be completely constructed, or allow partially constructed modules with the shorthand form. One potential solution that keeps the shorthand notation and retains consistency is to define your own importer and override the built-in one when using "from" lists.

Notes:

  • When you call "import a.b.c", a gets initialized, then b, then c. There is no requirement that a or b finish initialization before continuing, though.
  • When you call "from a.b import c", a gets initialized, then b, and b must finish initialization before continuing on to "c".
  • When you call __import__('a.b.c') the initialization order is still a, b, c; however, "a" is what is returned from the call, not "a.b.c".
  • When you call __import__('a.b.c', fromlist=['a_method']) the same initialization order occurs; however, "a.b.c" is returned, not "a".
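The last two notes can be checked directly against the standard library (logging.handlers is just a convenient nested module):

```python
top = __import__('logging.handlers')
assert top.__name__ == 'logging'            # the top-level package comes back

leaf = __import__('logging.handlers', fromlist=['SysLogHandler'])
assert leaf.__name__ == 'logging.handlers'  # fromlist returns the leaf module
```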

python mixins

There seem to be at least five different ways to implement mixins in python.

Multiple inheritance

When you go searching for python mixins, the first thing you will see are posts utilizing multiple inheritance. You essentially create multiple classes, inherit from them all, and gain access to the superset of functionality. Here is a SO post that goes over this approach:

http://stackoverflow.com/questions/533631/what-is-a-mixin-and-why-are-they-useful 

that references 

http://werkzeug.pocoo.org/docs/wrappers/#mixin-classes

Pros:

  • It works without any additional code

Cons:

  • If you have a large inheritance hierarchy with multiple mixins, or mixins inheriting from other classes, you quickly get into a situation where it's very difficult to reason about the control flow of your program.

Single Inheritance

Ruby implements mixins without resorting to multiple inheritance. In short, it creates "proxy classes" that are inserted above the current class in the inheritance hierarchy. Since Ruby doesn't support multiple inheritance, all mixins are placed linearly into the hierarchy, which makes them relatively simple to reason about and provides the expected functionality in most cases.

Here is an older post demonstrating it in Ruby: http://chadfowler.com/blog/2009/07/08/how-ruby-mixins-work-with-inheritance/

Here is a SO post on how you might create a similar thing in Python: http://stackoverflow.com/questions/4139508/in-python-can-one-implement-mixin-behavior-without-using-inheritance

Pros:

  • Removes the possibility of highly complex class hierarchies
  • isinstance will allow you to detect mixins
  • Easily extend mixin behavior in your classes

Cons:

  • Requires custom implementation that fundamentally changes the expectations around inheritance in python.  

Method & property grafting 

In the javascript world it is fairly common to see functions grafted onto an existing object. In python you can do something similar and create a class-level decorator to help with it.

Here is a SO post that goes over that technique http://stackoverflow.com/questions/4139508/in-python-can-one-implement-mixin-behavior-without-using-inheritance

Pros:

  • Intuitive to understand

Cons:

  • Lose the ability to do isinstance checks natively (if that is important to you)
  • Difficult to implement overriding methods that reference mixed-in functionality of the same name
  • Requires many function references, as each class needs its own pointers to the mixed-in functionality
  • Copy-by-value attributes will not be shared with other classes using the same mixin, as the grafting process creates unique copies
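A minimal sketch of grafting with a class decorator (mixin is a hypothetical helper; it copies plain attributes onto the target, skipping dunders):

```python
def mixin(*sources):
    # hypothetical class decorator: graft attributes from the source
    # classes onto the decorated class, skipping dunder names
    def decorate(cls):
        for source in sources:
            for name, attr in vars(source).items():
                if not name.startswith('__'):
                    setattr(cls, name, attr)
        return cls
    return decorate

class Greeter(object):
    def greet(self):
        return 'hello from ' + type(self).__name__

@mixin(Greeter)
class Thing(object):
    pass

assert Thing().greet() == 'hello from Thing'
assert not issubclass(Thing, Greeter)  # no isinstance relationship exists
```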

Mixin delegation

If you are familiar with Ruby's Module#included hook, you will know that when a module is included in another, the mixin module is delegated to so it may augment the base class as it sees fit. You could quite easily design a decorator that simply calls a known method on the mixed-in class, passing it the current base class.

Here are the ruby docs: 

http://www.ruby-doc.org/core-2.1.2/Module.html#method-i-included

Here is an abstraction around this in ActiveSupport

http://www.fakingfantastic.com/2010/09/20/concerning-yourself-with-active-support-concern/

Pros

  • Allows the mixin to dictate how it will get mixed in, removing the problematic decision-making of the naive method/property grafting above.

Cons

  • Requires mixin designers to implement how and what should be mixed in.
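A python sketch of the delegation idea, assuming by convention that every mixin defines an included classmethod (the names mirror the ruby hook but are otherwise made up):

```python
def include(mixin):
    # hypothetical decorator: delegate to the mixin's `included` hook,
    # letting the mixin decide how to augment the target class
    def decorate(cls):
        mixin.included(cls)
        return cls
    return decorate

class Countable(object):
    @classmethod
    def included(cls, base):
        base.instances = []
        base.count = classmethod(lambda c: len(c.instances))

@include(Countable)
class Model(object):
    pass

Model.instances.append(Model())
assert Model.count() == 1
```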

Method missing (__getattr__) loading

While I haven't seen an example of this proposed online yet, you could conceivably create an implementation of __getattr__ that looks up method definitions and properties from other objects when they are not defined in the current class.

Pros:

  • Could provide explicit load orders for mixins

Cons:

  • Requires construction of metaclass to define __getattr__ at class level for class level methods that need to be mixed in
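A sketch of that idea at the class level, using a metaclass __getattr__ that falls back to a list of mixin sources (all names here are made up):

```python
class DelegateMeta(type):
    # look up missing class-level attributes on the registered mixin sources
    def __getattr__(cls, name):
        for source in cls.mixins:
            if hasattr(source, name):
                return getattr(source, name)
        raise AttributeError(name)

class Helpers(object):
    @staticmethod
    def shout(text):
        return text.upper()

# equivalent to defining Thing with DelegateMeta as its metaclass
Thing = DelegateMeta('Thing', (object,), {'mixins': [Helpers]})

assert Thing.shout('hi') == 'HI'  # resolved from the mixin list, in order
```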

Alternatives to Mixins: Composition

Instead of doing mixins you could just use composition: create objects that represent the functionality you want and call those instances to do the work on behalf of the class that needs shared functionality.

Note: Composition is often used for "has-a" relationships, where inheritance is more of an "is-a" relationship. This may or may not be what you are looking for.

Here is an example showing the difference between inheritance and composition: 

http://eflorenzano.com/blog/2008/05/04/inheritance-vs-composition/
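A short sketch of the composition alternative:

```python
class Greeter(object):
    def greet(self, name):
        return 'hello ' + name

class Thing(object):
    def __init__(self):
        self.greeter = Greeter()  # has-a, not is-a

    def greet(self, name):
        # delegate to the composed instance
        return self.greeter.greet(name)

assert Thing().greet('world') == 'hello world'
```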