PostgreSQL backup plan

Here is a quick note on a PostgreSQL backup plan.

First, read this excellent post on how Heroku does their PostgreSQL backups: https://devcenter.heroku.com/articles/heroku-postgres-data-safety-and-continuous-protection

Second, let's define what we are trying to achieve.

We would like a reliable, low-cost method for taking automated backups on a regular basis. This means that we will always want binary replication running. Logical backups will happen less frequently and on a standby, as they are relatively costly operations, but they are still good to have if you ever need to repair a corrupted database manually.

Retention Plan:

  1. Daily backups retained for 1 week (7 backups)
  2. Weekly backups retained for 1 month (4 backups)
  3. Monthly backups retained for a quarter (3 backups)
  4. Yearly backups retained for 2 years (2 backups)

To keep things simple and achieve high redundancy, each retention line will be unique. This means that even though you could program one of the daily backups to act as the most recent weekly backup, we will not. For additional simplicity, weekly backups will simply mean every Sunday, monthly will mean the end of the current month, quarterly will mean every 3 months, and yearly every 4 quarters. This allows for some variation between backups on leap years but keeps all the backup labels very simple.

This plan gives you a rather clean setup of backups that are easy to reason about:

  • daily: mon, tues, wed, thurs, fri, sat, sun
  • weekly: sun #1, sun #2, sun #3, sun #4
  • monthly: Jan 31st, Feb 28th (or 29th), March 31st
  • quarterly: end of Q1, end of Q2, end of Q3, end of Q4
  • yearly: 2013, 2012
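To make this concrete, here is a minimal Python sketch (the label names and the calendar rules are my own simplification) of how a nightly job could decide which of the independent backup lines should run on a given date:

# retention.py - a rough sketch; the line names are hypothetical
import calendar
from datetime import date

def backup_lines_for(d):
    """Return which independent backup lines should run on date d."""
    lines = ["daily"]                                       # a daily backup runs every day
    if d.weekday() == 6:                                    # Sunday
        lines.append("weekly")
    if d.day == calendar.monthrange(d.year, d.month)[1]:    # last day of the month
        lines.append("monthly")
        if d.month in (3, 6, 9, 12):                        # end of a quarter
            lines.append("quarterly")
        if d.month == 12:                                   # end of the year
            lines.append("yearly")
    return lines

print(backup_lines_for(date(2014, 3, 31)))                  # ['daily', 'monthly', 'quarterly']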

Backup types

We will want to create two kinds of backups:

  • Logical (created with pg_dump)
  • Binary (created with pg_basebackup)

Logical backups will take place weekly and above on the slave, while binary replication will take place on separate hard drives on the master and will use the full retention plan noted above. Binary backups will also be retained locally for 1 week on the master (in addition to being available on shared storage).
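For reference, here is a rough Python sketch of the two jobs (the directories and database name are made up; the flags are standard pg_dump/pg_basebackup options):

# backup_jobs.py - a rough sketch only; paths and the database name are hypothetical
import subprocess
from datetime import date

def logical_backup(db="myapp", outdir="/backups/logical"):
    # pg_dump in custom format; point this at the standby to keep load off the master
    out = "%s/%s-%s.dump" % (outdir, db, date.today().isoformat())
    subprocess.check_call(["pg_dump", "--format=custom", "--file", out, db])

def binary_backup(outdir="/backups/binary"):
    # pg_basebackup takes a tar-format copy of the whole cluster, streaming the WAL it needs
    dest = "%s/%s" % (outdir, date.today().isoformat())
    subprocess.check_call(["pg_basebackup", "-D", dest, "-Ft", "-X", "stream"])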

Failure cases to check

  1. Binary corruption via either software or hardware
  2. Logical corruption, usually via software (deleting the wrong rows, etc.)
  3. Physical failure (hard drive crashes, power goes out)

Why daily binary replication on the master?

First we want to make sure that the replication is occurring off the main disks used for tablespaces on the master. If we have this flexibility, then:

  • Binary replication usually provides faster recovery at the cost of space.
  • Most disaster recovery occurs on a relatively recent basis.
  • If the data is logically corrupted, any replicated standby will also be logically corrupted, so having a backup on the standby provides little value in this case.
  • If the data is corrupted due to physical failure, a standby can be spun up to master without needing a binary backup.

Why weekly logical backups on slaves?

Logical backups are expensive, mess with the operating system cache, and are only needed in the most catastrophic database failures. Keeping them is good insurance, but their load can be pushed onto a slave instead.

Python: Broken imports

One of the things I really like about Ruby is the auto-import system and global namespace. With Python it can be implemented, but it's not Pythonic and it's not "given" to you by the language.

While exploring the subtleties of the import system I noticed a slight difference between two types of similar import statements.

First form

import a
a.A

Second form

from a import A

Example:

Define a module “a” that imports module “b” and then defines a class A:

# In module a
import b 
class A(object): 
  pass 
  

In module “b” we try to access “a” and its class A via the two import methods:

  # In module b -- try each of the two forms on its own
  from a import A  # This produces an ImportError
  import a
  a.A  # This produces an AttributeError
  
  class B(object): 
    pass 

In the example above, when using the "import a; a.A" form you get a reference to the partially constructed module "a" that doesn't yet have the class A defined. However, when you use "from a import A", Python raises an ImportError immediately, so you never get access to the partially constructed "a" at all.

Conclusion

It's a little frustrating that what seems like an innocuous shorthand notation has dramatic implications for importing and circular imports.

In the first case you can work with the module "a" inside method/function bodies, and in the second case you can't load your program at all.
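For example, here is a minimal sketch of the first case, where module "b" defers touching a.A until a function body runs:

# In module b - the circular import works if we defer touching a.A until call time
import a                      # returns the partially constructed module object

class B(object):
    def make_a(self):
        return a.A()          # by the time this is called, module a has finished loading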

I would prefer that the import system be consistent and either enforce that modules be completely constructed or allow partially constructed modules with the shorthand form as well. One potential solution that keeps the shorthand notation while retaining consistency is to define your own importer and override the built-in one when using "from" lists.

Notes:

  • When you call "import a.b.c", a gets initialized, then b, then c. There is no requirement that a or b finish initialization before continuing, though.
  • When you call "from a.b import c", a gets initialized, then b, and b must finish initialization before continuing on to "c".
  • When you call __import__('a.b.c') the initialization order is still a, b, c; however, "a" is what is returned from the call, not "a.b.c".
  • When you call __import__('a.b.c', fromlist=['a_method']) the same initialization order occurs; however, "a.b.c" is returned, not "a".
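A quick illustration of the last two notes, assuming a hypothetical package layout a/b/c.py where c defines a_method:

pkg = __import__('a.b.c')                          # initializes a, then a.b, then a.b.c
print(pkg.__name__)                                # prints 'a'

leaf = __import__('a.b.c', fromlist=['a_method'])  # same initialization order
print(leaf.__name__)                               # prints 'a.b.c'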

Python mixins

There seem to be at least five different ways to implement mixins in Python.

Multiple inheritance

When you go searching for Python mixins the first thing you will see are posts utilizing multiple inheritance. You essentially just create multiple classes, inherit from them all, and gain access to the superset of functionality. Here is an SO post that goes over this approach (a minimal sketch follows the links below):

http://stackoverflow.com/questions/533631/what-is-a-mixin-and-why-are-they-useful 

that references 

http://werkzeug.pocoo.org/docs/wrappers/#mixin-classes
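Here is a minimal sketch of the multiple inheritance approach (the class names are made up):

import json

class JsonMixin(object):
    def to_json(self):
        return json.dumps(self.__dict__)

class ComparableMixin(object):
    def __eq__(self, other):
        return self.__dict__ == other.__dict__

class User(JsonMixin, ComparableMixin):
    def __init__(self, name):
        self.name = name

print(User("alice").to_json())          # {"name": "alice"}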

Pros:

  • It works without any additional code

Cons:

  • If you have a large inheritance hierarchy with multiple mixins, or mixins inheriting from other classes, you quickly get into a situation where it's very difficult to reason about the control flow of your program.

Single Inheritance

Ruby implements mixins without resorting to multiple inheritance. In short, it creates "proxy classes" that are inserted above the current class in the inheritance hierarchy. Since Ruby doesn't support multiple inheritance, all mixins are placed linearly into the hierarchy, which makes them relatively simple to reason about and provides the expected functionality in most cases.

Here is an older post demonstrating it in Ruby: http://chadfowler.com/blog/2009/07/08/how-ruby-mixins-work-with-inheritance/

Here is an SO post on how you might create a similar thing in Python: http://stackoverflow.com/questions/4139508/in-python-can-one-implement-mixin-behavior-without-using-inheritance

Pros:

  • Removes the possibility of highly complex class hierarchies
  • isinstance will allow you to detect mixins
  • Easily extend mixin behavior in your classes

Cons:

  • Requires a custom implementation that fundamentally changes the expectations around inheritance in Python.

Method & property grafting 

In the JavaScript world it is fairly common to see functions getting grafted onto an existing object. In Python you can do something similar and create a class-level decorator to help with it.

Here is an SO post that goes over that technique (a rough sketch also follows): http://stackoverflow.com/questions/4139508/in-python-can-one-implement-mixin-behavior-without-using-inheritance
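Here is a rough sketch of what a grafting decorator could look like (the decorator and class names are my own):

def graft(mixin):
    def decorator(cls):
        for name, value in mixin.__dict__.items():
            if not name.startswith('__'):
                setattr(cls, name, value)       # copy the mixin's functions onto cls
        return cls
    return decorator

class GreetMixin(object):
    def greet(self):
        return "hello from %s" % self.__class__.__name__

@graft(GreetMixin)
class Service(object):
    pass

print(Service().greet())                        # hello from Service
print(isinstance(Service(), GreetMixin))        # False - one of the cons below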

Pros:

  • Intuitive to understand

Cons:

  • Lose the ability to do isinstance checks natively (if that is important to you)
  • Difficult to implement overridden methods that reference mixed-in functionality by the same name
  • Requires many function pointer references, as each class will require pointers to the mixed-in functionality
  • Copy-by-value attributes will not be shared with other classes using the same mixin, as the grafting process creates unique copies as part of mixing in.

Mixin delegation

If you are familiar with Ruby's Module#included callback, you will know that when a module is included in another, the mixin module is delegated to so it may augment the base class as it sees fit. You could quite easily design a decorator that simply calls a known method on the mixed-in class, passing it the current base class (a rough sketch follows the links below).

Here are the ruby docs: 

http://www.ruby-doc.org/core-2.1.2/Module.html#method-i-included

Here is an abstraction around this in ActiveSupport:

http://www.fakingfantastic.com/2010/09/20/concerning-yourself-with-active-support-concern/
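A rough sketch of what this could look like in Python (the hook name "included" mirrors Ruby and is my own choice):

import time

def include(mixin):
    def decorator(cls):
        mixin.included(cls)             # delegate: the mixin augments the class itself
        return cls
    return decorator

class Timestamped(object):
    @staticmethod
    def included(cls):
        def touch(self):
            self.updated_at = time.time()
        cls.touch = touch               # the mixin decides exactly what gets mixed in

@include(Timestamped)
class Document(object):
    pass

doc = Document()
doc.touch()
print(doc.updated_at)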

Pros

  • Allows the mixin to dictate how it will get mixed in, removing the problematic decision-making of the naive method/property grafting above.

Cons

  • Requires mixin designers to implement how and what should be mixed in.

Method missing (__getattr__) loading

While I haven't seen an example of this proposed online yet, you could conceivably create an implementation of __getattr__ that looks up method definitions and properties from other objects when they aren't defined on the current class (a rough sketch follows).
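A rough sketch of the idea (all names here are hypothetical):

class Mixable(object):
    _mixins = []                        # instances that provide the extra behavior

    def __getattr__(self, name):
        # only called when normal attribute lookup on the instance/class fails
        for mixin in self._mixins:
            if hasattr(mixin, name):
                return getattr(mixin, name)
        raise AttributeError(name)

class Greeter(object):
    def greet(self):
        return "hello"

class Widget(Mixable):
    _mixins = [Greeter()]

print(Widget().greet())                 # "hello", resolved through __getattr__

Note that the returned method is bound to the mixin instance rather than to the Widget, which is a limitation of this approach.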

Pros:

  • Could provide explicit load orders for mixins

Cons:

  • Requires constructing a metaclass that defines __getattr__ at the class level for any class-level methods that need to be mixed in

Alternatives to Mixins: Composition

Instead of doing mixins you could just use composition: create objects that represent the functionality you want and call those instances to do the work on behalf of the class that needs the shared functionality.

Note: Composition is often used in a "has-a" context, whereas inheritance is more of an "is-a" relationship. This may or may not be what you are looking for.

Here is an example showing the difference between inheritance and composition: 

http://eflorenzano.com/blog/2008/05/04/inheritance-vs-composition/
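A minimal sketch of the composed version (the class names are made up):

import json

class JsonSerializer(object):
    def serialize(self, obj):
        public = dict((k, v) for k, v in obj.__dict__.items() if not k.startswith('_'))
        return json.dumps(public)

class Account(object):
    def __init__(self, owner):
        self.owner = owner
        self._serializer = JsonSerializer()      # Account "has a" serializer

    def to_json(self):
        return self._serializer.serialize(self)  # delegate the work to the component

print(Account("alice").to_json())                # {"owner": "alice"}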

pip vs easy_install

The answer

There is a huge amount of confusion surrounding pip, easy_install, distutils, setuptools, and the whole range of other tools out there for Python. If you don't care about the history, debate, and banter, I'll list the answer first and then go into details:

The "Official" recommendation for Python is currently:

  • Use pip with virtualenv to install packages
  • Use setuptools to define packages and distribute them to the PyPI index

Reference: https://python-packaging-user-guide.readthedocs.org/en/latest/current.html

(Updated: 2014-04-09)

How do I use them together:

Here is a recent (2013), well-written article that goes into the finer points of setuptools and pip requirements (abstract vs. concrete requirements): https://caremad.io/blog/setup-vs-requirement/
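To make the split concrete, here is a minimal sketch (the package name and dependency are made up): setup.py declares abstract dependencies via install_requires, while requirements.txt pins the concrete versions.

# setup.py - a minimal sketch; "myapp" and its dependency are hypothetical
from setuptools import setup, find_packages

setup(
    name='myapp',
    version='0.1.0',
    packages=find_packages(),
    install_requires=[
        'requests',          # abstract: any version pip can resolve
    ],
)

# requirements.txt would then pin the concrete version, e.g. requests==2.2.1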

But setuptools is bad

For those of you still reading, you might wonder how the hell setuptools is the recommended distribution tool when it is not backwards compatible with the standard library's distutils and in fact promotes using non-compatible APIs. Well, it's pretty simple. The standard library distutils didn't provide a mechanism for dealing with package dependencies (a fairly critical flaw), no one fixed it in the stdlib, and it looks like the Python community just gave up on staying backwards compatible with "pure" Python. After all, most of us using Python are doing so for profit and couldn't care less about the nerd battle going on between the 5 different ways of distributing Python software.

Here is a nice rant from 2008 from a purist who had good points but ultimately lost: http://www.b-list.org/weblog/2008/dec/14/packaging/

But setuptools was meant to be used with easy_install

Setuptools ships with easy_install as the package installer, so how can the recommendation be to use pip? Well, pip uses setuptools under the hood (except when building wheels — a topic for later) EVEN if you use distutils. I suspect the reason for this is that setuptools works with distutils but not the other way around. It also provides additional dependency support (reading from requirements.txt files).

Here is an update by that same guy a day later in 2008: http://www.b-list.org/weblog/2008/dec/15/pip/

So what are we left with for people who want to build production applications that make money?

Use:

  • pip with virtualenv to install packages
  • setuptools to define packages and distribute them to PyPI

Don’t use:

  • distutils
  • distribute (merged into setuptools)
  • bento (it looks like it stalled)
  • or really anything else

Ansible: Gotchas deploying to Ubuntu

Ansible (http://www.ansible.com/home) is a pretty cool new tool for deploying systems. While using it I pulled out the gotchas that took up some of my time. Many are just Unix admin details, but some are Ansible specific.

Gotcha #1: Shell Types and environment setup

Depending on what kind of shell you are using, you might be surprised to learn that some of the files you expect to be sourced aren't, and your environment is messed up. Ansible has taken the stance that you should declare your environment requirements in the playbook rather than in sourced shell files. However, there are cases where that just isn't possible, and in that case it's very important to know what gets sourced when and how to trigger different shell types.

In bash you can:

  • enable a login shell with “-l” (ell)
  • enable an interactive shell with "-i"
  • with sudo you can enable an interactive login shell for a user with “sudo -iu username”

What scripts get read with what shell type can be found here: https://github.com/sstephenson/rbenv/wiki/Unix-shell-initialization

Gotcha #2: Command and shell are different

Ansible offers up two modules for executing arbitrary commands. The "command" module is well documented and states that it can't handle piping, shell operators, or much else. It's pretty much only good for a single, non-complex command. The "shell" module defaults to "sh", which is the POSIX standard shell. However, on different *nixes you may have different shells symlinked to that executable:

On Ubuntu > 6 you have "dash", which is not "bash". See https://wiki.ubuntu.com/DashAsBinSh

More on this here: http://stackoverflow.com/questions/5725296/difference-between-sh-and-bash

Gotcha #3: RVM

RVM is a pretty nice Ruby version manager, but it has some subtle issues regarding the first two gotchas.

  • RVM works best in bash or zsh, which isn't what Ansible guarantees when you run a shell.
  • RVM needs its initialization script (normally ~/.rvm/scripts/rvm.sh) to be sourced for RVM to be added to the path.
  • RVM functions like "rvm use" will not work by default in noninteractive shells even if you source the initialization script. Instead you can source a particular Ruby OR use the RVM binary option:
source $(rvm 2.1.2 do rvm env --path)

OR

rvm 2.1.2 do rvm gemset create my_gemset

More on this here: https://rvm.io/rvm/basics#post-install-configuration and here https://rvm.io/workflow/scripting

Gotcha #4: .bashrc files that return quickly for non-interactive shells

Watch out for pre-generated .bashrc files that automatically return when the shell isn't interactive. Often there is a small line at the top that reads

[ -z "$PS1" ] && return

that will automatically return if you aren't in an interactive shell. This will bite you if you hastily add necessary source lines to the bottom of your .bashrc.

More on this here: https://rvm.io/rvm/basics#post-install-configuration

Gotcha #5: sudoers file isn't set up correctly

This one is pretty obvious, but you should know that when you sudo anything, everything from what environment is inherited to what programs you can access is determined by your /etc/sudoers file. Beyond not setting it up correctly, you can also add a typo to it. If modifying it with Ansible, it's recommended to validate the change with lineinfile.

More on this here: http://docs.ansible.com/lineinfile_module.html

More gotchas will arrive here as I hit them.

Implications of no free will

I have recently come to believe, with high probability, that free will as it is normally defined does not exist and is simply a misunderstanding of the massive complexity of our universe. I rationalize my belief through induction and through the observation that the universe (macro, micro, subatomic, etc.) does not shift randomly, without rhyme or reason, at the smallest time deltas available.

If you have no free will, meaning that your very next action is predestined, then a seemingly logical progression is that the step following the next is also predestined. If so, then everything you will ever do (no matter how many times you reflect on it … weird) has already been "done" and you are just waiting to experience it.

However, if you consider the implications of this idea, it leads to a rather confusing outcome where definitions of concepts like the soul, life, time, and fate all seem to solidify.

  • Your soul then can be defined as a constantly changing but always known function of your DNA and the environment you have been exposed to so far.
  • Your life can be likened to a movie that has already been taped and that if accurately modeled enough in a computer could be fast-forwarded and predicted with 100% accuracy.
  • Time is then defined as nothing more than a measure of distance in that movie and doesn’t imply uncertainty in the future as it always has for me.
  • Fate is simply a fact.
  • What you will do is what you have already done but you are just waiting to experience it.

I find it interesting that humans have the ability to predict things naturally (albeit in a limited and sometimes flawed way), and that perhaps our projections of our futures (especially as we get older) may just be more reasonable than I previously thought.

I also wonder if we could get a quantifiable metric on how well humans can predict the future based on recorded interviews of individuals conducted at differing lengths in the past (1, 2, 5, 10, 20 years, etc.). My guess is pretty well. Doing a quick Mechanical Turk survey might just tell you a bunch about your future (not that you can change it or anything) :)

When SOA is appropriate and when it’s not

I have had this itch to investigate the pros and cons of service oriented architecture (SOA) and its derivatives (ROA, WOA, etc.) for many years but never really cared enough to do it, usually because many of the projects I built didn't use it or the decision had been made a long time ago and was pretty much irreversible at that point.

My experience with SOA systems has been at medium/large sized companies, and generally I found that it worked sufficiently well for them. However, SOA didn't seem like a vastly superior solution relative to the architectural patterns I had been exposed to over time. SOA just had a different set of trade-offs that really just made it feel like a different tool in the toolbox.

Now, I deliberately try to avoid religious wars over tools, and somewhere in my mid-20s I stopped caring about what they did in "theory" as well. As a technically competent entrepreneur I care much more about what actually happens when you implement a solution across all of the business and not just the problem at hand (think hiring, execution risk, business flexibility, time to develop, etc.).

Now, with my history and viewpoint clearly established, I'll jump into my analysis.

Given my experience at medium/large companies where SOA is used internally to power the product, here are the complaints about SOA that I have heard/experienced:

  • Services are only as good as the best architect on that service team.
  • It's expensive to move engineers from team to team because each service is completely different.
  • If a service has a bug you have to wait for the service team to fix it.
  • It is incredibly hard to test changes across multiple services.
  • With each incremental service you add, your client requires more integration code (different payloads, protocols, errors, etc.).
  • When a service introduces breaking changes to an interface it is hard to find who all the clients are and just how big of an impact those changes will have downstream.
  • When there are a nontrivial number of services, it's difficult to find what services exist and what functionality they expose.

Now let's go over how SOA is meaningfully different from traditional object-oriented architecture (OOA).

  • You have the flexibility to use different everything in SOA (languages, libraries, operating systems, protocols, web servers, testing tools, etc.) per service, whereas in traditional OOA you typically stay in one language, with a list of common dependencies, on a single stack.
  • You usually have stateless communication between components, whereas in OOA you implicitly have stateful communication with pass-by-reference mechanics.
  • With SOA everything that is exposed publicly is done intentionally, and it's harder for "client" engineers to break service abstractions, although that doesn't stop the "owner" engineers from doing it on occasion.
  • In theory you get to deploy your service independently of others (until you make changes that break other clients, which happens frequently enough in practice for me to mention it here).
  • Strict permission control over codebases if you have contractors, etc.

With these differences SOA has the following advantages over OOA in my mind:

  • You can adopt new technologies independently from the rest of the org.
  • Monkey patching, accessing private methods, or otherwise inserting hacks into code you don’t really own is harder to do.
  • Prevents code commit conflicts and enforces ownership contracts.
  • With stateless communication, implementations *can* be simpler; it certainly makes threading easier.

Here is my list of disadvantages of SOA vs OOA:

  • It's much more complex and therefore slower to execute initially.
  • You typically need more engineers for the same amount of work.
  • You need senior engineers and architects in proportion to the number of services.
  • It requires tooling for service discovery, registration and testing.
  • Moving engineering resources has larger fixed costs.
  • Cross service development requires more coordination.
  • Without solid architects there is significant execution risk.

I think SOA is really appropriate for large teams working on complex systems at medium/large, profitable "cool tech" companies where the costs can be amortized over longer time periods with great engineers without threatening the success of the business. I also think SOA is appropriate for communication between companies, where each company is a service. At that level you really don't lose much because few of the OOA benefits are applicable.

If SOA matches your company's profile then I highly recommend taking a look at this: http://www.infoq.com/presentations/twitter-soa

If your company doesn't match the criteria above then I think OOA is probably right for you. It is faster to start, easier to understand, more flexible from an engineering management perspective, and allows you to move much faster with smaller teams. I would also say that even if you are planning on being a huge, successful company with 100s of engineers, it's not worth investing in SOA until you are actually feeling the pain of your immense success.

Small aside: SOA’s tradeoffs actually remind me a bit of database sharding but for engineering teams. 

(Lead Image source: https://tech.bellycard.com/blog/migrating-to-a-service-oriented-architecture-soa/)

First post in about a year

After a year-long hiatus I have finally reconstructed my blog. There is a lot to talk about and not a lot of time to do it in. The first thing is to put this placeholder in and start writing later this week.