Why your PHP App NEEDS a Circuit Breaker

More and more websites these days rely on external resources for data and functionality. Each of these services introduces a stability hole in your application, because you now depend on the uptime of something you don’t control. Yahoo, Google and the rest ALL have downtime at some point; no one has a 100% SLA. This means that more than ever you need to code defensively to keep external resources from tarnishing the user experience of your site. This is where the Circuit Breaker stability pattern (from the book “Release It!”) comes into play.

The Circuit Breaker pattern is a simple concept. If too many connections fail while accessing a resource, the circuit is “opened” and calls to that resource stop at that point. After a certain period of time we let one connection go through; if that one works, the circuit is re-closed and operations resume as normal. Let’s break this down further.

Your site connects to Google to get a list of widgets. The Google service goes down and now your customer is stuck waiting for the connection to time out. You have no idea this is happening. You think everything is fine until people get tired of waiting, your connections pile up and you run out of resources on your machines while 10,000 people each wait 30 seconds for a timeout. This makes your site look horrible to the user and they find somewhere else to query for these widgets. The goal is that if something is constantly failing, you stop calling it and notify administrators and users.

The Circuit Breaker pattern defines a maximum number of failures (the threshold) that can occur before the circuit is tripped. So let’s say we want to stop connecting to Google after 5 people’s connections have failed in a row. Now we can immediately return responses to our users letting them know the service is down or that their query has been stored for processing at a later time. An informed user is a happy user; even if something doesn’t work, they like to know about it immediately. After 10 minutes or so we want to let ONE connection through to test the waters and see if Google is still down, so we set the circuit to “half” open. If that connection fails we reopen the circuit and await the next timeout. Once a connection is successful again we re-close the circuit and operations can resume as normal. When a circuit does open we want to alert our operations group or the site admin so they know there is a problem with this service.
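
To make the three states concrete, here is a minimal in-memory sketch of the state machine described above. It is not the class from this post: the name SketchCircuitBreaker, its methods, and the injectable clock and alert callbacks are all assumptions made for illustration, and it assumes PHP 7.4 or later.

```php
<?php
// Illustrative sketch only: names and structure are assumptions, not the author's class.
final class SketchCircuitBreaker
{
    private string $state = 'closed'; // 'closed', 'open' or 'half'
    private int $failureCount = 0;
    private int $lastOpenTime = 0;
    private int $threshold;
    private int $timeoutSeconds;
    /** @var callable returns the current Unix time */
    private $clock;
    /** @var callable receives a message when the circuit opens */
    private $onOpen;

    public function __construct(int $threshold, int $timeoutSeconds, ?callable $clock = null, ?callable $onOpen = null)
    {
        $this->threshold = $threshold;
        $this->timeoutSeconds = $timeoutSeconds;
        $this->clock = $clock ?? 'time';
        $this->onOpen = $onOpen ?? static function (string $message): void {};
    }

    // True when a call to the external service may be attempted.
    public function allowRequest(): bool
    {
        $elapsed = ($this->clock)() - $this->lastOpenTime;
        if ($this->state === 'open' && $elapsed >= $this->timeoutSeconds) {
            $this->state = 'half'; // let exactly one trial call through
            return true;
        }
        return $this->state === 'closed';
    }

    public function recordSuccess(): void
    {
        $this->failureCount = 0;
        $this->state = 'closed';
    }

    public function recordFailure(): void
    {
        $this->failureCount++;
        // A failed trial call, or too many failures in a row, opens the circuit.
        if ($this->state === 'half' || $this->failureCount >= $this->threshold) {
            $this->state = 'open';
            $this->lastOpenTime = ($this->clock)();
            ($this->onOpen)('circuit opened');
        }
    }

    public function state(): string
    {
        return $this->state;
    }
}
```

With a threshold of 5 and a timeout of 600 seconds, this mirrors the example in the text: five failures in a row open the circuit, and ten minutes later a single trial call is let through. The alert callback is where a notification to the operations group could be hooked in.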

This allows you to automatically shut off and reinstate services when failures occur. It’s a very powerful concept for increasing the stability of your system.

I’ve created a very simple sample Circuit Breaker class in PHP that can be used for this purpose. It currently assumes a MySQL database to store this information, but you can replace it with memcached or another in-memory caching option for increased performance. You can find all the code and samples here: http://code.google.com/p/plushcode/source/browse/#svn/trunk/stability_patterns/circuit_breaker

The schema for the table is listed below:

CREATE TABLE IF NOT EXISTS `circuit_breaker` (
  `id` int(5) NOT NULL auto_increment,
  `app_id` varchar(255) NOT NULL,
  `failure_count` int(5) NOT NULL,
  `threshold` int(5) NOT NULL,
  `state` enum('open','closed','half') NOT NULL,
  `timeout` int(5) NOT NULL COMMENT 'timeout in seconds',
  `last_open_time` datetime NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;

Let’s go through the fields:

app_id – Each external resource you have (including databases!) should get its own unique application id. Something such as “google_widgets” works fine.

failure_count – Each time a failure occurs this is incremented by 1. Once a successful call has been made it is reset to 0.

threshold – The number of times a service can fail before we open the circuit and stop all processing.

state – The current state of the service. It’s either closed (normal), open (failure occurred) or half (we’re trying one more time).

timeout – The number of seconds to wait before attempting a connection again.

last_open_time – The timestamp of when the circuit was last opened, which indicates a failure.

One of the nice aspects of having this in the database is that you can easily monitor all of your external services from one place. All you have to do is query the table and you can see exactly what the health of your system is. Here is a quick and dirty sample report.
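
A report along these lines could be built with a few lines of PHP. The sketch below is an assumption about how it might look, not code from the repository: formatHealthReport() is a hypothetical helper, and it expects rows that have already been fetched from the circuit_breaker table (for example with a plain SELECT) as associative arrays keyed by column name.

```php
<?php
// Hypothetical helper: one line of text per monitored service.
// $rows is assumed to hold rows from the circuit_breaker table,
// e.g. the result of "SELECT * FROM circuit_breaker".
function formatHealthReport(array $rows): string
{
    $lines = [];
    foreach ($rows as $row) {
        $lines[] = sprintf(
            '%-20s %-6s failures: %d/%d',
            $row['app_id'],
            $row['state'],
            $row['failure_count'],
            $row['threshold']
        );
    }
    return implode("\n", $lines);
}

// Sample rows, made up for illustration.
echo formatHealthReport([
    ['app_id' => 'google_widgets', 'state' => 'open', 'failure_count' => 5, 'threshold' => 5],
    ['app_id' => 'myapp', 'state' => 'closed', 'failure_count' => 0, 'threshold' => 5],
]), "\n";
```

Anything whose state is not closed is a service that needs attention, so a real report would probably sort or highlight those rows first.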

To use this class you simply call it as seen in this test file: http://code.google.com/p/plushcode/source/browse/trunk/stability_patterns/circuit_breaker/test.php

What we’re doing is including the CircuitBreaker class, then making sure the circuit is closed, which means everything is A-OK. We make a web services call over to my test server page. If that connection fails we call $cb->fail(), which tests whether our failure count is over our threshold and, if not, increases it by one. If we are over the threshold it opens the circuit, and all future calls to isClosed() return false until the timeout expires. If the connection is successful we call $cb->success(), which resets the failure count back to 0.
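
The call pattern described above can be sketched as follows. Only the method names isClosed(), fail() and success() come from this post; the InMemoryBreakerStub class and the fetchThroughBreaker() helper are hypothetical stand-ins so the example runs without a database, and the stub leaves out the timeout and half-open step entirely. It assumes PHP 7.4 or later.

```php
<?php
// Hypothetical stand-in exposing the three methods the post mentions.
// It keeps its count in memory and never half-opens.
final class InMemoryBreakerStub
{
    private int $failures = 0;
    private int $threshold;

    public function __construct(int $threshold)
    {
        $this->threshold = $threshold;
    }

    public function isClosed(): bool
    {
        return $this->failures < $this->threshold;
    }

    public function fail(): void
    {
        $this->failures++;
    }

    public function success(): void
    {
        $this->failures = 0;
    }
}

// Hypothetical helper: returns the response body, or null when the call
// was skipped (circuit open) or failed.
function fetchThroughBreaker($cb, callable $fetch): ?string
{
    if (!$cb->isClosed()) {
        return null; // fail fast without touching the external service
    }
    $response = $fetch(); // e.g. function () { return @file_get_contents($url); }
    if ($response === false) {
        $cb->fail();
        return null;
    }
    $cb->success();
    return $response;
}
```

Passing the fetch as a callable keeps the helper independent of how the service is reached; file_get_contents() returns false on failure, which is what the check relies on.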

To test this out:

  1. Create a test database
  2. Import the schema into the new database
  3. Enter one record with “myapp” as the app_id, 5 as the threshold and 10 as the timeout value
  4. Paste the test.php code into your own file
  5. When you run the code you should see a var dump of some XML data, which indicates success
  6. Now change the URL in the file_get_contents line to a URL that does not exist
  7. Start refreshing your page and you will see the connections fail and then the circuit trip. After 10 seconds it opens up halfway, does one test, then reopens for another 10 seconds
Published: April 13th, 2009 at 5:21

11 comments to “Why your PHP App NEEDS a Circuit Breaker”

This is a great concept.

I was thinking that this should also be applied to your internal service dependencies, as in an SOA.

Sam

This strikes me as more of a "gatekeeper" pattern than a "circuit breaker." The problem is that the limiting factor is unlikely to be the unresponsive external resource, but local resource limits – specifically, database connections. Most frameworks connect to the database early in the bootstrap phase – attempts to access external resources typically happen during the processing or render phases. If the external resources are timing out, or just being plain slow, then this causes requests to pile up, but it’s only when db connection limits are exceeded that things actually go pear-shaped.

One solution is to manage your local resource footprint tightly within your app – ie. only hold db connections open for as long as you are actually using them. This is extremely difficult to achieve if you are using an off-the-shelf framework.

Alternatively, brush up on good sysadmin practices 101 and reduce connection timeouts on external resources to reasonable limits *and* handle connection timeouts properly within your app.

The *wrong* thing to do is to rely absolutely on limited-availability local resources (ie. your database) to gatekeep access to external resources.

Actually, John, things are more likely to go pear-shaped when dealing with external dependencies. You most likely have tight control over your database, ideally your sys admins monitor the loads, etc… When you rely on external dependencies you have no idea what’s behind the scenes. You could be aggregating data from one source and in turn they’re aggregating that data from 100 other sources, so now you need to rely on 101 sites with X number of app and db servers.

Even if you set your timeouts properly your connections can pile up quickly leaving your server out of resources on high traffic sites. This implementation is defined as a circuit breaker. Google: "circuit breaker pattern"

thoer81@gmail.com
May 1st, 2009 at 11:44 am

I was thinking about implementing something like this, but I have a fundamental problem with this class. It seems to me that errors never expire, which would mean that even if the threshold is only reached after 2 years, you still lock the source for a while. I’m not sure how it would work out in a real-life application; it may not be worth the effort and resources to let those errors expire, but it’s definitely worth thinking about.

This is a really interesting article.

The only issue I have with this implementation is that it relies on an external resource, your MySQL database, which as you mention in the article should also be included in the circuit breaker.

A better option may be to use a cache of some sort, even if it is just a flat file. Then you won’t be reliant on another service.

Thoughts?

Thanks for this article. I found it to be really interesting and introduced me to a really interesting new concept.

I must admit that I did have a couple problems with the implementation though. One being that the state store is hard-coded in the class, but the bigger problem being that I’m a bit slow and keep forgetting whether a circuit should be ‘open’ or ‘closed’.

So I thought I would have a go at implementing the pattern with a storage adapter and what I hope is a simpler use case. You can find the description and code I came up with here
http://leighmakewell.blogspot.com/2009/07/pattern-heavy-circuit-breaker-pattern.html

Please let me know what you think. :)

now I know why circuit breaker is really needed! thanks for a bunch of info! It would really help me do my homework! ;)

Great article! I will definitely use it in the future on my website mobilfaktor.dk

I have created a PHP implementation of a circuit breaker for Zend Framework, compatible with ZF 1.x.

Please have a look at the diagrams and code (it uses cache-based storage, so no MySQL database is needed).

http://artur.ejsmont.org/blog/PHP-Circuit-Breaker-initial-Zend-Framework-proposal

I hope it will make into the release :- )