More and more websites rely on external resources for data and functionality. Each of these services introduces a stability hole in your application, because you are now relying on the uptime of an external resource. Yahoo, Google, and the rest all have downtime at some point; no one offers a 100% SLA. More than ever, you need to code defensively to prevent external resources from tarnishing the user experience of your site. This is where the Circuit Breaker stability pattern comes into play (from the book "Release It!" by Michael Nygard).
The Circuit Breaker pattern is a simple concept. If too many connections fail while accessing a resource, the circuit is "opened" and calls to that resource stop at that point. After a certain period of time we let one connection go through; if that one works, the circuit is closed again and operations resume as normal. Let's break this down further.
Your site connects to Google to get a list of widgets. The Google service goes down, and now your customer is stuck waiting for the connection to time out. You have no idea this is happening. You think everything is fine until people get tired of waiting, your connections pile up, and you run out of resources on your machines while 10,000 people each wait 30 seconds for a timeout. This makes your site look horrible to the user, and they find somewhere else to query for these widgets. The goal is that if something is constantly failing, you stop calling it and notify administrators and users.
The Circuit Breaker pattern defines a maximum number of failures (the threshold) that can occur before the circuit is tripped. Let's say we want to stop connecting to Google after 5 connections in a row have failed. From then on we can immediately return a response to our users, letting them know the service is down or that their query has been stored for processing at a later time. An informed user is a happy user; even if something doesn't work, they like to know about it immediately. After 10 minutes or so we want to let ONE connection through to test the waters and see whether Google is still down, so we open the circuit "halfway". If that connection fails, we reopen the circuit and wait for the next timeout. Once a connection is successful again, we close the circuit and operations resume as normal. Whenever a circuit opens, we want to alert our operations group or the site admin so they know there is a problem with this service.
This allows you to automatically shut off and reinstate services when failures occur. It's a very powerful concept for increasing the stability of your system.
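To make the state changes concrete, here is a minimal in-memory sketch. It is written in Python purely for illustration and is not taken from any existing library; the class name, method names, and the injectable `clock` parameter are all assumptions. The defaults mirror the numbers used above (5 consecutive failures, roughly 10 minutes before a trial call).

```python
import time


class CircuitBreaker:
    """Illustrative in-memory breaker: closed -> open -> half -> closed or open."""

    def __init__(self, threshold=5, timeout=600, clock=time.monotonic):
        self.threshold = threshold    # consecutive failures before tripping
        self.timeout = timeout        # seconds to stay open before one trial call
        self.clock = clock            # injectable so time can be faked in tests
        self.state = "closed"
        self.failure_count = 0
        self.last_open_time = None

    def is_closed(self):
        """Return True if the caller may attempt the external call."""
        if self.state == "closed":
            return True
        if self.state == "open" and self.clock() - self.last_open_time >= self.timeout:
            self.state = "half"       # let exactly one trial call through
            return True
        return False                  # open, or a trial call is already in flight

    def success(self):
        """A call worked: close the circuit and reset the failure count."""
        self.state = "closed"
        self.failure_count = 0

    def fail(self):
        """A call failed: count it, and trip the circuit if warranted."""
        self.failure_count += 1
        if self.state == "half" or self.failure_count >= self.threshold:
            self.state = "open"
            self.last_open_time = self.clock()
```

A caller would check `is_closed()` before each request and report the outcome with `success()` or `fail()`. Note that this sketch keeps its state in one process; sharing it across web servers needs a shared store.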
I've created a very simple sample Circuit Breaker class in PHP that can be used for this purpose. It currently assumes a MySQL database to store this information, but you can replace it with memcached or another in-memory caching option for increased performance. You can find all the code and samples here: http://code.google.com/p/plushcode/source/browse/#svn/trunk/stability_patterns/circuit_breaker
The schema for the table is listed below:
CREATE TABLE IF NOT EXISTS `circuit_breaker` (
`id` int(5) NOT NULL auto_increment,
`app_id` varchar(255) NOT NULL,
`failure_count` int(5) NOT NULL,
`threshold` int(5) NOT NULL,
`state` enum('open','closed','half') NOT NULL,
`timeout` int(5) NOT NULL COMMENT 'timeout in seconds',
`last_open_time` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
Let's go through the fields:
app_id - Each external resource you have (including databases!) should get its own unique application id. Something such as "google_widgets" works fine.
failure_count - Each time a failure occurs, this is incremented by 1. Once a successful call has been made, it is reset to 0.
threshold - The number of times a service can fail before we open the circuit and stop all processing.
state - The current state of the service. It's either closed (normal), open (failure occurred) or half (we're trying one more time).
timeout - The number of seconds to wait before attempting a connection again.
last_open_time - A timestamp of when the circuit was last opened, which would indicate a failure.
One of the nice aspects to having this in the database is that you can easily monitor all of your external services from one site. All you have to do is query the table and you can see exactly what the health of your system is. Here is a quick and dirty sample report.
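A health check along those lines might look like the following sketch. It uses Python's built-in `sqlite3` with an in-memory table instead of MySQL so that it runs on its own; the column types are adapted for SQLite, and the rows (including the `yahoo_feed` and `billing_db` ids) are made up for the example. The `health_report` function name is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE circuit_breaker (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        app_id TEXT NOT NULL,
        failure_count INTEGER NOT NULL,
        threshold INTEGER NOT NULL,
        state TEXT NOT NULL CHECK (state IN ('open', 'closed', 'half')),
        timeout INTEGER NOT NULL,
        last_open_time TEXT NOT NULL
    )
    """
)

# Made-up sample rows: one tripped service and two healthy ones.
conn.executemany(
    "INSERT INTO circuit_breaker "
    "(app_id, failure_count, threshold, state, timeout, last_open_time) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("google_widgets", 5, 5, "open", 600, "2009-07-01 12:00:00"),
        ("yahoo_feed", 0, 5, "closed", 600, "2009-06-20 08:30:00"),
        ("billing_db", 2, 3, "closed", 60, "2009-06-28 17:45:00"),
    ],
)


def health_report(conn):
    """Return (app_id, state, failure_count, threshold) rows, unhealthy first."""
    return conn.execute(
        """
        SELECT app_id, state, failure_count, threshold
        FROM circuit_breaker
        ORDER BY (state = 'closed'), failure_count DESC, app_id
        """
    ).fetchall()


for app_id, state, failures, threshold in health_report(conn):
    print(f"{app_id:<16} {state:<7} {failures}/{threshold}")
```

Sorting non-closed circuits to the top puts the services that need attention first; the same `SELECT` should carry over to the MySQL table with little change.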
To use this class you simply call it as seen in this test file: http://code.google.com/p/plushcode/source/browse/trunk/stability_patterns/circuit_breaker/test.php
What we're doing is including the CircuitBreaker class and then making sure the circuit is closed, which means everything is A-OK. We make a web services call over to my test server page. If that connection fails, we call $cb->fail(), which tests whether our failure count is over our threshold and, if not, increases it by one. If we are over the threshold, it opens the circuit, and all future calls to isClosed() return false. If the connection is successful, we call $cb->success(), which resets the failure count to 0.
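The call flow described above can be sketched in Python as follows. This is not the contents of test.php: `guarded_call`, the `fallback` argument, and the `CountingBreaker` stand-in (which has no timeout handling and exists only to keep the snippet self-contained) are hypothetical names chosen for illustration.

```python
class CountingBreaker:
    """Hypothetical stand-in: opens after `threshold` consecutive failures."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.failures = 0

    def is_closed(self):
        return self.failures < self.threshold

    def fail(self):
        self.failures += 1

    def success(self):
        self.failures = 0


def guarded_call(cb, fetch, fallback):
    """Run fetch() only while the circuit is closed; otherwise return fallback at once."""
    if not cb.is_closed():
        return fallback          # circuit is open: do not touch the service
    try:
        result = fetch()
    except OSError:              # urllib's URLError and socket timeouts subclass OSError
        cb.fail()
        return fallback
    cb.success()
    return result


def broken_fetch():
    """Simulates an unreachable URL."""
    raise OSError("connection refused")


cb = CountingBreaker(threshold=5)
for attempt in range(7):
    # The first 5 attempts call broken_fetch(); after that the circuit is open.
    print(attempt, guarded_call(cb, broken_fetch, "service unavailable"))
```

The point of the wrapper is that once the circuit is open, the user gets the fallback immediately instead of waiting for another timeout.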
To test this out:
- create a test database
- import the schema into the new database
- insert one record with "myapp" as the app_id, 5 as the threshold and 10 as the timeout value
- paste the test.php code into your own file
- run the code; you should see a var dump of some XML data, which indicates success
- now change the URL in the file_get_contents line to a URL that does not exist
- refresh your page repeatedly; you will see the connections fail and then the circuit trip. After 10 seconds it opens up halfway, does one test, then reopens for another 10 seconds


Comments
I was also thinking that this should be applied to your internal service dependencies as well, as in an SOA.
Sam
One solution is to manage your local resource footprint tightly within your app, i.e. only hold db connections open for as long as you are actually using them. This is extremely difficult to achieve if you are using an off-the-shelf framework.
Alternatively, brush up on good sysadmin practices 101, reduce connection timeouts on external resources to reasonable limits *and* handle connection timeouts properly within your app.
The *wrong* thing to do is to rely absolutely on limited-availability local resources (ie. your database) to gatekeep access to external resources.
Even if you set your timeouts properly your connections can pile up quickly leaving your server out of resources on high traffic sites. This implementation is defined as a circuit breaker. Google: "circuit breaker pattern"
The only issue I have with this implementation is that it relies on an external resource, your MySQL database, which, as you mention in the article, should also be covered by a circuit breaker.
A better option may be to use a cache of some sort, even if it is just a flat file. Then you won't be reliant on another service.
Thoughts?
I must admit that I did have a couple of problems with the implementation though. One is that the state store is hard-coded in the class, but the bigger problem is that I'm a bit slow and keep forgetting whether a circuit should be 'open' or 'closed'.
So I thought I would have a go at implementing the pattern with a storage adapter and what I hope is a simpler use case. You can find the description and code I came up with here:
http://leighmakewell.blogspot.com/2009/07/pattern-heavy-circuit-breaker-pattern.html
Please let me know what you think. :)