Getting started with CouchDB : meet PHP on Couch

September 9, 2010


The setup

I will not detail the installation of the CouchDB server : the CouchDB wiki covers it well enough.
For the rest of this article I will assume that we have a running CouchDB server waiting for our queries on couch.example.com, port 5984 (the default CouchDB port). Setting up PHP on Couch is not really complicated : just download it from github.com and extract the archive. The interesting bits are in the lib folder.

Overview of the library

The library is made of five files.

couch.php

This is the low-level class that implements communication with the CouchDB server. One nice point is that, if your PHP installation has the curl module, the class will automatically use it.

couchClient.php

This is the most important file of the library, defining all the high-level methods the developer will need to interact with CouchDB. This file also defines a specific exception, couchException.

couchDocument.php

This one provides Object-Document mapping : the CouchDB document is translated into a PHP object, with nice getter and setter properties.

couchReplicator.php

The goal of this class is to simplify the process of starting a replication between two CouchDB databases.

couchAdmin.php

Last but not least, the couchAdmin class allows the developer to manage CouchDB security features : adding/removing users/admins, applying roles, …

Other than those files, the only requirement is for your PHP flavour to support the json_encode/json_decode functions, included since version 5.2.0.

What you need to know on CouchDB

CouchDB is a complex piece of software : in this chapter I’ll try to explain the things you have to know before you start using it.

JSON

CouchDB represents its data in the JSON format. Most of you already know this notation, as it’s heavily used in JavaScript, and because PHP has shipped JSON encoding and decoding functions for a long time. And that’s a good thing, because it means Couch documents can be translated to PHP objects, and vice-versa.
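To make this translation concrete, here is a minimal sketch using only PHP's built-in json_decode and json_encode functions. The JSON string is sample data written for the illustration, not something fetched from a server.

```php
<?php
// A Couch-style document written as a JSON string (sample data).
$json = '{"_id":"document1","type":"article","published":true,"tags":["story","couch","php"]}';

// JSON text -> PHP object (a stdClass instance by default).
$doc = json_decode($json);
echo $doc->type, "\n";        // article
echo count($doc->tags), "\n"; // 3

// PHP object -> JSON text again, after changing one property.
$doc->published = false;
$encoded = json_encode($doc);
echo $encoded, "\n";
```

JSON objects become stdClass objects and JSON arrays become PHP arrays, which is why the rest of this article manipulates documents as plain PHP objects.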

Databases

A CouchDB server contains databases. A database is a container for documents. Rights management can be defined at the database level, such as “user X can write to database Y”.

Documents

A document is the main point of interest in a document store ! A Couch document is made of property-value pairs : the key has to be a string, and can’t begin with an underscore. The value can be of any valid JSON type : boolean, number, string, array or object. You could see a document as a PHP object that can’t have methods. A Couch document has three special properties :

_id

This is the unique identifier of the document. Inside a Couch database, two documents can’t have the same _id.

_rev

This is the revision number of the document. When you update a document, CouchDB doesn’t delete the previous version of the document, but writes a new version, incrementing _rev. This is what makes the replication features possible. The _rev property is managed by the Couch server and you can’t override it.

_deleted

This is a flag property that tells you that the document has been deleted from the database, and as such is no longer available.
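As a small illustration of the naming rule, the sketch below separates the special properties from the application-defined ones. The userProperties function is a hypothetical helper written for this article, not part of PHP on Couch, and the _rev value is made up.

```php
<?php
// Hypothetical helper: keep only the application-defined properties of a document.
function userProperties(stdClass $doc) {
    $props = array();
    foreach (get_object_vars($doc) as $name => $value) {
        if (strpos($name, '_') !== 0) { // skip names that start with an underscore
            $props[$name] = $value;
        }
    }
    return $props;
}

// Sample document: the "_rev" value is invented for the example.
$doc = json_decode('{"_id":"document1","_rev":"1-abc","type":"article","published":true}');
print_r(array_keys(userProperties($doc)));
```

Here only type and published are kept, since _id and _rev both start with an underscore.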

Design documents

Design documents are a special kind of document. You know a document is a design document because its _id begins with _design/. Design documents have special (think magic) properties, and are used to implement the computing parts on the database (a.k.a. the cool stuff).

Views

CouchDB views are the way to find your documents other than by their _id. Views are built with a map function, that analyzes each document and exports related information : it creates indexes we can query on. Views can also, when needed, contain a reduce function, that takes the created indexes as input and aggregates them, for example to sum them, or to compute the average, … That’s called a MapReduce algorithm.
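To give an idea of what the two steps do, here is a rough imitation written in plain PHP over an in-memory array. It is only a sketch of the idea : CouchDB runs its own map and reduce functions on the server, and the documents below are sample data.

```php
<?php
// Sample documents (illustrative data, not taken from a real database).
$docs = array(
    array("_id" => "document1", "type" => "article"),
    array("_id" => "comment1",  "type" => "comment"),
    array("_id" => "comment2",  "type" => "comment"),
    array("_id" => "misc1"), // no "type" : the map step emits nothing for it
);

// Map step : emit one (key, value) row per document that has a type.
$rows = array();
foreach ($docs as $doc) {
    if (isset($doc["type"])) {
        $rows[] = array("id" => $doc["_id"], "key" => $doc["type"], "value" => 1);
    }
}

// Reduce step : sum the emitted values for each key.
$counts = array();
foreach ($rows as $row) {
    if (!isset($counts[$row["key"]])) {
        $counts[$row["key"]] = 0;
    }
    $counts[$row["key"]] += $row["value"];
}
print_r($counts);
```

The map step produced three rows (misc1 emitted nothing), and the reduce step collapsed them into one count per type : 1 article and 2 comments.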

Simple things first

As a developer, the first need regarding a database is to store and retrieve data. Let’s see how to do it with CouchDB.

Enabling CouchDB access in a PHP script

In any PHP script involving a connection to a CouchDB server, you’ll need to :

  • Include the library files
  • Instantiate the client object, that represents your connection to the Couch database

Here is the corresponding code, creating a client connection to the database testdb:

require_once "couch.php";
require_once "couchClient.php";
require_once "couchDocument.php";

$client = new couchClient("http://couch.example.com:5984/","testdb");

We created the $client object, pointing to couch.example.com, port 5984, and using the database testdb. From now on, $client will always refer to this object : I will not rewrite the connection string every time.

Creating the database

The methods databaseExists and createDatabase allow easy database creation. In the PHP on Couch library, nearly all methods can throw exceptions : you should use the try/catch blocks to properly check errors.

if ( !$client->databaseExists() ) {
	try {
		$client->createDatabase();
	} catch ( Exception $e ) {
		die("Unable to create the database : ".$e->getMessage());
	}
}
echo "From here I'm sure the database exists !";

Storing a document

Storing a document is only a matter of calling the storeDoc method, passing it an object as a parameter.

$doc = new stdClass();
$doc->_id = "document1"; // the unique id of the document

// let's add some other properties
$doc->type = "article";
$doc->published = true;
$doc->tags = array ( "story", "couch", "php" ) ;

try {
	$client->storeDoc($doc);
} catch ( Exception $e ) {
	die("Unable to store the document : ".$e->getMessage());
}

Getting a document

To get a document, knowing its unique identifier, just call the getDoc method, passing the unique id as the argument. In the following script, I get back the document1 document I just recorded :

try {
	$doc = $client->getDoc("document1");
} catch ( Exception $e ) {
	die("Unable to get back the document : ".$e->getMessage());
}
echo $doc->type ; // that should print "article".

What if the document does not exist ? In this case, an exception is thrown, and the exception code is 404… You know it, THE HTTP 404 “Not found” code !

try {
	$doc = $client->getDoc("document10");
	echo "I found document10.";
} catch ( Exception $e ) {
	if ( $e->getCode() == 404 ) {
		echo "document10 does not exist in the database.";
	} else {
		die("Unable to get document10 : ".$e->getMessage());
	}
}
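If this pattern comes up often, it can be wrapped in a small function. The sketch below is one possible way to do it : getDocOrNull is a hypothetical helper, and FakeClient is a stand-in written only so the snippet runs without a server (it assumes, as described above, that getDoc throws an exception whose code is the HTTP status). With the real library you would pass the $client object instead.

```php
<?php
// Stand-in for couchClient, for illustration only.
class FakeClient {
    private $docs;
    public function __construct(array $docs) {
        $this->docs = $docs;
    }
    public function getDoc($id) {
        if (!isset($this->docs[$id])) {
            throw new Exception("Not Found", 404);
        }
        return $this->docs[$id];
    }
}

// Hypothetical helper : return the document, or null when it does not exist.
function getDocOrNull($client, $id) {
    try {
        return $client->getDoc($id);
    } catch (Exception $e) {
        if ($e->getCode() == 404) {
            return null;
        }
        throw $e; // any other error is a real problem : let it bubble up
    }
}

$stored = new stdClass();
$stored->_id = "document1";
$stored->type = "article";
$fake = new FakeClient(array("document1" => $stored));

var_dump(getDocOrNull($fake, "document1") !== null); // bool(true)
var_dump(getDocOrNull($fake, "document10"));         // NULL
```

Only the 404 case is turned into null; every other exception is rethrown so that connection or permission problems are not silently hidden.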

Using the couchDocument Object-Document Mapper

The couchDocument class implements a very simple ODM for CouchDB. The following code does the very same thing as the preceding examples, but using the ODM :

Storing a document using the ODM

$doc = new couchDocument($client);
$doc->_id = "document1"; // need to set _id first, otherwise an id is auto-generated by couchDB.
$doc->type = "article";
$doc->published = true;
$doc->tags = array ( "story", "couch", "php" ) ;

No need to call a record method, because couchDocument supports magic getters and setters. So, each time a property is set, the document is updated in the database. One can wonder about performance : fortunately there is a way to record several properties in one go, using the ODM set method.

$doc = new couchDocument($client);
$doc->set( array (
	"_id" => "document1",
	"type" => "article",
	"published" => true,
	"tags" => array ( "story", "couch", "php" )
) );
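To see why the set method matters for performance, here is a toy class that only counts how many writes would happen. It is not the library's code, just a sketch of the magic-setter idea : CountingDocument and its writes counter are names invented for the illustration.

```php
<?php
// Toy class, NOT the library's implementation : it only counts simulated writes.
class CountingDocument {
    public $writes = 0;
    private $fields = array();

    // Magic setter : every single assignment triggers one (simulated) write.
    public function __set($name, $value) {
        $this->fields[$name] = $value;
        $this->writes++;
    }

    public function __get($name) {
        return isset($this->fields[$name]) ? $this->fields[$name] : null;
    }

    // Batch setter : many properties, one (simulated) write.
    public function set(array $values) {
        foreach ($values as $name => $value) {
            $this->fields[$name] = $value;
        }
        $this->writes++;
    }
}

$a = new CountingDocument();
$a->_id = "document1";
$a->type = "article";
$a->published = true;
echo $a->writes, "\n"; // 3

$b = new CountingDocument();
$b->set(array("_id" => "document1", "type" => "article", "published" => true));
echo $b->writes, "\n"; // 1
```

Both objects end up with the same properties, but the first one needed three round trips where the second needed one.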

Getting a document using the ODM

A factory pattern is used to get a couchDocument instance of an existing document : the trick is to call the static method getInstance :

$doc = couchDocument::getInstance($client,"document1");

Working with views

Creating views

The thing to have in mind when building applications using CouchDB is : you have to know what sort of queries you’ll issue to the database and implement them in the database, not in the PHP code. Let me explain a little deeper : in an SQL world, you create a table to store some data. Because you know the table has a field “name” which is a varchar, you know you’ll be able to issue queries from PHP, asking “give me the rows having name = ‘foo’” or “give me the rows having name beginning with ‘foo’”. When performance decreases, you set up some indexes and you’re done.
In a document store world, documents are unstructured : that means that document1 will have a property “name” while document3 will not, and that the property “name” can be a string in document1 and a boolean in document2, … To get the benefits of this type of store, you have to tell the database what your documents contain and what information can be extracted from them. That’s the goal of the map functions, implemented in the views of CouchDB design documents. Views can be coded in a wide variety of languages but, out of the box, CouchDB expects… JavaScript. And that’s a good thing : the vast majority of developers have already, one day or another, dealt with JavaScript.

Some concrete code

Let’s take an example : I want a view that exports the “type” property of my documents, so that later on I can ask CouchDB to give me all documents of, for example, type “article”. I will create a view function in JavaScript. It’s very important to understand that this function will be executed on the CouchDB server, and not in a browser. The function takes the document as a parameter, and uses a special emit function to export the indexes I need. This is a really simple example : I test whether my document has a type property, and if so, I emit the property’s value.

function (doc) {
	if ( doc.type ) {
		emit (doc.type, null);
	}
}

I will now create a design document called app1 containing a view called by-type. The magic property of the design document to create views is called views, and the function is a map function.

$doc = new couchDocument($client);
$doc->_id = "_design/app1";

//preparing the views object
$views = new stdClass();
$views->{"by-type"} = array (
	"map" => "function (doc) {
		if ( doc.type ) {
			emit (doc.type, null);
		}
	}"
);
$doc->views = $views;

The design document, once recorded in CouchDB, should look like :

{
	"_id": "_design/app1",
	"views": {
		"by-type": {
			"map": "function (doc) {\n		if ( doc.type ) {\n			emit (doc.type, null);\n		}\n	}"
		}
	}
}

A little more fun

I will now write a view that exports every tag of my document if the document type is article :

$doc = couchDocument::getInstance($client,"_design/app1");
$views = $doc->views;

$jsfunc = "function (doc) {
	if ( doc.type && doc.type == \"article\" && doc.tags ) {
		for ( var index in doc.tags ) {
			emit(doc.tags[index],null);
		}
	}
}";

$views->{"by-tag"} = array("map"=>$jsfunc);
$doc->views = $views;

This JavaScript function is a little more complicated than the preceding one. For people with really no JavaScript background, the loop for ( var index in doc.tags ) is equivalent to the PHP loop foreach ( $doc->tags as $index => $value ). What is interesting here is that a map function can emit several indexes for the same document.

Query a view

To query a view, we have to use the getView method, with the parameters telling the design document and the view name :

$view = $client->getView("app1","by-tag");
print_r($view);

/* should print something like :

stdClass Object (
	[total_rows] => 3
	[offset] => 0
	[rows] => Array (
		[0] => stdClass Object (
			[id] => document1
			[key] => couch
			[value] => null
		)
		[1] => stdClass Object (
			[id] => document1
			[key] => php
			[value] => null
		)
		[2] => stdClass Object (
			[id] => document1
			[key] => story
			[value] => null
		)
	)
)
*/

A view response is made of three main properties. The total_rows integer gives the number of rows available in the view : it does not depend on the selection criteria (we’ll see that later). The offset integer gives the offset between the first row of the view and the first row returned for our selection criteria (again we’ll see that later). Finally, the rows array contains, for each index emitted by the emit function, the unique identifier of the document that emitted the index, the emitted key, and the emitted value (the second parameter of the JavaScript emit function, that we set to null for now). One thing to note is that the index has been sorted alphabetically. Another is that the emit call used in the JavaScript function emits the index AND associates it with the id of the document that emitted it.
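Since the response is a plain PHP object, reading it is an ordinary loop. The sketch below uses a hand-written response with the same shape as the one printed above (a real one would come from getView) and groups the document ids by tag. The $idsByTag variable is just a name chosen for the example.

```php
<?php
// Hand-written response, same shape as a by-tag view result (sample data).
$view = json_decode('{
    "total_rows": 3,
    "offset": 0,
    "rows": [
        {"id": "document1", "key": "couch", "value": null},
        {"id": "document1", "key": "php",   "value": null},
        {"id": "document1", "key": "story", "value": null}
    ]
}');

// Group the ids of the emitting documents by emitted key.
$idsByTag = array();
foreach ($view->rows as $row) {
    $idsByTag[$row->key][] = $row->id;
}

echo $view->total_rows, " rows in the view\n";
print_r($idsByTag);
```

Each row carries the id of the document that emitted it, so a single query is enough to know which documents carry which tag.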

Query parameters

CouchDB supports several query parameters, that are implemented as chainable methods in the PHP on Couch library. I will show you some of them now :

// limit the result rows to 1 row
$view = $client->limit(1)->getView("app1","by-tag");
//   this previous line is equivalent to :
// $client->limit(1);
// $view = $client->getView("app1","by-tag");
//   that's why the "limit" method is called a chainable method
print_r($view);
/* should print something like :
stdClass Object (
	[total_rows] => 3
	[offset] => 0
	[rows] => Array (
		[0] => stdClass Object (
			[id] => document1
			[key] => couch
			[value] => null
		)
	)
)
*/

// sort in descending order and limit the result rows to 1 row
$view = $client->limit(1)->descending(true)->getView("app1","by-tag");
print_r($view);
/* should print something like :
stdClass Object(
	[total_rows] => 3
	[offset] => 0
	[rows] => Array (
		[0] => stdClass Object (
			[id] => document1
			[key] => story
			[value] => null
		)
	)
)
*/

// sort in descending order, limit the result rows to 1 row and include the document in the response
$view = $client->limit(1)->descending(true)->include_docs(true)->getView("app1","by-tag");
print_r($view);
/* should print something like :
stdClass Object(
	[total_rows] => 3
	[offset] => 0
	[rows] => Array (
		[0] => stdClass Object (
			[id] => document1
			[key] => story
			[value] => null
			[doc] => stdClass Object (
				[_id] => document1
				[type] => article
				...
			)
		)
	)
)
*/

// only select rows of documents having emitted the key "php"
$view = $client->key("php")->getView("app1","by-tag");
print_r($view);
/* should print something like :
stdClass Object(
	[total_rows] => 3
	[offset] => 1
	[rows] => Array (
		[0] => stdClass Object (
			[id] => document1
			[key] => php
			[value] => null
		)
	)
)
*/

// only select rows of documents having emitted a key that is alphabetically equal or greater than "php"
$view = $client->startkey("php")->getView("app1","by-tag");
print_r($view);
/* should print something like :
stdClass Object(
	[total_rows] => 3
	[offset] => 1
	[rows] => Array (
		[0] => stdClass Object (
			[id] => document1
			[key] => php
			[value] => null
		)
		[1] => stdClass Object (
			[id] => document1
			[key] => story
			[value] => null
		)
	)
)
*/

The most used query parameter in views is obviously include_docs(true). It’s also worth noting that, no matter how many results I get in the rows array, the total_rows integer is always 3 : that’s because it’s the total number of rows (= emitted indexes) in this view, no matter what query parameters have been set.
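The following sketch imitates, in memory, what the key, startkey and limit parameters select, to show that total_rows does not move. It is a simplification under stated assumptions : queryView is a hypothetical function, keys are plain strings compared with strcmp, and CouchDB's real collation rules are richer than this.

```php
<?php
// Simplified, in-memory imitation of a view query (string keys only).
function queryView(array $sortedRows, array $params) {
    $selected = array();
    foreach ($sortedRows as $row) {
        if (isset($params["key"]) && $row["key"] !== $params["key"]) {
            continue; // "key" keeps exact matches only
        }
        if (isset($params["startkey"]) && strcmp($row["key"], $params["startkey"]) < 0) {
            continue; // "startkey" drops everything sorted before it
        }
        $selected[] = $row;
    }
    if (isset($params["limit"])) {
        $selected = array_slice($selected, 0, $params["limit"]);
    }
    // total_rows counts every row of the view, whatever the parameters are.
    return array("total_rows" => count($sortedRows), "rows" => $selected);
}

$rows = array(
    array("id" => "document1", "key" => "couch"),
    array("id" => "document1", "key" => "php"),
    array("id" => "document1", "key" => "story"),
);

$r1 = queryView($rows, array("limit" => 1));
$r2 = queryView($rows, array("startkey" => "php"));
echo count($r1["rows"]), " of ", $r1["total_rows"], "\n"; // 1 of 3
echo count($r2["rows"]), " of ", $r2["total_rows"], "\n"; // 2 of 3
```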

View collation (aka where is my join ?)

You’ve seen it : Couch documents have little in common with SQL tables. So there is no formal way of doing associations between documents. However, the MapReduce algorithm used to build views, once well understood, is really powerful. To illustrate the view collation pattern, I will add to the database two documents of type “comment” and create a view to retrieve both the story and the comments.
First of all, let’s add the two comments. By convention, I will set the type property to comment and create an article property, with the value document1, to indicate that the comments are related to the article whose _id is document1.

$doc = new couchDocument($client);
$doc->set( array (
	"_id"=> "comment1",
	"type"=>"comment",
	"article"=>"document1",
	"body"=>"The first comment"
));
$doc = new couchDocument($client);
$doc->set( array (
	"_id"=> "comment2",
	"type"=>"comment",
	"article"=>"document1",
	"body"=>"Another one"
));

Now the map function : we have to test the kind of document, and export indexes as a two-element array : the first element being the article id (found in the _id property if the document is an article, and in the article property if the document is a comment), and the second one being the type of document (for easier parsing).

function (doc) {
	if ( !doc.type ) {
		return ;
	}
	if ( doc.type == "article" ) {
		emit( [ doc._id, doc.type ], null);
	} else if ( doc.type == "comment" ) {
		emit( [ doc.article, doc.type ],null);
	}
}

Let’s add this map function to the app1 design document, and name it article-full (because I have no better idea).

$doc = couchDocument::getInstance($client,"_design/app1");
$views = $doc->views;

$jsfunc = "function (doc) {
	if ( !doc.type ) {
		return ;
	}
	if ( doc.type == \"article\" ) {
		emit( [ doc._id, doc.type ], null);
	} else if ( doc.type == \"comment\" ) {
		emit( [ doc.article, doc.type ],null);
	}
}";

$views->{"article-full"} = array("map"=>$jsfunc);
$doc->views = $views;

If I query the view, I can get back, in one go, the article and its associated comments. In the response I won’t include the complete doc display for readability reasons.

$view = $client->include_docs(true)->getView("app1","article-full");
print_r($view);
/* should print something like :
stdClass Object(
	[total_rows] => 3
	[offset] => 0
	[rows] => Array (
		[0] => stdClass Object (
			[id] => document1
			[key] => Array (
				[0] => document1
				[1] => article
			)
			[value] => null
			[doc] => stdClass Object (
				[_id] => document1
				(...)
			)
		)
		[1] => stdClass Object (
			[id] => comment1
			[key] => Array (
				[0] => document1
				[1] => comment
			)
			[value] => null
			[doc] => stdClass Object (
				[_id] => comment1
				(...)
			)
		)
		[2] => stdClass Object (
			[id] => comment2
			[key] => Array (
				[0] => document1
				[1] => comment
			)
			[value] => null
			[doc] => stdClass Object (
				[_id] => comment2
				(...)
			)
		)
	)
)
*/
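Once such a response is in hand, rebuilding the article with its comments is a simple loop over the rows. The sketch below works on hand-written sample rows shaped like an article-full response with include_docs (a real one would come from getView). The $article and $comments variables are names chosen for the example.

```php
<?php
// Sample rows shaped like the "article-full" response (illustrative data).
$view = json_decode('{
    "rows": [
        {"id": "document1", "key": ["document1", "article"],
         "doc": {"_id": "document1", "type": "article"}},
        {"id": "comment1", "key": ["document1", "comment"],
         "doc": {"_id": "comment1", "type": "comment", "body": "The first comment"}},
        {"id": "comment2", "key": ["document1", "comment"],
         "doc": {"_id": "comment2", "type": "comment", "body": "Another one"}}
    ]
}');

$article = null;
$comments = array();
foreach ($view->rows as $row) {
    // key[1] holds the document type emitted by the map function
    if ($row->key[1] === "article") {
        $article = $row->doc;
    } else if ($row->key[1] === "comment") {
        $comments[] = $row->doc;
    }
}

echo $article->_id, " has ", count($comments), " comments\n"; // document1 has 2 comments
```

This is where emitting the type as the second element of the key pays off : the PHP code does not have to inspect each document to know what it is.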

One more thing : in the present example we only have one article. However, in a real world situation, with many articles in the database, this view will return all articles and all associated comments. To limit the output to one article, we should use the startkey and endkey query parameters. The collation specification (i.e. how you can set the startkey and endkey to only include what you need) is described in the CouchDB wiki. In our case, we set the startkey to an array having only one element (the article id) and the endkey to an array with two elements : the article id and an empty array. Arrays being ordered after strings in the CouchDB collation, our interval will include all the keys we need, and only these. That gives :

$view = $client
		->startkey( array("document1") )
		->endkey( array("document1", array()) )
		->include_docs(true)->getView("app1","article-full");
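To check that this interval really captures the keys we want, here is a simplified comparison function written in PHP. It is a sketch under stated assumptions : compareKeys is a hypothetical function, keys are only strings or arrays, and strings are compared with strcmp rather than with CouchDB's real string collation.

```php
<?php
// Simplified key comparison : returns -1, 0 or 1.
function compareKeys($a, $b) {
    $aIsArray = is_array($a);
    $bIsArray = is_array($b);
    if ($aIsArray !== $bIsArray) {
        return $aIsArray ? 1 : -1; // any array sorts after any string
    }
    if (!$aIsArray) {
        $c = strcmp($a, $b);
        return $c < 0 ? -1 : ($c > 0 ? 1 : 0);
    }
    $n = min(count($a), count($b));
    for ($i = 0; $i < $n; $i++) {
        $c = compareKeys($a[$i], $b[$i]); // element by element
        if ($c !== 0) {
            return $c;
        }
    }
    // identical prefix : the shorter array sorts first
    if (count($a) === count($b)) {
        return 0;
    }
    return count($a) < count($b) ? -1 : 1;
}

$start = array("document1");
$end   = array("document1", array());
foreach (array("article", "comment") as $type) {
    $key = array("document1", $type);
    $inside = compareKeys($start, $key) <= 0 && compareKeys($key, $end) <= 0;
    echo $type, " : ", $inside ? "inside" : "outside", "\n";
}
```

Both the article key and the comment keys fall inside the interval, while a key starting with another article id would fall outside because its first element already compares differently.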

Conclusion

I hope you liked this first look at the CouchDB database. Couch has many other features, but documents and views are the central part of the system : once those are well understood, the rest is easy to grasp. The most difficult part, as said in the introduction, is to forget years of SQL thinking to build web applications.
