Revisiting charts: NodeJS, Twitter and Elasticsearch

Some time ago, I developed a mashup using NodeJS and Couchbase to demonstrate how to use Couchbase to create stats and charts. Although I still use Couchbase for many things on a day-to-day basis, it does not provide a comfortable way to perform free-text search, so I moved to ElasticSearch for some projects. In this series of posts I show how to create a very basic NodeJS application that gets tweets from the Twitter stream, stores them in ElasticSearch and builds some charts from them.

First things first: you have to install ElasticSearch. If you are using a Mac and Homebrew, just install the following formula:

$ brew install elasticsearch

And then, run ElasticSearch as you prefer. Example:

$ /usr/local/bin/elasticsearch -f -D es.config=/usr/local/opt/elasticsearch/config/elasticsearch.yml

After a few seconds, you will get something like this on your terminal:

[2014-01-13 20:55:08,290][INFO ][node ] [Rhiannon] {0.20.2}[36723]: initializing ...
[2014-01-13 20:55:08,309][INFO ][plugins ] [Rhiannon] loaded [], sites []
[2014-01-13 20:55:17,488][INFO ][node ] [Rhiannon] {0.20.2}[36723]: initialized
[2014-01-13 20:55:17,526][INFO ][node ] [Rhiannon] {0.20.2}[36723]: starting ...
[2014-01-13 20:55:17,967][INFO ][transport ] [Rhiannon] bound_address {inet[/]}, publish_address {inet[/]}
[2014-01-13 20:55:21,391][INFO ][cluster.service ] [Rhiannon] new_master [Rhiannon][pDxyYfeqSo2Mf1WWNFzwJA][inet[/]], reason: zen-disco-join (elected_as_master)
[2014-01-13 20:55:21,529][INFO ][discovery ] [Rhiannon] elasticsearch_apolion/pDxyYfeqSo2Mf1WWNFzwJA
[2014-01-13 20:55:21,582][INFO ][http ] [Rhiannon] bound_address {inet[/]}, publish_address {inet[/]}
[2014-01-13 20:55:21,583][INFO ][node                     ] [Rhiannon] {0.20.2}[36723]: started

This means that ElasticSearch is running. Now, point your browser to http://localhost:9200 and you should get something similar to:

{
  "ok" : true,
  "status" : 200,
  "name" : "Rhiannon",
  "version" : {
    "number" : "0.20.2",
    "snapshot_build" : false
  },
  "tagline" : "You Know, for Search"
}

So you know you can start using ElasticSearch. Before starting, though, you’ll need to install the HEAD plug-in to make your life easier with ES. Quit ElasticSearch by pressing Ctrl+C in the terminal and type this command, which installs the plugin:

/usr/local/bin/elasticsearch/bin/plugin -install mobz/elasticsearch-head

Then, run ES again and browse to http://localhost:9200/_plugin/head/. You will see the plugin’s overview page, indicating that the ElasticSearch cluster is running and ready to store your documents.

Now is when the fun begins. Clone the project on your local machine. Once you have cloned it, go inside the directory and run:

$ npm install
$ node app.js

Before doing anything else, edit the file named keys.js and update it with your Twitter tokens. You can get them from Twitter’s developer site.

Now go to http://localhost:3000, where you will see a simple website based on Twitter Bootstrap. Click on the Stream button on the navigation bar and then on the blue button that says “Start Streaming”.


By clicking on the button, a connection to Twitter’s Streaming API will be made and you will start receiving tweets. These will appear on the page above.

Now look at the code, in the file named app.js. Each tweet received is saved in the callback named saveOnElasticSearch and then emitted to the front-end. The interesting part here is how the documents are saved, using the following call:

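  // `client` is assumed here to be the ElasticSearch client instance created in app.js
  client.index({
    index: 'tweets',
    type: 'tweet',
    id: myKey,
    body: aTweet
  }, function (err, resp) {
    // check err and resp here
  });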

Go now again to http://localhost:9200/_plugin/head/ and check that a new index has been created and the documents have been saved into it.

And that’s it? No CREATE DATABASE? No schema? No nothing?

Well… Yes and no. The index (database) is created automatically and the mapping (schema) is derived from the documents that you just inserted. Getting the mapping right is a lot trickier in general, but this will work for this demo (you probably noticed the parsing of the date… that’s one of the tricky parts). By default, ElasticSearch indexes every single field of every document in the index, which allows you to query by any of them.
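The date parsing mentioned above can be sketched roughly as follows. This is a minimal sketch and not the code from app.js: the helper names `parseTwitterDate` and `buildIndexRequest` are hypothetical, it assumes `created_at` arrives in a format like "Mon Jan 13 20:55:08 +0000 2014", and it uses the tweet's `id_str` as the document id, which may differ from how `myKey` is really computed.

```javascript
// Hypothetical helpers; the names and the choice of id_str as key are assumptions.
const MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'];

// Turns "Mon Jan 13 20:55:08 +0000 2014" into a Date (UTC offsets only).
function parseTwitterDate(text) {
  const m = /^\w{3} (\w{3}) (\d{2}) (\d{2}):(\d{2}):(\d{2}) \+0000 (\d{4})$/.exec(text);
  if (!m || MONTHS.indexOf(m[1]) === -1) {
    throw new Error('Unexpected date format: ' + text);
  }
  return new Date(Date.UTC(Number(m[6]), MONTHS.indexOf(m[1]), Number(m[2]),
                           Number(m[3]), Number(m[4]), Number(m[5])));
}

// Builds the parameters for an index call, with created_at normalised to
// ISO 8601 so that it can be mapped as a date.
function buildIndexRequest(tweet) {
  const body = Object.assign({}, tweet, {
    created_at: parseTwitterDate(tweet.created_at).toISOString()
  });
  return { index: 'tweets', type: 'tweet', id: tweet.id_str, body: body };
}

const request = buildIndexRequest({
  id_str: '1',
  text: 'hello',
  created_at: 'Mon Jan 13 20:55:08 +0000 2014'
});
console.log(request.body.created_at);
```

Normalising the date before indexing keeps the histogram field in a single, predictable format.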

One of the most interesting features of ElasticSearch is facets, which we are going to use to create the chart on the stats page. Facets are aggregations computed over the data: something similar to asking the database to calculate an average, a median… and many other things, such as a histogram. Read more about facets in the docs.

It is the histogram facet that we are going to use. Check the file named routes/tweets.js and see this function:

  // `client` is assumed here to be the ElasticSearch client instance
  client.search({
    index: 'tweets',
    size: 0,
    body: {
      "query": {
        "match_all": {}
      },
      "facets": {
        "histo1": {
          "date_histogram": {
            "field": "created_at",
            "interval": "minute"
          }
        }
      },
      "size": 0
    }
  }, function (error, response) {
    res.render('stats', {"title": "Stats", "data": response.facets.histo1.entries});
  });

What we are doing is using the search method of the ES client to query the database. We don’t want any results, so size = 0. But we want the histogram facet and we want it to count the tweets created at each minute. Check the docs for more info.
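To make the per-minute counting more concrete, here is a small plain-JavaScript sketch of what such a histogram computes. It does not call ElasticSearch at all; the function name `histogramByMinute` and the `{ time, count }` entry shape are assumptions made for illustration.

```javascript
// Groups timestamps (in milliseconds) into one-minute buckets and counts
// how many fall into each bucket.
function histogramByMinute(timestamps) {
  const MINUTE = 60 * 1000;
  const counts = new Map();
  for (const ts of timestamps) {
    const bucket = Math.floor(ts / MINUTE) * MINUTE;
    counts.set(bucket, (counts.get(bucket) || 0) + 1);
  }
  // Return the buckets ordered by time.
  return Array.from(counts.entries())
    .sort((a, b) => a[0] - b[0])
    .map(([time, count]) => ({ time: time, count: count }));
}

const base = Date.UTC(2014, 0, 13, 20, 55, 0);
const entries = histogramByMinute([base + 1000, base + 59000, base + 61000]);
console.log(entries);
```

The first two timestamps fall in the same minute and the third in the next one, so the sketch yields two buckets.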

The result of the ElasticSearch query is sent to a template and rendered as a JavaScript variable that is later parsed by amcharts (see routes/tweets.js and public/js/templates/stats.ejs).

And that’s it. In following posts I will explain how to perform more advanced queries against ElasticSearch by adding more expressions to the “query” object (that is, using something other than this “match_all”).

Posted in Uncategorized | Comments Off on Revisiting charts: NodeJS, Twitter and Elasticsearch

Building Freeling Java API on Mac OS X

The next step in my Freeling adventure was to build the Java API. It comes together with Freeling if you checked out the SVN repo, but the Makefile there is totally incorrect for the Mac. If you followed the instructions in my last post, you should change the provided Makefile to match these contents:

Then, just run

make install

And you are good to go.


Building Freeling in Mac OS X

I’m involved in an NLP project (Natural Language Processing, not the Neuro-linguistic Programming thing) and I had to use Freeling. After some days of going back and forth with the installation instructions in the forums, I managed to build it. Here’s my recipe.

First, install MacPorts. I tried Homebrew, but it just does not work. You’ll also need the Xcode command line tools from Apple.

Then, open a Terminal and run:

sudo port install autoconf
sudo port install db46
sudo port install automake
sudo port install libtool
sudo port deactivate boost    # optional, only if you already have boost installed on your system
sudo port install subversion  # only if you have never used SVN before

Edit /opt/local/etc/macports/sources.conf and add this line just before the “rsync” line

mkdir /Users/Shared/dports
cd /Users/Shared/dports

Now install the specific version of boost by checking out its revision (1.49; any other version will prevent Freeling from building). Then, index your new repo and install boost:

svn co --revision 90293 devel/boost
 portindex /Users/Shared/dports
 sudo port install boost @1.49.0

Now, checkout Freeling from SVN

svn checkout freeling_src
cd freeling_src
sudo glibtoolize --force
automake -a

I guess something is basically wrong with boost sources from macports, because you have to do this:

cd /opt/local/include/boost
sudo ln -s property_map/property_map.hpp

Lastly, build freeling itself:

env LDFLAGS="-L/opt/local/lib -L/opt/local/lib/db46" CPPFLAGS="-I/opt/local/include -I/opt/local/include/boost -I/opt/local/include/db46" ./configure
sudo make install

I checked out the code from SVN today (June 11th, 2013) and I had to change the configure call a bit. Use this one:

$ env LDFLAGS="-L/opt/local/lib -L/opt/local/lib/db46" CPPFLAGS="-I/opt/local/boost -I/opt/local/include/db46" ./configure --enable-traces --enable-boost-locale CPPFLAGS="-I/opt/local/include -I/opt/local/include/boost -I/opt/local/include/db46" LDFLAGS="-L/opt/local/lib -L/opt/local/lib/db46"

If you copy & paste commands from this post, check the quotes and dashes first, because WordPress tends to replace them.


Basic Couchbase querying for SQL people

This is the second of a series of posts about Couchbase. For the first one, click here.

Probably, you all know how to query a relational database. It is done using SQL, a declarative (non-imperative) language that is widely used to insert and retrieve information from databases.

Taking a users table, a typical SQL query would look like this:

SELECT u.id, u.username, u.first_name, u.last_name, u.last_login
FROM users u
WHERE u.id = "1111";

But… How do I create a query similar to the one above in Couchbase? Well, the answer is: by creating a view.

Download and install Couchbase server on your computer, and go through the set-up process. Then, you’ll need to create some documents. You can download these sample documents and use them in the following examples.

Once you have created your documents, go to the VIEWS tab.

Once there, you’ll see something like this page.

As you have probably never created a view on your system, your views list will be empty. Let’s start by creating the easiest query ever. Click on “Create Development View” and use these names to create the view:

Design Document Name: _design/dev_users
View Name: by_id

And click on Save.

Now, your design document and your view have been created. For now, it is easier to think of a design document as a package of views, and of a view as a query. At least, it was easier for me. You should be seeing a screen like this:

Click on the name of the view. You can create your query in the next page.

Basic Querying – SELECT

There are two text inputs where you can type the code of your view. The one on the left is for the MAP function. The one on the right is for the REDUCE function. You can read a lot more about how map and reduce work in the Couchbase Server Manual.

Basically, you’ll need the map function to return data from your database, and the reduce function to calculate and aggregate the data (like SELECT COUNT(*) in SQL, for instance). In Couchbase, both of them are written in JavaScript, so you don’t need to learn a new tricky language. Let’s focus on retrieving the information, for now.

Leave the textbox on the left with the sample code, like this:

function (doc) {
  emit(doc._id, null);
}

Map functions are used to create an index which is queried every time you ask Couchbase for some information. The “emit” call stands for “add this key/value pair to the index”. With the basic code, we are adding a pair key=doc._id and value=null to the index, which will work for a lot of queries. Save the view and click on Show Results twice. You will see the results of your view like this:

This method is not the most comfortable one to see the results of a view. It works for a few results, but not for big sets of data. I encourage you to open a new browser tab and navigate to this URL:

(Notice the words in bold. These are the names you specified a minute ago)

So you’ll get this:


So… What has happened? By emitting the id of the document, we have SELECTED the documents that the view will retrieve. For each row, you can see an attribute named “key”, which is the first argument of the emit call above, and an attribute named “value”, which is null and corresponds to the second argument of the same emit call.
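The idea of a map function feeding an index can be imitated in a few lines of plain JavaScript. This is only an illustrative sketch, not how Couchbase is implemented: `buildViewIndex` is a hypothetical helper, and `emit` is passed to the map function as a second argument here, whereas in a real view it is a global.

```javascript
// Minimal in-memory imitation of a view index: run the map function over
// every document and collect one row per emit call.
function buildViewIndex(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key: key, value: value }));
  }
  // Rows come back ordered by key (plain string comparison in this sketch).
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

const sampleUsers = [
  { _id: '2222', username: 'bob' },
  { _id: '1111', username: 'alice' },
  { _id: '3333', username: 'charlie' }
];

const byId = buildViewIndex(sampleUsers, function (doc, emit) {
  emit(doc._id, null);
});
console.log(byId);
```

Each row carries the document id plus the key and value that were emitted, which mirrors the shape of the rows described above.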

In SQL, you have done this:

SELECT u.id
FROM users u

Going back to the SQL query at the beginning of this post, we wanted to retrieve several attributes. We can do this by changing the emit call and passing them as the second argument. Edit the view and type this code:

function (doc) {
  emit(doc._id, [doc.username, doc.first_name, doc.last_name, doc.last_login]);
}

If you reload the previous URL, you’ll see new results, like these:

{"id":"1111","key":"1111","value":["alice","Alice","Abrahams","2012-08-31 01:06:35"]},
{"id":"2222","key":"2222","value":["bob","Bob","Bacon","2012-08-30 02:06:35"]},
{"id":"3333","key":"3333","value":["charlie","Charles","Celsius","2012-08-29 02:06:35"]}

Which is exactly what you wanted to retrieve by your first SQL query. By emitting the values that you want to retrieve, you’ll be doing the equivalent to specifying the names of the attributes in the SELECT clause in SQL.



Probably the most common operation in SQL is selecting a row by its ID. This can be easily achieved in Couchbase, and you don’t even have to modify your view code. Simply add key="1111" to the previous URL.

This is what you’ll get:

{"id":"1111","key":"1111","value":["alice","Alice","Abrahams","2012-08-31 01:06:35"]}

By adding the key=some_value argument to the URL, you’ll get only the results matching that key. By the way: The emitted keys can have repeated values. This is the way to retrieve multiple results at the same time.

Imagine you want to get all the users from the same city. Then, you’ll create a view named “users / by_city”, with the following map function:

function (doc) {
  // assuming the city is stored in an attribute named "city"
  emit(doc.city, [doc.username, doc.first_name, doc.last_name, doc.last_login]);
}

Accessing it will return:

{"id":"1111","key":"Barcelona","value":["alice","Alice","Abrahams","2012-08-31 01:06:35"]},
{"id":"2222","key":"Barcelona","value":["bob","Bob","Bacon","2012-08-30 02:06:35"]},
{"id":"3333","key":"Boston","value":["charlie","Charles","Celsius","2012-08-29 02:06:35"]}

If you now specify a key in the URL, like this: key="Barcelona"

You’ll get only these two results:

{"id":"1111","key":"Barcelona","value":["alice","Alice","Abrahams","2012-08-31 01:06:35"]},
{"id":"2222","key":"Barcelona","value":["bob","Bob","Bacon","2012-08-30 02:06:35"]}
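As a rough illustration of what the key argument does, the following sketch filters an in-memory list of rows by key. It is plain JavaScript, not Couchbase code; `queryView` and `mapByCity` are hypothetical names, and the `city` attribute is an assumption about the sample documents.

```javascript
const people = [
  { _id: '3333', username: 'charlie', city: 'Boston' },
  { _id: '1111', username: 'alice', city: 'Barcelona' },
  { _id: '2222', username: 'bob', city: 'Barcelona' }
];

// Map step: one row per document, keyed by city (attribute name assumed).
function mapByCity(doc) {
  return { id: doc._id, key: doc.city, value: doc.username };
}

// Query step: sort rows by key, then id, and keep only the rows whose
// key equals options.key when that option is given.
function queryView(docs, mapFn, options) {
  const rows = docs.map(mapFn).sort((a, b) => {
    if (a.key !== b.key) return a.key < b.key ? -1 : 1;
    return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
  });
  if (options && options.key !== undefined) {
    return rows.filter((row) => row.key === options.key);
  }
  return rows;
}

const barcelona = queryView(people, mapByCity, { key: 'Barcelona' });
console.log(barcelona.map((row) => row.value));
```

Because several documents can emit the same key, a single key can return several rows.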

In other scenarios, you’ll need to SELECT COUNT(*). You can do this by using the REDUCE function. Couchbase includes a set of the most common reduce functions.

Go back and edit the “users / by_city” view that you created. On the textbox on the right, type _sum. The view should look like this:

Then, open


There you go. You have performed the equivalent to SQL:

SELECT COUNT(*)
FROM users

If you need to retrieve the number of users living in a single city, just add the “key=” argument to the previous URL, like this: key="Barcelona"

Then, you’ll get:


Which is the equivalent to:

SELECT COUNT(*)
FROM users
WHERE city = "Barcelona"
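The two counting queries can be imitated in plain JavaScript as follows. This sketch only counts rows of an in-memory list, with or without a key; `countRows` is a hypothetical helper and says nothing about how the built-in reduce functions are implemented.

```javascript
// Counts rows, optionally restricted to a single key, which is what
// SELECT COUNT(*) with or without a WHERE clause boils down to.
function countRows(rows, key) {
  return rows.filter((row) => key === undefined || row.key === key).length;
}

const cityRows = [
  { id: '1111', key: 'Barcelona' },
  { id: '2222', key: 'Barcelona' },
  { id: '3333', key: 'Boston' }
];

console.log(countRows(cityRows));              // all users
console.log(countRows(cityRows, 'Barcelona')); // users in one city
```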


Every view returns an ordered set of rows, based on the lexicographical order of the emitted key, according to Unicode. It’s roughly like “ORDER BY key ASC” in SQL. If you want your results in reverse order, just add the parameter “descending=true” to the view URL. For instance, this call:

will return:

{"id":"3333","key":"3333","value":["charlie","Charles","Celsius","2012-08-29 02:06:35"]},
{"id":"2222","key":"2222","value":["bob","Bob","Bacon","2012-08-30 02:06:35"]},
{"id":"1111","key":"1111","value":["alice","Alice","Abrahams","2012-08-31 01:06:35"]}



Couchbase doesn’t have tables, so there is no such thing as “SELECT * FROM …”. If you need that kind of partitioning in your database (you probably will, by the way), you need to add an additional attribute to the documents. In the sample documents, this is called “jsonType” and simulates two different tables: active_users and inactive_users.

In this case, a query like:

SELECT u.id, u.username, u.first_name, u.last_name, u.last_login
FROM active_users u

becomes a view with this map function:

function (doc) {
  if (doc.jsonType == "active_user") {
    emit(doc._id, [doc.username, doc.first_name, doc.last_name, doc.last_login]);
  }
}

which will return this info:

{"id":"1111","key":"1111","value":["alice","Alice","Abrahams","2012-08-31 01:06:35"]},
{"id":"2222","key":"2222","value":["bob","Bob","Bacon","2012-08-30 02:06:35"]}

As you see, performing basic SELECT operations in Couchbase is simple, very simple. In a future post, I’ll talk about more advanced topics such as selecting multiple keys at a time, joins, pagination and selecting between two given values.

Posted in couchbase, databases | Comments Off on Basic Couchbase querying for SQL people

Couchbase to Amazon S3 back-up script

In my last post, I was struggling to recover all the documents from a wacky Couchbase installation after a system crash. It was an easy task in the end, but I was very, very lucky to have the results of my “view all documents” view in a browser tab before the whole thing broke down.

I’m publishing a couple of scripts on GitHub, based on what I used to recover my database. The exporter can be used along with crontab to back up your Couchbase buckets to Amazon S3, while the importer can be used to retrieve the data from S3 and restore it to a Couchbase installation.

I found a small tweak, already documented on Couchbase issues, but I think it is related to python’s library for Couchbase and not Couchbase itself.

Enjoy. And let me know if you are using it.


Thanks to Francis Varga, I discovered a faster way to retrieve all documents, based on port 8092 and not on querying a view.


Emergency Recovery Script from Couchbase disaster

Today, I dealt with a totally broken installation of Couchbase 2.0.0 Developer Preview. I couldn’t find out why but, although views were returning the right results, when I tried to retrieve a single document I got the error below:

{
  "error": "badrpc",
  "reason": "{'EXIT',{{{badmatch,{error,closed}}, [{mc_client_binary,cmd_binary_vocal_recv,5}, {mc_client_binary,get_meta,3}, {ns_memcached,do_handle_call,3}, {gen_server,handle_msg,5}, {proc_lib,init_p_do_apply,3}]}, {gen_server,call, [{'ns_memcached-$data-default','ns...@'}, {get_meta,<>,37}, 30000]}}}"
}

This is taken from the Network Monitor panel in Chrome. On the Couchbase admin console (:8091) the only message I got was “Unknown error”.

I tried to upgrade Couchbase by downloading one of the latest builds, which are said to be more stable and have fewer problems than the Developer Preview. But this is what happened when I tried to install the .deb package:

# dpkg -i couchbase-server-community_x86_2.0.0-1482-rel.deb
dpkg: warning: downgrading couchbase-server from 2.0.0dp4r to 2.0.0.
(Reading database … 35466 files and directories currently installed.)
Preparing to replace couchbase-server 2.0.0dp4r (using couchbase-server-community_x86_2.0.0-1482-rel.deb) …
* Stopped couchbase-server
Upgrading previous couchbase … (2.0.0dp4r)
Stopping previous couchbase … (2.0.0dp4r)
* Failed to stop couchbase-server
Saving previous couchbase config.dat …
Cleaning symlinks …
Unpacking replacement couchbase-server …
Setting up couchbase-server (2.0.0) …
Upgrading couchbase-server …
/opt/couchbase/bin/install/cbupgrade -c /opt/couchbase/var/lib/couchbase/config -a yes
Automatic mode: running without interactive questions or confirmations.
Upgrading your Couchbase Server to 2.0.0-1482-rel.
The upgrade process might take awhile.
Previous config.dat file is /opt/couchbase/var/lib/couchbase/config/config.dat.debsave
Target node: ns_1@
ERROR: bucket default is configured but missing: /opt/couchbase/var/lib/couchbase/data/default
dpkg: error processing couchbase-server (--install):
subprocess installed post-installation script returned error exit status 1
Processing triggers for ureadahead …
Errors were encountered while processing:

And my documents just disappeared. Lucky me, I had in one tab of my browser the result of an “all/all” view, which looks like this:

function (doc) {
  emit(doc._id, null);
}

accessed from http://myserver:8092/default/_design/dev_all/_view/all?full_set=true&include_docs=true

Notice the include_docs and the missing limits (from, skip). This makes an entire dump of your database, in JSON, which I saved into a txt file.

After that, I wrote this small piece of code, in Python, and ran it. After a few seconds, I had access to my documents again.
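The Python script itself is not shown here. As a rough illustration of the parsing step only, here is a JavaScript sketch that pulls the documents out of such a dump. It assumes the dump has the usual `rows` array with a `doc` attribute per row (as returned with include_docs=true), `extractDocuments` is a hypothetical helper, and writing the documents back to Couchbase is deliberately left out.

```javascript
// Extracts { id, doc } pairs from the text of a view dump taken with
// include_docs=true. Rows without a "doc" attribute are skipped.
function extractDocuments(dumpText) {
  const dump = JSON.parse(dumpText);
  return (dump.rows || [])
    .filter((row) => row.doc !== undefined && row.doc !== null)
    .map((row) => ({ id: row.id, doc: row.doc }));
}

const sampleDump = JSON.stringify({
  total_rows: 2,
  rows: [
    { id: 'a', key: 'a', value: null, doc: { _id: 'a', jsonType: 'user' } },
    { id: 'b', key: 'b', value: null }
  ]
});

const recovered = extractDocuments(sampleDump);
console.log(recovered.length);
```

Each recovered pair could then be stored again under its id with whatever client library is at hand.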

The problem was hard. I was lucky. But I love this kind of simplicity. I think I can write an easy Couchbase backup/export and recovery/import tool after this.

Posted in hack | Comments Off on Emergency Recovery Script from Couchbase disaster

Creating an e-commerce platform with Couchbase 2.0

For the last couple of months, I’ve been involved in the development of an e-commerce website focused on beautiful, unique products, each with a story behind it, created by designers all over the world.

When I first met @swilera, @eoingalla and @cmgottschalk, the idea seemed so interesting to me that I got involved in the project really fast. And then, in a few weeks, we had the first release. We created it using Python, Django and Satchmo, running on top of a PostgreSQL database engine, and it worked well for some weeks, until we started developing the first business-level report queries. The whole object-relational stack the product was based on wasn’t enough for our purposes, so a technological change was needed.

And then, we moved to Couchbase. At first, I was quite skeptical about the power of NoSQL databases, but I soon changed my mind when I visited the Couchbase website and read everything about it.

So let’s start on the conceptual model (yes, I am a lecturer at the UPC School of Informatics & Computer Science, so I’m meant to be a bit of an academic).

Conceptual Model

The conceptual model is roughly sketched below.

So the first thing to do was to create a (say) database schema from it. After some reading and a lot of guessing, I figured out that the best way of thinking about the NoSQL document model is the “cluster-join” approach. This is: forget about inner or left-outer joins. Store all the data you can in an already “joined” form.

So, for instance, a user became something like this:

{
  "_id": "44fb07ca-69ee-4fc1-bb44-e0b13f3d08d8",
  "_rev": "56-000a6a6984fca8a10000065300000000",
  "$flags": 0,
  "$expiration": 0,
  "email": "",
  "password": "password_in_encrypted_in_some_magical_way",
  "tos": "yes",
  "name": {
    "givenName": "Pablo",
    "familyName": "Casado"
  },
  "jsonType": "user",
  "last_login": "2012-07-20T21:31:34+00:00",
  "lang": "en",
  "time_zone": "Madrid",
  "displayName": "Pablo Casado",
  "billingAddresses": [
    {
      "addressee": "Pablo Casado",
      "street": "Some Place",
      "postalCode": "08840",
      "city": "Viladecans",
      "stateProvince": "Barcelona",
      "country": "Spain",
      "phoneNumber": "34690916113"
    }
  ],
  "created_at": "2012-03-02T20:03:57+00:00"
}

As you can see, addresses are stored in an array inside the user object itself, so you can get rid of one expensive join operation.

On the other hand, and following the same rationale, we created product documents that live on their own (jsonType = product) and also placed the same information inside the Store documents (jsonType = store). By doing this, we duplicated information in our database, totally broke the traditional relational model and forced the system to cascade-update Stores whenever a Product is updated. It also increases the complexity of the code (update, edit) but, in return, it gives the system an amazing speed.

Saving data

How data is saved into Couchbase can change slightly depending on your chosen development language. But it essentially consists of creating a JavaScript object, stringifying it and sending it to Couchbase. Remember that Couchbase is a key-value storage system, so you’ll need to provide an ID for the document being sent.

Editing is, by the way, somewhat more complicated. Depending on how you retrieved the document (i.e. the format of your object) and the library you are using, you should remove some of the data in the document before saving it. This is because Couchbase (even in the admin console) returns an error when trying to save a document having the same revision (“_rev” attribute). So I recommend deleting the keys named _id, _rev, $flags and $expiration from your object before replacing an existing doc. Remember this in the future to avoid time-wasting mistakes.
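The clean-up step recommended above can be sketched as a small helper. This is a minimal sketch in plain JavaScript; the name `stripMetadata` is hypothetical and the function works on a copy, so the object you retrieved stays untouched.

```javascript
// Keys that should not be sent back when replacing an existing document.
const METADATA_KEYS = ['_id', '_rev', '$flags', '$expiration'];

// Returns a shallow copy of the document without the metadata keys.
function stripMetadata(doc) {
  const copy = Object.assign({}, doc);
  for (const key of METADATA_KEYS) {
    delete copy[key];
  }
  return copy;
}

const stored = {
  _id: '44fb07ca-69ee-4fc1-bb44-e0b13f3d08d8',
  _rev: '56-000a6a6984fca8a10000065300000000',
  $flags: 0,
  $expiration: 0,
  jsonType: 'user',
  lang: 'en'
};

const toSave = stripMetadata(stored);
console.log(JSON.stringify(toSave));
```

The id is still needed as the key of the save call, so keep it in a separate variable before stripping.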

Retrieving data

Back on PostgreSQL, a typical query would look like this:

SELECT givenName, familyName FROM users WHERE email = "";

Now, on Couchbase, the process is completely different. No more querying like that. It’s all about mapping and reducing.

On the Couchbase admin website above, you need to provide two different functions: the map (left, big) and the reduce function (left, small). In order to “query” the database, we’ll focus on the map function. So, to emulate the previous SQL query, your map function should look like this:

function (doc) {
  if (doc.jsonType == "user") {
    emit(doc.email, [doc.name.givenName, doc.name.familyName]);
  }
}

Take a look back at the user document example. Specifying the table you want to retrieve data from is equivalent to the if statement above. The result of the query will only contain information for documents having that jsonType. Then, choosing the “columns” you want to retrieve is equivalent to specifying the “value” in the emit function (in this case, an array).

If you create a view like that on your couchbase administration website under “_design/dev_test/_view/test”, you can get the results from it by navigating to:””

Finally, the information you’ll get from couchbase will be like this:

"rows": [

Couchbase will always return the document ID and then the key and the value you emitted. By passing the “key” argument in the previous URL, you are telling the database engine to return only information from the documents having the matching key. In other words: you’ll only get results if the first argument in the “emit” call matches the value of the key argument in the URL.

Stock control

The underlying business model required a 15-minute reservation for each product a user added to the cart. This means that, if you have added a product to your shopping cart, you have 15 minutes to go through the checkout process (or add another item to the cart) before your reservation expires.

This made the stock control process more complex than usual. It involved decreasing the actual stock for each SKU not only after a purchase is completed, but also when users add a product to their carts. In order to deal with that, we created a set of documents having jsonType = cart, containing the ID of the user and a list of the items added to the cart. Those were created when a user added a product to the cart. See the example below:

{
  "_id": "011de442eace65a9c0de7ec1a1b86ad1",
  "_rev": "1-0010fc329c910b380000014100000000",
  "$flags": 0,
  "$expiration": 0,
  "jsonType": "cart",
  "user": "852da85a-a730-4d40-ac67-b3e28059345b",
  "created_at": "2012-07-12T09:11:21+00:00",
  "updated_at": "2012-07-12T09:11:21+00:00",
  "items": {
    "bussoga-ceramic-tile-mural": {
      "added_at": "2012-07-12T09:11:21+00:00",
      "quantity": 1
    }
  }
}

To close the loop, a new “reduce” view was created. The reduce function was just “_sum”. And the map function looked like this:

function (doc) {
  if (doc.jsonType == 'cart') {
    for (var item in doc.items) {
      emit([item, doc.updated_at], doc.items[item].quantity);
    }
  }
}

The aim of this view was to get the quantity of each item in every single cart document. To achieve this goal, we used the startkey and endkey arguments of a Couchbase view call, like this: startkey=["bussoga-ceramic-tile-mural", "2012-01-01T00:00:00+00"]&endkey=["bussoga-ceramic-tile-mural", "2012-08-01T23:59:59+00"]&group=true&group_level=1

What we are doing here is specifying a startkey and an endkey (Couchbase will only return keys between those values), and asking for a level 1 reduction. In plain English: we are asking Couchbase to add (_sum) the emitted values (quantities in cart) of the carts containing the product and updated between those two dates.

You can do a lot of magic by emitting complex keys in your map functions. Start and end keys will always be sorted alphanumerically according to unicode. Read more about sorting here:

So, each time we needed to retrieve the stock of a product, we called the previous view with the ID of the product and two datetime strings: one for “fifteen minutes ago” and the other for “now”.
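The reservation window can be sketched in plain JavaScript as follows. This is an in-memory imitation of the map, range and sum steps, not the production code: the helper names are hypothetical, and it assumes every cart stores `updated_at` in the "+00:00" format shown in the cart example, so that timestamps compare correctly as strings.

```javascript
const RESERVATION_MINUTES = 15;

// Formats a Date like the timestamps stored in the cart documents,
// e.g. "2012-07-12T09:11:21+00:00".
function formatTimestamp(date) {
  return date.toISOString().slice(0, 19) + '+00:00';
}

// Mirrors the map function: one [productId, updated_at] -> quantity row
// per item in the cart.
function mapCart(cart) {
  return Object.keys(cart.items).map((item) => ({
    key: [item, cart.updated_at],
    value: cart.items[item].quantity
  }));
}

// Sums the quantities reserved for a product during the last 15 minutes.
function reservedQuantity(carts, productId, now) {
  const windowMs = RESERVATION_MINUTES * 60 * 1000;
  const start = formatTimestamp(new Date(now.getTime() - windowMs));
  const end = formatTimestamp(now);
  return carts
    .reduce((rows, cart) => rows.concat(mapCart(cart)), [])
    .filter((row) => row.key[0] === productId &&
                     row.key[1] >= start && row.key[1] <= end)
    .reduce((sum, row) => sum + row.value, 0);
}

const carts = [
  { updated_at: '2012-07-12T09:11:21+00:00',
    items: { 'bussoga-ceramic-tile-mural': { quantity: 1 } } },
  { updated_at: '2012-07-12T08:00:00+00:00',
    items: { 'bussoga-ceramic-tile-mural': { quantity: 2 } } },
  { updated_at: '2012-07-12T09:15:00+00:00',
    items: { 'bussoga-fake-product': { quantity: 5 } } }
];

const now = new Date(Date.UTC(2012, 6, 12, 9, 20, 0));
console.log(reservedQuantity(carts, 'bussoga-ceramic-tile-mural', now));
```

Only the first cart counts here: the second one is older than 15 minutes and the third one holds a different product.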

Couchbase is perfect for reporting queries but, apparently, it is not so good when trying to emulate the “commit” mechanism of a relational database. There is no such thing as COMMIT, BEGIN TRANS or END TRANS. If you have just written a document, the view may not reflect it until the next time you call the view. Let’s use an example to explain this.

1.- Call view. View returns 1, 2 and 3
2.- Insert 4
3.- Call view. Still returns 1, 2 and 3.
4.- Call view. Now, it returns 1, 2, 3 and 4

This happens because every view is backed by an index. By default, Couchbase answers from the existing index, which is why it returns the stored values very, very fast, and only updates the index after returning the results. In the previous example, the index is updated after step 3, not before step 3 or during its execution. This is why you get the “wrong” values.

How to deal with this in an almost-real-time environment like the 15-minute reservation explained before? Well… the answer is simple. Just add stale=false to the call, like this: startkey=["bussoga-ceramic-tile-mural", "2012-01-01T00:00:00+00"]&endkey=["bussoga-ceramic-tile-mural", "2012-08-01T23:59:59+00"]&group=true&group_level=1&stale=false

This will force Couchbase to “update the index” before returning you any results. Then, you’ll get the expected, traditional results in step 3 above. But be aware of using this carefully and only in queries that require it, because it will decrease the performance of your database.
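The four steps above, and the effect of stale=false, can be imitated with a toy class. This is only a simulation in plain JavaScript to show the ordering of "answer" and "update the index"; `SimulatedView` is a hypothetical name and nothing here talks to Couchbase.

```javascript
// Toy model of a view: documents are written immediately, but the index
// used to answer queries is only refreshed at specific moments.
class SimulatedView {
  constructor() {
    this.docs = [];
    this.index = [];
  }
  insert(value) {
    this.docs.push(value);
  }
  refresh() {
    this.index = this.docs.slice();
  }
  // With { stale: false } the index is refreshed before answering.
  // Otherwise the old index answers first and is refreshed afterwards.
  query(options) {
    const fresh = Boolean(options) && options.stale === false;
    if (fresh) this.refresh();
    const result = this.index.slice();
    if (!fresh) this.refresh();
    return result;
  }
}

const view = new SimulatedView();
[1, 2, 3].forEach((n) => view.insert(n));
view.refresh();
const step1 = view.query();                 // index already holds 1, 2 and 3
view.insert(4);
const step3 = view.query();                 // answered from the old index
const step4 = view.query();                 // index was refreshed after step 3
view.insert(5);
const fresh = view.query({ stale: false }); // refreshed before answering
console.log(step1, step3, step4, fresh);
```

The extra refresh before answering is where the performance cost of stale=false comes from.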

Where is my JOIN?

By choosing the described “join-cluster” approach when defining the structure of your documents, you’ll stay away from headaches involving joins and Couchbase. A newbie would say that it is impossible to “join” in Couchbase, as there are no foreign keys and a map function only sees one document at a time.

The first (and naive) approach would consist of retrieving the designer by ID and then querying Couchbase again to get all the products by that designer. The “products by designer” view would look like this:

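function (doc) {
  if (doc.jsonType == 'product') {
    // doc.id and doc.name are assumed attribute names for the product id and title
    emit(doc.designerId, [doc.id, doc.name, doc.images, doc.designer]);
  }
}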

(Note: doc.images is an array)

If you need to retrieve all the products of a single designer, you can send the id of the designer like this: key="bussoga"

And, if you need to retrieve all the products from several designers, you can go like this: keys=["atelier-melis","bussoga"]

SQL-like speaking:
– key is equal to “WHERE designerId = ‘some_designer_id’”
– keys (notice the array!) is equal to “WHERE designerId IN some_list_of_designer_ids”

This can be useful in many situations, but it wasn’t exactly what we were trying to achieve. Let’s say that we don’t want to be naive and that we don’t want to query the database many times or rely on some obscure join-like operator. There is a technique named View Collation that emulates the inner join operation. Trust me: view collation is your friend. I learned it from here:

To keep ourselves focused: the idea is to retrieve the designer and all his/her products by querying the database only once. Then, you’ll need to create a view like this one:

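function (doc) {
  if (doc.jsonType == "designer") {  // assumed jsonType value for designer documents
    emit([doc._id, 0], [doc.title, doc.city]);  // doc.city is an assumed attribute name
  } else if (doc.jsonType == "product") {
    emit([doc._id, 1], [doc.id, doc.name, doc.images]);  // doc.id and doc.name are assumed attribute names
  }
}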

And call it like this: startkey=["bussoga",0]&endkey=["bussoga\ufff0",1]

Because of the UTF-8 sorting implemented in Couchbase, the first element in the result rows will be the designer having the specified key. Then, the following elements will be the products designed by that designer. You have your “join” now. This is a sample result:

{
  "total_rows": 30,
  "rows": [
    {
      "id": "bussoga",
      "key": ["bussoga", 0],
      "value": ["Bussoga", "Girona"]
    },
    {
      "id": "bussoga-ceramic-tile-mural",
      "key": ["bussoga-ceramic-tile-mural", 1],
      "value": ["bussoga-ceramic-tile-mural", "Ceramic Tile Mural", ["1.JPG", "3.JPG"]]
    },
    {
      "id": "bussoga-fake-product",
      "key": ["bussoga-fake-product", 1],
      "value": ["bussoga-fake-product", "Bussoga Fake Product", ["SFM_skull3.jpg"]]
    }
  ]
}

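To make the ordering easier to picture, here is a small in-memory sketch of the same idea in plain JavaScript. It does not talk to Couchbase at all: the sample documents, the "designer" type name and the functions mapDoc, compareKeys and queryRange are assumptions made for illustration, and the comparison is a simplified stand-in for the real collation rules.

```javascript
// Hypothetical sample documents, loosely based on the result above.
var docs = [
  { _id: 'bussoga-fake-product', jsonType: 'product', title: 'Bussoga Fake Product', images: ['SFM_skull3.jpg'] },
  { _id: 'atelier-melis', jsonType: 'designer', title: 'Atelier Melis' },
  { _id: 'bussoga', jsonType: 'designer', title: 'Bussoga' },
  { _id: 'bussoga-ceramic-tile-mural', jsonType: 'product', title: 'Ceramic Tile Mural', images: ['1.JPG', '3.JPG'] }
];

// Map step with the same key layout as the view:
// [id, 0] for designers and [id, 1] for products.
function mapDoc(doc) {
  if (doc.jsonType === 'designer') {
    return { id: doc._id, key: [doc._id, 0], value: [doc.title] };
  }
  if (doc.jsonType === 'product') {
    return { id: doc._id, key: [doc._id, 1], value: [doc._id, doc.title, doc.images] };
  }
  return null;
}

// Simplified key comparison: element by element, using the built-in
// ordering of strings and numbers. The real collation is more elaborate.
function compareKeys(a, b) {
  for (var i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return a.length - b.length;
}

// Range query: rows with startkey <= key <= endkey, sorted by key.
function queryRange(allDocs, startkey, endkey) {
  return allDocs
    .map(mapDoc)
    .filter(function (row) {
      return row !== null &&
        compareKeys(row.key, startkey) >= 0 &&
        compareKeys(row.key, endkey) <= 0;
    })
    .sort(function (r1, r2) { return compareKeys(r1.key, r2.key); });
}

var rows = queryRange(docs, ['bussoga', 0], ['bussoga\ufff0', 1]);
var designer = rows[0];        // the designer comes first
var products = rows.slice(1);  // followed by his/her products

console.log(designer.id, products.map(function (p) { return p.id; }));
```

The trick only works because every product id starts with the id of its designer, so all of them fall between "bussoga" and "bussoga\ufff0" in the sort order.
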
On returning all the contents of the docs
At a very early stage of the development, we had views that returned the whole document as the emitted value. The simplest view of this kind would be:

function (doc) {
  emit(doc._id, doc);
}

We soon found out that this was a mistake, because we were using much more bandwidth than we had initially expected. Remember: you usually don’t need all the information in your documents to render your web pages, only a subset of it. I encourage you to emit arrays containing only the information you need. Like this:

function (doc) {
  emit(doc._id, [doc.attr1, doc.attr2, doc.attr3, doc.attr4]);
}
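
One small cost of emitting arrays is that the values lose their attribute names, so the code reading the rows has to know the order. A minimal sketch of how that could be handled on the NodeJS side follows; rowToObject, FIELDS and the sample row are hypothetical and not part of any Couchbase API.

```javascript
// Order of the attributes emitted by the view.
var FIELDS = ['attr1', 'attr2', 'attr3', 'attr4'];

// Rebuilds a plain object from a view row whose value is an array.
function rowToObject(row, fields) {
  var obj = { id: row.id };
  fields.forEach(function (name, index) {
    obj[name] = row.value[index];
  });
  return obj;
}

// Made-up row, shaped like the rows a view returns.
var row = { id: 'some-doc', key: 'some-doc', value: ['a', 'b', 'c', 'd'] };
var item = rowToObject(row, FIELDS);

console.log(item);
```

Keeping the field list in one place makes it easier to change the view and the reading code together.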

And, just in those cases where having all the information is mandatory, you can use a view like this:

function (doc) {
  emit(doc._id, null);
}

And call it passing the “include_docs=true” argument. For instance:

Then you will get something like this:

{
  "total_rows": 55908,
  "rows": [
    {
      "id": "000bee1b-6453-4505-a0b0-351ecb69c361",
      "key": "000bee1b-6453-4505-a0b0-351ecb69c361",
      "value": null,
      "doc": {
        "_id": "000bee1b-6453-4505-a0b0-351ecb69c361",
        "_rev": "1-00106aaf7d628d2500000b6a00000000",
        "$flags": 0,
        "$expiration": 0,
        ...

Notice the new key in each row of the result, named “doc”. It will contain the whole document.
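
On the NodeJS side, pulling the documents out of such a response takes very little code. This is a sketch that assumes the response body has already been parsed into an object shaped like the result above; docsFromResult is a made-up name and the sample data is invented.

```javascript
// Returns the full documents carried in the "doc" field of each row.
// Rows without a "doc" field (for example, when the request was made
// without include_docs) are skipped.
function docsFromResult(result) {
  return result.rows
    .filter(function (row) { return Boolean(row.doc); })
    .map(function (row) { return row.doc; });
}

// Invented, shortened response used only to exercise the function.
var result = {
  total_rows: 2,
  rows: [
    { id: 'a', key: 'a', value: null, doc: { _id: 'a', title: 'First' } },
    { id: 'b', key: 'b', value: null }
  ]
};

var fullDocs = docsFromResult(result);
console.log(fullDocs.length);
```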

Concerning project management
I am a developer, and a geek, and a motorcycle fan and I like to learn lots of new things. But I am also an experienced Project Manager and CTO. And I am very concerned about learning curves, development times and costs.

Well… I have to say that the development team at Presive consisted of four people, including myself. Two of us worked full-time on the project. The other two were part-time interns. Together we managed to get from zero to up and running in just 3.5 weeks. It took us 6 weeks to customize Satchmo. Keep this in mind.

Posted in databases, hacker, programming languages, project management

Electoral marketing, communication 2.0 and politicians

Warning: this story, although dramatized, is real. Names have been omitted to preserve the anonymity of those involved.

Yesterday I was told about one of those cases that leave me with a mixture of unease, anguish and hysterical laughter. To set the scene, let’s say this is a real conversation between a graphic designer specialized in marketing 2.0 and a prominent politician, about how to approach the communication strategy for the campaign of the last municipal elections.

[Politician]: Hello, Designer. So, what have you got for me for the election campaign?

[Designer]: Well, Mr. Politician… Our marketing department and our design department have met with your campaign office, and we have come up with an online communication strategy that will be a complete success.

For each municipality we have designed a communication plan tailored to each candidate, depending on whether they are in government or in opposition. We have also planned actions on Facebook and Twitter based on the messages the campaign office gave us.

All of this, of course, goes along with in-person events in the neighborhoods, as usual… but the idea is to focus the campaign more on presence on the Internet and social networks, because that is where opinion is shaped and you have to position yourself in these media more and more…

[Politician]: OK… Fine… I like the idea. But… what have you got for my town?

[Designer]: Well, Mr. Politician… Of course, I wasn’t going to start anything specific for your municipality without talking to you first to see how we could approach it… I have quite a few ideas that I think could work very well, but if you have a specific proposal, I would love to hear it before starting to work on them.

[Politician]: Yes. Look. We are going to hand out pencils that say “vote XXXXX”, which is what has always worked for us.

And this is, then, the glorious state of online marketing in this country.


Psycopg2 and PostgreSQL 9.1 on Snow Leopard

Today I had to install psycopg on my MacBook Pro running Mac OS X 10.6.8 (10K549) and I had some trouble doing it, so I’ll post this here just for the record.

To install PostgreSQL, I used the .dmg provided by the PostgreSQL website itself, and I downloaded psycopg from the project website. I’m not a big fan of Fink or MacPorts, so I didn’t use either of them to get any software.

My first attempt was to install psycopg as usual, typing ‘$python install’, but I got this error:

Please add the directory containing pg_config to the PATH
or specify the full executable path with the option:
 python build_ext --pg-config /path/to/pg_config build ...
or with the pg_config option in 'setup.cfg'.

That one was easy. I just added the location of pg_config to the PATH for that command. It looked like this: ‘$PATH=$PATH:/Library/PostgreSQL/9.1/bin python install’. Note that this is the path for a typical dmg installation of PostgreSQL on a Mac. It may differ from the one on your system.

And this is when things got complicated. The next error the system spat out looked like this:

Running psycopg2-2.4.2/ -q bdist_egg --dist-dir /tmp/easy_install-LKNkhA/psycopg2-2.4.2/egg-dist-tmp-PbhT1X
no previously-included directories found matching 'doc/src/_build'
/usr/libexec/gcc/powerpc-apple-darwin10/4.2.1/as: assembler (/usr/bin/../libexec/gcc/darwin/ppc/as or /usr/bin/../local/libexec/gcc/darwin/ppc/as) for architecture ppc not installed
Installed assemblers are:
/usr/bin/../libexec/gcc/darwin/x86_64/as for architecture x86_64
/usr/bin/../libexec/gcc/darwin/i386/as for architecture i386
psycopg/psycopgmodule.c:1035: fatal error: error writing to -: Broken pipe
compilation terminated.

I then switched to easy_install, which is probably the easiest way to get Python modules onto your system. I typed: ‘$PATH=$PATH:/Library/PostgreSQL/9.1/bin/ sudo easy_install psycopg2’, but the same error appeared.

I also tried pip, just in case: ‘PATH=$PATH:/Library/PostgreSQL/9.1/bin pip install psycopg2’, but it resulted in the same error again.

Finally, I assumed this had to be something related to the architecture (well… that word appeared a couple of times in the error message… not such an educated guess). I googled for a while and found the answer in this post on Stack Overflow.

Adding that env directive to the command made it look like this: ‘$PATH=$PATH:/Library/PostgreSQL/9.1/bin sudo env ARCHFLAGS="-arch i386 -arch x86_64" pip install psycopg2’ and ta-da! It worked:

Successfully installed psycopg2
Cleaning up...

I think this is something related to the development environment, probably to Xcode and iPhone development 🙁

Hope this helps.


So long, and thanks for all the fish

I am writing this post to announce publicly that I am leaving my position as CTO at Layers.

I have made a good handful of friends, and together we have done wonderful things. When Marcos Cuevas and I started this adventure, in May 2008 (thinking of developing what later became known as Spotify), I could not imagine that Layers would become so big.

First we developed an annotation tool for the web, which became “the extension with the fastest user growth ever for Chrome” and which will soon be available as a Moodle plug-in. Then we were invited to take part in the worldwide launch of the Chrome App Store and we created Layers4GoogleChrome, which offers the best user experience for consuming content from Facebook and Twitter. Also Layers4Publishers, which is that fantastic widget you can see at the bottom of this page. And many more things that are yet to come.

But the time has come to do other things.

I wanted to thank all of you who have been close to me during these years. It has been a pleasure working with you and I feel honored to have been your colleague. I have learned something from each and every one of you. You make up the best team there has ever been.

As Douglas Adams wrote:
So long (and thanks for all the fish)!
