Revisiting charts: NodeJS, Twitter and Elasticsearch

Some time ago, I developed a mashup using NodeJS and Couchbase to demonstrate how to use Couchbase to create stats and charts. Although I still use Couchbase for many things on a day-to-day basis, it does not provide a comfortable way to perform free-text search, so I moved to ElasticSearch for some projects. In this series of posts I show how to create a very basic NodeJS application that gets tweets from the Twitter stream, stores them in ElasticSearch and produces some charts from them.

First things first: you have to install ElasticSearch. If you are using a Mac and Homebrew, just brew the following formula:
$ brew install elasticsearch

And then, run ElasticSearch as you prefer. Example:
$ /usr/local/bin/elasticsearch -f -D es.config=/usr/local/opt/elasticsearch/config/elasticsearch.yml

After a few seconds, you will get something like this on your terminal:
[2014-01-13 20:55:08,290][INFO ][node ] [Rhiannon] {0.20.2}[36723]: initializing ...
[2014-01-13 20:55:08,309][INFO ][plugins ] [Rhiannon] loaded [], sites []
[2014-01-13 20:55:17,488][INFO ][node ] [Rhiannon] {0.20.2}[36723]: initialized
[2014-01-13 20:55:17,526][INFO ][node ] [Rhiannon] {0.20.2}[36723]: starting ...
[2014-01-13 20:55:17,967][INFO ][transport ] [Rhiannon] bound_address {inet[/127.0.0.1:9300]}, publish_address {inet[/127.0.0.1:9300]}
[2014-01-13 20:55:21,391][INFO ][cluster.service ] [Rhiannon] new_master [Rhiannon][pDxyYfeqSo2Mf1WWNFzwJA][inet[/127.0.0.1:9300]], reason: zen-disco-join (elected_as_master)
[2014-01-13 20:55:21,529][INFO ][discovery ] [Rhiannon] elasticsearch_apolion/pDxyYfeqSo2Mf1WWNFzwJA
[2014-01-13 20:55:21,582][INFO ][http ] [Rhiannon] bound_address {inet[/127.0.0.1:9200]}, publish_address {inet[/127.0.0.1:9200]}
[2014-01-13 20:55:21,583][INFO ][node ] [Rhiannon] {0.20.2}[36723]: started

This means that ElasticSearch is running. Now, point your browser to http://localhost:9200 and you should get something similar to:
{
  "ok" : true,
  "status" : 200,
  "name" : "Rhiannon",
  "version" : {
    "number" : "0.20.2",
    "snapshot_build" : false
  },
  "tagline" : "You Know, for Search"
}

So you know you can start using ElasticSearch. But before starting, you'll need to install the HEAD plug-in, which will make your life easier with ES. Quit ElasticSearch by pressing Ctrl+C on the terminal and type this command, which will install the plugin:
/usr/local/bin/elasticsearch/bin/plugin -install mobz/elasticsearch-head

Then, run ES again and browse to http://localhost:9200/_plugin/head/. You will see the head plugin interface, indicating that the ElasticSearch cluster is running and ready to store your documents.

Now is when the fun begins. Browse to https://github.com/hardlifeofapo/twitter-node-elasticsearch and clone the project on your local machine. Once you have cloned it, go inside the directory and run:
$ npm install
$ node app.js

Before doing anything else, edit the file named keys.js and update it with your Twitter tokens. You can get them at http://dev.twitter.com.
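The exact contents of keys.js may differ slightly in the repository, but conceptually it is just a module exporting your four Twitter credentials, something along these lines (the field names here are an assumption, check the file itself):

// keys.js -- a sketch; field names may differ from the ones used in the repository
module.exports = {
  consumer_key: 'YOUR_CONSUMER_KEY',
  consumer_secret: 'YOUR_CONSUMER_SECRET',
  access_token_key: 'YOUR_ACCESS_TOKEN',
  access_token_secret: 'YOUR_ACCESS_TOKEN_SECRET'
};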

Now browse to http://localhost:3000, where you will see a simple website based on Twitter Bootstrap. Click on the Stream button on the navigation bar and then on the blue button that says "Start Streaming".


By clicking on the button, a connection to Twitter's Streaming API will be made and you will start receiving tweets, which will appear on the page.

Now look at the code, in the file named app.js. Each tweet received is saved in the callback named saveOnElasticSearch and then emitted to the front-end via socket.io. The interesting part here is how the documents are saved, using the following call:
client.index({
  index: 'tweets',     // index (the "database") where documents are stored
  type: 'tweet',       // document type within the index
  id: myKey,
  body: aTweet         // the tweet itself, as received from the stream
}, function (err, resp) {
  console.info(err);
  console.info(resp);
  if (!err) {
    callback(aTweet);  // only notify the front-end when indexing succeeded
  }
});
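For context, the wiring around that call looks roughly like the sketch below. The module (ntwitter) and the event names are assumptions on my part; the real code lives in app.js of the repository. The idea is simply: for each tweet coming from the stream, save it, and on success push it to the browser through socket.io.

// A rough sketch of the streaming wiring -- module and event names are assumptions,
// check app.js in the repository for the real implementation.
var twitter = require('ntwitter');           // assumed Twitter streaming client
var keys = require('./keys');

var twit = new twitter(keys);

twit.stream('statuses/sample', function (stream) {
  stream.on('data', function (aTweet) {
    // Save the tweet in ElasticSearch; emit it to the browser only if indexing worked
    saveOnElasticSearch(aTweet, function (savedTweet) {
      io.sockets.emit('tweet', savedTweet);  // io is the socket.io server instance
    });
  });
});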

Now go back to http://localhost:9200/_plugin/head/ and check that a new index has been created and the documents have been saved into it.
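If you prefer checking from code instead of the head plugin, the same client exposes a count call. A quick sketch, assuming the same client object used in app.js and the index name tweets from above:

// Count how many documents the 'tweets' index holds so far
client.count({ index: 'tweets' }, function (err, resp) {
  if (!err) {
    console.info('tweets indexed so far: ' + resp.count);
  }
});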

And that's it? No CREATE DATABASE? No schema? No nothing?

Well... yes and no. The index (database) is created automatically and the mapping (schema) is inferred from the documents you just inserted. This is actually trickier than it looks, but it will work for this demo (you probably noticed the parsing of the date... that's one of the tricky parts). By default, ElasticSearch indexes every single field of every document in the index, which allows you to query by any of them.
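If you want to avoid surprises with that date, you can also declare the mapping yourself before indexing anything. A minimal sketch with the same client (the field names and the date format string are assumptions based on Twitter's created_at values, and the index must already exist, e.g. via client.indices.create):

// A minimal sketch: declare created_at as a date instead of relying on the
// automatic mapping. The format matches Twitter's "Wed Jan 15 20:55:08 +0000 2014" dates.
client.indices.putMapping({
  index: 'tweets',
  type: 'tweet',
  body: {
    tweet: {
      properties: {
        created_at: { type: 'date', format: 'EEE MMM dd HH:mm:ss Z yyyy' },
        text: { type: 'string' }
      }
    }
  }
}, function (err, resp) {
  console.info(err || resp);
});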

One of the most interesting parts of ElasticSearch is facets, which we are going to use to create the chart on the stats page. Facets are aggregations over the data: something similar to asking the database to calculate an average, a median... and many other things, such as a histogram. Read more about facets in the docs.

It is the date histogram facet that we are going to use. Check the file named routes/tweets.js and look at this call:
client.search({
  index: 'tweets',
  size: 0,                        // we only want the facet, not the matching documents
  body: {
    "query": {
      "match_all": {}
    },
    "facets": {
      "histo1": {
        "date_histogram": {
          "field": "created_at",  // bucket the tweets by their creation date
          "interval": "minute"
        }
      }
    }
  }
}, function (error, response) {
  console.info(response.facets.histo1.entries);
  res.render('stats', {"title": "Stats", "data": response.facets.histo1.entries});
});

What we are doing is using the search method of the ES client to query the database. We don't want any results, so size is 0, but we do want the histogram facet, counting the tweets created in each minute. Check the docs for more info.

The result of the ElasticSearch query is sent to a template and rendered as a JavaScript variable that is later parsed by amcharts (see routes/tweets.js and public/js/templates/stats.ejs).
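The entries of a date_histogram facet come back as pairs of a timestamp and a count, so before handing them to amcharts the template only has to reshape them. Conceptually it is something like this (the property names on the amcharts side depend on how the chart is configured in stats.ejs):

// Each facet entry looks like { time: 1389645300000, count: 5 };
// reshape it into the { date, count } objects a serial chart can plot.
var chartData = response.facets.histo1.entries.map(function (entry) {
  return {
    date: new Date(entry.time),
    count: entry.count
  };
});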

And that's it. In following posts I will explain how to perform more advanced queries against ElasticSearch by adding more expressions to the "query" object (that is, using something other than this "match_all").
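As a small taste of what that will look like, swapping the match_all for, say, a match query on the tweet text is enough to restrict the histogram to tweets containing a given word (just a sketch, not part of the repository yet):

// Same histogram as before, but only over tweets whose text mentions "nodejs"
body: {
  query: {
    match: { text: 'nodejs' }
  },
  facets: {
    histo1: {
      date_histogram: { field: 'created_at', interval: 'minute' }
    }
  }
}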

