Crawlable

A way to make your web application crawlable, so that search engines can index it properly.

Crawlable is a way to render your web application as a static web site

When you build features for a web project, there is a good chance that you make some Ajax requests. If you are developing a web application with Backbone.js, for example, you typically rely on the Ajax support provided by jQuery. So you are building great things, but if your project needs to be visible on the web, you will wonder two things:

Crawlable could be your solution! It renders your dynamic client-side JavaScript content on the server side. This way, it can serve static, cached HTML to your client before any JavaScript code has started executing on the web page.

You may now say, "OK, but what if I have cached some dynamic content that could be updated at any time?"

Crawlable doesn't simply cache HTML. It uses a module named Solidify to generate a derived version of your client-side templates before storing them. When a client sends a request to the server, Crawlable feeds the cached template with up-to-date data before returning the result.
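The difference between caching final HTML and caching a template can be illustrated with a minimal sketch. This is not how Solidify is actually implemented: the names templateCache, cacheTemplate, fillTemplate and renderRoute are hypothetical, and the placeholder syntax is reduced to plain {{key}} substitution.

```javascript
// Hypothetical illustration only; these names are not part of Crawlable or Solidify.
const templateCache = new Map();

// Cache a derived template that still contains placeholders, not final HTML.
function cacheTemplate(route, template) {
    templateCache.set(route, template);
}

// Replace each {{key}} placeholder with the matching value from data.
// Missing keys become empty strings. No HTML escaping is done here.
function fillTemplate(template, data) {
    return template.replace(/\{\{(\w+)\}\}/g, function (match, key) {
        return Object.prototype.hasOwnProperty.call(data, key) ? String(data[key]) : '';
    });
}

// Serve a route: reuse the cached template, but fetch the data on every request.
function renderRoute(route, fetchData) {
    const template = templateCache.get(route);
    if (template === undefined) {
        return null; // nothing cached for this route yet
    }
    return fillTemplate(template, fetchData());
}

cacheTemplate('/', '<h1>{{title}}</h1><p>{{visits}} visits</p>');
console.log(renderRoute('/', function () { return { title: 'Home', visits: 1 }; }));
console.log(renderRoute('/', function () { return { title: 'Home', visits: 2 }; }));
```

The two calls reuse the same cached template yet produce different output, because only the data is fetched again on each request.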

How does it work?

Before explaining how all of this can be used, you need to understand how it works a little more deeply.

Here are the steps Crawlable goes through to compute your final server-side rendered HTML:

How to use it?

Crawlable uses PhantomJS to render the web page on the server side. You do not need to install it yourself; the installer takes care of it for you.

However, PhantomJS depends on Python, so make sure Python is installed for the whole thing to work.

Then install Crawlable like this:

npm install crawlable --save

At this time, Crawlable is very convenient to use with the great Express and Connect modules. As we saw above, Crawlable is not simply a server side module, but also a client side library.

On the client side, you use it through the jQuery plugin named jquery.crawlable.js. This plugin depends on the Solidify plugin named jquery.solidify.js, which in turn depends on the Handlebars template engine and jQuery.

So you would include something like this in your HTML:

    <script type="text/javascript" src="/jquery.js"></script>
    <script type="text/javascript" src="/handlebars.js"></script>
    <script type="text/javascript" src="/jquery.solidify.js"></script>
    <script type="text/javascript" src="/jquery.crawlable.js"></script>

How to use it on the server side with Express?

The code below is what your server-side app.js file could contain.

var Crawlable = require('crawlable');

Crawlable.express({
    port: process.env.PORT || 5000, // the port to listen on.
    configure: function (app, express) {
        // you can configure your app here.
        app.use(express.favicon());
        app.use(express.static(__dirname + '/public'));
        app.use(express.logger('dev'));
        app.use(express.errorHandler({ dumpExceptions: true, showStack: true }));
    },
    routes: function (app) {
        // register your api routes here.
        app.get('/my/api/route', function (req, res) {
            // do something in your API route.
        });

        // and your crawlable routes.
        app.crawlable('/');
    },
    render: function (req, res) {
        // specify a way to render your application.
        res.render('app.html', { html: req.crawlable.html });
    }
}, function (err, app) {
    if (err)
        return console.log(err);
    console.log('The application is ready.');
    // crawl every route in order to generate the crawlable cache.
    app.crawl();
});

In this example, we create a ready-to-use Express application. The server is already configured to be used with Crawlable. This means that each regular route you register (get|post|put|del) defines your API, and each Crawlable route you register defines a way to render your application.

When a client requests a Crawlable route, the render function is called with the req.crawlable object filled with the elements you may need to render your application (most of the time you need the req.crawlable.html string, which contains the HTML of your rendered application for this route).

Note that Crawlable.express configures the application to use Handlebars as a template engine.

The app.html template passed to res.render above could look like this.

<html>
    <head>...</head>
    <body>
        <!-- Where you put your static application content at the first place. -->
        <div id="app">{{{ html }}}</div>

        <!-- JS libraries -->
        <script type="text/javascript" src="/jquery/jquery.js"></script>
        <script type="text/javascript" src="/handlebars/handlebars.js"></script>
        <script type="text/javascript" src="/underscore/underscore.js"></script>
        <script type="text/javascript" src="/backbone/backbone.js"></script>
        <script type="text/javascript" src="/backbone.babysitter/lib/backbone.babysitter.js"></script>
        <script type="text/javascript" src="/backbone.wreqr/lib/backbone.wreqr.js"></script>
        <script type="text/javascript" src="/marionette/lib/backbone.marionette.js"></script>
        <script type="text/javascript" src="/solidify/jquery.solidify.js"></script>
        <script type="text/javascript" src="/crawlable/jquery.crawlable.js"></script>

        <!-- Application sources -->
        <script type="text/javascript" src="/app.js"></script>
    </body>
</html>

Description of Crawlable.express

Crawlable.express(options, callback) {...};

options:

callback: a function called just after the application has started (function (err, app) {...}).

app object extra features:

Adapt a Backbone.Marionette application to Crawlable

The code below is what your client-side app.js file could contain.

// Be sure to use the solidify template engine.
Backbone.Marionette.TemplateCache.prototype.compileTemplate = function (rawTemplate) {
    return Backbone.$.solidify(rawTemplate);
};

// Create a Marionette application (it could be Backbone.js or whatever you want).
var app = new Marionette.Application();

$(document).ready(function () {
    // Initialize your main application anchor with crawlable.
    // It tells crawlable to wait for the application to be fully loaded before injecting the code
    // into the <div id="app">.
    $('#app').crawlable();

    // Start your application.
    app.start();
});

This example shows how to render your application over the static HTML cached by Crawlable. As a result, your application page will seem to be fully loaded even if your JavaScript code hasn't been executed yet.

Description of the jQuery.crawlable plugin

jQuery("selector").crawlable(options);: defines an anchor into which your dynamic application will be injected once it is fully loaded.

options:

Dynamic cached templates

You may now wonder, "What if my page deals with dynamic content?" Crawlable is able to handle it, but you will have to adapt your client-side templates.

For this example, imagine we want to render a list. We have a Collection and a View, rendering an ItemView for each Model of our Collection.

Here is how we do it with Handlebars templates.

Here is the Item template:

<!-- specify the needed request to fetch the data -->
{{solidify "/api/items"}}
<!-- the same as {{#each}}, but for the server side rendering only (client will ignore it) -->  
{{#solidify-each "this"}}
    <!-- dereference the field content, will be interpreted on the client and server side -->
    <li>{{content}}</li>
{{/solidify-each}}

Now here is the List template:

<div>
     <h1>My list</h1>
     <div>
          <!-- Include a template. This is for the server side only; the client simply ignores it. -->
          {{solidify-include "/templates/item.html"}}
     </div>
</div>

As you can see, we just have to respect some extra rules to make our template understandable by Crawlable.
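To make the server-side result more concrete, here is a plain JavaScript sketch of what rendering these two templates amounts to. It does not use Solidify or Handlebars. It assumes that /api/items returns an array of objects with a content field, which is an assumption based on the Item template, and the function names escapeHtml, renderItems and renderList are hypothetical.

```javascript
// Hypothetical data, standing in for the JSON response of GET /api/items.
const items = [{ content: 'First item' }, { content: 'Second item' }];

// Escape the characters that matter inside HTML text and attribute values.
function escapeHtml(text) {
    return String(text)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;');
}

// Rough equivalent of the Item template: one <li> per element of the list.
function renderItems(list) {
    return list.map(function (item) {
        return '<li>' + escapeHtml(item.content) + '</li>';
    }).join('\n');
}

// Rough equivalent of the List template, with the Item template included.
function renderList(list) {
    return '<div>\n<h1>My list</h1>\n<div>\n' + renderItems(list) + '\n</div>\n</div>';
}

console.log(renderList(items));
```

The output is static HTML that can be served before any client-side JavaScript runs, which is the point of the extra template rules.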

Quick description of the Solidify syntax

You can see the Solidify documentation for details, but here is what you need for now:

Note that all other Handlebars syntax remains available. The server-side-only syntax shown above is completely ignored by Solidify on the client side, so it has no effect on your original client-side templates.

What technologies does it use and why?

Crawlable uses the excellent PhantomJS through a bridge implemented in the node module phantom. It is lightweight because only one PhantomJS process is used. This process runs as a "page pool", meaning that a set number of pages is opened at startup and only these PhantomJS pages are used to render the HTML. This way, Crawlable saves a lot of memory and can perform efficient parallel rendering.
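The "page pool" idea can be sketched in a few lines of JavaScript. This is a generic illustration under stated assumptions, not Crawlable's actual code: createPool and withPage are hypothetical names, and the page objects are plain stand-ins rather than real PhantomJS pages.

```javascript
// Hypothetical sketch of a fixed-size "page pool"; not Crawlable's actual code.
function createPool(pages) {
    const idle = pages.slice(); // pages ready to be used
    const waiting = [];         // resolvers of callers waiting for a page

    // Get an idle page, or wait until one is released.
    function acquire() {
        if (idle.length > 0) {
            return Promise.resolve(idle.pop());
        }
        return new Promise(function (resolve) { waiting.push(resolve); });
    }

    // Hand the page to the next waiting caller, or mark it idle again.
    function release(page) {
        const next = waiting.shift();
        if (next) {
            next(page);
        } else {
            idle.push(page);
        }
    }

    // Run a job with a pooled page and always give the page back afterwards.
    async function withPage(job) {
        const page = await acquire();
        try {
            return await job(page);
        } finally {
            release(page);
        }
    }

    return {
        withPage: withPage,
        idleCount: function () { return idle.length; }
    };
}

// Usage: two stand-in pages serve four render jobs, at most two at a time.
const pool = createPool([{ id: 1 }, { id: 2 }]);
['/', '/a', '/b', '/c'].forEach(function (route) {
    pool.withPage(async function (page) {
        return 'rendered ' + route + ' on page ' + page.id;
    }).then(console.log);
});
```

Because the pages are created once and reused, the number of concurrent renderings is bounded by the pool size, and no page is opened or closed per request.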

Crawlable also uses nedb by default to store data. It supports both in-memory and persistent storage. It is also fully embedded and very lightweight.

Want an example?