Scaling Nodechecker.com

Nodechecker is an app I built a few weeks ago that automatically tests all Node.js modules available in the NPM registry. Check these slides from require(‘lx’) for more details. They explain the idea and motivation behind Nodechecker, but the technical details are now outdated, as you will see in this post.

It all started as a PoC built around Node.js and docker.io. It worked well for the NPM use case, but it didn’t scale to other use cases such as on-demand testing.

Nodechecker new architecture

This is the new architecture. It relies on multiple, completely abstracted Docker instances, each one running nodechecker-engine. I call these entities worker nodes.

The beautiful thing about this is that I can add more worker nodes without rebooting or restarting anything. Just boot another VM with Docker already installed and run nodechecker-engine, passing nodechecker-balancer’s IP address as an argument.

When the engine starts, it makes a dnode call to nodechecker-balancer, basically saying “I’m here and ready to rock, please add me to the list of available worker nodes”.
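To make the registration step concrete, here is a minimal sketch of what the balancer side could keep track of. It runs in memory and leaves the dnode transport out entirely; the WorkerRegistry class, its register method and the addresses are made up for illustration and are not taken from the nodechecker-balancer source.

```javascript
// Hypothetical balancer-side registry. The real nodechecker-balancer may
// store workers differently; this only illustrates the "add me to the list" idea.
class WorkerRegistry {
  constructor() {
    // worker address -> state (here just a work queue)
    this.workers = new Map();
  }

  // Called when a worker announces itself. Registering the same
  // address twice is harmless: the existing entry is kept.
  register(address) {
    if (!this.workers.has(address)) {
      this.workers.set(address, { queue: [] });
    }
    return this.workers.size;
  }

  // Addresses of all known workers, in registration order.
  list() {
    return [...this.workers.keys()];
  }
}

const registry = new WorkerRegistry();
registry.register('10.0.0.11:5000'); // made-up addresses
registry.register('10.0.0.12:5000');
registry.register('10.0.0.11:5000'); // repeat announcement, ignored
console.log(registry.list());
```

In a real setup the register call would be exposed over dnode so that a freshly booted engine can invoke it remotely.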

Nodechecker-balancer always dispatches work to the node that has the smallest work queue, so if you add new nodes they will probably be the ones getting new work orders.
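The smallest-queue rule can be sketched in a few lines. This is only an illustration of the idea under my own assumptions: the worker objects, the pickWorker and dispatch helpers and the job names are hypothetical, and the actual balancer code may break ties or track queues differently.

```javascript
// Pick the worker with the shortest queue. On a tie, the worker that
// appears first in the list wins (an arbitrary choice for this sketch).
function pickWorker(workers) {
  if (workers.length === 0) {
    throw new Error('no worker nodes registered');
  }
  return workers.reduce((best, w) =>
    w.queue.length < best.queue.length ? w : best
  );
}

// Queue a job on the least loaded worker and report who got it.
function dispatch(workers, job) {
  const worker = pickWorker(workers);
  worker.queue.push(job);
  return worker.name;
}

const workers = [
  { name: 'worker-a', queue: ['express', 'async'] },
  { name: 'worker-b', queue: ['lodash'] },
];

// A node added at runtime starts with an empty queue...
workers.push({ name: 'worker-c', queue: [] });

// ...so it is the one that receives the next work order.
const first = dispatch(workers, 'request');
const second = dispatch(workers, 'mkdirp');
console.log(first, second);
```

This also shows why freshly added nodes attract work: an empty queue is always the smallest one until the load evens out.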

This architecture is so flexible that if you just want to fiddle with it a little, you don’t need to run a balancer. Just run a single worker node without specifying an IP address for its nodechecker-engine. Then point nodechecker-crawler and the API at the worker’s IP address.
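One way to picture the two modes is a small startup check. This is a sketch and not the engine's real argument handling: the resolveMode function, the mode names and the assumption that the balancer IP is the first argument are all hypothetical.

```javascript
// Hypothetical startup logic: the actual nodechecker-engine argument
// format is not shown in this post, so the first argument is assumed
// to be the balancer IP address.
function resolveMode(args) {
  const balancerIp = args[0];
  if (!balancerIp) {
    // No balancer given: crawler and API talk to this worker directly.
    return { mode: 'standalone' };
  }
  // Balancer given: announce ourselves and wait for work orders.
  return { mode: 'balanced', balancerIp };
}

console.log(resolveMode([]));
console.log(resolveMode(['192.168.1.10'])); // made-up address
```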

You can even run the crawler, API, balancer and a worker on a single machine 🙂

Everything is still in a rough state, feel free to contribute.
