Scaling Nodechecker.com

Nodechecker is an app I built a few weeks ago that automatically tests all node.js modules available in the NPM registry. Check the slides from require('lx') for more details; they explain the idea and motivation behind Nodechecker, but the technical details are now outdated, as you will see in this post.

It all started with a PoC around node.js and docker.io. It worked well for the NPM use case, but it didn't scale to other use cases like on-demand testing.

[Figure: Nodechecker's new architecture]

This is the new architecture. It relies on multiple fully abstracted Docker instances, each one running nodechecker-engine; I call these entities worker nodes.

The beautiful thing about this is that I can add more worker nodes without rebooting or restarting anything. Just boot another VM with Docker already installed and run nodechecker-engine, passing the nodechecker-balancer's IP address as an argument.

When the engine starts, it makes a dnode call to nodechecker-balancer, basically saying "I'm here and ready to rock, please add me to the list of available worker nodes".
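
A minimal sketch of that handshake using dnode's stream interface. The register method, the port and the balancer address are made up for illustration; the real nodechecker-engine protocol may differ.

var net = require('net');
var os = require('os');
var dnode = require('dnode');

var d = dnode();
d.on('remote', function (balancer) {
  // 'register' is a hypothetical method exposed by nodechecker-balancer.
  balancer.register({ host: os.hostname() }, function (err) {
    if (err) return console.error('registration failed:', err);
    console.log('registered, waiting for work orders');
  });
});

// Balancer address/port taken from the command-line argument in the real engine.
var stream = net.connect(5004, '10.0.0.5');
stream.pipe(d).pipe(stream);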

Nodechecker-balancer always dispatches work to the worker node with the smallest work queue, so if you add new nodes they will probably be the ones getting the new work orders.
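
The selection logic can be as simple as picking the worker with the fewest queued jobs. A rough sketch, assuming the balancer keeps an in-memory list of workers and their queue sizes (not the actual nodechecker-balancer code):

// workers: [{ host: '10.0.0.5', queued: 3 }, { host: '10.0.0.6', queued: 0 }, ...]
function pickWorker(workers) {
  return workers.reduce(function (best, worker) {
    return worker.queued < best.queued ? worker : best;
  });
}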

This architecture is flexible enough that if you just want to fiddle with it you don't need to run a balancer: just run a single worker node without passing an IP address to its nodechecker-engine. Then, in nodechecker-crawler and the API, use the worker's IP address directly.

You may even run the crawler, API, balancer and a worker on a single machine 🙂

Everything is still in a rough state, feel free to contribute.

Rebuilding Outkept

[Figure: Outkept dashboard]

Last November I gave a talk at Codebits about a private project that ended up becoming Outkept. It was originally built for a specific use case and then I generalized it.

At the time I had developed it in Java, but after the talk I decided to rebuild it for more general use cases, starting entirely from scratch in node.js.

Outkept allows you to control and monitor a large server cluster without manually specifying which servers to monitor or control. Instead you just define which subnets you want to cover and then, using SSH, Outkept's crawlers look for servers and figure out what they support.
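
To give an idea of the discovery step, here is a minimal sketch of probing a subnet for hosts answering on the SSH port, using only Node core modules. This is not Outkept's actual crawler code, which does much more than a port check.

var net = require('net');

// Probe every host in a /24 for an open SSH port (22).
function probeSubnet(prefix) {
  for (var i = 1; i < 255; i++) {
    checkSSH(prefix + '.' + i);
  }
}

function checkSSH(host) {
  var socket = net.connect({ host: host, port: 22 });
  socket.setTimeout(2000);
  socket.on('connect', function () {
    console.log(host + ' answers on port 22');
    socket.destroy();
  });
  socket.on('timeout', function () { socket.destroy(); });
  socket.on('error', function () { /* host not reachable, ignore */ });
}

probeSubnet('192.168.1');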

Rebuilding it in node.js was awesome, letting me sharpen my node.js skills and dig deeper into node's scene while using a lot of 'early adopter' tech.

Right now Outkept v2 supports everything the old version supported and more, and things are quicker and more fluid. The new dashboard connects using shoe (sockjs), and the system relies entirely on multiple node.js processes.
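
For reference, this is roughly what a shoe-based dashboard connection looks like. It is a generic sketch following shoe's usage pattern, not Outkept's actual dashboard code; the '/dashboard' path and the payload are made up.

var http = require('http');
var shoe = require('shoe');

var server = http.createServer();
server.listen(9999);

var sock = shoe(function (stream) {
  // Push a fake metric to this dashboard connection once a second.
  var timer = setInterval(function () {
    stream.write(JSON.stringify({ sensor: 'load', value: Math.random() }) + '\n');
  }, 1000);
  stream.on('end', function () { clearInterval(timer); });
});
sock.install(server, '/dashboard');

// Browser side (bundled with browserify):
//   var shoe = require('shoe');
//   var stream = shoe('/dashboard');
//   stream.on('data', function (msg) { console.log(msg); });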

I will talk more about this project later, but right now I would love some feedback. The codebase is big and mostly uncommented; in the next posts I will fix this and talk a bit about it.

If you want to give it a try, just follow these instructions. If you need help you can reach me at petermdias@gmail.com.

Disrupting Java apprentices with node.js

I started lecturing a few years ago, and for a while it was a full-time job. I taught mainly two programming languages: C in introductory classes (first semester) and Java in OOP, data structures and distributed systems classes (spread along the bachelor's degree).

After a while I got bored, and nowadays lecturing is not a full-time job for me. My main contribution now is to bring fresh tech into play and introduce it to seniors and other lecturers, and this is where this story starts.

About a year ago I started diving into node.js. Initially I only used it for non-critical stuff, still keeping Java as my main language.
I really liked node from day one, but back then I just didn't have the time to dive deeper into it.
About 6 months ago everything changed: I'm really hooked on it now, and I finally feel I have acquired enough skill for it to replace Java in my head.
The node ecosystem is awesome and doesn't have all the clutter that Java has; it is so clean and agile…

This semester I decided to bring node.js into a senior class and simultaneously a few senior projects.

Remember how I started this post?

Here, almost every design/programming topic is taught in Java, so when I gave the first class about JavaScript and node.js everyone was like "WTF?".
A few students even looked at me as if I were drunk, because in their heads JavaScript was for browsers and nothing else.

In order to sell them node.js, I live coded a drag-and-drop DIV element that moved simultaneously in all browsers using socket.io. Instantly everyone in the room was curious, even the more sceptical ones. (If you want to disrupt someone, show them something very visual, and that is what I did.)
The code is available at https://github.com/apocas/psi2013-node (it uses prototype-based objects, modules, and events).
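
The gist of it, as a stripped-down sketch rather than the actual class code from that repo: each browser emits the DIV's position while dragging, and the server broadcasts it to everyone else.

// server.js – relay drag positions between browsers
var server = require('http').createServer();
var io = require('socket.io').listen(server);

io.sockets.on('connection', function (socket) {
  // When one browser drags the DIV, broadcast its position to all the others.
  socket.on('move', function (pos) {
    socket.broadcast.emit('move', pos);
  });
});

server.listen(8080);

// In the browser: emit 'move' with {x, y} while dragging the DIV,
// and listen for 'move' to reposition it when someone else drags it.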

It's true that I could have implemented that in almost any other language, but how little effort it took in node.js is really impressive, and that is what disrupted the audience (npm awesomeness helped :-D).

Right now seniors are starting to get their hands on node.js in multiple projects (https://github.com/portugol – Portugol rewritten in node.js). So far, I feel the hardest thing for them is the asynchronous architecture.

One big advantage, though, was that they came from a language where everything is an object (Java). Because of this they quickly understood objects in JavaScript and how events can be used for message passing in an asynchronous environment.

In my opinion this is one of the most important things to understand in the JavaScript/node.js world.
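
A tiny example of the pattern, using nothing but Node's core EventEmitter (the names here are illustrative, not from the class material):

var EventEmitter = require('events').EventEmitter;
var util = require('util');

// A "worker" that announces results asynchronously instead of returning them.
function Worker() {
  EventEmitter.call(this);
}
util.inherits(Worker, EventEmitter);

Worker.prototype.run = function (job) {
  var self = this;
  setTimeout(function () {
    self.emit('done', job.toUpperCase()); // message passing via an event
  }, 100);
};

var w = new Worker();
w.on('done', function (result) {
  console.log('got result:', result);
});
w.run('hello');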

.onion Mapper

Initially I had the idea of developing this during Codebits, but I ended up building it for fun before the event, which ruled out using it there.

.onion is an "unofficial" top-level domain suffix; the big thing about this TLD is that you can only access domains in it over the Tor network.

Due to the anonymity characteristics of the Tor network and the need to use it to access this TLD, this kind of network is very often called the deep web.

The idea here was to crawl the .onion network, but instead of crawling and data mining its contents I just wanted to map the relationships between its servers.

The stack is very simple: at the infrastructure level it was built around a main control node, which runs node.js and a Redis instance.

Additionally, there were multiple crawlers running an onion-tweaked version of crawler4j; each crawler grabs the *.onion links in the HTML and saves them (domain relationships) in Redis using Jedis. At the network level, Polipo was used as a proxy, together with the Tor client, obviously.
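
On the control node side, reading those relationships back from Redis for the graph could look roughly like this, assuming the crawlers store each domain's outgoing links in a set keyed by the source domain (the actual key schema may differ):

var redis = require('redis');
var client = redis.createClient();

// Fetch every .onion domain we have seen and its outgoing links.
client.keys('links:*', function (err, keys) {
  if (err) throw err;
  keys.forEach(function (key) {
    var source = key.replace('links:', '');
    client.smembers(key, function (err, targets) {
      if (err) return console.error(err);
      targets.forEach(function (target) {
        console.log(source + ' -> ' + target); // one edge of the graph
      });
    });
  });
});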

Each domain relationship is displayed in a graph rendered with sigma.js; all data is delivered to the browser using socket.io.

In two days it crawled 1.5M URLs, finding and mapping relationships between 440 domains. Keep in mind that this crawling was done inside the Tor network, which sometimes has very high latency.

Finally here it is.

Control your cloud from node.js

There was no node.js module implementing the Onapp API, so I figured it would be a good idea to implement one as my first published npm module. 🙂

There are still a lot of methods to implement, but the basic stuff is there. When I have time I will implement more.

The module's structure is similar to other node client implementations out there, and it is very readable.

The node.js community is awesome, and publishing modules is certainly something I'm going to be doing more often.

Check it out at: https://github.com/apocas/node-onapp

Installation, as usual, is done through npm's awesomeness:

npm install onapp

In order to get started you need to instantiate a client.

var onapp = require('onapp');

var config = {
  username: 'username@email.com',
  apiKey: 'api_hash',
  serverUrl: 'http://192.168.1.1'
};

var client = onapp.createClient(config);

The options passed during VM creation correspond exactly to the Onapp API. This way you create a VM just as if you were using the original API.

var options = {
  memory: '1024',
  cpus: '1',
  cpu_shares: '50',
  hostname: 'tests.tests.com',
  label: 'VM from node',
  primary_disk_size: '10',
  swap_disk_size: '1',
  primary_network_id: '2',
  template_id: '6',
  hypervisor_id: 2,
  initial_root_password: '12345675',
  rate_limit: 'none'
};

client.createVirtualMachine(options, function (err, vm) {
  if(err !== null) {
    console.log(err);
  } else {
    console.log(vm);
  }
});

Powering off a VM.

client.getVirtualMachine('vm_id', function (err, vm) {
  if(err !== null) {
    console.log(err);
  } else {
    vm.off(function(error, data){});
    //vm.reboot(function(error, data){});
    //...
  }
});