.onion Mapper

Initially I had the idea of developing this during Codebits, but I ended up building it for fun before the event, which ruled out using it there.

.onion is an “unofficial” top-level domain suffix; the big thing about this TLD is that you can only access domains in it over the Tor network.

Because of the anonymity characteristics of the Tor network and the need to use it to access this TLD, this kind of network is often called the deep web.

The idea here was to crawl the .onion network, but instead of crawling and data mining its contents I just wanted to map the relationships between its servers.

The stack is very simple: at the infrastructure level it was built around a main control node, which runs Node.js and a Redis instance.

Additionally, there were multiple crawlers running an .onion-tweaked version of crawler4j; each crawler grabs the *.onion links in the HTML code and saves them (domain relationships) in Redis using Jedis. At the network level, Polipo was used as a proxy in front of the Tor client.
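To give an idea of how each crawler reports what it finds, here is a minimal sketch of that step, assuming crawler4j's WebCrawler API (4.x-style signatures) and Jedis; the class name, Redis host and key format are illustrative, not the actual code, and HTTP traffic is assumed to be routed through Polipo/Tor via the crawler's proxy settings.

    import java.util.Set;

    import edu.uci.ics.crawler4j.crawler.Page;
    import edu.uci.ics.crawler4j.crawler.WebCrawler;
    import edu.uci.ics.crawler4j.parser.HtmlParseData;
    import edu.uci.ics.crawler4j.url.WebURL;
    import redis.clients.jedis.Jedis;

    // Illustrative crawler: keeps only *.onion links and stores each
    // "source domain -> target domain" relationship in a Redis set.
    public class OnionCrawler extends WebCrawler {

        private final Jedis jedis = new Jedis("control-node"); // hypothetical host

        @Override
        public boolean shouldVisit(Page referringPage, WebURL url) {
            // Stay inside the .onion TLD.
            return url.getURL().toLowerCase().contains(".onion");
        }

        @Override
        public void visit(Page page) {
            String source = page.getWebURL().getDomain();
            if (page.getParseData() instanceof HtmlParseData) {
                Set<WebURL> links = ((HtmlParseData) page.getParseData()).getOutgoingUrls();
                for (WebURL link : links) {
                    String target = link.getDomain();
                    if (target.endsWith(".onion") && !target.equals(source)) {
                        // One set per source domain; members are the domains it links to.
                        jedis.sadd("rel:" + source, target);
                    }
                }
            }
        }
    }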

Each domain relationship is displayed in a graph rendered with sigma.js, and all data is delivered to the browser using socket.io.

In two days it crawled 1.5M URLs, finding and mapping relationships between 440 domains. Keep in mind that this crawling was done inside the Tor network, which sometimes has very high latency.

Finally here it is.

Designing a monitor and control system for 200+ servers

A few months ago I had to design a proactive monitoring system that could handle 200+ servers with ease. The idea was not to build a simple monitor that passively watched the server farms and notified the admins when some threshold was reached.

Keeping a team watching the servers 24/7 has its problems, so if the system could lighten their load, that would be great.

I wanted the system to have some capability of reacting according to the scenario it faced at the moment. This scenario is represented by all the readings of each sensor loaded at the time, and it may be contained within a single server or be farm/cluster wide. With this reactive capability, humans are notified only about situations that the system couldn’t handle/contain.

Sorry if I offended anyone with the project name (Skynet), too many movies… lol, but FYI it has a TTS library used for many things, one of them being saying “hasta la vista, baby” 😛

Architecture

  • Starting from the core piece: it was written in Java for two main reasons. The first was that at the time I had only a few days to implement the prototype, and since I have years of experience with Java, it is where I was most productive.
  • The second reason was “Reflection”. I know many other languages let you inspect and execute code at runtime, but again, previous experience with the technology allowed me to cut corners. Runtime inspection/execution was mandatory since I wanted to be able to add components/sensors/… at any time and, more importantly, abstract all of this (a minimal sketch of the idea follows below).
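To make the reflection point concrete, here is a minimal sketch of the kind of runtime loading I mean; the Sensor interface and the class-name list are hypothetical, not Skynet's actual code.

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical plugin contract: every sensor implements this interface.
    interface Sensor {
        String name();
        double read();          // latest reading
        boolean isCritical();   // whether the reading should trigger a reaction
    }

    public class SensorLoader {

        // Instantiate sensors from a list of fully qualified class names
        // (e.g. taken from a server profile), without compiling them in.
        public static List<Sensor> load(List<String> classNames) {
            List<Sensor> sensors = new ArrayList<>();
            for (String className : classNames) {
                try {
                    Class<?> clazz = Class.forName(className);
                    sensors.add((Sensor) clazz.getDeclaredConstructor().newInstance());
                } catch (ReflectiveOperationException e) {
                    System.err.println("Could not load sensor " + className + ": " + e);
                }
            }
            return sensors;
        }
    }

New sensors can then be dropped onto the classpath and referenced by name in a profile, without touching the core.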

Skynet Schematic

Input sources

  • Currently Skynet has many input sources; the main one is SSH sessions opened to each server, which allow it to monitor everything on each server. According to each server’s profile, the right set of sensors is loaded at runtime using reflection.
  • These SSH sessions are, of course, also used by Skynet to actively interact with the servers: for example, blocking an IP, keeping mail queues clean, stopping some non-critical services if a server is under stress, etc. All of this is done automatically, and if the problem fails to be contained, humans are alerted (see the sketch after this list).
  • The second main input source is mail. This is great since end users/customers can interact with the system without knowing it and without human intervention, for example requesting an IP unblock on a server in a shared hosting cluster.
  • There are many others, like RSS feeds, SMS and so on. RSS feed support is a funny story: Skynet actively scans defacement feeds (like zone-h and others) for IPs belonging to any of the servers connected to it. If a match is found, it alerts the admins, allowing them to alert the website owner.
  • Applications are endless.
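As an example of the reactive side mentioned above, here is a minimal sketch of blocking an IP over one of those SSH sessions, assuming the JSch library; the user, key path and command are illustrative only, and the real actions are obviously more careful than a single iptables call.

    import com.jcraft.jsch.ChannelExec;
    import com.jcraft.jsch.JSch;
    import com.jcraft.jsch.Session;

    public class SshReaction {

        // Open a key-authenticated session and run a single remote command,
        // e.g. adding a firewall rule to drop traffic from an abusive IP.
        public static void blockIp(String host, String badIp) throws Exception {
            JSch jsch = new JSch();
            jsch.addIdentity("/dev/shm/skynet_key");     // key kept only in volatile memory
            jsch.setKnownHosts("/dev/shm/known_hosts");  // also illustrative
            Session session = jsch.getSession("skynet", host, 22);
            session.connect();

            ChannelExec channel = (ChannelExec) session.openChannel("exec");
            channel.setCommand("iptables -I INPUT -s " + badIp + " -j DROP");
            channel.connect();
            channel.disconnect();
            session.disconnect();
        }
    }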

Data

  • All events and readings are stored in an offsite Redis instance, adding persistence capability.
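Just to illustrate the shape of that data, readings can be pushed as timestamped entries keyed per server and sensor; this is a sketch with made-up key names, reusing Jedis.

    import redis.clients.jedis.Jedis;

    public class EventStore {

        private final Jedis jedis = new Jedis("offsite-redis"); // hypothetical host

        // Keep readings in a per-sensor sorted set scored by timestamp,
        // so history survives a core restart and can be queried by time range.
        public void store(String server, String sensor, double reading) {
            long now = System.currentTimeMillis();
            jedis.zadd("readings:" + server + ":" + sensor, now, now + ":" + reading);
        }
    }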

Output

  • The current version has modules for SMS, Mail and Twitter. Twitter is used almost like a timeline log of each action Skynet takes, and since there is a Twitter client on almost any electronic device nowadays, it’s the perfect on-the-go log solution. (The feed is kept private.)
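The Twitter module boils down to something like the following sketch; it assumes Twitter4J with credentials in its usual properties file, which is an assumption on my part rather than a description of the actual module.

    import twitter4j.Twitter;
    import twitter4j.TwitterException;
    import twitter4j.TwitterFactory;

    public class TwitterLog {

        // Credentials come from twitter4j.properties; the feed itself is private.
        private final Twitter twitter = TwitterFactory.getSingleton();

        // One short line per action Skynet takes, e.g. "blocked 203.0.113.7 on web03".
        public void log(String action) {
            try {
                twitter.updateStatus(action);
            } catch (TwitterException e) {
                System.err.println("Could not tweet action log: " + e);
            }
        }
    }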

Security

  • The machines where the Skynet core runs are in a secure location without any direct inbound connections from the web. Since SSH sessions are used to talk with the servers, there would be a real danger if that location were compromised.
  • Key authentication is used, and keys are kept only in volatile memory. If the power goes down they are lost, so even if someone steals the machines they will not be able to re-establish the sessions with the servers from the new location.
  • It is totally autonomous, accepting only an emergency shutdown in case something starts to deviate. This shutdown command is not sent directly to Skynet, since there is no direct connection to it from the outside; instead it is saved in a location that Skynet polls for emergency commands (botnet style).
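That emergency check is essentially a periodic poll of a dead-drop location; a minimal sketch of the idea, with a made-up URL and command format, could look like this.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    public class EmergencyWatch {

        // Skynet reaches out (botnet style); nothing connects in to it.
        private static final String DROP_URL = "https://example.org/skynet/commands"; // hypothetical

        public static boolean shutdownRequested() {
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(new URL(DROP_URL).openStream()))) {
                String line = in.readLine();
                return line != null && "SHUTDOWN".equals(line.trim());
            } catch (Exception e) {
                // If the drop is unreachable, keep running; the check repeats on a timer.
                return false;
            }
        }
    }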

Web Architecture

  • Here comes my favorite part of all this. That Redis instance had to be accessed somehow, and for me the only web that makes sense (for this kind of thing) is realtime.
  • In order to achieve realtime (and bragging rights) you have to build it in full JavaScript, so I needed a good async data controller on the server side; this was the big opportunity for Node.js in this project.
  • Node.js allowed me to build something with socket.io really quickly, since some code was reused in the web clients. This gave quick, painless and direct realtime access to the data in the Redis instance.
  • I added a few cool UI libraries into the mix (like Google Charts, jQuery, jGrowl) and a realtime dashboard was built overnight.

After Skynet was online and “reactive”, human intervention in maintenance tasks and in solving simple event scenarios dropped drastically. More importantly, it filters the problems, solving the simple ones and only passing the harder ones to the sysadmins, boosting productivity.