OnApp hypervisor installation

These instructions assume that your control panel (IP 10.40.1.1 in this example) is where templates and backups are stored. If that is not the case, adjust them accordingly.

Before starting the hypervisor installation, make sure you can reach your SAN/volumes and the OnApp management network.

  1. echo '10.40.1.1:/onapp/backups /onapp/backups nfs soft,noatime,intr,tcp,rsize=32768,wsize=32768 0 0' >> /etc/fstab
  2. echo '10.40.1.1:/onapp/templates /onapp/templates nfs soft,noatime,intr,tcp,rsize=32768,wsize=32768 0 0' >> /etc/fstab
  3. Log in to 10.40.1.1 and edit /etc/exports to allow NFS access from the IP of the hypervisor you are installing. Then restart NFS with /etc/init.d/nfs restart
  4. mkdir -p /onapp/backups
  5. mkdir -p /onapp/templates
  6. wget http://rpm.repo.onapp.com/repo/centos/5/x86_64/OnApp-x86_64.repo -O /etc/yum.repos.d/OnApp-x86_64.repo
  7. yum install onapp-hv-install
  8. /onapp/onapp-hv-install/onapp-hv-xen-install.sh (for Xen) or /onapp/onapp-hv-install/onapp-hv-kvm-install.sh (for KVM)
  9. /onapp/onapp-hv-install/onapp-hv-config.sh -h 10.40.1.1
  10. nohup ruby /onapp/tools/vmon.rb > /dev/null &
  11. nohup ruby /onapp/tools/stats.rb > /dev/null &
  12. mount -a (to make sure your fstab entries are correct)
  13. reboot
  14. Go to your cloud control panel and add the hypervisor to the desired hypervisor zone.

These instructions worked at the time I wrote this; they may not work in the future.

Designing a monitor and control system for 200+ servers

A few months ago I had to design a proactive monitoring system that could handle 200+ servers with ease. The idea was not to build a simple monitor that passively watches the server farms and notifies the admins when some threshold is reached.

Keeping a team watching the servers 24/7 has its problems, so it would be great if the system could lighten the load on them.

I wanted the system to be able to react according to the scenario it faces at any given moment. A scenario is made up of the readings from every sensor loaded at the time, and it can be confined to a single server or span a whole farm/cluster. With this reactive capability, humans are notified only about situations that the system couldn’t handle or contain.
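A minimal sketch of that react-then-escalate idea, assuming a scenario is just a map of sensor name to latest reading. The class name ScenarioReactor, the Outcome values and the load threshold are all hypothetical, chosen only for illustration; the real Skynet code is not shown in this post.

```java
import java.util.Map;
import java.util.function.BooleanSupplier;
import java.util.function.Predicate;

// Hypothetical sketch: names and thresholds are illustrative, not Skynet's real code.
class ScenarioReactor {
    enum Outcome { NOTHING_TO_DO, CONTAINED, ESCALATED }

    // scenario:    latest reading per sensor name.
    // trigger:     decides whether the scenario needs a reaction at all.
    // containment: the automatic action; returns true if the problem was contained.
    static Outcome react(Map<String, Double> scenario,
                         Predicate<Map<String, Double>> trigger,
                         BooleanSupplier containment) {
        if (!trigger.test(scenario)) {
            return Outcome.NOTHING_TO_DO;
        }
        // Humans are only involved when the automatic action fails.
        return containment.getAsBoolean() ? Outcome.CONTAINED : Outcome.ESCALATED;
    }

    public static void main(String[] args) {
        Map<String, Double> scenario = Map.of("load", 12.5, "mailqueue", 40.0);
        Predicate<Map<String, Double>> overloaded = s -> s.getOrDefault("load", 0.0) > 8.0;

        // Pretend the containment action (e.g. stopping a non-critical service) failed.
        Outcome outcome = react(scenario, overloaded, () -> false);
        System.out.println(outcome); // prints ESCALATED
    }
}
```

Keeping the trigger and the containment action as separate parameters means the same loop can serve a single-server scenario or a cluster-wide one; only the map contents and the two functions change.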

Sorry if I offended anyone with the project name (Skynet), too many movies… lol. FYI, it has a TTS library that is used for many things, and one of them is saying “hasta la vista, baby” 😛

Architecture

  • Starting from the core piece, it was written in Java for two main reasons. The first was that I had only a few days to implement the prototype, and with years of experience in Java, that is where I was most productive.
  • The second reason was “reflection”. I know many other languages let you inspect and execute code at runtime, but again, previous experience with the technology allowed me to cut corners. Runtime inspection/execution was mandatory, since I wanted to be able to add components, sensors and so on at any time and, more importantly, to abstract all of this.
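To make the reflection point concrete, here is a small sketch of loading sensors by class name at runtime. The Sensor interface, the two sample sensors and the SensorLoader class are assumptions made up for this example, not Skynet's actual types; only Class.forName, asSubclass, getDeclaredConstructor and newInstance are standard Java reflection calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of profile-driven sensor loading; not Skynet's real classes.
class SensorLoader {
    // Create one sensor per class name using reflection.
    static List<Sensor> load(List<String> classNames) {
        List<Sensor> sensors = new ArrayList<>();
        for (String className : classNames) {
            try {
                // asSubclass fails fast if the class does not implement Sensor.
                Class<? extends Sensor> type = Class.forName(className).asSubclass(Sensor.class);
                sensors.add(type.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException | ClassCastException e) {
                throw new IllegalArgumentException("Cannot load sensor: " + className, e);
            }
        }
        return sensors;
    }

    public static void main(String[] args) {
        // A server profile would normally come from configuration; here it is hard-coded.
        List<String> mailServerProfile =
                List.of(LoadSensor.class.getName(), MailQueueSensor.class.getName());
        for (Sensor sensor : load(mailServerProfile)) {
            System.out.println(sensor.name() + " = " + sensor.read());
        }
    }
}

// Assumed sensor contract: a name plus a numeric reading.
interface Sensor {
    String name();
    double read();
}

// Sample sensors returning fixed values, standing in for real probes.
class LoadSensor implements Sensor {
    public String name() { return "load"; }
    public double read() { return 0.42; }
}

class MailQueueSensor implements Sensor {
    public String name() { return "mailqueue"; }
    public double read() { return 17.0; }
}
```

Because the loader only sees class names, a new sensor can be added by dropping in a class and listing its name in a profile, without touching the loader itself.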

[Figure: Skynet schematic]

Input sources

  • Skynet currently has many input sources. The main one is SSH sessions opened to each server, which allow it to monitor everything on that server. The right set of sensors is loaded at runtime, using reflection, according to each server’s profile.
  • These SSH sessions are, of course, also used by Skynet to actively interact with the servers: for example, blocking an IP, keeping mail queues clean, or stopping some non-critical services if a server is under stress. All of this is done automatically, and humans are alerted only if the problem cannot be contained.
  • The second main input source is mail. This is great, since end users/customers can interact with the system without knowing it and without human intervention, for example by requesting an IP unblock on a server in a shared hosting cluster.
  • There are many others, such as RSS feeds and SMS. RSS feed support is a funny story: Skynet actively scans defacement feeds (like zone-h and others) for IPs belonging to any of the servers connected to it. If a match is found, it alerts the admins so that they can warn the website owner.
  • The applications are endless.
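The defacement-feed check from the list above can be sketched roughly as follows, assuming the feed entry is already available as plain text and the monitored IPs are IPv4 addresses held in a set. The class and method names are hypothetical, and fetching or parsing the RSS feed itself is left out.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: scans feed text for IPs of monitored servers.
class DefacementFeedScanner {
    // Loose IPv4 matcher; it does not check that each octet is <= 255.
    private static final Pattern IPV4 = Pattern.compile("\\b(?:\\d{1,3}\\.){3}\\d{1,3}\\b");

    // Returns the monitored IPs that appear in the feed text, in order of appearance.
    static Set<String> findMonitoredIps(String feedText, Set<String> monitoredIps) {
        Set<String> hits = new LinkedHashSet<>();
        Matcher matcher = IPV4.matcher(feedText);
        while (matcher.find()) {
            String ip = matcher.group();
            if (monitoredIps.contains(ip)) {
                hits.add(ip);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Documentation-range addresses, used here purely as sample data.
        Set<String> monitored = Set.of("192.0.2.10", "192.0.2.11");
        String entry = "Defaced: http://example.com (192.0.2.11), other host 198.51.100.7";
        System.out.println(findMonitoredIps(entry, monitored)); // prints [192.0.2.11]
    }
}
```

A non-empty result is what would trigger the alert to the admins; an empty set means the entry can be ignored.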

Data

  • All events and readings are stored in an offsite Redis instance, which adds persistence.

Output

  • The current version has modules for SMS, mail and Twitter. Twitter is used almost like a timeline log of every action Skynet takes, and since there is a Twitter client on almost every electronic device nowadays, it’s the perfect on-the-go log solution. (The feed is kept private.)

Security

  • The machines where the Skynet core runs are in a secure location, without any direct inbound connections from the web. Since SSH sessions are used to talk to the servers, there would be a real danger if the location were compromised.
  • Key authentication is used, and the keys are kept only in volatile memory. If the power goes down they are lost, so even if someone steals the machines, they will not be able to re-establish the sessions with the servers from a new location.
  • It is totally autonomous, accepting only an emergency shutdown in case something starts to deviate. This shutdown command is not sent directly to Skynet, since there’s no direct connection to it from the outside; instead, it’s saved in a location that Skynet connects to in order to check for emergency commands. (Botnet style)
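The pull-based emergency shutdown in the last point can be sketched like this. Everything here is an assumption for illustration: the drop location is modelled as a Supplier that returns whatever command is currently stored there (if any), and the command word SHUTDOWN is made up. How the location is actually reached and authenticated is not described in this post.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch: the core polls an outside location instead of accepting connections.
class EmergencyCommandPoller {
    private final Supplier<Optional<String>> dropLocation;
    private boolean running = true;

    EmergencyCommandPoller(Supplier<Optional<String>> dropLocation) {
        this.dropLocation = dropLocation;
    }

    // One polling cycle: fetch the pending command, stop only on SHUTDOWN.
    void pollOnce() {
        Optional<String> command = dropLocation.get();
        if (command.map(String::trim).filter("SHUTDOWN"::equals).isPresent()) {
            running = false;
        }
    }

    boolean isRunning() {
        return running;
    }

    public static void main(String[] args) {
        // Simulated drop location: empty on the first poll, SHUTDOWN on the second.
        Iterator<Optional<String>> responses =
                List.of(Optional.<String>empty(), Optional.of("SHUTDOWN")).iterator();
        EmergencyCommandPoller poller = new EmergencyCommandPoller(responses::next);

        poller.pollOnce();
        System.out.println("running after first poll: " + poller.isRunning());  // true
        poller.pollOnce();
        System.out.println("running after second poll: " + poller.isRunning()); // false
    }
}
```

The point of the design is that all connections are outbound: the core decides when to look for commands, so nothing outside needs a route into it.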

Web Architecture

  • Here comes my favorite part of all this. That Redis instance had to be accessed somehow, and for me the only kind of web interface that makes sense (for this kind of thing) is a realtime one.
  • To achieve realtime (and bragging rights) you have to build it entirely in JavaScript, so I needed a good async data controller on the server side. This was the big opportunity for Node.js in this project.
  • Node.js let me build something with socket.io really quickly, since some of the code was reused in the web clients. This gave quick, painless and direct realtime access to the data in the Redis instance.
  • I threw a few cool UI libraries into the mix (like Google Chart, jQuery, jGrowl) and a realtime dashboard was built overnight.

After Skynet went online and became “reactive”, human intervention in maintenance tasks and in solving simple event scenarios dropped drastically. More importantly, it filters the problems, solving the simple ones and passing only the harder ones to the sysadmins, which boosts productivity.

A little brainstorm about mobile…

About a year ago, I started looking for a good cross-platform development framework for our mobile apps.
After a few weeks of hacking around with Titanium (before the new Eclipse-based IDE was launched), I decided to pull the plug on it, since we had multiple different problems on each platform. A nightmare…

In order to launch something quickly, I used my previous experience with Android and built something very fast, using the native SDK, to get us started.
An iOS app at the time? Nothing on the horizon…

I’m a full-stack guy who loves to get his hands on everything (I leave design to people with better fashion taste). Learning Objective-C was not a problem for me, although my language count is already high and I don’t want to lose focus.
The problem at the time was wasting time learning a language and platform that I could never use for anything other than developing apps for one very specific mobile platform (iOS).

Some of you reading this are screaming: PHONEGAP! Call me stupid, but one year ago I really underestimated the potential PhoneGap had.
I saw some potential in it, but I was naive and never saw the bigger picture around web-based mobile apps. I had the idea that there would always be some limitations. Again, naive…

What I mean by the bigger picture is combining all the great JS libraries/frameworks out there in order to “build a framework” that gives almost the same advantages as native, architecture-wise: the MVC model, for example.

Today we are building our new mobile apps using PhoneGap, combining backbone.js, jQuery,… all the good stuff out there.
It’s giving us a good, solid structure to grow on in the future, without all the past cross-platform problems from Titanium. At the end of the day, it’s pure web development.