Monitoring an infrastructure is one of the most trivial things around, isn’t it? Yet a lot of people are still not happy with the way current monitoring tools work. Last summer John Vincent (@lusis) started a trend on Twitter and #monitoringsucks was born. Today it’s a Twitter hashtag, an IRC channel and also a GitHub group, but mostly it’s a group of people talking about how to build better tools, or how to glue together tools that already exist.
But why does monitoring suck? There is no better explanation than quoting John Vincent himself, from his blog:
Monitoring is AWESOME. Metrics are AWESOME. I love it. Here’s what I don’t love:
* Having my hands tied with the model of host and service bindings
* Having to set up “fake” hosts just to group arbitrary metrics together
* Having to collect metrics twice – once for alerting and another for trending
* Only being able to see my metrics in 5 minute intervals
* Having to choose between shitty interface but great monitoring or shitty monitoring but great interface
* Dealing with a monitoring system that thinks it is the system of truth for my environment
* Not actually having any real choices
You could also add the lack of automation possibilities in most monitoring solutions, as this is the criterion that limits choice most. So even with a huge number of open source tools out there, each of which gets bits and pieces right, people aren’t happy, and every couple of weeks a new effort to solve monitoring – or at least to improve it – starts. The monitoringsucks crew have set up a GitHub repository pointing to all kinds of different tools that are around.
Most people agree that there are plenty of good tools around if your infrastructure is small or medium sized. It becomes more problematic when your infrastructure grows, when you have more and more items to monitor and more and more things to measure. With the introduction of “infrastructure as code”, people want to be able to deploy a service automatically and include monitoring in that deployment; that’s one area where current tools are not ready. Another area, of course, is scaling the monitoring solution itself: if you need a full-time DBA to manage the database where your metrics are being sent, because otherwise it is too slow to accept new metrics, there is a problem. Add to that the fact that most monitoring tools are written with static environments in mind, or at least infrastructures on a local network – not flexible cloud-style environments where machines are decommissioned even faster than they were provisioned.
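To make the “monitoring as part of the deployment” idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical – the `MonitoringAPI` class and its methods stand in for whatever automation-friendly interface a monitoring system might expose; no real tool’s API is implied. The point is that checks are registered when a service is deployed and removed when a host is decommissioned, with no human editing of monitoring configs in between.

```python
# Hypothetical sketch: monitoring checks registered as part of an
# automated service deployment, and cleaned up when hosts disappear.
# MonitoringAPI, register_check and deregister_host are illustrative
# names, not a real product's API.

class MonitoringAPI:
    """Stand-in for a monitoring system with an automation-friendly API."""

    def __init__(self):
        self.checks = {}

    def register_check(self, host, name, command, interval=60):
        # Each check is keyed by (host, name) so re-deploys are idempotent.
        self.checks[(host, name)] = {"command": command, "interval": interval}

    def deregister_host(self, host):
        # Cloud-style environments need this: machines are decommissioned
        # even faster than they were provisioned.
        self.checks = {k: v for k, v in self.checks.items() if k[0] != host}


def deploy_service(monitor, host, service):
    # ... provisioning (package install, config, start) would go here ...
    # Monitoring is part of the deployment, not an afterthought:
    monitor.register_check(host, f"{service}-process",
                           command=f"check_proc {service}")
    monitor.register_check(host, f"{service}-port",
                           command=f"check_tcp -H {host} -p 80")


monitor = MonitoringAPI()
deploy_service(monitor, "web01", "nginx")
print(len(monitor.checks))        # 2 checks, registered alongside the deploy
monitor.deregister_host("web01")  # host gone, its checks go with it
```

The same pattern applies whether the “API” is a real HTTP endpoint, a config file generated by Puppet or Chef, or a message sent to the monitoring system.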
All of these problems are popping up in today’s web infrastructures. All of the larger web applications, the social networking sites, the online shops and their friends, are starting to feel the pain, so they are looking for solutions.
As the devops community is pretty keen on monitoring, it’s no surprise that there is a huge overlap between the two communities. This means there is a large group of people out there who are investigating new ways to tackle the problem, and they are sharing their experiences and sharing their code.
In February Inuits, the leading Open Source consultancy, hosted a two-day hack session in their offices, which gathered people from different organizations who were trying to solve the problem of monitoring. People from large sites such as TomTom, Booking.com, Atlassian, Spotify, and others were present to discuss and share their ideas.
One of the core ideas that came out of this discussion was to go back to the old Unix philosophy and build a chain of tools that work closely together, each specialized in its own area. With small changes to existing tools, a tool chain could be built that would collect data and throw it onto a message bus. Other tools could then listen to that bus to transform that data, compute statistics from it, base alerts on it, graph it, archive it, or perform analytics on top of it.
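The tool-chain idea above can be sketched in a few lines of Python. This is an in-process publish/subscribe stub rather than a real broker (in practice the bus would be something like AMQP), and all names in it are hypothetical, but it shows the key property: the collector publishes each metric exactly once, and independent specialized tools – an alerter, an archiver – consume the same stream, so you no longer collect metrics twice, once for alerting and once for trending.

```python
# Illustrative sketch of "small tools on a message bus": one collector,
# many specialized consumers. The Bus class is a toy stand-in for a real
# message broker; all names here are hypothetical.
import time
from collections import defaultdict

class Bus:
    """Minimal pub/sub: every subscriber sees every published metric."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, metric):
        for handler in self.subscribers:
            handler(metric)


def collect(bus, host, name, value):
    # The collector's only job: measure once, publish once.
    bus.publish({"host": host, "name": name,
                 "value": value, "time": time.time()})


alerts = []                      # state of the alerting tool
archive = defaultdict(list)      # state of the trending/graphing tool

def alerter(metric):
    # Alerting tool: fires when load exceeds an (arbitrary) threshold.
    if metric["name"] == "load" and metric["value"] > 5:
        alerts.append(metric)

def archiver(metric):
    # Trending tool: stores every value per (host, metric) time series.
    archive[(metric["host"], metric["name"])].append(metric["value"])


bus = Bus()
bus.subscribe(alerter)
bus.subscribe(archiver)

collect(bus, "web01", "load", 1.2)
collect(bus, "web01", "load", 7.5)   # archived and alerted on, same message
```

Adding a new consumer – a graphing front-end, an anomaly detector – means subscribing one more handler, without touching the collector or any other tool; that loose coupling is exactly what the Unix-philosophy chain is after.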
Plenty of tools exist in the area and plenty of new tools are popping up almost weekly, but one thing is for sure… monitoring is not a solved problem yet. It still sucks.
Kris Buytaert is a long-time Linux and Open Source consultant. He is one of the instigators of the devops movement, and currently works for Inuits.
Kris is the co-author of Virtualization with Xen, used to be the maintainer of the openMosix HOWTO, and is the author of various technical publications. He frequently speaks at, or organizes, international conferences.
He spends most of his time working on Linux clustering (high availability, scalability and HPC), virtualisation and large infrastructure management projects, trying to build infrastructures that can survive the 10th floor test – better known today as the cloud – while actively promoting the devops idea!
His blog titled “Everything is a Freaking DNS Problem” can be found at http://www.krisbuytaert.be/blog/.