Since I started using Scalr to manage my Amazon Web Services farms, I have wanted more monitoring: statistical information on services, traffic, disk usage, and uptime, to name a few. Scalr has built-in basic event notifications such as host up and host down, and it provides very basic load statistics via RRDtool. I have used Zabbix on most projects I have worked on, so I wanted to be able to use it with Scalr.
I am still testing the setup described here, so please keep that in mind. This is NOT a howto, but more of a brainstorm on how I plan to get Zabbix integrated into my Scalr setup.
The Zabbix documentation (PDF) covers a few ways to use auto-discovery (page 173). For example, you can have Zabbix scan a block of IPs to find new Zabbix Agents. So here is what I will have my Zabbix Server do:
- Look for new Zabbix Agents on my AWS internal IP range.
- If system.uname contains “Scalr”, add the host to the Scalr server group.
- Only add a server once it has been up for 30+ minutes.
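To make the conditions in this list concrete, here is a rough Python sketch of the check I have in mind. It is not Zabbix code or a Zabbix API: the `DiscoveredAgent` class, the function name, and the `10.0.0.0/8` range are all assumptions made up for illustration, and in practice these conditions would be configured as a discovery rule and action inside Zabbix itself.

```python
import ipaddress
from dataclasses import dataclass

# Assumed values for illustration only; substitute your own range and threshold.
AWS_INTERNAL_RANGE = ipaddress.ip_network("10.0.0.0/8")
MIN_UPTIME_SECONDS = 30 * 60  # "up for 30+ minutes"


@dataclass
class DiscoveredAgent:
    """Hypothetical record of what discovery learned about one agent."""
    ip: str
    uname: str            # value returned by the agent's system.uname item
    uptime_seconds: int   # value returned by the agent's system.uptime item


def should_add_to_scalr_group(agent: DiscoveredAgent) -> bool:
    """Return True only if all three conditions from the list hold."""
    in_range = ipaddress.ip_address(agent.ip) in AWS_INTERNAL_RANGE
    is_scalr = "Scalr" in agent.uname
    up_long_enough = agent.uptime_seconds >= MIN_UPTIME_SECONDS
    return in_range and is_scalr and up_long_enough
```

For example, an agent at `10.1.2.3` whose uname contains “Scalr” and that has been up for an hour would pass, while the same agent ten minutes after boot would not.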
There will be other stipulations before a server gets added to Zabbix. I will have a system template for each of my Scalr AMI roles. Once a server is added, Zabbix will place it in its respective group and monitor the items and triggers listed in the system template. There will also be a rule to remove an instance from Zabbix 24 hours after its host down trigger fires. This way I will not have a bunch of old, once-monitored instances cluttering the Zabbix database. If you also have Windows AWS instances, you can add a rule to monitor these as well. The AMI just needs to have the Zabbix Windows Agent installed.
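The 24-hour removal rule could be sketched like this in Python. Again, this is only an illustration of the logic, not something Zabbix ships: the function name and the `down_since` mapping are hypothetical, and the actual removal would be done by whatever rule or script ends up talking to Zabbix.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Retention window taken from the plan above: remove 24 hours after host down.
RETENTION = timedelta(hours=24)


def hosts_to_remove(down_since: Dict[str, Optional[datetime]],
                    now: datetime) -> List[str]:
    """Return the hosts whose host down trigger fired at least 24 hours ago.

    down_since maps a host name to the time its host down trigger fired,
    or to None if the host is currently up (hypothetical data structure).
    """
    return sorted(
        host
        for host, since in down_since.items()
        if since is not None and now - since >= RETENTION
    )
```

A host that went down 25 hours ago would be returned for removal, while one that went down an hour ago, or one that is still up, would be left alone.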
So, now that I have the overall plan, I want to compile the Zabbix agent on a test instance of a Scalr role. Once I have compiled, packaged, and tested it, I will install it on my different Scalr roles and use “Synchronize to all” to re-bundle each role with the Zabbix agent and propagate it to all instances running that role. Once this is done I should see my farms' instances being added to Zabbix. Assuming all goes well, all of my Scalr instances will then be monitored by Zabbix.
One important question when monitoring AWS/Scalr instances is what to do if AWS itself goes down. I plan to run a Zabbix Proxy on an instance in AWS and have it send all events to a dedicated Zabbix Server I have running at another provider. This way the event data is kept safely off the AWS network, while the instances still report to the proxy from inside the AWS environment. Also, by running the Zabbix Proxy inside AWS I can keep its MySQL DB on an EBS volume, or have it backed up via Scalr, while the offsite DB stays on the Zabbix Server itself.
Once I have completed my testing of this scenario I will post more on my results. I would love to hear any ideas or feedback from others who have tried this or are interested in getting Zabbix going with Scalr/AWS.