Red Eye Monitor (REM) is a project I am developing: a Total System Automation framework and set of scripts for managing cloud vendor and data center machines and storage in an integrated, fully automated fashion.
The site resides on SourceForge, here:
Because SourceForge doesn’t have any kind of blogging mechanism, I’m using WordPress for project updates, which keeps the custom tooling for this project to a minimum.
At the moment I’m finishing the documentation and adding the final SQL tables, which introduce the data center and advanced service and package concepts. REM has already been successfully tested on a functioning system, but I have since added automated persistence for databases and storage, and I wanted to expand it to include a “home cloud”, where all non-cloud machines, whether virtualized or raw hardware, are treated the same way as a machine instance from a cloud vendor such as Amazon’s EC2.
This design is now documented at the URL listed above, and the schema has been designed. My next steps, after finishing the basic documentation, are to merge the new schema in and wrap it up in the API code and basic web pages.
Parallel to this, I’m building a cloud-only EC2 install so I can start building a full matrix of possible failures for this system. With that matrix I can write tests to trigger each machine/system failure and ensure that REM covers the system properly. Failure tests would include writing bad data into configuration files, doing the same and then restarting the services, writing random data over storage volumes or critical kernel modules, deleting database tables, filling up a partition, changing permissions on log files, and every other distinct failure condition I can identify.
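To make the idea of a failure matrix concrete, here is a minimal sketch in Python of how it could be represented as data. It is not REM's code: the names `FailureCase`, `Status`, `build_matrix`, `summarize`, and `broken_cases`, the failure mode identifiers, and the three target names are all made up for illustration. It only tracks which combinations have been tested and does not inject any real faults.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Status(Enum):
    UNTESTED = "untested"
    RESILIENT = "resilient"
    BROKEN = "broken"


# Failure modes taken from the list above; the identifiers are invented here.
FAILURE_MODES = [
    "bad_config",
    "bad_config_then_restart",
    "random_data_on_volume",
    "random_data_on_kernel_module",
    "drop_database_tables",
    "fill_partition",
    "chmod_log_files",
]

# Hypothetical targets; the real matrix may be keyed differently.
TARGETS = ["web", "database", "mail"]


@dataclass(frozen=True)
class FailureCase:
    """One cell of the matrix: a failure mode applied to a target."""
    mode: str
    target: str


def build_matrix(modes, targets):
    """Map every (mode, target) combination to UNTESTED."""
    return {FailureCase(m, t): Status.UNTESTED for m, t in product(modes, targets)}


def summarize(matrix):
    """Count how many cases currently hold each status."""
    counts = {status: 0 for status in Status}
    for status in matrix.values():
        counts[status] += 1
    return counts


def broken_cases(matrix):
    """Return the cases that still need handling code written."""
    return [case for case, status in matrix.items() if status is Status.BROKEN]


if __name__ == "__main__":
    matrix = build_matrix(FAILURE_MODES, TARGETS)
    # Record the outcome of one (imaginary) test run.
    matrix[FailureCase("fill_partition", "database")] = Status.BROKEN
    print(summarize(matrix))
    print(broken_cases(matrix))
```

Keeping the matrix as plain data means the same structure can drive the test runner and report which cases are still untested or broken.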
Once this matrix has been designed, I will put the REM installation through its paces and find which areas are already functioning resiliently and which are broken, and start writing the code to handle the broken cases.
I hope to get this in hand over the next couple of weeks. Over the next day or two I’ll be finishing up the basic documentation, which will also serve as the living design document.
If anyone is interested in this project, I’m not looking for coding support at the moment, as too many design aspects are still in flux. I could definitely use experienced opinions on anything that catches your eye as a logic flaw, a gap in the design, or a missing failure case.
This project will be considered Beta when I have a comprehensive set of documentation, an ISO and an EC2 AMI image that will kickstart a REM installation, and a failure matrix that is well populated and fully tested with the initial REM packages.
The REM installation will consist of an Apache web server pool with a MySQL backend, a Postfix mail server, an NFS server to share static content between the Apache servers, and a syslog server: the basics of a functioning internet presence.
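As a rough illustration, the installation described above can be written down as data, which makes it easy to ask which roles are affected when one of them fails. This is only a sketch in Python under my own assumptions: the `STACK` layout, the `depends_on` and `pooled` keys, and the `affected_by` function are hypothetical and not part of REM. The only dependencies modeled are the ones stated above (Apache on MySQL and on NFS); logging to the syslog server is deliberately left out.

```python
# Hypothetical description of the initial installation; not REM's schema.
STACK = {
    "apache": {"depends_on": ["mysql", "nfs"], "pooled": True},
    "mysql": {"depends_on": [], "pooled": False},
    "postfix": {"depends_on": [], "pooled": False},
    "nfs": {"depends_on": [], "pooled": False},
    "syslog": {"depends_on": [], "pooled": False},
}


def affected_by(failed, stack):
    """Return the roles that directly or indirectly depend on `failed`."""
    affected = set()
    frontier = [failed]
    while frontier:
        current = frontier.pop()
        for role, spec in stack.items():
            if current in spec["depends_on"] and role not in affected:
                affected.add(role)
                frontier.append(role)
    return sorted(affected)


if __name__ == "__main__":
    for role in STACK:
        print(role, "->", affected_by(role, STACK))
```

With this layout, a MySQL or NFS failure is reported as affecting the Apache pool, while a Postfix failure affects nothing else.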
Once this installation has been shown to hold up under the failure tests, leaving Beta is a matter of getting enough positive feedback.