Aborting the Red Eye Monitor project and next steps

June 7, 2011

This project has been a success from many standpoints, but internal adoption has not been one of them.  Comprehensive automation is very hard to grasp; being comprehensive, it is by nature extensive and detailed.

I’ve weighed the time cost of releasing the system as it is, and I don’t think it’s worth the support requests it would generate (if it gained any interest) until comprehensive automation has been documented and has a base of understanding in the industry.

I’m going to refocus my home efforts on documenting how to create comprehensive automation and the methods I used in the REM project.  Once I have explained how things work and there is some grassroots interest in this kind of automation, a release may be warranted.

I’m also going to break the project up into updated component pieces, release those as separate open source technologies, and use them as examples when documenting how I comprehensively automate things.

I’m going to leave this blog up as a placeholder, but all new writing will be posted at the more general site:

ge01f.wordpress.com


Automation Package Editor Screenshot

February 15, 2011

I’m making pretty good progress on the GUI to edit the internals of the system.  I’m sticking to a pretty basic approach, with just a few goals:

  • All data resides in YAML for optional sysadmin-friendly hand editing, but everything can also be edited in the GUI
  • Packages in REM are a hierarchy of tagged data by dictionary/hash key, with data indexed underneath.  The breadcrumbs in the above picture show this: test.yaml >>> jobs >>> tester >>> tester2
  • Leaf nodes can be grouped in a deep hierarchy, to make it easy to organize the nodes.  Nodes can be copied and pasted to other similar hierarchy types (Schema Sections)
  • A package is specified by a Schema Instance, and then instantiated by a Schema Instance Item.  There can be any number of items per Schema Instance, so a package can be defined as a specification of specifications, then instantiated with custom data and custom usage of the specification, letting one general outline serve many different projects.
  • Packages are essentially equivalent to “distributed programs”: they can specify jobs to be run on many different machines, with many workers per job if desired.  Jobs can return output or save results to a message queue; results can be graphed and analyzed against custom SLAs, which can have alerting or meta-analysis data stored.  This is the normal expected case for any job, not a theoretical “it can be done”: it is assumed every job will potentially want graphing of its results, with alerting or automated responses kicked off when data is out of the specified tolerance over a period of time, or meets a test script’s criteria.
  • Packages can mount web pages and RPC functions specified in them (default Sections: ‘http pages’ and ‘http rpc’), and can use other packages as fallbacks for misses, so they can extend the functionality of base packages and reuse them where standard results are desired.
  • Packages are meant to be used as a data-based Domain Specific Language.  The data is organized into actions and groups, and the Section specifier then processes the data as if it were a language.  In this way plans can be built, and the package substitutes for a normal program’s core architecture, starting from Main(), initializing state and running code.  All of this is specified in the Package, and the data’s hierarchy serves as the architecture for when scripts should be called and what data should be passed to them.
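To make the “data as a DSL” idea concrete, here is a minimal Python sketch.  The `process_package` function and the handler registry are invented for illustration; they are not REM’s actual API:

```python
def process_package(package, handlers):
    """Walk a package's sections and dispatch each leaf item to that
    section's handler.  A toy stand-in for REM's 'process' scripts: the
    data hierarchy itself decides what gets called, and with what data."""
    results = []
    for section, groups in package.items():
        handler = handlers[section]  # the Section's 'process' behavior
        for group, items in groups.items():
            for label, item in items.items():
                results.append(handler(group, label, item))
    return results

# The same shape as the 'jobs' data in this post
package = {'jobs': {'tester': {'tester1': {'script': '/tmp/tester.py',
                                           'workers': 1}}}}
handlers = {'jobs': lambda group, label, item: (group, label, item['script'])}
```

The point is that no control flow lives in the package itself; the hierarchy is the program, and the Section specifier supplies the behavior.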

Here’s the data that is being edited:

jobs:
  tester:
    tester1:
      script: /tmp/tester.py
      name: Tester Script 1
      title: Tester Script 1
      command: null
      workers: 1

    tester2:
      script: /tmp/tester.py
      name: Tester Script 2
      title: Tester Script 2
      command: null
      workers: 1

The Schema Section, which is what will be processed, has two grouped index layers: the first is an actual group, “tester”, and the second is the labels for the ‘jobs’ item data.  Grouped indexes can be any depth, and field names are assigned to the indexes, so the final item data collects new fields along the way, picking up its hierarchy position as field information.  In this case, it is specified as:

# Indexes we keep to reference this data, grouped in layers
grouped indexes:
  - index: group
    type: text

  - index: null
    key field: name

This means that both of these items will also have a field ‘group’ with the value ‘tester’.
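Here is a rough Python sketch of how index layers could attach fields to items as they are collected.  This is an illustration of the behavior described above, not REM’s actual implementation:

```python
def collect_items(data, indexes, inherited=None):
    """Recursively walk grouped-index layers, attaching each layer's
    field name and value to the leaf items (sketch only)."""
    inherited = dict(inherited or {})
    index, rest = indexes[0], indexes[1:]
    items = []
    for key, value in data.items():
        layer = dict(inherited)
        if index.get('index'):                   # e.g. 'group'
            layer[index['index']] = key
        if index.get('key field'):               # e.g. 'name'
            layer.setdefault(index['key field'], key)
        if rest:
            items.extend(collect_items(value, rest, layer))
        else:
            record = dict(value)                 # the item's own fields
            record.update(layer)                 # plus inherited index fields
            items.append(record)
    return items

grouped_indexes = [{'index': 'group', 'type': 'text'},
                   {'index': None, 'key field': 'name'}]
jobs = {'tester': {'tester1': {'script': '/tmp/tester.py', 'workers': 1},
                   'tester2': {'script': '/tmp/tester.py', 'workers': 1}}}
```

Running `collect_items(jobs, grouped_indexes)` yields both items, each carrying a `group` field with the value `tester` picked up from its hierarchy position.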

Sections can contain other Sections, and so can be layered as deeply as required.  Sections can also be linked out to other files in several different ways to create various types of relationships.

Sections specify all the scripts associated with the section, the most basic being the ‘process’ script.  In the case of the ‘jobs’ section, the process script starts up jobs (Python scripts, in this case) through the Job Manager, which can run jobs on the current host, or schedule them to run on remote hosts and receive the results when they complete (or periodically through message queue replication, if the job is long-running).  Replication is built into REM as a core component, along with shared state, locks, message queues, counters and time series data storage (for graphing and time series analysis).
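A stripped-down sketch of the local-versus-remote dispatch described above might look like this.  The function name and the result shapes are invented; REM’s Job Manager does considerably more:

```python
import subprocess

def run_job(job, remote_host=None):
    """Run a job's script locally, or mark it queued for a remote host.
    Sketch only: the real Job Manager also handles workers, scheduling,
    and result replication through the message queue."""
    if remote_host is not None:
        # Remote: the job would be queued, with results replicated back
        # on completion (or periodically, if the job is long-running).
        return {'job': job['name'], 'host': remote_host, 'status': 'queued'}
    # Local: run the script directly, with an optional wrapping command
    command = [job['script']] if not job.get('command') else \
              job['command'].split() + [job['script']]
    result = subprocess.run(command, capture_output=True, text=True)
    return {'job': job['name'], 'host': 'localhost',
            'exit_code': result.returncode, 'output': result.stdout}
```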

Sections also specify their fields, including a type and validation.  Types are high level and have their own schema definition; like the Section specification, they specify scripts to validate, format, save, serialize and perform other operations on the type of data.  Types are meant to be added whenever new basic functionality on data is desired.
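The type idea can be sketched as a small registry of per-type operations.  The registry API here is invented for illustration, not taken from REM:

```python
# Sketch of a pluggable type registry: each type bundles the operations
# (validate, format, ...) that Sections can call on field data.
TYPES = {}

def register_type(name, validate, fmt=str):
    TYPES[name] = {'validate': validate, 'format': fmt}

def validate_field(type_name, value):
    """Run the registered validator for a field's declared type."""
    return TYPES[type_name]['validate'](value)

# Two basic types; new ones are added whenever new behavior is needed
register_type('text', validate=lambda v: isinstance(v, str))
register_type('integer',
              validate=lambda v: isinstance(v, int) and not isinstance(v, bool))
```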

Sections additionally specify their rendering information, for instance the edit dialog above was rendered with the following specification:

edit:
  field sets:
    - Job:
        - name
        - title
    - Execution:
        - script
        - command
        - workers

This specifies the order in which the fields are displayed in the Field Set editing dialog box that comes up when you edit a Schema Section Item.  Note there are two field set groups specified to visually separate the fields: ‘Job’ and ‘Execution’.  This can also be used to create a wizard style interface with multiple pages of field set groups.
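A renderer consuming the ‘field sets’ spec could flatten it into display order roughly like this (a sketch only; the real dialog renderer is part of REM’s GUI code):

```python
def ordered_fields(edit_spec):
    """Flatten an 'edit' rendering spec into (group, field) pairs,
    preserving the display order of groups and their fields."""
    pairs = []
    for field_set in edit_spec['field sets']:
        for group, fields in field_set.items():   # e.g. 'Job', 'Execution'
            for field in fields:
                pairs.append((group, field))
    return pairs

# The same structure as the YAML spec above, as parsed data
edit_spec = {'field sets': [{'Job': ['name', 'title']},
                            {'Execution': ['script', 'command', 'workers']}]}
```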

I’m still working out all the functionality for creating new packages, adding sections to packages, and creating and moving items around under the indexes in the sections.  Once that gets worked out, I’ll go back through all the other features built the non-dynamically-edited way and migrate them to working with data in this new way.

Hopefully I’ll have some screenshots of dynamically creating web pages and widgets as part of the tool building process by next week.  After that I’ll put up a demo on EC2 to show it controlling another EC2 instance as it goes through various stages of configuration and forced failures.


The Delay in Release

February 10, 2011

It’s taking a while to get the release together, and it’s going to be a while longer until it is done.  The current guess is maybe two months of delay.  Primarily, the majority of my time is now directed at other projects, but in addition I ran into the documentation issue.

For software to really be released, it needs reasonable documentation, and the Red Eye Monitor (REM) project is a large and complex project meant to do large and complex things, so it needs documentation that makes at least its basic operations clear before it can really launch.

Since I have developed REM to use very loose hierarchical data structures and a loose pluggable architecture, both of which can recurse, documenting how to work with them would first require explaining all the methods and motives I used in putting the system together, which would take a good deal of writing.

Instead, I’ve decided to put time into the front-end GUI, so that I can document using the system through snapshots of GUI pages and explanations of the workflow and the schema at each stage.  This should allow interested parties to quickly install it and start playing around with configuring it to do new things, and I can defer writing about the internal structure until after it starts building an install base.


Update: Next Beta Release Includes Usefulness

December 10, 2010

The last beta release (000) ran, but was not especially useful, as some of the required features for monitoring and alerting were missing.  The upcoming release (001) will be fully functional and usable as a monitoring and alerting solution (though it is still early in its application life cycle).

Things have been delayed a bit, as I have taken the steps to complete the automation platform, and not just the monitoring application.

New pieces:

  • Packaging system: Full life cycle management for adding new components, changing things, updating things, and wrapping all the different kinds of stuff needed for operational automation together.  This includes: HTTP/RPC registration, a state machine for executing long-running code, a job system for executing scheduled code (distributed worker model included), requiring and importing other packages, a module plug-in system, defining data used by the package, and replication for state between nodes.
  • Job Scheduling: One time, recurring, cron-style, worker threads, distributed/remote worker threads.  Job control, and result handling management (replication/storage/processing) are included in the Job scheduling model.
  • Replication: Simple push/pull model for state and queue data for now.  Later this will be expanded by pushing any state changes and slurping back the updates, but for now simple gets the job done and creates an automated flow of information to keep nodes up to date, and deliver results generated locally on nodes to management systems.
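The one-time and recurring scheduling modes listed above can be sketched with a small priority queue.  This is illustrative only (the class name is invented); REM’s Job Scheduling system also covers cron specs, worker threads and remote workers:

```python
import heapq
import itertools
import time

class TinyScheduler:
    """One-time and recurring job scheduling via a priority queue (sketch)."""

    def __init__(self):
        self._queue = []                  # (run_at, tiebreak, interval, func)
        self._counter = itertools.count() # tiebreak so funcs never compare

    def add(self, func, delay=0.0, interval=None):
        run_at = time.time() + delay
        heapq.heappush(self._queue,
                       (run_at, next(self._counter), interval, func))

    def run_pending(self, now=None):
        """Run every job whose time has come; requeue recurring jobs."""
        now = time.time() if now is None else now
        results = []
        while self._queue and self._queue[0][0] <= now:
            run_at, _, interval, func = heapq.heappop(self._queue)
            results.append(func())
            if interval is not None:      # recurring: schedule the next run
                heapq.heappush(self._queue,
                               (run_at + interval, next(self._counter),
                                interval, func))
        return results
```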

These latest modules bring the system from a local Rapid Operations Automation Development System (ROAD), into being a distributed/cluster ROAD.

The Package and Job Schedule systems do a much better job of encapsulating code and data to be run on a single system, and make adding more nodes very simple with a minimum of added complexity.  They also provide all the functionality necessary for local agent monitoring and automation, the lack of which had been a major delay in finishing the monitoring system’s functionality.

I’m not sure I’ve mentioned this here, but I have a policy of working towards Logarithmic Effort.  I find that many projects fall into requiring Exponential Effort as they progress: each change takes exponentially more coding/testing/deploying effort to accomplish.

Creating libraries that allow Logarithmic Effort to produce more and more logical content means using Network Effects: the structure and flow of the process creates functionality that would otherwise have to be built directly.  This is pretty subtle stuff, and probably sounds like BS, but isn’t.  I’ll try to figure out how to clearly demonstrate it in some of the documentation examples.  Using my system, you get the benefits, as they are wrapped up in the system’s functionality, but I think it would be useful to continue applying them in your custom scripts as well.

My goals with infrastructure development are always to work less: not today, but in the future.  Each progression of the Red Eye Monitor (REM) system has been developed with that goal in mind: reduce the effort required to do any piece of work, steering towards logarithmic effort and away from exponential effort.

Where logarithmic effort cannot be achieved, go for linear effort.  The changes can’t be shared, but things can be copied, pasted and changed (using descriptive data, templates and small pieces of isolated-yet-networked code) without side effects or creating more work in the future.

I believe the system I have now is well on its way to providing this Logarithmic Effort for creating operational automation, and I hope to start demonstrating that in articles showing how to build things inside the REM Package System in the near future.

I’m aiming for having the 001 release and an online demo running this Sunday, and then documentation should begin to flow in after that.  This was also my intention last weekend, so slippage may occur.


GUI Editor Teaser Image

September 22, 2010

I am about a week away from releasing the “REM Monitoring” package, which will be the first of the product releases from the REM suite.

Currently all the local and remote-local-collection monitoring is done, and I have been wrapping up a GUI editor for easy creation of monitoring, dashboard and other general purpose web development for REM tools.  The GUI relies on jQuery and a lot of awesome UI plugins developed by the community, which I have integrated and added to a common widget rendering Python library called jquery_drop_widget.

The next post should have the Alpha release of the REM Monitoring package and initial documentation on how to use it, including the GUI page and widget layout system.

Here is a screenshot of a full page layout getting a new widget created:


Local System Monitoring Demo

August 22, 2010

I have the first draft of the local system monitoring demo (single node) ready: It can be viewed here.

I’ll be fleshing this out more after I finish the monitoring for Linux, fix the Disk I/O to update properly in FreeBSD and OS X, and fix the View Internals for RRDs that have multiple targets per type.  Then I’ll add some formatting for the sections, make the list of items dynamic so you can turn uninteresting ones off, and ship that demo.  After a few more demos to finish testing all the different packages that make up Red Eye Monitor (REM), I will turn this into a real monitoring software install that does good things out of the box and works on single or multiple nodes.


dropSTAR released as Python library

August 19, 2010

dropSTAR has now been released as a stand-alone Python library for creating an HTTP server.

I’ll be putting together RPM/Deb/make packages to give a more functional install for those interested in the functionality rather than the library itself.  These will come with installers for modules on the dropSTAR and procblock platform, which will allow services to be packaged and downloaded separately, staying focused on providing functionality rather than a lower-level development framework.

More to come!