Rapid Operations Automation Development (ROAD)

I am getting ready to release Red Eye Monitor (REM) to the public for people to start using, and have been working at how to describe it.

It started as a monolithic system for automating system provisioning, configuration, deployments, monitoring and reactions. A full life-cycle automation system.

After implementing it in several different environments, and several different ways, I learned that while this is a viable method, it is not very ideal as it there are always more business goals that need to be pursued and there are many gaps that need to be filled in an operations environment and often people are not ready to think about their operations as a big picture. They’re not ready for an Authoritative System Automation system. It’s too much of a change for their processes and their way of thinking about their operations.

So in response development started to change to become more modular and light weight, and I think the REM system now most closely resembles a development platform, and specifically a type of Rapid Application Development (RAD) platform.

However, RADs are designed primarily around GUI applications, or end-user applications. The way that one thinks about a RAD and how one uses one is very different than an operational system, which is not a single application with some data sources and a UI and some logic tying the two together and doing validation and formatting.

An operational development platform requires a lot of communication devices, and ways to collect up custom data, and to process it into things that are general and can be acted upon, based on business goals that change frequently, and most important, it is a live and running system, in the wild.

A system of development tools, slanted for automation of live operations systems, to rapidly create new ways to monitor, configure and deploy data and logic and analyze how it’s all working has a very different feel to it than an application development system like Delphi or Visual Basic were when RADs were last in fad.

The REM suite of tools and libraries has become something I think of as a Rapid Operations Automation Development system, and over the next few days and weeks I will be releasing documentation and demos explaining all of the pieces that the suite consists of. This post and the previous one on Authoritative System Automation are intended to give background and context to these tools and libraries, as there are a lot of ways to use them together, and when used in conjunction they provide a system for authoritatively provisioning and controlling complex and large systems that scale and provide comprehensive insight into what is going on.

They are an architecture, with some standard libraries for working out of the box, which can be extended in many ways to account for the many goals of businesses and are bound to concepts, not specific technology, so that as new platforms emerge they can be integrated into an existing system, and wrapped with the same container logic as existing systems.

The current suite of REM tools and libraries is:

procblock: A hybrid data and logic processor. This is a primitive-data sourced tag processor which conditional returns data and is a hardened execution environment that manages threads and pipe-oriented-execution of scripts and shall commands. It will need it’s own documentation and examples to explain, which should be finished in a few days. Defaults with a YAML backend, making the tag logic look similar to Python. Tags are overloadable to custom data sourced mini-languages can be created for defining and processing data. Both data and code blocks are treated the same way, and are essentially interchangeable. Time series collection (default: RRDTool) and graphing are also included, as is interval based result caching (running in threads). This has command line parsing functionality to provide state directly from the command line, in addition to the normal Python invocation.

dropSTAR: A threaded web server that runs in a procblock, processes HTTP page requests and RPC code requests via procblocks. Made to conditionally import site configuration, and make it extremely fast to add custom code to one or more machines, specified dynamically by some configuration data. Inclusion of RPC makes all nodes easily networked in communicating their needs or states in any communication topology desired. Pages are constructed, by default, by running a procblock pipe of scripts, and templating the result through a text file. Portlets can be created by embedding recursive requests. Maximum flexibility is left to the developer of the scripts and minimal effort is needed to create a new script. Can cache long-running or fault-likely steps to avoid stalls in requests.

Shared Resources Library: A python library that includes a message queue, a shared lock manager, a shared state manager, a shared counter manager, and a shared connection pool manager (ex: database cursors). This allows very small scripts to be written, and communicate and coordinate with other small scripts. Logic should be made minimal, and through the shared resources enterprise level features can be added. This is not meant as a high-performance system, in competition with memcache/reddis or ActiveMQ/RabbitMQ, but is meant to be a solid development tool to rapidly create operation automation scripts without adding more operational overhead, which standalone shared resource software requires. All of these are based on thread-safe dictionaries and lists, keeping with data-primitives for maximum development flexibility and simplicity. Expansive logging and wrappers for executing code on a system are also included in this.

Time Series Library: A python library that wraps Time Series requests for a single node. Default implementation backend is RRD, but that is adjustable. Many-node systems will manage file locations and naming and use this library both locally on nodes, and on regional collectors, as desired.

Dot Inspection Library: A small stand-alone library that allows inspection and manipulation of data-primitives via strings. So strings “var1.var2”, “var1.0”, “var1.-1”, “var1.-1.var2”, would do inspection into a data-primitive using input data from a dictionary with “var1” and “var2” fields defined. Sub-inspections can be done like “var1.(svar1.svar2).var2”, so that inspections can be made dynamic. This allows deep configuration of data manipulation at run-time, and stored in data sources, allowing more to be done in procblock, and freeing up real logic code (Python/whatever) to be simpler and more about interacting with the operating system and other services, so that architectural issues and architectural goals can be a separated implementation process than direct interaction logic.

schemagen: A schema generator for a backend data source (default implementation is MySQL). Schema information and relations are specified in data (default: YAML) and can create or update a database or other data source. Implementation allows primary key access into the data source as well, so after the MySQL database has been generated, or updated, requests can be formatted through schemagen to extract or insert data, allowing both specification and interaction wrapping through schemagen.

Mother Brain: This is the schema definition for REM, and is fairly massive in scope, covering all physical and virtual hardware, their connections, OS platform specifications, software package specifications, service, users, monitoring, SLAs and everything else in the REM Authoritative System Automation system.

Utility Computing Library: A wrapper for dealing with utility computing (default: Amazon’s EC2). This wraps provisioning machine instances, disks, floating IPs and all the other resources that are required to create a cloud-like Utility Computing system.

REM scripts: Like the Utility Computing scripts, there are many scripts for managing databases, file systems, services, deployment, monitoring, user lists, pager rotation and many more topics. These integrate into dropSTAR and the Mother Brain to be run via procblock or normal system methods to perform actions and collect data.

These tools and libraries form the foundation of the REM system. REM is intended to be customized and expanded, and ultimately to allow “plans” for automated system administration to be published and shared, creating an open source environment for operations.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: