The Systems Administrator’s Lab

Background

Recently, I listened to an O’Reilly webcast called “The Myths of Innovation,” in which Scott Berkun discussed the concept of a lab. He showed a picture of Edison’s lab: wooden tables, lamps, and beakers. Systems administrators are also inventors. We are required to script, program, and configure exotic servers and equipment. To discover new solutions, we need a lab. This is especially true with cloud computing and virtual infrastructure, where machines are created and destroyed constantly. You need a lab to track all of the successful and failed experiments.

Many people get caught up on which tool to use in the lab. Having and using a tool matters more than which one you pick. If you are having trouble selecting a tool, you are most likely just choosing between brands of beakers, so select one and move on. Don’t spend too much time on the selection process. You can always change tools later; the switch will take the same amount of time, but you will have been more productive in the interim.

If you have lab tools that you love, please share them in the comments.

Basics

  • Operations Development Box: This is often forgotten, but it is the most important piece. You need a dedicated development box for systems administrators only. Each team of programmers usually has a dev box, but quite often whole teams of systems administrators do not have one server truly dedicated to systems administration research and development. A gold server doesn’t count; you need a box that is all yours. A VM host is even better, so you can experiment with new operating systems. Preferably this box is a newer server, in the data center and backed up, just like a programmer’s development box, but all for you. If you ask, I promise your boss will buy one. Don’t skimp. You’re worth it, and it will be one of the best investments in your team. The Operations Development Box has the advantage of being accessible to all of the team members, including its command history, and it enables a new form of technical conversation.
  • A Ticket System: Everyone says this. Thomas Limoncelli says this in the introduction to his book. I prefer Request Tracker, but anything will do. Doing side work, I have even used Bugzilla. Just get it in place first and track all of your work there. It gives you a sense of “completeness” for each task and helps you move along. When you get done building/installing the ticket system, create a ticket for it and mark it completed.
  • A Wiki: A wiki is the next most important part of the lab. You need a place to track your work. If you have never used a wiki, it will take a while to get comfortable with this style of always accessible, version-controlled documentation, but just experiment. At first, put everything you can think of in it. Get a rough draft, then go back and edit. When I am documenting a process, it usually takes two or three iterations to really nail it.
  • Fault Monitoring: This is critical, so that you know when you are breaking something, even if it is an experimental machine. Have a fault monitoring system in place and use it to monitor the operations box, ticket system, and wiki. I use Nagios because it has been around forever and most likely will be. I don’t need the latest greatest because I do that in my lab.
  • Data Acquisition: Always capture historic data, and capture as much as you can. I use Cacti to capture ping, port, MySQL insert/update/delete stats, Apache thread stats, and so on. Capture anything. I use a central syslog-ng box for all of our logs. It is critical to have this stuff in place for analyzing what happened after the fact.
  • Version Control: You must learn the magic of this to develop long-lived scripts, plugins, configuration files, etc. It helps a great deal when collaborating with other systems administrators on your team, and it is critical when releasing tools to the internet and collaborating with others. I generally use SVN internally (legacy, and it works), but have been known to use Mercurial and Git as necessary.
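
To make the fault monitoring item a little more concrete, here is a minimal sketch of a Nagios-style check that could watch the ticket system or wiki. It relies only on the usual Nagios plugin convention of exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name check_tcp_port and the connect parameter are hypothetical and exist only for this illustration.

```python
import socket

# Exit codes used by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_tcp_port(host, port, timeout=3.0, connect=socket.create_connection):
    """Return (exit_code, message) for a simple TCP reachability check.

    `connect` is injectable so the check can be exercised without a network;
    by default it is the standard library's socket.create_connection.
    """
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError as exc:
        # Refused connections, timeouts, and DNS failures all land here.
        return CRITICAL, f"CRITICAL - {host}:{port} unreachable ({exc})"
    conn.close()
    return OK, f"OK - {host}:{port} is accepting connections"
```

A wrapper script would print the message and exit with the returned code, which is what Nagios reads to decide the state of the service.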

 

Lab Guidelines

Use a simple syntax for everything; you don’t want it to get in your way. Don’t be weighed down by what you should do. Just develop it as you can, and your life will be less stressful. Don’t spend thirty hours evaluating each piece of the lab. That is what your lab will help you do more efficiently.

Your skill at using all of these together will grow over time, making you more efficient at building out your lab. Data acquisition and fault monitoring will always grow, but try to get a basic syntax for the ticket system and wiki down early.

Ticket System

In the ticket system I use three simple headings: Info (optional for each ticket), Completed, and Action Items. Leading by example with a simple syntax can get everyone who uses the ticket system into the same format, which makes training and usage of the system easier for everyone.

As I complete items, I always copy the entire list to the bottom of the ticket so that it is easy to get a sense of where things stand without searching the whole ticket, which would be no better than a giant, convoluted email. Also, always keep your action items at the bottom so that they are easier to see.

Info
* item 1
* item 2

Completed
* item 1
* item 2

Action Items
* item 1
* item 2
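
The update routine described above (move a finished item to Completed, then repost the whole list with Action Items last) can be sketched in a few lines. This is only an illustration that assumes a plain-text ticket body. The function names render_ticket_update and complete_item are hypothetical and are not part of Request Tracker, Bugzilla, or any other ticket system.

```python
def render_ticket_update(completed, action_items, info=None):
    """Build a ticket comment with the three headings, Action Items last."""
    sections = []
    if info:  # Info is optional for each ticket
        sections.append(("Info", info))
    sections.append(("Completed", completed))
    sections.append(("Action Items", action_items))  # always at the bottom

    blocks = []
    for heading, items in sections:
        lines = [heading] + [f"* {item}" for item in items]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


def complete_item(completed, action_items, item):
    """Move one item from Action Items to Completed, returning new lists."""
    if item not in action_items:
        raise ValueError(f"not an open action item: {item}")
    remaining = [i for i in action_items if i != item]
    return completed + [item], remaining
```

Each time an item is finished, calling complete_item and then render_ticket_update produces the full list to paste at the bottom of the ticket.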

Wiki

In my wikis, for standard knowledge base items, I use a simple template with five headings:

= Background =
= Architecture =
= Routine Operations =
= Special Operations =
= Installation Notes =
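
If new pages are created often, the template can be stamped out by a small helper. The sketch below assumes the `= Heading =` syntax shown above; the function name new_wiki_page and the "TODO" placeholder text are hypothetical choices made for this example.

```python
WIKI_HEADINGS = [
    "Background",
    "Architecture",
    "Routine Operations",
    "Special Operations",
    "Installation Notes",
]


def new_wiki_page(notes=None):
    """Return a page skeleton with the five headings in a fixed order.

    `notes` maps a heading name to its starting text; headings without
    notes get a TODO placeholder so gaps are easy to spot later.
    """
    notes = notes or {}
    unknown = set(notes) - set(WIKI_HEADINGS)
    if unknown:
        raise ValueError(f"unknown headings: {sorted(unknown)}")

    parts = []
    for heading in WIKI_HEADINGS:
        parts.append(f"= {heading} =")
        parts.append(notes.get(heading, "TODO"))
    return "\n".join(parts)
```

Keeping the heading order fixed in one place means every knowledge base page reads the same way, which is the point of the template.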

Thinking Ahead

Once you have the main pieces in place, grow the lab organically.

Our operations box, ticket system, wiki, monitoring, and data acquisition have all grown to have redundancy. Our MySQL DB for the ticket system and wiki is even replicated in a master/master setup with failover DNS.

 

Experiment

Go forth and build something, young Edison (I prefer Tesla), but remember that you must be able to run experiments to really find creative solutions. We are scientists, and to get experimentation done right you need tools in place.

Here are some of the projects I have developed in my lab.

  • Log Analysis Program
  • Backup Scripts (very complex and interact with Bacula)
  • Scriptlog: Data acquisition sensor for bash scripts (similar to log4sh, but with a Nagios plugin)
  • Chev: Vulnerability RSS feed analyzer
  • Red Hat Build tool: Kind of like Cobbler, but from DVD/CD (eventually open source)
  • Red Hat Standard Installer: Configuration script which gets us to a baseline, with install/uninstall for approximately 30 modules, from NTP to MySQL and Apache (eventually open source too)
  • Cacti plugins
  • Nagios plugins
  • SquirrelMail password change plugin
  • Flat file Postfix system with IMAP, POP3, SquirrelMail, Vacation, Password change, and version control on all configs
  • Many, many, many scripts (more than a hundred and more than 20K lines of code)
  • Evaluation of New Software (for my lab too)

4 comments on “The Systems Administrator’s Lab”

  1. Automated tests are another tool that I've found to be extremely useful. Automated tests basically break down into two categories. First, there are the tests where you know you want to do something new, so you write a test to prove the capability and then implement the capability. This is commonly called TDD in the software development world. These kinds of tests are nice because they break up the planning and implementation phases, so you end up catching more edge cases than you would if you discovered all of them during implementation. Post-implementation, these graduate to the second class of tests: end-to-end tests.
    The end-to-end tests form a suite that can be run in an automated fashion (this part is important) and can be used to check that your system is in a working state. You can also run these tests when things are broken to help narrow down what has broken. In effect, these tests become a specialized set of tooling for proving and diagnosing your environment, and they are every bit as important as the standard tools.
    While not everything is easy to test and some things are simply not testable in an automated fashion, the more tests you can write to prove your system works, the easier it is to maintain that system.
