Petit is an open source log analysis tool
This short video gives a basic overview of what the Petit log analysis tool can do
- The Basics
- Issue Tracker
- Routine Operations
- Special Operations
Log analysis is something that all systems administrators know they need to do. Many of us come to it because there is a problem, because the organization has a security requirement, or because wondering what is going on in all of that data keeps us up at night. Finding best practices for log analysis on the Internet is difficult at best. Many years ago, I discovered a script that hashed log files by removing all of their numbers and replacing them with "#" characters. The results of this simple algorithm were phenomenal: logs could be reduced by a factor of ten. The output was much more readable, yet it kept much of the quality data that I needed to determine whether there was a problem. In the years since, I have discovered many text analysis techniques that are commonly used in linguistics and anthropology to analyze natural languages. This has led me to develop some very simple best practices for analyzing logs.
- Logs are made up of output that is programmed by human beings. There are no real constraints on what is output, other than some cultural rules about being professional. This makes the output from programs very much a natural language. It also makes the output of someone's program an approximation of the reality of what is happening inside that program. This is important to remember: logs are not perfect.
- When a systems administrator analyzes logs by changing them, he is creating an approximation of an approximation of the reality inside a working program. This is not necessarily a bad thing, especially when the programmer never gives you better than his approximation of reality anyway.
- In practice, logs are made up of certainty and uncertainty. For example, I know what OpenSSH puts in the log during a login, because it is common. On the other hand, I do not know what a Compaq DL380 G3 will put in the log when it has a disk controller error. This is important to remember.
- The basic log analysis algorithm in Petit works to remove certainty while leaving uncertainty. Stated another way, Petit quantitatively removes certainty, thereby leaving uncertainty, which by necessity requires qualitative analysis from a systems administrator.
- After the algorithm has been applied, the output must be read by a systems administrator to determine whether each entry is normal or abnormal. Abnormal entries can then be acted on, hopefully before there is a noticeable impact on your system.
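The core idea described above can be sketched in a few lines of Python. This is only a minimal illustration, not Petit's actual implementation: the names `hash_line` and `hash_log` and the sample log lines are made up for the example, and collapsing each run of digits into a single "#" is an assumption about how the numbers get replaced.

```python
import re
from collections import Counter


def hash_line(line):
    """Replace every run of digits with a single '#' (assumed behaviour)."""
    return re.sub(r"\d+", "#", line)


def hash_log(lines):
    """Return (hashed_line, count) pairs, most frequent first."""
    counts = Counter(hash_line(line.rstrip("\n")) for line in lines)
    return counts.most_common()


# Hypothetical log lines, invented for this sketch.
sample = [
    "sshd[2101]: Accepted password for alice from 10.0.0.5 port 51234",
    "sshd[2177]: Accepted password for alice from 10.0.0.9 port 40022",
    "kernel: disk controller error on slot 2",
]

for hashed, count in hash_log(sample):
    print(count, hashed)
```

The two login lines collapse into one entry with a count of two (the certainty), while the disk controller line stays on its own for a human to read (the uncertainty).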
Version 1.1.1: Change Log
RPM, Deb, and tar archives for use with any Unix and Cygwin. The tar archive also contains scripts/Makefile to create RPM/DEB packages.
Issues, bugs, and problems can be reported and tracked on Google Code.
Hash a syslog, removing reboots and applying all standard filters. By default, Petit will show a sample for every entry that is found three or fewer times.
Hash an Apache log
Get a daemons report
Get a host report
Show samples for each entry
Find qualitatively important words in your log. This is especially useful for deciding what should be monitored in swatch or some other program.
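As a rough illustration of word discovery, the sketch below counts words in a log while skipping common stopwords. It is not Petit's code: the function name `word_counts`, the tiny stopword set, and the sample lines are all assumptions made for the example.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stopword list; a real one would be longer.
STOPWORDS = {"the", "a", "for", "from", "on", "to", "of", "in", "and", "is"}


def word_counts(lines, stopwords=STOPWORDS):
    """Count alphabetic words, case-insensitively, ignoring stopwords."""
    counts = Counter()
    for line in lines:
        for word in re.findall(r"[A-Za-z]+", line.lower()):
            if word not in stopwords:
                counts[word] += 1
    return counts


# Hypothetical log lines, invented for this sketch.
sample = [
    "Failed password for root from 10.0.0.5",
    "Failed password for admin from 10.0.0.7",
    "Accepted password for alice",
]

for word, count in word_counts(sample).most_common(3):
    print(count, word)
```

Words that survive the stopword filter and show up often, such as "failed" here, are candidates for a monitoring rule.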
Graph first 60 seconds in a syslog
Graph first 60 minutes in a syslog
Track a special word you are interested in by minute
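Tracking a word by minute amounts to bucketing matching lines by their timestamp. The sketch below assumes the classic syslog timestamp layout ("Jan  5 10:15:32") at the start of each line. The names `count_by_minute` and `render`, the text bar format, and the sample lines are invented for illustration and do not reflect Petit's real output.

```python
import re
from collections import Counter

# Matches a classic syslog timestamp such as "Jan  5 10:15:32" (assumed layout).
TIME_RE = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+(\d{2}):(\d{2}):\d{2}")


def count_by_minute(lines, word):
    """Count lines containing `word` (case-insensitive) per HH:MM bucket."""
    counts = Counter()
    needle = word.lower()
    for line in lines:
        match = TIME_RE.match(line)
        if match and needle in line.lower():
            counts[match.group(1) + ":" + match.group(2)] += 1
    return counts


def render(counts):
    """Draw one text bar per minute, in chronological order."""
    return "\n".join(
        f"{minute} {'#' * n} ({n})" for minute, n in sorted(counts.items())
    )


# Hypothetical log lines, invented for this sketch.
sample = [
    "Jan  5 10:15:32 web1 sshd[1]: Failed password for root",
    "Jan  5 10:15:40 web1 sshd[2]: Failed password for root",
    "Jan  5 10:16:01 web1 sshd[3]: Accepted password for alice",
    "Jan  5 10:17:09 web1 sshd[4]: failed password for bob",
]

print(render(count_by_minute(sample, "failed")))
```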
Use With Non-Standard Log Files
Create an on-the-fly driver for a non-standard file format, then pipe its output to Petit. Petit can hash non-standard file types without trouble, but graphing requires the time values to be in the correct columns.
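A driver of this kind only has to move the time values into the columns that graphing expects. The sketch below is one hypothetical example: it assumes the input lines start with an ISO-style "YYYY-MM-DD HH:MM:SS" stamp and that a syslog-like "Mon DD HH:MM:SS host message" layout is what is wanted. The function names, the input format, and the placeholder host name are all assumptions.

```python
from datetime import datetime


def to_syslog_line(line, host="localhost"):
    """Rewrite 'YYYY-MM-DD HH:MM:SS message' as 'Mon DD HH:MM:SS host message'."""
    stamp, message = line[:19], line[20:].rstrip("\n")
    when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    # Classic syslog pads the day of month with a space rather than a zero.
    return f"{when.strftime('%b')} {when.day:2d} {when.strftime('%H:%M:%S')} {host} {message}"


def convert(lines, host="localhost"):
    """Yield converted lines; lines without a parsable stamp pass through unchanged."""
    for line in lines:
        try:
            yield to_syslog_line(line, host)
        except ValueError:
            yield line.rstrip("\n")


# Hypothetical input in a non-standard format, invented for this sketch.
sample = [
    "2010-01-05 10:15:32 disk controller error\n",
    "no timestamp on this line\n",
]

for out in convert(sample):
    print(out)
```

Feeding `convert` from standard input and printing each result would turn the sketch into a filter that sits in front of Petit in a pipeline.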
Do Not Remove Patterns
Sometimes, especially with custom log files, you may want to keep all patterns found in the log file.
Use With Just Sniffer
Just Sniffer is a really cool sniffer that outputs in Apache access log format. This makes it very easy to use with Petit.
Mod Security Quick and Dirty Analysis
Petit treats this as a raw log file with no timestamp data, but it at least gives a quick and dirty overview. Mod_security is a Web Application Firewall that is gaining in popularity. Most of the tools for analyzing its format are made by Breach Security (http://www.breach.com/).
- Sane by default, works out of the box
- Designed to follow the Unix philosophy of small, fast, and easy to use
- Intersect, not overlap, with other tools such as cat, tail, awk, sed, and grep
- Sane by default, works out of the box (sanebydefault.com)
- Auto-detects and supports Syslog, Apache Access, Apache Error, Snort Log, Linux Secure Log, and raw log file formats
- Log reduction for easy reading
- Word discovery and count with common stopwords support
- Default & Custom filters
- Fingerprints, useful in identifying and eliminating reboot signatures
- Different output options for wide screen terminals and character selection
- Add --dev1 and --dev2 to display a subset of the --hash report during graphing. This will display all entries that are outside 1 or 2 standard deviations respectively. This will be useful for finding new problems that arise in a log file
- Key/Value Bar Charting for side by side comparison of distributions
- Python 2.3 support
- Add better fingerprint database, especially Debian/Ubuntu/Suse
- Restructure implementation of fingerprinting for hashing
- Artificial ignorance was the inspiration for my original Perl script lt, which later became Petit. This mailing list entry describes the basics of artificial ignorance: http://www.ranum.com/security/computer_security/papers/ai/index.html