1. getstats -c (concise report)
Server: http://www.eit.com/ (NCSA)
Local date: Fri Feb 11 18:17:07 PM PST 1994
Covers: 02/09/94 to 02/11/94 (3 days).
All dates are in local time.
Requests last 7 days: 4495
New unique hosts last 7 days: 358
Total unique hosts: 358
Number of HTML requests: 1854
Number of script requests: 472
Number of non-HTML requests: 2169
Number of malformed requests (all dates): 5
Total number of all requests/errors: 4500
Average requests/hour: 90.2, requests/day: 2164.7
Running time: 11 seconds.
This basic set of statistics is always output when getstats runs. With the -c option, getstats produces only this statistics paragraph.
2. getstats -m (monthly report)
Covers: 10/30/93 to 11/08/93 (9 days).
All dates are in local time.
Each mark (#) represents 1000 requests.
----------------------------------------------
Oct (10/30/93): 569 : #
Nov (11/04/93): 2 :
...
The -m option will produce a monthly report of server use. The dates in the report are the first day of reported activity for that month.
3. getstats -w (weekly report)
Covers: 12/28/93 to 01/27/94 (32 days).
All dates are in local time.
Each mark (#) represents 500 requests.
----------------------------------------------
Week of 12/27/93: 1878 : ###
Week of 01/03/94: 5606 : ###########
Week of 01/10/94: 23287 : ##############################################
...
The -w option will produce a weekly report of server use. The dates in the report are always the Monday of that particular week.
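The Monday labelling can be illustrated with a short Python sketch. This is not taken from the getstats source; the function name week_start is made up for this example.

```python
from datetime import date, timedelta

def week_start(day: date) -> date:
    """Return the Monday of the week containing `day`."""
    # date.weekday() is 0 for Monday through 6 for Sunday.
    return day - timedelta(days=day.weekday())

# The sample report covers 12/28/93 onward, and its first row is
# labelled with the Monday of that week.
print(week_start(date(1993, 12, 28)).strftime("%m/%d/%y"))  # 12/27/93
```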
4. getstats -ds (daily summary)
Covers: 12/28/93 to 01/27/94 (32 days).
All dates are in local time.
Each mark (#) represents 1000 requests.
----------------------------------------------
Mon: 16018 : ################
Tue: 13219 : #############
Wed: 9904 : #########
...
The -ds option produces a daily summary, which shows the aggregate number of requests for a particular day of the week.
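As a rough illustration of this kind of aggregation (a sketch only, not how getstats itself is implemented), the following Python groups request dates by day of the week. The function name daily_summary and the sample dates are hypothetical.

```python
from collections import Counter
from datetime import date

def daily_summary(request_dates):
    """Count requests per day of the week across all dates."""
    names = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
    counts = Counter(names[d.weekday()] for d in request_dates)
    # Return every weekday, including those with no requests.
    return {name: counts[name] for name in names}

# Hypothetical data: two requests on Mondays, one on a Tuesday.
sample = [date(1994, 1, 3), date(1994, 1, 10), date(1994, 1, 4)]
print(daily_summary(sample))
```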
5. getstats -d (daily report)
The -d option produces a daily report, which shows the number of requests per day and the date.
6. getstats -hs (hourly summary)
The -hs option produces an hourly summary, which shows the aggregate number of requests for a particular hour.
7. getstats -h (hourly report)
The -h option produces an hourly report, which shows the number of requests per hour, the day of the week, and the total number of requests for each day.
8. getstats -f (full report)
The -f option tells getstats to create a full report sorted by host name (and IP address). Use the -fa option to make a full report sorted by the number of accesses, the -fd option to create a full report sorted by the last access date, or the -fb option to create a full report sorted by the number of bytes transferred.
9. getstats -r (request report)
The -r option tells getstats to create a report of requests sorted by the request name. Use the -ra option to sort by accesses, -rd to sort by the last access time, -rb to sort by the number of bytes transferred, and -rf to sort by individual file sizes.
10. getstats -dn (domain report)
The -dn option generates a domain report, sorted by domain name. Use -da to sort by the number of requests, -dd to sort by last access date, -db to sort by the number of bytes transferred, or -du to sort by the number of unique domains. The unique domain number is the total number of unique sites under a domain. In the example above, for instance, a total of 3 unique sites came from the .au domain.
11. getstats -dt (directory tree report)
The -dt option generates a directory tree report, which cannot be sorted. The number of requests and the last request date are displayed for each directory and file. The request count for a directory is the number of requests for that directory itself plus the sum of all requests for the files and subdirectories under it.
If you find this report is empty, try using the -dr option without specifying a directory. This will tell getstats to make a tree report without verifying that the files and directories reported in the log file actually exist.
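The way directory counts roll up can be sketched in Python as follows. This is an illustration under stated assumptions, not the getstats algorithm: the function name directory_totals is hypothetical, and a path is treated as a directory only when it ends in a slash.

```python
from collections import Counter

def directory_totals(request_counts):
    """Roll per-path request counts up into every parent directory.

    `request_counts` maps a request path (file or directory) to the
    number of times that exact path was requested.
    """
    totals = Counter()
    for path, count in request_counts.items():
        parts = [p for p in path.split("/") if p]
        # Assumption: a trailing slash marks a directory request.
        is_dir = path.endswith("/")
        # For a file, the last component is the file name, not a directory.
        dirs = parts if is_dir else parts[:-1]
        # Every request counts toward the root directory.
        totals["/"] += count
        prefix = ""
        for part in dirs:
            prefix += "/" + part
            totals[prefix + "/"] += count
    return dict(totals)

# Hypothetical counts for a small tree.
sample = {"/docs/": 2, "/docs/a.html": 5, "/docs/sub/b.gif": 3, "/index.html": 1}
print(directory_totals(sample))
```

Here the count for /docs/ is its own 2 requests plus the 5 and 3 requests for the entries beneath it.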
12. getstats -e (file) (error report)
-e generates a report of all malformed (or ignored) requests for all dates in the order they were encountered in the log file. If a filename is given as the argument to the option, bad requests will be appended to an error file, where they can be analyzed later.
getstats -a (all reports)
The -a option will produce all of the above reports, with list reports sorted by the number of accesses, if possible. If you want a report sorted another way, however, specify the correct option after the -a flag.
example: getstats -a -fb
This will create all reports sorted by number of requests, with the exception of the full report, which is sorted by byte traffic, and the error report, which must be specified on the command line.
To see the usage, run getstats with a -z option.
-dr directory, -l file, -C, -N, -P, -G, -A, -O, -M (root directory, logfile and log type)
The -dr option tells getstats what your root Web or Gopher directory is. This information is needed in order to determine byte statistics.
example: getstats -dr "/usr/local/www"
The -l option specifies the log file to use. The -C, -N, -P, -G, -A, and -O options tell getstats to expect the log file to be in CERN, NCSA, Plexus, GN, MacHTTP, or UNIX Gopher format, respectively.
example: getstats -l my.ncsa.log -N
example: getstats -l my.plexus.log -P
The -M option tells getstats to expect the log to be in the "common" log file format, a standard that was agreed upon by the World-Wide Web community. If your log looks something like this:
www.eit.com - - [01/Jan/1994:10:30:00 +0000] "GET /test.html" 200 123
then your server is using the common format. Include the -M option in the command line:
example: getstats -l cern.common.log -C -M
example: getstats -l ncsa.common.log -N -M
If you use NCSA's httpd 1.2 or later, or CERN's server version 1.16b or later, you are probably using the common log format.
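To check whether a log really is in the common format, a line can be tested against a pattern shaped like the sample line. The Python below is a minimal sketch, not the parser getstats uses; the names COMMON_LOG and parse_common and the field labels are chosen for this example.

```python
import re

# Pattern for lines shaped like the sample:
# host ident user [timestamp] "request" status bytes
COMMON_LOG = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)$'
)

def parse_common(line):
    """Return a dict of fields, or None if the line does not match."""
    match = COMMON_LOG.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # Servers may log "-" when no byte count is available.
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    return fields

sample = 'www.eit.com - - [01/Jan/1994:10:30:00 +0000] "GET /test.html" 200 123'
print(parse_common(sample))
```

A line that returns None here is not in the common format, which suggests the -M option is the wrong choice for that log.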
-sa string, -ss string, -sr string, -sp string (address and request masks)
The -sa option will report only (IP or name) addresses matching the conditions in the mask string. The -ss option will skip addresses matching the string conditions. The -sr option will report only requests matching the string conditions, and -sp will skip requests matching the string conditions.
For these four case-insensitive string masks, the following rules apply:
example 1: getstats -sa "*.com, *.edu" -ss "*.eit.com"
example 2: getstats -sr "*.html, *.gif" -sp "*secret*"
example 3: getstats -sa "*.*" -sp "/internal/demo.html"
example 4: getstats -ss "dopey, sneezy, grumpy"
-sd string, -sh string, -sw string (date, hour, and day masks)
The -sd option reports requests matching the date conditions in the string. In a similar way, the -sh option filters by hour, and the -sw option filters by the day of the week.
For these three case-insensitive string masks, the following rules apply:
example 1: getstats -sd "*/[4-10]/93" -sh "6" -sw "mon"
example 2: getstats -sd "1/[5-30]/1993" -sh "5-17" -sw "wed-sun"
example 3: getstats -sd "[1-5]/*/[91-94]" -sh "-17" -sw "-thu"
example 4: getstats -sd "lastweek" -sh "15-" -sw "tue-"
-i, -p, -ht (input and output)
The -i option will allow you to take input from standard input, so you can do things such as piping lines into getstats. This option is disabled for VMS platforms.
example: tail -100 my.log | getstats -i
The -p option displays a progress meter, so you can see where getstats is in its processing. The -ht option will output all reports in a single-page HTML format, with appropriate links to server support URLs and this documentation.
-dl number, -df file (domain report options)
The -dl option allows you to specify how many domain levels to report. For instance, with the number of levels set at 1, the domain report would look something like:
With the number of levels set at 2:
The -df option allows you to specify a file with descriptions for domain codes, to make the domain report a bit easier to understand. A file with descriptions is available here.
With domain descriptions:
-b, -ip, -t number (byte reporting, IP lookup, and top lines)
The -b option will report byte traffic statistics in all reports, and an extra column for byte traffic will be added to list reports. In addition, the average number of bytes transferred per hour and per day will be added to the statistics paragraph. The byte count for each file is determined by getting the size of the requested file, whose path is built from the top web directory and the request. However, like many log analyzers, getstats cannot report some byte statistics:
Redirected URLs
The sizes of files in personal HTML directories
The sizes of scripts
Old files that have been changed
Requests that have been rejected (perhaps due to access control)
Counts for these requests, or counts for requests that can't be determined, are reported as zero.
The -ip option will make getstats attempt to look up host names from IP addresses. This feature slows processing time, but is useful in analyzing logs from servers that don't look up IP addresses, such as the CERN server.
The -t option allows you to specify how many top lines to report in full, request, and domain reports. Using this, you can easily generate "Top 10" lists and short summaries.
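The idea of a "Top 10" list can be sketched in Python as below. This only illustrates truncating a sorted list to a fixed number of lines; the function name top_lines and the sample requests are hypothetical and not part of getstats.

```python
from collections import Counter

def top_lines(requests, limit=10):
    """Return the `limit` most frequently requested names with counts."""
    return Counter(requests).most_common(limit)

# Hypothetical request names taken from a log.
sample = ["/a.html", "/b.gif", "/a.html", "/c.html", "/a.html", "/b.gif"]
print(top_lines(sample, limit=2))
```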
Getstats has a number of options that are not available from the command line and must be set in the source code:
The root directory of your web or gopher tree, the URL for your server, and aliases for empty and slash ("/") requests.
The mark character for graphs, and the number of requests and bytes per mark for different reports. The default settings have been tuned for light server use, or about 200 to 1,000 requests per day.
The character length to truncate graph reports and request reports to.
The option to display all dates in local time or GMT (the default is GMT), and what time zone the logfile uses.
Whether or not you want to show numerical domains in the domain report.
Whether or not you want to report hours, minutes, and seconds of the last access time in non-graph reports.
Whether or not you wish to show files and their related statistics as well as directories in the directory tree report.
Whether or not to check the actual files to see if they exist in the directory tree report.
For HTML request reports, whether or not to display requests as selectable URLs.
If you have a GN server, whether you want to report both Gopher and HTML requests or not.
Whether or not you want to display an image in your HTML reports (like the one on this page).
How dates are to be displayed: either as M/D/Y or D/M/Y.
Getstats crashes and burns when I run it.
First, double-check that you've specified the correct format for your log file. If it's in the "common" format, add the -M option to the command line. If all else fails, and you think there may be a bad line in your log file, try to narrow down the section under which getstats crashes and email it to me so I can test it out.
How can I use getstats on multiple log files?
A good idea is to use the -i option on previously archived logs, for instance:
zcat log.*.Z | getstats -i
Will getstats be rewritten in perl?
No.
How do I run getstats as a CGI script?
First, uncomment the "#define CGI" line in the source code and compile. To call getstats from a URL with options, put a question mark after the program name, then write the options with a plus sign (+) in place of each space and "%22" in place of each quote, such as:
Normal example: getstats -a -fb
URL example: http://www.eit.com/cgi-bin/getstats?-a+-fb
Normal example: getstats -ss "*.com"
URL example: http://www.eit.com/cgi-bin/getstats?-ss+%22*.com%22
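The substitution rule can be applied mechanically. The following Python is a small sketch of it; the function name to_cgi_url is made up for this example, and it handles only the two substitutions described above (spaces and double quotes).

```python
def to_cgi_url(script_url, options):
    """Encode a getstats option string for use after the '?' in a URL."""
    # Quotes become %22 and spaces become plus signs.
    encoded = options.replace('"', "%22").replace(" ", "+")
    return script_url + "?" + encoded

print(to_cgi_url("http://www.eit.com/cgi-bin/getstats", '-ss "*.com"'))
# http://www.eit.com/cgi-bin/getstats?-ss+%22*.com%22
```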
Why is my directory tree report empty?
If an actual root Web directory is specified, nothing will be reported if getstats can't find the files. Use the -dr option with no arguments to generate a report if no physical directory exists.
Why are some files reported as directories in my tree report?
Unless you tell getstats where the actual files exist, it can only make a "best guess" as to whether a file is a directory or not. Use the -dr option with your root Web directory to tell it where your files are. See the above documentation for more details.
The reported errors from my "common" format log file look OK to me!
The second-to-last field on each line is the status code the server returned. Codes not in the "2xx" range mean some sort of error occurred. You can get a list of these codes in any good HTTP reference.
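To see why a line was reported, the status field can be checked directly. This Python sketch assumes well-formed, whitespace-separated common-format lines; the function name is_error_line and the 404 sample line are hypothetical.

```python
def is_error_line(line):
    """Return True if the status code (second-to-last field) is not 2xx."""
    fields = line.split()
    status = int(fields[-2])
    return not 200 <= status <= 299

ok = 'www.eit.com - - [01/Jan/1994:10:30:00 +0000] "GET /test.html" 200 123'
# Hypothetical line with a non-2xx status code.
bad = 'www.eit.com - - [01/Jan/1994:10:31:00 +0000] "GET /gone.html" 404 0'
print(is_error_line(ok), is_error_line(bad))  # False True
```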