Webalizer Web Statistics FAQ
Questions List
- What is The Webalizer?
- How can I get my web stats?
- What is the difference between "hits," "files," "pages," and "visits?"
- What do these Webalizer reports look like?
- Where does Webalizer get its information?
- Where can I get more information about The Webalizer and its uses?
Answers List
-
What is The Webalizer?
The Webalizer is a web server log file analysis program. It produces highly detailed usage reports in HTML format, for viewing with a standard web browser. Holstein World provides this analysis to help our customers better understand their web traffic.
The Webalizer produces yearly, monthly, daily and hourly statistics. In the monthly reports, various statistics may be produced to show overall usage, usage by day and hour, usage by visiting sites, URLs, user agents (browsers), referrers, page and visit totals, entry and exit page totals, search string analysis, and much more.
Each time you initiate a log view, the Webalizer program builds your site statistics on the fly and produces a set of web pages that present them.
-
How can I get my web stats?
You can click either the Hit Counter at the bottom of your web site or the "Site Statistics" icon in your control panel. When you do this, your website log is examined, and Webalizer outputs statistics pages on the fly.
You will be presented with a 12-month summary of all traffic on the selected domain. Webalizer statistics started in mid-November 2001, so no information will be available for prior months. For a breakdown of traffic in a given month, click on the month name under "Summary by Month."
-
What is the difference between "hits," "files," "pages," and "visits?"
Here is a listing of terms used by The Webalizer, and how it defines each. Each statistic tells you something different about your site and its traffic.
The makers of The Webalizer maintain their own list of definitions at http://mrunix.com/webalizer/webalizer_help.html
-
Hits
Any request made to the server that is logged is considered a 'hit'. The requests can be for anything: HTML pages, graphic images, audio files, CGI scripts, etc. Each valid line in the server log is counted as a hit. This number represents the total number of requests that were made to the server during the specified report period.
-
Files
Some requests made to the server require that the server then send something back to the requesting client, such as an HTML page or graphic image. When this happens, it is considered a 'file' and the files total is incremented. The relationship between 'hits' and 'files' can be thought of as 'incoming requests' and 'outgoing responses'.
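The difference between hits and files can be sketched in a few lines of Python. This is only an illustration, not Webalizer's own code: it assumes that a 'file' is a hit answered with status code 200, and the sample requests are made up.

```python
# Each tuple is (url, status_code). The sample data is invented for illustration.
requests = [
    ("/index.html", 200),
    ("/logo.gif", 200),
    ("/logo.gif", 304),      # "not modified": nothing is sent back
    ("/missing.html", 404),  # "not found": no file delivered
]

def count_hits_and_files(records):
    """Return (hits, files) for a list of (url, status_code) records."""
    # Every logged request is a hit.
    hits = len(records)
    # Assumption: only requests answered with 200 count as files.
    files = sum(1 for _, status in records if status == 200)
    return hits, files

hits, files = count_hits_and_files(requests)
print(hits, files)
```

Here four requests came in, but only two resulted in something being sent back, which is why the files total is normally lower than the hits total.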
-
Pages ('Pageviews')
Pages are, well, pages! Generally, any HTML document, or anything that generates an HTML document, would be considered a page. This number counts only the 'pages' requested, and does not include the other 'stuff' that goes into a page, such as graphic images and audio clips. What actually constitutes a 'page' can vary from server to server. The default action is to treat anything with the extension '.htm', '.html' or '.cgi' as a page. Other extensions, such as '.shtml', '.php3' and '.pl', are also treated as pages.
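As a rough sketch of how a request could be classified as a page by its extension, consider the Python below. The function name and the two sets are hypothetical; the extensions are the ones listed above, and the exact matching rules Webalizer applies may differ.

```python
import os
from urllib.parse import urlsplit

# Extensions named in the text above; the set names are illustrative only.
DEFAULT_PAGE_EXTENSIONS = {".htm", ".html", ".cgi"}
EXTRA_PAGE_EXTENSIONS = {".shtml", ".php3", ".pl"}

def is_page(url, page_extensions=DEFAULT_PAGE_EXTENSIONS | EXTRA_PAGE_EXTENSIONS):
    """Return True if the URL's file extension marks it as a page."""
    path = urlsplit(url).path                 # drop any ?query string
    extension = os.path.splitext(path)[1].lower()
    return extension in page_extensions

print(is_page("/index.html"))                 # a page
print(is_page("/cgi-bin/search.cgi?q=cows"))  # a page (query string ignored)
print(is_page("/images/logo.gif"))            # not a page
```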
-
Sites
Each request made to the server comes from a unique 'site', which can be referenced by a name or ultimately, an IP address. The 'sites' number shows how many unique IP addresses made requests to the server during the reporting time period. This does not mean the number of unique individual users (real people) that visited, which is impossible to determine using just logs and the HTTP protocol (however, this number might be about as close as you will get).
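Counting sites amounts to counting distinct addresses. A minimal Python sketch, using made-up addresses from the documentation ranges:

```python
def count_sites(client_addresses):
    """Return the number of unique client addresses seen in the log."""
    return len(set(client_addresses))

# One address per logged request; repeats are the same 'site' coming back.
addresses = [
    "192.0.2.10",
    "192.0.2.10",
    "198.51.100.7",
    "203.0.113.5",
    "198.51.100.7",
]
print(count_sites(addresses))
```

Five requests from three distinct addresses give a sites total of three, regardless of how many real people were behind each address.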
-
Visits
Whenever a request is made to the server from a given IP address (site), the amount of time since the previous request from that address, if any, is calculated. If the time difference is greater than a preconfigured 'visit timeout' value, or if the address has never made a request before, the request is considered a 'new visit', and this total is incremented (both for the site, and the IP address). The default timeout value is 30 minutes, so if a user visits your site at 1:00 in the afternoon, and then returns at 3:00, two visits would be registered.
Note: In the 'Top Sites' table, the visits total should be discounted on 'Grouped' records, and thought of as the "minimum number of visits" that came from that grouping instead.
Note: Visits only occur on Page Type requests, that is, for any request whose URL is one of the 'page' types defined with the Page Type option.
Due to the limitations of the HTTP protocol, log rotations and other factors, this number should not be taken as absolutely accurate; rather, it should be considered a pretty close "guess".
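The visit rule described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than Webalizer's implementation: the input is assumed to contain only page-type requests in log order, and the function name and sample data are made up.

```python
from datetime import datetime, timedelta

VISIT_TIMEOUT = timedelta(minutes=30)  # the default timeout mentioned above

def count_visits(page_requests, timeout=VISIT_TIMEOUT):
    """Count visits from (ip, datetime) pairs given in log order."""
    last_seen = {}  # ip -> time of that address's previous request
    visits = 0
    for ip, when in page_requests:
        previous = last_seen.get(ip)
        # New visit: first request from this address, or the gap exceeds the timeout.
        if previous is None or when - previous > timeout:
            visits += 1
        last_seen[ip] = when
    return visits

sample = [
    ("192.0.2.10", datetime(2001, 11, 15, 13, 0)),   # first visit
    ("198.51.100.7", datetime(2001, 11, 15, 13, 5)), # a different site
    ("192.0.2.10", datetime(2001, 11, 15, 13, 10)),  # same visit, 10 minutes later
    ("192.0.2.10", datetime(2001, 11, 15, 15, 0)),   # returns at 3:00: new visit
]
print(count_visits(sample))
```

The first address registers two visits (1:00 and 3:00) and the second address one, for a total of three.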
-
KBytes
The KBytes (kilobytes) value shows the amount of data, in KB, that was sent out by the server during the specified reporting period. This value is generated directly from the log file.
Note: Webalizer defines a kilobyte as 1024 bytes, not 1000.
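As a small worked example of the 1024-byte kilobyte, the Python sketch below sums the byte sizes from a few made-up log entries. How Webalizer rounds the displayed figure is not covered here, so the result is left as a plain floating-point number.

```python
BYTES_PER_KBYTE = 1024  # as noted above, not 1000

def total_kbytes(byte_sizes):
    """Convert a list of response sizes in bytes to a KBytes total."""
    return sum(byte_sizes) / BYTES_PER_KBYTE

# Response sizes in bytes, invented for illustration.
sizes = [2048, 512, 1536]
print(total_kbytes(sizes))
```

The three responses add up to 4096 bytes, which is 4 KBytes (dividing by 1000 would give 4.096 instead).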
-
Top Entry and Exit Pages
The Top Entry and Exit Pages give a rough estimate of which URLs are used to enter your site, and which pages are viewed last. Because of limitations in the HTTP protocol, log rotations, etc., these numbers should be considered a good "rough guess" of the actual figures. They will, however, give a good indication of the overall trend in where users come into, and exit, your site.
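One way to picture entry and exit pages is as the first and last page of each visit. The Python sketch below assumes that definition, a 30-minute visit timeout, and page-type requests supplied in log order; the function name and sample data are hypothetical and Webalizer's own bookkeeping may differ.

```python
from collections import Counter
from datetime import datetime, timedelta

def entry_and_exit_pages(page_requests, timeout=timedelta(minutes=30)):
    """Tally entry and exit pages from (ip, datetime, url) tuples in log order."""
    entries, exits = Counter(), Counter()
    last = {}  # ip -> (time, url) of that address's most recent page request
    for ip, when, url in page_requests:
        previous = last.get(ip)
        if previous is None or when - previous[0] > timeout:
            if previous is not None:
                exits[previous[1]] += 1  # the earlier visit ended on that page
            entries[url] += 1            # a new visit starts on this page
        last[ip] = (when, url)
    # Visits still open at the end of the log exit on their last page seen.
    for _, url in last.values():
        exits[url] += 1
    return entries, exits

sample = [
    ("192.0.2.10", datetime(2001, 11, 15, 13, 0), "/index.html"),
    ("198.51.100.7", datetime(2001, 11, 15, 13, 2), "/cows.html"),
    ("192.0.2.10", datetime(2001, 11, 15, 13, 5), "/cows.html"),
    ("192.0.2.10", datetime(2001, 11, 15, 15, 0), "/index.html"),
]
entries, exits = entry_and_exit_pages(sample)
print(entries.most_common())
print(exits.most_common())
```

A log that is cut off by rotation ends visits early, which is one reason these totals are only estimates.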
-
What do these Webalizer reports look like?
See Sample Reports for a look at a sample report. This sample is hosted by the creators of The Webalizer.
-
Where does Webalizer get its information?
It is taken straight from the raw web logs. For every file a browser requests from us, the raw logs should record a variety of information supplied to our server, including the referrer (HTTP_REFERER):
- source (IP or name)
- destination domain (e.g. holsteinworld.com)
- timestamp
- HTTP request (normally a "GET" followed by the URL of the file in question)
- HTTP response code (normally 200 for a success)
- file size (in bytes)
- URL of the referring site (which is where Webalizer gets the search string)
- whatever the browser identifies itself as (e.g. "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90)")
Combine that information across every request to your site and you can derive all kinds of data, namely the Webalizer stats.
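To make the fields listed above concrete, here is a Python sketch that pulls them out of a single log line. It assumes the common Apache "combined" log format, which may not match the exact layout of our logs (for one, it carries no destination domain field); the sample line, the regular expression and the function name are all illustrative.

```python
import re

# Assumed layout: host ident user [time] "request" status size "referrer" "agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None  # not a valid line, so it would not count as a hit
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A size of "-" means no bytes were sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record

# A made-up log line for illustration.
LINE = (
    '192.0.2.10 - - [15/Nov/2001:13:00:00 -0500] '
    '"GET /index.html HTTP/1.0" 200 2326 '
    '"http://www.example.com/search?q=holstein" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90)"'
)
record = parse_line(LINE)
print(record["host"], record["status"], record["size"])
print(record["referrer"])
```

Once each line is reduced to a record like this, totals such as hits, KBytes and referrers are a matter of counting and summing over the records.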
-
Where can I get more information about The Webalizer and its uses?
See the Webalizer home page for authoritative information about the workings of The Webalizer straight from the source.
You may also want to read the WebMonkey™ article "Troubles with Tracking," a realistic look at the capabilities and limitations of web site tracking statistics for business.