Web Log Analysis
The details of your site's activity can help your organization make better business decisions. Here's how to make the best use of your Web logs.
September 10, 2004
Web analysis can pay off. A midsize electronics distributor that had been analyzing its Web traffic for several months, for instance, noticed increased interest in its security products, such as smart cards. So the company launched a marketing campaign for its security offerings, and within a couple of months saw a significant rise in revenue for a division that previously hadn't been on the front burner.
Here's how some of us analyze Web logs at Syracuse University: First, we telnet into one of our Apache Web servers and run "tail -f access_log", which shows log file updates in real time. This provides plenty of data but only very basic analysis. By watching the traffic hitting the server, we can see whether users are encountering errors and whether they were referred to our site by external Web sites or search engines. If a user was referred by a search engine, we can see which one, as well as which keywords they used, information that can help your organization with marketing and content management.
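The same checks you make by eye while watching "tail -f" can be scripted. The following is a minimal sketch, assuming Apache's "combined" log format (which records referrer and user agent) and a site host of www.mydomain.com. The function names, the host and the small table of search engines are illustrative assumptions, not part of any log-analysis product.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Apache "combined" log format: client IP, identity, user, [time], "request",
# status, bytes, "referrer", "user agent".
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical table: search-engine host fragment -> name of its query parameter.
SEARCH_ENGINES = {"google.": "q", "yahoo.": "p", "msn.": "q"}


def summarize(line, own_host="www.mydomain.com"):
    """Describe one log line: requested path, error flag, referral source, keywords."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # not a combined-format line
    referrer = match["referrer"]
    host = (urlsplit(referrer).hostname or "") if referrer != "-" else ""
    source, keywords = "direct or internal", None
    if host and host != own_host:
        source = "external site " + host
        for fragment, param in SEARCH_ENGINES.items():
            if fragment in host:
                source = "search engine " + host
                values = parse_qs(urlsplit(referrer).query).get(param)
                keywords = values[0] if values else None
                break
    return {
        "path": match["path"],
        "error": int(match["status"]) >= 400,
        "source": source,
        "keywords": keywords,
    }


def watch(stream):
    """Print only the interesting lines (errors and outside referrals) from a stream."""
    for line in stream:
        info = summarize(line)
        if info and (info["error"] or info["source"] != "direct or internal"):
            print(info)
```

To use it like the telnet session, a script that calls `watch(sys.stdin)` could be fed with `tail -f access_log | python3 watch_log.py` (the script name is hypothetical).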
Chances are, you don't sit around watching your telnet screen. Your best bet for getting in-depth traffic analysis is to run a commercial log-analysis package.
Needle in a Haystack
Most midsize or large organizations build dynamic Web sites that generate information from a database. Unfortunately, that makes analyzing user behavior more difficult, since a site's URLs can look similar:
www.mydomain.com/products.aspx?productid=20
www.mydomain.com/products.aspx?productid=21
Most log-analysis software treats these URLs as one page (products.aspx) and ignores the additional parameter that identifies which product was actually viewed (productid=20 or productid=21). Make sure you select log-analysis software that lets you define parameters; you'll get more detailed reports based on the entire query string.
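Here is a minimal sketch of what parameter-aware counting does, assuming the request paths have already been extracted from the log. (Apache's combined format logs the path and query string together; IIS's W3C format logs them in separate cs-uri-stem and cs-uri-query fields, which you would join first.) The function name and defaults are illustrative.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs


def count_product_views(request_paths, page="/products.aspx", param="productid"):
    """Count views of one dynamic page, broken out by the value of one query parameter."""
    counts = Counter()
    for request in request_paths:
        parts = urlsplit(request)
        # IIS treats paths case-insensitively, so compare them that way.
        if parts.path.lower() != page.lower():
            continue
        # Requests without the parameter are grouped under "(none)".
        values = parse_qs(parts.query).get(param, ["(none)"])
        counts[values[0]] += 1
    return counts
```

A tool that counts only the page would report three views of products.aspx for the test data; this version reports two views of product 20 and one of product 21.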
In NetIQ's popular WebTrends analysis tool, for instance, the "URL Parameter Analysis" tab in the advanced feature administration section lets you drill down to specifics on the page contents. You supply the page name (products.aspx) and the parameter name (productid), and the tool spits out specific results in the parameter-analysis report, such as the number of times a particular product page was viewed. You can then determine more useful information, such as the most- and least-requested products on your site. When setting up the parameters, you can use a translation file, which in this case would correlate names with ID numbers: product ID 20 is widget1 and product ID 21 is widget2, for example.
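The translation step can be sketched in a few lines. This assumes a hypothetical two-column "id,name" CSV file; it does not reproduce WebTrends's own translation-file format. The view counts would come from per-product counting like that described above.

```python
import csv
import io
from collections import Counter


def load_translation(csv_text):
    """Read a hypothetical two-column 'id,name' translation file into a dict."""
    reader = csv.reader(io.StringIO(csv_text))
    return {row[0].strip(): row[1].strip() for row in reader if len(row) >= 2}


def ranked_products(view_counts, names):
    """Return (name, views) pairs from most- to least-requested.

    IDs missing from the translation file are shown as 'product <id>'.
    """
    labeled = Counter()
    for product_id, views in view_counts.items():
        labeled[names.get(product_id, "product " + product_id)] += views
    return labeled.most_common()
```

The first entry of the ranking is the most-requested product and the last is the least-requested. For a real file, you would pass the text read with `open(path, newline="")`.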
Figure: Tracking the Trends
But WebTrends doesn't treat the parameters as part of the overall path analysis; their effects appear only in the parameter-analysis report. That's a start, but it would be more useful to apply the same parameters to the entry, exit and path analysis reports as well.
By analyzing the paths your users and clients take through your Web site, you may find that visitors don't always enter your site from the home page and exit after they find the information they're looking for. Instead, they often choose paths that bypass information you might consider strategic, such as specials on your home page.
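Path analysis starts by grouping individual hits into visits. Here is a minimal sketch. It assumes each hit has already been reduced to a (visitor key, timestamp, page) tuple, where the visitor key is something like the IP address combined with the user agent, and it ends a visit after 30 minutes of inactivity. Both choices are common conventions, not standards, and proxies make IP-based keys imprecise.

```python
from datetime import datetime, timedelta

# Assumption: a visit ends after 30 minutes with no requests from that visitor.
SESSION_TIMEOUT = timedelta(minutes=30)


def build_paths(hits, timeout=SESSION_TIMEOUT):
    """Group (visitor_key, timestamp, page) hits into per-visit click paths."""
    paths = []
    last_seen = {}  # visitor_key -> timestamp of that visitor's previous hit
    current = {}    # visitor_key -> index of that visitor's open path in `paths`
    for visitor, when, page in sorted(hits, key=lambda hit: hit[1]):
        previous = last_seen.get(visitor)
        if previous is None or when - previous > timeout:
            # First hit from this visitor, or a long gap: start a new visit.
            current[visitor] = len(paths)
            paths.append([])
        paths[current[visitor]].append(page)
        last_seen[visitor] = when
    return paths
```

Each resulting path's first element is the visit's entry page and its last element is the exit page, which are the inputs to entry- and exit-page reports.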
You can alter your site to change how users navigate and use it. An educational question-and-answer service, for instance, had been receiving more questions than its staff could handle. After studying the path analysis to see how site visitors were finding the question form, the company's Web administrators moved the "Ask a Question" link to a less conspicuous spot. The volume of questions immediately decreased. (They were also able to put the link back on the home page when they found they needed more incoming questions.)
Most log-analysis products come with some type of path-analysis report. Based on my experience with log analysis, I like to use a 5 percent rule for entry pages: If more than 5 percent of visitors to a Web site enter on a particular page (the products page, for instance), that page should get as much priority as the home page. That may mean spicing it up with dynamic content, advertisements and specials. The 5 percent rule can be applied to exit pages, too; improving those pages can keep users on your Web site longer and entice them to make more purchases or seek additional information.
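The 5 percent rule is simple arithmetic on entry and exit counts. This minimal sketch assumes visits have already been reconstructed as ordered lists of pages and measures shares per visit rather than per unique visitor, which is an approximation of "visitors".

```python
from collections import Counter


def pages_over_threshold(paths, threshold=0.05):
    """Apply the 5 percent rule to entry and exit pages.

    `paths` is a list of visits, each an ordered list of pages. Returns two
    dicts (entry pages, exit pages) mapping page -> share of all visits,
    keeping only pages whose share exceeds the threshold.
    """
    visits = [path for path in paths if path]
    if not visits:
        return {}, {}
    total = len(visits)
    entries = Counter(path[0] for path in visits)
    exits = Counter(path[-1] for path in visits)

    def over(counter):
        return {page: count / total for page, count in counter.items()
                if count / total > threshold}

    return over(entries), over(exits)
```

For example, if 6 of 100 visits start on /products.aspx, that page gets a 0.06 entry share and is flagged for the same attention as the home page.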
The reality is that most visitors stay only a few minutes and rarely purchase products. Review the exit-page analysis to determine where visitors are leaving the Web site. If they're leaving from a page that isn't a destination, such as a page that lists product categories but doesn't show any products, chances are you can reroute them by changing the architecture or content of that page. Perhaps the content on the exit page is too long, too short or boring. Maybe there's a broken link or an image that takes too long to load. You could also conduct a usability study to determine why visitors exit on a particular page, but that's expensive and time-consuming. The good news is that when you make changes to your site, you'll see the effect on user behavior immediately.
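One way to find non-destination pages that lose visitors is to compare how often a page ends a visit with how often it is viewed. This minimal sketch computes that exit rate (exits divided by views) from reconstructed visits. The list of destination pages is something you would supply yourself; the names in the test are hypothetical.

```python
from collections import Counter


def exit_rates(paths, destinations=()):
    """Return (page, exits / views) pairs, highest rate first.

    Pages listed in `destinations` (where leaving is expected) are skipped.
    """
    views = Counter()
    exits = Counter()
    for path in paths:
        if not path:
            continue
        views.update(path)      # every page in the visit counts as a view
        exits[path[-1]] += 1    # the last page is where the visitor left
    rates = {page: exits[page] / views[page]
             for page in views if page not in destinations}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)
```

A category page near the top of this list is a candidate for the content and architecture changes described above.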
Because WebTrends treats parameter analysis and path analysis separately, you have to turn on an option that displays the entire URL (including the query string) in the reports to get useful entry- and exit-page results. We did this by changing the default for ASP extensions from truncated URLs to the full query string.
With that option on, the path analysis, including the entry and exit page reports, carries more detailed information to help you make better management decisions. The trade-off is that the entire query string doesn't always translate. WebTrends, for example, can't translate the parameters in a URL like this: www.mydomain.com/roster.asp?playerid=1253&sport=189&roster=143.
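If your tool can't translate multi-parameter URLs, a small script can do it as a post-processing step. This is a sketch under stated assumptions: one lookup table per parameter name, supplied by you (for example, exported from the site's database). The lookup values in the test are hypothetical placeholders.

```python
from urllib.parse import urlsplit, parse_qsl


def describe_request(url, translations):
    """Turn a multi-parameter URL into a readable label.

    `translations` maps a parameter name to a dict of value -> display name;
    unknown parameters or values are shown unchanged.
    """
    parts = urlsplit(url)
    labels = []
    for name, value in parse_qsl(parts.query):  # keeps the original parameter order
        table = translations.get(name, {})
        labels.append(f"{name}={table.get(value, value)}")
    return parts.path + (" [" + ", ".join(labels) + "]" if labels else "")
```

Because untranslated values pass through unchanged, a partial lookup table still yields a usable label.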
A truly useful report lets you know every time a user takes an item out of his or her shopping cart prior to checkout. Sure, it would be nice to know if the customer replaced that item with another, but that's easier said than done. Most shopping-cart applications are written using cookies, so it's difficult, if not impossible, for log-analysis software to track the specific products added or removed from a shopping cart.
WebTrends tracks the total number of shopping-cart additions and removals, since this information can be found in standard log files (for example, requests for remove_product.asp). But the information isn't very useful without knowing which specific products were removed. One work-around is to put the product ID in the query string (for example, www.mydomain.com/remove_product.asp?productid=20), so that it becomes part of the request and is written to the log file.
Or you can write custom code that records an entry in a database every time a visitor adds or removes an item from a cart. You can then write a program that reports in detail which products are most often removed or substituted. And if you run more sophisticated log-analysis software such as Datanautics' G2 platform, you can use data mining to correlate database information with the log-file data. Datanautics' G2 Universal Collector can be integrated with your CRM database, for example, for a detailed report on customer behavior on your site.
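The custom-database approach can be sketched with SQLite. The table name, its columns and the three helper functions below are hypothetical; in a real site, your cart code would call `record_cart_event` wherever it adds or removes an item, and the database would live in a file or a shared server rather than in memory.

```python
import sqlite3


def open_cart_log(path=":memory:"):
    """Open the database and create a hypothetical cart_events table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cart_events ("
        " session_id TEXT NOT NULL,"
        " product_id INTEGER NOT NULL,"
        " action TEXT NOT NULL CHECK (action IN ('add', 'remove')),"
        " occurred_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def record_cart_event(conn, session_id, product_id, action):
    """Record one add or remove; the with-block commits the insert."""
    with conn:
        conn.execute(
            "INSERT INTO cart_events (session_id, product_id, action) VALUES (?, ?, ?)",
            (session_id, product_id, action),
        )


def most_removed(conn, limit=10):
    """Return (product_id, removals) pairs, most frequently removed first."""
    return conn.execute(
        "SELECT product_id, COUNT(*) AS removals FROM cart_events"
        " WHERE action = 'remove' GROUP BY product_id"
        " ORDER BY removals DESC, product_id LIMIT ?",
        (limit,),
    ).fetchall()
```

Because each event carries a session ID and a timestamp, a follow-up query could look for an "add" shortly after a "remove" in the same session as a rough signal of substitution.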
The 411 on 404 Errors
It's crucial to detect any errors or problems users experience while navigating your site. The most common is "Error 404 File Not Found."
There are three ways a 404 error can occur. First, a user may be at a Web site other than yours (including a search engine results page) and click on a link to a page that doesn't exist. Second, a user may click on a link that's supposed to lead to another page on the same Web site (an internal link). Third, a user may mistype a URL.
Obviously, it's important to manage the links within your Web site, but it's also critical to manage external referrals. Decent log-analysis software not only shows the number of 404 errors that occurred but also tells you which page a visitor was trying to reach when she received the error and which referring page, internal or external, contained the broken link.
External 404s are common after a Web site redesign that introduces a new architecture and changes or adds directories and file-naming conventions. There could be thousands of external sites linking to pages that no longer exist; that's akin to a store relocating without telling its customers where it has moved. There's no way to fully automate the management of external 404s, but you can reduce or eliminate the number of broken referrals.
With log-analysis software, you can go through the 404 errors and find the ones coming from external Web sites. Then send an e-mail to those sites' Web administrators with your updated link information. In most cases, the link will get fixed and you'll gain potential customers. More important, users can move from those Web sites to yours uninterrupted.
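Building the list of external sites to contact can be scripted. This minimal sketch assumes log records have already been reduced to (status, requested path, referrer) tuples and that your own site answers as www.mydomain.com or mydomain.com. It skips 404s with no referrer (often mistyped URLs) and internal broken links, which belong in a separate report.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def external_404s(records, own_hosts=("www.mydomain.com", "mydomain.com")):
    """Group 404s by the external host whose link caused them.

    `records` are (status, requested_path, referrer) tuples.
    Returns {referring_host: {(referring_page, missing_path), ...}}.
    """
    broken = defaultdict(set)
    for status, requested, referrer in records:
        if status != 404 or referrer in ("", "-"):
            continue  # not a 404, or no referrer (e.g. a mistyped URL)
        host = urlsplit(referrer).hostname
        if host is None or host in own_hosts:
            continue  # internal broken link, handled separately
        broken[host].add((referrer, requested))
    return dict(broken)
```

Each host's set lists the pages on that site that link to missing pages, which is what the e-mail to its Web administrator needs to contain.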
Think of your Web log files as the individual pieces of a puzzle, and your log-analysis software as a tool to help put those pieces together. Then it's up to you to use that information to help generate new business and keep your existing clients happy.
Dissecting the Log File
Check the date and time. The Web page in the "Log Blog" example on page 77 was accessed on August 20, 2004, at 1:25:52 p.m., according to the server's clock, not the client's. You can't derive the user's time zone, but you can generate reports on the site's busiest and quietest times.
Trace the client IP address, which can be used for geographic information. Most log-analysis software can run a whois query on an IP address to find the organization that registered it, along with that organization's country, state and city, which often, but not always, reflects where the user is.
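The date-and-time field is enough for a busy-hours report. This minimal sketch assumes Apache's timestamp format, such as 20/Aug/2004:13:25:52 -0400 (the -0400 offset here is illustrative). It counts requests by the server's local hour, matching the point that the log reflects server time, not the user's.

```python
from collections import Counter
from datetime import datetime


def requests_per_hour(timestamps):
    """Count requests per hour of day (server time) from access-log timestamps.

    Note: %b expects English month abbreviations, as in the default C locale.
    """
    hours = Counter()
    for stamp in timestamps:
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        hours[when.hour] += 1  # hour as written in the log, not converted
    return hours
```

Calling `.most_common()` on the result lists hours from busiest to quietest.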
Check the path and file requested by the user. Log-analysis tools use this field, as configured by the Web administrator, to calculate page counts and, combined with other fields such as the IP address, to perform entry, exit and path analysis.
Know your status (code). Anything in the 200s spells success, and the 300s mean the client was redirected to a different page. The 400s indicate a client error (for example, 404 File Not Found), and anything in the 500s means a server error (such as an ASP script error). The status codes are used to generate some of the technical information found in log-analysis reports.
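Summarizing status codes by class takes only integer division by 100. This is a minimal sketch; codes outside the 200 to 500 ranges described above (such as the rarely logged 100s) are grouped as "other".

```python
from collections import Counter

# Status classes as described in the text.
STATUS_CLASSES = {
    2: "success",
    3: "redirect",
    4: "client error",
    5: "server error",
}


def status_summary(status_codes):
    """Tally HTTP status codes (ints or numeric strings) by class."""
    summary = Counter()
    for code in status_codes:
        summary[STATUS_CLASSES.get(int(code) // 100, "other")] += 1
    return summary
```

A rising "client error" or "server error" count is the cue to drill into the individual 404 or 500 entries.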
Check the user agent. This string indicates the browser version and operating system the visitor was using, which helps detect problems caused by certain browser/OS combinations, for instance. It also helps you determine whether to use a specific technology based on the percentage of users with browsers that support it. The visitor in our example used Microsoft Internet Explorer 6.0 on a Windows XP system.
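As an illustration of how a user-agent string maps to browser and OS, here is a deliberately rough sketch that only recognizes Internet Explorer and two Windows versions. The "Windows NT 5.1" token is how Windows XP identified itself, and "Windows NT 5.0" is Windows 2000. Real log-analysis tools rely on much larger, regularly updated pattern lists, so treat this as a toy.

```python
import re

# A small, incomplete table of Windows version tokens from user-agent strings.
WINDOWS_VERSIONS = {
    "Windows NT 5.0": "Windows 2000",
    "Windows NT 5.1": "Windows XP",
}


def describe_agent(agent):
    """Very rough (browser, OS) guess for an Internet Explorer user-agent string."""
    browser = "unknown browser"
    match = re.search(r"MSIE (\d+\.\d+)", agent)
    if match:
        browser = "Internet Explorer " + match.group(1)
    os_name = "unknown OS"
    for token, name in WINDOWS_VERSIONS.items():
        if token in agent:
            os_name = name
            break
    return browser, os_name
```

Applied to the "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" string that Internet Explorer 6.0 sent on Windows XP, it returns ("Internet Explorer 6.0", "Windows XP").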
Refer to the referral. Which internal or external Web page referred the user to the requested page or file? Find out where the user came from. Our "Log Blog" example shows a search for Harry Matrone using the Google search engine. The start=30 parameter is Google's zero-based result offset, so with the default of 10 results per page, the referral came from the fourth page of results.
Jeffrey H. Rubin is a senior instructor with the School of Information Studies at Syracuse University and president of Internet Consulting Services. Send your comments on this article to him at [email protected].