Review: Real-Time Monitoring Products
We tested three real-time user monitoring products. Overall, we were pleased with the entries, but our Editor's Choice edged out the competition with its ease of use and URL parsing.
January 13, 2006
Real-Time User Monitoring Product Features (chart)
These products are designed to monitor the user experience without employing synthetic or robotic transactions. Synthetic transactions can be useful, especially when there isn't any real user traffic, but they provide only best guesses of what users do. We also evaluated implementation, administration, usability, third-party integration and price. To our as-tested pricing we added maintenance/support costs.
IBM and TeaLeaf Technology didn't respond to our invitation. BMC Software, Cove Light Systems and Mercury Interactive declined. The products sent by Network General, Network Physics and WildPackets analyze protocols and track TCP sessions but don't track HTTP user-session statistics, so we didn't include them in our review. Only Compuware, Coradiant and Quest Software sent qualified entries. All three products came to us preloaded on rack-mount servers. Both Coradiant's 2U TrueSight and Quest's 1U User Experience Monitor (UEM) are sold as appliances that combine data-collection capabilities with database and management interfaces. Compuware's Vantage entry comprises two software products that typically would run on separate servers; it arrived on a single 1U x86 box.
We tested the products in our Syracuse University Real-World Labs® using a mix of the university's live traffic and our generated traffic as a background and load source. We set up an IBM WebSphere server, which served two applications from the same IP-port combo, forcing each RTUM product to parse the URL to follow app-, transaction- and user-specific traffic. We used the Gomez Performance Network (GPN) service to create transactions originating from "end users" located around the world with varied access speeds, from dial-up to broadband.
Real-Time User Monitoring Vendors (chart)
Services, such as those from Gomez and Keynote Systems, send robotic transactions into a Web site and record the site's reactions to just those transactions. We then let in some live Syracuse University Internet traffic. Our RTUM products recorded all the live (real user) transactions. This approach gave us an idea of how the traffic we cared about looked intermingled with other traffic. Our method emulated a situation in which one IT group is caring for one application while a different group is watching another. We wanted to know how well the traffic could be separated. Note that the RTUM products can be set to recognize and ignore bots like Keynote and Gomez.
If you love detailed data, you'll be in heaven with these products. We anticipated problems damping down the influx of test traffic, but that turned out to be a nonissue. All the products had methods for managing massive quantities of data, and both Coradiant's TrueSight and Quest's UEM suggest limiting data collection to important sites or transactions. By setting capture filters for specified servers, pages, users and transactions, thousands of other sessions being carried over the test segment were written off as noise.
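None of the vendors expose their filter internals, but conceptually a capture filter is just a whitelist match on server, page and user. A minimal sketch of the idea, with hypothetical server and path names:

```python
# Minimal sketch of a capture filter: keep only the sessions we care about and
# treat everything else on the segment as noise. Server and path names are hypothetical.
from dataclasses import dataclass

@dataclass
class Hit:
    server: str   # destination Web server
    path: str     # requested page
    user_ip: str  # client address

WATCHED_SERVERS = {"websphere.test.example.edu"}
WATCHED_PATHS = ("/bank/", "/plantstore/")
WATCHED_USERS: set[str] = set()  # empty means "any user"

def keep(hit: Hit) -> bool:
    """True if the hit passes the server, page and user filters."""
    return (hit.server in WATCHED_SERVERS
            and hit.path.startswith(WATCHED_PATHS)
            and (not WATCHED_USERS or hit.user_ip in WATCHED_USERS))

hits = [
    Hit("websphere.test.example.edu", "/bank/balance", "10.1.2.3"),
    Hit("www.example.edu", "/admissions/", "10.9.8.7"),  # noise: unwatched server
]
print([h.path for h in hits if keep(h)])  # -> ['/bank/balance']
```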
Even with our tightened page, user, transaction and server limits, we collected a lot of application data, and all of it was measured. All three products monitored and collected, in real time, statistics on transaction frequency and on user and server response times, along with how those response times compared with historical averages and how they were distributed--as standard deviations and percentiles--across all users' experiences (all the packets sent and received).
We took hourly, daily, weekly and monthly views to see the performance trends and aberrations of our Web server, sites and specific transactions. We looked for measured trends or averages outside our thresholds by running through application and transaction reports. All the products let us set thresholds and could alert us to violations, but as is always the case with baselining performance, it took a week of data collection before these baselines were accurate enough to use.

The products we evaluated did well at providing service-level views. By testing all three monitors simultaneously and reviewing service levels daily, we could make head-to-head comparisons and compare that data to our Gomez reports. The tested products provide high-level views that are linked to more granular data. We could define the linkage along server, application, transaction or page perspectives. Although the data display and organization differ, the information provided is similar.
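None of the vendors document their baselining math, but the pattern described above--a historical average that only becomes trustworthy after about a week of samples, with alerts on values well outside it--can be sketched roughly as follows. The warm-up count and the two-standard-deviation band are our assumptions, not any vendor's defaults.

```python
# Rough sketch of alerting against a historical baseline. The warm-up sample
# count and the two-standard-deviation band are illustrative assumptions.
from statistics import mean, stdev

WARMUP_SAMPLES = 168  # e.g. one sample per hour for a week

class Baseline:
    def __init__(self):
        self.samples = []

    def add(self, response_ms):
        """Record a response time; return a status once the baseline is usable."""
        self.samples.append(response_ms)
        if len(self.samples) < WARMUP_SAMPLES:
            return "baseline not ready"
        avg, sd = mean(self.samples), stdev(self.samples)
        return "violation" if response_ms > avg + 2 * sd else "ok"

b = Baseline()
for _ in range(WARMUP_SAMPLES):
    b.add(400.0)        # a week of ordinary ~400 ms responses
print(b.add(950.0))     # -> 'violation' (well outside the historical band)
```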
Each product supports a wide range of diagnostic metrics, and our test scenario ensured that we had slow pages to view. Coradiant TrueSight had the edge with its total URL parsing, but Compuware Vantage Analysis Server (VAS) explained the reason for slow page loads, and Quest UEM provided the most comprehensive diagnostic reports. All the products' metrics spanned network, server and application. Compuware emphasized its slow-page analytics' ability to help Web developers with design improvements, but we didn't find its secret sauce unusual. All three products gave us tools to create slow-page reports filtered by application, user and server.
Coradiant's TrueSight and the other two entries differ in regard to transaction monitoring. UEM and VAS will monitor transactions, defined as a set sequence of HTML pages and their objects. UEM was the easiest to set up because it includes a transaction recorder and will count out-of-sequence browses that hit all the required pages toward the transaction total. TrueSight doesn't link pages into transactions at all; rather, it monitors single pages or page objects.
What better way to start our quest than by monitoring real transactions? Within the RTUM products, we defined either a transaction (a series of HTML pages) or a unique page or object within a transaction. We used our WebSphere apps, for example, to create a bank-balance transaction that took three pages to complete, and a transaction to purchase a bonsai tree that required seven pages to browse and check out, including filling in credit-card and shipping info. We set the monitors to recognize, count and save performance information about our special transactions whenever they came down the wire. UEM and VAS allowed for simple selection of those sequential pages. In TrueSight, we chose to monitor specific pages, like the page that returned the balance info and the first and last check-out pages. We could then view how well each product displayed portions of monitored performance, like an admin tracking down a particular problem.

TrueSight, rather than monitoring just our specific transaction, parsed and monitored the entire URL in complete detail. VAS and UEM monitored the URL path, host, string and objects, but not to TrueSight's level of detail, which let us choose parameters, values, objects, executables and methods in any combination. Nice.
Thresholds for services were static in both VAS and TrueSight. Only UEM let us set a threshold based on a historical baseline. We said that end-to-end response for our bank home page, for example, must be within 90 percent of the historical setting.
TrueSight doesn't support sorting by access speed as quantified into typical last-mile increments for dial, broadband and T1--something VAS and UEM do, with UEM providing the best detail. The UEM report answering our query "What kind of access speeds do end users have?" provided a good best guess based on throughput monitoring. A line graph showed the average access speed, and a stacked bar graph showed the percentage of users broken down across access-speed buckets--56 Kbps, 128 Kbps, 384 Kbps, 768 Kbps and greater than 768 Kbps.
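UEM presumably derives these buckets from per-session throughput estimates; a toy version of the bucketing, with invented per-user throughput numbers, might look like this:

```python
# Toy version of bucketing users by estimated last-mile access speed, as UEM's
# stacked-bar report does. The bucket limits mirror the report (56/128/384/768 Kbps
# and above); the per-user throughput estimates are invented.
from collections import Counter

BUCKETS = [(56, "<=56 Kbps"), (128, "<=128 Kbps"),
           (384, "<=384 Kbps"), (768, "<=768 Kbps")]

def bucket(kbps):
    for limit, label in BUCKETS:
        if kbps <= limit:
            return label
    return ">768 Kbps"

observed_kbps = [42, 51, 120, 350, 700, 1500, 3000]  # per-user throughput estimates
counts = Counter(bucket(k) for k in observed_kbps)
total = len(observed_kbps)
for label, n in counts.items():
    print(f"{label}: {100 * n / total:.0f}% of users")
```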
VAS also reports access speed as a specific user attribute, and we could view broader slow-page download percentages by user location. During our testing, this slow-user metric consistently pointed to a Road Runner location in Texas, something we would not have guessed, given our mix of dial-up users. All three products placed users in geographical locations using CIDR (Classless Interdomain Routing) blocks. VAS has the coolness edge here--it shows service providers through AS (Autonomous System)-number knowledge.
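None of the vendors disclose their lookup tables, but mapping a client IP to a location or provider via CIDR blocks amounts to a longest-prefix match. A minimal sketch with an invented block-to-location table:

```python
# Minimal sketch of CIDR-based geolocation: map a client IP to the most specific
# matching block. The block-to-location table here is invented for illustration.
import ipaddress

CIDR_LOCATIONS = {
    "24.160.0.0/13": ("Road Runner", "Texas, US"),
    "128.230.0.0/16": ("Syracuse University", "New York, US"),
}

def locate(ip):
    addr = ipaddress.ip_address(ip)
    best = None
    for block, info in CIDR_LOCATIONS.items():
        net = ipaddress.ip_network(block)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, info)  # keep the longest (most specific) prefix
    return best[1] if best else ("unknown", "unknown")

print(locate("24.162.15.7"))    # -> ('Road Runner', 'Texas, US')
print(locate("128.230.10.20"))  # -> ('Syracuse University', 'New York, US')
```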
If dial-up users are a significant part of your site's traffic--and they probably are, considering that 20 percent to 40 percent of Internet-connected households use dial-up, according to IDG--knowing their experience compared with broadband users is helpful when designing pages.
Both TrueSight and UEM were equally logical and simple to configure. VAS, in contrast, was a pain to stabilize, even with lots of technical support. TrueSight and UEM didn't require database tuning, for instance, but getting VAS to monitor our data required high-level support and a lot more effort than it should have.
Of course, the time it will take to teach someone to use the interface and get value from their efforts is always a concern. We give the easy learning edge to the simple, well-organized UEM interface, with few levels of linked reports. On the other end of the spectrum is VAS, with its massive metric views that crammed a lot into every screen, making things busy and requiring new users to rely on the data displayed to stay oriented.
TrueSight excelled in creating a context for remembering where we were when we drilled deep into a metric, and made the context of our rules obvious. When creating system rules for error detection, for example, the error categories of network, client, server application and user-defined errors were displayed along with descriptions. This was important in our overall scenario of tracking a specific user who was experiencing an error on a specific page during a specific Web session. When we drilled deep, all the way to an object, TrueSight's graphical breadcrumbs not only helped us learn the user interface and navigation in general, but gave us a consistent reminder of orientation within the performance data, which helped with analysis. Indications of errors in the report, for instance, could be traced down to all the specific errors on a page, with performance statistics and a graphic showing the server, site, page and object contributing to the user experience.
All three products supported internal security administration, including the usual user creation with predefined access roles. All could store SSL keys, but only TrueSight let us store them on an optional tamperproof crypto vault card. If that wasn't enough, TrueSight's confidentiality policy engine let us define how and whether parameters like cookies, queries and URI parameters were stored. We created a policy to delete passwords when they were posted, for example. Other policy options could have let us hash the password or delete just the value, not the fact that a password was passed. However, no product supports external directories for user access--something we'd like.

TrueSight's front-panel LED that displays traffic utilization is useful. We weren't going to stand in front of the appliance to watch for spikes, but when connecting new fiber from a tap, it's helpful to have a clue whether any data is making it to the appliance. UEM didn't have the little screen, but its menu-driven command-line interface let us set and check communications, so we could be sure the appliance was seeing data and able to communicate with servers and clients.
All three products have decent canned reports. UEM and TrueSight provide better reporting out of the box, and their more modern interface designs made reports easier to follow overall. But for report creation, the VAS Data Mining interface not only gave us access to all the metrics collected, it also made myriad formatting and selection combinations possible. Report creation in VAS will take some time to learn, but it's worth it. None of the products integrate with third-party reporting tools, like Crystal Reports.
Price, as usual, is tough to compare but is driven mostly by the number of sites you'll need to monitor. We quote list prices; you'll likely pay less. Our two test sites and the few university sites we sprinkled in were at the bottom, sizewise, of what these products support. Quest UEM's pricing was lowest, and the UEM base product scaled further than TrueSight.
Compuware VAS starts at $125,000, a big contrast to the $45,000 cost of UEM. Coradiant staked out the middle ground, at $89,950, but TrueSight had a lower initial maintenance cost, as a percentage of list, and more support options.
Quest told us it's considering a premium support option that will compare with TrueSight's highest level, at 25 percent of list. In most cases, extended support will be a waste of money; after all, how important is it if the monitor breaks in the middle of the night? We were disappointed that no vendor includes formal training in the selling price, and only TrueSight includes on-site implementation with purchase.

We were pleased, though, with what we'd seen. All three products do a good job gathering real-time user stats, as evidenced by their close--and high--scores. UEM and TrueSight tied, and had an edge over VAS, thanks to their ease of use and price. Had we tested protocols other than HTTP, both UEM and VAS would have taken back the advantage from TrueSight. But given our scenario, we awarded our Editor's Choice to TrueSight, thanks to its flexible URL parsing.
TrueSight creates flexible yet precise threshold monitoring by employing a combination of complete URL parsing and global and specific thresholding. TrueSight's base monitoring element is the Watchpoint. Watchpoints are regular-expression monitoring definitions that can parse every portion of the HTTP and HTTPS protocol, including host, port, URI stem and URI query parameters. When traffic from an end user matches the Watchpoint, it gets counted, stored and reported. To set up the regular expressions, the TrueSight user interface provides forms for building logical "and/or" filters that, when matched, count, store and alarm. Each Watchpoint can be monitored in real time and reported over time.
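Coradiant doesn't publish Watchpoint internals, but the behavior described--regular expressions over host, stem and query parameters, combined with and/or logic--can be approximated like this. The class, field and URL names are ours, not Coradiant's:

```python
# Approximation of a Watchpoint-style definition: regular expressions over parsed
# URL components, combined with AND logic. Names and URLs are ours, not Coradiant's.
import re
from urllib.parse import urlsplit, parse_qs

class Watchpoint:
    def __init__(self, name, host=None, stem=None, query=None):
        self.name = name
        self.host = re.compile(host) if host else None
        self.stem = re.compile(stem) if stem else None
        self.query = {k: re.compile(v) for k, v in (query or {}).items()}

    def matches(self, url):
        parts = urlsplit(url)
        params = parse_qs(parts.query)
        if self.host and not self.host.search(parts.netloc):
            return False
        if self.stem and not self.stem.search(parts.path):
            return False
        for key, pattern in self.query.items():
            if not any(pattern.search(v) for v in params.get(key, [])):
                return False
        return True

wp = Watchpoint("bank balance", host=r"bank\.example\.edu",
                stem=r"/account/balance", query={"acct": r"^\d+$"})
print(wp.matches("http://bank.example.edu/account/balance?acct=1234"))   # True
print(wp.matches("http://bank.example.edu/account/transfer?acct=1234"))  # False
```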
Reports provide an overview of important Web pages, average page-download times and errors that may have occurred over time. When we drilled into specific errors displayed and down to page details, we found an error count for our bank home page. We dug deeper into the errors and found a high number of retries that, according to TrueSight, the client had responded to--meaning it was not a network issue, but a server that didn't respond. Additional analysis let us compare the errors, speed and frequency of this object to others during the selected time, so we could see what was going on and make a determination about how localized the problem was.
Coradiant TrueSight (screenshot)
TrueSight doesn't string pages together to create a transaction. Its take is that Watchpoints will watch what matters, as long as they're configured to do so. We set Watchpoints for each page we were concerned about and could see the combined and specific experience of every user hitting these pages.

Fields available as triggers to capture specific HTTP/HTTPS page and object performance include client IP, port and browser; network, server, application page and content errors; aborted, redirected, expired, denied, document (for example, watching a particular PDF get downloaded), and secure pages and objects; midpage and -object stops; latency, size, TCP round trip; and complete URL path, parameter, action and element parsing.
We could set thresholds, called Performance Compliance Levels (PCLs), within each Watchpoint, or we could have them default to system global values. Defining two thresholds, one for tolerable performance and one for frustrating performance, helped create an operational context for our performance data. The idea is that latency below the tolerable threshold meant users were satisfied; performance above the tolerable but below frustration thresholds indicated a warning; and above the frustrated threshold meant, "Watch out, the phone's going to ring!"
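The PCL logic is effectively a two-band classifier. A small sketch of the idea; the millisecond thresholds are arbitrary examples, not Coradiant defaults:

```python
# Two-band classification against tolerable and frustrating thresholds, as
# TrueSight's PCLs describe it. The threshold values are arbitrary examples.
TOLERABLE_MS = 2000    # below this, users are satisfied
FRUSTRATING_MS = 8000  # above this, "the phone's going to ring"

def classify(latency_ms):
    if latency_ms <= TOLERABLE_MS:
        return "satisfied"
    if latency_ms <= FRUSTRATING_MS:
        return "warning"
    return "frustrated"

for latency in (800, 4500, 12000):
    print(latency, "->", classify(latency))
```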
Next we went through performance metrics by network, host, end-to-end latency and throughput to see, graphically, how users were feeling about our service.
To get more detailed information on how our bank and bonsai apps were faring, we used the Reports function, which is also linked in context to Watchpoints. IT management will be able to consistently navigate this linkage into the detailed reports, and by default see host latency graphically with an overlay of user experience broken into percentiles.
When only a one-click report will do, TrueSight's Snapshot Browser quickly displays TopN statistics, like "100 most recent sessions this hour--with errors." Snapshot Browser comes with 14 cleverly named reports, which we could clone and tweak. The reports list user, site, errors and a graphic depicting the length of the session; we got quite specific about the user, the site, the page or any of the statistics TrueSight collects.
TrueSight's interface was good but a tad sluggish. Pages never failed or hung, but the graphs were always a few seconds behind--not so we couldn't live with it, but noticeable. The online Help had useful explanations of keywords and operators, and contextual as well as index-based searching meant it never took us too long to find an answer.
Coradiant TrueSight TS-1100 Real-User Monitor, $89,950. Coradiant, (877) 731-7277, (858) 753-1717. www.coradiant.com

UEM's service monitoring is driven through the use of categorization. A hierarchy of groups and predefined reports gathers a multitude of data. Looking at our bank-balance transactions, for example, we could see current and historical responses laid against a baseline of activity oriented to normal performance for the particular application or transaction. This is a straight-average baseline, without the more sophisticated period-to-period comparisons we've seen in some performance products. But it was useful and accurate after a couple weeks of collection.
We started by gathering data on our target bank and plant applications, defining the applications and transactions we cared about. We time-correlated and ran more specific metric reports, or jumped back to canned views, like the enterprise transaction view. These canned views covered times like last hour, day or week, which is OK for occasional use and appropriate for business users who need to know what's happening, and they served as a jump-off point for diagnostics.
Our first stop was the applications report, which averaged the bank and plant applications we were testing against. We looked for spikes in the last full day's performance by application. Besides being tabled and graphed, performance metrics were linked to more specific metric reports. Each page linked to a view of specific page download times that made up for the lack of linked graphics. The continuity between reports was good.
Quest Software (screenshot)
UEM monitors servers, locations, access speeds, applications, pages, command objects on pages and users. Parsing of the URL for monitoring is split into host, path, parameter and query. We set up URL path and parameter definitions, which UEM then monitored. This differs from TrueSight's approach, which parses the URL itself in greater granularity. UEM's data-collection impact can be significant: enabling a watch on a URL parameter causes every instance of that parameter to be gathered, regardless of other URL values.

UEM offers canned diagnostic reports for content, transactions, user experience, site, traffic, network and Web services. The report called "What is the quality of the user's experience with the site?" (the name says it all) was our favorite place to start. Displayed in separate graphs, we saw average page download, average time to complete a command, page stop activity (a frustration indicator) and user stickiness (an indicator of activity). This gave us a jumping-off point to notice overall performance trends.
UEM also does some analytics, such as tracking where users abandon the site and common user paths. We recommend the latter report, which can be used to create transactional paths worth watching.

Setting up a transaction in UEM was a matter of defining pages and setting their order. As in VAS, a list of observed base pages made for simple point-and-click definition, but this depended on our knowing the pages we wanted. UEM also has a script recorder that launches and runs within a browser on the current management desktop--nothing additional to install. It automatically set UEM to save traffic from the management desktop and capture the browsing path. The script recorder is easy to use, and it works, but we found it just as easy to select the URLs manually.
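Under the hood, a UEM-style transaction is just an ordered list of pages, with unrelated pages allowed in between. A rough sketch of counting a completed transaction in a user's session, with invented page names:

```python
# Rough sketch of matching a UEM-style transaction: an ordered list of pages,
# allowing detours in between (a subsequence match). Page names are invented.
def completed(transaction, session_pages):
    """True if the session hit the transaction's pages in order."""
    it = iter(session_pages)
    return all(page in it for page in transaction)  # 'in' consumes the iterator

BANK_BALANCE = ["/bank/login", "/bank/account", "/bank/balance"]

session = ["/bank/login", "/news/today", "/bank/account", "/bank/balance"]
print(completed(BANK_BALANCE, session))          # True, despite the detour
print(completed(BANK_BALANCE, ["/bank/login"]))  # False, transaction not finished
```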
UEM is big on grouping and categorizing, but Quest has succeeded in keeping the interface simple. It didn't take long for us to feel comfortable navigating it. One problem we occasionally ran into was a partial paint of graphs. A reload or minimization of the UI would fill in the graphics, but Quest admits it has a problem with the client-side Java applet's rendering of graphs. The company says it plans to add server-side graphic creation in the next release, which will have an added benefit of graphically embedded links for even more intuitive navigation.
UEM's Help documents are complete and easy to read online. In addition to providing the typical PDF versions of admin and user manuals, Quest also provides online documents describing each of the menus and gives a complete explanation of the extensive UEM metrics. Metric documentation provided detailed explanations, with examples. One metric, command RTT (round-trip time), had us scratching our heads--is that the same as network round-trip time? The doc explained that UEM watches small HTTP packets to get a sense of RTT, gave the formula, listed the assumptions that had to hold (like symmetrical bandwidth and low host processing) and pointed out the other metrics that incorporate this basic metric to make determinations.
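We won't reproduce Quest's formula here, but the underlying idea--for small request/response pairs, time to the first response byte approximates one network round trip when host processing is negligible--can be sketched like this. The size cutoff and sample data are our inventions:

```python
# Rough sketch of the idea behind a passive RTT estimate: for small requests, time
# to the first response byte approximates one network round trip, assuming host
# processing is negligible. This is our simplification, not Quest's formula.
SMALL_REQUEST_BYTES = 512  # only small exchanges, so serialization time is negligible

def estimate_rtt_ms(samples):
    """samples: (request_size_bytes, request_sent_s, first_response_byte_s) tuples."""
    rtts = [(t_resp - t_req) * 1000.0
            for size, t_req, t_resp in samples
            if size <= SMALL_REQUEST_BYTES]
    return min(rtts) if rtts else None  # the minimum filters out server-side delays

samples = [(300, 10.000, 10.046), (280, 11.000, 11.052), (4000, 12.000, 12.500)]
print(f"estimated RTT: {estimate_rtt_ms(samples):.0f} ms")  # -> estimated RTT: 46 ms
```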
Quest UEM is the low-price leader in this review, and it scales well, earning it a Best Value award.
Quest User Experience Monitor 4.6, $45,000 (breaks down to $2,500 per Web server processor with a 16-CPU limit for the software, and $5,500 for each hardware appliance). Quest Software, (949) 754-8000. www.quest.com

VAS offers overviews of everything--servers, applications, Web sites, transactions and pages--and we could monitor and average our URLs in many aggregated views. Page load time, a heavy focus, is aggregated across client, server (what Compuware terms "data center") and network. Thresholds are compared with page download times, the number of users having problems and the percentage of pages downloaded successfully, in living color sorted by severity. Network page-download data is further broken down to attribute slow pages to network loss rate, network latency, network request time and other miscellaneous, undetermined factors. By displaying the context of network time against end-to-end time for a client, VAS did a good job letting us know when the network was the culprit and when it was not.
From a NOC perspective, the activity map, a graphical overview, shows traffic between monitored Web sites and clients and summarizes usage. It works as a high-level data-center tour screen, but is not without real monitoring value. We saw overview metrics for our sessions, slow pages, affected users, load time, visits and errors.
Compuware Vantage (screenshot)
An important, though not unique, function of VAS is the ability to diagnose the cause of slow Web pages. Compuware claims some secret algorithm, and it was powerful, but we found the same data available in both TrueSight and UEM. Our plant test app showed a particular page downloading slowly more than a third of the time, for example. VAS broke out the page performance into client delays, network delays, data center/server delays and delays due to page design. From this we could drill into a report listing affected users and their specific number of page loads. We then drilled into all activity of the users having the longest page load times.
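Compuware doesn't disclose its algorithm, but the report boils down to attributing a page's end-to-end time to client, network, server and page-design buckets. A simplified, invented version of that accounting:

```python
# Simplified, invented version of attributing a slow page's end-to-end time to
# client, network, server and page-design buckets, as VAS reports it.
def breakdown(page_time_ms, client_ms, network_ms, server_ms):
    """Whatever isn't explained by client, network or server time is blamed on page design."""
    design_ms = max(page_time_ms - client_ms - network_ms - server_ms, 0)
    parts = {"client": client_ms, "network": network_ms,
             "server": server_ms, "page design": design_ms}
    return {k: round(100 * v / page_time_ms, 1) for k, v in parts.items()}

# A 9-second page load: some network loss, but mostly too many small objects (page design).
print(breakdown(page_time_ms=9000, client_ms=500, network_ms=2500, server_ms=1500))
# -> {'client': 5.6, 'network': 27.8, 'server': 16.7, 'page design': 50.0}
```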
One unique feature Compuware VAS does provide is the ability to link from a URL to the actual Web page. It's a useful orientation tool and certainly helps when diagnosing issues.

We also looked for specific users experiencing problems. Reports in VAS can be searched, so we opened a canned report that listed users, entered an IP address and jumped to that user's pages. We did the same sort of thing by looking at a transaction report for the time the user was having a problem, then searching for the user. We found the report display interface in VAS flexible.
We missed having physical-layer connectivity to help with probing. There is a command-line interface, but the NICs on our test unit were incorrectly set by manufacturing, requiring our on-site implementation engineer to spend a day fixing it. Compuware attributed the problem to a manufacturing snafu that bypassed some of its usual procedures. It didn't make a good impression, but Compuware assured us it was due to the special circumstances of getting the product to us for review--our test system setup was one-off--and that normally a customer would not have received the product the way we did.

Beyond this initial problem, we were glad to have expert help in the care and feeding of VAS. All the parameters for system tuning and maintenance are exposed, explained and tracked, which was impressive--and totally overwhelming. After having run for a while, for example, the system filled up its 184-GB drive by tracking too many university sessions. The database had to be made smaller and the aggregation lessened to 10-minute increments instead of five. It took about five days for the processing to catch up, which meant no real-time monitoring during that period.
We saw two distinct UI designs in this product, both with links, mouse-over descriptions, and report transparency for cloning and editing report definitions. But they are different and take some getting used to. The design differences are due to the fact that Compuware purchased Adlex's Advanced Web Diagnostic Server and is only at the beginning of a planned integration. We like the Advanced Web Diagnostic Server interface for its cleaner and more straightforward display, but for now, users will have to deal with both.
VAS' performance reporting was spotty. WebSphere transactions for the previous day, for example, were displayed in just a couple of seconds, but a report on clients and servers by IP for the same time period took longer than five minutes (we gave up after that). Tech support thought our database problems were to blame, but periodic interface slowdowns persisted without any discernible pattern. Not a big deal, but annoying nonetheless.
We found VAS' help documents confusing, again attributable to how recently the former Adlex product joined the Compuware family. Reading the docs did help (surprise) orient and train us, but not many users are going to dig as much as we had to before getting to the soft, chewy center. This area could use some work. We hope to take another look at this system once Compuware finishes absorbing the Adlex technology.
Vantage 9.8, starts at $125,000. Compuware, (800) 521-9353, (313) 227-7300. www.compuware.com

Bruce Boardman, executive editor of Network Computing, tests and writes about network and systems management. Write to him at [email protected].
In our tests, we created transactions (prescribed sequences of browsing, clicks, object downloads and data input) that checked a bank account balance or bought a bonsai tree. In Coradiant's TrueSight, we set Watchpoints for each page, and could see the combined and specific experience of every user hitting these pages.
We attached our test devices to one of the segments leading to Syracuse University's Internet connection using a NetOptics eight-port gigabit regeneration tap. The appliances were able to see all traffic coming and going on this segment, which averages utilization in the high teens. In previous tests, we've seen the thousands of unique sessions crush passive monitoring systems, but this time all three products handled the flood with aplomb. All carefully opened their monitoring only to what we defined, because they knew enough to be afraid.
We used the Gomez GPN testing service to drive the transactions into a generic IBM WebSphere server running the included Plant Store and Bank applications on the same port with the included internal database. We placed our monitoring appliances in front of the Web server, and we tested for HTTP traffic only.

Gomez really aided testing, thanks to its global reach, differing access methods, granular reports and tight controls, and because we could isolate each test to a particular access method, time, geographic location, ISP, OS and IP address. All these identifiers were necessary to check the report results in the products under test.
It was important to create multiple tests that appeared similar, so that when diagnosing what happened, we couldn't just scan summary reports and spot outlying errors as the cause. It's nice when that happens, but in the real world, it's more likely that the good and bad are going to be lumped together in a jumble. So we designed and ran a bunch of similar bank-balance transactions, other bank transactions, and plant-store browses and purchases.
Each morning we'd look at the service views of the products and compare them with the Gomez GPN reports. Gomez showed us very long and failed transactions, and we then looked for these in the RTUM products and chose specific attempts at transactions that were much longer than other transactions within a given hour. This way, even significant time differences among the products and the Gomez service wouldn't cause us to analyze the wrong transactions. If our overview check of services hadn't already pointed to the same problem, we'd use transaction identifier, time and location where the transaction was performed to find the problem, just as if a user was on the phone, chewing our ear off.
All Network Computing product reviews are conducted by current or former IT professionals in our own Real-World Labs®, according to our own test criteria. Vendor involvement is limited to assistance in configuration and troubleshooting. Network Computing schedules reviews based solely on our editorial judgment of reader needs, and we conduct tests and publish results without vendor influence.