Analysis: Video in the Enterprise

From corporate communications to customer support, a video stream is the next best thing to being there. Here's how to make it work on your network.

June 29, 2007


Video is keeping forward-thinking network architects at their desks long past the dinner hour. Is it the next killer app for corporate networks? Or will it kill the network outright? If your video initiative sucks up too much bandwidth, can it be slimmed down without creating herky-jerky output? And if no policy has been defined, should IT simply treat video as just another P2P application, like Kazaa, that needs to be monitored and possibly blocked?

The phenomenon known as YouTube means IT can no longer simply ignore the issue. Most enterprises have all but stamped out illegal downloads of music and movies, but YouTube is HTTP-based. It's harder to regulate, especially when employees use the service for legitimate reasons, such as sharing short training videos or sending work-related video messages. Yeah, we know, for every one legit use you're seeing hundreds of dancing cadets and sobbing heiresses doing the perp walk. But making training, customer sales and support, and corporate communications richer and more cost-effective provides a solid rationale for accommodating video traffic. Here's what you need to know.

Training Day: Video Changes Everything

The training paradigm is shifting: Many large corporations now have CLOs (chief learning officers) charged with assessing training needs and acquiring and delivering tailored curricula. Cisco and other vendors have shown that the vast majority of training can be delivered without a classroom instructor or travel, with employees completing sessions synchronously or asynchronously, in bite-size increments or during off-peak hours, using Webinars, Web-based collaboration and self-study courses. At this year's Interop, for example, LifeSize Communications had a crowd watching its HD videoconferencing system, which used only about 1 Mbps of bandwidth.

Enterprises without the manpower to develop and deliver training can take advantage of LMS (learning management system) vendors. Most LMS providers will store your courses, track student access and success, administer exams that measure student learning, and then produce a report. The Corporate University Xchange (www.corpu.com) is dedicated to best practices in this type of training and boasts a high-profile client list that includes IBM, Microsoft and Intel.

But what's the impact of this new application on the network? That depends on where training materials are stored and played out, whether the presentation technology is TV or computer, the underlying video format, whether the content is cached or stored near the play-out point, and the path the video travels through the network. We cover all of these in this Primer Pack. One note on intellectual property protection: With stream-and-play delivery, content isn't cached, so no file is ever available on the player's computer to be appropriated. With download and play, however, the file must be protected. In their newest offerings, Adobe, Microsoft and Real Networks all provide capabilities to protect downloaded content; see the "What the Big 3 Support in Streaming" chart.

Video Quality and the Network

It took IT several years to understand the relationship between VoIP output quality and network capacity. The sages told us that VoIP networks must be extremely high-performing, that even a 1 percent packet loss would be harmful to voice quality. But experience has shown that in VoIP, when a packet is not delivered, the receiving decoder simply plays the last packet over or synthesizes the sound. Our ears aren't sensitive enough to really miss that 1/50 second of audio. In fact, packet losses of 2 percent to 3 percent can be tolerated, absent other problems.

Video is different. In MPEG transport over UDP or RTP, losing one IP packet usually means seven MPEG transport packets are lost. If they came from an I-frame, the impact is significant and may cause the decoder to incorrectly reconstruct the next dozen or so video frames (see "Compression: The Other GOP" for an explanation of frame types). That can represent nearly 1/3 of a second of output. On the other hand, if the IP packet carried information belonging to a B-frame, the impact is confined to that one frame, a deterioration of just 1/30 of a second of video.

There are tools and developing standards to help you assess video quality. The MDI (Media Delivery Index), published through the IETF, is promoted by Agilent Technologies, IneoQuest Technologies and others, while Telchemy is betting on its proprietary VQmon. But when it comes to monitoring the quality of streamed video, you're probably on your own: Most tools can't look inside HTTP to see the video stream because major vendors, including Microsoft and Adobe, keep their formats proprietary.
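The arithmetic behind these loss figures can be sketched as follows. The GOP length and I-frame size here are assumptions chosen for illustration, not figures from the article:

```python
import math

# MPEG-2 transport: 184-byte payload + 4-byte header = 188-byte TS packet;
# seven TS packets typically fill one IP packet.
TS_PACKET = 184 + 4
TS_PER_IP = 7
IP_PAYLOAD = TS_PER_IP * TS_PACKET          # 1316 bytes of TS data per IP packet

FPS = 30
GOP_FRAMES = 12                             # assumed GOP length
I_FRAME_BYTES = 60_000                      # assumed I-frame size

# An I-frame spans many IP packets, so a single loss can easily hit one:
print(math.ceil(I_FRAME_BYTES / IP_PAYLOAD), "IP packets per I-frame")

# Losing I-frame data can corrupt every frame predicted from it -- up to
# the rest of the GOP -- while a lost B-frame costs only itself:
print(f"I-frame loss: up to {GOP_FRAMES / FPS:.2f} s of distorted output")
print(f"B-frame loss: about {1 / FPS:.3f} s")
```

With these assumed numbers, an I-frame loss can distort more than ten times as much output as a B-frame loss, which is why loss statistics alone don't predict perceived video quality.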

When video is streamed over HTTP/TCP, TCP sequence numbers reveal when a segment has been dropped, and it will eventually be retransmitted. If the retransmission arrives in time, there is no deterioration in video output; the added delay is typically absorbed by the play-out buffer.

Got Bandwidth?

A form of convergence is taking place in video encoding and transport technologies. This is to be expected--manufacturers often use proprietary implementations and then evolve to standards-based platforms. For example, Microsoft's VC-1 produces a bitstream that can be transported over RTP; the RTP payload format is specified in RFC 4425, and the codec itself in SMPTE (Society of Motion Picture and Television Engineers) 421M. Likewise, Adobe's Macromedia Flash now supports H.263, a standards-based codec.

Broadcast video delivered in an SPTS (single-program transport stream) using an MPEG-2 encoder will require about 2 Mbps to 5 Mbps of bandwidth capacity per stream, with 5 percent to 20 percent overhead in control and packet headers. Several manufacturers, including Tut Systems, have demonstrated HDTV encoders that produce an encoded stream at 6 Mbps or slightly less. With control and packet headers added, your network would need to support about 8 Mbps if the single-program stream of a corporate TV studio or the output of a security camera is sent over the network.

Streaming video is almost always an HTTP/TCP application. The video server will throttle bandwidth based on the amount of capacity selected for the session. Download-and-play video, however, is a file-transfer operation and will use all the bandwidth it can grab. When we recorded a sample of downloads from YouTube, the average bandwidth used was about 4 Mbps, roughly the same as an SPTS.

The amount of bandwidth you'll use also depends on the path of the video; the goal is to transmit a stream as few times as possible to reach all viewers. Two techniques, caching and multicasting, support this goal. Multicasting has been around for a while, but it isn't supported across the public Internet, so we'll focus on caching.
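A quick check of the SPTS figures above, using the 5 percent to 20 percent overhead range the article cites:

```python
# Encoded payload rate plus control/packet-header overhead gives the
# approximate on-the-wire rate; the exact overhead varies with packet sizes.
def wire_rate_mbps(payload_mbps, overhead_fraction):
    return payload_mbps * (1 + overhead_fraction)

hd_payload = 6.0                                   # Mbps, HDTV encoder output
print(round(wire_rate_mbps(hd_payload, 0.05), 2))  # low end of the range
print(round(wire_rate_mbps(hd_payload, 0.20), 2))  # high end of the range
```

That works out to roughly 6.3 Mbps to 7.2 Mbps on the wire, so provisioning "about 8 Mbps" for a 6 Mbps HD stream also leaves some headroom.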

Video caching works like Web-page caching: An origin server, in this case a video server, creates a unicast stream and sends it to a server near the viewer. At that server, the video file is distributed using two techniques, multicasting and stream splitting. Stream splitting is similar to multicasting but doesn't depend on 224.x.x.x addresses. Cisco and Blue Coat are among the many vendors that provide this capability. Blue Coat says its remote caching servers also provide other services, including viewer authentication and reporting.
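The payoff of this architecture is easy to quantify. A back-of-the-envelope sketch, with an illustrative stream rate and viewer count (not figures from the article):

```python
# One unicast stream crosses the WAN to a branch cache, which fans it out
# locally via multicast or stream splitting.
STREAM_MBPS = 4.0          # roughly the SPTS/YouTube rate cited earlier
viewers = 50

wan_without_cache = STREAM_MBPS * viewers   # every viewer pulls across the WAN
wan_with_cache = STREAM_MBPS                # a single stream to the splitter
print(wan_without_cache, "vs", wan_with_cache, "Mbps across the WAN")
```

The WAN load drops by a factor equal to the number of viewers behind the cache, which is why large deployments nearly always place caching servers near the audience.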

Finally, for those contemplating video distribution over wireless, Meru Networks demonstrated at this year's Interop eight HDTV signals being transmitted through a single 802.11n AP. Something to look forward to.

Compression: The Other GOP

The MPEG encoder is usually configured to create a GOP, or group of pictures, using I-, P- and B-frames, each representing 1/30 second. An I-frame is essentially a JPEG-compressed image: It exploits only the spatial redundancy within a single picture. Transporting all I-frames would demand an unreasonable amount of bandwidth, so the encoder also calculates P-frames, which remove temporal redundancy by encoding only what has changed since the preceding I- or P-frame. As an example, consider a pitcher's body, with a ball moving from frame 1 to frame 2.
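As a concrete sketch of the structure just described, here is a common 12-frame GOP pattern; the exact pattern is an encoder setting, assumed here for illustration:

```python
# One I-frame anchors the GOP; P-frames predict forward from it; B-frames
# interpolate between surrounding I/P frames.
GOP = list("IBBPBBPBBPBB")

# I: self-contained (spatial compression only)
# P: predicted from the preceding I- or P-frame
# B: interpolated from surrounding I/P frames
print("".join(GOP), f"= {len(GOP) / 30:.2f} s of video at 30 fps")
```

Because every P- and B-frame in this pattern depends, directly or indirectly, on the single I-frame, damage to that one frame propagates across the whole 0.4-second group.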

Next, the encoder creates B-frames using bidirectional prediction: In our example, the position of the ball in frame 2 can be interpolated by looking at frame 1 and frame 3. So P-frames are derived from preceding I- or P-frames, and B-frames are derived from both preceding and succeeding I- or P-frames. As a result, if part of an I-frame is lost, the entire GOP is affected.

An encoder creates the I-, P- and B-frames, and the resulting bitstream is broken into 184-byte blocks, each with a four-byte transport header prepended that contains information about the overall program stream and timing. Seven such transport packets generally comprise the payload of an IP packet, and an I-frame will span many IP packets. So if one is lost, part of the presented picture is distorted.

History: The Role of Television

To understand the many forms in which video might appear in the enterprise network, begin with the method used to implement television broadcasting. Years ago, federal regulators allocated 6 MHz of bandwidth per individual channel being broadcast over the air. With this bandwidth, engineers managed to represent moving images in two primary formats: NTSC (National Television Standards Committee), used mainly in North and South America and Japan, and PAL (Phase Alternating Line), used almost everywhere else.

NTSC records and shows 30 frames per second using the 6 MHz originally granted by the government. The image is shown as 480 visible lines, each containing 720 pixels (720x480). This format is often referred to as Standard Definition TV, or SDTV. High Definition TV, or HDTV, has approximately twice the number of lines and twice the density of pixels on each line. Therefore, an HDTV signal, when digitized, will use about four times the bandwidth of a digitized SDTV signal, and uncompressed digital SDTV already consumes about 270 Mbps. This is where MPEG compression comes in (see "Got Bandwidth?").
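The 270 Mbps figure can be derived from the standard digital sampling structure for SDTV. This sketch assumes ITU-R BT.601 sampling, which the article doesn't name but which is consistent with its number:

```python
# BT.601: luminance sampled at 13.5 MHz, each of two chrominance channels
# at 6.75 MHz, 10 bits per sample.
LUMA_SAMPLES = 13.5e6
CHROMA_SAMPLES = 6.75e6
BITS_PER_SAMPLE = 10

sdtv_bps = (LUMA_SAMPLES + 2 * CHROMA_SAMPLES) * BITS_PER_SAMPLE
print(sdtv_bps / 1e6, "Mbps uncompressed SDTV")

# HDTV at roughly twice the lines and twice the pixels per line:
print(sdtv_bps * 4 / 1e6, "Mbps uncompressed HDTV")
```

That's 270 Mbps for SDTV and over a gigabit per second for HDTV before compression, which makes clear why no one ships uncompressed video across a corporate network.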

At the source of the video stream is an encoder. Using sophisticated mathematical techniques, the signal captured by the camera is digitized, and redundant information is removed. About 100:1 compression is achievable without causing the output to be significantly distorted, though the actual rate is dependent on many factors, including how much diversity there is in color and brightness, the amount of motion (read: don't use a handheld camera), and parameters set in the encoder.

The other major factors that affect final bandwidth requirements are frame rate (frames per second) and frame size (pixels/lines). Once all this is taken into account, the actual bandwidth requirement can vary from as little as 80 Kbps for a small QSIF (Quarter Source Input Format) image (176x144) to as high as 4.5 Mbps for an SDTV (720x480) signal.

As for other compression techniques, Windows Media uses WM9, and Macromedia uses Flash. Based on many of the same standards as MPEG, or on ideas derived from those standards, each claims to produce output comparable to MPEG at more aggressive compression ratios. Real Networks and Apple also have compression implementations, but Flash and Windows Media seem to be dominating the streaming market.
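A rough model shows how frame size, frame rate and compression ratio combine to produce that 80 Kbps-to-4.5 Mbps range. The chroma sampling and compression ratios below are illustrative assumptions, back-solved to match the article's endpoints:

```python
# Assumes 4:2:0 sampling, i.e. 12 bits per pixel before compression.
def stream_kbps(width, height, fps, compression_ratio, bits_per_pixel=12):
    raw_bps = width * height * bits_per_pixel * fps
    return raw_bps / compression_ratio / 1000

# QSIF at 30 fps with ~100:1 compression lands near the article's 80 Kbps:
print(round(stream_kbps(176, 144, 30, 100)), "Kbps for QSIF")

# SDTV needs a gentler ratio (~28:1) to land near the article's 4.5 Mbps:
print(round(stream_kbps(720, 480, 30, 28)), "Kbps for SDTV")
```

The same formula also shows the cheapest levers for trimming a stream: halving the frame rate or stepping down one image size cuts bandwidth proportionally, with no change to the encoder's compression machinery.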

Doing Video On the Cheap

Suppose you need only to capture video at one site and play it out at a second location--say, a camera in the executive suite whose feed must be transported to the studio in another building. Simple, right? You need an MPEG-2 encoder, a network interface, an MPEG-2 set-top box and a display device. Visionary Solutions sells an encoder for about $1,200, and a set-top box can be had from Amino Technologies for a little under $300. Throw in about 3 Mbps to 5 Mbps of bandwidth and a television and you're in business. If you need to secure the stream, inexpensive encryption appliances are available from vendors like Mistletoe Technologies.

A Taxonomy of Video Types

Broadcast Video: What you see when you watch television at home. It's the technology of antennae or coax cables, DVD players, and set-top boxes. The term "broadcast" refers to realistic, top-quality output--it's the "toll quality" of video, transmitted in 6 MHz channels if analog. If it's digitized, as many as four broadcast program channels can be squeezed into the same 6 MHz band. It may use IP transport, but that won't be apparent because conversion and delivery happen inside the service provider's network. It's almost always a one-to-many configuration.

Desktop or Room-Based Videoconferencing: VC technology comes in two distinct forms: room-to-room and desktop-to-desktop. Desktop VC is well understood; it usually encodes video at a low frame rate and small image size, because the output is on a computer screen, and it's often HTTP-based, although other implementations exist.

Room-based VC is usually two-way but may involve three or more locations, and it allows for a high degree of realism by using a squares paradigm; think of the old "Hollywood Squares" format. This form of conferencing was originally based on H.320 standards and eventually evolved to H.264 codecs and IP transport. While still generally based on bandwidth allocations that are fractions of a T-1 circuit, typically 128 Kbps or 384 Kbps, there are two gotchas. First, IP overhead may be 25 percent or higher on smaller frames. Second, in a conference session with three or more parties, some bridges create a full path for each square (party) shown on the screen, while other bridges combine the images and send a single signal to each output device. Bandwidth usage is controlled by setting frame rate and image size (resolution) and by limiting subject motion. Note that the newest HD VC implementations are reported to use very high levels of bandwidth, up to 40 Mbps.
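The overhead gotcha follows directly from the fixed size of the header stack. A sketch, assuming IPv4/UDP/RTP transport and illustrative payload sizes:

```python
# The IP/UDP/RTP headers total a fixed 40 bytes, so their share of each
# packet grows as the payload shrinks.
HEADERS = 20 + 8 + 12     # IPv4 + UDP + RTP, in bytes

def overhead_pct(payload_bytes):
    return HEADERS / (HEADERS + payload_bytes) * 100

print(f"{overhead_pct(160):.0f}%")    # small VC frame
print(f"{overhead_pct(120):.0f}%")    # smaller still
print(f"{overhead_pct(1316):.0f}%")   # a full seven-TS-packet MPEG payload
```

A 120-byte payload carries 25 percent overhead, matching the figure above, while a full-size MPEG payload pays only about 3 percent for the same headers.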

Streaming or Streamed Video: While its precise definition varies, streaming video is often characterized as much by the four products that dominate the market--Microsoft's WM9, Adobe's Flash, Real Networks' RealPlayer and Apple's QuickTime--as by its terminology and basis in proprietary standards.

To play video from a streamed source, you must have a player that is compatible with the encoder that was used. Streaming architectures depend on a source video server, optional proxy or caching servers, and the player. Large implementations will likely require caching servers, which may be supplied by a CDN (content delivery network) vendor. These networks are an overlay to the underlying transport networks and supply caching capability and other control functions, such as authentication.

While it isn't a technical requirement, common file formats are usually transported over HTTP to allow browser control as well as player control. This is significant because it means the transport is based on TCP; such servers will therefore use all the bandwidth they are allocated. And because it rides on HTTP, streaming video may be more difficult for IT to isolate and rate-limit.

That is in sharp contrast to MPEG traffic, which travels in a transport stream based on UDP or UDP/RTP. In either case, there is a maximum bandwidth requirement dependent on settings in the encoder. Most often, CBR (constant bit rate) transmission is used: If the encoder is set to output 6 Mbps, the MPEG bitstream will fill the MPEG transport packets at that rate, and if the encoder can't supply enough video or audio bits, stuffing is inserted. With packet overhead added, total bandwidth increases by about 5 percent.

What the Big 3 Support in Streaming

We're seeing a migration of video toward streaming: Microsoft told us that the overlap between its IPTV effort, MSTV, and its streaming effort, Windows Media, is large and is expected to grow. Real Networks agreed that this is a trend. All streaming vendors listed support a wide range of resolutions.


Vendor        | Download | Stream | Less than 100 | 100 to 1000 | Greater than 1000 | Multiple Player Support | Intelligent Streaming | IP Protection | Auto-play From Browser
Adobe         | x        | x      | x             | x           | x                 |                         |                       | x             | x
Microsoft     | x        | x      | x             | x           | x                 | x                       | x                     | x             | x
Real Networks | x        | x      | x             | x           | x                 | x                       | x                     | x             | x

Phil Hippensteel is an assistant professor of information systems at Penn State University and an industry consultant. Write to him at [email protected].
