File Synchronization

File synchronization tools let you consolidate information for backup and restore data after a disaster. We help you sort through commercial and freeware options.

February 27, 2004


With bandwidth at a premium over WAN links, and enterprises maintaining large code bases and documentation sets, it doesn't make sense to copy every updated file over the WAN. File replication, which is part of the file-synchronization process, is a better option because it uses less bandwidth.

No Smoke and Mirrors

In its simplest form, synchronization resembles file copying. But synchronization is more complex, and can handle several people changing source files at the same time. In this case, the software typically mirrors the changes to machines around the office or across the WAN. Synchronization software recognizes file conflicts. To prevent the data loss inherent in simple file replacement, the synchronization software lets the systems administrator preconfigure the appropriate rules or notifies the administrator before replication occurs.

A similar problem occurs in the database world, where multiple users try to update a data row at the same time. Say, for instance, two salespeople are simultaneously changing a large customer's order. One adds a few more widgets, while the other adds twice as many. The database administrator must implement record locking to prevent data corruption, along with parameters that specify which sales order is valid so that incorrect information is not accepted--or replicated. This means complex rules.

Not so with file synchronization. There, the software notifies the administrator when conflicting versions of the information appear or when users are introducing nearly duplicate entries into the directory. In most cases, the administrator must arbitrate the conflict manually.

This approach originated in the early days of synchronization tools, when excruciatingly slow modems (300 bps) were the main transport for file transfers. At such a sluggish speed, mirroring directory structures on geographically dispersed machines also was expensive; it was cheaper to block replication until the administrator could investigate and resolve the data conflict. Many of today's commercial file-sync products now include rule sets to simplify this process, but in most freeware solutions, the administrator still must intervene.
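The conflict check described above can be sketched in a few lines. This is an illustration only, assuming the synchronizer keeps a digest of each file as it stood after the last successful sync; the names Action and classify are hypothetical and do not come from any product mentioned in this article.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    NOTHING = "replicas already match"
    PUSH = "copy local version to remote"
    PULL = "copy remote version to local"
    CONFLICT = "hold replication and notify the administrator"


def classify(base: Optional[str], local: str, remote: str) -> Action:
    """Decide what to do with one file.

    base   -- digest recorded at the last successful sync (None if never synced)
    local  -- digest of the current local copy
    remote -- digest of the current remote copy
    """
    if local == remote:
        return Action.NOTHING      # nothing changed, or both sides made the same change
    if base == remote:
        return Action.PUSH         # only the local copy changed
    if base == local:
        return Action.PULL         # only the remote copy changed
    return Action.CONFLICT         # both sides changed in different ways
```

A rule set of the kind commercial products offer would replace the CONFLICT branch with a policy, such as "newest wins"; without one, the file waits for the administrator.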

Once a mirror of the file is established, only changes occurring within that file structure get transmitted. So if a 2 million-line source-code listing requires an update to five lines, synchronization software updates just the five lines instead of copying the entire file.

File synchronizers determine how a file changed based on a date/time stamp and a digest signature of the file. If the source and destination files' signatures and time stamps match, the files are considered identical and are not transferred during an update process. If they differ, the file is presumed altered, so it's then mirrored.
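As a rough sketch of that comparison, the following pairs a file's modification time with a digest of its contents. SHA-256 and whole-second time stamps are assumptions made for the example, and the function names are hypothetical; real synchronizers choose their own digest algorithms and time-stamp precision.

```python
import hashlib
import os
from typing import NamedTuple


class Signature(NamedTuple):
    mtime: int    # modification time, truncated to whole seconds
    digest: str   # hex digest of the file contents


def signature(path: str, chunk_size: int = 1 << 20) -> Signature:
    """Read the file in chunks so large files are never loaded whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return Signature(int(os.stat(path).st_mtime), h.hexdigest())


def needs_transfer(src: str, dst: str) -> bool:
    """True when dst is missing or its time stamp or digest differs from src."""
    if not os.path.exists(dst):
        return True
    return signature(src) != signature(dst)
```

Calling needs_transfer on each source/destination pair yields the list of files an update pass would mirror.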

Depending on the synchronization software, files may be divided into smaller, bite-size chunks. Digest algorithms help the software detect any differences between the two files and between the smaller pieces, and no action is taken unless the digest signatures change. To learn more about what happens behind the scenes in the sync process, Rsync provides a close-up look (see samba.org/~tridge/phd_thesis.pdf).

Today's file-synchronization software is good not only for directory mirroring, but also for data backup and restoration and point-in-time snapshots. Some even offer multicasting. These tools can't replace a good tape backup in major disasters, unless you plan to remove one or more mirrored hard drives from the system in a rotation plan and keep off-site copies--an expensive option. But they can speed movement around the network once the tape restoration is complete, especially to client machines.
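To make the chunk-level digests mentioned above concrete, here is a simplified sketch that splits two versions of a file into fixed-size blocks and reports which blocks would need to be sent. The block size, SHA-256, and the function names are assumptions for illustration. This is not Rsync's algorithm: Rsync uses a rolling checksum so it can match blocks at any offset, whereas this version compares blocks only at the same position.

```python
import hashlib
from typing import List


def block_digests(data: bytes, block_size: int = 4096) -> List[str]:
    """Digest each fixed-size block of data (the last block may be shorter)."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = 4096) -> List[int]:
    """Indexes of blocks in new that differ from, or do not exist in, old."""
    old_digests = block_digests(old, block_size)
    new_digests = block_digests(new, block_size)
    changed = []
    for index, digest in enumerate(new_digests):
        if index >= len(old_digests) or old_digests[index] != digest:
            changed.append(index)
    return changed
```

Because the comparison is positional, inserting a single byte near the start of a file shifts every later block and marks them all as changed. That weakness is the reason production tools use more elaborate matching.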

Even with tape drives and data on your server, you must back up critical information on desktop computers, preferably when you back up the server. The easiest way to do this is to save a copy of the desktop data to the server's hard drives and let the tape backup at the server do its job.

You can synchronize the user-data folder and daily work to the server if you give each user a folder on the server's hard drive. This takes minimal administrative action, and restoration is straightforward--just return the data to the desktop, on an individual or a group scale. Most synchronizers support scripting, access-control lists, file permissions and the like, so you can test this in the lab while devising your backup-and-recovery strategy.

Figure: File-Sync Test Results

Within our internal network at Entre Solutions, we use a file synchronizer to move information from the user's hard drive to the network's backup drive at predetermined intervals. Everyone knows to expect it during lunch and shortly after closing time. We stagger backup times if necessary to minimize the impact on production bandwidth. After the synchronizer completes its sweep, we back up the network drive to tape at night. If we need to restore user data, we use a batch file that acts as a file copy while restoring file permissions and access. It's simple, clean and fast because only recently modified items in the data folder get replicated.
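A one-way sweep of this kind, copying only recently modified items from a user's data folder to a backup location, might look like the sketch below. It is not the tool or batch file described above: the function names are hypothetical, change detection uses only file size and whole-second modification time, and it does not handle deletions, access-control lists or Windows file permissions.

```python
import os
import shutil
from typing import List


def _is_stale(src: str, dst: str) -> bool:
    """True when dst is missing, differs in size, or is older than src."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    return s.st_size != d.st_size or int(s.st_mtime) > int(d.st_mtime)


def mirror_changed(src_root: str, dst_root: str) -> List[str]:
    """Copy new or modified files from src_root to dst_root.

    Returns the sorted relative paths of the files that were copied.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            if _is_stale(src, dst):
                # copy2 also copies the modification time, so an unchanged
                # file is skipped on the next sweep.
                shutil.copy2(src, dst)
                copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(copied)
```

Run on a schedule, the second and later sweeps touch only files that changed since the previous one, which is what keeps the bandwidth impact small.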

We use file synchronization between our in-house Web development and production environments. It's a Microsoft-specific domain, so we run Fcopy and digital certificates. We perform this synchronization manually when we need to publish a Web site from development to production. This ensures all changes are made, and it cuts transfer time for maintenance updates on some of the larger Web sites. Fcopy is simple to use, but for cross-domain work, you need certificate services and MSMQ (Microsoft Message Queuing) on the server, as well as a certificate for Fcopy. The server resource kit explains how to do this.

There are some security issues with MSMQ, however, including functions in unpatched OS code, such as the MSMQ MQLocateBegin heap overflow. Small server installations don't typically load MSMQ or Fcopy.

We use multicasting to move an image from one of our servers to multiple clients. It takes time (see "File-Sync Test Results"), but it beats copying images to each client location individually, especially if there's no batch file and you have to do it on an ad hoc basis. If you have several remote sites, multicasting also minimizes wide-area bandwidth usage because it's a single session.

If your multicast operation fails for just one client, however, you have to repeat the entire multicast for that client. And when restoring hard drive images, security identifiers, outdated Kerberos keys and other security elements can cause problems in the synchronization process.

File synchronization can't replace your tape backup just yet, and in some environments with high rates of data change, it may not even be the best approach. But file sync is an efficient way to consolidate information for desktop-data backup, and it's the method of choice for restoring that data in the wake of a disaster.

Roger Beall is a Savannah, Ga.-based certified senior network systems engineer with Entre Solutions. Write to him at [email protected].


Confused by the plethora of file-synchronization freeware? Here's an overview of the most popular:

  • Rsync (samba.anu.edu.au/rsync) is the granddaddy of file-sync software, and one of the most reliable and easiest to use. It is also one of the simplest to install for Unix platforms. Rsync provides secure (SSH), incremental file transfers and was designed as a file-mirroring program, so it excels at this task. It also comes in Macintosh (www.macosxlabs.org/RsyncX/RsyncX.html) and Windows (optics.ph.unimelb.edu.au/help/rsync/rsync_pc1.html) versions.

  • Unison's (www.cis.upenn.edu/~bcpierce/unison) claim to fame is that it works across various Unix and Windows platforms. The freeware offers both a CLI and GUI--each machine must run one or the other component. Unison flags conflicts in the replication process and displays them, and security is provided through SSH in Unix environments. A Sockets method is available if you don't need the extra security. The command-line version is scriptable, which is a plus for the administrator. But beware: If you're scripting a mirror and a conflict condition arises, you'll have to handle it administratively before synchronization can proceed.

  • Microsoft offers three tools in its Windows Resource Kit (www.microsoft.com/windowsserver2003/downloads/tools/default.mspx) that support file synchronization and replication. Robocopy, Fcopy and Mcast work with Windows server versions from NT 4.0 to 2003 and most Microsoft clients. Robocopy has been around since NT4 days, and is fast. It lets you control bandwidth usage, and it maintains an exact mirror. Robocopy can be included in batch files or in job files that use its own format, and you can schedule it to run at periodic intervals or continuously. It's designed for LAN or WAN links in the same network, but doesn't work with Windows 95 clients.

    Fcopy takes a different approach to file synchronization, building on the MSMQ (Microsoft Message Queuing) environment. It's designed to operate in a workgroup or networked configuration. The catch is that you need digital certificates on all machines participating in the synchronization process. Depending on your configuration, this may require some work because each server and workstation must have a registered certificate in the MSMQ service before Fcopy will do its job. If you're using digital certificates and have issued them to your users, there's an option in Fcopy that lets it impersonate the user rather than issue its own certificate. But this doesn't always work, especially if synchronization crosses multiple network domains.

    Multicasting is Microsoft's newest file-sync tool. It has a server component, MQCast, and a client component, MQCatch. Your network routers must be set to pass multicast traffic. Microsoft's MQCast and MQCatch can replicate to an unlimited number of clients as long as you kick off the catching component on a timer or manually. It's more of a file-replication tool than a file-synchronization one, however. MQCast and MQCatch are based on MSMQ 3 and aimed at the latest Microsoft OSs--WinXP and Windows Server 2003. The hitch is that leaving the catching component running all the time raises security issues, so use this tool with caution.

  • Other solutions for the Mac OS X environment include Synk and ExecutiveSync. Synk (www.decimus.net/synk) is a freeware resource for Mac users who don't have complex environments. It's limited, but does well as a backup tool. For more complex setups, ExecutiveSync (www.macupdate.com/info.php/id/8543) makes more sense.
