COLD Storage for Computer Generated Data

COLD vs. Enterprise Report Management Systems

©2001 by Charles A. Plesums, Austin, Texas, USA


Electronic images are the best way to store the mail companies receive. The technology has been practical since the mid-1980s. In the early days, the slightly higher cost of storing the document electronically, rather than on paper or microfilm, had to be offset by the ease of retrieval. Today the dropping cost of computers makes the actual storage of images cheaper than storing paper.

However, what about the "file copy" of the correspondence or other computer output that is sent in response to the mail? It only takes a few thousand bytes to tell a printer how to format and print a page, but if we scan that page and make an image, it will likely start at 500,000 bytes, and may still take 30,000-50,000 bytes after compression. And if we create an image, we cannot search for words or phrases. There must be a better way.
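
The size difference can be made concrete with a rough back-of-the-envelope calculation. This is only a sketch: the image sizes come from the figures quoted above, but the 3,000-byte print data size (one reading of "a few thousand bytes") and the volume of one million pages are illustrative assumptions, not measurements.

```python
# Per-page storage sizes. Only the two image sizes come from the text;
# the other values are assumptions chosen for illustration.
PRINT_DATA_BYTES = 3_000         # assumption: "a few thousand bytes" per page
RAW_IMAGE_BYTES = 500_000        # scanned page before compression (from the text)
COMPRESSED_IMAGE_BYTES = 40_000  # midpoint of the 30,000-50,000 range in the text
PAGES = 1_000_000                # hypothetical volume of stored pages


def total_megabytes(bytes_per_page, pages=PAGES):
    """Total storage in megabytes (1 MB = 1,000,000 bytes) for a page volume."""
    return bytes_per_page * pages / 1_000_000


def ratio_to_print_data(bytes_per_page):
    """How many times larger a page is than its assumed print data."""
    return bytes_per_page / PRINT_DATA_BYTES


if __name__ == "__main__":
    for label, size in [
        ("print data", PRINT_DATA_BYTES),
        ("compressed image", COMPRESSED_IMAGE_BYTES),
        ("raw image", RAW_IMAGE_BYTES),
    ]:
        print(label, total_megabytes(size), "MB,",
              round(ratio_to_print_data(size), 1), "x print data")
```

Under these assumptions, a compressed image takes roughly thirteen times the space of the print data for the same page, and the print data remains searchable text.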

The earliest image systems had a technology for placing text on top of the image of a form - what FileNET later called COLD for "Computer Output to Laser Disc." In the IBM ImagePlus system, the documents were considered images, but in a different "MO:DCA PTOCA" file format. Those documents were stored individually, integrated with the images in an electronic file folder.

There are at least two problems with that early technology

Current Technology

Some computer output management products have evolved from the report distribution technology, and are still optimized for customization and distribution of the minimally formatted computer runs. The RSD America, Inc. product, EOS (Enterprise Output Solution) and the Mobius Infopac product are examples of these products. For discussion purposes, these will be referred to as the "Report Distribution" technology.

Other computer output products have evolved from the output document archive technology, and are optimized for the efficient storage, rapid retrieval, and faithful reproduction of the original document, although they may be more limited in the customization and distribution features. The documents stored in this way must often be maintained, unchangeable, for an extended period to meet regulatory requirements. The FileNET COLD and IBM MO:DCA - PTOCA format described above are early examples of this technology, although these and other vendors have delivered far more modern products in the marketplace. For discussion purposes, this will be referred to as the "Document Archive" or "COLD" technology.

Why the focus on optimization for the type of work? One company was using a leading report distribution package for all applications. A custom report could be generated in just a few minutes. The administrative users of these custom reports were thrilled. The customer service representatives were frustrated - two minutes to regenerate a copy of a customer statement, while the customer is on the phone, is a lifetime, and far too slow for quality customer service. Another division of the same company bought an "add-on" product that prepared, in advance, all the customer statements that might be needed. They were thrilled with the performance, but the cost of storing all these statements, ready to display or print, was breaking the budget. They were equally frustrated. A competing "COLD" or "Document Archive" product was able to provide sub-second to few-second delivery, with a very low storage cost. The COLD product was not as flexible for producing custom reports from the computer output, but was far superior for customer service.

Could these products, or similar products in today's marketplace, handle all of the storage and distribution of computer output requirements for a company? The respective vendors would undoubtedly say they could, but I believe that two products would better meet the needs of all but the smallest companies: One "Report Distribution" product oriented to mining and distribution of data from large reports, and a separate "Document Archive" product optimized for the rapid delivery and faithful reproduction of formatted documents to a service representative on the telephone or web. In most good-size companies, the large volume of computer output makes the optimization of the product a higher priority than having a single product for all purposes.

The "Document Archive" technology should only be used for computer generated documents that must be delivered and stored in a specific format. Other technologies, such as HTML or XML text files, are better for storing information, such as web forms and e-mail, where the content is important but the format is variable or can be overridden.

The Document Archive must store all data in a "durable" format, which is

The index information, to randomly access specific documents in the document archive, must be accessible, so that these documents can be virtually integrated with images and other documents.
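
One way to picture an accessible index is sketched below. It is a minimal illustration, not any vendor's design: documents are concatenated into one archive, and a separate index records the byte offset and length of each, so another system (an image folder application, for example) can look up and retrieve a single document without reading the rest. The names `IndexEntry`, `build_archive`, and `fetch`, and the (customer, period) key, are all hypothetical.

```python
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    offset: int  # byte position where the document starts in the archive
    length: int  # number of bytes the document occupies


def build_archive(documents):
    """Concatenate documents and record where each one sits.

    `documents` maps a hypothetical key, such as (customer id, period),
    to the stored bytes of that document.
    """
    archive = io.BytesIO()
    index = {}
    for key, data in documents.items():
        index[key] = IndexEntry(offset=archive.tell(), length=len(data))
        archive.write(data)
    return archive.getvalue(), index


def fetch(archive_bytes, index, key):
    """Return one document by key, using only its index entry."""
    entry = index[key]
    return archive_bytes[entry.offset:entry.offset + entry.length]


if __name__ == "__main__":
    docs = {
        ("C1001", "2001-03"): b"statement for C1001",
        ("C1002", "2001-03"): b"statement for C1002, two pages",
    }
    archive, index = build_archive(docs)
    print(fetch(archive, index, ("C1002", "2001-03")))
```

Because the index is an ordinary lookup structure kept apart from the stored bytes, it could equally be held in a shared database table that the image system also queries.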

Although a widely accepted standard for the storage format would be ideal, no such standard format is known. Many vendors support the print streams destined for common printers (such as Xerox Metacode, IBM AFP, HP PCL-5). These widely used print streams, defined by the hardware of hundreds of thousands of printers, are a de facto standard. Some vendors, particularly those optimized for "Report Distribution," store the information from the print data stream in a database, to optimize the performance of the applications that mine the data and construct the reports. These proprietary database structures are not a durable format. See the separate paper on Durable Format Documents at

