The following is a guest post by Carl Fleischhauer, a Digital Initiatives Project Manager in NDIIPP.
This is the second post (of two) on the recently posted comparison of selected digital file formats compiled by the Still Images Working Group within the Federal Agencies Digitization Guidelines Initiative. In this post, I’ll offer some thoughts about JPEG 2000, since one motivation for the comparison project was to size up JPEG 2000 against tried-and-true TIFF. There is also a bit here about PNG. Meanwhile, the first post introduced the general topic and offered some notes about TIFF.
During the last four or five years, various specialists in our national and international circle have taken note of JPEG 2000. The Library has made extensive use of JPEG 2000 in its online access applications for maps and scanned newspaper pages. These are both large-raster content forms that benefit from JPEG 2000’s capability to tile images and to handle scaling (in this context, support for zooming). For the maps and newspapers, the archival master files are uncompressed TIFFs. Sets of derivative JPEG 2000 files provide an underlying raster dataset that a server-based application zooms and tiles to meet the end user’s request and then delivers to the browser as cropped-to-order “old” JPEG files.
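To make the zoom-and-tile idea concrete, here is a minimal sketch of the arithmetic such a server might do. It is not the Library's commercial application. The function names, the 12000 x 9000 pixel scan, and the 1024-pixel tile size are all assumptions made for illustration. The only thing taken from JPEG 2000 itself is that each discarded resolution level halves the width and height, rounding up.

```python
def level_size(width, height, discarded_levels):
    """Image size after discarding resolution levels; each level halves both sides."""
    scale = 2 ** discarded_levels
    # Ceiling division, done with integers to avoid floating-point surprises.
    return -(-width // scale), -(-height // scale)


def tiles_for_region(x, y, w, h, tile_size):
    """Return (column, row) indexes of the tiles that a requested region touches."""
    first_col, first_row = x // tile_size, y // tile_size
    last_col = (x + w - 1) // tile_size
    last_row = (y + h - 1) // tile_size
    return [(col, row)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]


# Hypothetical map scan: 12000 x 9000 pixels, viewed three levels down.
overview = level_size(12000, 9000, 3)

# Hypothetical user request: a 400 x 300 window at (900, 900), 1024-pixel tiles.
needed = tiles_for_region(900, 900, 400, 300, 1024)

print(overview)
print(needed)
```

The point of the sketch is that a zoomed-out view needs only the coarse resolution levels, and a zoomed-in view needs only the handful of tiles under the window, so the server never has to decode the whole raster.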
The Library has also used JPEG 2000 encoding wrapped in MXF format (a SMPTE standard) as the archival master target format when reformatting videotapes. The video content is for the most part protected by copyright and access is limited to the Library’s premises, where end-user delivery is provided by MPEG files derived from the MXF masters. Regarding the JPEG 2000 component, this application uses single-tile imagery and (thus far) has not taken advantage of scalability features.
There is a lot to recommend JPEG 2000. Both the wrapper and the encodings are proper capital-S standards from the International Organization for Standardization and the International Electrotechnical Commission. The family of JPEG 2000 standards includes three encodings, with the main core encoding understood to be free of patent issues. One key JPEG 2000 compression process employs wavelet transforms to provide a very clean image, even in a lossy mode. (JPEG 2000 can also be employed in a lossless mode.) The encoding includes a number of “resiliency” features that add a bit of error-protection absent in most other encodings. The JPEG 2000 wrapper provides a bit more help with color documentation than TIFF, and it has a “box” that can carry XML-encoded metadata.
The encoding can be structured to accommodate user-defined tiling and scalability. At the Library we depend upon a commercial server application to do the work but other organizations take advantage of the JPEG 2000 Interactive Protocol (JPIP, a separate ISO/IEC standard). The Wikipedia article about JPIP reports use in medical imaging applications (“zoom in on the xray”).
In 2011, the Library and FADGI organized a JPEG 2000 Summit; the papers are available online. The speakers included several Europeans who were enthusiastic adopters of JPEG 2000 as a master format.
So what’s not to like? Since the 2011 summit, we have participated in a number of discussions of the topic, including the aforementioned exchange in the Digital Curation group, a January 2013 blog by my colleague Chris Adams with the provocative title Is JPEG-2000 a Preservation Risk?, and a discussion at the March 2014 meeting of the FADGI Still Image Working Group. Our online format comparison was informed by what we learned in these exchanges. Here are a few selected statements about JPEG 2000 from our comparison matrix:
- Sustainability Factors: Adoption: Moderate-to-Wide Adoption (moderate adoption in cultural heritage community, but widely adopted in communities such as moving images. Negligible support in browsers and still cameras)
- Sustainability Factors: Transparency: Acceptable. Compression is compensated for by resiliency elements, intended to mitigate low levels of transparency. However, the format offers many options (tiling, quality layers, progression order, more), and some users have found that “legal” variations may not interoperate from one application to another.
- Cost Factors: Implementation Cost: Medium-High (for reference, other formats including TIFF come in as low)
- Cost Factors: Cost of Software Tools: Medium-High (best toolsets available currently are proprietary tools. Open source tools are not yet mature.) (for reference, other formats including TIFF come in as low)
- Cost Factors: Storage Cost: Low (for reference, some other formats including TIFF come in as high)
- System Implementation Factors: Level of difficulty/complexity: Medium-high (for reference, other formats including TIFF come in as low)
- System Implementation Factors: Availability of tools: Limited to Moderate Availability (not all tools support all features)
- Settings and Capabilities: Support for Color Maintenance: Good (good but not perfect documentation of color space. Standards group working on these) (for reference, the TIFF statement for this factor is Good, caveat: to insert an ICC profile or declare certain color spaces, you must use an extended tag set.)
Is it time to switch from TIFF to JPEG 2000? As a trial lawyer might say, it’s not an open-and-shut case. At the March FADGI meeting, the trend of discussion was toward lining up some pilot projects, perhaps in both the Library of Congress and the Government Printing Office, where extensive image holdings add weight to the storage management factor. It is also possible that some of our scanning projects–think old catalog cards–will provide good candidates for lossy compression, at which JPEG 2000 excels. (JPEG 2000 can also be used in a lossless mode.)
Meanwhile, at the FADGI meeting, we also heard from other agencies that have received significant numbers of JPEG 2000 image files from digitization partners. These need to be managed for the long term and no one suggested that the agencies transform them from their svelte JPEG 2000 selves to chubby TIFFs. Perhaps we can work up some pilot projects before long and, respecting Chris Adams’s push, budget in some actions that will improve the available tools.
I’ll close by looping back to our varying levels of confidence about some of the formats, especially PNG, aka Portable Network Graphics. As I wrote in the Digital Curation thread, several years of inattentiveness on my part led me to relegate PNG to the use case of access-via-browser. This was due to the fact that the format was initially created as a reaction to the threats of licensing fees for GIF (another good-for-browsers format) in the 1990s.
Last month, however, I re-read the W3C specification for PNG and found lots of nifty elements, on paper at least. For example, the standard includes features that support color management. These include a group of metadata tags under the heading Colour Space Information that could document an image’s primary chromaticities and white point, image gamma, and carry an embedded ICC profile. In addition, PNG offers lossless compression with excellent results. Do any libraries or archives use PNG as a mastering file? Have people found that tools support some of the features that caught my eye, like the ones that support color management? Inquiring minds seek to know!
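For anyone who wants to check whether a given PNG actually carries these color elements, here is a minimal sketch that walks the chunk structure using only the Python standard library. In the PNG specification the relevant chunks are named gAMA (image gamma), cHRM (primary chromaticities and white point), iCCP (embedded ICC profile) and sRGB. The helper names and the one-pixel sample image are assumptions made for this illustration, and the sketch does not verify CRCs or decompress the ICC profile.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type, data=b""):
    """Build one PNG chunk: length, type, data, then CRC-32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def read_chunks(png_bytes):
    """Return a list of (type, data) pairs for every chunk in a PNG byte string."""
    if png_bytes[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG stream")
    chunks, pos = [], 8
    while pos < len(png_bytes):
        length, chunk_type = struct.unpack(">I4s", png_bytes[pos:pos + 8])
        data = png_bytes[pos + 8:pos + 8 + length]
        chunks.append((chunk_type.decode("ascii"), data))
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return chunks


def colour_info(png_bytes):
    """Collect the colour space information found in a PNG, if any."""
    info = {}
    for chunk_type, data in read_chunks(png_bytes):
        if chunk_type == "gAMA":
            # Stored as gamma times 100000.
            info["gamma"] = struct.unpack(">I", data)[0] / 100000
        elif chunk_type == "cHRM":
            # Eight values, each times 100000: white point, then red, green, blue.
            v = [n / 100000 for n in struct.unpack(">8I", data)]
            info["white_point"] = (v[0], v[1])
            info["primaries"] = {"red": (v[2], v[3]),
                                 "green": (v[4], v[5]),
                                 "blue": (v[6], v[7])}
        elif chunk_type == "iCCP":
            # Profile name, a null byte, a compression method byte, then the profile.
            name, _, _ = data.partition(b"\x00")
            info["icc_profile_name"] = name.decode("latin-1")
        elif chunk_type == "sRGB":
            info["srgb_rendering_intent"] = data[0]
    return info


def build_sample_png():
    """Assemble a hypothetical 1 x 1 grayscale PNG with gAMA and cHRM chunks."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    gama = struct.pack(">I", 45455)
    chrm = struct.pack(">8I", 31270, 32900, 64000, 33000,
                       30000, 60000, 15000, 6000)
    idat = zlib.compress(b"\x00\x80")  # filter byte 0, one mid-gray pixel
    return (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr) + make_chunk(b"gAMA", gama)
            + make_chunk(b"cHRM", chrm) + make_chunk(b"IDAT", idat)
            + make_chunk(b"IEND"))


sample = build_sample_png()
print([chunk_type for chunk_type, _ in read_chunks(sample)])
print(colour_info(sample))
```

To try it on a real file, read the file in binary mode and pass the bytes to colour_info. An empty result means the file carries none of these chunks, which is one quick way to see whether a scanning tool is writing the color documentation the specification allows.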
Part One of this series appeared on Wednesday, May 14, 2014.
Great article (love part one as well). Can I ask why JPEG 2000 Lossless always seems to be mentioned as an extra? It’s not just here but elsewhere as well. JPEG 2000 always seems to be pushed as an improvement over standard JPEG but rarely as a lossless archival target. I’m not an image expert and so many of the workflows are alien to me, but it seems that unless JPEG 2000 lossless can live on its own as a format it will never displace TIFF.
Thanks for the note, Bill. The relevant international standard for what is called JPEG 2000 “core coding” is ISO/IEC 15444-1:2004. (See http://www.digitalpreservation.gov/formats/fdd/fdd000138.shtml.) The overall compression approach employs a couple of algorithms, the most interesting of which is the wavelet transform. It can be applied in what the standard calls “reversible” mode (less compression in terms of size, but no lost data) or “irreversible” mode (more compression, but some loss).
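To show what “reversible” means in practice, here is a minimal sketch of one level of an integer 5/3 lifting transform, the kind of filter the core coding standard uses for its reversible path (the irreversible path uses a floating-point 9/7 filter plus quantization). This is a simplified illustration, not an encoder. It works on a single even-length row of pixels, the function names and sample values are assumptions, and the coarse rounding at the end is only a stand-in for loss, not the real irreversible mode.

```python
def forward_53(x):
    """One level of integer 5/3 lifting on an even-length list of ints."""
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("this sketch needs an even-length row")
    half = n // 2
    # Predict step: each odd sample minus the average of its even neighbours.
    d = []
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]  # mirror at right edge
        d.append(x[2 * i + 1] - (left + right) // 2)
    # Update step: each even sample nudged by the neighbouring details.
    s = []
    for i in range(half):
        prev = d[i - 1] if i > 0 else d[0]  # mirror at left edge
        s.append(x[2 * i] + (prev + d[i] + 2) // 4)
    return s, d


def inverse_53(s, d):
    """Undo forward_53 exactly by running the same steps in reverse order."""
    half = len(s)
    n = 2 * half
    x = [0] * n
    for i in range(half):
        prev = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - (prev + d[i] + 2) // 4
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]
        x[2 * i + 1] = d[i] + (left + right) // 2
    return x


row = [12, 14, 200, 198, 50, 52, 51, 255]  # hypothetical 8-bit pixel values
smooth, detail = forward_53(row)

# Lossless path: every coefficient kept, so the row comes back bit for bit.
print(inverse_53(smooth, detail) == row)

# Stand-in for loss: round the detail coefficients to multiples of 8.
coarse_detail = [(v // 8) * 8 for v in detail]
print(inverse_53(smooth, coarse_detail))
```

Because every step uses whole-number arithmetic that the inverse can retrace exactly, nothing is lost unless the coefficients themselves are thrown away or rounded. That is why one transform family can serve both outcomes.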
“Old” JPEG also existed in lossless modes (two of ’em even) as well as the MUCH better known lossy mode. (Look at the subtypes hanging from this page: http://www.digitalpreservation.gov/formats/fdd/fdd000017.shtml). But for old JPEG, the actual compression mechanisms for lossy and lossless differ. The Joint Photographic Experts Group succeeded in bringing the two outcomes under one roof with JPEG 2000. This can make it easier to build a switchable, dual-purpose encoding tool.
For most of us, one big plus for wavelet is the increased “clarity” of images in the lossy mode. Old JPEG depends upon a discrete cosine transform applied to 8×8-pixel blocks and this accounts for the visual artifacts you see, especially when the quality knob gets turned down. (They are also there at higher quality, but not very visible.) You get much less “artifacting” with JPEG 2000’s wavelet transform. This is one reason the encoding (in a lossy mode) has been embraced by Hollywood for digital cinema: clean pictures on the big screen.
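For the curious, here is a minimal sketch of the 8×8 block mechanism behind old JPEG's artifacts. It transforms one block with a discrete cosine transform, rounds the coefficients, and transforms back. It is an illustration under stated assumptions, not a JPEG codec. A single uniform rounding step stands in for the per-frequency quantization tables real encoders use, there is no entropy coding, and the sample block (a sharp vertical edge) and function names are hypothetical.

```python
import math

N = 8  # old JPEG works on 8 x 8 pixel blocks


def _scale(k):
    """Normalisation factor that makes the transform orthonormal."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def _basis(pos, freq):
    return math.cos((2 * pos + 1) * freq * math.pi / (2 * N))


def dct_2d(block):
    """Forward 2-D DCT of an 8 x 8 block (rows are y, columns are x)."""
    return [[_scale(u) * _scale(v) * sum(block[y][x] * _basis(y, u) * _basis(x, v)
                                         for y in range(N) for x in range(N))
             for v in range(N)] for u in range(N)]


def idct_2d(coeffs):
    """Inverse 2-D DCT, returning floating-point pixel values."""
    return [[sum(_scale(u) * _scale(v) * coeffs[u][v] * _basis(y, u) * _basis(x, v)
                 for u in range(N) for v in range(N))
             for x in range(N)] for y in range(N)]


def max_error(block, step):
    """Largest pixel error after rounding every coefficient to a multiple of step."""
    quantized = [[round(c / step) * step for c in row] for row in dct_2d(block)]
    restored = idct_2d(quantized)
    return max(abs(restored[y][x] - block[y][x])
               for y in range(N) for x in range(N))


# Hypothetical block: dark on the left, bright on the right (a sharp edge).
edge_block = [[20] * 4 + [220] * 4 for _ in range(N)]

print(max_error(edge_block, 1))   # gentle rounding, the "high quality" end
print(max_error(edge_block, 80))  # harsh rounding, quality knob turned down
```

The errors from harsh rounding spread across the whole block but stop at its border, which is why the damage shows up as a visible grid of little squares around edges and text.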
The lossless functionality of JPEG 2000 (and, actually, of “old” JPEG) was something I was unaware of. But, technology capability aside for a moment, my hard-wired kneejerk response to a JPEG file in the wild is “lossy” — and often “very lossy.”
TIFFs are (generally) much better. On-board lossless LZW and ZIP can make TIFF file sizes fairly acceptable for quality rendered.
In the hands of trained techs who understand the tradeoffs when choosing discrete cosine vs wavelets, the options are great. In the hands of the public, not so much. I base that solidly on 25 years in the printing industry watching otherwise-competent clients butcher files at the last second through ill-advised or poorly executed file decisions.
Your videotape procedure seems ideal: the high-res raster master that’s cropped and sampled on the fly under the direction of the customer but under the command of an expert system capable of making proper decisions on file choices. And three years from now, when new discrete tangent wavelets are invented, the option can get stirred into the output option list.
Should we talk of .PDF as if it is a single format or ‘standard’? Is the committee able to wrest a degree of control – and thus assurance of continuity (and longevity beyond the life or whims of the manufacturer – see also Windows XP) from the source-code holders?
Does the concept of “life-time” also include at least the basis of a plan for assets to be “refreshed” or “assured” during the life of an encoding schema, and to be converted to the next ‘best thing’ whenever it might come-along?
I would very much appreciate your thoughts about PDF and PDF/A. Thank you for your articles. They are very helpful.
Thanks for the note. We’ve written extensively about PDF/A on the Signal. I point you to an NDSA report on PDF/A-3 at //blogs.loc.gov/digitalpreservation/2014/02/new-ndsa-report-the-benefits-and-risks-of-the-pdfa-3-file-format-for-archival-institutions/