This is a guest post by Chris Adams, technical lead for the World Digital Library in the Repository Development Center at the Library of Congress.
Like many people who work with digital imagery, I’ve been looking forward to the JPEG-2000 image format for a long time due to solid technical advantages: superior compression performance for both lossless masters and lossy access images, progressive decoding, multiple resolutions, and tiling. Having a single format which is flexible enough to satisfy both preservation and access requirements is appealing, particularly at a time when many organizations are being forced to reconcile rising storage costs with shrinking budgets.
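The multiple-resolution property comes from JPEG 2000’s dyadic wavelet decomposition: a file with N decomposition levels embeds N+1 usable resolutions, each roughly half the size of the previous one. A minimal sketch of the arithmetic, where the image dimensions and level count are hypothetical examples:

```python
def available_resolutions(width, height, decomposition_levels):
    """JPEG 2000's dyadic wavelet decomposition yields the full-size
    image plus one half-scale version per decomposition level."""
    sizes = []
    for level in range(decomposition_levels + 1):
        # Each level halves the dimensions, rounding up (the spec
        # uses ceiling division of the image grid).
        scale = 2 ** level
        sizes.append((-(-width // scale), -(-height // scale)))
    return sizes

# A hypothetical 10,000 x 8,000 pixel scan with 5 decomposition levels:
for w, h in available_resolutions(10000, 8000, 5):
    print(f"{w} x {h}")
```

A tile viewer can pick whichever embedded resolution is closest to the display size instead of decoding the full master.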
So, given clear technical advantages, why do many of my fellow software developers seem distinctly uneasy about using JPEG-2000? Johan van der Knijff has summarized a range of concerns but I want to focus on the last point from his guest post at the Wellcome Library’s JPEG-2000 blog:
According to [David Rosenthal], the availability of working open-source rendering software is much more important, and he explains how formats with open source renderers are, for all practical purposes, immune from format obsolescence…. Perhaps the best way to ensure sustainability of JPEG 2000 and the JP2 format would be to invest in a truly open JP2 software library, and release this under a free software license.
The most common concern I’ve heard about JPEG-2000 is the lack of high-quality tools and particularly support within the open-source world. I believe this is a critical concern for preservation.
1. Our future ability to read a file is a function of how widely it is used. Some formats achieve this through openness: it is highly unlikely that we will lose the ability to display JPEG files because they are ubiquitous and there are many high-quality implementations which are regularly used to encode and decode files produced by other implementations. (Adoption is one of the Library’s factors for assessing the sustainability of digital collections).
Some proprietary formats have achieved similar levels of confidence through pervasive market share: while it’s always possible for a single vendor to discontinue a product this would have significant repercussions and create strong demand from many organizations around the world for tools to migrate their orphaned content.
I would argue that JPEG-2000 is currently in the unfortunate position of having limited use outside of a few niches and the majority of users depend on proprietary software but might not represent a sufficiently large market to support multiple high-quality implementations. How likely is it that JP2 constitutes even 1% of the new images created every day? The lack of browser support ensures that JP2 is almost non-existent on the web and thus is not a factor in most software selection decisions.
2. The format is quite complex. Complexity significantly increases the barrier to entry for new implementations, particularly given the challenge of not only implementing the entire standard but reaching competitive performance as well. (Transparency is another key sustainability factor).
As a rough estimate of the relative complexity consider the size of two open-source JPEG-2000 implementations versus the entire Python Imaging Library, which supports several dozen formats as well as a general-purpose image processing toolkit (Core lines of code, generated using David A. Wheeler’s SLOCCount):
PIL: 12,493 lines of C and 9,229 lines of Python
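For context, SLOCCount-style figures count physical source lines, skipping blanks and comment-only lines. A rough sketch of that counting rule (not SLOCCount’s actual implementation, which also detects languages and multi-line comment syntaxes):

```python
def sloc_count(source, comment_prefix="#"):
    """Rough physical-SLOC count in the spirit of SLOCCount:
    skip blank lines and lines that are only a comment."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            count += 1
    return count

example = """
# a comment
x = 1

y = 2  # a trailing comment still counts as code
"""
print(sloc_count(example))  # 2
```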
3. Limited resources for compliance-testing implementations. The complexity of the format and the restricted availability of the specification provide many opportunities for developers to produce malformed files or to fail to decode correct but obscure options.
In my primary role as the technical lead for the World Digital Library I’ve been asked to process files in most common formats created by the wide variety of software used by our many partners. JPEG-2000 has been by far the most common format requiring troubleshooting and, disturbingly, in most cases the files had already been processed by at least one other program before a problem attracted closer inspection. While there have been some failures caused by programs which do not correctly support the entire format, more frequently the failure has been caused by a program applying stricter validation checks and rejecting a file with minor errors which had not been reported by the other tools used earlier in our processing.
This is particularly disturbing when you consider the possibility that a file which was viewable today could become problematic later after a bug is fixed!
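To illustrate the kind of structural checking involved: a JP2 file must begin with a fixed 12-byte signature box, after which the file is a sequence of length-prefixed boxes (ISO/IEC 15444-1). A minimal, hedged sketch of walking that structure; the sample bytes below are constructed for illustration, not taken from a real file:

```python
import struct

JP2_SIGNATURE = bytes([0x00, 0x00, 0x00, 0x0C,
                       0x6A, 0x50, 0x20, 0x20,
                       0x0D, 0x0A, 0x87, 0x0A])  # the fixed 'jP  ' box

def top_level_boxes(data):
    """Walk the top-level JP2 boxes: each starts with a 4-byte
    big-endian length and a 4-byte type code."""
    if not data.startswith(JP2_SIGNATURE):
        raise ValueError("missing JP2 signature box")
    boxes, offset = [], 0
    while offset + 8 <= len(data):
        length, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        if length == 0:          # box extends to the end of the file
            length = len(data) - offset
        elif length == 1:        # XLBox: an 8-byte extended length follows
            length = struct.unpack(">Q", data[offset + 8:offset + 16])[0]
        if length < 8:
            raise ValueError("invalid box length")
        boxes.append(box_type.decode("ascii"))
        offset += length
    return boxes

# A constructed example: the signature box followed by an
# (empty, purely illustrative) 'ftyp' box header.
sample = JP2_SIGNATURE + struct.pack(">I4s", 8, b"ftyp")
print(top_level_boxes(sample))  # ['jP  ', 'ftyp']
```

A strict reader rejects the file at the first malformed box; a lenient one may silently skip it, which is exactly how two tools can disagree about the same file.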
A Role for Open Source
Over the last decade, open-source software has become pervasive as users spend considerable time working directly in open-source applications or, far more commonly, using applications which rely on open-source libraries. This is currently a problem from the perspective of JPEG-2000 as the most widespread open-source implementation unfortunately has some compatibility issues and is considerably slower than the modern commercial implementations. The good news is that many, many users — including those of several popular, high-volume websites — are using open-source libraries; targeted improvements there would benefit many people around the world.
The need for improved open-source support has also been touched upon before on this blog in Steve Puglia’s summary of the 2011 JPEG 2000 summit.
Browser Support is Critical
Users are increasingly using browsers to perform tasks that used to be considered solely the domain of traditional desktop applications. Photos which used to be stored and viewed locally are now increasingly being uploaded to social networks and photo sharing sites, a trend which will continue as HTML5 makes increasingly advanced web applications possible. In practice, this means that any image format which cannot be viewed directly in the average web browser will become a support burden for site operators and it becomes correspondingly tempting to adopt a storage format such as PNG or JPEG since most images will eventually need to be transcoded into those formats for display.
This is not in itself a threat to traditional preservation but it’s an additional complication that needs to be dealt with as internal file management applications are increasingly web based and it makes development of access services more expensive, again providing an incentive to simplify access at the expense of storage costs.
Of the popular browsers, only one supports JPEG-2000. For wider adoption, requirements for browsers to support JPEG-2000 will need to be detailed (one example is here).
Potential Use of OpenJPEG
OpenJPEG is emerging as a possible solution for many of these problems, particularly as the recently released version 2.0 added support for streaming and tiled decoding, which deliver some of the greatest benefits relative to other formats. For applications such as Chronicling America which need to serve 60+ tile and thumbnail requests per second, this capability, previously unavailable to open-source developers, is far more of a limiting factor than the time required to decode the entire master image.
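The appeal of tiled decoding for a tile server is that only the tiles intersecting a request need to be decoded. A sketch of the index arithmetic involved; the tile size and viewport values are hypothetical:

```python
def tiles_for_region(x, y, width, height, tile_size=1024):
    """Return (column, row) indices of the fixed-grid tiles that a
    region touches, so only those tiles need to be decoded."""
    first_col, first_row = x // tile_size, y // tile_size
    last_col = (x + width - 1) // tile_size
    last_row = (y + height - 1) // tile_size
    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]

# A hypothetical 1500 x 900 viewport at offset (2000, 500):
print(tiles_for_region(2000, 500, 1500, 900))
```

For this example only six tiles need decoding, however large the master image is.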
In addition to the standard JPEG 2000 conformance suite the OpenJPEG project has been developing their own test suite. An obvious area where the preservation community could assist would be in contributing not only examples of test images which exercise features which are important to us but also known-bad images for which tools should issue compliance warnings.
The preservation community should review existing suites and contribute additional freely-available tests to help validate implementations, with a focus on features which are less commonly used in other industries as well as strong conformance testing and improved detection and reporting of non-compliant files to help guide implementations towards stronger interoperability.
1/28/2013: Added additional information to introduction.
Pleased to read this assessment. I wanted to add a few supporting comments on your 3 considerations.
1. Use. I doubt use of jpeg2000 is anywhere near as high as 1%, but for whatever use there is the experience of use is further subdivided among different communities. For instance many jpeg2000 tools only support RGB-based colorspaces or bit depths that are multiples of 8. Some support 10 bit sampling but not losslessly. Some have limited support for various chroma subsampling patterns. When the objective of using jpeg2000 is to encode visual data losslessly there can often be a lot of work to identify jpeg2000 tools that support the pixel format, colorspace, bit depth, and chroma subsampling of the source visual data (transcoding any incoming visual data to the same pixel format in jpeg2000 to normalize the result would compromise the objective of a lossless representation of the original data). Additionally many encoders are not multi-threaded and thus too slow for practical use with video. So the tools relevant to a photo archivist may not at all be appropriate for the goals of a video archivist and vice versa.
2. Complexity. This point is very true. I believe building a jpeg2000 encoder/decoder has been the only FFmpeg-managed Google Summer of Code project (http://wiki.multimedia.cx/index.php?title=FFmpeg_Summer_Of_Code) that has not been completed within the assigned time. Actually the jpeg2000 SOC project has probably been assigned in three different years and is still unfinished and in experimental status. [note: FFmpeg does support compilation with libopenjpeg as well as their native j2k codec]
3. Compliance. Generally I’d advocate that archives should use a file format or codec in the simplest configuration possible in order to achieve preservation objectives. I find this also true with the QuickTime file format as well where the entire specification covers extensive complexity in order to facilitate an experience to the user but can produce an overly complex container architecture if not carefully managed. Also with lossless encodings like jpeg2000 there are testing formats such as framemd5 that can be used to verify that both the lossless compressed data and the source data decode to identical images to verify the intended losslessness.
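To illustrate the framemd5 idea Dave mentions: ffmpeg’s framemd5 muxer emits a checksum per decoded frame, and matching checksum lists demonstrate a genuinely lossless round trip. The principle can be sketched with standard-library hashing over raw frame buffers; the "frames" here are synthetic byte strings standing in for real video:

```python
import hashlib

def frame_digests(frames):
    """Per-frame MD5 of raw pixel buffers, in the spirit of ffmpeg's
    framemd5 muxer: digests that match frame-for-frame mean the
    compression round trip was bit-exact."""
    return [hashlib.md5(f).hexdigest() for f in frames]

# Synthetic 'frames' standing in for raw video frame buffers:
source_frames  = [b"\x00" * 64, b"\x01" * 64]
decoded_frames = [b"\x00" * 64, b"\x01" * 64]  # after a lossless round trip

print(frame_digests(source_frames) == frame_digests(decoded_frames))  # True
```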
I too have concerns about the format’s long-term viability. Even the commercial options for JPEG2000 are not that great. Luratech’s implementation was very good for a while, but they seem to have lost interest in maintaining it. Kakadu has confusing license terms and a significant price just to evaluate it. OpenJPEG, at least when I last evaluated it, is slow.
Thank you very much for this column. I work in a field that has been advocating JPEG2000. I don’t have an IT background, and your column gave me a lot to think about in a succinct and easy-to-understand way. I now have the language to ask the necessary questions of those promoting this format.
Dave Rice: one of our resident video experts brought up that line of questions and I feel it might warrant its own post, particularly as it sounds like it’s starting to become standardized in some fields.
I strongly agree with your comments about the range of JPEG-2000 feature support, particularly given the pattern of narrow-but-deep adoption in fields which might not have much software in common. To act on my last point, I think the first step would be to start surveying the files used within different specialties and collecting examples of valid JP2s using the less common features, perhaps seeing whether there’s a clear need for a new profile beyond e.g. the ones listed at http://www.digitalpreservation.gov/formats/fdd/fdd000138.shtml.
Ultimately, I would like to have an open source project which would basically contain test images and canonical reference TIFFs with some sort of automated comparison tool as it’s currently somewhat daunting for someone outside of the community to know how to test an implementation. I doubt many people are going to try writing a JP2 codec from scratch but it’s quite easy to envision someone wanting to integrate a library such as OpenJPEG into their own project or perhaps fix bugs or optimize an implementation and wanting an easy way to confirm that they haven’t introduced some subtle regression.
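The automated comparison tool described could be very simple at its core: decode a test image, load the canonical reference, and compare pixels. A hedged sketch of that comparison, assuming both images are already available as flat sequences of pixel values:

```python
def compare_to_reference(decoded, reference):
    """Compare a decoded image against a canonical reference, both
    given as flat sequences of pixel values. Returns the maximum
    per-pixel absolute difference: 0 means a bit-exact (lossless)
    match; anything else flags a possible regression."""
    if len(decoded) != len(reference):
        raise ValueError("image dimensions differ")
    return max((abs(d - r) for d, r in zip(decoded, reference)), default=0)

# Hypothetical 4-pixel grayscale images:
print(compare_to_reference([10, 20, 30, 40], [10, 20, 30, 40]))  # 0
print(compare_to_reference([10, 20, 30, 40], [10, 22, 30, 40]))  # 2
```

A harness like this, run against a shared corpus of test images and reference TIFFs, would let someone optimizing an implementation confirm they haven’t introduced a subtle regression.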
Thanks for this post Chris. Your first 2 points sum up the concerns I’ve had with this format for a while.
On popularity: The digital preservation community doesn’t have the resources to maintain its own processing and rendering tools so we should try to stay away from niche formats.
On complexity: PDF suffers from this same problem. The more complex and feature-filled a format becomes, the more difficult it is to write supporting creation/validation/rendering software. I think that this is a more difficult problem than the first because many times the things that make a format complex are the things that make it appealing for keeping storage costs and bandwidth use down and for creating rich user experiences.
Excellent summary and nice discussion. One thing does not really make sense, though: why does the amount of images play a role? Of course there is an endless amount of simple and small images out there where jpeg is fully sufficient, but there are also areas like DCINEMA or medical images, geo etc. where the images are very big or where quality is crucial. That is why there is no alternative in some areas (like DCINEMA). Does preservation then really mean to not preserve those items, or only take snapshots?
Also when it comes to size, we are used to copying images no matter what size is required for the targeted device, think of smartphones etc. So we move data that is not required and wait for ages; sometimes we do not even show these images because they are too big.
I have implemented solutions based on Kakadu for a decade now. I agree the licence terms are not easy and they basically prohibit publishing the code, but everybody who owns a licence can publish all sorts of applications. On the other hand there are limitations to open source: David Taubman has been working on this for a long time and has all the expertise, and that is impossible without licence fees.
For me the bottom line is: Keep it simple when the image is simple but also invest and preserve when quality is crucial or the importance is high.
Excellent post Chris. I agree that collecting example files would be a really useful community contribution. You and others are welcome to contribute to the Format Corpus we’ve been gradually building on GitHub, which already contains some JPEG2000 files. It might be a useful staging post for collating contributions before moving them into some of the test suites you’ve mentioned.
Lars: I completely agree about the prospective savings with large numbers of high-resolution images. One way to approach this would be for us as a community to prioritize building better JPEG-2000 support now, particularly in common image-processing tools where weak/slow support actively discourages use, and to treat it as a long-term investment which will allow us to save significant amounts of storage and transfer cost for many years to come.
Andrea Goethals: I strongly agree with your general conclusion about processing and rendering tools, although I would argue that JPEG-2000 could be an exception because we’re not starting from scratch. Helping with some rough edges on OpenJPEG or integrating it into a popular tool like ImageMagick or GraphicsMagick is significantly less work than producing a new implementation from scratch or convincing millions of users to adopt a new image processing application.
Paul Wheatley: the format corpus sounds like a great starting point. I’ve been talking internally about surveying the various features which we’re using, which would be a good starting point for seeing where contributions would be useful.
If what’s needed is a lossless format for large images, what about BigTIFF? It doesn’t seem to be widely used, but it’s supported by TiffLib, so lots of people already have the code to create it, and it’s really just TIFF with some parameters changed to allow 64-bit offsets, so it’s a conservative approach technologically.
Thank you, Chris, for a fascinating post. It reinforces the point I’ve made since 2007 that the key attribute for a format’s survivability is that it have strong open source support. Formats that get wide adoption will have strong open source support and those that don’t, won’t. Its a Darwinian world out there.
This suggests that the idea of “preservation formats” as opposed to “access formats” is a trap. Precisely because they aren’t access formats, preservation formats are less likely to have the strong open source support that enables successful preservation.
But your most interesting observation is “more frequently the failure has been caused by a program applying stricter validation checks and rejecting a file with minor errors which had not been reported by the other tools used earlier in our processing.” This is a failure to observe Postel’s Law, which I blogged about in the context of the use of format validation tools in preservation pipelines.
Postel’s Law is fundamental to the Internet; it says you should be strict in what you emit and liberal in what you consume. What we care about is whether the preserved file can be rendered legibly, not whether it conforms to one tool developer’s interpretation of the standard versus another’s.
Following David’s comment above I just re-read Chris’ statement on validation:
Reading this makes me wonder if validation is really the problem here. This would be only true if we had the following situation:
1. Encoder A is used to create a JP2.
2. Decoder B fails to decode it because before doing so it applies some checks (‘validation’) which are too strict.
The assumption here is that decoder B would actually be able to decode the file if the checks were left out altogether!
Based on my own experience, in the majority of cases the real issue is that the JP2 created by encoder A simply contains features that are not (fully) supported by decoder B, leading to interoperability problems. If you look at JPEG 2000’s Codestream Syntax specification it’s also easy to see how such things may be happening, as it contains quite a few features that are optional and which may not be fully supported by all decoders. So often it’s not even about slightly different interpretations of the standard, but rather about incomplete interpretations. And since for JP2 we don’t have many decoding options to begin with this is somewhat worrying.
But maybe you have some specific examples to the contrary?
Also a final note/warning on validation: even my jpylyzer validator is still somewhat limited in this regard, as for the image codestreams I’ve only managed to include support for the required (non-optional) marker segments for the main codestream header (5 out of a total of 13) so far. In addition jpylyzer only provides information on 5 out of the 11 marker segments that can occur at the tile-part level.
In practical terms this means that 2 JP2s that were created by 2 different encoders may yield similar jpylyzer output (in terms of validation results and reported marker segments), even if one of them contains some of the less frequently-used marker segments.
At this stage these limitations are mainly due to limited time and resources, and limited availability of sample files for the less-common marker segments, but eventually I would like to add these features.
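For readers unfamiliar with marker segments: the main codestream header is a sequence of two-byte markers, most of which carry a two-byte segment length (ISO/IEC 15444-1 Annex A). A minimal sketch of scanning them; the sample bytes are constructed and deliberately simplified (a real SIZ segment is much longer):

```python
import struct

# A few marker codes from ISO/IEC 15444-1 Annex A:
MARKER_NAMES = {0xFF4F: "SOC", 0xFF51: "SIZ", 0xFF52: "COD",
                0xFF5C: "QCD", 0xFF64: "COM", 0xFF90: "SOT"}

def main_header_markers(codestream):
    """List the marker segments in a JPEG 2000 main codestream
    header, stopping at the first tile-part (SOT)."""
    if codestream[:2] != b"\xff\x4f":
        raise ValueError("missing SOC marker")
    markers, offset = ["SOC"], 2
    while offset + 2 <= len(codestream):
        marker = struct.unpack(">H", codestream[offset:offset + 2])[0]
        name = MARKER_NAMES.get(marker, hex(marker))
        if name == "SOT":
            break
        markers.append(name)
        # Main-header markers after SOC carry a 2-byte length that
        # includes the length field itself.
        seg_len = struct.unpack(">H", codestream[offset + 2:offset + 4])[0]
        offset += 2 + seg_len
    return markers

# A constructed fragment: SOC, an empty-payload SIZ (length 2), then SOT.
sample = b"\xff\x4f" + b"\xff\x51\x00\x02" + b"\xff\x90"
print(main_header_markers(sample))  # ['SOC', 'SIZ']
```

Even a shallow scan like this shows how two encoders can produce files that look similar at this level while differing in the less common segments.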
Johan’s observations are interesting but I stand by my comment that if a program, whether a decoder or a validator, is “rejecting a file with minor errors” then it is not conforming to Postel’s Law. This may be because it is nit-picking, or because it is incomplete, but in the light of Postel’s Law either way it is wrong.
If this wrong-ness causes the “file with minor errors” to be rejected for preservation that is a serious problem. Given the limited resources available for preservation, an only slightly less serious problem is that it is wasting the valuable time of people like Chris.
It was good to see this post and all the discussion it has led to, here and elsewhere. This blog offers useful reminders of points that are all too familiar to specialists in the field.
Is JPEG 2000 a preservation risk? There is too much of it out there and in high-value collections to believe that it is. JPEG 2000 may be more widely used than people realize. For example, FamilySearch generates over 300 million JPEG 2000 images a year, while historic newspaper digitization programs in the UK and the Netherlands so far have created more than 13 million archival JPEG 2000 images. And at the JPEG 2000 Summit two years ago, the Library of Congress reported that its moving image archive was growing by 80 to 100 TB a month, and that 60 percent of that data was MXF files with JPEG 2000. Although not a preservation format in a strict sense, JPEG 2000 is also the compression format used in the digital cinema standard, now in wide use.
Investments of this magnitude are not likely to disappear any time soon, and the economic case for using a compressed format such as JPEG 2000 is still compelling. There are also the economics around infrastructure, including open source. Not every institution approaches the economics the same way and if open source software for example is not up to desired levels of performance, then it’s because the incentives aren’t in place yet.
While the questions raised in the post do not invalidate the use of JPEG 2000, they do point to the continuing opportunity for further developments. All of us share an interest in increasing the availability and capability of JPEG 2000 applications, open source and commercial. All of us see the importance of being sure that practical profiles are documented; profiles that, among other things, increase the ease with which access to content can be provided, directly or indirectly, as the Library of Congress does today with its images of maps. This is where the incentives come in, as well as the identification of actors who might carry out needed work.
Several of us are planning to meet for an informal lunchtime conversation at the IS&T Archiving Conference in Washington on Tuesday, April 2. We hope this will provide an opportunity to explore these issues, as a follow-up to this blog post and the discussion planned for the JPEG 2000 short course at the conference. If you are interested in joining this get together, send an email to [email protected].
Rob: Thanks for your comment, and I agree completely that an ongoing discussion is important to explore all the issues involved.
Rob: thank you for your response. I feel I should reiterate that my goal is to improve the situation for JPEG-2000 because I agree that the benefits are particularly appealing at a time when budgetary concerns are forcing many institutions to make difficult preservation decisions. I strongly agree about economic incentives: one reasonable response might be seeing whether the major JPEG-2000 users would temporarily earmark a small percentage of the funding which currently goes towards producing JP2 files into ensuring that future users will have a first-class experience accessing them and that the files fit well with the image processing tools many groups use, rather than being relegated to special-case status.
Economics is also why I’m concerned: FamilySearch’s 300M files per year is quite impressive but still less than the daily number of uploaded JPEGs which Facebook mentioned in their IPO over a year ago – while I’m sure we would agree about the relative preservation merits of many of those files, market economics may push the major developers in directions other than what we’d prefer (c.f. Photoshop Elements). This is why I feel that browser support is so important – and potentially also an area where JPEG 2000 has a great deal to offer the web as many people are searching for a responsive image solution which doesn’t require maintaining many variations of the same image tailored for various screen sizes and resolutions.
I look forward to the discussion on Tuesday.
I too agree that many interesting and relevant points are raised in this blog. We have been using JP2 for some time and we have found that a small proportion of the encoded files were defective, but this was only discovered later. Having learned from the experience we are developing a much more robust process that involves encoding, inspection of the file structure in the JP2 that was produced, decoding back to the original format, and then assessing the similarity of the image payload of the round-trip version with the original. The quality criteria for similarity can be adjusted according to the desired level of compression that has been used. (We do not use lossless for all master files, but that is a different story.) Staff from both our repository and digital preservation teams, the latter as part of the SCAPE project, are involved in this work.
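An adjustable similarity criterion of the kind described here is often expressed as PSNR, where an infinite value indicates a bit-exact lossless round trip and a threshold can be tuned to the compression level used. A sketch over flat pixel sequences; the pixel values are hypothetical:

```python
import math

def psnr(original, round_trip, max_value=255):
    """Peak signal-to-noise ratio between the original pixels and the
    decoded round-trip version; higher means closer, and infinity
    means a bit-exact (lossless) match."""
    if len(original) != len(round_trip):
        raise ValueError("image dimensions differ")
    mse = sum((a - b) ** 2 for a, b in zip(original, round_trip)) / len(original)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)

# Hypothetical 8-bit grayscale pixels:
print(psnr([10, 20, 30], [10, 20, 30]))  # inf (lossless round trip)
print(psnr([10, 20, 30], [11, 20, 30]))  # finite but high (near-identical)
```

A quality criterion then becomes a simple threshold check, stricter for lightly compressed masters and looser for access derivatives.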
While the primary task is encoding we have realised that there are merits in incorporating a decode activity. We will have shown that each archived file has been successfully decoded, and the image payload meets the required quality criteria. However, we also have the source and the decode tool is built from that source. This now places an emphasis on the preservation and long term sustainability of the process for decoding, as derived from the version of the source that was used. I suggest this reasoning applies generally as long as one has the source, and not just if it is open source. The approach also provides some mitigation against the “fussiness” about future versions of tools as mentioned in the blog. I also note that Dave Rice made a related remark about a similar merit of a round trip process.
Among the institutions I am aware of that have large scale digitisation operations, many, if not most, use kakadu. It is our current preferred tool for production operations – the choice is under review – see below. We pay a modest annual licence, we have access to the source, and that is currently used for decoding. We cannot release the source, but we envisage preserving the process built using that source, subject to the licence conditions. Like many open source tools it can be built in several common environments including gnu and visual studio. We would of course all be supportive of and encourage more widespread open source software as advocated in the blog, and this would avoid any licence complications. Our digital preservation staff are comparing the effectiveness of several tools for decoding, including open source tools. We are open minded and will confirm our choice for the best all round tool for round trip decoding taking into account licencing. This may be kakadu or it may be an open source tool, but the same reasoning about the desirability for preserving the ability to decode as part of the round trip process would still apply.
The blog advocates that browser support is critical. However, consider a web page that shows thumbnails of all 60 pages in a digitised newspaper, and where each master file is a 20Mbyte JP2. Clearly a poor user experience would result if all 1.2Gbytes had first to be downloaded. A better way is to deliver only the image information that the browser actually needs, such as will be required generally with a typical zoom and pan viewer. One can transcode at the server, but it now does not matter that much which format is used for final delivery, and conventional JPEG works quite well. This approach is aligned with the IIIF initiative whereby an IIIF viewer requests the image tiles that it needs to display (link http://lib.stanford.edu/iiif ) and where these can be generated dynamically on request from a master file. A longer term approach would be for browser support for the JPIP protocol (JP interactive protocol) which is specifically designed to send only the image information that the viewer requests. That would be an attractive capability if it were widely supported.
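For reference, the IIIF Image API encodes the region, size, rotation, quality and format of each tile request directly in the URL, which is what lets a viewer fetch only the image information it needs. A sketch of building such a request; the server and identifier here are hypothetical:

```python
def iiif_image_url(server, identifier, region="full", size="full",
                   rotation=0, quality="default", fmt="jpg"):
    """Build a IIIF Image API request of the form
    {server}/{identifier}/{region}/{size}/{rotation}/{quality}.{format},
    so a viewer fetches only the tile it needs."""
    return f"{server}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"

# A hypothetical 1024x1024 tile at offset (2048, 0), delivered at 512px wide:
print(iiif_image_url("https://example.org/iiif", "newspaper-p1",
                     region="2048,0,1024,1024", size="512,"))
```

The server can satisfy such a request by decoding just the matching JP2 tiles and transcoding them to JPEG for delivery, so the master format never needs to reach the browser.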
Risk and its management are of course important topics, but it is appropriate also to consider benefits and cost. A single JP2 archival file can also be used for access to support a zoom and pan viewer. The total cost of ownership is reduced in several ways by: the compression efficiency, the flexibility to apply different levels of compression to different types of content, and also because the architecture is greatly simplified by no longer needing a separate set of access files, or worse multiple sets of access files at different resolutions. Obviating the need for access files completely eliminates the need for a separately managed access system.
As an organisation we are experiencing ever increasing pressure to reduce costs, certainly in our current economic climate, and we are constrained by what we can afford to do. We need to find a balance between risk, cost and the volume of what we can afford to digitise and retain. This is not easy; however, we see that the key issues we face are thus to ensure that the encoding process is robust and also to preserve the decoding process we will have applied to each image file. There are no absolutes, but these steps will at least partially mitigate many of the points raised in this blog.
I have been following this format for a number of years now. During that time I have gone from being a commercial developer to an open-source developer. Time and again, open source has run circles around commercial solutions both from a security as well as an efficiency perspective. Especially reading here about the problems between various implementations, interoperability and especially the long-term preservation risks around using this format, I am almost certain that this unique and promising format will only achieve niche success unless its implementations are open-sourced and a test suite to validate new or existing implementations developed. And, as you realized, not using an open-source implementation for a long-term-preservation project is just asking for trouble.
It saddens me that whoever is in charge of the various commercial implementations can’t “see the light” and open-source their code while perhaps selling consulting, custom implementations or other services around it, which is a business model that has been successful for many others.
I have come to this discussion 1 year on from the last post. I have the same concerns as mentioned above. Can anyone tell me if the status of JPEG 2000 has moved any closer to being a good, reliable file format with wide usage, that will be long lasting? And therefore, be a good alternative to a TIFF for archiving.
While I agree that jpeg2000 is probably a dead-end, I’d like to point out a small error in the figures in the “complexity” section.
You say PIL supports several dozen formats, but those formats are not actually decoded by PIL itself, it uses libjpeg, libtiff, libpng and others to do the decode. A better comparison would be to the source for libtiff or libjpeg themselves.
I tried SLOCcount on recent versions of those libraries and found these numbers:
libtiff-4.0.2 core library (ie. excluding tools and documentation): 30,615 lines of C
jpeg-8d entire library (it’s not easy to exclude the command-line tools): 29,272 lines of C
So less than OpenJPEG, but not by a crazy amount.
Here we are, July 2015, more than two years after this article was published. From our company’s standpoint (an archival videotape recovery operation), Jpeg 2000 may be near-dead. We’ve had no concrete requests for this codec in over a year. The requests have been for the most part, Apple ProRes 422 and to a lesser extent, 10-bit uncompressed. Certainly the ProRes statement will upset some purists but the consensus is that ProRes, whether it be camera acquisition i.e. ProRes 4444 or the other end (ProRes 422 for SD archival transfers) is becoming the codec of choice. It may not be new but it works for a lot of people.
So where does this leave Jpeg 2000?
I’ve just come across this blog and its valuable comments, as I’m currently trying to make the final decision on an archival video codec for a library of analog videotapes whose capture I am just embarking on.
It’s funny (sad-funny), as I got to the very last post by David, I find myself in this exact quandary: My source material is all 420 lines resolution at best, but I am striving to preserve as much fidelity as possible. So, do I go with JPEG2000 (which for my volunteer staff and single MacPro capture station, means capturing Uncompressed 8-bit YCbCr and then transcoding to JPEG2000 MXF via Media Encoder, creating a file which doesn’t actually play smoothly on any of our existing software, and therefore – how can I be certain I’m actually creating a viable archive?) — OR — Do I go with ProRes 422 HQ, which works so nicely with our consumer-grade software and hardware, and hope I don’t start to notice fidelity degradation in the years to come when I am inevitably forced to transcode again and again to keep up with the latest supported codecs?
All my research keeps leading me back to JPEG2000 as the only sure way to preserve long-term archival digital files, and yet, it also seems that only the large-scale archival institutions have both means and access to the caliber of hardware and software required for digitizing, encoding, decoding, and validating these files.
I am a one-person operation at a non-profit, trying to treat a collection of audio-visual history with as much integrity as possible, to ensure these videos are available for generations to come. Am I missing the simpler solution somehow?
J2K output files vary with version and vendor.
Different vendors produce different output, so Adobe Media Encoder, though it follows the standard, may have a different implementation of the same codec while transcoding, due to the lack of detailed documentation and codec complexity. I recommend you go for some proprietary solutions, x.264, QuickTime, ProRes 4444, DNxHD etc., and keep changing house formats whenever you find the current codec is disappearing. I believe that no codec lasts for decades and decades.
Thank you for an interesting article! It’s 2018, and the quandary remains.
I’m on Linux, and VERY few open source programs can properly open jp2 files. For the commercial programs outlined on a jp2 wiki article, the few that didn’t state Windows outright gave 404s.
Here’s one lot of unreadable jp2 images:
(The site has a lot of similar books from the Smithsonian, all in the same format.)
JasPer threw errors instead of converting the images, and ImageMagick did the same (even after I installed the OpenJPEG library). GIMP would read them, but converting files one by one isn’t really feasible. None of the usual helpful sites (stackexchange etc.) were in the least helpful; some even stated that jp2 is abandoned, for all practical purposes.
As for the Smithsonian books, I managed to extract the images from the equivalent pdf files as jpegs.
Instead of JP2, go for TIF or PNG, they’re widely supported.
There are currently two open source codecs that can decode the images you listed in your post:
OpenJPEG or Grok.
I tested with Grok and all images were decoded without error or warning; decode speed was around half a second per frame on my laptop running Ubuntu Linux.