Is JPEG-2000 a Preservation Risk?

This is a guest post by Chris Adams of the Repository Development Center, technical lead for the World Digital Library at the Library of Congress.

Like many people who work with digital imagery, I’ve been looking forward to the JPEG-2000 image format for a long time due to solid technical advantages: superior compression performance for both lossless masters and lossy access images, progressive decoding, multiple resolutions, and tiling. Having a single format which is flexible enough to satisfy both preservation and access requirements is appealing, particularly at a time when many organizations are being forced to reconcile rising storage costs with shrinking budgets.

So, given clear technical advantages, why do many of my fellow software developers seem distinctly uneasy about using JPEG-2000? Johan van der Knijff has summarized a range of concerns but I want to focus on the last point from his guest post at the Wellcome Library’s JPEG-2000 blog:

According to [David Rosenthal], the availability of working open-source rendering software is much more important, and he explains how formats with open source renderers are, for all practical purposes, immune from format obsolescence…. Perhaps the best way to ensure sustainability of JPEG 2000 and the JP2 format would be to invest in a truly open JP2 software library, and release this under a free software license.

The most common concern I’ve heard about JPEG-2000 is the lack of high-quality tools and particularly support within the open-source world. I believe this is a critical concern for preservation.

1. Our future ability to read a file is a function of how widely it is used. Some formats achieve this through openness: it is highly unlikely that we will lose the ability to display JPEG files because they are ubiquitous and there are many high-quality implementations which are regularly used to encode and decode files produced by other implementations. (Adoption is one of the Library’s factors for assessing the sustainability of digital collections).

Some proprietary formats have achieved similar levels of confidence through pervasive market share: while it’s always possible for a single vendor to discontinue a product this would have significant repercussions and create strong demand from many organizations around the world for tools to migrate their orphaned content.

I would argue that JPEG-2000 is currently in the unfortunate position of having limited use outside of a few niches: the majority of its users depend on proprietary software but might not represent a sufficiently large market to support multiple high-quality implementations. How likely is it that JP2 constitutes even 1% of the new images created every day? The lack of browser support ensures that JP2 is almost non-existent on the web and thus is not a factor in most software selection decisions.

2. The format is quite complex. Complexity significantly increases the barrier to entry for new implementations, particularly given the challenge of not only implementing the entire standard but reaching competitive performance as well. (Transparency is another key sustainability factor).

As a rough estimate of the relative complexity consider the size of two open-source JPEG-2000 implementations versus the entire Python Imaging Library, which supports several dozen formats as well as a general-purpose image processing toolkit (Core lines of code, generated using David A. Wheeler’s SLOCCount):

OpenJPEG: C 49,892
libjasper: C 26,458
PIL: C 12,493; Python 9,229
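
To put those counts side by side, here is a quick calculation in Python. It uses only the figures listed above; the comparison against PIL's combined C and Python total is my own choice of baseline, and line counts are of course only a rough proxy for complexity.

```python
# Core source lines of code, copied from the SLOCCount figures above.
sloc = {
    "OpenJPEG": {"C": 49_892},
    "libjasper": {"C": 26_458},
    "PIL": {"C": 12_493, "Python": 9_229},
}

# Total lines per project, across all languages.
totals = {name: sum(by_lang.values()) for name, by_lang in sloc.items()}
pil_total = totals["PIL"]

for name, total in totals.items():
    print(f"{name}: {total:,} lines, {total / pil_total:.1f}x the size of PIL")
```

By this measure OpenJPEG alone is roughly 2.3 times the size of the whole of PIL, and libjasper roughly 1.2 times.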

3. Limited resources for compliance testing implementations. The complexity of the format and the restricted specification provide many opportunities for developers to produce malformed files or fail to decode correct but obscure options.

In my primary role as the technical lead for the World Digital Library I’ve been asked to process files in most common formats created by the wide variety of software used by our many partners. JPEG-2000 has been by far the most common format requiring troubleshooting and, disturbingly, in most cases the files had already been processed by at least one other program before a problem attracted closer inspection. While there have been some failures caused by programs which do not correctly support the entire format, more frequently the failure has been caused by a program applying stricter validation checks and rejecting a file with minor errors which had not been reported by the other tools used earlier in our processing.

Problems in this JP2 file are only evident at certain zoom levels. Different implementations would produce blank areas, random noise or the shifted fragment seen above. No GUI tool tested produced any warning for the malformed tile structure.

This is particularly disturbing when you consider the possibility that a file which is viewable today could become problematic later after a bug is fixed!
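
As a rough illustration of the kind of early, explicit check that could surface structural problems before a file moves further down a pipeline, here is a minimal Python sketch which walks the top-level boxes of a JP2 file and reports a warning instead of failing silently. The function name and the warning messages are my own invention for illustration. It inspects only the outer container, so it would not detect a malformed tile structure like the one pictured above; that would require parsing the codestream itself.

```python
import struct

# The 12-byte JP2 signature box: length 12, type "jP  ", contents 0D 0A 87 0A.
JP2_SIGNATURE = bytes.fromhex("0000000C6A5020200D0A870A")


def list_top_level_boxes(data):
    """Walk the top-level boxes of a JP2 file (hypothetical helper).

    Returns (boxes, warnings), where boxes is a list of (type, offset, length).
    """
    boxes, warnings = [], []
    if not data.startswith(JP2_SIGNATURE):
        return boxes, ["file does not start with the JP2 signature box"]

    offset = 0
    while offset < len(data):
        if offset + 8 > len(data):
            warnings.append(f"truncated box header at offset {offset}")
            break
        length, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        header_size = 8
        if length == 1:
            # A length of 1 means a 64-bit extended length follows the type.
            if offset + 16 > len(data):
                warnings.append(f"truncated extended length at offset {offset}")
                break
            (length,) = struct.unpack(">Q", data[offset + 8:offset + 16])
            header_size = 16
        elif length == 0:
            # A length of 0 means the box runs to the end of the file.
            length = len(data) - offset
        name = box_type.decode("latin-1")
        if length < header_size or offset + length > len(data):
            warnings.append(f"box {name!r} at offset {offset} has bad length {length}")
            break
        boxes.append((name, offset, length))
        offset += length
    return boxes, warnings


# Usage: a tiny synthetic file holding a signature box and a file type box.
sample = JP2_SIGNATURE + struct.pack(">I4s4sI4s", 20, b"ftyp", b"jp2 ", 0, b"jp2 ")
print(list_top_level_boxes(sample))
print(list_top_level_boxes(sample[:-3]))  # a truncated copy yields a warning
```

The point of the sketch is the second return value: a tool which records what it could not parse gives later stages something to act on.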

A Role for Open Source

Over the last decade, open-source software has become pervasive as users spend considerable time working directly in open-source applications or, far more commonly, using applications which rely on open-source libraries. This is currently a problem from the perspective of JPEG-2000 as the most widespread open-source implementation unfortunately has some compatibility issues and is considerably slower than the modern commercial implementations. The good news is that many, many users — including those of several popular, high-volume websites — are using open-source libraries; targeted improvements there would benefit many people around the world.

The need for improved open-source support has also been touched upon before on this blog in Steve Puglia’s summary of the 2011 JPEG 2000 summit.

Browser Support is Critical

Users are increasingly using browsers to perform tasks that used to be considered solely the domain of traditional desktop applications. Photos which used to be stored and viewed locally are now increasingly being uploaded to social networks and photo sharing sites, a trend which will continue as HTML5 makes increasingly advanced web applications possible. In practice, this means that any image format which cannot be viewed directly in the average web browser will become a support burden for site operators and it becomes correspondingly tempting to adopt a storage format such as PNG or JPEG since most images will eventually need to be transcoded into those formats for display.

This is not in itself a threat to traditional preservation but it’s an additional complication that needs to be dealt with as internal file management applications are increasingly web based and it makes development of access services more expensive, again providing an incentive to simplify access at the expense of storage costs.

Of the popular browsers, only one supports JPEG-2000. For wider adoption, requirements for browsers to support JPEG-2000 will need to be detailed (one example is here).

Potential Use of OpenJPEG

OpenJPEG is emerging as a possible solution for many of these problems, particularly as the recently released version 2.0 added support for streaming and tiled decoding which deliver some of the greatest benefits relative to other formats. For applications such as Chronicling America which need to serve 60+ tile and thumbnail requests per second, this is far more of a limiting factor than the time required to decode the entire master image and was previously unavailable to open-source developers.
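
To show why tiled decoding matters so much for this kind of workload, here is a minimal Python sketch which works out which tiles overlap a requested region. It is a simplification under stated assumptions: the function name and the example dimensions are hypothetical, and it assumes the tile grid starts at the image origin, whereas the format also allows image and tile grid offsets which a real decoder must honour.

```python
def tiles_for_region(image_w, image_h, tile_w, tile_h, x, y, w, h):
    """Return (column, row) indices of the tiles overlapping a pixel region.

    Assumes the tile grid is anchored at the image origin (an assumption
    made for this sketch, not a property of every JP2 file).
    """
    # Clamp the requested region to the image bounds.
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, image_w), min(y + h, image_h)
    if x0 >= x1 or y0 >= y1:
        return []
    first_col, last_col = x0 // tile_w, (x1 - 1) // tile_w
    first_row, last_row = y0 // tile_h, (y1 - 1) // tile_h
    return [
        (col, row)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


# Usage: a hypothetical 8192 x 6144 master with 1024 x 1024 tiles (48 in all).
needed = tiles_for_region(8192, 6144, 1024, 1024, x=1000, y=2000, w=512, h=512)
print(needed)       # the tiles a decoder would have to touch
print(len(needed))  # 4 of the 48 tiles
```

In this example a 512-pixel-square request touches 4 of 48 tiles, which is the saving a tile-aware decoder can exploit and a whole-image decoder cannot.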

Conformance Testing

In addition to the standard JPEG 2000 conformance suite the OpenJPEG project has been developing their own test suite. An obvious area where the preservation community could assist would be in contributing not only examples of test images which exercise features which are important to us but also known-bad images for which tools should issue compliance warnings.

The preservation community should review existing suites and contribute additional freely-available tests to help validate implementations, with a focus on features which are less commonly used in other industries as well as strong conformance testing and improved detection and reporting of non-compliant files to help guide implementations towards stronger interoperability.
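
As a sketch of what such a contributed suite might look like in practice, the Python below checks a decoder against two kinds of test case: valid files whose decoded pixels must match a reference digest, and known-bad files which must produce at least one warning. Everything here is hypothetical: the `Case` and `DecodeResult` types, the `run_suite` function and the idea that a decoder returns raw pixel bytes plus a list of warnings are assumptions made for illustration, not the interface of OpenJPEG or any existing test suite.

```python
import hashlib
from typing import Callable, List, NamedTuple, Optional


class Case(NamedTuple):
    name: str
    data: bytes                     # the encoded file contents
    expected_digest: Optional[str]  # SHA-256 of reference pixels; None if known-bad


class DecodeResult(NamedTuple):
    pixels: bytes        # decoded raw pixel data
    warnings: List[str]  # compliance warnings reported by the decoder


def run_suite(decode: Callable[[bytes], DecodeResult], cases: List[Case]) -> List[str]:
    """Run every case through `decode` and return a list of failure messages."""
    failures = []
    for case in cases:
        try:
            result = decode(case.data)
        except Exception as exc:
            # Rejecting a known-bad file outright is acceptable; rejecting a
            # valid one is a failure.
            if case.expected_digest is not None:
                failures.append(f"{case.name}: valid file rejected ({exc})")
            continue
        if case.expected_digest is None:
            if not result.warnings:
                failures.append(f"{case.name}: known-bad file decoded without a warning")
        elif hashlib.sha256(result.pixels).hexdigest() != case.expected_digest:
            failures.append(f"{case.name}: decoded pixels differ from the reference")
    return failures


# Usage with a stand-in "decoder" which returns its input and never warns.
def silent_decoder(data: bytes) -> DecodeResult:
    return DecodeResult(pixels=data, warnings=[])


cases = [
    Case("valid-lossless", b"abc", hashlib.sha256(b"abc").hexdigest()),
    Case("malformed-tiles", b"bad", None),
]
print(run_suite(silent_decoder, cases))
```

The stand-in decoder passes the valid case but fails the known-bad one, which is exactly the silent-acceptance behaviour described earlier in this post.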

1/28/2013: Added additional information to introduction.

18 Comments

  1. Dave Rice
    January 28, 2013 at 7:36 pm

    Pleased to read this assessment. I wanted to add a few supporting comments on your 3 considerations.

    1. Use. I doubt use of jpeg2000 is anywhere near as high as 1%, but for whatever use there is the experience of use is further subdivided among different communities. For instance many jpeg2000 tools only support RGB-based colorspaces or bit depths that are multiples of 8. Some support 10 bit sampling but not losslessly. Some have limited support for various chroma subsampling patterns. When the objective of using jpeg2000 is to encode visual data losslessly there can often be a lot of work to identify jpeg2000 tools that support the pixel format, colorspace, bit depth, and chroma subsampling of the source visual data (transcoding any incoming visual data to the same pixel format in jpeg2000 to normalize the result would compromise the objective of a lossless representation of the original data). Additionally many encoders are not multi-threaded and thus too slow for practical use with video. So the tools relevant to a photo archivist may not at all be appropriate for the goals of a video archivist and vice versa.

    2. Complexity. This point is very true. I believe building a jpeg2000 encoder/decoder has been the only FFmpeg-managed Google Summer of Code project (http://wiki.multimedia.cx/index.php?title=FFmpeg_Summer_Of_Code) that has not been completed within the assigned time. Actually the jpeg2000 SOC project has probably been assigned within three different years and is still unfinished and in experimental status. [note: FFmpeg does support compilation with libopenjpeg as well as their native j2k codec]

    3. Compliance. Generally I’d advocate that archives should use a file format or codec in the simplest configuration possible in order to achieve preservation objectives. I find this also true with the QuickTime file format as well where the entire specification covers extensive complexity in order to facilitate an experience to the user but can produce an overly complex container architecture if not carefully managed. Also with lossless encodings like jpeg2000 there are testing formats such as framemd5 that can be used to verify that both the lossless compressed data and the source data decode to identical images to verify the intended losslessness.

    Thanks again,
    Dave Rice

  2. Gary McGath
    January 28, 2013 at 8:14 pm

    I too have concerns about the format’s long-term viability. Even the commercial options for JPEG2000 are not that great. Luratech’s implementation was very good for a while, but they seem to have lost interest in maintaining it. Kakadu has confusing license terms and a significant price just to evaluate it. OpenJPEG, at least when I last evaluated it, is slow.

  3. Susan
    January 29, 2013 at 10:59 am

    Thank you very much for this column. I work in a field that has been advocating JPEG 2000. I don’t have an IT background, and your column gave me a lot to think about in a succinct and easy-to-understand way. I now have the language to ask the necessary questions of those promoting this format.

  4. Chris Adams
    January 29, 2013 at 12:36 pm

    Dave Rice: one of our resident video experts brought up that line of questions and I feel it might warrant its own post, particularly as it sounds like it’s starting to become standardized in some fields.

    I strongly agree with your comments about the range of JPEG-2000 feature support, particularly given the pattern of narrow-but-deep adoption in fields which might not have much software in common. To act on my last point, I think the first step would be to start surveying the files used within different specialties and collecting examples of valid JP2s using the less common features, perhaps seeing whether there’s a clear need for a new profile beyond e.g. the ones listed at http://www.digitalpreservation.gov/formats/fdd/fdd000138.shtml.

    Ultimately, I would like to have an open source project which would basically contain test images and canonical reference TIFFs with some sort of automated comparison tool as it’s currently somewhat daunting for someone outside of the community to know how to test an implementation. I doubt many people are going to try writing a JP2 codec from scratch but it’s quite easy to envision someone wanting to integrate a library such as OpenJPEG into their own project or perhaps fix bugs or optimize an implementation and wanting an easy way to confirm that they haven’t introduced some subtle regression.

  5. Andrea Goethals
    January 30, 2013 at 9:19 am

    Thanks for this post Chris. Your first 2 points sum up the concerns I’ve had with this format for a while.

    On popularity: The digital preservation community doesn’t have the resources to maintain its own processing and rendering tools so we should try to stay away from niche formats.

    On complexity: PDF suffers from this same problem. The more complex and feature-filled a format becomes, the more difficult it is to write supporting creation/validation/rendering software. I think that this is a more difficult problem than the first because many times the things that make a format complex are the things that make them appealing for keeping storage costs and bandwidth use down and for creating rich user experiences.

  6. Lars
    January 30, 2013 at 11:41 am

    Excellent summary and nice discussion. One thing does not really make sense. Why does the number of images play a role? Of course there is an endless amount of simple and small images out there where jpeg is fully sufficient but there are also areas like DCINEMA or medical images, geo etc. where the images are very big or where quality is crucial. That is why there is no alternative in some areas (like DCINEMA). Does preservation then really mean to not preserve those items or only take snapshots?
    Also when it comes to size we are used to copying images, no matter what size is required for the targeted device, think of smartphones etc. So we move data that is not required and wait for ages, sometimes we do not even show these images because they are too big.
    I have implemented solutions based on Kakadu for a decade now. I agree the license terms are not easy and they basically prohibit publishing code, but everybody who owns a license can publish all sorts of applications. On the other hand there are limitations to open source. David Taubman has been working on that for a long time now and has all the expertise; this is impossible without licence fees.
    For me the bottom line is: Keep it simple when the image is simple but also invest and preserve when quality is crucial or the importance is high.

  7. Paul Wheatley
    January 31, 2013 at 7:13 am

    Excellent post Chris. I agree that collecting example files would be a really useful community contribution. You and others are welcome to contribute to the Format Corpus we’ve been gradually building on Git, which already has some JPEG2000 files there. It might be a useful staging post for collating contributions before moving them in to some of the test suites you’ve mentioned.
    https://github.com/openplanets/format-corpus

  8. Chris Adams
    January 31, 2013 at 5:52 pm

    Lars: I completely agree about the prospective savings with large numbers of high-resolution images. One way to approach this would be for us as a community to prioritize building better JPEG-2000 support now – particularly in common image-processing tools where weak/slow support actively discourages use – and treating it as a long-term investment which will allow us to save significant amounts of storage and transfer cost for many years to come.

    Andrea Goethals: I strongly agree with your general conclusion about processing and rendering tools, although I would argue that JPEG-2000 could be an exception because we’re not starting from scratch. Helping with some rough edges on OpenJPEG or integrating it into a popular tool like ImageMagick or GraphicsMagick is significantly less work than producing a new implementation from scratch or convincing millions of users to adopt a new image processing application.

    Paul Wheatley: the format corpus sounds like a great starting point. I’ve been talking internally about surveying the various features which we’re using, which would be a good starting point for seeing where contributions would be useful.

  9. Gary McGath
    February 4, 2013 at 8:08 am

    If what’s needed is a lossless format for large images, what about BigTIFF? It doesn’t seem to be widely used, but it’s supported by TiffLib, so lots of people already have the code to create it, and it’s really just TIFF with some parameters changed to allow 64-bit offsets, so it’s a conservative approach technologically.

  10. David S. H. Rosenthal
    February 4, 2013 at 9:37 am

    Thank you, Chris, for a fascinating post. It reinforces the point I’ve made since 2007 that the key attribute for a format’s survivability is that it have strong open source support. Formats that get wide adoption will have strong open source support and those that don’t, won’t. It’s a Darwinian world out there.

    This suggests that the idea of “preservation formats” as opposed to “access formats” is a trap. Precisely because they aren’t access formats, preservation formats are less likely to have the strong open source support that enables successful preservation.

    But your most interesting observation is “more frequently the failure has been caused by a program applying stricter validation checks and rejecting a file with minor errors which had not been reported by the other tools used earlier in our processing.” This is a failure to observe Postel’s Law, which I blogged about in the context of the use of format validation tools in preservation pipelines.

    Postel’s Law is fundamental to the Internet; it says you should be strict in what you emit and liberal in what you consume. What we care about is whether the preserved file can be rendered legibly, not whether it conforms to one tool developer’s interpretation of the standard versus another’s.

  11. Johan
    February 6, 2013 at 8:07 am

    Chris, David,

    Following David’s comment above I just re-read Chris’ statement on validation:

    While there have been some failures caused by programs which do not correctly support the entire format, more frequently the failure has been caused by a program applying stricter validation checks and rejecting a file with minor errors which had not been reported by the other tools used earlier in our processing.

    Reading this makes me wonder if validation is really the problem here. This would be only true if we had the following situation:

    Encoder A is used to create a JP2
    Decoder B fails to decode it because before doing so it applies some checks (‘validation’) which are too strict

    The assumption here is that decoder B would actually be able to decode the file if the checks were left out altogether!

    Based on my own experience, in the majority of cases the real issue is that the JP2 created by encoder A simply contains features that are not (fully) supported by decoder B, leading to interoperability problems. If you look at JPEG 2000’s Codestream Syntax specification it’s also easy to see how such things may be happening, as it contains quite some features that are optional and which may not be fully supported by all decoders. So often it’s not even about slightly different interpretations of the standard, but rather about incomplete interpretations. And since for JP2 we don’t have many decoding options to begin with this is somewhat worrying.

    But maybe you have some specific examples to the contrary?

    Also a final note/warning on validation: even my jpylyzer validator is still somewhat limited in this regard, as for the image codestreams I’ve only managed to include support for the required (non-optional) marker segments for the main codestream header (5 out of a total of 13) so far. In addition jpylyzer only provides information on 5 out of the 11 marker segments that can occur at the tile-part level.

    In practical terms this means that 2 JP2s that were created by 2 different encoders may yield similar jpylyzer output (in terms of validation results and reported marker segments), even if one of them contains some of the less frequently-used marker segments.

    At this stage these limitations are mainly due to limited time and resources, and limited availability of sample files for the less-common marker segments, but eventually I would like to add these features.

    Johan

  12. David S. H. Rosenthal
    February 9, 2013 at 4:51 pm

    Johan’s observations are interesting but I stand by my comment that if a program, whether a decoder or a validator, is “rejecting a file with minor errors” then it is not conforming to Postel’s Law. This may be because it is nit-picking, or because it is incomplete, but in the light of Postel’s Law either way it is wrong.

    If this wrong-ness causes the “file with minor errors” to be rejected for preservation that is a serious problem. Given the limited resources available for preservation, an only slightly less serious problem is that it is wasting the valuable time of people like Chris.

  13. Rob Buckley
    March 27, 2013 at 9:04 am

    It was good to see this post and all the discussion it has led to, here and elsewhere. This blog offers useful reminders of points that are all too familiar to specialists in the field.

    Is JPEG 2000 a preservation risk? There is too much of it out there and in high-value collections to believe that it is. JPEG 2000 may be more widely used than people realize. For example, FamilySearch generates over 300 million JPEG 2000 images a year, while historic newspaper digitization programs in the UK and the Netherlands so far have created more than 13 million archival JPEG 2000 images. And at the JPEG 2000 Summit two years ago, the Library of Congress reported that the Moving Image Archiving was growing by 80 to 100 TB a month, and that 60 percent of that data were MXF files with JPEG 2000. Although not a preservation format in a strict sense, it is also the case that JPEG 2000 is the compression format used in the digital cinema standard, now in wide use.

    Investments of this magnitude are not likely to disappear any time soon, and the economic case for using a compressed format such as JPEG 2000 is still compelling. There are also the economics around infrastructure, including open source. Not every institution approaches the economics the same way and if open source software for example is not up to desired levels of performance, then it’s because the incentives aren’t in place yet.

    While the questions raised in the post do not invalidate the use of JPEG 2000, they do point to the continuing opportunity for further developments. All of us share an interest in increasing the availability and capability of JPEG 2000 applications, open source and commercial. All of us see the importance of being sure that practical profiles are documented; profiles that, among other things, increase the ease with which access to content can be provided, directly or indirectly, as the Library of Congress does today with its images of maps. This is where the incentives come in, as well as the identification of actors who might carry out needed work.

    Several of us are planning to meet for an informal lunchtime conversation at the IS&T Archiving Conference in Washington on Tuesday, April 2. We hope this will provide an opportunity to explore these issues, as a follow-up to this blog post and the discussion planned for the JPEG 2000 short course at the conference. If you are interested in joining this get together, send an email to JPEG2000Preservation@gmail.com.

  14. Bill LeFurgy
    March 27, 2013 at 10:11 am

    Rob: Thanks for your comment, and I agree completely that an ongoing discussion is important to explore all the issues involved.

  15. Chris Adams
    March 27, 2013 at 12:06 pm

    Rob: thank you for your response. I feel I should reiterate that my goal is to improve the situation for JPEG-2000 because I agree that the benefits are particularly appealing in a time where budgetary concerns are forcing many institutions to make difficult preservation decisions. I strongly agree about economic incentives: one reasonable response might be for the major JPEG-2000 users to temporarily earmark a small percentage of the funding which currently goes towards producing JP2 files into ensuring that future users will have a first-class experience accessing them and that they fit well with the image processing tools many groups use rather than being relegated to special case status.

    Economics is also why I’m concerned: FamilySearch’s 300M files per year is quite impressive but still less than the daily number of uploaded JPEGs which Facebook mentioned in their IPO over a year ago – while I’m sure we would agree about the relative preservation merits of many of those files, market economics may push the major developers in directions other than what we’d prefer (c.f. Photoshop Elements). This is why I feel that browser support is so important – and potentially also an area where JPEG 2000 has a great deal to offer the web as many people are searching for a responsive image solution which doesn’t require maintaining many variations of the same image tailored for various screen sizes and resolutions.

    I look forward to the discussion on Tuesday.

  16. Sean Martin
    April 24, 2013 at 5:52 am

    I too agree that many interesting and relevant points are raised in this blog. We have been using JP2 for some time and we have found that a small proportion of the encoded files were defective, but this was only discovered later. Having learned from the experience we are developing a much more robust process that involves encoding, inspection of the file structure in the JP2 that was produced, decoding back to the original format, and then assessing the similarity of the image payload of the round-trip version with the original. The quality criteria for similarity can be adjusted according to the desired level of compression that has been used. (We do not use lossless for all master files, but that is a different story.) Staff from both our repository and digital preservation teams, the latter as part of the SCAPE project, are involved in this work.

    While the primary task is encoding we have realised that there are merits in incorporating a decode activity. We will have shown that each archived file has been successfully decoded, and the image payload meets the required quality criteria. However, we also have the source and the decode tool is built from that source. This now places an emphasis on the preservation and long term sustainability of the process for decoding, as derived from the version of the source that was used. I suggest this reasoning applies generally as long as one has the source, and not just if it is open source. The approach also provides some mitigation against the “fussiness” about future versions of tools as mentioned in the blog. I also note that Dave Rice made a related remark about a similar merit of a round trip process.

    Among the institutions I am aware of that have large scale digitisation operations, many, if not most, use kakadu. It is our current preferred tool for production operations – the choice is under review – see below. We pay a modest annual licence, we have access to the source, and that is currently used for decoding. We cannot release the source, but we envisage preserving the process built using that source, subject to the licence conditions. Like many open source tools it can be built in several common environments including gnu and visual studio. We would of course all be supportive of and encourage more widespread open source software as advocated in the blog, and this would avoid any licence complications. Our digital preservation staff are comparing the effectiveness of several tools for decoding, including open source tools. We are open minded and will confirm our choice for the best all round tool for round trip decoding taking into account licencing. This may be kakadu or it may be an open source tool, but the same reasoning about the desirability for preserving the ability to decode as part of the round trip process would still apply.

    The blog advocates that browser support is critical. However, consider a web page that shows thumbnails of all 60 pages in a digitised newspaper, and where each master file is a 20Mbyte JP2. Clearly a poor user experience would result if all 1.2Gbytes had first to be downloaded. A better way is to deliver only the image information that the browser actually needs, such as will be required generally with a typical zoom and pan viewer. One can transcode at the server, but it now does not matter that much which format is used for final delivery, and conventional JPEG works quite well. This approach is aligned with the IIIF initiative whereby an IIIF viewer requests the image tiles that it needs to display (link http://lib.stanford.edu/iiif ) and where these can be generated dynamically on request from a master file. A longer term approach would be for browser support for the JPIP protocol (JP interactive protocol) which is specifically designed to send only the image information that the viewer requests. That would be an attractive capability if it were widely supported.

    Risk and its management are of course important topics, but it is appropriate also to consider benefits and cost. A single JP2 archival file can also be used for access to support a zoom and pan viewer. The total cost of ownership is reduced in several ways by: the compression efficiency, the flexibility to apply different levels of compression to different types of content, and also because the architecture is greatly simplified by no longer needing a separate set of access files, or worse multiple sets of access files at different resolutions. Obviating the need for access files completely eliminates the need for a separately managed access system.

    As an organisation we are experiencing ever increasing pressure to reduce costs, certainly in our current economic climate, and we are constrained by what we can afford to do. We need to find a balance between risk, cost and the volume of what we can afford to digitise and retain. This is not easy; however, we see that the key issues we face are thus to ensure that the encoding process is robust and also to preserve the decoding process we will have applied to each image file. There are no absolutes, but these steps will at least partially mitigate many of the points raised in this blog.

  17. Peter Marreck
    April 26, 2013 at 3:10 pm

    I have been following this format for a number of years now. During that time I have gone from being a commercial developer to an open-source developer. Time and again, open source has run circles around commercial solutions both from a security as well as an efficiency perspective. Especially reading here about the problems between various implementations, interoperability and especially the long-term preservation risks around using this format, I am almost certain that this unique and promising format will only achieve niche success unless its implementations are open-sourced and a test suite to validate new or existing implementations developed. And, as you realized, not using an open-source implementation for a long-term-preservation project is just asking for trouble.

    It saddens me that whoever is in charge of the various commercial implementations can’t “see the light” and open-source their code while perhaps selling consulting, custom implementations or other services around it, which is a business model that has been successful for many others.

  18. N Holt
    March 6, 2014 at 6:05 pm

    I have come to this discussion 1 year on from the last post. I have the same concerns as mentioned above. Can anyone tell me if the status of JPEG 2000 has moved any closer to being a good, reliable file format which has wide usage and will be long lasting, and would therefore be a good alternative to TIFF for archiving?

    Thanks
