I’ve always loved the term “lossy” compression (add a “y” to anything and the “cute” factor really goes up). But just like a baby tiger is cute only so long as you understand that it will one day grow into a vicious, man-eating beast, lossy compression is cute only so long as you understand that it may someday come back and bite you if you’re thinking about long-term preservation.
That sounds a bit hyperbolic so let me step back a bit. In 2011 I wrote about IDOM, four simple steps to helping you start thinking about how to preserve your own digital materials (for the record, it’s Identify, Decide, Organize and Make copies). One undeniable factor in “make copies” is that there’s a trade-off everyone has to make between quality and affordability.
We all want to store our digital data at the highest quality possible, but higher quality generally means larger file sizes, which means more storage, which means more money. Compressed data, generally speaking, takes up less physical storage space and moves more easily over networks. The file size difference can be dramatic.
Let’s say you wanted to rip your CD collection and store it as high-quality WAVE files on an external hard drive. A digital file that holds a typical three-minute song on a CD is 30–40 megabytes in size so an average CD would be around 450 megabytes. If you had 1000 CDs in your collection you’d need about ½ a terabyte of storage. Things aren’t so bad these days, cost-wise: ½ terabyte would only run you about $40 (10 years ago it would have run you almost $1200.)
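For anyone who wants to check the arithmetic, here is a rough sketch in Python. The CD audio parameters (44.1 kHz, 16-bit, stereo) are the standard ones; the 450 megabytes per disc is just the average assumed above, and the function name is made up for illustration.

```python
# Uncompressed CD audio, as it would be stored in a WAVE rip of the disc.
SAMPLE_RATE_HZ = 44_100   # samples per second, per channel
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 2              # stereo


def wave_bytes(seconds):
    """Size in bytes of the raw audio data for a clip of the given length."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * seconds


song_mb = wave_bytes(3 * 60) / 1_000_000   # a three-minute song
collection_gb = 1000 * 450 / 1000          # 1000 CDs at an assumed 450 MB each

print(f"3-minute song: {song_mb:.1f} MB")
print(f"1000 CDs: {collection_gb:.0f} GB")
```

The song comes out just under 32 megabytes (the WAVE header adds only a few dozen bytes on top), and the collection lands at about 450 gigabytes, which is where the "½ a terabyte" figure comes from.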
Now let’s say you wanted to save storage space by compressing the audio. The MPEG Layer III Audio Encoding (MP3 for short) typically reduces the file size for an audio song by an order of magnitude. So that half a terabyte would now be around 45 to 50 gigabytes and cost you a small fraction of that $40 (prices for external hard drives fluctuate quite a bit so don’t hold me to these prices!).
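The "order of magnitude" figure can be sanity-checked by comparing bitrates. This is only a sketch: the 128 kbps MP3 setting is my assumption (a common encoder choice, not a number from this post), and higher bitrates give a smaller reduction.

```python
# Bitrate of uncompressed CD audio: 44.1 kHz x 16 bits x 2 channels.
CD_BITRATE_BPS = 44_100 * 16 * 2
# Assumed MP3 encoder setting; real files vary widely.
MP3_BITRATE_BPS = 128_000

# How many times smaller the MP3 is than the uncompressed audio.
ratio = CD_BITRATE_BPS / MP3_BITRATE_BPS

# Size of a three-minute song at that MP3 bitrate, in megabytes.
mp3_song_mb = MP3_BITRATE_BPS / 8 * 180 / 1_000_000

print(f"reduction: about {ratio:.1f}x")
print(f"3-minute MP3: {mp3_song_mb:.2f} MB")
```

At that setting the reduction is about 11 times, so a three-minute song shrinks from roughly 32 megabytes to just under 3.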
However, when we’re thinking about preserving digital information we generally want to avoid compressing the data, unless we can compress it “losslessly.” “Lossless” compression means that we can shrink the size of any arbitrary piece of digital content, but we can also bring it back to its original size without losing any information in the transformation process.
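Here is a minimal sketch of that round trip, using Python's built-in zlib module as a stand-in for any lossless scheme. The sample data is made up, and highly repetitive on purpose so that it compresses well.

```python
import zlib

# Made-up, repetitive sample data.
original = b"preserve every bit " * 1000

compressed = zlib.compress(original)     # shrink it
restored = zlib.decompress(compressed)   # bring it back

print(len(original), "bytes before,", len(compressed), "bytes after")
print("identical to the original:", restored == original)
```

The restored bytes match the original exactly, which is the whole point: nothing was thrown away to get the smaller size.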
“Lossy” compression, on the other hand, is a data encoding method that compresses data by removing part of it. Different compression schemes apply different algorithms to determine how to effectively discard the data while keeping the content within an acceptable level of quality as determined by the user’s needs, but there’s no getting around the fact that once the data is discarded under “lossy” compression schemes it’s gone for good.
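To see why discarded data can't come back, here is a toy sketch. It is not how MP3 or JPEG actually work (those use far more sophisticated perceptual models); it simply throws away the low 8 bits of some made-up 16-bit audio samples, and the function names are hypothetical.

```python
# Made-up 16-bit audio samples.
samples = [12345, -20000, 257, 3, -1]


def lossy_encode(values, dropped_bits=8):
    """Keep only the high-order bits; the low-order bits are discarded."""
    return [v >> dropped_bits for v in values]


def decode(codes, dropped_bits=8):
    """Scale back to the original range; the dropped bits return as zeros."""
    return [c << dropped_bits for c in codes]


restored = decode(lossy_encode(samples))

print("original:", samples)
print("restored:", restored)
print("identical:", restored == samples)
```

Each restored value is close to its original but not equal to it, and nothing in the encoded data records what the missing bits were.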
While institutions (and individuals) want to save on costs as much as possible, we all want to retain as much of the utility of the information as we possibly can. We have no idea how much storage or bandwidth will cost in the future (hopefully less) nor do we know what future users might do with current data (undoubtedly many interesting things), but we’re pretty sure we want to keep our options open.
An MP3 is an example of lossy compression. If you compress that original WAVE file using the MP3 compression scheme, the information you remove to decrease the file size is gone for good and you can’t bring it back. It is possible to convert your MP3 back to a WAVE file using available software tools, but all you’ll have is a mediocre WAVE file. The original information is gone and, depending on how heavily the file was compressed, you may well hear the difference.
So if you want to preserve an audio file for the long term you either need to keep it in its original format or utilize a compression scheme that allows you to roll back your compressed file to its original form.
There are a number of lossless compression schemes for audio, though they’re not implemented equally by the major digital media players.
The same holds true for photographs. For example, let’s look at my “butch dogg” picture from the IDOM article.
This image is stored in Joint Photographic Experts Group (JPEG) format which is a compressed format. Sadly, JPEGs are a form of “lossy” compression.
Of course, a large amount of data can be discarded before the result is sufficiently degraded to be noticed by the user, but it’s the same situation as the audio described above. Had I been thinking long-term I might have made a different decision on the final-state format for my photo.
If planning these things out from the start, it’s most advantageous to start with a high-resolution master lossless file that can then be used to produce compressed files for different purposes; for example, a multi-megabyte file can be used at full size to produce a full-page advertisement in a glossy magazine while a smaller, lossy copy can be made for a small image on a web page.
A consideration of lossy vs. lossless compression is just one factor in identifying sustainable stewardship practices, but it’s an important one to consider, especially at the start of a digital workflow. The Still Image Working Group of the Federal Agencies Digitization Guidelines Initiative has been exploring these issues in great depth.
Consensus is still developing on the most sustainable preservation master formats (see recommendations from NARA, the American Society of Media Photographers and others) but compression is certainly one of the big issues to consider.
The stewardship community will undoubtedly spend plenty of time managing and preserving lossy files (huge numbers of JPEGs and MP3 files are already out there), but if you’ve got the option make yours lossless!
Comments (6)
I’ve always liked this explanation of lossy vs. lossless compression: Imagine you have a large sponge you want to fit in your pocket. If you used lossy compression you would cut and remove portions of the sponge to the size of your pocket. If you used lossless compression, you would squeeze the sponge in your fist and push it in your pocket, leaving the size of the sponge unchanged when pulled from your pocket.
Lots of talk but no useful advice. Would have been a useful article if the author had started by considering the needs of the reader. As an example of the uselessness, he never did recommend any lossless format for saving images.
To lossy or not to lossy? I’m not convinced it is the right question. It is too exclusive!
There are shades of loss in compression technology and sometimes to compress is a good thing and not bad at all. When faced with the choice – compress or decimate? Which is better?
The question stems I think from a lack of understanding. Compression technologies have been well thought out by experts in their given domains, are often complex and often designed to ensure things largely undetectable or unnecessary are omitted from the end result. Very few people understand the complexities of the standards – there is no one “MP3” format for example – and so it is easy to see why generalisations about compression being Bad™ are made.
While I think it is OK to say “we’d prefer not to use lossy compression” I also think we (the DP community) should avoid preaching the impossible and we should understand that the catch all defence “think of the children” (another way of saying “we don’t know what future generations will use this for”) is not usually backed by evidence, often cannot be and as such is not very convincing.
There are bigger questions – we keep the original to vouch for its integrity and the integrity of any surrogates. We shouldn’t use a super compression for which we may one day have to pay a licence fee. Sometimes it really doesn’t matter that you can’t hear the ant walking across the carpet in the recording studio because someone chose a 512kbps MP3 for the master.
Peter,
Great points! You’re absolutely correct that we’ll need to be more nuanced in the kinds of recommendations that our community makes in terms of file formats and compression, etc.
My main concern in writing the post was to try and let creators know that there are issues with compressed formats that they should keep in mind on the creating end, and that if they have a choice, they should choose master formats that will give them greater options in the future.
Hiram,
Sorry that the article didn’t meet your needs.
There are several recommendations in the next-to-last paragraph from NARA, the American Society of Media Photographers and others.
Thanks Butch. You’re right – wise to get the creators to start big and work from there… Nip it in the bud and all that!