During Preservation Week 2013, I gave a webinar about personal digital archiving. Over 600 people participated and, during the post-presentation question section, 91 people submitted questions online. I had time to answer about a dozen or so. After the webinar, the hosts from the Association for Library Collections and Technical Services sent me the complete list of questions and I’m gradually responding to all of them. Questions are always good because they help us to improve and expand our information resources.
The questions covered a variety of topics — email preservation, file naming, digital video, file migration, scanning and digital asset management — but the most striking fact is that two-thirds of all the questions could be grouped into just two main topics: digital photos and storage.
Interest in digital photos is not surprising. Most of the questions we get at NDIIPP personal-digital-archiving presentations are related to digital photos. The webinar questions about storage were also not surprising; with the variety of available digital storage options and the uncertainty about their reliability, storage can be a perplexing topic.
I’d like to share a few of the webinar questions in this post. There’s not enough space to cover both topics today so I will just do the digital photo ones. I will post the digital storage questions in a future column.
Photographer David Riecks, of photometadata.org, helped answer the more difficult questions. Since many of the questions were variations on the same theme, I combined some of the more representative ones.
Which is better for preservation, JPEG or TIFF? I have heard that TIFF is better because of degrading. Do JPEGs deteriorate?
TIFF is a lossless format, and newer versions of photo-processing applications such as Photoshop also have options to save TIFF files with various forms of lossless compression. A lossless file format is especially good if you plan to return to the file to make tone or color changes, or to retouch the photo. When you finish with the file, save it and close it, no image data is lost.
TIFF files require more storage space than JPEGs because of their relatively larger data-rich sizes, so some photographic organizations use a form of lossless file compression called LZW. It does take a bit of time to pack the file and each time you open the file it may take a bit of time to expand it. But no data is thrown away and the image does not degrade over time.
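The claim that lossless compression throws nothing away can be checked with a quick round trip. This is only a sketch under stated assumptions: Python's standard `zlib` module (which uses DEFLATE) stands in for LZW, which the standard library does not provide, and the bytes are made-up sample data rather than a real TIFF.

```python
import zlib

# Made-up stand-in for image data: a repetitive byte pattern (not a real TIFF).
original = bytes(range(256)) * 400

# zlib uses DEFLATE, not LZW; it is used here only because it is also lossless.
packed = zlib.compress(original)
restored = zlib.decompress(packed)

print(len(original), "bytes before,", len(packed), "bytes after packing")
print("Identical after unpacking:", restored == original)
```

The packed copy is smaller, and unpacking it gives back exactly the bytes that went in, which is the property that matters for preservation.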
If you scan a photo, it is a good practice to save the scan as a TIFF, rather than as a JPEG or PDF, because of the TIFF’s losslessness. In addition, if you want the maximum quality, you can even capture and save up to 16 bits per channel in an RGB TIFF; JPEG only allows for 8 bits per channel.
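To put the 8-bit versus 16-bit difference into numbers, here is a small sketch. The function name is my own, chosen for illustration; the arithmetic is simply two raised to the number of bits.

```python
def levels_per_channel(bits: int) -> int:
    """Number of distinct tonal values one color channel can hold."""
    return 2 ** bits

for bits in (8, 16):
    print(f"{bits} bits per channel: {levels_per_channel(bits):,} levels")
```

An 8-bit channel can hold 256 distinct tones and a 16-bit channel can hold 65,536, which leaves far more headroom for later tone and color adjustments.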
If you want to share a digital photo that is in a TIFF file format, saving or exporting a copy of it as a JPEG is a fine option. A JPEG can be viewed in a web browser and it takes less bandwidth to transmit or download. Always keep the original TIFF though.
If your original digital photo file is a JPEG and you don’t intend to modify it, you can archive it as it is. There is no benefit to converting it to a TIFF if you are not going to modify it. The “lossy” aspect of JPEG becomes an issue when you modify the JPEG and save it — and consequently compress it.
JPEG compression of image data results in some loss of image information, which is why it is referred to as lossy. Compression is not inherently bad; light compression reduces a file size and the lost image information is barely visible. But the more you compress a file, the more information you lose and the worse the photo looks. Once that digital information is lost, you can never get it back.
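A toy model may help show why lost information cannot be recovered. Please treat this as an illustration only: real JPEG compression works on frequency data in blocks of pixels, not on individual tones, and the `quantize` function and step sizes below are my own invention.

```python
def quantize(values, step):
    """Toy lossy step: round each value down to a multiple of `step`."""
    return [(v // step) * step for v in values]

tones = list(range(256))  # every possible 8-bit tone

for step in (4, 16, 32):  # larger step = heavier "compression" in this toy model
    kept = len(set(quantize(tones, step)))
    print(f"step {step}: {kept} of 256 tones survive")
```

Once several original tones have been merged into one value, nothing in the saved data says which tone each pixel started as, so heavier settings lose more and none of it can be restored.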
If you take a TIFF file and save it as a high quality JPEG with a low compression setting, the JPEG may occupy a fraction of the disk space that the TIFF would have occupied. However, if you were to open the JPEG again, make tone or color changes and then re-save it, you would subject it to another round of compression; after multiple rounds of modification and re-compression you would begin to see degradation in the image file.
The amount and quality of compression applied to a JPEG file is an important factor in its quality. In Photoshop, there are two means of creating a JPEG. One uses a quality scale of 1 to 12, with 12 being the least compression or “maximum quality” and it results in the largest file size. Quality equals size. The higher the quality, the larger the file size; the lower the quality, the greater the data loss and the smaller the file size.
The type of JPEG compression applied in a camera will be different from that used in Photoshop. Some of the newer cameras have several settings, ranging from a “Basic” JPEG to a “Superfine” JPEG. These settings probably have a rough equivalent setting to Photoshop but they are not exactly the same.
When modifying digital photos, never modify the original. Always make a copy and modify the copy. You can compress copies for upload or delete copies if you are not happy with the results. Be careful to save the copy with a different name than the original; otherwise it will overwrite and replace the original.
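If you are comfortable with a little scripting, the copy-first habit can be automated. This is a minimal sketch, not a recommended tool: the function name and the `_edit` suffix are hypothetical choices of mine.

```python
import shutil
from pathlib import Path

def make_working_copy(original: Path, suffix: str = "_edit") -> Path:
    """Copy `original` to a new name and refuse to overwrite anything."""
    copy_path = original.with_name(original.stem + suffix + original.suffix)
    if copy_path.exists():
        raise FileExistsError(f"{copy_path} already exists")
    shutil.copy2(original, copy_path)  # copy2 also tries to keep timestamps
    return copy_path
```

Calling `make_working_copy(Path("beach.jpg"))` would create `beach_edit.jpg` next to the original and stop with an error rather than replace an existing file.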
The JPEG 2000 format has both a lossless and a lossy means of compression. Like TIFF, JPEG 2000 can store files with more than 8 bits per channel, though it requires less storage space than a TIFF. Note that while you can substantially reduce a JPEG 2000 file size, there are fewer applications that can create and open this file format compared to a TIFF. If you are considering converting your files to JPEG 2000, do some tests first.
Here’s a tip: if you open a JPEG image in a photo-processing application, modify it and save the retouched image as a TIFF (with or without LZW compression), then this TIFF image will not be any further degraded or compressed than the original. However, if you apply curves or levels to the image, then you will more than likely introduce some loss of data, since both these ways of modifying the tonal distribution of the image do so by squishing or stretching out the original data.
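The stretching effect can be sketched with a toy levels adjustment on 8-bit tones. The function and the black and white points (50 and 200) are made up for illustration; a real photo-processing application will do something more sophisticated.

```python
def stretch_levels(value: int, black: int, white: int) -> int:
    """Toy levels adjustment: map black..white onto 0..255 and clip the rest."""
    scaled = round((value - black) * 255 / (white - black))
    return max(0, min(255, scaled))

tones = range(256)
adjusted = [stretch_levels(v, black=50, white=200) for v in tones]
print("Distinct tones before:", len(set(tones)))
print("Distinct tones after: ", len(set(adjusted)))
```

In this example 256 tones go in and 151 come out: everything darker than 50 becomes pure black, everything lighter than 200 becomes pure white, and the remaining tones are spread out with gaps between them.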
Does adding metadata affect the photo file? If you add descriptive information using particular software, will any other software enable you to view that information or is it all proprietary? Are there any open-source options for adding metadata?
You can modify the metadata about the image (such as caption, description and keywords) with a number of programs. Most of these will only modify the file header information, not the image pixels. [See “An Easy Way to Add Descriptions to Digital Photos,” part 1 and part 2.] Adding metadata to a photo file does not subject the image to compression, so the quality of the image will not change. Since the metadata text does take up a little bit of space, the size of the file will increase slightly.
Information written to the file header of JPEG images can be read by many applications and, in newer computers, even the operating system itself. For instance in Windows Vista and Windows 7/8, the WIC (Windows Imaging Component) allows you to see this information simply by “right clicking” and viewing the image properties. With Macs, from OS 10.5 forward, the information is visible by using “Preview” and Command + I (view info).
If you add metadata to TIFF files, much is the same as with JPEGs, though not all programs will work. Other special and proprietary file formats like Photoshop files (PSD) and camera RAW files (NEF, CR2) are even more problematic in terms of image metadata and review by other programs.
Most software uses the IPTC or XMP standards to store embedded photo metadata. Picasa uses the older IPTC standard. Photoshop uses XMP for storing metadata: this includes the IPTC Core, IPTC Extension, PLUS and more. Information entered with Picasa can be read by Photoshop. The reverse is not always true.
You can find a list of photometadata resources at controlledvocabulary.com.
Does frequently opening digital photos, JPEGs, degrade the quality or is that due to compression?
Moving a JPEG from one location to another will not degrade the image but if the file is corrupted in transit (due to, say, a virus), it will likely not be openable.
It’s important to understand that while compression is used in saving the JPEG file, and the JPEG image has to be decompressed before you can view it, there is no change to the image just through the act of opening the file. Re-compressing the file changes it.
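One way to convince yourself that a file has not changed, whether after viewing it or after moving it, is to compare checksums. The sketch below uses SHA-256 from Python's standard library; the function names are hypothetical, and where you keep the recorded digest is up to you.

```python
import hashlib
from pathlib import Path

def file_sha256(path: Path) -> str:
    """Return the SHA-256 digest of a file's bytes as a hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large image files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def unchanged(path: Path, recorded_digest: str) -> bool:
    """True if the file still matches a digest recorded earlier."""
    return file_sha256(path) == recorded_digest
```

If the digest recorded before and the digest computed afterward match, the file's bytes are identical; a re-saved or corrupted file will produce a different digest.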
If you “Save” the opened JPEG file, rather than just close the open file (exit without saving), you can cause the file to degrade over time with each “open/save” action. Typically the only time you would be asked to save the file is after modifying the image pixels, such as changing the tone or color, or retouching, cropping or removing red-eye.
You might consider making pixel changes to your JPEG and saving the digital photo as a (lossless) TIFF file.
You mentioned scanning at 300 dpi for the standard photograph sizes. Would you use a different dpi if you were scanning a color photograph versus a black and white photograph?
You could scan a b&w photo using the “grayscale” option rather than the RGB color option, but you’d want at least 300 dpi/ppi regardless.
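For a rough sense of what the grayscale option saves, here is a sketch that estimates uncompressed pixel data for a 4 x 6 inch print at 300 ppi. The function is my own, and it assumes 8 bits per channel and ignores file headers and any compression.

```python
def uncompressed_bytes(width_in, height_in, ppi, channels, bits_per_channel=8):
    """Uncompressed pixel data size for a scan, ignoring file headers."""
    pixels = round(width_in * ppi) * round(height_in * ppi)
    return pixels * channels * bits_per_channel // 8

gray = uncompressed_bytes(4, 6, 300, channels=1)  # one grayscale channel
rgb = uncompressed_bytes(4, 6, 300, channels=3)   # red, green and blue channels
print(f"Grayscale: {gray / 1e6:.2f} MB, RGB: {rgb / 1e6:.2f} MB")
```

Under those assumptions the grayscale scan holds one third of the pixel data of the RGB scan at the same resolution.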
Mike – Thank you for taking the time to research the nuances of scanning and tagging. Excellent addendum to the information presented in the webinar. In the webinar you also mentioned Outreach as a component of preservation; your Signal contributions are a great outreach tool for the advocacy of Digital Preservation.
Thanks for your kind words, Marcia.
I am concerned that advising people to save at 300 dpi will result in lots of regrets for future generations. The quality of printing, computer monitors and televisions will continue to improve (and thus the ability to see details in higher quality imagery). Also, a person may want to zoom in and view just a portion of a scan or even cut out a piece (just their grandmother from a school group photo), all of which will suffer at 300 dpi. I believe that 600 dpi is a better recommended minimum size. As we all realize, it’s better to build the quality into the original scan (saving as a TIFF), then save JPEGs from that for sharing with relatives or posting online (for smaller file sizes).
I recommend looking at the “use cases” of scanned photography as well as at better future-proofing recommendations. 600 dpi does cause larger files, but with hard drive prices coming down I believe the value is worth it.
Regarding Mark’s comment, I think the answer really revolves around what you are scanning. For “photos” (i.e. a photographic gelatin silver print, or chromogenic dye print like RA4 process), you can scan at a higher resolution. However, in most cases, all you will see are the defects. If the original you have to work with is a 4 x 6 inch print, and you scan it at 600 or 1200 pixels per inch you could then make the equivalent of an 8 x 12 inch print, but it’s not likely to give you better quality. It will, however, as he notes, take up much more space on your hard drive.
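The print-size arithmetic above can be sketched in a few lines. The 300 ppi output figure is an assumption on my part (it is the print resolution discussed elsewhere in this thread), and the function name is hypothetical.

```python
def print_size_inches(original_in, scan_ppi, print_ppi=300):
    """Print dimensions (inches) a scan supports at a given output ppi."""
    width_in, height_in = original_in
    return (width_in * scan_ppi / print_ppi, height_in * scan_ppi / print_ppi)

print(print_size_inches((4, 6), scan_ppi=600))  # 4 x 6 print scanned at 600 ppi
print(print_size_inches((4, 6), scan_ppi=300))  # same print scanned at 300 ppi
```

A 600 ppi scan of a 4 x 6 inch print has enough pixels for an 8 x 12 inch print at 300 ppi, but the calculation says nothing about whether the original holds enough real detail to fill those pixels.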
If you have a high-quality 8 x 10 inch glossy print, in which the image is sharp (no motion blur from the camera moving) it might be worth going to a higher sampling setting. But I would recommend that you do some tests first to make sure it’s worth it.
In my experience, higher scanning resolutions usually just give me more dust to spot out later, and the enlarged images never look as good as the small original.
If you are scanning a b&w or color negative, or a color slide, then you certainly want to scan at higher resolutions. Which is best has much to do with your intentions (now and in the future), the quality of the original, and the type of hardware you are using to make the scan. Many scanners advertise an interpolated sampling rate in their “marketing speak” though you will often get better results scanning at the maximum “optical resolution” of the scanner.
Hope that helps.
First, begin with how much detail there actually is in the original. This amount of detail varies widely. A halftone screen for an old newspaper may result in less than 200 dpi actual. A modern lens on a quality black & white emulsion may be 2800 dpi.
In the old days (the 1990s), when scanning became widely available, 300 dpi was a good starting point because many books and documents did not contain more detail than that. Even today, 300 dpi is a good starting point.
For example, at the Library of Congress we currently print our digital photographs using high quality pigment printers that may claim a resolution of 1200 or 2400 or much much more. But those are microdots of different color merged to produce the variety of shades of gray or color. Usually the printer driver produces a finished resolution between 240 dpi and 360 dpi.
Second, we need to sort out the term “resolution.” Scanners and cameras contain pixels and “sample” the image at a “sampling rate” depending on the distance between the camera and the image. So when people talk about “resolution” using 300 ppi or 600 ppi or 3000 ppi they are actually using the “sampling rate” of the device. But few devices are 100% efficient.
Common scanners may be only 50% efficient; cameras may be 80 – 95% efficient. Thus the actual resolution achieved at 300 ppi may only be about 200 ppi – higher ppi rates are the result of image processing which may give the appearance of sharper lines but which does not produce additional detail. Many scanners will claim 1200 ppi and produce less than 600 ppi true optical resolution. Federal Agencies Digitization Guidelines Initiative standards (http://www.digitizationguidelines.gov/) are currently at 80% efficiency for a 2-star, 90% for a 3-star, and 95% for a 4-star outcome. Many of our projects for prints and photographs and rare books are 400 ppi at 3-star levels, although some are much higher.
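As a rough illustration of what those efficiency figures mean, the sketch below treats efficiency as a simple multiplier on the sampling rate. That linear model is an assumption made for illustration; the guidelines themselves define and measure efficiency in their own terms, and the function name is hypothetical.

```python
def effective_ppi(sampling_ppi: int, efficiency_percent: int) -> float:
    """Approximate true resolution, assuming efficiency scales it linearly."""
    return sampling_ppi * efficiency_percent / 100

# Star levels and percentages as quoted above, applied to a 400 ppi scan.
for stars, percent in ((2, 80), (3, 90), (4, 95)):
    print(f"{stars}-star at 400 ppi: about {effective_ppi(400, percent):.0f} ppi")

# A scanner that claims 1200 ppi but is only 50% efficient.
print(f"1200 ppi at 50%: about {effective_ppi(1200, 50):.0f} ppi")
```

Under this simple model a 400 ppi scan at the 3-star level delivers roughly 360 ppi of real detail, and a 1200 ppi scanner at 50% efficiency delivers about 600 ppi.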
Third, many people want to enlarge an image. We often try to scan film – particularly 35mm film – at a resolution necessary to provide a final print at 300 dpi. So if you want a common 4″ x 6″ print you need a true resolution of 1200 ppi. Specialized film scanners and high quality camera setups can achieve this. Commonly available consumer flatbed scanners cannot. (If you read the fine print specifications, they will often say something like “true 2400 ISO sampling rate” not ISO “resolution.”)
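The 1200 ppi figure follows from simple enlargement arithmetic, sketched here. I am assuming a 35mm frame of nominally 1 x 1.5 inches (it is 24 x 36 mm, so slightly smaller in practice), and the function name is my own.

```python
def required_scan_ppi(original_long_in, print_long_in, print_ppi=300):
    """Sampling rate needed so the enlargement still has `print_ppi`."""
    return print_long_in * print_ppi / original_long_in

# Assumption: a 35mm frame treated as nominally 1 x 1.5 inches (24 x 36 mm).
print(required_scan_ppi(original_long_in=1.5, print_long_in=6))
```

Enlarging the 1.5 inch side to 6 inches is a fourfold enlargement, so the scan needs four times the print resolution: 1200 ppi of true resolution for a 300 dpi print.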
But once you reach the limits of the actual device resolution and the actual detail in the original, then additional enlargement doesn’t help. I think I have a couple of illustrations of this in my most recent blog article which discussed enlargement (http://go.usa.gov/j2q4). I don’t believe you can magnify a newspaper image and find additional detail in a scan with a true resolution above 300 ppi.
Finally, Apple claims that human vision is only capable of resolving 326ppi – see their Retina display marketing materials. There is a lot of quibbling about that number – but most still claim not more than 450ppi.
In the end, I doubt that you will see any significant improvement in an image of reflective materials beyond an ISO standard resolution of 400 ppi. I doubt you will find any improved image quality on consumer scanners above an ISO standard resolution of 1200 ppi unless you scan 35mm film in a specialized, high quality film scanner.
Two final notes. I believe the costs of higher resolution are vastly underestimated. Scan time will increase significantly with increased resolution. Transfer times increase, processing times increase. The expertise needed increases to get better quality. Storage and multiple backups increase. Consumer hard disk drives are not – repeat and emphasize – not archival devices. Your children and grandchildren may not be able to retrieve images from a hard disk even 15 years from now. Increased image size means greatly increased cost.
And I believe 300 ppi / 400 ppi is future proof. At least for reflective materials, I don’t believe we will see greater detail in a 1200ppi scan no matter how improved future equipment is.
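The storage side of that cost argument can be made concrete: pixel count grows with the square of the sampling rate. This sketch uses a hypothetical function name and takes 300 ppi as the baseline purely for comparison.

```python
def size_multiplier(new_ppi: int, base_ppi: int = 300) -> float:
    """How many times more pixels a scan has than one made at `base_ppi`."""
    return (new_ppi / base_ppi) ** 2

for ppi in (300, 400, 600, 1200):
    print(f"{ppi} ppi: {size_multiplier(ppi):.2f}x the pixels of a 300 ppi scan")
```

Doubling the sampling rate to 600 ppi quadruples the pixel data, and 1200 ppi means sixteen times as much to scan, transfer, process, store and back up.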
Regarding JPEGs, is there any advantage (in terms of quality) to scanning documents/photos as a TIFF and then scanning as a JPEG or just scanning as TIFF and re-saving the TIFF as a JPEG?
Would the resulting JPEG be of any better or less quality if it was a scanning JPEG or a TIFF re-saved as a JPEG?