Personal Digital Archiving: The Basics of Scanning

A digital image consists of tens of thousands of tiny dots or squares called pixels.

Although the National Digital Information Infrastructure and Preservation Program and the National Digital Stewardship Alliance focus on digital preservation and access, many of the personal digital archiving questions that the general public ask us are about scanning. Though scanning is a separate issue from digital preservation, scanning does generate digital files that need to be preserved. In the interest of helping people create the best possible digitization of their photos and documents for preservation, we have produced a “how to” video that we will be releasing soon. In the meantime here is a brief, basic introduction to scanning that we hope will demystify the process.

When you scan a paper photograph, the scanning device creates a digital version of the photo made up of tens of thousands of tiny dots or squares called pixels. This paper-to-digital conversion process is digitizing, though the act of digitizing is not limited to images or text; you can also digitize video and audio. In this post we will just look at scanning and digitizing photos.

Step 1: Prepare Scanner and Photos
The first step in the process is to clean the scanner and photos. Smudges, dust and hair will scan into your digital photo and ruin its purity. Wipe the scanner glass with a plain, lint-free cloth dampened with water. Do not spray the water directly onto the scanner; spray it on the cloth. Wipe the inside of the scanner lid too.

Next, lightly wipe off the photograph with a dry anti-static cloth. You can find these cloths in a camera store. They not only clean the photo, they help reduce static electricity on the photo and prevent it from attracting more dust particles and hair. Place the cleaned photo face down on the scanner. Do not touch the glass when you place the photo. The natural oils on your fingers may smudge the glass and you’ll have to clean the glass again. Try to slide the photo against the side of the scanner glass and up into the corner for the best alignment.

Some software will detect separate photos or items and scan them as individual files. Leave about one-half inch between the photos to help the software recognize them as separate items. Close the lid gently so the photos remain aligned with the scanner glass edges.

Step 2: Set Scan Properties – DPI and Bit Depth
Once you have prepared your scanner and photos, set the properties for the photo scan. On your computer, open the scanner software. There are two important settings to look for:

  • details of the digital image, such as the number of dots per inch and whether the image is color or grayscale.
  • the file format to save the image as – such as TIFF or JPEG – and the type of image compression (if any) you want on that file type.

Dots per inch – or DPI – is a measurement of pixel density. Image specialists use the more precise term “pixels per inch” or PPI. However, since documentation for commercial scanners almost exclusively uses the term DPI, we will stick with the term “DPI.”

The more pixels packed into each inch of the image, the greater the potential detail an image can hold. An image with 200 dots per inch potentially displays more detail than the same image with 72 dots per inch. There are optimum DPI settings for different photo sizes and types, but more DPI is not always better; there is a DPI limit or threshold. Beyond that limit, there is nothing more of value that increased DPI can add. You can only scan so much detail from a photo.

  • For most personal work, 300 to 400 dpi is satisfactory for snapshot prints and for common enlargements at 4″x6″, 5″x7″ or 8″x10″ in size.
  • Since very small prints or photographic slides contain a lot of detail in a small area, capture more dots per inch, around 1400 to 1500 DPI.
  • Photo negatives also hold a lot of detail, so for negatives, select a minimum of 1500 to 2000 dpi. Remember that increasing dots per inch increases data and increases the file size.
  • If you want a close up, such as an enlargement of a face in a crowd, select the area and increase the DPI until you get the height and width that you want. Keep in mind that the quality may not be very good.
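To make the DPI numbers above concrete, here is a minimal Python sketch of the arithmetic. The function names are illustrative and do not come from any scanner software, and the 300 DPI print resolution is an assumed rule of thumb rather than a fixed requirement.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Return (width_px, height_px) for a scan of the given size in inches."""
    return round(width_in * dpi), round(height_in * dpi)


def scan_dpi_for_print(original_in, print_in, print_dpi=300):
    """Estimate the scan DPI needed to enlarge one side of a photo.

    original_in is the side length of the paper original, print_in is the
    same side on the enlargement. print_dpi=300 is an assumption.
    """
    return print_dpi * print_in / original_in


# A 4x6 inch snapshot scanned at 300 DPI
print(pixel_dimensions(4, 6, 300))   # (1200, 1800)

# Scan setting for turning a 4 inch side into an 8 inch side
print(scan_dpi_for_print(4, 8))      # 600.0
```

Doubling the DPI doubles both the width and the height in pixels, so the total number of pixels goes up four times.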

Some software may enable you to adjust the bit depth, the number of bits of data per pixel. The more bits per pixel, the more information the pixel contains and the richer the digital palette you have to work with. The most commonly used scan settings are 8 bits per pixel for grayscale (some scanners may also offer 16) and 24 bits per pixel for color (although some scanners may also offer 48). More bits per pixel give you a bit more to work with if you intend to edit your digital photos later. But for routine scanning, where you do not plan to edit much or where the quality of the outcome is not such a big deal, select 8-bit grayscale or 24-bit color. Remember that increasing bits per pixel increases the amount of data in the file and so increases the size of the file.
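As a rough illustration of how DPI and bit depth together drive file size, the following Python sketch estimates the uncompressed image data for a scan. The function name is hypothetical, and real files will differ somewhat because of headers, metadata and any compression.

```python
def uncompressed_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Estimate the raw image data size in bytes, before any compression."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


# A 4x6 inch print scanned at 300 DPI, at three bit depths
for bits in (8, 24, 48):
    size = uncompressed_size_bytes(4, 6, 300, bits)
    print(f"{bits}-bit: {size / 1_000_000:.2f} MB")
```

For this example the estimate comes to 2.16 MB at 8-bit grayscale, 6.48 MB at 24-bit color and 12.96 MB at 48-bit color.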

If the paper photo you want to scan is black and white, and you see a menu choice of grayscale or color, select “grayscale.” If the paper photo is color, select “color.”

Step 2: File Format and Compression
Scanner software saves your scanned photo as a digital file and the most common file-type options are TIFF and JPEG. TIFF, the preferred format for digital photo preservation, retains the maximum amount of digital data that your scanner captures. If you have a choice, save your original master scan as a TIFF.

If file storage space is an issue, you can compress a file and reduce its size. Scanning software may offer an option of LZW compression for a TIFF file, which cuts the size of the TIFF file without any loss of digital data. This is called “lossless” compression. By contrast, saving an image as a JPEG employs “lossy” compression, so named because a JPEG file is compressed by its nature and loses some of the digital data that the scanner captured. You can select JPEG quality levels and degrees of compression, from “least compression” (the least lost data and the highest JPEG quality) to “most compression” (the most lost data and the lowest JPEG quality).
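The difference between lossless and lossy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard zlib module as a stand-in for a lossless method (it is not LZW), and it mimics lossy behavior by simply discarding the low bits of each pixel value, which is far cruder than what JPEG actually does.

```python
import zlib

# Stand-in for a strip of 8-bit grayscale pixel values (a repeating gradient).
original = bytes(range(256)) * 4

# Lossless: compress, decompress, and get back exactly the same bytes.
packed = zlib.compress(original)
restored = zlib.decompress(packed)
print(restored == original)          # True
print(len(packed) < len(original))   # True

# Lossy (crude illustration): throw away the 4 low bits of every pixel.
quantized = bytes(value & 0xF0 for value in original)
print(quantized == original)         # False: the discarded detail is gone
```

The lossless round trip returns every original value, while the lossy version cannot be restored from the reduced data alone.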

We recommend that if you intend to modify or work with a digital photo, you save two versions of it: a master version and a working copy. Keep a TIFF file as the master file and store it safely with your other personal digital archives; use a JPEG version as the working copy. The JPEG file will be smaller and more convenient to email or post on social media sites. Edit, modify and work with the JPEG. You can always make a fresh JPEG copy of the master TIFF file.

Once you have selected the file type and set your bit depth and DPI, you are ready to scan your photo. Preview the scan, if you have that option, and look it over to make sure you haven’t picked up any dust, hair or artifacts. And check that the photo is aligned properly. Then select “scan.”

Renaming a file does not affect the contents of the file.

After scanning the file, some scanning software will prompt you to assign a file name. Other software will assign a file name automatically (usually some alphanumeric name like “DC2148793.jpg”), and you can either keep that name or change it. To change the file name on a PC, right-click the file and select “rename.” On a Mac, control-click and select “rename.” Renaming the file will not affect the contents of the file. We recommend that you rename the file to help you find it later. Many people include the date in the file name, at least the year or the combined year and month. If your file names lead off with year-month, followed by a descriptive word or two, then the files will sort in chronological order in your computer folder.
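Here is a small Python sketch of why the year-month naming pattern works. The file names are made up for illustration; the point is only that an ordinary alphabetical sort of such names is also a chronological sort.

```python
# Hypothetical file names: year-month first, then a short description.
names = [
    "1987-06_lake-picnic.tif",
    "1952-11_grandma-portrait.tif",
    "1987-01_new-years-party.tif",
    "1969-08_family-reunion.tif",
]

# A plain alphabetical sort, like sorting by name in a file browser,
# also puts these files in chronological order.
for name in sorted(names):
    print(name)
```

This only works if the year comes first and the month is always written with two digits (06 rather than 6).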

Remove each photo from the scanner by slipping a piece of paper under it and lifting it. Avoid touching the glass with your fingers.

As soon as possible, back up your digital photos in a few separate places. Every five years or so, migrate your personal digital archives to a new storage medium in order to avoid having your collection stuck on some obsolete media.

20 Comments

  1. David Stong
    March 28, 2014 at 9:21 am

    Thanks; I’ll share this. One setting that seems to be a default on the personal scanners I’ve worked with is something like “Auto correct” or “Auto levels.” The scanning software will automatically sharpen and correct unbalanced tones when you scan. I generally turn that off for the cleanest scan, preferring to take care of sharpness, tone and color in another processing package if needed. My thought has been that if it happens automatically in the scanner, it’s altering the image data and something more pristine should be the goal for the preserved copy. Have I been blowing it?

  2. Mike Ashenfelder
    March 28, 2014 at 3:05 pm

    David,

    You have been doing the right thing by scanning the image into what you describe as a clean and pristine file. Aside from the basic scan settings like DPI and color/grayscale, the less you do to the original scan at the scanning stage, the better.

    Editing, as you rightly practice it, should be a different process from scanning. Make a copy of the original scanned file, back up the original, and use photo-editing software to edit the copy for sharpness, contrast, red eye, cropping or whatever. Past a certain point in the editing process you cannot undo the edits, so it is better to modify a copy than the original. You can always make new copies of the clean and pristine original and try your edits again.

    Mike

  3. Pat Yo
    March 29, 2014 at 3:04 am

    Mike, I’ve been scanning to PNG, as it is also a lossless format. Any thoughts on what’s good vs. bad about using this format over TIFF?

    Thanks.

  4. Juanita Robertson
    March 29, 2014 at 8:55 am

    Thanks for letting this be printed. I have many old family photos I’m trying to save and share with family. The hints of keeping the scanner clean are most helpful. Thank you again.

  5. Susan Steeble
    March 29, 2014 at 10:13 am

    Dear Mike,

    I was referred to this article by Eastman’s Online Genealogy Newsletter. I read it with interest because, in the past couple years, I’ve scanned thousands of old family photos (and still have many more to do). Through trial and error, I’ve learned how important it is to keep the scanner clean and dust-free. When I was scanning large items, I needed to weight down the material so that it would stay aligned properly when I closed the scanner cover. I scanned the photos at 100% but then printed them out at 80% or less, so that I would not lose data. I followed the practice you recommended for scanning old photos “as is” the first time and then making copies for editing, cropping, enhancing contrast, and touch-ups. The photos and pages were scanned at 300 dpi and saved as JPEGs. My scanner is a Canon CanoScan LiDE 100.

    When I printed out the individual scanned photos on photo paper, they looked great–no obvious loss of detail. I then glued them to cardstock to make scrapbook pages. Eventually, I scanned the completed 8-1/2 x 11 scrapbook pages (at 100%) and sent them off to professional companies for printing and binding (there was no opportunity for me to proof them). When I received the printed and bound books, I was dismayed that some photos were very streaky (either horizontally or vertically)–but the background cardstock and other photos on the same page were not streaky. Therefore, I have to conclude that the streaks were not due to the printing process but to some error on my part. But I don’t know what I did wrong!!

    Can you suggest possible factors in my technique that might have resulted in streakiness? I want to avoid making the same mistake when I work on my next scrapbook.

    Thanks very much,

    Susan
    Baltimore, MD

  6. Kelvin
    March 29, 2014 at 2:00 pm

    Old photos can be fragile and brittle. If pulling them out of albums, scanning, then transferring to PC and cropping manually, and putting them back in the albums sounds like a pain, it is. An alternative is to use your smart phone or digital cam – most are 8MP or more. For iPhone or iPad users, there’s an app called Pic Scanner that will let you scan 2 to 4 photos at a time, and crop them automatically and accurately. Very fast.

  7. C Holm
    March 29, 2014 at 2:25 pm

    Consistently missing info from these sites—scan for final print size. This rarely gets mentioned in these basic instructions. I have sadly been tasked innumerable times with trying to restore scanned images done by those who did not get taught that step. Anyone wanting an 8×10 print from a 4×6 original MUST scan at at least 600dpi or they will be extremely disappointed w resulting pixelated print lacking details it could have portrayed.
    Original size print x 300dpi scan = same size print
    Original size print x 600dpi scan =twice the size print
    Original size print x 900 or 1000 or closest setting dpi = three times size print
    Look online. Pros scan slides 2400 or 3600 or 4800dpi. Pump pixels in. Get best results.

  8. Scott W. Tilden
    March 29, 2014 at 6:27 pm

    If you’ve made it this far without running away from your computer, a few more baby steps can help you take a giant next step.

    The compression well explained in the article is additive and accumulative. So if you modify your file and save it as a JPEG, then decide later to do something else and save it again, you’ll be adding additional JPEG compression to the original JPEG compression. You’ll start to see degradation akin to what we’ve always seen when we make a copy of a copy of a copy.

    The best workflow is to immediately rename and file a master master copy of the scan. Then, make your crops, color adjustments or whatever on a copy of the master master TIFF file. Maybe even save a few interim TIFF copies (or even native .PSD file copies) along the way just in case.

    In case of what? In case the editor wishes there were “just a bit more out here on the left crop” or “less green in that tree.” If all you have is a JPEG, you can’t go back…and the next rev will be of slightly less quality.

    And if they decide to use your photo on the cover at higher resolution, you can just go to your last hi-res master and save out a new version. You don’t have to start over and try to remember all your interim changes.

    There will (someday) come a time when the boss will want the photo in some other component colors for Hexachrome or HiDef TV or something we don’t even know yet. Having a trail of your versions will make those times oh-so-easy…and make you look oh-so-smart.

    You’ll find yourself in the best of all worlds.

  9. Martha Chapin
    March 31, 2014 at 8:34 am

    Thanks for the tutorial. We are studying digitization issues at library school and look forward to the video as well.

  10. Mike Ashenfelder
    March 31, 2014 at 9:12 am

    Thanks for your comment, Martha. And if you have any suggestions on how we can improve our information, please let us know.

  11. Mike Ashenfelder
    March 31, 2014 at 9:48 am

    Well put, Scott. Thanks for writing.

  12. Cathi M.
    March 31, 2014 at 2:46 pm

    Some comments on desirable features for a historical documents research scanner, especially one intended for field use, would be appreciated.

    For example, I understand that the wand – swipe units and units with autofeeds are not permitted in some archives, including NARA.

    What are the differences in quality between a camera (or a camera-phone, or pad-camera) and a scanner, and considering the luggage restrictions in travel, can you do as well in the field with one (even without the flash)?

  13. John Small
    March 31, 2014 at 2:47 pm

    TO:Susan
    The vertical and/or horizontal streaks are linear artifacts that occur when an edge of a photo is rendered by the scanner. The edge will appear as a line. For example, one can often observe this phenomenon where one edge of a sheet of paper appears as a line: a line our eyes don’t register but that, under the right circumstances, is rendered as a line by the scanner. In short, scanners often consistently render one edge of a sheet as a line. Very annoying, of course, requiring a careful crop or other editing to correct prior to printing.

  14. Carl Fleischhauer
    March 31, 2014 at 3:39 pm

    Response to Pat Yo, who asked why everyone gravitates to TIFF as a master format when scanning, when many features and capabilities are also offered by PNG? (And PNG has one that TIFF does not: widespread browser support.) Pat’s question is a good one, and (gulp) one for which I cannot supply a really compelling answer. So I will shoot from the hip, confident that wise folks will chime in with better information.

    In professional and institutional settings, TIFF is preferred in part because we all have a long history with it, and it has served us well. At the Library of Congress, we have used TIFF in digitizing programs since the 1980s; PNG turned up later, in 1996, with standardization following in 2004. Many in the digital library community have TIFF tools and TIFF experience and this leads to a bit of inertia, in this case helpful and appropriate inertia.

    As is often the case, the Wikipedia article on PNG is excellent and thorough (http://en.wikipedia.org/wiki/Portable_Network_Graphics), and it does make the format sound very appealing. We’d love to hear from folks who use PNG as a mastering (as compared to a service-access) format.

    It is worth saying that in our (collective) institutional setting, there is a rather different back-and-forth over the use of TIFF for master images. The Library and other big federal agencies are making or receiving hundreds of thousands of images for certain classes of content, and this is pressing our capacities for receipt and ingestion, movement in networks, and digital storage. Examples include scanned microfilm (think newspapers or books) and scanned catalog cards. Some have asked, “Could we not apply lossy compression (maybe JPEG 2000) to these classes and reduce our costs?” Others have asked about lossless JPEG 2000 for higher-end content. (Our Packard Campus audio-visual center already uses lossless JPEG 2000 for reformatted video.) The point is this: when some of us think about alternatives to TIFF, we have been looking at JPEG 2000. There ain’t much JPEG 2000 software out there, however, and this makes JPEG 2000 an imperfect choice for “at-home” scanning operations at this time.

  15. Mike Ashenfelder
    March 31, 2014 at 4:36 pm

    [This response to C Holm is from digital image expert David Riecks, of PhotoMetadata.org and ControlledVocabulary.com.]

    You have a good point. However, the vast majority of people will probably not know the largest print size they will want at some future unspecified point in time. Therefore, as you note, we need to balance the potential future print size with what this will take in terms of storage over time.

    In respect to scanning prints, the inherent low dynamic range and often poor quality of originals means that you are unlikely to get much more detail (other than dust and noise) by scanning at extremely high resolutions. At most, you’ll get a smoother image (rather than pixelated) but the end result is probably not worth the added storage. When possible, scan the largest print you have, as it’s always better to make a smaller print from a larger original. Going the other direction usually results in disappointment.

    In addition, the surface of the print can have a huge impact on the scan quality. This type of image quality is rarely discussed when scanning (no doubt because it is very hard to explain), and because there are so many factors to consider. As one example, prints that were done on “silk” finish paper (named for a pebbly finish popular in the 60s and 70s), or other textured prints often require special techniques when scanning to reduce defects. With these “pebbly” prints, the light from the scanner will often capture the peaks and valleys of the undulating print surface in such a way that the pattern is preserved in the scan. No matter what DPI or bit depth is chosen, the image is going to look poor and even poorer if you enlarge it. There are work-arounds with these types of prints. I’ve found it best to scan the image twice with the second scan done at 180 degrees to the first. In an image manipulation program that allows layers (like Adobe Photoshop) you can bring the second scan and place it on top of the first. You then match them up (rotate one 180 to match image details), and then you can use layer modes like lighten or darken to remove the peaks (white spots) or valleys (dark spots) so that these anomalies cancel each other out.

    If you are talking about negatives or positives (slides or transparencies), and using dedicated transmission scanners (or even “camera scans”), then generally a higher ppi (pixels per inch) setting is advised. Again, what is best for your situation is always a balancing act. The other factor, even with negatives and slides, is to evaluate the originals critically before you begin. The vast majority of photos taken with hand-held cameras are generally not that sharp and will not hold up to extreme enlargements. Images shot on a tripod with cable release (and possibly using larger format films) are usually much better detailed and will hold up to a higher level of scanning.

    Scanning a slightly out of focus or blurred slide at a super high resolution really makes little sense unless there is some intrinsic value to the image (like scarcity or rareness). So it’s often worthwhile to critically evaluate images using a special magnifying glass (a loupe) to save yourself some time.

    One option often overlooked is “scan early and scan often.” By that I mean that it may make sense to scan more images at a lower resolution to start with. By sharing those images early in the process, you’ll learn which have more value. Those that are deemed worthy can then be singled out for scanning a second time at higher resolution or with a better scanner.

    Lastly, scanners, like cameras, are often not the biggest limiting factor to a good quality scan. A good photographer knows the limitations of his cameras and works within those limitations to create the best work they can. Likewise a good scanner operator knows the limitations of his scanner/capture rig and works within those limitations. Often mid-range scanners, coupled with more advanced software tools (like the very moderately priced “Vuescan” from Hamrick.com), can be used to eke out better scans by expanding bit depth, using multiple scan passes and other advanced options. If you are just slapping down the photo and hitting the “auto scan” button, you get what you pay for (and that’s not likely to be much).

    David Riecks

  16. Mary Margaret
    April 1, 2014 at 2:44 pm

    You suggest that we select “grayscale” if the photo is black and white. I get much better results when I use the “color” option to scan black and white photos. I can always use photo software to change color to grayscale, but not the other way around.

    Thanks for a good article. Looking forward to the video.

  17. David Riecks
    April 1, 2014 at 5:10 pm

    There were some additional comments about PNG vs. TIFF that appeared later yesterday, after I’d responded to Michael’s request for comment.

    One thing that Carl doesn’t mention is that PNG wasn’t really set up to deal with embedded metadata in the beginning. As such it was generally avoided as a format, as losing the IPTC or XMP info you’d spent time embedding was a real pain.

    Originally PNG or “portable network graphic” files were designed to be small versions for web use, and may be suitable in place of GIF, but not as an archive format when compared to TIFF.

    Like GIFs, PNG’s can have “transparency” (a useful web attribute for those that wish to avoid squared edges on a background). They can also support several RGB color spaces, but are not designed for CMYK.

    While it’s possible to save a file from Photoshop as a PNG with embedded metadata, many other applications will not view the file as having any embedded info as the information is not stored within the image itself.

    I’ve not thoroughly investigated why this is so, but when I tested, I found that images saved as PNGs with Photoshop appeared to have metadata when viewed in the Expression Media cataloging program, but not in Photo Mechanic. Those same images, when uploaded to a website and then downloaded, appeared to have no image metadata in either program. If I wrote metadata with Photo Mechanic, then they could be read in either Photoshop’s File info or Expression Media. Given this complexity, I would be very reluctant to endorse the use of PNG for long term “archive” use.

    David

  18. Mike Ashenfelder
    April 2, 2014 at 9:33 am

    Cathi M.,

    The purpose of this scanning article is to enable you to make informed decisions when scanning files for digital preservation, which is why we included information on DPI and bit-depth settings. Most wands give you little or no such control over the resulting file and will not even let you choose the JPEG or TIFF file type. Similarly, cell-phone cameras and pad cameras don’t offer you the level of control that desktop scanners and their companion software do. So-called “scanning” apps are available for pads and smart phones but they too offer little control over the resulting file. The app I tested for the phone saves the image automatically as a PDF file (which should never be your first choice as a master file for digital preservation) with compression choices of “small,” “normal” and “large”; the app I tested for the pad only gives you an option to save as JPEG or PDF. Both apps had their own default and “auto” settings, so you have to accept the end result that they give you.

    Scans or photos from wands, cell phones and pads might be good enough for a quick reference image taken in the field but not for the highest-quality permanent digital record.

    I hope this helps.

    Mike

  19. Pat Yo
    April 9, 2014 at 10:43 pm

    My thanks to Carl and David for their input regarding my question on PNG format. Now I’ll go ruminate on whether to continue w/PNG as my default scan format or convert to TIFF. Hmmmm.

  20. Stu H
    May 19, 2014 at 3:49 pm

    One thing that this article doesn’t cover is the issue of color. For most lay people, those who don’t do this professionally, you need to be aware that many scanning applications written by manufacturers tend to over enhance colors when scanning. They look fine on computer screens (which use Red, Green, Blue pixels) but look too vivid when printed electronically because ink and toner printers use CMYK (Cyan, Magenta, Yellow and Black) colors. This can create a real headache for scanning items on acid paper that yellow and brown over the years. If you dial down the color intensity in your scan controls a bit, you may find the results are easier to look at and closer to the original.
