The Car Library Project: Guide to Improved Identification
and Classification of Digital Photos, Images and Documents

Google announced on February 12, 2016 that Picasa, its desktop photo editing and management program, would not be supported after March, 2016.  The Picasa Web Albums, the online feature of this program, would transition to Google Photos.  

This webpage recommends using Picasa for basic photo captioning and metadata tagging, including location tagging.  "Desktop" Picasa will function indefinitely for this use.  Software ("app") program  recommendations will be updated as replacements become known.   

The primary webpage of the CarLibrary.org digital archive project promotes the use of the open-source Greenstone Digital Library program by car historians, collectors, museums and collections, to encourage the creation of digital archives.

This webpage describes using Picasa and other Windows programs to improve the identification - and eventual classification - of digital photos and scanned documents and photos.  This is not only highly useful for Greenstone, but for many other programs which classify and display digital assets.

The primary topics of this guide are:

1. A Brief Introduction to Metadata and Embedded Metadata

2. Picasa - Software to Display Photos and Embed Metadata - Why and How

3. ExifTool - Another Method to Create and Use Embedded Metadata

4. Archiving and Classifying Photos with Greenstone

5. Preliminary Recommendations  

This guide was started in December 2012 through trials and tests of Picasa, Greenstone and other Windows utility programs. The trials are on the "Trials and Tests with Picasa, Metadata, Greenstone and the ExifTool" page in chronological order.

Preliminary recommendations are also near the end of this webpage.  In my experience, these recommended steps will minimize the  re-entry of data and reduce the duplication of processing steps. As more is learned, these recommendations will be refined.

The Problem

Even in personal collections, there are always many digital photographs and image documents to identify for future use.  If captions (or similar identifying data) can be readily added in a (standard) photo organizing program, there is a better chance the identification process will be "actually done" rather than delayed or never done!

Although identification can be done by adding "metadata" to each image in the Greenstone digital library program - and "externally" to the photo/image files with programs such as Excel -  this extra step rarely results in data with long-term links to the image/photo.

This important archive issue is discussed on a Library of Congress blog: "Mission Possible: An Easy Way to Add Descriptions to Digital Photos."

1.  A Brief Introduction to Metadata and Embedded Metadata

"Metadata" has been around for a very long time, but the term only appeared in 1968.  One example of "metadata" is the information on the cards in a library card catalog, where book titles, author, summary, etc. can be found for the books of a library collection.  A common definition of metadata is "data about data" or more fully, "'content about individual instances of data content' or metacontent, the type of data usually found in library catalogues."  Wikipedia provides a great explanation and history of metadata.

In the digital world, library card catalogs are now on databases and search terms are used to search for the metadata - to locate the desired books!  Digital files nearly always contain their own metadata.  For example, Microsoft Word and Excel files contain file creation date and frequently list the "author", which can be seen in "File Properties".  Music MP3 files contain much metadata, such as song title, length, artist, etc.

Images/photos from digital cameras contain a great amount of metadata, including photo date, camera model, shutter speed, etc.  As more detailed in the Wikipedia reference above, photo metadata appears in specific standard categories, as will be seen in examples below.

Although not a common term, "external metadata" is the type found in a card catalog or on a list of books or photos.  The type of metadata in an Excel file or digital photo is "embedded metadata".  Even the Smithsonian Institution recognizes the benefits of embedded metadata.

Why Use Embedded Metadata?

Embedded metadata in images (and other digital files) is preferred and has several benefits:

A.  This is the equivalent of "writing on the back of a photo" and important identifying information will be with the image or document long into the future.

B.  This metadata can be used by many other programs to classify or locate a specific digital file based on single items of the embedded metadata.  For example, if one or more photos are captioned with "1963 Chicago picnic with Grandma Smith", searching on "1963", or "picnic" or "Grandma Smith" will find those photos.

C.  Using embedded metadata can significantly reduce data input and make descriptions more consistent.  Rather than enter "1963 Chicago picnic with Grandma Smith" more than once in a database or spreadsheet to describe a photo, it can be extracted from the photo's metadata to produce a spreadsheet list.  Less data input usually means fewer errors!

D.  Metadata added by offline/offsite individuals (or by volunteers) working on a subset of the digital files can be a "gateway activity" to forming a digital archive.
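
As a rough illustration of benefits B and C, the following Python sketch assumes the captions have already been extracted from the photos into a simple dictionary.  The file names and the second caption are made-up sample data, and the function names are only for illustration.  It shows a keyword search over the captions and a spreadsheet-ready list; it does not represent how Picasa or Greenstone work internally.

```python
import csv
import io

# Hypothetical sample data: file name -> caption already read from each photo.
captions = {
    "IMG_0001.jpg": "1963 Chicago picnic with Grandma Smith",
    "IMG_0002.jpg": "1965 Detroit auto show",
}

def find_photos(captions, term):
    """Return file names whose caption contains the term (case-insensitive)."""
    needle = term.lower()
    return sorted(name for name, text in captions.items() if needle in text.lower())

def captions_to_csv(captions):
    """Build CSV text with one row per photo, ready to open in a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["File", "Caption"])
    for name in sorted(captions):
        writer.writerow([name, captions[name]])
    return buffer.getvalue()

print(find_photos(captions, "picnic"))
print(captions_to_csv(captions))
```

Because the caption is typed once (into the photo) and only read afterwards, the search results and the spreadsheet list cannot drift apart.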

2.  Picasa - Software to Display Photos and Embed Metadata - Why and How

I've personally been using the (free) Picasa program from Google for several years to make minor edits to digital photos, such as cropping and light correction. I also use Picasa to organize, by year and month of creation, my (thousands of) digital photographs, many images scanned from slides and negatives and images of scanned documents.  Picasa provides a function to make "albums" with subsets of these photos without changing the original directory (location) of the images.  

Sorry to say, only a fraction of my photos have captions, even though this step is not difficult in Picasa.  However, after learning that Picasa captions can be extracted to Excel files to make lists or used by a digital library/archive program (Greenstone and others), there is now a great incentive to caption everything!

There are other programs that will caption - or otherwise create more types of metadata in digital photos - so Picasa need not be the "one size that fits all".  For example, the ExifToolGUI, discussed in section 3 below, also provides an easy method to embed metadata after setting up its Workspace manager.  Metadata written by ExifToolGUI in the correct categories described below will appear as captions and keywords in Picasa!

If you use another program, let me know and I'll add your experiences to this guide.

However, try Picasa!

A.  If you don't have Picasa, download it from Google and install it the same as any Windows program.

B.  Picasa will likely try to find all the digital images on your computer.  You can control this by using the "Tools" and "Folder Manager" menu options to select or de-select specific folders to add to Picasa.  For trial and learning purposes, you only need 10-25 digital images in Picasa. There's no harm in letting Picasa "scan everything" except the computer may be busy for a while.

C.  Click on any photo that Picasa displays as a thumbnail.  Under the large image, it states "Make a caption".  Just over-type this with a suitable, descriptive caption.  When you move to the next photo, a short delay indicates your caption has been "embedded" in both the "XMP Description" and "IPTC Caption-Abstract" metadata categories.   These are two of six standard categories of metadata available for photos.  They are listed by Wikipedia as:

  • IPTC Information Interchange Model IIM (International Press Telecommunications Council),

  • IPTC Core Schema for XMP

  • XMP – Extensible Metadata Platform (an ISO standard)

  • Exif – Exchangeable image file format

  • Dublin Core (Dublin Core Metadata Initiative – DCMI)

  • PLUS (Picture Licensing Universal System).

D.  Photos can be geo-tagged, using the "red pin" (on the lower right of the Picasa screen).  Geo-tags are embedded as "EXIF GPSLatitude", "EXIF GPSLongitude", etc. in decimal format.

E.  Multiple keywords ("tags") can be added to any photo using the "Tag" function (also on lower right).  Tag(s) are embedded in the "XMP Subject" and "IPTC Keyword" categories with multiple tags separated by an asterisk ("*").

F.  The photo date is embedded as "EXIF DateTimeOriginal".

G.  You can be creative with captions although using "tags" with captions provides more flexibility.  If you also put a unique ID (an "accession number" which is a museum "best practices" technique) in the Picasa caption field, a text search on the number would locate the photo.  That same number can be written on the original photograph or slide before scanning. The digital representation can be readily cross-referenced and located through the Picasa search function or in many collections management programs.

H.  You can confirm that digital photos captioned in Picasa have this embedded metadata by inspecting each photo in Adobe Photoshop Elements (or a similar program) with the "File info" command.  It shows that the captions are embedded in the "XMP Description", "dc:Description (alt container)" and TIFF "Image Description" categories.
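
The check in step H can also be sketched with the ExifTool command line program (introduced in section 3).  The Python below is a minimal sketch, assuming ExifTool is installed and on the PATH; the function names are hypothetical, and the tag list simply mirrors the categories named in steps C to F.  The -j, -G and -n options are standard ExifTool options for JSON output, group names and numeric values.

```python
import json
import subprocess

# Tags named in steps C to F above, written as ExifTool group:tag names.
TAGS = [
    "XMP:Description",
    "IPTC:Caption-Abstract",
    "XMP:Subject",
    "IPTC:Keywords",
    "EXIF:DateTimeOriginal",
    "EXIF:GPSLatitude",
    "EXIF:GPSLongitude",
]

def build_read_command(photo_path):
    """Build an ExifTool command that reports the tags as JSON.

    -j asks for JSON output, -G prefixes each tag with its group name,
    and -n prints numeric values, so GPS coordinates appear as decimal
    degrees rather than degrees/minutes/seconds.
    """
    return ["exiftool", "-j", "-G", "-n"] + ["-" + tag for tag in TAGS] + [photo_path]

def parse_exiftool_json(text):
    """ExifTool's JSON output is a list with one object per file."""
    return json.loads(text)[0]

def read_embedded_metadata(photo_path):
    """Run ExifTool (assumed to be installed and on the PATH) on one photo."""
    result = subprocess.run(build_read_command(photo_path),
                            capture_output=True, text=True, check=True)
    return parse_exiftool_json(result.stdout)
```

Calling read_embedded_metadata on a photo captioned in Picasa should return a dictionary in which the caption appears under both the XMP and IPTC keys; a tag that is missing from the dictionary was never embedded.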

Picasa has other features that are very useful for archive purposes.  None of the image edits (except the caption, tags and geotagging) actually change the original photo until it is "saved" or "exported" to a different folder.  Until then, the image edits are stored in a small, separate Picasa file.  If you have made image edits in Picasa, it is easy to export a folder or group of photos to a folder intended for adding to or archiving in a follow-on program, in either the original photo size or your choice of a smaller photo size.

At this stage, photos (or other digital assets) that have been captioned or tagged in Picasa - or other program - are ready for many future uses, especially ready identification by others at some future time.

One important use of digital assets is the creation of a digital library or archive, for personal use, business use or as part of a museum collection.  Section 4 below describes how to use/import captioned and tagged photos into the open-source Greenstone digital library software.

3. ExifTool - Another Method to Create and Use Embedded Metadata

Two other free programs are potentially very useful to further add identification data to digital photos:  "ReNamer" and "ExifTool".  ReNamer can change the file names for an entire folder of photos in many ways, including adding metadata.  For this sample group of photos, the caption was temporarily added to the file name as a prefix or suffix by selecting the "IPTC Caption" choice.  If only the "accession number" was put in the Picasa caption, this data could be easily added to a group of photos.

Phil Harvey's ExifTool has the ability to extract, add, copy or move nearly all types of metadata.  The basic program must run from a command line, but with the correct configuration, it is very powerful and promises to "do everything".  A download and very complete explanation of its functions are here.  Using the command line functions is described on the webpage referenced below.
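
To give a feel for the command line, here is a minimal Python sketch that builds an ExifTool command to embed a caption and keywords in the same categories Picasa uses.  It assumes ExifTool is installed and on the PATH; the function names are hypothetical and this is only one of several ways the same tags could be written.

```python
import subprocess

def build_write_command(photo_path, caption, keywords):
    """Build an ExifTool command that embeds a caption and keywords.

    The caption goes to the two categories Picasa uses for captions, and each
    keyword is added to the two categories Picasa uses for tags.
    """
    command = [
        "exiftool",
        "-XMP:Description=" + caption,
        "-IPTC:Caption-Abstract=" + caption,
    ]
    for word in keywords:
        # "+=" adds an item to a list-type tag instead of replacing the list.
        command.append("-XMP:Subject+=" + word)
        command.append("-IPTC:Keywords+=" + word)
    command.append(photo_path)
    return command

def write_caption(photo_path, caption, keywords):
    """Run the command; ExifTool is assumed to be installed and on the PATH."""
    subprocess.run(build_write_command(photo_path, caption, keywords), check=True)
```

By default ExifTool keeps the unmodified file alongside the new one with "_original" added to its name, which is a useful safety net while experimenting on a small trial folder.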

Bogdan Hrastnik has written a GUI (Graphical User Interface, Windows only) for ExifTool, which allows very easy access to many of the ExifTool functions.  The "how to" and download page for ExifToolGUI is here.  

For a guide to using both ExifTool and the ExifToolGUI, go to this webpage: ExifTool - Reading and Writing Embedded Metadata which is a section of this CarLibrary.org website.

4. Archiving and Classifying Photos with Greenstone

Other sections of this website - and many Internet guides - offer good instructions on downloading and installing Greenstone.  

A.  After the program is installed, the sample/trial captioned and tagged images can be dragged ("gathered") into Greenstone (version 2.85 for Windows is used for the following examples).  

B.  No other metadata needs to be added to any photograph in the Greenstone "Enrich" panel.

C.  In the "Design" panel, configure the Greenstone Image "plugin" to extract "OIDmetadata dc:Description".

D.  "Create" the collection.  When this is complete, "Preview the Collection" and you should see Picasa captions displayed in the "ex.XMP.Description" metadata.  

E.  This metadata category can be renamed in the Format panel as "Caption" for the search results display. Greenstone shows the correct Picasa caption with each digital photograph.  

F.  The screenshot below (Figure 1) shows that Greenstone has also extracted the photo's file name from "ex.Source" and the date it was taken from the EXIF metadata "ex.EXIF.DateTimeOriginal".

G.  Photos in this trial collection can also be found by searching for any desired text in the "Captions" or "Photo Dates" categories.

Figure 1- Screen shot shows browse results on Captions starting with "T"

H. Picasa Captions and Tags -

If you have added Picasa Tags (keywords), figure 2, below, shows results from browsing in the keywords category, specifically to display the photo ID numbers embedded as Picasa "tags."

Figure 2 - A display of keywords starting with "1", which show the photos with the trial ID numbers starting with "12", etc.  The Greenstone "search" function can also be used to locate a specific ID number - or other keyword.

A Greenstone test archive of 190 personal photos taken at the Mullin Automotive Museum was made using only embedded metadata added in Picasa.  The metadata includes location data for each photo: latitude and longitude.  The newest Greenstone, version 3.0, can use this data for map displays.

Note: the Greenstone lab team at the University of Waikato helped fix a bug that was preventing all images from being added to the collection.  It was a simple fix - select "unicode" as an option of "input_encoding" in the plugin for embedded metadata.

Another bug prevented viewing the embedded metadata in the "Enrich" panel for files with the uppercase "JPG" extension.  This was fixed by a simple edit to the "util.pm" file in the "perllib" directory.  Contact me for the bug fix sent by the Greenstone Users Group.

Figure 3 - An archive of Mullin museum photos, this is the initial display of "Captions". Picasa was used to identify each image with captions and "car make" and "car year" as tags/keywords.  The file names and photo dates are standard metadata embedded by the digital camera and extracted by Greenstone automatically.

5. Preliminary Recommendations

These are based on these trials, practical considerations and guidelines for archives (U. S. National Archives and the Smithsonian Institution):

Digital Photographs:

a.  Use Picasa or the ExifToolGUI to put captions on each photo.  Captions will be very useful for later identifying the photo.  

b.  Use tags to add "keywords" to each photo or group of photos.

c.  If you decide to add a unique ID number to each photo in Picasa, make this the first tag.  Best museum/archive practice is to make this unique number an "accession number".  If you use the ExifToolGUI to add an accession number, put it in the DC:Identifier category.

d.  Use the Picasa geo-tag function (red pin), or a similar function in the ExifToolGUI, to locate each photo or group of photos on Google's maps.

e.  "Export" the photos from Picasa for use in Greenstone or other archive software at a resolution suitable for the archive's use.
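
Recommendation c can be sketched in a few lines of Python.  The "year.lot.item" number format, the file names and the tags below are all hypothetical sample choices, not a format required by Picasa, Greenstone or any archive standard; the point is only that a unique ID kept as the first tag makes each photo easy to look up.

```python
def make_accession_number(year, lot, item):
    """Format a hypothetical 'year.lot.item' number, e.g. 2013.004.0017."""
    return f"{year:04d}.{lot:03d}.{item:04d}"

def find_by_accession(photo_tags, number):
    """Return the file whose first tag equals the accession number, or None."""
    for name, tags in photo_tags.items():
        if tags and tags[0] == number:
            return name
    return None

# Hypothetical sample data: file name -> list of tags, ID number first.
photo_tags = {
    "scan_a.jpg": [make_accession_number(2013, 4, 17), "Bugatti", "1937"],
    "scan_b.jpg": [make_accession_number(2013, 4, 18), "Delahaye", "1939"],
}

print(find_by_accession(photo_tags, "2013.004.0018"))  # prints scan_b.jpg
```

Zero-padding the parts keeps the numbers sorting in the same order as they were assigned, which helps when browsing them as keywords.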

Scanned images:

a.  Scan at least at 300 dpi; some archivists recommend 400-600 dpi.

b.  If "archive quality" is not a concern, scanning to JPG format is "OK".

c.  If long-term preservation is a concern, scan to TIFF or PDF/A format (Note: "PDF/A" is a new standard for long-term archive storage and use of digital images and documents).

d.  Use the ExifTool, the ExifToolGUI or Picasa, as for digital photos above, to add captions, tags, geo-tags to each image.  

e.  Export the images, as above, from Picasa.  TIFF images will be exported to JPG format and image metadata will be preserved.

f.  However, Picasa does not recognize PDF/A formatted images.  Subject, keyword and other identifying data can be added in the ExifToolGUI or a PDF editor, such as Adobe Acrobat, ABBYY FineReader, or Lightning PDF Editor.

Scanned Slides and Negatives:

a.  If you have the original negative or slide for any image, scanning the slide or negative directly will almost always give better results than scanning the photo previously printed in a darkroom or with a digital printer.

b.  The same steps for scanned images apply, except the most common negative/slide format - 35 mm - should be scanned at 2800-4000 dpi.  This resolution should be within the optical resolution of your scanner.  Better quality scanners (usually those costing more than $100 or scanners that are not part of an "all-in-one" printer) will give better, near-professional archive quality results.

c.  The software included with your scanner may be adequate.  You should scan several slides or negatives and check the results to determine whether you need a software upgrade or alternative.
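
The arithmetic behind the 2800-4000 dpi recommendation can be checked with a short Python sketch.  It assumes the nominal 36 mm x 24 mm size of a 35 mm frame; the function name is only for illustration.

```python
MM_PER_INCH = 25.4

def scan_pixels(width_mm, height_mm, dpi):
    """Pixel dimensions produced by scanning an original of the given size."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px, height_px

# A 35 mm frame is nominally 36 mm x 24 mm.
for dpi in (2800, 4000):
    w, h = scan_pixels(36, 24, dpi)
    print(f"{dpi} dpi: {w} x {h} pixels, about {w * h / 1e6:.1f} megapixels")
```

This gives 3969 x 2646 pixels (about 10.5 megapixels) at 2800 dpi and 5669 x 3780 pixels (about 21.4 megapixels) at 4000 dpi.  For comparison, a 6 x 4 inch print scanned at 300 dpi yields only 1800 x 1200 pixels, which is why the small film original needs a much higher dpi setting than a print.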

Scanned Documents:

a.  If text recognition (and later text searching) is not a concern, scan as described above for images.

b.  However, text recognition is important!  Therefore scan to PDF, multi-image TIFF or PDF/A at 300 dpi or higher.

c.  Process each document with good optical character recognition (OCR) software such as Adobe Acrobat, ABBYY FineReader or other tested and proven OCR software.

d.  Add identifying information after the OCR process stage with the same OCR software.  This information will be located in the XMP metadata category.

e.  The ExifToolGUI can also be used to add metadata to a PDF or PDF/A file if the Workspace manager is configured for this function.

Note:  TIFF files are a long-recognized standard for archiving photos and scanned images/documents. In a white paper "Guidelines for TIFF Metadata, Recommended Elements and Format", a US government standards organization recommends using "ImageDescription" for the subject of the item and "ImageUniqueID" for a unique file identifier.  However, the seemingly logical "ImageUniqueID" was empty for TIFF files, but very much in use for digital camera images.  

Summary

My technical knowledge of Greenstone is "moderate", so improvements to these processes will be by trial and error.  I have queried the Greenstone user group (technical support) seeking a more efficient and clearer method to reach these results.   At the very least, these tests seem to be on the right track - Picasa is a viable recommendation to initially organize and identify images, especially by reviewers and classifiers with average computer skills.  Also, FineReader, the ExifTool and ExifToolGUI promise to be a powerful combination to improve embedded metadata of digital images; I will use these tools for my collections.

Email me with any comments, suggestions or questions!  Bob Schmitt, rgschmitt@gmail.com

Created June 15, 2013

Revised December 1, 2014 and November 24, 2015

Note: The Greenstone collections of CarLibrary.org are hosted on a ThinkPad T61 system located in Burbank, using (free!) Linux Ubuntu server software and (also free!) Greenstone 2.85 (Linux) on a 60 GB SSD (OCZ) disk drive.