The Car Library
Project: Guide to Improved Identification
and Classification of Digital Photos, Images and Documents
Google announced on February 12, 2016 that Picasa, its desktop photo editing and management program, would not be supported after March 2016. Picasa Web Albums, the online feature of this program, would transition to Google Photos. This webpage recommends using Picasa for basic photo captioning and metadata tagging, including location tagging. "Desktop" Picasa will function indefinitely for this use. Software ("app") recommendations will be updated as replacements become known.

The primary webpage of the CarLibrary.org digital archive project promotes the use of the open-source Greenstone Digital Library program for car historians, collectors, museums and collections, to encourage the creation of digital archives. This webpage describes using Picasa and other Windows programs to improve the identification - and eventual classification - of digital photos and scanned documents and photos. This is highly useful not only for Greenstone, but also for many other programs which classify and display digital assets. The primary topics of this guide are:
This guide was started in December 2012 through trials and tests of Picasa, Greenstone and other Windows utility programs. The trials are listed in chronological order on the "Trials and Tests with Picasa, Metadata, Greenstone and the ExifTool" page. Preliminary recommendations are also near the end of this webpage. In my experience, these recommended steps will minimize the re-entry of data and reduce the duplication of processing steps. As more is learned, these recommendations will be refined.

The Problem

Even in personal collections, there are always many digital photographs and image documents to identify for future use. If captions (or similar identifying data) can be readily added in a (standard) photo organizing program, there is a better chance the identification process will be "actually done" rather than delayed or never done! Although identification can be done by adding "metadata" to each image in the Greenstone digital library program - and "externally" to the photo/image files with programs such as Excel - this extra step rarely results in data with long-term links to the image/photo. This important archive issue is discussed on a Library of Congress blog: "Mission Possible: An Easy Way to Add Descriptions to Digital Photos."

1. A Brief Introduction to Metadata and Embedded Metadata

"Metadata" has been around for a very long time, but the term itself only appeared in 1968. One example of "metadata" is the information on the cards in a library card catalog, where book title, author, summary, etc., can be found for the books of a library collection. A common definition of metadata is "data about data" or more fully, "'content about individual instances of data content' or metacontent, the type of data usually found in library catalogues." Wikipedia provides a great explanation and history of metadata. In the digital world, library card catalogs are now in databases, and search terms are used to search the metadata - to locate the desired books!
Digital files nearly always contain their own metadata. For example, Microsoft Word and Excel files contain the file creation date and frequently list the "author", which can be seen in "File Properties". Music MP3 files contain much metadata, such as song title, length, artist, etc. Images/photos from digital cameras contain a great amount of metadata, including photo date, camera model, shutter speed, etc. As detailed further in the Wikipedia reference above, photo metadata appears in specific standard categories, as will be seen in the examples below.

Although not a common term, "external metadata" is the type found in a card catalog or on a list of books or photos. The type of metadata stored inside an Excel file or digital photo is "embedded metadata". Even the Smithsonian Institution recognizes the benefits of embedded metadata.

Why Use Embedded Metadata?

Embedded metadata in images (and other digital files) is preferred and has several benefits:
2. Picasa - Software to Display Photos and Embed Metadata - Why and How

I've personally been using the (free) Picasa program from Google for several years to make minor edits to digital photos, such as cropping and light correction. I also use Picasa to organize, by year and month of creation, my (thousands of) digital photographs, many images scanned from slides and negatives, and images of scanned documents. Picasa provides a function to make "albums" with subsets of these photos without changing the original directory (location) of the images. Sorry to say, only a fraction of my photos have captions, even though this step is not difficult in Picasa. However, after learning that Picasa captions can be extracted to Excel files to make lists or used by a digital library/archive program (Greenstone and others), there is now a great incentive to caption everything!

There are other programs that will caption - or otherwise create more types of metadata in digital photos - so Picasa need not be the "one size that fits all". For example, the ExifToolGUI, discussed in section 3 below, also provides an easy method to embed metadata after setting up its Workspace manager. Metadata written by ExifToolGUI in the correct categories described below will appear as captions and keywords in Picasa! If you use another program, let me know and I'll add your experiences to this guide. However, try Picasa!
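The idea of extracting captions into a list that Excel can open can be sketched with ExifTool's CSV output. This is only a rough illustration, not the procedure used in the trials: it assumes the captions are stored in the IPTC "Caption-Abstract" field and the tags in IPTC "Keywords", that ExifTool joins multiple keywords with a comma and a space (its default, as far as I know), and the function names and sample rows are made up for the example.

```python
import csv
import io

# Assumption: captions are in IPTC:Caption-Abstract and tags in IPTC:Keywords.
CAPTION_TAG = "IPTC:Caption-Abstract"
KEYWORDS_TAG = "IPTC:Keywords"


def build_export_command(folder):
    """Return an exiftool argument list that prints captions/keywords as CSV.

    -csv selects CSV output and -r also scans sub-folders.
    """
    return ["exiftool", "-csv", "-r", "-" + CAPTION_TAG, "-" + KEYWORDS_TAG, folder]


def parse_caption_csv(csv_text):
    """Turn CSV text into a list of (file, caption, keywords) tuples."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        # Drop any "Group:" prefix from the column names so that both
        # "Caption-Abstract" and "IPTC:Caption-Abstract" headers are accepted.
        row = {
            key.split(":")[-1]: (value or "")
            for key, value in raw.items()
            if key is not None
        }
        # Assumption: multiple keywords arrive joined by commas.
        keywords = [k.strip() for k in row.get("Keywords", "").split(",") if k.strip()]
        rows.append((row.get("SourceFile", ""), row.get("Caption-Abstract", ""), keywords))
    return rows


if __name__ == "__main__":
    # Made-up sample in the shape of an exiftool CSV export.
    sample = (
        "SourceFile,Caption-Abstract,Keywords\n"
        'photos/a.jpg,Test caption,"1937, Test make"\n'
        "photos/b.jpg,,\n"
    )
    print(build_export_command("photos"))
    for entry in parse_caption_csv(sample):
        print(entry)
```

The command list can be passed to Python's subprocess.run with its output redirected to a .csv file, which Excel opens directly. Photos without a caption come back with an empty string, which makes it easy to list what still needs captioning.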
Picasa has other features that are very useful for archive purposes. None of the image edits (except the caption, tags and geotagging) actually change the original photo until it is "saved" or "exported" to a different folder. Until then, the image edits are stored in a small, separate Picasa file. If you have made image edits in Picasa, it is easy to export a folder or group of photos to a folder intended for adding to or archiving in a follow-on program, in either the original photo size or your choice of a smaller photo size.

At this stage, photos (or other digital assets) that have been captioned or tagged in Picasa - or another program - are ready for many future uses, especially ready identification by others at some future time. One important use of digital assets is the creation of a digital library or archive, for personal use, business use or as part of a museum collection. Section 4 below describes how to use/import captioned and tagged photos into the open-source Greenstone digital library software.

3. ExifTool - Another Method to Create and Use Embedded Metadata

Two other free programs are potentially very useful to further add identification data to digital photos: "ReNamer" and "ExifTool". ReNamer can change the file names for an entire folder of photos in many ways, including adding metadata. For this sample group of photos, the caption was temporarily added to the file name as a prefix or suffix by selecting the "IPTC Caption" choice. If only the "accession number" was put in the Picasa caption, this data could be easily added to the file names of a group of photos.

Phil Harvey's ExifTool has the ability to extract, add, copy or move nearly all types of metadata. The basic program must be run from a command line, but with the correct configuration, it is very powerful and promises to "do everything". A download and very complete explanation of its functions are here. Use of the command-line functions is described on the webpage referenced below.
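To give a feel for the ExifTool command line, here is a small sketch that writes a caption and keywords into one photo. It is an illustration under assumptions, not the exact commands from the trials: it assumes ExifTool is installed and on the PATH, that the caption belongs in IPTC "Caption-Abstract" and the tags in IPTC "Keywords", and the function names and file name are invented for the example.

```python
import subprocess


def build_write_command(image_path, caption, keywords):
    """Return an exiftool argument list that embeds a caption and keywords.

    Repeating -IPTC:Keywords=... once per keyword writes each one as a
    separate list item; this replaces any keywords already in the file.
    """
    args = ["exiftool", f"-IPTC:Caption-Abstract={caption}"]
    args += [f"-IPTC:Keywords={keyword}" for keyword in keywords]
    args.append(image_path)
    return args


def write_metadata(image_path, caption, keywords):
    """Run exiftool on one file (requires exiftool to be installed)."""
    # Passing a list (no shell) means spaces in the caption need no quoting.
    subprocess.run(build_write_command(image_path, caption, keywords), check=True)


if __name__ == "__main__":
    # Hypothetical file name and values, printed rather than executed.
    print(build_write_command("IMG_0001.JPG", "Test caption", ["1937", "Test make"]))
```

By default ExifTool keeps a backup copy of each file it changes (with "_original" added to the name), so a trial run on a copy of a folder is easy to undo.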
Bogdan Hrastnik has written a GUI (Graphical User Interface, Windows only) for ExifTool, which allows very easy access to many of the ExifTool functions. The "how to" and download page for ExifToolGUI is here. For a guide to using both ExifTool and the ExifToolGUI, go to this webpage: "ExifTool - Reading and Writing Embedded Metadata", which is a section of this CarLibrary.org website.

4. Archiving and Classifying Photos with Greenstone

Other sections of this website - and many Internet guides - offer good instructions on downloading and installing Greenstone.
Figure 1 - Screenshot showing browse results for Captions starting with "T"
Figure 2 - A display of keywords starting with "1", which shows the photos with the trial ID numbers starting with "12", etc. The Greenstone "search" function can also be used to locate a specific ID number - or other keyword.

A Greenstone test archive of 190 personal photos taken at the Mullin Automotive Museum was made using only embedded metadata added in Picasa. The metadata includes location data for each photo: latitude and longitude. The newest Greenstone, version 3.0, can use this data for map displays.

Note: the Greenstone lab team at the University of Waikato helped fix a bug that was preventing all images from being added to the collection. It was a simple fix - select "unicode" as an option of "input_encoding" in the plugin for embedded metadata. Another bug prevented viewing the embedded metadata in the "Enrich" panel for files with the uppercase "JPG" extension. This was fixed by a simple edit to the "util.pm" file in the "perllib" directory. Contact me for the bug fix sent by the Greenstone Users Group.

Figure 3 - An archive of Mullin museum photos; this is the initial display of "Captions". Picasa was used to identify each image with captions and with "car make" and "car year" as tags/keywords. The file names and photo dates are standard metadata embedded by the digital camera and extracted by Greenstone automatically.

5. Preliminary Recommendations

These are based on these trials, practical considerations and guidelines for archives (U.S. National Archives and the Smithsonian Institution):

Digital Photographs:
Scanned images:
Scanned Slides and Negatives:
Scanned Documents:
Note: TIFF files are a long-recognized standard for archiving photos and scanned images/documents. In a white paper, "Guidelines for TIFF Metadata, Recommended Elements and Format", a US government standards organization recommends using "ImageDescription" for the subject of the item and "ImageUniqueID" for a unique file identifier. However, the seemingly logical "ImageUniqueID" was empty for TIFF files, although it is very much in use for digital camera images.

Summary

My technical knowledge of Greenstone is "moderate", so improvements to these processes will be by trial and error. I have queried the Greenstone user group (technical support) seeking a more efficient and clearer method to reach these results. At the very least, these tests seem to be on the right track - Picasa is a viable recommendation to initially organize and identify images, especially for reviewers and classifiers with average computer skills. Also, FineReader, the ExifTool and ExifToolGUI promise to be a powerful combination to improve the embedded metadata of digital images; I will use these tools for my collections.

Email me with any comments, suggestions or questions!

Bob Schmitt, rgschmitt@gmail.com
Created June 15, 2013
Revised December 1, 2014 and November 24, 2015

Note: The Greenstone collections of CarLibrary.org are hosted on a ThinkPad T61 system located in Burbank, using (free!) Linux Ubuntu server software and (also free!) Greenstone 2.85 (Linux) on a 60 GB SSD (OCZ) disk drive.