Frazer Nash Archive Improvements

The Frazer Nash Archive

Google announced on February 12, 2016 that Picasa, its desktop photo editing and management program, would not be supported after March 2016. Picasa Web Albums, the online feature of this program, would transition to Google Photos.

This webpage discusses using Picasa for basic photo captioning and metadata tagging, including location tagging. "Desktop" Picasa should continue to function for this use. Recommendations for replacement software ("apps") will be updated as they become known.

Introduction:

One of the first collections/archives made after learning Greenstone was for Frazer Nash cars. It is based on an imported (exploded) CSV file of all the pre-war and post-war cars, about 440 in total (including "replicas"). The archive also contains about 120 objects: photos, documents and sample web pages. These objects were categorized in Greenstone with the standard Dublin Core metadata categories, plus author-created car-specific categories such as "car.make", "car.year", etc.

The Frazer Nash archive was completed in September 2012 and remained unchanged while other collections were created and other management systems were reviewed. Based on lessons learned from work on Digital Asset Management (DAM) and Collection Management (CMS) systems, the archive was then upgraded with new digital archival techniques. There is, however, very little difference visible to an end-user. The changes improve searching and make the addition of new material more rational. The Frazer Nash archive is also closer to "best practices".

These are the types of improvements to the Frazer Nash archive:

A. Object/Resource Numbering

B. Improved Metadata Use

C. Use of Embedded Metadata

D. Frazer Nash Raid to New England - Creating Captions and Car IDs Using Embedded Metadata

E. Further Metadata Trials and Recommendations

Section D describes the first use of the ExifTool to write metadata efficiently into a group of photos from an Excel spreadsheet (saved as a CSV file). This should be very useful for archives with many digital photos that are poorly or only partly identified.

A. Object/Resource Numbering

The initial list of Frazer Nash cars in the archive had a simple numbering scheme: the first Frazer Nash built is "F001", the next one is "F002", etc. Museum and archive practice recommends an "accession number", in which the first part is the year an object enters the collection, followed by a serial number for each object entered that year. Objects that enter the collection as a group receive a third number.

The first Frazer Nash, built in 1925 with S/N 1008, is now assigned "1925.1008". None of these cars actually "enters the collection", but this combination of year and chassis number seems to be a practical approach. Post-war cars have serial numbers such as "421/100/168". The last three digits are enough to identify each car uniquely and are easily remembered, so a specific car built in 1952 becomes "1952.168".

Other resources - photos, documents, books - will use the archive convention if the "collection entry date" is known and meaningful. Most often, numbers are based on the date(s) best identified with the object. A photograph or document from April 1954 will become 1954.4.xxx, with the "xxx" assigned as necessary. A spreadsheet is used to record the file names, accession numbers and a description of each item. As discussed below, this data will eventually be used as "metadata" for each object/resource.
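The numbering convention above can be expressed as a small Python sketch. The function names are hypothetical, and reducing a post-war chassis number to its last segment simply follows the "1952.168" example; this is an illustration of the rule, not code from any museum or collections system.

```python
# Hypothetical helpers illustrating the accession-number convention above.


def car_accession(year: int, chassis: str) -> str:
    """Year of manufacture + chassis number, e.g. 1925 + "1008" -> "1925.1008".

    Post-war chassis numbers such as "421/100/168" are reduced to their
    last segment, so 1952 + "421/100/168" -> "1952.168".
    """
    serial = chassis.strip().split("/")[-1]
    return f"{year}.{serial}"


def item_accession(year: int, month: int, serial: int) -> str:
    """Photo or document dated year/month, e.g. April 1954, item 3 -> "1954.4.3".

    The month is not zero-padded, matching the "1954.4.xxx" pattern.
    """
    return f"{year}.{month}.{serial}"


print(car_accession(1925, "1008"))         # 1925.1008
print(car_accession(1952, "421/100/168"))  # 1952.168
print(item_accession(1954, 4, 3))          # 1954.4.3
```

Keeping the rule in one place like this makes it easy to apply consistently when the same numbers are later typed into a spreadsheet or embedded in files.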

Figure 1 below is the master spreadsheet showing data for the early cars and the assigned accession numbers.

Figure 1 - Extract from a Frazer Nash spreadsheet

This spreadsheet was converted to a CSV file and imported into the Frazer Nash archive (see a guide to this process here) on July 20, 2013. Actually, it was imported, deleted, re-imported, etc. a few times before it was done correctly! As expected, there are no apparent changes for an end-user.

B. Improved Metadata Use

Other pages on this website explain what "metadata" is as it relates to digital resources. A complete and authoritative guide can be found in the Dublin Core User Guide. From this guide:

"A metadata record consists of a set of attributes, or elements, necessary to describe the resource in question. For example, a metadata system common in libraries -- the library catalog -- contains a set of metadata records with elements that describe a book or other library item: author, title, date of creation or publication, subject coverage, and the call number specifying location of the item on the shelf."

The linkage between a metadata record and the resource it describes may take one of two forms:

1. elements may be contained in a record separate from the item, as in the case of the library's catalog record; or
2. the metadata may be embedded in the resource itself."

Using "embedded metadata" is the next topic. There are many standard metadata schemes, but the primary one used by Greenstone is "Dublin Core", which has 15 basic categories:

Dublin Core Metadata	Definition
dc.Title	A name given to the resource.
dc.Subject (and keywords)	The topic of the resource.
dc.Description	An account of the resource.
dc.Date	A point or period of time associated with an event in the lifecycle of the resource.
dc.Type (of resource)	The nature or genre of the resource.
dc.Identifier (of resource)	An unambiguous reference to the resource within a given context.
dc.Source	A related resource from which the described resource is derived.
dc.Format	The file format, physical medium, or dimensions of the resource.
dc.Creator (author)	An entity primarily responsible for making the resource.
dc.Publisher	An entity responsible for making the resource available.
dc.Contributor (other)	An entity responsible for making contributions to the resource.
dc.Language	A language of the resource.
dc.Relation	A related resource.
dc.Coverage	The spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant.
dc.Rights (management)	Information about rights held in and over the resource.

Table 1 - Dublin Core categories defined

These categories were known when the Frazer Nash archive was created, but they were used inconsistently. To improve their use, the Dublin Core examples were reviewed. The specific examples in Table 2 below come from this Frazer Nash photo (Figure 2).

Figure 2 - Frazer Nash Mille Miglia publicity photo, Duke Donaldson is the driver

Dublin Core Metadata	Frazer Nash Examples
dc.Title	Frazer Nash Mille Miglia publicity photo, Duke Donaldson is the driver
dc.Subject (and keywords)	Frazer Nash, Mille Miglia, Duke Donaldson
dc.Description	Publicity photo of Mille Miglia 421/100/168 in New York
dc.Date	1952-10
dc.Type (of resource)	image
dc.Identifier (of resource)	1952.10.5.1
dc.Source	1952.10.5 (the accession number of the photograph which was scanned)
dc.Format	image/jpeg
dc.Creator (author)	Duke Donaldson
dc.Publisher	Bob Schmitt
dc.Contributor (other)	--
dc.Language	English
dc.Relation	1952.168 (the accession number of the actual car in the photograph)
dc.Coverage	1952-1953
dc.Rights (management)	NA

Table 2 - Frazer Nash Dublin Core examples

Data similar to that in Table 2 is already in the Frazer Nash archive for all the objects and resources, but it was not well "controlled." To approach archival standards, many of the terms should follow a "controlled vocabulary" from an "authority". For example, the Library of Congress Name Authority File recognizes "Frazer Nash" as the preferred term for this car, and this project will use it as part of a controlled vocabulary. No authority could be found for John Stuart "Duke" Donaldson, the importer of several cars and owner of the Frazer Nash car and team that won Sebring in 1952.
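A controlled vocabulary can be applied mechanically before keywords are embedded. Below is a minimal Python sketch, assuming a hand-maintained table that maps variant spellings to the preferred term. Only "Frazer Nash" as the preferred form comes from the authority file cited above; the variant spellings and the function name are hypothetical.

```python
# Hypothetical variant spellings mapped to the preferred (authority) term.
PREFERRED_TERMS = {
    "frazer-nash": "Frazer Nash",
    "frazer nash": "Frazer Nash",
    "frazernash": "Frazer Nash",
}


def normalize_keywords(raw: str) -> list[str]:
    """Split a comma-separated keyword string, replace known variants with
    the preferred term, and drop duplicates while keeping the original order."""
    result: list[str] = []
    for term in raw.split(","):
        term = term.strip()
        if not term:
            continue
        term = PREFERRED_TERMS.get(term.lower(), term)
        if term not in result:
            result.append(term)
    return result


print(normalize_keywords("Frazer-Nash, Mille Miglia, frazer nash, Duke Donaldson"))
# ['Frazer Nash', 'Mille Miglia', 'Duke Donaldson']
```

Terms without an authority entry, such as "Duke Donaldson", pass through unchanged, so the table can grow as preferred forms are decided.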

C. Use of Embedded Metadata

In the original Frazer Nash Greenstone archive, all the resources are described and classified by "external" metadata, which are stored by the Greenstone program as separate files. For the most part, the work that was done to create this metadata is only useful in the Greenstone environment, with some exceptions for specialized exports of Greenstone files. Collection managers now realize there are many benefits to "embedding" the metadata in the digital objects, as this forms a link between the metadata and the object that can only be changed by deliberate editing.

Embedding metadata is discussed elsewhere on this website, and the Picasa and ExifTool programs allow it to be done efficiently. (A recent, unreviewed video on getting started with the ExifTool and its GUI is here.) The ExifTool (and its GUI) can extract the metadata to spreadsheet-compatible files. These, in turn, can be imported into Greenstone, or a Greenstone archive can be built directly, using only the embedded metadata. In summary: "type it once, use it many times".

One early decision is which categories to use for embedded data: there are hundreds, perhaps thousands! Picasa captions are written to the XMP and IPTC "Description" categories. Picasa tags (keywords) are written to the "XMP.Keywords" and "DC.Subject" categories, but keywords were not always transferred consistently between IPTC.Keywords and DC.Subject when Picasa was used to embed "tags". The ExifToolGUI was found to be more flexible, useful and efficient than Picasa.
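The inconsistent keyword transfer mentioned above can be checked after extraction. A Python sketch, assuming the two fields have been exported with the ExifTool to a CSV file whose columns are named "SourceFile", "Keywords" and "Subject", with list items separated by commas; the sample rows and the function name are made up for illustration.

```python
import csv
import io


def keyword_mismatches(csv_text: str) -> dict[str, tuple[set[str], set[str]]]:
    """For each SourceFile whose two keyword fields differ, return
    (terms only in IPTC Keywords, terms only in DC Subject)."""
    problems = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        iptc = {k.strip() for k in (row.get("Keywords") or "").split(",") if k.strip()}
        dc = {k.strip() for k in (row.get("Subject") or "").split(",") if k.strip()}
        if iptc != dc:
            problems[row["SourceFile"]] = (iptc - dc, dc - iptc)
    return problems


sample = (
    "SourceFile,Keywords,Subject\n"
    'a.jpg,"Frazer Nash, Lime Rock","Frazer Nash, Lime Rock"\n'
    'b.jpg,"Frazer Nash, Raid","Frazer Nash"\n'
)
print(keyword_mismatches(sample))
# {'b.jpg': ({'Raid'}, set())}
```

An empty result means the two keyword fields agree for every file in the export.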

In the ExifToolGUI program, a custom Workspace file was created to embed metadata in selected Dublin Core, EXIF and XMP categories. The ExifToolGUI will display metadata in PDF, Word and Excel files. Although the ExifToolGUI will write metadata to some PDF files, expect to use other programs (Acrobat, Lightning PDF, etc.) to embed metadata in PDF files. Word, Excel and the compatible LibreOffice and OpenOffice programs can embed a limited set of metadata categories through the "Properties" menu choice for "doc" and "xls" files.

Table 3, below, is the Workspace "set" (part of the "ExifToolGUIv5.ini" file) to use for work on the FN archive, created by much trial and error!

Metadata Category	Category Code	Hint/example
[WorkspaceTags]
Accession/ID Number	Identifier	Accession or ID Number
DC:Title	Title	DC Title (alt: name of object)
DC:Description	Description	DC Description
DC:Subject	Subject	DC subject
IPTC:Keywords	IPTC:Keywords	Keywords/tags (use comma)
DC:Resource Identifier	Identifier	DC Resource Identifier
DC:Relation	Relation	DC Relation (to Primary Object)
DC:Source	Source	DC source (is part of)
DC:Creator	Creator	DC creator
DC:Date	Date	DC date
DC:Contributor	Contributor	DC contrib
DC:Coverage	Coverage	DC coverage
DC:Format	Format	DC format
DC:Type	Type	DC type
DC:Language	Language	DC language
DC:Publisher	Publisher	DC publisher
DC:Rights	Rights	DC rights
Artist/Author	Author	"Bob Schmitt"
Location	Location	Where created
Primary Object Number	XMP:Relation	Relation to Primary Object
PDF Title	pdf:Title	Document title
PDF Subject	pdf:Subject	Document subject
PDF Keywords	pdf:Keywords	Document keywords
CreateDate	exif:CreateDate	[2012:01:14 20:00:00]
DateTimeOriginal	exif:DateTimeOriginal	[2012:01:14 20:00:00]
FileAccessDate	FileAccessDate
FileName	FileName
FileSize	FileSize
FileType	FileType
ImageSize	ImageSize
PhotoShop: TextLayer	TextLayerText	Copyright data (if stored by Photoshop)
PDF Relation	pdf:Relation	Custom tag being tested; similar to DC:Relation
[TagList]

Table 3 - ExifToolGUI Workspace for the Frazer Nash archive

Below is the actual "WorkspaceTags" section of the "ini" file that produces the Workspace described above and shown below. It can be copied and pasted into the corresponding section of any ExifToolGUIv5.ini file:

[WorkspaceTags]
Accession/ID Number=-Identifier^Accession or ID Number
DC:Title=-Title^DC Title (name of object)
DC:Description=-Description^DC Description
DC:Subject=-Subject^DC subject
IPTC:Keywords=-IPTC:Keywords^Keywords/tags (use comma)
DC:Resource Identifier=-Identifier^DC Resource Identifier
DC:Relation=-Relation^DC Relation (to primary object)
DC:Source=-Source^DC source (is part of)
DC:Creator=-Creator^DC creator
DC:Date=-Date^DC date
DC:Contributor=-Contributor^DC contrib
DC:Coverage=-Coverage^DC coverage
DC:Format=-Format^DC format
DC:Type=-Type^DC type
DC:Language=-Language^DC language
DC:Publisher=-Publisher^DC publisher
DC:Rights=-Rights^DC rights
Artist/Author=-Author^"Bob Schmitt"
Location=-Location^Where created
Primary Object Number=-XMP:Relation^Relation to primary Object
PDF Title=-pdf:Title^Document title
PDF Subject=-pdf:Subject^Document subject
PDF Keywords=-pdf:Keywords^Document keywords
CreateDate=-exif:CreateDate^[2012:01:14 20:00:00]
DateTimeOriginal=-exif:DateTimeOriginal^[2012:01:14 20:00:00]
FileAccessDate=-FileAccessDate
FileName=-FileName
FileSize=-FileSize
FileType=-FileType
ImageSize=-ImageSize
PhotoShop: TextLayer=-TextLayerText^Copyright
PDF Relation=-pdf:Relation^Custom tag being tested; similar to DC:Relation
[TagList]
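Each line of the [WorkspaceTags] section above has the form Label=-Tag^Hint, where the hint after "^" is optional. The Python sketch below reads that section, for example to list which tags a workspace shows. The parsing rules are inferred from the lines above rather than taken from ExifToolGUI documentation, and the function name is hypothetical.

```python
def parse_workspace_tags(ini_text: str) -> list[tuple[str, str, str]]:
    """Return (label, tag, hint) tuples from the [WorkspaceTags] section."""
    entries = []
    in_section = False
    for line in ini_text.splitlines():
        line = line.strip()
        if line.startswith("[") and line.endswith("]"):
            # Entering [WorkspaceTags]; any other section header ends it.
            in_section = line == "[WorkspaceTags]"
            continue
        if not in_section or "=" not in line:
            continue
        label, value = line.split("=", 1)    # labels may contain ":" but not "="
        value = value.lstrip("-")            # tag names are written as "-Tag"
        tag, _, hint = value.partition("^")  # hint follows "^", if present
        entries.append((label.strip(), tag.strip(), hint.strip()))
    return entries


sample = """[WorkspaceTags]
DC:Title=-Title^DC Title (name of object)
Primary Object Number=-XMP:Relation^Relation to primary Object
FileName=-FileName
[TagList]
"""
for label, tag, hint in parse_workspace_tags(sample):
    print(f"{label:22} {tag:14} {hint}")
```

Python's standard configparser module is avoided here because, by default, it also treats ":" as a key/value separator, which would split labels such as "DC:Title".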

Figure 4 below shows what the ExifToolGUI displays for the image file from Figure 2 before the example data from Table 2 (above) was added as embedded metadata.

Figure 4 - Screenshot from the ExifToolGui

The payoff from using a customized Workspace with the ExifToolGUI is the ability to embed user-chosen data efficiently in digital photos and documents for later search and retrieval. Extracting the data to spreadsheet-compatible files is also easy, for many future uses. Proficiency with the ExifToolGUI builds with experience; it can embed data in selected batches of photos quickly. The original photo file date can also be preserved.

Figure 5, below, is an Excel spreadsheet made from ExifTool extraction of metadata embedded in 200+ images and documents in the Frazer Nash Greenstone archive. Although the "external metadata" in the original archive was not changed, this demonstrates that the title, subject, keywords, etc. embedded in these documents can also be used for classifying and searching the archive and for any other future need. Note that the "Identifier" category (column) is the new accession number assigned to these digital objects.

Figure 5 - Screenshot from an Excel spreadsheet of extracted metadata

This Frazer Nash archive has also embedded the "DC:Relation"/"Primary Object Number" in nearly all photos and documents, recording the relation of each digital object to a physical object (primarily a particular Frazer Nash car). This has the potential for considerable future benefits. In the example in Figure 4 above, the Frazer Nash car in the photo is "1952.168". This technique can be used for collections and databases that include lists of car owners, cars, events and digital objects.
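Because the Relation field holds the car's accession number, extracted metadata can be regrouped by car. A Python sketch, assuming an ExifTool CSV export with "SourceFile" and "Relation" columns; the file names are invented for illustration and the function name is hypothetical.

```python
import csv
import io
from collections import defaultdict


def photos_by_car(csv_text: str) -> dict[str, list[str]]:
    """Group SourceFile names by the car accession number in Relation."""
    groups: dict[str, list[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        relation = (row.get("Relation") or "").strip()
        if relation:  # skip objects not linked to a physical object
            groups[relation].append(row["SourceFile"])
    return dict(groups)


sample = (
    "SourceFile,Relation\n"
    "mm_publicity.jpg,1952.168\n"
    "mm_rear.jpg,1952.168\n"
    "brochure.pdf,\n"
)
print(photos_by_car(sample))
# {'1952.168': ['mm_publicity.jpg', 'mm_rear.jpg']}
```

The same grouping could be joined against a list of cars or owners keyed by the same accession numbers.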

It isn't clear whether there is a set of "best" metadata categories to use. The metadata set in Table 3 has been used for trials on objects in the Frazer Nash archive since July 20, 2013. Even though more than 200 objects have been enhanced this way, further reading and trials are needed to confirm that these are "correct", or at least reasonable, categories.

Based on more trials and on feedback from reviewers about the sample photos and documents in the archive, this webpage will make further recommendations that may help others build Greenstone collections.

D. Frazer Nash Raid to New England - Creating Captions and Car IDs Using Embedded Metadata

There are about 500 personal photos from the Frazer Nash Car Club Raid to New England, September 24 - October 3, 2013. Just over 400 were considered "good" and tagged into a Picasa Album, then uploaded to a Picasa Web Album. The ExifToolGUI was used to put captions in the "DC:Description" category; these appear as captions on the photos in Picasa. The cars were also identified with an assigned accession number in the "DC:Relation"/"Primary Object Number" fields. These accession numbers were created as described above (e.g. "1952.196"), but a few cars could not be identified by chassis number, so numbers such as "1937.UNK1" were assigned temporarily, for testing. Keywords and map locations were also assigned in the ExifToolGUI and confirmed in Picasa.

Adding a unique accession number (in the "DC:ResourceIdentifier" category) to each photo would be tedious in the ExifToolGUI, so the command-line ExifTool was used instead. Its CSV import option ("-csv=", as in step 8 below) writes data from a CSV file ("saved as" from Excel) to entire folders of images as new (or added) metadata. After help from the ExifTool forum, this command was run successfully. These were the steps:

1. The metadata was extracted from the photos by running the ExifTool from the (Windows) command line as follows, producing a CSV file for Excel:

exiftool -csv -r -FileName -FileSize -Title -Identifier -Description -Subject -DateTimeOriginal -Relation -Keywords e:\DigitalLibrary\USRaid > Raid1030.csv

This produced the "Raid1030.csv" file, opened in Excel.

2. The metadata for each photo (in the Excel rows) and in each category (in the columns) was checked.

3. Specific data for 102 photos of individual cars was copied from the "Keywords" column to the empty "Title" column. This set of photos was the initial selection of photos to be added to the Greenstone archive.

4. The Excel "data fill" function was used to create an "accession number" for all 400+ photos in the format "2013.9.1", "2013.9.2" etc. in the "Identifier" column.

5. Columns that had no new data were deleted, leaving only "SourceFile", "Title" and "Identifier".

6. The Excel file was saved in the "CSV" format under a new file name, "Raid1030input.csv", to avoid confusion with the CSV file of metadata extracted from the photos.

7. A command window was opened and the current drive and directory were changed to the photo folder, "e:\DigitalLibrary\USRaid\".

8. This ExifTool command was entered:

exiftool -csv=Raid1030input.csv -ext jpg -v2 e:\DigitalLibrary\USRaid\

9. Success! The accession numbers were added to all 405 photos as "Identifiers", and the 102 individual car photos now had "Titles". The ExifTool had backed up each original photo, appending "_original" to its file name.
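Steps 3 to 6 were done by hand in Excel. The same preparation can be sketched in Python, assuming the extracted CSV contains "SourceFile", "Keywords" and "Title" columns as produced in step 1. The function name, the sample file names and the way the selected car photos are passed in are hypothetical; the "2013.9" prefix follows step 4.

```python
import csv
import io


def prepare_import_csv(extracted: str, prefix: str, titled: set[str]) -> str:
    """Build the reduced CSV that is written back with "exiftool -csv=...".

    - Title: copied from Keywords for the selected files only (step 3)
    - Identifier: prefix plus a running number, like Excel's fill (step 4)
    - only SourceFile, Title and Identifier are kept (step 5)
    """
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["SourceFile", "Title", "Identifier"],
                            lineterminator="\n")
    writer.writeheader()
    for n, row in enumerate(csv.DictReader(io.StringIO(extracted)), start=1):
        title = row.get("Title") or ""
        if row["SourceFile"] in titled and not title:
            title = row.get("Keywords") or ""
        writer.writerow({"SourceFile": row["SourceFile"], "Title": title,
                         "Identifier": f"{prefix}.{n}"})
    return out.getvalue()


extracted = (
    "SourceFile,Keywords,Title\n"
    'IMG_0001.jpg,"1952.196, Lime Rock",\n'
    "IMG_0002.jpg,Lunch stop,\n"
)
print(prepare_import_csv(extracted, "2013.9", {"IMG_0001.jpg"}))
```

The returned text would then be saved under a new name (step 6), for example with open("Raid1030input.csv", "w", newline="").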

77 of the 102 photos were further selected and added to the Frazer Nash Greenstone archive. Only a single Greenstone item of "external" metadata was added in the "DC:Description" category at the "folder" level: "Frazer Nash cars on the Raid to New England, 2013" using the Greenstone "Enrich" function.

The Raid car photos can be reviewed in the Greenstone archive in the "titles" browsing tab by looking for the "year of manufacture" of any car; it's the last tab segment: "0-9".

One anomaly was noted in the displayed "Document/photo date" field for some Raid cars - the date of the most recent photo modification (when photos were resized smaller for this archive by an export from Picasa) is displayed. For other cars, the preferred "DateTimeOriginal" is shown. Review of the metadata in Greenstone shows "DateTimeOriginal" has not been consistently extracted from all files; this is an issue for further investigation.
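Until the extraction issue is resolved, a display date could be chosen by preferring "DateTimeOriginal" and falling back to other dates. A Python sketch, assuming ExifTool-style date strings ("2012:01:14 20:00:00", as in Table 3) held in a dictionary of extracted tags; the fallback order after DateTimeOriginal is an assumption, not Greenstone behaviour, and the sample values are invented.

```python
from datetime import datetime
from typing import Optional

# Preferred first; FileModifyDate reflects later edits such as resizing.
DATE_TAGS = ("DateTimeOriginal", "CreateDate", "FileModifyDate")


def display_date(tags: dict[str, str]) -> Optional[datetime]:
    """Return the first parseable date among DATE_TAGS, or None."""
    for name in DATE_TAGS:
        value = tags.get(name, "").strip()
        if not value:
            continue
        try:
            # Keep only "YYYY:MM:DD HH:MM:SS"; drops any time-zone suffix.
            return datetime.strptime(value[:19], "%Y:%m:%d %H:%M:%S")
        except ValueError:
            continue
    return None


photo = {"FileModifyDate": "2013:10:30 09:12:00-04:00",
         "DateTimeOriginal": "2013:09:28 14:05:31"}
print(display_date(photo))  # 2013-09-28 14:05:31
```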

To provide another way to find the Raid cars, the Greenstone "Create" function was used to add the metadata category "ex.XMP.Relation" to "car.Serial" as a search index. Because this archive primarily holds digital objects classified with metadata originally imported as Greenstone "external" metadata, "car.Serial" is present on nearly all of these original objects. For the Raid cars, the embedded "Relation" field now holds the "accession number", assigned from the year AND serial number of each car (if known).

The full metadata descriptor for "Relation" is "ex.XMP.Relation". Greenstone searches both this field and "car.Serial" from the single "car serial number" search box. For example, searching for "2065" (choose the "car serial number" search) will display two photos of the 1932 TT Replica that visited the Raid at the Lime Rock race track, plus the simple (null) record originally imported in July 2013. See section A above and Figure 1.
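The single "car serial number" box effectively matches a query against either field. A Python sketch of that matching rule only, assuming "car.Serial" holds the bare chassis number and "ex.XMP.Relation" holds the year.serial accession number; the records and the function name are hypothetical, and this is not how Greenstone actually builds or searches its indexes.

```python
def matches_serial(record: dict[str, str], query: str) -> bool:
    """True if query equals car.Serial or the serial part of ex.XMP.Relation."""
    query = query.strip()
    if record.get("car.Serial", "").strip() == query:
        return True
    relation = record.get("ex.XMP.Relation", "").strip()
    # "1932.2065" -> serial part "2065"
    return relation.partition(".")[2] == query


records = [
    {"dc.Title": "1932 TT Replica", "car.Serial": "2065"},
    {"dc.Title": "Raid photo, Lime Rock", "ex.XMP.Relation": "1932.2065"},
    {"dc.Title": "Raid photo", "ex.XMP.Relation": "1952.196"},
]
print([r["dc.Title"] for r in records if matches_serial(r, "2065")])
# ['1932 TT Replica', 'Raid photo, Lime Rock']
```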

The ExifTool was later used on the original photographs in the "sep13" and "oct13" folders, in two steps. First, all photos had complete (and new) accession numbers added, even those not related to the Raid. Next, the photos intended for the Greenstone archive had "Titles" added, exactly as before. This new metadata was visible, of course, in the Picasa Album for the Raid. Later, all the original Raid photos had accession numbers and "titles" added. Individual car photos were exported and resized as before, and these photos replaced those previously in the Frazer Nash Archive. These steps were repeated to develop and confirm a process that can be recommended for other collections and archives.

When a photo is found or viewed on the Frazer Nash archive (search for "2065" as above), this photo can be saved - "Save Image As..." - and the metadata can be reviewed in the ExifToolGUI or other programs. The file name may have been changed, but the original metadata has been preserved.

Alternative Greenstone search and browsing categories are possible, as are changes to the display format of the search and browsing results. Greenstone reports it has extracted 90+ metadata items from most digital objects, so many, many search and display formats are possible!

E. Further Metadata Trials and Recommendations

After a visit to the Frazer Nash Archives in September 2014, the command-line ExifTool was used to create Excel spreadsheets from more than 2000 Frazer Nash photos in 53 subdirectories of a single "AFNPics" folder/directory to evaluate its possible application to the digital resources in the Frazer Nash Archives. The ExifTool was also used to create embedded metadata for the 800+ travel photos in England.

Based on this recent experience, these recommendations should be considered as "next steps" for any collection of digital assets:

1. Use the ExifTool from the command line to read entire folders/subfolders of photos.

2. Review the resulting Excel file ("save as" from the CSV output file) to determine which metadata categories will help organize these collection assets.

3. The Dublin Core categories should have high priority, especially the "DC:Identifier" category, which will prove very useful if it holds a unique "accession number" for each digital asset. Although the ExifTool documentation states that new (e.g., car-specific) metadata categories can be created, this is complex. An accession number would be the best method to link the default embedded metadata categories (Dublin Core and similar) to car-specific categories, which can be created more easily in a Collections Management System (CMS) and/or Greenstone. Using accession numbers is also a museum "best practice".

4. Use the Excel copy/paste functions to fill in missing metadata. Use the Excel "data fill" command to create accession numbers in the DC:Identifier category.

5. Use the ExifTool from the command line to write the Excel file (in CSV format) back to the entire set of photo folders/subfolders.

6. At any future time, you may use the ExifTool again from the command line to read these "metadata-updated" photo folders/subfolders to create data-rich Excel files for import into a collections management, content management or digital library (e.g. Greenstone) software program.
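The read and write-back steps in recommendations 1 and 5 above can be scripted. A Python sketch that assembles ExifTool command lines of the same kind used in section D and runs the read step with subprocess; it assumes an executable named exiftool is on the PATH, and the function names, folder and file names are placeholders. Adding "-r" to the write-back command, so that subfolders are processed, extends the section D command, which processed a single folder.

```python
import subprocess


def read_command(folder: str, tags: list[str]) -> list[str]:
    """exiftool -csv -r -Tag1 -Tag2 ... FOLDER (CSV is written to stdout)."""
    return ["exiftool", "-csv", "-r", *[f"-{t}" for t in tags], folder]


def write_command(csv_file: str, folder: str, ext: str = "jpg") -> list[str]:
    """exiftool -csv=FILE -ext EXT -r FOLDER (imports the edited CSV)."""
    return ["exiftool", f"-csv={csv_file}", "-ext", ext, "-r", folder]


def extract_to_csv(folder: str, tags: list[str], out_file: str) -> None:
    """Run the read command and save its output, like "> out.csv"."""
    with open(out_file, "w", newline="") as out:
        subprocess.run(read_command(folder, tags), stdout=out, check=True)


print(" ".join(read_command(r"e:\DigitalLibrary\USRaid", ["FileName", "Title", "Identifier"])))
# exiftool -csv -r -FileName -Title -Identifier e:\DigitalLibrary\USRaid
```

Building the argument lists separately from running them makes it easy to print and check a command before it modifies any files.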

In conclusion, why consider creating "embedded metadata"? Most significantly, embedding metadata in digital objects (photos, etc.) results in those objects being very well identified for many future uses - not only for Greenstone! Databases, Digital Asset Management (DAM) systems and Collections Management Systems (CMS, for archives and museums) can almost always use the exported Excel data, directly or indirectly, as imports into their systems.

Email me with any questions or comments!

Bob Schmitt, rgschmitt@gmail.com

November 3, 2014

update December 10, 2015