CarLibrary.org - the Southward Car Museum Archive

April 24, 2016

Introduction

A Greenstone collection/archive for the Southward Car Museum was first made in 2012 based on an Excel file, provided by Southward, of the museum's 300+ pre-war and post-war cars and motorcycles.  Following standard Greenstone steps, this was converted to a CSV file and imported (exploded) into Greenstone.  Southward also sent about 200 photos, which were augmented with personal photos.  All were categorized in Greenstone with the standard Dublin Core metadata categories, plus author-created car-specific categories such as "car.make", "car.year", etc.

The archive was completed in December 2012 and not improved while other collections were created and the software category "collections management systems" (CMS) was investigated.  Based on lessons learned during reviews of CMS software, the archive is now improved with new digital archival techniques.  However, although these changes should improve searching and make adding new material closer to "best practices", an end-user will not notice much difference.  

These are the improvements to the Southward archive:

A. Object/Resource Numbering

B. Improved Metadata Use

C. Use of Embedded Metadata

D. Creating Captions and Car IDs Using Embedded Metadata

Section D describes using the ExifTool to efficiently put metadata into a group of photos from an Excel spreadsheet (exported to a CSV file).  This should be tremendously useful for archives with many digital photos that are partly or poorly identified.

A. Object/Resource Numbering

The initial list of Southward cars in the archive used a simple numbering scheme - the first Southward vehicle was "A001", the next one was "A002", etc.  Motorcycles used a similar "M001", etc. numbering pattern.  Museum and archive practice recommends an "accession number", which starts with the year an object enters the collection, followed by a serial number for each object entering the collection.  Multiple objects entering as a group would get a third number.

The first Southward car, built in 1895, is now assigned "1895.001".  This does not reflect when the cars actually "entered the collection", but this combination of year and serial number seems to be a practical approach.

Other resources - photos, documents, books - will use the archive convention if the "collection entry date" is known and meaningful.  Most frequently, numbers are assigned from the date(s) best identified with the object.  A photograph or document from April, 1954 will become 1954.4.x with the "x" assigned as necessary.  A spreadsheet is used to record the file names, accession numbers and a description of each item.  As discussed below, this data will eventually be used as "metadata" for each object/resource.
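
The numbering convention above can be sketched in a few lines of Python.  This is only an illustration, not part of the original workflow: the function names "make_accession" and "parent_accession" are hypothetical, and the three-digit zero padding of the serial number is an assumption inferred from examples such as "1895.001".

```python
def make_accession(prefix, serial, part=None):
    """Build an accession number such as '1895.001' or '1933.002.1'.

    prefix: a year (int) or a placeholder string.
    serial: running number, zero-padded to three digits
            (assumption based on the examples in the text).
    part:   optional third segment for items that belong to a group.
    """
    number = f"{prefix}.{serial:03d}"
    if part is not None:
        number += f".{part}"
    return number


def parent_accession(number):
    """Return the first two segments, e.g. '1933.002.1' -> '1933.002'."""
    return ".".join(number.split(".")[:2])


print(make_accession(1895, 1))          # 1895.001
print(make_accession(1933, 2, part=1))  # 1933.002.1
print(parent_accession("1933.002.1"))   # 1933.002
```

Date-based numbers such as "1954.4.x" do not pad the month, so they would need their own formatting rule; the sketch covers only the year-plus-serial pattern.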

The spreadsheet made in 2012 was updated with these accession numbers.  It also shows the original file names for the photos from Southward and the revised file names.  This spreadsheet will be converted to a CSV file and used to write metadata to the photos after all are identified.  In addition to the 200+ original photos, about 100 personal photos were taken in February, 2013.

Figure 1 below is the master spreadsheet showing Southward cars data and the assigned accession numbers.

Figure 1 - Extract from the Southward spreadsheet

B. Improved Metadata Use

Other pages on this website explain "metadata" as it relates to digital resources.  A complete and authoritative guide can be found at: Dublin Core User Guide.  From this guide:

"A metadata record consists of a set of attributes, or elements, necessary to describe the resource in question. For example, a metadata system common in libraries -- the library catalog -- contains a set of metadata records with elements that describe a book or other library item: author, title, date of creation or publication, subject coverage, and the call number specifying location of the item on the shelf.

"The linkage between a metadata record and the resource it describes may take one of two forms:

1.  elements may be contained in a record separate from the item, as in the case of the library's catalog record; or

2.  the metadata may be embedded in the resource itself."

Using "embedded metadata" is a further refinement.  There are many standard metadata classifications but the primary one used by Greenstone is the "Dublin Core" which has 15 basic categories:

Dublin Core Metadata         Definition
dc.Title                     A name given to the resource.
dc.Subject (and keywords)    The topic of the resource.
dc.Description               An account of the resource.
dc.Date                      A point or period of time associated with an event in the lifecycle of the resource.
dc.Type (of resource)        The nature or genre of the resource.
dc.Identifier (of resource)  An unambiguous reference to the resource within a given context.
dc.Source                    A related resource from which the described resource is derived.
dc.Format                    The file format, physical medium, or dimensions of the resource.
dc.Creator (author)          An entity primarily responsible for making the resource.
dc.Publisher                 An entity responsible for making the resource available.
dc.Contributor (other)       An entity responsible for making contributions to the resource.
dc.Language                  A language of the resource.
dc.Relation                  A related resource.
dc.Coverage                  The spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant.
dc.Rights (management)       Information about rights held in and over the resource.

Table 1 - Dublin Core categories defined

These categories were known when the Southward archive was created, but were used inconsistently.  To understand how to use them better, the Dublin Core examples were reviewed.  The specific examples in Table 2 below come from this Southward photo (Figure 2).

Figure 2 - Southward Chrysler Airflow photo

Dublin Core Metadata         Southward Examples
dc.Title                     1933 Chrysler Airflow
dc.Subject (and keywords)    Southward Car Museum, Chrysler, Airflow, 1933
dc.Description               Southward Car Museum - 2013
dc.Date                      2013-02-14
dc.Type (of resource)        image
dc.Identifier (of resource)  1933.002.1
dc.Source                    1933.002.1 (same; the accession number of the digital photograph)
dc.Format                    image/jpeg
dc.Creator (author)          Bob Schmitt
dc.Publisher                 Bob Schmitt
dc.Contributor (other)       Southward Car Museum
dc.Language                  English
dc.Relation                  1933.002 (the accession number of the actual car in the photograph)
dc.Coverage                  1933-1936
dc.Rights (management)       NA

Table 2 - Southward Dublin Core examples

Data similar to that in Table 2 is already in the Southward archive for all the photos and resources, but was not very well "controlled."  To approach archival standards, many of the terms should adhere to a "controlled vocabulary" from an "authority".  For example, the Library of Congress Name Authority File recognizes "Chrysler Airflow" as the preferred term for this car, and this project will use it as part of a controlled vocabulary.
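
A controlled vocabulary can be applied with a simple lookup that maps variant spellings to the preferred term.  The sketch below is a hedged illustration only: "Chrysler Airflow" is the preferred term named above, but the variant spellings and the names "PREFERRED_TERMS" and "normalize_term" are made up for the example.

```python
# Hypothetical variant spellings mapped to the preferred term.
# Only "Chrysler Airflow" comes from the text; the variants are invented.
PREFERRED_TERMS = {
    "chrysler airflow": "Chrysler Airflow",
    "airflow chrysler": "Chrysler Airflow",
    "chrysler air-flow": "Chrysler Airflow",
}


def normalize_term(term):
    """Return the preferred term, or None if the term is not in the vocabulary."""
    # Lower-case and collapse runs of whitespace before looking the term up.
    key = " ".join(term.lower().split())
    return PREFERRED_TERMS.get(key)


print(normalize_term("  chrysler   AIRFLOW "))  # Chrysler Airflow
print(normalize_term("Chrysler Airstream"))     # None
```

Returning None for unknown terms, rather than passing them through, makes it easy to list the values that still need a decision before they are embedded in a photo.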

C. Use of Embedded Metadata

All the resources in the original Southward Greenstone archive are described and classified by "external" metadata, which are stored by the Greenstone program as separate files.   For the most part, the work that was done to create this metadata is only useful in the Greenstone environment, with some exceptions for specialized exports of Greenstone files.  Collection managers now realize there are many benefits to "embedding" the metadata in the digital objects, as this forms a link between the metadata and the object that can only be changed by deliberate editing.

Using "embedded metadata" is discussed elsewhere on this website and the ExifTool program allows this to be done efficiently.  (A recent, unreviewed video on getting started with the ExifTool and GUI is here.)  The ExifTool (and the related "GUI" program) allows the metadata to be "read" to spreadsheet-compatible files.  These, in turn, can be imported to Greenstone, or a Greenstone archive can be built directly using only the embedded metadata.  The benefit of this is "type it once, use it many times".

One initial decision is to choose which categories to use for embedding data - there are hundreds, perhaps thousands!  Picasa captions are put into XMP and IPTC categories.  Picasa tags (keywords) are put into "XMP.Keywords" and the "DC.Subject" categories.  The ExifToolGUI was found to be more flexible, useful and efficient than Picasa.  There is some inconsistent transfer of keywords between IPTC.Keywords and DC.Subject when using Picasa to embed "tags".

Within the ExifToolGUI program, a custom Workspace file was created to embed metadata in certain Dublin Core, EXIF and XMP categories.  The ExifToolGUI will display metadata in PDF, Word and Excel files.  Although the ExifToolGUI will write metadata to some PDF files, expect to use other programs (Acrobat, Lightning PDF, etc.) to embed metadata in PDF files.  Word, Excel and compatible programs in LibreOffice and OpenOffice can embed limited categories of metadata through the "Properties" menu choice for "doc" and "xls" files.

Table 3, below, is the Workspace "set" (part of the "ExifToolGUIv5.ini" file) to use for work on the FN archive, created by much trial and error!

Metadata Category       Category Code          Hint/example
[WorkspaceTags]
Accession/ID Number     Identifier             Accession or ID Number
DC:Title                Title                  DC Title (alt: name of object)
DC:Description          Description            DC Description
DC:Subject              Subject                DC subject
IPTC:Keywords           IPTC:Keywords          Keywords/tags (use comma)
DC:Resource Identifier  Identifier             DC Resource Identifier
DC:Relation             Relation               DC Relation (to Primary Object)
DC:Source               Source                 DC source (is part of)
DC:Creator              Creator                DC creator
DC:Date                 Date                   DC date
DC:Contributor          Contributor            DC contrib
DC:Coverage             Coverage               DC coverage
DC:Format               Format                 DC format
DC:Type                 Type                   DC type
DC:Language             Language               DC language
DC:Publisher            Publisher              DC publisher
DC:Rights               Rights                 DC rights
Artist/Author           Author                 "Bob Schmitt"
Location                Location               Where created
Primary Object Number   XMP:Relation           Relation to Primary Object
PDF Title               pdf:Title              Document title
PDF Subject             pdf:Subject            Document subject
PDF Keywords            pdf:Keywords           Document keywords
CreateDate              exif:CreateDate        [2013:02:14 20:00:00]
DateTimeOriginal        exif:DateTimeOriginal  [2013:02:14 20:00:00]
FileAccessDate          FileAccessDate
FileName                FileName
FileSize                FileSize
FileType                FileType
ImageSize               ImageSize
PhotoShop: TextLayer    TextLayerText          Copyright data (if stored by Photoshop)
PDF Relation            pdf:Relation           Custom tag being tested; similar to DC:Relation
[TagList]

Table 3 - ExifToolGUI Workspace for the Southward archive

Below is the actual "WorkspaceTags" part of the "ini" file that will produce the Workspace described above and shown below.  It can be copied and pasted into the corresponding section of any ExifToolGUIv5.ini file:

[WorkspaceTags]
Accession/ID Number=-Identifier^Accession or ID Number
DC:Title=-Title^DC Title (name of object)
DC:Description=-Description^DC Description
DC:Subject=-Subject^DC subject
IPTC:Keywords=-IPTC:Keywords^Keywords/tags (use comma)
DC:Resource Identifier=-Identifier^DC Resource Identifier
DC:Relation=-Relation^DC Relation (to primary object)
DC:Source=-Source^DC source (is part of)
DC:Creator=-Creator^DC creator
DC:Date=-Date^DC date
DC:Contributor=-Contributor^DC contrib
DC:Coverage=-Coverage^DC coverage
DC:Format=-Format^DC format
DC:Type=-Type^DC type
DC:Language=-Language^DC language
DC:Publisher=-Publisher^DC publisher
DC:Rights=-Rights^DC rights
Artist/Author=-Author^"Bob Schmitt"
Location=-Location^Where created
Primary Object Number=-XMP:Relation^Relation to primary Object
PDF Title=-pdf:Title^Document title
PDF Subject=-pdf:Subject^Document subject
PDF Keywords=-pdf:Keywords^Document keywords
CreateDate=-exif:CreateDate^[2012:01:14 20:00:00]
DateTimeOriginal=-exif:DateTimeOriginal^[2012:01:14 20:00:00]
FileAccessDate=-FileAccessDate
FileName=-FileName
FileSize=-FileSize
FileType=-FileType
ImageSize=-ImageSize
PhotoShop: TextLayer=-TextLayerText^Copyright
PDF Relation=-pdf:Relation^Custom tag being tested; similar to DC:Relation
[TagList]
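
Each line in the section above appears to follow the pattern "label=-tag^hint", with the hint being optional.  That reading is inferred from comparing the lines with Table 3 and is not taken from the ExifToolGUI documentation.  Under that assumption, a short Python sketch (the function name "parse_workspace_line" is hypothetical) can split a line into its three parts, which is handy for checking a hand-edited "ini" file against the table:

```python
def parse_workspace_line(line):
    """Split 'Label=-Tag^Hint' into (label, tag, hint); hint may be empty."""
    # The label is everything before the first "=".
    label, _, rest = line.partition("=")
    # The tag and the optional hint are separated by "^".
    tag, _, hint = rest.partition("^")
    return label.strip(), tag.lstrip("-").strip(), hint.strip()


sample = [
    "DC:Title=-Title^DC Title (name of object)",
    "Primary Object Number=-XMP:Relation^Relation to primary Object",
    "FileName=-FileName",
]
for line in sample:
    print(parse_workspace_line(line))
```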

Figure 4 below shows what the ExifToolGUI sees on the image file from Figure 2 before adding the example data from Table 2 (above), as embedded metadata.

Figure 4 - Screenshot from the ExifToolGui

The payoff from using a modified Workspace manager with the ExifToolGUI is the ability to embed user-chosen data in digital photos and documents efficiently for near permanent use.  Further, extracting the data - to spreadsheet compatible files - is easily done for many uses.  Experience builds proficiency with the ExifToolGUI - it can embed data in selected batches of photos quickly.  And there is an option to retain the original file date.

Figure 5, below, is an Excel spreadsheet made from ExifTool extraction of metadata embedded in 200+ images and documents in the original set of Southward photos in the "ExportPhotos" folder.  This is the command used:

exiftool -csv -r -FileName -FileSize -Title -Identifier -Description -Subject -DateTimeOriginal -relation -keywords e:\DigitalLibrary\southward\exportphotos > southwardexportphotos.csv

Note that no data was returned for most of the metadata categories and only a very few "Description" values are present.
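
A quick way to see how sparse an extracted CSV file is, without opening Excel, is to count the non-empty values in each column.  The Python sketch below is an illustration under stated assumptions: the function name "count_filled" is hypothetical and the sample rows are made up, shaped like the output of the command above.

```python
import csv
import io


def count_filled(csv_text):
    """Return {column: number of rows with a non-empty value}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            # Short rows yield None for missing cells, so guard against that.
            if (row.get(name) or "").strip():
                counts[name] += 1
    return counts


# Made-up sample rows, not real data from the Southward photos.
sample = (
    "SourceFile,FileName,Description,Keywords\n"
    "photos/a.jpg,a.jpg,1904 Wolseley,\n"
    "photos/b.jpg,b.jpg,,museum\n"
    "photos/c.jpg,c.jpg,,\n"
)
for column, filled in count_filled(sample).items():
    print(f"{column}: {filled} of 3 rows filled")
```

For a real export, the text would be read from the file first, for example with open("southwardexportphotos.csv", newline="").read().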

Figure 5 - Screenshot from an Excel spreadsheet of extracted metadata

These Southward photos have hardly any desired embedded metadata.  This will be fixed by using a modification of the Figure 1 Excel file, converted to a CSV file, to embed data in these photos.

It isn't clear if there is a set of the "best" metadata categories to use.  The metadata set in Table 3 was used for trials of objects in the Southward archive beginning July 20, 2013 and continuing. Even after 200+ objects have been so enhanced, further readings and trials will ensure these are "correct", or reasonable, categories.

Based on more trials and on feedback from reviewers of the sample photos and documents in the archive, this tutorial will make further recommendations that may help others making Greenstone collections.

D. Creating Captions and Car IDs Using Embedded Metadata

There were about 200 photos obtained directly from the Southward Car Museum and about 175 personal photos from two visits to the museum.  The ExifTool GUI was used to put a value in the "DC:Description" category - this appears as a caption on the photos in Picasa.  A map location was also assigned in the ExifTool and appears in Picasa.

Adding a unique accession number (in the "DC:ResourceIdentifier" category) for each photo would be tedious using the ExifTool GUI, so the ExifTool "-csv=" import option, run from the command line, was used instead (see step 8 below).  This option uses data from a CSV file (exported from Excel) to write new or added metadata to a single image or to many images.  It should not be confused with "-tagsFromFile", which copies tags from one file to another.

The photos will be identified with an assigned accession number, a title and the accession number of the car in the photo.  These accession numbers were created as described above (e.g. "1928.007"), but a few cars could not be identified, so numbers such as "TBD.004" were assigned until they are identified.  The "DC:Relation"/"Primary Object Number" is recommended for all photos and documents, to show the relation of the digital object to the physical object - in this case, a particular Southward car.  Recording this link now should make it possible later to find every photo and document for a given car.

In the example in Figure 4 above, the subject Southward car is "1933.002". This technique can be used for collections and databases that include lists of car owners, cars, events and digital objects.
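
As a hedged illustration of why the relation value is useful, the Python sketch below groups photo identifiers by the car they depict.  The function name "photos_by_car" is hypothetical, and apart from "1933.002.1" and "1933.002" the sample values are made up.

```python
from collections import defaultdict


def photos_by_car(records):
    """Map each car accession number (Relation) to its photo identifiers."""
    index = defaultdict(list)
    for record in records:
        index[record["Relation"]].append(record["Identifier"])
    return dict(index)


records = [
    {"Identifier": "1933.002.1", "Relation": "1933.002"},
    {"Identifier": "1933.002.2", "Relation": "1933.002"},  # made-up second photo
    {"Identifier": "2013.2.7", "Relation": "1928.007"},    # made-up pairing
]
print(photos_by_car(records))
```

The same grouping could be done on any other shared key, such as an owner or an event, once those values are recorded consistently.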

There are three groups of photos to update, and these were the steps for the first group in the "photos\april12" folder:

1.   The metadata was extracted from the photos by running the ExifTool from the (Windows) command line as follows, to make a CSV/Excel file:

E:\DigitalLibrary\Southward\exportphotos

exiftool -csv -r -FileName -FileSize -Title -Identifier -Description -Subject -DateTimeOriginal -relation -keywords e:\photos\april12 > april12exportphotos.csv

This produced the "april12exportphotos.csv" file, which was opened in Excel.

2. The metadata for each photo (in the Excel rows) and in each category (in the columns) was reviewed.

3.  Specific data for 76 individual car photos was copied from the "Description" column to the empty "Title" column. This set of photos was the first of three sets to be added to the revised Greenstone archive.

4.  The Excel "data fill" function was used to create an "accession number" for all photos in the format "2012.2.1", "2012.3.2" etc. in the "Identifier" column.

5.  Columns that had no new data were deleted, leaving only "SourceFile", "Title", "Relation" and "Identifier".

6.  The Excel file was saved in the "CSV" format, using a new file name: "april12export2.csv". This was done to prevent confusion with the CSV file which extracted the metadata from the photos.

7.  A command window was opened and, from the prompt, the working location was switched to the drive and a new directory for the photos: "e:\test"

8.  This ExifTool command was entered:

Exiftool -csv=april12export2.csv -ext jpg -v2 e:\test\*.jpg > Apr8.txt

9.  Success!  The accession numbers were added to all 76 photos as "Identifiers". The individual car photos now have "Title", "Description", "Relation" and "Contributor" (the manufacturer).  The ExifTool also backed up the original photos by appending "_original" to each file name.
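
Steps 3 to 6 were done by hand in Excel, but the same preparation can be sketched in Python.  This is an illustration under stated assumptions rather than the method actually used: the function names, the sample rows and the "prefix plus running number" identifier pattern are all hypothetical.  The one firm requirement is the "SourceFile" column, which the ExifTool CSV import uses to match each row to a file.

```python
import csv
import io


def build_import_rows(exported_rows, prefix, start=1):
    """Prepare rows for an "exiftool -csv=FILE" import.

    exported_rows: dicts read from the extracted CSV (step 1).
    prefix:        leading part of the identifier, e.g. "2012.2" (assumed pattern).
    """
    rows = []
    for number, row in enumerate(exported_rows, start=start):
        rows.append({
            "SourceFile": row["SourceFile"],      # needed to match rows to files
            "Title": row.get("Description", ""),  # step 3: Description -> Title
            "Relation": row.get("Relation", ""),
            "Identifier": f"{prefix}.{number}",   # step 4: running number
        })
    return rows


def to_csv_text(rows):
    """Write the four kept columns (step 5) as CSV text (step 6)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["SourceFile", "Title", "Relation", "Identifier"]
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up rows standing in for the extracted CSV.
exported = [
    {"SourceFile": "e:/test/img001.jpg", "Description": "1904 Wolseley", "Relation": ""},
    {"SourceFile": "e:/test/img002.jpg", "Description": "1933 Chrysler Airflow", "Relation": "1933.002"},
]
print(to_csv_text(build_import_rows(exported, prefix="2012.2")))
```

The resulting text would be saved under a new file name, as in step 6, before running the import command from step 8.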

These steps were next applied to the 204 original photos from Southward and the 100 personal photos taken during February, 2013.

The photos were then added ("gathered") to the Southward Greenstone archive replacing photos with the same name. Only a single Greenstone item of "external" metadata was added in the "DC:Description" category at the "folder" level: "Southward Car Museum 2014".  This was done in the Greenstone "Enrich" function.

Greenstone extracts and uses the embedded metadata with its "EmbeddedMetadataPlugin".  This was configured to extract "XMP.*" metadata, the category that was used by the Excel/CSV file to put the data into the photos.  If this filter is not used, Greenstone will extract 250+ items of metadata from each photo, nearly all related to the original camera settings!
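
The effect of such a filter can be illustrated outside Greenstone with a few lines of Python.  This sketch does not reproduce the plugin's own code or configuration syntax: the function name "keep_matching" and the sample metadata names and values are hypothetical, and it simply shows how a pattern like "XMP.*" narrows a long list of extracted items.

```python
from fnmatch import fnmatchcase


def keep_matching(metadata, pattern="XMP.*"):
    """Keep only the metadata items whose name matches the pattern."""
    return {name: value for name, value in metadata.items()
            if fnmatchcase(name, pattern)}


# Made-up sample of extracted metadata names and values.
extracted = {
    "XMP.Title": "1933 Chrysler Airflow",
    "XMP.Identifier": "1933.002.1",
    "EXIF.FNumber": "2.8",
    "EXIF.ISO": "400",
}
print(keep_matching(extracted))
```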

The photos can be reviewed in the Greenstone Southward archive in the "photo title (xmp)" or the "descriptions" browsing tabs.  The cars are listed from earliest to most recent by "year of manufacture" in the last tab segment, "0-9".

Trials with alternative search and browsing categories are ongoing, as are changes to the display format of the search and browsing results.  Greenstone reports extracting more than 90 metadata items from most digital objects, so many, many search and display formats are possible!

It's important to note that embedding metadata in digital objects (photos, etc.) results in those objects being very well identified for many future uses - not only for Greenstone.  Databases, Digital Asset Management (DAM) systems and Collections Management Systems (CMS, for archives and museums) can also use this data directly or indirectly through the above-described metadata export to CSV/Excel files.

For example, if a photo is found on the Southward archive (search for the color photo of the 1904 Wolseley) this full-sized photo (click on the thumbnail photo) can be saved - "Save Image As..." - and the metadata can be reviewed in the ExifToolGui or other programs.  The file name may have been changed, but the original metadata has been preserved. 

Email me with any questions or comments: Bob Schmitt, rgschmitt@gmail.com