CarLibrary.org - the Southward Car Museum Archive
A Greenstone collection/archive for the Southward Car Museum was first made in 2012 based on an Excel file, provided by Southward, of the museum's 300+ pre-war and post-war cars and motorcycles. Following standard Greenstone steps, this was converted to a CSV file and imported (exploded) into Greenstone. Southward also sent about 200 photos, which were augmented with personal photos. All were categorized in Greenstone with the standard Dublin Core metadata categories, plus author-created car-specific categories such as "car.make", "car.year", etc.
The archive was completed in December 2012 and not improved while other collections were created and the software category "collections management systems" (CMS) was investigated. Based on lessons learned during reviews of CMS software, the archive is now improved with new digital archival techniques. However, although these changes should improve searching and make adding new material closer to "best practices", an end-user will not notice much difference.
These are the improvements to the Southward archive:
Section D describes using the ExifTool to efficiently put metadata from an Excel spreadsheet (exported to a CSV file) into a group of photos. This should be tremendously useful for archives with many digital photos that are partly or poorly identified.
The initial list of Southward cars in the archive used a simple numbering scheme: the first Southward vehicle was "A001", the next one was "A002", etc. Motorcycles used a similar "M001", etc. numbering pattern. Museum and archive practice recommends an "accession number", in which the first part is the year an object enters the collection and the second part is a serial number for each object entering the collection in that year. Multiple objects entering as a group would get a third number.
The first Southward car, built in 1895, is now assigned "1895.001". This does not reflect when the cars actually "entered the collection", but this combination of year and serial number seems to be a practical approach.
Other resources - photos, documents, books - will use the archive convention if the "collection entry date" is known and meaningful. Most frequently, numbers are assigned from the date(s) best identified with the object. A photograph or document from April, 1954 will become 1954.4.x, with the "x" assigned as necessary. A spreadsheet is used to record the file names, accession numbers and a description of each item. As discussed below, this data will eventually be used as "metadata" for each object/resource.
The spreadsheet made in 2012 was updated with these accession numbers. It also shows the original file names for the photos from Southward and the revised file names. This spreadsheet will be converted to a CSV file and used to write metadata to the photos after all are identified. In addition to the 200+ original photos, about 100 personal photos were taken in February, 2013.
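The numbering scheme described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used for the archive: the function names and the sample vehicles are hypothetical, and it assumes that serial numbers restart at 001 for each year and are zero-padded to three digits, as in "1895.001".

```python
from collections import defaultdict


def assign_accession_numbers(vehicles):
    """Return {name: "YYYY.NNN"} for a list of (name, year) pairs.

    Assumption: serial numbers restart at 001 within each year and follow
    the order in which the vehicles appear in the list.
    """
    last_serial = defaultdict(int)  # year -> last serial handed out
    numbers = {}
    for name, year in vehicles:
        last_serial[year] += 1
        numbers[name] = f"{year}.{last_serial[year]:03d}"
    return numbers


def dated_accession_number(year, month, serial):
    """Build a "year.month.x" number for a dated photo or document."""
    return f"{year}.{month}.{serial}"


# Hypothetical sample rows, not taken from the Southward spreadsheet.
sample = [("Car A", 1895), ("Car B", 1904), ("Car C", 1904)]
numbers = assign_accession_numbers(sample)
```

A document from April, 1954 would then be numbered with dated_accession_number(1954, 4, 1), giving "1954.4.1".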
Figure 1 below is the master spreadsheet showing Southward cars data and the assigned accession numbers.
Figure 1 - Extract from the Southward spreadsheet
Using "embedded metadata" is a further refinement. There are many standard metadata classifications but the primary one used by Greenstone is the "Dublin Core" which has 15 basic categories:
Table 1 - Dublin Core categories defined
These categories were known when the Southward archive was created, but were used inconsistently. To better understand how to improve their use, the Dublin Core examples were reviewed. Specific examples in Table 2 below come from this Southward photo (Figure 2).
Figure 2 - Southward Chrysler Airflow photo
Table 2 - Southward Dublin Core examples
Data similar to that in Table 2 is already in the Southward archive for all the photos and resources, but was not very well "controlled." To approach archival standards, many of the terms should adhere to a "controlled vocabulary" from an "authority". For example, the Library of Congress Name Authority File recognizes "Chrysler Airflow" as the preferred term for this car and this project will use this as part of a controlled vocabulary.
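As a rough sketch of what "controlling" a term could look like in practice, the Python below maps variant spellings to one preferred term and flags anything it does not recognize. The variant spellings and the function name are hypothetical; only the preferred term "Chrysler Airflow" comes from the text above.

```python
# Hypothetical variant spellings mapped to one preferred term.
PREFERRED_TERMS = {
    "chrysler airflow": "Chrysler Airflow",
    "chrysler air flow": "Chrysler Airflow",
    "airflow (chrysler)": "Chrysler Airflow",
}


def normalize_term(raw):
    """Return (preferred term, True), or (cleaned input, False) if unknown."""
    cleaned = " ".join(raw.split())  # trim and collapse runs of whitespace
    preferred = PREFERRED_TERMS.get(cleaned.lower())
    if preferred is None:
        return cleaned, False  # unknown term: flag it for manual review
    return preferred, True
```

Terms returned with False can be collected into a list for manual review before anything is written to the photos.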
All the resources in the original Southward Greenstone archive are described and classified by "external" metadata, which are stored by the Greenstone program as separate files. For the most part, the work that was done to create this metadata is only useful in the Greenstone environment, with some exceptions for specialized exports of Greenstone files. Collection managers now realize there are many benefits to "embedding" the metadata in the digital objects, as this forms a link between the metadata and the object that can only be changed by deliberate editing.
Using "embedded metadata" is discussed elsewhere on this website and the ExifTool program allows this to be done efficiently. (A recent, unreviewed video on getting started with the ExifTool and GUI is here) The ExifTool (and the related "GUI" program) allow the metadata to be "read" to spreadsheet-compatible files. These, in turn, can be imported to Greenstone, or a Greenstone archive can be built directly using only the embedded metadata. The benefit of this is "type it once, use it many times".
One initial decision is to choose which categories to use for embedding data - there are hundreds, perhaps thousands! Picasa captions are put into XMP and IPTC categories. Picasa tags (keywords) are put into "XMP.Keywords" and the "DC.Subject" categories. The ExifToolGUI was found to be more flexible, useful and efficient than Picasa. There is some inconsistent transfer of keywords between IPTC.Keywords and DC.Subject when using Picasa to embed "tags".
Within the ExifToolGUI program, a custom Workspace file was created to embed metadata in certain Dublin Core, EXIF and XMP categories. The ExifToolGUI will display metadata in PDF, Word and Excel files. Although the ExifToolGUI will write metadata to some PDF files, expect to use other programs (Acrobat, Lightning PDF, etc.) to embed metadata in PDF files. Word, Excel and compatible programs in LibreOffice and OpenOffice can embed limited categories of metadata through the "Properties" menu choice for "doc" and "xls" files.
Table 3, below, is the Workspace "set" (part of the "ExifToolGUIv5.ini" file) to use for work on the FN archive, created by much trial and error!
Table 3 - ExifToolGUI Workspace for the Southward archive
Below is the actual "WorkspaceTags" part of the "ini" file that will produce the Workspace described above and shown below. It can be copied and pasted into the corresponding section of any ExifToolGUIv5.ini:
Figure 4 below shows what the ExifToolGUI sees on the image file from Figure 2 before adding the example data from Table 2 (above), as embedded metadata.
Figure 4 - Screenshot from the ExifToolGui
The payoff from using a modified Workspace manager with the ExifToolGUI is the ability to embed user-chosen data in digital photos and documents efficiently for near-permanent use. Further, extracting the data to spreadsheet-compatible files is easily done for many uses. Experience builds proficiency with the ExifToolGUI, and it can embed data in selected batches of photos quickly. There is also an option to retain the original file date.
Figure 5, below, is an Excel spreadsheet made from ExifTool extraction of metadata embedded in 200+ images and documents in the original set of Southward photos in the "ExportPhotos" folder. This is the command used:
Note that no data was returned for most of the metadata categories and only a very few "Description" values are present.
Figure 5 - Screenshot from an Excel spreadsheet of extracted metadata
These Southward photos have hardly any desired embedded metadata. This will be fixed by using a modification of the Figure 1 Excel file, converted to a CSV file, to embed data in these photos.
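A quick way to see how sparse an extract like this is would be to count the non-empty values in each column of the exported CSV file. The following is a minimal sketch using only Python's standard library; the column names and the three sample rows are hypothetical and are not taken from the Southward extract.

```python
import csv
import io


def column_coverage(csv_text):
    """Count, per column, how many rows hold a non-empty value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    total = 0
    for row in reader:
        total += 1
        for name in counts:
            if (row.get(name) or "").strip():
                counts[name] += 1
    return total, counts


# Hypothetical extract: three files, only one has a description.
sample_csv = (
    "SourceFile,Title,Description,Identifier\n"
    "photo1.jpg,,,\n"
    "photo2.jpg,,Front view,\n"
    "photo3.jpg,,,\n"
)
total, counts = column_coverage(sample_csv)
```

For a real extract, the text of the CSV file would be read from disk and passed to column_coverage in place of sample_csv.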
It isn't clear if there is a set of the "best" metadata categories to use. The metadata set in Table 3 was used for trials of objects in the Southward archive beginning July 20, 2013 and continuing. Even after 200+ objects have been so enhanced, further readings and trials will ensure these are "correct", or reasonable, categories.
Based on more trials and feedback from reviewers with the sample of photos and documents in the archive, this tutorial will further make recommendations that may help others making Greenstone collections.
There were about 200 photos obtained directly from the Southward Car Museum and about 175 personal photos from two visits to the museum. The ExifTool GUI was used to put a value in the "DC:Description" category - this appears as a caption on the photos in Picasa. A map location was also assigned in the ExifTool and appears in Picasa.
Adding a unique accession number (in the "DC:ResourceIdentifier" category) for each photo would be tedious using the ExifTool GUI, so the ExifTool "-tagsFromFile" option, used from the command line, was attempted. This option is described as using data from a CSV file (exported from Excel) to write to single, or multiple, images as new (or added) metadata.
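One way to picture the batch step is to turn each row of the CSV file into an ExifTool command line. The Python sketch below only builds the argument lists and runs nothing. It is not the command used for this archive: the spreadsheet column names are hypothetical, and the choice of the "XMP-dc" tags (Identifier, Title, Relation) is an assumption. As far as the ExifTool documentation goes, "-P" preserves the file modification date, and a separate "-csv=FILE" option is provided for importing values straight from a CSV file.

```python
import csv
import io

# Assumed mapping from spreadsheet columns (hypothetical names) to
# ExifTool tag names in the XMP Dublin Core group.
COLUMN_TO_TAG = {
    "Accession": "XMP-dc:Identifier",
    "Title": "XMP-dc:Title",
    "CarNumber": "XMP-dc:Relation",
}


def build_commands(csv_text):
    """Return one exiftool argument list per CSV row; nothing is executed."""
    commands = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        args = ["exiftool", "-P"]  # -P keeps the file's modification date
        for column, tag in COLUMN_TO_TAG.items():
            value = (row.get(column) or "").strip()
            if value:  # skip empty cells rather than write blank tags
                args.append(f"-{tag}={value}")
        args.append(row["FileName"])
        commands.append(args)
    return commands


# Hypothetical rows; the second photo shows a car not yet identified.
sample_csv = (
    "FileName,Accession,Title,CarNumber\n"
    "photo1.jpg,2013.2.1,Front view,1933.002\n"
    "photo2.jpg,2013.2.2,,TBD.004\n"
)
commands = build_commands(sample_csv)
```

Each list could be passed to subprocess.run on a machine where ExifTool is installed. Trying this on copies of a few photos first is the safer course.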
The photos will be identified with an assigned accession number, a title and the accession number of the cars in the photos. These accession numbers were created as described above (e.g. "1928.007"), but a few cars could not be identified, so numbers such as "TBD.004" were assigned until they are identified. The "DC:Relation"/"Primary Object Number" is recommended for all photos and documents, to show the relation of the digital object to the physical object, in this case a particular Southward car. This has the potential for significant future benefits.
In the example in Figure 4 above, the subject Southward car is "1933.002". This technique can be used for collections and databases that include lists of car owners, cars, events and digital objects.
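To illustrate why recording the related car number pays off, the sketch below groups photo records by that number and lists the placeholder numbers that still need identification. The record layout, file names and function names are hypothetical; in practice the records might come from a CSV export of the embedded metadata.

```python
from collections import defaultdict


def photos_by_car(records):
    """Group photo file names under the car accession number they relate to."""
    index = defaultdict(list)
    for record in records:
        index[record["relation"]].append(record["file"])
    return dict(index)


def unidentified(index):
    """List placeholder car numbers (prefix "TBD.") that still need work."""
    return sorted(number for number in index if number.startswith("TBD."))


# Hypothetical records, as they might come from an exported CSV file.
records = [
    {"file": "photo1.jpg", "relation": "1933.002"},
    {"file": "photo2.jpg", "relation": "TBD.004"},
    {"file": "photo3.jpg", "relation": "1933.002"},
]
index = photos_by_car(records)
```

The same index could be joined to a list of cars, owners or events keyed by the car's accession number.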
There are three groups of photos to update and these were the steps for the first group in the "photos\april12" folder:
These steps were then applied to the 204 original photos from Southward and the 100 personal photos taken during February, 2013.
The photos were then added ("gathered") to the Southward Greenstone archive replacing photos with the same name. Only a single Greenstone item of "external" metadata was added in the "DC:Description" category at the "folder" level: "Southward Car Museum 2014". This was done in the Greenstone "Enrich" function.
Greenstone extracts and uses the embedded metadata with its "EmbeddedMetadataPlugin". This was configured to extract "XMP.*" metadata, the category that was used by the Excel/CSV file to put the data into the photos. If this filter is not used, Greenstone will extract 250+ items of metadata from each photo, nearly all related to the original camera settings!
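The effect of such a filter can be pictured with a few lines of Python. This is not how the Greenstone plugin is implemented. It only shows the idea of keeping the "XMP." entries and dropping camera-setting entries, and the metadata names and values are hypothetical.

```python
def keep_xmp(metadata):
    """Keep only entries whose name starts with the "XMP." prefix."""
    return {name: value for name, value in metadata.items()
            if name.startswith("XMP.")}


# Hypothetical extracted metadata for one photo.
extracted = {
    "XMP.Title": "Front view",
    "XMP.Identifier": "2013.2.1",
    "EXIF.ExposureTime": "1/250",
    "EXIF.FNumber": "8.0",
}
filtered = keep_xmp(extracted)
```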
The photos can be reviewed in the Greenstone Southward archive in the "photo title (xmp)" or the "descriptions" browsing tabs. The cars are listed from earliest to most recent by "year of manufacture" under the last tab segment: "0-9".
Trials with alternative search and browsing categories are ongoing, as are changes to the display format of the search and browsing results. Greenstone reports extracting more than 90 metadata items from most digital objects, so many search and display formats are possible!
It's important to note that embedding metadata in digital objects (photos, etc.) results in those objects being very well identified for many future uses - not only for Greenstone. Databases, Digital Asset Management (DAM) systems and Collections Management Systems (CMS, for archives and museums) can also use this data directly or indirectly through the above-described metadata export to CSV/Excel files.
For example, if a photo is found on the Southward archive (search for the color photo of the 1904 Wolseley) this full-sized photo (click on the thumbnail photo) can be saved - "Save Image As..." - and the metadata can be reviewed in the ExifToolGui or other programs. The file name may have been changed, but the original metadata has been preserved.
Email me with any questions or comments: Bob Schmitt, firstname.lastname@example.org