The Car Archive Project: How to Make an Archive
How and Why Should I Organize My Cars, Photos and Documentation?
The full set of CarLibrary.org webpages review and recommend the use of the open-source Greenstone digital library software program for car historians, collectors, clubs, museums and collections, to encourage the creation of digital archives.
However, many car hobbyists simply would like to have their car(s), photos and documentation better organized.
This webpage describes a step-by-step process which will help a collector go from "stuff" to a well-organized digital archive/library. The benefits of better organization are realized at each stage.
These recommended steps are based on my 10+ years of experience in getting better organized, on digital archive/library practices, on documentation from professionals - and on common sense!
One goal is to make good progress on organizing without repeating "data entry" and make use of the identifying information (metadata) which may already be on, and in, your files. A second goal is to incorporate both professional and practical library/archive standards so that the collection - or its documents - can be shared with others.
This webpage covers the following topics:
A free online training course, "Digital Libraries, Repositories and Documents" is very useful to learn terms, practices and steps to create a digital library. Regular reference to this site and just a scan of the lessons can help very much. I reviewed most of these lessons in March, 2013.
A more traditional reference source is "How to Build a Digital Library, Second Edition" by Ian H. Witten, David Bainbridge and David M. Nichols. Reviews of this book note it could be used as a college text for digital libraries/archives.
More than two-thirds of this book is an excellent introduction to the concepts, history and issues of digital libraries - all relevant to everything a collector is likely to do. The remaining third of the book is a very good tutorial on the Greenstone digital library software.
My copy is always on my desk!
Why should a collector or historian want to be better organized or work towards an "archive"?
Some car hobbyists like to work and collect solely for their own pleasure. Other collectors/historians want to share their "collection" with others. Most collectors/historians enjoy seeing other car collections, libraries and "other material".
Automotive - and other - collections can be "organized" in many different ways and from "casual" to "obsessive". Traditionally, file folders and cabinets are used to store documents and photos. Artifacts (including spare parts!), books, magazines may be on shelves or in boxes.
For the collector who enjoys his collection alone, improved organization can make all of this material easier to access. Further, better organization may reveal relationships between objects that amount to new knowledge - such as the commonality of a part to several manufacturers.
For collectors who plan to share their collected material and look for photos, documentation or literature from others, organization is very important. A collector may be willing to sort through partly organized material a few times looking for something particular, but random searches through files or books are not helpful for any practical exchanges of information.
Digital photos, digitized documents and Internet sources have greatly complicated the organization task - they initially have cryptic file names, are only viewable on an electronic device and remain numerous and unorganized without human intervention.
To summarize, goals may include:
a. Make your assets ("stuff") better identified and organized so you can find what you need!
b. Organize your assets in standard categories so they can be shared with club members, other owners, or even "the world" by means of the Internet.
c. Use a database, content management, or digital library software to make it easier to find particular items in your collection.
Perhaps the first step is recognizing that the collection's assets are physical items! This may seem elementary, but it may give a better understanding of the first steps:
Step One - Create an Inventory
Because this topic is about car collections, make a list of the relevant cars. Plan to soon add a separate list of car parts. Experience has shown that an Excel spreadsheet is very good for this purpose and for many hobbyists, Excel will be all that is needed. If a more capable inventory system is needed, Microsoft's Access database program is very powerful and there is an easy transition from Excel to Access. There are also commercial database software systems designed specifically for museums and collections. However, the "data" from a well-designed Excel spreadsheet can be exported to nearly any well-designed database. This eliminates the need for new data entry and postpones decisions on Excel alternatives.
If there is a printed or card-file inventory for the collection, it is possible it can be scanned and converted (Optical Character Recognition, "OCR") into data that can be imported into Excel. If a different database has been used, such as (ancient) dBase, File Maker Pro, etc., the data can also be exported into a file type compatible with Excel.
What is a "well-designed" (Excel) file? Without getting too deeply into database file design, each item (for example, a car) should be on a single Excel row. Each characteristic, "Make", "Model", "Year", etc. should be a separate column heading. In database terminology, each row is a "record" and each column heading is a "field".
Each type of asset - cars, books, photos, owner records - should be on a separate Excel file. If any data element repeats frequently, such as "General Motors Corporation", for the Manufacturer, it can be considered for placement in a separate table/file. This technique is "database normalization", which will be discussed in the next step.
This is an example of an Excel table/file for cars:
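As a rough illustration of the layout described above, one row per car and one column per characteristic, the following Python sketch writes a small inventory as CSV text, which Excel opens directly. The column headings, the sample rows and the accession numbers are all hypothetical and are not taken from my own spreadsheet.

```python
import csv
import io

# Hypothetical column headings ("fields"); adapt them to your own collection.
FIELDS = ["Accession Number", "Make", "Model", "Year", "Serial Number"]

# Each dictionary is one "record": one car per row. All values are made up.
cars = [
    {"Accession Number": "1975.12.1", "Make": "Chevrolet", "Model": "Bel Air",
     "Year": 1955, "Serial Number": "EXAMPLE-0001"},
    {"Accession Number": "1982.6.1", "Make": "Ford", "Model": "Custom",
     "Year": 1954, "Serial Number": "EXAMPLE-0002"},
]


def cars_to_csv(records):
    """Return the records as CSV text: one header row, then one row per car."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = cars_to_csv(cars)
print(csv_text)
```

Keeping the accession number in the first column makes it easy to spot and to use later as a cross-reference.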
What is the "Accession Number" as used in the first column? This Wikipedia definition states: "In libraries and museums and other archives, an accession number or catalogue number is a unique, usually sequential, number given to each new item acquired, as it is catalogued."
Why should any (small) collector or historian care about this? If we look forward a few steps, there will be a time when you want to match a regular or digital photo, a document or even a part to a particular car. If your car(s) have an accession number, this will be the basis for a cross-reference in a database or digital archive. Or in any list! If you start using a unique numbering system at the beginning, you'll save much effort later.
Museums use a system of accession numbers based on the acquisition date and hand-written log books to establish provenance. This may not be important to a car collection, and the exact date any item was acquired may be difficult to establish. I am using a system based on the "best guess" of the year and month acquired, followed by a sequence number of up to three digits. A car bought in December 1975 could therefore be "75.12.1", but "1975.12.1" may be necessary if a collection spans more than 100 years.
For digital assets, such as photos or scanned documents, an accession number system can greatly help in identifying multiple, lesser quality (or later) copies of the original. Although best museum practice states not to use suffixes on accession numbers, naming subsequent copies of an original scan "2012.12.1a", "2012.12.1b", etc. can be useful.
The Accession Number should be written (in pencil) on each document, photograph and book. Perhaps also "on" each car and part.
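The year, month and sequence scheme can be sketched in a few lines of Python. This is only an illustration of the idea: the function names are hypothetical, and whether to pad the month or sequence with leading zeros is a choice left to each collector.

```python
from collections import defaultdict


def make_accession_numberer(four_digit_year=True):
    """Return a function that issues 'year.month.sequence' accession numbers."""
    counters = defaultdict(int)  # (year, month) -> last sequence number issued

    def next_number(year, month):
        if not 1 <= month <= 12:
            raise ValueError("month must be between 1 and 12")
        counters[(year, month)] += 1
        year_part = str(year) if four_digit_year else f"{year % 100:02d}"
        return f"{year_part}.{month}.{counters[(year, month)]}"

    return next_number


next_number = make_accession_numberer()
first = next_number(1975, 12)   # first item acquired in December 1975
second = next_number(1975, 12)  # second item from the same month
print(first, second)
```

Passing `four_digit_year=False` gives the shorter "75.12.1" style, which is only safe while the collection spans less than 100 years.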
This is a sample Excel file for books:
Note: The books file above was created and later enhanced through the use of LibraryThing.com; this service is discussed in detail below. Note also that accession numbers have not yet been assigned to these books.
Photographs and documents should use a format similar to that of books. A caption (description), date, location, and photographer should be the minimum data for each photo.
The digital scans of these photos and digital images will be discussed below; if you first concentrate on your "physical assets", the management of the digital representations of the physical objects may be clearer.
If you have lists of people, such as contacts or former owners of the car(s), these can be put into a standard mailing list format: last name, first name, address, etc., again in an Excel file.
Similarly, if there are lists of shows, rallies, races or other events that relate to the car(s), these can be recorded in a separate Excel file with "event name", "event type", "date", "award", etc. as the column headings.
Step Two - Review and Improve the Inventory Files
Each Excel file (table) should be reviewed for consistent and accurate data. As mentioned above, data redundancy can be lessened by the practice of "normalization". From Wikipedia: "Database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships."
In practice, this is not so critical for most Excel files because copying cells is easy and "data storage" is cheap. As discussed in more detail below, using an Accession Number for each item (or for the data entry line of a person's name or business) makes cross-referencing (or "linking") items much easier. For example, it is easier to make Excel/database entries for 40 photographs of a particular 1955 Chevrolet with (dummy) Accession Number 19188.8.131.52 rather than repeat "1955 Chevrolet Bel Air S/N 1276ZTR0008" 40 times. Especially if the Excel history file (or actual collection) contains information on 27 different Chevrolet Bel Airs!
Because combining and splitting cells, creating consecutive numbers and moving rows and columns are normal operations in Excel, Accession Numbers can be added at any time and normalization can be improved.
Is it a "Chevy" or a "Chevrolet"? Is the company "GM", "General Motors Corporation" or "General Motors Company"? This is an issue that has concerned librarians almost since the first library! This introduces the concept of "name authorities". One source you should be aware of is the Library of Congress Authorities. For example, the Library of Congress recognizes both "General Motors Company" and "General Motors Corporation" as authorities at different times. If you want to "get it right" and create files and descriptors that are "standard" and can easily be searched by others, this is the right place to start.
There are similar authorities in other countries and there is also an international service, the Virtual International Authority File (VIAF). The Library of Congress states "Voisin, Gabriel, 1880-1973" is the authority for this early French car manufacturer. The VIAF agrees, with citations from France, Germany and the Netherlands!
LibraryThing.com can provide a shortcut to improving a book inventory. This is an online service that allows free entry of 200 books, with unlimited entry for $10/year or $25/life. One-by-one entry of books is simple, with only a partial author's name or title usually needed to bring up (match) a full set of data on the particular book from more than 600 worldwide libraries. When the match is made, the book in your online "library" will have extensive data, such as the ISBN number, Library of Congress number, etc. The service provides a batch entry (import) from an Excel file (converted to CSV) and a similar export. Therefore a "round trip" of an Excel file of books through LibraryThing will result in a near-perfect book inventory, with excellent descriptors for each book.
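Once a preferred form has been chosen for each name, applying it consistently to an inventory is a simple lookup. The sketch below is a minimal illustration in Python; the lookup table is hypothetical, and the preferred forms shown are placeholders that should be checked against a real authority file before use.

```python
# Hypothetical lookup table: informal variant -> preferred ("authority") form.
# The preferred forms are placeholders; confirm them in a real authority file.
AUTHORITY = {
    "chevy": "Chevrolet",
    "chevrolet": "Chevrolet",
    "gm": "General Motors Corporation",
    "general motors": "General Motors Corporation",
}


def normalize_name(raw):
    """Return the preferred form of a name, or the cleaned input if unknown."""
    cleaned = " ".join(raw.split())  # collapse stray whitespace
    return AUTHORITY.get(cleaned.lower(), cleaned)


makes = ["Chevy", " chevrolet ", "GM", "Frazer Nash"]
normalized = [normalize_name(m) for m in makes]
print(normalized)
```

Names that are not in the table pass through unchanged, so the table can grow gradually as new variants turn up.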
But perfection is not necessary! Files can always be improved, but when you reach a stage where you - and perhaps a reviewer or two - think you have a good (Excel) inventory, you should realize that you have created digital assets - the inventory file(s) are the assets and the item descriptors (matched to the fields/column headings) are the "metadata" for each item.
At this stage, you may have gained "enough organization" with the inventory lists for your personal goals/purposes. If so, you can recognize that you are likely far ahead of most car collectors/historians!
If you want to go directly to creating one or more digital archives, jump to the webpage "Importing Files Into Greenstone". Otherwise, continue to Step Three below.
Step Three - More on Metadata
The common definition of metadata is "data about data", but Wikipedia provides much more detail and clarifies this common definition. Highly recommended. A traditional card in the (old) library catalog files was all metadata - book title, subject matter(s), keywords, author, date, publisher, etc.
Metadata has been with us for a very long time, but this nomenclature gained great recognition in the early (pre-Google) days of the Internet, as words/terms in specific categories were "embedded" in web pages, visible only when viewing the HTML code. This webpage, for example, has "DC.Title" content="Creating Digital Archives for Car Collections" near the top of the code behind this page. Before Google used certain techniques - including indexing the full contents of webpages - to refine searching, these meta tags were the only method to classify and search the Internet. Because we do not yet have "everything" digitized and fully text-searchable (difficult for images!), using meta tags and searching through metadata will be important for many years.
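To show how software can read such embedded tags, here is a minimal Python sketch that collects the Dublin Core meta tags from a page's HTML. The sample page source is illustrative; only the "DC.Title" value comes from the text above, and the exact form of the tag (a meta element with "name" and "content" attributes) is assumed from common practice.

```python
from html.parser import HTMLParser


class DublinCoreMetaParser(HTMLParser):
    """Collect <meta name="DC.*" content="..."> tags from an HTML page."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.upper().startswith("DC."):
            self.found[name] = attributes.get("content", "")


# Illustrative page source; the viewport tag is there to show it is skipped.
SAMPLE_HTML = """<html><head>
<meta name="DC.Title" content="Creating Digital Archives for Car Collections">
<meta name="viewport" content="width=device-width">
</head><body></body></html>"""

parser = DublinCoreMetaParser()
parser.feed(SAMPLE_HTML)
print(parser.found)
```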
As an example of metadata, having "Aunt Sally's Ford" penciled on the back of an old photo is better than nothing, but a fuller description probably should state "Sally Brown", "1954 Ford Custom", "Seattle World's Fair", "May 13, 1962", "mother's sister".
What is the "DC" in "DC.Title" above? It stands for "Dublin Core", a widely recognized standard set of metadata categories defined in a 1995 metadata workshop in Dublin, Ohio. You can consult Wikipedia for more background and the fundamentals of this standard, but it is more important to know there are standard categories for your metadata that will be used and recognized by librarians, archivists and software used to make digital libraries and archives.
If your Excel inventory has used consistent "descriptors" for your cars and other items, just changing the column headings to the appropriate DC category will bring the descriptors into the "Dublin Core" standard. In the Greenstone Digital Library software, metadata categories can also be created, such as "car.Make", etc.
Metadata can be "internal" or "external" to the item. The index card in a library card catalog is external to the shelved book; the title, author, etc. on the first few pages are, of course, internal. Similarly, the data in the Excel inventory files are now also external metadata. Other digital items, such as digital photographs or your Excel inventory, have embedded (internal) metadata. In the Excel program's "File" menu, select "Properties" and your name will appear as "Author" and your company name may appear as the "Business". Other Microsoft Office programs similarly embed metadata, which is also reached through the "File" and "Properties" menus. Many other digital formats have extensive metadata embedded in their files, but not readily visible.
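Renaming the headings can be done by hand in Excel, or with a short script on a CSV export. The Python sketch below assumes a hypothetical mapping from local headings to Dublin Core elements; which element suits which column is a judgment call, and the "dc." prefix style is an assumption for illustration.

```python
import csv
import io

# Hypothetical mapping from local column headings to Dublin Core elements.
COLUMN_TO_DC = {
    "Caption": "dc.Title",
    "Photographer": "dc.Creator",
    "Date": "dc.Date",
    "Accession Number": "dc.Identifier",
}


def rename_headings(csv_text, mapping):
    """Return CSV text with headings replaced per mapping; others unchanged."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    rows[0] = [mapping.get(heading, heading) for heading in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Made-up sample data: one header row and one photo record.
sample = (
    "Accession Number,Caption,Photographer,Date,Make\n"
    "2012.12.1,Club rally,J. Doe,1962-05-13,Ford\n"
)
renamed = rename_headings(sample, COLUMN_TO_DC)
print(renamed)
```

Headings without a Dublin Core equivalent, such as "Make" here, are left as they are and can become custom categories.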
Viewing Embedded Metadata with Windows Explorer
In the "View" menu in Windows Explorer, there is a "choose details" selection. This opens up an enormous list of metadata that can be added to that folder's view and/or replace the existing file data that is the normal default. Changes made will remain for the particular folder being viewed when it is next opened.
However, the Windows Explorer "menu" may be hidden by default. This link explains how to add the menus (worth having): http://windows.microsoft.com/en-us/windows7/change-folder-options.
Go to the second tip under "To change advanced file and folder settings". When this is selected, there are checkboxes for options to open multiple windows and many other useful features.
Alternatively, click the blue help/question mark on the top, far right when Explorer is open. This will give a screen that states "Working with the _____ folder". Further down this screen is a live link to "change folder options". Next, "click to open folder options", then the "view" tab at top. That will open a screen with all the checkboxes for options. The second checkbox is "always show menus".
Dublin Core Metadata Example
The screen image below is a sample index made using the Dublin Core categories. The material consists of two items of personal correspondence and the articles in the "Chain Gang Gazette", issue #160, a publication of the Frazer Nash Car Club. This example is incomplete.
The "subject" and "description" categories remain difficult to define and distinguish, although this topic has been researched. Perhaps more experience will bring clarity and "archive standards" to these categories.
The trial/test accession numbers for the Gazette articles are based on the year and issue number - the numbers are used for "source" and "resource identifier". This is better explained on http://CarLibrary.org/FN- archive.htm. The physical Gazette and the pdf scan use the same identifier. The accession number was extended to use page numbers for each article.
The entire Gazette issue was scanned, yielding accurate text recognition with the ABBYY FineReader OCR software. This scanned issue was added to the Frazer Nash archive in Greenstone.
An index exactly like, or similar to, this should be useful for many purposes. It may be too detailed for the current Greenstone Frazer Nash archive, but it is a good step if a professional Collections Management System (CMS) is planned.
Music (MP3) files will typically have "Album", "Artist", "Date", "Genre" embedded, which is the source of this information on most music players - computers, tablets, phones. Digital photographs typically have hundreds of metadata items embedded, visible in photo organizing software such as Google's Picasa. Documents scanned to PDF and other formats may have only a few items embedded, unless the operator and scanning software have taken active steps to identify the documents through metadata.
Google announced on February 12, 2016 that Picasa, its desktop photo editing and management program, would not be supported after March, 2016. The Picasa Web Albums, the online feature of this program, would transition to Google Photos.
This webpage recommends using Picasa for basic photo captioning and metadata tagging, including location tagging. "Desktop" Picasa will function indefinitely for this use. Software ("app") program recommendations will be updated as replacements become known.
What is the significance of embedded metadata for archive organization? In the best cases, the embedded metadata may be more accurate than metadata added later, it may eliminate some data entry steps and it is (near) permanently attached to the item. Any amount of embedded metadata can be used in a digital archive to classify the item. Embedded digital image captions and keywords can be the only metadata necessary to build an archive. This webpage, "Digital Photo Identification", provides guidance for creating an archive with only embedded metadata.
An Excel spreadsheet, when used as a "database" for an inventory, is very flexible and transparent, but adding normal database functions, such as searching (query) or linking (through lookups), can be complex. Microsoft's Access database is directly compatible with data from an Excel spreadsheet and has been used for hundreds/thousands of robust business functions, especially inventories.
Not only is Access very likely to be able to manage large archive inventories, but it can also be the "front-end" (interface) to higher-end database engines with enterprise capabilities such as SQL Server, Oracle and Sybase. Access has been a Microsoft product for more than 20 years and there is a very large community of developers and consultants for professional assistance.
The basic abilities of Access to link tables (such as separate Excel spreadsheets) and to provide pre-written or ad hoc input screens, forms and reports allow a collector/historian to expect reduced data entry, better file security and the potential for unlimited "custom" reports and file exports.
"Linking" tables is done on a unique "primary key" field. Access can create this automatically or an existing field can be designated. The Accession Number of the records in the inventory works well for this purpose.
As an example of linking, after separate "events" and "cars" Excel files were imported into Access, a custom list was easily produced showing, by date, all the events (races) for each car. This file was readily exported to a new Excel file which, in turn, could be imported into digital library software.
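The same kind of link can be sketched outside Access. The example below uses Python's built-in sqlite3 module as a stand-in for Access (it is not the tool I used), with hypothetical table names, column names and sample data. The Accession Number is the primary key of the cars table, and each event row points back to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary database, nothing is saved
conn.executescript("""
CREATE TABLE cars (
    accession_number TEXT PRIMARY KEY,
    make TEXT, model TEXT, year INTEGER
);
CREATE TABLE events (
    event_name TEXT, event_date TEXT,
    car_accession_number TEXT REFERENCES cars(accession_number)
);
""")
# Made-up sample records standing in for the imported Excel files.
conn.executemany("INSERT INTO cars VALUES (?, ?, ?, ?)", [
    ("1975.12.1", "Chevrolet", "Bel Air", 1955),
    ("1982.6.1", "Ford", "Custom", 1954),
])
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    ("Spring Rally", "1990-04-14", "1975.12.1"),
    ("Autumn Hillclimb", "1989-10-01", "1975.12.1"),
    ("Club Concours", "1991-07-20", "1982.6.1"),
])

# List every event for each car, by date, by joining on the accession number.
rows = conn.execute("""
    SELECT cars.accession_number, cars.make, events.event_date, events.event_name
    FROM cars JOIN events
      ON events.car_accession_number = cars.accession_number
    ORDER BY cars.accession_number, events.event_date
""").fetchall()
for row in rows:
    print(row)
```

Because the car's make and model are stored once, in the cars table, the events list never has to repeat them.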
This webpage has examples and guidance for using Access to manage lists of cars, owners, events and collections.
The topics and steps above are about the physical items in a collection. In this digital era, a collection is likely to have the digital "clone" of those items or unique digital objects, such as a digital photo. While these are also "physical" in that they are bits and bytes on a storage medium, they are a unique collection category.
What special steps are necessary for digital assets?
Digital photographs initially have cryptic (non-descriptive) file names. The choices for easier identification of these images are to re-name the image files (wholly or partly) or add metadata to the file, as described in the Digital Photo Identification webpage. A program I have used, "ReNamer", provides a method to add more data to the file names in a group of image files, such as accession numbers.
Picasa can be used to add metadata to digital images one-by-one but the more capable ExifTool can add, delete, modify the metadata for a group (directory) of image files by "command line" operations. The ExifTool can produce a full list of image files in a directory with the metadata listed for each file. This list, with minimal further processing, can be directly imported into the Greenstone Digital Library software.
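For readers who prefer a script to typing commands, the Python sketch below builds two ExifTool command lines: one that lists the metadata of every file in a folder as CSV, and one that adds a keyword (here an accession number) to every image in a folder. The folder name and keyword are hypothetical, and the commands are only run if ExifTool is installed. Try it on copies of your images first.

```python
import shutil
import subprocess


def exiftool_list_command(directory):
    """Build the command that lists metadata for a folder's files as CSV."""
    return ["exiftool", "-csv", directory]


def exiftool_add_keyword_command(directory, keyword):
    """Build the command that adds one keyword to every image in a folder."""
    return ["exiftool", f"-Keywords+={keyword}", directory]


def run_if_available(command):
    """Run the command only when ExifTool is installed; return its text output."""
    if shutil.which(command[0]) is None:
        return None  # ExifTool is not on this computer's PATH
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    return result.stdout


# Hypothetical folder and accession number, for illustration only.
list_cmd = exiftool_list_command("photos/1955_chevrolet")
tag_cmd = exiftool_add_keyword_command("photos/1955_chevrolet", "1975.12.1")
print(" ".join(list_cmd))
print(" ".join(tag_cmd))
```

Calling `run_if_available(list_cmd)` returns the CSV text, which can be saved to a file and opened in Excel.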
Neither renaming nor adding metadata is necessary if the image files will be used in the Greenstone digital library software, which adds (external) metadata to each image - or group of images - in the "Enrich" function.
Should any of the re-named digital images include the same "description" as the actual object (car) or use the Accession Number of the object? For example, if the digital image is a photograph of a specific 1955 Chevrolet, the only good reason not to rename the image "1955Chevrolet_xxx.jpg" is perhaps to preserve the identity of the original image/file. However, having a standard procedure to archive all unedited digital images to a separate storage location overcomes this concern.
Should the re-named digital images include all or part of the Accession Number of the original object? As an example, if the car has Accession Number "19184.108.40.206", should we name the digital image of this car "1975125-1.jpg" (or similar)? The archive community seems to be split on this issue. One camp holds that every archived item should receive a unique, consecutively assigned Accession Number. Another holds that using the original Accession Number with suffixes is acceptable.
Digital File Preservation and Copies
It is inevitable that any collection of digital images or scanned documents will have multiple copies of the original image or scan, perhaps at different stages of editing or with different resolutions, each created for a specific purpose. Confusion may be lessened by establishing a standard file storage and naming process. For example, all "originals" are stored in a specific drive and folder(s). All copies are named with a specific suffix, which can include letter codes to identify purpose and/or resolution.
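A naming rule of this kind is easy to apply consistently with a small helper. The Python sketch below is one possible convention, not a standard: the purpose codes and the underscore separator are hypothetical choices.

```python
from pathlib import PurePath

# Hypothetical purpose codes; choose whatever letters suit your own workflow.
PURPOSE_CODES = {"web": "w", "print": "p", "edited": "e"}


def copy_name(original_name, purpose, long_edge_px=None):
    """Build a file name for a copy: original stem + purpose code (+ size)."""
    path = PurePath(original_name)
    suffix = "_" + PURPOSE_CODES[purpose]
    if long_edge_px is not None:
        suffix += str(long_edge_px)  # e.g. longest side in pixels
    return path.stem + suffix + path.suffix


web_copy = copy_name("2012.12.1.jpg", "web", 1024)
edited_copy = copy_name("2012.12.1.tif", "edited")
print(web_copy, edited_copy)
```

Because every copy keeps the original's name at the front, sorting a folder by name groups each original with all of its copies.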
The Picasa digital photo editing/organizing program does not alter an original digital photo during its editing (cropping, etc) process until the image is exported to a separate folder or until a deliberate menu "Save" is selected. All edits are retained in a separate, proprietary file for each folder of images.
The Greenstone program uses copies of images and documents for its "Gather" (import) function; the files are not moved from their original storage locations.
These Picasa and Greenstone functions - leaving original images unaltered - seem to conform to standard archive practice.
Here are further, tested recommendations for digital assets:
a. Use Picasa, the ExifTool or other software of your choice to put captions on each photo. Captions will be useful for later identifying the photo in many software programs and are also embedded metadata.
b. If you use Picasa, use "tags" to add "keywords" to each photo or group of photos.
c. If you add an Accession Number (unique ID number) to each photo, make this the first tag in Picasa.
d. Use the Picasa geo-tag function (red pin) to locate each photo or group of photos on Google's maps.
e. "Export" the photos from Picasa to a new directory for use in Greenstone or other archive software at a resolution suitable for the archive's planned use.
f. The ExifTool software provides more functions than Picasa. Further, the ExifToolGUI will metatag more than one photo in a single step.
Scanned photographs and other images:
ists recommend 400-600 dpi.
b. If "archive quality" is not a concern, scanning to JPG format is acceptable.
c. If professional archive standards or long-term preservation are concerns, scan to TIFF or PDF/A format. (Note: PDF/A is a relatively new ISO international open standard which is being adopted as an alternative to TIFF).
d. For JPG and TIFF formats, use Picasa to add captions, tags, geo-tags to each image, as described above for digital photos.
e. Export the images, as above, from Picasa. Picasa exports TIFF images to JPG format and the image metadata is preserved.
f. Picasa does not recognize images in the PDF/A format. If you use this format, you must use a PDF editor (Adobe Acrobat, ABBYY FineReader, or Lightning PDF Editor) or the ExifTool to add subject, keyword and other identifying metadata.
Scanned Slides and Negatives:
by a digital printer.
b. The same steps for scanned images apply, except the most common negative/slide format - 35 mm - should be scanned at 2800-4000 dpi. This resolution should be within the optical resolution of your scanner. Better quality scanners (usually those not under $100 and not part of an "all-in-one" printer) will give better, near-archive quality results. Owners of large collections of slides and negatives should consider acquiring a dedicated slide/negative scanner, an upgrade from most flatbed scanners.
c. The software included with your scanner may be adequate. Test scan several slides or negatives and check the results to determine whether you need a software upgrade or better scanning software. VueScan is recommended by many.
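To see what these resolutions mean in practice, the pixel dimensions of a scan can be worked out from the size of the original and the dpi setting. The short Python sketch below does the arithmetic for a standard 35 mm frame (36 x 24 mm) and, for comparison, a 6 x 4 inch print. The function name is hypothetical.

```python
MM_PER_INCH = 25.4


def scan_pixels(width_mm, height_mm, dpi):
    """Return (width, height) in pixels for an original scanned at dpi."""
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))


# A standard 35 mm frame measures 36 x 24 mm.
low = scan_pixels(36, 24, 2800)
high = scan_pixels(36, 24, 4000)
# A 6 x 4 inch print (152.4 x 101.6 mm) at 600 dpi, for comparison.
print_scan = scan_pixels(152.4, 101.6, 600)
print(low, high, print_scan)
```

A small slide needs a much higher dpi setting than a print to yield an image with a comparable number of pixels, which is why the recommended figures differ so widely.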
Greenstone is an open-source program (free!) developed at the University of Waikato, Hamilton, New Zealand. This project has easily customized it to be "car data friendly", with the goal to promote interest, discussion and use by the owners of car collections, auto historians or car hobbyists.
Greenstone is potentially very powerful. See, for example, "PapersPast". It is a collection of more than two million pages of now defunct New Zealand newspapers, both text searchable and viewable as exact images.
If you have created good Excel inventory files, the "hard work" is done! These files can be imported into Greenstone in a relatively easy step-by-step process. The import creates a separate "nul" file for each record/item imported and includes all the "fields" of that item as metadata. Digital assets can replace the nul file by dragging each asset into Greenstone with its "Gather" function. Importing many assets with their matching inventory files is possible with a custom script file.
We have a basic step-by-step guide to importing an Excel file of car-related data into Greenstone. A second version includes examples of the computer screen for each step.
A better method of creating a Greenstone collection is to create embedded metadata for/on each item to be part of the collection. Greenstone will use this metadata to classify the item. The user interface can be made to search for the metadata in these categories. This technique is discussed on the "Trials and Tests with Picasa, Metadata, Greenstone and the ExifTool" webpage on this site, and also on "The Frazer Nash Archive" webpage. Highly recommended and very efficient!
Although none of the sample car archives now prototyped in Greenstone has modified the basic Greenstone interface (the basic webpage appearance and layout of the search functions), it is completely capable of being customized, as seen by reviewing other examples or this particular archive of Hawaiian books.
Email me with any suggestions or questions! Bob Schmitt, email@example.com
September 14, 2013
Revised March 28, 2016