July 21, 2016

Inventory Basics

A spreadsheet (Microsoft Excel, LibreOffice Calc, etc.) is an excellent, if basic, way to record a collection inventory.  For many collectors, a spreadsheet will satisfy immediate needs.  However, some collectors will need a more capable inventory system and may decide to use database software (FileMaker Pro, Microsoft Access).  A collector should expect a straightforward import from a spreadsheet into a database program.  Any program that does not offer ready import and export in standard file formats should be rejected.

Commercial and open-source software systems designed specifically for museums and collections, known as Collections Management Systems (CMS), should also allow a direct data import.  Moreover, the "data" in a well-designed spreadsheet can be enhanced and recognized as "metadata" for each collection item before an import.  Whether it is used only as a spreadsheet, exported to a database or later used for its metadata, spreadsheet data can be recycled.  This eliminates the need for new data entry and may postpone decisions on alternatives to a basic spreadsheet inventory.

Data can normally be imported into a spreadsheet, database or CMS from an older or different database program, such as dBase, FoxPro or FileMaker Pro.  Data can also be exported to a file type compatible with the spreadsheet of choice (usually Comma Separated Values, "CSV").
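Moving data between a spreadsheet and a database through CSV can be sketched with Python's built-in `csv` and `sqlite3` modules. SQLite is a stand-in here, not one of the programs named above, and the table name `cars` and the sample rows are hypothetical. The sketch assumes the first CSV row holds the column headings and stores every value as text.

```python
import csv
import io
import sqlite3


def import_csv(conn: sqlite3.Connection, table: str, csv_text: str) -> int:
    """Create `table` from the CSV heading row and load every data row.

    Column names come straight from the spreadsheet's heading row, so they are
    double-quoted to allow spaces (e.g. "Accession Number"). Every column is
    stored as TEXT to keep the sketch simple. Returns the number of rows loaded.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    rows = [row for row in reader if row]  # csv yields [] for an empty line
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# Hypothetical inventory exported from a spreadsheet as CSV.
sample = """Accession Number,Make,Model,Year
2016.001,Ford,Model T,1915
2016.002,Packard,Twelve,1934
"""

conn = sqlite3.connect(":memory:")
loaded = import_csv(conn, "cars", sample)
for make, year in conn.execute('SELECT "Make", "Year" FROM cars ORDER BY "Year"'):
    print(make, year)
```

A row with a different number of cells than the heading row makes the insert fail, which is one more reason to keep the spreadsheet "well-designed".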

However, a spreadsheet should be "well-designed".  In brief, each item (for example, a car, a car part, a book or a photograph) must be on a single spreadsheet row.  Do not leave blank rows.  Each characteristic ("Make", "Model", "Year", etc.) should be a separate column heading.  In database terminology, each row is a "record" and each column heading is a "field".

Each type of asset (cars, books, photos, owner records) should be listed in a separate spreadsheet file.
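The layout rules above (one item per row, no blank rows, one named heading per column) can be checked automatically before an import. This is a minimal sketch, assuming the sheet has been saved as CSV with the headings in the first row; `check_layout` is a hypothetical helper, not a feature of any spreadsheet program.

```python
import csv
import io


def check_layout(csv_text: str) -> list[str]:
    """List problems in a sheet meant to hold one item per row.

    Row numbers match spreadsheet rows, with the heading row as row 1
    (assuming no cell contains a line break).
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    header, problems = rows[0], []
    if any(not name.strip() for name in header):
        problems.append("heading row has an unnamed column")
    if len(set(header)) != len(header):
        problems.append("heading row repeats a column name")
    for row_no, row in enumerate(rows[1:], start=2):
        # Spreadsheets often export a blank row as ",,," so test the cells.
        if not any(cell.strip() for cell in row):
            problems.append(f"row {row_no} is blank")
        elif len(row) != len(header):
            problems.append(f"row {row_no} has {len(row)} cells, expected {len(header)}")
    return problems


# Hypothetical sheet with a blank row and an incomplete row.
sheet = "Accession Number,Make,Model,Year\n2016.001,Ford,Model T,1915\n\n2016.002,Packard\n"
print(check_layout(sheet))
```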

An "Accession Number" should be assigned to each record, usually in the first column.  Wikipedia explains this: "In libraries and museums and other archives, an accession number or catalogue number is a unique, usually sequential, number given to each new item acquired, as it is catalogued."
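Assigning the next sequential number and catching duplicates are simple to automate. The `YEAR.NNN` format used below (for example `2016.003`) is only an assumed scheme for illustration; museums use a variety of accession formats, so adjust the prefix and padding to match your own.

```python
from collections import Counter


def next_accession(existing: list[str], prefix: str) -> str:
    """Return the next number in a hypothetical 'PREFIX.NNN' scheme (e.g. 2016.003)."""
    used = [
        int(number.split(".", 1)[1])
        for number in existing
        if number.startswith(prefix + ".")
    ]
    # Zero-pad to three digits; past 999 the number simply grows longer.
    return f"{prefix}.{max(used, default=0) + 1:03d}"


def duplicate_accessions(numbers: list[str]) -> set[str]:
    """Accession numbers must be unique; report any that appear more than once."""
    return {number for number, count in Counter(numbers).items() if count > 1}


inventory = ["2016.001", "2016.002", "2015.007", "2016.002"]
print(next_accession(inventory, "2016"))
print(duplicate_accessions(inventory))
```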

Recognize and assign metadata ("data about data") to the records using Dublin Core categories (the widely recognized standard set of metadata categories).  Note that there are other "standard" metadata schemes; choose one!
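Mapping spreadsheet headings onto Dublin Core can be sketched as a renaming table. The fifteen element names in `DC_ELEMENTS` are the Dublin Core Metadata Element Set; the spreadsheet headings in `COLUMN_TO_DC` and the sample record are hypothetical examples of one collection's choices.

```python
# The fifteen elements of the Dublin Core Metadata Element Set (version 1.1).
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}

# Hypothetical mapping from one collection's spreadsheet headings to DC elements.
COLUMN_TO_DC = {
    "Accession Number": "identifier",
    "Caption": "title",
    "Notes": "description",
    "Photographer": "creator",
    "Date Taken": "date",
    "Keywords": "subject",
    "Location": "coverage",
}


def to_dublin_core(record: dict[str, str], mapping: dict[str, str]) -> dict[str, str]:
    """Rename a spreadsheet record's fields to Dublin Core names, skipping blanks.

    Unmapped columns (e.g. a car-specific "Chassis No.") are left out here; they
    stay in the spreadsheet or CMS, linked to this record by its identifier.
    """
    unknown = set(mapping.values()) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    return {f"dc:{mapping[col]}": value for col, value in record.items() if col in mapping and value}


row = {"Accession Number": "2016.001", "Caption": "Ford Model T, 1915",
       "Chassis No.": "hypothetical-123", "Keywords": ""}
print(to_dublin_core(row, COLUMN_TO_DC))
```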

Note that a spreadsheet is a good central repository for records and their data.  Spreadsheets can also provide a "bonus feature" for archives with many digital files, such as photographs or documents.  As described below, the free program ExifTool will "read" an entire folder of digital files, including subfolders, and create a Comma Separated Values (CSV) file listing each digital file with all, or only selected, metadata embedded in it.  The CSV file can be opened with a spreadsheet program (Excel), then saved in a chosen format.  This "read" operation does not change the digital files being queried.
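The ExifTool "read" step can be driven from a short Python script. This sketch assumes ExifTool is installed and on the PATH; `-csv` and `-r` are ExifTool's own options, while the folder name `photos` and the helper functions are hypothetical. Only the command-building function runs in the example, so nothing is read unless `read_folder_to_csv` is called.

```python
import shutil
import subprocess
from pathlib import Path


def exiftool_read_args(folder: str, tags: tuple[str, ...] = ()) -> list[str]:
    """Build an ExifTool command line that reports every file under `folder` as CSV.

    -r recurses into subfolders and -csv prints one row per file, with the file
    path in the SourceFile column. With no tags given, ExifTool reports every
    tag it can read; naming tags limits the output to those columns.
    """
    return ["exiftool", "-csv", "-r", *(f"-{tag}" for tag in tags), folder]


def read_folder_to_csv(folder: str, out_csv: str, tags: tuple[str, ...] = ()) -> None:
    """Run ExifTool (read-only) and save its CSV output for the spreadsheet."""
    if shutil.which("exiftool") is None:
        raise RuntimeError("exiftool was not found on the PATH")
    result = subprocess.run(
        exiftool_read_args(folder, tags),
        capture_output=True, encoding="utf-8", check=True,
    )
    Path(out_csv).write_text(result.stdout, encoding="utf-8")


# Equivalent to typing: exiftool -csv -r -FileName -XMP-dc:Identifier photos
print(" ".join(exiftool_read_args("photos", ("FileName", "XMP-dc:Identifier"))))
```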

Collections and museums following "best practices" should consider creating a physical (paper) document, a "catalog sheet", for individual photos and documents (or groups of them), providing a summary index of the physical file system or other storage.  This document can be viewed as a true "archive backup", one likely to outlast a computer-based inventory that may become obsolete.  A duplicate of these catalog sheets should be kept in an alternate, secure location.

One technique for creating a catalog sheet is to produce it as a report from a simple database system.  Each item that should appear on the catalog sheet would be entered into the database as a record.  Because a catalog sheet is a summary, each entry would be less detailed than it would be in a true CMS.  When the group of photos or documents is complete, the database's report function would produce the catalog sheet.  The open-source LibreOffice Base or Microsoft Access could be used for this function.
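A catalog-sheet report can be sketched with SQLite standing in for LibreOffice Base or Access. The `items` table, its columns (`accession`, `title`, `box`, `grp`) and the sample rows are all hypothetical; a real catalog sheet would use whatever summary fields the collection chooses.

```python
import sqlite3


def catalog_sheet(conn: sqlite3.Connection, group: str) -> str:
    """Produce a plain-text catalog sheet for one hypothetical group of items."""
    rows = conn.execute(
        "SELECT accession, title, box FROM items WHERE grp = ? ORDER BY accession",
        (group,),
    ).fetchall()
    lines = [f"CATALOG SHEET: {group}", f"{'Accession':<10} {'Box':<6} Title"]
    lines += [f"{acc:<10} {box:<6} {title}" for acc, title, box in rows]
    lines.append(f"Items: {len(rows)}")
    return "\n".join(lines)


# Hypothetical summary records: far less detail than a CMS would hold.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (accession TEXT PRIMARY KEY, title TEXT, box TEXT, grp TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [
        ("2016.002", "Packard Twelve, factory photo", "B-1", "Packard photos"),
        ("2016.001", "Packard dealer letter, 1934", "B-1", "Packard photos"),
        ("2016.003", "Ford Model T brochure", "B-2", "Ford literature"),
    ],
)
print(catalog_sheet(conn, "Packard photos"))
```

Printing the returned text, or saving it to a file for printing, gives the paper copy that serves as the "archive backup".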

Creating Image and Document Digital Files

Digital Photographs:

a.  Use the ExifTool, Picasa, or other photo editing software to add captions (metadata) to each photo.  Captions will help identify the photo later in many software programs, because this embedded metadata can be re-used.

b.  With Picasa, use "tags" to add "keywords" to each photo or group of photos.

c.  Use Picasa's geo-tag function (red pin) to locate each photo or group of photos on Google's maps.

d.  "Export" the photos from Picasa (or ensure the photos are saved) to a new directory for use in Greenstone or other archive software, at the resolution recommended for that archive.

e.  The ExifTool software provides more functions than other programs tested to date.  Both the ExifTool and the ExifToolGUI can add metadata to more than one photo in a single step.
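Tagging several photos in one step can be sketched by building a single ExifTool command for a list of files. The tag names (`XMP-dc:Description`, `XMP-dc:Subject`, `GPSLatitude`, `GPSLongitude` and their reference tags) are ExifTool tag names, but the choice of tags, the file names and the coordinates are hypothetical. Passing the list to `subprocess.run` would apply it; typed at a shell prompt, values containing spaces must be quoted.

```python
from typing import Optional


def exiftool_tag_args(
    files: list[str],
    caption: str = "",
    keywords: tuple[str, ...] = (),
    lat: Optional[float] = None,
    lon: Optional[float] = None,
) -> list[str]:
    """Build one ExifTool command that tags every file in `files` at once."""
    args = ["exiftool"]
    if caption:
        args.append(f"-XMP-dc:Description={caption}")  # the caption
    for keyword in keywords:
        args.append(f"-XMP-dc:Subject+={keyword}")  # += adds to the keyword list
    if lat is not None and lon is not None:
        # EXIF GPS stores unsigned values plus N/S and E/W reference tags.
        args += [
            f"-GPSLatitude={abs(lat)}",
            f"-GPSLatitudeRef={'N' if lat >= 0 else 'S'}",
            f"-GPSLongitude={abs(lon)}",
            f"-GPSLongitudeRef={'E' if lon >= 0 else 'W'}",
        ]
    # By default ExifTool keeps the untouched originals as "<name>_original".
    return args + list(files)


args = exiftool_tag_args(
    ["photos/packard_01.jpg", "photos/packard_02.jpg"],
    caption="Packard Twelve at a 1934 dealer show",
    keywords=("Packard", "dealer show"),
    lat=42.33, lon=-83.05,  # hypothetical location
)
print(args)
```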

Scanned photographs and other images:

a.  Scan at 300 dpi or higher; museum/archive best practice recommends 600 dpi.

b.  If "archive quality" is not a concern, scanning to JPG format is acceptable.

c.  If professional archival quality, museum best practice or long-term preservation is a concern, scan to TIFF or PDF/A format.  (Note: PDF/A is a relatively new ISO international open standard which is being adopted as an alternative to TIFF.)

d.  For JPG and TIFF formats, use software to add captions, tags and geo-tags to each image, as described above.

e.  Some photo editing software and Picasa do not recognize images in the PDF/A format.  If you use this format, you must use a PDF editor (Adobe Acrobat, ABBYY FineReader, or Lightning PDF Editor) or the ExifTool to add subject, keyword and other identifying metadata.
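The resolution guidance above translates directly into pixel counts and storage. A sketch of the arithmetic, assuming a hypothetical 4 x 6 inch print scanned in 24-bit color and ignoring file headers and compression:

```python
def scan_pixels(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a scan: inches multiplied by dots per inch."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_mb(pixels: tuple[int, int], bytes_per_pixel: int = 3) -> float:
    """Approximate uncompressed image size in megabytes (10**6 bytes).

    3 bytes per pixel = 24-bit color; 48-bit color would be 6.
    File headers and any compression are ignored.
    """
    width, height = pixels
    return width * height * bytes_per_pixel / 1_000_000


for dpi in (300, 600):
    px = scan_pixels(4, 6, dpi)  # a hypothetical 4 x 6 inch print
    print(dpi, px, round(uncompressed_mb(px), 2))
```

Doubling the resolution from 300 to 600 dpi quadruples the pixel count, so the uncompressed size grows from about 6.5 MB to about 26 MB per print.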

Scanned Slides and Negatives:

a.  If you have the original negative or slide for any image, scanning the slide or negative directly will almost always give better results than scanning the photo previously printed in a darkroom or by a digital printer.

b.  The same steps for scanned images apply, except that the most common negative/slide format, 35 mm, should be scanned at 2800-4000 dpi.  This resolution should be within the optical (not interpolated) resolution of your scanner.  Better quality scanners (usually not those under $100 or those built into an "all-in-one" printer) will give better, near-archive quality results.  Owners of large collections of slides and negatives should consider acquiring a dedicated slide/negative scanner, an upgrade from most flatbed scanners.

c.  The software included with your scanner may be adequate.  Test scan several slides or negatives and check the results to determine whether you need a software upgrade or better scanning software.  VueScan is recommended by many.
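The same arithmetic shows why small originals need high resolution, and it can enforce the rule that the setting stay within the scanner's optical resolution. The frame size is the standard 36 x 24 mm for 35 mm film; the optical resolutions are hypothetical values, so substitute the figure from your scanner's specifications.

```python
MM_PER_INCH = 25.4


def film_scan_pixels(width_mm: float, height_mm: float, dpi: int, optical_dpi: int) -> tuple[int, int]:
    """Pixel dimensions for scanning film, refusing settings above optical resolution."""
    if dpi > optical_dpi:
        raise ValueError(
            f"{dpi} dpi exceeds the scanner's optical {optical_dpi} dpi; "
            "the extra pixels would only be interpolated"
        )
    return round(width_mm / MM_PER_INCH * dpi), round(height_mm / MM_PER_INCH * dpi)


# A 35 mm frame is 36 x 24 mm; the optical resolution here is hypothetical.
print(film_scan_pixels(36, 24, 4000, optical_dpi=4800))
```

A 4000 dpi scan of a 35 mm frame gives about 5669 x 3780 pixels, enough for a print of roughly 18.9 x 12.6 inches at 300 dpi. Asking for 4000 dpi from a scanner with a 2400 dpi optical limit raises an error instead of producing interpolated pixels.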

Scanned Documents:

a.  If text recognition (and later text searching) is not a concern, scan as described above for images.

b.  However, text recognition should be considered important!  Therefore, scan to PDF, multi-page TIFF or PDF/A at 300 dpi or higher.

c.  Process each document with good optical character recognition (OCR) software, such as Adobe Acrobat, ABBYY FineReader or other tried and proven software.  Documents scanned at 600 dpi to the TIFF format (best museum practice) can be OCR-converted by Acrobat or FineReader singly or in batch mode.

d.  Add identifying information in the OCR software after the OCR stage.  This information will be stored in the document's "XMP" metadata.
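To show what "XMP" metadata looks like, here is a sketch that builds the body of a small XMP packet (without the `<?xpacket?>` wrapper) holding Dublin Core properties (`dc:identifier`, `dc:title`, `dc:subject`) with Python's standard `xml.etree.ElementTree`. The namespace URIs are the standard XMP, RDF and Dublin Core ones; the values are hypothetical. OCR software or ExifTool would normally write this for you; ExifTool can also read and write standalone `.xmp` sidecar files.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs used by XMP packets.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix: str, name: str) -> str:
    """Qualified name in ElementTree's {namespace}name form."""
    return f"{{{NS[prefix]}}}{name}"


def xmp_packet(identifier: str, title: str, keywords: list[str]) -> str:
    """Build an XMP packet body with Dublin Core identifier, title and subject."""
    root = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(root, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})
    # dc:identifier is a simple text value: the accession number.
    ET.SubElement(desc, q("dc", "identifier")).text = identifier
    # dc:title is a language alternative (rdf:Alt) with a default entry.
    title_alt = ET.SubElement(ET.SubElement(desc, q("dc", "title")), q("rdf", "Alt"))
    ET.SubElement(title_alt, q("rdf", "li"), {XML_LANG: "x-default"}).text = title
    # dc:subject (keywords) is an unordered list (rdf:Bag).
    bag = ET.SubElement(ET.SubElement(desc, q("dc", "subject")), q("rdf", "Bag"))
    for keyword in keywords:
        ET.SubElement(bag, q("rdf", "li")).text = keyword
    return ET.tostring(root, encoding="unicode")


print(xmp_packet("2016.004", "Packard dealer letter, 1934", ["Packard", "correspondence"]))
```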

Use Embedded Metadata

1.  Use the ExifTool from the command line to read entire folders/subfolders of photos.

2.  Review the resulting spreadsheet file ("save as" from the CSV output file) to determine which metadata categories will help organize these collection assets.

3.  Dublin Core categories should have high priority, especially the "DC:Identifier" category, which will prove very useful if it holds a unique "accession number" for each digital asset.  Although the ExifTool documentation states that new (i.e., car-specific) metadata categories can be created, this is complex.  An accession number is the best way to link the default embedded metadata categories (Dublin Core and similar) to car-specific categories, which can be created more easily in a Collections Management System (CMS) and/or Greenstone.  Using accession numbers is also a museum "best practice".

4.  Use the spreadsheet's copy/paste functions to fill in missing metadata.  Use the spreadsheet's "data fill" command to create accession numbers in the DC:Identifier category.

5.  Use the ExifTool from the command line to write the edited spreadsheet (saved in CSV format) back to the entire set of photo folders/subfolders.

6.  At any future time, use the ExifTool again from the command line to read these "metadata-updated" photo folders/subfolders and create data-rich spreadsheet files for import into a collections management, content management or digital library (e.g., Greenstone) software program.
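Step 4 (filling in accession numbers) can be sketched in Python on the CSV produced by ExifTool. Assumptions: the identifier column is headed `Identifier` (the actual heading depends on the ExifTool options used), numbers follow a hypothetical `YEAR.NNN` scheme, and existing numbers are never changed. Steps 1 and 5 would then be commands along the lines of `exiftool -csv -r -XMP-dc:Identifier photos > photos.csv` and `exiftool -csv=photos.csv -r photos`; check the ExifTool documentation before running the write step on real files.

```python
import csv
import io


def fill_identifiers(csv_text: str, column: str, prefix: str, start: int = 1) -> str:
    """Give every row with an empty `column` the next unused 'PREFIX.NNN' number.

    Rows that already have an identifier are left unchanged, and the
    SourceFile column is untouched, since ExifTool relies on it to match
    each row to its file when the CSV is written back.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = list(reader.fieldnames or [])
    if column not in fields:
        fields.append(column)
    rows = list(reader)
    used = {row[column] for row in rows if row.get(column)}
    n = start
    for row in rows:
        if not row.get(column):
            while f"{prefix}.{n:03d}" in used:  # skip numbers already taken
                n += 1
            row[column] = f"{prefix}.{n:03d}"
            used.add(row[column])
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical ExifTool output with two photos still lacking an identifier.
src = (
    "SourceFile,Identifier,Description\n"
    "photos/a.jpg,,Packard Twelve\n"
    "photos/b.jpg,2016.001,Ford Model T\n"
    "photos/c.jpg,,Dealer brochure\n"
)
print(fill_identifiers(src, "Identifier", "2016"))
```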


A free online training course, "Digital Libraries, Repositories and Documents", is very useful for learning the terms, practices and steps needed to create a digital library.  Referring to it regularly and skimming the lessons can be very helpful.  The module is described as follows:

"The module covers the processes relevant to the creation and management of digital libraries and repositories, including digital file formats, metadata management, database management and the preservation of digital information."

A comprehensive reference book is "How to Build a Digital Library, Second Edition" by Ian H. Witten, David Bainbridge and David M. Nichols.  Reviews of this book note it is suitable as a university text for digital libraries/archives.

About two-thirds of this book is an excellent introduction to the concepts, history and issues of digital libraries, all relevant to the tasks needed to manage a collection.  The remaining parts of the book are Greenstone Digital Library tutorials.

