- Software for Collections

May 19, 2016

Software for Collections

This webpage provides an overview of several categories of software that can be used to create and maintain archive/collection inventories. Some of this software will also provide systematic access to collection documents and display digital photographs.  The software categories are:

1. Spreadsheets

2. Databases

3. Document/Content Management Software

4. Digital Library Software

5. Collections Management Systems

Several of these categories are described in greater detail on other webpage topics; return to the Table of Contents to access these topics directly.

1. Spreadsheets

The prior webpage topic, Archive/Collections Basics, included an introduction to using spreadsheets for creating an archive inventory.  Although a spreadsheet used for this purpose is essentially a "flat-file" database, the data (in a spreadsheet's rows and columns) is also in a "Grand Central Station" to which the data can be imported and exported (transferred) to many other software programs.  This section will describe those uses.  It will also suggest a few spreadsheet functions that should be understood and used to improve the archive/spreadsheet data.

Spreadsheet Uses

A spreadsheet provides a basic introduction to a simple database-type inventory; it is easy to use and understand.  However, a spreadsheet can become unwieldy if there are more than a few hundred records (rows) and 25 columns (fields).  Further, the data is potentially "fragile" - easily erased or deleted by unintentional keystrokes or faulty assumptions.  It's always a good idea to backup a spreadsheet to a separate file or import it into a simple database.

In spite of these drawbacks, spreadsheet use can bring a "bonus feature" to an archive that lists many digital files, such as photographs or documents.  As described in detail on the Metadata webpage, the open-source software program, ExifTool, will "read" an entire folder of digital files, including subfolders, and create a Comma Separated Values (CSV) file that lists each file with all, or specifically selected, metadata of that file.  A CSV file is easily opened with a spreadsheet program (e.g. Excel) and then may be saved in a chosen format ("XLS").  This "read" use of the ExifTool does not affect the digital files being "read".

In practical terms, if a collection already has hundreds (or thousands) of digital photos (or PDF documents), a nearly "instant" inventory can be created; it's not necessary to enter data line-by-line for each photo or document.  The new inventory record (row) in the spreadsheet file will have, at a minimum, the file name, file type, size, and full directory listing.  Other (embedded) metadata types can be chosen when drafting the ExifTool command and then executing it.

More data (metadata) can be added to each record in the spreadsheet's empty cells or new fields/categories (columns) can be added.  After these additions or corrections, a very good inventory of these files is the result.

In an optional third step, this improved spreadsheet inventory file (i.e., "XLS") can be saved as a CSV file. Next, the ExifTool is run with a different command set to "write" (embed) the revised data into the original digital photo or document files.  The ExifTool will save the original files with a new "Original" file extension.  The files (with the original file names) can be checked to confirm they have new and more accurate embedded metadata.  These improved digital files can be used directly by the Greenstone Digital Library software to make a "collection".  The important classifying/identifying metadata for each file is safely stored in each file; it can be viewed by Windows Explorer or read and listed by the ExifTool

Further, the ExifTool initial "read" operation can be executed again to make a new spreadsheet inventory.  Thus, in turn, can be used as an imported file by many database or Collections Management System programs.

Useful Spreadsheet Functions

Sorting - this will group "like kind" records together to ease checking data for consistency.  Copying cells should fix errors.  To preserve the original order of the data file, create a new column and use "Data Fill" to make a list of numbers before sorting.  After sorting to fix any problems, sort on this new column to return to the original record sequence.

Search and Replace - use this function to make data consistent.  For example, search for "GM" or "GMC" and replace with "General Motors Corporation".

Data - Text to Columns - use this function to split text records into columns, forming a more useful database. Use "search and replace" to insert a "dummy separator" and then split the data with Text to Columns.  For example, if numerous records have a single cell containing "ChevyMalibuGMC", you can replace "vyMA" with "vyqqqMA" and then assign "qqq" as the "Delimiter Other".

Function - Text - Concatenate - "Insert" this function to combine data from separate cells into a single cell.  A vehicle model name may have been split by error, such as "Targa" and "Florio".  If there is a series of these errors, similar records can be grouped by sorting and this function copied into an inserted column.  Insert an adjacent column, then use the "Paste Special - Values" command to get the actual text.  The original columns and the column with the text function can then be deleted.  As an example, the Accession Number for the Frazer Nash spreadsheet was created by combining "Year" cell data, a period (".") and the Serial Number for the prewar cars.  The postwar cars needed an extra step to split off the last three digits from a longer Serial Number, such as "421/100/119".  This was done by the "Text to Columns" menu choice, as described above.

2. Database (Management Systems)

Wikipedia has this definition:

"...a "database" refers to a set of related data and the way it is organized. Access to these data is usually provided by a "database management system" (DBMS) consisting of an integrated set of computer software that allows users to interact with one or more databases and provides access to all of the data contained in the database (although restrictions may exist that limit access to particular data). The DBMS provides various functions that allow entry, storage and retrieval of large quantities of information and provides ways to manage how that information is organized.

Because of the close relationship between them, the term "database" is often used casually to refer to both a database and the DBMS used to manipulate it.

Outside the world of professional information technology, the term database is often used to refer to any collection of related data (such as a spreadsheet or a card index)..."

Database Management Systems (DBMS)

Popular DBMS programs that will import spreadsheets include LibreOffice Base, Microsoft's SQLServer, Access and Visual FoxPro, FileMaker Pro, and MySQL.  All of these programs provide tools to create user interfaces for data entry, searching and reporting and provide "safer" storage for collections inventory data than a spreadsheet.  However, a DBMS will require skilled work to create the user interface, search and report formats.  There will be a continuing need for skilled maintenance of the database.

Database Example

Car collectors typically accumulate a history of the car(s): photos, documents, books, magazines, videos, etc. directly for the particular car(s) or the marque. whether it is one car or hundreds of vehicles. Car clubs almost always have list of owners, their cars, former members and owners. Journalists, historians and various organizations involved with competition or exhibitions are greatly interested the "provenance" of a particular car - when was it manufactured, what color was it, who owned it in any time period.  This background led to a project to create or describe database software that would organize vehicle, owners and other factual data - and related material in traditional and digital formats.

When spreadsheets do not seem adequate to maintain this data, database software is traditionally the next step.  This was the author's experience with data related to Frazer Nash cars - a marque with a postwar production of only 85 cars.  Over 30+ years of ownership, correspondence with other owners, their friends and historians provided names of the Frazer Nash car owners in the U.S. (16 cars!). This owner and basic data on the 85 cars was maintained on a spreadsheet.  

In 2008, a spreadsheet of Frazer Nash Car Club members and their cars (some owners had as many as four cars) was used, following recommended database design principles) to make separate "Members/Owners" and "Cars" spreadsheet tables. "Racing History", "Literature" and "Exhibitions" were added from other sources, as was data on the US postwar cars owners. These tables were imported to Microsoft's Access 2000 database. Menu interfaces were made so that queries and reports could be produced.  At that time, lack of experience with Access prevented links between owners, cars and historical events from being made.

The Access 2000 database was revisited in 2011 with a goal to link all owners, over time, with all prewar and postwar Frazer Nash cars.  The original owner and car data produced good results, but the research and data-entry required for all 425 Frazer Nash cars and owners is clearly a large task, best accomplished by the Frazer Nash Car Club members with access to the archives in England.  If this task was taken up and completed, the Access database could perform all traditional car club tasks - member information, event planning, mail lists, etc.

The Table of Contents has a link to the detailed Access example.

3. Document/Content Management Software

This category of software seemed to offer promise as a follow-on to reviewing "document management" and "enterprise document management systems", assuming this software type would be inclusive for all types of documents. Although there are several open source systems of this type (see Wikipedia for details) and they could be potentially a useful software category, these systems have been overshadowed by "content management systems" -  primarily oriented towards managing digital assets and web content where there is collaborative editing or a need to meet internal or legal requirements for preservation.  A content management system could be useful for an automotive collection or museum with an extensive public website.  Drupal and Joomla are popular, fully-featured open-source software systems designed to create and manage content for websites.

Because the initial basic need of a collection an inventory, further use of content or document management software was not considered.

However, because car collections/museums are likely to have a library, the next research area was "library management systems", which were expected to include both physical items (books!) and digitized media in their scope.

Two New Zealand library management systems/databases were discovered.  "Koha" is fully-featured software for the management of an entire library - the catalog, borrowers, acquisitions, etc.  Installation and use requires some time and experience, but Koha is reported to manage many libraries, including all public libraries in Vermont.  

4. Digital Library Software

The other New Zealand program was "Greenstone Digital Library Software", which seemed very promising for organizing automobile-related images, reports, videos, websites, history, specifications - nearly any item in digital format.  Greenstone is an open-source program developed at Waikato University, Hamilton, New Zealand.

A digital library can be a repository of the digital version of these items - copies of the actual digital photographs and scanned documents. A database will typically be only a catalog or inventory of these items.

The capabilities of this digital library software were confirmed after using the tutorials and workshops on the Greenstone website. was then started, the software was installed on an older PC and easily customized to be "car data friendly".  Greenstone produced three sample "collections", later placed on a self-running (data) DVD and sent to car collection managers, auto historians and car hobbyists to promote interest, discussion and use. 

A similar open-source program, DSpace, was jointly developed by MIT and HP, targeted for academic use.  The DSpace website states that more than 300 institutions, mostly in the U.S, use this system.  At the data-level, the programs are compatible.  Greenstone has a tutorial showing how to move a digital collection from DSpace to Greenstone and vice-versa.  Internet comments and comparisons of both systems seem to confer no advantage to either and note that meta-tag classification in a collection is preserved in either system.

A separate webpage topic provides detailed step-by-step procedures to make an auto-related Greenstone collection, including the preparation of digital photos, documents, articles, etc.  These files should be well-identified with captions and key words. This preparation will simplify adding documents and photographs to a Greenstone library/archive.  See also the Table of Contents for all three Greenstone topics.

5. Collections Management Software (CMS)

An important category of software for collections/museums are proprietary and open-source software designed for collection/museum management.  This category of database software pre-dates "digital archives".  In general, a CMS is more focused on the management of "objects" in a collection or museum - their source, descriptions, locations.  Objects can include documents, books and photographs and their digital representations.  Although CMS systems almost always include direct display (or links to) for Internet access, they typically do not permit "full text" search and display that is a great Greenstone feature and perhaps may be important to a digital library/archive.

More on CMS can be found at a link to the CMS webpage and a link to table of available CMS systems.

Email me with any questions!  Bob Schmitt,