CarLibrary.org - Using Greenstone Digital Library Software

April 20, 2016

This webpage provides a step-by-step guide to using the Greenstone Digital Library software as an organizing system for an automotive "collection".  It is the first of three CarLibrary.org webpages on Greenstone, showing different methods of making such a collection with version 2.86.

The Greenstone Digital Library software (open-source/free) from Waikato University, Hamilton, New Zealand, is very well suited to creating a collection's "digital repository" for cars, artifacts, photos, books and documents.  These items may be in a digital format or added later if digitization is planned.

Greenstone collections are much more than a "library" - they are a "digital repository" for a Collection, Archive, Library or Museum (CALM).  Collections managers/owners, archivists and curators know, through experience, their basic requirements for organizing, irrespective of the nature of their "items".

A digital repository can be used as an internal reference by a collector or curator on a single PC, by the collection's staff on a network, or made available on the Internet, in full or as a subset, entirely open or with access restricted to named users with passwords.

Although there are other types of software, open source and commercial, which provide similar organization, search and display functionality, Greenstone is proven, professional and powerful.  Using Greenstone eliminates some cost and training hurdles and makes the "first step" of repository organization very accessible to anyone with modest computer skills.

Greenstone is extremely robust (see the examples of completed Collections at http://www.greenstone.org/examples).  The largest collection, with a reported 1,000,000+ images, is an extensive archive of New Zealand newspapers, both as images and full text.

Recommended preparation before building a first Greenstone collection is to review (or download) the Greenstone tutorials and workshop courses.  The lessons of the 4.5-day Workshop are also recommended: this is a very well-designed course that shows how to use meta-tags to identify items in a collection.

This "how-to" webpage cannot replace a Greenstone tutorial (the Greenstone wiki has links to the tutorial). There is a similar tutorial in the excellent book, "How to Build a Digital Library" (2nd edition).  This book includes much useful, general material towards library creation, such as international standards, types and uses of metadata/metatags.  Part 2 of this book is the Greenstone tutorial.

Step One includes a few basic organization suggestions for putting the cars and "items" on lists.  This step may seem redundant with previous webpages in the CarLibrary.org series, and gathering, identifying, and roughly classifying cars and items is tedious and time-consuming, but this single step may be sufficient for a collection.  The entire collection (i.e. "everything") need not be included in Step One; items can be added later.  If the benefits of Step One seem useful, further steps with even partial lists/inventory will create a Greenstone digital repository.

Step Two is an overview of Greenstone operations and features.  

Step One - Building and Improving Inventory Lists

This Step assumes some skill with Word tables or Excel files to produce lists/databases that can be imported into Greenstone.  Although collection data can be directly entered into Greenstone and thus avoid this step, experience has shown there are benefits from making lists:

1.  Excel makes it easy to view all the data in a single worksheet and then sort, copy and paste data.  Your data can become consistent and uniform.

2.  Excel can be a "universal standard" for collecting data from lists created by other programs.

3.  The data of the collection can be reused in other applications when exported ("saved as") from Excel, such as in a local or online database.  Greenstone does not have good export functions.
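As a rough illustration of benefits 1 and 3, the sketch below (in Python, not part of Greenstone) reads a list saved from Excel as CSV text, sorts it, and re-exports it as JSON for use in another application.  The column names "Make" and "Year" and the two sample rows are hypothetical placeholders, not data from a real collection.

```python
import csv
import io
import json

# Hypothetical CSV text, as if saved from Excel with "Save As".
CSV_TEXT = """Make,Year
Frazer Nash,1934
Austin,1951
"""


def load_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


rows = load_rows(CSV_TEXT)

# Benefit 1: the whole list can be sorted in one step.
rows_by_make = sorted(rows, key=lambda row: row["Make"])

# Benefit 3: the same data can be re-exported for another application.
as_json = json.dumps(rows_by_make, indent=2)
print(as_json)
```

The same rows could equally be written back out as CSV; JSON is used here only to show that the list is not tied to one program.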

If you are using index cards or paper-based inventories (lists) for your cars, books and documents, these must be typed in Word or Excel (or scanned with OCR) to create digital files similar to this:

Identifying All Objects Uniquely

Does the car collection use "accession numbers", "object IDs" or "catalog numbers" for the cars and items in the collection?  What are these?  Assigning an accession number is a standard museum procedure, used for at least 150 years, to uniquely identify every object in a museum.  This is traditionally done in a ledger book for permanence and to establish ownership, but these ID numbers should also be used in a digital repository.  Greenstone provides an easy method to relate a document to a particular car, so a photo of a car may have "2008.3" as an accession number, but the record for the photo can also point to "1951.119".
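The idea of one record pointing to another can be sketched in a few lines of Python.  This is only an illustration of the data relationship, not how Greenstone stores it: the field names `accession_no` and `related_to` and the descriptions are hypothetical, and only the two accession numbers come from the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    accession_no: str                 # unique ID for this object, e.g. "2008.3"
    description: str
    related_to: Optional[str] = None  # accession number of the car it relates to


def items_for_car(records, car_no):
    """Return every record that points at the given car's accession number."""
    return [r for r in records if r.related_to == car_no]


records = [
    Record("1951.119", "Car (placeholder description)"),
    Record("2008.3", "Photo of the car", related_to="1951.119"),
]

related = items_for_car(records, "1951.119")
print([r.accession_no for r in related])
```

Because every object has its own unique number, the photo keeps its own identity while still being findable from the car it shows.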

A system of unique identifiers for the collection can be made in Excel (spreadsheet) in a semi-automated manner as described on the "Spreadsheet" webpage.  
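For readers who prefer a script to a spreadsheet, the following Python sketch shows one possible way to assign such identifiers.  It assumes a "YEAR.SEQUENCE" pattern, which is only what the examples "2008.3" and "1951.119" suggest; it is not the method described on the "Spreadsheet" webpage, and the function name is hypothetical.

```python
from collections import defaultdict


def assign_accession_numbers(years, existing=()):
    """Assign "YEAR.SEQ" identifiers, continuing after any existing numbers."""
    last_seq = defaultdict(int)

    # Find the highest sequence number already used in each year.
    for number in existing:
        year, seq = number.split(".")
        last_seq[int(year)] = max(last_seq[int(year)], int(seq))

    # Give each new item the next free sequence number for its year.
    assigned = []
    for year in years:
        last_seq[year] += 1
        assigned.append(f"{year}.{last_seq[year]}")
    return assigned


new_ids = assign_accession_numbers([2008, 2008, 2016], existing=["2008.3"])
print(new_ids)
```

Passing in the numbers already recorded in the ledger keeps the new identifiers from colliding with old ones.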

Field Names/Metatag Categories

A table for the vehicles may contain these fields/categories:

car.ID
car.Manufacturer
car.Make
car.Model
car.Year
car.Serial_No
car.Country
car.Keyword
car.Location
Engine No
Engine Type
Date
Reg No
Former RegNo
OriginalRegNo
OriginalColor
First owner
OwnerNo
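As a sketch of how this field list becomes the header row of a file that Excel can open, the Python below writes a CSV using the first nine field names from the list above; the remaining fields would be added in the same way.  The single sample row is hypothetical (the pairing of this ID with this make is for illustration only).

```python
import csv
import io

# Field names copied from the list above.
FIELDS = [
    "car.ID", "car.Manufacturer", "car.Make", "car.Model", "car.Year",
    "car.Serial_No", "car.Country", "car.Keyword", "car.Location",
]


def to_csv(rows):
    """Write rows (dicts keyed by field name) as CSV text; missing fields stay blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical row: only the ID and the make are filled in.
csv_text = to_csv([{"car.ID": "1951.119", "car.Make": "Austin"}])
print(csv_text)
```

Leaving unknown fields blank, rather than omitting the column, keeps every row in the same shape so the list stays easy to sort and import.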

Classification Introduction

Before attempting to "classify" beyond Title, Creator (author) and Description, some research or consultation with a librarian is useful.  As a first step, look at the description of the "Dublin Core", a standard for digital metadata.  Next, check the official Dublin Core website, a good starting point, especially topic 4, "Elements", at the bottom of that webpage.  Even the basic categories such as "Title", "Description", and "Subject and Keyword" can be confusing, and mistakes are tedious to correct after import to Greenstone if there are many records.  It's best to get it (mostly) right in the beginning.

But avoid getting too deep in research: "The Perfect is the enemy of the Good (enough)"  - Voltaire

How Many Records?

How many records should be created in a first collection?  If there are no digital files and the collection has 50 cars, create records for all.  If there are about 100 photos or books, make records for all.  For larger quantities, create records for about 30% of all items.
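This rule of thumb can be written as a small Python function.  The cut-off of 100 items is an assumption: the text only says "about 100" and "larger quantities", so treat `small_limit` as an adjustable guess rather than a fixed rule.

```python
def trial_record_count(total_items, small_limit=100, percent=30):
    """Suggest how many records to create for a first trial collection."""
    if total_items <= small_limit:
        # Small group: create records for everything.
        return total_items
    # Larger group: roughly 30% of the items, rounded up (integer arithmetic).
    return (total_items * percent + 99) // 100


for total in (50, 100, 1000):
    print(total, "items ->", trial_record_count(total), "records")
```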

If you have a Word table or Excel file of all the cars, improve it as much as possible, using the fields listed above to make it as complete as practical for an import.  The "fields" in Word or Excel become "meta tag" categories in Greenstone.  Once a car make (for example, "Austin") is imported into Greenstone and is in the digital repository, the term is available in a pick list to classify any other Austin that is added, or any book, photo or document with Austin content.
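The value of tidying the list before import can be seen in a short Python sketch that builds a pick list from one field.  This is not Greenstone's own mechanism, only an illustration of the idea; the function name and the sample rows are hypothetical.

```python
def build_pick_list(rows, field="car.Make"):
    """Collect the distinct, tidied values of one field, sorted for display."""
    # Collapse stray or repeated spaces so variants of a term merge into one.
    values = {" ".join(row.get(field, "").split()) for row in rows}
    values.discard("")  # ignore rows where the field is blank
    return sorted(values, key=str.lower)


rows = [
    {"car.Make": "Austin"},
    {"car.Make": " Austin "},      # stray spaces collapse to the same term
    {"car.Make": "Frazer  Nash"},  # double space is reduced to one
    {"car.Make": ""},
]

pick_list = build_pick_list(rows)
print(pick_list)
```

Without this kind of clean-up, "Austin" and " Austin " would appear as two separate terms, which is exactly the sort of inconsistency that is tedious to correct after import.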

Although the examples above seem to show that the "cars" table is different from the "photos, books and documents" table, a single table is all that is needed.  Trial Greenstone car collections have added the "car" metadata categories to the Dublin Core categories, and a single collection is the result.  However, separate collections can be made and combined in the final Greenstone collection at any time.

Step Two - Making a Greenstone Collection

These are basic steps to create a collection with the Greenstone Digital Library software:

1.   Define a "Collection".

2.   "Gather" the material by dragging the images/documents in from local drives to a window. This is an introductory video which shows the basic steps to add photos, documents, etc. to the collection that has been defined.  This video uses the trial "Frazer Nash" collection, but the steps are the same for any collection.

A second video (11-minute "how to" add records) shows photographs being added to the Frazer Nash collection and metatags being added to each of those photos.  It also briefly shows how the Excel source for the "Frazer Nash Owners" collection was imported to Greenstone.  This operation is described in the next webpage topic.

3.   Optionally "Enrich" each item with meta-tags, a traditional librarian function. Numerous (new) meta tags can be added to any item. (Initially, existing EXIF metadata from digital photos may not be available, but PDF files are "gathered" with their existing metatags).  For the first trial collections, many metatags have been added to documents with the Enrich function.

4.   In the "Create" step, all items in the Collection are fully indexed.

5.   When Create is completed, "Preview" can be used with a web browser. Documents, including newer PDF files, are fully text-searchable. Numerous other index and search categories on meta-tags are available and can be pre-defined during the "Design" step.

6.   Searching and viewing documents and images is done with a web browser; the full image or document can be viewed by clicking on it from the search results.

New, unique metatag categories can also be created.  For auto collections, this avoids the confusion of which data to put in the standard "subject", "title", etc. categories.  "Cars.manufacturer", "cars.make", "cars.year", etc. categories have been added to the trial car collections and can be used for direct searches or display of the items with these tags.

There is a Greenstone tutorial which describes how to add new metadata elements with the Metadata Set Editor.  Either use this approach or click on the "Manage Metadata Sets" box in the lower left when you are in Greenstone's "Enrich" panel.

Greenstone has functions to import nearly any type of file, change the web-browser interface appearance and change how the search and browse results are displayed.  This customization is not difficult through the Librarian Interface, but requires slightly more than very basic computer skills.  

Current versions of Greenstone (2.86 and 3.07) will also extract metatags from digital camera images. This is "EXIF" metadata, which includes many data categories, such as a photo's creation date, image size, camera make.  The advanced Metadata webpage describes how this data, contained in all digital files, can be used directly to make a Greenstone collection - and as imported data for many other programs.

Email me with any comments, suggestions or questions!  Bob Schmitt, rgschmitt@gmail.com