Validation

One of the major challenges of a project as large and diverse as EAST is ensuring the availability of retained titles to scholars across the member libraries.  The retention of approximately nine million holdings across distributed collections throughout the region requires that there be a high level of confidence in the availability of those titles for use by all partners.  Verification of the existence and condition of retained holdings (known amongst shared print programs as “validation”) helps to build trust among partner libraries that retained volumes will, in fact, be available and usable.

To evaluate the statistical likelihood that a retained volume exists on the shelves of any of the institutions, EAST incorporated sample-based validation studies. The specific goals of these studies were to establish and document the degree of confidence, and the possibility of error, in any EAST committed title being available for circulation. Results of the validation sample studies help predict the likelihood that titles selected for retention actually exist, can be located in the collection of a Retention Partner, and are in usable condition. This research should also help to determine the initial and ongoing costs of conducting validation in circulating monograph collections.

Process

Working with a statistical consultant, Prof. Grant Ritter of Brandeis University, and advised by the EAST Validation Working Group, the EAST team developed the following methodology for the study:

Provided a sample of 6,000 items for each of the 52 libraries for a total of 312,000 items to be sampled across the two cohorts
Drew each sample from the data extracts the library had provided as part of the collection analysis
Developed a data collection tool that supported easy and efficient collection of data by workers (typically student workers) in the library
Provided documentation on how to check the library's sample against its local circulation system to eliminate items known to be checked out before items were checked on the shelves
Developed an easy-to-implement mechanism to provide a cursory check of the physical condition of the item as it was being handled by the worker
Provided documentation to the libraries to facilitate training of their local workers
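
The sampling and circulation pre-check steps listed above can be sketched roughly as follows. This is a minimal illustration only, not the EAST team's actual code: it assumes simple random sampling without replacement, and the record field `barcode` and the function names `draw_sample` and `split_by_circulation` are hypothetical names chosen for the example.

```python
import random

SAMPLE_SIZE = 6000  # items per library, as stated in the study design


def draw_sample(holdings, sample_size=SAMPLE_SIZE, seed=None):
    """Return a simple random sample (without replacement) of holdings records.

    `holdings` is assumed to be a list of dicts taken from a library's data extract.
    """
    rng = random.Random(seed)
    if len(holdings) <= sample_size:
        # A collection smaller than the sample size is checked in full.
        return list(holdings)
    return rng.sample(holdings, sample_size)


def split_by_circulation(sample, checked_out_barcodes):
    """Separate items known to be checked out from items expected on the shelf.

    `checked_out_barcodes` is assumed to be a set exported from the local ILS.
    """
    checked_out = [r for r in sample if r["barcode"] in checked_out_barcodes]
    to_check = [r for r in sample if r["barcode"] not in checked_out_barcodes]
    return checked_out, to_check
```

Only the second list would be sent to the shelves; the first is kept so that checked-out items can still be counted when the results are tallied.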

Working closely with the Validation Working Group, which had been appointed by the EAST Executive Committee, and with OCLC’s Sustainable Collection Services®, the vendor working on the EAST collection analysis, the EAST team completed preparation for the Validation Sample Study (VSS) by mid-February 2016. Data collection took place in early spring of 2016 for Cohort 1 libraries and in the fall of 2017 for Cohort 2 libraries.

The Data Collection Tool for the Validation Sample Study

To support efficient data collection for the first sample validation study, the EAST Data Librarian, Sara Amato, worked closely with members of the Validation Working Group to design a web-based tool. The tool supports download of the random lists of titles selected from the holdings records for each of the libraries participating in the VSS. Prior to using the tool, librarians ran the randomized list of 6,000 titles against their current ILS to mark any items currently in circulation. Accounting for checked-out volumes from the beginning ensured that only items expected to be on the shelf were downloaded to the tool.

To use the tool, a student or staff worker specifies the number of items they wish to check, and those items are loaded in location and call number order. Using a tablet, laptop or workstation with an attached barcode reader, the worker then scans the barcode of the first item in the list. If the item is present based on the barcode read, the tool automatically updates the status as present. The worker is then asked to do a brief condition check of the item to determine whether its condition is poor, good/acceptable or excellent. Both documentation and a short video were developed by the EAST team to train the workers and simplify this check.

Although access to wifi is required for the initial download of the items to be checked and to upload the list once the checking is complete, the actual barcode scanning and condition checking can be completed offline.  The tool is also designed to support multiple users simultaneously checking items in the stacks.

Since the workers were handling each item to perform the check, they were also asked to rate the condition of the physical item on a three-point scale: 1 - Poor, 2 - Average, 3 - Excellent. This data was entered into the data collection tool following the scanning of the barcode and required only minimal additional time on the part of the workers. A training video for the workers was made available to assist them and to better ensure consistency across the libraries.

Because the data collection tool reported back results immediately as groups of item checks were submitted, the EAST team had up-to-date access to the availability metrics across the EAST libraries. Upon completion of the data collection, results (anonymized except for reporting to the local library) were shared across the EAST membership.

How the Validation Data Collection Tool Works

The data collection tool allows workers to check the shelves to determine whether or not items expected to be available on shelf are present. Items which had been identified as being checked out or missing were not presented by the tool for shelf validation. For items that were not found on the shelf but were expected to be present, the worker can perform a manual check against the catalog using the tool. Whenever possible, the data downloaded to the tool included barcodes so that the workers could use a barcode reader attached to the laptop or tablet and simply scan the item's barcode to confirm presence.

As shown above, once a worker has scanned the barcode, the tool records the item as "Present". At that point, the worker performs a cursory examination of the physical item and selects the description that best reflects its condition: Excellent, Acceptable/Good or Poor. Further details on the tool, including access to the open source software, are available on GitHub.
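
The scan-and-rate workflow described in this section can be illustrated with a short sketch. It is not taken from the open source tool itself, whose structure may differ; the `Item` class, the function names and the status strings other than "Present" are hypothetical, and call numbers are sorted as plain strings, which is a simplification of real shelf order.

```python
from dataclasses import dataclass
from typing import Optional

# Three-point condition scale used in the study.
CONDITIONS = {1: "Poor", 2: "Acceptable/Good", 3: "Excellent"}


@dataclass
class Item:
    barcode: str
    location: str
    call_number: str
    status: str = "Unchecked"        # hypothetical initial status
    condition: Optional[int] = None  # 1, 2 or 3 once rated


def build_worklist(items, count):
    """Return the first `count` items in location and call number order."""
    ordered = sorted(items, key=lambda i: (i.location, i.call_number))
    return ordered[:count]


def record_scan(worklist, scanned_barcode, condition):
    """Mark the scanned item as present and store its condition rating.

    Returns the updated item, or None if the barcode is not on the worklist.
    """
    if condition not in CONDITIONS:
        raise ValueError("condition must be 1, 2 or 3")
    for item in worklist:
        if item.barcode == scanned_barcode:
            item.status = "Present"
            item.condition = condition
            return item
    return None
```

Items still marked "Unchecked" at the end of a session would be the candidates for the manual catalog check described above.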

Results

Overall, EAST can report a 97% availability rate.  

The aggregated results from both cohorts (312,000 holdings across the 52 libraries) showed:

97% of monographs in the sample were accounted for (mean 97%, median 97.1%, high 99.8%, low 91%).  (Note: “accounted for” includes those items previously determined to be in circulation based on an automated check of the libraries’ ILS.)
2.3% of titles were in circulation at the time of the study
90% of the titles were deemed to be in average or excellent condition, with 10% marked as being in poor condition.  Not surprisingly, older titles were in poorer condition.
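
To make the "accounted for" definition concrete, the sketch below shows how such a rate and its summary statistics could be computed from per-library counts. The counts are invented for illustration and are not study data. The margin of error uses the normal approximation for a proportion under simple random sampling, which is an assumption of this example and not necessarily the method used by the statistical consultant.

```python
import math
import statistics


def accounted_for_rate(found_on_shelf, in_circulation, sample_size):
    """Share of sampled items either found on the shelf or known to be checked out."""
    return (found_on_shelf + in_circulation) / sample_size


def margin_of_error(rate, sample_size, z=1.96):
    """Approximate 95% margin of error for a sampled proportion (normal approximation)."""
    return z * math.sqrt(rate * (1 - rate) / sample_size)


def summarize(rates):
    """Mean, median, high and low of a list of per-library rates."""
    return {
        "mean": statistics.mean(rates),
        "median": statistics.median(rates),
        "high": max(rates),
        "low": min(rates),
    }


# Hypothetical counts: (found on shelf, in circulation) out of 6,000 sampled items.
example_counts = {
    "Library A": (5682, 138),
    "Library B": (5850, 120),
    "Library C": (5400, 90),
}
example_rates = [
    accounted_for_rate(found, out, 6000) for found, out in example_counts.values()
]
example_summary = summarize(example_rates)
```

Under these assumptions, a single library with a rate near 97% and a sample of 6,000 would have a margin of error of roughly 0.4 percentage points.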

A few notable observations include:

Items published pre-1900 were in significantly poorer condition: some 45% of these items ranked "poor" on the condition scale
Poor condition was also somewhat correlated with an item's subject area
The most significant factor for an item being missing was the holding library.

For a full account of the results, including how age and subject matter correlate with the condition of materials, see the Descriptive Analysis of the Validation Sample Study provided by Professor Grant Ritter for Cohort 1, Cohort 2 and a combined analysis.

Validation Sample Study: A Deeper Dive into the Data

The Validation Sample Study with Cohort One libraries took place concurrently with the collection analysis being done with SCS. Unfortunately, this meant that the results of the validation study were not yet available to inform the collection analysis model being used to determine which titles libraries would agree to retain (a full description of the retention models is available under the Collection Analysis section of this website).   Of the 240,000 items sampled in the validation study, 92,575 subsequently received retention commitments, providing a large enough sample of the ‘collective collection’ to do statistically valid predictive modeling.

Using data from the Cohort 1 Validation Sample Study and data on the full holdings of the 40 EAST Cohort One libraries, Professor Ritter identified 77,925 titles (.01% of the collective collection at that time) as having a greater than 7.5% chance of being missing or a greater than 50% chance of being in poor condition. These were titles with only one copy being retained by EAST which had unallocated surplus copies at other EAST libraries. These titles were then provided back to the member libraries holding surplus copies with the highest validation scores as potential additional retention candidates.  The majority of these additional copies of at-risk titles were accepted as additional retention commitments at Cohort One libraries.
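
The selection rule described above can be expressed as a small sketch. The two thresholds come from the text; everything else is an assumption made for illustration, including the `Title` class, its field names, the idea that each surplus-holding library has a single numeric validation score, and the predicted probabilities themselves, which would come from a statistical model not shown here.

```python
from dataclasses import dataclass, field

MISSING_THRESHOLD = 0.075  # greater than 7.5% chance of being missing
POOR_THRESHOLD = 0.50      # greater than 50% chance of being in poor condition


@dataclass
class Title:
    title_id: str
    p_missing: float      # predicted probability the retained copy is missing
    p_poor: float         # predicted probability it is in poor condition
    retained_copies: int  # copies currently under a retention commitment
    # Hypothetical mapping: library holding a surplus copy -> its validation score
    surplus_holders: dict = field(default_factory=dict)


def is_at_risk(title):
    """True for a singly retained title that exceeds either risk threshold."""
    exceeds = title.p_missing > MISSING_THRESHOLD or title.p_poor > POOR_THRESHOLD
    return title.retained_copies == 1 and exceeds


def propose_additional_retentions(titles):
    """Map each at-risk title to the surplus holder with the highest validation score."""
    proposals = {}
    for title in titles:
        if is_at_risk(title) and title.surplus_holders:
            best = max(title.surplus_holders, key=title.surplus_holders.get)
            proposals[title.title_id] = best
    return proposals
```

At-risk titles with no surplus holders are deliberately left out of the proposals; in the study, such titles were carried forward as retention candidates for a later cohort.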

Some 5,000 of the 77,000+ titles were not accepted by the owning library as additional retentions for various reasons (e.g. the item was not on the shelf or the owning library was not willing to take on additional retentions), and approximately 9,000 of the titles determined to be at risk had no surplus copies in EAST Cohort One.  These were eventually passed on as potential retention candidates in Cohort Two, where most were able to be retained.

This is, to our knowledge, the first time that validation sampling has been used to inform retention modeling for shared print.  It is hoped that this type of data can help inform future shared print retention models.