Blacklight: A Discovery Tool Born From The Digital Humanities

Gena R. Chattin
Dr. Youngok Choi
LSC555 Information Systems in Libraries and Information Centers
March 5, 2011

Introduction

It is not entirely surprising that, given widespread dissatisfaction with its available library OPAC, the University of Virginia (UVA) worked to implement an open source, next-generation OPAC solution. Several other research libraries at larger institutions of higher education have either implemented or are working toward a similar solution. What is interesting and unusual about UVA's Blacklight solution is that it was developed not by the library itself but by digital humanists who, fed up with available search and discovery tools, took it upon themselves to write their own in the margins of their spare time while working on other, humanities-specific discovery applications. These digital humanists not only created an open source OPAC but also achieved library buy-in through a lengthy process of deliberation with library staff after development had begun, and Blacklight now powers the main university library search at http://www.lib.virginia.edu. In this paper, I will review the literature on Blacklight's creation and reception, explain the profession-wide factors that set the stage for its creation, detail Blacklight's development both in terms of its history and its technical construction, and analyze the research on how Blacklight measures up against other open source and commercial next-generation discovery tools.

Literature Review

When Blacklight appears in the literature, it usually does so in the context of comparison with other next-generation OPACs, and the mention is usually very brief. However, there are three substantial analyses of Blacklight, its development, and its functionality.

In a very general vein, several articles address Blacklight briefly in the frame of a larger discussion of next-generation OPACs and/or open source development in libraries. A short 2008 article by Singer looks at Blacklight among other discovery tools in terms of their failure to open up their silos and to move beyond the MARC record as the primary source of content. A 2010 article by Marshall Breeding ("The State of the Art in Library Discovery 2010") looks not only at OPACs but at ILSs as a whole and where both stand in terms of interoperability, access to content, and progress toward the Library 2.0 idea. Another 2010 article by Breeding ("A Time of Opportunity for the Library Programmer") focuses less on the systems themselves than on the skills needed to implement and sustain them. Here Breeding also encourages library staff to move forward on developing and nurturing these skills within the profession.

Moving on to more Blacklight-specific literature, in a 2009 article, Bess Sadler provides an overview of the project, the technology behind it, and its driving principles. This article dovetails nicely with the 2007 chapter from Library 2.0: Initiatives in Academic Libraries in which Sadler collaborated with Nowviskie and Hatcher to detail extensively the development of Blacklight out of the Collex project by the NINES digital publishing initiative at the University of Virginia. Both set the scene out of which Blacklight was born and also give an overview of the systems used, the tools created as a result, and the project's impact on future projects as well as on the institution itself.
In a seemingly exhaustive 2010 article, Yang and Wagner compared seven open source and ten proprietary discovery tools against a checklist of 12 desired features in next-generation catalogs as derived from library literature, primarily articles and presentations prepared by Marshall Breeding. The authors ranked the tools both by individual feature and overall. On the whole, open source tools performed remarkably well compared to commercial tools. Also, despite being a side project that was never intended to meet 100% of the criteria judged here, Blacklight finished consistently between the middle and top of the pack.

Current Outlooks on Next-generation OPACs and Their Features

Much recent OPAC development has been driven by the eighteen key search engine features absent from online catalogs as listed by Karen Schneider in her 2006 series, "How OPACs Suck" (Nowviskie, Sadler, and Hatcher 63). A few of these features are relevance ranking, word stemming, field and object weighting, spell checking, and faceting (63). There is a consensus in the literature that the ideal next-generation catalog should include most if not all of these capabilities, and new systems are evaluated based on similar lists.

We are now several years into the development and implementation of numerous next-generation catalogs, and the current outlook within the profession is less one of novelty than of frustration that they have been so slow to appear in libraries and achieve full functionality. In his article "The State of the Art in Library Discovery 2010," Marshall Breeding reviews successful changes in OPACs, their interfaces, and Integrated Library Systems (ILS) as a whole. He tries to switch the discourse from next-generation catalogs to discovery interfaces because they aim to provide access to all aspects of library collections, not just those managed in the traditional library catalog (31). This was certainly a driving force in the development of many discovery tools, including Blacklight and its close relative Collex. With the wealth of resources now provided by research libraries, Breeding says that catalog records in an ILS are now only one part of the content a library discovery system needs to cover, and that libraries have moved past the time when the traditional online catalog of an ILS should be offered as the primary search tool for library content (32). To retain users in a search engine world, library discovery tools need to reduce the number of confusing search boxes on their front pages, stop wasting information literacy instruction time on forcing users to learn publisher and database brand names, and move toward federated searching of individual journal articles as well as books, journals, and other library resources (32, 34).

This call for discovery interfaces to incorporate resources beyond traditionally cataloged monographs and e-publications echoes Singer's 2008 assertion that next-generation OPACs were still relatively closed-world silos intended to index MARC records (140). Singer accused these OPACs of aping the look and feel of famous Web 2.0 examples without truly incorporating the interactive factors that made those sites revolutionary (140). In short, while much progress has been made, it is telling that the literature still identifies these shortcomings after several years.

There is still a long way to go to achieve the ideal discovery interface in libraries, but is the ideal really necessary when the traditional OPAC is still so lacking as to be barely functional when compared to users' needs? Blacklight by no means includes all the desired Web 2.0 features, but it addresses a dire need: it improves search functionality to the point where it finally delivers the accurate, relevant, and diverse result set from all the library's holdings that library users have needed for as long as OPACs have been in existence. Discovery is a matter of life and death for academic libraries.
Much can be learned from the process that spawned this imperfect but highly functional tool at the University of Virginia, which is detailed in the next section, and future tool developers should consider and build upon that experience.

An Introduction to Blacklight

History and Development

The Blacklight discovery interface at UVA sprang from Collex development. Nowviskie, Sadler, and Hatcher describe Collex as a larger knowledge-discovery, folksonomical, and online publication system developed for scholarly research communities (61-62). It was intended to federate, search, browse, collect, allow comments upon, and allow end-user remixes (i.e., bibliographies, syllabi, reading packets, etc.) of peer-reviewed scholarship (62). It included most of the Web 2.0 tools so coveted by next-generation OPAC developers: tagging, feeds, and suggestions of other related material (62).

The impetus for Collex had, in fact, very little to do with libraries themselves but, rather, with the failure of libraries and commercial publishers together to distribute scholarly publications affordably and widely. Libraries cannot afford the ever-increasing prices of journals and databases, much less monographs, and thus publishers solicit fewer manuscripts from an ever-growing pool of academics. This situation, referred to by Nowviskie et al. as the serials crisis, negatively affects the health and distribution of scholarly research as well as academicians' efforts to maintain their employment status (62). Digital publishing is therefore essential to the ongoing health of academic institutions, yet few libraries provide discovery tools equipped to handle it. Collex and, subsequently, Blacklight were designed by digital humanists to address this and other problems.

Thus Blacklight (through Collex) did not entirely originate in a library environment, but librarians at UVA took notice. Further, development was inspired by North Carolina State University's Endeca efforts and other similar projects in academic libraries (Nowviskie, Sadler, and Hatcher 63). Collex already implemented many of the features outlined by Schneider in the "How OPACs Suck" series mentioned earlier, and it already hosted a plethora of different types of objects (64). Collex's weakness, however, was that it was not built to handle MARC records or to address other library standards, and so, despite noticing the overall promise of Collex, the library did not agree to fund OPAC development based on it (64). NINES project administrators saw potential benefit to the original project in developing this MARC-friendly offshoot, so Nowviskie and Hatcher were permitted to work on Blacklight as a side project with no extra time or funding so long as it didn't adversely affect their other work (64).

The authors called their collaborative effort to complete Blacklight a "skunk works" team, drawing on the term's Wikipedia description to define it as an unofficial, cooperative working arrangement (Nowviskie, Sadler, and Hatcher 68). They repeatedly credit this collaborative relationship between digital humanities and library staff and faculty with allowing the project not only to survive but to succeed and to bring about several long-lasting, beneficial changes in UVA library and digital humanities workflow, training, and organization. Even with the lack of funds and dedicated employees, relationships were built between departments, and although the increased number of stakeholders slowed the project somewhat, the end product was better for it in the long run and satisfied a greater number of users (68-69).

Technological Underpinnings

Blacklight's two fundamental technologies are the Solr search server and the Ruby on Rails web application framework (Nowviskie, Sadler, and Hatcher 64). Apache Solr is used to index and search records, while Ruby on Rails is used to create the front end (Sadler 57).
Ruby on Rails is being embraced by the digital humanities community at large for the same reasons it was chosen by Blacklight developers several years ago. The open source community supporting the framework makes it possible for staff to access training and documentation without the expense of a more involved language or, worse yet, the API of a proprietary software package (Sadler 58). Its ease of use and malleable syntax allow a greater variety of staff to work on the project (Nowviskie, Sadler, and Hatcher 65). From a technical standpoint, it lets librarians define behaviors that are specific to certain kinds of objects (for example, displaying track listings for music CD records) without having to understand other objects, indexing, or data models (Sadler 58). On the back end, Solr provides a remarkable amount of data processing in an open source package. Because of the way and the speed with which Solr handles data, a MARC record is no longer necessary for every item included in the library's search interface (Sadler 58). Items not yet cataloged can be searched alongside traditional bibliographic records. Lucene, the underlying technology of Solr, provides fast, relevance-ranked full-text search capability, while Solr adds caching, highlighting, and faceting on top (Nowviskie, Sadler, and Hatcher 65). Field weighting and relevance ranking in Solr do not require re-indexing of the data, which results in greater speed and flexibility for the end user (Sadler 59). Together, the two technologies allow for the creation of controllers, thanks to Ruby on Rails's model/view/controller software architecture. These controllers define default catalog behavior, search fields, and relevancy ranking (Sadler 59). As a result, Blacklight developers are able to create different controllers for different types of searchable materials.
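The query-time weighting and per-controller behavior described above can be sketched roughly as follows. This is not Blacklight's code (Blacklight itself is written in Ruby on Rails); it is a minimal Python illustration, under stated assumptions, of how a per-collection search profile might translate into Solr request parameters. The profile names, the field names (title_t, composer_t, and so on), and the boost values are all hypothetical. The parameters defType, qf, facet, and facet.field are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Hypothetical search profiles, one per "controller". The field names and
# boost values are illustrative assumptions, not Blacklight's real settings.
PROFILES = {
    "catalog": {
        "qf": "title_t^10 author_t^5 text",
        "facets": ["format", "subject_facet"],
    },
    "music": {
        "qf": "title_t^10 composer_t^8 instrument_t^4 text",
        "facets": ["instrument_facet", "recording_format_facet"],
    },
}


def build_solr_query(profile_name: str, user_query: str, rows: int = 10) -> str:
    """Return a URL-encoded Solr query string for the given profile.

    Field weights travel with the request (the qf parameter), so changing
    them only changes the query; nothing in the index has to be rebuilt.
    """
    profile = PROFILES[profile_name]
    params = [
        ("q", user_query),
        ("defType", "dismax"),   # query parser that understands qf boosts
        ("qf", profile["qf"]),   # per-field weights applied at query time
        ("rows", str(rows)),
        ("facet", "true"),
    ]
    # facet.field may be repeated, once per facet to compute.
    params += [("facet.field", name) for name in profile["facets"]]
    return urlencode(params)


if __name__ == "__main__":
    # The same user query, weighted and faceted differently per profile.
    print(build_solr_query("catalog", "goldberg variations"))
    print(build_solr_query("music", "goldberg variations"))
```

Because the weights are part of the request rather than the index, adjusting relevance for one kind of material in a sketch like this means editing a profile, not re-indexing records.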
For instance, because of the special search needs of music materials, Blacklight began by adding a music controller to the general catalog controller so that music materials are displayed and organized differently than standard print materials and are, therefore, easier for all to search and retrieve (Sadler 59).

The challenge in implementing this technology, however, was importing MARC data. This problem was solved in part by the development of SolrMARC, a Java-based program developed in collaboration with the VuFind project that maps MARC fields into a Solr schema and indexes MARC records into Solr (Sadler 64). Using this tool, MARC records are exported and indexed into Solr, and mappings can be adjusted in a SolrMARC properties file (Sadler 58). The Solr index is updated nightly based on a report of additions, changes, and deletions from the ILS (Sadler 58). Custom Ruby scripts are written to handle non-MARC items (usually records using other metadata standards), and full text is indexed into a Solr full-text field (Sadler 58).

How Blacklight Measures Up

When compared against other discovery systems by Yang and Wagner, Blacklight took some hits in the rankings because it was not designed with certain social features such as personalized recommendations, user contributions, and integration with social networking sites (699). This lack is, in part, intentional. The Blacklight team stopped short of including user-contributed tags in Blacklight, though this functionality is present in Collex, because it was beyond the scope of their project and the available staff, time, and resources simply didn't allow it (Nowviskie, Sadler, and Hatcher 67-68). In other categories, such as faceted navigation, suggested corrections, and enriched content, Blacklight ranked well (Yang and Wagner 699).

Conclusions

Given what the Blacklight team set out to achieve and their dearth of resources, they accomplished something truly noteworthy. They were able to incorporate desired discovery features into the interface, which was eventually adopted by the library. The interface allowed greater access to siloed materials such as TEI documents, EAD finding aids, ILS materials, Fedora repositories, and various other silos of digital content (Sadler 64). In the short term, however, the product did not meet all of its goals. It did not provide the benefits to the NINES and Collex projects that were initially projected and, in fact, was somewhat detrimental to progress on those projects (Nowviskie, Sadler, and Hatcher 70). As a discovery tool in its own right, however, it has been rather successful both at home and at other schools where it has been adopted.

The project should lead information professionals to consider potential effects on their own organizations and to ask hard questions about whether and how such a solution could be implemented there. The Blacklight team had to address these same issues. Nowviskie et al. listed several questions that guided UVA's efforts: Is there staffing to support this project? Can expertise be spread rather than depending on individuals who may leave? Can systems and IT departments handle the extra burden? Who will be responsible for fixing what breaks? Will vendors release a similar product soon? Does a comparable commercial or open source product already exist? (Nowviskie, Sadler, and Hatcher 68).

Additionally, the proliferation of projects like Blacklight will demand greater skills of library information systems staff, and there are some warnings for libraries considering such a venture. In his article "A Time of Opportunity for the Library Programmer," Marshall Breeding highlights two major concerns for libraries considering this route: sustainable and supportable results must be ensured, and projects should not depend too heavily on the specific skills of any individual who may leave the organization at any time (36).
Ultimately, stakeholders at UVA were pleased with the end result, and Blacklight is still in use today. The software has also been implemented at Stanford and North Carolina universities (Yang and Wagner 695-696). The project is still a work in progress, however, and it is by no means the last discovery interface the library will ever know. Even now it must constantly prove itself against competitors, and its developers concede that it may eventually give way to a centralized solution like OCLC's Local WorldCat or another open source or commercial discovery tool (Nowviskie, Sadler, and Hatcher 68).
Works Cited

Breeding, M. "The State of the Art in Library Discovery 2010." Computers in Libraries 30.1 (2010): 31. Print.

Breeding, M. "A Time of Opportunity for the Library Programmer." Computers in Libraries 30.5 (2010): 34. Print.

Nowviskie, Bethany, Elizabeth Sadler, and Erik Hatcher. "Adapting an Open-Source, Scholarly Web 2.0 System for Findability in Library Collections (or: 'Frankly, Vendors, We Don't Give a Damn.')." Library 2.0: Initiatives in Academic Libraries. Chicago: Association of College and Research Libraries, 2007. 58-72. Print.

Sadler, Elizabeth (Bess). "Project Blacklight: a next generation library catalog at a first generation university." Library Hi Tech 27.1 (2009): 57-67. Web.

Singer, Ross. "In Search of a Really 'Next Generation' Catalog." Journal of Electronic Resources Librarianship 20.3 (2008): 139-142. Print.

University of Virginia Library. Web. 5 Mar. 2011.

Yang, Sharon Q., and Kurt Wagner. "Evaluating and comparing discovery tools: how close are we towards next generation catalog?" Library Hi Tech 28.4 (2010): 690-709. Web.
