
Full-Text Global: the Quest for the Holy Grail (for academics!)


The One Great Scholarly Search Engine has become something of the Holy Grail of indexing... (Larson and Arret, 2001 in Willinsky, 2006).

Full-text Global

Search Globally, Retrieve Locally

 

If you can’t find the Holy Grail, create it!

A White Paper proposing Full-Text Global


The objective of this white paper is to propose a concept, business model and technology - Full-Text Global (FTG) - for further collaboration and development requiring expertise beyond that of the idea’s author. No special consideration is sought, other than inclusion in the project if it reaches the next phase – seed funding for development. It is presented here to develop discussion around the concept in principle and how it could be implemented, and to invite comment from those with varying expertise. Specifically, the goals are to:

·         Develop a single, comprehensive search application customizable for all varieties of access privileges to online scholarly content encountered at libraries worldwide.

·         Ensure that this search application indexes with a logic of ‘most direct route to highest quality form of all available material according to the user’s preferences and privileges’, including all OA material and all material accessed through library privileges.

·         Provide a search application that indexes all titles and offers a filtered search of all freely available full-text (OA) material, free of charge to the ‘subscriptionless researcher’: an individual anywhere in the world with no special privilege beyond access to the internet.

·         Provide access to high-quality indexing and search for all people, taking advantage of the increasing availability of gratis material through OA and other access programs that vary geographically.

 

The concept is geared towards peer-reviewed scholarly journal articles; however, depending on demand, it could be expanded to include all types of library material, with peer-reviewed (P/R) research as a filter option.

 

Rationale - Toward a Commercial Open Source Software Enterprise for a Federated Global Search of Scholarly Resources

·         Available to all individuals worldwide with an internet connection

·         Customizable for libraries worldwide that grant access privileges to members

·         Maximizing access and uptake of research in the Majority World

 

The world increasingly has access to full-text scholarly articles online, but as yet has no comprehensive, centralized search engine that can deliver full-text articles across the wide variety of access routes that have emerged. This is a particularly acute and distressing problem for the developing world, where the benefits of open access and concession access programs will largely be lost if scholarly material goes unused for lack of a straightforward search engine that helps the user to search-navigate-retrieve articles. The rigour, quality and objectivity of the literature review in any research project are compromised if articles are selected as a convenience sample, a distortion introduced when the search engine used is either not comprehensive or not discriminating enough, so that relevant research is buried too far down the results to be found. Databases are limited to their own contents. Search engines can index metadata harvested globally without that technical limitation, but they suffer from other limitations:

a)      They may not be sensitive to the user’s access privileges by institution, country or region (particularly non-library search engines).

b)      They may not index open access full-text appropriately, particularly gold OA journals (particularly library search engines).

c)      They may not contain sufficient federated capabilities for filtering subjects, languages, peer-reviewed vs. primary source materials etc.

 

In essence, each leaves substantial room for error, and none is sensitive to access inequalities in a way that minimizes their impact. The improvements provided by Serials Solutions 360, in that regard, widen the information divide by solving only the problems of the largest research libraries globally. There are two types of error and two types of access inequality that can be addressed by Full-Text Global, a comprehensive and customizable global federated search engine for scholarship. Subscription-rich patrons and subscription-poor researchers alike are prone to both errors.

 

Errors

1)      Type I error – False Positive - Convenience Sample. The search is not discriminating enough, and articles are included just because they are available. This is common with Google Scholar. Another possibility is that the refereed post-print is available to the user by a less convenient route, but the user settles for a more readily available version that is not peer-reviewed, thus wasting the value of the peer-review process.

2)      Type II error – False Negative - Article Not Found. The search leaves out articles that are relevant. This is common with proprietary databases, which are limited by their composition: a search in Scholars Portal, PubMed etc. may fail to retrieve articles that appear under the same keywords in Google Scholar. With Google Scholar, in turn, many hits point to pay-per-view (PPV) access. There is no instruction on how to retrieve the article by other means (library log-in, ILL, direct from author, document delivery service (INASP), etc.), so the article is passed up. Subscription journals and OA journals may not be retrievable within the same indexes and databases.
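The two errors can be made concrete as a comparison of sets. The sketch below is only illustrative: the function name `search_errors` and the article identifiers are hypothetical, and it assumes the set of truly relevant articles is known, which a working researcher never has in full.

```python
def search_errors(retrieved, relevant):
    """Compare what a search returned with what is actually relevant.

    Both arguments are collections of article identifiers (hypothetical here).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    return {
        # Type I: returned (e.g. because it was conveniently available)
        # but not part of the relevant literature.
        "false_positives": retrieved - relevant,
        # Type II: relevant, but the search never surfaced it.
        "false_negatives": relevant - retrieved,
    }


# Hypothetical example data
relevant = {"A", "B", "C", "D"}
retrieved = {"A", "B", "X"}

errors = search_errors(retrieved, relevant)
print(sorted(errors["false_positives"]))  # ['X']
print(sorted(errors["false_negatives"]))  # ['C', 'D']
```

In this simplified picture, a comprehensive and discriminating search is one for which both sets are as close to empty as possible.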

 

Access Inequalities

a)      Literature Access – Researchers in developing countries belong to institutions whose libraries typically carry few subscriptions.  There are within-country divides in the North and South as well, where smaller libraries and rural and remote areas are at an access disadvantage.  Open access creates a level playing field for all who have internet access.  Concession programs for developing countries such as HINARI make thousands of journals available to research libraries, and eIFL and INASP programs also create routes to access through bargaining.  Much of this material requires access management, a technical hurdle for the South, both in set-up and in the context of low-bandwidth internet access with frequent interruptions that disrupt the user who has to log in.

b)      Access to indexing and search – Along with the prohibitive cost of subscribing to journals, the cost of subscribing to indexing services poses a major hurdle for research libraries in the world’s poorest countries.  Despite the growth of OA and the concession programs, uptake is hampered by the problems of indexing – search, navigation and retrieval.

 

The movement for open access emerged out of the state of the publishing market in an online environment in the context of the ‘serials crisis’ (Willinsky, 2006).  What has emerged is a system that creates wider access to research while introducing distortions into the process of searching, and these occur whether one is at a subscription-rich institution, a subscription-poor one, or not an institutional member at all.  This situation has created the following routes to search-navigate-retrieve.

 

1)      Your institution’s subscriptions – searchable by various means through subscriber databases and indexes.  Comprehensive search with Google Scholar is possible, but it is not federated and not sensitive to access privileges.

2)      Green OA – articles deposited in institutional OA repositories (over 3000) or archived on authors’ sites.  Can be retrieved through Google Scholar (green arrow or ‘all versions’).  May not be the refereed version.

3)      Gold OA – articles published in gold OA journals (over 4000), or in mixed-access journals (open either post-embargo or by author fee). Retrievable in various databases, but indexing is not necessarily consistent between gold, mixed and non-OA resources.

4)      UN Programs – HINARI/AGORA/OARE – articles available to research libraries in low-income countries through access management at participating institutions.  Versions of these articles may also be in OA repositories.

5)      eIFL – articles available to research libraries in developing countries by eIFL library consortia bargaining.

6)      INASP – articles available through the INASP PERii program, by document delivery.  African Journals Online also works by document delivery.

7)      Publisher’s Concessions – A number of publishers, large and small, provide free or heavily discounted access to resources, some simply by recognition of IP address.

8)      Physical library resources and interlibrary loan (ILL), depending on the library’s network of ILL agreements.

9)      Author e-print or re-print by request.  If articles can’t be accessed by any of the other routes, they may be obtained directly from the author by request.

10)  Digitized public domain material.

11)  ‘Grey market’ – articles may be obtained in ways that circumvent the publisher’s digital rights management but that may nonetheless constitute fair use on the part of the end user.  For instance, it was reported that physicians in Thailand resorted to purchasing unauthorized copies of medical articles to gain the knowledge they needed to save lives, as they were in no position to pay the purchase price for the article or subscription and had no institutional subsidy like most doctors here would.

 

In terms of the Northern libraries that purchase a number of indexing services to manage their resources, John Willinsky (2006) has stated that “Despite this considerable array of indexes and portals, earnest scholars and students still have to wend their way through overlaps, gaps, and partiality in the coverage of the research literature that the indexes provide…” (p. 173).  When we add OA repositories, OA journals, UN programs, publisher concessions by IP address etc., one wonders whether the available wealth of resources is used effectively, and how often specific resources are passed up because of search-navigation issues.  One can predict that the more barriers to retrieval, the more complex the search, and the more specific prior knowledge required of the user, the greater the chance of a reduction in the quality and objectivity of the literature search, an increase in its cost (in time), and consequently a weaker research endeavour.

 

Full-Text Global Business Model

“Having one place to search that would include relevant resources would make research less fragmented” (Larson and Arret, 2001 in Willinsky, 2006).

Full-Text Global would be commercial open source software (in the manner of SugarCRM) that would address indexing issues in a mixed OA/subscription environment, with particular attention to providing a truly global service with regard to the opportunities and needs of developing country institutions.  It would also provide a unique service accessible to anyone with an internet connection, through its basic application for federated global article search and global full-text search.  It should serve people equally regardless of their location in the world, in the sense that it maximizes everyone’s access privileges and reduces the impact of access privilege inequality.  While a pure OA environment is ideal and may be inevitable, the need to manage knowledge resources in a mixed environment provides the possibility of making this product commercially viable.

 

Full-Text Global will be a company that believes in a vision for Open Access, but provides a service for maximizing access in a world complicated by access barriers.  It addresses the primary access inequality - access to literature - indirectly by addressing the second – access to indexing and search.  This is because the access privileges that exist today, and that will increase in the future, risk being under-utilized because of the complications of search and retrieval.  In addressing the indexing issue, FTG also solves problems faced by the large research libraries of the North and provides a service that treats all articles on the basis of the preferences of the user and the route to retrieval, thus harnessing the full power of purchased and OA resources.  It should provide the most direct route possible to free access to the highest quality version of any article in the world.  It will do this by managing the routes to access and logically deducing the link to the article, or information on the most direct alternative route among those listed above (except grey market!).
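The ‘most direct route to highest quality version’ logic could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a specification: the `Copy` record, the route and version labels, their rank orders, the set of open routes and the example locations are all hypothetical, and the rule of ranking version quality ahead of directness is one possible reading of the principle.

```python
from dataclasses import dataclass

# Assumed rankings (lower is better); the white paper does not fix these.
VERSION_RANK = {"refereed": 0, "unrefereed": 1}
ROUTE_RANK = {
    "gold_oa": 0,            # immediate full text, no log-in
    "subscription": 1,       # immediate full text via library log-in
    "green_oa": 2,           # repository copy
    "concession": 3,         # e.g. HINARI/AGORA/OARE, eIFL
    "document_delivery": 4,
    "ill": 5,                # interlibrary loan
    "author_request": 6,
}
# Routes assumed to be open to anyone with an internet connection.
OPEN_ROUTES = {"gold_oa", "green_oa", "author_request"}


@dataclass(frozen=True)
class Copy:
    route: str      # key of ROUTE_RANK
    version: str    # key of VERSION_RANK
    location: str   # link or instruction (hypothetical values below)


def best_copy(copies, privileges):
    """Return the best copy the user can reach, or None if there is none.

    Quality is compared first, then directness of the route, so a refereed
    copy by a slower route beats an unrefereed copy by a faster one.
    """
    reachable = [c for c in copies
                 if c.route in OPEN_ROUTES or c.route in privileges]
    if not reachable:
        return None
    return min(reachable,
               key=lambda c: (VERSION_RANK[c.version], ROUTE_RANK[c.route]))


# Hypothetical copies of one article
copies = [
    Copy("green_oa", "unrefereed", "https://repository.example/preprint"),
    Copy("subscription", "refereed", "https://publisher.example/article"),
    Copy("ill", "refereed", "ask at the interlibrary loan desk"),
]

print(best_copy(copies, set()).route)                    # green_oa
print(best_copy(copies, {"subscription", "ill"}).route)  # subscription
print(best_copy(copies, {"ill"}).route)                  # ill
```

The third call shows the design choice: a patron with only ILL privileges is pointed to the refereed copy by the slower route rather than to the unrefereed repository copy, which is the behaviour that guards against the Type I error described earlier. A user preference could reverse that ordering.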

 

The non-profit business model would be as follows:

1)      Commercial open source software (Creative Commons-type license) for a federated and layered search - similar to Serials Solutions 360, but with attention to ‘searching all resources available to the patron by any route’ as opposed to ‘searching the library’s acquired resources’.

2)      Federated Search Service for Institutional Members - Customization service to integrate access privileges of institution, so that members can retrieve full-text available to them by privilege or by OA, from the comprehensive global search while logged in (without having to search Google Scholar and then return).  This puts the customized privilege-sensitive search engine and results page on your library website, giving your patrons the best possible search capacity.  The federated search would have three levels, two of which are common to the open version and one which is an option for libraries.

a.       A single search bar, all-in-one keyword search.  Simple and effective (depending on the purpose, strategy and keyword skill of the user).

b.      An advanced search with filters for keyword, subject, author, title, discipline, years etc.

c.       A database-specific search allowing the user to focus on particular databases and apply specific knowledge of them to the search.

Finally, the search results page, customized to the library patron’s privileges, will provide information on the best way known to retrieve any article (except grey market!).

3)      Support Services to these libraries

4)      Service for Access Management and indexing/search (2+3 above) to consolidate privileges through HINARI/AGORA/OARE, eIFL, INASP, OA + publisher concession access in the South.  Payment scale model based on the ability of the institution to contribute - with contribution from UN, eIFL, INASP organizations.  Sponsorship and shared programs for North-South-South partners and networks (to outfit an entire network of partners with a net contribution from them).

5)      Federated Open Access Search.  For general internet users who have no additional access privileges.  Will accept donations and generate a community of developers. This is essentially the basic software, and it can be skinned to fit any website.  One can turn a filter on and off to go from comprehensive abstracts/article listings to only results that provide a full-text.  Full-text search means that anyone in the world who can get on the internet will have a global federated search that indexes every journal article available to them: only hits that lead to gratis full-text articles are included, in essence creating the widest library possible for any general user at any given time. But there are several reasons for filtering the comprehensive index of titles back in:

a.       Your library does not yet have Full-Text Global, and you want to use the open version to find the article and then return to log-in to retrieve it (as people often do with Google Scholar).

b.      You want to ensure that your literature search is not biased by availability and convenience, to know what exists and/or to try alternative routes to access. 

6)      Provide both paid and open search services in as many languages as possible.

7)      Provide a scholarly and principled guide to literature-based research, focused on a logic for routes of retrieval for online materials but also utilizing your closest physical library, ILL, author re-print, document delivery etc., in as many languages as possible.

8)      Provide a data-based map and accounting of the world’s scholarly resources.

9)      Phase II.  Work on Web 3.0 semantic search capabilities, and research and development into more sophisticated computer-automated translation, towards a service that combines automated translation with peer review by human translators to accelerate high-quality translation of the world’s scholarly resources into as many languages as possible.
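The full-text filter of the open version (item 5 above) amounts to a simple toggle over the result list. The sketch below assumes each result is a record with a hypothetical `free_full_text_url` field that is empty or absent when no gratis copy is known; the function name and the data are illustrative only.

```python
def filter_results(results, full_text_only=True):
    """Return only hits with a gratis full text, or every hit if the filter is off."""
    if not full_text_only:
        return list(results)
    return [r for r in results if r.get("free_full_text_url")]


# Hypothetical result records
results = [
    {"title": "Article one", "free_full_text_url": "https://repository.example/1"},
    {"title": "Article two", "free_full_text_url": None},
    {"title": "Article three"},
]

open_hits = filter_results(results)
all_hits = filter_results(results, full_text_only=False)
print([r["title"] for r in open_hits])  # ['Article one']
print(len(all_hits))                    # 3
```

Switching the filter off restores the comprehensive index of titles, which serves the two cases listed under item 5: finding an article to retrieve later by library log-in, and checking that a literature search is not biased by availability.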

 

The advantages for endowed libraries in the North of choosing Full-Text Global’s business model to provide comprehensive search of the online resources available to your patrons:

a)      The commercial open source, not-for-profit model builds sustainability into your purchases:

a.       no proprietary rent-seeking and no legal hassles with regards to patents

b.      freedom to build internal capacity to maintain and modify the software

c.       any programmer globally can work with standardized open source code, and communities develop around open source platforms - you are not reliant on any one company for trouble-shooting, maintenance and improvements.

d.      you pay for services, not for software and intellectual property.

e.       payments cover costs for operating Full-Text Global, and ‘profits’ are invested in the social mission.

 

b)      Full-Text Global capitalizes on open access resources by integrating them fully into the indexing services used by your patrons. Although your library did not have to acquire OA materials, this does not diminish their value to your patrons.  At present, however, green OA, gold OA and non-OA material may not be searchable simultaneously and seamlessly, with a results page that shows the most direct route to the highest quality version available to the user along with all access options.  With Full-Text Global, your patrons can log in once and be assured of a single, comprehensive and federated search of all materials available to them. Though privileges differ between individuals, Full-Text Global would be the service that maximizes the use of those privileges for anyone in the world.

 

c)      Full-text Global has a Social Mission that improves the global environment for librarianship, with long-term benefits to your research library. 

a.       Improved access worldwide to literature, and to search and retrieval through indexing, translates directly into increased production of knowledge resources by all countries in all regions, a capacity currently well below potential in the vast majority of the world’s societies.  Globally, we may be described as operating at marginal capacity today because of the information divide.

b.      Produced in a research and archival context, these resources will become available to your library’s patrons, likely at no cost through OA or at an affordable price.  This means more unique, diverse, authentic global sources for your patrons.

c.       By purchasing Full-Text Global services, you will be contributing to indexing access and Open Access’ uptake for everyone, regardless of subscription, through supporting the open version of Full-Text Global.

 

Reference:

Willinsky, John. 2006. The Access Principle: The Case for Open Access to Research and Scholarship. Cambridge, MA: MIT Press.