OAI-PMH Third Party Implementation
Oxford Journals provides an OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) interface for metadata harvesting.
OAI-PMH provides metadata in a standard XML format. Information on the history, standards and protocol of OAI-PMH can be found on the Open Archives Initiative website.
OAI-PMH is our preferred mechanism for supply of metadata, and existing data feeds are being migrated to OAI-PMH.
This document outlines a recommended process to jointly test harvesting, access, linking, and authentication. If you're ready to start using the mechanism, you can go directly to the Oxford Journals OAI-PMH page.
Third Party Benefits
Using OAI-PMH, the third party retrieves the required metadata on a self service basis, rather than Oxford Journals establishing a regular 'push' mechanism.
The benefits of OAI-PMH for OUP third parties are:
- Metadata can be harvested at any time, and as frequently as required.
- Access to full text (if entitled) covers HTML full text and extracts, in addition to the current provision of PDFs and abstracts.
- Harvesting parameters can be set to obtain metadata by journal, date, volume or issue.
- Advance Access metadata (articles published online ahead of print) is also included in OAI-PMH feeds.
- OAI-PMH provides the opportunity to fill existing gaps in data and for third parties to source their own claims.
- Articles are made available immediately after publication online.
Setting up and testing the data
Third parties will need to download or develop a Metadata Harvester. There is a variety of harvesters available, and guidelines on the implementation of harvesters can be found here. The OCLC provides a number of OAI harvesters.
Once a harvester has been downloaded, it should be tested to ensure that it can retrieve content with a variety of parameters, and that the content conforms to the OAI-PMH standard and meets the third party's expectations.
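For third parties developing their own harvester, the core retrieval loop can be sketched as below. This is a minimal illustration rather than a reference implementation: the names harvest and fetch_url are hypothetical, and it assumes the repository pages large result sets with the resumptionToken mechanism defined in the OAI-PMH 2.0 specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def fetch_url(url):
    """Return the raw response body for a URL."""
    with urlopen(url) as response:
        return response.read()


def harvest(base_url, params, fetch=fetch_url):
    """Yield every <record> element of a ListRecords harvest.

    Assumption: the server may split results into pages and return a
    resumptionToken, as the OAI-PMH 2.0 specification allows.
    """
    query = dict(params, verb="ListRecords")
    while True:
        url = base_url + "?" + urlencode(query, safe=":")
        root = ET.fromstring(fetch(url))

        error = root.find(OAI_NS + "error")
        if error is not None:
            if error.get("code") == "noRecordsMatch":
                return  # an empty result is not a failure
            raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")

        yield from root.iter(OAI_NS + "record")

        token = (root.findtext(f".//{OAI_NS}resumptionToken") or "").strip()
        if not token:
            return  # a missing or empty token means the list is complete
        # A follow-up request carries only the verb and the token.
        query = {"verb": "ListRecords", "resumptionToken": token}
```

Passing a replacement for fetch makes it possible to test the loop against saved responses before pointing it at the live service.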
Even without a harvester the search parameters can be experimented with, and OAI-PMH metadata can be viewed to confirm that it is satisfactory.
Harvested data can be reviewed by running a search. The results are displayed as an HTML web page; viewing the source of this page shows the OAI-PMH response, with all the tags and elements that make up the metadata.
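The same review can be done programmatically. The sketch below parses a ListRecords response in the oai_dc format and lists the identifier, datestamp and title of each record. The SAMPLE document is invented for illustration and is not real Oxford Journals output, and summarise_records is a hypothetical helper name; the two namespace URIs are those defined by OAI-PMH 2.0 and Dublin Core.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample response, trimmed to the elements used below.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:sample-1</identifier>
        <datestamp>2007-01-15</datestamp>
        <setSpec>afrafj</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample article title</dc:title>
          <dc:date>2006-07-01</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def summarise_records(xml_text):
    """Return one dict per <record> with its identifier, datestamp and title."""
    root = ET.fromstring(xml_text)
    summaries = []
    for record in root.iterfind(".//oai:record", NS):
        summaries.append({
            "identifier": record.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            # The header datestamp is what the "from" parameter is matched against.
            "datestamp": record.findtext(
                "oai:header/oai:datestamp", default="", namespaces=NS),
            "title": record.findtext(".//dc:title", default="", namespaces=NS),
        })
    return summaries


for summary in summarise_records(SAMPLE):
    print(summary)
```

Note that the datestamp in the header is separate from any dc:date value in the metadata itself.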
The following links/URLs can be used to harvest data using the OAI-PMH mechanism giving a variety of results.
To display results from a specified date for all OUP journals, use the following URL: http://open-archive.highwire.org/handler?verb=ListRecords&metadataPrefix=oai_dc&from=2006-05-01&set=OUP
The from date in the URL can be amended to select records from other dates. This date is not the Cover Date or Publication Date but the ‘Date Stamp’, which is the date on which the record was uploaded. Records can be uploaded as new records or as amended records.
To select all volumes and issues of a particular journal, omit the date from the URL. The journal code to use is the Highwire journal code, which on occasion differs from the Oxford Journals code and/or domain code. A spreadsheet listing all the codes needed is available on this page. For example: http://open-archive.highwire.org/handler?verb=ListRecords&metadataPrefix=oai_dc&set=afrafj
You can combine the two parameters to include both the date stamp and the journal code: http://open-archive.highwire.org/handler?verb=ListRecords&metadataPrefix=oai_dc&from=2007-01-01&set=afrafj
More specific volume or issue records can be harvested by adding the volume (and issue) to the journal code: http://open-archive.highwire.org/handler?verb=ListRecords&metadataPrefix=oai_dc&set=afrafj:104:414
You cannot combine a from date with a volume and issue search, as the parameters may conflict.
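The URL patterns above can be generated rather than typed by hand. The following is a minimal sketch: build_list_records_url is a hypothetical helper, not part of any Oxford Journals or Highwire tooling. It rejects the combination of a from date with a volume or issue, following the restriction described above.

```python
from urllib.parse import urlencode

BASE_URL = "http://open-archive.highwire.org/handler"


def build_list_records_url(set_spec="OUP", from_date=None, volume=None,
                           issue=None, metadata_prefix="oai_dc"):
    """Build a ListRecords URL for a set, optionally narrowed by date or volume/issue.

    set_spec is "OUP" for all journals, or a Highwire journal code such as "afrafj".
    from_date is a date stamp in YYYY-MM-DD form.
    """
    if issue is not None and volume is None:
        raise ValueError("an issue can only be given together with a volume")
    if from_date is not None and volume is not None:
        raise ValueError("a from date cannot be combined with a volume or issue")

    params = [("verb", "ListRecords"), ("metadataPrefix", metadata_prefix)]
    if from_date is not None:
        params.append(("from", from_date))
    # Volume and issue are appended to the journal code, separated by colons.
    parts = [set_spec] + [str(p) for p in (volume, issue) if p is not None]
    params.append(("set", ":".join(parts)))
    # Keep ":" unescaped so the set value matches the form used in the examples.
    return BASE_URL + "?" + urlencode(params, safe=":")


print(build_list_records_url(from_date="2006-05-01"))
print(build_list_records_url("afrafj", from_date="2007-01-01"))
print(build_list_records_url("afrafj", volume=104, issue=414))
```

Each call reproduces one of the example URLs given above.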
Advance Access articles are available to harvest. Search parameters which omit volume and issue data will show these.
In order to access the full text or any other content from the harvested data, links should be constructed in the following formats:
- Abstracts or Extracts and HTML Full Text: http://[JOURNAL CODE].oxfordjournals.org/cgi/content/[FULL or SHORT]/[VOLUME]/[ISSUE]/[PAGE]
- PDFs: http://[JOURNAL CODE].oxfordjournals.org/cgi/[REPRINT]/[VOLUME]/[ISSUE]/[PAGE]
For example:
- Abstract / Extract http://afraf.oxfordjournals.org/cgi/content/short/105/420/333
- HTML Full Text http://afraf.oxfordjournals.org/cgi/content/full/105/420/333
- PDF http://afraf.oxfordjournals.org/cgi/reprint/105/420/333
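Link construction from harvested volume, issue and page values can be sketched as follows. build_content_link is a hypothetical helper, and the view names "short", "full" and "pdf" are labels chosen for this example. Note that the examples above use the domain code afraf in links, whereas the harvesting set uses the Highwire code afrafj.

```python
def build_content_link(journal_domain, volume, issue, page, view="short"):
    """Build a link to an abstract/extract ("short"), HTML full text ("full") or PDF ("pdf")."""
    # Path segment that follows /cgi/ for each kind of content.
    paths = {
        "short": "content/short",
        "full": "content/full",
        "pdf": "reprint",
    }
    if view not in paths:
        raise ValueError(f"unknown view {view!r}; expected one of {sorted(paths)}")
    return (f"http://{journal_domain}.oxfordjournals.org/cgi/"
            f"{paths[view]}/{volume}/{issue}/{page}")


for view in ("short", "full", "pdf"):
    print(build_content_link("afraf", 105, 420, 333, view))
```

The three printed links match the abstract, full text and PDF examples listed above.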
Journal codes and domains are available as either a CSV file or a spreadsheet on the following page
Authentication will take place by IP, so there should be no change to your setup. We will continue to use the proxy IP ranges you have submitted, and we will build subscriptions for you that encompass all your content entitlements. You will continue to authenticate your customers and we will authenticate your proxy IPs. We will ask you to test access, and we will run both authentication mechanisms until we are satisfied the new process is working.
Once we are certain that the third party harvester is working (using the test titles), that links are built correctly, and that third party authentication to content is active, harvesting of full entitlements can begin.
Third parties who are migrating to OAI-PMH from another mechanism may wish to move forward with all current content, using appropriate search parameters and building links to full text content.
Third parties may then wish to back-fill their whole collection of entitlements, replacing previously supplied metadata with OAI-PMH harvested data. For consistency, we recommend that a single method of access, data sourcing and linking is used.
The current FTP data feeds will eventually cease as more of our third parties move to the OAI-PMH mechanism. Access to articles hosted by OUP on our database may also be moved at some point in the future, so it will be important to adopt this mechanism and, once adopted, to back-fill. We appreciate that any transition may take time, and we will take this into account when considering the time frame for switching off the current process.