So long, Sunderland – and some data-related unfinished business

After nearly four and a half years at the University of Sunderland, I’m moving on to a new role at ORCID, as their Education & Outreach Specialist.  For most of my time at UoS, I’ve been the E-Resources Librarian and the Law Librarian, which has been a very interesting combination of roles.

When I started at UoS in 2012, we still had Classic Athens authentication and Single Sign-On running in parallel, EDS was implemented but needed more work, and EZproxy was hardly used.  Since then, the use of Classic Athens has been discontinued and SSO has been fine-tuned to give different access permissions to different types of users, EZproxy authentication is in place for all platforms which support it, and I’ve overseen the successful migration of our old EDS to the new EDS FTF.

I’ve enjoyed teaching others about various e-resources topics, especially while dressed as a pirate.  Other subjects included licences and subscriptions, journals and platforms, and hyperauthorship.

Writing and editing my chapter on Open Access for the Legal Academic’s Handbook helped me to distill and refine my ideas in this field.  Participating in Helsinki University Library’s International Staff Exchange Week 2014 was an excellent experience and further fuelled my Suomi-philia.  And developing a framework for Professional Practice Forum helped to develop communications and nurture relationships within our Senior Library Staff team.

My participation in UKSG has grown from attending the 2013 conference (where I first heard about ORCID) and the 2014 conference, to being invited to join the UKSG Research & Innovation Sub-Committee, and then being elected to the UKSG Committee.  I’m looking forward to carrying on this role in my new job, and glad that ORCID is fully supportive of my involvement.

I would like to thank the colleagues who have helped to realise many of these projects, especially Rachel Webb and Ian Frost, trusty allies in periodicals and IT.

Lastly, there is some unfinished business concerning EBSCO EDS and Single Sign-On.  In brief, EBSCO and Eduserv are proposing a change to how users log in to EDS, so that they will also immediately be logged in to their personal folders.  This solution will appeal to libraries, as users often struggle with the current situation where you log in first to the system, and then again (with different credentials) to access your personal folders.  However, this change involves sending users’ personal data outside the EU, and therefore has Data Protection implications.  Here is my most recent communication to Eduserv on the matter, sent in advance of last week’s webinar “Approaches to authentication – evolution, security, options for the future”:

I would like to ask you about how the use of EDS and SSO fits with the Data Protection Act (1998) requirements that personal information used by organisations is not transferred outside the European Economic Area without adequate protection.
I have made this enquiry before and have been told that it is up to the organisation to decide if EBSCO’s use of servers outside the EU complies with the DPA (really?).  This respondent also quoted the Safe Harbor framework, appearing not to know of the EU Court of Justice decision in 2015 that the Safe Harbor regime did not provide a valid legal basis for EEA-US transfers of all types of personal data.
I wonder if someone at this webinar may be able to provide a better response.  I urge Eduserv and EBSCO not to pass this matter back to individual organisations alone, but to offer some advice and guidance about the implications, especially as many library staff making decisions about implementing the EDS & SSO option may not be aware of the legal implications.

I have not yet had a response from them, and the recording of the webinar has not yet been released so I don’t know if it was addressed during the session.

Library colleagues, please be alert to the implications, keep asking Eduserv and EBSCO about this, and don’t let your users’ data be released without adequate legal and ethical safeguards.

EBSCO EDS and Single Sign-On

OpenAthens Single Sign-On (SSO) is a SAML-compliant Shibboleth-type authentication method used for University login to a wide range of electronic resources.

SSO works by mediating between an identity provider (e.g. a university, checking that the user’s account is current), and a service provider (e.g. a database, to which the user’s university has a current subscription).  Here’s a diagram of the data flow:

Authentication data flow. Image credit University of Florida.


Critically, the identity provider and the service provider don’t communicate directly.  The user’s personal credentials are not transmitted to the service provider; just that their identity has been verified.

This means that when someone logs in to a database or journal platform, they are greeted by “Welcome, University of Sunderland user” or “You are logged in as University of Sunderland”, but the database or platform does not know anything further about their identity.
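To make that concrete, here is a minimal sketch of what the service provider ends up holding under a standard setup.  The record, the field names and the `build_assertion` helper are all made up for illustration; a real SAML assertion is signed XML produced by the identity provider's software, not a Python dictionary.

```python
# Hypothetical model of a standard SSO release; not a real SAML library.
USER_RECORD = {
    "username": "ab12cd",
    "name": "A. Student",
    "email": "a.student@example.ac.uk",
    "organisation": "University of Sunderland",
    "account_active": True,
}


def build_assertion(user):
    """Identity provider side: vouch for the user without passing on who they are."""
    if not user["account_active"]:
        return None  # lapsed accounts are not vouched for
    # Only the verified organisation is released, none of the personal fields.
    return {"authenticated": True, "organisation": user["organisation"]}


def greeting(assertion):
    """Service provider side: all the platform can say from the assertion alone."""
    if assertion is None:
        return "Access denied"
    return f"Welcome, {assertion['organisation']} user"


assertion = build_assertion(USER_RECORD)
print(greeting(assertion))
```

The platform can greet the institution, but it has no name, email address or username to attach to the session.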

Why does this matter?  Service providers’ servers may be located anywhere in the world, often outside the EU.  The Data Protection Act 1998 controls how personal information is used by organisations, businesses or the government.  It requires that data controllers (organisations etc) handle personal data according to people’s data protection rights, and do not transfer it outside the European Economic Area without adequate protection.

Recently, EBSCO have started promoting the use of an enhanced version of SSO which means that a user will be authenticated into EBSCO Discovery Service (EDS) and simultaneously logged in to their personal folders.  This will sound very appealing to many EDS customers, as currently the personal folders require the user to log in (again) with their EBSCOhost account (yet another userID and password to remember).  With the standard SSO setup, this would not be possible, so I started asking questions about what additional data exchange would be needed in order for the user to be individually identified.

Email from EBSCO:

Essentially the only requirement for setting up SSO is that your shibboleth releases a persistent unique ID. However we generally recommend releasing other attributes:

Which user data attributes must be included within the IdP-generated SAML assertion?

Only a unique user ID (e.g. employee ID, organization-specific email) is required to be sent in the SAML assertion. It is recommended that First Name, Last Name and Email also be sent to better support sharing and email from within the EBSCO user interface.

At the mention of persistent unique ID, I started to wonder about the data protection law implications.
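One way to think about the question is to list what would be released and flag anything that could single out an individual.  The sketch below does only that.  The attribute labels are my own shorthand rather than EBSCO's or OpenAthens' real attribute names, and treating the unique ID as potentially identifying is an assumption for the sake of the example.

```python
# Assumption: illustrative labels only, not actual SAML attribute names.
POTENTIALLY_IDENTIFYING = {"unique_user_id", "first_name", "last_name", "email"}


def review_release(attributes):
    """Split a proposed attribute release into (flagged, other), each sorted."""
    flagged = sorted(a for a in attributes if a in POTENTIALLY_IDENTIFYING)
    other = sorted(a for a in attributes if a not in POTENTIALLY_IDENTIFYING)
    return flagged, other


# The minimum described in the email, and the recommended extras.
minimum = ["organisation", "unique_user_id"]
recommended = minimum + ["first_name", "last_name", "email"]

for label, attrs in (("minimum", minimum), ("recommended", recommended)):
    flagged, other = review_release(attrs)
    print(label, "-> flagged:", flagged, "| other:", other)
```

Even the minimum release contains one flagged attribute, which is the point at which the data protection question arises.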

I followed this up with a phone call, asking about compliance with data protection law.   It seems that this query hadn’t previously arisen in the UK, though it had in Scandinavia where they are more aware of the issues.  Safe Harbo(u)r was mentioned, but I pointed out that in 2015, the European Court of Justice declared invalid the Safe Harbor data-transfer agreement that had governed EU data flows across the Atlantic for some fifteen years.  I was directed to EBSCO’s White Paper about information security, but it didn’t mention anything about data protection.

In advance of last week’s EBSCO and OpenAthens webinar “Single Sign-On to a World of Knowledge”, I repeated my enquiry to OpenAthens and received the following:

All data that is given to OpenAthens is stored here in the UK. We provide the option of mapping attributes out to various publishers however this is controlled and decided by you. The default information that is sent to authenticate the user does not hold any data that identifies the user personally.

To me, “this is controlled and decided by you” sounds very much like ducking the question.

I appreciate that decisions on the release of personal data are ultimately the responsibility of the data controller, but I am concerned that neither EBSCO nor OpenAthens seem to acknowledge the legal and ethical difficulties that this presents to libraries having to make these decisions.  I believe that if they are advocating this enhanced use of SSO, they have a moral obligation to point out the data protection implications, even if they can’t advise libraries on these matters.

I would be grateful to hear from anyone who knows more about this – please leave me a comment.  Thanks for any wisdom you can offer!

E-Resources – less frequently asked questions

This post follows on from E-Resources FAQ

A short history of remote or off-campus access

Eduserv developed the Athens system for remote access to e-resources.  It worked as a list of usernames and passwords hosted by Eduserv, and it allowed off-campus access without the need for VPN (which would authenticate the user via IP address).  VPN installation is not always easy (Mac users?) or possible (people in internet cafes or other places where they can’t download software onto the computer they’re using), and so Athens was a great leap forward.

However, it was costly: JISC funded Athens access for UK higher education institutions and publishers also had to pay for it to work with their products.  JISC funded the access via Eduserv, but Athens was not a JISC product.

More recently, Shibboleth was developed as an open source software solution for web single sign-on for organisations, so it is free to use for both institutions and publishers.  In July 2008, JISC withdrew funding for Athens and started up their own access management organisation, The UK Access Management Federation.  Athens authentication continues to exist and is available on a subscription basis.

Hardly any US-based publishers (e.g. Highwire) used Athens, so switching to Shibboleth authentication meant that a wider range of resources was available off-campus than ever before.

Shibboleth is the technology that underlies our Oxford SSO (single sign-on) system.

What is EZproxy and how does it work with SSO?

EZproxy is another tool for remote access and it works by mimicking the Oxford IP range (like VPN):

EZproxy helps provide users with remote access to Web-based licensed content offered by libraries. It is middleware that authenticates library users against local authentication systems and provides remote access to licensed content based on the user’s authorization

Many e-journals and databases work with “Shibbolised” EZproxy, in which the proxy server is accessed via SSO.  The user is authenticated via SSO and then access to the proxy server is enabled, which allows access to the resource via IP address authentication.  This means that IP-authenticated resources which aren’t SSO-compliant can be accessed off-campus using SSO via Shibbolised EZproxy.
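In practice, the user usually reaches the proxy through a prefixed link.  The sketch below builds one, assuming a placeholder proxy hostname and the `login?url=` starting-point pattern that EZproxy links commonly use.  Following such a link, the user would be sent to SSO to log in, and the proxy would then fetch the resource from an address inside the institution's IP range.

```python
# Hypothetical proxy host; "login?url=" is the commonly used starting-point pattern.
PROXY_LOGIN = "https://ezproxy.example.ac.uk/login?url="


def proxied_link(resource_url):
    """Prefix a resource URL so the request is routed through the proxy server."""
    return PROXY_LOGIN + resource_url


print(proxied_link("https://www.example-journal.com/toc"))
```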

E-resources access and walk-in users

EZproxy doesn’t kick in on-campus, so IP-authenticated resources allow walk-in user access.  In universities, walk-in users are visiting scholars or people with reader access who are not members of the University, and do not have SSO accounts.

Some publishers (usually in the legal or business fields) do not want to allow walk-in user access to their resources, so they require SSO authentication even on-campus.  Shibboleth access is secure and also gives them log files of user activity, so they can trace anyone they suspect of breaking the terms of their licence, for example by systematic downloading of their content.

Usernames and passwords

A few publishers still rely on username and password authentication based on usernames that they issue.  Typically, these are legal databases whose business model involves selling access to a few people at a variety of institutions in the commercial sector, and so they are not set up for other authentication methods.

These usernames and passwords are then stored on an SSO-protected website, such as Weblearn, our university’s virtual learning environment.

Other advantages of SSO over Athens

SSO provides more up-to-date authentication, as it retrieves user information from the identity provider each time access is requested.  The usernames and passwords hosted by Eduserv were only updated every month or so, so someone who had previously been a member of the University would often still be able to access resources for some time after they left.  SSO permissions can be finely tuned so that a student will lose their e-resources access immediately after finishing their course, but retain SSO access to their email until several months later.  Users are more aware of the value of their SSO, since it lets them in to so many services, and are less likely to share (or sell) it to other (non-University) people.  This had been a problem in the past with Athens usernames and passwords.
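The fine-tuning described above can be sketched as a simple date check.  This is only an illustration: the 90-day grace period stands in for "several months", and the `permissions` function is hypothetical rather than anything from an identity provider's real configuration.

```python
from datetime import date, timedelta

# Assumption: "several months" of continued email access modelled as 90 days.
EMAIL_GRACE = timedelta(days=90)


def permissions(course_end, today):
    """Return which services an account may use on a given day."""
    return {
        # E-resources access stops as soon as the course has finished.
        "e_resources": today <= course_end,
        # Email access continues for the grace period afterwards.
        "email": today <= course_end + EMAIL_GRACE,
    }


print(permissions(date(2016, 6, 30), date(2016, 7, 1)))
```

Because the check runs each time access is requested, a change to the course end date takes effect at the next login rather than at the next monthly update.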

How Shibboleth works

The aim of a single sign-on system is to be able to access multiple resources with a single identity.  A variety of service providers (SPs, such as e-resources publishers) can sign up to work with Shibboleth, and a range of identity providers (IdPs, such as universities) can have users’ accounts verified by Shibboleth.

Shibboleth acts as a mediator between the services and the users (with different identities, affiliations and levels of permissions).  Therefore, when you access ScienceDirect via SSO, Shibboleth checks who you are and details about the service you are trying to access.  If it can identify you as a member of the University of Oxford and verify that the University has a current subscription to ScienceDirect, it will allow you access.

To reward you for reading this far, here’s a gory story about where the term shibboleth comes from.

E-Resources FAQ

This is a collection of things I wish everyone knew about e-resources.  Whether this area is new to you or not, I hope you find something useful here; and do let me know about any points I’ve missed in the comments.

What are e-resources?

E-resources are also known as electronic resources and there are two main types: e-journals (or electronic journals) and databases.

Many e-journals are digital copies of print journal articles, but increasingly e-journal articles are published without a print analogue.

There are several kinds of databases

  • Bibliographic – this type of database is a collection of references to published literature.  It functions in a similar way to a library catalogue, but indexes details of articles rather than books
  • A&I (abstracting and indexing) – in addition to bibliographic details, this type of database also contains abstracts of the individual articles
  • Full text – a database which includes the full text of all the articles it has indexed
  • Data/statistics – a collection of numbers and facts which you can query in order to extract a particular dataset.  A database in the purest sense of the word.
  • Images – a database containing a searchable index of images and the images themselves

What does full text mean?  Full text refers to an e-resource that makes available online the whole contents of journal articles, not just the abstract or citation.  Full text articles are often subscription resources, requiring an individual or institutional account for access.

What is an abstract?  An abstract is a summary of a journal article, often published at the beginning of the article.

What is a platform? A platform is a website which hosts content or programs.  Examples include JSTOR and ISI Web of Knowledge (which hosts a number of databases including, confusingly, Web of Science).

What is SFX?  SFX is an OpenURL link resolver, which works by compiling a list of all the journals to which an institution (such as a university) is subscribed and linking to that content.  Primarily, it functions to allow you to search an institution’s subscriptions to see if you can access a particular e-journal, and which years are included in the subscription.  At Oxford University, SFX is locally branded as OU eJournals and is one of a number of resources whose contents are searchable via SOLO.
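For the curious, here is a rough sketch of the two jobs a link resolver does: checking a year against the institution's holdings, and building a link from citation details.  The resolver address and the holdings entry are invented, and the query keys (`genre`, `issn`, `date`, `volume`, `spage`) follow the older OpenURL key/value style; a real SFX installation may expect a different form.

```python
from urllib.parse import urlencode

RESOLVER_BASE = "https://resolver.example.ac.uk/sfx"  # hypothetical address
HOLDINGS = {"1234-5678": (1995, 2010)}  # made-up ISSN -> (first year, last year)


def is_covered(issn, year):
    """True if the subscription for this ISSN includes the given year."""
    if issn not in HOLDINGS:
        return False
    first, last = HOLDINGS[issn]
    return first <= year <= last


def build_openurl(issn, year, volume, start_page):
    """Describe an article by its citation details rather than a fixed address."""
    query = urlencode({
        "genre": "article",
        "issn": issn,
        "date": year,
        "volume": volume,
        "spage": start_page,
    })
    return f"{RESOLVER_BASE}?{query}"


if is_covered("1234-5678", 2003):
    print(build_openurl("1234-5678", 2003, 12, 34))
```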

What is MetaLib?  MetaLib is a search system which allows you to search for resources, link to them, and (in some cases) search within them.  This is not possible for all resources, as they need to be compliant with a protocol called Z39.50 in order to be searchable.  At Oxford University, MetaLib is locally branded as OxLIP+ and is one of a number of resources whose contents are searchable via SOLO.

What is a paywall?  A paywall is a barrier to a website which requires you to authenticate to view the content.  Usually, this requires a paid subscription.  An important implication of this is that any content behind a paywall is not indexable by search engines and therefore will not appear in the search results.  Not everything on the Internet is known to Google.

There are several methods of authentication

Internet Protocol (IP) – the IP address of your computer identifies where you are in the world, and is also used by sites like BBC iPlayer which use your IP address to check which country you are in.  If you are using the university’s computing facilities on campus, the computer you’re using will have an IP address within the university’s main range, which is detected by the e-resource you are trying to reach and access will be granted.  Working “off-campus” means that you are off the university network, perhaps using your own laptop in a university library or working from your own home.  This means that your computer’s IP address is not within the institution’s IP range and you will need a different method of access.  VPN software is commonly used to solve this issue and it works by extending the institution’s network to your computer, thereby bringing it into its IP range.

Want to find out your IP address?  Just go to whatismyipaddress.com
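The check that an e-resource makes can be sketched in a few lines.  The range below is a placeholder reserved for documentation, not any university's real range, and `is_on_campus` is a hypothetical name.

```python
from ipaddress import ip_address, ip_network

# Placeholder range reserved for documentation; a real institution registers its own.
CAMPUS_RANGES = [ip_network("192.0.2.0/24")]


def is_on_campus(client_ip):
    """True if the address falls inside any of the institution's registered ranges."""
    addr = ip_address(client_ip)
    return any(addr in network for network in CAMPUS_RANGES)


print(is_on_campus("192.0.2.17"))    # an address inside the placeholder range
print(is_on_campus("198.51.100.5"))  # an address outside it
```

A VPN changes the outcome of this check by giving your computer an address inside the registered range.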

Single sign-on (SSO) – logging in via SSO identifies you as a member of an institution (such as a university) and therefore allows you access.  A great advantage of SSO login is that your authentication can be pushed from one site to another via your browser, so you don’t have to keep logging in when you go to a different subscription site that accepts SSO authentication.

Username and password – the old school method.  Nowadays, this only really applies to a small number of really expensive resources, where tight budgets or low demand mean that a several-user subscription rather than whole-campus access has been purchased.  There may only be (for example) 5 usernames and passwords for the resource, and if all 5 are in use, you will need to wait until someone has logged out so that you can use that ID to log in afresh.

Also good to know

What is a session identifier?  Session IDs or tokens are commonly used in online shopping sites and data/statistics databases.  These types of sites combine a variety of information to produce the page you are viewing, rather than retrieving a pre-prepared HTML page.  The session ID is used to track the individual user’s actions during the course of their session on the site.  Your shopping cart contents or dataset only exists because you have selected and combined certain elements during the session, which will time out after an order is finalised, or the user logs out, or after a period of inactivity.

URLs which contain “session” or “sid” indicate a session ID, and are not persistent.  If you are attempting to link to a resource, check the URL: if it contains a session ID, the URL will not work when someone tries to follow it later on because the session will have timed out.
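If you have a lot of links to check, the rule of thumb above can be roughed out as a small function.  It looks only at parameter names, so it is a heuristic sketch rather than a reliable test, and `looks_session_bound` is a name I have made up.

```python
from urllib.parse import urlparse, parse_qsl


def looks_session_bound(url):
    """Heuristic: does the URL carry a parameter whose name suggests a session ID?"""
    parts = urlparse(url)
    # Check the query string and any ;name=value path parameters (e.g. ;jsessionid=...).
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    pairs += parse_qsl(parts.params, keep_blank_values=True)
    for name, _value in pairs:
        name = name.lower()
        if "session" in name or name == "sid":
            return True
    return False


print(looks_session_bound("https://db.example.com/results?sid=abc123&q=law"))
print(looks_session_bound("https://db.example.com/article?doi=10.1000/xyz"))
```

Matching on parameter names rather than anywhere in the address avoids false alarms on ordinary words such as "inside" that happen to contain "sid".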

Some e-resources have embargoes which are periods during which access is not allowed (usually to protect the publishers’ interests, or in JSTOR’s words “protect the economic sustainability of our content providers”).  There are several types of embargo:

  • A rolling or moving wall – a fixed period of months or years.   For example, most journals in JSTOR have an embargo of 3 or 5 years, and as a new issue is published, its equivalent from 3 or 5 years before will become available on JSTOR.
  • An annual cycle – for example, all content before 1st January of this year is available.  This will add another year to the archive on 1st January of each year
  • A fixed date – for example, only content before 2005 is available
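The three types of embargo can be sketched as three small date checks.  The function names are hypothetical, and the moving wall here is counted in whole years from the publication date, which is a simplification of how any particular publisher calculates it.

```python
from datetime import date


def available_moving_wall(published, today, wall_years):
    """Rolling wall: available once the issue is at least wall_years old."""
    # Compare as tuples so that a 29 February issue never produces an invalid date.
    release = (published.year + wall_years, published.month, published.day)
    return release <= (today.year, today.month, today.day)


def available_annual_cycle(published, today):
    """Annual cycle: everything before 1st January of the current year."""
    return published < date(today.year, 1, 1)


def available_fixed_date(published, cutoff_year):
    """Fixed date: only content before the cutoff year, whatever today's date."""
    return published < date(cutoff_year, 1, 1)


today = date(2016, 9, 1)
issue = date(2012, 3, 1)
print(available_moving_wall(issue, today, 3))
print(available_moving_wall(issue, today, 5))
```

Only the first two checks depend on today's date, which is why a fixed-date archive never grows while the other two gain content over time.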

If you’re carrying out research in your subject area, make sure you don’t rely exclusively on resources with embargoes, as you will be missing current and recent material.

E-resources and copyright – keep your use legal!

Most e-resources publishers have a ‘fair dealing’ arrangement which allows you to print or save one article per journal issue.  Downloading an article happens when you view the article on screen, not just if you save it.  Please be aware that systematic downloading is not permitted under fair dealing arrangements and may compromise your institution’s access to the resource.  Also, remember that your access to e-resources is for your own research and learning only, and you may not email pdfs or other downloaded documents to anyone outside your institution.

See also: E-Resources – less frequently asked questions for the next part of the story…