EBSCO EDS and Single Sign-On

OpenAthens Single Sign-On (SSO) is a SAML-compliant Shibboleth-type authentication method used for University login to a wide range of electronic resources.

SSO works by mediating between an identity provider (e.g. a university, checking that the user’s account is current), and a service provider (e.g. a database, to which the user’s university has a current subscription).  Here’s a diagram of the data flow:

Authentication data flow. Image credit University of Florida.

Critically, the identity provider and the service provider don’t communicate directly.  The service provider never receives the user’s personal credentials, only confirmation that their identity has been verified.

This means that when someone logs in to a database or journal platform, they are greeted by “Welcome, University of Sunderland user” or “You are logged in as University of Sunderland”, but the database or platform does not know anything further about their identity.

Why does this matter?  Service providers’ servers may be located anywhere in the world, often outside the EU.  The Data Protection Act 1998 controls how personal information is used by organisations, businesses or the government.  It requires that data controllers (organisations etc) handle personal data according to people’s data protection rights, and do not transfer it outside the European Economic Area without adequate protection.

Recently, EBSCO have started promoting the use of an enhanced version of SSO which means that a user will be authenticated into EBSCO Discovery Service (EDS) and simultaneously logged in to their personal folders.  This will sound very appealing to many EDS customers, as currently the personal folders require the user to log in (again) with their EBSCOhost account (yet another userID and password to remember).  With the standard SSO setup, this would not be possible, so I started asking questions about what additional data exchange would be needed in order for the user to be individually identified.

Email from EBSCO:

Essentially the only requirement for setting up SSO is that your shibboleth releases a persistent unique ID. However we generally recommend releasing other attributes:

Which user data attributes must be included within the IdP-generated SAML assertion?

Only a unique user ID (e.g. employee ID, organization-specific email) is required to be sent in the SAML assertion. It is recommended that First Name, Last Name and Email also be sent to better support sharing and email from within the EBSCO user interface.

At the mention of persistent unique ID, I started to wonder about the data protection law implications.
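
To make the distinction concrete, here is a rough Python sketch of the two levels of attribute release described in the email.  It is only an illustration: the function names, the dictionary keys and the salted-hash approach are my own assumptions, not EBSCO’s or OpenAthens’ actual implementation.

```python
import hashlib
import hmac


def pseudonymous_id(local_user_id: str, sp_entity_id: str, secret_salt: bytes) -> str:
    """Derive an opaque ID that is stable for one user at one service provider.

    Assumption: a keyed hash over the SP's entity ID and the local user ID.
    The same user gets a different value at each service provider.
    """
    message = f"{sp_entity_id}!{local_user_id}".encode("utf-8")
    return hmac.new(secret_salt, message, hashlib.sha256).hexdigest()


def release_attributes(user: dict, sp_entity_id: str, secret_salt: bytes,
                       include_personal: bool = False) -> dict:
    """Build the (hypothetical) set of attributes sent to a service provider."""
    # Minimum described in the email: a persistent unique ID only.
    released = {
        "persistent_id": pseudonymous_id(user["uid"], sp_entity_id, secret_salt)
    }
    # "Recommended" extras: these identify the person directly.
    if include_personal:
        for key in ("first_name", "last_name", "email"):
            released[key] = user[key]
    return released
```

Even in the minimal case the identifier is persistent, so the same person can be recognised on every visit.  Whether that counts as personal data is exactly the question I was asking.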

I followed this up with a phone call, asking about compliance with data protection law.   It seems that this query hadn’t previously arisen in the UK, though it had in Scandinavia where they are more aware of the issues.  Safe Harbo(u)r was mentioned, but I pointed out that in 2015, the European Court of Justice declared invalid the Safe Harbor data-transfer agreement that had governed EU data flows across the Atlantic for some fifteen years.  I was directed to EBSCO’s White Paper about information security, but it didn’t mention anything about data protection.

In advance of last week’s EBSCO and OpenAthens webinar “Single Sign-On to a World of Knowledge”, I repeated my enquiry to OpenAthens and received the following:

All data that is given to OpenAthens is stored here in the UK. We provide the option of mapping attributes out to various publishers however this is controlled and decided by you. The default information that is sent to authenticate the user does not hold any data that identifies the user personally.

To me, “this is controlled and decided by you” sounds very much like ducking the question.

I appreciate that decisions on the release of personal data are ultimately the responsibility of the data controller, but I am concerned that neither EBSCO nor OpenAthens seem to acknowledge the legal and ethical difficulties that this presents to libraries having to make these decisions.  I believe that if they are advocating this enhanced use of SSO, they have a moral obligation to point out the data protection implications, even if they can’t advise libraries on these matters.

I would be grateful to hear from anyone who knows more about this – please leave me a comment.  Thanks for any wisdom you can offer!

RAPTOR workshop – an adventure 65 million sessions in the making?*

*with apologies to Jurassic Park

What is RAPTOR?

RAPTOR is a JISC-funded toolkit for looking at e-resources statistics.  RAPTOR stands for Retrieval, Analysis, and Presentation Toolkit for usage of Online Resources.  It was a JISC-funded project led by the University of Cardiff – read more about the project here.  This post summarises my notes from the RAPTOR workshop in Birmingham earlier this week, delivered by Dr Rhys Smith and Dr Phil Smart of the University of Cardiff.  The first version of RAPTOR was released in 2011.

Institutions have multiple authentication systems (e.g. Shibboleth, IP), and each logs usage by username.  However, each of these logs is on a different system and in a different format, and some info is missing (e.g. usernames, departments).  Federation operators need stats to demonstrate value for money to their funders.  RAPTOR is a piece of software which allows these usage logs to be collated.
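
As a way of picturing what “collating” means, here is a small Python sketch that reads two made-up log formats into one common record type.  The formats, field names and source labels are all hypothetical; RAPTOR itself is a set of Java programs and its real parsers will look quite different.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class AuthEvent:
    when: datetime
    resource: str
    username: Optional[str]  # may be missing in some logs
    source: str              # which authentication system logged it


def parse_pipe_line(line: str) -> AuthEvent:
    # Hypothetical format: ISO timestamp|resource|username
    stamp, resource, username = line.strip().split("|")
    return AuthEvent(datetime.fromisoformat(stamp), resource, username or None, "shibboleth")


def parse_space_line(line: str) -> AuthEvent:
    # Hypothetical format: DD/MM/YYYY HH:MM:SS username resource ("-" = no username)
    date, time, username, resource = line.split()
    when = datetime.strptime(f"{date} {time}", "%d/%m/%Y %H:%M:%S")
    return AuthEvent(when, resource, None if username == "-" else username, "proxy")


def collate(pipe_lines, space_lines):
    """Merge both logs into one list of events, oldest first."""
    events = [parse_pipe_line(line) for line in pipe_lines]
    events += [parse_space_line(line) for line in space_lines]
    return sorted(events, key=lambda event: event.when)
```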

RAPTOR’s goals

  • easy to install & configure
  • not intrusive
  • web front-end for non-tech users
  • scalable
  • standards-based where possible
  • free to use
  • open source
  • community-driven

RAPTOR components

Local deployment options – image credit RAPTOR wiki

The client (ICA, information collector agent) sends info to the server (MUA, multi-unit aggregator, plus the web front-end):

This picture isn’t as good but it captures Phil doing RAPTOR hands:

Components

RAPTOR is a set of Java programs.  Each component runs on its own Jetty instance.  Components talk to each other using public/private keys and SSL handshakes.  The team is working on exposing MUAs to SAML metadata instead of keys.

Supported authentication systems

  • Shibboleth IdP
  • EZproxy
  • freeRADIUS

And soon to include OA LA (OpenAthens), OA LA proxy, simpleSAMLphp, Radiator – plus anything you can manually configure.  You can configure RAPTOR to parse any log file you like, you just need to be brave.

If you love xml

Application of RAPTOR

More information about usage, enriched with identity info, gives more business intelligence.  RAPTOR can currently pull out department and affiliation from the IdP [identity provider].  This could be extended in future to include other attributes – let the RAPTOR team know your requirements.

The data can be used to show usage of e-resources by department, or system use by affiliation (e.g. UG/PG/staff), such as PC cluster room usage.  It could also be mapped to attainment info, with the caveat that correlation is not causation.  SWITCH is the Swiss version of JANET, and the SWITCH AMAAIS [Accounting and Monitoring of AAI Services] project is doing similar things to RAPTOR.
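
The enrichment idea can be sketched in a few lines of Python.  This is my own illustration, not RAPTOR code: the directory lookup, the attribute names and the "unknown" bucket are assumptions.

```python
from collections import Counter


def usage_by(events, directory, attribute):
    """Count events grouped by an identity attribute such as department.

    `events` is a list of dicts with a "username" key; `directory` maps
    usernames to their attributes (a stand-in for what the IdP knows).
    """
    counts = Counter()
    for event in events:
        attributes = directory.get(event["username"], {})
        # Users no longer in the directory fall into an "unknown" bucket.
        counts[attributes.get(attribute, "unknown")] += 1
    return counts
```

For example, calling `usage_by(events, directory, "affiliation")` on the same events would give the UG/PG/staff split instead of the departmental one.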

RAPTOR-JUse project

The RAPTOR-JUse project aims to integrate stats from people and platforms by combining data from RAPTOR about the activity of individuals (via the IdP) and data from JUSP [Journal Usage Statistics Portal] about journal usage stats from the SP [service provider] end.

RAPTOR and JUSP have different reporting periods – RAPTOR is per event; JUSP uses defined reporting periods.  This is just one example of the issues to be overcome in this project.
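
One obvious way to line the two up is to roll the per-event data into fixed periods.  The Python sketch below assumes calendar months purely for illustration; I don’t know which periods the project will actually use.

```python
from collections import Counter
from datetime import datetime


def monthly_totals(event_times):
    """Roll individual event timestamps up into counts per calendar month.

    Assumption: the reporting period is a calendar month, keyed as "YYYY-MM".
    """
    return Counter(moment.strftime("%Y-%m") for moment in event_times)
```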

Demo of RAPTOR

The RAPTOR login page is comfortingly simple – though you can’t use federated login (for now).  The irony was acknowledged 🙂  After logging in,  you will see something like this:

Example stats

Can you spot the summer holidays trough on this graph?

Summer holiday trough

You can add postprocessors to sort rows, extract the top 10 only etc.  It’s possible to format the entity IDs with the SAML organisation name.  The team hope to develop a layer in RAPTOR to represent stats by affiliation as a proportion of the total users, not just as a raw number.
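
Here is roughly what those two ideas amount to, sketched in Python.  The function names are mine, and I am assuming that “proportion of the total users” means dividing each group’s count by the number of people in that group.

```python
def top_n(counts, n=10):
    """Sort rows by count, largest first, and keep only the first n."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]


def per_head(counts, population):
    """Express each group's count relative to the size of that group.

    Groups with no known (or zero) population are left out.
    """
    return {
        group: count / population[group]
        for group, count in counts.items()
        if population.get(group)
    }
```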

Can’t do Boolean ‘AND’s in the filter

RAPTOR data can be downloaded in .xlsx, .csv and .pdf formats.  It’s not (yet?) possible to see total combined stats for different authentication mechanisms through the web interface – the problem is caused by the different host names being owned by different publishers.  If unique IDs are brought in for publishers in future, this would then be possible.  For any users who’ve dropped out of the directory, no values will be recorded.

Data will be a lot more correct from the moment you install it and run it correctly

Installation options

  • simple – good for test deployment but won’t scale well
  • normal (one ICA on each service to monitor, MUA & web on a Raptor-server server (sic)) – good for large deployment, production use
  • completely separate (ICA, MUA, web elements all on different servers) – probably overkill for most situations

RAPTOR local deployment options in diagrammatic form:

Components

For different components to talk to each other, they need to know each other’s host name, and have encryption keys to swap.  You could have Shibboleth and IP info going to different MUAs.

What do you want from RAPTOR?

Ease of config, supported systems, look & feel, dashboard, reporting vs graphing…?  Let the team know what enhancements you would like to see!  Tell them via the RAPTOR wiki.

WUGEN and WAYFless URLs

To explain what a WAYFless URL is, it’s best to begin with explaining what a WAYF URL is.  WAYF stands for Where Are You From, and it’s a type of URL that allows you access to a service provider via single sign-on by including a step where you have to choose your institution/organisation – hence “where are you from?”.  Therefore, a WAYFless URL is one which does not ask you for your institutional affiliation, and bypassing this step makes it easier and quicker for your users to access platforms.
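
For illustration, here is what building one can look like in Python.  This sketch assumes the service provider runs the Shibboleth SP software with its login handler at the usual /Shibboleth.sso/Login path; other platforms use different patterns, and the host names below are placeholders.

```python
from urllib.parse import urlencode


def wayfless_url(sp_host: str, idp_entity_id: str, target: str) -> str:
    """Build a login link that names the institution up front.

    entityID tells the service provider which identity provider to use,
    so the "where are you from?" step is skipped; target is the page the
    user should land on after logging in.
    """
    query = urlencode({"entityID": idp_entity_id, "target": target})
    return f"https://{sp_host}/Shibboleth.sso/Login?{query}"


# Placeholder values, not real endpoints.
link = wayfless_url(
    "sp.example.com",
    "https://idp.example.ac.uk/shibboleth",
    "https://sp.example.com/journal",
)
```

Note that the platform’s host name and handler path are baked into the link, so it stops working if the platform changes either of them.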

Setting up service providers to work with your identity provider often involves building WAYFless URLs that are specific to your organisation.  However, they can be brittle and prone to breaking if the target platform changes domain name structure.

A WAYFless URL is one that takes you to an error page

And that’s where WUGEN comes in.  WUGEN [WAYFless URL Generator] is a tool for building robust WAYFless URLs.  The site leads you through a few steps and builds the URL for you.

Click on “Explain my WAYFless URL” to see a rating of the URL on the reliability thermometer:

Explanation of your WAYFless URL

Thanks Rhys and Phil for an excellent workshop 🙂  Before I left, there was time for a final RAPTOR hands moment:

RAPTOR hands

If you enjoy other forms of raptor-related humour, see Philosoraptor
