Data Strategies in Academic Publishing

Part three of Anthony Finn's series of posts on data management, which also includes:

•    Book Metadata – an Introduction
•    Metadata, Customer Data and Title Management Systems
•    ISBNs – Vital or dying in e-publishing?

Academic publishing has at its core the responsibility to disseminate content as widely as possible. That means ensuring content is easily discoverable, regardless of content type, user requirements and desired methods of access. This is particularly important now that the nature of content is changing, with more and more texts integrating audio and visual elements. User requirements now include purchasing not only a whole book, but a subscription, or instant access to a chapter, article or chart. And of course methods of access now include tablets and phones as well as computers.

These changes put pressure on publishers’ control of their data, demanding a more detailed and nuanced approach from many publishers.

A typical publisher has data contained in a number of systems:

•    Financial system
•    CRM/Sales database
•    Authentication system
•    Fulfilment
•    Usage statistics
•    Submissions system
•    Author database

And a number of people can be entering data into these systems, including staff, authors, society members, agents in the supply chain, third-party organisations, and customers – through self-service systems. This plethora of data sources means that without some consistency, data can become impossible to manage. The aim of a data strategy is to keep data as consistent, up-to-date, and useful as possible.

For customers, this consistency is absolutely vital, since readers are now engaged in trying to accurately identify, find, and re-find the research they need to read. There are approximately 657 English-language publishers amongst the trade and professional associations, producing some 11,500 journals, or 50% of global output. The 28,100 scholarly peer-reviewed journals published as of mid-2012 produced 1.8–1.9 million articles per year, with annual growth of about 3–3.5%.

Meanwhile, the number of researchers now stands at 6–9 million, depending on definition. These numbers mean that searching for a specific article, by a specific author, throws up plenty of near duplicates. Are you searching for:

UCL:
•    University College London (UK)
•    Université Catholique de Louvain (Belgium)
•    Universidad Cristiana Latinoamericana (Ecuador)
•    University College Lillebælt (Denmark)
•    Centro Universitario Celso Lisboa (Brazil)
•    Union County Library (USA)

NPL:
•    National Physical Laboratory (UK)
•    National Physical Laboratory (India)

York Uni:
•    University of York (UK)
•    York University (Canada)

Identifiers enforce uniqueness, enabling ongoing data governance. Where a company has invested in data quality, identifiers help to protect that investment. And if the reader knows the identifier of an institution, journal or author, they can use that identifier to find the research they need more quickly.
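As a sketch of how an identifier cuts through the ambiguity above, the records below are keyed on a unique ID rather than on an abbreviation. The identifiers and record layout here are illustrative placeholders, not real ISNI or Ringgold values:

```python
# A minimal sketch of identifier-based disambiguation.
# The IDs below are placeholders, not real ISNI/Ringgold values.

ORGANISATIONS = {
    "ORG-0001": {"name": "University College London", "country": "GB"},
    "ORG-0002": {"name": "Université Catholique de Louvain", "country": "BE"},
    "ORG-0003": {"name": "Universidad Cristiana Latinoamericana", "country": "EC"},
}

# A free-text abbreviation maps to *many* candidate records...
ABBREVIATIONS = {"UCL": ["ORG-0001", "ORG-0002", "ORG-0003"]}

def candidates(abbreviation):
    """All organisations a free-text abbreviation could mean."""
    return [ORGANISATIONS[org_id] for org_id in ABBREVIATIONS.get(abbreviation, [])]

def resolve(org_id):
    """...whereas an identifier resolves to exactly one record."""
    return ORGANISATIONS[org_id]

print(len(candidates("UCL")))       # the abbreviation is ambiguous: 3 candidates
print(resolve("ORG-0001")["name"])  # the identifier is unique
```

The point of the sketch is the asymmetry: names need a list of candidates, identifiers need only a dictionary lookup.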

ORCID (The Open Researcher and Contributor ID) aims to solve author and contributor ambiguity:
“ORCID provides a persistent digital identifier that distinguishes you from every other researcher and, through integration in key research workflows such as manuscript and grant submission, supports automated linkages between you and your professional activities ensuring that your work is recognized.”
ORCID.org

It’s the word ‘persistent’ that’s particularly important, since researchers move between institutions and their identifier needs to move with them.
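The structure of an ORCID iD is itself checkable: per ORCID's documentation, the final character is an ISO 7064 MOD 11-2 check digit computed over the other fifteen digits, which lets a system catch mistyped iDs before they pollute the data. A minimal validator, using the sample iD 0000-0002-1825-0097 from ORCID's own documentation:

```python
def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check digit over the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid: str) -> bool:
    """Validate the format and check digit of an iD like 0000-0002-1825-0097."""
    digits = orcid.replace("-", "")
    if len(digits) != 16 or not digits[:15].isdigit():
        return False
    return orcid_check_digit(digits[:15]) == digits[15]

print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False: bad check digit
```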

Identification systems for organisations include:  
•    International Standard Name Identifier (ISNI)
•    Ringgold ID
•    DUNS Number (D&B) and other business and finance IDs
•    MDR PID Numbers and other marketing IDs
•    Library of Congress MARC Code List for Organizations

Purchase and Access

For academic publishers the complexity of institutional purchase can be an additional challenge. Which department at an organisation has purchased your journal? And who do they provide access to? A sale to a library consortium can mask a whole hierarchy of relationships: the institution, a physical campus, a faculty or school, a department, and institutes or groupings within a department.
Similar hierarchies can be drawn for multi-national corporates, health organisations, and governments with central, regional and local departments.
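One way to picture this is to model each customer organisation as a tree and walk upward from the purchasing unit, making visible every level a consortium sale can mask. The unit names below are invented for illustration:

```python
# A sketch of an institutional hierarchy, with invented example names.
# PARENT[child] = parent lets us walk from any unit up to the consortium.

PARENT = {
    "History Department":  "Faculty of Arts",
    "Faculty of Arts":     "City Campus",
    "City Campus":         "Example University",
    "Example University":  "Regional Library Consortium",
}

def chain(unit):
    """All levels from a unit up to the top-level purchaser."""
    levels = [unit]
    while levels[-1] in PARENT:
        levels.append(PARENT[levels[-1]])
    return levels

print(chain("History Department"))
# ['History Department', 'Faculty of Arts', 'City Campus',
#  'Example University', 'Regional Library Consortium']
```

In practice each node would carry its own organisational identifier (ISNI, Ringgold, etc.), so access rights granted at any level can be traced down to the readers who actually use them.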

Data Governance – a definition

Data governance is an emerging discipline with an evolving definition. It is really the convergence of:
•    data quality
•    data management
•    data policies
•    business process management
•    risk management surrounding the handling of data in an organization
•    auditability

Data governance is both a business issue and technical issue. It relies on collaborative working, shared ownership, shared responsibility for success or failure and considerable executive support.

Embarking on a data governance strategy involves deciding on governance and leadership, with clear roles and responsibilities to ensure accountability for the programme. Clear policies and procedures need to be in place to check data, backed up by systems and processes that secure its quality. Staff involved in data governance need to be trained so that they have the appropriate knowledge, competencies and capacity for their roles (this is essential, and where data governance can go wrong). Everyone involved then needs to focus on securing data which is accurate, valid, reliable, timely, relevant and complete. The steps in the project are usually:

•    Current data audit
     o    Audit current data sources and quality
     o    Clean (and continually review)
•    Data capture
     o    Review current data capture processes
     o    Revise, refine, rewrite process rules
•    Data integration
     o    Review current and desired integration
     o    Plan the transition from separate silos to integrated data
•    Analytics
     o    Establish how you currently use data to run your business
     o    Define data analysis objectives
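To give a flavour of the first step, here is a minimal sketch of what a data audit might profile: duplicate keys and missing required fields in a set of contact records. The record layout and sample data are invented; a real audit would read from the CRM or sales database:

```python
from collections import Counter

# Invented sample records for illustration only.
records = [
    {"id": 1, "email": "a.finn@example.org", "institution": "Example University"},
    {"id": 2, "email": "a.finn@example.org", "institution": ""},
    {"id": 3, "email": "j.doe@example.org",  "institution": "Example College"},
]

def audit(rows, key="email", required=("email", "institution")):
    """Count duplicate key values and missing required fields."""
    counts = Counter(row[key] for row in rows)
    duplicates = {k: n for k, n in counts.items() if n > 1}
    missing = {field: sum(1 for row in rows if not row.get(field))
               for field in required}
    return duplicates, missing

dupes, missing = audit(records)
print(dupes)    # {'a.finn@example.org': 2}
print(missing)  # {'email': 0, 'institution': 1}
```

Even this crude profile shows why the "clean (and continually review)" sub-step matters: duplicates and gaps re-enter with every new data source.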

As part of my work with Maverick I assist publishers with good data strategy policies. A typical data strategy project involves a number of key phases: from auditing current data quality and processes to defining goals and aspirations, creating the methodology and systems and, if required, assisting with data cleansing. It’s all about establishing the best possible basis for the publisher to move forward.

Author: Anthony Finn