A data service, not a data portal

A data service is designed for all users, in all industries, and promises radically higher levels of reuse.


Posted by Anne Harper
June 22nd, 2017


In the late 2000s, as more and more governments around the world began to adopt open data policies, we saw the rise of what has become known as the ‘open data portal.’ These portals came in many shapes and sizes, from the familiar tabular data catalogue to the many different kinds of ‘geoportal’ (read the ‘Spatial Reserves’ blog to get a sense of the breadth of the latter).

These portals were intended to improve on previous modes of data sharing, which included public servers and — most common of all — the classic ‘ask and maybe we’ll send it to you at some point on a hard drive’ approach. These modes of sharing made it extremely difficult to find, appraise and access data; they also tended to require users to have technical expertise or individual relationships with agencies.

Ironically, all of this made the reuse of free public data an expensive undertaking. Reuse of public data distributed in this way was, to put it mildly, extremely limited.

Given this starting point, open data portals have been a great advance, and have helped a lot of people do a lot of great work.


The limits of data portals

Those who use data portals on a regular basis have always been aware of their limits. This is one of the reasons why we’ve seen such a proliferation of portal types, and why so many agencies around the world have decided — for better and worse — to build their own.

Every data portal, of course, has its own issues. To butcher Tolstoy, "effective data portals are all alike; every ineffective data portal is ineffective in its own way." Some portals lack proper API integrations; others lack a range of export formats; and others are just plain hard to use.

But there’s a growing consensus on what an effective data portal looks like. In March of this year, the peer network of urban Chief Data Officers across the United States, hosted by Harvard University, published ‘An open letter to the open data community.’

The letter is worth reading in its entirety (go on, I’ll wait). The general thrust of the letter is that the open data community — including users, publishers and technology providers — needs “to set higher goals for open data to make it more accessible and usable. Our cities’ open data portals must continue to evolve to meet the public’s growing and changing needs.”

Growing and changing needs

The letter goes on to make eight specific recommendations, which generally fall into one of three categories:

  1. Better design and experience for end users.
  2. More features and functionality, including spatial support, layering, metadata management and revision history.
  3. Better experience for publishers, including automation of processes and pricing.

Without wishing to criticise, I actually think this list doesn’t set the bar high enough. Proper metadata handling, good design, spatial support and performance on large datasets all lie firmly, from my point of view, in the category of ‘triage.’ The discipline of publishing data needs technology to catch up; it’s time for jet planes, not faster horses.

So, what have they missed? While we can always list more features, the gold standard for data publishing — the standard that we really ought to be aiming for as we talk about raising standards — is accessing all the data one needs (spatial and aspatial) in one place, with APIs and integrations into whatever software one is using.

This gold standard, coupled with the necessary improvements listed by the Chief Data Officers, effectively takes us into a different category of data distribution altogether. We could inelegantly talk about ‘the next generation of data portal,’ but that doesn’t do justice to the transformation this would represent, both for the technology we use and for the outcomes we see in the world.
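To make the gold standard above a little more concrete, here is a minimal sketch of what frictionless programmatic access could look like from a user’s point of view. Everything in it is hypothetical and invented for illustration — the `DataService` client, its `get_dataset` method and the toy catalogue are not any real product’s API. The point is simply that spatial and aspatial views of the same data arrive through one consistent interface, in the shape the user’s tools expect, with no manual download step.

```python
# Hypothetical sketch of a data-service client: one interface serves
# both tabular (aspatial) and spatial views of the same dataset.
# Backed by an in-memory catalogue so the example runs offline.
class DataService:
    def __init__(self, catalogue):
        self.catalogue = catalogue  # maps dataset id -> list of records

    def get_dataset(self, dataset_id, fmt="records"):
        """Return a dataset in the format the caller's tools expect."""
        records = self.catalogue[dataset_id]
        if fmt == "records":  # plain rows, ready for tabular tools
            return records
        if fmt == "geojson":  # spatial view of the same records
            return {
                "type": "FeatureCollection",
                "features": [
                    {
                        "type": "Feature",
                        "geometry": {
                            "type": "Point",
                            "coordinates": [r["lon"], r["lat"]],
                        },
                        "properties": {
                            k: v for k, v in r.items()
                            if k not in ("lon", "lat")
                        },
                    }
                    for r in records
                ],
            }
        raise ValueError(f"unsupported format: {fmt}")


# A toy catalogue standing in for a published open dataset.
catalogue = {
    "bus-stops": [
        {"name": "Central Station", "lon": 174.776, "lat": -41.279},
        {"name": "Courtenay Place", "lon": 174.782, "lat": -41.293},
    ]
}

service = DataService(catalogue)
rows = service.get_dataset("bus-stops")                # aspatial view
geo = service.get_dataset("bus-stops", fmt="geojson")  # spatial view
print(len(rows), geo["features"][0]["properties"]["name"])
# prints: 2 Central Station
```

The design choice being illustrated is that the user never asks “which portal, which export format, which download page?” — they name the dataset and the shape they want, and the service does the rest.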

So, let me say it outright. The era of data portals — like the eras of public servers and ‘get in touch and maybe we’ll send you some data at some point’ — is finally coming to a close.

Active data and the data service

The limitation of first-generation data portals is fundamentally one of technology. They simply introduced too much friction into the user (and publisher) experience, disincentivising reuse and, in turn, greater investment in open data by agencies themselves.

As I’ve been arguing — here, and in my earlier blogs — the only way to truly realise the potential of open data is to remove the friction experienced by users and publishers, at every point in the data lifecycle.

When a ‘portal’ has achieved this transformation, it has ceased to be a portal at all — it’s evolved into a data service. A data service is fundamentally designed and engineered for all users, in all industries, and promises radically higher levels of reuse.

The data portal has been a great stopgap, helping agencies to ‘get their data out there,’ without necessarily achieving the potential articulated by open data advocates. But the era of the open data portal is at an end. And so, happily, is the era of dormant data, those hundreds of thousands of open datasets that are available somewhere on the internet, unloved and underused.