In the late 2000s, as more and more governments around the world began to adopt open data policies, we saw the rise of what has become known as the ‘open data portal.’ These portals came in many shapes and sizes, from the familiar tabular data catalog to the many different kinds of ‘geoportal.’
These portals were intended to improve on previous modes of data sharing, which included public servers and, most common of all, the ‘ask and maybe we’ll send it to you at some point on a hard drive’ approach. These modes of sharing made it extremely difficult to find, appraise and access data; they also tended to require users to have technical expertise or personal relationships with staff at government agencies.
Ironically, these approaches made the reuse of free and open public data an expensive undertaking. Given this starting point, open data portals have been a great advance, and have helped a lot of people do a lot of great work.
So, what’s the problem? The goal of the open data movement was never just ‘release more data.’ The actual goal was to see a sharp increase in high-value and high-impact reuse of authoritative data, across civil society, industry and government itself.
And getting data used is hard. While the open data project has been successful in getting more open data onto the internet, it has struggled to generate the levels of use required to realise its initial vision (we’ll talk about why this is the case later on).
This is not specific to any one country. Every open government data project in the world has found it difficult to bridge the gap between ‘getting the data out there’ and ‘getting it used.’ While there are some great open data case studies, they remain relatively isolated. It’s fair to say that we haven’t yet realised the full, transformative potential of open government data.
This is particularly the case with geospatial data, which has the greatest potential to transform our society and economy, but also faces the greatest barriers to high-value use.
Geospatial professionals know these barriers all too well, particularly the low-level data wrangling tasks, such as sourcing and translating spatial datasets published across a range of government websites. These rudimentary, somewhat painful tasks have, unfortunately, become part of the job.
But what about those who lack geospatial training and software? What about the multitude of architects, engineers, analysts, designers, draftsmen, researchers and more—all those involved in the projects that determine how we understand and shape our planet?
For these folks, the process of accessing and using data can be truly daunting. Without specialist software and training, simply finding, downloading and appraising large, complex geospatial datasets from a variety of sources, and translating them into the formats they need (such as DWG or geospatial PDF), is nearly impossible.
For the open government data project to realise its potential, we need to find a way to reduce these barriers. Otherwise, industry, government and civil society will continue to bear the heavy transaction costs of accessing and using authoritative geospatial data.
Those who use data portals on a regular basis have always been aware of their limitations. This is one of the reasons we’ve seen such a proliferation of portal types, and why so many agencies around the world have decided, for better and worse, to build their own.
Every data portal, of course, has its own issues. To butcher Tolstoy, "effective data portals are all alike; every ineffective data portal is ineffective in its own way." Some portals lack proper API integrations; others lack a range of export formats; and others are just plain hard to use.
But there’s a growing consensus on what an effective data portal looks like. The improvements generally fall into three categories:
Better design and experience for end users.
More features and functionality, including spatial support, layering, metadata management and revision history.
Better experience for publishers, including automation of processes and transparent pricing.
The limitation of first-generation data portals is fundamentally one of technology. They simply introduced too much friction into the user (and publisher) experience, disincentivising reuse and, in turn, disincentivising greater investment in open data by the agencies themselves.
So, if that’s the problem, what does the solution look like?
It’s no secret that a well-designed process or product should start with the user: how they work, how they think, where they carry out their tasks, and more. By taking this approach, it’s possible to dramatically reduce the barriers faced by new users of your process or product. With this in mind, the workflow of most data users can be broken down into four general steps.
First, users tend to search across a wide range of data sources, validating as they go that the data is from an authoritative source, such as a government agency or reputable vendor.
Second, users need to know what they’re getting before they can use the data with any confidence. In the spatial world, this usually involves previewing the dataset by itself (including interrogating the data to ensure it has the required features and content) and layering it with other relevant datasets to see how the combined output looks.
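To make that appraisal step concrete, here’s a minimal sketch in Python, assuming GeoPandas (with matplotlib) is available; the file names for the downloaded layers are hypothetical.

```python
# A minimal appraisal sketch, assuming GeoPandas and matplotlib are installed.
# 'parcels.gpkg' and 'flood_zones.gpkg' are hypothetical downloaded layers.
import geopandas as gpd
import matplotlib.pyplot as plt

parcels = gpd.read_file("parcels.gpkg")
flood_zones = gpd.read_file("flood_zones.gpkg")

# Interrogate the data: attributes, coordinate reference system, first records.
print(parcels.columns.tolist())
print(parcels.crs)
print(parcels.head())

# Layer the datasets together to see how the combined output looks.
ax = parcels.plot(color="lightgrey", edgecolor="black", figsize=(8, 8))
flood_zones.to_crs(parcels.crs).plot(ax=ax, color="steelblue", alpha=0.4)
plt.show()
```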
Third, geospatial datasets can be truly massive and are often complex, so users tend to ‘crop’ multiple data layers to their area of interest. And because the applications of spatial data are very broad, the file formats required are also broad, ranging from established GIS formats like SHP to non-GIS formats like DWG and geospatial PDF.
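As a rough sketch of this crop-and-export step, the snippet below clips a hypothetical roads layer to a bounding box and repackages it into a few common GIS formats (CAD formats like DWG generally require separate conversion tools, so they’re left out here).

```python
# A rough crop-and-export sketch; GeoPandas assumed, file names and bounding box hypothetical.
import geopandas as gpd
from shapely.geometry import box

roads = gpd.read_file("roads.gpkg")

# Crop to an area of interest: a simple bounding box expressed in the layer's CRS.
area_of_interest = box(174.70, -36.90, 174.80, -36.80)
roads_aoi = gpd.clip(roads, area_of_interest)

# Repackage into the formats different users need.
roads_aoi.to_file("roads_aoi.shp")                        # Shapefile for legacy GIS tools
roads_aoi.to_file("roads_aoi.gpkg", driver="GPKG")        # GeoPackage
roads_aoi.to_file("roads_aoi.geojson", driver="GeoJSON")  # GeoJSON for the web
```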
Finally, more technical users will often want to connect to the data via APIs and, for geospatial professionals, OGC-compliant WFS and WMTS web services. This streamlines access and enables the creation of new products and services.
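For this last step, the sketch below shows roughly what a standard OGC WFS 2.0 GetFeature request looks like over plain HTTP; the endpoint URL and layer name are placeholders, and supported output formats vary from server to server.

```python
# A sketch of fetching features from an OGC WFS 2.0 endpoint with plain HTTP.
# The endpoint URL and layer name are placeholders, not a real service.
import requests

WFS_ENDPOINT = "https://data.example.govt/services/wfs"

params = {
    "service": "WFS",
    "version": "2.0.0",
    "request": "GetFeature",
    "typeNames": "agency:building-footprints",        # hypothetical layer name
    "bbox": "174.70,-36.90,174.80,-36.80,EPSG:4326",  # area of interest
    "outputFormat": "application/json",               # support varies by server
    "count": 1000,                                    # page size, where paging is supported
}

response = requests.get(WFS_ENDPOINT, params=params, timeout=60)
response.raise_for_status()
print(f"Fetched {len(response.json()['features'])} features")
```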
Taken together, this workflow enables all users to get the data they need, without leaving their browser, relying on intermediaries, or subscribing to expensive and complicated software.
To date, open data products and processes haven’t usually considered the way data users actually work, which has placed an unnecessary burden on users. But the same is true for data publishers. While data publishing workflows are highly variable, depending on the size and nature of each organisation, we can identify a range of common pain-points and bottlenecks.
Agencies need to be able to add data directly from their internal data sources, and automate regular updates via API. They also need data to be cleaned and verified on import, to ensure users don’t experience errors.
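What that automation looks like varies by platform, but the general pattern is a scheduled job that pushes the latest extract from an internal system to the portal’s publishing API. The sketch below is purely illustrative: the endpoint, token and payload are hypothetical, not any particular portal’s API.

```python
# An illustrative scheduled update: push a fresh internal extract to a portal's
# publishing API. The endpoint, token and file path are hypothetical placeholders.
import os
import requests

PORTAL_API = "https://portal.example.govt/api/datasets/building-consents/versions"
API_TOKEN = os.environ["PORTAL_API_TOKEN"]  # keep credentials out of the script

def publish_latest_extract(extract_path: str) -> None:
    """Upload a new dataset version produced by a nightly internal export."""
    with open(extract_path, "rb") as f:
        response = requests.post(
            PORTAL_API,
            headers={"Authorization": f"Bearer {API_TOKEN}"},
            files={"data": f},
            timeout=300,
        )
    response.raise_for_status()  # surfaces validation errors from import checks
    print("New version queued:", response.json())

if __name__ == "__main__":
    publish_latest_extract("/exports/building_consents_latest.gpkg")
```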
At the platform level, agencies need their data automatically SEO-optimised, and stored in such a way that it can be previewed and repackaged on the fly for data users.
Agencies need their own branded site, with built-in licensing, metadata and granular access controls for private sharing of non-open data. They also need automatic APIs attached to every dataset.
Before publishing more open data, agencies need rich analytics to understand user behaviour and make informed investment decisions.
From what we’ve seen, there is now an opportunity to radically reduce the friction of both publishing and accessing data. By connecting the workflows of data publishers and data users, a connected data lifecycle is formed.
From the user perspective, the lifecycle enables users to access a wider range of authoritative data, published to a standard that supports and enables reuse.
For publishers, this approach makes it easier to streamline data publishing, and provides greater, more granular analytics on reuse. Rich data on user behaviour enables publishers to make better decisions on their data investment.
This harmonious feedback loop sits at the heart of a better experience for both sides, and radically increases the return on investment of public data. It also allows us to finally deliver on the big promise of open data: data that is actively used to inform the high-value projects that shape our social, environmental and economic landscape.
The connected data lifecycle shows how agencies can raise the bar and publish their open data better. This radically increases data reuse and allows everyone to derive more value from open government data.