Solving the problem of dormant data
We now have the tools — policy, legal, cultural and technical — to solve the problem of data that is open but underused.
Towards the end of March, I travelled around New Zealand speaking at the regional Esri User Conferences. Organised by local GIS professionals, these conferences were a fantastic way to learn more about some great projects, and get a glimpse of the future of the geospatial industry.
In my talk, I shared the view that it is now possible for agencies to radically increase the reuse of their open data. In the last few years, we’ve seen increasingly sophisticated browser technology, better internet connectivity and strong cultural change within agencies in favour of open data. In parallel, we’ve also seen the rise of data platforms designed and engineered to maximise data reuse.
All this to say that now, more than ever, it’s possible to realise the potential of open data. But what is this potential?
The transformative potential of open data
Since its infancy in the late 2000s, the vision provided by the open data movement has been consistent: open data has the potential to transform our environment, economy and society. This sounds a bit vague and grandiose — I can imagine the slow upwards eye-roll of some readers — so let’s take a moment to break down what this actually means.
The first and most obvious point is that removing the friction users experience in trying to find and access data can save companies, government agencies, NGOs and citizens (anyone working to shape our planet) a heap of time and money. Geospatial data is critical to thousands of projects undertaken and decisions made around the country. By making this data open, we can reduce risk, make better decisions and get better outcomes.
The second point is a little more exciting: open data, published in the right way, can kickstart projects that might never have otherwise existed. For many, closed data is invisible data; if you don’t know data exists, you obviously can’t do anything useful with it. This sounds — and is — a bit more ‘hand-wavey’, though it’s undeniable that open data has already enabled some exciting new projects.
I don’t think it’s an exaggeration to say that, taken together, these represent potentially transformative benefits for our economy, society and environment.
But we’re not there yet.
Open but dormant data
We’ve come a long way in the last decade. Where data was once ‘closed by default’, more and more users now expect, and sometimes demand, that government agencies release their data openly. Agencies, in turn, are increasingly embracing openness. Public data is slowly becoming open by default.
The problem here is that open data doesn’t automatically translate into those predicted social, environmental and economic benefits — those don't kick in until data is actually used. Despite the steady increase in data releases, we haven’t yet seen the sort of transformation predicted by the open data movement. Too much open data is woefully underused, especially in relation to the resources invested in its production, and its underlying potential value to countless projects. Too often, open data has become dormant data.
What is dormant data? Dormant data is simply data that isn’t being used to the extent it could be, or at all, because it’s hard to find and access. It is usually characterised by one or more of the following limitations:
- Data requires technical expertise to access. For example, geospatial data published directly to a server, or accessible only via an API with a difficult-to-use or highly technical interface.
- Data is view-only. For example, data published to a GIS viewer.
- Data requires proprietary software to access, or is available in a limited number of formats and projections.
- Data is not available via API or web services.
- Data cannot be previewed or appraised before export.
- Data is not easily discoverable either within the site or via Google.
- Data has confusing or contradictory licensing statements. High-value users of data require clear and open licensing to commit to reuse, especially in commercial projects.
This is a long list, and it shows just how much we need to think about when publishing open data. But the good news for agencies is that we now have the tools — policy, legal, cultural and technical — to solve the problem of dormant data.
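For readers who like to see things concretely, the limitations listed above can be turned into a rough checklist. The sketch below is purely illustrative: the `DatasetListing` class, its field names and the `dormancy_flags` function are all hypothetical, invented here to mirror the list, and do not come from any particular data platform.

```python
from dataclasses import dataclass


@dataclass
class DatasetListing:
    """Hypothetical description of how a dataset is published."""
    name: str
    needs_technical_expertise: bool = False
    view_only: bool = False
    needs_proprietary_software: bool = False
    limited_formats_or_projections: bool = False
    has_api_or_web_services: bool = True
    can_preview_before_export: bool = True
    easily_discoverable: bool = True
    clear_open_licence: bool = True


def dormancy_flags(listing: DatasetListing) -> list[str]:
    """Return the limitations from the list above that apply to a listing."""
    checks = [
        (listing.needs_technical_expertise, "requires technical expertise to access"),
        (listing.view_only, "view-only"),
        (listing.needs_proprietary_software, "requires proprietary software"),
        (listing.limited_formats_or_projections, "limited formats or projections"),
        (not listing.has_api_or_web_services, "no API or web services"),
        (not listing.can_preview_before_export, "cannot be previewed before export"),
        (not listing.easily_discoverable, "hard to discover"),
        (not listing.clear_open_licence, "confusing or contradictory licensing"),
    ]
    return [label for applies, label in checks if applies]


# Example: a made-up dataset that is only published to a GIS viewer.
example = DatasetListing(
    name="Example parcels layer",
    view_only=True,
    has_api_or_web_services=False,
)
for flag in dormancy_flags(example):
    print(f"{example.name}: {flag}")
```

A dataset that returns an empty list here is not guaranteed to be well used, of course; the checklist only captures the barriers named above.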
The fact that high-value open data is dormant and under-utilised isn’t a result of any one issue. As the list above reveals, data publishing is difficult, and it’s only in recent years that the desire has been there and the tools have evolved to the stage where it’s possible to make it genuinely easy for users to access the data they need. But now we have a unique opportunity — to transform dormant data into active data.
I’ve touched on the impact of getting data used in my earlier blog post on the connected data lifecycle. Central to realising the true potential of open data is the belief that increased usage drives publishers to invest further in their data assets. By giving publishers insight into how their data is being used, the feedback loop directly informs which datasets need to be updated, improved or even created in the first place. While this is still relatively new ground, the first step is to create the all-important two-way connection between publisher and user.
As I say, we’re not there yet. But we’re making progress. In my next post, I’ll explain the concept of active data in more detail, and flesh out the vision for the future of spatial data.