Let’s start with the good news. Over the last decade, we’ve seen a quiet revolution in the way government agencies share their data. Following the many open data mandates of the late 2000s, thousands of agencies — from small counties, councils, and cities to state and federal government — have started to publish their data on the internet.
This is reason enough to celebrate. By making data openly available, agencies can increase transparency, improve public services, and enable new products, research, and analysis built on their data.
As has been predicted for the last decade, the potential impact of open government data is enormous. With users ranging from engineering and energy to policy and planning, open data has the potential to reshape our economy, society, and environment.
This is a great development. But now that we’re seeing greater publication of open data, it’s time to take a more objective look at how open data has been implemented, and ask: is this the best we can do?
MIT professor César Hidalgo doesn’t think so. Writing in Scientific American in 2016, Hidalgo argued that the current landscape of open data was akin to a “nightmarish supermarket.”
As he wrote at the time, “the consensus among those who have participated in the creation of open data sites is that current efforts have failed. [...] The design of most open data sites follows a throwing-spaghetti-against-the-wall strategy, where opening more data, instead of opening data better, has been the driving force.”
And it seems like key technical staff at government agencies agree. In an open letter at the end of 2016, a network of urban Chief Data Officers based at Harvard University argued for the need to “set higher goals for open data to make it more accessible and usable.”
Both Professor Hidalgo and the Chief Data Officers argue that the major weakness in the open data landscape is its technical implementation — specifically, the capabilities and features of open data portals themselves.
In our opinion, they are right to do so. With legal and policy obstacles largely overcome, and the culture of government agencies gradually becoming more 'open-friendly,' the limited functionality and design of open data portals is the most significant obstacle to realizing the potential of open data.
But it is also the most exciting opportunity. Most open data published today is hugely underused, especially when compared to the enormous investment put into data production. Most government agencies haven’t seen the true potential impact of their data. This is starting to change, as some of the most complex technical challenges around the publication and use of data (especially geospatial data) are being solved.
Let’s dig into specifics. What do we need from our open data portals?
While it may or may not be true that “80% of data is geospatial,” it is certainly true that many of the most valuable datasets from government agencies are geospatial. This includes data on property, contours, transport, planning, aerial photography — the list goes on. From a technical perspective, geospatial data, with its size, complexity, variety of formats and multitude of projections, is much more difficult to handle than tabular CSVs. But that’s no longer an excuse. For open government data to have its desired impact, agencies need to treat geospatial data as their highest priority.
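The projection problem alone illustrates why geospatial data is harder to handle than tabular CSVs. As a minimal, stdlib-only sketch (not any portal’s actual code), here is the standard forward transform from WGS 84 coordinates (EPSG:4326) to Web Mercator (EPSG:3857), the projection most web maps use — one of many reprojections a portal must perform behind the scenes:

```python
import math

# WGS 84 semi-major axis in metres, used as the sphere radius
# in the spherical (Web) Mercator projection, EPSG:3857.
EARTH_RADIUS = 6378137.0

def wgs84_to_web_mercator(lon: float, lat: float) -> tuple[float, float]:
    """Project a lon/lat pair (degrees, EPSG:4326) to Web Mercator metres."""
    x = math.radians(lon) * EARTH_RADIUS
    y = math.log(math.tan(math.pi / 4 + math.radians(lat) / 2)) * EARTH_RADIUS
    return x, y

# Example: a point on the Greenwich meridian maps to x = 0.
print(wgs84_to_web_mercator(0.0, 51.4779))
```

Multiply this by the hundreds of projections in real-world use, and the scale of the problem — and the value of a portal that solves it once, for everyone — becomes clear.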
Formats and standards might seem (and arguably are) quite dry. But it’s essential agencies don’t underestimate their importance. In our experience with our flagship data service, Koordinates.com, ‘formats’ usually equates to ‘groups of users.’ This means that the more formats and access channels you support from your portal, the more users you’ll have.
Generally speaking, non-GIS users of geospatial data — including designers, architects, engineers and the public at large — have been underserved by existing open data portals, which tend to focus on formats for GIS software. To ensure that users can get data into their software of choice, it’s essential that your data portal supports automatic translation into a wide variety of formats.
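To make the idea of ‘automatic translation’ concrete, here is a deliberately toy sketch — using only the Python standard library, with invented sample data — of the simplest possible conversion: flattening GeoJSON point features into a CSV a spreadsheet user can open. A real portal automates this across many geometry types, projections, and formats (Shapefile, GeoPackage, KML, and so on):

```python
import csv
import io
import json

def geojson_points_to_csv(geojson_str: str) -> str:
    """Flatten a GeoJSON FeatureCollection of point features into CSV text.

    A toy illustration only: production translation tools handle far more
    than points, and far more formats than CSV.
    """
    features = json.loads(geojson_str)["features"]
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["name", "lon", "lat"])
    for f in features:
        lon, lat = f["geometry"]["coordinates"]
        writer.writerow([f["properties"].get("name", ""), lon, lat])
    return out.getvalue()

# Hypothetical single-feature dataset for demonstration.
sample = json.dumps({
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [174.78, -41.29]},
        "properties": {"name": "Wellington"},
    }],
})
print(geojson_points_to_csv(sample))
```

The point of the sketch: each supported output format is a door into the data for a different group of users, and the portal — not the user — should be the one doing the translation.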
For the first generation of open data portals, users had low expectations. The demand was usually for governments to simply ‘get it out there’ and put their data on the internet. This led to a range of experiments, from the overly technical (public servers) to the overly primitive (aspatial catalogs).
The upshot was a multitude of portals, catalogs, and servers — the ‘nightmarish supermarket’ Hidalgo refers to, above — which often had little to no involvement from professional designers. This, in turn, made it difficult for a wide range of users to access the data they need. These days, there’s no reason why data portals cannot be both feature-rich and user-friendly (for both data publishers and end-users).
When you’re costing out your options for data publishing, it’s obviously important to include staff time. And part of the cost here will depend on the complexity of getting data from your internal systems onto your portal of choice. Depending on your technical capacity, you’ll want a range of options, whether it’s manual upload, automatic import from a public server, or a more complex import directly from your internal systems.
Either way, it needs to be easy — otherwise your internal costs of publishing data (again, primarily staff time) may get in the way of open data becoming business-as-usual.
Beyond cost, it’s important to keep the broader aim of open data publishing in mind: to ensure that the data on your public portal and the data in your internal systems is — as much as possible — the same. This ensures that your users, who may be depending on your data for their mission-critical projects, have the most authoritative version.
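One common way to keep the public copy in step with internal systems is checksum-based change detection: a scheduled job fingerprints each internal dataset and republishes only those that have changed. The sketch below is a hypothetical, stdlib-only illustration of that pattern, not any particular portal’s sync mechanism:

```python
import hashlib
from pathlib import Path

def file_checksum(path: Path) -> str:
    """SHA-256 of a file's contents, read in chunks to handle large datasets."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def needs_republish(internal_dataset: Path, last_published_checksum: str) -> bool:
    """True if the internal dataset has changed since it was last published."""
    return file_checksum(internal_dataset) != last_published_checksum
```

A nightly job that calls `needs_republish` for each dataset and pushes only the changed ones keeps staff time out of the loop — which is exactly what makes open data business-as-usual rather than a chore.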
APIs, on their own, are not enough to realize the potential of open data. Because their potential user-base is relatively small, an ‘API-only’ strategy is bound to exclude the vast majority of users. But APIs are still crucial, as they enable developers to build new products and services against your data, which can potentially have an enormous impact. The best solution for an agency is to have APIs (including OGC web and tile services for geospatial data) automatically provided for each and every imported dataset.
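Tile services are a good example of why automatic APIs matter: a developer never downloads the whole dataset, but requests small rendered tiles addressed by zoom/x/y indices. The standard Web Mercator ‘slippy map’ tiling scheme used by XYZ- and WMTS-style services can be sketched in stdlib Python (the URL template below is illustrative only, not a real endpoint):

```python
import math

def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Return the (x, y) indices of the map tile containing a point,
    using the standard Web Mercator 'slippy map' tiling scheme."""
    n = 2 ** zoom  # the world is a 2^zoom by 2^zoom grid of tiles
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y

# A client then requests the tile from an XYZ endpoint
# (hypothetical URL template):
x, y = lonlat_to_tile(174.78, -41.29, 12)
print(f"https://tiles.example.org/dataset/12/{x}/{y}.png")
```

When the portal generates these endpoints automatically for every imported dataset, every publication is also, immediately, a building block for developers.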
This may seem like an obvious point to make in 2017, but it’s worth reiterating: you need your data portal to be hosted in the cloud. This enables you to easily scale your data publishing efforts; it also brings the usual benefits of other Software-as-a-Service products, such as rapid and automatic feature improvements. Cloud portals are also disaster redundant — essential when the data you host is mission-critical for emergency response (as government data so often is).
Every organization that invests in open data at scale will need to make a business case, and a fundamental part of that business case is reporting on the Return-on-Investment (ROI). For open data, analytics are fundamental to any ROI calculation, especially if they allow for analysis of sector-specific usage.
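At its core, sector-specific usage analysis is an aggregation over access logs. A toy sketch — the field names and dataset titles here are invented for illustration — shows the shape of the question an ROI report needs to answer:

```python
from collections import Counter

# Hypothetical export from a portal's access log: one record per download.
downloads = [
    {"dataset": "property-titles", "sector": "engineering"},
    {"dataset": "property-titles", "sector": "planning"},
    {"dataset": "contours-1m",     "sector": "engineering"},
    {"dataset": "property-titles", "sector": "engineering"},
]

by_sector = Counter(d["sector"] for d in downloads)    # who is using the data?
by_dataset = Counter(d["dataset"] for d in downloads)  # which datasets matter most?

print(by_sector.most_common())
print(by_dataset.most_common())
```

Counts like these, rolled up by sector and dataset, are the raw material for demonstrating that investment in data production is actually paying off.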
Not all data can be made open. But many agencies need to collaborate privately — whether it be with colleagues across the same organization, or in industry, civil society or other government agencies. You need to be able to leverage your data portal to share privately with individuals or groups, so authorized users can access (and layer, combine, and export) all your data in one place.
Open data becomes truly transformative when organizations can build data access into their business processes. But for this to happen, they need to know the data is authoritative, up-to-date, and published on a reliable platform. That means SLA-backed reliability and uptime for your data portal. Without this, you won’t be able to guarantee professional reliability to your users, and they won’t be able to depend on your data in quite the same way.
Too often, open data publishing is seen as an all-or-nothing undertaking, whereby agencies have to commit to enterprise level agreements and sales processes before they can publish their data. This can be a significant blocker for small agencies or teams that are looking to trial a data portal before committing to larger levels of publication. Moreover, to make a proper ROI calculation, agencies need to know they’re being charged only for what they use — and nothing else.
It’s an exciting time to be working with government data. As the policy and legal frameworks grow more embedded, and as the culture of government agencies towards ‘open’ shifts, the technology is finally catching up. The good news for agencies looking to publish open data is that the first generation of data portals is giving way to something new.
This ‘something new’ is more technically sophisticated, but also expertly designed and easy to use, and capable of reaching a much wider range of users than earlier portals. While we still have work to do, it’s looking like we’ll come a whole lot closer to realizing the potential of open data in the decade ahead.