At its core, ‘open research’ refers to research outputs, such as books, articles, and datasets, that have been made available free of charge and free of most legal and technical restrictions on reuse.
Advocates have been promoting the benefits of open research since the early 2000s, with university librarians and organisations like Creative Commons and SPARC leading the charge. And over the last few years, these efforts have started to bear fruit, with the passage of hundreds of policies and mandates from funders and institutions around the world.
As a result, we're seeing millions of publicly funded books, papers and datasets published for anyone to read, copy, and distribute, for free.
When researchers face mandates to publish their data, whether from funders or employers, the first question most of them ask is "how?" And it's at this point that many researchers producing geospatial data hit a brick wall.
This is because most established research data portals are ‘tabular-first’: they are built almost entirely around visualizing and distributing CSVs and documents. While these portals can work well for publishing tabular data, they are generally inadequate for publishing geospatial data.
The net result is that geospatial research data is shared far less often than tabular data—and doesn’t tend to have a comparable level of impact.
As with open geospatial data from government, geospatial research data simply won’t see the levels of reuse we expect unless it’s published in a way that makes it easy for people to find, appraise, and access. This matters, because geospatial research data, broadly defined, could have an enormous impact on our society, environment, and economy.
So, where do we start? What do researchers need to publish their geospatial data? Let’s run through some of the key features.
Geospatial data can be large, and users need to be able to appraise it—that is, check its coverage and query its attributes—without leaving their browser. This includes visualizing the data on a basemap, and ideally layering it with other relevant datasets from authoritative sources.
Users need to be able to select data from their area of interest and then export it to their format of choice. For geospatial data to have the greatest impact and reach the broadest range of users, this should include all major GIS formats, DWG for CAD users, KML for Google Earth, geospatial PDF, and CSV.
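As a rough illustration of the "select, then export" step, here is a minimal Python sketch using only the standard library. It assumes point features stored as dictionaries with WGS84 longitude and latitude; the names `FEATURES`, `select_in_bbox`, and `export_csv` are hypothetical, and a real platform would also clip lines and polygons, reproject, and write formats such as KML or DWG.

```python
import csv
import io

# Hypothetical sample records: point features with WGS84 lon/lat attributes.
FEATURES = [
    {"site": "A", "lon": 174.76, "lat": -36.85, "depth_m": 12.0},
    {"site": "B", "lon": 172.63, "lat": -43.53, "depth_m": 7.5},
    {"site": "C", "lon": 174.78, "lat": -41.29, "depth_m": 3.2},
]


def select_in_bbox(features, bbox):
    """Return the features whose point falls inside bbox.

    bbox is (min_lon, min_lat, max_lon, max_lat); edges are inclusive.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    return [
        f for f in features
        if min_lon <= f["lon"] <= max_lon and min_lat <= f["lat"] <= max_lat
    ]


def export_csv(features):
    """Serialise the selected features to CSV text, one row per feature."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["site", "lon", "lat", "depth_m"])
    writer.writeheader()
    writer.writerows(features)
    return buf.getvalue()


if __name__ == "__main__":
    # Illustrative area of interest roughly covering the lower North Island.
    aoi = (174.0, -42.0, 176.0, -40.0)
    print(export_csv(select_in_bbox(FEATURES, aoi)))
```

Only site C falls inside this area of interest, so the export contains a header row plus one data row. Keeping selection and serialisation as separate steps is what lets a platform offer many output formats over the same selection.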
Good naming and description matter. This sounds obvious, but poorly named and described data simply won’t get used. Clear titles with detailed, plain-English descriptions make it much more likely that users will access your data, as will complete, accessible metadata. For guidance on naming your data, we’ve published a guide on Koordinates Help.
For users to access data with confidence, researchers need to provide an unambiguous statement of how the data can legally be used. Some researchers choose a Creative Commons Attribution license, which lets anyone reuse the data as they see fit, provided they give credit. Others choose to effectively waive copyright altogether using a public domain statement. Whichever option you choose, make sure the license is clear to the end user.
For users to build data into their professional projects, they need to know the data will be available when they need it. High availability also helps build trust in a data source, so data platforms ought to be able to offer Service Level Agreements (SLAs) for platform availability.
Published data should be exposed through generic geospatial APIs, so that developers and other users can build new geospatial applications on top of research data. Standard web services, such as the OGC Web Feature Service (WFS), make it easier for users to access data inside their professional applications (such as ArcGIS and QGIS).
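To show what a standard web service gives developers, here is a minimal Python sketch that builds a WFS 2.0.0 GetFeature request for features inside a bounding box. The endpoint URL and layer name are hypothetical placeholders; the query parameters (SERVICE, VERSION, REQUEST, TYPENAMES, BBOX, COUNT) come from the OGC WFS 2.0 standard, but each server's GetCapabilities response should be checked for its supported versions, output formats, and axis order.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; a real publisher would document its own WFS URL.
WFS_ENDPOINT = "https://data.example.ac.nz/services/wfs"


def getfeature_url(endpoint, type_name, bbox, srs="EPSG:4326", count=100):
    """Build a WFS 2.0.0 GetFeature request restricted to a bounding box.

    bbox values are passed through unchanged, so the caller must supply
    them in the axis order the CRS defines (for EPSG:4326 under WFS 2.0
    this is usually latitude first). The CRS is appended as the fifth
    BBOX component.
    """
    params = {
        "SERVICE": "WFS",
        "VERSION": "2.0.0",
        "REQUEST": "GetFeature",
        "TYPENAMES": type_name,  # hypothetical layer name, e.g. "layer-12345"
        "BBOX": ",".join(str(v) for v in bbox) + "," + srs,
        "COUNT": count,  # cap the number of features returned
    }
    # urlencode percent-encodes commas and colons, which servers decode.
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    print(getfeature_url(WFS_ENDPOINT, "layer-12345", (-42.0, 174.0, -40.0, 176.0)))
```

Desktop GIS clients such as QGIS construct requests like this behind the scenes, which is why a published WFS endpoint lets users pull research data straight into the tools they already use.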
When publishing geospatial data, it’s important to ensure that your users recognise you as an authoritative source. The standard way to do this is to publish data to a branded data service on your institutional or departmental domain.
To enable more effective citations, datasets should have their own Digital Object Identifier (or DOI), which can be used as a permanent identifier for your research data.
Small vector datasets can be hugely valuable, but opaque ‘enterprise’ pricing structures can make distributing them unaffordable. This can push researchers toward platforms that aren’t fit for purpose for geospatial data. That’s why the most sustainable way to publish geospatial research data is to pay by volume, that is, to pay only for what you use.
The global research community is rapidly moving in a more open direction—and it's doing so for good reason. Around the world, research institutions produce an enormous amount of valuable geospatial research data, but most of this data is never published or shared.
As an individual researcher, it can be difficult to see the value of your research data beyond the goals of your specific project. But beyond mandates from funders or employers, there are powerful reasons why publishing and sharing data is important. They include:
Reproducibility. Without sharing your research data, the scientific process—which fundamentally depends on allowing others to test and reproduce your results—is weakened.
Reputation. Funders and institutions are increasingly recognizing the value of a broader range of activities, which means that data citations and evidence of data reuse may be used to advance your career.
Efficiency. Without effective distribution of research data, the same data has to be collected again and again at great expense, and everyone starts from scratch.
Impact. While this is fourth on the list, we think this is the most important. Research data can be useful across a range of contexts, from government agencies developing policy, to industry developing new products and services, to NGOs developing new solutions for their communities and their environment.
Publishing your geospatial research data, then, is important. But these potential outcomes depend on your data being accessible to the people who need it. This is why more researchers with geospatial data are seeking ‘geospatial-first’ solutions that have been designed and engineered to cater to the widest possible range of geospatial data users.