How to geospatialise your CSVs

It's now possible to geospatialise your tabular data, leading to much higher levels of data reuse.

The limits of CSVs

In the geospatial industry, people sometimes like to claim that most data is inherently geospatial. What they mean is this: much of the data that organisations create (across government, industry, and civil society) has a spatial component or ‘place’ to which the data relates. This place could be a region or city, or just a point on the map.

Despite this, most data published on the internet isn't accessible in any common geospatial format, let alone with any geospatial previewing and API functionality. In fact, most data published and shared online is ‘tabular,’ and available only as CSV.

CSV is a great format for tabular data; but it remains, well, tabular. It can’t be previewed on a map; it can’t be trivially clipped to a geographic extent; it can’t be spatially queried; it can’t be layered alongside other geospatial layers; and it can’t easily be exported into popular geospatial formats, let alone formats for users of CAD or design software or Google Earth.

The upshot of this is that tabular-only data doesn’t reach the same number of users or have the same level of impact as geospatial data. Geospatial data is often easier for non-specialists to appraise and query, and leads to much higher levels of reuse.

It’s too expensive to manually geospatialise your CSVs

This problem is becoming more widely recognised. As it becomes easier to publish and distribute geospatial data, organisations with tabular data (that is, non-geospatial data commonly accessed as CSV or spreadsheets) are looking to create new geospatial layers to distribute online.

The potential here is enormous. Government statistics agencies, for example, have thousands of tabular datasets with a spatial component, but these are only rarely made available as geospatial data. There is a clear opportunity for organisations to derive more value from their tabular data assets, leading to more reuse and greater impact.

However, until recently, creating geospatial data from existing tabular data was a technically complex process. And the complexity of this task has led most organisations to stick to the status quo, and publish their data as CSV-only.

Introducing Derived Data

The good news is that this has now changed, and new technologies make it possible to virtually ‘geospatialise’ tabular data at the point of distribution. In Koordinates, for example, it is now possible to create a new, ‘derived’ geospatial layer from existing tabular data—without performing any external geoprocessing.

It works like this: After importing your tabular data, Koordinates can connect it to a designated geospatial vector layer. This creates a new, ‘virtual’ layer that has all the functionality of any other published geospatial layer. It can be appraised on the map, cropped to a specified area, exported to the user’s choice of format and projection, and accessed via APIs and web services.
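To make the idea concrete, here is a minimal sketch of the underlying concept: an attribute join between tabular rows and the features of a vector layer. This is an illustration only, not how Koordinates implements Derived Data. The column name `area_code`, the sample values, and the function `derive_layer` are all hypothetical.

```python
import csv
import io

# Hypothetical tabular data: one row per urban area, keyed by an area code.
CSV_TEXT = """area_code,cafe_count
UA01,120
UA02,45
"""

# Hypothetical reference layer, written as GeoJSON-style features.
# The square polygons are placeholder geometry, not real boundaries.
REFERENCE_FEATURES = [
    {
        "type": "Feature",
        "properties": {"area_code": "UA01", "name": "Area One"},
        "geometry": {
            "type": "Polygon",
            "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]],
        },
    },
    {
        "type": "Feature",
        "properties": {"area_code": "UA02", "name": "Area Two"},
        "geometry": {
            "type": "Polygon",
            "coordinates": [[[1, 0], [2, 0], [2, 1], [1, 1], [1, 0]]],
        },
    },
]


def derive_layer(csv_text, features, key):
    """Attach each CSV row to the feature that shares its key value."""
    rows = {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}
    derived = []
    for feature in features:
        row = rows.get(feature["properties"][key])
        if row is None:
            continue  # no tabular record for this feature, so leave it out
        derived.append(
            {
                "type": "Feature",
                "geometry": feature["geometry"],  # geometry is reused as-is
                "properties": {**feature["properties"], **row},
            }
        )
    return {"type": "FeatureCollection", "features": derived}


derived = derive_layer(CSV_TEXT, REFERENCE_FEATURES, "area_code")
print(len(derived["features"]), "features in the derived layer")
```

The geometry comes entirely from the reference layer and the attributes come from the table, which is why no separate geoprocessing step is needed. Note that values read from a CSV arrive as text, so numeric columns would need converting before any analysis.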

Let’s work through an entirely fictional example. Simone has produced a tabular dataset listing all the cafes in New Zealand, and would like to publish it as a geospatial layer on her Data Service. She calls it the CafeFinder, and believes it will be a very popular dataset.

Simone knows that Stats NZ have published the Urban Area Boundaries layer on their Stats NZ Geographic Data Service, so she exports that as a shapefile, and then imports it—along with her CafeFinder CSV—to her own Data Service.

Simone then navigates to the Urban Area Boundaries Layer and selects ‘Derive New Item.’ Following the steps in the user interface, she connects her CafeFinder dataset—and, within minutes, she has published a new CafeFinder vector dataset.

Geospatialise your CSVs!

Want to try this yourself? The first thing to do, after you’ve signed up to your Koordinates Data Service, is make sure your tabular data contains an attribute column that can be matched to an attribute column in your geospatial reference layer.
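Before importing, it can be worth checking how well the two columns actually line up. The sketch below is one hedged way to do that locally with Python's standard library; it is not part of Koordinates. The function `match_report`, the column name `area_code`, and the sample IDs are hypothetical.

```python
import csv
import io


def match_report(csv_text, csv_key, reference_ids):
    """Compare the key values in a CSV with the IDs in a reference layer."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Strip stray whitespace, a common cause of failed matches.
    table_ids = {row[csv_key].strip() for row in reader}
    reference_ids = set(reference_ids)
    return {
        "matched": sorted(table_ids & reference_ids),
        "only_in_table": sorted(table_ids - reference_ids),
        "only_in_reference": sorted(reference_ids - table_ids),
    }


# Hypothetical sample: UA99 has no counterpart in the reference layer.
sample_csv = "area_code,cafe_count\nUA01,120\nUA99,3\n"
report = match_report(sample_csv, "area_code", ["UA01", "UA02"])
for label, ids in report.items():
    print(label, ids)
```

Rows listed under `only_in_table` would have no geometry to attach to, so they are worth investigating before you publish.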

After you’ve imported your data, you’ll just need to navigate to your chosen spatial dataset and click ‘Create Derived Layer.’ The creation process simply involves selecting the ID fields to link, and then generating the derived geospatial layer.

Before generating, you’ll be able to choose which fields to include in the final data layer, and which to ignore. Then, prior to publishing, you’ll need to give it a title, description, and metadata.

You can read the documentation for this process here.

What does a Derived Data layer look like?

NZRS use Derived Data to produce the geospatial layers underlying their Broadband Map. NZRS source tabular data from a range of providers, geospatialise that data, and then use the Koordinates Query API and CartoCSS to produce their hugely popular NZ Broadband Map. You can read more about how NZRS built their Broadband Map here.

Another Koordinates customer using Derived Data is Stats NZ, New Zealand’s national statistics agency. Stats NZ use Derived Data to create geospatial versions of their census data. With thousands of tabular datasets from each census, Stats NZ have started to create new derived layers from some of the most popular datasets, including ‘Age by Meshblock.’

Get started

You can get started with your own Derived Data project by signing up for a Koordinates Plan. Our plans start small, our pricing is transparent, and you only ever pay for what you use.