At the beginning of your data publishing journey, most of your initial workflows will be manual. Usually, you’ll have one staff member committing a small amount of time each week (or less) to extracting data from your internal databases, uploading it to your Data Service, and then publishing it to your users.
Soon, though, more and more users will start using your Data Service to export data, connect web services, and build on your data using the Koordinates APIs. And once you experience success, you’ll want to ramp up your publishing activities, including publishing more data and more frequent data updates.
But at this point, organisations tend to face one major hurdle.
Manual data publishing can be both time-consuming and tedious, and as with any time-consuming and tedious process, you’ll soon look for ways to automate as much of it as you can.
In practice, this means finding ways to automate the import and update of new data. The best way to do this is to connect your Data Service to an intermediary data source (detailed below). You can then use this connection to automatically scan the source for supported data, which will then appear within your Data Service, ready for pre-publication QA.
Note that this won’t be appropriate for every organisation. Ultimately, the decision to move beyond manual data publishing will depend on the nature of the data you’re looking to publish. If you have lots of frequently updated data, it’s going to make much more sense to publish via a data source. If you have a small amount of infrequently updated data, then it may not be worth setting up a data source, regardless of how much usage your data receives.
Once you’ve decided to automate more of your data publishing workflow, you’ll need to set up your data sources before connecting them to your Data Service. At Koordinates, we support:
PostgreSQL is an open source database system. With a network link, Koordinates can connect to designated PostgreSQL databases, search the available data, and identify data to make available for import on your site.
ArcGIS Server is Esri’s server product for geospatial data and web services. With a query endpoint, Koordinates can scan and add data from public ArcGIS servers, as well as from ArcGIS servers that require a username and password.
WFS (Web Feature Service) is an OGC standard web service for geospatial data. With a query endpoint, Koordinates can scan and add data from public WFS sources.
Amazon S3 is a cloud storage service provided by Amazon Web Services (AWS). Koordinates can scan and import data directly from S3 sources.
Koordinates can also scrape individual files uploaded to a publicly accessible URL and make them available in your Data Service.
We also support CIFS network shares via a Data Gateway (more on that, below).
Once you’ve set up your data sources, you’ll want to connect them to your Data Service, which is the application you’ll be using to publish your data to the world. In Koordinates, this can be done via an admin user interface.
The exact connection process depends on the source. For WFS and ArcGIS endpoints, you’ll need to enter a URL and, if applicable, any credentials the Data Service needs to access the source. For PostgreSQL, you’ll need to supply a host, port, and database.
For Amazon S3, there are two methods—the Web Console method for less experienced AWS users, and the CloudFormation Template method for more experienced AWS users.
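To make the per-source requirements above concrete, here is a minimal sketch of a pre-flight check for a source configuration. The field names and source-type keys are illustrative assumptions, not the actual Koordinates admin interface:

```python
# Required connection details per source type, as described above.
# NOTE: these field names are assumptions for illustration only.
REQUIRED_FIELDS = {
    "postgresql": {"host", "port", "database"},
    "arcgis": {"url"},   # plus username/password where required
    "wfs": {"url"},
    "s3": {"bucket"},    # set up via Web Console or CloudFormation
}

def validate_source(source_type: str, config: dict) -> list:
    """Return the sorted list of required fields missing from a source config."""
    required = REQUIRED_FIELDS.get(source_type, set())
    return sorted(required - config.keys())

# A PostgreSQL source that is missing its port:
missing = validate_source("postgresql", {"host": "db.example.com", "database": "gis"})
print(missing)  # ['port']
```

A check like this is useful to run before handing connection details to an admin user, so a scan doesn’t fail on an incomplete configuration.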
Once the source is connected, you’ll be able to start a scan. ‘Scanning’ is a term we use at Koordinates to refer to the process of checking a connected data source for supported data, and then displaying that data as ‘import-ready’ within the Data Service user interface. The scanning step means that you can efficiently check data sources for new data and updates from within your Data Service.
After the data is scanned, the process works much like the manual process. You simply check which layers you wish to import, and follow your normal pre-publication QA process, before importing the data into your Data Service.
You can decide to scan your data source as frequently as you wish. The results of each scan will show how much data has been added, edited, and removed since the last scan, so you have an easy way of keeping track.
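The added/edited/removed summary works like a diff between the previous scan and the current one. As a hedged sketch (Koordinates computes this server-side; the snapshot shape, a mapping of layer id to content checksum, is an assumption for illustration):

```python
def diff_scans(previous: dict, current: dict) -> dict:
    """Compare two scan snapshots (layer id -> checksum) and summarise changes."""
    added = [k for k in current if k not in previous]
    removed = [k for k in previous if k not in current]
    edited = [k for k in current if k in previous and current[k] != previous[k]]
    return {"added": sorted(added), "edited": sorted(edited), "removed": sorted(removed)}

last_scan = {"roads": "a1", "parcels": "b2", "rivers": "c3"}
this_scan = {"roads": "a1", "parcels": "b9", "buildings": "d4"}
print(diff_scans(last_scan, this_scan))
# {'added': ['buildings'], 'edited': ['parcels'], 'removed': ['rivers']}
```

Unchanged layers (like "roads" here) drop out of the summary entirely, which is what keeps frequent scans cheap to review.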
You can also use connected data sources to manage data updates. Because the scan will pick up any changes or edits to data, you can choose to pull in updates to already published data. This has two benefits. First, it means you don’t have to re-import the entire dataset every time you want to push through data updates. Second, it means that admins can choose to ignore or skip updates as needed.
The final step for advanced data publishing is to script these processes using an API. Koordinates supports a range of Administration APIs for Data Service customers, though these are generally only used by more experienced users who are already invested in data publishing. That said, we have made available an open source Python library to assist with data publishing tasks.
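As a sketch of what scripting against the APIs looks like using only the Python standard library: the endpoint path and `Authorization: key …` header below reflect our understanding of the public Koordinates API, but treat both as assumptions and confirm them against your site’s API reference (the open source Python client mentioned above wraps these calls for you):

```python
import urllib.request

def build_layer_list_request(site: str, token: str) -> urllib.request.Request:
    """Build (but don't send) an authenticated request to list layers.
    The endpoint path and header format are assumptions; check your API docs."""
    url = f"https://{site}/services/api/v1/layers/"
    return urllib.request.Request(url, headers={"Authorization": f"key {token}"})

req = build_layer_list_request("data.example.com", "MY_API_TOKEN")
print(req.full_url)  # https://data.example.com/services/api/v1/layers/

# Sending it is one more line:
#   with urllib.request.urlopen(req) as resp:
#       payload = resp.read()
```

Scheduling a script like this (for example, from cron after each scan) is typically how organisations stitch scanning, QA, and publishing into a fully automated pipeline.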
Once you’ve set up your data sources, kicked off your scans, and managed your update workflows, you may notice one final limitation: all of the above data sources exist (by necessity) outside of your corporate firewall.
To solve this issue, Koordinates has developed a unique feature called the Data Gateway: a virtual machine running on VMware, AWS, Azure, or a comparable service, firewalled in a DMZ from the rest of the customer's corporate network.
This virtual machine connects with a secure network link to Koordinates, enabling you to upload and update data to your Data Service from your own internal data sources.
In short, the Data Gateway simplifies publishing workflows, as it supports publishing from your existing internal data sources. It also provides additional security protections, simplified troubleshooting, and improved latency.
In our experience, publishing practices evolve over time. Commonly, an organisation will start small, uploading and importing data using the manual interface. As publishing activities grow, however—and as more data is published, and more updates are pushed—organisations look to automate more of their workflows, to ensure that they can continue to support a successful, growing Data Service.