Tom Eastman and the Data Gateway

We talk with Tom about data, security, and an exciting new feature known as the Data Gateway.

Posted by Matt McGregor
October 9th, 2017

Tom Eastman joined Koordinates in August 2016 — but it’s only recently that we’ve been able to start showing off what he’s been up to. A long-serving member of the New Zealand Python Community, and a regular speaker at regional and international open source conferences, Tom has been working for the last year on an extended research and development project known as the ‘Data Gateway.’

As he says on his Twitter profile, Tom “writes words that control computers to tell other computers to build FAKE computers that run on DIFFERENT computers.” After following Tom’s progress for the last few months on an internal Slack channel, I was keen to hear him expand on this surprisingly accurate description of the Data Gateway.

Matt: Thanks for making time!

Tom: I remember that one of the on-boarding tasks someone (you?) set for me when I started at Koordinates was to write a blog post in the first three weeks. That didn’t exactly happen. If we subtract exactly one year, I think I’m still within that window. (Checks phone). No, I missed it yesterday.

Congratulations on your anniversary at Koordinates! When did you first hear about us?

I became aware of Koordinates at PyCon 2009, where I heard this one guy giving weird, crazy talks on weird, convoluted Python he was building. I didn’t realise till later, but that was Rob Coup, the Koordinates CTO. He gave a great talk about debugging a running Python process by logging into it over SSH, as if it were a computer, by attaching some clever Python libraries onto the side of your Python programme. When I started working at Koordinates, a friend made the connection, and said, ‘you must be working for Rob, who gave that crazy talk at the first Kiwi PyCon.’

It sounds like you might have your own PyCon-worthy talk with the Data Gateway. What is it, and what problem does it solve?

The DG is, at its simplest, a virtual machine appliance that a Koordinates customer would run on its internal network. This provides a secure connection for the Koordinates platform to integrate with local data sources.

So, if a customer has internal resources that they want to be easily integrated and imported into the Koordinates platform — and they don’t have the developer resource to take advantage of the Koordinates APIs — the DG allows us to scan and import resources from their internal sources in a way that is secure and protected on both sides.

To the client, it’s a virtual machine image that they would boot up in their internal environment and then configure through the Koordinates platform. To us, it’s a piece of infrastructure connected over a secure virtual private network connection that allows us to access their resources securely. It’s a bit of an iceberg, infrastructure-wise. They have an appliance that they boot up and they configure. And we have some associated infrastructure to orchestrate the network connection.

Security is a massive part of the project then?

It’s a major part of it. Part of the project, and something that’s fundamental to the security field, is developing that mischievous sense of curiosity about how to break your own stuff. It’s a great puzzle-solving exercise.

I’m not a hacker or a penetration tester, but a lot of my friends are. It’s taught me to cultivate a healthy sense of paranoia about everything I build, and reminds me to always consider — and minimize — the attack surface of every component in a complex system.

And I hear that the DG is up and running! 

The DG has been successfully used with LINZ, so the basic mechanics described earlier are working. But the configuration is still a manual process, so what we’re looking to do next is get the Koordinates platform to tell the DG what to do using the internal API. The upshot is that the user experience of configuring the DG will actually be remarkably simple.

What’s next for the DG?

The other really important feature of the Data Gateway is that we can offer services that will deploy to it. The way I’ve built the system, we can offer further features on the DG, and these will be accessible by the customer. It’s still early days, but the DG could potentially run database software that the customer could connect to in order to access published data from the Koordinates platform.

So, imagine if a customer said, ‘I want the latest version of this data on the Koordinates platform, and I want it ready to go on my local DG, to use it straight away without waiting to download it.’ If the customer wanted to access it by PostgreSQL, we could deploy PostgreSQL to the DG and give them the tools to access that data. And every time it updates on the platform, we could push those updates to the customer’s DG.
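
The push model Tom describes can be sketched roughly as follows. This is a toy illustration only, not how the Data Gateway is implemented: the `Platform` and `Gateway` classes, their methods, and the dataset name are all hypothetical, and a real gateway would load the data into a local database such as PostgreSQL rather than just recording a version number.

```python
from dataclasses import dataclass, field


@dataclass
class Gateway:
    """Hypothetical stand-in for a customer's Data Gateway."""
    name: str
    # Maps dataset id -> version currently held locally.
    local_versions: dict = field(default_factory=dict)

    def receive(self, dataset_id: str, version: int) -> None:
        # Assumption: a real gateway would load the data into a local
        # database here; this sketch only records which version it holds.
        self.local_versions[dataset_id] = version


@dataclass
class Platform:
    """Hypothetical stand-in for the publishing platform."""
    # Maps dataset id -> latest published version.
    published: dict = field(default_factory=dict)
    # Maps dataset id -> gateways that want a local copy.
    subscribers: dict = field(default_factory=dict)

    def subscribe(self, dataset_id: str, gateway: Gateway) -> None:
        self.subscribers.setdefault(dataset_id, []).append(gateway)
        # Bring a new subscriber up to date straight away.
        if dataset_id in self.published:
            gateway.receive(dataset_id, self.published[dataset_id])

    def publish(self, dataset_id: str, version: int) -> None:
        self.published[dataset_id] = version
        # Push the new version to every gateway that is behind.
        for gateway in self.subscribers.get(dataset_id, []):
            if gateway.local_versions.get(dataset_id) != version:
                gateway.receive(dataset_id, version)


platform = Platform()
dg = Gateway("customer-dg")
platform.publish("example-dataset", 1)
platform.subscribe("example-dataset", dg)
platform.publish("example-dataset", 2)
print(dg.local_versions)
```

The point of the sketch is the direction of flow: the customer never polls or downloads, and the platform pushes each new version to any gateway holding an older one.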

And that’s just one example. As I say, it’s still early days — but there’s a bunch of exciting projects that become feasible now that the DG has been developed, and we’ll be looking to tackle those in the year ahead.