Before you publish your data, you’ll need to think about data licensing. A data license — as with other kinds of license — should aim to give your users a clear, human-readable (and not just lawyer-readable) statement of what they can and cannot do with your data.
Because the world of copyright and licensing can be complex, and because it’s easy to get lost in the weeds, we’ve put together this short introductory guide on how to choose a data license. Our specific focus will be on open data licensing.
Before we get started, it’s worth noting that many governments around the world provide their own advice on data licensing. In some cases, they’ve even produced their own bespoke licenses (we mention them briefly, below). So, if you’re from a government agency, we recommend checking your own government’s guidance first.
And, of course, nothing in this guide should be taken as legal advice!
At present, many government agencies publishing data on the internet don’t worry too much about licensing. Preoccupied with ‘getting it out there,’ these agencies sometimes believe that data is ‘open’ by virtue of its being publicly available.
This is not the case. Copyright applies automatically (no ‘©’ symbol required), and is more restrictive than most people realize. For non-government works, copyright lasts for the life of the author plus 50 to 100 years (depending on where you live). And the fair use and fair dealing regimes in most jurisdictions are generally quite limited, particularly for commercial uses.
For government works, the copyright settings are a bit different. In New Zealand, works from government agencies receive something called ‘Crown Copyright’, which lasts for 100 years. In the United States, works produced by federal agencies are in the public domain, though only in the US itself.
Copyright is complex in every country, and government agencies cannot expect users to necessarily know the ins and outs of their intellectual property legislation. The moral of the story, here, is that professional data users, especially those working on significant commercial projects, need legal certainty before they can reuse your data.
This is why we at Koordinates suggest applying a clear license telling users exactly what they can and cannot do with your data (in fact, we’ve built the ability to do so directly into our platform). Even if the work is in the public domain, data users need to know this at the outset — especially if they’re looking to build your data into their commercial business processes.
So, you’re convinced you need to apply a data license for your government data, but you’re not sure which one. The most popular open license system is provided by Creative Commons, a non-profit founded in 2001 by Lawrence Lessig, now a professor at Harvard Law School; its first licenses were released in 2002. Based in the United States, the organisation has over seventy local chapters around the world, from New Zealand and Nigeria to Mongolia and Poland. There are currently over 1.2 billion works released under a Creative Commons license.
The Creative Commons licenses have the benefit of being both human and lawyer readable. This means that ordinary people can understand license terms, while the license itself remains legally robust.
The CC licenses are created from a mix of four basic license elements:
Attribution: This means that users have to give you credit when they copy and repurpose your data.
ShareAlike: This means that users have to relicense any derivative works under the same license.
No-Derivatives: This means that users aren’t allowed to make (and then distribute) any changes to your data.
Non-Commercial: This means that users can’t copy and distribute your work if their primary motivation is commercial in nature.
These four elements combine into six main licenses, ranging from the most permissive to the most restrictive.
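To see why four elements yield exactly six licenses, it can help to enumerate them. The short Python sketch below is only an illustration: the function name is our own, and it relies on two rules of the current license suite, namely that Attribution is part of every license, and that ShareAlike and No-Derivatives cannot be combined (a work that can't be adapted has no derivatives to share alike).

```python
from itertools import product


def cc_licenses():
    """List the six main Creative Commons licenses by their short codes."""
    licenses = []
    # Non-Commercial is optional; the adaptation rule is one of:
    # none, ShareAlike (SA) or No-Derivatives (ND), never SA and ND together.
    for commercial, adaptation in product(("", "NC"), ("", "SA", "ND")):
        # Attribution (BY) is included in every license.
        parts = ["BY"] + [part for part in (commercial, adaptation) if part]
        licenses.append("CC " + "-".join(parts))
    return licenses


for name in cc_licenses():
    print(name)
```

Two choices for commercial use multiplied by three choices for adaptations gives the six licenses, from CC BY through to CC BY-NC-ND.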
In addition to these licenses, Creative Commons provides tools to dedicate works to the ‘public domain.’ These tools are intended to waive all rights, including the right to attribution. You should get advice before applying them to government works, as they are not recommended for use in every jurisdiction.
Users of Creative Commons-licensed works might sometimes spot a number and country next to the license (for example ‘3.0 NZ’). This is because, for the first decade of its life, Creative Commons had an entire network of ‘ported’ licenses, which were adapted to suit the vagaries and complexities of local legal systems. In addition, Creative Commons invested in an ongoing process of improvement, releasing several successive versions of the licenses over the years.
To reduce the complexity of this system, Creative Commons has developed what we might call ‘one license to rule them all.’ These new licenses — dubbed the ‘4.0’ licenses — are intended to be truly international, and have addressed several potential issues with data licensing in the European Union. They have since become the default open license on the Koordinates platform.
Generally speaking, government agencies releasing data should choose the most liberal license (the Creative Commons Attribution license or, if possible, a public domain declaration) unless there are strong reasons not to do so. This is because the more liberal your license, the more users you’ll get. And the more users you get, the greater the social, economic, and environmental impact you will see from your open data.
It can be tempting for agencies to err on the side of ‘closed.’ At Koordinates, we often see publishers flirting with a ‘No-Derivatives’ license to stop people adapting or changing published data. Government agencies can be worried about potential ‘misuse’ or misrepresentation of their data, which in turn leads them to choose the more restrictive license option.
But the use of the Creative Commons ‘No-Derivatives’ license — or equivalent license — can have some unintended consequences for data publishers.
First and foremost, the license is likely to prevent several important and common uses of your data. For example, the following kinds of use would likely constitute derivative works, and may violate the license if they are shared or distributed in any way:
Cropping or clipping the data to a specified extent
Using data in a map or other visualisation
Combining data with other data layers (for example, in a geospatial app)
Format-shifting or translation (depending on your jurisdiction)
Agencies that use the No-Derivatives license, then, give the end user two options: either the user does nothing, or they ignore the license and risk copyright infringement. For government agencies looking to increase the impact of their data, neither is a great outcome.
Instead, we would recommend using an open Creative Commons Attribution (or equivalent) license. This license already contains safeguards requiring users to mark derivative works as derivatives of the authoritative dataset. The license also restricts users’ ability to claim that an agency ‘endorses’ works created or published by third-party users.
In addition, the Attribution license requires users to link back to the original source (for example, the data layer published on a Koordinates Data Service). This means that users will always have a ready way to locate the authoritative data, hosted on a site with your branding, on your domain.
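As a rough illustration of what such a credit might look like in practice, here is a minimal Python sketch that assembles an attribution line from a title, publisher, source link and license. Everything in it is hypothetical: the function name, the wording of the credit, and the example dataset and URL are ours, and this is not how Koordinates generates attribution text. Check the license terms for what your own credit needs to contain.

```python
def attribution_line(title, publisher, source_url, license_name, license_url,
                     modified=False):
    """Build a one-line credit for a reused dataset (illustrative wording only)."""
    # Flag adapted works so readers can tell them apart from the original.
    prefix = "Adapted from" if modified else "Source:"
    return (
        f'{prefix} "{title}" by {publisher} ({source_url}), '
        f"licensed under {license_name} ({license_url})."
    )


# Hypothetical dataset and publisher, used purely as an example.
print(attribution_line(
    title="Example Road Centrelines",
    publisher="Example Council",
    source_url="https://data.example.org/layer/1234",
    license_name="CC BY 4.0",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    modified=True,
))
```

The `modified` flag reflects the safeguard described above: a derivative work is labelled as an adaptation, while the link still points readers back to the authoritative dataset.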
Other jurisdictions, including the UK and Canada, have written their own open data licenses. These licenses have their pros and cons, though they are generally interoperable with Creative Commons licenses. (‘Interoperability’ means that users can combine works from both licensing systems, without facing any significant contradictions or legal fishhooks in what they are allowed to do.)
Regardless of which license you choose, the main takeaway from this guide should be this: make it obvious to your users what they can do with your data. While we encourage agencies to release their data openly, this isn’t always possible. But you shouldn’t expect your users to be mind-readers; nor should you necessarily expect them to be readers of the ‘copyright’ page of your corporate website.
At Koordinates, our approach has been to build the license directly into the UI. This means that users browsing data published on our platform know exactly what they can and cannot do with your data, and can even read the full license terms without leaving the site. When they decide to download your data, we ask them to log in and click to accept all license terms prior to download (which means our publishers always have an audit trail). And, to top it off, we include the full license deed as a text file in every download.
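If you publish data yourself, bundling the license text with each download is straightforward to reproduce. The sketch below uses Python's standard `zipfile` module to add a license file alongside the data. It is a generic illustration under our own assumptions, not the Koordinates implementation: the function name, the `LICENSE.txt` filename and the sample data are all hypothetical.

```python
import io
import zipfile


def bundle_with_license(files, license_text, license_filename="LICENSE.txt"):
    """Return the bytes of a zip archive holding the data files plus a license file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, content in files.items():
            archive.writestr(name, content)
        # The license travels with the data, so it is still there
        # long after the user has left the download page.
        archive.writestr(license_filename, license_text)
    return buffer.getvalue()


# Hypothetical data and placeholder license text.
package = bundle_with_license(
    {"roads.csv": "id,name\n1,Main Street\n"},
    "Placeholder: paste the full text of your chosen license here.",
)
with zipfile.ZipFile(io.BytesIO(package)) as archive:
    print(archive.namelist())
```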