Conditions for "open"


Data is open when the following conditions are met:

Machine-readable

It is important that data is machine-readable. This does not mean the same as "digital".

Many of the document formats we create every day on our computers are not machine-readable: they cannot be read unless you have the right software package.

Examples:

"But surely you can easily save an xlsx file in the csv format?" Yes, but information is lost in the process, such as formulas and the meaningful use of colours.

Open data should be extractable and processable by a computer. The file format is what guarantees this.

Machine-readable formats include.

Files in these formats can always be opened and read by any computer, regardless of the software installed.
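As a minimal sketch of what "processable by the computer" means in practice, the example below reads a small CSV table using only Python's standard library, so no dedicated software package is needed. CSV is used because it is the format mentioned above; the column names and figures are hypothetical and only serve as an illustration.

```python
import csv
import io

# Hypothetical sample data standing in for a small open dataset in CSV form.
SAMPLE_CSV = """municipality,year,population
Exampletown,2020,15230
Sampleville,2020,8400
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(SAMPLE_CSV)

# Every value arrives as plain text, so numbers are converted explicitly.
total = sum(int(row["population"]) for row in rows)
print(total)
```

Because the file is plain text with a simple structure, the same few lines would work on any computer with Python installed, whatever spreadsheet software it does or does not have.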

More information on file formats


Presence of metadata

A table filled with numbers and/or text needs basic explanations:

The answers to those questions should become clear from the metadata ("data about data"). It is common practice to add the metadata to the dataset as a separate text file.
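One possible way to keep metadata in a separate text file is sketched below: a small JSON file written next to the dataset. The helper `write_metadata`, the file naming scheme and all field names and values are assumptions made for this illustration; they are not a prescribed metadata standard.

```python
import json
import tempfile
from pathlib import Path


def write_metadata(dataset_path, metadata):
    """Write metadata to a JSON text file next to the dataset; return its path."""
    sidecar = dataset_path.with_name(dataset_path.stem + ".metadata.json")
    sidecar.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return sidecar


# All field names and values below are hypothetical examples.
metadata = {
    "title": "Population per municipality",
    "source": "Hypothetical statistics office",
    "collected": "2020",
    "columns": {
        "municipality": "Name of the municipality",
        "population": "Number of residents on 1 January",
    },
}

# A temporary folder keeps the example from leaving files behind.
with tempfile.TemporaryDirectory() as folder:
    sidecar_path = write_metadata(Path(folder) / "population.csv", metadata)
    sidecar_name = sidecar_path.name
    restored = json.loads(sidecar_path.read_text(encoding="utf-8"))

print(sidecar_name)
```

Keeping the metadata in its own text file means the table itself stays clean, while anyone who downloads the dataset can still read the explanation with any text editor.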


Open licence

Publishing a dataset on the internet does not in itself make it "open", because its creator holds the copyright. If the creator has not explicitly stated that the dataset may be re-used, the data is not open.

As with other "works of art, science or literature" (the wording used in the Copyright Act), the creator of a dataset holds the copyright on it: only the creator has the right to reproduce or distribute the dataset.

Copyright arises automatically; it does not depend on the creator adding a © sign.
It also persists after the creator has published the dataset on a website or allowed it to be published there.

This means that, in principle, a user may only view and download someone else's dataset for their own use. Making copies for a group of students, or combining the data with other data and then republishing or distributing the result, infringes the creator's copyright. Even in education!

Unless, that is, the creator has given prior, conditional permission for use and re-use in the form of a licence attached to the work. A licence preserves the creator's copyright but allows others to distribute the dataset.

Without a licence, the dataset cannot be open!


How do you find out if a dataset has a licence attached to it?

The creator could write out the licence terms themselves. However, this rarely, if ever, happens.

Creators almost always use an existing licensing system. Not having to invent and write out terms saves them time, and not having to read them saves you time.

Creative Commons (CC) is the most widely used licensing system worldwide. In this system, logos and abbreviations are used for terms and conditions. The creator selects one or more logos and/or abbreviations.

CC-BY Re-use is permitted, provided a correct source citation is added.
CC-ND Re-use is permitted, provided no derivative works are published.
CC-SA Re-use is permitted, provided derivative works are published under the same licence (share alike).
CC-NC Re-use is permitted, but only for non-commercial purposes.
CC-0 No conditions; public domain.
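Because the abbreviations can be combined, a licence code such as CC-BY-NC-SA can be read as a list of conditions. The sketch below illustrates that idea using the abbreviations listed above. The function `explain_licence` and the wording of the conditions are hypothetical, the code only accepts the hyphenated notation used here, and it does not check whether a given combination actually exists as a Creative Commons licence.

```python
# Conditions keyed by the abbreviations listed above (wording paraphrased).
CONDITIONS = {
    "BY": "a correct source citation must be added",
    "ND": "no derivative works may be published",
    "SA": "derivative works must be published under the same licence",
    "NC": "re-use is for non-commercial purposes only",
}


def explain_licence(code):
    """Return the conditions implied by a code such as 'CC-BY-NC-SA'."""
    parts = code.upper().split("-")
    if len(parts) < 2 or parts[0] != "CC":
        raise ValueError(f"Not a Creative Commons code: {code!r}")
    elements = parts[1:]
    if elements == ["0"]:
        return []  # CC-0: no conditions, public domain
    unknown = [element for element in elements if element not in CONDITIONS]
    if unknown:
        raise ValueError(f"Unknown licence element(s): {unknown}")
    return [CONDITIONS[element] for element in elements]


for condition in explain_licence("CC-BY-NC-SA"):
    print("-", condition)
```

An empty list, as returned for CC-0, means that re-use comes with no conditions at all.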



Other licence forms

Some governments and international organisations do not use Creative Commons but have created their own licence forms, such as the UK Open Government Licence, the World Bank Terms of Use and the French government's Licence Ouverte.


"What is Creative Commons? Creative Commons License Types Basics Explained" by Creative Common Studio, 2020 https://youtu.be/4MYSVhKcnaA