The online encyclopaedia Wikipedia and other Wikimedia projects, such as Wiktionary (a dictionary) and Wikivoyage (a travel guide), share an underlying database and classification system: Wikidata.
Like the content of these reference works, Wikidata is also a product of crowdsourcing.
Wikidata is published under an open licence (CC0). What sets it apart is that you are not limited to searching for existing datasets: you can generate your own dataset from your own query. The results can be downloaded in CSV, TSV, and JSON format and used for any purpose.
Older versions are also available for download.
Please note that Wikidata is constantly changing!
Searching Wikidata requires some knowledge of the structure of Wikidata and of the SPARQL query language, but all kinds of help are available, including the Wikidata Query Builder and the Query Helper.
Like many knowledge databases, Wikidata is composed of so-called triples. A triple is a set of one subject, one predicate, and one object. The predicate establishes the relationship between the subject and the object.
A triple can be formed by:
Subject: "Cristiano Ronaldo"
Predicate: "has been awarded with"
Object: the "Bravo Award 2004"
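The same triple can be written down as a small data structure. The following is only an illustrative sketch: the class name `Triple` and its field names are made up for this example and are not part of Wikidata itself.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: a subject, a predicate, and an object."""
    subject: str
    predicate: str
    obj: str  # "object" would also work, but it shadows a Python builtin name


# The example triple from the text above.
award = Triple(
    subject="Cristiano Ronaldo",
    predicate="has been awarded with",
    obj="Bravo Award 2004",
)

# The predicate links the subject to the object.
print(f"{award.subject} -- {award.predicate} --> {award.obj}")
```

In Wikidata itself, subjects and predicates are not stored as free text but as identifiers (items and properties), which is why queries refer to them by ID.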
Suppose you want a dataset containing all of Cristiano Ronaldo's awards and the corresponding years.
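A query for such a dataset could look roughly like the sketch below, which builds the SPARQL text in Python and shows how it might be sent to the public Wikidata Query Service. Several things here are assumptions rather than part of the original walkthrough: the item ID `Q11571` for Cristiano Ronaldo, the properties `P166` ("award received") and `P585` ("point in time"), the helper names `build_award_query` and `run_query`, and the placeholder User-Agent string. The IDs should be checked on wikidata.org before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_award_query(item_id: str = "Q11571") -> str:
    """Return a SPARQL query listing the awards of one item with their years.

    Q11571 is assumed to be the item for Cristiano Ronaldo.
    """
    return f"""
SELECT ?awardLabel ?year WHERE {{
  wd:{item_id} p:P166 ?statement .          # statements about "award received"
  ?statement ps:P166 ?award .               # the award itself
  OPTIONAL {{
    ?statement pq:P585 ?date .              # qualifier "point in time", if present
    BIND(YEAR(?date) AS ?year)
  }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
ORDER BY ?year
""".strip()


def run_query(query: str) -> list:
    """Send the query to the endpoint and return the result rows (needs network)."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url,
        # Placeholder value; replace it with something that identifies your script.
        headers={"User-Agent": "wikidata-awards-example/0.1 (illustrative)"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        data = json.load(response)
    return data["results"]["bindings"]


# Only print the query here; running it requires an internet connection.
print(build_award_query())
```

The printed query can also be pasted into the query editor at query.wikidata.org. Calling `run_query(build_award_query())` would fetch the rows directly, and because Wikidata changes constantly, the rows returned may differ from one day to the next.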
Explanation:
Clicking the blue arrow runs the query and creates the dataset.
The dataset shows, among other things:
The dataset can be downloaded in TSV, CSV, and JSON format.
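If you download the JSON version, each result row is a "binding" that maps a variable name to a value. The sketch below shows one possible way to flatten such a result into CSV text. The helper name `bindings_to_csv` and the variable names `awardLabel` and `year` are assumptions, and the single sample row is written by hand from the example in this text; it is not real query output.

```python
import csv
import io


def bindings_to_csv(variables, bindings) -> str:
    """Flatten SPARQL JSON result bindings into CSV text, one row per binding."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(variables)  # header row with the variable names
    for row in bindings:
        # A variable can be missing from a row (e.g. no year known): use "".
        writer.writerow([row.get(name, {}).get("value", "") for name in variables])
    return buffer.getvalue()


# Hand-written sample in the SPARQL JSON results layout (not real query output).
sample = {
    "head": {"vars": ["awardLabel", "year"]},
    "results": {
        "bindings": [
            {
                "awardLabel": {"type": "literal", "value": "Bravo Award"},
                "year": {"type": "literal", "value": "2004"},
            },
        ]
    },
}

csv_text = bindings_to_csv(sample["head"]["vars"], sample["results"]["bindings"])
print(csv_text)
```

For most purposes the ready-made CSV or TSV download is simpler. A conversion like this is mainly useful when the JSON is fetched by a script.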
The following video explains the whole process, this time using the residences of all women who studied at a particular university as the example.
Only the first 10 minutes are relevant to our topic.
"Wikidata SPARQL Query Tutorial", by Wikimedian in Residence - University of Edinburgh https://youtu.be/1jHoUkj_mKw