Download

Latest extracts

The latest extracts are available for download in Turtle format. The data is modeled using the ontolex vocabulary. For each language, the data is split across several files:

ll_dbnary_ontolex: The core data as extracted from Wiktionary. This file contains Pages, Entries, Senses, Lexical Relations, Translations, … modeled using the ontolex vocabulary.
ll_dbnary_enhanced: Data computed from the core data, mainly links computed from Translations to the Sense(s) for which they are valid.
ll_dbnary_morphology: Extensive morphology, i.e. a set of alternate forms for lexical entries, along with their linguistic annotations. This data is not available for all languages.
ll_dbnary_lime: The metadata description of each language edition, modeled using ontolex’s LIME submodule.
ll_dbnary_statistics: Statistics on the extracted data, modeled using the datacube vocabulary.
ll_dbnary_etymology: Etymological data (currently only available for the English language edition).
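
As a rough illustration of the naming scheme listed above, the following sketch builds the expected file names for one language edition. It assumes that the "ll" prefix stands for a language code and that files carry a ".ttl" extension; neither is stated on this page, and the published files may be named or compressed differently. The helper name dataset_names is hypothetical.

```python
# Suffixes taken from the list above; "ll" is ASSUMED to be a language code.
DATASET_SUFFIXES = [
    "ontolex",
    "enhanced",
    "morphology",
    "lime",
    "statistics",
    "etymology",
]


def dataset_names(lang_code: str, extension: str = ".ttl") -> list[str]:
    """Return the expected file names for one language edition.

    The ".ttl" extension is an assumption; real files may also be compressed,
    and some files (morphology, etymology) do not exist for every language.
    """
    return [f"{lang_code}_dbnary_{suffix}{extension}" for suffix in DATASET_SUFFIXES]


for name in dataset_names("fr"):
    print(name)
```

Since morphology and etymology are not available for every language, a download script built on such a helper would need to tolerate missing files.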

Core data

Data is provided as a set of Turtle files (one per language) and may be downloaded here. This link also gives access to all previous versions (in either lemon or ontolex format).

The data uses an extended version of the LEMON vocabulary. The OWL description of the data is available here. A human-readable (HTML+RDFa) description is available here.

The Turtle files are updated each time a Wiktionary dump is made available (roughly once every 10 days for each language). The latest data is available in the folder “latest”, while every extraction version is available under each language's folder.

Disambiguated Translations

At LREC 2014 (in Reykjavik, see the publications section), we presented an experiment in which additional links are provided to disambiguate the source of translations. This experiment produces a set of links from a Translation to a LexicalSense. Note that in the original dataset, translations are linked to lexical entries, and that these new links are established using an imperfect heuristic with state-of-the-art accuracy. This additional dataset is meant to be used in conjunction with the core dataset; it is available alongside the core dataset and is computed in sync with core updates.
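
To illustrate how the additional links complement the core data, here is a minimal sketch using plain Python structures rather than an RDF library. All identifiers (tr1, entry:…, sense:…) and the function name attach_senses are hypothetical and chosen for illustration only; they are not the actual DBnary URIs or property names.

```python
from collections import defaultdict

# HYPOTHETICAL identifiers, for illustration only.
# Core data: each translation is attached to a lexical entry.
core_links = {
    "tr1": "entry:cat__Noun__1",
    "tr2": "entry:cat__Noun__1",
}

# Additional dataset: some translations also point at one or more senses.
sense_links = [
    ("tr1", "sense:cat__Noun__1_1"),
]


def attach_senses(core, extra):
    """Combine entry-level links with sense-level links, when present."""
    senses = defaultdict(list)
    for translation, sense in extra:
        senses[translation].append(sense)
    return {
        translation: {"entry": entry, "senses": senses.get(translation, [])}
        for translation, entry in core.items()
    }


merged = attach_senses(core_links, sense_links)
print(merged["tr1"])
print(merged["tr2"])
```

A translation without a sense link (tr2 here) keeps only its entry-level link, which mirrors the fact that the heuristic is imperfect and the core data remains the reference.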

Morphology

Since December 2014, morphological data has been extracted from the French and German language editions. This data is currently stored in exhaustive form, meaning that every inflected form can be found through a lemon:otherForm property.

Non-Wiktionary data

Since March 2016, we also provide data in lemon format that comes from other available datasets. The first such dataset comes from the DILAF project (Dictionaries for African Languages).

DILAF: Bambara, Hausa, Kanuri, Tamashek, Zarma