Download

Latest extracts

core bg de el en es fi fr it id ja la lt mg nl no pl pt ru sh sv tr
disamb-trans bg de el en es fi fr it id ja la lt mg nl no pl pt ru sh sv tr
morpho de en fr sh

Core data

The data is provided as a set of Turtle files (one per language) and may be downloaded here. This link also gives access to all previous versions (in either lemon or ontolex format).

The data uses an extended version of the lemon vocabulary. The OWL description of the data is available here. A human-readable (HTML+RDFa) description is available here.

The Turtle files are updated each time a Wiktionary dump is made available (roughly once every 10 days for each language). The latest data is available in the “latest” folder, while every extraction version is available under each language's folder.

Disambiguated Translations

At LREC 2014 (in Reykjavik, see the publications section), we presented an experiment in which additional links are provided to disambiguate the source of translations. This experiment produces a set of links from a Translation to a LexicalSense. Note that in the original dataset, translations are linked to lexical entries, and that these new links are established using an imperfect heuristic with state-of-the-art accuracy. This additional dataset is meant to be used in conjunction with the core dataset and is available in the “disambiguated-translations” folder.
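To illustrate how the additional links complement the core dataset, here is a minimal Python sketch that joins the two on the translation resource. It uses plain tuples as stand-ins for parsed RDF triples. All identifiers in it (the `tr:`, `entry:` and `sense:` names and the two `ex:` property names) are hypothetical placeholders, not the actual DBnary vocabulary.

```python
from collections import defaultdict

# Toy triples standing in for the core dataset (hypothetical names):
# each translation points to a lexical entry.
CORE = [
    ("tr:cat_fr_1", "ex:translationOfEntry", "entry:cat__Noun__1"),
    ("tr:cat_fr_2", "ex:translationOfEntry", "entry:cat__Noun__1"),
]

# Toy triples standing in for the disambiguated-translations dataset
# (hypothetical names): some translations also point to a sense.
DISAMBIGUATED = [
    ("tr:cat_fr_1", "ex:translationOfSense", "sense:cat__Noun__1_sense_1"),
]


def attach_senses(core, links):
    """Map each translation to (entry, [senses]) by joining on the translation."""
    senses = defaultdict(list)
    for subject, predicate, obj in links:
        if predicate == "ex:translationOfSense":
            senses[subject].append(obj)
    return {
        subject: (obj, senses.get(subject, []))
        for subject, predicate, obj in core
        if predicate == "ex:translationOfEntry"
    }


merged = attach_senses(CORE, DISAMBIGUATED)
for translation, (entry, sense_list) in merged.items():
    print(translation, entry, sense_list)
```

Translations for which the heuristic produced no link simply keep an empty sense list, so the core data remains usable on its own.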

Morphology

Since December 2014, morphological data has been extracted from the French and German language editions. This data is currently stored in an exhaustive form, meaning that every inflected form can be found in a lemon:otherForm property.
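As a rough illustration of the exhaustive layout, the following Python sketch collects the inflected forms attached to each entry. It again works on plain tuples rather than a real RDF parser. Only the property name `lemon:otherForm` comes from the description above; the entry and form identifiers are made up for the example.

```python
from collections import defaultdict

# Toy triples (hypothetical entries and forms); in the real data these
# would come from parsing a Turtle file.
TRIPLES = [
    ("entry:chanter__Verb__1", "lemon:canonicalForm", "form:chanter"),
    ("entry:chanter__Verb__1", "lemon:otherForm", "form:chante"),
    ("entry:chanter__Verb__1", "lemon:otherForm", "form:chantons"),
    ("entry:Haus__Noun__1", "lemon:otherForm", "form:Häuser"),
]


def inflected_forms(triples):
    """Group the objects of lemon:otherForm by the entry they belong to."""
    forms = defaultdict(list)
    for subject, predicate, obj in triples:
        if predicate == "lemon:otherForm":
            forms[subject].append(obj)
    return dict(forms)


forms_by_entry = inflected_forms(TRIPLES)
for entry, forms in forms_by_entry.items():
    print(entry, forms)
```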

Non-Wiktionary data

Since March 2016, we also provide data in lemon format that comes from other available datasets. The first such dataset comes from the DILAF project (Dictionaries for African Languages).

DILAF Bambara Hausa Kanuri Tamashek Zarma