Author: Gilles Sérasset

We are handling Thesaurus entries in the English dataset

The English extractor has recently been extended to handle Thesaurus entries. This yields more lexico-semantic relations. However, such relations originate from and lead to Pages rather than LexicalEntries. This shortcoming will be handled during data enhancement (still to come).

Disambiguated Translations are now systematically computed

After a long wait, I finally managed to integrate the “Source Translation Disambiguation” experiment into the DBnary extractor. Hence, the sources of some translations are disambiguated each time a new dump is extracted. What is this? In Wiktionary, translations are

DBnary is now using the W3C Ontolex format

The DBnary data is now available using the ontolex vocabulary. New bugfixes and additional extracted data will now be available only using the ontolex vocabulary. Of course, as usual, previously extracted data is still available in lemon format. For a limited

The DBnary extraction program is now on Bitbucket

Due to popular demand, we decided to migrate the DBnary program forge from our own forge to Bitbucket and to use git. If you want to develop a new extractor or improve the existing ones, go to  . Happy back to

DBnary is offering a new dataset for African languages

We just took a first step towards expanding the DBnary dataset with dictionaries provided by the DILAF project. We have extracted a LEMON version of the DILAF Bambara dictionary and we have made it available on the DBnary server. As usual, it

Kaiko’s going evasive

The recent failure of the kaiko web server was due to a flood of SPARQL requests to DBnary. A client sent 46 million requests to the SPARQL server in less than a week, leading to an

We are back online!

After 3 days offline due to a major server failure, we are back online!

Major bug discovered and fixed

While using DBnary in conjunction with other lemon resources (mainly during the Lider datathon in Madrid), we discovered a small but major problem with DBnary data. Until now, the lemon prefix was while the official prefix was leading to


The DBnary dataset has been used for an experiment on a Machine Translation Quality measure based on METEOR. The research paper will be presented during MT-Summit 2016. For work replication, we provide the sources of the experiment: full source code +

21 languages are now available

Latin is now part of the extracted languages. It is a rather small language, but while we were participating in the first Summer Datathon (SD-LLOD2015) in Cercedilla (Spain), several participants seemed to be expecting it.