Category: Announce

DBnary extraction program is now on Bitbucket

Due to popular demand, we decided to migrate the DBnary source code from our own forge to Bitbucket and to switch to git. If you want to develop a new extractor or improve the existing ones, go to https://bitbucket.org/serasset/dbnary. Happy back to

DBnary is offering a new dataset for African languages

We just took a first step towards expanding the DBnary dataset with dictionaries provided by the DILAF project. We have extracted a LEMON version of the DILAF Bambara dictionary and made it available on the DBnary server. As usual, it


METEOR with DBnary

The DBnary dataset has been used in an experiment on a Machine Translation quality measure based on METEOR. The research paper will be presented at MT-Summit 2016. To allow replication of this work, we provide the sources of the experiment: full source code +

Usage examples are now extracted and attached to word senses

During the French “TALN” workshop, several French researchers asked me if I could add usage examples to the extracted data. This has been done for French, with the addition of a new property (named dbnary:exampleSource) that gives the source of

The Polish Language Extractor is now online

DBnary now contains data extracted from 13 Wiktionary language editions, with Polish added today. Polish data is available in the very same format as the other languages. The extraction work has been really tedious as the Polish language edition uses a

Japanese extractor added

The Japanese Wiktionary language edition has now been added to DBnary. It took me quite some time to set up, as the Japanese edition is quite inconsistent. As usual, the data is available in the Download area. SPARQL access will be available

Several statistics available

Statistics are available on the “Dataset” page. They currently display the number of elements, lexical relations and translations from/to the extracted languages. These statistics are updated daily to reflect newly extracted dumps. More to come…

Turkish extractor added

With the introduction of the new Turkish extractor, DBnary now offers 9 language editions.

Portuguese extractor enhanced

The Portuguese Wiktionary extractor has been considerably enhanced. After adapting it to a new architecture using the GWT-Wiki library, I fixed several bugs in the translation extraction section. Moreover, homonymous entries were not extracted correctly. Hence, since the July 9th dump, the

Greek Wiktionary added

With the addition of the Greek Wiktionary, DBnary now provides lexical data for 8 language editions. More to come…
