DBnary is now using the W3C Ontolex format

The DBnary data is now available using the ontolex vocabulary. From now on, bug fixes and newly extracted data will only be available using the ontolex vocabulary.

Of course, as usual, previously extracted data is still available in lemon format.

For a limited time, the data will be extracted in both lemon and ontolex formats. Online access will only be available in ontolex format.

The DBnary-specific vocabulary will shortly be adapted to this new base vocabulary. Stay tuned for more information.

The DBnary extraction program is now on Bitbucket

In response to several requests, we decided to migrate the DBnary source code from our own forge to Bitbucket and to use git.

If you want to develop a new extractor or improve the existing ones, go to https://bitbucket.org/serasset/dbnary.

Happy back to school period in France by the way 😉

DBnary is offering a new dataset for African languages

We have just taken a first step towards expanding the DBnary dataset with dictionaries provided by the DILAF project. We have extracted a lemon version of the DILAF Bambara dictionary and made it available on the DBnary server. As usual, it uses the lemon model, and its URIs are dereferenceable. See for instance the http://kaiko.getalp.org/dilaf/bam/daɲɛgafe1__n entry.
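Entry URIs such as the one above contain non-ASCII characters, so a client has to percent-encode them before issuing an HTTP request. The following is a minimal sketch using only the Python standard library. The helper name `turtle_request` is hypothetical, and it assumes that the server honours an `Accept: text/turtle` header, which this post does not state.

```python
from urllib.parse import quote, urlsplit, urlunsplit
from urllib.request import Request


def turtle_request(iri: str) -> Request:
    """Build (but do not send) a GET request asking for a Turtle description."""
    parts = urlsplit(iri)
    # Percent-encode non-ASCII path characters (e.g. "ɲ", "ɛ") as UTF-8 bytes.
    # "%" is kept safe so that an already encoded path is not encoded twice.
    safe_path = quote(parts.path, safe="/%")
    url = urlunsplit(
        (parts.scheme, parts.netloc, safe_path, parts.query, parts.fragment)
    )
    # Assumption: the server performs content negotiation on this header.
    return Request(url, headers={"Accept": "text/turtle"})


req = turtle_request("http://kaiko.getalp.org/dilaf/bam/daɲɛgafe1__n")
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually send the request; the sketch stops before that so it can be run offline.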

Kaiko’s going evasive

The recent failure of the kaiko web server was caused by a flood of SPARQL requests to DBnary. A single client sent 46 million requests to the SPARQL server in less than a week, making the log file grow until it filled up the root partition and brought the server down.

May I remind everybody that the DBnary data can easily be downloaded as Turtle files, which you may either use directly with suitable libraries (e.g. Jena in Java, or equivalents in other programming languages) or load into a local DBnary mirror.

If you overload the public server, it will not be available to serve others.

To avoid future problems, the public server is going evasive, meaning that clients flooding it with requests will be temporarily blocked, so that the server remains available to others.

If this new setting breaks your app, do not hesitate to contact me.
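For applications that really do need the public endpoint, spacing requests out on the client side is one way to avoid being blocked. Here is a minimal sketch of such a throttle. The class name `Throttle` and the one-second interval in the usage note are assumptions made for illustration; the actual threshold applied by the server is not given in this post.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between two requests
        self._clock = clock               # injectable for testing
        self._sleep = sleep               # injectable for testing
        self._last = None                 # time of the previous call, if any

    def wait(self):
        """Block until min_interval seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A client would create one `Throttle(1.0)` and call `wait()` immediately before each SPARQL request. For bulk processing, working from the downloaded Turtle files remains the better option.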

We are back online!

After 3 days offline due to a major server failure, we are back online!

Major bug discovered and fixed

While using DBnary in conjunction with other lemon resources (mainly during the Lider datathon in Madrid), we discovered a small but major problem with DBnary data.

Until now, the lemon prefix used in DBnary was http://www.lemon-model.net/ while the official prefix is http://lemon-model.net/ and this mismatch led to a poor mapping between DBnary data and other lemon datasets.

The prefix has been fixed in the extractor and I also fixed ALL PREVIOUS VERSIONS of the dataset. This means that from now on, even if you use an older dataset (provided that you re-download it), you’ll have the correct mapping.
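If re-downloading is not convenient, a local copy can in principle be patched with a plain text substitution. The sketch below only illustrates the idea and is not the procedure that was applied to the published files. The function name `fix_lemon_prefix` and the sample `@prefix` line are made up for the example.

```python
OLD_PREFIX = "http://www.lemon-model.net/"
NEW_PREFIX = "http://lemon-model.net/"


def fix_lemon_prefix(text: str) -> str:
    """Replace the old lemon namespace prefix with the official one."""
    return text.replace(OLD_PREFIX, NEW_PREFIX)


# Illustrative line, as it might appear at the top of an older Turtle dump.
sample = "@prefix lemon: <http://www.lemon-model.net/lemon#> .\n"
print(fix_lemon_prefix(sample), end="")
```

The substitution is idempotent, so running it on a file that has already been fixed leaves the file unchanged.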

The SPARQL endpoint data will also be updated soon.

The DBnary dataset has been used for an experiment on a Machine Translation Quality measure based on METEOR. The research paper will be presented during MT-Summit 2016.

For work replication, we provide the sources of the experiment:

21 languages are now available

Latin is now one of the extracted languages. It is a rather small language, but while we were participating in the first Summer Datathon (SD-LLOD2015) in Cercedilla (Spain), some participants seemed to be eagerly awaiting it.

4 more languages immediately available

Thanks to Malick Diagne and Steve Roques, DBnary now extracts data from the Dutch, Lithuanian, Serbo-Croat and Swedish editions. The Serbo-Croat extractor also extracts morphological information.

WikDict, a web service based on DBnary data

Karl Bartel has created a very simple web service to look up translations in many languages.

The data powering WikDict is provided by DBnary. The service is still preliminary, but I’m quite sure it will improve over time.

Go to wikDict home page…

