Announce – Dbnary

Eager to meet the Exolexica?

By Gilles Sérasset Posted on January 29, 2022 Posted in Announce Tagged with exolexicon, HDT

From the start, the DBnary dataset has extracted what I call the endolexicon, i.e. the language data that corresponds to the language edition being extracted (e.g. French data is extracted from the French language edition). Since its 20220120 edition, DBnary …

The DBnary dataset is now available in HDT

By Gilles Sérasset Posted on January 29, 2022 Posted in Announce Tagged with HDT

You can now directly download an HDT version of a whole language edition dataset: core, statistics, lime, enhanced translations, morphology and etymology (when available). Everything is combined in a single HDT file that is directly usable. The HDT data may …

Pre-2017 extracts are now available on Zenodo only

By Gilles Sérasset Posted on August 24, 2021 Posted in Announce, Uncategorized Tagged with Archive, Zenodo

The DBnary project has extracted every Wiktionary dump from mid-2012 (for the earliest extracted language) until now. In July 2017, the extracts transitioned from the original lemon model to the now de facto standard ontolex model. As we are lacking …

Kurdish language edition is now part of DBnary

By Gilles Sérasset Posted on July 9, 2021 Posted in Announce Tagged with extractor, Kurdish

DBnary adds a new language to its collection: Kurdish. Since July 1st, 2021, Kurdish lexical data has been available in the DBnary dataset. This brings the collection to 22 languages.

Turkish extractor has been rewritten

By Gilles Sérasset Posted on July 22, 2020 Posted in Announce Tagged with extractor, Turkish

For almost a year, the Turkish data was empty because the extractor had not been fixed after the Turkish Wiktionary community drastically changed the way the pages were encoded. This has now been fixed with a new version of the Turkish …

Swedish Morphology Extraction is available

By Gilles Sérasset Posted on July 22, 2020 Posted in Announce Tagged with Morphology, swedish

The DBnary extractor now extracts Swedish morphology. This data is available starting with the 20200701 extraction, which used version 2.3.1 of the extractor. The morphological data may be downloaded and will be available as soon as we upload …

DBnary server is back online

By Gilles Sérasset Posted on April 20, 2020 Posted in Announce

After a full week offline, caused by a failing disk array and the difficulties of the COVID-19 confinement, the server is back online. I took the opportunity to update the online data to the latest version (20200401). Sorry for the delay.

We are handling Thesaurus entries in the English dataset

By Gilles Sérasset Posted on October 19, 2017 Posted in Announce, Uncategorized

The English extractor has recently been extended to handle Thesaurus entries. This leads to more lexico-semantic relations. However, such relations originate from and lead to Pages rather than LexicalEntries. This shortcoming will be handled during data enhancement (still to come). …

Disambiguated Translations are now systematically computed

By Gilles Sérasset Posted on August 28, 2017 Posted in Announce, Uncategorized

After a long wait, I finally managed to integrate the “Source Translation Disambiguation” experiment into the DBnary extractor. Hence, the sources of some translations are disambiguated each time a new dump is extracted. What is this? In Wiktionary, translations are …

DBnary is now using W3C Ontolex format

By Gilles Sérasset Posted on March 24, 2017 Posted in Announce, Uncategorized

The DBnary data is now available using the ontolex vocabulary. From now on, bug fixes and additional extracted data will be available only in the ontolex vocabulary. As usual, previously extracted data is still available in lemon format. For a limited …
