Setting up a virtuoso-opensource server to mirror DBnary data

This post details the steps needed to install a Virtuoso server and bulk load Wiktionary data into it. You’ll have to adapt them to your own settings.

Download and Compile virtuoso-opensource

First, you’ll have to install several development libraries (this is for Debian):

sudo aptitude install autoconf autoheader automake bison flex gawk gperf libtool build-essential autotools-dev 
sudo aptitude install libssl-dev libxml2 libxml2-dev imagemagick libreadline-dev libldap-dev libmagickwand5 libmagickwand-dev libwbxml2-dev libwbxml2-0
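
Before starting the build, it can save you a failed configure run to check that the main build tools really are on your PATH. Here is a minimal sketch: the missing_tools function is made up for this post, and the tool list only covers packages from the first install command that map to a command of the same name.

```shell
# Print the name of every command that cannot be found on the PATH.
# missing_tools is a hypothetical helper, not part of Virtuoso.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# List whatever still needs to be installed (prints nothing if all are found).
missing_tools autoconf automake bison flex gawk gperf make
```

An empty output means every listed tool was found; any name printed is a tool you still have to install.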

Then, clone the dbpedia-vad-i18n and virtuoso-opensource git repositories, set up the i18n version of dbpedia (not sure this is really useful…), configure, make, make install, pour coffee…

git clone https://github.com/dbpedia/dbpedia-vad-i18n
git clone https://github.com/openlink/virtuoso-opensource.git
cd virtuoso-opensource/
git checkout develop/7
# swap in the i18n version of the dbpedia VAD sources
cd binsrc/
mv dbpedia dbpedia.orig
cp -r ../../dbpedia-vad-i18n/dbpedia .
# autogen.sh and configure are run from the top of the source tree
cd ..
./autogen.sh
CFLAGS="-O2 -m64"
export CFLAGS
./configure --prefix=/opt/virtuoso-opensource --with-readline --enable-dbpedia-vad --enable-fct-vad --enable-rdfmappers-vad --with-port=2222
make
sudo make install

The --with-port=2222 option is only needed if you compile a new version of Virtuoso while another instance is already running with the default settings.
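
To decide whether you need that option, you can check whether something is already listening on Virtuoso's default SQL port (1111). The following is only a sketch: it assumes the ss tool from iproute2 is installed, and port_is_listening is a hypothetical helper that just parses the text printed by "ss -ltn".

```shell
# Succeed if the given TCP port shows up as a local listening port in
# "ss -ltn" output read from stdin. Hypothetical helper for illustration.
port_is_listening() {
  awk -v port="$1" '
    NR > 1 {                      # skip the header line
      n = split($4, parts, ":")   # $4 is the "Local Address:Port" column
      if (parts[n] == port) found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}

# Usage:
#   ss -ltn | port_is_listening 1111 && echo "port 1111 is taken, keep --with-port"
```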

Patch the Virtuoso faceted browser if necessary

Display labels for all languages

By default, the description.vsp page installed with Virtuoso does not display literal strings whose language differs from the user’s language.

The user’s language is taken either from the HTTP header (Accept-Language) or from the URL (lang=xx in the GET arguments). A value of “*” displays all languages.

This is inconvenient when using this linked data viewer for multilingual dictionary data (as in DBnary), so this part of description.vsp has to be modified.

To force virtuoso to display all languages, you can patch the binsrc/b3s/rdfdesc/description.vsp file.

Find the part that tries to get the language of the user, something like:

 langs := http_request_header_full (lines, 'Accept-Language', 'en');
 ua := http_request_header (lines, 'User-Agent');
 all_langs := b3s_get_lang_acc (lines);
 lang_parm := get_keyword ('lang', params, '');
 if (length (lang_parm))
 {
 all_langs := vector (lang_parm, 1.0);
 langs := lang_parm;
 }

Add the following lines just before the “if (length (lang_parm))”:

-- GS: force all language strings to be displayed
lang_parm := '*';

Alternatively, you may edit the file after server deployment: use the DAV browser to navigate to the DAV/VAD/fct/rdfdesc folder and edit the file named “description.vsp”.
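
If you rebuild often, the source-tree variant of this patch can be scripted. This is a rough sketch rather than a tested patch against any particular Virtuoso release: force_all_langs is a hypothetical helper that prints a patched copy of the file and leaves the original untouched, so you can diff the result first.

```shell
# Print a copy of description.vsp with the two extra lines inserted just
# before the first "if (length (lang_parm))" test. Hypothetical helper.
force_all_langs() {
  awk '
    !patched && index($0, "if (length (lang_parm))") {
      print " -- GS: force all language strings to be displayed"
      print " lang_parm := \047*\047;"   # \047 is the single quote character
      patched = 1
    }
    { print }
  ' "$1"
}

# Usage, from the virtuoso-opensource checkout:
#   force_all_langs binsrc/b3s/rdfdesc/description.vsp > description.vsp.new
```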

Fix some remaining encoding issues

DBnary uses IRI negotiation, which allows international characters inside RDF node names (URIs/IRIs). However, the faceted browser does not cope well with them.

Among other problems, the navigation links used in faceted browsing pass an ill-formed, non UTF-8 encoded value as the IRI. The symptom: when you browse an entry that has a non-ASCII character in its IRI and click the “Next” button, you get “no other information”.

To fix this, also modify the description.vsp file and change the line:

 <input type="hidden" name="url" value="<?V gr ?>" />

to

 <input type="hidden" name="url" value="<?V page_resource_uri ?>" />
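
The same change can be made with sed. A minimal sketch, assuming the hidden field appears on a single line exactly as shown above; fix_url_field is a hypothetical helper that prints the modified file instead of editing it in place.

```shell
# Print description.vsp with the hidden "url" field switched from
# <?V gr ?> to <?V page_resource_uri ?>. Other uses of <?V gr ?> are kept.
fix_url_field() {
  sed '/name="url"/ s/<?V gr ?>/<?V page_resource_uri ?>/' "$1"
}

# Usage:
#   fix_url_field binsrc/b3s/rdfdesc/description.vsp > description.vsp.new
```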

Setup the virtuoso database directory

mkdir -p /opt/virtuoso/
cd /opt/virtuoso-opensource/var/lib/virtuoso/
mv db /opt/virtuoso/
ln -s /opt/virtuoso/db .
cd /opt/virtuoso/db
vim virtuoso.ini

Edit the .ini file:

  • no need to change the db file declaration (a symbolic link has been used);
  • set DirsAllowed = ., /opt/virtuoso-opensource/share/virtuoso/vad, /opt/datasets/dbnary/
  • adjust the memory settings to fit your computer’s configuration;
  • add “ShortenLongURIs = 1” in the SPARQL section;
  • set MaxCheckpointRemap in the Database section to one quarter of NumberOfBuffers.
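
The last value is simple arithmetic on a setting that is already in the file, so it can be computed rather than worked out by hand. A small sketch, assuming NumberOfBuffers is written as a plain "name = value" line; suggest_checkpoint_remap is a hypothetical helper and only prints the line to copy into virtuoso.ini.

```shell
# Print a MaxCheckpointRemap line set to one quarter of the NumberOfBuffers
# value found in a virtuoso.ini file. Lines commented out with ";" are ignored.
suggest_checkpoint_remap() (
  buffers=$(awk -F= '
    /^[ \t]*NumberOfBuffers[ \t]*=/ { gsub(/[ \t]/, "", $2); print $2; exit }
  ' "$1")
  [ -n "$buffers" ] || exit 1     # setting not found
  echo "MaxCheckpointRemap = $((buffers / 4))"
)

# Usage:
#   suggest_checkpoint_remap /opt/virtuoso/db/virtuoso.ini
```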

Setup automatic startup

sudo cp debian/init.d /etc/init.d/virtuoso-opensource
sudo chmod +x /etc/init.d/virtuoso-opensource
sudo vim /etc/init.d/virtuoso-opensource

In this script, modify:

  • PATH and DAEMON: use the prefix you passed to configure (/opt/virtuoso-opensource here);
  • DBBASE: use the folder you configured in the previous step.

sudo update-rc.d virtuoso-opensource defaults

Setup apache for an external server

Put the following proxy directives in your Apache configuration file.

# The URL of the explanatory website
  Alias /about-dbnary /opt/www/kaiko/dbnary/
  ProxyPass /describe http://localhost:8890/describe
  ProxyPassReverse /describe http://localhost:8890/describe
  ProxyPass /conductor http://localhost:8890/conductor
  ProxyPassReverse /conductor http://localhost:8890/conductor
  ProxyPass /dbnary http://localhost:8890/dbnary
  ProxyPassReverse /dbnary http://localhost:8890/dbnary
# This is mandatory as the virtuoso server redirects to this url (that should be handled by apache).
  ProxyPassReverse /about-dbnary http://localhost:8890/about-dbnary
  ProxyPass /sparql http://localhost:8890/sparql connectiontimeout=300 timeout=300
  ProxyPassReverse /sparql http://localhost:8890/sparql
  ProxyPass /isparql http://localhost:8890/isparql
  ProxyPassReverse /isparql http://localhost:8890/isparql
  ProxyRequests Off
  #ProxyHTMLLogVerbose On
  #LogLevel Debug
<Location /fct>
    ProxyPass               http://localhost:8890/fct
    ProxyPassReverse        /fct
    # SetOutputFilter proxy-html
    # ProxyHTMLEnable         On
    # Apply rewrite rule to css and javascripts
    # ProxyHTMLExtended On
    # convert URLs in CSS and JS
    # ProxyHTMLURLMap "localhost:8890" "kaiko.getalp.org"
    # ProxyHTMLURLMap http://localhost:8890 http://kaiko.getalp.org

    # convert URLs in CSS and JS
    #ProxyHTMLURLMap "\"/fct" "\"/dbnary/fct" 
    #  Enable rewrite rules
    #ProxyHTMLURLMap         /fct /dbnary/fct
    #ProxyHTMLURLMap         http://localhost:8890/fct /dbnary/fct
    # Uncomment this when EnabledGzipContent=1 in virtuoso.ini
    #SetOutputFilter         INFLATE;DEFLATE
</Location>
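
Most of the directives above are plain ProxyPass/ProxyPassReverse pairs that follow one pattern, so if you proxy more paths you may prefer to generate them. A sketch only: proxy_pairs and its BACKEND variable are names invented for this post, and the /sparql entry (which carries extra timeout options) and the /fct Location block are not covered.

```shell
# Print an Apache ProxyPass/ProxyPassReverse pair for each path given.
# BACKEND defaults to the local Virtuoso HTTP port used in this post.
proxy_pairs() (
  backend=${BACKEND:-http://localhost:8890}
  for p in "$@"; do
    printf '  ProxyPass %s %s%s\n' "$p" "$backend" "$p"
    printf '  ProxyPassReverse %s %s%s\n' "$p" "$backend" "$p"
  done
)

# Usage:
#   proxy_pairs /describe /conductor /dbnary /isparql
```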

Prepare database

Launch Virtuoso and go to http://localhost:8890/conductor/
  • System Admin -> User account: change the dav and dba passwords (the default values are dav and dba…)
  • System Admin -> Packages: install the “fct” package
  • System Admin -> Packages: install the “isparql” package (to get an advanced SPARQL interface…)

The script below will do the remaining setup automatically:

  • set up the /dbnary path for linked data access, with content negotiation;
  • add the DBnary namespaces to the list of known namespaces.

DB.DBA.VHOST_REMOVE (
lhost=>'*ini*',
vhost=>'*ini*',
lpath=>'/dbnary'
);

DB.DBA.VHOST_DEFINE (
lhost=>'*ini*',
vhost=>'*ini*',
lpath=>'/dbnary',
ppath=>'/DAV/',
is_dav=>1,
def_page=>'',
vsp_user=>'dba',
ses_vars=>0,
opts=>vector ('browse_sheet', '', 'url_rewrite', 'http_rule_list_1'),
is_default_host=>0
);

DB.DBA.URLREWRITE_CREATE_RULELIST (
'http_rule_list_1', 1,
vector ('http_rule_1', 'http_rule_2', 'http_rule_3', 'http_rule_4'));

DB.DBA.URLREWRITE_CREATE_REGEX_RULE (
'http_rule_1', 1,
'^/(.*)$',
vector ('par_1'),
1,
'/sparql?query=DESCRIBE%%20%%3Chttp%%3A%%2F%%2Fkaiko.getalp.org%%2F%U%%3E&format=%U',
vector ('par_1', '*accept*'),
NULL,
'(text/rdf.n3)|(application/rdf.xml)',
2,
303,
''
);

DB.DBA.URLREWRITE_CREATE_REGEX_RULE (
'http_rule_2', 1,
'^/(.*)$',
vector ('par_1'),
1,
'/describe/?url=http%%3A%%2F%%2Fkaiko.getalp.org%%2F%s',
vector ('par_1'),
NULL,
'(text/html)|(\\*/\\*)',
0,
303,
''
);

DB.DBA.URLREWRITE_CREATE_REGEX_RULE (
'http_rule_3', 1,
'^/dbnary/*$',
vector (),
0,
'/about-dbnary/lemon/dbnary-doc/index.html',
vector (),
NULL,
'(text/html)|(\\*/\\*)',
0,
303,
''
);

DB.DBA.URLREWRITE_CREATE_REGEX_RULE (
'http_rule_4', 1,
'^/dbnary/*$',
vector (),
0,
'/about-dbnary/lemon/latest/dbnary.owl',
vector (),
NULL,
'(text/rdf.n3)|(application/rdf.xml)',
0,
303,
''
);
-- Create namespaces for dbnary

DB.DBA.XML_SET_NS_DECL ('lexinfo', 'http://www.lexinfo.net/ontology/2.0/lexinfo#', 2);
DB.DBA.XML_SET_NS_DECL ('lexvo', 'http://lexvo.org/id/iso639-3/', 2);
DB.DBA.XML_SET_NS_DECL ('dcterms', 'http://purl.org/dc/terms/', 2);
DB.DBA.XML_SET_NS_DECL ('lemon', 'http://lemon-model.net/lemon#', 2);
DB.DBA.XML_SET_NS_DECL ('dbnary', 'http://kaiko.getalp.org/dbnary#', 2);
DB.DBA.XML_SET_NS_DECL ('dbnary-fra', 'http://kaiko.getalp.org/dbnary/fra/', 2);
DB.DBA.XML_SET_NS_DECL ('dbnary-eng', 'http://kaiko.getalp.org/dbnary/eng/', 2);
DB.DBA.XML_SET_NS_DECL ('dbnary-ita', 'http://kaiko.getalp.org/dbnary/ita/', 2);
DB.DBA.XML_SET_NS_DECL ('dbnary-rus', 'http://kaiko.getalp.org/dbnary/rus/', 2);
DB.DBA.XML_SET_NS_DECL ('dbnary-deu', 'http://kaiko.getalp.org/dbnary/deu/', 2);
DB.DBA.XML_SET_NS_DECL ('dbnary-por', 'http://kaiko.getalp.org/dbnary/por/', 2);
DB.DBA.XML_SET_NS_DECL ('dbnary-fin', 'http://kaiko.getalp.org/dbnary/fin/', 2);
DB.DBA.XML_SET_NS_DECL ('dbnary-ell', 'http://kaiko.getalp.org/dbnary/ell/', 2);
DB.DBA.XML_SET_NS_DECL ('dbnary-tur', 'http://kaiko.getalp.org/dbnary/tur/', 2);
DB.DBA.XML_SET_NS_DECL ('dbnary-jpn', 'http://kaiko.getalp.org/dbnary/jpn/', 2);
DB.DBA.XML_SET_NS_DECL ('dbnary-spa', 'http://kaiko.getalp.org/dbnary/spa/', 2);
DB.DBA.XML_SET_NS_DECL ('dbnary-bul', 'http://kaiko.getalp.org/dbnary/bul/', 2);
DB.DBA.XML_SET_NS_DECL ('dbnary-pol', 'http://kaiko.getalp.org/dbnary/pol/', 2);
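
The per-language declarations all have the same shape, so when DBnary gains a language you can generate the statement instead of copying a line. A minimal sketch; dbnary_ns_decls is a hypothetical helper that only prints SQL, which you can then paste into isql.

```shell
# Print one XML_SET_NS_DECL statement per ISO 639-3 language code.
dbnary_ns_decls() {
  for code in "$@"; do
    printf "DB.DBA.XML_SET_NS_DECL ('dbnary-%s', 'http://kaiko.getalp.org/dbnary/%s/', 2);\n" "$code" "$code"
  done
}

# Usage:
#   dbnary_ns_decls fra eng ita rus deu por fin ell tur jpn spa bul pol
```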

You may now stop Virtuoso and make a copy of the database directory. This copy can be reused later as a bootstrap when loading a new version of DBnary.
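
Taking that copy could look like the sketch below. It rests on two assumptions: that a running server keeps a virtuoso.lck file in its database directory, and that a dated directory name is a convenient convention. snapshot_db and the db-bootstrap-… name are made up for this post.

```shell
# Copy a stopped Virtuoso database directory to a dated snapshot directory
# and print the snapshot path. Refuses to run if a lock file is present.
snapshot_db() (
  src=$1
  dest="$2/db-bootstrap-$(date +%Y%m%d)"
  if [ -e "$src/virtuoso.lck" ]; then
    echo "virtuoso.lck found in $src: stop the server first" >&2
    exit 1
  fi
  if [ -e "$dest" ]; then
    echo "$dest already exists" >&2
    exit 1
  fi
  cp -a "$src" "$dest" && echo "$dest"
)

# Usage (the snapshots folder must already exist):
#   snapshot_db /opt/virtuoso/db /opt/virtuoso/snapshots
```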

Load DBnary data

Go to /opt/datasets/dbnary and uncompress all the Turtle files there. For each xxx.ttl file, create an xxx.ttl.graph file containing the URI of the graph into which that file will be loaded. E.g., fr_dbnary_lemon.ttl.graph may contain http://kaiko.getalp.org/dbnary/fra.
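
Writing the .graph files by hand gets tedious with a dozen languages, so here is a sketch of a loop that does it. Only the fr/fra pair comes from the example above. The other mappings and the assumption that every file is named like xx_dbnary_lemon.ttl are mine, and write_graph_files is a hypothetical helper that you should extend with the languages you actually load.

```shell
# For each *_dbnary_lemon.ttl file in a directory, write a matching
# .ttl.graph file holding the graph URI. Files with no known mapping are skipped.
write_graph_files() (
  cd "$1" || exit 1
  for f in *_dbnary_lemon.ttl; do
    [ -e "$f" ] || continue          # the pattern matched nothing
    case "${f%%_*}" in               # language prefix before the first "_"
      fr) code=fra ;;
      en) code=eng ;;                # assumed file prefix
      de) code=deu ;;                # assumed file prefix
      *)  echo "no graph mapping for $f, skipped" >&2; continue ;;
    esac
    printf '%s\n' "http://kaiko.getalp.org/dbnary/$code" > "$f.graph"
  done
)

# Usage:
#   write_graph_files /opt/datasets/dbnary
```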

For the remaining steps, you’ll have to launch isql (inside screen or an NX session, as the load may take a long time).

screen isql
-- we are in sql mode now

ld_dir ('/opt/datasets/dbnary/', '*.ttl', 'http://kaiko.getalp.org/dbnary');

-- do the following to see which files were registered to be added:
SELECT * FROM DB.DBA.LOAD_LIST;
-- if unsatisfied use:
-- delete from DB.DBA.LOAD_LIST;
rdf_loader_run();

-- do nothing too heavy while data is loading
checkpoint;
commit WORK;
checkpoint;
EXIT;

This will take a long time. Do not overload your server while the data is loading. Afterwards, relaunch isql to update the caches and set up faceted browsing:

isql
sparql SELECT COUNT(*) WHERE { ?s ?p ?o } ;
sparql SELECT ?g COUNT(*) { GRAPH ?g {?s ?p ?o.} } GROUP BY ?g ORDER BY DESC 2;

-- Build Full Text Indexes by running the following commands using the Virtuoso isql program 
RDF_OBJ_FT_RULE_ADD (null, null, 'All');
VT_INC_INDEX_DB_DBA_RDF_OBJ ();
-- Run the following procedure using the Virtuoso isql program to populate label lookup tables periodically and activate the Label text box of the Entity Label Lookup tab:
urilbl_ac_init_db();
-- Run the following procedure using the Virtuoso isql program to calculate the IRI ranks. Note this should be run periodically as the data grows to re-rank the IRIs.
s_rank();