Setting up a virtuoso-opensource server to mirror DBnary data
Download and Compile virtuoso-opensource
First, you’ll have to install several development libraries (this is for Debian):
sudo aptitude install autoconf autoheader automake bison flex gawk gperf libtool build-essential autotools-dev
sudo aptitude install libssl-dev libxml2 libxml2-dev imagemagick libreadline-dev libldap-dev libmagickwand5 libmagickwand-dev libwbxml2-dev libwbxml2-0
Then clone the dbpedia-vad-i18n and virtuoso-opensource git repositories, set up the i18n version of the dbpedia VAD (not sure this is really useful…), then configure, make, make install, and pour coffee…
git clone https://github.com/dbpedia/dbpedia-vad-i18n
git clone https://github.com/openlink/virtuoso-opensource.git
cd virtuoso-opensource/
git checkout develop/7
cd binsrc/
mv dbpedia dbpedia.orig
cp -r ../../dbpedia-vad-i18n/dbpedia .
# go back to the source root, where autogen.sh and configure live
cd ..
./autogen.sh
CFLAGS="-O2 -m64"
export CFLAGS
./configure --prefix=/opt/virtuoso-opensource --with-readline --enable-dbpedia-vad --enable-fct-vad --enable-rdfmappers-vad --with-port=2222
make
sudo make install
The --with-port=2222 option is only needed if you compile a new version of virtuoso while another instance is already running with the default settings.
Patch the virtuoso faceted browser if necessary
Display labels for all languages
By default, the description.vsp program installed with virtuoso does not display literal strings whose language differs from the language of the user.
The language of the user is taken either from the HTTP header (Accept-Language) or from the URL (lang=xx in the GET arguments). A value of “*” displays all languages.
This is inconvenient when using this linked data viewer for multilingual dictionary data (as in DBnary). Hence you have to modify this part of the description.vsp page.
To force virtuoso to display all languages, you can patch the binsrc/b3s/rdfdesc/description.vsp file.
Find the part that tries to get the language of the user, something like:
langs := http_request_header_full (lines, 'Accept-Language', 'en');
ua := http_request_header (lines, 'User-Agent');
all_langs := b3s_get_lang_acc (lines);
lang_parm := get_keyword ('lang', params, '');
if (length (lang_parm))
  {
    all_langs := vector (lang_parm, 1.0);
    langs := lang_parm;
  }
Add the following lines just before the “if (length (lang_parm))”:
-- GS: force all language strings to be displayed
lang_parm := '*';
Alternatively, you may edit the file after server deployment: use the DAV browser to navigate to DAV/VAD/fct/rdfdesc and edit the file named “description.vsp”.
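If you would rather script this patch than edit by hand, here is a minimal sketch. It assumes the file contains the text "if (length (lang_parm))" on a single line, as in the excerpt above; the function name force_all_langs is made up for this example. It prints the patched file to standard output so that you can review it before replacing anything.

```shell
# Hypothetical helper: print FILE with "lang_parm := '*';" inserted
# just before each line containing "if (length (lang_parm))".
force_all_langs() {
    awk '
        /if \(length \(lang_parm\)\)/ {
            print "  -- force all language strings to be displayed"
            print "  lang_parm := \047*\047;"   # \047 is a single quote
        }
        { print }
    ' "$1"
}

# Usage (review the result before overwriting the original):
# force_all_langs binsrc/b3s/rdfdesc/description.vsp > description.vsp.new
```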
Fix some remaining encoding issues
DBnary uses IRI negotiation, which allows international characters inside RDF node names (URIs/IRIs). However, the faceted browser does not tolerate such names well.
Among other problems, the navigation in the faceted browser uses an ill-formed, non-UTF-8-encoded value as the IRI. The symptom: when you browse an entry that has a non-ASCII character in its IRI and click the “Next” button, you get “no other information”.
To fix this, also modify the description.vsp file and change the line:
<input type="hidden" name="url" value="<?V gr ?>" />
to
<input type="hidden" name="url" value="<?V page_resource_uri ?>" />
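The same change can be sketched as a sed substitution. This assumes the hidden field appears in the file exactly as shown above (same attribute order and spacing); fix_url_field is a made-up name, and the result goes to standard output for review.

```shell
# Hypothetical helper: print FILE with the hidden "url" field switched
# from <?V gr ?> to <?V page_resource_uri ?>.
fix_url_field() {
    sed 's|name="url" value="<?V gr ?>"|name="url" value="<?V page_resource_uri ?>"|' "$1"
}

# Usage:
# fix_url_field description.vsp > description.vsp.new
```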
Setup the virtuoso database directory
mkdir -p /opt/virtuoso/
cd /opt/virtuoso-opensource/var/lib/virtuoso/
mv db /opt/virtuoso/
ln -s /opt/virtuoso/db .
cd /opt/virtuoso/db
vim virtuoso.ini
Edit the .ini file:
- no need to change the db file declaration (a symbolic link has been used).
- DirsAllowed = ., /opt/virtuoso-opensource/share/virtuoso/vad, /opt/datasets/dbnary/
- adjust memory settings to fit your computer’s configuration
- add “ShortenLongURIs = 1” in the SPARQL section
- set MaxCheckpointRemap in the database section to one quarter of NumberOfBuffers
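The last rule is plain integer arithmetic. As a small sketch, with a made-up helper name and an arbitrary example value for NumberOfBuffers (choose yours according to the RAM available):

```shell
# Hypothetical helper: MaxCheckpointRemap as one quarter of NumberOfBuffers.
max_checkpoint_remap() {
    echo $(( $1 / 4 ))
}

# Example value only, not a recommendation.
buffers=680000
printf 'NumberOfBuffers    = %s\n' "$buffers"
printf 'MaxCheckpointRemap = %s\n' "$(max_checkpoint_remap "$buffers")"
```

Copy the two resulting values into virtuoso.ini by hand.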
Setup automatic startup
sudo cp debian/init.d /etc/init.d/virtuoso-opensource
sudo chmod +x /etc/init.d/virtuoso-opensource
sudo vim /etc/init.d/virtuoso-opensource
Modify:
PATH, DAEMON: use the prefix you gave at the configure step…
DBBASE: use the folder you configured in the previous step…
sudo update-rc.d virtuoso-opensource defaults
Setup apache for an external server
Put the following proxy passes in your apache conf file.
# The URL to the explicative website
Alias /about-dbnary /opt/www/kaiko/dbnary/

ProxyPass /describe http://localhost:8890/describe
ProxyPassReverse /describe http://localhost:8890/describe
ProxyPass /conductor http://localhost:8890/conductor
ProxyPassReverse /conductor http://localhost:8890/conductor
ProxyPass /dbnary http://localhost:8890/dbnary
ProxyPassReverse /dbnary http://localhost:8890/dbnary
# This is mandatory as the virtuoso server redirects to this url (that should be handled by apache).
ProxyPassReverse /about-dbnary http://localhost:8890/about-dbnary
ProxyPass /sparql http://localhost:8890/sparql connectiontimeout=300 timeout=300
ProxyPassReverse /sparql http://localhost:8890/sparql
ProxyPass /isparql http://localhost:8890/isparql
ProxyPassReverse /isparql http://localhost:8890/isparql

ProxyRequests Off
#ProxyHTMLLogVerbose On
#LogLevel Debug

<Location /fct>
ProxyPass http://localhost:8890/fct
ProxyPassReverse /fct
# SetOutputFilter proxy-html
# ProxyHTMLEnable On
# Apply rewrite rule to css and javascripts
# ProxyHTMLExtended On
# convert URLs in CSS and JS
# ProxyHTMLURLMap "localhost:8890" "kaiko.getalp.org"
# ProxyHTMLURLMap http://localhost:8890 http://kaiko.getalp.org
# convert URLs in CSS and JS
#ProxyHTMLURLMap "\"/fct" "\"/dbnary/fct"
# Enable rewrite rules
#ProxyHTMLURLMap /fct /dbnary/fct
#ProxyHTMLURLMap http://localhost:8890/fct /dbnary/fct
# Uncomment this when EnabledGzipContent=1 in virtuoso.ini
#SetOutputFilter INFLATE;DEFLATE
</Location>
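Most of the configuration above is the same ProxyPass/ProxyPassReverse pair repeated for each path. If you need to add paths, a small generator may help avoid typos. This is only a sketch: proxy_pairs is a made-up name, and it covers only the plain pairs (the /sparql timeouts and the /fct Location block still have to be written by hand).

```shell
# Hypothetical helper: print a ProxyPass/ProxyPassReverse pair for each
# path given, all pointing at the backend passed as first argument.
proxy_pairs() {
    backend=$1
    shift
    for p in "$@"; do
        printf 'ProxyPass %s %s%s\n' "$p" "$backend" "$p"
        printf 'ProxyPassReverse %s %s%s\n' "$p" "$backend" "$p"
    done
}

proxy_pairs http://localhost:8890 /describe /conductor /dbnary /isparql
```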
Prepare database
- System Admin -> User account: change the dav and dba passwords (default values are dav and dba…)
- System Admin -> Packages: install package “fct”
- System Admin -> Packages: install package “isparql” (to get an advanced SPARQL interface…)
The script below does the remaining setup automatically:
- set up the /dbnary path for linked data access, with content negotiation;
- add the DBnary namespaces to the list of known namespaces.
DB.DBA.VHOST_REMOVE (
  lhost=>'*ini*',
  vhost=>'*ini*',
  lpath=>'/dbnary'
);

DB.DBA.VHOST_DEFINE (
  lhost=>'*ini*',
  vhost=>'*ini*',
  lpath=>'/dbnary',
  ppath=>'/DAV/',
  is_dav=>1,
  def_page=>'',
  vsp_user=>'dba',
  ses_vars=>0,
  opts=>vector ('browse_sheet', '', 'url_rewrite', 'http_rule_list_1'),
  is_default_host=>0
);

DB.DBA.URLREWRITE_CREATE_RULELIST (
  'http_rule_list_1', 1,
  vector ('http_rule_1', 'http_rule_2', 'http_rule_3', 'http_rule_4'));

DB.DBA.URLREWRITE_CREATE_REGEX_RULE (
  'http_rule_1', 1,
  '^/(.*)$',
  vector ('par_1'),
  1,
  '/sparql?query=DESCRIBE%%20%%3Chttp%%3A%%2F%%2Fkaiko.getalp.org%%2F%U%%3E&format=%U',
  vector ('par_1', '*accept*'),
  NULL,
  '(text/rdf.n3)|(application/rdf.xml)',
  2,
  303,
  ''
);

DB.DBA.URLREWRITE_CREATE_REGEX_RULE (
  'http_rule_2', 1,
  '^/(.*)$',
  vector ('par_1'),
  1,
  '/describe/?url=http%%3A%%2F%%2Fkaiko.getalp.org%%2F%s',
  vector ('par_1'),
  NULL,
  '(text/html)|(\\*/\\*)',
  0,
  303,
  ''
);

DB.DBA.URLREWRITE_CREATE_REGEX_RULE (
  'http_rule_3', 1,
  '^/dbnary/*$',
  vector (),
  0,
  '/about-dbnary/lemon/dbnary-doc/index.html',
  vector (),
  NULL,
  '(text/html)|(\\*/\\*)',
  0,
  303,
  ''
);

DB.DBA.URLREWRITE_CREATE_REGEX_RULE (
  'http_rule_4', 1,
  '^/dbnary/*$',
  vector (),
  0,
  '/about-dbnary/lemon/latest/dbnary.owl',
  vector (),
  NULL,
  '(text/rdf.n3)|(application/rdf.xml)',
  0,
  303,
  ''
);

-- Create namespaces for dbnary
DB.DBA.XML_SET_NS_DECL ('lexinfo', 'http://www.lexinfo.net/ontology/2.0/lexinfo#', 2);
DB.DBA.XML_SET_NS_DECL ('lexvo', 'http://lexvo.org/id/iso639-3/', 2);
DB.DBA.XML_SET_NS_DECL ('dcterms', 'http://purl.org/dc/terms/', 2);
DB.DBA.XML_SET_NS_DECL ('lemon', 'http://lemon-model.net/lemon#', 2);
DB.DBA.XML_SET_NS_DECL ('dbnary', 'http://kaiko.getalp.org/dbnary#', 2);
DB.DBA.XML_SET_NS_DECL ('dbnary-fra', 'http://kaiko.getalp.org/dbnary/fra/', 2);
DB.DBA.XML_SET_NS_DECL ('dbnary-eng', 'http://kaiko.getalp.org/dbnary/eng/', 2);
DB.DBA.XML_SET_NS_DECL ('dbnary-ita', 'http://kaiko.getalp.org/dbnary/ita/', 2);
DB.DBA.XML_SET_NS_DECL ('dbnary-rus', 'http://kaiko.getalp.org/dbnary/rus/', 2);
DB.DBA.XML_SET_NS_DECL ('dbnary-deu', 'http://kaiko.getalp.org/dbnary/deu/', 2);
DB.DBA.XML_SET_NS_DECL ('dbnary-por', 'http://kaiko.getalp.org/dbnary/por/', 2);
DB.DBA.XML_SET_NS_DECL ('dbnary-fin', 'http://kaiko.getalp.org/dbnary/fin/', 2);
DB.DBA.XML_SET_NS_DECL ('dbnary-ell', 'http://kaiko.getalp.org/dbnary/ell/', 2);
DB.DBA.XML_SET_NS_DECL ('dbnary-tur', 'http://kaiko.getalp.org/dbnary/tur/', 2);
DB.DBA.XML_SET_NS_DECL ('dbnary-jpn', 'http://kaiko.getalp.org/dbnary/jpn/', 2);
DB.DBA.XML_SET_NS_DECL ('dbnary-spa', 'http://kaiko.getalp.org/dbnary/spa/', 2);
DB.DBA.XML_SET_NS_DECL ('dbnary-bul', 'http://kaiko.getalp.org/dbnary/bul/', 2);
DB.DBA.XML_SET_NS_DECL ('dbnary-pol', 'http://kaiko.getalp.org/dbnary/pol/', 2);
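The per-language namespace declarations all follow one pattern, so when DBnary gains a language you can generate the extra statement instead of copying a line by hand. A minimal sketch, where dbnary_ns_decls is a made-up name and the arguments are the three-letter codes used in the script above:

```shell
# Hypothetical helper: print one XML_SET_NS_DECL statement per
# three-letter language code given as argument.
dbnary_ns_decls() {
    for code in "$@"; do
        printf "DB.DBA.XML_SET_NS_DECL ('dbnary-%s', 'http://kaiko.getalp.org/dbnary/%s/', 2);\n" "$code" "$code"
    done
}

dbnary_ns_decls fra eng ita rus deu por fin ell tur jpn spa bul pol
```

The printed statements can then be pasted into an isql session.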
You may now stop virtuoso and duplicate the database directory; the copy can be reused later as a bootstrap for a new version of DBnary.
Load DBnary data
Go to /opt/datasets/dbnary and uncompress all Turtle files there. For each xxx.ttl file, create an xxx.ttl.graph file containing the URI of the graph into which that file will be loaded. E.g., fr_dbnary_lemon.ttl.graph may contain http://kaiko.getalp.org/dbnary/fra.
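Creating the .graph files can be scripted. The sketch below is an assumption-laden example: the function names are made up, and only the fr to fra mapping comes from the text above, so you must add one case line per language you actually load. Files with no known mapping are skipped with a warning.

```shell
# Map a file name to its graph URI. Only the "fr_" case comes from the
# text above; add the other languages you load.
graph_uri_for() {
    case "$1" in
        fr_*) echo "http://kaiko.getalp.org/dbnary/fra" ;;
        *)    return 1 ;;
    esac
}

# Hypothetical helper: write an xxx.ttl.graph file next to every
# xxx.ttl file found in directory DIR.
write_graph_files() {
    for f in "$1"/*.ttl; do
        [ -e "$f" ] || continue          # no .ttl files at all
        if uri=$(graph_uri_for "$(basename "$f")"); then
            printf '%s\n' "$uri" > "$f.graph"
        else
            echo "no graph known for $f, skipping" >&2
        fi
    done
}

# Usage:
# write_graph_files /opt/datasets/dbnary
```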
For the remaining steps, you’ll have to launch isql (inside screen or an nx session, as loading may take a long time).
screen
isql
-- we are in sql mode now
ld_dir ('/opt/datasets/dbnary/', '*.ttl', 'http://kaiko.getalp.org/dbnary');
-- do the following to see which files were registered to be added:
SELECT * FROM DB.DBA.LOAD_LIST;
-- if unsatisfied use:
-- delete from DB.DBA.LOAD_LIST;
rdf_loader_run();
-- do nothing too heavy while data is loading
checkpoint;
commit WORK;
checkpoint;
EXIT;
This will take a long time. Do not overload your server while the data is loading. Afterwards, relaunch isql to update the caches and set up faceted browsing:
isql
sparql SELECT COUNT(*) WHERE { ?s ?p ?o } ;
sparql SELECT ?g COUNT(*) { GRAPH ?g {?s ?p ?o.} } GROUP BY ?g ORDER BY DESC 2;
-- Build Full Text Indexes by running the following commands using the Virtuoso isql program
RDF_OBJ_FT_RULE_ADD (null, null, 'All');
VT_INC_INDEX_DB_DBA_RDF_OBJ ();
-- Run the following procedure using the Virtuoso isql program to populate label lookup tables periodically and activate the Label text box of the Entity Label Lookup tab:
urilbl_ac_init_db();
-- Run the following procedure using the Virtuoso isql program to calculate the IRI ranks. Note this should be run periodically as the data grows to re-rank the IRIs.
s_rank();