Package org.getalp.dbnary.languages.eng
Class WiktionaryExtractor
java.lang.Object
  org.getalp.dbnary.languages.AbstractWiktionaryExtractor
    org.getalp.dbnary.languages.eng.WiktionaryExtractor
- All Implemented Interfaces: IWiktionaryExtractor
public class WiktionaryExtractor extends AbstractWiktionaryExtractor
- Author: serasset, pantaleo
Field Summary
Modifier and Type             Field
protected CombinedWikiModel   combinedExpander
Fields inherited from class org.getalp.dbnary.languages.AbstractWiktionaryExtractor
debutOrfinDecomPatternString, expander, NON_STANDARD_LANGUAGE_MAPPINGS, pageContent, wdh, wi, xmlCommentPattern
Constructor Summary
Constructor
WiktionaryExtractor(IWiktionaryDataHandler wdh)
Method Summary
Modifier and Type   Method
Set<org.apache.commons.lang3.tuple.Pair<org.apache.jena.rdf.model.Property,org.apache.jena.rdf.model.RDFNode>>   expandCitation(String example)
Set<org.apache.commons.lang3.tuple.Pair<org.apache.jena.rdf.model.Property,org.apache.jena.rdf.model.RDFNode>>   expandExample(String example)
void                extractData()
void                extractDefinition(String definition, int defLevel)
protected void      extractDerived(int blockStart, int end)
protected void      extractDescendants(int blockStart, int end)
protected void      extractEtymology(int blockStart, int end)
void                extractExample(String example)
protected void      extractLanguageData(ISO639_3.Lang lg, WikiText.WikiContent content)
protected void      extractNyms(String synRelation, WikiText.WikiContent blockContent)
protected void      extractOrthoAlt(WikiText.WikiContent blockContent)
protected void      extractPron(WikiText.WikiContent pronContent)
boolean             filterOutPage(String pagename)
void                postProcessData(String dumpFileVersion)
void                setWiktionaryIndex(WiktionaryPageSource wi)
protected void      setWiktionaryPageName(String wiktionaryPageName)
Methods inherited from class org.getalp.dbnary.languages.AbstractWiktionaryExtractor
cleanUpMarkup, cleanUpMarkup, computeRegionEnd, computeStatistics, convertToHumanReadableForm, extractData, extractDefinition, extractDefinitions, extractExample, extractNyms, extractOrthoAlt, getWiktionaryPageName, populateMetadata, postProcessModel, removeXMLComments, stripParentheses, validateAndStandardizeLanguageCode
Field Detail
combinedExpander
protected CombinedWikiModel combinedExpander
Constructor Detail
WiktionaryExtractor
public WiktionaryExtractor(IWiktionaryDataHandler wdh)
Method Detail
setWiktionaryIndex
public void setWiktionaryIndex(WiktionaryPageSource wi)
- Specified by: setWiktionaryIndex in interface IWiktionaryExtractor
- Overrides: setWiktionaryIndex in class AbstractWiktionaryExtractor
-
setWiktionaryPageName
protected void setWiktionaryPageName(String wiktionaryPageName)
- Overrides: setWiktionaryPageName in class AbstractWiktionaryExtractor
-
filterOutPage
public boolean filterOutPage(String pagename)
- Overrides: filterOutPage in class AbstractWiktionaryExtractor
- Parameters: pagename - the name of the page
- Returns: true if and only if the page should be ignored during extraction.
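The boolean contract of filterOutPage can be illustrated with a minimal, self-contained sketch. It does not reproduce the actual rules used by WiktionaryExtractor, which this page does not document. The class name PageFilterSketch and the prefix list are hypothetical and chosen only for illustration; the one thing taken from the documentation above is the convention that true means "skip this page".

```java
import java.util.List;

// Hypothetical stand-in for an extractor; not part of the dbnary API.
public class PageFilterSketch {

    // Assumed rule for illustration only: skip pages whose names begin with
    // one of these namespace prefixes. The real extractor may use other criteria.
    private static final List<String> IGNORED_PREFIXES =
            List.of("Template:", "Category:", "Appendix:");

    // Same contract as the documented method: true means the page is ignored.
    public static boolean filterOutPage(String pagename) {
        for (String prefix : IGNORED_PREFIXES) {
            if (pagename.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (String name : List.of("Template:en-noun", "cat")) {
            String decision = filterOutPage(name) ? "ignored" : "extracted";
            System.out.println(name + " -> " + decision);
        }
    }
}
```

A caller that follows this contract would test filterOutPage before invoking extractData, so that ignored pages never reach the data handler.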
-
extractData
public void extractData()
- Specified by: extractData in class AbstractWiktionaryExtractor
-
extractLanguageData
protected void extractLanguageData(ISO639_3.Lang lg, WikiText.WikiContent content)
-
extractDefinition
public void extractDefinition(String definition, int defLevel)
- Overrides: extractDefinition in class AbstractWiktionaryExtractor
-
extractExample
public void extractExample(String example)
- Overrides: extractExample in class AbstractWiktionaryExtractor
-
expandCitation
public Set<org.apache.commons.lang3.tuple.Pair<org.apache.jena.rdf.model.Property,org.apache.jena.rdf.model.RDFNode>> expandCitation(String example)
-
expandExample
public Set<org.apache.commons.lang3.tuple.Pair<org.apache.jena.rdf.model.Property,org.apache.jena.rdf.model.RDFNode>> expandExample(String example)
-
extractEtymology
protected void extractEtymology(int blockStart, int end)
-
extractDerived
protected void extractDerived(int blockStart, int end)
-
extractDescendants
protected void extractDescendants(int blockStart, int end)
-
extractNyms
protected void extractNyms(String synRelation, WikiText.WikiContent blockContent)
-
extractPron
protected void extractPron(WikiText.WikiContent pronContent)
-
extractOrthoAlt
protected void extractOrthoAlt(WikiText.WikiContent blockContent)
-
postProcessData
public void postProcessData(String dumpFileVersion)
- Specified by: postProcessData in interface IWiktionaryExtractor
- Overrides: postProcessData in class AbstractWiktionaryExtractor