Package org.getalp.dbnary.languages.rus
Class WiktionaryExtractor
java.lang.Object
  org.getalp.dbnary.languages.AbstractWiktionaryExtractor
    org.getalp.dbnary.languages.rus.WiktionaryExtractor
All Implemented Interfaces: IWiktionaryExtractor
public class WiktionaryExtractor extends AbstractWiktionaryExtractor
Author: serasset

Field Summary
Modifier and Type                               Field
protected int                                   DEFBLOCK
protected RussianDefinitionExtractorWikiModel   definitionExtractor
protected boolean                               isCurrentlyExtracting
protected static Pattern                        languageSectionPattern
protected static String                         languageSectionPatternString
protected RussianMorphoExtractorWikiModel       morphoExtractor
protected static HashMap<String,String>         nymMarkerToNymName
protected static HashSet<String>                posMarkers
protected static String                         pronounciationPatternString
protected static Pattern                        sectionPattern
protected static String                         sectionPatternString
protected RussianTranslationExtractorWikiModel  translationExtractor

Fields inherited from class org.getalp.dbnary.languages.AbstractWiktionaryExtractor
debutOrfinDecomPatternString, NON_STANDARD_LANGUAGE_MAPPINGS, pageContent, wdh, wi, xmlCommentPattern

Constructor Summary
Constructor
WiktionaryExtractor(IWiktionaryDataHandler wdh)

Method Summary
Modifier and Type  Method
void               extractData()
void               extractDefinition(String definition, int defLevel)
void               setWiktionaryIndex(WiktionaryPageSource wi)

Methods inherited from class org.getalp.dbnary.languages.AbstractWiktionaryExtractor
cleanUpMarkup, cleanUpMarkup, computeRegionEnd, computeStatistics, convertToHumanReadableForm, extractData, extractDefinition, extractDefinitions, extractExample, extractExample, extractNyms, extractOrthoAlt, filterOutPage, getHumanReadableForm, getWiktionaryPageName, populateMetadata, postProcessData, postProcessModel, removeXMLComments, setWiktionaryPageName, stripParentheses, validateAndStandardizeLanguageCode

Field Detail

languageSectionPatternString
protected static final String languageSectionPatternString
See Also: Constant Field Values

sectionPatternString
protected static final String sectionPatternString
See Also: Constant Field Values

pronounciationPatternString
protected static final String pronounciationPatternString
See Also: Constant Field Values

definitionExtractor
protected RussianDefinitionExtractorWikiModel definitionExtractor

translationExtractor
protected RussianTranslationExtractorWikiModel translationExtractor

morphoExtractor
protected RussianMorphoExtractorWikiModel morphoExtractor

DEFBLOCK
protected final int DEFBLOCK
See Also: Constant Field Values

languageSectionPattern
protected static final Pattern languageSectionPattern

sectionPattern
protected static final Pattern sectionPattern

isCurrentlyExtracting
protected boolean isCurrentlyExtracting
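
Example (illustrative only). The field list pairs each pattern string (sectionPatternString, languageSectionPatternString) with a compiled Pattern (sectionPattern, languageSectionPattern). The sketch below shows that general idiom using only java.util.regex. The regular expression, the class name SectionPatternSketch and the helper headings() are assumptions made for this example; the actual constant values used by WiktionaryExtractor are not shown on this page and may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SectionPatternSketch {
    // Hypothetical regex, NOT the value of sectionPatternString in DBnary.
    // It matches a wiki heading such as "== Title ==" that fills a whole line.
    static final String SECTION_PATTERN_STRING =
        "^(=+)[ \\t]*(.+?)[ \\t]*\\1[ \\t]*$";

    // Compiled once and shared, mirroring the String/Pattern field pairing.
    static final Pattern SECTION_PATTERN =
        Pattern.compile(SECTION_PATTERN_STRING, Pattern.MULTILINE);

    /** Returns "level:title" for every heading found in the page text. */
    static List<String> headings(String pageContent) {
        List<String> found = new ArrayList<>();
        Matcher m = SECTION_PATTERN.matcher(pageContent);
        while (m.find()) {
            // group(1) is the run of '=' signs, group(2) is the heading text.
            found.add(m.group(1).length() + ":" + m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        String page = "= {{-ru-}} =\nsome text\n=== Meaning ===\n# definition\n";
        System.out.println(headings(page)); // prints [1:{{-ru-}}, 3:Meaning]
    }
}
```

Keeping the regex source in a String constant and the compiled form in a static final Pattern means the expression is compiled only once per class load, and the string stays available for composing larger patterns.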

Constructor Detail

WiktionaryExtractor
public WiktionaryExtractor(IWiktionaryDataHandler wdh)

Method Detail

setWiktionaryIndex
public void setWiktionaryIndex(WiktionaryPageSource wi)
Specified by: setWiktionaryIndex in interface IWiktionaryExtractor
Overrides: setWiktionaryIndex in class AbstractWiktionaryExtractor

extractData
public void extractData()
Specified by: extractData in class AbstractWiktionaryExtractor

extractDefinition
public void extractDefinition(String definition, int defLevel)
Overrides: extractDefinition in class AbstractWiktionaryExtractor