Package org.getalp.dbnary.languages.spa
Class WiktionaryExtractor
java.lang.Object
  org.getalp.dbnary.languages.AbstractWiktionaryExtractor
    org.getalp.dbnary.languages.spa.WiktionaryExtractor
All Implemented Interfaces: IWiktionaryExtractor
public class WiktionaryExtractor extends AbstractWiktionaryExtractor
Field Summary
Modifier and Type                               Field
protected SpanishDefinitionExtractorWikiModel   definitionExpander
protected SpanishHeaderExtractorWikiModel       headerExtractor
protected static String                         headerPatternString
protected static HashSet<String>                ignorableSectionMarkers
protected static String                         languageSectionPatternString
protected static String                         multilineMacroPatternString
protected static HashMap<String,String>         nymMarkerToNymName
protected static HashSet<String>                posMarkersPrefixes
protected static Pattern                        sectionPattern
protected static String                         sectionPatternString
protected static Pattern                        spanishDefinitionPattern
protected static String                         spanishDefinitionPatternString
protected SpanishTranslationExtractorWikiModel  translationExtractor
Fields inherited from class org.getalp.dbnary.languages.AbstractWiktionaryExtractor
debutOrfinDecomPatternString, expander, NON_STANDARD_LANGUAGE_MAPPINGS, pageContent, wdh, wi, xmlCommentPattern
Constructor Summary
Constructor
WiktionaryExtractor(IWiktionaryDataHandler wdh)
Method Summary
Modifier and Type  Method
void               extractData()
void               extractData(WikiText page)
void               extractDefinition(String definition, String senseNumber)
protected void     extractDefinitions(int startOffset, int endOffset)
void               setWiktionaryIndex(WiktionaryPageSource wi)
Methods inherited from class org.getalp.dbnary.languages.AbstractWiktionaryExtractor
cleanUpMarkup, cleanUpMarkup, computeRegionEnd, computeStatistics, convertToHumanReadableForm, extractData, extractDefinition, extractDefinition, extractExample, extractExample, extractNyms, extractOrthoAlt, filterOutPage, getWiktionaryPageName, populateMetadata, postProcessData, postProcessModel, removeXMLComments, setWiktionaryPageName, stripParentheses, validateAndStandardizeLanguageCode
Field Detail
languageSectionPatternString
protected static final String languageSectionPatternString

headerPatternString
protected static final String headerPatternString

spanishDefinitionPatternString
protected static final String spanishDefinitionPatternString

sectionPatternString
protected static final String sectionPatternString

spanishDefinitionPattern
protected static final Pattern spanishDefinitionPattern

multilineMacroPatternString
protected static final String multilineMacroPatternString

sectionPattern
protected static final Pattern sectionPattern

definitionExpander
protected SpanishDefinitionExtractorWikiModel definitionExpander

headerExtractor
protected SpanishHeaderExtractorWikiModel headerExtractor

translationExtractor
protected SpanishTranslationExtractorWikiModel translationExtractor
Constructor Detail
WiktionaryExtractor
public WiktionaryExtractor(IWiktionaryDataHandler wdh)
Method Detail
setWiktionaryIndex
public void setWiktionaryIndex(WiktionaryPageSource wi)
Specified by: setWiktionaryIndex in interface IWiktionaryExtractor
Overrides: setWiktionaryIndex in class AbstractWiktionaryExtractor

extractData
public void extractData()
Specified by: extractData in class AbstractWiktionaryExtractor

extractData
public void extractData(WikiText page)

extractDefinitions
protected void extractDefinitions(int startOffset, int endOffset)
Overrides: extractDefinitions in class AbstractWiktionaryExtractor