Package org.getalp.dbnary.languages.por
Class WiktionaryExtractor
- java.lang.Object
  - org.getalp.dbnary.languages.AbstractWiktionaryExtractor
    - org.getalp.dbnary.languages.por.WiktionaryExtractor
- All Implemented Interfaces: IWiktionaryExtractor
public class WiktionaryExtractor extends AbstractWiktionaryExtractor
- Author: serasset
Field Summary
Modifier and Type                                  Field
protected PortugueseDefinitionExtractorWikiModel   definitionExtractor
protected boolean                                  isCurrentlyExtracting
protected static Pattern                           languageSectionPattern
protected static String                            languageSectionPatternString
protected static Pattern                           level1HeaderPattern
protected static String                            level1HeaderPatternString
protected static HashMap<String,String>            nymMarkerToNymName
protected static HashSet<String>                   posMarkers
protected static String                            pronounciationPatternString
protected static Pattern                           sectionPattern
protected static String                            sectionPatternString
protected PortugueseTranslationExtractorWikiModel  translationExtractor
Fields inherited from class org.getalp.dbnary.languages.AbstractWiktionaryExtractor
debutOrfinDecomPatternString, expander, NON_STANDARD_LANGUAGE_MAPPINGS, pageContent, wdh, wi, xmlCommentPattern
Constructor Summary
Constructor
WiktionaryExtractor(IWiktionaryDataHandler wdh)
Method Summary
Modifier and Type  Method
void               extractData()
void               extractDefinition(String definition, int defLevel)
boolean            isCurrentlyExtracting()
void               setWiktionaryIndex(WiktionaryPageSource wi)
Methods inherited from class org.getalp.dbnary.languages.AbstractWiktionaryExtractor
cleanUpMarkup, cleanUpMarkup, computeRegionEnd, computeStatistics, convertToHumanReadableForm, extractData, extractDefinition, extractDefinitions, extractExample, extractExample, extractNyms, extractOrthoAlt, filterOutPage, getWiktionaryPageName, populateMetadata, postProcessData, postProcessModel, removeXMLComments, setWiktionaryPageName, stripParentheses, validateAndStandardizeLanguageCode
Field Detail
-
languageSectionPatternString
protected static final String languageSectionPatternString
- See Also: Constant Field Values
-
level1HeaderPatternString
protected static final String level1HeaderPatternString
- See Also: Constant Field Values
-
sectionPatternString
protected static final String sectionPatternString
- See Also: Constant Field Values
-
pronounciationPatternString
protected static final String pronounciationPatternString
- See Also: Constant Field Values
-
sectionPattern
protected static final Pattern sectionPattern
-
languageSectionPattern
protected static final Pattern languageSectionPattern
-
level1HeaderPattern
protected static final Pattern level1HeaderPattern
-
isCurrentlyExtracting
protected boolean isCurrentlyExtracting
-
definitionExtractor
protected PortugueseDefinitionExtractorWikiModel definitionExtractor
-
translationExtractor
protected PortugueseTranslationExtractorWikiModel translationExtractor
Constructor Detail
-
WiktionaryExtractor
public WiktionaryExtractor(IWiktionaryDataHandler wdh)
Method Detail
-
setWiktionaryIndex
public void setWiktionaryIndex(WiktionaryPageSource wi)
- Specified by: setWiktionaryIndex in interface IWiktionaryExtractor
- Overrides: setWiktionaryIndex in class AbstractWiktionaryExtractor
-
isCurrentlyExtracting
public boolean isCurrentlyExtracting()
-
extractData
public void extractData()
- Specified by: extractData in class AbstractWiktionaryExtractor
-
extractDefinition
public void extractDefinition(String definition, int defLevel)
- Overrides: extractDefinition in class AbstractWiktionaryExtractor