Package org.getalp.dbnary.morphology
Class RefactoredTableExtractor
- java.lang.Object
-
- org.getalp.dbnary.morphology.RefactoredTableExtractor
-
- All Implemented Interfaces:
Cloneable
- Direct Known Subclasses:
CatalanConjugationTableExtractor, FrenchAccordsTableExtractor, ImpersonalMoodTableExtractor, StandardMoodTableExtractor
public abstract class RefactoredTableExtractor extends Object implements Cloneable
-
-
Field Summary
Fields
Modifier and Type, Field, Description
protected ArrayMatrix<org.jsoup.nodes.Element> cells
protected String entryName
protected List<String> globalContext
protected String language
protected ArrayMatrix<Set<LexicalForm>> results
-
Constructor Summary
Constructors
Constructor, Description
RefactoredTableExtractor(String entryName, String language, List<String> context)
-
Method Summary
Modifier and Type, Method, Description
protected boolean addToContext(ArrayMatrix<org.jsoup.nodes.Element> columnHeaders, int i, int j, List<String> res)
protected Object clone()
protected boolean elementIsAValidForm(org.jsoup.nodes.Element anchor)
protected org.jsoup.nodes.Element getCell(int i, int j)
protected Set<LexicalForm> getInflectedForms(org.jsoup.nodes.Element cell, InflectionScheme infl)
Extracts word forms from a table cell. Splits the cell content on <br> or commas and removes HTML formatting.
protected abstract InflectionScheme getInflectionSchemeFromContext(List<String> context)
Returns the inflection scheme that corresponds to the current cell context.
protected Set<LexicalForm> getLexicalFormsFromCell(int i, int j, org.jsoup.nodes.Element cell, List<String> context)
Returns the set of lexical forms that correspond to the current cell and context.
protected Set<LexicalForm> getResult(int i, int j)
protected List<String> getRowAndColumnContext(int nrow, int ncol, ArrayMatrix<org.jsoup.nodes.Element> columnHeaders)
protected Set<LexicalForm> handleNestedTables(int i, int j, org.jsoup.nodes.Element cell, List<String> context)
protected Set<LexicalForm> handleSimpleCell(int i, int j, org.jsoup.nodes.Element cell, List<String> context)
protected boolean isHeaderCell(org.jsoup.nodes.Element cell)
Set<LexicalForm> parseTable(org.jsoup.nodes.Element tableElement)
protected boolean shouldProcessCell(org.jsoup.nodes.Element cell)
Returns true if the cell should be processed by the extractor.
protected String standardizeValue(String value)
-
-
-
Field Detail
-
language
protected final String language
-
entryName
protected String entryName
-
cells
protected ArrayMatrix<org.jsoup.nodes.Element> cells
-
results
protected ArrayMatrix<Set<LexicalForm>> results
-
-
Method Detail
-
clone
protected Object clone() throws CloneNotSupportedException
- Overrides:
clone
in class Object
- Throws:
CloneNotSupportedException
-
getCell
protected org.jsoup.nodes.Element getCell(int i, int j)
-
getResult
protected Set<LexicalForm> getResult(int i, int j)
-
parseTable
public Set<LexicalForm> parseTable(org.jsoup.nodes.Element tableElement)
-
shouldProcessCell
protected boolean shouldProcessCell(org.jsoup.nodes.Element cell)
Returns true if the cell should be processed by the extractor. This method is called for normal cells only, not for header cells; it allows subclasses to further filter out cells based on their content.
- Parameters:
cell - the td element to be examined
- Returns:
true if the cell should be processed
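As an illustration of the kind of content-based filtering a subclass might apply, the following is a minimal stand-alone sketch. It works on plain cell text rather than on an org.jsoup.nodes.Element, and both the class name CellFilterSketch and the rule itself (skip empty cells and cells holding only dashes) are assumptions made for this example, not behavior documented for this class.

```java
// Hypothetical helper, not part of the dbnary API.
final class CellFilterSketch {
    private CellFilterSketch() {}

    /** Returns true when the cell text looks like it could contain a word form. */
    static boolean shouldProcess(String cellText) {
        String text = cellText == null ? "" : cellText.trim();
        // Assumed rule: reject empty cells and placeholder cells made only of
        // hyphens, en dashes (U+2013) or em dashes (U+2014).
        return !text.isEmpty() && !text.matches("[-\u2013\u2014]+");
    }
}
```

A subclass could apply a comparable test to the text of the td element inside its own shouldProcessCell override.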
-
handleSimpleCell
protected Set<LexicalForm> handleSimpleCell(int i, int j, org.jsoup.nodes.Element cell, List<String> context)
-
handleNestedTables
protected Set<LexicalForm> handleNestedTables(int i, int j, org.jsoup.nodes.Element cell, List<String> context)
-
isHeaderCell
protected boolean isHeaderCell(org.jsoup.nodes.Element cell)
-
getRowAndColumnContext
protected List<String> getRowAndColumnContext(int nrow, int ncol, ArrayMatrix<org.jsoup.nodes.Element> columnHeaders)
-
addToContext
protected boolean addToContext(ArrayMatrix<org.jsoup.nodes.Element> columnHeaders, int i, int j, List<String> res)
-
getLexicalFormsFromCell
protected Set<LexicalForm> getLexicalFormsFromCell(int i, int j, org.jsoup.nodes.Element cell, List<String> context)
Returns the set of lexical forms that correspond to the current cell and context. The context is a list of Strings corresponding to all column headers, row headers, and section headers under which the cell appears.
- Parameters:
i - the row number of the cell in the table
j - the column number of the cell in the table
context - a list of Strings that represents the cell context
- Returns:
the set of lexical forms corresponding to the context
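To make the notion of a cell context more concrete, here is a simplified sketch of how such a list could be assembled from a table. It uses plain String and boolean grids instead of ArrayMatrix and jsoup elements, and the ordering (global context first, then column headers, then row headers) is an assumption of this example; the class name CellContextSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the dbnary API.
final class CellContextSketch {
    private CellContextSketch() {}

    /** Collects the headers that apply to the cell at (row, col). */
    static List<String> contextFor(String[][] grid, boolean[][] header,
                                   int row, int col, List<String> global) {
        // Start from the section-level (global) context.
        List<String> context = new ArrayList<>(global);
        // Column headers: non-empty header cells above the cell, top to bottom.
        for (int i = 0; i < row; i++) {
            if (header[i][col] && !grid[i][col].isEmpty()) {
                context.add(grid[i][col]);
            }
        }
        // Row headers: non-empty header cells to the left of the cell.
        for (int j = 0; j < col; j++) {
            if (header[row][j] && !grid[row][j].isEmpty()) {
                context.add(grid[row][j]);
            }
        }
        return context;
    }
}
```

For a table whose first row holds "Singular" and "Plural" and whose first column holds "Masculine" and "Feminine", the cell in the "Feminine" row and "Singular" column gets the context [Adjective, Singular, Feminine] when the global context is [Adjective].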
-
getInflectionSchemeFromContext
protected abstract InflectionScheme getInflectionSchemeFromContext(List<String> context)
Returns the inflection scheme that corresponds to the current cell context. The cell context is a list of Strings corresponding to all column headers, row headers, and section headers under which the cell appears.
- Parameters:
context - a list of Strings that represents the cell context
- Returns:
the inflection scheme corresponding to the context
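Because this method is abstract, each subclass decides how header strings translate into morphological features. The sketch below shows one possible mapping; it returns a plain Map as a stand-in because the InflectionScheme API is not described on this page, and the class name ContextToFeaturesSketch, the feature keys, and the recognized English header labels are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of the dbnary API.
final class ContextToFeaturesSketch {
    private ContextToFeaturesSketch() {}

    /** Maps known header labels in the context to feature/value pairs. */
    static Map<String, String> featuresFrom(List<String> context) {
        Map<String, String> features = new LinkedHashMap<>();
        for (String header : context) {
            // Normalize before matching so " Singular" and "singular" agree.
            switch (header.trim().toLowerCase(Locale.ROOT)) {
                case "singular":  features.put("number", "singular");  break;
                case "plural":    features.put("number", "plural");    break;
                case "masculine": features.put("gender", "masculine"); break;
                case "feminine":  features.put("gender", "feminine");  break;
                default: break; // Headers this sketch does not know are ignored.
            }
        }
        return features;
    }
}
```

A real subclass would build and return an InflectionScheme instead of a Map, using header labels in the language of the wiki edition it targets.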
-
getInflectedForms
protected Set<LexicalForm> getInflectedForms(org.jsoup.nodes.Element cell, InflectionScheme infl)
Extracts word forms from a table cell. Splits the cell content on <br> or commas and removes HTML formatting.
- Parameters:
cell - the current cell in the inflection table
infl - the inflection scheme corresponding to the current cell
- Returns:
the set of word forms from this cell
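The splitting step described above can be sketched on a raw HTML string as follows. This is a regex-based approximation written for illustration only; the actual method operates on a jsoup Element and returns LexicalForm objects, and the class name CellFormSplitter is hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of the dbnary API.
final class CellFormSplitter {
    private CellFormSplitter() {}

    /** Splits cell HTML on line-break tags or commas and strips remaining tags. */
    static Set<String> splitForms(String cellHtml) {
        Set<String> forms = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        // (?i) makes the match case-insensitive; covers <br>, <br/>, <br /> and commas.
        for (String part : cellHtml.split("(?i)<br\\s*/?>|,")) {
            // Remove leftover tags such as <b> or <i>, then trim whitespace.
            String text = part.replaceAll("<[^>]*>", "").trim();
            if (!text.isEmpty()) {
                forms.add(text);
            }
        }
        return forms;
    }
}
```

For example, splitForms("<b>canto</b><br/>cante, <i>canti</i>") yields the three forms canto, cante, and canti, in that order.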
-
elementIsAValidForm
protected boolean elementIsAValidForm(org.jsoup.nodes.Element anchor)
-
-