Class Regexp
       object --+
                |
       parse.ParseI --+
                      |
                ChunkParseI --+
                              |
       object --+             |
                |             |
       parse.ParseI --+       |
                      |       |
        parse.AbstractParse --+
                              |
                           Regexp
A grammar-based chunk parser. C{chunk.Regexp} uses a set of
regular expression patterns to specify the behavior of the parser.
The chunking of the text is encoded using a C{ChunkString}, and
each rule acts by modifying the chunking in the C{ChunkString}.
The rules are all implemented using regular expression matching
and substitution.
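The substitution idea can be illustrated with the standard library alone. The following is a minimal sketch, not the implementation used by C{chunk.Regexp}: it assumes the chunk string is a flat sequence of angle-bracketed tags in which braces mark chunks, and the helper names tag_pattern_to_regex and apply_chunk_rule are hypothetical.

```python
import re

def tag_pattern_to_regex(tag_pattern):
    """Convert a tag pattern such as '<DT>?<JJ>*<NN.*>' into a plain regex."""
    # Group the contents of each <...> so that '|' stays inside one tag
    # and quantifiers apply to whole tags.
    regex = re.sub(r"<([^>]+)>", r"(?:<(?:\1)>)", tag_pattern)
    # Keep an unescaped '.' from matching across tag or chunk delimiters.
    return re.sub(r"(?<!\\)\.", "[^{}<>]", regex)

def apply_chunk_rule(chunk_string, tag_pattern):
    """Wrap every match of tag_pattern that lies outside a chunk in braces."""
    regex = tag_pattern_to_regex(tag_pattern)
    # The lookahead rejects matches inside an existing chunk: after the match,
    # the next brace must be an opening one, or there must be no brace at all.
    outside_chunk = r"(?=[^}]*(?:\{|$))"
    return re.sub("(" + regex + ")" + outside_chunk, r"{\1}", chunk_string)

tags = "<DT><JJ><NN><VBD><DT><NN>"
chunked = apply_chunk_rule(tags, "<DT>?<JJ>*<NN.*>")
print(chunked)  # {<DT><JJ><NN>}<VBD>{<DT><NN>}
```

Applying the same rule to the already chunked string leaves it unchanged, because every candidate match now sits inside a pair of braces.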
A grammar contains one or more clauses in the following form:
    NP:
      {<DT|JJ>}          # chunk determiners and adjectives
      }<[\.VI].*>+{      # chink any tag beginning with V, I, or .
      <.*>}{<DT>         # split a chunk at a determiner
      <DT|JJ>{}<NN.*>    # merge chunk ending with det/adj with one starting with a noun
The patterns of a clause are executed in order. An earlier
pattern may introduce a chunk boundary that prevents a later
pattern from executing. Sometimes an individual pattern will
match on multiple, overlapping extents of the input. As with
regular expression substitution more generally, the chunker will
identify the first match possible, then continue looking for matches
after this one has ended.
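The left-to-right, non-overlapping behaviour is the same one Python's re module shows. A small sketch, assuming tags are written as a flat string of angle-bracketed names: a pattern for two adjacent nouns could match at two overlapping positions in a run of three, but only the first match is used and the search resumes after it.

```python
import re

tags = "<NN><NN><NN>"
pair = r"<NN><NN>"

# finditer reports non-overlapping matches, scanning from the left.
matches = [m.span() for m in re.finditer(pair, tags)]
# sub rewrites exactly those matches; the third <NN> is left alone.
chunked = re.sub("(" + pair + ")", r"{\1}", tags)

print(matches)  # [(0, 8)]
print(chunked)  # {<NN><NN>}<NN>
```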
The clauses of a grammar are also executed in order. A cascaded
chunk parser is one having more than one clause. The maximum depth
of a parse tree created by this chunk parser is the same as the
number of clauses in the grammar.
When tracing is turned on, the comment portion of a line is displayed
each time the corresponding pattern is applied.
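The layout of a grammar, with a clause label followed by commented patterns, can be taken apart with a few lines of plain Python. This is only a sketch under simplifying assumptions: each clause label sits on its own line ending in a colon, no pattern contains a "#", and the names rule_kind and split_clauses are hypothetical rather than part of the library. The rule kinds are inferred from brace placement, following the comments in the example grammar.

```python
GRAMMAR = r"""
NP:
  {<DT|JJ>}          # chunk determiners and adjectives
  }<[\.VI].*>+{      # chink any tag beginning with V, I, or .
  <.*>}{<DT>         # split a chunk at a determiner
  <DT|JJ>{}<NN.*>    # merge chunk ending with det/adj with one starting with a noun
"""

def rule_kind(pattern):
    """Classify a pattern by where its braces sit."""
    if pattern.startswith("{") and pattern.endswith("}"):
        return "chunk"
    if pattern.startswith("}") and pattern.endswith("{"):
        return "chink"
    if "}{" in pattern:
        return "split"
    if "{}" in pattern:
        return "merge"
    raise ValueError("unrecognized pattern: " + pattern)

def split_clauses(grammar):
    """Return a list of (label, [(kind, pattern, comment), ...]) in grammar order."""
    clauses = []
    for line in grammar.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith(":") and "#" not in line:
            clauses.append((line[:-1], []))  # start a new clause
            continue
        pattern, _, comment = line.partition("#")
        pattern = pattern.strip()
        clauses[-1][1].append((rule_kind(pattern), pattern, comment.strip()))
    return clauses

# Print each pattern with its comment, roughly what a trace would display.
for label, rules in split_clauses(GRAMMAR):
    for kind, pattern, comment in rules:
        print(label, kind, pattern, "#", comment)
```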
@type _start: C{string}
@ivar _start: The start symbol of the grammar (the root node of resulting trees)
@type _stages: C{list}
@ivar _stages: The list of parsing stages corresponding to the grammar
__init__(self, grammar, top_node='S', loop=1, trace=0)
    Create a new chunk parser, from the given start state and set of
    chunk patterns.
Tree parse(self, chunk_struct, trace=None)
    Apply the chunk parser to this input.
string __repr__(self)
    repr(x)
string __str__(self)
    str(x)
Inherited from parse.AbstractParse: get_parse, get_parse_list, grammar
Inherited from parse.ParseI: get_parse_dict, get_parse_probs
Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__
__init__(self, grammar, top_node='S', loop=1, trace=0)
(Constructor)
Create a new chunk parser, from the given start state and set of chunk
patterns.
- Parameters:
    grammar (list of string) - The list of patterns that defines the grammar.
    top_node (string or Nonterminal) - The top node of the tree being created.
    loop (int) - The number of times to run through the patterns.
    trace (int) - The level of tracing that should be used when parsing a text.
        0 will generate no tracing output; 1 will generate normal tracing
        output; and 2 or higher will generate verbose tracing output.
- Overrides:
parse.AbstractParse.__init__
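How loop interacts with the clauses can be pictured as a pair of nested loops. This is an assumption about the control flow drawn from the parameter description, not a copy of the library code; run_stages and the toy stages below are hypothetical.

```python
def run_stages(stages, chunk_struct, loop=1):
    """Apply every stage in order, repeating the whole pass `loop` times."""
    for _ in range(loop):
        for stage in stages:
            chunk_struct = stage(chunk_struct)
    return chunk_struct

# Toy stages that only record the order in which they ran.
stages = [lambda s: s + ["NP"], lambda s: s + ["VP"]]
print(run_stages(stages, [], loop=2))  # ['NP', 'VP', 'NP', 'VP']
```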
parse(self, chunk_struct, trace=None)
Apply the chunk parser to this input.
- Parameters:
    chunk_struct (Tree) - The chunk structure to be (further) chunked. This
        tree is modified, and is also returned.
    trace (int) - The level of tracing that should be used when parsing a text.
        0 will generate no tracing output; 1 will generate normal tracing
        output; and 2 or higher will generate verbose tracing output. This
        value overrides the trace level value that was given to the
        constructor.
- Returns:
    Tree - the chunked output.
- Overrides:
ChunkParseI.parse
__repr__(self)
(Representation operator)
repr(x)
- Returns:
    string - a concise string representation of this chunk.Regexp.
- Overrides:
object.__repr__
__str__(self)
(Informal representation operator)
str(x)
- Returns:
    string - a verbose string representation of this chunk.Regexp.
- Overrides:
object.__str__