
Module SGMLExtractor


Code for fancier file handles.


Classes:
SGMLExtractorHandle     File object that strips tags and returns content from the specified
tag blocks.

SGMLExtractor           Object that scans for specified SGML tag pairs, removes any inner tags
and returns the raw content.

For example, run on the following HTML file:
SGMLExtractor( [ 'h1' ] ) would return 'House that Jack Built'
SGMLExtractor( [ 'dt' ] ) would return 'ratcatdogcowmaiden'
SGMLExtractor( [ 'dt', 'dd' ] ) would return 'ratate the maltcatthat ate the rat', etc.

<h1>House that Jack Built</h1>
<dl>
  <dt><big>rat</big></dt>
    <dd><big>ate the malt</big></dd>
  <dt><big>cat</big></dt>
    <dd><big>that ate the rat</big></dd>
  <dt><big>dog</big></dt>
    <dd><big>that worried the cat</big></dd>
  <dt><big>cow</big></dt>
    <dd><big>with crumpled horn</big></dd>
  <dt><big>maiden</big></dt>
    <dd><big>all forlorn</big></dd>
</dl>
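
The behaviour described above can be approximated with the standard library alone. The following is a minimal sketch, not the Bio.SGMLExtractor implementation: the class TagContentExtractor, the helper extract and the shortened SAMPLE document are hypothetical names introduced here. It is built on html.parser.HTMLParser and assumes that text found inside the wanted tags is concatenated without separators, while text outside them (including the whitespace between tags) is dropped.

```python
from html.parser import HTMLParser

# Shortened copy of the sample document (assumption: first two dt/dd pairs only).
SAMPLE = """<h1>House that Jack Built</h1>
<dl>
  <dt><big>rat</big></dt>
    <dd><big>ate the malt</big></dd>
  <dt><big>cat</big></dt>
    <dd><big>that ate the rat</big></dd>
</dl>
"""


class TagContentExtractor(HTMLParser):
    """Hypothetical stand-in: collect text found inside the wanted tags."""

    def __init__(self, tags):
        super().__init__()
        self.wanted = {tag.lower() for tag in tags}
        self.depth = 0    # number of wanted tags currently open
        self.chunks = []  # text pieces seen while depth > 0

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.wanted and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Inner tags such as <big> never reach this method, only their text.
        if self.depth > 0:
            self.chunks.append(data)


def extract(tags, text):
    """Return the concatenated text enclosed by any of the given tags."""
    parser = TagContentExtractor(tags)
    parser.feed(text)
    parser.close()
    return "".join(parser.chunks)


print(extract(["h1"], SAMPLE))        # House that Jack Built
print(extract(["dt"], SAMPLE))        # ratcat
print(extract(["dt", "dd"], SAMPLE))  # ratate the maltcatthat ate the rat
```

The depth counter is a design choice of this sketch: it keeps collecting text while at least one wanted tag is open, so nested wanted tags do not end collection early.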

Classes
  SGMLExtractorHandle
A Python handle that automatically strips SGML tags and returns data from specified tag start and end pairs.
  SGMLExtractor
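
The idea of a handle that returns only the content of selected tags can be sketched as a thin wrapper around another file-like object. This is an illustration under stated assumptions, not the SGMLExtractorHandle API: the names ExtractingHandle and _Collector are hypothetical, and the wrapper reads the whole underlying handle eagerly, which the real class may not do.

```python
import io
from html.parser import HTMLParser


class _Collector(HTMLParser):
    """Hypothetical helper: gather text that appears inside the wanted tags."""

    def __init__(self, tags):
        super().__init__()
        self.wanted = {tag.lower() for tag in tags}
        self.depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.wanted and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


class ExtractingHandle:
    """Hypothetical file-like wrapper exposing only the extracted text."""

    def __init__(self, handle, tags):
        collector = _Collector(tags)
        collector.feed(handle.read())  # assumption: eager, whole-file read
        collector.close()
        self._buffer = io.StringIO("".join(collector.chunks))

    def read(self, size=-1):
        return self._buffer.read(size)

    def readline(self):
        return self._buffer.readline()


source = io.StringIO("<p>skip</p><dt><b>rat</b></dt>\n<dt>cat</dt>")
handle = ExtractingHandle(source, ["dt"])
print(handle.read(3))  # rat
print(handle.read())   # cat
```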
Functions
is_empty(items)
Variables
  __warningregistry__ = {('Bio.SGMLExtractor was deprecated, as ...
Variables Details

__warningregistry__

Value:
{('Bio.SGMLExtractor was deprecated, as all Biopython modules that use\
 Bio.SGMLExtractor have been deprecated. If you do use this module, pl\
ease contact the Biopython developers at biopython-dev@biopython.org t\
o avoid permanent removal of this module',
  <type 'exceptions.UserWarning'>,
  36): 1}