
Module genbank

Martel-based parser to read GenBank-formatted files.

This is a huge regular expression for GenBank, built using the 'regular expressions on steroids' capabilities of Martel.

Documentation for GenBank format that I found:

o GenBank/EMBL feature tables are described at: http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html

o There are also descriptions of different GenBank lines at: http://www.ibc.wustl.edu/standards/gbrel.txt

Functions
 
define_block(identifier, block_tag, block_data, std_block_tag=None, std_tag=None)
Define a Martel grouping which can parse a block of text.
Variables
  INDENT = 12
  FEATURE_KEY_INDENT = 5
  FEATURE_QUALIFIER_INDENT = 21
  blank_space = Martel.Spaces()
  small_indent_space = Martel.Str(" "* 2)
  big_indent_space = Martel.Str(" "* FEATURE_KEY_INDENT)
  qualifier_space = Martel.Str(" "* FEATURE_QUALIFIER_INDENT) | ...
  locus = Martel.Group("locus", Martel.Re(r"[\w\-]+"))
  size = Martel.Group("size", Martel.Rep1(Martel.Integer()))
  valid_residue_prefixes = ['ss-', 'ds-', 'ms-']
  valid_residue_types = ['DNA', 'RNA', 'mRNA', 'tRNA', 'rRNA', '...
  residue_prefixes = map(Martel.Str, valid_residue_prefixes)
  residue_types = map(Martel.Str, valid_residue_types)
  residue_type = Martel.Group("residue_type", Martel.Opt(Martel....
  date = Martel.Group("date", Martel.Re("[-\w]+"))
  valid_divisions = ['PRI', 'ROD', 'MAM', 'VRT', 'INV', 'PLN', '...
  divisions = map(Martel.Str, valid_divisions)
  data_file_division = Martel.Group("data_file_division", Martel...
  locus_line = Martel.Group("locus_line", Martel.Str("LOCUS")+ b...
  definition_block = define_block("DEFINITION", "definition_bloc...
  accession = Martel.Group("accession", Martel.Re("[\w]+"))
  region = Martel.Group("region", Martel.Re("[\d]+..[\d]+"))
  accession_block = Martel.Group("accession_block", Martel.Str("...
  nid = Martel.Group("nid", Martel.Re("[\w\d]+"))
  nid_line = Martel.Group("nid_line", Martel.Str("NID")+ blank_s...
  pid = Martel.Group("pid", Martel.Re("[\w\d]+"))
  pid_line = Martel.Group("pid_line", Martel.Str("PID")+ blank_s...
  version = Martel.Group("version", Std.dbid(Martel.Re("[\w\d\.]...
  gi = Martel.Group("gi", Std.dbid(Martel.Re("[\d]+"), {"type": ...
  version_line = Martel.Group("version_line", Martel.Str("VERSIO...
  db_source_block = define_block("DBSOURCE", "db_source_block", ...
  keywords_block = define_block("KEYWORDS", "keywords_block", "k...
  segment = Martel.Group("segment", Martel.Integer("segment_num"...
  segment_line = Martel.Group("segment_line", Martel.Str("SEGMEN...
  source_block = define_block("SOURCE", "source_block", "source")
  organism = Martel.Group("organism", Martel.ToEol())
  taxonomy = Martel.Group("taxonomy", Martel.Rep1(blank_space+ M...
  organism_block = Martel.Group("organism_block", Martel.Str(" ...
  reference_num = Martel.Group("reference_num", Martel.Re("[\d]+"))
  reference_bases = Martel.Group("reference_bases", Martel.Str("...
  reference_line = Martel.Group("reference_line", Martel.Str("RE...
  authors_block = define_block("  AUTHORS", "authors_block", "au...
  consrtm_block = define_block("  CONSRTM", "consrtm_block", "co...
  title_block = define_block("  TITLE", "title_block", "title")
  journal_block = define_block("  JOURNAL", "journal_block", "jo...
  medline_line = Martel.Group("medline_line", Martel.Str("  MEDL...
  pubmed_line = Martel.Group("pubmed_line", Martel.Str("   PUBME...
  remark_block = define_block("  REMARK", "remark_block", "remark")
  reference = Martel.Group("reference", reference_line+ Martel.O...
  comment_block = define_block("COMMENT", "comment_block", "comm...
  primary_line = Martel.Group("primary_line", Martel.Str("PRIMAR...
  primary_ref_line = Martel.Group("primary_ref_line", blank_spac...
  primary = Martel.Group("primary", primary_line+ Martel.Rep1(pr...
  features_line = Martel.Group("features_line", Martel.Str("FEAT...
  feature_key = Martel.Group("feature_key", Martel.Re("[\w'-]+"))
  location = Martel.Group("location", Std.feature_location(Marte...
  feature_key_line = Martel.Group("feature_key_line", big_indent...
  quote = Martel.Str('"')
  quoted_chars = Std.feature_qualifier_description(Martel.Re(r'(...
  quoted_string = quote+ quoted_chars+ Martel.Rep(Martel.AnyEol(...
  unquoted_string = Martel.AssertNot(quote)+ Std.feature_qualifi...
  qualifier = Std.feature_qualifier(qualifier_space+ Martel.Str(...
  feature = Std.feature(feature_key_line+ Martel.Rep(qualifier))
  feature_block = Std.feature_block(Martel.Rep1(feature), {"loca...
  base_count = Martel.Group("base_count", Martel.Re("[\w\d ]+"))
  base_count_line = Martel.Group("base_count_line", Martel.Str("...
  origin_line = Martel.Group("origin_line", Martel.Str("ORIGIN")...
  base_number = Martel.Group("base_number", Martel.Re("[\d]+"))
  sequence = Std.sequence(Martel.Group("sequence", Martel.Re("[\...
  sequence_plus_spaces = Martel.Group("sequence_plus_spaces", Ma...
  sequence_line = Martel.Group("sequence_line", blank_space+ Mar...
  sequence_entry = Std.sequence_block(Martel.Group("sequence_ent...
  contig_location = Martel.Group("contig_location", Martel.ToEol...
  contig_block = Martel.Group("contig_block", Martel.Str("CONTIG...
  record_end = Martel.Group("record_end", Martel.Str("//")+ Mart...
  record = Std.record(Martel.Group("genbank_record", locus_line+...
  header = Martel.Re("...
  ncbi_format = Martel.HeaderFooter("genbank", {"format": "ncbi_...
  format = Martel.ParseRecords("genbank", {"format": "genbank"},...
  __warningregistry__ = {('Bio.expressions was deprecated, as it...
Function Details

define_block(identifier, block_tag, block_data, std_block_tag=None, std_tag=None)

Define a Martel grouping which can parse a block of text.

Many of the GenBank lines we'll want to process are grouped into a block like:

IDENTIFIER Blah blah blah

where "blah blah blah" can wrap over multiple lines. This function makes it easy to define these blocks consistently.

Arguments:

o identifier - The identifier that begins the block (like DEFINITION).

o block_tag - A callback tag for the entire block.

o block_data - A callback tag for the data in the block (i.e. the stuff you are interested in).

o std_block_tag - A Bio.Std Martel tag used to register the entire block as being a "standard" type of information.

o std_tag - A Bio.Std Martel tag used to register just the information in the block as being "standard".
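
The block shape described above can be illustrated without Martel. The following is only a rough sketch using the standard `re` module, not the actual define_block implementation; the helper name block_pattern is hypothetical, and it assumes that continuation lines are indented by at least INDENT spaces.

```python
import re

INDENT = 12  # same value as the module-level INDENT constant


def block_pattern(identifier):
    """Hypothetical helper: a regex for IDENTIFIER followed by wrapped text."""
    first_line = re.escape(identifier) + r"[ \t]+(?P<first>[^\n]*)\n"
    # Assumption: continuation lines start with at least INDENT spaces.
    continuation = r"(?P<rest>(?: {%d}[^\n]*\n)*)" % INDENT
    return re.compile(first_line + continuation)


text = (
    "DEFINITION  Saccharomyces cerevisiae TCP1-beta gene, partial cds;\n"
    + " " * INDENT + "and Axl2p (AXL2) gene, complete cds.\n"
    + "ACCESSION   U49845\n"
)
match = block_pattern("DEFINITION").match(text)
# Join the first line and the stripped continuation lines into one string.
parts = [match.group("first")] + [
    line.strip() for line in match.group("rest").splitlines()
]
definition = " ".join(parts)
print(definition)
```

The match stops at the ACCESSION line because that line does not begin with the indent, which is the property that lets consecutive blocks be chained in a record.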


Variables Details

qualifier_space

Value:
Martel.Str(" "* FEATURE_QUALIFIER_INDENT) | Martel.Str("\t"+ " "*(FEAT\
URE_QUALIFIER_INDENT-8))
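
The second alternative appears to accept an indent written as one tab followed by spaces. A small check in plain Python, assuming the intent is a tab standing in for the first eight columns:

```python
FEATURE_QUALIFIER_INDENT = 21  # same value as the module-level constant

# The two spellings of the indent that qualifier_space accepts.
space_form = " " * FEATURE_QUALIFIER_INDENT
tab_form = "\t" + " " * (FEATURE_QUALIFIER_INDENT - 8)

# With tab stops every 8 columns, both forms reach the same column.
same_width = tab_form.expandtabs(8) == space_form
print(same_width)
```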

valid_residue_types

Value:
['DNA',
 'RNA',
 'mRNA',
 'tRNA',
 'rRNA',
 'uRNA',
 'scRNA',
 'snRNA',
...

residue_type

Value:
Martel.Group("residue_type", Martel.Opt(Martel.Alt(* residue_prefixes)\
)+ Martel.Opt(Martel.Alt(* residue_types))+ Martel.Opt(Martel.Opt(blan\
k_space)+ Martel.Alt(Martel.Str("circular"), Martel.Str("linear"))))

valid_divisions

Value:
['PRI',
 'ROD',
 'MAM',
 'VRT',
 'INV',
 'PLN',
 'BCT',
 'RNA',
...

data_file_division

Value:
Martel.Group("data_file_division", Martel.Alt(* divisions))

locus_line

Value:
Martel.Group("locus_line", Martel.Str("LOCUS")+ blank_space+ locus+ bl\
ank_space+ size+ blank_space+ Martel.Re("bp|aa")+ blank_space+ Martel.\
Opt(residue_type+ blank_space)+ data_file_division+ blank_space+ date+\
 Martel.AnyEol())
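
As a reading aid, the same line layout can be approximated with the standard `re` module. This is a sketch rather than a translation of the Martel expression: the residue type and division lists are the truncated ones shown in this page, `\d+` stands in for Martel.Integer, and the helper alt is hypothetical.

```python
import re

# Partial lists: this page truncates both after eight entries.
RESIDUE_PREFIXES = ["ss-", "ds-", "ms-"]
RESIDUE_TYPES = ["DNA", "RNA", "mRNA", "tRNA", "rRNA", "uRNA", "scRNA", "snRNA"]
DIVISIONS = ["PRI", "ROD", "MAM", "VRT", "INV", "PLN", "BCT", "RNA"]


def alt(words):
    """Build a non-capturing alternation of literal strings."""
    return "(?:" + "|".join(re.escape(w) for w in words) + ")"


# Optional prefix, optional type, optional circular/linear marker.
RESIDUE_TYPE = (
    "(?P<residue_type>"
    + alt(RESIDUE_PREFIXES) + "?"
    + alt(RESIDUE_TYPES) + "?"
    + r"(?:[ \t]*(?:circular|linear))?"
    + ")"
)
LOCUS_RE = re.compile(
    r"LOCUS[ \t]+(?P<locus>[\w\-]+)[ \t]+(?P<size>\d+)[ \t]+(?:bp|aa)[ \t]+"
    + "(?:" + RESIDUE_TYPE + r"[ \t]+)?"
    + "(?P<division>" + alt(DIVISIONS) + ")"
    + r"[ \t]+(?P<date>[-\w]+)$"
)

line = "LOCUS       SCU49845     5028 bp    DNA             PLN       21-JUN-1999"
fields = LOCUS_RE.match(line).groupdict()
print(fields)
```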

definition_block

Value:
define_block("DEFINITION", "definition_block", "definition", Std.descr\
iption_block, Std.description)

accession_block

Value:
Martel.Group("accession_block", Martel.Str("ACCESSION")+ Martel.Rep1(b\
lank_space+ Martel.Rep1(accession+ Martel.Opt(Martel.Opt(Martel.Str(" \
"))+ Martel.Str("REGION:")+ Martel.Opt(Martel.Str(" "))+ region)+ Mart\
el.Opt(Martel.Str(" ")))+ Martel.AnyEol()))

nid_line

Value:
Martel.Group("nid_line", Martel.Str("NID")+ blank_space+ nid+ Martel.A\
nyEol())

pid_line

Value:
Martel.Group("pid_line", Martel.Str("PID")+ blank_space+ pid+ Martel.A\
nyEol())

version

Value:
Martel.Group("version", Std.dbid(Martel.Re("[\w\d\.]+"), {"type": "pri\
mary", "dbname": "genbank"}))

gi

Value:
Martel.Group("gi", Std.dbid(Martel.Re("[\d]+"), {"type": "secondary", \
"dbname": "genbank"}))

version_line

Value:
Martel.Group("version_line", Martel.Str("VERSION")+ blank_space+ versi\
on+ Martel.Opt(blank_space+ Martel.Str("GI:")+ gi)+ Martel.AnyEol())
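
For illustration, a plain `re` approximation of this line, with the GI part optional as in the expression above. The sample line is an assumption chosen to fit the pattern, and the group names simply reuse the Martel tag names.

```python
import re

VERSION_RE = re.compile(
    r"VERSION[ \t]+(?P<version>[\w.]+)(?:[ \t]+GI:(?P<gi>\d+))?$"
)

with_gi = VERSION_RE.match("VERSION     U49845.1  GI:1293613")
without_gi = VERSION_RE.match("VERSION     U49845.1")
print(with_gi.group("version"), with_gi.group("gi"))
print(without_gi.group("gi"))  # None when the GI part is absent
```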

db_source_block

Value:
define_block("DBSOURCE", "db_source_block", "db_source")

keywords_block

Value:
define_block("KEYWORDS", "keywords_block", "keywords")

segment

Value:
Martel.Group("segment", Martel.Integer("segment_num")+ Martel.Str(" of\
 ")+ Martel.Integer("segment_total"))

segment_line

Value:
Martel.Group("segment_line", Martel.Str("SEGMENT     ")+ segment+ Mart\
el.AnyEol())
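
A minimal `re` sketch of the same line, assuming Martel.Integer can be approximated by `\d+` for the unsigned counts that appear here:

```python
import re

# "SEGMENT" padded with five spaces, then "<num> of <total>".
SEGMENT_RE = re.compile(
    r"SEGMENT {5}(?P<segment_num>\d+) of (?P<segment_total>\d+)$"
)

segment_match = SEGMENT_RE.match("SEGMENT" + " " * 5 + "1 of 3")
segment_num = int(segment_match.group("segment_num"))
segment_total = int(segment_match.group("segment_total"))
print(segment_num, segment_total)
```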

taxonomy

Value:
Martel.Group("taxonomy", Martel.Rep1(blank_space+ Martel.ToEol()))

organism_block

Value:
Martel.Group("organism_block", Martel.Str("  ORGANISM")+ blank_space+ \
organism+ taxonomy)

reference_bases

Value:
Martel.Group("reference_bases", Martel.Str("(")+ Martel.Re("[;\w\d \R]\
+")+ Martel.Str(")"))

reference_line

Value:
Martel.Group("reference_line", Martel.Str("REFERENCE")+ blank_space+ r\
eference_num+ Martel.Opt(blank_space+ reference_bases)+ Martel.AnyEol(\
))
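
The structure of this line (a reference number, then an optional parenthesized base range) might be approximated as follows. This is a sketch only: it leaves out the `\R` newline class from the original character set, so it handles single-line ranges only.

```python
import re

REFERENCE_RE = re.compile(
    r"REFERENCE[ \t]+(?P<reference_num>\d+)"
    # As in the Martel group, the parentheses are part of reference_bases.
    r"(?:[ \t]+(?P<reference_bases>\([;\w ]+\)))?$"
)

with_bases = REFERENCE_RE.match("REFERENCE   1  (bases 1 to 5028)")
without_bases = REFERENCE_RE.match("REFERENCE   2")
print(with_bases.group("reference_num"), with_bases.group("reference_bases"))
```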

authors_block

Value:
define_block("  AUTHORS", "authors_block", "authors")

consrtm_block

Value:
define_block("  CONSRTM", "consrtm_block", "consrtm")

journal_block

Value:
define_block("  JOURNAL", "journal_block", "journal")

medline_line

Value:
Martel.Group("medline_line", Martel.Str("  MEDLINE   ")+ Martel.Intege\
r("medline_id")+ Martel.AnyEol())

pubmed_line

Value:
Martel.Group("pubmed_line", Martel.Str("   PUBMED   ")+ Martel.Integer\
("pubmed_id")+ Martel.AnyEol())

reference

Value:
Martel.Group("reference", reference_line+ Martel.Opt(authors_block)+ M\
artel.Opt(consrtm_block)+ Martel.Opt(title_block)+ journal_block+ Mart\
el.Opt(medline_line)+ Martel.Opt(pubmed_line)+ Martel.Opt(remark_block\
))

comment_block

Value:
define_block("COMMENT", "comment_block", "comment")

primary_line

Value:
Martel.Group("primary_line", Martel.Str("PRIMARY")+ blank_space+ Marte\
l.Str("TPA_SPAN")+ blank_space+ Martel.Str("PRIMARY_IDENTIFIER")+ blan\
k_space+ Martel.Str("PRIMARY_SPAN")+ blank_space+ Martel.Str("COMP")+ \
Martel.ToEol())

primary_ref_line

Value:
Martel.Group("primary_ref_line", blank_space+ Martel.Re(r"\d+\-\d+")+ \
blank_space+ Martel.Re("[\S]+")+ blank_space+ Martel.Re("\d+\-\d+")+ M\
artel.Opt(blank_space+ Martel.Str("c"))+ Martel.ToEol())

primary

Value:
Martel.Group("primary", primary_line+ Martel.Rep1(primary_ref_line))

features_line

Value:
Martel.Group("features_line", Martel.Str("FEATURES")+ blank_space+ Mar\
tel.Str("Location/Qualifiers")+ Martel.AnyEol())

feature_key


Value:
Martel.Group("feature_key", Martel.Re("[\w'-]+"))

location

Value:
Martel.Group("location", Std.feature_location(Martel.UntilEol())+ Mart\
el.AnyEol()+ Martel.Rep(qualifier_space+ Martel.AssertNot(Martel.Str("\
/"))+ Std.feature_location(Martel.UntilEol())+ Martel.AnyEol()))
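
The expression above reads a first location line, then any number of continuation lines that start with the qualifier indent and do not begin with "/". A hedged `re` sketch of that idea follows; the way the pieces are joined afterwards is an assumption, not something the module itself does.

```python
import re

QSPACE = r"(?: {21}|\t {13})"  # mirrors qualifier_space
LOCATION_RE = re.compile(
    r"(?P<first>[^\n]*)\n"
    # Continuation lines: indent, then anything that is not a "/" qualifier.
    r"(?P<rest>(?:" + QSPACE + r"(?!/)[^\n]*\n)*)"
)

text = (
    "join(12..78,\n"
    + " " * 21 + "134..202)\n"
    + " " * 21 + '/gene="X"\n'
)
loc_match = LOCATION_RE.match(text)
location = loc_match.group("first") + "".join(
    part.strip() for part in loc_match.group("rest").splitlines()
)
print(location)
```

The negative lookahead is what keeps the qualifier line out of the location.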

feature_key_line

Value:
Martel.Group("feature_key_line", big_indent_space+ Std.feature_name(fe\
ature_key)+ location)

quoted_chars

Value:
Std.feature_qualifier_description(Martel.Re(r'([^"\R]|"")*'))

quoted_string

Value:
quote+ quoted_chars+ Martel.Rep(Martel.AnyEol()+ qualifier_space+ quot\
ed_chars)+ quote+ Martel.AnyEol()

unquoted_string

Value:
Martel.AssertNot(quote)+ Std.feature_qualifier_description(Martel.Unti\
lEol())+ Martel.AnyEol()

qualifier

Value:
Std.feature_qualifier(qualifier_space+ Martel.Str("/")+ Std.feature_qu\
alifier_name(Martel.Word("feature_qualifier_name"))+(Martel.AnyEol() |\
(Martel.Str("=")+ Martel.Group("feature_qualifier_description", (unquo\
ted_string | quoted_string)))))
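
To make the three qualifier shapes concrete (a bare flag, an unquoted value, and a quoted value that may wrap), here is an approximate `re` version. It is a sketch under assumptions: `\n` stands in for Martel's `\R`, and `\w+` stands in for Martel.Word.

```python
import re

QSPACE = r"(?: {21}|\t {13})"  # mirrors qualifier_space
QCHARS = r'(?:[^"\n]|"")*'     # mirrors quoted_chars; "" is an escaped quote
QUOTED = '"' + QCHARS + r"(?:\n" + QSPACE + QCHARS + ')*"'
UNQUOTED = r'(?!")[^\n]*'
QUALIFIER_RE = re.compile(
    QSPACE + r"/(?P<name>\w+)(?:=(?P<value>" + UNQUOTED + "|" + QUOTED + r"))?\n"
)

pad = " " * 21
text = (
    pad + '/gene="AXL2"\n'
    + pad + "/codon_start=1\n"
    + pad + "/pseudo\n"
    + pad + '/note="spans two\n'
    + pad + 'lines"\n'
)
found = [(m.group("name"), m.group("value")) for m in QUALIFIER_RE.finditer(text)]
for name, value in found:
    print(name, repr(value))
```

Quoted values are returned with their quotes and any embedded line breaks intact; cleaning them up is left to the caller in this sketch.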

feature_block

Value:
Std.feature_block(Martel.Rep1(feature), {"location-style": "genbank"})

base_count_line

Value:
Martel.Group("base_count_line", Martel.Str("BASE COUNT")+ blank_space+\
 base_count+ Martel.AnyEol())

origin_line

Value:
Martel.Group("origin_line", Martel.Str("ORIGIN")+(Martel.ToEol("origin\
_name") | Martel.AnyEol()))

sequence

Value:
Std.sequence(Martel.Group("sequence", Martel.Re("[\w]+")))

sequence_plus_spaces

Value:
Martel.Group("sequence_plus_spaces", Martel.Rep1(Martel.Str(" ")+ Mart\
el.Opt(sequence))+ Martel.Opt(Martel.Str(" ")))

sequence_line

Value:
Martel.Group("sequence_line", blank_space+ Martel.Opt(base_number)+ se\
quence_plus_spaces+ Martel.AnyEol())
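
A small `re` sketch of how such a line could be taken apart: leading blanks, an optional base number, then space-separated chunks of sequence. The sample line is an assumption in the usual GenBank layout, and joining the chunks is an extra step added for illustration.

```python
import re

SEQ_LINE_RE = re.compile(
    r"[ \t]+(?P<base_number>\d+)?(?P<sequence_plus_spaces>(?: \w*)+ ?)$"
)

line = "        1 gatcctccat atacaacggt atctccacct"
seq_match = SEQ_LINE_RE.match(line)
base_number = int(seq_match.group("base_number"))
# Drop the spaces between the chunks to recover the residues.
residues = "".join(seq_match.group("sequence_plus_spaces").split())
print(base_number, residues)
```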

sequence_entry

Value:
Std.sequence_block(Martel.Group("sequence_entry", origin_line+ Martel.\
Rep1(sequence_line)))

contig_location

Value:
Martel.Group("contig_location", Martel.ToEol("feature_location")+ Mart\
el.Rep(Martel.Str(" "* INDENT)+ Martel.Re("(?!/)")+ Martel.ToEol("feat\
ure_location")))

contig_block

Value:
Martel.Group("contig_block", Martel.Str("CONTIG")+ blank_space+ contig\
_location)

record_end

Value:
Martel.Group("record_end", Martel.Str("//")+ Martel.Rep1(Martel.AnyEol\
()))

record

Value:
Std.record(Martel.Group("genbank_record", locus_line+ definition_block\
+ accession_block+ Martel.Opt(nid_line)+ Martel.Opt(pid_line)+ Martel.\
Opt(version_line)+ Martel.Opt(db_source_block)+ keywords_block+ Martel\
.Opt(segment_line)+ source_block+ organism_block+ Martel.Rep(reference\
)+ Martel.Opt(primary)+ Martel.Opt(comment_block)+ features_line+ feat\
ure_block+ Martel.Alt(Martel.Opt(base_count_line)+ sequence_entry, con\
tig_block)+ record_end))

header

Value:
Martel.Re("""\
(?P<filename>[^ ]+) +Genetic Sequence Data Bank
 *(?P<release_day>\d+) (?P<release_month>\w+) (?P<release_year>\d+)

 *(?P<data_bank_name>[^\R]+)

 *(?P<data_bank_name>[^\R]+)

...

ncbi_format

Value:
Martel.HeaderFooter("genbank", {"format": "ncbi_genbank"}, header, Rec\
ordReader.CountLines, (10,), record, RecordReader.EndsWith, ("//",), N\
one, None, None,)

format

Value:
Martel.ParseRecords("genbank", {"format": "genbank"}, record, RecordRe\
ader.StartsWith, ("LOCUS ",))
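
The StartsWith reader above splits the input at every line beginning with "LOCUS ". The idea can be sketched in plain Python as below; split_records is a hypothetical helper written for this illustration and is not part of Martel or RecordReader.

```python
def split_records(text):
    """Yield one string per record, each starting at a 'LOCUS ' line."""
    current = None
    for line in text.splitlines(keepends=True):
        if line.startswith("LOCUS "):
            if current is not None:
                yield "".join(current)
            current = [line]
        elif current is not None:
            # Lines before the first LOCUS line are skipped.
            current.append(line)
    if current is not None:
        yield "".join(current)


text = (
    "LOCUS       AAA\n"
    "//\n"
    "LOCUS       BBB\n"
    "ORIGIN\n"
    "//\n"
)
records = list(split_records(text))
print(len(records))
```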

__warningregistry__

Value:
{('Bio.expressions was deprecated, as it does not work with recent ver\
sions of mxTextTools. If you want to continue to use this module, plea\
se get in contact with the Biopython developers at biopython-dev@biopy\
thon.org to avoid permanent removal of this module from Biopython',
  <type 'exceptions.DeprecationWarning'>,
  16): 1}