Sequences
sequence.DNA
coral.DNA is the core data structure of coral. If you are
already familiar with core Python data structures, it mostly acts like a
container similar to lists or strings, but also provides further
object-oriented methods for DNA-specific tasks, like reverse
complementation. Most design functions in coral return a coral.DNA
object or something that contains a coral.DNA object (like
coral.Primer). In addition, there are related coral.RNA and
coral.Peptide objects for representing RNA and peptide sequences
and methods for converting between them.
To get started with coral.DNA, import coral:
Your first sequence
Let’s jump right in and make a sequence from the first 30 bases of gfp from A. victoria. To initialize a sequence, you pass it a string of DNA characters.
ATGAGTAAAGGAGAAGAACTTTTCACTGGA
TACTCATTTCCTCTTCTTGAAAAGTGACCT
A few things just happened behind the scenes. First, the input was
checked to make sure it’s DNA (A, T, G, and C). For now, only
unambiguous letters are supported - no N, Y, R, etc. Second, the internal
representation was converted to an uppercase string - this way, DNA is
displayed uniformly and functional elements (like annealing and overhang
regions of primers) can be delineated using case. If you input a non-DNA
sequence, a ValueError is raised.
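The two steps described above (validation, then uppercase normalization) can be sketched in plain Python. This is only a minimal illustration, not coral’s implementation: normalize_dna and complement are hypothetical helper names, and the error message is an assumption.

```python
# Hypothetical helpers for illustration only; these are not part of coral.
VALID_BASES = set("ATGC")
COMPLEMENT = str.maketrans("ATGC", "TACG")


def normalize_dna(sequence):
    """Uppercase the input and reject anything that is not A, T, G, or C."""
    normalized = sequence.upper()
    invalid = set(normalized) - VALID_BASES
    if invalid:
        # Assumed message; the text only states that a ValueError is raised.
        raise ValueError("Non-DNA characters found: " + ", ".join(sorted(invalid)))
    return normalized


def complement(sequence):
    """Return the base-by-base complement (the bottom strand as displayed)."""
    return normalize_dna(sequence).translate(COMPLEMENT)


top = normalize_dna("atgagtaaaggagaagaacttttcactgga")
bottom = complement(top)
print(top)
print(bottom)
```

The two printed lines correspond to the top and bottom strands shown for the example sequence; passing a string with an ambiguous letter such as N raises the ValueError instead.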
For the most part, a sequence.DNA instance acts like a Python
container and many string-like operations work.
ATG
TAC
CACTGGA
GTGACCT
AGGTCACTTTTCAAGAAGAGGAAATGAGTA
TCCAGTGAAAAGTTCTTCTCCTTTACTCAT
AGGAAGGAACTTATG
TCCTTCCTTGAATAC
'AT' is in our sequence: True.
'ATT' is in our sequence: False.
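Since these operations mirror Python’s own string behavior, they can be tried on the top strand held as a plain str. This is a sketch under that assumption: it uses a built-in string rather than a coral object, and the particular slices are chosen for illustration.

```python
# The top strand of the example sequence as an ordinary Python string.
top = "ATGAGTAAAGGAGAAGAACTTTTCACTGGA"

first_three = top[:3]     # slice from the start
last_seven = top[-7:]     # negative indices count from the end
reversed_top = top[::-1]  # a step of -1 reverses the sequence

print(first_three)
print(last_seven)
print(reversed_top)

# Membership tests use the `in` operator, as with any string.
print("'AT' is in our sequence: {}.".format("AT" in top))
print("'ATT' is in our sequence: {}.".format("ATT" in top))
```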
Several other common special methods and operators are defined for
sequences - you can concatenate DNA (so long as it isn’t circular) using
+, repeat linear sequences using * with an integer, check for
equality with == and != (note: features, not just sequences,
must be identical), check the length with len(dna_object), etc.
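To make the operator behavior concrete, here is a toy class that defines the same special methods. ToyDNA is a hypothetical stand-in, not coral’s class: it has no features (so equality compares only the sequence and the circular flag), and raising ValueError for circular DNA is an assumption.

```python
class ToyDNA:
    """Hypothetical stand-in for a DNA object; not coral's implementation."""

    def __init__(self, sequence, circular=False):
        self.sequence = sequence.upper()
        self.circular = circular

    def __len__(self):
        return len(self.sequence)

    def __eq__(self, other):
        if not isinstance(other, ToyDNA):
            return NotImplemented
        return (self.sequence, self.circular) == (other.sequence, other.circular)

    def __add__(self, other):
        # Assumption: concatenating circular DNA is treated as an error.
        if self.circular or other.circular:
            raise ValueError("Cannot concatenate circular DNA.")
        return ToyDNA(self.sequence + other.sequence)

    def __mul__(self, count):
        # Assumption: only linear sequences can be repeated.
        if self.circular:
            raise ValueError("Cannot repeat circular DNA.")
        return ToyDNA(self.sequence * count)


start = ToyDNA("atg")
tail = ToyDNA("agt")
joined = start + tail   # concatenation with +
tripled = start * 3     # repetition with an integer
print(joined.sequence, len(tripled), joined == ToyDNA("ATGAGT"), start != tail)
```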
Simple sequences - methods
In addition to slicing, sequence.DNA provides methods for common
molecular manipulations. For example, reverse complementing a sequence
is a single call:
TCCAGTGAAAAGTTCTTCTCCTTTACTCAT
AGGTCACTTTTCAAGAAGAGGAAATGAGTA
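For readers who want to see what reverse complementation does to the bases, the operation can be written in a few lines of plain Python. The reverse_complement function below is a hypothetical standalone helper, not the coral method, and it handles only uppercase A, T, G, and C.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ATGC", "TACG")


def reverse_complement(sequence):
    """Complement every base, then reverse the result."""
    return sequence.translate(COMPLEMENT)[::-1]


top = "ATGAGTAAAGGAGAAGAACTTTTCACTGGA"
revcomp = reverse_complement(top)
print(revcomp)
```

Applying the function twice returns the original sequence, which is a quick sanity check for any reverse-complement routine.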
An extremely important method is .copy(). It may seem
redundant to have an entire method for copying a sequence - why not
just assign a sequence.DNA object to a new variable? As in most
high-level languages, Python does not actually copy entire objects in
memory when assignment happens - it just adds another reference to the
same data. The short of it is that the very common operation of
generating a lot of new variants of a sequence, or copying a sequence,
requires the use of the .copy() method. For example, if you want to
generate a new list of variants where an ‘a’ is substituted one at a
time at each position of the sequence, using .copy() returns the correct
result (the second list below) while directly assigning example_dna has
horrible consequences (the first list below: the edits build up, as they
all modify the same piece of data sequentially):
ATGAGTAAAGGAGAAGAACTTTTCACTGGA
TACTCATTTCCTCTTCTTGAAAAGTGACCT
['AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA']
['ATGAGTAAAGGAGAAGAACTTTTCACTGGA', 'AAGAGTAAAGGAGAAGAACTTTTCACTGGA', 'ATAAGTAAAGGAGAAGAACTTTTCACTGGA', 'ATGAGTAAAGGAGAAGAACTTTTCACTGGA', 'ATGAATAAAGGAGAAGAACTTTTCACTGGA', 'ATGAGAAAAGGAGAAGAACTTTTCACTGGA', 'ATGAGTAAAGGAGAAGAACTTTTCACTGGA', 'ATGAGTAAAGGAGAAGAACTTTTCACTGGA', 'ATGAGTAAAGGAGAAGAACTTTTCACTGGA', 'ATGAGTAAAAGAGAAGAACTTTTCACTGGA', 'ATGAGTAAAGAAGAAGAACTTTTCACTGGA', 'ATGAGTAAAGGAGAAGAACTTTTCACTGGA', 'ATGAGTAAAGGAAAAGAACTTTTCACTGGA', 'ATGAGTAAAGGAGAAGAACTTTTCACTGGA', 'ATGAGTAAAGGAGAAGAACTTTTCACTGGA', 'ATGAGTAAAGGAGAAAAACTTTTCACTGGA', 'ATGAGTAAAGGAGAAGAACTTTTCACTGGA', 'ATGAGTAAAGGAGAAGAACTTTTCACTGGA', 'ATGAGTAAAGGAGAAGAAATTTTCACTGGA', 'ATGAGTAAAGGAGAAGAACATTTCACTGGA', 'ATGAGTAAAGGAGAAGAACTATTCACTGGA', 'ATGAGTAAAGGAGAAGAACTTATCACTGGA', 'ATGAGTAAAGGAGAAGAACTTTACACTGGA', 'ATGAGTAAAGGAGAAGAACTTTTAACTGGA', 'ATGAGTAAAGGAGAAGAACTTTTCACTGGA', 'ATGAGTAAAGGAGAAGAACTTTTCAATGGA', 'ATGAGTAAAGGAGAAGAACTTTTCACAGGA', 'ATGAGTAAAGGAGAAGAACTTTTCACTAGA', 'ATGAGTAAAGGAGAAGAACTTTTCACTGAA', 'ATGAGTAAAGGAGAAGAACTTTTCACTGGA']
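The same aliasing pitfall can be reproduced with an ordinary Python list, which may make the cause easier to see. This sketch uses a list of characters and a shortened six-base template instead of a coral object; the variable names are illustrative.

```python
template = "ATGAGT"

# Incorrect: every "variant" is the same list object, so the edits accumulate.
shared = list(template)
aliased_variants = []
for i in range(len(shared)):
    variant = shared            # no copy, just another name for the same list
    variant[i] = "A"
    aliased_variants.append(variant)

# Correct: copy first, then edit the copy.
original = list(template)
copied_variants = []
for i in range(len(original)):
    variant = original.copy()   # independent list for each variant
    variant[i] = "A"
    copied_variants.append(variant)

print(["".join(v) for v in aliased_variants])
print(["".join(v) for v in copied_variants])
```

In the first loop the list ends up holding six references to one fully edited sequence, while the second loop yields six distinct single-substitution variants and leaves the original untouched.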
An important fact about sequence.DNA methods and slicing is that
none of the operations modify the object directly (they don’t mutate
their parent) - if we look at example_dna, it has not been
reverse-complemented itself. Running example_dna.reverse_complement()
outputs a new sequence, so if you want to save your change you need to
assign it to a variable:
ATGAGTAAAGGAGAAGAACTTTTCACTGGA
TACTCATTTCCTCTTCTTGAAAAGTGACCT
TCCAGTGAAAAGTTCTTCTCCTTTACTCAT
AGGTCACTTTTCAAGAAGAGGAAATGAGTA
You also have direct access to important attributes of a sequence.DNA
object. The following are examples of how to get important sequences or
information about a sequence.
ATGAGTAAAGGAGAAGAACTTTTCACTGGA
TCCAGTGAAAAGTTCTTCTCCTTTACTCAT
True
False
True
ATGAGTAAAGGAGAAGAACTTTTCACTGGA
GAGTAAAGGAGAAGAACTTTTCACTGGAAT
TCCAGTGAAAAGTTCTTCTCCTTTACTCAT
AGGTCACTTTTCAAGAAGAGGAAATGAGTA