Sequences

sequence.DNA

coral.DNA is the core data structure of coral. If you are already familiar with core Python data structures, it mostly acts like a container similar to a list or string, but it also provides object-oriented methods for DNA-specific tasks, like reverse complementation. Most design functions in coral return a coral.DNA object or something that contains one (like coral.Primer). In addition, the related coral.RNA and coral.Peptide objects represent RNA and peptide sequences, and there are methods for converting between them.

To get started with coral.DNA, import coral:

Your first sequence

Let’s jump right in and make a sequence from the first 30 bases of gfp from A. victoria. To initialize a sequence, you pass it a string of DNA characters.

ATGAGTAAAGGAGAAGAACTTTTCACTGGA
TACTCATTTCCTCTTCTTGAAAAGTGACCT

A few things just happened behind the scenes. First, the input was checked to make sure it’s DNA (A, T, G, and C). For now, it supports only unambiguous letters - no N, Y, R, etc. Second, the internal representation is converted to an uppercase string - this way, DNA is displayed uniformly and functional elements (like annealing and overhang regions of primers) can be delineated using case. If you input a non-DNA sequence, a ValueError is raised.
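The two checks described above can be sketched in plain Python. This is only an illustration of the described behavior, not coral's actual implementation; the helper name normalize_dna and the wording of the error message are hypothetical.

```python
# Hypothetical helper illustrating the described checks; not part of coral.
ALLOWED_BASES = frozenset('ATGC')


def normalize_dna(sequence):
    """Return the sequence in uppercase, or raise ValueError if it is not DNA."""
    upper = sequence.upper()
    # Collect every character that is not one of the four unambiguous bases.
    invalid = sorted(set(upper) - ALLOWED_BASES)
    if invalid:
        raise ValueError('Non-DNA characters: {}'.format(', '.join(invalid)))
    return upper


print(normalize_dna('atgagtaaag'))  # prints ATGAGTAAAG
```

Passing an ambiguous letter, as in normalize_dna('ATN'), raises ValueError in this sketch, mirroring the behavior described for coral.DNA.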

For the most part, a sequence.DNA instance acts like a Python container, and many string-like operations work.

ATG
TAC
CACTGGA
GTGACCT
AGGTCACTTTTCAAGAAGAGGAAATGAGTA
TCCAGTGAAAAGTTCTTCTCCTTTACTCAT
AGGAAGGAACTTATG
TCCTTCCTTGAATAC
'AT' is in our sequence: True.
'ATT' is in our sequence: False.
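For readers who want to see the string-like operations spelled out, here is a sketch that uses a plain Python str as a stand-in for the top strand. It assumes the output above came from slices like these (the top-strand lines match); it does not call coral, and the variable names are illustrative only.

```python
# A plain str stands in for the top strand; coral.DNA itself is not used here.
top = 'ATGAGTAAAGGAGAAGAACTTTTCACTGGA'

first_three = top[:3]      # first three bases
last_seven = top[-7:]      # last seven bases
reversed_top = top[::-1]   # the strand read backwards (not complemented)
every_other = top[::2]     # every second base
has_at = 'AT' in top       # substring membership test
has_att = 'ATT' in top

print(first_three)
print(last_seven)
print(reversed_top)
print(every_other)
print("'AT' is in our sequence: {}.".format(has_at))
print("'ATT' is in our sequence: {}.".format(has_att))
```

A str only holds one strand, so the bottom-strand lines shown in the output above have no counterpart in this sketch.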

Several other common special methods and operators are defined for sequences - you can concatenate DNA (so long as it isn’t circular) using +, repeat linear sequences using * with an integer, check for equality with == and != (note: features, not just sequences, must be identical), check the length with len(dna_object), etc.
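To make the role of these operators concrete, the sketch below defines a toy container with the same special methods. MiniDNA is hypothetical and much simpler than coral.DNA: it has no features (so equality only compares the sequence and topology), and raising ValueError for circular input is an assumption made for the example.

```python
class MiniDNA(object):
    """Hypothetical toy container; not coral.DNA."""

    def __init__(self, top, circular=False):
        self.top = top.upper()
        self.circular = circular

    def __len__(self):
        return len(self.top)

    def __eq__(self, other):
        return (isinstance(other, MiniDNA) and self.top == other.top and
                self.circular == other.circular)

    def __ne__(self, other):
        return not self == other

    def __add__(self, other):
        # Assumption: concatenating circular DNA is rejected with ValueError.
        if self.circular or other.circular:
            raise ValueError('Cannot concatenate circular DNA.')
        return MiniDNA(self.top + other.top)

    def __mul__(self, count):
        # Assumption: repeating circular DNA is rejected with ValueError.
        if self.circular:
            raise ValueError('Cannot repeat circular DNA.')
        return MiniDNA(self.top * count)


start = MiniDNA('ATG')
joined = start + MiniDNA('AGT')   # concatenation returns a new object
doubled = start * 2               # repetition returns a new object
print(joined.top)
print(doubled.top)
print(len(joined))
```

Both + and * return new objects here and leave their operands unchanged.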

Simple sequences - methods

In addition to slicing, sequence.DNA provides methods for common molecular manipulations. For example, reverse complementing a sequence is a single call:

TCCAGTGAAAAGTTCTTCTCCTTTACTCAT
AGGTCACTTTTCAAGAAGAGGAAATGAGTA
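What reverse complementation computes can be shown with a few lines of plain Python. This is a minimal sketch of the operation itself, not coral's implementation; the function and the COMPLEMENT table are defined here for illustration only.

```python
# Watson-Crick pairing for the four unambiguous bases.
COMPLEMENT = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}


def reverse_complement(top):
    """Return the reverse complement of an uppercase DNA string."""
    # Walk the strand backwards and swap each base for its partner.
    return ''.join(COMPLEMENT[base] for base in reversed(top))


print(reverse_complement('ATGAGTAAAGGAGAAGAACTTTTCACTGGA'))
# prints TCCAGTGAAAAGTTCTTCTCCTTTACTCAT
```

The printed string matches the top strand of the reverse-complemented sequence shown above.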

An extremely important method is .copy(). It may seem redundant to have a dedicated method for copying a sequence: why not just assign a sequence.DNA object to a new variable? As in most high-level languages, Python does not copy an object in memory on assignment; it only adds another reference to the same data. As a result, the very common task of generating many variants of a sequence requires .copy(). For example, suppose you want a list of variants in which an ‘A’ is substituted at each position of the sequence, one position at a time. Copying example_dna before each edit gives the correct result (the list of single-base variants), while editing example_dna directly has horrible consequences, because every list entry refers to the same object and the edits build up (the list of identical all-A sequences):

ATGAGTAAAGGAGAAGAACTTTTCACTGGA
TACTCATTTCCTCTTCTTGAAAAGTGACCT
['AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA']

['ATGAGTAAAGGAGAAGAACTTTTCACTGGA', 'AAGAGTAAAGGAGAAGAACTTTTCACTGGA', 'ATAAGTAAAGGAGAAGAACTTTTCACTGGA', 'ATGAGTAAAGGAGAAGAACTTTTCACTGGA', 'ATGAATAAAGGAGAAGAACTTTTCACTGGA', 'ATGAGAAAAGGAGAAGAACTTTTCACTGGA', 'ATGAGTAAAGGAGAAGAACTTTTCACTGGA', 'ATGAGTAAAGGAGAAGAACTTTTCACTGGA', 'ATGAGTAAAGGAGAAGAACTTTTCACTGGA', 'ATGAGTAAAAGAGAAGAACTTTTCACTGGA', 'ATGAGTAAAGAAGAAGAACTTTTCACTGGA', 'ATGAGTAAAGGAGAAGAACTTTTCACTGGA', 'ATGAGTAAAGGAAAAGAACTTTTCACTGGA', 'ATGAGTAAAGGAGAAGAACTTTTCACTGGA', 'ATGAGTAAAGGAGAAGAACTTTTCACTGGA', 'ATGAGTAAAGGAGAAAAACTTTTCACTGGA', 'ATGAGTAAAGGAGAAGAACTTTTCACTGGA', 'ATGAGTAAAGGAGAAGAACTTTTCACTGGA', 'ATGAGTAAAGGAGAAGAAATTTTCACTGGA', 'ATGAGTAAAGGAGAAGAACATTTCACTGGA', 'ATGAGTAAAGGAGAAGAACTATTCACTGGA', 'ATGAGTAAAGGAGAAGAACTTATCACTGGA', 'ATGAGTAAAGGAGAAGAACTTTACACTGGA', 'ATGAGTAAAGGAGAAGAACTTTTAACTGGA', 'ATGAGTAAAGGAGAAGAACTTTTCACTGGA', 'ATGAGTAAAGGAGAAGAACTTTTCAATGGA', 'ATGAGTAAAGGAGAAGAACTTTTCACAGGA', 'ATGAGTAAAGGAGAAGAACTTTTCACTAGA', 'ATGAGTAAAGGAGAAGAACTTTTCACTGAA', 'ATGAGTAAAGGAGAAGAACTTTTCACTGGA']
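The same aliasing effect can be reproduced with an ordinary mutable Python list, without coral. The sketch below uses a four-base sequence and the standard copy module; the variable names are illustrative only.

```python
import copy

original = list('ATGC')

# Incorrect: every "variant" is the same list object, so edits accumulate.
shared = list(original)
aliased = []
for i in range(len(shared)):
    variant = shared              # a second name for the same list
    variant[i] = 'A'
    aliased.append(variant)
aliased_strings = [''.join(v) for v in aliased]

# Correct: copy first, then edit the copy.
copied = []
for i in range(len(original)):
    variant = copy.copy(original)  # an independent list
    variant[i] = 'A'
    copied.append(variant)
copied_strings = [''.join(v) for v in copied]

print(aliased_strings)  # prints ['AAAA', 'AAAA', 'AAAA', 'AAAA']
print(copied_strings)   # prints ['ATGC', 'AAGC', 'ATAC', 'ATGA']
```

The first variant in the copied list equals the original because position 0 is already an A, just as several entries in the correct coral output above are unchanged.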

An important fact about sequence.DNA methods and slicing is that none of these operations modify the object they are called on (they don’t mutate their parent). If we look at example_dna, it has not itself been reverse-complemented. Running example_dna.reverse_complement() returns a new sequence, so if you want to save your change you need to assign the result to a variable:

ATGAGTAAAGGAGAAGAACTTTTCACTGGA
TACTCATTTCCTCTTCTTGAAAAGTGACCT
TCCAGTGAAAAGTTCTTCTCCTTTACTCAT
AGGTCACTTTTCAAGAAGAGGAAATGAGTA

You also have direct access to important attributes of a sequence.DNA object. The following examples show how to get important sequences or information about a sequence.

ATGAGTAAAGGAGAAGAACTTTTCACTGGA
TCCAGTGAAAAGTTCTTCTCCTTTACTCAT
True
False
True
ATGAGTAAAGGAGAAGAACTTTTCACTGGA

GAGTAAAGGAGAAGAACTTTTCACTGGAAT
TCCAGTGAAAAGTTCTTCTCCTTTACTCAT
AGGTCACTTTTCAAGAAGAGGAAATGAGTA