Bio.Sequence.Alphabets (bio_ex_sequence v0.1.1)

Alphabets for the supported sequence types. The coding schemes are expressed in what is essentially BNF. Values and interpretations for the schemes were accessed from here.

This module also exposes the complementary elements for DNA and RNA, which allow strands to be complemented. These functions shouldn't be used directly; see Bio.Sequence.Dna.complement/2 and Bio.Sequence.Rna.complement/1 for more information.
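To illustrate what complementing by element means, here is a minimal sketch. It does not show this library's implementation: the module name `ComplementSketch` and both function names are hypothetical, and the pairings are the standard IUPAC DNA complements rather than values read from this module.

```elixir
defmodule ComplementSketch do
  # Hypothetical helper, not part of bio_ex_sequence.
  # Pairs follow the standard IUPAC complements (A/T, C/G, R/Y, K/M, B/V, D/H);
  # S, W, and N are their own complements.
  @dna_pairs %{
    "A" => "T", "T" => "A", "C" => "G", "G" => "C",
    "R" => "Y", "Y" => "R", "K" => "M", "M" => "K",
    "B" => "V", "V" => "B", "D" => "H", "H" => "D",
    "S" => "S", "W" => "W", "N" => "N"
  }

  # Complement a single one-letter code, preserving its case.
  def complement_base(base) do
    upper = String.upcase(base)

    case Map.fetch(@dna_pairs, upper) do
      {:ok, comp} when base == upper -> {:ok, comp}
      {:ok, comp} -> {:ok, String.downcase(comp)}
      :error -> {:error, {:unknown_base, base}}
    end
  end

  # Complement every element of a sequence, halting at the first unknown code.
  def complement(sequence) do
    sequence
    |> String.graphemes()
    |> Enum.reduce_while({:ok, []}, fn base, {:ok, acc} ->
      case complement_base(base) do
        {:ok, comp} -> {:cont, {:ok, [comp | acc]}}
        error -> {:halt, error}
      end
    end)
    |> case do
      {:ok, acc} -> {:ok, acc |> Enum.reverse() |> Enum.join()}
      error -> error
    end
  end
end
```

With this sketch, `ComplementSketch.complement("ATGCn")` returns `{:ok, "TACGn"}`, and a sequence containing a code outside the table returns an error tuple naming the offending element.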

Alphabets may be used in the declaration of Bio.BaseSequence structs to define how they should be validated. If one is not supplied, a default may be used instead. See Bio.Sequence.Dna, Bio.Sequence.Rna, Bio.Sequence.AminoAcid, and Bio.Polymer.valid?/2 for more information.

The alphabets provided are:

  • Bio.Sequence.Dna

    • common - The standard bases ATGCatgc
    • with_n - The standard alphabet, but with the ambiguous "any" character Nn
    • iupac - The IUPAC standard values ACGTRYSWKMBDHVNacgtryswkmbdhvn
  • Bio.Sequence.Rna

    • common - The standard bases ACGUacgu
    • with_n - The standard alphabet, but with the ambiguous "any" character Nn
    • iupac - The IUPAC standard values ACGURYSWKMBDHVNacguryswkmbdhvn
  • Bio.Sequence.AminoAcid

    • common - The standard 20 amino acid codes ARNDCEQGHILKMFPSTWYVarndceqghilkmfpstwyv
    • iupac - The IUPAC standard values ABCDEFGHJIKLMNPQRSTVWXYZabcdefghjiklmnpqrstvwxyz
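As a rough illustration of how an alphabet can drive validation, the sketch below checks that every element of a sequence appears in an alphabet string. The module `AlphabetCheckSketch` and its functions are hypothetical and are not the API of Bio.Polymer.valid?/2. The two alphabet strings are copied from the DNA list above.

```elixir
defmodule AlphabetCheckSketch do
  # Hypothetical helper, not part of bio_ex_sequence.
  # Alphabet strings copied from the DNA entries in the list above.
  def dna_common, do: "ATGCatgc"
  def dna_iupac, do: "ACGTRYSWKMBDHVNacgtryswkmbdhvn"

  # True when every element of `sequence` appears in `alphabet`.
  # An empty sequence is treated as valid in this sketch.
  def valid?(sequence, alphabet) do
    allowed = alphabet |> String.graphemes() |> MapSet.new()

    sequence
    |> String.graphemes()
    |> Enum.all?(&MapSet.member?(allowed, &1))
  end
end
```

For example, `"ATGRN"` fails against the `common` alphabet because `R` and `N` are not standard bases, but passes against the `iupac` alphabet.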

Coding Schemes

Deoxyribonucleic Acid codes

A ::= Adenine
C ::= Cytosine
G ::= Guanine
T ::= Thymine

R ::= A | G
Y ::= C | T
S ::= G | C
W ::= A | T
K ::= G | T
M ::= A | C

B ::= S | T (¬A)
D ::= R | T (¬C)
H ::= M | T (¬G)
V ::= M | G (¬T)
N ::= ANY
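The grammar above can be flattened into a lookup from each code to the concrete bases it stands for. The following is a sketch only: `DnaCodesSketch` and its functions are hypothetical names, and `N ::= ANY` is assumed here to mean any of the four standard bases.

```elixir
defmodule DnaCodesSketch do
  # Hypothetical helper, not part of bio_ex_sequence.
  # Each code maps to the concrete bases it can represent, obtained by
  # expanding the rules above (for example B ::= S | T becomes C, G, T).
  # Assumption: N ("ANY") is taken to mean any of A, C, G, T.
  @expansions %{
    "A" => ~w(A), "C" => ~w(C), "G" => ~w(G), "T" => ~w(T),
    "R" => ~w(A G), "Y" => ~w(C T), "S" => ~w(C G), "W" => ~w(A T),
    "K" => ~w(G T), "M" => ~w(A C),
    "B" => ~w(C G T), "D" => ~w(A G T), "H" => ~w(A C T), "V" => ~w(A C G),
    "N" => ~w(A C G T)
  }

  # Return {:ok, bases} for a known code (case-insensitive), or :error.
  def expand(code), do: Map.fetch(@expansions, String.upcase(code))

  # True when the concrete base is one of the alternatives of the code.
  def matches?(code, base) do
    case expand(code) do
      {:ok, bases} -> String.upcase(base) in bases
      :error -> false
    end
  end
end
```

Here `DnaCodesSketch.expand("B")` returns `{:ok, ["C", "G", "T"]}`, which matches the `(¬A)` annotation in the table.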

Ribonucleic Acid codes

A ::= Adenine
C ::= Cytosine
G ::= Guanine
U ::= Uracil

R ::= A | G
Y ::= C | U
S ::= G | C
W ::= A | U
K ::= G | U
M ::= A | C

B ::= S | U (¬A)
D ::= R | U (¬C)
H ::= M | U (¬G)
V ::= M | G (¬U)
N ::= ANY

Amino Acid codes

A ::= Alanine
C ::= Cysteine
D ::= Aspartic Acid
E ::= Glutamic Acid
F ::= Phenylalanine
G ::= Glycine
H ::= Histidine
I ::= Isoleucine
K ::= Lysine
L ::= Leucine
M ::= Methionine
N ::= Asparagine
P ::= Proline
Q ::= Glutamine
R ::= Arginine
S ::= Serine
T ::= Threonine
V ::= Valine
W ::= Tryptophan
Y ::= Tyrosine

B ::= D | N
Z ::= Q | E
J ::= I | L
X ::= ANY
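The ambiguous amino acid codes can be used to test a concrete peptide against a pattern. The sketch below assumes uppercase input and reads `X ::= ANY` as any of the 20 standard residues. `AminoAcidCodesSketch` and its functions are hypothetical and not part of the library.

```elixir
defmodule AminoAcidCodesSketch do
  # Hypothetical helper, not part of bio_ex_sequence. Uppercase codes only.
  @standard ~w(A C D E F G H I K L M N P Q R S T V W Y)
  @ambiguous %{"B" => ~w(D N), "Z" => ~w(Q E), "J" => ~w(I L)}

  # True when the standard residue is allowed by the (possibly ambiguous) code.
  # Assumption: X ("ANY") accepts any of the 20 standard residues.
  def matches?("X", residue), do: residue in @standard
  def matches?(code, residue) when code in @standard, do: code == residue
  def matches?(code, residue), do: residue in Map.get(@ambiguous, code, [])

  # Compare a pattern and a peptide of equal length, position by position.
  def peptide_matches?(pattern, peptide) do
    codes = String.graphemes(pattern)
    residues = String.graphemes(peptide)

    length(codes) == length(residues) and
      codes |> Enum.zip(residues) |> Enum.all?(fn {c, r} -> matches?(c, r) end)
  end
end
```

For instance, the pattern `"MBXJ"` accepts the peptide `"MNKL"`, since `B` covers `N`, `X` covers `K`, and `J` covers `L`.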