Bio.Polymeric protocol (bio_ex_sequence v0.1.1)
Defines the Polymeric interface of a sequence type.
The Bio.Polymeric protocol allows us to define implementations of a kmers/2 function. This is part of the approach to translating between different polymers according to the nature of the actual biological or chemical processes.
The idea is that defining how a sequence is subdivided into k-mers for enumeration is necessary for specific conversions. However, it is not something you would want to redo every time you apply a conversion.
Each structural definition of a sequence has some meaningful way of splitting it into a k-mer enumeration. This is used in all forms of computation, but largely in conversions. For example, DNA -> RNA conversion requires an element-wise (k = 1) conversion function, whereas RNA -> amino acid conversion requires a codon-wise (k = 3) one.
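To make the k = 1 versus k = 3 distinction concrete, here is a minimal, self-contained Elixir sketch. It does not use bio_ex_sequence: the module `ConversionSketch`, its `chunk/2` helper and the four-entry codon table are hypothetical, chosen only to show that the same chunking step with a different k drives both kinds of conversion.

```elixir
# Hypothetical sketch: ConversionSketch is illustrative only and is not
# part of the bio_ex_sequence API.
defmodule ConversionSketch do
  # Split a sequence into non-overlapping chunks of k characters.
  def chunk(sequence, k) do
    sequence |> String.graphemes() |> Enum.chunk_every(k) |> Enum.map(&Enum.join/1)
  end

  # DNA -> RNA is element-wise (k = 1): each base is converted on its own.
  def dna_to_rna(dna) do
    dna
    |> chunk(1)
    |> Enum.map(fn
      "t" -> "u"
      "T" -> "U"
      base -> base
    end)
    |> Enum.join()
  end

  # RNA -> amino acid is codon-wise (k = 3). This toy table holds only four of
  # the 64 codons; Map.fetch!/2 raises for any codon it does not contain.
  @codons %{"aug" => "M", "gcc" => "A", "uuu" => "F", "uaa" => "*"}

  def rna_to_protein(rna) do
    rna
    |> chunk(3)
    |> Enum.map(&Map.fetch!(@codons, &1))
    |> Enum.join()
  end
end
```

Only the k passed to `chunk/2` differs between the two conversions; this is the per-type decision the protocol is meant to capture once.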
To preserve the standard interface defined by Bio.Polymer and Bio.Polymer.convert/3, we define this as a protocol.
For a valid return, consider the following:
- The enumerable returned (Enum.t()) should contain the information required to perform a conversion. Examples can be found in the Bio.Sequence.DnaStrand and Bio.Sequence.DnaDoubleStrand modules. There, you'll see that for a simple sequence it makes sense to simply iterate over the grouped chunks, whereas the double-stranded sequence returns a list of tuples of chunks.
- The map() should contain relevant data for recapitulating a struct. So if you're converting a DnaStrand, you should consider passing back out the label field. This allows the conversion function to attach it to the newly constructed type.
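The sketch below illustrates both considerations with hypothetical types. `PolymericSketch`, `SingleSketch`, `DoubleSketch`, `ChunkHelper`, `UpcaseSketch` and the `{:ok, enumerable, map}` return shape are assumptions for illustration, not the actual Bio.Polymeric API: a single strand returns its grouped chunks, a double strand returns `{top, bottom}` tuples, and the map carries the label so a conversion can re-attach it.

```elixir
# Hypothetical names and return shape throughout; not the bio_ex_sequence API.
defprotocol PolymericSketch do
  @doc "Split a polymer into chunks of size k, plus data needed to rebuild a struct."
  def kmers(polymer, k)
end

defmodule SingleSketch do
  defstruct sequence: "", label: nil
end

defmodule DoubleSketch do
  defstruct top: "", bottom: "", label: nil
end

defmodule ChunkHelper do
  # Split a string into non-overlapping chunks of k characters.
  def chunk(seq, k), do: seq |> String.graphemes() |> Enum.chunk_every(k) |> Enum.map(&Enum.join/1)
end

defimpl PolymericSketch, for: SingleSketch do
  # A simple strand yields the grouped chunks directly.
  def kmers(%SingleSketch{sequence: seq, label: label}, k),
    do: {:ok, ChunkHelper.chunk(seq, k), %{label: label}}
end

defimpl PolymericSketch, for: DoubleSketch do
  # A double strand yields {top_chunk, bottom_chunk} tuples.
  def kmers(%DoubleSketch{top: top, bottom: bottom, label: label}, k) do
    pairs = Enum.zip(ChunkHelper.chunk(top, k), ChunkHelper.chunk(bottom, k))
    {:ok, pairs, %{label: label}}
  end
end

# A toy element-wise conversion (uppercasing) that uses the returned map to
# carry the label over to the newly constructed struct.
defmodule UpcaseSketch do
  def convert(%SingleSketch{} = strand) do
    {:ok, chunks, %{label: label}} = PolymericSketch.kmers(strand, 1)
    %SingleSketch{sequence: chunks |> Enum.map(&String.upcase/1) |> Enum.join(), label: label}
  end
end
```

Because the label travels in the map rather than in the chunks, the conversion code never needs to know the struct's other fields.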
The error mode varies between sequence types, but generally it comes down to a mismatch between the sequence length and the k value. For the built-in Bio.Sequence.DnaStrand, this is simply a check for even division: the length must be a multiple of k. For Bio.Sequence.DnaDoubleStrand it's more complicated. That type assumes that you want to see pairs of aggregated values (top/bottom), but the strands may be offset, so you can't just check whether the values are empty. Instead, it checks whether complete aggregates can be formed, even if they're paired with empty space.
Keep these considerations in mind when implementing your own Polymeric types.
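For the single-strand case, the even-division check can be sketched as follows. The module `LengthCheckSketch` and the `{:error, {:seq_len_mismatch, leftover}}` shape are hypothetical; the built-in Bio.Sequence.DnaStrand may report the error differently, and the double-strand offset handling is not modeled here.

```elixir
# Hypothetical sketch of the single-strand error mode; not the library's code.
defmodule LengthCheckSketch do
  def kmers(sequence, k) when is_binary(sequence) and is_integer(k) and k > 0 do
    len = String.length(sequence)

    case rem(len, k) do
      0 ->
        # Length divides evenly, so every chunk is a complete k-mer.
        chunks = sequence |> String.graphemes() |> Enum.chunk_every(k) |> Enum.map(&Enum.join/1)
        {:ok, chunks}

      leftover ->
        # Report how many trailing elements could not form a complete k-mer.
        {:error, {:seq_len_mismatch, leftover}}
    end
  end
end
```

Returning the leftover count is a design choice of this sketch: it tells the caller how many trailing elements would otherwise be silently dropped by `Enum.chunk_every/2`.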
In addition to enumerating the elements, this protocol is also a sensible place to define validity. It therefore has two further functions, valid?/2 and validate/2.
These assume that a relevant alphabet is defined for the polymer, for example the IUPAC DNA codes.
Your implementations of valid?/2 and validate/2 should prefer the alphabet passed to them as an argument. The Bio.Polymer.valid?/2 and Bio.Polymer.validate/2 functions respect this: they always prefer the given alphabet, but fall back to the alphabet attached to the type if one is defined.
Example
iex> alias Bio.Sequence.Alphabets.Dna, as: Alpha
...> Bio.Sequence.DnaStrand.new("atgcnn", alphabet: Alpha.common())
...> |> Bio.Polymer.valid?()
false
iex> alias Bio.Sequence.Alphabets.Dna, as: Alpha
...> Bio.Sequence.DnaStrand.new("atgcnn", alphabet: Alpha.common())
...> |> Bio.Polymer.valid?(Alpha.with_n())
true
Note
If neither alphabet is defined (none given as an argument and none attached to the type), validate/2 will return an error tuple, whereas valid?/2 will simply return false.
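The precedence rule and the fallback to false can be sketched like this. `AlphabetSketch` is a hypothetical struct, not a bio_ex_sequence type. The alphabet "ACGTNacgtn" matches the Alpha.with_n() value shown in the validate/2 example, while "ACGTacgt" is only an assumed stand-in for a common alphabet.

```elixir
# Hypothetical sketch of alphabet precedence for valid?/2.
defmodule AlphabetSketch do
  defstruct sequence: "", alphabet: nil, valid?: false

  # The alphabet argument wins; otherwise use the one attached to the struct.
  def resolve_alphabet(%__MODULE__{alphabet: own}, given), do: given || own

  def valid?(%__MODULE__{sequence: seq} = polymer, alphabet \\ nil) do
    case resolve_alphabet(polymer, alphabet) do
      # Neither alphabet is defined: report invalid rather than raising.
      nil ->
        false

      alpha ->
        allowed = String.graphemes(alpha)
        seq |> String.graphemes() |> Enum.all?(&(&1 in allowed))
    end
  end
end
```

Usage: `AlphabetSketch.valid?(%AlphabetSketch{sequence: "atgcnn", alphabet: "ACGTacgt"}, "ACGTNacgtn")` checks against the argument, ignoring the struct's own alphabet.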
The validate/2 function behaves similarly to valid?/2, but on success it should return a new struct with the valid? key set.
Example
iex> alias Bio.Sequence.Alphabets.Dna, as: Alpha
...> Bio.Sequence.DnaStrand.new("atgcnn", alphabet: Alpha.common())
...> |> Bio.Polymer.validate()
{
:error,
[{:mismatch_alpha, "n", 4}, {:mismatch_alpha, "n", 5}]
}
iex> alias Bio.Sequence.Alphabets.Dna, as: Alpha
...> Bio.Sequence.DnaStrand.new("atgcnn", alphabet: Alpha.common())
...> |> Bio.Polymer.validate(Alpha.with_n())
{
:ok,
%Bio.Sequence.DnaStrand{
sequence: "atgcnn",
length: 6,
alphabet: "ACGTNacgtn",
valid?: true
}
}
Note
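The alphabet that was applied is the one stored in the returned struct. This ensures that you correctly track what a type is valid for, so be careful about assumptions: in the example above, the strand was created with Alpha.common(), but after validation its alphabet field holds "ACGTNacgtn", the Alpha.with_n() value.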
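A validate/2 sketch that produces the same `{:mismatch_alpha, char, index}` entries seen in the example, and stores the applied alphabet in the returned struct. `ValidateSketch` and the `{:error, :no_alphabet}` result for the no-alphabet case are hypothetical; the real library's error for that case is not shown in this documentation.

```elixir
# Hypothetical sketch: not the bio_ex_sequence implementation.
defmodule ValidateSketch do
  defstruct sequence: "", length: 0, alphabet: nil, valid?: false

  def validate(%__MODULE__{} = polymer, alphabet \\ nil) do
    # Prefer the given alphabet; fall back to the one attached to the struct.
    case alphabet || polymer.alphabet do
      nil ->
        {:error, :no_alphabet}

      alpha ->
        allowed = String.graphemes(alpha)

        # Collect {:mismatch_alpha, char, zero_based_index} for every character
        # that is not in the applied alphabet.
        mismatches =
          polymer.sequence
          |> String.graphemes()
          |> Enum.with_index()
          |> Enum.reject(fn {char, _index} -> char in allowed end)
          |> Enum.map(fn {char, index} -> {:mismatch_alpha, char, index} end)

        case mismatches do
          # Valid: return a new struct that records the applied alphabet.
          [] -> {:ok, %{polymer | alphabet: alpha, valid?: true}}
          _ -> {:error, mismatches}
        end
    end
  end
end
```

Writing the applied alphabet back into the struct is what makes the returned value self-describing: later code can read which alphabet the sequence was actually checked against.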
Summary
Functions
kmers/2: Split a polymer into chunks of size k.
valid?/2: Determine if the content of a polymer matches an alphabet.
validate/2: Validate if the content of a polymer matches an alphabet, returning an updated struct.
Types
@type t() :: term()
Functions
kmers/2
Split a polymer into chunks of size k.
valid?/2
Determine if the content of a polymer matches an alphabet.
validate/2
@spec validate(struct(), String.t() | nil) ::
  {:ok, struct()}
  | {:error, {atom(), String.t(), integer()}}
  | {:error, [{atom(), String.t(), integer()}]}
Validate if the content of a polymer matches an alphabet, returning an updated struct.
Depends on the struct defining both the alphabet and valid? keys.