Bio.Sequence (bio_elixir v0.3.0)
Bio.Sequence is the basic building block of the sequence types.
The core concept here is that a polymer is a sequence of elements encoded as a
binary. This is stored in the base %Bio.Sequence{} struct, which has both a
sequence and length field, and may carry a label and alphabet field as
well.
The struct is intentionally sparse on information, since it is meant to
compose into larger data types. For example, the Bio.Sequence.DnaDoubleStrand
struct holds two Bio.Sequence.DnaStrand polymers in its top_strand and
bottom_strand fields.
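To make the composition idea concrete, here is a minimal sketch of a sparse strand struct nested inside a double-strand struct. The `CompositionSketch` modules and their `new` functions are hypothetical stand-ins, not the library's actual definitions; only the field names `sequence`, `length`, `label`, `top_strand` and `bottom_strand` come from the description above.

```elixir
defmodule CompositionSketch.Strand do
  # Hypothetical stand-in for a single-strand polymer struct.
  defstruct sequence: "", length: 0, label: nil

  # Build a strand from a binary, storing its length alongside it.
  def new(seq, label \\ nil) when is_binary(seq) do
    %__MODULE__{sequence: seq, length: String.length(seq), label: label}
  end
end

defmodule CompositionSketch.DoubleStrand do
  # Hypothetical stand-in for a struct composed of two strands.
  alias CompositionSketch.Strand

  defstruct top_strand: nil, bottom_strand: nil

  # Wrap each binary in its own strand struct.
  def new(top, bottom) when is_binary(top) and is_binary(bottom) do
    %__MODULE__{top_strand: Strand.new(top), bottom_strand: Strand.new(bottom)}
  end
end

double = CompositionSketch.DoubleStrand.new("atgc", "tacg")
IO.inspect(double.top_strand.sequence)   # prints "atgc"
IO.inspect(double.bottom_strand.length)  # prints 4
```

Because each strand carries only its own sequence data, the larger struct can add whatever context it needs without changing the building block.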
Because many of the sequence behaviors are shared, they are implemented by
Bio.BaseSequence and used in the modules that need them. This allows us to
ensure that there is a consistent implementation of the Enumerable protocol,
which in turn allows for common interaction patterns a la Python strings:
iex>"gmc" in Bio.Sequence.new("agmctbo")
true
iex>Bio.Sequence.new("agmctbo")
...>|> Enum.map(&(&1))
["a", "g", "m", "c", "t", "b", "o"]
My hope is that this alleviates some of the pain of coming from a language where strings are slightly more complex objects.
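For readers curious how a binary-backed struct can support `in` and `Enum.map`, the following is a minimal sketch of an Enumerable implementation. `SketchSeq` is a hypothetical struct invented for illustration, and this is not the code in Bio.BaseSequence; it only assumes that membership is a substring test and that enumeration yields one grapheme at a time.

```elixir
defmodule SketchSeq do
  # Hypothetical stand-in for %Bio.Sequence{}; not the library's source.
  defstruct sequence: "", length: 0

  def new(seq) when is_binary(seq) do
    %__MODULE__{sequence: seq, length: String.length(seq)}
  end
end

defimpl Enumerable, for: SketchSeq do
  # The length is stored on the struct, so counting needs no traversal.
  def count(%SketchSeq{length: length}), do: {:ok, length}

  # Membership is a substring test, which is what lets `"gmc" in seq` work.
  def member?(%SketchSeq{sequence: seq}, element) when is_binary(element) do
    {:ok, String.contains?(seq, element)}
  end

  def member?(_seq, _element), do: {:ok, false}

  # Returning an error tuple tells Enum to fall back to reduce/3 for slicing.
  def slice(_seq), do: {:error, __MODULE__}

  # Reduce over the graphemes by delegating to the list implementation.
  def reduce(%SketchSeq{sequence: seq}, acc, fun) do
    Enumerable.List.reduce(String.graphemes(seq), acc, fun)
  end
end

seq = SketchSeq.new("agmctbo")
IO.inspect("gmc" in seq)          # prints true
IO.inspect(Enum.map(seq, & &1))   # prints ["a", "g", "m", "c", "t", "b", "o"]
```

The `in` operator calls `Enum.member?/2`, which dispatches to `member?/2` above, so the substring behavior is a choice made by the implementation rather than something Enum provides by default.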
Additionally, you should look at the Bio.Enum module for cases where the
default Enum implementation results in odd behavior. It also preserves the
input type in its return values, where Enum would return a list. Compare:
iex>Bio.Sequence.new("agmctbo")
...>|> Enum.slice(2, 2)
'mc'
vs
iex>alias Bio.Enum, as: Bnum
...>Bio.Sequence.new("agmctbo")
...>|> Bnum.slice(2, 2)
%Bio.Sequence{sequence: "mc", length: 2}
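One way to get a type-preserving slice is to operate on the underlying binary and wrap the result back into the struct. The sketch below illustrates that idea under stated assumptions: `SliceSketch.Seq` and `SliceSketch.Bnum` are hypothetical modules, and nothing here is taken from the actual Bio.Enum implementation.

```elixir
defmodule SliceSketch.Seq do
  # Hypothetical stand-in for %Bio.Sequence{}.
  defstruct sequence: "", length: 0

  def new(seq) when is_binary(seq) do
    %__MODULE__{sequence: seq, length: String.length(seq)}
  end
end

defmodule SliceSketch.Bnum do
  # Hypothetical wrapper: slices the binary, then rebuilds the struct so the
  # caller gets back the same type it passed in.
  alias SliceSketch.Seq

  def slice(%Seq{sequence: seq}, start, amount)
      when is_integer(start) and is_integer(amount) and amount >= 0 do
    seq
    |> String.slice(start, amount)
    |> Seq.new()
  end
end

seq = SliceSketch.Seq.new("agmctbo")
sliced = SliceSketch.Bnum.slice(seq, 2, 2)
IO.inspect(sliced.sequence)  # prints "mc"
IO.inspect(sliced.length)    # prints 2
```

Rebuilding through `new/1` keeps the `length` field in sync with the sliced binary, so the stored length never goes stale.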