Bio.Sequence (bio_elixir v0.2.0)
Bio.Sequence is the basic building block of the sequence types.
The core concept is that a polymer is a sequence of elements encoded as a
binary. It is stored in the base %Bio.Sequence{} struct, which has a sequence
field and a length field and may also carry a label.
The struct is intentionally sparse on information because it is meant to
compose into larger data types. For example, the Bio.Sequence.DnaDoubleStrand
struct has two Bio.Sequence.DnaStrand polymers as its top_strand and
bottom_strand fields.
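A minimal sketch of this layering, assuming plain structs. The modules MySeq.Strand and MySeq.DoubleStrand are hypothetical stand-ins, not the library's Bio.Sequence.DnaStrand or Bio.Sequence.DnaDoubleStrand, and the field defaults and constructor options are assumptions. The base struct records only the binary, its length and an optional label; the double-stranded type adds no sequence data of its own and simply holds two strands.

```elixir
defmodule MySeq.Strand do
  # Hypothetical stand-in for a sparse base sequence struct:
  # the polymer as a binary, its length, and an optional label.
  defstruct sequence: "", length: 0, label: ""

  @doc "Builds a strand, computing the length from the binary."
  def new(sequence, opts \\ []) when is_binary(sequence) do
    %__MODULE__{
      sequence: sequence,
      # Counts graphemes; for plain ASCII sequences this equals byte_size/1.
      length: String.length(sequence),
      label: Keyword.get(opts, :label, "")
    }
  end
end

defmodule MySeq.DoubleStrand do
  # Hypothetical composite type: it only composes two strands.
  alias MySeq.Strand

  defstruct top_strand: nil, bottom_strand: nil, label: ""

  @doc "Builds a double strand from two binaries."
  def new(top, bottom, opts \\ []) when is_binary(top) and is_binary(bottom) do
    %__MODULE__{
      top_strand: Strand.new(top),
      bottom_strand: Strand.new(bottom),
      label: Keyword.get(opts, :label, "")
    }
  end
end
```

For instance, MySeq.DoubleStrand.new("atgc", "gcat").top_strand.length is 4. Keeping the base struct this small is what lets a composite type reuse it without carrying unrelated fields.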
Because many of the sequence behaviors are shared, they are implemented by
Bio.SimpleSequence and used in the modules that need them. This ensures a
consistent implementation of the Enumerable protocol across sequence types,
which in turn allows for common interaction patterns à la Python strings:
Examples
iex> "gmc" in Bio.Sequence.new("agmctbo")
true
iex> Bio.Sequence.new("agmctbo")
...> |> Enum.map(&(&1))
["a", "g", "m", "c", "t", "b", "o"]
iex> alias Bio.Enum, as: Bnum
...> Bio.Sequence.new("agmctbo")
...> |> Bnum.slice(2, 2)
%Bio.Sequence{sequence: "mc", length: 2, label: ""}

My hope is that this alleviates some of the pain of coming from a language where strings are slightly more complex objects.
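The patterns in the examples follow from how the Enumerable protocol is implemented. Below is a minimal sketch, not Bio.SimpleSequence's actual code: the module MySeq.Simple and its slice/3 helper are hypothetical, and treating member?/2 as a substring test is an assumption inferred from the "gmc" example. Elixir's `in` operator calls Enum.member?/2, which dispatches to Enumerable.member?/2; reduce/3 walks the sequence one grapheme at a time, so Enum.map/2 yields one-letter strings.

```elixir
defmodule MySeq.Simple do
  # Hypothetical minimal sequence struct mirroring the fields described above.
  defstruct sequence: "", length: 0, label: ""

  def new(sequence, opts \\ []) when is_binary(sequence) do
    %__MODULE__{
      sequence: sequence,
      length: String.length(sequence),
      label: Keyword.get(opts, :label, "")
    }
  end

  # Hypothetical struct-preserving slice (start index, length),
  # so the result is a sequence again rather than a plain list.
  def slice(%__MODULE__{sequence: seq, label: label}, start, len) do
    new(String.slice(seq, start, len), label: label)
  end
end

defimpl Enumerable, for: MySeq.Simple do
  # The length is stored on the struct, so counting needs no traversal.
  def count(%MySeq.Simple{length: length}), do: {:ok, length}

  # Membership as a substring test, so "gmc" in seq behaves like Python's `in`.
  def member?(%MySeq.Simple{sequence: seq}, element) when is_binary(element),
    do: {:ok, String.contains?(seq, element)}

  # Non-binaries can never be part of the sequence.
  def member?(_seq, _element), do: {:ok, false}

  # Traverse the graphemes by delegating to the List implementation.
  def reduce(%MySeq.Simple{sequence: seq}, acc, fun),
    do: Enumerable.reduce(String.graphemes(seq), acc, fun)

  # Let Enum fall back to reduce-based slicing (returns plain lists).
  def slice(_seq), do: {:error, __MODULE__}
end
```

With this sketch, "gmc" in MySeq.Simple.new("agmctbo") is true and Enum.count/1 returns the stored length. Note that Enum.slice/3 on it returns a list of graphemes; getting a struct back, as Bnum.slice/3 does in the examples, requires a wrapper like the hypothetical MySeq.Simple.slice/3.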