Bio.IO.SnapGene (bio_elixir v0.3.0)
Read a SnapGene file
The file is read into a struct with the following fields:
%Bio.IO.SnapGene{
  sequence: %Bio.Sequence.DnaStrand{},
  circular?: boolean(),
  valid?: boolean(),
  features: tuple()
}
The circular? and sequence fields are parsed from the DNA packet.
The sequence field is a Bio.Sequence.DnaStrand by default, but any module that implements the Bio.Behaviours.Sequence behaviour can be used instead, because the struct is created by calling that module's new/2 function.
Error
No validation is applied to the sequence, so passing a module for the wrong kind of sequence silently produces an invalid sequence struct.
The valid? field is determined by parsing the SnapGene cookie and checking that it contains the required "SnapGene" string.
Note
Validity refers to the SnapGene file itself, not to the sequence or any other parsed data.
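To make the cookie check concrete, here is a minimal sketch of how it could be written with binary pattern matching. It assumes the packet layout used by Biopython's SnapGene parser (Bio.SeqIO.SnapGeneIO): each packet is a one-byte type, a 4-byte big-endian length, and that many bytes of data, and the file opens with a cookie packet of type 0x09 whose data begins with "SnapGene". The module name CookieCheck is hypothetical, and this is not necessarily how bio_elixir computes valid?.

```elixir
defmodule CookieCheck do
  # Hypothetical helper, not part of bio_elixir. It assumes each packet is
  # laid out as <<type::8, length::unsigned-big-32, data::binary-size(length)>>
  # and that the cookie packet, type 0x09, comes first in the file.

  @doc "Returns true if `binary` starts with a SnapGene cookie packet."
  def valid_cookie?(<<0x09, len::unsigned-big-32, data::binary-size(len), _rest::binary>>) do
    # The cookie data should begin with the literal bytes "SnapGene".
    match?(<<"SnapGene", _::binary>>, data)
  end

  # Any other packet type, or a truncated packet, fails the check.
  def valid_cookie?(_other), do: false
end

# "SnapGene" plus three 16-bit fields gives a 14-byte cookie payload.
cookie = <<0x09, 14::unsigned-big-32, "SnapGene", 1::16, 15::16, 19::16>>
CookieCheck.valid_cookie?(cookie)
# => true
```

In practice you would pass it the binary returned by File.read/1 on a .dna file.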
Features need more explanation, because SnapGene stores them as XML. Parsing them into a map would certainly be possible, but doing so would limit what a developer can do with what I hope is a lower-level library than some.
In the interest of leaving the end user with as much power as possible, this function does not attempt to interpret the XML stored in the file. Instead, the XML is returned in the form generated by :xmerl_scan.string/1. This gives you access to all of the data stored in the file, not just a parsed subset. It also means that to query the data you need to be comfortable composing XPath expressions. For example, if the first feature is a terminator and you want its segment range:
iex> {:ok, sample} = SnapGene.read("test/io/snap_gene/sample-e.dna")
...> :xmerl_xpath.string('string(/*/Feature[1]/Segment/@range)', sample.features)
{:xmlObj, :string, '400-750'}
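If you want to experiment with XPath queries without a .dna file, the same xmerl calls work on any document produced by :xmerl_scan.string/1. The sketch below parses a small, made-up feature list (the element and attribute names only imitate the query above; real SnapGene feature XML contains much more) and runs the same query. It assumes, as the example above suggests, that sample.features holds the parsed root element.

```elixir
# Made-up XML that only imitates the shape queried above; real SnapGene
# feature XML contains many more elements and attributes.
xml = ~c"""
<Features>
  <Feature name="T1" type="terminator"><Segment range="400-750"/></Feature>
  <Feature name="ori" type="rep_origin"><Segment range="161-241"/></Feature>
</Features>
"""

# :xmerl_scan.string/1 returns {root_element, rest_of_input}.
{doc, _rest} = :xmerl_scan.string(xml)

# string(...) converts the selected attribute into an XPath string value.
:xmerl_xpath.string(~c"string(/*/Feature[1]/Segment/@range)", doc)
# => {:xmlObj, :string, ~c"400-750"}
```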
This approach also requires some familiarity with the file format, for example whether a range is inclusive or exclusive at either end. Querying a node that doesn't exist returns an empty string, which xmerl represents as an empty list ([]):
iex> {:ok, sample} = SnapGene.read("test/io/snap_gene/sample-e.dna")
...> :xmerl_xpath.string('string(/*/Feature[1]/Unknown/Path/@range)', sample.features)
{:xmlObj, :string, []}
These semantics are admittedly odd, but there is not much to be done about them.
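One way to live with this is a small wrapper that turns the empty result into nil and everything else into an Elixir string. This is a minimal sketch; FeatureXPath and string_or_nil/2 are hypothetical names, not part of bio_elixir, and the XML is made up. Note that XPath cannot tell a missing attribute from one that is present but empty, so both map to nil here.

```elixir
defmodule FeatureXPath do
  # Hypothetical convenience wrapper, not part of bio_elixir.
  # Evaluates an XPath string() expression and returns an Elixir string,
  # or nil when the result is empty (for example, a missing node).
  def string_or_nil(path, doc) when is_list(path) do
    case :xmerl_xpath.string(path, doc) do
      {:xmlObj, :string, []} -> nil
      {:xmlObj, :string, value} -> List.to_string(value)
    end
  end
end

{doc, _rest} =
  :xmerl_scan.string(~c"<Features><Feature><Segment range=\"400-750\"/></Feature></Features>")

FeatureXPath.string_or_nil(~c"string(/*/Feature[1]/Segment/@range)", doc)
# => "400-750"
FeatureXPath.string_or_nil(~c"string(/*/Feature[1]/Unknown/Path/@range)", doc)
# => nil
```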
For scalar XPath expressions such as string() and count(), :xmerl_xpath.string/2 returns an {:xmlObj, type, value} tuple, and Enumerable isn't implemented for tuples. (A plain node-set query, such as /*/Feature, returns a list of xmerl records instead.) You're best off using XPath itself to select the elements you need. Counts are easy to retrieve this way; for example, to find how many feature segments there are:
iex> {:ok, sample} = SnapGene.read("test/io/snap_gene/sample-e.dna")
...> :xmerl_xpath.string('count(/*/Feature/Segment)', sample.features)
{:xmlObj, :number, 2}
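Node-set queries behave differently from the scalar queries above: they return a list, so the usual Enum functions apply. The sketch below, on a made-up two-feature document, selects the range attributes directly and maps over the result. It relies on Elixir's Record.extract/2 to read the xmlAttribute record definition from xmerl's public header; the module name FeatureRanges is hypothetical.

```elixir
defmodule FeatureRanges do
  require Record

  # Pull the xmlAttribute record definition from xmerl's public header so its
  # fields can be read by name.
  Record.defrecord(
    :xml_attribute,
    :xmlAttribute,
    Record.extract(:xmlAttribute, from_lib: "xmerl/include/xmerl.hrl")
  )

  # A node-set query (no string() or count() wrapper) returns a plain list of
  # xmlAttribute records, so Enum.map/2 works on it directly.
  def ranges(doc) do
    ~c"/*/Feature/Segment/@range"
    |> :xmerl_xpath.string(doc)
    |> Enum.map(fn attr -> List.to_string(xml_attribute(attr, :value)) end)
  end
end

# Made-up document standing in for sample.features.
{doc, _rest} =
  :xmerl_scan.string(~c"""
  <Features>
    <Feature><Segment range="400-750"/></Feature>
    <Feature><Segment range="161-241"/></Feature>
  </Features>
  """)

FeatureRanges.ranges(doc)
```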
Now it's a simple matter to map over the desired queries to build up some data from the XML:
iex> {:ok, sample} = SnapGene.read("test/io/snap_gene/sample-e.dna")
...> Enum.map(1..2, fn i -> :xmerl_xpath.string('string(/*/Feature[#{i}]/Segment/@range)', sample.features) end)
[{:xmlObj, :string, '400-750'}, {:xmlObj, :string, '161-241'}]
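The example above hard-codes 1..2. A sketch that derives the range from a count() query instead, run on a made-up two-feature document standing in for sample.features. It counts Feature elements rather than segments, on the assumption that you want to visit each feature once even if it has several segments.

```elixir
# Made-up two-feature document standing in for sample.features.
{doc, _rest} =
  :xmerl_scan.string(~c"""
  <Features>
    <Feature><Segment range="400-750"/></Feature>
    <Feature><Segment range="161-241"/></Feature>
  </Features>
  """)

# Ask XPath how many Feature elements there are instead of hard-coding 1..2.
{:xmlObj, :number, count} = :xmerl_xpath.string(~c"count(/*/Feature)", doc)

# XPath positions are 1-based. The explicit //1 step makes the range empty
# when count is 0, instead of counting down from 1 to 0.
ranges =
  Enum.map(1..count//1, fn i ->
    :xmerl_xpath.string(~c"string(/*/Feature[#{i}]/Segment/@range)", doc)
  end)
# => [{:xmlObj, :string, ~c"400-750"}, {:xmlObj, :string, ~c"161-241"}]
```

The stepped range syntax requires Elixir 1.12 or later.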
I cover the basics of using XPath to perform queries in the Using XML guide. I also plan to write a follow-up guide with further query examples and an explanation of how concepts in the XML map to what BioPython parses.
Summary
Functions
Read the contents of a SnapGene file.
Functions
@spec read(filename :: Path.t(), opts :: keyword()) :: {:ok, struct()} | {:error, File.posix()}
Read the contents of a SnapGene file.
Takes a filename and reads its contents into a %Bio.IO.SnapGene{} struct.
On failure, returns an error tuple carrying the reason from File.read/1.
You can use :file.format_error/1 to get a descriptive string for the error.
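A short sketch of turning that error into a readable message. Because read/2 is documented to return the cause from File.read/1, File.read/1 stands in for it here so the snippet runs without bio_elixir installed; the path is a placeholder assumed not to exist.

```elixir
# File.read/1 stands in for Bio.IO.SnapGene.read/2, which is documented to
# return File.read/1's {:error, reason} tuple on failure.
message =
  case File.read("no/such/file.dna") do
    {:ok, _contents} -> "read ok"
    # :file.format_error/1 returns a charlist; interpolation converts it.
    {:error, reason} -> "could not read file: #{:file.format_error(reason)}"
  end

IO.puts(message)
# prints "could not read file: no such file or directory"
```

In application code you would match on the result of SnapGene.read/2 in the same way.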