Bio.IO.SnapGene (bio_elixir v0.3.0)

Read a SnapGene file

The file is read into a struct with the following fields:

%Bio.IO.SnapGene{
  sequence: %Bio.Sequence.DnaStrand{},
  circular?: boolean(),
  valid?: boolean(),
  features: tuple()
}

The circular? and sequence fields are parsed from the DNA packet.

The sequence field is a Bio.Sequence.DnaStrand by default, but any module that implements the Bio.Behaviours.Sequence behaviour can be used, since the struct is created by calling that module's new/2 function.

Error

No validation is applied to the sequence, so you can end up with an invalid sequence struct by passing a module that does not match the sequence type stored in the file.
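To illustrate why no validation happens, here is a minimal sketch of the dispatch described above. SketchSequence and SketchReader are hypothetical stand-ins, not modules from bio_elixir, and the way the real read/2 receives the module is not shown here; the sketch only assumes that the struct is built by calling new/2 on whichever module is supplied.

```elixir
# Hypothetical stand-in for a sequence module; not part of bio_elixir.
defmodule SketchSequence do
  defstruct sequence: "", length: 0, label: nil

  # Same shape as the new/2 call described above: a sequence plus options.
  def new(seq, opts \\ []) when is_binary(seq) do
    %__MODULE__{
      sequence: seq,
      length: String.length(seq),
      label: Keyword.get(opts, :label)
    }
  end
end

# Hypothetical reader step: it calls new/2 on whatever module it is given.
defmodule SketchReader do
  def build(raw, module, opts \\ []), do: module.new(raw, opts)
end

dna = SketchReader.build("ttagcc", SketchSequence, label: "sample")

# Nothing checks the alphabet, so non-DNA text is accepted just the same.
odd = SketchReader.build("MKVL", SketchSequence)
```

Because the reader only calls new/2, any checking has to live in the module you pass in.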

The valid? property is determined by parsing the SnapGene cookie to ensure that it contains the requisite "SnapGene" string.

Note

Validity here refers to the SnapGene file itself, not to the sequence or any of the other parsed data.
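A cookie check of this kind could be sketched with binary pattern matching. The packet layout below (a type byte of 0x09, a 32-bit big-endian length, then a payload that starts with "SnapGene") is an assumption based on how other SnapGene readers describe the format, and CookieSketch is a hypothetical module, not the library's implementation.

```elixir
defmodule CookieSketch do
  # Assumed layout: <<type::8, length::32-big, payload::binary-size(length)>>.
  def valid?(<<0x09, len::32, payload::binary-size(len), _rest::binary>>) do
    match?(<<"SnapGene", _::binary>>, payload)
  end

  # Anything that does not start with a well-formed cookie packet is invalid.
  def valid?(_other), do: false
end

# A 14-byte payload: the 8-byte marker followed by 6 bytes of metadata.
good_cookie = <<0x09, 14::32, "SnapGene", 0::48>>
bad_cookie = <<0x09, 14::32, "NotAGene", 0::48>>
```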

Features require a bit more explanation, since they are stored in XML. Parsing them into a map is certainly possible, but doing so would limit what a developer can do with what I hope is a lower-level library than most.

To leave the end user with as much power as possible, this function does not attempt to parse the XML stored within the file. Instead, the XML is returned in the form generated by :xmerl_scan.string/1. This gives you access to all of the data stored within the file, not just a parsed subset. It also means that to query the data you need to be comfortable composing XPaths. For example, if the first feature is a terminator and you want its segment range:

iex>{:ok, sample} = SnapGene.read("test/io/snap_gene/sample-e.dna")
...>:xmerl_xpath.string('string(/*/Feature[1]/Segment/@range)', sample.features)
{:xmlObj, :string, '400-750'}

This also requires some familiarity with the file type, for example whether a range is inclusive or exclusive at either end. Querying the string value of a node that doesn't exist returns an empty charlist inside the :xmlObj tuple.

iex>{:ok, sample} = SnapGene.read("test/io/snap_gene/sample-e.dna")
...>:xmerl_xpath.string('string(/*/Feature[1]/Unknown/Path/@range)', sample.features)
{:xmlObj, :string, []}

These semantics are admittedly odd, but there's not much to be done about them: XPath's string() function yields an empty string for an empty node-set.
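One way to make the empty result explicit is to wrap the query in a small helper. The sketch below runs against an inline XML document rather than a real SnapGene file; RangeSketch and the sample XML are made up for illustration, and the helper does not decide whether the range ends are inclusive or exclusive.

```elixir
# Made-up feature XML standing in for the features of a parsed file.
sketch_xml =
  "<Features>" <>
    ~s(<Feature name="term"><Segment range="400-750"/></Feature>) <>
    ~s(<Feature name="prom"><Segment range="161-241"/></Feature>) <>
    "</Features>"

# :xmerl_scan.string/1 returns the parsed document and any unparsed rest.
{sketch_doc, _rest} = :xmerl_scan.string(String.to_charlist(sketch_xml))

defmodule RangeSketch do
  # Returns {:ok, {first, last}}, or :error when the XPath matches nothing.
  def segment_range(doc, xpath) do
    case :xmerl_xpath.string(String.to_charlist(xpath), doc) do
      {:xmlObj, :string, []} ->
        :error

      {:xmlObj, :string, value} ->
        [first, last] = value |> to_string() |> String.split("-")
        {:ok, {String.to_integer(first), String.to_integer(last)}}
    end
  end
end

found = RangeSketch.segment_range(sketch_doc, "string(/*/Feature[1]/Segment/@range)")
missing = RangeSketch.segment_range(sketch_doc, "string(/*/Feature[1]/Unknown/@range)")
```

Here found holds the two integers from the first feature's range, while missing is :error instead of an empty charlist.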

The object returned from :xmerl_xpath.string/2 (and its /3 and /5 variants) is a tuple, so Enumerable isn't implemented for it. You're best off sticking to XPath to get the required elements. Counts are simple enough to retrieve this way, though. For example, to find out how many Feature Segments there are:

iex>{:ok, sample} = SnapGene.read("test/io/snap_gene/sample-e.dna")
...>:xmerl_xpath.string('count(/*/Feature/Segment)', sample.features)
{:xmlObj, :number, 2}

Now it's a simple matter to map over the desired queries to build up some data from the XML:

iex>{:ok, sample} = SnapGene.read("test/io/snap_gene/sample-e.dna")
...>Enum.map(1..2, fn i -> :xmerl_xpath.string('string(/*/Feature[#{i}]/Segment/@range)', sample.features) end)
[{:xmlObj, :string, '400-750'}, {:xmlObj, :string, '161-241'}]
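The count and the per-feature query can also be combined so the upper bound is not hard-coded. This is a sketch against made-up inline XML, not a real SnapGene file. It counts Feature elements rather than Segments so the count lines up with the Feature[i] index, and it uses a stepped range (`//1`, available from Elixir 1.12) so that a file with no features yields an empty list.

```elixir
# Made-up feature XML standing in for the features of a parsed file.
features_xml =
  "<Features>" <>
    ~s(<Feature name="term"><Segment range="400-750"/></Feature>) <>
    ~s(<Feature name="prom"><Segment range="161-241"/></Feature>) <>
    "</Features>"

{features_doc, _rest} = :xmerl_scan.string(String.to_charlist(features_xml))

# Ask XPath how many features exist instead of assuming a fixed number.
{:xmlObj, :number, n} = :xmerl_xpath.string(~c"count(/*/Feature)", features_doc)

ranges =
  for i <- 1..trunc(n)//1 do
    {:xmlObj, :string, range} =
      :xmerl_xpath.string(~c"string(/*/Feature[#{i}]/Segment/@range)", features_doc)

    # Convert the charlist into an ordinary Elixir string.
    to_string(range)
  end
```

Note that string() returns the value of the first matching node only, so a feature with several segments would need a per-segment query.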

I cover the basics of using XPath to perform queries in the Using XML guide. I also plan to write a follow-up guide with further examples of queries, and an explanation of how concepts in the XML map to what BioPython parses.

Summary

Functions

Read the contents of a SnapGene file.

Functions

read(filename, opts \\ [])

@spec read(filename :: Path.t(), opts :: keyword()) ::
  {:ok, struct()} | {:error, File.posix()}

Read the contents of a SnapGene file.

Takes a filename and reads the contents into the %Bio.IO.SnapGene{} struct. Returns an error tuple on failure with the cause from File.read/1.

You can use :file.format_error/1 to get a descriptive string of the error.
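As a rough sketch of that last step, the helper below turns an error tuple into a readable message. It uses File.read/1 on a path that is assumed not to exist as a stand-in for a failed read, since the error reasons have the same shape; ReadErrorSketch is a hypothetical name.

```elixir
defmodule ReadErrorSketch do
  # Turns an {:ok, _} or {:error, posix} result into a printable message.
  def describe({:ok, _contents}), do: "read succeeded"

  def describe({:error, reason}) do
    # :file.format_error/1 returns a charlist, so convert it to a string.
    "read failed: " <> to_string(:file.format_error(reason))
  end
end

# Assumes this path does not exist, so File.read/1 returns {:error, :enoent}.
message = ReadErrorSketch.describe(File.read("no/such/dir/missing-file.dna"))
```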