Bio.IO.Fasta (bio_elixir v0.3.0)

Allow the input/output of FASTA formatted files.

The FASTA file format is composed of header/sequence pairs, where each pair begins with the ">" character. All data following the ">" character on the same line represents the 'header' of the pair, while the next line represents sequence data.

Any subsequent lines that do not begin with a ">" character are assumed to continue the same sequence across multiple lines. For example, the following two files would be considered equivalent data:

# fasta 1
>header1
atgcatgca

and

# fasta 2
>header1
atgc
atgca
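
The equivalence of the two files can be checked with a small parser. This is a minimal sketch, not the implementation used by this module: FastaSketch and its parse/1 function are hypothetical names, and the input is assumed to be a plain string with "\n" line endings.

```elixir
defmodule FastaSketch do
  # Hypothetical helper, not part of Bio.IO.Fasta: parses FASTA text into
  # {header, sequence} tuples, joining continuation lines.
  def parse(text) do
    text
    |> String.split("\n", trim: true)
    |> Enum.reduce([], fn
      # A line starting with ">" opens a new header/sequence pair.
      ">" <> header, acc -> [{header, ""} | acc]
      # Any other line is appended to the sequence of the current pair.
      line, [{header, seq} | rest] -> [{header, seq <> String.trim(line)} | rest]
      # Data that appears before the first header is ignored.
      _line, [] -> []
    end)
    |> Enum.reverse()
  end
end

one = FastaSketch.parse(">header1\natgcatgca\n")
two = FastaSketch.parse(">header1\natgc\natgca\n")
```

Both one and two hold a single pair with the header "header1" and the sequence "atgcatgca".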

The FASTA file format does not specify the type of the data in the sequence. That means that you can reasonably store RNA, DNA, amino acid, or any other sequence using the format. In general, the expectation is that the data is ASCII encoded.

The functions in this module support reading into specified types. See read/2 for more details.

Summary

Functions

read(filename, opts \\ [])

Read a FASTA formatted file

read!(filename, opts \\ [])

Read a FASTA formatted file

write(filename, data, modes \\ [])

Write a FASTA file using sequence data.

Types

@type fasta_data() ::
  [String.t()]
  | [struct()]
  | [{header(), sequence()}]
  | %{headers: [header()], sequences: [sequence()]}
@type header() :: String.t()
@type read_opts() :: {:type, any()} | {:parse_header, (String.t() -> String.t())}
@type sequence() :: String.t()

Functions

read(filename, opts \\ [])

@spec read(filename :: Path.t(), opts :: [read_opts()]) ::
  {:ok, any()} | {:error, File.posix()}

Read a FASTA formatted file

The read/2 function returns either an {:ok, content} tuple with the parsed content or an {:error, reason} tuple, where reason is the error code from File.read. You can use :file.format_error/1 to get a descriptive string of the error.
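
A minimal sketch of handling both outcomes follows. ReadResultSketch and describe/1 are hypothetical names used only for illustration. The helper accepts any value shaped like the return of read/2, so it runs without the library loaded.

```elixir
defmodule ReadResultSketch do
  # Hypothetical helper: turns a result shaped like the return value of
  # read/2 into a human-readable message.
  def describe({:ok, _content}), do: "read succeeded"

  def describe({:error, reason}) do
    # :file.format_error/1 returns a charlist, so convert it to a string.
    reason |> :file.format_error() |> List.to_string()
  end
end

# With the library loaded, the call would look like:
#   "missing.fasta" |> Bio.IO.Fasta.read() |> ReadResultSketch.describe()
message = ReadResultSketch.describe({:error, :enoent})
```

Here message is "no such file or directory", the description for the :enoent error code.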

You can specify the return type of the contents by passing a module that implements the Bio.Sequential behaviour. Specifically, the module must define a new/2 function.

Options

  • :type - The module for the type of struct you wish to have returned. It must at minimum implement a new/2 function equivalent to that of the Bio.Sequential behaviour. If omitted, the base Bio.Sequence is used.
  • :parse_header - A function for parsing the header values of the FASTA file. If omitted, the identity function is used and the header is returned as is.
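
As an illustration of the :parse_header option, the sketch below defines a function matching the (String.t() -> String.t()) type from read_opts(). HeaderSketch and identifier/1 are hypothetical names, and the assumption that headers carry an identifier followed by a space and a free-text description is ours, not something the format or this module requires.

```elixir
defmodule HeaderSketch do
  # Hypothetical parser: keeps only the text before the first space,
  # e.g. "seq1 Homo sapiens" becomes "seq1".
  def identifier(header) do
    header
    |> String.split(" ", parts: 2)
    |> hd()
  end
end

# With the library loaded, it would be passed as an option:
#   Bio.IO.Fasta.read("example.fasta", parse_header: &HeaderSketch.identifier/1)
id = HeaderSketch.identifier("seq1 Homo sapiens")
```

A header without a space is returned unchanged, so the function is safe to apply to files that mix both styles.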

read!(filename, opts \\ [])

@spec read!(filename :: Path.t(), opts :: [read_opts()]) :: any() | no_return()

Read a FASTA formatted file

The same as read/2, but will raise a File.Error on failure.

write(filename, data, modes \\ [])

@spec write(filename :: Path.t(), data :: fasta_data(), [File.mode()]) ::
  :ok | {:error, File.posix()}

Write a FASTA file using sequence data.

This function accepts data in several shapes: a list in one of the forms shown here, or a map. Examples of the types that are handled:

List:

  # a list of header/sequence tuples
  [{header(), sequence()}, ...]
  # a flat list of alternating headers and sequences, implicitly paired
  [header(), sequence(), header(), sequence(), ...]
  # a list of struct()
  [%Bio.Sequence._{}, ...]

Where %Bio.Sequence._{} indicates any struct of the Bio.Sequence module or modules implementing the Bio.Sequential behaviour.

It also supports data in a Map format:

%{
  headers: [header(), ...],
  sequences: [sequence(), ...]
}
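
To make the relationship between these shapes concrete, the sketch below converts the tuple list, the flat list, and the map into the same list of header/sequence tuples. ShapeSketch and pairs/1 are hypothetical names. This is not how write/3 is implemented, and the struct form is left out because it depends on the library's own types.

```elixir
defmodule ShapeSketch do
  # Hypothetical normalizer: converts each documented shape (except struct
  # lists) into a list of {header, sequence} tuples.

  # Map form: pair headers and sequences by position.
  def pairs(%{headers: headers, sequences: sequences}), do: Enum.zip(headers, sequences)

  # Tuple list form: already in the target shape.
  def pairs([{_header, _sequence} | _] = tuples), do: tuples

  # Flat list form: every two consecutive elements form one pair.
  # An odd-length list raises, since the last header has no sequence.
  def pairs(flat) when is_list(flat) do
    flat
    |> Enum.chunk_every(2)
    |> Enum.map(fn [header, sequence] -> {header, sequence} end)
  end
end

from_flat = ShapeSketch.pairs(["header", "sequence", "header2", "sequence2"])
from_map = ShapeSketch.pairs(%{headers: ["header", "header2"], sequences: ["sequence", "sequence2"]})
```

Both from_flat and from_map hold the same two pairs, which is the sense in which the shapes describe the same data.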

Examples

iex> Fasta.write("/tmp/test_file.fasta", ["header", "sequence", "header2", "sequence2"])
:ok

Returns the same error types as File.write/3.