dna_search v0.0.1 DNASearch.NCBI

Provides functions for querying the NCBI Nucleotide database.

Links to NCBI API Documentation:

Summary

Functions

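get_fasta(organism_name, options \\ [])

Returns a raw FASTA string containing DNA data associated with the organism.

get_fasta_for_sequence_ids(id_strings, options \\ [])

Returns a raw FASTA string containing the sequences for the specified record IDs.

get_sequence_ids(organism_name, options \\ [])

Returns a list of NCBI IDs (strings) for sequences associated with the organism.

get_sequence_records(organism_name, options \\ [])

Queries NCBI for sequence records and returns a map containing the following keys: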

  • ids: NCBI record IDs corresponding to sequences
  • start_at_record_index: the index of the first returned record ID
  • num_records: the number of record IDs in the current result set
  • total_num_records: the total number of matching record IDs

Functions

get_fasta(organism_name, options \\ [])

Returns a raw FASTA string containing DNA data associated with the organism.

Parameters

  • organism_name: name of the organism you’re interested in. works best as a species name, e.g. "Homo sapiens" rather than "human".
  • options (optional):

    • limit (optional): number of records to include in the FASTA. default: 10, max: 50.
    • start_at_record_index (optional): the index of the first record to return. default: 0 to return the first set of records.
    • properties (optional): string specifying special properties to filter by. default: biomol_genomic, which filters to genomic sequences. see the NCBI documentation for the possible values of this field.
    • timeout (optional): request timeout in milliseconds. default: 10_000 (10 seconds).
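Because get_fasta returns the FASTA text unparsed, callers will usually want to split it into records. The following is a minimal sketch, assuming the standard FASTA layout in which each record starts with a `>` header line followed by one or more sequence lines. `FastaSketch` is a hypothetical helper and is not part of dna_search.

```elixir
defmodule FastaSketch do
  # Hypothetical helper, not part of dna_search.
  # Splits raw FASTA text into a list of %{header: ..., sequence: ...} maps.
  def parse(raw) when is_binary(raw) do
    raw
    |> String.trim()
    # A ">" at the start of a line opens a new record.
    |> String.split(~r/^>/m, trim: true)
    |> Enum.flat_map(&parse_record/1)
  end

  defp parse_record(chunk) do
    case String.split(chunk, ~r/\r?\n/, trim: true) do
      # Nothing usable in this chunk, so skip it.
      [] ->
        []

      # First line is the header; the rest are sequence lines to be joined.
      [header | lines] ->
        [%{header: String.trim(header), sequence: Enum.map_join(lines, "", &String.trim/1)}]
    end
  end
end

# Possible usage. This assumes options are passed as a keyword list and that
# get_fasta returns the string directly; neither is confirmed by this page.
# raw = DNASearch.NCBI.get_fasta("Homo sapiens", limit: 5)
# FastaSketch.parse(raw)
```

Keeping the parser separate from the request means it can be exercised on any FASTA string without a network call.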

get_fasta_for_sequence_ids(id_strings, options \\ [])

Returns a raw FASTA string containing the sequences for the specified record IDs.

Parameters

  • id_strings: list of NCBI ID strings corresponding to sequence records
  • options (optional):

    • timeout (optional): request timeout in milliseconds. default: 10_000 (10 seconds).
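When the ID list is long, it may be convenient to request the sequences in smaller batches and concatenate the resulting FASTA text. The sketch below is one way to do that. The default batch size of 50 is an assumption borrowed from the limit cap documented for the search functions; this page does not state a cap for get_fasta_for_sequence_ids. `BatchFetchSketch` and its `fetch_fasta` argument are hypothetical names.

```elixir
defmodule BatchFetchSketch do
  # Hypothetical helper, not part of dna_search.
  # fetch_fasta is a 1-arity function that takes a list of ID strings and
  # returns raw FASTA text for them.
  def fetch_in_batches(id_strings, fetch_fasta, batch_size \\ 50) do
    id_strings
    # Avoid requesting the same record twice.
    |> Enum.uniq()
    |> Enum.chunk_every(batch_size)
    |> Enum.map(fetch_fasta)
    # FASTA records can be concatenated; normalise the separators between batches.
    |> Enum.map_join("\n", &String.trim_trailing/1)
  end
end

# Possible usage, assuming the function returns the FASTA string directly:
# fetch = fn ids -> DNASearch.NCBI.get_fasta_for_sequence_ids(ids, timeout: 20_000) end
# BatchFetchSketch.fetch_in_batches(id_strings, fetch)
```

Passing the fetch function in as an argument keeps the batching logic independent of the network call.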

get_sequence_ids(organism_name, options \\ [])

Returns a list of NCBI IDs (strings) for sequences associated with the organism.

Parameters

  • organism_name: name of the organism you’re interested in. works best as a species name, e.g. "Homo sapiens" rather than "human".
  • options (optional):

    • limit (optional): number of records to include in the results. default: 10, max: 50.
    • start_at_record_index (optional): the index of the first record to return. default: 0 to return the first set of records.
    • properties (optional): string specifying special properties to filter by. default: biomol_genomic, which filters to genomic sequences. see the NCBI documentation for the possible values of this field.
    • timeout (optional): request timeout in milliseconds. default: 10_000 (10 seconds).
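The documented defaults and the limit cap can be gathered into a small options builder. This is only a sketch: it assumes the options are passed as a keyword list with atom keys, and it clamps limit into 1..50 on the caller's side because this page does not say whether dna_search clamps or rejects larger values. `SearchOptionsSketch` is a hypothetical name.

```elixir
defmodule SearchOptionsSketch do
  # Hypothetical helper, not part of dna_search.
  # Defaults and the limit cap are taken from the parameter list above.
  @max_limit 50
  @defaults [
    limit: 10,
    start_at_record_index: 0,
    properties: "biomol_genomic",
    timeout: 10_000
  ]

  # Merges caller overrides onto the documented defaults and keeps
  # :limit within 1..50.
  def build(overrides \\ []) do
    opts = Keyword.merge(@defaults, overrides)
    limit = opts |> Keyword.fetch!(:limit) |> min(@max_limit) |> max(1)
    Keyword.put(opts, :limit, limit)
  end
end

# Possible usage (assumes keyword-list options):
# opts = SearchOptionsSketch.build(limit: 200, timeout: 30_000)
# DNASearch.NCBI.get_sequence_ids("Homo sapiens", opts)
```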

get_sequence_records(organism_name, options \\ [])

Queries NCBI for sequence records and returns a map containing the following keys:

  • ids: NCBI record IDs corresponding to sequences
  • start_at_record_index: the index of the first returned record ID
  • num_records: the number of record IDs in the current result set
  • total_num_records: the total number of matching record IDs

Parameters

  • organism_name: name of the organism you’re interested in. works best as a species name, e.g. "Homo sapiens" rather than "human".
  • options (optional):

    • limit (optional): number of records to include in the result set. default: 10, max: 50.
    • start_at_record_index (optional): the index of the first record to return. default: 0 to return the first set of records.
    • properties (optional): string specifying special properties to filter by. default: biomol_genomic, which filters to genomic sequences. see the NCBI documentation for the possible values of this field.
    • timeout (optional): request timeout in milliseconds. default: 10_000 (10 seconds).
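The start_at_record_index, num_records and total_num_records keys are enough to walk through a result set page by page. The sketch below shows one way to do that, assuming the returned map uses atom keys with the names listed above. `PaginationSketch` and its `fetch_page` argument are hypothetical; `max_ids` is a caller-side cap, added because a popular organism can match a very large number of records.

```elixir
defmodule PaginationSketch do
  # Hypothetical helper, not part of dna_search.
  # fetch_page is a 1-arity function: it takes a start index and returns a map
  # shaped like the documented get_sequence_records result.
  def all_ids(fetch_page, max_ids, start \\ 0, acc \\ []) do
    page = fetch_page.(start)
    acc = acc ++ page.ids
    # Index of the first record that has not been fetched yet.
    next = page.start_at_record_index + page.num_records

    cond do
      # Enough IDs collected: stop and trim to the cap.
      length(acc) >= max_ids -> Enum.take(acc, max_ids)
      # An empty page means there is nothing more to read.
      page.num_records == 0 -> acc
      # The last page has been reached.
      next >= page.total_num_records -> acc
      true -> all_ids(fetch_page, max_ids, next, acc)
    end
  end
end

# Possible usage (assumes keyword-list options and atom keys in the result):
# fetch = fn start ->
#   DNASearch.NCBI.get_sequence_records("Homo sapiens", limit: 50, start_at_record_index: start)
# end
# PaginationSketch.all_ids(fetch, 200)
```

Using the documented maximum of 50 for limit keeps the number of requests as low as the API allows.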