essence v0.2.0 Essence.Document

This module defines the Essence.Document struct, as well as a variety of convenience functions for accessing the document's text, paragraphs, sentences, and tokens.
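A minimal usage sketch (the sample text is an assumption; the comments describe the documented behaviour, not captured library output):

    iex> doc = Essence.Document.from_text("Hello world. How are you today?")
    iex> Essence.Document.sentences(doc)
    # => a list of tokenized sentences (one token list per sentence)
    iex> Essence.Document.words(doc)
    # => the document's word tokens, with punctuation tokens filtered out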

Summary

Functions

concordance(document, token, n \\ 20)
Pretty prints all occurrences of token in the given Essence.Document, doc. Prints n (default=20) characters of context.

context_of(document, token, n \\ 5)
For each occurrence of token in the given Essence.Document, doc, returns a list containing the token as well as n (default=5) tokens to the left and right of the occurrence.

enumerate_tokens(document)
Retrieve the list of all tokens contained in the given Essence.Document.

find_token(document, token)
Find all occurrences of token in the given Essence.Document. Returns a list of [token: index] tuples.

from_text(text)
Read the text represented by a String and create an Essence.Document.

one_contexts_of(document, token)
Returns a list of all the 1-contexts (1 token to the left, 1 token to the right) of the given token in the given document, excluding the token itself.

paragraph(document, n)
Retrieve the n-th tokenized paragraph from the given Essence.Document.

paragraphs(document)
Retrieve the tokenized paragraphs from the given Essence.Document.

sentence(document, n)
Retrieve the n-th tokenized sentence from the given Essence.Document.

sentences(document)
Retrieve the tokenized sentences from the given Essence.Document.

words(document)
Retrieve the list of all words in the given Essence.Document, ignoring all tokens that are punctuation.

Functions

concordance(document, token, n \\ 20)

Specs

concordance(doc :: %Essence.Document{meta: term, nested_tokens: term, text: term, type: term, uri: term}, token :: String.t, n :: number) :: none

Pretty prints all occurrences of token in the given Essence.Document, doc. Prints n (default=20) characters of context.
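Illustrative call (the sample text and token are assumptions; per the spec above, concordance prints to stdout rather than returning a meaningful value):

    iex> doc = Essence.Document.from_text("To be, or not to be, that is the question.")
    iex> Essence.Document.concordance(doc, "be")
    # prints each occurrence of "be" with roughly 20 characters of context
    iex> Essence.Document.concordance(doc, "be", 40)
    # same, but with 40 characters of context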

context_of(document, token, n \\ 5)

Specs

context_of(doc :: %Essence.Document{meta: term, nested_tokens: term, text: term, type: term, uri: term}, token :: String.t, n :: number) :: List.t

For each occurrence of token in the given Essence.Document, doc, returns a list containing the token as well as n (default=5) tokens to the left and right of the occurrence.
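Hypothetical example (sample text assumed; the exact shape of each returned entry is not asserted here):

    iex> doc = Essence.Document.from_text("To be, or not to be, that is the question.")
    iex> Essence.Document.context_of(doc, "be", 2)
    # => one entry per occurrence of "be", each containing the token plus up to 2 tokens to its left and right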

enumerate_tokens(document)

Specs

enumerate_tokens(document :: %Essence.Document{meta: term, nested_tokens: term, text: term, type: term, uri: term}) :: List.t

Retrieve the list of all tokens contained in the given Essence.Document.
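Sketch of a call (sample text assumed):

    iex> doc = Essence.Document.from_text("Hello world. How are you?")
    iex> Essence.Document.enumerate_tokens(doc)
    # => a single list holding every token of the document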

find_token(document, token)

Specs

find_token(doc :: %Essence.Document{meta: term, nested_tokens: term, text: term, type: term, uri: term}, token :: String.t) :: List.t

Find all occurrences of token in the given Essence.Document. Returns a list of [token: index] tuples.
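Hypothetical usage (sample text assumed; the comment paraphrases the documented return shape rather than showing verified output):

    iex> doc = Essence.Document.from_text("To be, or not to be.")
    iex> Essence.Document.find_token(doc, "be")
    # => a list of [token: index] tuples, one per occurrence of "be"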

from_text(text)

Specs

from_text(text :: String.t) :: %Essence.Document{meta: term, nested_tokens: term, text: term, type: term, uri: term}

Read the text represented by a String and create an Essence.Document.
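Example (the input string is an assumption; the pattern match relies only on the %Essence.Document{} return type stated in the spec):

    iex> %Essence.Document{} = doc = Essence.Document.from_text("Hello world. This is a document.")
    iex> doc.text
    # => the struct's text field (presumably the original input string)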

one_contexts_of(document, token)

Specs

one_contexts_of(doc :: %Essence.Document{meta: term, nested_tokens: term, text: term, type: term, uri: term}, token :: String.t) :: List.t

Returns a list of all the 1-contexts (1 token to the left, 1 token to the right) of the given token in the given document, excluding the token itself.
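Illustrative call (sample text assumed; each context excludes the matched token itself, as described above):

    iex> doc = Essence.Document.from_text("To be, or not to be.")
    iex> Essence.Document.one_contexts_of(doc, "be")
    # => one entry per occurrence of "be", holding its immediate left and right neighbour tokens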

paragraph(document, n)

Specs

paragraph(document :: %Essence.Document{meta: term, nested_tokens: term, text: term, type: term, uri: term}, n :: integer) :: List.t

Retrieve the n-th tokenized paragraph from the given Essence.Document.
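Sketch (sample text assumed; whether n is zero- or one-based is not stated on this page):

    iex> doc = Essence.Document.from_text("First paragraph.\n\nSecond paragraph.")
    iex> Essence.Document.paragraph(doc, 0)
    # => the tokens of a single paragraph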

paragraphs(document)

Specs

paragraphs(document :: %Essence.Document{meta: term, nested_tokens: term, text: term, type: term, uri: term}) :: List.t

Retrieve the tokenized paragraphs from the given Essence.Document.
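Example call (sample text assumed):

    iex> doc = Essence.Document.from_text("First paragraph.\n\nSecond paragraph.")
    iex> Essence.Document.paragraphs(doc)
    # => a list with one tokenized paragraph per entry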

sentence(document, n)

Specs

sentence(document :: %Essence.Document{meta: term, nested_tokens: term, text: term, type: term, uri: term}, n :: integer) :: List.t

Retrieve the n-th tokenized sentence from the given Essence.Document.
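Sketch (sample text assumed; the index base for n is not stated on this page):

    iex> doc = Essence.Document.from_text("Hello world. How are you?")
    iex> Essence.Document.sentence(doc, 0)
    # => the token list of one sentence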

sentences(document)

Specs

sentences(document :: %Essence.Document{meta: term, nested_tokens: term, text: term, type: term, uri: term}) :: List.t

Retrieve the tokenized sentences from the given Essence.Document.
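Example (sample text assumed):

    iex> doc = Essence.Document.from_text("Hello world. How are you?")
    iex> Essence.Document.sentences(doc)
    # => a list of tokenized sentences, one token list per detected sentence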

words(document)

Specs

words(document :: %Essence.Document{meta: term, nested_tokens: term, text: term, type: term, uri: term}) :: List.t

Retrieve the list of all words in the given Essence.Document, ignoring all tokens that are punctuation.
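Illustrative call (sample text assumed; the comment describes the documented behaviour, not captured output):

    iex> doc = Essence.Document.from_text("Hello, world!")
    iex> Essence.Document.words(doc)
    # => word tokens only; punctuation tokens such as "," and "!" are dropped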