Essence.Document (essence v0.3.0)
This module defines the struct type Essence.Document, as well as a
variety of convenience functions for accessing the document's text, paragraphs,
sentences and tokens.
Summary
Functions
concordance(doc, token, n \\ 20)
Pretty prints all occurrences of token in the given Essence.Document,
doc. Prints n (default=20) characters of context.
context_of(doc, token, n \\ 5)
For each occurrence of token in the given Essence.Document, doc,
returns a list containing the token as well as n (default=5) tokens to the left and
right of the occurrence.
enumerate_tokens(document)
Retrieve the list of all tokens contained in the given Essence.Document.
find_token(doc, token)
Find all occurrences of token in the given Essence.Document. Returns a
list of [token: index] tuples.
from_text(text)
Read the text represented by a String and create an Essence.Document.
one_contexts_of(doc, token)
Returns a list of all the 1-contexts (1 token to the left, 1 token to the right) of the
given token in the given document, excluding the token itself.
paragraph(document, n)
Retrieve the n-th tokenized paragraph from the given Essence.Document.
paragraphs(document)
Retrieve the tokenized paragraphs from the given Essence.Document.
sentence(doc, n)
Retrieve the n-th tokenized sentence from the given Essence.Document.
sentences(document)
Retrieve the tokenized sentences from the given Essence.Document.
words(doc)
Retrieve the list of all words in the given Essence.Document, ignoring all tokens that are punctuation.
Functions
concordance(doc, token, n \\ 20)
Specs
concordance(
doc :: %Essence.Document{
meta: term(),
nested_tokens: term(),
text: term(),
type: term(),
uri: term()
},
token :: String.t(),
n :: number()
) :: :ok
Pretty prints all occurrences of token in the given Essence.Document,
doc. Prints n (default=20) characters of context.
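The idea of a character-based concordance can be sketched in plain Elixir. This is a minimal illustration, not the library's implementation: the module name `ConcordanceSketch`, the helper `lines/3`, and the bracketed output format are all hypothetical, and the sketch works on a raw string with substring matching rather than on an Essence.Document.

```elixir
defmodule ConcordanceSketch do
  # Hypothetical helper: builds one display line per occurrence of `token`,
  # keeping at most `n` characters of context on each side.
  def lines(text, token, n \\ 20) do
    parts = String.split(text, token)

    # One occurrence sits between each pair of adjacent parts.
    for i <- 1..(length(parts) - 1)//1 do
      left = parts |> Enum.take(i) |> Enum.join(token)
      right = parts |> Enum.drop(i) |> Enum.join(token)
      left_start = max(String.length(left) - n, 0)

      String.slice(left, left_start, n) <> "[" <> token <> "]" <> String.slice(right, 0, n)
    end
  end

  # Prints the lines and returns :ok, mirroring the documented return type.
  def concordance(text, token, n \\ 20) do
    text |> lines(token, n) |> Enum.each(&IO.puts/1)
  end
end

ConcordanceSketch.concordance("the cat and the dog", "the", 4)
```

Splitting on the token and rejoining is quadratic in the number of occurrences, which is acceptable for a sketch but not for large texts.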
context_of(doc, token, n \\ 5)
Specs
context_of(
doc :: %Essence.Document{
meta: term(),
nested_tokens: term(),
text: term(),
type: term(),
uri: term()
},
token :: String.t(),
n :: number()
) :: List.t()
For each occurrence of token in the given Essence.Document, doc,
returns a list containing the token as well as n (default=5) tokens to the left and
right of the occurrence.
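A token window of this kind can be sketched over a flat token list. The module `ContextSketch` below is hypothetical and takes a plain list of strings instead of an Essence.Document; how the library itself handles occurrences near the start or end of a document is not stated here, so the clamping behaviour is an assumption.

```elixir
defmodule ContextSketch do
  # For each occurrence of `token`, return the token together with up to
  # `n` tokens on either side. Windows are clamped at the list boundaries.
  def context_of(tokens, token, n \\ 5) do
    tokens
    |> Enum.with_index()
    |> Enum.filter(fn {t, _i} -> t == token end)
    |> Enum.map(fn {_t, i} ->
      start = max(i - n, 0)
      # (i - start) tokens to the left, the token itself, n to the right.
      Enum.slice(tokens, start, i - start + n + 1)
    end)
  end
end

IO.inspect(ContextSketch.context_of(~w(a b c d e), "c", 1))
# prints [["b", "c", "d"]]
```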
enumerate_tokens(document)
Specs
enumerate_tokens(
document :: %Essence.Document{
meta: term(),
nested_tokens: term(),
text: term(),
type: term(),
uri: term()
}
) :: List.t()
Retrieve the list of all tokens contained in the given Essence.Document.
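As a rough illustration, flattening a nested token structure into a single token list could look like the following. The nesting shown (paragraphs containing sentences containing tokens) is an assumption about what `nested_tokens` holds, and `FlattenSketch` is a hypothetical module, not part of Essence.

```elixir
defmodule FlattenSketch do
  # Assumed shape: [paragraph], paragraph = [sentence], sentence = [token].
  # Tokens are binaries, so List.flatten/1 leaves them intact.
  def enumerate_tokens(nested_tokens) do
    List.flatten(nested_tokens)
  end
end

nested = [
  [["Hello", ",", "world", "!"], ["Bye", "."]],
  [["Second", "paragraph", "."]]
]

IO.inspect(FlattenSketch.enumerate_tokens(nested))
```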
find_token(doc, token)
Specs
find_token(
doc :: %Essence.Document{
meta: term(),
nested_tokens: term(),
text: term(),
type: term(),
uri: term()
},
token :: String.t()
) :: List.t()
Find all occurrences of token in the given Essence.Document. Returns a
list of [token: index] tuples.
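One way to picture this lookup is to pair each token with its position and keep the matches. The sketch below is hypothetical (`FindSketch` is not part of Essence), works on a flat token list, and assumes zero-based `{token, index}` tuples; the exact result shape and index base used by the library are not confirmed here.

```elixir
defmodule FindSketch do
  # Return a {token, index} tuple for every position where `token` occurs.
  def find_token(tokens, token) do
    tokens
    |> Enum.with_index()
    |> Enum.filter(fn {t, _i} -> t == token end)
  end
end

IO.inspect(FindSketch.find_token(~w(the cat saw the dog), "the"))
# prints [{"the", 0}, {"the", 3}]
```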
from_text(text)
Specs
from_text(text :: String.t()) :: %Essence.Document{
meta: term(),
nested_tokens: term(),
text: term(),
type: term(),
uri: term()
}
Read the text represented by a String and create an Essence.Document.
one_contexts_of(doc, token)
Specs
one_contexts_of(
doc :: %Essence.Document{
meta: term(),
nested_tokens: term(),
text: term(),
type: term(),
uri: term()
},
token :: String.t()
) :: List.t()
Returns a list of all the 1-contexts (1 token to the left, 1 token to the right) of the
given token in the given document, excluding the token itself.
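A 1-context can be sketched with a sliding window of three tokens. `OneContextSketch` is a hypothetical module operating on a flat token list. As a simplifying assumption, occurrences at the very start or end of the list are skipped because they lack a neighbour on one side; the library may treat those cases differently.

```elixir
defmodule OneContextSketch do
  # Slide a window of 3 tokens over the list, keep windows whose middle
  # element is `token`, and return only the left and right neighbours.
  def one_contexts_of(tokens, token) do
    tokens
    |> Enum.chunk_every(3, 1, :discard)
    |> Enum.filter(fn [_left, mid, _right] -> mid == token end)
    |> Enum.map(fn [left, _mid, right] -> [left, right] end)
  end
end

IO.inspect(OneContextSketch.one_contexts_of(~w(a cat and a dog and a bird), "a"))
# prints [["and", "dog"], ["and", "bird"]]
```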
paragraph(document, n)
Specs
paragraph(
document :: %Essence.Document{
meta: term(),
nested_tokens: term(),
text: term(),
type: term(),
uri: term()
},
n :: integer()
) :: List.t()
Retrieve the n-th tokenized paragraph from the given Essence.Document.
paragraphs(document)
Specs
paragraphs(
document :: %Essence.Document{
meta: term(),
nested_tokens: term(),
text: term(),
type: term(),
uri: term()
}
) :: List.t()
Retrieve the tokenized paragraphs from the given Essence.Document.
sentence(doc, n)
Specs
sentence(
document :: %Essence.Document{
meta: term(),
nested_tokens: term(),
text: term(),
type: term(),
uri: term()
},
n :: integer()
) :: List.t()
Retrieve the n-th tokenized sentence from the given Essence.Document.
sentences(document)
Specs
sentences(
document :: %Essence.Document{
meta: term(),
nested_tokens: term(),
text: term(),
type: term(),
uri: term()
}
) :: List.t()
Retrieve the tokenized sentences from the given Essence.Document.
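The relationship between the paragraph and sentence accessors can be sketched as follows, assuming the tokens are stored as paragraphs of sentences of tokens. `NestedSketch` is hypothetical, and it uses zero-based indexing via `Enum.at/2`; whether the library counts n from 0 or 1 is not stated here.

```elixir
defmodule NestedSketch do
  # Assumed shape: [paragraph], paragraph = [sentence], sentence = [token].
  def paragraphs(nested), do: nested

  # Zero-based index in this sketch; returns nil when out of range.
  def paragraph(nested, n), do: Enum.at(nested, n)

  # Sentences are the paragraphs concatenated one level deep.
  def sentences(nested), do: Enum.concat(nested)

  def sentence(nested, n), do: nested |> sentences() |> Enum.at(n)
end

nested = [[["A", "."], ["B", "."]], [["C", "."]]]

IO.inspect(NestedSketch.sentences(nested))
IO.inspect(NestedSketch.sentence(nested, 2))
```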
words(doc)
Specs
words(
document :: %Essence.Document{
meta: term(),
nested_tokens: term(),
text: term(),
type: term(),
uri: term()
}
) :: List.t()
Retrieve the list of all words in the given Essence.Document, ignoring all tokens that are punctuation.
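Filtering out punctuation tokens can be illustrated with a regular expression over a flat token list. `WordsSketch` is a hypothetical module, and the rule it applies (a token counts as punctuation when it consists solely of Unicode punctuation characters) is an assumption, not necessarily the test the library uses.

```elixir
defmodule WordsSketch do
  # Drop tokens made up entirely of Unicode punctuation characters.
  # Tokens that merely contain punctuation, such as "don't", are kept.
  def words(tokens) do
    Enum.reject(tokens, fn t -> String.match?(t, ~r/^\p{P}+$/u) end)
  end
end

IO.inspect(WordsSketch.words(["Hello", ",", "world", "!"]))
# prints ["Hello", "world"]
```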