Essence.Document (essence v0.3.0)
This module defines the struct type Essence.Document, as well as a variety of convenience functions for accessing the document's text, paragraphs, sentences and tokens.
Summary
Functions
concordance(doc, token, n \\ 20)
Pretty-prints all occurrences of token in the given Essence.Document, doc. Prints n (default=20) characters of context.
context_of(doc, token, n \\ 5)
For each occurrence of token in the given Essence.Document, doc, returns a list containing the token as well as n (default=5) tokens to the left and right of the occurrence.
enumerate_tokens(document)
Retrieves the list of all tokens contained in the given Essence.Document.
find_token(doc, token)
Finds all occurrences of token in the given Essence.Document. Returns a list of [token: index] tuples.
from_text(text)
Reads the text represented by a String and creates an Essence.Document.
one_contexts_of(doc, token)
Returns a list of all the 1-contexts (1 token to the left, 1 token to the right) of the given token in the given document, excluding the token itself.
paragraph(document, n)
Retrieves the n-th tokenized paragraph from the given Essence.Document.
paragraphs(document)
Retrieves the tokenized paragraphs from the given Essence.Document.
sentence(doc, n)
Retrieves the n-th tokenized sentence from the given Essence.Document.
sentences(document)
Retrieves the tokenized sentences from the given Essence.Document.
words(doc)
Retrieves the list of all words in the given Essence.Document, ignoring all tokens that are punctuation.
Functions
concordance(doc, token, n \\ 20)
Specs
concordance( doc :: %Essence.Document{ meta: term(), nested_tokens: term(), text: term(), type: term(), uri: term() }, token :: String.t(), n :: number() ) :: :ok
Pretty-prints all occurrences of token in the given Essence.Document, doc. Prints n (default=20) characters of context.
context_of(doc, token, n \\ 5)
Specs
context_of( doc :: %Essence.Document{ meta: term(), nested_tokens: term(), text: term(), type: term(), uri: term() }, token :: String.t(), n :: number() ) :: List.t()
For each occurrence of token in the given Essence.Document, doc, returns a list containing the token as well as n (default=5) tokens to the left and right of the occurrence.
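As an illustration of the n-context idea, here is a minimal sketch that operates on a flat list of token strings rather than on an Essence.Document. The module ContextSketch is hypothetical, and near the start or end of the list the window is simply truncated; Essence may handle those edges differently.

```elixir
defmodule ContextSketch do
  # Hypothetical helper: for every index where `token` occurs in `tokens`,
  # return up to `n` tokens to the left, the token itself, and up to `n`
  # tokens to the right. Windows are truncated at the list boundaries.
  def context_of(tokens, token, n \\ 5) do
    tokens
    |> Enum.with_index()
    |> Enum.filter(fn {t, _i} -> t == token end)
    |> Enum.map(fn {_t, i} ->
      left_start = max(i - n, 0)
      # Length covers the left context, the token, and n right-hand tokens.
      Enum.slice(tokens, left_start, i - left_start + n + 1)
    end)
  end
end
```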
enumerate_tokens(document)
Specs
enumerate_tokens( document :: %Essence.Document{ meta: term(), nested_tokens: term(), text: term(), type: term(), uri: term() } ) :: List.t()
Retrieves the list of all tokens contained in the given Essence.Document.
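A minimal sketch of token enumeration, assuming (this is not confirmed above) that nested_tokens holds paragraphs, each a list of sentences, each a list of token strings. Under that assumption, flattening the nesting yields every token in document order. EnumerateTokensSketch is a hypothetical name.

```elixir
defmodule EnumerateTokensSketch do
  # Assumed layout: paragraphs -> sentences -> tokens (nested lists of strings).
  # List.flatten/1 removes all nesting and keeps the original token order.
  def enumerate_tokens(nested_tokens), do: List.flatten(nested_tokens)
end
```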
find_token(doc, token)
Specs
find_token( doc :: %Essence.Document{ meta: term(), nested_tokens: term(), text: term(), type: term(), uri: term() }, token :: String.t() ) :: List.t()
Finds all occurrences of token in the given Essence.Document. Returns a list of [token: index] tuples.
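A minimal sketch of the lookup on a flat token list, assuming each returned tuple pairs a token with its zero-based position, {token, index}, as Enum.with_index/1 produces. The comparison here is exact and case-sensitive; Essence's own matching rule may differ. FindTokenSketch is a hypothetical name.

```elixir
defmodule FindTokenSketch do
  # Hypothetical helper: pair every token with its position, then keep only
  # the pairs whose token equals `token` (exact, case-sensitive match).
  def find_token(tokens, token) do
    tokens
    |> Enum.with_index()
    |> Enum.filter(fn {t, _i} -> t == token end)
  end
end
```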
from_text(text)
Specs
from_text(text :: String.t()) :: %Essence.Document{ meta: term(), nested_tokens: term(), text: term(), type: term(), uri: term() }
Reads the text represented by a String and creates an Essence.Document.
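To show what building a document from text involves, here is a minimal sketch with a deliberately naive tokenizer. It is a stand-in, not Essence's tokenizer: paragraphs are split on blank lines, sentences after ".", "!" or "?", and tokens are runs of word characters or single punctuation characters. It returns a plain map with hypothetical text and nested_tokens keys instead of an Essence.Document struct.

```elixir
defmodule FromTextSketch do
  # Naive stand-in (an assumption, not Essence's own tokenizer).
  def from_text(text) do
    %{text: text, nested_tokens: nested_tokens(text)}
  end

  # Paragraphs -> sentences -> tokens.
  defp nested_tokens(text) do
    text
    |> String.split(~r/\n\s*\n/, trim: true)
    |> Enum.map(fn paragraph ->
      paragraph
      # Split on whitespace that follows sentence-final punctuation.
      |> String.split(~r/(?<=[.!?])\s+/, trim: true)
      |> Enum.map(&tokenize/1)
    end)
  end

  # A token is either a run of word characters or one punctuation character.
  defp tokenize(sentence) do
    ~r/\w+|[^\w\s]/u
    |> Regex.scan(sentence)
    |> List.flatten()
  end
end
```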
one_contexts_of(doc, token)
Specs
one_contexts_of( doc :: %Essence.Document{ meta: term(), nested_tokens: term(), text: term(), type: term(), uri: term() }, token :: String.t() ) :: List.t()
Returns a list of all the 1-contexts (1 token to the left, 1 token to the right) of the given token in the given document, excluding the token itself.
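A minimal sketch of 1-context extraction over a flat token list. It assumes each 1-context is represented as a two-element [left, right] list, and it skips occurrences at the very start or end of the list, which have no neighbour on one side. Both choices are assumptions. OneContextSketch is a hypothetical name.

```elixir
defmodule OneContextSketch do
  # Hypothetical helper: slide a 3-token window over the list and, wherever
  # the middle token equals `token`, keep only its left and right neighbours.
  # :discard drops the incomplete trailing windows, so boundary occurrences
  # are skipped.
  def one_contexts_of(tokens, token) do
    tokens
    |> Enum.chunk_every(3, 1, :discard)
    |> Enum.filter(fn [_left, middle, _right] -> middle == token end)
    |> Enum.map(fn [left, _middle, right] -> [left, right] end)
  end
end
```

Using Enum.chunk_every/4 avoids index arithmetic entirely: each window already contains exactly one token of context on each side.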
paragraph(document, n)
Specs
paragraph( document :: %Essence.Document{ meta: term(), nested_tokens: term(), text: term(), type: term(), uri: term() }, n :: integer() ) :: List.t()
Retrieves the n-th tokenized paragraph from the given Essence.Document.
paragraphs(document)
Specs
paragraphs( document :: %Essence.Document{ meta: term(), nested_tokens: term(), text: term(), type: term(), uri: term() } ) :: List.t()
Retrieves the tokenized paragraphs from the given Essence.Document.
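A minimal sketch of paragraph access, assuming nested_tokens is a list of paragraphs (each a list of tokenized sentences) and that n is zero-based. Neither assumption is stated above. ParagraphSketch is a hypothetical name.

```elixir
defmodule ParagraphSketch do
  # Under the assumed layout, the paragraphs are the top-level elements.
  def paragraphs(nested_tokens), do: nested_tokens

  # Zero-based lookup (an assumption); Enum.at/2 returns nil when n is out of range.
  def paragraph(nested_tokens, n), do: Enum.at(nested_tokens, n)
end
```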
sentence(doc, n)
Specs
sentence( document :: %Essence.Document{ meta: term(), nested_tokens: term(), text: term(), type: term(), uri: term() }, n :: integer() ) :: List.t()
Retrieves the n-th tokenized sentence from the given Essence.Document.
sentences(document)
Specs
sentences( document :: %Essence.Document{ meta: term(), nested_tokens: term(), text: term(), type: term(), uri: term() } ) :: List.t()
Retrieves the tokenized sentences from the given Essence.Document.
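A minimal sketch of sentence access under the same assumed paragraphs -> sentences -> tokens layout. Concatenating the paragraphs yields one flat list of sentences, so the n-th sentence (zero-based, also an assumption) is counted across paragraph boundaries. SentenceSketch is a hypothetical name.

```elixir
defmodule SentenceSketch do
  # Enum.concat/1 joins the per-paragraph sentence lists into a single list.
  def sentences(nested_tokens), do: Enum.concat(nested_tokens)

  # Zero-based lookup across paragraph boundaries; nil when out of range.
  def sentence(nested_tokens, n), do: nested_tokens |> sentences() |> Enum.at(n)
end
```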
words(doc)
Specs
words( document :: %Essence.Document{ meta: term(), nested_tokens: term(), text: term(), type: term(), uri: term() } ) :: List.t()
Retrieves the list of all words in the given Essence.Document, ignoring all tokens that are punctuation.
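A minimal sketch of word filtering over a flat token list. The punctuation test is an assumption: here a token counts as punctuation only when every character in it belongs to the Unicode punctuation category \p{P}, so a token such as "it's" is kept as a word. Essence may define punctuation differently. WordsSketch is a hypothetical name.

```elixir
defmodule WordsSketch do
  # Hypothetical rule: drop tokens made up entirely of Unicode punctuation.
  def words(tokens) do
    Enum.reject(tokens, &String.match?(&1, ~r/^\p{P}+$/u))
  end
end
```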