Geminix.V1beta.WhiteSpaceConfig (geminix v0.2.0)
Configuration for a white space chunking algorithm [white space delimited].
Fields:
:max_overlap_tokens (integer/0) - Maximum number of overlapping tokens between two adjacent chunks.
:max_tokens_per_chunk (integer/0) - Maximum number of tokens per chunk. For this chunking algorithm, tokens are words split by whitespace, not the output of a tokenizer. As of 2025-04-17, the context window of the latest Gemini embedding model is 8192 tokens. Assuming an average word length of 5 characters, the upper limit is set to 2**9 = 512 words, i.e. 512 * 5 = 2560 characters, which is at most 2560 tokens in the worst case of one token per character. This conservative estimate is meant to prevent context window overflow.
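The chunking behaviour these two fields describe can be sketched in plain Elixir. This is a hypothetical illustration, not part of Geminix and not the Gemini API's server-side implementation: the module name `WhiteSpaceChunkSketch`, the function `chunk/3`, and the choice to always use the full `max_overlap_tokens` overlap are all assumptions. It splits text on whitespace with `String.split/1`, caps each chunk at `max_tokens_per_chunk` words (rejecting values above 2**9 = 512), and advances by `max_tokens_per_chunk - max_overlap_tokens` words so that adjacent chunks share `max_overlap_tokens` words.

```elixir
defmodule WhiteSpaceChunkSketch do
  @moduledoc """
  Hypothetical illustration of whitespace-delimited chunking with overlap.
  Not part of Geminix and not the Gemini API's implementation.
  """

  # 2**9 = 512 words. At ~5 characters per word and, in the worst case,
  # one token per character, a chunk stays within 512 * 5 = 2560 tokens.
  @word_limit Integer.pow(2, 9)

  @doc """
  Splits `text` into chunks of at most `max_tokens_per_chunk` words, where
  adjacent chunks share `max_overlap_tokens` words (assumed: always the maximum).
  """
  def chunk(text, max_tokens_per_chunk, max_overlap_tokens)
      when is_binary(text) and is_integer(max_tokens_per_chunk) and
             max_tokens_per_chunk >= 1 and max_tokens_per_chunk <= @word_limit and
             is_integer(max_overlap_tokens) and max_overlap_tokens >= 0 and
             max_overlap_tokens < max_tokens_per_chunk do
    step = max_tokens_per_chunk - max_overlap_tokens

    text
    # "Tokens" are words: split on any Unicode whitespace, trimming the ends.
    |> String.split()
    |> take_chunks(max_tokens_per_chunk, step, [])
    |> Enum.map(&Enum.join(&1, " "))
  end

  # No words at all: return the chunks collected so far, in order.
  defp take_chunks([], _size, _step, acc), do: Enum.reverse(acc)

  # The remaining words fit in one chunk: emit them and stop.
  defp take_chunks(words, size, _step, acc) when length(words) <= size,
    do: Enum.reverse([words | acc])

  # Otherwise emit the first `size` words and advance by `step`, so the next
  # chunk starts with the last `size - step` (the overlap) words of this one.
  defp take_chunks(words, size, step, acc),
    do: take_chunks(Enum.drop(words, step), size, step, [Enum.take(words, size) | acc])
end
```

For example, `WhiteSpaceChunkSketch.chunk("a b c d e", 3, 1)` returns `["a b c", "c d e"]`: each chunk holds at most three words, and the word `c` is the one-word overlap between them. Out-of-range arguments (for instance an overlap that is not smaller than the chunk size) raise a `FunctionClauseError` rather than looping forever.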
Summary
Functions
Create a Geminix.V1beta.WhiteSpaceConfig.t/0 from a map returned
by the Gemini API.
Types
Functions
@spec from_map(t(), map()) :: {:ok, t()} | {:error, Ecto.Changeset.t()}
Create a Geminix.V1beta.WhiteSpaceConfig.t/0 from a map returned
by the Gemini API.
Depending on the API call, this function may need to be applied not to the full response body but to the nested part of the response map that holds the white space configuration.