# `Chunx.Chunker.Sentence`
[🔗](https://github.com/preciz/chunx/blob/main/lib/chunx/chunker/sentence.ex#L1)

Implements a sentence-based chunking strategy.

Splits text into overlapping chunks based on sentences while
respecting token limits.

# `chunk_opts`

```elixir
@type chunk_opts() :: [
  chunk_size: pos_integer(),
  chunk_overlap: pos_integer(),
  min_sentences_per_chunk: pos_integer(),
  delimiters: [String.t()],
  short_sentence_threshold: pos_integer()
]
```

# `chunk`

```elixir
@spec chunk(binary(), Tokenizers.Tokenizer.t(), chunk_opts()) ::
  {:ok, [Chunx.Chunk.t()]} | {:error, term()}
```

Splits text into overlapping chunks using sentence boundaries.
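
A minimal usage sketch, assuming a project with the `:chunx` and `:tokenizers` dependencies available. The tokenizer identifier is only an illustration (it is fetched from Hugging Face on first use), and the option values are arbitrary:

```elixir
# Assumption: any tokenizer loadable by Tokenizers.Tokenizer.from_pretrained/1
# can be used here; this identifier is just an example.
{:ok, tokenizer} =
  Tokenizers.Tokenizer.from_pretrained("distilbert/distilbert-base-uncased")

text = """
Sentence chunking keeps sentences intact. Each chunk holds as many whole
sentences as fit within the token limit. Consecutive chunks share a few
sentences so that context carries over.
"""

# Small limits chosen only so that a short text yields more than one chunk.
{:ok, chunks} =
  Chunx.Chunker.Sentence.chunk(text, tokenizer, chunk_size: 32, chunk_overlap: 8)

# Each element is a Chunx.Chunk struct.
Enum.each(chunks, &IO.inspect/1)
```

In application code, matching on `{:error, reason}` as well (for example with `case`) avoids a `MatchError` when chunking fails.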

## Options
  * `:chunk_size` - Maximum number of tokens per chunk (default: 512). The chunker will try to fit
    as many complete sentences as possible while staying under this limit. If a single sentence
    exceeds this limit, it will still be included as its own chunk.

  * `:chunk_overlap` - Number of tokens that should overlap between consecutive chunks (default: 128).
    This helps maintain context between chunks by repeating some sentences from the end of the previous
    chunk at the start of the next chunk. Must be less than `:chunk_size`.

  * `:min_sentences_per_chunk` - Minimum number of sentences that must be included in each chunk
    (default: 1). This ensures chunks contain complete thoughts, even if including multiple sentences
    would exceed `:chunk_size`.

  * `:delimiters` - List of strings that mark sentence boundaries. The text is split
    at each occurrence of a delimiter (default: `[".", "!", "?", "\n"]`).

  * `:short_sentence_threshold` - Sentences shorter than this many bytes are considered too short
    and are concatenated with the next sentence (default: 6).
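
To make the interaction between `:delimiters` and `:short_sentence_threshold` concrete, here is a rough, self-contained sketch. `SentenceSplitSketch` is a hypothetical module written for illustration and is not Chunx's implementation. It assumes each delimiter stays attached to the sentence before it, and that a trailing short fragment is kept as is because there is no next sentence to join:

```elixir
defmodule SentenceSplitSketch do
  # Hypothetical illustration only; not part of Chunx.
  @default_delimiters [".", "!", "?", "\n"]

  def split(text, opts \\ []) do
    delimiters = Keyword.get(opts, :delimiters, @default_delimiters)
    threshold = Keyword.get(opts, :short_sentence_threshold, 6)

    # Build an alternation such as \.|!|\?|<newline> from the delimiters.
    regex =
      delimiters
      |> Enum.map(&Regex.escape/1)
      |> Enum.join("|")
      |> Regex.compile!()

    regex
    |> Regex.split(text, include_captures: true, trim: true)
    |> attach_delimiters(delimiters)
    |> merge_short(threshold)
  end

  # Glue every delimiter onto the piece that precedes it.
  defp attach_delimiters(pieces, delimiters) do
    pieces
    |> Enum.reduce([], fn piece, acc ->
      if piece in delimiters and acc != [] do
        [last | rest] = acc
        [last <> piece | rest]
      else
        [piece | acc]
      end
    end)
    |> Enum.reverse()
  end

  # Carry sentences below the byte threshold forward into the next sentence.
  defp merge_short(sentences, threshold) do
    {merged, pending} =
      Enum.reduce(sentences, {[], ""}, fn sentence, {acc, pending} ->
        candidate = pending <> sentence

        if byte_size(candidate) < threshold do
          {acc, candidate}
        else
          {[candidate | acc], ""}
        end
      end)

    merged = if pending == "", do: merged, else: [pending | merged]
    Enum.reverse(merged)
  end
end

SentenceSplitSketch.split("Hi. Ok! This is a longer sentence. Bye.")
# => ["Hi. Ok!", " This is a longer sentence.", " Bye."]
```

With the default threshold of 6 bytes, `"Hi."` (3 bytes) is too short on its own and is joined with `" Ok!"`, while the final `" Bye."` (5 bytes) has nothing after it and stays a separate entry in this sketch.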
