# `Chunx.Chunker.Word`
[🔗](https://github.com/preciz/chunx/blob/main/lib/chunx/chunker/word.ex#L1)

Implements a word-based chunking strategy.

Splits text into overlapping chunks at word boundaries while
respecting token limits.

# `chunk_opts`

```elixir
@type chunk_opts() :: [
  chunk_size: pos_integer(),
  chunk_overlap: pos_integer() | float()
]
```

# `chunk`

```elixir
@spec chunk(binary(), Tokenizers.Tokenizer.t(), chunk_opts()) ::
  {:ok, [Chunx.Chunk.t()]} | {:error, term()}
```

Splits text into overlapping chunks using word boundaries.

## Options
  * `:chunk_size` - Maximum number of tokens per chunk (default: 512)
  * `:chunk_overlap` - Overlap between consecutive chunks, given either as a number of tokens (integer) or as a fraction of `:chunk_size` (float between 0 and 1) (default: 0.25)

## Examples

    iex> {:ok, tokenizer} = Tokenizers.Tokenizer.from_pretrained("gpt2")
    iex> Chunx.Chunker.Word.chunk("Some text to split", tokenizer, chunk_size: 3, chunk_overlap: 1)
    {
      :ok,
      [
        %Chunx.Chunk{end_byte: 12, start_byte: 0, text: "Some text to", token_count: 3},
        %Chunx.Chunk{end_byte: 18, start_byte: 9, text: " to split", token_count: 2}
      ]
    }
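
The overlap can also be given as a float, in which case it is interpreted as a fraction of `:chunk_size`. The sketch below assumes the same tokenizer as above; the exact chunks produced depend on the tokenizer, so no output is shown:

```elixir
# Float overlap: 0.5 of chunk_size 4 means consecutive chunks
# share roughly 2 tokens (illustrative values, not from the docs).
{:ok, tokenizer} = Tokenizers.Tokenizer.from_pretrained("gpt2")

{:ok, chunks} =
  Chunx.Chunker.Word.chunk("Some text to split", tokenizer,
    chunk_size: 4,
    chunk_overlap: 0.5
  )
```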

---

*Consult [api-reference.md](api-reference.md) for complete listing*
