Core Byte-Pair Encoding merge algorithm.
Given a sequence of bytes and a rank map, repeatedly merges the lowest-ranked adjacent pair until no more merges are possible.
Summary
Functions
Encodes a binary chunk into a list of token rank integers using BPE.
Functions
@spec encode(binary(), %{required(binary()) => non_neg_integer()}) :: [ non_neg_integer() ]
Encodes a binary chunk into a list of token rank integers using BPE.
The input should be a pre-tokenized chunk (output of Pretokenizer.split/2).
Returns a list of integer token IDs (ranks).