Ensures a model is loaded with the correct configuration on local providers (LM Studio, Ollama, vLLM) before inference begins.
- LM Studio: checks `GET /api/v1/models` for `loaded_instances` first, and only calls `POST /api/v1/models/load` when the model isn't already loaded with the required `context_length`.
- Ollama: sends `POST /api/generate` with `keep_alive` to pre-load the model (Ollama also auto-loads a model on its first request).
- Cloud providers: no-op.
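A minimal sketch of this dispatch, assuming the `Req` HTTP client, the default local ports, and illustrative JSON response shapes; the `ModelLoader` module, the `ensure_loaded/3` name, and the field layout are assumptions for illustration, not the actual implementation. The vLLM branch is omitted since its behavior isn't described here.

```elixir
defmodule ModelLoader do
  # Default local endpoints (assumed): LM Studio and Ollama.
  @lm_studio "http://localhost:1234/api/v1"
  @ollama "http://localhost:11434"

  # LM Studio: query loaded instances first, load only on a miss.
  def ensure_loaded(:lm_studio, model, context_length) do
    %{body: body} = Req.get!(@lm_studio <> "/models")

    loaded? =
      body
      |> Map.get("models", [])
      |> Enum.any?(fn m ->
        m["key"] == model and
          Enum.any?(m["loaded_instances"] || [], fn inst ->
            # Field names here are an assumed response shape.
            get_in(inst, ["config", "context_length"]) == context_length
          end)
      end)

    if loaded? do
      :ok
    else
      case Req.post!(@lm_studio <> "/models/load",
             json: %{model: model, config: %{context_length: context_length}}
           ) do
        %{status: 200} -> :ok
        %{status: status} -> {:error, {:lm_studio_load_failed, status}}
      end
    end
  end

  # Ollama: a prompt-less /api/generate call with keep_alive pre-loads
  # the model (Ollama would also auto-load it on the first real request).
  def ensure_loaded(:ollama, model, _context_length) do
    case Req.post!(@ollama <> "/api/generate",
           json: %{model: model, keep_alive: "10m"}
         ) do
      %{status: 200} -> :ok
      %{status: status} -> {:error, {:ollama_load_failed, status}}
    end
  end

  # Cloud providers manage model residency themselves: no-op.
  def ensure_loaded(_cloud_provider, _model, _context_length), do: :ok
end
```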
Summary
Functions
Ensures the model is loaded on the provider with the configured context length. Returns `:ok` or `{:error, reason}`.
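Reusing the hypothetical `ModelLoader` from the sketch above, a caller could gate inference on the result (the model name and context size below are made up for the example):

```elixir
# Hypothetical usage: refuse to run inference until the model is resident.
case ModelLoader.ensure_loaded(:lm_studio, "my-local-model", 32_768) do
  :ok -> IO.puts("model ready, starting inference")
  {:error, reason} -> IO.puts("load failed: #{inspect(reason)}")
end
```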