# Troubleshooting
Reference this guide when CLI or SDK calls fail or diverge from expectations. Most fixes involve configuration, backpressure handling, or local environment setup.
## Authentication or config errors
- Missing API key/base URL: `Tinkex.Config.new/1` raises or returns validation errors when `api_key`/`base_url` are absent. Set `TINKER_API_KEY` (and optionally `TINKER_BASE_URL`) or pass explicit options.
- Non-default pool selection: If you override `:base_url` without starting a matching Finch pool, requests fall back to Finch defaults. Use the same base URL configured in `Tinkex.Application` for production workloads, or provide a custom pool via `config :tinkex, :http_pool, MyPool`.
- Session SDK version too old: Some endpoints may reject requests (notably vision/image input) if the reported SDK version is too old. Tinkex reports the official Python Tinker SDK version configured in `mix.exs`; update to the latest Tinkex if you hit this.
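If you prefer explicit options over environment variables, the two approaches can be combined. A minimal sketch, assuming `Tinkex.Config.new/1` accepts `:api_key` and `:base_url` keyword options as the errors above suggest:

```elixir
# Require the API key up front so a missing TINKER_API_KEY fails fast,
# and only pass :base_url when the override is actually set.
opts = [api_key: System.fetch_env!("TINKER_API_KEY")]

opts =
  case System.get_env("TINKER_BASE_URL") do
    nil -> opts
    url -> Keyword.put(opts, :base_url, url)
  end

config = Tinkex.Config.new(opts)
```

`System.fetch_env!/1` raises with a clear message when the variable is unset, which is usually easier to diagnose than a validation error deeper in the call stack.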
## Vision and multimodal inputs
- Asset is not a valid image: The backend rejected the image bytes. Verify you are sending a real PNG/JPEG (and that `format` matches the file), try a different image, and avoid setting `expected_tokens` unless you know the correct value. The bundled example supports `TINKER_IMAGE_PATH`/`TINKER_IMAGE_EXPECTED_TOKENS`.
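A quick local check can rule out a corrupt or mislabeled file before you blame the upload path. This helper (the `ImageCheck` name is illustrative, not part of Tinkex) inspects the magic bytes that real PNG and JPEG files start with:

```elixir
defmodule ImageCheck do
  @doc """
  Returns :png, :jpeg, or :unknown based on the file's magic bytes,
  so a backend "not a valid image" error can be traced to the payload.
  """
  def format(bytes) when is_binary(bytes) do
    case bytes do
      # PNG signature: 0x89 "PNG" CR LF 0x1A LF
      <<0x89, "PNG\r\n", 0x1A, "\n", _::binary>> -> :png
      # JPEG files start with the SOI marker FF D8 FF
      <<0xFF, 0xD8, 0xFF, _::binary>> -> :jpeg
      _ -> :unknown
    end
  end
end
```

Compare the result against the `format` you send; a mismatch (for example, a JPEG renamed to `.png`) is a common cause of this rejection.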
## Timeouts, queuing, or 429 responses
- Long-running training steps: Increase `:timeout` on `Tinkex.Config` or pass `:await_timeout` to client calls. Training requests are sent sequentially; enqueue fewer simultaneous batches to keep the GenServer responsive.
- Queue backpressure: Sampling and training futures emit the telemetry event `[:tinkex, :queue, :state_change]`. Attach `Tinkex.Telemetry.attach_logger/1` or a custom handler to watch for `:paused_rate_limit`/`:paused_capacity`.
- HTTP 429: The RateLimiter stores per-tenant backoff windows. You do not need to retry manually while a backoff is active; subsequent calls will sleep. When testing, lower concurrency or reuse the same `ServiceClient` to share limiter state.
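As an alternative to `Tinkex.Telemetry.attach_logger/1`, a custom handler can be attached with the standard `:telemetry` API. The exact metadata shape is an assumption here; inspect it once to find the field carrying the new queue state:

```elixir
# Assumes the :telemetry package is available (Tinkex emits through it).
require Logger

:telemetry.attach(
  "tinkex-queue-watch",
  [:tinkex, :queue, :state_change],
  fn _event, _measurements, metadata, _config ->
    # Log everything first; once you know the metadata keys, match on
    # :paused_rate_limit / :paused_capacity specifically.
    Logger.warning("tinkex queue state change: #{inspect(metadata)}")
  end,
  nil
)
```

Attach this once at application start; `:telemetry.attach/4` rejects duplicate handler IDs, so reattaching with the same `"tinkex-queue-watch"` ID returns an error rather than doubling your logs.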
## Tokenizer (NIF) issues
- Compilation/ABI errors: Ensure Rust and C toolchains are available; re-run `mix deps.compile tokenizers`.
- Runtime crashes: The ETS cache stores NIF handles; verify the same OS/CPU architecture used to build dependencies. If you suspect a bad cache entry, restart the BEAM and clear `_build`/`deps`.
- Unexpected token IDs: Confirm you are passing fully formatted text (chat templates are not inserted) and the correct model name. For Llama-3 variants, the SDK automatically swaps to `"thinkingmachineslabinc/meta-llama-3-tokenizer"`.
- Kimi K2 tokenizers: Kimi uses `tiktoken.model` + `tokenizer_config.json` (via `tiktoken_ex`), not a HuggingFace `tokenizer.json`. Ensure those files can be downloaded from HuggingFace or pass `tiktoken_model_path`/`tokenizer_config_path`.
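When token IDs look wrong, it can help to encode the same string directly with the underlying `tokenizers` package, bypassing Tinkex entirely. A sketch, assuming the elixir-nx `tokenizers` library and network access to HuggingFace:

```elixir
# Load the Llama-3 tokenizer Tinkex swaps to, encode a raw string, and
# inspect the IDs. Remember: chat templates are NOT applied for you.
{:ok, tokenizer} =
  Tokenizers.Tokenizer.from_pretrained("thinkingmachineslabinc/meta-llama-3-tokenizer")

{:ok, encoding} = Tokenizers.Tokenizer.encode(tokenizer, "Hello, world!")
ids = Tokenizers.Encoding.get_ids(encoding)
IO.inspect(ids, label: "token ids")
```

If these IDs match what the Python SDK's tokenizer produces but Tinkex's output differs, the problem is in how the text is being formatted before encoding, not in the NIF.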
## CLI failures
- `--output` missing: `tinkex checkpoint` requires `--output` to write metadata. Provide a path with write permissions.
- Missing base model: Both `run` and `checkpoint` expect `--base-model` (or `--model-path` for `run`). Validate the option spelling and casing.
- Prompt file errors: `--prompt-file` accepts plain text or a JSON array of token IDs. Confirm the file is readable and valid UTF-8/JSON.
- EXLA errors: EXLA is optional and is not started automatically. If you need EXLA-backed Nx operations, run via `mix run` or an OTP release and start `:exla` before calling `Nx.default_backend/1`.
- Stuck or slow runs: Pass `--http-timeout`/`--timeout` and monitor telemetry logs. Use `--json` to inspect raw server payloads when diagnosing errors.
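For the token-ID form of `--prompt-file`, the file must contain a single JSON array of integers. A minimal way to produce one without pulling in a JSON library (the token IDs below are illustrative only):

```elixir
# Write a JSON array of token IDs by hand; integers joined by commas
# inside brackets is valid JSON, so no encoder dependency is needed.
ids = [1, 15043, 29892, 3186]
path = Path.join(System.tmp_dir!(), "prompt.json")
File.write!(path, "[" <> Enum.join(ids, ",") <> "]")
```

Pass the written path to `--prompt-file`; if the CLI still rejects it, check for a UTF-8 BOM or trailing garbage in the file.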
## Comparing with the Python SDK
- Use the same base model, prompt text, sampling params (`temperature`, `top_p`, `max_tokens`), and seed (if supported) on both clients.
- Request logprobs (`prompt_logprobs`/`topk_prompt_logprobs`) to compare token-level probabilities. Expect similar, not identical, text output.
- If results diverge, verify that tokenizer IDs match (`TrainingClient.get_info/1` when available) and that both clients point to the same `base_url`.
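Since outputs should be similar rather than identical, compare logprobs with a tolerance instead of exact equality. A small helper sketch (the `LogprobDiff` name is illustrative, not part of Tinkex):

```elixir
defmodule LogprobDiff do
  @doc """
  True when both logprob lists have the same length and agree
  element-wise within `tol`. Exact float equality across SDKs and
  backends is not a realistic expectation.
  """
  def close?(a, b, tol \\ 1.0e-3) do
    length(a) == length(b) and
      a |> Enum.zip(b) |> Enum.all?(fn {x, y} -> abs(x - y) <= tol end)
  end
end
```

Run the same prompt through both SDKs, collect the prompt logprobs, and feed both lists to `LogprobDiff.close?/3`; a length mismatch points at tokenizer divergence, while large element-wise gaps point at sampling params or model mismatch.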
## Documentation build issues
`mix docs` relies on dev-only deps. Run it in a dev environment (not production releases) and ensure `ex_doc` is installed. If assets are missing, rebuild the escript or fetch deps again with `mix deps.get`.