Custom LLM Endpoint Guide

This guide explains how to integrate custom LLM endpoints with GettextTranslator using LangChain 0.4.0.

Overview

GettextTranslator uses LangChain to communicate with LLM providers. As of version 0.5.0, the library uses LangChain 0.4.0, which introduced breaking changes in how messages are structured.

Key Change in LangChain 0.4.0:

  • Message content is now a list of ContentPart structs instead of plain strings
  • All custom endpoints must return responses in this format
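
As an illustration (a sketch of the two shapes, not output copied from either version), the same assistant message looks like this before and after the change:

# LangChain 0.3.x: content was a plain string
%LangChain.Message{role: :assistant, content: "Hola"}

# LangChain 0.4.0: content is a list of ContentPart structs
%LangChain.Message{
  role: :assistant,
  content: [
    %LangChain.Message.ContentPart{type: :text, content: "Hola"}
  ]
}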

Supported Models

LangChain 0.4.0 officially supports:

Provider         | Module                              | Status
OpenAI           | LangChain.ChatModels.ChatOpenAI     | ✅ Fully Supported
Anthropic        | LangChain.ChatModels.ChatAnthropic  | ✅ Fully Supported
Google Gemini    | LangChain.ChatModels.ChatGoogleAI   | ✅ Fully Supported
Google Vertex AI | LangChain.ChatModels.ChatVertexAI   | ✅ Fully Supported
Ollama           | LangChain.ChatModels.ChatOllamaAI   | ⚠️ May not work
Others           | Custom implementation               | ⚠️ Requires custom adapter

Important: If you're using Ollama or other unsupported models, they may not function correctly with LangChain 0.4.0. Consider:

  • Using GettextTranslator 0.4.5, which uses LangChain 0.3.3 (see the dependency sketch below)
  • Switching to a supported provider
  • Implementing a custom adapter (see below)
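
If you stay on GettextTranslator 0.4.5, pinning the dependency is all that is needed; a minimal mix.exs sketch (adjust the constraint to your needs):

# mix.exs
defp deps do
  [
    # Last release series that uses LangChain 0.3.3
    {:gettext_translator, "~> 0.4.5"}
  ]
end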

Response Format Requirements

Synchronous (Non-Streaming) Response

Your LLM endpoint must return a tuple {:ok, updated_chain} where the chain contains the response message.

Required Structure:

{:ok, %LangChain.Chains.LLMChain{
  last_message: %LangChain.Message{
    role: :assistant,
    content: [
      %LangChain.Message.ContentPart{
        type: :text,
        content: "Translated text here"
      }
    ],
    status: :complete,
    index: 0
  },
  messages: [
    # All messages in the conversation
  ]
}}

Critical Requirements:

  1. Content as List: content MUST be a list of ContentPart structs

    # ✅ CORRECT
    content: [%ContentPart{type: :text, content: "text"}]
    
    # ❌ WRONG
    content: "text"
  2. ContentPart Structure: Each part must have type and content

    %LangChain.Message.ContentPart{
      type: :text,        # Required: :text, :image, :tool_call, etc.
      content: "string"   # Required: the actual content
    }
  3. Message Fields:

    • role: Must be :assistant for LLM responses
    • status: Should be :complete when done
    • content: List of ContentPart structs (never a string)
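
One way to satisfy all of the requirements above in a single place is a small helper that wraps raw endpoint text in the expected message shape. This is a sketch you can adapt; the module and function names are made up for illustration:

defmodule MyApp.ResponseHelper do
  alias LangChain.Message
  alias LangChain.Message.ContentPart

  # Wraps plain text in the LangChain 0.4.0 response shape:
  # role :assistant, status :complete, content as a list of ContentPart structs.
  def assistant_message(text) when is_binary(text) do
    %Message{
      role: :assistant,
      status: :complete,
      content: [%ContentPart{type: :text, content: text}]
    }
  end
end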

Streaming Response

For streaming responses, emit deltas via callbacks.

Delta Structure:

# Each delta emitted
%LangChain.MessageDelta{
  role: :assistant,
  content: [
    %LangChain.Message.ContentPart{
      type: :text,
      content: "Partial text chunk"
    }
  ],
  status: :incomplete  # or :complete for the final delta
}

Streaming Requirements:

  1. The on_llm_new_delta callback receives a list of deltas:

    def handle_delta(deltas) when is_list(deltas) do
      # Process list of MessageDelta structs
    end
  2. Merge deltas using LLMChain.merge_deltas/2:

    updated_chain = LLMChain.merge_deltas(current_chain, deltas)
  3. Access merged content via MessageDelta.merged_content:

    text = updated_chain.delta.merged_content
  4. Final delta must have status: :complete
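
A minimal sketch tying steps 1-4 together (it assumes `alias LangChain.Chains.LLMChain` and that your adapter hands each batch of deltas to this function together with the current chain; the exact callback wiring depends on your implementation):

def handle_deltas(chain, deltas) when is_list(deltas) do
  # Step 2: merge the incoming batch of deltas into the chain
  updated_chain = LLMChain.merge_deltas(chain, deltas)

  # Step 3: the text accumulated so far (grows with every batch)
  _partial_text = updated_chain.delta.merged_content

  # Step 4: the final delta in the stream arrives with status: :complete
  updated_chain
end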

Error Response

On error, return a three-element tuple:

{:error, updated_chain, reason}

Components:

  • updated_chain: The chain state at the time of error (can be nil)
  • reason: Error description (string, atom, or structured error)

Example:

{:error, chain, "API rate limit exceeded"}

GettextTranslator handles errors gracefully:

  • Logs the error with context
  • Returns an empty translation {:ok, ""}
  • Allows the translation process to continue
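
From the caller's perspective that behaviour looks roughly like the sketch below (illustrative only, not GettextTranslator's actual code; extract_text/1 stands in for whatever flattens the ContentPart list):

case LLMChain.run(chain) do
  {:ok, updated_chain} ->
    {:ok, extract_text(updated_chain.last_message)}

  {:error, _chain, reason} ->
    # Logged with context, then the entry is skipped with an empty translation
    Logger.error("Translation failed: #{inspect(reason)}")
    {:ok, ""}
end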

Custom Adapter Implementation

Minimal Adapter Example

defmodule MyApp.CustomLLMAdapter do
  @moduledoc """
  Custom LLM adapter for GettextTranslator using LangChain 0.4.0.
  """

  use LangChain.ChatModels.ChatModel

  alias LangChain.Message
  alias LangChain.Message.ContentPart

  defstruct [
    :model,
    :temperature,
    :endpoint,
    :api_key
  ]

  @type t :: %__MODULE__{
    model: String.t(),
    temperature: float(),
    endpoint: String.t(),
    api_key: String.t()
  }

  @doc """
  Creates a new instance of the custom LLM adapter.

  ## Examples

      iex> MyApp.CustomLLMAdapter.new!(%{
      ...>   model: "custom-model-v1",
      ...>   temperature: 0.0,
      ...>   endpoint: "https://api.example.com/v1/chat",
      ...>   api_key: "sk-..."
      ...> })
  """
  @impl true
  def new(attrs \\ %{}) do
    %__MODULE__{
      model: attrs[:model] || "default-model",
      temperature: attrs[:temperature] || 0.7,
      endpoint: attrs[:endpoint] || "https://api.example.com/v1/chat",
      api_key: attrs[:api_key]
    }
    |> validate()
  end

  @impl true
  def new!(attrs \\ %{}) do
    case new(attrs) do
      {:ok, adapter} -> adapter
      {:error, reason} -> raise ArgumentError, reason
    end
  end

  defp validate(adapter) do
    cond do
      is_nil(adapter.api_key) ->
        {:error, "API key is required"}

      is_nil(adapter.endpoint) ->
        {:error, "Endpoint URL is required"}

      true ->
        {:ok, adapter}
    end
  end

  @doc """
  Sends messages to the LLM and returns the response.

  This is the main function called by LangChain to get completions.
  """
  @impl true
  def call(adapter, messages, _functions \\ []) do
    # Build the request payload
    payload = build_payload(adapter, messages)

    # Make the API request
    case make_api_request(adapter, payload) do
      {:ok, response_text} ->
        # Convert to LangChain 0.4.0 format
        message = %Message{
          role: :assistant,
          content: [
            %ContentPart{
              type: :text,
              content: response_text
            }
          ],
          status: :complete
        }

        {:ok, message}

      {:error, reason} ->
        {:error, nil, reason}
    end
  end

  defp build_payload(adapter, messages) do
    %{
      model: adapter.model,
      temperature: adapter.temperature,
      messages: Enum.map(messages, &message_to_api_format/1)
    }
  end

  defp message_to_api_format(%Message{role: role, content: content}) do
    %{
      role: to_string(role),
      content: ContentPart.parts_to_string(content)
    }
  end

  defp make_api_request(adapter, payload) do
    headers = [
      {"Authorization", "Bearer #{adapter.api_key}"},
      {"Content-Type", "application/json"}
    ]

    body = Jason.encode!(payload)

    case HTTPoison.post(adapter.endpoint, body, headers) do
      {:ok, %{status_code: 200, body: response_body}} ->
        case Jason.decode(response_body) do
          {:ok, %{"choices" => [%{"message" => %{"content" => content}} | _]}} ->
            {:ok, content}

          {:error, _} = error ->
            error
        end

      {:ok, %{status_code: status_code, body: body}} ->
        {:error, "API returned status #{status_code}: #{body}"}

      {:error, %HTTPoison.Error{reason: reason}} ->
        {:error, "HTTP request failed: #{inspect(reason)}"}
    end
  end
end
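
A quick way to exercise the adapter from IEx before wiring it into GettextTranslator (this assumes a reachable endpoint and a valid API key in the environment):

adapter =
  MyApp.CustomLLMAdapter.new!(%{
    model: "custom-model-v1",
    endpoint: "https://api.example.com/v1/chat",
    api_key: System.fetch_env!("CUSTOM_LLM_API_KEY")
  })

messages = [LangChain.Message.new_user!("Translate 'hello' to Spanish")]

{:ok, %LangChain.Message{content: [part | _]}} =
  MyApp.CustomLLMAdapter.call(adapter, messages)

IO.puts(part.content)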

Streaming Adapter Example

defmodule MyApp.StreamingLLMAdapter do
  use LangChain.ChatModels.ChatModel

  alias LangChain.Message
  alias LangChain.Message.ContentPart
  alias LangChain.MessageDelta
  alias LangChain.Chains.LLMChain

  # ... struct and new/new! implementations ...

  @impl true
  def call(adapter, messages, _functions \\ []) do
    # For streaming, we need to handle Server-Sent Events (SSE)
    payload = build_payload(adapter, messages, stream: true)

    # Initialize the delta accumulator
    delta_acc = %MessageDelta{
      role: :assistant,
      content: [],
      status: :incomplete
    }

    case stream_api_request(adapter, payload, delta_acc) do
      {:ok, final_delta} ->
        # Convert final delta to message
        message = MessageDelta.to_message(final_delta)
        {:ok, message}

      {:error, reason} ->
        {:error, nil, reason}
    end
  end

  defp stream_api_request(adapter, payload, delta_acc) do
    headers = [
      {"Authorization", "Bearer #{adapter.api_key}"},
      {"Content-Type", "application/json"},
      {"Accept", "text/event-stream"}
    ]

    body = Jason.encode!(payload)

    # Use streaming HTTP client
    case HTTPoison.post(adapter.endpoint, body, headers, stream_to: self(), async: :once) do
      {:ok, %HTTPoison.AsyncResponse{id: ref}} ->
        receive_stream(ref, delta_acc)

      {:error, reason} ->
        {:error, reason}
    end
  end

  defp receive_stream(ref, delta_acc) do
    receive do
      # With `async: :once`, the status and headers arrive as separate messages
      # before any chunks; request the next message and keep waiting.
      %HTTPoison.AsyncStatus{id: ^ref} ->
        HTTPoison.stream_next(%HTTPoison.AsyncResponse{id: ref})
        receive_stream(ref, delta_acc)

      %HTTPoison.AsyncHeaders{id: ^ref} ->
        HTTPoison.stream_next(%HTTPoison.AsyncResponse{id: ref})
        receive_stream(ref, delta_acc)

      %HTTPoison.AsyncChunk{id: ^ref, chunk: chunk} ->
        # Parse SSE chunk
        case parse_sse_chunk(chunk) do
          {:ok, content_chunk} ->
            # Create a delta for this chunk
            new_delta = %MessageDelta{
              role: :assistant,
              content: [%ContentPart{type: :text, content: content_chunk}],
              status: :incomplete
            }

            # Merge with accumulator
            updated_acc = merge_message_deltas(delta_acc, new_delta)

            # Request next chunk
            HTTPoison.stream_next(%HTTPoison.AsyncResponse{id: ref})
            receive_stream(ref, updated_acc)

          {:done} ->
            # Stream complete
            {:ok, %{delta_acc | status: :complete}}

          {:error, reason} ->
            {:error, reason}
        end

      %HTTPoison.AsyncEnd{id: ^ref} ->
        {:ok, %{delta_acc | status: :complete}}

      %HTTPoison.Error{id: ^ref, reason: reason} ->
        {:error, reason}
    after
      30_000 ->
        {:error, "Stream timeout"}
    end
  end

  defp parse_sse_chunk(chunk) do
    # Parse Server-Sent Events format
    # Example: "data: {\"choices\":[{\"delta\":{\"content\":\"text\"}}]}\n\n"
    case String.trim(chunk) do
      "data: [DONE]" ->
        {:done}

      "data: " <> json_data ->
        case Jason.decode(json_data) do
          {:ok, %{"choices" => [%{"delta" => %{"content" => content}} | _]}} ->
            {:ok, content}

          _ ->
            {:ok, ""}
        end

      _ ->
        {:ok, ""}
    end
  end

  defp merge_message_deltas(acc, new_delta) do
    # Merge content lists
    merged_content = acc.content ++ new_delta.content
    %{acc | content: merged_content}
  end
end

Configuration

Basic Configuration

# config/config.exs
config :gettext_translator, GettextTranslator,
  endpoint: MyApp.CustomLLMAdapter,
  endpoint_model: "custom-model-v1",
  endpoint_temperature: 0,
  endpoint_config: %{
    "api_key" => System.get_env("CUSTOM_LLM_API_KEY"),
    "endpoint" => "https://api.example.com/v1/chat"
  },
  persona: "You are a professional translator. Translate accurately while preserving meaning and length.",
  style: "Casual, using simple language",
  ignored_languages: ["en"]

Runtime Configuration

The endpoint_config map is dynamically applied at runtime. GettextTranslator converts config keys to LangChain application environment variables:

# This config map:
endpoint_config: %{
  "api_key" => "sk-...",
  "custom_setting" => "value"
}

# Becomes:
Application.put_env(:langchain, :api_key, "sk-...")
Application.put_env(:langchain, :custom_setting, "value")

This allows you to configure any LangChain-compatible adapter without code changes.
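
Conceptually the conversion is just a loop over the map; a sketch of the idea (not GettextTranslator's actual implementation):

Enum.each(endpoint_config, fn {key, value} ->
  # String keys from the config map become atom keys in the :langchain app env
  Application.put_env(:langchain, String.to_atom(key), value)
end)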

Environment-Specific Configuration

# config/dev.exs
config :gettext_translator, GettextTranslator,
  endpoint: LangChain.ChatModels.ChatOllamaAI,
  endpoint_model: "llama3.2:latest",
  endpoint_temperature: 0,
  endpoint_config: %{},  # Local Ollama, no config needed
  persona: "You are a professional translator.",
  style: "Casual",
  ignored_languages: ["en"]

# config/prod.exs
config :gettext_translator, GettextTranslator,
  endpoint: LangChain.ChatModels.ChatOpenAI,
  endpoint_model: "gpt-4",
  endpoint_temperature: 0,
  endpoint_config: %{
    "openai_key" => System.get_env("OPENAI_API_KEY")
  },
  persona: "You are a professional translator.",
  style: "Casual",
  ignored_languages: ["en"]

Testing Your Custom Endpoint

1. Unit Test Your Adapter

# test/my_app/custom_llm_adapter_test.exs
defmodule MyApp.CustomLLMAdapterTest do
  use ExUnit.Case

  alias MyApp.CustomLLMAdapter
  alias LangChain.Message
  alias LangChain.Message.ContentPart

  test "new/1 creates adapter with valid config" do
    {:ok, adapter} = CustomLLMAdapter.new(%{
      model: "test-model",
      api_key: "test-key",
      endpoint: "https://test.com"
    })

    assert adapter.model == "test-model"
    assert adapter.api_key == "test-key"
  end

  test "new/1 validates required fields" do
    assert {:error, _} = CustomLLMAdapter.new(%{model: "test"})
  end

  test "call/2 returns properly formatted message" do
    adapter = CustomLLMAdapter.new!(%{
      model: "test-model",
      api_key: "test-key",
      endpoint: "https://test.com"
    })

    messages = [
      Message.new_user!("Translate 'hello' to Spanish")
    ]

    # Mock the API response
    # ... your mocking logic ...

    assert {:ok, response} = CustomLLMAdapter.call(adapter, messages)
    assert %Message{} = response
    assert response.role == :assistant
    assert is_list(response.content)
    assert [%ContentPart{type: :text, content: text}] = response.content
    assert is_binary(text)
  end
end

2. Integration Test with GettextTranslator

# test/integration/translation_test.exs
defmodule GettextTranslatorIntegrationTest do
  use ExUnit.Case

  test "translates with custom adapter" do
    provider = %{
      ignored_languages: ["en"],
      persona: "Professional translator",
      style: "Casual",
      endpoint: %{
        config: %{
          "api_key" => "test-key"
        },
        adapter: MyApp.CustomLLMAdapter,
        model: "test-model",
        temperature: 0
      }
    }

    opts = %{
      language_code: "es",
      message: "Hello, world!"
    }

    assert {:ok, translation} = GettextTranslator.Processor.LLM.translate(provider, opts)
    assert is_binary(translation)
    assert translation != ""
  end
end

3. Manual Testing

# Run translation with your custom adapter
mix gettext_translator.run

# Check logs for errors
tail -f log/dev.log | grep -i "error\|translat"

4. Verify Response Format

Add logging to verify your adapter returns the correct format:

# Place `require Logger` once at the top of your adapter module so both
# branches below can use the Logger macros.
require Logger

def call(adapter, messages, _functions) do
  case make_api_request(adapter, messages) do
    {:ok, response_text} ->
      message = %Message{
        role: :assistant,
        content: [%ContentPart{type: :text, content: response_text}],
        status: :complete
      }

      # Debug logging
      Logger.debug("Adapter response: #{inspect(message)}")

      {:ok, message}

    {:error, reason} ->
      Logger.error("Adapter error: #{inspect(reason)}")
      {:error, nil, reason}
  end
end

Common Issues

Issue 1: "content is not a list"

Error:

** (FunctionClauseError) no function clause matching in ContentPart.parts_to_string/1

Cause: Your adapter is returning content as a string instead of a list of ContentPart structs.

Solution:

# ❌ WRONG
content: "translated text"

# ✅ CORRECT
content: [%ContentPart{type: :text, content: "translated text"}]

Issue 2: "undefined function ContentPart.new/1"

Error:

** (UndefinedFunctionError) function LangChain.Message.ContentPart.new/1 is undefined

Cause: Trying to use ContentPart.new/1 which doesn't exist.

Solution: Use struct syntax instead:

# ❌ WRONG
ContentPart.new(%{type: :text, content: "text"})

# ✅ CORRECT
%ContentPart{type: :text, content: "text"}

Issue 3: "pattern match failed on {:ok, result}"

Error:

** (MatchError) no match of right hand side value: {:ok, %Message{...}}

Cause: GettextTranslator expects {:ok, %{last_message: %Message{}}} but your adapter returns {:ok, %Message{}}.

Solution: Make sure your LLMChain implementation wraps the message properly:

# Your adapter's call/2 should return:
{:ok, %Message{...}}

# LLMChain will wrap it as:
{:ok, %LLMChain{last_message: %Message{...}}}

Issue 4: Empty translations returned

Symptoms: Translations complete but return empty strings.

Possible Causes:

  1. API errors being silently caught
  2. Response parsing errors
  3. Content extraction failing

Debug Steps:

# Add detailed logging in your adapter (again, `require Logger` in the module)
def call(adapter, messages, _functions) do
  Logger.debug("Sending messages: #{inspect(messages)}")

  payload = build_payload(adapter, messages)

  case make_api_request(adapter, payload) do
    {:ok, response_text} ->
      Logger.debug("Received response: #{inspect(response_text)}")
      # ... rest of code

    {:error, reason} ->
      Logger.error("API error: #{inspect(reason)}")
      {:error, nil, reason}
  end
end

Issue 5: Streaming not working

Symptoms: Streaming responses timeout or fail.

Checklist:

  • [ ] Endpoint supports Server-Sent Events (SSE)
  • [ ] Accept: text/event-stream header is set
  • [ ] SSE parsing handles data: prefix correctly
  • [ ] Stream timeout is sufficient (30+ seconds)
  • [ ] Deltas are properly merged
  • [ ] Final delta has status: :complete

Examples

Example 1: OpenAI-Compatible Endpoint

Many providers offer OpenAI-compatible APIs. You can use them with the built-in adapter:

config :gettext_translator, GettextTranslator,
  endpoint: LangChain.ChatModels.ChatOpenAI,
  endpoint_model: "your-model-name",
  endpoint_temperature: 0,
  endpoint_config: %{
    "openai_key" => System.get_env("API_KEY"),
    "openai_endpoint" => "https://your-provider.com/v1/chat/completions"
  },
  persona: "Professional translator",
  style: "Casual",
  ignored_languages: ["en"]

Example 2: Azure OpenAI

config :gettext_translator, GettextTranslator,
  endpoint: LangChain.ChatModels.ChatOpenAI,
  endpoint_model: "gpt-4",
  endpoint_temperature: 0,
  endpoint_config: %{
    "openai_key" => System.get_env("AZURE_OPENAI_KEY"),
    "openai_endpoint" => "https://your-resource.openai.azure.com/openai/deployments/your-deployment/chat/completions?api-version=2023-05-15"
  },
  persona: "Professional translator",
  style: "Casual",
  ignored_languages: ["en"]

Example 3: Local LLM with Ollama

# Note: Ollama support may be limited in LangChain 0.4.0
config :gettext_translator, GettextTranslator,
  endpoint: LangChain.ChatModels.ChatOllamaAI,
  endpoint_model: "llama3.2:latest",
  endpoint_temperature: 0,
  endpoint_config: %{
    # Empty if using default local endpoint
  },
  persona: "Professional translator",
  style: "Casual",
  ignored_languages: ["en"]

Example 4: Multiple Providers (Environment-Based)

# config/config.exs
# Config files are plain scripts, so helper functions cannot be defined with
# defp here; compute the values first and pass them to config/3.
llm_provider = System.get_env("LLM_PROVIDER", "ollama")

endpoint_module =
  case llm_provider do
    "openai" -> LangChain.ChatModels.ChatOpenAI
    "anthropic" -> LangChain.ChatModels.ChatAnthropic
    "gemini" -> LangChain.ChatModels.ChatGoogleAI
    "ollama" -> LangChain.ChatModels.ChatOllamaAI
    _ -> LangChain.ChatModels.ChatOpenAI
  end

endpoint_config =
  case llm_provider do
    "openai" -> %{"openai_key" => System.get_env("OPENAI_API_KEY")}
    "anthropic" -> %{"anthropic_key" => System.get_env("ANTHROPIC_API_KEY")}
    "gemini" -> %{"google_ai_key" => System.get_env("GOOGLE_AI_KEY")}
    _ -> %{}
  end

config :gettext_translator, GettextTranslator,
  endpoint: endpoint_module,
  endpoint_model: System.get_env("LLM_MODEL", "llama3.2:latest"),
  endpoint_temperature: 0,
  endpoint_config: endpoint_config,
  persona: "Professional translator",
  style: "Casual",
  ignored_languages: ["en"]

Building Your Own LLM Gateway

If you're running LLMs locally (e.g., in Docker) and want to provide an HTTP API that works with LangChain:

📘 See LLM_GATEWAY_EXAMPLE.md for a complete, production-ready Elixir implementation that includes:

  • OpenAI-compatible HTTP API endpoint
  • Queue system with GenStage for backpressure control
  • Support for multiple LLM backends (Ollama, vLLM, TGI)
  • API key authentication and rate limiting
  • Both streaming (SSE) and synchronous responses
  • Complete Docker deployment setup

This gateway sits between LangChain clients and your local LLM, handling queuing, authentication, and protocol translation.

Additional Resources

Support

If you encounter issues with custom endpoints:

  1. Check this guide for common issues
  2. Enable debug logging in your adapter
  3. Verify response format matches LangChain 0.4.0 requirements
  4. Open an issue on GitHub with:
    • Your adapter code (sanitized)
    • Error messages and stack traces
    • LangChain version
    • Example request/response payloads