LangChain.ChatModels.ChatBumblebee (LangChain v0.2.0)

Represents a chat model hosted by Bumblebee and accessed through an Nx.Serving.

Many types of models can be hosted through Bumblebee, so this module attempts to represent the most common features and provide a single implementation where possible.

For streaming responses, the Bumblebee serving must be configured with stream: true and should include stream_done: true as well.

Example:

Bumblebee.Text.generation(model_info, tokenizer, generation_config,
  # ...
  stream: true,
  stream_done: true
)

A non-streaming response is supported as well, in which case a completed LangChain.Message is returned once the response finishes.
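For example, a non-streaming model could be created like this (a sketch; MyServingName stands in for a serving compiled without the stream options):

ChatBumblebee.new!(%{
  serving: MyServingName,
  stream: false
})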

The stream_done option sends a final message to let us know when the stream is complete and includes some token information.

The chat model can be created like this and provided to an LLMChain:

ChatBumblebee.new!(%{
  serving: @serving_name,
  template_format: @template_format,
  receive_timeout: @receive_timeout,
  stream: true
})

The serving is the name of the Nx.Serving process that is hosting the model.
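For example, the serving can be started under the application's supervision tree. This is a minimal sketch; the model repo, the Llama2ChatModel name, and the options are illustrative assumptions:

# In the application's start/2 callback (sketch; repo and names are assumptions)
{:ok, model_info} = Bumblebee.load_model({:hf, "meta-llama/Llama-2-7b-chat-hf"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "meta-llama/Llama-2-7b-chat-hf"})
{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "meta-llama/Llama-2-7b-chat-hf"})

serving =
  Bumblebee.Text.generation(model_info, tokenizer, generation_config,
    stream: true,
    stream_done: true
  )

children = [
  # The name given here is what gets passed as :serving to ChatBumblebee
  {Nx.Serving, serving: serving, name: Llama2ChatModel}
]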

The supported values for template_format are provided by LangChain.Utils.ChatTemplates.

Chat models are trained against specific content formats for the messages. Some models have no special concept of a system message. See the LangChain.Utils.ChatTemplates documentation for specific format examples.

Using the wrong format with a model may result in poor performance or hallucinations. It will not result in an error.
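To see what a given format produces, the messages can be rendered directly. This is a sketch that assumes ChatTemplates exposes apply_chat_template!/2; check LangChain.Utils.ChatTemplates for the exact API:

alias LangChain.Message
alias LangChain.Utils.ChatTemplates

messages = [
  Message.new_system!("You are a helpful assistant."),
  Message.new_user!("Hello!")
]

# Assumption: apply_chat_template!/2 renders the messages into the model's
# expected prompt text (e.g. the [INST]/<<SYS>> layout for :llama_2).
text = ChatTemplates.apply_chat_template!(messages, :llama_2)
IO.puts(text)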

Full example of chat through Bumblebee

Here's a full example of having a streaming conversation with Llama 2 through Bumblebee.

defmodule MyApp.BumblebeeChat do
  @moduledoc false
  alias LangChain.Message
  alias LangChain.MessageDelta
  alias LangChain.ChatModels.ChatBumblebee
  alias LangChain.Chains.LLMChain

  def run_chat do
    # Used when streaming responses. The function fires as data is received.
    callback_fn = fn
      %MessageDelta{} = delta ->
        # write each delta to the console as the response streams back
        IO.write(delta.content)

      %Message{} = message ->
        # inspect the fully finished message assembled from all the deltas
        IO.inspect(message, label: "FULLY ASSEMBLED MESSAGE")
    end

    # create and run the chain
    {:ok, _updated_chain, %Message{} = message} =
      LLMChain.new!(%{
        llm:
          ChatBumblebee.new!(%{
            serving: Llama2ChatModel,
            template_format: :llama_2,
            stream: true
          }),
        verbose: true
      })
      |> LLMChain.add_message(Message.new_system!("You are a helpful assistant."))
      |> LLMChain.add_message(Message.new_user!("What is the capital of Taiwan? And share up to 5 interesting facts about the city."))
      |> LLMChain.run(callback_fn: callback_fn)

    # print the LLM's fully assembled answer
    IO.puts("\n\n")
    IO.puts(message.content)
    IO.puts("\n\n")
  end
end

Then run the code in IEx:

  recompile; MyApp.BumblebeeChat.run_chat

Summary

Functions

new(attrs)

Set up a ChatBumblebee client configuration.

new!(attrs)

Set up a ChatBumblebee client configuration and return it, or raise an error if invalid.

Types

@type callback_fn() :: (LangChain.Message.t() | LangChain.MessageDelta.t() -> any())
@type t() :: %LangChain.ChatModels.ChatBumblebee{
  seed: term(),
  serving: term(),
  stream: term(),
  template_format: term()
}

Functions

new(attrs)

@spec new(attrs :: map()) :: {:ok, t()} | {:error, Ecto.Changeset.t()}

Set up a ChatBumblebee client configuration.

new!(attrs)

@spec new!(attrs :: map()) :: t() | no_return()

Set up a ChatBumblebee client configuration and return it, or raise an error if invalid.
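For illustration, a minimal sketch of both constructors (MyServingName is a placeholder for a running Nx.Serving):

alias LangChain.ChatModels.ChatBumblebee

# new/1 returns {:ok, struct} or {:error, changeset}
{:ok, chat} = ChatBumblebee.new(%{serving: MyServingName, stream: false})

# new!/1 returns the struct directly, raising on invalid attributes
chat = ChatBumblebee.new!(%{serving: MyServingName, stream: false})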