Real-time voice and multimodal AI pipelines for Elixir, inspired by pipecat. See the live voice demo to try it out.

Disclaimer: Feline is an experiment in porting pipecat to Elixir using only LLMs (no human-written code). It is not reliable yet — expect rough edges, missing features, and untested paths. Use at your own risk.

Feline reimplements pipecat's core architecture using BEAM/OTP primitives — each processor is a GenServer, pipelines are supervised process trees, and frame priority is handled through selective receive rather than async queues.

Core Concepts

Frames are the universal data unit. Audio, text, transcriptions, LLM responses, control signals — everything flows through the pipeline as a frame. Frames are categorized as:

  • System — high priority, processed immediately (e.g. StartFrame, CancelFrame, InterruptionFrame)
  • Data — regular content (e.g. TextFrame, OutputAudioRawFrame, TranscriptionFrame)
  • Control — lifecycle signals (e.g. EndFrame, HeartbeatFrame, TTSStartedFrame)
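
A sketch of how this categorization plays out in code (module and field names here are toy stand-ins, not Feline's actual `Feline.Frames.*` modules): each frame is a struct, and its category is determined by pattern matching on the struct type.

```elixir
# Illustrative only: toy frame structs and a category function.
defmodule Demo.Frames do
  defmodule StartFrame, do: defstruct([])
  defmodule TextFrame, do: defstruct([:text])
  defmodule EndFrame, do: defstruct([])

  # Pattern matching on the struct type replaces isinstance() dispatch.
  def category(%StartFrame{}), do: :system
  def category(%TextFrame{}), do: :data
  def category(%EndFrame{}), do: :control
end
```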

Processors are GenServer processes that receive frames, transform them, and push them downstream (or upstream for errors). Each processor implements the Feline.Processor behaviour:

defmodule MyApp.Uppercaser do
  use Feline.Processor

  @impl true
  def init(_opts), do: {:ok, %{}}

  @impl true
  def handle_frame(%Feline.Frames.TextFrame{text: text} = frame, :downstream, state) do
    {:push, %{frame | text: String.upcase(text)}, :downstream, state}
  end

  def handle_frame(frame, direction, state) do
    {:push, frame, direction, state}
  end
end

Pipelines chain processors together. A Pipeline.Task starts all processors under a DynamicSupervisor, links them in order, and sends a StartFrame to kick things off:

pipeline = Feline.Pipeline.new([
  {Feline.Services.Deepgram.STT, api_key: "...", sample_rate: 16_000},
  {Feline.Services.OpenAI.LLM, api_key: "...", model: "gpt-4.1-mini"},
  {Feline.Services.ElevenLabs.TTS, api_key: "...", voice_id: "..."}
])

Feline.Pipeline.Runner.run(pipeline)

Services are processors with pre-built frame handling for common AI tasks. Implement one callback and the service macro handles the rest:

  • Feline.Services.LLM — receives LLMContextFrame, calls your process_context/2, pushes LLMTextFrame
  • Feline.Services.STT — receives InputAudioRawFrame, calls your run_stt/2, pushes TranscriptionFrame
  • Feline.Services.TTS — receives TextFrame, calls your run_tts/2, pushes TTSAudioRawFrame
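
The callback-plus-macro pattern behind these services can be sketched in plain Elixir. This toy `Demo.TTSService` is a stand-in for Feline's actual service macros (whose internals are not shown here); the point is that the concrete service implements only one callback while `__using__` injects the shared frame plumbing.

```elixir
# Toy version of the service pattern: a behaviour defines one callback,
# and __using__ injects the shared frame handling around it.
defmodule Demo.TTSService do
  @callback run_tts(text :: String.t(), state :: term()) :: {:ok, binary()}

  defmacro __using__(_opts) do
    quote do
      @behaviour Demo.TTSService

      # Shared plumbing: unwrap the text frame, call the one callback,
      # and wrap the result in an audio frame to push downstream.
      def handle_frame({:text_frame, text}, state) do
        {:ok, audio} = run_tts(text, state)
        {:push, {:tts_audio_frame, audio}, state}
      end
    end
  end
end

# A concrete service only implements run_tts/2.
defmodule Demo.FakeTTS do
  use Demo.TTSService

  @impl true
  def run_tts(text, _state), do: {:ok, "pcm16:" <> text}
end
```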

How It Works

[Source] → [Processor 1] → [Processor 2] → ... → [Sink]
 ← upstream                            downstream →
  1. Pipeline.Task starts all processors as GenServer processes under a DynamicSupervisor
  2. Processors are linked in order — each knows its next and prev PID
  3. Frames flow via message passing: send(next_pid, {:frame, frame, :downstream})
  4. System frames use a separate tag {:system_frame, ...} and are drained from the mailbox before each regular frame via selective receive
  5. Interruptions clear buffered frames; cancellation drops all non-system frames
  6. The sink forwards frames back to the Pipeline.Task, which manages lifecycle (EndFrame = done)
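
The frame-priority mechanism in step 4 can be demonstrated with plain Elixir (no Feline code involved; Feline's real messages also carry a direction, omitted here for brevity). A zero-timeout selective receive pulls any system-tagged message out of the mailbox before the next regular frame is taken.

```elixir
# Selective receive: take a system frame if one is anywhere in the mailbox,
# otherwise block for the next frame of either kind.
defmodule Demo.Priority do
  def next_frame do
    receive do
      {:system_frame, frame} -> {:system, frame}
    after
      0 ->
        receive do
          {:system_frame, frame} -> {:system, frame}
          {:frame, frame} -> {:data, frame}
        end
    end
  end
end

# The system frame is handled first even though it arrived second.
send(self(), {:frame, :audio_chunk})
send(self(), {:system_frame, :cancel})
Demo.Priority.next_frame()  # => {:system, :cancel}
Demo.Priority.next_frame()  # => {:data, :audio_chunk}
```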

Key Differences from Python Pipecat

Python pipecat                 Feline
-----------------------------  -----------------------------------------
asyncio.PriorityQueue          Selective receive on message tags
isinstance() dispatch          Pattern matching on structs
prev/next object pointers      PIDs in GenServer state
asyncio.Task management        OTP DynamicSupervisor
Single-threaded concurrency    True parallel BEAM processes
try/except error handling      ErrorFrame upstream + supervisor restart

Built-in Services

Service                   Module                          Streaming
------------------------  ------------------------------  ---------------------------------------
OpenAI Chat Completions   Feline.Services.OpenAI.LLM      Feline.Services.OpenAI.StreamingLLM
Deepgram STT              Feline.Services.Deepgram.STT    Feline.Services.Deepgram.StreamingSTT
ElevenLabs TTS            Feline.Services.ElevenLabs.TTS  Feline.Services.ElevenLabs.StreamingTTS

Additional Features

  • Parallel pipelines — Feline.Pipeline.Parallel fans out frames to concurrent processor branches
  • Voice Activity Detection — energy-based VAD processor with configurable thresholds
  • Function call handling — Feline.Processors.FunctionCallHandler for LLM tool calls
  • Context aggregation — UserContextAggregator and AssistantContextAggregator for managing LLM conversation state
  • Telemetry — :telemetry hooks and observer callbacks for frame processing metrics

Live Voice Demo

Talk to an AI agent through your microphone. Requires API keys and sox (brew install sox).

  1. Add keys to .env in the project root:

     OPENAI_API_KEY=sk-...
     DEEPGRAM_API_KEY=...
     ELEVENLABS_API_KEY=...
     ELEVENLABS_VOICE_ID=...

  2. Run:

     mix feline.talk

Speak into your mic and the agent responds in real-time — both text and audio. You can also type messages directly in the console.

Customize the system prompt:

mix feline.talk --system "You are a pirate. Respond in pirate speak."

The demo pipeline (source):

Mic (ffmpeg) → VAD → Deepgram STT → Context Aggregation → OpenAI LLM → Sentence Aggregation → ElevenLabs TTS → Speaker (sox)

Features working in the demo:

  • Streaming speech-to-text and text-to-speech
  • Streaming LLM token output (printed to console as it arrives)
  • Echo suppression (mic is muted while bot speaks)

Installation

Add feline to your dependencies in mix.exs:

def deps do
  [
    {:feline, "~> 0.1"}
  ]
end