# `LangChain.Chains.DataExtractionChain`
[🔗](https://github.com/brainlid/langchain/blob/v0.8.11/lib/chains/data_extraction_chain.ex#L1)

Defines an LLMChain for performing data extraction from a body of text.

Provide a schema that describes the information to extract. The schema is
treated as describing zero or more instances of the data structure, so the
extracted information is returned as a list.

The result is always a list. If the LLM returns a single map instead of an
array, it is automatically wrapped in a list so callers can rely on a
consistent return type.

Originally based on:
- https://github.com/langchain-ai/langchainjs/blob/main/langchain/src/chains/openai_functions/extraction.ts#L43

## Example

    # JSONSchema definition of data we want to capture or extract.
    schema_parameters = %{
      type: "object",
      properties: %{
        person_name: %{type: "string"},
        person_age: %{type: "number"},
        person_hair_color: %{type: "string"},
        dog_name: %{type: "string"},
        dog_breed: %{type: "string"}
      },
      required: []
    }

    # Model setup
    alias LangChain.ChatModels.ChatOpenAI
    {:ok, chat} = ChatOpenAI.new(%{temperature: 0})

    # Run the chain on the text to extract data from
    data_prompt =
      "Alex is 5 feet tall. Claudia is 4 feet taller than Alex and jumps higher than him.
      Claudia is a brunette and Alex is blonde. Alex's dog Frosty is a labrador and likes to play hide and seek."

    {:ok, result} = LangChain.Chains.DataExtractionChain.run(chat, schema_parameters, data_prompt)

    # Example result
    [
      %{
        "dog_breed" => "labrador",
        "dog_name" => "Frosty",
        "person_age" => nil,
        "person_hair_color" => "blonde",
        "person_name" => "Alex"
      },
      %{
        "dog_breed" => nil,
        "dog_name" => nil,
        "person_age" => nil,
        "person_hair_color" => "brunette",
        "person_name" => "Claudia"
      }
    ]

If the LLM returns a single map (e.g. when only one entity is found), it is
wrapped in a list automatically:

    # Single-entity result normalised to a list
    [
      %{
        "person_name" => "Alex",
        "person_age" => nil,
        ...
      }
    ]
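
Because the result is always a list of maps with string keys, it can be consumed with ordinary pattern matching. The following is a minimal sketch that works on sample rows copied from the example result above; it does not call the LLM, and the `rows` and `dog_owners` names are only illustrative.

```elixir
# Sample rows shaped like the example result (string keys, nil for missing values).
rows = [
  %{
    "dog_breed" => "labrador",
    "dog_name" => "Frosty",
    "person_age" => nil,
    "person_hair_color" => "blonde",
    "person_name" => "Alex"
  },
  %{
    "dog_breed" => nil,
    "dog_name" => nil,
    "person_age" => nil,
    "person_hair_color" => "brunette",
    "person_name" => "Claudia"
  }
]

# Keep only the people for whom a dog name was extracted.
dog_owners =
  for %{"person_name" => name, "dog_name" => dog} when not is_nil(dog) <- rows do
    {name, dog}
  end

IO.inspect(dog_owners)
# prints [{"Alex", "Frosty"}]
```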

The `schema_parameters` in the previous example can also be expressed using a
list of `LangChain.FunctionParam` structs. An equivalent version looks like
this:

    alias LangChain.FunctionParam

    schema_parameters = [
      FunctionParam.new!(%{name: "person_name", type: :string}),
      FunctionParam.new!(%{name: "person_age", type: :number}),
      FunctionParam.new!(%{name: "person_hair_color", type: :string}),
      FunctionParam.new!(%{name: "dog_name", type: :string}),
      FunctionParam.new!(%{name: "dog_breed", type: :string})
    ]
    |> FunctionParam.to_parameters_schema()
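
To make the equivalence between the two forms more concrete, here is a rough sketch of the kind of conversion involved. `SchemaSketch.to_schema/1` is a hypothetical helper written for this illustration and is not part of LangChain; the library's `FunctionParam.to_parameters_schema/1` may differ in details such as key types and how required fields are handled.

```elixir
defmodule SchemaSketch do
  # Hypothetical helper, not part of LangChain: turns {name, type} pairs
  # into a JSONSchema-style object map like the one in the first example.
  def to_schema(params) do
    properties =
      Map.new(params, fn {name, type} ->
        {name, %{type: Atom.to_string(type)}}
      end)

    %{type: "object", properties: properties, required: []}
  end
end

schema =
  SchemaSketch.to_schema(
    person_name: :string,
    person_age: :number,
    person_hair_color: :string,
    dog_name: :string,
    dog_breed: :string
  )

IO.inspect(schema)
```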

# `build_extract_function`
[🔗](https://github.com/brainlid/langchain/blob/v0.8.11/lib/chains/data_extraction_chain.ex#L178)

```elixir
@spec build_extract_function(json_schema :: map()) ::
  LangChain.Function.t() | no_return()
```

Build the function to expose to the LLM that can be called for data
extraction.
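
As an illustration of the idea, the caller's schema can be wrapped in an array-valued `info` argument so the model can report zero or more instances. The sketch below uses plain maps only; `ExtractFunctionSketch` is a hypothetical module, and the `name` and `description` values are assumptions rather than values confirmed by this page. The real function returns a `LangChain.Function` struct.

```elixir
defmodule ExtractFunctionSketch do
  # Hypothetical illustration using plain maps; the name and description
  # strings are assumptions, not taken from the library.
  def definition(json_schema) when is_map(json_schema) do
    %{
      name: "information_extraction",
      description: "Extracts the relevant information from the passage.",
      parameters_schema: %{
        type: "object",
        # The caller's schema describes one item; `info` holds a list of them.
        properties: %{info: %{type: "array", items: json_schema}},
        required: ["info"]
      }
    }
  end
end

person_schema = %{
  type: "object",
  properties: %{person_name: %{type: "string"}},
  required: []
}

definition = ExtractFunctionSketch.definition(person_schema)
IO.inspect(definition.parameters_schema)
```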

# `normalize_extraction_info`
[🔗](https://github.com/brainlid/langchain/blob/v0.8.11/lib/chains/data_extraction_chain.ex#L109)

```elixir
@spec normalize_extraction_info(term()) ::
  {:ok, [any()]} | {:error, LangChain.LangChainError.t()}
```

Coerces the extraction tool's `info` argument to a list of rows.

Models sometimes return a single JSON object instead of a one-element array.
`run/4` uses this function so that a successful call always returns
`{:ok, list}`.
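
The coercion described here can be sketched in a few clauses. `NormalizeSketch.normalize/1` is a hypothetical stand-in, not the library's implementation, and it returns a plain string in the error tuple where the real function returns a `LangChain.LangChainError`.

```elixir
defmodule NormalizeSketch do
  # Hypothetical stand-in for the library function.

  # A list is already in the expected shape.
  def normalize(info) when is_list(info), do: {:ok, info}

  # A single map is wrapped so callers always receive a list.
  def normalize(info) when is_map(info), do: {:ok, [info]}

  # Anything else is reported as an error (a plain string in this sketch).
  def normalize(other), do: {:error, "unexpected extraction info: #{inspect(other)}"}
end

IO.inspect(NormalizeSketch.normalize(%{"person_name" => "Alex"}))
# prints {:ok, [%{"person_name" => "Alex"}]}
```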

# `run`
[🔗](https://github.com/brainlid/langchain/blob/v0.8.11/lib/chains/data_extraction_chain.ex#L123)

```elixir
@spec run(
  LangChain.ChatModels.ChatOpenAI.t(),
  json_schema :: map(),
  prompt :: [any()],
  opts :: Keyword.t()
) :: {:ok, result :: [any()]} | {:error, LangChain.LangChainError.t()}
```

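Run the data extraction chain against `prompt` using the given chat model and
`json_schema`. Returns `{:ok, result}`, where `result` is a list of extracted
entries, or `{:error, error}` on failure.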
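
Callers typically need to handle both return shapes from the spec. The sketch below shows one way to do that with pattern matching; `RunResultSketch` is a hypothetical module, and it assumes the error value exposes a `:message` field, which this page does not confirm. It is exercised with stand-in values rather than a live call to `run/4`.

```elixir
defmodule RunResultSketch do
  # Hypothetical helper that summarizes either return shape.
  def summarize({:ok, rows}) when is_list(rows) do
    "extracted #{length(rows)} row(s)"
  end

  # Assumes the error is a map or struct with a :message field.
  def summarize({:error, %{message: message}}) do
    "extraction failed: #{message}"
  end
end

# Stand-in values in place of a real LangChain.Chains.DataExtractionChain.run/4 call.
IO.puts(RunResultSketch.summarize({:ok, [%{"person_name" => "Alex"}]}))
IO.puts(RunResultSketch.summarize({:error, %{message: "request timed out"}}))
```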

