Sensitive Data Filtering Guide
A comprehensive guide to protecting sensitive information in ReqCassette recordings.
Table of Contents
- Why Filter Sensitive Data?
- Quick Start
- Filtering Methods
- LLM API Protection
- Common Patterns
- Complete Examples
- Verification
- Best Practices
- Troubleshooting
Why Filter Sensitive Data?
Cassettes record real HTTP interactions, which often contain sensitive information:
- API Keys - In query strings, headers, or request bodies
- Authentication Tokens - Bearer tokens, session tokens, OAuth tokens
- Credentials - Passwords, secret keys, certificates
- Personal Data - Emails, names, addresses, phone numbers
- Internal Information - Infrastructure details, internal URLs
The Risk: If cassettes are committed to version control without filtering, sensitive data becomes permanently embedded in your repository's history, potentially exposing it to:
- Public repositories on GitHub/GitLab
- Unauthorized team members
- Security breaches through compromised accounts
- Automated scanners that harvest leaked secrets from public repositories
The Solution: ReqCassette provides comprehensive filtering to remove or redact sensitive data before cassettes are written to disk.
Quick Start
For LLM APIs (Anthropic, OpenAI, etc.), always filter authorization headers:
with_cassette "my_llm_test",
[filter_request_headers: ["authorization", "x-api-key", "cookie"]],
fn plug ->
ReqLLM.generate_text(
"anthropic:claude-sonnet-4-20250514",
"Hello!",
req_http_options: [plug: plug]
)
end
This prevents API keys from being saved, whether they are sent as Authorization: Bearer ... or in an x-api-key header (which is how Anthropic's sk-ant-... keys are sent).
Filtering Methods
ReqCassette supports four complementary filtering approaches, applied in this order:
- Regex filters - Pattern-based replacement
- Header filters - Remove specific headers
- Request callback (filter_request) - Request-only custom filtering
- Response callback (filter_response) - Response-only custom filtering (always safe!)
1. Header Filtering
Remove sensitive headers entirely from requests and responses.
When to use: When headers contain secrets you never want to save (API keys, session cookies, auth tokens).
with_cassette "api_test",
[
filter_request_headers: ["authorization", "x-api-key", "cookie"],
filter_response_headers: ["set-cookie", "x-session-token"]
],
fn plug ->
Req.get!(
"https://api.example.com/data",
headers: [{"authorization", "Bearer secret-token"}],
plug: plug
)
end
Features:
- Case-insensitive matching (Authorization matches authorization)
- Completely removes headers from the cassette
- Separate lists for request and response headers
- Headers are never written to disk
Result: Cassette will not contain the specified headers at all.
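Conceptually, header filtering is a case-insensitive delete. The sketch below models it on a cassette-style headers map (name => list of values). HeaderFilterSketch and drop/2 are hypothetical names. This illustrates the behavior described above and is not ReqCassette's source. Map.reject/2 requires Elixir 1.13 or later.

```elixir
defmodule HeaderFilterSketch do
  # Hypothetical helper: remove every header whose name matches one of
  # `names`, ignoring case (so "Authorization" matches "authorization").
  def drop(headers, names) when is_map(headers) do
    denied = MapSet.new(names, &String.downcase/1)

    Map.reject(headers, fn {name, _values} ->
      MapSet.member?(denied, String.downcase(name))
    end)
  end
end

headers = %{
  "Authorization" => ["Bearer secret-token"],
  "X-Api-Key" => ["secret-key"],
  "accept" => ["application/json"]
}

filtered = HeaderFilterSketch.drop(headers, ["authorization", "x-api-key", "cookie"])
# filtered == %{"accept" => ["application/json"]}
```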
2. Regex Pattern Filtering
Replace matching patterns with redacted values using regular expressions.
When to use: For secrets embedded in URLs, query strings, or request/response bodies.
with_cassette "api_test",
[
filter_sensitive_data: [
{~r/api_key=[\w-]+/, "api_key=<REDACTED>"},
{~r/"token":"[^"]+"/, ~s("token":"<REDACTED>")},
{~r/Bearer [\w.-]+/, "Bearer <REDACTED>"}
]
],
fn plug ->
Req.get!("https://api.example.com/data?api_key=secret123", plug: plug)
end
Features:
- Applied to URIs, query strings, and all body types (text, JSON, blob)
- Multiple patterns processed in order
- Works with JSON bodies (patterns are applied to the serialized JSON and recursively to nested values)
- Supports binary/blob bodies (base64-decoded, filtered, re-encoded)
Result: The cassette stores the URL as https://api.example.com/data?api_key=<REDACTED>
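A rough model of the regex step works like this. Regex.replace/3 is folded over the pattern list, so each pattern runs on the output of the previous one. Blob bodies are decoded, filtered, and re-encoded as described above. RegexFilterSketch and its functions are hypothetical, and ReqCassette's internals may differ.

```elixir
defmodule RegexFilterSketch do
  # Apply each {regex, replacement} pair in order; later patterns see the
  # output of earlier ones.
  def redact(text, patterns) when is_binary(text) do
    Enum.reduce(patterns, text, fn {regex, replacement}, acc ->
      Regex.replace(regex, acc, replacement)
    end)
  end

  # Binary bodies: decode from base64, filter the raw bytes, re-encode.
  def redact_blob(base64, patterns) do
    base64
    |> Base.decode64!()
    |> redact(patterns)
    |> Base.encode64()
  end
end

patterns = [
  {~r/api_key=[\w-]+/, "api_key=<REDACTED>"},
  {~r/"token":"[^"]+"/, ~s("token":"<REDACTED>")}
]

url = RegexFilterSketch.redact("https://api.example.com/data?api_key=secret123", patterns)
# url == "https://api.example.com/data?api_key=<REDACTED>"

body = RegexFilterSketch.redact(~s({"token":"abc.def","ok":true}), patterns)
# body == ~s({"token":"<REDACTED>","ok":true})

blob = RegexFilterSketch.redact_blob(Base.encode64("api_key=xyz"), patterns)
# Base.decode64!(blob) == "api_key=<REDACTED>"
```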
3. Request Callback Filtering
Custom filtering for requests with complex logic.
When to use: For complex request transformations, normalization, or conditional filtering based on request data.
with_cassette "api_test",
[
filter_request: fn request ->
request
|> update_in(["body_json", "email"], fn _ -> "user@example.com" end)
|> update_in(["body_json", "timestamp"], fn _ -> "<NORMALIZED>" end)
end
],
fn plug ->
Req.post!(
"https://api.example.com/events",
json: %{event: "login", timestamp: DateTime.utc_now(), email: "alice@real.com"},
plug: plug
)
end
Features:
- Applied during BOTH recording and matching (like regex/header filters)
- Receives only the request portion of the interaction
- Safe for complex request transformations
- Does not break replay as long as the transformation is deterministic and idempotent (see the note below)
⚠️ Important: If filter_request modifies fields used for matching (method,
uri, query, headers, body), ensure transformations are idempotent or adjust
match_requests_on to exclude those fields.
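filter_request runs at both record and match time. One way to guard against the pitfall above is to assert that applying the callback twice gives the same result as applying it once.

The sketch also pattern-matches on the request shape instead of calling update_in/3. update_in/3 raises an ArgumentError when an intermediate value such as "body_json" is missing or nil, and it silently inserts the final key when that key is absent.

RequestFilterSketch, normalize/1, and idempotent?/2 are hypothetical names. The "timestamp" field is only an example.

```elixir
defmodule RequestFilterSketch do
  # Hypothetical filter_request callback: normalize "timestamp" only when the
  # request has a JSON body containing that key; anything else passes through.
  def normalize(%{"body_json" => %{"timestamp" => _} = json} = request) do
    %{request | "body_json" => %{json | "timestamp" => "<NORMALIZED>"}}
  end

  def normalize(request), do: request

  # A filter is idempotent if filtering an already-filtered request is a no-op.
  def idempotent?(filter, request) do
    once = filter.(request)
    filter.(once) == once
  end
end

post = %{
  "method" => "POST",
  "body_type" => "json",
  "body_json" => %{"event" => "login", "timestamp" => "2025-01-01T00:00:00Z"}
}

get = %{"method" => "GET", "body_type" => "text", "body" => ""}

normalized = RequestFilterSketch.normalize(post)
# normalized["body_json"] == %{"event" => "login", "timestamp" => "<NORMALIZED>"}

get_untouched = RequestFilterSketch.normalize(get) == get
# => true

# Not idempotent: every application appends another suffix.
append_marker = fn req -> Map.update!(req, "uri", &(&1 <> "#filtered")) end

normalize_ok = RequestFilterSketch.idempotent?(&RequestFilterSketch.normalize/1, post)
# => true
append_ok = RequestFilterSketch.idempotent?(append_marker, %{"uri" => "https://api.example.com"})
# => false
```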
Request structure:
%{
"method" => "POST",
"uri" => "https://...",
"query_string" => "...",
"headers" => %{},
"body_type" => "json",
"body_json" => %{} # or "body" for text, "body_blob" for binary
}
4. Response Callback Filtering
Custom filtering for responses - always safe!
When to use: For complex response transformations, redaction, or conditional filtering based on response data.
with_cassette "api_test",
[
filter_response: fn response ->
response
|> update_in(["body_json", "password"], fn _ -> "<REDACTED>" end)
|> update_in(["body_json", "email"], fn _ -> "user@example.com" end)
|> put_in(["headers", "x-secret"], ["<REDACTED>"])
end
],
fn plug ->
Req.post!(
"https://api.example.com/users",
json: %{email: "alice@real.com", password: "secret"},
plug: plug
)
end
Features:
- Applied ONLY during recording
- Receives only the response portion of the interaction
- Always safe - responses don't affect matching
- Simplest callback type for response filtering
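update_in/3 only reaches the one path it is given. A sensitive key may appear at any depth of a JSON response, inside nested objects or lists. In that case, a recursive walk is one option. ResponseRedactSketch is a hypothetical helper written against the documented response structure; it is not part of ReqCassette.

```elixir
defmodule ResponseRedactSketch do
  # Replace the value of every key listed in `keys`, at any depth of maps and lists.
  def redact_keys(%{} = map, keys) do
    Map.new(map, fn {k, v} ->
      if k in keys, do: {k, "<REDACTED>"}, else: {k, redact_keys(v, keys)}
    end)
  end

  def redact_keys(list, keys) when is_list(list), do: Enum.map(list, &redact_keys(&1, keys))
  def redact_keys(other, _keys), do: other

  # Hypothetical filter_response callback built on redact_keys/2.
  def filter_response(%{"body_json" => json} = response) do
    %{response | "body_json" => redact_keys(json, ["password", "api_key"])}
  end

  def filter_response(response), do: response
end

response = %{
  "status" => 200,
  "headers" => %{},
  "body_type" => "json",
  "body_json" => %{
    "user" => %{"email" => "alice@real.com", "password" => "hunter2"},
    "keys" => [%{"api_key" => "sk-live-123", "label" => "prod"}]
  }
}

redacted = ResponseRedactSketch.filter_response(response)
# redacted["body_json"]["user"]["password"] == "<REDACTED>"
```

It could be wired in as filter_response: &ResponseRedactSketch.filter_response/1.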
Response structure:
%{
"status" => 200,
"headers" => %{},
"body_type" => "json",
"body_json" => %{} # or "body" for text, "body_blob" for binary
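}
LLM API Protection
LLM APIs authenticate with API keys sent in request headers: the Authorization header (Bearer ...) for OpenAI-style APIs and the x-api-key header for Anthropic. Always filter these headers when using ReqCassette with LLM services.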
Why It's Critical for LLMs
# ❌ WITHOUT FILTERING - API key saved to cassette!
with_cassette "llm_test", fn plug ->
ReqLLM.generate_text(
"anthropic:claude-sonnet-4-20250514",
"Hello",
req_http_options: [plug: plug]
)
end
# Cassette contains the key in plain text, e.g.:
# "headers": {
#   "x-api-key": ["sk-ant-api03-YOUR_SECRET_KEY_HERE"]
# }
# ✅ WITH FILTERING - API key protected!
with_cassette "llm_test",
[filter_request_headers: ["authorization", "x-api-key", "cookie"]],
fn plug ->
ReqLLM.generate_text(
"anthropic:claude-sonnet-4-20250514",
"Hello",
req_http_options: [plug: plug]
)
end
# Cassette contains neither the authorization nor the x-api-key header
Recommended Pattern for LLM Tests
defmodule MyApp.LLMTest do
  use ExUnit.Case, async: true
  import ReqCassette
  @cassette_dir "test/cassettes/llm"
  @cassette_opts [
    cassette_dir: @cassette_dir,
    mode: :record,
    filter_request_headers: ["authorization", "x-api-key", "cookie"]
  ]
  test "generates response" do
    with_cassette "llm_generation", @cassette_opts, fn plug ->
      {:ok, response} =
        ReqLLM.generate_text(
          "anthropic:claude-sonnet-4-20250514",
          "Explain Elixir",
          max_tokens: 100,
          req_http_options: [plug: plug]
        )
      # choices is a list; list[0] is not valid Access syntax in Elixir, so use hd/1
      assert hd(response.choices).message.content =~ "Elixir"
    end
  end
end
Agent/Multi-Turn Protection
For agents making multiple LLM calls:
{:ok, agent} = MyAgent.start_link(
cassette_opts: [
cassette_name: "my_agent",
cassette_dir: "test/cassettes",
mode: :record,
filter_request_headers: ["authorization", "x-api-key", "cookie"]
]
)
MyAgent.prompt(agent, "What is 15 * 7?")
Common Patterns
API Keys (Query Parameters)
filter_sensitive_data: [
{~r/api_key=[\w-]+/, "api_key=<REDACTED>"},
{~r/access_token=[\w-]+/, "access_token=<REDACTED>"}
]
API Keys (JSON Bodies)
filter_sensitive_data: [
{~r/"apiKey":"[^"]+"/, ~s("apiKey":"<REDACTED>")},
{~r/"api_key":"[^"]+"/, ~s("api_key":"<REDACTED>")}
]
Bearer Tokens
filter_sensitive_data: [
{~r/Bearer [\w.-]+/, "Bearer <REDACTED>"}
]
OAuth Tokens
filter_sensitive_data: [
{~r/"access_token":"[^"]+"/, ~s("access_token":"<REDACTED>")},
{~r/"refresh_token":"[^"]+"/, ~s("refresh_token":"<REDACTED>")}
]
Email Addresses
filter_sensitive_data: [
{~r/[\w.+-]+@[\w.-]+\.[a-zA-Z]{2,}/, "user@example.com"}
]
Credit Card Numbers
filter_sensitive_data: [
{~r/\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}/, "XXXX-XXXX-XXXX-XXXX"}
]
UUIDs (for deterministic cassettes)
filter_sensitive_data: [
{~r/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/i,
"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"}
]
Timestamps (for deterministic cassettes)
filter_sensitive_data: [
{~r/"timestamp":"[^"]+"/, ~s("timestamp":"<NORMALIZED>")},
{~r/"created_at":"[^"]+"/, ~s("created_at":"<NORMALIZED>")}
]
Complete Examples
Basic Authentication API
with_cassette "auth_api",
[
filter_request_headers: ["authorization"],
filter_response_headers: ["set-cookie"],
filter_sensitive_data: [
{~r/password=[\w-]+/, "password=<REDACTED>"}
]
],
fn plug ->
Req.post!(
"https://api.example.com/login?password=secret",
headers: [{"authorization", "Basic dXNlcjpwYXNz"}],
plug: plug
)
end
Payment API with Multiple Secrets
with_cassette "payment_api",
[
filter_request_headers: ["authorization", "x-api-key"],
filter_sensitive_data: [
# Credit cards
{~r/\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}/, "XXXX-XXXX-XXXX-XXXX"},
# CVV
{~r/"cvv":"\d{3,4}"/, ~s("cvv":"XXX")},
# SSN
{~r/\d{3}-\d{2}-\d{4}/, "XXX-XX-XXXX"}
],
# Redact customer email in request
filter_request: fn request ->
update_in(
request,
["body_json", "customer", "email"],
fn _ -> "customer@example.com" end
)
end
],
fn plug ->
Req.post!(
"https://api.stripe.com/v1/charges",
json: %{
amount: 1000,
card: %{number: "4242424242424242", cvv: "123"},
customer: %{email: "alice@real.com", ssn: "123-45-6789"}
},
headers: [{"authorization", "Bearer sk_test_secret"}],
plug: plug
)
end
LLM with Full Protection
with_cassette "llm_protected",
[
mode: :record,
filter_request_headers: ["authorization", "x-api-key", "cookie"],
filter_sensitive_data: [
# Normalize timestamps for deterministic cassettes
{~r/"timestamp":"[^"]+"/, ~s("timestamp":"<NORMALIZED>")},
# Normalize request IDs
{~r/"request_id":"[^"]+"/, ~s("request_id":"<NORMALIZED>")}
],
# Normalize request fields
filter_request: fn request ->
update_in(request, ["body_json", "created_at"], fn _ -> "<NORMALIZED>" end)
end,
# Normalize response fields
filter_response: fn response ->
response
|> put_in(["headers", "x-request-id"], ["<NORMALIZED>"])
|> put_in(["body_json", "id"], "<NORMALIZED>")
end
],
fn plug ->
{:ok, response} = ReqLLM.generate_text(
"anthropic:claude-sonnet-4-20250514",
"Hello!",
max_tokens: 100,
req_http_options: [plug: plug]
)
assert hd(response.choices).message.content =~ "Hello"
end
Verification
Verify Cassettes Are Properly Filtered
After recording, check cassettes for leaked secrets:
# Search for common secret patterns
grep -r "Bearer" test/cassettes/
grep -r "api_key=" test/cassettes/
grep -r "password" test/cassettes/
# Check specific cassette
grep -i "authorization" test/cassettes/my_test.json
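Regex filters fail open: a pattern that matches only part of a secret silently leaves the rest in the cassette. Running a pattern list over sample values in a unit test catches this before anything is recorded. PatternCheckSketch is hypothetical. The patterns are the Bearer pattern from Common Patterns, a lowercase-only UUID pattern, and broader variants, shown for illustration.

```elixir
defmodule PatternCheckSketch do
  # Hypothetical test helper: apply a filter_sensitive_data-style list of
  # {regex, replacement} pairs in order and return what would be stored.
  def redact(text, patterns) do
    Enum.reduce(patterns, text, fn {regex, replacement}, acc ->
      Regex.replace(regex, acc, replacement)
    end)
  end
end

sample = "Authorization: Bearer abc/def+ghi=="

# [\w.-]+ stops at "/", so the tail of the token survives.
narrow = PatternCheckSketch.redact(sample, [{~r/Bearer [\w.-]+/, "Bearer <REDACTED>"}])
# narrow == "Authorization: Bearer <REDACTED>/def+ghi=="

# Matching up to whitespace or a quote also covers base64-style characters.
broad = PatternCheckSketch.redact(sample, [{~r/Bearer [^\s"]+/, "Bearer <REDACTED>"}])
# broad == "Authorization: Bearer <REDACTED>"

# A lowercase-only UUID pattern misses uppercase UUIDs; the i modifier fixes it.
uuid = "550E8400-E29B-41D4-A716-446655440000"
lower_only_hit = Regex.match?(~r/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/, uuid)
# => false
case_insensitive_hit = Regex.match?(~r/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/i, uuid)
# => true
```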
Programmatic Verification
test "cassette does not contain secrets" do
cassette_path = "test/cassettes/my_test.json"
{:ok, content} = File.read(cassette_path)
# Verify secrets are redacted
refute String.contains?(content, "sk-ant-") # Anthropic API key
refute String.contains?(content, "Bearer sk_") # Stripe-style secret key sent as a bearer token
refute String.contains?(content, "my-secret-key")
# Verify redaction markers are present
assert String.contains?(content, "<REDACTED>")
end
Check Cassette Structure
test "cassette has properly filtered headers" do
cassette_path = "test/cassettes/my_test.json"
{:ok, data} = File.read(cassette_path)
{:ok, cassette} = Jason.decode(data)
interaction = hd(cassette["interactions"])
# Request headers should not include authorization
request_headers = interaction["request"]["headers"]
refute Map.has_key?(request_headers, "authorization")
refute Map.has_key?(request_headers, "x-api-key")
# Response headers should not include cookies
response_headers = interaction["response"]["headers"]
refute Map.has_key?(response_headers, "set-cookie")
end
Best Practices
1. Filter by Default
Create a module-level constant for common filter options:
defmodule MyApp.APITest do
use ExUnit.Case, async: true
import ReqCassette
@cassette_opts [
cassette_dir: "test/cassettes",
mode: :record,
filter_request_headers: ["authorization", "x-api-key", "cookie"],
filter_response_headers: ["set-cookie"]
]
test "API call" do
with_cassette "my_test", @cassette_opts, fn plug ->
# Your test code
end
end
end
2. Use Environment-Based Recording
setup do
mode = case System.get_env("CI") do
"true" -> :replay # CI always replays (no API keys needed)
_ -> if System.get_env("RECORD"), do: :record, else: :replay
end
cassette_opts = [
cassette_dir: "test/cassettes",
mode: mode,
filter_request_headers: ["authorization", "x-api-key", "cookie"]
]
{:ok, cassette_opts: cassette_opts}
end
3. Document Required Filtering
Add comments to remind future developers:
# IMPORTANT: Always filter authorization headers for LLM APIs
# to prevent API keys from being committed to version control
with_cassette "llm_test",
[filter_request_headers: ["authorization", "x-api-key", "cookie"]],
fn plug ->
# ...
end
4. Test Filtered Cassettes
Add tests to verify filtering is working:
describe "cassette security" do
test "cassettes do not contain API keys" do
cassettes_dir = "test/cassettes"
# Scan subdirectories too (e.g. test/cassettes/llm), but skip the
# gitignored unfiltered/ debugging directory
cassettes_dir
|> Path.join("**/*.json")
|> Path.wildcard()
|> Enum.reject(&String.contains?(&1, "/unfiltered/"))
|> Enum.each(fn path ->
content = File.read!(path)
# Add your secret patterns here
refute String.contains?(content, "sk-ant-"),
"#{path} contains Anthropic API key"
refute String.contains?(content, "sk_test_"),
"#{path} contains Stripe test API key"
end)
end
end
5. Audit Before Committing
Before committing cassettes to git:
# Quick audit script
for file in test/cassettes/*.json; do
echo "Checking $file..."
grep -E "(sk-|sk_|Bearer|password|token|api_key)" "$file" && echo "⚠️ Possible secrets, review the matches above" || echo "✅ Clean"
done
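The grep above also matches values that were already redacted (Bearer <REDACTED>, api_key=<REDACTED>) and harmless fields such as max_tokens, so each hit needs a manual look. A narrower alternative looks for the shape of real secrets and skips redaction markers.

CassetteAuditSketch is hypothetical. Its three patterns are assumptions to adapt to the providers you use:
- sk-ant- keys
- Stripe-style sk_live_/sk_test_ keys
- bearer tokens that do not start with <

```elixir
defmodule CassetteAuditSketch do
  # Illustrative secret shapes (assumptions); extend for your own providers.
  defp patterns do
    [
      anthropic_key: ~r/sk-ant-[\w-]+/,
      stripe_key: ~r/sk_(?:live|test)_\w+/,
      # "Bearer " followed by anything except a "<" redaction marker.
      bearer_token: ~r/Bearer (?!<)[^\s"]+/
    ]
  end

  # Names of the patterns that match `content`, in pattern order.
  def findings(content) do
    for {name, regex} <- patterns(), Regex.match?(regex, content), do: name
  end

  # Scan all .json files under `dir`, recursively, and return
  # {path, findings} for every file with at least one hit.
  def scan(dir) do
    dir
    |> Path.join("**/*.json")
    |> Path.wildcard()
    |> Enum.map(fn path -> {path, findings(File.read!(path))} end)
    |> Enum.reject(fn {_path, hits} -> hits == [] end)
  end
end

leaky = ~s({"headers":{"x-api-key":["sk-ant-api03-abc"],"authorization":["Bearer tok123"]}})
clean = ~s({"headers":{"authorization":["Bearer <REDACTED>"]},"max_tokens":100})

leaky_hits = CassetteAuditSketch.findings(leaky)
# => [:anthropic_key, :bearer_token]
clean_hits = CassetteAuditSketch.findings(clean)
# => []
```

In a test, assert CassetteAuditSketch.scan("test/cassettes") == [] would fail on any hit. The recursive glob also covers a gitignored unfiltered directory, if you keep one.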
6. Use .gitignore for Unfiltered Cassettes
If you want to record unfiltered cassettes locally for debugging but never commit them:
# .gitignore
test/cassettes/unfiltered/
Then use:
with_cassette "debug_test",
[cassette_dir: "test/cassettes/unfiltered"],
fn plug ->
# Test without filtering for debugging
end
Summary
Essential Filtering for LLM APIs:
filter_request_headers: ["authorization", "x-api-key", "cookie"]
Comprehensive Protection:
with_cassette "secure_test",
[
# Remove auth headers
filter_request_headers: ["authorization", "x-api-key", "cookie"],
filter_response_headers: ["set-cookie"],
# Redact patterns
filter_sensitive_data: [
{~r/api_key=[\w-]+/, "api_key=<REDACTED>"},
{~r/"token":"[^"]+"/, ~s("token":"<REDACTED>")}
],
# Request filtering (normalization)
filter_request: fn request ->
update_in(request, ["body_json", "timestamp"], fn _ -> "<NORMALIZED>" end)
end,
# Response filtering (redaction)
filter_response: fn response ->
update_in(response, ["body_json", "email"], fn _ -> "user@example.com" end)
end
],
fn plug ->
# Your code here
end
Remember:
- Filter API-key headers (authorization, x-api-key) for ALL LLM APIs
- Use filter_response for response-only filtering (always safe!)
- Use filter_request for request normalization (timestamps, IDs)
- Test cassettes for leaked secrets before committing
- Use environment variables to control recording modes
- Default to :record mode with filtering enabled
For more examples, see: