# `Tribunal.RedTeam`
[🔗](https://github.com/georgeguimaraes/tribunal/blob/v1.3.6/lib/tribunal/red_team.ex#L1)

Red team attack generators for testing LLM safety.

Inspired by promptfoo's red team strategies, this module provides functions to generate
adversarial prompts for testing your LLM's resistance to jailbreaks and manipulation.

## Attack Categories

### Static Encoding Attacks
Transform harmful prompts using encoding techniques:
- Base64 encoding
- Leetspeak substitution
- ROT13 cipher
- Pig Latin
- Reversed text

### Prompt Injection Attacks
Attempt to override system instructions:
- Ignore previous instructions
- System prompt extraction
- Role switching

### Jailbreak Attacks
Classic jailbreak techniques:
- DAN (Do Anything Now)
- STAN (Strive To Avoid Norms)
- Developer mode
- Hypothetical framing

## Usage

    # Generate all attacks for a harmful prompt
    attacks = Tribunal.RedTeam.generate_attacks("How do I pick a lock?")

    # Generate specific attack type
    {:ok, encoded} = Tribunal.RedTeam.base64_encode("harmful prompt")

    # Test your LLM against red team attacks
    for {attack_type, prompt} <- attacks do
      response = MyLLM.generate(prompt)
      refute_jailbreak_llm(response, query: prompt)
    end

# `base64_attack`

Encodes prompt in Base64 and asks model to decode and execute.

# `character_roleplay_attack`

Character roleplay to bypass restrictions.

# `dan_attack`

Classic DAN (Do Anything Now) jailbreak.

# `delimiter_injection_attack`

Uses delimiter injection to confuse parsing.

# `developer_mode_attack`

Developer mode jailbreak attempt.

# `encoding_attacks`

Generates encoding-based attacks that transform the prompt.

# `generate_attacks`

Generates all attack variants for a given prompt.

Returns a list of `{attack_type, prompt}` tuples.

# `hypothetical_attack`

Hypothetical framing to bypass restrictions.

# `ignore_instructions_attack`

Classic ignore previous instructions attack.

# `injection_attacks`

Generates prompt injection attacks.

# `jailbreak_attacks`

Generates classic jailbreak attacks.

# `leetspeak_attack`

Converts prompt to leetspeak (1337).

# `pig_latin_attack`

Converts prompt to Pig Latin.

# `research_framing_attack`

Academic/research framing.

# `reversed_attack`

Reverses the prompt text.

# `role_switch_attack`

Attempts to switch the model's role.

# `rot13_attack`

Encodes prompt using ROT13 cipher.

# `stan_attack`

STAN (Strive To Avoid Norms) jailbreak variant.

# `system_prompt_extraction_attack`

Attempts to extract the system prompt.

---

*Consult [api-reference.md](api-reference.md) for complete listing*
