Grammar Customization Guide
This document explains how to customize and extend Nasty's grammar rules by creating external grammar resource files.
Overview
Starting with version 0.2.0, Nasty externalizes grammar rules from hardcoded Elixir modules into configurable .exs resource files. This allows you to:
- Customize existing grammar rules without modifying source code
- Create domain-specific grammar variants (e.g., legal, medical, technical)
- Add support for new languages
- A/B test different parsing strategies
- Share grammar rule sets across projects
Architecture
Grammar rules are stored as Elixir term files (.exs) in:
priv/languages/{language_code}/grammars/{rule_type}.exs
For variants (e.g., formal, informal, technical):
priv/languages/{language_code}/variants/{variant_name}/{rule_type}.exs
Language Codes
- English: en or english
- Spanish: es or spanish
- Catalan: ca or catalan (future)
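The path conventions above can be sketched as a small helper. This is an illustration only: the module name `GrammarPathSketch`, its `path/3` function, and the mapping of long language names onto short directory codes are assumptions, not part of Nasty's API.

```elixir
defmodule GrammarPathSketch do
  @moduledoc "Illustrative only; not part of Nasty."

  # Assumption: long language names resolve to the short directory codes.
  @aliases %{english: :en, spanish: :es, catalan: :ca}

  def path(lang, rule_type, variant \\ :default)

  # Default grammar: priv/languages/{language_code}/grammars/{rule_type}.exs
  def path(lang, rule_type, :default) do
    Path.join(["priv", "languages", code(lang), "grammars", "#{rule_type}.exs"])
  end

  # Variant: priv/languages/{language_code}/variants/{variant_name}/{rule_type}.exs
  def path(lang, rule_type, variant) do
    Path.join([
      "priv",
      "languages",
      code(lang),
      "variants",
      to_string(variant),
      "#{rule_type}.exs"
    ])
  end

  # Unknown names pass through unchanged.
  defp code(lang), do: @aliases |> Map.get(lang, lang) |> to_string()
end

IO.puts(GrammarPathSketch.path(:en, :phrase_rules))
IO.puts(GrammarPathSketch.path(:spanish, :phrase_rules, "formal"))
```

Printing the resolved path this way can help confirm that a new file sits where the loader is expected to look.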
Rule Types
Each language can have the following grammar rule files:
- phrase_rules.exs - Phrase structure patterns (NP, VP, PP, AdjP, AdvP)
- dependency_rules.exs - Universal Dependencies relations and extraction rules
- coordination_rules.exs - Coordinating conjunctions and coordination patterns
- subordination_rules.exs - Subordinating conjunctions and subordinate clause patterns
Grammar Loader API
Loading Grammar Rules
alias Nasty.Language.GrammarLoader
# Load default grammar rules
{:ok, rules} = GrammarLoader.load(:en, :phrase_rules)
# Load with variant
{:ok, rules} = GrammarLoader.load(:en, :phrase_rules, variant: "formal")
# Force reload (bypass cache)
{:ok, rules} = GrammarLoader.load(:en, :phrase_rules, force_reload: true)
Cache Management
# Clear all cached grammar
GrammarLoader.clear_cache()
# Clear specific cached rules
GrammarLoader.clear_cache(:en, :phrase_rules, :default)
Direct File Loading
# Load from custom path
{:ok, rules} = GrammarLoader.load_file("/path/to/custom_rules.exs")
Creating Grammar Files
File Structure
Grammar files are Elixir term files that evaluate to a map:
%{
# Top-level keys define rule categories
rule_category_1: [...],
rule_category_2: %{...},
# Metadata
notes: %{
key: "description"
}
}
Example: Simple Phrase Rules
Create priv/languages/en/grammars/custom_phrase_rules.exs:
%{
# Noun phrase patterns
noun_phrases: [
# Simple NP: Det + Noun
{:np, [:det, :noun]},
# NP with adjective: Det + Adj + Noun
{:np, [:det, :adj, :noun]},
# NP with PP: Det + Noun + PP
{:np, [:det, :noun, :pp]}
],
# Verb phrase patterns
verb_phrases: [
# Simple VP: just Verb
{:vp, [:verb]},
# VP with object: Verb + NP
{:vp, [:verb, :np]},
# VP with auxiliary: Aux + Verb
{:vp, [:aux, :verb]}
],
notes: %{
version: "1.0.0",
author: "Your Name",
description: "Custom phrase rules for domain-specific parsing"
}
}
English Grammar Reference
Phrase Rules (phrase_rules.exs)
See priv/languages/en/grammars/phrase_rules.exs for the complete reference.
Key sections:
%{
noun_phrases: [
# List of NP patterns
{:np, [:det, :noun]},
{:np, [:det, :adj, :noun]},
# ...
],
verb_phrases: [
# List of VP patterns
{:vp, [:verb]},
{:vp, [:aux, :verb, :np]},
# ...
],
prepositional_phrases: [
# PP patterns
{:pp, [:prep, :np]},
# ...
],
adjectival_phrases: [
# AdjP patterns
{:adjp, [:adv, :adj]},
# ...
],
adverbial_phrases: [
# AdvP patterns
{:advp, [:adv]},
# ...
],
relative_clauses: [
# Relative clause patterns
{:relative_clause, [:relative_marker, :clause]},
# ...
],
special_rules: [
# Special handling rules
{:comparative_than, :pseudo_prep},
# ...
]
}
Dependency Rules (dependency_rules.exs)
See priv/languages/en/grammars/dependency_rules.exs for the complete reference.
Key sections:
%{
core_arguments: [
# Subject, object, complements
%{
relation: :nsubj,
description: "Nominal subject",
head_pos: [:verb],
dependent_pos: [:noun, :propn, :pron],
example: "The cat sleeps → nsubj(sleeps, cat)"
},
# ...
],
nominal_dependents: [
# Determiners, modifiers
%{relation: :det, ...},
%{relation: :amod, ...},
# ...
],
function_words: [
# Auxiliaries, copulas, markers
%{relation: :aux, ...},
# ...
],
extraction_priorities: [
# Order of dependency extraction
:nsubj, :obj, :det, :amod, # ...
]
}
Coordination Rules (coordination_rules.exs)
Key sections:
%{
coordinating_conjunctions: [
%{
conjunction: "and",
type: :copulative,
example: "cats and dogs"
},
# ...
],
coordination_patterns: [
%{
pattern: :np_coordination,
structure: "NP CCONJ NP",
example: "cats and dogs"
},
# ...
],
special_cases: [
# Correlative conjunctions, etc.
%{
type: :correlative,
patterns: [
%{pair: ["both", "and"], example: "both cats and dogs"},
# ...
]
}
]
}
Subordination Rules (subordination_rules.exs)
Key sections:
%{
subordinating_conjunctions: [
%{
conjunction: "because",
type: :causal,
example: "I stayed because it rained"
},
# ...
],
relative_markers: [
%{
marker: "who",
type: :relative_pronoun,
example: "the person who came"
},
# ...
],
subordinate_clause_types: [
%{
type: :adverbial,
dependency_relation: :advcl,
subtypes: [:temporal, :causal, :conditional, ...]
},
# ...
]
}
Spanish Grammar Reference
Spanish grammar files follow the same structure but include Spanish-specific features:
- Post-nominal adjectives: la casa roja (the red house)
- Pro-drop: null subjects allowed
- Flexible word order: SVO, VSO, VOS
- Clitic pronouns: dámelo (give-me-it)
- Personal 'a': Veo a Juan (I see Juan)
- Two copulas: ser vs. estar
- Phonetic variants: y → e before an /i/ sound, o → u before an /o/ sound
See files in priv/languages/es/grammars/ for complete Spanish grammar.
Creating Domain-Specific Variants
Example: Technical English
Create priv/languages/en/variants/technical/phrase_rules.exs:
%{
# Inherit base rules and add technical-specific patterns
noun_phrases: [
# Standard patterns
{:np, [:det, :noun]},
# Technical compound nouns (e.g., "TCP/IP protocol")
{:np, [:propn, :noun]},
{:np, [:propn, :sym, :propn, :noun]},
# Noun phrases with technical modifiers
{:np, [:num, {:unit, [:noun]}, :noun]}, # "5 GB memory"
# Multi-word technical terms
{:np, [{:many, :noun}]} # "machine learning model"
],
verb_phrases: [
# Standard patterns
{:vp, [:verb, :np]},
# Technical action verbs (instantiate, serialize, etc.)
{:vp, [:tech_verb, :np, :pp]},
# Passive constructions common in technical writing
{:vp, [:aux, :verb, :pp]}
],
notes: %{
domain: "technical",
use_case: "Software documentation, API specs, technical papers"
}
}
Example: Legal English
%{
noun_phrases: [
# Legal entities
{:np, [:det, :legal_entity]}, # "the plaintiff", "the defendant"
# Complex legal terms
{:np, [:det, :adj, :legal_term, :pp]}, # "the aforementioned contractual obligation"
# References (Section X, Article Y)
{:np, [:legal_ref_type, :num]} # "Section 5"
],
subordination_patterns: [
# Legal conditionals (provided that, in the event that)
{:conditional, :multiword_legal_conj}
],
notes: %{
domain: "legal",
use_case: "Contracts, legislation, court documents"
}
}
Using Custom Grammar in Code
Option 1: Load and Use Directly
# Load custom grammar
{:ok, custom_phrase_rules} = GrammarLoader.load(:en, :custom_phrase_rules)
# Use in your parser
custom_np_patterns = custom_phrase_rules.noun_phrases
# Process with custom patterns...
Option 2: Extend Parser Module
defmodule MyApp.CustomParser do
alias Nasty.Language.GrammarLoader
def parse_technical_text(text) do
# Load technical variant
{:ok, rules} = GrammarLoader.load(:en, :phrase_rules, variant: "technical")
# Parse using custom rules
# ... your parsing logic using rules ...
end
end
Option 3: Runtime Configuration
# In config/config.exs
config :nasty,
default_grammar_variant: "technical"
# In your code
variant = Application.get_env(:nasty, :default_grammar_variant, :default)
{:ok, rules} = GrammarLoader.load(:en, :phrase_rules, variant: variant)
Grammar Validation
The grammar loader validates that all files return a map:
# Valid
%{
rules: [...],
notes: %{}
}
# Invalid - will raise error
[1, 2, 3] # Not a map
For more complex validation, extend GrammarLoader.validate_rules/1.
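As one possible shape for stricter checks, the sketch below validates phrase patterns in a standalone module. It does not call or replace GrammarLoader.validate_rules/1, whose signature and return value are not shown in this guide. The module name, the `validate/2` function, and the rule that a pattern must be a `{label, [element, ...]}` tuple with a non-empty list are all assumptions made for illustration.

```elixir
defmodule GrammarValidationSketch do
  @moduledoc "Illustrative only; not part of Nasty."

  # Categories whose entries are assumed to be {label, elements} tuples.
  @phrase_categories [:noun_phrases, :verb_phrases]

  # Returns :ok, or {:error, problems} listing each offending entry.
  def validate(rules, categories \\ @phrase_categories)

  def validate(rules, categories) when is_map(rules) do
    problems =
      for category <- categories,
          pattern <- List.wrap(Map.get(rules, category, [])),
          not valid_pattern?(pattern) do
        {category, pattern}
      end

    if problems == [], do: :ok, else: {:error, problems}
  end

  # Anything that is not a map is rejected outright.
  def validate(other, _categories), do: {:error, [{:not_a_map, other}]}

  # Assumed pattern shape: an atom label and a non-empty list of elements.
  defp valid_pattern?({label, [_ | _]}) when is_atom(label), do: true
  defp valid_pattern?(_other), do: false
end

IO.inspect(GrammarValidationSketch.validate(%{noun_phrases: [{:np, [:det, :noun]}]}))
IO.inspect(GrammarValidationSketch.validate(%{noun_phrases: [{:np, []}]}))
```

Returning a list of problems rather than raising on the first one makes it easier to fix a whole grammar file in a single pass.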
Best Practices
1. Start with Base Grammar
Copy existing grammar files and modify rather than starting from scratch:
mkdir -p priv/languages/en/variants/custom
cp priv/languages/en/grammars/phrase_rules.exs \
  priv/languages/en/variants/custom/phrase_rules.exs
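Once a copied file starts to diverge, it can be useful to see exactly what the variant adds. The following is a minimal sketch that compares two already-loaded rule maps. `GrammarDiffSketch` and `added/2` are hypothetical names, and the comparison only looks at list-valued categories.

```elixir
defmodule GrammarDiffSketch do
  @moduledoc "Illustrative only; not part of Nasty."

  # For each list-valued category in the variant, keep the entries
  # that do not appear in the same category of the base grammar.
  def added(base, variant) when is_map(base) and is_map(variant) do
    for {category, patterns} <- variant, is_list(patterns), into: %{} do
      {category, patterns -- base_list(base, category)}
    end
  end

  # Treat a missing or non-list base category as empty.
  defp base_list(base, category) do
    case Map.get(base, category) do
      list when is_list(list) -> list
      _other -> []
    end
  end
end

base = %{noun_phrases: [{:np, [:det, :noun]}]}

variant = %{
  noun_phrases: [{:np, [:det, :noun]}, {:np, [:propn, :noun]}],
  notes: %{domain: "technical"}
}

IO.inspect(GrammarDiffSketch.added(base, variant))
```

The `notes` map is skipped because it is not a list, so only rule categories show up in the result.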
2. Document Your Rules
Include comprehensive notes in your grammar files:
%{
rules: [...],
notes: %{
version: "1.0.0",
author: "Team Name",
created: "2026-01-08",
description: "Custom grammar for medical text parsing",
changes: [
"Added medical entity patterns",
"Extended VP patterns for medical procedures"
],
examples: [
"The patient underwent cardiac catheterization",
"Diagnose: Type 2 diabetes mellitus"
]
}
}
3. Test Your Grammar
Create tests for custom grammar:
defmodule MyApp.CustomGrammarTest do
use ExUnit.Case
alias Nasty.Language.GrammarLoader
test "custom grammar loads successfully" do
assert {:ok, rules} = GrammarLoader.load(:en, :custom_rules)
assert is_map(rules)
assert Map.has_key?(rules, :noun_phrases)
end
test "custom grammar includes domain patterns" do
{:ok, rules} = GrammarLoader.load(:en, :custom_rules, variant: "medical")
assert Enum.any?(rules.noun_phrases, fn pattern ->
# Placeholder check: replace with a match on your medical-specific patterns
match?({:np, _elements}, pattern)
end)
end
end
4. Version Your Grammar
Track grammar versions for reproducibility:
%{
metadata: %{
version: "2.1.0",
compatible_with: "nasty >= 0.2.0"
},
# ... rules ...
}
5. Keep Grammar Files Focused
Separate concerns across different rule types:
- Phrase structure → phrase_rules.exs
- Dependencies → dependency_rules.exs
- Coordination → coordination_rules.exs
- Subordination → subordination_rules.exs
Don't mix all rules into one file.
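Keeping the files separate does not prevent working with them as one structure. The sketch below gathers whichever rule files exist in a directory into a single map keyed by rule type. It evaluates files directly with `Code.eval_file/1` rather than going through GrammarLoader, so it bypasses the loader's caching and validation. `GrammarBundleSketch` and `load_all/2` are hypothetical names.

```elixir
defmodule GrammarBundleSketch do
  @moduledoc "Illustrative only; not part of Nasty."

  @rule_types [:phrase_rules, :dependency_rules, :coordination_rules, :subordination_rules]

  # Evaluates each {rule_type}.exs found in `dir`; missing files are skipped.
  # Code.eval_file/1 runs the file, so only use it on trusted grammar files.
  def load_all(dir, rule_types \\ @rule_types) do
    for type <- rule_types,
        path = Path.join(dir, "#{type}.exs"),
        File.exists?(path),
        into: %{} do
      {rules, _bindings} = Code.eval_file(path)
      {type, rules}
    end
  end
end

# Demo with two throwaway files in a temporary directory.
dir = Path.join(System.tmp_dir!(), "grammar_bundle_sketch")
File.mkdir_p!(dir)
File.write!(Path.join(dir, "phrase_rules.exs"), "%{noun_phrases: [{:np, [:det, :noun]}]}")
File.write!(Path.join(dir, "coordination_rules.exs"), "%{coordinating_conjunctions: []}")

bundle = GrammarBundleSketch.load_all(dir)
IO.inspect(bundle.phrase_rules.noun_phrases)
```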
Performance Considerations
Caching
Grammar files are cached in ETS after first load:
# First load: reads from disk
{:ok, rules} = GrammarLoader.load(:en, :phrase_rules) # ~5ms
# Subsequent loads: from cache
{:ok, rules} = GrammarLoader.load(:en, :phrase_rules) # ~0.1ms
Clear cache when updating grammar during development:
GrammarLoader.clear_cache()
File Size
Keep grammar files under 1 MB for fast loading. If a file grows beyond that, split it into multiple files:
phrase_rules_np.exs # Noun phrase patterns
phrase_rules_vp.exs # Verb phrase patterns
phrase_rules_pp.exs # Prepositional phrase patterns
Troubleshooting
Grammar File Not Found
Grammar file not found: .../en/grammars/missing_rules.exs, using empty rules
Solution: Check that the file exists and the path is correct. Default grammar files must be in priv/languages/{lang}/grammars/; variant files must be in priv/languages/{lang}/variants/{variant_name}/.
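When a file cannot be found, listing what is actually on disk often reveals a typo in the language code or rule type. A minimal sketch, assuming the default directory layout described in this guide; `GrammarFilesSketch` and `available/1` are hypothetical names, and variant directories are not covered.

```elixir
defmodule GrammarFilesSketch do
  @moduledoc "Illustrative only; not part of Nasty."

  # Returns a map of language directory name => list of rule type names,
  # based on the files matching {root}/*/grammars/*.exs.
  def available(root \\ "priv/languages") do
    [root, "*", "grammars", "*.exs"]
    |> Path.join()
    |> Path.wildcard()
    |> Enum.group_by(
      # The language code is the directory two levels above the file.
      fn path -> path |> Path.split() |> Enum.at(-3) end,
      fn path -> Path.basename(path, ".exs") end
    )
  end
end

IO.inspect(GrammarFilesSketch.available())
```

Run from the project root, this prints an empty map if no grammar files match, which usually means the working directory or the root path is wrong.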
Invalid Grammar Format
** (ArgumentError) Grammar rules must be a map, got: [...]
Solution: Ensure the file evaluates to a map:
# Correct
%{rules: [...]}
# Wrong
[...]
Compilation Errors
** (SyntaxError) invalid syntax
Solution: Grammar files must be valid Elixir. Test with:
elixir priv/languages/en/grammars/your_rules.exs
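The command above evaluates the file. To check syntax only, without executing anything in the file, the source can be parsed with `Code.string_to_quoted/1` from the standard library. The wrapper module `GrammarSyntaxSketch` and its function names are hypothetical and shown as a sketch.

```elixir
defmodule GrammarSyntaxSketch do
  @moduledoc "Illustrative only; not part of Nasty."

  # Parses the source into an AST without evaluating it.
  # Returns :ok, or {:error, reason} with the parser's error details.
  def check(source) when is_binary(source) do
    case Code.string_to_quoted(source) do
      {:ok, _ast} -> :ok
      {:error, reason} -> {:error, reason}
    end
  end

  # Convenience wrapper that reads the file first; raises if it is missing.
  def check_file(path), do: path |> File.read!() |> check()
end

IO.inspect(GrammarSyntaxSketch.check("%{noun_phrases: [{:np, [:det, :noun]}]}"))
IO.inspect(GrammarSyntaxSketch.check("%{noun_phrases: ["))
```

A file that parses cleanly can still fail to evaluate to a map, so this check complements the loader's validation rather than replacing it.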
Cache Issues
If changes to grammar files aren't reflected:
# Clear cache
Nasty.Language.GrammarLoader.clear_cache()
# Or force reload
{:ok, rules} = Nasty.Language.GrammarLoader.load(:en, :phrase_rules, force_reload: true)
Examples Repository
See working examples in the main repository:
- English grammar: priv/languages/en/grammars/
- Spanish grammar: priv/languages/es/grammars/
- Test fixtures: test/fixtures/grammars/
Contributing Custom Grammars
To contribute grammar variants to the Nasty project:
- Create grammar files following the structure above
- Add tests demonstrating the grammar works
- Document the use case and domain
- Submit a pull request to the main repository
Further Reading
- PARSING_GUIDE.md - Understanding the parsing pipeline
- ENGLISH_GRAMMAR.md - English grammar specification
- ARCHITECTURE.md - System architecture overview
- Universal Dependencies: https://universaldependencies.org/