aho_corasick v0.0.1 AhoCorasick

Usage:

graph = AhoCorasick.new(["my", "dictionary", "terms"])

results = AhoCorasick.search(graph, "I wonder if any of the terms from my dictionary appear in this text, and if so, where?")

=> #MapSet<[{"dictionary", 37, 10}, {"my", 34, 2}, {"terms", 23, 5}]>
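The result format is easiest to see next to a brute-force reference. The sketch below checks every term at every offset and returns the same {term, start, length} triples. The module name NaiveMatch is hypothetical and is not part of the library. It uses byte offsets, which match character offsets for ASCII text like the example above. How the library counts offsets for non-ASCII input is not documented here. The stepped range needs Elixir 1.12 or later.

```elixir
defmodule NaiveMatch do
  # Hypothetical brute-force reference, not the library's implementation:
  # try every term at every byte offset and collect {term, start, length}.
  # Cost grows with text_size * total_term_size; Aho-Corasick scans the text once.
  def search(terms, text) do
    for term <- terms,
        byte_size(term) > 0,
        # The //1 step makes the range empty when the term is longer than the text.
        start <- 0..(byte_size(text) - byte_size(term))//1,
        binary_part(text, start, byte_size(term)) == term,
        into: MapSet.new(),
        do: {term, start, byte_size(term)}
  end
end

text = "I wonder if any of the terms from my dictionary appear in this text, and if so, where?"
NaiveMatch.search(["my", "dictionary", "terms"], text)
# => a MapSet containing {"dictionary", 37, 10}, {"my", 34, 2}, {"terms", 23, 5}
```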

Summary

Functions

add_term(ac, term)
Add a dictionary term to the graph.

build_trie(ac, queue \\ [:root])
If you add terms to the tree yourself with add_term, this function must be called before you can search the graph for matches.

new()
Create a new, empty AhoCorasick graph. You will need to call add_term to add terms, and then build_trie, before searching against it.

new(terms)
Create a new, fully built AhoCorasick graph from all the dictionary terms you want to search for. You can call search with the returned graph immediately.

node_at_path(ac, tokens, node \\ :root)
Follows a list of tokens from the given node (:root by default) and returns the :digraph vertex at the end of the path, if found. Returns nil otherwise.

num_nodes(ac)
Returns the number of nodes in the graph. Mainly useful for debugging.

print(ac)
Prints an ASCII representation of the token tree. Only token edges and result values are shown; failure edges are not.

search(ac, input)
Searches the given input text for dictionary term matches.

Functions

add_term(ac, term)

Add a dictionary term to the graph
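Here is a rough picture of what adding a term involves. The sketch below stores a trie in an Erlang :digraph. The node_at_path description suggests the library uses :digraph, but the vertex and label layout here is an assumption. Each term is walked one grapheme at a time from :root. Prefixes already shared with earlier terms are reused, and only missing vertices and edges are created. The module name AddTermSketch is hypothetical.

```elixir
defmodule AddTermSketch do
  # Hypothetical layout: vertices are :root plus every prefix string of every
  # term; each edge is labeled with the grapheme that extends the prefix; a
  # vertex where a term ends carries that term as its label.
  def new do
    g = :digraph.new()
    :digraph.add_vertex(g, :root)
    g
  end

  def add_term(g, ""), do: g

  def add_term(g, term) do
    {last, _prefix} =
      term
      |> String.graphemes()
      |> Enum.reduce({:root, ""}, fn token, {parent, prefix} ->
        prefix = prefix <> token

        # Shared prefixes already exist; only create what is missing.
        if :digraph.vertex(g, prefix) == false do
          :digraph.add_vertex(g, prefix)
          :digraph.add_edge(g, parent, prefix, token)
        end

        {prefix, prefix}
      end)

    # add_vertex/3 on an existing vertex replaces its label with the term.
    :digraph.add_vertex(g, last, term)
    g
  end
end

g = AddTermSketch.new() |> AddTermSketch.add_term("he") |> AddTermSketch.add_term("hers")
:digraph.no_vertices(g)
# => 5 (:root, "h", "he", "her", "hers")
```

A :digraph is backed by ETS tables and is updated in place. The g returned here is the same graph that was passed in, and returning it only makes piping convenient.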

build_trie(ac, queue \\ [:root])

If you add terms to the tree yourself with add_term, this function must be called before you can search the graph for matches.

Usage:

g = AhoCorasick.new()
AhoCorasick.add_term(g, "a term")

# must be called before calling search!
AhoCorasick.build_trie(g)

# The internal trie/graph is now built, so you can search:
input_text = "this text contains a term"
AhoCorasick.search(g, input_text)
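In Aho-Corasick, the build step computes failure links. For each trie node, the failure link points to the node for the longest proper suffix of its path that is also in the trie. The queue parameter suggests a breadth-first traversal starting at :root. The sketch below gets the same breadth-first order by sorting paths by length, so a node's parent is always handled first. The module name FailureLinks is hypothetical. So is the trie layout: a map from each path of graphemes to the term ending there, or nil.

```elixir
defmodule FailureLinks do
  # Hypothetical sketch: the trie is a map from every path (a list of graphemes
  # from the root, which is []) to the dictionary term ending there, or nil.
  def trie(terms) do
    terms
    |> Enum.reject(&(&1 == ""))
    |> Enum.reduce(%{[] => nil}, fn term, acc ->
      tokens = String.graphemes(term)
      prefixes = for n <- 1..length(tokens), into: %{}, do: {Enum.take(tokens, n), nil}
      # Keep any term already stored on a shared prefix, then mark this term's end.
      acc |> Map.merge(prefixes, fn _path, old, _new -> old end) |> Map.put(tokens, term)
    end)
  end

  # Returns a map from each path to its failure target (another path).
  def build(trie) do
    trie
    |> Map.keys()
    # Sorting by length visits nodes breadth-first: parents before children.
    |> Enum.sort_by(&length/1)
    |> Enum.reduce(%{}, fn
      # The root and depth-1 nodes fail back to the root.
      path, fail when length(path) <= 1 ->
        Map.put(fail, path, [])

      path, fail ->
        {parent, [token]} = Enum.split(path, -1)
        Map.put(fail, path, follow(trie, fail, Map.fetch!(fail, parent), token))
    end)
  end

  # From `node`, follow failure links until a node has a child on `token`.
  defp follow(trie, fail, node, token) do
    cond do
      Map.has_key?(trie, node ++ [token]) -> node ++ [token]
      node == [] -> []
      true -> follow(trie, fail, Map.fetch!(fail, node), token)
    end
  end
end

fail = FailureLinks.build(FailureLinks.trie(["he", "she", "his", "hers"]))
fail[["s", "h", "e"]]
# => ["h", "e"]
```

Sorting all paths costs O(n log n), which an explicit queue would avoid. It is used here only to keep the sketch short.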

new()

Create a new, empty AhoCorasick graph. You will need to call add_term to add terms, and then build_trie, before searching against it.

new(terms)

Create a new, fully built AhoCorasick graph from all the dictionary terms you want to search for. You can call search with the returned graph immediately.

node_at_path(ac, tokens, node \\ :root)

Follows a list of tokens from the given node (:root by default) and returns the :digraph vertex at the end of the path, if found. Returns nil otherwise.
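Walking a path means descending through child edges one token at a time. The sketch below uses a nested-map trie instead of a :digraph. Each node is %{children: %{token => node}, term: term | nil}. It returns nil as soon as an edge is missing, which mirrors the documented behavior. The module name PathWalk and this node layout are assumptions.

```elixir
defmodule PathWalk do
  # Hypothetical nested-map trie: each node is %{children: %{token => node}, term: term | nil}.
  def empty, do: %{children: %{}, term: nil}

  # Insert a term by descending one grapheme at a time, creating missing children.
  def add_term(root, term), do: insert(root, String.graphemes(term), term)

  defp insert(node, [], term), do: %{node | term: term}

  defp insert(node, [token | rest], term) do
    child = Map.get(node.children, token, empty())
    %{node | children: Map.put(node.children, token, insert(child, rest, term))}
  end

  # Follow `tokens` from `node` (pass the root to start at the top); nil if an edge is missing.
  def node_at_path(node, []), do: node

  def node_at_path(node, [token | rest]) do
    case Map.fetch(node.children, token) do
      {:ok, child} -> node_at_path(child, rest)
      :error -> nil
    end
  end
end

root = PathWalk.empty() |> PathWalk.add_term("he") |> PathWalk.add_term("hers")
PathWalk.node_at_path(root, ["h", "e"]).term
# => "he"
PathWalk.node_at_path(root, ["h", "x"])
# => nil
```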

num_nodes(ac)

Returns the number of nodes in the graph. Mainly useful for debugging.
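If the graph is an Erlang :digraph, as the node_at_path description indicates, the node count can come straight from :digraph.no_vertices/1. In a standard trie, that count is one root plus one vertex per distinct non-empty prefix of the dictionary terms, assuming no other vertices are stored. The module name NodeCount and the hand-built example graph are hypothetical.

```elixir
defmodule NodeCount do
  # Hypothetical: when the trie is a :digraph, counting nodes is counting vertices.
  def num_nodes(g), do: :digraph.no_vertices(g)

  # A hand-built trie for the single term "ab": :root --a--> "a" --b--> "ab".
  def example do
    g = :digraph.new()
    Enum.each([:root, "a", "ab"], &:digraph.add_vertex(g, &1))
    :digraph.add_edge(g, :root, "a", "a")
    :digraph.add_edge(g, "a", "ab", "b")
    g
  end
end

NodeCount.num_nodes(NodeCount.example())
# => 3 (the root plus the prefixes "a" and "ab")
```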

print(ac)

Prints an ASCII representation of the token tree. Only token edges and result values are shown; failure edges are not.
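A printer like this is a depth-first walk that indents each token by its depth. The sketch below builds the output as a string so it can be checked before printing. The nested-map layout, the two-space indentation, and the [term] marker for result values are illustrative assumptions, not the library's actual format. Like the documented function, it shows only token edges and result values, because this structure holds no failure edges. The module name TreePrint is hypothetical.

```elixir
defmodule TreePrint do
  # Hypothetical nested-map trie: each node is %{children: %{token => node}, term: term | nil}.
  def from_terms(terms) do
    Enum.reduce(terms, new_node(), fn term, root -> insert(root, String.graphemes(term), term) end)
  end

  defp new_node, do: %{children: %{}, term: nil}

  defp insert(n, [], term), do: %{n | term: term}

  defp insert(n, [token | rest], term) do
    child = Map.get(n.children, token, new_node())
    %{n | children: Map.put(n.children, token, insert(child, rest, term))}
  end

  # Depth-first: one line per token edge, indented two spaces per level,
  # with a node's result value (its term) shown in brackets.
  def render(n, level \\ 0) do
    n.children
    |> Enum.sort_by(fn {token, _child} -> token end)
    |> Enum.map_join(fn {token, child} ->
      value = if child.term, do: " [#{child.term}]", else: ""
      String.duplicate("  ", level) <> token <> value <> "\n" <> render(child, level + 1)
    end)
  end
end

["he", "hers", "she"] |> TreePrint.from_terms() |> TreePrint.render() |> IO.write()
# h
#   e [he]
#     r
#       s [hers]
# s
#   h
#     e [she]
```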

print(ac, node, level)

search(ac, input)

Searches the given input text for dictionary term matches.

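Returns a MapSet of {matched_term, start_index, run_length} tuples, where start_index is the zero-based offset at which the match begins in the input text and run_length is the length of the matched term.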
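This compact, self-contained Aho-Corasick sketch shows how a single pass over the text can report every match, overlapping ones included. It is not the library's code. The module name AhoSketch, the map-of-paths trie, and counting offsets in graphemes are all assumptions. At each character, it follows failure links until the current state can advance. It then reports the term at that state, plus any terms reachable along its failure chain. Each start index is computed as end_index - length + 1.

```elixir
defmodule AhoSketch do
  # Hypothetical sketch, not the library's implementation. States are paths
  # (lists of graphemes) from the root state []; `terms` maps each state to
  # the dictionary term ending there, or nil.
  def search(dictionary, text) do
    terms = build_trie(dictionary)
    fail = build_fail(terms)

    text
    |> String.graphemes()
    |> Enum.with_index()
    |> Enum.reduce({[], MapSet.new()}, fn {token, i}, {state, found} ->
      state = goto(terms, fail, state, token)

      found =
        state
        |> outputs(terms, fail)
        |> Enum.reduce(found, fn term, acc ->
          len = String.length(term)
          # `i` is where the match ends, so it starts len - 1 positions earlier.
          MapSet.put(acc, {term, i - len + 1, len})
        end)

      {state, found}
    end)
    |> elem(1)
  end

  # Every prefix of every non-empty term becomes a state.
  defp build_trie(dictionary) do
    for term <- dictionary, term != "", reduce: %{[] => nil} do
      acc ->
        tokens = String.graphemes(term)

        acc =
          Enum.reduce(1..length(tokens), acc, fn n, a ->
            Map.put_new(a, Enum.take(tokens, n), nil)
          end)

        Map.put(acc, tokens, term)
    end
  end

  # Failure links computed breadth-first (shorter paths first).
  defp build_fail(terms) do
    terms
    |> Map.keys()
    |> Enum.sort_by(&length/1)
    |> Enum.reduce(%{}, fn
      path, fail when length(path) <= 1 ->
        Map.put(fail, path, [])

      path, fail ->
        {parent, [token]} = Enum.split(path, -1)
        Map.put(fail, path, goto(terms, fail, Map.fetch!(fail, parent), token))
    end)
  end

  # Follow failure links from `state` until some state can advance on `token`.
  defp goto(terms, fail, state, token) do
    cond do
      Map.has_key?(terms, state ++ [token]) -> state ++ [token]
      state == [] -> []
      true -> goto(terms, fail, Map.fetch!(fail, state), token)
    end
  end

  # Terms ending at `state`: its own term, then those along its failure chain.
  defp outputs([], _terms, _fail), do: []

  defp outputs(state, terms, fail) do
    own =
      case Map.get(terms, state) do
        nil -> []
        term -> [term]
      end

    own ++ outputs(Map.fetch!(fail, state), terms, fail)
  end
end

AhoSketch.search(["he", "she", "hers"], "ushers")
# => a MapSet containing {"he", 2, 2}, {"hers", 2, 4}, {"she", 1, 3}
```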