Text analysis pipeline demonstrating why process isolation beats free-threaded Python.
The pipeline has two phases:
Phase 1 -- Pipe run: prepare -> analyze -> classify by sentiment.
- Prepares raw text by attaching stopwords from an ETS-backed store (BEAM stage)
- Sends batches to Python for word frequency, readability, and sentiment analysis
- Routes results into positive, negative, or neutral buckets based on sentiment score
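The routing step can be sketched as below. This is only an illustration of the bucketing logic, written in Python for consistency with the analysis stage; the source does not say which side performs the routing. The `sentiment` key, the `[-1, 1]` score range, and the `0.1` threshold are assumptions, not values taken from the pipeline.

```python
def classify(result, threshold=0.1):
    """Return the bucket name for one analysis result.

    Assumption: `result` is a dict with a float "sentiment" key in [-1, 1].
    The threshold is illustrative, not the pipeline's actual cutoff.
    """
    score = result["sentiment"]
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


def route(results, threshold=0.1):
    """Group analysis results into positive, negative, and neutral buckets."""
    buckets = {"positive": [], "negative": [], "neutral": []}
    for result in results:
        buckets[classify(result, threshold)].append(result)
    return buckets
```

Scores that fall within the threshold band on either side of zero land in the neutral bucket, so weakly opinionated text is not forced into a polarity.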
Phase 2 -- Worker stats collection: After the pipe run, uses Dispatch directly to
call get_worker_stats on each Python worker. Each worker reports its accumulated
state (_global_index, _doc_count), proving that:
- Different workers accumulated different subsets of documents
- Each worker's _doc_count is consistent (no torn counters)
- The Elixir side can merge per-worker results safely in a single process
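The three claims above can be illustrated with a minimal sketch. It assumes `_global_index` maps words to counts and that `get_worker_stats` returns a dict with `doc_count` and `index` keys; neither shape is confirmed by the source. The `WorkerState` class is a hypothetical stand-in for the module-level globals of one worker process, and the merge, which the real demo performs on the Elixir side, is written in Python here only to show the logic.

```python
from collections import Counter


class WorkerState:
    """Hypothetical stand-in for one worker process's private globals."""

    def __init__(self):
        self._global_index = Counter()  # assumed shape: word -> occurrences
        self._doc_count = 0

    def analyze(self, text):
        # Only this worker touches its own state, so no lock is needed.
        self._global_index.update(text.lower().split())
        self._doc_count += 1

    def get_worker_stats(self):
        # Assumed return shape; the real function may report more fields.
        return {"doc_count": self._doc_count, "index": dict(self._global_index)}


def merge_stats(all_stats):
    """Combine per-worker stats sequentially in a single place."""
    merged = Counter()
    for stats in all_stats:
        merged.update(stats["index"])  # adds counts key by key
    total_docs = sum(stats["doc_count"] for stats in all_stats)
    return {"doc_count": total_docs, "index": dict(merged)}
```

Because each worker owns its counters outright, the merged `doc_count` should equal the number of documents sent through the pipe, which is the check that rules out torn counters.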
Functions
Prepare a raw text item for Python analysis.
Reads the full stopword list from the ETS store and packages it
alongside the text into the map expected by text_analyzer.analyze_batch.
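As a rough sketch of how the receiving side might consume that map: the function below is a hypothetical stand-in, not the real `text_analyzer.analyze_batch`. The `text` and `stopwords` key names, the tokenizer, and the result fields are all assumptions, and it covers only the word-frequency part of the analysis.

```python
import re
from collections import Counter


def analyze_batch_sketch(items):
    """Hypothetical stand-in for text_analyzer.analyze_batch.

    Assumption: each item is a dict with "text" (str) and
    "stopwords" (list of str), as packaged by the prepare step.
    """
    results = []
    for item in items:
        stopwords = set(item["stopwords"])  # set gives O(1) membership tests
        words = re.findall(r"[a-z']+", item["text"].lower())
        kept = [word for word in words if word not in stopwords]
        results.append({
            "word_count": len(kept),
            "word_freq": dict(Counter(kept)),
        })
    return results
```

Shipping the stopword list inside each item keeps the Python worker free of any dependency on the ETS store, at the cost of resending the list with every item.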
Run the text analysis demo, printing formatted results to stdout.
Executes two phases:
- The text analysis pipe (prepare -> analyze -> classify)
- Worker stats collection via Dispatch (proves process isolation)