Penelope v0.5.0 Penelope.NLP.Tokenize.PennTreebankTokenizer

The tokenization scheme used for the creation of the Penn Treebank corpus. See ftp://ftp.cis.upenn.edu/pub/treebank/public_html/tokenization.html.

Some alterations have been made to the original script to better handle common Unicode replacement characters.

Summary

Functions

detokenize(tokens)

Detokenize a string tokenized by the Penn Treebank tokenizer. The PTB tokenization scheme is lossy; attributes like capitalization, multiple spaces, and padding around certain punctuation will be removed from the output.

tokenize(text)

Separate a string into a list of tokens.

Functions

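detokenize(tokens)

Detokenize a string tokenized by the Penn Treebank tokenizer. The PTB tokenization scheme is lossy; attributes like capitalization, multiple spaces, and padding around certain punctuation will be removed from the output.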

tokenize(text)

Separate a string into a list of tokens.

Callback implementation for Penelope.NLP.Tokenize.Tokenizer.tokenize/1.
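
The two functions can be combined into a round trip. The following is a minimal sketch, not taken from the library's own documentation: it assumes Penelope is already available as a dependency of the project, that `tokenize/1` returns a list of strings, and that `detokenize/1` accepts that list and returns a string. The input sentence is purely illustrative.

```elixir
# Assumes the :penelope dependency is compiled and on the code path.
alias Penelope.NLP.Tokenize.PennTreebankTokenizer, as: PTB

# Illustrative input (hypothetical, not from the Penelope docs).
original = "Don't panic,   it's only a test."

# Split the text into tokens using the Penn Treebank scheme.
tokens = PTB.tokenize(original)
IO.inspect(tokens, label: "tokens")

# Join the tokens back into a single string.
text = PTB.detokenize(tokens)
IO.inspect(text, label: "detokenized")

# The scheme is documented as lossy, so the result may differ
# from the input (for example, in spacing).
IO.inspect(text == original, label: "round trip preserved input?")
```

Because the scheme is lossy, code that needs the original text should keep it alongside the tokens instead of relying on `detokenize/1` to reconstruct it.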