Wanakana v0.1.0: Wanakana.Tokenize
Summary
Functions
tokenize/1: Splits a string into runs of kanji, katakana, hiragana, or romaji
Functions
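Splits a string into runs of kanji, katakana, hiragana, or romaji. Consecutive characters of the same type are grouped together; Japanese punctuation such as 「 and 」。 forms its own groups, and Latin text (including spaces and ASCII punctuation) stays in one piece.
It does not split into parts of speech!
The JavaScript version of WanaKana has a more sophisticated tokenizer; I'll port that behavior if someone needs it.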
Examples:
iex> Wanakana.Tokenize.tokenize("")
[]
iex> Wanakana.Tokenize.tokenize("ふふフフ")
["ふふ", "フフ"]
iex> Wanakana.Tokenize.tokenize("感じ")
["感", "じ"]
iex> Wanakana.Tokenize.tokenize("私は悲しい")
["私", "は", "悲", "しい"]
iex> Wanakana.Tokenize.tokenize("what the...私は「悲しい」。")
["what the...", "私", "は", "「", "悲", "しい", "」。"]
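The grouping behavior in the examples above can be approximated with a short Elixir sketch. This is not Wanakana's actual implementation: the module name `TokenizeSketch` is hypothetical, and the sketch assumes each character can be classified by its Unicode block (Hiragana 0x3040..0x309F, Katakana 0x30A0..0x30FF, CJK Unified Ideographs 0x4E00..0x9FFF as kanji, CJK Symbols and Punctuation 0x3000..0x303F as Japanese punctuation), with everything else treated as romaji/other. Adjacent characters of the same class are then merged with `Enum.chunk_by/2`.

```elixir
defmodule TokenizeSketch do
  # Hypothetical module: a minimal sketch of script-based grouping,
  # NOT the Wanakana.Tokenize implementation.

  @doc "Groups consecutive characters that share a script class."
  def tokenize(text) when is_binary(text) do
    text
    |> String.codepoints()
    # chunk_by starts a new chunk whenever the class changes
    |> Enum.chunk_by(&char_type/1)
    |> Enum.map(&Enum.join/1)
  end

  # Classify one codepoint by Unicode block (assumed ranges, see lead-in).
  defp char_type(<<cp::utf8>>) do
    cond do
      cp in 0x3040..0x309F -> :hiragana
      cp in 0x30A0..0x30FF -> :katakana
      cp in 0x4E00..0x9FFF -> :kanji
      cp in 0x3000..0x303F -> :japanese_punctuation
      true -> :other
    end
  end

  # Invalid UTF-8 bytes come back from String.codepoints/1 as-is; lump them with :other.
  defp char_type(_), do: :other
end
```

Because `Enum.chunk_by/2` only compares each character with its neighbor, the whole string is grouped in a single pass. Characters outside the assumed ranges, such as the iteration mark 々 (which sits in the punctuation block) or kanji from the CJK extension blocks, are not classified as kanji by this sketch.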