ExAwabi (ExAwabi v0.2.1)

An Elixir wrapper for Awabi, a morphological analyzer written in Rust that uses MeCab dictionaries.

Additional documentation can be found at https://hexdocs.pm/exawabi.

Requirements

  • MeCab and a MeCab dictionary (for example, IPADIC).
  • Rust, for compiling the Native Implemented Function (NIF) binding.

Debian/Ubuntu

$ sudo apt install mecab mecab-ipadic-utf8

macOS (Homebrew)

$ brew install mecab
$ brew install mecab-ipadic

Installation

The package can be installed by adding exawabi to your list of dependencies in mix.exs:

def deps do
  [
    {:exawabi, "~> 0.2.1"}
  ]
end

Summary

Functions

tokenize/1

Tokenize the string.

tokenize_n_best/2

Tokenize the string with N best matches.

Functions

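@spec tokenize(binary()) :: [{binary(), binary()}]

Tokenizes the string into a list of {surface, feature} tuples, where feature is the comma-separated dictionary entry for that token.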

Examples

iex> ExAwabi.tokenize("すもももももももものうち")
[
  {"すもも", "名詞,一般,*,*,*,*,すもも,スモモ,スモモ"},
  {"も", "助詞,係助詞,*,*,*,*,も,モ,モ"},
  {"もも", "名詞,一般,*,*,*,*,もも,モモ,モモ"},
  {"も", "助詞,係助詞,*,*,*,*,も,モ,モ"},
  {"もも", "名詞,一般,*,*,*,*,もも,モモ,モモ"},
  {"の", "助詞,連体化,*,*,*,*,の,ノ,ノ"},
  {"うち", "名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ"}
]
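
The feature string in each tuple is a comma-separated dictionary entry, so it usually needs to be split before it is useful. The following is a minimal sketch of one way to do that, assuming the nine-column IPADIC layout shown in the example above. The module name FeatureSketch, its functions, and the field names are hypothetical labels chosen for illustration and are not part of ExAwabi. The tokens are hard-coded so the snippet runs without the NIF; in practice they would come from ExAwabi.tokenize/1.

```elixir
defmodule FeatureSketch do
  # Illustrative names for the nine IPADIC columns (an assumption about the
  # dictionary in use; other dictionaries may have a different layout).
  @fields [
    :pos,
    :pos_detail_1,
    :pos_detail_2,
    :pos_detail_3,
    :conjugation_type,
    :conjugation_form,
    :base_form,
    :reading,
    :pronunciation
  ]

  # Turn one {surface, feature} tuple into a map keyed by @fields.
  # Enum.zip/2 stops at the shorter list, so entries with fewer columns
  # simply yield fewer keys.
  def parse({surface, feature}) do
    values = String.split(feature, ",")

    @fields
    |> Enum.zip(values)
    |> Map.new()
    |> Map.put(:surface, surface)
  end

  # Return the surfaces of tokens whose top-level part of speech equals pos.
  def surfaces_with_pos(tokens, pos) do
    tokens
    |> Enum.map(&parse/1)
    |> Enum.filter(&(&1.pos == pos))
    |> Enum.map(& &1.surface)
  end
end

# Sample data copied from the first three tokens of the example output.
tokens = [
  {"すもも", "名詞,一般,*,*,*,*,すもも,スモモ,スモモ"},
  {"も", "助詞,係助詞,*,*,*,*,も,モ,モ"},
  {"もも", "名詞,一般,*,*,*,*,もも,モモ,モモ"}
]

# Keep only the nouns (名詞).
IO.inspect(FeatureSketch.surfaces_with_pos(tokens, "名詞"))
```

Converting each token to a map keeps the column positions in one place, which makes it easier to adapt if a different dictionary with another column layout is installed.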
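
@spec tokenize_n_best(binary(), integer()) :: [[{binary(), binary()}]]

Tokenizes the string and returns the N best candidate tokenizations. Each candidate is a list of {surface, feature} tuples in the same format as tokenize/1.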

Examples

iex> ExAwabi.tokenize_n_best("すもももももももものうち", 3)
[
  [
    {"すもも", "名詞,一般,*,*,*,*,すもも,スモモ,スモモ"},
    {"も", "助詞,係助詞,*,*,*,*,も,モ,モ"},
    {"もも", "名詞,一般,*,*,*,*,もも,モモ,モモ"},
    {"も", "助詞,係助詞,*,*,*,*,も,モ,モ"},
    {"もも", "名詞,一般,*,*,*,*,もも,モモ,モモ"},
    {"の", "助詞,連体化,*,*,*,*,の,ノ,ノ"},
    {"うち", "名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ"}
  ],
  [
    {"すもも", "名詞,一般,*,*,*,*,すもも,スモモ,スモモ"},
    {"も", "助詞,係助詞,*,*,*,*,も,モ,モ"},
    {"もも", "名詞,一般,*,*,*,*,もも,モモ,モモ"},
    {"もも", "名詞,一般,*,*,*,*,もも,モモ,モモ"},
    {"も", "助詞,係助詞,*,*,*,*,も,モ,モ"},
    {"の", "助詞,連体化,*,*,*,*,の,ノ,ノ"},
    {"うち", "名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ"}
  ],
  [
    {"すもも", "名詞,一般,*,*,*,*,すもも,スモモ,スモモ"},
    {"もも", "名詞,一般,*,*,*,*,もも,モモ,モモ"},
    {"も", "助詞,係助詞,*,*,*,*,も,モ,モ"},
    {"もも", "名詞,一般,*,*,*,*,もも,モモ,モモ"},
    {"も", "助詞,係助詞,*,*,*,*,も,モ,モ"},
    {"の", "助詞,連体化,*,*,*,*,の,ノ,ノ"},
    {"うち", "名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ"}
  ]
]
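
Because the candidates differ only in where the token boundaries fall, it can help to reduce each one to its surfaces before comparing them. The sketch below shows one possible way to do this. NBestSketch and its functions are hypothetical helpers, not part of ExAwabi, and the candidates are a trimmed, hard-coded sample shaped like the output above rather than a real call to ExAwabi.tokenize_n_best/2.

```elixir
defmodule NBestSketch do
  # Render one candidate (a list of {surface, feature} tuples) as "a / b / c".
  def segmentation(candidate) do
    Enum.map_join(candidate, " / ", fn {surface, _feature} -> surface end)
  end

  # Label each candidate with its position in the returned list, starting at 1.
  def summarize(candidates) do
    candidates
    |> Enum.with_index(1)
    |> Enum.map(fn {candidate, rank} -> "#{rank}: #{segmentation(candidate)}" end)
  end
end

# Trimmed sample: the first three tokens of two differing candidates.
candidates = [
  [
    {"すもも", "名詞,一般,*,*,*,*,すもも,スモモ,スモモ"},
    {"も", "助詞,係助詞,*,*,*,*,も,モ,モ"},
    {"もも", "名詞,一般,*,*,*,*,もも,モモ,モモ"}
  ],
  [
    {"すもも", "名詞,一般,*,*,*,*,すもも,スモモ,スモモ"},
    {"もも", "名詞,一般,*,*,*,*,もも,モモ,モモ"},
    {"も", "助詞,係助詞,*,*,*,*,も,モ,モ"}
  ]
]

Enum.each(NBestSketch.summarize(candidates), &IO.puts/1)
```

With real output, the same helper could be applied directly to the result of the call shown in the example.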