Hanyutils

Flexible, modular utilities for dealing with Chinese characters (Hanzi) and Pinyin.

Features

  • Convert Chinese characters (Hanzi) to Pinyin
    • Based on the Unicode Han Database
    • Showing only the most common pronunciation
    • Showing the most common pronunciation for Taiwan (if it differs from the most common pronunciation in mainland China)
    • Showing all available pronunciations
  • Read and manipulate pinyin strings
    • Read both tone-marked and numbered pinyin strings, including capitalized and uppercase pinyin words and strings that mix pinyin with regular text
    • Convert to either representation (numbered or tone-marked)
  • Direct access to the building blocks of the library for more esoteric use cases

The following features are planned for a future version of hanyutils:

  • Handle punctuation (。,?,!,...) when translating Han characters
  • Support for 儿 (e.g. translate 这儿 to "zhe'er" instead of "zheer")

Installation

Add hanyutils to your list of dependencies in mix.exs:

def deps do
  [
    {:hanyutils, "~> 0.2.5"}
  ]
end

Note that this package automatically generates functions from a large data file, so compiling this dependency takes some time (around a minute on my machine).

Usage

For common use cases (e.g. converting all characters in a string to pinyin), the Hanyutils module most likely has you covered:

iex> Hanyutils.to_marked_pinyin("你好")
"nǐhǎo"

iex> Hanyutils.to_numbered_pinyin("你好")
"ni3hao3"

iex> Hanyutils.to_zhuyin("你好")
"ㄋㄧˇㄏㄠˇ"

iex> Hanyutils.characters?("你好")
true

iex> Hanyutils.mark_pinyin("ni3hao3")
"nǐhǎo"

iex> Hanyutils.number_pinyin("nǐhǎo")
"ni3hao3"

iex> Hanyutils.zhuyin_to_numbered_pinyin("ㄋㄧˇㄏㄠˇ")
"ni3hao3"

iex> Hanyutils.zhuyin_to_marked_pinyin("ㄋㄧˇㄏㄠˇ")
"nǐhǎo"

iex> Hanyutils.pinyin_to_zhuyin("ni3hǎo")
"ㄋㄧˇㄏㄠˇ"

The Hanyutils module is built on top of the Hanzi, Pinyin, and Zhuyin modules. You can use these lower-level modules directly if your use case is not present in Hanyutils.

Feel free to file an issue if you feel like your use case should be covered by the Hanyutils module.

As an example, the to_marked_pinyin function shown above could be replaced by the following code:

iex> "你好" |> Hanzi.read() |> Hanzi.to_pinyin() |> Pinyin.marked()
"nǐhǎo"

Variants of this pipeline that show all available pronunciations can be written as follows. The second variant passes custom opening, separator, and closing delimiters to Hanzi.all_pronunciations:

iex> "重庆" |> Hanzi.read() |> Hanzi.to_pinyin(&Hanzi.all_pronunciations/1) |> Pinyin.marked()
"[ zhòng | chóng | tóng ]qìng"

iex> "重庆" |> Hanzi.read() |> Hanzi.to_pinyin(&Hanzi.all_pronunciations(&1, "{", ";", "}")) |> Pinyin.marked()
"{zhòng;chóng;tóng}qìng"
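
Along the same lines, the lower-level modules can be wrapped in a small helper of your own. The sketch below is only an illustration: the module name PinyinSketch and its to_numbered/1 function are hypothetical, and it assumes that the Pinyin module offers a numbered/1 function as the counterpart of the marked/1 function used above.

```elixir
# Sketch only: assumes hanyutils is listed in your mix.exs dependencies and
# that Pinyin.numbered/1 exists alongside Pinyin.marked/1 (an assumption).
defmodule PinyinSketch do
  # Hypothetical helper, not part of hanyutils: reads a string of Hanzi and
  # renders the most common pronunciation of each character as numbered pinyin.
  def to_numbered(text) do
    text
    |> Hanzi.read()
    |> Hanzi.to_pinyin()
    |> Pinyin.numbered()
  end
end
```

If the assumption holds, PinyinSketch.to_numbered("你好") should return the same result as Hanyutils.to_numbered_pinyin("你好") shown earlier.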

License

MIT