Nasty.Statistics.Neural.Transformers.Multilingual (Nasty v0.3.0)
Multilingual support utilities for transformer models.
Provides helpers for:
- Cross-lingual model selection (XLM-RoBERTa, mBERT)
- Language detection and routing
- Cross-lingual transfer learning
- Zero-shot cross-lingual prediction
Supported Languages
XLM-RoBERTa supports 100 languages including:
- European: English, Spanish, Catalan, French, German, Italian, Portuguese, etc.
- Asian: Chinese, Japanese, Korean, Arabic, Hindi, Thai, Vietnamese, etc.
- Others: Russian, Turkish, Hebrew, Indonesian, etc.
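Support can also be checked programmatically with supported_language?/1 (documented below). A minimal sketch; the result assumes Catalan and Chinese are supported, as the list above suggests:
languages = [:en, :es, :ca, :zh, :tlh]
Enum.filter(languages, &Multilingual.supported_language?/1)
# => [:en, :es, :ca, :zh]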
Examples
# Detect language and use appropriate model
{:ok, language} = Multilingual.detect_language(text)
{:ok, model} = Multilingual.model_for_language(language)
# Cross-lingual transfer: train on English, predict on Spanish
{:ok, model} = Multilingual.train_cross_lingual(training_data, source_language: :en, target_languages: [:es])
# Zero-shot cross-lingual: use English model for Spanish
{:ok, tagged} = Multilingual.predict_cross_lingual(model, spanish_tokens, target_language: :es)
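The detection and routing steps compose naturally in a with chain. A minimal sketch; the failure shape of detect_language/1 is not documented here, so any non-{:ok, _} value simply falls through:
with {:ok, language} <- Multilingual.detect_language(text),
     {:ok, model_name} <- Multilingual.model_for_language(language) do
  {:ok, {language, model_name}}
end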
Summary
Functions
Lists all available multilingual models.
Detects the language of input text.
Gets the best multilingual model for a specific language.
Gets information about a multilingual model.
Predicts using a cross-lingual model on target language text.
Checks if a language is well-supported by multilingual models.
Trains a model on one language for use on another (cross-lingual transfer).
Functions
@spec available_models() :: [atom()]
Lists all available multilingual models.
Examples
Multilingual.available_models()
# => [:xlm_roberta_base, :mbert, :xlm_mlm_100]
Detects the language of input text.
This is a simple heuristic-based detector. For production use, consider using a dedicated language detection library.
Examples
{:ok, language} = Multilingual.detect_language("Hello world")
# => {:ok, :en}
{:ok, language} = Multilingual.detect_language("Hola mundo")
# => {:ok, :es}
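Because the detector is heuristic, callers may want a default when detection fails. A minimal sketch, assuming failures return something other than an {:ok, _} tuple (the failure shape is not shown here):
language =
  case Multilingual.detect_language(text) do
    {:ok, lang} -> lang
    # Fall back to English when detection is inconclusive
    _ -> :en
  end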
Gets the best multilingual model for a specific language.
Examples
{:ok, model_name} = Multilingual.model_for_language(:es)
# => {:ok, :xlm_roberta_base}
{:ok, model_name} = Multilingual.model_for_language(:zh)
# => {:ok, :xlm_roberta_base}
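Lookups for several languages can be collected into a keyword list. A minimal sketch; the :ru result is an assumption based on the supported-language list above:
for lang <- [:es, :zh, :ru] do
  {:ok, model_name} = Multilingual.model_for_language(lang)
  {lang, model_name}
end
# => [es: :xlm_roberta_base, zh: :xlm_roberta_base, ru: :xlm_roberta_base]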
Gets information about a multilingual model.
Examples
{:ok, info} = Multilingual.model_info(:xlm_roberta_base)
# => {:ok, %{languages: 100, best_for: [:cross_lingual, ...]}}
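Combined with available_models/0, this gives a quick coverage overview. A minimal sketch, assuming each listed model returns {:ok, info} and that the info map carries the :languages key shown above:
for name <- Multilingual.available_models() do
  {:ok, info} = Multilingual.model_info(name)
  {name, info.languages}
end
# => [xlm_roberta_base: 100, ...]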
@spec predict_cross_lingual(map(), [Nasty.AST.Token.t()], keyword()) :: {:ok, [map()]} | {:error, term()}
Predicts using a cross-lingual model on target language text.
Examples
{:ok, predictions} = Multilingual.predict_cross_lingual(
model,
spanish_tokens,
target_language: :es
)
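The spec only promises a list of maps, so the keys of each prediction are not documented here; a minimal sketch simply inspects them:
{:ok, predictions} =
  Multilingual.predict_cross_lingual(model, spanish_tokens, target_language: :es)
Enum.each(predictions, &IO.inspect/1)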
Checks if a language is well-supported by multilingual models.
Examples
Multilingual.supported_language?(:es)
# => true
Multilingual.supported_language?(:tlh) # Klingon
# => false
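One practical use is validating target languages before a cross-lingual training run (see train_cross_lingual/2 below). A minimal sketch:
target_languages = [:es, :ca]
unless Enum.all?(target_languages, &Multilingual.supported_language?/1) do
  raise ArgumentError, "unsupported language in #{inspect(target_languages)}"
end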
@spec train_cross_lingual([Nasty.Statistics.Neural.Transformers.FineTuner.training_example()], keyword()) :: {:ok, map()} | {:error, term()}
Trains a model on one language for use on another (cross-lingual transfer).
This is useful when you have training data in one language but want to apply the model to another language.
Options
- :source_language - Language of training data (e.g., :en)
- :target_languages - Languages to apply model to (e.g., [:es, :ca])
- :task - Task type (:pos_tagging, :ner, etc.)
- All FineTuner options
Examples
# Train English POS tagger, use for Spanish/Catalan
{:ok, model} = Multilingual.train_cross_lingual(
en_training_data,
source_language: :en,
target_languages: [:es, :ca],
task: :pos_tagging,
num_labels: 17
)
# Use the model for Spanish
{:ok, tagged} = Multilingual.predict_cross_lingual(model, spanish_tokens, target_language: :es)
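Because the model was trained with several target languages, the same model can be applied to each of them in turn. A minimal sketch, where tokens_by_language is a hypothetical map from language code to token lists:
for lang <- [:es, :ca] do
  # tokens_by_language is hypothetical; supply your own tokens per language
  {:ok, tagged} =
    Multilingual.predict_cross_lingual(model, tokens_by_language[lang], target_language: lang)
  {lang, tagged}
end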