Model Fine-Tuning Guide

This guide covers fine-tuning Gemini models using supervised learning on Vertex AI.

Overview

Fine-tuning allows you to adapt Gemini models to your specific use case by training them on your custom datasets. This improves model performance for domain-specific tasks like customer support, code generation, content moderation, or specialized Q&A.

Key Benefits:

  • Improved accuracy for domain-specific tasks
  • Consistent output formatting and style
  • Better understanding of domain terminology
  • Reduced need for extensive prompting

Prerequisites

Required Setup

  1. Vertex AI Authentication - Tuning is only available on Vertex AI
  2. Google Cloud Project - Active GCP project with billing enabled
  3. Vertex AI API - Enable the Vertex AI API in your project
  4. Cloud Storage - GCS bucket for training data
  5. Permissions - Vertex AI User or Vertex AI Admin role

Supported Models

The following Gemini models support fine-tuning on Vertex AI:

  • gemini-2.5-pro-001 - Best quality, higher cost
  • gemini-2.5-flash-001 - Balanced quality and speed
  • gemini-2.5-flash-lite-001 - Fastest, most cost-effective

Cost Considerations

Fine-tuning incurs costs based on:

  • Training time (typically 1-4 hours)
  • Base model size
  • Number of training examples
  • Number of epochs

Estimate costs using the Google Cloud Pricing Calculator.
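As a rough illustration of how those factors combine, billed training volume scales with dataset tokens multiplied by epoch count. The sketch below is a back-of-the-envelope estimate only; `avg_tokens_per_example` and the per-1k-token rate are made-up placeholders, not real prices — use the Pricing Calculator for actual figures.

```elixir
# Back-of-the-envelope training-volume estimate (hypothetical numbers).
# Billed tokens roughly equal total dataset tokens times epoch count.
defmodule TuningCost do
  # avg_tokens_per_example and rate_per_1k_tokens are illustrative placeholders.
  def estimate(example_count, avg_tokens_per_example, epochs, rate_per_1k_tokens) do
    total_tokens = example_count * avg_tokens_per_example * epochs
    %{total_tokens: total_tokens, estimated_cost: total_tokens / 1000 * rate_per_1k_tokens}
  end
end

# 1,000 examples x 200 tokens x 10 epochs = 2,000,000 trained tokens
TuningCost.estimate(1_000, 200, 10, 0.005)
```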

Quick Start

1. Prepare Training Data

Create a JSONL file with your training examples:

{"contents": [{"role": "user", "parts": [{"text": "What is your refund policy?"}]}, {"role": "model", "parts": [{"text": "We offer full refunds within 30 days of purchase with proof of receipt."}]}]}
{"contents": [{"role": "user", "parts": [{"text": "How do I track my order?"}]}, {"role": "model", "parts": [{"text": "Visit our tracking page at example.com/track and enter your order number."}]}]}
{"contents": [{"role": "user", "parts": [{"text": "Do you ship internationally?"}]}, {"role": "model", "parts": [{"text": "Yes, we ship to over 50 countries. Shipping times vary by destination."}]}]}

Best Practices:

  • Minimum 100 examples recommended (more is better)
  • Maximum 10,000 examples per job
  • Balance your dataset across different topics
  • Include diverse input phrasings
  • Ensure consistent output quality
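A quick local sanity check catches most format errors before upload. This is a sketch, not part of the library, and assumes Elixir 1.18+ for the built-in `JSON` module; on earlier versions, substitute a library such as Jason.

```elixir
# Validate one JSONL line before uploading (requires Elixir 1.18+ for JSON).
defmodule TrainingDataCheck do
  def validate_line(line) do
    with {:ok, %{"contents" => contents}} when is_list(contents) <- JSON.decode(line),
         true <- Enum.all?(contents, &valid_turn?/1) do
      :ok
    else
      _ -> {:error, line}
    end
  end

  # Each turn needs a user/model role and at least one text part.
  defp valid_turn?(%{"role" => role, "parts" => [%{"text" => text} | _]})
       when role in ["user", "model"] and is_binary(text),
       do: true

  defp valid_turn?(_), do: false
end
```

Run it over the whole file with `File.stream!("training-data.jsonl") |> Enum.map(&TrainingDataCheck.validate_line/1)` and inspect any `{:error, line}` results.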

2. Upload to Cloud Storage

Upload your training data to GCS:

gsutil cp training-data.jsonl gs://my-bucket/tuning/training-data.jsonl

Optionally, create validation data:

gsutil cp validation-data.jsonl gs://my-bucket/tuning/validation-data.jsonl

3. Configure Authentication

Set up Vertex AI credentials:

# Using environment variables
System.put_env("VERTEX_PROJECT_ID", "my-project-id")
System.put_env("VERTEX_LOCATION", "us-central1")
System.put_env("VERTEX_ACCESS_TOKEN", "ya29....")

# Or using application config
config :gemini, :vertex_ai,
  project_id: "my-project-id",
  location: "us-central1",
  access_token: "ya29...."

4. Create a Tuning Job

alias Gemini.Types.Tuning.CreateTuningJobConfig
alias Gemini.APIs.Tunings

# Create job configuration
config = %CreateTuningJobConfig{
  base_model: "gemini-2.5-flash-001",
  tuned_model_display_name: "customer-support-model",
  training_dataset_uri: "gs://my-bucket/tuning/training-data.jsonl",
  validation_dataset_uri: "gs://my-bucket/tuning/validation-data.jsonl",
  epoch_count: 10,
  learning_rate_multiplier: 1.0
}

# Start tuning
{:ok, job} = Tunings.tune(config, auth: :vertex_ai)

IO.puts("Job created: #{job.name}")
IO.puts("State: #{job.state}")

5. Monitor Progress

Poll the job status periodically:

# Manual polling
{:ok, job} = Tunings.get(job_name, auth: :vertex_ai)

case job.state do
  :job_state_succeeded ->
    IO.puts("Training complete!")
    IO.puts("Tuned model: #{job.tuned_model}")

  :job_state_running ->
    IO.puts("Still training...")

  :job_state_failed ->
    IO.puts("Training failed: #{job.error.message}")

  _ ->
    IO.puts("Current state: #{job.state}")
end

# Or use automatic waiting
{:ok, completed_job} = Tunings.wait_for_completion(
  job.name,
  poll_interval: 60_000,    # Check every minute
  timeout: 7_200_000,       # Wait up to 2 hours
  on_status: fn j ->
    IO.puts("State: #{j.state}")
  end,
  auth: :vertex_ai
)

6. Use the Tuned Model

Once training succeeds, use your tuned model:

{:ok, response} = Gemini.generate(
  "What is your shipping policy?",
  model: completed_job.tuned_model,
  auth: :vertex_ai
)

IO.puts(response.text)

Training Data Format

Required Structure

Each line in your JSONL file must be a complete conversation:

{
  "contents": [
    {
      "role": "user",
      "parts": [{"text": "input text"}]
    },
    {
      "role": "model",
      "parts": [{"text": "expected output"}]
    }
  ]
}

Multi-Turn Conversations

For multi-turn examples:

{
  "contents": [
    {"role": "user", "parts": [{"text": "Hello"}]},
    {"role": "model", "parts": [{"text": "Hi! How can I help you?"}]},
    {"role": "user", "parts": [{"text": "I need help with my order"}]},
    {"role": "model", "parts": [{"text": "I'd be happy to help. What's your order number?"}]}
  ]
}
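If you generate examples programmatically, a small helper keeps the structure consistent. This builder is a sketch (not a library API); encode its output with `JSON.encode!/1` on Elixir 1.18+, or a JSON library on earlier versions.

```elixir
# Build one training example from {role, text} tuples (helper sketch).
defmodule ExampleBuilder do
  def build(turns) do
    %{
      "contents" =>
        Enum.map(turns, fn {role, text} ->
          %{"role" => to_string(role), "parts" => [%{"text" => text}]}
        end)
    }
  end
end

ExampleBuilder.build([
  {:user, "Hello"},
  {:model, "Hi! How can I help you?"}
])
```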

Validation Data

Create a separate validation set (10-20% of total data):

config = %CreateTuningJobConfig{
  base_model: "gemini-2.5-flash-001",
  tuned_model_display_name: "my-model",
  training_dataset_uri: "gs://bucket/training.jsonl",
  validation_dataset_uri: "gs://bucket/validation.jsonl"  # Optional but recommended
}
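One way to carve out that hold-out set locally is to shuffle the lines and split off a fraction. A minimal sketch:

```elixir
# Split a JSONL dataset into training and validation sets (default ~90/10).
defmodule DatasetSplit do
  def split(lines, validation_fraction \\ 0.1) do
    shuffled = Enum.shuffle(lines)
    validation_count = round(length(lines) * validation_fraction)
    {validation, training} = Enum.split(shuffled, validation_count)
    {training, validation}
  end
end
```

Read the source file with `File.read!/1` plus `String.split(contents, "\n", trim: true)`, then write each half back out with `File.write!/2` before uploading both to GCS.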

Hyperparameter Tuning

Epoch Count

Number of times the model trains on the full dataset:

config = %CreateTuningJobConfig{
  # ... other fields
  epoch_count: 15  # Default: 10, Range: 1-100
}

Guidelines:

  • More epochs = better learning but risk overfitting
  • Start with default (10) and adjust based on validation metrics
  • Use validation data to detect overfitting

Learning Rate Multiplier

Controls how quickly the model adapts:

config = %CreateTuningJobConfig{
  # ... other fields
  learning_rate_multiplier: 0.5  # Default: 1.0, Range: 0.1-2.0
}

Guidelines:

  • Lower (0.3-0.7) = more stable, slower convergence
  • Higher (1.5-2.0) = faster convergence, risk of instability
  • Start with 1.0 and adjust if needed

Adapter Size

Model capacity for fine-tuning:

config = %CreateTuningJobConfig{
  # ... other fields
  adapter_size: "ADAPTER_SIZE_FOUR"
}

Options:

  • "ADAPTER_SIZE_ONE" - Smallest, fastest, least capacity
  • "ADAPTER_SIZE_FOUR" - Balanced (default)
  • "ADAPTER_SIZE_EIGHT" - Larger capacity
  • "ADAPTER_SIZE_SIXTEEN" - Maximum capacity

Guidelines:

  • Use larger adapters for complex tasks
  • Start with default and increase if underfitting

Managing Tuning Jobs

List All Jobs

# List recent jobs
{:ok, response} = Tunings.list(auth: :vertex_ai)

Enum.each(response.tuning_jobs, fn job ->
  IO.puts("#{job.tuned_model_display_name}: #{job.state}")
end)

# With pagination
{:ok, response} = Tunings.list(
  page_size: 50,
  page_token: response.next_page_token,
  auth: :vertex_ai
)

# Get all jobs automatically
{:ok, all_jobs} = Tunings.list_all(auth: :vertex_ai)

Filter Jobs

# Filter by state
{:ok, succeeded} = Tunings.list(
  filter: "state=JOB_STATE_SUCCEEDED",
  auth: :vertex_ai
)

# Filter by label
{:ok, production} = Tunings.list(
  filter: "labels.environment=production",
  auth: :vertex_ai
)

Cancel Running Jobs

{:ok, job} = Tunings.cancel(job_name, auth: :vertex_ai)

# Verify cancellation (the job passes through :job_state_cancelling first)
{:ok, updated} = Tunings.get(job_name, auth: :vertex_ai)
true = updated.state in [:job_state_cancelling, :job_state_cancelled]

Best Practices

Data Quality

  1. Curate High-Quality Examples

    • Review and validate each example
    • Remove duplicates and errors
    • Ensure consistent formatting
  2. Balance Your Dataset

    • Equal representation of different topics
    • Diverse input phrasings
    • Consistent output style
  3. Use Validation Data

    • Hold out 10-20% for validation
    • Helps detect overfitting
    • Provides performance metrics
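For the duplicate removal in point 1, exact-duplicate JSONL lines can be dropped with a one-liner (a sketch; comparing trimmed lines so trailing whitespace doesn't hide duplicates):

```elixir
# Remove exact-duplicate training examples, keeping the first occurrence.
defmodule Dedupe do
  def run(lines), do: Enum.uniq_by(lines, &String.trim/1)
end
```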

Training Strategy

  1. Start Simple

    # Initial training
    config = %CreateTuningJobConfig{
      base_model: "gemini-2.5-flash-001",
      tuned_model_display_name: "model-v1",
      training_dataset_uri: "gs://bucket/data.jsonl",
      epoch_count: 10,
      learning_rate_multiplier: 1.0
    }
  2. Iterate and Improve

    • Test the tuned model
    • Collect failure cases
    • Add to training data
    • Retrain with updated data
  3. Monitor Metrics

    {:ok, job} = Tunings.get(job_name, auth: :vertex_ai)
    
    if job.tuning_data_stats do
      IO.inspect(job.tuning_data_stats, label: "Training Statistics")
    end

Production Deployment

  1. Version Your Models

    tuned_model_display_name: "support-model-v2-#{Date.utc_today()}"
  2. Label Your Jobs

    config = %CreateTuningJobConfig{
      # ... other fields
      labels: %{
        "environment" => "production",
        "version" => "v2",
        "team" => "ml-ops"
      }
    }
  3. Test Before Deployment

    • Validate on held-out test set
    • Compare with base model
    • A/B test in production

Troubleshooting

Common Issues

"Training data not found"

  • Verify GCS URI is correct
  • Check bucket permissions
  • Ensure file is in JSONL format

"Invalid training data format"

  • Validate each line is valid JSON
  • Check contents structure
  • Ensure proper role and parts fields

"Insufficient training data"

  • Minimum 100 examples recommended
  • Add more diverse examples
  • Check for duplicates

"Job failed during training"

  • Check error message in job.error
  • Verify data quality
  • Try reducing learning rate

Getting Help

# Check job error details
{:ok, job} = Tunings.get(job_name, auth: :vertex_ai)

if job.state == :job_state_failed do
  IO.puts("Error: #{job.error.message}")
  IO.puts("Code: #{job.error.code}")
  IO.inspect(job.error.details, label: "Details")
end

Complete Example

defmodule MyApp.ModelTuning do
  alias Gemini.Types.Tuning.CreateTuningJobConfig
  alias Gemini.APIs.Tunings

  def train_customer_support_model do
    # 1. Create configuration
    config = %CreateTuningJobConfig{
      base_model: "gemini-2.5-flash-001",
      tuned_model_display_name: "support-v1-#{Date.utc_today()}",
      training_dataset_uri: "gs://my-bucket/support-training.jsonl",
      validation_dataset_uri: "gs://my-bucket/support-validation.jsonl",
      epoch_count: 15,
      learning_rate_multiplier: 0.8,
      labels: %{"team" => "support", "version" => "v1"}
    }

    # 2. Start tuning
    {:ok, job} = Tunings.tune(config, auth: :vertex_ai)
    IO.puts("Started job: #{job.name}")

    # 3. Wait for completion
    {:ok, completed} = Tunings.wait_for_completion(
      job.name,
      poll_interval: 120_000,  # 2 minutes
      on_status: &log_progress/1,
      auth: :vertex_ai
    )

    # 4. Handle result
    case completed.state do
      :job_state_succeeded ->
        IO.puts("Success! Model: #{completed.tuned_model}")
        test_model(completed.tuned_model)

      :job_state_failed ->
        IO.puts("Failed: #{completed.error.message}")

      _ ->
        IO.puts("Unexpected state: #{completed.state}")
    end
  end

  defp log_progress(job) do
    IO.puts("[#{DateTime.utc_now()}] State: #{job.state}")

    if job.tuning_data_stats do
      IO.inspect(job.tuning_data_stats, label: "Stats")
    end
  end

  defp test_model(model_name) do
    test_prompts = [
      "What is your refund policy?",
      "How do I track my order?",
      "Do you ship internationally?"
    ]

    Enum.each(test_prompts, fn prompt ->
      {:ok, response} = Gemini.generate(prompt,
        model: model_name,
        auth: :vertex_ai
      )

      IO.puts("Q: #{prompt}")
      IO.puts("A: #{response.text}\n")
    end)
  end
end
