Vllm.ForwardContext.DPMetadata (VLLM v0.3.0)


DPMetadata(max_tokens_across_dp_cpu: torch.Tensor, num_tokens_across_dp_cpu: torch.Tensor, local_sizes: list[int] | None = None)

Summary

Functions

  • chunked_sizes/5 - Context manager to compute and temporarily set the per-rank local token sizes for a specific chunk during chunked forward execution.
  • cu_tokens_across_sp/3 - Python method DPMetadata.cu_tokens_across_sp.
  • get_chunk_sizes_across_dp_rank/2 - Python method DPMetadata.get_chunk_sizes_across_dp_rank.
  • local_sizes/1
  • make/5 - Python method DPMetadata.make.
  • new/4 - Initialize self. See help(type(self)) for accurate signature.
  • sp_local_sizes/3 - Context manager for setting self.local_sizes. Same as self.chunked_sizes but without any chunking.

Types

t()

@opaque t()

Functions

chunked_sizes(ref, sequence_parallel_size, max_chunk_size_per_rank, chunk_idx, opts \\ [])

@spec chunked_sizes(SnakeBridge.Ref.t(), integer(), integer(), integer(), keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Context manager to compute and temporarily set the per-rank local token sizes for a specific chunk during chunked forward execution.

This is necessary to ensure each DP (data parallel) rank processes its designated portion of tokens in lockstep with others, even when the token counts are uneven or some ranks have completed their input early.

For chunked execution, we break up the total tokens on each rank into multiple chunks (of at most max_chunk_size_per_rank), and for a given chunk_idx, this context manager sets self.local_sizes to the number of tokens to process in that chunk on each rank.

self.local_sizes is only valid inside the context.

Parameters

  • sequence_parallel_size - When attention runs tensor-parallel (TP) and the MoE layers run expert-parallel (EP), sequence parallelism (SP) is used between those layers to avoid redundant ops; this value is needed to compute the chunked sizes.
  • max_chunk_size_per_rank - The max number of tokens each rank is allowed to process in this chunk.
  • chunk_idx - The index of the chunk to compute sizes for.

Returns

  • term()
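As an arithmetic illustration of the description above: with per-rank token counts [7, 5, 3, 0] and max_chunk_size_per_rank = 4, chunk 0 would yield local sizes [4, 4, 3, 0] and chunk 1 the remainders [3, 1, 0, 0]. A minimal usage sketch follows; it assumes a live Snakepit session, that `ref` holds a DPMetadata instance, and that the argument values are illustrative (none of them come from this page):

```elixir
# Sketch only: sequence_parallel_size = 1, max_chunk_size_per_rank = 4,
# chunk_idx = 0 are placeholder values.
case Vllm.ForwardContext.DPMetadata.chunked_sizes(ref, 1, 4, 0) do
  {:ok, result} ->
    # While the underlying Python context manager is active,
    # self.local_sizes holds the per-rank token counts for chunk 0.
    result

  {:error, %Snakepit.Error{} = err} ->
    err
end
```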

cu_tokens_across_sp(ref, sp_size, opts \\ [])

@spec cu_tokens_across_sp(SnakeBridge.Ref.t(), integer(), keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Delegates to the Python method DPMetadata.cu_tokens_across_sp, which computes cumulative token counts across sequence-parallel (SP) ranks.

Parameters

  • sp_size (integer())

Returns

  • term()
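A hedged call sketch, assuming `ref` holds a live DPMetadata instance and an sp_size of 2 (both illustrative):

```elixir
# Sketch: the underlying Python method presumably returns cumulative
# token counts across SP ranks (a torch tensor on the Python side),
# surfaced here as an opaque term().
{:ok, cu_tokens} = Vllm.ForwardContext.DPMetadata.cu_tokens_across_sp(ref, 2)
```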

get_chunk_sizes_across_dp_rank(ref, opts \\ [])

@spec get_chunk_sizes_across_dp_rank(
  SnakeBridge.Ref.t(),
  keyword()
) :: {:ok, term()} | {:error, Snakepit.Error.t()}

Python method DPMetadata.get_chunk_sizes_across_dp_rank.

Returns

  • term()

local_sizes(ref)

@spec local_sizes(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}

make(ref, parallel_config, num_tokens, num_tokens_across_dp_cpu, opts \\ [])

@spec make(SnakeBridge.Ref.t(), term(), integer(), term(), keyword()) ::
  {:ok, t()} | {:error, Snakepit.Error.t()}

Python method DPMetadata.make.

Parameters

  • parallel_config (term())
  • num_tokens (integer())
  • num_tokens_across_dp_cpu (term())

Returns

  • Vllm.ForwardContext.DPMetadata.t()
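A construction sketch, assuming a live session; `parallel_config` and `num_tokens_across_dp_cpu` would come from the running vLLM process, and the token count of 128 is a placeholder:

```elixir
# Sketch only: all bound values are illustrative, not from this page.
{:ok, dp_metadata} =
  Vllm.ForwardContext.DPMetadata.make(
    ref,
    parallel_config,          # term(): vLLM ParallelConfig on the Python side
    128,                      # num_tokens on this rank
    num_tokens_across_dp_cpu  # term(): per-rank token counts (CPU tensor)
  )
```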

new(max_tokens_across_dp_cpu, num_tokens_across_dp_cpu, args, opts \\ [])

@spec new(term(), term(), [term()], keyword()) ::
  {:ok, SnakeBridge.Ref.t()} | {:error, Snakepit.Error.t()}

Initialize self. See help(type(self)) for accurate signature.

Parameters

  • max_tokens_across_dp_cpu (term())
  • num_tokens_across_dp_cpu (term())
  • local_sizes (term() default: None)

sp_local_sizes(ref, sequence_parallel_size, opts \\ [])

@spec sp_local_sizes(SnakeBridge.Ref.t(), integer(), keyword()) ::
  {:ok, term()} | {:error, Snakepit.Error.t()}

Context manager for setting self.local_sizes. Same as self.chunked_sizes but without any chunking.

Parameters

  • sequence_parallel_size (integer())

Returns

  • term()
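A minimal sketch, assuming `ref` holds a live DPMetadata instance and a sequence_parallel_size of 1 (illustrative):

```elixir
# Sketch: with no chunking and sequence_parallel_size = 1, each rank's
# local size is simply its own token count while the underlying Python
# context manager is active.
{:ok, sizes} = Vllm.ForwardContext.DPMetadata.sp_local_sizes(ref, 1)
```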