DPMetadata(max_tokens_across_dp_cpu: torch.Tensor, num_tokens_across_dp_cpu: torch.Tensor, local_sizes: list[int] | None = None)
Summary
Functions
Context manager to compute and temporarily set the per-rank local token sizes for a specific chunk during chunked forward execution.
Python method DPMetadata.cu_tokens_across_sp.
Python method DPMetadata.get_chunk_sizes_across_dp_rank.
Python method DPMetadata.make.
Initialize self. See help(type(self)) for accurate signature.
Context manager for setting self.local_sizes. Same as self.chunked_sizes but without any chunking.
Types
Functions
@spec chunked_sizes(SnakeBridge.Ref.t(), integer(), integer(), integer(), keyword()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
Context manager to compute and temporarily set the per-rank local token
sizes for a specific chunk during chunked forward execution.
This is necessary to ensure each DP (data parallel) rank processes its designated portion of tokens in lockstep with others, even when the token counts are uneven or some ranks have completed their input early.
For chunked execution, we break up the total tokens on each rank into
multiple chunks (of at most max_chunk_size_per_rank), and for a given
chunk_idx, this context manager sets self.local_sizes to the number
of tokens to process in that chunk on each rank.
self.local_sizes is only valid inside the context.
Parameters
sequence_parallel_size(integer()) - When attention is TP and the MoE layers are EP, SP is used between the layers to avoid redundant ops. This value is needed to compute the chunked sizes.
max_chunk_size_per_rank(integer()) - The maximum number of tokens each rank is allowed to process in this chunk.
chunk_idx(integer()) - The index of the chunk to compute sizes for.
Returns
term()
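The lockstep behavior described above can be sketched in plain Python. This is an illustrative reimplementation of the chunk-size arithmetic only (ignoring sequence parallelism for simplicity); `chunk_sizes_for` is a hypothetical helper, not part of the class. Each rank contributes whatever tokens remain after the earlier chunks, capped at `max_chunk_size_per_rank`, so a rank whose input is exhausted contributes 0 tokens but still participates in the chunk:

```python
def chunk_sizes_for(num_tokens_across_dp, max_chunk_size_per_rank, chunk_idx):
    """Tokens each DP rank processes in chunk `chunk_idx` (illustrative sketch)."""
    already_done = chunk_idx * max_chunk_size_per_rank
    return [
        max(0, min(max_chunk_size_per_rank, n - already_done))
        for n in num_tokens_across_dp
    ]

# Three DP ranks with uneven token counts, chunks of at most 4 tokens per rank:
print(chunk_sizes_for([10, 7, 3], 4, 0))  # [4, 4, 3]
print(chunk_sizes_for([10, 7, 3], 4, 1))  # [4, 3, 0]  <- rank 2 finished early, stays in step
print(chunk_sizes_for([10, 7, 3], 4, 2))  # [2, 0, 0]
```

Note how every rank sees the same number of chunks even with uneven token counts, which is what keeps collective ops across ranks from deadlocking.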
@spec cu_tokens_across_sp(SnakeBridge.Ref.t(), integer(), keyword()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
Python method DPMetadata.cu_tokens_across_sp.
Parameters
sp_size(integer())
Returns
term()
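The generated doc for this method is only a stub, but the name suggests cumulative token counts across sequence-parallel ranks. As a hedged sketch of one plausible computation (an assumption, not necessarily the actual implementation): split each DP rank's token count across its `sp_size` SP ranks by ceiling division, then take a running sum:

```python
def cu_tokens_across_sp(num_tokens_across_dp, sp_size):
    """Illustrative sketch: cumulative token counts across SP ranks (assumed semantics)."""
    cu, total = [], 0
    for n in num_tokens_across_dp:
        per_sp = -(-n // sp_size)  # ceiling division: tokens per SP rank
        for _ in range(sp_size):
            total += per_sp
            cu.append(total)
    return cu

# Two DP ranks with 5 and 3 tokens, sp_size=2:
print(cu_tokens_across_sp([5, 3], 2))  # [3, 6, 8, 10]
```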
@spec get_chunk_sizes_across_dp_rank(SnakeBridge.Ref.t(), keyword()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
Python method DPMetadata.get_chunk_sizes_across_dp_rank.
Returns
term()
@spec local_sizes(SnakeBridge.Ref.t()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
@spec make(SnakeBridge.Ref.t(), term(), integer(), term(), keyword()) :: {:ok, t()} | {:error, Snakepit.Error.t()}
Python method DPMetadata.make.
Parameters
parallel_config(term())
num_tokens(integer())
num_tokens_across_dp_cpu(term())
Returns
Vllm.ForwardContext.DPMetadata.t()
@spec new(term(), term(), [term()], keyword()) :: {:ok, SnakeBridge.Ref.t()} | {:error, Snakepit.Error.t()}
Initialize self. See help(type(self)) for accurate signature.
Parameters
max_tokens_across_dp_cpu(term())
num_tokens_across_dp_cpu(term())
local_sizes(term(), default: None)
@spec sp_local_sizes(SnakeBridge.Ref.t(), integer(), keyword()) :: {:ok, term()} | {:error, Snakepit.Error.t()}
Context manager for setting self.local_sizes. Same as self.chunked_sizes
but without any chunking.
Parameters
sequence_parallel_size(integer())
Returns
term()
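The context-manager shape of this method can be sketched in Python. This is a hypothetical standalone version, not the real method, which sets and clears `self.local_sizes` on the instance; the sizing rule (each DP rank's token count ceil-divided across its SP ranks) is an assumption consistent with "chunked_sizes without any chunking":

```python
from contextlib import contextmanager

@contextmanager
def sp_local_sizes(num_tokens_across_dp, sequence_parallel_size):
    # Assumed sizing: each DP rank's tokens split (ceiling) across its SP ranks.
    local_sizes = [-(-n // sequence_parallel_size) for n in num_tokens_across_dp]
    try:
        yield local_sizes  # valid only inside the `with` block
    finally:
        local_sizes = None  # the real class resets self.local_sizes here

# Three DP ranks, sequence_parallel_size=2:
with sp_local_sizes([10, 7, 3], 2) as sizes:
    print(sizes)  # [5, 4, 2]
```

The try/finally around the `yield` is what makes "self.local_sizes is only valid inside the context" hold: the attribute is restored even if the body raises.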