# VLLM v0.3.0 - Table of Contents

vLLM for Elixir via SnakeBridge - Easy, fast, and cheap LLM serving for everyone.
High-throughput LLM inference with PagedAttention, continuous batching, and an OpenAI-compatible API.

## Pages

- [LICENSE](license.md)

- Guides
  - [README](readme.md)
  - [Quickstart Guide](quickstart.md)

- Features
  - [Offline Inference](offline_inference.md)
  - [Online Serving](online_serving.md)
  - [Sampling Parameters](sampling_params.md)
  - [Configuration Guide](configuration.md)
  - [Multimodal Models](multimodal.md)
  - [LoRA Adapters](lora.md)
  - [Structured Outputs](structured_outputs.md)

- Reference
  - [Supported Models](supported_models.md)
  - [Quantization](quantization.md)

- Examples
  - [VLLM Examples](examples.md)

- Release Notes
  - [Changelog](changelog.md)

## Modules

- [GracefulSerialization.Helpers](GracefulSerialization.Helpers.md): Helper wrappers for `graceful_serialization`.
- [VLLM.ConfigHelper](VLLM.ConfigHelper.md): Runtime configuration helper for using vLLM safely via SnakeBridge/Snakepit.
- [Vllm](Vllm.md): vLLM: a high-throughput and memory-efficient inference engine for LLMs
- [Vllm.Assets](Vllm.Assets.md): Submodule bindings for `vllm.assets`.
- [Vllm.AsyncLLMEngine](Vllm.AsyncLLMEngine.md): Protocol class for Clients to Engine
- [Vllm.Attention](Vllm.Attention.md): Submodule bindings for `vllm.attention`.
- [Vllm.BeamSearch](Vllm.BeamSearch.md): Submodule bindings for `vllm.beam_search`.
- [Vllm.BeamSearch.BeamSearchInstance](Vllm.BeamSearch.BeamSearchInstance.md): Wrapper for Python class BeamSearchInstance.
- [Vllm.BeamSearch.BeamSearchOutput](Vllm.BeamSearch.BeamSearchOutput.md): The output of beam search.
- [Vllm.BeamSearch.BeamSearchSequence](Vllm.BeamSearch.BeamSearchSequence.md): A sequence for beam search.
- [Vllm.BeamSearch.LoRARequest](Vllm.BeamSearch.LoRARequest.md): Request for a LoRA adapter.
- [Vllm.BeamSearch.Logprob](Vllm.BeamSearch.Logprob.md): Infos for supporting OpenAI compatible logprobs and token ranks.
- [Vllm.Benchmarks](Vllm.Benchmarks.md): Submodule bindings for `vllm.benchmarks`.
- [Vllm.CollectEnv](Vllm.CollectEnv.md): Submodule bindings for `vllm.collect_env`.
- [Vllm.CollectEnv.SystemEnv](Vllm.CollectEnv.SystemEnv.md): `SystemEnv(torch_version, is_debug_build, cuda_compiled_version, gcc_version, clang_version, cmake_version, os, libc_version, python_version, python_platform, is_cuda_available, cuda_runtime_version, cuda_module_loading, nvidia_driver_version, nvidia_gpu_models, cudnn_version, pip_version, pip_packages, conda_packages, hip_compiled_version, hip_runtime_version, miopen_runtime_version, caching_allocator_config, is_xnnpack_available, cpu_info, rocm_version, vllm_version, vllm_build_flags, gpu_topo, env_vars)`
- [Vllm.Compilation](Vllm.Compilation.md): Submodule bindings for `vllm.compilation`.
- [Vllm.Config](Vllm.Config.md): Submodule bindings for `vllm.config`.
- [Vllm.Config.AttentionConfig](Vllm.Config.AttentionConfig.md): Configuration for attention mechanisms in vLLM.
- [Vllm.Config.CUDAGraphMode](Vllm.Config.CUDAGraphMode.md): Constants for the cudagraph mode in CompilationConfig.
- [Vllm.Config.CacheConfig](Vllm.Config.CacheConfig.md): Configuration for the KV cache.
- [Vllm.Config.CompilationConfig](Vllm.Config.CompilationConfig.md): Configuration for compilation.
- [Vllm.Config.CompilationMode](Vllm.Config.CompilationMode.md): The compilation approach used for torch.compile-based compilation of the
- [Vllm.Config.DeviceConfig](Vllm.Config.DeviceConfig.md): Configuration for the device to use for vLLM execution.
- [Vllm.Config.ECTransferConfig](Vllm.Config.ECTransferConfig.md): Configuration for distributed EC cache transfer.
- [Vllm.Config.EPLBConfig](Vllm.Config.EPLBConfig.md): Configuration for Expert Parallel Load Balancing (EP).
- [Vllm.Config.KVEventsConfig](Vllm.Config.KVEventsConfig.md): Configuration for KV event publishing.
- [Vllm.Config.KVTransferConfig](Vllm.Config.KVTransferConfig.md): Configuration for distributed KV cache transfer.
- [Vllm.Config.LoRAConfig](Vllm.Config.LoRAConfig.md): Configuration for LoRA.
- [Vllm.Config.LoadConfig](Vllm.Config.LoadConfig.md): Configuration for loading the model weights.
- [Vllm.Config.ModelConfig](Vllm.Config.ModelConfig.md): Configuration for the model.
- [Vllm.Config.MultiModalConfig](Vllm.Config.MultiModalConfig.md): Controls the behavior of multimodal models.
- [Vllm.Config.ObservabilityConfig](Vllm.Config.ObservabilityConfig.md): Configuration for observability - metrics and tracing.
- [Vllm.Config.ParallelConfig](Vllm.Config.ParallelConfig.md): Configuration for the distributed execution.
- [Vllm.Config.PassConfig](Vllm.Config.PassConfig.md): Configuration for custom Inductor passes.
- [Vllm.Config.PoolerConfig](Vllm.Config.PoolerConfig.md): Controls the behavior of output pooling in pooling models.
- [Vllm.Config.ProfilerConfig](Vllm.Config.ProfilerConfig.md): Dataclass which contains profiler config for the engine.
- [Vllm.Config.SchedulerConfig](Vllm.Config.SchedulerConfig.md): Scheduler configuration.
- [Vllm.Config.SpeculativeConfig](Vllm.Config.SpeculativeConfig.md): Configuration for speculative decoding.
- [Vllm.Config.SpeechToTextConfig](Vllm.Config.SpeechToTextConfig.md): Configuration for speech-to-text models.
- [Vllm.Config.StructuredOutputsConfig](Vllm.Config.StructuredOutputsConfig.md): Dataclass which contains structured outputs config for the engine.
- [Vllm.Config.SupportsMetricsInfo](Vllm.Config.SupportsMetricsInfo.md): Wrapper for Python class SupportsMetricsInfo.
- [Vllm.Config.VllmConfig](Vllm.Config.VllmConfig.md): Dataclass which contains all vLLM-related configuration.
- [Vllm.Connections](Vllm.Connections.md): Submodule bindings for `vllm.connections`.
- [Vllm.Connections.HTTPConnection](Vllm.Connections.HTTPConnection.md): Helper class to send HTTP requests.
- [Vllm.DeviceAllocator](Vllm.DeviceAllocator.md): Submodule bindings for `vllm.device_allocator`.
- [Vllm.Distributed](Vllm.Distributed.md): Submodule bindings for `vllm.distributed`.
- [Vllm.Distributed.DeviceCommunicatorBase](Vllm.Distributed.DeviceCommunicatorBase.md): Base class for device-specific communicator.
- [Vllm.Distributed.GraphCaptureContext](Vllm.Distributed.GraphCaptureContext.md): `GraphCaptureContext(stream: torch.cuda.streams.Stream)`
- [Vllm.Distributed.GroupCoordinator](Vllm.Distributed.GroupCoordinator.md): PyTorch ProcessGroup wrapper for a group of processes.
- [Vllm.Distributed.StatelessProcessGroup](Vllm.Distributed.StatelessProcessGroup.md): A dataclass to hold a metadata store, and the rank, world_size of the
- [Vllm.Distributed.TensorMetadata](Vllm.Distributed.TensorMetadata.md): `TensorMetadata(device, dtype, size)`
- [Vllm.Engine](Vllm.Engine.md): Submodule bindings for `vllm.engine`.
- [Vllm.Entrypoints](Vllm.Entrypoints.md): Submodule bindings for `vllm.entrypoints`.
- [Vllm.EnvOverride](Vllm.EnvOverride.md): Submodule bindings for `vllm.env_override`.
- [Vllm.Envs](Vllm.Envs.md): Submodule bindings for `vllm.envs`.
- [Vllm.Exceptions](Vllm.Exceptions.md): Custom exceptions for vLLM.
- [Vllm.Exceptions.VLLMValidationError](Vllm.Exceptions.VLLMValidationError.md): vLLM-specific validation error for request validation failures.
- [Vllm.ForwardContext](Vllm.ForwardContext.md): `ForwardContext(no_compile_layers: dict[str, typing.Any], attn_metadata: dict[str, vllm.v1.attention.backend.AttentionMetadata] | list[dict[str, vllm.v1.attention.backend.AttentionMetadata]], virtual_engine: int, dp_metadata: vllm.forward_context.DPMetadata | None = None, cudagraph_runtime_mode: vllm.config.compilation.CUDAGraphMode = <CUDAGraphMode.NONE: 0>, batch_descriptor: vllm.forward_context.BatchDescriptor | None = None, ubatch_slices: list[vllm.v1.worker.ubatch_utils.UBatchSlice] | None = None, additional_kwargs: dict[str, typing.Any] = <factory>)`
- [Vllm.ForwardContext.AttentionMetadata](Vllm.ForwardContext.AttentionMetadata.md): Wrapper for Python class AttentionMetadata.
- [Vllm.ForwardContext.BatchDescriptor](Vllm.ForwardContext.BatchDescriptor.md): Batch descriptor for cudagraph dispatching.
- [Vllm.ForwardContext.DPMetadata](Vllm.ForwardContext.DPMetadata.md): `DPMetadata(max_tokens_across_dp_cpu: torch.Tensor, num_tokens_across_dp_cpu: torch.Tensor, local_sizes: list[int] | None = None)`
- [Vllm.ForwardContext.Module](Vllm.ForwardContext.Module.md): Submodule bindings for `vllm.forward_context`.
- [Vllm.Grpc](Vllm.Grpc.md): vLLM gRPC protocol definitions.
- [Vllm.Inputs](Vllm.Inputs.md): Submodule bindings for `vllm.inputs`.
- [Vllm.Inputs.DataPrompt](Vllm.Inputs.DataPrompt.md): Represents generic inputs handled by IO processor plugins.
- [Vllm.Inputs.EmbedsInputs](Vllm.Inputs.EmbedsInputs.md): Represents embeddings-based inputs.
- [Vllm.Inputs.EmbedsPrompt](Vllm.Inputs.EmbedsPrompt.md): Schema for a prompt provided via token embeddings.
- [Vllm.Inputs.EncoderDecoderInputs](Vllm.Inputs.EncoderDecoderInputs.md): The inputs in `LLMEngine` (`vllm.engine.llm_engine.LLMEngine`) before they
- [Vllm.Inputs.ExplicitEncoderDecoderPrompt](Vllm.Inputs.ExplicitEncoderDecoderPrompt.md): Represents an encoder/decoder model input prompt,
- [Vllm.Inputs.TextPrompt](Vllm.Inputs.TextPrompt.md): Schema for a text prompt.
- [Vllm.Inputs.TokenInputs](Vllm.Inputs.TokenInputs.md): Represents token-based inputs.
- [Vllm.Inputs.TokensPrompt](Vllm.Inputs.TokensPrompt.md): Schema for a tokenized prompt.
- [Vllm.LLM](Vllm.LLM.md): An LLM for generating texts from given prompts and sampling parameters.
- [Vllm.LLMEngine](Vllm.LLMEngine.md): Legacy LLMEngine for backwards compatibility.
- [Vllm.Logger](Vllm.Logger.md): Logging configuration for vLLM.
- [Vllm.Logger.ColoredFormatter](Vllm.Logger.ColoredFormatter.md): Wrapper for Python class ColoredFormatter.
- [Vllm.Logger.NewLineFormatter](Vllm.Logger.NewLineFormatter.md): Wrapper for Python class NewLineFormatter.
- [Vllm.Logger.VllmLogger](Vllm.Logger.VllmLogger.md): Note
- [Vllm.LoggingUtils](Vllm.LoggingUtils.md): Submodule bindings for `vllm.logging_utils`.
- [Vllm.LoggingUtils.ColoredFormatter](Vllm.LoggingUtils.ColoredFormatter.md): Adds ANSI color codes to log levels for terminal output.
- [Vllm.LoggingUtils.NewLineFormatter](Vllm.LoggingUtils.NewLineFormatter.md): Adds logging prefix to newlines to align multi-line messages.
- [Vllm.LogitsProcess](Vllm.LogitsProcess.md): Submodule bindings for `vllm.logits_process`.
- [Vllm.LogitsProcess.NoBadWordsLogitsProcessor](Vllm.LogitsProcess.NoBadWordsLogitsProcessor.md): Wrapper for Python class NoBadWordsLogitsProcessor.
- [Vllm.LogitsProcess.TokenizerLike](Vllm.LogitsProcess.TokenizerLike.md): Wrapper for Python class TokenizerLike.
- [Vllm.Logprobs](Vllm.Logprobs.md): Submodule bindings for `vllm.logprobs`.
- [Vllm.Logprobs.FlatLogprobs](Vllm.Logprobs.FlatLogprobs.md): Flat logprobs of a request into multiple primitive type lists.
- [Vllm.Logprobs.Logprob](Vllm.Logprobs.Logprob.md): Infos for supporting OpenAI compatible logprobs and token ranks.
- [Vllm.Lora](Vllm.Lora.md): Submodule bindings for `vllm.lora`.
- [Vllm.ModelExecutor](Vllm.ModelExecutor.md): Submodule bindings for `vllm.model_executor`.
- [Vllm.ModelExecutor.BasevLLMParameter](Vllm.ModelExecutor.BasevLLMParameter.md): Base parameter for vLLM linear layers. Extends `torch.nn.Parameter`.
- [Vllm.ModelExecutor.Models.Adapters](Vllm.ModelExecutor.Models.Adapters.md): Submodule bindings for `vllm.model_executor.models.adapters`.
- [Vllm.ModelExecutor.Models.Interfaces](Vllm.ModelExecutor.Models.Interfaces.md): Submodule bindings for `vllm.model_executor.models.interfaces`.
- [Vllm.ModelExecutor.Models.InterfacesBase](Vllm.ModelExecutor.Models.InterfacesBase.md): Submodule bindings for `vllm.model_executor.models.interfaces_base`.
- [Vllm.ModelExecutor.PackedvLLMParameter](Vllm.ModelExecutor.PackedvLLMParameter.md): Parameter for model weights which are packed on disk.
- [Vllm.ModelInspection](Vllm.ModelInspection.md): Model inspection utilities for vLLM.
- [Vllm.Multimodal](Vllm.Multimodal.md): Submodule bindings for `vllm.multimodal`.
- [Vllm.Multimodal.Inputs](Vllm.Multimodal.Inputs.md): Submodule bindings for `vllm.multimodal.inputs`.
- [Vllm.Multimodal.Inputs.MultiModalFieldConfig](Vllm.Multimodal.Inputs.MultiModalFieldConfig.md): `MultiModalFieldConfig(field: vllm.multimodal.inputs.BaseMultiModalField, modality: str)`
- [Vllm.Multimodal.Inputs.MultiModalFieldElem](Vllm.Multimodal.Inputs.MultiModalFieldElem.md): Represents a keyword argument inside a
- [Vllm.Multimodal.Inputs.MultiModalInputs](Vllm.Multimodal.Inputs.MultiModalInputs.md): Represents the outputs of
- [Vllm.Multimodal.Inputs.MultiModalKwargsItem](Vllm.Multimodal.Inputs.MultiModalKwargsItem.md): A collection of
- [Vllm.Multimodal.Inputs.MultiModalKwargsItems](Vllm.Multimodal.Inputs.MultiModalKwargsItems.md): A dictionary of
- [Vllm.Multimodal.Inputs.PlaceholderRange](Vllm.Multimodal.Inputs.PlaceholderRange.md): Placeholder location information for multi-modal data.
- [Vllm.Multimodal.MultiModalDataBuiltins](Vllm.Multimodal.MultiModalDataBuiltins.md): Type annotations for modality types predefined by vLLM.
- [Vllm.Multimodal.MultiModalHasher](Vllm.Multimodal.MultiModalHasher.md): Wrapper for Python class MultiModalHasher.
- [Vllm.Multimodal.MultiModalKwargsItems](Vllm.Multimodal.MultiModalKwargsItems.md): A dictionary of
- [Vllm.Multimodal.MultiModalRegistry](Vllm.Multimodal.MultiModalRegistry.md): A registry that dispatches data processing according to the model.
- [Vllm.Multimodal.Parse](Vllm.Multimodal.Parse.md): Submodule bindings for `vllm.multimodal.parse`.
- [Vllm.Multimodal.Processing](Vllm.Multimodal.Processing.md): Submodule bindings for `vllm.multimodal.processing`.
- [Vllm.Multimodal.Registry](Vllm.Multimodal.Registry.md): Submodule bindings for `vllm.multimodal.registry`.
- [Vllm.Outputs](Vllm.Outputs.md): Submodule bindings for `vllm.outputs`.
- [Vllm.Outputs.ClassificationOutput](Vllm.Outputs.ClassificationOutput.md): The output data of one classification output of a request.
- [Vllm.Outputs.ClassificationRequestOutput](Vllm.Outputs.ClassificationRequestOutput.md): The output data of a pooling request to the LLM.
- [Vllm.Outputs.CompletionOutput](Vllm.Outputs.CompletionOutput.md): The output data of one completion output of a request.
- [Vllm.Outputs.EmbeddingOutput](Vllm.Outputs.EmbeddingOutput.md): The output data of one embedding output of a request.
- [Vllm.Outputs.EmbeddingRequestOutput](Vllm.Outputs.EmbeddingRequestOutput.md): The output data of a pooling request to the LLM.
- [Vllm.Outputs.PoolingOutput](Vllm.Outputs.PoolingOutput.md): The output data of one pooling output of a request.
- [Vllm.Outputs.PoolingRequestOutput](Vllm.Outputs.PoolingRequestOutput.md): The output data of a pooling request to the LLM.
- [Vllm.Outputs.RequestOutput](Vllm.Outputs.RequestOutput.md): The output data of a completion request to the LLM.
- [Vllm.Outputs.RequestStateStats](Vllm.Outputs.RequestStateStats.md): Stats that need to be tracked across delta updates.
- [Vllm.Outputs.ScoringOutput](Vllm.Outputs.ScoringOutput.md): The output data of one scoring output of a request.
- [Vllm.Outputs.ScoringRequestOutput](Vllm.Outputs.ScoringRequestOutput.md): The output data of a pooling request to the LLM.
- [Vllm.Platforms](Vllm.Platforms.md): Submodule bindings for `vllm.platforms`.
- [Vllm.Platforms.CpuArchEnum](Vllm.Platforms.CpuArchEnum.md): Enum members for `CpuArchEnum`.
- [Vllm.Platforms.Platform](Vllm.Platforms.Platform.md): Wrapper for Python class Platform.
- [Vllm.Platforms.PlatformEnum](Vllm.Platforms.PlatformEnum.md): Enum members for `PlatformEnum`.
- [Vllm.Plugins](Vllm.Plugins.md): Submodule bindings for `vllm.plugins`.
- [Vllm.PoolingParams](Vllm.PoolingParams.md): API parameters for pooling models.
- [Vllm.PoolingParams.Module](Vllm.PoolingParams.Module.md): Submodule bindings for `vllm.pooling_params`.
- [Vllm.PoolingParams.RequestOutputKind](Vllm.PoolingParams.RequestOutputKind.md): Enum members for `RequestOutputKind`.
- [Vllm.PoolingParamsClass](Vllm.PoolingParamsClass.md): API parameters for pooling models.
- [Vllm.Profiler](Vllm.Profiler.md): Submodule bindings for `vllm.profiler`.
- [Vllm.Ray](Vllm.Ray.md): Submodule bindings for `vllm.ray`.
- [Vllm.Reasoning](Vllm.Reasoning.md): Submodule bindings for `vllm.reasoning`.
- [Vllm.Reasoning.ReasoningParser](Vllm.Reasoning.ReasoningParser.md): Abstract reasoning parser class that should not be used directly.
- [Vllm.Reasoning.ReasoningParserManager](Vllm.Reasoning.ReasoningParserManager.md): Central registry for ReasoningParser implementations.
- [Vllm.SamplingParams](Vllm.SamplingParams.md): Sampling parameters for text generation.
- [Vllm.SamplingParams.BeamSearchParams](Vllm.SamplingParams.BeamSearchParams.md): Beam search parameters for text generation.
- [Vllm.SamplingParams.Module](Vllm.SamplingParams.Module.md): Sampling parameters for text generation.
- [Vllm.SamplingParams.PydanticMsgspecMixin](Vllm.SamplingParams.PydanticMsgspecMixin.md): Sampling parameters for text generation.
- [Vllm.SamplingParams.RequestOutputKind](Vllm.SamplingParams.RequestOutputKind.md): Enum members for `RequestOutputKind`.
- [Vllm.SamplingParams.SamplingType](Vllm.SamplingParams.SamplingType.md): Enum members for `SamplingType`.
- [Vllm.SamplingParams.StructuredOutputsParams](Vllm.SamplingParams.StructuredOutputsParams.md): Sampling parameters for text generation.
- [Vllm.SamplingParams.TokenizerLike](Vllm.SamplingParams.TokenizerLike.md): Sampling parameters for text generation.
- [Vllm.SamplingParamsClass](Vllm.SamplingParamsClass.md): Sampling parameters for text generation.
- [Vllm.ScalarType](Vllm.ScalarType.md): ScalarType can represent a wide range of floating point and integer
- [Vllm.ScalarType.Module](Vllm.ScalarType.Module.md): Submodule bindings for `vllm.scalar_type`.
- [Vllm.ScalarType.NanRepr](Vllm.ScalarType.NanRepr.md): Enum members for `NanRepr`.
- [Vllm.ScalarType.ScalarTypes](Vllm.ScalarType.ScalarTypes.md): Wrapper for Python class scalar_types.
- [Vllm.Scripts](Vllm.Scripts.md): Submodule bindings for `vllm.scripts`.
- [Vllm.Sequence](Vllm.Sequence.md): Sequence and its related classes.
- [Vllm.Sequence.IntermediateTensors](Vllm.Sequence.IntermediateTensors.md): For all pipeline stages except the last, we need to return the hidden
- [Vllm.Sequence.KVConnectorOutput](Vllm.Sequence.KVConnectorOutput.md): Special type indicating an unconstrained type.
- [Vllm.Tasks](Vllm.Tasks.md): Submodule bindings for `vllm.tasks`.
- [Vllm.Tokenizers](Vllm.Tokenizers.md): Submodule bindings for `vllm.tokenizers`.
- [Vllm.Tokenizers.TokenizerLike](Vllm.Tokenizers.TokenizerLike.md): Wrapper for Python class TokenizerLike.
- [Vllm.ToolParsers](Vllm.ToolParsers.md): Submodule bindings for `vllm.tool_parsers`.
- [Vllm.ToolParsers.ToolParser](Vllm.ToolParsers.ToolParser.md): Abstract ToolParser class that should not be used directly.
- [Vllm.ToolParsers.ToolParserManager](Vllm.ToolParsers.ToolParserManager.md): Central registry for ToolParser implementations.
- [Vllm.Tracing](Vllm.Tracing.md): Submodule bindings for `vllm.tracing`.
- [Vllm.Tracing.BaseSpanAttributes](Vllm.Tracing.BaseSpanAttributes.md): Wrapper for Python class BaseSpanAttributes.
- [Vllm.Tracing.SpanAttributes](Vllm.Tracing.SpanAttributes.md): Wrapper for Python class SpanAttributes.
- [Vllm.TransformersUtils](Vllm.TransformersUtils.md): Submodule bindings for `vllm.transformers_utils`.
- [Vllm.TritonUtils](Vllm.TritonUtils.md): Submodule bindings for `vllm.triton_utils`.
- [Vllm.TritonUtils.TritonLanguagePlaceholder](Vllm.TritonUtils.TritonLanguagePlaceholder.md): Wrapper for Python class TritonLanguagePlaceholder.
- [Vllm.TritonUtils.TritonPlaceholder](Vllm.TritonUtils.TritonPlaceholder.md): Wrapper for Python class TritonPlaceholder.
- [Vllm.Usage](Vllm.Usage.md): Submodule bindings for `vllm.usage`.
- [Vllm.Utils](Vllm.Utils.md): Submodule bindings for `vllm.utils`.
- [Vllm.V1](Vllm.V1.md): Submodule bindings for `vllm.v1`.
- [Vllm.Version](Vllm.Version.md): Submodule bindings for `vllm.version`.

- Core API
  - [VLLM](VLLM.md): VLLM - vLLM for Elixir via SnakeBridge.