Containerization packages your pipeline code, dependencies, and environment into Docker images for reproducible execution across different infrastructure. ZenML handles Docker image building automatically while giving you full control over the build process.
Why Containerization Matters
Containerized orchestrators (Kubernetes, AWS SageMaker, GCP Vertex AI, etc.) run pipeline steps in isolated Docker containers. This ensures:
Reproducibility: Same environment across development, staging, and production
Isolation: Steps run in clean environments without conflicting dependencies
Portability: Run anywhere Docker runs, whether local, cloud, or on-premise
Version Control: Track exact versions of all dependencies and code
Docker Settings
Configure Docker builds using DockerSettings:
from zenml import pipeline
from zenml.config import DockerSettings

docker_settings = DockerSettings(
    # Base image
    parent_image="python:3.9-slim",
    # Python packages
    requirements=["scikit-learn==1.3.0", "pandas==2.0.0"],
    # System packages
    apt_packages=["git", "wget"],
    # Environment variables
    environment={
        "LOG_LEVEL": "INFO",
        "CUSTOM_CONFIG": "production",
    },
)


@pipeline(settings={"docker": docker_settings})
def ml_pipeline():
    # Pipeline steps here
    pass
The ContainerizedOrchestrator Base Class
Orchestrators that run steps in containers inherit from ContainerizedOrchestrator:
from typing import List, Optional

from zenml.config.build_configuration import BuildConfiguration
# The two imports below follow recent ZenML releases; paths may differ in yours.
from zenml.constants import ORCHESTRATOR_DOCKER_IMAGE_KEY
from zenml.models import PipelineSnapshotBase, PipelineSnapshotResponse
from zenml.orchestrators import ContainerizedOrchestrator


class MyOrchestrator(ContainerizedOrchestrator):
    """Orchestrator that runs steps in Docker containers."""

    @staticmethod
    def get_image(
        snapshot: PipelineSnapshotResponse,
        step_name: Optional[str] = None,
    ) -> str:
        """Get the Docker image for a pipeline or step.

        Args:
            snapshot: Pipeline snapshot
            step_name: Optional step name for step-specific image

        Returns:
            Image name or digest
        """
        if not snapshot.build:
            raise RuntimeError(f"Missing build for snapshot {snapshot.id}")
        return snapshot.build.get_image(
            component_key=ORCHESTRATOR_DOCKER_IMAGE_KEY,
            step=step_name,
        )

    def get_docker_builds(
        self, snapshot: PipelineSnapshotBase
    ) -> List[BuildConfiguration]:
        """Get required Docker builds.

        Args:
            snapshot: Pipeline snapshot

        Returns:
            List of build configurations
        """
        pipeline_settings = snapshot.pipeline_configuration.docker_settings
        builds: List[BuildConfiguration] = []
        # Check each step for custom Docker settings
        for name, step in snapshot.step_configurations.items():
            step_settings = step.config.docker_settings
            if step_settings != pipeline_settings:
                # Step has custom settings, needs separate image
                builds.append(
                    BuildConfiguration(
                        key=ORCHESTRATOR_DOCKER_IMAGE_KEY,
                        settings=step_settings,
                        step_name=name,
                    )
                )
            elif not any(b.step_name is None for b in builds):
                # Use the pipeline-level image, added only once
                builds.append(
                    BuildConfiguration(
                        key=ORCHESTRATOR_DOCKER_IMAGE_KEY,
                        settings=pipeline_settings,
                    )
                )
        return builds
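The grouping logic in get_docker_builds can be tried out without a ZenML installation. The following is a minimal sketch, not ZenML's implementation: FakeDockerSettings, FakeBuild, and plan_builds are hypothetical stand-ins for DockerSettings, BuildConfiguration, and the method above, reduced to the fields needed to show the idea.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class FakeDockerSettings:
    """Hypothetical stand-in for DockerSettings (two fields only)."""

    parent_image: Optional[str] = None
    requirements: Tuple[str, ...] = ()


@dataclass(frozen=True)
class FakeBuild:
    """Hypothetical stand-in for BuildConfiguration."""

    settings: FakeDockerSettings
    step_name: Optional[str] = None


def plan_builds(
    pipeline_settings: FakeDockerSettings,
    step_settings: Dict[str, FakeDockerSettings],
) -> List[FakeBuild]:
    """Return one shared build plus one build per step that overrides it."""
    builds: List[FakeBuild] = []
    for name, settings in step_settings.items():
        if settings != pipeline_settings:
            # Step overrides the pipeline settings: it needs its own image
            builds.append(FakeBuild(settings=settings, step_name=name))
        elif not any(b.step_name is None for b in builds):
            # First step using the defaults: add the shared image once
            builds.append(FakeBuild(settings=pipeline_settings))
    return builds


default = FakeDockerSettings(requirements=("pandas==2.0.0",))
gpu = FakeDockerSettings(
    parent_image="pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime",
    requirements=("transformers==4.30.0",),
)
builds = plan_builds(
    default,
    {"load_data": default, "evaluate": default, "train_model": gpu},
)
print([b.step_name for b in builds])  # [None, 'train_model']
```

Two steps share the pipeline-level settings, so only two images are planned: the shared one (step_name is None) and one for train_model.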
Image Building Process
ZenML’s image building follows this process:
┌─────────────────────────────────────────────────────────────┐
│ 1. Determine Base Image                                     │
│  - Custom dockerfile specified? Use it as base              │
│  - Otherwise: Use parent_image or ZenML default             │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ 2. Collect Requirements                                     │
│  - Local environment (if replicate_local_python_environment)│
│  - Stack requirements (if install_stack_requirements)       │
│  - Integration requirements (required_integrations)         │
│  - Pyproject.toml (if pyproject_path specified)             │
│  - Explicit requirements (requirements)                     │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ 3. Build Image                                              │
│  - Install system packages (apt_packages)                   │
│  - Set environment variables (environment)                  │
│  - Install Python packages                                  │
│  - Copy code files (if allowed)                             │
│  - Set runtime environment (runtime_environment)            │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ 4. Push to Registry                                         │
│  - Tag with target_repository (or default)                  │
│  - Push to container registry from stack                    │
└─────────────────────────────────────────────────────────────┘
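To make step 3 of the diagram concrete, here is a rough sketch of how such settings could translate into Dockerfile instructions. ZenML generates its own Dockerfile internally and its output will differ; render_dockerfile is a hypothetical helper written for this page, and the /app working directory and the plain pip install line are assumptions chosen for illustration.

```python
from typing import Dict, List, Optional


def render_dockerfile(
    parent_image: str,
    apt_packages: Optional[List[str]] = None,
    environment: Optional[Dict[str, str]] = None,
    requirements: Optional[List[str]] = None,
    include_files: bool = True,
    runtime_environment: Optional[Dict[str, str]] = None,
) -> str:
    """Render Dockerfile text in the order shown in the diagram."""
    lines = [f"FROM {parent_image}"]
    if apt_packages:
        # System packages come first so Python builds can rely on them
        lines.append(
            "RUN apt-get update && apt-get install -y --no-install-recommends "
            + " ".join(apt_packages)
        )
    for key, value in (environment or {}).items():
        # Build-time variables are set before requirements are installed
        lines.append(f'ENV {key}="{value}"')
    if requirements:
        lines.append(
            "RUN pip install --no-cache-dir "
            + " ".join(f'"{req}"' for req in requirements)
        )
    if include_files:
        lines += ["WORKDIR /app", "COPY . ."]
    for key, value in (runtime_environment or {}).items():
        # Runtime variables are set last, after installation
        lines.append(f'ENV {key}="{value}"')
    return "\n".join(lines)


dockerfile = render_dockerfile(
    parent_image="python:3.9-slim",
    apt_packages=["git", "wget"],
    environment={"LOG_LEVEL": "INFO"},
    requirements=["scikit-learn==1.3.0", "pandas==2.0.0"],
)
print(dockerfile)
```

Placing the code copy after the package installation means a code-only change does not invalidate the cached dependency layers.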
Requirements Installation Order
Dependencies are installed in this specific order:
# Pseudocode: the install_* helpers are illustrative, not ZenML functions.

# 1. Local environment packages (optional)
if docker_settings.replicate_local_python_environment:
    # Captures `pip freeze` or `uv pip freeze` output
    install_from_local_environment()

# 2. Stack requirements (optional)
if docker_settings.install_stack_requirements:
    # Installs requirements for all stack components
    install_stack_dependencies()

# 3. Integration requirements
for integration in docker_settings.required_integrations:
    # Installs ZenML integration packages
    install_integration(integration)

# 4. Pyproject.toml (optional)
if docker_settings.pyproject_path:
    # Exports and installs from pyproject.toml
    install_from_pyproject()

# 5. Explicit requirements
if docker_settings.requirements:
    # Installs from requirements file or list
    install_requirements()
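The ordering above can also be expressed as a small runnable function. This is a sketch under stated assumptions: requirement_sources is a hypothetical helper, the settings are passed as a plain dictionary rather than a DockerSettings object, and the source labels are made up for readability.

```python
from typing import Any, Dict, List, Tuple

# (label, settings field) pairs in the installation order described above
INSTALL_ORDER = [
    ("local_environment", "replicate_local_python_environment"),
    ("stack", "install_stack_requirements"),
    ("integrations", "required_integrations"),
    ("pyproject", "pyproject_path"),
    ("explicit", "requirements"),
]


def requirement_sources(settings: Dict[str, Any]) -> List[Tuple[str, Any]]:
    """List the (label, value) pairs that apply, in installation order."""
    return [
        (label, settings[field])
        for label, field in INSTALL_ORDER
        if settings.get(field)  # skip unset, False, and empty values
    ]


plan = requirement_sources(
    {
        "install_stack_requirements": True,
        "required_integrations": ["sklearn", "mlflow"],
        "requirements": ["pandas==2.0.0"],
    }
)
for label, value in plan:
    print(label, value)
```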
Common Configuration Patterns
Basic Configuration
Simple setup with explicit requirements:
from zenml.config import DockerSettings

docker_settings = DockerSettings(
    requirements=[
        "scikit-learn==1.3.0",
        "pandas==2.0.0",
        "numpy==1.24.0",
    ],
)
Using Requirements File
# requirements.txt in your project root
# scikit-learn==1.3.0
# pandas==2.0.0
# numpy==1.24.0

docker_settings = DockerSettings(
    requirements="requirements.txt",
)
Custom Base Image
docker_settings = DockerSettings(
    parent_image="pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime",
    requirements=["transformers==4.30.0"],
)
Custom Dockerfile
For complete control:
# Dockerfile
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04
# Install Python (Ubuntu 22.04 ships Python 3.10 as python3)
RUN apt-get update && apt-get install -y python3 python3-pip
# Install ZenML
RUN pip install zenml==0.55.0
# Install ML libraries
RUN pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Pipeline code
docker_settings = DockerSettings(
    dockerfile="Dockerfile",
    build_context_root=".",  # Directory containing Dockerfile
)
System Dependencies
docker_settings = DockerSettings(
    apt_packages=["git", "wget", "curl", "build-essential"],
    requirements=["gitpython"],
)
Environment Variables
docker_settings = DockerSettings(
    # Set during build (available during requirement installation)
    environment={
        "PIP_INDEX_URL": "https://custom.pypi.org/simple",
        "BUILD_CONFIG": "production",
    },
    # Set at runtime (available during step execution)
    runtime_environment={
        "LOG_LEVEL": "INFO",
        "API_KEY": "${SECRET_API_KEY}",  # Reference from secrets
    },
)
Integration Requirements
docker_settings = DockerSettings(
    # Automatically install integration dependencies
    required_integrations=["sklearn", "mlflow", "s3"],
    # Stack requirements auto-detected
    install_stack_requirements=True,
)
Step-Specific Docker Settings
Different steps can use different images:
from zenml import pipeline, step
from zenml.config import DockerSettings

# Default settings for most steps
pipeline_docker = DockerSettings(
    requirements=["pandas==2.0.0"],
)

# Special settings for GPU training
training_docker = DockerSettings(
    parent_image="pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime",
    requirements=["transformers==4.30.0"],
)


@step
def load_data() -> dict:
    # Uses pipeline_docker
    return {"data": "loaded"}


@step(settings={"docker": training_docker})
def train_model(data: dict) -> dict:
    # Uses training_docker with GPU support
    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    return {"model": "trained", "device": str(device)}


@pipeline(settings={"docker": pipeline_docker})
def ml_pipeline():
    data = load_data()
    model = train_model(data)
Dynamic Pipelines and Docker
For dynamic pipelines, the orchestration container needs its own image:
class ContainerizedOrchestrator(BaseOrchestrator):
    def should_build_pipeline_image(self, snapshot: PipelineSnapshotBase) -> bool:
        """Whether to build the pipeline image.

        Args:
            snapshot: Pipeline snapshot

        Returns:
            True for dynamic pipelines (need orchestration container)
        """
        return snapshot.is_dynamic
The orchestration container:
- Runs the pipeline function to discover steps
- Submits discovered steps to the orchestration backend
- Typically needs modest resources (CPU and memory only)
Advanced Build Options
Build Options
Customize Docker build behavior:
from zenml.config import DockerSettings, DockerBuildConfig, DockerBuildOptions

docker_settings = DockerSettings(
    build_config=DockerBuildConfig(
        build_options=DockerBuildOptions(
            # Always pull base image
            pull=True,
            # Keep the build cache enabled (set to True to disable it)
            no_cache=False,
            # Build arguments
            build_args={
                "PYTHON_VERSION": "3.9",
                "BUILD_ENV": "production",
            },
            # Cache from images
            cache_from=[
                "myregistry/ml-base:latest",
            ],
            # Labels
            labels={
                "version": "1.0.0",
                "team": "ml-platform",
            },
        ),
    ),
)
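For readers more familiar with the Docker CLI, the options above correspond to standard docker build flags. The sketch below only assembles the equivalent command line. It does not show how ZenML invokes Docker, and docker_build_command and the image tag are hypothetical names used for illustration.

```python
from typing import Dict, List, Optional


def docker_build_command(
    tag: str,
    context: str = ".",
    pull: bool = False,
    no_cache: bool = False,
    build_args: Optional[Dict[str, str]] = None,
    cache_from: Optional[List[str]] = None,
    labels: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Assemble a `docker build` argument list from build options."""
    cmd = ["docker", "build", "--tag", tag]
    if pull:
        cmd.append("--pull")  # always try to pull a newer base image
    if no_cache:
        cmd.append("--no-cache")  # ignore cached layers
    for key, value in (build_args or {}).items():
        cmd += ["--build-arg", f"{key}={value}"]
    for image in cache_from or []:
        cmd += ["--cache-from", image]
    for key, value in (labels or {}).items():
        cmd += ["--label", f"{key}={value}"]
    cmd.append(context)
    return cmd


cmd = docker_build_command(
    "myregistry/ml-pipeline:dev",  # hypothetical tag
    pull=True,
    build_args={"PYTHON_VERSION": "3.9"},
    cache_from=["myregistry/ml-base:latest"],
    labels={"team": "ml-platform"},
)
print(" ".join(cmd))
```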
Package Installers
Choose between pip and uv:
from zenml.config import DockerSettings, PythonPackageInstaller

# Use uv (faster, default)
docker_settings = DockerSettings(
    python_package_installer=PythonPackageInstaller.UV,
    requirements=["pandas", "scikit-learn"],
)

# Use pip
docker_settings = DockerSettings(
    python_package_installer=PythonPackageInstaller.PIP,
    python_package_installer_args={
        "index_url": "https://custom.pypi.org/simple",
    },
    requirements=["pandas", "scikit-learn"],
)
Code Inclusion
Control how code is made available:
docker_settings = DockerSettings(
    # Allow including files in image
    allow_including_files_in_images=True,
    # Allow downloading from code repository
    allow_download_from_code_repository=True,
    # Allow downloading from artifact store
    allow_download_from_artifact_store=True,
)
Code resolution priority:
1. Download from code repository (if configured and allowed)
2. Download from artifact store (if allowed)
3. Include in Docker image (if allowed)
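The priority order can be read as a simple fall-through. The function below is a minimal sketch of that decision, assuming the three flags shown earlier. resolve_code_strategy, the has_code_repository argument, and the returned labels are hypothetical and not part of ZenML's API.

```python
def resolve_code_strategy(
    *,
    has_code_repository: bool,
    allow_download_from_code_repository: bool = True,
    allow_download_from_artifact_store: bool = True,
    allow_including_files_in_images: bool = True,
) -> str:
    """Pick the first permitted way to make code available in the container."""
    if has_code_repository and allow_download_from_code_repository:
        return "code_repository"
    if allow_download_from_artifact_store:
        return "artifact_store"
    if allow_including_files_in_images:
        return "docker_image"
    raise RuntimeError("No permitted way to provide code to the container.")


# A configured repository wins when downloading from it is allowed
print(resolve_code_strategy(has_code_repository=True))
# Without a repository, the artifact store is tried next
print(resolve_code_strategy(has_code_repository=False))
# With both downloads disabled, the code is baked into the image
print(
    resolve_code_strategy(
        has_code_repository=True,
        allow_download_from_code_repository=False,
        allow_download_from_artifact_store=False,
    )
)
```

Disabling all three options leaves no way to ship the code, which the sketch surfaces as an error rather than silently picking one.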
Running as Non-Root User
docker_settings = DockerSettings(
    # Run the container as this user, who becomes owner of the /app directory
    user="zenml",
    requirements=["pandas"],
)
Testing Docker Builds Locally
Test your Docker configuration before running on cloud infrastructure:
# Use the local Docker orchestrator
from zenml import pipeline


@pipeline(name="test_pipeline")
def test_docker_config():
    # Your pipeline steps
    pass


# Run with local Docker orchestrator
test_docker_config.with_options(
    orchestrator="local_docker",
)()
This builds the Docker image locally and runs steps in containers on your machine.
Best Practices
Pin Versions: Always specify exact versions for reproducibility: pandas==2.0.0, not pandas
Minimize Image Size: Use slim base images and only install necessary packages
Layer Caching: Order Dockerfile commands from least to most frequently changing
Security Scanning: Regularly update base images and scan for vulnerabilities
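The first practice, pinning versions, is easy to check automatically before a build. The snippet below is a simple sketch rather than a full requirement parser: unpinned is a hypothetical helper, and it deliberately treats anything other than a plain name==version entry (optionally with extras) as unpinned, including entries with environment markers.

```python
import re
from typing import List

# Matches "name==version" with optional extras, e.g. "torch[cuda]==2.0.0"
_PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?==[^=\s]+$")


def unpinned(requirements: List[str]) -> List[str]:
    """Return the requirement strings that lack an exact == pin."""
    return [req for req in requirements if not _PINNED.match(req.strip())]


loose = unpinned(["pandas==2.0.0", "scikit-learn", "numpy>=1.24"])
print(loose)  # ['scikit-learn', 'numpy>=1.24']
```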
Security Considerations
# DON'T: Hardcode secrets in Docker settings
docker_settings = DockerSettings(
    environment={
        "API_KEY": "secret-key-12345",  # ❌ Bad!
    }
)

# DO: Reference secrets from ZenML secret store
docker_settings = DockerSettings(
    runtime_environment={
        "API_KEY": "${SECRET_API_KEY}",  # ✅ Good!
    }
)
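A quick pre-build check can catch the first pattern before it reaches an image layer. This is a heuristic sketch, not a ZenML feature: hardcoded_secrets is a hypothetical helper, the list of sensitive key fragments is an assumption, and it only recognizes whole-value ${NAME} placeholders like the one used above.

```python
import re
from typing import Dict, List

# Key names that usually hold credentials (assumed list, extend as needed)
_SENSITIVE_KEY = re.compile(r"(KEY|TOKEN|SECRET|PASSWORD)", re.IGNORECASE)
# A value consisting solely of a ${NAME} placeholder
_PLACEHOLDER = re.compile(r"^\$\{[A-Za-z_][A-Za-z0-9_]*\}$")


def hardcoded_secrets(env: Dict[str, str]) -> List[str]:
    """Return sensitive-looking keys whose values are literal strings."""
    return [
        key
        for key, value in env.items()
        if _SENSITIVE_KEY.search(key) and not _PLACEHOLDER.match(value)
    ]


bad = hardcoded_secrets({"API_KEY": "secret-key-12345", "LOG_LEVEL": "INFO"})
good = hardcoded_secrets({"API_KEY": "${SECRET_API_KEY}"})
print(bad, good)  # ['API_KEY'] []
```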
Multi-Stage Builds
For complex build requirements:
# Dockerfile with multi-stage build
FROM python:3.9 AS builder
WORKDIR /build
COPY requirements.txt .
RUN pip install --user -r requirements.txt
FROM python:3.9-slim
# Copy installed packages from builder
COPY --from=builder /root/.local /root/.local
# Install ZenML
RUN pip install zenml==0.55.0
# Update PATH
ENV PATH=/root/.local/bin:$PATH
Next Steps
Custom Orchestrators: Build orchestrators that work with Docker images
Resource Configuration: Configure resources for containerized steps
Dynamic Pipelines: Understand orchestration containers for dynamic pipelines
Custom Materializers: Handle data between containerized steps