How do I deploy a custom container on Google Cloud AI Platform?

by MIRNA HANŽEK / Tuesday, 25 November 2025 / Published in Artificial Intelligence, EITC/AI/GCML Google Cloud Machine Learning, Google Cloud AI Platform, Training models with custom containers on Cloud AI Platform

Deploying a custom container on Google Cloud AI Platform (now part of Vertex AI) is a process that allows practitioners to leverage their own software environments, dependencies, and frameworks for training and prediction tasks. This approach is particularly beneficial when default environments do not meet the requirements of a project, such as when custom libraries, proprietary code, or unsupported frameworks are needed.

1. Overview of Custom Containers in Cloud AI Platform

A custom container is a Docker image containing all the code, packages, and dependencies necessary for a machine learning task. Google Cloud AI Platform supports custom containers for both training and serving models. This flexibility ensures that developers can maintain control over their runtime environment, implement advanced workflows, and use any language or ML framework.

The process broadly involves creating a Docker image, uploading it to Google Container Registry (GCR) or Artifact Registry, and configuring an AI Platform job to use the image.
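These three broad steps can be sketched as a small helper that assembles the shell commands involved; the project ID, image name, and region below are placeholders, and in practice the commands would be run via a shell or a CI pipeline rather than printed:

```python
# Sketch: assemble the docker/gcloud commands for the three broad steps
# (build, push, submit a job). Illustrative only -- project ID, image name
# and region are placeholders.

def container_workflow_commands(project_id, image_name, region="us-central1"):
    """Return the shell commands for building, pushing and using a custom image."""
    image_uri = f"gcr.io/{project_id}/{image_name}:latest"
    return [
        f"docker build -t {image_uri} .",
        "gcloud auth configure-docker",
        f"docker push {image_uri}",
        (f"gcloud ai custom-jobs create --region={region} "
         f"--display-name={image_name}-job "
         f"--worker-pool-spec=machine-type=n1-standard-4,replica-count=1,"
         f"container-image-uri={image_uri}"),
    ]

for cmd in container_workflow_commands("my-project", "custom-ml-image"):
    print(cmd)
```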

2. Prerequisites

– A Google Cloud Project with billing enabled.
– The Google Cloud SDK (`gcloud`) installed and authenticated.
– Docker installed locally.
– Permissions: at minimum, `roles/aiplatform.user` (or the legacy `roles/ml.admin`) and `roles/storage.admin` for the project.
– (Optional) Service account with necessary permissions for programmatic access.

3. Constructing the Dockerfile

The Dockerfile defines the environment for training or serving. It should:

– Start from a suitable base image (e.g., a Python image, TensorFlow, PyTorch, or a custom environment).
– Copy source code into the image.
– Install required system libraries and Python packages.
– Define the entry point for training or prediction.

Example Dockerfile for a Custom Training Job (Python-based):

```dockerfile
FROM python:3.9-slim

# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Copy source code
WORKDIR /app
COPY . /app/

# Install Python dependencies
RUN pip install --upgrade pip
RUN pip install -r requirements.txt

# Set entry point for training
ENTRYPOINT ["python", "train.py"]
```

In this example, `train.py` is the main script that initiates the training logic.

4. Building and Pushing the Docker Image

After writing the Dockerfile and placing the source code in the build context, the image can be built and pushed to a container registry accessible by AI Platform.

Steps:
– Set your Google Cloud project:

```bash
gcloud config set project [PROJECT_ID]
```

– Build the Docker image:

```bash
docker build -t gcr.io/[PROJECT_ID]/custom-ml-image:latest .
```

– Authenticate Docker to the Google Container Registry:

```bash
gcloud auth configure-docker
```

– Push the image to the registry:

```bash
docker push gcr.io/[PROJECT_ID]/custom-ml-image:latest
```

5. Preparing Training Code and Entry Point

The training code must conform to certain standards:

– Accept command-line arguments for hyperparameters, data paths, and output directories.
– Write model artifacts to the directory specified by the `--model-dir` (or similar) argument.
– Log progress and errors using standard output (stdout) and standard error (stderr).

Sample `train.py`:

```python
import argparse
import logging

from model import train_model  # assume train_model is defined in model.py

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--data-dir', type=str, required=True)
    parser.add_argument('--model-dir', type=str, required=True)
    parser.add_argument('--epochs', type=int, default=10)
    return parser.parse_args()

def main():
    logging.basicConfig(level=logging.INFO)  # ensure INFO messages are emitted
    args = parse_args()
    logging.info("Training with data from %s", args.data_dir)
    # Training logic
    train_model(data_dir=args.data_dir, output_dir=args.model_dir, epochs=args.epochs)

if __name__ == "__main__":
    main()
```

6. Configuring and Submitting the Training Job

Training jobs can be submitted using `gcloud` or via the AI Platform API.

Using `gcloud` CLI:

```bash
gcloud ai custom-jobs create \
  --region=us-central1 \
  --display-name=custom-container-job \
  --worker-pool-spec=machine-type=n1-standard-4,replica-count=1,container-image-uri=gcr.io/[PROJECT_ID]/custom-ml-image:latest
```

Note that with a custom container the entry point comes from the image's `ENTRYPOINT`; the `local-package-path` and `python-module` keys are only needed when letting `gcloud` package local code into a container for you.

Key Parameters:
– `--region`: The region for the training job.
– `--display-name`: A friendly name for identification.
– `--worker-pool-spec`: Specifies the compute resources, replica count, and container image.
– `container-image-uri`: URI of the pushed Docker image (a key within `--worker-pool-spec`).

For distributed training, adjust `replica-count` and specify additional worker pool specs if needed.
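For illustration, a multi-pool configuration can be expressed as plain dictionaries in the shape the Vertex AI API accepts (for example when submitting via the `google-cloud-aiplatform` SDK's `CustomJob`); the image URI and machine types below are placeholders, and the SDK call itself is shown only as a comment since it requires a configured project:

```python
# Sketch: worker_pool_specs for a chief + workers layout, as plain dicts in
# the shape the Vertex AI API accepts. Image URI and machine types are
# placeholders.

def make_worker_pool_specs(image_uri, chief_type="n1-standard-4",
                           worker_type="n1-standard-4", worker_count=2):
    container_spec = {"image_uri": image_uri}
    return [
        {  # pool 0: a single chief replica
            "machine_spec": {"machine_type": chief_type},
            "replica_count": 1,
            "container_spec": container_spec,
        },
        {  # pool 1: additional workers for distributed training
            "machine_spec": {"machine_type": worker_type},
            "replica_count": worker_count,
            "container_spec": container_spec,
        },
    ]

specs = make_worker_pool_specs("gcr.io/my-project/custom-ml-image:latest")
# With the SDK (not run here):
# from google.cloud import aiplatform
# job = aiplatform.CustomJob(display_name="distributed-job",
#                            worker_pool_specs=specs)
# job.run()
```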

7. Data and Artifact Management

Datasets and model artifacts should be stored in Google Cloud Storage (GCS). The training script should read data from GCS and write outputs back to GCS.

References in the training script:
– Input data path: `gs://[BUCKET_NAME]/data/`
– Output model path: `gs://[BUCKET_NAME]/models/`
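Before handing such paths to the GCS client library, a training script typically splits the URI into bucket and object prefix; a minimal sketch of such a helper (purely illustrative; real code would pass the parts to `google.cloud.storage`, not imported here):

```python
# Sketch: split a gs:// URI into (bucket, object prefix), as a training
# script might do before calling the GCS client library.

def split_gcs_uri(uri):
    """Split 'gs://bucket/path/to/obj' into ('bucket', 'path/to/obj')."""
    if not uri.startswith("gs://"):
        raise ValueError(f"not a GCS URI: {uri}")
    bucket, _, prefix = uri[len("gs://"):].partition("/")
    return bucket, prefix

print(split_gcs_uri("gs://my-bucket/data/train.csv"))  # ('my-bucket', 'data/train.csv')
```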

8. Monitoring and Logging

Once the job is submitted, its progress can be monitored from the Google Cloud Console under Vertex AI > Training, or via `gcloud ai custom-jobs describe [JOB_ID]`. Logs are streamed to Stackdriver Logging (now Cloud Logging), where stdout and stderr from the container are accessible.

9. Deploying a Custom Container for Prediction

Serving with a custom container is similar. The container must implement a web server exposing an HTTP prediction route and a health-check route, conforming to the Vertex AI custom container requirements; Vertex AI passes the configured routes and port to the container through the `AIP_PREDICT_ROUTE`, `AIP_HEALTH_ROUTE`, and `AIP_HTTP_PORT` environment variables. By default, the container must listen on port 8080.

Dockerfile for Prediction:

```dockerfile
FROM python:3.9-slim

WORKDIR /app
COPY . /app/
RUN pip install -r requirements.txt
EXPOSE 8080

CMD ["gunicorn", "--bind", "0.0.0.0:8080", "predict:app"]
```

Here, `predict:app` refers to a Python module `predict.py` with a WSGI app named `app` (for example, built with Flask or FastAPI).

Sample Flask-Based Model Server:

```python
from flask import Flask, request, jsonify
import joblib

app = Flask(__name__)
model = joblib.load('model.joblib')

@app.route('/health', methods=['GET'])
def health():
    # Health-check route so the platform can verify the server is ready
    return jsonify({'status': 'healthy'})

@app.route('/v1/endpoints/predict', methods=['POST'])
def predict():
    data = request.get_json()
    prediction = model.predict(data['instances'])
    return jsonify({'predictions': prediction.tolist()})
```
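A client sends a JSON body with an `instances` key to the prediction route; a minimal sketch of assembling such a request (the endpoint URL is a placeholder, and the actual HTTP call is left as a comment since it needs a live server):

```python
import json

# Sketch: build the JSON request body the model server expects -- a list
# of feature rows under the "instances" key. The URL below is a
# placeholder.

def build_predict_request(rows):
    return json.dumps({"instances": rows})

body = build_predict_request([[5.1, 3.5, 1.4, 0.2]])
print(body)  # {"instances": [[5.1, 3.5, 1.4, 0.2]]}
# e.g. with the requests library (not run here):
# requests.post("http://localhost:8080/v1/endpoints/predict",
#               data=body, headers={"Content-Type": "application/json"})
```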

10. Deploying the Prediction Service

Once the image is pushed to GCR, deploy the model:

– Register the model:

```bash
gcloud ai models upload \
  --region=us-central1 \
  --display-name=custom-container-model \
  --container-image-uri=gcr.io/[PROJECT_ID]/custom-ml-image:latest
```

– Deploy an endpoint:

```bash
gcloud ai endpoints create --region=us-central1 --display-name=custom-endpoint
```

– Deploy the model to the endpoint:

```bash
gcloud ai endpoints deploy-model [ENDPOINT_ID] \
  --region=us-central1 \
  --model=[MODEL_ID] \
  --display-name=custom-container-deployment \
  --machine-type=n1-standard-4
```

11. Best Practices

– Automate Image Builds: Use Cloud Build or CI/CD pipelines for consistent, repeatable container builds.
– Parameterize Scripts: Design code to accept parameters via command-line or environment variables for flexibility and reproducibility.
– Handle Errors Gracefully: Ensure the container exits with non-zero status codes on failure for proper job monitoring.
– Testing: Test the container locally with representative data before deploying to production.
– Security: Use least-privilege IAM roles for the service account running the job, and scan images for vulnerabilities.
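The "exit non-zero on failure" practice can be sketched as a thin wrapper around the training entry point; the `run_training` callable here is hypothetical and stands in for the real entry point:

```python
import logging

# Sketch: ensure the container exits non-zero on failure so the platform
# marks the job as failed. `run_training` is a hypothetical stand-in for
# the real training entry point.

def guarded_main(run_training):
    try:
        run_training()
    except Exception:
        logging.exception("training failed")
        return 1  # non-zero exit code signals failure to the job scheduler
    return 0

# Example: a failing training function yields exit code 1
def broken():
    raise RuntimeError("out of data")

print(guarded_main(broken))  # 1
# In a real script: sys.exit(guarded_main(run_training))
```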

12. Example Workflow: End-to-End

Suppose a team develops a PyTorch-based image classification model that requires custom C++ dependencies and a particular version of torchvision not available in managed environments.

Steps:
1. Write Dockerfile: Install required system libraries, PyTorch, and custom dependencies.
2. Prepare Code: Ensure `train.py` reads from GCS and writes outputs to GCS, accepts hyperparameters as arguments.
3. Build and Push Image: Build the image locally and push to GCR.
4. Submit Training Job: Use `gcloud ai custom-jobs create` with the custom image, specifying data and output directories.
5. Monitor Progress: Use Google Cloud Console and Cloud Logging for job monitoring.
6. Model Serving: Write a Flask app exposing a `/predict` endpoint (listening on port 8080), package it in a Docker image, push to GCR, and deploy via Vertex AI endpoints.

13. Troubleshooting

– Job Fails to Start: Check logs for errors in Dockerfile or entry point.
– Container Not Found: Ensure correct image URI and that the image is in a registry accessible to Vertex AI.
– Port Not Exposed: For prediction, the container must listen on port 8080.
– Data Access Issues: Verify GCS paths and service account permissions.

14. Further Reading

– [Vertex AI Custom Containers Documentation](https://cloud.google.com/vertex-ai/docs/training/custom-containers-training)
– [Docker Reference](https://docs.docker.com/engine/reference/builder/)
– [gcloud AI Platform Commands](https://cloud.google.com/sdk/gcloud/reference/ai/)

