Using a Google Cloud VM with GPU and JupyterLab for Efficient Model Training
When training deep learning models, computational resources play a significant role in determining the feasibility and speed of experimentation. Most consumer laptops are not equipped with powerful GPUs or sufficient memory to handle large datasets or complex neural network architectures efficiently; consequently, training times can extend to several hours or days. Utilizing cloud-based virtual machines (VMs) with dedicated GPUs significantly alleviates these constraints, enabling rapid prototyping and iteration. Google Cloud Platform (GCP) offers Deep Learning VM Images, which are preconfigured virtual machine images optimized for machine learning tasks.
1. Selecting the Appropriate Deep Learning VM Image
Google Cloud provides Deep Learning VM Images pre-installed with popular frameworks such as TensorFlow, PyTorch, and JAX, alongside GPU drivers and libraries (e.g., CUDA, cuDNN, NCCL). These images also include JupyterLab, a powerful interactive development environment. To begin, select a Deep Learning VM Image that matches your requirements in terms of the deep learning framework and the type of GPU you wish to use (such as NVIDIA Tesla T4, P100, V100, or A100, depending on availability and your budget).
2. Creating the VM Instance
Using the Google Cloud Console or the `gcloud` CLI, create a new VM instance:
– Choose a machine type with sufficient vCPUs and RAM (e.g., n1-standard-8 or higher).
– Specify the number and type of GPUs in the “GPUs” section.
– Select a Deep Learning VM Image from the Marketplace.
– Adjust disk size based on dataset and model requirements.
– Open the required ports (notably, TCP:8080 or TCP:8888) to allow access to JupyterLab.
Example `gcloud` command:
```bash
gcloud compute instances create my-dl-vm \
    --zone=us-central1-a \
    --machine-type=n1-standard-8 \
    --accelerator=type=nvidia-tesla-t4,count=1 \
    --image-family=tf-latest-gpu \
    --image-project=deeplearning-platform-release \
    --maintenance-policy=TERMINATE \
    --metadata="install-nvidia-driver=True" \
    --boot-disk-size=200GB \
    --scopes=https://www.googleapis.com/auth/cloud-platform
```
This command creates a VM with 8 vCPUs, one T4 GPU, and a 200 GB boot disk, using the latest TensorFlow GPU image.
3. Accessing JupyterLab
Once the VM is running, connect via SSH and start JupyterLab. On Google Cloud Deep Learning VMs, JupyterLab is typically preconfigured and can be accessed by navigating to the VM's external IP address in your browser, appending the configured port (typically `:8080` or `:8888`).
If not already running, JupyterLab can be manually started:
```bash
jupyter lab --ip=0.0.0.0 --port=8080 --no-browser
```
For secure access, set up SSH tunneling or configure an HTTPS connection. Google Cloud offers a built-in “Open JupyterLab” button for Deep Learning VMs, which simplifies this process.
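One way to set up the SSH tunnel is through `gcloud compute ssh`, which forwards a local port to the VM over SSH (the instance name and zone below follow the earlier example; adjust them to your setup):

```shell
# Forward local port 8080 to JupyterLab running on the VM
gcloud compute ssh my-dl-vm --zone=us-central1-a -- -L 8080:localhost:8080
# Then open http://localhost:8080 in your local browser
```

Arguments after `--` are passed through to the underlying `ssh` command, so standard port-forwarding flags apply.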
4. Organizing Dependencies Using Virtual Environments
A common challenge in machine learning is dependency management. Different projects may require different versions of libraries, and upgrading or downgrading packages globally can lead to conflicts or incompatibilities. To isolate dependencies, use Python virtual environments or `conda` environments.
– To create a virtual environment with `venv`:
```bash
python3 -m venv myenv
source myenv/bin/activate
pip install -r requirements.txt
```
– To use `conda` (installed by default on Deep Learning VMs):
```bash
conda create -n myenv python=3.8
conda activate myenv
conda install tensorflow-gpu==2.8.0 numpy pandas matplotlib
```
After activating the environment, ensure JupyterLab recognizes it as a kernel:
```bash
pip install ipykernel
python -m ipykernel install --user --name=myenv --display-name="Python (myenv)"
```
This allows you to select your environment as a kernel within JupyterLab, ensuring your notebooks use the correct dependencies.
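To double-check that a notebook is actually running on the intended kernel, you can inspect the interpreter path from within the notebook; a minimal sketch ("myenv" being the example environment name used above):

```python
import sys

# Path of the interpreter backing the active kernel; for a correctly
# registered environment, this points inside the environment directory
# (e.g. the path contains "myenv")
print(sys.executable)
```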
5. Transferring Data and Notebooks
Upload your datasets and notebooks to the VM. This can be achieved through:
– Google Cloud Storage (GCS): Upload data to a GCS bucket and use the `gsutil` command or the Python GCS client to download it to the VM.
– SCP: Use secure copy (SCP) to transfer files directly from your local machine to the VM.
– JupyterLab’s graphical interface: Drag and drop files via the browser.
Example using `gsutil`:
```bash
gsutil cp gs://your-bucket/dataset.csv /home/jupyter/
```
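The same transfer can also be done from Python with the official `google-cloud-storage` client; a sketch reusing the placeholder bucket and file names from the `gsutil` example:

```python
from google.cloud import storage

# Uses the VM's default service account credentials
client = storage.Client()
bucket = client.bucket("your-bucket")
blob = bucket.blob("dataset.csv")
blob.download_to_filename("/home/jupyter/dataset.csv")
```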
6. Training Your Model on the GPU-equipped VM
With your environment set up, open your notebook in JupyterLab. Ensure that the framework (e.g., TensorFlow, PyTorch) detects the GPU. In TensorFlow, for example, run:
```python
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
```
If a GPU is detected, model training will utilize it, significantly reducing training time compared to CPU-only environments. Monitor GPU usage via command-line tools such as `nvidia-smi`:
```bash
watch -n 1 nvidia-smi
```
This command displays GPU memory usage, temperature, and running processes, allowing you to ensure efficient utilization.
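For PyTorch projects, an analogous in-notebook check looks like this (a minimal sketch, assuming a CUDA-enabled PyTorch build is installed):

```python
import torch

# True only when a CUDA driver and a visible GPU are present
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    # Name of the first visible GPU, e.g. a Tesla T4
    print("Device:", torch.cuda.get_device_name(0))
```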
7. Managing and Preserving Environments
To prevent breaking your environment:
– Avoid installing or upgrading packages globally.
– Use virtual or `conda` environments for each project.
– Export your environment’s dependencies for reproducibility:
```bash
pip freeze > requirements.txt       # For venv
conda env export > environment.yml  # For conda
```
Should you need to recreate the environment, use these files to install the same dependencies.
– For team projects, consider storing these files in version control alongside your code.
– Regularly backup important data and notebooks to GCS or your local machine.
8. Shutting Down Resources
Cloud resources incur costs based on usage. When computation is not required, stop or delete the VM to avoid unnecessary charges. Data can be persisted in GCS buckets or attached persistent disks.
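Stopping and deleting can likewise be done from the `gcloud` CLI (instance name and zone follow the earlier example):

```shell
# Stop the VM but keep its boot disk (disk storage is still billed)
gcloud compute instances stop my-dl-vm --zone=us-central1-a
# Delete the VM entirely when it is no longer needed
gcloud compute instances delete my-dl-vm --zone=us-central1-a
```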
Example Workflow: From Local Laptop to Cloud GPU VM
Suppose you are training a convolutional neural network (CNN) on the CIFAR-10 dataset using TensorFlow. Training on your laptop (CPU-only) takes 3 hours per epoch. By migrating to a Google Cloud VM with a T4 GPU and configuring your environment as described:
– Training time per epoch drops to 10 minutes.
– Your dependencies are managed in a `conda` environment with TensorFlow 2.8, NumPy, and Matplotlib.
– Dataset is stored in a GCS bucket and downloaded as needed.
– JupyterLab enables interactive development and visualization.
– GPU usage is monitored with `nvidia-smi`.
– The environment can be recreated elsewhere using the exported `environment.yml`.
Benefits of This Approach
– Speed: GPU acceleration drastically reduces training times, enabling faster experimentation and result iteration.
– Scalability: VM resources can be adjusted as your needs grow, including adding more GPUs or increasing RAM and storage.
– Reproducibility: Organized dependency management prevents version conflicts and ensures consistent results across team members and sessions.
– Flexibility: JupyterLab supports interactive development, rapid prototyping, and collaborative work, while virtual environments keep project dependencies isolated.
– Cost-Efficiency: Temporary use of powerful hardware eliminates the need for costly personal GPU hardware, with the ability to shut down VMs when not in use.
Potential Pitfalls and Solutions
– Environment Drift: Always use virtual environments and record dependencies.
– Data Security: Restrict access to the VM (use firewall rules, IAM permissions).
– Session Management: Regularly save your work and back up data; cloud VMs may be preempted or terminated.
– Resource Limits: Be aware of your account’s GPU quota and request increases if needed.
Automation and Infrastructure as Code
For advanced users, infrastructure can be managed programmatically using Terraform or Deployment Manager, enabling repeatable and version-controlled VM provisioning. Docker containers may also be used for further reproducibility and portability, but the Deep Learning VM Images already encapsulate most requirements for most users.
Leveraging Google Cloud Deep Learning VM Images with GPU acceleration and JupyterLab provides a scalable, efficient, and organized solution for model training far beyond the capabilities of a typical laptop. By isolating dependencies in virtual environments and adopting best practices for cloud resource management, you can maximize productivity while maintaining reproducibility and minimizing costs.