How does PyTorch make using multiple GPUs for neural network training simple and straightforward?

by EITCA Academy / Saturday, 02 September 2023 / Published in Artificial Intelligence, EITC/AI/DLPP Deep Learning with Python and PyTorch, Advancing with deep learning, Computation on the GPU, Examination review

PyTorch, an open-source machine learning library developed by Facebook’s AI Research lab, has been designed with a strong emphasis on flexibility and simplicity of use.

One important aspect of modern deep learning is the ability to use multiple GPUs to accelerate neural network training, and PyTorch was designed to keep this process simple.

PyTorch makes multi-GPU training accessible even to those without extensive experience in distributed computing. It does so through built-in features such as the `DataParallel` and `DistributedDataParallel` modules, which let an existing model run on several GPUs with only a few additional lines of code.

DataParallel Module

The most straightforward method PyTorch offers for utilizing multiple GPUs is the `torch.nn.DataParallel` module. This module allows for parallelizing the computation across multiple GPUs by splitting the input data across the available devices and then gathering the results. The `DataParallel` module works by wrapping around a neural network model:

python
import torch
import torch.nn as nn

# Define a simple model
class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.fc = nn.Linear(10, 10)

    def forward(self, x):
        return self.fc(x)

# Instantiate the model
model = SimpleModel()

# Wrap the model in DataParallel
model = nn.DataParallel(model)

# Move the model to the first GPU
model = model.cuda()

# Create dummy input data
input_data = torch.randn(32, 10).cuda()

# Forward pass
output = model(input_data)

In this example, `DataParallel` automatically handles the distribution of the input data `input_data` to multiple GPUs, performs the forward pass on each GPU, and then collects the results. This approach requires minimal changes to the existing code, making it an attractive option for many users.
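The scatter and gather steps described above can be illustrated without any GPU. The following is a minimal plain-Python sketch, not PyTorch's implementation: the helper names `scatter`, `replica_forward`, and `data_parallel_forward` are hypothetical, and the chunking rule (contiguous chunks of size ceil(batch / devices)) is an assumption made for illustration.

```python
import math

def scatter(batch, num_devices):
    """Split a batch (a list of samples) into contiguous per-device chunks."""
    chunk_size = math.ceil(len(batch) / num_devices)
    return [batch[i:i + chunk_size] for i in range(0, len(batch), chunk_size)]

def replica_forward(chunk):
    """Stand-in for the forward pass of one model replica: doubles each sample."""
    return [2 * x for x in chunk]

def data_parallel_forward(batch, num_devices):
    chunks = scatter(batch, num_devices)            # scatter the inputs
    outputs = [replica_forward(c) for c in chunks]  # one forward pass per replica
    return [y for out in outputs for y in out]      # gather in the original order

batch = list(range(32))          # a batch of 32 samples, as in the example above
chunks = scatter(batch, 4)       # 4 devices is an assumed count
print([len(c) for c in chunks])  # [8, 8, 8, 8]
result = data_parallel_forward(batch, 4)
```

The gathered result matches what a single device would produce for the whole batch. In a real run the per-chunk forward passes execute concurrently on different GPUs, which is where the speed-up comes from.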

DistributedDataParallel Module

For more advanced users who require finer control over the parallelization process, PyTorch provides the `torch.nn.parallel.DistributedDataParallel` (DDP) module. DDP is designed for multi-process, multi-GPU training and offers better performance and scaling compared to `DataParallel`. DDP works by launching multiple processes, each handling a subset of the data and running on a separate GPU.

To use DDP, one must set up a distributed environment, initialize the process group, and then wrap the model with `DistributedDataParallel`:

python
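import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
import torch.optim as optim

# Define a simple model
class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.fc = nn.Linear(10, 10)

    def forward(self, x):
        return self.fc(x)

# Initialize the process group
def setup(rank, world_size):
    # The default env:// rendezvous needs an address and a port;
    # these values assume that all processes run on a single machine
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    # Bind this process to its own GPU
    torch.cuda.set_device(rank)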

# Clean up the process group
def cleanup():
    dist.destroy_process_group()

# Define the training loop
def train(rank, world_size):
    setup(rank, world_size)

    # Create the model and move it to the appropriate device
    model = SimpleModel().to(rank)
    ddp_model = nn.parallel.DistributedDataParallel(model, device_ids=[rank])

    # Define the loss function and optimizer
    criterion = nn.MSELoss()
    optimizer = optim.SGD(ddp_model.parameters(), lr=0.001)

    # Create dummy input data
    input_data = torch.randn(32, 10).to(rank)
    target = torch.randn(32, 10).to(rank)

    # Forward pass
    output = ddp_model(input_data)
    loss = criterion(output, target)

    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    cleanup()

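# Number of GPUs
world_size = 2

# Spawn the processes; the guard keeps child processes from
# re-running the spawn when they import this file
if __name__ == "__main__":
    mp.spawn(train, args=(world_size,), nprocs=world_size, join=True)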

In this example, `mp.spawn` is used to launch multiple processes, each running the `train` function on a separate GPU. The `setup` function initializes the process group using the NCCL backend, which is optimized for NVIDIA GPUs. The model is then wrapped in `DistributedDataParallel`, and the training loop proceeds as usual.
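The training loop can proceed as usual because DDP averages the gradients across processes during the backward pass, so every replica applies the same update. The sketch below illustrates that arithmetic in plain Python for a one-parameter model `y = w * x`. It is an illustration under stated assumptions, not PyTorch code: the helper names `mse_grad` and `shard` are hypothetical, and the strided sharding and the data values are chosen only for the example.

```python
def mse_grad(w, xs, ts):
    """Gradient of mean((w*x - t)^2) with respect to the scalar weight w."""
    return sum(2 * (w * x - t) * x for x, t in zip(xs, ts)) / len(xs)

def shard(items, rank, world_size):
    """Give each rank every world_size-th sample, starting at its own rank."""
    return items[rank::world_size]

w = 0.5
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ts = [2.0 * x for x in xs]   # targets generated by the "true" weight 2.0
world_size = 2               # two processes, as in the example above

# Each process computes a gradient on its own shard of the batch
local_grads = [
    mse_grad(w, shard(xs, r, world_size), shard(ts, r, world_size))
    for r in range(world_size)
]

averaged = sum(local_grads) / world_size  # what an all-reduce average yields
full_batch = mse_grad(w, xs, ts)          # gradient on the undivided batch

print(local_grads, averaged, full_batch)  # [-63.0, -90.0] -76.5 -76.5
```

With equally sized shards, the averaged gradient equals the full-batch gradient, so training on several GPUs follows the same trajectory as training on one GPU with the combined batch.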

Automatic Mixed Precision (AMP)

Another feature that complements multi-GPU training in PyTorch is Automatic Mixed Precision (AMP), which works the same way on one GPU or on many. Mixed precision training uses both 16-bit and 32-bit floating-point numbers to reduce memory usage and increase computational speed. PyTorch’s `torch.cuda.amp` module provides a simple interface for implementing mixed precision training; recent PyTorch releases also expose the same functionality through the device-agnostic `torch.amp` namespace.

To use AMP, one wraps the forward pass and the loss computation in the `torch.cuda.amp.autocast` context manager and routes the backward pass and the optimizer step through a `torch.cuda.amp.GradScaler`:

python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.cuda.amp import autocast, GradScaler

# Define a simple model
class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.fc = nn.Linear(10, 10)

    def forward(self, x):
        return self.fc(x)

# Instantiate the model and move it to the first GPU
model = SimpleModel().cuda()

# Define the loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.001)

# Create a GradScaler
scaler = GradScaler()

# Create dummy input data
input_data = torch.randn(32, 10).cuda()
target = torch.randn(32, 10).cuda()

# Forward pass with autocast
with autocast():
    output = model(input_data)
    loss = criterion(output, target)

# Backward pass with GradScaler
optimizer.zero_grad()
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()

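In this example, the `autocast` context manager selects a precision for each operation inside the block, running those that benefit from it in 16-bit and keeping precision-sensitive ones in 32-bit. The model parameters themselves stay in 32-bit. The `GradScaler` multiplies the loss by a scale factor so that small gradients do not underflow during the backward pass, unscales the gradients before the optimizer step, skips the step if they contain infinities or NaNs, and adjusts the scale factor in `update()`.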
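The reason for scaling the loss can be shown with a few lines of plain Python, using the standard `struct` module to round values to 16-bit floating point. This is a sketch under stated assumptions rather than what `GradScaler` does internally: the helper `to_fp16` is hypothetical, the gradient value is made up, and the scale factor of 65536 is assumed for illustration.

```python
import struct

def to_fp16(value):
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

grad = 1e-8       # a small gradient value, chosen for illustration
scale = 65536.0   # 2**16, an assumed loss-scale factor

unscaled_fp16 = to_fp16(grad)        # too small for 16 bits: becomes zero
scaled_fp16 = to_fp16(grad * scale)  # the scaled value is representable
recovered = scaled_fp16 / scale      # unscaling happens in full precision

print(unscaled_fp16)  # 0.0
print(recovered)
```

Without scaling, the gradient is lost entirely. With scaling, it survives the 16-bit representation and is recovered to within the rounding error of half precision.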

Model Sharding

For very large models that cannot fit into the memory of a single GPU, PyTorch offers model sharding techniques. Model sharding splits the model itself, rather than the data, across multiple GPUs. The `torch.distributed` package provides tools for implementing model sharding, such as the `torch.distributed.rpc` module for remote procedure calls and the `torch.distributed.pipeline.sync.Pipe` module for pipeline parallelism. The example below assumes a PyTorch version that still ships `Pipe`, since newer releases deprecate it in favor of `torch.distributed.pipelining`:

python
import os

import torch
import torch.distributed.rpc as rpc
import torch.nn as nn
import torch.optim as optim
from torch.distributed.pipeline.sync import Pipe

# Define a simple model with two stages
class Stage1(nn.Module):
    def __init__(self):
        super(Stage1, self).__init__()
        self.fc = nn.Linear(10, 10)

    def forward(self, x):
        return self.fc(x)

class Stage2(nn.Module):
    def __init__(self):
        super(Stage2, self).__init__()
        self.fc = nn.Linear(10, 10)

    def forward(self, x):
        return self.fc(x)

# Pipe runs in a single process that drives all GPUs,
# but it requires the RPC framework to be initialized first
os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
rpc.init_rpc("worker", rank=0, world_size=1)

# Create the model stages and place each one on its own GPU
stage1 = Stage1().to("cuda:0")
stage2 = Stage2().to("cuda:1")

# Create a pipeline model that splits each batch into 2 micro-batches
model = Pipe(nn.Sequential(stage1, stage2), chunks=2)

# Define the loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.001)

# Inputs go to the GPU of the first stage, targets to the GPU of the last stage
input_data = torch.randn(32, 10).to("cuda:0")
target = torch.randn(32, 10).to("cuda:1")

# Forward pass (Pipe returns an RRef; local_value() fetches the tensor)
output = model(input_data).local_value()
loss = criterion(output, target)

# Backward pass and optimization
optimizer.zero_grad()
loss.backward()
optimizer.step()

rpc.shutdown()

In this example, the model is divided into two stages, each running on a separate GPU. The `Pipe` module handles the communication between the stages, allowing for efficient pipeline parallelism.
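The benefit of splitting a batch into chunks can be seen from the order in which stages process them. The following plain-Python sketch prints an idealized forward-pass schedule. It assumes that every stage takes one time step per chunk and ignores the backward pass, and the helper name `pipeline_schedule` is hypothetical rather than part of PyTorch.

```python
def pipeline_schedule(num_stages, num_chunks):
    """Return, for each time step, the (stage, chunk) pairs that run concurrently."""
    steps = []
    for t in range(num_stages + num_chunks - 1):
        # Stage s works on chunk t - s, if that chunk exists
        active = [(s, t - s) for s in range(num_stages) if 0 <= t - s < num_chunks]
        steps.append(active)
    return steps

# Two stages and two chunks, as in the example above
schedule = pipeline_schedule(num_stages=2, num_chunks=2)
for t, active in enumerate(schedule):
    print(t, active)
# 0 [(0, 0)]
# 1 [(0, 1), (1, 0)]
# 2 [(1, 1)]
```

At time step 1 both GPUs are busy with different chunks, so the batch finishes in 3 steps instead of the 4 that strictly sequential execution would need. Under the same assumptions, more chunks shrink the idle share further: with 2 stages and 8 chunks, each GPU is idle for only 1 of 9 steps.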

PyTorch offers a range of integrated tools for using multiple GPUs in neural network training. From the high-level `DataParallel` module to the more advanced `DistributedDataParallel` module and model sharding techniques, it provides the flexibility and performance needed for a wide variety of deep learning tasks. Automatic Mixed Precision further improves efficiency by reducing memory usage and increasing computational speed. In the common case, using these features involves little more than a straightforward wrapping of a model with the `DataParallel` or `DistributedDataParallel` module and ensuring that the input data is placed on the correct GPU.

These built-in features make running deep learning models on multiple GPUs a simple process, which was one of the aims behind developing PyTorch.
