This course focuses on accelerating the training of large language models (LLMs) through parallelism strategies. By exploring techniques such as data, model, and hybrid parallelism, you will learn how to speed up training without sacrificing reliability.

The material follows a structured progression: it opens with an introduction to parallel computing and scaling laws, then moves into hands-on applications using popular libraries such as PyTorch and DeepSpeed. You will gain practical experience running parallelism strategies on multi-GPU systems and applying fault tolerance techniques to keep training reliable, with theoretical concepts grounded in real-world examples throughout.

Along the way, you will explore the major types of parallelism (data, model, pipeline, and tensor) and how each applies to LLMs, working with datasets such as MNIST and WikiText to implement parallel strategies that improve training speed. The course culminates in advanced checkpointing strategies and fault tolerance methods, so you understand how to recover from system failures during training.

This course is ideal for learners interested in optimizing machine learning workflows and accelerating AI model development. A background in machine learning or deep learning is recommended, and the course suits intermediate learners who want to deepen their knowledge of LLM training strategies.

By the end of the course, you will be able to implement and compare parallelism techniques for LLM training, run distributed training in multi-GPU environments, apply fault tolerance strategies, and understand advanced topics in parallel computing.
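For a flavor of the hands-on work, the sketch below shows data parallelism with PyTorch's DistributedDataParallel, the style of multi-GPU training the course walks through. It is a minimal illustration rather than course material: the small MLP, the random stand-in for MNIST, and the training settings are all assumptions.

```python
# Minimal data-parallelism sketch using PyTorch DistributedDataParallel (DDP).
# Launch with: torchrun --nproc_per_node=<num_gpus> ddp_sketch.py
# The model, data, and hyperparameters below are illustrative assumptions.
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    dist.init_process_group(backend="nccl")          # one process per GPU
    rank = dist.get_rank()
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Stand-in for MNIST: random 28x28 images flattened to 784 features.
    data = TensorDataset(torch.randn(1024, 784), torch.randint(0, 10, (1024,)))
    sampler = DistributedSampler(data)               # shards the data across ranks
    loader = DataLoader(data, batch_size=64, sampler=sampler)

    model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10)).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])      # gradients are all-reduced automatically
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)                     # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        if rank == 0:
            print(f"epoch {epoch} done")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with torchrun, each GPU gets its own process and its own shard of the data, and DDP averages gradients across processes after every backward pass, which is the core idea behind data parallelism.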

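The fault-tolerance material, in turn, centers on checkpointing so training can resume after a failure. The sketch below illustrates the basic save-and-resume pattern; the checkpoint path, model, and save frequency are placeholders of my own, not details taken from the course.

```python
# Minimal checkpoint-and-resume sketch for fault tolerance.
# If training is interrupted, rerunning the script resumes from the last saved epoch.
# The path, model, and save interval are illustrative assumptions.
import os
import torch
import torch.nn as nn

CKPT_PATH = "checkpoint.pt"

model = nn.Linear(784, 10)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
start_epoch = 0

# Resume from the latest checkpoint if one exists.
if os.path.exists(CKPT_PATH):
    state = torch.load(CKPT_PATH)
    model.load_state_dict(state["model"])
    opt.load_state_dict(state["optimizer"])
    start_epoch = state["epoch"] + 1

for epoch in range(start_epoch, 10):
    x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))  # stand-in batch
    opt.zero_grad()
    nn.functional.cross_entropy(model(x), y).backward()
    opt.step()

    # Save model, optimizer, and progress so a crash loses at most one epoch of work.
    torch.save({"model": model.state_dict(),
                "optimizer": opt.state_dict(),
                "epoch": epoch}, CKPT_PATH)
```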
















