"Fine-tuning large language models (LLMs) is essential for aligning them with specific business needs, improving accuracy, and optimizing performance. In today’s AI-driven world, organizations rely on fine-tuned models to generate precise, actionable insights that drive innovation and efficiency. This course equips aspiring generative AI engineers with the in-demand skills employers are actively seeking.



Generative AI Advance Fine-Tuning for LLMs
This course is part of multiple programs.



Instructor: Joseph Santarcangelo
8,091 already enrolled
What you'll learn
- In-demand generative AI engineering skills in fine-tuning LLMs that employers are actively seeking
- Instruction tuning and reward modeling using Hugging Face, plus understanding LLMs as policies and applying RLHF techniques
- Direct preference optimization (DPO) with the partition function and Hugging Face, including how to define optimal solutions to DPO problems (see the worked equations after this list)
- Using proximal policy optimization (PPO) with Hugging Face to build scoring functions and tokenize datasets for fine-tuning
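For orientation, these are the standard equations that the "partition function" and "optimal solution" outcomes above refer to, written in the notation of the original DPO paper (Rafailov et al., 2023), which may differ from the course's own notation. First, the optimal policy of the KL-regularized reward-maximization objective; second, the DPO loss obtained by substituting it back into the Bradley-Terry preference model:

```latex
% Optimal policy for  max_pi  E[r(x,y)] - beta * KL(pi || pi_ref):
\pi^{*}(y \mid x) \;=\; \frac{1}{Z(x)}\,\pi_{\mathrm{ref}}(y \mid x)\,
  \exp\!\Big(\tfrac{1}{\beta}\, r(x, y)\Big),
\qquad
Z(x) \;=\; \sum_{y} \pi_{\mathrm{ref}}(y \mid x)\,
  \exp\!\Big(\tfrac{1}{\beta}\, r(x, y)\Big)

% DPO loss over preferred (y_w) vs. rejected (y_l) responses; Z(x) cancels:
\mathcal{L}_{\mathrm{DPO}}
  \;=\; -\,\mathbb{E}_{(x,\,y_w,\,y_l)}
    \log \sigma\!\Big(
      \beta \log \tfrac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      \;-\;
      \beta \log \tfrac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \Big)
```

The cancellation of the partition function Z(x) is the key point: it is intractable to compute directly, and DPO's reformulation removes it, which is what makes DPO more direct than explicit reward modeling followed by PPO.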
Details to know

Add to your LinkedIn profile
5 assignments

Build your subject-matter expertise
- Learn new concepts from industry experts
- Gain a foundational understanding of a subject or tool
- Develop job-relevant skills with hands-on projects
- Earn a shareable career certificate

There are 2 modules in this course
Module 1
In this module, you will explore advanced techniques for fine-tuning large language models (LLMs) through instruction tuning and reward modeling. You’ll begin by defining instruction tuning and learning its process, including dataset loading, text-generation pipelines, and training arguments using Hugging Face. You’ll then delve into reward modeling, where you’ll preprocess datasets, apply low-rank adaptation (LoRA) configurations, and quantify response quality to guide model optimization and alignment with human preferences. You’ll also describe and use reward trainers and reward-model loss functions (a reward-model sketch follows this module’s summary). In addition, hands-on labs will reinforce your learning with practical experience in instruction tuning and reward modeling, empowering you to customize LLMs effectively for targeted tasks.
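To make that workflow concrete, here is a minimal instruction-tuning sketch using Hugging Face's TRL and PEFT libraries. The model and dataset names are placeholders borrowed from TRL's documentation examples, and argument names shift between TRL releases, so read this as the shape of the code rather than the course's exact labs.

```python
# Minimal instruction-tuning (supervised fine-tuning) sketch with TRL + PEFT.
# Placeholder model/dataset; exact kwargs vary across TRL versions.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# A chat-format instruction dataset; recent SFTTrainer versions parse the
# "messages" column automatically.
dataset = load_dataset("trl-lib/Capybara", split="train[:1%]")

# LoRA: train small low-rank adapter matrices instead of all model weights.
peft_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",       # any causal-LM checkpoint works here
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(output_dir="qwen-instruct"),
)
trainer.train()
```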
What's included
6 videos, 4 readings, 2 assignments, 2 app items, 3 plugins
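A reward model in this setup is just a sequence classifier with a single scalar output that scores a response; TRL's RewardTrainer fits it on (chosen, rejected) pairs with the pairwise loss -log σ(r_chosen − r_rejected). Below is a minimal sketch under the same caveats as above: checkpoint and dataset names are placeholders, and older TRL releases expect pre-tokenized pairs rather than raw text columns.

```python
# Minimal reward-modeling sketch with TRL's RewardTrainer.
# Pairwise loss: -log(sigmoid(r_chosen - r_rejected)).
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from trl import RewardConfig, RewardTrainer

model_id = "Qwen/Qwen2.5-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# num_labels=1: one scalar output per sequence, i.e., the reward score.
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=1)
model.config.pad_token_id = tokenizer.pad_token_id  # needed for batched scoring

# Human-preference dataset with "chosen" and "rejected" text columns.
dataset = load_dataset("Anthropic/hh-rlhf", split="train[:1%]")

trainer = RewardTrainer(
    model=model,
    processing_class=tokenizer,   # named `tokenizer=` in older TRL releases
    train_dataset=dataset,
    args=RewardConfig(output_dir="reward-model", max_length=512),
    peft_config=LoraConfig(r=8, lora_alpha=16, task_type="SEQ_CLS"),
)
trainer.train()
```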
Module 2
In this module, you will explore advanced techniques for fine-tuning LLMs using reinforcement learning from human feedback (RLHF), proximal policy optimization (PPO), and direct preference optimization (DPO). You’ll begin by describing how LLMs function as probability distributions over text and how these distributions can be treated as policies that generate responses from input prompts. You’ll examine the relationship between policies and language models as a function of parameters, such as omega, and how rewards can be calculated from human feedback. This includes sampling training responses, evaluating agent performance, and defining scoring functions for tasks like sentiment analysis with PPO. You’ll also be able to explain PPO configuration, learning rates, and the PPO trainer’s role in optimizing chatbot responses using Hugging Face tools. The module then introduces DPO, a more direct and efficient way to align models with human preferences (a minimal DPO sketch follows this module’s summary). Although complex topics like PPO and reinforcement learning are introduced, you are not expected to understand them in depth for this course. Hands-on labs let you practice applying RLHF and DPO, and a cheat sheet and glossary are included for quick reference.
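The "scoring function" idea in that description is simple: anything that maps generated text to a scalar reward can drive the PPO loop. Here is a sketch of a sentiment-based scorer; the IMDb sentiment checkpoint is an assumption borrowed from common TRL examples, not necessarily the one the course uses.

```python
# Sentiment-analysis scoring function: turns generated text into scalar
# rewards that a PPO trainer can maximize.
from transformers import pipeline

# Placeholder checkpoint; any binary sentiment classifier works.
sentiment = pipeline("sentiment-analysis", model="lvwerra/distilbert-imdb")

def score(responses: list[str]) -> list[float]:
    """Reward = probability that the response is positive in sentiment."""
    outputs = sentiment(responses)
    return [
        out["score"] if out["label"] == "POSITIVE" else 1.0 - out["score"]
        for out in outputs
    ]

# Example: rewards for two candidate chatbot responses.
print(score(["This movie was wonderful!", "A dull, pointless film."]))
```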
What's included
10 videos, 5 readings, 3 assignments, 2 app items, 4 plugins
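Because DPO needs only (prompt, chosen, rejected) triples and a frozen reference model (which TRL's DPOTrainer creates internally when none is passed), its training loop is markedly simpler than PPO's. A minimal sketch follows, again with placeholder names and the usual caveat that kwargs shift between TRL releases.

```python
# Minimal DPO sketch with TRL's DPOTrainer: no explicit reward model,
# just preference triples and an implicit frozen reference model.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "Qwen/Qwen2.5-0.5B"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Preference dataset in (prompt, chosen, rejected) format.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train[:1%]")

trainer = DPOTrainer(
    model=model,
    processing_class=tokenizer,   # named `tokenizer=` in older TRL releases
    args=DPOConfig(output_dir="qwen-dpo", beta=0.1),  # beta: KL-penalty strength
    train_dataset=dataset,
)
trainer.train()
```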
Earn a career certificate
Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.
Learner reviews
79 reviews
- 5 stars: 72.50%
- 4 stars: 8.75%
- 3 stars: 5%
- 2 stars: 5%
- 1 star: 8.75%
Showing 3 of 79
Reviewed on Mar 10, 2025
The course gave me a good understanding of fine-tuning LLMs. It made complex topics easy to learn.
Reviewed on Mar 10, 2025
Very Informative – Covers advanced fine-tuning techniques in a clear and structured way
Reviewed on Mar 10, 2025
Great course, love the deep-rooted content. All my concepts are so clear now. Kudos!!

Frequently asked questions
How long does it take to complete this course?
It takes about 3–5 hours to complete this course, so you can have the job-ready skills you need to impress an employer within just two weeks!
What prior knowledge do I need for this course?
This course is intermediate level, so to get the most out of your learning, you should have basic knowledge of Python, large language models (LLMs), reinforcement learning, and instruction tuning. You should also be familiar with machine learning and neural network concepts.
What career opportunities will this course prepare me for?
This course is part of the Generative AI Engineering with LLMs specialization. When you complete the specialization, you will have the skills and confidence to take on job roles such as AI engineer, data scientist, machine learning engineer, or deep learning engineer, and it will also benefit developers seeking to work with LLMs.
Financial aid available.