This course covers transformer models and their applications in natural language processing and computer vision. Topics include the T5 model, fine-tuning for tasks such as abstractive summarization, and the Vision Transformer. Students will learn to build an image captioning system by combining vision and language models. The course also provides practical instruction on deploying models, including MLOps practices, sharing models on Hugging Face, and cloud deployment with FastAPI. By the end of the course, students will be able to implement, fine-tune, and deploy transformer models for real-world tasks.



Introduction to Transformer Models for NLP: Unit 3
This course is part of the Introduction to Transformer Models for NLP Specialization

Instructor: Pearson
What you'll learn
- Understand and apply the T5 model’s end-to-end transformer architecture for advanced NLP tasks.
- Fine-tune and evaluate transformer models for complex applications like abstractive summarization.
- Leverage Vision Transformers and build custom image captioning systems by combining vision and language models.
- Deploy, share, and operationalize transformer models using modern MLOps tools and cloud frameworks.
Details to know

August 2025
5 assignments

There is 1 module in this course
This module explores advanced transformer models and their applications across natural language processing and computer vision. Learners will examine the T5 model’s end-to-end architecture and cross-attention mechanism, apply and fine-tune T5 for complex NLP tasks, and discover how vision transformers extend these techniques to image processing and image captioning. The module concludes with practical strategies for deploying and sharing transformer models using MLOps principles, Hugging Face, and FastAPI, equipping students with both theoretical understanding and hands-on skills for state-of-the-art model development and deployment.
What's included
15 videos, 5 assignments