IBM
Build Multimodal Generative AI Applications
IBM

Build Multimodal Generative AI Applications

Hailey Quach
IBM Skills Network Team

Instructors: Hailey Quach

Included with Coursera Plus

Gain insight into a topic and learn the fundamentals.
Intermediate level

Recommended experience

7 hours to complete
3 weeks at 2 hours a week
Flexible schedule
Learn at your own pace
Gain insight into a topic and learn the fundamentals.
Intermediate level

Recommended experience

7 hours to complete
3 weeks at 2 hours a week
Flexible schedule
Learn at your own pace

What you'll learn

  • Build the job-ready skills you need to build multimodal generative AI applications in just 3 weeks

  • Understand the fundamental concepts and challenges in multimodal AI, including the integration of text, speech, images, and video

  • Build multimodal AI applications using state-of-the-art models and frameworks such as IBM’s Granite, Meta’s Llama, OpenAI’s Whisper, DALL·E and Sora

  • Develop multimodal AI solutions, including chatbots and image/video generation models, using IBM watsonx.ai, Hugging Face, Flask and Gradio

Details to know

Shareable certificate

Add to your LinkedIn profile

Recently updated!

May 2025

Assessments

6 assignments

Taught in English

See how employees at top companies are mastering in-demand skills

 logos of Petrobras, TATA, Danone, Capgemini, P&G and L'Oreal

Build your Software Development expertise

This course is part of the IBM RAG and Agentic AI Professional Certificate
When you enroll in this course, you'll also be enrolled in this Professional Certificate.
  • Learn new concepts from industry experts
  • Gain a foundational understanding of a subject or tool
  • Develop job-relevant skills with hands-on projects
  • Earn a shareable career certificate from IBM
Coursera Career Certificate

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV

Share it on social media and in your performance review

Coursera Career Certificate

There are 3 modules in this course

This module provides an in-depth introduction to multimodal AI, focusing on how AI systems process and integrate multiple data types, including text, speech, and images. You will explore core concepts and some of the challenges you will face in multimodal AI, gaining foundational skills with text and speech processing techniques. Through hands-on labs, you will apply AI-powered storytelling, speech-to-text transcription, and text-to-speech synthesis to real-world applications, such as AI-generated audiobooks and automated meeting assistants. 

What's included

4 videos2 readings2 assignments2 app items6 plugins

This module explores how AI processes generate visual data by integrating images and videos with text. You will examine text-to-image/image-to-text and text-to-video/video-to-text models, image captioning, and the fusion techniques necessary for effective multimodal AI systems. Through hands-on labs, you will apply state-of-the-art models like DALL·E and Sora to generate images and videos from text prompts. Additionally, you will implement an image captioning system using Meta’s Llama 4, gaining practical experience in combining vision and language models for real-world applications.

What's included

2 videos1 reading2 assignments2 app items3 plugins

The final module explores advanced multimodal AI applications, integrating image, text, and retrieval-based systems to build innovative solutions. You will dive into multimodal retrieval and search, multimodal Question Answering (QA), and chatbots, learning how cross-modal retrieval techniques enhance search engines and recommendation systems. Additionally, you will learn how integrating visual and textual data improves chatbot interactions. Through hands-on labs, you will build fully functional web applications with multimodal capabilities using Flask, applying state-of-the-art models and frameworks. 

What's included

3 videos3 readings2 assignments2 app items1 plugin

Instructors

Hailey Quach
IBM
0 Courses0 learners

Offered by

IBM

Why people choose Coursera for their career

Felipe M.
Learner since 2018
"To be able to take courses at my own pace and rhythm has been an amazing experience. I can learn whenever it fits my schedule and mood."
Jennifer J.
Learner since 2020
"I directly applied the concepts and skills I learned from my courses to an exciting new project at work."
Larry W.
Learner since 2021
"When I need courses on topics that my university doesn't offer, Coursera is one of the best places to go."
Chaitanya A.
"Learning isn't just about being better at your job: it's so much more than that. Coursera allows me to learn without limits."

New to Software Development? Start here.

Coursera Plus

Open new doors with Coursera Plus

Unlimited access to 10,000+ world-class courses, hands-on projects, and job-ready certificate programs - all included in your subscription

Advance your career with an online degree

Earn a degree from world-class universities - 100% online

Join over 3,400 global companies that choose Coursera for Business

Upskill your employees to excel in the digital economy

Frequently asked questions