Multi-Modal Symposium 2024


April 21st - 22nd, 2024

Location: Cheng Yu Tung Building, CUHK

Pavilion of Harmony @ CUHK (photo from Wikipedia)

The development of multi-modal large models has progressed rapidly in recent years, spanning research topics such as text-to-image and text-to-video generation, more comprehensive image and text understanding, 3D visual scene understanding, and embodied AI. At this stage of rapid progress, we have organized the Multi-Modal Symposium to invite experts to share their recent public research findings and to discuss advances in multi-modal models.

Program Schedule

You are welcome to join our symposium.
Venue 1: Room CYT LT1B, Cheng Yu Tung Building, CUHK
Venue 2: Room CYT 214, Cheng Yu Tung Building, CUHK

(April 21) Day 1 Morning Session (Venue 1)
8:30 - 9:15 Keynote Speech
Irwin King (CUHK)
9:20 - 10:05 Topic: Speech Foundation Models for Disordered Speech Reconstruction
Xixin Wu (CUHK)
10:05 - 10:20 Coffee Break
10:20 - 11:05 Topic: Towards Efficient and Resilient Visual Perception at the Edge (Zoom)
Yin Li (UW-Madison)
11:10 - 11:55 Topic: Embodied Intelligence for Surgical Robot Applications
Qi Dou (CUHK)
(April 21) Day 1 Afternoon Session (Venue 1)
13:40 - 14:25 Topic: Multimodal Agents: from Text to Multimodal Reasoning and Action (Zoom)
Zhengyuan Yang (Microsoft Redmond)
14:30 - 15:15 Topic: From Words to Actions: Toward Environment-Grounded Large Language Models for Embodied Agents
Wenjie Li (PolyU) and Jing Li (PolyU)
15:20 - 16:05 Topic: From Evaluation To Understanding: Auto-benchmarking (Multi-modal) LLMs and Beyond
Yixin Cao (SMU)
16:05 - 16:20 Coffee Break
16:20 - 17:05 Topic: From Circuit Learning to Large Circuit Models
Qiang Xu (CUHK)
17:10 - 17:55 Topic: Towards Controllable and Compositional Visual Content Generation
Xihui Liu (HKU)
18:00 - 18:45 Doctoral Forum Panel
Dr. Zhengzhe Liu
(April 21) Day 1 Afternoon Session (Venue 2)
14:30 - 17:50 Industry Seminar
(April 22) Day 2 Morning Session (Venue 1)
8:30 - 9:15 Topic: What Can Robots Generate and What Can Be Generated for Robots? (Zoom)
Huazhe Xu (Tsinghua)
9:20 - 10:05 Topic: Towards Unified Multi-modal Learning
Xiangyu Yue (CUHK)
10:05 - 10:20 Coffee Break
10:20 - 11:05 Topic: Recent Advances in Video Generation and Editing with Video Diffusion Models
Qifeng Chen (HKUST)
11:10 - 11:55 Topic: Exploring Pathways to 3D Foundation Models
Hengshuang Zhao (HKU)
(April 22) Day 2 Afternoon Session (Venue 1)
13:40 - 14:25 Keynote Speech: Design and Deployment of Multi-modal Learning Systems for Smart Health
Guoliang Xing (CUHK)
14:30 - 15:15 Topic: Learning to Reconstruct, Understand, and Recreate the 3D World
Xiaojuan Qi (HKU)
15:20 - 16:05 Topic: Multi-Modal Multi-Task Scene Perception, Reconstruction, and Generation
Dan Xu (HKUST)
16:05 - 16:20 Coffee Break
16:20 - 17:05 Topic: Efficient Diffusion Transformer for Image and Video Generation and Understanding
Ping Luo (HKU)
16:20 - 17:05 Topic: Image Quality Evaluation Using Large Language Models
Tianfan Xue (CUHK)
18:00 - 18:45 Topic: Learning Open-World Visual Knowledge from Natural Language
Dr. Yiwu Zhong
(April 22) Day 2 Afternoon Session (Venue 2)
14:30 - 17:50 Industry Seminar

Location

The Cheng Yu Tung Building (鄭裕彤樓) is located next to University MTR Station and the Hyatt Hotel, just outside the main CUHK campus. No campus pass is needed to access the building.

From University Station:
Leave the station via Exit B, then follow the walkway on your right to enter the Cheng Yu Tung Building.

From Hyatt Hotel:
Go down to the ground floor (one level below the hotel lobby), walk across the parking lot, and follow the walkway on your left to enter the Cheng Yu Tung Building.