Multi-Modal Symposium 2024


April 21st - 22nd, 2024

Location: Cheng Yu Tung Building, CUHK

Pavilion of Harmony @ CUHK (photo from Wikipedia)

The development of multi-modal large models has progressed rapidly in recent years, spanning research topics such as text-to-image and text-to-video generation, more comprehensive image and text understanding, 3D visual scene understanding, and embodied AI. At this stage of rapid progress, we have organized the Multi-Modal Symposium to invite experts to share their recent public research findings and to discuss advances in multi-modal models.

Program Schedule

You are welcome to join our symposium.
Venue 1: Room CYT LT1B, Cheng Yu Tung Building, CUHK
Venue 2: Room CYT 214, Cheng Yu Tung Building, CUHK

(April 21) Day 1 Morning Session (Venue 1)
8:30 - 9:15 Keynote Speech
Irwin King (CUHK)
9:20 - 10:05 Topic: Speech Foundation Models for Disordered Speech Reconstruction
Xixin Wu (CUHK)
10:05 - 10:20 Coffee Break
10:20 - 11:05 Topic: Towards Efficient and Resilient Visual Perception at the Edge (Zoom)
Yin Li (UW-Madison)
11:10 - 11:55 Topic: Embodied Intelligence for Surgical Robot Applications
Qi Dou (CUHK)
(April 21) Day 1 Afternoon Session (Venue 1)
13:40 - 14:25 Topic: Multimodal Agents: from Text to Multimodal Reasoning and Action (Zoom)
Zhengyuan Yang (Microsoft Redmond)
14:30 - 15:15 Topic: From Words to Actions: Toward Environment-Grounded Large Language Models for Embodied Agents
Wenjie Li (PolyU) and Jing Li (PolyU)
15:20 - 16:05 Topic: From Evaluation To Understanding: Auto-benchmarking (Multi-modal) LLMs and Beyond
Yixin Cao (SMU)
16:05 - 16:20 Coffee Break
16:20 - 17:05 Topic: From Circuit Learning to Large Circuit Models
Qiang Xu (CUHK)
17:10 - 17:55 Topic: Towards Controllable and Compositional Visual Content Generation
Xihui Liu (HKU)
18:00 - 18:45 Doctoral Forum Panel
Dr. Zhengzhe Liu
(April 21) Day 1 Afternoon Session (Venue 2)
14:30 - 17:50 Industry Seminar
(April 22) Day 2 Morning Session (Venue 1)
8:30 - 9:15 Topic: What Can Robots Generate and What Can Be Generated for Robots? (Zoom)
Huazhe Xu (Tsinghua)
9:20 - 10:05 Topic: Towards Unified Multi-modal Learning
Xiangyu Yue (CUHK)
10:05 - 10:20 Coffee Break
10:20 - 11:05 Topic: Recent Advances in Video Generation and Editing with Video Diffusion Models
Qifeng Chen (HKUST)
11:10 - 11:55 Topic: Exploring Pathways to 3D Foundation Models
Hengshuang Zhao (HKU)
(April 22) Day 2 Afternoon Session (Venue 1)
13:40 - 14:25 Keynote Speech: Design and Deployment of Multi-modal Learning Systems for Smart Health
Guoliang Xing (CUHK)
14:30 - 15:15 Topic: Learning to Reconstruct, Understand, and Recreate the 3D World
Xiaojuan Qi (HKU)
15:20 - 16:05 Topic: Multi-Modal Multi-Task Scene Perception, Reconstruction, and Generation
Dan Xu (HKUST)
16:05 - 16:20 Coffee Break
16:20 - 17:05 Topic: Efficient Diffusion Transformer for Image and Video Generation and Understanding
Ping Luo (HKU)
16:20 - 17:05 Topic: Image Quality Evaluation Using Large Language Models
Tianfan Xue (CUHK)
18:00 - 18:45 Topic: Learning Open-World Visual Knowledge from Natural Language
Dr. Yiwu Zhong
(April 22) Day 2 Afternoon Session (Venue 2)
14:30 - 17:50 Industry Seminar

Location

The Cheng Yu Tung Building (鄭裕彤樓) is located next to University MTR Station and the Hyatt Hotel, just outside the main CUHK campus. No campus pass is needed to access the building.

From University Station:
Leave the station via Exit B, then follow the walkway on your right to enter the Cheng Yu Tung Building.

From Hyatt Hotel:
Go down to the ground floor (one level below the hotel lobby), walk across the parking lot, and follow the walkway on your left to enter the Cheng Yu Tung Building.