Jing Yu Koh


I am a second-year PhD student in the Machine Learning Department at Carnegie Mellon University, advised by Daniel Fried and Ruslan Salakhutdinov. I work on grounded language understanding, usually in the context of vision-and-language problems.

Prior to this, I was a Research Engineer (and previously an AI Resident) at Google Research on Jason Baldridge's team from 2019 to 2022, where I worked on vision-and-language problems and generative models. Before that, I graduated summa cum laude (highest honors) from the Singapore University of Technology and Design in 2019.

My first name is "Jing Yu" and informally I go by the nickname "JY". 许靖宇 is my name in Chinese. I'm from Singapore.


Selected Publications


VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks

Jing Yu Koh, Robert Lo*, Lawrence Jang*, Vikram Duvvur*, Ming Chong Lim*, Po-Yu Huang*, Graham Neubig, Shuyan Zhou, Ruslan Salakhutdinov, Daniel Fried

Preprint, 2024.


Generating Images with Multimodal Language Models

Jing Yu Koh, Daniel Fried, Ruslan Salakhutdinov

Advances in Neural Information Processing Systems (NeurIPS), 2023.

Grounding Language Models to Images for Multimodal Inputs and Outputs

Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried

The International Conference on Machine Learning (ICML), 2023.

VQ3D: Learning a 3D-Aware Generative Model on ImageNet

Kyle Sargent, Jing Yu Koh, Han Zhang, Huiwen Chang, Charles Herrmann, Pratul Srinivasan, Jiajun Wu, Deqing Sun

The International Conference on Computer Vision (ICCV), 2023.

Simple and Effective Synthesis of Indoor 3D Scenes

Jing Yu Koh*, Harsh Agrawal*, Dhruv Batra, Richard Tucker, Austin Waters, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson (* denotes equal contribution)

AAAI Conference on Artificial Intelligence (AAAI), 2023.


Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

Jiahui Yu, Yuanzhong Xu, Jing Yu Koh, Thang Luong, Gunjan Baid, Zirui Wang, Vijay Vasudevan, Alexander Ku, Yinfei Yang, Burcu Karagol Ayan, Ben Hutchinson, Wei Han, Zarana Parekh, Xin Li, Han Zhang, Jason Baldridge, Yonghui Wu

Transactions on Machine Learning Research (TMLR), 2022.


Pathdreamer: A World Model for Indoor Navigation

Jing Yu Koh, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson

The International Conference on Computer Vision (ICCV), 2021.

Vector-quantized Image Modeling with Improved VQGAN

Jiahui Yu, Xin Li, Jing Yu Koh, Han Zhang, Ruoming Pang, James Qin, Alexander Ku, Yuanzhong Xu, Jason Baldridge, Yonghui Wu

The International Conference on Learning Representations (ICLR), 2022.


Cross-Modal Contrastive Learning for Text-to-Image Generation

Han Zhang*, Jing Yu Koh*, Jason Baldridge, Honglak Lee, Yinfei Yang (* denotes equal contribution)

The Conference on Computer Vision and Pattern Recognition (CVPR), 2021.

Text-to-Image Generation Grounded by Fine-Grained User Attention

Jing Yu Koh, Jason Baldridge, Honglak Lee, Yinfei Yang

The IEEE Winter Conference on Applications of Computer Vision (WACV), 2021.


SideInfNet: A Deep Neural Network for Semi-Automatic Semantic Segmentation with Side Information

Jing Yu Koh, Duc Thanh Nguyen, Quang-Trung Truong, Sai-Kit Yeung, Alexander Binder

The European Conference on Computer Vision (ECCV), 2020.