Zeeshan Patel

I am a Member of Technical Staff at xAI focusing on multimodal / world models research.

Previously, I worked on generative world models at NVIDIA Research and foundation models at Apple AI/ML.

I graduated with an M.S. in EECS from UC Berkeley in May 2025, advised by Professor Alexei Efros and Professor Jitendra Malik at Berkeley Artificial Intelligence Research (BAIR). I graduated with honors from UC Berkeley with a Bachelor's in CS & Statistics in May 2024.

X  /  LinkedIn  /  GitHub  /  Google Scholar

Research

I'm broadly interested in deep learning, generative models, and physical AI. Specifically, I'm interested in scaling deep learning with principled techniques that efficiently utilize data and compute.

Cosmos World Foundation Model Platform for Physical AI
NVIDIA: Zeeshan Patel (Contributor)
arXiv, 2025
project page / arXiv / code / keynote / press: New York Times, Wall Street Journal, Fortune, TechCrunch, Forbes, Wired, BBC

Generative world foundation models for data-driven simulation of physical AI systems.

Training Video Foundation Models with NVIDIA NeMo
NVIDIA: Zeeshan Patel (Lead Contributor)
arXiv / NVIDIA technical blog / code

Open-source video foundation model training framework, providing accelerated video dataset curation, multimodal dataloading, and parallelized video diffusion model training and inference.

Scaling Properties of Diffusion Models For Perceptual Tasks
Zeeshan Patel*, Rahul Ravishankar*, Jathushan Rajasegaran, Jitendra Malik
CVPR, 2025
project page / arXiv / code

Iterative computation with diffusion models offers a powerful paradigm for not only generation but also visual perception tasks. We unify tasks such as depth estimation, optical flow, and segmentation under image-to-image translation, and show how diffusion models benefit from scaling training and test-time compute for these perception tasks.

SWAG: Storytelling With Action Guidance
Zeeshan Patel*, Jonathan Pei*, Karim El-Refai*, Tianle Li
EMNLP, 2024
arXiv

We introduce Storytelling With Action Guidance (SWAG), a novel approach to storytelling with LLMs. Our approach reduces story writing to a search problem through a two-model feedback loop. SWAG can optimize open-source LLMs to substantially outperform previous end-to-end story generation techniques that rely on closed-source models.

Exploring Diffusion and Flow Matching Under Generator Matching
Zeeshan Patel*, James DeLoye, Lance Mathias
Preprint, 2024
arXiv

We explore diffusion and flow matching models under the theoretical framework of generator matching. Our analysis offers a fresh perspective on the relationships between these state-of-the-art generative modeling paradigms and how to build new generative Markov processes that benefit from both approaches.

Test-Time Training for Image Superresolution
Zeeshan Patel*, Yossi Gandelsman
Preprint, 2023
paper / code

We present a self-supervised test-time training approach that fine-tunes image superresolution models on the fly to adapt to new test distributions.



Template by Jon Barron.