I am a Research Scientist at Google Android XR — Multimodal AI for Glasses, building multimodal LLMs that run in real time on wearable hardware. My work bridges foundation-model research and the hardware that ships to users: from 3D action recognition (InfoGCN, 500+ citations) to speech foundation model compression (DiceHuBERT) to on-device LLMs for Ray-Ban Meta glasses and Ray-Ban Meta Display.
Previously, I was a Research Engineer at Meta Reality Labs and an AIML Resident (later converted to ML Engineer) at Apple Siri Speech & Understanding. I received my Ph.D. in ECE from Purdue University, advised by Prof. Karthik Ramani. My work includes 17+ papers in CVPR, ECCV, TPAMI, and Interspeech (1,000+ citations) and 7+ patents in multimodal AI, on-device foundation models, and human activity understanding.
I work on bringing foundation models out of the datacenter and onto the devices people actually wear — glasses, phones, and AR/VR headsets. That means tight compute, memory, and power budgets, and a different set of research questions than cloud-scale LLMs: What does efficient multimodal pretraining look like when the target is a 4-bit model? How do you distill a large speech encoder without losing downstream transfer? What representations compose vision, audio, language, and sensor streams into a single on-device policy?
📫 stnoah1@gmail.com — reach out if you’re working on on-device multimodal foundation models for wearables. Always happy to chat research or collaborations.