ABOUT ME

I am an undergraduate student at Rice University studying computer science. I’m applying to Ph.D. programs in the Fall 2024 cycle.

My research interests lie at the intersection of efficient ML and hardware-aware systems. I am interested in developing efficient algorithms for LLMs, as well as systems that accelerate these emerging architectures on hardware.

My most recent project proposed a new fine-tuning paradigm for compressed LLMs based on parameter sharing, achieving better accuracy and up to 3x higher inference efficiency than SOTA adapter-based compressive fine-tuning techniques. Our paper is under submission to ICLR’2025.

Previously, I developed a concurrent LLM adapter serving system tailored for structurally sparse BOFT adapters, achieving a 2.12x speedup over SOTA LoRA adapter serving systems under restricted GPU resources. Our paper was accepted to EMNLP’2024 Findings and presented at the ES-FOMO workshop at ICML’24.