My research interests lie at the intersection of machine learning systems, theory, and AI-driven scientific discovery, where I apply my mathematical and engineering background to build practical solutions. I work at the Cornell Relax ML Lab with Prof. Chris De Sa on efficient machine learning algorithms and systems. Our work, grounded in mathematical principles, aims to accelerate large-scale, high-performance machine learning with systems that are efficient, parallel, and distributed in real-world settings. In parallel, I collaborate with Ph.D. students on AI-driven molecule generation.
I am pursuing a Master of Engineering in Computer Science at Cornell University. Before that, I earned dual B.S. degrees in Computer Science and Mathematics at the University of Massachusetts Amherst, with a minor in Japanese; within Mathematics, I concentrated on Applied Math and Scientific Computing. As an undergraduate, I did research at the UMass BioNLP Lab under Prof. Hong Yu, focusing on biomedical and clinical NLP applications within Electronic Health Records.
Download my recent CV.
Check out garywei.dev for my non-academic personal website!
M.Eng. in Computer Science, 2023
Cornell University
B.S. in Computer Science, 2022
University of Massachusetts Amherst
B.S. in Mathematics, 2022
University of Massachusetts Amherst
High School Diploma, 2018
Beijing No.4 High School
TA for CS 4700: Foundations of Artificial Intelligence
Research & Projects:
De novo genome assembly is one of the most important tasks in computational biology. ELBA is a state-of-the-art distributed-memory parallel algorithm for the overlap detection and layout simplification steps of de novo genome assembly, but it has a performance bottleneck in pairwise alignment.
In this work, we introduce three GPU schedulers for ELBA to accommodate multiple MPI processes and multiple GPUs. The schedulers allow multiple MPI processes to perform computation on the GPUs in a round-robin fashion. Both strong and weak scaling experiments show that all three schedulers significantly improve on the baseline, with a trade-off between parallelism and GPU scheduler overhead. The best-performing implementation, the one-to-one scheduler, achieves a ~7-8x speed-up with 25 MPI processes compared with the baseline vanilla ELBA GPU scheduler.
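For intuition, here is a minimal MPI + CUDA sketch of round-robin GPU assignment across MPI processes. It is an illustrative assumption about the general scheme, not ELBA's actual scheduler code; the rank-modulo device mapping and the program structure are my own simplifications.

```cpp
// Minimal sketch: round-robin binding of MPI ranks to GPUs.
// Hypothetical illustration, not ELBA's scheduler implementation.
#include <mpi.h>
#include <cuda_runtime.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int num_gpus = 0;
    cudaGetDeviceCount(&num_gpus);
    if (num_gpus == 0) {
        fprintf(stderr, "rank %d: no GPUs visible\n", rank);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    // Round-robin: rank i computes on GPU (i mod num_gpus), so ranks
    // share devices evenly when there are more processes than GPUs.
    int device = rank % num_gpus;
    cudaSetDevice(device);
    printf("MPI rank %d bound to GPU %d of %d\n", rank, device, num_gpus);

    // ... pairwise-alignment kernels would launch on this device ...

    MPI_Finalize();
    return 0;
}
```

The appeal of this mapping is that it keeps load roughly balanced without any coordination between ranks, though, as noted above, sharing a device among several processes trades parallelism against scheduling overhead on the GPU.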