I am a Postdoctoral Research Scientist in the Department of Biostatistics at Columbia University, working with Professor Yuanjia Wang. I also currently serve on the Statistical Society of Canada’s Student and Recent Graduates Committee.
I was very fortunate to be supervised by Professors Grace Y. Yi, Wenqing He, Shelley B. Bull, and Rahim Moineddin during my PhD and Master’s studies. I also had the privilege of working as a research trainee at the Fields Institute for Research in Mathematical Sciences, the Lunenfeld-Tanenbaum Research Institute, the University of Toronto, and the University of Western Ontario.
During my graduate studies, my research focused on developing statistical methodologies and machine learning techniques to address challenges posed by noisy data, including missing data, censored data, and measurement errors. My current research explores machine learning and reinforcement learning approaches, with applications to interventional and observational biomedical data, such as electronic health records, multi-modal neuroimaging/omics, behavioral tasks, clinical assessments, and environmental exposures.
PhD in Statistics, 2024
University of Western Ontario
MSc in Biostatistics, Thesis Option, 2020
University of Toronto
HBSc in Mathematics and Statistics, 2017
University of Toronto
This work investigates joint models for genetic association with longitudinal biomarkers and time-to-event outcomes, developing a closed-form sample size formula, using spline functions for robustness against non-linearity, and evaluating validity, sensitivity, and power through simulations and application to genetic data from the Diabetes Control and Complications Trial.
This work evaluates the efficacy of Diclectin for nausea and vomiting during pregnancy (NVP) using a double-blind randomized controlled trial, finding that statistical inferences about its effectiveness depend on the choice of missing data methods and models, with results suggesting its benefit is not clinically significant under the pre-specified minimal clinically important difference.
This work addresses challenges in missing data analysis by proposing a unified modeling framework that uses generalized additive models to handle various missing data mechanisms without requiring specific assumptions, while incorporating regularized likelihood for concurrent estimation and variable selection, with rigorous theoretical guarantees and demonstrated empirical performance.