Case Study | Yuan Bian

This work applies machine learning and statistical modeling to real-world prediction and inference problems across diverse domains, including social media influence, infrastructure reliability, historical survival analysis, pharmacological evaluation, and healthcare diagnostics. The studies identify key predictive and causal factors; incorporate uncertainty quantification, cost-aware decision-making, and robust missing-data handling; and systematically compare model performance across methodological approaches. The results show how data-driven methods can uncover important drivers of outcomes, such as language accessibility in TED talk popularity, meteorological conditions in pipeline failures, treatment effects in nausea and vomiting during pregnancy, demographic and socioeconomic factors in Titanic survival, and multimodal clinical indicators in Parkinson’s disease diagnosis. Collectively, the work highlights the importance of interpretability, reliability, sensitivity to modeling choices, and practical efficiency when deploying statistical and machine learning methods in practice.