Machine Learning Applications
This work presents three studies that apply machine learning and statistical techniques to understand and predict outcomes in diverse real-world contexts.
The first study investigates the factors influencing the popularity of TED talks, a global platform designed to disseminate innovative ideas. Using network analysis, principal component analysis (PCA), and the least absolute shrinkage and selection operator (LASSO), we develop a predictive model for TED talk popularity. Our results show that the number of available languages is the most influential factor, contributing at least three times more than other variables, while publishing talks on weekends, particularly on Saturdays, significantly enhances audience reach.
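To illustrate how LASSO can rank the influence of variables, the following is a minimal sketch rather than the model used in the study. It fits LASSO by coordinate descent on standardized features, so that coefficient magnitudes are comparable across variables. The data are synthetic, and the names feature_a, feature_b, feature_c, and the penalty value 0.2 are assumptions chosen only for illustration.

```python
import random


def standardize(column):
    """Rescale a column to zero mean and unit (population) variance."""
    n = len(column)
    mean = sum(column) / n
    std = (sum((v - mean) ** 2 for v in column) / n) ** 0.5
    if std == 0:
        return [0.0] * n
    return [(v - mean) / std for v in column]


def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(columns, y, lam, n_iter=100):
    """Minimize (1/2n) * ||y - X b||^2 + lam * ||b||_1 one coefficient at a time."""
    n = len(y)
    beta = [0.0] * len(columns)
    residual = list(y)  # equals y - X b, and b starts at zero
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            scale = sum(v * v for v in col) / n
            if scale == 0:
                continue  # constant column carries no information
            # Correlation of column j with the residual that excludes its own effect.
            rho = sum(col[i] * (residual[i] + col[i] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / scale
            delta = new_beta - beta[j]
            for i in range(n):
                residual[i] -= col[i] * delta
            beta[j] = new_beta
    return beta


# Synthetic data (not from the study): feature_a has a strong effect,
# feature_b a weak one, and feature_c none at all.
random.seed(0)
n_samples = 200
names = ["feature_a", "feature_b", "feature_c"]
raw = {name: [random.gauss(0, 1) for _ in range(n_samples)] for name in names}
target = [
    3.0 * a + 1.0 * b + random.gauss(0, 0.3)
    for a, b in zip(raw["feature_a"], raw["feature_b"])
]
target_mean = sum(target) / n_samples
y = [t - target_mean for t in target]  # centering removes the need for an intercept

columns = [standardize(raw[name]) for name in names]
beta = lasso_coordinate_descent(columns, y, lam=0.2)

# Rank variables by the absolute size of their standardized coefficients.
ranking = sorted(zip(names, beta), key=lambda pair: abs(pair[1]), reverse=True)
for name, coef in ranking:
    print(f"{name}: {coef:.3f}")
```

Because every feature is on the same scale, the ratio between coefficients gives a rough sense of relative influence, and the penalty tends to push irrelevant coefficients to zero.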
The second study examines the impact of geographical and meteorological conditions on pipeline failures. Two machine learning models, neural networks and random forests, are employed to identify key predictive variables. The results indicate that meteorological factors play a dominant role, with snow on ground, total snow, and total rainfall emerging as the most important predictors. Among these, snow on ground consistently appears as the most influential variable, while geographical characteristics show limited contribution to pipeline failure prediction.
The third study focuses on predicting passenger survival in the 1912 Titanic disaster using multiple machine learning methods, including logistic regression, ridge regression, LASSO, decision trees, random forests, conditional forests, support vector machines, and k-nearest neighbors. The analysis confirms that women and passengers with higher socioeconomic status had higher survival probabilities. A simple rule-based model built on this observation achieved an accuracy of 75.60%, while the conditional forest method achieved the highest predictive performance with an accuracy of 81.34%.
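A rule-based baseline of the kind mentioned above can be sketched in a few lines. The exact rule used in the study is not stated here, so the rule below (predict survival for women, non-survival for men) is an assumption, and the passenger records are invented toy examples rather than rows from the Titanic data. The field names sex, pclass, and survived are likewise hypothetical.

```python
def rule_based_prediction(passenger):
    """Assumed rule: predict survival (1) for women and non-survival (0) for men."""
    return 1 if passenger["sex"] == "female" else 0


def accuracy(passengers, predict):
    """Fraction of passengers whose predicted label matches the recorded outcome."""
    correct = sum(1 for p in passengers if predict(p) == p["survived"])
    return correct / len(passengers)


# Invented toy records for illustration only (not real Titanic data).
toy_passengers = [
    {"sex": "female", "pclass": 1, "survived": 1},
    {"sex": "female", "pclass": 3, "survived": 0},
    {"sex": "male", "pclass": 1, "survived": 1},
    {"sex": "male", "pclass": 3, "survived": 0},
    {"sex": "female", "pclass": 2, "survived": 1},
    {"sex": "male", "pclass": 2, "survived": 0},
    {"sex": "male", "pclass": 3, "survived": 0},
    {"sex": "female", "pclass": 3, "survived": 1},
]

baseline_accuracy = accuracy(toy_passengers, rule_based_prediction)
print(f"Baseline accuracy on toy records: {baseline_accuracy:.2%}")
```

A baseline like this gives a reference point: a more complex model is only worth its added complexity if it clearly beats the simple rule on held-out data.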
Overall, these studies demonstrate how machine learning and statistical modeling can provide insights into complex systems and improve predictive performance across domains ranging from social media analytics to infrastructure management and historical data analysis.


