Predicting the Survival of Titanic Passengers


Abstract
In 1912, the Titanic sank during her maiden voyage, killing more than 1,500 people. This paper uses several machine learning techniques, namely Principal Component Analysis, Logistic Regression, Ridge Regression, Lasso Regression, Decision Tree, Random Forest, Conditional Forest, Support Vector Machine, and K Nearest Neighbours, to predict whether a given passenger survived, based on the information available about the passengers on board. Over the course of the study, our assumption that women and wealthy passengers were more likely to survive proved to be true. A naive prediction model based on this assumption alone achieves an accuracy of 75.60%. The Conditional Forest method outperformed the other methods with a score of 81.34%, followed by K Nearest Neighbours and Support Vector Machine.

Yuan Bian
Incoming Postdoc in Biostatistics