ISEG

Student: José Diogo Sequeira Bertão

Abstract

This thesis investigates the potential of data analytics in healthcare, with an emphasis on building machine learning models for the early diagnosis of diabetes. It responds to the growing global prevalence of diabetes and the urgent need for early detection techniques that are accurate, widely available, and low-cost. Traditional diagnostic techniques, although useful, frequently detect the disease at a later stage and may not be accessible to everyone who needs them, which increases the risk of severe complications. The study uses a dataset from Kaggle containing several features relevant to the diagnosis of diabetes to build prediction models aimed at detecting the disease early. The methodology applies several machine learning techniques, implemented in Python: Logistic Regression, Decision Tree, Random Forest, Extreme Gradient Boosting (XGBoost), Gradient Boosting, Naive Bayes, K-Nearest Neighbors, and Neural Networks (Multi-layer Perceptron). The models were evaluated on their accuracy and precision for diabetes detection. The thesis also examines the importance of feature selection for improving the models' predictive performance. The primary findings highlight how data analytics can transform healthcare, especially in the management of chronic diseases. The resulting models achieved good levels of accuracy, suggesting that data-driven procedures can greatly enhance conventional diagnostic techniques. In addition to supporting current initiatives to prevent diabetes through early identification, this work sheds light on the wider health implications of data analytics and offers directions for future research into the use of technology to improve medical outcomes.
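
The two evaluation criteria named in the abstract, accuracy and precision, can be illustrated with a minimal sketch. This is not the thesis code: the function name `accuracy_and_precision` and the toy labels are hypothetical, invented only to show how the two metrics are computed for a binary diabetic (1) versus non-diabetic (0) prediction.

```python
from typing import Sequence


def accuracy_and_precision(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> tuple[float, float]:
    """Return (accuracy, precision) for binary labels where 1 = diabetic."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives

    # Accuracy: share of all predictions that are correct.
    accuracy = (tp + tn) / len(pairs)
    # Precision: share of predicted positives that are truly positive.
    # It is undefined when nothing is predicted positive; 0.0 is returned
    # here as a convention chosen for this sketch.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision


# Hypothetical labels, invented purely for illustration (not thesis data).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
acc, prec = accuracy_and_precision(y_true, y_pred)
print(f"accuracy={acc:.3f} precision={prec:.3f}")
```

In a screening setting the two numbers answer different questions: accuracy summarizes overall correctness, while precision indicates how often a positive prediction corresponds to an actual case, so the two can diverge when the classes are imbalanced.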

Master's Final Work

TFM_José Diogo Sequeira Bertão