Predicting Antibiotic Resistance in E. coli Using Machine Learning Models
Author: Melika Teimouri
Faculty Supervisor: Pleuni Pennings
Department: Biology
Antibiotic resistance in Escherichia coli (E. coli) poses a significant public health challenge: drug-resistant strains must be identified quickly and accurately to optimize patient treatment and curb the spread of resistance. This study evaluates machine learning techniques for predicting antibiotic resistance in E. coli from genomic data. Genomes from a one-year dataset collected at a US hospital were annotated and subjected to pan-genome analysis to identify relevant features. Complementary metadata on antibiotic susceptibility and epidemiology were integrated using Python. Random Forest (RF) and Extreme Gradient Boosted Tree (XGBT) models were trained to predict resistance, and their performance was assessed using accuracy, precision, and recall. Both models performed well, with XGBT slightly outperforming RF across antibiotics. Results focus on penicillins (ampicillin), tetracycline, and fluoroquinolones (ciprofloxacin). Identifying the features that most influence resistance predictions can inform medical decision-making and highlights the potential of machine learning to predict E. coli antibiotic resistance accurately. Future work includes expanding the scope of the study, improving accuracy, and exploring deep neural networks and other machine learning models, with implications for patient care and resistance mitigation.
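
The three evaluation metrics named above can be illustrated with a minimal sketch. It assumes binary labels (1 = resistant, 0 = susceptible) and uses hypothetical labels for eight isolates; the function names `confusion_counts` and `evaluate` are illustrative and are not taken from the study, which does not state what tooling was used to compute its metrics.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN), treating 1 as resistant and 0 as susceptible."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def evaluate(y_true, y_pred):
    """Compute accuracy, precision, and recall for the resistant class."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        # Fraction of all isolates classified correctly.
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        # Of isolates predicted resistant, the fraction that truly are.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of truly resistant isolates, the fraction the model caught.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical labels for eight isolates (not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 1]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

In a clinical setting recall on the resistant class is often the metric to watch, since a missed resistant isolate (a false negative) may lead to an ineffective prescription, whereas a false positive mainly costs a broader-spectrum drug choice.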