Comparing the Performance of Machine Learning Algorithms for Predicting Antibiotic Resistance

By: Jameel Ali, Meris Johnson-Hagler, Faye Orcales, Kristiene Recto, John Matthew Suntay, Fayeeza Shaikh, and Lucy Moctezuma

Department: Cellular & Molecular Biology

Faculty Advisors: Dr. Pleuni Pennings and Dr. Scott Roy

Antibiotic resistance has become a global public health concern. Bacteria are evolving resistance to the current arsenal of prescribed antibiotics, giving rise to multidrug-resistant strains. Clinics currently rely on traditional culture-based assays to determine whether a bacterial strain is resistant. However, these assays are time-consuming and can be inaccurate. To determine antibiotic resistance more accurately and efficiently than traditional methods allow, we will use machine learning. We will train five types of models (Decision Trees, Random Forest, Gradient Boosted Trees, Logistic Regression, and Neural Network) on publicly available whole-genome sequences of E. coli strains. We will then compare these models to determine which is most accurate at predicting antibiotic resistance when population structure, isolation year, and gene content are used as features. We hope the results of this study will help move machine learning toward use as a diagnostic tool that accurately predicts antibiotic resistance from whole-genome sequencing data.
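The model comparison described above could be organized roughly as in the following minimal sketch. It assumes scikit-learn is available, uses 5-fold cross-validated accuracy as the comparison metric, and runs on a small synthetic dataset. The feature encoding (a one-hot population cluster, the isolation year, and binary gene presence/absence) and the resistance label are hypothetical stand-ins chosen for illustration, not the study's actual data, pipeline, or results.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_isolates, n_genes, n_clusters = 200, 50, 4

# Hypothetical features (synthetic, for illustration only):
# population structure as a one-hot cluster label, isolation year,
# and gene content as a binary presence/absence matrix.
clusters = rng.integers(0, n_clusters, size=n_isolates)
population = np.eye(n_clusters)[clusters]
year = rng.integers(2000, 2018, size=n_isolates).reshape(-1, 1)
genes = rng.integers(0, 2, size=(n_isolates, n_genes))
X = np.hstack([population, year, genes]).astype(float)

# Hypothetical label: "resistant" whenever gene 0 is present.
# This is a placeholder rule, not real biology.
y = genes[:, 0]

# The five model families named in the abstract; features are standardized
# so that isolation year does not dominate the linear and neural models.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosted Trees": GradientBoostingClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Neural Network": MLPClassifier(max_iter=1000, random_state=0),
}

# Mean 5-fold cross-validated accuracy for each model.
scores = {
    name: cross_val_score(make_pipeline(StandardScaler(), model), X, y,
                          cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

Cross-validation is used here so that every model is scored on held-out isolates under identical splits. With real data, metrics that account for class imbalance (for example balanced accuracy) may be more informative than plain accuracy.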