A Machine Learning Approach to Identify High-Risk Road Segments and Accident Severity Patterns Based on Categorical Data

dc.contributor.author: Tercan, SH
dc.contributor.author: Yumak, A
dc.contributor.author: Colak, UC
dc.contributor.author: Ozcanan, S
dc.date.accessioned: 2026-01-19T11:06:12Z
dc.date.available: 2026-01-19T11:06:12Z
dc.date.issued: Dec 4 2025 [en_US]
dc.department: HKÜ, Faculty of Engineering, Department of Civil Engineering [en_US]
dc.description.abstract: Traffic accidents remain a major public safety concern, particularly in regions where rapid motorization and limited infrastructure increase crash risk. This study proposes a machine learning-based framework to classify traffic accident severity and identify high-risk road segments using multidimensional crash data from Şırnak Province, Turkey. The dataset, obtained from the General Directorate of Security (EGM), contains 29 variables describing traffic, geometric, and operational roadway characteristics for crashes reported between 2018 and 2023. Because of the severe imbalance between injury and fatal crashes, the Synthetic Minority Oversampling Technique (SMOTE) was applied to enhance model sensitivity to the minority class. Five classifiers, namely Logistic Regression (LR), Support Vector Machines (SVM), Multilayer Perceptron (MLP), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), were trained and evaluated using accuracy, F1-score, ROC-AUC, and alarm metrics. Results from the original dataset showed that several models struggled to detect fatal crashes, while LR demonstrated moderate sensitivity. After SMOTE, performance improved across all models. XGBoost achieved the highest F1-score (0.61) with the lowest False Alarm rate (0.01), followed by RF and MLP, whereas SVM and LR yielded comparatively lower accuracy. Computation time analysis indicated that LR and SVM had the fastest runtimes, while MLP and XGBoost required longer training times. Overall, the findings highlight the effectiveness of ensemble models, particularly XGBoost, in capturing critical crash patterns and supporting risk-based decision-making. Future work should incorporate time-series analysis and GIS-based spatial modeling to further enhance predictive capability and inform geographically targeted safety interventions. [en_US]
dc.identifier.citation: Tercan, SH, Yumak, A, Colak, UC & Ozcanan, S (Dec 4 2025). A Machine Learning Approach to Identify High-Risk Road Segments and Accident Severity Patterns Based on Categorical Data. Applied Sciences-Basel, 15(23). https://doi.org/10.3390/app152312824 [en_US]
dc.identifier.doi: 10.3390/app152312824
dc.identifier.issn: 2076-3417
dc.identifier.issue: 23 [en_US]
dc.identifier.orcid: 0000-0002-1729-6421 [en_US]
dc.identifier.scopus: 2-s2.0-105024688292
dc.identifier.scopusquality: Q1
dc.identifier.uri: https://doi.org/10.3390/app152312824
dc.identifier.uri: https://hdl.handle.net/20.500.11782/5160
dc.identifier.volume: 15 [en_US]
dc.indekslendigikaynak: Scopus
dc.language.iso: en
dc.relation.ispartof: Applied Sciences-Basel
dc.relation.publicationcategory: Article - National Peer-Reviewed Journal - Institutional Faculty Member [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.snmz: HKUDK
dc.subject: traffic accident [en_US]
dc.subject: machine learning [en_US]
dc.subject: classification [en_US]
dc.subject: accident severity prediction [en_US]
dc.subject: XGBoost [en_US]
dc.subject: data mining [en_US]
dc.title: A Machine Learning Approach to Identify High-Risk Road Segments and Accident Severity Patterns Based on Categorical Data
dc.type: Article
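
Illustrative note on the abstract: the record does not include the authors' code, so the following is only a minimal sketch of the two ideas the abstract names, SMOTE-style oversampling of a minority class and the F1 / False Alarm metrics. It uses the Python standard library only. The function names, the toy points, and the choice of `k` are hypothetical, and the False Alarm rate is assumed here to mean FP / (FP + TN), which the abstract does not define. Plain SMOTE interpolates numeric features; how the study encoded its categorical variables is not stated in this record.

```python
import math
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style sketch).
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> target
        synthetic.append(tuple(b + gap * (t - b) for b, t in zip(base, target)))
    return synthetic


def f1_and_false_alarm(y_true, y_pred, positive=1):
    """Return (F1-score, false alarm rate) for the given positive class.
    False alarm rate is assumed to be FP / (FP + TN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    false_alarm = fp / (fp + tn) if fp + tn else 0.0
    return f1, false_alarm


# Toy minority class (e.g. fatal crashes) described by two numeric features.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote_oversample(minority_points, n_new=4)

# Toy labels: 1 = fatal crash (minority), 0 = injury crash.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
f1, false_alarm = f1_and_false_alarm(y_true, y_pred)
```

In practice, oversampling of this kind is normally applied to the training split only, so that synthetic points never leak into the evaluation data; whether the study did so is not stated in the abstract.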

Files

Original bundle

Name: applsci-15-12824.pdf
Size: 2.52 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.44 KB
Format: Item-specific license agreed upon to submission