Performance Analysis of Machine Learning Algorithms on Structured Data Sets

Ishaan Gupta

Independent Researcher

India

Abstract

Machine learning (ML) algorithms are widely used to solve complex engineering problems, especially those involving structured data sets. Structured data, commonly represented in tabular form, calls for algorithms that handle classification and regression tasks efficiently. This study presents a comparative performance analysis of several widely used ML algorithms on structured data sets, identifying their strengths and weaknesses in terms of accuracy, computational efficiency, and robustness. The algorithms evaluated are Decision Trees, Random Forests, Support Vector Machines (SVM), K-Nearest Neighbors (KNN), and Gradient Boosting Machines (GBM). Using benchmark datasets from the UCI Machine Learning Repository, the study employs cross-validation to obtain reliable performance estimates. Results indicate that ensemble methods such as Random Forests and GBM generally achieve higher accuracy than single classifiers but require more computational resources. This work gives engineers and data scientists practical guidance for selecting ML algorithms for structured data applications.

Keywords

Machine learning, structured data, decision trees, random forests, support vector machines, classification, regression, performance analysis.
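
The evaluation protocol summarized in the abstract (cross-validated accuracy together with a measure of computational cost) can be illustrated with a small, dependency-free sketch. This is not the study's code. It assumes a shuffled k-fold split, uses a minimal K-Nearest Neighbors classifier as a stand-in for the five algorithms, and runs on synthetic two-class data rather than a UCI dataset. All function names (k_fold_indices, knn_predict, cross_validate, make_toy_data) are hypothetical.

```python
import math
import random
import time
from collections import Counter


def k_fold_indices(n_samples, n_folds, seed=0):
    """Shuffle the indices once and deal them into n_folds nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training points closest to `query` (Euclidean)."""
    nearest = sorted(
        range(len(train_X)), key=lambda i: math.dist(train_X[i], query)
    )[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def cross_validate(X, y, predict, n_folds=5, seed=0):
    """Return (mean held-out accuracy, elapsed seconds) over n_folds folds."""
    folds = k_fold_indices(len(X), n_folds, seed)
    start = time.perf_counter()
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct = sum(predict(train_X, train_y, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores), time.perf_counter() - start


def make_toy_data(n_per_class=30, seed=0):
    """Two well-separated Gaussian blobs; synthetic data, not a UCI dataset."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in enumerate([(0.0, 0.0), (5.0, 5.0)]):
        for _ in range(n_per_class):
            X.append((rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)))
            y.append(label)
    return X, y


X, y = make_toy_data()
accuracy, seconds = cross_validate(X, y, knn_predict, n_folds=5)
print(f"KNN (k=3): mean accuracy {accuracy:.3f} in {seconds:.4f}s")
```

Any other classifier can be compared under the same splits by passing a function with the same (train_X, train_y, query) signature to cross_validate. For real experiments, a library such as scikit-learn (Pedregosa et al., 2011) provides tested implementations of all five algorithms named in the abstract.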

References

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.
Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785-794.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273-297.
Dua, D., & Graff, C. (2017). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml].
Fernández-Delgado, M., Cernadas, E., Barro, S., & Amorim, D. (2014). Do we need hundreds of classifiers to solve real world classification problems? Journal of Machine Learning Research, 15(1), 3133-3181.
Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189-1232.
Pedregosa, F., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825-2830.
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81-106.