Comparative Evaluation of Support Vector Machine and Random Forest Machine Learning Algorithms: Effects of Training Set Size

Julián Garzón Barrero; Nancy Estela Sánchez Pineda; Darío Fernando Londoño Pinilla

doi:10.18359/rcin.6996

Julián Garzón Barrero Universidad del Quindío
Nancy Estela Sánchez Pineda http://orcid.org/0009-0008-4259-9505
Darío Fernando Londoño Pinilla Universidad del Quindío

Keywords: Machine Learning (ML), Object-Based Image Analysis (OBIA), Support Vector Machine (SVM), Random Trees (RT), Training Samples, Satellite Image Classification, Geomatic Engineering, Remote Sensing

Abstract

This study examined the performance of the Support Vector Machine (SVM) and Random Forest (RF) algorithms within an Object-Based Image Analysis (OBIA) model in the metropolitan area of Barranquilla, Colombia. The purpose was to investigate how changes in training set size and imbalance among land cover classes influence classifier accuracy. Kappa coefficient and overall accuracy values consistently showed that SVM outperformed RF. Additionally, the inability to calibrate certain SVM parameters in ArcGIS Pro posed challenges. The number of trees in RF proved crucial: a limited number of trees (50) affected the model's adaptability, especially on imbalanced datasets. This study highlights the complexity of choosing and configuring machine learning models, emphasizing the importance of carefully considering class proportions and homogeneity in data distributions to achieve accurate predictions in land use and land cover classification. According to the findings, achieving user accuracies exceeding 90% in the clean grass, forest, road network, and continental water classes with the SVM model in ArcGIS Pro requires training samples covering 2%, 1%, 3%, and 8% of the classified area, respectively.
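The accuracy measures named in the abstract (overall accuracy, Kappa coefficient, and user accuracy) can all be derived from a confusion matrix. The following is a minimal sketch of those standard formulas, not the authors' ArcGIS Pro workflow; the function name `accuracy_metrics`, the row/column convention, and the two-class example matrix are illustrative assumptions and are not taken from the study.

```python
def accuracy_metrics(matrix):
    """Return (overall accuracy, Cohen's kappa, per-class user's accuracy).

    Assumed convention: matrix[i][j] is the number of samples whose
    reference class is i and whose predicted (mapped) class is j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    row_totals = [sum(matrix[i]) for i in range(n)]                        # reference totals
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # predicted totals

    # Overall accuracy: share of samples on the diagonal.
    overall = correct / total
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    # Cohen's kappa: agreement beyond chance, scaled to its maximum.
    kappa = (overall - expected) / (1 - expected)
    # User's accuracy: correct samples divided by all samples mapped to the class.
    users = [
        matrix[j][j] / col_totals[j] if col_totals[j] else float("nan")
        for j in range(n)
    ]
    return overall, kappa, users


# Hypothetical two-class confusion matrix (illustrative numbers only).
example = [
    [45, 5],
    [10, 40],
]
overall, kappa, users = accuracy_metrics(example)
print(f"overall={overall:.3f} kappa={kappa:.3f} users={[round(u, 3) for u in users]}")
```

Because Kappa discounts chance agreement, it penalizes a classifier that scores well only by favoring the dominant class, which is why it is commonly reported alongside overall accuracy when classes are imbalanced.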

Author Biographies

Julián Garzón Barrero, Universidad del Quindío

Ph.D. in Geomatics Engineering, Master's degree in Geographic Information Systems, Specialist in Geomatics. Universidad del Quindío, Topographic and Geomatics Engineering Program, Armenia, Colombia.

Nancy Estela Sánchez Pineda, http://orcid.org/0009-0008-4259-9505

Master's degree in Hydraulic and Environmental Engineering, civil engineer. Universidad del Quindío, Topographic and Geomatics Engineering Program, Armenia, Colombia.

Darío Fernando Londoño Pinilla, Universidad del Quindío

Master's degree in Engineering with an emphasis in Geomatics. Licentiate in Mathematics. Universidad del Quindío, Topographic and Geomatics Engineering Program, Armenia, Colombia.

How to Cite

Garzón Barrero, J., Sánchez Pineda, N. E., & Londoño Pinilla, D. F. (2023). Comparative Evaluation of Support Vector Machine and Random Forest Machine Learning Algorithms: Effects of Training Set Size. Ciencia e Ingeniería Neogranadina, 33(2), 131–148. https://doi.org/10.18359/rcin.6996
