Comparative Evaluation of the Machine Learning Algorithms Support Vector Machine and Random Forest: Effects of Training Set Size

Julián Garzón Barrero; Nancy Estela Sánchez Pineda; Darío Fernando Londoño Pinilla

doi:10.18359/rcin.6996

Julián Garzón Barrero Universidad del Quindío
Nancy Estela Sánchez Pineda http://orcid.org/0009-0008-4259-9505
Darío Fernando Londoño Pinilla Universidad del Quindío

Keywords: Machine Learning (ML), Object-Based Image Analysis (OBIA), Support Vector Machine (SVM), Random Trees (RT), training samples, satellite image classification, geomatics engineering, remote sensing


Abstract

This study examined the performance of the Support Vector Machine (SVM) and Random Forest (RF) algorithms using an object-based image analysis (OBIA) segmentation model in the metropolitan area of Barranquilla, Colombia. The aim was to investigate how changes in training set size and imbalance among land cover classes affect the accuracy of the classification models. The Kappa coefficient and overall accuracy values showed that SVM consistently outperformed RF. In addition, the inability to calibrate certain SVM parameters in ArcGIS Pro posed challenges. The choice of the number of trees in RF proved critical: a limited number of trees (50) reduced the model's adaptability, especially on imbalanced datasets. The study highlights the complexity of choosing and configuring machine learning models, and underscores the importance of carefully considering class proportions and the homogeneity of data distributions to achieve accurate predictions in land use and land cover classification. According to the findings, achieving user's accuracies above 90% for the clean pasture, forest, road network, and inland water classes with the SVM model in ArcGIS Pro requires assigning training samples that cover 2%, 1%, 3%, and 8% of the classified area, respectively.
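The abstract reports results in terms of overall accuracy, the Kappa coefficient, and user's accuracy. As a reminder of how these metrics relate to a confusion matrix, the following is a minimal sketch in plain Python. The matrix values are hypothetical and chosen only for illustration; they are not data from the study, and the function names are not taken from ArcGIS Pro or from the authors' workflow. The sketch assumes rows hold the classified (map) labels and columns hold the reference labels.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm):
    """Cohen's kappa: agreement beyond what row/column totals predict by chance."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(n)) / total
    row_totals = [sum(cm[i]) for i in range(n)]
    col_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    expected = sum(row_totals[k] * col_totals[k] for k in range(n)) / total ** 2
    return (observed - expected) / (1 - expected)


def users_accuracy(cm):
    """Per-class user's accuracy: diagonal count divided by the row total.

    Assumes rows are classified labels; returns None for a class that
    was never predicted (empty row).
    """
    result = []
    for i, row in enumerate(cm):
        row_total = sum(row)
        result.append(cm[i][i] / row_total if row_total else None)
    return result


# Hypothetical two-class matrix (illustrative values only, not from the study).
example = [
    [45, 5],   # classified as class 0: 45 correct, 5 wrong
    [10, 40],  # classified as class 1: 10 wrong, 40 correct
]

print(f"Overall accuracy: {overall_accuracy(example):.2f}")
print(f"Kappa: {cohen_kappa(example):.2f}")
print("User's accuracy per class:", users_accuracy(example))
```

Because Kappa discounts the agreement expected by chance, it is typically lower than overall accuracy for the same matrix, which is one reason the two are usually reported together when class proportions are imbalanced.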

Author biographies

Julián Garzón Barrero, Universidad del Quindío

Ph.D. in Geomatics Engineering, M.Sc. in Geographic Information Systems, Specialist in Geomatics. Universidad del Quindío, Topographic and Geomatics Engineering Program, Armenia, Colombia.

Nancy Estela Sánchez Pineda, http://orcid.org/0009-0008-4259-9505

M.Sc. in Hydraulic and Environmental Engineering, civil engineer. Universidad del Quindío, Topographic and Geomatics Engineering Program, Armenia, Colombia.

Darío Fernando Londoño Pinilla, Universidad del Quindío

M.Sc. in Engineering with an emphasis in Geomatics. Bachelor's degree (Licenciado) in Mathematics. Universidad del Quindío, Topographic and Geomatics Engineering Program, Armenia, Colombia.

How to cite

Garzón Barrero, J., Sánchez Pineda, N. E., & Londoño Pinilla, D. F. (2023). Evaluación comparativa de los algoritmos de aprendizaje automático Support Vector Machine y Random Forest: efectos del tamaño del conjunto de entrenamiento. Ciencia e Ingeniería Neogranadina, 33(2), 131–148. https://doi.org/10.18359/rcin.6996
