Scalable Distributed K-Means Clustering Using the Firefly Algorithm with Tree- and Hash-Based Optimization for Big Data

Authors

  • Shivlingappa Battur, KLE Technological University, Hubli, Karnataka, India
  • Shashikumar Totad, KLE Technological University, Hubli, Karnataka, India

Volume: 16 | Issue: 3 | Pages: 37033-37039 | June 2026 | https://doi.org/10.48084/etasr.17969

Abstract

The rapid growth of digital data has exposed significant limitations in traditional clustering methods, particularly with respect to scalability, computational overhead, and clustering quality. To address these challenges, this paper proposes Firefly–K-Means with Tree- and Hash-based optimization (FKTH), a scalable distributed clustering framework that integrates an adaptive Firefly Algorithm (FA) with K-Means, enhanced through KD-Tree–based distance computation, hash map–based constant-time centroid updates, and Hadoop MapReduce–based parallel processing. The adaptive Firefly component dynamically adjusts attraction, absorption, and randomness parameters during optimization to balance exploration and exploitation and avoid premature convergence. The proposed framework is evaluated on large-scale real-world datasets ranging from 100K to over 1M records across varying cluster node configurations. Experimental results demonstrate that FKTH achieves superior scalability and consistently outperforms existing metaheuristic-based clustering methods in terms of execution time, Silhouette Score, Davies–Bouldin Index (DBI), and F1-score, making it well suited for large-scale distributed data analytics.
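As a rough illustration of how the pieces named in the abstract could fit together, the following single-process Python sketch pairs a firefly search over candidate centroid sets with a map-style assignment step and a hash-map-based centroid update. It is not the authors' implementation: the function names, the starting values of `beta0`, `gamma`, and `alpha`, and the 0.95 decay of the randomness term are all assumptions, a brute-force nearest-centroid search stands in for the KD-Tree lookup, and plain function calls stand in for Hadoop MapReduce jobs.

```python
import math
import random
from collections import defaultdict


def nearest(point, centroids):
    # Brute-force nearest-centroid search (stand-in for a KD-Tree query).
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))


def map_assign(points, centroids):
    # "Map" phase: emit one (cluster_id, point) pair per record.
    return [(nearest(p, centroids), p) for p in points]


def reduce_update(pairs, centroids):
    # "Reduce" phase: hash maps keyed by cluster id hold running sums and
    # counts, so each record is folded in with constant-time lookups.
    sums = defaultdict(lambda: [0.0] * len(centroids[0]))
    counts = defaultdict(int)
    for k, p in pairs:
        counts[k] += 1
        for d, v in enumerate(p):
            sums[k][d] += v
    # A cluster that received no points keeps its previous centroid.
    return [
        [s / counts[k] for s in sums[k]] if counts[k] else list(c)
        for k, c in enumerate(centroids)
    ]


def sse(points, centroids):
    # Sum of squared distances to the nearest centroid (lower is "brighter").
    return sum(math.dist(p, centroids[nearest(p, centroids)]) ** 2 for p in points)


def firefly_move(xi, xj, beta0, gamma, alpha, rng):
    # Standard firefly update: attraction decays with squared distance
    # (absorption gamma), plus a random step scaled by alpha.
    r2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
    beta = beta0 * math.exp(-gamma * r2)
    return [a + beta * (b - a) + alpha * (rng.random() - 0.5) for a, b in zip(xi, xj)]


def fkth_sketch(points, k, n_fireflies=8, iters=20, seed=0):
    rng = random.Random(seed)
    dim = len(points[0])
    # Each firefly is one candidate set of k centroids drawn from the data.
    swarm = [[list(p) for p in rng.sample(points, k)] for _ in range(n_fireflies)]
    beta0, gamma, alpha = 1.0, 0.01, 0.5  # assumed starting values
    for _ in range(iters):
        # Simplification: costs are computed once per iteration.
        cost = [sse(points, f) for f in swarm]
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is brighter than i
                    flat_i = [v for c in swarm[i] for v in c]
                    flat_j = [v for c in swarm[j] for v in c]
                    moved = firefly_move(flat_i, flat_j, beta0, gamma, alpha, rng)
                    swarm[i] = [moved[d:d + dim] for d in range(0, k * dim, dim)]
        alpha *= 0.95  # assumed schedule: shrink randomness over time
    best = min(swarm, key=lambda f: sse(points, f))
    for _ in range(5):  # refine the best firefly with a few K-Means steps
        best = reduce_update(map_assign(points, best), best)
    return best
```

Calling `fkth_sketch(points, k=2)` on a list of coordinate tuples returns `k` centroids as lists of floats. Only the randomness parameter is adapted here; the abstract states that attraction and absorption are adjusted as well, and how that is done is not specified on this page.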

Keywords:

distributed clustering, adaptive Firefly Algorithm, K-Means, Hadoop MapReduce, KD-Tree, large-scale data analytics, metaheuristic optimization

How to Cite

[1]
S. Battur and S. Totad, “Scalable Distributed K-Means Clustering Using the Firefly Algorithm with Tree- and Hash-Based Optimization for Big Data”, Eng. Technol. Appl. Sci. Res., vol. 16, no. 3, pp. 37033–37039, Jun. 2026.
