Optimizing Similar Audience Search in Targeted Advertising: Effectiveness of Siamese Networks for Autoencoder-based User Embeddings
Received: 10 February 2025 | Revised: 12 March 2025 and 26 March 2025 | Accepted: 29 March 2025 | Online: 26 May 2025
Corresponding author: Marat Nurtas
Abstract
This study investigates the effectiveness of Siamese networks for comparing embedding vectors that describe user profiles. A model was developed to identify similar audiences in the context of targeted advertising. An analysis of the requirements for such a model revealed that traditional approaches to tabular data processing often struggle with the challenges specific to this task, particularly scalability and adaptability. The proposed approach identifies lookalike users without relying on explicit feature engineering. The method was evaluated on an anonymized proprietary dataset provided by a telecommunications operator, which included sociodemographic descriptions of subscribers, their tariff plans, and their mobile devices. Experimental results showed that the model achieved an F1 score of 0.75, a ROC-AUC of 0.79, and a lift score in the top 1 of 12.9, outperforming baseline methods in targeted user identification by 41.61% on average. These results indicate that the proposed method meets the key requirements of the task, including effectiveness and scalability. The approach is also versatile, with applicability to tabular data classification tasks across various domains. Future research will focus on developing multiple autoencoders tailored to different domains and integrating them to solve specific tasks.
Keywords:
user profiling, siamese network, embeddings, audience selection, autoencoder, targeted advertising, cosine similarity distance
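The ideas named in the abstract and keywords (comparing user embeddings by similarity, then scoring the ranking with lift) can be illustrated with a minimal sketch. This is not the authors' model: it replaces the trained Siamese comparison with plain cosine similarity, and the toy embeddings, user ids, function names, and the 0.5 cutoff fraction are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_lookalikes(seed, candidates):
    """Return candidate ids sorted by descending similarity to the seed embedding."""
    scored = [(cosine_similarity(seed, emb), uid) for uid, emb in candidates.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [uid for _, uid in scored]


def lift_at_fraction(ranked_ids, positives, fraction):
    """Lift: positive rate in the top `fraction` of the ranking over the overall positive rate."""
    k = max(1, int(len(ranked_ids) * fraction))
    top_rate = sum(1 for uid in ranked_ids[:k] if uid in positives) / k
    base_rate = sum(1 for uid in ranked_ids if uid in positives) / len(ranked_ids)
    return top_rate / base_rate if base_rate > 0 else 0.0


# Hypothetical 2-dimensional embeddings; real user embeddings would come from an encoder.
seed = [1.0, 0.0]
candidates = {
    "a": [0.9, 0.1],
    "b": [0.0, 1.0],
    "c": [0.8, 0.3],
    "d": [-1.0, 0.0],
}
positives = {"a", "c"}  # hypothetical ground-truth target users

ranked = rank_lookalikes(seed, candidates)
lift = lift_at_fraction(ranked, positives, 0.5)
print(ranked, lift)
```

In this toy case the two target users rank first, so the top half of the ranking contains twice the share of positives found in the full candidate set. A lift value such as the one reported in the abstract is read the same way, under whatever cutoff the authors used.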
License
Copyright (c) 2025 Il'murat Tokhtakhunov, Aizhan Altaibek, Marat Nurtas

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.