| Issue | A&A, Volume 701, September 2025 |
|---|---|
| Article Number | A150 |
| Number of page(s) | 27 |
| Section | Numerical methods and codes |
| DOI | https://doi.org/10.1051/0004-6361/202554025 |
| Published online | 12 September 2025 |
Learning novel representations of variable sources from multi-modal Gaia data via autoencoders
1 Institute of Astronomy, KU Leuven, Celestijnenlaan 200D, 3001 Leuven, Belgium
2 Millennium Institute of Astrophysics, Nuncio Monseñor Sotero Sanz 100, Of. 104, Providencia, Santiago, Chile
3 Department of Astronomy, University of Geneva, Chemin Pegasi 51, 1290 Versoix, Switzerland
4 Department of Astronomy, University of Geneva, Chemin d’Ecogia 16, 1290 Versoix, Switzerland
5 Sednai Sàrl, Geneva, Switzerland
6 INAF – Osservatorio Astrofisico di Torino, Via Osservatorio 20, 10025 Pino Torinese, Italy
7 Konkoly Observatory, HUN-REN Research Centre for Astronomy and Earth Sciences, Konkoly Thege 15–17, 1121 Budapest, Hungary
8 CSFK, MTA Centre of Excellence, Konkoly Thege 15–17, 1121 Budapest, Hungary
9 INAF – Osservatorio di Astrofisica e Scienza dello Spazio di Bologna, Via Piero Gobetti 93/3, 40129 Bologna, Italy
10 Starion for European Space Agency, Camino bajo del Castillo, s/n, Urbanizacion Villafranca del Castillo, Villanueva de la Cañada, 28692 Madrid, Spain
11 Department of Astrophysics, IMAPP, Radboud University Nijmegen, PO Box 9010, 6500 GL Nijmegen, The Netherlands
12 Max Planck Institute for Astronomy, Koenigstuhl 17, 69117 Heidelberg, Germany
★ Corresponding author: pablo.huijse@kuleuven.be
Received: 4 February 2025 / Accepted: 18 July 2025
Context. Gaia Data Release 3 (DR3) has published, for the first time, epoch photometry, BP/RP (XP) low-resolution mean spectra, and supervised classification results for millions of variable sources. This extensive dataset offers a unique opportunity to study the variability of these objects by combining multiple Gaia data products.
Aims. In preparation for DR4, we propose and evaluate a machine learning methodology capable of ingesting multiple Gaia data products to achieve an unsupervised classification of stellar and quasar variability.
Methods. A dataset of 4 million Gaia DR3 sources was used to train three variational autoencoders (VAEs), which are artificial neural networks (ANNs) designed for data compression and generation. One VAE was trained on Gaia XP low-resolution spectra, another on a novel representation based on the distribution of magnitude differences in the Gaia G band, and the third on folded Gaia G band light curves. Each Gaia source was compressed into 15 numbers, representing its coordinates in a 15-dimensional latent space generated by combining the outputs of these three models.
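The abstract does not detail how the two light-curve inputs are constructed. As a rough illustration only, the sketch below builds a normalised histogram of pairwise G-band magnitude differences and phase-folds a light curve at a known period. The function names, the bin edges, the use of absolute differences, and the choice to ignore time lags between epochs are all assumptions for this example, not the representation used by the authors; the period is assumed to be known in advance (for instance from a separate period search).

```python
import math
from itertools import combinations


def magnitude_difference_histogram(mags, bin_edges):
    """Normalised histogram of all pairwise absolute magnitude differences |m_i - m_j|.

    Hypothetical illustration: binning, absolute values and normalisation are
    assumptions; time lags between epochs are ignored here.
    """
    counts = [0] * (len(bin_edges) - 1)
    for a, b in combinations(mags, 2):
        dm = abs(a - b)
        for k in range(len(counts)):
            if bin_edges[k] <= dm < bin_edges[k + 1]:
                counts[k] += 1
                break  # each pair falls into at most one bin
    total = sum(counts)
    # Normalise so that light curves with different numbers of epochs are comparable
    return [c / total for c in counts] if total else counts


def fold_light_curve(times, mags, period, t0=0.0):
    """Phase-fold a light curve and return (phase, magnitude) pairs sorted by phase."""
    phases = [((t - t0) / period) % 1.0 for t in times]
    return sorted(zip(phases, mags))


if __name__ == "__main__":
    # Toy sinusoidal G-band light curve (synthetic, not Gaia data)
    period = 0.5  # days, hypothetical
    times = [0.037 * i for i in range(40)]
    mags = [15.0 + 0.3 * math.sin(2 * math.pi * t / period) for t in times]

    hist = magnitude_difference_histogram(mags, [0.0, 0.1, 0.2, 0.4, 0.8])
    folded = fold_light_curve(times, mags, period)
    print([round(h, 3) for h in hist])
    print(folded[:3])
```

Normalising the histogram makes its shape independent of the number of epochs, which is one reason a fixed-size summary of this kind can be fed to a neural network even though Gaia light curves are irregularly and unevenly sampled.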
Results. The learned latent representation produced by the ANN effectively distinguishes between the main variability classes present in Gaia DR3, as demonstrated through both supervised and unsupervised classification analysis of the latent space. The results highlight a strong synergy between light curves and low-resolution spectral data, emphasising the benefits of combining the different Gaia data products. A 2D projection of the latent variables revealed numerous overdensities, most of which strongly correlate with astrophysical properties, showing the potential of this latent space for astrophysical discovery.
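A supervised classification test of a latent space can be illustrated with a simple k-nearest-neighbour vote. The sketch below is generic and is not the classifier used by the authors: combining the three models by concatenation, the 5 + 5 + 5 split of the 15 dimensions, the function names, k = 3, and the toy vectors are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def combine_latents(z_spectra, z_dm, z_folded):
    """Concatenate per-model latent vectors into one joint latent vector.

    Assumption: concatenation with a hypothetical 5 + 5 + 5 split; the abstract
    only states that the three outputs are combined into 15 dimensions.
    """
    return list(z_spectra) + list(z_dm) + list(z_folded)


def knn_predict(train_vectors, train_labels, query, k=3):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy 15-D latent vectors for two hypothetical classes (synthetic, not Gaia results)
    train = [
        combine_latents([c + 0.01 * i] * 5, [c] * 5, [c] * 5)
        for c in (0.0, 1.0)
        for i in range(3)
    ]
    labels = ["A"] * 3 + ["B"] * 3
    query = combine_latents([0.9] * 5, [0.9] * 5, [0.9] * 5)
    print(knn_predict(train, labels, query))
```

A distance-based classifier like this only performs well if sources of the same class lie close together in the latent space, which is why its accuracy can serve as a rough probe of how well a representation separates variability classes.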
Conclusions. We show that the properties of our novel latent representation make it highly valuable for variability analysis tasks, including classification, clustering, and outlier detection.
Key words: methods: data analysis / methods: numerical / methods: statistical / techniques: photometric / techniques: spectroscopic / stars: variables: general
© The Authors 2025
Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This article is published in open access under the Subscribe to Open model.