Generalizing across astronomical surveys: Few-shot light curve classification with Astromer 2

Cristobal Donoso-Oliva; Ignacio Becker; Pavlos Protopapas; Guillermo Cabrera-Vives; Martina Cádiz-Leyton; Daniel Moreno-Cartagena

doi:10.1051/0004-6361/202554026

Open Access

Issue: A&A Volume 707, March 2026
Article Number: A170
Number of page(s): 12
Section: Numerical methods and codes
DOI: https://doi.org/10.1051/0004-6361/202554026
Published online: 11 March 2026

A&A, 707, A170 (2026)


Cristobal Donoso-Oliva¹,³,⁵,★, Ignacio Becker², Pavlos Protopapas², Guillermo Cabrera-Vives¹,³,⁴,⁵,⁶, Martina Cádiz-Leyton¹,³, and Daniel Moreno-Cartagena¹,³

¹ Department of Computer Science, Universidad de Concepción, Edmundo Larenas 219, Concepción, Chile
² John A. Paulson School of Engineering and Applied Science, Harvard University, Cambridge, MA 02138, USA
³ Center for Data and Artificial Intelligence, Universidad de Concepción, Edmundo Larenas 310, Concepción, Chile
⁴ Millennium Institute of Astrophysics (MAS), Nuncio Monseñor Sotero Sanz 100, Of. 104, Providencia, Santiago, Chile
⁵ Millennium Nucleus on Young Exoplanets and their Moons (YEMS), Chile
⁶ Heidelberg Institute for Theoretical Studies, Heidelberg, Baden-Württemberg, Germany

★ Corresponding author

Received: 4 February 2025
Accepted: 5 January 2026

Abstract

Context. Foundational models have emerged as a powerful paradigm within the deep learning field. Their capacity relies on the ability to learn robust representations from large-scale datasets and generalize to diverse downstream applications, such as classification. In this paper, we present Astromer 2, a foundational model designed for extracting light curve embeddings.

Aims. We introduce Astromer 2, an enhanced iteration of our self-supervised model for light curve analysis. This paper highlights the advantages of its pretrained embeddings, compares its performance with that of its predecessor, Astromer 1, and provides a detailed empirical analysis of its capabilities, offering deeper insights into the model’s representations.

Methods. Astromer 2 is pretrained on 1.5 million single-band light curves from the MACHO survey using a self-supervised learning task that predicts randomly masked observations within sequences. Fine-tuning on a smaller labeled dataset allows us to assess its performance in classification tasks. The quality of the embeddings is measured by the F1 score of a multilayer perceptron (MLP) classifier trained on Astromer-generated embeddings.
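The masked-prediction pretraining task described above can be illustrated with a minimal NumPy sketch. It is not the Astromer 2 implementation: the function names, the 50% masking fraction, the zero fill value, and the choice of a root-mean-square error as the reconstruction loss are all assumptions made for illustration.

```python
import numpy as np

def mask_observations(magnitudes, mask_fraction=0.5, rng=None):
    """Hide a random subset of observations in one light curve.

    Returns the masked input and a boolean array marking hidden positions.
    mask_fraction and the zero fill value are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(magnitudes)
    n_masked = max(1, int(round(mask_fraction * n)))
    hidden = np.zeros(n, dtype=bool)
    hidden[rng.choice(n, size=n_masked, replace=False)] = True
    masked_input = np.where(hidden, 0.0, magnitudes)
    return masked_input, hidden

def masked_rmse(true_mags, predicted_mags, hidden):
    """Root-mean-square error evaluated only at the hidden positions."""
    diff = (true_mags - predicted_mags)[hidden]
    return float(np.sqrt(np.mean(diff ** 2)))

# Toy light curve: 200 standardized magnitudes (synthetic, not survey data).
rng = np.random.default_rng(0)
mags = rng.normal(size=200)
masked_input, hidden = mask_observations(mags, mask_fraction=0.5, rng=rng)

# A perfect reconstruction yields zero loss; a trivial all-zero guess does not.
perfect_loss = masked_rmse(mags, mags, hidden)
trivial_loss = masked_rmse(mags, np.zeros_like(mags), hidden)
print(perfect_loss, trivial_loss)
```

Because the loss is computed only at hidden positions, a model cannot lower it by copying its visible inputs; it has to infer the missing values from the surrounding observations.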

Results. Our results demonstrate that Astromer 2 significantly outperforms Astromer 1 across all evaluated scenarios, including limited datasets of 20, 100, and 500 samples per class. The use of weighted per-sample embeddings, which integrate intermediate representations from Astromer’s attention blocks, is particularly impactful. Notably, Astromer 2 achieves a 15% improvement in F1 score on the ATLAS dataset compared to prior models, showcasing robust generalization to new datasets. This enhanced performance, especially with minimal labeled data, underscores the potential of Astromer 2 for more efficient and scalable light curve analysis.
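As a rough illustration of the two ideas in this paragraph, the sketch below builds a weighted per-sample embedding from the outputs of several attention blocks and draws a fixed number of labeled examples per class. The abstract does not specify how the intermediate representations are weighted or pooled, so the softmax-normalized weight per block, the mean pooling over valid time steps, and all function and parameter names here are assumptions rather than the authors' method.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - np.max(x))
    return z / z.sum()

def weighted_embedding(block_outputs, block_logits, valid):
    """Combine per-block outputs into one vector per light curve.

    block_outputs: (n_blocks, n_steps, dim) output of each attention block
    block_logits:  (n_blocks,) hypothetical weights, trainable in practice
    valid:         (n_steps,) bool, True for real (non-padded) observations
    """
    weights = softmax(block_logits)                        # sums to 1 over blocks
    mixed = np.tensordot(weights, block_outputs, axes=1)   # (n_steps, dim)
    return mixed[valid].mean(axis=0)                       # pool over time

def few_shot_indices(labels, per_class, rng):
    """Pick `per_class` random examples of every class, without replacement."""
    picks = [
        rng.choice(np.flatnonzero(labels == c), size=per_class, replace=False)
        for c in np.unique(labels)
    ]
    return np.concatenate(picks)

# Toy example: two blocks, five time steps (the last one is padding), dim 4.
block_outputs = np.stack([np.full((5, 4), 1.0), np.full((5, 4), 3.0)])
valid = np.array([True, True, True, True, False])
embedding = weighted_embedding(block_outputs, np.zeros(2), valid)

# Few-shot subset: 20 examples from each of three synthetic classes.
rng = np.random.default_rng(0)
labels = np.repeat(np.array([0, 1, 2]), 50)
subset = few_shot_indices(labels, per_class=20, rng=rng)
print(embedding, np.bincount(labels[subset]))
```

With equal logits the two blocks contribute equally, so every component of the toy embedding is the mean of the two block values. In a real setting the logits would be learned together with the classifier, and `per_class` would take the values 20, 100, and 500 used in the evaluation.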

Key words: methods: data analysis / methods: statistical / techniques: photometric / stars: variables: general

Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article is published in open access under the Subscribe to Open model.
