| Issue | A&A Volume 700, August 2025 | |
|---|---|---|
| Article Number | A136 | |
| Number of page(s) | 17 | |
| Section | Numerical methods and codes | |
| DOI | https://doi.org/10.1051/0004-6361/202554540 | |
| Published online | 13 August 2025 | |
From few to many maps: A fast map-level emulator for extreme augmentation of cosmic microwave background systematics datasets
1 INFN Sezione di Ferrara, Via Saragat 1, 44122 Ferrara, Italy
2 ICSC, Centro Nazionale “High Performance Computing, Big Data and Quantum Computing”, Casalecchio di Reno, Italy
3 Laboratoire d'Océanographie Physique et Spatiale (LOPS), Univ. Brest, CNRS, Ifremer, IRD, 29200 Brest, France
4 Dipartimento di Fisica e Scienze della Terra, Università degli Studi di Ferrara, via Saragat 1, 44122 Ferrara, Italy
5 Institut d'Astrophysique Spatiale, CNRS, Univ. Paris-Sud, Université Paris-Saclay, Bât. 121, 91405 Orsay Cedex, France
6 Laboratoire de Physique de l'Ecole Normale Supérieure, ENS, Univ. PSL, CNRS, Sorbonne Univ., Univ. Paris Cité, 75005 Paris, France
★ Corresponding author.
Received: 14 March 2025
Accepted: 10 June 2025
Context. Generating massive sets of end-to-end simulations of time-ordered data for Monte Carlo analyses in cosmic microwave background (CMB) experiments typically incurs exceedingly high computational costs.
Aims. To address this challenge, we introduce a novel, fast, and efficient generative model built upon scattering covariances, the most recent iteration of the scattering transform statistics. This model is designed to augment by several orders of magnitude the number of map simulations in datasets of computationally expensive CMB instrumental systematics simulations, including their non-Gaussian and inhomogeneous features. Unlike conventional neural network-based algorithms, this generative model requires only a minimal number of training samples, making it highly compatible with the computational constraints of typical CMB simulation campaigns. While our primary focus is on spherical data, the framework is inherently versatile and readily applicable to 1D and 2D planar data, leveraging the localized nature of scattering statistics.
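As a rough illustration of the kind of localized, multiscale summary that scattering-type statistics build on, the sketch below computes toy first-order wavelet-modulus statistics, the mean of |x ∗ ψ_j| over a few dyadic scales, for a 1D signal. It is a minimal sketch under stated assumptions: the function name `wavelet_modulus_stats`, the log-Gaussian band-pass filters, and the centre frequencies are all hypothetical choices made here for illustration. It is not the HealpixML or CMBSCAT implementation, and it omits the cross-scale covariances that distinguish scattering covariances from simpler statistics.

```python
import numpy as np

def wavelet_modulus_stats(x, n_scales=4):
    """Toy first-order statistics: mean |x * psi_j| for dyadic band-pass filters.

    Assumption: the filters are simple log-Gaussian band-passes defined in
    Fourier space; real scattering codes use carefully designed wavelets.
    """
    n = x.size
    freqs = np.abs(np.fft.fftfreq(n))            # cycles per sample, in [0, 0.5]
    safe = np.where(freqs > 0, freqs, 1.0)       # avoid log2(0) at the DC bin
    x_hat = np.fft.fft(x)
    stats = []
    for j in range(n_scales):
        centre = 0.25 / 2**j                     # hypothetical dyadic centre frequency
        psi_hat = np.exp(-0.5 * (np.log2(safe / centre) / 0.5) ** 2)
        psi_hat[freqs == 0] = 0.0                # band-pass: no response to the mean
        filtered = np.fft.ifft(x_hat * psi_hat)  # circular convolution x * psi_j
        stats.append(np.mean(np.abs(filtered)))  # spatial average of the modulus
    return np.array(stats)

rng = np.random.default_rng(0)
signal = rng.standard_normal(1024)
s1 = wavelet_modulus_stats(signal)               # one coefficient per scale
```

A generative model in this spirit would draw new signals whose statistics match those measured on the training set; the sketch covers only the measurement step.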
Methods. We validated the method using realistic simulations of CMB systematics, which are particularly challenging to emulate, and performed extensive statistical tests to confirm its ability to produce new statistically independent approximate realizations.
Results. Remarkably, even when trained on as few as ten simulations, the emulator closely reproduces key summary statistics, including the angular power spectrum, scattering coefficients, and Minkowski functionals, and provides pixel covariance estimates with substantially reduced sample noise compared to those obtained without augmentation.
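To make the sample-noise argument concrete, the following sketch compares the error of a pixel covariance estimated from 10 maps with that from 10,000 maps. It assumes an idealized setting that is not taken from the paper: Gaussian maps of 20 pixels drawn independently from a known, hypothetical covariance with exponentially decaying correlations. The helper name `covariance_error` is likewise illustrative.

```python
import numpy as np

def covariance_error(n_maps, n_pix=20, seed=0):
    """Relative Frobenius error of the sample covariance built from n_maps maps."""
    rng = np.random.default_rng(seed)
    # Hypothetical "true" pixel covariance with exponentially decaying correlations.
    idx = np.arange(n_pix)
    true_cov = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 3.0)
    # Independent Gaussian maps, one per row.
    maps = rng.multivariate_normal(np.zeros(n_pix), true_cov, size=n_maps)
    sample_cov = np.cov(maps, rowvar=False)      # (n_pix, n_pix) estimate
    return np.linalg.norm(sample_cov - true_cov) / np.linalg.norm(true_cov)

err_few = covariance_error(10)        # e.g. a small set of expensive simulations
err_many = covariance_error(10_000)   # e.g. a heavily augmented dataset
print(f"10 maps: {err_few:.3f}   10000 maps: {err_many:.3f}")
```

This only shows how sample noise shrinks with the number of independent draws from the true distribution. Emulated maps are approximate realizations, so the gain obtained in practice also depends on how faithfully the emulator reproduces the underlying statistics.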
Conclusions. The proposed approach has the potential to shift the paradigm in simulation campaign design. Instead of producing large numbers of low- or medium-accuracy simulations, future pipelines can focus on generating a few high-accuracy simulations that are then efficiently augmented using such a generative model. This promises significant benefits not only for current and forthcoming cosmological surveys such as Planck, LiteBIRD, Simons Observatory, CMB-S4, Euclid, and Rubin-LSST, but also for diverse fields including oceanography and climate science. We make both the general framework for scattering transform statistics, HealpixML, and the emulator, CMBSCAT, available to the community.
Key words: methods: data analysis / methods: statistical / cosmic background radiation
© The Authors 2025
Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This article is published in open access under the Subscribe to Open model. Subscribe to A&A to support open access publication.