| Issue | A&A Volume 710, June 2026 | |
|---|---|---|
| Article Number | A91 | |
| Number of page(s) | 9 | |
| Section | Catalogs and data | |
| DOI | https://doi.org/10.1051/0004-6361/202558722 | |
| Published online | 02 June 2026 | |
ARCAFF: Cutout classification dataset
1 Department of Mathematics, University of Genova, via Dodecaneso 35, 16146 Genova, Italy
2 Astronomy & Astrophysics Section, School of Cosmic Physics, Dublin Institute for Advanced Studies, 31 Fitzwilliam Place, Dublin 2, D02 XF86, Ireland
3 Istituto Nazionale di Astrofisica, Osservatorio Astrofisico di Torino, via Osservatorio 20, 10025 Pino Torinese, Italy
Received: 22 December 2025
Accepted: 16 April 2026
Abstract
Context. Solar active regions (ARs) are dynamic and magnetically complex areas on the Sun’s surface, often associated with phenomena such as solar flares and coronal mass ejections. Accurate identification and classification of these regions are essential for understanding solar magnetic activity and forecasting space weather events. Traditional AR classification methods have predominantly relied on manual observation and analysis, which, while effective, are time-consuming and subject to human bias. Existing datasets for AR classification have significant limitations that hinder their effectiveness for deep-learning applications.
Aims. We present ARCAFF:CCD (Active Region Classification and Flare Forecasting: Cutout Classification Dataset), a new large-scale dataset of solar AR magnetograms and continuum cutouts specifically designed for machine-learning applications.
Methods. The dataset combines co-temporal line-of-sight magnetograms and continuum intensity images from the SOHO/MDI and SDO/HMI instruments. Each cutout is linked to AR identifiers assigned by the National Oceanic and Atmospheric Administration (NOAA) and includes both Mount Wilson and McIntosh classifications. The pipeline performs calibration, alignment, and structured labelling of full-disc magnetograms using AR classification metadata from NOAA Solar Region Summary (SRS) reports.
Results. The dataset covers nearly three decades of observations spanning multiple solar cycles, comprising 33 045 AR cutouts and 72 297 quiet Sun cutouts. Each cutout is accompanied by comprehensive metadata, and the dataset is, to our knowledge, the most extensive and detailed publicly available resource of its kind.
Conclusions. The ARCAFF:CCD dataset provides a comprehensive resource for developing and testing machine-learning models for solar AR classification and space weather forecasting.
Key words: methods: data analysis / techniques: image processing / catalogs / Sun: magnetic fields
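The cutout counts quoted in the abstract (33 045 AR and 72 297 quiet Sun, 105 342 in total) imply that quiet Sun cutouts outnumber AR cutouts by roughly two to one. The following is a minimal sketch, not part of the ARCAFF pipeline, of how a user might quantify that imbalance and derive inverse-frequency class weights for a simple AR versus quiet Sun classifier. The binary framing, the function names, and the weighting scheme w_c = N / (K n_c) are assumptions made for illustration; the abstract does not give per-class counts for the Mount Wilson or McIntosh labels.

```python
# Cutout counts as quoted in the abstract.
# The dictionary keys are hypothetical labels chosen for this sketch.
COUNTS = {"active_region": 33_045, "quiet_sun": 72_297}


def class_fractions(counts):
    """Return the fraction of the dataset belonging to each class."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def inverse_frequency_weights(counts):
    """Return weights w_c = N / (K * n_c).

    N is the total number of samples, K the number of classes and
    n_c the number of samples in class c. With these weights every
    class contributes the same total weight, N / K.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * n) for name, n in counts.items()}


if __name__ == "__main__":
    fractions = class_fractions(COUNTS)
    weights = inverse_frequency_weights(COUNTS)
    for name in COUNTS:
        print(f"{name}: fraction={fractions[name]:.3f}, weight={weights[name]:.3f}")
```

Weights of this kind can be passed to a weighted loss function so that the less frequent AR class is not swamped by quiet Sun examples. Whether reweighting, resampling, or neither is appropriate depends on the downstream task.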
© The Authors 2026
Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This article is published in open access under the Subscribe to Open model.