SolarZip: An efficient and adaptive compression framework for Solar EUV imaging data

Zedong Liu; Song Tan; Alexander Warmuth; Frédéric Schuller; Yun Hong; Wenjing Huang; Yida Gu; Bojing Zhu; Guangming Tan; Dingwen Tao

doi:10.1051/0004-6361/202555193

Home

All issues

Volume 702 (October 2025)

A&A, 702 (2025) A160

Full HTML

Open Access

Issue		A&A Volume 702, October 2025


Article Number		A160
Number of page(s)		17
Section		The Sun and the Heliosphere
DOI		https://doi.org/10.1051/0004-6361/202555193
Published online		16 October 2025

A&A, 702, A160 (2025)

Application to Solar Orbiter/EUI data

Zedong Liu¹^,2^,5^,⋆, Song Tan³^,4^,⋆, Alexander Warmuth³, Frédéric Schuller³, Yun Hong⁶, Wenjing Huang¹^,5, Yida Gu¹^,5, Bojing Zhu⁷^,8^,5, Guangming Tan¹^,5 and Dingwen Tao¹^,5^⋆⋆

¹ Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
² University of Electronic Science and Technology of China, Chengdu 610054, China
³ Leibniz-Institut für Astrophysik Potsdam (AIP), An der Sternwarte 16, 14482 Potsdam, Germany
⁴ Institut für Physik und Astronomie, Universität Potsdam, Karl-Liebknecht-Straße 24/25, 14476 Potsdam, Germany
⁵ University of Chinese Academy of Sciences, Beijing 100049, China
⁶ Minzu University of China, Beijing 100081, China
⁷ Yunnan Observatories, Chinese Academy of Sciences, Kunming 650216, China
⁸ Centre for Astronomical Mega-Science, Chinese Academy of Sciences, Beijing 100012, China

^⋆⋆ Corresponding author: taodingwen@ict.ac.cn

Received: 17 April 2025
Accepted: 25 July 2025

Abstract

Context. With the advancement of solar physics research, next-generation solar space missions and ground-based telescopes face significant challenges in efficiently transmitting and/or storing large-scale observational data.

Aims. We have developed an efficient compression and evaluation framework for solar EUV data, specifically optimized for Solar Orbiter Extreme Ultraviolet Imager (EUI) data. It significantly reduces the data volume, while preserving scientific usability.

Methods. We evaluated four error-bounded lossy compressors across two Solar Orbiter/EUI datasets spanning 50 months of observations. However, existing methods cannot perfectly handle the EUV data. Building on this analysis, we developed SolarZip, an adaptive compression framework featuring: (1) a hybrid strategy controller that dynamically selects the optimal compression strategy; (2) enhanced spline interpolation predictors with grid-wise anchor points and level-wise error bound auto-tuning; and (3) a comprehensive two-stage evaluation methodology integrating standard distortion metrics with domain-specific post hoc scientific analyses.

Results. Our SolarZip framework achieved a data compression ratio of up to 800× for Full Sun Imager (FSI) data and 500× for High Resolution Imager (HRI_EUV) data. It significantly outperformed both traditional and advanced algorithms, achieving 3−50× higher compression ratios than traditional algorithms, surpassing the second-best algorithm by up to 30%. Simulation experiments verified that SolarZip can reduce data transmission time by up to 270× while ensuring the preservation of scientific usability.

Conclusions. The SolarZip framework significantly enhances solar observational data compression efficiency, while preserving scientific usability by dynamically selecting the optimal compression methods based on observational scenarios and user requirements. This approach offers a promising data management solution for deep space missions, such as Solar Orbiter.

Key words: methods: data analysis / space vehicles: instruments / techniques: image processing / Sun: corona

^⋆

These authors contributed equally to this work.

© The Authors 2025

Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article is published in open access under the Subscribe to Open model. Subscribe to A&A to support open access publication.

1. Introduction

With the advancement of solar physics research, next-generation solar space missions and ground-based telescopes demand increasingly higher spatial and temporal resolutions for observational data. This has made the efficient transmission and processing of large-scale data an urgent challenge. Solar Orbiter (Müller et al. 2020), a collaborative mission between the European Space Agency (ESA) and National Aeronautics and Space Administration (NASA), was successfully launched in February 2020. Its unique orbital design enables unprecedented close-up observations of the Sun (approaching as near as 0.28 AU) and provides the first high-resolution images of the solar polar regions. However, due to inherent telemetry constraints of deep space missions, the amount of data observed by Solar Orbiter surpasses its transmission capabilities by a substantial margin, making efficient on-board data compression essential for achieving the mission’s scientific objectives (Fischer et al. 2017). The Extreme Ultraviolet Imager (EUI, Rochus et al. 2020), one of Solar Orbiter’s core remote sensing instruments, consists of a Full Sun Imager (FSI) and two High Resolution Imagers (HRI_EUV and HRI_Lyα), providing comprehensive observations from the chromosphere to the corona in EUV wavelengths (17.4 nm, 30.4 nm) and the Lyman-α band. EUI data exhibit significant high dynamic range characteristics, with intensity differences between bright and dark regions spanning several orders of magnitude. Furthermore, as Solar Orbiter’s orbital position and viewing angle continuously change, the characteristics of EUI data vary significantly, imposing strict adaptability requirements on compression algorithms.

Compression techniques for scientific data can be broadly categorized into “lossless” and “lossy” approaches (Patel et al. 2015; Di et al. 2024). While lossless compression guarantees exact data reconstruction, it typically achieves limited compression ratios of only 2:1 to 3:1 for scientific floating-point data (Patel et al. 2015). In contrast, error-bounded lossy compression can achieve much higher ratios, while maintaining scientific usability by controlling data distortion within user-specified tolerances (Cappello et al. 2019; Di et al. 2024).

In recent years, a new generation of lossy compressors designed specifically for scientific data has emerged, including SZ (Di & Cappello 2016; Tao et al. 2017; Liang et al. 2018a; Zhao et al. 2021), ZFP (Lindstrom 2014), MGARD (Ainsworth et al. 2019), and SPERR (Li et al. 2023). Unlike traditional lossy compressors such as JPEG (Wallace 1991; Taubman & Marcellin 2002), these error-bounded lossy compressors are designed to compress scientific data, while providing strict error control based on user requirements. These compressors have been successfully applied across various scientific domains. In climate simulation, Baker et al. (2014, 2016, 2017, 2019) employed lossy compression on data produced by the Community Earth System Model. For cosmological simulations, Pulido et al. (2019), Jin et al. (2020, 2021) proposed efficient schemes to reduce storage and transmission costs for Nyx and WarpX simulation codes. In astronomical observations, studies have evaluated how transform-based algorithms affect radio astronomy data quality (Peters & Kitaeff 2014; Vohl et al. 2015; Chege et al. 2024), while other researchers have explored efficient on-board compression algorithms for satellite missions using Cassini observational data (Xie et al. 2021; Zhang et al. 2025).

Traditional compression methods have been explored for solar data. The RICE encoding algorithm (Rice & Plaunt 1971), which relies on basic preprocessing and encoding, achieves a maximum compression ratio of 20× for Solar Orbiter EUI data (refer to Section 3). Fischer et al. (2017) implemented JPEG2000, a wavelet-transform-based compression method, yet its compression ratio was limited to 30×, with significant quality degradation at higher compression levels. Recent efforts have explored machine learning approaches (Zafari et al. 2022, 2023; Liu et al. 2024a,b), such as attention mechanisms and generative adversarial networks (GANs), achieving promising compression ratios but introducing substantial training and computational overhead.

Despite these advances, existing approaches for solar data compression have notable limitations: (1) they rely primarily on traditional lossy compression algorithms with relatively naive strategies, achieving limited compression ratios; (2) no comprehensive study has systematically applied or evaluated advanced error-bounded lossy compression techniques on solar EUV data; and (3) previous works lack interdisciplinary insights from both astronomy and computer science, failing to comprehensively illustrate the impact of lossy compression on scientific analyses of solar observations. To address these gaps, this paper introduces SolarZip, an efficient compression and evaluation framework for solar EUV imaging data. It features the following contributions:

We analyzed four advanced lossy compressors across two Solar Orbiter EUI datasets with 14 settings. These tools demonstrate clear advantages over traditional methods, such as RICE and JPEG2000, but still face limitations.
We introduced SolarZip, a comprehensive compression and evaluation framework for solar EUV imaging data, particularly optimized for Solar Orbiter/EUI data.
We designed a two-stage data evaluation framework that integrates strict error control with downstream scientific workflows: data distortion analysis and scientific post hoc analysis. This approach ensures that compressed data remain suitable for critical scientific research.
We developed an adaptive hybrid compression strategy with optimized predictors to enhance compression quality. This method dynamically selects optimal compression methods based on different observational scenarios and user requirements, achieving a compression ratio of up to 800× reduction for FSI data and 500× for HRI_EUV data.

This paper is structured as follows. Section 2 introduces the fundamentals of data compression techniques. Section 3 presents a comprehensive experimental comparison between advanced compression algorithms and traditional methods. In Section 4, we detail the SolarZip framework, including its compression workflow, hybrid strategy, and optimization methodology. Section 5 provides a thorough evaluation of the SolarZip framework, demonstrating its superior performance through extensive experimental results. Finally, Section 6 presents a conclusion to the paper and outlines promising directions for future research.

2. Data compression foundation

There are some traditional lossy compressors for images and videos, such as JPEG and MPEG, but they do not perform well in the context of scientific data. Error-bounded lossy compression represents a new generation designed for scientific data.

2.1. Error-bounded lossy compression

Generally, error-bounded lossy compressors require users to set an error type, such as the point-wise absolute error bound and point-wise relative error bound, along with an error bound level (e.g., abs error = 10⁻³). The compressor ensures that the differences between the original data and the reconstructed data remain within the user-set error tolerance (Cappello et al. 2019). This ensures that the reconstructed data maintain a controlled level of accuracy, making it suitable for scenarios where precision is paramount.

The workflow of error-bounded lossy compressors can be summarized in the following steps: (1) data preprocessing, such as domain transformation and data blocking; (2) decorrelation via compression models, which are broadly categorized as either prediction-based or transform-based; (3) quantization with controlled error to achieve data compression; and (4) further lossless compression of quantized codes and other parameters using techniques such as arithmetic coding. The core component of a lossy compressor is its decorrelation model, as it critically influences both compression efficiency and speed.

The SZ family of compressors is a representative example of prediction-based compression models. Their predictors include linear regression predictors, the Lorenzo predictor, which was used in SZ1.4–SZ2.0 (Tao et al. 2017; Liang et al. 2018a), and the spline interpolation predictor, introduced in SZ3 (Zhao et al. 2021; Liang et al. 2022). Previous studies (Tao et al. 2019; Liang et al. 2019; Zhao et al. 2020) have explored the strengths and limitations of these predictors, as well as their cooperative use. Overall, the SZ series achieves high compression ratios, while maintaining a favorable compression speed, making it a versatile solution for scientific data compression.

Conversely, transform-based compression models employ different techniques to achieve decorrelation. For instance, ZFP (Lindstrom 2014) utilizes near-orthogonal transforms, while SPERR (Li et al. 2023) applies the CDF 9/7 biorthogonal wavelet transform. ZFP is particularly notable for its high-speed performance; however, its limitation is that the compression ratio is constrained. The advantage of SPERR is that the hierarchical multidimensional DWT in SPERR can effectively capture the relevance between data points, which yields a high compression ratio after the SPECK encoding. One limitation of SPERR is that the transform and encoding processes have high computational costs and, hence, its execution speed is low, typically around 30% of SZ3. Technical details related to compression can be found in Appendix B.

2.2. Applications of compression in solar observation data

Using Solar Orbiter as an example (Fig. 1), we explain how data compression techniques are integrated into the lifecycle of solar observational data. The process begins with data collection by the spacecraft’s remote-sensing and in situ instruments. Once generated, the data undergo initial processing and compression on-board. Since the space communication bandwidth is limited (nominal 150 kbit/s at 1 AU) and storage is constrained, compression is essential to facilitating efficient transmission. The data is downlinked via X-band telemetry to ESA’s deep-space ground stations. After reaching ground stations, they undergo initial validation and calibration. Subsequently, the data is transferred to dedicated scientific data centers, where scientists and researchers conduct various levels of data processing. Compression may also be applied at data centers to manage storage and transmission challenges posed by the vast volume of data. Finally, publicly available datasets are released through mission archives, enabling broader scientific access.

Fig. 1.

Life cycle of Solar Orbiter observation data. This figure illustrates the application of data compression techniques throughout the lifecycle of solar observation data. These techniques significantly reduce data volume, addressing communication and storage challenges, particularly for on-board systems and data centers.

2.3. Problem and metrics description

In this paper, we focus on the design and implementation of a lossy compression algorithm for solar EUV observational data represented by Solar Orbiter/EUI. The key goal is to achieve efficient data compression, significantly reducing the data volume. At the same time, it is crucial to ensure that the quality of the reconstructed data meets the requirements of scientific research. To achieve better compression methods, we employed the following metrics, which are widely used in prior literature and considered standard in the field (Leung & Apperley 1994).

Metric 1 CR: We used the compression ratio (CR) to measure the compression performance. It is calculated by dividing the original data size by the compressed data size. A CR of 100 indicates that the compressed data is 1/100 the size of the original,

$\begin{matrix} CR = \frac{original size}{compressed size} \cdot \end{matrix}$ $\begin{aligned} \mathrm{CR} = \frac{\text{ original}\,\text{ size}}{\text{ compressed}\,\text{ size}}\cdot \end{aligned}$ (1)

Metric 2 PSNR: For a data point, i, we let $e_{{abs}_{i}} = x_{i} - {\tilde{x}}_{i}$ $e_{\text{abs}_i} = x_i - \tilde{x}_i$ , where [e_abs] is the absolute error. Also, we denote the range of X based on R_X. To evaluate the average error in the compression, we first used the common root mean squared error (RMSE).

$\begin{matrix} RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(e_{{abs}_{i}})}^{2}} . \end{matrix}$ $\begin{aligned} \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum _{i = 1}^{N} \left(e_{\text{abs}_i}\right)^2}. \end{aligned}$ (2)

The peak signal-to-noise ratio (PSNR) is another commonly used average error metric for evaluating a lossy compression method (Berger 2003), especially in visualization. A higher value of the PSNR represents a lower error. It is calculated as

$\begin{matrix} 20 \cdot {log}_{10} (\frac{R_{X}}{RMSE}) \cdot \end{matrix}$ $\begin{aligned} 20 \cdot \log _{10}\left(\frac{R_X}{\mathrm{RMSE}}\right)\cdot \end{aligned}$ (3)

Metric 3 ρ: We employed the Pearson correlation coefficient, ρ, to assess the linear correlation between the original and reconstructed datasets. A correlation coefficient of at least 0.9999 is generally required to ensure high fidelity,

$\begin{matrix} ρ = \frac{cov (X, \tilde{X})}{σ_{X} σ_{\tilde{X}}} \cdot \end{matrix}$ $\begin{aligned} \rho = \frac{\mathrm{cov}(X, \tilde{X})}{\sigma _X \sigma _{\tilde{X}}}\cdot \end{aligned}$ (4)

Metric 4 SSIM: The structural similarity index (SSIM) is another popular metric for evaluating the perceptual quality of images in compression. Unlike traditional error-based metrics, SSIM considers structural information, including luminance and contrast, which better reflects human visual perception. A higher SSIM value corresponds to greater similarity, with a value of 1 signifying perfect structural and perceptual equivalence. In this formula, μ_x and μ_y denote the mean intensities of images x and y, σ_x² and σ_y² are their variances, and σ_xy is the covariance between them. The constants C₁ = (K₁L)² and C₂ = (K₂L)² are used to stabilize the division, where L is the dynamic range of pixel values, and K₁, K₂ are small constants,

$\begin{matrix} SSIM (x, y) = \frac{(2 μ_{x} μ_{y} + C_{1}) (2 σ_{xy} + C_{2})}{(μ_{x}^{2} + μ_{y}^{2} + C_{1}) (σ_{x}^{2} + σ_{y}^{2} + C_{2})} \cdot \end{matrix}$ $\begin{aligned} \mathrm{SSIM}(x,y) = \frac{(2 \mu _x \mu _y + C_1)(2 \sigma _{xy} + C_2)}{(\mu _x^2 + \mu _y^2 + C_1)(\sigma _x^2 + \sigma _y^2 + C_2)}\cdot \end{aligned}$ (5)

Post hoc metrics: In addition to conventional data distortion metrics, this paper also introduces domain-specific scientific downstream analysis metrics. Aiming for a comprehensive evaluation of the impact of compression on EUI data, we provide a detailed discussion in Section 4.4.

3. Comparison of advanced and traditional compression algorithms

In this section, we evaluate the performance of advanced lossy compression methods and traditional image compression techniques on EUI data. Based on the metrics specified in Section 2.3, we perform a comparative analysis to assess the strengths and weaknesses of both classes of compressors.

3.1. Experimental setup

Environment: The advanced compression methods (error-bounded lossy compressors) included in the experiment are SZ2, SZ3, ZFP, and SPERR, while the traditional image compression techniques include JPEG2000 and RICE (native algorithm of the FITS format). As a reference, we also included the lossless compression algorithm GZip. The experimental environment is deployed on a cloud server equipped with two 16-core Intel Xeon Gold 6151 3.00 GHz CPU, and 1007 GB of RAM. Our test EUI dataset consists of two parts, described below.

FSI Dataset: EUI/FSI is an EUV imager observing the full solar disk in the 17.4 nm and 30.4 nm EUV passbands. It continually provides synoptic observations with a variable cadence depending on telemetry allocations and observing mode. We adopted a daily sampling strategy, selecting 17.4 nm images from the latest EUI data release 6.0 (Kraaikamp et al. 2023). During the data screening process, we excluded observations when FSI was operating in coronagraph mode to ensure data consistency and representativeness. The final FSI dataset totals 7.2 GB, covering observations from different orbital positions throughout Solar Orbiter’s nearly 50 months from December 2021 to January 2025, comprehensively reflecting the characteristic variations in FSI data.

HRI_EUV Dataset: The EUI/HRI_EUV dataset is based on a flare observation campaign conducted on April 5, 2024. During this campaign, HRI_EUV continuously observed a solar limb active region for approximately 4 hours from 19:59 to 23:59 with a temporal cadence of 16 seconds. This long time series observation generated a substantial amount of high-resolution data, totaling 4.2 GB. This dataset is particularly suitable for evaluating the performance of compression algorithms when processing complex, rapidly evolving solar active regions.

Before becoming publicly available, the raw EUI data undergo several preprocessing steps, which are illustrated in Fig. 3 (Nicula et al. 2005; Poupat & Vitulli 2013). Considering that the raw data are scarce and not publicly available, our main experiments are conducted on level-1 data, and all selected data have undergone the same on-board compression mode (lossy-high quality). To account for the potential impact of these prior processing steps, we also evaluated our method on level-1 data processed with different onboard compression mode in Appendix A. Furthermore, in Appendix C, we provide a mathematical justification to demonstrate the reliability of our compression results on the selected dataset.

3.2. Performance analysis and comparison

To ensure a rigorous and objective comparison of compression performance across different algorithms, we adopted a unified quality metric PSNR (as Equation 3). A higher compression ratio at a given image quality indicates a superior algorithm. Figure 2b shows the compression ratios of different algorithms over observation time at a PSNR of 88. The SZ family and SPERR demonstrate the highest compression ratios, while JPEG2000 and ZFP achieve similar but inferior results compared to the best-performing algorithms. The RICE algorithm (native algorithm of the FITS format) exhibits the lowest compression ratio. Figure 2b shows the results at PSNR = 75, where the performance gap widens further. The best-performing algorithms achieve compression ratios close to 300×, whereas JPEG2000 is limited to 50 × −100×. As an example, on January 1, 2023, SZ3 achieved a compression ratio 2.5× higher than JPEG2000 and 19.2× higher than RICE.

Fig. 2.

Panel a: Visualization of the Solar Orbiter’s trajectory based on the complete FSI dataset. Panel b: Compression ratio trends over time at a fixed quality level corresponding to a PSNR = 88 and PSNR = 75. Color coding indicates different compression methods. RICE and lossless compression (GZip) have fixed compression ratios and are used as baselines for reference. Advanced lossy compressors demonstrate significant advantages in compression ratio. Panel c: Visual comparison and difference maps between the original image (acquired on April 5, 2024) and the reconstructed images from five compression algorithms. All difference map display ranges were set to ±50 for consistency in this work.

In Fig. 2c, we compare the reconstructed images of four advanced lossy compressors and JPEG2000 on the sample data. At the same compression ratio, JPEG2000 exhibits noticeable distortion and severe compression artifacts in many regions. In contrast, the four advanced compressors deliver superior performance. While ZFP shows minor deviations, the reconstructed images from SZ3, SZ2, and SPERR are visually indistinguishable from the original, maintaining excellent image quality.

Strength: Experiments on full time series dataset and corresponding visual analyses confirm the superior performance of advanced compression algorithms on EUI data. Quantitative evaluation reveals that they significantly outperform traditional methods: achieving 10−30× higher compression ratios over RICE and 1.5−3× over JPEG2000, with equal reconstruction quality. This breakthrough is primarily attributed to the error-bound control mechanisms employed by these algorithms, which enable efficient data reduction while preserving image fidelity.

3.3. Observation and motivation

However, although advanced lossy compressors show clear advantages over traditional algorithms, we also observed their inherent limitations, which motivated the design of SolarZip.

Observation 1: Compression ratios vary over time and closely follow changes in the spacecraft’s distance from the Sun. Specifically, when the spacecraft is farther from the Sun, the compression ratio increases under the same error bound; when it is closer, the compression ratio decreases. This can be explained by the reduced apparent size of the Sun in distant images, leading to a more uniform background distribution that is easier to compress.

Observation 2: No single compressor performs the best for EUI data. As shown in Fig. 2, at a PSNR of 75 dB, ZFP exhibits the lowest compression efficiency, while SZ3 and SPERR perform similarly well. At a higher PSNR threshold of 88 dB, the compression performance analysis reveals that SZ2 achieves superior results compared to other compressors, while SZ3 exhibits marginally lower performance than SPERR. We further investigated the reasons behind these variations. SZ3 relies on global interpolation and Lorenzo predictors, which perform well on datasets with strong global continuity. In contrast, SZ2 utilizes a block-based Lorenzo predictor, achieving higher accuracy for locally continuous data.

Fig. 3.

Processing pipeline of EUI data (using FSI data as an example). The on-board WICOM compression (similar to JPEG2000) is mostly lossy, with only a small portion using lossless compression. Subsequent processing is performed on the ground.

Motivation: These observations confirm our key insight: EUI data characteristics are highly dependent on observation conditions, making it impossible to define a single optimal compression strategy. Therefore, a dynamic multi-compression framework is necessary to adapt to complex observational images and meet diverse scientific requirements.

4. SolarZip compression framework

As discussed above, designing a comprehensive compression framework for EUI data must address the diverse characteristic variations inherent in the data. This requires the dynamic selection of compression strategies tailored to varying data characteristics, while maintaining high compression efficiency. To this aim, we have proposed the SolarZip data compression and evaluation framework (Fig. 4).

Fig. 4.

Overview of SolarZip framework. The system consists of three stages: preprocessing, compression, and analysis. The core algorithmic innovation lies in the strategy controller, which can automatically tune and select the optimal compression strategy. The subsequent two stages of comprehensive analysis ensure that the decompressed data remain suitable for scientific purposes.

4.1. Overview of SolarZip

The SolarZip workflow consists of four stages: (1) initialization, which is the pre-processing of FITS files and setting configurations; (2) compression, which consists of selecting the optimal compression strategy, automatically optimizing it, then applying the compressor to the input data; (3) distortion analysis of the errors, evaluating distortion, and visualizing the results; and (4) post hoc analysis, performing downstream tasks such as a coronal structure analysis and dynamic feature analysis on reconstructed data.

Our workflow follows a modular design, enabling the efficient parallel compression of large volumes of FITS files. Users need to provide a configuration file and the FITS files. During the initialization stage, the system is configured based on the user’s configuration parameters and the FITS files are preprocessed. The preprocessing includes splitting the HDUs in the FITS files and extracting the data to be compressed. The data is then converted into binary format and passed to the next phase. FITS files are processed in batches, with the compression of each file occurring in parallel. Importantly, with the adaptive strategy controller, we are able to achieve the highest compression ratio for images from different observational scenarios and user requirements. Details on the adaptive strategy and optimization are presented in the next subsection.

After decompression, a distortion analysis is performed by comparing the decompressed data with the original, producing distortion metrics. The data processor then reconstructs the decompressed data back into FITS format, and the reconstructed FITS files are used for a further post hoc analysis.

4.2. Adaptive hybrid compression strategy

We conducted a thorough evaluation of four lossy compressors on Solar Orbiter/EUI data (detailed in Section 3). The results indicate that no single compressor consistently outperforms the others across various observational scenarios and scientific objectives. The core reason behind these variations lies in the distinct data prediction and transformation mechanisms of these advanced lossy compression algorithms (see more details in Section 2.1).

In this subsection, we propose an adaptive optimization strategy that automatically selects the most suitable compression algorithm based on data characteristics in different scenarios. This strategy leverages a sampling-based approach to dynamically determine the optimal parameter configuration.

Our adaptive hybrid compression strategy is depicted in Fig. 5. We categorize the precision requirements of EUI data into two types: relaxed and strict, using a relative error threshold of 1 × 10⁻⁴ as the boundary. For a relaxed error bound (eb > 1 × 10⁻⁴), the optimal strategy is chosen between spline interpolation prediction and Lorenzo prediction. Under a strict error bound (eb ≤ 1 × 10⁻⁴), we select the best strategy from transform-based predictors and linear regression predictors.

Fig. 5.

Steps of the adaptive hybrid compression strategy. Left panel shows the compression strategy under relaxed error bounds, while the right panel shows strategy under strict error bounds. Our method dynamically selects the optimal compression strategy and optimize it. The different strategies are denoted by S1–S4 in the figure.

Specifically, we employed a heuristic sampling approach, where N data points are uniformly selected from the dataset to guide the compression strategy selection. For a relaxed error bound (eb > 1 × 10⁻⁴), the tuning process consists of four steps: (1) uniformly sampling N data points from the dataset; (2) optimizing the spline interpolation predictor (trial run S1 in Fig. 5) by selecting the best-fit interpolation method (linear or cubic) and optimizing the sequence of interpolation dimensions; (3) optimizing the Lorenzo predictor by dynamically selecting between first-order and second-order predictions; (4) selecting the best strategy with the highest compression ratio. For strict error bounds (eb ≤ 1 × 10⁻⁴), the tuning process follows the same principle, except that run A corresponds to the transform-based predictor, while run B corresponds to the regression-based predictor.

Linear regression and Lorenzo predictions have been effectively applied in previous studies and they are both block-based prediction methods. In this work, we conduct an offline analysis to determine the optimal block size of 8 × 8, which is then set as the default parameter in our framework. Both linear regression and Lorenzo predictors support first-order and second-order prediction functions. Following the approach proposed in previous studies (Zhao et al. 2020), we dynamically select between first-order and second-order prediction functions for each data block. This ensures that every data block applies the optimal compression strategy. We applied the optimal orthogonal transformation in ZFP (Lindstrom 2014) as our prediction model because its de-correlation efficiency has been shown to be more effective than that of other transforms, such as the discrete cosine transform or wavelet transform.

4.3. Optimization for spline interpolation predictor

The compression method based on classical spline interpolation is able to achieve a high compression ratio under large error bounds. However, in some cases, noticeable image quality degradation occurs. For instance, in highly non-smooth regions of solar EUI images, such as flares, the interpolation predictor introduces visible compression artifacts in those areas. This issue arises because the basic interpolation-based predictor suffers from considerably low accuracy in long-range interpolation (Liu et al. 2022). Since it does not control the maximum stride length, the prediction accuracy becomes fairly low when the interpolation spans a long distance in the data array. To address these problems, SolarZip implements two key optimizations, which are described below.

Grid-wise anchor points interpolation. In the interpolation process, we specifically introduce grid-wise anchor points. Anchor points are predetermined data points, which are losslessly encoded and stored during compression. These anchor points divide the entire dataset into multiple blocks, and all other data points are predicted using points within a certain range, employing a multi-level interpolation method. This method effectively addresses the issues caused by long-range predictions. It is noteworthy that we found that if an appropriate stride is set for the anchor grid, the overhead associated with storing the losslessly compressed anchor points becomes negligible. More details on the anchor points interpolation are described in Appendix B.3.

Level-wise interpolation with error bound auto-tuning. We set different error bounds at different levels of interpolation (as opposed to the unified error bounds used in SZ3). In our two- (2D) data, 75% of the data points fall within the lowest interpolation level (level 1), which are predicted by higher-level reconstructed data points, while the remaining 25% of the data points are predicted at higher levels. Therefore, setting smaller error bounds at higher levels helps ensure the overall prediction accuracy, thereby improving the compression quality,

$\begin{matrix} e_{l} = \frac{e}{min (α^{l - 1}, β)} (α \geq 1 and β \geq 1) . \end{matrix}$ $\begin{aligned} e_l = \frac{e}{\min (\alpha ^{l-1}, \beta )} \quad (\alpha \ge 1 \; \mathrm{and} \; \beta \ge 1). \end{aligned}$ (6)

The level-wise error bounds e_l are dynamically adjusted based on Equation (6). The parameters α and β are introduced, where e represents the global error bound set by the user. We perform offline testing with parameter sets α = {1, 1.5, 2} and β = {2, 3, 4}, comparing the bit-rate and PSNR values across different parameter configurations. Ultimately, we select α = 1.5 and β = 4 as our optimal parameters.

4.4. Post hoc analysis stage

In addition to the standard distortion analysis, we implemented specialized post hoc analysis tailored to the scientific requirements of solar physics research.

4.4.1. FSI large-scale coronal structure analysis

For the FSI data, we focus on evaluating how compression affects large-scale dynamic structures in the solar corona. We implement a circular intensity extraction method, where a virtual circle is placed at 1.05 solar radii from the disk center, corresponding to the lower corona region where many important dynamic phenomena occur. Intensity values are sampled along this circle to generate a 1D intensity profile that captures the coronal structures.

The intensity profiles from the original and reconstructed compressed images are then compared to assess how different compression algorithms preserve the coronal features. We compare several metrics, including:

morphology-based visual comparison of full-disk FSI images;
intensity profile correlation coefficient.

This analysis is particularly important for studying large-scale coronal evolution and identifying the onset of coronal mass ejections, where subtle changes in intensity distribution can have significant scientific implications.

4.4.2. HRI_EUV small-scale dynamic feature analysis

For the HRI_EUV data, our post hoc analysis focuses on the preservation of small-scale dynamic structures in selected frames. We examine features such as plasma flows and fine magnetic structures, which are crucial for understanding energy transport and release in the solar atmosphere. We implement feature tracking algorithms to identify and characterize dynamic features in both the original and compressed reconstructed data. The analysis includes:

morphology-based visual comparison of local dynamic features in HRI_EUV images;
difference maps between original and compressed images;
median and standard deviation of the pixel difference distribution in the difference maps.

In contrast to FSI’s analysis of large-scale structures, for HRI_EUV, we focus on a detailed analysis of representative small-scale structures, comparing the sensitivity of different features under various compression ratios to propose targeted dynamic compression strategies.

5. Evaluation results

We tested the compression performance and reconstruction quality of SolarZip alongside five advanced lossy compression techniques on EUI data. Our evaluation framework consists of two analytical stages. The first stage calculates compression performance metrics by comparing the original and reconstructed data, while the second stage incorporates expert domain knowledge for systematic post hoc analysis, ensuring a comprehensive multidimensional assessment of the compression performance. Additionally, we conducted simulation experiments based on the hardware conditions of Solar Orbiter/EUI, verifying the significant improvement in data transmission efficiency enabled by lossy compression algorithms.

This detailed examination evaluates whether the compressed data retains sufficient information to support the scientific analysis of transient solar phenomena and small-scale structures, which is essential for studies of magnetic reconnection, wave dynamics, and plasma heating mechanisms in the solar atmosphere. By combining these specialized post hoc metrics with standard distortion analysis, we provide a comprehensive evaluation of compression performance that directly addresses the scientific use cases for EUI data.

5.1. Evaluation of compression performance

Table 1 shows the test results of SolarZip against four error-bounded lossy compression algorithms (ZFP, SZ3, SZ2, and SPERR) on the FSI dataset and HRI_EUV dataset. The experiments show that SolarZip achieves optimal compression ratios on both datasets, with an improvement of 5.6−30.4% over the second-best compressor, while maintaining excellent fidelity (PSNR > 60 dB). This advantage is mainly due to our proposed adaptive hybrid compression strategy (AHCS), which demonstrates excellent adaptability to the data and can optimize and select the optimal compression strategy, as well as the optimization of the spline interpolation predictor, which improves the quality of the reconstructed images under high error bounds. Thus, SolarZip achieves a high compression ratio for solar science data while ensuring scientific usability.

Table 1.

Comparison of four compressors on FSI and HRI_EUV datasets under three error bounds.

To enhance the rigor and objectivity of the compression performance comparison, we plot the rate-distortion curves for our solution and other lossy compressors, comparing the distortion quality at the same rate. Here, rate refers to the bit rate in bits per value, and we use the PSNR to measure the distortion quality. The PSNR is calculated by Equation (3) in decibels. Generally speaking, in a rate-distortion curve (Berger 2003), a higher bit rate indicates that more bits are required to store each value, resulting in higher quality of the reconstructed data after decompression, as reflected by a higher PSNR.

As discussed in Section 4.2, we designed an adaptive hybrid strategy to optimize the compression quality across the entire bit-rate range. Fig. 6a presents the rate-distortion curves of our algorithm compared to five other methods on the FSI dataset. The results demonstrate that our adaptive hybrid strategy plays a crucial role in improving compression quality. As shown in Fig. 6a, our compression algorithm achieves near-optimal quality across almost all bit rates. Particularly, for bit rates below 1.0, our method exhibits notably superior compression quality compared to SZ3 and SZ2, attributed to its dynamic selection between the automatically optimized spline interpolation predictor and the Lorenzo predictor. Moreover, our approach performs comparably well to the SPERR compressor, which is specifically designed for high-quality compression but is unsuitable for our task due to its inherent limitations, as discussed later in this work. At a PSNR of 77, our method achieves a compression ratio 300% higher than JPEG2000, demonstrating the efficacy of our spline interpolation predictor optimizations. When the bit rate exceeds 1.0, our solution surpasses SPERR, becoming the most rate-distortion efficient compressor. This is because, under a relaxed error bound greater than 1 × 10⁻⁴, our strategy accurately selects the linear regression predictor and the transformation-based predictor (using the same orthogonal transformation matrix as ZFP). Notably, our method consistently outperforms JPEG2000, a widely used compression standard in astronomy.

Fig. 6.

Rate-distortion curves on different datasets. Different compressors are distinguished by color, with our method indicated by the red line. A higher PSNR corresponds to better image quality at the same bit rate. SolarZip demonstrates the best overall performance on both datasets. (a) FSI Rate-distortion Curves. (b) HRI Rate-distortion Curves.

Fig. 6b presents the rate-distortion curves of all compressors on the HRI_EUV dataset. The results demonstrate that our solution achieves the best performance among all six compressors. It can be observed that at a bit rate of approximately 3, our rate-distortion curve exhibits a distinct inflection point, while maintaining superior quality. This behavior results from an adaptive adjustment in our selection strategy as the error bounds transition from a strict to a relaxed mode, with the threshold determined through offline analysis. Additionally, at the same PSNR level (e.g., equal to 60), our method achieves a 50% higher compression ratio than JPEG2000, further validating the effectiveness of our optimizations on the spline interpolation predictor in improving the image quality.

We conducted a simulation experiment on the compression and data transmission process aboard the Solar Orbiter. Solar Orbiter/EUI is powered by a WICOM compression ASIC with SDRAM, delivering an average downlink telemetry rate of approximately 300 Kbit/s to ground stations (Rochus et al. 2020; Marirrodriga et al. 2021). We simulated the data transmission process after compression and recorded the total time, including both compression and transmission durations. In Fig. 7, we present the results under two different error bounds. Our solution achieved the shortest elapsed time, demonstrating the highest data transmission efficiency. For instance, transferring 1000 MB of uncompressed data takes approximately 7.5 hours, whereas our method completes the task in just 101 seconds, improving the efficiency by a factor of 270. This is due to the superior compression performance and high compression speed of our method. Although SPERR achieved a slightly higher compression ratio than our solution, its slow compression speed resulted in a significantly longer total time.

Fig. 7.

Comparison of elapsed transmission times for different compressor with a data volume of 1000 MB. The elapsed time includes compression time (red) and transmission time (blue), while SolarZip demonstrates the highest efficiency, reducing the total time by 270×. (a) Error Bound 1e−3. (b) Error Bound 1e−2.

5.2. Evaluation of the post hoc analysis

After evaluating generic compression performance, our focus shifts to whether reconstructed FSI and HRI_EUV images can meet the requirements of solar physics observational research. Based on the data’s inherent variations and specific research content, we propose what we consider appropriate dynamic compression strategies.

5.2.1. FSI reconstructed image analysis

As shown in Fig. 8, we present two sets of FSI comparison images from April 6, 2024 (near perihelion) and January 9, 2024 (near aphelion), with the same error bound of 1e−2. We find that at high compression ratios (190× and 860×), no significant differences are visible in the perihelion image (Fig. 8a), while the aphelion reconstructed image shows discrepancies in coronal morphology (appearing as discontinuities in the coronal structure, Fig. 8b). This occurs because at perihelion, the closer distance provides a resolution for the solar disk that is approximately three time higher compared to aphelion, resulting in lower compression ratios near perihelion. We extracted the intensity curves from a circle at 1.05 solar radii in the FSI images and compared the correlation coefficients between original and compressed image intensity curves (shown in the right panel). We found that despite the apparent morphological information loss in the aphelion images (with higher compression ratios), they still maintain high correlation coefficients with the original image intensity curves. In Fig. 8c, we present the pearson correlation coefficients between the original and reconstructed intensity distributions at 1.05 solar radii over the complete 30-month FSI dataset under three error bounds. At an error bound of 1e−3, the correlation remains extremely close to 1 (average of 0.99998). Even under a looser bound of 1e−2, although the coefficient is lower, our visual inspection confirms that the resulting reconstruction errors still remain within acceptable levels.

Fig. 8.

Post hoc analysis comparison results of FSI. Panel a: Comparison image near perihelion on April 6, 2024, with red and blue dashed circles extracting intensity distributions at 1.05 solar radii from the original and reconstructed images. Results are shown in the right panel. Panel b: Comparison image near aphelion on January 9, 2024, with red and blue dashed circles extracting intensity distributions at 1.05 solar radii from the original and compressed images. Results are displayed in the right panel. Panel c: Correlation coefficients of intensity distributions at 1.05 solar radii between original and reconstructed images under three different error bounds, based on 2.5 years of FSI data.

Unlike traditional coronal EUV imagers, FSI has an unprecedentedly large FOV: (228′)² (Rochus et al. 2020; Berghmans et al. 2023), which has significant overlap with the Solar Orbiter coronagraph Metis (Antonucci et al. 2020). At perihelion, this FOV corresponds to (4 R_⊙)² such that the full solar disk is always seen, even at maximum off-pointing (1 R_⊙). This FOV is significantly wider than the (3.34 R_⊙)² of EUVI (Howard et al. 2008) or the (3.38 R_⊙)² of SWAP (Seaton et al. 2013). When close to aphelion, this FOV corresponds to (14.3 R_⊙)², providing unique opportunities to image the middle corona and eruptions that transit through this region.

Our post-analysis study demonstrates that for FSI with such dynamically varying field of view and observational targets, our SolarZip algorithm achieves compression ratios of nearly 200× for perihelion FSI images at the error bound of 1e−2, while aphelion FSI images reach ultra-high compression ratios exceeding 800×. Considering the dynamic variations of FSI data, users can lower the error bound when Solar Orbiter approaches the Sun to achieve higher fidelity, while increaseing the error bound when it moves away from the Sun to reduce fidelity and maintaining higher compression ratios, keeping the overall compression performance within an optimal range.

5.2.2. HRI_EUV reconstructed image analysis

The HRI_EUV plate scale is 0.492″, which implies unprecedented ultra-high resolution EUV observations. On April 5, 2024, Solar Orbiter reached a distance of 0.29 AU from the Sun, giving (single) pixel values on the Sun of (105 km)² for HRI_EUV. This unparalleled resolution provided us with an opportunity to study small-scale dynamic structures in the solar corona. We selected the frame at 22:53:48 UT as a representative image to analyze whether the reconstructed image meets the actual scientific requirements (Fig. 9a). We selected two regions for comprehensive analysis, covering two well-studied phenomena widely present in the solar atmosphere: solar jets and prominences.

Fig. 9.

Post hoc analysis comparison results of HRI_EUV. Panel a: Selected demonstration image showing two representative features: a jet (area of 150 square pixels) and a prominence (area of 300 square pixels). Panel b: Comparison between the original image and images at four different compression ratios in the jet region, with annotations showing compression ratios and the sum of pixel intensities within the region. Panel c: Comparison between the original image and images at four different compression ratios in the prominence region, with annotations showing compression ratios and the sum of pixel intensities within the region.

Solar jets, defined as collimated, beam-like plasma ejections along magnetic field lines, are ubiquitous in all regions of the solar atmosphere, including active regions, coronal holes, and quiet-Sun regions (Raouafi et al. 2016; Shen 2021; Joshi 2021; Tan et al. 2022, 2023; Sterling et al. 2024). Prominences, as widely present magnetized plasma structures in the solar atmosphere, exhibit rich dynamic characteristics (Chen 2011; Webb & Howard 2012; Warmuth 2015; Shen et al. 2020; Asai et al. 2012). We compared the reconstructed and original images of the jet region and prominence region under different error bound values (corresponding to different compression ratios; Figs. 10b and c). First, we can find that the intensity and of both the original image and the reconstructed image at different compression rates remain highly consistent. For the smaller jet region (150 square pixels), we observed noticeable blurring effects beginning at a compression ratio of 831×, while both 137× and 495× compression ratios maintained morphological consistency. For the larger prominence region (300 square pixels), our morphology-based visual inspection shows that the reconstructed images maintained excellent prominence structural features compared to the original image, even at the highest compression ratio of 1286×.

To analyze the differences between original and reconstructed HRI_EUV images, we employed two complementary methods. First, we generated difference maps by subtracting the reconstructed images from the originals, highlighting structural discrepancies in selected regions containing solar jets (Figs. 10a–d, left panels) and prominences (Figs. 10e–h, left panels). This enabled direct morphological comparison of fine-scale features. Second, we computed the normalized intensity difference (1-(original and reconstructed)) and plotted its histogram (Fig. 10, right panels). We computed the median and standard deviation to characterize the compression artifacts, with vertical lines marking the median and ±1σ boundaries. The two structures exhibited different sensitivities to different compression ratios. At compression ratios of 137× and 495×, difference images of the jet region exhibit no distinct structural features, appearing primarily as random noise. However, at 1286×, elongated jet structures emerged in the difference map (Fig. 10d), indicating significant discrepancies at the jet edge. In contrast, the prominence region difference images only begin to show subtle prominence structures at the 1286× (Fig. 10h); yet the histogram at this compression ratio maintains a relatively narrow distribution, supporting our visual assessment of preserved morphological features.

Fig. 10.

Panels a–h: Difference maps between original and compressed images, labeled with corresponding panels from Fig. 9. Right side of difference maps: Pixel difference distribution of the difference maps, annotated with median and standard deviation. Black solid line indicates the median position, blue dashed lines represent the range of one standard deviation on either side of the median.

In our analysis of the jet region, we found that the difference image at a compression ratio of 137 showed almost no jet structures (Fig. 11a), instead exhibiting features of noise signals. We therefore sought to compare the algorithm-induced errors of the jet structures with the inherent temporal noise of the HRI_EUV instrument itself. The temporal noise on repeated HRI_EUV pixel values is typically dominated by the sensor read noise and the photon shot noise according to the formula (Kraaikamp et al. 2023):

Fig. 11.

Comparison of algorithm-induced errors versus HRI_EUV temporal noise errors. Panel a: Intensity distributions extracted across the jet structure from both the original HRI_EUV image and the image with 137× compression ratio. Panel b: Intensity distributions extracted across the jet structure from both the original image and the image with 495× compression ratio. The intensity maxima in the cross-jet intensity distributions are marked for each image.

$\begin{matrix} s^{2} = r^{2} + I * t * a / n, \end{matrix}$ $\begin{aligned} s^2 = r^2 + I * t * a/n, \end{aligned}$ (7)

where s is the uncertainty on measured value, r is the readout noise (2 DN), I is the measured intensity (DN/s), t is the exposure time (s), a is the photon to DN conversion factor (6.85 DN/photon), and n is the sample size (number of pixels over which the intensity is averaged). As shown in Fig. 11, we compared the algorithm-induced errors at compression ratios of 137× and 495× with the temporal noise of the HRI_EUV instrument. For the intensity distribution across the jet shown, the peak intensities in the reconstructed and original images were 5249:5211 and 5152:5211, indicating errors of 38 DN/s (137×) and −59 DN/s (495×), respectively. For I = 5211 DN/s with an exposure time of t = 2.0 s, the uncertainty on the measured value is 267 DN. This demonstrates that even at a compression ratio of 495×, the error in the peak intensity of the jet in the reconstructed image due to the algorithm remains significantly smaller than the measurement uncertainty caused by the HRI_EUV temporal noise.

Our comprehensive analysis indicates that for this set of ultra-high-resolution HRI_EUV observations near perihelion, SolarZip can effectively achieve compression ratios of several hundred times without affecting scientific analysis. Specifically, for high-contrast dynamic structures represented by solar jets, compression ratios of around 500× can be achieved (corresponding to a 2e−2 error bound). For quiescent prominences, even higher compression ratios of approximately 800× are possible (corresponding to a 3e−2 error bound).

6. Conclusions

This paper presents SolarZip, an efficient and adaptive compression and evaluation framework specifically designed for solar EUV data from solar missions. To our knowledge, this is the first study to systematically apply and analyze advanced error-bounded lossy compressors (SZ, ZFP, and SPERR) on solar EUV data, demonstrating their significant advantages over traditional methods. However, their inherent limitations motivated us to design an adaptive hybrid compression strategy that dynamically selects optimal decorrelation models based on observational scenarios, coupled with optimized interpolation predictors to enhance compression quality. SolarZip achieved unprecedented compression ratios of up to 800× for EUI/FSI data and 500× for EUI/HRI_EUV data, surpassing traditional algorithms by 3−50×. Our comprehensive two-stage evaluation framework ensures that compressed data remains suitable for critical scientific research by integrating strict error control with downstream scientific workflows. The simulation experiments based on Solar Orbiter hardware conditions confirm that SolarZip can reduce data transmission time by a factor of 270, addressing a critical bottleneck in deep space solar missions.

The test data used in this work are publicly available EUI level-1 data that have already undergone on-board “lossy-high quality” compression processing. As demonstrated in Appendix A, these prior processing steps inherently limit the compression performance of SolarZip, as the data have already been subject to lossy compression, dynamic range reduction, and RICE encoding. Our comparative analysis across different on-board compression modes (Table A.1) reveals that SolarZip is expected to achieve 15−50% higher compression ratios when applied to less processed data, such as those compressed on-board using lossless modes. The mathematical analysis in Appendix C provides a theoretical justification that our performance evaluation on level-1 data remains scientifically valid and representative, as the preprocessing operations preserve the essential solar physical patterns that determine compression effectiveness. This suggests that SolarZip’s performance on raw or minimally processed solar data would be even more effective, going beyond the already substantial improvements demonstrated in this work.

Future works will focus on algorithmic enhancements tailored to the specific properties of solar data. For example, we aim to introduce a region of interest (ROI) approach, enabling the application of tighter error constraints in scientifically critical regions to achieve superior compression performance. Moreover, the versatility of our compression framework suggests its potential applicability to other types of astronomical datasets, such as radio imaging observations, which represents a promising avenue for future exploration.

Data availability

Solar Orbiter data is publicly available through the Solar Orbiter Archive¹. The sample data used to generate the figures in this work are publicly available at the following link: SolarZip-TestData². The dataset includes the raw data and the decompressed reconstructed data for each figure. The source code will be made publicly available at SolarZip³.

¹

https://soar.esac.esa.int/soar/

²

https://github.com/CapitalLiu/SolarZip-TestData

³

https://github.com/hpdps-group/SolarZip

Acknowledgments

We thank the anonymous reviewer for the constructive comments, which undoubtedly significantly improved the scientific quality and readability of the manuscript. We also thank the Solar Orbiter/EUI team at the Royal Observatory of Belgium for their generous assistance, particularly David Berghmans and Emil Kraaikamp for their help with the use and understanding of EUI data. Solar Orbiter is a space mission of international collaboration between ESA and NASA, operated by ESA. The EUI instrument was built by CSL, IAS, MPS, MSSL/UCL, PMOD/WRC, ROB, LCF/IO with funding from the Belgian Federal Science Policy Office (BELPSO); the Centre National d’Etudes Spatiales (CNES); the UK Space Agency (UKSA); the Bundesministerium für Wirtschaft und Energie (BMWi) through the Deutsches Zentrum für Luft- und Raumfahrt (DLR); and the Swiss Space Office (SSO). D.T. and G.T. would like to acknowledge support from the National Natural Science Foundation of China (Grant Nos. 62032023, and T2125013) and the Innovation Funding of ICT, CAS (Grant No. E461050). B.Z. was supported by the NSFC Fund (042274216). The AIP team was supported by the German Space Agency (DLR), grant number 50 OT 2304. This research used the SunPy (The SunPy Community 2020; Mumford et al. 2020) and NicePlots (https://github.com/mdolab/niceplots) software package to present the observation results.

References

Ainsworth, M., Tugluk, O., Whitney, B., & Klasky, S. 2019, SIAM J. Sci. Comput., 41, A2146 [Google Scholar]
Antonucci, E., Romoli, M., Andretta, V., et al. 2020, A&A, 642, A10 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Asai, A., Ishii, T. T., Isobe, H., et al. 2012, ApJ, 745, L18 [NASA ADS] [CrossRef] [Google Scholar]
Baker, A. H., Xu, H., Dennis, J. M., et al. 2014, Proceedings of the 23rd International Symposium on High-performance Parallel and Distributed Computing, 203 [Google Scholar]
Baker, A. H., Hammerling, D. M., Mickelson, S. A., et al. 2016, Geosci. Model Dev., 9, 4381 [Google Scholar]
Baker, A. H., Xu, H., Hammerling, D. M., Li, S., & Clyne, J. P. 2017, High Performance Computing: ISC High Performance 2017 (Springer) [Google Scholar]
Baker, A. H., Hammerling, D. M., & Turton, T. L. 2019, Comput. Gr. Forum, 38, 517 [Google Scholar]
Berger, T. 2003, Wiley Encyclopedia of Telecommunications (Wiley Online Library) [Google Scholar]
Berghmans, D., Antolin, P., Auchère, F., et al. 2023, A&A, 675, A110 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Cappello, F., Di, S., Li, S., et al. 2019, Int. J. High Perform. Comput. Appl., 33, 1201 [Google Scholar]
Chege, J., Koopmans, L., Offringa, A., et al. 2024, A&A, 692, A211 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Chen, P. F. 2011, Liv. Rev. Sol. Phys., 8, 1 [Google Scholar]
Di, S., & Cappello, F. 2016, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (IEEE), 730 [Google Scholar]
Di, S., Liu, J., Zhao, K., et al. 2024, ArXiv e-prints [arXiv:2404.02840] [Google Scholar]
Diffenderfer, J., Fox, A. L., Hittinger, J. A., Sanders, G., & Lindstrom, P. G. 2019, SIAM J. Sci. Comput., 41, A1867 [Google Scholar]
Fischer, C. E., Müller, D., & De Moortel, I. 2017, Sol. Phys., 292, 16 [Google Scholar]
Howard, R. A., Moses, J., Vourlidas, A., et al. 2008, Space Sci. Rev., 136, 67 [NASA ADS] [CrossRef] [Google Scholar]
Jin, S., Grosset, P., Biwer, C. M., et al. 2020, 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (IEEE), 105 [Google Scholar]
Jin, S., Pulido, J., Grosset, P., et al. 2021, Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing, 45 [Google Scholar]
Joshi, R. 2021, Ph.D. Thesis, Kumaun University, India [Google Scholar]
Kraaikamp, E., Gissot, S., Stegen, K., et al. 2023, SolO/EUI Data Release 6.0 2023-01 (Royal Observatory of Belgium (ROB)) [Google Scholar]
Leung, Y. K., & Apperley, M. D. 1994, ACM Trans. Comput. Human Interact. (TOCHI), 1, 126 [Google Scholar]
Li, S., Lindstrom, P., & Clyne, J. 2023, 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (IEEE), 1007 [Google Scholar]
Liang, X., Di, S., Tao, D., et al. 2018a, 2018 IEEE International Conference on Big Data (Big Data) (IEEE), 438 [Google Scholar]
Liang, X., Di, S., Tao, D., Chen, Z., & Cappello, F. 2018b, 2018 IEEE International Conference on Cluster Computing (CLUSTER) (IEEE), 179 [Google Scholar]
Liang, X., Di, S., Li, S., et al. 2019, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1 [Google Scholar]
Liang, X., Zhao, K., Di, S., et al. 2022, IEEE Trans. Big Data, 9, 485 [Google Scholar]
Lindstrom, P. 2014, IEEE Trans. Vis. Comput. Gr., 20, 2674 [Google Scholar]
Liu, J., Di, S., Zhao, K., et al. 2022, SC22: International Conference for High Performance Computing, Networking, Storage and Analysis (IEEE), 1 [Google Scholar]
Liu, X., Liu, Y., Yang, L., et al. 2024a, PASP, 136, 075001 [Google Scholar]
Liu, Z., Jiang, P., Zeng, F., Bian, H., & Toe, T. T. 2024b, 2024 4th International Conference on Computer Communication and Artificial Intelligence (CCAI) (IEEE), 58 [Google Scholar]
Marirrodriga, C. G., Pacros, A., Strandmoe, S., et al. 2021, A&A, 646, A121 [CrossRef] [EDP Sciences] [Google Scholar]
Müller, D., St. Cyr, O. C., Zouganelis, I., et al. 2020, A&A, 642, A1 [Google Scholar]
Mumford, S., Freij, N., Christe, S., et al. 2020, J. Open Source Softw., 5, 1832 [NASA ADS] [CrossRef] [Google Scholar]
Nicula, B., Berghmans, D., & Hochedez, J.-F. 2005, Sol. Phys., 228, 253 [NASA ADS] [CrossRef] [Google Scholar]
Patel, H., Itwala, U., Rana, R., & Dangarwala, K. 2015, Int. J. Eng. Res. Technol., 4, 926 [Google Scholar]
Peters, S. M., & Kitaeff, V. V. 2014, Astron. Comput., 6, 41 [Google Scholar]
Poupat, J. L., & Vitulli, R. 2013, DASIA 2013 – DAta Systems in Aerospace, 720, 62 [Google Scholar]
Pulido, J., Lukic, Z., Thorman, P., et al. 2019, J. Phys. Conf. Ser., 1290, 012008 [Google Scholar]
Raouafi, N. E., Patsourakos, S., Pariat, E., et al. 2016, Space Sci. Rev., 201, 1 [Google Scholar]
Rice, R., & Plaunt, J. 1971, IEEE Trans. Commun. Technol., 19, 889 [Google Scholar]
Rochus, P., Auchere, F., Berghmans, D., et al. 2020, A&A, 642, A8 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Seaton, D., Berghmans, D., Nicula, B., et al. 2013, Sol. Phys., 286, 43 [NASA ADS] [CrossRef] [Google Scholar]
Shen, Y. 2021, Proc. Roy. Soc. London Ser. A, 477, 217 [NASA ADS] [Google Scholar]
Shen, Y., Li, B., Chen, P., Zhou, X., & Liu, Y. 2020, Chin. Sci. Bull., 65, 3909 [NASA ADS] [CrossRef] [Google Scholar]
Sterling, A. C., Panesar, N. K., & Moore, R. L. 2024, ApJ, 963, 4 [Google Scholar]
Tan, S., Shen, Y., Zhou, X., et al. 2022, MNRAS, 516, L12 [Google Scholar]
Tan, S., Shen, Y., Zhou, X., et al. 2023, MNRAS, 520, 3080 [CrossRef] [Google Scholar]
Tao, D., Di, S., Chen, Z., & Cappello, F. 2017, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (IEEE), 1129 [Google Scholar]
Tao, D., Di, S., Liang, X., Chen, Z., & Cappello, F. 2019, IEEE Trans. Parallel Distrib. Syst., 30, 1857 [Google Scholar]
Taubman, D. S., & Marcellin, M. W. 2002, Proc. IEEE, 90, 1336 [Google Scholar]
The SunPy Community (Barnes, W. T., et al.) 2020, ApJ, 890, 68 [Google Scholar]
Vohl, D., Fluke, C. J., & Vernardos, G. 2015, Astron. Comput., 12, 200 [Google Scholar]
Wallace, G. K. 1991, Commun. ACM, 34, 30 [Google Scholar]
Warmuth, A. 2015, Liv. Rev. Sol. Phys., 12, 3 [Google Scholar]
Webb, D. F., & Howard, T. A. 2012, Liv. Rev. Sol. Phys., 9, 3 [Google Scholar]
Xie, H., West, R. A., Seignovert, B., et al. 2021, J. Astron. Telesc. Instrum. Syst., 7, 028002 [Google Scholar]
Zafari, A., Khoshkhahtinat, A., Mehta, P. M., et al. 2022, 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA) (IEEE), 198 [Google Scholar]
Zafari, A., Khoshkhahtinat, A., Grajeda, J. A., et al. 2023, IEEE Trans. Aerospace Electron. Syst., 60, 918 [Google Scholar]
Zhang, B., Guo, L., Tian, J., et al. 2025, Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 557 [Google Scholar]
Zhao, K., Di, S., Liang, X., et al. 2020, Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing, 89 [Google Scholar]
Zhao, K., Di, S., Dmitriev, M., et al. 2021, 2021 IEEE 37th International Conference on Data Engineering (ICDE) (IEEE), 1643 [Google Scholar]

Appendix A: Data and Supplementary Experiments

The EUI data undergo a comprehensive multi-stage processing pipeline, beginning with significant on-board data reduction before the generation of scientific data products on the ground. The process is initiated on-board with the simultaneous acquisition of two 12-bit images: a high gain (HG) channel optimized for faint features and a low gain (LG) channel for bright structures. These images first undergo an initial calibration to correct for instrumental artifacts, most notably a strong, systematic four-column banding noise. Following this, the calibrated HG and LG images are merged to create a single 15-bit "combined gain" image with an extended dynamic range. This 15-bit image is then subjected to an integer square root recoding, converting it to an 8-bit format. This recoding serves the dual purpose of suppressing shot noise and preparing the data for the on-board compression hardware, which requires an eight-bit input. The final on-board step is a lossy wavelet-based compression. Once transmitted to the ground, these compressed data packets are decompressed, and the recoding is reversed to reconstruct the image’s bit depth, producing level-1 (L1) data. These L1 files are packaged using a lossless RICE tile compression. Subsequent processing to create level-2 (L2) data involves further scientific calibrations, such as optical distortion correction for FSI, which necessitates data resampling. The L2 files are also tile-compressed, although this can introduce minor quantization artifacts due to the scaling of floating-point data prior to compression.

Fig. A.1.

Timeline distribution of FSI 174 A L1 data compression modes in EUI Data Release 6.0.

On-board compression on EUI includes four quality modes: lossless, lossy-high quality, lossy-strong, and lossy-extreme. Ideally, the best case for evaluating our SolarZip algorithm would be to use EUI data compressed in the lossless mode. However, we ultimately chose data compressed using the "Lossy–high quality" on-board mode as the primary dataset for our experiments, since this mode accounts for the largest proportion of the data and spans the widest time range (see the accompanying analysis of FSI-L1 compression mode distribution and temporal coverage for details).

In addition, we sought to address a key question concerning whether the on-board compression mode affects the performance of downstream compression algorithms such as SolarZip. To investigate this possibility, we conducted additional experiments. As shown in Table A.1, We traversed all possible data (data with four different compression modes in one hour, totaling 15 groups of 60 files), under different on-board compression modes to evaluate the resulting compression ratios. The results clearly indicate that the on-board compression mode does have a significant impact on SolarZip’s performance. The more aggressively the data was compressed on-board by WICOM, the lower the additional compression ratio achieved by SolarZip.

Table A.1.

Average compression ratios (15 groups) for L1 data: which were compressed by four on-board modes and then recompressed by SolarZip.

It is worth noting that the EUI team retains a very limited set of data that is both lossless and unrecoded. However, these images use single-gain channels, which differs from the combined-gain images that make up the vast majority of observations. As this dataset is not publicly available, it was not included in our tests. Nonetheless, our comparison across different L1 compression modes already provides sufficient evidence for our conclusion: when applied to less compressed (more original) data, SolarZip consistently achieves higher compression ratios. This further demonstrates the broad applicability and effectiveness of SolarZip.

Appendix B: Lossy Compression Technique

B.1. Advanced lossy compressors

Appendix B provides a detailed description of the four advanced error-bounded lossy compression algorithms employed for evaluation and comparison in this study. We first present their workflows and then highlight their respective characteristics.

SZ2 (Liang et al. 2018b) refers to the second-generation SZ compression algorithm, Which is a prediction-based error-bounded lossy compressor.

Workflow: SZ2’s compression has four main steps. Firstly, it divides raw data to be compressed into small blocks. For each of these blocks, a separate prediction function is generated. Secondly, SZ2 quantizes the data with the specified error bound. Thirdly, it employs Huffman encoding to encode the quantization index. Finally, lossless compression methods are used to further improve the compression ratio.

Insight: The SZ2 compression process is simple and efficient, achieving notable compression performance and speed on most scientific datasets. However, due to its block-wise prediction approach, its effectiveness is limited for high-dimensional and nonlinear data.

SZ3 (Liang et al. 2022; Zhao et al. 2021) uses a modular approach to compress the data. In fact, it is not only a compression software but also a flexible framework allowing users to customize specific copmression pipelines according to their datasets or use cases.

Workflow: The SZ3 compression pipeline is composed of five stages: preprocessing, prediction, quantization, variablelength encoding, and lossless compression. The preprocessing stage starts by transforming and shaping the raw data to make it easier to compress. The second stage is prediction. For different domain datasets, SZ developers have developed many predictors, including Lorenzo, linear regression, and dynamic spline interpolation. Thirdly, the error produced by the predictor is quantized. Fourthly, the quantized error data is encoded, shrinking its size. Fifthly, the now encoded data is losslessly compressed, reducing the size even further.

Insight: With an advanced predictive model and a flexible modular design, SZ3 significantly enhances compression performance and adaptability. However, these improvements come with higher computational costs.

ZFP (Lindstrom 2014; Diffenderfer et al. 2019) is a transform-based error-bounded lossy compressor.

Workflow: ZFP splits the whole dataset into many fixed-size blocks (e.g., 4×4×4 for a 3D dataset) These blocks are then individually compressed. The ZFP compressor executes four steps in each block. The first step align the value in block to a common exponent and convert the floating-point values to a fixed-point repressention. The next step uses orthogonal block transform to decorrelate data. Thirdly, it orders the transform coefficient by expected magnitude. Finally, it encodes the coefficients to reduce data size.

Insight: ZFP generally features high compression and decompression performance because of the performance optimization strategies in its implementation. However, as a block-wise compressor, ZFP faces the same limitations as SZ2.

SPERR is a transform-based lossy compressor based on the CDF9/7 discrete wavelet transform and SPECK encoding algorithm (Li et al. 2023).

Workflow: Compression pipeline of SPERR includes four stages: (1) CDF9/7 wavelet transform; (2) SPECK lossy encoding of wavelet coefficients; (3) outlier encoding (only in error-bounding mode); (4) zstd postprocessing of compressed data (optional).

Insight: SPERR’s advantage is that the hierarchical multidimension DWT in SPERR can effectively capture the relevance between data points, which brings a high compression ratio after the SPECK encoding. One limitation of SPERR is that the wavelet transform and the SPECK encoding processes have high computational costs; hence, its (sequential) execution speed is low, typically around 30% of SZ3.

B.2. Decorrelation module

Fig. B.1.

Categories of six types of decorrelation modules. The Lorenzo predictor is used in SZ2, spline interpolation is adopted in SZ3, and wavelet transform is applied in SPERR.

The decorrelation module is a critical component in lossy compressors, as it significantly affects the compressor’s performance. In general, the more accurately a predictor can model the inherent patterns in the data, the better the decorrelation it achieves. Data that has been decorrelated is more amenable to compression. Below, we introduce several mainstream predictors in detail.

The earliest proposed predictor is the Lorenzo predictor, which uses adjacent data points to predict the next one. For example, in the 1D case shown in Figure A.1, the value of the next point is always predicted using the value of the point to its left. This can be viewed as a form of extrapolative prediction. Later, spline interpolation was introduced as an alternative, which can be considered an interpolative approach. This method first predicts long-range points and then uses multiple points to estimate the intermediate values. Its advantage lies in its ability to capture more high-dimensional and global information.

Fig. B.2.

Level-wise anchor points based dynamic spline interpolation. In 2D data, there are two interpolation directions, and each interpolation step can utilize either a linear or cubic interpolation function. The algorithm automatically optimizes the selection process. It is evident that at the more critical level 2, a smaller error bound is assigned, which further enhances the accuracy of the interpolation prediction.

Unlike numerical prediction methods, transform-based approaches operate in the data domain. An example is the wavelet transform used in SPERR. Wavelet transforms convert the original data into a domain that is more favorable for compression. In this domain, the data is represented as coefficients with varying levels of significance. More information is concentrated in the more important coefficients, allowing the subsequent compression algorithm to selectively retain these while discarding or coarsely compressing the less significant ones. Transform-based approaches often yield better compression quality, but they come at a higher computational cost.

B.3. Anchor points interpolations with error bound auto-tuning

An issue is that classical spline interpolation applies the same error bound to all prediction levels, which does not fully account for their relative importance. As shown in the Fig. B.2, the predicted data points in level 1 participate in five subsequent prediction steps. Consequently, earlier-predicted data points should be considered more important.

First, we apply different interpolation methods at different levels of interpolation prediction. Specifically, the interpolation types include linear interpolation and cubic interpolation. In our 2D data, the interpolation process is essentially carried out through multiple 1D interpolation operations. Since there are two dimensions (dim0 and dim1), two distinct interpolation orderings are possible, and we select the optimal interpolation sequence.

Next, we set different error bounds at different levels of interpolation (as opposed to the unified error bounds used in SZ3). In our 2D data, 75% of the data points fall within the lowest interpolation level (level 1), which are predicted by higher-level reconstructed data points, while the remaining 25% of the data points are predicted at higher levels. Therefore, setting smaller error bounds at higher levels helps ensure overall prediction accuracy, thus improving compression quality. The level-wise error bounds, e_l, are dynamically adjusted based on equation B.1. The parameters α and β are introduced, where e represents the global error bound set by the user. We perform offline testing with parameter sets α = {1, 1.5, 2} and β = {2, 3, 4}, comparing the bit-rate and PSNR values across different parameter configurations. Ultimately, we selected α = 1.5 and β = 4 as our optimal parameters.

$\begin{matrix} e_{l} = \frac{e}{m i n (α^{l - 1}, β)} (α \geq 1 and β \geq 1) . \end{matrix}$ $\begin{aligned} e_l = \frac{e}{min(\alpha ^{l-1}, \beta )} \quad (\alpha \ge 1 \; \mathrm{and} \; \beta \ge 1). \end{aligned}$ (B.1)

Appendix C: Mathematical proof: Reliability of compression performance evaluation on EUI data products

C.1. Theorem Statement

Theorem: Let 𝒟₀, 𝒟₂ represent raw and level-2 solar EUI data respectively, where 𝒟₂ results from sequential preprocessing operations including calibration 𝒞, on-board compression 𝒫, and RICE encoding ℛ. For SolarZip (𝒜_SZ) employing spline interpolation prediction, the compression performance metrics (compression ratio, CR, and distortion measures RMSE and PSNR) evaluated on 𝒟₂ remain statistically equivalent to those evaluated on 𝒟₀, as preprocessing preserves the underlying solar physical patterns while reducing noise and instrumental artifacts.

C.2. Mathematical Framework

For solar imagery, we decompose the data into physical components,

$\begin{matrix} D_{0} = S_{0} + N_{0} + A_{0}, \end{matrix}$ $\begin{aligned} \mathcal{D} _0 = \mathcal{S} _0 + \mathcal{N} _0 + \mathcal{A} _0, \end{aligned}$ (C.1)

where 𝒮₀ represents underlying solar physical patterns, 𝒩₀ is random noise, and 𝒜₀ denotes instrumental artifacts.

The preprocessing sequence preserves physical patterns while reducing noise,

$\begin{matrix} D_{2} = R (P (C (D_{0}))) = S_{0} + ϵ S_{0} + N_{2} + A_{2}, \end{matrix}$ $\begin{aligned} \mathcal{D} _2 = \mathcal{R} (\mathcal{P} (\mathcal{C} (\mathcal{D} _0))) = \mathcal{S} _0 + \epsilon \mathcal{S} _0 + \mathcal{N} _2 + \mathcal{A} _2, \end{aligned}$ (C.2)

where |ϵ|≪1, ∥𝒩₂∥₂ ≤ ∥𝒩₀∥₂, and ∥𝒜₂∥₂ ≤ ∥𝒜₀∥₂.

C.3. Spline interpolation analysis for SolarZip

SolarZip employs B-spline interpolation for predictive compression. The B-spline basis functions of degree p:

$\begin{matrix} B_{i, 0} (t) = {\begin{matrix} 1 & if t_{i} \leq t < t_{i + 1}, \\ 0 & otherwise, \end{matrix} \end{matrix}$ $\begin{aligned}&B_{i,0}(t) = {\left\{ \begin{array}{ll} 1&\mathrm{if}\; t_i \le t < t_{i+1}, \\ 0&\text{ otherwise}, \end{array}\right.} \end{aligned}$ (C.3)

$\begin{matrix} B_{i, p} (t) = \frac{t - t_{i}}{t_{i + p} - t_{i}} B_{i, p - 1} (t) + \frac{t_{i + p + 1} - t}{t_{i + p + 1} - t_{i + 1}} B_{i + 1, p - 1} (t) . \end{matrix}$ $\begin{aligned}&B_{i,p}(t) = \frac{t - t_i}{t_{i+p} - t_i} B_{i,p-1}(t) + \frac{t_{i+p+1} - t}{t_{i+p+1} - t_{i+1}} B_{i+1,p-1}(t). \end{aligned}$ (C.4)

The spline approximation for solar data,

$\begin{matrix} S (x, y) \approx \sum_{i, j} c_{ij} B_{i}^{p} (x) B_{j}^{q} (y) . \end{matrix}$ $\begin{aligned} \mathcal{S} (x,y) \approx \sum _{i,j} c_{ij} B_i^p(x) B_j^q(y). \end{aligned}$ (C.5)

Lemma: Solar physical patterns exhibit smooth spatial variations amenable to spline representation.

Proof: Solar magnetic field configurations follow MHD equations, yielding smooth solutions. The approximation error for B-splines of degree p:

$\begin{matrix} ‖ S - \sum_{i, j} c_{ij} B_{i}^{p} (x) B_{j}^{q} {(y) ‖}_{\infty} \leq C h^{p + 1} {‖ S^{(p + 1)} ‖}_{\infty} . \end{matrix}$ $\begin{aligned} \Vert \mathcal{S} - \sum _{i,j} c_{ij} B_i^p(x) B_j^q(y)\Vert _\infty \le C h^{p+1} \Vert \mathcal{S} ^{(p+1)}\Vert _\infty . \end{aligned}$ (C.6)

Since preprocessing maintains smoothness: ∥𝒮₂^(p + 1) − 𝒮₀^(p + 1)∥_∞ ≤ ϵ_s, the spline approximation quality remains consistent. ▫

C.4. Compression Performance Invariance

The prediction error using spline interpolation:

$\begin{matrix} e [n] = x [n] - \sum_{i = 0}^{N} c_{i} B_{i}^{p} (n) . \end{matrix}$ $\begin{aligned} e[n] = x[n] - \sum _{i = 0}^{N} c_i B_i^p(n). \end{aligned}$ (C.7)

For optimal coefficients minimizing ∥e∥₂², we solve:

$\begin{matrix} G c = b, \end{matrix}$ $\begin{aligned} \mathbf G \mathbf c = \mathbf b , \end{aligned}$ (C.8)

where G_ij = ⟨B_i^p, B_j^p⟩ and b_j = ⟨x, B_j^p⟩.

Since preprocessing preserves smooth solar patterns: ∥b₂ − b₀∥₂ ≤ ϵ_b, the prediction quality remains consistent:

$\begin{matrix} ‖ e_{2} ‖_{2}^{2} - ‖ e_{0} ‖_{2}^{2} \leq 2 ϵ_{b} {‖ c ‖}_{2} + ϵ_{b}^{2} \end{matrix}$ $\begin{aligned} \Vert \mathbf e _2\Vert _2^2 - \Vert \mathbf e _0\Vert _2^2 \le 2\epsilon _b \Vert \mathbf c \Vert _2 + \epsilon _b^2 \end{aligned}$ (C.9)

C.5. Performance bounds

For the compression ratio:

$\begin{matrix} | \frac{C R_{2} - C R_{0}}{C R_{0}} | \leq \frac{ϵ_{s}}{{S/N}_{pattern}} . \end{matrix}$ $\begin{aligned} \left|\frac{CR_2 - CR_0}{CR_0}\right| \le \frac{\epsilon _s}{\text{ S/N}_{pattern}}. \end{aligned}$ (C.10)

For distortion measures (RMSE):

$\begin{matrix} \frac{| {RMSE}_{2} - {RMSE}_{0} |}{{RMSE}_{0}} \leq \frac{ϵ_{s}}{{RMSE}_{0}}, \end{matrix}$ $\begin{aligned} \frac{|\text{ RMSE}_2 - \text{ RMSE}_0|}{\text{ RMSE}_0} \le \frac{\epsilon _s}{\text{ RMSE}_0}, \end{aligned}$ (C.11)

where ${S/N}_{pattern} = \frac{‖ S_{0} ‖_{2}^{2}}{‖ N_{0} + A_{0} ‖_{2}^{2}}$ $\text{ S/N}_{pattern} = \frac{\|\mathcal{S}_0\|_2^2}{\|\mathcal{N}_0 + \mathcal{A}_0\|_2^2}$ is the pattern-to-noise ratio.

C.6. Conclusion

The mathematical analysis demonstrates that compression performance evaluation on L1/L2 solar imaging data maintains full scientific validity. The preprocessing operations preserve essential physical patterns that determine SolarZip’s effectiveness while reducing detrimental noise and artifacts.

Key Results:

Spline interpolation accuracy is maintained due to preserved spatial smoothness;
Compression ratios and distortion measures exhibit bounded, negligible deviations;
Statistical significance of evaluations is preserved or enhanced.

Therefore, L1/L2 imaging data (under difference compression mode) provides a reliable basis for evaluating SolarZip compression performance on solar imagery data.

All Tables

Table 1.

Comparison of four compressors on FSI and HRI_EUV datasets under three error bounds.

In the text

Table A.1.

Average compression ratios (15 groups) for L1 data: which were compressed by four on-board modes and then recompressed by SolarZip.

In the text

All Figures

	Fig. 1. Life cycle of Solar Orbiter observation data. This figure illustrates the application of data compression techniques throughout the lifecycle of solar observation data. These techniques significantly reduce data volume, addressing communication and storage challenges, particularly for on-board systems and data centers.
In the text

Fig. 2.

Panel a: Visualization of the Solar Orbiter’s trajectory based on the complete FSI dataset. Panel b: Compression ratio trends over time at a fixed quality level corresponding to a PSNR = 88 and PSNR = 75. Color coding indicates different compression methods. RICE and lossless compression (GZip) have fixed compression ratios and are used as baselines for reference. Advanced lossy compressors demonstrate significant advantages in compression ratio. Panel c: Visual comparison and difference maps between the original image (acquired on April 5, 2024) and the reconstructed images from five compression algorithms. All difference map display ranges were set to ±50 for consistency in this work.

In the text

	Fig. 3. Processing pipeline of EUI data (using FSI data as an example). The on-board WICOM compression (similar to JPEG2000) is mostly lossy, with only a small portion using lossless compression. Subsequent processing is performed on the ground.
In the text

	Fig. 4. Overview of SolarZip framework. The system consists of three stages: preprocessing, compression, and analysis. The core algorithmic innovation lies in the strategy controller, which can automatically tune and select the optimal compression strategy. The subsequent two stages of comprehensive analysis ensure that the decompressed data remain suitable for scientific purposes.
In the text

	Fig. 5. Steps of the adaptive hybrid compression strategy. Left panel shows the compression strategy under relaxed error bounds, while the right panel shows strategy under strict error bounds. Our method dynamically selects the optimal compression strategy and optimize it. The different strategies are denoted by S1–S4 in the figure.
In the text

	Fig. 6. Rate-distortion curves on different datasets. Different compressors are distinguished by color, with our method indicated by the red line. A higher PSNR corresponds to better image quality at the same bit rate. SolarZip demonstrates the best overall performance on both datasets. (a) FSI Rate-distortion Curves. (b) HRI Rate-distortion Curves.
In the text

	Fig. 7. Comparison of elapsed transmission times for different compressor with a data volume of 1000 MB. The elapsed time includes compression time (red) and transmission time (blue), while SolarZip demonstrates the highest efficiency, reducing the total time by 270×. (a) Error Bound 1e−3. (b) Error Bound 1e−2.
In the text

Fig. 8.

Post hoc analysis comparison results of FSI. Panel a: Comparison image near perihelion on April 6, 2024, with red and blue dashed circles extracting intensity distributions at 1.05 solar radii from the original and reconstructed images. Results are shown in the right panel. Panel b: Comparison image near aphelion on January 9, 2024, with red and blue dashed circles extracting intensity distributions at 1.05 solar radii from the original and compressed images. Results are displayed in the right panel. Panel c: Correlation coefficients of intensity distributions at 1.05 solar radii between original and reconstructed images under three different error bounds, based on 2.5 years of FSI data.

In the text

Fig. 9.

Post hoc analysis comparison results of HRI_EUV. Panel a: Selected demonstration image showing two representative features: a jet (area of 150 square pixels) and a prominence (area of 300 square pixels). Panel b: Comparison between the original image and images at four different compression ratios in the jet region, with annotations showing compression ratios and the sum of pixel intensities within the region. Panel c: Comparison between the original image and images at four different compression ratios in the prominence region, with annotations showing compression ratios and the sum of pixel intensities within the region.

In the text

	Fig. 10. Panels a–h: Difference maps between original and compressed images, labeled with corresponding panels from Fig. 9. Right side of difference maps: Pixel difference distribution of the difference maps, annotated with median and standard deviation. Black solid line indicates the median position, blue dashed lines represent the range of one standard deviation on either side of the median.
In the text

Fig. 11.

Comparison of algorithm-induced errors versus HRI_EUV temporal noise errors. Panel a: Intensity distributions extracted across the jet structure from both the original HRI_EUV image and the image with 137× compression ratio. Panel b: Intensity distributions extracted across the jet structure from both the original image and the image with 495× compression ratio. The intensity maxima in the cross-jet intensity distributions are marked for each image.

In the text

	Fig. A.1. Timeline distribution of FSI 174 A L1 data compression modes in EUI Data Release 6.0.
In the text

	Fig. B.1. Categories of six types of decorrelation modules. The Lorenzo predictor is used in SZ2, spline interpolation is adopted in SZ3, and wavelet transform is applied in SPERR.
In the text

Fig. B.2.

Level-wise anchor points based dynamic spline interpolation. In 2D data, there are two interpolation directions, and each interpolation step can utilize either a linear or cubic interpolation function. The algorithm automatically optimizes the selection process. It is evident that at the more critical level 2, a smaller error bound is assigned, which further enhances the accuracy of the interpolation prediction.

In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.

[1] Ainsworth, M., Tugluk, O., Whitney, B., & Klasky, S. 2019, SIAM J. Sci. Comput., 41, A2146 [Google Scholar]

[2] Antonucci, E., Romoli, M., Andretta, V., et al. 2020, A&A, 642, A10 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[3] Asai, A., Ishii, T. T., Isobe, H., et al. 2012, ApJ, 745, L18 [NASA ADS] [CrossRef] [Google Scholar]

[4] Baker, A. H., Xu, H., Dennis, J. M., et al. 2014, Proceedings of the 23rd International Symposium on High-performance Parallel and Distributed Computing, 203 [Google Scholar]

[5] Baker, A. H., Hammerling, D. M., Mickelson, S. A., et al. 2016, Geosci. Model Dev., 9, 4381 [Google Scholar]

[6] Baker, A. H., Xu, H., Hammerling, D. M., Li, S., & Clyne, J. P. 2017, High Performance Computing: ISC High Performance 2017 (Springer) [Google Scholar]

[7] Baker, A. H., Hammerling, D. M., & Turton, T. L. 2019, Comput. Gr. Forum, 38, 517 [Google Scholar]

[8] Berger, T. 2003, Wiley Encyclopedia of Telecommunications (Wiley Online Library) [Google Scholar]

[9] Berghmans, D., Antolin, P., Auchère, F., et al. 2023, A&A, 675, A110 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[10] Cappello, F., Di, S., Li, S., et al. 2019, Int. J. High Perform. Comput. Appl., 33, 1201 [Google Scholar]

[11] Chege, J., Koopmans, L., Offringa, A., et al. 2024, A&A, 692, A211 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[12] Chen, P. F. 2011, Liv. Rev. Sol. Phys., 8, 1 [Google Scholar]

[13] Di, S., & Cappello, F. 2016, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (IEEE), 730 [Google Scholar]

[14] Di, S., Liu, J., Zhao, K., et al. 2024, ArXiv e-prints [arXiv:2404.02840] [Google Scholar]

[15] Diffenderfer, J., Fox, A. L., Hittinger, J. A., Sanders, G., & Lindstrom, P. G. 2019, SIAM J. Sci. Comput., 41, A1867 [Google Scholar]

[16] Fischer, C. E., Müller, D., & De Moortel, I. 2017, Sol. Phys., 292, 16 [Google Scholar]

[17] Howard, R. A., Moses, J., Vourlidas, A., et al. 2008, Space Sci. Rev., 136, 67 [NASA ADS] [CrossRef] [Google Scholar]

[18] Jin, S., Grosset, P., Biwer, C. M., et al. 2020, 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (IEEE), 105 [Google Scholar]

[19] Jin, S., Pulido, J., Grosset, P., et al. 2021, Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing, 45 [Google Scholar]

[20] Joshi, R. 2021, Ph.D. Thesis, Kumaun University, India [Google Scholar]

[21] Kraaikamp, E., Gissot, S., Stegen, K., et al. 2023, SolO/EUI Data Release 6.0 2023-01 (Royal Observatory of Belgium (ROB)) [Google Scholar]

[22] Leung, Y. K., & Apperley, M. D. 1994, ACM Trans. Comput. Human Interact. (TOCHI), 1, 126 [Google Scholar]

[23] Li, S., Lindstrom, P., & Clyne, J. 2023, 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (IEEE), 1007 [Google Scholar]

[24] Liang, X., Di, S., Tao, D., et al. 2018a, 2018 IEEE International Conference on Big Data (Big Data) (IEEE), 438 [Google Scholar]

[25] Liang, X., Di, S., Tao, D., Chen, Z., & Cappello, F. 2018b, 2018 IEEE International Conference on Cluster Computing (CLUSTER) (IEEE), 179 [Google Scholar]

[26] Liang, X., Di, S., Li, S., et al. 2019, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1 [Google Scholar]

[27] Liang, X., Zhao, K., Di, S., et al. 2022, IEEE Trans. Big Data, 9, 485 [Google Scholar]

[28] Lindstrom, P. 2014, IEEE Trans. Vis. Comput. Gr., 20, 2674 [Google Scholar]

[29] Liu, J., Di, S., Zhao, K., et al. 2022, SC22: International Conference for High Performance Computing, Networking, Storage and Analysis (IEEE), 1 [Google Scholar]

[30] Liu, X., Liu, Y., Yang, L., et al. 2024a, PASP, 136, 075001 [Google Scholar]

[31] Liu, Z., Jiang, P., Zeng, F., Bian, H., & Toe, T. T. 2024b, 2024 4th International Conference on Computer Communication and Artificial Intelligence (CCAI) (IEEE), 58 [Google Scholar]

[32] Marirrodriga, C. G., Pacros, A., Strandmoe, S., et al. 2021, A&A, 646, A121 [CrossRef] [EDP Sciences] [Google Scholar]

[33] Müller, D., St. Cyr, O. C., Zouganelis, I., et al. 2020, A&A, 642, A1 [Google Scholar]

[34] Mumford, S., Freij, N., Christe, S., et al. 2020, J. Open Source Softw., 5, 1832 [NASA ADS] [CrossRef] [Google Scholar]

[35] Nicula, B., Berghmans, D., & Hochedez, J.-F. 2005, Sol. Phys., 228, 253 [NASA ADS] [CrossRef] [Google Scholar]

[36] Patel, H., Itwala, U., Rana, R., & Dangarwala, K. 2015, Int. J. Eng. Res. Technol., 4, 926 [Google Scholar]

[37] Peters, S. M., & Kitaeff, V. V. 2014, Astron. Comput., 6, 41 [Google Scholar]

[38] Poupat, J. L., & Vitulli, R. 2013, DASIA 2013 – DAta Systems in Aerospace, 720, 62 [Google Scholar]

[39] Pulido, J., Lukic, Z., Thorman, P., et al. 2019, J. Phys. Conf. Ser., 1290, 012008 [Google Scholar]

[40] Raouafi, N. E., Patsourakos, S., Pariat, E., et al. 2016, Space Sci. Rev., 201, 1 [Google Scholar]

[41] Rice, R., & Plaunt, J. 1971, IEEE Trans. Commun. Technol., 19, 889 [Google Scholar]

[42] Rochus, P., Auchere, F., Berghmans, D., et al. 2020, A&A, 642, A8 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[43] Seaton, D., Berghmans, D., Nicula, B., et al. 2013, Sol. Phys., 286, 43 [NASA ADS] [CrossRef] [Google Scholar]

[44] Shen, Y. 2021, Proc. Roy. Soc. London Ser. A, 477, 217 [NASA ADS] [Google Scholar]

[45] Shen, Y., Li, B., Chen, P., Zhou, X., & Liu, Y. 2020, Chin. Sci. Bull., 65, 3909 [NASA ADS] [CrossRef] [Google Scholar]

[46] Sterling, A. C., Panesar, N. K., & Moore, R. L. 2024, ApJ, 963, 4 [Google Scholar]

[47] Tan, S., Shen, Y., Zhou, X., et al. 2022, MNRAS, 516, L12 [Google Scholar]

[48] Tan, S., Shen, Y., Zhou, X., et al. 2023, MNRAS, 520, 3080 [CrossRef] [Google Scholar]

[49] Tao, D., Di, S., Chen, Z., & Cappello, F. 2017, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (IEEE), 1129 [Google Scholar]

[50] Tao, D., Di, S., Liang, X., Chen, Z., & Cappello, F. 2019, IEEE Trans. Parallel Distrib. Syst., 30, 1857 [Google Scholar]

[51] Taubman, D. S., & Marcellin, M. W. 2002, Proc. IEEE, 90, 1336 [Google Scholar]

[52] The SunPy Community (Barnes, W. T., et al.) 2020, ApJ, 890, 68 [Google Scholar]

[53] Vohl, D., Fluke, C. J., & Vernardos, G. 2015, Astron. Comput., 12, 200 [Google Scholar]

[54] Wallace, G. K. 1991, Commun. ACM, 34, 30 [Google Scholar]

[55] Warmuth, A. 2015, Liv. Rev. Sol. Phys., 12, 3 [Google Scholar]

[56] Webb, D. F., & Howard, T. A. 2012, Liv. Rev. Sol. Phys., 9, 3 [Google Scholar]

[57] Xie, H., West, R. A., Seignovert, B., et al. 2021, J. Astron. Telesc. Instrum. Syst., 7, 028002 [Google Scholar]

[58] Zafari, A., Khoshkhahtinat, A., Mehta, P. M., et al. 2022, 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA) (IEEE), 198 [Google Scholar]

[59] Zafari, A., Khoshkhahtinat, A., Grajeda, J. A., et al. 2023, IEEE Trans. Aerospace Electron. Syst., 60, 918 [Google Scholar]