A&A 459, 987-1000 (2006)
DOI: 10.1051/0004-6361:20054468

A wavelet analysis of CMB time-ordered data applied to Archeops

J. F. Macías-Pérez1 - A. Bourrachot2


1 - Laboratoire de Physique Subatomique et Cosmologie, 53 avenue des Martyrs, 38026 Grenoble Cedex, France
2 - Laboratoire de l'Accélérateur Linéaire BP 34, Campus Orsay, 91898 Orsay Cedex, France

Received 3 November 2005 / Accepted 3 April 2006

Abstract
We present an alternative analysis of CMB time-ordered data (TOD) using a wavelet-based representation of the data time-frequency plane. We demonstrate that the wavelet transform decorrelates 1/f-type Gaussian stationary noise and permits a simple and functional description of locally stationary processes. In particular, this makes it possible to generalize the classical algorithms of map making and CMB power-spectrum estimation to the case of locally stationary 1/f type noise. As an example, we present a wavelet-based algorithm for the destriping of CMB-like maps. In addition, we describe a wavelet-based analysis of the Archeops data including time-frequency visualization, wavelet destriping, and filtering of the TOD. These filtered data were used to produce polarized maps of Galactic dust diffuse emission. Finally, we describe the modeling of the non-stationarity on the Archeops noise for estimating the CMB power spectrum.

Key words: cosmology: cosmic microwave background

   
1 Introduction

After discovery by the COBE satellite (Smoot et al. 1992), the cosmic microwave background (CMB) temperature anisotropies have become one of the most powerful observational tests for cosmology. Indeed, recent measurements of their angular temperature power spectrum by ground-based experiments, such as DASI (Halverson et al. 2002), CBI (Pearson et al. 2002), and VSA (Dickinson et al. 2004) and by balloon-borne experiments such as BOOMERang (Netterfield et al. 2002), MAXIMA (Lee et al. 2001), and Archeops (Benoît et al. 2003b; Tristram et al. 2005) have provided strong constraints on the main cosmological parameters, such as the Hubble constant H0, the Universe total density  $\Omega_{0}$, matter density  $\Omega_{\rm m}$, and energy density  $\Omega_{\Lambda}$, the scalar $n_{\rm s}$, and tensor $n_{\rm t}$ spectral indexes associated to the power spectrum of the primordial fluctuations. More recently, the satellite mission WMAP[*] (Bennett et al. 2003) measured both the temperature CMB angular power spectrum and the cross correlation between temperature and polarization (Kogut et al. 2003), leading to high precision determination of the cosmological parameters. In the near future, the PLANCK satellite mission[*] will significantly improve the accuracy of measurements of the angular power spectrum of the CMB temperature anisotropies and will also measure the CMB polarization anisotropies. This will lead to a rather more selective test for distinguishing competing cosmological models and to constraints on the inflationary paradigm.

The analysis of the PLANCK satellite data will require both faster and more accurate algorithms than those currently available, to deal with such large data sets containing few 109 samples and to measure polarized fluctuation signals that are two or more orders of magnitude smaller than temperature fluctuations. One of the most important problems in the analysis of CMB data is characterizing the properties of the noise in the time-ordered data (TOD), exploiting them for optimal map making, and for reconstructing the angular power spectrum of the CMB temperature and polarization anisotropies. The extremely large size of the data sets makes it computationally unfeasible to directly invert the noise covariance matrix and even its storage (Borrill 1999). This leads to the common use of Fourier techniques (Doré et al. 2001; Hivon et al. 2002; Yvon et al. 2005), which are valid exclusively in the case of stationary and Gaussian noise for which the covariance matrix is Toeplitz, and it is diagonalized by the Fourier transform. In the case of PLANCK, observations will take place over two years, so it seems reasonable to expect that the TOD noise properties will change during the mission. Further, residuals from badly removed parasitic noises will also contribute to the non stationarity and non Gaussianity of the TOD noise.

New promising Fourier methods (Natoli et al. 2001) based on the concept of locally stationary noise have been presented to deal with the non stationarity problem. However, they are not adapted well to the case of residuals from parasitic signals, and they significantly reduce the number of data samples to be used in the analysis so that the overall instrument sensitivity also decreases. In addition, these new methods requires prior knowledges of the zones of local stationarity on the data, but no algorithm has been proposed yet to address this issue. In this paper, we describe a new approach to deal with non stationarity in CMB data sets. This approach is based on simultaneous analysis of the time and the frequency properties of noise using the wavelet and wavelet packet transforms. By contrast to Fourier-based methods, non stationarity is naturally described by the wavelet transform, so no cutting of the data is needed to properly define locally stationary processes. In addition, we discuss how wavelet techniques are very useful identifying systematics in the TOD, both visually and automatically. Application of these techniques to the Archeops data is presented going from the simple visual inspection for classifying the data to specifically designed algorithms to remove low-frequency components in the TOD and the remaining stripes on the Archeops maps.

Archeops[*] is a balloon-borne bolometer experiment that aimed at measuring the CMB temperature anisotropy on large and small angular scales. It provided the first determination of the $C_\ell$ spectrum from the COBE multipoles (Smoot et al. 1992) to the first acoustic peak (Benoît et al. 2003a; Tristram et al. 2005) from which it gave a precise determination of various cosmological parameters, such as the total density of the Universe and the baryon fraction (Benoît et al. 2003b). Archeops was also designed as a test bed for Planck-HFI, so it shared the technological design: a Gregorian off-axis telescope with a 1.5 m primary mirror, bolometers operating at 143, 217, 353, and 545 GHz, which are cooled down at 100 mK by a 3He/4He dilution designed to work at zero gravity, and a similar scanning strategy. Because of this, Archeops data are expected to have common features with Planck data with respect to random noise and systematics. A detailed description of the instrument and its performances can be found in Benoît et al. (2002).

The plan for this paper is the following. Section 2 introduces the wavelet and wavelet packet transforms. Section 3 discusses the properties of the wavelet transform of stationary Gaussian process. Section 4 deals with non stationary noise and, in particular, locally stationary noise. Section 5 discusses a wavelet-based destriping algorithm. Finally, Sect. 6 describes a wavelet analysis of the Archeops TOD.

2 The wavelet and wavelet-packet transforms

In this section we give a brief overview of the discrete wavelet and wavelet packet transforms that will be used extensively in the following sections.

   
2.1 The discrete wavelet transform

The discrete wavelet transform (DWT) is an orthonormal transform  $\mathcal{W}$ that takes a time series - TOD - $\vec{X}= \left\lbrace X_{0}, X_{1}, \ \ldots, \ X_{N-1}\right\rbrace^{T}$ and yields a vector of N DWT coefficients

 \begin{displaymath}%
\vec{W} \equiv \mathcal{W}\vec{X} =
\left\lbrace \vec{W}_{1...
...}_{2}^{T},\ldots,\vec{W}_{J}^{T},\vec{V}_{J}^{T}\right\rbrace.
\end{displaymath} (1)

The sub-vector $\vec{W}_{j}$ contains $N_{j} = \frac{N}{2^{j}}$ wavelet coefficients associated with scale $\tau_{j} \equiv 2^{j-1}$, whereas  $\vec{V}_{J}$ contains $\frac{N}{2^{J}}$ scaling coefficients associated with scale $\lambda_{J} \equiv 2^{J}$. These scaling coefficients are needed to compensate for the truncation of the DWT at level J. As discussed later in this section, they account for the smooth component of the time series that is left over by the wavelet coefficients. Indeed, for $J=J_{\rm max} \equiv \lg_{2} N$ then, $\vec{V}_{J_{\rm max}}$ is composed of a single scaling coefficient that represents the mean value of the time series. The orthonormality condition $\mathcal{W}^{T}\mathcal{W}= I_{N}$ implies that the inverse DWT is given by

 \begin{displaymath}%
\vec{X} \equiv \mathcal{W}^{T}\vec{W}=
\left\lbrace X_{0}, X_{1}, \ \ldots, \ X_{N-1}\right\rbrace^{T}.
\end{displaymath} (2)

We can formally describe the DWT in terms of wavelet and scaling filters as follows. Assume $\vec{h}_{1} = \left\lbrace h_{1,0},\ldots, h_{1,L-1},0,\ldots,0\right\rbrace^{T}$ to be a vector of length N whose first L < N elements are the unit wavelet filter coefficients for a given compactly supported wavelet (Daubechies 1992) and with the discrete Fourier transform (DFT) $\left\lbrace H_{1,k}, k=0,\ldots, N-1 \right\rbrace$. Then $\vec{g}_{1} = \left\lbrace g_{1,0},\ldots, g_{1,L-1},0,\ldots,0\right\rbrace^{T}$ is a vector of length N containing the zero-padded scaling filter coefficients for unit level, defined via $g_{1,l} = \left(-1\right)^{l+1} \ h_{1,L-1-l}$ for $l=0, \ldots, L-1$ with DFT $\left\lbrace G_{1,k}, k=0,\ldots, N-1 \right\rbrace$. To obtain the level 1 wavelet and scaling coefficients, $\vec{X}$ is filtered using h1 and g1 and then resampled by 2

\begin{eqnarray*}&& W_{1,n} = \sum\limits_{l=0}^{\min(L,N)-1}\ \ h_{1,l} \ X_{2(...
...=0}^{\min(L,N)-1}\ \ g_{1,l} \ X_{2(n+1)-1-l \ {{\rm mod}} \ n}
\end{eqnarray*}


where $n=0,\ldots,N/2-1$ and $\vec{h}_{1}$ and  $\vec{g}_{1}$ can be considered as high pass and low pass filters, respectively. Therefore, the wavelet coefficients  $\vec{W}_{1}$ represents the coarse component of X, and the scaling coefficients  $\vec{V}_{1}$ the smooth part of X. To obtain the level 2 wavelet  $\vec{W}_{2}$ and scaling  $\vec{V}_{2}$coefficients, $\vec{V}_{1}$ is high pass and low pass filtered as before using h1 and g1 and then subsampled by 2. Finally, at level j, $\vec{W}_{j}$ and  $\vec{V}_{j}$are given by subsampling by 2 the high pass and low pass filtering of  $\vec{V}_{j-1}$ using h1 and g1. Here, the time series is considered to be circular; as a consequence, the first L-2 coefficients at each level j are affected by boundary conditions, so their statistical properties differ from those unaffected by circularity. This algorithm is known as the DWT pyramid algorithm (Mallat 1989).

It is convenient to define equivalent $\vec{h}_{j}$ and $\vec{g}_{j}$ filters with Lj=(2j-1)(L-1)+1 non zero elements such that

 
$\displaystyle %
W_{j,n} = \sum\limits_{l=0}^{\min(L_{j},N)-1}\ \ h_{j,l} \ X_{2^{j}(n+1)-1-l \ {{\rm mod}} \ n}$      
$\displaystyle V_{j,n} = \sum\limits_{l=0}^{\min(L_{j},N)-1}\ \ g_{j,l} \ X_{2^{j}(n+1)-1-l \ {{\rm mod}} \ n}$     (3)

for $n=0,\ldots,N_{j}-1$. These are the inverse DFT of

\begin{displaymath}%
H_{j,k}=H_{1,2^{j-1}k \ {{\rm mod}} \ N \ } \ \prod_{l=0}^{j-2} G_{1,2^{l}k \ {{\rm mod}} \ N }, \ \ k=0, \ldots, N-1
\end{displaymath} (4)

and

\begin{displaymath}%
G_{j,k} = \prod_{l=0}^{j-1} G_{1,2^{l}k \ {{\rm mod}} \ N}, \ \ k=0, \ldots, N-1.
\end{displaymath} (5)

This equivalent filter representation permits a clear understanding of the overall filtering applied to the time series to obtain the wavelet and scaling coefficients at level j. Within this framework it is obvious that the coarser components of X are represented by the low j wavelet coefficients and the smoother ones by the large j wavelet coefficients and the scaling coefficients. The DWT leads to a dyadic sampling in frequency so that the wavelet coefficients are obtained by band-pass filtering of X in the frequency interval ${\mathcal{I}}{j} \equiv
\frac{1}{2^{j+1}} \leq \left\vert f\right\vert \leq \frac{1}{2^{j}}$for unity sampling frequency.

Finally, the DWT can also be represented as the decomposition of the time series into an orthonormal base

\begin{displaymath}%
X(t_{m}) = \sum\limits_{j=1}^{J} \ \sum\limits_{k=0}^{N/2^{...
...
\sum\limits_{k=0}^{N/2^{J}-1} V_{J,k} \ \phi_{j,k} (t_{m});
\end{displaymath} (6)

therefore, the total energy is conserved as

\begin{displaymath}%
\left\Vert \vec{X} \right\Vert^{2} = \sum\limits_{j=1}^{J} ...
...t\vec{W}_{j}\right\Vert^2 + \left\Vert\vec{V}_{J}\right\Vert^2
\end{displaymath} (7)

where $\psi_{j,k}$ and $\phi_{j,k}$ are the wavelet and scaling functions, respectively.

   
2.2 Wavelet bases

This paper concentrates on compactly supported orthogonal wavelet bases, such as the Haar, Daubechies Coifflets, least asymmetric (also called symmlets), and best localized wavelets, which are extensively used in the statistical community (see for example Daubechies 1992; Percival & Walden 2000; Vidakovic 1999).

Each wavelet family is defined as fulfilling a particular requirement like for example the symmetry of the wavelet function in the case of the least asymmetric family. For each family, the width, L, of the scaling and wavelet filters is a variable parameter, and as discussed by Lai 1995, it is directly related to the shape of the low-pass and high-pass equivalent filters. In particular, the larger L, the closer the low and high pass filters will be to a perfect low pass and high pass filter, respectively. This will be discussed in more detail in the following sections.

   
2.3 The maximal overlap DWT

The maximal overlap DWT (MODWT), also known as the non-decimated DWT or the stationary DWT, is a particularly interesting generalization of the DWT. It allows us to keep the original sampling in time of the time series at each level j of the wavelet transform. In addition, the MODWT is invariant with respect to time shifting, while the DWT is not. These two properties become important when studying the time evolution properties of the timeline.

From a mathematical point of view, level J0MODWT of a time series Xt of N data samples is a transform consisting of J0+1 vectors $\widetilde{{\vec{W}}}_{1}, \ldots,\widetilde{{\vec{W}}}_{J_{0}}$ and $\widetilde{{\vec{V}}}_{J_{0}}$, all of which have dimension N. The vector  $\widetilde{{\vec{W}}}_{j}$ contains the MODWT wavelet coefficients associated with changes on the scale $\tau_{j} \equiv 2^{j-1}$ while the $\widetilde{{\vec{V}}}_{J_{0}}$ contains the MODWT scaling coefficients associated with averages on the scales $\lambda_{J_{0}} \equiv 2^{J_{0}}$.

As for the DWT, $\widetilde{{\vec{W}}}_{j}$ and $\widetilde{{\vec{V}}}_{j}$ can be calculated by filtering Xt, namely,

 
    $\displaystyle \widetilde{{\vec{W}}}_{j} \equiv \sum_{l=0}^{L_{j}-1} \ {\widetilde{h}}_{j,l} X_{t-l \ {\rm mod} \ N}$  
    $\displaystyle \widetilde{{\vec{V}}}_{j} \equiv \sum_{l=0}^{L_{j}-1} \ {\widetilde{g}}_{j,l} X_{t-l \ {\rm mod} \ N}$ (8)

$t=0, \ldots, N-1$, where $\left\lbrace {\widetilde{h}}_{j,l} \right\rbrace$ and $\left\lbrace {\widetilde{h}}_{j,l} \right\rbrace$ are the jth level MODWT wavelet and scaling filters. These filters are defined in terms of the DWT wavelet and scaling filters as ${\widetilde{h}}_{j,l} = h_{j,l}/2^{\frac{j}{2}}$ and ${\widetilde{g}}_{j,l} = g_{j,l}/2^{\frac{j}{2}}$ and have width $L_{j} \equiv (2^{j}-1)(L-1)+1$. In practice the wavelet and scaling coefficients are not calculated from Eq. (8) but via a pyramidal algorithm similar to the DWT (Percival & Walden 2000; Vidakovic 1999). Note that the first ${\rm min}\left\lbrace L_{j}-2,N-1 \right\rbrace$ elements in both $\widetilde{{\vec{W}}}_{j}$ and $\widetilde{{\vec{V}}}_{j}$ are affected by circularity so are boundary coefficients.

Unlike the DWT, the MODWT is not an orthonormal transform because at each level j the components of both $\widetilde{{\vec{W}}}_{j}$ and $\widetilde{{\vec{V}}}_{j}$ are not independent. Nevertheless, the MODWT is capable of producing a multi-resolution analysis of the data such that

\begin{displaymath}%
\left\Vert \vec{X} \right\Vert^{2} =
\sum\limits_{j=1}^{J}...
...\Vert^2 +
\left\Vert {\widetilde{\vec{V}}}_{J} \right\Vert^2.
\end{displaymath} (9)

Finally, the MODWT can also be expressed as matrix operations such that $\widetilde{{\vec{W}}}_{j} = \widetilde{{\mathcal{W}}}_{j} \vec{X}$ and $\widetilde{{\vec{V}}}_{j} = \widetilde{{\mathcal{V}}}_{j} \vec{X}$. The time series $\vec{X}$can then be recovered from

\begin{eqnarray*}\vec{X} = \sum_{j=1}^{J} \widetilde{{\mathcal{W}}}_{j}^{T} \wid...
...+
\widetilde{{\mathcal{V}}}_{J}^{T} \widetilde{{\vec{V}}}_{J}.
\end{eqnarray*}


Typically, the MODWT is used to represent the time evolution of the time-series power for each scale in the analysis. As the MODWT preserves the original time sampling for all scales, this representation is very handy and makes it easy to compare the time evolution of very different time scales. For example in Sect. 4.2, we use the MODWT to estimate the time evolution function of the power spectrum of a time-modulated stationary process. However, the use of this representation to infer the statistical properties of the data is delicate, as the MODWT coefficients for a given scale are artificially correlated in time by the MODWT transform itself.

   
2.4 The discrete wavelet packet transform

The wavelet transform as defined above is just a representation of temporal signals at different scales j that correspond to the frequency intervals $\left\lbrace \frac{1}{2^{j+1}},\frac{1}{2^{j}}\right\rbrace$. Although extremely useful and powerful, this representation is unusual from the Fourier point of view. In this respect, it is interesting to define the discrete wavelet packet transform, DWPT, by generalizing the DWT, so that we can obtain an equipartition of the frequency domain and of the time domain similar to that given by a Windowed Fourier Transform (WFT). In simple words, the DWPT is obtained at each level j by high pass filtering with  $\left\lbrace h_{1,l} \right\rbrace$ and low pass with  $\left\lbrace g_{1,l} \right\rbrace$ the smooth component  $\vec{V}_{j-1}$ in the DWT, but also high-pass and low-pass filtering the coarse component  ${\vec{W}}_{j-1}$.

The level j DWPT of a N=2J dimensional vector $\vec{X}$is an orthonormal transform yielding an N dimensional vector of coefficients that can be partitioned as

\begin{eqnarray*}\left\lbrace {\vec{W}}_{j,n}, \ \ n=0,\ldots,2^{j}-1 \right\rbrace
\end{eqnarray*}


where each ${\vec{W}}_{j,n}$ has dimension Nj=2J-j and is nominally associated with the frequency interval ${\mathcal{I}}_{j,n} \equiv \left\lbrace \frac{n}{2^{j+1}} \frac{n+1}{2^{j+1}}\right\rbrace$. Note that we define ${\vec{W}}_{0,0}=\vec{X}$ following previous definitions. Together these 2j vectors divide the frequency interval $\left\lbrack 0,\frac{1}{2}\right\rbrack$ into 2j intervals of equal width  $\frac{1}{2^{j+1}}$. This leads to an equipartition both in time and frequency of the time-frequency plane similar to that produced by the WFT without explicit time partitioning.

After defining Wj,n,t as the th element of ${\vec{W}}_{j,n}$, we can write

 \begin{displaymath}%
W_{j,n,t}=\sum_{l=0}^{L-1} u_{n,l} \ W_{j-1, \lfloor \frac{n}{2}\rfloor, 2t+1-l
\ {{\rm mod}} \ N_{j-1}}
\end{displaymath} (10)

for $t=0,\ \ldots, \ N_{j}-1$ and where

\begin{displaymath}%
u_{n,l} \equiv \left\lbrace
\begin{array}{l}
g_{1,l}, {\rm ...
... {\rm mod} \ \ 4 =1 \ \ {\rm or} \ \ 2; \\
\end{array}\right.
\end{displaymath} (11)

and $\lfloor \frac{n}{2}\rfloor$ is $\frac{n}{2}$ if n is even, otherwise it is $\frac{n-1}{2}$.

Since the DWPT is an orthonormal transform, we can use it to partition the energy in $\vec{X}$ via

\begin{eqnarray*}\left\Vert \vec{X} \right\Vert^{2} = \sum\limits_{n=0}^{2^{j}-1} \left\Vert\ {\vec{W}}_{j,n} \right\Vert^2
\end{eqnarray*}


where $\left\Vert\ {\vec{W}}_{j,n} \right\Vert^2$ can be interpreted as the contribution to the energy by frequencies in the band ${\mathcal{I}}_{j,n}$. As the above relationship is valid for any given level j in the interval $ 1 \leq j \leq J$, we can consider that the DWPT for $ 1 \leq j \leq J$ is an overcomplete representation of $\vec{X}$. Furthermore, the basis functions associated to it constitute a sort dictionary that is commonly used for representing functions via matching pursuit techniques (Mallat & Zhang 1993).

   
3 Wavelet analysis of Gaussian stationary noise

The analysis of stationary Gaussian processes is traditionally performed in the Fourier domain because of the decorrelation properties of the Fourier transform. Indeed, the covariance matrix, $\hat{C}_{X}$, of stationary Gaussian distributed noise, Xt, is diagonal in Fourier space and completely described by the noise power spectrum, PX,

\begin{displaymath}%
\hat{C}_{X}(f,f^{\prime}) = {\rm diag} \left\lbrace P_{X} (f) \right\rbrace.
\end{displaymath} (12)

This property makes it trivial to calculate the inverse of the covariance matrix and from the point of view of CMB analysis, it considerably eases tasks like map making and angular power spectrum estimation. However, this feature cannot be extrapolated to non stationary Gaussian noise. In this section, we show how the DWT can also decorrelate Gaussian stationary noise. Therefore, we can reasonably expect it will also decorrelate non stationary Gaussian noise, based on the localization properties of the DWT, as discussed in the following sections.

3.1 Correlation between wavelet coefficients

Using Eq. (3) and assuming a stationary process Xt with auto-covariance

\begin{displaymath}%
s_{X,\tau} \equiv \left < X_{t},X_{t,\tau}\right >
\end{displaymath} (13)

that depends only on $\tau$, we can write the correlation between non boundary wavelet coefficients as follows
$\displaystyle {\rm cov} \left\lbrace W_{j,t}, \ W_{j^{\prime},t^{\prime}} \righ...
...rime},l^{\prime}} \
s_{X,2^{j}(t+1)-l-2^{j^{\prime}}(t^{\prime}+1)+l^{\prime}},$     (14)

which using the DFT transforms into
 
$\displaystyle {\rm cov} \left\lbrace W_{j,t}\ W_{j^{\prime},t^{\prime}} \right\...
...eft( t+1\right)\right)}
H_{j}(f) \ H_{j^{\prime}}^{*}(f)
\ S_{X} (f) \ {\rm d}f$     (15)

where SX (f) is the spectral density function of Xt. Under the assumption of nominal band pass filters, Hj(f) and  $H_{j^{\prime}}(f)$ have band-passes given by  ${\mathcal{I}}_{j}$ and  ${\mathcal{I}}_{j^{\prime}}$, which for  $j \neq j^{\prime}$ do not overlap, and thus the correlation of wavelet coefficients across scales should be negligible for a Gaussian stationary process.

Furthermore, when $j=j^{\prime}$ and $t^{\prime}=t+\tau$ the correlation between wavelet coefficients reads

 \begin{displaymath}%
C_{W} (j, \tau) = \left < W_{j,t}\ W_{j,t+\tau} \right > = ...
...{j+1}\pi f \tau }
{\mathcal{H}}_{j}(f) \ S_{X} (f) \ {\rm d}f.
\end{displaymath} (16)

Once again under the approximation of nominal band-pass filters, we can argue that the above should be approximately zero for $\tau \neq 0$, when SX(f) is approximately constant in the interval  ${\mathcal{I}}_{j}$. In other words, at each level j the wavelet coefficients  $\vec{W}_{j}$ behave as uncorrelated white noise the same way Fourier coefficients do.

As discussed above, the transfer function of the wavelet filter  ${\mathcal{H}}_{j} = H_{j} \ H_{j}^{*}$ at each level j can be well approximated by a nominal band-pass filter taking the form

\begin{displaymath}%
{\mathcal{H}}_{j}(f) = \left\lbrace
\begin{array}{ll}
2^{j...
... \frac{1}{2^{j}} \\
0 & {\rm otherwise}.
\end{array} \right.
\end{displaymath} (17)

Within a particular wavelet family, this approximation will be more accurate for large filter widths, L; however, the number of boundary coefficients $n_{\rm b} = L-2$ will be more important. In practice, it is always possible to find a relatively low value of L for which the wavelet filter nearly acts like a perfect high pass on the data, such that the decorrelation between wavelet coefficients takes place.

In summary, for a stationary Gaussian process Xt with spectral density function SX (f) and under the nominal band-pass filter approximation, the wavelet coefficients at scale j are uncorrelated with variance

 
                         Cj = $\displaystyle {\rm var} \left\lbrace {\vec{W}}_{j} \right\rbrace =
\int_{\frac{-1}{2}}^{\frac{1}{2}} {\mathcal{H}}_{j}(f) \ S_{X}(f) \ {\rm d}f$  
  $\textstyle \simeq$ $\displaystyle \frac{1}{\frac{1}{2^{j}}-\frac{1}{2^{j+1}}} \int_{\frac{1}{2^{j+1}}}^
{\frac{1}{2^{j}}} S_{X}(f)\ {\rm d}f.$ (18)

We have ignored the scaling coefficients. However, we can also calculate their variance from the rms of the timeline so that

 \begin{displaymath}%
C_{J+1}= {\rm var} \left\lbrace {\vec{V}}_{J}\right\rbrace ...
...\lbrace {\vec{W}}_{j} \right\rbrace}{2^{j}} \right\rbrace\cdot
\end{displaymath} (19)

3.2 Wavelet description of 1/f-type noise

For most CMB data sets, the noise associated to each detector timeline generally corresponds to a stationary long memory random process also known as 1/f-type noise. This kind of process presents long term correlations in time that show up in the Fourier domain as a strong increase in power with decreasing frequency. The specific characteristics of the noise depend closely on the detector itself as well as on the operational conditions like the temperature of the focal plane bath, the current and intensity applied to the detector, the electronics, etc. In general, the noise properties of any given detector are badly known and directly derived from the data themselves. For large data sets, as would be the case for the PLANCK satellite surveyor, this task can be extremely expensive in computation and can require fast techniques and algorithms. In this section we show how the wavelet transform can be used to represent and study 1/f-type noise based on the previous results. The DWT has two main advantages with respect to the DFT. First, from the computational point of view, it is faster and presents no limits to the number of samples to be used as the DFT do; and second, the wavelet representation of the stationary process is more compact in terms of number of coefficients.

The simplest 1/f-type noise is the white noise that is completely uncorrelated in the time domain and presents a flat spectral density function of the form

 \begin{displaymath}%
S^{WN} (f) = \sigma^{2}.
\end{displaymath} (20)

Under the hypothesis of nominal band-pass wavelet filters and using Eq. (18), the variance of the white noise wavelet coefficients at scale j is given by

\begin{displaymath}%
C^{WN}_{j} = \sigma^{2}.
\end{displaymath} (21)

Using Eq. (16) we can show that the white noise wavelet coefficients are fully uncorrelated at each scale j

\begin{displaymath}%
C_{W}^{WN} (j,\tau) = \left\lbrace
\begin{array}{ll}
C^{...
...f \ \tau = 0 \\
0 & {\rm otherwise}.
\end{array}
\right.
\end{displaymath} (22)

As the white noise wavelet coefficients are also uncorrelated between different scales j and  $j^{\prime}$, we can state that the wavelet transform of a white noise is also a white noise of the same spectral density function.


  \begin{figure}
\par\mbox{\epsfig{file=Figures/4468f1.ps,height=5cm,width=9cm} \e...
...,width=9cm} \epsfig{file=Figures/4468f6.ps,height=5cm,width=9cm} }
\end{figure} Figure 1: From left to right and from top to bottom, input power spectrum of simulated 1/f-type noise, histogram of the simulated noise wavelet coefficients and their expected values at scale j=1, correlation of the wavelet coefficients and their expected values for scale j=1 and j=10 and cross correlation of the wavelet coefficients at j=1 and j=10 with scales j=2 and j=11 respectively.
Open with DEXTER


  \begin{figure}
\par\mbox{\includegraphics[height=4.25cm,width=7.4cm,clip]{Figure...
...ludegraphics[height=4.25cm,width=7.65cm,clip]{Figures/4468f8.ps} }
\end{figure} Figure 2: Left panel: simulated 1/f-type noise using a wavelet algorithm. Right panel: power spectrum of the wavelet simulated 1/f-type noise compared to the input power spectrum in red.
Open with DEXTER

In general, the spectral density function of 1/f-type noise is represented as

 \begin{displaymath}%
S_{X} (f) = \sigma^{2} \left( 1 + {\left( \frac{f_{{\rm knee}}}{f} \right)}^{\alpha} \right)
\end{displaymath} (23)

where $f_{\rm knee}$ is known as the knee frequency. For $f > f_{\rm knee}$ the spectral density function is dominated by the white noise term that is parametrized by $\sigma$. In contrast, the 1/f dominates for $f < f_{\rm knee}$. From Eq. (18) the variance of the wavelet coefficients at level j for 1/f-type noise is given by

\begin{displaymath}%
C^{1/f}_{j} = \left\lbrace
\begin{array}{ll}
\sigma^{2} \...
... (j+1)} \right] & {\rm if}\ \alpha > 1.
\end{array}
\right.
\end{displaymath} (24)

Then using Eq. (16), the correlation between wavelet coefficients is given by
    $\displaystyle C_{W}^{1/f}(j,\tau) = C_{W}^{WN} (\tau) + \sigma^{2} \ 2^{(j+1)} \ f_{\rm knee}^{\alpha} \ \ \ldots$  
    $\displaystyle \qquad \ldots \ \int_{2^{-j-1}}^{2^{-j}} \cos \left(2^{j+1} \pi f \tau \right) \ f^{-\alpha} \ {\rm d}f,$ (25)

which is largely dominated by the white noise term so that the wavelet coefficients are quasi-uncorrelated. This equation is only valid for nominal band-pass filters. However, from a practical point of view, increasing the number of non-zero coefficients in the wavelet filter will flatten the 1/f term in the integral leading to the desired level of decorrelation between wavelet coefficients.

To test the above statements we simulated a TOD of 1/f-type noise with 215 samples. Figure 1 shows the input power spectrum of simulated 1/f-type noise, the histogram of the simulated noise wavelet coefficients at scale j=1, the correlation of the wavelet coefficients and their expected values for scale j=1 and j=10, and the cross correlation of the wavelet coefficients and their expected values at j=1 and j=10 with scales j=2 and j=11. We observe that the wavelet coefficients show a Gaussian structure. Likewise, they are quasi-decorrelated as expected within the same scale and totally uncorrelated between scales.

   
3.3 Simulation of Gaussian stationary noise using wavelets

From the above we conclude that, under the nominal band-pass filter approximation, the DWT decorrelates Gaussian stationary processes. Thus it is possible to simulate Gaussian stationary processes via its DWT (Percival et al. 2000; Vidakovic 1999) in the same way it is performed using the DFT. Indeed we can easily simulate stationary and Gaussian time series Xt of length N=2J and spectral density function SX(f) by performing the following steps

1.
Using a random-number generator, generate a vector  ${\vec{Z}}_{M}$ containing M=4N Gaussian white noise deviates with zero mean and unit variance.

2.
Integrate numerically Eq. (18) to calculate the approximate wavelet coefficient variances $C_{j}, \ \ j=1, \ \ldots,\ J+2$ and use Eq. (19) to calculate the scaling coefficient variances.

3.
Multiply the first M/2 elements of ${\vec{Z}}_{M}$ by  $\sqrt{C_{1}}$, the next M/4 values by  $\sqrt{C_{2}}$, and so forth, until to the final four elements, which need to be multiply by  $\sqrt{C_{J+1}}$, $\sqrt{C_{J+1}}$, $\sqrt{C_{J+2}}$, $\sqrt{C_{J+3}}$.

4.
Compute the inverse DWT of the modified ${\vec{Z}}_{M}$ vector and then randomly choose M samples from it.
The above recipe can be easily understood in terms of the wavelet covariance matrix of a Gaussian stationary time series with N=2J samples, which due to the decorrelation property is given by

 \begin{displaymath}%
\begin{array}{l}
\Sigma_{{\vec{W}}} =
{\rm diag} \ \lbrac...
...m of} \ {\rm these}}, C_{J}, C_{J+1}
\ \rbrace. \\
\end{array}\end{displaymath} (26)

The left panel of Fig. 2 shows a simulation of 1/f-type noise obtained from the algorithm described above. In the right panel, we represent the power spectrum of the simulated data, which agrees with the input power spectrum.

   
4 Locally stationary Gaussian noise


  \begin{figure}
\par\mbox{\includegraphics[height=4.25cm,width=7.2cm,clip]{Figure...
...ludegraphics[height=4.25cm,width=7.2cm,clip]{Figures/4468f10.ps} }
\end{figure} Figure 3: From left to right, locally stationary Gaussian noise obtained from two white noises of variances 1 and 25, and its wavelet transform with wavelet coefficients ordered from left to right for decreasing scale.
Open with DEXTER

Locally stationary processes appear in many physical systems in which the mechanisms that produce random fluctuations change slowly in time. Over given time intervals, such processes can be approximated by a stationary one. This is the case, for example, for the variations observed in the total noise power of some CMB detectors due to fluctuations in the focal plane temperature as discussed in Sect. 6.1. From the Fourier point of view we can imagine the variation in the data power spectrum with time. We could also define locally-stationary Gaussian processes as those for which the time-frequency plane is divided into time intervals each corresponding to a stationary process and all of which are uncorrelated between them. Notice that the above definitions of locally-stationary processes are very general and, within the limit of infinitesimal time intervals, they lead to non-stationarity in a more general manner.

4.1 Fourier approximation to the locally-stationary process

For a zero-mean random process ${\vec{X}}_{t}$, we can define its time evolving auto-covariance as

\begin{displaymath}%
s_{{\vec{X}}}(t,t^{\prime})= \left < {\vec{X}}(t), \ {\vec{X}}(t^{\prime}) \right >,
\end{displaymath} (27)

which can also be expressed in terms of the distance between t and  $t^{\prime}$ and the mid-point between them:

\begin{displaymath}%
s_{{\vec{X}}}(t,t^{\prime})= C \left(\frac{t+t^{\prime}}{2},t-t^{\prime}\right).
\end{displaymath} (28)

Note that in this definition the covariance of a stationary process reads as

\begin{displaymath}%
C \left(\frac{t+t^{\prime}}{2},t-t^{\prime}\right) = C (t-t^{\prime}) = C (\tau).
\end{displaymath} (29)

For a locally stationary process, we expect that in the neighborhood of any $x \in \mathcal{R}$, there exists an interval of size l(x) where the process can be approximated by a stationary one. Therefore for any $\frac{t+t^{\prime}}{2} \in \left\lbrack x-\frac{l(x)}{2},
x+\frac{l(x)}{2}\right\rbrack$

\begin{displaymath}%
C \left(\frac{t+t^{\prime}}{2},t-t^{\prime}\right) \approx C (x,t-t^{\prime}).
\end{displaymath} (30)

If t and $t^{\prime}$ are far apart and do not belong to the same interval of stationarity, then we expect that X(t) and  $X(t^{\prime})$ are uncorrelated. This means that for any $u \in \left\lbrack x-\frac{l(x)}{2}, x+\frac{l(x)}{2}\right\rbrack$, if $\left\vert v \right\vert > \frac{l(x)}{2}$, then

\begin{displaymath}%
C (u,v) = \left < X\left(u+\frac{v}{2}\right), X\left(u-\frac{v}{2}\right)\right > \approx 0.
\end{displaymath} (31)

In the time frequency-plane we can define a time-varying spectrum  $\Lambda(t,w)$ which would be given by the Fourier transform of the time-varying covariance $C (t+\frac{v}{2},t-\frac{v}{2})$ with respect to v. For stationary processes  $\Lambda(t,w)$ does not depend on time and corresponds to the standard power spectrum. For locally-stationary processes (Mallat et al. 1998) the spectrum varies with time but is constant in time within the stationarity intervals defined above. This means that the same way the Fourier transform decorrelates Gaussian stationary processes, the windowed Fourier transform decorrelates locally stationary processes. In other words, the covariance matrix of locally-stationary Gaussian noise is diagonal in an orthogonal base given by performing a windowed Fourier transform in each stationary interval (Mallat et al. 1998). This base divides the time-frequency plane into rectangles of the form

\begin{eqnarray*}\left\lbrack x-\frac{l(x)}{2},x+\frac{l(x)}{2}\right\rbrack \ \...
... f_{k}-\frac{1}{2 \ l(x)},f_{k}+\frac{1}{2 \ l(x)}\right\rbrack
\end{eqnarray*}


where $f_{k} = \frac{k+ \frac{1}{2}}{l(x)}, \ \ k=1,2,\ldots, \frac{f^{\rm samp}}{2 \ l(x)}$ and $f^{\rm samp}$ is the sampling frequency. Note that the DWPT produces a similar decomposition of the time-frequency plane and can also be used to obtain a diagonal base for the covariance matrix of stationary Gaussian process  ${\vec{X}}_{t}$. For this purpose, best-basis algorithms combining the DWPT at all available scales j have been developed (Coifman & Wickerhauser 1992; Percival & Walden 2000; Vidakovic 1999). We will not account for them in this paper.

In the previous section we have shown that the DWT decorrelates Gaussian stationary processes under the approximation of nominal band-pass wavelet and scaling filters. Therefore, the DWT will also decorrelate a locally-stationary Gaussian process that can be considered as a series of attached stationary Gaussian processes that are uncorrelated between them. The same way we defined a time varying power spectrum above, we can also consider the evolution in time of the variance of the wavelet coefficients at each level j. As wavelet coefficients are localized in time, this evolution can be computed naturally from the DWT of the data. Thus, for Gaussian locally-stationary processes, we expect the variance of the wavelet coefficients to change from one stationary interval to another and to be constant within a given interval. This statement can be fully understood by considering Eq. (18) and accounting for the time evolution of the data power spectrum. To clarify this issue we can use Eq. (26) to write the covariance matrix of the wavelet coefficients of a Gaussian locally-stationary process composed of two stationary time intervals

 
                                   $\displaystyle \Sigma_{{\vec{W}}} =
{\rm diag}\ \big \lbrace
\!\!\underbrace{C^{...
...underbrace{C^{1}_{2},C^{1}_{2}}_{\frac{N}{8} \ {\rm o}f \ {\rm these}}, \ldots,$  
    $\displaystyle \qquad \underbrace{C^{2}_{2},C^{2}_{2}}_{\frac{N}{8} \ {\rm of} \...
...race{C^{1}_{j},C^{1}_{j}}_{\frac{N}{2^{j+1}} \ {\rm of} \ {\rm these}}, \ldots,$  
    $\displaystyle \qquad \underbrace{C^{2}_{j},C^{2}_{j}}_{\frac{N}{2^{j+1}} \ {\rm...
...underbrace{C^{2}_{J-1}}_{1 \ {\rm of} \ {\rm these}}, C_{J}, C_{J+1}\big\rbrace$ (32)

where the indexes 1 and 2 refer to each of the stationary time intervals, respectively. In summary, the DWT decorrelates Gaussian locally-stationary processes and naturally permits identification of the distinct intervals of stationarity in the data using the time evolution of the wavelet coefficient variance.

For illustration, in the left plot of Fig. 3, we show an example of a locally stationary noise given by the superposition in time of two white noises of variances 1 and 25. Its wavelet transform is traced in the right plot with the wavelet coefficients ordered from left to right for decreasing scale. We can clearly observe two distinct parts for each scale corresponding to the two white noises.

   
4.2 Time modulated-stationary processes

Often, the noise total power of CMB instrument detectors changes with time due to slow fluctuations in the temperature of the focal plane bath or drifts in the background power. In general, this leads to locally-stationary Gaussian noise for which the power spectrum varies during the observation period but in the same way for all frequency components. Indeed, the noise total power increases and/or decreases with time, but the shape of the power spectrum remains the same throughout the observation period. These processes can be approximated well by a stationary Gaussian process modulated in time (Fryzlewicz et al. 2002) which is defined by

 \begin{displaymath}%
X_{t} = \sigma(t) \ \times \ Y_{t}
\end{displaymath} (33)

where Yt is a Gaussian stationary process and $\sigma (t)$ is a function of time. From a more physical point of view, $\sigma (t)$ represents the change in the noise total power. For convenience we consider that the variance function  $\sigma^{2}(t)$ is defined on (0,1) by assuming Yt has the time-constant power spectrum of Xt as power spectrum. Finally, notice that time-modulated processes are just a particular case of locally-stationary processes.

From a practical point of view, the above model is quite handy because only $\sigma (t)$ has to be estimated from the data. Simple, although biased, estimates of $\sigma (t)$ can be obtained by comparing the non time-dependent power spectrum of the time series PX(w) to power spectra, Px(w,tk) calculated at different but connected time intervals during the observation period

\begin{displaymath}%
{\widetilde{\sigma}}^{2} (t_{k}) = \frac{\sum_{w} P_{x}(w,t_{k})}{\sum_{w} P_{X}(w)}\cdot
\end{displaymath} (34)

This estimate leads to the approximation of $\sigma (t)$ by a step-function so that the time intervals have to be chosen especially to represent the shape of $\sigma (t)$ as well as possible. In contrast, using the MODWT is possible to easily obtain unbiased estimates of $\sigma (t)$ (Fryzlewicz et al. 2002) from

 \begin{displaymath}%
{\widetilde{\sigma}}^{2} (t) = \sum_{j=1}^{J} \ 2^{-j} \left\vert{\widetilde{W}}_{j,t}\right\vert^{2}.
\end{displaymath} (35)

The extra factor 2-j can be easily understood from the normalization factor introduced in the definition of the MODWT wavelet and scaling filters when deduced from the DWT ones. Few more comments on Eq. (35) are needed. First, $\sigma (t)$ is not defined on (0,1) and has to be renormalized to be compared to the Fourier estimate. Second, as discussed in Sect. 2.3, the MODWT is not an orthonormal transform and, for each scale j, the MODWT coefficients are artificially correlated in time by the MODWT transform itself. However, this does not bias or limit the estimation of $\sigma (t)$ as we consider that all scales evolve in the same way in time so we are only interested in smooth time variations. Third, the MODWT estimate is not consistent, so it has to be smeared out to reduce its variance. Finally, we could obtain an unbiased estimator from the DWT in a similar manner but its construction is more complex due to the time undersampling performed by the DWT as the levels increase.

In summary, the time-modulated stationary model for random processes can be very interesting when dealing with Gaussian locally-stationary CMB time series that present time evolution in the noise total power but not in their power spectrum shape. Indeed, once estimates for the $\sigma (t)$ function are available, the simulation of such processes is reduced to simulating a stationary Gaussian process from the power spectrum of Xt, which is then multiplied by that function. Moreover, the covariance matrix of such processes can be approximated well in the wavelet space as shown in the following subsection.

4.3 Time modulated-stationary wavelet processes

Based on the definition of time modulated-stationary processes and on the properties of the wavelet coefficients for stationary and locally stationary processes, we can also define time modulated-stationary wavelet processes as those for which the variance of the wavelets coefficients at level j has the form

 \begin{displaymath}%
{\rm var}\left\lbrace W_{j,t} \right\rbrace = \sigma^{2}_{j}(t) \times C_{j}
\end{displaymath} (36)

where Cj is a constant and $\sigma_{j}(t)$ is a function of time. Notice that this definition also accounts for the time modulated-stationary processes for which the different $\sigma_{j}(t)$ functions are a consequence of the undersampling performed in the DWT.

As in the above definition we have supposed the wavelet coefficients are uncorrelated. Their covariance matrix takes a diagonal form:

 
                                      $\displaystyle \Sigma_{{\vec{W}}} =
{\rm diag} \ \big\lbrace
\underbrace{\sigma_...
...,\sigma_{1}(t^{1}_{2}) \ C_{1}}_{\frac{N}{2} \ {\rm of} \ {\rm these}}, \ldots,$  
    $\displaystyle \qquad \underbrace{\sigma_{2}(t^{2}_{1}) \ C_{2},\sigma_{2}(t^{2}...
...gma_{j}(t^{j}_{2}) \ C_{j}}_{\frac{N}{2^{j}} \ {\rm of} \ {\rm these}}, \ldots,$  
    $\displaystyle \qquad \underbrace{\sigma_{J-1}(t^{J-1}_{1})C_{J-1},\sigma_{J-1}(t^{J-1}_{2})C_{J-1}}_{2 \ {\rm of} \ {\rm these}}, C_{J}, C_{J+1}\big\rbrace$ (37)

where tjk represents the kth time sample of the level j DWT.

   
4.4 Statistical analysis of CMB data sets with locally stationary noise

Often, in the time-domain processing of CMB data sets (subtracting systematics and optimal map making), we have to deal with the inversion of the noise covariance matrix. In general, we consider the noise to be Gaussian and stationary or piece-wise stationary, so that the noise correlation matrix is diagonal in Fourier space. Therefore, this can be trivially inverted for each piece of data. Above, we have shown that the covariance matrix of correlated Gaussian and non stationary time-modulated noise (in particular, locally stationary) is diagonal in the wavelet space and can be described simply by a set of coefficients Cj, where j is the wavelet scale index, modulated by a time-dependent function  $\sigma_{j}(t)$. Therefore, the inversion of this matrix is also trivial in wavelet space.

If we consider maximum likelihood algorithms such as optimal map making, the resolution of the system involves terms in the form N-1d where N is the noise correlation matrix and d a data vector. In Fourier space we perform this operation by first taking the Fourier transform of d, then dividing it by the diagonal terms of the Fourier representation of N and finally transforming back to real space. Equivalently, in the wavelet case we can write for the stationary case

\begin{displaymath}%
N^{-1}d = W^{T}\left(\left\lbrace Wd \right\rbrace_{j,k}/C_{j} \right)
\end{displaymath} (38)

where W represents the direct DWT, WT the inverse wavelet transform, and Cj are the variances of the wavelet coefficients of the noise wavelet transform. In the same way for non stationary time-modulated noise, we can write

\begin{displaymath}%
N^{-1}d = W^{T}\left(\left\lbrace W\left(d/\sigma(t)\right)\right\rbrace_{j,k}/(C_{j})\right)
\end{displaymath} (39)

where $\sigma (t)$ is the time modulation function.

In general the noise covariance matrix is not known and has to be estimated directly from the data. For the stationary case, this is simple since we only need to compute the variance of the wavelet coefficients for each of the different scales. For non stationary time modulated noise, we can have a very good approximation using Eq. (36) by first computing the variance of the wavelet coefficients assuming stationary noise and then computing the time modulation function from the global evolution of the wavelet coefficients.

The top panel of Fig. 4 shows the global time variation of the noise of one of the Archeops bolometers that decreases with time (a complete description of the Archers data is presented in Sect. 6). We overplot the time modulation function, $\sigma (t)$, as determined from the data following the above technique. Finally, the bottom panel of the figure shows the global time variation of a simulation of the Archeops noise. This was obtained by multiplying by $\sigma (t)$a realization of a stationary Gaussian noise produced as described in Sect. 3.3. In red we overplot the $\sigma (t)$estimated for the simulation that behaves like the one estimated from the data. Notice that this kind of non stationary simulation of the noise in the Archeops data was used when computing the Archeops CMB angular power spectrum in Benoît et al. (2003a).


  \begin{figure}
\par\includegraphics[height=9cm,width=7.8cm,clip]{Figures/4468f11.ps} \end{figure} Figure 4: Top: reconstructed time modulation function $\sigma (t)$ of the Archeops noise in bolometer 217K04. Bottom: time modulation function $\sigma (t)$ for a non-stationary simulation of the Archeops noise in bolometer 217K04.
Open with DEXTER

   
5 Wavelet destriping

Data from CMB experiments are commonly limited by intrinsic 1/f-type noise in the detectors. For experiments with a circular scanning strategy like Archeops, WMAP, and Planck, the 1/f-type noise shows up like stripes in the sky maps. The techniques used to remove those stripes are in general called destriping algorithms (Efstathiou 2005; Keihänen et al. 2004; Maino et al. 2002; Poutanen et al. 2004; Revenu et al. 2000; Sbarra et al. 2003). These are based both on the statistical properties of 1/f-type noise and on the fact that the noise is not coherently projected in the maps, in contrast to the sky signal.

5.1 Derivation of a wavelet destriping algorithm

In the previous sections we have shown that the wavelet transform nearly diagonalizes the covariance matrix of 1/f-type noise. This allows us to design a destriping algorithm based on the wavelet transform.

We assume that the time domain data for a typical CMB experiment can be written as

 \begin{displaymath}%
d_{t} = s_{t} + n^{\rm c}_{t} +n^{\rm w}_{t}
\end{displaymath} (40)

where st is the sky signal, and $n^{\rm w}$ and $n^{\rm c}$ are a white and a correlated (color) noise component, respectively. A simple coaddition map of the sky m, can be formed from the time series data s as follows

 \begin{displaymath}%
m = \left(PP^{T} \right)^{-1} P^{T} s
\end{displaymath} (41)

where Pt,p is a pointing matrix defined to be 1 if at time t the instrument was pointing to the position of pixel p on the sky and 0 otherwise. This equation holds for any pixelization scheme of the sky. Thus, we can rewrite Eq. (40) as

 \begin{displaymath}%
d_t = P_{tp} m_p + n^{\rm c}_t +n^{\rm w}_t
\end{displaymath} (42)

Ptp mp defines a TOD formed from the deprojection of the sky map mand therefore in the following we can compute its DWT in 1D.

The main purpose of a destriping algorithm is simply to remove or significantly reduce the contribution from the correlated noise. This is obtained by maximizing the likelihood function, ${\cal L }$, over the noise and the sky signal defined by the sky map m. We can characterize the correlated noise $n^{\rm c}$ using its wavelet transform

\begin{displaymath}%
\vec{W}_{\rm nc} = {\cal W} n^{\rm c}_t,
\end{displaymath} (43)

for which the wavelet coefficients are Gaussian distributed and quasi-uncorrelated. We can then write the likelihood function as follows
 
                            $\displaystyle %
{\cal L }$ = $\displaystyle P(\vec{d}\vert\vec{m})= P(\vec{W}_{\rm w}\vert\vec{m})P(\vec{W}_{nc}\vert\vec{m})$ (44)
  $\textstyle \sim$ $\displaystyle \exp\left[-\frac{1}{2} \left(
\vec{W}_{\rm w}^T {\tilde C}_{\rm w...
...+
\vec{W}_{\rm nc}^T {\tilde C}_{\rm nc}^{-1} \vec{W}_{\rm nc} \right) \right],$ (45)

where ${\tilde C}_{\rm w} $ and ${\tilde C}_{\rm nc}$ are the covariance matrices in wavelet space of the correlated and white noise, respectively:
    $\displaystyle {\tilde C}_{\rm w} = \left\langle \vec{W}_{\rm w} \vec{W}_{\rm w}^T \right\rangle$ (46)
    $\displaystyle {\tilde C}_{\rm nc} = \left\langle \vec{W}_{\rm nc} \vec{W}_{\rm nc}^T \right\rangle.$ (47)

Notice that for our assumptions both matrices are diagonal.

By taking the log of the likelihood function (45), we obtain

 
                           $\displaystyle %
\chi^{2}$ = $\displaystyle \vec{W}_{\rm w}^T {{\tilde C}_{\rm w}}^{-1} \vec{W}_{\rm w}
+{\vec{W}_{\rm nc}}^{T}{{\tilde C}_{\rm nc}}^{-1}{\vec{W}_{\rm nc}}$ (48)
  = $\displaystyle (\vec{d}-{\cal W}^T \vec{W}_{\rm nc}- {\cal P}\vec{m} )^{T}{C_{\rm w}}^{-1} \ldots$ (49)
    $\displaystyle \ldots (\vec{d}-{\cal W}^T \vec{W}_{\rm nc} - {\cal P}\vec{m}) +
\vec{W}_{\rm nc}^{T}{{\tilde C}_{\rm nc}}^{-1}{\vec{W}_{\rm nc}},$ (50)

whose minimization with respect to $\vec{m}$ and $\vec{W}_{\rm nc}$ leads to the following system of equations:
    $\displaystyle \vec{m} = ({\cal P} {\cal P})^{-1}{\cal P}^T (\vec{d} - {\cal W}^T {\vec{W}}_{\rm nc})$ (51)
    $\displaystyle 0 = - {{\tilde C}_{\rm w}}^{-1}{\cal W} ( I-{\cal P} ({\cal P} {\cal P})^{-1}{\cal P}^T)(\vec{d}-{\cal W}^T
{\vec{W}}_{\rm nc})+ \ldots$ (52)
    $\displaystyle \qquad \ldots {{\tilde C}_{\rm nc}}^{-1}{{\vec{W}}_{\rm nc}}$ (53)

that can be combined into
 
$\displaystyle %
({\cal W}( I-{\cal P} ({\cal P} {\cal P})^{-1}{\cal P}^T) {\cal...
...}}_{\rm nc} =
{\cal W}( I-{\cal P} ({\cal P} {\cal P})^{-1}{\cal P}^T) \vec{d}.$     (54)

The solution to this system of equations can be found using iterative methods and in particular we use a conjugate gradient method (Press et al. 1996).

5.2 Practical implementation

For 1/f-type noise, the variance of the wavelet coefficients at level j is given by

 \begin{displaymath}%
C_j = \: {\sigma}^2 \:2^{j \alpha} + {\sigma_{\rm w}}^2
\end{displaymath} (55)

where ${\sigma_{\rm w}}^2$ is the white noise variance and ${\sigma}^2 \:2^{j \alpha}$ is the variance of the correlated part, with $\alpha$ being the 1/f exponent as described in Eq. (23).

When the noise dominates the sky signal, we can extract the noise parameters  ${\sigma_w}$, $\sigma$, and $\alpha$ directly from the data itself via Monte Carlo Markov chain methods (Wada & Ito 2000). When the signal contribution cannot be neglected, the data wavelet variance is given by

 \begin{displaymath}%
{\sigma_{{\rm b},j}}^2 \:= {\sigma_{{\rm w},j}}^2 + \sigma_{n_{\rm c},j}^2 +
{\sigma_{{\rm sky}, j}}^2
\end{displaymath} (56)

where ${\sigma_{{\rm b},j}}$ and ${\sigma_{{\rm w},j}}$ can be iteratively evaluated from the data using a template of the sky signal that is improved at each iteration. From the wavelet transforms of the template and the data we then obtain
 
$\displaystyle %
{\sigma_{n_{\rm c}, j}}^2 ={{\sigma}_{{\rm b},j}}^2 - {{\sigma}_{{\rm sky},j}}^2
- {\sigma_{{\rm w},j }}^2.$     (57)

5.3 Application to simulated data


  \begin{figure}
\par\includegraphics[height=4.4cm,width=7.2cm,clip]{Figures/4468f12.ps}\end{figure} Figure 5: Baseline reconstruction on simulated Archeops data using a wavelet based destriping algorithm. In black we trace the simulated Archeops TOD. We overplot the reconstructed baseline in blue, green and red for the first, second, and third steps of the algorithm, respectively.
Open with DEXTER

The wavelet destriping was applied to simulated Archeops TOD at 545 GHz with 3 $\times$ 221 samples. We considered two main components in the data, Galactic dust emission, and 1/f-type noise. For the former we used the template for the Galactic dust emission scaled down to 545 GHz provided by Finkbeiner et al. (1999). The 1/f-type noise properties were deduced from the Archeops data.

We proceeded in three main steps. First of all, we estimated the lowest frequency components of the data from its wavelet decomposition and subtracted it. Secondly, we improved this estimation by solving Eq. (54) iteratively, starting with largest scales and progressively adding smaller and smaller scales. Finally, we computed an approximation to the signal, st, by thresholding the previous destriped map and deprojecting it into the time domain. We then, applied step two to the residuals, dt-st.


  \begin{figure}
\par\includegraphics[height=3.85cm,width=7.3cm,clip]{Figures/4468...
...\includegraphics[height=3.85cm,width=7.3cm,clip]{Figures/4468f14.ps}\end{figure} Figure 6: Top panel: destriped map of the simulated Archeops data at 545 GHz using the wavelet destriping algorithm. Bottom panel: residual stripes on the destriped map above.
Open with DEXTER

Figure 5 shows the simulated Archeops TOD at 545 GHz. We overplot the reconstructed noise, low-frequency components for steps one, two and three. We notice that we improve the estimated baseline in step three significantly reducing the stripes in the maps as shown in Fig. 6, where we represent the destriped map in the top panel and the residual stripes in the bottom panel.

   
6 Time-frequency visualization and analysis of the Archeops TOD

In the previous sections we concentrated on the wavelet description of random Gaussian processes. We studied the decorrelation properties of the DWT and their application to the statistical analysis of CMB data sets and in particular to the destriping of CMB maps. From the point of view of CMB data analysis this is very useful but not all we can obtain from the wavelet transform. Actually, the most important property of wavelets is their simultaneous localization of time and frequency. In this respect, wavelet analysis and, in particular, the DWPT are fundamental tools for data visualization and characterization. The CMB time series dt is in general a linear combination of signals, both galactic and cosmological in origin, of systematics effects likeatmospheric contamination, parasitic noises, electromagnetic contamination, and/or glitches and of random Gaussian noise

$\displaystyle %
d_{t} = \alpha^{\rm cos} s^{\rm cos}_{t} + \alpha^{\rm gal} s^{...
...\rm sys}^{\rm pn}_{t} + \alpha^{\rm ec} {\rm sys}^{\rm ec}_{t}+ \ldots + n_{t}.$     (58)

These components are different in nature and they present different time and frequency properties. For example, the cosmological and galactic signals only depend on the position of the sky observed, which is fully given by the instrument pointing. By contrast, the atmospheric signal depends both on the sky position and on the time evolution of the atmospheric conditions. Further, other systematic effects vary mainly with time and in general are related to physical phenomena in the detector surroundings. Finally, the noise component is intrinsic to each detector and is often of 1/f-type, Gaussian and locally-stationary. Since, the cosmological signal, the CMB emission, is much weaker in the time domain than all other components, these have to be identified, characterized, and removed with a high degree of precision before extracting the CMB anisotropies power spectrum.

   
6.1 Time-frequency analysis of the Archeops data


  \begin{figure}
\par\includegraphics[height=8.7cm,width=7.7cm,clip]{Figures/4468f15.ps}\end{figure} Figure 7: Top: time-frequency analysis of the TOD of the 217K04 Archeops bolometer using the DWPT. Bottom: time-frequency analysis of the expected Galactic emission in the TOD of the 217K04 Archeops bolometer using the DWPT. See text for details.
Open with DEXTER

In the following we present the most relevant issues in the wavelet analysis of the Archeops data set. A more detailed description of the Archeops data and its processing can be found in Macías-Pérez et al. (2005). To simplify further discussions, we assume only four main components in the Archeops timelines: CMB emission, galactic and atmospheric emissions, unidentified systematics, and Gaussian noise.

The top panel of Fig. 7 shows the DWPT time-frequency representation of a typical Archeops detector signal. We observe mainly two components that dominate at high and low frequencies. The low-frequency signal that varies significantly with time is mainly given by galactic emission, as shown by the bottom panel of the same figure where we represent the expected galactic emission for that detector. The circular scanning strategy of Archeops is such that we cross the Galactic plane perpendicularly during most of the observation time, so the galactic emission shows up as a spike in time. In contrast the scanning direction in the last hours of flight is collinear to the Galactic plane, and the galactic emission then becomes broader in the time direction, although we can still observe very few frequency spikes related to intense and compact regions of the Galactic plane. The nearly time-fixed structures observed at low frequency - although they look very much as 1/f noise - are due to systematics that are mainly dominated by atmospheric emission.


  \begin{figure}
\par\includegraphics[height=8.7cm,width=7.7cm,clip]{Figures/4468f16.ps}\end{figure} Figure 8: Top: time-frequency analysis of the TOD of the 217T04 Archeops bolometer using the DWPT. Bottom: time-frequency analysis of the expected Galactic emission in the TOD of the 217T04 Archeops bolometer using the DWPT. See text for details.
Open with DEXTER

The high frequency signal corresponds to the white noise of the detector whose power is artificially raised with increasing frequency when deconvolving from the bolometer time constant. We observe that the noise decreases smoothly with time showing a clear non-stationary behavior. Indeed, as time passed during the flight, the bolometer temperature decreased, therefore reducing the bolometer intrinsic noise. A detailed wavelet description and modeling of the non-stationarity of the Archeops noise is presented in the following subsection.

The top panel of Fig. 8 shows the DWPT of the worst Archeops bolometer (217T04) both in terms of noise and systematics. As above we plot the DWPT of the expected Galactic signal for this bolometer in the bottom panel. We observe that the DWPT of 217T04 presents, as above, the Galactic and atmospheric components at low frequencies and a time varying noise at high frequency. However, it is dominated by a strong time-varying signal in the frequency range 25 to 35 Hz. This signal is not observed in the data presented above so can be identified clearly as residuals from unknown systematic effects. We can, therefore, characterize the quality of the Archeops bolometers data just by visual inspection and identify which of them are badly affected by systematics. We have used this technique to identify, in terms of sensitivity and low level of systematics, the best Archeops bolometers for constructing the Archeops sky maps.

6.2 Modeling of the non-stationarity of the Archeops noise


  \begin{figure}
\par\includegraphics[height=9cm,width=7.8cm,clip]{Figures/4468f17.ps}\end{figure} Figure 9: Top: average power spectrum of the TOD of the 217K04 Archeops bolometer as a function of frequency computed from its DWPT. Bottom: time evolution of the power spectrum of the TOD of the 217K04 Archeops bolometer computed from its DWPT.
Open with DEXTER

The Archeops cryostat radically increased temperature when taking off and then cooled down slowly to achieve a nominal temperature of 95 mK for the last ten hours of flight. That produced a fair decrease in the noise power with time, which is the cause of the non stationarity of the Archeops bolometer noise. To account for this non stationarity, we modeled Archeops noise as a time-modulated stationary wavelet process, as it can be considered as locally (piece-wise) stationary noise with slowly time-varying power but for the first two hours of flight. Following Sects. 4.2 and 4.4 we have computed the $\sigma (t)$ function for each of the Archeops bolometers. The top panel in Fig. 9 shows the mean wavelet power as a function of frequency for the the Archeops bolometer 217K04. The bottom panel shows the reconstructed $\sigma (t)$ function for the same bolometer. From these two quantities we can produce non-stationary simulations of the Archeops noise as shown in Fig. 4. These non stationary simulations of the Archeops bolometer noise were used to determine the angular power spectrum of the noise in the Archeops maps that were needed for computing the Archeops CMB angular power spectrum in Benoît et al. (2003a) via a MASTER-like algorithm.

6.3 Wavelet destriping of the Archeops data


  \begin{figure}
\par\includegraphics[height=8.2cm,width=4.3cm,angle=90,clip]{Figu...
...graphics[height=8.2cm,width=4.3cm,angle=90,clip]{Figures/4468f19.ps}\end{figure} Figure 10: Top: wavelet destriped map of the Archeops 545 GHz after one iteration of the algorithm described in Sect. 5. Bottom: wavelet destriped map of the Archeops 545 GHz after two iteration of the algorithm.
Open with DEXTER

We applied the wavelet destriping algorithm presented in Sect. 5 to the Archeops data at 545 GHz. When working with the simulated Archeops data, we only considered two main components, Galaxy emission and 1/f-type noise. As shown above, for the Archeops data we also need to consider the atmospheric emission at low frequencies which mimic a 1/f-type but is not Gaussian-distributed. To overcome this problem, we first estimated and then removed a very low-frequency baseline in the data. After that, we applied two iterations of the wavelet destriping algorithm as described in Sect. 5. The results are presented in Fig. 10. The top figure corresponds to the destriped map after one iteration of the algorithm. Residual stripes dominate the right top corner of the map. These are significantly reduced by the second iteration at shown in the lower plot. It is important to notice that destriping allows us to recover the diffuse Galactic emission on the Gemini region (middle left of the figure).

The wavelet destriping algorithm presented here is mainly based on the minimization of the variance per pixel in the final map. As discussed in Bourrachot (2004) and Macías-Pérez et al. (2005) this approach is not optimal for the Archeops data because of the atmospheric emission. A better approach is to minimize the ratio in the map between the variance perpendicular and parallel to the scanning direction, $\frac{\sigma_{\parallel}^2}{\sigma_{\perp}^2}$. The generalization of the wavelet destriping algorithm to this latter approach is straight forward. As its implementation is much harder than when considering Gabor atoms as base functions, it has not being considered for the analysis of the Archeops data. However, despite the simpler approach, the quality wavelet destriped maps can be compared to that of the final destriped maps used for the Archeops analysis (Macías-Pérez et al. 2005).

6.4 Wavelet detrending of the Archeops data

As shown above, the Archeops destriped maps contain residual stripes that need to be removed before any scientific analysis of the data. In general these stripes are superposed on the sky signal of interest, and so careful filtering is needed. As the Galactic signal in the Archeops maps is spike-like, any filtering will produce ringing in the final maps. To overcome these problems, we developed a wavelet-based detrending algorithm. First of all, we mask the Galactic signal and obtain a first approximation of the low frequency components of the data by fitting them to a base of Gabor atoms. Then, we use the fitted data to interpolate over the Galactic mask. Finally, via a wavelet denoising algorithm we obtain the trends on the interpolated data, which are then removed before map making. For denoising we use a wavelet thresholding algorithm limited to the few first smaller scales, typically up to j=9 when working with 6-million data samples.

Figure 11 shows the performance of the detrending algorithm in the time domain. We plot the interpolation to the data obtained by fitting Gabor atoms and the baseline obtained via the wavelet algorithm. We observe clearly that the wavelet baseline manages to follow the accidents on the data much better reducing both the residual stripes and the ringing in the final maps. The bottom panel of Fig. 12 represents the wavelet filtered map at 353 GHz, which can be compared to the Fourier-filtered map in the top panel of the figure. In the wavelet-filtered maps, the residual stripes and ringing are much less. Moreover, the diffuse structure of the Galactic emission is better preserved. Notice that the wavelet filtered maps at 353 GHz were used for the determination of the polarization of the diffuse Galactic dust emission with Archeops (Benoît et al. 2004).


  \begin{figure}
\par\includegraphics[height=4cm,width=7cm,clip]{Figures/4468f20.ps}\end{figure} Figure 11: Performance of the wavelet filtering in the 353 GHz Archeops data. In orange we trace the reconstructed data baseline using a fit to a base of Gabor atoms. In blue we represent the reconstructed baseline using the wavelet detrending algorithm.
Open with DEXTER


  \begin{figure}
\par\includegraphics[height=8.25cm,width=4.3cm,angle=90,clip]{Fig...
...raphics[height=8.25cm,width=4.3cm,angle=90,clip]{Figures/4468f22.ps}\end{figure} Figure 12: Top: fourier-filtered map of the Archeops 353 GHz data. Bottom: wavelet-filtered map of the of the Archeops 353 GHz data. See text for details.
Open with DEXTER

   
7 Conclusions

Because of their simultaneous time and frequency localization properties, the DWT, the MODWT, and the DWPT are very important tools for analyzing TOD from CMB experiments. They allow us to trace the evolution in time of the data power spectrum and to easily identify systematic effects on the data.

The DWT permits a compact representation of Gaussian stationary noise. Indeed, it decorrelates Gaussian 1/f-type noise, which is common in the detectors of CMB experiments. Within the wavelet description, the covariance matrix of Gaussian 1/f-type noise is diagonal, as is the case in Fourier space. This allows us to efficiently and accurately simulate Gaussian 1/f-type noise using wavelets. Moreover, the DWT transform permits a straightforward description of locally stationary Gaussian noise via the time evolution of the variance of the wavelet coefficients but with a diagonal covariance matrix. A particular case of this is the time-modulated Gaussian noise for which the time evolution is common to all the wavelet scales.

The above properties allow us to generalize the Fourier-space algorithms for fast optimal map making and maximum likelihood determination of the CMB angular power spectrum to the wavelet space, including both stationary and locally stationary Gaussian noise. In this end, we have developed a wavelet-based destriping algorithm, which reduce significantly the level of stripes in the maps both in simulated and true Archeops data. Even though this algorithm is based on a simple approach to minimization of the variance per pixel, the results obtained can be compared to those from more precise destriping algorithms.

As a test, we performed a full wavelet analysis of the Archeops data. The visualization and careful study of the time-frequency space via the DWPT allowed us to clearly identify systematics and to characterize the quality of the data. Further, we proceeded to the careful wavelet modeling of the locally stationary noise in the Archeops data. This modeling allowed us to obtain via simulations the angular power spectrum of the Archeops noise needed for estimating the CMB angular power spectrum with Archeops (Benoît et al. 2003a). We applied the wavelet destriping algorithm to the Archeops data obtaining encouraging results. Finally, we have developed a detrending algorithm based on a wavelet denoising of the data. This algorithm applied to the Archeops destriped data significantly reduce the residual stripes on the final maps and introduces very little ringing. The wavelet detrended Archeops maps at 353 GHz were used to determine the polarized diffuse emission from Galactic dust (Benoît et al. 2004).

Acknowledgements
Fist of all, we would like to thank the Archeops collaboration for allowing us to analyze the Archeops data and to present the results in here. We also thank B. Vidakovic, D. B. Percival, and X. Desert for very useful discussions during the writing of this paper.

References

 

Copyright ESO 2006