| Issue | A&A Volume 707, March 2026 |
|---|---|
| Article Number | A233 |
| Number of page(s) | 28 |
| Section | Cosmology (including clusters of galaxies) |
| DOI | https://doi.org/10.1051/0004-6361/202555402 |
| Published online | 17 March 2026 |
Euclid preparation
LXXXIII. The impact of redshift interlopers on the two-point correlation function analysis
1
INAF-Osservatorio Astronomico di Brera, Via Brera 28, 20122 Milano, Italy
2
INFN-Sezione di Genova, Via Dodecaneso 33, 16146 Genova, Italy
3
Dipartimento di Fisica, Università di Genova, Via Dodecaneso 33, 16146 Genova, Italy
4
Aix-Marseille Université, CNRS, CNES, LAM, Marseille, France
5
SISSA, International School for Advanced Studies, Via Bonomea 265, 34136 Trieste TS, Italy
6
ICSC – Centro Nazionale di Ricerca in High Performance Computing, Big Data e Quantum Computing, Via Magnanelli 2 Bologna Italy
7
INFN, Sezione di Trieste, Via Valerio 2, 34127 Trieste, TS, Italy
8
Dipartimento di Fisica - Sezione di Astronomia, Università di Trieste, Via Tiepolo 11, 34131 Trieste, Italy
9
INAF-Osservatorio Astronomico di Trieste, Via G. B. Tiepolo 11, 34143 Trieste, Italy
10
IFPU, Institute for Fundamental Physics of the Universe, via Beirut 2, 34151 Trieste, Italy
11
Jet Propulsion Laboratory, California Institute of Technology, 4800 Oak Grove Drive Pasadena, CA 91109, USA
12
Johns Hopkins University, 3400 North Charles Street Baltimore, MD 21218, USA
13
California Institute of Technology, 1200 E California Blvd Pasadena, CA 91125, USA
14
INAF-IASF Milano, Via Alfonso Corti 12, 20133 Milano, Italy
15
Institut d’Astrophysique de Paris, UMR 7095, CNRS, and Sorbonne Université, 98 bis boulevard Arago, 75014 Paris, France
16
Institute of Space Sciences (ICE, CSIC), Campus UAB, Carrer de Can Magrans s/n, 08193 Barcelona, Spain
17
Dipartimento di Fisica, Università degli studi di Genova, and INFN-Sezione di Genova, via Dodecaneso 33, 16146 Genova, Italy
18
Dipartimento di Fisica e Astronomia “G. Galilei”, Università di Padova, Via Marzolo 8, 35131 Padova, Italy
19
INFN-Padova, Via Marzolo 8, 35131 Padova, Italy
20
Waterloo Centre for Astrophysics, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
21
Department of Physics and Astronomy, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
22
Perimeter Institute for Theoretical Physics, Waterloo, Ontario N2L 2Y5, Canada
23
Minnesota Institute for Astrophysics, University of Minnesota, 116 Church St SE Minneapolis, MN 55455, USA
24
Infrared Processing and Analysis Center, California Institute of Technology, Pasadena, CA 91125, USA
25
Université Paris-Saclay, CNRS, Institut d’astrophysique spatiale, 91405 Orsay, France
26
ESAC/ESA, Camino Bajo del Castillo s/n. Urb. Villafranca del Castillo, 28692 Villanueva de la Cañada, Madrid, Spain
27
School of Mathematics and Physics, University of Surrey, Guildford, Surrey GU2 7XH, UK
28
INAF-Osservatorio di Astrofisica e Scienza dello Spazio di Bologna, Via Piero Gobetti 93/3, 40129 Bologna, Italy
29
Dipartimento di Fisica e Astronomia, Università di Bologna, Via Gobetti 93/2, 40129 Bologna, Italy
30
INFN-Sezione di Bologna, Viale Berti Pichat 6/2, 40127 Bologna, Italy
31
INAF-Osservatorio Astronomico di Padova, Via dell’Osservatorio 5, 35122 Padova, Italy
32
Space Science Data Center, Italian Space Agency, via del Politecnico snc, 00133 Roma, Italy
33
INAF-Osservatorio Astrofisico di Torino, Via Osservatorio 20, 10025 Pino Torinese, (TO), Italy
34
Department of Physics “E. Pancini”, University Federico II, Via Cinthia 6, 80126 Napoli, Italy
35
INAF-Osservatorio Astronomico di Capodimonte, Via Moiariello 16, 80131 Napoli, Italy
36
Instituto de Astrofísica e Ciências do Espaço, Universidade do Porto, CAUP, Rua das Estrelas, PT4150-762 Porto, Portugal
37
Faculdade de Ciências da Universidade do Porto, Rua do Campo de Alegre, 4150-007 Porto, Portugal
38
Dipartimento di Fisica, Università degli Studi di Torino, Via P. Giuria 1, 10125 Torino, Italy
39
INFN-Sezione di Torino, Via P. Giuria 1, 10125 Torino, Italy
40
European Space Agency/ESTEC, Keplerlaan 1, 2201 AZ Noordwijk, The Netherlands
41
Institute Lorentz, Leiden University, Niels Bohrweg 2, 2333 CA Leiden, The Netherlands
42
Leiden Observatory, Leiden University, Einsteinweg 55, 2333, CC Leiden, The Netherlands
43
INAF-Osservatorio Astronomico di Roma, Via Frascati 33, 00078 Monteporzio Catone, Italy
44
INFN-Sezione di Roma, Piazzale Aldo Moro, 2 – c/o Dipartimento di Fisica Edificio G. Marconi, 00185 Roma, Italy
45
Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas (CIEMAT), Avenida Complutense 40, 28040 Madrid, Spain
46
Port d’Informació Científica, Campus UAB, C. Albareda s/n, 08193 Bellaterra, (Barcelona), Spain
47
Institute for Theoretical Particle Physics and Cosmology (TTK), RWTH Aachen University, 52056 Aachen, Germany
48
INFN section of Naples, Via Cinthia 6, 80126 Napoli, Italy
49
Institute for Astronomy, University of Hawaii, 2680 Woodlawn Drive Honolulu, HI 96822, USA
50
Dipartimento di Fisica e Astronomia “Augusto Righi” – Alma Mater Studiorum Università di Bologna, Viale Berti Pichat 6/2, 40127 Bologna, Italy
51
Instituto de Astrofísica de Canarias, Vía Láctea, 38205 La Laguna, Tenerife, Spain
52
Institute for Astronomy, University of Edinburgh, Royal Observatory, Blackford Hill Edinburgh, EH9 3HJ, UK
53
Jodrell Bank Centre for Astrophysics, Department of Physics and Astronomy, University of Manchester, Oxford Road Manchester, M13 9PL, UK
54
European Space Agency/ESRIN, Largo Galileo Galilei 1, 00044, Frascati Roma, Italy
55
Université Claude Bernard Lyon 1, CNRS/IN2P3, IP2I Lyon, UMR 5822, Villeurbanne, F-69100, France
56
Institut de Ciències del Cosmos (ICCUB), Universitat de Barcelona (IEEC-UB), Martí i Franquès 1, 08028 Barcelona, Spain
57
Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig de Lluís Companys 23, 08010 Barcelona, Spain
58
UCB Lyon 1, CNRS/IN2P3, IUF, IP2I Lyon, 4 rue Enrico Fermi, 69622 Villeurbanne, France
59
Institut d’Estudis Espacials de Catalunya (IEEC), Edifici RDIT, Campus UPC, 08860 Castelldefels, Barcelona, Spain
60
Departamento de Física, Faculdade de Ciências, Universidade de Lisboa, Edifício C8, Campo Grande, PT1749-016 Lisboa, Portugal
61
Instituto de Astrofísica e Ciências do Espaço, Faculdade de Ciências, Universidade de Lisboa, Campo Grande, 1749-016 Lisboa, Portugal
62
Department of Astronomy, University of Geneva, ch. d’Ecogia 16, 1290 Versoix, Switzerland
63
INAF-Istituto di Astrofisica e Planetologia Spaziali, via del Fosso del Cavaliere 100, 00100 Roma, Italy
64
Aix-Marseille Université, CNRS/IN2P3, CPPM, Marseille, France
65
INFN-Bologna, Via Irnerio 46, 40126 Bologna, Italy
66
School of Physics, HH Wills Physics Laboratory, University of Bristol, Tyndall Avenue Bristol, BS8 1TL, UK
67
Universitäts-Sternwarte München, Fakultät für Physik, Ludwig-Maximilians-Universität München, Scheinerstrasse 1, 81679 München, Germany
68
Max Planck Institute for Extraterrestrial Physics, Giessenbachstr. 1, 85748 Garching, Germany
69
Dipartimento di Fisica “Aldo Pontremoli”, Università degli Studi di Milano, Via Celoria 16, 20133 Milano, Italy
70
INFN-Sezione di Milano, Via Celoria 16, 20133 Milano, Italy
71
Institute of Theoretical Astrophysics, University of Oslo, P.O. Box 1029 Blindern, 0315, Oslo, Norway
72
Felix Hormuth Engineering, Goethestr. 17, 69181 Leimen, Germany
73
Technical University of Denmark, Elektrovej 327, 2800 Kgs. Lyngby, Denmark
74
Cosmic Dawn Center (DAWN), Denmark
75
Max-Planck-Institut für Astronomie, Königstuhl 17, 69117 Heidelberg, Germany
76
NASA Goddard Space Flight Center, Greenbelt, MD 20771, USA
77
Department of Physics and Astronomy, University College London, Gower Street London, WC1E 6BT, UK
78
Department of Physics and Helsinki Institute of Physics, Gustaf Hällströmin katu 2, 00014 University of Helsinki, Finland
79
Université Paris-Saclay, Université Paris Cité, CEA, CNRS, AIM, 91191 Gif-sur-Yvette, France
80
Université de Genève, Département de Physique Théorique and Centre for Astroparticle Physics, 24 quai Ernest-Ansermet, CH-1211 Genève 4, Switzerland
81
Department of Physics, P.O. Box 64, 00014 University of Helsinki, Finland
82
Helsinki Institute of Physics, Gustaf Hällströmin katu 2 University of Helsinki Helsinki, Finland
83
Laboratoire d’etude de l’Univers et des phenomenes eXtremes, Observatoire de Paris, Université PSL, Sorbonne Université, CNRS, 92190 Meudon, France
84
Mullard Space Science Laboratory, University College London, Holmbury St Mary, Dorking, Surrey, RH5 6NT, UK
85
NOVA optical infrared instrumentation group at ASTRON, Oude Hoogeveensedijk 4, 7991PD Dwingeloo, The Netherlands
86
Centre de Calcul de l’IN2P3/CNRS, 21 avenue Pierre de Coubertin, 69627 Villeurbanne Cedex, France
87
University of Applied Sciences and Arts of Northwestern Switzerland, School of Computer Science, 5210 Windisch, Switzerland
88
Universität Bonn, Argelander-Institut für Astronomie, Auf dem Hügel 71, 53121 Bonn, Germany
89
Dipartimento di Fisica e Astronomia “Augusto Righi” – Alma Mater Studiorum Università di Bologna, via Piero Gobetti 93/2, 40129 Bologna, Italy
90
Department of Physics, Institute for Computational Cosmology, Durham University, South Road Durham, DH1 3LE, UK
91
Université Côte d’Azur, Observatoire de la Côte d’Azur, CNRS, Laboratoire Lagrange, Bd de l’Observatoire CS 34229, 06304 Nice cedex 4, France
92
Université Paris Cité, CNRS, Astroparticule et Cosmologie, 75013 Paris, France
93
CNRS-UCB International Research Laboratory, Centre Pierre Binétruy, IRL2007, CPB-IN2P3 Berkeley, USA
94
University of Applied Sciences and Arts of Northwestern Switzerland, School of Engineering, 5210 Windisch, Switzerland
95
Institut d’Astrophysique de Paris, 98bis Boulevard Arago, 75014 Paris, France
96
Institute of Physics, Laboratory of Astrophysics, Ecole Polytechnique Fédérale de Lausanne (EPFL), Observatoire de Sauverny, 1290 Versoix, Switzerland
97
Aurora Technology for European Space Agency (ESA), Camino bajo del Castillo s/n Urbanizacion Villafranca del Castillo Villanueva de la Cañada, 28692, Madrid, Spain
98
Institut de Física d’Altes Energies (IFAE), The Barcelona Institute of Science and Technology, Campus UAB, 08193 Bellaterra (Barcelona), Spain
99
School of Mathematics, Statistics and Physics, Newcastle University, Herschel Building, Newcastle-upon-Tyne, NE1 7RU, UK
100
DARK, Niels Bohr Institute, University of Copenhagen, Jagtvej 155, 2200 Copenhagen, Denmark
101
Centre National d’Etudes Spatiales – Centre spatial de Toulouse, 18 avenue Edouard Belin, 31401 Toulouse Cedex 9, France
102
Institute of Space Science, Str. Atomistilor, nr. 409 Măgurele Ilfov, 077125, Romania
103
Consejo Superior de Investigaciones Cientificas, Calle Serrano 117, 28006, Madrid, Spain
104
Universidad de La Laguna, Departamento de Astrofísica, 38206 La Laguna, Tenerife, Spain
105
Institut für Theoretische Physik, University of Heidelberg, Philosophenweg 16, 69120 Heidelberg, Germany
106
Institut de Recherche en Astrophysique et Planétologie (IRAP), Université de Toulouse, CNRS, UPS, CNES, 14 Av. Edouard Belin, 31400 Toulouse, France
107
Université St Joseph; Faculty of Sciences, Beirut, Lebanon
108
Departamento de Física, FCFM, Universidad de Chile, Blanco Encalada 2008 Santiago, Chile
109
Universität Innsbruck, Institut für Astro- und Teilchenphysik, Technikerstr. 25/8, 6020 Innsbruck, Austria
110
Satlantis, University Science Park, Sede Bld, 48940 Leioa-Bilbao, Spain
111
Instituto de Astrofísica e Ciências do Espaço, Faculdade de Ciências, Universidade de Lisboa, Tapada da Ajuda, 1349-018 Lisboa, Portugal
112
Cosmic Dawn Center (DAWN)
113
Niels Bohr Institute, University of Copenhagen, Jagtvej 128, 2200 Copenhagen, Denmark
114
Universidad Politécnica de Cartagena, Departamento de Electrónica y Tecnología de Computadoras, Plaza del Hospital 1, 30202 Cartagena, Spain
115
Kapteyn Astronomical Institute, University of Groningen, PO Box 800, 9700 AV Groningen, The Netherlands
116
Dipartimento di Fisica e Scienze della Terra, Università degli Studi di Ferrara, Via Giuseppe Saragat 1, 44122 Ferrara, Italy
117
Istituto Nazionale di Fisica Nucleare, Sezione di Ferrara, Via Giuseppe Saragat 1, 44122 Ferrara, Italy
118
INAF, Istituto di Radioastronomia, 40129 Bologna, Italy
119
Department of Physics, Oxford University, Keble Road Oxford, OX1 3RH, UK
120
INAF – Osservatorio Astronomico di Brera, via Emilio Bianchi 46, 23807 Merate, Italy
121
ICL, Junia, Université Catholique de Lille, LITL, 59000 Lille, France
122
Instituto de Física Teórica UAM-CSIC, Campus de Cantoblanco, 28049 Madrid, Spain
123
CERCA/ISO, Department of Physics, Case Western Reserve University, 10900 Euclid Avenue Cleveland, OH 44106, USA
124
Technical University of Munich, TUM School of Natural Sciences, Physics Department, James-Franck-Str. 1, 85748 Garching, Germany
125
Max-Planck-Institut für Astrophysik, Karl-Schwarzschild-Str. 1, 85748 Garching, Germany
126
Departamento de Física Fundamental. Universidad de Salamanca. Plaza de la Merced s/n., 37008 Salamanca, Spain
127
Instituto de Astrofísica de Canarias (IAC); Departamento de Astrofísica, Universidad de La Laguna (ULL), 38200 La Laguna, Tenerife, Spain
128
Université de Strasbourg, CNRS, Observatoire astronomique de Strasbourg, UMR 7550, 67000 Strasbourg, France
129
Center for Data-Driven Discovery, Kavli IPMU (WPI), UTIAS, The University of Tokyo, Kashiwa, Chiba 277-8583, Japan
130
Ludwig-Maximilians-University, Schellingstrasse 4, 80799 Munich, Germany
131
Max-Planck-Institut für Physik, Boltzmannstr. 8, 85748 Garching, Germany
132
Department of Physics & Astronomy, University of California Irvine, Irvine, CA 92697, USA
133
Department of Mathematics and Physics E. De Giorgi, University of Salento, Via per Arnesano CP-I93, 73100 Lecce, Italy
134
INFN, Sezione di Lecce, Via per Arnesano CP-193, 73100 Lecce, Italy
135
INAF-Sezione di Lecce, c/o Dipartimento Matematica e Fisica, Via per Arnesano, 73100 Lecce, Italy
136
Departamento Física Aplicada, Universidad Politécnica de Cartagena, Campus Muralla del Mar, 30202 Cartagena, Murcia, Spain
137
Instituto de Física de Cantabria, Edificio Juan Jordá, Avenida de los Castros, 39005 Santander, Spain
138
Observatorio Nacional, Rua General Jose Cristino, 77-Bairro Imperial de Sao Cristovao, Rio de Janeiro, 20921-400, Brazil
139
CEA Saclay, DFR/IRFU, Service d’Astrophysique, Bat. 709, 91191 Gif-sur-Yvette, France
140
Institute of Cosmology and Gravitation, University of Portsmouth, Portsmouth, PO1 3FX, UK
141
Department of Computer Science, Aalto University, PO Box 15400 Espoo, FI-00 076, Finland
142
Instituto de Astrofísica de Canarias, c/ Via Lactea s/n, La Laguna 38200, Spain. Departamento de Astrofísica de la Universidad de La Laguna, Avda. Francisco Sanchez, La Laguna, 38200, Spain
143
Caltech/IPAC, 1200 E. California Blvd. Pasadena, CA 91125, USA
144
Ruhr University Bochum, Faculty of Physics and Astronomy, Astronomical Institute (AIRUB), German Centre for Cosmological Lensing (GCCL), 44780 Bochum, Germany
145
Department of Physics and Astronomy, Vesilinnantie 5, 20014 University of Turku, Finland
146
Serco for European Space Agency (ESA), Camino bajo del Castillo s/n Urbanizacion Villafranca del Castillo Villanueva de la Cañada, 28692, Madrid, Spain
147
ARC Centre of Excellence for Dark Matter Particle Physics, Melbourne, Australia
148
Centre for Astrophysics & Supercomputing, Swinburne University of Technology, Hawthorn, Victoria 3122, Australia
149
Department of Physics and Astronomy, University of the Western Cape, Bellville, Cape Town, 7535, South Africa
150
DAMTP, Centre for Mathematical Sciences, Wilberforce Road Cambridge, CB3 0WA, UK
151
Kavli Institute for Cosmology Cambridge, Madingley Road Cambridge, CB3 0HA, UK
152
Department of Astrophysics, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
153
Department of Physics, Centre for Extragalactic Astronomy, Durham University, South Road Durham, DH1 3LE, UK
154
IRFU, CEA, Université Paris-Saclay, 91191 Gif-sur-Yvette Cedex, France
155
Oskar Klein Centre for Cosmoparticle Physics, Department of Physics, Stockholm University, Stockholm, SE-106 91, Sweden
156
Astrophysics Group, Blackett Laboratory, Imperial College London, London, SW7 2AZ, UK
157
Univ. Grenoble Alpes, CNRS, Grenoble INP, LPSC-IN2P3, 53 Avenue des Martyrs, 38000 Grenoble, France
158
INAF-Osservatorio Astrofisico di Arcetri, Largo E. Fermi 5, 50125 Firenze, Italy
159
Dipartimento di Fisica, Sapienza Università di Roma, Piazzale Aldo Moro 2, 00185 Roma, Italy
160
Centro de Astrofísica da Universidade do Porto, Rua das Estrelas, 4150-762 Porto, Portugal
161
HE Space for European Space Agency (ESA), Camino bajo del Castillo s/n Urbanizacion Villafranca del Castillo Villanueva de la Cañada 28692 Madrid, Spain
162
Department of Astrophysical Sciences, Peyton Hall, Princeton University, Princeton, NJ 08544, USA
163
Theoretical astrophysics, Department of Physics and Astronomy, Uppsala University, Box 515, 751 20 Uppsala, Sweden
164
Mathematical Institute, University of Leiden, Einsteinweg 55, 2333 CA Leiden, The Netherlands
165
Institute of Astronomy, University of Cambridge, Madingley Road Cambridge, CB3 0HA, UK
166
Univ. Lille, CNRS, Centrale Lille, UMR 9189 CRIStAL, 59000 Lille, France
167
Space physics and astronomy research unit, University of Oulu, Pentti Kaiteran katu 1, FI-90014 Oulu, Finland
168
Center for Computational Astrophysics, Flatiron Institute, 162 5th Avenue, 10010 New York, NY, USA
Received: 6 May 2025 / Accepted: 23 November 2025
Abstract
Context. The Euclid galaxy survey is designed to measure the spectroscopic redshift of emission-line galaxies (ELGs) by identifying the Hα emission line in their slitless spectra. The efficacy of this approach crucially depends on the signal-to-noise ratio (S/N) of the line, as sometimes noise fluctuations in the spectrum continuum can be misidentified as Hα. In addition, other genuine strong emission lines can be mistaken for Hα, depending on the redshift of the source. Both effects lead to ambiguities in the redshift measurement that can result in catastrophic redshift errors and the inclusion of ‘interloper’ galaxies in the sample.
Aims. This paper forecasts the impact on the galaxy clustering analysis of the expected redshift errors in the Euclid spectroscopic sample. Specifically, it investigates the effect of the redshift interloper contamination on the galaxy two-point correlation function (2PCF) and, in turn, on the inferred growth rate of structure fσ8 and Alcock–Paczynski (AP) parameters α∥ and α⊥.
Methods. This work is based on the analysis of 1000 synthetic spectroscopic catalogues, the EuclidLargeMocks, which mimic the area and selection function of the Euclid Data Release 1 (DR1) sample. We estimated the 2PCF of contaminated catalogues and separated the different contributions, particularly those coming from galaxies with correctly measured redshift and from contaminants. We explored different models of increasing complexity to describe the measured 2PCF at a fixed cosmology, with the aim of identifying the most efficient model to reproduce the data. Finally, we performed a cosmological inference and evaluated the systematic error on the inferred fσ8, α∥, and α⊥ values associated with different models.
Results. Our results demonstrate that a minimal modelling approach, which only accounts for an attenuation of the clustering signal regardless of the type of contaminants, is sufficient to recover the correct values of fσ8, α∥, and α⊥ at DR1. The accuracy and precision of the estimated AP parameters are largely insensitive to the presence of interlopers. The adoption of a minimal model induces a 1%–3% systematic error on the estimated growth rate of structure, depending on the redshift considered. However, this error remains smaller than the statistical error expected for the Euclid DR1 analysis.
Key words: methods: observational / methods: statistical / techniques: spectroscopic / telescopes / cosmology: observations / large-scale structure of Universe
© The Authors 2026
Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This article is published in open access under the Subscribe to Open model.
1. Introduction
Galaxy surveys aim to map the large-scale structure of the Universe using galaxies as tracers of the underlying matter distribution to infer the cosmological model. One of the largest surveys is being conducted by the Euclid space mission (Euclid Collaboration: Mellier et al. 2025), which was launched by the European Space Agency (ESA) on 1 July 2023. Its primary goal is to probe the expansion history of the Universe and the evolution of cosmic structures over the past ten billion years and thereby to constrain, indirectly, the nature of its two dominant components: dark matter and dark energy. The Euclid satellite uses slitless spectroscopy with the Near-Infrared Spectrometer and Photometer (NISP, Euclid Collaboration: Jahnke et al. 2025) to measure the redshifts of tens of millions of galaxies and create one of the largest and most detailed three-dimensional maps of the Universe. The redshift of the observed galaxies is primarily determined from the position of the strongest emission lines in their spectra, in particular the Hα line. Since the measured redshift of a galaxy is used to estimate its radial distance, systematic errors in the redshift determination can introduce contaminants into the spectroscopic sample and ultimately alter the observed spatial distribution of galaxies.
The slitless spectroscopy used in Euclid implies that the observed spectra will generally have a lower resolution, and suffer more contamination from adjacent objects, than slit or fibre spectra. This leads not only to larger redshift measurement uncertainties but also to systematically wrong redshift determinations. In the latter case, the redshift error can be several orders of magnitude larger than the statistical uncertainty targeted by the experiment, Δz ∼ 0.001 (Euclid Collaboration: Mellier et al. 2025). Euclid, like the upcoming NASA Nancy Grace Roman Space Telescope¹, has a medium-low spectral resolution of R = λ/Δλ < 1000 and a limited bandwidth, which leaves room for ambiguity in emission-line identification. Moreover, in order to measure the redshift of millions of galaxies, emission lines are detected at an S/N that is typically lower than the threshold adopted in targeted ground-based spectroscopic surveys (such as DESI, Levi et al. 2019) and just sufficient to determine the redshift from a single prominent emission line. As a result, a non-negligible fraction of the objects in the Euclid spectroscopic catalogue will be interloper galaxies, that is, galaxies whose estimated redshift has a catastrophic error. For these objects, the detected line is not the expected one but either another emission line or a particularly prominent noise spike. This can affect the clustering statistics and, in turn, the cosmological parameters derived from them.
The impact of redshift interlopers has been studied in previous works. Pullen et al. (2015) introduced the formalism to model the galaxy power spectrum in the presence of interlopers. Addison et al. (2019) adopt the same formalism to forecast the impact of interloper galaxies on the baryon acoustic oscillations (BAO) and redshift-space distortion (RSD) analysis of future spectroscopic surveys targeting emission-line galaxies (ELGs). Foroozan et al. (2022) and Nguyen et al. (2024) present two-point correlation function (2PCF) models in the presence of small-displacement interlopers and assess the performance of their methods in recovering unbiased estimates of the BAO parameters. Hilmi et al. (2024) present an analysis of contamination in high-redshift Lyman-break galaxy samples by studying their spatial correlation with intermediate-redshift galaxies. Furthermore, methods to mitigate the impact of interlopers have been studied in recent years using simulations (Farrow et al. 2021; Euclid Collaboration: Blanchard et al. 2020; Peng & Yu 2023).
Within the context of the Euclid preparation, Euclid Collaboration: Monaco et al. (in prep.) describe the strategy to identify all potential sources of data systematics in the pipeline for the spectroscopic data analysis. This paper focuses on assessing the impact of redshift errors on the 2PCF measurements and on the configuration-space galaxy clustering analysis of Euclid DR1. The counterpart to this study in Fourier space is described in the companion paper, Euclid Collaboration: Lee et al. (in prep.).
Assessing the impact of redshift errors in Euclid requires the consideration of realistic types and fractions of redshift interlopers. The Euclid Consortium has released a suite of 1000 mock catalogues, named EuclidLargeMocks (Euclid Collaboration: Monaco et al. 2025), which currently offers the best balance between robust statistical power and a realistic modelling of selection effects. This suite effectively mimics the anticipated types and proportions of interlopers in the Euclid Wide Survey (EWS), and we adopt it for this analysis. Using these catalogues, we study how the assumption of an incomplete model for the measured 2PCF in the presence of interlopers affects the cosmological parameter estimates. We focus on the growth rate of structures and on the Alcock–Paczynski (AP) parameters, and conduct a separate analysis for each case. Although both analyses are based on the same set of measurements, they rely on fundamentally different theoretical models for the 2PCF and target distinct ranges of scales. For this reason, we chose to separate the analyses and present the methodology and results in distinct sections.
The paper is structured as follows. In Sect. 2, we introduce the Euclid mission and the types of interloper galaxies that we expect to find in the spectroscopic catalogue. We provide a quantitative assessment of these contaminants and of their effect on the galaxy clustering 2PCF. In Sect. 3, we present the estimator of the 2PCF and the predicted 2PCF in the presence of interlopers. In Sect. 4, we describe the mock catalogues and 2PCF measurements. In Sect. 5, we evaluate the amplitude and relevance of the interloper galaxy contributions to the measured 2PCF. In Sect. 6, we perform a Markov chain Monte Carlo (MCMC) analysis of the full shape of the 2PCF and study how constraints on the growth rate, fσ8, change when using theoretical models of varying complexity to describe the contaminated signal. In Sect. 7, we focus on the modelling of the BAO signal and study the bias on the derived AP parameters induced by adopting an inadequate model that does not account for the presence of interlopers. In Sect. 8, we discuss the results and draw our conclusions.
2. Interloper galaxies in the Euclid mission
The Euclid mission anchors the determination of galaxy redshifts to the detection of the Hα line, the most intense emission line expected in the rest-frame optical and near-infrared spectrum of an ELG. This detection is carried out by the NISP instrument, which during the EWS covers the wavelength range 1206–1892 nm, enabling the detection of the Hα line in the redshift range 0.84 ≤ z ≤ 1.88 (Euclid Collaboration: Mellier et al. 2025). To maximize the number of observed galaxies in a given exposure time, NISP performs slitless spectroscopy, capturing the spectra of all objects in the telescope field of view. However, this strategy results in a medium-low spectral resolution (R > 480, Euclid Collaboration: Jahnke et al. 2025). As a result, the Hα line and the N II λλ6549,6584 doublet are blended into a single emission feature and cannot be separated at the signal-to-noise ratio (S/N) detection threshold adopted to select the Euclid spectroscopic sample (Euclid Collaboration: Scaramella et al. 2022). Moreover, the limited wavelength range and S/N of the spectra generally prevent the detection of multiple emission lines. Since in most cases we have to rely on a single-line detection to assign a redshift, interlopers inevitably enter the catalogues. When the measured spectrum has only one significant emission line, a prior on this line being Hα is used (Euclid Collaboration: Le Brun et al. 2026), since Hα is the most prominent expected emission line. Without additional spectral features, this assumption can produce an interloper.
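As a quick consistency check of the numbers quoted above, the redshift window over which a line of rest-frame wavelength λrest falls inside the NISP band follows from inverting 1 + z = λobs/λrest at the two band edges. The sketch below assumes the 1206–1892 nm coverage quoted in the text and an Hα rest-frame wavelength of 656.46 nm (an approximate vacuum value that is not given in the text); the function name is illustrative only.

```python
"""Redshift window over which an emission line falls in the NISP band.

Assumptions (illustrative, not the Euclid pipeline): the band edges are the
1206-1892 nm quoted in the text, and the H-alpha rest-frame wavelength is
taken as 656.46 nm (approximate vacuum value).
"""

NISP_BAND_NM = (1206.0, 1892.0)  # wavelength coverage quoted in the text
HALPHA_NM = 656.46               # assumed vacuum rest-frame wavelength


def visible_redshift_range(rest_nm: float, band_nm=NISP_BAND_NM) -> tuple[float, float]:
    """Return (z_min, z_max) such that rest_nm * (1 + z) lies inside band_nm."""
    lo, hi = band_nm
    # Invert 1 + z = lambda_obs / lambda_rest at both band edges.
    return lo / rest_nm - 1.0, hi / rest_nm - 1.0


if __name__ == "__main__":
    z_min, z_max = visible_redshift_range(HALPHA_NM)
    print(f"H-alpha visible for {z_min:.2f} <= z <= {z_max:.2f}")
```

Applied to other lines, the same function returns only the geometric window set by the band edges; it ignores line fluxes and the S/N threshold, so it is a rough guide rather than a detectability forecast.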
The relation between the true and measured redshifts for any type of galaxy, including interloper ones, can be derived from the redshift definition as
$$1 + z = \frac{\lambda_{\rm obs}}{\lambda_{\rm rest}}\,, \tag{1}$$
where λobs is the observed wavelength of the line and λrest is the expected rest-frame wavelength. When an observed feature is attributed to the wrong emission line, its wavelength is divided by the wrong rest-frame wavelength, and the relation becomes²
$$1 + z_{\rm meas} = \frac{\lambda_{\rm true}}{\lambda_{\rm wrong}}\,\left(1 + z_{\rm true}\right), \tag{2}$$
where λwrong is the rest-frame wavelength of the line assumed in the redshift measurement and λtrue is the rest-frame wavelength of the line actually emitted. For an interloper, the ratio λtrue/λwrong in Eq. (2) deviates significantly from unity.
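The mapping in Eq. (2), 1 + zmeas = (1 + ztrue) λtrue/λwrong, can be written as a one-line function. The sketch below is illustrative: the O III and S III rest-frame wavelengths are the values quoted in Sect. 2.2, while the Hα value of 6564.6 Å is an assumed vacuum wavelength. It also compares the resulting redshift error with the Δz ∼ 0.001 target quoted in the introduction.

```python
"""Measured redshift of a line interloper from Eq. (2).

Sketch only: rest-frame wavelengths in Angstrom are approximate, with
H-alpha taken as 6564.6 (assumed vacuum value) and O III / S III as
quoted in the text.
"""

REST_ANGSTROM = {"Halpha": 6564.6, "OIII": 5008.0, "SIII": 9531.0}


def measured_redshift(z_true: float, true_line: str, assumed_line: str = "Halpha") -> float:
    """Redshift obtained when a line emitted as `true_line` is read as `assumed_line`.

    Implements 1 + z_meas = (1 + z_true) * lambda_true / lambda_wrong.
    """
    ratio = REST_ANGSTROM[true_line] / REST_ANGSTROM[assumed_line]
    return (1.0 + z_true) * ratio - 1.0


if __name__ == "__main__":
    target_dz = 0.001  # statistical redshift uncertainty quoted in Sect. 1
    for line, z_true in [("OIII", 2.0), ("SIII", 0.5)]:
        z_meas = measured_redshift(z_true, line)
        print(f"{line}: z_true={z_true:.2f} -> z_meas={z_meas:.3f}, "
              f"|dz| / target = {abs(z_meas - z_true) / target_dz:.0f}")
```

For these two example redshifts the shift is several tenths in redshift, i.e. hundreds of times the Δz ∼ 0.001 target, which is what makes such errors catastrophic.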
2.1. Classification of interlopers
There are two possible ways of incorrectly identifying the Hα line, leading to two distinct types of interlopers:
- ‘Line interlopers’ are galaxies with genuine emission lines that are incorrectly identified as Hα. Apart from Hα, some other emission lines are sufficiently intense to be detected (see Euclid Collaboration: Granett et al., in prep., and Sect. 2.2). Each of these lines falls within the NISP wavelength range over a different redshift interval, leading to possible line misidentifications. The systematic error in the redshift estimate given by Eq. (2) is deterministic: it depends only on the ratio between the Hα wavelength and that of the misidentified line.
- ‘Noise interlopers’ consist of objects from the parent sample that enter the spectroscopic catalogue because an intense noise fluctuation in their low-S/N spectrum resembles an emission line. Typically, they are galaxies whose spectral features are either weak or located outside the wavelength range of the instrument. Stars with low-S/N spectra can also be mistaken for galaxies. All these objects have essentially featureless, noisy spectra. The misidentification results in a redshift estimate that is catastrophically different from the true redshift. Unlike for line interlopers, however, there is no one-to-one relation between the true and measured redshifts, since the detection is triggered by random spikes in the noisy spectra.
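In simulated catalogues, where the true redshift is known, the two classes above can be told apart by checking whether zmeas matches Eq. (2) for a known line. The helper below is a hypothetical illustration of that logic, not the classification used in the Euclid mocks: the tolerance on Δz/(1 + ztrue), the assumed Hα wavelength of 6564.6 Å, and the function name are all assumptions.

```python
"""Toy classification of a source in the (z_true, z_meas) plane.

Hypothetical helper for simulated catalogues: a source is 'correct' if
z_meas agrees with z_true within a tolerance, a line interloper if z_meas
matches Eq. (2) for a known line, and a noise interloper otherwise.
Wavelengths and tolerance are illustrative assumptions.
"""

HALPHA = 6564.6  # Angstrom, assumed vacuum value
INTERLOPER_LINES = {"OIII": 5008.0, "SIII": 9531.0}  # values quoted in the text


def classify(z_true: float, z_meas: float, tol: float = 0.01) -> str:
    """Return 'correct', the name of the misidentified line, or 'noise'."""
    # The tolerance is applied to dz / (1 + z_true), a fractional redshift error.
    if abs(z_meas - z_true) <= tol * (1.0 + z_true):
        return "correct"
    for name, rest in INTERLOPER_LINES.items():
        predicted = (1.0 + z_true) * rest / HALPHA - 1.0  # Eq. (2)
        if abs(z_meas - predicted) <= tol * (1.0 + z_true):
            return name
    return "noise"


if __name__ == "__main__":
    print(classify(1.20, 1.2005))  # same redshift within tolerance
    print(classify(2.00, 1.2886))  # consistent with O III read as H-alpha
    print(classify(0.10, 1.50))    # no line relation: noise interloper
```

A deterministic match to one of the tracks identifies a line interloper; anything off all tracks and off the diagonal is treated as noise, mirroring the distinction drawn in the list above.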
2.2. Foreseen Euclid interloper galaxies
To characterize the population of interloper galaxies expected in the Euclid spectroscopic sample, we make use of the Euclid redshift error baseline model derived from end-to-end simulations by Euclid Collaboration: Granett et al. (in prep.). These simulations rely on statistical tools that bypass the complexity of the Euclid spectroscopic data reduction pipeline, producing realistic, though approximate, data products in significantly less computational time. The redshift error model was calibrated using a set of simulated NISP spectra with noise characteristics mimicking those expected in the EWS. The spectra were constructed from the EL-COSMOS catalogue (Saito et al. 2020) with the Fastspec simulator (Euclid Collaboration: Granett et al., in prep.) and analysed by the OU-SPE³ processing function of the Euclid Science Ground Segment to measure spectral features and redshifts.
From these simulations, two emission lines were identified as the primary sources of redshift errors from line misidentification: O III λ5008 and S III λ9531 (hereafter O III and S III). The O III line is the brightest line in the Hβ λ4863-O III λλ4959,5008 complex. Its visibility range is about 1.5 < ztrue < 2.7. Within the range 1.5 < ztrue < 1.8, line misidentification is less likely, since both the Hα and O III lines are potentially detectable. Line misidentification can increase at ztrue > 1.8, where the Hα line can no longer be observed: in this range, any prominent O III line can be mistaken for Hα. Moving to higher redshifts, we expect the misidentification probability to decrease in general, simply because the number of observable sources decreases with redshift. Since the O III rest wavelength is shorter than that of Hα, O III interlopers are in reality further away than their estimated distance. The S III line is detectable by NISP over the redshift range 0.3 < ztrue < 0.94, and there is only a small redshift interval where both S III and Hα can be detected simultaneously. Since the S III rest wavelength is longer than that of Hα, S III interlopers are systematically placed further away than their actual distance. More details on the emission lines of interest for this study can be found in Euclid Collaboration: Gabarra et al. (2023).
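Assuming that Eq. (2) takes the usual form for a line of rest wavelength λline misidentified as Hα, 1 + zmeas = (1 + ztrue) λline/λHα, the mapping and its inverse can be sketched in a few lines of Python. The rest wavelengths below are approximate values assumed for illustration, and the function names are hypothetical. Mapping the baseline measured range 0.9 < zmeas < 1.8 back to true redshifts gives roughly 1.49 to 2.67 for O III and 0.31 to 0.93 for S III, in line with the visibility ranges quoted above; the ratio λline/λHα is the slope of each interloper track in the (ztrue, zmeas) plane of Fig. 1.

```python
# Assumed approximate rest-frame wavelengths in Angstrom (illustrative values).
LAM_HA = 6564.6
LAM_LINE = {"OIII": 5008.2, "SIII": 9531.0}


def z_meas_from_true(z_true, lam_line):
    """Redshift assigned when a line at lam_line is taken to be Halpha (assumed form of Eq. 2)."""
    return (1.0 + z_true) * lam_line / LAM_HA - 1.0


def z_true_from_meas(z_meas, lam_line):
    """Inverse mapping: true redshift of an interloper measured at z_meas."""
    return (1.0 + z_meas) * LAM_HA / lam_line - 1.0


if __name__ == "__main__":
    for name, lam in LAM_LINE.items():
        slope = lam / LAM_HA  # slope of the interloper track in the (z_true, z_meas) plane
        lo, hi = (z_true_from_meas(z, lam) for z in (0.9, 1.8))
        print(f"{name}: slope {slope:.3f}, z_true in [{lo:.2f}, {hi:.2f}] for 0.9 < z_meas < 1.8")
```

A slope below one (O III) means the measured redshift is lower than the true one, so the source is placed too close; a slope above one (S III) places it too far.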
In addition to line interlopers, we expect to observe noise interlopers. Given the diverse nature of possible noise interlopers and the inherent random process of detecting a noise line mistaken for Hα, we expect a fairly uniform distribution of these interlopers across different wavelengths and redshift. This is consistent with the fact that such noise interlopers can originate from virtually any true redshift.
Figure 1 offers a graphical representation of the Euclid interlopers’ properties just described. It shows the expected distribution of the measured redshifts zmeas of galaxies versus their true redshifts ztrue, and highlights the off-diagonal location of all types of interlopers. The bisector line corresponds to the ‘correct galaxies’, i.e. those galaxies whose redshift was correctly measured within the instrumental uncertainty. The coloured tracks correspond to line interlopers (identified by their labels). When represented in the (ztrue, zmeas) plane, line interlopers lie along straight lines with slope different from one, whose characteristic value is determined by Eq. (2). The shaded blue distribution in the background consists of noise interlopers. The lack of correlation in their detection randomizes the positions of the noise interlopers in the (ztrue, zmeas) plane, forming a diffuse cloud of points.
Fig. 1. Representation of interloper galaxies in the (ztrue, zmeas) plane, representative of the Euclid spectroscopic selection. The extent of the vertical axis corresponds to the baseline observed redshift range used for the spectroscopic analysis. The plot refers to one of the EuclidLargeMocks (see Euclid Collaboration: Monaco et al. 2025, and Sect. 4.1).
2.3. Impact of interlopers on the galaxy density contrast
To estimate galaxy clustering properties, we first need to compute the comoving positions of all galaxies from their measured redshifts. For an interloper galaxy, the estimated comoving position x differs from the true one y because of the radial displacement caused by the incorrect redshift determination (Pullen et al. 2015). Quantitatively, we can relate the observed and true positions of interlopers geometrically via
(3)
where x∥, y∥, x⊥, and y⊥ are the radial and transverse components of the position vectors, and
(4)
(5)
This is a geometrical dilation whose expression is analogous to AP distortions but which depends on different quantities: the ratio between the comoving transverse distances, DA, and the Hubble parameters, H, at the true and observed redshifts.
For line interlopers, the value of the γ parameters is well defined, since the relation between the observed and true redshifts is deterministic (see Eq. (2)). The values of γ∥ and γ⊥ are larger or smaller than unity depending on whether line interlopers are located at redshifts higher or lower than that of the correct galaxies. If the interloper redshift is higher, the estimated separation between any two line interlopers is smaller than the true one, while, if the redshift is lower, the estimated separations are overestimated. The parameters γ∥ and γ⊥ quantify this effect for the parallel and perpendicular components of the separation vectors, respectively. Table 1 reports the reference values of these parameters for the Euclid survey, assuming the cosmological model given in Table 2. Unlike for line interlopers, the relation between the true and observed redshifts of noise interlopers is not unique. As a result, no single γ value can be associated with this type of interloper. Instead, these parameters vary according to a probability distribution function, which is in principle independent of the true source position. By extending the formalism introduced by Pullen et al. (2015), the galaxy density contrast δ at an observed comoving position x in the presence of both line and noise interlopers can be approximated as
Table 1. Geometric distortion factors γ for misidentified S III and O III galaxies in the four baseline Euclid spectroscopic bins.
(6)
where the subscript ‘c’ stands for ‘correct’ galaxies (defined at the end of Sect. 2.2), ‘n’ stands for noise interlopers, the sum runs over all types ‘i’ of line interlopers, ftot is the total fraction of contaminants accounting for all types of interlopers, 𝒫n is the joint probability distribution function of γ∥ and γ⊥ for noise interlopers, and fn is the fraction of noise interlopers. The fraction of each population is defined as the ratio of the number of galaxies of that type to the total number of galaxies in the sample:
$$f_{\rm X} = \frac{N_{\rm X}}{N_{\rm tot}}, \qquad {\rm X} \in \{{\rm c},\, {\rm n},\, i\}. \tag{7}$$
By definition, we have that
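$$f_{\rm c} + f_{\rm n} + \sum_i f_i = 1, \qquad f_{\rm tot} \equiv f_{\rm n} + \sum_i f_i = 1 - f_{\rm c}. \tag{8}$$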
Equation (6) shows that the density contrast measured in the presence of interlopers is diluted compared to the one measured on a catalogue made only of correct galaxies. The attenuation of the signal is proportional to the total contamination fraction. This can be intuitively understood by considering the extreme case of noise interlopers, which are clustered objects randomly displaced along the line of sight, thereby leading to a smoothed version of the original galaxy density contrast.
The contributions of interloper galaxies are weighted by the fraction of each interloper type in the catalogue. We expect the contaminant terms to be subdominant, and one of the Euclid survey requirements is to keep the interloper contamination fraction below 20% (Euclid Collaboration: Mellier et al. 2025). Yet, these terms inevitably modify the measured clustering statistics, since interlopers have their own clustering properties. We show the effect on the 2PCF of the types and fractions of interlopers expected for Euclid in Sect. 5.
2.4. Interloper fractions and associated clustering properties
The EWS described in Euclid Collaboration: Scaramella et al. (2022) is complemented by periodic deeper observations of a smaller area, which constitute the Euclid Deep Survey (EDS). The EDS will be used to accurately characterise the typical EWS galaxy population, as EDS fields are meant to provide a 99% complete and 99% pure spectroscopic sample at the depth of the EWS, thanks to the high cumulative exposure time that will be reached over the mission (Euclid Collaboration: Mellier et al. 2025). By the end of the survey, the EDS will span an area of 53 deg² and will have been observed with both the blue and red grisms (Euclid Collaboration: Mellier et al. 2025). The EDS will enable us to measure the spectra of observed galaxies with a higher S/N than the shallower exposures of the EWS. By comparing the same fields, first observed at EDS depth and then in the EWS, we can identify and characterise all interlopers included in the contaminated EWS sample, as well as their redshift distributions. No noise interlopers are expected in EDS observations, given the greater depth and higher spectral S/N. Similarly, line interlopers should not be present in the EDS: thanks to the higher S/N, more than one line can be detected, which leads, for instance, to an unambiguous identification of the O III or S III line in the spectra.
3. Estimated 2PCF in the presence of interlopers
The galaxy clustering analysis in configuration space in Euclid will use the galaxy 2PCF, estimated with the Landy–Szalay (LS) estimator (Landy & Szalay 1993). This estimator is obtained by first defining a catalogue overdensity, namely the fractional difference between the data and random catalogue counts, and then taking its auto-correlation. The random catalogue, which comprises randomly distributed points within the survey volume, maps the geometry and selection function of the survey. Schematically, the overdensity in galaxy counts at any position x is
$$\delta(\mathbf{x}) = \frac{D(\mathbf{x}) - R(\mathbf{x})}{R(\mathbf{x})}, \tag{9}$$
and leads to the auto-correlation estimator
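$$\xi(\mathbf{r}) = \frac{DD(\mathbf{r}) - 2\,DR(\mathbf{r}) + RR(\mathbf{r})}{RR(\mathbf{r})}, \tag{10}$$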
where D(x) and R(x) are data and random catalogue counts at position x, and DD(r), DR(r), RR(r) are respectively the normalized data-data, data-random and random-random pair counts as a function of the pair separation vector r. The normalization of pair counts originates from the fact that the random catalogue contains a much larger number of objects than the data catalogue, such that
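$$DD(\mathbf{r}) = \frac{dd(\mathbf{r})}{N_D(N_D-1)/2}, \tag{11}$$
$$DR(\mathbf{r}) = \frac{dr(\mathbf{r})}{N_D N_R}, \tag{12}$$
$$RR(\mathbf{r}) = \frac{rr(\mathbf{r})}{N_R(N_R-1)/2}, \tag{13}$$
where dd(r), dr(r), and rr(r) are raw pair counts, and ND and NR are the total numbers of objects in the data and random catalogues, respectively.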
Similarly, by defining the overdensities of two populations, δ1 = (D1 − R1)/R1 and δ2 = (D2 − R2)/R2, where D1 (D2) and R1 (R2) now stand for the data and random catalogue counts of population 1 (2), we obtain the two-point cross-correlation function estimator
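$$\xi_{12}(\mathbf{r}) = \frac{D_1D_2(\mathbf{r}) - D_1R_2(\mathbf{r}) - R_1D_2(\mathbf{r}) + R_1R_2(\mathbf{r})}{R_1R_2(\mathbf{r})}, \tag{14}$$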
where D1D2, D1R2, R1D2, and R1R2 are the data 1-data 2, data 1-random 2, random 1-data 2 and random 1-random 2 normalised pair counts, respectively.
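A brute-force Python sketch of the auto-correlation estimator of Eq. (10) and the cross-correlation estimator of Eq. (14), built from normalized pair counts, may help fix ideas. Assumptions: points are Cartesian comoving positions, auto pair counts are normalized by the N(N − 1)/2 unordered pairs and cross counts by N1N2, and all function names are illustrative. Memory grows as the product of the catalogue sizes, so this is only suitable for toy catalogues; real analyses rely on optimized pair-counting codes.

```python
import numpy as np


def pair_hist(a, b, edges, same):
    """Raw pair counts per separation bin (brute force: small catalogues only).

    With same=True, each unordered pair of distinct points is counted once.
    """
    diff = a[:, None, :] - b[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    if same:
        dist = dist[np.triu_indices(len(a), k=1)]
    return np.histogram(dist, bins=edges)[0].astype(float)


def xi_auto_ls(data, rand, edges):
    """Landy-Szalay auto-correlation from normalized pair counts."""
    nd, nr = len(data), len(rand)
    dd = pair_hist(data, data, edges, True) / (nd * (nd - 1) / 2)
    dr = pair_hist(data, rand, edges, False) / (nd * nr)
    rr = pair_hist(rand, rand, edges, True) / (nr * (nr - 1) / 2)
    return (dd - 2.0 * dr + rr) / rr


def xi_cross_ls(d1, r1, d2, r2, edges):
    """Cross-correlation estimator for two populations, as in Eq. (14)."""
    def norm(a, b):
        return pair_hist(a, b, edges, False) / (len(a) * len(b))
    return (norm(d1, d2) - norm(d1, r2) - norm(r1, d2) + norm(r1, r2)) / norm(r1, r2)


rng = np.random.default_rng(1)
pts = rng.uniform(0.0, 100.0, size=(400, 3))
edges = np.linspace(5.0, 50.0, 10)
# Using the same points as data and randoms isolates the estimator's
# finite-N offset: the LS auto-correlation then equals exactly 2/N.
xi_self = xi_auto_ls(pts, pts, edges)
```

The 2/N offset arises because the DR term includes ordered pairs among N points normalized by N², while DD and RR use N(N − 1)/2; it vanishes for large catalogues.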
We now consider the measured 2PCF ξm obtained by applying the auto-correlation estimator to a data catalogue containing redshift interlopers. We can decompose the contaminated data and random catalogue counts into three components, according to the three classes of redshifts, by writing
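$$D_{\rm m} = D_{\rm c} + D_\ell + D_{\rm n}, \tag{15}$$
$$R_{\rm m} = R_{\rm c} + R_\ell + R_{\rm n}, \tag{16}$$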
where we considered only one type of line interloper for simplicity, although the generalization to more than one is straightforward. The subscripts m, c, ℓ, and n refer to the measured (i.e. all observed objects), correct, line interloper, and noise interloper populations, respectively. In the random catalogues associated with the correct, line interloper, and noise interloper populations, the radial distributions follow those of the correct, line interloper, and noise interloper galaxies, respectively. Here, we consider a simplified case where the only systematic in the data is redshift error, with no angular mask applied. This matches the configuration of the mock catalogues used in this work and is equivalent to assuming that radial and angular systematics can be treated independently. In this context, the angular distribution of the random points is taken to be uniform across the survey area. It is worth noting that the selection function of the Euclid spectroscopic catalogue, based on a forward-modelling approach, does not rely on this assumption, whose validity needs to be verified with the real data. The impact of a realistic angular mask on clustering statistics is investigated in Monaco et al. (in prep.) and will be the subject of dedicated Euclid papers prepared in light of the first real data.
If we now define the overdensity associated with the total contaminated catalogue δm = (Dm − Rm)/Rm, the expression for the associated auto-correlation estimator is
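$$\xi_{\rm m}(\mathbf{r}) = f_{\rm c}^2\frac{R_{\rm c}R_{\rm c}}{R_{\rm m}R_{\rm m}}\xi_{\rm cc} + f_\ell^2\frac{R_\ell R_\ell}{R_{\rm m}R_{\rm m}}\xi_{\ell\ell} + f_{\rm n}^2\frac{R_{\rm n}R_{\rm n}}{R_{\rm m}R_{\rm m}}\xi_{\rm nn} + 2f_{\rm c}f_\ell\frac{R_{\rm c}R_\ell}{R_{\rm m}R_{\rm m}}\xi_{{\rm c}\ell} + 2f_{\rm c}f_{\rm n}\frac{R_{\rm c}R_{\rm n}}{R_{\rm m}R_{\rm m}}\xi_{\rm cn} + 2f_\ell f_{\rm n}\frac{R_\ell R_{\rm n}}{R_{\rm m}R_{\rm m}}\xi_{\ell{\rm n}}, \tag{17}$$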
where we identified ξcc, ξℓℓ, and ξnn as the auto-correlation functions of the correct, line interloper, and noise interloper populations, respectively, and ξcℓ, ξcn, and ξℓn as the correct-line interloper, correct-noise interloper, and line-noise interloper cross-correlation functions, respectively. It is worth emphasising that, except for ξcc, all correlation functions on the right-hand side of Eq. (17) are observed 2PCFs and not intrinsic ones, since they quantify the spatial correlation of misplaced objects.
On the right-hand side of Eq. (17), the random-random pair counts RiRj, where i, j ∈ {m, c, ℓ, n}, correspond to the (normalized) random-random cross pairs associated with the different populations. They form ratios that multiply the different terms and, in turn, add a further scale dependence to ξm(r). In those ratios, the pair counts in the numerator and denominator differ only in the radial distribution of the associated random catalogues. Under the hypothesis of a mild difference in the observed radial distributions of the different sub-populations, the ratios of random-random pairs should tend to unity, and Eq. (17) can be approximated by
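$$\xi_{\rm m}(\mathbf{r}) \simeq f_{\rm c}^2\,\xi_{\rm cc} + f_\ell^2\,\xi_{\ell\ell} + f_{\rm n}^2\,\xi_{\rm nn} + 2f_{\rm c}f_\ell\,\xi_{{\rm c}\ell} + 2f_{\rm c}f_{\rm n}\,\xi_{\rm cn} + 2f_\ell f_{\rm n}\,\xi_{\ell{\rm n}}. \tag{18}$$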
The validity of this hypothesis in our reference mock catalogues is tested in Sect. 6.2, where we directly assess the performance of a model that ignores the radial dependence of the prefactors. As shown in Appendix A, this dependence can be significant for certain types of interlopers (e.g. O III) in specific redshift ranges. Nevertheless, the overall impact remains negligible owing to the small amplitude of the corresponding prefactor. If more than one population of line interlopers contaminates the catalogue, then Eqs. (17) and (18) will include all corresponding auto-correlation functions and the cross-correlations with all other types of objects included in the sample.
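To make the bookkeeping of Eqs. (17) and (18) concrete, the Python sketch below combines component correlation functions into the contaminated one. It assumes that each term ξXY is weighted by fX fY, with a factor of 2 for cross terms, and that optional random-random ratios RXRY/RmRm multiply the corresponding terms; omitting those ratios sets them to unity, as in the approximation of Eq. (18). The function name, dictionary layout, and toy power-law ξcc are hypothetical. Leaving a pair out of the input dictionary drops that term, which is how the simplified models of Sect. 5.2 can be mimicked.

```python
from itertools import combinations_with_replacement

import numpy as np


def xi_contaminated(fractions, xi_terms, rr_ratios=None):
    """Hypothetical helper: combine component 2PCFs into the contaminated one.

    fractions : dict, population label -> fraction f_X (assumed to sum to 1)
    xi_terms  : dict, (X, Y) -> array xi_XY(r); absent pairs are treated as
                neglected terms
    rr_ratios : optional dict, same keys as xi_terms -> R_X R_Y / R_m R_m;
                if None, all ratios are set to 1 (Eq. 18-like approximation)
    """
    total = 0.0
    for a, b in combinations_with_replacement(sorted(fractions), 2):
        key = (a, b) if (a, b) in xi_terms else (b, a)
        if key not in xi_terms:
            continue  # term left out of the model
        weight = fractions[a] * fractions[b]
        if a != b:
            weight *= 2.0  # XY and YX cross terms contribute equally
        ratio = 1.0 if rr_ratios is None else rr_ratios.get(key, 1.0)
        total = total + weight * ratio * np.asarray(xi_terms[key], dtype=float)
    return total


# Toy inputs: centres of 5 h^-1 Mpc bins up to 200 h^-1 Mpc and an arbitrary
# power law standing in for xi_cc (not a Euclid prediction).
r = np.arange(2.5, 200.0, 5.0)
xi_cc = (r / 5.0) ** -1.8
fractions = {"c": 0.8, "l": 0.1, "n": 0.1}
xi_m = xi_contaminated(fractions, {("c", "c"): xi_cc})  # equals 0.64 * xi_cc
```

With a 20% total contamination, the correct-galaxy term alone is damped by (1 − 0.2)² = 0.64, which illustrates the dilution discussed for Eq. (6).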
Evaluating the prefactors in Eq. (17) requires building three random catalogues, whose points radially sample the redshift distributions of the correct Nc(z), line interloper Nℓ(z), and noise interloper Nn(z) populations. While the random catalogue of the contaminated sample can be generated using the observed redshift distribution of the objects in the EWS, generating the random catalogue of each object type is less trivial. These could be either modelled or measured from the samples of interlopers identified in the EDS.
In light of these expectations for the measured 2PCF in the presence of redshift interlopers, the goals of this study are twofold: (1) to assess the amplitude of each term on the right-hand side of Eq. (17) relative to the total signal, and the relevance of the scale-dependent prefactors; (2) to test our ability to constrain cosmological parameters with a theoretical model that includes only a subset of those terms.
4. Simulated datasets
4.1. EuclidLargeMocks and contamination strategy
We based our analysis on a set of 1000 Euclid-like simulated mock catalogues, dubbed EuclidLargeMocks (Euclid Collaboration: Monaco et al. 2025), which were extracted from a suite of numerical simulations relying on approximate perturbative techniques (Monaco et al. 2002; Munari et al. 2017). We list in Table 2 the cosmological parameters used to set up those simulations. The galaxy catalogues extracted from these simulations are lightcones whose angular footprint on the sky is a circle of radius 30°, spanning the redshift range 0 < ztrue < 3. The area of the cone, 2763 deg², is slightly larger than the 2500 deg² expected for the first Data Release (DR1) of the EWS. Moreover, the angular footprint almost encompasses the north and south extents of the DR1 footprint, as planned before launch. These catalogues provide a minimal amount of information for each galaxy: sky coordinates, true redshift including peculiar velocities, and Hα line flux. The catalogues are limited to fHα > 10⁻¹⁶ erg s⁻¹ cm⁻², that is, half of the fiducial flux limit of the Euclid spectroscopic sample. This is because the transition from high to vanishing completeness is not expected to be sharp, so the sample will contain a sizeable fraction of galaxies below the fiducial limit (Euclid Collaboration: Granett et al., in prep.).
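As a quick consistency check of the quoted footprint, the solid angle of a spherical cap of angular radius θ is Ω = 2π(1 − cos θ) sr. The short Python sketch below (the function name is ours, purely illustrative) converts it to square degrees for θ = 30°.

```python
import math


def cap_area_deg2(radius_deg):
    """Area of a spherical cap of the given angular radius, in deg^2."""
    omega_sr = 2.0 * math.pi * (1.0 - math.cos(math.radians(radius_deg)))
    return omega_sr * (180.0 / math.pi) ** 2  # steradians -> square degrees


area = cap_area_deg2(30.0)
print(f"{area:.0f} deg^2")  # 2763 deg^2, about 10% more than the 2500 deg^2 of DR1
```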
Table 2. Cosmological parameters that define the flat ΛCDM cosmology used to perform the EuclidLargeMocks parent simulations.
The measured redshifts have been added to the catalogues using a probabilistic model calibrated on an end-to-end simulation of observations (Euclid Collaboration: Granett et al., in prep.). This pixel-level simulation of 1D spectra was produced with the FastSpec simulator, processed with the OU-SPE processing function, and eventually used to model the conditional probability distribution function (PDF) P(zmeas|ztrue) of the measured redshift zmeas given the true one ztrue. This probability is modelled as a mixture of Gaussian PDFs with standard deviation σ0,z = 0.001 for correct galaxies and line interlopers (suitably rescaled for line interlopers), and a broad distribution for noise interlopers. The implementation in the EuclidLargeMocks relies on computing P(zmeas|ztrue) at the true redshift of each galaxy and randomly sampling the distribution to obtain zmeas. Galaxies for which |zmeas − ztrue| < 3σ0,z are tagged as correct galaxies, while galaxies whose redshift is within 3σ0,z of the redshift corresponding to a line interloper are tagged as such. The remaining galaxies are tagged as noise interlopers. A close inspection of the redshift PDF reveals that the PDF of noise interlopers overlaps with those of correct galaxies and line interlopers. In particular, at a given ztrue, the probability of having noise-interloper redshifts within 5σ0,z of ztrue is not completely negligible. This contribution could be removed by a more permissive separation of correct galaxies and noise interlopers. However, such an approach would make it impossible to separate truly correct galaxies from noise interlopers that happen to have a roughly correct redshift by chance.
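The tagging rule described above can be sketched in Python as follows. This is an illustration under assumptions, not the EuclidLargeMocks implementation: the rest wavelengths, helper names, and string labels are hypothetical, and the line-interloper redshift is assumed to follow the usual misidentification relation 1 + z = (1 + ztrue) λline/λHα. Correct galaxies take precedence when a redshift satisfies more than one criterion.

```python
import numpy as np

# Assumed approximate rest-frame wavelengths in Angstrom (illustrative values).
LAM_HA = 6564.6
LAM_LINE = {"OIII": 5008.2, "SIII": 9531.0}
SIGMA_Z = 0.001  # sigma_{0,z} quoted in the text


def z_line_as_halpha(z_true, lam_line):
    """Redshift obtained if a line at lam_line is mistaken for Halpha."""
    return (1.0 + z_true) * lam_line / LAM_HA - 1.0


def tag_galaxies(z_true, z_meas, n_sigma=3.0):
    """Hypothetical tagging: 'correct', a line-interloper label, or 'noise'."""
    z_true = np.asarray(z_true, dtype=float)
    z_meas = np.asarray(z_meas, dtype=float)
    tags = np.full(z_true.shape, "noise", dtype=object)
    tol = n_sigma * SIGMA_Z
    for name, lam in LAM_LINE.items():
        near_line = np.abs(z_meas - z_line_as_halpha(z_true, lam)) < tol
        tags[near_line] = name
    # Correct galaxies take precedence if both criteria are met.
    tags[np.abs(z_meas - z_true) < tol] = "correct"
    return tags


z_true = np.array([1.0, 2.0, 0.6, 1.2])
z_meas = np.array([
    1.0005,                                   # within 3 sigma of z_true
    z_line_as_halpha(2.0, LAM_LINE["OIII"]),  # O III read as Halpha
    z_line_as_halpha(0.6, LAM_LINE["SIII"]),  # S III read as Halpha
    1.5,                                      # unrelated redshift
])
print(list(tag_galaxies(z_true, z_meas)))  # ['correct', 'OIII', 'SIII', 'noise']
```

Because the noise label is whatever is left over, a noise interloper that lands by chance within 3σ0,z of its true redshift is tagged as correct, which is exactly the overlap discussed in the text.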
It is worth noting that, in our implementation, all types of interlopers are drawn from a parent sample of ELGs at z < 3. In reality, noise interlopers should be drawn from the photometric sample of Euclid galaxies, which are expected to be fainter and therefore less clustered than the brighter ELGs. As a result, drawing noise interlopers from an ELG parent sample overestimates their clustering amplitude and exaggerates their impact on the clustering analysis. This choice, however, provides a deliberately pessimistic scenario to stress-test our interloper models. Finally, this choice does not account for the small fraction of stars that are not effectively separated from galaxies and acquire a redshift by chance.
Table 3 lists the mean fractions of contaminants in the EuclidLargeMocks for all considered spectroscopic redshift bins. The variation with redshift of the fractions of the different types of interlopers is determined by the visibility range of the corresponding emission line within the NISP wavelength range (see Sect. 2). We discuss later how these differences affect the clustering analysis.
Table 3. Mean fractions of the different interloper types in the four spectroscopic redshift bins in the contaminated EuclidLargeMocks.
4.2. Random catalogues
We used a single set of random catalogues (i.e. one random catalogue for each type of galaxy) to characterize the selection function of the sample and compute the 2PCF for all mocks. The radial distribution of random points has been generated by sampling the redshift distribution averaged over the first 100 mocks, in order to smooth out radial fluctuations across individual mock catalogues (for details, see Euclid Collaboration: Lee et al., in prep.). The requirements for Euclid 2PCF estimation impose that random catalogues be at least 50 times larger than the corresponding galaxy catalogue to minimize the estimator variance (Euclid Collaboration: de la Torre et al. 2025). Since this applies to the random catalogue of each population, we created random catalogues with
objects, where
is the mean number of sources for each galaxy type i averaged across the first 100 mocks. The random catalogue of the contaminated sample is then obtained by combining the random catalogues of the single populations: correct galaxies, noise interlopers, and the various types of line interlopers.
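A minimal Python sketch of how one population's random catalogue could be drawn, assuming a binned mean redshift distribution (for instance averaged over mocks), a size 50 times the mean number of galaxies, and angular positions uniform within the 30° cap, consistent with the absence of an angular mask. The function name, its arguments, and the toy n(z) shapes are hypothetical. The random catalogue of the contaminated sample is then the concatenation of the per-population outputs.

```python
import numpy as np


def draw_randoms(z_edges, mean_counts, n_gal_mean, factor=50, cap_radius_deg=30.0, rng=None):
    """Hypothetical helper: random points for one galaxy population.

    Redshifts follow the binned mean n(z) (uniform within each bin); angular
    positions are uniform on a spherical cap centred on the pole.
    """
    rng = np.random.default_rng() if rng is None else rng
    z_edges = np.asarray(z_edges, dtype=float)
    n_rand = int(np.ceil(factor * n_gal_mean))
    prob = np.asarray(mean_counts, dtype=float)
    prob = prob / prob.sum()
    idx = rng.choice(prob.size, size=n_rand, p=prob)  # pick a redshift bin
    z = rng.uniform(z_edges[idx], z_edges[idx + 1])  # uniform within the bin
    # Uniform on the sphere means cos(theta) is uniform in [cos(R), 1].
    cos_theta = rng.uniform(np.cos(np.radians(cap_radius_deg)), 1.0, n_rand)
    theta_deg = np.degrees(np.arccos(cos_theta))
    phi_deg = rng.uniform(0.0, 360.0, n_rand)
    return z, theta_deg, phi_deg


rng = np.random.default_rng(42)
edges = np.linspace(0.9, 1.8, 10)
z_c, th_c, ph_c = draw_randoms(edges, np.arange(9, 0, -1), n_gal_mean=1000, rng=rng)
z_n, th_n, ph_n = draw_randoms(edges, np.ones(9), n_gal_mean=100, rng=rng)
z_m = np.concatenate([z_c, z_n])  # random catalogue of the combined sample
```

Building each population's random separately keeps the random-to-data ratio equal across populations, so the combined random inherits the mixture of radial distributions of the contaminated sample.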
4.3. 2PCF estimation
In order to estimate the 2PCF, we made use of the methodology and software developed for estimating the three-dimensional 2PCF within the Euclid Science Ground Segment (Euclid Collaboration: de la Torre et al. 2025). The latter uses the minimum-variance LS estimator and enables the use of the random split method (Keihänen et al. 2019) to speed up the computation. We evaluated the monopole, quadrupole, and hexadecapole moments of the anisotropic 2PCF using 40 equally spaced bins in separation r ∈ [0, 200] h⁻¹ Mpc (Δr = 5 h⁻¹ Mpc) and 200 bins in μ ∈ [−1, 1]. We computed all terms in Eq. (17), including not only the auto- and cross-correlation functions of the different populations but also the random-random pair counts that appear in the prefactors of Eq. (17). To convert the redshifts of the mock galaxies into distances, we used the same cosmological model as used to generate the parent simulations. Our analysis focuses on the baseline redshift intervals for the Euclid galaxy clustering analysis: z ∈ [0.9, 1.1], z ∈ [1.1, 1.3], z ∈ [1.3, 1.5], and z ∈ [1.5, 1.8].
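The Legendre multipoles follow from the binned ξ(r, μ) through ξℓ(r) = (2ℓ + 1)/2 ∫ ξ(r, μ) Lℓ(μ) dμ over μ ∈ [−1, 1]. The Python sketch below approximates the integral with a midpoint sum over the 200 μ bins quoted above; this discretisation is an assumption for illustration and not necessarily the one implemented in the Euclid pipeline, and the function name is hypothetical.

```python
import numpy as np
from numpy.polynomial.legendre import Legendre


def multipoles(xi_rmu, mu_edges, ells=(0, 2, 4)):
    """Legendre multipoles of xi(r, mu) via a midpoint sum over mu bins.

    xi_rmu   : array of shape (n_r, n_mu)
    mu_edges : array of n_mu + 1 bin edges spanning [-1, 1]
    """
    mu_edges = np.asarray(mu_edges, dtype=float)
    mu = 0.5 * (mu_edges[1:] + mu_edges[:-1])
    dmu = np.diff(mu_edges)
    out = {}
    for ell in ells:
        leg = Legendre.basis(ell)(mu)  # L_ell evaluated at bin centres
        out[ell] = 0.5 * (2 * ell + 1) * np.sum(xi_rmu * leg * dmu, axis=-1)
    return out


n_r = 40  # 40 separation bins of 5 h^-1 Mpc
mu_edges = np.linspace(-1.0, 1.0, 201)  # 200 bins in mu
mu_c = 0.5 * (mu_edges[1:] + mu_edges[:-1])
# Toy xi(r, mu) = 0.5 + 0.2 L_2(mu), identical in every r bin
xi_rmu = np.tile(0.5 + 0.2 * Legendre.basis(2)(mu_c), (n_r, 1))
poles = multipoles(xi_rmu, mu_edges)
```

For this toy input the monopole and quadrupole recover 0.5 and 0.2, and the hexadecapole vanishes, up to the small error of the midpoint rule.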
5. The contribution of interlopers to the EuclidLargeMocks 2PCF
We use Fig. 2 as an example to illustrate how the density contrast introduced in Eq. (6) in the presence of interlopers translates into the galaxy 2PCF measurements. The 2PCF of correct galaxies (solid blue line) is compared with those of the O III and S III interlopers (solid green and pink lines) and the total measured 2PCF (dotted black line). All 2PCFs are estimated using the mocks presented in Sect. 4. The 2PCFs of correct galaxies and line interlopers are the intrinsic auto-correlation functions of the corresponding populations, i.e. they are not weighted by their prefactors as in Eq. (17). The 2PCFs of the different populations have different shapes, which broaden the BAO peak in the resulting measured 2PCF. The 2PCF of line interlopers is shifted and distorted compared to that of correct galaxies. For S III interlopers, the 2PCF is stretched towards larger separations, while for O III interlopers it is compressed towards smaller scales. This effect is particularly evident in the corresponding shifts of the BAO peak position. These results demonstrate the importance of modelling the clustering properties and abundance of all types of interlopers to account for contamination effects on 2PCF measurements.
Fig. 2. Monopole of the contaminated-sample auto-correlation (dashed black line) compared to the intrinsic auto-correlations of correct galaxies (blue line) and line interlopers (green and pink lines), not weighted by the prefactors in Eq. (17). The 2PCFs are averaged over the EuclidLargeMocks in z ∈ [1.3, 1.5]. Both the dilution of the clustering amplitude in the presence of contamination and the distortion of the line interlopers’ signal, in particular the shift of the BAO peak, are visible.
In the mock catalogues, we can unambiguously identify and separate all types of objects. This allows us to compute exactly all terms in Eq. (17), both the correlation functions and their prefactors, and hence to evaluate the contribution of each term to the total correlation function of the contaminated catalogue. Moreover, we can evaluate the residuals obtained when some terms on the right-hand side of Eq. (17) are neglected. This evaluation allows us to identify the most relevant terms that should be included in the theoretical model when the measured signal is fitted to extract cosmological information. While modelling the auto-correlations of correct galaxies and line interlopers is relatively straightforward, modelling the cross-correlations of the various interlopers, which have very different redshift distributions, is considerably more challenging. These cross-correlations can be obtained either phenomenologically from direct measurements in the EDS or theoretically under some simplifying assumptions (see Foroozan et al. 2022).
For the sake of both simplicity and generality, we only show results for two redshift bins representative of the different types and fractions of interloper galaxies. In the first, z ∈ [0.9, 1.1], most contaminants are noise interlopers, which constitute 10% of the observed catalogue, while line interlopers account for only a few per cent. In the second redshift bin, z ∈ [1.3, 1.5], the fractions of noise and line interlopers are comparable, around 10% each.
5.1. Amplitude and shape of the different terms
Figure 3 shows the monopole (top panels), quadrupole (centre), and hexadecapole (bottom) of all the auto- and cross-correlation functions in Eq. (17), weighted by their corresponding prefactors, which we generically denote by p in the y-axis label. The multipoles have been averaged over all 1000 EuclidLargeMocks. The panels in the first and third columns show all contributions for the two redshift bins under consideration, as indicated by the labels. For each bin, a zoomed-in view of the smallest contributions is displayed in the second and fourth columns, highlighting the scale dependence of all interloper contributions. As expected, in both redshift bins the dominant contribution to the total signal comes from correct galaxies, which are the most numerous type of galaxy. The other terms are all subdominant, although not negligible, and their relevance depends on the redshift bin considered.
Fig. 3. Monopole, quadrupole, and hexadecapole moments of all terms in Eq. (17) averaged over all mock catalogues for z ∈ [0.9,1.1] (left) and z ∈ [1.3,1.5] (right). All terms comprise the correlation function and the corresponding prefactor. To simplify the visualization of all terms, the rightmost column of each panel shows a zoom-in on the smallest contributions in the corresponding redshift bin.
At z ∈ [0.9, 1.1], as shown in the left panels of Fig. 3, the most prominent contribution apart from the correct-correct one is the correct-noise correlation signal, which differs significantly from zero. This is not unexpected, since the redshift PDF of noise interlopers overlaps with that of correct galaxies (see Sect. 4.1). In this case, this term is equivalent to the auto-correlation of correct galaxies computed on a sample in which some sources have a larger redshift error. However, the intensity of the signal ultimately depends on how catastrophic redshift errors are defined with respect to random ones. More details on the origin of this contribution in the EuclidLargeMocks can be found in Appendix C. The line-interloper contributions, while characterized by a large intrinsic auto-correlation signal, are damped by the small amplitude of their prefactors, given their small fractions in this redshift interval.
The situation is slightly different at z ∈ [1.3, 1.5], shown in the right panels of Fig. 3. In this redshift bin, the fractions of noise, O III, and S III interlopers are comparable. As a consequence, the amplitude of the line-interloper auto-correlations (dark green and violet lines for O III and S III, respectively) is higher than in the low-redshift bin and of the same order as that of the correct-noise correlation. Given this enhancement, the distortion induced in the shape of the line-interloper 2PCFs becomes visible, as previously illustrated by the broadening and shift of the BAO peak in the auto-correlation function of the line interlopers in Fig. 2. The contribution of line interlopers to the contaminated 2PCF is particularly evident on small scales in the monopole, where they constitute the second most important contribution after correct galaxies.
In both redshift bins, the other terms in Eq. (17) either have a negligible amplitude or are very noisy. This is expected for the line-correct and line-line cross-correlation terms, since these populations are very far apart (Δz > 0.6, or Δr > 846 h⁻¹ Mpc in comoving distance). The correlation function amplitude of objects with very broad redshift distributions, particularly noise interlopers, is expected to be very small as well. Overall, we do not see any significant shift or broadening of the BAO peak in the contaminated signal with respect to the correct-correct contribution.
5.2. Simplified models for the measured correlation function
In this section, we focus on the residual error obtained when we neglect some terms on the right-hand side of Eq. (17), that is, when we adopt an incomplete model of the measured correlation function. The first model considered ignores the specific contamination and only accounts for the correct-galaxy contribution, attenuated by its prefactor
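$$\xi_{\rm m}(\mathbf{r}) \simeq (1-f_{\rm tot})^2\frac{R_{\rm c}R_{\rm c}}{R_{\rm m}R_{\rm m}}\xi_{\rm cc}. \tag{19}$$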
In the second model we include the autocorrelation terms for both noise and line interlopers
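$$\xi_{\rm m}(\mathbf{r}) \simeq (1-f_{\rm tot})^2\frac{R_{\rm c}R_{\rm c}}{R_{\rm m}R_{\rm m}}\xi_{\rm cc} + f_\ell^2\frac{R_\ell R_\ell}{R_{\rm m}R_{\rm m}}\xi_{\ell\ell} + f_{\rm n}^2\frac{R_{\rm n}R_{\rm n}}{R_{\rm m}R_{\rm m}}\xi_{\rm nn}. \tag{20}$$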
Finally, if we further include the correct-noise cross-correlation term that features prominently in Fig. 3 we have
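$$\xi_{\rm m}(\mathbf{r}) \simeq (1-f_{\rm tot})^2\frac{R_{\rm c}R_{\rm c}}{R_{\rm m}R_{\rm m}}\xi_{\rm cc} + f_\ell^2\frac{R_\ell R_\ell}{R_{\rm m}R_{\rm m}}\xi_{\ell\ell} + f_{\rm n}^2\frac{R_{\rm n}R_{\rm n}}{R_{\rm m}R_{\rm m}}\xi_{\rm nn} + 2(1-f_{\rm tot})f_{\rm n}\frac{R_{\rm c}R_{\rm n}}{R_{\rm m}R_{\rm m}}\xi_{\rm cn}. \tag{21}$$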
For each of the three models, we compared the residuals (ξm − model) with the expected statistical uncertainty, σm, on the measured 2PCF ξm, and looked for the minimal set of terms that brings the systematic error below σm and below 10% of σm.
Figure 4 shows the amplitude of the systematic error induced by using the approximate models described by Eq. (19) (blue line), Eq. (20) (golden line), and Eq. (21) (brown line), for z ∈ [0.9, 1.1] (left panel) and z ∈ [1.3, 1.5] (right panel), in the monopole, quadrupole, and hexadecapole. Systematic errors, defined as the difference between the measured and modelled quantities, are averaged over the 1000 mocks, and the coloured bands around them represent the standard deviation around the mean. The grey bands represent the statistical uncertainty σm on the measured 2PCF, i.e. the statistical error on a single realization, and 10% of its value. The value of σm is obtained from the scatter of the ξm multipoles among mock realizations, whose area is of the order of the total DR1 area.
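The comparison shown in Fig. 4 can be sketched as follows, assuming arrays of measured and model multipoles with one row per mock. The function name and return layout are hypothetical; σm is estimated, as in the text, from the scatter of the measured multipoles among mocks, and the two Boolean masks flag the separations where the mean residual falls below σm and below 10% of σm. The toy mocks are constructed so that σm is close to 0.01.

```python
import numpy as np


def residual_summary(xi_meas_mocks, xi_model_mocks):
    """Hypothetical helper: compare (xi_m - model) with sigma_m, per separation bin.

    Both inputs have shape (n_mocks, n_r), or broadcast to it.
    """
    resid = xi_meas_mocks - xi_model_mocks
    mean = resid.mean(axis=0)  # systematic error averaged over mocks
    spread = resid.std(axis=0, ddof=1)  # band around the mean
    sigma_m = np.std(xi_meas_mocks, axis=0, ddof=1)  # single-realization error
    below_sigma = np.abs(mean) < sigma_m
    below_tenth = np.abs(mean) < 0.1 * sigma_m
    return mean, spread, sigma_m, below_sigma, below_tenth


n_mocks, n_r = 1000, 40
signs = np.where(np.arange(n_mocks) % 2 == 0, 1.0, -1.0)[:, None]
base = np.linspace(0.1, 0.0, n_r)[None, :]
xi_meas = base + 0.01 * signs  # toy mocks with sigma_m of about 0.01
mean, spread, sigma_m, ok1, ok01 = residual_summary(xi_meas, base - 0.0005)
```

A constant model offset of 0.0005 stays below 10% of σm, whereas an offset of 0.005 passes the σm criterion but fails the 10% one, mirroring the two thresholds used in the text.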
Fig. 4. Systematic errors in the monopole, quadrupole, and hexadecapole moments in the case of an incomplete parameterization, for z ∈ [0.9,1.1] (left) and z ∈ [1.3,1.5] (right). The grey bands correspond to the statistical uncertainty σm on the measured monopole and to 10% σm. The y-axis scale is linear between −10⁻³ and 10⁻³, and symmetric logarithmic elsewhere.
At z ∈ [0.9, 1.1], the simplest model, including only the correct-galaxy auto-correlation, leads to a systematic error smaller than the expected statistical uncertainty on 2PCF measurements in DR1 at all separations. Adding the interloper auto-correlations only affects the smallest scales, where the line-interloper auto-correlation is highest, and has no effect on scales above 30 h⁻¹ Mpc. The systematic error falls below 10% of σm at all scales only when the cross-correlation term between correct galaxies and noise interlopers is included. The systematic error behaves slightly differently at z ∈ [1.3, 1.5], which is directly linked to the different interloper fractions in this redshift range. When using the simplest model, the residuals in the monopole exceed the statistical uncertainty up to separations of about 40 h⁻¹ Mpc. In this case, the introduction of the line-interloper auto-correlation is crucial, as it brings the systematic error below the statistical one. This is consistent with the monopole amplitudes reported in Fig. 3, where the line-interloper signal is prominent on those scales. Despite these differences, adding the correct-noise cross-correlation term is also required in this redshift range to bring the residuals below 10% of σm. Overall, for all models and redshift bins considered, the amplitude of the systematic error decreases with separation, eventually approaching or dropping below 10% of the statistical uncertainty beyond 100 h⁻¹ Mpc.
The adequacy of a given 2PCF model in describing the measured 2PCF in the presence of interlopers must ultimately be judged by its ability to yield unbiased scientific information. The results presented in this section help us build an effective model for the measured 2PCF that is both accurate and as simple as possible. However, a significant systematic effect at the level of the 2PCF measurements does not necessarily imply an equally significant loss of precision and accuracy in the estimated cosmological parameters. The ultimate goal of this analysis, detailed in the following sections, is therefore to assess comprehensively the impact of interlopers on the inference of a set of cosmological parameters from galaxy clustering measurements.
6. The impact of interlopers on RSD parameters
In the second part of this paper, we evaluate how the systematic errors due to an incomplete interloper model affect the inference of cosmological parameters. In this section, we perform an MCMC analysis to sample the posterior distribution of three key parameters, namely the growth rate, fσ8, the clustering amplitude, bσ8, and the pairwise velocity dispersion, σp; when it is included in the clustering model, the total contamination fraction is also free to vary. We fix all other parameters to the values adopted in the parent simulations. Our goal is to identify the simplest theoretical model that provides unbiased estimates of the physical parameter of interest, fσ8.
We begin by considering different configurations of a model that accounts for the presence of contaminants only through a damping factor in front of the correct-galaxy 2PCF, as in Eq. (19). We then test a second model that also accounts for the auto-correlation of the two types of line interlopers but ignores the auto-correlation of the noise interlopers, which we have shown to be negligible. The goal is to verify whether the systematic errors induced by ignoring the cross-correlation terms, which are considerably more difficult to model, are small enough to be neglected. The specific models used to fit the measured 2PCF are detailed in Sect. 6.2.
In our analysis, we do not aim to evaluate the absolute precision with which cosmological parameters can be estimated from DR1 data. Instead, our goal is to assess the impact of systematic errors arising from the adoption of incomplete interloper models. To this end, we fit the 2PCF measured on the contaminated catalogue with the various 2PCF models detailed below, and we compare these results with those obtained by fitting the 2PCF measured on the pure part of the catalogue (i.e. made of correct galaxies only) with the corresponding model for correct-galaxy clustering. We refer to this latter scenario as the ‘reference case’. In the following, we detail the models and methodology adopted in this analysis and the corresponding results.
6.1. Modelling the 2PCF
All the 2PCF models adopted in the analysis, presented in Sect. 6.2, are derived from the template model for the galaxy power spectrum in redshift space described in Euclid Collaboration: Blanchard et al. (2020) and generalized in Addison et al. (2019) to include the modelling of line interlopers
(22)
The term Pdw is the damped-wiggles matter power spectrum (Ivanov & Sibiryakov 2018; Euclid Collaboration: Blanchard et al. 2020), f is the growth rate, σp is the pairwise non-linear velocity dispersion, which relates to the relative displacement induced by the peculiar velocities of galaxies (Ballinger et al. 1996; Euclid Collaboration: Blanchard et al. 2020), σ8 is the rms density fluctuation on a scale of 8 h−1Mpc, and Fz is a Gaussian function that accounts for the accuracy of the measured spectroscopic redshifts (whose rms value σ0,z is almost independent of redshift and equal to 0.001 in the EuclidLargeMocks).
Since all cosmological parameters (apart from the free parameters fσ8, bσ8, and σp mentioned above) are set equal to those of the simulation, there is no need to model the AP effect. Therefore, the gamma parameters are identically equal to one for the power spectrum of the correct galaxies, whereas for line interlopers their values are estimated from Eqs. (4) and (5). Moreover, in the case of line interlopers, the power spectrum measured at redshift z depends on the cosmological parameters evaluated at the true redshift ztrue of the interloper population (Addison et al. 2019). To transform the modulus of the wavevector and the cosine of its angle to the line of sight from the true to the observed values in Eq. (22), we used
(23)
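The structure of such a template can be illustrated with a short sketch. It assumes the Kaiser × Lorentzian × Gaussian form of the Euclid Collaboration: Blanchard et al. (2020) recipe and an AP-like dilation in which γ∥ and γ⊥ divide the parallel and perpendicular components of the wavevector, with a 1/(γ⊥²γ∥) volume factor. The exact placement of the γ factors in Eq. (23), the omitted shot-noise term, and all function names are assumptions of this sketch, and the damped-wiggle spectrum p_dw must be supplied by the user (for example from a Boltzmann code).

```python
import numpy as np


def dilate_k_mu(k, mu, gamma_perp, gamma_par):
    """Map observed (k, mu) to the frame of an interloper population.

    Assumed convention: k_par,true = k*mu/gamma_par and
    k_perp,true = k*sqrt(1 - mu^2)/gamma_perp.
    """
    k, mu = np.broadcast_arrays(np.asarray(k, float), np.asarray(mu, float))
    k_par = k * mu / gamma_par
    k_perp = k * np.sqrt(1.0 - mu**2) / gamma_perp
    k_true = np.hypot(k_par, k_perp)
    mu_true = np.divide(k_par, k_true, out=np.zeros_like(k_true), where=k_true > 0)
    return k_true, mu_true


def pk_template(k, mu, bs8, fs8, f, sigma8, sigma_p, sigma_r, p_dw,
                gamma_perp=1.0, gamma_par=1.0):
    """Anisotropic power spectrum in the spirit of Eq. (22) (sketch only).

    All parameters refer to the population's true redshift. p_dw(k, mu) is a
    user-supplied callable for the damped-wiggle matter power spectrum, and
    sigma_r is the comoving radial smearing due to redshift errors.
    """
    kt, mt = dilate_k_mu(k, mu, gamma_perp, gamma_par)
    kaiser = (bs8 + fs8 * mt**2) ** 2                  # linear RSD (Kaiser) term
    fog = 1.0 / (1.0 + (f * kt * mt * sigma_p) ** 2)   # Lorentzian damping
    f_z = np.exp(-((kt * mt * sigma_r) ** 2))          # Gaussian redshift errors
    volume = 1.0 / (gamma_perp**2 * gamma_par)         # Jacobian of the dilation
    return volume * kaiser * fog * p_dw(kt, mt) / sigma8**2 * f_z
```

With γ⊥ = γ∥ = 1 the dilation is the identity, which corresponds to the correct-galaxy case described above.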
Since we worked in configuration space, we started from the anisotropic power spectrum model to obtain the two-dimensional 2PCF model of the correct galaxies and line interlopers auto-correlation terms in Eq. (17). We then extracted the multipoles by integrating the two-dimensional models weighted by the proper prefactor in front of each correlation function through
(24)
where jν(ks) are the spherical Bessel functions.
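The μ-integration step with a scale-dependent prefactor can be sketched as follows, assuming that the two-dimensional model ξ(s, μ) and the prefactor p(s, μ) are available as callables (for instance, interpolated from random-catalogue pair counts). This covers only the Legendre projection, not the Hankel transforms with the spherical Bessel functions in Eq. (24); the function name and quadrature order are illustrative choices.

```python
import numpy as np
from numpy.polynomial import legendre as leg


def weighted_multipoles(xi_2d, prefactor, s, ells=(0, 2, 4), n_mu=64):
    """Legendre multipoles of p(s, mu) * xi(s, mu) via Gauss-Legendre quadrature.

    xi_2d, prefactor : callables (s, mu) -> array, broadcasting over inputs.
    Returns shape (len(ells), len(s)) with
    xi_ell(s) = (2*ell + 1)/2 * int_{-1}^{1} p(s, mu) xi(s, mu) L_ell(mu) dmu.
    """
    mu, w = leg.leggauss(n_mu)                       # nodes and weights on [-1, 1]
    s_grid, mu_grid = np.meshgrid(np.asarray(s, float), mu, indexing="ij")
    integrand = prefactor(s_grid, mu_grid) * xi_2d(s_grid, mu_grid)
    out = []
    for ell in ells:
        coeffs = np.zeros(ell + 1)
        coeffs[ell] = 1.0                            # selects L_ell alone
        l_ell = leg.legval(mu_grid, coeffs)
        out.append(0.5 * (2 * ell + 1) * np.sum(integrand * l_ell * w, axis=1))
    return np.array(out)
```

For a constant prefactor and a Kaiser-like angular dependence (1 + βμ²)², the projection returns the familiar factors 1 + 2β/3 + β²/5, 4β/3 + 4β²/7, and 8β²/35, each scaled by the prefactor; a scale-dependent p(s, μ) instead mixes multipoles.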
6.2. 2PCF phenomenological models with interlopers
We present a set of analyses which involve comparing different 2PCF models, characterized by different sets of free parameters and types of interlopers. Table 4 provides a summary of these tests, which are detailed below. We performed these tests in all four Euclid spectroscopic redshift bins. For clarity, instances like ‘A vs. B’ should be interpreted as ‘measurement A fitted against model B’.
Table 4. Summary of all tests run in the MCMC analysis, including the reference case (first line).
6.2.1. The reference case: Correct versus correct
As mentioned at the beginning of this section, to avoid being sensitive to our choice of a particular power spectrum model, we compare the cosmological parameters obtained with different interloper parameterizations against a reference case that uses the same power spectrum model. In this reference case, we fit the 2PCF measurement of the correct part of the sample using the theoretical model for the correct-galaxy auto-correlation. We refer to this case as correct vs. correct⁵. We fit⁶
(25)
where the parameters fσ8, b, and σp are free in the fit and refer to the correct-galaxy population within the measured redshift bin. Taking this model, and in particular its fσ8 value, as the reference, we can isolate the improvement brought about by more complex and detailed models based on the prefactor parameterization and on the addition of the line-interloper auto-correlation signals to the total theoretical model.
6.2.2. A proof-of-concept case: Contaminated versus correct
The results presented in Sect. 5 demonstrate that, to a first approximation, the contaminated signal can be reproduced by accounting only for the contribution of the correct galaxies, appropriately weighted by the corresponding prefactor. We perform a proof-of-concept test in which the measured 2PCF of the contaminated sample is compared to the same correct-only 2PCF model used for the reference case, that is, a model that assumes a 100% pure sample. In this case, we expect the mismatch in the clustering amplitude to result in an underestimate of the linear bias parameter, b. The aim is to check whether adopting this simplified model affects the estimate of the growth rate parameter fσ8. We refer to this test as contam vs. correct. We fit
(26)
6.2.3. Correct-only modelling with exact prefactor
This is the simplest realistic model that we used to fit the contaminated signal. As in the previous cases, we account only for the auto-correlation of correct galaxies, but this time weighted by its exact prefactor as in Eq. (17). This means that we assume we know exactly the fraction of target galaxies and its scale dependence.
We refer to this test as contam vs. p*correct. We fit
(27)
with
. The model is very similar to that of the reference case, apart from the 2D prefactor in front of the correct-galaxy 2PCF. This prefactor is integrated together with ξcc when computing the multipoles of the model, which are the quantities used in the MCMC analysis. Since the prefactor is exact (it was measured from the pairs in the random catalogues), the free parameters are the same as in the reference case.
6.2.4. Correct-only modelling with free contamination fraction
In the real survey, one expects to estimate the fraction of interlopers from the analysis of the EDS. However, such an analysis is unlikely to constrain the scale dependence of the contamination in the first stages of the mission. In addition, the total contamination fraction will be measured with some uncertainty. We therefore explore an additional model in which the contamination is described by a constant, rather than scale-dependent, factor set by the total contamination fraction ftot, which is free to vary within the interval specified by a uniform prior.
We refer to this test as contam vs. (1 − ftot)²*correct, where the prefactor is scale-independent and depends only on the total contamination fraction ftot (see Eq. (18)). We fit
(28)
In this case, we have one more free parameter than in the previous tests, namely ftot. The priors on this and the other parameters are discussed in Sect. 6.3. We expect that a broad prior on ftot may, on the one hand, strongly degrade the constraints on parameters such as fσ8 and bσ8, owing to natural degeneracies in the model. On the other hand, the parameter β = f/b should be insensitive to the choice of this prior.
6.2.5. Correct galaxies and line interlopers modelling
This is the most complete model we present in this paper. In addition to the correct galaxies contribution, we include the O III and the S III line interlopers auto-correlation terms in the theoretical model. The complete 2PCF model (described in Eq. (32), after introducing the set of approximations we adopted) is therefore the sum of three contributions, all derived from the corresponding power spectrum models as in Eq. (22). We do not include the noise interlopers auto-correlation since in Sect. 5 we have shown that it is expected to be negligible. Despite this simplification, the model still depends on a large number of free parameters, some of which are highly degenerate. To reduce the number of degenerate parameters while maintaining a focus on estimating fσ8 and bσ8, we have adopted several simplifying assumptions, which are detailed below.
First, in analogy with the p*correct model, we assume that the contamination fractions of correct galaxies and line interlopers can be estimated from the data. We also assume that not only their average values but also their scale dependence is known. The impact of this second assumption is expected to be negligible, since, as shown in Appendix A, the scale dependence of the prefactors is either mild or, when it is not, the magnitude of the prefactor itself is small. In summary, we fix the prefactors for the three auto-correlation terms included in the model. Second, we leverage both physical and empirical considerations to build a redshift-dependent model for the growth factor and bias of the interlopers, as detailed in the following.
We adopt the cosmological model of the parent simulations to fix the redshift dependence of fσ8. With this assumption, a single free parameter, the growth rate fσ8 of the correct galaxies at the observed redshift z, is enough to characterize the growth rate, since the value of fσ8 at the redshift of the line interlopers zint is then uniquely determined. Furthermore, since in ΛCDM (the EuclidLargeMocks cosmology) the function fσ8(z) is nearly linear within the redshift interval of interest, we adopt a simplified linear model that best fits the exact relation
(29)
where m = −0.09 is the slope obtained by fitting the theoretical relation, assuming an uncertainty of 0.02 on the fσ8 values.
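How such a slope can be obtained is sketched below under explicit assumptions: a flat ΛCDM background with radiation neglected, the growth factor from the standard integral solution valid in ΛCDM, and the approximation f ≈ Ωm(z)^0.55 for the growth rate. The cosmological parameters and redshift range in the usage lines are illustrative and are not claimed to be those of the EuclidLargeMocks, so the resulting slope need not equal −0.09.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

_X, _W = leggauss(200)  # Gauss-Legendre nodes and weights on [-1, 1]


def _e_of_a(a, om):
    """Dimensionless Hubble rate E(a) for flat LCDM (radiation neglected)."""
    return np.sqrt(om / a**3 + 1.0 - om)


def growth_factor(z, om):
    """Linear growth D(z)/D(0) from D(a) ~ E(a) * int_0^a da' / (a' E(a'))^3."""
    def unnorm(a):
        ap = 0.5 * a * (_X + 1.0)                 # map nodes to [0, a]
        integral = 0.5 * a * np.sum(_W / (ap * _e_of_a(ap, om)) ** 3)
        return _e_of_a(a, om) * integral
    zs = np.atleast_1d(np.asarray(z, float))
    return np.array([unnorm(1.0 / (1.0 + zi)) for zi in zs]) / unnorm(1.0)


def fsigma8(z, om, sigma8_0):
    """f*sigma8(z) using the approximation f = Omega_m(z)^0.55."""
    zs = np.atleast_1d(np.asarray(z, float))
    om_z = om * (1 + zs) ** 3 / (om * (1 + zs) ** 3 + 1.0 - om)
    return om_z**0.55 * sigma8_0 * growth_factor(zs, om)


def linear_slope(z, y, sigma=0.02):
    """Weighted straight-line fit; np.polyfit expects weights w = 1/sigma."""
    w = np.full_like(np.asarray(y, float), 1.0 / sigma)
    slope, intercept = np.polyfit(z, y, 1, w=w)
    return slope, intercept


# Illustrative usage (parameters are assumptions, not the mocks' values):
z_grid = np.linspace(0.9, 1.8, 10)
m_slope, _ = linear_slope(z_grid, fsigma8(z_grid, om=0.32, sigma8_0=0.81))
```

With the same uncertainty on every point, the weights do not change the best-fit slope; the assumed 0.02 only sets the slope uncertainty and the χ² of the linear approximation.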
Conversely, the redshift dependence of the bias cannot be inferred from theory, as it depends on the selection function of the spectroscopic sample and on the physical properties of the different types of observed galaxies. In principle, the three populations of objects that contribute to the measured 2PCF have different bias values that should be treated as independent free parameters in the model. However, to reduce the number of free parameters and to link the interloper bias to the correct-galaxy bias, we calibrate the redshift dependence of the bias directly on the mock measurements. For each redshift bin, we separately fit the correct-galaxy, O III-interloper, and S III-interloper signals with a model that accounts for the exact measured prefactors in front of the 2PCF.
Figure 5 shows the bias values obtained by fitting all populations separately in every measured redshift bin. The redshifts used in the b(z) relation are the centres of the measured redshift bins for correct galaxies, while for line interlopers they are the true original redshifts, related to the measured redshift bin through Eq. (2). We interpolate the overall redshift dependence with a linear relation to find the slope mb⁷. When running the chains, we vary the correct-galaxy bias b and mb, and then we compute the interloper bias with the simple deterministic relation
Fig. 5. Redshift dependence of the bias in the EuclidLargeMocks, obtained by fitting the 2PCF of each population in the Euclid spectroscopic bins. Each colour refers to a different population and shows the true redshifts of the line interlopers compared to those of the correct galaxies. On top of the bias values, we plot the linear relation used to interpolate the redshift dependence of the bias.
(30)
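The deterministic relations for the interloper growth rate and bias can be sketched as below. This sketch assumes that Eqs. (29) and (30) take the linear form X(zint) = X(z) + slope × (zint − z), with slope m for fσ8 and mb for the bias; that the line-interloper redshift follows from equating observed wavelengths, (1 + z) λtarget = (1 + zint) λline; and that mb comes from an unweighted straight-line fit to the per-population bias values. The wavelengths in the test (Hα at 6563 Å, [O III] at 5007 Å) and all function names are illustrative.

```python
import numpy as np


def true_redshift(z_obs, lam_target, lam_line):
    """Redshift of an interloper line observed where the target line is expected."""
    return (1.0 + np.asarray(z_obs, float)) * lam_target / lam_line - 1.0


def calibrate_bias_slope(z_points, b_points):
    """Unweighted straight-line fit b(z) = b0 + m_b * z; returns (m_b, b0)."""
    m_b, b0 = np.polyfit(z_points, b_points, 1)
    return m_b, b0


def linear_extrapolation(value_at_z, slope, z_obs, z_true):
    """Assumed linear form used for both b and f*sigma8 of an interloper."""
    return value_at_z + slope * (np.asarray(z_true, float) - z_obs)
```

In a chain, `linear_extrapolation(b, m_b, z, z_int)` would supply the interloper bias and `linear_extrapolation(fs8, m, z, z_int)` the interloper growth rate, so that neither adds a free parameter.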
We refer to this test as contam vs. correct+line. We fit
(31)
(32)
We leave the velocity dispersions σp free for all populations since we do not have a physical model for them, and we treat them as nuisance parameters. The line interloper factors γ∥ and γ⊥ are computed at a redshift z corresponding to the centre of the observed redshift interval under study via Eqs. (4) and (5), and they are reported in Table 1: we use Eq. (2) to derive the original effective redshift of line interlopers.
6.3. Methodology
To estimate the free parameters of the models, we sampled their posterior probability distribution using the MCMC sampler emcee (Foreman-Mackey et al. 2013). For this, we assumed a Gaussian likelihood for the data, which is explicitly expressed as
$$\mathcal{L} \propto \exp\left(-\frac{\chi^2}{2}\right) \qquad (33)$$
where χ2 is defined as
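$$\chi^2 = \sum_{i,j}\left[\xi^{\rm meas}(r_i)-\xi^{\rm model}(r_i)\right]\,C^{-1}_{ij}\,\left[\xi^{\rm meas}(r_j)-\xi^{\rm model}(r_j)\right] \qquad (34)$$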
and C is the data covariance matrix. Our data vector ξmeas(r) consists of the monopole and quadrupole of the measured 2PCF, averaged over the full set of mock catalogues, for all tests apart from the reference case, in which we fit the average of the correct-galaxy data vector ξccmeas(r). The covariance matrix C, instead, is that of a single realization, since we are interested in assessing the precision with which these parameters will be estimated in the Euclid DR1 survey. We limited our fit to separations r ∈ [40, 200] h−1Mpc in order to exclude the smallest scales, which cannot be properly described by a tree-level theoretical model of the power spectrum (more details can be found in Appendix B). We report the posterior mean as the best estimate of the cosmological parameters and the 1σ width of the marginalized posterior as the uncertainty. In the triangle plots, the coloured bands in the marginalized 1D posteriors correspond to this 1σ uncertainty; in the 2D posteriors, we show contours corresponding to the 68% and 95% confidence levels.
In Table 4 we list the uniform priors used in the different tests. In all cases except the last one, we adopted non-informative priors over very broad intervals. For the last model, which accounts for line-interloper contamination, we set our priors based on physically motivated constraints. For example, the upper limit fσ8 < 1 is consistent with a ΛCDM model, whereas the upper limit on the bias parameter, b ≤ 5, is consistent with the linear b(z) model we adopted (see Fig. 5). Finally, we adopted a wide [0, 3] prior on mb.
When testing the correct-only model with the total contamination fraction free to vary, we tested different uniform priors on ftot. In particular, we tested symmetric ±1% and ±10% uniform priors around the true value of ftot, derived from the fractions of interlopers averaged over the 100 mock catalogues. For the 1% prior, this corresponds to the condition ftot,min < ftot < ftot,max, with ftot,min = (1 − 0.01) ftot and ftot,max = (1 + 0.01) ftot. We rely on the ability to estimate the fractions of the various interlopers by analysing the EDS, and we assume that these fractions can be measured with a precision in the range 1–10%, as reflected by the chosen priors. We show only one example in which ftot is allowed to vary freely within its physical limits [0, 1]. This extreme case illustrates the most pessimistic scenario, in which no external constraints are placed on ftot, dramatically affecting the results of the analysis. Given the peculiarity of this test, we do not explicitly compare it with the other cases.
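The likelihood and priors described in this section can be sketched as a log-posterior function of the kind expected by emcee. This is a minimal sketch, assuming that the model is available as a callable returning the concatenated monopole and quadrupole on the fitted separations; `model_fn` and the other names are hypothetical, and no correction for noise in the inverse of a mock-estimated covariance is applied.

```python
import numpy as np


def mock_covariance(xi_mocks):
    """Sample covariance of data vectors; xi_mocks has shape (n_mocks, n_data)."""
    return np.cov(xi_mocks, rowvar=False)


def ftot_bounds(ftot_ref, frac):
    """Symmetric uniform prior (1 - frac) f_tot < f_tot < (1 + frac) f_tot."""
    return (1.0 - frac) * ftot_ref, (1.0 + frac) * ftot_ref


def make_log_posterior(data_vec, cov, model_fn, bounds):
    """Gaussian log-likelihood (Eqs. 33-34) combined with uniform priors.

    data_vec : concatenated monopole and quadrupole in the fitted r-range.
    model_fn : hypothetical callable theta -> model vector of the same length.
    bounds   : sequence of (low, high) pairs, one per parameter.
    """
    icov = np.linalg.inv(cov)
    lo, hi = np.asarray(bounds, float).T
    def log_post(theta):
        theta = np.asarray(theta, float)
        if np.any(theta < lo) or np.any(theta > hi):
            return -np.inf                     # outside the uniform prior
        d = data_vec - model_fn(theta)
        return -0.5 * d @ icov @ d             # ln L up to an additive constant
    return log_post
```

The returned `log_post` could then be passed to `emcee.EnsembleSampler(40, ndim, log_post)` and run with `run_mcmc(p0, 20000)`, matching the walker and step numbers quoted in the text.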
We used the MCMC acceptance rate and the integrated auto-correlation time as diagnostics to decide whether the chains were long enough to have converged. For the tests shown in this paper, we found that a configuration with 20 000 steps and 40 walkers was adequate, providing more than 100 effectively independent samples for all redshifts, tests, and model parameters. To compare our results with those obtained in Sect. 5, which tested the individual contributions of the interlopers to the measured 2PCF, we show the results in the same redshift bins, i.e. z1 = [0.9, 1.1] and z3 = [1.3, 1.5].
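The auto-correlation diagnostic can be illustrated with a standalone estimator of the integrated auto-correlation time based on Sokal's automatic windowing, the approach also used by emcee (whose samplers expose it through `get_autocorr_time`). This single-chain version is a simplified sketch: emcee itself averages the auto-correlation function over walkers, and the function names here are illustrative.

```python
import numpy as np


def autocorr_1d(x):
    """Normalized auto-correlation function of a 1D chain via zero-padded FFT."""
    x = np.asarray(x, float)
    n = x.size
    nfft = 1 << (2 * n - 1).bit_length()          # padding avoids circular wrap-around
    fx = np.fft.rfft(x - x.mean(), n=nfft)
    acf = np.fft.irfft(fx * np.conjugate(fx), n=nfft)[:n]
    return acf / acf[0]


def integrated_time(x, c=5.0):
    """Sokal estimator tau(M) = 1 + 2 sum_{t=1}^{M} rho(t), smallest M >= c tau(M)."""
    taus = 2.0 * np.cumsum(autocorr_1d(x)) - 1.0
    ok = np.arange(taus.size) >= c * taus
    window = int(np.argmax(ok)) if ok.any() else taus.size - 1
    return taus[window]


def n_effective(n_steps, n_walkers, tau):
    """Approximate number of independent samples across all walkers."""
    return n_steps * n_walkers / tau
```

For an AR(1) process with lag-one correlation ρ, the exact value is τ = (1 + ρ)/(1 − ρ), which gives a convenient check; with 20 000 steps and 40 walkers, more than 100 independent samples requires τ below 8000 steps.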
6.4. Results of the amplitude fits
The contour plot in Fig. 6 shows the 2D and 1D marginalized posterior probability contours for the parameters fσ8 and bσ8 obtained from three different analyses (in z3, as an example). The grey contours show the correct vs. correct reference case. We compare them with the results of the contam vs. correct case, with no allowance for interloper contamination (red curves), and of the contam vs. p*correct case (green curves), in which the exact correct-galaxy prefactor is used to account for interloper contamination. In the contam vs. correct scenario, the systematic error on fσ8 and bσ8 is significantly larger than the statistical uncertainty, and a simple rescaling of the fitted galaxy bias is not enough to recover the correct fσ8 value. However, when the exact prefactor of the correct-galaxy contribution is added to the model, the contours (in green) overlap with those of the reference case. This comparison highlights the systematic error that results from ignoring the presence of interlopers altogether and assuming that the sample is 100% pure. An additional degree of freedom is required to account for the overall decrease in the clustering amplitude, which is the main effect induced by interloper galaxies.
Fig. 6. Comparison between the reference case (grey line), a model including interlopers (the minimal one, green line), and a model with contributions from correct galaxies only without the proper weighting (red line) in z ∈ [1.3, 1.5].
Figure 7 shows the inferred values of fσ8 and bσ8 determined in each chain in the two reference redshift intervals, along with their uncertainties. A grey band indicating the 1 σ uncertainty in the reference case is displayed to facilitate the comparison between different models.
Fig. 7. Inferred fσ8 and bσ8 values and 1σ uncertainties for all tests in the two reference redshift bins. The grey band corresponds to 1σ around the values inferred in the reference case.
Let us now focus on the models that include only the correct-galaxy contribution. We considered three cases: one in which we assume that the prefactor of the correct-galaxy auto-correlation and its scale dependence are known exactly (green curves in Figs. 7 and 8); one in which we treat the prefactor (1 − ftot)² as a free parameter (with no scale dependence), with a tight symmetric ±1% uniform prior centred on a reference value of ftot estimated by averaging the actual fraction of contaminants inserted in the mocks (golden line in Fig. 7); and one in which a looser ±10% prior on ftot is assumed (orange lines in Figs. 7 and 8).
Fig. 8. Contour plots for fσ8 and bσ8 derived from three representative tests in z ∈ [0.9, 1.1] (top) and z ∈ [1.3, 1.5] (bottom). The dashed grey lines indicate the mean values of the reference case posterior distribution.
Looking at the error bars shown in Fig. 7, we immediately notice that there is no substantial difference between the exact prefactor case and the constant one. Focusing on the centre-left panel of Fig. 9, we observe that the probability contours for the fσ8 and bσ8 parameters remain largely unaffected by the assumed interloper fraction, provided this is known a priori with 10% precision. However, this stability is lost when the contaminating fraction is allowed to vary freely between 0 and 1, i.e. when no prior information about the contamination is available. In this scenario, the probability contours for these parameters broaden significantly, as expected. It is noteworthy, though, that even in this pessimistic yet unlikely case (since we expect to measure the sample purity through analyses of the EDS), our ability to estimate the distortion parameter β remains relatively unaffected (as shown in Fig. 10). This resilience is due to the fact that β can be measured from the ratio of the monopole to the quadrupole of the galaxy 2PCF, a calculation in which the contaminating fraction cancels out.
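Why β survives an unconstrained ftot can be checked with the linear (Kaiser) multipoles. This is a sketch in Fourier space, assuming a model of the form (1 − ftot)² b² (1 + βμ²)² P(k), so that ftot, b, and P(k) set only an overall amplitude. In configuration space, the linear-theory quadrupole-to-monopole ratio also involves the volume-averaged correlation function, so the inversion below would need to be adapted there; the function names are illustrative.

```python
import numpy as np


def kaiser_multipoles(amplitude, beta):
    """Monopole and quadrupole factors of amplitude * (1 + beta mu^2)^2."""
    mono = amplitude * (1.0 + 2.0 * beta / 3.0 + beta**2 / 5.0)
    quad = amplitude * (4.0 * beta / 3.0 + 4.0 * beta**2 / 7.0)
    return mono, quad


def beta_from_ratio(ratio):
    """Invert R = quad/mono for beta > 0 (valid for 0 < R < 20/7).

    R (1 + 2b/3 + b^2/5) = 4b/3 + 4b^2/7 is a quadratic in b whose
    coefficients do not depend on the overall amplitude.
    """
    a = 4.0 / 7.0 - ratio / 5.0
    b = 4.0 / 3.0 - 2.0 * ratio / 3.0
    c = -ratio
    return (-b + np.sqrt(b**2 - 4.0 * a * c)) / (2.0 * a)
```

Changing the amplitude, for example from 1 to (1 − 0.1)² = 0.81, leaves the recovered β unchanged, which is the cancellation invoked in the text.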
Fig. 9. Comparison between the cosmological parameter constraints obtained with a correct-only constant-prefactor model (contam vs. (1 − ftot)²*correct) assuming an uncertainty of 1% and 10% on the total contamination fraction ftot. As an example, we show the results for z3.
Fig. 10. Independently of the width of the prior on the total contamination fraction ftot, the β parameter is always constrained, and its uncertainty does not change in a correct-only constant-prefactor model.
The most realistic model we show in this study is the one that also includes the contribution of line interlopers (see Sect. 6.2), which corresponds to the blue curves in Fig. 7 and in the triangle plots of Fig. 8, where we show the 1D and 2D marginalized posterior distributions for fσ8 and bσ8 in three representative chains. The dashed grey lines indicate the mean values of the reference-case marginalized posterior distributions. The results show that including the contribution of line interlopers in the model does not significantly modify the posterior distributions. Consequently, this inclusion does not affect the precision with which these parameters are estimated, as shown by the error bars in Fig. 7.
On the other hand, the accuracy with which these parameters are estimated varies with redshift. Focusing on fσ8, in Fig. 11 we show the percent difference between its estimated value in all tests (coloured dots) and that fitted in the reference case (grey dots). Colours are used consistently with the previous plots and indicate the different models. To quantify the systematic error in the estimate of the parameter, the error bars were computed by propagating the error on the mean inferred fσ8 values, i.e. by dividing the 1σ uncertainty of the marginalized fσ8 posterior by the square root of the number of mocks. The grey band indicates a reference 1% difference. All models underestimate fσ8 with respect to the reference case in the lowest redshift bins. However, the discrepancy is small, decreasing from 3% at z1 to 1% at z2. In the data, this mismatch corresponds to a systematic underestimate of the absolute amplitude of the 2PCF monopole and quadrupole moments, as shown in the left panel of Fig. 12. At the higher redshifts z3 and z4, the more sophisticated model that includes line interlopers continues to underestimate the parameter, though the mismatch remains minor. In contrast, the predictions of all other models agree with expectations to within 1%.
Fig. 11. Percent difference between the fσ8 value estimated in each chain and the one inferred in the reference case. Here, the errors on the percent difference are derived by considering the errors on the mean value of fσ8, i.e. the uncertainties given by the chains divided by the square root of the number of mocks.
Fig. 12. Agreement between the measured 2PCF
The redshift dependence of the model mismatch may have different causes. One is the inadequacy of our power spectrum model, used to predict the galaxy 2PCF, in accounting for non-linear effects. We expect its impact to be small, since we limit our analysis to separations larger than 40 h−1Mpc and since we compare the performance of the models to a reference case that relies on the same matter power spectrum model. However, the model of the measured 2PCF at a given redshift accounts for the contributions of all interlopers at various redshifts, and its value at a given separation may include contributions from various types of pairs, including those at smaller separations and thus probing non-linear scales. A hint that this is a plausible explanation is the fact that the mismatch decreases with redshift for most of the models explored.
A distinguishing feature of the first redshift bin compared with the others is the presence of a high fraction of noise interlopers and an almost complete absence of line interlopers. As a result, the cross-correlation between correct galaxies and noise interlopers becomes the most significant contribution after the correct-galaxy signal, as shown in the left panel of Fig. 3. However, although this contribution is larger in z1 than in z3, it contributes to, but is not sufficient to account for, the observed difference in fσ8. To verify this, we compared the results of fitting the correct-only model to both the measured 2PCF, i.e. the total measured signal, and the 2PCF after subtracting the correct–noise cross-correlation contribution. The fit results do not change significantly in either z1 or z3. This suggests that, at least with the contaminant fractions present in the EuclidLargeMocks, the cross-correlation between correct galaxies and noise interlopers does not play a significant role in the results of the first redshift bin (at least for r > 40 h−1Mpc). If the fraction of noise interlopers were higher, the relevance of its cross-correlation with correct galaxies would rise correspondingly. In such a case, we are confident that this contribution could be modelled if necessary (see Appendix C).
Figure 13 shows the systematic bias of the fσ8 values inferred from all models relative to the reference case, as a function of redshift. To quantify this bias, we computed its uncertainty by dividing the uncertainties on fσ8 in the chains by the square root of the number of mocks, as in Fig. 11. The systematic errors of the different models are compared with the statistical uncertainty σSTAT on fσ8 in the reference case (grey bands) for a statistical power comparable to that of the Euclid final data release DR3 (i.e. the uncertainty on fσ8 from the reference-case chain has been divided by √6, since the DR3 volume will be about six times that of DR1). In all redshift bins, the systematic errors are below or comparable to the statistical error, regardless of the 2PCF model tested, and they tend to decrease towards higher redshifts. This suggests that, with DR3-level sensitivity, we must be more careful in assessing the quality of our models and check whether this discrepancy arises from an overly simplistic model or whether the inadequacy of the interloper modelling becomes significant at such high statistical precision. On the other hand, a simple interloper model appears to be more than sufficient when working with DR1-like sensitivity.
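The arithmetic behind Figs. 11 and 13 can be summarized in a short sketch. Errors on the mean are the single-realization uncertainties divided by the square root of the number of mocks; combining the test and reference errors in quadrature is an assumption of this sketch, as are the function name and the numbers in the test.

```python
import numpy as np


def fs8_systematics(fs8_test, sig_test, fs8_ref, sig_ref, n_mocks, volume_ratio=6.0):
    """Percent and absolute fs8 offsets with errors on the mean (sketch).

    Errors on the mean are sigma/sqrt(n_mocks); adding the test and reference
    errors in quadrature is an assumption. The DR3-like statistical error
    rescales the reference sigma by 1/sqrt(volume_ratio).
    """
    err_test = sig_test / np.sqrt(n_mocks)
    err_ref = sig_ref / np.sqrt(n_mocks)
    diff = fs8_test - fs8_ref                    # systematic offset
    diff_err = np.hypot(err_test, err_ref)       # its uncertainty
    pct = 100.0 * diff / fs8_ref                 # percent difference (Fig. 11)
    pct_err = 100.0 * diff_err / fs8_ref
    sigma_stat = sig_ref / np.sqrt(volume_ratio)  # DR3-like band (Fig. 13)
    return pct, pct_err, diff, diff_err, sigma_stat
```

For instance, fσ8 = 0.388 ± 0.04 against a reference of 0.400 ± 0.04 over 100 mocks gives a −3% offset, and |Δ| = 0.012 lies below the DR3-like σSTAT = 0.04/√6 ≈ 0.016.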
Fig. 13. Systematic bias on fσ8 of all tests with respect to the reference case. The grey bands represent an estimate of the precision on fσ8 at the end of the mission (DR3), when the observed volume will be six times larger than that of DR1. As in Fig. 11, the error bars were derived using the errors on the mean values of the inferred fσ8.
7. The impact of interlopers on BAO parameters
In this section, we focus on modelling the BAO signal in the measured 2PCF. We parameterise the deviation between the measured and fiducial BAO scale, along and across the line-of-sight, using the AP parameters α⊥, α∥, defined as
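$$\alpha_\perp = \frac{D_A^{\rm true}(z)\, r_s^{\rm fid}}{D_A^{\rm fid}(z)\, r_s^{\rm true}} \qquad (35)$$
$$\alpha_\parallel = \frac{H^{\rm fid}(z)\, r_s^{\rm fid}}{H^{\rm true}(z)\, r_s^{\rm true}} \qquad (36)$$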
Here the superscripts true and fid mark the true and fiducial values of the Hubble parameter H(z), the comoving angular diameter distance DA (z), and the sound horizon scale rs, as estimated using, respectively, the true cosmological model and the fiducial cosmological model used to convert redshifts into distances.
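For reference, the AP parameters can be evaluated as sketched below. The sketch assumes flat ΛCDM backgrounds with radiation neglected and interprets DA as the comoving angular diameter distance, which in a flat universe equals the line-of-sight comoving distance. The sound horizon rs is passed in rather than computed, and the function names are illustrative. When the fiducial cosmology equals the true one, as in our analysis, both parameters are exactly one.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

C_KMS = 299792.458  # speed of light in km/s


def hubble(z, h0, om):
    """H(z) in km/s/Mpc for flat LCDM, radiation neglected."""
    return h0 * np.sqrt(om * (1.0 + z) ** 3 + 1.0 - om)


def comoving_distance(z, h0, om, n=128):
    """D(z) = c * int_0^z dz'/H(z') in Mpc (flat universe), Gauss-Legendre."""
    x, w = leggauss(n)
    zp = 0.5 * z * (x + 1.0)                    # nodes mapped to [0, z]
    return C_KMS * 0.5 * z * np.sum(w / hubble(zp, h0, om))


def ap_parameters(z, true, fid):
    """alpha_perp and alpha_par as defined above.

    true, fid : dicts with keys 'h0', 'om', 'rs' (r_s in Mpc, supplied by the
    user; computing it requires a Boltzmann code or a fitting formula).
    """
    d_true = comoving_distance(z, true["h0"], true["om"])
    d_fid = comoving_distance(z, fid["h0"], fid["om"])
    a_perp = (d_true * fid["rs"]) / (d_fid * true["rs"])
    a_par = (hubble(z, fid["h0"], fid["om"]) * fid["rs"]) / (
        hubble(z, true["h0"], true["om"]) * true["rs"])
    return a_perp, a_par
```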
The AP parameters enter the 2PCF templates as a dilation of the measured radial and angular galaxy pair separations, in the same way as the γ⊥, ∥ parameters, defined in Eqs. (4) and (5), are incorporated into the clustering template to account for line interlopers (Eq. (22); Ross et al. 2017). Therefore, when analysing contaminated data, we expect these two sets of parameters to be highly degenerate. In particular, when employing the state-of-the-art model for the BAO peak (Euclid Collaboration: Sarpa et al., in prep.), which does not explicitly model systematic effects – neither for the redshift errors nor the interlopers signal included in Eq. (22) – we anticipate that the AP parameters will fully absorb the effects of the γ parameters, ultimately degrading the BAO constraints.
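The expected degeneracy can be made explicit with a toy calculation. Applying the interloper dilation (γ⊥, γ∥) and then the AP dilation (α⊥, α∥) to pair separations is equivalent to a single dilation by the products, so a template with only α parameters absorbs the γ factors. The sketch assumes that both act as multiplicative rescalings of the transverse and radial separations; if the γ convention is inverted, the products become ratios, but the conclusion is unchanged. The function name is illustrative.

```python
import numpy as np


def dilate(s, mu, a_perp, a_par):
    """Rescale pair separations: s_perp -> a_perp * s_perp, s_par -> a_par * s_par."""
    s_par = s * mu * a_par
    s_perp = s * np.sqrt(1.0 - mu**2) * a_perp
    s_new = np.hypot(s_par, s_perp)
    return s_new, s_par / s_new               # new separation and its cosine
```

Because `dilate(*dilate(s, mu, g_perp, g_par), a_perp, a_par)` equals `dilate(s, mu, g_perp * a_perp, g_par * a_par)`, a fit can constrain only the products, which is why the γ effects end up in the estimated α values.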
In the following, we quantify the amplitude of this degradation by applying the standard BAO model to the contaminated signal at different redshifts. The analysis presented in this article does not take into account the effects of interlopers on the BAO reconstruction, which are addressed in a companion paper (Euclid Collaboration: Sarpa et al., in prep.).
7.1. Methodology
We analysed the full set of EuclidLargeMocks introduced in Sect. 4, distinguishing between the correct and contaminated cases (correct + line + noise). By matching the fiducial cosmology to the true cosmology of the mocks when mapping redshift coordinates into distances, we expect the estimated values of the AP parameters to be consistent with unity in both scenarios.
Given that the model does not incorporate any source of systematic errors, we fit both the correct and the contaminated 2PCF multipoles using the same template (Sarpa et al. 2021)
(37)
where ξ Ph, (ν) encapsulates the physical properties of the signal, modulated by the linear bias, b, the growth rate of structure, f, and the phenomenological parameters Σ⊥ and Σ∥, which describe the anisotropic damping of the BAO in the directions transverse and parallel to the line of sight, respectively. The polynomial broadband term
(38)
was included to model high-order non-linear effects (beyond first-order perturbation theory) and systematic features not explicitly captured in the physical model.
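At fixed α⊥ and α∥, a template of this kind is linear in its amplitude and in the broadband coefficients, so these can be obtained in closed form by generalized least squares. The sketch below assumes a broadband made of inverse powers of r (here 0, 1, and 2, with a pivot r0 = 100 h−1Mpc to keep the system well conditioned) and a single multipole. These choices and the function name are assumptions; they are not necessarily the form of Eq. (38) or what BAOFITTER does, and our analysis samples the posterior with flat priors instead.

```python
import numpy as np


def fit_linear_params(r, xi_data, xi_template, cov, powers=(0, 1, 2), r0=100.0):
    """Generalized least squares for xi = B * template + sum_i a_i (r/r0)^(-p_i).

    Returns the coefficients [B, a_0, a_1, ...] and the chi^2 of the fit.
    The covariance is whitened with its Cholesky factor, and the system is
    solved with an SVD-based least-squares routine.
    """
    design = np.column_stack([xi_template] + [(r / r0) ** (-float(p)) for p in powers])
    chol = np.linalg.cholesky(cov)              # cov = L L^T
    xw = np.linalg.solve(chol, design)          # whitened design matrix
    yw = np.linalg.solve(chol, xi_data)         # whitened data vector
    coeffs, *_ = np.linalg.lstsq(xw, yw, rcond=None)
    chi2 = float(np.sum((yw - xw @ coeffs) ** 2))
    return coeffs, chi2
```

Whitening with the Cholesky factor gives the same χ² as the inverse-covariance form but avoids squaring the condition number of the design matrix.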
There are five main characteristics of the contaminated signal that are not accounted for in Eq. (37): the relative fraction of different mass tracers (correct galaxies and interlopers), their respective biases, the dilation of the BAO scale induced by line interlopers, the noise contribution from noise interlopers, and redshift errors. Noting that the biases and fractions of contaminants modulate the amplitude of the 2PCF signal, we expect their effects to be reabsorbed by the estimated values of the linear bias parameter, b. As previously discussed, since γ⊥ and γ∥ are degenerate with α⊥ and α∥, we used only the latter two as free parameters in the analysis. Finally, the broadband term BB accounts for both the noise background level and the redshift errors.
We validated our hypothesis on the effect of interlopers on the AP parameters by defining a Gaussian likelihood for the data (Eq. (33)) and sampling the posterior of the model parameters, assuming the flat priors detailed in Table 5. To better understand the impact of interlopers on the BAO scale, we modelled the monopole, quadrupole, and hexadecapole of the 2PCF averaged over the full set of mocks, using the same mock covariance matrix as in the previous section. In this set-up, the mean 2PCF multipoles serve as an effectively noiseless data vector, which allows us to assess the performance of the state-of-the-art BAO template in the presence of contaminated signals. Finally, we sampled the likelihood with the BAOFITTER Python package, restricting the fitting range to [50, 150] h−1Mpc.
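The likelihood set-up described above can be sketched as follows. This is a minimal illustration assuming NumPy, a data vector that stacks the monopole, quadrupole, and hexadecapole on a common separation grid, and a generic `model_fn`; it does not reproduce the BAOFITTER interface or the template of Eq. (37), and all function names and toy numbers are hypothetical.

```python
import numpy as np

def fit_mask(s, n_multipoles=3, rmin=50.0, rmax=150.0):
    """Mask selecting separations in [rmin, rmax] (h^-1 Mpc) for a data vector
    that stacks n_multipoles multipoles on the same separation grid s."""
    in_range = (s >= rmin) & (s <= rmax)
    return np.tile(in_range, n_multipoles)

def log_likelihood(data, model, cov):
    """Gaussian log-likelihood up to an additive constant: -chi^2 / 2."""
    resid = data - model
    # Solve C x = r instead of forming C^{-1} explicitly.
    return -0.5 * resid @ np.linalg.solve(cov, resid)

def log_posterior(theta, bounds, data, model_fn, cov):
    """Flat (uniform) priors: -inf outside the bounds, the likelihood inside."""
    for value, (lo, hi) in zip(theta, bounds):
        if not lo <= value <= hi:
            return -np.inf
    return log_likelihood(data, model_fn(theta), cov)

# Toy usage with a placeholder model (NOT the BAO template of Eq. (37)).
s = np.linspace(20.0, 200.0, 37)            # separation bins, h^-1 Mpc
mask = fit_mask(s)
full_cov = np.diag(np.full(3 * s.size, 1e-6))
cov = full_cov[np.ix_(mask, mask)]          # covariance restricted to the fit range
truth = np.ones(mask.sum())
model_fn = lambda theta: theta[0] * truth   # hypothetical one-parameter model
bounds = [(0.5, 1.5)]
```

In practice the posterior would be explored with a sampler rather than evaluated pointwise; the sketch only shows how the fitting range and the flat priors enter.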
7.2. Results of the BAO fit
Figures 14 and 15 show the 1σ and 2σ posterior contours for the AP parameters estimated from the correct (black) and contaminated (red) catalogues in the two reference bins used in this analysis, z1 and z3, respectively. At z1, where noise interlopers dominate over line interlopers, the contaminated constraints are in almost perfect agreement with the correct-galaxy results, showing no bias and only a slight increase in the uncertainty, of 9% for α⊥ and 4% for α∥. At z3, where the fractions of noise, O III, and S III interlopers are comparable, we detect a significant increase in the uncertainties, amounting to 21% for α⊥ and 14% for α∥ relative to the correct-galaxy case. As at z1, the estimated values are not biased by the presence of contaminants. The best-fit values of the AP parameters for z1, z3, and the other bins are listed in Table 6, together with the relative errors and the reduced χ² values, and visualized in Fig. 16.
Fig. 14. Alcock–Paczynski parameters in the redshift range z ∈ [0.9, 1.1].
Fig. 15. Alcock–Paczynski parameters in the redshift range z ∈ [1.3, 1.5].
Table 6. Best-fit values, percent relative errors, and χ² values for the AP parameters in all the redshift bins for both the correct and contaminated catalogues, with the α∥ and α⊥ values shifted by subtracting 1.
Fig. 16. Summary plot of the AP parameters in all the redshift bins, both in the correct and contaminated case.
Our results indicate that the presence of noise interlopers neither biases nor significantly degrades the BAO estimates, as their effects are effectively accounted for by the polynomial broadband term. In contrast, line interlopers result in a substantial enlargement of the probability contours. This enlargement can be attributed to the intrinsic clustering signal of these interlopers contributing to the measured 2PCF. Specifically, as discussed in Sect. 2.3, the S III and O III line interlopers exhibit BAO features located, respectively, at smaller and larger separations than the BAO peak of the correct-galaxy 2PCF. Consequently, the BAO peak of the contaminated sample is broader than that of a pure sample, since it can be roughly modelled as the superposition of three different Gaussian curves, which leads to larger uncertainties on the position of the centroid. Similar results hold for the other redshift bins (see Fig. 16).
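The broadening argument can be made concrete with a toy calculation of the width of a weighted sum of three Gaussians: one for the correct-galaxy peak and two shifted peaks standing in for the S III and O III features. Treating the BAO peak as a probability density is only the rough heuristic mentioned in the text, and the weights, positions, and widths below are hypothetical, chosen purely for illustration. The mixture variance is $\sum_i w_i(\sigma_i^2+\mu_i^2)-\bar{\mu}^2$ with normalised weights.

```python
import math

def mixture_moments(weights, centres, widths):
    """Mean and standard deviation of a weighted sum of Gaussians.

    Var = sum_i w_i (sigma_i^2 + mu_i^2) - mean^2, with the weights
    normalised to one."""
    total = sum(weights)
    w = [wi / total for wi in weights]
    mean = sum(wi * mu for wi, mu in zip(w, centres))
    second = sum(wi * (sig**2 + mu**2) for wi, mu, sig in zip(w, centres, widths))
    return mean, math.sqrt(second - mean**2)

# Hypothetical numbers: a dominant peak at 100 h^-1 Mpc plus two small
# interloper peaks shifted to smaller and larger separations.
weights = [0.90, 0.05, 0.05]
centres = [100.0, 90.0, 110.0]
widths = [8.0, 8.0, 8.0]
mean, width = mixture_moments(weights, centres, widths)
```

With these numbers the centroid stays at 100 h−1Mpc while the effective width grows from 8 to √74 ≈ 8.6 h−1Mpc: the peak position is unbiased, but it is less sharply defined, in line with the enlarged contours.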
Figure 17 concludes our analysis by illustrating the systematic bias between the contaminated and correct cases across all redshift bins. As discussed in Sect. 6.3, we quantify this bias as $\alpha_{\parallel,\perp}^{\rm contam} - \alpha_{\parallel,\perp}^{\rm correct}$ (blue and red dots). The error bars represent the error on the mean, estimated by dividing the standard deviations of the α∥ and α⊥ posterior distributions by the square root of the number of mocks. For comparison, the coloured bands represent the standard deviations of α∥ (blue) and α⊥ (yellow) in the correct-galaxy case, rescaled by a factor $1/\sqrt{6}$ to account for the six-fold larger area covered by the complete Euclid survey in DR3. In all cases, the systematic bias is significantly smaller than the statistical uncertainty.
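The comparison in Fig. 17 reduces to a few lines of arithmetic, sketched below under the assumption that the statistical error scales as the inverse square root of the survey area. The fits are to the mock-averaged 2PCF with the single-mock covariance, so the posterior standard deviation is divided by the square root of the number of mocks to obtain an error on the mean. The function names and numerical values are hypothetical and are not the measured values of Table 6.

```python
import math

def systematic_shift(alpha_contam, alpha_correct, sigma_post, n_mocks=1000):
    """Shift between contaminated and correct fits, with its error on the mean.

    sigma_post is a single-mock posterior standard deviation; dividing by
    sqrt(n_mocks) converts it into an error on the mean."""
    return alpha_contam - alpha_correct, sigma_post / math.sqrt(n_mocks)

def rescaled_band(sigma_correct, area_ratio=6.0):
    """Statistical error for a survey area_ratio times larger, assuming
    sigma scales as 1/sqrt(area)."""
    return sigma_correct / math.sqrt(area_ratio)

# Hypothetical values, for illustration only.
shift, err = systematic_shift(alpha_contam=1.002, alpha_correct=1.000, sigma_post=0.02)
band = rescaled_band(0.02)
is_subdominant = abs(shift) < band
```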
Fig. 17. Comparison between the systematic bias on the two AP parameters with respect to the reference case (correct).
All these results are consistent with the fact that the residuals between the correct and the measured 2PCF are very small at the BAO scale, as shown in Sect. 2.3. In conclusion, we find that a minimal 2PCF model that does not include the contribution of interloper galaxies does not introduce significant systematic errors in the estimate of the AP parameters when applied to contaminated catalogues, and can therefore be safely adopted in the analysis of the Euclid DR1 survey.
8. Conclusions
In this work we investigated the impact of interloper galaxies on the Euclid 2PCF and the effect of this impact on cosmological parameter inference during the early phase of the mission. Our forecast is based on the set of 1000 EuclidLargeMocks catalogues that simulate the survey area of Euclid DR1 and incorporate the realistic types and fractions of galaxy contaminants with incorrect measured redshifts. We introduced the formal expression for the measured 2PCF in Eq. (17), which specifies the individual contributions of each type of interloper. Using this expression, we evaluated the relative contributions of each population to the measured 2PCF signal.
In Sect. 5, we compared the amplitude of all the different terms with the measured signal, which allowed us to quantify the relevance of each term. The dominant contribution comes from the correct galaxy population, followed by the auto-correlation of line interlopers and the cross-correlation of the correct galaxies with noise interlopers. The relative importance of these latter contributions changes with redshift. All other contributions, including the remaining cross-correlation terms and the noise-noise auto-correlation, are negligible in the expected Euclid spectroscopic sample. All relevant contributions exhibit a distinct scale dependence, so the measured 2PCF is not merely a rescaled version of the true underlying one but has its own shape. As a result, the amplitude of the systematic errors induced by neglecting some terms in the modelling of the measured signal is scale dependent as well. In general, these systematic errors decrease towards large separations. For example, for z ∈ [1.3, 1.5], when all interloper contributions are ignored and only the correct galaxy contribution is considered, the mismatch with the measured 2PCF below 30 h−1Mpc is comparable to the statistical error expected for the Euclid DR1 sample. This mismatch decreases to 20% of the statistical error at the BAO scale and is further reduced on larger scales. We conclude that the sensitivity to interlopers is larger when the 2PCF analysis is extended to smaller scales. In all considered redshift bins, modelling the correct-noise term lowers the systematic error below 10% of the statistical uncertainty on a single measurement of the measured 2PCF at all scales.
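The scale dependence summarised above can be expressed as a per-bin ratio between the model mismatch and the statistical error. The sketch below is a hypothetical illustration assuming NumPy: it uses only the diagonal of the covariance (ignoring bin-to-bin correlations), and the helper names and toy ratios are invented for the example rather than taken from the paper.

```python
import numpy as np

def sys_to_stat_ratio(xi_measured, xi_model, cov):
    """Per-bin ratio |xi_measured - xi_model| / sigma, with sigma the square
    root of the covariance diagonal (single-realisation statistical error)."""
    sigma = np.sqrt(np.diag(cov))
    return np.abs(np.asarray(xi_measured) - np.asarray(xi_model)) / sigma

def smallest_safe_scale(r, ratio, threshold=1.0):
    """Smallest separation above which every bin has ratio < threshold,
    or None if even the largest-scale bin fails."""
    r = np.asarray(r)
    bad = np.flatnonzero(np.asarray(ratio) >= threshold)
    if bad.size == 0:
        return r[0]
    if bad[-1] == r.size - 1:
        return None
    return r[bad[-1] + 1]

# Toy numbers for illustration only (separations in h^-1 Mpc).
r = np.array([10.0, 20.0, 30.0, 40.0, 100.0])
ratio = np.array([1.5, 1.1, 0.8, 0.3, 0.2])
r_safe = smallest_safe_scale(r, ratio)
```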
In principle, these results do not rule out the possibility that a model for the correct-noise cross-correlation will need to be developed and included in the complete theoretical model for the cosmological analysis, especially if we are interested in studying the smallest, non-linear scales in view of DR3. Indeed, the assessment of the impact of interlopers in the Euclid mission will evolve as the survey progresses. As the survey area expands, sample variance and Poisson noise will decrease and the attained depth of the EDS will increase. This should lead to improved calibration and reduced systematic errors, including better control over interloper types and fractions. However, as long as we limit ourselves to DR1-like uncertainties, a model that only accounts for the auto-correlations of correct galaxies and line interlopers leads to systematic errors smaller than the expected statistical uncertainty. It should also be kept in mind that the amplitude and significance of each contribution and systematic effect depend on the redshift interval considered, through the types and fractions of interlopers that we expect. We will therefore only be able to draw final conclusions once the true fraction of interlopers in the Euclid survey is determined, which will occur after comparing EWS and EDS observations.
To evaluate the impact of interlopers on the inference of cosmological parameters, we conducted two analyses comparing the measured 2PCF with a series of 2PCF models that incorporate interloper contributions with increasing complexity. In the first analysis, we used the full 2PCF shape to constrain the clustering amplitude in redshift space, quantified by the normalized growth rate of structure fσ8 and the bσ8 parameter (Sect. 6). In the second one, we focused on the BAO feature and extracted the AP parameters α∥ and α⊥ (Sect. 7). Both studies indicated that for Euclid DR1, a minimal 2PCF model that accounts for the attenuation of the clustering signal, independent of the type of contaminants, is sufficient to accurately extract cosmological parameters in the presence of the expected interloper types and fractions.
In the full-shape analysis, we adopted a simple model for the 2PCF that accounts for a fixed redshift error, a damping of the BAO, linear RSD, and a Fingers-of-God damping on small scales (Addison et al. 2019). We tested different phenomenological models of increasing complexity in light of the results obtained in Sect. 5, while working at a fixed cosmology. To disentangle the systematic effects due to a specific interloper modelling from the suitability of the theoretical model chosen for the 2PCF, we always compared our results to a reference case in which only the non-contaminated part of the catalogue is used, so that the theoretical model is applied to a sample without interlopers. Our results showed that a simple 2PCF model, which only accounts for the attenuation of the correct galaxies clustering, is sufficient to estimate fσ8 with 1%–3% accuracy with respect to the reference case, depending on the redshift. In all tested models, the systematic error induced by an incomplete modelling of the interloper galaxies is well below the statistical uncertainty that we expect for DR1. For the DR3 analysis, in which the statistical error is expected to decrease by a factor of about $\sqrt{6}$, the effect of interlopers will require a more elaborate model than a simple attenuation in the clustering amplitude.
In the BAO analysis, we used the 2PCF template BAO model proposed by Euclid Collaboration: Sarpa et al. (in prep.) to fit the measured 2PCF. We quantified the systematic error induced by ignoring the presence of all interlopers altogether in the theoretical modelling. We used the same model to fit both the 100% pure and contaminated samples. Our results showed that the BAO analysis is extremely robust to the presence of interlopers. Specifically, the estimated values for α∥ and α⊥ obtained when ignoring the contribution of any type and fraction of interloper are almost identical to those from the analysis of the 100% pure sample, and the systematic bias between the two cases is significantly smaller than the statistical error expected not just for DR1 but also for DR3 spectroscopic catalogues of Euclid.
The results presented here form the basis for a companion paper, where the impact of interlopers on the two-point statistics is examined in Fourier space rather than in configuration space (Euclid Collaboration: Lee et al., in prep.). The primary difference between the two studies lies in the power spectrum model adopted. In the companion paper, the model is based on Effective Field Theory of Large-Scale Structure (EFT-of-LSS) predictions, which are expected to better capture non-linear effects. Although EFT models can in principle yield more precise cosmological inferences by extending the analysis to smaller scales and capturing non-linear effects, in this work our primary goal is different. As stated at the beginning of Sect. 6 and reiterated here in the Conclusion, we aim to disentangle the impact of interloper mis-modelling from the suitability of the theoretical description adopted for the 2PCF. For this reason, we consistently focus on the shifts in cosmological parameters between the reference case without interlopers and the test cases with specific interloper modelling, always at a fixed theoretical model for the 2PCF. This strategy makes our analysis effectively model independent and allows us to rely on a simpler theoretical framework with fewer nuisance parameters to marginalize over. As a validation of this approach, the results obtained in both configuration and Fourier space analyses are fully consistent.
A final aspect that we have not considered in this work is BAO reconstruction. State-of-the-art BAO analyses are performed on reconstructed catalogues, which are obtained from a backward non-linear transformation of the redshift-space positions of the observed galaxies (Eisenstein et al. 2007; Padmanabhan et al. 2009) that removes the effect of the non-linear evolution responsible for the damping of the BAO feature. However, the quality of the reconstruction depends on the characteristics of the sample and, potentially, on the presence of interlopers. A key open question is whether the BAO template model needs to be updated or is flexible enough to capture all the spurious effects that interlopers can introduce into the reconstruction. We plan to address this issue, along with other systematics, in a dedicated paper that analyses the first set of unblinded Euclid data (Euclid Collaboration: Sarpa et al., in prep.).
The interloper fraction assumed in this work, which never exceeds 20%, aligns with the expected contamination level by the end of the survey. This assumption has guided the construction of the simulated galaxy catalogues used to assess the impact of interlopers on clustering statistics. However, it is not guaranteed that this target contamination level will be achieved by the time of the first data release. The analysis of the recently obtained Euclid Quick Data Release (Q1, Euclid Collaboration: Aussel et al. 2025) data suggests that the contamination level is significantly higher than the one adopted in this work (Euclid Collaboration: Le Brun et al. 2026). While it is important to emphasize that a higher contamination level does not diminish the scientific value of the Q1 data – whose primary focus is astrophysical rather than cosmological and does not necessarily require a high level of purity – this finding serves as a warning that the 20% target contamination level may not be reached in DR1. If that is the case, we will need to update our simulated catalogues and repeat the analysis presented in this paper to reassess the impact of a higher interloper fraction on the BAO and full-shape analysis of the galaxy 2PCF.
Another crucial aspect to consider is our ability to accurately estimate the contamination level and characterize the nature of the contaminants. The baseline strategy adopted in the Euclid data analysis pipeline involves estimating the sample’s purity and completeness by comparing galaxy catalogues extracted at full depth with those obtained at the depth of the wide survey, using data from the EDS (Granett et al., in prep.). Since this method depends on the availability of full-depth EDS, which will only be achieved by DR3, it may not be optimal for the first data release. For this reason, alternative approaches are being explored. These include comparisons of redshift measurements for known sources against external, reliable datasets (Saito et al. 2020), self-calibration techniques based on galaxy-galaxy correlations across redshift bins (Peng & Yu 2023), and clustering redshifts (d’Assignies et al., in prep.). For DR1 in particular – where interloper quantification remains tentative – it will also be crucial to perform statistical tests to validate our contamination modelling, following approaches such as that described in Nguyen et al. (2025).
Acknowledgments
The authors acknowledge support from MIUR, PRIN 2022 (grant 2022NY2ZRS 001). Simulations and computations in this work have been run at the computing facilities of INFN, Sezione di Genova: the authors wish to thank the INFN IT personnel in Genova for their precious and constant support. P.M. acknowledges support from Italian Research Center on High Performance Computing Big Data and Quantum Computing (ICSC), by the Fondazione ICSC National Recovery and Resilience Plan (PNRR) Project ID CN-00000013 and by the PRIN 2022 PNRR project (code no. P202259YAF) funded by “European Union – Next Generation EU”, Mission 4, Component 1, CUP J53D23019100001. We acknowledge usage of Pleiadi system of INAF (Taffoni et al. 2020; Bertocco et al. 2020). The Euclid Consortium acknowledges the European Space Agency and a number of agencies and institutes that have supported the development of Euclid, in particular the Agenzia Spaziale Italiana, the Austrian Forschungsförderungsgesellschaft funded through BMIMI, the Belgian Science Policy, the Canadian Euclid Consortium, the Deutsches Zentrum für Luft- und Raumfahrt, the DTU Space and the Niels Bohr Institute in Denmark, the French Centre National d’Etudes Spatiales, the Fundação para a Ciência e a Tecnologia, the Hungarian Academy of Sciences, the Ministerio de Ciencia, Innovación y Universidades, the National Aeronautics and Space Administration, the National Astronomical Observatory of Japan, the Netherlandse Onderzoekschool Voor Astronomie, the Norwegian Space Agency, the Research Council of Finland, the Romanian Space Agency, the State Secretariat for Education, Research, and Innovation (SERI) at the Swiss Space Office (SSO), and the United Kingdom Space Agency. A complete and detailed list is available on the Euclid web site (www.euclid-ec.org/consortium/community/).
References
- Addison, G. E., Bennett, C. L., Jeong, D., Komatsu, E., & Weiland, J. L. 2019, ApJ, 879, 15
- Ballinger, W. E., Peacock, J. A., & Heavens, A. F. 1996, MNRAS, 282, 877
- Bertocco, S., Goz, D., Tornatore, L., et al. 2020, in Astronomical Data Analysis Software and Systems XXIX, eds. R. Pizzo, E. R. Deul, J. D. Mol, J. de Plaa, & H. Verkouter, ASP Conf. Ser., 527, 303
- Eisenstein, D. J., Seo, H., Sirko, E., & Spergel, D. N. 2007, ApJ, 664, 675
- Euclid Collaboration (Blanchard, A., et al.) 2020, A&A, 642, A191
- Euclid Collaboration (Scaramella, R., et al.) 2022, A&A, 662, A112
- Euclid Collaboration (Gabarra, L., et al.) 2023, A&A, 676, A34
- Euclid Collaboration (Aussel, H., et al.) 2025, A&A, submitted, arXiv:2503.15302
- Euclid Collaboration (de la Torre, S., et al.) 2025, A&A, 700, A78
- Euclid Collaboration (Jahnke, K., et al.) 2025, A&A, 697, A3
- Euclid Collaboration (Mellier, Y., et al.) 2025, A&A, 697, A1
- Euclid Collaboration (Monaco, P., et al.) 2025, A&A, 704, A306
- Euclid Collaboration (Le Brun, V., et al.) 2026, A&A, in press, https://doi.org/10.1051/0004-6361/202554607
- Farrow, D. J., Sánchez, A. G., Ciardullo, R., et al. 2021, MNRAS, 507, 3187
- Foreman-Mackey, D., Hogg, D. W., Lang, D., & Goodman, J. 2013, PASP, 125, 306
- Foroozan, S., Massara, E., & Percival, W. J. 2022, JCAP, 10, 072
- Hilmi, M., Leethochawalit, N., Trenti, M., & Metha, B. 2024, MNRAS, 532, 920
- Ivanov, M. M., & Sibiryakov, S. 2018, JCAP, 07, 053
- Keihänen, E., Kurki-Suonio, H., Lindholm, V., et al. 2019, A&A, 631, A73
- Landy, S. D., & Szalay, A. S. 1993, ApJ, 412, 64
- Levi, M., Allen, L. E., Raichoor, A., et al. 2019, BAAS, 51, 57
- Monaco, P., Theuns, T., & Taffoni, G. 2002, MNRAS, 331, 587
- Munari, E., Monaco, P., Sefusatti, E., et al. 2017, MNRAS, 465, 4658
- Nguyen, A. B., Massara, E., & Percival, W. J. 2024, JCAP, 04, 008
- Nguyen, A. B., Bonici, M., McGee, G., & Percival, W. J. 2025, JCAP, 01, 008
- Padmanabhan, N., White, M., & Cohn, J. D. 2009, PRD, 79, 063523
- Peng, H., & Yu, Y. 2023, MNRAS, 526, 820
- Pullen, A. R., Hirata, C. M., Doré, O., & Raccanelli, A. 2015, PASJ, 68, 12
- Ross, A. J., Beutler, F., Chuang, C.-H., et al. 2017, MNRAS, 464, 1168
- Saito, S., de la Torre, S., Ilbert, O., et al. 2020, MNRAS, 494, 199
- Sarpa, E., Veropalumbo, A., Schimd, C., Branchini, E., & Matarrese, S. 2021, MNRAS, 503, 540
- Taffoni, G., Becciani, U., Garilli, B., et al. 2020, in Astronomical Data Analysis Software and Systems XXIX, eds. R. Pizzo, E. R. Deul, J. D. Mol, J. de Plaa, & H. Verkouter, ASP Conf. Ser., 527, 307
For comparison to Pullen et al. (2015): ztrue ≡ zInt, zmeas ≡ zSELG, and λwrong ≡ λSEL.
Organisation Unit SPEctral extraction and redshift data.
The subscript ‘m’ stands for ‘measured’, since this is the only 2PCF that we are actually going to measure in the survey.
Sometimes we use the shorter version ‘ref’, especially inside equations.
The superscript ‘meas’ highlights that the 2PCF on the left are the measured ones (in this case performed on the mock catalogues), whereas the ones on the right are theoretical models.
As a caveat, galaxy bias depends on luminosity and interlopers do not have the same luminosity as the corresponding Hα galaxies, so it is not guaranteed that the bias can be fit by a single relation, as happens to be true in this case.
Appendix A: Scale dependence of the 2PCF prefactors
Equation (17) is exact when measuring all 2PCFs through the LS estimator. All terms in the equation are two-dimensional quantities depending on both the separation scale r and the cosine of the angle with respect to the line of sight μ. This holds for the prefactors in front of each term as well, since the pair numbers in the random catalogues depend on the window function and on the redshift distribution of each galaxy population. However, we expect that at first order those prefactors are linked to the fractions of interlopers in the catalogue. This comes naturally from Eq. (6), which says that the total density contrast is the weighted sum of the density contrasts of the single populations: the more abundant a population is, the greater its contribution to the total signal. The crucial point is that the fractions in Eq. (6) depend on the window function of the corresponding galaxy population.
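At first order, the link between prefactors and fractions follows from squaring the weighted sum: if $\delta = \sum_i f_i\,\delta_i$, then $\xi = \sum_{i,j} f_i f_j\,\xi_{ij}$, so auto-correlations carry $f_i^2$ and each unordered cross pair carries $2 f_i f_j$ (because $\xi_{ij}=\xi_{ji}$). The sketch below assumes that the weights in Eq. (6) reduce to constant number fractions, which is exactly the approximation whose validity is examined in this appendix. The population labels and fractions are hypothetical, not the EuclidLargeMocks values.

```python
from itertools import combinations

def first_order_prefactors(fractions):
    """Constant prefactors of each auto- and cross-correlation term when
    delta_tot = sum_i f_i delta_i with sum_i f_i = 1.

    Auto terms get f_i^2; each unordered cross pair gets 2 f_i f_j."""
    total = sum(fractions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("fractions must sum to one")
    pref = {(k, k): f * f for k, f in fractions.items()}
    for a, b in combinations(fractions, 2):
        pref[(a, b)] = 2.0 * fractions[a] * fractions[b]
    return pref

# Hypothetical fractions for one redshift bin (illustration only).
fractions = {"correct": 0.85, "noise": 0.10, "OIII": 0.03, "SIII": 0.02}
pref = first_order_prefactors(fractions)
```

Because the prefactors sum to $(\sum_i f_i)^2 = 1$, a population that makes up a small fraction of the sample contributes a correspondingly small amplitude, which is the intuition behind the statement that more abundant populations dominate the total signal.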
Unlike the correlation function, the scale dependence of the prefactors cannot be modelled from first principles but requires specific knowledge of the survey geometry and selection function. It is therefore crucial to understand whether this scale dependence is actually significant or whether it can be ignored when modelling the terms on the right-hand side of Eq. (17).
Figure A.1 shows the two-dimensional prefactors p(r, μ) for the correct galaxies, all populations of interlopers, and the correct-noise cross-correlation term in the low-redshift bin (left scale of the colour bar). Both angular and radial dependences arise, and they differ from term to term. To quantify the relevance of this scale dependence, we compare each constant prefactor appearing in Eq. (18), dubbed ci, where the subscript identifies the various prefactors, with the corresponding exact value pi derived from the pair count ratios. We show the percent difference between the two, defined as
$$\Delta p_i(r,\mu) = 100\,\frac{p_i(r,\mu) - c_i}{c_i} \tag{A.1}$$
Fig. A.1. Two-dimensional prefactors p(r, μ) (left side of the colour bar scale) and percent difference Δp(r, μ) (right side) for a subset of the total terms in Eq. (17) for z ∈ [0.9, 1.1].
The right scale of the colour bar in Fig. A.1 shows the two-dimensional percent difference Δp(r, μ) of the prefactors shown. Whereas the correct galaxy prefactor shows a mild, sub-percent difference with respect to the constant prefactor, some prefactors (such as the O III one) vary by tens of percent across the measured r and μ range.
For easier visualization and interpretation of these results, it is useful to examine the same quantities in one dimension, i.e. considering the dependence on the separation r only. Figure A.2 shows the r-dependent version p(r) of the prefactors in Fig. A.1, i.e. the two-dimensional prefactors averaged over μ, compared with their constant counterparts (dashed lines) for all redshift bins.
Fig. A.2. One-dimensional r-dependent prefactors p(r) for a subset of terms in Eq. (17), compared to their constant counterpart (dashed lines) for all redshift bins (different colours).
Two features are worth noting. First, the curves are systematically offset from the dashed lines. Offsets can be positive or negative, but in all cases they largely exceed the Poisson error derived from the pair counting (which is too small to be visible in the plot). The reason for the mismatch is that the number counts used to estimate these prefactors only consider pairs within the maximum separation used to compute the 2PCF, in this case 200 h−1Mpc, ignoring objects separated by larger distances. This systematic offset is therefore an artefact that could be eliminated by considering all the pair counts in the sample, which, however, would be too expensive computationally.
Second, while in some cases the coloured curves are reasonably constant, for some samples they are not and instead exhibit a very significant r-dependence. To make this statement more quantitative, we plot in Fig. A.3 the percent difference Δp(r) of the prefactors with respect to the corresponding constant value, i.e. Eq. (A.1) with p(r) in place of p(r, μ). The results in all redshift bins are shown together. The overall level of the percent difference on the y axis quantifies the mismatch between the fraction of objects estimated in the whole sample and that of the objects actually used in estimating the counts, as discussed above. The maximum percent variation across the r interval, on the other hand, quantifies the magnitude of the r-dependence, i.e. the prefactors’ genuine departure from a constant behaviour. Here we can see much more clearly that, depending on the redshift bin and on the term under analysis, the scale dependence can be significant, largely exceeding the 1% level.
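The μ-averaging and the two diagnostics discussed above, the overall offset of Δp(r) and its peak-to-peak variation across r, can be sketched as follows. The sketch assumes NumPy, a prefactor array of shape (n_r, n_μ), and a percent difference taken relative to the constant value ci, as described in the text; the function names and toy values are hypothetical.

```python
import numpy as np

def mu_average(p_rmu, mu_edges):
    """Average a 2D prefactor p(r, mu) over mu, weighting by mu-bin width.
    p_rmu has shape (n_r, n_mu); mu_edges has n_mu + 1 entries."""
    dmu = np.diff(mu_edges)
    return (p_rmu * dmu).sum(axis=1) / dmu.sum()

def percent_difference(p, c):
    """Delta p = 100 (p - c) / c, relative to the constant prefactor c."""
    return 100.0 * (np.asarray(p) - c) / c

def offset_and_variation(delta_p):
    """Mean offset and peak-to-peak variation of Delta p(r), in percent.
    The offset tracks the truncated pair counting; the variation tracks
    the genuine r-dependence."""
    delta_p = np.asarray(delta_p)
    return delta_p.mean(), delta_p.max() - delta_p.min()

# Toy example: a prefactor with no mu dependence and a bump in r.
mu_edges = np.linspace(0.0, 1.0, 11)
p_r_true = np.array([0.10, 0.11, 0.12, 0.11, 0.10])
p_rmu = np.repeat(p_r_true[:, None], 10, axis=1)
p_r = mu_average(p_rmu, mu_edges)
delta_p = percent_difference(p_r, c=0.10)
offset, variation = offset_and_variation(delta_p)
```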
Fig. A.3. One-dimensional r-dependent percent difference Δp(r) between scale-dependent and constant prefactors for a subset of terms in Eq. (17), for all redshift bins (different colours).
The specific spatial dependence of each prefactor arises from the different redshift distributions N(z) of the various objects in the contaminated sample. Objects in all random catalogues have a uniform angular distribution, but different redshift distributions matching those of the corresponding object type. As a result, the random pair counts feature a dependence on both r and μ that differs among the random samples. This r and μ dependence in the counts of the various random catalogues propagates into the 2PCF prefactors and causes the departure from the constant prefactor model, depending on the survey window function and on the redshift distribution of the sources.
Ultimately, the scale dependence of the prefactors was not significant for our purposes. In modelling the measured 2PCF in Sect. 6.2, we use both the exact prefactor and a constant approximation in front of the correct correlation. The latter is employed when fitting the unknown contamination fraction of the sample, which is reasonable given the correct prefactor’s very mild scale dependence across all redshift bins. For line interlopers, we choose a conservative approach by using the exact prefactors in front of the line correlations. Although the line interloper prefactors exhibit a stronger scale dependence (up to tens of percent), their absolute value remains small due to the low fraction of these interlopers compared to correct galaxies. Therefore, while scale dependence exists, it does not significantly affect the measured 2PCF at DR1-like precision.
Appendix B: Minimum separation scale for the amplitude parameters fit
As evidence that our theoretical model for the power spectrum does not yield realistic results when the fit extends to the smallest separation scales, we compare the outcomes obtained with a minimum scale of rmin = 20 h−1Mpc and rmin = 40 h−1Mpc in the fit.
Figure B.1 shows the contour plots in z1 and z3 for the reference case and for the minimal correct-only model with the exact prefactor for the correct galaxies contribution. When we include scales down to a minimum separation of 20 h−1Mpc in the fit, the bias between the inferred value of fσ8 and its true value, derived from the cosmology of the mocks, in z1 exceeds the statistical uncertainty with which we expect to determine fσ8 in DR1. Conversely, this bias is smaller than the uncertainty we foresee for DR1 when considering a minimum separation scale of 40 h−1Mpc in the fit (same figure, right panels), independently of the redshift bin. Including smaller scales lowers the statistical uncertainty, since we add signal in both the monopole and quadrupole, and therefore makes us more sensitive to systematic effects related to the choice of the power spectrum model. This demonstrates that the simple power spectrum model adopted in this analysis (see Sect. 6.1) cannot be considered reliable on scales smaller than 40 h−1Mpc, even in a DR1-like context and in the absence of contamination (grey contours).
Fig. B.1. Contour plot of the correct galaxy parameters for the reference case (grey contour) and the minimal correct-only model with the exact prefactor (green contour) in z ∈ [1.3, 1.5], running chains with rmin = 20 h−1 Mpc (left) and rmin = 40 h−1 Mpc (right).
The bias between the returned values in the reference case and the theoretical ones quantifies the inadequacy of the chosen model to represent the power spectrum. On the other hand, the bias between the results of various tests we performed to fit the contaminated signal, with or without considering certain types of interlopers, and those obtained in the reference case quantifies what we aim to determine: the impact of more or less accurate modelling of interlopers on the estimation of cosmological parameters at different redshifts. This bias graphically corresponds to the shift between the grey contours (or dots, depending on the plot) and the coloured ones, and it is analysed in Sect. 6.4.
Appendix C: The correct-noise cross-correlation in the EuclidLargeMocks
The separation of correct galaxies and noise interlopers based on |zmeas − ztrue| is affected by an intrinsic ambiguity: galaxies without emission lines can acquire a roughly correct redshift by chance. This induces some level of correct-noise cross-correlation. In our mocks, this is emphasized by the cut at 3σ0,z = 0.003 in the redshift residuals assumed when selecting correct galaxies. However, Fig. 1 shows that noise interlopers tend to cluster around the correct-galaxy and line-interloper lines (light blue shaded area around the straight lines) and to correlate with them up to distances of many σ0,z. This affects the correct-noise cross-correlation, which receives contributions from all the noise interlopers that were artificially shifted along the line of sight by Δr < 200 h−1Mpc (the maximum scale we consider when computing the 2PCF). In the EuclidLargeMocks, these sources make up 10%–20% of the noise contaminants, depending on the redshift bin under analysis. To test this interpretation, we attempted to model the cross-correlation signal between these two populations assuming that a fraction of the noise interlopers consists of correct galaxies with a larger (Gaussian) redshift error:
(C.1)
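The selection of correct galaxies and the Δr < 200 h⁻¹ Mpc criterion can be made concrete with a short sketch: a galaxy is labelled correct when |zmeas − ztrue| lies within 3σ0,z = 0.003, and otherwise the line-of-sight comoving shift decides whether it can still correlate with correct galaxies on the scales used for the 2PCF. This is a minimal illustration under stated assumptions: a flat ΛCDM background with Ωm = 0.32 as a placeholder (not necessarily the EuclidLargeMocks cosmology), line interlopers assumed already removed, and hypothetical function names and labels.

```python
import math

C_OVER_H0 = 2997.92458  # c / H0 in h^-1 Mpc


def comoving_distance(z, omega_m=0.32, n_steps=1000):
    """Line-of-sight comoving distance (h^-1 Mpc) in flat LCDM via Simpson's rule.
    omega_m = 0.32 is a placeholder, not the mocks' cosmology."""
    if z <= 0.0:
        return 0.0
    n = n_steps + (n_steps % 2)  # Simpson's rule needs an even number of intervals
    h = z / n

    def inv_e(x):
        return 1.0 / math.sqrt(omega_m * (1.0 + x) ** 3 + 1.0 - omega_m)

    total = inv_e(0.0) + inv_e(z)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * inv_e(i * h)
    return C_OVER_H0 * total * h / 3.0


def classify(z_true, z_meas, sigma0=0.001, r_max=200.0):
    """Hypothetical labelling of a galaxy that is not a line interloper."""
    if abs(z_meas - z_true) <= 3.0 * sigma0:
        return "correct"
    shift = abs(comoving_distance(z_meas) - comoving_distance(z_true))
    # Noise interlopers displaced by less than r_max can still correlate with
    # correct galaxies within the range of separations used for the 2PCF.
    return "noise_close" if shift < r_max else "noise_far"


def close_noise_fraction(pairs, **kwargs):
    """Fraction of noise interlopers whose line-of-sight shift is below r_max."""
    labels = [classify(zt, zm, **kwargs) for zt, zm in pairs]
    noise = [lab for lab in labels if lab != "correct"]
    return sum(lab == "noise_close" for lab in noise) / len(noise) if noise else 0.0
```

Applied to the (ztrue, zmeas) pairs of a mock, `close_noise_fraction` would return the share of noise interlopers that can still contribute to the correct-noise cross-correlation; the 10%–20% quoted above refers to the actual EuclidLargeMocks, which this toy does not reproduce.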
We tested various values for the Gaussian redshift error σ0,z. Figure C.1 shows the results of this test: each coloured curve represents the fit obtained with the correct-only model (see Sect. 6.2) for a different value of the redshift-error standard deviation. The parameterisation in Eq. (C.1) works particularly well for an effective σ0,z = 0.035, 35 times the instrumental error expected for Euclid (σ0,z = 0.001). As an example, a noise interloper originally located at ztrue = 1 whose redshift is mistaken by one effective σ0,z, so that zmeas = 1.035, is shifted along the line of sight by Δr ≈ 57 h⁻¹ Mpc < 200 h⁻¹ Mpc. One may ask whether explicit modelling of these cross terms will eventually be needed for the final Euclid results. However, because the two classes of galaxies cannot be separated in the EWS, and because noise interlopers matter only when their measured redshift is close to their true one (noise interlopers with large redshift errors no longer correlate with correct galaxies), this modelling can be absorbed into the modelling of the tails of the P(zmeas|ztrue) PDF. Simulations and preliminary measurements indicate that this PDF has heavy tails, which would further increase the number of correct galaxies misclassified as noise interlopers under a 3σ0,z criterion. The treatment of the tails of the redshift random-error PDF therefore warrants further investigation.
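The numbers quoted above can be checked with a back-of-the-envelope calculation. The Python sketch below approximates the line-of-sight shift as Δr ≈ c Δz / H(z̄), with H evaluated at the midpoint redshift, and uses the Gaussian error function to estimate how often a galaxy with the effective σ0,z = 0.035 falls outside the 3σ0,z = 0.003 window and would therefore be labelled a noise interloper. The flat ΛCDM background with Ωm = 0.32 is a placeholder assumption, so only the order of magnitude of Δr should be compared with the text.

```python
import math

C_OVER_H0 = 2997.92458  # c / H0 in h^-1 Mpc
OMEGA_M = 0.32          # placeholder flat-LCDM matter density (assumption)


def hubble_e(z, omega_m=OMEGA_M):
    """Dimensionless expansion rate E(z) = H(z) / H0 for flat LCDM."""
    return math.sqrt(omega_m * (1.0 + z) ** 3 + 1.0 - omega_m)


def los_shift(z_true, z_meas):
    """Midpoint approximation of the comoving line-of-sight shift (h^-1 Mpc)."""
    z_mid = 0.5 * (z_true + z_meas)
    return C_OVER_H0 * abs(z_meas - z_true) / hubble_e(z_mid)


def fraction_outside_cut(sigma_eff, cut):
    """P(|dz| > cut) for a zero-mean Gaussian redshift error of width sigma_eff."""
    return 1.0 - math.erf(cut / (sigma_eff * math.sqrt(2.0)))


sigma_instr, sigma_eff = 0.001, 0.035
print(f"sigma_eff / sigma_instr = {sigma_eff / sigma_instr:.0f}")
print(f"shift for z = 1 -> 1.035: {los_shift(1.0, 1.0 + sigma_eff):.1f} h^-1 Mpc")
print(f"fraction outside 3*sigma_instr: {fraction_outside_cut(sigma_eff, 3 * sigma_instr):.2f}")
```

With these placeholder values the shift comes out at roughly 58 h⁻¹ Mpc, of the same order as the ≈57 h⁻¹ Mpc quoted above and well below 200 h⁻¹ Mpc, while about 93% of galaxies with such a broad error would fail the 0.003 cut.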
Fig. C.1. Monopole (left) and quadrupole (right) of the correct-noise cross-correlation measurements and models in z ∈ [0.9, 1.1]. The solid black line represents the average of the ξcn measurements over the mock catalogues. Dashed lines correspond to different models for the signal (see Eq. C.1), each colour representing a different value for the redshift error in the 2PCF model.
All Tables
Geometric distortion factors γ for misidentified S III and O III galaxies in the four baseline Euclid spectroscopic bins.
Cosmological parameters that define the flat ΛCDM cosmology used to perform the EuclidLargeMocks parent simulations.
Mean fractions of the different interloper types in the four spectroscopic redshift bins in the contaminated EuclidLargeMocks.
Summary of all tests run in the MCMC, including the reference case (first line).
Best fit, percent relative error, and χ² values for the AP parameters in all the redshift bins for both correct and contaminated catalogues, with 1 subtracted from the α∥ and α⊥ values.
All Figures
Fig. 1. Representation of interloper galaxies in the (ztrue, zmeas) plane, representative of the Euclid spectroscopic selection. The extent of the vertical axis corresponds to the baseline observed redshift range used for the spectroscopic analysis. The plot refers to one of the EuclidLargeMocks (see Euclid Collaboration: Monaco et al. 2025, and Sect. 4.1).
Fig. 2. Monopole of the contaminated-sample auto-correlation (dashed black line) compared to the intrinsic auto-correlations of correct galaxies (blue line) and line interlopers (green and pink lines), not weighted by the prefactors in Eq. (17). The 2PCFs are averaged over the EuclidLargeMocks in z ∈ [1.3, 1.5]. Both the dilution of the clustering amplitude in the presence of contamination and the distortion of the line interlopers’ signal, in particular the shift of the BAO peak, are visible.
Fig. 3. Monopole, quadrupole, and hexadecapole moments of all terms in Eq. (17) averaged over all mock catalogues for z ∈ [0.9,1.1] (left) and z ∈ [1.3,1.5] (right). All terms comprise the correlation function and the corresponding prefactor. To simplify the visualization of all terms, the rightmost column of each panel shows a zoom-in on the smallest contributions in the corresponding redshift bin.
Fig. 4. Systematic errors in the monopole, quadrupole, and hexadecapole moments in the case of an incomplete parameterization, for z ∈ [0.9,1.1] (left) and z ∈ [1.3,1.5] (right). The grey bands correspond to the statistical uncertainty σm on the measured monopole and to 10% of σm. The y-axis scale is linear between −10⁻³ and 10⁻³ and symmetric logarithmic elsewhere.
Fig. 5. Redshift dependence of the bias in the EuclidLargeMocks, obtained by fitting the 2PCF of each population in the Euclid spectroscopic bins. Each colour refers to a different population and reveals the true redshifts of the line interlopers compared to correct galaxies. On top of the bias values, we plot the linear relation we derived to interpolate the redshift dependence of the bias.
Fig. 6. Comparison between the reference case (grey line), a model including interlopers (the minimal one, green line), and a model with contributions from correct galaxies only without the proper weighting (red line) in z ∈ [1.3, 1.5].
Fig. 7. Inferred fσ8 and bσ8 values and 1σ uncertainties for all tests in the two reference redshift bins. The grey band corresponds to 1σ around the values inferred in the reference case.
Fig. 8. Contour plots for fσ8 and bσ8 derived from three representative tests in z ∈ [0.9, 1.1] (top) and z ∈ [1.3, 1.5] (bottom). The dashed grey lines indicate the mean values of the reference case posterior distribution.
Fig. 9. Comparison of the cosmological-parameter constraints obtained with a correct-only constant-prefactor model (contam vs. (1 − ftot)² × correct), assuming an uncertainty of 1% and 10% on the total contamination fraction ftot. As an example, we show the results for z3.
Fig. 10. Regardless of the width of the prior on the total contamination fraction ftot, the β parameter is always constrained and its uncertainty does not change in a correct-only constant-prefactor model.
Fig. 11. Percent difference between the fσ8 value estimated in each chain and the one inferred in the reference case. Here, the errors on the percent difference are derived by considering the errors on the mean value of fσ8, i.e. the uncertainties given by the chains divided by the square root of the number of mocks.
Fig. 12. Agreement between the measured 2PCF
Fig. 13. Systematic bias on fσ8 of all tests with respect to the reference case. The grey bands represent an estimate of the precision on fσ8 at the end of the mission (DR3), when the observed volume will be six times larger than in DR1. As in Fig. 11, the error bars were derived using the errors on the mean values of the inferred fσ8.
Fig. 14. Alcock–Paczynski parameters in the redshift range z ∈ [0.9, 1.1].
Fig. 15. Alcock–Paczynski parameters in the redshift range z ∈ [1.3, 1.5].
Fig. 16. Summary plot of the AP parameters in all the redshift bins, both in the correct and contaminated case.
Fig. 17. Comparison between the systematic bias on the two AP parameters with respect to the reference case (correct).
Fig. A.1. Two-dimensional prefactors p(r, μ) (left side of the colour bar scale) and percent difference Δp(r, μ) (right side) for a subset of the total terms in Eq. (17) for z ∈ [0.9, 1.1].
Fig. A.2. One-dimensional r-dependent prefactors p(r) for a subset of terms in Eq. (17), compared to their constant counterpart (dashed lines) for all redshift bins (different colours).
Fig. A.3. One-dimensional r-dependent percent difference Δp(r) between scale-dependent and constant prefactors for a subset of terms in Eq. (17), for all redshift bins (different colours).
Fig. B.1. Contour plot of the correct galaxy parameters for the reference case (grey contour) and the minimal correct-only model with the exact prefactor (green contour) in z ∈ [1.3, 1.5], running chains with rmin = 20 h⁻¹ Mpc (left) and rmin = 40 h⁻¹ Mpc (right).