The Pristine survey

Akshara Viswanathan; Amanda Byström; Else Starkenburg; Anne Foppen; Jill Straat; Martin Montelius; Federico Sestito; Kim A. Venn; Camila Navarrete; Tadafumi Matsuno; Nicolas F. Martin; Guillaume F. Thomas; Anke Ardern-Arentsen; Giuseppina Battaglia; Morgan Fouesneau; Julio Navarro; Sara Vitali

doi:10.1051/0004-6361/202452073

Volume 706 (February 2026)

A&A, 706 (2026) A195

Open Access

Issue		A&A Volume 706, February 2026


Article Number		A195
Number of page(s)		27
Section		Galactic structure, stellar clusters and populations
DOI		https://doi.org/10.1051/0004-6361/202452073
Published online		12 February 2026

XXVIII. Journey to the Galactic outskirts: Mapping the outer halo red giant stars down to the very metal-poor end

Akshara Viswanathan¹,²,★, Amanda Byström³,¹,★★, Else Starkenburg¹, Anne Foppen¹, Jill Straat¹, Martin Montelius¹, Federico Sestito², Kim A. Venn², Camila Navarrete⁴, Tadafumi Matsuno⁵, Nicolas F. Martin⁶,⁷, Guillaume F. Thomas⁸,⁹, Anke Ardern-Arentsen¹⁰, Giuseppina Battaglia⁸,⁹, Morgan Fouesneau⁵, Julio Navarro² and Sara Vitali¹¹

¹ Kapteyn Astronomical Institute, University of Groningen, Landleven 12, 9747 AD Groningen, The Netherlands
² Dept. of Physics and Astronomy, University of Victoria, PO Box 3055, STN CSC, Victoria, BC V8W 3P6, Canada
³ Institute for Astronomy, University of Edinburgh, Royal Observatory, Blackford Hill, Edinburgh EH9 3HJ, UK
⁴ Université Côte d’Azur, Observatoire de la Côte d’Azur, CNRS, Laboratoire Lagrange, Nice, France
⁵ Astronomisches Rechen-Institut, Zentrum für Astronomie der Universität Heidelberg, Mönchhofstraße 12-14, 69120 Heidelberg, Germany
⁶ Université de Strasbourg, CNRS, Observatoire astronomique de Strasbourg, UMR 7550, 67000 Strasbourg, France
⁷ Max-Planck-Institut für Astronomie, Königstuhl 17, 69117 Heidelberg, Germany
⁸ Instituto de Astrofísica de Canarias, Calle Vía Láctea s/n, 38206 La Laguna, Santa Cruz de Tenerife, Spain
⁹ Universidad de La Laguna, Avda. Astrofísico Francisco Sánchez, 38205 La Laguna, Santa Cruz de Tenerife, Spain
¹⁰ Institute of Astronomy, University of Cambridge, Madingley Road, Cambridge CB3 0HA, UK
¹¹ Núcleo de Astronomía, Facultad de Ingeniería y Ciencias Universidad Diego Portales, Ejército 441, Santiago, Chile

★★ Corresponding authors.

Received: 31 August 2024
Accepted: 5 October 2025

Abstract

Context. The outer Galactic halo remains relatively unexplored, particularly regarding its metallicity distribution, merger debris, and the population of very and extremely metal-poor ([Fe/H] < -2.0) stars.

Aims. Using photometric metallicities from the Pristine survey data release 1 (PDR1) and Pristine-Gaia synthetic (PGS) catalogue and Gaia DR3 astrometry, we constructed well-characterised samples of bright (G < 17.6) red giant branch (RGB) stars in the outer halo. With accurate distances, these samples enable studies of the halo’s metallicity distribution, accreted debris, and very metal-poor (VMP) substructures beyond 40 kpc.

Methods. We selected giants by excluding stars with reliable Gaia parallaxes in brightness ranges where dwarfs are measurable. Purity and completeness were validated against the Pristine spectroscopic training set. Distances were derived using BaSTI isochrone fitting combined with Pristine metallicity estimates.

Results. The photometric distances reach ~100 kpc (PDR1) and ~70 kpc (PGS), with typical uncertainties of 12% and scatter up to 20-40% compared to parallax- and StarHorse-based distances. The PDR1 sample provides a nearly unbiased metallicity-distance view, while the PGS sample offers an all-sky map, especially at the very metal-poor end. Using PDR1-giants, we traced the halo metallicity distribution function out to 101 kpc, fitting a three-component Gaussian mixture model. The most metal-poor component becomes increasingly dominant with distance, as beyond 50 kpc, 40-50% of the stars are very metal-poor ([Fe/H] < -2.0). With added radial velocities, we identified metallicity trends in integrals-of-motion space and investigated accreted debris. The PGS sample reveals substructures, including the Pisces Plume, where 41 VMP stars are linked to the Magellanic stream.

Conclusions. We publish two RGB catalogues: PDR1-giants (180 314 stars, with 10 096 very metal-poor candidates and 2096 beyond 40 kpc) and PGS-giants (2 420 898 stars, with 75 679 very metal-poor candidates and 267 beyond 40 kpc). These catalogues represent extensive resources for future outer halo studies.

Key words: methods: data analysis / Galaxy: formation / Galaxy: halo / Galaxy: kinematics and dynamics / Galaxy: stellar content / Galaxy: structure

★ Both authors contributed equally.

© The Authors 2026

Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article is published in open access under the Subscribe to Open model.

1 Introduction

The stellar halo provides a unique insight into the assembly history of our Galaxy. Due to the long dynamical timescales, the halo has not undergone complete phase mixing. Therefore, measuring the orbital and chemical properties of halo stars can help reconstruct significant events in the Galaxy’s history. This is one of the main goals of the field of Galactic archaeology.

Early theories about the formation of the stellar halo considered both ‘dissipative’ (Eggen et al. 1962) and ‘dissipationless’ (Searle & Zinn 1978) formation channels. These are now referred to as ‘in situ’ and ‘accretion’ (or ‘ex situ’) channels. In the in situ scenario, halo stars are born within the Galaxy and are later dynamically heated to halo-like orbits (Cooper et al. 2015; Bonaca et al. 2017; Koppelman et al. 2018). Conversely, the accretion scenario, supported by hierarchical assembly in a cold dark matter cosmology, suggests that the halo was partially formed by the tidal disruption of smaller dwarf galaxies (Helmi & White 1999; Bullock & Johnston 2005; Abadi et al. 2006). In theory, the combined orbital and chemical properties of stars should provide a powerful method for understanding the origin of the halo. Such data can help differentiate between in situ and accreted stars based on distance, metallicity, and other factors. A key objective of Galactic archaeology is to identify the number of significant events contributing to the accreted halo, estimate the progenitor masses and orbital properties, and ultimately reconstruct the stellar halo’s formation. This research field has a rich history of using chemical and orbital properties to study the halo’s origins (e.g. Ryan & Norris 1991; Chiba & Beers 2000; Venn et al. 2004; Carollo et al. 2007; Bonaca et al. 2017; Helmi et al. 2018; Belokurov et al. 2018; Koppelman et al. 2018; Malhan et al. 2018; Koppelman et al. 2019; Myeong et al. 2019; Iorio & Belokurov 2019; Naidu et al. 2020; Yuan et al. 2020; Wang et al. 2022; Ibata et al. 2024).

Significant progress has been made in understanding the origin and nature of the Galactic halo, yet several key questions remain open. Recent work has shed light on the relative contributions of in situ and ex situ components, but the full extent of the in situ (hot thick disc) population and its dominance across the halo mass spectrum is still debated (Helmi et al. 2018; Belokurov et al. 2020; Lee et al. 2023; Khoperskov et al. 2023a). Investigating the radial and vertical extent of this component is essential for constraining the Galaxy’s accretion history. On the ex situ side, it remains uncertain whether the outer halo is primarily built from a few massive merger events or a multitude of lower-mass, ultra-faint galaxies (e.g. Deason et al. 2015, 2023; Khoperskov et al. 2023b). A radial variation of metallicity (i.e. a transition to a more metal-poor spherical halo beyond 40 kpc) has also been proposed but not yet conclusively established (e.g. Dietz et al. 2020; Liu et al. 2022; Medina et al. 2025). Whether such trends are driven by specific merger events or reflect a smooth component from disrupted globular clusters is also still under discussion. Furthermore, evidence of apocentric pile-ups from ancient mergers is now emerging (e.g. Balbinot & Helmi 2021; Perottoni et al. 2022; Zhu et al. 2024), though the completeness of our current samples may be limited by selection effects. In particular, colour cuts and the use of metallicity-sensitive standard candles may bias our view of the outer halo as being exclusively metal poor (e.g. Conroy et al. 2019b; Youakim et al. 2020; Bonifacio et al. 2021). By probing the distant metal-poor halo using individual stars, we are effectively resolving the remnants of galaxies with stellar masses in the range M_* = 10⁶−10⁷ M_⊙. This allows us to study the accreted high-redshift galaxy population in unprecedented detail, in a mass range that is currently beyond the reach of direct observations of high-redshift galaxies.

Pre-Gaia, attempts to directly explore the distant halo faced the challenge of targeting rare, faraway stars without the advantage of Gaia parallaxes to exclude nearby contaminants, and therefore colour-cuts or low-metallicity cuts had to be implemented. The main problem with these methods is that the selections and their consequences have not been explored fully enough to understand their advantages and limitations. Fortunately, the observational landscape is rapidly improving on multiple fronts. Gaia has measured proper motions and parallaxes for over 1.8 billion stars down to G~21 (Gaia Collaboration 2023b). While the majority of halo stars are too distant for precise parallax measurements and too faint for Gaia’s radial velocity measurements, these new data are transforming our understanding of the stellar halo close to the Sun (e.g. Lövdal et al. 2022; Ruiz-Lara et al. 2022; Dodd et al. 2023). Additionally, very metal-poor (VMP; [Fe/H] < -2.0) stars are considered ‘fossils’ of the early Universe, preserving its primordial chemical composition. Likely among the oldest stars, possibly even the first-generation (population III, Beers & Christlieb 2005; Frebel & Norris 2015; Hansen et al. 2018; Starkenburg et al. 2018) stars, VMP stars are key tracers of the Milky Way’s formation and have greatly advanced our understanding of its assembly history (see e.g. Bond 1970, 1981; Beers et al. 1985, 1992; Christlieb et al. 2002; Nissen & Schuster 2010; Yuan et al. 2020).

To date, nearly all observational work on the stellar halo has used tracers biased by metallicity. These biases arise because halo stars are rare and generally more metal poor than disc stars, combined with the need to efficiently use spectroscopic resources (Starkenburg et al. 2009; An et al. 2013; Battaglia et al. 2017). Before Gaia, the most efficient method to distinguish the halo from disc stars was by selecting stars with low metallicities. This bias can occur at two stages: when selecting targets for spectroscopic follow-up and when identifying halo stars from the final sample. For instance, the SDSS calibration stars used by Carollo et al. (2007, 2010) to study the stellar halo were selected based on their blue colours. The SDSS SEGUE sample of K giants, used to study the halo at great distances (Xue et al. 2015; Das & Binney 2016), was selected for spectroscopic follow-up using colour-cuts favouring low metallicities. Photometric metallicities of F/G turnoff stars are another common method for studying the stellar halo, but such samples also favour lower metallicity stars due to their construction based on colour-cuts (Ivezić et al. 2008; Sesar et al. 2011; Zuo et al. 2017). Additionally, rarer populations such as RR Lyrae and blue horizontal branch stars have been extensively used due to their status as standard candles (Deason et al. 2011; Cohen et al. 2017; Hernitschek et al. 2017; Sesar et al. 2017; Thomas et al. 2018; Starkenburg et al. 2019; Iorio & Belokurov 2019). However, these populations also preferentially trace metal-poor stars. The biases incurred by using these populations to study the halo are difficult to overcome without near-perfect knowledge of the underlying population and what fractions of stars were included in the sample. These different observational methods have led to sometimes conflicting conclusions about the chemical-orbital structure of the stellar halo.

To study the metallicity structure of the distant Galactic halo, it is essential to construct a sample in which the effects of selection, distance uncertainties, and metallicity biases are well understood and corrected. Tip of the red giant branch (RGB) stars serve as luminous tracers that offer significantly better proper motion precision (tangential velocity uncertainties of ∆_v ≤ 30 km/s at 80 kpc) than RR Lyrae or blue horizontal branch (BHB) stars (Chandra et al. 2023b). Thus, RGB stars enable us to probe the halo at greater distances with improved kinematics in a magnitude-limited sample. However, this kinematic advantage comes with a trade-off in distance precision, as RGB distance estimates depend sensitively on metallicity. To address this, we based our selection on photometric metallicity catalogues from the Pristine survey and Gaia XP-based narrow-band CaHK magnitudes, which provide complementary coverage in depth and sky area. In this work, we use these catalogues to construct a sample of RGB stars suitable for distant halo studies, carefully tracking and correcting for selection biases. Our approach allows us to probe the outer halo’s metallicity distribution and search for very metal-poor substructures, including features linked to past merger events and interactions with the Large and Small Magellanic Clouds (respectively, LMC and SMC).

The rest of the paper is divided into four sections. In Sect. 2 we describe the input photometric metallicity catalogues, the spectroscopic training sample of the Pristine survey on which our RGB selection pipeline was tested, and the quality cuts used in the input photometric metallicity catalogues. In Sect. 3, we describe the full RGB selection pipeline solely based on photometry and parallax without the need for good distances, radial velocity, or other astrometric parameters; the photometric distance calculation using an isochrone fitting code; and the final purity and completeness of our RGB selection pipeline, using spectroscopic surface gravity determined independently from the Gaia XP spectra. Section 4 presents some of our main results, including a description of the two catalogues of giants, a bias-corrected view of the metallicity distribution functions out to large distances, a 6D subset built with literature spectroscopic radial velocities, whose orbital properties we use to associate stars with several known accretion events, and an all-sky view of the outer Galactic halo at the VMP end and its implications. Finally, in Sect. 5 we present the main results, conclusions, and outlook in a broad context.

2 Data

In this section, we describe the two input photometric metallicity catalogues used to construct the RGB catalogues that probe the outer Galactic halo: the Pristine survey’s first data release (PDR1), and the Pristine-Gaia synthetic (PGS) catalogue, obtained by running the Gaia XP spectra through the Pristine survey model. We also describe the quality cuts applied to these input catalogues before passing them through our RGB selection pipeline, and briefly describe the Pristine survey’s training sample, on which we ran the selection pipeline to test the purity and completeness of our selection methods.

2.1 Pristine Data Release 1 catalogue of photometric metallicities

The Pristine survey observes the northern sky using the MegaCam wide-field imager located on the Canada-France-Hawaii Telescope at Mauna Kea. This survey utilises a narrow-band filter centred on the calcium (Ca) II H&K lines (CaHK) in the near UV at 3968.5 and 3933.7 Å, which are highly sensitive to metallicity. When combined with SDSS g, r, and i broad-band filters or Gaia’s BP-RP broad-bands, this narrow-band filter has been proven to provide reliable estimates of stellar metallicity (Starkenburg et al. 2017a; Martin et al. 2024). This pre-selection method is particularly effective at identifying very and extremely metal-poor stars (VMP, [Fe/H] < -2.0, and EMP, [Fe/H] < -3.0; Youakim et al. 2017; Aguado et al. 2019; Viswanathan et al. 2025), and also picks up ultra metal-poor stars (UMP, [Fe/H] < -4; the remnants of first stars, Starkenburg et al. 2018).

The Pristine photometry, in conjunction with SDSS broadband photometry, has paved the way for subsequent medium and high-resolution spectroscopic studies. These investigations specifically target stars with the lowest metallicity estimates derived from Pristine CaHK observations. The outcomes of these spectroscopic follow-up efforts have been successful, as evidenced by various works such as Caffau et al. (2017); Starkenburg et al. (2018); Bonifacio et al. (2019); Caffau et al. (2020); Venn et al. (2020); Kielty et al. (2021); Lardo et al. (2021); Lucchesi et al. (2022); Caffau et al. (2023); Lombardo et al. (2023). The combination of metallicity-sensitive medium/narrow-band photometry with broadband photometry has resulted in significant samples of VMP and EMP star candidates.

The new version of the Pristine survey now uses Gaia broad-bands instead of SDSS broad-bands to infer these photometric metallicities down to [Fe/H] of -4.0. PDR1 was made public by Martin et al. (2024, hereafter MS23) for every star in the Pristine survey that has a Gaia XP spectrum released in Gaia data release 3, down to magnitudes of G~17.6 (Gaia Collaboration 2023b; De Angeli et al. 2023). The high accuracy of Gaia XP data allowed for the reprocessing and recalibration of the entire Pristine CaHK dataset, consisting of approximately 11 500 images taken since 2015. This update expanded the survey to cover over 6500 square degrees. The enhanced photometric catalogue now achieves a precision of 13 millimagnitudes (mmag), a significant improvement from the initial precision of 40 mmag. In the updated methodology, the Pristine approach for determining the photometric metallicity of stars from CaHK and broadband magnitudes now relies exclusively on Gaia broadband data (G, G_BP, G_RP) rather than SDSS data. An iterative method for extinction correction has been implemented, incorporating corrections on both Gaia broadband magnitudes and CaHK synthetic (or Pristine CaHK) narrowband magnitudes, considering the star’s photometric temperature and metallicity. Variable stars are a significant source of contamination in photometrically selected low-metallicity catalogues, as their varying brightness can mimic the colours of metal-poor stars (Starkenburg et al. 2017a; Lombardo et al. 2023). To mitigate the effects of photometric variability, which can lead to inaccurate metallicities, a variability model based on the photometric uncertainties of the 1.8 billion Gaia sources was also included, as described in Section 5 of MS23.

Spectroscopic analyses of red giant stars have confirmed that approximately 38% of the extremely metal-poor star candidates have [Fe/H] values below −3.0 when quality flags are considered during target selection, which makes these catalogues among the most successful at finding extremely metal-poor stars (Viswanathan et al. 2025). However, it is crucial to account for potential variability in these stars and to implement rigorous photometric quality cuts to ensure accurate characterisation of their metallicities (Lombardo et al. 2023). Thanks to Gaia DR3, several quality cuts have been improved, and recent spectroscopic follow-ups have had much greater success at finding very and extremely metal-poor stars (Viswanathan et al. 2024, 2025). We used PDR1 to create our first sample of red giants in the outer halo.

2.2 Pristine-Gaia synthetic catalogue of photometric metallicities

The detailed creation of the PGS catalogue is described in MS23, and we only provide a brief summary here. Using the latest Gaia data release (Gaia Collaboration 2023b, DR3), spectrophotometric XP information (De Angeli et al. 2023) was used to construct an extensive catalogue of synthetic CaHK magnitudes, mimicking the narrow-band photometry used in the Pristine survey (Gaia Collaboration 2023a). Additionally, several recent studies have released metallicity estimate catalogues based on these Gaia XP spectra (Andrae et al. 2023; Zhang et al. 2024; Xylakis-Dornbusch et al. 2024).

Using both Pristine CaHK magnitudes and XP-based synthetic CaHK magnitudes within the Pristine model, two catalogues of photometric metallicities for reliable stars were made public: the PGS catalogue and the PDR1 catalogue of photometric metallicities, which includes stars common to both Pristine and the XP catalogue from Gaia DR3. The latter, serving as the first data release of the Pristine survey, provides deeper data with better signal-to-noise (S/N) ratios for stars in common. These catalogues enable the construction of reliable samples of metal-poor stars, with a particular focus on V/EMP stars. The PGS catalogue offers photometric metallicities over a large portion of the sky, while the PDR1 catalogue, limited to the Pristine survey’s footprint, delivers high-quality metallicities and extends to significantly fainter stars. We used this PGS catalogue to create our second sample of red giants in the outer halo.

2.3 Quality cuts on PDR1 and PGS catalogues

To construct a pure and complete sample of RGB stars out to large distances, we use the PDR1 and PGS catalogues of photometric metallicities. It is important to note that we use the photometric metallicities inferred using the giants subsample of the training sample in the rest of this paper, i.e. the metallicities inferred for each star in Pristine if it were a giant. We used the following quality cuts on these input catalogues to end up with reliable photometric metallicities and to allow for an efficient selection of giants based on photometry and astrometry, most of which follow the suggestions from MS23:

The photometric metallicity uncertainty is less than 0.5 dex (0.5*(FeH_CaHKsyn_84th - FeH_CaHKsyn_16th)<0.5, or 0.5*(FeH_Pristine_84th - FeH_Pristine_16th)<0.5).
The 84th percentile value of the probability distribution function (PDF) of the photometric metallicity is greater than −3.999 (FeH_CaHKsyn_84th>-3.999, or FeH_Pristine_84th>-3.999).
The fraction of Monte Carlo iterations used to determine [Fe/H] uncertainties that fall inside the grid is greater than 80% (mcfrac_CaHKsyn>0.8, or mcfrac_Pristine>0.8).
The CASU photometric data reduction flag marks the source as very likely (or likely) a point source (merged_CASU_flag = −1 (or −2)); this cut is applicable only to sources with PDR1 observations.
The colour excess E(B-V) is less than 0.5 (E(B-V)<0.5).
The photometric quality cut is defined as |C^*| < 3σ_{C^*} (abs(Cstar)<3*Cstar_1sigma), where Cstar is the Gaia DR3 corrected flux excess, C^*, as defined in equation 6 of Riello et al. (2021), and Cstar_1sigma is the normalised standard deviation of C^* at the G magnitude of the source, σ_{C^*}, as defined in equation 18 of Riello et al. (2021).
The probability of a star being a variable star is less than 30% (Pvar<0.3).
The Gaia astrometric quality cut is satisfied (Gaia’s renormalised unit weight error RUWE < 1.4).
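
As an illustration only, the cuts listed above can be written as a single boolean test per catalogue row. The following minimal Python sketch assumes each row is available as a dictionary keyed by column name. The keys EBV and RUWE, the prefix argument, and the function name are hypothetical choices made for this example and are not confirmed column names of the released catalogues; the remaining keys follow the names quoted in the list.

```python
def passes_quality_cuts(star, prefix="CaHKsyn"):
    """Return True if one catalogue row passes all quality cuts.

    star   : dict mapping column names to values for a single source.
    prefix : "CaHKsyn" for PGS rows, "Pristine" for PDR1 rows.
    The keys "EBV" and "RUWE" are assumed names for this sketch.
    """
    p16 = star[f"FeH_{prefix}_16th"]
    p84 = star[f"FeH_{prefix}_84th"]
    checks = [
        0.5 * (p84 - p16) < 0.5,                          # [Fe/H] uncertainty below 0.5 dex
        p84 > -3.999,                                     # PDF not piled up at the grid edge
        star[f"mcfrac_{prefix}"] > 0.8,                   # Monte Carlo draws inside the grid
        star["EBV"] < 0.5,                                # low extinction
        abs(star["Cstar"]) < 3.0 * star["Cstar_1sigma"],  # Gaia flux-excess quality
        star["Pvar"] < 0.3,                               # unlikely to be variable
        star["RUWE"] < 1.4,                               # Gaia astrometric quality
    ]
    if prefix == "Pristine":
        # The CASU point-source flag exists only for sources with PDR1 observations.
        checks.append(star["merged_CASU_flag"] in (-1, -2))
    return all(checks)


example = {
    "FeH_Pristine_16th": -2.3, "FeH_Pristine_84th": -1.9,
    "mcfrac_Pristine": 0.95, "merged_CASU_flag": -1,
    "EBV": 0.05, "Cstar": 0.01, "Cstar_1sigma": 0.02,
    "Pvar": 0.1, "RUWE": 1.0,
}
print(passes_quality_cuts(example, prefix="Pristine"))
```

Keeping the CASU flag conditional on the prefix mirrors the statement that this flag applies only to PDR1 sources, so the same function can be reused for PGS rows that lack that column.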

2.4 Spectroscopic training sample used by the Pristine survey model

The Pristine survey’s model to derive photometric metallicities from CaHK narrow-band magnitudes is limited to FGK stars because, for hotter stars, the CaHK absorption lines are too weak to serve as reliable metallicity indicators. On the cooler end of the spectrum, very cool M stars and cool K giants have prominent molecular bands that significantly lower the level of the pseudo-continuum in the relevant wavelength range, making it challenging to measure the CaHK absorption features. Therefore, MS23 restrict their analysis to stars with 0.5 < (G_BP,0 - G_RP,0) < 1.5, covering evolutionary stages from the upper main sequence and turn-off to the tip of the RGB for an old, VMP stellar population. This colour interval corresponds to a temperature range of approximately 3900 < T_eff < 7000 K. The colour cut is also necessary due to the lack of VMP stars in the training sample outside this colour range.

Following the methodology described in Starkenburg et al. (2017a), MS23 use a training sample of 66 000 stars to map the de-reddened (CaHK, G, G_BP, G_RP) colour space onto photometric metallicities. A major component of this sample consists of SDSS/SEGUE stars (Yanny et al. 2009; Smee et al. 2013) within the Pristine footprint, with an average signal-to-noise ratio per pixel greater than 25 over the 400-800 nm wavelength range. Additionally, the SDSS pipeline must provide log g values, adopted T_eff < 7000 K, radial velocity uncertainty <10 km/s, and adopted spectroscopic metallicity [Fe/H] with an uncertainty <0.2 dex. For our outer halo studies, red giant stars are necessary, which are less numerous in the training sample. Therefore, MS23 complement the SDSS sample with APOGEE DR17 giants. To cover the rare very, extremely and ultra metal-poor stars as much as possible in the training sample and to provide reliable photometric metallicities in the VMP end, the training sample is supplemented by stars from the Pristine survey’s spectroscopic follow-up programs (Youakim et al. 2017; Aguado et al. 2019; Venn et al. 2020; Kielty et al. 2021; Lardo et al. 2021; Lucchesi et al. 2022), VMP stars from the third data release of the LAMOST survey (Li et al. 2018) corrected for spurious stars in the low-temperature range as described by Sestito et al. (2020) using the latest LAMOST DR8 catalogue, the PASTEL sample (Soubiran et al. 2016) as used by Huang et al. (2022), and high-resolution observations of the Boötes I dwarf galaxy (Gilmore et al. 2013; Frebel et al. 2016).

In order to have a uniform sample of stars with homogeneously analysed atmospheric parameters to test our RGB selection pipeline, we use only the SEGUE subset of the training sample and exclude the other subsamples of the training set defined by MS23 in their Section 3.

3 Methods

The pipeline for constructing the final catalogue was tested on the training sample described in Section 2.4. Because this sample contains spectroscopic log g values, we can reliably divide the stellar sample into dwarfs and giants by their log g measurements. We choose to define dwarfs as all stars with log g > 3.5 and giants as all stars with log g < 3.5. This division allows us to compute the purity and completeness of giants for every new cut we apply to the catalogue, and we can choose to apply the cuts that maximise both. We define purity as the number of stars with log g < 3.5 remaining after our selections divided by the total number of stars after our selections, and completeness as the number of giants after our selections divided by the total number of giants in the training sample (stars with log g < 3.5). We apply the colour cut 0.5 < G_BP,0 - G_RP,0 < 1.5 to the training sample to match the colour range over which Pristine photometric metallicities are assigned (MS23). It is important to note that the spectroscopic training sample based on SEGUE is not fully representative of our input PDR1/PGS catalogues of photometric metallicities. This is because of different on-sky coverage, magnitude range and target selection effects. Therefore, the purity and completeness derived from them do not necessarily trace the purity and completeness of our final catalogue of RGB stars. However, the training sample allows us to efficiently test our method, which in turn allows us to better understand the consequences of our selection pipeline.
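
The purity and completeness definitions above translate directly into a few counts. The following is a minimal Python sketch under stated assumptions: the function name and the plain-list inputs are hypothetical, and a star with log g exactly equal to 3.5 is counted as neither a dwarf nor a giant, since the text uses strict inequalities on both sides.

```python
LOGG_DIVIDE = 3.5  # giants have log g < 3.5, dwarfs have log g > 3.5


def purity_completeness(logg, selected):
    """Purity and completeness of a giant selection on a labelled sample.

    logg     : spectroscopic log g values, one per star.
    selected : booleans, True if the star survives the selection cuts.
    Returns (purity, completeness) as fractions between 0 and 1.
    """
    n_giants_total = sum(1 for g in logg if g < LOGG_DIVIDE)
    n_selected = sum(1 for s in selected if s)
    n_giants_selected = sum(
        1 for g, s in zip(logg, selected) if s and g < LOGG_DIVIDE
    )
    # Guard against empty selections or samples without giants.
    purity = n_giants_selected / n_selected if n_selected else float("nan")
    completeness = (
        n_giants_selected / n_giants_total if n_giants_total else float("nan")
    )
    return purity, completeness


# Toy sample: three giants and two dwarfs; the cut keeps two giants and one dwarf.
logg = [2.0, 2.5, 3.0, 4.5, 4.2]
selected = [True, True, False, True, False]
purity, completeness = purity_completeness(logg, selected)
print(f"purity = {purity:.2f}, completeness = {completeness:.2f}")
```

In this toy case both quantities equal 2/3: two of the three selected stars are giants, and two of the three giants in the sample are selected.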

In Sections 3.1, 3.2 and 3.3, we introduce the motivation for the two cuts we apply to the catalogue. In Section 3.4, we explain how distances to the stars are derived. The main advantage of our method is that we select RGB stars using only parallax¹ and photometry, without the need for good distances, radial velocities, or atmospheric parameters.

3.1 Parallax-based colour-absolute magnitude diagram cut

For FGK stars, the main contamination in the selection of giants is dwarfs in the same colour range. An obvious difference between giant and dwarf stars is where they are located in the colour-absolute magnitude diagram (CaMD), so if we can create a CaMD of the sample, we can use the location of stars in it to separate them. Because all of our stars have Gaia parallaxes, we can invert these parallaxes to get stellar distances, and use the apparent magnitudes to get absolute magnitudes using the distance modulus equation, to produce a CaMD. However, inverting parallaxes to get distances can only be done for good quality parallaxes (parallax_over_error>5), as recommended by the Gaia consortium (Bailer-Jones et al. 2018). In practice this means that this cut, which we refer to as the parallax-based CaMD cut, is only applied to the subset of stars that have ‘good enough’ parallaxes². We use inverted parallax simply as a means to remove nearby dwarfs with good enough parallax, and therefore our quality cut on parallax errors can be less strict than what is recommended when using the parallax to obtain distances. For all stars with bad parallaxes, there is no equivalent way of applying these cuts. The following part of the method is thus only applied to the good enough parallax subset of the catalogue.

We define good (or well-defined) parallax as being neither negative nor zero, as both imply a non-physical distance, as well as the fractional parallax uncertainty being less than some value f = ∆π/π, with f ∈ (0,1). It is important to note that we do not use the inverted parallaxes to get distances for these stars, but simply use them to remove dwarfs that have good enough parallaxes (see Section 3.4 for the description of photometric distance inference).

To compute the purity and completeness for different values of f, we needed to first perform the parallax-based CaMD cut, so we needed to define a division between dwarfs and giants. We visually inspected the CaMD of the training sample for f ≤ 0.1, i.e. we only looked at stars with very good parallaxes (uncertainties less than 10%), and we constructed the following piece-wise division between dwarfs and giants:

$M_{G} = \begin{cases} 0.9, & G_{BP,0} - G_{RP,0} < 0.8, \\ 6.25\,(G_{BP,0} - G_{RP,0}) - 1.7, & 0.8 < G_{BP,0} - G_{RP,0} < 0.952, \\ 4.25, & 0.952 < G_{BP,0} - G_{RP,0}. \end{cases}$ (1)

All the stars that fall below these lines are assumed to be dwarfs. We then tested different values of f using the above distinction between dwarfs and giants.
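
To make the division concrete, the sketch below evaluates Equation (1) and the absolute magnitude obtained from an inverted parallax, M_G = G + 5 log10(π/mas) − 10, which is the standard distance modulus rewritten for a parallax in milliarcseconds. The function names are hypothetical. Two points are assumptions of this sketch rather than statements from the text: ‘below the line’ is read as fainter, i.e. a larger M_G than the division value, and colours exactly equal to 0.8 or 0.952, which Equation (1) leaves undefined, are assigned to the redder segment. The G magnitude and colour are assumed to be extinction-corrected.

```python
import math


def absolute_magnitude(g_mag, parallax_mas):
    """M_G from an inverted parallax: M_G = G + 5 log10(parallax / mas) - 10."""
    if parallax_mas <= 0:
        raise ValueError("inverting a parallax requires a positive value")
    return g_mag + 5.0 * math.log10(parallax_mas) - 10.0


def division_line(colour):
    """Dwarf-giant division of Equation (1) as a function of (G_BP - G_RP)_0."""
    if colour < 0.8:
        return 0.9
    if colour < 0.952:
        return 6.25 * colour - 1.7
    return 4.25


def is_dwarf_on_camd(g_mag, parallax_mas, colour):
    """True if the star lies below (is fainter than) the division line."""
    return absolute_magnitude(g_mag, parallax_mas) > division_line(colour)


# A G = 10 star with a 10 mas parallax (100 pc) has M_G = 5, below the line.
print(is_dwarf_on_camd(10.0, 10.0, 0.9))
# The same apparent magnitude at 1 mas (1 kpc) gives M_G = 0, above the line.
print(is_dwarf_on_camd(10.0, 1.0, 0.9))
```

The two segments meet at a colour of 0.952, where 6.25 × 0.952 − 1.7 = 4.25, so the division is continuous on the red side.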

To identify a value of the fractional parallax uncertainty f that maximises both purity and completeness of the parallax-based CaMD cut, we needed to compute the purity and completeness for several different values of f. We required that f cannot be negative or larger than 100%, meaning that we only consider values of f ∈ (0,1). The purity and completeness for this range in f are shown in Figure 1. This figure shows that as f increases, the purity monotonically increases (more dwarfs are correctly removed), and completeness monotonically decreases (more giants are incorrectly removed). It is important to note that the choice of f affects both the removal of dwarfs and the selection of giants. For example, a threshold of f = 0.1 means that stars with positive parallaxes and fractional parallax uncertainties below 0.1 that are located above the division line in Equation (1) are considered giants. Additionally, all stars with fractional parallax uncertainties above 0.1, including those with negative parallaxes, are also selected as potential giant candidates. This is the reason why finding a good balance in the choice of f is important. The increase in purity with f reflects an inherent link between the distance probed and the quality of the parallax: with increasing f, we are more likely to select stars at larger distances, i.e. brighter giants, on a fixed CaMD, and we are also more likely to have a cleaner selection of giants based on their ‘bad’ parallax. The decrease in completeness with f arises because, with increasing f, we are more complete for the ‘good’ parallax giants but less complete for the ‘bad’ parallax giants, and the two effects together lower the completeness at larger f.

For our RGB selection, we choose to use f = 0.5, as it falls approximately where the increase in purity and decrease in completeness have a lower gradient and start to reach a plateau without having to compromise on the parallax uncertainty too much. The chosen 50% uncertainty on parallax is much higher than the Gaia-recommended 20%. However, we stress that we only use parallax as a proxy to remove dwarfs with good enough parallax and do not assign distances based on the inverted parallax. We only need good enough parallaxes to have a reliable dwarf-giant division. If we use Bailer-Jones et al. (2021) photogeometric distances instead of inverted parallax distances, the purity and completeness results remain unchanged. After this cut is applied to the data, the purity is 50% and completeness is 79%.
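The selection logic and the two quality metrics can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code used for Figure 1: the function names are hypothetical, purity is taken to be the fraction of selected stars that are true giants, completeness the fraction of true giants that are selected, and the truth label is assumed to come from a spectroscopic criterion such as log g < 3.5 in the training sample.

```python
import numpy as np

def keep_as_giant_candidate(parallax, parallax_error, below_division, f_max=0.5):
    """Parallax-based CaMD cut for a given fractional uncertainty limit f_max.

    A star is removed only if it has a positive parallax, f < f_max, and lies
    below the dividing line of Equation (1). Every other star, including
    those with negative or poorly measured parallaxes, is kept.
    """
    parallax = np.asarray(parallax, dtype=float)
    parallax_error = np.asarray(parallax_error, dtype=float)
    below_division = np.asarray(below_division, dtype=bool)
    positive = parallax > 0
    # f is undefined for non-positive parallaxes; treat it as infinite there.
    f = np.full(parallax.shape, np.inf)
    f[positive] = parallax_error[positive] / parallax[positive]
    removed = positive & (f < f_max) & below_division
    return ~removed

def purity_completeness(selected, is_true_giant):
    """Return (purity, completeness) as fractions between 0 and 1."""
    selected = np.asarray(selected, dtype=bool)
    is_true_giant = np.asarray(is_true_giant, dtype=bool)
    true_positives = np.sum(selected & is_true_giant)
    return true_positives / np.sum(selected), true_positives / np.sum(is_true_giant)
```

Evaluating `purity_completeness` on the output of `keep_as_giant_candidate` for a grid of `f_max` values between 0 and 1 would give curves of the kind described for Figure 1.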

Figure 2 shows the CaMD of the training sample (SEGUE), and the division between dwarfs and giants defined in Equation (1); by visual inspection, we see that it works for our choice of f ≤ 0.5 (with the giant sequence being distinctly visible), and it is the definition we used to construct the final RGB catalogues.

Fig. 1

Purity (blue) and completeness (orange) for different parallax-based CaMD cuts (along with bad parallax pool of giants) in the training sample. The choice of 50% (f=0.5) is justified by the plateau reached in purity and completeness, with a good balance between the two values. The discreteness of the plot is due to the small size of the training sample (N=53 666) used to test our cuts. All values are given in percentages.

Fig. 2

Colour-absolute magnitude diagram of the training sample, with absolute magnitudes computed using Gaia parallaxes, π, with the conditions that π > 0″ and that the fractional parallax uncertainty f = 0.5.

3.2 Magnitude cut

The previous section describes a method to clean the catalogue of dwarfs when good enough parallaxes are available. After this method had been applied, the sample consisted of giants with good enough parallaxes and dwarfs and giants with bad parallaxes. Gaia DR3 astrometry only probes a volume of 2 kpc for dwarfs with good enough parallax (Viswanathan et al. 2023). We expect that all dwarfs with G > 17.6 are too far away to have good enough parallaxes. To remove these bad parallax dwarfs, we introduce a cut where all stars fainter than G = 17.6 are removed. Although there are some giants at these magnitudes, meaning that this cut reduces completeness, dwarfs with bad parallax dominate our training sample at these faint magnitudes (as dwarfs are 100 times more numerous than giants), and we cannot distinguish between the two without the use of atmospheric parameters. The choice of the exact value of 17.6 in Gaia G magnitude (which is also the magnitude limit of the PGS catalogue because it is based on Gaia XP spectra that have a magnitude limit of about 17.6) is justified by looking at the histogram of Gaia G magnitudes for the dwarf stars with good enough parallax, as described in detail in Appendix A. In combination with the CaMD cut, this cut increases purity to 65% and decreases completeness to 64%.

3.3 Colour-metallicity cut

The fraction of dwarfs to giants changes as a function of both colour and metallicity: as the metallicity decreases, the colour at which the subgiant branch turns into the giant branch becomes bluer (and the absolute magnitude decreases). We thus introduced a cut that removes the subgiants based on their metallicities and colours. Using five MIST isochrones (Choi et al. 2016; Dotter 2016) with [Fe/H] from −0.25 to −2.25 dex (in 0.5 dex steps), we identify the subgiant branch turning point as a function of Gaia BP-RP colour and metallicity. A linear fit to these points, computed with scipy, is used to cut away subgiant branch stars. This linear function is³ $G_{BP,0} - G_{RP,0} = 0.14 \times [\mathrm{Fe/H}] + 1.05,$ (2)

where any stars with a colour bluer (smaller) than this are cut away. The training sample in metallicity versus colour view, colour-coded by the training sample’s log g is shown in Figure 3. We can see that the subgiant stars with log g >3.5 are removed (in red) using this colour-metallicity cut, retaining the RGB stars (in blue). There exists a trend in surface gravity with respect to metallicity for when the stars have a sub-giant branch, which is what we clearly see on the right side of the dashed black line in Figure 3. Therefore, the blue overdensity of stars at Gaia BP-RP colour of 0.6 are relatively metal-poor sub-giant branch stars with smaller surface gravity. If we change the model isochrone to PARSEC or BaSTI, the impact of the final colour-metallicity cut on the input catalogues is less than 1%. We also refrain from using the training sample to decide the colour-metallicity cut, as the training sample is not fully representative of our input catalogues, and is very small in size after the previous selections (N=7882). Therefore, we only want to use the training sample to study the consequences of our selections. This cut, together with the CaMD cut and the magnitude cut, increases purity to 90% and decreases completeness to 58%.
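Equation (2) translates directly into a boolean mask. The sketch below is illustrative only; the function name is hypothetical, and keeping stars that lie exactly on the line is an assumption, since the text only states that bluer stars are removed.

```python
import numpy as np

def passes_colour_metallicity_cut(bp_rp0, feh):
    """True for stars at or redward of the line in Equation (2).

    bp_rp0: extinction-corrected Gaia BP-RP colour.
    feh: photometric metallicity [Fe/H] in dex.
    """
    threshold = 0.14 * np.asarray(feh, dtype=float) + 1.05
    return np.asarray(bp_rp0, dtype=float) >= threshold
```

At [Fe/H] = −2.0 the threshold colour is 0.77, so a star with a colour of 0.6 is removed while one with a colour of 0.9 is kept; at solar metallicity the threshold moves redward to 1.05.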

Our full RGB selection pipeline is shown as a flowchart in Figure 4. The log g distribution of the training sample before and after the RGB selection pipeline has been applied is shown in Figure 5, which shows the efficiency of our RGB selection on the training sample.

Fig. 3

Training sample (CaMD cut on good enough parallax giants and bad parallax giants selected after the magnitude cut) colour-coded by log g and the colour-metallicity cut in Equation (2), shown as a black line. Any stars that lie under this line are removed. The cut is designed to remove mainly subgiant stars probed unevenly at different metallicity ranges, and the colour-coding shows that most stars underneath this line have log g roughly greater than 3.5 (not a giant). The cut preserves the RGB (the blue region) and cuts away the SG/MSTO (the red region).

Fig. 4

Flowchart of the steps involved in creating the two RGB star catalogues using the parent samples from PDR1 and PGS catalogues. The numbers of stars removed at each selection step for the two input catalogues are shown in the red boxes, the methods and cuts used to select RGB stars at every step are shown in orange boxes, and the final samples with counts are shown in the green box.

3.4 Photometric distance derivation

To map the Galactic outer halo, we need reliable distances out to ~100 kpc. This cannot be achieved using Gaia parallaxes as these reach only up to 5-10 kpc for giants. This means we need to use assumptions such as isochrone models to derive photometric distances to probe the Galactic outer halo. These isochrones are heavily dependent on the assumed metallicities, especially on the RGB. Given that our input catalogues are based on PDR1 and PGS catalogues with reliable photometric metallicities down to −3.5, we can use them as input metallicities to derive photometric distances. We tried our method on several different isochrone models such as MIST (Dotter 2016; Choi et al. 2016), PARSEC (Bressan et al. 2012; Pastorelli et al. 2020), and BaSTI isochrones (Hidalgo et al. 2018). We find that the scatter of our photometric distances relative to inverted-parallax distances for the subsample that has good parallax⁴ (f ≤ 0.1) is smallest for BaSTI isochrones. Additionally, the difference between BaSTI and PARSEC is much smaller than the difference between BaSTI and MIST. This can be attributed to the fact that MIST isochrones do not have α-enhancement implemented whereas BaSTI and PARSEC do. In the remainder of this work, we use BaSTI isochrones because of their agreement with the shape and slope of the isochrone in the RGB using Gaia XP/Gaia BP-RP colours and the fact that they are available down to lower metallicities ([Fe/H] ~ −3.2) compared to PARSEC ([Fe/H] ~ −2.2). We use BaSTI isochrones at a fixed age of 10 Gyr, in the α-enhanced version (Pietrinferni et al. 2004), with a Reimers mass loss parameter of 0.3, assuming a Kroupa initial mass function (Kroupa 2001), a fraction of unresolved binaries of 30%, and a minimum mass ratio for binaries of 0.1, between metallicities of −3.2 and −0.08. 
We use an effective temperature range of 3000-7000 K, an extinction-corrected Gaia BP-RP colour between 0.5 and 1.5, and a log g range (−0.5 to 4.2) just wide enough to infer photometric distances for RGB stars and remove outliers in the process. The log g values in the BaSTI model isochrones are calculated using the Stefan-Boltzmann equation. We only select the RGB part of the stellar model isochrone and do not fit for any other stellar populations.

We constructed a 4D interpolation grid assuming a fixed stellar age of 10 Gyr, but varying metallicity, effective temperature, surface gravity, and absolute Gaia G magnitude. This choice is justified by tests showing that even varying the age by a factor of two affects inferred distances by only ~10%, consistent with findings from Bonaca et al. (2020). For effective temperature and surface gravity, we crossmatch our final RGB samples from the PDR1 and PGS input catalogues with the Andrae et al. (2023, hereafter A23) catalogue, which provides teff_xgboost and logg_xgboost estimates for 175 million stars, inferred from Gaia XP spectra using the XGBoost algorithm (Chen & Guestrin 2016). This results in 408 524 PDR1-giants and 6 098 246 PGS-giants (see methods flowchart in Figure 4).

We find that using temperature-based isochrone matching leads to photometric distance estimates that agree better with high-quality parallax-based distances (f ≤ 0.1) than those derived from Gaia BP-RP colours. This is expected, as temperature incorporates the full XP spectrum, whereas BP-RP compresses this into a single colour. We avoid using mh_xgboost metallicities from A23, as they are less reliable at low metallicities unless the star is both bright and has a good parallax (f ≤ 0.04), as shown by the vetted RGB sample in Table 2 of A23. Although temperature and surface gravity from this source also degrade with poor parallaxes, they are more robust than the metallicities.

The goal of this work, however, is to demonstrate that RGB stars can be reliably selected using only photometry and parallax, without requiring spectroscopic parameters, distances, or radial velocities. Atmospheric parameters from A23 are used solely for photometric distance inference and validation. This pipeline is applicable to any photometric catalogue with Gaia parallaxes for RGB selection.

For each star, we generate an interpolated isochrone from the BaSTI model for an age of 10 Gyr and the star’s photometric metallicity from Pristine (CaHK-based for PDR1, or XP-based model for PGS). The isochrone spans a grid of effective temperatures, surface gravities, and Gaia G absolute magnitudes. We place each star in the T_eff-log g space and find the closest point on the spline-interpolated isochrone, applying a weight on log g that varies with evolutionary stage: 2500 near the turn-off and 500 near the RGB tip. This weighting minimises the difference between inferred photometric distances and parallax-based distances for the f ≤ 0.1 subset and reduces sensitivity to input parameters. The corresponding absolute Gaia G magnitude is then used with the extinction-corrected apparent G magnitude to compute the photometric distance.
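The matching step described above can be illustrated with a toy example. This is a rough sketch, not the pipeline implementation, and it rests on several assumptions: the isochrone is represented by densely sampled points rather than a spline, the log g weight is interpreted as a scale factor that puts a log g offset (in dex) on a footing comparable to a T_eff offset (in K), and the weight is ramped linearly in log g between the two quoted values. All function names are hypothetical. Only the final distance-modulus relation, d = 10^((G_0 − M_G + 5)/5) pc, is standard.

```python
import numpy as np

def logg_weight(iso_logg, w_tip=500.0, w_turnoff=2500.0):
    """Assumed linear ramp from w_tip (lowest log g) to w_turnoff (highest)."""
    lo, hi = iso_logg.min(), iso_logg.max()
    t = (iso_logg - lo) / (hi - lo)
    return w_tip + t * (w_turnoff - w_tip)

def match_absolute_magnitude(teff, logg, iso_teff, iso_logg, iso_abs_g):
    """Absolute G magnitude of the isochrone point closest to the star."""
    iso_teff = np.asarray(iso_teff, dtype=float)
    iso_logg = np.asarray(iso_logg, dtype=float)
    iso_abs_g = np.asarray(iso_abs_g, dtype=float)
    w = logg_weight(iso_logg)
    # Squared weighted separation in the (Teff, log g) plane.
    d2 = (iso_teff - teff) ** 2 + (w * (iso_logg - logg)) ** 2
    return iso_abs_g[np.argmin(d2)]

def photometric_distance_kpc(g0, abs_g):
    """Distance in kpc from the distance modulus G_0 - M_G."""
    return 10.0 ** ((g0 - abs_g + 5.0) / 5.0) / 1000.0
```

As a sanity check on the last step, a star with G_0 = 15 matched to M_G = 0 lies at a distance modulus of 15, i.e. 10 kpc.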

Extinction corrections for all Gaia magnitudes are adopted from the input Pristine catalogues (MS23). The inferred photometric distances show good agreement with inverted parallax distances (by design) and with the Starhorse and Bailer-Jones et al. (2021) photogeometric distances (see Appendix B). In the PGS catalogue, the scatter in the distance comparison increases slightly with distance, as noted in Appendix B, indicating a mild distance-dependent uncertainty in the photometric distance estimation, which is expected in any outer halo catalogue based on RGB stars.

We calculated systematic uncertainties on the inferred photometric distances instead of measurement uncertainties because we find that the measurement errors are negligible compared to the total dispersion inferred from the good parallax (f ≤ 0.1) subset. This is because we do not have measurement uncertainties on effective temperatures or surface gravities from A23 parameters that are inferred using machine learning. Available measurement uncertainties only depend on the measurement uncertainties on the photometric metallicities from the PDR1 and PGS input catalogues. These are, in turn, dependent on uncertainties on colour, and magnitudes that are relatively well-measured compared to other uncertainties. Therefore, we stick to estimating only systematic uncertainties on the inferred photometric distances. For this, we use the 100 nearest neighbours of each star in the good parallax (f ≤ 0.1) subsample, based on its input effective temperature, surface gravity and photometric metallicities (all of which are scaled between a range of 0 to 1, to make sure they have the same weights). We infer the dispersion between inverted-parallax distances and our inferred photometric distances for these 100 nearest neighbours as the systematic uncertainties on our photometric distances.
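The nearest-neighbour estimate can be sketched with plain NumPy. This is an illustrative sketch, not the code used in this work: the function name is hypothetical, the min-max scaling is assumed to be defined by the reference (good parallax) subsample, and the "dispersion" is assumed to be the standard deviation of the fractional difference between photometric and inverted-parallax distances among the k neighbours.

```python
import numpy as np

def knn_distance_dispersion(ref_params, ref_plx_dist, ref_phot_dist,
                            query_params, k=100):
    """Fractional distance scatter among the k nearest reference stars.

    ref_params, query_params: arrays of shape (n, 3) holding
    (Teff, log g, [Fe/H]); each column is min-max scaled to [0, 1]
    using the reference sample so all three carry the same weight.
    """
    ref_params = np.asarray(ref_params, dtype=float)
    query_params = np.asarray(query_params, dtype=float)
    lo, hi = ref_params.min(axis=0), ref_params.max(axis=0)
    ref_scaled = (ref_params - lo) / (hi - lo)
    query_scaled = (query_params - lo) / (hi - lo)

    plx = np.asarray(ref_plx_dist, dtype=float)
    frac_resid = (np.asarray(ref_phot_dist, dtype=float) - plx) / plx

    out = np.empty(len(query_scaled))
    for i, q in enumerate(query_scaled):
        d2 = np.sum((ref_scaled - q) ** 2, axis=1)
        nearest = np.argpartition(d2, k - 1)[:k]  # indices of the k closest
        out[i] = np.std(frac_resid[nearest])
    return out
```

The brute-force loop keeps the example self-contained; for catalogues with millions of stars a spatial index (for example a k-d tree) would be the more practical choice.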

We compared the photometric distances calculated in this work with the inverted-parallax distances against the input parameters used to calculate the photometric distances for both the PDR1 and PGS-giants catalogues in Figure 6. We show distance times parallax (ideally 1.0) versus photometric metallicities from PDR1/PGS catalogues from MS23, effective temperature and surface gravity from A23. The final panel shows a histogram of distance times parallax, which peaks at 0.99 and 0.95, with a 1σ dispersion of 0.09 and 0.1 for PDR1 and PGS-giants respectively. From the Figure 6 top panel, we can see that the photometric distances agree very well with the inverted-parallax distances with no visible trend with the photometric metallicities, which are the main parameter used in our science cases in the next section. We see the same with PGS-giants in the bottom panel of Figure 6, in terms of reliability of inferred photometric distances. However, the dispersion is higher for PGS-giants, mainly because of photometric metallicities that are based on lower S/N CaHK magnitudes inferred from Gaia XP spectra, and the fact that the distance calculation pipeline depends strongly on accurate metallicities. In addition to removing turn-off interlopers by using a log g<3.5 cut (logg_xgboost<3.5), we also use an additional quality cut to get the final catalogue of PDR1 and PGS-giants with reliable metallicities and distances used for the science cases in the results section. This is needed because the red clump (RC) stars at the metal-rich end and (colder) horizontal branch (HB) stars at the metal-poor end do not get a log g from A23 that agrees well with parallax-based inferences. This can be seen in the cloud of stars in the black box with distances that are underestimated compared to parallax-based distances, shown in the bottom left panels for PDR1 and PGS-giants in Figure 6. 
To remove stars with underestimated distances in this region, we require a distance uncertainty of 7% or below within the log g range between 2.3 and 3.0 (!((phot_dist_errs>0.07) & ((logg_xgboost>2.3) & (logg_xgboost<3.0)))). With all these quality cuts, we end up with 180 314 PDR1 and 2 420 898 PGS-giants, which have reliable metallicities and distances. A small part of the Pristine survey sample (~1%) can be selected as RGB stars using our method but is not part of the PDR1 catalogue; we do not use these stars in the rest of this paper.
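For readers who wish to apply the same quality cuts, the boolean expression quoted above can be written as a NumPy mask. This is a minimal sketch: the function name is hypothetical, and `phot_dist_errs` is assumed to be a fractional uncertainty (0.07 corresponding to 7%), as the quoted expression suggests.

```python
import numpy as np

def final_quality_mask(logg_xgboost, phot_dist_errs):
    """True for stars passing both the log g < 3.5 and distance quality cuts."""
    logg = np.asarray(logg_xgboost, dtype=float)
    err = np.asarray(phot_dist_errs, dtype=float)
    is_giant = logg < 3.5                      # removes turn-off interlopers
    clump_like = (logg > 2.3) & (logg < 3.0)   # RC / cool HB log g range
    return is_giant & ~((err > 0.07) & clump_like)
```

Stars outside the 2.3 to 3.0 log g range are unaffected by the distance uncertainty threshold, so a bright giant with log g = 1.0 and a 30% distance uncertainty is still retained.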

We validated our photometric metallicities with high-resolution spectroscopic metallicities from GALAH DR3, APOGEE DR17, and the SAGA database of VMP stars (Buder et al. 2021; Majewski et al. 2017; Suda et al. 2008) and our photometric distances with distances inferred by the Starhorse Bayesian isochrone-fitting code (Queiroz et al. 2023), as shown in Figure 7. We chose GALAH DR3, APOGEE DR17, and SAGA (even though the crossmatch numbers are lower than when using low-resolution spectroscopic surveys and they have a bias towards brighter stars) because of their reliability across the metallicity scale used down to the EMP end. No particular trends within APOGEE, GALAH, or SAGA were seen. To have a full understanding of the reliability of the photometric metallicities from PDR1 and PGS catalogues down to [Fe/H]~-4.0, we refer the readers to MS23 and Viswanathan et al. (2025). Based on these works, it is also important to keep in mind that in the VMP and EMP regime, the success rates of finding true VMP and EMP giants are 97% and 38% respectively, which makes these EMP giants good candidates and not always ‘true’ EMPs. The validation of metallicities presented here is only to ensure that there are no offsets caused in the photometric metallicities due to the various selections made in the parent Pristine catalogues to select the RGB stars, and not to validate the metallicities themselves, as this is done thoroughly in the data release paper (MS23). In the top panels of Figure 7, we see a comparison of metallicities from PDR1 (left) and PGS (right) giants with GALAH DR3 metallicities. We removed those stars with flag_sp==0 or flag_fe_h==0 from the GALAH DR3 sample, and place a quality cut of FE_H_FLAG==0 on the APOGEE DR17 sample. 
For the SAGA database of VMP stars, we use a 5.0 arcsec crossmatch radius, as recommended by the SAGA catalogue makers due to lower precision in RA, Dec from older spectroscopic follow-ups, as opposed to the 1.0 arcsec crossmatch radius used in the rest of this work. We see good agreement within 0.25 dex and 0.5 dex for PDR1 and PGS-giants respectively. For the PGS-giants, if we only use stars with good S/N in the CaHK narrow-band (error on CaHK magnitudes, d_CaHK<0.02), the scatter is as low as 0.2 dex. This is because the photometric [Fe/H] uncertainty is ~0.1 in the PDR1 catalogue versus ~0.4 in the PGS catalogue at fainter magnitudes (G ~ 16). This is also the reason why we see higher dispersion for PGS distances compared to parallax in the bottom panels of Figure 6. Therefore, we recommend using this cut in PGS-giants for science cases where reliable distances and very reliable metallicities are a necessity.

In the bottom panels of Figure 7, we show a comparison of our inferred distances with Starhorse distances (with <10% uncertainties) using spectroscopic survey parameters from APOGEE DR17, GALAH DR3, SDSS SEGUE DR12, LAMOST DR8 MRS, and Gaia DR3 RVS. No particular trends with specific surveys are seen. We choose these five surveys because they have higher resolution or probe the fainter end of our catalogue (in the case of SEGUE). We see no offset in the distances, with a scatter of ≲20% and ≲40% out to 100 kpc for the PDR1 and PGS catalogues of giants respectively. The scatter between our distances and Starhorse distances is larger for PGS-giants than PDR1-giants (and slightly biased towards underestimation rather than overestimation), due to metallicity uncertainties.

Fig. 5

Distribution of the training sample’s surface gravities before and after the catalogue pipeline (described in Figure 4) is applied. The vertical line at log g=3.5 shows the separation used to validate the giants sample selection. The overdensity near log g~4.5 (orange region, right panel) arises from contamination by metal-rich turn-off and sub-giant stars in SEGUE, where their surface gravities overlap with the lower part of the RGB. Note that the log g<3.5 cut is an approximate validation threshold; a metallicity-dependent log g cut provides a more refined RGB selection.

Fig. 6

Photometric distance calculated in this work times the offset-corrected parallax based on Lindegren et al. (2021) (which is supposed to be 1.0 in the ideal case) versus the input parameters used to calculate these photometric distances, including photometric metallicities (top left) from PDR1 (top panels) and PGS catalogues (bottom panels), effective temperature (top right), and surface gravity (bottom left) from the A23 XGBoost catalogue. The input parameter panels are colour-coded by the log density, and a horizontal histogram of distance times parallax is also shown (bottom right). Note that the density distributions are in log scale, so most of the stars are within ~10% systematic uncertainties. The grey overdensities in the surface gravity plots are bad distance mismatches caused by red clump stars and (colder) horizontal branch stars.

Fig. 7

Validation of our photometric metallicities and distances with GALAH DR3 and APOGEE DR17 surveys and the high-resolution spectroscopic metallicities from the SAGA database of VMP stars (top) and Starhorse Gaia DR3 (Bayesian isochrone-fitting code) distances for different spectroscopic surveys such as APOGEE, GALAH, SDSS SEGUE, LAMOST MRS, and Gaia RVS surveys (bottom) for the PDR1 (left) and PGS (right) giants catalogue constructed in this work.

3.5 Caveats with the catalogues

In this subsection, we discuss the two main caveats with the PDR1 and/or PGS-giants catalogue constructed in this work.

3.5.1 Distances of red clump and horizontal branch stars

From the bottom left panels for PDR1 and PGS-giants in Figure 6, we can see that the distances have a clear trend towards underestimation (of up to 50%) for log g values between 2.3 and 3.0. This region of log g overlaps with where we find RC stars at the metal-rich end and colder HB stars at the metal-poor end. RC stars are abundant core helium-burning giants that were once similar to the Sun. Regardless of their specific age or composition, all RC stars reach roughly the same absolute magnitude, which, together with their well-defined position in the H-R diagram, makes them valuable as standard candles. The Pristine survey model assigns photometric metallicities for F, G, K stars between 0.5 and 1.5 in the Gaia BP-RP colour range. This region inevitably overlaps with a few HB stars. HB stars originate from low-mass stars that have finished their main-sequence lifetimes and experienced a helium flash at the conclusion of their red-giant phase. Consequently, HB stars are very old objects, making them useful markers in studies of the Galactic structure and formation history. However, our isochrone fitting code does not explicitly model RC and HB evolutionary phases. Instead, these phases are indirectly accounted for by matching each star to the nearest point on the isochrone in the T_eff-log g space. The log g, however, comes from the A23 catalogue based on Gaia XP spectra, and it does not scale linearly with the absolute magnitudes calculated using parallaxes for the good parallax (f ≤ 0.05) subset, as illustrated in Figure 8, where we mark the mismatch for RC and HB stars with the black box. Therefore, we use a quality cut in distances within these surface gravities to ensure that we end up with distances that are fully reliable (as discussed in the previous subsection).

Fig. 8

Absolute G magnitude calculated using distances inferred from inverted parallax versus log g from the A23 XGBoost catalogue. We used log g as an input to calculate the photometric distances, whereas absolute magnitudes, which trace surface gravity, were inferred from parallax. One can clearly see the 1:1 trend mismatched at log g that corresponds to red clump (metal-rich) and horizontal branch (metal-poor) stars.

3.5.2 Distance-metallicity selection effect on Gaia XP

In Figure 9, we show the photometric metallicities versus heliocentric distances for the PDR1 (left) and PGS (right) catalogues of giants, colour-coded by their mean uncertainty in the CaHK magnitudes used to calculate the photometric metallicities. The Pristine survey model assigns photometric metallicities for each star based on its CaHK narrow-band magnitudes in comparison with Gaia BP-RP broad-band magnitudes. For this purpose, the model requires the CaHK uncertainties to be less than 0.1, which is the upper limit of our colour-coding in Figure 9. The Gaia XP spectra are magnitude limited down to G ~ 17.6, with Gaia’s scanning law limitations imprinted (see Riello et al. 2021 for more details about Gaia’s scanning law effect). With the selection of the PGS-giants catalogue (which is also magnitude limited at 17.6), we are pushing the limits of the S/N in the CaHK region that is required to calculate photometric metallicities reliably. This problem is almost negligible in the PDR1-giants catalogue, because the Pristine survey goes much fainter than Gaia XP spectra (down to G ~ 21), with very high S/N in the CaHK narrow-band compared to Gaia XP spectra.

The effect of CaHK uncertainty is clearly visible in Figure 9 for the PGS-giants: the CaHK uncertainty increases at larger distances. It is important to note that the CaHK magnitudes are brighter for metal-poor stars than for metal-rich stars with respect to the broad-band magnitudes, which allows us to pick metal-poor stars more efficiently amidst the more metal-rich populations of our Galaxy. Given the magnitude limits and CaHK uncertainty limits of the PGS-giants catalogue, at larger distances, we only see metal-poor stars that are relatively brighter than the metal-rich stars. Therefore, the filled blue region in the right panel of Figure 9 is empty due to this distance-metallicity selection effect, and not due to physical conditions in the Galactic outer halo. This means that the PGS-giants cannot be used to study the metallicity variations at different distances in the Galactic halo. Given the low-to-no bias in distance versus metallicity in the PDR1-giants (apart from the small effects due to the colour boundaries corrected by weighting the metallicity bins in Section 4.2), we can use this catalogue to study the metallicity structure of the Galactic halo out to large distances and down to the lowest metallicities, thereby probing deep and far into the Galactic halo’s earliest evolutionary times. Due to the distance-metallicity selection effects in the PGS-giants catalogue, we see almost no metal-rich stars in the outer halo, which means we can study the outer Galactic halo’s oldest stars with a much cleaner sample of VMP stars than has been possible so far. We investigate some of these science cases in the next section(s).

Fig. 9

Metallicities versus heliocentric distances for PDR1 (left) and PGS (right) catalogues of giants presented in this work colour-coded by CaHK narrow-band magnitude uncertainties. The distance-metallicity selection effect caused by low S/N on the CaHK narrowband from Gaia XP spectra is highlighted by the blue filled area on the right.

Fig. 10

Distribution of the A23 surface gravities before and after the catalogue pipeline (described in Figure 4) is applied on the PDR1 (top) and PGS (bottom) input catalogues. The vertical line at log g=3.5 shows the separation used to validate the selected sample of giants. In all the panels, the overdensity of stars at log g~2.5 from RC stars is clearly visible.

3.6 RGB selection validated using A23 parameters

As a final check on the purity and completeness of our catalogues of giants, we use log g values from the A23 catalogue, which is based on the Gaia XP spectra and parallax. In Figure 10, we show the log g distribution from the A23 catalogue before and after our RGB selection pipeline (see the Figure 4 flowchart for a summary of the pipeline) has been applied to the PDR1 (top) and PGS (bottom) input catalogues. This figure shows the efficiency of our selection given the small number of stars that fall below log g = 3.5, given the very low signal from RGB stars before the pipeline was applied (this is because dwarfs are 100 times more numerous than giants). The final purity and completeness calculated based on the A23 log g are 78% and 76% respectively for the PDR1-giants and 92% and 82% respectively for the PGS-giants. These numbers are much higher than what we inferred based on the training sample. These numbers show that our RGB selection performs very well, achieving both high purity and completeness. From a small subset of good parallax (f=0.05) stars that are misclassified with logg_xgboost>3.5 in our RGB catalogues or unselected with logg_xgboost<3.5 and not in our RGB catalogues, we see that most (90%) of these stars are near the sub-giant branch and/or main sequence turn-off part of the CMD.

We only use the atmospheric parameters from A23 for calculating photometric distances and validating our RGB star selection. Our goal is to demonstrate the effectiveness of selecting RGB stars using only parallax and photometry, without depending on atmospheric parameters, distances, or radial velocities. This approach is more widely applicable, as it allows for the reliable identification of RGB stars across any photometric catalogue that includes Gaia parallaxes. The validation using log g from A23 showcases the power of our RGB selection.

4 Results

In this section, we summarise the metallicity and distance properties of the catalogues of giants, and discuss the construction of 6D phase-space samples of PDR1/PGS-giants, Sagittarius (Sgr) stream members in the catalogues, calculation of phase-space information and integrals-of-motion (IOM) and how we can view the different accretion events in the metallicity view of the IOM space. We finally discuss the outer-halo substructures in our catalogues of giants.

4.1 Description of the catalogues

We present a red giant branch catalogue using the Pristine survey and/or the Gaia XP-based metallicities and photometric isochrone-fitted distances in this work. The PDR1-giants sample consists of 180 314 RGB stars and probes heliocentric distances up to 100.65 kpc (with mean uncertainties down to 12% with a maximum of 40% dependent on the quality of the input parameters, especially at the faint end). The PGS-giants sample consists of 2 420 898 RGB stars that probe heliocentric distances up to 68.03 kpc (with mean uncertainties down to 12% with a maximum of 57% dependent on the quality of the input parameters, especially at the faint end). The final purity and completeness based on the Pristine survey training sample are 90% and 58% respectively. Both the purity and completeness vary as a function of metallicity, surface gravity, and effective temperature, due to the colour cuts used to select the stars. From the training sample, we see that the purity and completeness decrease as a function of metallicity (by ~20% between −4 and 0 dex) and as a function of temperature (by ~40% and ~20% between 4000 and 6000 K), and that the completeness decreases as a function of surface gravity (by ~20% between 0.5 and 3.5), if we use log g<3.5 as the pure sample of giants. However, it is important to note that the log g<3.5 selection is not the purest and most complete selection of giants and also has a dependence on the metallicity. The training sample is also not necessarily representative of our input catalogues; it is quite incomplete and has far fewer stars at the VMP end. Some of these effects are corrected for when we measure the metallicity distribution function using model isochrones in Section 4.2.

The mean metallicity uncertainties are 0.08 and 0.19 dex down to [Fe/H] ~ −4 for the PDR1 and PGS catalogues respectively. The mean metallicity uncertainties increase up to 0.11 dex and 0.28 dex for VMP stars in the PDR1 and PGS catalogues respectively. With such reliable metallicities and distances, we can study the metallicity distributions and (chemical and dynamical) substructures that make up the outer Galactic halo.

To calculate the Cartesian positions for all the RGB stars, a distance of 8.2 kpc between the Sun and the Milky Way’s centre is assumed (GRAVITY Collaboration 2018). In the top panels of Figure 11, we show metallicity versus absolute height above the Galactic plane for PDR1 and PGS-giants. We can see that the PGS-giants have a fairly smooth distribution with metal-rich stars in the inner halo and metal-poor stars in the outer halo. However, this view is biased due to the quality cut on CaHK uncertainties used in the making of the PGS input metallicity catalogue, as discussed in Section 3.5.2. As a consequence, we see only metal-poor stars in the outer halo of PGS-giants, and do not trace the true metallicity distribution of the outer halo. However, this sample can be used to study metal-poor substructures in the outer halo due to the low-to-no contamination from metal-rich substructures (that are usually large in number). We refrain from using the PGS catalogue for anything that involves studying the metallicity distribution or metallicity-distribution-dependent science cases.
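The coordinate transformation can be sketched as follows, using the 8.2 kpc solar distance quoted above. This is a minimal illustration, not the code used in this work: the function name is hypothetical, the Sun is assumed to sit at X = −8.2 kpc in a right-handed frame with X increasing towards the Galactic anticentre-to-centre direction, and the small height of the Sun above the Galactic plane is neglected.

```python
import numpy as np

R_SUN_KPC = 8.2  # Sun to Galactic centre distance adopted in the text

def galactocentric_xyz(l_deg, b_deg, dist_kpc, r_sun=R_SUN_KPC):
    """Galactocentric Cartesian (X, Y, Z) in kpc from Galactic l, b and distance.

    Assumed convention: Sun at (X, Y, Z) = (-r_sun, 0, 0); the Sun's
    vertical offset from the plane is ignored.
    """
    l = np.radians(np.asarray(l_deg, dtype=float))
    b = np.radians(np.asarray(b_deg, dtype=float))
    d = np.asarray(dist_kpc, dtype=float)
    x = d * np.cos(b) * np.cos(l) - r_sun
    y = d * np.cos(b) * np.sin(l)
    z = d * np.sin(b)
    return x, y, z
```

With this convention, a star at l = 0°, b = 0° and a heliocentric distance of 8.2 kpc lands at the Galactic centre, and |Z| gives the absolute height above the plane used in Figure 11.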

In the top left panel of Figure 11, we show the metallicity versus distance space for the PDR1-giants, which is more representative of the Galactic halo out to ~100 kpc. We can see a prominent overdensity of stars around a metallicity of ~−1.3 out to about 30 kpc, which we associate with the last major merger event, Gaia-Enceladus-Sausage (GES, Belokurov et al. 2018; Helmi et al. 2018). This is reminiscent of the strongly radial orbits clustered at −1.0 to −1.6 in spectroscopic metallicities from the H3 survey out to large distances (Conroy et al. 2019a,b). For a small subset of our sample with radial velocities, we find these stars to have high eccentricities and to probe the radial regions of energy, E, and vertical angular momentum, L_z (see Figure 17 and Section 4.3.3). The small subset of these stars that overlap with the APOGEE high-resolution spectroscopic survey have lower [α/Fe], reminiscent of a dwarf galaxy stellar population that merged with the Milky Way, similar to the GES event. These checks allow us to conclude that this prominent peak in [Fe/H] versus |Z| space most likely belongs to the GES merger. We also see other substructures and the distribution is not as smooth, indicating that the Galactic halo is made of stellar populations from several different merger events. In the middle panels of Figure 11, we see the distribution of the stars in galactocentric radius versus height above the plane for the PDR1 and PGS-giants. We can see the uneven footprint and northern coverage of the Pristine survey for the PDR1-giants in the left panel, while the PGS-giants in the right panel are all-sky. The PDR1-giants probe larger distances (~100 kpc) than the PGS-giants (~60 kpc) due to the lower S/N of CaHK narrowband magnitudes for the Gaia XP-based PGS-giants. This is, in turn, due to the relative brightness limit of Gaia XP spectra of the PGS-giants sample.
On the right panel, we see a strong selection function at lower scale height due to dust extinction cut in the disc plane. To see the metallicity distribution in a spatial view along Cartesian coordinates, we refer the reader to Appendix C. In the bottom panel of Figure 11, we see the 1D-histogram of galactocentric and heliocentric distance probed by PDR1 (left) and PGS (right) giants. There is a steep decrease in the number of stars at larger distances, mostly due to the negative power law slope of about 4.0 in the halo (Hernitschek et al. 2018; Deason et al. 2018; Thomas et al. 2018; Starkenburg et al. 2019), but also due to selection functions in the underlying surveys and methods used to select the RGB stars. The small bump at about 20-40 kpc could be due to GES apocentre pile-ups (Perottoni et al. 2022) or the Sagittarius stream (Ibata et al. 2020).

Fig. 11

Density distribution of photometric metallicities versus absolute scale height (Z) (top), density distribution of giant stars in R versus Z frame (middle), and histogram of heliocentric (orange) and galactocentric (blue) distances probed by the giants catalogue (bottom) for PDR1 (left) and PGS (right) input catalogues. Note that the ranges of the x- and y-axes are different for PDR1- and PGS-giants in the middle and bottom panels.

Fig. 12

Metallicity structure of the halo as a function of height from the mid-plane (Z) for eight metallicity bins between −4 and 0 for the PDR1-giants catalogue. The fractional contribution of each metallicity bin to the population at a certain distance has been calculated. Stars below a scale height of 2 kpc have been cut away to avoid disc contamination.

Fig. 13

Five PARSEC RGB isochrones with metallicities at −0.25, −0.75, −1.25, −1.75, and −2.25 dex (i.e. centred on the metallicity bins we used for our weights) going from red to purple. The colour range of 0.5 < G_BP - G_RP < 1.5 is in dashed grey lines and the area that is not probed because of this colour cut is shown in grey. As metallicity increases, the colour cut means that we probe a smaller portion of the RGB, leading to an undersampling of metal-rich stars. We show four panels corresponding to the four most distant bins, given in kpc, where Malmquist bias reduces the range in absolute magnitude probed by each distance bin (where the probed area is seen in white and the area lost due to Malmquist bias is seen in red, the upper limit of which is simply the absolute magnitude of the upper limit of each distance bin). Only the three most distant bins are affected. Note how the most metal-rich isochrone in the 45-101 kpc bin does not enter into the white region at all. The combined effect of the colour and magnitude cut is that the metal-poor stars reach the brightest absolute magnitudes. Therefore, the further into the halo we probe, the smaller the fraction of metal-rich stars.

4.2 The bias corrected metallicity structure of the halo

Because the PDR1 catalogue contains both reliable distances and metallicities without major selection effects, we can present the metallicity distribution functions (MDFs) of the halo as a function of distance. To create a halo sample, disc stars are removed by discarding stars with |Z| < 3 kpc⁵. We used six heliocentric distance bins: 3-4 kpc, 4-6 kpc, 6-10 kpc, 10-22 kpc, 22-45 kpc, and 45-101 kpc. These ranges ensure that both the number of stars in each bin and the distance range spanned by one bin change smoothly.
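The halo selection and binning described above can be sketched as follows. The bin edges and the 3 kpc height cut are those given in the text. The half-open bin convention, the tuple layout of the input, and the function names are assumptions made for illustration.

```python
import bisect

# Heliocentric distance bin edges in kpc, as listed in the text.
BIN_EDGES_KPC = [3, 4, 6, 10, 22, 45, 101]
Z_MIN_KPC = 3.0  # stars with |Z| below this are treated as disc stars


def distance_bin(dist_kpc):
    """Index (0..5) of the distance bin, or None outside 3-101 kpc.

    Bins are taken as half-open intervals [lower, upper) (an assumption).
    """
    if not (BIN_EDGES_KPC[0] <= dist_kpc < BIN_EDGES_KPC[-1]):
        return None
    return bisect.bisect_right(BIN_EDGES_KPC, dist_kpc) - 1


def halo_metallicities_by_bin(stars):
    """Group [Fe/H] values by distance bin after the |Z| cut.

    stars: iterable of (dist_kpc, z_kpc, feh) tuples (hypothetical layout).
    """
    bins = {i: [] for i in range(len(BIN_EDGES_KPC) - 1)}
    for dist_kpc, z_kpc, feh in stars:
        if abs(z_kpc) < Z_MIN_KPC:
            continue  # drop likely disc stars
        i = distance_bin(dist_kpc)
        if i is not None:
            bins[i].append(feh)
    return bins
```

Each list returned by `halo_metallicities_by_bin` is the raw input to one MDF, before any bias-correction weights are applied.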

We have two sources of bias in our MDFs. The first one is introduced because of our colour cut, and the second one is introduced when binning the sample in distance due to our magnitude cut. We consider the former source of bias first. The colour range of 0.5 < G_BP,0 − G_RP,0 < 1.5 is the same for all stars, no matter their metallicity, but because the tip of the RGB becomes redder with increasing metallicity (see Figure 13, which shows the probed colour range for a set of PARSEC isochrones with varying metallicities), we probed a smaller fraction of the metal-rich stars than of the metal-poor stars. This led to an undersampling of metal-rich stars, which biases the shapes of our MDFs. Now we consider the source of bias due to magnitude. This bias arises because of our magnitude cut, where we removed stars with G > 17.6, meaning that our MDFs are affected by a Malmquist bias. Again, looking at Figure 13, the 0.5 < G_BP,0 − G_RP,0 < 1.5 cut means that the brighter the absolute magnitudes that are probed, the fewer metal-rich stars are included in the distant bin. This again leads to an undersampling of metal-rich stars, but an undersampling that increases with distance, as can be seen in the increasing size of the red region with distance. The figure shows the three distance bins that are affected by this Malmquist bias. We corrected for both of these effects by introducing weights to our MDFs for different metallicity ranges. On top of this, we also have the underlying selection effect of Gaia’s scanning pattern on the sky (Riello et al. 2021), which is the S/N needed for a star to have Gaia XP information in DR3. This is a strong function of the location on the sky, but does not affect the MDF as strongly.
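To make the Malmquist effect concrete, the sketch below evaluates the faintest absolute magnitude that still passes the G < 17.6 cut at a given distance, using the standard distance modulus. Extinction is neglected here, which is a simplifying assumption of this sketch and not a statement about the procedure in the text. The function name is illustrative.

```python
import math

G_LIMIT = 17.6  # apparent magnitude cut quoted in the text


def limiting_absolute_magnitude(dist_kpc, g_limit=G_LIMIT):
    """Faintest absolute G magnitude observable at dist_kpc (no extinction).

    Uses M = m - 5 log10(d / 10 pc).
    """
    dist_pc = dist_kpc * 1000.0
    return g_limit - 5.0 * math.log10(dist_pc / 10.0)


# Limits at the upper edges of the three most distant bins (22, 45, 101 kpc).
upper_edge_limits = {d: limiting_absolute_magnitude(d) for d in (22.0, 45.0, 101.0)}
```

The limit moves to brighter absolute magnitudes as the distance grows, so a progressively smaller part of each isochrone remains observable, and the reddest, most metal-rich giants are lost first.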

The weights are computed using PARSEC simulated stellar populations with the Kroupa (2001, 2002) canonical two-part power-law initial mass function (IMF), corrected for unresolved binaries, as they contain labels for evolutionary stage, so that we can select only RGB stars. Using BaSTI isochrones requires manual removal of subgiant branch stars, which we aim to avoid. On the other hand, using MIST isochrones would require introducing an additional assumption (an α-enhancement offset), which we also prefer not to impose. Instead, we adopt a method similar to that of Youakim et al. (2020), who corrected metallicity biases introduced by colour cuts in main-sequence turn-off stars. We generate five simulated stellar populations using PARSEC isochrones, each representing an equidistant metallicity bin and with a total simulated mass of 100 000 M_⊙. Rather than aiming for absolute population corrections, which would require knowledge of the total stellar mass within the catalogue’s footprint (for local correction) or of the entire Galactic halo (for global extrapolation), we focus on the shape of the MDFs. Therefore, relative weights are sufficient for our purposes, and we only present normalised histograms. To apply selection effects consistently, we impose the same cuts on the simulated populations as on the observational catalogue. First, we apply a colour cut of 0.5 < G_BP − G_RP < 1.5, matching the Pristine survey’s criteria. Since the simulated stars lack distances and apparent magnitudes, we replicate the parallax-based CaMD and magnitude cuts (as described in Sections 3.1 and 3.2) by selecting stars with the RGB flag label == 3, which removes subgiant branch stars from the simulated data. As a result of this RGB-only selection, we also skip the colour-metallicity cut from Section 3.3, which is designed to remove subgiants. The final simulated catalogue thus represents a population composed entirely of RGB stars.
Because the purity and completeness of our RGB catalogues are quite high, we can assume that this is a good approximation of our catalogue.

The colour bias weights are computed by taking the total mass in each metallicity bin after applying the label == 3 cut, and dividing that by the total mass after both the label == 3 and the colour cut have been applied. The reference weight, by which we divide all weights, is taken as the weight of the −1.5 < [Fe/H] < −1.0 metallicity bin. The resulting weights are given in Table 1. Multiplying the MDFs by these weights undoes the undersampling of metal-rich stars that occurs because of our colour cut.
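The ratio described above can be sketched in a few lines. The input layout (lists of mass, colour, and evolutionary label per simulated star), the dictionary keys, and the toy populations are hypothetical. The label value 3, the colour range, and the choice of reference bin follow the text.

```python
RGB_LABEL = 3                       # evolutionary-stage label used for RGB stars
COLOUR_MIN, COLOUR_MAX = 0.5, 1.5   # G_BP - G_RP range of the colour cut


def colour_weight(stars):
    """Mass of RGB stars divided by mass of RGB stars inside the colour cut.

    stars: list of (mass, bp_rp, label) tuples for one metallicity bin.
    """
    m_rgb = sum(m for m, _, lab in stars if lab == RGB_LABEL)
    m_cut = sum(m for m, c, lab in stars
                if lab == RGB_LABEL and COLOUR_MIN < c < COLOUR_MAX)
    return m_rgb / m_cut


def relative_colour_weights(populations, reference_bin):
    """Normalise the per-bin weights by the weight of the reference bin."""
    raw = {name: colour_weight(stars) for name, stars in populations.items()}
    return {name: w / raw[reference_bin] for name, w in raw.items()}


# Toy simulated populations (invented values, for illustration only).
toy_populations = {
    "-1.25": [(1.0, 1.0, 3), (1.0, 1.2, 3), (1.0, 0.8, 1)],
    "-0.25": [(1.0, 1.0, 3), (1.0, 1.7, 3), (1.0, 0.9, 2)],
}
toy_weights = relative_colour_weights(toy_populations, reference_bin="-1.25")
```

In the toy case half of the metal-rich RGB mass lies redward of the cut, so that bin receives twice the weight of the reference bin.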

We now move to computing the magnitude cut weights. Only the three most distant bins are affected by the Malmquist bias introduced by our magnitude cut (see Figure 13), and only they are assigned weights for this. We compute the magnitude cut weights for each metallicity bin in Table 1. For this, we divide the total mass for that metallicity bin by the mass for the same bin, but restricted to absolute magnitudes brighter than the limiting magnitude for each distance bin. Because we once again are only interested in the relative weights for a given distance bin, we normalise the weights within one distance bin by dividing all weights by the weight for the most metal-poor bin. As we can see in Figure 13, the most metal-rich stellar population does not enter into the white region in the most distant bin, meaning that theoretically, we are not measuring the MDF of the halo at heliocentric distances larger than 45 kpc where [Fe/H] > −0.5. This part of the MDF is greyed out in subsequent plots. The weights are presented in Table 2 and we bias-correct the MDFs by multiplying them by these values. These weights slightly overcorrect the MDFs, as we assume the largest distance in each distance bin to correct for the Malmquist bias and not the distribution of the distances itself in each bin, which is beyond the scope of this work. There can also be an excess of metal-poor stars due to the inherent methodology of using CaHK narrow-band photometry as a proxy for stellar metallicity. This is because metal-poor stars are brighter than metal-rich stars in CaHK magnitudes and therefore have a higher signal-to-noise. To correct for this, we would need extensive modelling of the survey’s photometry. However, we note that we used the brighter subsample of the survey, and thus this effect should be very small, as seen in the left panel of Figure 9.
The biases due to distance uncertainties and metallicity uncertainties in the faint end should also be small due to the large range of distances chosen in each bin, and the bin size chosen for the MDFs shown in Figure 14. Therefore, the simple bias-correcting technique presented in this work is a first step towards investigating the true view of the MDF of the outer halo.
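The magnitude cut weights follow the same pattern as the colour weights. A minimal sketch is given below, assuming each simulated population has already been reduced to the stars of interest and is supplied as (mass, absolute magnitude) pairs. That input layout, the function names, and the use of an infinite weight for a bin with no observable stars are choices made here for illustration.

```python
def magnitude_weight(stars, m_lim):
    """Total mass divided by the mass brighter than the limiting magnitude.

    stars: list of (mass, abs_mag) tuples for one metallicity bin.
    m_lim: limiting absolute magnitude for the distance bin.
    Returns infinity when no star is bright enough, i.e. the bin is not probed.
    """
    total = sum(m for m, _ in stars)
    bright = sum(m for m, mag in stars if mag < m_lim)
    return total / bright if bright > 0 else float("inf")


def relative_magnitude_weights(populations, m_lim, metal_poor_bin):
    """Normalise weights in one distance bin by the most metal-poor bin."""
    raw = {name: magnitude_weight(stars, m_lim)
           for name, stars in populations.items()}
    return {name: w / raw[metal_poor_bin] for name, w in raw.items()}


# Toy populations (invented values, for illustration only).
toy_mag_populations = {
    "-2.25": [(1.0, -2.5), (1.0, -1.0), (1.0, 0.5), (1.0, 2.0)],
    "-0.25": [(1.0, -1.5), (1.0, 0.0), (1.0, 1.0), (1.0, 2.5)],
}
toy_mag_weights = relative_magnitude_weights(toy_mag_populations, 0.0, "-2.25")
```

A metallicity bin with an infinite weight corresponds to the situation described above for [Fe/H] > −0.5 in the most distant bin, where the MDF cannot be corrected and is greyed out instead.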

After the MDFs in each distance bin have been multiplied by the weights in Tables 1 and 2, we fit a Gaussian mixture model (GMM) decomposition to the MDFs. The number of components is chosen based on the lowest Bayesian Information Criterion (BIC), which selects three components as the optimal number for all the different distance bins. The MDFs and their corresponding GMMs, with each contributing component, are shown in Figure 14 together with the number of stars in each distance bin. Figure 15 shows the kernel density estimates (KDEs) for all distance bins together. The GMM components and their means μ, standard deviations σ and component weights ω are given in Table 3, both with and without the weights.
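The model selection step can be illustrated with a small, self-contained weighted expectation-maximisation fit in one dimension. This is a sketch only: the text does not state how the weighted MDFs are passed to the GMM fit, so applying the bias-correction weights as per-star weights, using the sum of weights as the effective sample size in the BIC, the quantile-based starting values, and the width floor are all assumptions made here.

```python
import math


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(x, w, k, n_iter=200, sigma_floor=0.05):
    """Weighted EM for a k-component 1D Gaussian mixture.

    Returns (means, sigmas, component_weights, weighted_log_likelihood).
    """
    n_eff = sum(w)
    xs = sorted(x)
    # Deterministic start: means at evenly spaced quantiles, common width.
    mus = [xs[int((j + 0.5) * len(xs) / k)] for j in range(k)]
    mean = sum(wi * xi for wi, xi in zip(w, x)) / n_eff
    var = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, x)) / n_eff
    sigmas = [max(math.sqrt(var), sigma_floor)] * k
    pis = [1.0 / k] * k

    def mixture(xi):
        return [pis[j] * normal_pdf(xi, mus[j], sigmas[j]) for j in range(k)]

    for _ in range(n_iter):
        # E-step: responsibility of each component for each star.
        resp = []
        for xi in x:
            p = mixture(xi)
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: weighted updates of mean, width and mixing fraction.
        for j in range(k):
            nj = sum(wi * r[j] for wi, r in zip(w, resp))
            if nj <= 0.0:
                continue
            mus[j] = sum(wi * r[j] * xi for wi, r, xi in zip(w, resp, x)) / nj
            var_j = sum(wi * r[j] * (xi - mus[j]) ** 2
                        for wi, r, xi in zip(w, resp, x)) / nj
            sigmas[j] = max(math.sqrt(var_j), sigma_floor)
            pis[j] = nj / n_eff
    loglike = sum(wi * math.log(max(sum(mixture(xi)), 1e-300))
                  for wi, xi in zip(w, x))
    return mus, sigmas, pis, loglike


def bic(loglike, k, n_eff):
    """BIC with 3k - 1 free parameters for a 1D mixture of k Gaussians."""
    return (3 * k - 1) * math.log(n_eff) - 2.0 * loglike


def best_gmm(x, w, k_max=5):
    """Fit 1..k_max components and return the k with the lowest BIC."""
    fits = {k: fit_gmm_1d(x, w, k) for k in range(1, k_max + 1)}
    scores = {k: bic(fits[k][3], k, sum(w)) for k in fits}
    k_best = min(scores, key=scores.get)
    return k_best, fits[k_best], scores
```

With real data, `x` would hold the [Fe/H] values of one distance bin and `w` the product of the colour and magnitude weights for each star. A library implementation would normally be preferable to this hand-written loop, which is shown only to make the BIC comparison explicit.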

Both figures show that the metal-poor peak (coming from low-mass accretion events and the tail of more massive accretion events), marked as 1, gets stronger with distance, but is strongest in the 6-10 kpc bin. This is also seen in Table 3, where ω₁ has the largest value in that bin, meaning that this bin has the largest contribution from the metal-poor peak. μ₁ is also the most negative for that bin. However, the peak has its largest dispersion σ₁ in the 45-101 kpc bin. The variation in the mean between the 6-10 kpc bin and the more distant bins is very small compared with the measured metallicity uncertainties. Therefore, it is safe to assume that this peak stays roughly constant past 6-10 kpc. The medium metallicity peak (coming from more massive accretion events), peak 2, decreases in metallicity with distance (μ₂), increases in strength with distance (ω₂), but its dispersion (σ₂) stays roughly constant. This peak has its contribution mostly from the last major merger, GES. The metal-rich peak (coming from ‘hot’ thick disc stars, i.e. thick disc stars on halo-like orbits), peak 3, is most pronounced in the closest bin. Its metallicity (μ₃) also decreases with distance, its dispersion (σ₃) increases until the most distant bin, and its strength mostly decreases with distance (ω₃). This shows that the hot thick disc stars populate mostly the inner Galactic halo (d<10 kpc). In the outermost halo (d>22 kpc), the metal-rich peak could also contain the disrupting Sagittarius dwarf galaxy stream (Ibata et al. 2020), even though we only have a small number of stars that we associate with the stream, as seen in the upcoming subsections. We do not attempt to remove the stream specifically to perform the MDF analysis, as the number of members we find is very low (N<400), and the lack of 6D phase space information at larger distances makes the stream member removal less reliable and creates more selection effects.
The effect with distance on the unweighted values is more continuous, which shows the need for our bias-correcting method using weights. As we move further into the halo, the contribution to the stellar population from accreted dwarf galaxies increases (Naidu et al. 2020), which explains the increase in metallicity dispersion with distance.

We draw the conclusion that not only do the metal-poor components become stronger as distance increases, but each given component is also more metal-poor with distance. It is also clear from the MDFs that the halo contains a metal-rich component, peak 3, that persists even at large distances (however, in the last distance bin this might be a mix of ‘hot’ thick disc and Sagittarius stream stars). The halo is known to contain a red and a blue colour-magnitude diagram (CMD) population (Gaia Collaboration 2018). The blue sequence comes from the GES merger, where the mass ratio between the red and blue populations indicates that the GES was massive enough to perturb in situ MW stars in the old thick disc to halo-like kinematics (Gallart et al. 2019). These kinematically heated stars can most clearly be seen at metallicities spanning the range −0.7 to −0.2 dex (Belokurov et al. 2020). This coincides with the range of μ₃ we measure at closer distances, and these splashed stars are likely the reason that we have such pronounced metal-rich peaks in our MDFs. The canonical thick (and thin) disc stars in our sample should mostly have been removed by our |Z| < 3 kpc cut.

The decrease of metallicity with distance in the halo has been seen previously in both simulations (Starkenburg et al. 2017b) and observations (Dietz et al. 2020; Liu et al. 2022). There are claims that this negative metallicity gradient with distance might be due to selection effects as other authors have observed a lack of this gradient out to 100 kpc (Conroy et al. 2019b). The MDFs presented in this work have been bias-corrected and should provide a much cleaner representation of the underlying metallicity structure of the halo. We still clearly see a metallicity gradient with distance, but not as pronounced as it would be without the bias-correcting weights. This underscores the importance of our bias-correcting methods, while also highlighting the difference in the metallicity distribution of the Galactic halo (that is still present after accounting for the selection biases) at different distances out to 100 kpc.

Table 1

Bias-correction due to colour cut in the RGB selection.

Table 2

Bias-correction due to magnitude cut in the RGB selection.

Table 3

Mean μ, standard deviation σ, and component weights, ω, of the three GMM components for the MDFs seen in Figure 14.

4.3 Dynamical view of metallicity substructures

In this section, we create the subsample of 6D positions and velocities of our PDR1 and PGS-giants catalogue and highlight the many different substructures seen in the outer halo down to lower metallicities using this 6D information.

4.3.1 Radial velocities from spectroscopic surveys

To derive 6D phase-space information for a subset of our sample, we cross-matched our PDR1/PGS-giants with catalogues from the SDSS DR12 Sloan Extension for Galactic Understanding and Exploration (SEGUE, York et al. 2000), the Large Sky Area Multi-Object Fiber Spectroscopic Telescope Medium and Low Resolution Surveys (LAMOST MRS & LRS DR7, Zhao et al. 2006), the RAdial Velocity Experiment (RAVE DR6, Steinmetz et al. 2006), the Galactic Archaeology with HERMES spectroscopic survey (GALAH DR3, Buder et al. 2021), the APO Galactic Evolution Experiment (APOGEE DR17, Majewski et al. 2017), the Southern Stellar Stream Spectroscopic Survey (S5 DR1, Li et al. 2019), and the Gaia Radial Velocity Spectrometer (Gaia RVS DR3, Gaia Collaboration 2023b). These surveys complement each other in probing lower to higher latitudes, brighter to fainter stars, and northern and southern hemispheres. Even though the combination of all of these surveys along with the Pristine survey and/or the Gaia XP sample gives rise to a complex selection function, we try to extract as much information as possible from the literature for our giants catalogue, and refrain from modelling the selection function, given the simple science cases shown in this work. The radial velocities are corrected for their zero-point offsets with respect to each other using Gaia RVS radial velocities as the zero point (similar to what has been done in Ruiz-Lara et al. 2022). In this work, we refer to stars with 6D information as ‘PDR1/PGS 6D giants’ and to the full catalogues as ‘PDR1/PGS-giants’, implying 5D information without line-of-sight velocities. The PDR1 6D giants go out to ~65 kpc and the PGS 6D giants go out to ~45 kpc in heliocentric distance.
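One simple way to place a survey on the Gaia RVS velocity scale is sketched below. The text does not specify the estimator, so the use of the median difference over stars in common, the dictionary-based input, and the function names are assumptions made for illustration.

```python
import statistics


def zero_point_offset(rv_survey, rv_gaia):
    """Median (survey - Gaia RVS) radial velocity over stars in common.

    rv_survey, rv_gaia: dicts mapping a star identifier to a radial
    velocity in km/s (hypothetical input layout).
    """
    common = rv_survey.keys() & rv_gaia.keys()
    if not common:
        return 0.0  # no overlap: leave the survey scale unchanged
    return statistics.median(rv_survey[s] - rv_gaia[s] for s in common)


def to_gaia_scale(rv_survey, rv_gaia):
    """Shift all survey velocities by the measured zero-point offset."""
    offset = zero_point_offset(rv_survey, rv_gaia)
    return {s: v - offset for s, v in rv_survey.items()}
```

The median is used here because it is robust against binaries and mismatches among the stars in common. An offset that depends on magnitude or temperature would need a more detailed treatment than this constant shift.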

4.3.2 Sagittarius stream

In Figure 16, we show the inferred photometric distances versus right ascension (RA) for PDR1-giants (top two panels) and PGS-giants (bottom panel), with the Sagittarius stream track from Hernitschek et al. (2017) overlaid to guide the eye through the Sagittarius member stars in our sample. All the panels are restricted to π<0.05 mas (parallax<0.05 mas, distance>20 kpc) to remove nearby field giants (same as removing stars d<10 kpc using our inferred photometric distances). In the top panel, we show the metal-rich stars ([Fe/H]>−1.0) to pick out the structures we see in Figure 12. We can see the leading arm traced out to ~60 kpc along with the spur feature 3 reported by Sesar et al. (2017). It is important to note that the spur feature seen in the top and bottom panels of Figure 16 is at the same distance probed by its RR Lyrae counterpart, showing the reliability of our distances. This spur feature close to the apocentre of the leading arm is selected with the Sagittarius stream coordinate absolute latitude cut within 20°, but it remains up to 9°, with few members going down to 5°, consistent with what is seen in the literature with standard candle tracers. This confirms the association of the spur feature with the stream itself; such apocentre lumps are seen in most simulations of the disruption of a Sagittarius dwarf spheroidal (dSph)-like galaxy (Sesar et al. 2017). However, the nearby distances are still quite noisy due to field star contamination. This does not change with the parallax cut that removes nearby stars. In the middle panel, we show all stars within 20° of the Sagittarius stream coordinate latitude (using the Vasiliev et al. 2021 coordinate conversion). Because of the bulk of GES stars, which are more metal-poor than Sagittarius, we do not see the Sagittarius signal clearly in this plot.
However, we trace the trailing arm out to the apocentre more cleanly (with a small offset in distance that matches well within the distance uncertainties) in this view. Adding a metallicity cut on top of the stream latitude cut improves the selection at nearby distances, but we no longer see the trailing arm at larger distances. In the bottom panel, we show the same for PGS-giants and we clearly trace the leading arm and the spur feature 3 in this catalogue. Sagittarius is less prominent in PGS-giants, mostly because of the distance-metallicity selection effect: we do not see any metal-rich stars at large distances in this catalogue, and the Sagittarius stream is relatively metal-rich compared to the bulk of halo stars at large distances. A similar analysis using PDR1/PGS 6D giants is discussed in detail in Appendix D. This results in a sparser, more nearby, but cleaner selection of Sagittarius stream members due to the availability of 6D information for a full kinematic selection. The cleanest selection of Sagittarius in both catalogues of giants is obtained using 6D phase space information where available and, if not, using a metal-rich cut for PDR1-giants or a latitude cut for PGS-giants, as can be seen in the top and bottom panels of Figure 16. However, we know that the Sagittarius stream has a clear metallicity gradient, which will impact the MDF at lower metallicities as well (Cunningham et al. 2024).

Fig. 14

Metallicity distribution functions from PDR1-giants in six galactocentric distance bins and their corresponding GMMs (solid lines) and the three individual GMM components (dashed lines). The GMMs are shown next to each other in Figure 15. All distances are given in kpc. The individual GMM components are labelled as 1, 2 and 3 with decreasing metallicity in the top panel. The region [Fe/H] > -0.5 dex is greyed out for the most distant bin as we are not properly probing this region of the MDF, see Figure 13.

Fig. 15

Best-fit GMMs to the MDFs seen in Figure 14. The region [Fe/H] > -0.5 dex is greyed out for the most distant bin as we are not properly probing this region of the MDF, see Figure 13. All distances are given in kpc. The means, standard deviations and weights of the different peaks can be seen in Table 3.

4.3.3 Metallicity view of integrals-of-motion space

Substructures from small or large dwarf galaxies accreted onto the Milky Way have a chemistry distinct from that of the field halo stars (in situ). They will have similar metallicities with a relatively smaller metallicity dispersion and spread in chemical abundances as seen in merger events (Leaman 2012) such as GES ([Fe/H] = −1.18±0.3), Helmi streams ([Fe/H] = −1.28±0.19), Sequoia ([Fe/H] = −1.59±0.25), Thamnos ([Fe/H] = −1.9±0.41), LMS-1/Wukong ([Fe/H] = −1.58±0.23), Sagittarius ([Fe/H] = −1.0±0.2), and Cetus ([Fe/H] = −2.17±0.20; [Fe/H] ~ −2.0 in some other works, Thomas & Battaglia 2022; Yuan et al. 2022) (see e.g. Malhan et al. 2022, for more details on these values). It is important to note that even though the reported metallicity dispersions of these accreted dwarf galaxies are approximated to be Gaussians, in reality these galaxies have a long tail towards the metal-poor end that is more difficult to measure and constrain reliably (Leaman 2012). These dispersion measurements in chemical abundances work well for substructures with [Fe/H] > −2.5. For lower metallicities, it is harder to separate accretion events from the general halo using chemical abundances (see e.g. Sestito et al. 2024). However, in this study, a large fraction of our stars have [Fe/H] > −2.5, which makes it easier to separate accretion events by combining chemical and dynamical information, such as by binning in metallicity and examining their distribution in IOM space, as shown later in this section.

This implies that different accretion events contributing to various regions of the Galactic halo, particularly their still-unmixed debris, can manifest as over- or underdensities in the relative number of stars at specific metallicities within narrow distance ranges. Studying the relative contribution of different metallicity bins at different distances in the Galaxy also allows us to understand the metallicity structure of the halo. Figure 12 shows the relative fraction of stars in 8 different metallicity bins along the absolute height above the disc plane. We remove stars with |Z|<2 kpc to avoid thin/thick disc contamination. Fractions are computed for 40 bins in distance, spaced evenly on a logarithmic scale between 2 and 90 kpc. From Figure 17, we can see the relatively metal-rich stars ([Fe/H]>−1.0) coming from the Sagittarius stream (Cunningham et al. 2024) at large Z. At intermediate metallicities (−1.5<[Fe/H]<−1.0), we see the apocentre pile-ups of the GES merger, such as the Hercules-Aquila Cloud (HAC) and the Virgo Overdensity (VOD) at smaller Z (<30 kpc), and the Outer Virgo Overdensity (OVO) and outer HAC at larger Z (>30 kpc) (see e.g. Belokurov et al. 2007; Newberg et al. 2007; Sesar et al. 2017). At scale heights larger than 40 kpc, the contribution from VMP stars increases very steeply. We associate about 40% of the total halo to VMP stars and 20% to EMP stars in the outermost halo (d>65 kpc). These numbers demonstrate the power of our RGB catalogue in probing further out to 100 kpc, down to the most metal-poor stars in the Galaxy.
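The fractional contributions described above can be sketched as a two-dimensional count normalised along the metallicity axis. The 40 logarithmic height bins between 2 and 90 kpc and the eight metallicity bins between −4 and 0 follow the text. Equal-width metallicity bins, half-open bin edges, and the function name are assumptions made here.

```python
import math


def metallicity_fractions(abs_z, feh, z_lo=2.0, z_hi=90.0, n_z=40,
                          feh_lo=-4.0, feh_hi=0.0, n_feh=8):
    """Fraction of stars per metallicity bin in each log-spaced |Z| bin.

    abs_z : absolute heights above the plane in kpc
    feh   : photometric metallicities in dex
    Returns a list of n_z rows, each with n_feh fractions that sum to 1
    (or all zeros for an empty height bin).
    """
    counts = [[0] * n_feh for _ in range(n_z)]
    log_lo, log_hi = math.log10(z_lo), math.log10(z_hi)
    for z, m in zip(abs_z, feh):
        # The lower bound on z also removes the disc-dominated region.
        if not (z_lo <= z < z_hi and feh_lo <= m < feh_hi):
            continue
        i = int((math.log10(z) - log_lo) / (log_hi - log_lo) * n_z)
        j = int((m - feh_lo) / (feh_hi - feh_lo) * n_feh)
        counts[min(i, n_z - 1)][min(j, n_feh - 1)] += 1
    fractions = []
    for row in counts:
        total = sum(row)
        fractions.append([c / total for c in row] if total else [0.0] * n_feh)
    return fractions
```

Plotting each column of the returned table against the height bin centres gives one curve per metallicity bin, with unmixed debris showing up as local bumps in the corresponding curve.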

Merger debris from different accretion events that made up the Milky Way halo in the distant past is clustered in IOM space (Helmi & de Zeeuw 2000). Here, we use two typical quantities as IOMs: the angular momentum in the z-direction (L_z), and the total energy (E). L_z is truly conserved in an axisymmetric potential, while it varies slowly in a triaxial potential, where it is not fully conserved but still maintains a certain degree of clustering for stars on similar orbits from the same accretion event. The total energy, E, was computed as $E = \frac{1}{2} v^{2} + \Phi(r),$ (3)

where Φ(r) is the Galactic gravitational potential at the star’s location. For this analysis, we used the same potential as in Lövdal et al. (2022): a Miyamoto-Nagai disc, Hernquist bulge, and Navarro-Frenk-White halo with parameters (a_d, b_d) = (6.5, 0.26) kpc, M_d = 9.3 × 10¹⁰ M_⊙ for the disc, c_b = 0.7 kpc, M_b = 3.0 × 10¹⁰ M_⊙ for the bulge, and r_s = 21.5 kpc, c_h = 12 kpc, and M_halo = 10¹² M_⊙ for the halo. We used a low renormalised unit weight error (ruwe < 1.4) to use stars with good quality astrometry and remove potential binaries. We assume V_LSR = 232 km/s, a distance of 8.2 kpc between the Sun and the Galactic centre (McMillan 2017), and (U_⊙, V_⊙, W_⊙) = (11.1, 12.24, 7.25) km/s for the peculiar motion of the Sun (Schönrich et al. 2010).
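A sketch of Eq. (3) with the three-component potential is given below, using the textbook forms of the Miyamoto-Nagai, Hernquist, and NFW potentials with the parameter values quoted above. Several points are assumptions of this sketch and not confirmed by the text: c_h is treated as a dimensionless concentration that normalises the NFW profile to M_halo, the value of G is the standard one in kpc (km/s)² per solar mass, and L_z is returned in a right-handed frame, so its sign may be opposite to the convention used in the figures.

```python
import math

G = 4.30091e-6  # gravitational constant in kpc (km/s)^2 / M_sun

A_D, B_D, M_D = 6.5, 0.26, 9.3e10   # Miyamoto-Nagai disc (kpc, kpc, M_sun)
C_B, M_B = 0.7, 3.0e10              # Hernquist bulge (kpc, M_sun)
R_S, C_H, M_H = 21.5, 12.0, 1.0e12  # NFW halo (kpc, concentration, M_sun)


def potential(x, y, z):
    """Total potential in (km/s)^2 at galactocentric (x, y, z) in kpc, r > 0."""
    big_r2 = x * x + y * y
    r = math.sqrt(big_r2 + z * z)
    disc = -G * M_D / math.sqrt(big_r2 + (A_D + math.sqrt(z * z + B_D * B_D)) ** 2)
    bulge = -G * M_B / (r + C_B)
    # Assumed NFW normalisation: M_H enclosed within C_H * R_S.
    nfw_norm = math.log(1.0 + C_H) - C_H / (1.0 + C_H)
    halo = -G * M_H / nfw_norm * math.log(1.0 + r / R_S) / r
    return disc + bulge + halo


def energy_and_lz(x, y, z, vx, vy, vz):
    """Total energy E = v^2 / 2 + Phi and L_z = x*vy - y*vx.

    Positions in kpc, velocities in km/s, both galactocentric.
    """
    energy = 0.5 * (vx * vx + vy * vy + vz * vz) + potential(x, y, z)
    lz = x * vy - y * vx
    return energy, lz
```

For a star at the Sun's position moving with roughly the circular velocity, the energy comes out negative, as expected for a bound orbit. Dedicated dynamics packages provide tested versions of these potentials and are the safer choice for real analyses.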

We show the IOM space (energy versus angular momentum in z-direction) colour-coded by the mean metallicities for the PDR1 6D giants in the top panel of Figure 17. We choose PDR1 over PGS 6D giants due to the larger distances probed, the fact that we are looking for phase-mixed structures (not coherent ones), which reduces the impact of the Pristine survey footprint, the higher quality of the photometric metallicities, and the low-to-no metallicity biases. In the bottom panels of Figure 17, we show the IOM space in ten different bins of metallicity, indicating the different structures/accretion events identified in the literature (Koppelman et al. 2019; Myeong et al. 2019; Yuan et al. 2020; Naidu et al. 2020; Lövdal et al. 2022; Ruiz-Lara et al. 2022; Malhan et al. 2022; Thomas & Battaglia 2022; Yuan et al. 2022; Horta et al. 2023; Dodd et al. 2023). The bins are chosen wide enough to see most of one dwarf accretion event in one bin, given their metallicity dispersion scales. All these panels and subpanels are plotted with a cut at an absolute scale height greater than 3 kpc (|Z|>3 kpc) to remove disc stars. This inevitably also removes foreground and/or background stars from a spherical halo distribution, but the effects of this should be minimal given that the halo completeness matters less than the purity for studying phase-mixed halo substructures.

In the first panel, for [Fe/H] > -0.5, we can clearly see prograde thick disc stars still left over in our sample. In the next bin (−0.75 < [Fe/H] < -0.5), we see the hot thick disc/Splash stars that are thick disc stars kicked up to halo-like orbits, likely resulting from the heating of the primordial high-α thick disc due to early mergers. We also find the ’Aleph’ structure in this bin, a highly circular structure that is significantly enriched ([Fe/H] = −0.5, [a/Fe] = 0.2), and may be associated with the enigmatic globular cluster Palomar 1. Its origin, whether in situ or ex situ, is still ambiguous. From the next panels, we start seeing accretion events that made up the Milky Way halo. Between metallicities of −1.0 and −0.75, we see the now-disrupting Sagittarius stream. We are probing the lower energies of Sagittarius in this work, mostly due to the fact that the trailing arm (at higher energy) is not covered as well as the leading arm (at lower energy) in our PDR1 6D giants. Parts of the Sagittarius stream are also visible at lower metallicities, but fewer in number. It has a clear and distinct negative L_y, which we use to select 6D members of the Sagittarius stream in Appendix D (see Chandra et al. 2023b, who also use the same criteria to select Sagittarius stream). From [Fe/H] < -1.0, we already start seeing the last major merger, GES, and metal-poor end of Sgr stream, down to metallicities of −2.0. However, the density of GES stars at L_z ~ 0 peaks in the metallicity bin −1.5<Fe/H]<-1.25. We probe large distances and higher energies of this last major merger event, down to lower metallicities, for the first time. The lower energies of GES contour does not necessarily have to belong to GES, but could belong to the old protogalaxy or the metalpoor tail of the hot thick disc. However, we need more chemical abundances to distinguish them. 
This is the same metallicity bin where we see the highly retrograde high energy Sequoia event and the Helmi streams, which is one of the first halo structures discovered through IOMs (Helmi et al. 1999). In the more metal-poor bin (−1.75<Fe/H]<-1.5), we see two more metal-poor structures namely, Thamnos, a low-mass (~2×10⁶ M_θ), retrograde structure deep in the potential well of the Galaxy, and LMS-1/Wukong that is still disrupting and is reported to be VMP in some studies (Malhan et al. 2021). However, we find a strong density peak around this metallicity bin, which is about 0.5 dex more metal-rich than some literature studies (Malhan et al. 2021), but similar in metallicities to the some other (Naidu et al. 2020). The next two metallicity bins look cleaner with no indication of any significant substructures. We do see a group of highly retrograde (L_z>-0.5 kpc km/s) and high energy (E—0.5 km²/s²) VMP stars (Fe/H]<-2) in the metallicity bin −2.25<[Fe/H]<-2 separate from the rest of the distribution, the origin or reality of which needs a bigger sample of homogenously analysed 6D giants to be characterised out to large distances. In the future, with the upcoming WEAVE, 4MOST and DESI surveys (Jin et al. 2024; de Jong et al. 2019; Cooper et al. 2023) and our own high-resolution spectroscopic follow-up, we will have this 6D information and chemistry to understand the accretion history of our Milky Way at the VMP end. In the VMP bin −2.5<[Fe/H]< −2.25, we see the very prograde, still-disrupting Cetus stellar stream which is one the lowest metallicity structures from a dwarf galaxy accreted onto to the Milky Way. The final and most metal-poor bin still consists of 345 PDR1 6D giants, which is one of the largest collation of giants out to large distances and down to very low metallicities. 
This bin is almost free of substructures, but is slightly prograde and centrally concentrated, reminiscent of the proto-galaxy/poor-old-heart/Aurora populations, which are thought to be of in situ origin and to trace the infant Milky Way stage (Belokurov & Kravtsov 2022; Rix et al. 2022; Belokurov & Kravtsov 2023; Ardern-Arentsen et al. 2024). Among the many reported accretion events, we do not recover some, namely Shiva, Shakti, and Pontus (Malhan 2022; Malhan & Rix 2024). This could be due to selection effects in the many different spectroscopic surveys and in the Pristine survey itself. On the other hand, this could also be explained by the selection effect in the Gaia RVS radial velocities (Lane et al. 2022; Dillamore et al. 2023) and ridges caused by bar resonances (Dillamore et al. 2024) for the Shiva and Shakti events, and by Pontus being captured as the low-energy part of the last major merger, GES (Amarante et al. 2022). The same plots are also made colour-coded by galactocentric distance, to understand the region of the Galaxy probed by these accretion events (stars with lower energies have smaller distances and stars with higher energies have larger distances, as would be expected); these are summarised in Appendix E. In the top panel of Figure 17, we can already associate the many different substructures described above with their places in the IOM by eye, as they are colour-coded by the mean metallicities. With a significant number of stars in each metallicity bin going further out into the halo, this is the first time we are able to associate all these substructures with their metallicity view of the IOM space.
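
The metallicity slicing of the E-L_z plane described above can be illustrated with a minimal sketch. It is not the pipeline used in this work: the 0.25 dex bin edges between −2.5 and −0.5 are inferred from the bins quoted in the text, the sign convention for L_z depends on the adopted galactocentric frame, and all function names are hypothetical.

```python
import numpy as np

# Assumed [Fe/H] bin edges (0.25 dex wide), with open-ended bins at both ends.
FEH_EDGES = np.array([-2.5, -2.25, -2.0, -1.75, -1.5, -1.25, -1.0, -0.75, -0.5])

def angular_momentum_z(x, y, vx, vy):
    """L_z = x*vy - y*vx for galactocentric positions (kpc) and velocities (km/s)."""
    return x * vy - y * vx

def metallicity_slice(feh):
    """Slice index per star: 0 for [Fe/H] < -2.5, up to 9 for [Fe/H] >= -0.5."""
    return np.digitize(feh, FEH_EDGES)

def density_in_slice(lz, energy, slice_idx, which, bins=50):
    """2D star counts in the (L_z, E) plane for one metallicity slice."""
    sel = slice_idx == which
    return np.histogram2d(lz[sel], energy[sel], bins=bins)
```

Each slice can then be drawn as its own density panel, which is the layout of the bottom panels of Figure 17.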

Fig. 16

Sagittarius stream in our RGB catalogues. Photometric distance versus RA in the PDR1-giants catalogue at higher metallicities ([Fe/H] > −1.0, top), close to the Sagittarius stream plane (|B| < 20°, middle), and the latter for the PGS-giants catalogue (bottom). All panels have Sagittarius stream tracks adapted from Hernitschek et al. (2017). Note the spur feature (associated with feature 3 from Sesar et al. 2017). Leading and trailing arms are labelled ‘L’ and ‘T’, respectively. The stream members go further out and are more prominent with PDR1-giants than PGS-giants, mostly due to the distance-metallicity selection effect in the latter catalogue.

Fig. 17

Metallicity view of the IOM space for the PDR1-giants catalogue. The energy versus angular momentum in the z-direction is colour-coded by the PDR1 metallicities, with the Sun’s E-L_z shown as a red star (top centre). The energy versus z-angular momentum at different slices in metallicities is colour-coded by their density distributions with different substructures highlighted for PDR1 6D giants (bottom panels). The following acronyms are used in this figure: HTD, hot thick disc; Sgr, Sagittarius; GES, Gaia-Enceladus-Sausage; LMS-1, low-mass stream-1; MW, Milky Way.

4.4 Outer halo metal-poor substructures

The outer halo is intriguing because it holds clues about the formation and evolution of the Milky Way, including the remnants of past mergers and accretion events and their turning points. Fully characterizing the global extent of structures found in local samples also requires looking further into the outer halo. In Figure 18, we show the on-sky distribution of the outer halo (d > 40 kpc) colour-coded by the density of stars. We have ~2000 PDR1-giants and ~200 PGS-giants in the outer halo (the latter all VMP due to the distance-metallicity selection effect in PGS-giants), at distances greater than 40 kpc and Galactic latitudes higher than 20°, chosen to avoid extinction affecting our distance and metallicity estimates in the outer halo. We have almost no radial velocities from publicly available spectroscopic surveys in this outer halo subsample. This is the largest collection of VMP stars out to such large distances, allowing us to study some of the earliest times of our galaxy’s evolution.

In the top panel of Figure 18, we clearly see the Pristine survey footprint, which prevents us from looking at the all-sky distribution of outer halo substructures. Nevertheless, the outer halo is full of substructures, both dynamical and chemical. We clearly see overdensities near the Sagittarius stream tracks shown as black dashed lines. We see the highest density of stars in a region that overlaps significantly with the outer Virgo overdensity (outer-VOD) (Sesar et al. 2017). This has been associated with the apocentric pileup of debris from the GES accretion event using a very complementary sample of outer halo giants by Chandra et al. (2023b). The peak of the metallicity distribution for these stars is also close to the mean metallicity of GES. However, a larger number of 6D members would be needed to confirm this association. Linking overdensities such as HAC and VOD to larger accretion events has also been explored by Balbinot & Helmi (2021). We also see an overdensity of stars in the southern hemisphere around the same region as the Pisces Plume with the Magellanic wake overdensities but at higher longitudes. We guide the eye using the following track on the sky: l = −60 + 0.2b + 0.01b² shifted by ±10° (modified slightly from what was reported in Chandra et al. 2023b for the Pisces Plume). However, this could also simply be explained by more stars being present in this region as it approaches lower latitudes, where the stellar density is higher, combined with the effect of Gaia’s scanning law in the same region (see Figure 10 in MS23). A larger on-sky stretch of stars in this region would allow for a disentangling of these effects.
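
As an illustration of the guiding track, the sketch below evaluates l = −60 + 0.2b + 0.01b² and flags stars whose Galactic longitude lies within a chosen half-width of it. We assume here that the ±10° shift is applied in longitude and that longitudes are compared after wrapping to the range −180° to 180°; the function names are hypothetical and the code is not taken from the analysis itself.

```python
import numpy as np

def plume_track_l(b_deg):
    """Track longitude in degrees at Galactic latitude b: l = -60 + 0.2 b + 0.01 b^2."""
    b = np.asarray(b_deg, dtype=float)
    return -60.0 + 0.2 * b + 0.01 * b**2

def near_track(l_deg, b_deg, half_width_deg=10.0):
    """True where a star's longitude is within +/- half_width_deg of the track.

    The longitude difference is wrapped to [-180, 180) so that l = 315 deg
    and l = -45 deg are treated as the same direction (an assumption).
    """
    l = np.asarray(l_deg, dtype=float)
    dl = (l - plume_track_l(b_deg) + 180.0) % 360.0 - 180.0
    return np.abs(dl) <= half_width_deg
```

A wider band can be tested by passing a different `half_width_deg`.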

In the bottom panel of Figure 18, we see an all-sky view of the metal-poor Milky Way outer halo. Due to the distance-metallicity selection effect (see Section 3.5.2), we are free of metal-rich stars in the outer halo, which allows us to study the outer halo’s earliest evolution more easily. This is also why we do not see the Sagittarius stream, which is one of the most prominent outer-halo metal-rich substructures. We see a clear overdensity of stars in the same region as HAC in the north (Belokurov et al. 2007). However, these stars are at very large distances compared to the distances probed by HAC North and South (d < 20 kpc), and they are all VMP. The origin of this overdensity is unclear and, because radial velocities are unavailable, it is impossible to derive 6D phase-space information and orbital parameters. Therefore, we cannot yet associate it with GES or any other accretion event. We indicate this overdensity with a blue box and an outer-HAC label in the bottom panel of Figure 18.

In the south of this bottom panel of Figure 18, we see a clear overdensity that we associate with the Pisces Plume (Belokurov et al. 2019), which could be a mix of the dynamical wake due to the LMC’s infall and the stellar counterpart of the Magellanic stream (MSS). We show the LMC and SMC (infalling satellites of the Milky Way) as large and small orange circles in the same figure. This overdensity is almost fully coincident with the infalling orbits of the Magellanic Clouds. The overdensity just above the Pisces Plume and closer to the inner Galaxy may be caused by Gaia’s scanning law and the metal-poor end of the Sagittarius dwarf, but we need 6D members to confirm this. We discuss the Pisces Plume substructure in more detail in the next subsection.

Fig. 18

Outer halo of the Milky Way as viewed by PDR1-giants (top, Pristine footprint) and PGS-giants (bottom, Gaia’s all-sky view) at heliocentric distances greater than 40 kpc. The colour map is a density distribution in Galactic coordinates at absolute latitudes greater than 20°. We highlight the significant overdensities that dominate the outer halo and identify their most-likely progenitors. In the top panel, we also show the footprint of the entire Pristine survey in grey. Some of the pixels in the outer halo part of PDR1-giants are larger than the Pristine footprint due to the chosen healpix level.

4.4.1 The Pisces Plume in the outer halo

Using all-sky RR Lyrae stars from Gaia DR2, Belokurov et al. (2019) uncovered a plume-like elongation near the Pisces Overdensity, extending to larger distances (we only map the plume and not the overdensity, mostly due to Gaia’s scanning law pattern, which creates northern and southern caps of underdensities in these regions). This elongation, aligned with the direction of the gaseous MS, suggests a connection to the Clouds. Based on the kinematics and metallicities of a small subsample of BHB stars in this ‘Pisces Plume’, they argued that it predominantly represents the dynamical friction wake imprinted on the Milky Way’s halo by the infall of the LMC, rather than being composed of stripped stars from the Clouds. Conroy et al. (2021) used all-sky RGB stars to suggest that the southern overdensity and a northern counterpart correspond to the dynamical friction wake and the ‘collective response’ to the LMC’s infall, matching predictions from simulations (Garavito-Camargo et al. 2019, 2021). However, they found that the southern overdensity is twice as strong in the data as predicted by simulations, possibly indicating multiple populations within the Pisces Plume.

The origin of this Pisces Plume is uncertain; it is still unclear whether it contains debris stripped from the Magellanic Clouds, for example by ram-pressure stripping (the Magellanic stream, Putman et al. 2003), or whether it primarily represents the dynamical friction wake of the Large Magellanic Cloud (Garavito-Camargo et al. 2019; Conroy et al. 2021). It is important to note that almost all of our member stars in this region are VMP ([Fe/H] < −2.0, most of them below −2.5). The Magellanic Stream is an extensive gaseous structure encircling the Milky Way and spanning over 140 degrees of the southern sky. Despite decades of dedicated observations and simulation efforts, the precise origin of the Magellanic Stream remains elusive. Two major formation processes, tidal disruption and ram-pressure stripping, are competing explanations for its existence. To complicate matters further, its trailing arm is also the region on-sky that experiences the Large Magellanic Cloud’s dynamical friction wake. The theorised stellar counterpart to the gaseous Magellanic Stream is the MSS, which was recently traced at the relatively metal-rich end by Chandra et al. (2023a) using a Gaia XP spectra giants catalogue that is complementary to ours. The stellar stream provides strong constraints on the distance and kinematics of the gaseous Magellanic Stream, helping us understand the past orientation and interaction history between the Clouds and the Milky Way. By accurately characterizing the MSS’s detailed chemical abundances, we can study the chemical evolution in the outskirts of the Clouds and the interaction of their haloes with the outer Galactic halo.

Fig. 19

Our VMP MSS member candidates projected along the Magellanic stream coordinates (panel 1), proper motions along and transverse to the stream (panels 2 and 3), heliocentric distance (panel 4), along projected stream coordinates with their transverse motion highlighted using velocity vectors (panel 5), and line-of-sight radial velocity (where available, from PGS 6D giants, panel 6). All the parameters are plotted against the Magellanic stream longitude. The projection includes particles from both tidal debris models in Besla et al. (2012), LMC, SMC parameters adapted from Kallivayalil et al. (2006a,b), and the Magellanic stellar stream debris discovered by Chandra et al. (2023a).

4.4.2 The Magellanic stellar stream down to the very metal-poor end

We select all our MSS member candidates between the orange polynomial lines shifted by ±20°, as shown in the bottom panel of Figure 18, at distances larger than 25 kpc. We choose this value because at large distances, our method tends to underestimate distances more than it overestimates them (see Figure 7 for the comparison with StarHorse distances). In Figure 19, we show various kinematic properties of our VMP MSS member candidates as star symbols along the Magellanic stream longitude L_MS (coordinate conversion based on the Nidever et al. (2008) stream axis definition). For comparison, we overlay the two Magellanic stream models from Besla et al. (2012, 2013), which were modelled to trace the gas. Model 1 (M1) was designed to best match the velocity of the Magellanic stream (MS) and model 2 (M2) to match the kinematics of the Clouds themselves. The LMC and SMC are also shown as big and small orange stars in all the panels. Their positions and velocities are taken from Kallivayalil et al. (2006a,b), and the LMC and SMC have median metallicities of −0.5 and −1.0, respectively. We also overlay the Chandra et al. (2023a) members as circles. These members and our member candidates are colour-coded by spectroscopic metallicities where available and, if not, PGS photometric metallicities. We find 41 member candidates by association to the MSS’s trailing arm in the south in proper motion and positions, 47 are VMP ([Fe/H]<-2), 32 are [Fe/H]<-2.5 and 9 are EMP ([Fe/H]<-3) stars out to 70 kpc. Chandra et al. (2023a) confirmed 7 relatively metal-rich members but also serendipitously discovered 6 members that are relatively metal poor ([Fe/H] < −1.5). Their metal-rich population is described as extended and stream-like, while the metal-poor population is more diffuse and clumpy. In summary, we find the metal-poor population to also exhibit a stream-like elongated orientation.
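
The distance cut and the metallicity tallies quoted above can be reproduced on any candidate table with a few lines. The following is a minimal sketch with hypothetical names, assuming arrays of metallicities and heliocentric distances for stars that already lie inside the on-sky band; the counts are cumulative, so each threshold counts a subset of the previous one.

```python
import numpy as np

def tally_metal_poor(feh, distance_kpc, min_distance_kpc=25.0,
                     thresholds=(-2.0, -2.5, -3.0)):
    """Count candidates beyond min_distance_kpc and those below each [Fe/H] threshold."""
    feh = np.asarray(feh, dtype=float)
    d = np.asarray(distance_kpc, dtype=float)
    keep = d > min_distance_kpc  # distance cut applied before tallying
    counts = {"selected": int(keep.sum())}
    for t in thresholds:
        counts[t] = int(np.sum(feh[keep] < t))  # cumulative count below threshold
    return counts
```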

In Figure 19 panel 1, we see the members in stream coordinates and our members lie in the same region occupied by the six metal-poor members from Chandra et al. (2023a), but more elongated. These are slightly offset from the models which are closely tracing the gas. This could be because the gas is tracing stars that are tidally disrupted from the disc regions rather than the outskirts of the Clouds. These VMP stars must be some of the oldest stars associated with the Clouds that got kicked away into a stream-like structure from the outskirts of the SMC (also maybe the LMC, even though it is relatively metal-rich) from the LMC-SMC interaction (see Navarrete et al. 2019, who trace stream-like SMC stars accreted onto the LMC halo at L_MS >0). Panels 2 and 3 show the stream longitude versus proper motions in L_MS and B_MS, respectively. The VMP MSS candidates match well with the broad direction of the model’s proper motions, but have a larger range and dispersion than the Chandra et al. (2023a) members, especially in the B_MS direction. Panel 4 shows the heliocentric distances that are much closer than the Chandra et al. (2023a) members, reminiscent of the cloud-associated debris from Zaritsky et al. (2020). These stars have relatively nearby distances compared to the metal-rich members. The metal-poor members from Chandra et al. (2023a) have similar (relatively nearby) distances comparable to our MSS VMP candidates’ distances, which could mean that the metal-poor members are closer than the metal-rich members. On the other hand, we caution that our distances tend to be biased towards closer distances. Thus, we need more reliable spectrophotometric distances to confirm this. Panel 5 shows the same as panel 1, but with transverse velocity vectors, most of which point in the same direction as the Clouds themselves. However, the members at smaller L_MS have a bit more random motion than the members close to the Clouds. 
Therefore, these members are less likely to be members of the MSS. Panel 6 shows the line-of-sight velocities for three members in our selection that are in common with the PGS 6D giants subset. Two of these members are from the S5 (DR1) survey and one is from the LAMOST (DR7 MRS) survey. All of these members are VMP, and one of the S5 members has a spectroscopic metallicity of [Fe/H] = −2.55 ± 0.07, making it the most metal-poor member of the MSS discovered to date. All these 6D stars are remarkably consistent with the models in panel 6 with respect to the line-of-sight velocities that are expected to trace the gaseous MS.

In this work, we find 41 stars at the metal-poor end, in the outer halo that we tentatively associate with the Pisces Plume overdensity/the Magellanic stellar stream. Even with the kinematic parameters roughly aligning with the gaseous MS, its models and MSS members presented in Chandra et al. (2023a), there is a possibility that part of these stars could be associated with the dynamical wake due to the LMC’s infall. In the work of Chandra et al. (2023a), from their Pisces plume members, they find that at least 7 out of 45 stars (or ≥15%) in their Pisces Plume overdensity appear to be confidently identified as debris from the Clouds. In our work, from our member candidates of Pisces plume, to clearly understand the percentage contribution from the Clouds themselves and the halo response at the VMP end, we need full 6D phase space information for all these stars and detailed chemical abundances. We have an ongoing spectroscopic follow-up program with the Gemini GHOST instrument (McConnachie et al. 2024) for >10 stars in this region covering the full extent of the plume at the bright end as a pilot program. From a full chemodynamical analysis of the stars in this region through follow-up spectroscopy, we can uncover the origin story of these VMP stars around the Pisces Plume, assess the contribution from the Clouds’ tidal debris versus the field halo’s response to the LMC’s infall, while also understanding the true origin of the metal-poor stars in the MSS.

Table 4

Description of the columns of the PDR1 and PGS catalogues of giants made publicly available in this work.

5 Conclusions and outlook

In this work, we have constructed two large RGB catalogues using the publicly available Pristine data: PDR1 (based on CFHT CaHK photometry) and PGS (based on Gaia XP spectra). These catalogues contain reliable photometric distances and metallicities, probing down to [Fe/H] = −3.5 and out to distances of ~100 kpc. Below, we state the technical and scientific highlights of the paper.

Technical highlights

The RGB selection pipeline combines (i) a CaMD cut using good enough parallaxes (uncertainty ≤ 50%) to separate giants from dwarfs (see Section 3.1, Figures 1, 2), (ii) a magnitude cut (G < 17.6) to retain bright giants with poor parallaxes while removing dwarfs (Section 3.2, Figure A.1), and (iii) a colour-metallicity cut using MIST isochrones to remove subgiants (Section 3.3, Figure 3; pipeline summarised in Figure 4);
Photometric distances are derived via isochrone interpolation using each star’s [Fe/H], T_eff, and log g. Uncertainties (~12%) are estimated from offsets in a good-parallax control sample (Section 3.4, Figures 6, 7). Distances are validated using inverted parallaxes and StarHorse estimates (maximum scatter < 20-40%). PDR1 metallicities are more precise (uncertainty ~0.1 dex) than PGS (~0.4 dex at G ~ 16) due to a higher S/N in the narrowband CaHK photometry;
Caveats include a slight underestimation of distances for red clump and metal-poor HB stars, as these are not explicitly modelled in the isochrone fitting (Section 3.5.1, Figure 8). PGS also shows a distance-metallicity selection bias due to stricter CaHK uncertainty thresholds, preferentially selecting metal-poor stars at larger distances (Section 3.5.2, Figure 9);
After all the quality cuts (including removal of subgiants with log g >3.5), we obtained 180 314 RGB stars in PDR1 and 2 420 898 in PGS, with reliable metallicities and distances (Figure 7);
The final purity and completeness based on the surface gravities from A23 are 78% and 76% for PDR1 and 92% and 82% for PGS (Figure 10).
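
For reference, purity and completeness as quoted in the last item can be computed from a selection mask and a reference classification. The sketch below uses the standard definitions (purity as the fraction of selected stars that are true giants, completeness as the fraction of true giants that are selected); treating stars with reference log g ≤ 3.5 as true giants is our assumption for illustration, and the function name is hypothetical.

```python
import numpy as np

def purity_completeness(selected, is_true_giant):
    """Return (purity, completeness) for a boolean selection against a reference label."""
    selected = np.asarray(selected, dtype=bool)
    truth = np.asarray(is_true_giant, dtype=bool)
    true_pos = np.sum(selected & truth)       # selected stars that are real giants
    purity = true_pos / np.sum(selected)      # fraction of the selection that is correct
    completeness = true_pos / np.sum(truth)   # fraction of real giants recovered
    return float(purity), float(completeness)

# Example labels from reference surface gravities (assumed threshold log g <= 3.5).
logg_ref = np.array([2.1, 3.0, 4.4, 1.5])
is_giant = logg_ref <= 3.5
```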

Scientific highlights

The [Fe/H]-distance plane (Figure 11) shows distinct halo substructures, including the GES merger ([Fe/H] ~ −1.4) out to ~50 kpc. PDR1 gives a bias-corrected view of the outer halo; PGS provides an all-sky map of VMP halo stars. Relative contributions of different metallicities as a function of height above the plane reveal overdensities such as Sagittarius, HAC, and VOD and a steep increase in the VMP halo beyond 40 kpc (Figure 12);
Using bias-corrected metallicity distributions based on PARSEC simulated stellar populations, we modelled MDFs in six galactocentric distance bins. A three-component GMM fit shows that with increasing distance, the halo becomes more metal-poor and the dispersion increases (Figures 14 and 15);
We constructed a 6D sample (with radial velocities) of 111 305 PDR1-giants and 1 706 006 PGS-giants, probing to ~70 and ~50 kpc, respectively (Figure 17);
Sagittarius stream members were identified using both 5D and 6D selections. In the absence of radial velocities, metal-rich ([Fe/H] > −1) stars from PDR1 and a latitude cut (|B| < 20°) on PGS stars work well to select the Sagittarius stream. With 6D, the cleanest selection is via a L_y < −0.25×10⁴ kpc km/s cut (Figures 16, D.1, D.2);
Mapping the IOM space with 6D PDR1-giants and slicing in metallicity revealed known substructures, including Sagittarius, GES, Helmi streams, Sequoia, LMS-1, Cetus, and more. A new retrograde, high-energy, VMP outer halo component was also detected (Figure 17);
In the outer halo (d > 40 kpc), we identified Sagittarius, outer-VOD (possibly GES debris), outer-HAC, and a prominent overdensity along the Pisces Plume, possibly tied to the Magellanic Clouds (Figure 18);
We identified 41 Magellanic Stream candidates among PGS VMP stars, three with 6D data that trace gaseous MS velocities. One has [Fe/H] = −2.55±0.07, the most metal-poor MSS candidate to date. These require follow-up chemodynamical confirmation (Figure 19);
Both RGB catalogues are made publicly available in the format shown in Table 4.
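
The 6D Sagittarius selection listed above amounts to a single cut on the y-component of the angular momentum. A minimal sketch is given below, assuming right-handed galactocentric Cartesian positions in kpc and velocities in km/s; the exact frame and sign conventions are those of the main text, and the function names are hypothetical.

```python
import numpy as np

def angular_momentum_y(x, z, vx, vz):
    """L_y = z*vx - x*vz, the y-component of r x v, in kpc km/s."""
    return z * vx - x * vz

def sagittarius_6d_mask(x, z, vx, vz, ly_max=-0.25e4):
    """True for stars with L_y below the threshold quoted in the text."""
    return angular_momentum_y(x, z, vx, vz) < ly_max
```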

The PDR1 catalogue enables bias-corrected studies of the outer halo’s MDF and accretion history, while the all-sky PGS catalogue maps the VMP halo. These RGB samples can probe the metallicity gradient, identify ancient dwarf galaxy debris, and trace faint halo substructures. With future chemodynamical studies from WEAVE (Jin et al. 2024), 4MOST (de Jong et al. 2019), DESI (Cooper et al. 2023; Koposov et al. 2024), and our own Gemini/GHOST follow-up, we will unveil the complex assembly of the Galaxy’s outermost regions, which hold the most exciting discovery space and have not yet been probed by most Galactic studies.

Data availability

PDR1/PGS-catalogues of giants are available at the CDS via https://cdsarc.cds.unistra.fr/viz-bin/cat/J/A+A/706/A195

Acknowledgements

AV thanks Ewoud Wempe and Vedant Chandra for their helpful discussion on this work. AV also thanks Tomas Ruiz-Lara and Eduardo Balbinot for making the Gaia DR3 catalogues and relevant survey cross-matches available to the ‘Galactica’ group members immediately after the Gaia Data Release 3. AV and CN thank Gurtina Besla and Himansh Rathore for sharing their simulations of tidal Magellanic stream debris from Besla et al. (2012, 2013). AV gratefully acknowledges support from the Canadian Institute for Theoretical Astrophysics (CITA) through a CITA National Fellowship and the International Astronomical Union (IAU) and the Gruber Foundation through an IAU Gruber Fellowship. AB acknowledges the European Union’s Erasmus+ Traineeship program and the Edinburgh Doctoral College Scholarship (ECDS) that funded her contribution to this work. ES and MM acknowledge funding through VIDI grant “Pushing Galactic Archaeology to its limits” (with project number VI.Vidi.193.093), which is funded by the Dutch Research Council (NWO). TM was supported by a Gliese Fellowship at the Zentrum für Astronomie, University of Heidelberg, Germany. NFM gratefully acknowledges support from the French National Research Agency (ANR) funded project “Pristine” (ANR-18-CE31-0017) along with funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 834148). FS and KAV thank the National Sciences and Engineering Research Council of Canada for funding through the Discovery Grants and CREATE programs. AAA acknowledges support from the Herchel Smith Fellowship at the University of Cambridge and a Fitzwilliam College research fellowship supported by the Isaac Newton Trust. GB acknowledges support from the Agencia Estatal de Investigación del Ministerio de Ciencia en Innovación (AEI-MICIN) and the European Social Fund (ESF+) under grant PRE2021-100638. 
GB, and GF acknowledges support from the Agencia Estatal de Investigación del Ministerio de Ciencia en Innovación (AEI-MICIN) and the European Regional Development Fund (ERDF) under grant number AYA2017-89076-P, the AEI under grant number CEX2019-000920-S and the AEI-MICIN under grant number PID2020-118778GBI00/10.13039/501100011033. Based on observations obtained with MegaPrime/MegaCam, a joint project of CFHT and CEA/DAPNIA, at the Canada-France-Hawaii Telescope (CFHT) which is operated by the National Research Council (NRC) of Canada, the Institut National des Sciences de l’Univers of the Centre National de la Recherche Scientifique of France, and the University of Hawaii. This work has made use of data from the European Space Agency (ESA) mission Gaia (https://www.cosmos.esa.int/gaia), processed by the Gaia Data Processing and Analysis Consortium (DPAC, https://www.cosmos.esa.int/web/gaia/dpac/consortium). Funding for the DPAC has been provided by national institutions, in particular the institutions participating in the Gaia Multilateral Agreement. This research was supported by the International Space Science Institute (ISSI) in Bern, through ISSI International Team project 540 (The Early Milky Way). This research has been partially funded from a Spinoza award by NWO (SPI 78-411). AV also thanks the availability of the following packages and tools that made this work possible: vaex (Breddels & Veljanoski 2018), pandas (Reback et al. 2022), astropy (Astropy Collaboration 2022), NumPy (Oliphant 2006; Van Der Walt et al. 2011), SciPy (Jones et al. 2001), matplotlib (Hunter 2007), seaborn (Waskom et al. 2016), agama (Vasiliev 2019), gala (Price-Whelan 2017), galpy (Kluyver et al. 2016), healpy (Zonca et al. 2019), gaiadr3-zeropoint (Lindegren et al. 2021), jax (Schoenholz & Cubuk 2019), scikit-learn (Pedregosa et al. 2011), JupyterLab (Kluyver et al. 2016), and topcat (Taylor 2017).

References

Abadi, M. G., Navarro, J. F., & Steinmetz, M. 2006, MNRAS, 365, 747
Aguado, D. S., Youakim, K., González Hernández, J. I., et al. 2019, MNRAS, 490, 2241
Amarante, J. A. S., Debattista, V. P., Beraldo e Silva, L., Laporte, C. F. P., & Deg, N. 2022, ApJ, 937, 12
An, D., Beers, T. C., Johnson, J. A., et al. 2013, ApJ, 763, 65
Andrae, R., Rix, H.-W., & Chandra, V. 2023, ApJS, 267, 8
Ardern-Arentsen, A., Monari, G., Queiroz, A. B. A., et al. 2024, MNRAS, 530, 3391
Astropy Collaboration (Price-Whelan, A. M., et al.) 2022, ApJ, 935, 167
Bailer-Jones, C. A. L., Rybizki, J., Fouesneau, M., Mantelet, G., & Andrae, R. 2018, AJ, 156, 58
Bailer-Jones, C. A. L., Rybizki, J., Fouesneau, M., Demleitner, M., & Andrae, R. 2021, AJ, 161, 147
Balbinot, E., & Helmi, A. 2021, A&A, 654, A15
Battaglia, G., North, P., Jablonka, P., et al. 2017, A&A, 608, A145
Beers, T. C., & Christlieb, N. 2005, ARA&A, 43, 531
Beers, T. C., Preston, G. W., & Shectman, S. A. 1985, AJ, 90, 2089
Beers, T. C., Preston, G. W., & Shectman, S. A. 1992, AJ, 103, 1987
Belokurov, V., & Kravtsov, A. 2022, MNRAS, 514, 689
Belokurov, V., & Kravtsov, A. 2023, MNRAS, 525, 4456
Belokurov, V., Evans, N. W., Bell, E. F., et al. 2007, ApJ, 657, L89
Belokurov, V., Deason, A. J., Erkal, D., et al. 2019, MNRAS, 488, L47
Belokurov, V., Erkal, D., Evans, N. W., Koposov, S. E., & Deason, A. J. 2018, MNRAS, 478, 611
Belokurov, V., Sanders, J. L., Fattahi, A., et al. 2020, MNRAS, 494, 3880
Besla, G., Kallivayalil, N., Hernquist, L., et al. 2012, MNRAS, 421, 2109
Besla, G., Hernquist, L., & Loeb, A. 2013, MNRAS, 428, 2342
Bonaca, A., Conroy, C., Wetzel, A., Hopkins, P. F., & Keres, D. 2017, ApJ, 845, 101
Bonaca, A., Conroy, C., Cargile, P. A., et al. 2020, ApJ, 897, L18
Bond, H. E. 1970, ApJS, 22, 117
Bond, H. E. 1981, ApJ, 248, 606
Bonifacio, P., Caffau, E., Sestito, F., et al. 2019, MNRAS, 487, 3797
Bonifacio, P., Monaco, L., Salvadori, S., et al. 2021, A&A, 651, A79
Breddels, M. A., & Veljanoski, J. 2018, A&A, 618, A13
Bressan, A., Marigo, P., Girardi, L., et al. 2012, MNRAS, 427, 127
Buder, S., Sharma, S., Kos, J., et al. 2021, MNRAS, 506, 150
Bullock, J. S., & Johnston, K. V. 2005, ApJ, 635, 931
Caffau, E., Bonifacio, P., Starkenburg, E., et al. 2017, Astron. Nachr., 338, 686
Caffau, E., Bonifacio, P., Sbordone, L., et al. 2020, MNRAS, 493, 4677
Caffau, E., Lombardo, L., Mashonkina, L., et al. 2023, MNRAS, 518, 3796
Carollo, D., Beers, T. C., Lee, Y. S., et al. 2007, Nature, 450, 1020
Carollo, D., Beers, T. C., Chiba, M., et al. 2010, ApJ, 712, 692
Chandra, V., Naidu, R. P., Conroy, C., et al. 2023a, ApJ, 956, 110
Chandra, V., Naidu, R. P., Conroy, C., et al. 2023b, ApJ, 951, 26
Chen, T., & Guestrin, C. 2016, XGBoost: A Scalable Tree Boosting System (New York: Association for Computing Machinery)
Chiba, M., & Beers, T. C. 2000, AJ, 119, 2843
Choi, J., Dotter, A., Conroy, C., et al. 2016, ApJ, 823, 102
Christlieb, N., Bessell, M. S., Beers, T. C., et al. 2002, Nature, 419, 904
Cohen, J. G., Sesar, B., Bahnolzer, S., et al. 2017, ApJ, 849, 150
Conroy, C., Bonaca, A., Cargile, P., et al. 2019a, ApJ, 883, 107
Conroy, C., Naidu, R. P., Zaritsky, D., et al. 2019b, ApJ, 887, 237
Conroy, C., Naidu, R. P., Garavito-Camargo, N., et al. 2021, Nature, 592, 534
Cooper, A. P., Parry, O. H., Lowing, B., Cole, S., & Frenk, C. 2015, MNRAS, 454, 3185
Cooper, A. P., Koposov, S. E., Allende Prieto, C., et al. 2023, ApJ, 947, 37
Cunningham, E. C., Hunt, J. A. S., Price-Whelan, A. M., et al. 2024, ApJ, 963, 95
Das, P., & Binney, J. 2016, MNRAS, 460, 1725
De Angeli, F., Weiler, M., Montegriffo, P., et al. 2023, A&A, 674, A2
de Jong, R. S., Agertz, O., Berbel, A. A., et al. 2019, The Messenger, 175, 3
Deason, A. J., Belokurov, V., & Evans, N. W. 2011, MNRAS, 416, 2903
Deason, A. J., Belokurov, V., & Weisz, D. R. 2015, MNRAS, 448, L77
Deason, A. J., Belokurov, V., & Koposov, S. E. 2018, ApJ, 852, 118
Deason, A. J., Koposov, S. E., Fattahi, A., & Grand, R. J. J. 2023, MNRAS, 520, 6091
Dietz, S. E., Yoon, J., Beers, T. C., & Placco, V. M. 2020, ApJ, 894, 34
Dillamore, A. M., Belokurov, V., Evans, N. W., & Davies, E. Y. 2023, MNRAS, 524, 3596
Dillamore, A. M., Belokurov, V., & Evans, N. W. 2024, MNRAS, 532, 4389
Dodd, E., Callingham, T. M., Helmi, A., et al. 2023, A&A, 670, L2
Dotter, A. 2016, ApJS, 222, 8
Eggen, O. J., Lynden-Bell, D., & Sandage, A. R. 1962, ApJ, 136, 748
El-Badry, K., Bland-Hawthorn, J., Wetzel, A., et al. 2018, MNRAS, 480, 652
Frebel, A., & Norris, J. E. 2015, ARA&A, 53, 631
Frebel, A., Norris, J. E., Gilmore, G., & Wyse, R. F. G. 2016, ApJ, 826, 110
Gaia Collaboration (Babusiaux, C., et al.) 2018, A&A, 616, A10
Gaia Collaboration (Montegriffo, P., et al.) 2023a, A&A, 674, A33
Gaia Collaboration (Vallenari, A., et al.) 2023b, A&A, 674, A1
Gallart, C., Bernard, E. J., Brook, C. B., et al. 2019, Nat. Astron., 3, 932
Garavito-Camargo, N., Besla, G., Laporte, C. F. P., et al. 2019, ApJ, 884, 51
Garavito-Camargo, N., Besla, G., Laporte, C. F. P., et al. 2021, ApJ, 919, 109
Gilmore, G., Norris, J. E., Monaco, L., et al. 2013, ApJ, 763, 61
GRAVITY Collaboration (Abuter, R., et al.) 2018, A&A, 615, L15
Hansen, T. T., Holmbeck, E. M., Beers, T. C., et al. 2018, ApJ, 858, 92 [Google Scholar]
Helmi, A., & White, S. D. M. 1999, MNRAS, 307, 495 [CrossRef] [Google Scholar]
Helmi, A., White, S. D. M., de Zeeuw, P. T., & Zhao, H. 1999, Nature, 402, 53 [Google Scholar]
Helmi, A., & de Zeeuw, P. T. 2000, MNRAS, 319, 657 [Google Scholar]
Helmi, A., Babusiaux, C., Koppelman, H. H., et al. 2018, Nature, 563, 85 [Google Scholar]
Hernitschek, N., Sesar, B., Rix, H.-W., et al. 2017, ApJ, 850, 96 [NASA ADS] [CrossRef] [Google Scholar]
Hernitschek, N., Cohen, J. G., Rix, H.-W., et al. 2018, ApJ, 859, 31 [NASA ADS] [CrossRef] [Google Scholar]
Hidalgo, S. L., Pietrinferni, A., Cassisi, S., et al. 2018, ApJ, 856, 125 [Google Scholar]
Horta, D., Schiavon, R. P., Mackereth, J. T., et al. 2023, MNRAS, 520, 5671 [NASA ADS] [CrossRef] [Google Scholar]
Huang, Y., Beers, T. C., Wolf, C., et al. 2022, ApJ, 925, 164 [NASA ADS] [CrossRef] [Google Scholar]
Hunter, J. D. 2007, Comput. Sci. Eng., 9, 90 [NASA ADS] [CrossRef] [Google Scholar]
Ibata, R., Bellazzini, M., Thomas, G., et al. 2020, ApJ, 891, L19 [NASA ADS] [CrossRef] [Google Scholar]
Ibata, R., Malhan, K., Tenachi, W., et al. 2024, ApJ, 967, 89 [NASA ADS] [CrossRef] [Google Scholar]
Iorio, G., & Belokurov, V. 2019, MNRAS, 482, 3868 [NASA ADS] [CrossRef] [Google Scholar]
Ivezić, Ž., Sesar, B., Juric, M., et al. 2008, ApJ, 684, 287 [NASA ADS] [CrossRef] [Google Scholar]
Jin, S., Trager, S. C., Dalton, G. B., et al. 2024, MNRAS, 530, 2688 [NASA ADS] [CrossRef] [Google Scholar]
Jones, E., Oliphant, T., Peterson, P., et al. 2001, SciPy: Open source scientific tools for Python [Google Scholar]
Kallivayalil, N., van der Marel, R. P., & Alcock, C. 2006a, ApJ, 652, 1213 [NASA ADS] [CrossRef] [Google Scholar]
Kallivayalil, N., van der Marel, R. P., Alcock, C., et al. 2006b, ApJ, 638, 772 [NASA ADS] [CrossRef] [Google Scholar]
Khoperskov, S., Minchev, I., Libeskind, N., et al. 2023a, A&A, 677, A89 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Khoperskov, S., Minchev, I., Steinmetz, M., et al. 2023b, arXiv e-prints [arXiv:2310.05287] [Google Scholar]
Kielty, C. L., Venn, K. A., Sestito, F., et al. 2021, MNRAS, 506, 1438 [CrossRef] [Google Scholar]
Kluyver, T., Ragan-Kelley, B., Pérez, F., et al. 2016, Jupyter Notebooks - A Publishing Format for Reproducible Computational Workflows (IOS Press) [Google Scholar]
Koposov, S. E., Allende Prieto, C., Cooper, A. P., et al. 2024, MNRAS, 533, 1012 [Google Scholar]
Koppelman, H., Helmi, A., & Veljanoski, J. 2018, ApJ, 860, L11 [NASA ADS] [CrossRef] [Google Scholar]
Koppelman, H. H., Helmi, A., Massari, D., Price-Whelan, A. M., & Starkenburg, T. K. 2019, A&A, 631, L9 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Kroupa, P. 2001, MNRAS, 322, 231 [NASA ADS] [CrossRef] [Google Scholar]
Kroupa, P. 2002, Science, 295, 82 [Google Scholar]
Lane, J. M. M., Bovy, J., & Mackereth, J. T. 2022, MNRAS, 510, 5119 [NASA ADS] [CrossRef] [Google Scholar]
Lardo, C., Mashonkina, L., Jablonka, P., et al. 2021, MNRAS, 508, 3068 [NASA ADS] [CrossRef] [Google Scholar]
Leaman, R. 2012, AJ, 144, 183 [NASA ADS] [CrossRef] [Google Scholar]
Lee, A., Lee, Y. S., Kim, Y. K., Beers, T. C., & An, D. 2023, ApJ, 945, 56 [NASA ADS] [CrossRef] [Google Scholar]
Li, H., Tan, K., & Zhao, G. 2018, ApJS, 238, 16 [CrossRef] [Google Scholar]
Li, T. S., Koposov, S. E., Zucker, D. B., et al. 2019, MNRAS, 490, 3508 [NASA ADS] [CrossRef] [Google Scholar]
Lindegren, L., Bastian, U., Biermann, M., et al. 2021, A&A, 649, A4 [EDP Sciences] [Google Scholar]
Liu, G., Huang, Y., Bird, S. A., et al. 2022, MNRAS, 517, 2787 [NASA ADS] [CrossRef] [Google Scholar]
Lombardo, L., Bonifacio, P., Caffau, E., et al. 2023, MNRAS, 522, 4815 [CrossRef] [Google Scholar]
Lövdal, S. S., Ruiz-Lara, T., Koppelman, H. H., et al. 2022, A&A, 665, A57 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Lucchesi, R., Lardo, C., Jablonka, P., et al. 2022, MNRAS, 511, 1004 [NASA ADS] [CrossRef] [Google Scholar]
Majewski, S. R., Schiavon, R. P., Frinchaboy, P. M., et al. 2017, AJ, 154, 94 [NASA ADS] [CrossRef] [Google Scholar]
Malhan, K. 2022, ApJ, 930, L9 [NASA ADS] [CrossRef] [Google Scholar]
Malhan, K., & Rix, H.-W. 2024, ApJ, 964, 104 [NASA ADS] [CrossRef] [Google Scholar]
Malhan, K., Ibata, R. A., & Martin, N. F. 2018, MNRAS, 481, 3442 [Google Scholar]
Malhan, K., Yuan, Z., Ibata, R. A., et al. 2021, ApJ, 920, 51 [NASA ADS] [CrossRef] [Google Scholar]
Malhan, K., Ibata, R. A., Sharma, S., et al. 2022, ApJ, 926, 107 [NASA ADS] [CrossRef] [Google Scholar]
Martin, S., Yuan, Z., Fouesneau, M., et al. 2024, A&A, 692, A115 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
McConnachie, A. W., Hayes, C. R., Robertson, J. G., et al. 2024, PASP, 136, 035001 [Google Scholar]
McMillan, P. J. 2017, MNRAS, 465, 76 [NASA ADS] [CrossRef] [Google Scholar]
Medina, G. E., Li, T. S., Koposov, S. E., et al. 2025, arXiv e-prints [arXiv:2504.02924] [Google Scholar]
Myeong, G. C., Vasiliev, E., Iorio, G., Evans, N. W., & Belokurov, V. 2019, MNRAS, 488, 1235 [Google Scholar]
Naidu, R. P., Conroy, C., Bonaca, A., et al. 2020, ApJ, 901, 48 [Google Scholar]
Navarrete, C., Belokurov, V., Catelan, M., et al. 2019, MNRAS, 483, 4160 [Google Scholar]
Newberg, H. J., Yanny, B., Cole, N., et al. 2007, ApJ, 668, 221 [Google Scholar]
Nidever, D. L., Majewski, S. R., & Butler Burton, W. 2008, ApJ, 679, 432 [Google Scholar]
Nissen, P. E., & Schuster, W. J. 2010, A&A, 511, L10 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Oliphant, T. E. 2006, A Guide to NumPy (Trelgol Publishing USA) [Google Scholar]
Pastorelli, G., Marigo, P., Girardi, L., et al. 2020, MNRAS, 498, 3283 [Google Scholar]
Pedregosa, F., Varoquaux, G., Gramfort, A., et al. 2011, J. Mach. Learn. Res., 12, 2825 [Google Scholar]
Perottoni, H. D., Limberg, G., Amarante, J. A. S., et al. 2022, ApJ, 936, L2 [NASA ADS] [CrossRef] [Google Scholar]
Pietrinferni, A., Cassisi, S., Salaris, M., & Castelli, F. 2004, ApJ, 612, 168 [Google Scholar]
Price-Whelan, A. M. 2017, J. Open Source Softw., 2, 388 [NASA ADS] [CrossRef] [Google Scholar]
Putman, M. E., Staveley-Smith, L., Freeman, K. C., Gibson, B. K., & Barnes, D. G. 2003, ApJ, 586, 170 [NASA ADS] [CrossRef] [Google Scholar]
Queiroz, A. B. A., Anders, F., Chiappini, C., et al. 2023, A&A, 673, A155 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Reback, J., Jbrockmendel, McKinney, W., et al. 2022, https://doi.org/10.5281/zenodo.5824773 [Google Scholar]
Riello, M., De Angeli, F., Evans, D. W., et al. 2021, A&A, 649, A3 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Rix, H.-W., Chandra, V., Andrae, R., et al. 2022, ApJ, 941, 45 [NASA ADS] [CrossRef] [Google Scholar]
Ruiz-Lara, T., Matsuno, T., Lövdal, S. S., et al. 2022, A&A, 665, A58 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Ryan, S. G., & Norris, J. E. 1991, AJ, 101, 1865 [NASA ADS] [CrossRef] [Google Scholar]
Schoenholz, S. S., & Cubuk, E. D. 2019, Adv. Neural Inform. Process. Syst., 33 [Google Scholar]
Schönrich, R., Binney, J., & Dehnen, W. 2010, MNRAS, 403, 1829 [NASA ADS] [CrossRef] [Google Scholar]
Searle, L., & Zinn, R. 1978, ApJ, 225, 357 [Google Scholar]
Sesar, B., Juric, M., & Ivezić, Z. 2011, ApJ, 731, 4 [CrossRef] [Google Scholar]
Sesar, B., Hernitschek, N., Dierickx, M. I. P., Fardal, M. A., & Rix, H.-W. 2017, ApJ, 844, L4 [Google Scholar]
Sestito, F., Ardern-Arentsen, A., Vitali, S., et al. 2024, A&A, 690, A333 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Sestito, F., Martin, N. F., Starkenburg, E., et al. 2020, MNRAS, 497, L7 [Google Scholar]
Smee, S. A., Gunn, J. E., Uomoto, A., et al. 2013, AJ, 146, 32 [Google Scholar]
Soubiran, C., Le Campion, J.-F., Brouillet, N., & Chemin, L. 2016, A&A, 591, A118 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Starkenburg, E., Helmi, A., Morrison, H. L., et al. 2009, ApJ, 698, 567 [Google Scholar]
Starkenburg, E., Martin, N., Youakim, K., et al. 2017a, MNRAS, 471, 2587 [NASA ADS] [CrossRef] [Google Scholar]
Starkenburg, E., Oman, K. A., Navarro, J. F., et al. 2017b, MNRAS, 465, 2212 [NASA ADS] [CrossRef] [Google Scholar]
Starkenburg, E., Aguado, D. S., Bonifacio, P., et al. 2018, MNRAS, 481, 3838 [NASA ADS] [CrossRef] [Google Scholar]
Starkenburg, E., Youakim, K., Martin, N., et al. 2019, MNRAS, 490, 5757 [Google Scholar]
Steinmetz, M., Zwitter, T., Siebert, A., et al. 2006, AJ, 132, 1645 [Google Scholar]
Suda, T., Katsuta, Y., Yamada, S., et al. 2008, PASJ, 60, 1159 [NASA ADS] [Google Scholar]
Taylor, M. 2017, arXiv e-prints [arXiv:1811.09480] [Google Scholar]
Thomas, G. F., & Battaglia, G. 2022, A&A, 660, A29 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Thomas, G. F., McConnachie, A. W., Ibata, R. A., et al. 2018, MNRAS, 481, 5223 [NASA ADS] [CrossRef] [Google Scholar]
Tumlinson, J. 2010, ApJ, 708, 1398 [CrossRef] [Google Scholar]
Van Der Walt, S., Colbert, S. C., & Varoquaux, G. 2011, Comput. Sci. Eng., 13, 22 [Google Scholar]
Vasiliev, E. 2019, MNRAS, 482, 1525 [Google Scholar]
Vasiliev, E., Belokurov, V., & Erkal, D. 2021, MNRAS, 501, 2279 [NASA ADS] [CrossRef] [Google Scholar]
Venn, K. A., Irwin, M., Shetrone, M. D., et al. 2004, AJ, 128, 1177 [NASA ADS] [CrossRef] [Google Scholar]
Venn, K. A., Kielty, C. L., Sestito, F., et al. 2020, MNRAS, 492, 3241 [NASA ADS] [CrossRef] [Google Scholar]
Viswanathan, A., Starkenburg, E., Koppelman, H. H., et al. 2023, MNRAS, 521, 2087 [NASA ADS] [CrossRef] [Google Scholar]
Viswanathan, A., Starkenburg, E., Matsuno, T., et al. 2024, A&A, 683, L11 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Viswanathan, A., Yuan, Z., Ardern-Arentsen, A., et al. 2025, A&A, 695, A112 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Wang, F., Zhang, H. W., Xue, X. X., et al. 2022, MNRAS, 513, 1958 [NASA ADS] [CrossRef] [Google Scholar]
Waskom, M., Botvinnik, O., Drewokane, et al. 2016, https://doi.org/10.5281/zenodo.45133 [Google Scholar]
Xue, X.-X., Rix, H.-W., Ma, Z., et al. 2015, ApJ, 809, 144 [Google Scholar]
Xylakis-Dornbusch, T., Christlieb, N., Hansen, T. T., et al. 2024, A&A, 687, A177 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Yanny, B., Rockosi, C., Newberg, H. J., et al. 2009, AJ, 137, 4377 [Google Scholar]
York, D. G., Adelman, J., Anderson, John E. J., et al. 2000, AJ, 120, 1579 [NASA ADS] [CrossRef] [Google Scholar]
Youakim, K., Starkenburg, E., Aguado, D. S., et al. 2017, MNRAS, 472, 2963 [NASA ADS] [CrossRef] [Google Scholar]
Youakim, K., Starkenburg, E., Martin, N. F., et al. 2020, MNRAS, 492, 4986 [CrossRef] [Google Scholar]
Yuan, Z., Myeong, G. C., Beers, T. C., et al. 2020, ApJ, 891, 39 [NASA ADS] [CrossRef] [Google Scholar]
Yuan, Z., Martin, N. F., Ibata, R. A., et al. 2022, MNRAS, 514, 1664 [CrossRef] [Google Scholar]
Zaritsky, D., Conroy, C., Naidu, R. P., et al. 2020, ApJ, 905, L3 [NASA ADS] [CrossRef] [Google Scholar]
Zhang, H., Ardern-Arentsen, A., & Belokurov, V. 2024, MNRAS, 533, 889 [NASA ADS] [CrossRef] [Google Scholar]
Zhao, G., Chen, Y.-Q., Shi, J.-R., et al. 2006, Chinese J. Astron. Astrophys., 6, 265 [NASA ADS] [CrossRef] [Google Scholar]
Zhu, H., Guo, R., Shen, J., et al. 2024, ApJ, 974, 167 [Google Scholar]
Zonca, A., Singer, L., Lenz, D., et al. 2019, J. Open Source Softw., 4, 1298 [Google Scholar]
Zuo, W., Du, C., Jing, Y., et al. 2017, ApJ, 841, 59 [NASA ADS] [CrossRef] [Google Scholar]

Throughout this work, when we refer to parallax, we mean the corrected parallax. The parallax (π) is corrected for its individual zero-point offset (π_offset) using the gaiadr3_zeropoint Python module, following the procedure outlined in Lindegren et al. (2021).

The “good enough” parallax is defined as π > 0 and f = Δπ/π ∈ (0, 1).
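The two definitions above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used in this work: the function names are hypothetical, the zero-point is assumed to have already been obtained (for example from the gaiadr3_zeropoint module) in the same units as the parallax, and the correction is assumed to be a simple subtraction of that offset from the catalogue parallax.

```python
def corrected_parallax(parallax, zero_point):
    """Return the zero-point-corrected parallax.

    Assumption: the offset is subtracted from the catalogue value, so a
    negative zero-point makes the corrected parallax larger.
    """
    return parallax - zero_point


def is_good_enough(parallax, parallax_error, zero_point=0.0):
    """Check pi > 0 and f = (parallax error) / pi in the open interval (0, 1).

    The test is applied to the corrected parallax, since "parallax" always
    means the corrected value here.
    """
    plx = corrected_parallax(parallax, zero_point)
    if plx <= 0:
        return False
    f = parallax_error / plx
    return 0 < f < 1


# Hypothetical values in mas, for illustration only.
print(is_good_enough(1.0, 0.2, zero_point=-0.02))  # small fractional error
print(is_good_enough(0.1, 0.2))                    # error larger than parallax
```

The stricter "good parallax" thresholds used elsewhere (f ≤ 0.05, 0.1, or 0.2) would replace the upper bound of 1 with the chosen limit.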

This range in metallicity reflects the metallicity distribution of the sample. Note that the Pristine survey metallicities go down to −4 dex, but for halo-like ages around 11 Gyr the subgiant branch turn-off starts to behave in unexpected ways for metallicities below −2.5 dex (see Dotter 2016), and we do not use isochrones below these metallicities for the interpolation.

⁴

Good parallax in this subsection refers to stars with π > 0 and f = Δπ/π ≤ 0.05, 0.1, or 0.2, depending on what is specified.

⁵

We opt to not use |Z| < 2 kpc as in the previous section to make sure we only measure the halo MDF. Since the MDFs themselves do not have any distance information in them, disentangling thick disc contributions to the halo MDF would be difficult. Therefore, the stricter |Z|<3 kpc cut is preferred.

Appendix A Gaia G for dwarfs with good parallax

Fig. A.1

Gaia G magnitude distribution of dwarfs with good parallax (f ≤ 0.2) that are removed using the CaMD cuts shown in Figure 2, before the magnitude cut is applied. Our magnitude cut falls just before the distribution (in orange) drops off owing to incompleteness at magnitudes where dwarfs no longer have good parallaxes (f ≤ 0.2). The distribution of the selected stars used for the CaMD cut is shown in blue.

In Figure A.1, we show the Gaia G apparent magnitude distribution for dwarfs that are discarded using the CaMD cut described in Section 3.1 and Figure 2. At a magnitude of about 17.3, the number of dwarfs with ‘good enough’ parallax drops steeply, indicating that dwarfs with good parallax (f ≤ 0.2) are not complete beyond this limit. In other words, not all dwarfs fainter than this magnitude have parallaxes good enough to let us remove dwarfs and keep giants based on their ‘bad parallax’. We empirically set the magnitude limit for selecting giants based on their bad parallax at G<17.6. This is very close to the G<17.3 cut suggested by Figure A.1, but differs slightly because G = 17.6 is also approximately the limiting magnitude of the Gaia XP spectra and, in turn, of the publicly available PDR1 catalogue, which allows us to select giants in these publicly available catalogues.

Appendix B Comparison of inferred distances with Bailer-Jones et al. (2021) photogeometric distances

Fig. B.1

Comparison of our photometric distances with Gaia DR3 Bailer-Jones et al. (2021) photogeometric distances for PDR1 (left) and PGS (right) giants catalogue constructed in this work.

In Figure B.1, we compare our inferred photometric distances with the good-quality (<5% uncertainty) photogeometric distances of Bailer-Jones et al. (2021). The photogeometric distances use the G magnitude and BP-RP colour from Gaia DR3. These distance estimates incorporate direction-dependent priors based on a detailed model of the Galaxy’s 3D structure, taking into account the distribution, colours, and magnitudes of stars as observed by Gaia. This model also factors in interstellar extinction and the Gaia selection function. Tests with mock data, alongside validation against independent measurements and open clusters, indicate that these estimates remain reliable up to several kpc. However, for faint or distant stars, the prior often plays a significant role in the estimate. This is one of the main reasons we scale our inferred distances to agree with the inverted-parallax measurements, which are fully observational and make no assumptions about the Galaxy. This makes them more reliable for VMP stars, because priors on the Galactic stellar distribution do not always apply in the same way to the most metal-poor stars. From Figure B.1, we can see that both the PDR1 and PGS-giants are in good agreement with the photogeometric distances. The PDR1 distances show a much lower spread (<20%) than the PGS-giants (≤40%), because the photometric metallicities are of much higher quality for the PDR1-giants than for the PGS-giants (see MS23).
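As an illustration of how such a comparison can be quantified, the sketch below computes fractional distance residuals and a robust spread. It is only a hedged example: the function names are hypothetical, and the use of a scaled median absolute deviation as the measure of spread is an assumption, since the exact statistic behind the quoted percentages is not specified here.

```python
from statistics import median


def fractional_residuals(d_phot, d_ref):
    """Fractional difference (d_phot - d_ref) / d_ref for matched stars."""
    return [(p - r) / r for p, r in zip(d_phot, d_ref)]


def robust_spread(residuals):
    """Scaled median absolute deviation (assumed spread estimator).

    The factor 1.4826 makes the MAD comparable to a Gaussian sigma.
    """
    med = median(residuals)
    abs_dev = [abs(x - med) for x in residuals]
    return 1.4826 * median(abs_dev)


# Hypothetical distances in kpc, for illustration only.
d_photometric = [10.0, 11.0, 9.0]
d_photogeometric = [10.0, 10.0, 10.0]
res = fractional_residuals(d_photometric, d_photogeometric)
print(median(res), robust_spread(res))
```

A robust estimator is chosen here because a handful of catastrophic outliers would otherwise dominate a plain standard deviation.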

Appendix C Metallicity structure with spatial distributions

Fig. C.1

Spatial distribution of galactocentric cartesian Z versus X (top), Z versus Y (middle), and Y versus X (bottom), colour coded by mean metallicity for PDR1 (left) and PGS (right) giants.

In Figure C.1, we show the spatial distribution (in the X-Z, Y-Z, and X-Y planes) of our PDR1 (left) and PGS-giants (right), colour-coded by their mean metallicities. The imprint of the Pristine survey footprint is visible in the left panels with the PDR1-giants, whereas the right panels, based on the PGS-giants, cover the entire sky. In the right panels, the mean metallicity drops at larger distances, showing the power of the catalogue to look for metal-poor substructures in the outskirts of our Galaxy, despite the distance-metallicity selection effect. In the left panels, we see a less biased view of the mean metallicity across the Cartesian coordinates. We can also clearly see the large distances probed by both of these catalogues of giants.

Appendix D Sagittarius in 6D giants

Fig. D.1

Angular momentum in the y-direction versus z-direction (top) and energy versus angular momentum in the z-direction (bottom) for PDR1 6D giants. The Sagittarius stream is seen as a clear overdensity at negative y-direction angular momentum and is separated using a simple L_y cut. These Sagittarius stream stars are shown in orange in both panels. The same cut is used in the PGS 6D giants to isolate Sagittarius stream members, but the number of member stars is significantly lower.

In this appendix, we show how we select 6D Sagittarius stream members. Figure D.1 shows the angular momentum in the y-direction versus the z-direction in the top panel, and energy E versus the z-component of angular momentum in the bottom panel, for the PDR1-giants, which have more Sagittarius members than the PGS-giants. We select Sagittarius based on its hallmark negative L_y values (less than −0.5×10⁴ kpc km/s) and show the selected stars in orange in both panels. These orange stars fall nicely in the region of the Sagittarius stream in IOM space (also seen in Figure 17), with a few highly prograde and retrograde contaminants.
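The L_y cut can be sketched as follows. This is a minimal illustration under stated assumptions: positions (kpc) and velocities (km/s) are taken to be galactocentric Cartesian, the angular momentum is the plain cross product L = r × v with no sign flips, and the function names are hypothetical. Sign conventions differ between works (for example, L_z is often flipped so that prograde orbits are positive), so the threshold should be checked against the convention in use.

```python
# Threshold quoted in the text, in kpc km/s.
LY_MAX = -0.5e4


def angular_momentum(pos, vel):
    """Return (L_x, L_y, L_z) = r x v for one star.

    pos: (x, y, z) in kpc; vel: (vx, vy, vz) in km/s.
    """
    x, y, z = pos
    vx, vy, vz = vel
    return (y * vz - z * vy, z * vx - x * vz, x * vy - y * vx)


def is_sgr_candidate(pos, vel, ly_max=LY_MAX):
    """Flag a star whose L_y lies below the (negative) threshold."""
    return angular_momentum(pos, vel)[1] < ly_max


# Hypothetical stars, for illustration only.
distant_polar = ((20.0, 0.0, 0.0), (0.0, 0.0, 300.0))  # L_y = -6000
disc_like = ((8.0, 0.0, 0.0), (0.0, 220.0, 0.0))       # L_y = 0
print(is_sgr_candidate(*distant_polar), is_sgr_candidate(*disc_like))
```

A single cut in L_y keeps the selection simple; the remaining prograde and retrograde contaminants noted above would need additional criteria in energy or L_z to remove.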

In Figure D.2, we show the distance versus RA view of the 6D stream members in the PDR1 (top) and PGS (bottom) giants, with the Hernitschek et al. (2017) track for the Sagittarius stream overlaid in blue. With this 6D selection, the stream members are selected much more efficiently, with cleaner and clearer overdensities along the stream tracks (see Figure 16 for a comparison with the 5D selection). We end up with 374 and 409 confident 6D stream members in the PDR1 and PGS-giants samples, respectively.

Fig. D.2

PDR1 (top) and PGS (bottom) 6D Sagittarius stream members in the distance versus RA plane, with Sagittarius stream tracks adapted from Hernitschek et al. (2017). Leading and trailing arms are indicated with ‘L’ and ‘T’, respectively.

Appendix E IOM with distances

In Figure E.1, we show the IOM space sliced in bins of photometric metallicity and colour-coded by mean galactocentric distance for the PDR1 6D giants. From this figure, we can associate the different substructures and phase-mixed streams with the inner or outer halo. The Sagittarius stream clearly dominates the metal-rich end of the outer Galactic halo, and the Cetus stream dominates the outer Galactic halo at the metal-poor end. We can see the range of distances probed by the GES merger event (up to about 40 kpc). We also find the LMS-1/Wukong substructure at intermediate metallicities and intermediate distances, as well as other nearby halo substructures such as Thamnos and the Helmi streams. Retrograde stars of intermediate metallicity are seen at distances beyond the solar neighbourhood; however, the Sequoia merger is still mostly in the nearby halo. In the VMP bin, −2.5 < [Fe/H] < −2.0, we see distant stars on highly retrograde, higher-energy orbits (shown with a dashed circle). This substructure, if real, needs a more thorough chemo-dynamical investigation. In the most metal-poor bin, the stars are more centrally concentrated, reminiscent of a proto-Galaxy state of the Milky Way with the oldest stars in the inner Galaxy (Tumlinson 2010; Starkenburg et al. 2018; El-Badry et al. 2018; Belokurov & Kravtsov 2022; Rix et al. 2022), but we also find V/EMP stars ([Fe/H] < −2.5) out to 50 kpc. This central concentration can already be seen for [Fe/H] < −1.5, where the more strongly rotating stars are more centrally concentrated than at [Fe/H] > −1.5. The general trend when colour-coding by mean distance is that stars close to the inner Galaxy have lower energies, as they have sunk deep into the potential well of the Galaxy, while stars at large distances occupy higher-energy orbits, as expected.

Fig. E.1

Energy versus z-angular momentum at different slices in metallicities colour-coded by their galactocentric distances for PDR1 6D giants with some known outer halo substructures annotated.

All Tables

Table 1

Bias-correction due to colour cut in the RGB selection.

	Fig. 2 Colour-absolute magnitude diagram of the training sample, with absolute magnitudes computed using Gaia parallaxes, π, with the conditions that π > 0″ and that the fractional parallax uncertainty f = 0.5.

	Fig. 4 Flowchart of the steps involved in creating the two RGB star catalogues using the parent samples from the PDR1 and PGS catalogues. The numbers of stars removed at each selection step for the two input catalogues are shown in the red boxes, the methods and cuts used to select RGB stars at every step are shown in the orange boxes, and the final samples with counts are shown in the green box.

	Fig. 9 Metallicities versus heliocentric distances for PDR1 (left) and PGS (right) catalogues of giants presented in this work colour-coded by CaHK narrow-band magnitude uncertainties. The distance-metallicity selection effect caused by low S/N on the CaHK narrowband from Gaia XP spectra is highlighted by the blue filled area on the right.

	Fig. 10 Distribution of the A23 surface gravities before and after the catalogue pipeline (described in Figure 4) is applied to the PDR1 (top) and PGS (bottom) input catalogues. The vertical line at log g=3.5 shows the separation used to validate the selected sample of giants. In all the panels, the overdensity of stars at log g~2.5 from RC stars is clearly visible.

	Fig. 12 Metallicity structure of the halo as a function of height from the mid-plane (Z) for eight metallicity bins between −4 and 0 for the PDR1-giants catalogue. The fractional contribution of each metallicity bin to the population at a certain distance has been calculated. Stars below a scale height of 2 kpc have been cut away to avoid disc contamination.

	Fig. 15 Best-fit GMMs to the MDFs seen in Figure 14. The region [Fe/H] > -0.5 dex is greyed out for the most distant bin as we are not properly probing this region of the MDF, see Figure 13. All distances are given in kpc. The means, standard deviations and weights of the different peaks can be seen in Table 3.
