Open Access
Issue
A&A
Volume 704, December 2025
Article Number A296
Number of page(s) 23
Section Galactic structure, stellar clusters and populations
DOI https://doi.org/10.1051/0004-6361/202554934
Published online 23 December 2025

© The Authors 2025

Licence Creative Commons
Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article is published in open access under the Subscribe to Open model.

1 Introduction

Our Galaxy has undergone many mergers and accretion events over its lifetime, and these events are an integral part of the formation process that created the Milky Way we see today. Present-day observations of Milky Way stars can reveal information about their origins, as these stars retain chemical and kinematic signatures of their formation environments over long periods of cosmic time (e.g. Helmi et al. 1999; Freeman & Bland-Hawthorn 2002; Bland-Hawthorn et al. 2010). Therefore, with this information it is possible to, for example, determine whether stars were formed inside our Galaxy or if they were formed in another galaxy that was later accreted. In some cases, stars can even be traced back to their common birth clusters (e.g. Meza et al. 2005; Haywood et al. 2018; Myeong et al. 2018a; Naidu et al. 2020). Recent advances in the precision and scale of spectroscopic and astrometric measurements of these Milky Way stars have increasingly allowed for the disentanglement of this complex formation history. Based on this idea, several accreted structures were identified in the Galaxy by their spatial coherence and similarity of stellar properties, for example, the Sagittarius dwarf galaxy and stream (Ibata et al. 1994), the Helmi streams (Helmi et al. 1999; Koppelman et al. 2019b), and the Orphan stream (Grillmair 2006; Belokurov et al. 2007), amongst others.

Since the availability of the European Space Agency’s Gaia mission (Gaia Collaboration 2016) third data release (Gaia DR3; Gaia Collaboration 2021; Lindegren et al. 2021), which has provided astrometric measurements for over 1.8 billion stars, several more accreted halo structures have been identified. In particular, these are structures that are less visible due to occlusion from other Galactic components and because they are fainter and more phase-mixed within the stellar halo. Some examples include the Gaia-Sausage-Enceladus (GSE; Belokurov et al. 2018; Helmi et al. 2018; Haywood et al. 2018), Sequoia (Myeong et al. 2019), Thamnos (Koppelman et al. 2019a), and LMS-1/Wukong (Yuan et al. 2020; Naidu et al. 2020; Malhan et al. 2021). In addition to these accreted structures, a recent work by Belokurov et al. (2020) has described a peculiar population of metal-rich stars on highly eccentric orbits that they dubbed the splashed disc, or simply ‘the Splash’, and which the authors suggest is likely the result of a perturbation of Milky Way disc stars by an ancient major merger.

Gaia has also revolutionised the study of stellar streams, elongated populations of stars that disperse along the orbital trajectory of in-falling progenitors during their interaction with the gravitational potential of the Milky Way. Recent work using the STREAMFINDER algorithm (Malhan & Ibata 2018; Malhan et al. 2018) has resulted in a litany of new stellar stream discoveries, many of which have been followed up spectroscopically and compiled in a catalogue (Ibata et al. 2021, 2022, 2024). Similarly, the S5 survey targeted several stellar streams in the southern hemisphere and included the spectroscopic follow-up of these streams in its first data release, DR1 (Li et al. 2019). Using the latest available data, Mateu (2023) painstakingly compiled a list of as many stellar streams as possible and computed stream tracks to trace each stream’s proper motion, radial velocity, and sky position, and published this information in a comprehensive library called galstreams. Taken together, these efforts have provided a large homogeneous dataset with which to investigate associations between stellar streams and other structures in the Galaxy.

Accreted structures in the Milky Way halo have been associated with globular cluster (GC) and dwarf galaxy populations based on similarities of their orbital properties, suggesting that the associated clusters were accreted into the Milky Way halo in a common merger event (e.g. Belokurov et al. 2018; Helmi et al. 2018; Myeong et al. 2018b,a; Malhan et al. 2019). Individual stellar streams have also been associated with their progenitors based on similarities in their orbits and chemistry (e.g. Ibata et al. 2019; Malhan et al. 2019; Bonaca et al. 2021).

Furthermore, it has been shown that there are two distinct populations of GCs in our Galaxy: one of accreted clusters, located in the halo and with a distinctly lower metallicity distribution, and one that formed in situ, which is more centrally located in the Galaxy and has a characteristically more metal-rich metallicity distribution (e.g. Zinn 1993). These two populations can be distinguished as two sequences in the age-metallicity relation (e.g. Forbes & Bridges 2010; Leaman et al. 2013), and they show respective disc- and halo-like kinematics and spatial distributions (e.g. Dinescu et al. 1997). More recently with improved kinematics, it has been shown that this separation between in situ and accreted GCs can be made using a selection in the E–Lz space (Belokurov & Kravtsov 2023, 2024) and that groups of GCs can also be connected to specific accretion events (e.g. Massari et al. 2019; Forbes 2020; Malhan et al. 2022; Sun et al. 2023). It has even been suggested that the entire stellar halo may be made up of an accreted substructure (Naidu et al. 2020) and that a substantial portion of the phase-mixed halo stars may be attributable to disrupted GCs (Martell et al. 2016; Gnedin & Ostriker 1997).

Recently, much effort has been directed towards attempting to trace halo stars back to a common progenitor based solely on their chemistry, a technique known as chemical tagging (Freeman & Bland-Hawthorn 2002; Bland-Hawthorn et al. 2010). In order for chemical tagging to be viable, stars born in the same cluster should exhibit homogeneous abundances (Hawkins et al. 2020), and unique chemical signatures of stars from a given cluster must be distinguishable from other clusters and the background signature of the overwhelmingly more numerous halo and disc stars (Price-Jones & Bovy 2019; Cheng et al. 2021). In addition, abundances must be derived with adequate precision such that measurement uncertainties and systematic broadening do not erase the distinguishing abundance signature of the cluster. Indeed, this results in a picture where chemical tagging may be successful on certain progenitors with more distinct or unusual abundance patterns, whereas recovering clusters that have generic halo-like abundances is significantly more challenging, if not entirely unfeasible. This limitation was investigated by Casamiquela et al. (2021), who showed that by using high-precision differential abundances of stars in open clusters, they were able to identify over 40% of member stars for only about one third of open clusters in their sample despite having 16 high-precision elemental abundances as input for their clustering algorithm. They concluded that this was because the overlap in the chemical parameter space was too large for most clusters, unless they had a somewhat unusual abundance signature with respect to field stars.

It has also been shown that clusters and substructures can be associated with a common progenitor based on their kinematic signatures. For example, Myeong et al. (2018b) showed an association between the GSE and eight old halo GCs in energy and action space. In another study, Myeong et al. (2018a) also suggest a tentative association between five newly discovered retrograde halo substructures and the retrograde atypical globular cluster Omega Centauri (ω Cen). Such studies leverage the fact that stars belonging to structures with distinct orbital properties are relatively easy to identify compared to those with orbits closer to the bulk disc and halo populations. Therefore, it is no surprise that most kinematic substructures and associations that we know of consist of stars that are on retrograde orbits or have highly eccentric and energetic orbits. However, a recent work by Sestito et al. (2020) showed that by using [Fe/H] measurements, they were able to find a group of low-metallicity stars that are confined to prograde disc-like orbits and do not venture out of the disc plane. They propose that these may be an ancient accreted population that has interacted in a complex way with the Galactic disc and would have been indistinguishable from the disc if only orbital parameters were considered.

By combining both chemistry and kinematics, the dimension of the discovery space increases and therefore so do the chances of having a unique vector in the parameter space with which to differentiate stars belonging to substructures and field stars. This is the basis for chemo-kinematic tagging, a method in which kinematic and chemical quantities are re-parameterised such that they can be simultaneously used to identify stars clustered in both of these parameter spaces via dimensionality reduction techniques. This technique has recently been used to identify tidally stripped members of ω Cen, even at large distances from the cluster body (Youakim et al. 2023).

In a recent work, Malhan et al. (2022) performed a comprehensive analysis of stellar structure in the Milky Way halo using the latest kinematic data of known stellar streams and stream candidates from the STREAMFINDER survey (Ibata et al. 2021) and catalogues of GCs and satellite galaxies from the literature. They accomplished this by computing the action-angle coordinates (Jr, Jϕ, Jz) and energy for these objects using 6D kinematic information and subsequently searched for groups in the kinematic action plus energy phase-space using a clustering algorithm called ENLINK. They found six main kinematic groups in the halo, which they suggested were each the result of a previous Galactic merger. Five of these had been previously described in the literature, and one was proposed to be a new merger called Pontus, which they subsequently characterised in a follow-up study (Malhan 2022).

In this paper, we take this analysis further by implementing chemo-kinematic tagging as a means to connect halo substructures, adding the independent chemical parameter [Fe/H] as an input into our clustering algorithm. We also include several other conserved orbital parameters, including apocentre distance, pericentre distance, and eccentricity, which despite being correlated with the orthogonal action coordinates are useful in modifying the relative weighting of the input parameters and improving clustering results. We also include a larger number of stream candidates from the updated STREAMFINDER catalogue (Ibata et al. 2024) and the S5 survey’s first data release (Li et al. 2019), and we use the t-distributed stochastic neighbour embedding (t-SNE) dimensionality reduction algorithm. Several previous works have used t-SNE for Milky Way studies predominantly pertaining to tagging and classifying stellar populations (e.g. Traven et al. 2017; Kos et al. 2018; Anders et al. 2018; Traven et al. 2020; Hughes et al. 2022; Santiago et al. 2022; Youakim et al. 2023; Ortigoza-Urdaneta et al. 2023; Cantelli & Teixeira 2024). However, this work marks the first time it is used on such a broad dataset to associate substructures with specific accretion events.

In Section 2, we describe the dataset and data processing, in Section 3, we describe the dimensionality reduction method and input parameters, and in Section 4 we present the results, including the validation of clustered objects found in the clustering analysis. In Section 5 we discuss the identified associations, including large groups, subgroups, and small-scale stream-progenitor associations, and finally we provide our conclusions in Section 6.

Table 1

Summary of the compiled data used in this work.

2 Data

We assembled a catalogue of substructures from several different sources in the literature, which included the most up-to-date data on Galactic GCs, satellite galaxies, stellar streams and kinematic structures. We included all of the data on these objects which had available 6D phase-space and [Fe/H] information. In this section, we provide more details on the data that were used as well as the processing involved to homogenize the dataset and prepare it for input into the t-SNE algorithm.

2.1 Dataset

Table 1 summarises the total compiled dataset used in this work and includes the source of each individual input table taken from the literature. In total, we included 147 GCs, 30 satellite galaxies, 3 kinematic halo structures, and 49 distinct stellar streams, for a total of 229 objects (Table 1 shows 238 total objects since there were seven stellar streams in common between the S5 and STREAMFINDER samples, and LMS-1 and C-19 were both also included in the STREAMFINDER sample).

2.1.1 Halo substructures: GSE, Splash, and Thamnos

In order to identify the regions of the latent space where stars from various halo structures are expected to be located, we used a sample of stars selected in Kushniruk et al. (2024). Using the most recent data release of the GALAH survey (GALAH DR4; Buder et al. 2025), they selected GSE, Splash, and Thamnos stars using a wavelet analysis to identify over-dense regions in the commonly used √Jr–Lz parameter space and then refined the selection with a cut to select halo stars using the [Mg/Cu] versus [Na/Fe] abundance plane.

2.1.2 Globular clusters

As input for Milky Way GCs, we used the catalogue of Vasiliev (2019), who provide positions, proper motions (from Gaia DR2), radial velocities, and distances for over 150 GCs. From these data we computed the orbital parameters and integrals of motion used in the t-SNE analysis. We also supplemented the kinematics with [Fe/H] values of each GC from the Harris (1996) catalogue of GC parameters. The final sample consisted of 147 GCs with 6D + [Fe/H] data.

2.1.3 Stream sample

For the stream sample, we used the STREAMFINDER catalogue of spectroscopically followed-up stream candidates from Ibata et al. (2024), which includes ~25 000 candidates from 86 stellar streams (some of which are still unconfirmed). We selected all stars which had available radial velocities in the catalogue, which were a combination of STREAMFINDER spectroscopic follow-up and literature radial velocities from publicly available spectroscopic surveys. We assigned metallicities to each of the stream candidates based on stream metallicities published in the literature (see Table 2 of Malhan et al. (2022) for the compilation used as a reference), such that every star belonging to the same stellar stream was assigned the same metallicity.

We assigned distances to each stream candidate using the stellar stream tracks from the galstreams catalogue of Mateu (2023), which is a compilation of parameters made by fitting tracks to available data for almost 100 stellar streams in the literature. We computed the nearest neighbour from the stream track to each stream star from our sample and assigned it the corresponding distance value, resulting in a distance gradient amongst our observed stream sample that matched that expected for a given stellar stream. This effectively removed the highly uncertain distance parameter from the orbit determinations and put all of the weight on the proper motions and radial velocities, which are astrometrically and spectroscopically measured quantities, respectively. This approach is similar to what was used for the STREAMFINDER catalogue, which determined distances self-consistently during the orbit calculations (Ibata et al. 2021). Indeed, our approach yielded distance values that were very similar to those provided in the STREAMFINDER catalogue.
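The track-based distance assignment described above can be illustrated with a minimal sketch. It assumes that the nearest neighbour is found by angular separation on the sky; the function names and the toy track values are hypothetical and are not taken from galstreams, whose actual tracks and matching details may differ.

```python
import math

def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def assign_track_distances(stars, track):
    """Give each star the distance of its nearest track point.

    stars: list of (ra_deg, dec_deg)
    track: list of (ra_deg, dec_deg, distance_kpc)
    """
    distances = []
    for ra, dec in stars:
        nearest = min(track, key=lambda p: angular_sep_deg(ra, dec, p[0], p[1]))
        distances.append(nearest[2])
    return distances

# Toy stream track with a distance gradient (made-up values).
track = [(10.0, -5.0, 8.0), (12.0, -4.0, 9.0), (14.0, -3.0, 10.0)]
# Two toy stream stars near opposite ends of the track.
stars = [(10.2, -4.9), (13.8, -3.2)]

star_distances = assign_track_distances(stars, track)
```

In this sketch the stars inherit the distance gradient of the track, so the individual distance measurements play no role in the subsequent orbit computation.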

We also included stream candidates identified in the S5 survey (Li et al. 2019), using radial velocities and [Fe/H] measurements selected from the first data release. For this sample we also updated the distances using galstreams tracks as described above. Deriving distances this way resulted in a larger sample of stream stars which could be included in our analysis compared to using only the stars with distance estimations in Ibata et al. (2024) and Li et al. (2019), and had the added advantage that all distances for the whole stream sample were derived in a uniform way. Furthermore, distances provided in DR1 of the S5 Survey were photo-geometrically derived using the StarHorse Bayesian inference code (Queiroz et al. 2018; Anders et al. 2022), which relies heavily on Milky Way priors and is therefore not suitable to provide precise individual distances to stream stars.

We selected stream stars from the S5 DR1 catalogue with priority = 9, logg50 < 4.1, and feh50 < −1. This is the strictest selection for stream candidates: it is based on a tight selection in projected proper motion space in the reference frame of the stream, together with the removal of main sequence and metal-rich stars to reduce contamination from nearby disc stars, and it was shown to effectively select stream stars in velocity space in Li et al. (2019). We also applied the qso_flag_wise = 0 and good_star = 1 flags as advised by the authors of the catalogue to ensure that the objects were stars and that the photometry was well behaved. For our purposes, we were more interested in purity rather than completeness of the stream samples, and therefore a strict cut was preferred. We also implemented a cut on the standard deviation of the radial velocity to <5 km/s to reduce the uncertainty of the orbit determinations. Finally, we supplemented the sample with detailed spectroscopic follow-up samples from the literature for Wukong (Limberg et al. 2024, from the H3 survey), LMS-1 (Malhan et al. 2021), and C-19 (Martin et al. 2022; Yuan et al. 2022).
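The S5 DR1 cuts listed above can be written as a single row filter, as in the sketch below. The column names priority, logg50, feh50, qso_flag_wise and good_star are those quoted in the text; rv_std is a hypothetical name for the radial-velocity standard deviation column, and the catalogue rows are made up for illustration.

```python
def is_stream_candidate(row, max_rv_std=5.0):
    """Return True if a catalogue row passes all of the quoted cuts."""
    return (
        row["priority"] == 9            # strictest stream-candidate class
        and row["logg50"] < 4.1         # remove main sequence stars
        and row["feh50"] < -1.0         # remove metal-rich stars
        and row["qso_flag_wise"] == 0   # not flagged as a QSO
        and row["good_star"] == 1       # well-behaved stellar source
        and row["rv_std"] < max_rv_std  # km/s; "rv_std" is an assumed name
    )

# Made-up rows for illustration only (not real S5 measurements).
catalogue = [
    {"id": "a", "priority": 9, "logg50": 2.5, "feh50": -1.8,
     "qso_flag_wise": 0, "good_star": 1, "rv_std": 1.2},
    {"id": "b", "priority": 9, "logg50": 4.5, "feh50": -1.6,
     "qso_flag_wise": 0, "good_star": 1, "rv_std": 0.9},  # fails logg50
    {"id": "c", "priority": 7, "logg50": 2.1, "feh50": -2.0,
     "qso_flag_wise": 0, "good_star": 1, "rv_std": 1.5},  # fails priority
    {"id": "d", "priority": 9, "logg50": 2.0, "feh50": -2.1,
     "qso_flag_wise": 0, "good_star": 1, "rv_std": 8.0},  # fails rv_std
]

selected_ids = [row["id"] for row in catalogue if is_stream_candidate(row)]
```

Combining the cuts with a logical AND reflects the preference for purity over completeness: a star is kept only if it passes every criterion.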

The final sample consisted of 1805 stream candidate stars from 43 streams from STREAMFINDER, 603 stars from 12 streams from the S5 survey (five of which are unique streams not already included in the STREAMFINDER sample), and the additional Wukong, LMS-1, and C-19 samples. Therefore, in the full dataset we included data from a total of 49 distinct stellar streams.

2.1.4 Satellite galaxies

We used the comprehensive compilation of Local Group dwarf galaxies and their parameters from McConnachie & Venn (2020a,b). We made a cut on the sample to only include galaxies for which the measured apocentre was larger than the measured distance (rapo > dist), and the measured distance was less than 250 kpc from the Milky Way Galactic centre. This resulted in a sample of 30 satellite galaxies.

2.2 Data processing and computation of orbital parameters

We used galpy (Bovy 2015) to compute orbits and derive the orbital parameters of each object in our dataset. For the clustering analysis, we chose to use conserved quantities of the orbit, namely actions, energy, eccentricity, and Galactic apo- and peri-centre radii (Jr, Jϕ, Jz, E, e, rapo, rperi), as these quantities are largely conserved over Galactic timescales in a slowly evolving potential.

Although it is true that the entirety of the orbits are described by the three action-angle coordinates Jr, Jϕ, Jz, the orbital energy is also a commonly used parameter for selecting and identifying halo substructures in the literature in the E–Lz plane. Indeed, adding E does not add any additional independent physical information to the action-angle coordinates, but including it can help improve the clustering in the latent space by providing a beneficial weighting of the input parameters. This was previously shown to be the case in Malhan et al. (2022), where they used E in addition to Jr, Jϕ, Jz and found an improvement in the results of their clustering algorithm. We tested this empirically and indeed the inclusion of E, as well as rapo, rperi and eccentricity improved the coherence of structures in the latent space, and made selecting groups easier. In practice, including these extra, redundant parameters resulted in more detailed separation between groups, and tighter clusters of individual groups, particularly on small-scales. Therefore, rather than apply individual weights to the input parameters, which requires subjective fine-tuning and optimization, we chose instead to include these extra orbital parameters to provide a natural weighting.
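The weighting effect of redundant inputs can be seen in a toy example. t-SNE builds its neighbour probabilities from pairwise distances in the input space, so adding a parameter that is correlated with an existing one increases the contribution of that direction to the distance. The sketch below assumes a plain Euclidean metric and uses made-up standardised values; the redundant column is simply a copy of the first parameter, which is the extreme case of a correlated input.

```python
import math

def squared_terms(a, b):
    """Per-parameter squared differences between two objects."""
    return [(x - y) ** 2 for x, y in zip(a, b)]

# Two hypothetical objects described by two standardised parameters.
obj_a = (0.0, 0.0)
obj_b = (1.0, 1.0)

terms = squared_terms(obj_a, obj_b)
# Fraction of the squared distance carried by the first parameter.
share_base = terms[0] / sum(terms)

# Append a redundant copy of the first parameter as a third input.
obj_a_red = obj_a + (obj_a[0],)
obj_b_red = obj_b + (obj_b[0],)
terms_red = squared_terms(obj_a_red, obj_b_red)
# The first parameter now enters twice (columns 0 and 2).
share_red = (terms_red[0] + terms_red[2]) / sum(terms_red)

dist_base = math.sqrt(sum(terms))
dist_red = math.sqrt(sum(terms_red))
```

Here the share of the squared distance attributable to the first parameter rises from one half to two thirds, which has the same effect as up-weighting that parameter without choosing an explicit weight.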

For the orbit calculations, we used the four component MWPotential2014 described in Bovy (2015). (For a more in-depth discussion on this as well as the detailed properties of the Galactic components used, see Section 2.4 and Table 1 in Youakim et al. 2023.)

3 Methods

For this work, we used an unsupervised manifold learning algorithm called t-SNE to perform a clustering analysis of Milky Way halo substructures. This algorithm was originally developed by Hinton & Roweis (2002) as a general purpose dimensionality reduction tool for visualisation of high-dimensional data. It was later modified to use a Student t-distribution to model the similarities between points in the low-dimensional embedding (van der Maaten & Hinton 2008) and subsequently updated to implement a variation of a Barnes-Hut algorithm to substantially reduce the computation time and allow for the embedding of millions of objects (van der Maaten 2013). In this work, we used the MulticoreTSNE Python package (Ulyanov 2016), which is a version of the algorithm optimised to run on parallelised computing systems. The general method for the dimensionality reduction follows a similar procedure to Youakim et al. (2023), and we refer to that paper for the full details.

In short, the dataset was prepared for input into the t-SNE algorithm with a standardization step, transforming the distribution for each parameter to have a median of zero and a standard deviation of one, such that the scaling of each parameter was uniform and equally weighted. In addition, we removed all objects with a Galactocentric apocentre distance rapo > 250 kpc, to remove objects with nonphysical orbital parameters from the orbit computation. We chose not to remove outliers in E, Jr, Lz and Jz as we did not want to exclude high energy dwarf galaxies from the analysis.
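The preparation step described above can be sketched as follows. This is an illustrative reading of the text rather than the exact procedure: the use of the population standard deviation, the function name, and the toy apocentre values are assumptions.

```python
from statistics import median, pstdev

def standardise(values):
    """Shift a parameter to zero median and scale it to unit standard deviation."""
    centre = median(values)
    scale = pstdev(values)  # population standard deviation (an assumption)
    return [(v - centre) / scale for v in values]

# Hypothetical objects with apocentre distances in kpc.
objects = [
    {"name": "obj1", "rapo": 12.0},
    {"name": "obj2", "rapo": 35.0},
    {"name": "obj3", "rapo": 80.0},
    {"name": "obj4", "rapo": 410.0},  # removed by the rapo > 250 kpc cut
    {"name": "obj5", "rapo": 150.0},
]

# Remove objects with apocentres beyond 250 kpc before scaling.
kept = [o for o in objects if o["rapo"] <= 250.0]

# Standardise the remaining apocentres; each input parameter would be
# treated in the same way.
rapo_scaled = standardise([o["rapo"] for o in kept])
```

Applying the apocentre cut before scaling keeps the single extreme object from inflating the standard deviation used to scale the others.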

The two hyper-parameters relevant for the t-SNE analysis were n_jobs, the number of cores used in parallel while running the algorithm, and perplexity, which controlled the width of the initialised probability distributions used to map from the high-dimensional space to the latent space, and can roughly be summarised as the number of data points expected in a given cluster. Given that we were interested in a range of scales from large kinematic halo structures to small scale stream-progenitor associations, we performed the selection of groups considering three different scales: large structures corresponding to GSE-like mergers, medium sized structures representing sub-groups within a GSE-like merger, and small scales which consisted of at most a few objects and were mostly pairs of individual stellar streams and their progenitors. These selections were informed by considering t-SNE maps computed at several different values for perplexity, ranging from 5 < p < 3000, which are shown in Figure A.1 in the Appendix. At the smallest value of perplexity the latent space was uniformly populated and small clusters were clearly separated, but larger groups were not. On the largest scales, large groups were clearly visible but group boundaries were not well defined, and small sub-groups were not easily identifiable.
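To make the role of perplexity concrete, the sketch below follows the standard definition from van der Maaten & Hinton (2008): for each point, the width of a Gaussian kernel is tuned until the perplexity 2^H of its neighbour distribution, where H is the Shannon entropy in bits, matches the requested value. The function names, the search bounds, and the toy distances are assumptions, and the internal implementation in MulticoreTSNE may differ in detail.

```python
import math

def conditional_probs(sq_dists, sigma):
    """Gaussian neighbour probabilities for one point, given squared distances."""
    d_min = min(sq_dists)  # subtracting the minimum avoids total underflow
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]

def perplexity(probs):
    """Perplexity 2**H, with H the Shannon entropy in bits."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return 2.0 ** entropy

def find_sigma(sq_dists, target, lo=1e-3, hi=1e3, n_iter=100):
    """Bisect (in log space) for the kernel width giving the target perplexity."""
    for _ in range(n_iter):
        mid = math.sqrt(lo * hi)
        if perplexity(conditional_probs(sq_dists, mid)) > target:
            hi = mid  # kernel too wide: too many effective neighbours
        else:
            lo = mid  # kernel too narrow: too few effective neighbours
    return math.sqrt(lo * hi)

# Toy squared distances from one point to five others (made-up values).
sq_dists = [1.0, 4.0, 9.0, 16.0, 25.0]
sigma = find_sigma(sq_dists, target=3.0)
achieved = perplexity(conditional_probs(sq_dists, sigma))
```

A uniform distribution over n neighbours has a perplexity of exactly n, which is why the parameter can be read as an effective number of neighbours and why small values favour small clusters while large values favour large structures.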

4 Results

Figure 1 displays the entire t-SNE latent space with all input data and with a perplexity value of 50. For clarity, we chose to show only the latent space at this intermediate perplexity value, which separated clusters on both large and small scales. All objects and stars are coloured according to their identification in the literature.

4.1 Selection of groups and subgroups

From the t-SNE latent space analysis, we identified 9 distinct groups (bottom left panel of Figure 1), which we further separated into 16 subgroups (bottom middle panel). These groupings consisted of known GCs, satellite galaxies, and stellar streams. Additionally, we investigated 20 small groupings of stellar streams and their potential progenitors (bottom right panel).

The initial selection of groups in the t-SNE latent space was performed visually on the t-SNE coordinates in the latent space projection, identifying regions that appeared kinematically and chemically distinct from neighbouring structures. We used the labels from known Milky Way structures, GCs, satellite galaxies and streams to guide our selection, as our goal was to make associations between known Milky Way substructures, not to perform blind tagging of groups.

We also used t-SNE maps computed with different perplexity values to inform our selections at different scales, which we show in the Appendix as Figure A.1. For example, for the main group selections we predominantly used the p = 300 and p = 500 maps, which clearly show the main larger groups. For the subgroup selection, we used the p = 30, p = 50 and p = 100 maps, which show medium scale clustering, and for the small groups and stream progenitor associations, we used the smallest scale maps of p = 5, p = 10, p = 30 and p = 50 to inform our selections.

For these selections, we took into account several different scales of clustering, the types of objects in a given cluster (e.g. stars, GCs, satellite galaxies, streams), and the populations of previously tagged stars. Because of this complexity, the use of an automated clustering algorithm for selecting groups was not feasible, and we made these selections by hand.

Table 2 summarises the main groups identified with this selection, along with their correspondence to previously identified structures from the literature. A longer version of this table is available as Table D.1 in the Appendix, which provides the full complement of halo structures, GCs, streams, and satellite galaxies belonging to each of these groups.

Table 2

Selected groups and their corresponding structures.

4.2 Validation of groups and subgroups

To validate these groupings, we examined their distributions across multiple parameter spaces, including E–Lz, Jr–Jz, and [Fe/H] distributions. Figures 2, 3 and 4 demonstrate the coherence of these structures in phase space for the larger groups, subgroups, and stream-progenitor associations, respectively. The bottom panels of these figures show the mean [Fe/H] for all objects in a given group on the x-axis, with each point on the y-axis showing the difference Δ[Fe/H] = [Fe/H]group–[Fe/H]object for each GC (open squares), satellite galaxy (open circles), STREAMFINDER stream (filled points) and S5 stream (unfilled points), where each stellar stream is a single point representing the average [Fe/H] of the stream. We applied a horizontal offset of 0.01 dex between the streams and clusters within the groups in order to better visualise the distributions. We opted to depict the metallicity distributions this way as the large number of groups and their broad and highly variable metallicity distributions made histograms messy and difficult to interpret.
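The differential metallicity shown in the bottom panels can be computed as in the following sketch, which takes the group mean [Fe/H] and subtracts the value of each object. The function name, the object names and the metallicities are hypothetical and are used only to show the arithmetic.

```python
from statistics import mean

def delta_feh(group):
    """Return the group mean [Fe/H] and the offset of each object from it.

    group: dict mapping object name -> [Fe/H] (one value per GC, satellite
    galaxy, or stream, with each stream represented by its average).
    """
    group_mean = mean(group.values())
    offsets = {name: group_mean - feh for name, feh in group.items()}
    return group_mean, offsets

# Made-up metallicities (dex) for one hypothetical group.
example_group = {"GC A": -1.5, "GC B": -1.7, "Stream C": -1.3}

group_mean, offsets = delta_feh(example_group)
```

With this sign convention an object that is more metal-poor than its group has a positive Δ[Fe/H], and the offsets within a group sum to zero by construction.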

5 Discussion

Within the nine identified groups, we recognised several previously described accreted halo structures, including GSE (G-2), Thamnos (G-3), and Sequoia/Arjuna/I’itoi (G-4) as well as the three distinct populations of GCs that we identified (see Section 5.4) as bulge (G-6), post-disc formation (G-7), and pre-disc formation (G-8) and a group of satellite galaxies (G-9). We also identified a population of structures linked to a heated disc population (Splash: G-1) and a substantial group of accreted structures (G-5) containing multiple subgroups, including the Helmi streams (sg-10), LMS-1/Wukong (sg-12), Sagittarius (sg-14), and a new potential structure (sg-7) previously discovered in Malhan et al. (2022). In this section, we discuss these identified groups in further detail and put them into context within existing classifications from the literature.

5.1 Main groups

5.1.1 Group G-1 (Splash)

This group of GCs had chemistry and kinematics that were consistent with tagged stars in the splashed disc population. Therefore, it likely contains both in situ and accreted GCs that are on heated, disc-like orbits. We discuss the properties of this group of GCs further and place them into context with the other groups of Milky Way GCs in Section 5.4.

From the purple filled circles in Figure 1, it appears that there could be two subgroups of Splash stars. However, looking at Figure A.1 reveals that these subgroups are not present at any other clustering scale. Indeed, if the perplexity value is less than the size of a given group, then the t-SNE projection is likely to artificially separate the larger group into multiple smaller groups. Therefore, we do not attribute any physical significance to these two subgroups.

Fig. 1

Top: t-SNE latent space at a perplexity of p = 50. Stream stars from STREAMFINDER are shown as filled coloured points, and stream stars from S5 as open coloured points, GCs are open coloured squares, satellite galaxies are open coloured circles, halo substructures are filled coloured circles, and extra stream stars from the literature are star symbols, coloured according to the legend. Bottom: t-SNE latent space coloured by the selected groups, with the left panel showing the selection of the main groups, the middle panel showing the selection of the subgroups, and the right panel showing the selection of the stream-cluster progenitors. Each group, subgroup and stream-progenitor pair is labelled by its group number which corresponds to the groups listed in Tables 2, 3, and D.1.

Fig. 2

Validation plots for the nine identified groups. Top: kinematic parameter spaces of E–Lz (left panel) and Jr–Jz (right panel). Bottom: differential [Fe/H] of each structure vs the mean [Fe/H] for all structures in the group (see Section 4.2 for a more detailed explanation). The grey dashed line shows the mean value for each group, and the faint dashed lines show a 0.2 dex dispersion to guide the eye. A horizontal offset of 0.01 dex has been applied between streams and clusters within the groups in order to better visualise the distributions. Structures are coloured consistently with the selection shown in the bottom left panel of Figure 1 and described in Table 2. Filled and open points are stream stars from the STREAMFINDER sample and the S5 sample, respectively. Large open square symbols are GCs and large open circle symbols are Local Group dwarf galaxies.

5.1.2 Group G-2 (Gaia Sausage-Enceladus)

Based on the sample of GSE stars selected using GALAH DR4 data from Kushniruk et al. (2024), we were able to define the region of the t-SNE latent space in which we expected to find substructures associated with GSE (large blue circles in the top panel of Figure 1). We made this selection using a t-SNE map optimised for large scale clustering, with a perplexity of p = 300. In this larger scale map, the three subgroups of GCs that are separate from the rest of the GSE stars in the p = 50 map (sg-1, sg-3, and sg-5 in the bottom middle panel of Figure 1), are fully connected to the rest of the GSE group. We therefore include these subgroups as part of GSE in Table D.1. In this section we discuss the global properties of this GSE population, and leave the details of investigating these different subgroups to Section 5.2.1.

Several previous works have investigated the substructures associated with the GSE merger. For example, Myeong et al. (2018b) associated 10 GCs with the GSE merger using dynamical arguments. More recent works have associated far more GCs with GSE, with Massari et al. (2019) associating 32 GCs, selected in action space and corroborated using an age-metallicity relation, and Forbes (2020) associating 28 GCs. In contrast, Malhan et al. (2022) associate fewer objects with GSE (16–18 GCs and streams) and suggest that the rest of the GCs belong to another structure, called Pontus (Malhan 2022). Based on cross-referencing the GC populations with the literature, we identify subgroup sg-5 as Pontus, and subgroup sg-2 as the Kraken (see Section 5.2.1 for more details).

In total, we found 29 GCs and 6 stellar streams to be associated with group G-2, GSE. At face value, this number agrees with the higher estimates from the literature (Massari et al. 2019; Forbes 2020). However, if we exclude the GCs from subgroups sg-2 and sg-5, which were likely brought into the Galaxy by other progenitors, these numbers are reduced to 16 GCs and 4 stellar streams, which is in better agreement with the lower estimates (Myeong et al. 2018b; Malhan et al. 2022). Therefore, the misclassification of Pontus and Kraken/Koala GCs as GSE GCs appears to account for the discrepancy between the estimates in the literature.

Fig. 3

Validation plots for the 16 identified subgroups. Structures are coloured consistently with the selection shown in the bottom middle panel of Figure 1 and described in Table D.1. Filled and open points are stream stars from the STREAMFINDER sample and the S5 sample, respectively. Large open square symbols are GCs, and large open circle symbols are Local Group dwarf galaxies.

5.1.3 Group G-3 (Thamnos)

We find two GCs and two stellar streams associated with the Thamnos structure. Notably, ω Cen and its associated stellar stream Fimbulthul are part of this group, as well as NGC 288 and its associated stellar stream. In Section 5.5 we discuss how plausible it is that these stream-progenitor pairs belong to Thamnos, along with a more in-depth discussion of the progenitor associated with ω Cen.

5.1.4 Group G-4 (Sequoia/Arjuna/I’itoi)

The Sequoia structure was first discovered as a retrograde moving group in Myeong et al. (2019). The authors of that work suggested an association with 6 GCs, FSR 1758, NGC 3201, ω Cen (NGC 5139), NGC 6101, NGC 5635, and NGC 6388, as well as a tentative association with the GD-1 stream. In comparison, we found this structure to be associated with NGC 3201, IC 4499, and NGC 6101, but not to the other retrograde GCs. This is consistent with the findings of Malhan et al. (2022), who only associated NGC 6101 and NGC 3201 to this structure.

Bonaca et al. (2021) associated the GD-1, Gjöll, Leiptr, Phlegethon, Wambelong, and Ylgr stellar streams with the Sequoia/Arjuna/I’itoi merger event. Looking at sg-8 and sg-9 of group G-4 in Table D.1, we support the finding that all of these streams are indeed associated with the Sequoia/Arjuna/I’itoi structure (with the exception of Wambelong, which was not included in our sample). In addition, we add the streams Gaia-1, Gaia-6, Gaia-9, Gaia-11, Gaia-12, NGC 1261, NGC 6101 and Kshir to those belonging to the Sequoia/Arjuna/I’itoi structure. In their analysis, Malhan et al. (2022) also found that GD-1, Kshir, and Gaia-9 were associated with this structure.

5.1.5 Group G-5 (accreted structures)

The t-SNE latent space in Figure 1 shows a large group of stellar streams and GCs, as well as some accreted satellite galaxies. This group corresponds to high-energy, large radial and vertical motion, and unbound objects. We therefore refer to this group as the accreted structures, which corresponds to group G-5 in Tables 2 and D.1. We do not expect that all objects in this group are physically related to a common origin, but we do expect that some smaller groups share a host progenitor galaxy. Figure 1 clearly shows a large number of substructures within this larger group. We identified these subgroups through visual inspection of the latent space, paying attention to medium-sized structures in the mid-level perplexity maps of p = 30 and p = 50. These selections are shown in the bottom middle panel of Figure 1, and the selected subgroups are summarised in Table D.1 and discussed further in Section 5.2.

We then performed another visual selection in the latent space, this time focusing on individual objects or small groups of a few objects, paying close attention to instances where streams clearly overlapped with GCs or satellite galaxies. These smallest selected groups are visible in the bottom right panel of Figure 1, and are discussed further in Section 5.3.

Fig. 4

Top: streams and associated progenitors plotted in Galactic coordinates. Middle: kinematic parameter spaces E–Lz and Jr–Jz are shown in the left and right panels, respectively. Bottom: differential [Fe/H] of each structure vs the mean [Fe/H] for all streams and progenitors in the group. Filled and open points are stream stars from the STREAMFINDER sample and the S5 sample, respectively. Large open square symbols are for GCs, and large open circle symbols are for Local Group dwarf galaxies.

5.1.6 Groups G-6, G-7, and G-8 (bulge GCs and disc GCs)

Groups G-6, G-7, and G-8 are all compact groups of GCs in the latent space. We assessed whether these represent distinct populations of GCs in Section 5.4.

5.1.7 Group G-9 (satellite dwarf galaxies)

Group G-9 is a tight cluster of satellite galaxies that were not associated with any other structures. Figure 2 reveals that these are objects with very high energy (E > 25 000 km2 s−2) and very high vertical action (Jz > 6000 kpc km s−1). This suggests that these satellite galaxies are orbiting in the outskirts of the Milky Way and are thus unlikely to be associated with any detectable streams or other structures. They are therefore not considered further in the current work.

5.2 Subgroups

5.2.1 Subgroups sg-1, sg-3, and sg-4 (subgroups of GSE)

In Table D.1, we identify five possible substructures within the GSE population (group G-2), namely sg-1, sg-2, sg-3, sg-4 and sg-5. Upon a more detailed analysis during the validation of our selected groups, we determined that subgroups sg-2 and sg-5 did not belong to the GSE, but rather belong to separate accretion events previously described in the literature, namely Kraken/Koala (Kruijssen et al. 2020; Forbes 2020) and Pontus (Malhan 2022), respectively (see Sections 5.2.2 and 5.2.3). This left three remaining possible subgroups of the GSE: sg-1, sg-3 and sg-4.

Figures 2 and 3 show the orbital properties and [Fe/H] distributions of the whole GSE group and its subgroups, respectively, but these are difficult to see due to occlusion by all of the other groups and subgroups also shown in these figures. Therefore, we also included Figure C.1 in Appendix C to analyse these different subgroups more clearly. In that figure, each square represents one GC, colour-coded by the subgroup to which it belongs. The bottom panel shows the [Fe/H] distribution of each group as a kernel density estimate.

In Figure C.1, subgroups sg-1, sg-3 and sg-4 are clustered together, but do show a slight gradient in orbital energy and radial action, Jr. Subgroup sg-1 stands out as having the highest orbital energy and a significantly higher Jr than the other subgroups, but consists of only a single GC, Pal 2. Furthermore, we note that in the top panel of Figure C.1, 3 out of 4 (75%) of the GCs in sg-3 are in the northern hemisphere, and 9 out of 11 (82%) of the GCs in sg-4 are in the southern hemisphere.

A recent work by Sun et al. (2023) used kinematics, metallicities and ages to classify Milky Way GCs into different accreted groups and separated the GSE into four groups: GSE, GSE-a, GSE-b and GSE-c. After analysing the detailed properties of these groups, they suggested that the three subgroups were likely unrelated to GSE, but noted that further validation is required to confirm this. Compared to our groupings, the main GSE group of Sun et al. (2023) shares many GCs with our subgroup sg-4, including NGC 362, NGC 1261, NGC 1851, NGC 1904/M 79, NGC 5286, IC 1257, NGC 6981/M 72, and NGC 7089/M 2. None of their other subgroups correspond to either our sg-1 or sg-3 subgroups based on their assigned GCs.

It is important to mention that interpreting the present-day Milky Way phase-space is challenging, even with high-precision data. Using N-body simulations, Koppelman et al. (2020) showed that a GSE-like merger with a Milky Way-like galaxy can result in a complicated and highly fragmented phase-space structure of the mixed debris. One consequence of this is that sub-populations of the infalling satellite with differing dynamical properties may present themselves as distinct progenitors in purely dynamical data. This reinforces the need for using chemistry when disentangling complex structures in phase-space, which is the aim of the current analysis. Nevertheless, confirming these distinct populations will require a more comprehensive chemical analysis with additional elements, along with sophisticated modelling of such complex merger events.

Therefore, based on these observations alone, we suggest that there is insufficient evidence to conclude that these subgroups represent real progenitors that were accreted into the GSE galaxy before it merged with the Milky Way. However, we do remark that our results, as well as those from Sun et al. (2023), provide compelling evidence that there may be several sub-populations within the GSE, warranting further, more detailed investigation in future works.

5.2.2 Subgroup sg-2 (Kraken/Koala low orbital energy group)

Given that they are centrally concentrated in the Galaxy and located at distinctly lower orbital energies than the rest of the GSE GCs, the GCs in sg-2 likely form a low-energy group that may not belong to GSE, despite their initial classification. Massari et al. (2019) found a group of 25 GCs that they could not label based on their selection criteria, and so dubbed them the unassociated low-energy GCs (L-E). By comparison, our subgroup sg-2 only has six GCs, but we note that all six of them are part of the Massari et al. (2019) L-E sample. The progenitor for this group of low-energy GCs was later named the Kraken by Kruijssen et al. (2020), who predicted that it brought 13 GCs into the Milky Way and had an initial mass of 10^8.28 M⊙. In another work, Forbes (2020) fit an age-metallicity relation to these low-energy GCs and suggest that 21 GCs belong to this progenitor, which they call the Koala. Of the six GCs in our group sg-2, four are in the Koala structure. Forbes (2020) classify NGC 6284 as belonging to GSE, and classify NGC 6121 as an in situ GC due to its age and metallicity, which they point out is consistent with the classification in Horta et al. (2020), which was based on the cluster’s high alpha-element abundance. Furthermore, in their detailed chemical analysis of more than three thousand individual stars associated with GCs, Horta et al. (2020) claim that the low-energy structure from Massari et al. (2019) is broadly consistent with an in situ origin, with some individual GCs likely being accreted. This picture is consistent with what we find in the current work, namely far fewer GCs in this group than were found in previous works, with the rest of the GCs from the L-E/Kraken/Koala groups being classified as in situ in our analysis and distributed somewhat evenly amongst groups G-1, G-6 and G-8. Therefore, the number of GCs belonging to the Kraken/Koala structure presented in Kruijssen et al. (2020) and Forbes (2020) is likely overestimated due to contamination from in situ GCs, and the masses estimated for these structures in those works are consequently likely too high.

5.2.3 Subgroup sg-5 (Pontus)

As previously mentioned in Section 5.2.1, the subgroup sg-5 has a distinctly more metal-poor metallicity distribution than the rest of the selected GSE subgroups, as is shown by the lime green curve in the bottom panel of Figure C.1. This subgroup has slightly lower orbital energies (E < −25 000 km2 s−2) and radial actions (Jr < 1000 kpc km s−1) than the other subgroups of GSE clusters, but is very hard to differentiate based on its orbital properties alone.

In a recent work, Malhan et al. (2022) used action-angle coordinates plus energy as the inputs into a clustering algorithm and identified a new structure which they named Pontus. They report seven GCs associated with this progenitor: NGC 288, NGC 5286, NGC 7099/M 30, NGC 6205/M 13, NGC 6341/M 92, NGC 6779/M 56, and NGC 362. In a subsequent work, they performed a more detailed chemical and dynamical analysis of the stellar population of Pontus, and supported the initial finding that it is an independent accretion event separate from GSE (Malhan 2022).

Our selected subgroup sg-5 shares several GCs in common with Pontus, including NGC 6341/M 92, NGC 7099/M 30, and NGC 6779/M 56. We did not find an association with NGC 288, NGC 5286, NGC 6205/M 13, or NGC 362, but additionally found an association with NGC 2298, NGC 6287, NGC 4833, and ESO 280/SC 06, as well as the C-19 stream. All of these clusters are metal-poor, falling in the range of −2.3 < [Fe/H] < −1.8, with the exception of the C-19 stream, which is the most metal-poor stellar stream in the Milky Way with [Fe/H] ~ −3.4. Therefore, despite the differences in the specific membership of some GCs, we identify subgroup sg-5 as the Pontus structure from Malhan et al. (2022), and report a somewhat different set of associated GCs.

5.2.4 Subgroup sg-6 (ω Cen/Fimbulthul, C-7 and NGC 288)

The associations between NGC 288, ω Cen and their respective stellar streams have been described in detail in the literature (e.g. Shipp et al. 2018; Ibata et al. 2019). Our medium-scale selection in t-SNE latent space selects these four objects plus the C-7 stellar stream (Ibata et al. 2021) in the same subgroup. The light pink points and open squares in Figure 3 show that this group of objects all occupy the same region in E–Lz space, on retrograde orbits at low orbital energies. Furthermore, they all share very similar metallicities, with all of them falling in the narrow range of −1.55 < [Fe/H] < −1.30, which is consistent with these structures having been accreted with the Thamnos progenitor, which has a broad metallicity distribution with a peak at [Fe/H] ≈ −1.5.

However, the sky distribution in the top panel of Figure 3 shows that the NGC 288 GC and stream are located at the southern Galactic cap, while ω Cen/Fimbulthul and C-7 are close to the Galactic plane, with C-7 overlapping with the Fimbulthul stars just south of the disc. The Galactic distribution of Thamnos stars is restricted to |b| < 50°, making it highly unlikely that NGC 288 could be part of the debris brought in with Thamnos, barring some unlikely, complex past dynamical interactions that have drastically altered its orbit to a polar one. Therefore, we conclude that ω Cen/Fimbulthul and C-7 were accreted with the same host as Thamnos, but NGC 288 was likely brought into the Galaxy in a different accretion event.

5.2.5 Subgroup sg-7 (candidate merger)

During our selection, we initially decided to include this subgroup as part of group G-4, the Sequoia/Arjuna/I’itoi structure, based on its proximity in higher perplexity maps. However, in all the t-SNE maps, it makes a rather tight cluster in the latent space, warranting further investigation that perhaps this subgroup may be related to a separate accretion event. Indeed, this group is clearly separate from the rest of the Sequoia/Arjuna/I’itoi stars in the E–Lz space, sitting at equally high orbital energy, but distinctly less retrograde in its orbit. This group is also distinguished by larger vertical action in the Jr–Jz panel, on the right-hand side of Figure 3. Finally, it is at a lower mean metallicity than both other subgroups belonging to group G-4.

Interestingly, Malhan et al. (2022) also noted a similar structure in their clustering analysis. They remark that it was not detected by their automatic algorithm, but rather that they noticed it when manually inspecting the clustering data. Their group contains NGC 5466 (GC and stream), NGC 7492, Gaia-10 and Tucana III. By comparison, sg-7 contains all of the same structures with the exception of NGC 7492 (sg-10), and with the addition of NGC 5694, Pal 15, NGC 6934, Pal 13, and the Tuc-III stream. NGC 6934 and Pal 15 are on prograde orbits and are outliers from the rest of the group in the E–Lz panel of Figure 3, and NGC 5694 and NGC 6934 are also outliers in Jr–Jz with Jr > 4000 kpc km s−1.

If we ignore the above outliers in orbital properties, the rest of the streams and clusters have metallicities in the narrow range −2.1 < [Fe/H] < −1.9, with the exception of Tuc-III with [Fe/H] ~ −2.5 and Gaia-10 with [Fe/H] ~ −1.4. Therefore, we deem this to be a likely group accreted with a common progenitor. This is in agreement with the conclusions of Malhan et al. (2022), who classify this as a candidate merger, after analysis of the orbits and CMDs of these objects.

5.2.6 Subgroups sg-8 and sg-9 (subgroups of Sequoia/Arjuna/I’itoi)

Now that we have established that subgroup sg-7 is not part of Sequoia/Arjuna/I’itoi, we have two remaining subgroups, sg-8 and sg-9.

Naidu et al. (2020) demonstrated that the retrograde halo can be separated into three structures, Arjuna, Sequoia and I’itoi, which can be differentiated by their metallicity distributions. More specifically, they claim that Arjuna stars are the most metal-rich, occupying the regime of [Fe/H] > −1.5, Sequoia stars are found at −2 < [Fe/H] < −1.5, and I’itoi stars populate the metal-poor regime at [Fe/H] < −2.

The bottom panel of Figure 3 shows that subgroup sg-9 is the most metal-poor subgroup of group G-4, with [Fe/H] < −2.1. Therefore, we propose that GD-1 ([Fe/H] ~ −2.5) and Kshir (sg-9) belong to the I’itoi structure, while subgroup sg-8 represents the Sequoia accretion event. However, looking in more detail at the [Fe/H] distribution in the bottom panel of Figure 3, the orange points of sg-8 show a broad metallicity distribution, with some outliers. When we investigated this further, we found that three of the most metal-poor streams in sg-8, Leiptr ([Fe/H] ~ −2.17), Ylgr ([Fe/H] ~ −2.09), and Gaia-12 ([Fe/H] ~ −3.28), all fall in the region of the latent space between sg-8 and sg-9, at [tsne-x, tsne-y] = [20, 30] in Figure 1. Therefore, we propose that these metal-poor streams were initially selected in the wrong subgroup, and are actually part of the I’itoi structure rather than Sequoia.
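The metallicity regimes attributed to Naidu et al. (2020) above can be expressed as a simple lookup. The following is a minimal sketch and not the procedure used in this work: the function name `classify_retrograde` is hypothetical, the handling of values lying exactly on a boundary ([Fe/H] = −1.5 or −2) is an assumption, and the inputs are the stream metallicities quoted in the text.

```python
def classify_retrograde(feh):
    """Assign a retrograde-halo structure from a mean [Fe/H] in dex.

    Thresholds follow the regimes quoted in the text from Naidu et al. (2020).
    Boundary handling (values exactly at -1.5 or -2.0) is an assumption.
    """
    if feh > -1.5:
        return "Arjuna"
    if feh > -2.0:
        return "Sequoia"
    return "I'itoi"


# Stream metallicities as quoted in the text.
streams = {"GD-1": -2.5, "Leiptr": -2.17, "Ylgr": -2.09, "Gaia-12": -3.28}
labels = {name: classify_retrograde(feh) for name, feh in streams.items()}
```

With these cuts alone, all four streams fall in the I’itoi regime, which is in line with the reassignment proposed above. A cut on mean [Fe/H] ignores the width of each metallicity distribution, so it should be read as a consistency check only.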

5.2.7 Subgroups sg-10 and sg-11 (Helmi streams)

It has been suggested that the Helmi streams are an important donor to the Milky Way halo, contributing ~15% of its mass in field stars and 10% of its GCs (Koppelman et al. 2019b). Furthermore, Koppelman et al. (2019b) suggest through dynamical arguments that the Helmi streams are associated with seven halo GCs: NGC 4590, NGC 5024, NGC 5053, NGC 5272, NGC 5634, NGC 5904, and NGC 6981.

Given that subgroups sg-10 and sg-11 contain NGC 5024, NGC 5272/M 3, and NGC 5634, it is likely that these groups are related to the Helmi streams. However, we note that more than 50% of GCs associated with the Helmi streams in Koppelman et al. (2019b) and Forbes (2020) are assigned to different structures in our analysis. Furthermore, sg-10 and sg-11 have different Jz distributions in the right panel of Figure 3, which may indicate that sg-11 is connected to a different accretion event separate from the Helmi streams.

As for the remaining GCs expected to be associated with the Helmi streams, NGC 4590 and its associated stellar stream Fjörm are in sg-15, located far away in the latent space from sg-10 and not associated with any larger structure. NGC 5904 and NGC 6981 are associated with GSE, although they are located spatially near to sg-10 in the latent space. NGC 5053 is part of subgroup sg-12 which is adjacent to sg-10 and sg-11 in the latent space but clearly a separate structure, and is associated with the LMS-1/Wukong structure.

5.2.8 Subgroup sg-12 (LMS-1/Wukong)

Due to the inclusion of the tagged samples of known members from Wukong (Limberg et al. 2024) and LMS-1 (Malhan et al. 2021), we identified subgroup sg-12 as the LMS-1/Wukong structure. In addition to LMS-1 and Wukong stars, we also associated the metal-poor Phoenix and Kwando streams, with [Fe/H] ~ −2.7 and −2.3, respectively. We also found the GC NGC 5053 to be associated with this group, which is corroborated by previous results that have reported a connection between this GC and the LMS-1 stream based on their similar orbital properties (Yuan et al. 2020; Malhan et al. 2021).

Malhan et al. (2022) report three other GCs and five other stellar streams associated with this group, namely NGC 5272/M 3, NGC 5024/M 53, Pal 5, C-19, Indus, Sylgr and Jhelum. In our current analysis, the stream-progenitor pair SP-9 of Jhelum and NGC 5024/M 53 is adjacent to subgroup sg-12 in the latent space and therefore could be connected to the LMS-1/Wukong structure. However, in our current analysis, NGC 5272/M 3 is associated with subgroup sg-10, the Helmi streams, which is in agreement with previous findings, although those also associate NGC 5024/M 53 and NGC 5053 with the Helmi streams (Massari et al. 2019; Koppelman et al. 2019b). This emphasises the challenge with associating GCs to specific accretion events, and highlights the different results that occur when different selection criteria are implemented.

Finally, we note that Malhan et al. (2022) associate the metal-poor streams Sylgr, Phoenix and C-19 to LMS-1/Wukong and therefore describe it as the most metal-poor accretion event in the Milky Way’s history. However, they do not associate the metal-poor stream Kwando, which we add to the metal-poor streams associated with LMS-1/Wukong based on our analysis.

5.2.9 Subgroup sg-14 (Sagittarius)

This subgroup contains twelve GCs, three stellar streams and seven satellite galaxies. The GCs that we associate with this group are very similar to the GCs associated with the Sagittarius progenitor in previous studies (Massari et al. 2019; Forbes 2020; Malhan et al. 2022). Interestingly, Malhan et al. (2022) only associate one stream, Elqui, with the Sagittarius structure. By comparison, we do not associate the Elqui stream, but instead the Sagittarius stream, Indus stream, and NGC 5466 stream. These streams are entirely different from those associated with Sagittarius based on their orbital characteristics by Bonaca et al. (2021), namely Aliqa Uma, ATLAS, Fjörm, Slidr and Turranburra. This discrepancy highlights the inherent uncertainties of associating streams with progenitor galaxies, and reinforces the need for higher precision measurements of streams and their progenitors to improve orbital calculations.

Looking at Table D.1, we also associated several satellite galaxies to the Sagittarius structure during the selection of subgroups in the t-SNE latent space. However, given that sg-14 is the structure with the highest orbital energy of all of the identified structures (see lilac symbols in Figure 3), it is highly probable that the high energy satellite galaxies only clustered in the same region of the latent space coincidentally. Therefore, we do not consider these satellite galaxies to be physically associated with the Sagittarius structure.

5.3 Stream-progenitor associations

We also identified small-scale structures representing stream-progenitor pairs that connect stellar streams with their original GCs or dwarf galaxies. These small-scale structures provide crucial insight into the accretion and disruption processes shaping the Milky Way’s halo. We also allowed for the possibility of small associations of multiple streams and GCs, or a single stream and multiple GCs. These associations, shown in the bottom right panel of Figure 1, are summarised in Table 3, with identifiers designated as SP-X (Stream-Progenitor-X).

Figure 4 shows the sky distribution of these stream-progenitor systems in Galactic coordinates, as well as the E versus Lz and Jr versus Jz planes, and the [Fe/H] distributions of the stellar streams and the associated progenitor GCs or satellite galaxies. Typically, if streams and progenitors share a common origin, they should be spatially coherent along the stream’s orbital trajectory, share similar orbital properties, and have similar metallicity distributions. Of course, we expect that the identified groups should be somewhat clustered in these parameter spaces, since these were the input parameters used to generate the t-SNE latent space from which the groups were identified. However, here we investigate in detail these parameter spaces in order to validate these associations, and discuss the plausibility of these identified groups having a shared Galactic origin.

We first selected all small groups in the t-SNE latent space where stellar streams and GCs or satellite galaxies appeared to cluster together. We made this selection by hand using the smallest scale t-SNE maps, with perplexity of p = 5, p = 10, and p = 30, and intentionally selected all plausible associations in this step so as to favour completeness over purity. We then removed from each group all streams for which the fraction of the stream's stars in the group was less than 30%, which excluded streams for which only a few stars were randomly scattered into the selected group. The selection of these objects is shown in the bottom right panel of Figure 1, and the selected groups are summarised in Table 3.
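The 30% cut described above can be sketched as a simple filter. This is an illustration under stated assumptions rather than the code used in this work: the function name `filter_streams` is hypothetical, the fraction is taken to be the number of a stream's stars inside the selected group divided by the total number of stars in that stream, and the star counts are invented placeholders.

```python
def filter_streams(group_counts, total_counts, min_fraction=0.30):
    """Keep streams with at least `min_fraction` of their stars in the group.

    group_counts: number of stars of each stream inside the selected group.
    total_counts: total number of stars of each stream in the sample.
    Returns a dict mapping each retained stream to its fraction.
    """
    kept = {}
    for stream, n_in_group in group_counts.items():
        fraction = n_in_group / total_counts[stream]
        if fraction >= min_fraction:
            kept[stream] = fraction
    return kept


# Hypothetical star counts, for illustration only.
total_counts = {"stream_A": 40, "stream_B": 50}
group_counts = {"stream_A": 30, "stream_B": 3}
kept = filter_streams(group_counts, total_counts)
```

Here `stream_B` contributes only 6% of its stars to the group and is dropped, mimicking the removal of streams with a few randomly scattered members.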

We recovered several previously identified pairs of streams and progenitors from the literature. These include: ω Cen – Fimbulthul, Gjöll – NGC 3201, NGC 6101 (GC) – NGC 6101 (stream), Fjörm – NGC 4590/M 68, Svol – NGC 5272/M 3, NGC 288 (GC) – NGC 288 (stream), LMS-1 – NGC 5053, NGC 6397 (GC) – NGC 6397 (stream), Tucana III (DG) – Tuc-III (stream), NGC 6341/M 92 (GC) – M 92 (stream), NGC 2808 (GC) – NGC 2808 (stream), and NGC 1851 (GC) – NGC 1851 (stream). These are indicated in Table 3 in the discovery column with a listing of the sources in the literature where the association was first presented.

In this table, we also included a qualifier for how confident the detection of a given stream was. In order for a stream-progenitor pair to have been given a value of ‘strong’ in the detection column, they must have satisfied all three of the following criteria: overlapping in E–Lz and Jr–Jz space, metallicity values within 0.2 dex for stream/GC pairs or 0.5 dex for stream/satellite galaxy pairs, and clear proximity on the sky of stream tracks and progenitor positions. Associations that satisfied two of the three criteria were labelled ‘tentative’, associations that satisfied only one of the criteria were labelled as ‘weak’, and objects which were found not to be associated after further analysis were given a label of ‘none’.
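The labelling rule above amounts to counting how many of the three criteria a pair satisfies. The sketch below is illustrative only: the function name `detection_label` and its arguments are hypothetical, the orbital-overlap and sky-proximity criteria are reduced to booleans that would have to be judged from the figures, and mapping zero satisfied criteria to ‘none’ is an assumption (in the text, ‘none’ is assigned after further analysis).

```python
def detection_label(orbits_overlap, feh_stream, feh_progenitor,
                    progenitor_type, near_on_sky):
    """Return 'strong', 'tentative', 'weak' or 'none' for a stream-progenitor pair.

    orbits_overlap: True if the pair overlaps in E-Lz and Jr-Jz space.
    progenitor_type: 'GC' (0.2 dex tolerance) or 'satellite' (0.5 dex tolerance).
    near_on_sky: True if the progenitor lies close to the stream track.
    """
    tolerance = 0.2 if progenitor_type == "GC" else 0.5
    feh_match = abs(feh_stream - feh_progenitor) <= tolerance
    n_met = sum([orbits_overlap, feh_match, near_on_sky])
    # Treating zero satisfied criteria as 'none' is an assumption of this sketch.
    return {3: "strong", 2: "tentative", 1: "weak", 0: "none"}[n_met]


# Metallicities as quoted in Sections 5.3.1 and 5.3.4; the boolean inputs
# are one reading of the descriptions given there.
hrid_pal2 = detection_label(True, -1.13, -1.42, "GC", False)
jhelum_ngc5024 = detection_label(True, -2.1, -2.1, "GC", False)
```

Under this reading, Hrid – Pal 2 satisfies only the orbital criterion and Jhelum – NGC 5024 satisfies the orbital and metallicity criteria, which reproduces the ‘weak’ and ‘tentative’ labels given to these pairs in Table 3.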

The top panel of Figure 4 shows the sky positions in Galactic coordinates of each of the selected stream-progenitor associations, colour-coded based on the selected SP-X groups in the bottom right panel of Figure 1. We also plot the galstreams tracks underneath the stream stars so as to show the expected orbital trajectory for each stream. This allowed us to confirm the spatial connection in some instances where the progenitor was located far away from the stream stars, which was the case for Gjöll – NGC 3201, Fjörm – NGC 4590/M 68, and Svol – NGC 5272/M 3.

In addition to the stream-progenitor pairs previously described in the literature, we also present several new stream-progenitor associations in this work.

Table 3

Stream-progenitor groups from the smallest scale selection in the t-SNE latent space.

5.3.1 Hrid – Pal 2 and NGC 7006

The SP-1 group consists of the Hrid stream (Ibata et al. 2021) and the two GCs Pal 2 and NGC 7006. The three objects form a very tight group in the latent space, and also trace the same regions in the E–Lz and Jr–Jz diagrams in the middle panels of Figure 4. However, Hrid has a spectroscopically measured metallicity of [Fe/H] = −1.13, whereas the GCs Pal 2 and NGC 7006 are more metal-poor with [Fe/H] = −1.42 and −1.52, respectively. Although the [Fe/H] measurements are only discrepant by 0.3–0.4 dex, we would expect them to be closer if either Pal 2 or NGC 7006 were the progenitor of the Hrid stream, given that they are GCs and have narrow spreads in [Fe/H]. However, the [Fe/H] measurement for the Hrid stream is based on a small number of stars and therefore may be refined to a slightly lower metallicity with future follow-up efforts. The spatial distribution of these three objects in Figure 4 shows that they are all located nearby on the sky, but that neither Pal 2 nor NGC 7006 lies on the track of the Hrid stream available from current data. Therefore, the current data are suggestive, but not sufficient to support the conclusion that either Pal 2 or NGC 7006 is the progenitor of the Hrid stream. We therefore give this association a classification of ‘weak’ in Table 3. It is, however, likely that these three objects were brought into the Galaxy with a common progenitor.

5.3.2 NGC 6101 – Gjöll – NGC 3201

Both the NGC 6101 GC – stream (Ibata et al. 2021) and the Gjöll – NGC 3201 (Bianchini et al. 2019; Riley & Strigari 2020; Hansen et al. 2020b) pairs have been individually described in the literature, but have not previously been associated as a larger complex. In our current analysis, all four of these objects were selected together as part of subgroup sg-10 and therefore may be connected to a common progenitor. In the top panel of Figure 4, the two stream-progenitor pairs do not obviously trace similar orbits in Galactic coordinates, but do have somewhat similar metallicities, with [Fe/H] = −1.98 for NGC 6101 (Ibata et al. 2021), −1.63 for Gjöll (Ibata et al. 2021) and −1.59 for NGC 3201 (Vasiliev 2019). Therefore, we propose that these two GCs and streams came into the Galaxy with the same progenitor that formed the retrograde Sequoia/Arjuna/I’itoi structure.

5.3.3 Ophiuchus – NGC 5634, NGC 7492

The Ophiuchus stream (Bernard et al. 2014) is a short, thin stream in the Northern Galactic hemisphere. Sesar et al. (2015) performed spectroscopic follow-up and identified 14 members of the stream, from which they inferred that the stream originated from an old, metal-poor globular cluster progenitor with an age of 11.7 Gyr and [Fe/H] = −1.95. However, they ultimately do not detect any overdensity along the stream and conclude that the stream is all that is left of the progenitor.

In our current analysis, we find two GCs that may potentially be the progenitor of the Ophiuchus stream, namely NGC 5634 and NGC 7492. Looking at SP-3 in Figure 4, both of these GCs have a plausible connection to Ophiuchus based on their positions in E–Lz and Jr–Jz, and also their [Fe/H] values. Ophiuchus has [Fe/H] = −1.98 (Ibata et al. 2021), which is consistent with the values of [Fe/H] = −1.88 and −1.78 for NGC 5634 and NGC 7492, respectively. Even the Galactic coordinates of both GCs could be aligned with the orbital path of Ophiuchus, although they are both located well beyond the stream’s visible extent on the sky.

Generally, it is expected that the velocity dispersion of the progenitor should be less than or equal to that of the tidal tails of the stream. Therefore, NGC 5634 can be ruled out on the basis of its high velocity dispersion of 5.3 km s−1, compared to that of Ophiuchus, which is 1.6 ± 0.4 km s−1 (Caldwell et al. 2020). NGC 7492 has a comparable velocity dispersion of 1.6 km s−1, but is located much farther away on the sky from the Ophiuchus stream stars, in the Southern hemisphere, reducing its likelihood of being connected to Ophiuchus.
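The velocity dispersion argument above reduces to a single comparison. As a minimal sketch (the function name `progenitor_allowed` is hypothetical, and allowing a margin equal to the quoted 1σ uncertainty on the stream dispersion is an assumption, not a criterion stated in the text):

```python
def progenitor_allowed(sigma_progenitor, sigma_stream, sigma_stream_err=0.0):
    """True if the progenitor's velocity dispersion (km/s) does not exceed
    the stream's dispersion, allowing for the stream's quoted uncertainty."""
    return sigma_progenitor <= sigma_stream + sigma_stream_err


# Values quoted in the text, in km/s.
ngc5634_ok = progenitor_allowed(5.3, 1.6, 0.4)
ngc7492_ok = progenitor_allowed(1.6, 1.6, 0.4)
```

With the quoted values, NGC 5634 fails the check and NGC 7492 passes it, so for NGC 7492 the argument against a connection rests on its sky position rather than on its velocity dispersion.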

Lane et al. (2020) conducted detailed N-body simulations based on the current observed properties of the Ophiuchus stream, and suggested that the progenitor was likely a faint, weakly bound GC with a mass of 2 × 10³ M⊙ or less and a half-mass radius in the range 60–100 pc, and that it began disrupting just 360 Myr ago. Of course, this assumes that most of the stream’s material is contained within its present-day location, and that no progenitor is currently visible. Nevertheless, this further reduces the possibility of either NGC 5634 or NGC 7492 being the progenitor. We therefore give both of these a detection significance of weak in Table 3.

5.3.4 Jhelum – NGC 5024

The t-SNE projection in Figure 1 shows the Jhelum stream and NGC 5024/M 53 clearly overlapping in the latent space. Further investigation of the orange symbols in Figure 4 shows that these objects overlap in the E–Lz and Jr–Jz plots and that they have a remarkably similar metallicity, both with [Fe/H] ~ −2.1. The top panel of Figure 4 shows that Jhelum is located in the southern hemisphere, while NGC 5024/M 53 is located near the northern Galactic pole. Given that both objects appear to be on polar orbits, their large separation on the sky does not rule out a connection between them.

In a previous work, Malhan et al. (2021) proposed a connection between LMS-1, NGC 5053, Indus and NGC 5024. They specifically pointed out that since Jhelum was unrelated to LMS-1, it was by implication also unrelated to Indus and NGC 5024. However, this result is in tension with a recent analysis that indicated a common origin of the Indus and Jhelum streams (Bonaca et al. 2021), and with several previous works that have suggested that Indus and Jhelum have very similar orbital properties and thus may represent different orbital wraps of the same stream (e.g. Shipp et al. 2018; Bonaca et al. 2019).

Therefore, due to their very similar orbits and metallicities, but lack of a direct spatial connection between Jhelum and NGC 5024/M 53, we propose that NGC 5024/M 53 is the progenitor of the Jhelum stream, and classify this as a tentative association in Table 3.

5.3.5 Elqui – Segue I

Elqui is metal-poor and has a broad metallicity distribution, with [Fe/H] ~ −2.22 ± 0.27 (Li et al. 2022). The spectroscopic sample used in this work has a range of −3.1 < [Fe/H] < −1.5. Therefore, due to its broad metallicity distribution, Elqui is likely to be the stellar stream of a dwarf galaxy progenitor (Ji et al. 2020; Li et al. 2022). Segue I is a dwarf galaxy with a measured metallicity of [Fe/H] ~ −2.72 and, according to our orbital analysis, rperi = 21 kpc, rapo = 82 kpc, and e = 0.6. Elqui has 8 kpc < rperi < 20 kpc, 50 kpc < rapo < 85 kpc, and 0.5 < e < 0.7, making their orbital properties very similar.
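As a consistency check on the numbers above, the eccentricity can be recomputed from the pericentre and apocentre. The sketch below assumes the standard definition e = (rapo − rperi)/(rapo + rperi), which this section does not state explicitly. The function and variable names are illustrative.

```python
def eccentricity(r_peri, r_apo):
    """Orbital eccentricity from pericentre and apocentre distances (same units)."""
    return (r_apo - r_peri) / (r_apo + r_peri)


# Segue I orbital parameters quoted in the text (kpc).
segue1 = {"r_peri": 21.0, "r_apo": 82.0}
segue1["e"] = eccentricity(segue1["r_peri"], segue1["r_apo"])

# Ranges quoted in the text for the Elqui stream (kpc, kpc, dimensionless).
elqui_ranges = {"r_peri": (8.0, 20.0), "r_apo": (50.0, 85.0), "e": (0.5, 0.7)}

# Check which Segue I parameters fall strictly inside the Elqui ranges.
inside = {
    key: low < segue1[key] < high
    for key, (low, high) in elqui_ranges.items()
}

print(f"Segue I eccentricity: {segue1['e']:.2f}")
for key, ok in inside.items():
    print(f"{key}: {'inside' if ok else 'outside'} the Elqui range")
```

With these inputs the recomputed eccentricity rounds to the quoted e = 0.6. The apocentre and eccentricity of Segue I fall within the Elqui ranges, while its pericentre of 21 kpc lies just above the upper bound of 20 kpc.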

Figure 4 shows that Elqui and Segue I have similar radial and vertical actions, but fairly discrepant angular momenta. Elqui is on a near polar, slightly retrograde orbit while Segue I is on a highly retrograde orbit. Furthermore, Elqui is currently located at the southern pole of the Galaxy, while Segue I is in the northern hemisphere. Taken together, despite the orbital properties of Elqui and Segue I being somewhat similar, and their compatible [Fe/H] distributions, the likelihood of Segue I being the progenitor of Elqui is weak given the difference in angular momenta of their orbits.

5.3.6 Indus – NGC 2419

In the current analysis, we find a grouping SP-15 which includes Indus, NGC 2419, and Pal 12. The top two panels of Figure 4 show that these three objects are located on prograde orbits at very high orbital energy (E > 20 000 km² s⁻²), with large values of vertical and radial action.

Based on sky position, Pal 12 is favoured as the potential progenitor for Indus, which is located at [l, b] = [−30°, −50°], while NGC 2419 is very far away at [l, b] = [−180°, 25°]. However, based on the metallicity of Indus, [Fe/H] ~ −2.1, NGC 2419 is favoured with [Fe/H] ~ −2.15, while Pal 12 is ruled out because it is much more metal-rich, with [Fe/H] ~ −0.85. Therefore, we exclude Pal 12 as a possible progenitor for Indus, and suggest that NGC 2419 is an unlikely progenitor, but was likely accreted in the same accretion event as Indus.
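The metallicity argument can be made explicit by ranking the candidates by their offset from the stream. This is only an illustrative sketch: the 0.3 dex tolerance is a hypothetical threshold chosen for the example, not a criterion used in this work, and the [Fe/H] values are the approximate ones quoted above.

```python
# Approximate [Fe/H] values quoted in the text.
indus_feh = -2.1
candidates = {"NGC 2419": -2.15, "Pal 12": -0.85}

# Hypothetical tolerance (dex) for calling two metallicities compatible.
MAX_OFFSET_DEX = 0.3

# Absolute metallicity offset of each candidate from the stream.
offsets = {name: abs(feh - indus_feh) for name, feh in candidates.items()}

# Candidates ordered from smallest to largest offset.
ranked = sorted(offsets, key=offsets.get)

for name in ranked:
    verdict = "compatible" if offsets[name] <= MAX_OFFSET_DEX else "excluded"
    print(f"{name}: |delta [Fe/H]| = {offsets[name]:.2f} dex -> {verdict}")
```

Any reasonable choice of tolerance gives the same outcome here, since NGC 2419 differs from Indus by about 0.05 dex and Pal 12 by about 1.25 dex.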

5.3.7 Slidr – Rup 106

In the bottom two panels of Figure 4, the Slidr stream and Rup 106 GC both have similar energy and Lz, but very different vertical actions, Jz. The bottom panel shows a negligible metallicity difference, with Slidr having [Fe/H] ~ −1.7 (Ibata et al. 2024) and Rup 106 [Fe/H] ~ −1.68, which is likely why they clustered so tightly in the latent space. However, their sky positions do not support a stream-progenitor connection. Therefore, due to their spatial separation and differences in Jz, it is not likely that Rup 106 is the progenitor of the Slidr stream, although these two objects may have come into the Milky Way with a common host.

5.3.8 Orphan-Chenab – Grus II

The Orphan stream (Grillmair 2006; Belokurov et al. 2007) is a very long cold stream in the Milky Way halo that has been extensively studied, but despite many investigations no obvious progenitor has been found for this stream.

A recent study by Koposov et al. (2019) used a sample of RR Lyrae stars to map out the stream and made some estimates of its progenitor properties. They estimate that the progenitor mass is most likely to be ~4 × 10⁶ M⊙, with a luminosity of LV = 3.8 × 10⁵ L⊙. In another recent work, Hawkins et al. (2023) performed a detailed chemical analysis of members of the Orphan stream and concluded that, based on the [Mg/Al] abundances as well as the metallicity spread, the Orphan stream is not likely to have a GC progenitor but rather a dwarf spheroidal galaxy, and they provide a mass estimate of this progenitor of ~10⁶ M⊙.

Several candidates have previously been investigated as potential progenitors for the Orphan stream, including Segue I and UMa II (Fellhauer et al. 2007; Newberg et al. 2010), but neither has resulted in a convincing connection to the stream. More recently, Koposov et al. (2019) proposed a connection between Grus II and the Orphan stream when they noticed a clear coincidence in sky coordinates, proper motion, and velocity space. They remarked that Grus II is about 10 kpc farther away than the debris from the Orphan stream at the same location. Though they conclude that a connection is likely, they still suggest that further chemical analysis would be required to confirm this.

Our chemo-dynamical analysis with t-SNE also showed a clear connection between the Orphan stream and the Grus II ultra-faint dwarf galaxy. Figure 4 shows that both the Orphan stars and Grus II share very high E and Lz values in the top right portion of the plot. In the Jr–Jz plane, however, Grus II has a significantly higher value of vertical action than the rest of the Orphan stream stars. Grus II has [Fe/H] = −2.5 (Simon et al. 2020), which makes it more metal-poor than the Orphan stream with [Fe/H] = −1.85. This fact alone is not problematic, as the more metal-poor Grus II could still lie in the metal-poor tail of the metallicity distribution of the Orphan stream. However, the fact that the Orphan stream is more metal-rich than its potential progenitor does pose an issue for a connection between the two objects, as pointed out by Prudil et al. (2021). They reasoned that as Grus II interacted with the Milky Way, its outer envelope would have been shed to produce the Orphan stream. Given that these stars are more metal-rich than the remaining core of the UFD, it follows that for Grus II to be the progenitor of the Orphan stream, it would need to have an inverse metallicity gradient. Although inverse metallicity gradients have been observed in high-redshift galaxies (Grossi et al. 2020), they have not yet been observed in any galaxies in the Local Group.

In a recent work, Hansen et al. (2020a) performed a high-resolution analysis of three stars in Grus II, and found very low metallicities and enhanced [Al/Fe] for these stars. However, with just three stars with detailed abundance measurements, a comprehensive chemical comparison to the Orphan stream sample is not currently possible. If future follow-up of Grus II stars shows evidence of an inverse metallicity gradient in the UFD, this would strongly favour the scenario that Grus II is the progenitor of the Orphan stream.

Koposov et al. (2023) performed an in-depth analysis of the stream properties for the Orphan-Chenab stream, and conducted backward orbit integrations of all Milky Way satellite galaxies and GCs. They found that the Orphan-Chenab stream was consistent with several close passages with Grus II (≲0.1 kpc, at low relative velocities of ≲50 km s−1) and therefore suggested that Grus II could have been bound to the same original host as the Orphan-Chenab stream.

Taken with these previous investigations, the current results strongly support a connection between the OC stream and the Grus II UFD and thus also support the scenario that they share the same original host galaxy. Nonetheless, these results are insufficient to claim that Grus II is the progenitor of the OC stream. However, future work to better understand the metallicity gradient in Grus II could offer new insights into this scenario.

5.4 Globular cluster populations

One of the most interesting outcomes of our clustering analysis is the identification of eight seemingly distinct groups of GCs. Each of these groups either clearly belongs to a tagged population (for example Splash, GSE, Thamnos, Sequoia/Arjuna/I’itoi) or forms a tight group in the latent space that is clearly separate from any other group. Therefore, our selection is a robust and effective method for separating Milky Way GCs into similar groups, with only a few cases where GCs fall in between selection groups. Below we discuss how these groups of GCs compare to previous classifications in the literature.

In a recent work, Belokurov & Kravtsov (2023) define a selection criterion in E–Lz space that is calibrated using [Al/Fe] abundances: an empirically derived boundary above which the accreted GCs are located and below which the in situ GCs are located. Looking at the lower left panel of Figure 2, our selection also broadly separates the in situ and accreted groups into different regions of E–Lz space, which qualitatively resembles the selection made in Belokurov & Kravtsov (2023). In that work, they argue that this simple selection is very effective at separating the two populations, but with a small amount of contamination. For example, they label ω Cen and NGC 6273 as in situ, despite both of these GCs being likely remnant nuclear star clusters of accreted galaxies. In our selection, we correctly identify ω Cen as an accreted GC in group G-3 (Thamnos), but also erroneously classify NGC 6273 as in situ as part of group G-8.

Broadly speaking, groups G-1 (Splash), G-6, G-7, and G-8 make up the in situ GCs, and groups G-2 (GSE), G-3 (Thamnos), G-4 (Sequoia/Arjuna/I’itoi), and G-5 (accreted structures) make up the accreted GCs. This is consistent with our previous classification of these groups in Table 2, and is apparent from the orbital parameters and [Fe/H] distributions shown in Figure 2. More specifically, all GCs in groups G-1, G-6, G-7 and G-8 lie at highly bound orbital energies, are on disc-like, prograde orbits with low radial and vertical actions, and have higher metallicities than the other groups: all of them have mean [Fe/H] > −1.3, with the exception of the Splash group, which has a mean [Fe/H] ~ −1.78. In total, of the 147 GCs in our sample, we find 80 of them (54%) to be consistent with an in situ origin, 64 (44%) to be accreted, 2 (1.4%) to be unbound, and 1 (0.7%) unclassified. By comparison, Belokurov & Kravtsov (2024) find a somewhat lower accreted percentage of 35%, Sun et al. (2023) find 38.4%, Forbes (2020) finds 54%, and Massari et al. (2019) find 60%. We note that the 44% of Milky Way GCs that we report to be accreted assumes that all of the GCs in group G-5 are accreted, even though we can only attribute some of them to known accretion events.
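The percentages in this paragraph follow directly from the GC counts. The short sketch below recomputes them and compares the accreted fraction with the literature values quoted above. The dictionary layout and variable names are illustrative choices, not part of the analysis pipeline.

```python
# GC counts per category, as quoted in the text (147 GCs in total).
counts = {"in situ": 80, "accreted": 64, "unbound": 2, "unclassified": 1}
total = sum(counts.values())

# Percentage of the full GC sample in each category.
percent = {label: 100.0 * n / total for label, n in counts.items()}

# Accreted percentages quoted from the literature in the same paragraph.
literature = {
    "Belokurov & Kravtsov (2024)": 35.0,
    "Sun et al. (2023)": 38.4,
    "Forbes (2020)": 54.0,
    "Massari et al. (2019)": 60.0,
}

# Difference (in percentage points) between each literature value and this work.
offsets = {ref: value - percent["accreted"] for ref, value in literature.items()}

for label, value in percent.items():
    print(f"{label:>12s}: {value:5.1f}%")
for ref, delta in offsets.items():
    print(f"{ref}: {delta:+.1f} percentage points relative to this work")
```

The accreted fraction found here sits between the lowest and highest of the quoted literature estimates.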

For the accreted groups belonging to G-2, G-3, G-4, and G-5, we can attribute the majority of these (59 GCs or 41% of the GC sample) to individual accretion events, which we expect to have distinct chemical and dynamical properties. In addition, the different in situ groups also show different characteristics. For example, the bottom panel of Figure 2 shows different values for the mean metallicity of these groups, although there is a very large spread in these distributions and groups G-6, G-7 and G-8 are all overlapping, but the Splash group G-1 is noticeably more metal-poor than the others. Furthermore, there appears to be some separation in the E–Lz space, with groups G-6 and G-7 sitting lower in the Galactic potential at lower values of orbital energy than groups G-1 and G-8, although this is difficult to see due to crowding in this region of the plot. In order to investigate these GC populations beyond Figure 2, we also supplemented this analysis with Figure B.1 in Appendix B showing the sky distribution in Galactic coordinates, and a plot of the Galactic apocentre and pericentre distances for all of the GCs in the sample.

Group G-6 (brown squares in Figure C.1) is centrally located in the Galaxy, with all of the GCs in this group having rapo ≲ 4 kpc, thus representing the bulge GCs. This group makes up 12% of the GC sample, which is somewhat lower than the 23% that Massari et al. (2019) attribute to the bulge, where they consider all GCs with rapo ≲ 3.5 kpc to be part of this population. It is interesting that, when using a multi-dimensional classification approach, several GCs with small apocentre distances are classified into other groups. Future work should investigate whether this corresponds to differences in chemistry and ages in these populations as well.

Group G-7 (dark green squares) extends beyond the Galactic centre, and consists of GCs with orbits that are largely circular, with 1 > e > 0.5. Groups G-8 and G-1 are both centrally located in the Galaxy, although they extend beyond group G-6, out to rapo ≲ 10 kpc and have somewhat eccentric orbits with e ≃0.5.

Using tangential velocity distributions, Belokurov & Kravtsov (2024) have suggested that the in situ Milky Way GC population can be separated into two groups. The first, GCs at [Fe/H] < −1.3, show low tangential velocities and likely formed during the turbulent pre-disc stages of Milky Way evolution. The second, GCs at [Fe/H] > −1.3, show much higher tangential velocities, suggesting that they formed after the formation of the disc and therefore show this disc spin-up signature. This could explain the formation mechanism of our observed groups G-7 and G-8, where a later formation time of GCs in G-7 would explain their more circularised orbits and a distribution that extends farther out from the Galactic centre. Therefore, we define the in situ GCs from group G-7 as the post-disc GCs, and the GCs from group G-8 as the pre-disc GCs, signifying that they were likely to have formed after and before the formation of the Milky Way disc, respectively.

5.5 Omega Centauri

The origin of ω Cen (NGC 5139), the most massive globular cluster in the Milky Way, has for many years been the subject of considerable debate. Recently, Massari et al. (2019) and Bonaca et al. (2021) have proposed that ω Cen could be dynamically associated with the GSE merger event, or be the nuclear core of the GSE galaxy. Alternatively, Myeong et al. (2019) and Forbes (2020) have suggested an association between ω Cen and the Sequoia structure, and proposed that ω Cen might represent either the most massive GC of the Sequoia progenitor, or the nuclear core of the progenitor itself.

At the time of the discovery of Sequoia, Myeong et al. (2019) described the structure as extending down to low orbital energies consistent with those of ω Cen. However, a subsequent work by Koppelman et al. (2019a) later showed that Sequoia could be broken up into at least three different structures, Sequoia, Thamnos 1 and Thamnos 2, with the Thamnos structures occupying a lower orbital energy than Sequoia. They supported this result based on the differing abundances between the three structures and the expected mass of the Sequoia not being large enough to account for such a large distribution in E–Lz space. Based on clustering in integrals of motion space, Thamnos 1 and Thamnos 2 were later proposed to be one single structure, but with a complex chemistry representing a mix of stellar populations. Ultimately, Koppelman et al. (2019a) conclude that ω Cen is below the orbital energy of Thamnos in the E–Lz space and do not claim a connection between them.

Our current analysis, however, suggests that the chemo-dynamical properties of ω Cen and Fimbulthul align most closely with the Thamnos structure (Group G-3). This is supported by the similar, broad metallicity distributions of ω Cen and Thamnos (with peak [Fe/H] ≈ −1.5), as well as similar orbital parameters, particularly that both Thamnos and ω Cen are on retrograde orbits and have lower orbital energies than those of Sequoia stars.

Furthermore, the proposed mass of the Thamnos structure is M* < 5 × 10⁶ M⊙, which is remarkably similar to the present-day stellar mass of ω Cen at M* = 4 × 10⁶ M⊙. This would imply that, if this scenario is true, ω Cen would have lost up to roughly half of its original mass during its interactions with the Milky Way, and that this material is now visible as the Thamnos structure.
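The mass-loss estimate can be reproduced with a one-line calculation. The sketch below rests on a simplifying assumption that is not tested here: that every Thamnos star was stripped from the ω Cen progenitor, so that the original stellar mass is the sum of the two quoted masses. Because the Thamnos mass is an upper limit, the result is an upper limit too.

```python
# Stellar masses in solar masses, as quoted in the text.
M_THAMNOS_MAX = 5e6   # upper limit on the stellar mass of the Thamnos structure
M_OMEGA_CEN = 4e6     # present-day stellar mass of omega Cen

# Assumption: all Thamnos stars once belonged to the omega Cen progenitor.
m_initial_max = M_THAMNOS_MAX + M_OMEGA_CEN

# Upper limit on the fraction of the original stellar mass that was stripped.
f_lost_max = M_THAMNOS_MAX / m_initial_max

print(f"Maximum original stellar mass: {m_initial_max:.1e} Msun")
print(f"Maximum stripped fraction:     {f_lost_max:.2f}")
```

Under this assumption the stripped fraction is at most 5/9 ≈ 0.56, in line with the statement that ω Cen would have lost about half of its mass.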

6 Conclusions

This work represents a large-scale clustering analysis of known substructures in the Milky Way halo. We gathered a dataset consisting of 6D phase space + [Fe/H] for 147 GCs, 30 satellite Local Group galaxies, and candidate stars from 49 unique stellar streams from the STREAMFINDER and S5 spectroscopic follow-up samples, supplemented with data for the LMS-1, Wukong and C-19 streams and with cleanly selected samples of the GSE, Splash, and Thamnos halo structures from the GALAH DR4 data (from Kushniruk et al. 2024).

We computed the orbital parameters E, Jr, Jz, Lz, rapo, rperi, and eccentricity using Galpy, which we complemented with [Fe/H] values from available spectroscopic observations. We then standardised these seven parameters and used them as input into the t-SNE dimensionality reduction algorithm. Although the orbits are fully characterised by the three actions Jr, Jz, and Lz, we find that adding further input parameters that are important for separating groups in dynamical space added relevant weighting to the inputs and empirically improved the clustering in the latent space, and therefore we chose to include them. We selected groups of these structures identified from the latent space as being related in their kinematics and chemistry, resulting in 9 main groups, 16 subgroups, and 20 stream-progenitor associations, as summarised in Table D.1 and Table 3. We then discussed these groups in the context of previously described structures from the literature and the implications of these findings for our understanding of the accretion history of the Milky Way halo. The main findings from this analysis are as follows:

  • We recovered several previously described halo structures, including GSE, Thamnos, Sequoia/Arjuna/I’itoi, LMS-1/Wukong, Sagittarius, Helmi streams, Kraken/Koala, and Pontus;

  • We also confirmed several established stream-progenitor associations, including ω Cen – Fimbulthul, Gjöll – NGC 3201, NGC 6101 (GC) – NGC 6101 (stream), Fjörm – NGC 4590/M 68, Svol – NGC 5272/M 3, NGC 288 (GC) – NGC 288 (stream), LMS-1 – NGC 5053, NGC 6397 (GC) – NGC 6397 (stream), Tucana III (DG) – Tuc-III (stream), NGC 6341/M 92 (GC) – M 92 (stream), NGC 2808 (GC) – NGC 2808 (stream), and NGC 1851 (GC) – NGC 1851 (stream);

  • We report several new tentative stream-progenitor associations: Hrid – Pal 2, Ophiuchus – NGC 5634, Jhelum – NGC 5024/M 53, Elqui – Segue I, and Slidr – Rup 106;

  • Most notably, we found a connection between the Orphan stream and the ultra-faint dwarf galaxy Grus II, which after further analysis we conclude are unlikely to be a true stream-progenitor pair but were likely to have shared a common progenitor;

  • We determined an accreted GC fraction of 44%, which is in line with previous determinations. Furthermore, we identified four distinct groups of in situ GCs: bulge GCs (G-6), post-disc GCs (G-7), pre-disc GCs (G-8), and GCs associated with the splashed disc population (G-1);

  • We found evidence of a substructure within the GSE accretion event, namely, five potential subgroups (sg-1, sg-2, sg-3, sg-4, and sg-5). Upon further investigation of orbital characteristics and metallicity distributions, we conclude that subgroups sg-2 and sg-5 are not part of the GSE but are the Kraken/Koala and Pontus structures, respectively. Of the remaining subgroups, we found insufficient evidence to show clear separate populations within the GSE clusters, but our results still suggest that GSE itself may have had internal structure before merging with the Milky Way, and this should be investigated further in future work;

  • We found a chemo-dynamical association between ω Cen and the Thamnos structure. Previous works have suggested that ω Cen is dynamically associated with GSE (Bonaca et al. 2021), while others have suggested that it could be the remnant core of the retrograde Sequoia/Arjuna/I’itoi (Myeong et al. 2019; Forbes 2020). However, in this work, we find that the chemo-dynamics of ω Cen are most consistent with it being associated with Thamnos and suggest that it may even be the remnant core of the progenitor that brought in the Thamnos stars. Future, more detailed N-body simulations and chemical analyses, for example of neutron capture elements, could shed light on this hypothesis.
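The standardisation step that precedes the t-SNE embedding, described at the start of this section, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the code used in this work: the function name is hypothetical and the three rows of input values are invented purely to make the example runnable. The standardised table would then be passed to a t-SNE implementation (for example `sklearn.manifold.TSNE`, which is an assumption about tooling, not something stated in the text).

```python
from statistics import mean, pstdev


def standardise(rows):
    """Rescale each column of a table to zero mean and unit standard deviation.

    rows is a list of equal-length lists, one per object. Every column is
    assumed to have non-zero spread; constant columns are not handled here.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    sigmas = [pstdev(col) for col in columns]
    return [
        [(value - mu) / sigma for value, mu, sigma in zip(row, means, sigmas)]
        for row in rows
    ]


# Column order: E, Jr, Jz, Lz, rapo, rperi, e, [Fe/H].
# The numbers below are invented for illustration and are NOT from the paper.
features = [
    [-1.2e5, 300.0, 150.0, -800.0, 12.0, 3.0, 0.60, -1.5],
    [-0.9e5, 900.0, 400.0, 1500.0, 30.0, 8.0, 0.58, -2.1],
    [-1.6e5, 50.0, 20.0, 600.0, 4.0, 1.0, 0.55, -0.7],
]

standardised = standardise(features)

for row in standardised:
    print(" ".join(f"{value:+.2f}" for value in row))
```

Standardising first prevents quantities with large numerical ranges, such as the orbital energy, from dominating the pairwise distances that t-SNE works with.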

Associations between substructures in the Galactic halo provide valuable information for the accretion history of our Galaxy and the identification of common progenitors for Milky Way stars. These findings provide valuable constraints on the properties of past Milky Way mergers, informing chemical evolution models and dynamical simulations of galaxy formation and ultimately enhancing our understanding of the formation and evolution of the Galaxy. Future work with larger samples of high-precision abundance measurements and detailed spectroscopic follow-up will help further refine these associations and resolve remaining ambiguities in classification.

Acknowledgements

KY and KL thank the European Research Council (ERC) for providing funds under the European Union’s Horizon 2020 research and innovation programme (grant agreement number 852977). KY would like to thank Fabio De Ferrari for many productive co-working days and valuable feedback which provided crucial support and motivation for the completion of this manuscript. KY would also like to thank Fägelängens catering and the Albanova cafeteria and staff, for providing delicious lunches and the daily energy required to conduct this research. This work has made use of the typesetting software Overleaf, the plotting and table handling environment TOPCAT (Taylor 2005), and extensively used the PYTHON programming language (Python Core Team 2019) for the analysis, including the following packages: MATPLOTLIB (Hunter 2007), SCIPY (Virtanen et al. 2020), NUMPY (Harris et al. 2020), PANDAS (McKinney 2010), and ASTROPY (Astropy Collaboration 2013, 2018, 2022).

References

  1. Anders, F., Chiappini, C., Santiago, B. X., et al. 2018, A&A, 619, A125
  2. Anders, F., Khalatyan, A., Queiroz, A. B. A., et al. 2022, A&A, 658, A91
  3. Astropy Collaboration (Robitaille, T. P., et al.) 2013, A&A, 558, A33
  4. Astropy Collaboration (Price-Whelan, A. M., et al.) 2018, AJ, 156, 123
  5. Astropy Collaboration (Price-Whelan, A. M., et al.) 2022, ApJ, 935, 167
  6. Belokurov, V., & Kravtsov, A. 2023, MNRAS, 525, 4456
  7. Belokurov, V., & Kravtsov, A. 2024, MNRAS, 528, 3198
  8. Belokurov, V., Evans, N. W., Irwin, M. J., et al. 2007, ApJ, 658, 337
  9. Belokurov, V., Erkal, D., Evans, N. W., Koposov, S. E., & Deason, A. J. 2018, MNRAS, 478, 611
  10. Belokurov, V., Sanders, J. L., Fattahi, A., et al. 2020, MNRAS, 494, 3880
  11. Bernard, E. J., Ferguson, A. M. N., Schlafly, E. F., et al. 2014, MNRAS, 443, L84
  12. Bianchini, P., Ibata, R., & Famaey, B. 2019, ApJ, 887, L12
  13. Bland-Hawthorn, J., Krumholz, M. R., & Freeman, K. 2010, ApJ, 713, 166
  14. Bonaca, A., Conroy, C., Price-Whelan, A. M., & Hogg, D. W. 2019, ApJ, 881, L37
  15. Bonaca, A., Naidu, R. P., Conroy, C., et al. 2021, ApJ, 909, L26
  16. Bovy, J. 2015, ApJS, 216, 29
  17. Buder, S., Kos, J., Wang, X. E., et al. 2025, PASA, 42, 051
  18. Caldwell, N., Bonaca, A., Price-Whelan, A. M., Sesar, B., & Walker, M. G. 2020, AJ, 159, 287
  19. Cantelli, E., & Teixeira, R. 2024, MNRAS, 530, 2648
  20. Carballo-Bello, J. A., Martínez-Delgado, D., Navarrete, C., et al. 2018, MNRAS, 474, 683
  21. Casamiquela, L., Castro-Ginard, A., Anders, F., & Soubiran, C. 2021, A&A, 654, A151
  22. Cheng, C. M., Price-Jones, N., & Bovy, J. 2021, MNRAS, 506, 5573
  23. Dinescu, D. I., Girard, T. M., van Altena, W. F., Mendez, R. A., & Lopez, C. E. 1997, AJ, 114, 1014
  24. Drlica-Wagner, A., Bechtol, K., Rykoff, E. S., et al. 2015, ApJ, 813, 109
  25. Fellhauer, M., Evans, N. W., Belokurov, V., et al. 2007, MNRAS, 375, 1171
  26. Forbes, D. A. 2020, MNRAS, 493, 847
  27. Forbes, D. A., & Bridges, T. 2010, MNRAS, 404, 1203
  28. Freeman, K., & Bland-Hawthorn, J. 2002, ARA&A, 40, 487
  29. Gaia Collaboration (Prusti, T., et al.) 2016, A&A, 595, A1
  30. Gaia Collaboration (Brown, A. G. A., et al.) 2021, A&A, 649, A1
  31. Gnedin, O. Y., & Ostriker, J. P. 1997, ApJ, 474, 223
  32. Grillmair, C. J. 2006, ApJ, 645, L37
  33. Grossi, M., García-Benito, R., Cortesi, A., et al. 2020, MNRAS, 498, 1939
  34. Hansen, T. T., Marshall, J. L., Simon, J. D., et al. 2020a, ApJ, 897, 183
  35. Hansen, T. T., Riley, A. H., Strigari, L. E., et al. 2020b, ApJ, 901, 23
  36. Harris, W. E. 1996, AJ, 112, 1487
  37. Harris, C. R., Millman, K. J., van der Walt, S. J., et al. 2020, Nature, 585, 357
  38. Hawkins, K., Lucey, M., Ting, Y.-S., et al. 2020, MNRAS, 492, 1164
  39. Hawkins, K., Price-Whelan, A. M., Sheffield, A. A., et al. 2023, ApJ, 948, 123
  40. Haywood, M., Di Matteo, P., Lehnert, M. D., et al. 2018, ApJ, 863, 113
  41. Helmi, A., White, S. D. M., de Zeeuw, P. T., & Zhao, H. 1999, Nature, 402, 53
  42. Helmi, A., Babusiaux, C., Koppelman, H. H., et al. 2018, Nature, 563, 85
  43. Hinton, G. & Roweis, S. 2002, Adv. Neural Process. Syst., 15, 833
  44. Horta, D., Schiavon, R. P., Mackereth, J. T., et al. 2020, MNRAS, 493, 3363
  45. Hughes, A. C. N., Spitler, L. R., Zucker, D. B., et al. 2022, ApJ, 930, 47
  46. Hunter, J. D. 2007, Comput. Sci. Eng., 9, 90
  47. Ibata, R. A., Gilmore, G., & Irwin, M. J. 1994, Nature, 370, 194
  48. Ibata, R. A., Bellazzini, M., Malhan, K., Martin, N., & Bianchini, P. 2019, Nat. Astron., 3, 667
  49. Ibata, R., Malhan, K., Martin, N., et al. 2021, ApJ, 914, 123
  50. Ibata, R., Malhan, K., Martin, N., et al. 2022, VizieR Online Data Catalog: Gaia DR2 and EDR3 stars with sp. follow-up (Ibata+, 2021), VizieR On-line Data Catalog: J/ApJ/914/123. Originally published in: 2021ApJ...914..123I
  51. Ibata, R., Malhan, K., Tenachi, W., et al. 2024, ApJ, 967, 89
  52. Ji, A. P., Li, T. S., Hansen, T. T., et al. 2020, AJ, 160, 181
  53. Koposov, S. E., Belokurov, V., Li, T. S., et al. 2019, MNRAS, 485, 4726
  54. Koposov, S. E., Erkal, D., Li, T. S., et al. 2023, MNRAS, 521, 4936
  55. Koppelman, H. H., Helmi, A., Massari, D., Price-Whelan, A. M., & Starkenburg, T. K. 2019a, A&A, 631, L9
  56. Koppelman, H. H., Helmi, A., Massari, D., Roelenga, S., & Bastian, U. 2019b, A&A, 625, A5
  57. Koppelman, H. H., Bos, R. O. Y., & Helmi, A. 2020, A&A, 642, L18
  58. Kos, J., Bland-Hawthorn, J., Freeman, K., et al. 2018, MNRAS, 473, 4612
  59. Kruijssen, J. M. D., Pfeffer, J. L., Chevance, M., et al. 2020, MNRAS, 498, 2472
  60. Kundu, R., Navarrete, C., Fernández-Trincado, J. G., et al. 2021, A&A, 645, A116
  61. Kushniruk, I., Youakim, K., & Lind, K. 2024, A&A, submitted
  62. Lane, J. M. M., Navarro, J. F., Fattahi, A., Oman, K. A., & Bovy, J. 2020, MNRAS, 492, 4164
  63. Leaman, R., VandenBerg, D. A., & Mendel, J. T. 2013, MNRAS, 436, 122
  64. Li, T. S., Koposov, S. E., Zucker, D. B., et al. 2019, MNRAS, 490, 3508
  65. Li, T. S., Ji, A. P., Pace, A. B., et al. 2022, ApJ, 928, 30
  66. Limberg, G., Ji, A. P., Naidu, R. P., et al. 2024, MNRAS, 530, 2512
  67. Lindegren, L., Klioner, S. A., Hernández, J., et al. 2021, A&A, 649, A2
  68. Malhan, K. 2022, ApJ, 930, L9
  69. Malhan, K., & Ibata, R. A. 2018, MNRAS, 477, 4063
  70. Malhan, K., Ibata, R. A., & Martin, N. F. 2018, MNRAS, 481, 3442
  71. Malhan, K., Ibata, R. A., Carlberg, R. G., et al. 2019, ApJ, 886, L7
  72. Malhan, K., Yuan, Z., Ibata, R. A., et al. 2021, ApJ, 920, 51
  73. Malhan, K., Ibata, R. A., Sharma, S., et al. 2022, ApJ, 926, 107
  74. Martell, S. L., Shetrone, M. D., Lucatello, S., et al. 2016, ApJ, 825, 146
  75. Martin, N. F., Venn, K. A., Aguado, D. S., et al. 2022, Nature, 601, 45
  76. Massari, D., Koppelman, H. H., & Helmi, A. 2019, A&A, 630, L4
  77. Mateu, C. 2023, MNRAS, 520, 5225
  78. McConnachie, A. W., & Venn, K. A. 2020a, AJ, 160, 124
  79. McConnachie, A. W., & Venn, K. A. 2020b, RNAAS, 4, 229
  80. Meza, A., Navarro, J. F., Abadi, M. G., & Steinmetz, M. 2005, MNRAS, 359, 93
  81. Myeong, G. C., Evans, N. W., Belokurov, V., Sanders, J. L., & Koposov, S. E. 2018a, MNRAS, 478, 5449
  82. Myeong, G. C., Evans, N. W., Belokurov, V., Sanders, J. L., & Koposov, S. E. 2018b, ApJ, 863, L28
  83. Myeong, G. C., Vasiliev, E., Iorio, G., Evans, N. W., & Belokurov, V. 2019, MNRAS, 488, 1235
  84. Naidu, R. P., Conroy, C., Bonaca, A., et al. 2020, ApJ, 901, 48
  85. Newberg, H. J., Willett, B. A., Yanny, B., & Xu, Y. 2010, ApJ, 711, 32
  86. Ortigoza-Urdaneta, M., Vieira, K., Fernández-Trincado, J. G., et al. 2023, A&A, 676, A140
  87. Palau, C. G., & Miralda-Escudé, J. 2019, MNRAS, 488, 1535
  88. Price-Jones, N., & Bovy, J. 2019, MNRAS, 487, 871
  89. Prudil, Z., Hanke, M., Lemasle, B., et al. 2021, A&A, 648, A78
  90. Python Core Team 2019, Python: A dynamic, open source programming language, Python Software Foundation
  91. Queiroz, A. B. A., Anders, F., Santiago, B. X., et al. 2018, MNRAS, 476, 2556
  92. Riley, A. H., & Strigari, L. E. 2020, MNRAS, 494, 983
  93. Santiago, C., Chaushev, A., & Sallum, S. 2022, in American Astronomical Society Meeting Abstracts, 240, 418.08
  94. Sesar, B., Bovy, J., Bernard, E. J., et al. 2015, ApJ, 809, 59
  95. Sestito, F., Martin, N. F., Starkenburg, E., et al. 2020, MNRAS, 497, L7
  96. Shipp, N., Drlica-Wagner, A., Balbinot, E., et al. 2018, ApJ, 862, 114
  97. Simon, J. D., Li, T. S., Erkal, D., et al. 2020, ApJ, 892, 137
  98. Sollima, A. 2020, MNRAS, 495, 2222
  99. Sun, G., Wang, Y., Liu, C., et al. 2023, Res. Astron. Astrophys., 23, 015013 [CrossRef] [Google Scholar]
  100. Taylor, M. B. 2005, in Astronomical Society of the Pacific Conference Series, 347, Astronomical Data Analysis Software and Systems XIV, eds. P. Shopbell, M. Britton, & R. Ebert, 29 [Google Scholar]
  101. Thomas, G. F., Jensen, J., McConnachie, A., et al. 2020, ApJ, 902, 89 [NASA ADS] [CrossRef] [Google Scholar]
  102. Traven, G., Matijevič, G., Zwitter, T., et al. 2017, ApJS, 228, 24 [NASA ADS] [CrossRef] [Google Scholar]
  103. Traven, G., Feltzing, S., Merle, T., et al. 2020, A&A, 638, A145 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  104. Ulyanov, D. 2016, Multicore-TSNE, https://github.com/DmitryUlyanov/Multicore-TSNE [Google Scholar]
  105. van der Maaten, L. 2013, arXiv e-prints [arXiv:1301.3342] [Google Scholar]
  106. van der Maaten, L., & Hinton, G. 2008, J. Mach. Learn. Res., 9, 2579
  107. Vasiliev, E. 2019, MNRAS, 484, 2832
  108. Virtanen, P., Gommers, R., Oliphant, T. E., et al. 2020, Nat. Methods, 17, 261
  109. McKinney, W. 2010, in Proceedings of the 9th Python in Science Conference, eds. S. van der Walt, & J. Millman, 56
  110. Yang, Y., Zhao, J.-K., Tang, X.-Z., Ye, X.-H., & Zhao, G. 2023, ApJ, 953, 130
  111. Youakim, K., Lind, K., & Kushniruk, I. 2023, MNRAS, 524, 2630
  112. Yuan, Z., Chang, J., Beers, T. C., & Huang, Y. 2020, ApJ, 898, L37
  113. Yuan, Z., Martin, N. F., Ibata, R. A., et al. 2022, MNRAS, 514, 1664
  114. Zinn, R. 1993, in Astronomical Society of the Pacific Conference Series, 48, The Globular Cluster-Galaxy Connection, eds. G. H. Smith, & J. P. Brodie, 38

Appendix A Different clustering scales with changing perplexity

Fig. A.1

t-SNE latent space projections for different values of perplexity showing the different scales of clustering used in the selection of groups. Markers are the same as described in Figure 1.
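
As a reminder of why changing the perplexity changes the clustering scale, the following minimal sketch computes the perplexity of the Gaussian neighbour distribution around a single point, using the standard t-SNE definition (2 raised to the Shannon entropy of the conditional probabilities). The distances, the bandwidth values, and the function names are illustrative assumptions and are not taken from the analysis in this work.

```python
import math


def conditional_probabilities(sq_dists, sigma):
    """Gaussian neighbour probabilities p_{j|i} for one point.

    sq_dists: squared distances from the point to every other point.
    sigma: Gaussian bandwidth assigned to the point.
    """
    weights = [math.exp(-d / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity(probs):
    """Perplexity 2**H, with H the Shannon entropy (in bits) of probs."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return 2.0 ** entropy


# Hypothetical squared distances from one star to 200 other stars.
sq_dists = [0.01 * (k + 1) ** 2 for k in range(200)]

# A wider bandwidth spreads the probability over more neighbours,
# so the perplexity (the effective number of neighbours) grows.
for sigma in (0.1, 0.5, 2.0):
    probs = conditional_probabilities(sq_dists, sigma)
    print(f"sigma = {sigma}: perplexity = {perplexity(probs):.1f}")
```

In t-SNE the relation is used in the opposite direction: the user fixes the perplexity and the bandwidth of each point is tuned to match it, so a larger perplexity makes each point weigh a larger neighbourhood and favours larger-scale groupings in the projection.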

Appendix B Groups of globular clusters

Fig. B.1

Top: Sky plot showing only the GCs from the selected groups plotted in Galactic coordinates. Bottom left: Galactic pericentre vs apocentre distances for the GCs. Bottom right: Zoomed-in view of the left panel, with axes according to the black box in the left panel. The grey dashed lines show lines of constant eccentricity.
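
For reference, lines of constant eccentricity in the pericentre-apocentre plane can be sketched as below. This assumes the common orbital definition e = (r_apo - r_peri) / (r_apo + r_peri); the caption does not state which definition was used, and the function names are hypothetical.

```python
def eccentricity(r_peri, r_apo):
    """Orbital eccentricity from pericentre and apocentre distances.

    Assumes e = (r_apo - r_peri) / (r_apo + r_peri), with both
    distances in the same units (e.g. kpc).
    """
    return (r_apo - r_peri) / (r_apo + r_peri)


def apocentre_at_fixed_eccentricity(r_peri, ecc):
    """Apocentre lying on the constant-eccentricity line through r_peri.

    Inverting the definition above gives
    r_apo = r_peri * (1 + e) / (1 - e), a straight line through the origin.
    """
    if not 0.0 <= ecc < 1.0:
        raise ValueError("eccentricity must satisfy 0 <= e < 1 for a bound orbit")
    return r_peri * (1.0 + ecc) / (1.0 - ecc)


# Illustrative pericentre values in kpc (not data from this work).
for ecc in (0.2, 0.5, 0.8):
    line = [(r, apocentre_at_fixed_eccentricity(r, ecc)) for r in (1.0, 2.0, 4.0)]
    print(ecc, line)
```

Under this definition each line has slope (1 + e) / (1 - e), so more eccentric orbits lie on steeper lines in the pericentre-apocentre plane.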

Appendix C Subgroups of GSE globular clusters

Fig. C.1

Top: Globular clusters from the GSE selected subgroups plotted in Galactic coordinates. Middle: Kinematic parameter spaces E–Lz and Jr–Jz are shown in the left and right panels, respectively. Each square symbol represents one GC and the grey points show the rest of the dataset to give context to where these GC populations lie. Bottom: Kernel density estimation of [Fe/H] distributions for each GSE subgroup.
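
A kernel density estimate such as the one in the bottom panel can be sketched as follows. This is a minimal Gaussian-kernel example with a fixed bandwidth; the [Fe/H] values, the bandwidth of 0.1 dex, and the grid are illustrative assumptions, and the kernel and bandwidth actually used for the figure are not specified in the caption.

```python
import math


def gaussian_kde(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate on a grid.

    Each sample contributes a normalised Gaussian of width `bandwidth`,
    so the estimate integrates to one over the real line.
    """
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]


# Hypothetical [Fe/H] values (dex) for the clusters in one subgroup.
feh = [-1.62, -1.55, -1.48, -1.40, -1.35, -1.21]

# Evaluation grid from -3.0 to 0.0 dex in steps of 0.01 dex.
step = 0.01
grid = [-3.0 + step * k for k in range(301)]

density = gaussian_kde(feh, grid, bandwidth=0.1)
peak = grid[density.index(max(density))]
print(f"density peaks near [Fe/H] = {peak:.2f}")
```

With only a handful of clusters per subgroup, the shape of the curve depends strongly on the chosen bandwidth, so comparisons between subgroups are best made at a common bandwidth.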

Appendix D Full table of selected groups

Table D.1

Summary of all of the identified groups and subgroups from the t-SNE latent space selection.
