Online Material

Appendix A: Statistical evaluation of the model

A.1. Univariate tests on individual planet characteristics

Table 6: Mean values and standard deviations of the system parameters for the observed transiting planets and our simulated detections.

In this section, we detail the statistical method and tests that have been used to validate the model. We first perform basic tests of our model with simulations repeating multiple times the number of observations of the OGLE survey in order to get 50 000 detections. This number was chosen as a compromise between statistical significance and computation time. Table 6 compares the mean values and standard variations in the observations and in the simulations. The closeness of the values obtained for the two populations is an indication that our approach provides a reasonably good fit to the real stellar and planetary populations, and to the real planet compositions and evolution.

However, we do require more advanced statistical tests. First, we use the so-called Student's t-test to formally compare the mean values of all characteristics for both types of planets. The intuition is that, should the model yield simulated planets of attributes similar to real planets, the average values of these attributes should not be significantly different from one another. In other words, the so-called null hypothesis H₀ is that the difference of their mean is zero. Posing H₀: $\mu^{r}-\mu^{s}=0$ where superscripts r and s denote real and simulated planets respectively, and the alternative hypothesis $H_{\rm a}$ being the complement $H_{\rm a}$ : $\mu^{r}-\mu^{s}\neq0$ , we compute the tstatistics using the first and second moments of the distribution of each planet characteristics as follows:

$\begin{displaymath}t = \frac{{\left( {\mu_x^{r} - \mu_x^{s} } \right)}}{{\frac{{s_{\rm p} }}{{\sqrt {n_{r} + n_{s}} }}}}, \end{displaymath}$

(2)

where x is each of the planet characteristics, n is the size of each sample, and $s_{\rm p}$ is the square root of the pooled variance accounting for the sizes of the two population samples The statistics follows a tdistribution, from which one can easily derive the two-tailed critical probability that the two samples come from one unique population of planets, i.e. H₀ cannot be rejected. The results are displayed in Table 7 (Note that $\theta$ is the Safronov number; other parameters have their usual meaning). In all cases, the probabilities are greater than 40%, implying that there is no significant difference in the mean characteristics of both types of planets. In other words, the two samples exhibit similar central tendencies.

Next, we perform the Kolmogorov-Smirnov test to allow for a more global assessment of the compatibility of the two populations. This test has the advantage of being non-parametric, making no assumption about the distribution of data. This is particularly important since the number of real planets remains small, which may alter the normality of the distribution. Moreover, the Kolmogorov-Smirnov comparison tests the stochastic dominance of the entire distribution of real planets over simulated planets. To do so, it computes the largest absolute deviations D between F_r(x), the empirical cumulative distribution function of characteristics xfor real planets, and F_s(x) the cumulative distribution function of characteristics x for simulated planets, over the range of values of x: $D = \mathop {\max }\limits_x \left\{ {\left\vert {F_{\rm real} \left( x \right) - F_{\rm sim} \left( x \right)} \right\vert} \right\}$ . If the calculated D-statistic is greater than the critical D^*-statistic (provided by the Kolmogorov-Smirnov table: for 31 observations D^*=0.19 for a 80% confidence level and D^*=0.24 for a 95% confidence level), then one must reject the null hypothesis that the two distributions are similar, H₀: | F_r(x)-F_s(x) | <D^*, and accept $H_{\rm a}: \vert F_{r}(x)-F_{s}(x) \vert \geq D^*$ . Table 8 shows the result of the test. The first column provides the D-Statistics, and the second column gives the probability that the two samples have the same distribution.

Again, we find a good match between the model and observed samples: the parameters that have the least satisfactory fits are the planet's equilibrium temperature and the planet mass. These values are interpreted as being due to imperfections in the assumed star and planet populations. It is important to stress that although the extrasolar planets' main characteristics (period, mass) are well-defined by radial-velocity surveys, the subset of transiting planets is highly biased towards short periods and corresponds to a relatively small sample of the known radial-velocity planet population. This explains why the probability that the planetary mass is drawn from the same distribution in the model and in the observations is relatively low, which may otherwise seem surprising given that the planet mass distribution would be expected to be relatively well defined by the radial-velocity measurements.

A.2. Tests in two dimensions

Tests of the adequation of observations and models in two dimensions, i.e. when considering one parameter compared to another one can be performed using the method of maximum likelihood as described in Paper I. Table 9 provides values of the standard deviations from maximum likelihood for important combinations of parameters. The second column is a comparison using all planets discovered by transit surveys, and the third column using all known transiting planets (including those discovered by radial velocity).

The results are generally good, with deviations not exceeding $1.82\sigma$ . They are also very similar when considering all planets or only the subset discovered by photometric surveys. This shows that the radial-velocity and photometric planet characteristics are quite similar. The mass vs. radius relation shows the highest deviation, as a few planets are outliers of our planetary evolution model.

A.3. Multivariate assessment of the performance of the model

A.3.1. Principle

Tests such as the Student-t statistics and the Kolmogorov-Smirnov test are important to determine the adequacy of given parameters, but they do not provide a multivariate assessment of the model. In order to globally assess the viability of our model we proceed as follows: We generate a list including 50 000 ``simulated'' planets and the 31 ``observed'' giant planets from Table 1. This number is necessary for an accurate multi-variate analysis (see Sect. A.3.2). A dummy variable Y is generated with value 1 if the planet is observed, 0 if the planet is simulated.

In order to test dependencies between parameters, we have presented in Table 3 (Sect. 2.4) the Pearson correlation coefficients between each variable including Y. A first look at the table shows that the method correctly retrieves the important physical correlations without any a priori information concerning the links that exist between the different parameters. For example, the stellar effective temperature $T_{\rm eff}$ is positively correlated to the stellar mass $M_{\star}$ , and radius $R_{\star}$ . It is also naturally positively correlated to the planet's equilibrium temperature $T_{\rm eq}$ , and to the planet's radius $R_{\rm p}$ simply because evolution models predict planetary radii that are larger for larger values of the irradiation, all parameters being equal. Interestingly, it can be seen that although the Safronov number is by definition correlated to the planetary mass, radius, orbital period and star mass (see Eq. (1)), the largest correlation parameters for $\theta$ in absolute value are those related to $M_{\rm p}$ and P (as the range of both these parameters vary by more than one decade, while $M_{\star}$ and $R_{\rm p}$ only vary by a factor of 2). Also, we observe that the star metallicity is only correlated to the planet radius. This is a consequence of our assumption that a planet's heavy element content is directly proportional to the star's [Fe/H], and of the fact that planets with more heavy elements are smaller, all other parameters being equal. The planet's radius is itself correlated negatively with [Fe/H] and positively with $T_{\rm eq}$ , $M_{\star}$ , $R_{\star}$ and $T_{\rm eff}$ . Table 3 also shows the correlations with the ``reality'' parameter. Of course, a satisfactory model is one in which there is no correlation between this reality parameter and other physical parameters of the model. In our case, the corresponding correlation coefficients are always small and indicate a good match between the two populations.

Table 7: Test of equality of means. Student's t value and critical probabilities p that individual parameters for both real and simulated planets have the same sample mean.

Table 8: Kolmogorov-Smirnov tests. D-statistics and critical probabilities that individual parameters for both real and simulated planets have the same distribution.

Table 9: Standard deviations from maximum likelihood of the model and observed transiting planet populations

Obviously the unconditional probability that a given planet is real is $\Pr(Y=1)=31/50~031\simeq.00062$ . Now we wish to know whether this probability is sensitive to any of the planet characteristics, controlling for all planet characteristics at once. Hence we model the probability that a given planet is ``real'' using the logistic cumulative density function as follows:

$\begin{displaymath} \Pr(Y = 1\vert{\vec{X}_i}) = \frac{{{\rm e}^{{\vec{X}_i\vec{b} }} }}{{1 + {\rm e}^{{\vec{X}_i\vec{b} }} }} \end{displaymath}$

(4)

where $\vec{X}_i$ is the vector of explanatory variables (i.e. planet characteristics) for the planet i (real or simulated), and ${\vec b}$ is the vector of parameter to be estimated, and $\vec{X}_i\vec{b}\equiv b_0 + \sum_j X_{ij}b_j$ , and b₀ is a constant. There are n events to be considered (i=1..n) and mexplanatory variables (j=1..m).

Importantly, an ordinary least square estimator should not be used in this framework, due to the binary nature of the dependent variables. Departures from normality and predictions outside the range [0;1] are the quintessential motivations. Instead, Eq. (4) can be estimated using maximum likelihood methods. The so-called logit specification (Greene 2000) fits the parameter estimates ${\vec b}$ so as to maximize the log likelihood function:

$\begin{displaymath}\log L({\vec{Y}}\vert{\vec{X,{b}}}) = \sum\limits_{i = 1}^n {... ...^n {\log \left[ {1 + {\rm e}^{{\vec{X}_i\vec{b} }} } \right]}. \end{displaymath}$

(5)

The $\log L$ function is then maximized choosing $\vec{\hat{b} }$ such that ${\partial \log L(y_i,{\vec{X}_i,\vec{\hat{b}}})}/{\partial\vec{\hat{b} }}=0$ , using a Newton-Raphson algorithm. The closer the coefficients $\hat{b}_1,\hat{b}_2,..,\hat{b}_m$ are to 0, the closer the model is to the observations. Conversely, a coefficient that is significantly different from zero tells us that there is a correlation between this coefficient and the probability of a planet being ``real'', i.e. the model is not a good match to the observations.

Two features of logistic regression using maximum likelihood estimators are important. First, the value added by the exercise is that the multivariate approach allows us to hold all other planet characteristics constant, extending the bivariate correlations to the multivariate case. In other words, we control for all planet characteristics at once. Second, one can test whether a given parameter estimate is equal to 0 with the usual null hypothesis H₀: b=0 versus $H_{\rm a}$ : ${b}\neq 0$ . The variance of the estimator is used to derive the standard error of the parameter estimate. Using Eq. (6), dividing each variable $\hat{{b}}_j$ by the standard error ${\rm s.e.}(\hat{{b}}_j)$ yields the t-statistics and allows us to test H₀. We note ${\cal P}_j$ the probability that a higher value of t would occur by chance. This probability is evaluated for each explanatory variable j. Should our model perform well, we would expect the t value of each parameter estimate to be null, and the corresponding probability ${\cal P}_j$ to be close to one. This would imply no significant association between a single planet characteristics and the event of being a ``real'' planet.

The global probability that the model and observations are compatible can be estimated. To do so, we compute the log likelihood obtained when b_j=0for j=1..m, where m is the number of variables. Following Eq. (6):

$\begin{displaymath} \log L({\vec{Y}}\vert 1,b_0) = \sum\limits_{i = 1}^n {y_i b_... ...um\limits_{i = 1}^n {\log \left[ {1 + {\rm e}^{b_0}} \right]}. \end{displaymath}$

(6)

The maximum of this quantity is $\log L_0=n_0\log(n_0/n)+n_1\log(n_{1}/n)$ , where n₀ is the number of cases in which y=0 and n₁ is the number of observations with y=1. L₀ is thus the maximum likelihood obtained for a model which is in perfect agreement with the observations (no explanatory variable is correlated to the probability of being real). Now, it can be shown that the likelihood statistic ratio

$\begin{displaymath}c_{\rm LL}=2 (\log L_1 - \log L_0) \end{displaymath}$

(7)

follows a $\chi ^2$ distribution for a number of degrees of freedom mwhen the null hypothesis is true (Aldrich & Nelson 1984). The probability that a sum of m normally distributed random variables with mean 0 and variance 1 is larger than a value $c_{\rm LL}$ is:

$\begin{displaymath}{\cal P}_{\chi^2}= P(m/2,c_{\rm LL}/2), \end{displaymath}$

(8)

where P(k,z) is the regularized Gamma function (e.g. Abramowitz & Stegun (1964)). ${\cal {P}}_{\chi ^2}$ is thus the probability that the model planets and the observed planets are drawn from the same distribution.

A.3.2. Determination of the number of model planets required

A problem that arose in the course of the present work was to evaluate the number of model planets that were needed for the logit evaluation. It is often estimated that about 10 times more model points than observations are sufficient for a good tests. We found that this relatively small number of points indeed leads to a valid identification of the explanatory variables that are problematic, i.e. those for which the $\hat{b}$ coefficient is significantly different from 0 (if any). However, the evaluation of the global $\chi ^2$ probability was then found to show considerable statistical variability, probably given the relatively large number of explanatory variables used for the study.

In order to test how the probability ${\cal {P}}_{\chi ^2}$ depends on the size n of the sample to be analyzed, we first generated a very large list of N₀ simulated planets with CoRoTlux. We generated with Monte-Carlo simulations a smaller subset of $n_0\le N_0$ simulated planets that was augmented by the n₁=31 observed planets and computed ${\cal {P}}_{\chi ^2}$ using the logit procedure. This exercise was performed 1000 times, and the results are shown in Fig. 13. The resulting ${\cal {P}}_{\chi ^2}$ is found to be very variable for a sample smaller than $\sim$ 20 000 planets. As a consequence, we chose to present tests performed for n₀=50 000 model planets.

$\begin{figure} \par\includegraphics[width=6.5cm,clip]{10097f14.eps} \end{figure}$	Figure 13: Values of the $\chi ^2$ probability, ${\cal {P}}_{\chi ^2}$ (see text) obtained after a logit analysis as a function of the size of the sample of model planets n₀.
Open with DEXTER

A.3.3. Analysis of two CoRoTlux samples

Table 4 (see Sect. 2.4) reports the parameter estimates for each of the planet/star characteristics. We start by assessing the general quality of the logistic regression by performing the chi-square test. If the vector of planet characteristics brings no or little information as to which type of planets a given observation belongs, we would expect the logistic regression to perform badly. In technical terms, we would expect the conditional probability $\Pr(Y = 1\vert{\vec{X}})$ to be equal to the unconditional probability $\Pr(Y = 1)$ . The $\chi ^2$ test described above is used to evaluate the significance of the model.

We performed several tests: the first column of results in Table 10 shows the result of a logit analysis with the whole series of 9 explanatory variables. Globally, the model behaves well, with a likelihood statistic ratio $c_{\rm LL}=5.8$ and a $\chi ^2$ distribution for 9 degrees of freedom yielding a probability ${\cal{P}}_{\chi^2}=0.758$ . When examining individual variables, we find that the lowest probability derived from the Student test is that of [Fe/H]: ${\cal{P}}_{\rm {\rm [Fe/H]}}=0.164$ , implying that the stellar metallicity is not well reproduced. As discussed previously, this is due to the fact that several planets of the observed list have no or very poorly constrained determinations of the stellar [Fe/H], and so a default value of 0 was then used.

The other columns in Table 10 show the result of the logit analysis when removing one variable (i.e. with only 8 explanatory variables). In agreement with the above analysis, the highest global probability ${\cal {P}}_{\chi ^2}$ is obtained for the model without the [Fe/H] variable. When removing other variables, the results are very homogeneous, indicating that although the model can certainly be improved, there is no readily identified problem except that for [Fe/H]. We hope that future observations will allow for better constraints on these stars' metallicities.

In order to further test the method, we show in Table 11 the results of an analysis in which the model radii where artificially augmented by 10%. The corresponding probabilities are significantly lower: we find that the model can explain the observations by chance only in less than 1/10 000. The probabilities for each variable are affected as well so that it is impossible to identify the culprit for the bad fit with the 9 variables. However, when removing $R_{\rm p}$ from the analysis sample, the fit becomes significantly better. Note that the results for that column are slightly different of those for the same column in Table 10 because of the dependance of $\theta$ on $R_{\rm p}$ .

Table 10: Results of the logit analysis for the fiducial model with 50 000 model planets and 31 observations.

Table 11: Results of the logit analysis for the altered model ( $R_{\rm p}$ increased by 10%) with 50 000 model planets and 31 observations.