Simultaneous Modeling of Heterogeneous Subpopulations within one Framework

Christina Bohk (contact author)
Department of Sociology and Demography, University of Rostock
Email: christina.bohk@uni-rostock.de

Roland Rau
Department of Sociology and Demography, University of Rostock
Email: roland.rau@uni-rostock.de

XXVII IUSSP International Population Conference, August 2013
1 Subpopulations with different demographic behavior

Common approaches [1, 2, 3] typically project a population by age and sex with equal assumptions in mortality and fertility for the autochthonous population and the migrants. This can induce considerable projection error [4], as many studies show that mortality and fertility differ between the autochthonous population and the migrants [5]. For instance, in Germany (and in many other countries) immigrants have lower mortality than the autochthonous population due to a healthy migrant effect [6]. To account for this demographic heterogeneity, we project the autochthonous population, the immigrants, and their descendent generations with separate mortality and fertility assumptions in our probabilistic population projection model (PPPM) [7].

However, even projecting these subpopulations separately might be insufficient, as recent studies show that demographic behavior also differs among immigrants [8, 9, 10]. Levels of mortality and fertility can vary by, e.g., country of origin, reason to migrate, level of education, or employment status. For instance, a young person is likely to have higher fertility if the reason to migrate is family reunion rather than anticipated economic advantages. Likewise, migrants might be healthier if they have a higher level of education. To account for this demographic heterogeneity in depth, we propose to extend the PPPM so that a forecaster can project as many subpopulations with separate assumptions in mortality and fertility as needed, in order to increase projection accuracy. In addition to age, sex, and migration background, a forecaster can then also use other characteristics such as level of education, reason to migrate, or country of origin to construct subpopulations.
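To make the idea of separate assumptions concrete, the following minimal Python sketch advances one subpopulation by one period, using native rates scaled by subpopulation-specific factors. It is an illustration under strong simplifying assumptions (one sex, three age groups, no migration flows) and not the PPPM implementation; the class `Subpopulation`, the function `project_one_step`, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Subpopulation:
    name: str
    counts: List[float]        # persons by age group, youngest first
    mortality_factor: float    # multiplier on native death probabilities
    fertility_factor: float    # multiplier on native fertility rates


def project_one_step(
    sub: Subpopulation,
    native_death_prob: List[float],
    native_fertility: List[float],
) -> Tuple[List[float], float]:
    """Return (survivors by age group after one period, births in that period)."""
    n = len(sub.counts)
    # Scale the native rates with the subpopulation-specific factors.
    death = [min(1.0, q * sub.mortality_factor) for q in native_death_prob]
    fert = [f * sub.fertility_factor for f in native_fertility]

    # Births are returned separately, because children of immigrants may be
    # assigned to another subpopulation (a descendent generation).
    births = sum(c * f for c, f in zip(sub.counts, fert))

    survivors = [0.0] * n
    for age in range(n - 1):
        survivors[age + 1] += sub.counts[age] * (1.0 - death[age])
    # The last age group is open-ended: its survivors stay in it.
    survivors[n - 1] += sub.counts[n - 1] * (1.0 - death[n - 1])
    return survivors, births


# Hypothetical native rates for three age groups.
native_death_prob = [0.1, 0.2, 0.5]
native_fertility = [0.0, 0.5, 0.0]

natives = Subpopulation("natives", [100.0, 100.0, 100.0], 1.0, 1.0)
immigrants = Subpopulation("immigrants", [100.0, 100.0, 100.0], 0.8, 1.2)

native_survivors, native_births = project_one_step(
    natives, native_death_prob, native_fertility)
immigrant_survivors, immigrant_births = project_one_step(
    immigrants, native_death_prob, native_fertility)
```

With identical starting counts, the immigrant subpopulation in this toy example ends the period with more survivors and more births than the natives, which is the kind of difference a projection with equal assumptions would miss.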
2 Application in a population projection with real-world data of Germany

To analyze the impact of demographic heterogeneity on projection outcome, we conduct three projections with real-world data of Germany by single age (0 to 95+) and sex from 2012 to 2050 with the PPPM (see footnote 1). From the first to the third projection, we successively consider more subpopulations that are built with different characteristics, i.e., we successively disaggregate an overall heterogeneous population into increasingly homogeneous subpopulations. In the first projection, we project the total population by age and sex with equal assumptions in mortality and fertility for all people. In the second projection, we project the autochthonous population, the immigrants, and their descendent generations with separate assumptions in mortality and fertility. In the third projection, we further disaggregate the immigrants into two more homogeneous subpopulations (A and B) who might differ, for instance, in their reason to migrate or in their attained level of education. Table 1 summarizes the subpopulations that we consider in each projection and that we project with separate assumptions in mortality and/or fertility.

Footnote 1: An implementation of the PPPM is freely available as open-source software at https://bitbucket.org/christina_bohk/p3j.

                Projection 1    Projection 2    Projection 3
Immigrants      M, F            0.8M, 1.2F      A: 0.7M, 0.9F
                                                B: 0.8M, 1.3F
Desc. Gen. 1    M, F            0.9M, 1.1F      A: 0.8M, 0.9F
                                                B: 0.9M, 1.2F
Desc. Gen. 2    M, F            M, F            A: M, F
                                                B: M, F

M = native mortality, F = native fertility

Table 1: Assumptions in mortality (M) and fertility (F) for immigrants and their first two descendent generations, which are further disaggregated into subpopulations A and B in the third projection. For instance, 0.8M indicates mortality that is 20 percent lower than native mortality M.

In total, we generate three variants for each of these three projections: one variant with separate mortality assumptions, another variant with separate fertility assumptions, and a final variant with separate mortality and fertility assumptions. This allows us to analyze the impact of separate mortality and/or fertility assumptions on projection outcome, considering more and more homogeneous subpopulations from projection 1 to 3. Projections 2 and 3 represent the old and the new version of the PPPM, respectively.

Supposing that the immigrant-subpopulations A and B are defined by the characteristic reason to migrate, they could represent persons who immigrate for employment and persons who immigrate for other reasons such as family reunion. Such a differentiation can be reasonable because migration-related factors like modified immigration laws might change people's motivation to move to a country. For instance, immigration law in Germany aims to ease access for highly skilled employees, whereas it restricts access for unskilled workers. The introduction of the EU Blue Card in 2012, for example, enables highly educated people to work and live in Germany. This residence and employment permit is initially limited to four years, although it can be converted into a permanent permit, and it is tied to a minimum annual gross salary for highly qualified people. This minimum gross salary is substantially lower for scientists, mathematicians, engineers, doctors and IT specialists (see footnote 2).

We think that such changes in immigration law could have considerable effects on population dynamics. For instance, they could change the structure of immigrants regarding their country of origin, their reason to immigrate, and their attained level of education. Studies show that such highly qualified people are relatively healthy and have relatively low mortality [11, 12]. Hence, if the share of highly qualified people increases among immigrants, the healthy migrant effect might intensify.
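The projection design described above (factors relative to native mortality M and fertility F, crossed with three variants per projection) can be written down compactly. The following Python sketch is only an illustration: the factor values are illustrative numbers in the style of Table 1, and the names `PROJECTION_2`, `apply_variant`, and `percent_vs_native` are hypothetical and not part of the PPPM software.

```python
from itertools import product
from typing import Dict, Tuple

# (mortality factor, fertility factor), both relative to the native rates.
Factors = Tuple[float, float]

# Illustrative factors in the style of Table 1 (assumed values).
PROJECTION_2: Dict[str, Factors] = {
    "immigrants": (0.8, 1.2),
    "desc_gen_1": (0.9, 1.1),
    "desc_gen_2": (1.0, 1.0),
}

VARIANTS = ("mortality", "fertility", "both")


def apply_variant(factors: Factors, variant: str) -> Factors:
    """Keep only the separate assumption(s) that the variant switches on."""
    mortality, fertility = factors
    if variant == "mortality":
        return (mortality, 1.0)   # separate mortality, native fertility
    if variant == "fertility":
        return (1.0, fertility)   # native mortality, separate fertility
    if variant == "both":
        return (mortality, fertility)
    raise ValueError(f"unknown variant: {variant}")


def percent_vs_native(factor: float) -> int:
    """Express a factor as a percentage difference from the native rate."""
    return round((factor - 1.0) * 100.0)


# Three projections crossed with three variants give nine runs in total.
runs = list(product(("projection 1", "projection 2", "projection 3"), VARIANTS))
```

Here `percent_vs_native(0.8)` expresses the reading of "0.8M" given in the table caption, i.e. mortality 20 percent below the native level.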
Against this background, we assume as an example in projection 2 that immigrants have lower mortality than the autochthonous population, and in projection 3 that some of these immigrants have even lower mortality (see Table 1), possibly due to a higher qualification. In the second and third projection, we also assume that the lower mortality of the direct immigrants converges toward the higher mortality of the natives over the descendent generations. To our knowledge, such flexible and in-depth modeling of demographic heterogeneity is a novelty in probabilistic population forecasts. In principle, our general framework is a kind of probabilistic multi-state model, but with all methodological advantages of the old version of the PPPM [7], for instance generating multiple assumptions with any method a forecaster deems best, weighting each assumption with expected likelihoods, or eliminating implausible combinations of assumptions by imposing dependencies among parameters.

Footnote 2: http://www.bamf.de/en/dasbamf/aufgaben/blauekarte/blauekarte.html

Number of missing immigrants

Figure 1 illustrates the increasing scale of missing immigrants, i.e., the difference in immigrants between the first and the third projection, when assuming (1) lower mortality (red line), (2) higher fertility (green line), and (3) lower mortality and higher fertility (blue line) for immigrant-subpopulations than for natives. The number of missing immigrants increases with projection time, and the combined effect of lower mortality and higher fertility (for immigrants than for natives) exceeds the sum of the single effects of lower mortality alone and of higher fertility alone.

A comparison of the first projection with the second and the third projection reveals that the projected number of missing immigrants is the same if the mortality and fertility assumptions of the second projection are exactly the weighted average of the mortality and fertility assumptions of the third projection. Although the number of immigrants is then the same in the second and in the third projection, these results demonstrate how important it is to forecast a population with separate assumptions in mortality and fertility for all subpopulations.
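The two quantities used in this comparison can be stated in a few lines of Python. The sketch below assumes that "missing immigrants" is computed as the more detailed projection minus the first projection, year by year, and that the weighted average uses population shares as weights; both function names and all numbers are hypothetical and do not reproduce the results of the paper.

```python
from typing import Dict, List, Tuple


def missing_immigrants(
    detailed: Dict[int, float], baseline: Dict[int, float]
) -> Dict[int, float]:
    """Difference (detailed minus baseline) for every year in both projections."""
    return {
        year: detailed[year] - baseline[year]
        for year in sorted(detailed)
        if year in baseline
    }


def weighted_average(shares_and_factors: List[Tuple[float, float]]) -> float:
    """Share-weighted average of subpopulation-specific factors."""
    total_share = sum(share for share, _ in shares_and_factors)
    return sum(share * factor for share, factor in shares_and_factors) / total_share


# Hypothetical immigrant totals (in million) from two projections.
baseline = {2030: 10.0, 2040: 10.5, 2050: 11.0}
detailed = {2030: 10.5, 2040: 11.4, 2050: 12.8}
gap = missing_immigrants(detailed, baseline)

# Two equally large subpopulations with mortality factors 0.7 and 0.9
# correspond to an aggregate factor of about 0.8.
aggregate_factor = weighted_average([(0.5, 0.7), (0.5, 0.9)])
```

In this toy input the gap widens from 2030 to 2050, mirroring the qualitative pattern described for Figure 1.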
Structure of missing immigrants according to the old version of the PPPM

Figure 2 depicts to what extent the direct immigrants (red line), the first descendent generation (green line) and the second descendent generation (yellow line) contribute to the number of all missing immigrants (blue line), when comparing the first with the second projection and when assuming separate assumptions in mortality and fertility for these subpopulations.
[Figure 1 (plot not reproduced). Y-axis: Number of Missing Immigrants (in Million). X-axis: Years to project, 2010 to 2050. Lines: Mortality impact, Fertility impact, Mortality and fertility impact.]

Figure 1: The number of missing immigrants increases with projection time. The projection error grows further when immigrants and natives differ in both mortality and fertility.

The first descendent generation consists of the children of the direct immigrants, whereas the second descendent generation consists of the children of the first descendent generation. The overall projection error increases with time and is mainly due to the underestimation of the immigrants of the first descendent generation. Hence, assuming separate fertility for immigrants and natives has a strong impact on the accuracy of the projection outcome.
[Figure 2 (plot not reproduced). Y-axis: Number of Missing Immigrants (in Million). X-axis: Years to project, 2010 to 2050. Lines: All immigrants, Direct immigrants, Desc. gen. 1, Desc. gen. 2.]

Figure 2: Composition of all missing immigrants according to the old version of the PPPM, when comparing the first with the second projection and when assuming separate assumptions in mortality and fertility for these subpopulations.

Structure of missing immigrants according to the new version of the PPPM

Providing information about the composition of all immigrants (direct immigrants and their descendent generations) is already an advantage of the old version of the PPPM. The new version provides additional information about the structure of these immigrant-subpopulations, i.e., to what extent each of them consists of further subpopulations. In projection 3, we can now project how the direct immigrants, the first descendent generation and the second descendent generation are composed of their subpopulations A and B. Instead of three immigrant-subpopulations, we now have six. Figure 3 depicts to what extent each of these six immigrant-subpopulations contributes to the number of all missing immigrants, when comparing the first with the third projection and when assuming separate assumptions in mortality and fertility for these subpopulations. As we assume higher fertility (and higher mortality) for subpopulation B than for subpopulation A, immigrants of subpopulation B are underestimated more strongly than those of subpopulation A, because separate fertility has a stronger impact than separate mortality.
[Figure 3 (plots not reproduced). Four panels: All immigrants, Direct Immigrants, Desc. Gen. 1, Desc. Gen. 2. Each panel shows the respective total and its subpopulations A and B. Y-axis: Number of Missing Immigrants (in Million). X-axis: Years to project, 2010 to 2050.]

Figure 3: Composition of all missing immigrants according to the new version of the PPPM, when comparing the first with the third projection and when assuming separate assumptions in mortality and fertility for these subpopulations.
Probabilistic projection outcome in the new version of the PPPM

In addition to the information on the composition of a projected population, the new version of the PPPM also captures and quantifies the uncertainty of the projection outcome. Figure 4 illustrates the outcome distribution for the number of all missing immigrants, when comparing the first with the third projection and when assuming separate assumptions in mortality and fertility for all subpopulations. This outcome distribution reveals that not only the number of missing immigrants but also the projection uncertainty increases with projection time. In 2050, the number of missing immigrants ranges between 1.236 and 2.355 million people with a probability of 95 percent (see also Table 2).

Missing immigrants (in million)

        Quantile
Year    0.025    0.1      0.5      0.9      0.975
2030    0.398    0.422    0.496    0.569    0.62
2040    0.708    0.790    0.96     .68      1.245
2050    1.236    .44      1.768    2.6      2.355

Table 2: Quantiles for the projected number of all missing immigrants in the projection years 2030, 2040 and 2050, when comparing the first with the third projection and when assuming separate assumptions in mortality and fertility for all subpopulations.
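Quantiles such as those in Table 2 are typically read off the simulated outcomes of many projection runs. The following Python sketch shows one common way to do this, an empirical quantile with linear interpolation between order statistics; whether the PPPM uses exactly this estimator is not stated here, and the function name `empirical_quantile` and the sample values are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def empirical_quantile(values: Sequence[float], p: float) -> float:
    """Quantile with linear interpolation between neighboring order statistics."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    ordered = sorted(values)
    position = p * (len(ordered) - 1)   # fractional index into the sorted data
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


# Hypothetical simulated numbers of missing immigrants in 2050 (in million).
outcomes_2050 = [1.2, 1.5, 1.7, 1.8, 2.0, 2.1, 2.4]

probabilities = [0.025, 0.1, 0.5, 0.9, 0.975]
quantiles: Dict[float, float] = {
    p: empirical_quantile(outcomes_2050, p) for p in probabilities
}

# The 0.025 and 0.975 quantiles bound a central 95 percent interval.
interval_95: Tuple[float, float] = (quantiles[0.025], quantiles[0.975])
```

A real application would use thousands of simulated trajectories per projection year rather than seven values.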
[Figure 4 (plot not reproduced). Y-axis: Missing Immigrants (in Million). X-axis: Time, projection years up to 2050. Shading: probability (Pr).]

Figure 4: Outcome distribution for the number of all missing immigrants through the projection years up to 2050, when comparing the first with the third projection and when assuming separate assumptions in mortality and fertility for all subpopulations. Projection uncertainty increases with time, recognizable by the distribution, which becomes wider with time.
3 Summary and Conclusion

In general, we want to emphasize how important it is to consider demographic heterogeneity in population forecasts, and how it can affect the accuracy of the projection outcome. The old version of the PPPM disaggregated a total population into direct immigrants and their descendent generations and projected each of them with separate assumptions in mortality and fertility. The new version of the PPPM goes one step further: it allows the user to disaggregate a total population into a freely selectable number of more homogeneous subpopulations and to project each of them with separate assumptions in mortality and fertility. These subpopulations can be defined by any characteristic, for instance the attained level of education, religion, or country of origin (for immigrants). Despite these additional features of the new version, the advantages of the old version of the PPPM still apply:

- generate multiple assumptions for each model parameter and weight them with expected likelihoods
- set correlations among assumptions of one or more subpopulations with Settypes and Sets (to exclude implausible scenarios)
- generate probabilistic outcome for each subpopulation

The advantages of the new version of the PPPM can increase not only its projection accuracy but also its applicability. Applying the new version of the PPPM can be useful whenever it is important to consider demographic heterogeneity (1) within a population as well as (2) between populations more thoroughly. For instance, our new general framework can be applied to conduct multiregional projections. In such a case, the additional subpopulations are populations of other regions or countries rather than subsets of a population of the same region or country. In addition, the new version of the PPPM can be applied to project a population by attained level of education. In such a case, the subpopulations correspond to the attained levels of education.
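The first two advantages listed above (several weighted assumptions per parameter, and dependencies that exclude implausible combinations) can be sketched generically as follows. This Python example does not use or reproduce the Settypes and Sets of the PPPM software; the assumption labels, the probabilities, the exclusion rule in `is_plausible`, and the function `joint_scenarios` are all hypothetical.

```python
import itertools
import random
from typing import Callable, Dict, Tuple

# Hypothetical assumptions with expected likelihoods (each set sums to 1).
mortality_assumptions = {"low": 0.3, "medium": 0.5, "high": 0.2}
fertility_assumptions = {"low": 0.25, "medium": 0.5, "high": 0.25}


def is_plausible(mortality: str, fertility: str) -> bool:
    """Hypothetical dependency: rule out high mortality with high fertility."""
    return not (mortality == "high" and fertility == "high")


def joint_scenarios(
    mortality: Dict[str, float],
    fertility: Dict[str, float],
    plausible: Callable[[str, str], bool],
) -> Dict[Tuple[str, str], float]:
    """Combine assumptions, drop implausible pairs, renormalize the weights."""
    combos = {}
    for (m, pm), (f, pf) in itertools.product(mortality.items(), fertility.items()):
        if plausible(m, f):
            combos[(m, f)] = pm * pf
    total = sum(combos.values())
    return {pair: weight / total for pair, weight in combos.items()}


scenarios = joint_scenarios(
    mortality_assumptions, fertility_assumptions, is_plausible)

# Draw scenarios for the individual projection runs according to their weights.
rng = random.Random(42)
draws = rng.choices(list(scenarios), weights=list(scenarios.values()), k=1000)
```

Each drawn pair would then parameterize one projection run, and the collection of runs yields the probabilistic outcome for every subpopulation.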
The new version of the PPPM is freely available at https://bitbucket.org/Christina_Bohk/p3j.

References

[1] Wolfgang Lutz, Warren C. Sanderson, and Sergei Scherbov. Probabilistic population projections based on expert opinion. In Wolfgang Lutz, editor, The Future Population of the World. What can we assume today? Earthscan Ltd, 1996.

[2] Wolfgang Lutz, Warren C. Sanderson, and Sergei Scherbov. The end of world population growth. Nature, 412:543-545, August 2001.

[3] Juha M. Alho and Bruce D. Spencer. Statistical Demography and Forecasting. Springer Science+Business Media, Inc., 2005.

[4] Christina Bohk. On the Impact of Separate Fertility and Mortality Assumptions for Native and Migrant Subpopulations in Population Projections. In PAA Annual Meeting 2011, 2011.

[5] Laura Blue and Andrew Fenelon. Explaining low mortality among US immigrants relative to native-born Americans: the role of smoking. International Journal of Epidemiology, 40:786-793, 2011.

[6] Martin Kohls. Demographie von Migranten in Deutschland. Challenges in Public Health. Peter Lang, 2012.

[7] Christina Bohk. Ein probabilistisches Bevölkerungsprognosemodell. Entwicklung und Anwendung für Deutschland. Demographischer Wandel: Hintergründe und Herausforderungen. Springer VS, 2012.

[8] Kirk Scott and Maria Stanfors. The transition to parenthood among the second generation: Evidence from Sweden, 1990-2005. Advances in Life Course Research, 16(4):190-204, 2011.

[9] Gunnar Andersson. Childbearing after migration: Fertility patterns of foreign-born women in Sweden. International Migration Review, 38(2):747-775, 2004.

[10] Nadja Milewski. Immigrant fertility in West Germany: Is there a socialisation effect in transitions to second and third births? European Journal of Population, 26:297-323, 2010.

[11] Samir KC and Harold Lentzner. The effect of education on adult mortality and disability: a global perspective. Vienna Yearbook of Population Research, 8:201-235, 2010.

[12] Wolfgang Lutz and Samir KC. Global Human Capital: Integrating Education and Population. Science, 333:587-592, 2011.