European Social Survey (ESS) 2004
Documentation of the sampling procedure

A. TARGET POPULATION

The target population comprises all persons aged 15 and over resident in private households in Spain (including Ceuta and Melilla), regardless of their nationality, citizenship or language. The 2004 design improves on the 2002 design in its coverage of the target population, because it includes the North African cities of Ceuta and Melilla, which were excluded from the first round.

B. SAMPLING FRAME

The sampling frame for the 2004 ESS sample will be the population census, structured in census sections, taken from the Continuous Census (Padrón Contínuo) as updated in March 2004 by the Instituto Nacional de Estadística (INE, the Public Statistics Office of Spain). This census contains information on individuals. Drawing a sample of individuals instead of a sample of households, as was done in the 2002 round, represents a threefold design improvement:

- The sample of individuals provided by the INE will probably contain fewer errors than a sample of households, which will reduce the number of invalid cases.
- The possibility of error or bias in the selection of individuals within households is now avoided.
- One source of design effects present in the 2002 design disappears: the differences in the probability of including an individual that arise from differences in household size.

The Continuous Census derives from the 2001 Census, updated in March 2004 using the municipal rolls. When a resident moves from one municipality to another, he or she must notify the local authorities of the new place of residence. Registration gives access to health, education and other public services, and also places the resident on the electoral list. The law obliges every city council to send the data from its roll to the INE once a year. This process produces the national Continuous Census of inhabitants. Foreigners usually register in the municipal rolls in order to benefit from welfare services, even if they are not legally established in the country. Taking the Continuous Census as the frame therefore ensures the best possible coverage of the resident population.

The 2004 Spanish ESS frame includes all residents in private dwellings, whether family or collective, who are registered in the municipal rolls. This feature can lead to the selection of some individuals who are not in the target population and, consequently, to a small percentage of ineligible cases (less than 1%). Given that the 2002 ESS sample contained 13% invalid addresses, and given the improvements introduced in the 2004 sample design, the percentage of ineligible cases is expected not to exceed 10%.

C. SAMPLE DESIGN

The proposed design is a stratified two-stage sample design.
The strata will be obtained by crossing two population classification criteria. The first criterion is the Autonomous Community of residence, that is, the Spanish region (there are 17 of them, plus one more grouping the North African autonomous cities of Ceuta and Melilla). The second criterion (type of habitat) distinguishes three types of habitat according to size and capital status: a first bracket comprising provincial and regional capital cities, a second bracket comprising all non-capital cities with more than 100,000 inhabitants, and a rural bracket comprising all remaining towns. (See Appendix 1 for the justification of this stratification.)

Crossing the two criteria gives a total of 54 theoretical strata (18 × 3). Only 43 of them are effective, because some autonomous communities have no towns of more than 100,000 inhabitants other than the capital city, and the stratum for Ceuta and Melilla consists only of these capitals (it contains no other towns). The number of individuals selected in each stratum will be proportional to the weight of that stratum in the total population.

In each stratum the two sampling stages will be the following:

1. In the first stage, a fixed number of census sections will be drawn with probability proportional to the number of inhabitants in each section. Census sections will thus be the primary sampling units (PSUs).¹
2. In the second stage, individuals will be drawn at random within each PSU selected in the first stage: 6 per unit in rural areas and 7 per unit in urban areas.

¹ There are 34,600 electoral sections in Spain. Electoral sections are the most elementary framing unit of eligible voters. Section sizes vary between 500 and 2,000 voters (aged 18 and over), with an average of 1,300.
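The two stages described above can be sketched in Python as follows. This is only an illustrative sketch: the document does not say which probability-proportional-to-size (PPS) algorithm the INE applies, so systematic PPS selection is assumed here, and the function names and section sizes are hypothetical. The sketch also shows why, under this kind of design, every individual in a stratum ends up with the same overall inclusion probability.

```python
import random
from itertools import accumulate


def pps_systematic(sizes, n_draws, rng):
    """Pick n_draws unit indices with probability proportional to size.

    Assumption: systematic PPS with a random start. The source only
    states that sections are drawn proportionally to their population.
    """
    total = sum(sizes)
    step = total / n_draws
    start = rng.random() * step          # random start in [0, step)
    cumulative = list(accumulate(sizes))
    picks, idx = [], 0
    for j in range(n_draws):
        point = start + j * step
        # Advance to the unit whose cumulative size range contains the point.
        while idx < len(cumulative) - 1 and cumulative[idx] <= point:
            idx += 1
        picks.append(idx)
    return picks


def overall_inclusion_prob(size, total, n_draws, per_section):
    """First-stage PPS probability times second-stage within-section probability."""
    first_stage = n_draws * size / total
    second_stage = per_section / size
    return first_stage * second_stage


# Hypothetical rural stratum: six sections, two drawn, 6 individuals each.
section_sizes = [1200, 800, 1500, 500, 2000, 1000]
rng = random.Random(2004)
selected = pps_systematic(section_sizes, n_draws=2, rng=rng)
probs = [
    overall_inclusion_prob(s, sum(section_sizes), n_draws=2, per_section=6)
    for s in section_sizes
]
print(selected, probs)
```

Because the section size cancels between the two stages, the overall probability depends only on the stratum total, the number of sections drawn and the number of individuals per section.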
Because of the low response rate in the Basque Country in the first round of the ESS, we now propose to increase the drawn sample size in this autonomous community by drawing 1.5 times as many sections as its share of the national population would warrant.

The inclusion probabilities of sections and individuals will be provided by the INE.

D. DESIGN EFFECTS

In the sampling design detailed above, two basic aspects must be taken into account when estimating design effects: stratification and clustering. While stratification increases precision, clustering decreases it. The expected total design effect, DEFF, is the product of both effects.

It is worth noting that the Round 2 design has only two stages instead of the three used in Round 1, because the sample is of individuals rather than addresses. Consequently, in Round 2 there are no differences in individual selection probabilities arising from the type of household.

The design effect due to stratification (DEFF_s) and the design effect due to clustering (DEFF_c) for the 2004 round have been estimated from the data obtained in the 2002 round (see Appendix 2 for the calculation of these estimates).

DEFF = DEFF_s × DEFF_c = 0.956 × 1.275 = 1.219

The proposed design results in an equal probability of selection for all individuals in the same stratum, although there are some differences among strata. In
consequence, DEFF_p = 1 and it need not be taken into account when computing the total design effect.

There are two sources of differences among strata. First, the distribution of the target population may differ from the distribution of the total population (which was used to assign sections and individuals to strata). Second, some strata are over-represented in order to compensate for low response rates. For example, urban areas have lower response rates than rural areas, and the Basque Country and Ceuta y Melilla have lower response rates than the other regions. It is expected that, in the end, the probability of belonging to the set of interviewees will be nearly the same for individuals in all strata.

E. RESPONSE RATE

One of the objectives of the Central Coordinating Team (CCT) for this second ESS round is to achieve a response rate of 70%, as in the first round. The 2002 results show, however, that this is not an easy goal: only one third of the participating countries reached it. Several factors conditioned the response rate achieved in Spain, most of them related to the fieldwork process. Now that these factors have been identified, a special effort will be made in this second round to overcome them and achieve a response rate much higher than the 53% reached in the first round.

Although the aim is to reach the 70% threshold, a conservative response rate of 65% is used to calculate the sample size. This is intended to guarantee, in any case, a minimum effective sample size of 1,500, which is another requirement established by the CCT.
F. SAMPLE SIZE

To calculate the size of the sample to be selected, the estimated percentage of valid locations of individuals, the estimated response rate and the estimated design effect must be taken into account.

The sample of households used in the 2002 round contained 13% invalid addresses. As stated earlier, we hope to reduce this figure by using a sample of individuals. In the calculations that follow we have taken 90% as the estimated percentage of valid cases for the 2004 sample.

The results of the 2002 survey confirmed the difference in response levels between urban and rural areas: 49% in urban areas and 57% in rural areas, that is, a participation rate 16% higher (in relative terms) in rural areas. Taking this into account, we anticipate for the 2004 round a mean response rate of 65%, corresponding to 60.0% in the two urban brackets and 69.6% in the rural bracket.

Regarding the two sources of design effects, a total design effect of 1.219 has been estimated.

Based on the preceding paragraphs, the calculations to determine the sample size for the 2004 survey are the following:

Minimum effective sample size = 1,500
Net sample size = 1,500 × 1.219 = 1,829
Total valid cases = 1,829 / 0.65 = 2,814
Gross sample size = 2,814 / 0.90 = 3,126
Gross sample size including the Basque Country over-representation = 3,126 + 80 = 3,206
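The chain of calculations above can be checked with a short Python sketch. The function name and dictionary keys are hypothetical. Rounding each intermediate figure up to the next integer while carrying the unrounded value forward is an assumption; it reproduces the figures reported above.

```python
import math


def gross_sample_size(effective_n, deff, response_rate, valid_rate, oversample=0):
    """Inflate the effective sample size step by step.

    Assumption: unrounded values are carried through the chain and each
    reported figure is rounded up to the next integer.
    """
    net = effective_n * deff            # inflate by the design effect
    valid = net / response_rate         # inflate for expected non-response
    gross = valid / valid_rate          # inflate for expected invalid cases
    return {
        "net": math.ceil(net),
        "valid": math.ceil(valid),
        "gross": math.ceil(gross),
        "gross_with_oversample": math.ceil(gross) + oversample,
    }


# Values taken from the text: 1,500 effective cases, DEFF 1.219,
# 65% response rate, 90% valid cases, 80 extra Basque Country cases.
sizes = gross_sample_size(1500, 1.219, 0.65, 0.90, oversample=80)
print(sizes)
```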
In the process of assigning individuals to each stratum, the constraint of taking an integer number of sections in each stratum led to a slight modification of the total. The total sample size is therefore 3,213 individuals, distributed across strata in proportion to the population of each, with a 50% increase in the Basque Country sample (see Appendix 3 for the information used to assign the number of individuals to select in each stratum). The following table shows the number of sections and individuals to select in each stratum.

Distribution of sections and individuals by strata

| | Sections: Capitals | Sections: Cities > 100,000 | Sections: Rural areas | Sections: Total | Interviews: Capitals | Interviews: Cities > 100,000 | Interviews: Rural areas | Interviews: Total |
|---|---|---|---|---|---|---|---|---|
| Andalucía | 25 | 5 | 58 | 88 | 175 | 35 | 348 | 558 |
| Aragón | 8 | 0 | 7 | 15 | 56 | 0 | 42 | 98 |
| Asturias | 2 | 3 | 8 | 13 | 14 | 21 | 48 | 83 |
| Baleares | 4 | 0 | 6 | 10 | 28 | 0 | 36 | 64 |
| Canarias | 6 | 1 | 13 | 20 | 42 | 7 | 78 | 127 |
| Cantabria | 2 | 0 | 5 | 7 | 14 | 0 | 30 | 44 |
| Castilla y León | 12 | 0 | 18 | 30 | 84 | 0 | 108 | 192 |
| Castilla-La Mancha | 4 | 0 | 17 | 21 | 28 | 0 | 102 | 130 |
| Cataluña | 20 | 11 | 45 | 76 | 140 | 77 | 270 | 487 |
| Valencia | 13 | 2 | 36 | 51 | 91 | 14 | 216 | 321 |
| Extremadura | 2 | 0 | 11 | 13 | 14 | 0 | 66 | 80 |
| Galicia | 6 | 3 | 24 | 33 | 42 | 21 | 144 | 207 |
| Madrid | 32 | 11 | 19 | 62 | 224 | 77 | 114 | 415 |
| Murcia | 4 | 2 | 8 | 14 | 28 | 14 | 48 | 90 |
| Navarra | 2 | 0 | 5 | 7 | 14 | 0 | 30 | 44 |
| País Vasco | 12 | 0 | 26 | 38 | 84 | 0 | 156 | 240 |
| La Rioja | 1 | 0 | 2 | 3 | 7 | 0 | 12 | 19 |
| Ceuta y Melilla | 2 | 0 | 0 | 2 | 14 | 0 | 0 | 14 |
| Total | 157 | 38 | 308 | 503 | 1,099 | 266 | 1,848 | 3,213 |
The proposed design for the second round sample improves the first round's precision, due to three aspects of the 2004 design:

- Increase in the number of strata
- Removal of the intermediate step of address selection
- Reduction of the number of individuals selected in each primary sampling unit
Appendix 1: Stratification

The reasons for stratifying by region and by type of habitat are discussed below.

Stratification by region. Regions (autonomous communities) in Spain display a great variety of structural, demographic and socioeconomic characteristics. Some regions are mainly agricultural, others mainly industrial; the population of some regions is considerably younger than that of others; because of geographical features, some are more easily accessible than others; and so on. In addition, political opinions also differ among regions. For example, there are strong nationalist feelings in some regions and none at all in others, and there are political parties that are active only in specific regions. All these facts supported the use of regions as strata in the first ESS round. Analysis of the first-round data has corroborated the differences among communities in individuals' responses and, thus, the benefit of stratifying by autonomous community.

Stratification by type of habitat. The first-round design used the following stratification by town size: urban (provincial capitals) and rural (all other towns). This classification was supported at the time by the survey organisation's experience that the response rate is usually 20% higher in rural areas than in urban areas. The first-round ESS data show that, although there is some difference in response rates, it is not as large as anticipated: the response rate was 49% in urban areas and 57% in rural areas, that is, participation was 16% higher in rural areas.

At the same time, a third stratification group has been introduced, formed by towns that are not capital cities and have a population greater than 100,000. This is justified because nowadays in Spain there are a significant number of big
towns that cannot be considered rural areas. The introduction of this third stratification group therefore clearly improves the representativeness of the sample.

Appendix 2: Design effects estimation

The design effects for the 2004 round have been calculated from the 2002 round data. First, a group of variables that could show some variability among strata was selected from the different sections of the survey. These variables are the following:

NWSPPOL: Newspaper reading, politics/current affairs on average weekend
POLITNTR: How interested in politics
TRSTLGL: Trust in the legal system
TRSTPLT: Trust in politicians
STFDEM: How satisfied with the way democracy works in country
SCLMEET: How often socially meet with friends, relatives or colleagues
IMPAVO: Good citizen: how important to be active in voluntary organizations

Second, the design effects due to stratification and to clustering were calculated for the seven variables. DEFF_s and DEFF_c have been estimated as the average of the effects across the variables. The total design effect has been estimated as the product of these two values.

The intra-group correlation coefficients (ρ) for the seven variables have been estimated using variance decomposition models.
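The averaging and the final product can be checked with a short Python sketch. It uses the per-variable DEFF_s and ρ values reported in this appendix, the clustering formula DEFF_c = 1 + (k − 1)ρ, and the cluster size k = 7 × 0.65 × 0.9 = 4.095 used in this appendix. The variable names in the code are hypothetical, and rounding each intermediate result to three decimals before the next step is an assumption that reproduces the reported figures.

```python
# Per-variable estimates from the 2002 round, as reported in this appendix.
deff_s_by_var = {
    "POLITNTR": 0.953, "STFDEM": 0.951, "TRSTLGL": 0.953, "TRSTPLT": 0.978,
    "IMPAVO": 0.921, "NWSPPOL": 0.964, "SCLMEET": 0.971,
}
rho_by_var = {
    "POLITNTR": 0.059, "STFDEM": 0.085, "TRSTLGL": 0.071, "TRSTPLT": 0.087,
    "IMPAVO": 0.188, "NWSPPOL": 0.028, "SCLMEET": 0.103,
}

# Expected completed interviews per cluster: the larger (urban) value is used.
k = max(6 * 0.65 * 0.9, 7 * 0.65 * 0.9)

# Assumption: each result is rounded to three decimals before the next step.
mean_deff_s = round(sum(deff_s_by_var.values()) / len(deff_s_by_var), 3)
mean_rho = round(sum(rho_by_var.values()) / len(rho_by_var), 3)
deff_c = round(1 + (k - 1) * mean_rho, 3)
deff_total = round(mean_deff_s * deff_c, 3)

print(mean_deff_s, mean_rho, deff_c, deff_total)
```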
The expressions used to calculate the design effects of each variable are the following:

$$\mathrm{DEFF}_s = \frac{\sum_i \left(\frac{n_i}{n}\right)^2 \frac{s_i^2}{n_i - 1}}{\frac{s^2}{n - 1}}$$

$$\mathrm{DEFF}_c = 1 + (k - 1)\rho = 1 + (4.095 - 1) \times 0.089 = 1.275$$

where:

n: total number of cases
n_i: number of cases in stratum i
s²: variance of the sample assuming a simple random sample
s_i²: variance in stratum i
ρ: intra-group correlation coefficient
k: estimate of the average number of individuals in each cluster

The estimated value of k (4.095) is the higher of the expected numbers of completed interviews per cluster in rural and urban areas. These figures are 6 × 0.65 × 0.9 = 3.510 in rural areas and 7 × 0.65 × 0.9 = 4.095 in urban areas. As suggested by the expert panel on ESS sampling, the conservative estimate has been taken because of the small number of interviews in each cluster.

The following table shows the values obtained for the seven variables and their average.
| Variable | DEFF_s | ρ |
|---|---|---|
| POLITNTR | 0.953 | 0.059 |
| STFDEM | 0.951 | 0.085 |
| TRSTLGL | 0.953 | 0.071 |
| TRSTPLT | 0.978 | 0.087 |
| IMPAVO | 0.921 | 0.188 |
| NWSPPOL | 0.964 | 0.028 |
| SCLMEET | 0.971 | 0.103 |
| Average | 0.956 | 0.089 |

Appendix 3: Assignment of the number of individuals and sections to each stratum

The following table shows the distribution of the Spanish population across the 43 strata considered. Empty cells correspond to strata that do not exist.

Spanish population

| | Residents in capitals | Residents in towns > 100,000 (not capitals) | Residents in rural areas | Total |
|---|---|---|---|---|
| Andalucía | 2,312,345 | 486,765 | 4,558,448 | 7,357,558 |
| Aragón | 692,306 | | 511,909 | 1,204,215 |
| Asturias | 201,154 | 266,419 | 595,425 | 1,062,998 |
| Baleares | 333,801 | | 507,868 | 841,669 |
| Canarias | 543,340 | 128,822 | 1,022,315 | 1,694,477 |
| Cantabria | 180,717 | | 354,414 | 535,131 |
| Castilla y León | 1,053,924 | | 1,402,550 | 2,456,474 |
| Castilla-La Mancha | 395,156 | | 1,365,360 | 1,760,516 |
| Cataluña | 1,804,091 | 1,021,768 | 3,517,251 | 6,343,110 |
| Valencia | 1,170,688 | 194,767 | 2,797,321 | 4,162,776 |
| Extremadura | 216,235 | | 842,268 | 1,058,503 |
| Galicia | 507,245 | 280,186 | 1,908,449 | 2,695,880 |
| Madrid | 2,938,723 | 1,033,826 | 1,450,835 | 5,423,384 |
| Murcia | 370,745 | 184,686 | 642,215 | 1,197,646 |
| Navarra | 183,964 | | 371,865 | 555,829 |
| País Vasco | 745,201 | | 1,337,386 | 2,082,587 |
| La Rioja | 133,058 | | 143,644 | 276,702 |
| Ceuta y Melilla | 137,916 | | | 137,916 |
| Total | 13,920,609 | 3,597,239 | 23,329,523 | 40,847,371 |

Source: INE, 2001 Census
Taking the 3,126 individuals of the sample, distributing them in proportion to the population of the three habitat brackets, and assigning 6 individuals per section in rural areas and 7 in urban areas gives this first result:

Capitals (34%): 152 sections, 1,063 individuals
Large towns (9%): 40 sections, 281 individuals
Rural areas (57%): 297 sections, 1,782 individuals
TOTAL: 489 sections, 3,126 individuals

When these sections were distributed in proportion to the population of each autonomous community, the figures had to be rounded to obtain an integer number of sections in each stratum and a number of individuals equal to the number of sections multiplied by 6 or 7, depending on the type of area (rural or urban). The final number of sections and individuals to select for the sample, including the Basque Country over-representation, is:

Capitals (34%): 157 sections, 1,099 individuals
Large towns (9%): 38 sections, 266 individuals
Rural areas (57%): 308 sections, 1,848 individuals
TOTAL: 503 sections, 3,213 individuals
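The first-pass allocation and the final totals can be checked with a short Python sketch. The variable names are hypothetical, and rounding to the nearest integer at each step is an assumption that reproduces the figures above. The per-community rounding that leads from 489 to 503 sections is not reproduced; the final section counts are taken directly from the text.

```python
GROSS_SAMPLE = 3126

# Habitat bracket: (rounded population share, individuals per section).
brackets = {
    "capitals": (0.34, 7),
    "large_towns": (0.09, 7),
    "rural": (0.57, 6),
}

# First pass: split the gross sample by share, then convert to sections.
first_pass = {}
for name, (share, per_section) in brackets.items():
    individuals = round(share * GROSS_SAMPLE)
    sections = round(individuals / per_section)
    first_pass[name] = (sections, individuals)

# Final section counts as reported, after rounding within each autonomous
# community and adding the Basque Country over-representation.
final_sections = {"capitals": 157, "large_towns": 38, "rural": 308}
final_individuals = {
    name: n_sections * brackets[name][1]
    for name, n_sections in final_sections.items()
}

print(first_pass)
print(final_individuals, sum(final_individuals.values()))
```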