PQLI Dataset Codebook
Version 1.0, February 2006
Erlend Garåsen
Department of Sociology and Political Science
Norwegian University of Science and Technology

Table of Contents
1. Introduction
1.1 Files
1.2 Format
2. Methodology
References
Appendix

1. Introduction

This codebook describes the PQLI dataset, which covers 139 countries with more than one million inhabitants for the years 1975 to 2000. It was originally constructed for my Master's Thesis in Political Science: Democracy and Development: A Comparative Analysis in Time and Space. Please cite as follows if you are using this dataset:

Garåsen, E. (2006). Democracy and Development: A Comparative Analysis in Time and Space. Master's Thesis, Department of Sociology and Political Science, Norwegian University of Science and Technology.

1.1 Files

codebook.pdf      This document, which describes the methodology, content and format of the datasets
Pqli_75_00.dta    PQLI data for 139 countries in Stata format
Pqli_75_00.sav    PQLI data for 139 countries in SPSS format
Pqli_75_00.txt    PQLI data for 139 countries as a tab-delimited text file
Pqli_75_00.xls    PQLI data for 139 countries in Excel format

1.2 Format

Three fields are included in the dataset: YEAR, PQLI and COW. Years range from 1975 to 2000, wherever data are available for a country. Countries are coded using the Correlates of War (COW) format. See the Appendix for a complete list of the COW codes and the country mappings.
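
As a minimal sketch of how the tab-delimited file might be read, the Python function below parses rows into (COW, YEAR, PQLI) tuples. It assumes that Pqli_75_00.txt starts with a header row naming the three fields and that missing PQLI values appear as empty cells or a single period; neither detail is stated in this codebook. The function name read_pqli and the sample rows are hypothetical, and the sample values are placeholders, not real data.

```python
import csv
import io


def read_pqli(handle):
    """Parse tab-delimited text into a list of (cow, year, pqli) tuples.

    Assumption: the first row is a header containing YEAR, PQLI and COW.
    Assumption: a missing PQLI value is an empty cell or a single ".".
    """
    rows = []
    for record in csv.DictReader(handle, delimiter="\t"):
        raw = (record["PQLI"] or "").strip()
        pqli = None if raw in ("", ".") else float(raw)
        rows.append((int(record["COW"]), int(record["YEAR"]), pqli))
    return rows


# Placeholder rows for illustration only (385 is the COW code for Norway).
SAMPLE = "COW\tYEAR\tPQLI\n385\t1975\t90.0\n385\t1976\t\n"
sample_rows = read_pqli(io.StringIO(SAMPLE))
```

For the real file, the same function could be called on an open file handle, for example `read_pqli(open("Pqli_75_00.txt", newline=""))`, once the header and missing-value assumptions have been checked against the file.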

2. Methodology

To construct a PQLI time-series dataset, I tried, as far as possible, to obtain complete data for all countries, with no missing values for any year included. I used a single main source wherever possible and turned to additional sources only where missing data were a serious problem. Data for life expectancy, infant mortality and literacy are mainly collected from the World Bank's World Development Indicators, whereas some data for adult literacy, which were missing from the World Bank, are taken from UNESCO's Statistical Yearbook, various years; UNDP's Human Development Report, various years; UNICEF's The State of the World's Children, various years; and Kurian's (1979) The Book of World Rankings.

Missing data are mainly a problem for very poor countries (especially for early years), for countries with fewer than one million inhabitants and for closed societies such as North Korea. Since missing data were especially a problem for the years prior to 1975, it was decided to include only the years from 1975 to 2000, giving 26 years in the time-series dataset. In addition, only countries with more than one million inhabitants in all time periods were included. This resulted in a dataset of 139 countries with 3,540 observations.

There was a problem with the life expectancy data collected from the World Bank. Morris (1979) uses life expectancy at age 1, whereas the World Bank publishes life expectancy at birth, so life expectancy at age 1 had to be constructed from the data for life expectancy at birth and infant mortality using the following formula:

\[ LE(1) = \frac{LE(0) - (IM \times AVG)}{1 - IM} \qquad (1) \]

where LE(1) is life expectancy at age 1, LE(0) is life expectancy at birth, IM is the infant mortality rate (reported per thousand live births and entered in the formula as a proportion, so that 65 per thousand is 0.065) and AVG is the average time, in years, lived by infants who die in their first year of life.[1] LE(1) gives more weight to the mortality rate of infants under one year of age relative to the mortality rates of other age groups.
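
A minimal Python sketch of formula (1), read as LE(1) = [LE(0) - IM × AVG] / (1 - IM), is given below. The function name and its arguments are hypothetical. AVG is passed in by the caller because the codebook gives only its end points (0.25 and 0.5), not the rule for intermediate values, and the input figures in the example are illustrative rather than taken from the dataset.

```python
def life_expectancy_at_age_one(le0, im_per_thousand, avg):
    """Convert life expectancy at birth to life expectancy at age 1.

    le0             -- life expectancy at birth, in years
    im_per_thousand -- infant mortality rate per thousand live births
    avg             -- average time (in years) lived by infants who die
                       in their first year; between 0.25 and 0.5
    """
    im = im_per_thousand / 1000.0  # the formula uses IM as a proportion
    if not 0.0 <= im < 1.0:
        raise ValueError("infant mortality must be in [0, 1000) per thousand")
    return (le0 - im * avg) / (1.0 - im)


# Illustrative inputs only: LE(0) = 70 years, IM = 65 per thousand, AVG = 0.35.
example_le1 = life_expectancy_at_age_one(70.0, 65.0, 0.35)
```

With no infant mortality the function returns LE(0) unchanged, and the gap between LE(1) and LE(0) grows as infant mortality rises.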

[1] This formula is the same as that used by Van der Lijn (1995). AVG needs further explanation since it is a complicated measure. The AVG value is 0.25 for countries with an infant mortality rate of 100 per thousand live births or more, 0.5 for countries with an infant mortality rate of 10 per thousand live births or less, and between 0.25 and 0.5 for countries between these rates. For example, Albania had an AVG value of 0.35 in 1970, with an infant mortality rate of 0.065 (65 per thousand), which rose to 0.45 in 2000, with an infant mortality rate of 0.020 (20 per thousand).

Morris defines literacy as the ability of the population aged fifteen and over to read and write. This definition may not suit all countries, since some may have defined literacy differently. The World Bank publishes data for illiteracy, the percentage of people aged fifteen and above who are illiterate, so the adult literacy rate was obtained by subtracting the illiteracy rate from 100 per cent. There are large gaps in the literacy data because such data are not collected annually, as the rate is assumed not to change significantly from one year to the next. This is also why UNESCO collects such data at five-year intervals. Where such gaps existed, linear interpolation was used to obtain data between known literacy values.[2] If no value was known before, say, 1980, all years prior to 1980 were coded as missing. Likewise, linear interpolation was not used where no value was available for the latest years; for example, the years 1999 and 2000 are missing for Somalia. There is one exception to this rule: for Guinea, linear interpolation was used to obtain data from 1975 to 1990 by using the literacy rate going back to 1965. Estimating missing data over such a long time span may result in imprecise rates, and it was done only for a few cases. Literacy rates for North Korea should also be read with care, since linear interpolation was used to estimate values for the years 1977 to 2000.

Another problem was to estimate missing data for the many OECD countries for which UNESCO has no recorded data. A method similar to that used by UNDP when constructing the HDI was applied: all such OECD countries were given a literacy rate of 99 per cent. Where data were actually available for such countries, these values were used, except for countries with a literacy rate above 99 per cent, such as Tajikistan for the years 1998 to 2000. Since it is assumed that a literacy rate of 100 per cent cannot be attained, 99 per cent is the highest rate in the dataset.

For the infant mortality rate, a few cases had missing values.
The longest of these gaps spanned four years, so linear interpolation was used to estimate the values for these cases. In addition, four values for the Central African Republic, for the years 1983 to 1986, were deleted and estimated by linear interpolation instead; the World Bank values were zero, probably in error.

These three indicators were then converted to indices ranging from 0 to 100, where 0 represents the worst and 100 the best performance. The index for life expectancy at age 1 was calculated using the highest and lowest recorded values in my dataset, 81 and 40 respectively.[3] The index for infant mortality was calculated in almost the same way, using the highest and lowest recorded values in my dataset, 263 and 2.9 respectively.[4] The literacy indicator was not rescaled. The three indices were then combined into a composite indicator by averaging them, giving each equal weight, to obtain the PQLI values.

Note that PQLI values have been calculated for some countries for years prior to their independence, such as Estonia from 1975 to 1991. These data exist simply because the sources record values for these countries; no attempt was made to estimate them. As these values contribute to a more balanced time-series dataset with fewer missing values, they are kept.

[2] Linear interpolation is a method for estimating missing data. To estimate such data, one must assume that a phenomenon changes over time. The rate of change should, ideally, be constant, and one needs two time points to obtain the missing values. For the literacy rates, it was assumed that the rate of change was constant across the missing values, and the following formula was applied to obtain the slope of the line: $y = (x_H - x_L) / (n + 1)$, where $y$ is the slope, $x_H$ is the highest value and $x_L$ the lowest value bounding the missing values, and $n$ is the number of missing values between $x_H$ and $x_L$. For example, to find the two values between 3 and 4, one first finds the slope, or the change between consecutive values: $y = (4 - 3) / (2 + 1) = 0.33$. The next value after 3 is then $3 + 0.33 = 3.33$.

[3] This differs somewhat from Morris's (1979) methodology. Morris assumes that large improvements in life expectancy will only occur if there is a breakthrough in the study of geriatrics, and he uses 77 years, two years above the current best, as the upper limit and 38 years as the lower limit (Vietnam in 1950). I do not go beyond my time-series dataset to find lower or higher values. Japan has the highest recorded value for life expectancy at age 1, in 2000, and Rwanda has the lowest, in 1992. The formula applied by Morris for the 0 to 100 index is:

\[ \frac{\text{life expectancy at age one} - 38}{0.39} \]

where 38 is the worst recorded life expectancy value and 0.39 is a factor calculated by subtracting the lowest recorded value from the highest and dividing by 100, so a change in life expectancy of 0.39 years results in a one-point change in the index (ibid., pp. 45f).

[4] Since the highest value for the infant mortality rate represents the worst performance and the lowest value the best, the formula is somewhat different:

\[ \frac{229 - \text{infant mortality rate per thousand}}{2.22} \]

where 229 is the worst recorded value Morris uses and 2.22 is a factor calculated by subtracting 7 (the best recorded value Morris uses minus one) from the worst value and dividing by 100 (ibid., pp. 43ff). I use 263 and 2.9 as the worst and best values, recorded for Cambodia in 1977 and Singapore in 2000 respectively.
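
The interpolation rule in footnote 2 and the index construction described in this section can be sketched in Python as follows. This is a hedged illustration, not the code used to build the dataset. All function and constant names are hypothetical, and the sketch assumes that the dataset's own bounds (81 and 40 for life expectancy at age 1, 263 and 2.9 for infant mortality) are applied in the same min-max form as Morris's formulas, which the text describes only as "almost" the same.

```python
# Bounds reported in the codebook. Using them in Morris's min-max form
# is an assumption of this sketch.
LE1_WORST, LE1_BEST = 40.0, 81.0   # life expectancy at age 1, in years
IM_WORST, IM_BEST = 263.0, 2.9     # infant mortality per thousand live births
LITERACY_CAP = 99.0                # highest literacy rate allowed, per cent


def interpolate_gaps(series):
    """Fill runs of None that lie between two known values.

    Leading and trailing None values are left missing, as in the codebook.
    The slope is (later - earlier) / (n + 1), so falling series also work.
    """
    filled = list(series)
    last_known = None  # index of the most recent known value
    for i, value in enumerate(series):
        if value is None:
            continue
        if last_known is not None and i - last_known > 1:
            n_missing = i - last_known - 1
            slope = (value - series[last_known]) / (n_missing + 1)
            for step in range(1, n_missing + 1):
                filled[last_known + step] = series[last_known] + slope * step
        last_known = i
    return filled


def literacy_from_illiteracy(illiteracy_pct):
    """Adult literacy in per cent, capped at 99."""
    return min(100.0 - illiteracy_pct, LITERACY_CAP)


def life_expectancy_index(le1):
    """Rescale life expectancy at age 1 to 0 (worst) .. 100 (best)."""
    return (le1 - LE1_WORST) / (LE1_BEST - LE1_WORST) * 100.0


def infant_mortality_index(im_per_thousand):
    """Rescale infant mortality to 0 (worst) .. 100 (best)."""
    return (IM_WORST - im_per_thousand) / (IM_WORST - IM_BEST) * 100.0


def pqli(le1, im_per_thousand, literacy_pct):
    """Equal-weight average of the two indices and the literacy rate."""
    return (life_expectancy_index(le1)
            + infant_mortality_index(im_per_thousand)
            + literacy_pct) / 3.0


# Worked example from footnote 2: two missing values between 3 and 4.
filled_example = interpolate_gaps([None, 3.0, None, None, 4.0, None])
```

In the footnote's example the two interior gaps are filled with values one third and two thirds of the way from 3 to 4, while the first and last entries stay missing because they are not bounded by known values on both sides.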

References

Garåsen, E. (2006). Democracy and Development: A Comparative Analysis in Time and Space. Master's Thesis, Department of Sociology and Political Science, Norwegian University of Science and Technology.
Kurian, G.T. (1979). The Book of World Rankings. London: The Macmillan Press Ltd.
Morris, D.M. (1979). Measuring the Conditions of the World's Poor: The Physical Quality of Life Index. New York: Pergamon Press Inc.
UNESCO (1980). Statistical Yearbook 1980. London: UNESCO.
UNESCO (1984). Statistical Yearbook 1984. Paris: The Unesco Press.
UNESCO (1991). Statistical Yearbook 1991. Paris: The Unesco Press.
UNESCO (1993). Statistical Yearbook 1990/91. New York: Department of Economic and Social Information and Policy Analysis, Statistical Division.
UNESCO (1994). Statistical Yearbook 1994. Paris: The Unesco Press.
UNESCO (1995). Statistical Yearbook 1995. Paris: UNESCO Publishing & Bernan Press.
UNDP (1999). Human Development Report 1999. New York: Oxford University Press.
UNDP (2000). Human Development Report 2000. New York: Oxford University Press.
UNDP (2002). Human Development Report 2002. New York: Oxford University Press.
UNICEF (1996). The State of the World's Children 1996. New York: UNICEF.
UNICEF (2000). The State of the World's Children 2000. New York: UNICEF.
UNICEF (2001). The State of the World's Children 2001. New York: UNICEF.
UNICEF (2005). The State of the World's Children 2005. New York: UNICEF.
Van der Lijn, N. (1995). Measuring well-being with social indicators, HDI, PQLI, and BWI for 133 countries for 1975, 1980, 1985, 1988, and 1992. Tilburg University, Faculty of Economics and Business Administration, Research Memorandum No. 704.
World Bank (2002). World Development Indicators. New York: United Nations.

Appendix

COW  Country                             Comments
700  Afghanistan
339  Albania
615  Algeria
540  Angola
160  Argentina
371  Armenia
900  Australia
305  Austria
373  Azerbaijan
771  Bangladesh
370  Belarus
211  Belgium
434  Benin
145  Bolivia
346  Bosnia
571  Botswana
140  Brazil
355  Bulgaria
439  Burkina Faso
516  Burundi
811  Cambodia
471  Cameroon
20   Canada
482  Central African Republic
483  Chad
155  Chile
710  China
100  Colombia
484  Congo, Republic of the
490  Congo, Democratic Republic of the
94   Costa Rica
437  Cote d'Ivoire
344  Croatia
40   Cuba
316  Czech Republic
390  Denmark
42   Dominican Republic
130  Ecuador
651  Egypt
92   El Salvador
531  Eritrea
366  Estonia
530  Ethiopia
375  Finland
220  France
372  Georgia
255  Germany                             United
452  Ghana
350  Greece
90   Guatemala
438  Guinea
41   Haiti
91   Honduras
310  Hungary
750  India
850  Indonesia
630  Iran
645  Iraq
205  Ireland
666  Israel
325  Italy
51   Jamaica
740  Japan
663  Jordan
705  Kazakhstan
501  Kenya
731  Korea, North
732  Korea, South
703  Kyrgyzstan
812  Laos
367  Latvia
660  Lebanon
570  Lesotho
450  Liberia
620  Libya
368  Lithuania
343  Macedonia
580  Madagascar
553  Malawi
820  Malaysia
432  Mali
435  Mauritania
70   Mexico
359  Moldova
712  Mongolia
600  Morocco
541  Mozambique
775  Myanmar
790  Nepal
210  Netherlands
920  New Zealand
93   Nicaragua
436  Niger
475  Nigeria
385  Norway
770  Pakistan
95   Panama
910  Papua New Guinea
150  Paraguay
135  Peru
840  Philippines
290  Poland
235  Portugal
360  Romania
365  Russia
517  Rwanda
670  Saudi Arabia
433  Senegal
451  Sierra Leone
830  Singapore
317  Slovakia
349  Slovenia
520  Somalia
560  South Africa
230  Spain
780  Sri Lanka
625  Sudan
380  Sweden
225  Switzerland
652  Syria
702  Tajikistan
510  Tanzania
800  Thailand
461  Togo
616  Tunisia
640  Turkey
701  Turkmenistan
500  Uganda
369  Ukraine
200  United Kingdom
2    United States
165  Uruguay
704  Uzbekistan
101  Venezuela
816  Vietnam, North
818  Vietnam                             United
678  Yemen, North
679  Yemen                               United
345  Yugoslavia                          United
551  Zambia
552  Zimbabwe