Improving Record-Linkage-Software for Survey-Data


Second conference of the European Survey Research Association, 25-29 June 2007 in Prague, Czech Republic

Rainer Schnell, Tobias Bachteler, and Jörg Reiher
Center for Quantitative Methods and Survey Research, University of Konstanz, Germany
June 25, 2007

Introduction

Survey data are increasingly linked with individual administrative data. Such linkages can be used to improve the data quality of surveys. Work histories are one example: respondents tend to underreport short spells of unemployment.

Outline

1. Main problems of applying record linkage on survey data
2. Consenting procedures in German population surveys
3. Improving record linkage by augmenting data
3.1 A Bayes classifier for nationality
3.2 Using birth place information
3.3 Geocoding of post codes
4. Summary

Main Problems of Applying Record Linkage on Survey Data

Two main problems have to be solved:
1. Data protection objections
2. Error-prone common identifiers

The first problem can be solved by explaining the purpose of the linkage to the respondents. The second problem can be solved by using special software. We have implemented Merge Toolbox (MTB), a state-of-the-art record linkage software for the social sciences (free for academic use). Further improvements will presumably be data driven, either by improving the data quality or by augmenting the existing data. We intend to integrate the data augmentation process into MTB.

Merge Toolbox - MTB (figure not reproduced)

Consenting Procedures in German Surveys

Signed permission of each respondent in a survey.
Data protection objections can be ameliorated by obtaining the informed consent of survey respondents, whether in written form or by telephone.
Contrary to popular belief, it is possible to obtain consent to link the data from a large fraction of the respondents.
The consent rate depends on details of implementation, such as the position of the request in the questionnaire.
Explaining the purpose and importance of the linkage to the respondents is essential.

Example text of a consent form

In order to keep the interview as short as possible, we would like to use administrative data held by the Federal Employment Agency. Such administrative data could include information about previous periods of employment or unemployment, and participation in employment programs. We would like to ask you to consent to the linkage of your administrative data with your interview data. Should this information be analysed, it is absolutely guaranteed that all regulations of data privacy laws are strictly met. Your consent is purely voluntary. You are free to revoke it at any time you like.

Adapted and translated from a questionnaire of the Institute for Applied Social Sciences (infas), Bonn.

Consenting Rates

Table 1: Consenting rates to linkage requests in surveys conducted by infas

Client                                         Year            n   Rate
Max Planck Institute for Human Development     1998-1999   3,000  80.6%
Federal Agency of Employment                   2007        1,100  69.0%
Institute of Employment Research               1999-2004   9,000  78.0%
Institute of Employment Research               2000-2001  24,000  73.4%
Institute of Employment Research               2000-2004   4,083  79.0%
Institute of Employment Research               2005-2006  24,000  91.9%
Federal Ministry of Labour and Social Affairs  2002        6,183  69.0%
Federal Ministry of Labour and Social Affairs  2003-2006   1,500  87.7%
Federal Ministry of Labour and Social Affairs  2003-2006  10,000  97.5%
Federal Ministry of Labour and Social Affairs  2004-2006  24,000  84.9%

Source: Doris Hess (infas, Bonn), personal communication

Improving Record Linkage by Augmenting Data

The second main problem when applying record linkage to survey data is error-prone common identifiers. Existing algorithms can be improved by augmenting the data. The main purpose is to obtain additional blocking variables. We will illustrate this with three examples.

A Bayes Classifier for Nationality

Goal: to add information about the nationality of the respondents.
We trained a naive Bayes classifier by calculating, for each nationality n_i, the probability that a given trigram is contained in the surname of a person with that nationality.
A given surname is classified as of nationality n_i if the conditional probability of n_i given its trigram set is maximal.
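The classification rule above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the class name `TrigramNaiveBayes`, the blank padding of surnames, the Laplace smoothing parameter `alpha`, and the toy training names are all assumptions made for the example. For simplicity, the score uses only the trigrams present in the surname.

```python
import math
from collections import Counter, defaultdict


def trigrams(surname):
    """Return the set of character trigrams of a padded, lower-cased surname."""
    padded = f"  {surname.lower()} "  # padding scheme is an assumption
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


class TrigramNaiveBayes:
    """Hypothetical naive Bayes classifier over surname trigram sets."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                          # Laplace smoothing (assumption)
        self.class_counts = Counter()               # surnames seen per nationality
        self.trigram_counts = defaultdict(Counter)  # surnames containing a trigram, per nationality

    def fit(self, names, nationalities):
        for name, nat in zip(names, nationalities):
            self.class_counts[nat] += 1
            self.trigram_counts[nat].update(trigrams(name))
        return self

    def log_score(self, name, nat):
        """Unnormalised log posterior of nationality `nat` given the trigram set."""
        n = self.class_counts[nat]
        total = sum(self.class_counts.values())
        score = math.log(n / total)  # log prior
        for tri in trigrams(name):
            # Smoothed probability that a surname of this nationality contains `tri`
            p = (self.trigram_counts[nat][tri] + self.alpha) / (n + 2 * self.alpha)
            score += math.log(p)
        return score

    def predict(self, name):
        # Choose the nationality with the maximal conditional probability
        return max(self.class_counts, key=lambda nat: self.log_score(name, nat))


# Toy training data, invented for illustration only
names = ["Müller", "Schmidt", "Schneider", "Schmitz",
         "Yilmaz", "Yildiz", "Yildirim", "Kaya"]
labels = ["German"] * 4 + ["Turkish"] * 4
clf = TrigramNaiveBayes().fit(names, labels)
print(clf.predict("Schmid"), clf.predict("Yildis"))
```

Working with log probabilities avoids numerical underflow when many trigram probabilities are multiplied, and the smoothing keeps unseen trigrams from forcing a probability of zero.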

Application

In an experiment with 70,000 real names of persons with known nationality, we tested the performance of the classifier.
We tried to classify the names into 137 classes (nationalities).
81% of the names are correctly classified.
Among the names of non-Germans, 44% are correctly classified.
The proportional reduction in error (PRE) for the non-German names amounts to 1/3.

Using Birth Place Information

Sometimes the birth places of German respondents are available.
If a birth place is not located in Germany, the respondent is possibly a naturalised person.
Since most naturalised Germans were born in Eastern European countries, we compiled a list of typical birth places there.
Region of birth can serve as an additional blocking variable.
Additional usage: treating the names of these respondents differently in similarity calculations.
Further application: sampling of special populations.
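To illustrate what an additional blocking variable buys, the following sketch compares only records that share the same region of birth. It is a hedged example, not MTB code: the function `candidate_pairs`, the field names `id` and `birth_region`, and the toy records are hypothetical.

```python
from collections import defaultdict


def candidate_pairs(records_a, records_b, block_key):
    """Return id pairs of records that agree on the blocking variable."""
    blocks = defaultdict(list)
    for rec in records_b:
        blocks[rec[block_key]].append(rec)
    pairs = []
    for rec_a in records_a:
        # Only records in the same block are compared at all
        for rec_b in blocks.get(rec_a[block_key], []):
            pairs.append((rec_a["id"], rec_b["id"]))
    return pairs


# Toy records, invented for illustration only
survey = [
    {"id": "s1", "birth_region": "Germany"},
    {"id": "s2", "birth_region": "Eastern Europe"},
    {"id": "s3", "birth_region": "Germany"},
]
register = [
    {"id": "r1", "birth_region": "Germany"},
    {"id": "r2", "birth_region": "Eastern Europe"},
    {"id": "r3", "birth_region": "Eastern Europe"},
]

pairs = candidate_pairs(survey, register, "birth_region")
print(len(pairs), "candidate pairs instead of", len(survey) * len(register))
```

In this toy case blocking reduces the comparison space from 9 pairs to 4. The price is that true matches with a disagreeing or erroneous blocking value are never compared.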

Geocoding of Post Codes

We have compiled geo-coordinates for nearly all of the approximately 30,000 German post codes.
If the post code is a common identifier, the geo-information can be linked.
Based on the post codes, the distance for every record pair can be calculated.
The distance is usable as a similarity measure or as a blocking variable.
Underlying hypothesis: if respondents move, they are more likely to move to a place nearby.
3,737,000 people in Germany moved across municipal borders in 2004. 70% of them moved to places within the same federal state, 30% moved across federal state borders. (Source: Federal Statistical Office, Data Report 2006)
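A distance between two post codes can be computed from their coordinates, for example with the haversine (great-circle) formula. The sketch below is an assumption about how such a calculation might look; the slides do not specify the distance formula. The lookup table `POSTCODE_COORDS` holds approximate, illustrative coordinates rather than the authors' data, and the linear similarity with the cutoff `max_km` is a hypothetical choice.

```python
import math

# Hypothetical lookup: post code -> (latitude, longitude) in degrees.
# Values are approximate and for illustration only.
POSTCODE_COORDS = {
    "78464": (47.68, 9.18),  # Konstanz
    "78462": (47.66, 9.17),  # Konstanz
    "28195": (53.08, 8.81),  # Bremen
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def postcode_distance_km(pc1, pc2):
    """Distance between two post codes, or None if a post code is not geocoded."""
    if pc1 not in POSTCODE_COORDS or pc2 not in POSTCODE_COORDS:
        return None
    return haversine_km(POSTCODE_COORDS[pc1], POSTCODE_COORDS[pc2])


def distance_similarity(pc1, pc2, max_km=100.0):
    """Map distance to [0, 1]; the linear form and the cutoff are assumptions."""
    d = postcode_distance_km(pc1, pc2)
    if d is None:
        return None
    return max(0.0, 1.0 - d / max_km)


print(round(postcode_distance_km("78464", "78462"), 1), "km within Konstanz")
print(round(postcode_distance_km("78464", "28195")), "km from Konstanz to Bremen")
```

Returning `None` for post codes without coordinates keeps missing geo-information distinct from a distance of zero, so such pairs can be handled separately in the matching step.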

Application

In collaboration with the Bremen Cancer Registry, we are going to match data from a mammography screening with data from an epidemiological cancer registry. Some of the women will have moved between the data collections. We will add geo-coordinates and use the distance as an additional matching variable.

Summary

There are two main problems in using record linkage with survey data:
1. Data protection objections
2. Error-prone identifiers

For each, we presented a possible remedy:
1. It is possible to obtain consent to link individual data from a large fraction of survey respondents.
2. Additional blocking and matching variables can be obtained by augmenting the available data.

Further Information

Literature
Schnell, R., Bachteler, T., and Bender, S. (2003): Record Linkage Using Error Prone Strings. Proceedings of the Joint Statistical Meeting, pp. 3713-3717, American Statistical Association.
Schnell, R., Bachteler, T., and Bender, S. (2004): A Toolbox for Record Linkage. Austrian Journal of Statistics, 33(1-2), 125-133.
Schnell, R., Bachteler, T., and Reiher, J. (2005): MTB - Ein Record-Linkage-Programm für die empirische Sozialwissenschaft. ZA-Information, 56, 93-103.

Contact: recordlinkage@uni-konstanz.de
Project home: http://www.uni-konstanz.de/schnell/safelink.html