National Programme for Estonian Language Technology: a Pre-final Summary

Einar Meister (Vice-chairman), Jaak Vilo (Chairman) & Neeme Kahusk (Coordinator of the Programme)

Outline
- HLT evolution in Estonia
- Management
- Financing
- Supported projects
- Research groups
- Future prospects
- Summary

HLT evolution in Estonia
- 1960-70s: machine translation experiments, experimental phonetics, speech analysis & synthesis, semantic analysis, computer linguistics
- 1980s: microprocessor-controlled formant synthesis, speech recognition, human-machine dialogue modelling, electronic dictionaries
- 1990s: corpus linguistics (text and speech corpora), morphological analysis, speller for Estonian, electronic dictionaries, Web resources, participation in EU projects (WordNet, BABEL, etc.)
- 2000s: written and spoken language corpora, morpho-syntactic and semantic analysis, lexical resources and tools, speech synthesis and recognition, dialogue models, information retrieval, machine translation, Web-based access to different resources and tools

HLT evolution in Estonia
Coordinated actions:
- Estonian HLT program supported by the Estonian Informatics Centre (1997-2000)
- EU FP5 project eVikings II (2002-2005): Roadmap for Estonian HLT 2004-2011
- Centre of Excellence in HLT (2003): successful in the first round, failed in the final round
- Estonian Language Technology Development Centre (2005): accepted for financing, but failed due to the withdrawal of the main industrial partner
- National programme "Estonian Language and Cultural Heritage" (1999-2003): some HLT projects funded
- National programme "Estonian Language and National Memory" (2004-2008): sub-programme for Estonian HLT (2004-2005)
- Development Strategy of the Estonian Language 2004-2010
- National Programme for Estonian Language Technology (2006-2010)

National Programme for Estonian Language Technology 2006-2010
- Government-supported funding initiative aimed at developing Estonian language resources and language-specific software in order to enable Estonian to function in the modern information technology environment
- Estonian Ministry of Education and Research

Management (1)
Steering committee of 9 members, including representatives of the ministries and HLT experts, responsible for:
- evaluation of project proposals and progress reports
- making funding proposals
- purposeful use of public funding
- surveying the developments in the HLT field on the national and international scale

Management (2)
Programme coordinator responsible for:
- preparing calls for projects
- project contracts and reports
- communication between the ministry, steering committee and project leaders
- documentation and Web-site administration

Management (3)
General rules:
- financing of projects is based on open competition
- evaluation of projects is based on well-established criteria
- international standards/formats need to be followed
- groups are requested to provide annual progress reports
- developed prototypes and language resources are public

Management (4)
Project evaluation criteria:
- for new applications:
  - relevance of the proposal in the context of the programme
  - methods applied to achieve the goals of the project
  - competence and experience of the project team
  - usefulness of the project's results for other projects
  - compatibility and use of standards
  - etc.
- for assessment of the annual progress of on-going projects

Funding (1)
The funding decision is based on the average score of the individual ratings given by the steering committee members.

Average score   Coefficient
90-100%         0.8-1
65-90%          0.7-0.9
< 65%           0

The coefficients depend on the available funding and the number of applications.
Ca 33% for corpus projects, 65% for software & research projects, 1-2% for management.
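The scoring rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the programme's actual procedure: the function name `coefficient_range`, the sample ratings, and the handling of the boundary values (exactly 90% counted in the upper band, exactly 65% in the middle band) are not given on the slide.

```python
from statistics import mean

# Score bands and coefficient ranges as listed on the slide.
# Assumption (not stated on the slide): an average of exactly 90% falls
# into the upper band, and exactly 65% into the middle band.
BANDS = [
    (90.0, (0.8, 1.0)),
    (65.0, (0.7, 0.9)),
]


def coefficient_range(ratings):
    """Return (average score, (min coefficient, max coefficient))."""
    average = mean(ratings)
    for lower_bound, coefficients in BANDS:
        if average >= lower_bound:
            return average, coefficients
    # Average below 65%: the coefficient is 0.
    return average, (0.0, 0.0)


# Hypothetical ratings (in %) from nine steering committee members.
ratings = [92, 85, 90, 88, 95, 91, 87, 93, 89]
average, (low, high) = coefficient_range(ratings)
print(f"average {average:.1f}% -> coefficient between {low} and {high}")
```

Where the coefficient lands inside its range is left open here, since the slide only says that it depends on the available funding and the number of applications.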

Statistics: projects & funding

                                  2006         2007         2008         2009         2010
Number of project applications    22           22 (18+4)    23 (20+3)    24 (15+9)    24 (22+2)
Number of funded projects         18           20 (18+2)    23 (20+3)    23 (15+8)    24 (22+2)
Total funding, MEEK (MEUR)        7.3 (0.47)   7.1 (0.46)   13.4 (0.86)  12.9 (0.83)  11.8 (0.75)

Projects
http://www.keeletehnoloogia.ee/projects
- Speech corpora: emotional speech, spontaneous speech, dialogues, L2 speech, radio news and talk shows
- Text corpora: written language corpus, multi-lingual parallel corpora, resources for interactive language learning
- Research/technology development: speech recognition & synthesis, machine translation, information retrieval, lexicographic tools, syntactic & semantic analysis, dialogue modeling, rule-based language software, intelligent search engine, variations in speech production and perception

Key players (1)
University of Tartu:
- morphology, syntax, semantics, and machine translation
- corpora of written and spoken language, dialogue corpora, parallel corpora, lexical and semantic database (thesaurus, Estonian WordNet), phonetic corpus of spontaneous speech
- rule-based language software, information retrieval, interactive Web-based language learning

Key players (2)
Institute of the Estonian Language:
- Corpus-based speech synthesis for Estonian
- Estonian Emotional Speech Corpus
- Lexicographer's workbench

Key players (3)
Institute of Cybernetics at Tallinn University of Technology:
- automatic speech recognition in Estonian
- variability in speech production and perception
- speech corpora including radio news and talk shows, lecture speech, foreign-accented speech

Key players (4)
- Filosoft: corpus query in the Estonian language website keeleveeb.ee
- Tallinn University: Estonian Interlanguage Corpus
- Estonian Literary Museum: electronic dictionary of idiomatic expressions
- ELIKO: a prototype of a Controlled Natural Language module for knowledge-based systems

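Division of funding 2006-2010
- UT (University of Tartu): 50.4%
- IEL (Institute of the Estonian Language): 27.5%
- IoC (Institute of Cybernetics): 16.1%
- Filosoft: 2.4%
- TlnU (Tallinn University): 2.4%
- ELM (Estonian Literary Museum): 1.0%
- ELIKO: 0.2%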

Distribution of results (1)
Centre of Estonian Language Resources:
- the project was launched in 2008 at the University of Tartu
- partners: Institute of the Estonian Language and Institute of Cybernetics at TUT
- main goal: to develop the infrastructure for archiving, documenting and distributing Estonian language resources and software tools
- cooperation with the CLARIN project
- in 2010 included in the Estonian Research Infrastructures Roadmap

Distribution of results (2)
Programme conferences:
- 1st conference: November 2007, Tallinn
- 2nd conference: April 2009, Tartu
- 3rd conference: November 25-26, 2010, Tartu

Supporting activities
Development of human resources:
- Doctoral School of Linguistics and Language Technology (2005-2008)
- Doctoral School in Information and Communication Technologies (2009-2015)
- Centre of Excellence in Computer Science (2008-2015)
- Curricula on computer linguistics and language technology at the University of Tartu
- Speech technology course at Tallinn University of Technology

Future prospects
Currently under development:
- Estonian BLARK
- Estonian HLT Roadmap for 2011-2017
- follow-up programme for 2011-2017
Focus of the follow-up programme: resources, software tools and integrated prototypes for public applications
Important issues:
- availability of resources and tools via the Centre of Estonian Language Resources
- promoting HLT integration into public and commercial applications
- urgent need for HLT engineers and researchers

Summary
- The national programme has created favourable conditions for HLT development in Estonia
- < 50 MEEK (3.5 MEUR) invested in the HLT area, < 30 different projects funded
- Remarkable progress in the amount and diversity of Estonian language resources and tools
- A good basis for future applications and international cooperation
- Estonian HLT will not be ready by the end of 2010; a follow-up programme is necessary

Last, but not least
Steven Krauwer's talk at the 2nd Baltic HLT conference in Tallinn, 2005: "How to survive in a multilingual EU?"
- Do not expect too much from the EU, due to the subsidiarity principle
- National-level activities are important: if you don't care for your language, no one will!
- There are at least two areas which should be developed mainly at the national level: creation of language resources and training of language technologists

Really final
Are we moving fast enough?
- Interspeech 2010: real-time speech-to-speech translation
- Google voice browser, etc.