BINF6201/8201. Molecular phylogenetic methods

BINF6080 Molecular phylogenetic methods 4-08-06

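Maximum likelihood methods
Ø So far we have only considered a single site configuration. The likelihood for all sites is the product of the likelihoods for each site, assuming all the sites evolve independently.
Ø Suppose that there are $s$ homologous sequences, each with $N$ nucleotides. Let $D_i = (d_{1i}, d_{2i}, \ldots, d_{si})^{\top}$ be the $i$-th column of the multiple alignment.
Ø For a tree $T$, let $f(D_i \mid T, \theta_1, \theta_2, \ldots, \theta_m)$ be the likelihood of tree $T$ for the $i$-th site, where $\theta_1, \theta_2, \ldots, \theta_m$ are the unknown parameters, such as the branch lengths. Using the previous case as an example, we have
[Equation not recoverable from the transcript. The surviving symbols indicate the site likelihood $f$ for the example tree, built from the prior $g$, transition probabilities $P$ between ancestral states $x$, $y$ and observed nucleotides $h$, $k$, $l$, with the branch lengths $v_1, \ldots, v_5$ as the parameters $\theta$.]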

Maximum likelihood methods
Ø For simplicity, let's assume the sequences are homogeneous, i.e., all sites evolve at the same rate. Then the likelihood function for the entire sequence for the tree $T$ is
$$L(\theta_1, \theta_2, \ldots, \theta_m) = \prod_{i=1}^{N} f(D_i \mid T, \theta_1, \theta_2, \ldots, \theta_m)$$
Ø Here we treat $L$ as a function of the parameters. We then search for the values of $\theta_1, \theta_2, \ldots, \theta_m$ that maximize $L$ given the topology of the tree; this value of $L$ is called the ML value of the tree.
Ø Finding the ML value can be a slow process.
Ø We do this for all possible tree topologies and identify the one that has the largest ML value as the inferred phylogenetic tree of the $s$ sequences.
Ø Clearly, different substitution models may result in different trees.
Ø When the number of OTUs is large, a heuristic tree search algorithm should be used for evaluating the alternative trees.
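As a rough illustration of treating $L$ as a function of a parameter and searching for its maximum, the sketch below uses the smallest possible case: two aligned sequences and a single parameter, the distance `d`, under the Jukes-Cantor (JC69) model. The function names, the toy sequences, and the use of a ternary search are all assumptions made for this example; they are not taken from the lecture, and a real tree has many more parameters.

```python
import math

def jc69_site_prob(a, b, d):
    """Probability of seeing nucleotides a and b at one site under JC69.

    d is the expected number of substitutions per site between the two
    sequences; 0.25 is the equilibrium frequency of each base.
    """
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e   # base is unchanged after distance d
    p_diff = 0.25 - 0.25 * e   # base changed to one specific other base
    return 0.25 * (p_same if a == b else p_diff)

def log_likelihood(seq1, seq2, d):
    """Log of the product of per-site likelihoods (sites are independent)."""
    return sum(math.log(jc69_site_prob(a, b, d)) for a, b in zip(seq1, seq2))

def ml_distance(seq1, seq2, lo=1e-6, hi=3.0, iters=100):
    """Ternary search for the d that maximizes the log-likelihood.

    Assumes the log-likelihood has a single peak on [lo, hi].
    """
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if log_likelihood(seq1, seq2, m1) < log_likelihood(seq1, seq2, m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

# Toy alignment: 20 sites, 2 of which differ.
seq1 = "ACGTACGTACGTACGTACGT"
seq2 = "ACGTACGAACGTACTTACGT"
d_hat = ml_distance(seq1, seq2)
ml_value = log_likelihood(seq1, seq2, d_hat)
```

For this two-sequence case the numerical optimum can be checked against the closed-form JC69 distance, $d = -\tfrac{3}{4}\ln(1 - \tfrac{4}{3}p)$, where $p$ is the fraction of differing sites. Working with log-likelihoods avoids numerical underflow when the product runs over many sites.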

Heuristic tree search using predefined clusters
Ø Although the tree space could be very large, the majority of the trees have extremely low likelihood values for a given set of OTUs.
Ø So we can safely ignore these unpromising trees and focus on the promising ones.
Ø To reduce the search space, we can predefine clusters as the input if their relationships are known.
Ø Then in our example, the problem becomes examining the 105 possible trees generated by connecting these predefined groups, instead of an astronomically large number of unrooted trees:
$$N_U = (2N - 5)!! = (2 \times 23 - 5)!! = 41!!$$
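The size of the search space can be checked with a few lines of Python. This is a minimal sketch of the double-factorial count of unrooted bifurcating trees, $(2N-5)!!$; the function names are chosen for this example only.

```python
def double_factorial(k):
    """Return k!! = k * (k - 2) * (k - 4) * ... down to 1 or 2."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def num_unrooted_trees(n):
    """Number of unrooted bifurcating trees for n >= 3 labelled tips."""
    if n < 3:
        raise ValueError("need at least 3 tips")
    return double_factorial(2 * n - 5)

# Six units joined in every possible way give 7!! trees.
small = num_unrooted_trees(6)
# Twenty-three units give 41!! trees.
large = num_unrooted_trees(23)
```

Python integers have arbitrary precision, so the count for 23 tips is computed exactly; it is on the order of $10^{25}$, which is why an exhaustive search over individual sequences is impractical.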

Heuristic tree search using predefined clusters
Ø The ML value is computed for each tree, and the one with the largest ML value is returned as the inferred tree.
Ø Because this algorithm examines all possible trees that connect the predefined groups, the global optimum is guaranteed if the predefined groups are correct.
Ø When the simple J-C model was used and a homogeneous substitution rate was assumed, the resulting ML tree is similar to the NJ and parsimony trees, with the same problem of misplacing the tree shrews inside the primate group.

Maximum likelihood trees for primates
Ø However, when the more sophisticated HKY substitution model plus six γ-distribution rate categories and invariant sites were used, the tree constructed by the ML method places the tree shrews outside of the primate group.
Ø Nevertheless, there are three trifurcations on this tree, indicating that at a trifurcation point any of the three clusters can be an outgroup of the other two, and the three trees have the same ML value.

Comparison of parsimony and maximum likelihood methods
Ø Parsimony methods have only one assumption, that the changes on the branches are equally possible; however, this assumption may not hold.
Ø Because parsimony methods use so few assumptions, their proponents believe that these methods can be applied to any sequence data.
Ø Parsimony methods are also relatively fast, so they can be applied to larger data sets.
Ø ML methods make assumptions about the evolutionary models.
Ø ML methods need to optimize all the model parameters to find the ML value; therefore they are computationally intensive and very slow.
Ø When evolutionary models are properly selected, ML methods tend to achieve better results than parsimony methods.

Heuristic tree search using quartet puzzling
Ø The quartet puzzling algorithm is a very fast heuristic algorithm for exploring the promising trees.
Step 1: Compute the ML values of the three possible trees for every possible set of four sequences.
[Figure: for each possible set of four sequences, the three possible quartet trees are compared and the best ML tree is kept.]
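The bookkeeping behind Step 1 can be sketched as follows. The code only enumerates the quartets and the three unrooted topologies of each; the ML scoring of each topology is deliberately left out, and the function names and the representation of a topology as a pair of tip pairs are assumptions made for this example.

```python
from itertools import combinations

def quartet_topologies(quartet):
    """Return the three unrooted topologies of four tips.

    Each topology is written as a split ((a, b), (c, d)), meaning that
    a and b sit on one side of the internal branch and c and d on the other.
    """
    a, b, c, d = quartet
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]

def all_quartet_trees(taxa):
    """Map every 4-subset of the taxa to its three candidate topologies."""
    return {q: quartet_topologies(q) for q in combinations(taxa, 4)}

# Six sequences labelled 1..6, as in the lecture's example.
trees = all_quartet_trees(range(1, 7))
n_quartets = len(trees)
n_candidate_trees = sum(len(t) for t in trees.values())
```

In a full implementation, each of the three topologies per quartet would be scored by its ML value and only the best one (or the tied best ones) would be kept for the puzzling steps that follow.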

Heuristic tree search using quartet puzzling
Step 2: Randomly pick four sequences and place them in the tree according to their best ML tree.
[Figure: the starting quartet tree.]
Step 3: Randomly pick a remaining sequence and add it to the tree such that the growing tree has a maximum number of best ML quartet trees. Repeat this process until all sequences are added to the tree.
For example, if sequence 5 is randomly picked, and if one or both of the following trees are the best ML quartet trees involving sequences 1, 2, 3, 4 and 5:
[Figure: the best ML quartet trees that contain sequence 5.]
then the resulting tree will be
[Figure: the five-sequence tree after adding sequence 5.]

Heuristic tree search using quartet puzzling
Then the last sequence, 6, is added to the tree. If the following is the best ML tree among all quartet trees containing sequence 6:
[Figure: the best ML quartet tree that contains sequence 6.]
then the resulting tree will be
[Figure: the six-sequence tree after adding sequence 6.]
Ø The whole process is repeated many times, with the sequences being selected in different orders. The resulting tree depends on the order in which the sequences are selected.
Ø The tree that occurs most frequently is chosen as the inferred tree.

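Bayesian phylogenetic methods
Ø Bayes' theorem: if $A$ and $B$ are two events, then
$$P(A \mid B)\,P(B) = P(B \mid A)\,P(A) = P(AB), \qquad P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
[Figure: a sample space partitioned into ten numbered events.]
Ø If $A_1, A_2, \ldots, A_n$ are events that partition the sample space and $B$ is an event from the sample space, then
$$P(A_i \mid B) = \frac{P(B \mid A_i)\,P(A_i)}{P(B \mid A_1)\,P(A_1) + P(B \mid A_2)\,P(A_2) + \cdots + P(B \mid A_n)\,P(A_n)}$$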

Bayesian phylogenetic methods
Ø For $N$ OTUs we can have $(2N-5)!!$ possible unrooted trees, which form a partition of the tree space. Let $D$ be the alignment of the $N$ OTUs; we do not know which tree is most likely to account for $D$.
[Figure: the tree space partitioned into tree 1, tree 2, ..., tree n.]
Ø In the ML method, we compute the likelihood that $D$ is generated by each tree: $L(\text{tree}_i) = P(D \mid \text{tree}_i)$. We find the maximum likelihood $\mathrm{ML} = \max[P(D \mid \text{tree}_i)]$ by changing the parameters (branch lengths or substitution rates) on each tree, and return the tree that has the largest ML.
Ø In Bayesian methods, we compute the probability of a tree given the observed alignment $D$ of the $N$ OTUs, which is called the posterior probability $P(\text{tree}_i \mid D)$.

Bayesian phylogenetic methods
Ø Using Bayes' theorem we have
$$P(\text{tree}_i \mid D) = \frac{P(D \mid \text{tree}_i)\,P(\text{tree}_i)}{\sum_{j} P(D \mid \text{tree}_j)\,P(\text{tree}_j)}$$
where $P(\text{tree}_i)$ is called the prior probability.
Ø Calculating the denominator of the posterior probability can be difficult because we have to enumerate all possible trees and their branch lengths or substitution rates.
Ø However, the value of the denominator is a constant for all possible trees; thus the posterior probability of each tree is proportional to the likelihood of the tree multiplied by the prior probability:
$$P(\text{tree}_i \mid D) \propto P(D \mid \text{tree}_i)\,P(\text{tree}_i)$$
Ø If we can generate a large number of trees such that the frequency of a tree is proportional to its likelihood multiplied by its prior probability, then the posterior probability can be easily computed by
$$P(\text{tree}_i \mid D) \approx \frac{\text{number of trees in the sample with the same topology as } \text{tree}_i}{\text{total number of trees in the sample}}$$
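The two ways of obtaining a posterior described above can be sketched in Python. The first function normalizes likelihood times prior over a small, fully enumerated set of trees; the second estimates the posterior from topology frequencies in a sample. Both function names, the example log-likelihood values, and the string labels standing in for topologies are assumptions made for this illustration.

```python
import math
from collections import Counter

def posterior_from_log_likelihoods(log_liks, priors=None):
    """Posterior over an enumerated set of trees.

    log_liks[i] is log P(D | tree_i). If priors is None, a uniform prior
    is assumed. The maximum is subtracted before exponentiating so that
    very negative log-likelihoods do not underflow to zero.
    """
    n = len(log_liks)
    if priors is None:
        priors = [1.0 / n] * n
    log_joint = [ll + math.log(p) for ll, p in zip(log_liks, priors)]
    m = max(log_joint)
    weights = [math.exp(x - m) for x in log_joint]
    total = sum(weights)   # plays the role of the denominator
    return [w / total for w in weights]

def posterior_from_sample(sampled_topologies):
    """Estimate the posterior as the frequency of each topology in a sample."""
    counts = Counter(sampled_topologies)
    total = len(sampled_topologies)
    return {topology: c / total for topology, c in counts.items()}

# Three hypothetical trees with made-up log-likelihoods.
post = posterior_from_log_likelihoods([-1000.0, -1001.0, -1005.0])

# A made-up sample of four topologies, labelled by strings.
freq = posterior_from_sample(["A", "A", "B", "A"])
```

The first approach only works when every tree can be enumerated; the second is what a sampler makes possible when the tree space is too large to enumerate.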

The Markov chain Monte Carlo method for sampling
Ø Markov chain Monte Carlo (MCMC) is a method for generating a sample from the entire sample space such that the frequency of each individual in the sample is proportional to its likelihood of generating the observed data.
Ø If we have no preference for any tree before seeing the data, we can use a non-informative uniform prior probability; therefore we have
$$P(\text{tree}_i \mid D) = \frac{P(D \mid \text{tree}_i)\,P(\text{tree}_i)}{\sum_{j} P(D \mid \text{tree}_j)\,P(\text{tree}_j)} = \frac{P(D \mid \text{tree}_i)}{\sum_{j} P(D \mid \text{tree}_j)}$$
Ø This means that the posterior probability of a tree is proportional to its likelihood if the prior probability is the same for all trees.
Ø The MCMC method begins with a trial tree $T_1$ and computes its likelihood $L_1$. A new tree $T_2$ is generated by a move on $T_1$ that changes, by a small amount, any of the following parameters:
1. Branch length;
2. Rate of substitution;
3. Topology, by a nearest neighbor interchange tree move.

The Markov chain Monte Carlo method for sampling
Ø The likelihood of the new tree, $L_2$, is computed, which is usually slightly different from $L_1$.
If $L_2 > L_1$, then $T_2$ is accepted and it becomes an element in the sample.
If $L_2 < L_1$, then $T_2$ is accepted with probability $L_2 / L_1$.
This rule of selection is called the Metropolis algorithm (criterion).
Ø Therefore, the MCMC method favors hill-climbing moves but also allows downhill moves with a certain probability $L_2 / L_1$.
Ø The result is that the equilibrium probabilities of observing the different trees in the sample are given by the likelihoods of the trees.
Ø To see this, suppose that we have only two trees, $T_1$ and $T_2$, so MCMC moves back and forth between them with transition probabilities $r_{12}$ and $r_{21}$.
[Figure: trees $T_1$ and $T_2$ connected by arrows labelled $r_{12}$ and $r_{21}$.]

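The Markov chain Monte Carlo method for sampling
Ø Let $p_1$ and $p_2$ be the equilibrium probabilities of these trees in the sample. Then at equilibrium, the probabilities of observing these trees during the sampling process should be constant, that is,
$$p_1 r_{12} = p_2 r_{21}, \quad \text{or} \quad \frac{r_{12}}{r_{21}} = \frac{p_2}{p_1}$$
Ø This property is called detailed balance. For the trees in the sample to be proportional to their likelihoods, we need to set
$$\frac{p_2}{p_1} = \frac{L_2}{L_1}. \quad \text{Therefore we have} \quad \frac{r_{12}}{r_{21}} = \frac{L_2}{L_1}$$
Ø This means that to generate the desired sample, we should set the ratio of the transition probabilities equal to the ratio of the likelihoods.
Ø The MCMC algorithm does just this, because
if $L_2 > L_1$, we set $r_{12} = 1$ and $r_{21} = L_1 / L_2$; therefore $r_{12} / r_{21} = L_2 / L_1$;
if $L_2 < L_1$, we set $r_{12} = L_2 / L_1$ and $r_{21} = 1$; therefore $r_{12} / r_{21} = L_2 / L_1$.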
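The two-tree argument can be checked numerically. The sketch below runs the Metropolis acceptance rule on a chain with only two states and records how often each one is visited; with likelihoods 1 and 3, the second tree should be visited about three quarters of the time. The function name, the likelihood values, the number of steps, and the fixed random seed are all assumptions made for this example.

```python
import random

def metropolis_two_trees(L1, L2, steps, seed=0):
    """Run a Metropolis chain over two trees with likelihoods L1 and L2.

    At each step the other tree is proposed. It is accepted with
    probability min(1, L_new / L_current); otherwise the chain stays put.
    Returns the fraction of steps spent on each tree.
    """
    rng = random.Random(seed)
    lik = {1: L1, 2: L2}
    current = 1
    visits = {1: 0, 2: 0}
    for _ in range(steps):
        proposal = 2 if current == 1 else 1
        accept_prob = min(1.0, lik[proposal] / lik[current])
        if rng.random() < accept_prob:
            current = proposal
        visits[current] += 1   # a rejected move counts the current tree again
    return {tree: count / steps for tree, count in visits.items()}

freqs = metropolis_two_trees(L1=1.0, L2=3.0, steps=200_000)
expected = {1: 1.0 / 4.0, 2: 3.0 / 4.0}   # L_i / (L_1 + L_2)
```

Real likelihoods are far too small to divide directly, so practical samplers compare differences of log-likelihoods instead of ratios of likelihoods; the acceptance rule is otherwise the same.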

The top four trees for the Platyrrhini group by MCMC
Ø To compute likelihoods, the HKY substitution model plus six γ-distribution rate categories and invariant sites are used.
Ø Most parts of the tree are well defined, except for the following groups.
[Figure: the top four trees. Labels: "The position of Capuchin is varying"; "This tree is the same as the NJ and parsimony trees".]

The top seven trees for principal groups by MCMC
Ø The uncertainty of these trees indicates that more sequences are needed to solve the problem.
[Figure: the top seven trees. Labels: "The same as by NJ and parsimony"; "The position of Capuchin is varying".]

Popular phylogenetic tree construction programs
Ø PHYLIP
   Developed by Joseph Felsenstein;
   Implements most known distance methods, such as UPGMA and NJ, as well as maximum parsimony and ML methods;
   The most recent release is version 3.69, which contains more than 50 programs;
   Command line interface;
   The package can be freely downloaded at http://evolution.genetics.washington.edu/phylip.html
Ø PAUP (Phylogenetic Analysis Using Parsimony)
   Written by David Swofford;
   Includes parsimony, distance matrix, invariants, and maximum likelihood methods, and many indices and statistical tests;
   Described at http://paup.csit.fsu.edu
   Unfortunately, it is now commercialized by Sinauer Associates, selling for $85-50/package.

Popular phylogenetic tree construction programs
Ø MEGA (Molecular Evolutionary Genetics Analysis)
   Developed by Sudhir Kumar and colleagues at ASU;
   Contains parsimony, distance, and likelihood methods for molecular data (nucleic acid sequences and protein sequences);
   Can do bootstrapping, consensus trees, and a variety of data editing tasks;
   Has a sequence alignment function using an implementation of ClustalW;
   A GUI-based program;
   Contains tree display functions.
Ø TREE-PUZZLE
   Written by Korbinian Strimmer;
   A program for maximum likelihood analysis of nucleotide and amino acid alignments;
   Infers phylogenies by quartet puzzling;

Popular phylogenetic tree construction programs
Ø TREE-PUZZLE (continued)
   Supports all popular models of sequence evolution for nucleotides and proteins, and can take rate heterogeneity among sites into account;
   Compatible with PHYLIP files;
   The current version also has features for parallel computation using the MPI message-passing interface, if this is available;
   Freely available at http://www.tree-puzzle.de
Ø MrBayes
   A program for the Bayesian estimation of phylogenetic trees;
   Able to analyze nucleotide, amino acid, restriction site, and morphological data;
   Freely available at http://mrbayes.csit.fsu.edu
Ø TreeView
   A program for visualizing and printing trees;
   Free at http://taxonomy.zoology.gla.ac.uk/rod/treeview.html