BINF6080 Molecular phylogenetic methods 4-08-06
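
Maximum likelihood methods
Ø So far we have only considered a single site configuration. The likelihood for all sites is the product of the likelihoods for each site, assuming all the sites evolve independently.
Ø Suppose that there are $s$ homologous sequences, each with $N$ nucleotides. Let $D_i$ be the $i$-th column of the multiple alignment.
Ø For a tree $T$, let $f(D_i \mid T, \theta_1, \theta_2, \ldots, \theta_m)$ be the likelihood of tree $T$ for the $i$-th site, where $\theta_1, \theta_2, \ldots, \theta_m$ are the unknown parameters, such as the branch lengths.
Ø Using the previous case as an example, the parameters are the branch lengths, $\theta = \{v_1, v_2, \ldots\}$, and the site likelihood is the sum, over the unobserved states at the internal nodes, of the root-state frequency $g_x$ multiplied by the transition probabilities $P_{xy}(v)$ along the branches.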

Maximum likelihood methods
Ø For simplicity, let us assume the sequences are homogeneous, i.e. all sites evolve at the same rate. Then the likelihood function for the entire sequence for the tree $T$ is
  $L(\theta_1, \ldots, \theta_m) = \prod_{i=1}^{N} f(D_i \mid T, \theta_1, \ldots, \theta_m)$
  where $D_i$ is the $i$-th column of the alignment.
Ø Here we treat $L$ as a function of the parameters. We then search for the values of $\theta_1, \theta_2, \ldots, \theta_m$ that maximize $L$ given the topology of the tree. This value of $L$ is called the ML value of the tree.
Ø Finding the ML value can be a slow process.
Ø We do this for all possible tree topologies and identify the one that has the largest ML value as the inferred phylogenetic tree of the $s$ sequences.
Ø Clearly, different substitution models may result in different trees.
Ø When the number of OTUs is large, a heuristic tree search algorithm should be used for evaluating the alternative trees.
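
As a minimal sketch of treating $L$ as a function of a parameter, the following assumes a much simpler case than a full tree: two aligned sequences under the Jukes-Cantor (JC69) model, with a single parameter d (the distance in expected substitutions per site). The two sequences, the function names and the ternary search are illustrative assumptions, not part of the lecture. The log-likelihood is summed over sites, which is equivalent to multiplying the site likelihoods.

```python
import math

def jc69_log_likelihood(seq_a, seq_b, d):
    """Log-likelihood of two aligned sequences under JC69 at distance d,
    computed as a sum of per-site log-likelihoods (sites are independent)."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e   # probability that a site is unchanged
    p_diff = 0.25 - 0.25 * e   # probability of one specific different base
    log_l = 0.0
    for x, y in zip(seq_a, seq_b):
        # site likelihood = base frequency (1/4) * transition probability
        log_l += math.log(0.25 * (p_same if x == y else p_diff))
    return log_l

def maximize(f, lo, hi, tol=1e-8):
    """Ternary search for the maximum of a unimodal function on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

# Made-up example data: 10 sites, 2 of which differ.
seq_a = "ACGTACGTAC"
seq_b = "ACGTTCGTAA"

d_hat = maximize(lambda d: jc69_log_likelihood(seq_a, seq_b, d), 1e-6, 5.0)

# Known closed-form JC69 estimate, used here only as a cross-check.
p = sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)
d_closed = -0.75 * math.log(1.0 - 4.0 * p / 3.0)

print(d_hat, d_closed)
```

For a real tree the same idea applies, but every branch length (and any substitution-model parameter) is optimized jointly, which is why finding the ML value is slow.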

Heuristic tree search using predefined clusters
Ø Although the tree space can be very large, the majority of trees have extremely low likelihood values for a given set of OTUs.
Ø So we can safely ignore these unpromising trees and focus on the promising ones.
Ø To reduce the search space, we can predefine clusters as input if their relationships are known.
Ø Then, in our example, the problem becomes examining the 105 possible trees generated by connecting these predefined groups, instead of an astronomically large number of unrooted trees:
  $N_U = (2N-5)!! = (2 \times 23 - 5)!! = 41!!$
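
The count of unrooted bifurcating trees, $(2N-5)!!$, can be checked with a short sketch. The function names are illustrative assumptions. Six groups give $7!! = 105$ trees, which matches the number of trees examined when predefined groups are connected.

```python
def double_factorial(n):
    """n!! = n * (n-2) * (n-4) * ... down to 1 or 2; taken as 1 for n <= 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

def num_unrooted_trees(n_otus):
    """Number of unrooted bifurcating trees for n_otus labelled tips: (2n-5)!!."""
    if n_otus < 3:
        raise ValueError("need at least 3 OTUs")
    return double_factorial(2 * n_otus - 5)

# The count grows very quickly with the number of OTUs.
for n in (4, 6, 10, 20):
    print(n, num_unrooted_trees(n))
```

Python integers have arbitrary precision, so the count stays exact even when it becomes astronomically large.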

Heuristic tree search using predefined clusters
Ø The ML value is computed for each tree, and the one with the largest ML value is returned as the inferred tree.
Ø Because this algorithm examines all possible trees that connect the predefined groups, the global optimum is guaranteed if the predefined groups are correct.
Ø When the simple J-C model was used and a homogeneous substitution rate was assumed, the resulting ML tree is similar to the NJ and parsimony trees, with the same problem of misplacing the tree shrews inside the primate group.

Maximum likelihood trees for primates
Ø However, when the more sophisticated HKY substitution model plus six γ-distribution rate categories and invariant sites were used, the tree constructed by the ML method places the tree shrews outside of the primate group.
Ø Nevertheless, there are three trifurcations on this tree, indicating that at a trifurcation point any of the three clusters can be an outgroup of the other two, and the three trees have the same ML value.

Comparison of parsimony and maximum likelihood methods
Ø Parsimony methods have only one assumption, that the changes on the branches are equally possible. However, this assumption may not hold.
Ø Because parsimony methods use few assumptions, their proponents believe that these methods can be applied to any sequence data.
Ø Parsimony methods are also relatively fast, so they can be applied to larger data sets.
Ø ML methods make assumptions about the evolutionary models.
Ø ML methods need to optimize all the model parameters to find the ML value; therefore they are computationally intensive and very slow.
Ø When evolutionary models are properly selected, ML methods tend to achieve better results than parsimony methods.

Heuristic tree search using quartet puzzling
Ø The quartet puzzling algorithm is a very fast heuristic algorithm for exploring the promising trees.
Step 1: Compute the ML values of the three possible trees for every set of four sequences.
[Figure: for each possible set of four sequences, the three quartet trees are evaluated and the best ML tree is kept.]
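
A minimal sketch of the enumeration behind Step 1, assuming six sequences labelled 1 to 6 as in the example. Each quartet has three unrooted topologies, written here as the two pairs of sequences separated by the internal branch. The function names are illustrative, and the ML scoring of each topology is not shown.

```python
from itertools import combinations

def quartet_topologies(a, b, c, d):
    """The three unrooted topologies for four sequences, each given as the
    two pairs that sit on opposite sides of the internal branch."""
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]

def all_quartet_trees(taxa):
    """Map every 4-subset of the sequences to its three candidate topologies."""
    return {q: quartet_topologies(*q) for q in combinations(taxa, 4)}

taxa = [1, 2, 3, 4, 5, 6]
candidates = all_quartet_trees(taxa)
n_quartets = len(candidates)
n_trees = sum(len(trees) for trees in candidates.values())
print(n_quartets, n_trees)
```

In the full algorithm an ML value would be computed for each of these candidate trees, and only the best topology per quartet would be kept for the puzzling steps.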

Heuristic tree search using quartet puzzling
Step 2: Randomly pick four sequences and place them in the tree according to their best ML tree.
[Figure: the starting four-sequence tree]
Step 3: Randomly pick a remaining sequence and add it to the tree such that the growing tree agrees with the maximum number of best ML quartet trees. Repeat this process until all sequences are added to the tree.
For example, if sequence 5 is randomly picked, and if one or both of the following trees are the best ML quartet trees involving 1, 2, 3, 4 and 5:
[Figure: two quartet trees containing sequence 5]
then the resulting tree will be
[Figure: the five-sequence tree after adding sequence 5]

Heuristic tree search using quartet puzzling
Then the last sequence, 6, is added to the tree. If the following is the best ML tree among all quartet trees containing sequence 6
[Figure: a quartet tree containing sequence 6]
then the resulting tree will be
[Figure: the five-sequence tree and the six-sequence tree obtained after adding sequence 6]
Ø The whole process is repeated many times, with the sequences being selected in different orders. The resulting tree will depend on the order in which the sequences are selected.
Ø The tree that occurs most frequently is chosen as the inferred tree.
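
Bayesian phylogenetic methods
Ø Bayes' theorem: if $A$ and $B$ are two events, then
  $P(A \mid B) = \frac{P(AB)}{P(B)} = \frac{P(B \mid A)\,P(A)}{P(B)}$
[Figure: a sample space partitioned into regions 1 to 10]
Ø If $A_1, A_2, \ldots, A_n$ are events that partition the sample space and $B$ is an event from the sample space, then
  $P(A_i \mid B) = \frac{P(B \mid A_i)\,P(A_i)}{P(B \mid A_1)P(A_1) + P(B \mid A_2)P(A_2) + \cdots + P(B \mid A_n)P(A_n)}$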

Bayesian phylogenetic methods
Ø For $N$ OTUs, there are $(2N-5)!!$ possible unrooted trees, which form a partition of the tree space. Let $D$ be the alignment of the $N$ OTUs. We do not know which tree is most likely to account for $D$.
[Figure: the tree space partitioned into tree 1, tree 2, ..., tree n]
Ø In the ML method, we compute the likelihood that $D$ is generated by each tree: $L(\mathrm{tree}_i) = P(D \mid \mathrm{tree}_i)$. We find the maximum likelihood, $ML = \max[P(D \mid \mathrm{tree}_i)]$, by changing the parameters (branch lengths or substitution rates) on each tree, and return the tree that has the largest ML.
Ø In Bayesian methods, we compute the probability of a tree given the observed alignment of the $N$ OTUs, which is called the posterior probability $P(\mathrm{tree}_i \mid D)$.

Bayesian phylogenetic methods
Ø Using Bayes' theorem we have
  $P(\mathrm{tree}_i \mid D) = \frac{P(D \mid \mathrm{tree}_i)\,P(\mathrm{tree}_i)}{\sum_j P(D \mid \mathrm{tree}_j)\,P(\mathrm{tree}_j)}$
  where $P(\mathrm{tree}_i)$ is called the prior probability.
Ø Calculation of the denominator of the posterior probability can be difficult because we have to enumerate all possible trees and their branch lengths or substitution rates.
Ø However, the value of the denominator is a constant for all possible trees; thus the posterior probability of each tree is proportional to the likelihood of the tree multiplied by the prior probability:
  $P(\mathrm{tree}_i \mid D) \propto P(D \mid \mathrm{tree}_i)\,P(\mathrm{tree}_i)$
Ø If we can generate a large number of trees such that the frequency of a tree is proportional to its likelihood multiplied by its prior probability, then the posterior probability can be easily computed by
  $P(\mathrm{tree}_i \mid D) \approx \frac{\text{number of trees with the same topology as } \mathrm{tree}_i}{\text{total number of trees in the sample}}$
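
A minimal sketch of the normalization in Bayes' theorem, assuming a tiny set of three candidate trees whose log-likelihoods are already known. The numbers are made up for illustration, and the function name is hypothetical. Working in log space avoids underflow, since real tree likelihoods are extremely small.

```python
import math

def posterior_from_log_likelihoods(log_likelihoods, priors=None):
    """Posterior probabilities proportional to likelihood * prior.
    A uniform prior is assumed when none is given."""
    n = len(log_likelihoods)
    if priors is None:
        priors = [1.0 / n] * n
    log_terms = [ll + math.log(p) for ll, p in zip(log_likelihoods, priors)]
    m = max(log_terms)                        # subtract the max for stability
    weights = [math.exp(t - m) for t in log_terms]
    total = sum(weights)                      # plays the role of the denominator
    return [w / total for w in weights]

# Made-up log-likelihoods for three candidate trees.
log_l = [-1000.0, -1001.0, -1005.0]
post = posterior_from_log_likelihoods(log_l)
print(post)
```

This only works when every tree can be listed. For realistic numbers of OTUs the sum in the denominator is out of reach, which is the motivation for estimating the posterior from sampled tree frequencies instead.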

The Markov chain Monte Carlo method for sampling
Ø Markov chain Monte Carlo (MCMC) is a method for generating a sample from the entire sample space such that the frequency of each individual in the sample is proportional to its likelihood of generating the observed data.
Ø If we have no preference for any tree before seeing the data, we can use a non-informative uniform prior probability; therefore we have
  $P(\mathrm{tree}_i \mid D) = \frac{P(D \mid \mathrm{tree}_i)\,P(\mathrm{tree}_i)}{\sum_j P(D \mid \mathrm{tree}_j)\,P(\mathrm{tree}_j)} = \frac{P(D \mid \mathrm{tree}_i)}{\sum_j P(D \mid \mathrm{tree}_j)}$
Ø This means that the posterior probability of a tree is proportional to its likelihood if the prior probability is the same for all trees.
Ø The MCMC method begins with a trial tree $T_1$ and computes its likelihood $L_1$. A new tree $T_2$ is generated by a move on $T_1$ that changes, by a small amount, any of the following:
  1. Branch length;
  2. Rate of substitution;
  3. Topology, by a nearest-neighbor interchange tree move.

The Markov chain Monte Carlo method for sampling
Ø The likelihood of the new tree, $L_2$, is computed; it is usually slightly different from $L_1$.
  If $L_2 > L_1$, then $T_2$ is accepted and it becomes an element in the sample.
  If $L_2 < L_1$, then $T_2$ is accepted with probability $L_2 / L_1$.
  This selection rule is called the Metropolis algorithm (or Metropolis criterion).
Ø Therefore, the MCMC method favors hill-climbing moves but also allows downhill moves with a certain probability, $L_2 / L_1$.
Ø The result is that the equilibrium probabilities of observing the different trees in the sample are given by the likelihoods of the trees.
Ø To see this, suppose that we have only two trees, $T_1$ and $T_2$, so MCMC moves back and forth between them with transition probabilities $r_{12}$ and $r_{21}$.
[Figure: trees $T_1$ and $T_2$ connected by arrows labelled $r_{12}$ and $r_{21}$]

The Markov chain Monte Carlo method for sampling
Ø Let $p_1$ and $p_2$ be the equilibrium probabilities of these trees in the sample. Then at equilibrium, the probabilities of observing these trees during the sampling process should be constant, that is,
  $p_1 r_{12} = p_2 r_{21}$, or $\frac{r_{12}}{r_{21}} = \frac{p_2}{p_1}$.
Ø This property is called detailed balance. For the trees in the sample to be proportional to their likelihoods, we need to set $\frac{p_2}{p_1} = \frac{L_2}{L_1}$. Therefore we have $\frac{r_{12}}{r_{21}} = \frac{L_2}{L_1}$.
Ø This means that to generate the desired sample, we should set the ratio of the transition probabilities equal to the ratio of the likelihoods.
Ø The MCMC algorithm does just this, because
  if $L_2 > L_1$, we set $r_{12} = 1$ and $r_{21} = L_1 / L_2$; therefore $r_{12} / r_{21} = L_2 / L_1$.
  if $L_2 < L_1$, we set $r_{12} = L_2 / L_1$ and $r_{21} = 1$; therefore $r_{12} / r_{21} = L_2 / L_1$.
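
The two-tree argument can be checked numerically. The sketch below assumes two trees with made-up likelihoods 1 and 3, always proposes the other tree, and accepts the move with the Metropolis probability. The function names and the numbers are illustrative assumptions. It compares the equilibrium probabilities obtained from detailed balance with the frequencies seen in a simulated chain.

```python
import random

def acceptance(l_current, l_proposed):
    """Metropolis acceptance probability: min(1, L_proposed / L_current)."""
    return min(1.0, l_proposed / l_current)

def stationary_two_state(l1, l2):
    """Equilibrium probabilities from detailed balance, p1 * r12 = p2 * r21."""
    r12 = acceptance(l1, l2)
    r21 = acceptance(l2, l1)
    p1 = r21 / (r12 + r21)
    return p1, 1.0 - p1

def simulate(l1, l2, n_steps, seed=0):
    """Run the two-tree chain and return the fraction of samples on each tree."""
    rng = random.Random(seed)
    likelihood = {1: l1, 2: l2}
    state = 1
    counts = {1: 0, 2: 0}
    for _ in range(n_steps):
        other = 2 if state == 1 else 1
        if rng.random() < acceptance(likelihood[state], likelihood[other]):
            state = other          # move accepted
        counts[state] += 1         # the current tree joins the sample either way
    return counts[1] / n_steps, counts[2] / n_steps

l1, l2 = 1.0, 3.0                  # made-up likelihoods of the two trees
p1, p2 = stationary_two_state(l1, l2)
f1, f2 = simulate(l1, l2, 100_000)
print(p1, p2, f1, f2)
```

With these values detailed balance gives $p_2 / p_1 = L_2 / L_1 = 3$, and the sampled frequencies should be close to the same proportions. A real phylogenetic sampler proposes small changes to branch lengths, rates and topology rather than jumping between two fixed trees.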

The top four trees for the Platyrrhini group by MCMC
Ø To compute the likelihoods, the HKY substitution model plus six γ-distribution rate categories and invariant sites are used.
Ø Most parts of the tree are well defined, except the following groups.
[Figure: the top four trees. Annotations: "The position of Capuchin is varying"; "This tree is the same as the NJ and parsimony trees"]

The top seven trees for principal groups by MCMC
Ø The uncertainty in these trees indicates that more sequences are needed to solve the problem.
[Figure: the top seven trees. Annotations: "The same as by NJ and parsimony"; "The position of Capuchin is varying"]

Popular phylogenetic tree construction programs
Ø PHYLIP
   Developed by Joseph Felsenstein;
   Implements most known distance methods such as UPGMA and NJ, as well as maximum parsimony and ML methods;
   The most recent release is version 3.69, which contains more than 50 programs;
   Command line interface;
   The package can be freely downloaded at http://evolution.genetics.washington.edu/phylip.html
Ø PAUP (Phylogenetic Analysis Using Parsimony)
   Written by David Swofford;
   Includes parsimony, distance matrix, invariants and maximum likelihood methods, and many indices and statistical tests;
   Described at http://paup.csit.fsu.edu
   Unfortunately, it is now commercialized by Sinauer Associates, selling for $85-50 per package.

Popular phylogenetic tree construction programs
Ø MEGA (Molecular Evolutionary Genetics Analysis)
   Developed by Sudhir Kumar and colleagues at ASU;
   Contains parsimony, distance and likelihood methods for molecular data (nucleic acid sequences and protein sequences);
   Can do bootstrapping, consensus trees and a variety of data editing tasks;
   Has a sequence alignment function using an implementation of ClustalW;
   A GUI-based program;
   Contains tree display functions.
Ø TREE-PUZZLE
   Written by Korbinian Strimmer;
   A program for maximum likelihood analysis of nucleotide and amino acid alignments;
   Infers phylogenies by quartet puzzling;

Popular phylogenetic tree construction programs
Ø TREE-PUZZLE (continued)
   Supports all popular models of sequence evolution for nucleotides and proteins, and can take rate heterogeneity among sites into account;
   Compatible with PHYLIP files;
   The current version also has features for parallel computation using the MPI message-passing interface, if this is available;
   Freely available at http://www.tree-puzzle.de
Ø MrBayes
   A program for the Bayesian estimation of phylogenetic trees;
   Able to analyze nucleotide, amino acid, restriction site and morphological data;
   Freely available at http://mrbayes.csit.fsu.edu
Ø TreeView
   A program for visualizing and printing trees;
   Free at http://taxonomy.zoology.gla.ac.uk/rod/treeview.html