The HeLIx+ inversion code. Genetic algorithms. A. Lagg - Abisko Winter School


Inversion of the RTE

Once the solution of the RTE is known:
- compare the Stokes spectra of the synthetic and the observed spectrum
- change the initial parameters of the atmosphere by trial and error ("human inversions")
- repeat until the observed and the synthetic (fitted) profiles match

Inversions: nothing else but an optimization of the trial-and-error part.

Problem: Inversions always find a solution within the given model atmosphere. The solution is seldom unique (and might even be completely wrong).

Goal of this lecture:
- principles of genetic algorithms
- learn the usage of the HeLIx+ inversion code
- develop a feeling for the reliability of inversion results

The merit function

- The quality of the model atmosphere must be evaluated
- Stokes profiles represent discretely sampled functions
- widely used: chisqr definition, with these annotated terms:
  - number of free parameters
  - sum over Stokes parameters
  - sum over wavelength (WL) pixels
  - weight (may also be WL-dependent)
- The RTE gives the synthetic Stokes spectrum I_s^syn
- The unknowns of the system are the (height dependent) model parameters
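
The chisqr formula itself is not reproduced in this transcript, so the following is only a minimal sketch of a weighted chi-square built from the annotated terms above. The function name `chisqr`, the nested-list data layout, the use of the weight as a plain multiplier on the squared difference, and the normalization by the number of data points minus the number of free parameters are all assumptions, not the HeLIx+ definition.

```python
def chisqr(obs, syn, weight, n_free):
    """Weighted chi-square between observed and synthetic Stokes profiles.

    obs, syn, weight: lists with one entry per Stokes parameter (I, Q, U, V),
    each entry a list with one value per wavelength pixel.
    n_free: number of free model parameters.
    """
    total = 0.0
    for s in range(len(obs)):                 # sum over Stokes parameters
        for i in range(len(obs[s])):          # sum over wavelength pixels
            diff = obs[s][i] - syn[s][i]
            total += weight[s][i] * diff * diff
    n_data = sum(len(row) for row in obs)
    # Assumed normalization: data points minus free parameters.
    return total / (n_data - n_free)
```

A smaller value means a better match between the synthetic and the observed profiles.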

HeLIx+ overview of features

- includes Zeeman, Paschen-Back, Hanle effect (He 10830)
- atomic polarization for He 10830 (He D3)
- magneto-optical effects
- fitting / removing telluric lines
- fitting unknown parameters of spectral lines
- various methods for continuum correction / fitting
- convolution with instrument filter profiles
- user-defined weighting scheme
- direct read access to SOT/SP, VTT-TIP2, SST-CRISP, ...
- flexible atomic data configuration
- extensive IDL based display routines
- MPI support (to invert maps)

Download from http://www.mps.mpg.de/homes/lagg, GBSO download-section -> helix
use invert and IR$soft

The inversion technique: reliability

Two minimizations implemented:
- Levenberg-Marquardt: requires a good initial guess
- PIKAIA (genetic algorithm, Charbonneau 1995): no initial guess needed

Planned: DIRECT algorithm (good compromise between global minimum and speed)

(Figure panels: steepest gradient, Pikaia)

Initial guess problem

Having a good initial guess for the iteration process improves both the speed and the convergence of the inversion.

Initial guess optimizations

- Weak field initialization
- Auer77 initialization

Other methods:
- Artificial Neural Networks (ANN)
- MDI / magnetograph formulae
- use a minimization technique which does not rely on initial guess values

Genetic algorithms (P. Spijker, TU Eindhoven)

- Genetic algorithms (GAs) are a technique to solve problems which need optimization
- GAs are a subclass of Evolutionary Computing
- GAs are based on Darwin's theory of evolution
- History of GAs:
  - Evolutionary computing evolved in the 1960s.
  - GAs were created by John Holland in the mid-70s.

Advantages / drawbacks

Advantages:
- No derivatives of the goodness of fit function with respect to model parameters need be computed; it matters little whether the relationship between the model and its parameters is linear or nonlinear.
- Nothing in the procedure outlined above depends critically on using a least-squares statistical estimator; any other robust estimator can be substituted, with little or no changes to the overall procedure.

Drawback:
- In most real applications, the model will need to be evaluated (i.e., given a parameter set, compute a synthetic dataset and its associated goodness of fit) a great many times; if this evaluation is computationally expensive, the forward modeling approach can become impractical.

Evolution in biology

- Each cell of a living thing contains chromosomes: strings of DNA
- Each chromosome contains a set of genes: blocks of DNA
- Each gene determines some aspect of the organism (like eye colour)
- A collection of genes is sometimes called a genotype
- A collection of aspects (like eye colour) is sometimes called a phenotype
- Reproduction involves recombination of genes from parents and then small amounts of mutation (errors) in copying
- The fitness of an organism is how much it can reproduce before it dies
- Evolution is based on "survival of the fittest"

Biological reproduction

- During reproduction errors occur
- Due to these errors genetic variation exists
- Most important errors are:
  - Recombination (cross-over)
  - Mutation

Natural selection

- The origin of species: "Preservation of favourable variations and rejection of unfavourable variations."
- There are more individuals born than can survive, so there is a continuous struggle for life.
- Individuals with an advantage have a greater chance to survive: survival of the fittest.
- Important aspects in natural selection are:
  - adaptation to the environment
  - isolation of populations in different groups which cannot mutually mate
- If small changes in the genotypes of individuals are expressed easily, especially in small populations, we speak of genetic drift
- "success in life": mathematically expressed as fitness

How to apply to RTE? (David Hales, www.davidhales.com)

- GAs often encode solutions as fixed-length bitstrings (e.g. 101110, 111111, 000101)
- Each bit represents some aspect of the proposed solution to the problem
- For GAs to work, we need to be able to test any string and get a score indicating how good that solution is
- A definition of a fitness function is required: it is convenient to use the chisqr merit function

GAs improve the fitness -> maximization technique
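
Because a GA maximizes fitness while chisqr is smaller for better fits, some mapping between the two is needed. The slides do not say which mapping HeLIx+ uses; the reciprocal form below and the name `fitness_from_chisqr` are assumptions chosen only to illustrate the idea.

```python
def fitness_from_chisqr(chisqr):
    """Map a chi-square value (lower is better) to a fitness (higher is better).

    Assumed mapping: 1 / (1 + chisqr), which is 1 for a perfect fit and
    falls towards 0 as the fit gets worse.
    """
    if chisqr < 0:
        raise ValueError("chisqr must be non-negative")
    return 1.0 / (1.0 + chisqr)
```

Any strictly decreasing function of chisqr would preserve the ranking of the solutions; the choice only changes how strongly selection favours the best ones.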

Example: Drilling for oil (David Hales, www.davidhales.com)

- Imagine you had to drill for oil somewhere along a single 1 km desert road
- Problem: choose the best place on the road, the one that produces the most oil per day
- We could represent each solution as a position on the road
- Say, a whole number in the range [0..1000]

(Figure: road from 0 to 1000 with Solution1 = 300 and Solution2 = 900 marked)

Encoding problem

- The set of all possible solutions [0..1000] is called the search space or state space
- In this case it's just one number, but it could be many numbers or symbols
- Often GAs code numbers in binary, producing a bitstring representing a solution
- In our example we choose 10 bits, which is enough to represent 0..1000

          512  256  128   64   32   16    8    4    2    1
   900      1    1    1    0    0    0    0    1    0    0
   300      0    1    0    0    1    0    1    1    0    0
  1023      1    1    1    1    1    1    1    1    1    1

In GAs these encoded strings are sometimes called "genotypes" or "chromosomes" and the individual bits are sometimes called "genes".
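
The encoding in the table can be reproduced with a few lines of Python. The helper names `encode` and `decode` are made up for this sketch.

```python
def encode(value, n_bits=10):
    """Return the fixed-length binary string (chromosome) for an integer."""
    if not 0 <= value < 2 ** n_bits:
        raise ValueError("value does not fit into n_bits")
    return format(value, "0{}b".format(n_bits))


def decode(bits):
    """Return the integer represented by a bitstring."""
    return int(bits, 2)


print(encode(900))   # 1110000100
print(encode(300))   # 0100101100
print(decode("1111111111"))   # 1023
```

Note that 10 bits cover 0..1023, so the strings for 1001..1023 decode to positions beyond the end of the road and have to be handled by the fitness function.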

Fitness of oil function

(Figure: oil yield plotted against location along the road, 0 to 1000, with Solution1 = 300 (0100101100) and Solution2 = 900 (1110000100) marked; the oil values shown are 30 and 5)

Search space

- Oil example: the search space is one-dimensional (and stupid: how to define a fitness function?).
- RTE: encoding several values into the chromosome -> many dimensions can be searched
- The search space can be visualised as a surface or fitness landscape in which fitness dictates height (fitness / chisqr hypersurface)
- Each possible genotype is a point in the space
- A GA tries to move the points to better places (higher fitness) in the space

Fitness landscapes (2-D)

Search space

- Obviously, the nature of the search space dictates how a GA will perform
- A completely random space would be bad for a GA
- Also, GAs can, in practice, get stuck in local maxima if search spaces contain lots of these
- Generally, spaces in which small improvements get closer to the global optimum are good

The algorithm

- Generate a set of random solutions
- Repeat
  - Test each solution in the set (rank them)
  - Remove some bad solutions from the set
  - Duplicate some good solutions
  - Make small changes to some of them
- Until the best solution is good enough

How to duplicate good solutions?
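
The loop above can be sketched for the oil example as follows. The yield function `oil_yield` (peaking at position 700), the population size, the step size of the "small changes" and the stopping criterion are all invented for illustration; none of them come from the slides.

```python
import random


def oil_yield(position):
    """Hypothetical fitness: the yield peaks at position 700 on the road."""
    return 1000 - abs(position - 700)


def simple_search(pop_size=20, generations=200, seed=0):
    rng = random.Random(seed)
    # Generate a set of random solutions.
    population = [rng.randint(0, 1000) for _ in range(pop_size)]
    for _ in range(generations):
        # Test each solution in the set (rank them).
        population.sort(key=oil_yield, reverse=True)
        if oil_yield(population[0]) == 1000:   # best solution is good enough
            break
        # Remove the bad half of the solutions.
        survivors = population[: pop_size // 2]
        # Duplicate the good ones and make small changes to the copies.
        copies = [min(1000, max(0, s + rng.randint(-10, 10))) for s in survivors]
        population = survivors + copies
    return max(population, key=oil_yield)
```

This version only copies and perturbs good solutions; it does not yet combine two solutions into a new one.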

Adding Sex

- Two high-scoring "parent" bit strings (chromosomes) are selected and with some probability (the crossover rate) combined
- This produces two new offspring (bit strings)
- Each offspring may then be changed randomly (mutation)
- Selecting parents: many schemes are possible, example: Roulette Wheel
  - Add up the fitnesses of all chromosomes
  - Generate a random number R in that range
  - Select the first chromosome in the population that, when all previous fitnesses are added, gives you at least the value R

(Figure labels: sex; result of sex. Parents are seldom happy with the result.)

Example population

  No.   Chromosome   Fitness
   1    1010011010      1
   2    1111100001      2
   3    1011001100      3
   4    1010000000      1
   5    0000010000      3
   6    1001011111      5
   7    0101010101      1
   8    1011100111      2
                  sum: 18

Roulette Wheel Selection

  Chromosome:  1  2  3  4  5  6  7  8
  Fitness:     1  2  3  1  3  5  1  2

The wheel spans 0 to 18 (the sum of all fitness values).
- Rnd[0..18] = 7  -> Chromosome4 -> Parent1
- Rnd[0..18] = 12 -> Chromosome6 -> Parent2

Higher chance of picking a fit chromosome!
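
A minimal sketch of roulette wheel selection, using the fitness values of the example population. The function names `roulette_pick` and `roulette_select` are chosen for this illustration.

```python
import random
from itertools import accumulate

FITNESS = [1, 2, 3, 1, 3, 5, 1, 2]   # example population, chromosomes 1..8


def roulette_pick(fitness, r):
    """Return the 1-based number of the first chromosome whose running
    fitness total reaches at least r."""
    for number, running_total in enumerate(accumulate(fitness), start=1):
        if running_total >= r:
            return number
    raise ValueError("r exceeds the total fitness")


def roulette_select(fitness, rng=random):
    """Spin the wheel once: draw r in [0, sum of fitness] and pick."""
    return roulette_pick(fitness, rng.uniform(0, sum(fitness)))


print(roulette_pick(FITNESS, 7))    # 4
print(roulette_pick(FITNESS, 12))   # 6
```

Chromosome 6 owns 5 of the 18 units of the wheel, so it is picked five times as often as chromosome 1.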

Crossover - Recombination

  Parent1  1010000000      Offspring1  1011011111
  Parent2  1001011111      Offspring2  1010000000

Crossover: single point, chosen at random.

With some high probability (the crossover rate) apply crossover to the parents (typical values are 0.8 to 0.95).
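
Single-point crossover on bitstrings can be sketched as below. The function names and the default crossover rate of 0.9 (inside the typical range quoted above) are assumptions for illustration.

```python
import random


def single_point_crossover(parent1, parent2, point):
    """Swap the tails of two equal-length bitstrings after position `point`."""
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2


def recombine(parent1, parent2, crossover_rate=0.9, rng=random):
    """Apply crossover with probability crossover_rate, else copy the parents."""
    if rng.random() < crossover_rate:
        point = rng.randint(1, len(parent1) - 1)   # random cut, never at the ends
        return single_point_crossover(parent1, parent2, point)
    return parent1, parent2
```

With Parent1 = 1010000000, Parent2 = 1001011111 and the cut after the third bit, the first child is 1011011111.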

Mutation

               Original offspring    Mutated offspring
  Offspring1   1011011111            1011001111
  Offspring2   1010000000            1000000000

With some small probability (the mutation rate) flip each bit in the offspring (typical values between 0.1 and 0.001).
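
A minimal sketch of bit-flip mutation. The function name `mutate` and the default rate of 0.01 (inside the typical range quoted above) are assumptions.

```python
import random


def mutate(bits, mutation_rate=0.01, rng=random):
    """Flip each bit of the string independently with probability mutation_rate."""
    result = []
    for bit in bits:
        if rng.random() < mutation_rate:
            result.append("1" if bit == "0" else "0")   # flip this bit
        else:
            result.append(bit)                           # keep this bit
    return "".join(result)
```

With a 10-bit chromosome and a rate of 0.01, most offspring are passed on unchanged and only about one in ten receives a flipped bit.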

Improved algorithm

- Generate a population of random chromosomes
- Repeat (each generation)
  - Calculate the fitness of each chromosome
  - Repeat
    - Use roulette selection to select pairs of parents
    - Generate offspring with crossover and mutation
  - Until a new population has been produced
- Until the best solution is good enough
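
Put together, the improved algorithm might look like the sketch below for the 10-bit oil example. The fitness function (peak at position 700, penalty beyond the end of the road), all parameter values and the bookkeeping of the best chromosome seen so far are assumptions made for this illustration; this is not the PIKAIA or HeLIx+ implementation.

```python
import random

N_BITS = 10


def fitness(bits):
    """Hypothetical oil yield: peaks at position 700, penalized off the road."""
    position = int(bits, 2)
    if position > 1000:
        return 1.0
    return 1000.0 - abs(position - 700)


def select(population, scores, rng):
    """Roulette wheel selection of one parent."""
    r = rng.uniform(0, sum(scores))
    running = 0.0
    for chromosome, score in zip(population, scores):
        running += score
        if running >= r:
            return chromosome
    return population[-1]   # guard against floating-point round-off


def crossover(p1, p2, rate, rng):
    """Single-point crossover applied with probability `rate`."""
    if rng.random() < rate:
        point = rng.randint(1, N_BITS - 1)
        return p1[:point] + p2[point:], p2[:point] + p1[point:]
    return p1, p2


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return "".join(
        ("1" if b == "0" else "0") if rng.random() < rate else b for b in bits
    )


def run_ga(pop_size=30, generations=100, crossover_rate=0.9,
           mutation_rate=0.01, target=1000.0, seed=1):
    rng = random.Random(seed)
    population = ["".join(rng.choice("01") for _ in range(N_BITS))
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        scores = [fitness(c) for c in population]
        best = max(population + [best], key=fitness)   # remember the best so far
        if fitness(best) >= target:                    # good enough
            break
        new_population = []
        while len(new_population) < pop_size:
            p1 = select(population, scores, rng)
            p2 = select(population, scores, rng)
            for child in crossover(p1, p2, crossover_rate, rng):
                new_population.append(mutate(child, mutation_rate, rng))
        population = new_population[:pop_size]
    best = max(population + [best], key=fitness)
    return best, int(best, 2), fitness(best)
```

Calling `run_ga()` returns the best chromosome found, its decoded position on the road and its fitness.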

Many Variants of GA

- Different kinds of selection (not roulette): Tournament, Elitism, etc.
- Different recombination: one-point crossover, multi-point crossover, 3-way crossover, etc.
- Different kinds of encoding other than bitstring: integer values, ordered sets of symbols
- Different kinds of mutation: variable mutation rate
- Different reduction plans: control how newly bred offspring are inserted into the population

PIKAIA (Charbonneau, 1995)
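
As one example of an alternative selection scheme, tournament selection can be sketched as follows. The function name, the tournament size `k` and the calling convention are assumptions; the slides do not state whether PIKAIA or HeLIx+ use this scheme.

```python
import random


def tournament_select(population, fitness, k=3, rng=random):
    """Pick k chromosomes at random and return the fittest of them."""
    contestants = rng.sample(population, k)
    return max(contestants, key=fitness)
```

Unlike the roulette wheel, this scheme only compares fitness values with each other, so it also works when the fitness can be zero or negative.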

How PIKAIA works

List of ME Codes (incomplete)

- HeLIx+: A. Lagg, most flexible code (multi-component, multi-line), He 10830 Hanle slab model implemented. Genetic algorithm Pikaia. Fully parallel.
- VFISV: J.M. Borrero, for SDO HMI. Fastest ME code available. F90, fully parallel. Levenberg-Marquardt with some optimizations.
- MERLIN: written by Jose Garcia at HAO in C, C++ and some other routines in Fortran (Lites et al. 2007 in Il Nuovo Cimento).
- MELANIE: Hector Socas at HAO. In F90, not parallel. Numerical derivatives.
- HAZEL: Andres Asensio Ramos et al. (2008). Optimized for He 10830, He D3, Hanle-slab model.
- MILOS: Orozco Suarez et al. (2007), IDL, some papers published with it.

Installation & Usage of HeLIx+

Follow the instructions in the user's manual.

Basic usage:
- 1-component model, create & invert synthetic spectrum
- discuss problems:
  - parameter crosstalk
  - uniqueness of solution
  - stability & reliability
  - influence of noise

Download from http://www.mps.mpg.de/homes/lagg, GBSO download-section -> helix
use invert and IR$soft

Exercise II: HeLIx+ installation and basic usage

- install and run the IDL interface of HeLIx+
- the first input file: synthesis of Fe I 6302.5
- change atmospheric parameters (B, INC, ...)
- change line parameters (quantum numbers, g_eff)
- display Zeeman pattern
- add noise
- 1st inversion
- play with noise level / initial values / parameter range
- weighting scheme

Download first input file: abisko_1c.ipt
http://www.mps.mpg.de/homes/lagg/

Synthesis:
- add complexity to atmospheric model (stray-light, multi-component)
- add 2nd spectral line (Fe 6301.5)

Blind tests:
- take a synthetic profile from someone else and invert it
- Which parameters are robust?
- How can robustness be improved?