Neural Networks

Overview
Ø Neural networks are considered black-box models
Ø They are complex and do not provide much insight into variable relationships
Ø They have the potential to model very complicated patterns ("universal approximators")
Ø They can be used for both classification and continuous prediction tasks

The History
Ø The concept was welcomed with enthusiasm in the 1980s
Ø It didn't live up to expectations then
Ø Too much hype, perhaps
Ø Overtaken by other black-box techniques, like Support Vector Machines with kernels, in the 2000s
Ø Now, in the age of image and visual recognition problems, neural networks have made a comeback
Ø Area of rapid development
Ø Rebranded as Deep Learning
Ø Recurrent neural networks
Ø Convolutional neural networks
Ø Feedforward neural networks

The Structure of a Neural Network
These neural networks are often called Multilayer Perceptrons (MLPs)

The Structure of a Neural Network
[Figure: network diagram with an input layer, Hidden Layer 1, Hidden Layer 2, and an output node; the input layer and each hidden layer also include a bias node fixed at 1]

The Structure of a Neural Network
Associated with each line in this diagram is a parameter to be solved for!

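A Simpler Neural Network
[Figure: network diagram with inputs $x_1$, $x_2$, $x_3$, a bias node (=1), hidden units $H_1$ and $H_2$, and output $p$]
To avoid triple subscripts, let's simplify our network to 1 hidden layer and just 3 input variables. We'll assume a binary target.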

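Math Structure of a Neural Network
[Figure: the simpler network, with weights $w_{10}$, $w_{11}$, $w_{12}$, $w_{13}$ labelling the connections into hidden unit $H_1$]
$H_1 = \tanh(w_{10} + w_{11} x_1 + w_{12} x_2 + w_{13} x_3)$
Hyperbolic tangent: one of many possible sigmoid functions. Its range is -1 to 1, and it is related to the logistic function.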

Sigmoid Function
[Figure: plot of tanh for inputs from -5 to 5, with outputs ranging from -1 to 1]
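The slides note that tanh is related to the logistic function. A minimal sketch of that relationship, assuming the standard identity tanh(z) = 2 * logistic(2z) - 1; the helper name `logistic` is chosen here for illustration and is not from the slides.

```python
import math

def logistic(z):
    """Standard logistic function, with range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# tanh is a rescaled, recentred logistic curve: tanh(z) = 2 * logistic(2z) - 1
for z in [-5.0, -1.0, 0.0, 1.0, 5.0]:
    rescaled = 2.0 * logistic(2.0 * z) - 1.0
    assert math.isclose(math.tanh(z), rescaled, abs_tol=1e-12)
    print(f"z={z:+.1f}  tanh={math.tanh(z):+.6f}  2*logistic(2z)-1={rescaled:+.6f}")
```

The two curves have the same S shape; tanh is simply stretched to cover -1 to 1 instead of 0 to 1.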

Math Structure of a Neural Network
$H_1 = \tanh(w_{10} + w_{11} x_1 + w_{12} x_2 + w_{13} x_3)$
The intercept of each equation (here $w_{10}$) is called the bias term.

Math Structure of a Neural Network
$H_2 = \tanh(w_{20} + w_{21} x_1 + w_{22} x_2 + w_{23} x_3)$
The intercept of each equation (here $w_{20}$) is called the bias term.

Math Structure of a Neural Network
$\operatorname{logit}(p) = w_{00} + w_{01} H_1 + w_{02} H_2$

Math Structure of a Neural Network
Ø With just 3 input variables and 1 hidden layer containing 2 hidden units, we have to estimate 11 parameters!
$H_1 = \tanh(w_{10} + w_{11} x_1 + w_{12} x_2 + w_{13} x_3)$
$H_2 = \tanh(w_{20} + w_{21} x_1 + w_{22} x_2 + w_{23} x_3)$
$\operatorname{logit}(p) = w_{00} + w_{01} H_1 + w_{02} H_2$
Ø Weight estimates are found by maximizing the log-likelihood function for a class target
Ø The process involves an algorithm called backpropagation
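The figure of 11 parameters follows from one weight per connection plus one bias term per hidden or output unit. A small sketch of that count; the function name `count_parameters` and the list-of-layer-sizes convention are assumptions made here for illustration.

```python
def count_parameters(layer_sizes):
    """Count weights plus biases in a fully connected feedforward network.

    layer_sizes lists the number of units per layer, inputs first.
    Each unit receives one weight per unit in the previous layer,
    plus one bias (intercept) term.
    """
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# 3 inputs -> 2 hidden units -> 1 output: (3 + 1) * 2 + (2 + 1) * 1 = 11
print(count_parameters([3, 2, 1]))
```

The count grows quickly as layers widen, which is one reason larger networks are slower to train.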

Math Structure of a Neural Network
Ø Probability estimates are obtained by solving the logit equation for $p$ at each input point:
$p = \dfrac{1}{1 + \exp\left(-(w_{00} + w_{01} H_1 + w_{02} H_2)\right)}$
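The equations for this network can be traced through in a few lines. This is a sketch only: the weight values below are hypothetical, picked for illustration, and the name `forward` and the list layout of the weights are assumptions rather than anything given in the slides.

```python
import math

def forward(x, w_hidden, w_out):
    """Forward pass for a 3-input, 2-hidden-unit, 1-output network.

    x        : [x1, x2, x3]
    w_hidden : one row [bias, w1, w2, w3] per hidden unit
    w_out    : [bias, weight on H1, weight on H2]
    Returns (H, logit, p).
    """
    H = [math.tanh(row[0] + sum(w * xi for w, xi in zip(row[1:], x)))
         for row in w_hidden]
    logit = w_out[0] + sum(w * h for w, h in zip(w_out[1:], H))
    p = 1.0 / (1.0 + math.exp(-logit))  # solve logit(p) = log(p / (1 - p)) for p
    return H, logit, p

# Hypothetical weights, for illustration only (11 parameters in total).
w_hidden = [[0.1, 0.5, -0.3, 0.8],
            [-0.2, 0.4, 0.9, -0.6]]
w_out = [0.05, 1.2, -0.7]

H, logit, p = forward([1.0, 0.0, -1.0], w_hidden, w_out)
print("hidden units:", H)
print("logit:", logit, "probability:", p)
```

Whatever the weights, the hidden units stay between -1 and 1 and the output stays strictly between 0 and 1.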

Training a Neural Net (Backpropagation Algorithm)
Ø Forward phase: Starting with some initial weights (often random), the calculations are passed through the network to the output layer, where a predicted value is computed.
Ø Backward phase: The predicted value is compared to the actual value, and the error is propagated backwards through the network to modify the connection weights.
Ø Repeat until the weights approximately converge, for example until the error stops decreasing meaningfully.
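The two phases can be sketched for the small network with 3 inputs, 2 tanh hidden units and a binary target. This is a minimal illustration under stated assumptions, not the procedure any particular software uses: the toy data, the learning rate `lr`, the per-example update, and the fixed number of epochs standing in for a convergence check are all choices made here.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def forward(x, w_hidden, w_out):
    """Forward phase: return hidden activations H and predicted probability p."""
    H = [math.tanh(row[0] + sum(w * xi for w, xi in zip(row[1:], x)))
         for row in w_hidden]
    p = sigmoid(w_out[0] + sum(w * h for w, h in zip(w_out[1:], H)))
    return H, p

def log_likelihood(data, w_hidden, w_out):
    """Bernoulli log-likelihood of the binary targets under the network."""
    total = 0.0
    for x, y in data:
        _, p = forward(x, w_hidden, w_out)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def train(data, w_hidden, w_out, epochs=200, lr=0.1):
    """Update the weights in place, one example at a time."""
    for _ in range(epochs):
        for x, y in data:
            H, p = forward(x, w_hidden, w_out)  # forward phase
            err = y - p                         # derivative of log-lik w.r.t. the logit
            # Backward phase: pass the error back through each tanh unit,
            # using the output weights as they were before this update.
            deltas = [err * w_out[j + 1] * (1.0 - H[j] ** 2)
                      for j in range(len(H))]
            w_out[0] += lr * err
            for j, h in enumerate(H):
                w_out[j + 1] += lr * err * h
                w_hidden[j][0] += lr * deltas[j]
                for i, xi in enumerate(x):
                    w_hidden[j][i + 1] += lr * deltas[j] * xi

# Hypothetical toy data: 3 inputs, binary target (1 when x1 + x2 - x3 > 0).
rng = random.Random(0)
data = []
for _ in range(40):
    x = [rng.uniform(-1, 1) for _ in range(3)]
    data.append((x, 1 if x[0] + x[1] - x[2] > 0 else 0))

# Random initial weights: 2 hidden units with 4 weights each, 3 output weights.
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(2)]
w_out = [rng.uniform(-0.5, 0.5) for _ in range(3)]

ll_before = log_likelihood(data, w_hidden, w_out)
train(data, w_hidden, w_out)
ll_after = log_likelihood(data, w_hidden, w_out)
print(f"log-likelihood before: {ll_before:.3f}, after: {ll_after:.3f}")
```

Because the weights move in the direction that increases the log-likelihood, the value printed after training should be higher (closer to 0) than the value before.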

Standardization
Ø Neural networks work best when input data are scaled to a narrow range around 0
Ø For bell-shaped data, statistical z-score standardization is appropriate
Ø For severely non-normal data, range standardization is more appropriate
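Both kinds of standardization are short to write down. A minimal sketch using only the Python standard library; the function names are chosen for illustration, and the target range of -1 to 1 for range standardization is an assumption (0 to 1 is another common choice).

```python
import statistics

def z_score(values):
    """Subtract the mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def range_standardize(values, low=-1.0, high=1.0):
    """Linearly rescale so the minimum maps to low and the maximum to high."""
    vmin, vmax = min(values), max(values)
    return [low + (high - low) * (v - vmin) / (vmax - vmin) for v in values]

print(z_score([1.0, 2.0, 3.0, 4.0, 5.0]))
print(range_standardize([2.0, 4.0, 10.0]))
```

In practice the mean, standard deviation, minimum and maximum would be computed on the training data and then reused unchanged when scoring new data.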

Probability Surface of a Neural Network
[Figure: probability surface over the input space, with regions labelled "High probability of yellow class" and "Lower probability of yellow class"]

Advantages of a Neural Network
Ø Can be adapted to classification or numerical prediction problems
Ø Capable of modelling complex nonlinear patterns
Ø (Arguably more complex patterns than most other current algorithms can capture)
Ø Makes few assumptions about the data's underlying relationships

Disadvantages of a Neural Network
Ø Neural networks have no mechanism for variable selection. You provide the inputs.
Ø Very difficult to see the relationships underlying the data
Ø Signs of weights can cancel each other out through the network
Ø Each input gets a weight for each hidden unit, and these are then combined
Ø Extremely computationally intensive
Ø Slow to train
Ø Particularly if the network structure is complex or the number of variables is large
Ø Prone to overfitting the training data