Introduction to Path Analysis: Multivariate Regression

EPSY 905: Multivariate Analysis, Spring 2016
Lecture #7: March 9, 2016
EPSY 905: Multivariate Regression via Path Analysis

Today's Lecture
Multivariate regression via path analysis
Path analysis details:
- Terms
- Software estimation defaults (variables in/out of likelihood)
- Model comparisons via likelihood ratio tests
- Measures of absolute and approximate model fit
- Model modification methods
- Standardized regression coefficients
Additional issues in path analysis:
- Variable considerations

Today's Data Example
Data are simulated based on the results reported in: Pajares, F., & Miller, M. D. (1994). Role of self-efficacy and self-concept beliefs in mathematical problem solving: A path analysis. Journal of Educational Psychology, 86, 193-203.
Sample of 350 undergraduates (229 women, 121 men)
- In simulation, 10% of variables were missing (using a missing completely at random mechanism)
Note: simulated data characteristics differ from actual data (some variables extend beyond their official range)
- Simulated using a multivariate normal distribution
  - Some variables had boundaries that the simulated data exceeded
- Results will not match exactly due to missing data and boundaries

Variables of Data Example
Sex (1 = male; 0 = female)
Math Self-Efficacy (MSE)
- Reported reliability of .91
- Assesses math confidence of college students
Perceived Usefulness of Mathematics (USE)
- Reported reliability of .93
Math Anxiety (MAS)
- Reported reliability ranging from .86 to .90
Math Self-Concept (MSC)
- Reported reliability of .93 to .95
Prior Experience at High School Level (HSL)
- Self-report of number of years of high school during which students took mathematics courses
Prior Experience at College Level (CC)
- Self-report of courses taken at college level
Math Performance (PERF)
- Reported reliability of .788
- 18-item multiple choice instrument (total of correct responses)

Our Destination: Overall Path Model
[Path diagram. Variables: Sex, High School Math Experience, College Math Experience, Mathematics Self-Concept, Mathematics Self-Efficacy, Perceived Usefulness, Mathematics Performance. Legend: direct effect; residual variance.]

The Big Picture
Path analysis is a multivariate statistical method that, when using an identity link, assumes the variables in an analysis are multivariate normally distributed
- Mean vectors
- Covariance matrices
By specifying simultaneous regression equations (the core of path models), a very specific covariance matrix is implied
- This is where things deviate from mixed models
As with all multivariate models, the key to path analysis is finding an approximation to the unstructured (saturated) covariance matrix
- With fewer parameters, if possible
The art of path analysis is in specifying models that blend theory and statistical evidence to produce valid, generalizable results

MULTIVARIATE REGRESSION/ANOVA/ANCOVA VIA PATH ANALYSIS

Multivariate Regression
Before we dive into path analysis, we will begin with a multivariate regression model:
- Predicting mathematics performance (PERF) with sex (F), college math experience (CC), and the interaction between sex and college math experience (FxCC)
- Predicting perceived usefulness (USE) with sex (F), college math experience (CC), and the interaction between sex and college math experience (FxCC)
$$PERF_i = \beta_0^{PERF} + \beta_F^{PERF} F_i + \beta_{CC}^{PERF} CC_i + \beta_{F \times CC}^{PERF} F_i CC_i + e_i^{PERF}$$
$$USE_i = \beta_0^{USE} + \beta_F^{USE} F_i + \beta_{CC}^{USE} CC_i + \beta_{F \times CC}^{USE} F_i CC_i + e_i^{USE}$$
We denote the residual for PERF as $e_i^{PERF}$ and the residual for USE as $e_i^{USE}$
- We also assume the residuals are multivariate normal:
$$\begin{bmatrix} e_i^{PERF} \\ e_i^{USE} \end{bmatrix} \sim N_2\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma^2_{e:PERF} & \sigma_{e:PERF,USE} \\ \sigma_{e:PERF,USE} & \sigma^2_{e:USE} \end{bmatrix} \right)$$

Before Continuing: We Will Center CC at 10
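The centering step can be sketched in a few lines. The lecture itself works in R with lavaan; the Python below is only an illustration, and the data values and variable names (female, cc_raw, cc10, f_x_cc10) are hypothetical, not taken from the lecture's data set. It shows that after centering, a value of zero on the predictor means "CC = 10", and that the interaction term is built from the centered predictor.

```python
# Hypothetical values for illustration; the lecture's simulated data are not reproduced here.
female = [0, 1, 1, 0]      # F: 0/1 indicator (coding assumed for this sketch)
cc_raw = [8, 10, 14, 12]   # CC: college math experience in raw units

CENTER = 10                # centering constant named on the slide

# Centered predictor: zero now corresponds to CC = 10.
cc10 = [cc - CENTER for cc in cc_raw]

# Interaction term built from the centered predictor.
f_x_cc10 = [f * c for f, c in zip(female, cc10)]

print(cc10)      # [-2, 0, 4, 2]
print(f_x_cc10)  # [0, 0, 4, 0]
```

With this coding, the intercept and the main effect of F are both evaluated at CC = 10 rather than at CC = 0.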

Types of Variables in the Analysis
An important distinction in path analysis is between endogenous and exogenous variables
Endogenous variable(s): variables whose variability is explained by one or more variables in a model
- In linear regression, the dependent variable is the only endogenous variable in an analysis
  - Mathematics Performance (PERF) and Mathematics Usefulness (USE)
Exogenous variable(s): variables whose variability is not explained by any variables in a model
- In linear regression, the independent variable(s) are the exogenous variables in the analysis
  - Female (F), college experience (CC), and the interaction (FxCC)

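Multivariate Linear Regression Path Diagram
[Path diagram. Exogenous variables: Female (F), College Math Experience (CC), and Female x College Math Experience (FxCC), with exogenous variances $\sigma^2_F$, $\sigma^2_{CC}$, $\sigma^2_{F \times CC}$ and exogenous covariances $\sigma_{F,CC}$, $\sigma_{F,F \times CC}$, $\sigma_{CC,F \times CC}$. Endogenous variables: Mathematics Performance (PERF) and Mathematics Usefulness (USE), with direct effects $\beta_F^{PERF}$, $\beta_{CC}^{PERF}$, $\beta_{F \times CC}^{PERF}$, $\beta_F^{USE}$, $\beta_{CC}^{USE}$, $\beta_{F \times CC}^{USE}$, residual (endogenous) variances $\sigma^2_{e:PERF}$ and $\sigma^2_{e:USE}$, and residual covariance $\sigma_{e:PERF,USE}$.]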

R's Version of the Path Diagram

Labeling Variables
The endogenous (dependent) variables are:
- Performance (PERF) and Usefulness (USE)
The exogenous (independent) variables are:
- Female (F), college experience (CC), and the interaction of Female and college experience (FxCC)

Multivariate Regression in R Using the lavaan Package
By putting 0* in front of each of the variables, we are allowing them to be in the likelihood (for model comparisons) but not predict either DV
A note about path analysis software:
- Most packages put all variables into the likelihood function (Mplus does not)
- So, you must start with all variables in the model for LRTs

Multivariate Regression Model Parameters
lavaan considers all five variables to be part of a multivariate normal distribution, so the unstructured (saturated) model has a total of 20 parameters:
- 5 means
- 5 variances
- 10 covariances (5-choose-2, or 5*(5-1)/2)
The model itself has 14 parameters:
- 5 intercepts
- 0 regression slopes (but we'll add these next)
- 2 residual variances
- 1 residual covariance
- 3 exogenous variances
- 3 exogenous covariances
lavaan will estimate two models for each analysis: H0 (your model) and H1 (saturated model)
Degrees of freedom in path models come from comparing the number of parameters in the saturated model with the number of parameters estimated
- 20 parameters available - 14 parameters estimated = 6 df
Therefore, this model will not fit perfectly, so model fit statistics will be available
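The parameter bookkeeping above can be reproduced with a short calculation. This is a Python sketch rather than lavaan output; the function name saturated_parameter_count and the dictionary keys are hypothetical labels chosen for this illustration, while the counts themselves come from the slide.

```python
def saturated_parameter_count(n_vars: int) -> int:
    """Means + variances + covariances for n_vars jointly modeled variables."""
    means = n_vars
    variances = n_vars
    covariances = n_vars * (n_vars - 1) // 2
    return means + variances + covariances

# Counts listed on the slide for the model with all slopes fixed to zero.
model_parameters = {
    "intercepts": 5,
    "regression slopes": 0,
    "residual variances": 2,
    "residual covariances": 1,
    "exogenous variances": 3,
    "exogenous covariances": 3,
}

n_saturated = saturated_parameter_count(5)
n_model = sum(model_parameters.values())
df = n_saturated - n_model

print(n_saturated, n_model, df)  # 20 14 6
```

Freeing the six regression slopes would raise the model count to 20 and leave 0 df, which is why the model with all slopes estimated is equivalent to the saturated model.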

Output from lavaan: Summary Statement
Note: No information about exogenous variables (from the fixed.x=TRUE option)

Path Diagram with Numbers Shown

Output from lavaan: Fitted and Saturated Covariance Matrix
The fitted covariance matrix shows you what the model implies the variances and covariances should be
Here the exogenous variables are provided by sample estimates (fixed.x=TRUE)
Model parameters provide the endogenous parameters
The lower matrix is the saturated model matrix

Output from lavaan: Residual Covariance Matrices
The raw residuals are the difference between the model-implied covariance matrix/mean vector and the H1 (saturated model) covariance matrix/mean vector

METHODS OF EXAMINING MODEL FIT

Methods of Model Fit
Model-data fit is of utmost concern when building models with multivariate outcomes
If a model does not fit the data:
- Parameter estimates may be biased
- Standard errors of estimates may be biased
- Inferences made from the model may be wrong
- If the saturated model fit is wrong, then the LRTs will be inaccurate
Examining model fit is the first step in multivariate models
That said, not all good-fitting models are useful
- Model fit just allows you to talk about your model; there may be nothing of significance (statistically or practically) in your results, though

Types of Model Fit Information
Model fit information for models where outcomes are conditionally MVN* comes in several types, but all are based on the premise that any model mean and covariance structure must fit as well as the saturated mean vector and covariance matrix model
*If model outcomes are not conditionally MVN, model fit is very different
All possible models/structures are nested within the saturated mean vector and covariance matrix model
- Most model fit statistics come from comparing any model/structure with the saturated model
Indices shown first are called global model fit indices
- Report fit of model globally (as opposed to locally for specific parameters)

Example lavaan Model Fit Output

The fit.measures=TRUE Model Fit Statistics
Unlabeled section
- Likelihood ratio test versus the saturated model
- Testing if your model fits as well as the saturated model
Model test baseline model
- Likelihood ratio test pitting the saturated model against the independent variables model
- Testing whether any variables have non-zero covariances (significant correlations)
User model versus baseline model
- CFI
- TLI
Loglikelihood and Information Criteria
- Likelihood ratio tests (nested models)
- Information criteria comparisons (non-nested models)
Root Mean Square Error of Approximation
- How far off a model is from the saturated model, per degree of freedom
Standardized Root Mean Square Residual
- How far off a model's correlations are from the saturated model correlations

Indices of Global Model Fit
Primary: obtained model χ² (from the unlabeled first section of the output); here we use the MLR rescaled χ² from the Robust column
- χ² is evaluated based on model df (difference in parameters between your model and the saturated model)
- Tests the null hypothesis that this model (H0) fits equally to the saturated model (H1), so significance is undesirable (smaller χ², bigger p-value is better)
  - Means the saturated model is estimated automatically for each model analyzed
- Just using χ² is insufficient, however:
  - Distribution doesn't behave like a true χ² if sample sizes are small (or, if not using MLR, if items are non-normally distributed)
  - Obtained χ² depends largely on sample size
  - Some mention this is an unreasonable null hypothesis (perfect fit??)
Because of these issues, alternative measures of fit are usually used in conjunction with the χ² test of model fit
- Absolute fit indices (besides χ²)
- Parsimony-corrected and comparative (incremental) fit indices

Chi-Square Test of Model Fit
The Chi-Square Test of Model Fit provides a likelihood ratio test comparing the current model to the saturated (unstructured) model:
- The value is -2 times the difference in log-likelihoods (rescaled if MLR)
- The degrees of freedom is the difference in the number of estimated model parameters
- The p-value is from the chi-square distribution
If this test has a significant p-value:
- The current model (H0) is rejected: the model fit is significantly worse than the full model
- In latent variable models, this test is usually ignored
  - Said to be overly sensitive
If this test does not have a significant p-value:
- The current model (H0) is not rejected: it fits equivalently to the full model

Where the Saturated Model Test Comes From
The saturated model LRT comes from a likelihood ratio test of the current model with the saturated model
If using MLR (robust method), then this LRT is rescaled based on the estimated scaling factors of both models
This same information can be obtained from:
- Loglikelihood model output section
- anova() function comparing fit for current and saturated models

Calculating the LRT for Global Fit Test for Model 04
Calculation from the lavaan output:
- 14 parameters in our model; 20 in saturated model
- Scaling correction factor: $c_{LR} = \dfrac{q_{restricted}\, c_{restricted} - q_{full}\, c_{full}}{q_{restricted} - q_{full}} = 1.005$
- $\chi^2 = \dfrac{T_{unscaled}}{c_{LR}} = 22.204$, where $T_{unscaled} \approx 22.3$ is the unscaled LRT statistic
- DF = 6
Conclusion: this model fits significantly worse than the saturated model
- And it should, especially if any of our predictors have non-zero betas
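The two pieces of this calculation, the scaling correction and the p-value, can be sketched in Python. The per-model scaling factors c_restricted and c_full are not reported in this transcript, so the values passed below are hypothetical and were chosen only so that the correction comes out near the reported 1.005. The scaled χ² of 22.204 on 6 df is taken from the slide. The helper chi2_sf_even_df is an assumption of this sketch: it uses a closed form that holds only for even degrees of freedom.

```python
import math

def scaling_correction(q_restricted, c_restricted, q_full, c_full):
    """Scaling correction for the LRT of a restricted model against a fuller one.

    q_* are parameter counts; c_* are each model's own scaling factor.
    """
    return (q_restricted * c_restricted - q_full * c_full) / (q_restricted - q_full)

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles positive even df only")
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

# Hypothetical per-model scaling factors (not reported in the lecture transcript).
c_lr = scaling_correction(q_restricted=14, c_restricted=1.02, q_full=20, c_full=1.0155)

# Scaled chi-square and df as reported on the slide.
p_value = chi2_sf_even_df(22.204, 6)

print(f"{c_lr:.3f}")     # 1.005
print(f"{p_value:.4f}")  # 0.0011
```

A p-value this small matches the slide's conclusion that the model with all slopes fixed to zero fits significantly worse than the saturated model.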

Saturated Model LRT and Loglikelihood Output
If the loglikelihood of the current model ("User model" or H0) is equal to the loglikelihood of the saturated model ("Unrestricted model" or H1), then you are running a model that is equivalent to the saturated model
- No other model fit will be available or useful

Model Test Baseline Model
The "Model test baseline model" section provides an LRT:
- Comparing the saturated (unstructured) model with an independent variables model (called the baseline model)
Here, the null model is the baseline (the independent variables model)
- If the test is significant, this means that at least one (and likely more than one) variable has a significant covariance (and correlation)
- If the test is not significant, this means that the independence model is appropriate
  - This is not likely to happen
  - But if it does, there are virtually no other models that will be significant
Not often reported, as it is likely variables are correlated

User Model Versus Baseline Model Section
The "User model versus baseline model" section provides two additional measures of model fit comparing the current (user) model to the baseline (independent variables) model
CFI stands for Comparative Fit Index
- Higher is better (above .95 indicates good fit)
TLI stands for Tucker-Lewis Index
- Higher is better (above .95 indicates good fit)

Comparative (Incremental) Fit Indices
Fit evaluated relative to a null model (of 0 covariances)
- Relative to that, your model should be great!
CFI: Comparative Fit Index
- Based on the idea of the chi-square non-centrality parameter: (χ² − df)
- $CFI = 1 - \dfrac{\max(\chi^2_T - df_T,\ 0)}{\max(\chi^2_T - df_T,\ \chi^2_N - df_N,\ 0)}$
  - T = target (current/estimated) model
  - N = null (baseline/independent variables) model
- From 0 to 1: bigger is better, > .90 = acceptable, > .95 = good
TLI: Tucker-Lewis Index (= Non-Normed Fit Index)
- $TLI = \dfrac{\chi^2_N / df_N - \chi^2_T / df_T}{\chi^2_N / df_N - 1}$
- From < 0 to > 1, bigger is better, > .95 = good
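The CFI and TLI formulas translate directly into code. In the Python sketch below, the target-model values (χ² = 22.204, df = 6) are the ones reported earlier in the lecture; the baseline-model values (χ² = 106.0, df = 10) are hypothetical placeholders, because the baseline test results are not reproduced in this transcript. The function names cfi and tli are likewise just labels for this illustration.

```python
def cfi(chisq_t, df_t, chisq_n, df_n):
    """Comparative Fit Index from target (T) and null/baseline (N) chi-squares."""
    numerator = max(chisq_t - df_t, 0.0)
    denominator = max(chisq_t - df_t, chisq_n - df_n, 0.0)
    if denominator == 0.0:
        return 1.0  # neither model shows any non-centrality
    return 1.0 - numerator / denominator

def tli(chisq_t, df_t, chisq_n, df_n):
    """Tucker-Lewis Index; can fall below 0 or rise above 1."""
    ratio_n = chisq_n / df_n
    ratio_t = chisq_t / df_t
    return (ratio_n - ratio_t) / (ratio_n - 1.0)

# Target values from the lecture; baseline values are hypothetical.
cfi_value = cfi(22.204, 6, 106.0, 10)
tli_value = tli(22.204, 6, 106.0, 10)

print(round(cfi_value, 3), round(tli_value, 3))
```

Because both indices depend on the baseline χ², the printed numbers only illustrate the arithmetic and should not be read as the fit of the lecture's model.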

Information Criteria Output
The information criteria output provides relative fit statistics:
- AIC: Akaike Information Criterion
- BIC: Bayesian Information Criterion (also called Schwarz's criterion)
- Sample-size Adjusted BIC
These statistics weight the information given by the parameter values by the parsimony of the model (the number of model parameters)
- For all statistics, the smaller number is better
The core of these statistics is -2*log-likelihood

Comparing Information Criteria
Information criteria are relative tests of fit
They are calculated based on the log-likelihood of the model, factoring in a penalty for number of parameters (plus other things)
They should never be used to compare nested models
- The likelihood ratio test is the most powerful test statistic to use for nested models
When comparing non-nested models, first choose a statistic
- AIC, BIC, or Sample-size Adjusted BIC are what are given by default
The preferred model is the one with the lowest value of that statistic
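A small Python sketch shows how the penalty terms enter and why the choice of statistic matters. The formulas are the standard definitions (AIC = −2LL + 2q, BIC = −2LL + q·ln N, and the sample-size adjusted BIC with N replaced by (N + 2)/24). The two models, their log-likelihoods, and their parameter counts are hypothetical; only N = 350 comes from the lecture.

```python
import math

def aic(loglik, q):
    """Akaike Information Criterion: -2LL plus 2 per parameter."""
    return -2.0 * loglik + 2.0 * q

def bic(loglik, q, n):
    """Bayesian Information Criterion: -2LL plus ln(N) per parameter."""
    return -2.0 * loglik + q * math.log(n)

def sabic(loglik, q, n):
    """Sample-size adjusted BIC: BIC with N replaced by (N + 2) / 24."""
    return -2.0 * loglik + q * math.log((n + 2) / 24.0)

N = 350  # sample size from the lecture

# Two hypothetical non-nested models: (log-likelihood, number of parameters).
models = {"A": (-1000.0, 14), "B": (-995.0, 16)}

aic_values = {name: aic(ll, q) for name, (ll, q) in models.items()}
bic_values = {name: bic(ll, q, N) for name, (ll, q) in models.items()}

best_by_aic = min(aic_values, key=aic_values.get)
best_by_bic = min(bic_values, key=bic_values.get)

print(best_by_aic, best_by_bic)  # B A
```

In this made-up case AIC prefers the larger model and BIC the smaller one, which is why the slide advises choosing the statistic before comparing.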

Indices of Global Model Fit
Parsimony-corrected: RMSEA (Root Mean Square Error of Approximation)
Uses comparison of the current model with the saturated model
- χ² listed here is from the first part of the lavaan output
Relies on a non-centrality parameter (NCP)
- Indexes how far off your model is → χ² distribution shoved over
- NCP → $d = (\chi^2 - df) / (N - 1)$; then $RMSEA = \sqrt{d / df}$
  - df is the difference between the number of parameters in the current model and the saturated model
- RMSEA ranges from 0 to 1; smaller is better
  - < .05 or .06 = good, .05 to .08 = acceptable, .08 to .10 = mediocre, and > .10 = unacceptable
- In addition to the point estimate, get a 90% confidence interval
- RMSEA penalizes for model complexity: it's discrepancy in fit per df left in model (but not sensitive to N, although the CI can be)
- Test of "close fit": null hypothesis that RMSEA ≤ .05

RMSEA (Root Mean Square Error of Approximation)
The RMSEA is an index of model fit where 0 indicates perfect fit (smaller is better)
RMSEA is based on the approximated covariance matrix
The goal is a model with an RMSEA less than .05
- Although there is some flexibility
The result above indicates our model fits poorly (RMSEA of .088)
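The RMSEA point estimate can be checked by hand from numbers already given in the lecture: the scaled χ² of 22.204 on 6 df and the sample size of 350. The Python sketch below applies the formula d = (χ² − df)/(N − 1), RMSEA = sqrt(d/df); the function name rmsea is just a label for this illustration, and software may differ slightly in details such as using N rather than N − 1.

```python
import math

def rmsea(chisq, df, n):
    """Point estimate of RMSEA from a model chi-square, its df, and sample size."""
    d = max(chisq - df, 0.0) / (n - 1)  # rescaled non-centrality, floored at 0
    return math.sqrt(d / df)

# Scaled chi-square, df, and N as reported in the lecture.
rmsea_value = rmsea(22.204, 6, 350)

print(round(rmsea_value, 3))  # 0.088
```

A value near .088 falls in the .08 to .10 "mediocre" band listed above, consistent with the conclusion that the model with all slopes fixed to zero fits poorly.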

Standardized Root Mean Squared Residual
The SRMR (standardized root mean square residual) provides the average standardized difference between:
- The estimated covariance matrix of the saturated model
- The estimated covariance matrix of the current model
Lower is better (some suggest less than 0.08)
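One common way to compute such an index is sketched below in Python. This is an assumption-laden illustration, not necessarily the exact formula lavaan applies: it covers the covariance part only (no mean structure), standardizes each residual by the saturated-model standard deviations, and takes the root mean square over the unique elements. Both matrices are hypothetical.

```python
import math

def srmr(saturated_cov, model_cov):
    """Root mean square of standardized covariance residuals (covariances only)."""
    p = len(saturated_cov)
    total = 0.0
    for i in range(p):
        for j in range(i + 1):  # unique elements: lower triangle plus diagonal
            scale = math.sqrt(saturated_cov[i][i] * saturated_cov[j][j])
            residual = saturated_cov[i][j] - model_cov[i][j]
            total += (residual / scale) ** 2
    return math.sqrt(total / (p * (p + 1) / 2))

# Hypothetical 2 x 2 matrices: the model reproduces the variances exactly
# but understates the covariance (0.6 instead of 1.2).
saturated = [[4.0, 1.2], [1.2, 9.0]]
implied = [[4.0, 0.6], [0.6, 9.0]]

srmr_value = srmr(saturated, implied)
print(round(srmr_value, 4))
```

The only non-zero residual here is 0.6 on a scale of 6 (a correlation-metric gap of .10), averaged over three unique elements.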

LOCAL MODEL FIT MEASURES

Local Measures of Model (Mis)Fit
Local measures of model (mis)fit are statistics that point to the location (typically of a covariance matrix) where a model may not fit well
- As opposed to global measures that indicate how a model fits overall
Local measures of model (mis)fit are typically of two types:
- Residual covariance matrices (unstandardized, standardized, or normalized)
  - The difference between the model's estimated covariance matrix and the saturated model's estimated covariance matrix
  - These were used for the SRMR
- Model modification indices
  - 1-degree-of-freedom hypothesis tests for the improvement of the model LRT if one more parameter was allowed to be estimated

Residual Covariance Matrices
Residual covariance matrices are used to figure out how best to reduce model misfit
The raw or unstandardized residual covariance matrix for the model literally takes the difference between the model-implied and saturated model covariance matrices
I often prefer normalized versions of these matrices
- We can inspect the normalized residual covariance matrix (like z-scores) to see where our biggest misfit occurs

Modification Indices: More Help for Fit
As we used maximum likelihood to estimate our model, another useful feature is that of the modification indices
- Modification indices, also called score or Lagrange multiplier tests, attempt to suggest the change in the log-likelihood for adding a given model parameter (larger values indicate a better fit for adding the parameter)
mi column: the expected value of the LRT of the current model and a model where this parameter was added
mi.scaled column: the scaled (robust) LRT
- Should be bigger than 3.84 for 1 df
- Practice is to find values that are much higher (say 10 or more)
epc column: expected value of the parameter in the model where this parameter was added
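The 3.84 cutoff is the .05 critical value of a χ² distribution with 1 df, which can be verified with the standard library. The Python sketch below uses the identity that the upper tail of a 1-df χ² equals erfc(sqrt(x/2)); the function name chi2_sf_1df is a label for this illustration only.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

p_at_cutoff = chi2_sf_1df(3.84)   # conventional .05 threshold
p_at_ten = chi2_sf_1df(10.0)      # the stricter rule of thumb from the slide

print(f"{p_at_cutoff:.3f}")  # 0.050
print(f"{p_at_ten:.4f}")     # 0.0016
```

Requiring a modification index of 10 or more therefore corresponds to a much stricter per-test threshold than .05, which offers some protection when many candidate parameters are screened at once.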

ADDING PREDICTORS TO THE MODEL

Adding Predictors: Removing Zero Values from Parameters

First Question: Which Model Fits Better?
After adding the predictors (estimating their betas) to the model, we must first ask which model fits better
A likelihood ratio test (LRT) can be performed comparing model02 (with predictors) and model01 (without)
- Which model is the null model?
- Which model is the alternative model?
- What is the null hypothesis?
- What is the alternative hypothesis?

LRT With Scaled Chi-Squares
R makes the scaled chi-square LRT easy: use the anova() function and it will rescale the chi-squares automatically
Here we see that we reject model01 (the null model)
So we conclude that at least one beta value was significantly different from zero

Step 2: Inspect Model Fit
Next we inspect the model fit of model02:
- Model02 has the same loglikelihood as the saturated model, so it is equivalent to the saturated model
- Therefore it fits perfectly!
Any path model where all exogenous variables predict all endogenous variables AND all covariances between endogenous variables are estimated is the saturated model

Up Next: Inspect Parameters and Make Interpretations

New Terms: Standardized Parameters
Standardized parameters are parameters that are transformed by dividing by one or more standard deviations
Big-picture example: recall the covariance to correlation formula
$$\text{Correlation}(X, Y) = \frac{\text{Covariance}(X, Y)}{SD(X)\, SD(Y)}$$
The correlation is a standardized covariance
Standardized = units removed

Standardized Regression Parameters
The standardized regression parameters are similar
Take the original equation for a simple linear (one predictor) regression:
$$\beta_X^Y = \rho_{Y,X} \frac{\sigma_Y}{\sigma_X}$$
- $\beta_X^Y$ is interpreted as the increase in units of Y per unit of X
To standardize (std.all in lavaan), remove units:
$$b_X^Y = \beta_X^Y \frac{\sigma_X}{\sigma_Y} = \rho_{Y,X}$$
- $b_X^Y$ is interpreted as the increase in SDs of Y per SD of X
Standardized parameters are useful for comparing effects on different scales
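The relationship between the unstandardized slope, the standardized slope, and the correlation can be checked numerically for the one-predictor case. The Python sketch below uses hypothetical values for the standard deviations and the covariance; the function names are labels for this illustration only. With several predictors, as in the path model, a standardized slope is no longer simply a correlation.

```python
def correlation(cov_xy, sd_x, sd_y):
    """Correlation as a standardized covariance."""
    return cov_xy / (sd_x * sd_y)

def standardized_slope(beta, sd_x, sd_y):
    """Rescale a slope from raw units to SD units of Y per SD of X."""
    return beta * sd_x / sd_y

# Hypothetical summary statistics for one predictor X and one outcome Y.
sd_x, sd_y, cov_xy = 2.0, 5.0, 4.0

beta = cov_xy / sd_x ** 2                 # simple-regression slope: Cov(X, Y) / Var(X)
b = standardized_slope(beta, sd_x, sd_y)  # standardized slope
rho = correlation(cov_xy, sd_x, sd_y)     # correlation between X and Y

print(beta, b, rho)  # 1.0 0.4 0.4
```

The standardized slope and the correlation coincide, as the formula on the slide states for simple regression.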

Questions to Answer about this Model
- What is the effect of college experience on usefulness for males?
- What is the effect of college experience on usefulness for females?
- What is the difference between males' and females' ratings of usefulness when college experience = 10?
- How does the difference between males' and females' ratings change for each additional hour of college experience?

Questions to Answer about this Model
- What is the effect of college experience on performance for males?
- What is the effect of college experience on performance for females?
- What is the difference between males' and females' performance when college experience = 10?
- How does the difference between males' and females' performance change for each additional hour of college experience?
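Each of these questions maps onto a combination of the regression coefficients from the model with the interaction and CC centered at 10. The Python sketch below shows the arithmetic for one outcome; the coefficient values are hypothetical (the lecture's estimates are not reproduced in this transcript), and F is treated as a 0/1 indicator whose coding is assumed for illustration.

```python
def cc_effect(b_cc, b_fxcc, f):
    """Slope of college experience for the group with indicator value f."""
    return b_cc + b_fxcc * f

def group_difference(b_f, b_fxcc, cc_centered):
    """Difference between the F = 1 and F = 0 groups at a centered CC value."""
    return b_f + b_fxcc * cc_centered

# Hypothetical coefficients for one outcome (for example USE or PERF).
b_f, b_cc, b_fxcc = -1.5, 0.25, 0.5

effect_f0 = cc_effect(b_cc, b_fxcc, f=0)          # CC slope when F = 0
effect_f1 = cc_effect(b_cc, b_fxcc, f=1)          # CC slope when F = 1
diff_at_10 = group_difference(b_f, b_fxcc, 0)     # CC = 10 is 0 after centering
diff_at_11 = group_difference(b_f, b_fxcc, 1)     # one additional unit of CC

print(effect_f0, effect_f1, diff_at_10, diff_at_11)  # 0.25 0.75 -1.5 -1.0
```

The main effect of CC is the slope for the F = 0 group, the main effect of F is the group difference at CC = 10, and the interaction coefficient answers both "how do the slopes differ" and "how does the group difference change per unit of CC".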

WRAPPING UP AND REFOCUSING

Path Analysis: An Introduction
In this lecture we discussed the basics of path analysis
- Model specification/identification
- Model estimation
- Model fit (necessary, but not sufficient)
- Model modification and re-estimation
- Final model parameter interpretation
There is a lot to the analysis, but what is important to remember is the overarching principle of multivariate analyses: covariance between variables is important
- Path models imply very specific covariance structures
- The validity of the results hinges upon accurately finding an approximation to the covariance matrix