# Basic Statistics in Multivariate Analysis

Karen A. Randolph and Laura L. Myers

Pocket Guides to Social Work Research Methods
Series Editor: Tony Tripodi, DSW, Professor Emeritus, Ohio State University

Published in the United States of America by Oxford University Press, 198 Madison Avenue, New York, NY 10016. © Oxford University Press 2013. All rights reserved.
Library of Congress Cataloging-in-Publication Data

Randolph, Karen A. Basic statistics in multivariate analysis / Karen A. Randolph, Laura L. Myers. p. cm. (Pocket guides to social work research methods). Includes bibliographical references and index. ISBN 978-0-19-976404-4 (pbk. : alk. paper). 1. Social service--Research--Methodology. 2. Multivariate analysis. I. Myers, Laura L. II. Title. HV11.R3123 2013. 519.5′35--dc23. 2012033754.

Printed in the United States of America on acid-free paper.

## Contents

- Acknowledgments
- 1 Introduction
- 2 Descriptive Statistical Methods
- 3 Inferential Statistics
- 4 Bivariate Statistical Methods
- 5 Bivariate and Multivariate Linear Regression Analysis
- 6 Analysis of Variance (ANOVA) and Covariance (ANCOVA)
- 7 Path Analysis
- Appendix: Statistical Symbols
- Glossary
- References
- Index

## Acknowledgments

We are grateful for the generous encouragement and helpful advice from a number of people who contributed to the preparation of this book. First and foremost, we thank Dr. Tony Tripodi, the series editor of Pocket Guides to Social Work Research Methods, for his unwavering support and enthusiasm. We are also very appreciative of the assistance we received from Maura Roessner, Editor, and Nicholas Liu, Assistant Editor, at Oxford University Press. Karen Randolph is grateful for the support of Mrs. Agnes Flaherty Stoops through the Agnes Flaherty Stoops Professorship in Child Welfare. We have been fortunate to work with many bright and talented students, whose commitment to understanding and applying complex statistical methods in conducting their own social work–based research inspired us to pursue this topic.
A very special acknowledgment is extended to Christina Ouma, doctoral student and contributing author of this book’s companion website, for her tireless efforts and attention to detail in navigating the National Educational Longitudinal Study of 1988 to develop the practice exercises for the statistical procedures. We also thank Leah Cheatham, Hyejin Kim, Dr. Bruce Thyer, and David Albright for their thoughtful comments on manuscript drafts. Karen Randolph would like to thank Dr. Betsy Becker, Ying Zhang, Leah Cheatham, and Tammy Bradford for their assistance on Chapter Six.

## 1 Introduction

Statistical methods used to investigate questions that are relevant to social work researchers are becoming more complex. The use of methods such as path analysis in causal modeling is increasingly required to match appropriate data analysis procedures to questions of interest. As a consequence, social work researchers need a skill set that allows them to thoroughly understand and accurately test multivariate models. A strong background in basic statistics provides the foundation for this skill set and allows for the use of more advanced methods to study questions relevant to social work researchers.

The purpose of Basic Statistics in Multivariate Analysis is to introduce readers to three multivariate analytical methods, with a focus on the basic statistics (e.g., mean, variance) that support these methods. Multivariate analytical methods are made up of basic statistical procedures. This is an important, yet often overlooked, aspect of advanced statistics. We posit that, by having a strong foundation in basic statistics, particularly with regard to understanding their role in more advanced methods, readers will be more confident and thus more likely to utilize advanced methods in their research.

What do we mean by “basic statistics”? Basic statistics are statistics that organize and summarize data.
This includes frequency distributions, percentages, measures of central tendency (i.e., mean, median, mode), and measures of dispersion or variability (i.e., range, variance, and standard deviation). Basic statistics are also referred to as descriptive statistics (e.g., Rubin, 2010), as the intent is to describe individual variables, rather than test inferences about the relationships between variables. The results of basic statistical analysis, also called univariate analysis, are often displayed in charts and graphs, such as bar graphs, histograms, and stem and leaf plots.

What do we mean by “multivariate analysis”? The focus of multivariate analysis methods is on multiple variables. Multivariate analysis is a collection of statistical techniques used to examine and make inferences about the relationships between variables. Real-world problems that are of interest to social workers are generally affected by a variety of factors. Multivariate analysis allows social work researchers to understand the etiology of these problems in a way that more accurately reflects how they really happen. We can examine the relationships between several factors (i.e., variables) and an outcome by using multivariate analytic methods. While several multivariate analytic methods are available, we describe three of the more common ones: multiple linear regression analysis, analysis of variance (ANOVA) and covariance (ANCOVA), and path analysis. More information about each of these methods is provided later in this chapter.

Bivariate statistics make up a special class of multivariate statistics. As the name implies, bivariate statistics focus on the relationship between two variables. Common bivariate statistical tests are the dependent samples t-test, the independent samples t-test, the Pearson r correlation, and the chi-square test. These tests, and others, are used to test inferences about the relationship between two variables.
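The point that more advanced statistics are assembled from basic ones can be illustrated with a minimal Python sketch. It is not taken from the book: the two variables and their values are hypothetical, and the helper names `describe` and `pearson_r` are introduced here only for illustration. The sketch computes the basic statistics for one variable and then builds the Pearson r correlation from means and standard deviations.

```python
import statistics

# Hypothetical scores for eight older adults (illustrative values only).
depression = [4, 7, 5, 9, 6, 8, 3, 10]
drinks_per_week = [1, 4, 2, 6, 3, 5, 1, 7]


def describe(values):
    """Return basic (descriptive) statistics for a single variable."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "sd": statistics.stdev(values),           # sample standard deviation
    }


def pearson_r(x, y):
    """Pearson r: sample covariance divided by the product of the two SDs."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    n = len(x)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    return covariance / (statistics.stdev(x) * statistics.stdev(y))


summary = describe(depression)
r = pearson_r(depression, drinks_per_week)
print(summary)
print(round(r, 3))
```

The bivariate statistic here uses nothing beyond the mean and standard deviation of each variable plus their cross-products, which is the sense in which basic statistics underlie the more advanced procedures.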
In general, books on basic statistics for social workers address an important need in social work education: to facilitate the development of skills for consuming information about statistics. This is based on an assumption that the target audience is unlikely to extend their studies in a way that includes the need to learn and conduct multivariate analysis. Our focus is different. We take a unique approach by directing our efforts toward preparing entry-level doctoral students and early-career social work researchers, especially those who may not have a strong background in basic statistics, to use advanced analytic procedures by highlighting the important role of basic statistics in these methods. In their content review of 30 statistical textbooks, Hulsizer and Woolf (2009) observed that only “a small handful of authors elected to go beyond simple regression and include a chapter on multiple regression (13%)” (p. 35). They also note the absence of content on analysis of covariance (ANCOVA) and other multivariate methods. We include content beyond simple regression to address these gaps.

The primary focus of this book is to offer opportunities for readers, particularly entry-level doctoral students and early-career social work researchers, to strengthen their understanding and skills in basic statistics and related statistical procedures so that they are more prepared to utilize multivariate analytical methods to study problems and issues that are of concern to social workers. We assume that readers have familiarity with univariate and bivariate statistical analysis and some experience in using the Statistical Package for the Social Sciences (SPSS) and AMOS software (SPSS Inc., 2011). The book is also designed to be used as a reference guide in addressing questions that may emerge in conducting multivariate analysis, as well as a companion text in advanced statistics courses for doctoral students.
### The Bridge from Basic to Inferential Statistics in Data Analysis

This book provides information about both basic and inferential statistics. Basic statistics summarize or classify the characteristics of a sample. They provide a foundation for understanding the sample. For example, basic statistics can be used to indicate the number or percentage of males and females in a study, their mean or average age, and the range of their ages from youngest to oldest. Basic statistics include counts, percentages, frequency distributions, measures of central tendency, and measures of variability. They can be displayed as various graphical representations of the data.

While basic statistics provide information about a sample, inferential statistics focus on the population from which the sample was drawn, using data collected from the sample. Inferential statistics are used to make predictions or draw conclusions about the population based on what is known about the sample. Probability theory provides the basis for making predictions about a population from a sample. Inferential statistics include parametric statistical tests such as the Pearson’s r correlation, Student’s t-tests, and analysis of variance, and nonparametric statistical tests such as Spearman’s rho, Mann–Whitney U, Wilcoxon signed-rank, and Kruskal–Wallis H tests. As an example, the Pearson r correlation test could be used to determine the relationship between depression and frequency of alcohol use among older adults.

Basic and inferential statistics differ in their intended purpose, that is, in the type of information they provide. Basic statistics are used to summarize information about a sample. Inferential statistics are used to make predictions about a population based on information obtained from a sample of the population.
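The step from describing a sample to making a statement about a population can be sketched in Python as follows. This is an illustration under stated assumptions, not a procedure from the book: the ages are hypothetical, and the interval uses a normal approximation (a z value near 1.96). For a sample this small, a critical value from the t distribution would usually be preferred.

```python
import math
import statistics

# Hypothetical ages of ten study participants (illustrative values only).
ages = [62, 65, 67, 70, 71, 73, 74, 76, 79, 83]

# Basic statistics: summarize the sample itself.
n = len(ages)
sample_mean = statistics.mean(ages)
sample_sd = statistics.stdev(ages)
age_range = (min(ages), max(ages))

# Inferential step: use the sample to estimate the population mean.
# Assumption: normal approximation to the sampling distribution of the mean.
standard_error = sample_sd / math.sqrt(n)
z = statistics.NormalDist().inv_cdf(0.975)  # two-sided 95% critical value
ci_low = sample_mean - z * standard_error
ci_high = sample_mean + z * standard_error

print(f"Sample: n={n}, mean={sample_mean}, range={age_range}")
print(f"Approximate 95% CI for the population mean: {ci_low:.1f} to {ci_high:.1f}")
```

The first half of the sketch only describes the ten observations. The second half attaches uncertainty to the mean, which is what turns a description of the sample into a statement about the population.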
The process of making predictions from a sample to a population with inferential statistics is more restrictive than summarizing data using basic statistics. Because inferential statistics involve statistical testing, four assumptions about the data must be met. In general, the assumptions are as follows: 1) the dependent variable (DV) is measured at the interval or ratio level, 2) the distribution of the data is normal (i.e., unimodal and not excessively skewed or kurtotic), 3) the variances across the data are equal (i.e., homogeneity of variance), and 4) the observations are independent of one another. Note that, for some tests (e.g., the dependent-samples t-test), the fourth assumption (i.e., independence of observations) does not apply. This is the case when data are collected from the same sample at more than one time point (e.g., pre- and posttest observations). Furthermore, the manner in which some assumptions are operationalized varies depending on the particular type of parametric test. Finally, additional assumptions are required for tests of multivariate models when causality is inferred. All of this can be confusing. We will cover assumptions in much more detail throughout the book, including how to determine whether assumptions have been met and the impact on results when assumptions are violated.

Making predictions using inferential statistical tests also requires that models are accurately estimated. The following criteria are used to ensure accuracy in model estimation:

- The model should be correctly specified. A correctly specified model is one in which 1) all relevant independent variables (IVs) are in the model, 2) all irrelevant IVs are not in the model, 3) each IV is measured without error, and 4) the IVs in the model are not correlated with variables that are not in the model.
- The IVs should not be strongly correlated with one another (i.e., no undue multicollinearity).
- There should be no influential outliers among the IVs or in the solution.
- The sample size should be large enough to detect results at the desired effect size.

We will also discuss these criteria in more detail in subsequent chapters, including how to determine whether each criterion has been met and, when a criterion is not met, the extent to which this becomes problematic in model testing.

Basic and inferential statistics are related to one another in that basic statistics provide the foundation for conducting multivariate analyses, in order to make inferences about the relationship between variables. Kleinbaum and others (1988) describe this succinctly:

> The primary goal of most statistical analysis is to make statistical inferences, that is, to draw valid conclusions about a population of items of measurements based upon information contained in a sample from that population. Once sample data have been collected, it is useful, prior to analysis, to examine the data using tables, graphs, and [basic] statistics, such as the sample mean or the sample variance. Such descriptive efforts are important for representing the essential features of the data in easily interpretable terms. Following such examination, statistical inferences are made through two related activities: estimation and hypothesis testing. (p. 16)

### An Introduction to Multivariate Analysis in Social Work

In this book we describe how basic statistics are used to inform three common multivariate analytical methods: multiple linear regression analysis, analysis of variance (ANOVA) and covariance (ANCOVA), and path analysis. Often these methods are used to support making inferences about causality between variables. Of course, inferring causality requires more than just establishing a statistical association between variables.
Other conditions are 1) the presumed cause (e.g., X) occurs before the presumed effect (e.g., Y) (i.e., time precedence), 2) the direction of the causal relationship (e.g., X causes Y rather than the other way around) is correctly specified (i.e., correct effect priority), and 3) there are no other plausible explanations of the relationship between the presumed cause and the presumed effect (i.e., nonspuriousness) (Kline, 2011, p. 98). We will revisit the conditions for establishing causality, particularly with regard to time order and nonspuriousness, as these criteria are important in path analysis, discussed in Chapter 7. Note also that each of these methods is a form of the general linear model. The basis of the general linear model is that “relationships among dependent and independent variables vary according to straight-line patterns” (Bohrnstedt [...]

[...] educational resources and support; the role of parents and peers in education; self-reports on smoking, alcohol and drug use, and extracurricular activities; and results of achievement tests in reading, social studies, mathematics and science” (United States Department of Education, National Educational Longitudinal Study of 1988, 2011, “Overview,” para. 1). We use data from the first and second waves of NELS:88 to demonstrate the statistical techniques described in the book.

## 2 Descriptive Statistical Methods

Descriptive statistical methods are used to summarize all of the data in an existing database into fewer numbers, making the data easier to visualize and understand. Faulkner and Faulkner (2009) define descriptive statistical methods as “ways of organizing, describing, and presenting quantitative (numerical) data in a manner that is concise, manageable, and understandable” (p. 155). Descriptive statistics utilize univariate statistical methods to examine and summarize data one variable at a time.
We can calculate numeric values that describe samples or populations. Numeric values that describe samples are called statistics, whereas numeric values that describe populations are called parameters. This chapter focuses on a review of the descriptive statistical methods commonly used in social work research. Before we turn to these individual methods, we will first look at the steps involved in defining the variables that will be used in a study, and in determining how and at what level these variables will be measured.
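The distinction between statistics and parameters drawn above can be made concrete with a short Python sketch. The data are hypothetical and the tiny "population" is an assumption made purely for illustration; in practice the full population is rarely observed, which is why parameters are usually estimated from sample statistics.

```python
import statistics

# Hypothetical population: test scores for all eight students in a small class.
population = [52, 55, 58, 60, 63, 66, 70, 72]

# Parameters: numeric values that describe the population.
mu = statistics.mean(population)
sigma_squared = statistics.pvariance(population)  # divides by N

# Hypothetical sample of four students drawn from that class.
sample = [55, 60, 66, 72]

# Statistics: numeric values that describe the sample.
x_bar = statistics.mean(sample)
s_squared = statistics.variance(sample)  # divides by n - 1

print(f"Population mean = {mu}, population variance = {sigma_squared}")
print(f"Sample mean = {x_bar}, sample variance = {s_squared}")
```

The two variance functions differ only in the divisor: the population variance divides by N, while the sample variance divides by n - 1 so that it is an unbiased estimate of the population variance.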