Alan C. Acock’s A Gentle Introduction to Stata, Revised Sixth Edition is aimed at new Stata users who want to become proficient in Stata. After reading this introductory text, new users will be able not only to use Stata well but also to learn new aspects of Stata.
Acock assumes that the user is not familiar with any statistical software. This assumption of a blank slate is central to the structure and contents of the book. Acock starts with the basics; for example, the part of the book that deals with data management begins with a careful and detailed example of turning survey data on paper into a Stata-ready dataset. When explaining how to go about basic exploratory statistical procedures, Acock includes notes that will help the reader develop good work habits. This mixture of explaining good Stata habits and explaining good statistical habits continues throughout the book.
Acock is quite careful to teach the reader all aspects of using Stata. He covers data management, good work habits (including the use of basic do-files), basic exploratory statistics (including graphical displays), and analyses using the standard array of basic statistical tools (correlation, linear and logistic regression, and parametric and nonparametric tests of location and dispersion). He also successfully introduces some more advanced topics such as multiple imputation and multilevel modeling in a very approachable manner. Acock teaches Stata commands by using the menus and dialog boxes while still stressing the value of Stata commands and do-files. In this way, he ensures that all types of users can build good work habits. Each chapter has exercises that the motivated reader can use to reinforce the material.
The tone of the book is friendly and conversational without ever being glib or condescending. Important asides and notes about terminology are set off in boxes, which makes the text easy to read without any convoluted twists or forward referencing. Rather than splitting topics by their Stata implementation, Acock arranges the topics as they would appear in a basic statistics textbook; graphics and postestimation are woven into the material naturally. Real datasets, such as the General Social Surveys from 2002, 2006, and 2016, are used throughout the book.
The focus of the book is especially helpful for those in the behavioral and social sciences because the presentation of basic statistical modeling is supplemented with discussions of effect sizes and standardized coefficients. Criteria for model selection, such as semipartial correlations, are also discussed. Acock also covers a variety of commands available for evaluating the reliability and validity of measurements.
The revised sixth edition is fully up to date for Stata 17, including updated discussion and images of Stata’s interface and modern command syntax. In addition, examples include new features such as the table command and collect suite for creating and exporting customized tables as well as the option for creating graphs with transparency.
© Copyright 1996–2022 StataCorp LLC
List of figures
List of tables
List of boxed tips
Preface
Support materials for the book
1. GETTING STARTED
Conventions
Introduction
The Stata screen
Using an existing dataset
An example of a short Stata session
Video aids to learning Stata
Summary
Exercises
2. ENTERING DATA
Creating a dataset
An example questionnaire
Developing a coding system
Entering data using the Data Editor
Value labels
The Variables Manager
The Data Editor (Browse) view
Saving your dataset
Checking the data
Summary
Exercises
3. PREPARING DATA FOR ANALYSIS
Introduction
Planning your work
Creating value labels
Reverse-code variables
Creating and modifying variables
Creating scales
Save some of your data
Summary
Exercises
4. WORKING WITH COMMANDS, DO-FILES, AND RESULTS
Introduction
How Stata commands are constructed
Creating a do-file
Copying your results to a word processor
Logging your command file
Summary
Exercises
5. DESCRIPTIVE STATISTICS AND GRAPHS FOR ONE VARIABLE
Descriptive statistics and graphs
Where is the center of a distribution?
How dispersed is the distribution?
Statistics and graphs—unordered categories
Statistics and graphs—ordered categories and variables
Statistics and graphs—quantitative variables
Summary
Exercises
6. STATISTICS AND GRAPHS FOR TWO CATEGORICAL VARIABLES
Relationship between categorical variables
Cross-tabulation
Chi-squared test
Degrees of freedom
Probability tables
Percentages and measures of association
Odds ratios when dependent variable has two categories
Ordered categorical variables
Interactive tables
Tables—linking categorical and quantitative variables
Power analysis when using a chi-squared test of significance
Summary
Exercises
7. TESTS FOR ONE OR TWO MEANS
Introduction to tests for one or two means
Randomization
Random sampling
Hypotheses
One-sample test of a proportion
Two-sample test of a proportion
One-sample test of means
Two-sample test of group means
Testing for unequal variances
Repeated-measures t test
Power analysis
Nonparametric alternatives
Mann–Whitney two-sample rank-sum test
Nonparametric alternative: Median test
Video tutorial related to this chapter
Summary
Exercises
8. BIVARIATE CORRELATION AND REGRESSION
Introduction to bivariate correlation and regression
Scattergrams
Plotting the regression line
An alternative to producing a scattergram, binscatter
Correlation
Regression
Spearman’s rho: Rank-order correlation for ordinal data
Power analysis with correlation
Summary
Exercises
9. ANALYSIS OF VARIANCE
The logic of one-way analysis of variance
ANOVA example
ANOVA example with nonexperimental data
Power analysis for one-way ANOVA
A nonparametric alternative to ANOVA
Analysis of covariance
Two-way ANOVA
Repeated-measures design
Intraclass correlation—measuring agreement
Power analysis with ANOVA
Power analysis for one-way ANOVA
Power analysis for two-way ANOVA
Power analysis for repeated-measures ANOVA
Summary of power analysis for ANOVA
Summary
Exercises
10. MULTIPLE REGRESSION
Introduction to multiple regression
What is multiple regression?
The basic multiple regression command
Increment in R-squared: Semipartial correlations
Is the dependent variable normally distributed?
Are the residuals normally distributed?
Regression diagnostic statistics
Outliers and influential cases
Influential observations: DFbeta
Combinations of variables may cause problems
Weighted data
Categorical predictors and hierarchical regression
A shortcut for working with a categorical variable
Fundamentals of interaction
Nonlinear relations
Fitting a quadratic model
Centering when using a quadratic term
Do we need to add a quadratic component?
Power analysis in multiple regression
Summary
Exercises
11. LOGISTIC REGRESSION
Introduction to logistic regression
An example
What is an odds ratio and a logit?
The odds ratio
The logit transformation
Data used in the rest of the chapter
Logistic regression
Hypothesis testing
Testing individual coefficients
Testing sets of coefficients
Margins: More on interpreting results from logistic regression
Nested logistic regressions
Power analysis when doing logistic regression
Next steps for using logistic regression and its extensions
Summary
Exercises
12. MEASUREMENT, RELIABILITY, AND VALIDITY
Overview of reliability and validity
Constructing a scale
Generating a mean score for each person
Reliability
Stability and test–retest reliability
Equivalence
Split-half and alpha reliability—internal consistency
Kuder–Richardson reliability for dichotomous items
Rater agreement—kappa (K)
Validity
Expert judgment
Criterion-related validity
Construct validity
Factor analysis
PCF analysis
Orthogonal rotation: Varimax
Oblique rotation: Promax
But we wanted one scale, not four scales
Scoring our variable
Summary
Exercises
13. STRUCTURAL EQUATION AND GENERALIZED STRUCTURAL EQUATION MODELING
Linear regression using sem
Using the sem command directly
SEM and working with missing values
Exploring missing values and auxiliary variables
Getting auxiliary variables into your SEM command
A quick way to draw a regression model
The gsem command for logistic regression
Fitting the model using the logit command
Fitting the model using the gsem command
Path analysis and mediation
Conclusions and what is next for the sem command
Exercises
14. WORKING WITH MISSING VALUES – MULTIPLE IMPUTATION
What variables do we include when doing imputations?
The nature of the problem
Multiple imputation and its assumptions about the mechanism for missingness
Multiple imputation
Setup and multiple-imputation stage
The analysis stage
For those who want an R² and standardized βs
When impossible values are imputed
Summary
Exercises
15. AN INTRODUCTION TO MULTILEVEL ANALYSIS
Questions and data for groups of individuals
Questions and data for a longitudinal multilevel application
Fixed-effects regression models
Random-effects regression models
An applied example
Research questions
Reshaping data to do multilevel analysis
Random-intercept model
Random-intercept model—quadratic term
Treating time as a categorical variable
Including a time-invariant covariate
Summary
Exercises
16. ITEM RESPONSE THEORY (IRT)
Overview of three IRT models for dichotomous items
The one-parameter logistic (1PL) model
The two-parameter logistic (2PL) model
The three-parameter logistic (3PL) model
Fitting the 1PL model using Stata
The estimation
How important is each of the items?
An overall evaluation of our scale
Estimating the latent score
Fitting a 2PL IRT model
Fitting the 2PL model
The graded response model—IRT for Likert-type items
The data
Fitting our graded response model
Estimating a person’s score
Reliability of the fitted IRT model
Using the Stata menu system
Extensions of IRT
Exercises
A. WHAT’S NEXT?
Introduction to the appendix
Resources
Web resources
Books about Stata
Short courses
Acquiring data
Learning from the postestimation methods
Summary