Michael N. Mitchell’s Data Management Using Stata: A Practical Handbook, Second Edition comprehensively covers data management tasks, from those a beginning statistician would need to those hard-to-verbalize tasks that can confound an experienced user. Mitchell does this all in simple language with illustrative examples.
The book is modular in structure, with modules based on data management tasks rather than on clusters of commands. This format is helpful because it allows readers to find just what they need to solve the problem at hand. To complement this format, the book is written in a style that teaches good data management habits even to sporadic readers and to those who read chapters out of order.
Throughout the book, Mitchell subtly emphasizes the absolute necessity of reproducibility and an audit trail. Instead of stressing programming esoterica, Mitchell reinforces simple habits and points out the time savings gained by being careful. Mitchell’s experience in UCLA’s Academic Technology Services clearly drives much of his advice.
The second edition brings updates for features added to Stata since version 10: reading and writing Microsoft Excel files, working with Unicode properly, and using frames. Mitchell also added a chapter showing how to build your own utility programs to simplify and automate routine tasks, easing code maintenance and aiding uniformity across projects.
New users will learn everything they need to import, clean, and prepare data for first analyses in Stata. Even experienced users will learn new tricks and new ways to approach data management problems.
This is a great book, thoroughly recommended for anyone interested in data management using Stata.
Acknowledgements
List of tables
List of figures
Preface to the Second Edition
1. INTRODUCTION
Using this book
Overview of this book
Listing observations in this book
More online resources
2. READING AND WRITING DATASETS
Introduction
Reading Stata datasets
Importing Excel spreadsheets
Importing SAS files
Importing SAS XPORT Version 5 files
Importing SAS XPORT Version 8 files
Importing SPSS files
Importing dBase files
Importing raw data files
Importing space-separated files
Importing fixed-column files
Importing fixed-column files with multiple lines of raw data per observation
Common errors when reading and importing files
Entering data directly into the Stata Data Editor
3. SAVING AND EXPORTING DATA FILES
Introduction
Saving Stata datasets
Exporting Excel files
Exporting SAS XPORT Version 8 files
Exporting SAS XPORT Version 5 files
Exporting dBase files
Exporting comma-separated and tab-separated files
Exporting space-separated files
Exporting Excel files revisited: Creating reports
4. DATA CLEANING
Introduction
Double data entry
Checking individual variables
Checking categorical by categorical variables
Checking categorical by continuous variables
Checking continuous by continuous variables
Correcting errors in data
Identifying duplicates
Final thoughts on data cleaning
5. LABELING DATASETS
Introduction
Describing datasets
Labeling variables
Labeling values
Labeling utilities
Labeling variables and values in different languages
Adding comments to your dataset using notes
Formatting the display of variables
Changing the order of variables in a dataset
6. CREATING VARIABLES
Introduction
Creating and changing variables
Numeric expressions and functions
String expressions and functions
Recoding
Coding missing values
Dummy variables
Date variables
Date-and-time variables
Computations across variables
Computations across observations
More examples using the egen command
Converting string variables to numeric variables
Converting numeric variables to string variables
Renaming and ordering variables
7. COMBINING DATASETS
Introduction
Appending: Appending datasets
Appending: Problems
Merging: One-to-one match merging
Merging: One-to-many match merging
Merging: Merging multiple datasets
Merging: Update merges
Merging: Additional options when merging datasets
Merging: Problems merging datasets
Joining datasets
Crossing datasets
8. PROCESSING OBSERVATIONS ACROSS SUBGROUPS
Introduction
Obtaining separate results for subgroups
Computing values separately by subgroups
Computing values within subgroups: Subscripting observations
Computing values within subgroups: Computations across observations
Computing values within subgroups: Running sums
Computing values within subgroups: More examples
Comparing the by and tsset commands
9. CHANGING THE SHAPE OF YOUR DATA
Introduction
Wide and long datasets
Introduction to reshaping long to wide
Reshaping long to wide: Problems
Introduction to reshaping wide to long
Reshaping wide to long: Problems
Multilevel datasets
Collapsing datasets
10. PROGRAMMING FOR DATA MANAGEMENT: PART 1
Introduction
Tips on long-term goals in data management
Executing do-files and making log files
Automating data checking
Combining do-files
Introducing Stata macros
Manipulating Stata macros
Repeating commands by looping over variables
Repeating commands by looping over numbers
Repeating commands by looping over anything
Accessing results stored from Stata commands
11. PROGRAMMING FOR DATA MANAGEMENT: PART 2
Writing Stata programs for data management
Program 1: hello
Where to save your Stata programs
Program 2: Multilevel counting
Program 3: Tabulations in list format
Program 4: Scoring the simple depression scale
Program 5: Standardizing variables
Program 6: Checking variable labels
Program 7: Checking value labels
Program 8: Customized describe command
Program 9: Customized summarize command
Program 10: Checking for unlabeled values
Tips on debugging Stata programs
Final thoughts: Writing Stata programs for data management
A. COMMON ELEMENTS
Introduction
Overview of Stata syntax
Working across groups of observations with by
Comments
Data types
Logical expressions
Functions
Subsetting observations with if and in
Subsetting observations and variables with keep and drop
Missing values
Referring to variable lists
Frames
Frames example 2: Juggling related tasks
Frames example 3: Checking double data entry