TStat - O F F I C I A L

Data Management Using Stata: A Practical Handbook

by Michael N. Mitchell

Michael N. Mitchell’s Data Management Using Stata comprehensively covers data-management tasks, from those a beginning statistician would need to those hard-to-verbalize tasks that can confound an experienced user. Mitchell does this all in simple language with illustrative examples.

The book is modular in structure, with modules based on data-management tasks rather than on clusters of commands. This format is helpful because it allows readers to find and read just what they need to solve the problem at hand. To complement this format, the book is written in a style that teaches good data-management habits even to sporadic readers and to those who read chapters out of order.

Throughout the book, Mitchell subtly emphasizes the absolute necessity of reproducibility and an audit trail. Instead of stressing programming esoterica, Mitchell reinforces simple habits and points out the time saved by being careful. Mitchell’s experience in UCLA’s Academic Technology Services clearly drives much of his advice.

Mitchell includes advice for those who would like to learn to write their own data-management Stata commands. Even experienced users will learn new tricks and new ways to approach data-management problems.

This is a great book—thoroughly recommended for anyone interested in data management using Stata.

Table of contents

Acknowledgements

List of tables

List of figures

Preface

1 Introduction

1.1 Using this book
1.2 Overview of this book
1.3 Listing observations in this book

2 Reading and writing datasets

2.1 Introduction
2.2 Reading Stata datasets
2.3 Saving Stata datasets
2.4 Reading comma-separated and tab-separated files
2.5 Reading space-separated files
2.6 Reading fixed-column files
2.7 Reading fixed-column files with multiple lines of raw data per observation
2.8 Reading SAS XPORT files
2.9 Common errors reading files
2.10 Entering data directly into the Stata Data Editor
2.11 Saving comma-separated and tab-separated files
2.12 Saving space-separated files
2.13 Saving SAS XPORT files

3 Data cleaning

3.1 Introduction
3.2 Double data entry
3.3 Checking individual variables
3.4 Checking categorical by categorical variables
3.5 Checking categorical by continuous variables
3.6 Checking continuous by continuous variables
3.7 Correcting errors in data
3.8 Identifying duplicates
3.9 Final thoughts on data cleaning

4 Labeling datasets

4.1 Introduction
4.2 Describing datasets
4.3 Labeling variables
4.4 Labeling values
4.5 Labeling utilities
4.6 Labeling variables and values in different languages
4.7 Adding comments to your dataset using notes
4.8 Formatting the display of variables
4.9 Changing the order of variables in a dataset

5 Creating variables

5.1 Introduction
5.2 Creating and changing variables
5.3 Numeric expressions and functions
5.4 String expressions and functions
5.5 Recoding
5.6 Coding missing values
5.7 Dummy variables
5.8 Date variables
5.9 Date-and-time variables
5.10 Computations across variables
5.11 Computations across observations
5.12 More examples using the egen command
5.13 Converting string variables to numeric variables
5.14 Converting numeric variables to string variables
5.15 Renaming and ordering variables

6 Combining datasets

6.1 Introduction
6.2 Appending: Appending datasets
6.3 Appending: Problems
6.4 Merging: One-to-one match-merging
6.5 Merging: One-to-many match-merging
6.6 Merging: Merging multiple datasets
6.7 Merging: Update merges
6.8 Merging: Additional options when merging datasets
6.9 Merging: Problems merging datasets
6.10 Joining datasets
6.11 Crossing datasets

7 Processing observations across subgroups

7.1 Introduction
7.2 Obtaining separate results for subgroups
7.3 Computing values separately by subgroups
7.4 Computing values within subgroups: Subscripting observations
7.5 Computing values within subgroups: Computations across observations
7.6 Computing values within subgroups: Running sums
7.7 Computing values within subgroups: More examples
7.8 Comparing the by and tsset commands

8 Changing the shape of your data

8.1 Introduction
8.2 Wide and long datasets
8.3 Introduction to reshaping long to wide
8.4 Reshaping long to wide: Problems
8.5 Introduction to reshaping wide to long
8.6 Reshaping wide to long: Problems
8.7 Multilevel datasets
8.8 Collapsing datasets

9 Programming for data management

9.1 Introduction
9.2 Tips on long-term goals in data management
9.3 Executing do-files and making log files
9.4 Automating data checking
9.5 Combining do-files
9.6 Introducing Stata macros
9.7 Manipulating Stata macros
9.8 Repeating commands by looping over variables
9.9 Repeating commands by looping over numbers
9.10 Repeating commands by looping over anything
9.11 Accessing results saved from Stata commands
9.12 Saving results of estimation commands as data
9.13 Writing Stata programs

10 Additional resources

10.1 Online resources for this book
10.2 Finding and installing additional programs
10.3 More online resources

A Common elements

A.1 Introduction
A.2 Overview of Stata syntax
A.3 Working across groups of observations with by
A.4 Comments
A.5 Data types
A.6 Logical expressions
A.7 Functions
A.8 Subsetting observations with if and in
A.9 Subsetting observations and variables with keep and drop
A.10 Missing values
A.11 Referring to variable lists

Subject Index
