Christopher F. Baum’s An Introduction to Stata Programming, Second Edition, is a great reference for anyone who wants to learn Stata programming.
Baum assumes readers have some familiarity with Stata, but readers who are new to programming will find the book accessible. He begins by introducing programming concepts and basic tools. More advanced programming tools, such as structures, pointers, and likelihood-function evaluators using Mata, are introduced gradually throughout the book alongside examples.
This new edition reflects some of the most important statistical tools added since Stata 10. Of note are factor variables and operators; the computation of marginal effects, marginal means, and predictive margins using margins; the use of gmm to implement generalized method of moments estimation; and the use of suest for seemingly unrelated estimation.
As in the previous edition of the book, Baum steps the reader through the three levels of Stata programming. He starts with do-files. Do-files are powerful batch files that support loops and conditional statements; they are ideal for automating your workflow and for guaranteeing the reproducibility of your work.
He then delves into ado-files, which are used to extend Stata by creating new commands that share the syntax and behavior of official commands. Baum gives an example of how to write a command to calculate percentiles and the range of a variable, complete with documentation and certification.
After introducing the fundamentals of command development, Baum shows users how these concepts can be applied to help them write their own custom estimation commands by using Stata’s built-in numerical maximum-likelihood estimation routine, ml; its built-in nonlinear least-squares routines, nl and nlsur; and its built-in generalized method of moments estimation routine, gmm.
Finally, he introduces Mata, Stata’s matrix programming language. Mata programs are integrated into ado-files to build a custom estimation routine that is optimized for speed and numerical stability. Baum briefly discusses how ado-file programming concepts relate to Mata functions and objects. He also explains some of the advantages of using Mata for certain programming tasks.
Baum introduces each concept by providing the background and importance of the topic, then presents common uses and examples, and concludes with larger, more applied examples he refers to as “cookbook recipes”.
Many of the examples are of particular interest because they arose from frequently asked questions from Stata users. If you want to understand basic Stata programming or want to write your own routines and commands using advanced Stata tools, Baum’s book is a great reference.
Table of contents
List of tables
List of figures
Acknowledgments
Notation and typography
1. WHY SHOULD YOU BECOME A STATA PROGRAMMER?
Do-file programming
Ado-file programming
Mata programming for ado-files
Plan of the book
Installing the necessary software
2. SOME ELEMENTARY CONCEPTS AND TOOLS
Introduction
What you should learn from this chapter
Navigational and organizational issues
The current working directory and profile.do
Locating important directories: sysdir and adopath
Organization of do-files, ado-files, and data files
Editing Stata do- and ado-files
Data types
Storing data efficiently: The compress command
Date and time handling
Time-series operators
Handling errors: The capture command
Protecting the data in memory: The preserve and restore commands
Getting your data into Stata
Inputting data from ASCII text files and spreadsheets
Handling text files
Free format versus fixed format
The insheet command
Accessing data stored in spreadsheets
Fixed-format data files
Importing data from other package formats
Guidelines for Stata do-file programming style
Basic guidelines for do-file writers
Enhancing speed and efficiency
How to seek help for Stata programming
3. DO-FILE PROGRAMMING: FUNCTIONS, MACROS, SCALARS, AND MATRICES
Introduction
What you should learn from this chapter
Some general programming details
The varlist
The numlist
The if exp and in range qualifiers
Missing data handling
Recoding missing values: The mvdecode and mvencode commands
String-to-numeric conversion and vice versa
Numeric-to-string conversion
Working with quoted strings
Functions for the generate command
Using if exp with indicator variables
The cond() function
Recoding discrete and continuous variables
Functions for the egen command
Official egen functions
egen functions from the user community
Computation for by-groups
Observation numbering: _n and _N
Local macros
Global macros
Extended macro functions and macro list functions
System parameters, settings, and constants: creturn
Scalars
Matrices
4. COOKBOOK: DO-FILE PROGRAMMING I
Tabulating a logical condition across a set of variables
Computing summary statistics over groups
Computing the extreme values of a sequence
Computing the length of spells
Summarizing group characteristics over observations
Using global macros to set up your environment
List manipulation with extended macro functions
Using creturn values to document your work
5. DO-FILE PROGRAMMING: VALIDATION, RESULTS, AND DATA MANAGEMENT
Introduction
What you should learn from this chapter
Data validation: The assert, count, and duplicates commands
Reusing computed results: The return and ereturn commands
The ereturn list command
Storing, saving, and using estimated results
Generating publication-quality tables from stored estimates
Reorganizing datasets with the reshape command
Combining datasets
Combining datasets with the append command
Combining datasets with the merge command
The dangers of many-to-many merges
Other data-management commands
The fillin command
The cross command
The stack command
The separate command
The joinby command
The xpose command
6. COOKBOOK: DO-FILE PROGRAMMING II
Efficiently defining group characteristics and subsets
Using a complicated criterion to select a subset of observations
Applying reshape repeatedly
Handling time-series data effectively
reshape to perform rowwise computation
Adding computed statistics to presentation-quality tables
Presenting marginal effects rather than coefficients
Generating time-series data at a lower frequency
7. DO-FILE PROGRAMMING: PREFIXES, LOOPS, AND LISTS
Introduction
What you should learn from this chapter
Prefix commands
The by prefix
The xi prefix
The statsby prefix
The rolling prefix
The simulate and permute prefixes
The bootstrap and jackknife prefixes
Other prefix commands
The forvalues and foreach commands
8. COOKBOOK: DO-FILE PROGRAMMING III
Handling parallel lists
Calculating moving-window summary statistics
Producing summary statistics with rolling and merge
Calculating moving-window correlations
Computing monthly statistics from daily data
Requiring at least n observations per panel unit
Counting the number of distinct values per individual
9. DO-FILE PROGRAMMING: OTHER TOPICS
Introduction
What you should learn from this chapter
Storing results in Stata matrices
The post and postfile commands
Output: The outsheet, outfile, and file commands
Automating estimation output
Automating graphics
Characteristics
10. COOKBOOK: DO-FILE PROGRAMMING IV
Computing firm-level correlations with multiple indices
Computing marginal effects for graphical presentation
Automating the production of LaTeX tables
Tabulating downloads from the Statistical Software Components archive
Extracting data from graph files’ sersets
Constructing continuous price and returns series
11. ADO-FILE PROGRAMMING
Introduction
What you should learn from this chapter
The structure of a Stata program
The program statement
The syntax and return statements
Implementing program options
Including a subset of observations
Generalizing the command to handle multiple variables
Making commands byable
Program properties
Documenting your program
egen function programs
Writing an e-class program
Defining subprograms
Certifying your program
Programs for ml, nl, nlsur, simulate, bootstrap, and jackknife
Writing an ml-based command
Programs for the nl and nlsur commands
Programs for the simulate, bootstrap, and jackknife prefixes
Guidelines for Stata ado-file programming style
Presentation
Helpful Stata features
Respect for datasets
Speed and efficiency
Reminders
Style in the large
Use the best tools
12. COOKBOOK: ADO-FILE PROGRAMMING
Retrieving results from rolling
Generalization of egen function pct9010() to support all pairs of quantiles
Constructing a certification script
Using the ml command to estimate means and variances
Applying equality constraints in ml estimation
Applying inequality constraints in ml estimation
Generating a dataset containing the single longest spell
13. MATA FUNCTIONS FOR ADO-FILE PROGRAMMING
Mata: First principles
What you should learn from this chapter
Mata fundamentals
Operators
Relational and logical operators
Subscripts
Populating matrix elements
Mata loop commands
Conditional statements
Function components
Arguments
Variables
Saved results
Calling Mata functions
Mata st_ interface functions
Data access
Access to locals, globals, scalars, and matrices
Access to Stata variables’ attributes
Example: st_ interface function usage
Example: Matrix operations
Extending the command
Creating arrays of temporary objects with pointers
Structures
Additional Mata features
Macros in Mata functions
Compiling Mata functions
Building and maintaining an object library
A useful collection of Mata routines
14. COOKBOOK: MATA FUNCTION PROGRAMMING
Reversing the rows or columns of a Stata matrix
Shuffling the elements of a string variable
Firm-level correlations with multiple indices with Mata
Passing a function to a Mata function
Using subviews in Mata
Storing and retrieving country-level data with Mata structures
Locating nearest neighbors with Mata
Computing the seemingly unrelated regression estimator
GMM-CUE estimator using Mata’s optimize() functions