An Introduction to Stata Programming

Christopher F. Baum’s An Introduction to Stata Programming, Second Edition, is a great reference for anyone who wants to learn Stata programming.

Baum assumes readers have some familiarity with Stata, but readers who are new to programming will find the book accessible. He begins by introducing programming concepts and basic tools. More advanced tools, such as Mata structures and pointers and likelihood-function evaluators written in Mata, are introduced gradually throughout the book alongside examples.

This new edition reflects some of the most important statistical tools added since Stata 10. Of note are factor variables and operators; the computation of marginal effects, marginal means, and predictive margins using margins; the use of gmm to implement generalized method of moments estimation; and the use of suest for seemingly unrelated estimation.

As in the previous edition of the book, Baum steps the reader through the three levels of Stata programming. He starts with do-files. Do-files are powerful batch files that support loops and conditional statements. They are ideal for automating your workflow and for guaranteeing the reproducibility of your work.

He then delves into ado-files, which are used to extend Stata by creating new commands that share the syntax and behavior of official commands. Baum gives an example of how to write a command to calculate percentiles and the range of a variable, complete with documentation and certification.

After introducing the fundamentals of command development, Baum shows how these concepts can be applied to write custom estimation commands using Stata’s built-in routines: ml for numerical maximum-likelihood estimation, nl and nlsur for nonlinear least squares, and gmm for generalized method of moments estimation.

Finally, he introduces Mata, Stata’s matrix programming language. Mata programs are integrated into ado-files to build a custom estimation routine that is optimized for speed and numerical stability. Baum briefly discusses how ado-file programming concepts relate to Mata functions and objects, and he explains some of the advantages of using Mata for certain programming tasks.

Throughout the book, Baum introduces each topic by explaining its background and importance. He then presents common uses and examples, and he concludes with larger, more applied examples that he calls “cookbook recipes.”

Many of the examples are of particular interest because they arose from questions frequently asked by Stata users. If you want to understand basic Stata programming or want to write your own routines and commands using advanced Stata tools, Baum’s book is a great reference.

Table of contents
List of tables
List of figures
Acknowledgments
Notation and typography

1. WHY SHOULD YOU BECOME A STATA PROGRAMMER?

Do-file programming
Ado-file programming
Mata programming for ado-files

Plan of the book
Installing the necessary software

2. SOME ELEMENTARY CONCEPTS AND TOOLS

Introduction

What you should learn from this chapter

Navigational and organizational issues

The current working directory and profile.do
Locating important directories: sysdir and adopath
Organization of do-files, ado-files, and data files

Editing Stata do- and ado-files
Data types

Storing data efficiently: The compress command
Date and time handling
Time-series operators

Handling errors: The capture command
Protecting the data in memory: The preserve and restore commands
Getting your data into Stata

Inputting data from ASCII text files and spreadsheets

Handling text files
Free format versus fixed format
The insheet command
Accessing data stored in spreadsheets
Fixed-format data files

Importing data from other package formats

Guidelines for Stata do-file programming style

Basic guidelines for do-file writers
Enhancing speed and efficiency

How to seek help for Stata programming

3. DO-FILE PROGRAMMING: FUNCTIONS, MACROS, SCALARS, AND MATRICES

Introduction

What you should learn from this chapter

Some general programming details

The varlist
The numlist
The if exp and in range qualifiers
Missing data handling

Recoding missing values: The mvdecode and mvencode commands

String-to-numeric conversion and vice versa

Numeric-to-string conversion
Working with quoted strings

Functions for the generate command

Using if exp with indicator variables
The cond() function
Recoding discrete and continuous variables

Functions for the egen command

Official egen functions
egen functions from the user community

Computation for by-groups

Observation numbering: _n and _N

Local macros
Global macros
Extended macro functions and macro list functions

System parameters, settings, and constants: creturn

Scalars
Matrices

4. COOKBOOK: DO-FILE PROGRAMMING I

Tabulating a logical condition across a set of variables
Computing summary statistics over groups
Computing the extreme values of a sequence
Computing the length of spells
Summarizing group characteristics over observations
Using global macros to set up your environment
List manipulation with extended macro functions
Using creturn values to document your work

5. DO-FILE PROGRAMMING: VALIDATION, RESULTS, AND DATA MANAGEMENT

Introduction

What you should learn from this chapter

Data validation: The assert, count, and duplicates commands
Reusing computed results: The return and ereturn commands

The ereturn list command

Storing, saving, and using estimated results

Generating publication-quality tables from stored estimates

Reorganizing datasets with the reshape command
Combining datasets
Combining datasets with the append command
Combining datasets with the merge command

The dangers of many-to-many merges

Other data-management commands

The fillin command
The cross command
The stack command
The separate command
The joinby command
The xpose command

6. COOKBOOK: DO-FILE PROGRAMMING II

Efficiently defining group characteristics and subsets

Using a complicated criterion to select a subset of observations

Applying reshape repeatedly
Handling time-series data effectively
reshape to perform rowwise computation
Adding computed statistics to presentation-quality tables

Presenting marginal effects rather than coefficients

Generating time-series data at a lower frequency

7. DO-FILE PROGRAMMING: PREFIXES, LOOPS, AND LISTS

Introduction

What you should learn from this chapter

Prefix commands

The by prefix
The xi prefix
The statsby prefix
The rolling prefix
The simulate and permute prefixes
The bootstrap and jackknife prefixes
Other prefix commands

The forvalues and foreach commands

8. COOKBOOK: DO-FILE PROGRAMMING III

Handling parallel lists
Calculating moving-window summary statistics

Producing summary statistics with rolling and merge
Calculating moving-window correlations

Computing monthly statistics from daily data
Requiring at least n observations per panel unit
Counting the number of distinct values per individual

9. DO-FILE PROGRAMMING: OTHER TOPICS

Introduction

What you should learn from this chapter

Storing results in Stata matrices
The post and postfile commands
Output: The outsheet, outfile, and commands
Automating estimation output
Automating graphics
Characteristics

10. COOKBOOK: DO-FILE PROGRAMMING IV

Computing firm-level correlations with multiple indices
Computing marginal effects for graphical presentation
Automating the production of LaTeX tables
Tabulating downloads from the Statistical Software Components archive
Extracting data from graph files’ sersets
Constructing continuous price and returns series

11. ADO-FILE PROGRAMMING

Introduction

What you should learn from this chapter

The structure of a Stata program
The program statement
The syntax and return statements
Implementing program options
Including a subset of observations
Generalizing the command to handle multiple variables
Making commands byable

Program properties

Documenting your program
egen function programs
Writing an e-class program

Defining subprograms

Certifying your program
Programs for ml, nl, nlsur, simulate, bootstrap, and jackknife

Writing an ml-based command

Programs for the nl and nlsur commands
Programs for the simulate, bootstrap, and jackknife prefixes

Guidelines for Stata ado-file programming style

Presentation
Helpful Stata features
Respect for datasets
Speed and efficiency
Reminders
Style in the large
Use the best tools

12. COOKBOOK: ADO-FILE PROGRAMMING

Retrieving results from rolling:
Generalization of egen function pct9010() to support all pairs of quantiles
Constructing a certification script
Using the ml command to estimate means and variances

Applying equality constraints in ml estimation

Applying inequality constraints in ml estimation
Generating a dataset containing the single longest spell

13. MATA FUNCTIONS FOR ADO-FILE PROGRAMMING

Mata: First principles

What you should learn from this chapter

Mata fundamentals

Operators
Relational and logical operators
Subscripts
Populating matrix elements
Mata loop commands
Conditional statements

Function components

Arguments
Variables
Saved results

Calling Mata functions
Mata st_ interface functions

Data access
Access to locals, globals, scalars, and matrices
Access to Stata variables’ attributes

Example: st_ interface function usage
Example: Matrix operations

Extending the command

Creating arrays of temporary objects with pointers
Structures
Additional Mata features

Macros in Mata functions
Compiling Mata functions
Building and maintaining an object library
A useful collection of Mata routines

14. COOKBOOK: MATA FUNCTION PROGRAMMING

Reversing the rows or columns of a Stata matrix
Shuffling the elements of a string variable
Firm-level correlations with multiple indices with Mata
Passing a function to a Mata function
Using subviews in Mata
Storing and retrieving country-level data with Mata structures
Locating nearest neighbors with Mata
Computing the seemingly unrelated regression estimator
GMM-CUE estimator using Mata’s optimize() functions

Author: Christopher F. Baum
Edition: Second Edition
ISBN-13: 978-1-59718-150-1
©Copyright: 2016
e-Book version available