Economic Problem Sets Using “Stata”

Directions: Following each question, type or handwrite your answers and copy/paste the Stata output when needed (use the ‘copy as picture’

option). Staple all pages together — assignments turned in unstapled will be returned with a grade of zero. (Only stapling is acceptable —

paper clips and other methods of binding are not acceptable.) Also, if we cannot discern the meaning of your work, your response will be

assumed wrong.

Problem 1 — the difference-in-differences estimator (6 points total)

In April 1980, Fidel Castro announced that Cubans who wanted to leave Cuba for the United States could do so from the port of Mariel,

Cuba. Within 6 months, about 125,000 (mainly) low-skill Cubans had flowed through Mariel for Miami, resulting in a sudden 7% increase in the labor

force of Miami. David Card (Industrial and Labor Relations Review 1990) examined the effect of the Mariel boat lift on wages of low-skill

workers in Miami.

For African-Americans in Miami (the “treated” labor market) the unemployment rate was 8.3% in 1979 (before the boat lift) and 9.6% in 1981 (after the

boat lift). For African-Americans in cities similar to Miami that did not experience a big influx of low-skill immigration (Atlanta,

Houston, and Los Angeles — the “control” labor markets) the unemployment rate was 10.3% in 1979 (before the boat lift) and 12.6% in 1981

(after the boat lift).

1.1. (2 points) If we assume that, without the Mariel boat lift, the unemployment rate of African-Americans in Miami would have followed

the same trend as African-Americans in the comparison cities, what would have happened to the black unemployment rate in Miami between

1979 and 1981? That is, what would the 1981 black unemployment rate in Miami have been?

1.2. (2 points) Based on your answer to 1.1, what is the difference-in-differences estimate of the effect of the Mariel boat lift on

the black unemployment rate in Miami?

1.3. (2 points) Is your answer in 1.2 what you expected? Why or why not? If not, can you think of any possible explanations for the

anomaly?
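
The comparison in 1.1 and 1.2 follows the standard difference-in-differences logic. As a sketch in generic notation (the symbols are introduced here for illustration and are not taken from Card's paper), let $u_{T,0}$ and $u_{T,1}$ be the unemployment rate in the treated labor market before and after the event, and $u_{C,0}$ and $u_{C,1}$ the corresponding rates in the control labor markets. Under the common-trend assumption:

```latex
\begin{aligned}
\hat{u}^{\,cf}_{T,1} &= u_{T,0} + \left(u_{C,1} - u_{C,0}\right) \\
\hat{\delta}_{DD} &= \left(u_{T,1} - u_{T,0}\right) - \left(u_{C,1} - u_{C,0}\right) = u_{T,1} - \hat{u}^{\,cf}_{T,1}
\end{aligned}
```

Here $\hat{u}^{\,cf}_{T,1}$ is the counterfactual rate the treated market would have had without the event, and $\hat{\delta}_{DD}$ is the estimated effect of the event.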

Problem 2 — the fixed effects estimator (10 points total)

This problem introduces you to the use of panel data and the fixed effects estimator. It is based on Computer Exercise C14.8 in

Wooldridge. Before you start, you should work through Computer Exercise C13.11 in Wooldridge; the solution is in <25-panel data-rev.pdf>

posted on D2L.

Use the data in MATHPNL(sm).dta for this exercise. You will do a fixed effects version of the first differencing done in Computer

Exercise C13.11. The model of interest is:

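$$\text{math4}_{it} = \delta_1 y94_t + \dots + \delta_5 y98_t + \gamma_1 \log(\text{rxpp}_{it}) + \gamma_2 \log(\text{rxpp}_{i,t-1}) + \psi_2 \log(\text{enrol}_{it}) + \psi_3 \text{lunch}_{it} + a_i + u_{it}$$

where the first available year (the base year) is 1993 because of the lagged spending variable. (Remember that the earlier model was:

$$\text{math4}_{it} = \delta_1 y93_t + \dots + \delta_6 y98_t + \beta_1 \log(\text{rxpp}_{it}) + \beta_2 \log(\text{enrol}_{it}) + \beta_3 \text{lunch}_{it} + a_i + u_{it}$$

)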
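
For reference, a sketch of what the fixed effects (within) estimator does to a model of this form. The notation is generic and introduced here for illustration: $y_{it}$ is the outcome, $x_{it}$ the row vector of time-varying regressors (including the year dummies), and bars denote averages over time for district $i$.

```latex
\begin{aligned}
y_{it} &= x_{it}\beta + a_i + u_{it} \\
\bar{y}_i &= \bar{x}_i\beta + a_i + \bar{u}_i \\
y_{it} - \bar{y}_i &= \left(x_{it} - \bar{x}_i\right)\beta + \left(u_{it} - \bar{u}_i\right)
\end{aligned}
```

Subtracting the district average removes $a_i$, so pooled OLS on the demeaned data gives the fixed effects estimates. Variables that change little over time within a district retain little variation after demeaning.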

2.1. (2 points) Estimate the model by pooled OLS and report the estimates. You should include an intercept along with the year dummies to

allow $a_i$ to have a nonzero expected value. What are the estimated effects of the spending variables, both contemporaneous spending

($\text{rxpp}_{it}$) and lagged spending ($\text{rxpp}_{i,t-1}$)?

2.2. (2 points) Interpret the coefficient on the $\text{lunch}_{it}$ variable. Is the sign of the coefficient what you expected? Would you say that

the district poverty rate has a big effect on test pass rates?

2.3. (2 points) Now, estimate the equation by fixed effects. (Hint: type xtreg math4 y94 y95 y96 y97 y98 lrexpp lrexpp_1 lenrol lunch,

fe.) Is the lagged spending variable still significant?
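
Note that xtreg expects the panel structure to be declared first. A minimal sketch of the full sequence, assuming the district identifier in the data set is named distid and the time variable is named year (both names are assumptions; confirm them with describe before running):

```stata
* Assumed names: distid (district identifier) and year (time variable).
* Check the actual names in your copy of the data with -describe-.
use "MATHPNL(sm).dta", clear
describe

* Declare the panel structure: unit identifier first, then the time variable.
xtset distid year

* Fixed effects regression from the hint in 2.3.
xtreg math4 y94 y95 y96 y97 y98 lrexpp lrexpp_1 lenrol lunch, fe
```

If the panel variable has not been set, xtreg stops with an error instead of estimating the model.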

2.4. (2 points) Do an F test for the joint significance of the enrollment and lunch program variables and report the results. (Hint: type

test lenrol lunch.) Why, when you use fixed effects, do you think the enrollment and lunch program variables are jointly insignificant?

2.5. (2 points) Define the total, or long-run, effect of spending as $\theta_1 = \gamma_1 + \gamma_2$. What are $\theta_1$ and the standard error on $\theta_1$? (Hint: This

is easy to do in Stata using lincom. Type lincom lrexpp + lrexpp_1.)
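
For reference, lincom reports the standard error of a linear combination of coefficients, which depends on the estimated covariance between the coefficients as well as on their variances. A sketch of the formula for the sum of the two spending coefficients, in the notation of the model above:

```latex
\operatorname{se}(\hat{\theta}_1) = \sqrt{\widehat{\operatorname{Var}}(\hat{\gamma}_1) + \widehat{\operatorname{Var}}(\hat{\gamma}_2) + 2\,\widehat{\operatorname{Cov}}(\hat{\gamma}_1, \hat{\gamma}_2)}
```

This is why the standard error of the sum cannot be obtained by simply adding the two reported standard errors.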

Problem 3 — basic time series estimation (13 points total)

This problem introduces you to regression with time series data and is based on problem C10.11 in Wooldridge. It uses the Stata file

TRAFFIC.dta, which contains monthly data on car accidents and traffic laws from California from January 1981 to December 1989. It includes

the following variables:

totacc     number of total accidents in California
year       1981 to 1989
fatacc     number of fatal accidents in California
prcfat     percent fatal accidents: 100 × (fatacc/totacc)
unem       state unemployment rate
spdlaw     = 1 after 65 mph speed limit in effect
beltlaw    = 1 after seatbelt law
wkends     number of weekends in month
feb        = 1 if month is February
…
dec        = 1 if month is December

3.1. (1 point) During what month and year did the highway speed limit increase to 65 miles per hour (from 55 miles per hour)? (Hint: type

browse year spdlaw feb mar apr may jun jul aug sep oct nov dec to view these variables. Note that the month of January is excluded.)

3.2. (2 points) Run a regression of total accidents (totacc) on the 11 monthly dummies

(excluding January) and the variable year. Would you say there is seasonality in total accidents? Why or why not? (Hint: type reg totacc

year feb mar apr may jun jul aug sep oct nov dec.)

3.3. (2 points) Add the variable unemployment (unem) to the regression in 3.2. Does the sign of the slope on unemployment make sense to

you? Is it statistically significant at the 1% level?

3.4. (2 points) Next, investigate whether introducing the 65 mph speed limit increased traffic accidents in California. Begin by running

a regression of the number of total accidents (totacc) on the 11 monthly dummies, unemployment, and speed law (spdlaw). Interpret the

coefficient on speed law. Does its sign make sense?

3.5. (2 points) Now add the variable year to the regression from part 3.4. Interpret the coefficient on spdlaw. Is it statistically

significant at the 5% level?

3.6. (2 points) Why did the OLS estimate of the coefficient on spdlaw in 3.5 decrease once you added year to the regression?

3.7. (2 points) Finally, run the same regression as in 3.5, but now use prcfat (percent of fatal accidents) as the dependent variable

instead of totacc. (In other words, regress prcfat on the 11 monthly dummies, year, unemployment, and speed law.) Interpret the slope on

spdlaw. Does the higher speed limit have a statistically significant effect on the percent of fatal accidents? (Use a significance level of

1%.)