Topic I (Ordinary Least Squares) — Tema I (Mínimos Cuadrados Ordinarios)


  • 8/19/2019 Tema I (Mínimos Cuadrados Ordinarios)


    Ordinary Least Squares

    Rómulo A. Chumacero


    Motivation

    •  Economics is (essentially) an observational science
    •  Theory provides discussion regarding the relationship between variables

    — Example: Monetary policy and macroeconomic conditions

    •  What?: Properties of OLS

    •  Why?: Most commonly used estimation technique

    •  How?: From simple to more complex

    Outline

    1. Simple (bivariate) linear regression

    2. General framework for regression analysis

    3. OLS estimator and its properties

    4. CLS (OLS estimation subject to linear constraints)

    5. Inference (Tests for linear constraints)

    6. Prediction


    An Example

    Figure 1: Growth and Government size

    Correlation Coefficient

    •  Intended to measure direction and closeness of linear association

    •  Observations: {(x_i, y_i)}_{i=1}^{n}

    •  Data expressed in deviations from the (sample) mean:

        x̃_i = x_i − x̄,   x̄ = n^{−1} Σ_{i=1}^{n} x_i   (and likewise for y)

    •  Cov(x, y) = E(xy) − E(x)E(y); its sample counterpart is

        s_{xy} = n^{−1} Σ_{i=1}^{n} x̃_i ỹ_i,

    which depends on the units in which x and y are measured

    •  The correlation coefficient is a measure of linear association independent of units:

        r_{xy} = s_{xy} / (s_x s_y),   where   s_x = √( n^{−1} Σ_{i=1}^{n} x̃_i² )

    •  Limits: −1 ≤ r_{xy} ≤ 1 (applying the Cauchy-Schwarz inequality)
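As a quick numeric illustration of these formulas (a sketch in NumPy; the simulated data, seed, and variable names are my own, not from the slides), the covariance changes with the units of x while the correlation does not:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)   # built to be positively related

# deviations from the sample mean
xt = x - x.mean()
yt = y - y.mean()

s_xy = (xt * yt).mean()            # sample covariance: depends on units
s_x = np.sqrt((xt ** 2).mean())
s_y = np.sqrt((yt ** 2).mean())
r = s_xy / (s_x * s_y)             # correlation: unit free

# rescaling x by 1000 rescales the covariance but leaves r unchanged
s_xy_scaled = ((1000 * xt) * yt).mean()
r_scaled = s_xy_scaled / (1000 * s_x * s_y)
```

The value of `r` agrees with `np.corrcoef(x, y)[0, 1]` and always lies in [−1, 1].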

    Caution

    •  Fallacy: “Post hoc, ergo propter hoc” (after this, therefore because of this)

    •  Correlation is not causation

    •  Numerical and statistical significance may mean nothing

    •  Nonsense (spurious) correlation

    •  Yule (1926):

    — Death rate vs. proportion of marriages in the Church of England (1866-1911)

    — r = 0.95

    — Ironic: to achieve immortality → close the church!

    •  A few more recent examples

    Ice Cream causes Crime

    Figure 2: Nonsense 1

    Yet another reason to hate Bill Gates

    Figure 3: Nonsense 2

    Facebook and the Greeks

    Figure 4: Nonsense 3

    Let’s save the pirates

    Figure 5: Nonsense 4

    Divine justice

    Figure 6: Nonsense 5?

    Simple linear regression model

    •  Economics as a remedy for nonsense (correlation does not indicate direction of dependence)

    •  Take a stance:

        y_i = β_1 + β_2 x_i + ε_i

    — Linear

    — Dependent / independent

    — Systematic / unpredictable

    •  n observations, 2 unknowns

    •  Infinite possible solutions

    — Fit a line by eye

    — Choose two pairs of observations and join them

    — Minimize distance between y and its predictable component

    ∗  min Σ |ε_i| → LAD
    ∗  min Σ ε_i² → OLS

    Our Example

    Figure 7: Growth and Government size

    Figure 8: Linear regression

    Simple linear regression model

    •  Define the sum of squared residuals (SSR) function as:

        SSR(β) = Σ_{i=1}^{n} (y_i − β_1 − β_2 x_i)²

    •  Estimator: formula for estimating unknown parameters

    •  Estimate: numerical value obtained when sample data are substituted in the formula

    •  The OLS estimator β̂ minimizes SSR(β). FONC:

        ∂SSR(β)/∂β_1 |_{β̂} = −2 Σ (y_i − β̂_1 − β̂_2 x_i) = 0
        ∂SSR(β)/∂β_2 |_{β̂} = −2 Σ x_i (y_i − β̂_1 − β̂_2 x_i) = 0

    •  Two equations, two unknowns:

        β̂_1 = ȳ − β̂_2 x̄
        β̂_2 = s_{xy} / s_x² = Σ_{i=1}^{n} x̃_i ỹ_i / Σ_{i=1}^{n} x̃_i²
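The two closed-form solutions above can be checked directly (a minimal NumPy sketch on simulated data; the DGP and names are my own illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=0.3, size=n)

xt = x - x.mean()
b2 = (xt @ (y - y.mean())) / (xt @ xt)   # slope: s_xy / s_x^2
b1 = y.mean() - b2 * x.mean()            # intercept: line passes through (x̄, ȳ)
e = y - b1 - b2 * x                      # residuals
```

The residuals average to zero and are orthogonal to x, and the coefficients match `np.polyfit(x, y, 1)`.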

    Simple linear regression model

    •  Properties:

    — β̂_1, β̂_2 minimize SSR

    — The OLS line passes through the mean point (x̄, ȳ)

    — ε̂_i ≡ y_i − β̂_1 − β̂_2 x_i are uncorrelated (in the sample) with x_i

    Figure 9: SSR

    General Framework

    •  Observational data {w_1, w_2, …, w_n}

    •  Partition w = (y, x), where y ∈ R and x ∈ R^k

    •  Joint density: f(y, x; θ), θ a vector of unknown parameters

    •  Conditional distribution: f(y, x; θ) = f(y | x; θ_1) f(x; θ_2), with f(x; θ_2) = ∫_{−∞}^{∞} f(y, x) dy

    •  Regression analysis: statistical inferences on θ_1.
       Ignore f(x; θ_2) provided θ_1 and θ_2 are “variation free”

    •  y: ‘dependent’ or ‘endogenous’ variable. x: vector of ‘independent’ or ‘exogenous’ variables

    •  Conditional mean m(x) and conditional variance σ²(x):

        m(x) = E(y | x) = ∫_{−∞}^{∞} y f(y | x) dy

        σ²(x) = ∫_{−∞}^{∞} y² f(y | x) dy − [m(x)]²

    •  ε: difference between y and its conditional mean:

        y = m(x) + ε   (1)

    General Framework

    Proposition 1 (Properties of ε)
    1. E(ε | x) = 0
    2. E(ε) = 0
    3. E[h(x) ε] = 0 for any function h(·)
    4. E(xε) = 0

    Proof. 1. By definition of ε and linearity of conditional expectations,

        E(ε | x) = E[y − m(x) | x] = E[y | x] − E[m(x) | x] = m(x) − m(x) = 0

    2. By the law of iterated expectations and the first result,

        E(ε) = E[E(ε | x)] = E(0) = 0

    3. By essentially the same argument,

        E[h(x) ε] = E[E[h(x) ε | x]] = E[h(x) E[ε | x]] = E[h(x) · 0] = 0

    4. Follows from the third result setting h(x) = x. ∎

    General Framework

    •  (1) + the first result of Proposition 1 give the regression framework:

        y = m(x) + ε,   E(ε | x) = 0

    •  Important: this is a framework, not a model: it holds true by definition

    •  m(·) and σ²(·) can take any shape

    •  If m(·) is linear: Linear Regression Model (LRM), m(x) = x′β. In matrix notation:

        Y (n×1) = [y_1, …, y_n]′
        X (n×k) with rows x_i′, i.e. (i, j) element x_{ij}, i = 1, …, n, j = 1, …, k
        β (k×1),   ε (n×1) = [ε_1, …, ε_n]′

    so that Y = Xβ + ε.

    Regression models

    Definition 1 The Linear Regression Model (LRM) is:
    1. y_i = x_i′β + ε_i, or Y = Xβ + ε
    2. E(ε_i | x_i) = 0
    3. rank(X) = k, or det(X′X) ≠ 0
    4. E(ε_i ε_j) = 0 ∀ i ≠ j

    Definition 2 The Homoskedastic Linear Regression Model (HLRM) is the LRM plus
    5. E(ε_i² | x_i) = σ², or E(εε′ | X) = σ² I_n

    Definition 3 The Normal Linear Regression Model (NLRM) is the LRM plus
    6. ε ∼ N(0, σ² I_n)

    Definition of OLS Estimator

    •  Define the sum of squared residuals (SSR) function as:

        SSR(β) = (Y − Xβ)′(Y − Xβ) = Y′Y − 2β′X′Y + β′X′Xβ

    •  The OLS estimator β̂ minimizes SSR(β). FONC:

        ∂SSR(β)/∂β |_{β̂} = −2X′Y + 2X′X β̂ = 0,

    which yields the normal equations X′Y = X′X β̂.

    Proposition 2 β̂ = (X′X)^{−1}(X′Y) is the arg min_β SSR(β)

    Proof. Using the normal equations: β̂ = (X′X)^{−1}(X′Y). SOSC:

        ∂²SSR(β)/∂β ∂β′ |_{β̂} = 2X′X,

    so β̂ is a minimum as X′X is a positive definite matrix. ∎

    •  Important implications:

    — β̂ is a linear function of Y

    — β̂ is a random variable (a function of X and Y)

    — X′X must be of full rank
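In code, the normal equations are usually solved directly rather than by forming an explicit inverse (a NumPy sketch on simulated data; the DGP and names are my own):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, -0.5, 2.0])
y = X @ beta + rng.normal(size=n)

# solve the normal equations X'X b = X'y (numerically safer than inv(X'X) @ X'y)
b_hat = np.linalg.solve(X.T @ X, X.T @ y)
e_hat = y - X @ b_hat
```

The result agrees with `np.linalg.lstsq`, and the residuals satisfy the normal equations X′ε̂ = 0.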

    Interpretation

    •  Define the least squares residuals

        ε̂ = Y − X β̂   (2),   σ̂² = n^{−1} ε̂′ε̂

        Y = X β̂ + ε̂ = PY + MY,   where   P = X(X′X)^{−1}X′   and   M = I_n − P

    Proposition 3 Let A be an n × r matrix of rank r. A matrix of the form P = A(A′A)^{−1}A′ is called a projection matrix and has the following properties:

    i) P = P′ = P² (hence P is symmetric and idempotent)

    ii) rank(P) = r

    iii) the characteristic roots (eigenvalues) of P consist of r ones and n − r zeros

    iv) if Z = Ac for some vector c, then PZ = Z (hence the word projection)

    v) M = I_n − P is also idempotent with rank n − r, its eigenvalues consist of n − r ones and r zeros, and if Z = Ac, then MZ = 0

    vi) P can be written as HH′, where H′H = I_r, or as h_1h_1′ + h_2h_2′ + ⋯ + h_rh_r′, where h_i is a vector and r = rank(P)
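Properties (i)-(v) of Proposition 3 can all be verified numerically for a random full-rank matrix (a NumPy sketch; the dimensions and names are my own illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 30, 4
A = rng.normal(size=(n, r))            # a full-rank n x r matrix (a.s. for Gaussian draws)

P = A @ np.linalg.inv(A.T @ A) @ A.T   # projection onto Col(A)
M = np.eye(n) - P                      # annihilator: projects onto the orthogonal complement
```

The tests check symmetry, idempotency, rank r via the trace, the eigenvalue pattern, and MA = 0.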

    Interpretation

        Y = X β̂ + ε̂ = PY + MY

    Figure 10: Orthogonal Decomposition of Y (PY lies in Col(X); MY is orthogonal to it)

    The Mean of β̂

    Proposition 4 In the LRM, E[(β̂ − β) | X] = 0 and E(β̂) = β

    Proof. By the previous results,

        β̂ = (X′X)^{−1}X′Y = (X′X)^{−1}X′(Xβ + ε) = β + (X′X)^{−1}X′ε

    Then

        E[(β̂ − β) | X] = E[(X′X)^{−1}X′ε | X] = (X′X)^{−1}X′ E(ε | X) = 0

    Applying the law of iterated expectations, E(β̂) = E[E(β̂ | X)] = β. ∎

    The Variance of β̂

    Proposition 5 In the HLRM, V(β̂ | X) = σ²(X′X)^{−1} and V(β̂) = σ² E[(X′X)^{−1}]

    Proof. Since β̂ − β = (X′X)^{−1}X′ε,

        V(β̂ | X) = E[(β̂ − β)(β̂ − β)′ | X]
                 = E[(X′X)^{−1}X′ εε′ X(X′X)^{−1} | X]
                 = (X′X)^{−1}X′ E[εε′ | X] X(X′X)^{−1}
                 = σ²(X′X)^{−1}

    Thus, V(β̂) = E[V(β̂ | X)] + V[E(β̂ | X)] = σ² E[(X′X)^{−1}]. ∎

    •  Important features of V(β̂ | X) = σ²(X′X)^{−1}:

    — Grows proportionally with σ²

    — Decreases with sample size

    — Decreases with the volatility of x

    The Mean and Variance of σ̂²

    Proposition 6 In the LRM, σ̂² is biased.

    Proof. We know that ε̂ = MY. It is trivial to verify that ε̂ = Mε. Then σ̂² = n^{−1} ε̂′ε̂ = n^{−1} ε′Mε. This implies that

        E(σ̂² | X) = n^{−1} E[ε′Mε | X]
                  = n^{−1} E[tr(ε′Mε) | X]
                  = n^{−1} E[tr(Mεε′) | X]
                  = n^{−1} σ² tr(M)
                  = σ² (n − k) n^{−1}

    Applying the law of iterated expectations we obtain E(σ̂²) = σ²(n − k) n^{−1}. ∎

    Unbiased estimator: σ̃² = (n − k)^{−1} ε̂′ε̂.

    Proposition 7 In the NLRM, V(σ̂²) = n^{−2} 2(n − k) σ⁴

    •  Important:

    — With the exception of Proposition 7, normality is not required

    — σ̂² is biased, but it is the MLE under normality and is consistent

    — The variances of β̂ and σ̂² depend on σ². V̂(β̂) = σ̃²(X′X)^{−1}
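The bias of σ̂² and the unbiasedness of σ̃² can be seen in a small simulation (a NumPy sketch under my own DGP with a fixed design; it uses ε̂ = Mε from the proof above):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k, sigma2, reps = 20, 3, 4.0, 20000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # fixed design
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T             # annihilator

eps = rng.normal(scale=np.sqrt(sigma2), size=(reps, n))
ehat = eps @ M                                  # residuals: e_hat = M eps (beta drops out)
s2_hat = (ehat ** 2).sum(axis=1) / n            # biased: divides by n
s2_tilde = (ehat ** 2).sum(axis=1) / (n - k)    # unbiased: divides by n - k

# theory: E s2_hat = sigma2 * (n - k) / n = 3.4,  E s2_tilde = sigma2 = 4.0
```

With n = 20 and k = 3 the downward bias factor (n − k)/n = 0.85 is easily visible in the Monte Carlo means.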

    β̂ is BLUE

    Theorem 1 (Gauss-Markov) β̂ is BLUE

    Proof. Let A = (X′X)^{−1}X′, so β̂ = AY. Consider any other linear estimator β̃ = (A + C)Y. Then,

        E(β̃ | X) = (X′X)^{−1}X′Xβ + CXβ = (I + CX)β

    For β̃ to be unbiased we require CX = 0, then:

        V(β̃ | X) = E[(A + C) εε′ (A + C)′ | X] = σ²(A + C)(A + C)′

    As (A + C)(A + C)′ = (X′X)^{−1} + CC′, we obtain

        V(β̃ | X) = V(β̂ | X) + σ² CC′

    As CC′ is p.s.d. we have V(β̃ | X) ≥ V(β̂ | X). ∎

    •  Despite its popularity, Gauss-Markov is not very powerful:

    — It restricts the quest to linear and unbiased estimators

    — There may be “nonlinear” or biased estimators that do better (lower MSE)

    — OLS is not BLUE when homoskedasticity is relaxed

    Asymptotics I

    •  Unbiasedness is not that useful in practice (frequentist perspective)

    •  It is also not common in general contexts

    •  Asymptotic theory: properties of estimators when the sample size is infinitely large

    •  Cornerstones: LLN (consistency) and CLT (inference)

    Definition 4 (Convergence in probability) A sequence of real or vector valued random variables {z_n} is said to converge to z in probability if

        lim_{n→∞} Pr(‖z_n − z‖ > δ) = 0   for any δ > 0

    We write z_n →_p z or plim z_n = z.

    Definition 5 (Convergence in mean square) {z_n} converges to z in mean square if

        lim_{n→∞} E(z_n − z)² = 0

    We write z_n →_{ms} z.

    Definition 6 (Almost sure convergence) {z_n} converges to z almost surely if

        Pr[ lim_{n→∞} z_n = z ] = 1

    We write z_n →_{as} z.

    Definition 7 The estimator θ̂_n of θ_0 is said to be a weakly consistent estimator if θ̂_n →_p θ_0.

    Definition 8 The estimator θ̂_n of θ_0 is said to be a strongly consistent estimator if θ̂_n →_{as} θ_0.

    Laws of Large Numbers and Consistency of β̂

    Theorem 2 (WLLN1, Chebyshev) Let E(z_i) = μ_i, V(z_i) = σ_i², Cov(z_i, z_j) = 0 ∀ i ≠ j. If

        lim_{n→∞} n^{−2} Σ_{i=1}^{n} σ_i² = 0,

    then z̄_n − μ̄_n →_p 0

    Theorem 3 (SLLN1, Kolmogorov) Let {z_i} be independent with finite variance V(z_i) = σ_i² < ∞. If Σ_{i=1}^{∞} σ_i² / i² < ∞, then z̄_n − μ̄_n →_{as} 0

    •  Assume that n^{−1}X′X →_p Q (invertible and nonstochastic). Then

        β̂ − β = (X′X)^{−1}X′ε = (n^{−1}X′X)^{−1}(n^{−1}X′ε) →_p Q^{−1} · 0 = 0

    •  β̂ is consistent: β̂ →_p β
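Consistency is easy to see in a simulation: the typical deviation of β̂ from β shrinks as n grows, even with non-normal errors (a NumPy sketch; the DGP, sample sizes, and names are my own illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
beta = np.array([1.0, 0.5])

def max_dev(n, reps=200):
    """Median over replications of max |b_hat - beta| at sample size n."""
    devs = []
    for _ in range(reps):
        X = np.column_stack([np.ones(n), rng.normal(size=n)])
        y = X @ beta + rng.standard_t(df=5, size=n)   # non-normal (t) errors are fine
        b = np.linalg.solve(X.T @ X, X.T @ y)
        devs.append(np.abs(b - beta).max())
    return np.median(devs)

dev_small, dev_large = max_dev(50), max_dev(5000)
```

The deviation at n = 5000 is an order of magnitude smaller than at n = 50, consistent with the 1/√n rate suggested by the CLT below.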

    Analysis of Variance (ANOVA)

        y_i = ŷ_i + ε̂_i,   y_i − ȳ = (ŷ_i − ȳ) + ε̂_i

        (Y − ȳι)′(Y − ȳι) = (Ŷ − ȳι)′(Ŷ − ȳι) + 2(Ŷ − ȳι)′ε̂ + ε̂′ε̂

    but Ŷ′ε̂ = Y′PMY = 0 and ι′ε̂ = ι′MY = 0 (when a constant is included). Thus

        (Y − ȳι)′(Y − ȳι) = (Ŷ − ȳι)′(Ŷ − ȳι) + ε̂′ε̂

    This is called the ANOVA formula, often written as

        TSS = ESS + SSR

        R² = ESS/TSS = 1 − SSR/TSS = 1 − ε̂′ε̂ / (Ỹ′Ỹ),   Ỹ = Y − ȳι,   ȳ = n^{−1} ι′Y

    If the regressors include a constant, 0 ≤ R² ≤ 1.

    Analysis of Variance (ANOVA)

    •  R² measures the percentage of the variance of Y accounted for by the variation of Ŷ

    •  It is not a “measure” of “goodness” of fit

    •  It doesn’t explain anything

    •  It is not even clear that R² has an interpretation in terms of forecast performance:

        Model 1: y_i = x_i β + ε_i.   Model 2: y_i − x_i = x_i γ + ε_i, with γ = β − 1

    •  Mathematically identical: they yield the same implications and forecasts

    •  Yet the reported R² will differ greatly

    •  Suppose β ≃ 1. In the second model R² ≃ 0, while in the first it can be arbitrarily close to one

    •  R² increases as regressors are added. Theil proposed the adjusted

        R̄² = 1 − [SSR/(n − k)] / [TSS/(n − 1)] = 1 − σ̃² / σ̃_y²

    •  Not used that much today, as better model evaluation criteria have been developed
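The two-model example above can be reproduced directly (a NumPy sketch with β set near 1 and little noise; the DGP and names are my own illustration): both regressions have identical residuals and coefficients that map one-to-one, yet wildly different R².

```python
import numpy as np

rng = np.random.default_rng(6)
n = 300
x = rng.normal(size=n)
y = 0.2 + 1.0 * x + 0.1 * rng.normal(size=n)   # beta near 1, little noise

def fit(dep, reg):
    """OLS of dep on a constant and reg; returns coefficients, residuals, R^2."""
    X = np.column_stack([np.ones(len(dep)), reg])
    b = np.linalg.solve(X.T @ X, X.T @ dep)
    e = dep - X @ b
    tss = (dep - dep.mean()) @ (dep - dep.mean())
    return b, e, 1.0 - (e @ e) / tss

b_lev, e_lev, r2_lev = fit(y, x)          # Model 1: y on x        -> R^2 near 1
b_dif, e_dif, r2_dif = fit(y - x, x)      # Model 2: (y - x) on x  -> R^2 near 0
```

Same residuals, same forecasts of y, but `r2_lev` is close to one while `r2_dif` is close to zero: R² depends on how the dependent variable is written down.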

    OLS Estimator of a Subset of β

    Partition X = [X_1  X_2] and β = (β_1′, β_2′)′. Then X′X β̂ = X′Y can be written as:

        X_1′X_1 β̂_1 + X_1′X_2 β̂_2 = X_1′Y   (3a)
        X_2′X_1 β̂_1 + X_2′X_2 β̂_2 = X_2′Y   (3b)

    Solving for β̂_2 and reinserting in (3a) we obtain

        β̂_1 = (X_1′M_2 X_1)^{−1} X_1′M_2 Y
        β̂_2 = (X_2′M_1 X_2)^{−1} X_2′M_1 Y

    where M_j = I − P_j = I − X_j(X_j′X_j)^{−1}X_j′ (for j = 1, 2).

    Theorem 4 (Frisch-Waugh-Lovell) β̂_2 and ε̂ can be computed using the following algorithm:

    1. Regress Y on X_1, obtain residuals Ỹ

    2. Regress X_2 on X_1, obtain residuals X̃_2

    3. Regress Ỹ on X̃_2, obtain β̂_2 and residuals ε̂

    FWL was used to speed computation.
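The three-step FWL algorithm can be checked against the full regression (a NumPy sketch on simulated data; the DGP and names are my own illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = rng.normal(size=(n, 2)) + X1[:, [1]]   # correlated with X1, so partialling matters
y = X1 @ np.array([1.0, 2.0]) + X2 @ np.array([0.5, -1.0]) + rng.normal(size=n)

X = np.hstack([X1, X2])
b_full = np.linalg.solve(X.T @ X, X.T @ y)              # one-shot OLS

M1 = np.eye(n) - X1 @ np.linalg.inv(X1.T @ X1) @ X1.T
y_t = M1 @ y                                            # step 1: residuals of y on X1
X2_t = M1 @ X2                                          # step 2: residuals of X2 on X1
b2_fwl = np.linalg.solve(X2_t.T @ X2_t, X2_t.T @ y_t)   # step 3: partialled regression
e_fwl = y_t - X2_t @ b2_fwl
```

The step-3 coefficients equal the X_2 block of the full-regression coefficients, and the step-3 residuals equal the full-regression residuals, exactly as Theorem 4 states.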

    Application of FWL (Demeaning)

    Partition X = [X_1  X_2], where X_1 = ι and X_2 is the matrix of observed regressors.

        X̃_2 = M_1 X_2 = X_2 − ι(ι′ι)^{−1}ι′X_2 = X_2 − X̄_2

        Ỹ = M_1 Y = Y − ι(ι′ι)^{−1}ι′Y = Y − Ȳ

    FWL states that β̂_2 is the OLS estimate from the regression of Ỹ on X̃_2:

        β̂_2 = ( Σ_{i=1}^{n} x̃_{2i} x̃_{2i}′ )^{−1} ( Σ_{i=1}^{n} x̃_{2i} ỹ_i )

    Thus the OLS estimator for the slope coefficients is a regression with demeaned data.

    Constrained Least Squares (CLS)

    Assume the following constraint must hold:

        R′β = r   (4)

    (R a k × q matrix of known constants, r a q-vector of known constants, q < k, rank(R) = q.)
    The CLS estimator of β (β̄) is the value of β that minimizes SSR subject to (4).

        L(β, λ) = (Y − Xβ)′(Y − Xβ) + 2λ′(R′β − r)

    λ is a q-vector of Lagrange multipliers. FONC:

        ∂L/∂β |_{β̄} = −2X′Y + 2X′X β̄ + 2R λ̄ = 0

        ∂L/∂λ |_{β̄} = R′β̄ − r = 0

        β̄ = β̂ − (X′X)^{−1} R [R′(X′X)^{−1} R]^{−1} (R′β̂ − r)   (5)

        σ̄² = n^{−1} (Y − X β̄)′(Y − X β̄)

    β̄ is BLUE (when the constraint holds).
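Formula (5) can be verified against the substitution approach of imposing the constraint by hand (a NumPy sketch; the single constraint β_2 + β_3 = 1 and all names are my own illustration):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 150
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.7, 0.3]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b_ols = XtX_inv @ (X.T @ y)

# impose a single linear constraint R'beta = r: here beta_2 + beta_3 = 1
R = np.array([0.0, 1.0, 1.0])
r = 1.0

# formula (5) for one constraint (the bracketed matrix is a scalar)
adj = (R @ b_ols - r) / (R @ XtX_inv @ R)
b_cls = b_ols - XtX_inv @ R * adj
```

The CLS estimate satisfies the constraint exactly and, by construction, cannot have a smaller SSR than unrestricted OLS; it also matches OLS of (y − x_3) on a constant and (x_2 − x_3), the substitution version of the same constraint.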

    Inference

    •  Up to now, the properties of estimators did not depend on the distribution of ε

    •  Consider the NLRM with ε ∼ N(0, σ² I_n). Then:

        ε | X ∼ N(0, σ² I_n)

    •  On the other hand, as β̂ = β + (X′X)^{−1}X′ε, then:

        β̂ | X ∼ N(β, σ²(X′X)^{−1})

    •  However, as β̂ →_p β, it also converges in distribution to a degenerate distribution

    •  Thus, we require something more to conduct inference

    •  Next, we discuss finite (exact) and large sample distributions of estimators to test hypotheses

    •  Components:

    — Null hypothesis H_0

    — Alternative hypothesis H_1

    — Test statistic (one tail, two tails)

    — Rejection region

    — Conclusion

    Inference with Linear Constraints (normality)

        H_0: R′β = r,   H_1: R′β ≠ r

    The t Test (q = 1). Assume ε is normal; under the null hypothesis:

        R′β̂ ∼ N(r, σ² R′(X′X)^{−1}R)

        (R′β̂ − r) / [σ² R′(X′X)^{−1}R]^{1/2} ∼ N(0, 1)   (6)

    This test statistic is used when σ is known. If not, recall

        ε̂′ε̂ / σ² ∼ χ²_{n−k}   (7)

    As (6) and (7) are independent, hence:

        t = (R′β̂ − r) / [σ̃² R′(X′X)^{−1}R]^{1/2} ∼ t_{n−k}

    (6) holds even when normality of ε is not present.

    If H_0: β_1 = 0, define R = [1 0 ⋯ 0]′, r = 0:

        t = β̂_1 / √V̂(β̂_1)

  • 8/19/2019 Tema I (Mínimos Cuadrados Ordinarios)

    36/49

    Inference with Linear Constraints (normality)

    •  Confi

    dence interval:Pr

    ∙ b  − 2q bV      b  + 2q bV ¸ = 1− •  Tail probability, or probability value ( -value) function

       =  ( ) = Pr (| | ≥ | |) = 2 (1− Φ (| |))Reject the null when the -value is less than or equal to 

    •  Confidence interval for :

    Pr"(  − ) e22 −1−2

    2  (  − ) e22 −2

    # = 1−   (8)

    35

    The F Test (normality)

    For q ≥ 1, under the null:

        [SSR(β̄) − SSR(β̂)] / σ² ∼ χ²_q

    When σ² is not known, replace σ² with σ̃² and obtain

        F = [SSR(β̄) − SSR(β̂)] / (q σ̃²)
          = [(n − k)/q] · (R′β̂ − r)′ [R′(X′X)^{−1}R]^{−1} (R′β̂ − r) / (ε̂′ε̂) ∼ F_{q, n−k}   (9)

    As with t tests, reject the null when the value computed exceeds the critical value.
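The two forms in (9) — the SSR comparison and the Wald quadratic form — are numerically identical (a NumPy sketch testing H_0: β_2 = β_3 = 0 under my own DGP where H_0 is false):

```python
import numpy as np

rng = np.random.default_rng(10)
n, k, q = 300, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.4, -0.4]) + rng.normal(size=n)

b_u = np.linalg.solve(X.T @ X, X.T @ y)
ssr_u = (y - X @ b_u) @ (y - X @ b_u)       # unrestricted SSR

e_r = y - y.mean()                          # restricted model under H0: intercept only
ssr_r = e_r @ e_r

F = ((ssr_r - ssr_u) / q) / (ssr_u / (n - k))

# equivalent Wald form: (R'b - r)'[R'(X'X)^{-1}R]^{-1}(R'b - r) / (q * sigma_tilde^2)
Rb = b_u[1:]
A = np.linalg.inv(X.T @ X)[1:, 1:]          # R'(X'X)^{-1}R for R selecting the slopes
F_wald = (Rb @ np.linalg.solve(A, Rb)) / (q * ssr_u / (n - k))
```

With nonzero true slopes, F comes out far above any conventional F_{2, 297} critical value, so the null is rejected.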

    Asymptotics II

    •  How to conduct inference when ε is not necessarily normal?

    Figure 11: Convergence in distribution

    CLT

    Definition 9 (Convergence in distribution) {z_n} is said to converge to z in distribution if the distribution function F_n of z_n converges to the distribution F of z at every continuity point of F. We write z_n →_d z and we call F the limiting distribution of {z_n}. If {z_n} and {y_n} have the same limiting distribution, we write z_n =_{LD} y_n.

    Theorem 5 (CLT1, Lindeberg-Lévy) Let {z_i} be i.i.d. with E(z_i) = μ and V(z_i) = σ². Then

        s_n = (z̄_n − μ) / [V(z̄_n)]^{1/2} = √n (z̄_n − μ) / σ →_d N(0, 1)

    •  Assume that n^{−1}X′X →_p Q (invertible and nonstochastic) and that n^{−1/2}X′ε →_d N(0, σ²Q). Then

        √n (β̂ − β) = (n^{−1}X′X)^{−1} (n^{−1/2}X′ε) →_d N(0, σ² Q^{−1})

    •  Thus, under the HLRM, the asymptotic distribution does not depend on the distribution of ε

    •  Normal vs. t test / χ² vs. F test

    Tests for Structural Breaks

    Suppose we have a two-regime regression

        Y_1 = X_1 β_1 + ε_1
        Y_2 = X_2 β_2 + ε_2

        E[ (ε_1′, ε_2′)′ (ε_1′  ε_2′) ] = [ σ_1² I_{n_1}   0 ;   0   σ_2² I_{n_2} ]

        H_0: β_1 = β_2

    Assume σ_1 = σ_2. Define Y = Xβ + ε with

        Y = (Y_1′, Y_2′)′,   X = [ X_1  0 ;  0  X_2 ],   β = (β_1′, β_2′)′,   ε = (ε_1′, ε_2′)′

    Applying (9) we obtain:

        F = [(n_1 + n_2 − 2k)/k] · (β̂_1 − β̂_2)′ [(X_1′X_1)^{−1} + (X_2′X_2)^{−1}]^{−1} (β̂_1 − β̂_2) / (Y′[I − X(X′X)^{−1}X′]Y) ∼ F_{k, n_1+n_2−2k}   (10)

    where β̂_1 = (X_1′X_1)^{−1}X_1′Y_1 and β̂_2 = (X_2′X_2)^{−1}X_2′Y_2.

    Tests for Structural Breaks

    The same result can be derived as follows. Define SSR under the alternative (structural change)

        SSR(β̂) = Y′[I − X(X′X)^{−1}X′]Y

    and SSR under the null hypothesis

        SSR(β̄) = Y′[I − X̄(X̄′X̄)^{−1}X̄′]Y,   X̄ = (X_1′, X_2′)′ (the pooled regressor matrix)

    Then

        F = [(n_1 + n_2 − 2k)/k] · [SSR(β̄) − SSR(β̂)] / SSR(β̂) ∼ F_{k, n_1+n_2−2k}   (11)

    An unbiased estimate of σ² is

        σ̃² = SSR(β̂) / (n_1 + n_2 − 2k)

    Chow tests are popular, but modern practice is skeptical. Recent theoretical and empirical applications treat the period of a possible break as an endogenous latent variable.
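The SSR form (11) is the easiest to compute: fit the two subsamples separately, fit the pooled sample, and compare (a NumPy sketch under my own DGP with no break, so H_0 is true; the equivalence with form (10) is also checked):

```python
import numpy as np

rng = np.random.default_rng(11)
n1, n2, k = 120, 80, 2
X1 = np.column_stack([np.ones(n1), rng.normal(size=n1)])
X2 = np.column_stack([np.ones(n2), rng.normal(size=n2)])
y1 = X1 @ np.array([1.0, 0.5]) + rng.normal(size=n1)
y2 = X2 @ np.array([1.0, 0.5]) + rng.normal(size=n2)   # same coefficients: no break

def ssr(X, y):
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    return e @ e

ssr_u = ssr(X1, y1) + ssr(X2, y2)                           # unrestricted: separate fits
ssr_r = ssr(np.vstack([X1, X2]), np.concatenate([y1, y2]))  # restricted: pooled fit

F = ((ssr_r - ssr_u) / k) / (ssr_u / (n1 + n2 - 2 * k))     # Chow statistic, form (11)

# form (10): Wald quadratic form in the coefficient gap
b1 = np.linalg.solve(X1.T @ X1, X1.T @ y1)
b2 = np.linalg.solve(X2.T @ X2, X2.T @ y2)
d = b1 - b2
V = np.linalg.inv(X1.T @ X1) + np.linalg.inv(X2.T @ X2)
F_wald = (d @ np.linalg.solve(V, d)) / (k * ssr_u / (n1 + n2 - 2 * k))
```

Since the pooled SSR can never be below the split-sample SSR, F ≥ 0; under H_0 it is compared with an F_{k, n_1+n_2−2k} critical value.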

    Prediction

    •  Out-of-sample prediction of y_{n+1} (given x_{n+1}) is not easy. In that period:

        y_{n+1} = x_{n+1}′β + ε_{n+1}

    •  Types of uncertainty:

    — Unpredictable component

    — Parameter uncertainty

    — Uncertainty about x

    — Specification uncertainty

    •  Types of forecasts:

    — Point forecast

    — Interval forecast

    — Density forecast

    •  Active area of research

    Prediction

    •  If the HLRM holds, the predictor that minimizes MSE is ŷ_{n+1} = x_{n+1}′β̂

    •  Given x_{n+1}, the mean squared prediction error is

        E[(ŷ_{n+1} − y_{n+1})² | X] = σ² [1 + x_{n+1}′(X′X)^{−1}x_{n+1}]

    •  To construct an estimator of the variance of the forecast error, substitute σ̃² for σ²

    •  You may think that a confidence interval forecast could be formulated as:

        Pr[ ŷ_{n+1} − 2√V̂ ≤ y_{n+1} ≤ ŷ_{n+1} + 2√V̂ ] = 1 − α

    WRONG. Notice that

        (y_{n+1} − ŷ_{n+1}) / √( σ² [1 + x_{n+1}′(X′X)^{−1}x_{n+1}] )
        = [ε_{n+1} + x_{n+1}′(β − β̂)] / √( σ² [1 + x_{n+1}′(X′X)^{−1}x_{n+1}] )

    This relation does not have a discernible limiting distribution (unless ε is normal). We didn’t need to impose normality for any of the previous results (at least asymptotically).

    We assumed that the econometrician knew x_{n+1}. If x_{n+1} is stochastic and not known at n, the MSE could be seriously underestimated.
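The forecast-error variance formula is straightforward to compute once a model is fit (a NumPy sketch; the DGP, the chosen x_{n+1}, and all names are my own illustration):

```python
import numpy as np

rng = np.random.default_rng(12)
n, k = 250, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([2.0, 1.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ (X.T @ y)
e = y - X @ b
s2 = e @ e / (n - k)                               # sigma-tilde^2

x_new = np.array([1.0, 1.5])                       # assumed-known out-of-sample regressor
y_hat = x_new @ b                                  # point forecast
mspe_hat = s2 * (1.0 + x_new @ XtX_inv @ x_new)    # estimated forecast-error variance
lo, hi = y_hat - 2 * np.sqrt(mspe_hat), y_hat + 2 * np.sqrt(mspe_hat)
```

Note the "1 +" term: the forecast-error variance always exceeds the residual variance, because the unpredictable ε_{n+1} is added to the parameter-uncertainty term. The interval [lo, hi] is only valid under normal ε, as the slide warns.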

    Prediction

    Figure 12: Forecasting (forecast densities f(y) at regressor values x_0 and x_1)

    Measures of predictive accuracy of forecasting models

        RMSE = √[ (1/T) Σ_{t=1}^{T} (y_t − ŷ_t)² ]

        MAE = (1/T) Σ_{t=1}^{T} |y_t − ŷ_t|

    Theil U statistic:

        U = √[ Σ_{t=1}^{T} (y_t − ŷ_t)² / Σ_{t=1}^{T} y_t² ]

        U_Δ = √[ Σ_{t=1}^{T} (Δy_t − Δŷ_t)² / Σ_{t=1}^{T} (Δy_t)² ]

        Δy_t = y_t − y_{t−1}   and   Δŷ_t = ŷ_t − y_{t−1}

    or, in percentage changes,

        Δy_t = (y_t − y_{t−1}) / y_{t−1}   and   Δŷ_t = (ŷ_t − y_{t−1}) / y_{t−1}

    These measures (in changes) will reflect the model’s ability to track turning points in the data.
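These accuracy measures are one-liners in practice (a NumPy sketch on an artificial series with hypothetical model forecasts; the DGP and names are my own illustration):

```python
import numpy as np

rng = np.random.default_rng(13)
T = 100
y = np.cumsum(rng.normal(size=T)) + 50.0       # a wandering series
y_hat = y + rng.normal(scale=0.5, size=T)      # hypothetical model forecasts

rmse = np.sqrt(np.mean((y - y_hat) ** 2))
mae = np.mean(np.abs(y - y_hat))

# Theil U in changes: compare predicted changes with actual changes
dy = y[1:] - y[:-1]
dy_hat = y_hat[1:] - y[:-1]                    # predicted change from the last observed level
U = np.sqrt(np.sum((dy - dy_hat) ** 2) / np.sum(dy ** 2))
```

By Jensen's inequality RMSE ≥ MAE always; U < 1 means the forecast tracks changes better than a "no change" forecast (for which U = 1).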

    Evaluation

    •  When comparing two models, is one model really better than the other?

    •  Diebold-Mariano: a framework for comparing models

        d_t = L(ê_{1t}) − L(ê_{2t});   DM = d̄ / √V̂(d̄) →_d N(0, 1)

    where L(·) is a loss function, ê_{it} are the forecast errors of model i, and d̄ is the sample mean of d_t

    •  Harvey, Leyborne, and Newbold (HLN): correct size distortions and use Student’s t:

        HLN = DM · [ (T + 1 − 2h + h(h − 1)/T) / T ]^{1/2}

    with T forecasts at horizon h

    Finite Samples

    •  Statistical properties of most methods are known only asymptotically

    •  “Exact” finite sample theory can rarely be used to interpret estimates or test statistics

    •  Are theoretical properties reasonably good approximations for the problem at hand?

    •  How to proceed in these cases?

    •  Monte Carlo experiments and bootstrap

    Monte Carlo Experiments

    •  Often used to analyze finite sample properties of estimators or test statistics

    •  Quantities approximated by generating many pseudo-random realizations of a stochastic process and averaging them

    — Model and estimators or tests associated with the model. Objective: assess small sample properties

    — DGP: a special case of the model. Specify “true” values of parameters, laws of motion of variables, and distributions of random variables

    — Experiment: R replications or samples, generating artificial samples of data according to the DGP and calculating the estimates or test statistics of interest

    — After R replications, we have an equal number of estimates, which are subjected to statistical analysis

    — Experiments may be performed by changing the sample size, values of parameters, etc. Response surfaces

    •  Monte Carlo experiments are random. It is essential to perform enough replications so that results are sufficiently accurate. Critical values, etc.

    Bootstrap Resampling

    •  The bootstrap views the observed sample as a population

    •  The distribution function for this population is the EDF of the sample, and parameter estimates based on the observed sample are treated as the actual model parameters

    •  Conceptually: examine properties of estimators or test statistics in repeated samples drawn from a tangible data-sampling process that mimics the actual DGP

    •  The bootstrap does not represent exact finite sample properties of estimators and test statistics under the actual DGP, but provides an approximation that improves as the size of the observed sample increases

    •  Reasons for acceptance in recent years:

    — Avoids most of the strong distributional assumptions required in Monte Carlo

    — Like Monte Carlo, it may be used to solve intractable estimation and inference problems by computation rather than reliance on asymptotic approximations, which may be very complicated in nonstandard problems

    — Bootstrap approximations are often equivalent to first-order asymptotic results, and may dominate them in some cases
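A minimal version of the idea is the pairs bootstrap for the standard error of a regression slope (a NumPy sketch; the DGP, the number of draws B, and all names are my own illustration): resample (x_i, y_i) pairs with replacement from the EDF and re-estimate on each draw.

```python
import numpy as np

rng = np.random.default_rng(14)
n = 200
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)

def slope(x, y):
    xt = x - x.mean()
    return (xt @ (y - y.mean())) / (xt @ xt)

b_hat = slope(x, y)               # the "population" parameter, per the bootstrap view

# pairs bootstrap: resample observations with replacement, re-estimate each time
B = 2000
b_star = np.empty(B)
for j in range(B):
    idx = rng.integers(0, n, size=n)
    b_star[j] = slope(x[idx], y[idx])

se_boot = b_star.std(ddof=1)                  # bootstrap standard error
ci = np.percentile(b_star, [2.5, 97.5])       # percentile confidence interval
```

With homoskedastic errors the bootstrap standard error should be close to the analytic σ/√(Σx̃_i²) ≈ 0.07 here, and the percentile interval brackets the point estimate.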