Slides Casarin Monte Carlo



    Contents

1 A Matlab Primer
  1.1 Programming Languages
  1.2 Fourth Generation Languages (4GPL)
  1.3 Matlab
      1.3.1 Operators
      1.3.2 Logical Operators
      1.3.3 Creating Matrices
      1.3.4 Matrix Description
      1.3.5 Other Functions
      1.3.6 Loops and If Statements
      1.3.7 Procedures
  1.4 Examples
      1.4.1 Input, Output and Graphics
      1.4.2 Ordinary Least Square
      1.4.3 A Bayesian Linear Regression Model
  1.5 From Matlab to Scilab and R

2 Monte Carlo Integration
  2.1 Integration
  2.2 A Monte Carlo Estimator
  2.3 Asymptotic Properties
  2.4 Optimal Number of MC Samples
  2.5 Appendix - Matlab Code

3 Importance Sampling
  3.1 Importance Sampling
  3.2 Properties of the IS Estimators
  3.3 Generating Student-t Variables


    Chapter 1

    A Matlab Primer

    Aim

    Learn some basic facts in Matlab programming

    Contents

    1. Programming Languages

    2. Fourth Generation Languages (4GPL)

    3. Matlab

    4. Examples

5. From Matlab to Scilab and R

    1.1 Programming Languages

If you need to carry out an econometric analysis, before starting to write code you may want to have a look at the following link

http://www.feweb.vu.nl/econometriclinks/software.html

where many of the most widely used econometric software packages and their contributed libraries are linked.


In the following we report a brief description of the software packages listed at the econometriclinks webpage maintained by the Royal Economic Society:

A+, ACML, ADMB, AIMMS, ALOGIT, Alyuda, AMOS, AMPL, APL, Apophenia, Arc, AREMOS, AutoBox, Autometrics, AutoSignal

B34S, BACC, BATS, BETA, BIOGEME, BMDP, Brodgar, BUGS, BV4

BACC: Bayesian Analysis, Computation and Communication. Free high-quality generic software developed for different operating systems (Windows, Unix) and different front-ends. Specific model procedures as well. Supported by the US NSF. Developed by Bill McCausland under the supervision of John Geweke.

    BUGS: Bayesian inference Using Gibbs Sampling (MCMC: Markov Chain

    Monte Carlo)

C(++), CART, Census X12, Caterpillar-SSA, CPLEX, ConfortS, CVar

DataDesk, Dataplore, Dataplot, DATAVIEW, DEA-Solver, DEMETRA, Draco, DYALOG, DYNARE

DYNARE: A Program for the Resolution and Simulation of Dynamic Models with Forward Variables Through the Use of a Relaxation Algorithm. Computes k-th order approximations of dynamic stochastic general equilibrium (DSGE) models. Also allows Bayesian estimation of DSGEs.

    EasyFit, EasyReg, EcoWin, ECTS, EQS, Eviews, Excel, EXPO

    FAME, ForecastPro, Fortran, FreeFore, FSQP

GAMS, GARCH, GAUSS, GAUSSX, GiveWin, Gempack, GeoDa, Genstat, GLIM, GLIMMIX, GQOPT, graphpad, Gnuplot, GSL, GRETL

    GAMS: Generic Algebraic Modeling System for large scale optimization

    problems.

  • 8/10/2019 Slides Casarin Monte carlo

    7/46

    1.1. PROGRAMMING LANGUAGES 3

GAUSS: is a programming language designed to operate with and on matrices. It is a general purpose tool. As such, it is a long way from more specialised econometric packages. On a spectrum which runs from the computer language C at one end to, say, the menu-driven econometric program EViews at the other, GAUSS is very much at the programming end.

GRETL: is a cross-platform software package for econometric analysis, written in the C programming language. It is free, open-source software.

    HLM

    ICRFS-Plus, ILOG, IDAMS, IMSL, INSTAT, ITSM J, JMP, JMulti, JStatCom, JWAVE

    KNITRO

MacAnova, Maple, Mendeley, MARS, Mathcad, Mathematica, MathPlayer, MathML, MathType, MATLAB, Matrixer, M@ximize, MetrixND,

    MHTS, Microfit, MiKTeX, Minitab, MINOS, MIXOR, MLE, MLwiN,

    Modeleasy, ModelQED, Modler, MOSEK, Mplus, Modula, MuPAD,

    Mx.

MATLAB: It is a high-level language and more specifically a 4GPL (such as SAS, SPSS, Stata, GAUSS) which allows matrix manipulations for numerical computing.

NAG Mark 22 Numerical Libraries (2009), Genstat, MLP (ML estimation)

Octave, O-Matrix, Omegahat, OpenDX, Ox, OxEdit, OxGauss, OxMetrics

Octave: a high-level language, primarily intended for numerical computations. It provides a convenient command line interface for solving linear and nonlinear problems numerically, and for performing other numerical experiments using a language that is mostly compatible with Matlab. It may also be used as a batch-oriented language.

    Ox: is an object-oriented matrix programming language for statistics and

    econometrics developed by Jurgen Doornik

PASS, PASW, PcFiml, PcGets, PcGive, PcNaive, Python

Python: Free open-source dynamic object-oriented programming language that can be used for many kinds of software development. It offers strong support for integration with other languages and tools, and comes with extensive standard libraries.

R, RATS, REG-X, ReSampling Stats, Rlab, Rlab+

R: is a 4GPL; it is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS.

    RATS: developed by Estima, RATS (Regression Analysis of Time Series) is

    an econometrics and time-series analysis software package.

S+, SAS, SCA, Scilab, SciPy, SciViews, Sciword, SCP, Shazam, Sigmaplot, SIMSTAT, SOLAS, SOL, Soritec, SpaceStat, SQlite, SPAD, Speakeasy, IBM SPSS, SsfPack, STAMP, Stata, StatCrunch, Statgraphics, Statistica, Stat/Transfer, StatsDirect, STL, Statview, SUDAAN, SVAR, SYSTAT

SAS: is a 4GPL which allows the user to define a sequence of operations (statistical analysis and data management) to be performed on data.

Scilab: is a free and open-source 4GPL for numerical computation, similar to Matlab.

TSM, TISEAN, TRAMO/SEATS, TSP, TVAR

TRAMO/SEATS:

    UNISTAT, VassarStats, ViSta


Web Decomp, WebStat, WEKA, WinIDAMS, WINKS, Windows KWIKSTAT, XploRe, Winsolve, X-12-ARIMA, XLisp-Stat, Xtremes, X(G)PL

    1.2 Fourth Generation Languages (4GPL)

Each step in the development of computer languages has aimed to reduce the amount of time required to write programs and to reduce the amount of skill required to write them.

In the 1GPL the programs are written in binary code and can access binary digits. Writing programs in a 1GPL is a very skilled job and it is very time consuming to test and debug programs.

In the 2GPL, the programs are written in symbolic assembly code, they access bytes and are slightly less time demanding.

In the 3GPL, the programs are written in a high-level language (e.g. COBOL, Pascal, C, Fortran, etc.), they can access records, and programming requires less time and skill.

In the 4GPL, the programs perform Boolean operations on (mathematical) sets and require even less time and skill. A well-known example of a 4GPL is SQL.

Scilab, Matlab, Gauss and R (see

http://www.scilab.org/
http://www.mathworks.it/
http://www.aptech.com/
http://www.r-project.org/

) are 4GPLs and have some common features. They are a long way from more specialised econometric packages, are not menu-driven programs (such as EViews) and are very much at the programming end. Thus all of them require a certain degree of familiarity with programming methods and structures.


Another common feature is that they are extremely powerful for matrix manipulation and in this sense they are more useful for economists than 3GPL programming languages (such as C or Fortran), where the basic data units are all scalars. At the same time they are very flexible and allow more expert users to interface with procedures written in other languages such as C, C++, or Fortran.

An important feature of Scilab and R is that the source code of their libraries is available, which is not generally the case for Matlab and Gauss. Finally, note that Matlab, Gauss and R have a lot of proprietary and contributed libraries oriented to statistics and econometrics.

    1.3 Matlab

    1.3.1 Operators

Select a submatrix from a matrix: x( startrow : endrow, startcolumn : endcolumn )
Transposition operator: '
Matrix operators: + - * / \ ^
Element-by-element operators: .* ./ .\ .^
Concatenating operators: [leftmatrix, rightmatrix]  [uppermatrix; bottommatrix]
Relational operators: < <= > >= == ~=
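As a quick illustration (the matrices below are arbitrary), the operators listed above can be used as follows:

A = [1 2; 3 4];
B = [5 6; 7 8];
A(1:2, 2)        % submatrix: rows 1 to 2 of column 2
A'               % transposition
A*B              % matrix product
A.*B             % element-by-element product
[A, B]           % horizontal concatenation
[A; B]           % vertical concatenation
A >= 2           % relational operator, applied element by element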


y = ceil( x );
y = floor( x );
y = reshape( x, r, c );
Kronecker product: kron( x, y )
y = trimr( x, t, b );

    1.3.6 Loops and If Statements

for i=start:step:stop;

    ...

    end;

    while logical expression;

    ...

    end;

    if logical expression 1;

...
elseif logical expression 2;

    ...

    else;

    ...

    end;

Example of a do loop with a counter:

i=1;
while (i<=100);

...

i=i+1;

end;
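A complete toy example combining the for loop and the if statement (the threshold 0.5 is arbitrary):

x = rand(100,1);
count = 0;
for i=1:100
    if x(i) > 0.5
        count = count + 1;   % count the draws above the threshold
    end
end
count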


    end;

    end;

    %*************************************************

    % Some Pictures...

    %*************************************************

    % figure(1) to have distinct graphs

    figure(1);

title('Time series data');
ylabel('Data');
xlabel('Time');
plot(xx,yy);
figure(2);
title('Time-varying log-volatility');
a=plot(xx,s,'color',[1 0 0]); %[red green blue] the rgb convention
axis([1 n min(s) max(s)]); % Set tics
figure(3);
title('Dummy');
plot(xx,d,'color',[1 0 0]); %[red green blue] the rgb convention
axis([1 n -0.1 1.1]); % Set tics
%*************************************************
% All charts in one picture...
%*************************************************
figure(4);
subplot(3,1,1);
title('Time series data');
ylabel('Data');
xlabel('Time');
plot(xx,yy);
subplot(3,1,2);
title('Time-varying log-volatility');
plot(xx,s,'color',[1 0 0]); %[red green blue] the rgb convention
axis([1 n min(s) max(s)]); % Set tics
subplot(3,1,3);
title('Dummy');
plot(xx,d,'color',[1 0 0]); %[red green blue] the rgb convention
axis([1 n -0.1 1.1]); % Set tics
%*************************************************
% histogram
%*************************************************
figure(5);
hist(yy,50);
%*************************************************
% Save the results in an output file
%*************************************************
fid = fopen('C:/Dottorato/Teaching/SummerSchoolBertinoro/...
TutorialAntonietta/TutorialRobAnt/AllLab/MatlabCode/ChapterMatlab/OutPound.txt', 'w');
fprintf(fid, '%5.2f\n', yy);
fclose(fid);


    %*************************************************

    1.4.2 Ordinary Least Square

    We learn how to use structures in Matlab

    function results=ols(y,x)

    % PURPOSE: least-squares regression

    %---------------------------------------------------

    % USAGE: results = ols(y,x)

% where: y = dependent variable vector (nobs x 1)
% x = independent variables matrix (nobs x nvar)
%---------------------------------------------------
% RETURNS: a structure
% results.meth = 'ols'
% results.beta = bhat
% results.tstat = t-stats
% results.yhat = yhat
% results.resid = residuals
% results.sige = e'*e/(n-k)
% results.rsqr = rsquared

    % results.rbar = rbar-squared

    % results.dw = Durbin-Watson Statistic

    % results.nobs = nobs

    % results.nvar = nvars

    % results.y = y data vector

Check for the correct number of input arguments and whether the number of rows of x is equal to the number of rows of y.

if (nargin ~= 2); error('Wrong # of arguments to ols');
else
[nobs nvar] = size(x); [nobs2 ndep] = size(y);
if (nobs ~= nobs2); error('x and y must have same # obs in ols'); end;

    end;

    k=nvar;

Evaluate all the statistics that are usually involved in an OLS estimation.

results.y = y;
results.nobs = nobs;
results.nvar = nvar;
xpxi = (x'*x)\eye(k);
results.beta = xpxi*(x'*y);
results.yhat = x*results.beta;
results.resid = y - results.yhat;
sigu = results.resid'*results.resid;
results.sige = sigu/(nobs-nvar);
tmp = (results.sige)*(diag(xpxi));
results.tstat = results.beta./(sqrt(tmp));


    ym = y - mean(y);

rsqr1 = sigu; rsqr2 = ym'*ym;
results.rsqr = 1.0 - rsqr1/rsqr2; % r-squared
rsqr1 = rsqr1/(nobs-nvar);
rsqr2 = rsqr2/(nobs-1.0);
results.rbar = 1 - (rsqr1/rsqr2); % rbar-squared
ediff = results.resid(2:nobs) - results.resid(1:nobs-1);
results.dw = (ediff'*ediff)/sigu; % durbin-watson

    end;

We save the code as the function ols.m and run the following simulation example.

    nob=100;

    x1=ones(nob,1);

x2=randn(nob,1).*((1:nob)'/10);
x=[x1 x2];
sig=2;
y=x*[10; 0.9]+sig*randn(nob,1);

    res=ols(y,x);

    res.beta

    %%

    figure(1)

    plot([res.yhat y]);

figure(2)
plot(res.resid);

    1.4.3 A Bayesian Linear Regression Model

Let $y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times k}$ and $\beta \in \mathbb{R}^k$. Consider the simple regression model

(1.1) $y = X\beta + \varepsilon$

(1.2) $\varepsilon \sim \mathcal{N}_n(0_n, \sigma^2 I_n)$

with the following prior specification

(1.3) $R\beta \sim \mathcal{N}(r, T)$

or equivalently

(1.4) $Q\beta \sim \mathcal{N}(q, I_k)$

where $Q'Q = T^{-1}$ and $q = Qr$.
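As a minimal sketch (not the notes' own estimation code), the Theil-Goldberger (mixed) estimate of $\beta$ under this prior can be computed directly when $R = I_k$ and $\sigma^2$ is treated as known; the simulated data and the prior values r and T below are illustrative choices.

% Hedged sketch: Theil-Goldberger (mixed) estimate of beta for model (1.1)-(1.4),
% taking R = I_k and treating sigma^2 as known. Prior values are illustrative.
nob = 100;
x   = [ones(nob,1), randn(nob,1)];
sig = 2;
y   = x*[10; 0.9] + sig*randn(nob,1);   % simulated data
r   = [9; 1];                           % prior mean
T   = 4*eye(2);                         % prior covariance
Q   = chol(inv(T));                     % so that Q'*Q = inv(T), as in (1.4)
q   = Q*r;
bhat = (x'*x/sig^2 + Q'*Q) \ (x'*y/sig^2 + Q'*q);   % mixed / posterior mean estimate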


    0.9610 11.2966 0

    0.9200 11.2740 0

    Theil-Goldberger estimates

    1.0037

    0.9569

    0.9198

We now apply the inference procedure to a financial dataset. We consider monthly data on the short-term interest rate (the three-month Treasury Bill rate) and on the AAA corporate bond yield in the USA. As Treasury Bill notes and AAA bonds are both low-risk securities, one could expect that there is a relationship between their interest rates. We consider data from January 1950 to December 1999.

Let $y_i$ be the monthly change in the Treasury Bill rate and $z_i$ the monthly change in the AAA bond rate. We fit to this set of data the heteroscedastic model presented above with

$y_i = \beta_1 + \beta_2 z_i + \varepsilon_i$

which corresponds to setting $x_i = (1, z_i)$ and $\beta = (\beta_1, \beta_2)'$ in the multivariate regression model given above. The results of the estimation procedure are

Gibbs sampling estimates

Coefficient   t-statistic   t-probability
0.0053        0.7805        0.2177
0.2751        19.8628       0

Theil-Goldberger estimates

0.0057
0.2747



Figure 1.1: Actual and fitted data (top) and residuals (bottom) using the Bayesian estimates of the linear regression model.

The estimates of $\sigma^2$ are 0.0283 for the Gibbs sampler and 0.0282 for the Theil-Goldberger procedure.

The actual and fitted data and the residuals are given in Fig. 1.1. The plot of the residuals shows that in the second half of the sample (say after 1975) the variance is underestimated. More precisely, one should account in the model for the time variation in the variance of the data. This calls for heteroscedastic linear regression models (see Chapter ??) or for nonlinear models such as stochastic volatility models (see Chapters ?? and ??).



    //*************************************************

// Some Pictures...
//*************************************************

    // figure(1) to have distinct graphs

    figure(1);

    title("Time series data");

    ylabel("Data");

    xlabel("Time");

    plot(xx,yy);

    figure(2);

    title("Time-varying log-volatility");

plot(xx,s,'color',[1 0 0]); //[red green blue] the rgb convention

    a=gca();

    a.data_bounds=[1,min(s);n,max(s)];// Set tics

    figure(3);

    title("Dummy");

plot(xx,d,'color',[1 0 0]); //[red green blue] the rgb convention

    a=gca();

    a.data_bounds=[1,-0.1;n,1.1];// Set tics

    //*************************************************

// All charts in one picture...

    //*************************************************

    figure(4);

    subplot(3,1,1);

    title("Time series data");

    ylabel("Data");

    xlabel("Time");

    plot(xx,yy);

    subplot(3,1,2);

    title("Time-varying log-volatility");

plot(xx,s,'color',[1 0 0]); //[red green blue] the rgb convention

    a=gca();

    a.data_bounds=[1,min(s);n,max(s)];// Set tics

    subplot(3,1,3);

    title("Dummy");plot(xx,d,color,[1 0 0]); //[red green blue] the rgb convention

    a=gca();

    a.data_bounds=[1,-0.1;n,1.1];// Set tics

    //*************************************************

    // histogram


    //*************************************************

figure(5);
histplot(100,yy);

    //*************************************************

    // Save the results in a ouput file

    //*************************************************

fprintfMat("C:/Dottorato/Teaching/SummerSchoolBertinoro/TutorialAntonietta/...
TutorialRobAnt/AllLab/MatlabCode/ChapterMatlab/OutPound.txt", yy, "%5.2f");

    // attention this overwrites the existing file

    R

    #*************************************************

    # basic in I/O, graphical, statistical procedures

    #*************************************************

    # Load UK/EU exchange rate data

    yy=scan("C:/Dottorato/Teaching/SummerSchoolBertinoro/TutorialAntonietta/...

    TutorialRobAnt/AllLab/MatlabCode/ChapterMatlab/pound.txt",sep="\t",skip=0,na.strings=".")

    dim(yy)=c(1006,1);

    #*************************************************

    n=dim(yy); # evaluate the number of rows #

    n=n[1];

    xx=(1:n);

#*************************************************
# for endfor if end

    # (1) Evaluate sequentially the variance

    # (2) Built a dummy variable, based on the value

    # of the variance estimated recursively

    #*************************************************

    wn=10; # set the value of a variable#

    s=array(0,n); # define a n-dim null vector #

    d=array(0,n);

    for (j in ((wn+1):n)){

    s[j]=var(yy[(j-wn+1):j]);

    if (s[j]>0.45){

    d[j]=1;

    }

    }

    #*************************************************

    # Some Pictures...

    #*************************************************

    # figure(1) to have distinct graphs


dev.new();
plot(xx,yy,main="Time series data",xlab="Time",ylab="Data",type="l");

    dev.new();

    plot(xx,s,main="Time-varying log-volatility",xlab="Time",ylab="Data",type="l");

    #[red green blue] the rgb convention

    dev.new();

    plot(xx,d,main="Dummy",xlab="Time",ylab="Data",type="l");

    #[red green blue] the rgb convention

    #*************************************************

# All charts in one picture...

    #*************************************************

par(mfrow=c(3,1),pin=c(5,1.5));
plot(xx,yy,main="Time series data",xlab="Time",ylab="Data",type="l");

    plot(xx,s,main="Time-varying log-volatility",xlab="Time",ylab="Data",type="l");

    #[red green blue] the rgb convention

    plot(xx,d,main="Dummy",xlab="Time",ylab="Data",type="l");

    #[red green blue] the rgb convention

    #*************************************************

    # histogram

    #*************************************************

    dev.new();

    hist(yy,50);

    #*************************************************

    # Save the results in a ouput file

    #*************************************************

    save(yy, file = "C:/Dottorato/Teaching/SummerSchoolBertinoro/TutorialAntonietta/...

    TutorialRobAnt/AllLab/MatlabCode/ChapterMatlab/OutPound.txt");


    Chapter 2

    Monte Carlo Integration

    Aim

Apply basic Monte Carlo principles to solve some basic integration problems. Discuss the choice of the number of samples in a Monte Carlo estimation.

    Contents

    1. Integration

    2. A Monte Carlo Estimator

    3. Asymptotic Properties

    4. Optimal Number of MC Samples

    5. Appendix - Matlab Code

2.1 Integration

Our aim is to approximate the integral

(2.1) $\theta(f) = \int_0^1 f(x)\,dx$


for the following integrand functions f:

1. $f(x) = x$

2. $f(x) = x^2$

3. $f(x) = \cos(\pi x)$

We apply a Monte Carlo approach and re-write the integration problem in statistical terms as follows

(2.2) $\int_0^1 f(x)\,dx = \int_{-\infty}^{+\infty} f(x)\,\mathbb{I}_{[0,1]}(x)\,dx = \mathbb{E}(f(X))$

where $\mathbb{I}_A(x)$ is the indicator function, which equals 1 if $x \in A$ and 0 otherwise, and $X \sim \mathcal{U}_{[0,1]}$ is a random variable with a standard uniform distribution.

    2.2 A Monte Carlo Estimator

Let $X_1, \ldots, X_n$ be a set of $n$ i.i.d. samples from a uniform distribution. The integral $\theta = \mathbb{E}(f(X))$ is approximated as follows

(2.3) $\hat\theta_n = \frac{1}{n} \sum_{i=1}^{n} f(X_i)$

which is called a Monte Carlo estimator of $\mathbb{E}(f(X))$.

The results of the Monte Carlo estimates for different sample sizes $n = 1, \ldots, 50$ and different integrand functions $f$ are given in Fig. 2.1.

Find the mean and the variance of the estimator and give a Monte Carlo approximation for the expression of the variance.


The variance of the Monte Carlo estimator is $\mathbb{V}(\hat\theta_n) = \sigma^2(f)/n$, where

$\sigma^2(f) = \mathbb{V}(f(X_1)) = \int_{-\infty}^{+\infty} (f(x) - \theta)^2\, \mathbb{I}_{[0,1]}(x)\,dx$

For the different $f$ we find the analytical solution of the integral $\theta(f)$ (see also the horizontal dotted lines in Fig. 2.1):

1. For $f(x) = x$

(2.6) $\mathbb{E}(f(X_1)) = \int_0^1 x\,dx = \left[\tfrac{1}{2}x^2\right]_0^1 = 1/2$

2. For $f(x) = x^2$

(2.7) $\mathbb{E}(f(X_1)) = \int_0^1 x^2\,dx = \left[\tfrac{1}{3}x^3\right]_0^1 = 1/3$

3. For $f(x) = \cos(\pi x)$

(2.8) $\mathbb{E}(f(X_1)) = \int_0^1 \cos(\pi x)\,dx = \left[\tfrac{1}{\pi}\sin(\pi x)\right]_0^1 = 0$

    2.3 Asymptotic Properties

Under the i.i.d. and finite variance assumptions we have

(2.9) $\hat\theta_n \xrightarrow{a.s.} \theta$

(2.10) $\sqrt{n}\,(\hat\theta_n - \theta) \xrightarrow{D} \mathcal{N}(0, \sigma^2(f))$

For the different $f$ we have


1. For $f(x) = x$

$\mathbb{V}(f(X_1)) = \mathbb{E}(f(X_1)^2) - (\mathbb{E}(f(X_1)))^2 = \int_0^1 x^2\,dx - \left(\int_0^1 x\,dx\right)^2 = 1/3 - 1/4 = 1/12$

2. For $f(x) = x^2$

$\mathbb{V}(f(X_1)) = 1/5 - 1/9 = 4/45$

3. For $f(x) = \cos(\pi x)$

$\mathbb{V}(f(X_1)) = 1/2 - 0 = 1/2$
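For instance, the second and third values follow from the raw moments of the standard uniform distribution:

$\mathbb{E}(f(X_1)^2) = \int_0^1 x^4\,dx = \tfrac{1}{5}, \qquad (\mathbb{E}(f(X_1)))^2 = \left(\tfrac{1}{3}\right)^2 = \tfrac{1}{9}, \qquad \mathbb{V}(f(X_1)) = \tfrac{1}{5} - \tfrac{1}{9} = \tfrac{4}{45}$

for $f(x) = x^2$, while $\int_0^1 \cos^2(\pi x)\,dx = \tfrac{1}{2}$ together with $\mathbb{E}(\cos(\pi X_1)) = 0$ gives $\mathbb{V}(f(X_1)) = \tfrac{1}{2}$ for $f(x) = \cos(\pi x)$.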

When the variance $\mathbb{V}(f(X_1))$ is unknown one can use the Monte Carlo estimator

(2.11) $\hat\sigma^2_n(f) = \frac{1}{n-1} \sum_{i=1}^{n} \big(f(X_i) - \hat\theta_n\big)^2$

The empirical approximations of the asymptotic variances are given in Fig. 2.2.

Figure 2.2: Monte Carlo variance estimates $\hat\sigma^2_n(f)$ (solid lines) for different sample sizes $n = 1, \ldots, 50$ and the true value $\sigma^2(f)$ (horizontal dotted lines).

Exercise: use the asymptotic distribution and the approximation of the asymptotic variance to find the 5% confidence intervals of the MC estimator of $\theta$.

    2.4 Optimal Number of MC Samples

It is possible to use the asymptotic properties of a MC estimator to find the optimal number $n$ of samples that are necessary to reach a given accuracy level $\varepsilon$, for a given confidence level $1-\alpha$, in the Monte Carlo estimation of $\theta$. The asymptotic results allow us to find $n$ such that

(2.12) $\Pr\left(|\hat\theta_n - \theta| \le \varepsilon\right) \approx 2\,\Phi\!\left(\varepsilon\sqrt{n/\sigma^2(f)}\right) - 1 = 1 - \alpha$

that is

(2.13) $x_\alpha = \varepsilon\sqrt{n/\sigma^2(f)}$, i.e. $n = \dfrac{x_\alpha^2\,\sigma^2(f)}{\varepsilon^2}$


where $x_\alpha = \Phi^{-1}(1 - \alpha/2)$, with $\Phi^{-1}$ the inverse cumulative distribution function of a standard normal.

When the variance $\sigma^2(f)$ is unknown one can use the Monte Carlo estimator $\hat\sigma^2_n(f)$ and then apply a similar asymptotic argument. In this case the optimal number of simulations should satisfy the following relationship

(2.14) $\hat\sigma^2_n(f) \le \dfrac{n\,\varepsilon^2}{x_\alpha^2}$

One can check the condition iteratively:

1. Start with $n_1$ MC samples $X_1, \ldots, X_{n_1}$.

2. If $\hat\sigma^2_{n_1}(f) \le n_1\,\varepsilon^2 / x_\alpha^2$ then stop, otherwise

3. evaluate $k_1 = \lfloor x_\alpha^2\,\hat\sigma^2_{n_1}(f)/\varepsilon^2 \rfloor - n_1$ and generate $k_1$ additional samples $X_{n_1+1}, \ldots, X_{n_1+k_1}$ ($\lfloor x \rfloor$ indicates the integer part of $x$).

Exercise: write a Matlab code for computing the optimal number of samples that are needed to estimate $\theta(f)$ for the different integrand functions $f$ given in Section 1 and for the accuracy level $\varepsilon = 0.001$.
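One possible sketch of this iterative rule in Matlab (the integrand, the initial sample size and the confidence level are illustrative choices; norminv is from the Statistics Toolbox):

% Iteratively increase the number of MC samples until condition (2.14) holds
% (sketch; integrand, n1 and alpha are illustrative choices).
f     = @(x) x.^2;               % one of the integrands of Section 1
eps0  = 0.001;                   % accuracy level epsilon
alpha = 0.05;                    % so that 1-alpha = 0.95
xa    = norminv(1 - alpha/2);    % x_alpha = Phi^{-1}(1 - alpha/2)
n     = 100;                     % initial sample size n_1
x     = rand(n,1);               % uniform samples
while var(f(x)) > n*eps0^2/xa^2
    k = max(floor(xa^2*var(f(x))/eps0^2) - n, 1);   % additional samples needed
    x = [x; rand(k,1)];
    n = n + k;
end
fprintf('n = %d, MC estimate = %f\n', n, mean(f(x)));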

2.5 Appendix - Matlab Code

% Uniform Random Number
% Monte Carlo method as an approximated integration technique
% integrate f(x) on the [0,1] interval
% solution: 1/2, 1/3, and 0
clc;
n=50;
x=rand(n,1);
gav=zeros(n,3);
gavvar=NaN(n,3);
gav(1,1)=x(1,1);
gav(1,2)=x(1,1)^2;
gav(1,3)=cos(pi*x(1,1));
for i=2:n
gav(i,1)=sum(x(1:i))/i;
gav(i,2)=sum(x(1:i).^2)/i;
gav(i,3)=sum(cos(pi*x(1:i)))/i;
gavvar(i,1)=var(x(1:i));


gavvar(i,2)=var(x(1:i).^2);
gavvar(i,3)=var(cos(pi*x(1:i)));
end
%
%
%%%%%%%%% Graphics (mean) %%%%%%%%%%
figure(1);
subplot(3,1,1);
plot(gav(:,1));
line((1:n),ones(n,1)/2,'color','red');
legend('Empirical Average','Theoretical Mean',...
'Location','NorthEastOutside');
title('f(x)=x');
%
subplot(3,1,2);
plot(gav(:,2));
line((1:n),ones(n,1)/3,'color','red');
legend('Empirical Average','Theoretical Mean',...
'Location','NorthEastOutside');
title('f(x)=x^2');
%
subplot(3,1,3);
plot(gav(:,3));
line((1:n),ones(n,1)*0,'color','red');
legend('Empirical Average','Theoretical Mean',...
'Location','NorthEastOutside');
title('f(x)=cos(\pi x)');

To export a picture to a .eps file one can use

%%%%%%%%% Export a picture %%%%%%%%%%%%%
dire='C:\Dottorato\Teaching\SummerSchoolBertinoro';
figu='\TutorialAntonietta\TutorialRobAnt\Figure\';
figname=strvcat([strcat(dire,figu,'MC1.eps')]);
print(gcf,'-depsc2', figname);
%
%%%%%%%%% Graphics (variance) %%%%%%%%%%
figure(2);
subplot(3,1,1);
plot(gavvar(:,1));
line((1:n),ones(n,1)/12,'color','red');
legend('Empirical Variance','Theoretical Variance',...
'Location','NorthEastOutside');
title('f(x)=x');
%
subplot(3,1,2);
plot(gavvar(:,2));
line((1:n),ones(n,1)*4/45,'color','red');
legend('Empirical Variance','Theoretical Variance',...
'Location','NorthEastOutside');
title('f(x)=x^2');
%
subplot(3,1,3);
plot(gavvar(:,3));
line((1:n),ones(n,1)*1/2,'color','red');
legend('Empirical Variance','Theoretical Variance',...


'Location','NorthEastOutside');
title('f(x)=cos(\pi x)');


    Chapter 3

    Importance Sampling

    Aim

Define and apply the importance sampling method and study its properties.

    Contents

    1. Importance Sampling (IS)

    2. Properties of the IS Estimators

    3. Generating Student-t Variables

    3.1 Importance Sampling

Let $\pi$ be a probability density function, $f$ a measurable function and

(3.1) $\theta = \mathbb{E}_\pi(f(X)) = \int f(x)\,\pi(x)\,dx$

the integral of interest.

    In importance sampling (see Section 3.3 in Robert and Casella (2004)) a

    distribution g (called importance distribution or instrumental distribution)


is used to apply a change of measure

(3.2) $\theta = \int \frac{\pi(x)}{g(x)}\, f(x)\, g(x)\,dx$

The resulting integral is then evaluated numerically by using an i.i.d. sample $X_1, \ldots, X_n$ from $g$

(3.3) $\hat\theta^{IS}_n = \frac{1}{n} \sum_{i=1}^{n} w(X_i)\, f(X_i)$

where

$w(X_i) = \dfrac{\pi(X_i)}{g(X_i)}, \quad i = 1, \ldots, n$

are called importance weights.
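A minimal Matlab illustration of (3.3), separate from the full experiment reported later in the chapter (the target, the proposal and the test function below are chosen only to give a known answer, $\mathbb{E}_\pi(X^2) = \nu/(\nu-2) = 1.2$ for $\pi = T(12,0,1)$):

% Minimal importance sampling sketch (illustrative choices of pi, g and f):
% estimate E_pi(X^2) for pi = T(12,0,1) using a T(7,0,1) proposal g.
n  = 50000;
x  = random('t', 7, n, 1);                 % i.i.d. sample from g
w  = pdf('t', x, 12) ./ pdf('t', x, 7);    % importance weights pi/g
theta_IS = mean(w .* x.^2)                 % IS estimator (3.3); true value 1.2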

    3.2 Properties of the IS Estimators

The Monte Carlo estimator $\hat\theta^{IS}_n$ of $\theta$ is unbiased:

$\mathbb{E}_g(\hat\theta^{IS}_n) = \int \frac{1}{n}\sum_{i=1}^{n} w(x_i) f(x_i) \prod_{i=1}^{n} g(x_i)\,dx_i = \int \frac{\pi(x_1)}{g(x_1)}\, f(x_1)\, g(x_1)\,dx_1 = \int f(x_1)\,\pi(x_1)\,dx_1 = \theta$

and it converges almost surely to $\theta$ under the assumption $\mathrm{supp}\,g \supseteq \mathrm{supp}\,\pi$.

Nevertheless, the existence of the variance and of a limiting distribution is not guaranteed. We shall notice that $\mathbb{V}_g(\hat\theta^{IS}_n) \le \mathbb{E}_g\big((\hat\theta^{IS}_n)^2\big)$, thus the condition we need to check is the existence of an upper bound for the second-order moment.

3.3 Generating Student-t Variables

Figure 3.1: Importance sampling weights for the proposal distributions $T(\nu^*, 0, 1)$, $\mathcal{N}(0, \nu/(\nu-2))$ and $C(0, 1)$.

    where< 0 and cumulative distribution function

    F(x) = 1

    x

    1

    (1 + ((u )/)2)du

    = 1

    2+

    1

    arctan

    x IR(x)

    The inverse c.d.f. method can be applied in order to generate from the

    Cauchy. IfX=F1(U), where U U[0,1], then X C(, ).
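Inverting $F$ gives the explicit generator (a standard identity) used in the Matlab code below:

$X = F^{-1}(U) = \mu + \sigma\,\tan\!\big(\pi\,(U - \tfrac{1}{2})\big), \qquad U \sim \mathcal{U}_{[0,1]},$

which for $\mu = 0$ and $\sigma = 1$ is exactly the line x3=tan((rand(1,1)-0.5)*pi) appearing in the code.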

From the results in Fig. 3.1 one can see that the importance weights for the Student-t and Cauchy proposals are stable, while the importance weights associated with the normal proposal exhibit some large jumps. For all the functions, the results in Fig. 3.2 show that the normal proposal produces jumps in the progressive averages (green lines) that are due to the unbounded variance of the estimator. However, for the first function the normal proposal behaves quite well when compared with the Cauchy and Student-t proposals. For the


second and third functions the Cauchy proposal seems to converge faster than the Student-t. In all the pictures we also plot (black lines) the approximation obtained with an exact simulation from a Student-t with $\nu = 12$.

Exercise - Use repeated Monte Carlo experiments to find the distribution of the estimator $\hat\theta^{IS}_n(f)$. Plot the 95% and 5% quantiles and the mean of the estimator for $n = 1, \ldots, 50000$.

    The Matlab code is

%%%%%%% Importance weight for T(nustar,0,1)
function w=w1(x,nu,nustar)
w=pdf('t',x,nu)/pdf('t',x,nustar);
end
%
%%%%%%% Importance weight for N(0,nu/(nu-2))
function w=w2(x,nu)
w=pdf('t',x,nu)/pdf('normal',x,0,sqrt(nu/(nu-2)));
end
%
%%%%%%% Importance weight for C(0,1)
function w=w3(x,nu)
w=pdf('t',x,nu)/pdfcauchy(x,0,1);
end
%
clc;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Importance sampling
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
nu=12;
nustar=7;
nIS=50000;
mu1IS=zeros(nIS,4);
mu2IS=zeros(nIS,4);
mu3IS=zeros(nIS,4);
%
mu1IScum=zeros(nIS,4);
mu2IScum=zeros(nIS,4);
mu3IScum=zeros(nIS,4);
%
wIS=zeros(nIS,3);
for i=1:nIS
% Proposal 1
x1=random('t',nustar);
% Proposal 2
x2=random('normal',0,sqrt(nu/(nu-2)));
% Proposal 3
x3=tan((rand(1,1)-0.5)*pi);
%x3=random('normal',0,1)/random('normal',0,1);
% Exact


plot((1:nIS),wIS(:,3));
legend('Cauchy','Location','NorthEast');
set(gca,'FontSize',fs);
figure(2)
plot((1:nIS),mu1IScum(:,1:3));
hold on;
plot((1:nIS),mu1IScum(:,4),'-k');
hold off;
legend('Student-t','Normal','Cauchy','Exact','Location','NorthEast');
ylim([0.00001 0.00015]);
set(gca,'FontSize',fs);
figure(3)
plot((1:nIS),mu2IScum(:,1:3));
hold on;
plot((1:nIS),mu2IScum(:,4),'-k');
hold off;
legend('Student-t','Normal','Cauchy','Exact','Location','NorthEast');
ylim([1 1.4]);
set(gca,'FontSize',fs);
figure(4)
plot((1:nIS),mu3IScum(:,1:3));
hold on;
plot((1:nIS),mu3IScum(:,4),'-k');
hold off;
legend('Student-t','Normal','Cauchy','Exact','Location','NorthEast');
ylim([3 9]);
set(gca,'FontSize',fs);

    This code calls the following function defined by the user

%%%%%%% Cauchy probability density function
function f=pdfcauchy(x,a,b)
f=1/(pi*b*(1+((x-a)/b)^2));
end
%


Figure 3.2: Charts one to three: IS for the different functions $f$. In each chart, the IS estimators for the different proposals (colored lines) and the Monte Carlo estimator with exact simulation from the $T(12, 0, 1)$ (black lines).


    Exercise

    Importance Sampling

Consider a Student-t distribution $T(\nu, \mu, \sigma^2)$ with density

(3.6) $\pi(x) = \dfrac{\Gamma((\nu+1)/2)}{\Gamma(\nu/2)\,\sqrt{\nu\pi}\,\sigma}\left(1 + \dfrac{(x-\mu)^2}{\nu\sigma^2}\right)^{-(\nu+1)/2} \mathbb{I}_{\mathbb{R}}(x)$

w.l.o.g. take $\mu = 0$, $\sigma = 1$ and $\nu = 12$.

Study the performance of the importance sampling estimator $\hat\theta^{IS}_n$ of

(3.7) $\theta = \mathbb{E}_\pi(f(X)) = \int f(x)\,\pi(x)\,dx = \int \dfrac{\pi(x)}{g(x)}\, f(x)\, g(x)\,dx$

when the following instrumental distributions $g(x)$ are used

1. $T(\nu^*, 0, 1)$ with $\nu^* < \nu$ (e.g. $\nu^* = 7$)

2. $\mathcal{N}(0, \nu/(\nu-2))$

3. $C(0, 1)$

for the following test functions

1. $f(x) = \left(\dfrac{\sin(x)}{x}\right)^{5} \mathbb{I}_{(2.1,+\infty)}(x)$


2. $f(x) = \sqrt{\left|\dfrac{x}{1-x}\right|}$

3. $f(x) = \dfrac{x^5}{1 + (x-3)^2}\, \mathbb{I}_{[0,+\infty)}(x)$

    Metropolis-Hastings

Write a M.-H. algorithm to generate $n = 500$ random samples from a zero-mean and independent bivariate normal distribution $\mathcal{N}_2(0_2, I_2)$, with covariance matrix $I_2$ and mean $0_2 = (0, 0)'$. Use alternatively independent and random-walk proposals with variance-covariance matrix $\tau^2 I_2$ (try different values of $\tau^2$).
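One possible sketch of the random-walk variant in Matlab ($\tau^2 = 0.5$ and the starting value are illustrative choices; the independent-proposal variant is analogous):

% Random-walk Metropolis-Hastings targeting N_2(0, I_2) (illustrative sketch)
n    = 500;
tau2 = 0.5;                        % proposal variance tau^2
X    = zeros(n,2);
x    = [0 0];                      % starting value
for i = 1:n
    xp = x + sqrt(tau2)*randn(1,2);            % random-walk proposal
    logacc = -0.5*(xp*xp') + 0.5*(x*x');       % log acceptance ratio for N_2(0,I_2)
    if log(rand) < logacc
        x = xp;                                % accept the proposed value
    end
    X(i,:) = x;                                % store the current state
end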