fonaments d estadistica

FONAMENTS D’ESTADÍSTICA Dr. Josep M. Vilaseca CAPSE 2009

Upload: josep-maria-vilaseca-llobet

Post on 10-Feb-2017


Page 1: Fonaments d estadistica

FONAMENTS D’ESTADÍSTICA

Dr. Josep M. Vilaseca CAPSE 2009

Page 2: Fonaments d estadistica

PROBABILITY

The probability of an event is the proportion of times we would observe it if we repeated the experiment a large number of times
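The long-run frequency idea can be illustrated with a small simulation. This is only a sketch: the die-rolling experiment and the names `relative_frequency`, `roll_die` and `is_six` are hypothetical and chosen for illustration.

```python
import random

def relative_frequency(event, trial, repetitions, seed=0):
    """Proportion of repetitions in which `event` is true for the outcome of `trial`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    hits = sum(1 for _ in range(repetitions) if event(trial(rng)))
    return hits / repetitions

# Hypothetical experiment: roll a fair six-sided die and check for a six.
def roll_die(rng):
    return rng.randint(1, 6)

def is_six(outcome):
    return outcome == 6

# The relative frequency settles near 1/6 as the number of repetitions grows.
for n in (100, 10_000, 100_000):
    print(n, relative_frequency(is_six, roll_die, n))
```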

Page 3: Fonaments d estadistica

THE NORMAL DISTRIBUTION AND THE STANDARD NORMAL DISTRIBUTION

Page 4: Fonaments d estadistica

INFERENCE FROM A SAMPLE MEAN

The mean of the sampling distribution of means is the true population mean

Its standard deviation is the population standard deviation divided by the square root of the sample size (it is called the standard error). It measures the precision of my sample.

Confidence interval: estimated mean +/- multiplier x standard error of the estimate
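A minimal sketch of this calculation follows. It assumes the sample standard deviation is used in place of the unknown population value and that 1.96 is an adequate multiplier (a large-sample assumption). The readings are made up for illustration, and `mean_confidence_interval` is a hypothetical helper name.

```python
import math
import statistics

def mean_confidence_interval(sample, multiplier=1.96):
    """Return (mean, standard_error, lower, upper) for a sample mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    # The sample standard deviation stands in for the unknown population value.
    sd = statistics.stdev(sample)
    se = sd / math.sqrt(n)
    return mean, se, mean - multiplier * se, mean + multiplier * se

# Hypothetical systolic blood pressure readings (mmHg), for illustration only.
readings = [118, 125, 130, 122, 127, 135, 121, 129, 124, 131]
mean, se, lower, upper = mean_confidence_interval(readings)
print(f"mean={mean:.1f}, SE={se:.2f}, 95% CI=({lower:.1f}, {upper:.1f})")
```

With a sample this small, a multiplier taken from the t-distribution would normally replace 1.96.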

Page 5: Fonaments d estadistica

INFERENCE FROM A SAMPLE MEAN

95% confidence interval: we are 95% confident that the true mean in the population lies within this interval

Z and t: using tables we can obtain a probability for the calculated value

P-value: the area under the curve corresponding to values outside the range (-z, z) or (-t, t). That is, the area in the tails of the distribution gives the probability of observing values at least as extreme as the one calculated
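Instead of reading tables, the tail area for a z value can be computed directly. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name `two_sided_p_value` is an illustrative choice, not something from the slides.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Area in both tails of the standard Normal curve, outside (-|z|, +|z|)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# The familiar multiplier 1.96 leaves about 5% of the area in the two tails.
print(round(two_sided_p_value(1.96), 3))
```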

Page 6: Fonaments d estadistica

INFERENCE FROM A SAMPLE MEAN

Null hypothesis: the two population means are the same

Alternative hypothesis: the two population means are not the same

Hypothesis test: we calculate the probability of obtaining the observed data if the null hypothesis were true. If this probability is large we do not reject the null hypothesis; if it is small we reject it (“larger accept, smaller reject”)

Page 7: Fonaments d estadistica

COMPARISON OF TWO MEANS

Paired samples occur when the individual observations in the first sample are matched to individual observations in the second sample. For quantitative data this usually occurs when there are repeated measurements on the same person

Unpaired data occur when individual observations in one sample are independent of individual observations in the other

Page 8: Fonaments d estadistica

COMPARISON OF TWO MEANS

Paired data: we calculate the difference between the first and second measurements, then the mean difference, the standard deviation of the differences and the standard error of the mean difference. We can also use a hypothesis test to assess whether, on average, there is a difference between the paired observations in the population. The null hypothesis is that the mean population difference is zero. We assume that the differences are normally distributed; under the null hypothesis their mean is zero
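The paired calculation can be sketched as below. The before/after values are invented for illustration, and `paired_summary` is a hypothetical helper. The resulting statistic would be compared against a t-distribution with n - 1 degrees of freedom (or the Normal distribution for a large sample).

```python
import math
import statistics

def paired_summary(first, second):
    """Mean difference, SD of differences, SE of the mean difference, test statistic."""
    diffs = [a - b for a, b in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    se_diff = sd_diff / math.sqrt(len(diffs))
    # Under the null hypothesis the mean population difference is zero.
    statistic = (mean_diff - 0) / se_diff
    return mean_diff, sd_diff, se_diff, statistic

# Hypothetical repeated measurements on the same six people.
before = [140, 152, 138, 147, 160, 155]
after = [135, 150, 130, 145, 152, 151]
print(paired_summary(before, after))
```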

Page 9: Fonaments d estadistica

COMPARISON OF TWO MEANS

Unpaired data: we calculate the difference between the two independent means, the standard deviation in each of the two independent samples, and the standard error of the difference in the two independent means, which is a combination of the standard errors of the two independent sample distributions. Using the standard error of the difference in means, we can calculate the confidence interval for the estimated difference and test whether it is significantly different from zero. We can use a z test in the same way as we did before for a single sample mean or for paired samples
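As a sketch of the unpaired calculation: the two standard errors are assumed to combine as the square root of the sum of their squares, which is the usual rule for independent samples. The data and the name `difference_in_means` are hypothetical, and the z approximation strictly calls for larger samples than the ones shown.

```python
import math
import statistics

def difference_in_means(sample_a, sample_b, multiplier=1.96):
    """Difference in means, its standard error, z statistic and confidence interval."""
    se_a = statistics.stdev(sample_a) / math.sqrt(len(sample_a))
    se_b = statistics.stdev(sample_b) / math.sqrt(len(sample_b))
    # Standard errors of independent samples combine as the root of summed squares.
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    z = diff / se_diff
    interval = (diff - multiplier * se_diff, diff + multiplier * se_diff)
    return diff, se_diff, z, interval

# Hypothetical measurements from two independent groups.
group_a = [5.1, 4.8, 5.6, 5.3, 4.9, 5.4, 5.0, 5.2]
group_b = [4.6, 4.9, 4.4, 4.7, 5.0, 4.5, 4.8, 4.3]
print(difference_in_means(group_a, group_b))
```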

Page 10: Fonaments d estadistica

COMPARISON OF TWO MEANS

When the sample size is small, we use the t-distribution to calculate confidence intervals and test hypotheses (for either paired or unpaired data).

To compare independent samples, however, we need to assume that the variances of the two populations are the same.
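Under the equal-variance assumption, the two sample variances are usually pooled into a single estimate. The sketch below shows that standard pooled calculation; `pooled_t` is an illustrative name, and the returned statistic would be looked up in a t table with the returned degrees of freedom.

```python
import math
import statistics

def pooled_t(sample_a, sample_b):
    """t statistic and degrees of freedom, assuming equal population variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * statistics.variance(sample_a)
                  + (n_b - 1) * statistics.variance(sample_b)) / df
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / se_diff
    return t, df

# Tiny made-up samples, for illustration only.
print(pooled_t([1, 2, 3], [2, 3, 4]))
```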

Page 11: Fonaments d estadistica

INFERENCE FROM A SAMPLE PROPORTION (7)

The sampling distribution of a proportion is approximately Normal when the sample is large

The SE of a sample estimate is equal to the standard deviation divided by √n.

95% CI = p ± 1.96 × SE(proportion)

95% CI = p ± 1.96 × √(p(1 – p) / n)
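The interval for a proportion can be sketched as follows. The counts are invented for illustration and `proportion_ci` is a hypothetical helper name.

```python
import math

def proportion_ci(events, n, multiplier=1.96):
    """Sample proportion, its standard error and the confidence limits."""
    p = events / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se, p - multiplier * se, p + multiplier * se

# Hypothetical sample: 30 of 200 subjects have the condition.
p, se, lower, upper = proportion_ci(30, 200)
print(f"p={p:.3f}, SE={se:.4f}, 95% CI=({lower:.3f}, {upper:.3f})")
```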

Page 12: Fonaments d estadistica

INFERENCE FROM A SAMPLE PROPORTION (7)

If we want to assess whether the population proportion has a certain value:

1. First we should state the Null Hypothesis Π = Π0

2. Then we state the Alternative Hypothesis Π ≠ Π0

3. Finally we compute the test statistic z = (p – Π0) / SE(Π)
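The three steps can be sketched as below, assuming the standard error is computed from the null value Π0, that is, as √(Π0(1 – Π0) / n). The counts and the name `one_proportion_z_test` are hypothetical.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(events, n, null_proportion):
    """z statistic and two-sided p-value for H0: population proportion = null_proportion."""
    p = events / n
    # Standard error evaluated under the null hypothesis.
    se_null = math.sqrt(null_proportion * (1 - null_proportion) / n)
    z = (p - null_proportion) / se_null
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical sample: 30 of 200 subjects, tested against a null value of 0.10.
print(one_proportion_z_test(30, 200, 0.10))
```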

Page 13: Fonaments d estadistica

INFERENCE FROM A SAMPLE PROPORTION (7)

Remember: we calculate the SE(Π) assuming the null hypothesis to be true.

Remember: these methods are only reliable if the sample is large (say, if the proportion is less than 0.5 and the number of subjects with the disease is 5 or more).

When these conditions are not satisfied, we use the binomial distribution.

Page 14: Fonaments d estadistica

COMPARISON OF TWO PROPORTIONS (8)

We want to make comparisons between the proportions in two independent populations (case – control study, cohort study, clinical trial).

For a large sample we can use a normal approximation to the binomial distribution

When comparing proportions for independent samples, the first thing we do is calculate the difference between the two proportions

Page 15: Fonaments d estadistica

COMPARISON OF TWO PROPORTIONS (8)

The analysis for comparing two independent proportions is similar to the comparison of two independent means

The standard error for the difference in two proportions is a combination of the standard error of the two independent distributions

Hypothesis test: we use a common proportion (because the two proportions are supposed to be the same) and the pooled standard error
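A sketch of the test with a common (pooled) proportion follows. The counts are invented and `two_proportion_z_test` is an illustrative name.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """z statistic and two-sided p-value for H0: the two population proportions are equal."""
    p1, p2 = x1 / n1, x2 / n2
    # Common proportion, used because H0 says both groups share one proportion.
    pooled = (x1 + x2) / (n1 + n2)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical trial: 30 of 100 events in one arm, 20 of 100 in the other.
print(two_proportion_z_test(30, 100, 20, 100))
```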

Page 16: Fonaments d estadistica

ASSOCIATION BETWEEN TWO CATEGORICAL VARIABLES

When we want to examine the relationship between two categorical variables we tabulate one against the other. This is called a two-way table (also known as a cross-tabulation)

An association exists between two categorical variables if the distribution of one variable varies according to the value of the other

Page 17: Fonaments d estadistica

ASSOCIATION BETWEEN TWO CATEGORICAL VARIABLES

The chi-squared test for a 2x2 table is equivalent to the z-test for comparing two proportions: z is the square root of chi-squared (when no continuity correction is applied).

Fisher’s exact test may also be used.
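The link between the two tests can be checked numerically. The sketch below assumes the uncorrected chi-squared statistic (no continuity correction) and the pooled z statistic for two proportions. The table counts are made up, and both function names are hypothetical.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Uncorrected chi-squared statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def z_two_proportions(x1, n1, x2, n2):
    """Pooled z statistic for comparing two independent proportions."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

# Hypothetical table: rows are groups, columns are event / no event.
chi2 = chi_squared_2x2(30, 70, 20, 80)
z = z_two_proportions(30, 100, 20, 100)
print(chi2, z ** 2)  # the two values agree up to rounding error
```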

Page 18: Fonaments d estadistica

CORRELATION (10)

Do the values of a variable tend to be higher (or lower) for higher values of the other? CORRELATION

What is the value of one of the variables likely to be when we know the value of the other? LINEAR REGRESSION

Page 19: Fonaments d estadistica

CORRELATION (10)

Correlation is used to study the possible linear (straight line) relationship between two quantitative variables. It tells us how strongly the two variables are associated

To measure the degree of linear association we calculate a correlation coefficient

The standard method is to calculate Pearson’s correlation coefficient, denoted r

Page 20: Fonaments d estadistica

CORRELATION (10)

Pearson’s correlation coefficient

Measures the scatter of the points around an underlying linear (straight line) trend

Can take any value from -1 to +1

If there is no linear relationship then the correlation is zero. But be careful: there can still be a strong non-linear relationship between the two variables.
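A minimal sketch of Pearson's r, computed from sums of squares and cross-products, with two made-up data sets: one perfectly linear and one with an exact quadratic relationship whose correlation is nevertheless zero. The name `pearson_r` is an illustrative choice.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Perfect straight-line relationship.
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))

# Exact non-linear relationship (y = x squared) with no linear trend.
xs = [-2, -1, 0, 1, 2]
print(pearson_r(xs, [x ** 2 for x in xs]))
```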

Page 21: Fonaments d estadistica

CORRELATION (10)

We can think of the square of r as the proportion of the variability in the y variable that is accounted for by the linear relationship with the x variable

Assumptions for use of correlation: the two variables have an approximately Normal distribution, and all observations should be independent

Causation cannot be directly inferred from a strong correlation coefficient

Page 22: Fonaments d estadistica

LINEAR REGRESSION (11)

Regression studies the relationship between two variables when one of them depends on the other. This also allows one variable to be estimated given the value of the other.
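As a sketch, a straight line y = intercept + slope × x can be fitted by ordinary least squares and then used to estimate y for a new x. Least squares is assumed here as the fitting method. The data and the names `fit_line` and `predict` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, x):
    """Estimated value of the dependent variable for a given x."""
    return intercept + slope * x

# Made-up data lying exactly on the line y = 1 + 2x.
intercept, slope = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(intercept, slope, predict(intercept, slope, 10))
```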

Page 23: Fonaments d estadistica

MULTI-VARIABLE ANALYSIS 1: STRATIFICATION (12)

Summary measures, standardisation, Mantel-Haenszel

Page 24: Fonaments d estadistica

MULTI-VARIABLE ANALYSIS 2: MULTIPLE LINEAR REGRESSION