The Standard Normal Distribution
PSY440
June 3, 2008

Outline of Class Period
Article Presentation (Kristin M)
Recap of two items from last time:
  Using Excel to compute descriptive statistics
  Using SPSS to generate histograms
Standardization (z-transformation) of scores
The normal distribution
  Properties of the normal curve
  Standard normal distribution & the unit normal table
Intro to probability theory and hypothesis testing

Using Excel to Compute Mean & SD
Step 1: Compute the mean of height with the formula bar.
Step 2: Create deviation scores. Write a formula that subtracts the mean from each raw score, and apply the formula to all of the cells in a blank column next to the column of raw scores.
Step 3: Square the deviations by creating a formula and applying it to the cells in the next blank column.
Step 4: Use the formula bar to add the squared deviations, divide by (n - 1), and take the square root of the result.
Step 5: Check the result by computing the SD with the formula bar.

Using SPSS to Generate Histograms
[Figures: histograms labeled "Most common answer" and "Most distinctive answer"]
How did this happen? The shape of a histogram changes depending on the intervals used on the x axis. For very large samples and truly continuous variables, the shape smooths out, but with smaller samples the shape can change considerably if you change the size of the intervals. Make sure you are in charge of SPSS and not vice versa!
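As a cross-check on the Excel steps, here is a minimal Python sketch of the same by-hand SD procedure. The heights are made-up values (the class data are not reproduced here), and `interval_counts` is a hypothetical helper, not an SPSS feature. It groups the same seven scores with two interval widths to show how the apparent shape of a histogram depends on the intervals chosen.

```python
import math
import statistics
from collections import Counter

heights = [62, 65, 66, 68, 70, 71, 74]  # hypothetical heights in inches

# Step 1: mean of the raw scores
mean = sum(heights) / len(heights)
# Step 2: deviation scores (each raw score minus the mean)
deviations = [x - mean for x in heights]
# Step 3: squared deviations
squared = [d ** 2 for d in deviations]
# Step 4: sum of squared deviations divided by (n - 1), then the square root
sd = math.sqrt(sum(squared) / (len(heights) - 1))
# Step 5: check against the built-in sample SD, which also divides by n - 1
assert math.isclose(sd, statistics.stdev(heights))
print(f"mean = {mean:.2f}, SD = {sd:.2f}")  # mean = 68.00, SD = 4.04


def interval_counts(scores, width, start):
    """Frequencies for intervals [start, start + width), [start + width, ...), ...

    Empty intervals are omitted from the result.
    """
    counts = Counter(int((x - start) // width) for x in scores)
    return {start + k * width: counts[k] for k in sorted(counts)}


# The same seven scores, grouped two different ways
print(interval_counts(heights, 2, 60))  # {62: 1, 64: 1, 66: 1, 68: 1, 70: 2, 74: 1}
print(interval_counts(heights, 5, 60))  # {60: 1, 65: 3, 70: 3}
```

With width 2 the scores look almost flat; with width 5 they look piled up in the upper intervals, even though nothing about the data changed.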

SPSS has default settings for many of its operations that may or may not be what you want. You can tell SPSS how many intervals you want in your histogram, or how large you want the intervals to be.
[Figure: histogram with 16 intervals]
In Legacy Dialogs, choose Interactive and then Histogram (see note).
In Chart Builder, choose Histogram, then Element Properties, then click Set Parameters.

The Z Transformation
If you know the mean and standard deviation of a distribution (sample or population; we won't worry about which one, since your textbook doesn't), you can convert a given score into a z score, or standard score. This score is informative because it tells you where that score falls relative to the other scores in the distribution.

Locating a score
Where is our raw score within the distribution? The natural reference point is the mean (since it is usually easy to find), so we subtract the mean from the score to find the deviation score, $X - \mu$.
The direction is given by the sign (negative or positive) of the deviation score.
The distance is the value of the deviation score.

[Figure: a distribution with $\mu = 100$. $X_1 = 162$ lies above the mean ($X_1 - 100 = +62$); $X_2 = 57$ lies below it ($X_2 - 100 = -43$).]

Transforming a score
The distance is the value of the deviation score. However, this distance is measured in the units of measurement of the score. Convert the raw score to a standard (neutral) score, in this case a z-score:
$z = \dfrac{X - \mu}{\sigma}$
where $X$ is the raw score, $\mu$ is the population mean, and $\sigma$ is the population standard deviation.

Transforming scores
With $\mu = 100$ and $\sigma = 50$:
$X_1 = 162$: $z = \dfrac{162 - 100}{50} = +1.24$
$X_2 = 57$: $z = \dfrac{57 - 100}{50} = -0.86$

A z-score specifies the precise location of each X value within a distribution.
Direction: the sign of the z-score (+ or -) signifies whether the score is above or below the mean.
Distance: the numerical value of the z-score specifies the distance from the mean by counting the number of standard deviations between $X$ and $\mu$.

Transforming a distribution
We can transform any and all observations in a distribution to z-scores if we know the distribution's mean and standard deviation.

We call this transformed distribution a standardized distribution. Standardized distributions are used to make dissimilar distributions comparable (e.g., your height and weight). One of the most common standardized distributions is the z-distribution.

Properties of the z-score distribution
[Figure: z transformation of a distribution with $\mu = 100$, $\sigma = 50$ into a distribution with $\mu = 0$, $\sigma = 1$]
$X_{mean} = 100$: $z_{mean} = \dfrac{100 - 100}{50} = 0$
$X_{+1\,SD} = 150$: $z_{+1\,SD} = \dfrac{150 - 100}{50} = +1$
$X_{-1\,SD} = 50$: $z_{-1\,SD} = \dfrac{50 - 100}{50} = -1$
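A minimal sketch of the z transformation in Python, assuming the population values $\mu = 100$ and $\sigma = 50$ from the examples above; `z_score` is a hypothetical helper name. For each example X it prints the deviation score (direction and distance in raw units) next to the z-score (distance in SD units).

```python
def z_score(x, mu, sigma):
    """Standard score: number of SDs that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


mu, sigma = 100, 50  # population mean and SD used in the examples above

for x in (162, 57, 150, 100, 50):
    deviation = x - mu  # sign gives direction, value gives distance in raw units
    print(f"X = {x:3d}  deviation = {deviation:+d}  z = {z_score(x, mu, sigma):+.2f}")
```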

Properties of the z-score distribution
Shape: the shape of the z-score distribution is exactly the same as the shape of the original distribution of raw scores. Every score stays in the exact same position relative to every other score in the distribution.
Mean: when raw scores are transformed into z-scores, the mean will always be 0.
Standard deviation: when any distribution of raw scores is transformed into z-scores, the standard deviation will always be 1.

From z to raw score
We can also transform a z-score back into a raw score if we know the mean and standard deviation of the original distribution:
$z = \dfrac{X - \mu}{\sigma} \;\Rightarrow\; z\sigma = X - \mu \;\Rightarrow\; X = z\sigma + \mu$
[Figure: distribution with $\mu = 100$, $\sigma = 50$]
For $z = -0.60$: $X = (-0.60)(50) + 100 = 70$

Let's try it with our data
To transform the height data into standard scores, use the formula bar in Excel to subtract the mean and divide by the standard deviation. You can also use STANDARDIZE(x, mean, standard_dev). Show with shoe size: observe how height and shoe size can be compared more easily as standard (z) scores.

Z-transformations with SPSS
You can also do this in SPSS. Use Analyze > Descriptive Statistics > Descriptives, and check the box that says "Save standardized values as variables."
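The properties above can be checked numerically. This sketch standardizes a small, made-up set of heights and confirms that the z-scores keep the same order (shape), have mean 0, and have SD 1; it also inverts the transformation for the $z = -0.60$ example. `standardize` and `raw_from_z` are hypothetical helpers that play roughly the same role as Excel's STANDARDIZE and the SPSS "Save standardized values" option. Using the population SD (`pstdev`) is an assumption, in line with not worrying about sample versus population here.

```python
import statistics


def standardize(scores):
    """Convert each score to z using the distribution's own mean and SD.

    Uses the population SD (pstdev). Using the sample SD (stdev) instead also
    gives mean 0, and an SD of 1 when the z-scores are re-measured with stdev.
    """
    mu = statistics.mean(scores)
    sigma = statistics.pstdev(scores)
    return [(x - mu) / sigma for x in scores]


def raw_from_z(z, mu, sigma):
    """Invert the z transformation: X = z * sigma + mu."""
    return z * sigma + mu


heights = [61, 64, 66, 67, 69, 72]  # hypothetical heights (inches), in ascending order
z_heights = standardize(heights)

print([round(z, 2) for z in z_heights])  # same order as the raw scores
print(raw_from_z(-0.60, 100, 50))        # the worked example above
```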

The Normal Distribution
The normal distribution is a commonly found distribution that is symmetrical and unimodal. Not all unimodal, symmetrical curves are Normal, so be careful with your descriptions. It is defined by the following equation:
$Y = \dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(X - \mu)^2 / 2\sigma^2}$
The mean, median, and mode are all equal for this distribution.
[Figure: normal curve, x axis marked from -2 to +2 standard deviations]

This equation provides the x and y coordinates of the graph of the frequency distribution: plug a given value of X into the formula to find the corresponding Y coordinate. Because the function describes a symmetrical curve, the same Y (height) is given by two values of X, representing two scores an equal distance above and below the mean.

As the distance between the observed score (X) and the mean increases, the value of the expression (the Y coordinate) decreases. Thus the frequency of observed scores that are very high or very low relative to the mean is low, and as the difference between the observed score and the mean gets very large, the frequency approaches 0.

As the distance between the observed score and the mean decreases (i.e., as the observed value approaches the mean), the value of the expression increases. The maximum value of Y (the mode, or the peak of the curve) is reached when the observed score equals the mean; hence the mean equals the mode.

The integral of the function gives the area under the curve (remember this if you took calculus?). The curve is asymptotic: its tails approach the x axis but never reach it. The integral also has no closed-form solution, but it is possible to calculate numerically the proportion of the area under the curve represented by a range of values (e.g., for z values between -1 and 1).

The Unit Normal Table
z       .00      .01
-3.4    0.0003   0.0003
-3.3    0.0005   0.0005
 :       :        :
 0      0.5000   0.5040
 :       :        :
 1.0    0.8413   0.8438
 :       :        :
 3.3    0.9995   0.9995
 3.4    0.9997   0.9997

The normal distribution is often transformed into z-scores, and tables of the standard normal distribution give areas under the curve for each z score. The unit normal table shown here contains the proportion of scores in the tail to the left of each z score of a Normal distribution. A table can instead give the precise proportion of scores between the mean (z score of 0) and any other z score in a Normal distribution.

Because the curve is symmetrical, a table that gives the proportion between the mean and z needs to list only positive z scores. The .00 column corresponds to column (3) in Table B of your textbook. Note that for z = 0 (i.e., at the mean), the proportion of scores to the left is .5; hence, mean = median.

Using the Unit Normal Table
50%-34%-14% rule
Similar to the 68%-95%-99.7% rule: 50% of the scores fall on each side of the mean, about 34% fall between the mean and 1 SD, and about 14% fall between 1 and 2 SD.
[Figure: normal curve marked from -2 to +2 SD; 34.13% between the mean and 1 SD, 13.59% between 1 and 2 SD, 2.28% beyond 2 SD]
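The density equation and the unit normal table can both be reproduced in a few lines of Python, offered as a sketch. `normal_pdf` codes the equation for the normal curve; `unit_normal_cdf` uses the standard identity $\Phi(z) = \tfrac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$ with `math.erf` from the standard library; `area_between` is a hypothetical helper that integrates numerically (midpoint rule), which is how areas are obtained when the integral has no closed form. The last lines recover the 34.13%, 13.59%, and 2.28% figures.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height Y of the normal curve at X."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)


def unit_normal_cdf(z):
    """Proportion of the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def area_between(a, b, steps=10_000):
    """Numerical area under the standard normal curve from a to b (midpoint rule)."""
    width = (b - a) / steps
    return width * sum(normal_pdf(a + (i + 0.5) * width) for i in range(steps))


# A few entries of the unit normal table
for z in (-3.4, -3.3, 0.0, 0.01, 1.0, 1.01, 3.3, 3.4):
    print(f"{z:5.2f}  {unit_normal_cdf(z):.4f}")

# The 50%-34%-14% rule, with the exact percentages
mean_to_1 = unit_normal_cdf(1) - unit_normal_cdf(0)
one_to_2 = unit_normal_cdf(2) - unit_normal_cdf(1)
beyond_2 = 1 - unit_normal_cdf(2)
print(f"{mean_to_1:.2%}  {one_to_2:.2%}  {beyond_2:.2%}")

# Numerical integration agrees with the closed-form erf expression
print(f"area from -1 to 1: {area_between(-1, 1):.4f}")
```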

At z = +1: 15.87% (13.59% + 2.28%) of the scores are to the right of the score, so 100% - 15.87% = 84.13% are to the left.

Steps for figuring the percentage above or below a particular raw or z score:
1. Convert the raw score to a z score (if necessary).
2. Draw a normal curve, mark where the z score falls on it, and shade in the area for which you are finding the percentage.
3. Make a rough estimate of the shaded area's percentage (using the 50%-34%-14% rule).
4. Find the exact percentage using the unit normal table.
5. If needed, subtract the percentage from 100%.
6. Check that the exact percentage is within the range of the estimate from Step 3.

SAT Example problems
The population parameters for the SAT are $\mu = 500$ and $\sigma = 100$, and scores are Normally distributed. Suppose that you got a 630 on the SAT. What percent of the people who take the SAT get your score or lower?
$z = \dfrac{X - \mu}{\sigma} = \dfrac{630 - 500}{100} = 1.3$
From the table, the proportion to the left of z = 1.3 is .9032. So 90.32% got your score or lower.
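The SAT example can be checked with Python's standard-library `statistics.NormalDist` (Python 3.8+), as a minimal sketch. `cdf` returns the proportion at or below a score, which is what the unit normal table gives once the score is converted to z.

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)  # SAT population parameters from the example
score = 630

z = (score - sat.mean) / sat.stdev   # Step 1: convert to a z score
below = sat.cdf(score)               # proportion at or below 630
above = 1 - below                    # Step 5: subtract from 100% for the upper tail

print(f"z = {z:.1f}")
print(f"at or below {score}: {below:.2%}; above: {above:.2%}")
```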

That leaves 100% - 90.32% = 9.68% above this score.

The Normal Distribution
You can go in the other direction, too.
Steps for figuring z scores and raw scores from percentages:
1. Draw a normal curve and shade in the approximate area for the percentage (using the 50%-34%-14% rule).
2. Make a rough estimate of the z score where the shaded area starts.
3. Find the exact z score using the unit normal table.
4. Check that your z score is similar to the rough estimate from Step 2.
5. If you want to find a raw score, convert the z score using the mean and standard deviation ($X = z\sigma + \mu$).

Example: What z score is at the 75th percentile (at or above 75% of the scores)?
1. Draw a normal curve and shade in the approximate area for the percentage (using the 50%-34%-14% rule).
2. Make a rough estimate of the z score where the shaded area starts: between .5 and 1.
3. Find the exact z score using the unit normal table: a little less than .7.
4. Check that your z score is similar to the rough estimate from Step 2.
5. If you want to find a raw score, convert the z score using the mean and standard deviation.

Finding the proportion of scores falling between two observed scores
1. Convert each score to a z score.
2. Draw a graph of the normal distribution and shade in the area to be identified.
3. Identify the area below the higher z score using the unit normal table.
4. Identify the area below the lower z score using the unit normal table.
5. Subtract Step 4 from Step 3. This is the proportion of scores that falls between the two observed scores.

Example: What proportion of scores falls between the mean and .2 standard deviations above the mean?
1. Convert each score to a z score (mean = 0, other score = .2).
2. Draw a graph of the normal distribution and shade in the area to be identified.
3. Identify the area below the higher z score using the unit normal table: for z = .2, the proportion to the left = .5793.
4. Identify the area below the lower z score using the unit normal table: for z = 0, the proportion to the left = .5.
5. Subtract Step 4 from Step 3: .5793 - .5 = .0793.
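Going the other direction (percentage to z) uses the inverse of the cumulative proportion, which `NormalDist.inv_cdf` provides. The sketch below finds the 75th-percentile z, converts it to a raw score (reusing the SAT parameters from the earlier example purely for illustration), and implements Steps 3 to 5 of the between-two-scores procedure as a hypothetical helper, `proportion_between`.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mu = 0, sigma = 1

# Percentage -> z: the z score with 75% of the scores at or below it
z75 = std_normal.inv_cdf(0.75)  # a little less than .7
# z -> raw score, X = z * sigma + mu (SAT parameters reused for illustration)
sat_75 = NormalDist(mu=500, sigma=100).inv_cdf(0.75)


def proportion_between(z1, z2):
    """Area below the higher z minus area below the lower z (Steps 3 to 5)."""
    low, high = sorted((z1, z2))
    return std_normal.cdf(high) - std_normal.cdf(low)


print(f"75th percentile: z = {z75:.3f}, SAT score = {sat_75:.1f}")
print(f"mean to +.2 SD: {proportion_between(0, 0.2):.4f}")
print(f"-.6 SD to -.2 SD: {proportion_between(-0.6, -0.2):.4f}")
```

Exact areas can differ from the table-based answers in the fourth decimal place (about .1465 versus .1464 for the -.6 to -.2 example) because the table values are rounded.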

So about 8% of the observations fall between the mean and .2 SD.

The Normal Distribution
Example 2: What proportion of scores falls between -.2 standard deviations and -.6 standard deviations?
1. Convert each score to a z score (-.2 and -.6).
2. Draw a graph of the normal distribution and shade in the area to be identified.
3. Identify the area below the higher z score using the unit normal table: for z = -.2, by symmetry, the proportion to the left = 1 - .5793 = .4207.
4. Identify the area below the lower z score using the unit normal table: for z = -.6, the proportion to the left = 1 - .7257 = .2743.
5. Subtract Step 4 from Step 3: .4207 - .2743 = .1464.
About 15% of the observations fall between -.2 and -.6 SD.

Hypothesis testing
Example: Testing the effectiveness of a new memory treatment for patients with memory problems
Our pharmaceutical company develops a new drug treatment that is designed to help patients with impaired memories. Before we market the drug, we want to see whether it works. The drug is designed to work on all memory patients, but we can't test them all (the population). So we decide to use a sample and conduct the following experiment. Based on the results from the sample, we will draw conclusions about the population.

Hypothesis testing
Example: Testing the effectiveness of a new memory treatment for patients with memory problems
[Figure: memory patients who receive the memory treatment make 55 errors on a memory test; memory patients who receive no memory treatment make 60 errors]
Is the 5-error difference:
A real difference due to the effect of the treatment, or
just sampling error?

Testing Hypotheses
Hypothesis testing: a procedure for deciding whether the outcome of a study (results for a sample) supports a particular theory (which is thought to apply to a population).
Core logic of hypothesis testing: consider the probability that the result of a study could have come about if the experimental procedure had no effect. If this probability is low, the scenario of no effect is rejected and the theory behind the experimental procedure is supported.

Basics of Probability
$\text{Probability} = \dfrac{\text{possible successful outcomes}}{\text{all possible outcomes}}$
Probability: the expected relative frequency of a particular outcome.
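The ratio definition can be checked by brute force: list every equally likely outcome and count the successful ones. This is a minimal sketch assuming a fair coin and independent flips; `probability` and `n_heads` are hypothetical helpers. It reproduces the probabilities worked out in the flipping-a-coin examples that follow, including the full distribution for n = 3.

```python
from fractions import Fraction
from itertools import product


def probability(event, n_flips):
    """P(event) = possible successful outcomes / all possible outcomes.

    Assumes a fair coin, so each of the 2**n_flips outcomes is equally likely.
    """
    outcomes = list(product("HT", repeat=n_flips))  # e.g. ('H', 'T', 'H')
    successes = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(successes, len(outcomes))


def n_heads(outcome):
    return outcome.count("H")


p_heads_1 = probability(lambda o: n_heads(o) == 1, 1)         # 1/2
p_two_heads_2 = probability(lambda o: n_heads(o) == 2, 2)     # 1/4
p_at_least_one_2 = probability(lambda o: n_heads(o) >= 1, 2)  # 3/4
p_three_heads_3 = probability(lambda o: n_heads(o) == 3, 3)   # 1/8
p_at_least_two_3 = probability(lambda o: n_heads(o) >= 2, 3)  # 1/2
p_all_same_3 = probability(lambda o: n_heads(o) in (0, 3), 3) # 1/4

# Full distribution of the number of heads in n = 3 flips
distribution = {k: probability(lambda o, k=k: n_heads(o) == k, 3) for k in range(4)}
for k, p in distribution.items():
    print(k, float(p))
```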

Outcome: the result of an experiment.

Flipping a coin example (n = 1 flip)
What is the probability of getting heads?
$\text{Probability} = \dfrac{\text{possible successful outcomes}}{\text{all possible outcomes}} = \dfrac{1}{2} = 0.5$
One outcome is classified as heads, out of a total of two outcomes.

Flipping a coin example (n = 2 flips)
[Figure: the four possible outcomes, with 2, 1, 1, and 0 heads]
What is the probability of getting two heads? One two-heads outcome out of four total outcomes = 0.25.
This situation is known as the binomial: # of outcomes = $2^n$.
What is the probability of getting at least one heads? Three at-least-one-heads outcomes out of four total outcomes = 0.75.

Flipping a coin example (n = 3 flips)
$2^n = 2^3 = 8$ total outcomes:
Outcome   Number of heads
HHH       3
HHT       2
HTH       2
HTT       1
THH       2
THT       1
TTH       1
TTT       0

Distribution of possible outcomes (n = 3 flips)
X (number of heads)   f   p
3                     1   .125
2                     3   .375
1                     3   .375
0                     1   .125
[Figure: bar graph of probability (0 to .4) against number of heads (0 to 3)]

We can make predictions about the likelihood of outcomes based on this distribution:
What's the probability of flipping three heads in a row? p = 0.125
What's the probability of flipping at least two heads in three tosses? p = 0.375 + 0.125 = 0.50
What's the probability of flipping all heads or all tails in three tosses? p = 0.125 + 0.125 = 0.25

Hypothesis testing
Given the distribution of possible outcomes (of a particular sample size, n), we can make predictions about the likelihood of outcomes based on this distribution. In hypothesis testing, we compare our observed samples with the distribution of possible samples (transformed into standardized distributions). This distribution of possible outcomes is often Normally distributed.

Inferential statistics
Hypothesis testing: a five step program
Step 1: State your hypotheses
Step 2: Set your decision criteria
Step 3: Collect your data
Step 4: Compute your test statistics
Step 5: Make a decision about your null hypothesis

Step 1: State your hypotheses, as a research hypothesis and a null hypothesis about the populations.
Null hypothesis (H0): this is the one that you test. There are no differences between conditions (no effect of treatment).
Research hypothesis (HA): generally, not all groups are equal.
You aren't out to prove the alternative hypothesis.

If you reject the null hypothesis, then you're left with support for the alternative(s) (NOT proof!).

Testing Hypotheses
Hypothesis testing: a five step program
Step 1: State your hypotheses
In our memory example experiment:
One-tailed (direction specified): our theory is that the treatment should improve memory (fewer errors).
H0: $\mu_{Treatment} \ge \mu_{No\ Treatment}$
HA: $\mu_{Treatment} < \mu_{No\ Treatment}$
Two-tailed (no direction specified): our theory is that the treatment has an effect on memory.
H0: $\mu_{Treatment} = \mu_{No\ Treatment}$
HA: $\mu_{Treatment} \ne \mu_{No\ Treatment}$

One-Tailed and Two-Tailed Hypothesis Tests
Directional hypotheses: one-tailed test
Nondirectional hypotheses: two-tailed test

Step 2: Set your decision criteria
Your alpha ($\alpha$) level will be your guide for when to reject or fail to reject the null hypothesis. It is based on the probability of making a certain type of error.
Step 3: Collect your data
Step 4: Compute your test statistics
Descriptive statistics (means, standard deviations, etc.)
Inferential statistics (z-test, t-tests, ANOVAs, etc.)
Step 5: Make a decision about your null hypothesis
Based on the outcomes of the statistical tests, researchers will either:
Reject the null hypothesis, or
Fail to reject the null hypothesis
This could be the correct conclusion or the incorrect conclusion.

Error types
Type I error ($\alpha$): concluding that there is a difference between groups (an effect) when there really isn't. $\alpha$ is sometimes called the significance level or alpha level. We try to minimize this (keep it low).
Type II error ($\beta$): concluding that there isn't an effect, when there really is. Related to the statistical power of a test ($1 - \beta$).

Error types
Rows are the experimenter's conclusions; columns are the real world (truth).
                         H0 is correct              H0 is wrong
                         (there really isn't        (there really is
                         an effect)                 an effect)
Reject H0                Type I error               Correct
("I conclude that
there is an effect")
Fail to reject H0        Correct                    Type II error
("I can't detect
an effect")
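The meaning of $\alpha$ and $\beta$ can be illustrated by simulation. In this sketch every sample is first drawn from a population where H0 is true ($\mu = 60$, $\sigma = 8$, n = 16, the values used in the memory example later on), so every rejection is a Type I error and the rejection rate should land near $\alpha = 0.05$. A second run assumes, purely for illustration, that the treated population truly has $\mu = 55$; every failure to reject there is a Type II error. The lower-tailed z-test, the seed, and the number of simulated studies are assumptions of the sketch.

```python
import random
from statistics import NormalDist

ALPHA = 0.05
Z_CRIT = NormalDist().inv_cdf(ALPHA)  # lower-tail cutoff, about -1.645
MU0, SIGMA, N, TRIALS = 60, 8, 16, 20_000

rng = random.Random(440)  # fixed seed so runs are repeatable


def rejects_h0(sample, mu, sigma):
    """One-tailed (lower tail) one-sample z-test: True if H0 is rejected."""
    z = (sum(sample) / len(sample) - mu) / (sigma / len(sample) ** 0.5)
    return z < Z_CRIT


def rejection_rate(true_mu):
    """Share of simulated studies (samples of size N) that reject H0: mu = MU0."""
    hits = 0
    for _ in range(TRIALS):
        sample = [rng.gauss(true_mu, SIGMA) for _ in range(N)]
        hits += rejects_h0(sample, MU0, SIGMA)
    return hits / TRIALS


# H0 true: every rejection is a Type I error
type_i_rate = rejection_rate(true_mu=60)
# Hypothetical true effect (mu = 55): every non-rejection is a Type II error
type_ii_rate = 1 - rejection_rate(true_mu=55)

print(f"Type I error rate: {type_i_rate:.3f} (alpha = {ALPHA})")
print(f"Type II error rate: {type_ii_rate:.3f}; power = {1 - type_ii_rate:.3f}")
```

Under these assumptions the Type II error rate should come out near .20 (power near .80); a larger sample or a bigger true effect would lower it.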

Performing your statistical test
What are we doing when we test the hypotheses?
Real world (truth):
H0 is true (no treatment effect): one population. The people in the memory treatment sample ($X_A$) are the same as those in the population of memory patients.
H0 is false (there is a treatment effect): two populations. The people in the memory treatment sample ($X_A$) aren't the same as those in the population of memory patients.

Performing your statistical test
Computing a test statistic: the generic test
It could be a difference between a sample and a population, or between different samples:
$\text{test statistic} = \dfrac{\text{observed difference}}{\text{difference expected by chance}}$
The denominator is based on the standard error or an estimate of the standard error.

Generic statistical test
The generic test statistic distribution (think of this as the distribution of sample means).
To reject H0, you want a computed test statistic that is large. What's large enough? The alpha level gives us the decision criterion.
[Figure: distribution of the test statistic; the $\alpha$-level determines where the boundaries go. If the test statistic falls beyond a boundary, reject H0; if it falls inside the boundaries, fail to reject H0.]

The alpha level gives us the decision criterion:
Two-tailed, $\alpha = 0.05$: split up into the two tails, 0.025 in each; reject H0 if the test statistic falls in either tail.
One-tailed, $\alpha = 0.05$: all of it (0.05) in one tail; reject H0 only if the test statistic falls in that tail.

An example: one sample z-test
Memory example experiment: we give n = 16 memory patients a memory improvement treatment. After the treatment they have an average of $\bar{X} = 55$ memory errors. How do they compare to the general population of memory patients, who have a distribution of memory errors that is Normal, with $\mu = 60$ and $\sigma = 8$?

Step 1: State your hypotheses
H0: the memory treatment sample is the same as the population of memory patients: $\mu_{Treatment} = \mu_{pop} = 60$
HA: the memory treatment sample makes fewer errors than the population of memory patients: $\mu_{Treatment} < \mu_{pop} = 60$

Step 2: Set your decision criteria: $\alpha = 0.05$, one-tailed.
Step 3: Collect your data.
Step 4: Compute your test statistics:
$z_{\bar{X}} = \dfrac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \dfrac{55 - 60}{8 / \sqrt{16}} = \dfrac{-5}{2} = -2.5$

Step 5: Make a decision about your null hypothesis
[Figure: one-tailed rejection region holding 5% of the distribution in the lower tail; $z_{\bar{X}} = -2.5$ falls in the region labeled "Reject H0"]
Because $z_{\bar{X}} = -2.5$ is beyond the one-tailed critical value of z = -1.645, it falls in the rejection region.
Reject H0.
Support for our HA: the evidence suggests that the treatment decreases the number of memory errors.
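The same one-sample z-test as a short Python sketch; `one_sample_z` is a hypothetical helper, and the critical values come from `NormalDist().inv_cdf`. For comparison, it also shows that the two-tailed version of the test at $\alpha = 0.05$ would reach the same decision here, and it reports the one-tailed p-value.

```python
import math
from statistics import NormalDist


def one_sample_z(sample_mean, mu, sigma, n):
    """z = (sample mean - mu) / standard error, where standard error = sigma / sqrt(n)."""
    return (sample_mean - mu) / (sigma / math.sqrt(n))


alpha = 0.05
z = one_sample_z(sample_mean=55, mu=60, sigma=8, n=16)  # (55 - 60) / (8 / 4) = -2.5

z_crit_one = NormalDist().inv_cdf(alpha)          # one-tailed (lower) cutoff, about -1.645
z_crit_two = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed cutoffs are about +/-1.96
p_one = NormalDist().cdf(z)                       # one-tailed p-value, about .0062

print(f"z = {z}")
print("one-tailed:", "reject H0" if z < z_crit_one else "fail to reject H0")
print("two-tailed:", "reject H0" if abs(z) > z_crit_two else "fail to reject H0")
print(f"one-tailed p = {p_one:.4f}")
```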