# 85:222: Chapter 2 Describing, Displaying, and Exploring ...

Engineering Applications of Statistics (will be known to us eventually as we go through our syllabus)

1. Statistical Process Control
2. Quality Assessment
3. Model Building and Predicting
4. Communicating with and Acting on Experimental Results
5. Assessing Design Reliability
6. Experimental Design

VARIABLE AND DATA

Variable: A characteristic of a population or sample that is of interest to us.

Example: In the case of Hudson Auto, a variable is the average cost of parts used in engine tune-ups.

Data: The actual values of the variables. Data may be quantitative or qualitative.

Example: In the case of Hudson Auto, the data are the actual costs of parts used in the 50 engine tune-ups observed:

91 71 104 85 62 78 69 74 97 82
93 57 72 89 62 68 88 68 98 101
75 52 66 75 97 105 83 68 79 105
99 79 77 71 79 80 75 65 69 69
97 62 72 76 80 109 67 74 62 73

QUANTITATIVE DATA

Quantitative data may have a ratio scale:

Data possessing a natural zero point and organized into measures for which differences are meaningful.

Examples: money, income, sales, profits, losses, heights of NBA players.

Quantitative data may also have an interval scale: the distance between numbers is a known, constant size, but the zero value is arbitrary.

Example: temperature on the Fahrenheit scale.

Types of Quantitative Data

A) Nominal data are numbers used as arbitrary labels for categories.

Example: An engineering school might use numbers to denote undergraduate majors: 1 for electrical engineering, 2 for civil engineering, and so on.

B) Ordinal data convey ranking in terms of importance, strength, or severity.

Example: A value of 3 corresponds to a gentle breeze, a value of 6 to a strong breeze, and a value of 9 to a strong gale. The change in force between a gentle breeze and a strong breeze is not equal to that between a strong breeze and a strong gale, even though the numbers themselves misleadingly suggest equal steps.

C) Interval data allow only addition and subtraction.

Example: Temperature, for which the scales are arbitrarily chosen. A 100-degree Fahrenheit day is not twice as hot as a 50-degree day, since 100 degrees / 50 degrees is not a meaningful ratio.
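The point about interval data can be checked numerically. The sketch below is an illustration only: the helper names are made up for this example, and the standard Fahrenheit-to-Celsius and Celsius-to-Kelvin conversions are assumed. It shows that the "ratio" of the two temperatures changes with the scale, while the difference between them survives a change of scale up to a constant factor.

```python
def fahrenheit_to_celsius(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32.0) * 5.0 / 9.0


def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvins (a scale with a natural zero)."""
    return fahrenheit_to_celsius(f) + 273.15


hot_f, mild_f = 100.0, 50.0

# The same two days give a different "ratio" on every scale, so the
# ratio 100 / 50 = 2 says nothing about the days themselves.
ratio_f = hot_f / mild_f
ratio_c = fahrenheit_to_celsius(hot_f) / fahrenheit_to_celsius(mild_f)
ratio_k = fahrenheit_to_kelvin(hot_f) / fahrenheit_to_kelvin(mild_f)

# Differences are meaningful: they only rescale by the fixed factor 5/9.
diff_f = hot_f - mild_f
diff_c = fahrenheit_to_celsius(hot_f) - fahrenheit_to_celsius(mild_f)

print(f"ratio in F: {ratio_f:.3f}, in C: {ratio_c:.3f}, in K: {ratio_k:.3f}")
print(f"difference in F: {diff_f:.2f}, in C: {diff_c:.2f}")
```

Only on the Kelvin scale, which has a natural zero, does the ratio carry physical meaning, which is the distinction between interval and ratio data.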

D) Ratio data include times and many physical measurements of size, weight, or strength. The arithmetic operations of addition, subtraction, multiplication, and division are all valid with ratio data.

Example: income, sales, profits.

Most statistical investigations involve arithmetic operations, which limits them to interval or ratio data.

QUALITATIVE DATA

Qualitative data have a nominal scale: data that can only be classified into categories and cannot be arranged in an ordering scheme.

Examples: eye color, gender, marital status, religious affiliation, etc.

Example: Suppose that the responses to the marital-status question are recorded as follows:

| Response | Code |
|---|---|
| Single | 1 |
| Married | 2 |
| Divorced | 3 |
| Widowed | 4 |

85:222: Chapter 2 Describing, Displaying, and Exploring Statistical Data

Assuming you have collected a data set of interest, the questions are:

How can you make sense of it?

How can you organize and summarize the data set to make it more comprehensible and meaningful?

Graphical Techniques for Describing Quantitative Data

The Frequency Distribution

The most popular and traditional graphical method for describing quantitative data is the frequency histogram, often called a frequency distribution.

HISTOGRAM

Consider the following data, which show days to maturity for 40 short-term investments:

70 62 75 57 51 64 38 56 53 31
99 67 71 47 63 55 70 51 50 66
64 60 99 55 85 89 69 68 81 79
87 78 95 80 83 65 39 86 98 70

First, construct a frequency distribution: an arrangement or table that groups data into nonoverlapping intervals called classes and records the number of observations in each class.

Approximate number of classes (see Table 2.3, p. 29):

| Number of observations | Number of classes |
|---|---|
| Less than 50 | 5-7 |
| 50-200 | 7-9 |
| 200-500 | 9-10 |
| 500-1,000 | 10-11 |
| 1,000-5,000 | 11-13 |
| 5,000-50,000 | 13-17 |
| More than 50,000 | 17-20 |

The approximate class width is obtained as follows:

$$\text{Approximate class width} = \frac{\text{Largest value} - \text{Smallest value}}{\text{Number of classes}}$$
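As a worked illustration of these steps for the days-to-maturity data, the following sketch looks up a suggested number of classes, computes the approximate class width, and tallies the observations per class. Several choices here are assumptions rather than anything the slides prescribe: the function and variable names, treating each table boundary as an exclusive upper bound, using the largest suggested class count, rounding the width up to a whole number, and starting the first class at a multiple of the width.

```python
import math

# Days to maturity for the 40 short-term investments listed above.
days_to_maturity = [
    70, 62, 75, 57, 51, 64, 38, 56, 53, 31,
    99, 67, 71, 47, 63, 55, 70, 51, 50, 66,
    64, 60, 99, 55, 85, 89, 69, 68, 81, 79,
    87, 78, 95, 80, 83, 65, 39, 86, 98, 70,
]

# Guideline table: (exclusive upper bound on n, suggested range of classes).
# Treating each bound as exclusive is an assumption; the table does not say
# which row a boundary value such as 200 belongs to.
CLASS_GUIDE = [
    (50, (5, 7)),
    (200, (7, 9)),
    (500, (9, 10)),
    (1_000, (10, 11)),
    (5_000, (11, 13)),
    (50_000, (13, 17)),
]


def suggested_classes(n):
    """Return the (min, max) number of classes suggested for n observations."""
    for upper, class_range in CLASS_GUIDE:
        if n < upper:
            return class_range
    return (17, 20)


n = len(days_to_maturity)
smallest, largest = min(days_to_maturity), max(days_to_maturity)
_, max_classes = suggested_classes(n)

# Approximate class width = (largest - smallest) / number of classes.
approx_width = (largest - smallest) / max_classes
width = math.ceil(approx_width)          # round up to a convenient whole number
first_cut = (smallest // width) * width  # start the first class at a multiple of width

# Tally each observation into the class [lower, lower + width).
num_classes = (largest - first_cut) // width + 1
counts = [0] * num_classes
for x in days_to_maturity:
    counts[(x - first_cut) // width] += 1

print(f"n = {n}, approximate width = {approx_width:.2f}, chosen width = {width}")
for i, count in enumerate(counts):
    lower = first_cut + i * width
    print(f"{lower}-under {lower + width}: {count}")
```

With these choices the classes run from 30 up to 100 in steps of 10, which agrees with the axis labels on the histogram slides.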

Classes and counts for the days-to-maturity data

[Table: Days to Maturity | Tally | Number of Investments]

Class relative frequency is obtained as follows:

$$\text{Class relative frequency} = \frac{\text{Class frequency}}{\text{Total number of observations}}$$

[Figure: frequency histogram for the days-to-maturity data. Horizontal axis: Number of Days to Maturity (ticks 40 to 100). Vertical axis: Frequency (0 to 12).]

Classes: Categories for grouping data.

Frequency: The number of observations that fall in a class.

Frequency distribution: A listing of all classes along with their frequencies.

Relative frequency: The ratio of the frequency of a class to the total number of observations.

Relative-frequency distribution: A listing of all classes along with their relative frequencies.

Lower cut point: The smallest value that can go in a class.

Upper cut point: The smallest value that can go in the next higher class. The upper cut point of a class is the same as the lower cut point of the next higher class.

Midpoint: The middle of a class, obtained by taking the average of its lower and upper cut points.

Width: The difference between the upper and lower cut points of a class.

Frequency histogram: A graph that displays the classes on the horizontal axis and the frequencies of the classes on the vertical axis. The frequency of each class is represented by a vertical bar whose height is equal to the frequency of the class.

Relative-frequency histogram: A graph that displays the classes on the horizontal axis and the relative frequencies of the classes on the vertical axis. The relative frequency of each class is represented by a vertical bar whose height is equal to the relative frequency of the class.

RELATIVE FREQUENCY HISTOGRAM

Relative-frequency distribution for the days-to-maturity data

[Table: Days to Maturity | Relative Frequency]

[Figure: relative-frequency histogram for the days-to-maturity data. Horizontal axis: Number of Days to Maturity (ticks 40 to 100). Vertical axis: Relative Frequency (0.00% to 30.00%).]

FREQUENCY POLYGON

A frequency polygon is a graph that displays the data by using lines that connect points plotted at the midpoints of the classes. The height of each point is the frequency of its class.

[Figure: frequency polygon for the days-to-maturity data. Horizontal axis: Number of Days to Maturity (ticks 35 to 95). Vertical axis: Frequency (0 to 12).]
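The definitions of cut points, midpoints, and relative frequencies can be tied together in a short sketch. It assumes cut points at 30, 40, ..., 100, which is inferred from the axis labels and polygon midpoints rather than stated outright, and the function name `frequency_table` is made up for this illustration. Each class holds values from its lower cut point up to, but not including, its upper cut point.

```python
from bisect import bisect_right

days_to_maturity = [
    70, 62, 75, 57, 51, 64, 38, 56, 53, 31,
    99, 67, 71, 47, 63, 55, 70, 51, 50, 66,
    64, 60, 99, 55, 85, 89, 69, 68, 81, 79,
    87, 78, 95, 80, 83, 65, 39, 86, 98, 70,
]

# Assumed cut points: 30, 40, ..., 100 (seven classes of width 10).
cut_points = list(range(30, 101, 10))


def frequency_table(data, cuts):
    """Return one row per class: lower, upper, midpoint, frequency, relative frequency."""
    freqs = [0] * (len(cuts) - 1)
    for x in data:
        # Index of the class whose lower cut point is the largest one <= x.
        freqs[bisect_right(cuts, x) - 1] += 1
    total = len(data)
    rows = []
    for lower, upper, freq in zip(cuts, cuts[1:], freqs):
        rows.append({
            "lower": lower,
            "upper": upper,
            "midpoint": (lower + upper) / 2,   # average of the two cut points
            "frequency": freq,
            "relative_frequency": freq / total,
        })
    return rows


table = frequency_table(days_to_maturity, cut_points)
for row in table:
    print(
        f"{row['lower']}-under {row['upper']}  midpoint {row['midpoint']:.0f}  "
        f"frequency {row['frequency']:2d}  relative frequency {row['relative_frequency']:.1%}"
    )

# Points for a frequency polygon: (class midpoint, class frequency).
polygon_points = [(row["midpoint"], row["frequency"]) for row in table]
```

The bar heights of the frequency histogram are the `frequency` values, those of the relative-frequency histogram are the `relative_frequency` values, and `polygon_points` gives the vertices that the frequency polygon connects.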

STEM-AND-LEAF DISPLAY

Here the raw data are arranged in tabular form by locating each observation on a tree. This is done by separating each value into a stem digit and a leaf digit.

The main advantage is that the display provides the essential features of the histogram, which can be seen easily by rotating the plot 90 degrees.

The disadvantage is that the stem-and-leaf plot becomes cumbersome if the number of observations is large.

[Figure (a): stem-and-leaf diagram for the days-to-maturity data, with stems 3 through 9]
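A stem-and-leaf display for the days-to-maturity data can be built with a few lines of code. This is a minimal sketch under stated assumptions: the tens digit is taken as the stem and the units digit as the leaf, the leaves are sorted within each stem (the original diagram may list them in a different order), and the function name `stem_and_leaf` is chosen for this example.

```python
from collections import defaultdict

days_to_maturity = [
    70, 62, 75, 57, 51, 64, 38, 56, 53, 31,
    99, 67, 71, 47, 63, 55, 70, 51, 50, 66,
    64, 60, 99, 55, 85, 89, 69, 68, 81, 79,
    87, 78, 95, 80, 83, 65, 39, 86, 98, 70,
]


def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (units digits)."""
    leaves_by_stem = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        leaves_by_stem[stem].append(leaf)
    # Include every stem between the smallest and largest, even if it is empty.
    first, last = min(leaves_by_stem), max(leaves_by_stem)
    return {stem: leaves_by_stem.get(stem, []) for stem in range(first, last + 1)}


display = stem_and_leaf(days_to_maturity)
print("Stem | Leaves")
for stem, leaves in display.items():
    print(f"{stem:>4} | {' '.join(str(leaf) for leaf in leaves)}")
```

Read sideways, the row lengths trace the same shape as the frequency histogram with classes of width 10, while every individual observation remains recoverable from its stem and leaf.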