Data

Unit Review Sheet

These facts and definitions should be mastered throughout this unit. This page can be used for periodic review and study as you are finishing the unit and in the future.

Facts and Definitions

Lesson 1: Statistics Review

A measure of center is a number that identifies the center value of a distribution.
The mean of a data set is the average of all the data values in the set.
$\text{Mean} = \frac{\text{sum of all values}}{\text{number of values}}$
The median is the middle value in a data set that is arranged in order from smallest to largest.
To find the median of a data set with an even number of values, add the two middle numbers and divide the sum by 2.
The mode of a data set is the value that occurs most frequently.
Mean, median, and mode are measures of center.
Variability describes how spread out or different the numbers in a data set are.
The range is the difference between the largest and the smallest values in a data set.
The interquartile range (IQR) measures how spread out the middle 50% of a data set is.
$IQR = Q_{3} - Q_{1}$
The mean absolute deviation (MAD) measures how far, on average, each data value is from the mean.
$\text{MAD} = \frac{\text{sum of absolute differences from the mean}}{\text{number of data values}}$
A box plot is a graph that shows the center and spread of a data set using five key numbers: minimum, maximum, median, first quartile, and third quartile.
In a box plot, the rectangle (the box) always shows the middle 50% of the data set, from the first quartile to the third quartile.
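
The Lesson 1 measures can be checked with a short Python sketch. The data set below is made up for illustration, and the quartiles are found as the medians of the lower and upper halves of the sorted data, which is an assumption about the method used in this unit (other quartile methods can give slightly different values).

```python
from statistics import mean, median, mode

# Hypothetical data set, already sorted from smallest to largest.
data = [2, 4, 4, 4, 6, 8, 9, 11]

# Measures of center
data_mean = mean(data)      # sum of all values / number of values: 48 / 8 = 6
data_median = median(data)  # even count: (4 + 6) / 2 = 5.0
data_mode = mode(data)      # 4 occurs most often

# Measures of variability
data_range = max(data) - min(data)  # 11 - 2 = 9

# Assumption: Q1 and Q3 are the medians of the lower and upper halves.
half = len(data) // 2
q1 = median(data[:half])   # lower half [2, 4, 4, 4]
q3 = median(data[-half:])  # upper half [6, 8, 9, 11]
iqr = q3 - q1              # 8.5 - 4.0 = 4.5

# MAD: average distance of each value from the mean.
mad = sum(abs(x - data_mean) for x in data) / len(data)  # 20 / 8 = 2.5

print("mean:", data_mean, "median:", data_median, "mode:", data_mode)
print("range:", data_range, "IQR:", iqr, "MAD:", mad)
```

The five numbers a box plot needs for this data set are the minimum (2), Q1 (4.0), the median (5.0), Q3 (8.5), and the maximum (11).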

Lesson 2: Scatterplots

A scatterplot is a graph of plotted points that shows the relationship between two sets of data.
A linear relationship occurs when two variables show a pattern that forms a straight line on a scatterplot.
A positive linear relationship means that the variables move in the same direction. As x increases, y also increases.
A negative linear relationship means that the variables move in opposite directions. As x increases, y decreases.
When points are scattered randomly on a scatterplot, the variables have no relationship.
A cluster is a group of data points that are close together.
An outlier is a data point that is separated from the rest of the points.
On a scatterplot, a trend is the general direction in which data points move.
A best fit line is a straight line that shows the overall trend in a scatterplot.
High variability means the points are widely scattered around the best fit line, showing a weaker relationship between the variables.
Low variability means the points cluster closely around the best fit line, showing a stronger relationship between the variables.
Correlation means two variables change together. Correlation alone does not show that one variable causes the other.
Causation means a change in one variable directly causes a change in the other variable.
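
As a rough illustration of a best fit line and the direction of a trend, here is a minimal Python sketch. It uses the least-squares method to compute the line, which is an assumption: this unit may instead draw the line by eye. The data values and the helper names `best_fit_line` and `describe_trend` are hypothetical.

```python
def best_fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, divided by how much x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def describe_trend(slope):
    """Name the direction of the trend from the sign of the slope."""
    if slope > 0:
        return "positive"
    if slope < 0:
        return "negative"
    return "flat"


# Hypothetical data: hours studied (x) and test score (y).
hours = [1, 2, 3, 4, 5]
scores = [55, 60, 70, 80, 85]

slope, intercept = best_fit_line(hours, scores)
trend = describe_trend(slope)
print(f"best fit line: y = {slope}x + {intercept} ({trend} trend)")
```

For this made-up data the line rises as x increases, so the trend is positive. The points do not all sit exactly on the line, which is the variability described above.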

Lesson 3: Constructing a Scatter Plot

[none]

Lesson 4: Linear Models

A linear model is a straight-line equation that shows how two things are connected. It helps predict what will happen next by showing how much the dependent variable ($y$) changes whenever the independent variable ($x$) changes.
Slope is the ratio of the change in $y$ to the change in $x$. It tells how much $y$ changes each time $x$ increases by 1.
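
A small Python sketch of slope and prediction with a linear model follows. The plant-growth numbers and the function names `slope_between` and `predict` are hypothetical, chosen only to illustrate the definitions above.

```python
def slope_between(p1, p2):
    """Change in y divided by change in x between two points."""
    (x1, y1), (x2, y2) = p1, p2
    return (y2 - y1) / (x2 - x1)


def predict(x, slope, intercept):
    """Evaluate the linear model y = slope * x + intercept."""
    return slope * x + intercept


# Hypothetical data: a plant is 7 cm tall at week 1 and 13 cm tall at week 4.
m = slope_between((1, 7), (4, 13))  # (13 - 7) / (4 - 1) = 2.0 cm per week
b = 7 - m * 1                       # height at week 0: 5.0 cm

height_week_10 = predict(10, m, b)  # 2.0 * 10 + 5.0 = 25.0 cm
print("slope:", m, "intercept:", b, "predicted height at week 10:", height_week_10)
```

Here the slope of 2.0 means the model predicts 2 cm of growth for each additional week.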

Lesson 5: Categorical Data

Categorical data is data that can be sorted into named groups based on shared qualities (for example: color, type of pet, or zip code).
Numerical data is data that represents measurable quantities or amounts (for example: height, money, or temperature).
A two‑way frequency table is a table that displays data for two categorical variables at the same time. Each cell represents the frequency of cases that fall into both categories.
A two-way relative frequency table shows the proportions or percentages of data in each category instead of raw counts.
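
The two kinds of tables can be sketched in Python as shown below. The survey responses are made up, and the relative frequencies are computed as a share of all responses, which is an assumption (relative frequencies can also be taken by row or by column).

```python
from collections import Counter

# Hypothetical survey: each response is (grade, favorite pet).
responses = (
    [("7th", "dog")] * 3
    + [("7th", "cat")] * 2
    + [("8th", "dog")] * 1
    + [("8th", "cat")] * 4
)

# Two-way frequency table: count of responses in each (grade, pet) cell.
counts = Counter(responses)

# Two-way relative frequency table: each count divided by the total.
total = len(responses)
relative = {cell: n / total for cell, n in counts.items()}

# Print both tables side by side, one row per grade.
for grade in ("7th", "8th"):
    for pet in ("dog", "cat"):
        cell = (grade, pet)
        print(grade, pet, "count:", counts[cell], "relative:", relative[cell])
```

In this made-up survey, 4 of the 10 responses are 8th graders who prefer cats, so that cell has a frequency of 4 and a relative frequency of 0.4, or 40%.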

Lesson 6: Unit 8 Test

[none]

Final Project: Collecting and Organizing Data

[none]