Section 1.3 Visualizing Data (B3)
Visualizing data appropriately lets us apply visual intuition to a data set, complementing what computation alone can tell us.
In this section, we cover several types of data visualizations and how to generate them.
Run the following code to download the hsb2.csv data set:
This data set contains two hundred observations randomly sampled from the High School and Beyond survey, a survey of high school seniors conducted by the National Center for Education Statistics.
Subsection 1.3.1 Numerical Data
Definition 1.3.1.
A scatterplot compares two numerical variables. Each case in the sample is plotted as a point on the Cartesian plane, with the value of one variable as its horizontal coordinate and the value of the other as its vertical coordinate.
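As a minimal sketch of this definition, the Python snippet below turns each case into one point. The records are made-up values for illustration, not rows taken from hsb2.csv; the variable names `read` and `math` follow the hsb2 documentation, and the helper `scatter_points` is a hypothetical name introduced here.

```python
# Hypothetical records (made-up values, NOT taken from hsb2.csv).
students = [
    {"id": 1, "read": 57, "math": 41},
    {"id": 2, "read": 68, "math": 53},
    {"id": 3, "read": 44, "math": 54},
    {"id": 4, "read": 63, "math": 47},
]


def scatter_points(cases, x_var, y_var):
    """Return one (x, y) pair per case: x from x_var, y from y_var."""
    return [(case[x_var], case[y_var]) for case in cases]


# Each student contributes exactly one dot to the scatterplot.
points = scatter_points(students, "read", "math")
for x, y in points:
    print(f"reading = {x}, math = {y}")
```

Each pair in `points` is the position of one dot. A plotting library would draw these pairs on a pair of axes.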
Activity 1.3.1. School Scatter Plot.
Run the following code to plot a scatter plot comparing student reading scores to their math scores.
(a)
What does each dot represent? What do its coordinates represent?
(b)
What sort of relationship between these variables do you notice?
(c)
Follow this link and identify two numerical variables you would like to compare: https://www.openintro.org/data/index.php?data=hsb2
(d)
Fix the following code and run it to plot a scatter plot comparing your two choices.
(e)
What does each dot represent? What do its coordinates represent?
(f)
What sort of relationship between these variables do you notice?
Definition 1.3.2.
A histogram consists of contiguous boxes. It has both a horizontal axis and a vertical axis. The horizontal axis is labeled with what the data represents. The vertical axis is labeled either frequency (the number of times a value or range of values occurs) or relative frequency (the proportion or percentage of times it occurs).
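To make the difference between the possible vertical-axis labels concrete, here is a small Python sketch that bins a list of scores by hand. The scores are made-up values, and the bin start, bin width, and the helper name `histogram_counts` are assumptions chosen for illustration. Density is taken here to mean relative frequency divided by bin width, so that the box areas sum to 1.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values falling in contiguous bins [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts


# Made-up scores for illustration (NOT taken from hsb2.csv).
scores = [34, 39, 42, 47, 47, 50, 53, 55, 58, 61]
start, width, n_bins = 30, 10, 4

frequency = histogram_counts(scores, start, width, n_bins)
total = sum(frequency)
relative_frequency = [c / total for c in frequency]
density = [r / width for r in relative_frequency]

# Print a text version of the histogram, one row per bin.
for i, count in enumerate(frequency):
    low = start + i * width
    print(f"[{low}, {low + width}): {'#' * count}  "
          f"freq={count}, rel={relative_frequency[i]:.2f}, dens={density[i]:.3f}")
```

All three versions have the same shape; only the scale of the vertical axis changes.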
Activity 1.3.2. School Histogram.
Run the following code to plot a histogram of students' science scores.
(a)
What do the \(x\) values represent? What do the heights represent?
(b)
What do you notice about this graph?
(c)
Add the following to plot a density distribution instead of a frequency distribution:
(d)
Follow this link and identify a numerical variable you would like to plot a histogram for: https://www.openintro.org/data/index.php?data=hsb2
(e)
Fix the following code and run it to plot the histogram of your choice.
(f)
What do the \(x\) values represent? What do the heights represent?
(g)
What do you notice about this graph?
(h)
Modify the code so that it plots a density distribution.
Note that for smaller data sets, it can be more convenient to use Desmos to produce a histogram. The data is placed within the list L.
Subsection 1.3.2 Categorical Data
Since categorical variables have no numerical values, the ways we display them revolve around frequency.
Definition 1.3.3.
A bar chart, similar to a histogram, plots the frequency or relative frequency of each category of a categorical variable.
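The bar heights are simply category counts. The Python sketch below tallies them with `collections.Counter`. The list of values is made up for illustration and is not taken from hsb2.csv; the category labels are assumed to match those of the `race` variable.

```python
from collections import Counter

# Made-up category values for illustration (NOT taken from hsb2.csv).
race = [
    "white", "hispanic", "white", "asian",
    "white", "african american", "hispanic", "white",
]

# Frequency: how many cases fall in each category.
frequency = Counter(race)
total = sum(frequency.values())

# Relative frequency: the share of all cases in each category.
relative_frequency = {cat: n / total for cat, n in frequency.items()}

# Print a text version of the bar chart, tallest bar first.
for category, count in frequency.most_common():
    print(f"{category:<17} {'#' * count}  ({relative_frequency[category]:.3f})")
```

Unlike histogram bins, the categories have no inherent numerical order, so the bars can be arranged in any convenient order.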
Activity 1.3.3. School Bar Chart.
Run the following code to plot a bar chart for the racial demographics of students.
(a)
What does each bar represent? What do the heights represent?
(b)
Follow this link and identify a categorical variable you would like to plot a bar chart for: https://www.openintro.org/data/index.php?data=hsb2
(c)
Fix the following code and run it to plot the bar chart of your choice.
(d)
What do you notice about this graph?
(e)
What do the bars and heights represent?
If we want to compare different groups to each other, it may make sense to utilize a grouped bar chart.
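A grouped bar chart draws one bar for every combination of the two variables, so the numbers behind it form a two-way table of counts. The following Python sketch builds such a table. The cases are made up for illustration and are not taken from hsb2.csv; the labels are assumed to match the `schtyp` and `prog` variables.

```python
from collections import Counter

# Made-up (school type, program) pairs for illustration (NOT from hsb2.csv).
cases = [
    ("public", "academic"), ("public", "general"), ("public", "vocational"),
    ("public", "academic"), ("private", "academic"), ("private", "general"),
    ("public", "vocational"), ("private", "academic"),
]

# One count per (school type, program) combination.
counts = Counter(cases)

school_types = ["public", "private"]
programs = ["academic", "general", "vocational"]

# One group of bars per program, one bar per school type within each group.
for prog in programs:
    print(prog)
    for school in school_types:
        n = counts[(school, prog)]  # a Counter returns 0 for unseen pairs
        print(f"  {school:<8} {'#' * n} ({n})")
```

Swapping the roles of the two variables in the loops groups the same counts by school type instead.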
Activity 1.3.4. School Grouped Bar Chart.
Run the following code to plot a grouped bar chart for the program type of schools, broken down by public vs private.
(a)
What does each bar represent? What do the heights represent?
(b)
How do public and private schools differ?
(c)
Follow this link and identify two categorical variables you would like to plot a grouped bar chart for: https://www.openintro.org/data/index.php?data=hsb2
(d)
Fix the following code and run it to plot the grouped bar chart of your choice.
(e)
What do you notice about this graph?
(f)
What do the bars and heights represent?
Definition 1.3.4.
A mosaic plot shows the relative frequency of cases with respect to two categorical variables, displaying the relative frequency of each combination of categories as the area of a rectangle.
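One common way to construct the rectangles, sketched below in Python, sets each column's width to the proportion of cases in a category of the first variable and each rectangle's height to the proportion of the second variable within that column. The area is then the joint relative frequency. The cases are made up for illustration and are not taken from hsb2.csv.

```python
from collections import Counter

# Made-up (school type, program) pairs for illustration (NOT from hsb2.csv).
cases = [
    ("public", "academic"), ("public", "general"), ("public", "vocational"),
    ("public", "academic"), ("private", "academic"), ("private", "general"),
    ("public", "vocational"), ("private", "academic"),
]

total = len(cases)
joint = Counter(cases)                                 # count per combination
column_totals = Counter(school for school, _ in cases)  # count per school type

areas = {}
for (school, prog), n in joint.items():
    width = column_totals[school] / total   # share of all cases in this column
    height = n / column_totals[school]      # share of the column in this program
    areas[(school, prog)] = width * height  # equals n / total
    print(f"{school:<8} {prog:<11} width={width:.3f} "
          f"height={height:.3f} area={areas[(school, prog)]:.3f}")
```

Because the areas are joint relative frequencies, they add up to 1 across the whole plot.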
Activity 1.3.5. School Mosaic Plot.
Run the following code to plot a mosaic plot for the program type of schools, broken down by public vs private.
(a)
What does each rectangle represent? What do the areas represent?
(b)
How does this plot highlight the difference between academic, general and vocational schools?
(c)
Follow this link and identify two categorical variables you would like to plot a mosaic plot for: https://www.openintro.org/data/index.php?data=hsb2
(d)
Fix the following code and run it to plot the mosaic plot of your choice.
(e)
What do you notice about this graph?
(f)
What do the rectangles and areas represent?