Commit cd3faaa: Add week 7.

1 parent a18c45f

4 files changed (+543, -0 lines)

Diff for: lecture/10.04/blank_notes.Rmd (new file, +156 lines)

---
title: "Stat 33A - Lecture Notes 7"
date: Oct 4, 2020
output: pdf_document
---


Exploratory Data Analysis
=========================

What does it mean to "explore" data?

* Look for patterns (examine variation in the data)
* Look for errors in the data
* Look for relationships between variables
* Look at data to get an overview (what data are present?)
* Check assumptions (model, conclusions, etc.)
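
A minimal sketch of the "overview" and "errors" checks, assuming ggplot2 is installed. The dogs data isn't loaded in these notes, so this uses ggplot2's built-in `mpg` data (one row per car model) as a stand-in:

```r
library(ggplot2)  # provides the example data set `mpg`

# Overview: which columns are present, and what type is each one?
str(mpg)

# Errors: count the missing values in each column
colSums(is.na(mpg))

# Errors: the minimum and maximum can reveal impossible values
summary(mpg$hwy)
```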

What are the techniques to "explore" data?

* Make plots
* Compute summary statistics
* Fit models (including hypothesis tests and machine learning)
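
Summary statistics are often computed per group. A short sketch using base R's `tapply()` and `aggregate()`, with ggplot2's `mpg` data standing in for the dogs data (`hwy` is highway mileage, `drv` is the drive train):

```r
library(ggplot2)  # provides the example data set `mpg`

# Mean highway mileage for each drive train ("4", "f", "r")
tapply(mpg$hwy, mpg$drv, mean)

# Median highway mileage per drive train, returned as a data frame
aggregate(hwy ~ drv, data = mpg, FUN = median)
```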


The table below has _suggestions_ for choosing an appropriate plot
based on the data types.

You also need to think about what you're trying to convey.

First Feature | Second Feature | Plot
-------------- | ---------------- | ----
categorical | | bar, dot
categorical | categorical | bar, dot, mosaic
numerical | | box, density, histogram
numerical | categorical | box, density
numerical | numerical | line, scatter, smooth scatter


Again we'll use the dogs data:
```{r}

```

Example: How many dogs are there in each group (toy, sporting, etc.)?

```{r}

```
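
Counting the rows in each category is what `geom_bar()` does by default. A sketch assuming ggplot2, with the vehicle `class` column of ggplot2's `mpg` data standing in for the dog group column:

```r
library(ggplot2)

# geom_bar() counts the rows in each class and draws one bar per class
p <- ggplot(mpg, aes(x = class)) +
  geom_bar()
p
```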

Example: What's the distribution of datadog scores?

```{r}

```
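
For a single numeric feature, a histogram is a natural first look. A sketch with ggplot2's `mpg` data in place of the dogs data, using highway mileage (`hwy`) as the numeric feature; the `binwidth` of 2 is an arbitrary choice worth varying:

```r
library(ggplot2)

# Each bar covers a 2 mpg wide interval of highway mileage
p <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2)
p
```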

Example: How are size and height related?

```{r}

```
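
When one feature is categorical and the other is numerical, the table above suggests a box or density plot. A sketch with ggplot2's `mpg` data, where the drive train `drv` (categorical) and highway mileage `hwy` (numerical) stand in for size and height:

```r
library(ggplot2)

# One box per drive train, summarizing the spread of highway mileage
p <- ggplot(mpg, aes(x = drv, y = hwy)) +
  geom_boxplot()
p
```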



Distribution Plots
==================

For numeric features, we typically use box, histogram, or density plots.


Example: How does height vary for different groups of dogs?

What can we do to display these?

* side-by-side box plots
* overlapping density plots


Let's start with a box plot:
```{r}

```
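
With many groups, side-by-side box plots are easier to compare when the groups are sorted by a summary statistic and their labels run down the y-axis. A sketch with ggplot2's `mpg` data standing in for the dogs data (`class` for the group, `hwy` for the numeric feature); base R's `reorder()` sorts the classes by median mileage:

```r
library(ggplot2)

# Numeric feature on the x-axis, groups on the y-axis;
# reorder() sorts the groups by their median highway mileage
p <- ggplot(mpg, aes(x = hwy, y = reorder(class, hwy, FUN = median))) +
  geom_boxplot() +
  labs(y = "class")
p
```

Putting the groups on the y-axis like this relies on ggplot2 3.3.0 or later; older versions need `coord_flip()` instead.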



How can we display the groups in a density plot?
```{r}

```
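
One option is to map the group to the `color` aesthetic, so that `geom_density()` draws one curve per group. A sketch with ggplot2's `mpg` data standing in for the dogs data:

```r
library(ggplot2)

# One density curve of highway mileage per vehicle class, distinguished by color
p <- ggplot(mpg, aes(x = hwy, color = class)) +
  geom_density()
p
```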

Too many lines! With many groups, the overlapping densities are hard to tell
apart.

You can use a ridge plot instead to show many densities at once:
```{r}
# install.packages("ggridges")

```
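
A sketch with the ggridges package, assuming it is installed, and with ggplot2's `mpg` data standing in for the dogs data. `geom_density_ridges()` draws one density per level of the `y` aesthetic, stacked vertically:

```r
# install.packages("ggridges")
library(ggplot2)
library(ggridges)

# One ridge (density curve) of highway mileage per vehicle class
p <- ggplot(mpg, aes(x = hwy, y = class)) +
  geom_density_ridges()
p
```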



Faceted Plots
=============

Plots made of side-by-side panels, one per group, are called _faceted_ plots.

Can we make the group vs. height plot of the dogs data with facets?



The `facet_wrap()` function lays out facets row by row, wrapping onto a new
row when one fills up (to use screen space efficiently).

The syntax is:

```
facet_wrap(vars(FEATURE))
```



For example:
```{r}

```
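
For instance, a histogram of highway mileage split into one panel per vehicle class, with ggplot2's `mpg` data standing in for the dogs data:

```r
library(ggplot2)

# facet_wrap() draws one panel per class and wraps the panels onto new rows
p <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2) +
  facet_wrap(vars(class))
p
```

`facet_wrap()` also accepts `ncol` or `nrow` to fix the number of columns or rows.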



The `facet_grid()` function lays out facets in a grid. The syntax is:

```
facet_grid(ROWS ~ COLUMNS)
```

Use `.` in place of `ROWS` or `COLUMNS` if you only want to facet by one
feature, as in `facet_grid(FEATURE ~ .)`.



For example:
```{r}

```
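
A sketch with ggplot2's `mpg` data standing in for the dogs data: a scatter plot of engine displacement against highway mileage, with one row of panels per drive train and `.` as the column placeholder:

```r
library(ggplot2)

# Rows are drive trains; "." means no faceting across columns
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_grid(drv ~ .)
p
```

Writing `facet_grid(. ~ drv)` instead puts the same panels side by side in columns.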



When should you use facets versus aesthetics?

Use facets when aesthetics would put too much information on the plot (too many
lines, too many points, etc.).

Use aesthetics when there is less information to show; facets tend to use space
less efficiently than aesthetics.

Overall, think about the reader. There is no rule that always holds here.



EDA Strategy
============

See the lecture slides.




EDA Examples
============

```{r}

```
