Commit bfcbbd0

Add week 9.
1 parent 6a325a7 commit bfcbbd0

4 files changed

+1032
-0
lines changed

Diff for: lecture/10.18/blank_notes.Rmd

+359
@@ -0,0 +1,359 @@
1+
---
2+
title: "Stat 33A - Lecture Notes 9"
3+
date: Oct 18, 2020
4+
output: pdf_document
5+
---
6+
7+
8+
9+
Apply Function Basics
10+
=====================
11+
12+
Doing the same operation repeatedly is a common pattern in programming.
13+
14+
Vectorization is one way, but not all functions are vectorized.
15+
16+
17+
In R, the "apply functions" are another way to do something repeatedly.
18+
19+
The apply functions call a function on each element of a vector or list.
20+
21+
22+
23+
The `lapply()` Function
24+
---------------------
25+
26+
The first and most important apply function is `lapply()`. The syntax is:
27+
```
28+
lapply(X, FUN, ...)
29+
```
30+
31+
The function `FUN` is called once for each element of `X`, with the element as
32+
the first argument. The `...` is for additional arguments to `FUN`, which are
33+
held constant across all calls.
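
A minimal sketch of how `...` is passed along. The list `scores` is made up for illustration; `na.rm = TRUE` is forwarded unchanged to every call of `mean()`:

```{r}
# Hypothetical list of numeric vectors, one with a missing value.
scores = list(a = c(1, 2, NA), b = c(4, 5, 6))

# mean() is called once per element; na.rm = TRUE goes through `...`
# and is the same in every call.
means = lapply(scores, mean, na.rm = TRUE)
```

Here `means` is a list with elements `a` and `b`, one per element of `scores`.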
34+
35+
36+
Unrealistic example:
37+
```{r}
38+
39+
```
40+
In practice, it's clearer and more efficient to use vectorization here.
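
One possible unrealistic example, as a sketch (the vector `x` is invented here). `sqrt()` is already vectorized, so the `lapply()` call does extra work for the same values:

```{r}
# Hypothetical input vector, invented for this illustration.
x = c(1, 4, 9)

# Call sqrt() once per element; the result is a list of length 3.
roots_list = lapply(x, sqrt)

# The vectorized call returns a numeric vector directly.
roots_vec = sqrt(x)
```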
41+
42+
43+
Let's use the dogs data for some realistic examples:
44+
```{r}
45+
46+
```
47+
48+
`lapply()` always returns the result as a list.
49+
50+
"l" for **list** result.
51+
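
A sketch of the list result, using a small made-up data frame `pets` as a stand-in (it is not the dogs data). A data frame is a list of columns, so `lapply()` visits each column:

```{r}
# Hypothetical stand-in data; the column names and values are invented.
pets = data.frame(
  name = c("Ada", "Bo", "Cy"),
  weight = c(10.5, 30, 22),
  stringsAsFactors = FALSE
)

# One call to class() per column; the result is a named list.
col_classes = lapply(pets, class)
```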
52+
53+
54+
The `sapply()` Function
55+
---------------------
56+
57+
`sapply()` simplifies the result to a vector (or a matrix), when possible. Otherwise, it returns a list.
58+
59+
"s" for **simplified** result.
60+
61+
Examples:
62+
```{r}
63+
64+
```
65+
66+
The `sapply()` function is most useful if you are working interactively, since the type of its result depends on what `FUN` returns.
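
A sketch of the simplified result. The data frame `pets` and the list `samples` are made up for illustration:

```{r}
# Hypothetical stand-in data; the column names and values are invented.
pets = data.frame(
  name = c("Ada", "Bo", "Cy"),
  weight = c(10.5, 30, 22),
  stringsAsFactors = FALSE
)

# Each call to class() returns one string, so the results simplify
# to a named character vector instead of a list.
col_classes = sapply(pets, class)

# Each call to length() returns one integer, so the results simplify
# to a named integer vector.
samples = list(a = 1:3, b = 1:5)
lens = sapply(samples, length)
```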
67+
68+
69+
70+
71+
72+
73+
Apply Function Examples
74+
=======================
75+
76+
The California Counties AQI data set is available on bCourses (`aqi.zip`).
77+
78+
Let's load one of the files:
79+
```{r}
80+
81+
```
82+
83+
What are the classes of the columns?
84+
```{r}
85+
86+
```
87+
88+
How can we load all of the files?
89+
```{r}
90+
91+
```
92+
93+
The `rbind` function combines two or more data frames by stacking them together:
94+
```{r}
95+
96+
```
97+
98+
The data frames we want to stack are in a list.
99+
100+
How can we call `rbind` on all of them?
101+
102+
103+
The `do.call` function calls a function using a list as the arguments:
104+
```{r}
105+
106+
```
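
A sketch of the whole load-and-stack pattern. It assumes nothing about the real AQI files: the two CSV files, their names, and their columns are invented and written to a temporary directory so that the code is self-contained:

```{r}
# Write two small CSV files to a temporary directory. These are
# hypothetical stand-ins for the files in `aqi.zip`.
sketch_dir = file.path(tempdir(), "aqi_sketch")
dir.create(sketch_dir, showWarnings = FALSE)
write.csv(data.frame(county = "A", aqi = 40),
  file.path(sketch_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(county = "B", aqi = 55),
  file.path(sketch_dir, "b.csv"), row.names = FALSE)

# List the CSV files, then read each one into a data frame.
paths = list.files(sketch_dir, pattern = "\\.csv$", full.names = TRUE)
frames = lapply(paths, read.csv)

# Equivalent to rbind(frames[[1]], frames[[2]]), but works for a list
# of any length.
stacked = do.call(rbind, frames)
```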
107+
108+
How can we convert multiple columns to a different class?
109+
```{r}
110+
111+
```
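
One way to do this, as a sketch. The data frame `readings` and its columns are made up; the idea is to apply a conversion function to the chosen columns and assign the resulting list back:

```{r}
# Hypothetical data frame whose numeric columns were read as strings.
readings = data.frame(
  site = c("x", "y"),
  pm25 = c("12.5", "8"),
  ozone = c("31", "40"),
  stringsAsFactors = FALSE
)

# lapply() returns a list of converted columns, which can be assigned
# back into the same columns of the data frame.
cols = c("pm25", "ozone")
readings[cols] = lapply(readings[cols], as.numeric)
```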
112+
113+
114+
Are there any missing values in the columns?
115+
```{r}
116+
colSums(sapply(aqi_df, is.na))
117+
```
118+
119+
How can we compute summary statistics about the numeric columns?
120+
```{r}
121+
is_numeric = sapply(aqi_df, is.numeric)
122+
sapply(aqi_df[is_numeric], mean)
123+
```
124+
125+
126+
127+
128+
129+
130+
131+
132+
133+
134+
135+
The Split-Apply Strategy
136+
========================
137+
138+
The `split()` function splits a vector or data frame into groups based on some
139+
other vector, usually a congruent one (same length, with matching positions).
140+
141+
```{r}
142+
143+
```
144+
145+
146+
Split weight of dogs by the group column:
147+
```{r}
148+
149+
```
150+
151+
The `split()` function is especially useful when combined with `lapply()` or
152+
`sapply()`.
153+
154+
```{r}
155+
156+
```
157+
This is an R idiom!
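
A sketch of the idiom on a small made-up data frame `pets` (a stand-in, not the dogs data):

```{r}
# Hypothetical stand-in data; the groups and weights are invented.
pets = data.frame(
  group = c("toy", "toy", "herding"),
  weight = c(3, 5, 20),
  stringsAsFactors = FALSE
)

# Split: one numeric vector of weights per group.
by_group = split(pets$weight, pets$group)

# Apply: compute the mean of each group's weights.
mean_weight = sapply(by_group, mean)
```

The groups come back in sorted order, so `mean_weight` has the names `herding` and `toy`.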
158+
159+
160+
161+
The `tapply()` Function
162+
---------------------
163+
164+
The `tapply()` function is equivalent to the `split()` and `sapply()` idiom.
165+
166+
"t" for **table**, because `tapply()` is a generalization of the
167+
frequency-counting function `table()`.
168+
169+
170+
Examples:
171+
```{r}
172+
173+
```
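
A sketch comparing `tapply()` with the idiom and with `table()`. The data frame `pets` is made up for illustration:

```{r}
# Hypothetical stand-in data; the groups and weights are invented.
pets = data.frame(
  group = c("toy", "toy", "herding"),
  weight = c(3, 5, 20),
  stringsAsFactors = FALSE
)

# Grouped means in one call.
mean_by_group = tapply(pets$weight, pets$group, mean)

# The same values from the split() and sapply() idiom.
idiom = sapply(split(pets$weight, pets$group), mean)

# Counting elements per group gives the same counts as table().
counts = tapply(pets$weight, pets$group, length)
freq = table(pets$group)
```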
174+
175+
This strategy is important for analyzing tabular data regardless of what
176+
programming language or packages you're using.
177+
178+
179+
180+
Split-Apply and dplyr
181+
=====================
182+
183+
We'll use the dogs data here:
184+
```{r}
185+
186+
```
187+
188+
The split-apply strategy is often used to compute grouped statistics.
189+
190+
For example, we can compute the mean weight of the dogs by group:
191+
```{r}
192+
193+
```
194+
195+
The `aggregate` function does the same thing as `tapply`, but returns a data
196+
frame:
197+
```{r}
198+
199+
```
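
A sketch of `aggregate()` with its formula interface, on a made-up data frame `pets` (a stand-in, not the dogs data):

```{r}
# Hypothetical stand-in data; the groups and weights are invented.
pets = data.frame(
  group = c("toy", "toy", "herding"),
  weight = c(3, 5, 20),
  stringsAsFactors = FALSE
)

# Mean weight for each group, returned as a data frame with one row
# per group.
mean_df = aggregate(weight ~ group, data = pets, FUN = mean)
```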
200+
201+
The dplyr `group_by` and `summarize` functions are another form of split-apply:
202+
```{r}
203+
204+
```
205+
206+
207+
208+
Choosing an Apply Function
209+
==========================
210+
211+
212+
1. Is the function you want to apply vectorized?
213+
214+
If yes, use vectorization.
215+
216+
Otherwise, continue to #2.
217+
218+
219+
2. Do you want to apply the function to elements or to groups?
220+
221+
For elements, continue to #3.
222+
223+
For groups, use the split-apply pattern. Use `split()`, then
224+
continue to #3 to choose an apply function.
225+
226+
Note `tapply()` is equivalent to `split()` and `sapply()`.
227+
228+
229+
3. Will the function return the same data type for each element?
230+
231+
If yes, continue to #4.
232+
233+
Otherwise, use `lapply()`.
234+
235+
236+
4. Are you working interactively?
237+
238+
If yes, use `sapply()`.
239+
240+
Otherwise, use `vapply()`.
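
A sketch of why `vapply()` suits non-interactive code: its third argument states the type and length expected from every call, and a mismatch is an error rather than a silently different result. The list `samples` is made up:

```{r}
# Hypothetical input list, invented for this illustration.
samples = list(a = 1:3, b = 1:5)

# Every call must return exactly one integer.
lens = vapply(samples, length, integer(1))

# range() returns two values per element, which does not match
# integer(1), so vapply() raises an error. tryCatch() records that
# here instead of stopping.
bad = tryCatch(
  vapply(samples, range, integer(1)),
  error = function(e) "error"
)
```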
241+
242+
243+
Other Apply Functions
244+
---------------------
245+
246+
See this Stack Overflow post for a summary:
247+
248+
https://stackoverflow.com/a/7141669
249+
250+
251+
The purrr and dplyr packages provide Tidyverse alternatives to apply functions.
252+
253+
254+
255+
Conditional Expressions
256+
=======================
257+
258+
Sometimes you'll need code to do different things, depending on a condition.
259+
260+
_If-statements_ provide a way to write conditional code.
261+
262+
263+
For example, suppose we want to greet one person differently from the others:
264+
```{r}
265+
266+
```
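
A sketch of such a greeting. The function `greet()` and the names used are hypothetical:

```{r}
# Hypothetical helper: greet "Nick" differently from everyone else.
greet = function(name) {
  if (name == "Nick") {
    greeting = "Hey Nick, welcome back!"
  } else {
    greeting = paste0("Hello, ", name, ".")
  }
  greeting
}

nick = greet("Nick")
other = greet("Ada")
```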
267+
268+
Indent code inside of the if-statement by 2 or 4 spaces.
269+
270+
Indentation makes your code easier to read.
271+
272+
273+
274+
The condition has to be a scalar (a single `TRUE` or `FALSE` value):
275+
```{r}
276+
277+
```
278+
279+
You can chain together if-statements:
280+
```{r}
281+
282+
```
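
A sketch of chaining with `else if`. The function `classify()` is hypothetical; conditions are checked in order, and only the first matching block runs:

```{r}
# Hypothetical helper that labels a single number by its sign.
classify = function(x) {
  if (x < 0) {
    "negative"
  } else if (x == 0) {
    "zero"
  } else {
    "positive"
  }
}

labels = c(classify(-3), classify(0), classify(7))
```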
283+
284+
If-statements return the value of the last expression in the evaluated block:
285+
```{r}
286+
287+
```
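
A sketch of using the returned value (the variable names are made up). The whole if-statement can sit on the right-hand side of an assignment:

```{r}
# Hypothetical count, invented for this illustration.
n = 12

# The value of the block that runs becomes the value of the
# if-statement.
size = if (n > 10) {
  "big"
} else {
  "small"
}
```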
288+
289+
Curly braces `{ }` are optional for single-line expressions:
290+
```{r}
291+
292+
```
293+
294+
But you have to be careful if you don't use them:
295+
```{r}
296+
297+
```
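
A sketch of one common pitfall (the variable names are made up). Without braces, only the first expression after the condition belongs to the if-statement, no matter how the following lines are indented:

```{r}
x = -1
y = 0
z = 0

# Only `y = 1` is controlled by the condition. The indented `z = 2`
# is a separate statement and always runs.
if (x > 0)
  y = 1
  z = 2
```

After this runs, `y` is still 0 but `z` is 2.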
298+
299+
The `else` block is optional:
300+
```{r}
301+
302+
```
303+
304+
When there's no `else` block and the condition is false, the value of the if-statement is `NULL`:
305+
```{r}
306+
307+
```
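
A minimal sketch of the `NULL` result (the variable name is made up):

```{r}
# The condition is false and there is no else block, so nothing is
# evaluated and the if-statement returns NULL.
result = if (FALSE) "never used"

result_is_null = is.null(result)
```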
308+
309+
310+
311+
The Congruent Vectors Strategy
312+
==============================
313+
314+
If-statements don't work well with vectors.
315+
316+
For example, suppose we want to transform a vector `x` so that:
317+
318+
* Negative elements are set to 0.
319+
* Positive elements are squared.
320+
321+
Using an if-statement doesn't work for this:
322+
```{r}
323+
324+
```
325+
326+
327+
Instead, use congruent vectors:
328+
329+
1. An input vector (or vectors) to use in conditions.
330+
331+
2. An output vector to store the results.
332+
333+
Use the input vector to conditionally assign elements to the output vector.
334+
335+
336+
So:
337+
```{r}
338+
339+
```
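
A sketch of the strategy for the transformation described above. The values in `x` are made up:

```{r}
# Input vector, used in the conditions.
x = c(-2, 3, -1, 4)

# Output vector, congruent to x. It starts as all zeros.
result = numeric(length(x))

# Negative elements are set to 0 (already the case, shown for clarity).
result[x < 0] = 0

# Positive elements are squared.
is_pos = x > 0
result[is_pos] = x[is_pos]^2
```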
340+
341+
342+
Another example:
343+
```{r}
344+
345+
```
346+
347+
348+
The `ifelse()` Function
349+
-----------------------
350+
351+
R also has a vectorized `ifelse()` function.
352+
353+
For example:
354+
```{r}
355+
356+
```
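
A sketch of `ifelse()` on the same kind of transformation (positive elements squared, everything else set to 0). The values in `x` are made up:

```{r}
x = c(-2, 3, -1, 4)

# Where the test is TRUE, take the element from x^2. Where it is
# FALSE, take 0.
result = ifelse(x > 0, x^2, 0)
```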
357+
358+
The `ifelse()` function is less efficient than a regular if-statement or the
359+
congruent vectors strategy.
