
Lecture 9
May 28, 2025
Practice + some review + many more details tomorrow in class
Initial practice questions are posted!
Please read the comments on ALL questions. We did not penalize some mistakes this time, but we may in future problems.
In the first few questions, the point adjustment feature was used for partial credit.
Regrade requests: submit within a week if you believe a grading mistake was made.
Code style was the biggest common issue.
Even though most questions had code style points, we did not double penalize on this lab.
There were some minor problems we did not take off for this time but left comments about.
spaces before and line breaks after each + when building a ggplot,
spaces before and line breaks after each |> in a data transformation pipeline,
code should be properly indented
this should match the automatic indentation when you hit enter
to fix: highlight the code, open the Code menu at the top of RStudio, and select Reindent Lines
there should be spaces around = signs and spaces after commas
use |> and not %>%
use <- and not = for saving a data frame
Question 7c: No code needed! Use results 7a and 7b to answer. You can delete the code chunk from the template!
Question 10b: ‘filter your reshaped dataset from question 10’ should be from question 9
Question 2c: ‘How is the output different from the one in part (a)?’ should say from part (b)
Select pages on Gradescope when you submit
At least three commits to your lab 2 repo
These should hopefully be free points!!!

What is a ‘typical value’ of population density?

# A tibble: 1 × 2
mean med
<dbl> <dbl>
1 3098. 1156.
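A summary like the one above can be produced with summarize(). The sketch below is only an illustration: the data frame density_example, its column names, and its values are hypothetical and are not the lecture data. It shows how one very large value pulls the mean well above the median.

```r
library(tidyverse)

# Hypothetical right-skewed population densities; NOT the lecture data
density_example <- tibble(
  county = c("A", "B", "C", "D", "E"),
  pop_density = c(50, 120, 300, 900, 12000)
)

# Compute two candidate "typical values" in one summary row
typical <- density_example |>
  summarize(
    mean = mean(pop_density),
    med = median(pop_density)
  )

typical
```

With skewed data like this, the median is usually the better description of a typical value.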
Before Midterm 1…
After Midterm 1…

library(), data importing (read_csv), and webscraping (eventually)
mutate, fct_relevel, pivot_*, *_join
ggplot, geom_*, etc.
summarize, group_by, count, mean, median, sd, quantile, IQR, cor, etc.
When data is in a package, such as tidyverse, loading the package gives us access to the dataset
Most often, this is not the case

read_csv() - file saved as .csv
read_tsv(), read_delim(), etc. - other file formats
read_excel() - .xls or .xlsx
read_sheet() - We haven’t covered this in the videos, but might be useful for your projects
Generally, the format is:
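A minimal sketch of the usual pattern, object_name <- read_*("path/to/file"), assuming the readr package is installed. The file names, column names, and values are hypothetical; the files are written to a temporary folder only so the example runs on its own.

```r
library(readr)

# Hypothetical files written to a temporary folder so the sketch is self-contained
csv_path <- file.path(tempdir(), "example.csv")
tsv_path <- file.path(tempdir(), "example.tsv")
write_lines(c("month,avg_high_f", "January,49", "February,53"), csv_path)
write_lines(c("month\tavg_high_f", "January\t49", "February\t53"), tsv_path)

# General pattern: object_name <- read_*("path/to/file")
from_csv <- read_csv(csv_path, show_col_types = FALSE)
from_tsv <- read_tsv(tsv_path, show_col_types = FALSE)

# read_delim() handles other separators through the delim argument
from_delim <- read_delim(tsv_path, delim = "\t", show_col_types = FALSE)

from_csv
```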
Where is durham-climate.csv?


use / to separate folder(s) + file names; file path in quotes
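As an illustration of building a file path, the sketch below assumes a hypothetical project folder that contains a data folder holding durham-climate.csv. The folder names and the file contents are assumptions made for the example, not the actual course layout.

```r
library(readr)

# Hypothetical layout: example-project/data/durham-climate.csv
project_dir <- file.path(tempdir(), "example-project")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)
write_lines(
  c("month,avg_high_f", "January,49"),
  file.path(project_dir, "data", "durham-climate.csv")
)

# Folder(s) and file name separated by "/", with the whole path in quotes
climate_path <- paste0(project_dir, "/data/durham-climate.csv")
durham_climate <- read_csv(climate_path, show_col_types = FALSE)

durham_climate
```

If the working directory were the project folder itself, the relative path "data/durham-climate.csv" would point to the same file.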
Answer:
ae-mneubrander?
Where is durham-climate.csv?



This allows us to save data for later usage, sharing outside of R, etc.
Using write_csv():
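A minimal sketch of writing a data frame to disk, assuming readr and tibble are installed. The data frame scores and the file name are hypothetical; the first argument of write_csv() is the data frame and the second is the path to write to.

```r
library(readr)
library(tibble)

# Hypothetical data frame to save
scores <- tibble(
  student = c("A", "B"),
  score = c(90, 85)
)

# write_csv(data frame, "path/to/file.csv")
out_path <- file.path(tempdir(), "scores.csv")
write_csv(scores, out_path)

# Reading the file back confirms what was saved
scores_again <- read_csv(out_path, show_col_types = FALSE)

scores_again
```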
Read a CSV file with tidy data
Split it into subsets based on features of the data
Write out subsets as CSV files
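The three steps above might look like the following sketch. The sales data, its column names, and the output file names are all hypothetical, and the input file is created inside the example only so that it can run on its own.

```r
library(tidyverse)

# Set up a hypothetical tidy CSV file to work with
sales <- tibble(
  region = c("east", "west", "east", "west"),
  amount = c(10, 20, 30, 40)
)
in_path <- file.path(tempdir(), "sales.csv")
write_csv(sales, in_path)

# 1. Read a CSV file with tidy data
sales_in <- read_csv(in_path, show_col_types = FALSE)

# 2. Split it into subsets based on a feature of the data
east <- sales_in |>
  filter(region == "east")
west <- sales_in |>
  filter(region == "west")

# 3. Write out the subsets as CSV files
east_path <- file.path(tempdir(), "sales-east.csv")
west_path <- file.path(tempdir(), "sales-west.csv")
write_csv(east, east_path)
write_csv(west, west_path)
```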
case_when() is similar to if_else(), but allows multiple cases
case_when() is often used in mutate() to make a new column
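A short sketch of case_when() inside mutate(), using a hypothetical temps data frame and made-up cutoffs. Conditions are checked in order and the first one that is TRUE determines the value, so the final TRUE line acts as a catch-all.

```r
library(dplyr)

# Hypothetical monthly temperatures
temps <- tibble(
  month = c("January", "April", "July"),
  avg_high_f = c(49, 72, 89)
)

# Create a new column with three possible labels
temps_labeled <- temps |>
  mutate(
    warmth = case_when(
      avg_high_f < 60 ~ "cool",
      avg_high_f < 80 ~ "mild",
      TRUE ~ "hot"
    )
  )

temps_labeled
```

With only two outcomes, if_else(avg_high_f < 60, "cool", "not cool") would do the same job.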
Read an Excel file with non-tidy data
Tidy it up!
We’ve seen lots of functions that deal with numeric data (mean, median, sum, etc.) - what about characters?
stringr is a tidyverse package with lots of functions for dealing with character strings
today: str_detect in stringr

str_detect() checks whether some characters are a substring of a larger string, returning TRUE or FALSE
useful in cases when you need to check some condition, for example:
in a filter()
in an if_else() or case_when()
example: which classes in a list are in the stats department?
General form:
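A minimal sketch of the general form, str_detect(string, pattern), assuming the tidyverse is installed. The course list and the "STA" department prefix are hypothetical values chosen for illustration.

```r
library(tidyverse)

# Hypothetical list of classes
classes <- tibble(
  course = c("STA 199", "CS 101", "STA 210", "MATH 212")
)

# str_detect(string, pattern) returns one TRUE or FALSE per string
is_stats <- str_detect(classes$course, "STA")

# Inside filter(), it keeps only the rows where the pattern is found
stats_classes <- classes |>
  filter(str_detect(course, "STA"))

stats_classes
```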

Are these data tidy? Why or why not?
What “data moves” do we need to go from the original, non-tidy data to this, tidy one?
