Lab 0

Hello, World and STA 199!

Lab
End of lab on May 15

This lab will set you up for the computing workflow and give you an opportunity to introduce yourselves to each other and the teaching team.

Hello, World!

You may have seen or heard the phrase Hello, World! before. It’s usually the first thing you learn in programming: writing a computer program that prints this sentence to the screen. Things will be different in this course, as it’s a data science course rather than a programming course. So, starting tomorrow in class, you’ll learn to use a computer program (called R) to work with data.
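For the curious, the classic first program is very short in R. The sketch below is only an illustration; the variable name greeting is made up for this example and is not part of any course material.

```r
# Store the classic greeting in a variable, then print it to the console
greeting <- "Hello, World!"
print(greeting)
```

You can run these two lines in the RStudio Console once your container is set up.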

But today, we need to set you up for success! Let’s first briefly review the components of the computational toolkit for the course:

  1. R: The programming language you’ll learn in this course.

  2. RStudio: The piece of software (a.k.a. the integrated development environment, IDE) you’ll use to write R code in.

Note

R is the name of the programming language itself and RStudio is a convenient interface.

  3. Quarto: The tool you’ll use to create reproducible computational documents that contain both your narrative (i.e., words in English) and your code (i.e., code in R). Every assignment you hand in will be a Quarto document.

Note

You might be familiar with word processors like MS Word or Google Docs. We will not be using these in this class. Instead, the words you would write in such a document as well as the code will go into a Quarto document, and when you render the document (more on what this means later) you will get a document out that has your words, your code, and the output of that code. Everything in one place, beautifully formatted!

  4. Git: A version control system.
  5. GitHub: A web hosting service for the Git version control system that also allows for transparent collaboration between team members.

Note

Git is a version control system (like the “Track Changes” feature in Microsoft Word, but more powerful) and GitHub is the home for your Git-based projects on the internet (like Dropbox, but much better).

Access R and RStudio

  • Go to https://cmgr.oit.duke.edu/containers and log in with your Duke NetID and password.
  • Click STA198-199 to log into the Docker container. You should now see the RStudio environment.

If you have not reserved a container yet, go to https://cmgr.oit.duke.edu/containers and, under Reservations available, click on reserve STA 198-199 to reserve a container for yourself.

Note

A container is a self-contained instance of RStudio for you, and you alone. You will do all of your computing in your container.

Once you’ve reserved the container you’ll see that it will show up under My reservations.

To launch your container, click on it under My reservations, then click Login, and then Start.[1]

Create a GitHub account

You should have already done this yesterday during class! Go ahead and check to make sure you remember what email address you used with your GitHub account.

Set up your SSH key

You will authenticate with GitHub using SSH (Secure Shell Protocol; it doesn’t really matter what this means for the purpose of this course). Below is an outline of the authentication steps; you are encouraged to follow along as your TA demonstrates the steps.

Note

You only need to do this authentication process one time on a single system.

  • Go back to your RStudio container and type credentials::ssh_setup_github() into your console.
  • R will ask “No SSH key found. Generate one now?” Type 1 for yes and press enter/return.
  • R will generate a key and then ask “Would you like to open a browser now?” Type 1 for yes and press enter/return.
  • You may be asked to provide your GitHub username and password to log into GitHub. After entering this information, paste the key in and give it a name. You might name it in a way that indicates where the key will be used (e.g., sta199).

You can find more detailed instructions here if you’re interested.
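Because this setup only needs to happen once, it can help to check whether a key already exists before you start. The sketch below assumes your keys are stored in the default ~/.ssh folder of your container, which may not hold on every system; the variable names are chosen for illustration only.

```r
# Assumption: SSH keys live in the default ~/.ssh folder
ssh_dir <- path.expand("~/.ssh")

# Public key files end in .pub; an empty result means no key was found there
public_keys <- list.files(ssh_dir, pattern = "\\.pub$", full.names = TRUE)
print(public_keys)
```

If this prints character(0), no public key was found in that folder, and running credentials::ssh_setup_github() as described above should offer to create one.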

Configure Git to introduce yourself

There is one more thing we need to do before getting started on the assignment. Specifically, we need to configure Git so that RStudio can communicate with GitHub. This requires two pieces of information: your name and email address.

To do so, you will use the use_git_config() function from the usethis package.

Note

You’ll hear about 📦 packages a lot in the context of R – basically they’re how developers write functions and bundle them to distribute to the community (and more on this later too!).

Type the following lines of code in the console in RStudio filling in your name and the address associated with your GitHub account.

usethis::use_git_config(
  user.name = "Your name", 
  user.email = "Email associated with your GitHub account"
)

For example, mine would be

usethis::use_git_config(
  user.name = "Marie Neubrander", 
  user.email = "mneubrander@crimson.ua.edu"
)

Note

I used my Alabama email because that is the one I used to create my GitHub account. You should also be using the email address you used to create your GitHub account, even if it isn’t your Duke email.

You are now ready to interact with GitHub via RStudio!

If you don’t do this, we won’t be able to tell who you are and give you points for the work you do…

Note

Well, a more accurate statement would be that we will be reaching out to you to get things right during the first few weeks of classes. But, it’s best if you can get it right today!

How do I know if I did it right?

When you input the usethis::use_git_config(...) command into the console and hit enter/return, it will appear as if nothing happened. To verify that everything worked, you can briefly switch over to the Terminal tab (should be to the right of Console) and type the commands git config user.name and git config user.email. If all is well, these will return whatever text you originally provided. If you notice a mistake or typo, you can just go back to the Console and rerun usethis::use_git_config(...) with modified inputs.
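If you would rather stay in the Console, the same check can be sketched in R by asking Git for the stored values. The helper name read_git_setting is hypothetical (it is not part of usethis or any other package), and the sketch assumes the git command is available in your container.

```r
# Hypothetical helper: ask Git for one configuration value and return it as text
read_git_setting <- function(key) {
  if (!nzchar(Sys.which("git"))) {
    return(NA_character_)  # Git is not available on this system
  }
  # stdout = TRUE captures what Git prints instead of showing it in the console
  value <- suppressWarnings(
    system2("git", c("config", key), stdout = TRUE, stderr = FALSE)
  )
  # Git prints nothing when the setting has not been configured
  if (length(value) == 0) NA_character_ else value[1]
}

read_git_setting("user.name")
read_git_setting("user.email")
```

Each call should return the text you passed to usethis::use_git_config(...), or NA if that setting has not been configured yet.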

The following video walks you through the steps outlined in the SSH key generation and Git configuration sections above.

Clone first repo

A GitHub repository (repo) is a collection of files hosted on GitHub. Repos are the main way we will distribute files to you during the semester. When you copy the files in a repo to your local computing environment (your container, in this case), that’s called “cloning”. So, let’s clone our first repo:

  • Go to the course organization at github.com/sta199-su25 on GitHub. Click on the repo with the prefix ae. This is your application exercise (AE) repo. It contains the starter documents you need to complete your AE for today’s lecture (and will contain the starter files for future AEs).

  • Click on the green Code button and select Use SSH (this might already be selected by default, and if it is, you’ll see the text Clone with SSH). Click on the clipboard icon to copy the repo URL.

  • In RStudio, go to File > New Project > Version Control > Git.

  • Copy and paste the URL of your assignment repo into the dialog box Repository URL. Again, please make sure to have SSH highlighted under Clone when you copy the address.

  • Click Create Project, and the files from your GitHub repo will be displayed in the Files pane in RStudio.

You will need to get used to these steps, because you’ll probably clone at least one new repo every week.
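A common slip in the steps above is copying the HTTPS address instead of the SSH one. GitHub’s SSH addresses start with git@github.com:, while HTTPS addresses start with https://github.com/. The sketch below shows one way to tell them apart; the helper name is_ssh_url and the repo name ae-example are both made up for illustration.

```r
# Hypothetical helper: TRUE when a clone URL uses GitHub's SSH form
is_ssh_url <- function(url) {
  grepl("^git@github\\.com:.+/.+\\.git$", url)
}

# The repo name ae-example is invented for this example
is_ssh_url("git@github.com:sta199-su25/ae-example.git")      # TRUE
is_ssh_url("https://github.com/sta199-su25/ae-example.git")  # FALSE
```

If the address you copied looks like the second one, go back to the Code button on GitHub and switch to SSH before copying again.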

Hello STA 199!

Fill out the course “Getting to know you” survey on Canvas.

We will use the information collected in this survey for a variety of goals, including getting to know you as a person and your course goals and concerns.

Footnotes

  1. Yes, it’s too many steps. I don’t know why! But it works, and you’ll get used to it. Trust me, it beats downloading and installing everything you need on your computers!