"What does it mean to do empirical social science? Asking good questions. Digging up novel data. Designing statistical analysis. Writing up results.
For many of us, most of the time, what it means is writing and debugging code. We write code to clean data, to transform data, to scrape data, and to merge data. We write code to execute statistical analyses, to simulate models, to format results, to produce plots. We stare at, puzzle over, fight with, and curse at code that isn’t working the way we expect it to. We dig through old code trying to figure out what we were thinking when we wrote it, or why we’re getting a different result from the one we got the week before."
Code and Data for the Social Sciences: A Practitioner’s Guide (Matthew Gentzkow and Jesse M. Shapiro)
Classes will be held in a computer room. You can use either one of the PSE computers or your personal laptop. In either case, please install R and RStudio on your laptop before the first class. You can find installation instructions here. If you already have R and RStudio on your laptop, make sure you have the latest versions. You will use R and RStudio for several classes.
Class 1: Introduction to data wrangling with the tidyverse
Class 2: Inspecting, cleaning, plotting and analyzing data
Class 3: Writing LaTeX articles and making websites using R Markdown
Class 4: Working with string variables and functions through an introduction to web scraping with R
Throughout this course and in the future, you will spend quite some time debugging your code. The first thing to acknowledge is that you are probably not the first one to run into this particular issue. You can search for help on these two very helpful forums: Stack Overflow and RStudio Community.
Keep in mind that when you start using R, many issues arise because of packages: a package that is not installed, not loaded, etc.
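As a minimal sketch of these two package problems, the helper below first checks that a package is installed and then loads it. The function name `load_or_install` is hypothetical (it is not part of R or of any package); `requireNamespace()`, `install.packages()` and `library()` are standard R functions.

```r
# Hypothetical helper (not part of any package): installs a package if it is
# missing, then attaches it for the current R session.
load_or_install <- function(pkg) {
  # requireNamespace() returns FALSE when the package is not installed.
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Installation is needed only once per machine (requires internet access).
    install.packages(pkg)
  }
  # Loading is needed in every new R session before using the package.
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}
```

For instance, `load_or_install("tidyverse")` would cover both cases. If R reports "there is no package called ..." the package is not installed; if it reports "could not find function ..." the package is most likely installed but not loaded.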
I strongly encourage you to go over this online chapter on debugging once you are familiar with R (especially its subsection on debugging code in RMarkdown, which might be useful for your assignments).
Over the past decades, economics has relied more and more on data analysis. This has forced economists to learn some coding in order to use statistical software (e.g. R, Stata, Python). Unfortunately, economics students are rarely taught the basics of computer science and coding. We usually learn on the go.
You can find my attempt to gather the Good Coder Commandments here.
RStudio: Familiarize yourself with RStudio
Keyboard shortcuts in RStudio: Useful list of all the keyboard shortcuts in RStudio
Import data: How to import data with the tidyverse
Tidy data: How to tidy data with tidyr
Transform data: How to manipulate and summarize data with dplyr
Data visualization: How to graph your data with ggplot2
String variables: How to manipulate string variables with stringr
Date and time variables: How to deal with date and time variables with lubridate
Spatial data: How to manipulate spatial data with sf
Functions: How to apply functions with purrr
RMarkdown 1/2: How to generate an RMarkdown document (for your assignments!)
RMarkdown 2/2: Useful reference about RMarkdown syntax and options
Code and Data for the Social Sciences: A Practitioner’s Guide (Matthew Gentzkow and Jesse M. Shapiro)
IMPORTANT: all homework should be submitted in a zipped folder called classX_firstname_lastname.zip, where X is the number of the class, e.g. class1_hannah_bull.zip.
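To avoid typos in the file name, the name can be built in R. This is only a sketch: the function `homework_zip_name` is hypothetical, and lowercasing the names is an assumption based on the example above.

```r
# Hypothetical helper: builds the archive name classX_firstname_lastname.zip.
# Lowercasing the names is an assumption taken from the example file name.
homework_zip_name <- function(class_number, first_name, last_name) {
  paste0(
    "class", class_number, "_",
    tolower(first_name), "_",
    tolower(last_name), ".zip"
  )
}

homework_zip_name(1, "Hannah", "Bull")
```

The resulting name can then be used when zipping the homework folder with your usual zip tool.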