You will need to download MiKTeX or TeX Live for this class.
Windows: MiKTeX (Complete) - http://miktex.org/2.9/setup (NOTE: Be sure to download the Complete rather than Basic installation)
Mac OS X: MacTeX 2013+ (Full) - http://tug.org/mactex/ (NOTE: Downloading with Safari rather than Chrome is strongly recommended)
Linux: TeX Live 2013+ (Full) - sudo apt-get install texlive-full
Install the package knitr (command install.packages("knitr"))
Click Knit (at the top of the window)
Install the package devtools
Run: devtools::install_github('yihui/tinytex')
Open the MiKTeX Console (find it in the search bar), and in settings select 'always install packages on the fly'
We will learn how to create any type of document (homework, paper, slides, etc…) in any format (PDF, html, beamer) using Rmarkdown. The main advantage of Rmarkdown is that it allows you to create a single document that includes both your text and your code.
This document can then be knitted: you tell R to convert it to the format of your choice. When doing so, R reads the plain text of your document and also evaluates the code chunks that it contains. Chunk options let you tell R whether the code and its output should be displayed in the final document or not.
To see an example of the homework exercise, which is the goal of today's lesson, click here.
Our aim for today is to learn how to write a document that contains tables and graphs that you create in R.
The data for today’s exercise can be downloaded from here.
In order to open an Rmarkdown file, you can click on the + sign below the File tab that you usually use to open an R script. This time, you can select R Markdown. A window appears, where you can write down the name of your document and choose its type (PDF, html…). If you did not manage to install LaTeX on your computer before the session, select the html option.
Your document starts with a preamble, declaring information on the type and structure of the document.
---
title: "Title"
author: "First Name Last Name"
date: "September 11th, 2019"
output: pdf_document
---
The "output" line allows you to choose the document type. Here are the most commonly used document types:

* pdf_document produces a PDF file
* word_document produces a Word file
* beamer_presentation produces a Beamer document, i.e. slides compiled with LaTeX
* html_document produces an HTML file
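Knitting can also be started from the R console rather than with the Knit button. The sketch below assumes that the rmarkdown package and pandoc (bundled with RStudio) are available. The temporary file is created only for illustration. It suggests how the output_format argument of render() takes the same names as the "output" line and overrides it, here producing HTML so that no LaTeX installation is needed.

```r
library(rmarkdown)

# Write a minimal R Markdown file to a temporary location (illustration only).
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  'title: "Title"',
  "output: pdf_document",
  "---",
  "",
  "# Section",
  "",
  "Some text."
), rmd_file)

# Render to HTML; output_format overrides the 'output' line of the preamble.
out_file <- render(rmd_file, output_format = "html_document", quiet = TRUE)
out_file  # path of the generated .html file
```

Replacing "html_document" with "pdf_document" or "word_document" should produce the corresponding file type, provided the required software is installed.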
You can structure your document into sections and subsections using the # sign:
# Section
## Subsection
### Subsubsection
etc
You can then write your text as plain text. Here are a few useful commands that you can use to format your text.
Symbol | Effect |
---|---|
*italics* | italics |
**bold** | bold |
exponent^2^ | exponent² |
[link to Course website](https://introtor-pse.appspot.com/) | link to Course website |
$y = \alpha + \beta x + \epsilon$ | \(y = \alpha + \beta x + \epsilon\) |
The main advantage of using Rmarkdown is that you can include in your document pieces of code that might or might not be evaluated by R. These pieces of code are called Code chunks.
Here is how you can create a code chunk:
```{r}
# Write your R code here.
```
Within the {}, you can specify in which software's syntax you're writing your code (in your case, this is always going to be R) as well as a set of options for your code chunk. The syntax is hence {r, options}.
The code options are:
Option | Effect |
---|---|
eval = FALSE | knitr will not run the code in the code chunk |
include = FALSE | knitr will run the chunk but not include the chunk in the final document |
echo = FALSE | knitr will not display the code in the code chunk above its results in the final document |
The results options are:
Option | Effect |
---|---|
results = 'hide' | knitr will not display the code’s results in the final document |
results = 'hold' | knitr will delay displaying all output pieces until the end of the chunk |
warning = FALSE | knitr will not display any warning messages generated by the code |
message = FALSE | knitr will not display any messages generated by the code |
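As an illustration of how these options combine, here is a small sketch with made-up numbers (the vector x is purely hypothetical). With results = 'hold', the code of the chunk should appear as one block, followed by both results together.

```{r, results = 'hold', warning = FALSE, message = FALSE}
# Toy data, for illustration only
x <- c(2, 4, 6, 8)
mean(x)  # output is held back until the end of the chunk
sd(x)    # printed together with the mean
```

Adding echo = FALSE to the same chunk would keep the two results and hide the code.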
Let's install and load our packages, and load our data. Since we do not want this code chunk to appear in the final document, we are going to use the include = FALSE option.
```{r, include = FALSE, warning=FALSE, message=FALSE, error=FALSE}
# Packages needed for this session
list.of.packages <- c("tidyverse", "knitr", "haven", "stargazer")
# Install only the packages that are not installed yet
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages, repos = "http://cran.us.r-project.org")
# Load the packages
library("tidyverse")
library("knitr")
library("haven")
library("stargazer")
# Import the Stata data set
df <- read_dta("Data/Score_US.dta") # you might use a different function depending on your file format.
```
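Say we have predefined an object myvalue = 42. In order to display this value in the text, write `r myvalue` (a backtick, the letter r, a space, the expression, and a closing backtick). When the document is knitted, this prints 42.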
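To see inline code at work without creating a file, the following sketch knits a four-line R Markdown source held in a character vector. It assumes the knitr package is installed. The object name myvalue follows the text, while the sentence itself is invented for illustration.

```r
library(knitr)

# Three backticks, built with strrep() so they do not clash with this code block.
fence <- strrep("`", 3)

# A tiny R Markdown source, one element per line.
src <- c(
  paste0(fence, "{r, include = FALSE}"),
  "myvalue <- 42",
  fence,
  "The value is `r myvalue`, and twice that is `r 2 * myvalue`."
)

# knit(text = ...) returns the knitted markdown as a character string.
out <- knit(text = src, quiet = TRUE)
cat(out)
```

The knitted text should contain the sentence with the two inline expressions replaced by their values, while the hidden chunk leaves no trace.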
There are two ways to output tables in your final document:
write down the table manually in the plain text of your Rmarkdown document
output a table that you have created in your R code
Type this as plain text:
Row names | Values
----------| ------
Row 1 | Value 1
Row 2 | Value 2
It should look like this:
Row names | Values |
---|---|
Row 1 | Value 1 |
Row 2 | Value 2 |
You can insert an R table into your document using the kable function. The syntax is as follows: kable(table_name, options). Here are the most useful options of the kable function:
Option | Effect |
---|---|
align = c("l","c","c","c") | First column will be aligned to the left, second to fourth columns will be centered |
col.names = c("Name 1", "Name 2", "Name 3", "Name 4") | Defines the column headers |
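Here is a minimal sketch of these two options, using the built-in mtcars data rather than the course data. The object names cars_small and tab, the column labels and the caption are chosen for illustration only.

```{r, echo = FALSE}
library(knitr)

# First three rows and three columns of the built-in mtcars data
cars_small <- head(mtcars[, c("mpg", "cyl", "hp")], 3)

tab <- kable(cars_small,
             row.names = FALSE,         # drop the car names: exactly 3 columns remain
             align = c("l", "c", "c"),  # one alignment letter per column
             col.names = c("Miles per gallon", "Cylinders", "Horsepower"),
             caption = "First rows of mtcars")
tab
```

The align and col.names vectors need one entry per displayed column, which is why the row names are dropped here.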
Try to create a table containing some summary statistics for reading (X4RSCALK1) and maths (X4MSCALK1). The variables should be in rows and the different summary statistics (mean, sd, min, max) in columns. Remember that we want R to evaluate the code without displaying the code and its result in the document.
Revision question 1: The relevant variables have been selected for you, and the data has been converted from wide into long format for you. Group by the subject, then summarise the score, computing the mean, SD, min and max. Don't forget to ungroup at the end.
```{r, include = FALSE}
table <- df %>%
select(CHILDID, reading = X4RSCALK1, maths = X4MSCALK1) %>%
gather(key = subject, value = score, reading:maths) %>%
### FILL HERE (group_by)
### FILL HERE (summarise, creating variables Mean, SD, Min, Max)
### FILL HERE (ungroup)
```
Let’s now output this table to our document. Since we do not want to output the code to our document, we are going to use the echo = FALSE option:
```{r, echo = FALSE}
kable(table, caption = 'This is my caption')
```
subject | Mean | SD | Min | Max |
---|---|---|---|---|
maths | 63.40389 | 13.17846 | 15.5707 | 93.6786 |
reading | 70.02437 | 13.12185 | 25.2709 | 95.1282 |
You can use the R package stargazer to make nice tables, specifically regression tables.
For example, you can use stargazer(table) to produce a LaTeX table of the previous table we created, or you can easily create a table of a regression. See the package documentation for more details. You will need to use results='asis' in the options for the code chunk in order for the table to appear.
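The sketch below shows the general pattern on the built-in mtcars data, not on the course data. The object name fit_cars and the title are hypothetical. It assumes the document is knitted to PDF. For HTML output, type = 'html' would be the natural choice.

```{r, echo = FALSE, results = 'asis', message = FALSE, warning = FALSE}
library(stargazer)

# Simple regression of fuel efficiency on weight (illustration only)
fit_cars <- lm(mpg ~ wt, data = mtcars)

# header = FALSE suppresses the comment line stargazer adds on top of the table
stargazer(fit_cars, header = FALSE, type = 'latex',
          title = "Fuel efficiency and weight")
```

Without results = 'asis', the LaTeX code would be printed as raw text instead of being rendered as a table.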
Revision question 2: Regress X1RSCALK1 (reading score) on X1MSCALK1 (maths score).
```{r, echo = FALSE, results='asis', warning=FALSE, error=FALSE, message=FALSE }
fit <- ### FILL HERE
stargazer(fit, header=FALSE, type='latex')
```
 | Dependent variable: |
---|---|
 | X1RSCALK1 |
X1MSCALK1 | 0.674*** |
 | (0.005) |
Constant | 16.946*** |
 | (0.160) |
Observations | 14,045 |
R² | 0.575 |
Adjusted R² | 0.575 |
Residual Std. Error | 6.378 (df = 14043) |
F Statistic | 18,965.390*** (df = 1; 14043) |
Note: | \*p<0.1; \*\*p<0.05; \*\*\*p<0.01 |
![Caption for the picture.](/path/to/image.pdf)
Try to insert the distribution of the variable of your choice into the document. Make sure no unintended message will be output to the document, and center the figure using the fig.align='center' option.
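As a hedged illustration of the figure options, the chunk below draws a base R histogram of a built-in variable. The data set (mtcars), the object name h and the caption are chosen only for the example. fig.cap is a knitr option that adds a caption under the figure.

```{r, fig.width = 5, fig.height = 4, fig.align = 'center', fig.cap = "Distribution of mpg in mtcars", echo = FALSE, message = FALSE, warning = FALSE}
# Base R histogram of a built-in variable (illustration only)
h <- hist(mtcars$mpg, main = "", xlab = "Miles per gallon")
```

The same chunk options apply unchanged when the plot is produced with ggplot2.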
Revision question 3: Create a univariate ggplot using the variable X4RSCALK1 (reading score). The geometry is a histogram plot. Add the classic theme and label the x-axis with "Reading score".
```{r , fig.width = 5, fig.height = 4, warning = FALSE, echo = FALSE, error = FALSE, message = FALSE, fig.align='center'}
### FILL HERE
```
In order to add a bibliography to the end of your document, you first need to create a .bib file, as you would for LaTeX, that contains the list of references that you want to cite in BibTeX format.
In order to obtain the BibTeX reference of a paper, you can search for the paper in Google Scholar and click the quotation mark (“Cite”) sign below the result. Then, click on BibTeX and copy and paste the reference into your .bib file.
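For R packages, BibTeX entries can also be generated from R itself. The following sketch assumes the knitr package is installed and uses its write_bib() function. The file name packages.bib is hypothetical, and an existing file of that name in the working directory would be overwritten.

```r
library(knitr)

# Hypothetical file name; choose one that does not clash with your own .bib file.
bib_file <- "packages.bib"

# Write BibTeX entries for the listed packages; keys take the form R-<package>.
write_bib(c("knitr", "stargazer"), file = bib_file)

# Inspect the first lines of the generated file
head(readLines(bib_file))
```

The generated entries can then be copied into your own .bib file and cited with keys such as R-knitr.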
Then, add a line specifying the .bib file name to the Preamble:
bibliography: biblio.bib
You should insert the reference as follows, in your text:
Blablabla, as was shown by @reference_name.
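This will result in an author-year citation in your text, with the full reference listed in the bibliography at the end of the document.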
Using the dataset of your choice, create an article template including:
Headings and subheadings
Some text (lorem ipsum)
A summary statistics table
A ggplot
A regression table with one simple regression
An inline value incorporated in the text
A mathematical formula in the text
A bibliography with at least 2 references
The article does not need to make sense!
You can find many free data sets online, for example:
UCI Machine Learning Repository
Upload your PDF file at the link on the homepage.