We will learn how to create any type of document (homework, paper, slides, etc…) in any format (PDF, html, beamer) using Rmarkdown. The main advantage of Rmarkdown is that it allows you to create a single document that includes both your text and your code.
This document can then be knitted: you tell R to convert your document to the format of your choice. When doing so, R reads the plain text of your document and also evaluates the code chunks that it contains. A set of chunk options lets you tell R whether the code and its output should be displayed in the final document or not.
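Knitting is usually done with the Knit button in RStudio, but it can also be triggered from the R console with the render() function of the rmarkdown package. The sketch below assumes that rmarkdown and pandoc are installed (RStudio ships with pandoc); the file name and its contents are made up for illustration.

```r
library(rmarkdown)

# Write a tiny, hypothetical R Markdown file to a temporary location
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Knitting test\"",
  "output: html_document",
  "---",
  "",
  "One plus one equals `r 1 + 1`."
), rmd_file)

# render() knits the file; output_format overrides the output line of the header
out_file <- NULL
if (pandoc_available()) {
  out_file <- render(rmd_file, output_format = "html_document", quiet = TRUE)
}
out_file  # path of the knitted .html file (NULL if pandoc is not available)
```

Replacing "html_document" with "pdf_document" would produce a PDF instead, provided LaTeX is installed.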
Our aim for today is to learn how to write a document (for instance your first Measurement homework!) that contains tables and graphs that you created in R.
The data for today’s exercise can be downloaded from here.
In order to open an Rmarkdown file, you can click on the + sign below the File tab that you usually use to open an R script. This time, you can select R Markdown. A window appears, where you can write down the name of your document and choose its type (PDF, html…). If you did not manage to install LaTeX on your computer before the session, select the html option.
Now that you have created your file, it contains a Preamble of the following form:
---
title: "Title"
author: "First Name Name"
output: pdf_document
---
You can add a date by adding the following line to the Preamble: date: "September 13th, 2019"
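The output line allows you to choose the document type, and may be changed at any time. Commonly used document types include pdf_document, html_document and beamer_presentation, which correspond to the PDF, html and beamer formats mentioned above.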
Your document can be structured with different sections and subsections using the # sign:
# Exercise 1
## 1^st^ question
### Question 1.a
You can then write your text as plain text. Here are a few useful commands that you can use to format your text.
Symbol | Effect |
---|---|
`*italics*` | *italics* |
`**bold**` | **bold** |
`exponent^2^` | exponent^2^ |
`[link to Course website](https://introtor-pse.appspot.com/)` | [link to Course website](https://introtor-pse.appspot.com/) |
`$y = \alpha + \beta x + \epsilon$` | $y = \alpha + \beta x + \epsilon$ |
The main advantage of using Rmarkdown is that you can include in your document pieces of code that may or may not be evaluated by R. These pieces of code are called code chunks.
Here is how you can create a code chunk:
```{r}
Write your R code here.
```
Within the {}, you can specify the language in which you are writing your code (in your case, this is always going to be R) as well as a set of options for your code chunk. The syntax is hence {r, options}.
The code options are:
Option | Effect |
---|---|
eval = FALSE | knitr will not run the code in the code chunk |
include = FALSE | knitr will run the chunk but not include the chunk in the final document |
echo = FALSE | knitr will not display the code in the code chunk above its results in the final document |
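As an illustration, the chunk below is run by knitr and its output appears in the document, but the code itself is hidden. It uses R's built-in cars data rather than the course data, and the chunk label echo-example is arbitrary.

```{r echo-example, echo = FALSE}
# With echo = FALSE, only the printed summary appears in the knitted document
speed_summary <- summary(cars$speed)
speed_summary
```

Changing the option to include = FALSE would still run the code but hide the summary as well.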
The results options are:
Option | Effect |
---|---|
results = 'hide' | knitr will not display the code’s results in the final document |
results = 'hold' | knitr will delay displaying all output pieces until the end of the chunk |
warning = FALSE | knitr will not display any warning messages generated by the code |
message = FALSE | knitr will not display any messages generated by the code |
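Rather than repeating the same options in every chunk, a common pattern is to set defaults once, in a first chunk, with knitr's opts_chunk$set(). This is a minimal sketch; which defaults to choose is a matter of taste, and any chunk can still override them in its own header.

```{r setup, include = FALSE}
library(knitr)
# Defaults applied to every later chunk unless that chunk overrides them
opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE)
```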
All of the packages that you’ll use in your document need to be loaded within your Rmarkdown document. For today’s session, you’ll need the packages haven to load data in dta format and tidyverse to make changes to your data. You will also need the package knitr in order to knit your document. The chunk below also loads broom, which is used later to tidy test results. Do not forget to load the packages once they are installed.
Since we do not want this code chunk to appear in the final document, we are going to use the include = FALSE option.
```{r packages2, include = FALSE}
# Packages needed in this document
list.of.packages <- c("tidyverse", "haven", "knitr", "broom")
# Install any package that is not installed yet
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[, "Package"])]
if (length(new.packages)) install.packages(new.packages, repos = "http://cran.us.r-project.org")
# Load all packages without printing the return value of lapply()
invisible(lapply(list.of.packages, library, character.only = TRUE))
```
Let’s now load our data.
```{r, include = FALSE}
df <- read_dta("Data/Score_US.dta") # you might use a different function depending on your file format.
```
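As the comment in the chunk above says, the reading function depends on the file format. As a rough guide, haven also provides read_sav() for SPSS files and read_sas() for SAS files, and readr (loaded with tidyverse) provides read_csv(). The sketch below uses a made-up data frame and temporary files, so it runs without the course data.

```{r read-example, include = FALSE}
library(haven)
library(readr)

# Hypothetical toy data, written to temporary files in two formats
toy <- data.frame(id = 1:3, score = c(70.5, 63.2, 81.0))
dta_file <- tempfile(fileext = ".dta")
csv_file <- tempfile(fileext = ".csv")
write_dta(toy, dta_file)
write_csv(toy, csv_file)

# Each format has its own reading function
toy_dta <- read_dta(dta_file)
toy_csv <- read_csv(csv_file, col_types = cols())  # cols() lets readr guess types quietly
```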
There are two ways to output tables in your final document:
Try to type this as plain text:
Row names | Values
----------| ------
Row 1 | Value 1
Row 2 | Value 2
It should look like this:
Row names | Values |
---|---|
Row 1 | Value 1 |
Row 2 | Value 2 |
You can insert an R table into your document using the kable function. The syntax is as follows: kable(table_name, options). Here are the most useful options of the kable function:
Option | Effect |
---|---|
align = c("l","c","c","c") | First column will be aligned to the left, second to fourth columns will be centered |
col.names = c("Name 1", "Name 2", "Name 3", "Name 4") | Defines the column headers |
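Here is a minimal sketch of these options in use. It relies on R's built-in mtcars data as a stand-in for the course data, and it also uses kable's digits argument, which is not listed in the table above, to round the numbers.

```{r kable-example, echo = FALSE}
library(knitr)

# Small table of summary statistics built from the built-in mtcars data
stats <- data.frame(
  Variable = c("mpg", "hp"),
  Mean = c(mean(mtcars$mpg), mean(mtcars$hp)),
  SD = c(sd(mtcars$mpg), sd(mtcars$hp))
)

# First column left-aligned, the two others centered, two decimals
tab <- kable(stats,
             digits = 2,
             col.names = c("Variable", "Mean", "Std. dev."),
             align = c("l", "c", "c"))
tab
```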
Try to create a table containing some summary statistics for two variables of your choice. The variables should be in rows and the different summary statistics (mean, sd, min, max) in columns. Remember that we want R to evaluate the code without displaying the code and its result in the document.
```{r, include = FALSE}
# Summary statistics for the reading score
table1 <- df %>%
  summarise(Mean = mean(X4RSCALK1, na.rm = TRUE),
            SD   = sd(X4RSCALK1, na.rm = TRUE),
            Min  = min(X4RSCALK1, na.rm = TRUE),
            Max  = max(X4RSCALK1, na.rm = TRUE)) %>%
  mutate(Variable = "Reading") %>%
  select(Variable, Mean:Max) %>%
  mutate_if(is.numeric, round, digits = 2)  # funs() is deprecated in recent dplyr

# Same statistics for the maths score
table2 <- df %>%
  summarise(Mean = mean(X4MSCALK1, na.rm = TRUE),
            SD   = sd(X4MSCALK1, na.rm = TRUE),
            Min  = min(X4MSCALK1, na.rm = TRUE),
            Max  = max(X4MSCALK1, na.rm = TRUE)) %>%
  mutate(Variable = "Maths") %>%
  select(Variable, Mean:Max) %>%
  mutate_if(is.numeric, round, digits = 2)

# Stack the two rows into a single table
table <- rbind(table1, table2)
```
Let’s now output this table to our document. Since we do not want to output the code to our document, we are going to use the echo = FALSE option:
```{r, echo = FALSE}
kable(table, caption = 'Reading and Maths Summary statistics', col.names= c("Variable","Mean", "SD", "Minimum", "Maximum"), align = c("l","c","c","c","c"))
```
Variable | Mean | SD | Minimum | Maximum |
---|---|---|---|---|
Reading | 70.02 | 13.12 | 25.27 | 95.13 |
Maths | 63.40 | 13.18 | 15.57 | 93.68 |
First of all, add the following option to your front matter:
---
header-includes:
- \usepackage{booktabs}
---
Let’s define an R object that we will want to display in the table:
a <- 0.567
Now, write down your LaTeX code as plain text and call the object a using the following syntax:
\begin{table}[!htbp]
\caption{Title }
\centering
\vspace{0.5cm}
\begin{tabular}{lccc}
\toprule
& \textbf{Maths Scores} & \textbf{Reading scores} & \textbf{Sciences scores} \\
\midrule \\
Var 1 & `r round(a, digits=2)` & 80 & 80 \\
Var 2 & 70 & 50 & 60 \\
\bottomrule
\end{tabular}
\flushleft{\small{Notes:}}
\end{table}
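In the table above, the inline expression (the part wrapped in backticks and starting with r) is replaced by its value when the document is knitted. It can help to check that value in the console first. Note that round() drops trailing zeros; sprintf() is one way to keep a fixed number of decimals (a suggestion, not part of the original code).

```{r inline-check}
a <- 0.567
round(a, digits = 2)   # 0.57, the value inserted in the table
sprintf("%.2f", 0.5)   # "0.50", whereas round(0.5, 2) prints 0.5
```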
Let’s check whether boys and girls score differently on the 4th reading test:
```{r, include = FALSE}
# Two-sample t-test of the reading score by sex, turned into a one-row data frame
test <- tidy(t.test(X4RSCALK1 ~ X_CHSEX_R, data = df)) %>%
  select(estimate, statistic)
```
Then, output your results using the kable command:
```{r, echo = FALSE}
kable(test, col.names = c("Estimate","t-statistic"), caption = 'Reading scores by sex of the child')
```
Estimate | t-statistic |
---|---|
2.740359 | 11.49756 |
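The select() call above keeps only two of the columns returned by tidy(). To see what else is available, for example the p-value and the confidence interval, you can inspect the full output. The sketch below uses R's built-in sleep data as a stand-in for the course data.

```{r tidy-example}
library(broom)

# Two-sample t-test on built-in data: extra hours of sleep by drug group
full_test <- tidy(t.test(extra ~ group, data = sleep))

# Column names that can be passed to select()
names(full_test)
```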

Try to insert the distribution of the variable of your choice into the document. Make sure no unintended message is output to the document, and center the figure using the fig.align='center' option.
```{r , fig.width = 5, fig.height = 4, warning = FALSE, echo = FALSE, error = FALSE, message = FALSE, fig.align='center'}
# Refer to the column by name inside aes(); df$ is not needed there
ggplot(data = df, aes(x = X4RSCALK1)) +
  geom_histogram() +
  theme_classic() +
  labs(x = "Reading score")
```
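geom_histogram() prints a message about its default number of bins, which is one reason why message = FALSE is useful here. Another option is to choose the number of bins explicitly. This is a sketch on the built-in mtcars data; the value bins = 10 is arbitrary.

```{r hist-example, echo = FALSE, fig.width = 5, fig.height = 4, fig.align = 'center'}
library(ggplot2)

# Setting bins explicitly avoids the default-bins message
p <- ggplot(data = mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10) +
  theme_classic() +
  labs(x = "Miles per gallon")
p
```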
In order to add a bibliography to the end of your document, you first need to create a .bib file in LaTeX that contains the list of references that you want to cite in BibTeX format.
In order to obtain the BibTeX reference of a paper, you can search for the paper in Google Scholar and click on the quotation mark (“Cite”) icon below the result. Then, click on BibTeX and copy and paste the reference into your .bib file.
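For R itself and for R packages, the BibTeX entries do not need to be copied by hand: knitr's write_bib() function can write them to a file. This is a minimal sketch that writes to a temporary file rather than to your actual .bib file; the generated citation keys have the form R-packagename.

```r
library(knitr)

# Write BibTeX entries for base R and knitr to a temporary .bib file
bib_file <- tempfile(fileext = ".bib")
write_bib(c("base", "knitr"), file = bib_file)

# Look at the first lines of the generated file
bib_lines <- readLines(bib_file)
head(bib_lines)
```

An entry written this way can then be cited in the text with its key, for instance [@R-knitr].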
Then, add a line specifying the .bib file name to the Preamble:
bibliography: biblio.bib
You should insert the reference as follows, in your text:
Blablabla, as was shown by Author [@reference_name].
This will result in: