1 Today’s exercise

We will learn how to create many types of document (homework, paper, slides, etc.) in several formats (PDF, html, beamer) using Rmarkdown. The main advantage of Rmarkdown is that it allows you to create a single document that includes both your text and your code.

This document can then be knitted: you tell R to convert it to the format of your choice. When doing so, R reads the plain text of your document and also evaluates the code chunks that you have included in it. Chunk options let you tell R whether the code and its output should be displayed in the final document.
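Knitting can also be triggered from the R console instead of the Knit button. The sketch below assumes that the rmarkdown package and pandoc are available, as they are in a standard RStudio set-up. The file name homework.Rmd and its contents are made up for illustration.

```r
library(rmarkdown)

# Hypothetical example: write a tiny Rmarkdown document to a temporary folder
fence <- strrep("`", 3)  # the three backticks that open and close a chunk
rmd_file <- file.path(tempdir(), "homework.Rmd")
writeLines(c(
  "---",
  "title: \"Title\"",
  "output: html_document",
  "---",
  "",
  "Some plain text, followed by a code chunk.",
  "",
  paste0(fence, "{r}"),
  "1 + 1",
  fence
), rmd_file)

# Knit the document; render() returns the path of the file it created
out_file <- render(rmd_file, output_format = "html_document", quiet = TRUE)
out_file
```

Changing output_format to "pdf_document" should produce a PDF instead, provided LaTeX is installed.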

2 Set-up

Our aim for today is to learn how to write a document (for instance your first Measurement homework!) that contains tables and graphs that you created in R.

2.1 Reading the data

The data for today’s exercise can be downloaded from here.

2.2 Opening an Rmarkdown file

In order to open an Rmarkdown file, you can click on the + sign below the File tab that you usually use to open an R script. This time, you can select R Markdown. A window appears, where you can write down the name of your document and choose its type (PDF, html…). If you did not manage to install LaTeX on your computer before the session, select the html option.

2.3 Choosing the document type

Now that you have created your file, it contains a Preamble (a YAML header placed between two --- lines) of the following form:

---
title: "Title"
author: "First Name Name"
output: pdf_document
---

You can add a date by adding the following line to the Preamble: date: "September 13th, 2019"

The output line allows you to choose the document type, and may be changed at any time. Here are the most commonly used document types:

  • pdf_document produces a pdf file
  • word_document produces a word file
  • beamer_presentation produces a beamer document, i.e. LaTeX slides
  • html_document produces an html file

3 Plain text

Your document can be structured with different sections and subsections using the # sign:

# Exercise 1
## 1^st^ question
### Question 1.a

You can then write your text as plain text. Here are a few useful commands that you can use to format your text.

Symbol | Effect
-------|-------
*italics* | italics
**bold** | bold
exponent^2^ | exponent²
[link to Course website](https://introtor-pse.appspot.com/) | link to Course website
$y = \alpha + \beta x + \epsilon$ | \(y = \alpha + \beta x + \epsilon\)

4 Code chunks

The main advantage of using Rmarkdown is that you can include in your document pieces of code that may or may not be evaluated by R. These pieces of code are called code chunks.

Here is how you can create a code chunk:

```{r}
# Write your R code here.
```

Within the {}, you can specify in which language you are writing your code (in your case, this is always going to be R) as well as a set of options for your code chunk, separated by commas. The syntax is hence {r, options}.

The code options are:

Option | Effect
-------|-------
eval = FALSE | knitr will not run the code in the code chunk
include = FALSE | knitr will run the chunk but not include the chunk in the final document
echo = FALSE | knitr will not display the code in the code chunk above its results in the final document

The results options are:

Option | Effect
-------|-------
results = 'hide' | knitr will not display the code’s results in the final document
results = 'hold' | knitr will delay displaying all output pieces until the end of the chunk
warning = FALSE | knitr will not display any warning messages generated by the code
message = FALSE | knitr will not display any messages generated by the code
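Several options can be combined in one chunk header, and an optional label can be placed right after the r. The chunk below is a minimal sketch using the built-in cars dataset; the label speed-summary is an arbitrary name chosen for illustration.

```{r speed-summary, echo = FALSE, message = FALSE, warning = FALSE}
# The code is run and its result is shown, but the code itself and any
# messages or warnings are left out of the knitted document.
speed_summary <- summary(cars$speed)
speed_summary
```

To avoid repeating the same options in every chunk, you can set defaults once in a first chunk with knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE).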

4.1 Loading the packages and the data

All of the packages that you’ll use in your document need to be loaded within your Rmarkdown document. For today’s session, you’ll need the packages haven to load data in dta format, tidyverse to make changes to your data and broom to tidy the output of the t-test. You will also need the package knitr in order to knit your document and to build tables with kable. Do not forget to load the packages once they are installed.

Since we do not want this code chunk to appear in the final document, we are going to use the include = FALSE option.

```{r packages2, include = FALSE}
list.of.packages <- c("tidyverse", "haven", "knitr","broom")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages, repos = "http://cran.us.r-project.org")

invisible(lapply(list.of.packages, library, character.only = TRUE))
```

Let’s now load our data.

```{r, include = FALSE}
df <- read_dta("Data/Score_US.dta") # you might use a different function depending on your file format.
```

5 Tables

There are three ways to output tables in your final document:

5.1 Make your table manually

Try to type this as plain text:

Row names | Values 
----------| ------
Row 1     | Value 1
Row 2     | Value 2

It should look like this:

Row names   Values
Row 1       Value 1
Row 2       Value 2

5.2 Insert an R table

You can insert an R table into your document using the kable function from the knitr package. The syntax is as follows: kable(table_name, options). Here are the most useful options of the kable function:

Option | Effect
-------|-------
align = c("l","c","c","c") | First column will be aligned to the left, second to fourth columns will be centered
col.names = c("Name 1", "Name 2", "Name 3", "Name 4") | Defines the column headers
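As a quick illustration of these options, here is a sketch on the built-in iris dataset rather than on today’s data. The column labels and the caption are arbitrary choices, and digits (the number of decimals to display) is an additional kable argument not listed in the table above.

```{r, echo = FALSE}
library(knitr)

# Built-in example data: first three rows, one text column and three numeric ones
iris_head <- head(iris[, c("Species", "Sepal.Length", "Sepal.Width", "Petal.Length")], n = 3)

iris_table <- kable(iris_head,
                    col.names = c("Species", "Sepal length", "Sepal width", "Petal length"),
                    align = c("l", "c", "c", "c"),
                    digits = 1,
                    caption = "First rows of the iris data")
iris_table
```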

Try to create a table containing some summary statistics for two variables of your choice. The variables should be in rows and the different summary statistics (mean, sd, min, max) in columns. Remember that we want R to evaluate the code without displaying the code and its result in the document.

```{r, include = FALSE}
# Summary statistics for the reading score
table1 <- df %>%
  summarise(Mean = mean(X4RSCALK1, na.rm = TRUE),
            SD   = sd(X4RSCALK1, na.rm = TRUE),
            Min  = min(X4RSCALK1, na.rm = TRUE),
            Max  = max(X4RSCALK1, na.rm = TRUE)) %>%
  mutate(Variable = "Reading") %>%
  select(Variable, Mean:Max) %>%
  # funs() is deprecated in recent dplyr versions; across() replaces mutate_if()
  mutate(across(where(is.numeric), ~ round(.x, digits = 2)))

# Same statistics for the maths score
table2 <- df %>%
  summarise(Mean = mean(X4MSCALK1, na.rm = TRUE),
            SD   = sd(X4MSCALK1, na.rm = TRUE),
            Min  = min(X4MSCALK1, na.rm = TRUE),
            Max  = max(X4MSCALK1, na.rm = TRUE)) %>%
  mutate(Variable = "Maths") %>%
  select(Variable, Mean:Max) %>%
  mutate(across(where(is.numeric), ~ round(.x, digits = 2)))

# Stack the two one-row tables so that each variable is a row
table <- rbind(table1, table2)
```
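The same kind of table can be built in a single pipeline by first reshaping the data to long format. This is only a sketch on a small made-up data frame (toy, with invented values); it has not been run on today’s dataset, whose labelled columns may need extra handling.

```{r, include = FALSE}
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for the two score variables
toy <- tibble(Reading = c(60, 75, NA, 82),
              Maths   = c(55, 70, 64, NA))

toy_summary <- toy %>%
  # One row per (variable, score) pair
  pivot_longer(everything(), names_to = "Variable", values_to = "Score") %>%
  group_by(Variable) %>%
  summarise(Mean = mean(Score, na.rm = TRUE),
            SD   = sd(Score, na.rm = TRUE),
            Min  = min(Score, na.rm = TRUE),
            Max  = max(Score, na.rm = TRUE)) %>%
  mutate(across(where(is.numeric), ~ round(.x, digits = 2)))
```

Adding a third variable then only requires one more column in the data, not a third copy of the summarise block.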

Let’s now output this table to our document. Since we want the table to appear but not the code that produces it, we are going to use the echo = FALSE option:

```{r, echo = FALSE}
kable(table, caption = 'Reading and Maths Summary statistics', col.names= c("Variable","Mean", "SD", "Minimum", "Maximum"), align = c("l","c","c","c","c"))
```
Reading and Maths Summary statistics

Variable | Mean | SD | Minimum | Maximum
---------|------|----|---------|--------
Reading | 70.02 | 13.12 | 25.27 | 95.13
Maths | 63.40 | 13.18 | 15.57 | 93.68

5.3 Write down a LaTeX table

First of all, add the following option to your front matter:

---
header-includes:
   - \usepackage{booktabs}
---

Let’s define an R object that we will want to display in the table:

a <- 0.567

Now, write down your LaTeX code as plain text and call the object a with inline R code, i.e. an R expression enclosed in backticks and starting with r, as in the Var 1 row below:

\begin{table}[!htbp]
\caption{Title }    
\centering
\vspace{0.5cm}
\begin{tabular}{lccc}
    \toprule
    & \textbf{Maths Scores}  & \textbf{Reading scores}   & \textbf{Sciences scores} \\

    \midrule \\
    Var 1 & `r round(a, digits=2)` & 80 & 80 \\
    Var 2 & 70 & 50 & 60 \\
    \bottomrule
\end{tabular}

\flushleft{\small{Notes:}}
\end{table}
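If you would rather not type the LaTeX by hand, kable can also generate a booktabs table when the output is a PDF. The chunk below is a sketch under that assumption: the data frame scores simply re-enters the illustrative numbers of the table above, and format = "latex" together with booktabs = TRUE are kable arguments.

```{r, echo = FALSE}
library(knitr)

a <- 0.567

# Illustrative values copied from the hand-written table
scores <- data.frame(
  Variable = c("Var 1", "Var 2"),
  Maths    = c(round(a, digits = 2), 70),
  Reading  = c(80, 50),
  Sciences = c(80, 60)
)

latex_table <- kable(scores,
                     format = "latex", booktabs = TRUE,
                     col.names = c("", "Maths Scores", "Reading scores", "Sciences scores"),
                     align = c("l", "c", "c", "c"),
                     caption = "Title")
latex_table
```

This still relies on the booktabs package being loaded through header-includes, and it only makes sense when knitting to pdf_document or beamer_presentation.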

6 Make a table with the t-test’s output

Let’s check whether boys and girls score differently on the 4th reading test:

```{r, include = FALSE}

# The formula interface with data = df avoids repeating df$ for each variable
test <- tidy(t.test(X4RSCALK1 ~ X_CHSEX_R, data = df)) %>%
        select(estimate, statistic)

```

Then, output your results using the kable command:

```{r, echo = FALSE}

kable(test, col.names = c("Estimate","t-statistic"), caption = 'Reading scores by sex: t-test')

```
Reading scores by sex: t-test

Estimate | t-statistic
---------|------------
2.740359 | 11.49756
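The tidied t-test contains more than these two columns. The sketch below uses the built-in sleep dataset instead of today’s data, so the variable names extra and group belong to that dataset; it also reports the p-value and the confidence interval, and rounds everything with the digits argument.

```{r, echo = FALSE}
library(broom)
library(knitr)

# Built-in data: extra hours of sleep (extra) under two drugs (group)
sleep_test <- tidy(t.test(extra ~ group, data = sleep))

# Keep the difference in means, the t-statistic, the p-value and the interval
sleep_cols <- sleep_test[, c("estimate", "statistic", "p.value", "conf.low", "conf.high")]

sleep_table <- kable(sleep_cols,
                     digits = 3,
                     col.names = c("Estimate", "t-statistic", "p-value", "CI lower", "CI upper"),
                     caption = "Extra sleep by drug: t-test")
sleep_table
```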

7 Insert Figures

7.1 Insert figures from a folder

![Caption for the picture.](/path/to/image.pdf)

7.2 Insert R figures

Try to insert the distribution of the variable of your choice into the document. Make sure no unintended message is output to the document, and center the figure using the fig.align='center' option.

```{r , fig.width = 5, fig.height = 4, warning = FALSE, echo = FALSE, error = FALSE, message = FALSE, fig.align='center'}
ggplot(data = df, aes(x = X4RSCALK1)) + 
      geom_histogram() + 
      theme_classic() + 
      labs(x = "Reading score") 
```
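The message that geom_histogram() prints by default is about the number of bins, so choosing the bins yourself removes it at the source. The sketch below uses the built-in faithful dataset; the value bins = 20 and the caption text are arbitrary, and fig.cap is the knitr chunk option that adds a figure caption.

```{r waiting-hist, fig.width = 5, fig.height = 4, fig.align = 'center', fig.cap = "Distribution of waiting times", echo = FALSE}
library(ggplot2)

# Built-in data: waiting time between eruptions of the Old Faithful geyser
waiting_plot <- ggplot(data = faithful, aes(x = waiting)) +
  geom_histogram(bins = 20) +   # explicit bins: no default-bins message
  theme_classic() +
  labs(x = "Waiting time (minutes)", y = "Count")
waiting_plot
```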

8 Add a bibliography to the document

In order to add a bibliography to the end of your document, you first need to create a .bib file, i.e. a plain-text file that contains the list of references that you want to cite in BibTeX format.

In order to obtain the BibTeX reference of a paper, you can search for the paper in Google Scholar and click on the quotation-mark (“Cite”) sign below the result. Then, click on BibTeX and copy and paste the reference to your .bib file.

Then, add a line specifying the .bib file name to the Preamble:

bibliography: biblio.bib

You should insert the reference as follows in your text, where reference_name is the key of the entry in your .bib file:

Blablabla, as was shown by Author [@reference_name].

This will result in: