Supplementary resources:
- R Markdown: The Definitive Guide,
- R markdown cheat sheet,
- RStudio: Bibliographies and Citations,
- Yan Holtz: Pimp my RMD - a few tips for R Markdown,
- Kieran Healy: Plain text, papers, pandoc,
- Create Awesome HTML Table with knitr::kable and kableExtra

In this workshop the focus is on getting your results out of R. We will cover how to create HTML (such as this page), PDF (via LaTeX) or Word output from R, and how to export individual results, such as regression tables. Communicating research is a fundamental part of the (academic) research process.

1. RMarkdown

Markdown is a simple, easy to read and easy to write language that was created initially as a text-to-HTML tool. The Markdown syntax is straightforward and easy to memorize. Let’s take a look at the basics.

1.1 Basic formatting

Adding headers

# Header 1
## Header 2
### Header 3
#### Header 4
##### Header 5
###### Header 6

To add bold, italics or their combination:

*italics* or _italics_  
**bold** or __bold__  

italics or italics
bold or bold

Add a line break by ending the line with two spaces and then pressing Enter. A single Enter only starts a new line in the source; the rendered text stays in the same paragraph.
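To check what these formatting rules produce, you can convert a Markdown string to HTML directly from R. This is only a small sketch: it assumes the commonmark package is installed (it is not used elsewhere in this workshop), and the example strings are made up for illustration.

```r
# Assumption: the commonmark package is installed (install.packages("commonmark"))
library(commonmark)

# Emphasis: single markers give <em>, double markers give <strong>
emphasis_html <- markdown_html("*italics* and **bold**")
cat(emphasis_html)

# Two trailing spaces before the newline create a hard line break
hard_break_html <- markdown_html("first line  \nsecond line")
cat(hard_break_html)

# A single newline without trailing spaces stays inside the same paragraph
soft_break_html <- markdown_html("first line\nsecond line")
cat(soft_break_html)
```

Comparing the last two results shows that only the version with the two trailing spaces contains a line break tag.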

Create lists easily:

unordered list

- item1
- item2
    + subitem1
    + subitem2
    
or

* item1
* item2
    + subitem1
    + subitem2


ordered lists

1. first item
2. second item
    2.1 subitem
    2.2 subitem

unordered list

  • item1
  • item2
    • subitem1
    • subitem2

or

  • item1
  • item2
    • subitem1
    • subitem2

ordered lists

  1. first item
  2. second item
    • sub item

Add images: ![](https://upload.wikimedia.org/wikipedia/en/thumb/b/b9/MagrittePipe.jpg/300px-MagrittePipe.jpg)

1.2 Setting up R Markdown [1]

You can create a new R Markdown document from the File > New File > R Markdown... route. The new document you’ll have is essentially a plain text file, with an .Rmd extension. R Markdown allows us to interweave text, code and results in one document.

The main elements of our document are:

  1. The YAML header at the beginning of the document, between the --- lines.
  2. Code chunks, marked by ```.
  3. Text with Markdown formatting.

Knitting works as follows: your .Rmd file is sent to the knitr package, which executes all your code chunks and writes the code and its results into a Markdown file. Pandoc then renders that file in your desired output format.
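The two stages can also be run by hand, which makes it easier to see what each tool contributes. The following is a minimal sketch, not how you would normally work: the file names are made up, everything is written to a temporary folder, and the pandoc step is skipped if pandoc is not found on your machine. In everyday use the Knit button (or rmarkdown::render()) runs both steps for you.

```r
library(knitr)
library(rmarkdown)

# Build a tiny .Rmd file in a temporary folder (file names are made up)
fence <- strrep("`", 3)  # three backticks, the chunk delimiter
rmd_file <- file.path(tempdir(), "knit_demo.Rmd")
writeLines(c("Some *text* before the code.",
             "",
             paste0(fence, "{r}"),
             "1 + 1",
             fence),
           rmd_file)

# Step 1: knitr runs the chunk and writes code plus results to Markdown
md_file <- knit(rmd_file,
                output = file.path(tempdir(), "knit_demo.md"),
                quiet = TRUE)
cat(readLines(md_file), sep = "\n")

# Step 2: pandoc converts the Markdown file to the final format.
# Guarded, because pandoc may not be available outside RStudio.
if (pandoc_available()) {
  pandoc_convert(md_file, to = "html",
                 output = file.path(tempdir(), "knit_demo.html"))
}
```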

A short document looks like this. We will go over each element and then use our previous sessions to write a short mock report.

#> ---
#> title: "Analysis of determinants of life expectancy"
#> author: "Akos Mate"
#> date: '2018-07-30'
#> output:
#>   html_document: default
#> ---
#> 
#> ```{r setup, include=FALSE}
#> 
#> ```
#> 
#> In this report we investigate the possible connection between life expectancy and GDP per capita. To illustrate our point we will plot the correlation.
#> 
#> ```{r, echo = FALSE}
#> library(ggplot2)
#> library(dplyr)
#> library(gapminder)
#> 
#> 
#> ggplot(data = gapminder,
#>        mapping = aes(x = gdpPercap,
#>                      y = lifeExp)) +
#>     geom_point()
#> ```

Quick exercise: create a new R Markdown document and see what the output of the above code is. You can run the R Markdown document by “knitting” it with the Knit button.

1.2.1 chunk names

Code chunks are the backbone of your document: they contain the R code that you would otherwise write in your script. You can also embed code inline by wrapping it in single backticks and starting it with the letter r, for example `r dim(df)`; the result of the code is then inserted into the text. Each chunk can have different options, which you can specify at the top of the chunk like this: ```{r, options here}

You can also name your chunks by adding the name at the top: ```{r chunk_name}. This is a useful practice: if a chunk has an error in it, the error message refers to the chunk, so you know where to look. The bottom of the script window also lets you navigate between chunks by using their names.
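To try inline code without creating a whole document, you can pass a single sentence to knitr. This is a small sketch under the assumption that knitr is installed; the sentence and the df object are only examples.

```r
library(knitr)

# A small example data frame referenced by the inline code
df <- mtcars[1:5, 1:6]

# A sentence with two pieces of inline R code
sentence <- "The table has `r nrow(df)` rows and `r ncol(df)` columns."

# knit() evaluates the inline code and returns the finished text
knitted <- knit(text = sentence, quiet = TRUE)
print(knitted)
```

In a real document you write the sentence directly in the text part of the .Rmd file; the numbers are then updated every time you knit.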

1.2.2 chunk options

You can specify options for each chunk (or set up a global default) which will control how knitr will run the code inside. The most useful options:

  • eval = TRUE/FALSE When FALSE it’ll only display code, not the output, as the code inside the chunk will not be evaluated and run. If you just want to show your code, without the results this is useful.

  • echo = TRUE/FALSE When TRUE it will show both your code and the output below. When FALSE, the code is hidden and only the output is shown.

  • warning and error When TRUE, warning and error messages are displayed alongside your results. Setting warning = FALSE is useful if you have long warnings for some reason and you do not want to clutter the results. Note that error = FALSE does not hide errors: knitting stops at the first error instead.

  • message The same as warning, but for messages (e.g. what you see after loading packages).
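A quick way to get a feel for these options is to knit the same one-line chunk with different settings and compare the results. The sketch below assumes knitr is installed; make_chunk() is a made-up helper that only assembles the chunk text for this demonstration.

```r
library(knitr)

fence <- strrep("`", 3)  # three backticks, the chunk delimiter

# Hypothetical helper: builds the text of a chunk that runs 1 + 1
make_chunk <- function(options) {
  paste(c(paste0(fence, "{r, ", options, "}"), "1 + 1", fence),
        collapse = "\n")
}

# echo = TRUE: the code and its result are both in the output
code_and_result <- knit(text = make_chunk("echo = TRUE"), quiet = TRUE)

# echo = FALSE: only the result is in the output
result_only <- knit(text = make_chunk("echo = FALSE"), quiet = TRUE)

# eval = FALSE: the code is shown but never run
code_only <- knit(text = make_chunk("eval = FALSE"), quiet = TRUE)

cat(code_and_result, result_only, code_only, sep = "\n\n-----\n\n")
```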

You can set global options for your document with the following line in a code chunk: knitr::opts_chunk$set(). For example, the defaults that I used for these outputs are the following:

knitr::opts_chunk$set(echo = TRUE,
                      comment = NA,
                      collapse = TRUE,
                      warning = FALSE)

1.2.3 YAML header

Here you can specify the attributes of your document (similarly to the LaTeX preamble).

The YAML header for this document looks like this:


---
title: "Working with RMarkdown"
author: "Akos Mate"
subtitle: "PERG workshop"
date: '2019 January'
output:
    html_document:
        toc: true
        toc_depth: 3
        toc_float: true
        theme: readable
        css: style.css
bibliography: mybib.bib
---

Most of the fields are self-explanatory (such as title, author, etc.), but there are some options under output that are worth exploring. You should also mind the indentation of the header elements, because it matters!
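Why the indentation matters becomes visible if you parse a header by hand. The sketch below assumes the yaml package is installed (it normally comes along with rmarkdown); the two header fragments are shortened examples, not complete headers.

```r
# Assumption: the yaml package is installed (it is a dependency of rmarkdown)
library(yaml)

# Correct indentation: the options are nested under html_document,
# which is itself nested under output
nested <- yaml.load("
output:
  html_document:
    toc: true
    toc_depth: 3
")
nested$output$html_document$toc_depth

# Missing indentation: html_document is now a separate top-level field
# and output is left empty, so the options no longer belong to output
flat <- yaml.load("
output:
html_document:
  toc: true
")
is.null(flat$output)
names(flat)
```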

The output: in this case is an html_document, with the table of contents enabled (toc: true) and displaying 3 levels of headers (toc_depth: 3). The table of contents automatically pulls in your Markdown headers (#, ##, etc.). You can switch between outputs in two ways:

  • use the output: pdf_document in the YAML header
  • use the knit drop down menu to choose your output
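If you prefer to work from the console or a script, the output format can also be passed to rmarkdown::render(). The following is a sketch rather than a recipe: the small source file is made up and written to a temporary folder, and the rendering is skipped when pandoc cannot be found.

```r
library(rmarkdown)

# A minimal source document without an output field (content is made up)
src <- tempfile(fileext = ".Rmd")
writeLines(c("---",
             'title: "Output demo"',
             "---",
             "",
             "Some *text*."),
           src)

# Render the same source into two different formats
formats <- c("html_document", "word_document")
if (pandoc_available()) {
  outputs <- vapply(formats,
                    function(f) render(src, output_format = f, quiet = TRUE),
                    character(1))
  print(basename(outputs))
}
```

The same call works with "pdf_document", provided that LaTeX is installed.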

Possible output options are:

  • pdf_document creates a pdf doc, using LaTeX. If you always wanted to try LaTeX but found it too complicated, this is an easy way to create professional looking papers, without going into the LaTeX nitty gritties (eventually you’ll have to I’m afraid). You need to install LaTeX for this feature.
  • word_document creates Microsoft Word docs with .docx extension
  • odt_document creates OpenDocument Texts with .odt extension
  • rtf_document creates Rich Text Format with .rtf extension

The bibliography is one of the key arguments if you are writing academic papers. For this to work, you’ll need a BibTeX file (with a .bib extension), which is essentially a plain text file with your citations. If you use a citation manager (you should!), it will have an option to export your citations into a .bib file. A BibTeX-formatted citation looks like this:

@article{albrecht1999time,
  title={Time varying speed of light as a solution to cosmological puzzles},
  author={Albrecht, Andreas and Magueijo, Joao},
  journal={Physical Review D},
  volume={59},
  number={4},
  pages={043516},
  year={1999},
  publisher={APS}
}

You can get this type of citation from Google Scholar as well. After you have prepared your .bib file, you just need to specify it in the YAML header like this: bibliography: mybib.bib.

  • To insert the citation into the paper, you need to use the following syntax: [@bibkey] where the bib key is the identifier in the @article{bibkey, ...}. In our case, it is “albrecht1999time”.
  • To cite this seminal contribution to science, we type: [@albrecht1999time] which will give us this: (Albrecht and Magueijo 1999).
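  • For an in-text citation, just use @albrecht1999time: Albrecht and Magueijo (1999)
  • Suppress the author name by adding a - before the @, so that only the year is printed. Typing Albrecht et al. [-@albrecht1999time] demonstrated that, because of physics! gives us: Albrecht et al. (1999) demonstrated that, because of physics!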

The list of references is placed at the end of the document, so you can give it a title by ending your paper with a # Bibliography header. With this we are mostly set to write great papers without leaving our R workflow.
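The R packages you use deserve a citation too, and knitr can write their BibTeX entries for you. This is a small sketch, assuming knitr and rmarkdown are installed; the file name packages.bib is made up and the file is written to a temporary folder.

```r
library(knitr)

# Write BibTeX entries for the listed packages into a .bib file
bib_file <- file.path(tempdir(), "packages.bib")
write_bib(c("knitr", "rmarkdown"), file = bib_file)

# Look at the entry headers to find the citation keys
bib_lines <- readLines(bib_file)
grep("^@", bib_lines, value = TRUE)
```

The keys have the form R-packagename, so the knitr package would be cited as [@R-knitr]. To use such a file next to mybib.bib, both files can be listed in the bibliography field of the YAML header.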

1.3 Tables and other output

We will see how to get our R output into HTML, LaTeX, and Word.

1.3.1 html

library(dplyr)
library(knitr)
library(kableExtra)
#> Warning: package 'kableExtra' was built under R version 3.5.2
library(survey)
#> Warning: package 'survey' was built under R version 3.5.2
library(broom)
library(stargazer)
#> Warning: package 'stargazer' was built under R version 3.5.2

If you need to create HTML versions of your research (for your blog, for example), it is best to use the knitr::kable() function, which generates nice tables. If you wish, you can add further extras with the kableExtra package. Let’s add stripes to our table and highlight the row under the mouse pointer with the kable_styling() function. Called without arguments, kable_styling() gives you the kable() output with the default styling.

df <- mtcars[1:5, 1:6]
df %>%
  kable() %>%
  kable_styling()

                    mpg  cyl  disp   hp  drat     wt
Mazda RX4          21.0    6   160  110  3.90  2.620
Mazda RX4 Wag      21.0    6   160  110  3.90  2.875
Datsun 710         22.8    4   108   93  3.85  2.320
Hornet 4 Drive     21.4    6   258  110  3.08  3.215
Hornet Sportabout  18.7    8   360  175  3.15  3.440

The fancier version:

kable(df) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))

                    mpg  cyl  disp   hp  drat     wt
Mazda RX4          21.0    6   160  110  3.90  2.620
Mazda RX4 Wag      21.0    6   160  110  3.90  2.875
Datsun 710         22.8    4   108   93  3.85  2.320
Hornet 4 Drive     21.4    6   258  110  3.08  3.215
Hornet Sportabout  18.7    8   360  175  3.15  3.440
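kable() itself also has a few arguments that are handy before any extra styling. The sketch below uses only knitr; the column labels and the caption are made up for this example.

```r
library(knitr)

df <- mtcars[1:5, 1:6]

# digits rounds the numeric columns, col.names replaces the variable
# names with readable labels, caption adds a table title
tbl <- kable(df,
             format = "html",
             digits = 1,
             col.names = c("MPG", "Cylinders", "Displacement",
                           "Horsepower", "Rear axle ratio", "Weight"),
             caption = "First five cars in mtcars")
tbl
```

The result can be piped into kable_styling() in the same way as the tables above.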

You can export your regression tables in a similar fashion after tidying them up with the broom package.

data("airquality")

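# Fit the model and turn the coefficient table into a tidy data frame
m1 <- tidy(lm(Ozone ~ Temp + Solar.R, data = airquality))

# Keep and rename the columns we want to report, then round the numbers
# (across() replaces mutate_if() + funs(), which are deprecated in dplyr)
reg1_table <- m1 %>% 
    select(IV = term, Est. = estimate, sd = std.error, `p value` = p.value) %>% 
    mutate(across(where(is.numeric), ~ round(.x, 2)))

reg1_table %>% 
    kable() %>% 
    kable_styling()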

IV              Est.     sd  p value
(Intercept)  -145.70  18.45     0.00
Temp            2.28   0.25     0.00
Solar.R         0.06   0.03     0.03
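If you would rather report confidence intervals than standard errors, tidy() can add them. This is a sketch under the assumption that broom and knitr are installed; the column labels are made up for this example.

```r
library(broom)
library(knitr)

model <- lm(Ozone ~ Temp + Solar.R, data = airquality)

# conf.int = TRUE adds the conf.low and conf.high columns
ci_table <- tidy(model, conf.int = TRUE, conf.level = 0.95)
ci_table <- ci_table[, c("term", "estimate", "conf.low", "conf.high")]

kable(ci_table,
      digits = 2,
      col.names = c("IV", "Est.", "CI lower", "CI upper"))
```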



Or use the stargazer package, which produces more journal-like output.

m2 <- lm(Ozone~Temp+Solar.R, data = airquality)
stargazer(m2, title = "Regression result", dep.var.labels = "Ozone levels", type = "html")

Regression result

                      Dependent variable:
                      Ozone levels

Temp                  2.278***
                      (0.246)
Solar.R               0.057**
                      (0.026)
Constant              -145.703***
                      (18.447)

Observations          111
R2                    0.510
Adjusted R2           0.501
Residual Std. Error   23.500 (df = 108)
F Statistic           56.275*** (df = 2; 108)

Note:                 *p<0.1; **p<0.05; ***p<0.01



To have the output rendered, you need to set your chunk options as follows: ```{r, results = "asis"}. Otherwise you’ll just get the raw HTML code, which you can paste into any HTML file to have it rendered:

stargazer(m2, title = "Regression result", dep.var.labels = "Ozone levels", type = "html")
#> 
#> <table style="text-align:center"><caption><strong>Regression result</strong></caption>
#> <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td><em>Dependent variable:</em></td></tr>
#> <tr><td></td><td colspan="1" style="border-bottom: 1px solid black"></td></tr>
#> <tr><td style="text-align:left"></td><td>Ozone levels</td></tr>
#> <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Temp</td><td>2.278<sup>***</sup></td></tr>
#> <tr><td style="text-align:left"></td><td>(0.246)</td></tr>
#> <tr><td style="text-align:left"></td><td></td></tr>
#> <tr><td style="text-align:left">Solar.R</td><td>0.057<sup>**</sup></td></tr>
#> <tr><td style="text-align:left"></td><td>(0.026)</td></tr>
#> <tr><td style="text-align:left"></td><td></td></tr>
#> <tr><td style="text-align:left">Constant</td><td>-145.703<sup>***</sup></td></tr>
#> <tr><td style="text-align:left"></td><td>(18.447)</td></tr>
#> <tr><td style="text-align:left"></td><td></td></tr>
#> <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>111</td></tr>
#> <tr><td style="text-align:left">R<sup>2</sup></td><td>0.510</td></tr>
#> <tr><td style="text-align:left">Adjusted R<sup>2</sup></td><td>0.501</td></tr>
#> <tr><td style="text-align:left">Residual Std. Error</td><td>23.500 (df = 108)</td></tr>
#> <tr><td style="text-align:left">F Statistic</td><td>56.275<sup>***</sup> (df = 2; 108)</td></tr>
#> <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr>
#> </table>
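Instead of copying the HTML code from the console, you can let stargazer write it to a file with its out argument. A minimal sketch, assuming stargazer is installed; the file name is made up and the file goes to a temporary folder.

```r
library(stargazer)

m2 <- lm(Ozone ~ Temp + Solar.R, data = airquality)

# out = writes the table to a file; capture.output() only keeps the
# copy that stargazer also prints from cluttering the console
html_file <- file.path(tempdir(), "regression_table.html")
invisible(capture.output(
  stargazer(m2,
            title = "Regression result",
            dep.var.labels = "Ozone levels",
            type = "html",
            out = html_file)
))

file.exists(html_file)
```

Opening the saved file in a browser shows the rendered table.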

1.3.2 Word

There are a number of ways to export to Word. You can simply use kable() and knit to Word, which will give you a table in Word that you can format any way you like. Another option is to export the R table as a csv file and import it into Word: open the csv file in Word, select the imported text, then choose Insert > Table > Convert Text to Table, and Word should automatically recognize the structure.

# write.table(reg1_table, file = "reg1_table.csv", sep = ",")

Otherwise, you can open the .csv in Excel and then copy the Excel table into Word.
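For the csv route it helps to make the term names a regular column and to leave out R's row names, so that Word or Excel gets a clean rectangular table. The sketch below uses only base R; the file name is made up and the file is written to a temporary folder.

```r
model <- lm(Ozone ~ Temp + Solar.R, data = airquality)

# Coefficient table as a data frame, rounded to two digits
coef_table <- as.data.frame(round(coef(summary(model)), 2))

# Move the row names (the term names) into a proper first column
coef_table <- cbind(term = rownames(coef_table), coef_table)

# row.names = FALSE avoids an extra unnamed column in the csv
csv_file <- file.path(tempdir(), "reg1_table.csv")
write.csv(coef_table, csv_file, row.names = FALSE)

# Read the file back to check what Word or Excel will receive
read_back <- read.csv(csv_file, check.names = FALSE)
read_back
```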

1.3.3 LaTeX

If you write in LaTeX, or knit the R Markdown document into PDF, you need LaTeX table output. Fortunately, the stargazer package is rather flexible in that regard, as you only have to specify type = "latex" to get the LaTeX output.

stargazer(m2, title = "Regression result", dep.var.labels = "Ozone levels", type = "latex")
#> 
#> % Table created by stargazer v.5.2.2 by Marek Hlavac, Harvard University. E-mail: hlavac at fas.harvard.edu
#> % Date and time: Fri, Jan 25, 2019 - 12:10:21 PM
#> \begin{table}[!htbp] \centering 
#>   \caption{Regression result} 
#>   \label{} 
#> \begin{tabular}{@{\extracolsep{5pt}}lc} 
#> \\[-1.8ex]\hline 
#> \hline \\[-1.8ex] 
#>  & \multicolumn{1}{c}{\textit{Dependent variable:}} \\ 
#> \cline{2-2} 
#> \\[-1.8ex] & Ozone levels \\ 
#> \hline \\[-1.8ex] 
#>  Temp & 2.278$^{***}$ \\ 
#>   & (0.246) \\ 
#>   & \\ 
#>  Solar.R & 0.057$^{**}$ \\ 
#>   & (0.026) \\ 
#>   & \\ 
#>  Constant & $-$145.703$^{***}$ \\ 
#>   & (18.447) \\ 
#>   & \\ 
#> \hline \\[-1.8ex] 
#> Observations & 111 \\ 
#> R$^{2}$ & 0.510 \\ 
#> Adjusted R$^{2}$ & 0.501 \\ 
#> Residual Std. Error & 23.500 (df = 108) \\ 
#> F Statistic & 56.275$^{***}$ (df = 2; 108) \\ 
#> \hline 
#> \hline \\[-1.8ex] 
#> \textit{Note:}  & \multicolumn{1}{r}{$^{*}$p$<$0.1; $^{**}$p$<$0.05; $^{***}$p$<$0.01} \\ 
#> \end{tabular} 
#> \end{table}

I highly suggest that you give the pdf output a go, or give LaTeX a try, as it produces beautiful, highly customizable and professional output. The output of the above code looks like this:

An alternative package for both HTML and LaTeX output is xtable, which has similar functionality to stargazer.
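A short sketch of the xtable route, assuming the xtable package is installed. It reuses the ozone regression from above; the caption is made up.

```r
library(xtable)

m2 <- lm(Ozone ~ Temp + Solar.R, data = airquality)

# xtable() converts the model's coefficient table into a table object
reg_xtable <- xtable(m2, caption = "Regression result")

# The same object can be printed as LaTeX or as HTML code
print(reg_xtable, type = "latex")
print(reg_xtable, type = "html")
```

As with stargazer, the chunk needs results = "asis" for the table to be rendered in the knitted document.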

2. Example document with various R output

A quick example of what a possible article written in RMarkdown looks like is here: https://github.com/aakosm/rmarkdown_workshop/blob/master/rmd_example/ex2.pdf

The source for that document is in the GitHub repository under “/rmd_example/ex2.rmd”, but for posterity, here is the Rmd source (note: you will need mybib.bib for it to run, which is in the same folder):

#> ---
#> title: "Insightful paper example"
#> author: Akos Mate
#> date: "January 2019"
#> output: pdf_document
#> bibliography: mybib.bib
#> abstract: Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
#>   tempor incididunt ut labore et dolore magna aliqua. Ut enimad minim veniam, quis
#>   nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis
#>   aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat
#>   nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui
#>   officia deserunt mollit anim id est laborum.
#> ---
#> 
#> ```{r setup, include=FALSE}
#> knitr::opts_chunk$set(echo = FALSE,
#>                       message = FALSE,
#>                       warning = FALSE,
#>                       fig.pos = 'h!'
#> )
#> ```
#> 
#> ```{r, echo = FALSE}
#> library(ggplot2)
#> library(dplyr)
#> library(gapminder)
#> library(stargazer)
#> ```
#> 
#> # Introduction
#> 
#> **Example article, based on the post of [Kieran Healy: Plain text, papers, pandoc (click if you want to dig deeper)](https://kieranhealy.org/blog/archives/2014/01/23/plain-text/).**
#> 
#> Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
#> 
#> In this report we investigate the possible connection between life expectancy and GDP per capita. To illustrate our point we will plot the correlation.
#> 
#> 
#> 
#> ```{r, fig.width=4, fig.height=3, fig.align='center', fig.cap="Effect of GDP per capita on life expectancy"}
#> ggplot(data = gapminder,
#>        mapping = aes(x = log(gdpPercap),
#>                      y = lifeExp)) +
#>     geom_point(alpha = 0.3) +
#>     geom_smooth(method = "lm") +
#>     labs(x = "GDP per capita (log)",
#>          y = "Life expectancy",
#>          caption = "note: Observations are country years") +
#>     theme_minimal()
#> ```
#> 
#> 
#> # Discussion
#> 
#> And we run a regression to prove it. Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
#> 
#> ```{r results = "asis"}
#> cs_gapminder <- gapminder %>% 
#>     filter(year == 2007)
#> 
#> m1 <- lm(lifeExp ~ log(gdpPercap), data = cs_gapminder)
#> 
#> m2 <- lm(lifeExp ~ log(gdpPercap) + pop, data = cs_gapminder)
#> 
#> m3 <- lm(lifeExp ~ log(gdpPercap) + pop + continent, data = cs_gapminder)
#> 
#> stargazer(m1, m2, m3, title = "OLS models", dep.var.labels = "Life Expectancy", covariate.labels = c("GDP per capita (log)", "Population", "Continent"), type = "latex", header = FALSE)
#> ```
#> 
#> Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
#> 
#> 
#> Also a summary table for the mtcars dataset.
#> 
#> ```{r results = "asis"}
#> stargazer(mtcars, title = "Descriptive statistics", header = FALSE)
#> ```
#> 
#> 
#> # Conclusion
#> 
#> We used the [@gapminder2017] dataset to investigate all we wanted to know. Now we are reasonably sure that Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. Also, [@stargazer2018].
#> 
#> # Bibliography

Bibliography

Albrecht, Andreas, and Joao Magueijo. 1999. “Time Varying Speed of Light as a Solution to Cosmological Puzzles.” Physical Review D 59 (4). APS: 043516.


  1. Examples for this section are adapted from R for Data Science, ch. 27