Supplementary resources:
- R Markdown: The Definitive Guide,
- R markdown cheat sheet,
- RStudio: Bibliographies and Citations,
- Yan Holtz: Pimp my RMD - a few tips for R Markdown,
- Kieran Healy: Plain text, papers, pandoc,
- Create Awesome HTML Table with knitr::kable and kableExtra
This workshop focuses on getting your results out of R. We will cover how to create HTML (such as this page), PDF (via LaTeX) or Word output from R, and how to export individual results, such as regression tables. Communicating research is a fundamental part of the (academic) research process.
Markdown is a simple, easy to read and easy to write language that was created initially as a text-to-HTML tool. The Markdown syntax is straightforward and easy to memorize. Let’s take a look at the basics.
Adding headers
# Header 1
## Header 2
### Header 3
#### Header 4
##### Header 5
###### Header 6
To add bold, italics or their combination:
*italics* or _italics_
**bold** or __bold__
italics or italics
bold or bold
Add a line break by ending a line with two spaces and then pressing Enter. A single Enter only starts a new line in the source; it does not create a break in the rendered output.
Create lists easily:
unordered list
- item1
- item2
    + subitem1
    + subitem2
or
* item1
* item2
    + subitem1
    + subitem2
ordered lists
1. first item
2. second item
    2.1 subitem
    2.2 subitem
Add images: ![alt text](path/to/image.png), where the alt text and the path are placeholders for your own.
You can create a new R Markdown document via File > New File > R Markdown... The new document is essentially a plain text file with an .Rmd extension. R Markdown allows us to interweave text, code and results in one document.
The main elements of our document are:
1. The YAML header at the beginning of the document, between the --- lines.
2. Code chunks, marked by ```
3. Text with markdown formatting
Knitting works as follows: your .Rmd file is sent to the knitr package, which executes all your code chunks, and then pandoc renders the output in your desired format.
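The knitting step can also be triggered from the R console instead of the Knit button. The sketch below is only an illustration: it writes a throwaway .Rmd file (its name and contents are made up for this example) and calls rmarkdown::render(), skipping the call when the rmarkdown package or pandoc is not available.

```r
# Write a tiny, hypothetical R Markdown file to a temporary location
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Render demo\"",
  "output: html_document",
  "---",
  "",
  "The answer is `r 6 * 7`."
), rmd_path)

# render() runs knitr first and then pandoc;
# skip it when rmarkdown or pandoc is missing on this machine
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  out_file <- rmarkdown::render(rmd_path, quiet = TRUE)
  print(out_file)  # path of the rendered HTML file
}
```

Passing output_format = "pdf_document" or "word_document" to render() selects another format, provided the required tools (such as LaTeX) are installed.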
A short document looks like this. We will go over each element and then use our previous sessions to write a short mock report.
#> ---
#> title: "Analysis of determinants of life expectancy"
#> author: "Akos Mate"
#> date: '2018-07-30'
#> output:
#>   html_document: default
#> ---
#>
#> ```{r setup, include=FALSE}
#>
#> ```
#>
#> In this report we investigate the possible connection between life expectancy and GDP per capita. To illustrate our point we will plot the correlation.
#>
#> ```{r, echo = FALSE}
#> library(ggplot2)
#> library(dplyr)
#> library(gapminder)
#>
#>
#> ggplot(data = gapminder,
#>        mapping = aes(x = gdpPercap,
#>                      y = lifeExp)) +
#>   geom_point()
#> ```
Quick exercise: create a new R Markdown document and see what the output of the above code is. You can run the R Markdown document by “knitting” it with the Knit button.
Code chunks are the backbone of your document: they contain the R code that you would otherwise write in your script. You can also embed code inline with `r `, for example `r dim(df)`, which is replaced by the result of dim(df) in the rendered text. Each chunk can have different options, which you specify at the top of the chunk like this: ```{r, options here}
You can also name your chunks by adding the name at the top: ```{r chunk_name}. This is a useful practice: if a chunk contains an error, the error message tells you where to look. The bottom of the script window also lets you navigate between chunks using their names.
You can specify options for each chunk (or set up a global default) which control how knitr runs the code inside. The most useful options:
- eval = TRUE/FALSE: when FALSE, the code in the chunk is not evaluated, so only the code is displayed, not the output. Useful if you just want to show your code without the results.
- echo = TRUE/FALSE: when TRUE, both your code and its output are shown; when FALSE, the code is hidden and only the output appears.
- warning and error: when TRUE, warning and error messages are displayed alongside your results. Setting warning = FALSE is useful if you have long warnings and do not want to clutter the results.
- message: same as the previous, but for messages (e.g. what you see after loading packages).
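To see what these options do without creating a file, the following sketch knits a small R Markdown source held in a character vector with knitr::knit(text = ...). The chunk names and contents are invented for illustration; the point is only to compare the effect of echo = FALSE and eval = FALSE in the resulting markdown.

```r
library(knitr)

# A tiny R Markdown source (hypothetical content) stored as lines of text
src <- c(
  "Rows in mtcars: `r nrow(mtcars)`.",
  "",
  "```{r hidden_code, echo = FALSE}",
  "mean(mtcars$mpg)",
  "```",
  "",
  "```{r not_run, eval = FALSE}",
  "stop('never executed')",
  "```"
)

# knit() returns the knitted markdown as text when `text` is supplied
out <- knit(text = src, quiet = TRUE)
cat(out, sep = "\n")
```

In the printed markdown, the first chunk contributes only its result (the code is hidden), while the second chunk contributes only its code (it is never run, so no error occurs).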
You can set global options for your document with the following line in a code chunk: knitr::opts_chunk$set(). For example, the defaults that I used for these outputs are the following:
knitr::opts_chunk$set(echo = TRUE,
                      comment = NA,
                      collapse = TRUE,
                      warning = FALSE)
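If you are unsure which global defaults are currently active, you can query them. This is a minimal sketch that sets the same options as above, reads them back with opts_chunk$get(), and then restores knitr's built-in defaults; the variable name current is just an illustration.

```r
library(knitr)

# Set the global defaults used in this workshop
opts_chunk$set(echo = TRUE,
               comment = NA,
               collapse = TRUE,
               warning = FALSE)

# Read back selected options; several names return a named list
current <- opts_chunk$get(c("echo", "comment", "collapse", "warning"))
str(current)

# Go back to knitr's built-in defaults
opts_chunk$restore()
```

Options given in an individual chunk header still override these global defaults for that chunk.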
In the YAML header you can specify the attributes of your document (similarly to the LaTeX preamble).
The YAML header for this document looks like this:
---
title: "Working with RMarkdown"
author: "Akos Mate"
subtitle: "PERG workshop"
date: '2019 January'
output:
  html_document:
    toc: true
    toc_depth: 3
    toc_float: true
    theme: readable
    css: style.css
bibliography: mybib.bib
---
Most of the fields are self-explanatory (such as title, author, etc.), but there are some options under output that are worth exploring. You should also mind the indentation of the header elements, because it matters!
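Why indentation matters can be checked directly by parsing a header with the yaml package (which rmarkdown itself relies on). The two header strings below are shortened, illustrative versions of the header above: in the first, toc is nested under html_document; in the second, a lost indent makes it a sibling of html_document instead.

```r
library(yaml)

# Correctly indented: toc and toc_depth belong to html_document
nested <- yaml.load(paste(
  "title: Working with RMarkdown",
  "output:",
  "  html_document:",
  "    toc: true",
  "    toc_depth: 3",
  "bibliography: mybib.bib",
  sep = "\n"
))
str(nested$output)

# Wrong indentation: toc ends up next to html_document, not inside it
flat <- yaml.load(paste(
  "output:",
  "  html_document:",
  "  toc: true",
  sep = "\n"
))
str(flat$output)
```

In the second case the toc setting is no longer an option of html_document, so the table of contents would not be switched on.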
The output in this case is an html_document, with the table of contents enabled (toc: true) and displaying 3 levels (toc_depth: 3). The table of contents automatically pulls in your markdown headers (#, ##, etc.). You can switch between outputs in two ways:
- with the drop-down menu of the Knit button in RStudio,
- by setting, for example, output: pdf_document in the YAML header.
Possible output options are:
- pdf_document creates a PDF document using LaTeX. If you always wanted to try LaTeX but found it too complicated, this is an easy way to create professional-looking papers without going into the LaTeX nitty-gritty (eventually you’ll have to, I’m afraid). You need to install LaTeX for this feature.
- word_document creates Microsoft Word documents with the .docx extension.
- odt_document creates OpenDocument Text files with the .odt extension.
- rtf_document creates Rich Text Format files with the .rtf extension.
The bibliography field is one of the key arguments if you are writing academic papers. For this to work, you’ll need a BibTeX file (with the .bib extension), which is essentially a plain text file with your citations. If you use a citation manager (you should!), there is an option to export your citations into a .bib file. A BibTeX-formatted citation looks like this:
@article{albrecht1999time,
  title={Time varying speed of light as a solution to cosmological puzzles},
  author={Albrecht, Andreas and Magueijo, Joao},
  journal={Physical Review D},
  volume={59},
  number={4},
  pages={043516},
  year={1999},
  publisher={APS}
}
You can get this type of citation from Google Scholar as well. After you have prepared your .bib file, you just need to specify it in the YAML header: bibliography: mybib.bib.
To cite a work in the text, use [@bibkey], where the bib key is the identifier in @article{bibkey, ...}. In our case, it is “albrecht1999time”.
- [@albrecht1999time] gives us: (Albrecht and Magueijo 1999).
- @albrecht1999time gives us: Albrecht and Magueijo (1999).
- A leading - suppresses the author names, so "Albrecht et al [-@albrecht1999time] demonstrated, that because of physics!" gives us: Albrecht et al (1999) demonstrated, that because of physics!
You can add the bibliography at the end of your paper with the # Bibliography header. With this we are mostly set to write great papers without leaving our R workflow.
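Citation managers are one source of .bib entries. For the R packages you use, knitr can also generate the entries; this is an addition to the workflow described above, not part of it. The sketch writes entries for two packages to a temporary file (the file location is arbitrary) and prints the generated keys.

```r
# Write BibTeX entries for R itself and for knitr to a temporary .bib file
bib_file <- tempfile(fileext = ".bib")
knitr::write_bib(c("base", "knitr"), file = bib_file)

# Show the entry headers; keys are prefixed with "R-" by default
bib_lines <- readLines(bib_file)
print(grep("^@", bib_lines, value = TRUE))
```

An entry created this way can then be cited like any other, for example with [@R-knitr], once the file is listed under bibliography in the YAML header.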
We will see how to get our R output into HTML, LaTeX, and Word.
library(dplyr)
library(knitr)
library(kableExtra)
#> Warning: package 'kableExtra' was built under R version 3.5.2
library(survey)
#> Warning: package 'survey' was built under R version 3.5.2
library(broom)
library(stargazer)
#> Warning: package 'stargazer' was built under R version 3.5.2
If you need to create HTML versions of your research (for your blog, for example), it is best to use the knitr::kable() function, which generates nice tables. If you wish, you can add further extras with the kableExtra package. Let’s add stripes to our table and highlight the row under the mouse with the kable_styling() function. If left empty, it will give you the output of the kable() function.
df <- mtcars[1:5, 1:6]
df %>%
  kable() %>%
  kable_styling()
| | mpg | cyl | disp | hp | drat | wt |
|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.90 | 2.620 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.90 | 2.875 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.320 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 |
| Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.440 |
The more fancy version:
kable(df) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
| | mpg | cyl | disp | hp | drat | wt |
|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.90 | 2.620 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.90 | 2.875 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.320 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 |
| Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.440 |
You can export your regression tables in a similar fashion after tidying them up with the broom package.
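data("airquality")
# tidy() turns the model summary into a data frame of coefficients
m1 <- tidy(lm(Ozone ~ Temp + Solar.R, data = airquality))
reg1_table <- m1 %>%
  select(IV = term, Est. = estimate, sd = std.error, `p value` = p.value) %>%
  # funs() is deprecated in newer dplyr versions; pass round() and its argument directly
  mutate_if(is.numeric, round, digits = 2)
reg1_table %>%
  kable() %>%
  kable_styling()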
IV | Est. | sd | p value |
---|---|---|---|
(Intercept) | -145.70 | 18.45 | 0.00 |
Temp | 2.28 | 0.25 | 0.00 |
Solar.R | 0.06 | 0.03 | 0.03 |
Or use the stargazer package, which produces more journal-like output.
m2 <- lm(Ozone~Temp+Solar.R, data = airquality)
stargazer(m2, title = "Regression result", dep.var.labels = "Ozone levels", type = "html")
| | Dependent variable: |
|---|---|
| | Ozone levels |
| Temp | 2.278*** |
| | (0.246) |
| Solar.R | 0.057** |
| | (0.026) |
| Constant | -145.703*** |
| | (18.447) |
| Observations | 111 |
| R2 | 0.510 |
| Adjusted R2 | 0.501 |
| Residual Std. Error | 23.500 (df = 108) |
| F Statistic | 56.275*** (df = 2; 108) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 |
To have the output rendered, you need to set your chunk options as follows: ```{r, results = "asis"}. Otherwise you’ll just get the raw HTML code, which you can paste into any HTML file to have it rendered:
stargazer(m2, title = "Regression result", dep.var.labels = "Ozone levels", type = "html")
#>
#> <table style="text-align:center"><caption><strong>Regression result</strong></caption>
#> <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td><em>Dependent variable:</em></td></tr>
#> <tr><td></td><td colspan="1" style="border-bottom: 1px solid black"></td></tr>
#> <tr><td style="text-align:left"></td><td>Ozone levels</td></tr>
#> <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Temp</td><td>2.278<sup>***</sup></td></tr>
#> <tr><td style="text-align:left"></td><td>(0.246)</td></tr>
#> <tr><td style="text-align:left"></td><td></td></tr>
#> <tr><td style="text-align:left">Solar.R</td><td>0.057<sup>**</sup></td></tr>
#> <tr><td style="text-align:left"></td><td>(0.026)</td></tr>
#> <tr><td style="text-align:left"></td><td></td></tr>
#> <tr><td style="text-align:left">Constant</td><td>-145.703<sup>***</sup></td></tr>
#> <tr><td style="text-align:left"></td><td>(18.447)</td></tr>
#> <tr><td style="text-align:left"></td><td></td></tr>
#> <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>111</td></tr>
#> <tr><td style="text-align:left">R<sup>2</sup></td><td>0.510</td></tr>
#> <tr><td style="text-align:left">Adjusted R<sup>2</sup></td><td>0.501</td></tr>
#> <tr><td style="text-align:left">Residual Std. Error</td><td>23.500 (df = 108)</td></tr>
#> <tr><td style="text-align:left">F Statistic</td><td>56.275<sup>***</sup> (df = 2; 108)</td></tr>
#> <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr>
#> </table>
There are a number of ways to export to Word. You can simply use kable() and knit to Word, which gives you a table in Word that you can format any way you like. Another option is to export the R table as a csv and then import it into Word by opening the csv file, selecting the imported text and choosing Insert > Table > Convert Text to Table. Here Word should automatically recognize what it should do.
# write.table(reg1_table, file = "reg1_table.csv", sep = ",")
Otherwise, you can open the .csv in Excel and then copy the Excel table into Word.
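The csv route can be sketched end to end with base R only. The example below rebuilds a small coefficient table from the same airquality model (using coef(summary()) rather than broom, purely to keep it self-contained), writes it to a temporary folder, and reads it back to check the result; the file location is an arbitrary choice.

```r
# Fit the model and collect the rounded coefficient table
m <- lm(Ozone ~ Temp + Solar.R, data = airquality)
coefs <- as.data.frame(round(coef(summary(m)), 2))
coefs <- cbind(term = rownames(coefs), coefs)

# Write the table without row names, so the csv has one clean header row
csv_path <- file.path(tempdir(), "reg1_table.csv")
write.csv(coefs, csv_path, row.names = FALSE)

# Read it back to confirm what Word or Excel will see
exported <- read.csv(csv_path)
print(exported)
```

Setting row.names = FALSE avoids an unnamed first column, which otherwise tends to shift the header when the file is converted to a table.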
If you write in LaTeX, or knit the R Markdown document into PDF, you need LaTeX table output. Fortunately, the stargazer package is rather flexible in that regard, as you only have to specify type = "latex" to get the LaTeX output.
stargazer(m2, title = "Regression result", dep.var.labels = "Ozone levels", type = "latex")
#>
#> % Table created by stargazer v.5.2.2 by Marek Hlavac, Harvard University. E-mail: hlavac at fas.harvard.edu
#> % Date and time: Fri, Jan 25, 2019 - 12:10:21 PM
#> \begin{table}[!htbp] \centering
#> \caption{Regression result}
#> \label{}
#> \begin{tabular}{@{\extracolsep{5pt}}lc}
#> \\[-1.8ex]\hline
#> \hline \\[-1.8ex]
#> & \multicolumn{1}{c}{\textit{Dependent variable:}} \\
#> \cline{2-2}
#> \\[-1.8ex] & Ozone levels \\
#> \hline \\[-1.8ex]
#> Temp & 2.278$^{***}$ \\
#> & (0.246) \\
#> & \\
#> Solar.R & 0.057$^{**}$ \\
#> & (0.026) \\
#> & \\
#> Constant & $-$145.703$^{***}$ \\
#> & (18.447) \\
#> & \\
#> \hline \\[-1.8ex]
#> Observations & 111 \\
#> R$^{2}$ & 0.510 \\
#> Adjusted R$^{2}$ & 0.501 \\
#> Residual Std. Error & 23.500 (df = 108) \\
#> F Statistic & 56.275$^{***}$ (df = 2; 108) \\
#> \hline
#> \hline \\[-1.8ex]
#> \textit{Note:} & \multicolumn{1}{r}{$^{*}$p$<$0.1; $^{**}$p$<$0.05; $^{***}$p$<$0.01} \\
#> \end{tabular}
#> \end{table}
I highly suggest that you give the pdf output a go, or give LaTeX a try, as it produces beautiful, highly customizable and professional output. The output of the above code looks like this:
An alternative package for both the HTML and LaTeX output is xtable, which has similar functionality to stargazer.
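For plain data tables (as opposed to regression output), kable() from knitr can also emit LaTeX directly. A minimal sketch, assuming the booktabs LaTeX package is available in the target document; the caption text is made up for illustration.

```r
library(knitr)

# format = "latex" returns LaTeX code instead of an HTML or markdown table;
# booktabs = TRUE uses \toprule, \midrule and \bottomrule for the rules
latex_tab <- kable(head(mtcars[, 1:4], 3),
                   format = "latex",
                   booktabs = TRUE,
                   caption = "First rows of mtcars")

cat(latex_tab, sep = "\n")
```

The printed code can be pasted into a .tex file, or the call can sit in a chunk of a document that is knitted to PDF.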
A quick example of what a possible article written in R Markdown looks like is here: https://github.com/aakosm/rmarkdown_workshop/blob/master/rmd_example/ex2.pdf
The source for that document is in the GitHub repository under “/rmd_example/ex2.rmd”, but for posterity, here is the Rmd source (note: you will need mybib.bib for it to run, which is in the same folder):
#> ---
#> title: "Insightful paper example"
#> author: Akos Mate
#> date: "January 2019"
#> output: pdf_document
#> bibliography: mybib.bib
#> abstract: Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
#> tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
#> nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis
#> aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat
#> nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui
#> officia deserunt mollit anim id est laborum.
#> ---
#>
#> ```{r setup, include=FALSE}
#> knitr::opts_chunk$set(echo = FALSE,
#> message = FALSE,
#> warning = FALSE,
#> fig.pos = 'h!'
#> )
#> ```
#>
#> ```{r, echo = FALSE}
#> library(ggplot2)
#> library(dplyr)
#> library(gapminder)
#> library(stargazer)
#> ```
#>
#> # Introduction
#>
#> **Example article, based on the post of [Kieran Healy: Plain text, papers, pandoc (click if you want to dig deeper)](https://kieranhealy.org/blog/archives/2014/01/23/plain-text/).**
#>
#> Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
#>
#> In this report we investigate the possible connection between life expectancy and GDP per capita. To illustrate our point we will plot the correlation.
#>
#>
#>
#> ```{r, fig.width=4, fig.height=3, fig.align='center', fig.cap="Effect of GDP per capita on life expectancy"}
#> ggplot(data = gapminder,
#> mapping = aes(x = log(gdpPercap),
#> y = lifeExp)) +
#> geom_point(alpha = 0.3) +
#> geom_smooth(method = "lm") +
#> labs(x = "GDP per capita (log)",
#> y = "Life expectancy",
#> caption = "note: Observations are country years") +
#> theme_minimal()
#> ```
#>
#>
#> # Discussion
#>
#> And we run a regression to prove it. Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
#>
#> ```{r results = "asis"}
#> cs_gapminder <- gapminder %>%
#> filter(year == 2007)
#>
#> m1 <- lm(lifeExp ~ log(gdpPercap), data = cs_gapminder)
#>
#> m2 <- lm(lifeExp ~ log(gdpPercap) + pop, data = cs_gapminder)
#>
#> m3 <- lm(lifeExp ~ log(gdpPercap) + pop + continent, data = cs_gapminder)
#>
#> stargazer(m1, m2, m3, title = "OLS models", dep.var.labels = "Life Expectancy", covariate.labels = c("GDP per capita (log)", "Population", "Continent"), type = "latex", header = FALSE)
#> ```
#>
#> Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
#>
#>
#> Also a summary table for the mtcars dataset.
#>
#> ```{r results = "asis"}
#> stargazer(mtcars, title = "Descriptive statistics", header = FALSE)
#> ```
#>
#>
#> # Conclusion
#>
#> We used the [@gapminder2017] dataset to investigate all we wanted to know. Now we are reasonably sure that Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. Also, [@stargazer2018].
#>
#> # Bibliography
Albrecht, Andreas, and Joao Magueijo. 1999. “Time Varying Speed of Light as a Solution to Cosmological Puzzles.” Physical Review D 59 (4). APS: 043516.
Examples for this section are adapted from R for Data Science, ch. 27.