Workflow: Getting help

This book is not an island; there is no single resource that will allow you to master R. As you begin to apply the techniques described in this book to your own data, you will soon find questions that we do not answer. This section describes a few tips on how to get help, and to help you keep learning.

Google is your friend

If you get stuck, start with Google. Typically adding “R” to a query is enough to restrict it to relevant results: if the search isn’t useful, it often means that there aren’t any R-specific results available. Google is particularly useful for error messages. If you get an error message and you have no idea what it means, try googling it! Chances are that someone else has been confused by it in the past, and there will be help somewhere on the web. (If the error message isn’t in English, run Sys.setenv(LANGUAGE = "en") and re-run the code; you’re more likely to find help for English error messages.)

If Google doesn’t help, try Stack Overflow. Start by spending a little time searching for an existing answer, including [R] to restrict your search to questions and answers that use R.

Making a reprex

If your googling doesn’t find anything useful, it’s a really good idea prepare a reprex, short for minimal reproducible example. A good reprex makes it easier for other people to help you, and often you’ll figure out the problem yourself in the course of making it. There are two parts to creating a reprex:

First, you need to make your code reproducible. This means that you need to capture everything, i.e., include any library() calls and create all necessary objects. The easiest way to make sure you’ve done this is to use the reprex package.
Second, you need to make it minimal. Strip away everything that is not directly related to your problem. This usually involves creating a much smaller and simpler R object than the one you’re facing in real life or even using built-in data.

That sounds like a lot of work! And it can be, but it has a great payoff:

80% of the time creating an excellent reprex reveals the source of your problem. It’s amazing how often the process of writing up a self-contained and minimal example allows you to answer your own question.
The other 20% of time you will have captured the essence of your problem in a way that is easy for others to play with. This substantially improves your chances of getting help!

When creating a reprex by hand, it’s easy to accidentally miss something that means your code can’t be run on someone else’s computer. Avoid this problem by using the reprex package which is installed as part of the tidyverse. Let’s say you copy this code onto your clipboard (or, on RStudio Server or Cloud, select it):

y <- 1:4
mean(y)

Then call reprex(), where the default target venue is GitHub:

reprex::reprex()

A nicely rendered HTML preview will display in RStudio’s Viewer (if you’re in RStudio) or your default browser otherwise. The relevant bit of GitHub-flavored Markdown is ready to be pasted from your clipboard (on RStudio Server or Cloud, you will need to copy this yourself):

``` r
y <- 1:4
mean(y)
#> [1] 2.5
```

Here’s what that Markdown would look like rendered in a GitHub issue:

y <- 1:4
mean(y)
#> [1] 2.5

Anyone else can copy, paste, and run this immediately.

There are three things you need to include to make your example reproducible: required packages, data, and code.

Packages should be loaded at the top of the script, so it’s easy to see which ones the example needs. This is a good time to check that you’re using the latest version of each package; it’s possible you’ve discovered a bug that’s been fixed since you installed or last updated the package. For packages in the tidyverse, the easiest way to check is to run tidyverse_update().
The easiest way to include data is to use dput() to generate the R code needed to recreate it. For example, to recreate the mtcars dataset in R, perform the following steps:
1. Run dput(mtcars) in R
2. Copy the output
3. In reprex, type mtcars <- then paste.
Try and find the smallest subset of your data that still reveals the problem.
Spend a little bit of time ensuring that your code is easy for others to read:
- Make sure you’ve used spaces and your variable names are concise, yet informative.
- Use comments to indicate where your problem lies.
- Do your best to remove everything that is not related to the problem.
The shorter your code is, the easier it is to understand, and the easier it is to fix.

Finish by checking that you have actually made a reproducible example by starting a fresh R session and copying and pasting your script in.

Investing in yourself

You should also spend some time preparing yourself to solve problems before they occur. Investing a little time in learning R each day will pay off handsomely in the long run. One way is to follow what the tidyverse team is doing on the tidyverse blog. To keep up with the R community more broadly, we recommend reading R Weekly: it’s a community effort to aggregate the most interesting news in the R community each week.

If you’re an active Twitter user, you might also want to follow Hadley (@hadleywickham), Mine (@minebocek), Garrett (@statgarrett), or follow @rstudiotips to keep up with new features in the IDE. If you want the full fire hose of new developments, you can also read the (#rstats) hashtag. This is one the key tools that Hadley and Mine use to keep up with new developments in the community.

Summary

This chapter concludes the Whole Game part of the book. You’ve now seen the most important parts of the data science process: visualization, transformation, tidying and importing. Now you’ve got a holistic view of whole process and we start to get into the the details of small pieces.

The next part of the book, Transform, goes into depth into the different types of variables that you might encounter: logical vectors, numbers, strings, factors, and date-times, and covers important related topics like tibbles, regular expression, missing values, and joins. There’s no need to read these chapters in order; dip in and out as needed for the specific data that you’re working with.