From 68a1d54489107d5f70145ebf3509b9bc09d403d8 Mon Sep 17 00:00:00 2001
From: Jonathan Page
Date: Tue, 16 Aug 2016 02:11:45 -1000
Subject: [PATCH 1/2] Grammar (#270)

---
 transform.Rmd | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/transform.Rmd b/transform.Rmd
index afbed38..7d75d24 100644
--- a/transform.Rmd
+++ b/transform.Rmd
@@ -2,7 +2,7 @@
 
 ## Introduction
 
-Visualisation is an important tool for insight generation, but it is rare that you get the data in exactly the right form you need. Often you'll need to create some new variables or summaries, or maybe you just want to rename the variables or reorder the observations in order to make the data a little easier to work with. You'll learn how to do all that (and more!) in this chapter which will teach you how to transform your data using the dplyr package and a new dataset on flights departing New York City in 2013.
+Visualisation is an important tool for insight generation, but it is rare that you get the data in exactly the right form you need. Often you'll need to create some new variables or summaries, or maybe you just want to rename the variables or reorder the observations in order to make the data a little easier to work with. You'll learn how to do all that (and more!) in this chapter, which will teach you how to transform your data using the dplyr package and a new dataset on flights departing New York City in 2013.
 
 ### Prerequisites
 
@@ -35,7 +35,7 @@ You might also have noticed the row of three letter abbreviations under the colu
 
 ### Dplyr basics
 
-In this chapter you are going to learn the five key dplyr functions that allow you to solve vast majority of your data manipulation challenges:
+In this chapter you are going to learn the five key dplyr functions that allow you to solve the vast majority of your data manipulation challenges:
 
 * Pick observations by their values (`filter()`).
 * Reorder the rows (`arrange()`).
From fc8ea9d06033a301909b9756b7d33619aa024fe7 Mon Sep 17 00:00:00 2001
From: Jonathan Page
Date: Tue, 16 Aug 2016 02:22:41 -1000
Subject: [PATCH 2/2] Grammar (#271)

---
 EDA.Rmd | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/EDA.Rmd b/EDA.Rmd
index 67b659c..0961781 100644
--- a/EDA.Rmd
+++ b/EDA.Rmd
@@ -83,7 +83,7 @@ Every variable has its own pattern of variation, which can reveal interesting in
 
 ### Visualising distributions
 
-How you visualise the distribution of a variable will depend on whether the variable is categorical or continuous. A variable is **categorical** if it can only take one of small set of values. In R, categorical variables are usually saved as factors or character vectors. To examine the distribution of a categorical variable, use a bar chart:
+How you visualise the distribution of a variable will depend on whether the variable is categorical or continuous. A variable is **categorical** if it can only take one of a small set of values. In R, categorical variables are usually saved as factors or character vectors. To examine the distribution of a categorical variable, use a bar chart:
 
 ```{r}
 ggplot(data = diamonds) +
@@ -130,7 +130,7 @@ ggplot(data = smaller, mapping = aes(x = carat, colour = cut)) +
   geom_freqpoly(binwidth = 0.1)
 ```
 
-There are a few challenges with this type of plot, which we will come back to in [visualisation a categorical and a continuous variable](#cat-cont).
+There are a few challenges with this type of plot, which we will come back to in [visualising a categorical and a continuous variable](#cat-cont).
 
 Now that you can visualise variation, what should you look for in your plots? And what type of follow-up questions should you ask? I've put together a list below of the most useful types of information that you will find in your graphs, along with some follow up questions for each type of information.
 The key to asking good follow up questions will be to rely on your **curiosity** (What do you want to learn more about?) as well as your **skepticism** (How could this be misleading?).
@@ -582,7 +582,7 @@ ggplot(faithful, aes(eruptions)) +
   geom_freqpoly(binwidth = 0.25)
 ```
 
-Sometimes we'll turn the end of pipeline of data transformation into a plot. Watch for the transition from `%>%` to `+`. I wish this transition wasn't necessary but unfortunately ggplot2 was created before the pipe was discovered.
+Sometimes we'll turn the end of a pipeline of data transformation into a plot. Watch for the transition from `%>%` to `+`. I wish this transition wasn't necessary but unfortunately ggplot2 was created before the pipe was discovered.
 
 ```{r, eval = FALSE}
 diamonds %>%
@@ -591,4 +591,4 @@ diamonds %>%
     geom_tile()
 ```
 
-If you want learn more about ggplot2, I'd highly recommend grabbing a copy of the ggplot2 book: . It's been recently updated, so includes dplyr and tidyr code, and has much more space to explore all the facets of visualisation. Unfortunately the book isn't generally available for free, but if you have a connection to a university you can probably get an electronic version for free through SpringerLink.
+If you want learn more about ggplot2, I'd highly recommend grabbing a copy of the ggplot2 book: . It's been recently updated, so it includes dplyr and tidyr code, and has much more space to explore all the facets of visualisation. Unfortunately the book isn't generally available for free, but if you have a connection to a university you can probably get an electronic version for free through SpringerLink.