# Program {#sec-program-intro .unnumbered}

```{r}
#| results: "asis"
#| echo: false
source("_common.R")
```

In this part of the book, you'll improve your programming skills.
Programming is a cross-cutting skill needed for all data science work: you must use a computer to do data science; you cannot do it in your head, or with pencil and paper.

```{r}
#| label: fig-ds-program
#| echo: false
#| out.width: ~
#| fig-cap: >
#|   Programming is the water in which all other components of the data
#|   science process swim.
#| fig-alt: >
#|   Our model of the data science process with program (import, tidy,
#|   transform, visualize, model, and communicate, i.e. everything)
#|   highlighted in blue.

knitr::include_graphics("diagrams/data-science/program.png", dpi = 270)
```

Programming produces code, and code is a tool of communication.
Obviously, code tells the computer what you want it to do.
But it also communicates meaning to other humans.
Thinking about code as a vehicle for communication is important because every project you do is fundamentally collaborative.
Even if you're not working with other people, you'll definitely be working with future-you!
Writing clear code is important so that others (like future-you) can understand why you tackled an analysis in the way you did.
That means getting better at programming also involves getting better at communicating.
Over time, you want your code to become not just easier to write, but easier for others to read.

Writing code is similar in many ways to writing prose.
One parallel that we find particularly useful is that in both cases rewriting is the key to clarity.
The first expression of your ideas is unlikely to be particularly clear, and you may need to rewrite multiple times.
After solving a data analysis challenge, it's often worth looking at your code and thinking about whether or not it's obvious what you've done.
If you spend a little time rewriting your code while the ideas are fresh, you can save a lot of time later trying to recreate what your code did.
But this doesn't mean you should rewrite every function: you need to balance what you need to achieve now with saving time in the long run.
(But the more you rewrite your functions, the more likely your first attempt will be clear.)

In the following three chapters, you'll learn skills to improve your programming:

1. Copy-and-paste is a powerful tool, but you should avoid doing it more than twice.
   Repeating yourself in code is dangerous because it can easily lead to errors and inconsistencies.
   Instead, in @sec-functions, you'll learn how to write **functions**, which let you extract repeated code so that it can be easily reused.

2. Functions extract repeated code, but you often need to repeat the same actions on different inputs.
   You need tools for **iteration** that let you do similar things again and again.
   These tools include for loops and functional programming, which you'll learn about in @sec-iteration.

3. As you read more code written by others, you'll see more code that doesn't use the tidyverse.
   In @sec-base-r, you'll learn some of the most important base R functions that you'll see in the wild.
   These functions tend to be designed to work with individual vectors, rather than data frames, often making them a good fit for your programming needs.
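
As a rough preview of the three ideas above, here is a minimal sketch that uses only base R.
The data frame `df` and the helper `rescale01()` are invented for illustration and are not defined elsewhere in this part of the book; the later chapters may approach the same tasks differently.

```{r}
# Hypothetical data, made up for this sketch.
df <- data.frame(
  a = c(1, 5, 10),
  b = c(2, 4, 6),
  c = c(0, 50, 100)
)

# Copy-and-paste version: the same rescaling logic is repeated per column,
# so it is easy to update one copy and forget another.
a_scaled <- (df$a - min(df$a)) / (max(df$a) - min(df$a))
b_scaled <- (df$b - min(df$b)) / (max(df$b) - min(df$b))

# A function gives the repeated logic a name and keeps it in one place.
# Like many base R functions, it works on an individual vector.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Iteration with a for loop: apply the function to every column in turn.
scaled_loop <- df
for (col in names(df)) {
  scaled_loop[[col]] <- rescale01(df[[col]])
}

# Iteration in a functional style, using base R's lapply().
scaled_apply <- as.data.frame(lapply(df, rescale01))
```

Under these assumptions, `scaled_loop` and `scaled_apply` should hold the same rescaled columns, and `a_scaled` should match `rescale01(df$a)`.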

## Learning more

The goal of these chapters is to teach you the minimum about programming that you need to practice data science.
Once you have mastered the material in this book, we strongly believe you should continue to invest in your programming skills.
Learning more about programming is a long-term investment: it won't pay off immediately, but in the long term it will allow you to solve new problems more quickly, and let you reuse your insights from previous problems in new scenarios.

To learn more, you need to study R as a programming language, not just as an interactive environment for data science.
We have written two books that will help you do so:

- [*Hands-On Programming with R*](https://rstudio-education.github.io/hopr/), by Garrett Grolemund.
  This is an introduction to R as a programming language and is a great place to start if R is your first programming language.
  It covers similar material to these chapters, but with a different style and different motivating examples (based in the casino).
  It's a useful complement if you find that these three chapters go by too quickly.

- [*Advanced R*](https://adv-r.hadley.nz/), by Hadley Wickham.
  This dives into the details of R, the programming language.
  This is a great place to start if you have existing programming experience.
  It's also a great next step once you've internalized the ideas in these chapters.