<section data-type="chapter" id="chp-quarto-workflow">
<h1><span id="sec-quarto-workflow" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Quarto workflow</span></span></h1><p>You are reading the work-in-progress second edition of R for Data Science. This chapter should be readable but is currently undergoing final polishing. You can find the complete first edition at <a href="https://r4ds.had.co.nz" class="uri">https://r4ds.had.co.nz</a>.</p><p>Earlier, we discussed a basic workflow for capturing your R code where you work interactively in the <em>console</em>, then capture what works in the <em>script editor</em>. Quarto brings together the console and the script editor, blurring the lines between interactive exploration and long-term code capture. You can rapidly iterate within a chunk, editing and re-executing it with Cmd/Ctrl + Shift + Enter. When you’re happy with it, you move on and start a new chunk.</p><p>Quarto is also important because it so tightly integrates prose and code. This makes it a great <strong>analysis notebook</strong> because it lets you develop code and record your thoughts. An analysis notebook shares many of the same goals as a classic lab notebook in the physical sciences. It:</p><ul><li><p>Records what you did and why you did it. Regardless of how great your memory is, if you don’t record what you do, there will come a time when you have forgotten important details. Write them down so you don’t forget!</p></li>
<li><p>Supports rigorous thinking. You are more likely to come up with a strong analysis if you record your thoughts as you go, and continue to reflect on them. This also saves you time when you eventually write up your analysis to share with others.</p></li>
<li><p>Helps others understand your work. It is rare to do data analysis by yourself, and you’ll often be working as part of a team. A lab notebook helps you share with your colleagues or lab mates not only what you’ve done but also why you did it.</p></li>
</ul><p>Much of the good advice about using lab notebooks effectively can also be translated to analysis notebooks. We’ve drawn on our own experiences and Colin Purrington’s advice on lab notebooks (<a href="https://colinpurrington.com/tips/lab-notebooks" class="uri">https://colinpurrington.com/tips/lab-notebooks</a>) to come up with the following tips:</p><ul><li><p>Ensure each notebook has a descriptive title, an evocative file name, and a first paragraph that briefly describes the aims of the analysis.</p></li>
<li>
<p>Use the YAML header date field to record the date you started working on the notebook:</p>
<pre data-type="programlisting" data-code-language="yaml">date: 2016-08-23</pre>
<p>Use the ISO 8601 YYYY-MM-DD format so that there’s no ambiguity. Use it even if you don’t normally write dates that way!</p>
</li>
<li><p>If you spend a lot of time on an analysis idea and it turns out to be a dead end, don’t delete it! Write up a brief note about why it failed and leave it in the notebook. That will help you avoid going down the same dead end when you come back to the analysis in the future.</p></li>
<li><p>Generally, you’re better off doing data entry outside of R. But if you do need to record a small snippet of data, clearly lay it out using <code><a href="https://tibble.tidyverse.org/reference/tribble.html">tibble::tribble()</a></code>.</p></li>
<li><p>If you discover an error in a data file, never modify the file directly; instead, write code to correct the value, and explain why you made the fix.</p></li>
<li><p>Before you finish for the day, make sure you can render the notebook. If you’re using caching, make sure to clear the caches first. That will let you fix any problems while the code is still fresh in your mind.</p></li>
<li><p>If you want your code to be reproducible in the long run (i.e., so you can come back to run it next month or next year), you’ll need to track the versions of the packages that your code uses. A rigorous approach is to use <strong>renv</strong>, <a href="https://rstudio.github.io/renv/index.html" class="uri">https://rstudio.github.io/renv/index.html</a>, which stores packages in your project directory. A quick and dirty hack is to include a chunk that runs <code><a href="https://rdrr.io/r/utils/sessionInfo.html">sessionInfo()</a></code>; that won’t let you easily recreate your packages as they are today, but at least you’ll know what they were.</p></li>
<li><p>You are going to create many, many, many analysis notebooks over the course of your career. How are you going to organize them so you can find them again in the future? We recommend storing them in individual projects, and coming up with a good naming scheme.</p></li>
</ul></section>
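<p>A minimal sketch of the two data-handling tips in the list above: the R code below enters a small lookup table by hand with <code>tibble::tribble()</code>, then corrects a recording error in code rather than in the data file, with a comment explaining the fix. The <code>site_codes</code> and <code>measurements</code> tables, their values, and the milligram/gram mix-up are all hypothetical, invented only for illustration; the correction and the join use the dplyr functions <code>mutate()</code>, <code>if_else()</code>, and <code>left_join()</code>.</p>
<pre data-type="programlisting" data-code-language="r">library(tibble)
library(dplyr)

# Hypothetical lookup table entered by hand. tribble() lays the data out
# row by row, so the code looks like the table it creates.
site_codes &lt;- tribble(
  ~code, ~site,
  "A1",  "North field",
  "B2",  "South field"
)

# Hypothetical raw measurements, standing in for data read from a file
# (in a real notebook this would come from something like readr::read_csv()).
measurements &lt;- tibble(
  code   = c("A1", "B2", "A1"),
  mass_g = c(12.1, 1350, 11.8)
)

# Correct the error in code, not in the data file, and record why:
# the B2 mass was (hypothetically) entered in milligrams instead of grams.
measurements &lt;- measurements |>
  mutate(mass_g = if_else(code == "B2" &amp; mass_g == 1350, mass_g / 1000, mass_g))

# Attach the hand-entered site names to the corrected measurements.
measured_sites &lt;- measurements |>
  left_join(site_codes, by = "code")
measured_sites</pre>
<p>Because the fix lives in the notebook instead of the file, rerunning the analysis from the untouched raw data always reproduces the corrected value, and the comment records why it was changed.</p>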
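<p>To make the package-versions tip concrete, here is a minimal sketch of what a final “session info” chunk might contain. <code>sessionInfo()</code> prints the R version, platform, and package versions in use; the <code>packageVersion()</code> call then records just the packages the notebook depends on. The package names in <code>pkgs</code> are an assumption; replace them with the ones your code actually loads. The renv functions appear only in comments, as the rough shape of the more rigorous approach.</p>
<pre data-type="programlisting" data-code-language="r"># Quick and dirty: print the R version, platform, and package versions
# in use when the notebook is rendered.
sessionInfo()

# More targeted: the versions of the packages this notebook relies on.
# These package names are an assumption; list the ones your code loads.
pkgs &lt;- c("tibble", "dplyr")
versions &lt;- vapply(pkgs, function(p) as.character(packageVersion(p)), character(1))
versions

# The rigorous alternative, renv, works at the project level rather than
# in a chunk: renv::init() once to set up a project library,
# renv::snapshot() to record package versions in renv.lock, and
# renv::restore() to reinstall those recorded versions later.</pre>
<p>The <code>sessionInfo()</code> output is a record rather than a recipe: it tells you which versions you had, while renv’s lockfile lets you reinstall them.</p>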