diff --git a/quarto.qmd b/quarto.qmd
index c35ef6f..260ca9b 100644
--- a/quarto.qmd
+++ b/quarto.qmd
@@ -601,28 +601,25 @@ On subsequent runs, knitr will check to see if the code has changed, and if it h
 The caching system must be used with care, because by default it is based on the code only, not its dependencies.
 For example, here the `processed_data` chunk depends on the `raw-data` chunk:
 
-```
-`r chunk`{r}
+```{r}
 #| label: raw-data
 
 rawdata <- readr::read_csv("a_very_large_file.csv")
-`r chunk`
+```
 
-`r chunk`{r}
+```{r}
 #| label: processed_data
 #| cache: true
 
 processed_data <- rawdata |>
   filter(!is.na(import_var)) |>
   mutate(new_variable = complicated_transformation(x, y, z))
-`r chunk`
 ```
 
 Caching the `processed_data` chunk means that it will get re-run if the dplyr pipeline is changed, but it won't get rerun if the `read_csv()` call changes.
 You can avoid that problem with the `dependson` chunk option:
 
-```
-`r chunk`{r}
+```{r}
 #| label: processed-data
 #| cache: true
 #| dependson: "raw-data"
@@ -630,7 +627,6 @@ You can avoid that problem with the `dependson` chunk option:
 processed_data <- rawdata |>
   filter(!is.na(import_var)) |>
   mutate(new_variable = complicated_transformation(x, y, z))
-`r chunk`
 ```
 
 `dependson` should contain a character vector of *every* chunk that the cached chunk depends on.
@@ -642,13 +638,11 @@ This is an arbitrary R expression that will invalidate the cache whenever it cha
 A good function to use is `file.info()`: it returns a bunch of information about the file including when it was last modified.
 Then you can write:
 
-```
-`r chunk`{r}
+```{r}
 #| label: raw-data
 #| cache.extra: file.info("a_very_large_file.csv")
 
 rawdata <- readr::read_csv("a_very_large_file.csv")
-`r chunk`
 ```
 
 We've followed the advice of [David Robinson](https://twitter.com/drob/status/738786604731490304) to name these chunks: each chunk is named after the primary object that it creates.