Spreadsheets editor comments

mine-cetinkaya-rundel 2023-03-11 11:25:51 -05:00
parent 04c0d1907b
commit da0d03fd1b
1 changed files with 15 additions and 8 deletions


@@ -19,6 +19,8 @@ The best practices presented in this paper will save you much headache when you
## Excel
Microsoft Excel is a widely used spreadsheet software program where data are organized in worksheets inside of spreadsheet files.
### Prerequisites
In this section, you'll learn how to load data from Excel spreadsheets in R with the **readxl** package.
@@ -30,7 +32,6 @@ Later, we'll also use the writexl package, which allows us to create Excel sprea
library(readxl)
library(tidyverse)
library(writexl)
```
@@ -45,7 +46,7 @@ Most of readxl's functions allow you to load Excel spreadsheets into R:
These functions all have similar syntax just like other functions we have previously introduced for reading other types of files, e.g. `read_csv()`, `read_table()`, etc.
For the rest of the chapter we will focus on using `read_excel()`.
-### Reading spreadsheets {#sec-reading-spreadsheets}
+### Reading Excel spreadsheets {#sec-reading-spreadsheets-excel}
@fig-students-excel shows what the spreadsheet we're going to read into R looks like in Excel.
@@ -317,17 +318,17 @@ Confusingly, it's also possible to have something that looks like a number but i
These differences between how the underlying data are stored vs. how they're displayed can cause surprises when the data are loaded into R.
By default readxl will guess the data type in a given column.
-A recommended workflow is to let readxl guess the column types, confirm that you're happy with the guessed column types, and if not, go back and re-import specifying `col_types` as shown in @sec-reading-spreadsheets.
+A recommended workflow is to let readxl guess the column types, confirm that you're happy with the guessed column types, and if not, go back and re-import specifying `col_types` as shown in @sec-reading-spreadsheets-excel.
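The guess-then-re-import workflow described above could look roughly like the following. This is a minimal sketch, not the chapter's own example: the `students` data frame, its column names, and the temporary file are hypothetical and exist only so the snippet is self-contained.

```r
library(readxl)
library(writexl)

# Hypothetical data, written to a temporary .xlsx file so the example
# does not depend on any file from the chapter.
students <- data.frame(student_id = c(1, 2, 3), age = c(4, 5, 7))
path <- tempfile(fileext = ".xlsx")
write_xlsx(students, path)

# Step 1: let readxl guess the column types and inspect the result.
guessed <- read_excel(path)
sapply(guessed, class)

# Step 2: if a guess is not what you want (here, an ID stored as a number),
# re-import with explicit column types, one entry per column.
retyped <- read_excel(path, col_types = c("text", "numeric"))
sapply(retyped, class)
```

Specifying `col_types` for every column makes the import explicit, so a later change in the spreadsheet cannot silently change a column's type.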
Another challenge is when you have a column in your Excel spreadsheet that has a mix of these types, e.g. some cells are numeric, others text, others dates.
When importing the data into R, readxl has to make some decisions.
In these cases you can set the type for this column to `"list"`, which will load the column as a list of length 1 vectors, where the type of each element of the vector is guessed.
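As an illustration of the `"list"` column type, the sketch below reads `clippy.xlsx`, an example file bundled with readxl whose second column mixes text, a date, and a number. The choice of this file is an assumption made for illustration; the chapter may use different data.

```r
library(readxl)

# clippy.xlsx ships with readxl; its `value` column holds mixed cell types.
clippy_path <- readxl_example("clippy.xlsx")

# Read the first column as text and the second as a list column, so each
# cell keeps its own guessed type instead of being coerced to one type.
clippy <- read_excel(clippy_path, col_types = c("text", "list"))

# Each element of the list column is a length-1 vector with its own class.
lapply(clippy$value, class)
```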
### Data not in cell values
::: callout-note
Sometimes data is stored in more exotic ways, like the color of the cell background, or whether or not the text is bold.
In such cases, you might find the [tidyxl package](https://nacnudus.github.io/tidyxl/) useful.
See <https://nacnudus.github.io/spreadsheet-munging-strategies/> for more on strategies for working with non-tabular data from Excel.
:::
### Writing to Excel {#sec-writing-to-excel}
@@ -507,6 +508,10 @@ A good way of familiarizing yourself with the coding style used in a new package
## Google Sheets
Google Sheets is another widely used spreadsheet program.
It's free and web-based.
Just like with Excel, in Google Sheets data are organized in worksheets (also called sheets) inside of spreadsheet files.
### Prerequisites
This section will also focus on spreadsheets, but this time you'll be loading data from a Google Sheet with the **googlesheets4** package.
@@ -531,7 +536,7 @@ readxl and googlesheets4 packages are both designed to mimic the functionality o
Therefore, many of the tasks can be accomplished with simply swapping out `read_excel()` for `read_sheet()`.
However, you'll also see that Excel and Google Sheets don't behave in exactly the same way, so other tasks may require further updates to the function calls.
-### Read sheets
+### Reading Google Sheets
@fig-students-googlesheets shows what the spreadsheet we're going to read into R looks like in Google Sheets.
This is the same dataset as in @fig-students-excel, except it's stored in a Google Sheet instead of Excel.
@@ -603,7 +608,7 @@ deaths <- read_sheet(deaths_url, range = "A5:F15")
deaths
```
-### Write sheets
+### Writing to Google Sheets
You can write from R to Google Sheets with `write_sheet()`.
The first argument is the data frame to write, and the second argument is the name (or other identifier) of the Google Sheet to write to:
@@ -652,7 +657,9 @@ For further authentication details, we recommend reading the documentation googl
## Summary
-In this chapter you learned how to read data into R from spreadsheets: from Microsoft Excel with `read_excel()` from the readxl package and from Google Sheets with `read_sheet()` from the googlesheets4 package.
+Microsoft Excel and Google Sheets are two of the most popular spreadsheet systems.
+Being able to interact with data stored in Excel and Google Sheets files directly from R is a superpower!
+In this chapter you learned how to read data into R from spreadsheets: from Excel with `read_excel()` from the readxl package and from Google Sheets with `read_sheet()` from the googlesheets4 package.
These functions work very similarly to each other and have similar arguments for specifying column names, NA strings, rows to skip on top of the file you're reading in, etc.
Both functions also make it possible to read a single sheet from a spreadsheet.
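To make the shared arguments concrete, here is a minimal sketch using `read_excel()` on a temporary workbook. The data frames, sheet names, and column names are hypothetical. Per the summary, `read_sheet()` takes similar arguments, but it is not shown here because it needs network access and, for private sheets, authentication.

```r
library(readxl)
library(writexl)

# Hypothetical data: a named list of data frames becomes one worksheet each.
pets <- data.frame(name = c("Rex", "Milo"), age = c("3", "N/A"))
toys <- data.frame(toy = c("ball", "rope"), price = c(2.5, 4))

path <- tempfile(fileext = ".xlsx")
write_xlsx(list(pets = pets, toys = toys), path)

# List the worksheets available in the file.
excel_sheets(path)

# Read a single sheet, skip the header row, supply new column names,
# and treat the string "N/A" as a missing value.
pets_in <- read_excel(
  path,
  sheet = "pets",
  skip = 1,
  col_names = c("pet_name", "pet_age"),
  na = "N/A"
)
pets_in
```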