Fix/databases probably typos (#1474)

* typos

* probably typos

* a typo

* typos

* a typo
Mitsuo Shiota 2023-05-21 13:00:25 +09:00 committed by GitHub
parent 32717bcf53
commit d0dc1a8c6c
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 8 additions and 8 deletions

@@ -16,7 +16,7 @@ You want to be able to reach into the database directly to get the data you need
In this chapter, you'll first learn the basics of the DBI package: how to use it to connect to a database and then retrieve data with a SQL[^databases-1] query.
**SQL**, short for **s**tructured **q**uery **l**anguage, is the lingua franca of databases, and is an important language for all data scientists to learn.
That said, we're not going to start with SQL, but instead we'll teach you dbplyr, which can translate your dplyr code to the SQL.
-We'll use that as way to teach you some of the most important features of SQL.
+We'll use that as a way to teach you some of the most important features of SQL.
You won't become a SQL master by the end of the chapter, but you will be able to identify the most important components and understand what they do.
[^databases-1]: SQL is either pronounced "s"-"q"-"l" or "sequel".
@@ -37,7 +37,7 @@ library(tidyverse)
## Database basics
At the simplest level, you can think about a database as a collection of data frames, called **tables** in database terminology.
-Like a data.frame, a database table is a collection of named columns, where every value in the column is the same type.
+Like a data frame, a database table is a collection of named columns, where every value in the column is the same type.
There are three high level differences between data frames and database tables:
- Database tables are stored on disk and can be arbitrarily large.
@@ -66,7 +66,7 @@ To connect to the database from R, you'll use a pair of packages:
- You'll also use a package tailored for the DBMS you're connecting to.
This package translates the generic DBI commands into the specifics needed for a given DBMS.
There's usually one package for each DBMS, e.g.
-RPostgres for Postgres and RMariaDB for MySQL.
+RPostgres for PostgreSQL and RMariaDB for MySQL.
If you can't find a specific package for your DBMS, you can usually use the odbc package instead.
This uses the ODBC protocol supported by many DBMS.
@@ -94,7 +94,7 @@ con <- DBI::dbConnect(
The precise details of the connection vary a lot from DBMS to DBMS so unfortunately we can't cover all the details here.
This means you'll need to do a little research on your own.
Typically you can ask the other data scientists in your team or talk to your DBA (**d**ata**b**ase **a**dministrator).
-The initial setup will often take a little fiddling (and maybe some googling) to get right, but you'll generally only need to do it once.
+The initial setup will often take a little fiddling (and maybe some googling) to get it right, but you'll generally only need to do it once.
### In this book
@@ -110,7 +110,7 @@ con <- DBI::dbConnect(duckdb::duckdb())
```
duckdb is a high-performance database that's designed very much for the needs of a data scientist.
-We use it here because it's very to easy to get started with, but it's also capable of handling gigabytes of data with great speed.
+We use it here because it's very easy to get started with, but it's also capable of handling gigabytes of data with great speed.
If you want to use duckdb for a real data analysis project, you'll also need to supply the `dbdir` argument to make a persistent database and tell duckdb where to save it.
Assuming you're using a project (@sec-workflow-scripts-projects), it's reasonable to store it in the `duckdb` directory of the current project:
@@ -301,7 +301,7 @@ The following sections explore each clause in more detail.
::: callout-note
Note that while SQL is a standard, it is extremely complex and no database follows it exactly.
-While the main components that we'll focus on in this book are very similar between DBMSs, there are many minor variations.
+While the main components that we'll focus on in this book are very similar between DBMS's, there are many minor variations.
Fortunately, dbplyr is designed to handle this problem and generates different translations for different databases.
It's not perfect, but it's continually improving, and if you hit a problem you can file an issue [on GitHub](https://github.com/tidyverse/dbplyr/issues/) to help us do better.
:::
@@ -426,7 +426,7 @@ flights |>
summarize(delay = mean(arr_delay))
```
-If you want to learn more about how NULLs work, you might enjoy "[*Three valued logic*](https://modern-sql.com/concept/three-valued-logic)" by Markus Winand.
+If you want to learn more about how `NULL`s work, you might enjoy "[*Three valued logic*](https://modern-sql.com/concept/three-valued-logic)" by Markus Winand.
In general, you can work with `NULL`s using the functions you'd use for `NA`s in R:
@@ -655,7 +655,7 @@ dbplyr's translations are certainly not perfect, and there are many R functions
In this chapter you learned how to access data from databases.
We focused on dbplyr, a dplyr "backend" that allows you to write the dplyr code you're familiar with, and have it be automatically translated to SQL.
-We used that translation to teach you a little SQL; it's important to learn some SQL because it's *the* most commonly used language for working with data and knowing some will it easier for you to communicate with other data folks who don't use R.
+We used that translation to teach you a little SQL; it's important to learn some SQL because it's *the* most commonly used language for working with data and knowing some will make it easier for you to communicate with other data folks who don't use R.
If you've finished this chapter and would like to learn more about SQL.
We have two recommendations: