Fix/databases probably typos (#1474)
* typos
* probably typos
* a typo
* typos
* a typo
parent 32717bcf53
commit d0dc1a8c6c
@@ -16,7 +16,7 @@ You want to be able to reach into the database directly to get the data you need
 In this chapter, you'll first learn the basics of the DBI package: how to use it to connect to a database and then retrieve data with a SQL[^databases-1] query.
 **SQL**, short for **s**tructured **q**uery **l**anguage, is the lingua franca of databases, and is an important language for all data scientists to learn.
 That said, we're not going to start with SQL, but instead we'll teach you dbplyr, which can translate your dplyr code to the SQL.
-We'll use that as way to teach you some of the most important features of SQL.
+We'll use that as a way to teach you some of the most important features of SQL.
 You won't become a SQL master by the end of the chapter, but you will be able to identify the most important components and understand what they do.

 [^databases-1]: SQL is either pronounced "s"-"q"-"l" or "sequel".
@@ -37,7 +37,7 @@ library(tidyverse)
 ## Database basics

 At the simplest level, you can think about a database as a collection of data frames, called **tables** in database terminology.
-Like a data.frame, a database table is a collection of named columns, where every value in the column is the same type.
+Like a data frame, a database table is a collection of named columns, where every value in the column is the same type.
 There are three high level differences between data frames and database tables:

 - Database tables are stored on disk and can be arbitrarily large.
@@ -66,7 +66,7 @@ To connect to the database from R, you'll use a pair of packages:
 - You'll also use a package tailored for the DBMS you're connecting to.
 This package translates the generic DBI commands into the specifics needed for a given DBMS.
 There's usually one package for each DBMS, e.g.
-RPostgres for Postgres and RMariaDB for MySQL.
+RPostgres for PostgreSQL and RMariaDB for MySQL.

 If you can't find a specific package for your DBMS, you can usually use the odbc package instead.
 This uses the ODBC protocol supported by many DBMS.
@@ -94,7 +94,7 @@ con <- DBI::dbConnect(
 The precise details of the connection vary a lot from DBMS to DBMS so unfortunately we can't cover all the details here.
 This means you'll need to do a little research on your own.
 Typically you can ask the other data scientists in your team or talk to your DBA (**d**ata**b**ase **a**dministrator).
-The initial setup will often take a little fiddling (and maybe some googling) to get right, but you'll generally only need to do it once.
+The initial setup will often take a little fiddling (and maybe some googling) to get it right, but you'll generally only need to do it once.

 ### In this book

@@ -110,7 +110,7 @@ con <- DBI::dbConnect(duckdb::duckdb())
 ```

 duckdb is a high-performance database that's designed very much for the needs of a data scientist.
-We use it here because it's very to easy to get started with, but it's also capable of handling gigabytes of data with great speed.
+We use it here because it's very easy to get started with, but it's also capable of handling gigabytes of data with great speed.
 If you want to use duckdb for a real data analysis project, you'll also need to supply the `dbdir` argument to make a persistent database and tell duckdb where to save it.
 Assuming you're using a project (@sec-workflow-scripts-projects), it's reasonable to store it in the `duckdb` directory of the current project:

@@ -301,7 +301,7 @@ The following sections explore each clause in more detail.

 ::: callout-note
 Note that while SQL is a standard, it is extremely complex and no database follows it exactly.
-While the main components that we'll focus on in this book are very similar between DBMSs, there are many minor variations.
+While the main components that we'll focus on in this book are very similar between DBMS's, there are many minor variations.
 Fortunately, dbplyr is designed to handle this problem and generates different translations for different databases.
 It's not perfect, but it's continually improving, and if you hit a problem you can file an issue [on GitHub](https://github.com/tidyverse/dbplyr/issues/) to help us do better.
 :::
@@ -426,7 +426,7 @@ flights |>
 summarize(delay = mean(arr_delay))
 ```

-If you want to learn more about how NULLs work, you might enjoy "[*Three valued logic*](https://modern-sql.com/concept/three-valued-logic)" by Markus Winand.
+If you want to learn more about how `NULL`s work, you might enjoy "[*Three valued logic*](https://modern-sql.com/concept/three-valued-logic)" by Markus Winand.

 In general, you can work with `NULL`s using the functions you'd use for `NA`s in R:

@@ -655,7 +655,7 @@ dbplyr's translations are certainly not perfect, and there are many R functions

 In this chapter you learned how to access data from databases.
 We focused on dbplyr, a dplyr "backend" that allows you to write the dplyr code you're familiar with, and have it be automatically translated to SQL.
-We used that translation to teach you a little SQL; it's important to learn some SQL because it's *the* most commonly used language for working with data and knowing some will it easier for you to communicate with other data folks who don't use R.
+We used that translation to teach you a little SQL; it's important to learn some SQL because it's *the* most commonly used language for working with data and knowing some will make it easier for you to communicate with other data folks who don't use R.
 If you've finished this chapter and would like to learn more about SQL.
 We have two recommendations:
