<div data-type="part">
<h1><span id="sec-transform-intro" class="quarto-section-identifier d-none d-lg-block">Transform</span></h1><p>After reading the first part of the book, you understand (at least superficially) the most important tools for doing data science. Now it's time to start diving into the details. In this part of the book, you'll learn about the most important types of variables that you'll encounter inside a data frame and learn the tools you can use to work with them.</p><div class="cell">
<div class="cell-output-display">
<figure id="fig-ds-transform"><p><img src="diagrams/data-science/transform.png" alt="Our data science model, with transform highlighted in blue." width="535"/></p>
<figcaption>Figure 1: The options for data transformation depend heavily on the type of data involved, the subject of this part of the book.</figcaption>
</figure>
</div>
</div><p>You can read these chapters as you need them; they're designed to be largely standalone so that they can be read out of order.</p><ul><li><p><a href="#chp-logicals" data-type="xref">#chp-logicals</a> teaches you about logical vectors. These are the simplest type of vector, but are extremely powerful. You'll learn how to create them with numeric comparisons, how to combine them with Boolean algebra, how to use them in summaries, and how to use them for conditional transformations.</p></li>
<li><p><a href="#chp-numbers" data-type="xref">#chp-numbers</a> dives into tools for vectors of numbers, the powerhouse of data science. You'll learn more about counting and a bunch of important transformation and summary functions.</p></li>
<li><p><a href="#chp-strings" data-type="xref">#chp-strings</a> will give you the tools to work with strings: you'll slice them, you'll dice them, and you'll stick them back together again. This chapter mostly focuses on the stringr package, but you'll also learn some more tidyr functions devoted to extracting data from strings.</p></li>
<li><p><a href="#chp-regexps" data-type="xref">#chp-regexps</a> introduces you to regular expressions, a powerful tool for manipulating strings. This chapter will take you from thinking that a cat walked over your keyboard to reading and writing complex string patterns.</p></li>
<li><p><a href="#chp-factors" data-type="xref">#chp-factors</a> introduces factors: the data type that R uses to store categorical data. You use a factor when a variable has a fixed set of possible values, or when you want to use a non-alphabetical ordering of a string.</p></li>
<li><p><a href="#chp-datetimes" data-type="xref">#chp-datetimes</a> will give you the key tools for working with dates and date-times. Unfortunately, the more you learn about date-times, the more complicated they seem to get, but with the help of the lubridate package, you'll learn how to overcome the most common challenges.</p></li>
<li><p><a href="#chp-missing-values" data-type="xref">#chp-missing-values</a> discusses missing values in depth. We've discussed them a couple of times in isolation, but now it's time to discuss them holistically, helping you come to grips with the difference between implicit and explicit missing values, and how and why you might convert between them.</p></li>
<li><p><a href="#chp-joins" data-type="xref">#chp-joins</a> finishes up this part of the book by giving you tools to join two (or more) data frames together. Learning about joins will force you to grapple with the idea of keys, and think about how you identify each row in a dataset.</p></li>
</ul></div>