changed instances of heirarchical into hierachical in variation.Rmd

This commit is contained in:
rlzijdeman 2016-06-03 16:50:09 +02:00
parent 427ebd6ac0
commit 2ab8e749c9
1 changed file with 7 additions and 7 deletions


@@ -371,11 +371,11 @@ This doesn't mean that you should ignore complex interactions in your data. You
## Clusters
Cluster algorithms are automated tools that seek out clusters in n-dimensional space for you. Base R provides two easy to use clustering algorithms: heirarchical clustering and k means clustering.
Cluster algorithms are automated tools that seek out clusters in n-dimensional space for you. Base R provides two easy to use clustering algorithms: hierarchical clustering and k means clustering.
### Heirarchical clustering
### Hierarchical clustering
Heirarchical clustering uses a simple algorithm to locate groups of points that are near each other in n-dimensional space:
Hierarchical clustering uses a simple algorithm to locate groups of points that are near each other in n-dimensional space:
1. Identify the two points that are closest to each other
2. Combine these points into a cluster
@@ -388,7 +388,7 @@ You can visualize the results of the algorithm as a dendrogram, and you can use
knitr::include_graphics("images/EDA-hclust.pdf")
```
To use heirarchical clustering in R, begin by selecting the numeric columns from your data; you can only apply heirarchical clustering to numeric data. Then apply the `dist()` function to the data and pass the results to `hclust()`. `dist()` computes the distances between your points in the n dimensional space defined by your numeric vectors. `hclust()` performs the clustering algorithm.
To use hierarchical clustering in R, begin by selecting the numeric columns from your data; you can only apply hierarchical clustering to numeric data. Then apply the `dist()` function to the data and pass the results to `hclust()`. `dist()` computes the distances between your points in the n dimensional space defined by your numeric vectors. `hclust()` performs the clustering algorithm.
```{r}
small_iris <- sample_n(iris, 50)
@@ -418,7 +418,7 @@ ggplot(small_iris, aes(x = Sepal.Width, y = Sepal.Length)) +
geom_point(aes(color = factor(clusters)))
```
You can modify the heirarchical clustering algorithm by setting the method argument of hclust to one of "complete", "single", "average", or "centroid". The method determines how to measure the distance between two clusters or a lone point and a cluster, a measurement that effects the outcome of the algorithm.
You can modify the hierarchical clustering algorithm by setting the method argument of hclust to one of "complete", "single", "average", or "centroid". The method determines how to measure the distance between two clusters or a lone point and a cluster, a measurement that effects the outcome of the algorithm.
```{r, echo = FALSE}
knitr::include_graphics("images/EDA-linkage.pdf")
@@ -444,7 +444,7 @@ small_iris %>%
### K means clustering
K means clustering provides a simulation based alternative to heirarchical clustering. It identifies the "best" way to group your data into a pre-defined number of clusters. The figure below visualizes (in two dimensional space) the k means algorith:
K means clustering provides a simulation based alternative to hierarchical clustering. It identifies the "best" way to group your data into a pre-defined number of clusters. The figure below visualizes (in two dimensional space) the k means algorith:
1. Randomly assign each data point to one of $k$ groups
2. Compute the centroid of each group
@@ -455,7 +455,7 @@ K means clustering provides a simulation based alternative to heirarchical clust
knitr::include_graphics("images/EDA-kmeans.pdf")
```
Use `kmeans()` to perform k means clustering with R. As with heirarchical clustering, you can only apply k means clustering to numerical data. Pass your numerical data to the `kmeans()` function, then set `center` to the number of clusters to search for ($k$) and `nstart` to the number of simulations to run. Since the results of k means clustering depend on the initial assignment of points to groups, which is random, R will run `nstart` simulations and then return the best results (as measured by the minimum sum of squared distances between each point and the centroid of the group it is assigned to). Finally, set the maximum number of iterations to let each simulation run in case the simulation cannot quickly find a stable grouping.
Use `kmeans()` to perform k means clustering with R. As with hierarchical clustering, you can only apply k means clustering to numerical data. Pass your numerical data to the `kmeans()` function, then set `center` to the number of clusters to search for ($k$) and `nstart` to the number of simulations to run. Since the results of k means clustering depend on the initial assignment of points to groups, which is random, R will run `nstart` simulations and then return the best results (as measured by the minimum sum of squared distances between each point and the centroid of the group it is assigned to). Finally, set the maximum number of iterations to let each simulation run in case the simulation cannot quickly find a stable grouping.
```{r}
iris_kmeans <- small_iris %>%