To cluster your data, you'll follow these steps:
1. Prepare data.
2. Create similarity metric.
3. Run clustering algorithm.
4. Interpret results and adjust your clustering.
This page briefly introduces the steps. We'll go into depth in subsequent
sections.
Prepare data
As with any ML problem, you must normalize, scale, and transform feature data
before training or fine-tuning a model on that data. In addition, before
clustering, check that the prepared data lets you accurately calculate
similarity between examples.
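As a rough illustration of why preparation matters for similarity, the sketch below standardizes each feature to a z-score using only the Python standard library. The `standardize` helper and the income/age examples are hypothetical, and z-scoring is only one of several possible transformations; the right one depends on your data.

```python
import math
import statistics

def standardize(rows):
    """Z-score each column: subtract the column mean, divide by its standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stdevs)]
        for row in rows
    ]

# Hypothetical examples: (annual income in dollars, age in years).
raw = [[30_000.0, 25.0], [32_000.0, 60.0], [90_000.0, 27.0]]
scaled = standardize(raw)

# Compare distances from the first example to the other two, before and after scaling.
print(math.dist(raw[0], raw[1]), math.dist(raw[0], raw[2]))
print(math.dist(scaled[0], scaled[1]), math.dist(scaled[0], scaled[2]))
```

On the raw values, the income column dominates the distance simply because its numbers are larger. After standardizing, both features contribute on a comparable scale.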
Create similarity metric
Before a clustering algorithm can group data, it needs to know how similar
pairs of examples are. You can quantify the similarity between examples by
creating a similarity metric, which requires a careful understanding of your
data.
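To make the idea concrete, here is a minimal sketch of one possible similarity metric for numeric feature vectors. It assumes that Euclidean distance is meaningful for your prepared data, and the mapping `1 / (1 + distance)` is an illustrative choice, not the only option. The function names are hypothetical.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.dist(a, b)

def similarity(a, b):
    """Map distance to a score in (0, 1]: identical examples score 1.0."""
    return 1.0 / (1.0 + euclidean_distance(a, b))

# Hypothetical, already-scaled examples.
x = [0.0, 0.0]
near = [0.3, 0.4]
far = [3.0, 4.0]

print(similarity(x, near), similarity(x, far))
```

A metric like this only behaves sensibly if the features have been prepared so that distances in each dimension are comparable.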
Run clustering algorithm
A clustering algorithm uses the similarity metric to cluster data.
This course uses k-means.
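The following is a minimal, unoptimized sketch of the standard k-means loop (often called Lloyd's algorithm), written with only the Python standard library. It assumes Euclidean distance, picks the initial centroids at random from the data, and uses a hypothetical `k_means` function name and toy data. In practice you would more likely use a library implementation.

```python
import math
import random

def k_means(points, k, iterations=100, seed=0):
    """Return (centroids, assignments) after alternating assign/update steps."""
    rng = random.Random(seed)
    # Assumption: initialize centroids as k distinct examples chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # No point changed cluster, so the loop has converged.
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # Keep the old centroid if a cluster ends up empty.
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignments

# Hypothetical toy data: two well-separated groups of three points each.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centroids, assignments = k_means(points, k=2)
print(centroids, assignments)
```

Which cluster gets label 0 and which gets label 1 depends on the random initialization, so compare cluster membership rather than label values.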
Interpret results and adjust
Because clustering doesn't produce or include a "ground truth" against which you
can verify the output, it's important to check the result against your
expectations at both the cluster level and the example level. If the result
looks odd or low-quality, experiment with the previous three steps. Continue
iterating until the quality of the output meets your needs.
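As one possible starting point for these checks, the sketch below summarizes a clustering at both levels: per cluster, it reports the number of examples and the total distance of those examples to the centroid; per example, it finds the point farthest from its centroid so you can inspect it by hand. The `cluster_report` function and the sample values are hypothetical, and Euclidean distance is assumed.

```python
import math
from collections import Counter, defaultdict

def cluster_report(points, assignments, centroids):
    """Return per-cluster sizes, total point-to-centroid distance, and farthest example."""
    sizes = Counter(assignments)
    total_distance = defaultdict(float)
    farthest = {}  # cluster -> (index of example, distance to centroid)
    for index, (point, cluster) in enumerate(zip(points, assignments)):
        d = math.dist(point, centroids[cluster])
        total_distance[cluster] += d
        if cluster not in farthest or d > farthest[cluster][1]:
            farthest[cluster] = (index, d)
    return sizes, dict(total_distance), farthest

# Hypothetical clustering output for five examples in two clusters.
points = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0], [16.0, 11.0]]
assignments = [0, 0, 1, 1, 1]
centroids = [[0.0, 1.0], [12.0, 11.0]]  # per-cluster means of the points above

sizes, total_distance, farthest = cluster_report(points, assignments, centroids)
print(sizes, total_distance, farthest)
```

A cluster that is unusually small, unusually spread out, or that contains an example far from its centroid is a reasonable place to start looking when the result seems off.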
Last updated 2025-02-25 UTC.