𝐊-𝐌𝐞𝐚𝐧𝐬 𝐂𝐥𝐮𝐬𝐭𝐞𝐫𝐢𝐧𝐠 𝐄𝐱𝐩𝐥𝐚𝐢𝐧𝐞𝐝 - 𝐟𝐨𝐫 𝐛𝐞𝐠𝐢𝐧𝐧𝐞𝐫𝐬
𝐖𝐡𝐚𝐭 𝐢𝐬 𝐊-𝐌𝐞𝐚𝐧𝐬?
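It’s an unsupervised machine learning algorithm that groups unlabeled data into K clusters, so that points in the same cluster are similar to each other. Similarity is measured by distance (usually Euclidean) to each cluster’s center, which lets it surface hidden patterns without any labels.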
𝐈𝐧𝐭𝐮𝐢𝐭𝐢𝐯𝐞 𝐞𝐱𝐚𝐦𝐩𝐥𝐞:
You run a mall. Your data has:
› Age
› Annual Income
› Spending Score
With K = 3, K-Means can divide customers into groups you might then name:
⤷ Budget Shoppers
⤷ Mid-Range Customers
⤷ High-End Spenders
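To make the mall example concrete, here is a minimal sketch with six hypothetical customers (the numbers are invented for illustration, not real data). Because K-Means compares raw distances, the sketch standardizes the features with scikit-learn's `StandardScaler` first; that preprocessing step is an assumption added here, not part of the original example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical customers: [age, annual income (k$), spending score (1-100)]
customers = np.array([
    [22, 20, 12],
    [25, 24, 15],
    [35, 52, 46],
    [38, 55, 50],
    [45, 92, 86],
    [48, 95, 90],
], dtype=float)

# Rescale each feature to mean 0 and variance 1 so that no feature
# dominates the distance just because of its units
X_scaled = StandardScaler().fit_transform(customers)

# K = 3 segments; n_init=10 keeps the best of 10 initializations
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)
segment_labels = model.labels_
print(segment_labels)  # customers in the same segment share a label
```

K-Means only returns cluster indices (0, 1, 2). Names such as "Budget Shoppers" come from inspecting each cluster's centroid afterwards.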
𝐇𝐨𝐰 𝐢𝐭 𝐰𝐨𝐫𝐤𝐬:
① Choose the number of clusters K
② Randomly initialize K centroids
③ Assign each point to its nearest centroid
④ Move centroids to the mean of their assigned points
⑤ Repeat ③ and ④ until the centroids stop moving (convergence)
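The five steps above can be sketched from scratch in NumPy. This is a minimal illustration of the classic assign/update loop (Lloyd's algorithm), not scikit-learn's implementation; the function names `kmeans` and `_assign` are hypothetical, and keeping an empty cluster's old centroid is one simple choice among several.

```python
import numpy as np

def _assign(X, centroids):
    """Step 3: index of the nearest centroid for every point."""
    # Euclidean distance from each point to each centroid, shape (n_points, k)
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, max_iter=100, seed=0):
    """Hypothetical from-scratch K-Means; returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 2: use k distinct, randomly chosen data points as initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        labels = _assign(X, centroids)
        # Step 4: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 5: stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            return labels, new_centroids
        centroids = new_centroids
    # max_iter reached: return labels that match the final centroids
    return _assign(X, centroids), centroids

X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
labels, centroids = kmeans(X, k=2, seed=0)
print(labels)
print(centroids)
```

Depending on the random start, this loop can stop at a local minimum (for example, splitting this data into top and bottom rows instead of left and right groups). That is why scikit-learn's `KMeans` can rerun the algorithm from several initializations via `n_init`.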
𝐎𝐛𝐣𝐞𝐜𝐭𝐢𝐯𝐞:
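Minimize the total squared distance between each data point and the centroid of the cluster it is assigned to:
𝐉 = Σⱼ₌₁ᴷ Σ_(𝐱ᵢ ∈ Cⱼ) ‖𝐱ᵢ − μⱼ‖²
Where 𝐱ᵢ = data point, Cⱼ = set of points assigned to cluster j, μⱼ = centroid (mean) of Cⱼ, K = number of clusters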
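As a worked check of the objective, take the six toy points used in the scikit-learn example. The left group (1,2), (1,4), (1,0) has centroid (1,2) and contributes 0 + 4 + 4 = 8; the right group contributes the same, so J = 16. The sketch below computes J directly and compares it with `inertia_`, the attribute where scikit-learn reports this same sum. The helper name `objective` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

def objective(X, labels, centroids):
    """Hypothetical helper: J = sum of squared distances to each point's own centroid."""
    X = np.asarray(X, dtype=float)
    diffs = X - centroids[labels]  # each point minus the centroid of its cluster
    return float(np.sum(diffs ** 2))

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]], dtype=float)
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

J = objective(X, model.labels_, model.cluster_centers_)
print(J, model.inertia_)  # both should be 16.0 for this toy data (8 per group)
```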
𝐇𝐨𝐰 𝐭𝐨 𝐩𝐢𝐜𝐤 𝐊:
Use the Elbow Method
⤷ Run K-Means for a range of K values and plot K vs. the total within-cluster sum of squares (J)
⤷ The “elbow”, where adding more clusters stops reducing J much, is a good candidate for K
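The Elbow Method can be sketched by fitting one model per K and recording `inertia_` (scikit-learn's name for J). The range 1 to 5 is an arbitrary choice for this tiny toy dataset.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]], dtype=float)

ks = list(range(1, 6))  # candidate K values (arbitrary range for this toy data)
inertias = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(model.inertia_)  # within-cluster sum of squares (J) for this K

for k, j in zip(ks, inertias):
    print(f"K={k}: J={j:.1f}")
```

Here J falls from 137.5 at K = 1 to 16 at K = 2, and the gains after that are much smaller, so the elbow sits at K = 2. To see the curve, plot `ks` against `inertias`, for example with matplotlib's `plt.plot(ks, inertias, marker="o")`.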
𝐂𝐨𝐝𝐞 𝐄𝐱𝐚𝐦𝐩𝐥𝐞 (𝐒𝐜𝐢𝐤𝐢𝐭-𝐋𝐞𝐚𝐫𝐧):
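import numpy as np
from sklearn.cluster import KMeans

# Toy 2-D data: three points near x=1 and three near x=10
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

# n_init=10 runs 10 initializations and keeps the one with the lowest J
model = KMeans(n_clusters=2, n_init=10, random_state=0)
model.fit(X)

print(model.labels_)           # cluster index per point; first three share one label, last three the other
print(model.cluster_centers_)  # centroids [1, 2] and [10, 2], in some order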
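Once fitted, a scikit-learn `KMeans` model can assign new data with `predict`, which applies step ③ (nearest centroid) without moving the centroids. The two new points below are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Hypothetical unseen points: each goes to its nearest learned centroid
new_points = np.array([[0, 0], [12, 3]])
new_labels = model.predict(new_points)
print(new_labels)  # [0, 0] joins the x≈1 group, [12, 3] joins the x≈10 group
```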
𝐁𝐞𝐬𝐭 𝐔𝐬𝐞 𝐂𝐚𝐬𝐞𝐬:
⤷ Customer segmentation
⤷ Image compression
⤷ Market analysis
⤷ Social network analysis
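For image compression, one common approach is color quantization: treat every pixel's RGB value as a data point, cluster them into K colors, and replace each pixel with its centroid color. This is a minimal sketch; the helper `quantize_colors` is hypothetical, and random noise stands in for a real photo.

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_colors(image, n_colors=4, seed=0):
    """Hypothetical helper: repaint each pixel with the nearest of n_colors centroid colors."""
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(float)  # one row per pixel
    model = KMeans(n_clusters=n_colors, n_init=10, random_state=seed).fit(pixels)
    # The centroids become the palette; round them back to valid 8-bit colors
    palette = model.cluster_centers_.round().clip(0, 255).astype(np.uint8)
    return palette[model.labels_].reshape(h, w, c)

# Stand-in "image": random 16x16 RGB noise (real use would load a photo)
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(16, 16, 3), dtype=np.uint8)

compressed = quantize_colors(image, n_colors=4)
print(image.shape, compressed.shape)
print(len(np.unique(compressed.reshape(-1, 3), axis=0)))  # at most 4 distinct colors
```

The saving comes from storage: a 4-color image needs only the small palette plus a 2-bit palette index per pixel instead of 24 bits of RGB.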
𝐋𝐢𝐦𝐢𝐭𝐚𝐭𝐢𝐨𝐧𝐬:
› Sensitive to outliers
› Requires you to predefine K
› Works best with roughly spherical, similarly sized clusters
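Two of these limitations can be seen on small examples. In the first, one hypothetical far-away point is added to the toy data; with K = 2 the lowest-J solution gives the outlier its own cluster and merges the two real groups. The second uses scikit-learn's `make_moons` (two interleaving half-circles) and `adjusted_rand_score` (1.0 means a perfect match with the true groups) to show K-Means struggling with non-spherical shapes. The specific data choices are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# 1) Outliers: toy data plus one hypothetical outlier at (100, 2)
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0], [100, 2]], dtype=float)
outlier_model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
centers = outlier_model.cluster_centers_
centers = centers[np.argsort(centers[:, 0])]  # sort centroids by x for readability
print(centers)  # one centroid sits on the outlier, the other between the two real groups

# 2) Non-spherical clusters: two interleaving half-moons
X_moons, y_true = make_moons(n_samples=200, noise=0.05, random_state=0)
moon_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_moons)
ari = adjusted_rand_score(y_true, moon_labels)
print(round(ari, 2))  # well below 1.0: the straight boundary cuts across both moons
```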