# Machine Learning Tutorial Python – 13: K Means Clustering Algorithm


The K-Means clustering algorithm is an unsupervised machine learning technique used to cluster data points. In this tutorial we will go over some of the theory behind how K-Means works, then solve an income-group clustering problem using sklearn's KMeans in Python. The elbow method is a technique used to determine the optimal number of clusters k; we will review that method as well.
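As a rough sketch of the elbow method described above (illustrative only, using synthetic age/income-like data rather than the video's CSV): fit KMeans for a range of k values, record the inertia (sum of squared distances of points to their nearest centroid), and look for the "elbow" where the curve stops dropping sharply.

```python
# Elbow-method sketch on hypothetical 2-D data with three obvious groups.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=[25, 0.2], scale=[2, 0.05], size=(30, 2)),
    rng.normal(loc=[40, 0.5], scale=[2, 0.05], size=(30, 2)),
    rng.normal(loc=[55, 0.9], scale=[2, 0.05], size=(30, 2)),
])

inertias = []
for k in range(1, 10):
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    km.fit(X)
    inertias.append(km.inertia_)

# Inertia shrinks as k grows; the elbow (here around k=3) suggests a good k.
print(inertias)
```

Plotting `range(1, 10)` against `inertias` with matplotlib makes the elbow visible at a glance.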

#MachineLearning #PythonMachineLearning #MachineLearningTutorial #Python #PythonTutorial #PythonTraining #MachineLearningCourse #kmeans #MachineLearningTechnique #sklearn

Code:

Exercise solution:

Topics that are covered in this Video:
0:00 Introduction
0:08 Theory – supervised vs. unsupervised learning and how K-Means clustering works (K-Means is unsupervised learning)
5:00 Elbow method
7:33 Coding (start) (Cluster people income based on age)
9:38 sklearn.cluster KMeans model creation and training
14:56 Use MinMaxScaler from sklearn
24:07 Exercise (Cluster iris flowers using their petal width and length)
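One possible solution sketch for the exercise above (not the video's exact code): cluster the iris flowers by petal length and width, scaling the features with MinMaxScaler first as the tutorial does for the income data.

```python
# Cluster iris flowers using petal length and petal width.
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans

iris = load_iris()
X = iris.data[:, 2:4]                # petal length (cm), petal width (cm)

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)   # each feature mapped into [0, 1]

km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(X_scaled)

# Three centers, each with two coordinates (in scaled units).
print(km.cluster_centers_)
```

With three well-separated species in the data, k=3 from the elbow method lines up with the known number of iris species.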

Next Video:
Machine Learning Tutorial Python – 14: Naive Bayes Part 1:

Popular Playlists:
Data Science Full Course:

Data Science Project:

Machine learning tutorials:

Pandas:

matplotlib:

Python:

Jupyter Notebook:

To download the csv and code for all tutorials: click the green button to clone or download the entire repository, then go to the relevant folder to access that specific file.

Website:


By |2021-07-22T15:14:51+00:00July 22nd, 2021|Python Video Tutorials|46 Comments

1. Neeraj kumar July 22, 2021 at 3:14 pm - Reply

Very good content for beginners.

2. Manoj kumar July 22, 2021 at 3:14 pm - Reply

Excellent tutorial. I have a query: in your notebook, Shift+Enter prints all the arguments of that class, but in my notebook it prints only the class name. How can I change this?

3. Alon Avramson July 22, 2021 at 3:14 pm - Reply

Thank you! really enjoyed this session. I tried both Petal and Sepal and it went very well.

4. Ajay Kumar July 22, 2021 at 3:14 pm - Reply

My search technique has changed from "hierarchical clustering" to "hierarchical clustering codebasics".

5. mandeep kharb July 22, 2021 at 3:14 pm - Reply

Hi,

With this K-Means clustering algorithm, prediction accuracy is very low. I tried the sepal length and sepal width features, but the accuracy came out to 0.1. How can I improve on that?

6. Tejobhiru July 22, 2021 at 3:14 pm - Reply

Hi, in the exercise I got the K value as 3, but even using K=3 in KMeans I'm getting only 2 clusters in the scatter plot!
Any idea why?

7. mandeep kharb July 22, 2021 at 3:14 pm - Reply

How do we get the intuition that we need to scale the data?
Can we do scaling in supervised learning algorithms like linear/logistic regression, SVM, etc.?

8. Binod Pratap Singh July 22, 2021 at 3:14 pm - Reply

The elbow method can fail in cases where we have more than 6 clusters.

9. francis kiragu July 22, 2021 at 3:14 pm - Reply

Wow. You are a great man. You've made it so simple to understand. Thank you, Sir. 🔥🔥🔥

10. Deepak Bhandari July 22, 2021 at 3:14 pm - Reply

Excellent.

11. Rakesh Murali July 22, 2021 at 3:14 pm - Reply

Beautiful! Superbly explained!

12. Sunil Anand July 22, 2021 at 3:14 pm - Reply

How can I troubleshoot this error while scaling?
"Expected 2D array, got 1D array instead:

Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample."
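The error quoted above typically means a 1-D array (a single column) was passed to the scaler. sklearn transformers such as MinMaxScaler expect a 2-D array of shape (n_samples, n_features), so the column must be reshaped first; a minimal sketch with hypothetical income values:

```python
# MinMaxScaler expects 2-D input: reshape a single feature to (n, 1).
import numpy as np
from sklearn.preprocessing import MinMaxScaler

income = np.array([15000, 28000, 90000, 120000])      # 1-D, shape (4,)

scaler = MinMaxScaler()
scaled = scaler.fit_transform(income.reshape(-1, 1))  # 2-D, shape (4, 1)
print(scaled.ravel())                                 # values in [0, 1]
```

Selecting the column as `df[['Income']]` (a DataFrame, not a Series) avoids the reshape entirely.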

13. SARATH S July 22, 2021 at 3:14 pm - Reply

Sir, is there a way I can use more than one independent variable for clustering? For example, can I use both petal length and width to group the different species of irises?

14. damodharratnam thappeta July 22, 2021 at 3:14 pm - Reply

Why do we calculate only Euclidean distance and not Manhattan?

15. Marc Hansel Thomas July 22, 2021 at 3:14 pm - Reply

lookin handsome my man!

16. Andrew Lim July 22, 2021 at 3:14 pm - Reply

Excellent quick and short explanation of K-means. Appreciate it

17. Gwap Da Math Tutor July 22, 2021 at 3:14 pm - Reply

How do we make predictions using the scaled model???
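On the question above: when the model was trained on scaled features, a new point must be transformed with the same fitted scaler before calling `predict`. A minimal sketch with hypothetical (age, income) data:

```python
# Predicting a new point's cluster when KMeans was fit on scaled data.
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans

X = np.array([[25, 30000], [27, 32000],
              [45, 90000], [47, 95000]], dtype=float)

scaler = MinMaxScaler().fit(X)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scaler.transform(X))

new_point = np.array([[26, 31000]], dtype=float)
cluster = km.predict(scaler.transform(new_point))[0]  # reuse the SAME scaler
print(cluster)
```

Calling `fit_transform` again on the new point (or a fresh scaler) would map it to the wrong coordinates, which is a common source of nonsense predictions.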

18. Abhinav Sharma July 22, 2021 at 3:14 pm - Reply

Btw, the value of K by elbow technique you taught is 3.

19. VINEETH GOGU July 22, 2021 at 3:14 pm - Reply

Can you share the link for text clustering ???

20. Milan Tomin July 22, 2021 at 3:14 pm - Reply

This is the best-explained K-means on the internet – period. Thank you!

21. Doing the elbow method for the iris data set, I can't decide if the optimal value for k is 2 or 3.

22. Raghu Ram July 22, 2021 at 3:14 pm - Reply

23. Shubhankit Chowdhury July 22, 2021 at 3:14 pm - Reply

I am facing a problem importing the csv file. Can anyone please help?

24. Amanda Low July 22, 2021 at 3:14 pm - Reply

Wow, great intro to cluster analysis in Python. Thank you so much, awesome teaching as always!

25. Nash Gaming July 22, 2021 at 3:14 pm - Reply

Hi Brother, Very good tutorial. Can we make a confusion matrix on it and calculate accuracy, f1score etc.???

26. GCET-20-18 Ram Baldotra July 22, 2021 at 3:14 pm - Reply

Iris dataset: got optimal n_clusters=3 by the elbow technique ✨ Thanks a lot, sir.
Sir, in the iris data set the cluster centres have only one coordinate. Can you tell us how we can plot them on a scatter plot?
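Regarding the question above: each row of `cluster_centers_` has one coordinate per feature the model was fit on, so a single coordinate means KMeans was fit on just one column. Fit on two features and the centers can be drawn directly on the scatter plot; a sketch (headless backend, so it runs without a display):

```python
# Plot iris clusters and their centroids on one scatter plot.
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans

X = load_iris().data[:, 2:4]   # two features -> two coordinates per center
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

plt.scatter(X[:, 0], X[:, 1], c=km.labels_)
plt.scatter(km.cluster_centers_[:, 0], km.cluster_centers_[:, 1],
            marker="*", s=200, color="purple", label="centroid")
plt.xlabel("petal length (cm)")
plt.ylabel("petal width (cm)")
plt.legend()
plt.savefig("iris_clusters.png")
```

If the model was fit on scaled data, apply `scaler.inverse_transform(km.cluster_centers_)` first to plot the centers in the original units.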

27. LoneTree262 July 22, 2021 at 3:14 pm - Reply

Great video. I am still learning and found this very helpful.

28. Shifra Isaacs July 22, 2021 at 3:14 pm - Reply

Thanks so much!

29. Pınar Doğan July 22, 2021 at 3:14 pm - Reply

Very well explained. Thank you for this great tutorial!

30. namrata kelkar July 22, 2021 at 3:14 pm - Reply

Thank you so much for this tutorial 🙂

Excellent tutorial, thank you very much.

32. Mr. Banzer July 22, 2021 at 3:14 pm - Reply

I have a question: why do we use an unsupervised machine learning algorithm on this data? This data could easily be dealt with by a supervised machine learning algorithm, like linear regression or another model!

33. Geethanjali Ravichandhran July 22, 2021 at 3:14 pm - Reply

Hi sir, using the iris data set I performed your exercise. I dropped the target feature first, predicted it using KMeans, and later compared the predicted target against the real target; the score is -2.0978758743804002. I really don't know how to improve my score. Could you please help me regarding this? These results are only for sepal length and sepal width, along with the real and predicted targets.

34. Subham Saha July 22, 2021 at 3:14 pm - Reply

Maybe using a log scale on the Y axis would be better than using MinMaxScaler; it would save us time.

35. Amr Elkholy July 22, 2021 at 3:14 pm - Reply

you are amazing, I like your simplicity in delivering the information, thank you very much

36. Mangesh Chitale July 22, 2021 at 3:14 pm - Reply

The best

Very clear… thanks so much.

38. Mirian Carrillo July 22, 2021 at 3:14 pm - Reply

The number of clusters I choose was 3, and I confirmed with the elbow method. I also scaled the features 🙂

39. Shankar Gymnastics and Karate July 22, 2021 at 3:14 pm - Reply

Great job

40. Przemysław Pałczyński July 22, 2021 at 3:14 pm - Reply

41. Prasanna kumar July 22, 2021 at 3:14 pm - Reply

Sir, why do we need to find the number of clusters in our datasets?

How is an error even possible in the above case? A centroid is an imagined point, right? How can there be an error if you are calculating the Euclidean distance between the centroid and each point? Remember there is no target variable, unlike supervised machine learning. Please clear up that ambiguity.

43. lucelli July 22, 2021 at 3:14 pm - Reply

thank you

44. Jakub Ircow July 22, 2021 at 3:14 pm - Reply

The way you're scaling it is horrible.

45. Ruthvik Raja M.V July 22, 2021 at 3:14 pm - Reply

osm

46. You are a genius.