Machine Learning Tutorial Python – 13: K Means Clustering Algorithm




K-means clustering is an unsupervised machine learning technique used to group data points into clusters. In this tutorial we go over some of the theory behind how k-means works and then solve an income group clustering problem using sklearn, KMeans, and Python. The elbow method is a technique used to determine the optimal value of k; we review that method as well.
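As a rough illustration of the workflow described above, here is a minimal sketch. It assumes a small made-up age/income table (the values are hypothetical, not the tutorial's CSV), scales both features to the 0 to 1 range with MinMaxScaler so that income does not dominate the distance calculation, and then fits sklearn's KMeans with k=3.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: (age, income) pairs forming three loose groups.
X = np.array([
    [27, 70000], [29, 90000], [28, 60000],
    [38, 150000], [40, 155000], [42, 160000],
    [55, 45000], [57, 48000], [60, 50000],
], dtype=float)

# Scale each column to [0, 1] so age and income contribute comparably.
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# Fit k-means with three clusters; random_state makes the run repeatable.
km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(X_scaled)

print(labels)               # cluster index assigned to each row
print(km.cluster_centers_)  # centroids, in the scaled feature space
```

Note that the cluster numbering itself is arbitrary; only the grouping of rows is meaningful. To assign a new person to a cluster, the same fitted scaler would be applied first, for example `km.predict(scaler.transform([[30, 80000]]))`.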


Code:
data link:

Exercise solution:

Topics covered in this video:
0:00 Introduction
0:08 Theory – Supervised vs unsupervised learning and how k-means clustering works (k-means is an unsupervised learning algorithm)
5:00 Elbow method
7:33 Coding (start): cluster people's incomes based on age
9:38 sklearn.cluster KMeans model creation and training
14:56 Use MinMaxScaler from sklearn
24:07 Exercise (Cluster iris flowers using their petal width and length)
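The elbow method and the iris exercise from the topic list can be sketched together as follows. This is one possible approach under stated assumptions, not the author's exercise solution: it loads the iris data from `sklearn.datasets.load_iris` rather than a CSV, keeps only the two petal columns, scales them, and records the sum of squared errors (exposed by sklearn as `inertia_`) for k from 1 to 9.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

# Columns 2 and 3 of the iris data are petal length and petal width (cm).
iris = load_iris()
X = iris.data[:, 2:4]
X_scaled = MinMaxScaler().fit_transform(X)

# Fit k-means for each candidate k and record the sum of squared errors.
k_values = list(range(1, 10))
sse = []
for k in k_values:
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    km.fit(X_scaled)
    sse.append(km.inertia_)

for k, err in zip(k_values, sse):
    print(f"k={k}: SSE={err:.3f}")
```

Plotting `k_values` against `sse` (for example with matplotlib) and looking for the point where the curve bends gives the elbow; the SSE keeps falling as k grows, so the aim is the k after which the improvement becomes small.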

Next Video:
Machine Learning Tutorial Python – 14: Naive Bayes Part 1:

Popular Playlists:
Data Science Full Course:

Data Science Project:

Machine learning tutorials:

Pandas:

matplotlib:

Python:

Jupyter Notebook:

To download csv and code for all tutorials: go to click on a green button to clone or download the entire repository and then go to relevant folder to get access to that specific file.



July 22nd, 2021 | Python Video Tutorials | 46 Comments

46 Comments

  1. Neeraj kumar July 22, 2021 at 3:14 pm - Reply

    Very good content for beginners.

  2. Manoj kumar July 22, 2021 at 3:14 pm - Reply

    Excellent tutorial. I have a query: in your notebook, after Shift+Enter it prints all the arguments of that class, but in my notebook it prints only the class name. How can I change this?

  3. Alon Avramson July 22, 2021 at 3:14 pm - Reply

    Thank you! really enjoyed this session. I tried both Petal and Sepal and it went very well.

  4. Ajay Kumar July 22, 2021 at 3:14 pm - Reply

    My search technique has changed from "hierarchical clustering" to "hierarchical clustering codebasics".

  5. mandeep kharb July 22, 2021 at 3:14 pm - Reply

    Hi,

    With this K-means clustering algorithm, prediction accuracy is very low. I tried the sepal length and sepal width features, but the accuracy came out as 0.1. How can I improve on that?

  6. Tejobhiru July 22, 2021 at 3:14 pm - Reply

    Hi, in the exercise I got the K value as 3, but even when using K=3 in KMeans I am getting only 2 clusters in the scatter plot!
    Any idea why?

  7. mandeep kharb July 22, 2021 at 3:14 pm - Reply

    How do we get the intuition that we need to scale the data?
    Can we do scaling in supervised learning algorithms such as linear regression, logistic regression, SVM, etc.?

  8. Binod Pratap Singh July 22, 2021 at 3:14 pm - Reply

    The elbow method fails when we have more than 6 clusters.

  9. francis kiragu July 22, 2021 at 3:14 pm - Reply

    Wow. You are a great man. You've made it soo simple to understand. Thank you Sir. 🔥🔥🔥

  10. Deepak Bhandari July 22, 2021 at 3:14 pm - Reply

    Excellent.

  11. Rakesh Murali July 22, 2021 at 3:14 pm - Reply

    Beautiful! Superbly explained!

  12. Sunil Anand July 22, 2021 at 3:14 pm - Reply

    How can I troubleshoot this error while scaling?
    " Expected 2D array, got 1D array instead:

    Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample. "

  13. SARATH S July 22, 2021 at 3:14 pm - Reply

    Sir, is there a way that I can use more than one independent variable for clustering? For example, can I use both petal length and breadth to group the different species of irises?

  14. damodharratnam thappeta July 22, 2021 at 3:14 pm - Reply

    Why do we calculate only Euclidean distance? Why not Manhattan?

  15. Marc Hansel Thomas July 22, 2021 at 3:14 pm - Reply

    lookin handsome my man!

  16. Andrew Lim July 22, 2021 at 3:14 pm - Reply

    Excellent quick and short explanation of K-means. Appreciate it

  17. Gwap Da Math Tutor July 22, 2021 at 3:14 pm - Reply

    How do we make predictions using the scaled model???

  18. Abhinav Sharma July 22, 2021 at 3:14 pm - Reply

    Really Appreciate your lectures 🙏🙏
    Btw, the value of K by elbow technique you taught is 3.

  19. VINEETH GOGU July 22, 2021 at 3:14 pm - Reply

    Can you share the link for text clustering ???

  20. Milan Tomin July 22, 2021 at 3:14 pm - Reply

    This is the best-explained K-means on the internet – period. Thank you!

  21. doing the elbow method for the iris data set i can't decide if the optimal value for k is 2 or 3.

  22. Raghu Ram July 22, 2021 at 3:14 pm - Reply

    Hello sir, I am a big fan of your coaching. Requesting you to upload an explanation of the KNN algorithm too, please. Thanks in advance.

  23. Shubhankit Chowdhury July 22, 2021 at 3:14 pm - Reply

    I am facing a problem importing the CSV file. Can anyone please help?

  24. Amanda Low July 22, 2021 at 3:14 pm - Reply

    Wow, great intro to cluster analysis in Python. Thank you so much, awesome teaching as always!

  25. Nash Gaming July 22, 2021 at 3:14 pm - Reply

    Hi Brother, very good tutorial. Can we make a confusion matrix for it and calculate accuracy, F1 score, etc.?

  26. GCET-20-18 Ram Baldotra July 22, 2021 at 3:14 pm - Reply

    Iris dataset: got optimal n_clusters=3 by the elbow technique ✨ Thanks a lot, sir.
    Sir, in the Iris data set the cluster_centers_ have only one coordinate. Can you tell us how we can plot them on a scatter plot?

  27. LoneTree262 July 22, 2021 at 3:14 pm - Reply

    Great video. I am still learning and found this very helpful.

  28. Shifra Isaacs July 22, 2021 at 3:14 pm - Reply

    Thanks so much!

  29. Pınar Doğan July 22, 2021 at 3:14 pm - Reply

    Very well explained. Thank you for this great tutorial!

  30. namrata kelkar July 22, 2021 at 3:14 pm - Reply

    Thank you so much for this tutorial 🙂

  31. Durga Prasad Vadlamoodi July 22, 2021 at 3:14 pm - Reply

    Excellent tutorial, thank you very much.

  32. Mr. Banzer July 22, 2021 at 3:14 pm - Reply

    I have a question: why do we use an unsupervised machine learning algorithm on this data? This data can easily be dealt with using a supervised machine learning algorithm, like linear regression or any other model!

  33. Geethanjali Ravichandhran July 22, 2021 at 3:14 pm - Reply

    Hi sir, I did your exercise using the Iris data set. I dropped the target feature first and predicted it using KMeans. When I later compared the predicted target with the real target, the score was -2.0978758743804002. I really don't know how to improve my score; could you please help me regarding this? These results are only for sepal length and sepal width, along with the real target and predicted target.

  34. Subham Saha July 22, 2021 at 3:14 pm - Reply

    Maybe using a log scale on the Y axis would be better than using the MinMax scaler; it would save time.

  35. Amr Elkholy July 22, 2021 at 3:14 pm - Reply

    You are amazing. I like your simplicity in delivering the information. Thank you very much.

  36. Mangesh Chitale July 22, 2021 at 3:14 pm - Reply

    The best

  37. Akhmad Syakhlani July 22, 2021 at 3:14 pm - Reply

    very clear…thanks so much

  38. Mirian Carrillo July 22, 2021 at 3:14 pm - Reply

    The number of clusters I chose was 3, and I confirmed it with the elbow method. I also scaled the features 🙂

  39. Shankar Gymnastics and Karate July 22, 2021 at 3:14 pm - Reply

    Great job

  40. Przemysław Pałczyński July 22, 2021 at 3:14 pm - Reply

    Thank you for your courses

  41. Prasanna kumar July 22, 2021 at 3:14 pm - Reply

    Sir, why do we need to find the number of clusters in our datasets?

  42. Nikhil Ramabadran July 22, 2021 at 3:14 pm - Reply

    How is an error even possible in the above case? A centroid is an imagined point, right? How can there be an error if you are calculating the Euclidean distance between the centroid and each point? Remember, there is no target variable, unlike supervised machine learning. Please clear up that ambiguity.

  43. lucelli July 22, 2021 at 3:14 pm - Reply

    thank you

  44. Jakub Ircow July 22, 2021 at 3:14 pm - Reply

    The way you're scaling it is horrible.

  45. Ruthvik Raja M.V July 22, 2021 at 3:14 pm - Reply

    osm

  46. You are a genius.
