# Machine Learning Tutorial Python – 9 Decision Tree

///Machine Learning Tutorial Python – 9 Decision Tree

## Machine Learning Tutorial Python – 9 Decision Tree

Decision tree algorithm is used to solve classification problem in machine learning domain. In this tutorial we will solve employee salary prediction problem using decision tree. First we will go over some theory and then do coding practice. In the end I’ve a very interesting exercise for you to solve.

#MachineLearning #PythonMachineLearning #MachineLearningTutorial #Python #PythonTutorial #PythonTraining #MachineLearningCource #DecisionTree

Code:
csv file for exercise:

Topics that are covered in this Video:
0:02 – How to solve classification problem using decision tree algorithm?
0:26 – Theory (Explain rationale behind decision tree using a use case of predicting salary based on department, degree and company that a person is working for)
2:10 – How do you select ordering of features? High vs low information gain and entropy
3:52 – Gini impurity
4:28 – Coding (start)
9:11 – Create sklearn model using DecisionTreeClassifier
13:32 – Exercise (Find out survival rate of titanic ship passengers using decision tree)

Next Video:
Machine Learning Tutorial Python – 10 Support Vector Machine (SVM):

Populor Playlist:
Data Science Full Course:

Data Science Project:

Machine learning tutorials:

Pandas:

matplotlib:

Python:

Jupyter Notebook:

To download csv and code for all tutorials: go to click on a green button to clone or download the entire repository and then go to relevant folder to get access to that specific file.

Website:

source

By |2020-05-24T03:45:44+00:00May 24th, 2020|Python Video Tutorials|22 Comments

1. codebasics May 24, 2020 at 3:45 am - Reply
2. Kush Varma May 24, 2020 at 3:45 am - Reply

Got a score of 97% in titanic dataset, I used LabelEncoder for Sex column, when I tried to fill empty age with the mean value, the column converted to float64, then by using df_new_data.Age = df_new_data.Age.astype(np.int64), converted to integer column of age. Which all together resulted in a score of 97.41% model.

3. Nuaman Afzal May 24, 2020 at 3:45 am - Reply

sir i got 0.987 score

4. Naveen Kalhan May 24, 2020 at 3:45 am - Reply

really appreciate your work. learning a lot… just want to confirm something from the tutorial @7:40 you are using fit_transform with le_company object for all the other columns and did not use le_job object and le_degree object. is it ok? or should we do it? Thank you very much again.

Hi all,
Suppose we have around 80000 training examples with four features and all the features have around 1000 unique values now if I use one-hot encoding then the data set will explode, we are going to have a very large data set in terms of the column. Is it good to use one-hot encoding or we have a better option to deal with such types of situation?
Thanks

6. Naveen Kumar S May 24, 2020 at 3:45 am - Reply

Sir how to predict two target columns as output

7. Naveen Kumar S May 24, 2020 at 3:45 am - Reply

the score is 0.9876543209876543

8. Elias hossain May 24, 2020 at 3:45 am - Reply

Exercise result for the titanic dataset: Score: 0.77 (using Decision Tree Classifier)

9. yash khare May 24, 2020 at 3:45 am - Reply

Sir,what if we directly apply decision tree without using train test split

10. reagan asubonteng May 24, 2020 at 3:45 am - Reply

can you tell where you got that datase?

11. Ishant Khurana May 24, 2020 at 3:45 am - Reply

Titanic exercise score ="0.9876543209876543"

12. Devendra Gohare May 24, 2020 at 3:45 am - Reply

accuracy Achieved : 85.5%

13. Vaibhav Gehani May 24, 2020 at 3:45 am - Reply

Computed accuracy – 0.8026905829596412 (Right or Wrong)
Replacing age NaN with mean value

14. Sai Vikas May 24, 2020 at 3:45 am - Reply

9:13
Here you didn't used any one hot encoder you just passed the interger values of the variable those got from label encoder may I know why?

15. Parikshit Rajpara May 24, 2020 at 3:45 am - Reply

Can you make a video on when should we use which algorithm? Please. Will help everyone sir.

16. sujith ramanathan May 24, 2020 at 3:45 am - Reply

I have one doubt. This decision tree and Binary Logistic Regression are looks similar both outputs are in boolean. Can you please explain a bit more detail here.

Can we do an additional exercise with the same Titanic datasheet to predict the Age where the Age is NaN. ?

17. sahithya m May 24, 2020 at 3:45 am - Reply

sir in the PClass columns the values are present like this 1st,2nd,3rd…… so how to change these values into an integer when I am using labelEncoder() I got an error

18. Swapn Shah May 24, 2020 at 3:45 am - Reply

Hello sir, Why didn't we use OneHotEncoder here ?

19. TheHelghastkilla May 24, 2020 at 3:45 am - Reply

Why did you use regular encoding instead of One Hot Encoding? When do you know to use which?

20. Vaibhav Dhand May 24, 2020 at 3:45 am - Reply

Thank you, sir, the exercise that you gave at the end of your lectures help us to experiment and get an in-depth knowledge of the algorithm. accuracy achieved =0.87

21. Videos4U May 24, 2020 at 3:45 am - Reply

Thank you.
I do not understand that we you did not use Dummy Variables? Because as I understood, it should be binary based for example Google:001 FaceBook 010 ABC:011 , I see "2" !

22. Prashant Kalambe May 24, 2020 at 3:45 am - Reply

I got 98.3% accuracy