# Machine Learning Tutorial Python – 8: Logistic Regression (Binary Classification)



Logistic regression is used for classification problems in machine learning. This tutorial shows how to use sklearn's LogisticRegression class to solve a binary classification problem: predicting whether a customer will buy life insurance. At the end there is an interesting exercise for you to solve.
Machine learning problems usually fall into two types: (1) regression, where the predicted value is continuous, and (2) classification, where the predicted value is categorical. Logistic regression is used mainly for classification problems.
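As a rough sketch of what the tutorial builds (the ages and labels below are made up for illustration; the video uses its own insurance CSV):

```python
# Binary classification with sklearn's LogisticRegression:
# predict bought_insurance (0 or 1) from age alone.
from sklearn.linear_model import LogisticRegression

ages = [[18], [22], [25], [28], [30], [52], [55], [58], [60], [65]]
bought = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]   # 1 = bought insurance

model = LogisticRegression()
model.fit(ages, bought)                    # learns a sigmoid decision boundary

print(model.predict([[21], [62]]))         # predicted class labels
print(model.predict_proba([[40]]))         # [P(class 0), P(class 1)]
```

The 2D shape of `ages` (one row per sample, one column per feature) is required by sklearn, which matters for a common error discussed in the comments below.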
Code:
Exercise: Open the above notebook from GitHub and go to the end.

Topics covered in this video:
0:01 – Theory (difference between regression and classification)
1:18 – What is logistic regression?
1:26 – Classification types (Binary vs multiclass classification)
1:53 – Explanation of logistic regression using the example of whether a person will buy insurance based on their age
5:38 – Sigmoid or Logit function
8:18 – Coding (using the example of whether a person will buy insurance based on their age)
14:36 – sklearn predict_proba() function
15:49 – Exercise (Solve a problem of predicting employee retention based on salary, distance to work, promotion, department etc)
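The sigmoid (logit) function covered at 5:38 can be sketched in a few lines; this is the standard definition, not code taken from the video:

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into (0, 1)."""
    return 1 / (1 + math.exp(-z))

print(sigmoid(0))    # 0.5, the decision boundary
print(sigmoid(5))    # close to 1
print(sigmoid(-5))   # close to 0
```

Logistic regression feeds a linear combination of the features through this function, which is why its output can be read as a probability.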

Next Video:
Machine Learning Tutorial Python – 8 Logistic Regression (Multiclass Classification):

Popular Playlists:
Data Science Full Course:

Data Science Project:

Machine learning tutorials:

Pandas:

matplotlib:

Python:

Jupyter Notebook:

To download the CSV files and code for all tutorials: go to the GitHub repository, click the green button to clone or download the entire repository, and then go to the relevant folder to access that specific file.

Website:


By |2020-01-10T01:08:44+00:00January 10th, 2020|Python Video Tutorials|42 Comments

Finally got the Python version of Andrew Ng's machine learning course. With a better explanation.
thanks.

2. Zaid Zeee January 10, 2020 at 1:08 am - Reply

This video is really very helpful.
Thank you so much for this amazing knowledge.

3. aleisley January 10, 2020 at 1:08 am - Reply

Can only go up to 78% and that's with tuning the hyperparameters. Thanks btw codebasics!

Why do we always fit the model on the train data set and not on the test set, but use transform on both… what is the difference between fit and transform?
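For reference, the fit/transform split the commenter asks about can be illustrated with a scaler (StandardScaler is used here purely as an example; the point applies to any sklearn transformer):

```python
# fit() learns parameters (here: mean and std) from the TRAINING data only;
# transform() applies those learned parameters to any data. Fitting on the
# test set would leak information about it into the pipeline.
from sklearn.preprocessing import StandardScaler

X_train = [[1.0], [2.0], [3.0], [4.0]]
X_test = [[10.0]]

scaler = StandardScaler()
scaler.fit(X_train)                   # learns mean and std from train only
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)   # same mean/std reused, no refitting

print(scaler.mean_)                   # mean learned from the train set
```

Test data is scaled with the training statistics, so the model never sees anything derived from the test set before evaluation.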

5. Arun Sharma January 10, 2020 at 1:08 am - Reply

Is it possible to have both 0 and 1 for the same age, and can we compute them further based on probability?

6. Bandham Manikanta January 10, 2020 at 1:08 am - Reply

Perfect explanation on logistic regression.

Loved it. Thanks a lot.

7. raja ram January 10, 2020 at 1:08 am - Reply

Design and build a binary classifier over the dataset. Explain your algorithm and its configuration. Explain your findings in both numerical and graphical representations. Evaluate the performance of the model and verify the accuracy and the effectiveness of your model. Can you explain?

8. mridul ahmed January 10, 2020 at 1:08 am - Reply

Wow, so nice. Thanks for explaining in a very nice way.

9. Amanullah Mahabub January 10, 2020 at 1:08 am - Reply

You guys are life savers. Man, I love your videos.

10. Aarushi Gupta January 10, 2020 at 1:08 am - Reply

It's good, but too slow to listen to.

11. Rida Mehdawe January 10, 2020 at 1:08 am - Reply

among several videos, this one is the best. appreciated

12. salvin dsouza January 10, 2020 at 1:08 am - Reply

14:40 blooper !!

13. Praful Maka January 10, 2020 at 1:08 am - Reply

Nice explanation!

14. George Trialonis January 10, 2020 at 1:08 am - Reply

Thank you very much for the videos on ML, AI, Python, etc. They help me learn a lot. Your explanations are clear and well understood. Thanks.

15. Sunny veer Pratap Singh January 10, 2020 at 1:08 am - Reply

Bro, you are the best. I tried to wade through other online videos, then ended up watching your videos, and I understand better.

Sir, why didn't you drop one dummy column from salary_high, salary_medium, salary_low in the exercise question? Will it not create a dummy variable trap?
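For reference, dropping one dummy column is what the commenter is describing; a minimal pandas sketch (the salary values here are made up to match the exercise's categories):

```python
# With all three salary dummies present, any one column is a perfect linear
# combination of the other two plus the intercept (the dummy variable trap).
# drop_first=True keeps only two columns; the dropped category is the baseline.
import pandas as pd

df = pd.DataFrame({'salary': ['low', 'medium', 'high', 'low']})
dummies = pd.get_dummies(df['salary'], drop_first=True)
print(dummies.columns.tolist())   # two of the three categories remain
```

Whether the trap actually harms a regularized sklearn model is a separate question, but dropping one column is the conventional safeguard.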

17. umbul banin January 10, 2020 at 1:08 am - Reply

great job sir

18. Raju Prudhvi January 10, 2020 at 1:08 am - Reply

Thanks sir, can you give me a prediction plot visualization?

19. Islamic Way January 10, 2020 at 1:08 am - Reply

What is y_train?

20. fahim shahriar January 10, 2020 at 1:08 am - Reply

Sir, in your given exercise, can we drop independent variables by the backward elimination process?

21. weerapast ruenrurngdee January 10, 2020 at 1:08 am - Reply

Thank you so much 🙂

22. Matt Chase January 10, 2020 at 1:08 am - Reply

23. Maxim Kuznetsov January 10, 2020 at 1:08 am - Reply

Cool

24. rudr'a rajput January 10, 2020 at 1:08 am - Reply

When I predict the following:
In[22]: model.predict(56)
it shows me the following error; please give me a solution.
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-18-f6c77a36af5e> in <module>
    ----> 1 model.predict(56)

    c:\users\user\appdata\local\programs\python\python37-32\lib\site-packages\sklearn\linear_model\base.py in predict(self, X)
        287             Predicted class label per sample.
        288         """
    --> 289         scores = self.decision_function(X)
        290         if len(scores.shape) == 1:
        291             indices = (scores > 0).astype(np.int)

    c:\users\user\appdata\local\programs\python\python37-32\lib\site-packages\sklearn\linear_model\base.py in decision_function(self, X)
        263                             "yet" % {'name': type(self).__name__})
        264
    --> 265         X = check_array(X, accept_sparse='csr')
        266
        267         n_features = self.coef_.shape[1]

    c:\users\user\appdata\local\programs\python\python37-32\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
        512                 "Reshape your data either using array.reshape(-1, 1) if "
        513                 "your data has a single feature or array.reshape(1, -1) "
    --> 514                 "if it contains a single sample.".format(array))
        515     # If input is 1D raise error
        516     if array.ndim == 1:

    ValueError: Expected 2D array, got scalar array instead:
    array=56.
    Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
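For reference, the error above is exactly what sklearn says it is: `predict()` expects a 2D array (samples × features), not a bare scalar. A minimal fix:

```python
# Wrap the single value so it has shape (1, 1): one sample, one feature.
import numpy as np

x = np.array(56).reshape(1, -1)
print(x.shape)   # (1, 1)
print(x)         # [[56]]

# Equivalently, call model.predict([[56]]) instead of model.predict(56).
```

The nested-list form `[[56]]` is the idiomatic shorthand for a single sample with a single feature.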

25. Chris LAM January 10, 2020 at 1:08 am - Reply

shift-tab.

26. Ankit DS January 10, 2020 at 1:08 am - Reply

Bro, this command is not working – it gives the below error. I did a dir() on LinearRegression and can see 'predict_proba' in it, but I am still getting the below error:

'LinearRegression' object has no attribute 'predict_proba'
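For reference, that error message is accurate: LinearRegression is a regressor and has no `predict_proba`; the classifier LogisticRegression does. The usual cause is importing the wrong class, which a quick check confirms:

```python
# LinearRegression predicts continuous values and exposes no probabilities;
# LogisticRegression is the classifier with predict_proba.
from sklearn.linear_model import LinearRegression, LogisticRegression

print(hasattr(LinearRegression(), 'predict_proba'))    # False
print(hasattr(LogisticRegression(), 'predict_proba'))  # True
```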

27. Vishnu Dutt January 10, 2020 at 1:08 am - Reply

awesome explanation

28. hardik darji January 10, 2020 at 1:08 am - Reply

Sir, I tried the exercise 'HR_comma_sep.csv' —
I cannot get accuracy > 0.78… how can I improve the model?
Thanks a lot…

29. Prajual Pillai January 10, 2020 at 1:08 am - Reply

you should not force an accent.

30. Patrik Buess January 10, 2020 at 1:08 am - Reply

thx Sir!

31. Hemanth Peddi January 10, 2020 at 1:08 am - Reply

What is penalty=l2 in output 5 at 13:56? Can you please explain the parameters of the function?
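For reference, `penalty='l2'` in that repr is sklearn's default L2 regularization for LogisticRegression; the related `C` parameter is the inverse regularization strength. A small sketch of the effect (toy data, made up for illustration):

```python
# penalty='l2' shrinks the coefficients toward zero to reduce overfitting.
# C is the INVERSE regularization strength: smaller C = stronger shrinkage.
from sklearn.linear_model import LogisticRegression

X = [[18], [25], [30], [55], [60], [65]]
y = [0, 0, 0, 1, 1, 1]

strong = LogisticRegression(penalty='l2', C=0.01)   # heavy regularization
weak = LogisticRegression(penalty='l2', C=100.0)    # light regularization
strong.fit(X, y)
weak.fit(X, y)

# The heavily regularized model ends up with a smaller coefficient.
print(abs(strong.coef_[0][0]), abs(weak.coef_[0][0]))
```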

32. shubham jain January 10, 2020 at 1:08 am - Reply

Sir, I have got an error "too many values to unpack" at 11:24. Please help me resolve this issue.

33. Arunav Rath January 10, 2020 at 1:08 am - Reply

Superb.. I have a question though: after building the model in the exercise, how do we apply it to new employees? I mean, I want to check the probability of retaining a new set of employees. How to do that?
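For reference, scoring new employees just means building rows with the same columns, in the same order, that the model was trained on, then calling `predict` / `predict_proba`. The two feature columns below are hypothetical stand-ins for the exercise's columns:

```python
# columns: [satisfaction_level, monthly_hours] (illustrative only)
from sklearn.linear_model import LogisticRegression

X_train = [[0.9, 150], [0.8, 160], [0.2, 280], [0.1, 300]]
y_train = [0, 0, 1, 1]                      # 1 = left the company

model = LogisticRegression()
model.fit(X_train, y_train)

# New employees must use the SAME feature layout as the training data.
new_employees = [[0.85, 155], [0.15, 290]]
print(model.predict(new_employees))         # predicted stayed/left labels
print(model.predict_proba(new_employees))   # per-class probabilities
```

The second column of `predict_proba` is the model's estimated probability of the positive class, i.e. of the employee leaving in this setup.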

34. Ankit Parashar January 10, 2020 at 1:08 am - Reply

76% with salary

35. Umashankar verma January 10, 2020 at 1:08 am - Reply

36. Justin Dates January 10, 2020 at 1:08 am - Reply

how do you import the csv file?

37. siddhant ranjan January 10, 2020 at 1:08 am - Reply

accuracy-100%

Hi sir, can you make a video for the given exercise… so that we can understand how to analyse it as well, please? Your videos are awesome.

39. Sunday Honesty January 10, 2020 at 1:08 am - Reply

Thank you Codebasics for helping me understand linear regression. I have a question for Codebasics and everyone; please do well to answer me. Thanks in advance.
I want to perform a logistic regression. I was asked to use state, political party, and votes gotten as my independent variables and predict whether a political party wins or loses. There are 36 states in my country, and I want to use the 3 dominant parties as a case study. My problem is what the layout of these data should be; I am unable to represent party in a separate column unless I take one political party and one state and do the prediction explicitly, then move on to another.
Please, I really need your help to resolve this issue. Thanks in advance.

40. Uttam Dey January 10, 2020 at 1:08 am - Reply

Sir, in the exercise section, how did you decide that df['left']==1 means the employees who left and df['left']==0 the ones retained? I initially thought it was vice versa. Please respond to this query: how do we decide in such circumstances?

By the way the tutorials are really helpful and thank you very much for the helpful tutorials.

41. Riefvan Achmad Masrury January 10, 2020 at 1:08 am - Reply

Really nice and clear explanation, will be very useful for my students

42. Naveen kumar M January 10, 2020 at 1:08 am - Reply

Thanks codebasics for such a clear explanations with examples.

For the exercise problem, satisfaction level, average monthly hours, promotion in the last 5 years, and salary are independent variables, right?
If so, why aren't 'time_spend_company' and 'Work_accident' considered?
Can you please explain? Actually, I didn't get how to decide which variables are dependent and independent, specifically for this exercise problem.