Python Web Scraping Tutorial (Introductory)

At about 9:17, the URL should be ‘?p=’ instead of ‘?p+’, although it still works with the plus.
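
For reference, a minimal sketch of the corrected concatenation; the base URL and page value here are made-up placeholders, not the ones from the video.

# Hypothetical example of the query-string concatenation mentioned above:
# the value has to be joined with "?p=" rather than "?p+".
base_url = "https://example.com/page"   # placeholder site
page = 1
url = base_url + "?p=" + str(page)      # -> https://example.com/page?p=1
print(url)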

A basic introduction to BeautifulSoup and web scraping.
If you enjoyed, feel free to leave a like and subscribe for upcoming videos. If you have questions, just leave a comment and I’ll try and answer as soon as possible.

Header: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36
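
For anyone following along, a minimal sketch of how that header is typically attached to the request and the result parsed with BeautifulSoup, matching the urllib approach used in the video; the URL below is a placeholder, not the site scraped in the tutorial.

# Sketch: mask the request with the User-Agent header above, then parse
# the response with BeautifulSoup. The URL is a placeholder.
import urllib.request
from bs4 import BeautifulSoup

url = "https://example.com"   # placeholder, not the site from the video
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) "
                         "AppleWebKit/537.36 (KHTML, like Gecko) "
                         "Chrome/47.0.2526.80 Safari/537.36"}
req = urllib.request.Request(url, headers=headers)
resp = urllib.request.urlopen(req)
soup = BeautifulSoup(resp.read(), "html.parser")
print(soup.title)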

Contact me: arashkg.contact@gmail.com


March 21st, 2021 | Python Video Tutorials | 13 Comments

13 Comments

  1. Juan Dela Cruz March 21, 2021 at 1:02 pm

    How did you come up with your header? I think that's something we are missing.

  2. Sarwar Hayatt March 21, 2021 at 1:02 pm

    Very well done. I want to make a Python script that will load page content automatically, i.e. if there is an option like "Load more" or an arrow to load further content, the script does that automatically so the user can see the content just by scrolling down. Please help. (One approach is sketched after the comments.)

  3. Rodolfo Padilla March 21, 2021 at 1:02 pm

    How did that URL work when it was "p=" but you concatenated it with a "p+"?

  4. Jae Wavyy March 21, 2021 at 1:02 pm

    When adding the mask, how did you come up with Mozilla … etc.? I'm using Chrome.

  5. dontcallmebrobruh March 21, 2021 at 1:02 pm

    Please, how do you select a line and make it grey with #? When I do it that way, I just replace the entire line with #.

  6. Siddharth Mishra March 21, 2021 at 1:02 pm

    Traceback (most recent call last):
    File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 1318, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
    File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
    File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1285, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
    File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1234, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
    File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1026, in _send_output
    self.send(msg)
    File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 964, in send
    self.connect()
    File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1400, in connect
    server_hostname=server_hostname)
    File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 401, in wrap_socket
    _context=self, _session=session)
    File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 808, in _init_
    self.do_handshake()
    File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 1061, in do_handshake
    self._sslobj.do_handshake()
    File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 683, in do_handshake
    self._sslobj.do_handshake()
    ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:749)

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
    File "/Users/sid.mishra910/PycharmProjects/untitled/work.py", line 8, in <module>
    resp = urllib.request.urlopen(req)
    File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
    File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 526, in open
    response = self._open(req, data)
    File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 544, in _open
    '_open', req)
    File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
    File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 1361, in https_open
    context=self._context, check_hostname=self._check_hostname)
    File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 1320, in do_open
    raise URLError(err)
    urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:749)>

    This gives the error above; I need a fix. (A workaround is sketched after the comments.)

  7. Creampie March 21, 2021 at 1:02 pm

    If you can do all this, then could web scraping be useful for doxing?

  8. Josh March 21, 2021 at 1:02 pm

    Hey, how could I get the text before the data for the Facebook info? For example, each row has something like
    "Previous Close" 120.57
    "Open" 120.90
    "Bid" 119.85 x 400
    "Ask" 119.94 x 300
    So my question is: how can I ALSO get the text labels like "Previous Close", "Open", "Bid", "Ask", etc. AND the numbers for each, like you did in the video? (A row-by-row sketch appears after the comments.)

    Great Video btw!

  9. Calvin Wankhede March 21, 2021 at 1:02 pm

    Excellent video, thank you. I've seen the requests module recommended over urllib, though. Is there any particular reason for this? (A requests version of the same call is sketched after the comments.)

  10. glibsonoran March 21, 2021 at 1:02 pm

    To get your browser's user-agent: http://www.whatsmyua.com. This was a really helpful tutorial.

  11. Joe S March 21, 2021 at 1:02 pm

    Nice. What would you do to get the emails from a vbulletin forum?

  12. Manish Kumar Mishra March 21, 2021 at 1:02 pm

    Beautifully explained!

  13. Buhs March 21, 2021 at 1:02 pm

    I really liked this video, thank you!
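
On the "Load more" question in comment 2: urllib and BeautifulSoup only see the HTML the server returns, not content that JavaScript adds afterwards. One common approach, sketched below with a placeholder URL and an assumed link text, is to drive a real browser with Selenium and then hand the finished page to BeautifulSoup.

# Sketch of handling a "Load more" / infinite-scroll page with Selenium
# (pip install selenium). The URL and the link text are assumptions.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()              # assumes Chrome is installed
driver.get("https://example.com")        # placeholder URL
for _ in range(3):                       # click "Load more" a few times
    try:
        driver.find_element(By.LINK_TEXT, "Load more").click()
    except Exception:
        break                            # button no longer present
    time.sleep(2)                        # crude wait for new content
html = driver.page_source                # full page, ready for BeautifulSoup
driver.quit()
print(len(html))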
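
On the CERTIFICATE_VERIFY_FAILED traceback in comment 6: with the python.org installer on macOS this usually means Python cannot find a CA bundle; running the "Install Certificates.command" script that ships with that installer normally fixes it. An alternative workaround, sketched below with a placeholder URL, is to pass urlopen an SSL context built from the certifi package.

# Workaround sketch for the SSL verification error: supply a CA bundle from
# certifi (pip install certifi) instead of relying on the system one.
import ssl
import urllib.request
import certifi

url = "https://example.com"              # placeholder URL
req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
context = ssl.create_default_context(cafile=certifi.where())
resp = urllib.request.urlopen(req, context=context)
print(resp.status)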
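
On comment 8: in a quote-summary table each row usually holds the label in one cell and the figure in the next, so both can be read from the same row. Here is a sketch with made-up inline HTML standing in for the real page; the actual tag and class names on the site are not assumed.

# Sketch: read the label and the value from each table row. The inline HTML
# below is a stand-in for the real page.
from bs4 import BeautifulSoup

html = """
<table>
  <tr><td>Previous Close</td><td>120.57</td></tr>
  <tr><td>Open</td><td>120.90</td></tr>
  <tr><td>Bid</td><td>119.85 x 400</td></tr>
  <tr><td>Ask</td><td>119.94 x 300</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
for row in soup.find_all("tr"):
    cells = row.find_all("td")
    if len(cells) == 2:
        label = cells[0].get_text(strip=True)
        value = cells[1].get_text(strip=True)
        print(label, value)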
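
On comment 9: requests is a third-party library that wraps the same HTTP plumbing as urllib with less boilerplate (headers, redirects and encodings are handled more conveniently), but urllib from the standard library works fine for a tutorial like this. The equivalent masked request, with a placeholder URL:

# requests (pip install requests) version of the same masked GET request.
import requests

headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) "
                         "AppleWebKit/537.36 (KHTML, like Gecko) "
                         "Chrome/47.0.2526.80 Safari/537.36"}
resp = requests.get("https://example.com", headers=headers)   # placeholder URL
print(resp.status_code)
print(resp.text[:200])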
