

At about 9:17, the URL should be '?p=' instead of '?p+', although it still works with the plus.
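For reference, the concatenation at that point can be sketched like this (the Yahoo Finance URL pattern and the FB ticker are assumptions for illustration):

```python
# Building the quote URL; note "?p=" rather than "?p+".
base = "https://finance.yahoo.com/quote/"
ticker = "FB"  # hypothetical ticker
url = base + ticker + "?p=" + ticker
print(url)  # -> https://finance.yahoo.com/quote/FB?p=FB
```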
A basic introduction to BeautifulSoup and web scraping.
If you enjoyed it, feel free to leave a like and subscribe for upcoming videos. If you have questions, just leave a comment and I'll try to answer as soon as possible.
Header: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36
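For anyone asking how that header string gets used, a minimal sketch with urllib (the target URL is a placeholder; the user-agent string is the one quoted above):

```python
import urllib.request

# Placeholder URL; replace with the page you are scraping.
url = "https://example.com"
user_agent = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) "
              "AppleWebKit/537.36 (KHTML, like Gecko) "
              "Chrome/47.0.2526.80 Safari/537.36")
# Attach the User-Agent so the request looks like a normal browser.
req = urllib.request.Request(url, headers={"User-Agent": user_agent})
# resp = urllib.request.urlopen(req)  # actual network call left commented out
```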
Contact me: arashkg.contact@gmail.com
How did you come up with your header? I think that's something we're missing.
Very well. I want to make a Python script that will load the page content automatically, i.e. if there is a "Load more" option or the page loads more content as you scroll down, the script does that automatically, so the user can see the content just by scrolling. Please help.
How did that URL work when it was "p=" but you concatenated it with a "p+"?
Adding the mask: how did you come up with the "Mozilla …" etc. string? I'm using Chrome.
Please, how do you select a line and comment it out (grey) with #? When I do it that way, I just replace the entire line with #.
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 1318, in do_open
encode_chunked=req.has_header('Transfer-encoding'))
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1239, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1285, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1234, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1026, in _send_output
self.send(msg)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 964, in send
self.connect()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1400, in connect
server_hostname=server_hostname)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 401, in wrap_socket
_context=self, _session=session)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 808, in __init__
self.do_handshake()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 1061, in do_handshake
self._sslobj.do_handshake()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 683, in do_handshake
self._sslobj.do_handshake()
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:749)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/sid.mishra910/PycharmProjects/untitled/work.py", line 8, in <module>
resp = urllib.request.urlopen(req)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 526, in open
response = self._open(req, data)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 544, in _open
'_open', req)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 504, in _call_chain
result = func(*args)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 1361, in https_open
context=self._context, check_hostname=self._check_hostname)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 1320, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:749)>
It gives this error; I need a fix.
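On macOS, CERTIFICATE_VERIFY_FAILED with a python.org build usually means the root certificates were never installed; running the bundled "Install Certificates.command" (in the /Applications/Python 3.6/ folder) is the proper fix. As a quick but insecure workaround, you can pass an unverified SSL context to urlopen:

```python
import ssl
import urllib.request

# Insecure workaround: disables certificate verification entirely.
# Prefer running Install Certificates.command on macOS instead.
context = ssl.create_default_context()
context.check_hostname = False
context.verify_mode = ssl.CERT_NONE

# resp = urllib.request.urlopen(req, context=context)  # network call omitted
```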
If you can do all this, then couldn't web scraping be used for doxing?
Hey, how could I get the text before the data for the Facebook info? For example, each row has something like
"Previous Close" 120.57
"Open" 120.90
"Bid" 119.85 x 400
"Ask" 119.94 x 300
So my question would be: how can I ALSO get the text labels like "Previous Close", "Open", "Bid", "Ask", etc., AND the numbers for each, like you did in the video?
Great Video btw!
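One possible approach, sketched against a made-up table snippet (the real Yahoo Finance markup uses different tags and class names, so treat the HTML here as an assumption): each row holds the label in its first cell and the value in its second, so you can walk the rows and keep both.

```python
from bs4 import BeautifulSoup

# Hypothetical quote-summary markup; the live page's structure will differ.
html = """
<table>
  <tr><td>Previous Close</td><td>120.57</td></tr>
  <tr><td>Open</td><td>120.90</td></tr>
  <tr><td>Bid</td><td>119.85 x 400</td></tr>
  <tr><td>Ask</td><td>119.94 x 300</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = {}
for tr in soup.find_all("tr"):
    # First cell is the label, second is the value.
    label, value = [td.get_text(strip=True) for td in tr.find_all("td")]
    rows[label] = value

print(rows["Previous Close"])  # -> 120.57
```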
Excellent video, thank you. I've seen the requests module recommended over urllib, though; is there any particular reason for that?
To get your browser's user-agent: http://www.whatsmyua.com. This was a really helpful tutorial.
Nice. What would you do to get the emails from a vbulletin forum?
Beautifully explained!
I really liked this video, thank you!