Data Analysis with Python and Pandas Tutorial Introduction

Pandas is a module for Python, the programming language we're going to use. It is a high-performance, high-level data analysis library.

At its core, using Pandas is very much like operating a headless version of a spreadsheet such as Excel. Most of the datasets you work with will be what are called dataframes. You may already know the term, since it is used in other languages, but if not, a dataframe is most often just like a spreadsheet: columns and rows, that's all there is to it! From here, we can use Pandas to perform operations on our datasets very quickly.
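To make the spreadsheet comparison concrete, here is a minimal sketch of a dataframe. The column names and numbers are made up for illustration and are not taken from the video.

```python
import pandas as pd

# Made-up data: each dict key becomes a column, each list position a row.
sales = pd.DataFrame({
    "region": ["north", "south", "north"],
    "units": [10, 4, 6],
    "unit_price": [2.5, 3.0, 2.5],
})

# A computed column, much like a spreadsheet formula filled down a column.
sales["revenue"] = sales["units"] * sales["unit_price"]

# A grouped total, much like a pivot table.
totals = sales.groupby("region")["revenue"].sum()

print(sales)
print(totals)
```

The whole column is multiplied in one expression, with no explicit loop over rows, which is the usual way of working in Pandas.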

Posted January 22nd, 2020 | Python Video Tutorials | 41 Comments

41 Comments

  1. sentdex January 22, 2020 at 1:20 am - Reply

    The example shown in this video is now deprecated. I've updated the text-based tutorial to work as of March 31, 2018: https://pythonprogramming.net/data-analysis-python-pandas-tutorial-introduction/

  2. Nishant Parekh January 22, 2020 at 1:20 am - Reply

    Can someone please tell me how to make this kind of video? Like recording the screen with myself in a small box in a corner?

  3. Ksp January 22, 2020 at 1:20 am - Reply

    I just can't thank you enough man..

  4. Gaurav Sharma January 22, 2020 at 1:20 am - Reply

    I am trying to fetch data from a CSV file, but I am not able to handle string data types. The code below works perfectly for int data types.

    But when it fetches string data, the output shows the object details along with the index number. I don't want the object details or the index number in the output.

    Please let me know how I can do that.

    Here is my code:

    ———————————————————————–

    import pandas as pd

    df=pd.read_csv('C:ABCC.csv')

    k=df[df.GS==df['GS'].max()]

    print("NR PDSCH scheduled rank-SCG SCell :",int(k['NR PDSCH scheduled rank-100']))

    print("NR PDSCH modulation for CW0 :",k['NR PDSCH modulation for CW0-100'])

    ———————————————————————–

    The output of my Code:

    NR PDSCH scheduled rank-SCG SCell : 4

    NR PDSCH modulation for CW0 : 422 16-QAM

    Name: NR PDSCH modulation for CW0-100, dtype: object

    ———————————————————————–

    I don't want 422 (the index number) or "Name: NR PDSCH modulation for CW0-100, dtype: object" in my output.

    Below is the desired output format:

    ———————————————————————–

    NR PDSCH scheduled rank-SCG SCell : 4

    NR PDSCH modulation for CW0 : 16-QAM

    ———————————————————————–
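One possible way to get the output asked for above: selecting a column from a filtered dataframe returns a Series, and printing a Series includes its index and dtype. Taking the first element with `.iloc[0]` gives the bare value. This is a minimal sketch with a made-up frame and a shortened, hypothetical column name (`modulation`) standing in for the CSV in the question.

```python
import pandas as pd

# Made-up stand-in for the CSV; "modulation" is a hypothetical column name.
df = pd.DataFrame({
    "GS": [1, 5, 3],
    "modulation": ["QPSK", "16-QAM", "64-QAM"],
})

# Same filter as in the question: the row(s) where GS is at its maximum.
k = df[df["GS"] == df["GS"].max()]

# k["modulation"] is a Series; .iloc[0] pulls out its first value as a plain string.
value = k["modulation"].iloc[0]
print("modulation :", value)
```

If several rows share the maximum, `.iloc[0]` keeps only the first of them.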

  5. STUFF January 22, 2020 at 1:20 am - Reply

    Hi Im excel i dislike this vid

  6. connor fischer January 22, 2020 at 1:20 am - Reply

    What is the screen that he starts the video in, and how do I get to it?
    I tried to run pandas with Python and it didn't look anything like what he had.

  7. connor fischer January 22, 2020 at 1:20 am - Reply

    If you have trouble installing with pip install, try opening Command Prompt by typing it into your Windows search bar, as pip should have been installed along with Python.

  8. Dopeboyz789 January 22, 2020 at 1:20 am - Reply

    I have Python but I'm having trouble locating pandas. Can you help me out, please?
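A quick way to check whether the Python you are running can see pandas at all is to ask the import system. This is only a sketch using the standard library; `is_installed` is a hypothetical helper name, not part of pandas.

```python
import importlib.util
import sys

def is_installed(module_name):
    """Return True if this interpreter can find the named top-level module."""
    return importlib.util.find_spec(module_name) is not None

# Shows which Python is running, useful when several versions are installed.
print("interpreter:", sys.executable)
print("pandas installed:", is_installed("pandas"))
```

If this prints False, running `pip install pandas` from the command prompt for that same interpreter is the usual next step.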

  9. Babak J January 22, 2020 at 1:20 am - Reply

    if you are watching this in 2019, this will work (open a free account on iexcloud):
    *** code starts here ***

    from iexfinance.stocks import get_historical_data
    import pandas as pd
    from datetime import datetime
    import matplotlib.pyplot as plt
    from matplotlib import style

    # docs for iexfinance https://pypi.org/project/iexfinance/
    # get your token from https://iexcloud.io/

    style.use('ggplot')
    start = datetime(2017, 1, 1)
    end = datetime(2018, 1, 1)

    df = get_historical_data("XOM", start, end, output_format='pandas', token='<your token from iexcloud.io>')

    print(df.head())
    df['close'].plot()
    plt.show()

    *** code ends here ***
    Don't forget to pip install the libraries.

  10. pices26 January 22, 2020 at 1:20 am - Reply

    from pandas_datareader import data as web
    instead of pandas.io.data as web

  11. susi sorglos January 22, 2020 at 1:20 am - Reply

    which source to get data is working atm? maybe some to pay for?

  12. hanz hanseller January 22, 2020 at 1:20 am - Reply

    Good on you

  13. Милош Пиваш January 22, 2020 at 1:20 am - Reply

    Old code doesn't work. What to do?

    this "df = web.DataReader("XOM", "morningstar", start, end)" (**** )
    raises
    >>Exception has occurred: pandas_datareader.exceptions.ImmediateDeprecationError
    Morningstar has been immediately deprecated due to large breaks in the API without the introduction of a stable replacement. Pull Requests to re-enable these data connectors are welcome. See https://github.com/pydata/pandas-datareader/issues

    – See https://github.com/pydata/pandas-datareader/issues
    We see a bunch of stuff we don't need
    – Search "morningstar"
    First result is "Remove Google and Morningstar"

    – Ok, websearch pandas_datareader
    First result is:
    https://pandas-datareader.readthedocs.io/en/latest/
    It's some documentation
    What are we trying to do?
    Access data
    – Click "Remote Data Access"
    We're here: https://pandas-datareader.readthedocs.io/en/latest/remote_data.html
    A list of some names appears, it says "Currently the following sources are supported"
    First two are Google and Morningstar???

    The first one below Morningstar is IEX, maybe try that.
    – Click on IEX
    We're here: https://pandas-datareader.readthedocs.io/en/latest/remote_data.html#remote-data-iex
    – Look and compare sample codes
    – Ok, maybe replace **** with:
    f = web.DataReader('F', 'iex', start, end)

    >>Exception has occurred: ValueError
    Invalid date specified. Must be within past 5 years.
    – Change 2010 to 2015
    It works (at 2019.02.26.)

    If IEX is deprecated by the time you read this, mimic the process above.

  14. Isnotaustin January 22, 2020 at 1:20 am - Reply

    i am liking just because you specifically didnt say and beg for subscribers

  15. Mr VIX January 22, 2020 at 1:20 am - Reply

    do you still use win 7 ?

  16. Potatonumbertwo January 22, 2020 at 1:20 am - Reply

    Hi, does anyone know the difference between import pandas_datareader.data as web and import pandas_datareader as web? I got the same result, and I'm just wondering why.

  17. V G January 22, 2020 at 1:20 am - Reply

    Great work. Thank you.

  18. irmscher9 January 22, 2020 at 1:20 am - Reply

    I can't stress enough how awesome you are man! =)

  19. Cib January 22, 2020 at 1:20 am - Reply

    For people not being able to follow this tutorial because morningstar doesn't work anymore, here's what you'll do:
    1. in cmd type: pip3 install iexfinance
    2. in imports add : from iexfinance.stocks import get_historical_data
    3. instead of: df = web.DataReader("XOM", "morningstar", start, end)
    put: df = get_historical_data("XOM", start, end, output_format='pandas')
    4. delete: df = df.drop("Symbol", axis=1)
    Here we are just using iexfinance instead of morningstar(be careful tho, iexfinance can only pull the data 5 years behind, so we cannot call 2010 like he did)
    Hope that helps 🙂
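The five-year limit mentioned in the comments can be handled by clamping the start date before requesting data. This is a sketch using only the standard library. `clamp_start` is a hypothetical helper, and treating "5 years" as 5 * 365 days is an assumption, not the data provider's exact rule.

```python
from datetime import datetime, timedelta

# Assumption: approximate "within the past 5 years" as 5 * 365 days.
MAX_LOOKBACK = timedelta(days=5 * 365)

def clamp_start(requested, now):
    """Return the requested start date, or the earliest allowed one if it is too old."""
    earliest = now - MAX_LOOKBACK
    return max(requested, earliest)

now = datetime(2019, 2, 26)  # fixed date so the example is reproducible
start = clamp_start(datetime(2010, 1, 1), now)
print("requested 2010-01-01, using:", start.date())
```

The clamped `start` can then be passed wherever the original `start` was used.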

  20. Kiran Randhawa January 22, 2020 at 1:20 am - Reply

    You waffle a little bit. I'm a waffler too. I think your content is great, but it would be best if you stuck to what the talk is meant to be about. Right now I'm waiting to get to more info about Pandas, but you're talking about language performance. Hey, nobody is perfect, but I hope my feedback helps.

  21. spicytuna08 January 22, 2020 at 1:20 am - Reply

    is there a way to display matplotlib outputs to web?

  22. Nimaz Sheik January 22, 2020 at 1:20 am - Reply

    # This is my code for anyone having problems with this tutorial

    import pandas as pd

    import datetime

    # run 'pip install pandas-datareader' to use pandas_datareader
    from pandas_datareader import data as web # module to access remote data
    import matplotlib.pyplot as plt

    from matplotlib import style

    style.use('ggplot')

    start = datetime.datetime(2017,1,1)

    end = datetime.datetime.now()
    # getting data till current date

    df = web.DataReader('TSLA', 'yahoo', start, end)
    # reading tesla stock from yahoo

    print(df.head())

    df['Adj Close'].plot()

    plt.show()

  23. hritik singh January 22, 2020 at 1:20 am - Reply

    Can anyone help me, please? I am getting "ModuleNotFoundError: No module named 'email.utils'; 'email' is not a package". I am using PyScripter version 3.5.1 and Python version 3.7.1.

  24. tehamill1 January 22, 2020 at 1:20 am - Reply

    I can't thank you enough for these videos!

  25. Udaya Sri Kariyawasam January 22, 2020 at 1:20 am - Reply

    I was unable to plot with this code:

    import pandas as pd
    import datetime
    #import pandas.io.data as web
    import pandas_datareader.data as web
    import matplotlib.pyplot as plt
    from matplotlib import style
    style.use('ggplot')

    start = datetime.datetime(2010, 1, 1)
    end = datetime.datetime(2018, 1, 1)

    df = web.DataReader("XOM", "yahoo", start, end)
    print(df.head())
    df['Adj Close'].plot
    plt.show()

    The plt.show() call is not working. Could you please help me?
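A likely cause in the code above is that `.plot` is written without parentheses, so the plotting method is referenced but never called and `plt.show()` has nothing to display. The sketch below uses a made-up frame instead of downloaded data and only demonstrates the difference, without opening a plot window.

```python
import pandas as pd

# Made-up prices standing in for the downloaded data.
prices = pd.DataFrame({"Adj Close": [10.0, 10.5, 10.2]})

# Without parentheses this only fetches the plotting accessor; nothing is drawn.
accessor = prices["Adj Close"].plot
print("callable but not called:", callable(accessor))

# The fix is to call it, then show the figure:
#     prices["Adj Close"].plot()
#     plt.show()
```

The same rule applies to any method: `df.head` is the method object, while `df.head()` returns the rows.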

  26. Iphone 3G January 22, 2020 at 1:20 am - Reply

    Good video, except the part where you said Python is 99.9% as fast as C. This is completely false. C is A LOT, and I mean A LOT, faster than Python. You can easily test it by making a script in Python that requires some computing power, timing it, then making the same program in C and timing that. You will see that the C one computes a lot faster. I'm not saying 2x or 3x as fast. If you have a big program that requires a lot of computing, the C version can run 20x to 30x faster than Python. I have worked with both languages and tested it multiple times. When it comes to speed, C/C++ is superior. The only language that could beat it is Fortran.

  27. Bruno Picasso January 22, 2020 at 1:20 am - Reply

    I tried to run it, but a KeyError appears. It's something related to "Adj Close".
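A KeyError on "Adj Close" usually means the dataframe has no column with exactly that name. Other comments here use 'Adj Close' with one data source and 'close' with another. Printing `df.columns` shows what is actually there. The sketch below uses a made-up frame, and `pick_column` is a hypothetical helper, not a pandas function.

```python
import pandas as pd

# Made-up frame mimicking a source that returns lowercase column names.
df = pd.DataFrame({"open": [1.0, 2.0], "close": [1.5, 2.5]})

def pick_column(frame, candidates):
    """Return the first candidate name that exists as a column in the frame."""
    for name in candidates:
        if name in frame.columns:
            return name
    raise KeyError(f"none of {candidates} found in {list(frame.columns)}")

print(df.columns.tolist())  # inspect the available column names first
price_col = pick_column(df, ["Adj Close", "Close", "close"])
print(df[price_col])
```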

  28. Elmira Ch January 22, 2020 at 1:20 am - Reply

    Morningstar doesn't work anymore:

    raise ImmediateDeprecationError(DEP_ERROR_MSG.format("Morningstar"))
    pandas_datareader.exceptions.ImmediateDeprecationError:
    Morningstar has been immediately deprecated due to large breaks in the API without the
    introduction of a stable replacement. Pull Requests to re-enable these data
    connectors are welcome.

  29. Adnan Sabbir January 22, 2020 at 1:20 am - Reply

    The code in his text tutorial is deprecated too. It gives an error:

    "Morningstar has been immediately deprecated due to large breaks in the API without the
    introduction of a stable replacement. Pull Requests to re-enable these data
    connectors are welcome."

    I tried the same code here just by changing
    "import pandas.io.data as web" -> "import pandas_datareader.data as web"

    And it works.

  30. Adil Saju January 22, 2020 at 1:20 am - Reply

    Why are u even using windows for python?

  31. Adil Saju January 22, 2020 at 1:20 am - Reply

    Great video again. But please use a better microphone, you sound too husky and unclear.

  32. Alper Calisir January 22, 2020 at 1:20 am - Reply
  33. Enigma January 22, 2020 at 1:20 am - Reply

    pandas_datareader.exceptions.ImmediateDeprecationError:
    Morningstar has been immediately deprecated due to large breaks in the API without the
    introduction of a stable replacement. Pull Requests to re-enable these data
    connectors are welcome.

  34. Mudassir Mustafa January 22, 2020 at 1:20 am - Reply

    Hey. New to the channel and to the field as well. I am using PyCharm because some tutorial had me install it, so that's the IDE I am a bit more comfortable in, but it's giving me a headache now as everyone else is using Terminal, IDLE or Sublime Text. Once I have saved the file, I am not sure how to run it to see the output. I'm not sure how to google it either. I spent the whole day trying to figure that out and stumbled here. Any help would be appreciated. Thanks

  35. spicytuna08 January 22, 2020 at 1:20 am - Reply

    sublime is not free. this one is also F annoying. i hope this tutorial guy clarifies the working env instead of saying "it doesn't matter." of course it matters.

  36. spicytuna08 January 22, 2020 at 1:20 am - Reply

    WTF, why so many errors? PyCharm is F annoying. Keeps asking to upgrade. I don't need this annoyance. F Python is annoying. Does anyone know if I can set up Python in C9? My concern is that I won't be able to see graphics.

  37. Rob Thorn January 22, 2020 at 1:20 am - Reply

    FAIL FAIL FAIL … import pandas.io.data as web fails, pandas_datareader.data as web fails, one after another, the tutorial won't run. This is TYPICAL of Python, its dependencies are constantly flaking, it's like building a skyscraper on sand. I keep asking myself, "Why bother?" Despite what you say, C++ is 2-3 orders of magnitude faster, and when you get down to serious applications like machine learning, you will see that. You will come to love FORTRAN and C/C++ for their speed and efficiency. And putting Python wrappers on good C code is like your princess going to the ball in a SHOPPING CART. The things that Pandas does for you are the least important part of the process yet you make it sound important — get back to basics, that's the key.

  38. Dimitri Esslinger January 22, 2020 at 1:20 am - Reply

    I am very disappointed in the Pandas framework, since there is no easy way to implement rolling forward window functions on a datetime column. Only backward window functions…
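For what it's worth, newer pandas releases include `pandas.api.indexers.FixedForwardWindowIndexer`, which gives forward-looking windows of a fixed number of rows. It counts rows rather than time offsets, so this sketch assumes regularly spaced (daily) data. The values are made up.

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

# Made-up daily series; regular spacing matters because the window counts rows.
s = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=pd.date_range("2020-01-01", periods=4, freq="D"),
)

# Each window covers the current row and the next one.
indexer = FixedForwardWindowIndexer(window_size=2)
forward_sum = s.rolling(window=indexer, min_periods=1).sum()

print(forward_sum)
```

With `min_periods=1`, the last row still gets a value even though no following row exists.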
