Parallel Python: Analyzing Large Datasets Intermediate | SciPy 2016 Tutorial | Matthew Rocklin & Mi

Students will walk away with a high-level understanding of both parallel problems and how to reason about parallel computing frameworks. They will also walk away with hands-on experience using a variety of frameworks easily accessible from Python.

For the first half, we will cover basic ideas and common patterns encountered when analyzing large datasets in parallel. We start with a sequence of examples that require increasingly complex tools. Beginning with the most basic parallel API, map, we move on to general asynchronous programming with futures, then to high-level APIs for large datasets such as Spark RDDs and Dask collections, and finally to streaming patterns. For the second half, we focus on traits of particular parallel frameworks, including strategies for picking the right tool for your job. We finish with common challenges in parallel analysis, such as debugging parallel code when it goes wrong, and with deployment and setup strategies.
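
As a rough illustration of the "parallel map" starting point, here is a minimal sketch using the standard library's concurrent.futures. The word_count task and the sample lines are hypothetical stand-ins for real per-record work. A thread pool is used only so the sketch runs anywhere; for CPU-bound pure-Python work the GIL usually makes a ProcessPoolExecutor the better fit.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(line: str) -> int:
    """Hypothetical per-item task: count whitespace-separated words in one line."""
    return len(line.split())


def parallel_word_counts(lines, max_workers=4):
    # Executor.map applies the function to every item using a pool of
    # workers and yields results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(word_count, lines))


lines = ["the quick brown fox", "jumps over", "the lazy dog"]
print(parallel_word_counts(lines))  # [4, 2, 3]
```

The appeal of map is that the caller only describes the per-item function; scheduling, worker management, and result ordering are handled by the executor.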

Part one: We dive into common problems with a variety of tools.

1. Parallel Map
2. Asynchronous Futures
3. High-Level Datasets
4. Streaming
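
Asynchronous futures go one step beyond map: each task is submitted individually and its result can be consumed as soon as it is ready, which is also the basic shape of many streaming patterns. A minimal sketch with concurrent.futures follows; slow_square and its sleep-based delays are hypothetical, chosen only to simulate tasks with uneven runtimes.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_square(x: int, delay: float) -> int:
    """Hypothetical task whose runtime varies per input."""
    time.sleep(delay)
    return x * x


def squares_as_completed(jobs, max_workers=4):
    """Submit every job up front and collect (input, result) pairs in completion order."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # submit() returns a Future immediately; the work runs in the background.
        future_to_input = {executor.submit(slow_square, x, d): x for x, d in jobs}
        # as_completed() yields each Future as soon as it finishes,
        # regardless of the order in which it was submitted.
        for future in as_completed(future_to_input):
            results.append((future_to_input[future], future.result()))
    return results


jobs = [(1, 0.3), (2, 0.1), (3, 0.2)]
print(squares_as_completed(jobs))
```

Unlike map, the output order here depends on timing, so downstream code should not assume it matches the input order.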

Part two: We analyze common traits of parallel computing systems.

1. Processes and Threads. The GIL, inter-worker communication, and contention
2. Latency and overhead. Batching, profiling.
3. Communication mechanisms. Sockets, MPI, Disk, IPC.
4. Stuff that gets in the way. Serialization, Native v. JVM, Setup, Resource Managers, Sample Configurations
5. Debugging async and parallel code / Historical perspective
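
To make the "latency and overhead" point concrete: when each task is tiny, the cost of scheduling it can exceed the work itself, and batching many items into one task amortizes that cost. The sketch below batches by hand; the batched helper, process_batch, and the batch size of 1000 are hypothetical choices, not tuned values. In the standard library, ProcessPoolExecutor.map and multiprocessing.Pool.map accept a chunksize argument that performs similar batching for you.

```python
from concurrent.futures import ThreadPoolExecutor


def batched(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def process_batch(batch):
    # One task handles a whole chunk, so per-task scheduling overhead
    # is paid once per batch rather than once per element.
    return [x + 1 for x in batch]


def increment_all(items, batch_size=1000, max_workers=4):
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        chunks = executor.map(process_batch, batched(items, batch_size))
        # Flatten the per-batch results back into one list, preserving order.
        return [y for chunk in chunks for y in chunk]


data = list(range(10_000))
print(len(increment_all(data)))  # 10000
```

Profiling with and without batching is the honest way to pick a batch size: too small and overhead dominates, too large and work is spread unevenly across workers.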

We intend to cover the following tools: concurrent.futures, multiprocessing/threading, joblib, IPython parallel, Dask, and Spark.
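
Several of these tools share a similar map-style interface, which makes switching between them relatively cheap. The sketch below runs the same hypothetical cube function through concurrent.futures and through multiprocessing's API. Thread-backed pools are used so it runs in any environment; their process-backed counterparts (ProcessPoolExecutor and multiprocessing.Pool) offer the same calls but require picklable, module-level functions and usually an `if __name__ == "__main__":` guard.

```python
from concurrent.futures import ThreadPoolExecutor
from multiprocessing.pool import ThreadPool


def cube(x: int) -> int:
    """Hypothetical task used to compare backends."""
    return x ** 3


def map_with_futures(func, items, workers=4):
    # concurrent.futures: Executor.map returns an iterator of results in input order.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(func, items))


def map_with_multiprocessing(func, items, workers=4):
    # multiprocessing: Pool.map blocks until done and returns a list.
    # ThreadPool exposes the multiprocessing.Pool API but runs tasks in threads.
    with ThreadPool(processes=workers) as pool:
        return pool.map(func, items)


items = list(range(6))
print(map_with_futures(cube, items))          # [0, 1, 8, 27, 64, 125]
print(map_with_multiprocessing(cube, items))  # [0, 1, 8, 27, 64, 125]
```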

June 23rd, 2017 | Python Video Tutorials | 5 Comments

5 Comments

  1. jaya krishna June 23, 2017 at 3:34 am

    When I run the prep.py file, it only creates a blank folder with no data in it. Can anybody help with this?

  2. Toby Patterson June 23, 2017 at 3:49 am

    Fantastic presentation

  3. Enthought June 23, 2017 at 4:09 am

    See the complete SciPy 2016 Conference talk & tutorial playlist here: https://www.youtube.com/playlist?list=PLYx7XA2nY5Gf37zYZMw6OqGFRPjB1jCy6

  4. Enthought June 23, 2017 at 4:18 am

    Tutorial materials may be found at https://github.com/mrocklin/scipy-2016-parallel/

  5. JoYstiKdu75020 June 23, 2017 at 4:24 am

    Thank you very much for the tutorial and the server. It feels like I was at the workshop.
