CUDACast #10a – Your First CUDA Python Program

In this CUDACast video, we’ll see how to write and run your first CUDA Python program using the Numba Compiler from Continuum Analytics.
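
For reference, here is a minimal sketch of the kind of vector-add program the video builds, updated for the current numba package (the numbapro compiler shown in the video has since been deprecated; see the last comment below). It assumes an NVIDIA GPU with a working CUDA toolkit; the function name and vector size follow the video.

import numpy as np
from numba import vectorize

# Compile a plain Python function into an element-wise GPU kernel.
@vectorize(['float32(float32, float32)'], target='cuda')
def VectorAdd(a, b):
    return a + b

def main():
    N = 32000000  # 32-million-element vectors, as in the video
    A = np.ones(N, dtype=np.float32)
    B = np.ones(N, dtype=np.float32)
    C = VectorAdd(A, B)  # numba handles the host/device transfers
    print('C[:5] =', C[:5])

if __name__ == '__main__':
    main()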

September 5th, 2019 | Python Video Tutorials | 21 Comments

21 Comments

  1. clark whitehead September 5, 2019 at 10:39 pm - Reply

    It's faster if you run it without adding @vectorize. The two programs you're comparing aren't even the same. Is this some sort of con? WTH?

  2. Blake Edwards September 5, 2019 at 10:39 pm - Reply

    Thanks!

  3. Python PogChamp September 5, 2019 at 10:39 pm - Reply

    Thank You! This helped so much!

  4. Roy Olsen September 5, 2019 at 10:39 pm - Reply

    Interestingly, numba does this a lot faster on CPU than on CUDA.

    Numba with CPU target: 0.08 seconds
    Numba with CUDA target: 0.45 seconds
    Original python code: 12.96 seconds

    (Xeon E5630 and GTX 1060)

  5. Shlok Dave September 5, 2019 at 10:39 pm - Reply

    Extremely misleading. Very, very poor example.

  6. Ruito Mauaie September 5, 2019 at 10:39 pm - Reply

    Hi everyone, I'm new to the Python field. I have a Python script, but when I run it I get an error ("Torch not compiled with CUDA enabled"). Can someone explain what this error means? I only want to test the code, but I don't have an Nvidia graphics card. Can someone help me? I'm doing my final research using KPCA. I look forward to hearing from you.

  7. Daniel Abramov September 5, 2019 at 10:39 pm - Reply

    Yeah, no wonder it's faster: 1 CPU core vs. XXX CUDA cores 😀 Can you at least do "from concurrent.futures import ProcessPoolExecutor" and parallelize the function? The decorator functionality is awesome.
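
    A rough sketch of what that multi-core CPU baseline could look like (the worker count and chunking here are illustrative, not from the video):

    import numpy as np
    from concurrent.futures import ProcessPoolExecutor

    def add_chunk(pair):
        # NumPy adds each chunk in compiled code.
        a, b = pair
        return a + b

    def parallel_add(a, b, workers=4):
        # Split the inputs and add the pieces in separate processes.
        pairs = zip(np.array_split(a, workers), np.array_split(b, workers))
        with ProcessPoolExecutor(max_workers=workers) as pool:
            return np.concatenate(list(pool.map(add_chunk, pairs)))

    if __name__ == '__main__':
        N = 32000000
        a = np.ones(N, dtype=np.float32)
        b = np.ones(N, dtype=np.float32)
        print(parallel_add(a, b)[:5])

    Bear in mind that pickling the chunks between processes adds real overhead, so for a memory-bound operation like this a process pool may not beat single-core NumPy.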

  8. Chretze September 5, 2019 at 10:39 pm - Reply

    For some reason I get an error whenever the print() function is called: "AttributeError: 'NoneType' object has no attribute 'write'"
    What the hell, I have no clue why that would occur. It can't even just print("hello world").

  9. Lancelot Xavier September 5, 2019 at 10:39 pm - Reply

    720P?

  10. liberator48 September 5, 2019 at 10:39 pm - Reply

    The VectorAdd function has no output; how does this work?

  11. nomad27 September 5, 2019 at 10:39 pm - Reply

    I know this is only meant to show simple code running on the GPU, but note that the following CPU-only MATLAB code does the same in 0.08 seconds (almost half the GPU time in this example):
    a=ones(32000000,1);
    b=ones(32000000,1);
    c=zeros(32000000,1);
    tic
    c=a+b;
    toc
    Elapsed time is 0.080721 seconds.

  12. Cypherdude1 September 5, 2019 at 10:39 pm - Reply

    What's the best book to buy regarding Python CUDA programming? I did a search on Amazon for Python CUDA but I didn't find much published after January 2015.

  13. Benjamin F. September 5, 2019 at 10:39 pm - Reply

    Cool! How about a video explaining something more complex/useful than adding two vectors? That would help a lot.

  14. Bob Watkins September 5, 2019 at 10:39 pm - Reply

    Contrived example. The pure Python function was written in a naive way so as to run exceptionally slowly. If you rewrite it to use numpy's vector addition, it runs faster than the CUDA version. Just comment out the numba imports and the @vectorize decorator to see this. I'm sure that the GPU is much faster than the CPU, but it is not clear from this video whether numba allows a Python program to take advantage of that speed, considering the data-transfer overhead to and from the GPU. What you need to do is publish a more realistic example where a significant portion of the program's state and logic resides on the GPU.
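
    Concretely, the comparison described above looks something like this (a sketch; the timings depend entirely on the machine):

    import time
    import numpy as np

    def vector_add_loop(a, b, c):
        # The naive element-wise loop from the video, minus the decorator.
        for i in range(a.size):
            c[i] = a[i] + b[i]

    N = 32000000
    a = np.ones(N, dtype=np.float32)
    b = np.ones(N, dtype=np.float32)
    c = np.zeros(N, dtype=np.float32)

    start = time.time()
    vector_add_loop(a, b, c)
    print('naive loop: %.2f s' % (time.time() - start))

    start = time.time()
    c = a + b  # numpy's built-in vectorized addition
    print('numpy add:  %.4f s' % (time.time() - start))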

  15. James Kim September 5, 2019 at 10:39 pm - Reply

    This exaggerates the difference unnecessarily and, more importantly, does not reveal the GPU acceleration over the CPU. Using numpy, you can do np.add(a, b) rather than a for loop of element-wise adding of a and b. Doing this only takes 0.07 seconds even without using a GPU. So, could an informative comparison of np.add(a, b) and a GPU-utilizing version thereof be shown, to truly reveal how fast the GPU does it? The fact seems to be that for this simple vector addition, using the GPU offers no gain. The situation is sure to be different when more meaningful calculations are vectorized on the GPU.

  16. catklyst September 5, 2019 at 10:39 pm - Reply

    HOLY SHIT

  17. AKSHAT SHARMA September 5, 2019 at 10:39 pm - Reply

    If we do a return(a + b) kind of thing without "@vectorize…", it also performs better. And target = 'cuda' takes a much longer time than target = 'cpu'.

  18. Francesco Faccenda September 5, 2019 at 10:39 pm - Reply

    Tried the code from version 2 without the GPU directives. It ran a lot faster anyway. You should compare the same code on CPU and on GPU, or it's not fair.

  19. I77AGIC September 5, 2019 at 10:39 pm - Reply

    Why are you posting outdated videos…

  20. Caleb Klein September 5, 2019 at 10:39 pm - Reply

    What GPU was this executed on?

  21. adrian vera September 5, 2019 at 10:39 pm - Reply

    To run on Python 3 (see the sketch below):
    - replace numbapro with numba
    - replace target='gpu' with target='cuda'
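
    Applied to the decorator line from the video, those two changes look like this (a sketch of the updated version, not the video's exact code):

    # Old (Python 2 era, numbapro):
    #   from numbapro import vectorize
    #   @vectorize(["float32(float32, float32)"], target='gpu')
    # Updated (Python 3, numba):
    from numba import vectorize

    @vectorize(["float32(float32, float32)"], target='cuda')
    def VectorAdd(a, b):
        return a + b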
