Break into the powerful world of parallel GPU
programming with this down-to-earth, practical
guide

Designed for professionals across multiple industrial sectors,
Professional CUDA C Programming presents the fundamentals of CUDA, a
parallel computing platform and programming model designed to ease
GPU programming, in an easy-to-follow format, and teaches readers
how to think in parallel and implement parallel algorithms on GPUs.
Each chapter covers a specific topic and includes working examples
that demonstrate the development process, allowing readers to
explore both the "hard" and "soft" aspects of GPU programming.

Computing architectures are experiencing a fundamental shift
toward scalable parallel computing motivated by application
requirements in industry and science. This book demonstrates the
challenges of efficiently utilizing compute resources at peak
performance and presents modern techniques for tackling these
challenges, while increasing accessibility for professionals who
are not necessarily parallel programming experts. The CUDA
programming model and tools empower developers to write
high-performance applications on a scalable, parallel computing
platform: the GPU. However, CUDA itself can be difficult to learn
without extensive programming experience. Recognized CUDA
authorities John Cheng, Max Grossman, and Ty McKercher guide
readers through essential GPU programming skills and best practices
in Professional CUDA C Programming, including:

* CUDA Programming Model

* GPU Execution Model

* GPU Memory Model

* Streams, Events, and Concurrency

* Multi-GPU Programming

* CUDA Domain-Specific Libraries

* Profiling and Performance Tuning

The book makes complex CUDA concepts easy to understand for
anyone with basic software development knowledge, with exercises
designed to be both readable and high-performance. For the
professional seeking entry into parallel computing and the
high-performance computing community, Professional CUDA C
Programming is an invaluable resource, with the most current
information available on the market.



About the Authors

John Cheng, PhD, is a Research Scientist at BGP International in Houston. He has developed seismic imaging products with GPU technology and many high-performance parallel production applications on heterogeneous computing platforms.

Max Grossman is an expert in GPU computing with experience applying CUDA to problems in medical imaging, machine learning, geophysics, and more.

Ty McKercher has been employed at NVIDIA since 2008, where he helps customers adopt GPU acceleration technologies.





Contents

Foreword

Preface

Introduction

Chapter 1: Heterogeneous Parallel Computing with CUDA

Parallel Computing

Sequential and Parallel Programming

Parallelism

Computer Architecture

Heterogeneous Computing

Heterogeneous Architecture

Paradigm of Heterogeneous Computing

CUDA: A Platform for Heterogeneous Computing

Hello World from GPU

Is CUDA C Programming Difficult?

Summary

Chapter 2: CUDA Programming Model

Introducing the CUDA Programming Model

CUDA Programming Structure

Managing Memory

Organizing Threads

Launching a CUDA Kernel

Writing Your Kernel

Verifying Your Kernel

Handling Errors

Compiling and Executing

Timing Your Kernel

Timing with CPU Timer

Timing with nvprof

Organizing Parallel Threads

Indexing Matrices with Blocks and Threads

Summing Matrices with a 2D Grid and 2D Blocks

Summing Matrices with a 1D Grid and 1D Blocks

Summing Matrices with a 2D Grid and 1D Blocks

Managing Devices

Using the Runtime API to Query GPU Information

Determining the Best GPU

Using nvidia-smi to Query GPU Information

Setting Devices at Runtime

Summary

Chapter 3: CUDA Execution Model

Introducing the CUDA Execution Model

GPU Architecture Overview

The Fermi Architecture

The Kepler Architecture

Profile-Driven Optimization

Understanding the Nature of Warp Execution

Warps and Thread Blocks

Warp Divergence

Resource Partitioning

Latency Hiding

Occupancy

Synchronization

Scalability

Exposing Parallelism

Checking Active Warps with nvprof

Checking Memory Operations with nvprof

Exposing More Parallelism

Avoiding Branch Divergence

The Parallel Reduction Problem

Divergence in Parallel Reduction

Improving Divergence in Parallel Reduction

Reducing with Interleaved Pairs

Unrolling Loops

Reducing with Unrolling

Reducing with Unrolled Warps

Reducing with Complete Unrolling

Reducing with Template Functions

Dynamic Parallelism

Nested Execution

Nested Hello World on the GPU

Nested Reduction

Summary

Chapter 4: Global Memory

Introducing the CUDA Memory Model

Benefits of a Memory Hierarchy

CUDA Memory Model

Memory Management

Memory Allocation and Deallocation

Memory Trans…

Title
Professional CUDA C Programming
EAN
9781118739273
ISBN
978-1-118-73927-3
Format
E-book (PDF)
Manufacturer
Publisher
Publication date
28.08.2014
Digital copy protection
Adobe DRM
File size
57.95 MB
Number of pages
528
Year
2014
Subtitle
English