<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>CUDA and Applications to Task-based Programming</title>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<link href="css/screen.css" type="text/css" rel="stylesheet" media="screen,projection" />
<!--[if lte IE 6]><link href="css/msie.css" type="text/css" rel="stylesheet" media="screen,projection" /><![endif]-->
<link rel="stylesheet" media="print" type="text/css" href="css/print.css" />
</head>
<body>
<div id="layout">
<div id="header">
<h1 id="logo"><a href="#">CUDA<span class="light">Tutorial</span></a></h1>
<hr class="noscreen" />
<p class="noscreen noprint"> <em>Skip to <a href="#obsah">content</a>, <a href="#nav">navigation</a>.</em> </p>
<hr class="noscreen" />
<div id="nav" class="box">
<ul>
<li><a href="index.html">Home<br />
<span>Main page</span></a></li>
<li><a href="authors.html">Presenters<br />
<span>Creators of this tutorial</span></a></li>
<li><a href="audience.html">Audience<br />
<span>Who this tutorial is for</span></a></li>
<li><a href="outline.html">Outline<br />
<span>Subject overview</span></a></li>
<li><a href="mailto:kerbl@cg.tuwien.ac.at">Contact<br />
<span>Write us!</span></a></li>
</ul>
<hr class="noscreen" />
</div>
<div id="container" class="box">
<div id="obsah" class="content box">
<div class="in">
<h2>Tutorial Outline</h2>
To provide a thorough understanding of how CUDA applications can achieve peak performance, the first two parts of this tutorial outline the modern CUDA architecture. Following a basic introduction, we show how language features are linked to, and constrained by, the underlying physical hardware components. We then describe common applications for massively parallel programming, offer a detailed breakdown of potential issues, and list ways to mitigate their performance impact. An example analysis of PTX and SASS snippets illustrates how code patterns in CUDA are mapped to actual hardware instructions.
<br />
<br />
In parts 3 and 4, we focus on novel features that were enabled by the arrival of CUDA 10+ toolkits and the Volta+ architectures, such as independent thread scheduling (ITS), tensor cores, and the graph API. In addition to basic use case demonstrations, we share our own experiences with these capabilities and their potential performance benefits. We also discuss how long-standing best practices are affected by these changes and describe common caveats for dealing with legacy code on recent GPU models. Finally, we show how these considerations can be implemented in practice by presenting state-of-the-art research into task-based GPU scheduling, and demonstrate how the dynamic adjustment of thread roles and group configurations can significantly increase performance.
</div>
</div>
</div>
<div id="footer"> <span class="f-left">© 2021 <a href="mailto:kerbl@cg.tuwien.ac.at">Michael Kenzel, Bernhard Kerbl, Martin Winter and Markus Steinberger</a></span> <span class="f-right">Design: <a href="http://www.davidkohout.cz">David Kohout</a></span> </div>
</div>
</div>
</body>
</html>