Introduction to TorchRec
========================

.. tip::
   To get the most of this tutorial, we suggest using this
   `Colab Version <https://colab.research.google.com/github/pytorch/torchrec/blob/main/Torchrec_Introduction.ipynb>`__.
   This will allow you to experiment with the information presented below.

Follow along with the video below or on `youtube <https://www.youtube.com/watch?v=cjgj41dvSeQ>`__.

.. raw:: html

   <div style="margin-top:10px; margin-bottom:10px;">
   <iframe width="560" height="315" src="https://www.youtube.com/embed/cjgj41dvSeQ" frameborder="0" allow="accelerometer; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
   </div>

Frequently, when building recommendation systems, we want to represent
entities like products or pages with embeddings. For example, see Meta
AI's `Deep learning recommendation
model <https://arxiv.org/abs/1906.00091>`__, or DLRM. As the number of
entities grows, the size of the embedding tables can exceed a single
GPU's memory. A common practice is to shard the embedding table across
devices, a type of model parallelism. To that end, TorchRec introduces
its primary API called |DistributedModelParallel|_, or DMP. Like
PyTorch's DistributedDataParallel, DMP wraps a model to enable
distributed training.

Installation
------------

Requirements: python >= 3.7

We highly recommend CUDA when using TorchRec. If using CUDA: cuda >= 11.0


.. code:: shell

    pip3 install torchrec-nightly

Overview
--------

This tutorial will cover three pieces of TorchRec: the ``nn.module`` |EmbeddingBagCollection|_, the |DistributedModelParallel|_ API, and
the datastructure |KeyedJaggedTensor|_.


Distributed Setup
~~~~~~~~~~~~~~~~~

We set up our environment with torch.distributed. For more info on
distributed, see this
`tutorial <https://pytorch.org/tutorials/beginner/dist_overview.html>`__.

Here, we use one rank (the colab process) corresponding to our 1 colab
GPU.

.. code:: python

    # Single-process setup: the default env:// init used by init_process_group
    # needs RANK/WORLD_SIZE in addition to MASTER_ADDR/MASTER_PORT.
    import os

    import torch
    import torchrec
    import torch.distributed as dist

    os.environ["RANK"] = "0"
    os.environ["WORLD_SIZE"] = "1"
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"

    # Note - you will need a V100 or A100 to run this tutorial as is!
    # If using an older GPU (such as colab free K80),
    # you will need to compile fbgemm with the appropriate CUDA architecture
    # or run with "gloo" on CPUs
    dist.init_process_group(backend="nccl")

From EmbeddingBag to EmbeddingBagCollection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

PyTorch represents embeddings through |torch.nn.Embedding|_ and |torch.nn.EmbeddingBag|_.
EmbeddingBag is a pooled version of Embedding.
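
For example, a sum-pooled ``nn.EmbeddingBag`` gives the same result as
summing the corresponding ``nn.Embedding`` rows; a minimal sketch:

.. code:: python

    import torch

    # Share the same weights so the two lookups are directly comparable.
    emb = torch.nn.Embedding(num_embeddings=4096, embedding_dim=64)
    bag = torch.nn.EmbeddingBag.from_pretrained(emb.weight.detach(), mode="sum")

    ids = torch.tensor([101, 202])
    per_id = emb(ids)                              # one 64-dim row per ID: (2, 64)
    pooled = bag(ids, offsets=torch.tensor([0]))   # rows pooled into one bag: (1, 64)
    print(torch.allclose(pooled[0], per_id.sum(dim=0)))  # True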

TorchRec extends these modules by creating collections of embeddings. We
will use |EmbeddingBagCollection|_ to represent a group of EmbeddingBags.

Here, we create an EmbeddingBagCollection (EBC) with two embedding bags.
Each table, ``product_table`` and ``user_table``, is represented by a
64-dimension embedding of size 4096. Note how we initially allocate the
EBC on device "meta". This will tell EBC to not allocate memory yet.

.. code:: python
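
    # A sketch of the EmbeddingBagCollection described above; the original
    # snippet is elided in this excerpt, so treat the exact configuration
    # below as a reconstruction.
    ebc = torchrec.EmbeddingBagCollection(
        device="meta",
        tables=[
            torchrec.EmbeddingBagConfig(
                name="product_table",
                embedding_dim=64,
                num_embeddings=4096,
                feature_names=["product"],
            ),
            torchrec.EmbeddingBagConfig(
                name="user_table",
                embedding_dim=64,
                num_embeddings=4096,
                feature_names=["user"],
            ),
        ],
    )
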
DistributedModelParallel
~~~~~~~~~~~~~~~~~~~~~~~~

Now, we're ready to wrap our model with |DistributedModelParallel|_ (DMP). Instantiating DMP will:

1. Decide how to shard the model. DMP will collect the available
   "sharders" and come up with a "plan" of the optimal way to shard the
   embedding table(s) (i.e., the EmbeddingBagCollection).
2. Actually shard the model. This includes allocating memory for each
   embedding table on the appropriate device(s).

In this toy example, since we have two EmbeddingTables and one GPU,
TorchRec will place both on the single GPU.

.. code:: python
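
    # The construction of the wrapped model is elided in this excerpt; a
    # minimal sketch, assuming the ``ebc`` defined above:
    model = torchrec.distributed.DistributedModelParallel(ebc, device=torch.device("cuda"))
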
    print(model.plan)

Query vanilla nn.EmbeddingBag with input and offsets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We query |nn.Embedding|_ and |nn.EmbeddingBag|_
with ``input`` and ``offsets``. Input is a 1-D tensor containing the
lookup values. Offsets is a 1-D tensor where the sequence is a
cumulative sum of the number of values to pool per example.

Let's look at an example, recreating the product EmbeddingBag above:

::
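
    # Recreating the 4096 x 64 product table as a plain EmbeddingBag (the
    # construction itself is elided in this excerpt; a minimal sketch):
    product_eb = torch.nn.EmbeddingBag(4096, 64)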
    product_eb(input=torch.tensor([101, 202, 303]), offsets=torch.tensor([0, 2, 2]))

Representing minibatches with KeyedJaggedTensor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We need an efficient representation of multiple examples of an arbitrary
number of entity IDs per feature per example. In order to enable this
"jagged" representation, we use the TorchRec datastructure
|KeyedJaggedTensor|_ (KJT).

Let's take a look at how to lookup a collection of two embedding
bags, "product" and "user". Assume the minibatch is made up of three
examples for three users. The first of which has two product IDs, the
second with none, and the third with one product ID.

::

    |------------|------------|
    | product ID | user ID    |
    |------------|------------|
    | [101, 202] | [404]      |
    | []         | [505]      |
    | [303]      | [606]      |
    |------------|------------|

The query should be:

.. code:: python
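
    # The KJT construction is elided in this excerpt; a minimal sketch whose
    # IDs and lengths follow the table above:
    mb = torchrec.KeyedJaggedTensor(
        keys=["product", "user"],
        values=torch.tensor([101, 202, 303, 404, 505, 606]).cuda(),
        lengths=torch.tensor([2, 0, 1, 1, 1, 1]).cuda(),
    )
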
    print(mb.to(torch.device("cpu")))

Note that the KJT batch size is
``batch_size = len(lengths)//len(keys)``. In the above example,
batch_size is 3.
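
Continuing the sketch above, ``lengths`` holds one entry per (key, example)
pair, so the arithmetic works out as:

.. code:: python

    keys = ["product", "user"]
    lengths = [2, 0, 1, 1, 1, 1]  # 2 keys x 3 examples = 6 entries
    print(len(lengths) // len(keys))  # 3
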


Putting it all together, querying our distributed model with a KJT minibatch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Finally, we can query our model using our minibatch of products and
users.

The resulting lookup will contain a KeyedTensor, where each key (or
feature) contains a 2D tensor of size 3x64 (batch_size x embedding_dim).

.. code:: python

    pooled_embeddings = model(mb)
    print(pooled_embeddings)
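
To check the shape claim above, the ``KeyedTensor`` can be unpacked per
feature key; a small sketch, assuming the ``pooled_embeddings`` result from
the query above:

.. code:: python

    # Each key maps to a (batch_size, embedding_dim) tensor, i.e. 3 x 64 here.
    for key, tensor in pooled_embeddings.to_dict().items():
        print(key, tensor.shape)
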


More resources
--------------

For more information, please see our
`dlrm <https://github.com/pytorch/torchrec/tree/main/examples/dlrm>`__
example, which includes multinode training on the criteo terabyte
dataset, using Meta's `DLRM <https://arxiv.org/abs/1906.00091>`__.


.. |DistributedModelParallel| replace:: ``DistributedModelParallel``