Introduction to TorchRec
========================

.. tip::
   To get the most of this tutorial, we suggest using this
   `Colab Version <https://colab.research.google.com/github/pytorch/torchrec/blob/main/Torchrec_Introduction.ipynb>`__.
   This will allow you to experiment with the information presented below.

Follow along with the video below or on `youtube <https://www.youtube.com/watch?v=cjgj41dvSeQ>`__.

.. raw:: html

   <div style="margin-top:10px; margin-bottom:10px;">
   <iframe width="560" height="315" src="https://www.youtube.com/embed/cjgj41dvSeQ" frameborder="0" allow="accelerometer; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
   </div>

Frequently, when building recommendation systems, we want to represent
entities like products or pages with embeddings. For example, see Meta
AI's `Deep learning recommendation
model <https://arxiv.org/abs/1906.00091>`__, or DLRM. As the number of
entities grows, the size of the embedding tables can exceed a single
GPU's memory. A common practice is to shard the embedding table across
devices, a type of model parallelism. To that end, TorchRec introduces
its primary API called |DistributedModelParallel|_, or DMP. Like
PyTorch's DistributedDataParallel, DMP wraps a model to enable
distributed training.

Installation
------------

Requirements: python >= 3.7

We highly recommend CUDA when using TorchRec. If using CUDA: cuda >= 11.0


.. code:: shell

    pip3 install torchrec-nightly

Overview
--------

This tutorial will cover three pieces of TorchRec: the ``nn.module`` |EmbeddingBagCollection|_, the |DistributedModelParallel|_ API, and
the datastructure |KeyedJaggedTensor|_.


Distributed Setup
~~~~~~~~~~~~~~~~~

We set up our environment with torch.distributed. For more info on
distributed, see this
`tutorial <https://pytorch.org/tutorials/beginner/dist_overview.html>`__.

Here, we use one rank (the colab process) corresponding to our 1 colab
GPU.

.. code:: python

    # Single-process setup: the default env:// init used by init_process_group
    # needs RANK/WORLD_SIZE in addition to MASTER_ADDR/MASTER_PORT.
    import os

    import torch
    import torchrec
    import torch.distributed as dist

    os.environ["RANK"] = "0"
    os.environ["WORLD_SIZE"] = "1"
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"

    # Note - you will need a V100 or A100 to run this tutorial as is!
    # If using an older GPU (such as colab free K80),
    # you will need to compile fbgemm with the appropriate CUDA architecture
    # or run with "gloo" on CPUs
    dist.init_process_group(backend="nccl")

From EmbeddingBag to EmbeddingBagCollection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

PyTorch represents embeddings through |torch.nn.Embedding|_ and |torch.nn.EmbeddingBag|_.
EmbeddingBag is a pooled version of Embedding.
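
For example, a sum-pooled ``nn.EmbeddingBag`` gives the same result as
summing the corresponding ``nn.Embedding`` rows; a minimal sketch:

.. code:: python

    import torch

    # Share the same weights so the two lookups are directly comparable.
    emb = torch.nn.Embedding(num_embeddings=4096, embedding_dim=64)
    bag = torch.nn.EmbeddingBag.from_pretrained(emb.weight.detach(), mode="sum")

    ids = torch.tensor([101, 202])
    per_id = emb(ids)                              # one 64-dim row per ID: (2, 64)
    pooled = bag(ids, offsets=torch.tensor([0]))   # rows pooled into one bag: (1, 64)
    print(torch.allclose(pooled[0], per_id.sum(dim=0)))  # True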

TorchRec extends these modules by creating collections of embeddings. We
will use |EmbeddingBagCollection|_ to represent a group of EmbeddingBags.

Here, we create an EmbeddingBagCollection (EBC) with two embedding bags.
Each table, ``product_table`` and ``user_table``, is represented by a
64-dimension embedding of size 4096. Note how we initially allocate the
EBC on device "meta". This will tell EBC to not allocate memory yet.

.. code:: python
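
    # A sketch of the EmbeddingBagCollection described above; the original
    # snippet is elided in this excerpt, so treat the exact configuration
    # below as a reconstruction.
    ebc = torchrec.EmbeddingBagCollection(
        device="meta",
        tables=[
            torchrec.EmbeddingBagConfig(
                name="product_table",
                embedding_dim=64,
                num_embeddings=4096,
                feature_names=["product"],
            ),
            torchrec.EmbeddingBagConfig(
                name="user_table",
                embedding_dim=64,
                num_embeddings=4096,
                feature_names=["user"],
            ),
        ],
    )
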
DistributedModelParallel
~~~~~~~~~~~~~~~~~~~~~~~~

Now, we're ready to wrap our model with |DistributedModelParallel|_ (DMP). Instantiating DMP will:

1. Decide how to shard the model. DMP will collect the available
   "sharders" and come up with a "plan" of the optimal way to shard the
   embedding table(s) (i.e., the EmbeddingBagCollection).
2. Actually shard the model. This includes allocating memory for each
   embedding table on the appropriate device(s).

In this toy example, since we have two EmbeddingTables and one GPU,
TorchRec will place both on the single GPU.

.. code:: python
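
    # The construction of the wrapped model is elided in this excerpt; a
    # minimal sketch, assuming the ``ebc`` defined above:
    model = torchrec.distributed.DistributedModelParallel(ebc, device=torch.device("cuda"))
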
    print(model.plan)

Query vanilla nn.EmbeddingBag with input and offsets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We query |nn.Embedding|_ and |nn.EmbeddingBag|_
with ``input`` and ``offsets``. Input is a 1-D tensor containing the
lookup values. Offsets is a 1-D tensor where the sequence is a
cumulative sum of the number of values to pool per example.

Let's look at an example, recreating the product EmbeddingBag above:

::
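
    # Recreating the 4096 x 64 product table as a plain EmbeddingBag (the
    # construction itself is elided in this excerpt; a minimal sketch):
    product_eb = torch.nn.EmbeddingBag(4096, 64)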
    product_eb(input=torch.tensor([101, 202, 303]), offsets=torch.tensor([0, 2, 2]))

Representing minibatches with KeyedJaggedTensor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We need an efficient representation of multiple examples of an arbitrary
number of entity IDs per feature per example. In order to enable this
"jagged" representation, we use the TorchRec datastructure
|KeyedJaggedTensor|_ (KJT).

Let's take a look at how to lookup a collection of two embedding
bags, "product" and "user". Assume the minibatch is made up of three
examples for three users. The first of which has two product IDs, the
second with none, and the third with one product ID.

::

    |------------|------------|
    | product ID | user ID    |
    |------------|------------|
    | [101, 202] | [404]      |
    | []         | [505]      |
    | [303]      | [606]      |
    |------------|------------|

The query should be:

.. code:: python
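
    # The KJT construction is elided in this excerpt; a minimal sketch whose
    # IDs and lengths follow the table above:
    mb = torchrec.KeyedJaggedTensor(
        keys=["product", "user"],
        values=torch.tensor([101, 202, 303, 404, 505, 606]).cuda(),
        lengths=torch.tensor([2, 0, 1, 1, 1, 1]).cuda(),
    )
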
    print(mb.to(torch.device("cpu")))

Note that the KJT batch size is
``batch_size = len(lengths)//len(keys)``. In the above example,
batch_size is 3.
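
Continuing the sketch above, ``lengths`` holds one entry per (key, example)
pair, so the arithmetic works out as:

.. code:: python

    keys = ["product", "user"]
    lengths = [2, 0, 1, 1, 1, 1]  # 2 keys x 3 examples = 6 entries
    print(len(lengths) // len(keys))  # 3
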


Putting it all together, querying our distributed model with a KJT minibatch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Finally, we can query our model using our minibatch of products and
users.

The resulting lookup will contain a KeyedTensor, where each key (or
feature) contains a 2D tensor of size 3x64 (batch_size x embedding_dim).

.. code:: python

    pooled_embeddings = model(mb)
    print(pooled_embeddings)
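
To check the shape claim above, the ``KeyedTensor`` can be unpacked per
feature key; a small sketch, assuming the ``pooled_embeddings`` result from
the query above:

.. code:: python

    # Each key maps to a (batch_size, embedding_dim) tensor, i.e. 3 x 64 here.
    for key, tensor in pooled_embeddings.to_dict().items():
        print(key, tensor.shape)
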


More resources
--------------

For more information, please see our
`dlrm <https://github.com/pytorch/torchrec/tree/main/examples/dlrm>`__
example, which includes multinode training on the criteo terabyte
dataset, using Meta's `DLRM <https://arxiv.org/abs/1906.00091>`__.


.. |DistributedModelParallel| replace:: ``DistributedModelParallel``