embed predict_memory #47
by nouamanetazi (HF staff) - opened
assets/images/predict_memory_tool.png
ADDED
(binary file stored with Git LFS)
assets/images/profile_trace_annotated.png
CHANGED
(binary file stored with Git LFS)
dist/assets/images/memorycoalescing.png
CHANGED
(binary file stored with Git LFS)
dist/assets/images/predict_memory_tool.png
ADDED
(binary file stored with Git LFS)
dist/index.html
CHANGED
@@ -193,19 +193,12 @@
 </div>
 </div>
 
-<p>While this widget gives a theoretical breakdown the following tool can be used to predict the memory usage:</p>
-
-
-<
-
-
-</li>
-<li>
-<p>
-<a href="https://pytorch.org/docs/stable/torch_cuda_memory.html">torch_cuda_memory</a>
-</p>
-</li>
-</ul>
+<p>While this widget gives a theoretical breakdown we also made the <a href="https://huggingface.co/spaces/nanotron/predict_memory">following tool</a> that can be used to predict the memory usage during a training run:</p>
+
+<a href="https://huggingface.co/spaces/nanotron/predict_memory">
+<img src="/assets/images/predict_memory_tool.png" alt="Predict Memory Tool" />
+</a>
+
 
 <p><strong>Clear code implementations:</strong> theory is one thing, but we discover all kinds of edge cases and important details when we implement something. That’s why we link to implementation references where possible. Depending on the case, we’ll use two code references:</p>
 
@@ -1874,7 +1867,7 @@
 
 <p>Clearly, none of these techniques is a silver bullet for magical scaling and we'll often have to combine them in one way or another. Can we actually come up with a few rules that would help us find a good starting point to choose among –and combine– them? This will be the topic of our next section.</p>
 
-<h2>
+<h2>Finding the Best Training Configuration</h2>
 
 <p>We’ve now covered all the parallelism techniques that are actually used to distribute and training larger models as well as how and why they can be combined together. There remain a general question: which ones should we choose in the end and how to decide on a specific combination?</p>
 
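The paragraph added above promises a tool that predicts memory usage during a training run. As a rough illustration of what such a predictor computes, here is a minimal back-of-envelope sketch; the byte counts assume a standard mixed-precision Adam layout (bf16 parameters and gradients, fp32 master weights and optimizer moments) and ignore activations, so this is not the exact model behind the linked Space:

```python
# Back-of-envelope training-memory estimate (illustrative only; the real
# predict_memory Space accounts for much more, e.g. activations and parallelism).
def estimate_training_memory_gib(num_params: float) -> dict[str, float]:
    # Assumed mixed-precision Adam layout: 2 + 2 + 4 + 4 + 4 = 16 bytes/param.
    bytes_per_param = {
        "bf16 parameters": 2,
        "bf16 gradients": 2,
        "fp32 master weights": 4,
        "fp32 Adam momentum": 4,
        "fp32 Adam variance": 4,
    }
    gib = 1024**3
    breakdown = {name: num_params * b / gib for name, b in bytes_per_param.items()}
    breakdown["total (excluding activations)"] = sum(breakdown.values())
    return breakdown

# Example: a 7B-parameter model needs roughly 7e9 * 16 bytes ~ 104 GiB
# before activations, which is why it cannot be trained on a single GPU.
for name, gib in estimate_training_memory_gib(7e9).items():
    print(f"{name:30s} {gib:7.1f} GiB")
```

Even this crude sum shows why the breakdown matters: optimizer state and master weights dwarf the bf16 model weights themselves.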
src/index.html
CHANGED
@@ -193,19 +193,12 @@
 </div>
 </div>
 
-<p>While this widget gives a theoretical breakdown the following tool can be used to predict the memory usage:</p>
-
-
-<
-
-
-</li>
-<li>
-<p>
-<a href="https://pytorch.org/docs/stable/torch_cuda_memory.html">torch_cuda_memory</a>
-</p>
-</li>
-</ul>
+<p>While this widget gives a theoretical breakdown we also made the <a href="https://huggingface.co/spaces/nanotron/predict_memory">following tool</a> that can be used to predict the memory usage during a training run:</p>
+
+<a href="https://huggingface.co/spaces/nanotron/predict_memory">
+<img src="/assets/images/predict_memory_tool.png" alt="Predict Memory Tool" />
+</a>
+
 
 <p><strong>Clear code implementations:</strong> theory is one thing, but we discover all kinds of edge cases and important details when we implement something. That’s why we link to implementation references where possible. Depending on the case, we’ll use two code references:</p>
 
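Side note: the torch_cuda_memory page removed in these hunks documents PyTorch's CUDA memory snapshot tooling, the empirical counterpart to the predictor above. A minimal usage sketch (using the underscore-prefixed APIs from that page; requires a CUDA build of PyTorch, and the toy model/steps here are just placeholders):

```python
import torch
from torch import nn

# Record allocator events, run a few optimizer steps, then dump a snapshot
# that can be inspected at https://pytorch.org/memory_viz.
torch.cuda.memory._record_memory_history(max_entries=100_000)

model = nn.Linear(4096, 4096).cuda()
opt = torch.optim.Adam(model.parameters())
for _ in range(3):
    loss = model(torch.randn(64, 4096, device="cuda")).square().mean()
    loss.backward()
    opt.step()
    opt.zero_grad()

torch.cuda.memory._dump_snapshot("memory_snapshot.pickle")
torch.cuda.memory._record_memory_history(enabled=None)  # stop recording
```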