Commit: b00a86a ("hm")
Parent(s): 5ee9a01

src/index.html CHANGED (+2 -2)
@@ -80,7 +80,7 @@
     This open source book is here to change that. Starting from the basics, we'll walk you through the knowledge necessary to scale the training of large language models (LLMs) from one GPU to tens, hundreds, and even thousands of GPUs, illustrating theory with practical code examples and reproducible benchmarks.
   </p>

-  <p>As the size of the clusters used to train these models has grown, various techniques, such as data parallelism, tensor parallelism, pipeline parallelism, and context parallelism as well as ZeRO and kernel fusion, have been invented to make sure that GPUs are highly utilized at all times. This significantly reduces training time and makes the most efficient use of this expensive hardware.
+  <p>As the size of the clusters used to train these models has grown, various techniques, such as data parallelism, tensor parallelism, pipeline parallelism, and context parallelism as well as ZeRO and kernel fusion, have been invented to make sure that GPUs are highly utilized at all times. This significantly reduces training time and makes the most efficient use of this expensive hardware. These distributed training techniques are not only important for building initial models but have also become essential for fine-tuning large models on specialized data, which often produces the best results. In this book, we'll progressively go over all of these techniques, from the simplest to the most refined ones, while maintaining a single story line to help you understand where each method comes from.</p>

  <aside>If you have questions or remarks, open a discussion on the <a href="https://huggingface.co/spaces/nanotron/ultrascale-playbook/discussions?status=open&type=discussion">Community tab</a>!</aside>

@@ -254,7 +254,7 @@

  <!-- <p><img alt="Picotron implements each key concept in a self-contained way, such that the method can be studied separately and in isolation." src="assets/images/placeholder.png" /></p> -->

-  <p><strong>Real training efficiency benchmarks:</strong> How to <em>actually</em> scale your LLM training depends on your infrastructure, such as the kind of chips used, interconnect, etc., so we can
+  <p><strong>3. Real training efficiency benchmarks:</strong> How to <em>actually</em> scale your LLM training depends on your infrastructure, such as the kind of chips used, interconnect, etc., so we can't give a single unified recipe for this. What we will give you is a way to benchmark several setups. This is what we've done on our cluster: we ran over 4,100 distributed experiments (over 16k including test runs) with up to 512 GPUs to scan many possible distributed training layouts and model sizes.</p>

  <!-- <iframe id="plotFrame" src="assets/data/benchmarks/benchmarks_interactive.html" scrolling="no" frameborder="0" height="840" width="720"></iframe> -->
  <div id="fragment-benchmarks_interactive"></div>
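The changed paragraph lists data parallelism first among the techniques the book covers. As a rough illustration of that simplest technique (not part of this commit; it assumes PyTorch with CUDA/NCCL and a torchrun launcher, and the toy linear model, batch size, and hyperparameters are placeholders), here is a minimal data-parallel training sketch:

# Minimal data-parallel training sketch (illustrative, not part of this commit).
# Assumes PyTorch with CUDA/NCCL, launched e.g. via: torchrun --nproc_per_node=8 ddp_sketch.py
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy model standing in for an LLM; every rank holds a full replica.
    model = torch.nn.Linear(1024, 1024).to(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):
        # Each rank consumes a different shard of the global batch; after
        # backward(), DDP all-reduces gradients so all replicas stay in sync.
        x = torch.randn(8, 1024, device=local_rank)
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()

Launched this way, each process trains on its own slice of the data while gradient all-reduce keeps the replicas synchronized, which is the starting point the later parallelism techniques build on.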