NeuronX Text-generation-inference for AWS Inferentia2
Text Generation Inference (TGI) is a toolkit for deploying and serving Large Language Models (LLMs).
A Neuron backend makes it possible to deploy TGI on AWS Trainium and Inferentia chips.
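As a rough illustration of what serving a model with TGI looks like from the client side, the sketch below sends a prompt to the `/generate` route of a TGI server that is already running. The server address, the helper names `build_generate_request` and `generate`, and the parameter values are assumptions made for this example; nothing here is specific to the Neuron backend.

```python
import json
import urllib.request


def build_generate_request(prompt, max_new_tokens=64):
    """Build the JSON body for TGI's /generate route (hypothetical helper)."""
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}


def generate(base_url, prompt, max_new_tokens=64, timeout=60):
    """POST a prompt to a running TGI server and return the generated text."""
    body = json.dumps(build_generate_request(prompt, max_new_tokens)).encode("utf-8")
    request = urllib.request.Request(
        base_url.rstrip("/") + "/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload["generated_text"]


# Example call, assuming a server is listening on this address:
# print(generate("http://127.0.0.1:8080", "What is Deep Learning?"))
```

The example uses only the Python standard library so that it carries no extra dependencies; a dedicated client library could be used instead.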