Meta has released Llama 3, its latest family of LLMs. The models range from 8 billion to 70 billion parameters and include both pre-trained versions and fine-tuned versions optimised specifically for dialogue applications; they are available on Microsoft Azure.
The fine-tuned Llama 3 models, designed for dialogue use cases, have demonstrated impressive performance across various benchmarks. In human evaluations assessing their helpfulness and safety, these models have proven to be on par with popular closed-source counterparts, according to Microsoft’s blog.
Meta is offering the Llama-3-8B inference APIs, alongside hosted fine-tuning capabilities, through Azure AI Studio. Azure AI Studio is a platform for developing generative AI applications, with features such as a playground for model exploration, Prompt Flow for prompt engineering, and Retrieval Augmented Generation (RAG) for integrating data into applications.
Under this offering, users can leverage the Llama-3-8B inference APIs on a pay-as-you-go basis, where billing is based on the input and output tokens utilised during model scoring.
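As a rough illustration of what a pay-as-you-go call might look like, the sketch below assembles a chat-style request for a serverless Llama-3-8B endpoint. The endpoint URL placeholder and the payload schema here are illustrative assumptions, not the documented Azure API surface; the Azure AI model catalog defines the exact contract and authentication scheme.

```python
import json

# Placeholder endpoint; a real deployment supplies its own URL.
ENDPOINT = "https://<your-deployment>.<region>.inference.ai.azure.com/v1/chat/completions"

def build_request(api_key: str, user_message: str, max_tokens: int = 256) -> dict:
    """Assemble headers and a chat-style JSON body for a token-metered call.

    Billing is based on input and output tokens, so max_tokens effectively
    caps the billable output per request. Schema is an assumption.
    """
    return {
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "messages": [{"role": "user", "content": user_message}],
            "max_tokens": max_tokens,
        }),
    }
```

The returned dictionary would then be passed to an HTTP client of your choice; no network call is made by the sketch itself.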
For models that support fine-tuning, fine-tuning jobs are billed hourly, and inference against fine-tuned models incurs token-based charges plus an hourly hosting fee.
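The billing model described above can be summarised in a small cost estimator. The rate parameters below are placeholders, not Azure's published prices, which vary by model and region.

```python
def estimate_inference_cost(input_tokens: int, output_tokens: int,
                            input_rate_per_1k: float, output_rate_per_1k: float,
                            hosting_hours: float = 0.0,
                            hosting_rate_per_hour: float = 0.0) -> float:
    """Estimate a bill: token-metered usage plus an optional hourly hosting fee.

    Base models are charged per 1,000 input/output tokens; fine-tuned models
    add an hourly hosting component. All rates are illustrative parameters.
    """
    token_cost = (input_tokens / 1000) * input_rate_per_1k \
               + (output_tokens / 1000) * output_rate_per_1k
    return token_cost + hosting_hours * hosting_rate_per_hour
```

For example, 1,000 input tokens and 1,000 output tokens at hypothetical rates of $0.50 and $1.50 per 1,000 tokens would cost $2.00 before any hosting fee.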
Integration with Azure AI Studio simplifies subscribing to and deploying Meta's Llama 3 models, offering a comprehensive environment for AI development and deployment.
Earlier this year, Meta chief Mark Zuckerberg announced that Meta trained Llama 3 on a massive compute infrastructure. The company plans to procure 350,000 Nvidia H100 GPUs by the end of this year, reaching nearly 600,000 H100-equivalents of compute when other resources are included.