
Time To Scale Down Large Language Models

Advancements in hardware (H100 GPUs), software (CUDA, cuBLAS, cuDNN, FlashAttention), and data quality have drastically reduced training costs.


Renowned research scientist Andrej Karpathy recently said that the llm.c project showcases how GPT-2 can now be trained in merely 24 hours on a single 8xH100 GPU node, for just $672.

Karpathy’s journey began with an interest in reproducing OpenAI’s GPT-2 for educational purposes. He initially encountered obstacles in using PyTorch, a popular deep-learning framework. 

Frustrated by these challenges, Karpathy decided to write the entire training process from scratch in C/CUDA, resulting in the creation of the llm.c project. It eventually evolved into a streamlined, efficient system for training language models.

The project, which implements GPT training in C/CUDA, has minimal setup requirements and offers efficient and cost-effective model training.

Scaling down LLMs 

In his post, Karpathy mentioned how advancements in hardware (H100 GPUs), software (CUDA, cuBLAS, cuDNN, FlashAttention), and data quality have drastically reduced training costs.

Mauro Sicard, the director of BRIX Agency, agreed with Karpathy. “With the improvements in both GPUs and training optimisation, the future may surprise us,” he said.

Scaling down LLM models while maintaining performance is a crucial step in making AI more accessible and affordable. 

According to Meta engineer Mahima Chhagani, LLMLingua is a method designed to efficiently decrease the size of prompts without sacrificing significant information. 
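As a rough sketch of how prompt compression fits into a pipeline, the open-source llmlingua package exposes a PromptCompressor. The instruction, question, and token budget below are illustrative assumptions, and the exact API may differ between versions.

```python
# pip install llmlingua  -- a minimal sketch; API details may vary by version
from llmlingua import PromptCompressor

# Loads a small causal LM that scores which tokens can be dropped
# (downloads model weights on first use).
compressor = PromptCompressor()

long_context = "..."  # e.g. retrieved documents or a long chat history

result = compressor.compress_prompt(
    long_context,
    instruction="Answer the question using the context.",  # assumed strings
    question="Why did GPT-2 training get so much cheaper?",
    target_token=200,  # assumed budget: compress the context to ~200 tokens
)

# The compressed prompt is what gets sent to the expensive model.
print(result["compressed_prompt"])
print(result["origin_tokens"], "->", result["compressed_tokens"])
```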

Chhagani said using an LLM cascade, starting with affordable models like GPT-2 and escalating to more powerful ones like GPT-3.5 Turbo and GPT-4 Turbo, optimises cost by only using expensive models when necessary.
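The cascade idea can be sketched in a few lines: ask the cheapest model first and escalate only when its answer fails a confidence check. Everything here is a hypothetical illustration, including the model names, the ask() stub, and the self-reported confidence threshold; in practice, confidence might come from token log-probs, a verifier model, or self-consistency voting.

```python
# Model tiers ordered cheapest-first; names are illustrative placeholders.
TIERS = ["gpt-2-local", "gpt-3.5-turbo", "gpt-4-turbo"]

def ask(model: str, prompt: str) -> tuple[str, float]:
    """Stub standing in for a real model call, returning (answer, confidence).
    This fake version just pretends cheaper models are less sure of themselves."""
    confidence = {"gpt-2-local": 0.5, "gpt-3.5-turbo": 0.75, "gpt-4-turbo": 0.95}[model]
    return f"answer from {model}", confidence

def cascade(prompt: str, threshold: float = 0.8) -> str:
    """Escalate through tiers until an answer clears the confidence threshold."""
    answer = ""
    for model in TIERS:
        answer, confidence = ask(model, prompt)
        if confidence >= threshold:
            break  # a cheap enough model sufficed; skip the pricier tiers
    return answer

print(cascade("What is 2 + 2?"))  # escalates to gpt-4-turbo in this stub
```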

FrugalGPT is another approach that uses multiple APIs to balance cost and performance, reducing costs by up to 98% while maintaining performance comparable to GPT-4.

Additionally, a Reddit developer named pmarks98 used a fine-tuning approach with tools like OpenPipe and models like Mistral 7B, cutting costs by up to 88%.
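The general recipe behind that approach is to log the expensive model’s prompt/response pairs and distill them into a cheaper model such as Mistral 7B. Below is a minimal, hypothetical sketch of the data-preparation step; the file name, example records, and chat-style JSONL schema are assumptions based on common fine-tuning conventions, not OpenPipe’s actual interface.

```python
import json

# Hypothetical prompt/response pairs logged from the expensive model.
logged_calls = [
    {"prompt": "Summarise this ticket: ...", "response": "Customer reports ..."},
    {"prompt": "Classify the sentiment: ...", "response": "negative"},
]

# Write a chat-style JSONL dataset, a common input format for
# fine-tuning toolchains ("training_data.jsonl" is an assumed name).
with open("training_data.jsonl", "w") as f:
    for call in logged_calls:
        example = {
            "messages": [
                {"role": "user", "content": call["prompt"]},
                {"role": "assistant", "content": call["response"]},
            ]
        }
        f.write(json.dumps(example) + "\n")
```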

Is There a Real Need to Reduce Costs?

Cheaper LLMs, especially open-source models, often have limited capabilities compared to the proprietary models from tech giants like OpenAI or Google. 

While the upfront costs may be lower, running a cheap LLM locally can lead to higher long-term costs due to the need for specialised hardware, maintenance overheads, and limited scalability.

Moreover, as pointed out by Princeton professor Arvind Narayanan, the focus has shifted from capability improvements to massive cost reductions, which many AI researchers find disappointing.

Cost over Capability Improvements

Narayanan argued that cost reductions are more exciting and impactful for several reasons. They often translate into improved accuracy on many tasks, since cheaper inference makes it practical to invoke a model many times and keep the best results. Lower costs can also accelerate the pace of research by making it more affordable and by putting more functionality within reach.

So, in terms of what will make LLMs more useful in people’s lives, cost is hands down more significant at this stage than capability, he said.

In another post, Narayanan said that the cheaper a resource gets, the more demand there will be for it. Maybe in the future it will be common to build applications that invoke LLMs millions of times in the process of completing a simple task.
This democratisation of AI could accelerate faster than we imagined, possibly leading to personal AGIs for $10 by 2029.


Anshul Vipat

Anshul Vipat is a tech aficionado, enthusiastic about the latest innovations in the digital world. He also holds a keen interest in travelling, exploring and cooking.