AI Still Needs Humans for the Cumbersome Task of Data Annotation

Currently, foundational model-based platforms can incorporate feedback and corrections from domain experts to improve the accuracy of annotations.

Illustration by Nikhil Kumar

According to recent studies, OpenAI’s GPT-4 has accurately annotated cell types using marker gene information in single-cell RNA sequencing analysis.

Generative AI has expanded the use cases of data annotation and labelling. However, human expertise will always remain core to its success.

Acknowledging this, Radha Basu, co-founder and CEO of iMerit, told AIM in an exclusive interaction, “Traditional data annotation methods, which relied on low-skill crowdsourced workforces, were effective for simple tasks but were limited in scalability and efficiency for complex data sets.”

Now, the close alignment of GPT-4’s annotations with manual ones across various tissues and cell types has helped cut down the time and expertise needed for annotation.

“Firstly, such (generative AI) models inherently have a larger context window than traditional predictive AI. Secondly, the need for data training is more at the expert practitioner level since the underlying foundation model is built with unstructured or semi-structured data,” added Basu. 

However, there have been mixed opinions on the future of data annotation platforms with generative AI. 

According to recent reports, the global data annotation market is projected to be valued at around $8.22 billion by 2028. But Voxel51 co-founder and University of Michigan professor Jason Corso believes that data annotation jobs are slowly becoming obsolete.

Contrary to his expectations, however, technology is not eating up traditional data annotation jobs.

Generative AI’s larger context window and expert-level training capabilities differentiate it from traditional data annotation methods, making it more adaptable. 

Currently, foundational model-based platforms can incorporate feedback and corrections from domain experts, such as medical professionals, agronomists, or mathematicians, to refine and improve the accuracy of annotations.

“This expert collaboration ensures the data used to train AI models is of higher quality,” Basu highlighted. “Combining generative AI’s processing power with the knowledge of trained professionals leads to more reliable and accurate AI systems.”

Basu noted that LLM-based systems can handle more intricate data labelling tasks by integrating automation into workflows, improving efficiency and scalability without long hours of manual effort. Meanwhile, human annotation has been vital for AI, driving the growth of supervised machine learning.

So, although self-supervised learning and auto-annotation are emerging, they cannot fully replace human annotation.
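
As a rough illustration of how automated labelling and expert review can share a workflow, here is a minimal Python sketch. It is not iMerit's or Ango Hub's implementation. The `Annotation` class, the function names, the 0.8 confidence threshold, and the sample cell labels are all hypothetical, chosen only to show the pattern: a model proposes labels, confident ones are accepted automatically, and the rest are queued for a domain expert who can confirm or correct them.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Annotation:
    item_id: str
    label: str
    confidence: float      # model's self-reported score in [0, 1] (assumed to exist)
    source: str = "model"  # "model" or "expert"


def route_for_review(
    annotations: List[Annotation], threshold: float = 0.8
) -> Tuple[List[Annotation], List[Annotation]]:
    """Split model proposals into auto-accepted and expert-review queues."""
    accepted = [a for a in annotations if a.confidence >= threshold]
    queue = [a for a in annotations if a.confidence < threshold]
    return accepted, queue


def apply_expert_review(
    queue: List[Annotation],
    expert: Callable[[Annotation], Optional[str]],
) -> List[Annotation]:
    """Ask the expert about each queued item.

    The expert callback returns a corrected label, or None to confirm
    the model's proposal. Reviewed items are marked as expert-sourced.
    """
    reviewed = []
    for a in queue:
        corrected = expert(a)
        label = corrected if corrected is not None else a.label
        # Assumption: an expert-reviewed label is treated as fully trusted.
        reviewed.append(Annotation(a.item_id, label, 1.0, "expert"))
    return reviewed


# Hypothetical model output for three items.
proposals = [
    Annotation("cell-1", "T cell", 0.95),
    Annotation("cell-2", "B cell", 0.55),
    Annotation("cell-3", "monocyte", 0.62),
]


def demo_expert(a: Annotation) -> Optional[str]:
    # Stand-in for a human reviewer: corrects one item, confirms the rest.
    return "NK cell" if a.item_id == "cell-2" else None


accepted, queue = route_for_review(proposals)
final = accepted + apply_expert_review(queue, demo_expert)
for a in final:
    print(a.item_id, a.label, a.source)
```

In this sketch only the two low-confidence items reach the reviewer, which is the basic trade-off described above: automation absorbs the routine cases, while human expertise is spent where the model is least certain.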

How Does iMerit Fit in?

iMerit’s proprietary platform Ango Hub leverages generative AI to propel growth in industries such as medicine, autonomous mobility, and precision agriculture. Ango Hub offers a flexible workflow manager that integrates human and machine efforts, allowing public models to be included in the process.

The platform utilises a curated network of experts for tasks like reinforcement learning from human feedback (RLHF) and supervised fine-tuning (SFT), providing high-quality, domain-specific data training.

The ability of GenAI to handle and integrate various data types will further improve its role in data annotation, leading to more comprehensive and accurate AI models.

Apart from being an exceptional technologist, Basu is also known for fostering a good work culture. Her leadership strategy centres on authenticity, dedication, and innovation, focussing on giving one’s all and taking risks.

“My leadership style is one of constant innovation while staying focused on my company’s role in the ecosystem and what our customers actually need from us. I am always looking at embracing change, mixing and matching the three pillars – technology, talent, and technique,” she added.

In the context of AI and ML innovations, the vision for iMerit is to build a responsible, inclusive organisation by prioritising a diverse, motivated workforce, applying technology to societal needs, and maintaining sustainable business practices with strong financial discipline. 

Women make up about 52% of the organisation.

India as a Market

Over the past decade, the company has adapted to the shifting data requirements by evolving from simple, prescriptive tasks to more complex, domain-specific projects requiring a consultative approach and expert collaboration. 

This evolution included the adoption of heavy dashboards and production metrics, automation, and workflow orchestration. 

Basu said that India has rapidly become a relevant market owing to the rise of GenAI companies.

“India is a very exciting market due to the rapid growth in GenAI companies and the interest in building an India stack, including local languages and problems to be solved,” said Basu.

The revenue share from the Indian market has seen rapid growth during 2023-24.

Meanwhile, there has also been a noticeable increase in inquiries for data corpus creation, domain tuning, and red-teaming in various local languages, leading to active collaborations with customers on Indian language-based stacks.

“We have to do justice to both [the customer and the workforce] in order to achieve quality and consistency,” Basu emphasised.

This has been further recognised: iMerit and Ango Hub have won two awards in India this year alone for being best in class in terms of machine learning, applications and solutions provided.

Looking ahead, Basu believes that the integration of generative AI with multimodal data (combining image, speech, text, LiDAR, and video) will change the visual domain in industries such as medical AI and autonomous mobility.

Shritama Saha

Shritama (she/her) is a technology journalist at AIM who is passionate about exploring generative AI, with a special focus on big tech, databases, healthcare, DE&I, hiring in tech and more.