Growing up in Lajpat Nagar, New Delhi, Kartik Sawhney, who is visually impaired, did not have the option to study science in Classes 11 and 12 under CBSE. He had to advocate for his right to study the subject he wanted, and even when the opportunity finally became available, he struggled to access the necessary resources.
“Over 96% of the content available today is incompatible with various assistive technologies that people with disabilities use. So, chances that a PDF you’re going to find online is going to be accessible is only 4%,” Sawhney told AIM.
His mother often had to translate the curriculum into Braille to make it accessible for him. This was not easy, but he learnt at a very early stage how technology could come to his aid. He is now an engineer and entrepreneur with a background in AI and has worked with companies like Microsoft, Uber and IBM.
Seeking to help individuals like him, Sawhney, along with Shakul Sonker, who is also visually impaired, turned to AI to overcome the systemic barriers the community faces.
In 2018, they co-founded Inclusive Stem (I-Stem), an advocacy group that provides support and mentorship to peers studying Science, Technology, Engineering, and Mathematics (STEM). But it was during the COVID-19 pandemic that the duo decided to tap into their background and leverage AI at scale.
Sawhney, along with Sonker, who is also a computer engineer and has experience working as a machine learning engineer, started developing computer vision models.
Sonker told AIM that I-Stem’s technology is being leveraged by various universities, educational institutions and corporations across India and globally.
“For the Telangana government, our initiative involves converting K-12 books into accessible formats in both English and Telugu,” he continued. “Other customers include IIT Delhi, Ashoka University, Washington State’s Department of Services for the Blind, Google, and Intel. Our technology is also being used by the United Nations (UN), and UNICEF is our first investor.”
Microsoft, GSMA, Bosch, and National Geographic Society are among the investors, according to Sonker. “Moreover, our partners and supporters include the Nudge Institute, Stanford StartX, Morgan Stanley, Oracle, Amazon, and Goldman Sachs.”
Going beyond Optical Character Recognition
Optical Character Recognition (OCR) served as the initial step, but its limitations are well known. Although effective for extracting plain text from images, OCR falls short on content with more complex structure, such as mathematical notation. The goal for the duo was to go beyond OCR and leverage AI to translate complex STEM content.
“OCR gives you gibberish if you, for instance, use it for a maths problem. So, we thought, why not leverage AI to deeply comprehend the structure and then make it available in a way that is compatible with the screen readers, which visually impaired individuals use,” Sawhney said.
Most documents are not compatible with screen readers, and that is the big problem, according to Sawhney. What I-Stem developed is a deep learning computer vision model trained on diverse data, including academic papers, pamphlets, posters, presentations, receipts, invoices and menus, to ensure comprehensive coverage for conversion into accessible formats.
The model has an accuracy of 92% and can read complex documents. The system excels with STEM, multicolumn, and finance documents, which make up the majority of the training data. As the name of the company suggests, the duo has also emphasised making STEM content accessible, addressing a field that has long been less accessible to people with disabilities.
Sonker adds that because the accuracy falls short of 100%, the startup has a team of human remediators who check that the converted documents are accurate. This helps ensure that when a screen reader reads a document, the content is correct and in a format optimised for the best user experience.
Moreover, the startup is now working towards leveraging generative AI models like GPT-4V, the vision model from OpenAI. Conventional AI models can identify an image but struggle to describe it in detail, whereas foundational vision models make comprehensive image descriptions possible.
“Additionally, we enable users to ask questions using visual question answering, enhancing understanding of specific details in diagrams,” Sawhney said.
(I-Stem awareness session held in Bhopal)
Changing the discourse on disability
The duo also understands that ensuring content accessibility is not enough; it addresses only the first of the challenges the community encounters. Hence, a crucial part of their efforts includes community programmes.
To make their technology widely accessible, I-Stem has teamed up with non-profits that deliver the technology on the ground and provide support in classrooms.
Today, disability is often linked with low expectations. By collaborating with individuals with disabilities and showcasing their accomplishments, Sawhney and Sonker want to challenge that perception.
“Demonstrating that people with disabilities can hold positions of power helps transform the overall narrative around disability. With universities, we administer a fellowship programme targeting individuals with high potential and disabilities.
“We believe that by assisting these individuals in securing meaningful roles within high-growth industries, they can serve as influential role models. This not only paves the way for their success but also contributes to reshaping the narrative and discourse surrounding disability,” Sawhney said.
Moreover, over 70% of the disabled community in India is unemployed, a number larger than the population of Sri Lanka, said Sawhney, who also believes this is contributing to an annual loss of nearly USD 12 billion to the Indian economy.
To solve this problem, I-Stem has launched a hiring portal that helps disabled candidates connect with employers based on their past experience, work preferences, aspirations and disability.