AI Mysteries | AIM

Tech Meltdowns: 8 Epic Outages and What Went Wrong

Gopika Raj — Thu, 25 Jul 2024 09:29:27 +0000

According to the Uptime Institute’s 2024 Outage Analysis, between 10 and 20 “high-profile IT outages or data centre events” occur every year.

The study revealed that while power is the main cause of data centre outages, network issues are the leading cause of outages across all IT services. These outages make headlines and have serious consequences, disrupt business for customers, and damage company reputations.

More than half of the respondents said their most recent major outage cost them over $100,000, and 16% reported that it cost them over $1 million. Additionally, the report mentions the leading causes of network outages, including design and configuration, hardware, capacity, software, and environmental threats.

Here are eight major tech outages worth exploring.

Microsoft-CrowdStrike

On July 19, 2024, CrowdStrike, a security technology provider, caused a massive global IT outage, potentially the biggest in history, affecting airlines, banks, businesses, schools, and government services worldwide.

The CrowdStrike outage occurred due to a faulty update to its Falcon sensor software, which caused widespread disruptions to Windows systems globally. This led to the infamous “Blue Screen of Death” and reboot loops for millions of users.

Excluding Microsoft, US Fortune 500 companies are said to face $5.4 billion in financial losses due to the Windows outage.

Meta

On October 4, 2021, Meta platforms, including Facebook, Instagram and WhatsApp, experienced an outage lasting nearly six hours. Users faced difficulties accessing the apps, leading to a surge in traffic on competing platforms like Twitter and TikTok.

During this period, Facebook reportedly lost about $545,000 in US ad revenue per hour.
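As a rough back-of-the-envelope check, the reported hourly figure can be scaled by the outage duration. The sketch below assumes a flat loss rate and treats six hours as an upper bound, since the outage lasted “nearly six hours”; the function name is purely illustrative.

```python
def estimated_ad_revenue_loss(loss_per_hour: float, hours: float) -> float:
    """Scale a flat hourly loss rate by the outage duration."""
    if loss_per_hour < 0 or hours < 0:
        raise ValueError("loss_per_hour and hours must be non-negative")
    return loss_per_hour * hours


# Reported figure from the article: about $545,000 per hour in US ad revenue.
# Assumption: the loss rate stayed constant for the whole outage.
upper_bound = estimated_ad_revenue_loss(545_000, 6)
print(f"Upper-bound estimate: ${upper_bound:,.0f}")
```

Under these assumptions, the US ad revenue lost during the outage would have been at most roughly $3.3 million.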

Google Services

On December 14, 2020, popular Google services such as YouTube, Gmail, Google Drive, and Google Docs were down for about an hour, affecting millions of users worldwide.

The outage was attributed to a failure in Google’s authentication system, which manages user logins across its services. The issue specifically stemmed from an internal storage quota problem.

Users attempting to access these platforms encountered errors, with many reporting that they were unable to log in or retrieve their data. Google acknowledged the issue and confirmed that the services were restored for the vast majority of affected users shortly after the outage.

Fastly

On June 8, 2021, Fastly, a major content delivery network (CDN) provider, experienced a significant global outage that disrupted numerous high-profile websites, including Amazon, Reddit, and The New York Times.

The outage was triggered by a software bug introduced during a deployment on May 12. The bug remained dormant until a valid configuration change made by a customer activated it.

This led to 85% of Fastly’s network returning errors, resulting in widespread accessibility issues for many internet users around the world.

Twitter (X Corp)

Twitter suffered a major outage on December 28, 2022, leaving tens of thousands of users unable to access the platform or its features for several hours. It primarily impacted users attempting to access the platform via desktop computers.

Many reported being unexpectedly logged out, encountering error messages, and facing difficulties in viewing replies or using features like notifications and TweetDeck. The hashtag #TwitterDown trended on the platform as users shared their experiences during the outage.

AWS

On December 7, 2021, Amazon Web Services (AWS) experienced a significant outage that disrupted numerous services and affected a wide range of businesses and applications. It primarily impacted the US-East-1 region, located in Northern Virginia, which is crucial for many of AWS’s services.

The outage was caused by an automated scaling activity designed to increase capacity for a service within AWS’s main network. This action unintentionally triggered a surge in connection attempts within AWS’s internal network, overwhelming the devices managing communication between the internal and main networks.

Akamai

On June 17, 2021, a significant disruption occurred at Akamai, affecting the websites of numerous financial institutions and airlines in Australia and the United States. This outage was traced back to server-related glitches at Akamai, a major content delivery network (CDN) provider.

The incident marked the second major internet disruption in under two weeks, following the June 8 outage at rival CDN Fastly.

Akamai attributed the outage to a bug in its software, which was promptly addressed. The company confirmed that the issue was not related to any cyber-attack or security vulnerability.

Cloudflare

In November 2023, a power failure disrupted Cloudflare services for around two days. The platform relies on three data centres, and one of them lost power after the facility’s generators failed and its circuit breakers proved faulty.

As the generators failed, Cloudflare’s network routers lost power, which disrupted services reliant on the PDX-04 data centre.

The outage primarily affected Cloudflare’s dashboard, APIs, and related services, while traffic through its global network continued to function without interruption.

The post Tech Meltdowns: 8 Epic Outages and What Went Wrong appeared first on AIM.

Top 8 AI Tools for Legal Professionals

Gopika Raj — Fri, 19 Jul 2024 07:07:37 +0000

According to a 2024 generative AI report, the adoption of AI in the legal profession shows that 14% of law firm and corporate legal respondents are already using AI technology, 12% are planning to use it, 35% are considering whether or not to use it, and 40% currently have no plans to use AI.

Additionally, 27% of legal industry respondents said they use publicly available AI tools like ChatGPT today, while 43% plan to use legal-specific AI tools within the next three years.

Here are a few AI tools for legal professionals to explore.

Casetext

Casetext is one of the best legal research tools on the market, and its new AI assistant has the potential to make it even more useful. Casetext speeds up the research process with this assistant, called CoCounsel, which was developed in collaboration with OpenAI and built on top of GPT-4 with customisations for the legal industry.

Casetext claims that CoCounsel can write and comprehend at a postgraduate level. Beyond legal research, CoCounsel can also help with reviewing documents, searching databases to discover relevant information, preparing for trial, summarising findings, and laying out the basics for a legal memo.

Detangle.ai

Detangle uses AI technology to summarise lengthy legal research and transform complex jargon into plain English. By simplifying and explaining legal terminology, Detangle helps one save thousands in legal fees and boost one’s confidence in legal matters.

The detangled document provides a concise 1-2 paragraph summary, so one can immediately understand the key points. With a simple interface, one can upload assets or paste URLs to get AI-generated summaries of lengthy documents, audio files, or videos, making complex legal jargon easy to understand.

Amto

Amto utilises 692 proprietary AI algorithms, specifically trained for lawyers, to enhance online visibility, attract quality leads, and convert them into loyal clients.

Amto combines human expertise with AI for superior legal marketing; expert lawyers curate content, while AI selects ideal keywords to target the audience effectively. It ensures all strategies comply with ABA guidelines, safeguarding the integrity of one’s practice.

The services include using generative AI based on ChatGPT for drafting legal documents, letters, and emails, providing instant revisions, recommending missing clauses, identifying research gaps, and creating client reminders for deadlines and important dates.

AI Lawyer

AI Lawyer is a tool designed for everyone. It helps users avoid expensive legal consultations, long waits for appointments, and confusing legal texts.

It provides legal information and simplifies legal language for general audiences, supports lawyers in legal research and strategy development, and helps law students hone their research skills and gain insights into modern legal trends.

AI Lawyer’s best features include providing fast, easy-to-understand insights into complex legal issues, summarising uploaded documents, and creating simple legal agreements in minutes.

Lawgeex

Lawgeex’s contract review automation solution, powered by patented AI technology, reviews and redlines legal documents based on one’s predefined policies. Unlike other solutions, Lawgeex comprehends contractual context and one’s position, ensuring consistency across contracts, saving time and money, and expediting deal closures.

This AI-driven tool supports high-volume contract analysis for non-disclosures, service agreements, and more. It integrates with popular CRM systems and other apps, allowing for automatic redlining via email or direct integration, making the review process efficient and seamless.

Lex Machina

Lex Machina, a legal analytics company owned by LexisNexis, leverages AI technology to review and sort court documents, enabling users to discover trends among parties, lawyers, judges, and courts. The data includes findings, damages, and case outcomes.

Lex Machina crawls sources such as the USPTO, ITC/EDIS, PACER, and state court data every 24 hours, ensuring the AI platform has access to the latest information on millions of cases. This comprehensive data collection allows Lex Machina to deliver a vast array of AI-driven insights.

Law.co

Law.co, an AI platform for lawyers and law firms, is designed to digitise and optimise legal operations. It automates and streamlines complex legal processes and research, while enhancing contract drafting and editing with the power of GPT.

This powerhouse platform leverages AI to revolutionise law firms’ operations, offering cutting-edge features like automated legal drafting, contract review, and enhanced legal research.

Law.co also utilises GPT and an advanced semantic search engine, providing powerful access to queries with contextual understanding and continuous learning capabilities. Additionally, the platform is multilingual, making it an excellent choice for firms offering services in multiple languages.

Harvey AI

Harvey AI stands out among AI-driven legal tools for its predictive abilities. It dissects extensive legal databases, delves into case histories, and analyses legal trends to provide well-informed predictions about case trajectories.

This enables proactive decision-making based on data-driven insights, which sets the tool apart. Harvey AI’s ability to detect early warning signs and suggest risk-mitigation strategies can revolutionise internal procedures.

It goes beyond reactive legal actions, empowering organisations to make proactive decisions, a feature that marketers can highlight to showcase its unique benefits.

Top 10 AI Tools for Finance and Accounting in 2024

Gopika Raj — Thu, 18 Jul 2024 06:49:19 +0000

The State of AI in Accounting Report 2024, which explores the impact of AI on the accounting profession based on insights from 595 professionals, forecasts significant changes in the accounting industry thanks to AI.

A staggering 71% of respondents foresee substantial transformation driven by AI. Despite this enthusiasm, the report reveals a notable gap: while 82% of accountants express interest or excitement about AI, only 25% actively invest in AI training for their teams.

Moreover, the report identifies three primary areas where AI is being utilised by accounting professionals: communication, task automation, and research.

Currently, 59% of accountants use AI to compose emails, 36% to automate workflows, and 31% leverage AI tools for research purposes, highlighting the diverse applications of AI in enhancing efficiency and productivity within the accounting sector.

Here are 10 AI tools that are widely used in accounting and finance.

ClickUp

ClickUp Accounting is a cloud-based software for managing accounts and creating shareable reports.

ClickUp Brain, an AI-powered virtual assistant, connects tasks, documents, and people, helping with financial management, project detailing, and meeting updates. One can also set up client or project workspaces and organise tasks into folders and lists by service type (audits, tax filings, monthly accounting).

Trullion

Trullion’s AI-powered accounting software solution offers significant time savings, growth opportunities, and impeccable financial oversight for accounting and audit teams. It automatically verifies the numbers against reporting and compliance requirements, identifying discrepancies and potential issues before they impact the business.

The platform leverages a proprietary financial rules engine, connects to hundreds of third-party data sources, and stays current with global compliance standards, ensuring comprehensive and up-to-date financial management.

Vic.ai

Vic.ai integrates seamlessly with leading ERP systems and accounting solutions, offering flexible and scalable AI-first capabilities through an open API.

It optimises Accounts Payable processes, supports informed decision-making, and handles payment processing via card, cheque, and ACH, ensuring compatibility with all major ERPs for enhanced efficiency in financial operations.

Zeni

Zeni integrates AI to automate accounting, spending, and budgeting, simplifying financial operations with real-time data analysis for informed business decisions, blending AI with human expertise for effective expense tracking, bookkeeping, bill payments, reimbursements, and more.

Zeni provides personalised budgeting advice and a comprehensive one-page financial overview. It enables easy comparison of monthly, quarterly, and yearly reports to track progress, and simplifies data consolidation from receipts through a dedicated email address.

Docyt

Docyt AI enhances QuickBooks® with enterprise-level accounting automation, streamlining workflows for scalable business growth. One can choose from diverse plans, including expense management and automated bookkeeping for large operations.

It offers access to secure financial tools via its mobile app and automates revenue tracking, providing insights across all revenue streams. One can accelerate month-end closings with real-time accounting and smart reporting capabilities.

Booke

Booke can transform financial processes with its AI-driven Robotic Bookkeeper for QuickBooks and Xero. It helps instantly organise invoices and receipts in any language or currency. It also assists in customising fields effortlessly with drag-and-drop, while the AI learns from one’s history to code transactions accurately.

Additionally, it helps resolve coding errors, categorise transactions, and automate tasks using AI. It streamlines month-end close with powerful automation, detecting and fixing errors quickly with Booke’s advanced features.

Blue Dot

Blue Dot is an AI-driven tax compliance platform leveraging patented technology to help businesses ensure tax compliance, reduce spending vulnerabilities, and gain a comprehensive view of employee transactions.

It utilises VAT Box to identify and calculate eligible VAT spending, employs AI for detecting and analysing wage tax information under Taxable Employee Benefits, and enhances expense management workflows with its proprietary AI-driven suite, applying checks and tax rules to maintain compliance.

Gridlex

Gridlex Sky, part of the Gridlex suite, integrates accounting, expenses, and ERP functionalities to streamline financial processes. It automates revenue and expense calculations, enhancing accuracy and saving time previously spent on manual tasks.

This automation reduces errors, improves efficiency, and integrates seamlessly with Gridlex Ray for HR management and Gridlex Zip for CRM and customer service support, offering businesses a comprehensive platform for essential operations.

Truewind

Truewind is an AI-powered software designed specifically for startups, offering reliable bookkeeping and detailed financial models with minimal errors. It accelerates month-end close processes for accounting firms and internal teams, reducing administrative burdens and increasing profitability.

Accounting firms benefit from Truewind’s specialised solution, which simplifies the month-end close without traditional checklist hassles. It integrates seamlessly, eliminating the need for manual checklist transfers into software, often required by other solutions.

Stampli

Stampli streamlines invoice management for all stakeholders, including accounts payable (AP) staff, approvers, management, controllers, CFOs, and vendors, via a unified communications hub that integrates with each invoice. This fosters seamless collaboration and rapid query resolution, accelerating processing times by 5x through timely access to critical information, thereby enhancing decision-making capabilities.

Customers choose Stampli for its efficient invoice capture, coding, and approval processes, bolstering internal controls with detailed audit trails and real-time insights to optimise overall finance operations while ensuring audit readiness.

10 AI Tools for Sales and Marketing Professionals

Gopika Raj — Mon, 15 Jul 2024 07:07:46 +0000

Today, organisations are reaping the benefits of AI, enhancing productivity and company performance. According to Harvard Business Review, AI sales tools have led to a 50% increase in leads and saved organisations 40-60% in overall costs.

According to a recent report by Market.us, the market for AI in sales and marketing is projected to grow substantially, reaching $10 billion by 2033, with a compound annual growth rate (CAGR) of 16.8% during the forecast period.

In this article, we will explore the top 10 AI tools for sales and marketing.

Zoho CRM

Zoho CRM acts as a single repository, bringing together sales, marketing, and customer support activities, and streamlining processes, policies, and people on one platform. It comes with its own AI assistant, Zia, a conversational AI within Zoho Analytics that helps users turn raw data into actionable insights within seconds.

Users can start a conversation with Zia, ask her anything, receive meaningful insights such as KPIs and powerful visualisations, and quickly make critical business decisions.

This is particularly helpful for marketing teams in providing valuable feedback to sales reps on their prospects and automating sales follow-ups in specified cases.

Yesware

Yesware’s AI sales tool offers sales teams a one-stop solution for comprehensive sales outreach with features like personalised email campaigns, automated follow-ups, and data-driven analytics.

It helps sales reps by automating and streamlining email outreach, generating automated email campaigns, and providing robust real-time data-driven insights, empowering sales teams to make informed decisions and strategic adjustments.

Salesforce Einstein

Salesforce Einstein is an AI technology that uses machine learning, natural language processing, and predictive analytics to analyse data, uncover insights, and automate tasks. It creates customisable, predictive, and generative AI experiences for all business needs, ensuring safety and security.

By analysing customer behaviour and preferences, Einstein enhances sales collateral, workflows, and other processes to build more meaningful and engaging relationships.

Loom AI

Loom AI allows sales professionals to send prospects auto-enhanced videos with auto-generated scripts, which helps make sales pitches more engaging. Loom has been a game-changer for sales professionals looking to add a personal touch to their outreach and analyse prospect engagement.

Loom AI adds AI-generated titles, summaries, chapters, and custom messaging. Moreover, it connects with sales tools like Salesforce, Zoom, Slack, Google Workspace, Calendly, and more to keep the workflow running smoothly.

Gong

Gong.io is a revenue intelligence platform which uses AI to analyse customer interactions, providing actionable insights for sales and marketing teams. It offers deal intelligence, personalised coaching, performance improvement tools, and enhanced pipeline management, helping sales reps learn from top performers, identify risks, and close more deals.

Gong fosters alignment between marketing, sales, product, and customer success teams by sharing customer insights, and it identifies patterns in real conversation data to help improve sales interactions.

Drift AI

Drift is a conversational chatbot marketing and sales AI tool designed to help teams engage with website visitors in real-time, providing automated yet highly personalised insights at scale. It is an AI-powered buyer engagement platform that automatically listens, understands, and learns from buyers to create highly personalised experiences.

Additionally, Drift helps sales and marketing teams qualify and score leads based on specific custom buying signals and other criteria, ensuring sales reps know exactly where and when to focus their efforts.

InsightSquared

InsightSquared is a comprehensive sales and revenue analytics platform that enhances sales and marketing operations through various features. It utilises AI and machine learning for sales forecasting, pipeline management, and performance analytics.

The platform offers activity capture, conversation intelligence, and guided selling to improve sales efficiency. It provides revenue operations dashboards and custom reporting for data-driven decision-making. Additionally, InsightSquared supports sales coaching by analysing performance data and call recordings.

Clari

Clari’s AI sales tool is considered a revenue operations platform that helps sales teams optimise their performance with real-time insights, accurate sales forecasts, and impressive predictive analytics.

One of the things that makes Clari so functional is that it pulls and combines data from a wide array of sources. Its AI feature studies previous sales, market, and industry data alongside current data in the same categories to predict deal outcomes and/or suggest tweaks or strategy updates as needed.

HubSpot Sales Hub

HubSpot excels in CRM, offering powerful tools for lead management, automated outreach, and sales optimisation. Its advanced automation and AI-powered features, including lead scoring, enable strategic resource allocation and targeted outreach strategies, boosting conversion rates.

HubSpot Sales Hub integrates AI for forecasting, task automation, content management, and team collaboration, empowering sales teams with real-time insights and historical data analysis for informed decision-making.

Vendasta

Vendasta offers a suite of AI-powered tools to enhance sales and marketing efforts. These include automated review responses, AI-generated social media content, customised prospect reports, and personalised email campaigns.

The platform also provides AI-assisted lead generation through Snapshot Reports, PPC advertising optimisation, and automated business listing creation.

Additionally, Vendasta’s platform offers numerous automation features, including initiating email campaigns, facilitating product adoption, and identifying upsell opportunities.

India’s Space Startups Have Taken Off, Quite Literally

Anshul Vipat — Wed, 03 Jul 2024 11:31:14 +0000

India’s space sector is a success story of startups developing solutions that could make the country a global leader in the space industry.

At the AWS Summit, Clint Crosier, the director of the AWS Aerospace and Satellite business, called India the next space technology hub. AWS sees India as a significant growth market and plans to invest $12.7 billion in cloud infrastructure in India by 2030.

“We’re investing in the Indian people, the Indian economy, and Indian technology. We want to make our technology available and believe it can do wonderful things in India,” said Crosier, a former US Air Force Major General who has more than three decades of experience in space missions.

To Infinity and Beyond

Ten years ago, India had only one space startup; now it has at least 190 space technology startups, Crosier said. Last year, space startups raised $120 million in new funding, a rate that is doubling or tripling annually.

India also has huge intellectual capital. “The best technologists in the world come from India. When the thought of scaling in different countries came up, India was clearly the right place to start,” Crosier said.

India’s space business is currently valued at $8 billion, accounting for a meagre 2% of the worldwide space economy. However, Crosier expects it to reach $44 billion by 2033.
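The growth rate implied by these two figures can be estimated with the standard compound annual growth rate (CAGR) formula. The sketch below assumes the $8 billion valuation refers to 2024, the year of publication, which gives a nine-year horizon; that base year is an assumption, not something Crosier stated.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures from the article: $8 billion today, $44 billion by 2033.
# Assumption: "today" means 2024, so the horizon is 2033 - 2024 = 9 years.
implied_rate = cagr(8.0, 44.0, 2033 - 2024)
print(f"Implied CAGR: {implied_rate:.1%}")
```

Under that assumption, the sector would need to grow by roughly 21% a year to hit the 2033 figure; a different base year would shift the result.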

Crucially, the government expenditure component, which has been significant in recent years, is predicted to shrink from 27% in 2021 to less than 18% by 2040. These factors will support the rapid growth of India’s dynamic private space startup ecosystem.

Wings of Fire

The once-dominant state-run ISRO has made room for many fast-growing private ventures.

These ventures cover a wide range of activities, including Earth observation (Pixxel), in-space propulsion (Bellatrix Aerospace), and launch vehicle development (Agnikul Cosmos and Skyroot Aerospace).

India’s cost advantage in space missions, combined with ISRO’s technological knowledge, gives entrepreneurs a distinct advantage, luring global clients and paving the way for India to become a significant space power.

For example, Kawa Space, which Crosier mentioned in his talk, offers signals intelligence and maritime domain awareness as a service.

He also highlighted the significant role of Blue Sky Analytics in the fight against climate change. The company is leveraging satellite data, AI, and cloud technology to make a substantial impact.

Another key player is T-hub, a renowned accelerator and incubator in India, sponsored by the Indian government, and a part of the AWS Space Tech Accelerator Program. Crosier emphasised the importance of their participation in the program.

Crosier also emphasised India’s significant role as a pioneer in quantum key distribution, a technology that lets two parties share encryption keys in a way that exposes any attempt at eavesdropping.

He also introduced GIS Kernel, a Pune-based startup, which is involved in building satellites for quantum key distribution. Crosier hailed India’s leadership in this field, fostering a sense of pride and appreciation.

Apart from these, KCP Infra has delivered the Integrated Air Drop Test (IADT) crew module structure to ISRO for the Gaganyaan mission.

Another company helping India achieve the dream of sending a human into space is Tata Elxsi, which has designed and developed the crew module recovery models (CMRM) used to train the recovery team for the mission. Pushpak Aerospace, another Bangalore-based startup, is delivering aerospace components for ISRO.

Credit Goes to ISRO and IN-SPACe

ISRO, which has been a pioneer in space exploration in the country, has been collaborating with these startups, providing them with the much-needed fuel to carry on.

For instance, technological firms like Ananth and Data Patterns are the core manufacturers of ISRO’s ground stations, nano satellites, and automated test equipment. Dhruva Space manufactures satellites for missions in Low Earth Orbit (LEO) and beyond.

With ISRO’s help, Skyroot became the first private Indian startup to successfully test liquid propulsion engines and a 3D-printed cryogenic engine. It launched India’s first private rocket, Vikram-S, in 2022.

To facilitate private sector participation, the government has created the Indian National Space Promotion and Authorisation Centre (IN-SPACe), as a single-window, independent, nodal agency.

Going Long on India

It is not just AWS; other major space-tech companies are also investing in India. Elon Musk’s SpaceX has been collaborating with Indian space startups for a long time.

Last year, it launched Indian startup Azista BST Aerospace’s satellites for remote-sensing capabilities. The company claims it can produce 50 satellites a year and build its spacecraft at 20% lower cost than rivals elsewhere.

Musk, who is set to visit India soon, will meet some startups and explore investment opportunities in EVs, space exploration, and satellite-based services.

Major players, including Jeff Bezos’ Blue Origin, which, like Musk’s SpaceX, provides launch services among other things, met with the Indian government and entrepreneurs several times over the past two years to discuss manufacturing collaborations.

Recently, the US-headquartered Space Exploration and Research Agency (SERA), in collaboration with Blue Origin, announced India as a “partner nation” in its human flight program for citizens.

GIC, the Singapore sovereign wealth fund, and Sherpalo Ventures, based in Silicon Valley, have both invested in Indian space startups.

Sky Is No Longer The Limit

In Crosier’s words, government support, investment flows, number of tech companies, and policy environment give India the best opportunity to boost the space industry through partnerships with startups.

He said that India’s space agency is already riding high with its past successful missions. “In the future, we are going to see continued investment and funding,” shared Crosier.

This influx of capital will enable startups to build infrastructure, advance technology, attract talent, and enhance launch capabilities. India has a large talent pool of skilled engineers and technologists that companies want to tap into and support.

10 Shockingly Realistic Videos Generated Using Runway Gen-3 Alpha

Tarunya S — Wed, 03 Jul 2024 07:31:23 +0000

Runway has unveiled its newest model, Gen-3 Alpha, a text-to-video AI model that has set a new benchmark in video creation. This advanced model allows users to generate high-quality, ultra-realistic scenes that are 10 seconds long, with many different camera movements, using only text prompts, still imagery, or pre-recorded videos.

The American AI startup was founded in 2018 by Cristóbal Valenzuela, Alejandro Matamala, and Anastasis Germanidis.

“The ability to create unusual transitions has been one of the most fun and surprising ways we’ve been using Gen-3 Alpha internally,” said Runway co-founder and CTO Germanidis.

Go make art https://t.co/BRraBDEv0g
— Cristóbal Valenzuela (@c_valenzuelab) July 1, 2024

Back in February 2023, Runway released Gen-1 and Gen-2, the first commercial and publicly available foundational video-to-video and text-to-video generation models, accessible via an easy-to-use website.

Meanwhile, here’s a compilation of ten mind-blowing videos produced by Gen-3 Alpha.

An Ant’s Journey

This AI-generated video begins with an extreme close-up of an ant emerging from its nest, highlighting the intricate details of the ant’s movements and surroundings. As the camera steadily pulls back at a moderate pace, the scene gradually expands to reveal the broader environment of a neighbourhood beyond the hill.

A Giant Stone Hand

The video depicts an ultra-wide shot capturing a giant stone hand emerging from a massive pile of rocks at the base of a towering mountain. The hand, intricately carved with lifelike details, appears to be reaching out towards the sky, while the surrounding area is filled with smaller boulders and debris.

It highlights the advanced visual capabilities of AI in capturing and rendering detailed environments.

Thumbs Up in Front of a Burning Building

Here, a man stands confidently in front of a burning building, with flames roaring and smoke billowing into the sky behind him. The intense heat and bright orange glow create a dramatic backdrop. Despite the chaotic scene, he gives a ‘thumbs up’ sign.

This visual experience shows the original artwork while offering a fresh, modern perspective through the unsettling potential of AI.

Neon-Lit Dark Forest

As the Gen-3 Alpha camera zooms through the dense, shadowy depths of a dark forest, the scene is transformed by vibrant neon light emanating from the flora.

Bright plants and glowing flowers create a mesmerising, otherworldly glow that illuminates the path ahead, casting an enchanting light on the surrounding trees and foliage.

The interplay of darkness and neon colours creates a visually stunning and immersive experience, drawing viewers deeper into this mystical and illuminated woodland realm.

The Ostrich in the 1980s Kitchen

The scene opens with a slow, deliberate cinematic push-in on an ostrich standing in the centre of a quintessential kitchen. The camera glides through the warm hues of the room, capturing the intricate details of the ostrich. The kitchen is adorned with classic 1980s décor.

In a Rundown City

The video begins by showing a giant creature walking through the city, visible through the window of a building. The scene is dimly lit by a single, flickering street lamp, revealing an empty cityscape bathed in its eerie glow.

The precise features and seamless editing add to future creativity in AI.

An Aerial View

The scene opens with the mysterious cloaked figure at the centre of the frame, rising steadily into the sky as the camera captures an aerial view of a vast metropolis. The vast expanse of skyscrapers and high-rise buildings stretches out below, while the figure’s ascent is slow and deliberate.

The AI algorithms show the light refraction, intense colour, and slow zoom, creating a captivating scene.

A Young Woman

The clip features a captivating zoom-in shot of a young woman sitting alone on a wooden bench in the middle of an empty school gym. She sits expressionless as she looks into the camera. She’s dressed in casual clothing and the camera continues to zoom in on her.

Tsunami Through the Alley

This Gen-3 video shows an alleyway, framed by a vibrant array of colourful buildings. Their facades, painted in hues of blue, orange, pink, and green, create a picturesque yet surreal backdrop.

The camera captures a massive tsunami barrelling through the alley with unstoppable force, its frothing waves crashing against the buildings.

Overall, the clip showcases Gen-3 Alpha’s potential in generating high-quality, lifelike video content, making it a standout in digital animation.

Exploring Ancient Ruins

The video begins with an astronaut walking through ancient stone buildings, talking and filming himself. As he moves forward, the camera captures the intricate details of the surroundings, such as the stone walls, arched doorways, and rough textures.

A tower visible in the distance behind the astronaut adds a sense of mystery to the scene.

This is a much more advanced model than any existing model at understanding and generating videos.

The post 10 Shockingly Realistic Videos Generated Using Runway Gen-3 Alpha appeared first on AIM.

Why AI Keeps Creating Body Horror

Donna Eva — Tue, 02 Jul 2024 05:10:40 +0000

Luma AI’s Dream Machine has some pretty impressive capabilities, but its most interesting one lies in creating body horror.

While many have succeeded in jailbreaking the relatively new video generation model to generate gory or NSFW videos, most have inadvertently faced some pretty shocking results.

CW: Body Horror?

This AI video attempt to show gymnastics is one of the best examples I have seen that AI doesn’t actually understand the human body and it’s motion but is just regurgitating available data. (Which appears to be minimal for gymnastics) https://t.co/8dD2q30e4G
— Cheshire Cat ᓚᘏᗢ, (@autismsupsoc) June 29, 2024

This isn’t uncommon, as generative AI has been pretty notorious for creating nightmare fuel when it comes to generating humans. From generating too many fingers to messing up basic body proportions and fusing faces, users have been pointing out these flaws with the first iterations of DALL-E, Midjourney and Stable Diffusion.

Responding to Dream Machine’s attempt at generating the video of a gymnast, Meta’s chief AI scientist Yann LeCun implied that currently, it’s nearly impossible for video generation models to generate anything physics-based.

Video generation models do not understand basic physics.
Let alone the human body. https://t.co/qas7HS2m5p
— Yann LeCun (@ylecun) June 30, 2024

Are We Doomed to Have AI Mess Ups?

Early image generation models largely relied on layering several images and finetuning them to create a prompt-relevant image. This resulted in the models often mistaking hands and other body parts for something else.

diffusion is such an unserious algorithm

mf you're literally just repeatedly exclaiming "Enhance!" at an image of grey noise until it becomes an anime girl or whatever
— henry (@arithmoquine) June 30, 2024

This is due both to the dataset that the model relies on and to how the model goes about identifying different parts, resulting in pretty outlandish hallucinations.

Responding to a query from BuzzFeed last year, Stability AI explained the reason behind this. “It’s generally understood that within AI datasets, human images display hands less visibly than they do faces. Hands also tend to be much smaller in the source images, as they are relatively rarely visible in large form,” a spokesperson said.

Midjourney and other image generation models, over time, have managed to rectify these issues, through refining their datasets to focus on certain aspects and improving the model’s capabilities.

Just like image generation models got better, LeCun conceded that video generation models, too, would improve. However, his bold prediction was that systems that would be able to understand physics would not be generative.

“Video generation systems will get better with time, no doubt. But learning systems that actually understand physics will not be generative. All birds and mammals understand physics better than any video generation system. Yet none of them can generate detailed videos,” he said.

Forget the Horrors, What About Physics?

While the body horror aspects of AI-generated content have garnered significant attention, the more fundamental challenge lies in creating AI systems that truly understand and replicate real-world physics.

As LeCun points out, even the most advanced video generation models struggle with basic physical principles that animals intuitively grasp. Maybe improving this could solve the issue of body horror altogether.

This goes beyond just aesthetics or generating uncanny valley humans. A core challenge with AI, which includes achieving AGI, is trying to bridge the gap between pattern recognition and a genuine understanding of how the world works.

Current generative models excel at producing visually convincing imagery, but, as LeCun and many others have pointed out, they lack the underlying comprehension of cause and effect, motion, and physical interactions that govern our reality.

Addressing this challenge could require a shift in approach. Rather than focusing solely on improving generative capabilities, researchers might need to develop new architectures that can learn and apply physical principles.

This could involve incorporating physics engines, simulations, or novel training methods that emphasise understanding over mere reproduction. Maybe even trying to incorporate 3D models within datasets to give them a better understanding of how objects, including human bodies, could move in certain situations.

Though lesser known, we already have models like MotionCraft, PhyDiff and MultiPhys which make use of physics simulators and 3D models.

The future of AI in visual content creation may not lie in increasingly realistic generative models but in systems that can reason about and manipulate physical concepts. These advancements could lead to AI that avoids body horror and also produces generations that are fundamentally more coherent and aligned with our physical world.

The post Why AI Keeps Creating Body Horror appeared first on AIM.

Top 7 Papers Presented by Google at CVPR 2024

Tarunya S — Fri, 28 Jun 2024 10:30:55 +0000

CVPR 2024, the prestigious annual conference on computer vision and pattern recognition, took place from June 17 to 21 in Seattle, Washington.

Google Research, one of the key sponsors, presented over 95 papers on various topics including computer vision, AI, machine learning, deep learning, and related areas from academic, applied, and business R&D perspectives. It was also actively involved in over 70 workshops and tutorials.

“Computer vision is rapidly advancing, thanks to work in both industry and academia,” said David Crandall, professor of computer science at Indiana University, Bloomington and CVPR 2024 program co-chair.

The event saw 11,532 entries, out of which only 2,719, that is 23.58%, were accepted. Let’s take a look at the top papers presented by Google this time.

Generative Image Dynamics

Generative Image Dynamics presents a novel approach for generating realistic image sequences from a single input image. The work presents a generative model that predicts the temporal evolution of images, capturing spatial and temporal dependencies.

This approach has potential applications in video prediction. By generating realistic image sequences from a single input, it advances generative modelling and opens new possibilities for creative and interactive applications.

Rich Human Feedback for Text-to-Image Generation

The paper proposes a novel approach to leveraging human feedback for improving text-to-image generation models.

The framework allows users to give detailed feedback on generated images, such as annotations, sketches, and descriptions. This feedback is used in a novel training strategy to fine-tune and improve the text-to-image generation model.

Incorporating rich human input also addresses the limitations of current models and advances user-centric generative systems.

DiffusionLight: Light Probes for Free by Painting a Chrome Ball

The paper introduces a diffusion model that can efficiently estimate the 3D lighting environment from a single 2D image.

The diffusion model enables real-time applications like virtual try-on and augmented reality, with effective lighting estimation demonstrated on diverse inverse rendering benchmarks, surpassing prior state-of-the-art methods.

The authors have also released the source code and a demo of DiffusionLight, making it accessible for further research and development.

Eclipse: Disambiguating Illumination and Materials using Unintended Shadows

This paper, led by Google researchers, tackles inverse rendering: recovering an object’s material properties and the surrounding illumination from images of it. Because many combinations of lighting and material can explain the same photographs, the problem is inherently ambiguous.

The authors exploit the unintended shadows that unobserved occluders, such as the photographer, cast onto the object, using them as cues to separate lighting from materials. Using differentiable Monte Carlo ray tracing, the method jointly recovers the object’s spatially varying materials, the environment illumination, and the shapes of the unseen occluders.

Time-, Memory- and Parameter-Efficient Visual Adaptation

This paper was written by researchers from Google and the University of Tübingen.

The paper addresses the cost of fine-tuning large foundation models for downstream vision tasks. The authors note that existing adaptation methods are efficient only in the number of trained parameters, since they still backpropagate gradients through the whole model.

Their method avoids backpropagating through the backbone by training a lightweight parallel network that operates on features from the frozen, pretrained backbone. This makes it efficient in training time and memory as well as parameters, and the authors use it to adapt a 4-billion-parameter vision transformer for video classification.

Video Interpolation with Diffusion Models

Here, the authors present VIDIM, a generative model for video interpolation that creates a short video given a start and an end frame.

VIDIM uses cascaded diffusion models, first generating the target video at low resolution and then generating the high-resolution video conditioned on the low-resolution output.

The authors compare VIDIM with previous video interpolation methods and report that it handles cases where the underlying motion is complex, nonlinear, or ambiguous, which earlier approaches struggle with.

WonderJourney: Going from Anywhere to Everywhere

This paper introduces WonderJourney, a modular framework for perpetual 3D scene generation. Starting from any user-provided location, given as a text description or an image, it generates a journey through a long sequence of diverse yet coherently connected 3D scenes.

The framework uses a large language model to write textual descriptions of the scenes along the journey and a text-driven point cloud generation pipeline to turn them into a coherent sequence of 3D scenes.

A large vision-language model then verifies the generated scenes.

The post Top 7 Papers Presented by Google at CVPR 2024 appeared first on AIM.

Top 7 Papers Presented by Meta at CVPR 2024

Gopika Raj — Tue, 25 Jun 2024 10:15:48 +0000

CVPR 2024 (Conference on Computer Vision and Pattern Recognition) saw some of the most outstanding research papers on computer vision. As a preeminent event for new research in support of AI, ML, deep learning, and much more, it continues to lead the field.

This year, CVPR received 11,532 paper submissions, of which 2,719 were accepted, a considerable increase over last year’s 9,155 submissions and 2,359 acceptances.

CVPR, a leading-edge expo, also provides a platform for networking through tutorials and workshops, with the event attracting over 10,000 scientists and engineers annually. As it did last year, it featured research papers presented by major tech companies, including Meta and Google.

Here are some of the top papers presented by Meta.

PlatoNeRF: 3D Reconstruction in Plato’s Cave via Single-View Two-Bounce Lidar

PlatoNeRF is an innovative method for reconstructing 3D scenes from a single view using two-bounce lidar data. By combining neural radiance fields (NeRF) with time-of-flight data from a single-photon lidar system, it reconstructs both visible and occluded geometry with enhanced robustness to ambient light and low albedo backgrounds.

This method outperforms existing single-view 3D reconstruction techniques by utilising pulsed laser measurements to train NeRF, ensuring accurate reconstructions without hallucination. As single-photon lidars become more common, PlatoNeRF offers a promising, physically accurate alternative for 3D reconstruction, especially for occluded areas.

Read the full paper here.

Relightable Gaussian Codec Avatars

Meta researchers developed Relightable Gaussian Codec Avatars, which create high-fidelity, relightable head avatars capable of generating novel expressions.

The method uses a 3D Gaussian geometry model to capture fine details and a learnable radiance transfer appearance model for diverse materials, enabling realistic real-time relighting even under complex lighting.

This approach outperforms existing methods, demonstrated on a consumer VR headset. By combining advanced geometry and appearance models, it achieves exceptional visual quality and realism suitable for real-time applications like virtual reality, though further research is needed to address scalability, accessibility, and ethical considerations.

Read the full paper here.

Nymeria: A Massive Collection of Multimodal Egocentric Daily Motion in the Wild

The Nymeria dataset, the world’s largest of its kind, contains 300 hours of human motion data from 264 participants across 50 locations, captured using multimodal egocentric devices.

It includes 1200 recordings, 260 million body poses, 201.2 million images, 11.7 billion IMU samples, and 10.8 million gaze points, all synchronised into a single metric system.

The dataset features comprehensive language descriptions of human motion, totaling 310.5K sentences and 8.64 million words. It supports research tasks like motion tracking, synthesis, and understanding, with baseline results for models such as MotionGPT and TM2T.

Collected under strict privacy guidelines, the Nymeria dataset significantly advances egocentric motion understanding and supports breakthroughs in related research areas.

Read the full paper here.

URHand: Universal Relightable Hands

URHand is a universal relightable hand model using multi-view images of hands captured in a light stage with hundreds of identities.

Its key innovation is a spatially varying linear lighting model that preserves light transport linearity, enabling efficient single-stage training and adaptation to continuous illuminations without costly processes.

Combining physically-based rendering with data-driven modelling, URHand generalises across various conditions and can be quickly personalised using a phone scan. It outperforms existing methods in quality, producing realistic renderings with detailed geometry and accurate shading.

URHand is suitable for applications in gaming, social telepresence, and augmenting training data for hand pose estimation tasks, representing a significant advancement in scalable, high-fidelity hand modelling.

Read the full paper here.

HybridNeRF: Efficient Neural Rendering via Adaptive Volumetric Surfaces

HybridNeRF enhances the speed of neural radiance fields (NeRFs) by blending surface and volumetric rendering methods. While traditional NeRFs are slow due to intensive per-ray sampling in volume rendering, HybridNeRF optimises by predominantly rendering objects as surfaces.

It requires fewer samples, and reserves volumetric modelling for complex areas like semi-opaque or thin structures.

Adaptive “surfaceness” parameters dictate this hybrid approach, which improves error rates by 15-30% compared to current benchmarks and achieves real-time frame rates of over 36 FPS at 2K x 2K resolution.

Evaluated on datasets including Eyeful Tower and ScanNet++, HybridNeRF delivers state-of-the-art quality and real-time performance through innovations like spatially adaptive surfaceness, distance-adjusted Eikonal loss, and hardware acceleration techniques, advancing neural rendering for immersive applications.

Read the full paper here.

RoHM: Robust Human Motion Reconstruction via Diffusion

The paper ‘RoHM: Robust Human Motion Reconstruction via Diffusion’ introduces a method for reconstructing 3D human motion from monocular RGB(-D) videos, focusing on noise and occlusion challenges.

RoHM uses diffusion models to denoise and fill motion data iteratively, improving upon traditional methods like direct neural network regression or data-driven priors with optimisation.

It divides the task into global trajectory reconstruction and local motion prediction, managed separately with a novel conditioning module and iterative inference scheme.

RoHM outperforms existing methods in accuracy and realism across various tasks, with faster test-time performance. Future work aims to enhance real-time capability and incorporate facial expressions and hand poses.

Read the full paper here.

Learning to Localise Objects Improves Spatial Reasoning in Visual-LLMs

LocVLM is a novel approach to enhance spatial reasoning and localisation awareness in visual language models (V-LLMs) such as BLIP-2 and LLaVA. The method utilises image-space coordinate-based instruction fine-tuning objectives to inject spatial awareness, treating location and language as a single modality.

This approach improves VQA performance across image and video domains, reduces object hallucination, enhances contextual object descriptions, and boosts spatial reasoning abilities.

The researchers evaluate their model on 14 datasets across five vision-language tasks, introducing three new localisation-based instruction fine-tuning objectives and developing pseudo-data generation techniques.

Overall, LocVLM presents a unified framework for improving spatial awareness in V-LLMs, leading to enhanced performance in various vision-language tasks.

Read the full paper here.

The post Top 7 Papers Presented by Meta at CVPR 2024 appeared first on AIM.

Meet the Indian Techies who Turned into Actors

Tarunya S — Mon, 24 Jun 2024 06:13:36 +0000

The worlds of technology and entertainment are not as disparate as they may seem. Pivoting in their careers, some individuals have made a smooth transition from tech-savvy professionals to captivating actors.

One notable example is techie-turned-actor Ashton Kutcher. Before his acting career took off, he studied biochemical engineering at the University of Iowa. Kutcher has since remained deeply involved in the tech world as a successful venture capitalist and co-founder of investment firm Sound Ventures.

Here are a few Indians who have not only challenged career trajectories but also redefined the stereotypes associated with tech professionals.

Premgi Amaren

Prem Kumar Gangai Amaren is an Indian singer, composer, songwriter, actor, and comedian. His stage name, Premgi, was originally a spelling error, as it was intended to be ‘Prem G’, with the G representing Gangai.

Before entering the entertainment industry, this music director turned actor worked at HCL Technologies.

Santhanam

Santhanam, an actor and comedian primarily active in Tamil cinema, started his professional journey at Wipro before transitioning into acting. He began his career as a television comedian, gaining popularity through his well-received performances and box office success.

By the early 2010s, the film industry had quickly recognized him as the “Comedy Superstar”.

Rahul Ravindran

This artist is a multi-talented actor, director, and screenwriter, recognized primarily for his roles in Telugu films. Prior to his acting career, he worked at Infosys. In 2018, he made his directorial debut with the Telugu film Chi La Sow, for which he received the National Award for Best Original Screenplay.

Jitendra Kumar

Best known for his portrayal of Jeetu in TVF Pitchers, Jeetu Bhaiya in the series Kota Factory, and the much-loved Sachiv Ji in Amazon Prime’s series Panchayat, Kumar studied civil engineering at the Indian Institute of Technology (IIT) Kharagpur.

After completing his education, he worked briefly in a corporate job before pursuing a career in acting.

Nivin Pauly

Nivin Pauly is an actor and producer, primarily active in the Malayalam film industry. Prior to his acting career, he worked as a software engineer at Infosys in Bangalore, a position he secured through campus placements.

Nivin was employed from 2006 to 2008 before deciding to resign and pursue acting full-time. He has since received numerous accolades, including two Kerala State Film Awards, two Kerala Film Critics Association Awards, and many more.

Karthik Kumar

Karthik Kumar, an actor and stand-up comedian, had a stint at Google before venturing into acting. He has an impressive record of performing over 1000 shows across various countries including India, USA, UK, Singapore, Malaysia, and Hong Kong.

In November 2016, Karthik expressed his frustration about being typecast in certain roles, leading him to announce his retirement from the film industry. However, he made a comeback with a role in the movie Rocketry: The Nambi Effect in 2022.

Sumukhi Suresh

Sumukhi Suresh is an actor, stand-up comic, writer, and director. She worked at Mindtree before pursuing comedy and has been compared to Tina Fey by Hindustan Times. Sumukhi also launched a content platform called ‘Motormouth’ for writers to pitch stories for movies and web shows.

R Madhavan

Prior to his successful career as an actor, writer, director, and producer, predominantly in Tamil and Hindi films, R Madhavan had a background in technology. He spent a brief period working as a software programmer in Canada before returning to India to pursue his acting career.

Throughout his career, he has received numerous awards, including one National Film Award and two Tamil Nadu State Film Awards, among others. Currently, he holds the position of the president at the Film and Television Institute of India (FTII) in Pune.

Siddharth

Siddharth is a well-known actor who has worked in Tamil, Telugu, and Hindi cinema. In addition to acting, he has also contributed to films as a screenwriter, producer, and playback singer.

Before entering the film industry, he began his career in the tech field, at IBM. However, he later decided to pursue acting. In 2014, he had a highly successful year, winning critical acclaim and achieving box office success.

The post Meet the Indian Techies who Turned into Actors appeared first on AIM.

Virtual Reality Brings You Closer to God

Anshul Vipat — Sat, 22 Jun 2024 04:30:00 +0000

Recently, in Varanasi, children and elderly women were seen sitting with folded hands and peering intently into their VR headsets for a virtual darshan of the famous Kashi Vishwanath temple.

Harshit Shrivastava and his TechXR team are exploring the virtual world beyond gaming to develop immersive content for temples and other religious places around India.

“Two virtual reality devices are being used during the trial. On an average, 250 devotees get a virtual darshan of Baba Kashi Vishwanath daily,” said Shrivastava, in an interview with AIM.

God Comes Home

This feature isn’t new. In 2022, Shrivastava and his team first experimented with this technology at Ujjain’s Mahakaleshwar temple. “We set up three physical experience centres, replete with VR headsets, AR devices, and 3D-printed scale models of the sanctum sanctorum,” said Shrivastava.

Temple 360, a pet project of another startup, Tagbin, has integrated 36 temples into its website. This allows people to visit any of these temples remotely and perform virtual darshan, prayers, and rituals online.

Similarly, Experience Makkah takes pilgrims who are unable to undertake Hajj or Umrah on a virtual tour.

It uses 3D modelling to let users circle the Kaaba building, meet praying pilgrims dressed in white terry cloth garments, learn about the rituals and explore other significant landmarks. Experience Makkah’s latest version can be explored through Google Cardboard, a low-cost cardboard attachment that turns smartphones into virtual reality viewers.

Holy City, another VR application, gives a glimpse of Jerusalem’s Old City.

Saurav Bhaik, the founder and CEO of Tagbin, and his team have also developed a hologram of Shri Krishna. Devotees can ask about their life problems, and the hologram will answer based on verses from Bhagavad Gita.

“This speech-to-text and text-to-speech technology is a large language model trained on only Bhagwad Gita translations in English,” Bhaik said.

Such experiences are just one of many emerging destinations in the metaverse. In this immersive virtual world, individuals can connect via avatars, which rose in popularity during the pandemic.

The metaverse pilgrimage tours aim to replicate the feel of a pilgrimage in a virtual world. Regardless of age or medical condition, devotees can perform darshan from the comfort of their homes.

Limitations

While the experience might be immersive for devotees, the technology is expensive. Shrivastava had to import VR devices into India, with each device costing INR 1 lakh. That’s when he got the idea of bringing the VR experience to smartphones.

“Through our Durlabh Darshan app, devotees can take live darshan. The subscription cost is as low as INR 2,500 per year,” he added.

Other startups, too, need to make it accessible to all.

“In the past, VR headsets were very bulky and of low quality, requiring phones to be inserted into basic viewers like Google Cardboard. However, now high-end headsets like Apple Vision Pro are lighter, more comfortable to wear for longer periods, and provide improved visual experiences,” said Bhaik.

Difficult To Scale

Apart from the cost, scaling VR into pilgrimage has another set of problems. VR headsets have no safety requirements in India.

“The lack of data sets specific to the Indian context leads to hallucinations in these tools. For example, one of the GenAI images of Goddess Saraswati had seven toes,” said Ajit Padmanabh, the founder & CEO of Who VR. His company is integrating VR into museums and temples.

The numbers support the claim. As of 2024, the VR market in India is $789 million. In contrast, the USA, which is the leading revenue generator in this market, has a projected volume of $10,900 million.

Needs Heavy Computing

One of the key advantages of virtual pilgrimage events is accessibility. When the coronavirus put the brakes on travel, Nimrod Shanit came up with Holy City, which gives a glimpse of Jerusalem’s Old City.

“In creating these 3D spaces, hundreds of thousands of photos were captured using extremely high-resolution cameras. The data footprint of this project exceeded 50 terabytes, and the computer power required to process it was immense,” said Shanit.

Is it really worth it?

Traditionalists dismiss the idea as analogous to converting a temple into a theme park, or as a simple gimmick that does not stir spiritual energies. They raise the question: “How can one do a pilgrimage without doing a pilgrimage?”

However, Shrivastava disagrees. “The idea of introducing VR is to enhance the devotee’s experience, not replace it.”

Opportunities Galore

India’s spiritual sector is estimated to be worth between $30 billion and $40 billion. According to the tourism ministry, India’s religious tourism sector attracted 1,439 million tourists in 2022.

It is predicted to increase by 16% by 2030. The sector is expected to earn $59 billion in revenue by 2028 and provide 140 million temporary and permanent jobs by 2030. The need to augment technology is real.

“The tourism industry is starved of technology. India needs to be at the forefront of the metaverse. We cannot let our artisans and locals miss this bus the way our population missed the internet boom in the 2000s,” said Padmanabh.

The post Virtual Reality Brings You Closer to God appeared first on AIM.

Why is C++ Not Used in AI Research?

Anshul Vipat — Fri, 21 Jun 2024 12:30:00 +0000

C++, a language that once shone brightly in the late twentieth century, was at the forefront of technological advancements, particularly in space exploration.

However, the emergence of newer, more visually appealing programming languages has shifted the spotlight away from C++.

At the AI+Data Summit 2024, researcher Yejin Choi said that researchers no longer use the language for AI research.

So, is C++ becoming a relic of the past?

Not Many Takers for AI

Despite its performance benefits and applications in various AI fields, such as speech recognition and computer vision, C++ is not the go-to language for AI development.

Its complexity and steep learning curve pose significant challenges. In contrast, Python’s user-friendly nature, extensive libraries, and large developer communities have propelled it to the forefront of AI programming.

Furthermore, C++ involves manual memory management, which can result in memory leaks and errors if not done correctly. This can be a considerable issue, particularly in large-scale AI programmes.

Microsoft emphasised this issue when it revealed that about 70% of the security vulnerabilities it had patched over the previous 12 years were memory safety bugs, owing to Windows being mostly written in C and C++.

Google’s Chrome team released its own research, which revealed that memory management and safety flaws accounted for 70% of all major security bugs in the Chrome codebase, which is largely written in C++.

C++ also lacks built-in support for garbage collection and database access, and standard threading support only arrived with C++11, so these features can take extra effort to develop.

This can be particularly challenging in AI applications that require concurrent processing of data and tasks, such as deep learning and neural networks, real-time systems and embedded systems, data processing, and data science.

To overcome these limitations, developers often use additional libraries and frameworks that provide threading support, such as OpenMP or Boost. However, these can add complexity and overhead to the code, which may not be ideal for every application.

C++ is Complicated

If you’ve visited a page like the C++ FAQ, you’ll understand how hard C++ can be. A comma in the wrong location might trigger hundreds of compile errors in earlier language versions.

The language has improved since C++11, which added rvalue references and move semantics for transferring ownership, although the learning curve is still steep.

Developing a New Application

In recent years, we’ve witnessed the growth of various programming languages that could potentially replace C++ for low-level system tasks, like Rust, which provides safety and security by guarding against buffer overflows and other memory errors (and is much easier to learn than C++).

When you compare the feature sets of modern languages like C++, Python, and Rust, the C language begins to look like a dinosaur! The C standard has not had new features introduced since 2011!

The 2017 standard release included technical corrections and clarifications, and the 2023 standard release did not rock the boat either.

Is C++ Losing Popularity?

Mark Russinovich, the chief technology officer of Microsoft Azure, has stated that developers should stop starting new projects in C and C++ and that the industry should treat these languages as “deprecated”.

Ken Thompson, the Bell Labs researcher who designed the original Unix operating system, called C++ a “bad language” that is “way too big, way too complex” and “obviously built by a committee”.

GitHub compiled a list of the top ten most popular programming languages for machine learning. Python is the most popular language in machine learning repositories, with C++ being sixth.

According to Stack Overflow’s Developer Survey, people who are learning to code are more likely than professionals to prefer Python over C++.

While C++ provides advantages regarding speed and memory management, it also has disadvantages, such as a high learning curve and little community assistance.

Despite its challenges, C++ can be a powerful choice for machine learning applications that require high-performance processing and advanced memory management. The choice between C++ and Python for machine learning ultimately depends on the specific needs of the application and the developers’ skill level.

The post Why is C++ Not Used in AI Research? appeared first on AIM.

The 10 Best Videos Created by Luma AI

Tarunya S — Fri, 14 Jun 2024 11:25:27 +0000

Close on the heels of Sora and Kling comes a new contender: Dream Machine. California-based startup Luma AI, which focuses on visual AI, has unveiled this new video generator, which stands out for its use of AI to create realistic visual content.

One of the key differentiators is the photorealistic quality of its videos. The AI algorithms employed by Luma meticulously analyse and enhance every detail, from texture to lighting, ensuring that the final output looks almost indistinguishable from real-world footage.

A prime contributor to Luma’s success is AWS. Amazon’s cloud computing subsidiary has provided Luma AI with the infrastructure, exposure and practical applications, showcasing its capabilities in streamlining production processes.

“Great to see how AWS H100 training infrastructure helped the Luma AI team reduce time to train foundation models and support the launch of Dream Machine,” said Swami Sivasubramanian, vice president for data and machine learning services, AWS.

Co-founded in 2021 by CEO Amit Jain, Luma AI is currently based in San Francisco, California.

AIM decided to try out Dream Machine to produce a video. Here’s a look at it.

Meanwhile, we have also compiled a list of the top 10 mind-blowing videos produced by Dream Machine.

A Woman

This AI-generated video features a woman with a shaved head wearing a blue outfit. She appears to have a serious expression, and the background includes a building with multiple windows, suggesting an urban setting.

By allowing everyone to experiment with AI-powered video generation for free on its website, Luma AI has hit a major milestone in the field.

The Abandoned Building

The video depicts a long, narrow hallway with dim lighting, likely located in an abandoned or poorly maintained building. The corridor has graffiti writing, peeling paint, and debris scattered on the floor. The ambience is eerie and desolate.

This highlights the advanced visual capabilities of AI in capturing and rendering detailed environments.

Girl with a Pearl Earring

Here, the video uses AI to bring the timeless beauty of Johannes Vermeer’s masterpiece, ‘Girl with a Pearl Earring’, to life.

As the painting is transformed into a realistic video with every brushstroke and delicate detail, it captures the subtle play of light and shadow, the intricate textures, and the serene expression of the girl.

This visual experience shows the original artwork while offering a fresh, modern perspective through the unsettling potential of AI.

Kabosu!

With Dream Machine, this video brings Kabosu to life. Every detail, from the eyes to the fluffy coat, is rendered with creativity and high-quality visuals, demonstrating the advanced capabilities of the model.

The body reconstruction, backed by the model’s new technology, allows users to create videos in various aspect ratios. Overall, it showcases its potential in generating high-quality, life-like video content, making it a standout in the field of digital animation.

Mark Zuckerberg

The Mark Zuckerberg video made by Dream Machine showcases an innovative application of artificial intelligence and technology.

In this video, it appears as though Zuckerberg is in the middle of the woods, looking outside through a glass window. This almost-realistic clip can be viewed from multiple angles. It also captures and renders his movements and expressions, bringing a new level of realism to virtual representations.

The potential of AI in creating life-like digital avatars paves the way for future advancements in virtual communication and entertainment.

Willy Wonka Walks Off

In this video, a digitally recreated Willy Wonka walks away, expressing disappointment. The character’s facial expressions, gestures, and mannerisms align perfectly, offering a glimpse into the future of digital media and storytelling possibilities.

The precise features and seamless editing add to future creativity in AI.

Disaster Girl Meets Firefighters

This AI-generated video contains several realistic elements, such as a young girl smiling, firefighters attempting to extinguish a fire, and two officers having a conversation at the end.

This serves as an example of AI’s capacity to bridge digital content with real-world impact.

A Girl & a Zeal of Zebras

This video featuring a girl and zebras in the forest goes beyond mere visuals; it intricately weaves together elements of nature, human curiosity, and storytelling.

Set against a backdrop of lush greenery, it shows the girl’s encounter with the zebras and the seamless integration of AI technology in entertainment. Through advanced algorithms, the characters exhibit life-like movements and expressions, enhancing the immersive experience.

The Eye

The video focusing on the eye exemplifies an exploration of visual perception through advanced AI techniques. This captivating clip delves into the intricacies of the human eye, capturing its mesmerising colours.

The AI algorithms show the light refraction, intense colour, and slow zoom, creating a highly realistic and captivating scene.

The Masked People

This clip features a captivating scene in which a group of masked individuals are situated within a vibrant environment painted in striking hues of bright blue and pink. The contrasting colours of the room amplify the presence of the masked figures, creating an intriguing visual that captivates viewers.

The characters’ movements within their space are rendered with detail. The AI ensures that each gesture and reaction is natural, enhancing the viewer’s engagement and the characters’ believability.

The post The 10 Best Videos Created by Luma AI appeared first on AIM.

6 Incredible Ways LLMs are Transforming Healthcare

Anshul Vipat — Fri, 14 Jun 2024 06:09:26 +0000

Last year, Google decided to explore the use of large language models (LLMs) for healthcare, resulting in the creation of Med-PaLM, a large language model designed for medical purposes.

The model achieved an 85% score on USMLE MedQA, which is comparable to an expert doctor and surpassed similar AI models such as GPT-4.

Just like Med-PaLM, several LLMs positively impact clinicians, patients, health systems, and the broader health and life sciences ecosystem. As per a Microsoft study, 79% of healthcare organisations reported using AI technology currently.

The use of such models in healthcare is only expected to grow due to the ongoing investments in artificial intelligence and the benefits they provide.

LLMs in Medical Research

Recently, Stanford University researchers used an LLM to find a potential new heart disease treatment. Using MeshGraphNet, an architecture based on graph neural networks (GNNs), the team created a one-dimensional Reduced Order Model (1D ROM) to simulate blood flow.

MeshGraphNet provides various code optimisations, including data parallelism, model parallelism, gradient checkpointing, cuGraphs, and multi-GPU and multi-node training, all of which are useful for constructing GNNs for cardiovascular simulations.

https://twitter.com/Jousefm2/status/1772151378279899345

Llama in Medicine

Researchers at the Yale School of Medicine and the School of Computer and Communication Sciences at the Swiss science and technology institute EPFL used Llama to bring medical know-how into low-resource environments.

One such example is Meditron, a large medical multimodal foundation model suite created using LLMs. Meditron assists with queries on medical diagnosis and management through a natural language interface.

This tool could be particularly beneficial in underserved areas and emergency response scenarios, where access to healthcare professionals may be limited.

Researchers at @ICepfl & @YaleMed teamed up to build Meditron, an LLM suite for low-resource medical settings. With Llama 3, their new model outperforms most open models in its parameter class on benchmarks like MedQA & MedMCQA.

More details https://t.co/nqKebwOGKa pic.twitter.com/BHlwd8Q3zJ
— AI at Meta (@AIatMeta) April 29, 2024

According to a preprint in Nature, Meditron has been trained on medical information, including biomedical literature and practice guidelines. It has also been trained to interpret medical imaging, including X-ray, CT, and MRI scans.

Bolstering Clinical Trials

Quantiphi, an AI-first digital engineering company, uses NVIDIA NIM to develop generative AI solutions for clinical research and development. These solutions, powered by LLMs, are designed to generate new insights and ideas, thereby accelerating the pace of medical advancements and improving patient care.

Likewise, ConcertAI is advancing a broad set of translational and clinical development solutions within its CARA AI platform. The Llama 3 NIM has been incorporated to provide population-scale patient matching for clinical trials, study automation, and research.

Data Research

Mendel AI is developing clinically focused AI solutions to understand the nuances of medical data at scale and provide actionable insights. It has deployed a fine-tuned Llama 3 NIM for its Hypercube copilot, offering a 36% performance improvement.

Mendel is also investigating possible applications for Llama 3 NIM, such as converting natural language into clinical questions and extracting clinical data from patient records.

Advancing Digital Biology

Techbio and pharmaceutical companies, along with life sciences platform providers, use NVIDIA NIM for generative biology, chemistry, and molecular prediction.

This involves using LLMs to generate new biological, chemical, and molecular structures or predictions, thereby accelerating the pace of drug discovery and development.

Transcripta Bio, a company dedicated to drug discovery, has a Rosetta Stone to systematically decode the rules by which drugs affect the expression of genes within the human body. Its proprietary AI modelling tool, Conductor AI, discovers and predicts the effects of new drugs at transcriptome scale.

It also uses Llama 3 to speed up intelligent drug discovery.

BioNeMo is a generative AI platform for drug discovery that simplifies and accelerates the training of models using your own data and scaling the deployment of models for drug discovery applications. BioNeMo offers the quickest path to both AI model development and deployment.

Then there is the AtlasAI drug discovery accelerator, which is being developed by Deloitte and is powered by the BioNeMo, NeMo and Llama 3 NIM microservices.

Medical Knowledge and Medical Core Competencies

One way to enhance the medical reasoning and comprehension of LLMs is through a process called ‘fine-tuning’. This involves providing additional training with questions in the style of medical licensing examinations and example answers selected by clinical experts.

This process can help LLMs to better understand and respond to medical queries, thereby improving their performance in healthcare applications.

Examples of such tools are First Derm, a teledermoscopy application for diagnosing skin conditions, enabling dermatologists to assess and provide guidance remotely, and Pahola, a digital chatbot for guiding alcohol consumption.

ChatDoctor: A medical chat model fine-tuned on LLaMA using medical domain knowledge.

Collects data on around 700 diseases and generated 5K doctor-patient conversations to finetune the LLM.

paper: https://t.co/XaLLaem9U6
code: https://t.co/aJOCOwKDyF pic.twitter.com/YbQqXLig26
— elvis (@omarsar0) March 28, 2023

ChatDoctor, created using an extensive dataset comprising 100,000 patient-doctor dialogues extracted from a widely utilised online medical consultation platform, could be proficient in comprehending patient inquiries and offering precise advice.

Its developers used the 7B version of the LLaMA model.

The post 6 Incredible Ways LLMs are Transforming Healthcare appeared first on AIM.

Top 9 Voice-Based Generative AI Assistants Transforming Interaction

Tarunya S — Tue, 11 Jun 2024 04:28:37 +0000

Voice-based generative AI assistants are quietly revolutionising the way we interact with technology, making subtle yet impactful strides. These AI companions are not just about responding to commands anymore; they’re becoming more intuitive, empathetic, and capable of understanding complex human emotions and contexts.

While the progress may seem incremental, the depth of their capabilities is expanding rapidly. Here, we delve into the best voice-based generative AI assistants that are leading the charge.

Top 9 Voice-Based Generative AI Assistants

GPT-4o
Hume AI (EVI)
Project Astra
Pi AI
Perplexity AI
Character.ai
Claude AI
Chatsonic AI
Google Gemini

GPT-4o

First and foremost, OpenAI’s GPT-4o is more advanced and better equipped to create complex applications with many functionalities, which proves its higher level of “development” and the ability to generate more comprehensive code.

Previewed at the recent OpenAI Spring Update announcement, it is the newest flagship model that provides GPT-4-level intelligence but is faster and improves on its capabilities across text, voice, and vision.

GPT-4o is much better than any existing model at understanding and discussing the images you share.

Hume AI (EVI)

Hume AI builds AI technology focused on understanding human emotions to improve interactions between humans and machines. It aims to understand and respond to a wide range of emotional states, using these insights to guide AI development.

The company is developing specialised AI models to recognize emotions in diverse cultural contexts, addressing global user needs. Hume AI’s emotion recognition algorithms are being tested for use in virtual reality environments to create more immersive and responsive experiences.

A 20-minute unscripted, unedited, and uncut conversation with @hume_ai + @AnthropicAI about neuroscience, mental health, ancient Greek philosophy, and politics. pic.twitter.com/A5quXNcpYk
— Shan (@ShanRizvi) May 14, 2024

Project Astra

Project Astra, unveiled at Google I/O 2024, could end up as one of Google’s most important AI tools. Astra is being billed as “a universal AI agent that is helpful in everyday life”. It’s something like Google Gemini with added features and supercharged capabilities for a natural conversational experience.

Demis Hassabis says Project Astra is Google's vision for the ultimate AI agent because it is multi-modal and this will lead to truly smart assistants pic.twitter.com/UXuyIuBg9R
— Tsarathustra (@tsarnick) June 7, 2024

Pi AI

Pi, the personal AI from Inflection, is pitched as more than just another chatbot: it is designed to be there for you anytime and to evolve with every conversation. Pi stands for ‘personal intelligence’.

Pi can also express emotions and empathy, using natural language and emojis. It is designed to be a kind and supportive companion assistant.

Perplexity AI

Perplexity’s main product is its search engine, which relies on NLP. It utilises the context of the user queries to provide a personalised search result. Perplexity summarises the search results and produces a text with inline citations. It helps create, organise, and share information seamlessly.

This model is trained on large datasets of human speech, which include diverse voices, accents, and languages. The extensive training allows the model to generalise well and produce high-quality voice outputs across different contexts.

Character.ai

Character AI is an exciting and innovative AI chatbot web application that opens up a world of possibilities for interactive conversations. Its capabilities, including the ability to chat with various characters and create personalised interactions, make it a unique and engaging platform.

Claude AI

Claude’s code of ethics, speed, and ability to process large volumes of information enable you to efficiently leverage AI for complex analysis and content generation. However, it’s important to be mindful of potential inaccuracies and limited capabilities.

It is an AI assistant that can generate natural, human-like responses to users’ prompts and questions. Claude can respond to text or image-based inputs and is available on the web or through the Claude mobile app.

Chatsonic AI

Chatsonic is a solid AI-powered chatbot that can help you write blog posts, social media posts, or anything else that you can think of. Whether it’s crafting engaging blog posts, helping with creative writing, or even answering questions, Chatsonic is a reliable and versatile tool. Its ability to generate content quickly and efficiently is truly impressive.

@ChatSonicAI can now talk to images

Upload your image & let Chatsonic analyze it to:

1. Generate related content – Like captions, product descriptions, alt text, marketing copy, etc.

2. Get design reviews – Ask Chatsonic to give improvements for your ad or social media… pic.twitter.com/pjueOW0zAW
— Samanyou Garg (@SamanyouGarg) November 29, 2023

Google Gemini

Gemini for Google Cloud is a new generation of AI assistants for developers, Google Cloud services, and applications. These assist users in working and coding more effectively, gaining deeper data insights, navigating security challenges, and more.

Google co-founder Sergey Brin is credited with helping develop the Gemini LLMs, alongside other Google staff.

The post Top 9 Voice-Based Generative AI Assistants Transforming Interaction appeared first on AIM.

Meet The Indian Techies Who Turned Into Sports Stars

Sukriti Gupta — Sun, 09 Jun 2024 05:01:35 +0000

Indian-origin Saurabh Netravalkar, who represented the India under-19 team and is now USA’s top cricketer, became an international sensation after winning the recent T20 World Cup match for USA against Pakistan. The interesting part: Netravalkar is a principal member of technical staff at Oracle, where he has been working for eight years.

Before relocating to the United States in 2015, Netravalkar had a brief stint in Indian domestic cricket. He represented Mumbai in the prestigious Ranji Trophy and was part of the India U-19 team, alongside future cricket stars including KL Rahul, Mayank Agarwal, Harshal Patel, Jaydev Unadkat, and Sandeep Sharma. During the 2010 ICC U-19 World Cup, he emerged as India’s highest wicket-taker, securing nine wickets across six matches.

A graduate of the University of Mumbai and the renowned Cornell University, Netravalkar also co-founded CricDeCode, an app dedicated to cricket.

While social media is full of memes and tweets about the coder turned cricketer, we bring you a list of popular Indian sports personalities who also hold engineering degrees and once worked as techies.

Manasi Joshi

The Indian para-badminton player holds a degree in Electronics Engineering from K. J. Somaiya College of Engineering in Mumbai, and worked as a software engineer until a tragic accident in 2011 led to the amputation of her left leg.

Despite this setback, Joshi found solace in badminton, which she had played since she was six years old. She started playing para-badminton in 2012 and won a gold medal at the 2019 Para-Badminton World Championships in Switzerland, becoming the first Indian athlete to win a gold medal in the sport.

Shikha Pandey

Pandey holds a degree in Electronics and Electrical Engineering from the Goa College of Engineering and also served as an Indian Air Force officer.

After completing her engineering degree in 2010, Pandey was offered jobs by three multinational companies, but she declined all these placement offers and decided to take a year off and focus on her cricketing career.

Pandey represented Goa in domestic cricket and was part of the Indian Women’s Cricket team that won the 2017 ICC Women’s World Cup Qualifier. At the time of the 2020 ICC Women’s T20 World Cup, she held the rank of Squadron Leader.

Sathiyan Gnanasekaran

Gnanasekaran holds a degree in Information Technology from St. Joseph’s College of Engineering in Chennai, and has worked for companies like ONGC as a software engineer. He started playing table tennis as a hobby and was spotted by former Indian paddler Subramanian Raman, who encouraged him to pursue the sport seriously.

Gnanasekaran became the first Indian table tennis player to break into the World Top-25 ITTF rankings in May 2019, after attaining his career best World ranking of 24.

Ravichandran Ashwin

The famous Indian off-spinner pursued a B.Tech degree in Information Technology from SSN College of Engineering in Chennai, and worked as an engineer before turning to cricket.

Ashwin started playing cricket at the age of nine for YMCA and was coached by Chandrasekar Rao during the early part of his career. He represented the Indian under-17 team as an opening batter and later took up medium-pace bowling before switching to off-spin.

Akash Madhwal

The Mumbai Indians star of IPL 2023 pursued a degree in civil engineering from the College of Engineering Roorkee in Uttarakhand. Before turning to cricket, he worked as a practising engineer. Madhwal made his domestic cricket debut for Uttarakhand in 2019 and has since taken 67 wickets in 56 professional matches across formats.

He joined the Mumbai Indians squad in 2022 as a replacement for the injured Suryakumar Yadav but did not get to play. However, in the 2023 IPL season, Madhwal seized his opportunity and delivered a record-breaking performance in the Eliminator match against Lucknow Super Giants.

Shikha Tandon

The renowned Indian swimmer did her B.Sc. in biotechnology, genetics, and biochemistry from Jain College, Bangalore, India in 2003.

Tandon represented India at the 2004 Athens Olympics, where she participated in the 50m and 100m freestyle events, becoming the first Indian swimmer to qualify for two separate events in an Olympic competition.

She has won 146 national medals and 36 international medals, including five gold medals. After retiring from competitive swimming in 2009, she moved to the USA to pursue a graduate course in bio-sciences.

Tandon worked with the United States Anti-Doping Agency (USADA) for over five years and is currently the Director of Global Partnerships at SVEXA, an exercise intelligence and sports analytics company.

Anil Kumble

The legendary Indian cricketer holds a degree in Mechanical Engineering from Rashtreeya Vidyalaya College of Engineering in Bangalore. He began his cricketing journey at a young age, playing for his school and later for the Karnataka State team. However, he did not give up his engineering career immediately.

Before turning to cricket full-time, Kumble worked as an engineer for a brief period. He even created a software package for the Indian cricket team in 1996, which was an extension of the scoring sheet to gather data for analysis.

Javagal Srinath, the former Indian fast bowler, and EAS Prasanna, the spin legend, also hold engineering degrees.

The post Meet The Indian Techies Who Turned Into Sports Stars appeared first on AIM.

10 AI Courses from Andrew Ng You Must Take

Gopika Raj — Thu, 06 Jun 2024 09:21:35 +0000

Andrew Ng, the founder of DeepLearning.AI and co-founder of Coursera, is a prominent figure in the fields of machine learning and deep learning. His courses on AI are highly regarded because they are well-structured and provide insights into the latest developments in the field.

Ng’s courses often include practical assignments and projects that allow one to gain real-world experience in implementing deep learning algorithms and models. These courses are regularly updated to reflect the most recent developments in deep learning.

Here are the latest Andrew Ng courses that will help you gain knowledge and develop skills in AI.

AI Agents in LangGraph

In this short course, you will learn how to integrate agentic search to enhance an agent’s knowledge with query-focused answers in predictable formats. You will also learn about implementing agentic memory to save state for reasoning and debugging and see how human-in-the-loop input can guide agents at key junctures.

One can build an agent from scratch and then reconstruct it with LangGraph to thoroughly understand the framework. Finally, one will develop a sophisticated essay-writing agent that incorporates all the lessons from the course.

Enrol and get more details on the course here.

AI Agentic Design Patterns with AutoGen

In this course, you will learn how to use AutoGen to implement agentic design patterns such as multi-agent collaboration, sequential and nested chat, reflection, tool use, and planning.

You will also learn to build and combine specialised agents—like researchers, planners, coders, writers, and critics—that interact to execute complex workflows, such as generating detailed financial reports, which would otherwise require extensive manual effort.

The course includes key agentic design principles with fun demonstrations. For instance, one can build a conversational chess game with two player agents that validate moves, update the board state, and engage in lively banter about the game.

Get to know more about the course and enrol here.

Introduction to On-device AI

In this course, you will deploy a real-time image segmentation model on device, learning the essential steps for on-device deployment: neural network graph capture, on-device compilation, hardware acceleration, and validation of numerical correctness.

Additionally, you will learn how quantisation can make the model 4x faster and 4x smaller, improving performance on resource-constrained edge devices. These techniques are used to deploy models on various devices, including smartphones, drones, and robots, enabling many new and creative applications.

Get more details on the course here.

Multi AI Agent Systems with Crew AI

In this course, one will learn to break down complex tasks into subtasks for multiple AI agents, each with a specialised role.

For example, creating a research report might involve researchers, writers, and quality assurance agents working together. One can define their roles, expectations, and interactions, similar to managing a team.

Additionally, one will explore key AI techniques such as role-playing, tool use, memory, guardrails, and cross-agent collaboration, and build multi-agent systems to tackle complex tasks. Designing these agents and watching them collaborate can be both productive and enjoyable.

Enrol and get more details on the course here.

Building Multimodal Search and RAG

In this course, one will learn how contrastive learning works and how to add multimodality to RAG, allowing models to use diverse, relevant contexts to answer questions.

For instance, a query about a financial report might integrate text snippets, graphs, tables, and slides. One will also learn how visual instruction tuning integrates image understanding into language models and how to build a multi-vector recommender system using Weaviate’s open-source vector database.

Get more details on the course here.

Building Agentic RAG with LlamaIndex

This covers an important shift in RAG, where instead of having the developer write explicit routines to retrieve information for the LLM context, one can build a RAG agent with access to various tools for retrieving information.

One will learn in detail about routing, where the agent uses decision-making to direct requests to multiple tools; tool use, where one can create an interface for agents to select the appropriate tool (function call) and generate the right arguments; and multi-step reasoning with tool use.
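The routing and tool-use ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `route` function is a hypothetical keyword-and-regex stand-in for the decision an LLM would make, and the tool names (`add`, `summarise`) are invented for the example and are not part of LlamaIndex or the course.

```python
import re


def add(a: float, b: float) -> float:
    """Hypothetical tool: add two numbers."""
    return a + b


def summarise(text: str, max_words: int = 5) -> str:
    """Hypothetical tool: a crude 'summary' that keeps the first few words."""
    return " ".join(text.split()[:max_words])


# The tool interface the agent can choose from: name -> callable.
TOOLS = {"add": add, "summarise": summarise}


def route(query: str):
    """Stand-in for the LLM's decision: pick a tool name and build its arguments.

    A real agent would ask the model to choose the tool and generate the
    arguments; here a regex does the job so the sketch runs on its own.
    """
    numbers = re.findall(r"-?\d+(?:\.\d+)?", query)
    if len(numbers) >= 2:
        return "add", {"a": float(numbers[0]), "b": float(numbers[1])}
    return "summarise", {"text": query}


def run_agent(query: str):
    """Route the query, call the selected tool, and return (tool name, result)."""
    name, arguments = route(query)
    return name, TOOLS[name](**arguments)


print(run_agent("what is 2 plus 3.5"))
print(run_agent("Explain retrieval augmented generation for large documents please"))
```

The design point is the separation of concerns: tools are plain functions with named parameters, while the router only decides which one to call and with which arguments.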

Get more details on the course here.

Quantisation In Depth

In this course, you will learn to implement various linear quantisation techniques from scratch, including asymmetric and symmetric modes. Additionally, you will quantise at different granularities (per-tensor, per-channel, per-group) to maintain performance.
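As a rough illustration of what symmetric and asymmetric linear quantisation mean, here is a minimal pure-Python sketch. It is not the course’s code: the function names are hypothetical, plain lists stand in for tensors, and the asymmetric range is widened to include zero, which is a common convention assumed here.

```python
def quantise_symmetric(values, bits=8):
    """Symmetric mode: one scale, zero-point fixed at 0, signed integer range."""
    q_max = 2 ** (bits - 1) - 1                      # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / q_max if max_abs > 0 else 1.0
    q = [max(-q_max, min(q_max, round(v / scale))) for v in values]
    return q, scale


def quantise_asymmetric(values, bits=8):
    """Asymmetric mode: scale plus zero-point, unsigned integer range."""
    q_max = 2 ** bits - 1                            # 255 for 8 bits
    lo = min(min(values), 0.0)                       # assumption: range includes 0
    hi = max(max(values), 0.0)
    scale = (hi - lo) / q_max if hi > lo else 1.0
    zero_point = max(0, min(q_max, round(-lo / scale)))
    q = [max(0, min(q_max, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantise(q, scale, zero_point=0):
    """Map integers back to approximate floats."""
    return [scale * (x - zero_point) for x in q]


def quantise_per_group(values, group_size, bits=8):
    """Per-group granularity: each slice of group_size values gets its own scale."""
    return [quantise_symmetric(values[i:i + group_size], bits)
            for i in range(0, len(values), group_size)]


weights = [-1.0, -0.5, 0.0, 0.25, 2.0]
q_sym, s_sym = quantise_symmetric(weights)
q_asym, s_asym, zp = quantise_asymmetric(weights)
print(q_sym, dequantise(q_sym, s_sym))
print(q_asym, dequantise(q_asym, s_asym, zp))
```

Finer granularity (per-channel or per-group rather than per-tensor) means more scales to store, but an outlier in one group no longer stretches the scale used for all the others.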

You will be able to construct a quantizer to compress the dense layers of any open-source deep learning model to 8-bit precision. Finally, you will practice quantising weights into 2 bits by packing four 2-bit weights into a single 8-bit integer.
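The 2-bit packing step can be sketched as follows. This is a minimal illustration, not the course’s implementation: the function names are hypothetical, and the choice to store the first weight in the lowest two bits of each byte is an assumption made for the example.

```python
def pack_2bit(values):
    """Pack 2-bit integers (0..3) four at a time into bytes, lowest bits first."""
    if len(values) % 4 != 0:
        raise ValueError("number of values must be a multiple of 4")
    packed = bytearray()
    for i in range(0, len(values), 4):
        byte = 0
        for slot, v in enumerate(values[i:i + 4]):
            if not 0 <= v <= 3:
                raise ValueError("each value must fit in 2 bits (0..3)")
            byte |= v << (2 * slot)        # slot 0 -> bits 0-1, slot 3 -> bits 6-7
        packed.append(byte)
    return bytes(packed)


def unpack_2bit(packed):
    """Recover the original 2-bit integers from the packed bytes."""
    return [(byte >> (2 * slot)) & 0b11 for byte in packed for slot in range(4)]


weights = [1, 0, 3, 2, 3, 3, 0, 1]
packed = pack_2bit(weights)
print(list(packed))                        # two bytes hold eight 2-bit weights
print(unpack_2bit(packed) == weights)
```

Packing is needed because there is no native 2-bit storage type; four weights share one byte, so storage drops to a quarter of what one byte per weight would take.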

Get more details on the course here.

Prompt Engineering for Vision Models

Here, one will learn how to prompt and fine-tune vision models for personalised image generation, editing, object detection, and segmentation. Depending on the model, prompts can be text, coordinates, or bounding boxes. Additionally, one will adjust hyperparameters to shape the output.

One will learn how to work with models like Segment-Anything Model (SAM), OWL-ViT, and Stable Diffusion, and how to fine-tune Stable Diffusion using a few images to generate personalised results, such as images of a specific person.

Learn more and enrol for the course here.

Getting Started with Mistral

In this course, you will explore Mistral’s open-source models (Mistral 7B, Mixtral 8x7B) and commercial models via API calls and Mistral AI’s Le Chat website.

Implement JSON mode to generate structured outputs for direct integration into larger software systems. Also, you can use function calling for tool use, such as calling custom Python code that queries tabular data.

Ground the LLM’s responses with external knowledge sources using RAG. Build a Mistral-powered chat interface that can reference external documents. This course will help deepen one’s prompt engineering skills.

Get more details and enrol for the course here.

Preprocessing Unstructured Data for LLM

To expand an LLM’s knowledge, it’s essential to extract and normalise content from diverse formats such as PDF, PowerPoint, and HTML. This involves enriching the data with metadata to enable more powerful retrieval and reasoning.

In this course, one will learn to preprocess data for LLM applications, focusing on various document types. Also, discover how to extract and normalise documents into a common JSON format enriched with metadata for better search results.
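To make the idea of a common JSON format concrete, here is a small sketch in plain Python. The element types, metadata fields, and the title heuristic are assumptions made for illustration and are not taken from the course or from any particular library.

```python
import json
from typing import Dict, List

def normalise(text: str, filename: str, filetype: str) -> List[Dict]:
    """Split raw text into elements and attach the same metadata shape to each."""
    elements = []
    for index, block in enumerate(b.strip() for b in text.split("\n\n")):
        if not block:
            continue
        # Hypothetical heuristic: short blocks without a full stop are titles.
        is_title = len(block) < 60 and not block.endswith(".")
        elements.append({
            "type": "Title" if is_title else "NarrativeText",
            "text": block,
            "metadata": {"filename": filename, "filetype": filetype, "index": index},
        })
    return elements

raw = "Quarterly Report\n\nRevenue grew in the second quarter."
elements = normalise(raw, "report.html", "text/html")
print(json.dumps(elements, indent=2))
```

Because every element carries its source file and type, a retriever can later filter on metadata (for example, only titles from HTML files) before running a similarity search.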

The course covers techniques for document image analysis, including layout detection and vision transformers, to handle PDFs, images, and tables. Additionally, one will learn to build a RAG bot capable of ingesting diverse documents like PDFs, PowerPoints, and Markdown files.

Enrol and get more details on the course here.

The post 10 AI Courses from Andrew Ng You Must Take appeared first on AIM.

Top 12 Generative AI Courses Available on ADaSci

Sukriti Gupta — Wed, 05 Jun 2024 11:53:43 +0000

AI is rapidly evolving, and in order to stay relevant it’s important that you match the pace and keep yourself updated with the latest AI advancements.

To facilitate this, The Association of Data Scientists (ADaSci) offers a variety of AI courses designed to cater to different expertise levels, from mastering LangChain and building AI agents to understanding RAG and parameter-efficient fine-tuning.

Whether you’re a beginner in the GenAI field or a seasoned AI professional, these courses provide hands-on experience and detailed knowledge to keep you ahead in the game. ADaSci’s unique courses are not available anywhere else.

Discover the top 12 AI courses available on ADaSci and unlock new opportunities.

Generative AI Crash Course with Hands-on Implementations

This course will help you get an in-depth understanding of GenAI and its popular models. Participants will gain detailed knowledge of GPT models, diffusion models, different NLP transformers and ChatGPT. The course will further provide you with hands-on knowledge of implementing GenAI models in real-world applications.

This course caters to everyone, from beginners in GenAI looking to deepen their understanding and practical skills to professionals in AI and related fields seeking to update their knowledge with the latest advancements in GenAI.

Mastering LangChain: A Hands-on Workshop for Building Generative AI Applications

This LangChain workshop will help participants master GenAI for innovative applications across industries. You will learn to build and deploy custom AI agents, leveraging LangChain for transformative personalised solutions.

Participants should have a foundational understanding of AI and basic programming skills, preferably in Python.

Diving Deeper into Retrieval-Augmented Generation (RAG) with Vector Databases

This course will help you master the core principles of RAG and its advantages over pure generative models. Participants will delve into advanced AI techniques, unlocking the synergy between RAG and vector databases. You will also understand the tools and strategies for building, deploying, and optimising RAG systems.

Parameter-efficient Fine-tuning of Large Language Models

This workshop will help you understand parameter-efficient fine-tuning (PEFT) techniques and their benefits for LLM adaptation. Participants will learn methods like LoRA, adapters, and prompt tuning to achieve remarkable results using fewer parameters.

You will also get hands-on experience building and evaluating your own PEFT model on provided datasets. With this course, you can master resource-efficient training strategies and deployment options for PEFT models.

Building Generative AI Applications with Amazon Bedrock

This hands-on course will provide you with a solid understanding of the Amazon Bedrock architecture, capabilities, and applications. It will help participants develop skills in building and deploying GenAI applications on Bedrock, allowing them to gain insights into real-world use cases, best practices, and the future potential of Bedrock.

Mastering Prompt Engineering for LLMs

With this course, participants will understand the fundamentals of prompt engineering and master the art of crafting, optimising, and customising prompts for various AI models.

It will help you explore various prompting concepts and techniques such as Zero-shot and Few-shot Prompting, Chain of Thought Prompting, Knowledge Generation Prompting, and more.
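The difference between zero-shot and few-shot prompting comes down to whether worked examples are placed before the query. A minimal sketch, assuming a simple "Input/Output" layout that is only one of many possible prompt formats:

```python
from typing import List, Tuple

def build_prompt(task: str, examples: List[Tuple[str, str]], query: str) -> str:
    """Assemble a prompt; an empty examples list gives a zero-shot prompt."""
    lines = [task, ""]
    for text, label in examples:
        lines += [f"Input: {text}", f"Output: {label}", ""]
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)

task = "Classify the sentiment of each input as positive or negative."
examples = [("I loved this film.", "positive"), ("The service was slow.", "negative")]

zero_shot = build_prompt(task, [], "The battery died in a day.")
few_shot = build_prompt(task, examples, "The battery died in a day.")
print(few_shot)
```

A chain-of-thought variant would add a short worked rationale to each example's output rather than only the final label.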

The LLMOps: Streamlining the GenAI & LLM Operations

This course can be beneficial in understanding the fundamentals of LLMOps and its role in GenAI-powered systems and NLP.

Participants will develop knowledge about the workings of LLMOps and explore its challenges such as model training, deployment, monitoring, and maintenance. They will also learn the design process of LLMOps and acquire practical skills in innovating within the LLMOps operations.

Autonomous AI Agents and AI Copilots

This course will teach you the foundational concepts behind building AI agents and delve into different ML techniques that make them smarter. It also examines the challenges of creating dependable AI agents and the ethical considerations that come with them.

Through this course, you’ll be able to analyse the potential benefits and limitations of autonomous AI agents and AI copilots in different application domains such as healthcare, finance, and creative work.

You will also understand various techniques in autonomous AI agents and copilots such as BabyAGI, MetaGPT, and Semantic Kernels.

Advanced RAG with Pinecone

This course will take your text generation skills to the next level. It will help you master the utilisation of Pinecone for information retrieval in RAG.

You’ll learn about integrating knowledge bases and crafting powerful prompts, creating informative and creative text outputs.

Building Multi-Agent LLMs with AutoGen

With this course, you’ll learn how to build multi-agent LLMs and create collaborative AI systems using the AutoGen framework.

It will also help you unlock real-world applications, exploring how multi-agent LLMs can be applied in various domains for problem-solving.

Vector Search Techniques with Weaviate

This course will help you explore advanced vector search techniques using Weaviate, a vector search engine. You will learn about Weaviate’s architecture, features, and capabilities for vector-based search and semantic querying.

Participants will dive into hands-on exercises to master indexing, querying, and optimising vector search performance.

Generative AI Application Development with Azure

This course equips you with essential skills to develop, deploy, and monitor GenAI applications using Microsoft Azure. You will gain hands-on experience with Azure’s powerful AI services, enhance your technical expertise, and learn to develop scalable AI solutions.

The post Top 12 Generative AI Courses Available on ADaSci appeared first on AIM.

18 Free AI Courses by NVIDIA in 2024

Shritama Saha — Mon, 03 Jun 2024 08:39:42 +0000

NVIDIA is one of the most influential hardware giants in the world. Apart from its much sought-after GPUs, the company also provides free courses to help you understand more about generative AI, GPU, robotics, chips, and more.

Most importantly, all of these are available free of cost and can be completed in less than a day. Let’s take a look at them.

1. Accelerating Data Science Workflows with Zero Code Changes

Efficient data management and analysis are crucial for companies in software, finance, and retail. Traditional CPU-driven workflows are often cumbersome, but GPUs enable faster insights, driving better business decisions.

In this workshop, one will learn to build and execute end-to-end GPU-accelerated data science workflows for rapid data exploration and production deployment. Using RAPIDS-accelerated libraries, one can apply GPU-accelerated machine learning algorithms, including XGBoost, cuGraph’s single-source shortest path, and cuML’s KNN, DBSCAN, and logistic regression.

More details on the course can be checked here – https://learn.nvidia.com/courses/course-detail?course_id=course-v1:DLI+T-DS-03+V1

2. Generative AI Explained

This self-paced, free online course introduces generative AI fundamentals, which involve creating new content based on different inputs. Through this course, participants will grasp the concepts, applications, challenges, and prospects of generative AI.

Learning objectives include defining generative AI and its functioning, outlining diverse applications, and discussing the associated challenges and opportunities. All you need to participate is a basic understanding of machine learning and deep learning principles.

To take the course and learn more, check it out here – https://learn.nvidia.com/courses/course-detail?course_id=course-v1:DLI+S-NP-01+V1

3. Digital Fingerprinting with Morpheus

This one-hour course introduces participants to developing and deploying the NVIDIA digital fingerprinting AI workflow, providing complete data visibility and significantly reducing threat detection time.

Participants will gain hands-on experience with the NVIDIA Morpheus AI Framework, designed to accelerate GPU-based AI applications for filtering, processing, and classifying large volumes of streaming cybersecurity data.

Additionally, they will learn about the NVIDIA Triton Inference Server, an open-source tool that facilitates standardised deployment and execution of AI models across various workloads. No prerequisites are needed for this tutorial, although familiarity with defensive cybersecurity concepts and the Linux command line is beneficial.

To take the course and learn more, check it out here – https://courses.nvidia.com/courses/course-v1:DLI+T-DS-02+V2/

4. Building A Brain in 10 Minutes

This course delves into neural networks’ foundations, drawing from biological and psychological insights. Its objectives are to elucidate how neural networks employ data for learning and to grasp the mathematical principles underlying a neuron’s functioning.

While anyone can execute the code provided to observe its operations, a solid grasp of fundamental Python 3 programming concepts—including functions, loops, dictionaries, and arrays—is advised. Additionally, familiarity with computing regression lines is also recommended.
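The link between a neuron and a regression line can be shown in a few lines of Python. This is a sketch rather than the course notebook: it assumes a single linear neuron, `y = w * x + b`, trained by gradient descent on mean squared error, with a made-up dataset and learning rate.

```python
from typing import List, Tuple

def train_neuron(xs: List[float], ys: List[float],
                 lr: float = 0.05, epochs: int = 2000) -> Tuple[float, float]:
    """Fit y = w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Points that lie exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = train_neuron(xs, ys)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

The learned weight and bias are the slope and intercept of the regression line, which is why familiarity with regression helps here.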

To take the course and learn more, check it out here – https://courses.nvidia.com/courses/course-v1:DLI+T-FX-01+V1/

5. An Introduction to CUDA

This course delves into the fundamentals of writing highly parallel CUDA kernels designed to execute on NVIDIA GPUs.

One can gain proficiency in several key areas: launching massively parallel CUDA kernels on NVIDIA GPUs, orchestrating parallel thread execution for large dataset processing, effectively managing memory transfers between the CPU and GPU, and utilising profiling techniques to analyse and optimise the performance of CUDA code.

Here is the link to know more about the course – https://learn.nvidia.com/courses/course-detail?course_id=course-v1:DLI+T-AC-01+V1

6. Building A Brain in 10 Minutes

To take the course and learn more, check it out here – https://courses.nvidia.com/courses/course-v1:DLI+T-FX-01+V1/

7. Augment your LLM Using RAG

Retrieval Augmented Generation (RAG), devised by Facebook AI Research in 2020, offers a method to enhance an LLM’s output by incorporating real-time, domain-specific data, eliminating the need for model retraining. RAG integrates an information retrieval module with a response generator, forming an end-to-end architecture.
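The two-part architecture can be sketched without any external services. The example below is illustrative only: the documents are invented, word overlap stands in for a real vector search, and `generate` is a placeholder for whatever LLM call an application would use.

```python
import re
from typing import Callable, List, Set

# Hypothetical domain-specific documents.
DOCS = [
    "The warranty covers hardware faults for two years.",
    "Support is available by email on weekdays.",
    "Refunds are processed within ten working days.",
]

def tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, docs: List[str], k: int = 1) -> List[str]:
    """Retrieval module: rank documents by word overlap with the query."""
    q = tokens(query)
    ranked = sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, context: List[str]) -> str:
    """Place the retrieved passages in front of the question."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

def answer(query: str, generate: Callable[[str], str]) -> str:
    """End-to-end flow: retrieve, build the prompt, then call the generator."""
    return generate(build_prompt(query, retrieve(query, DOCS)))

# Echo the prompt instead of calling a model, to show what the generator receives.
print(answer("How long does the warranty cover hardware?", generate=lambda p: p))
```

Because the model only sees retrieved text at query time, updating `DOCS` changes the answers without any retraining.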

Drawing from NVIDIA’s internal practices, this introduction aims to provide a foundational understanding of RAG, including its retrieval mechanism and the essential components within NVIDIA’s AI Foundations framework. By grasping these fundamentals, you can initiate your exploration into LLM and RAG applications.

To take the course and learn more, check it out here – https://courses.nvidia.com/courses/course-v1:NVIDIA+S-FX-16+v1/

8. Getting Started with AI on Jetson Nano

The NVIDIA Jetson Nano Developer Kit empowers makers, self-taught developers, and embedded technology enthusiasts worldwide with the capabilities of AI.

This user-friendly, yet powerful computer facilitates the execution of multiple neural networks simultaneously, enabling various applications such as image classification, object detection, segmentation, and speech processing.

Throughout the course, participants will utilise Jupyter iPython notebooks on Jetson Nano to construct a deep learning classification project employing computer vision models.

By the end of the course, individuals will possess the skills to develop their own deep learning classification and regression models leveraging the capabilities of the Jetson Nano.

Here is the link to know more about the course – https://learn.nvidia.com/courses/course-detail?course_id=course-v1:DLI+S-RX-02+V2

9. Building Video AI Applications at the Edge on Jetson Nano

This self-paced online course aims to equip learners with skills in AI-based video understanding using the NVIDIA Jetson Nano Developer Kit. Through practical exercises and Python application samples in JupyterLab notebooks, participants will explore intelligent video analytics (IVA) applications leveraging the NVIDIA DeepStream SDK.

The course covers setting up the Jetson Nano, constructing end-to-end DeepStream pipelines for video analysis, integrating various input and output sources, configuring multiple video streams, and employing alternate inference engines like YOLO.

Prerequisites include basic Linux command line familiarity and understanding Python 3 programming concepts. The course leverages tools like DeepStream, TensorRT, and requires specific hardware components like the Jetson Nano Developer Kit. Assessment is conducted through multiple-choice questions, and a certificate is provided upon completion.

For this course, you will require hardware including the NVIDIA Jetson Nano Developer Kit or the 2GB version, along with compatible power supply, microSD card, USB data cable, and a USB webcam.

To take the course and learn more, check it out here – https://courses.nvidia.com/courses/course-v1:DLI+S-IV-02+V2/

10. Build Custom 3D Scene Manipulator Tools on NVIDIA Omniverse

This course offers practical guidance on extending and enhancing 3D tools using the adaptable Omniverse platform. Taught by the Omniverse developer ecosystem team, participants will gain skills to develop advanced tools for creating physically accurate virtual worlds.

Through self-paced exercises, learners will delve into Python coding to craft custom scene manipulator tools within Omniverse. Key learning objectives include launching Omniverse Code, installing/enabling extensions, navigating the USD stage hierarchy, and creating widget manipulators for scale control.

The course also covers fixing broken manipulators and building specialised scale manipulators. Required tools include Omniverse Code, Visual Studio Code, and the Python Extension. Minimum hardware requirements comprise a desktop or laptop computer equipped with an Intel i7 Gen 5 or AMD Ryzen processor, along with an NVIDIA RTX Enabled GPU with 16GB of memory.

To take the course and learn more, check it out here – https://courses.nvidia.com/courses/course-v1:DLI+S-OV-06+V1/

11. Getting Started with USD for Collaborative 3D Workflows

In this self-paced course, participants will delve into the creation of scenes using human-readable Universal Scene Description ASCII (USDA) files.

The programme is divided into two sections: USD Fundamentals, introducing OpenUSD without programming, and Advanced USD, using Python to generate USD files.

Participants will learn OpenUSD scene structures and gain hands-on experience with OpenUSD Composition Arcs, including overriding asset properties with Sublayers, combining assets with References, and creating diverse asset states using Variants.

To learn more about the details of the course, here is the link – https://learn.nvidia.com/courses/course-detail?course_id=course-v1:DLI+S-FX-02+V1

12. Assemble a Simple Robot in Isaac Sim

This course offers a practical tutorial on assembling a basic two-wheel mobile robot using the ‘Assemble a Simple Robot’ guide within the Isaac Sim GPU platform. The tutorial spans around 30 minutes and covers key steps such as connecting a local streaming client to an Omniverse Isaac Sim server, loading a USD mock robot into the simulation environment, and configuring joint drives and properties for the robot’s movement.

Additionally, participants will learn to add articulations to the robot. By the end of the course, attendees will gain familiarity with the Isaac Sim interface and documentation necessary to initiate their own robot simulation projects.

The prerequisites for this course include a Windows or Linux computer capable of installing Omniverse Launcher and applications, along with adequate internet bandwidth for client/server streaming. The course is free of charge, with a duration of 30 minutes, focusing on Omniverse technology.

To take the course and learn more, check it out here – https://courses.nvidia.com/courses/course-v1:DLI+T-OV-01+V1/

13. How to Build Open USD Applications for Industrial Twins

This course introduces the basics of the Omniverse development platform. One will learn how to get started building 3D applications and tools that deliver the functionality needed to support industrial use cases and workflows for aggregating and reviewing large facilities such as factories, warehouses, and more.

The learning objectives include building an application from a kit template, customising the application via settings, creating and modifying extensions, and expanding extension functionality with new features.

To take the course and learn more, check it out here – https://learn.nvidia.com/courses/course-detail?course_id=course-v1:DLI+S-OV-13+V1

14. Disaster Risk Monitoring Using Satellite Imagery

Created in collaboration with the United Nations Satellite Centre, the course focuses on disaster risk monitoring using satellite imagery, teaching participants to create and implement deep learning models for automated flood detection. The skills gained aim to reduce costs, enhance efficiency, and improve the effectiveness of disaster management efforts.

Participants will learn to execute a machine learning workflow, process large satellite imagery data using hardware-accelerated tools, and apply transfer-learning for building cost-effective deep learning models.

The course also covers deploying models for near real-time analysis and utilising deep learning-based inference for flood event detection and response. Prerequisites include proficiency in Python 3, a basic understanding of machine learning and deep learning concepts, and an interest in satellite imagery manipulation.

To take the course and learn more, check it out here – https://courses.nvidia.com/courses/course-v1:DLI+S-ES-01+V1/

15. Introduction to AI in the Data Center

In this course, you will learn about AI use cases, machine learning, and deep learning workflows, as well as the architecture and history of GPUs. With a beginner-friendly approach, the course also covers deployment considerations for AI workloads in data centres, including infrastructure planning and multi-system clusters.

The course is tailored for IT professionals, system and network administrators, DevOps, and data centre professionals.

To take the course and learn more, check it out here – https://www.coursera.org/learn/introduction-ai-data-center

16. Fundamentals of Working with Open USD

In this course, participants will explore the foundational concepts of Universal Scene Description (OpenUSD), an open framework for detailed 3D environment creation and collaboration.

Participants will learn to use USD for non-destructive processes, efficient scene assembly with layers, and data separation for optimised 3D workflows across various industries.

Also, the session will cover Layering and Composition essentials, model hierarchy principles for efficient scene structuring, and Scene Graph Instancing for improved scene performance and organisation.

To know more about the course check it out here – https://learn.nvidia.com/courses/course-detail?course_id=course-v1:DLI+S-OV-15+V1

17. Introduction to Physics-informed Machine Learning with Modulus

High-fidelity simulations in science and engineering are hindered by computational expense and time constraints, limiting their iterative use in design and optimisation.

NVIDIA Modulus, a physics machine learning platform, tackles these challenges by creating deep learning models that outperform traditional methods by up to 100,000 times, providing fast and accurate simulation results.

One will learn how Modulus integrates with the Omniverse Platform and how to use its API for data-driven and physics-driven problems, addressing challenges from deep learning to multi-physics simulations.

To take the course and learn more, check it out here – https://learn.nvidia.com/courses/course-detail?course_id=course-v1:DLI+S-OV-04+V1

18. Introduction to DOCA for DPUs

The DOCA Software Framework, in partnership with BlueField DPUs, enables rapid application development, transforming networking, security, and storage performance.

This self-paced course covers DOCA fundamentals for accelerated data centre computing on DPUs, including visualising the framework paradigm, studying BlueField DPU specs, exploring sample applications, and identifying opportunities for DPU-accelerated computation.

One gains introductory knowledge to kickstart application development for enhanced data centre services.

To take the course and learn more, check it out here – https://learn.nvidia.com/courses/course-detail?course_id=course-v1:DLI+S-NP-01+V1

Additional Inputs Contributed – Gopika Raj

The post 18 Free AI Courses by NVIDIA in 2024 appeared first on AIM.

Meet the Team Spearheading OpenAI’s Safety and Security Committee

Vandana Nair — Sun, 02 Jun 2024 06:30:00 +0000

The announcement of OpenAI’s new Safety and Security Committee, tasked with crucial decision-making in OpenAI projects and operations, got the internet buzzing, considering CEO Sam Altman is a part of it too.

The discussions revolved around a likely early arrival of GPT-5 and how the committee is a safety bunker for OpenAI. However, the most interesting aspect of this announcement seems to be the members on this committee.

In addition to being led by OpenAI Board directors, the group will also have technical and policy experts to guide them. With Altman in the lead, here’s the team spearheading OpenAI’s new Safety and Security Committee.

Bret Taylor

American entrepreneur and computer programmer Bret Taylor joined the board after Altman was reinstated as CEO following a brief ousting. A former co-CEO of Salesforce, Taylor brings vast experience, having also served on the boards of tech companies such as Twitter and Shopify. He was also the co-creator of Google Maps.

Taylor is a close friend of Altman and stood by him during last year’s ousting episode. Recently, Taylor and Larry Summers (another board member) reacted sharply to accusations made by Helen Toner, a former board member who was removed after Altman’s reinstatement as CEO. Toner had cited Altman lying to the board multiple times and withholding information as some of the reasons for his ousting.

Taylor and Summers rejected Toner’s claims and said they were disappointed that she was discussing these issues.

Adam D’Angelo

Adam D’Angelo, co-founder and CEO of Quora, also the former CTO of Facebook, joined the board as an independent director in 2018. He was the only board member whose position remained unaffected after Altman’s ousting and reinstatement as the CEO.

D’Angelo is also the founder of Poe, a multi-chatbot platform that allows users to interact with many of the LLMs available in the market.

Jakub Pachocki

OpenAI’s new chief scientist, Jakub Pachocki, took over Ilya Sutskever’s role upon his exit. Leading OpenAI’s research efforts, Pachocki is one of the technical experts on the new safety committee. In Sutskever’s exit announcement on X, Pachocki was referred to as having ‘excellent research leadership’.

Born in Poland, Pachocki excelled in programming contests during his studies and even won $10,000 at the Google Code Jam in 2012. Having studied computer science at the University of Warsaw until 2013, he went on to do a PhD in the same subject at Carnegie Mellon University.

Interestingly, Pachocki took up the role of the director of research in October last year, a month before Altman’s sacking.

Ilya introduced me to the world of deep learning research, and has been a mentor to me, and a great collaborator for many years. His incredible vision for what deep learning could become was foundational to what OpenAI, and the field of AI, is today. I am deeply grateful to him… https://t.co/nsbMIOZHpS
— Jakub Pachocki (@merettm) May 14, 2024

John Schulman

One of the co-founders of OpenAI and its head of alignment science, John Schulman is a prominent researcher. At OpenAI, he is focussed on creating and improving algorithms that allow machines to learn from interactions with their environment.

Schulman pursued his undergraduate studies in physics at Caltech and later switched to neuroscience at UC Berkeley before completing his PhD in electrical engineering and computer sciences. His academic work laid the foundation for his future research in reinforcement learning and deep learning.

In a recent podcast with Dwarkesh Patel, Schulman spoke about his anticipation of AGI safety. “If AGI came way sooner than expected, we would definitely want to be careful about it. We might want to slow down a little bit on training and deployment until we’re pretty sure we know we can deal with it safely,” he said.

Matthew Knight

The head of security at OpenAI, Matthew Knight, joined the company in 2020. With a strong background in hardware, software, and wireless security, Knight leads the efforts to ensure the safety and security of OpenAI’s AI models and systems. This also includes ensuring the robustness of AI models against adversarial attacks.

Prior to joining OpenAI, Knight co-founded Agitator, a startup that developed secure and resilient dynamic radio frequency spectrum management technologies.

Lilian Weng

The head of safety systems at OpenAI, Lilian Weng, joined OpenAI in 2018 as a research scientist. At OpenAI, Weng’s work has largely focused on developing algorithms that enable machines to learn, adapt, and perform complex tasks autonomously.

Weng has contributed to the development of advanced reinforcement learning techniques, which are used to train AI agents to make decisions by interacting with their environment and learning from the outcomes of their actions.

She earned her PhD in electrical engineering and computer science from the Massachusetts Institute of Technology.

Aleksander Madry

The head of preparedness at OpenAI, Aleksander Madry, is a professor at MIT in the department of electrical engineering and computer science. He earned his PhD in computer science from MIT and has since become a leading figure in AI research, particularly focusing on machine learning, optimisation, and algorithmic robustness.

Nicole Seligman

A member of the board of directors at OpenAI, Nicole Seligman, is a corporate and civic leader and lawyer. Former EVP and general counsel at Sony Corporation, Seligman currently serves on three public company corporate boards – Paramount Global, MeiraGTx Holdings PLC, and Intuitive Machines Inc. Seligman has made significant contributions to the fields of law and corporate governance.

The post Meet the Team Spearheading OpenAI’s Safety and Security Committee appeared first on AIM.

10 Must Watch OpenAI GPT-4o Demos

Siddharth Jindal — Tue, 14 May 2024 13:30:00 +0000

At the OpenAI Spring Update, OpenAI CTO Mira Murati unveiled GPT-4o, a new flagship model that enriches its suite with ‘omni’ capabilities across text, vision, and audio, promising iterative rollouts to enhance both developer and consumer products in the coming weeks.

With GPT-4o, OpenAI trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network. While introducing the model, OpenAI made several demonstrations to showcase its capabilities. Here, we have cherry-picked the top ones.

For customer service

This was a fun one! Take a look at 2 AI agents resolving a customer service claim with #OpenAI new #GPT4o.

Working with customers to build transformational solutions always gets me fired up. The potential solutions we can build with this new SOTA model has my head spinning! pic.twitter.com/86SNgNI6Tl
— Joe Beutler (@JoeBeutler) May 14, 2024

OpenAI’s GPT-4o is capable of engaging in natural and realistic voice conversations. This capability of ChatGPT makes it an ideal solution for building customer service chatbots, where two AI agents can collaborate to resolve customer service claims.

Real Time Translation

Live audience request for GPT-4o realtime translation pic.twitter.com/VSj5phFKM6
— OpenAI (@OpenAI) May 13, 2024

During the spring update event, OpenAI’s CTO, Mira Murati, demonstrated the real-time translation capabilities of GPT-4o, successfully translating Italian to English and vice versa. This feature poses a significant threat to Google Translate and Duolingo, which offer similar services.

Interestingly, Duolingo stock fell 3.5%, wiping out ~$250M in market value, within minutes of OpenAI demoing the real-time translation capabilities of GPT-4o.

Human-Computer-Computer Interaction

Introducing GPT-4o, our new model which can reason across text, audio, and video in real time.

It's extremely versatile, fun to play with, and is a step towards a much more natural form of human-computer interaction (and even human-computer-computer interaction): pic.twitter.com/VLG7TJ1JQx
— Greg Brockman (@gdb) May 13, 2024

GPT-4o can reason across text, audio, and video in real-time. It’s extremely versatile, fun to play with, and is a step towards a much more natural form of human-computer interaction (and even human-computer-computer interaction). In this demo, you can see how OpenAI President Greg Brockman moderated a conversation between two ChatGPTs.

AI Education and Tutor

This demo is insane.

A student shares their iPad screen with the new ChatGPT + GPT-4o, and the AI speaks with them and helps them learn in *realtime*.

Imagine giving this to every student in the world.

The future is so, so bright. pic.twitter.com/t14M4fDjwV
— Mckay Wrigley (@mckaywrigley) May 13, 2024

In another demo presented by Khan Academy, a student shared their screen with ChatGPT using GPT-4o. ChatGPT assisted the student step-by-step in solving a mathematical problem. Rather than providing the entire solution at once, ChatGPT guided the student towards it. Additionally, students can share their notebooks using their mobile camera, and ChatGPT will be able to understand the content.

Meeting AI with GPT-4o

One demo that's easy to miss, but I think is significant in what it shows is likely to be possible soon, is this demo — GPT-4o for meetings: https://t.co/UeT5285R9c
— Greg Brockman (@gdb) May 13, 2024

Through the desktop app, GPT-4o can join online meetings and moderate them, offering its own input to help with decisions. It can also transcribe and summarise meeting discussions in real time, so that no important details are missed and participants have a reliable reference.

Assistant for Visually Impaired Individuals

GPT-4o as tested by @BeMyEyes: pic.twitter.com/WeAoVmxUFH
— Greg Brockman (@gdb) May 14, 2024

Be My Eyes, a mobile app designed for visually impaired individuals, tested GPT-4o’s vision capabilities to assist a visually impaired person in navigating the city. ChatGPT was able to accurately identify the location and minute details of the surroundings.

Unlike human volunteers who may not be available at all times, GPT-4o can offer continuous support, ensuring that visually impaired users have access to assistance whenever they need it.

Interview Prep

Interview prep with GPT-4o pic.twitter.com/st3LjUmywa
— OpenAI (@OpenAI) May 13, 2024

In this demonstration, ChatGPT helps a candidate prepare for an interview. Using the front camera, ChatGPT can tell whether the candidate is dressed appropriately. Moreover, it can also help with preparations by conducting mock interviews and providing feedback on answers, highlighting strengths and areas for improvement to enhance performance.

Jam with ChatGPT

Lullabies and whispers with GPT-4o pic.twitter.com/5T7ob0ItuM
— OpenAI (@OpenAI) May 13, 2024

GPT-4o has a surprise talent – it can sing! Users can request personalised songs for special occasions like birthdays, anniversaries, or just for fun. The chatbot can generate a variety of tunes and melodies based on emotions or specific details provided by the user, from soft whispers to energetic anthems.

AI Coding Assistant

Live demo of coding assistance and desktop app pic.twitter.com/GlSPDLJYsZ
— OpenAI (@OpenAI) May 13, 2024

OpenAI has introduced the ChatGPT app for desktop. The app allows for voice conversations, screenshot discussions, and instant access to ChatGPT, acting as a friendly, go-to colleague in times of crisis. It can help with any problem you come across, from writing code to brainstorming ideas.

Rock, Paper, Scissors with GPT-4o

6. Rock, Paper, Scissors with GPT-4o pic.twitter.com/oMuMRRbrKO
— Angry Tom (@AngryTomtweets) May 13, 2024

You can also play games like Rock, Paper, Scissors, with ChatGPT acting as the referee. It can even hype you up and cheer for you during the game.

The post 10 Must Watch OpenAI GPT-4o Demos appeared first on AIM.

Top 5 Reasons Why You Must Participate in Bhasha Techathon

Siddharth Jindal — Fri, 10 May 2024 05:36:29 +0000

India’s most exciting hackathon, Bhasha Techathon, is organised by MachineHack in collaboration with Digital India Bhashini Division and Google Cloud to innovate technology solutions for Indian languages.

India, a land of vibrant cultures and diverse tongues, deserves to have its rich linguistic heritage reflected in the technological landscape.

Bhashini aims to develop robust AI models that understand and process Indian languages effectively. This paves the way for a more inclusive digital world where everyone, regardless of their primary language, can access information, engage with technology, and participate in the digital economy.

Here are the five reasons why you should participate in this hackathon.

[Continue reading until the end, where you’ll find our cheat sheet]

Addressing Crucial Language Challenges

Bhasha Techathon addresses six critical problem statements in NLP, ranging from voice-to-text applications to video-to-text conversions and the categorisation of complaints.

These challenges are not only technical but also highly relevant to real-world applications, providing participants with the opportunity to work on projects that have direct societal impacts, particularly in enhancing accessibility and understanding across India’s multitude of languages.

Open to All

One of the most compelling reasons to participate in the Bhasha Techathon is its inclusivity. Whether you are a student, a professional, or simply an AI enthusiast, the techathon welcomes individuals from all backgrounds. This inclusivity fosters a diverse environment where different perspectives and skills come together to innovate and solve complex problems.

Collaboration and Networking

Participants can either compete individually or as part of a team. This setup not only enhances collaboration, allowing individuals to learn from each other, but also provides a fantastic networking opportunity. Engaging with peers and industry leaders can open doors to future collaborations and career opportunities, especially as participants are invited to present their solutions to a jury of experts.

Career Advancement

The techathon is not just about winning; it’s about building and showcasing your capabilities. Participants gain hands-on experience with the latest technologies in AI and NLP, guided by the expertise of leaders from Google Cloud and MachineHack. This experience is invaluable and can significantly boost one’s career, providing exposure to practical applications of theoretical knowledge.

Recognition and Rewards

The rewards at Bhasha Techathon are substantial, with prize money offered to the top performers. However, beyond the financial incentives, participants gain recognition for their skills and innovations. This recognition can enhance their professional profile and open up further opportunities in the tech industry.

[Click here to participate now!]

[End Date: 15th May 2024]

A Cheatsheet for Bhasha Techathon Participants

Let’s break down each problem statement and provide key pointers on how to approach them:

Chatbot Assistance in Regional Languages for MOPR Users

Language Support: Integrate all 22 scheduled Indian languages and prioritise language selection functionality for user convenience.
NLP Integration: Train NLP models extensively on diverse datasets to ensure accurate understanding of queries in different regional languages.
Contextual Understanding: Develop algorithms that analyze user queries considering specific Panchayati Raj terminology and nuances.
Database Integration: Establish APIs to retrieve relevant information from Ministry of Panchayati Raj databases seamlessly.
User Interface Design: Design an intuitive chatbot interface with clear language selection options and instructions for users.
Testing and Evaluation: Conduct rigorous testing across languages and gather user feedback for continuous improvement.

Conversion of FAQs Section on the Website

Multilingual Support: Enable access to FAQs in all 22 Indian languages with a language selector for user preference.
Translation and Transliteration: Ensure accurate presentation of FAQ content using translation and transliteration techniques.
Interactive Chatbot: Implement language-specific interactive chatbots for real-time engagement.
NLP Capabilities: Integrate NLP for conversational understanding and response to user queries.
Search Functionality: Include language-specific search features for quick access to relevant information.
Multimedia Integration: Enhance FAQs with multimedia elements for enhanced user experience.
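
The language selector and language-specific search described above can be sketched in a few lines of Python. This is only a minimal illustration: the FAQS data, the two-letter language codes, and the function names faqs_for and search_faqs are all hypothetical, and a real entry would plug in proper translation and NLP components rather than plain substring matching.

```python
# Hypothetical FAQ store keyed by language code; the entries are made up.
FAQS = {
    "en": [
        {"q": "How do I register a complaint?",
         "a": "Use the online complaint form."},
        {"q": "How do I track my application?",
         "a": "Enter your reference number on the status page."},
    ],
    "hi": [
        {"q": "शिकायत कैसे दर्ज करें?",
         "a": "ऑनलाइन शिकायत फ़ॉर्म का उपयोग करें।"},
    ],
}

# Assumption: fall back to English when a language has no FAQs yet.
DEFAULT_LANGUAGE = "en"


def faqs_for(language):
    """Return the FAQ list for the selected language, or the default list."""
    return FAQS.get(language, FAQS[DEFAULT_LANGUAGE])


def search_faqs(language, keyword):
    """Case-insensitive substring search within one language's FAQs."""
    needle = keyword.casefold()
    return [
        entry
        for entry in faqs_for(language)
        if needle in entry["q"].casefold() or needle in entry["a"].casefold()
    ]


if __name__ == "__main__":
    for entry in search_faqs("en", "track"):
        print(entry["q"], "->", entry["a"])
```

Keeping each language's FAQs in a separate list means search results never mix languages, at the cost of needing an explicit fallback for languages that have not been translated yet.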

Voice to Text and Complaint Categorization through AI/ML

Voice-to-Text Conversion: Develop accurate voice message transcription in 22 Indian languages.
Text Embedding: Use techniques like Word2Vec for efficient complaint categorisation based on word relationships.
NLP Processing: Employ NLP for text preprocessing and feature extraction to improve complaint analysis accuracy.
Integration with CMS: Seamlessly integrate categorised complaints into existing systems for analysis and reporting.
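
As a rough sketch of the embedding-based categorisation idea above: each complaint is turned into a vector by averaging its word vectors, then assigned to the category whose seed words are closest by cosine similarity. The tiny hand-made vectors below are only a stand-in for trained Word2Vec embeddings, and the category names, seed words, and function names are hypothetical.

```python
import math

# Toy 3-dimensional vectors standing in for trained Word2Vec embeddings.
# Real embeddings would be learned from a corpus and be far larger.
EMBEDDINGS = {
    "water": (0.9, 0.1, 0.0),
    "pipe": (0.8, 0.2, 0.0),
    "leak": (0.7, 0.1, 0.1),
    "road": (0.1, 0.9, 0.0),
    "pothole": (0.0, 0.8, 0.1),
    "power": (0.0, 0.1, 0.9),
    "outage": (0.1, 0.0, 0.8),
}

# Hypothetical complaint categories, each described by a few seed words.
CATEGORY_SEEDS = {
    "water_supply": ["water", "pipe", "leak"],
    "roads": ["road", "pothole"],
    "electricity": ["power", "outage"],
}


def embed(words):
    """Average the vectors of known words; return None if none are known."""
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        return None
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def categorise(complaint):
    """Assign a transcribed complaint to the nearest category centroid."""
    text_vector = embed(complaint.lower().split())
    if text_vector is None:
        return "uncategorised"
    centroids = {name: embed(seeds) for name, seeds in CATEGORY_SEEDS.items()}
    return max(centroids, key=lambda name: cosine(text_vector, centroids[name]))


if __name__ == "__main__":
    print(categorise("Huge pothole on the road"))
```

Words missing from the vocabulary are simply ignored here; a production system would also need tokenisation suited to each of the 22 languages.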

Video-to-text and Complaint Categorization through AI/ML

Video-to-Text Conversion: Develop systems for accurate video transcription and consider multi-modal analysis for complaint understanding.
Complaint Categorization: Train AI/ML models to categorise transcribed text from videos using NLP techniques.
Embedding and NLP Processing: Utilize techniques like BERT for semantic understanding and sentiment analysis.
Integration with CMS: Ensure seamless integration of categorised complaints into existing systems for efficient processing.

CDSS in Multiple Indian Languages

CDSS Development: Create a comprehensive CDSS with multilingual support and adaptive recommendations.
Interface Design: Develop a user-friendly interface supporting all 22 Indian languages with customisation options.
Medical Terminology: Incorporate accurate medical terminology in each Indian language for precision.
Language-Adaptive Recommendations: Train the CDSS to deliver recommendations in chosen languages considering linguistic nuances.
Compliance: Ensure adherence to regulatory guidelines and standards for healthcare technologies in India.

What are you waiting for?

The Bhasha Techathon isn’t just a competition; it’s a call to action. It’s a chance to leverage your tech skills for the greater good while propelling yourself to the forefront of AI innovation. Imagine developing a language translation tool that empowers rural communities or a virtual assistant that speaks your native tongue. The possibilities are boundless!

The post Top 5 Reasons Why You Must Participate in Bhasha Techathon appeared first on AIM.

10 Best Online AI Courses for Free in 2024

Donna Eva — Mon, 29 Apr 2024 06:41:08 +0000

With AI enjoying unprecedented prominence across the globe, demand for material on how exactly it works has shot up. The good news is that access to the best online AI courses has never been more open.

Several universities have shot to the top of the leaderboard in terms of offering courses in AI and data science. Nearly 75 universities figured in the 2024 QS World University Rankings for data science and artificial intelligence, compared to barely 20 in 2023.

This spells an increased interest not only in learning about AI but also in teaching it. But what does this spell for you?

Even if you’re already taking a course and are interested in further widening your horizons, there is an endless supply of online courses that you could take to upskill yourself. This would prove helpful especially with most jobs moving towards using AI in their daily functioning.

However, finding the perfect course that is both informative and affordable may be difficult. So, here is a rundown of some of the best courses on AI being offered for free right now.

Best Online AI Certification Courses Available for Free in 2024

Artificial Intelligence Course by MIT
Big Data, Artificial Intelligence and Ethics by University of California
Artificial Intelligence Courses by Harvard University
Machine Learning Specialisation by Stanford University
Machine Learning Foundations by University of Washington
Artificial Intelligence by Georgia Institute of Technology
AI for Anyone Course by Google
Introduction to AI by Intel
AI Foundations Course by IBM
Machine Learning and AI Course by AWS

1. Massachusetts Institute of Technology

MIT has made available Patrick Winston’s 6.034 Artificial Intelligence course on its website. The course runs through the basics of knowledge representation, problem-solving and learning methods for AI.

It includes lectures from Prof Winston, as well as access to all assignments, examinations, readings, tutorials and demonstrations needed to complete the course. The course itself is self-paced and completely free.

Learn more about it here.

2. University of California, Davis

UCD is currently offering a course on ‘Big Data, Artificial Intelligence, and Ethics’ through Coursera. The course goes through opportunities available in big data, and how exactly AI works. It also advertises opportunities to interact with IBM Watson and a focus on understanding natural language processing.

Learn more about the course here.

3. Harvard University

Harvard offers several free courses on artificial intelligence, ranging from the basics of AI to its implications for business and policy. There are a total of seven courses available, with courses on data science, machine learning, Python and even the fundamentals of TinyML.

Learn more about the courses here.

4. Stanford University

Stanford University Online offers a course titled ‘Machine Learning Specialisation’ from the Stanford School of Engineering. The self-paced course is being offered through Coursera, where interested applicants can learn about all things ML from Andrew Ng.

The course includes modules on multiple linear regression, logistic regression, neural networks, and clustering, among others.

Learn more about the course here.

5. University of Washington

The University of Washington is offering a course on ‘Machine Learning Foundations: A Case Study Approach’ through Coursera. The self-paced 18-hour course covers machine learning and deep learning concepts, as well as a rundown on Python programming.

Learn more about the course here.

6. Georgia Institute of Technology

Georgia Tech offers a short free course on artificial intelligence. Taught by Thad Starner, famous for his work on wearable computing, the two-hour course goes through the fundamentals of classical search, machine learning, pattern learning and probability. The course is currently being offered through Udacity.

Learn more about the course here.

If you’d prefer learning from the big leaguers themselves, several big-tech companies also offer free courses in the fundamentals of AI and machine learning.

7. Google

Google maintains a short ‘Google AI for Anyone’ course. The two-hour-long self-paced course talks about the fundamentals of AI, data learning and machine learning, and their relationships with each other.

It also takes the student through the understanding of neural networks, AI ethics, applications and implications of poor data.

Learn more about the course here.

8. Intel

Intel offers an eight-week long course on ‘Introduction to AI’. The course is thorough and goes through the history of AI to its usage in current times. However, the course requires a prior understanding of Python programming.

The course is aimed specifically at students, industry professionals from other science fields and developers.

Learn more about the course here.

9. IBM

IBM offers an AI foundations course in partnership with Coursera. The course runs through the fundamentals of AI, with a special focus on generative AI and the usage of chatbots.

Interestingly, it also offers a module on building AI-backed chatbots without programming. Like the UCD course, it also provides access to IBM Watson and is easily accessible to those with next to no knowledge of AI.

Learn more about the course here.

10. Amazon Web Services

AWS offers a free machine learning and AI course, complete with a learning plan. The ten-hour self-paced course is aimed at beginners. It offers input on the fundamentals of machine learning, terminologies and its use in businesses.

The course also includes an introduction to Amazon SageMaker, their own machine-learning platform.

Learn more about the course here.

The post 10 Best Online AI Courses for Free in 2024 appeared first on AIM.

Top 9 Semiconductor GCCs in 2024 India

Shyam Nandan Upadhyay — Mon, 15 Apr 2024 08:30:00 +0000

Semiconductor GCCs are on the rise in India. About 30% of the new GCCs set up in India during Q4 2023 were in the semiconductor space, signalling a growing interest in leveraging local talent for front-end design, performance testing, and post-silicon validation.

A closer look at the recent trends shows Bengaluru racing ahead in India’s semiconductor GCC landscape. The country’s own Silicon Valley hosts approximately 42% of all semiconductor GCC units and 61% of GCC talent in the country.

Hyderabad follows with 23% of the total units and 21% of the talent.

Here are the top semiconductor units in India.

1. Signature IP

Signature IP, a US-based company founded in 2021, is dedicated to advancing network-on-chip (NoC) technology. As one of the emerging semiconductor players in India, Signature IP established a Global Capability Center (GCC) in October 2023.

The company expanded its presence by inaugurating a new R&D centre in Bhubaneswar, with a focus on developing cutting-edge NoC solutions. The centre aims to foster collaboration with local universities, research institutions, and semiconductor companies to drive innovation and talent development in the NoC domain.

2. EdgeCortix

EdgeCortix, a Japan-based fabless semiconductor company, specialises in developing AI-specific processor architecture from the ground up. As one of the recent entrants in India’s semiconductor landscape, EdgeCortix has established a GCC in Hyderabad.

Its offerings include a full-stack AI inference software development environment, run-time reconfigurable edge AI inference IP, and edge AI chips for boards and systems.

EdgeCortix’s flagship product, the Dynamic Neural Accelerator IP core, is scalable from 1024 to 32768 MACs and boasts a 16x improvement in inference/sec/watt compared to GPUs.

3. M31 Technology Corporation

M31 Technology Corporation is a Taiwan-based silicon IP provider that opened an R&D design centre in Bengaluru in October 2023. It focuses on IP development, IC design, and EDA, including memory compilers and standard cell library solutions.

The Bengaluru R&D centre is M31’s first overseas R&D location. The company has been awarded TSMC’s Best IP Partner Award for many consecutive years.

4. Micron Technology

Micron is building a semiconductor facility in Sanand, Gujarat, with a total project investment of up to $2.75 billion, including government support. It will focus on the assembly and testing of DRAM and developing a 1 TB 232-Layer 3D TLC NAND Flash memory chip for diverse applications in domestic and international markets.

Construction is expected to begin this year, with Phase 1 (500,000 sq ft cleanroom) operational by late 2024. Phase 2, similar in scale to Phase 1, is slated to start in the latter half of the decade.

The project is expected to create up to 5,000 direct Micron jobs and 15,000 community jobs over the next several years. The company will also receive 50% fiscal support from the central government and 20% from the Gujarat government.
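
As a rough illustration of the funding split described above, the arithmetic can be worked through in a few lines of Python. This assumes that both percentages apply to the same $2.75 billion figure, which the article does not spell out, so the amounts below are indicative only. The constant and function names are made up for this sketch.

```python
# Figures quoted in the article: millions of US dollars and whole percentages.
TOTAL_PROJECT_COST_MUSD = 2750
CENTRAL_SUPPORT_PCT = 50
GUJARAT_SUPPORT_PCT = 20


def funding_split(total_musd, central_pct, state_pct):
    """Split a project cost into central, state, and remaining shares.

    Assumption: both percentages are taken on the same total cost.
    """
    central = total_musd * central_pct // 100
    state = total_musd * state_pct // 100
    remainder = total_musd - central - state
    return {"central": central, "gujarat": state, "remainder": remainder}


if __name__ == "__main__":
    split = funding_split(
        TOTAL_PROJECT_COST_MUSD, CENTRAL_SUPPORT_PCT, GUJARAT_SUPPORT_PCT
    )
    for source, amount in split.items():
        print(f"{source}: ${amount} million")
```

Under that assumption, the central government would cover $1,375 million, Gujarat $550 million, and $825 million would remain.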

5. AMD

AMD recently inaugurated its largest global design centre in Bengaluru, and is planning to employ about 3,000 engineers.

The 500,000 sq ft AMD Technostar campus, with 60,000 sq ft of R&D labs, is part of AMD’s $400 million investment in India over five years. The centre will focus on high-performance CPUs, GPUs, SoCs and FPGAs.

6. Intel

Intel operates its GCC in India, with design and R&D centres playing a pivotal role in its global semiconductor operations. These centres are primarily involved in chip design and development activities.

Although Intel doesn’t currently manufacture chips in India, it has collaborated with prestigious academic institutions like IIT Bombay to foster semiconductor research and talent development.

For instance, Intel has established the Emsys Lab at IIT Bombay, concentrating on electronic and embedded system design, prototyping, evaluation, and hardware-accelerated simulation.

Intel also remains open to the possibility of manufacturing semiconductors in India in the future.

7. Texas Instruments

Texas Instruments (TI) was the first multinational company to establish a software design and R&D centre in India in 1985, located in Bengaluru. Over the past three decades, TI’s India centre has evolved into a critical R&D hub, with engineers contributing to almost every product developed globally by TI.

In 2002, TI India expanded its focus to include the design of 3G wireless chipsets and the development of Wireless LAN (WLAN) chipsets.

In 2005, TI India partnered with Indian manufacturer BPL to create the first cell phones designed and manufactured in India. These were tailored to the specific needs of the Indian market and based on TI chipsets and reference designs.

In December 2010, TI established Kilby Labs in Bangalore, marking its first international expansion of the research program beyond the US. The labs focus on innovation in energy efficiency, bio-electronics, and life sciences, further solidifying TI’s commitment to technological advancement.

8. Nvidia

NVIDIA has established four engineering centres in India, including Bangalore and Delhi, employing a total of 4,000 engineers. This makes India the company’s second-largest talent pool after the United States.

It is actively collaborating with leading Indian companies such as Reliance and the Tata Group to establish advanced AI data centres and computing infrastructure within India.

The AI data centres will leverage NVIDIA’s next-generation GH200 Grace Hopper Superchip and DGX Cloud, an AI supercomputing service, to deliver exceptional performance and easy access to AI technology.

Additionally, Tata Communications and NVIDIA are jointly developing an AI cloud in India, utilising Tata’s global network to provide critical infrastructure for the next generation of computing and bring AI capabilities to enterprises.

9. Qualcomm

Qualcomm has made a significant investment of Rs 177.27 crore to enhance its presence in Chennai by establishing a new design centre facility. The new facility is expected to create employment opportunities for up to 1,600 professionals and will be instrumental in driving Qualcomm’s R&D efforts in 5G technology on a global scale.

With existing engineering centres in Bengaluru, Hyderabad, Chennai, and Delhi, Qualcomm boasts a workforce of 4,000 engineers in India, positioning the country as its second-largest talent pool after the US.

These Indian offices specialise in various domains such as wireless modem and multimedia software, DSP and embedded applications, and digital media networking solutions.

The post Top 9 Semiconductor GCCs in 2024 India appeared first on AIM.

Top 6 AI/ML Hackathons to Participate in 2024

Siddharth Jindal — Fri, 22 Mar 2024 12:00:07 +0000

Looking to dive into the exciting world of AI and machine learning? Fret not, as we list the top hackathons being organised globally this year. These hackathons offer a platform for tech enthusiasts, developers, and innovators to showcase their skills, collaborate on cutting-edge projects, and compete for exciting prizes.

Online Hackathon On Data-Driven Innovation For Citizen Grievance Redressal

Join the Online Hackathon on Data-driven Innovation for Citizen Grievance Redressal, organised by the Department of Administrative Reforms & Public Grievances (DARPG) of the Ministry of Personnel, Public Grievances & Pensions. The hackathon aims to address challenges in citizen grievance handling using data-driven solutions.

The top 3 most innovative solutions will be awarded cash prizes of Rs 2,00,000, Rs 1,00,000 and Rs 50,000, respectively. Participants, who could be students, researchers, startups, or even companies, can form teams of up to five members. Registration is open for those aged 18 and above. The teams must register on Janparichay and submit details on https://event.data.gov.in.

Selected entries will receive certificates, and DARPG will consider adopting the winning solutions for further development and implementation in the Citizen Grievance Redressal systems of the Government of India.

Data Science Student Championship

AI developers’ go-to platform, MachineHack, and Praxis Tech School are collectively calling upon the bright minds from engineering colleges and universities to participate in the third edition of the ‘Data Science Student Championship’. This collaboration invites undergraduate and postgraduate students from academic institutions across India to engage in the hackathon.

The two-month-long contest began on February 29 and will conclude on April 25. It promises participants a platform to showcase their data science and problem analysis skills. The prizes are Rs 25,000 for first place, Rs 15,000 for second place, and Rs 10,000 for third place.

This data contest serves as a golden opportunity for students and academic researchers in various STEM fields to captivate the attention of premier firms. It’s the stage to unveil their capabilities, innovate, and make a mark for themselves in data science.

Google AI Hackathon

Participate in the Google AI Hackathon and build creative apps using generative AI tools with Gemini. The winners stand a chance to win up to $50,000 in prizes, along with recognition from Google and meetings with the Google Labs team. Submit your code repository URL and a 3-minute demo video showcasing your app.

Prizes will be awarded for creativity, business value, technical implementation, and community impact. Join this global hackathon to showcase your skills, innovate with AI, and win valuable rewards.

Bhasha Techathon

In collaboration with Google Cloud and MachineHack, Bhashini presents Bhasha Techathon, where innovation converges with impact. The techathon invites participants to address six problem statements in the field of NLP. The goal is to cultivate effective and indigenous solutions to language-specific challenges.

The techathon is scheduled to take place between March 8 and April 21, 2024. It is open for a diverse range of participants, including working professionals, startups, entrepreneurs, students, innovators, and freelancers.

ISB Hackathon 2024

ISB Institute of Data Science, in collaboration with the CyberPeace Foundation, is organising Hackathon 2024 from March to July 2024. This hackathon is focused on leveraging artificial intelligence and deep learning techniques to address the growing challenge of detecting deep fake images, videos, and text. Teams of one to five participants will have the opportunity to develop innovative solutions for this critical issue in today’s digital world.

The hackathon will follow a structured schedule, starting with team registration and a workshop to understand the problem statement. Participants will then receive the data set for Round 1, where they will submit their solutions to improve model efficiency. Top teams will be shortlisted for a presentation at ISB Hyderabad, where they will work on Round 2 and present their solutions to the jury. The winning team will be announced at the end of the hackathon.

Advanced RAG Hackathon

Advanced RAG Hackathon invites developers to build RAG applications and chatbots using platforms like Vectara, LlamaIndex, Together AI, and Unstructured.io. The event features one week of intensive online development, including workshops and mentorship sessions, providing participants with the tools and guidance needed to succeed.

The hackathon will start on April 12 on the lablab.ai platform and Discord server. With a prize pool of $14,000 (including $6,500 in cash) and special prizes from sponsors like LlamaIndex, Unstructured.io, Vectara, and Together AI, participants have the chance to win rewards and recognition for their creations.

The hackathon also offers networking opportunities with fellow developers and startup enthusiasts, fostering collaboration and idea sharing.

The post Top 6 AI/ML Hackathons to Participate in 2024 appeared first on AIM.

What’s Devin Up to?

K L Krithika — Sun, 17 Mar 2024 05:30:00 +0000

Devin, billed as the world’s first AI software engineer, has been busy performing a wide range of end-to-end tasks, from debugging code repositories to fine-tuning large language models.

It has also been helping select developers work more efficiently by automating tasks and assisting in testing, debugging, and deploying applications. Devin’s capabilities span multiple domains, making it a versatile tool for software development.

As AI continues to advance, tools like Devin will play an important role in the future of software development. Let’s look at what it is capable of and what it has been doing so far:

Devin Likes to Debug and Test

Devin excels at debugging and testing code in open-source repositories. It seamlessly navigates through the codebase, writes comprehensive test cases, and employs advanced debugging techniques to identify and resolve issues when presented with a specific bug. By leveraging print statements and re-running tests, the AI software engineer ensures that fixes are effective and no new problems are introduced, saving developers valuable time and effort.

Devin Likes to Fine-tune Large Language Models

Fine-tuning large language models, such as the 7B Llama model, becomes a breeze with Devin. By cloning repositories, setting up dependencies, and running training jobs, it streamlines the process of adapting models to specific tasks. When faced with challenges like CUDA issues, Devin troubleshoots by examining the environment and reinstalling packages, keeping training on track and providing regular status updates.

Devin Knows How to Set Up Computer Vision Models

Devin proves its worth by taking on complex Upwork jobs, such as setting up computer vision models. Given a job description, it sets up the necessary repository, resolves versioning issues, and processes images from the internet to run through the model. Through meticulous debugging and code fixes, the AI software engineer generates sample outputs and provides comprehensive reports, delivering high-quality work that exceeds client expectations.

Devin Enhances User Experience in Open-Source Tools

Open-source tools often face user experience challenges, but Devin is here to help. By cloning repositories, understanding codebases, and addressing specific issues, it improves user experiences in minutes. With its ability to install dependencies, make code changes, and thoroughly test modifications, the AI software engineer ensures open-source tools become more user-friendly and accessible to a wider audience.

Devin Generates Images from Blog Posts

Devin demonstrates its versatility by generating images based on blog post instructions. By reading and comprehending blog content, it identifies and fixes edge cases and bugs, creating stunning visuals like personalised desktop backgrounds. With its ability to generate bonus images, the AI software engineer adds creativity and originality to the output.

Devin Can Develop Web-Based Games

Devin demonstrates its proficiency in creating engaging web-based games, such as the Game of Life. When given specific requirements, it efficiently sets up a React application, writes clean and efficient code, and deploys the game using platforms like Netlify. It continuously enhances the game based on user feedback, adding features and fixing bugs. Devin ensures the game is responsive and interactive across devices, allowing developers to focus on the creative aspects of game design while it handles the technical implementation, bringing game ideas to life quickly.
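
For readers unfamiliar with the Game of Life, its rules fit in a few lines. The sketch below is written in Python rather than React and is not Devin's code, which the article does not show. It is only a minimal illustration of the standard rules: a live cell survives with two or three live neighbours, and a dead cell becomes live with exactly three. The function name life_step is made up for this example.

```python
from collections import Counter


def life_step(live_cells):
    """Return the next generation for a set of (row, col) live cells."""
    # Count how many live neighbours every nearby cell has.
    neighbour_counts = Counter(
        (row + d_row, col + d_col)
        for row, col in live_cells
        for d_row in (-1, 0, 1)
        for d_col in (-1, 0, 1)
        if (d_row, d_col) != (0, 0)
    )
    # Birth on exactly three neighbours; survival on two or three.
    return {
        cell
        for cell, count in neighbour_counts.items()
        if count == 3 or (count == 2 and cell in live_cells)
    }


if __name__ == "__main__":
    blinker = {(1, 0), (1, 1), (1, 2)}
    print(sorted(life_step(blinker)))
```

Representing the board as a set of live coordinates keeps the grid unbounded and avoids edge handling, which is one common design choice among several.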

Devin Knows How to Fix Bugs in Open-Source Libraries

Devin shines when fixing bugs in open-source libraries. It diagnoses issues precisely by setting up repositories, reproducing buggy outputs, and identifying relevant code. Through careful code modifications, debug output cleanup, and thorough testing, the AI software engineer ensures that bugs are squashed and libraries remain stable and reliable.

Devin Does Data Analysis and Simplifies Visualisation

Devin simplifies data analysis and visualisation tasks, even when faced with challenging data formats and geospatial complexities. By reading documentation, performing exploratory data analysis, and processing data from various sources, it can create informative and visually appealing visualisations. With its ability to respond to user requests and deploy applications, the AI software engineer makes data insights accessible and interactive.

The post What’s Devin Up to? appeared first on AIM.

10 Underrated Women in AI to Watch Out For

K L Krithika — Mon, 11 Mar 2024 12:33:06 +0000

Recent data shows that women hold approximately 26.7% of technology-related jobs. Despite the critical role technology and artificial intelligence play in shaping our future, the presence of women in these fields, especially in leadership positions, remains abysmally low.

Despite these challenges, the few women in AI are pioneering change, breaking barriers, and paving the way for future generations. Their work not only contributes to technological advancement but also ensures that the development and application of AI is inclusive, equitable, and representative of the diverse society it serves.

While the media constantly covers the tech elites, here is a list of 10 underrated women whose work ranges from spreading awareness about AI to building AI systems and ensuring their ethical use.

Aishwarya Srinivasan

Aishwarya Srinivasan currently works as a senior AI advocate within the Microsoft for Startups group at Microsoft, supporting startups in developing machine learning solutions. She recently became an angel investor with Hustle Fund and DynamoFL, after founding Illuminate AI, a nonprofit dedicated to mentoring in AI, in March 2021.

Prior to joining Microsoft, she was a data scientist at Google Cloud and an AI & ML innovation leader at IBM Data & AI.

Aishwarya has a postgraduate degree in data science from Columbia University. She has global work experience, having engaged with clients and led projects in London, Dubai, Istanbul, and India. She holds a patent awarded in 2018 for developing a reinforcement learning model for machine trading.

Tulsee Doshi

Tulsee Doshi leads Google’s Responsible AI & Human-Centred Technology Organisation. Her work focuses on incorporating ethical considerations into product development and policy, specifically aimed at creating equitable and transparent user experiences. At Google, Doshi manages a team dedicated to improving fairness, safety, and inclusivity across products.

Prior to her current position, Doshi taught a product leadership course at Product School and led projects at YouTube to enhance inclusive machine learning and creator diversity. She started at Google as an associate product manager, focusing on making Search features more relevant worldwide.

Doshi holds a bachelor’s degree in symbolic systems and a master of science in artificial intelligence from Stanford University. At Stanford’s HCI Lab, she was involved in research on crowdsourcing and expert collaboration technologies, earning a Best Paper Award at UIST 2014.

Ritu Raman

Ritu Raman is the d’Arbeloff assistant professor of mechanical engineering at MIT, where she leads a lab focused on developing adaptive living materials, with current projects centred on engineering biological actuators. This research aims to enhance machine functionality and restore mobility in humans. Raman’s work involves integrating living muscular and neural tissues to create actuators that autonomously adjust to environmental changes.

Her academic background includes a bachelor’s in mechanical engineering with a minor in biomedical engineering from Cornell University, followed by an MS and PhD in mechanical engineering from the University of Illinois at Urbana-Champaign.

Isabelle Guyon

Isabelle Guyon is a director and research scientist at Google, on leave from her role as a professor of artificial intelligence at Université Paris-Saclay (Orsay), specialising in data-centric AI, statistical data analysis, pattern recognition, and machine learning.

Before her current position at Google, Guyon worked as an independent consultant and a researcher at AT&T Bell Laboratories. There, she made significant advancements in neural networks for pen computer interfaces, collaborating with Yann LeCun and Yoshua Bengio. She is the primary inventor of SVM-RFE, a variable selection technique based on SVM, widely cited and used as a benchmark for new feature selection methods.

Guyon holds a PhD in physical sciences from Pierre and Marie Curie University, Paris.

Olga Russakovsky

Olga Russakovsky, Ukrainian-American, completed her PhD in computer vision at Stanford University in 2015, working closely with Fei-Fei Li. Together, they developed ImageNet, a comprehensive image database pivotal for advancements in computer vision. Her research focused on reducing image classification’s dependency on human annotators and addressing human bias in algorithm development.

Following her PhD, Russakovsky continued her research as a postdoctoral fellow at Carnegie Mellon University and is now an associate professor at Princeton University. Her work emphasises the importance of algorithmic fairness in visual recognition systems, and she has proposed computational solutions to mitigate historical and societal biases.

Her significant contributions to the field also include leading the ImageNet Large Scale Visual Recognition Challenge, with the founding paper cited over 13,000 times.

Devi Parikh

Devi Parikh, senior director of generative AI at Meta, worked on Make-A-Video in 2022 and Make-A-Video 3D in 2023, both of which advanced text-to-video generation. Also in 2022, she introduced AudioGen, a text-to-audio generation tool, and Make-A-Scene, which allows for more creative control in AI-generated images.

These contributions follow her earlier endeavours to humanise AI research, evident in her 2020 projects ‘Humans of AI: Stories, Not Stats’ and ‘AI Paygrades’, aimed at demystifying the AI research community and promoting transparency in AI industry hiring practices, respectively.

Aude Oliva

Aude Oliva serves as director at the MIT-IBM Watson AI Lab and director of strategic industry engagement in the MIT Schwarzman College of Computing. Her latest research uses deep learning to enable computers to recognise locations within images based on their composite features, such as identifying a bedroom from the presence of a bed, window, and posters, or a kitchen from a stove, tile, and countertop.

Oliva’s significant contributions include her work on hybrid images. This work supports applications in information privacy, time-lapses, marketing, and brainteasers. Additionally, she explores the psychological perception of images, focusing on memorability, content, and the human visual system’s limitations.

Daphne Koller

Daphne Koller, an Israeli-American computer scientist, co-founded Coursera in 2012. Before Coursera, she worked on probabilistic models with applications in various domains, including computer vision and computational biology. Koller was a Stanford University professor and received the ACM-Infosys Foundation Award in Computing Sciences in 2008, the first recipient of this $150,000 award.

She became a MacArthur Fellow in 2004, recognised for her innovative work in AI. Earlier in her career, after completing her PhD at Stanford, she pursued postdoctoral research at UC Berkeley. Koller was elected to the National Academy of Engineering in 2011, the American Academy of Arts and Sciences in 2014, and the National Academy of Sciences in 2023.

In recent years, Koller left Coursera to focus on new ventures in biotech, founding Insitro in 2018, a drug discovery startup leveraging machine learning and genomics. In 2020, she co-founded Engageli, offering an innovative online learning platform.

Cynthia Rudin

Cynthia Diane Rudin is a professor at Duke University, where she directs the Interpretable Machine Learning Lab. In 2022, she was honoured with the Squirrel AI Award for her contributions to transparent AI systems in critical domains. She also received the Guggenheim Fellowship in the same year and was elected a Fellow of the Association for the Advancement of Artificial Intelligence.

Her research includes developing the Series Finder algorithm for crime series detection and scoring systems for medical diagnosis. Before her tenure at Duke, Rudin was a faculty member at the MIT Sloan School of Management and held research positions at New York University and Columbia University.

She completed her PhD in applied and computational mathematics at Princeton University in 2004 and was recognised by Business Insider in 2015 as one of the most impressive professors at MIT.

Daniela L Rus

Daniela L Rus is the director of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) and holds the Andrew and Erna Viterbi Professorship in the Department of Electrical Engineering and Computer Science at MIT.

Rus’s contributions address challenges in machine learning such as data quality, bias, and adaptability of systems, alongside innovations in robotics like soft robotics, modular robots, and autonomous vehicle algorithms. She has developed technologies to assist the physically disabled.

Rus was born in Romania and moved to the United States, earning her bachelor’s degree at the University of Iowa and her PhD at Cornell University. She started her academic career at Dartmouth College before moving to MIT in 2004.

The post 10 Underrated Women in AI to Watch Out For appeared first on AIM.

Top 10 Alternatives to OpenAI’s Sora

Siddharth Jindal — Fri, 08 Mar 2024 11:00:00 +0000

As LLMs advance, video-generation capabilities emerge as the next frontier. OpenAI’s Sora has truly impressed with its hyper-realistic video generation skills. Here, we present some compelling alternatives that you can use and experiment with.

RunwayML Gen-2

RunwayML Gen-2 allows users to create entire worlds, animations, and stories simply by providing text descriptions. Users can also experiment with reference images, utilising various prompting modes and advanced settings to fine-tune their creative process.

The recent addition of the Multi-Motion Brush enhances control over motion within generated videos. Gen-2 is accessible on both the Runway web platform and their mobile app, providing flexibility for creative endeavours on the go.

Users can preview and download generated videos, selecting the one that aligns with their vision. Cost is a consideration, however: Gen-2 operates on a credit system, with each second of video generation priced at $0.05.
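
The per-second pricing quoted above lends itself to a quick estimate. The sketch below is a rough budgeting aid, not Runway's billing logic: the function name `generation_cost`, the clip length, and the number of takes are all hypothetical, and it assumes every generated second is billed at the same flat rate.

```python
from decimal import Decimal

# USD per generated second, the figure quoted in the article.
PRICE_PER_SECOND = Decimal("0.05")


def generation_cost(seconds: int, takes: int = 1) -> Decimal:
    """Estimated USD cost of generating `takes` clips of `seconds` seconds each."""
    if seconds < 0 or takes < 0:
        raise ValueError("seconds and takes must be non-negative")
    return PRICE_PER_SECOND * seconds * takes


print(generation_cost(4))     # 0.20 for one four-second clip
print(generation_cost(4, 5))  # 1.00 for five attempts at the same clip
```

Because users typically generate several candidates before picking one, the number of takes tends to matter more to the final bill than the length of any single clip.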

Pika

Pika Labs is an AI text-to-video tool that enables users to create videos and animations from simple text prompts. Pika can generate videos in various styles, ranging from cartoons and anime to cinematic formats. Not confined solely to text-to-video conversion, Pika can also transform images into videos and perform video-to-video conversions.

Recently, Pika introduced a lip-sync feature, allowing users to add voice to characters, with Pika seamlessly syncing words to their movements. Additional features include ‘modify region’ and ‘expand canvas’.

Lumiere

Lumiere, from Google, is the closest competitor to Sora, as it, too, creates realistic and coherent videos directly from textual descriptions, with a duration of up to five seconds.

In contrast to many text-to-video models that generate videos frame-by-frame, Lumiere employs a Space-Time Diffusion Model. This approach allows Lumiere to generate the entire video’s duration in one go, ensuring better coherence and consistency throughout.

Lumiere stands out with unique features, including image-to-video generation, stylised generation, cinemagraphs, and inpainting, setting it apart from other models in terms of versatility and customisation options.

Imagen Video

Imagen Video from Google is a text-conditional video generation system based on a cascade of video diffusion models. This model can produce 1280×768 videos at 24 frames per second. Not only does the model create top-notch videos, but it also offers a high level of control and a broad understanding of the world.

It can produce a variety of videos and text animations in different artistic styles, showcasing a solid grasp of 3D objects.

Emu Video

Meta’s Emu Video allows you to create short videos based on text descriptions. It uses a diffusion model approach, which means it starts with a noisy image and progressively refines it based on the text prompt until it generates the final video frame by frame.

It employs a two-step process: first, an image is generated based on the text prompt. Then, using that image and the prompt again, the model creates a multi-frame video.

This model produces visually striking 512×512 four-second videos at 16 frames per second, outperforming models like Make-A-Video, Imagen Video, CogVideo, Gen-2 and Pika.

CogVideo

A team of researchers from Tsinghua University in Beijing has introduced CogVideo, a large-scale pre-trained text-to-video generative model. CogVideo employs a multi-frame-rate hierarchical training strategy and builds upon a pre-trained text-to-image model known as CogView2.

VideoPoet

VideoPoet is an LLM developed by Google Research specifically for video generation. It can generate two-second videos based on various input formats, including text descriptions, existing images, videos, and audio clips.

VideoPoet offers some level of control over the generation process. You can experiment with different text prompts, reference images, or adjust specific settings to refine the final video output. Moreover, it offers features such as zero-shot stylization and applying visual effects.

Stable Video Diffusion

Stable Video Diffusion from Stability AI is an open-source tool that transforms text and image inputs into vivid scenes, elevating concepts into live-action cinematic creations. It comes with two image-to-video models that can create 14 and 25 frames, offering customisable frame rates from 3 to 30 frames per second.
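
The frame counts and frame rates above translate into clip lengths with simple division. The sketch below works that out; the helper name `clip_duration` is hypothetical, and it assumes the clip length is simply the number of frames divided by the playback rate.

```python
def clip_duration(frames: int, fps: float) -> float:
    """Length in seconds of a clip with `frames` frames played at `fps`."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frames / fps


# The two models generate 14 and 25 frames, at 3 to 30 frames per second.
for frames in (14, 25):
    shortest = clip_duration(frames, 30)
    longest = clip_duration(frames, 3)
    print(f"{frames} frames: {shortest:.2f}s to {longest:.2f}s")
# 14 frames: 0.47s to 4.67s
# 25 frames: 0.83s to 8.33s
```

Under that assumption, the clips range from under a second at the highest frame rate to a little over eight seconds at the lowest, with lower rates trading smoothness for length.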

Make-A-Video

Developed by Meta AI, Make-A-Video translates progress in Text-to-Image (T2I) generation to Text-to-Video (T2V) without requiring text-video data. It learns visual and multimodal representations from paired text-image data and motion from unsupervised video footage.

MagicVideo-V2

ByteDance’s MagicVideo-V2 is an efficient video-generation framework based on latent diffusion models. It integrates text-to-image, image-to-video, video-to-video, and video frame interpolation, providing a new strategy for generating smooth and highly aesthetic videos.

The post Top 10 Alternatives to OpenAI’s Sora appeared first on AIM.

Top 10 Gadgets at MWC 2024

K L Krithika — Thu, 29 Feb 2024 11:35:30 +0000

The Mobile World Congress (MWC), an annual event organised by the GSMA, showcased the latest in mobile technology, including smartphones, services, and advancements in 5G and artificial intelligence. Held in Barcelona, Spain, it is the largest exhibition for the mobile industry, attracting global participants from the tech community.

This year, it hosted a plethora of devices, with AI integration being the common thread. One standout example of AI’s application was the Honor Magic 6 Pro, where eye-tracking technology allows users to interact with their device in a hands-free manner.

This feature, along with other AI-driven innovations presented at MWC, underscores the industry’s shift towards creating more personalised and efficient user experiences.

Here is a list of the top 10 gadgets showcased at MWC this year.

HMD Barbie Phone

HMD, initially celebrated for reviving Nokia phones at MWC 2017, struggled to compete with giants like Samsung and Apple, shifting its focus to budget Android and feature phones. At MWC 2024, it announced its first profitable year in 2023 and introduced a rebranding strategy, adopting ‘Human Mobile Devices’ as its new moniker.

Its 2024 device lineup featured a classic Nokia model and a unique Barbie flip phone developed in collaboration with Mattel, targeting a summer release as a pink, digital detox tool. The company has plans for its non-Barbie phones, too, though no details are available at the moment.

Motorola Debuts Smart Connect

Motorola showed off its Adaptive Display concept phone, which departs from traditional designs with a bendable structure that lets it be bent backward.

Motorola also introduced Smart Connect, a collaborative effort with Lenovo that builds upon the Ready For platform. This new feature allows for wireless connection between a Motorola phone and nearby displays, including Lenovo tablets and Windows laptops available through the Microsoft Store, enhancing productivity and inter-device usability.

OnePlus Watch 2

In the spotlight at MWC 2024 was the OnePlus Watch 2, a significant improvement over its predecessor. Its standout feature is a dual operating system setup powered by two distinct processors. It runs Google Wear OS on the Qualcomm Snapdragon W5 Gen 1 chipset for demanding tasks such as navigation, music playback, and app usage, leaving lighter tasks to the second processor.

The OnePlus Watch 2, priced at $300, is currently available for preorder and will officially go on sale on March 4. OnePlus is also offering a promotional discount of $50 for those trading in any watch, including analog models, towards the purchase.

A Transparent Laptop from Lenovo

Lenovo introduced Project Crystal, a transparent laptop concept, at MWC 2024. While it’s not slated for immediate release, the concept offers a glimpse into the future of laptop design. The laptop’s Micro-LED transparent screen offers a futuristic look, allowing users to see through the device while still providing a bright display for normal app usage.

However, this transparency means that others can see the user’s screen, posing privacy concerns. Lenovo mentioned the potential for adjusting the screen’s transmissivity to create an opaque layer for privacy, though such features were not demonstrated.

Samsung Galaxy Ring

Samsung unveiled its latest wearable, the Galaxy Ring, at the Mobile World Congress, showcasing it to the public for the first time. This smart ring, designed to monitor health data and provide insights based on daily and nightly metrics, will expand Samsung’s wearables lineup. The Galaxy Ring can monitor temperature, heart rate, respiratory rate, sleep movement, and time taken to fall asleep.

Interestingly, the Galaxy Ring will also offer payment capabilities, distinguishing it from other smart rings that focus solely on health or fitness tracking. The ring is available in black, gold, and silver, and comes in nine sizes, accompanied by a sizing kit. Its price in India starts from ₹24,599, and the ring is set to be released later this year.

Honor Magic V2

Honor displayed its new devices, the Magic 6 Pro and Magic V2 RSR smartphones, and the MagicBook Pro 16 laptop, heavily emphasising AI features that aren’t necessarily driven by actual artificial intelligence. A notable demonstration featured the eye-tracking technology on the Magic 6 Pro, which allows users to expand notifications by simply looking at them.

This feature, expected to be added via a software update, was highlighted through an unusual demo where the technology was used to control a car (an Alfa Romeo), with options like Engine Start and Stop, Forward, and Backward, showcasing the potential of eye-tracking for hands-free device interaction.

TCL’s NXTPaper 5G and Portable 5G Dongle

TCL introduced a new addition to its NXTPaper range, the TCL 50 XL NXTPaper 5G, featuring a 6.8-inch, 120-Hz screen designed to mimic paper. Despite its modest specs, its $229 price point is aimed at readers preferring to engage with digital content on their phones.

Additionally, TCL unveiled the NXTPaper 14 Pro, equipped with the same eye-friendly technology in a larger 14-inch display, targeting productivity users with its MediaTek Dimensity 8020 processor, 12 GB of RAM, a 12,000-mAh battery, and 256 GB of storage.

ZTE 5G+AI Eyewear-free 3D Tablet

Nubia introduced its latest devices on the ZTE stage, despite emphasising its independence from ZTE. The highlights include the Nubia Flip, the brand’s first foldable phone, featuring a 6.9-inch 120-Hz display that folds to a compact size and sports a unique circular screen on the front. Priced at $599, it offers a Snapdragon 7 Gen 1 processor and unique features like a 3D interactive pet and extensive customization options.

Another significant release was the Nubia Pad 3D II, a tablet capable of displaying 3D content without glasses through eye-tracking technology. This new version introduces 5G connectivity and incorporates “AI concepts” for enhanced functionality, including dual cameras for 3D content creation and an AI feature that converts 2D to 3D content.

Xiaomi 14 Smartphone

Xiaomi unveiled its flagship Xiaomi 14 Ultra at the Mobile World Congress. This model enhances its predecessor’s capabilities, offering an unparalleled display and a more durable construction. However, the high cost and specific target market of photography enthusiasts might limit its appeal. The device, starting at €1,499, with an optional Photography Kit for €199, introduces HyperOS, an interface designed to refine user experience.

Additionally, Xiaomi unveiled other devices including the Xiaomi Pad 6S Pro, Xiaomi Smart Band 8 Pro, and the Watch S3 with HyperOS, alongside the Xiaomi Watch 2 running on Google’s Wear OS. While these products won’t reach the US market, their launch in Europe demonstrates Xiaomi’s strategic global expansion.

Humane AI

The Humane Ai Pin, introduced a few months ago, was on display at MWC. Designed by former Apple employees, the wearable device aims for a future less dependent on smartphones: a screen-free existence in which it blends seamlessly into personal attire while offering sophisticated AI functionalities.

Priced at $699, with a $24 monthly subscription for connectivity and AI services, the Ai Pin operates through voice and gesture interactions, supporting up to 50 languages and adapting to local languages automatically. Humane emphasises privacy with features like an LED indicator for the camera and encrypted data management, marking the Ai Pin as an innovative step towards integrating AI into daily life without screens.

The post Top 10 Gadgets at MWC 2024 appeared first on AIM.