Why AI Keeps Creating Body Horror

Dissing Dream Machine’s gymnast video, Yann LeCun implied that it’s nearly impossible for video generation models to generate anything physics-based.

Luma AI’s Dream Machine has some pretty impressive capabilities, but its most interesting one lies in creating body horror.

While many users have succeeded in jailbreaking the relatively new video generation model to generate gory or NSFW videos, most have stumbled on some pretty shocking results without even trying.

This isn’t uncommon, as generative AI has been pretty notorious for creating nightmare fuel when it comes to generating humans. From generating too many fingers to messing up basic body proportions and fusing faces, users have been pointing out these flaws since the first iterations of DALL-E, Midjourney and Stable Diffusion.

Responding to Dream Machine’s attempt at generating the video of a gymnast, Meta’s chief AI scientist Yann LeCun implied that currently, it’s nearly impossible for video generation models to generate anything physics-based.

Are We Doomed to Have AI Mess-Ups?

Early image generation models largely relied on layering several images and finetuning them to create a prompt-relevant image. This resulted in the models often mistaking hands and other body parts for something else.

This is due partly to the dataset that the model relies on and partly to how the model goes about identifying different body parts, resulting in pretty outlandish hallucinations.

Responding to a query from BuzzFeed last year, Stability AI explained the reason behind this. “It’s generally understood that within AI datasets, human images display hands less visibly than they do faces. Hands also tend to be much smaller in the source images, as they are relatively rarely visible in large form,” a spokesperson said.

Over time, Midjourney and other image generation models have managed to rectify these issues by refining their datasets to focus on certain aspects and improving the models’ capabilities.

Just as image generation models got better, LeCun conceded that video generation models, too, would improve. However, his bold prediction was that systems able to understand physics would not be generative.

“Video generation systems will get better with time, no doubt. But learning systems that actually understand physics will not be generative. All birds and mammals understand physics better than any video generation system. Yet none of them can generate detailed videos,” he said.

Forget the Horrors, What About Physics?

While the body horror aspects of AI-generated content have garnered significant attention, the more fundamental challenge lies in creating AI systems that truly understand and replicate real-world physics.

As LeCun points out, even the most advanced video generation models struggle with basic physical principles that animals intuitively grasp. Maybe improving this could solve the issue of body horror altogether.

This goes beyond just aesthetics or generating uncanny valley humans. A core challenge in AI, including the pursuit of AGI, is bridging the gap between pattern recognition and a genuine understanding of how the world works.

Current generative models excel at producing visually convincing imagery, but, as LeCun and many others have pointed out, they lack the underlying comprehension of cause and effect, motion, and physical interactions that govern our reality.

Addressing this challenge could require a shift in approach. Rather than focusing solely on improving generative capabilities, researchers might need to develop new architectures that can learn and apply physical principles.

This could involve incorporating physics engines, simulations, or novel training methods that emphasise understanding over mere reproduction. It could even mean incorporating 3D models into training datasets to give the models a better understanding of how objects, including human bodies, move in certain situations.

Lesser-known models such as MotionCraft, PhyDiff and MultiPhys already make use of physics simulators and 3D models.

The future of AI in visual content creation may not lie in increasingly realistic generative models but in systems that can reason about and manipulate physical concepts. These advancements could lead to AI that not only avoids body horror but also produces output that is fundamentally more coherent and aligned with our physical world.
