The AI godfather, Yann LeCun, is usually right about most things AI and even AGI to an extent.
The exchange of ripostes between LeCun and Elon Musk, which started after the latter invited people to join xAI’s mission following its recent funding announcement, does not seem to be ending anytime soon.
“Join xAI if you can stand a boss who claims that what you are working on will be solved next year (no pressure),” responded LeCun, advising interested candidates against joining Musk’s company.
Further, he said that he likes Musk’s cars, rockets, solar panels, and satellite networks but dislikes his vengeful politics, conspiracy theories, and hype.
LeCun believes he is politically correct because he is a “scientist, not a business or product person,” unlike Musk, who built Tesla, a company that uses CNNs (aka ConvNets), which LeCun developed.
However, Musk replied that they “don’t use CNNs much these days, tbh”.
This left LeCun perplexed; he asked how Tesla does real-time image understanding in FSD without “ConvNets, TBH”.
Musk is yet to respond. It is highly unlikely that Tesla is using anything other than CNNs; if it is not using CNNs, it is most likely using Google’s Vision Transformer (ViT).
Coincidentally, Meta recently released an in-depth introduction to Vision-Language Models, which promise transformative capabilities in image processing and navigation through advanced spatial and contextual understanding.
Joke’s on Musk
Musk shouldn’t have questioned LeCun’s contributions to AI by asking how much research he had conducted “in the last five years.”
LeCun, being LeCun, candidly replied, saying: “Over 80 technical papers published since January 2022.”
“One of these papers introduced convolutional neural networks (ConvNets) in 1989. Every single driving assistance system today uses ConvNets. That includes MobilEye (since 2014), Nvidia, Tesla, and just about everyone else. Technological marvels don’t just pop out of the vacuum,” he said.
Musk did not reply to this. It seems highly unlikely that Tesla is not using CNNs. “Condolences to the Tesla FSD team who have to ship a version without CNNs by next week,” joked a user on X.
CNNs also have some limitations. They can be sensitive to variations in the input data, such as changes in lighting, orientation, and scale. This can affect their performance in real-world scenarios where such variations are common. Moreover, while CNNs are good at capturing local spatial relationships, they may struggle with understanding global spatial relationships and context within an image.
On the other hand, Vision Transformers (ViTs) apply the transformer architecture, originally designed for natural language processing (NLP), to computer vision tasks. This approach diverges from traditional Convolutional Neural Networks (CNNs) by focusing on global relationships within an image rather than local features.
In ViTs, images are represented as sequences, and class labels for the image are predicted, which allows models to learn image structure independently. Input images are treated as a sequence of patches where every patch is flattened into a single vector by concatenating the channels of all pixels in a patch and then linearly projecting it to the desired input dimension.
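The patch step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not Google's implementation: the 224×224 image size, the 16-pixel patch size, the 768-dimensional embedding, the random weights, and the function names `patchify` and `embed_patches` are all assumptions chosen for the example.

```python
import numpy as np


def patchify(image, patch_size):
    """Split an (H, W, C) image into a sequence of flattened patches."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image must tile evenly"
    rows, cols = h // patch_size, w // patch_size
    # (rows, P, cols, P, C) -> (rows, cols, P, P, C) -> (rows * cols, P * P * C)
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(rows * cols, patch_size * patch_size * c)


def embed_patches(patches, weight, bias):
    """Linearly project every flattened patch to the model's input dimension."""
    return patches @ weight + bias


rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))  # stand-in for a real RGB image
patch_size = 16                    # assumed patch size
embed_dim = 768                    # assumed embedding dimension

patches = patchify(image, patch_size)
# Random weights stand in for the learned projection.
weight = rng.normal(scale=0.02, size=(patches.shape[1], embed_dim))
bias = np.zeros(embed_dim)
tokens = embed_patches(patches, weight, bias)

print(patches.shape, tokens.shape)
```

Each row of `tokens` is one patch embedding. In a full ViT, position information and a classification token would be added before the sequence is passed to the transformer layers; both are omitted here.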
Google even claimed that its ViT outperforms state-of-the-art CNNs while using four times fewer computational resources when trained on sufficient data.
LeCun is All You Need
Not everyone agrees that Vision Transformers are better than CNNs. “Entertainment aside, getting rid of CNNs for real-world AI deployment is almost impossible,” said Perplexity AI founder Aravind Srinivas.
“Even if you went for a ViT architecture, you must process the input using local patches with shared weights for efficiency and generalisation. This is even more crucial when processing multiple frames at the video level, such as in Tesla FSD,” he added.
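Srinivas’s point about local patches with shared weights can be checked numerically: projecting every patch with the same weight matrix gives the same result as a convolution whose kernel size and stride both equal the patch size. The sketch below is a toy demonstration of that equivalence and says nothing about Tesla’s actual stack; the 8×8 image, the patch size of 4, the embedding size of 5, and the helper names are all assumptions.

```python
import numpy as np


def patch_projection(image, weight, patch_size):
    """ViT-style embedding: flatten each patch, multiply by one shared matrix."""
    h, w, c = image.shape
    rows, cols = h // patch_size, w // patch_size
    patches = (
        image.reshape(rows, patch_size, cols, patch_size, c)
        .transpose(0, 2, 1, 3, 4)
        .reshape(rows * cols, -1)
    )
    return patches @ weight  # shape: (rows * cols, embed_dim)


def strided_conv(image, kernels, stride):
    """Naive unpadded convolution with kernels of shape (k, k, C, D)."""
    k = kernels.shape[0]
    h, w, _ = image.shape
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    out = np.zeros((out_h, out_w, kernels.shape[-1]))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i * stride:i * stride + k, j * stride:j * stride + k, :]
            # The same kernels are reused at every location (weight sharing).
            out[i, j] = np.tensordot(window, kernels, axes=([0, 1, 2], [0, 1, 2]))
    return out


rng = np.random.default_rng(0)
patch_size, channels, embed_dim = 4, 3, 5  # toy sizes, assumed
image = rng.random((8, 8, channels))
weight = rng.normal(size=(patch_size * patch_size * channels, embed_dim))

# Reinterpret the projection matrix as a bank of convolution kernels.
kernels = weight.reshape(patch_size, patch_size, channels, embed_dim)

tokens = patch_projection(image, weight, patch_size)
conv_out = strided_conv(image, kernels, stride=patch_size).reshape(-1, embed_dim)

same = np.allclose(tokens, conv_out)
print("patch projection matches strided convolution:", same)
```

The two outputs should agree up to floating-point error, which is why a ViT’s input layer is often described as a convolutional layer in disguise.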
Hugging Face’s CTO quickly joined the conversation and sided with LeCun, saying, “I would pick Yann LeCun over Elon Musk every single day of the week. Despite getting much less money, recognition, and visibility than entrepreneurs, the scientists who publish their groundbreaking research openly are the cornerstone of technological progress and massively contribute to making the world a better place!”
All in all, AI advancements at companies such as OpenAI, xAI and others would not have been possible without research scientists.
“SpaceX would not exist without the thousands of scientific papers on rocket engine design, propellant chemistry, rocket control, material science, orbital mechanics, heat dissipation, trajectory planning, and the hundreds of scientists who got where they are by studying these papers,” claimed LeCun.