Python News, Stories and Latest Updates | Analytics India Magazine
https://analyticsindiamag.com/news/python/

HTML Should be the Language with Zero Haters
Wed, 10 Jul 2024 | https://analyticsindiamag.com/ai-insights-analysis/html-should-be-the-language-with-zero-haters/


Despite being a 30-year-old language, HTML is irreplaceable even now—good luck building anything on the web without it. About 93% of websites use HTML5, the most recent version of the HTML language.

Recently, a developer named Ray made a statement that HTML is the only language with zero haters. His post triggered a debate on the usage of HTML.

Why Hate HTML

Many on Reddit mocked HTML for its lack of functionality control. According to them, HTML is not a programming language, so it lacks the logic, control structures, and data manipulation capabilities that developers often find rewarding. 

Source: Reddit

The Backbone of the Internet  

However, the fact is that HTML is the internet’s unsung hero. HTML works hand in hand with other web development languages, most notably JavaScript. 

Consider HTML to be the skeleton of a building, setting out the rooms and floors, whereas JavaScript represents the energy that fuels the building’s functioning. They collaborate to create amazing websites and applications. 

Generally, you’ll learn HTML basics, tie in CSS, and then start learning JS once you can make static web pages using those.

No wonder Statista ranked HTML as the world’s second-most-used programming language, outperformed in popularity by JavaScript alone.

In another Reddit discussion, developers agreed that understanding HTML is not just a skill, it’s a necessity for anyone aspiring to be a good developer in the digital world. 

“This is like asking if a foundation and framing are still needed to build a house, or can you just install the doorbell into the air,” said a developer.

Another mentioned how he landed a job because he could build an entire website in just HTML and CSS, without a framework. This shows that if you want to work as a frontend or backend developer, you must be familiar with HTML. 

HTML Scores Over Others 

As per the Stack Overflow Developer Survey 2023, HTML/CSS and JavaScript are almost tied as the most popular languages for people learning to code. Unlike other programming languages, HTML is actually considered one of the easiest to learn.

Some developers say they prefer its latest version, HTML5. One of its notable features is “semantic tags”, which help machines understand what the different elements of a webpage mean.

Aside from making it simple to organise your code structure, semantic HTML makes it very straightforward for screen readers to identify and differentiate specific information in your web application.

According to them, it has a more flexible syntax than stricter markup languages like XML and can tolerate small errors in the code. Also, unlike XML, HTML5 offers offline capabilities through features like Local Storage and the (now-deprecated) Application Cache.
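That leniency is easy to demonstrate with Python's standard library, which ships both a strict XML parser and a tolerant HTML parser. A small sketch (the malformed snippet is our own):

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

malformed = "<p>an unclosed paragraph<br>"

# The strict XML parser rejects the snippet outright.
try:
    ET.fromstring(malformed)
    xml_accepted = True
except ET.ParseError:
    xml_accepted = False

# The tolerant HTML parser shrugs and carries on collecting tags.
class TagCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

collector = TagCollector()
collector.feed(malformed)
print(xml_accepted, collector.tags)  # False ['p', 'br']
```

The XML parser raises on the unclosed element, while the HTML parser happily reports both tags, which is exactly the forgiving behaviour browsers show.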

Even in interpreted languages like Python, you need to execute the entire script to see the results. However, with HTML, you can make a small change to the code, like adding a new paragraph or changing the colour of a heading, save the file, and immediately see the updated content in the web browser. 

It’s The Big Daddy

A developer, who builds prototypes, said that he still primarily uses HTML and CSS, with some JS.

“I often notice people who say that HTML has become irrelevant and bring up whatever JavaScript library of choice they love. Nine out of 10 times I look at their code and [realise that] they didn’t think much about accessibility or semantics,” he added. 

Whether you’re a novice web developer or an experienced coder, a solid understanding of HTML remains crucial to building the foundation of the digital experiences we encounter daily on the internet.

The post HTML Should be the Language with Zero Haters appeared first on AIM.

Why is C++ Not Used in AI Research?
Fri, 21 Jun 2024 | https://analyticsindiamag.com/ai-mysteries/why-is-c-not-used-in-ai-research/

Newer, more modern languages have made C++ look superfluous, and its limited use in AI research hasn’t helped either.


C++, a language that once shone brightly in the late twentieth century, was at the forefront of technological advancements, particularly in space exploration. 

However, the emergence of newer, more visually appealing programming languages has shifted the spotlight away from C++. 

At the AI+Data Summit 2024, researcher Yejin Choi said that researchers no longer use the language for AI research.

So, is C++ becoming a relic of the past? 

Not Many Takers for AI

Despite its performance benefits and applications in various AI fields, such as speech recognition and computer vision, C++ is not the go-to language for AI development. 

Its complexity and steep learning curve pose significant challenges. In contrast, Python’s user-friendly nature, extensive libraries, and large developer communities have propelled it to the forefront of AI programming.

Furthermore, C++ involves manual memory management, which can result in memory leaks and errors if not done correctly. This can be a considerable issue, particularly in large-scale AI programmes. 

Microsoft underscored this issue when it revealed that around 70% of the security vulnerabilities it had patched over the previous 12 years were memory safety bugs, owing to Windows being written mostly in C and C++. 

Google’s Chrome team released their own research, which revealed that memory management and safety flaws accounted for 70% of all major security bugs in the Chrome codebase. It is largely written in C++.     

C++ also lacks built-in support for garbage collection and database access, and standard threading support only arrived with C++11, all of which can necessitate extra development effort. 

This can be particularly challenging in AI applications that require concurrent processing of data and tasks, such as deep learning and neural networks, real-time systems and embedded systems, data processing, and data science.

To overcome these limitations, developers often use third-party libraries and frameworks that provide threading support, such as OpenMP or Boost. However, these libraries can add complexity and overhead to the code, which may only be ideal for some applications.

C++ is Complicated

If you’ve visited a page like the C++ FAQ, you’ll understand how hard C++ can be. A comma in the wrong location might trigger hundreds of compile errors in earlier language versions.

The language has improved since C++11, with rvalue references and move semantics for transferring ownership, although the learning curve remains steep.

Developing a New Application

In recent years, we’ve witnessed the growth of programming languages that could replace C++ for low-level system tasks. Rust, for example, provides memory safety by preventing buffer overflows and use-after-free errors, and is much easier to learn than C++.

When you compare the feature sets of modern languages like C++, Python, and Rust, C itself begins to look like a dinosaur. The C standard gained essentially no new features between C11 (2011) and C17 (2017), a release limited to technical corrections and clarifications, and even the 2023 standard was a conservative update.

Is C++ Losing Popularity?

Mark Russinovich, the chief technical officer of Microsoft Azure, has stated that developers should stop creating code in the programming languages C and C++ and that the industry should treat these computer languages as “deprecated”.

Ken Thompson, the Bell Labs researcher who designed the original Unix operating system, called C++ a “bad language” that is “way too big, way too complex” and “obviously built by a committee”.

GitHub compiled a list of the top ten most popular programming languages for machine learning. Python is the most popular language in machine learning repositories, with C++ being sixth.

According to Stack Overflow’s Developer Survey, people learning to code are more likely to prefer Python over C++ than professional developers are.

While C++ provides advantages in speed and memory control, it also has drawbacks, such as a steep learning curve and comparatively little community support in the machine learning ecosystem. 

Despite its challenges, C++ can be a powerful choice for machine learning applications that require high-performance processing and advanced memory management. The choice between C++ and Python for machine learning ultimately depends on the specific needs of the application and the developers’ skill level.

The post Why is C++ Not Used in AI Research? appeared first on AIM.

Meet TORAX, Google DeepMind’s Breakthrough in Open-Source Nuclear Fusion Simulation
Thu, 13 Jun 2024 | https://analyticsindiamag.com/ai-news-updates/meet-torax-google-deepminds-breakthrough-in-open-source-nuclear-fusion-simulation/


Google DeepMind researchers have released TORAX, a new open-source differentiable tokamak core transport simulator implemented in Python using the JAX framework. TORAX essentially simulates the transport of particles, energy, and momentum within the core of a tokamak fusion reactor.

According to the new paper, TORAX solves coupled equations for ion heat, electron heat, particle transport, and current diffusion. It incorporates modular physics-based and machine-learning models, leverages JAX for fast runtimes via just-in-time compilation and automatic differentiation, enables gradient-based optimisation workflows and Jacobian-based PDE solvers, and facilitates coupling to machine-learning surrogate models of physics. 
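TORAX itself time-steps coupled 1D transport equations with implicit, Jacobian-based solvers. Purely as a toy illustration of the kind of PDE involved, here is an explicit Euler step of a single 1D heat-diffusion equation in plain Python (the grid, coefficients, and function names are invented for this sketch and have nothing to do with TORAX’s API):

```python
def heat_step(T, dt=0.4, dx=1.0, chi=1.0):
    """One explicit Euler step of dT/dt = chi * d2T/dx2.

    Edge cells are held fixed (Dirichlet boundary); stability of
    this explicit scheme requires chi * dt / dx**2 <= 0.5.
    """
    r = chi * dt / dx ** 2
    new = T[:]
    for i in range(1, len(T) - 1):
        new[i] = T[i] + r * (T[i - 1] - 2 * T[i] + T[i + 1])
    return new

# A peaked temperature profile flattens out step by step.
profile = [0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 0.0]
for _ in range(5):
    profile = heat_step(profile)
```

Production transport codes prefer implicit schemes precisely because explicit steps like this one are stability-limited.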

TORAX has been verified against the established RAPTOR code, demonstrating excellent agreement in simulated plasma profiles at stationary state. For an ITER L-mode scenario, the normalised root-mean-square deviation between TORAX and RAPTOR temperature and density profiles was around 1%. 

A key innovation is TORAX’s use of the JAX framework, allowing just-in-time compilation for speed and automatic differentiation for advanced algorithms like gradient-based optimisation. JAX also simplifies the integration of machine learning surrogate models like the QLKNN neural network trained on gyrokinetic turbulence simulations. 

“TORAX offers a powerful and versatile tool for accelerating fusion energy research,” said Google DeepMind research scientist and lead author Jonathan Citrin. “Its differentiability and ability to leverage machine learning models are game-changers.” 

The open-source TORAX code aims to foster collaboration and rapid progress in tokamak modelling for fusion reactor design and operation. 

Simulation Training

Google DeepMind has a history of open-sourcing simulators for this purpose. Back in 2020, they released a scalable environment simulator for artificial intelligence research, which helped DeepMind create 2D environments for AI and machine learning research. Simulated training is also the most commonly adopted technique to equip general-purpose robots for the real world. 

The post Meet TORAX, Google DeepMind’s Breakthrough in Open-Source Nuclear Fusion Simulation appeared first on AIM.

Hating on Python is Literally a Skill Issue
Wed, 29 May 2024 | https://analyticsindiamag.com/developers-corner/hating-on-python-is-literally-a-skill-issue/


The internet is buzzing, and once again, Python finds itself in the crosshairs. A user sparked a lively debate with her post on X: “Hating on Python is literally a skill issue.” This simple statement set off a firestorm of opinions.

The point, as succinctly captured in the phrase, suggests that those who criticise Python might be struggling with the language’s simplicity and accessibility. It’s reminiscent of the common gripe about C++: “C++ is terrible because of memory leaks.” 

Python, with its straightforward syntax and wide applicability, is easy to pick up but deceptively challenging to master in nuanced, efficient coding.
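A classic example of “easy to pick up, hard to master” is Python’s mutable default argument, a standard gotcha that trips up even experienced developers (the function names here are ours, chosen for illustration):

```python
def append_item_buggy(item, bucket=[]):
    # The default list is created ONCE, at function definition
    # time, and silently shared across every call.
    bucket.append(item)
    return bucket

def append_item(item, bucket=None):
    # The idiomatic fix: a None sentinel, with a fresh list
    # created inside the function body on each call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

print(append_item_buggy(1))  # [1]
print(append_item_buggy(2))  # [1, 2] -- surprise!
print(append_item(1))        # [1]
print(append_item(2))        # [2]
```

The buggy version keeps accumulating state between unrelated calls, which is exactly the kind of subtlety hiding behind Python’s friendly surface.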

Some argue that “Python is easy to hate on because it’s a trash language that’s merely an abstraction anyway.” This critique is not uncommon. 

Many developers accustomed to the power and precision of languages like C++ or Java often find Python’s abstract nature frustrating, especially when dealing with performance-intensive applications.

But What is the Issue?

Sanjay Nithin offers a more technical criticism, saying: “Can’t handle environments properly, properly handled environments aren’t easy to deploy, takes a lot of time and space to process.” These are legitimate concerns, especially in production environments where efficiency and scalability are paramount.

Some offer constructive criticism, saying that Python’s lack of constraints, such as static typing, encourages bad practices throughout the ecosystem. 

This highlights a frequent complaint among seasoned developers: Python’s flexibility can lead to sloppy code and poor software engineering practices if not managed properly, a long-discussed issue.

Conversely, Andre Infante argues that programming languages are designed to minimise errors: “The whole point of programming languages is to protect you from your own mistakes and make it harder to make them. Skill issues are language issues.” 

The ease of making mistakes in Python is a double-edged sword—it makes the language accessible but also more prone to user error.

“It is hard to beat Java at speed. I’m an ML engineer, and Python is slow. Do write Java whenever you can.” This critique underscores Python’s performance limitations compared to more optimised languages like Java.

AGI will be built on Python

Despite these criticisms, supporters of Python are unwavering: “Seems 100% true, Python is based and will be based.” This sentiment reflects the strong community support and the language’s entrenchment in various tech domains, particularly AI and data science.

Proficiency and comfort with Python come from skill and familiarity, not inherent flaws in the language. Earlier, AIM made a point that the most frustrating programming language is also the one that you work on the most. 

As the market shows, Python’s popularity is no fluke. It’s a testament to its broad applicability and the skill of those who wield it effectively. According to Statista, Python is the most popular programming language.

Python’s dynamic typing can lead to subtle bugs that may not be caught until runtime, and its performance can sometimes lag behind that of compiled languages like C++. Dynamic typing is heavily discussed as one of Python’s worst traits, and some feel parts of the community compound the problem for newcomers.
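Such bugs surface only when the offending value is actually reached at runtime; type hints, checked by external tools like mypy, are the usual mitigation. A minimal sketch with invented data:

```python
def total(prices: list[float]) -> float:
    """Sum a list of prices. The annotation documents intent,
    but plain Python does not enforce it at runtime."""
    s = 0.0
    for p in prices:
        s += p
    return s

clean = total([9.99, 4.50])   # fine

try:
    total([9.99, "4.50"])     # a stray string sneaks in...
    failed = False
except TypeError:             # ...and only explodes at runtime
    failed = True
```

A static checker such as mypy would flag the second call before the program ever ran, which is precisely the constraint Python chooses not to impose by default.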

“Python is great for scripting random tasks and calling functions implemented in C. It is not great for building applications.” This pragmatic take acknowledges Python’s strengths and weaknesses, recognising its utility in specific contexts.

In the end, hating Python may indeed be a skill issue. “Only a bad craftsperson blames their tools.” Python’s simplicity and versatility have made it a staple in the programming world. While it may not be perfect for every application, its accessibility and vast library support make it an indispensable tool for many.

Meanwhile, this debate seems to forget that English is the hottest programming language. Moreover, Microsoft now also allows people to code in their native languages, which could even mark the end of Python. The ‘skill issue’ conversation will simply shift to whoever prompts Copilot best.

The post Hating on Python is Literally a Skill Issue appeared first on AIM.

Open Source Libraries are Going Through Trust Issues
Wed, 03 Apr 2024 | https://analyticsindiamag.com/ai-insights-analysis/open-source-libraries-are-going-through-trust-issues/

The evolution of PyPI’s security measures has its maintainers constantly racing to keep up with threats and scammers. 


Last week, malicious packages on the Python Package Index (PyPI), a repository used by thousands of companies, started exfiltrating sensitive user data to an unknown server. The maintainers suspended new user registrations for a day. 

It was a ‘multi-stage attack’ to steal crypto wallets, sensitive data from browsers (cookies, extensions data, etc.), and other credentials using a method called typosquatting. 

This involves attackers uploading malicious packages with names deceptively similar to popular legitimate packages. Cybersecurity firm Phylum, which tracked the campaign, noted that the attackers published 67 variations of ‘requirements’, 38 variations of ‘Matplotlib’, and dozens of other misspelt variations of widely-used packages. 
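Registries and security scanners can flag many such look-alike names mechanically with fuzzy string matching. A minimal sketch using only Python’s standard library (the package list and similarity threshold are our assumptions; real scanners are far more sophisticated):

```python
import difflib

POPULAR = ["matplotlib", "numpy", "pandas", "requests"]

def possible_typosquat(name, cutoff=0.85):
    """Return the popular package `name` resembles, or None.

    An exact match is legitimate; a near-miss above the cutoff
    is suspicious and worth a closer look.
    """
    if name in POPULAR:
        return None
    hits = difflib.get_close_matches(name, POPULAR, n=1, cutoff=cutoff)
    return hits[0] if hits else None

print(possible_typosquat("matplotlb"))  # matplotlib
print(possible_typosquat("requests"))   # None (exact match)
print(possible_typosquat("flask"))      # None (not close to anything)
```

The cutoff is a trade-off: too low and legitimate short names collide, too high and one-character swaps slip through.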

For a long time now, software package registries have been the target of malware attacks. Python’s PyPI, JavaScript’s npm, and Ruby’s RubyGems have all faced attacks, each more sophisticated than the last. 

Researchers who studied malicious code in PyPI said, “Over 50% of malicious code exhibits multiple malicious behaviours, with information stealing and command execution being particularly prevalent. We observed several novel attack vectors and anti-detection techniques.” 

According to the study, 74.81% of all malicious packages successfully entered end-user projects through source code installation. Researchers also said the malicious payload employed a persistence mechanism to survive reboots. Yehuda Gelb, Jossef Harush Kadouri, and Tzachi Zornstain led the research. 

This Is Not the First Time 

The PyPI administrators and the Python community are actively working to combat these malicious attacks on the security of the ecosystem. 

Much like the measures taken last week, PyPI suspended new user registrations in November and December last year. “These temporary suspensions allow the PyPI team to triage the influx of malicious packages and implement additional security measures,” said the researchers. 

Moreover, PyPI is taking proactive steps, just like other libraries. The registry now requires two-factor authentication for critical projects and packages, making it harder for attackers to hijack maintainer accounts. The team is also investing in improved malware scanning capabilities to identify and remove malicious packages quickly. 

The paper also suggests that end-users should exercise caution when selecting and installing packages using pip and other tools and verify the software packages’ sources and credibility to ensure system security.
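One concrete form of that verification is hash pinning, which pip supports via `--require-hashes` in a requirements file: the installer refuses any archive whose digest does not match the recorded one. The core check fits in a few lines of standard-library Python (the payload and function names below are invented for illustration):

```python
import hashlib
import hmac

def verify_sha256(payload: bytes, expected_hex: str) -> bool:
    """Return True only if payload hashes to the pinned digest."""
    actual = hashlib.sha256(payload).hexdigest()
    # compare_digest avoids leaking information via timing.
    return hmac.compare_digest(actual, expected_hex)

genuine = b"print('hello from a trusted release')\n"
pinned = hashlib.sha256(genuine).hexdigest()  # recorded at pin time

assert verify_sha256(genuine, pinned)
assert not verify_sha256(genuine + b"# tampered", pinned)
```

With `pip install --require-hashes -r requirements.txt`, the same idea protects an entire dependency tree: a typosquatted or tampered archive simply fails the hash check.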

The impact of these attacks on businesses is severe. Last year, malicious Python packages stole sensitive information like AWS credentials and transmitted it to publicly accessible endpoints. 

A Cat-and-Mouse Game

Since PyPI has grown in popularity, it has become an increasingly attractive target for attackers seeking to infiltrate the software supply chain. The evolution of its security measures has been a constant game of cat and mouse, with attackers continually refining their tactics and PyPI administrators working to stay one step ahead.

In the early days of PyPI, the repository relied on a largely trust-based model, prioritising ease of contribution for the growing Python community. Over the years, one of the most significant steps forward came with the introduction of two-factor authentication (2FA) for PyPI accounts. 

As Donald Stufft, a PyPI administrator and maintainer since 2013, explained, “Two-factor authentication immediately neutralises the risk associated with a compromised password. If an attacker has someone’s password, that is no longer enough to give them access to that account.”

PyPI has also implemented other measures, such as API tokens for more secure package uploads and improved malware scanning tools. However, the sheer volume of packages and the constantly evolving threat landscape mean that PyPI’s security team is always playing catch-up.

Feross Aboukhadijeh, the founder of Socket, a company that provides supply chain security for JavaScript, Python, and Go dependencies, highlighted the scale of the problem: “At Socket, we see about 100 attacks like this every single week.”

Despite the challenges, the Python community has made significant progress in recent years. Stufft noted, “We’ve gotten a lot more confident in our 2FA implementation and in what the impact of enabling it is for both people publishing to PyPI, and to the PyPI team itself.” 

The repository has also benefited from increased funding and resources, including the hiring of a dedicated PyPI safety and security engineer.

The impact of these attacks on businesses can be severe, as demonstrated by recent incidents where malicious Python packages stole sensitive information like AWS credentials and transmitted them to publicly accessible endpoints. 

This not only puts the affected companies at risk but also exposes their customers to potential security breaches and compromised software releases. 

As Aboukhadijeh put it, “Open source is one of the best things. But I think one of the things that we don’t appreciate is just the amount of trust that we place in all open source actors to be good.”

The post Open Source Libraries are Going Through Trust Issues  appeared first on AIM.

Ruff Emerges as the Fastest-Growing Python Linter Ever
Sun, 24 Mar 2024 | https://analyticsindiamag.com/ai-insights-analysis/ruff-emerges-as-the-fastest-growing-python-linter-ever/

Created by Charlie Marsh, Ruff is also taking the Python community by storm with its speed and features.


Ruff, a Python linter written in Rust, has taken the Python community by storm with its blazing fast performance and comprehensive feature set. 

The project, created by Charlie Marsh in 2022, recently hit a major milestone with over 400 contributors on GitHub. Marsh also said on X, “Just got confirmation that an >8M LoC codebase successfully migrated to ruff format”. Compared to established tools like Flake8, Pylint, and Black, Ruff is extremely fast and accurate, and has been quickly adopted by the community.

Linters are essential tools for any programming language, helping developers catch potential errors, enforce coding standards, and maintain a consistent style across large codebases. In the Python ecosystem, popular linters like Flake8 and Pylint, alongside the Black formatter, have been widely used for years. However, these tools often come with performance trade-offs, especially when dealing with large projects.

Ruff, on the other hand, is surprisingly quick compared to the others while also integrating more functionality into a single, unified interface. By leveraging the speed and safety of Rust, Ruff is able to analyse and format code at an astonishing pace. 

In benchmarks, Ruff has proven to be 10-100x faster than tools like Flake8 and Black, making it a game-changer for developers working on large codebases.

Time scale of Ruff compared with other linters

Debugging at Lightning Speed

One of the key factors contributing to Ruff’s speed is its built-in caching mechanism, which avoids re-analysing unchanged files. This, combined with its highly optimised Rust implementation, allows Ruff to blaze through even the most massive codebases in a matter of seconds. Rust is already known for its speed and memory efficiency, making it an ideal choice for performance-critical applications like linters. 
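Ruff’s actual cache lives in Rust and keys on file metadata, but the principle fits in a few lines of Python. A toy sketch (the 79-character rule and every name here are ours, not Ruff’s):

```python
import hashlib

_cache: dict[str, list[int]] = {}

def lint(source: str) -> tuple[list[int], bool]:
    """Toy linter: flag lines longer than 79 characters.

    Results are memoised under a hash of the file contents, so an
    unchanged source is never analysed twice. Returns (problem
    line numbers, whether this was a cache hit).
    """
    key = hashlib.sha256(source.encode()).hexdigest()
    if key in _cache:
        return _cache[key], True
    problems = [
        i + 1
        for i, line in enumerate(source.splitlines())
        if len(line) > 79
    ]
    _cache[key] = problems
    return problems, False

code = "short = 1\n" + "long = " + "x" * 80 + "\n"
first, hit1 = lint(code)    # analysed from scratch
second, hit2 = lint(code)   # served from the cache
```

On a second run over an unchanged codebase, every file takes the cache-hit path, which is why warm runs of a caching linter can feel nearly instantaneous.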

As Nick Schrock, founder of Elementl and co-creator of GraphQL, noted, “On our largest module pylint takes about 2.5 minutes, parallelised across 4 cores on my M1. Running ruff against our entire codebase takes .4 seconds.”

Ruff’s speed is not its only selling point, however. The tool boasts an impressive array of features, including over 700 built-in rules, native re-implementations of popular Flake8 plugins, and drop-in parity with tools like isort and Black. This means that developers can replace multiple linters and formatters with Ruff, simplifying their development workflows and reducing the overall complexity of their toolchains. 

The Python community has been quick to embrace Ruff, with many high-profile projects like Apache Airflow, FastAPI, Hugging Face, Pandas, and SciPy already adopting it. Users have expressed disbelief at how quickly it can analyse their code. Sebastián Ramírez, creator of FastAPI, quipped, “Ruff is so fast that sometimes I add an intentional bug in the code just to confirm it’s actually running and checking the code.”

Perhaps the most surprising endorsement of Ruff comes from the Pylint project itself. In a recent development, even the Pylint codebase has begun using Ruff for linting, a testament to the tool’s growing popularity and effectiveness.

The Inception 

Charlie Marsh was motivated to build Ruff to solve his own frustrations with Python tooling: “it’s the tooling I wish I’d had,” he said in an interview. Marsh’s ultimate goal is to create an integrated toolchain that includes not only a linter but also an auto-formatter and potentially a type checker. By bundling these functionalities together, Ruff would provide a more powerful and efficient solution for Python developers.

Despite the challenges of building a new tool and learning Rust, Marsh stayed committed and even announced his new company, Astral, which will continue developing Ruff and other high-performance Python tools.

“To me, the community’s response to Ruff is itself evidence of an opportunity to make the Python ecosystem more productive by building great tools. Astral exists to meet that opportunity.”


As Ruff continues to evolve and grow, it is set to change how Python developers work. With its speed, features, and community support, Ruff is on track to become the standard for Python linting and formatting. 

The post Ruff Emerges as the Fastest-Growing Python Linter Ever appeared first on AIM.

Google Launches TensorFlow GNN 1.0 for Advanced Graph Neural Networks
Wed, 07 Feb 2024 | https://analyticsindiamag.com/ai-news-updates/google-launches-tensorflow-gnn-1-0-for-advanced-graph-neural-networks/

It simplifies the creation of models that understand complex relationships within data, from social networks to logistics.


The Google TensorFlow team has released TensorFlow GNN 1.0 (TF-GNN), an update to its machine learning framework to better develop and scale graph neural networks (GNNs). 

This new library handles the analysis of complex networks, such as transportation and social networks, focusing on both the structure of graphs and the features of their nodes. It bridges the gap between discrete graph data and continuous neural network models, enabling more detailed predictions and analyses.

TF-GNN introduces a suite of advanced features for the TensorFlow ecosystem. At the heart of these advancements is the tfgnn.GraphTensor object, which represents heterogeneous graphs characterised by diverse node and edge types. This integration allows for the efficient handling of graph data, enhancing the TensorFlow ecosystem’s ability to manage complex network structures.

The library provides a Python API that can configure subgraph sampling for different computational environments, from individual workstations to distributed systems. This flexibility is crucial for handling datasets of varying sizes and complexities. Furthermore, TF-GNN introduces integrated gradients for model attribution, offering insights into the features most influential in predictions, thereby enhancing model training and evaluation.

By incorporating the structure and data of graphs, GNNs offer predictions on entire graphs, individual nodes, or potential edges. This improves the understanding of complex relationships and attributes, making TF-GNN a powerful tool for a wide range of applications.
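Stripped of TensorFlow entirely, the message-passing idea at the heart of a GNN can be sketched in plain Python. This toy round uses scalar node features and mean aggregation; the graph and all names are invented for illustration:

```python
def message_passing_round(features, edges):
    """Average each node's feature with its in-neighbours' features.

    `features` maps node -> float; `edges` is a list of directed
    (src, dst) pairs, meaning src sends a message to dst.
    """
    updated = {}
    for node, own in features.items():
        incoming = [features[src] for src, dst in edges if dst == node]
        pooled = [own] + incoming
        updated[node] = sum(pooled) / len(pooled)
    return updated

feats = {"a": 1.0, "b": 3.0, "c": 5.0}
links = [("a", "b"), ("c", "b"), ("b", "c")]
print(message_passing_round(feats, links))
```

Real GNN layers replace the scalar mean with learned transformations over feature vectors, and libraries like TF-GNN stack many such rounds over typed node and edge sets.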

TensorFlow GNN 1.0 is available as part of the TensorFlow ecosystem, with resources, documentation, and code samples accessible online for developers worldwide.

The post Google Launches TensorFlow GNN 1.0 for Advanced Graph Neural Networks appeared first on AIM.

8 Must-Know OCR Tools for Training AI/ML Models
Tue, 06 Feb 2024 | https://analyticsindiamag.com/ai-mysteries/8-must-know-ocr-tools-for-training-ai-ml-models/

OCR tools enable LLMs to process and understand textual content from various sources

The post 8 Must-Know OCR Tools for Training AI/ML Models  appeared first on AIM.


India boasts over 400 languages and a rich linguistic tapestry, but faces the challenge of bridging the digital divide, exacerbated by the dominance of English in LLMs. Perpetually hungry for data, large language models are trained extensively on online information. However, while non-English data is scarce online, vast offline archives exist, and OCR can be used to unlock them.

Optical Character Recognition (OCR), the process of transforming an image containing text into a machine-readable text format, digitises content into data that can be used for analytics, automation, training AI models and other processes. By extracting that text, OCR gives LLMs data they can analyse and process. 

Here are a few OCR tools that can help developers and coders train AI/ML models.

Best OCR Software with Machine Learning in 2024

Surya

Surya, a multilingual text line detection model designed for document OCR, has been trained on diverse documents, including scientific papers. The training ensures that Surya excels in detecting text lines within documents, delivering pinpoint accuracy in line-level bounding boxes and clear identification of column breaks in PDFs and images.

Bhashini

Bhashini, an app developed to help people translate content across different Indian languages, recently introduced an OCR feature called SCENE. The feature allows users to extract text by simply scanning an image using the camera. Bhashini was recently used by Prime Minister Narendra Modi to address students during ‘Pariksha Pe Charcha’.

Tesseract OCR

Tesseract OCR is an open-source OCR engine maintained by Google. It was first developed by Hewlett-Packard and later taken over by Google. Tesseract supports Unicode (UTF-8) and more than 100 languages, and can be integrated with LLMs to extract text from images. It also supports various image formats such as PNG, JPEG and TIFF. 

PyTesseract

Python-Tesseract serves as an optical character recognition (OCR) utility for Python. Essentially, it is capable of identifying and interpreting the text contained within images. Python-tesseract acts as a wrapper for Google’s Tesseract-OCR Engine. 

It proves handy as a standalone execution script for Tesseract, capable of interpreting all image formats supported by the Pillow and Leptonica imaging libraries, such as jpeg, png, gif, bmp, tiff, among others. Furthermore, when employed as a script, Python-tesseract outputs the recognized text directly rather than storing it in a file.
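That wrapper usage can be sketched in a few lines. The snippet assumes the `pillow` and `pytesseract` packages are installed and the Tesseract binary is on the PATH; the imports are deferred so the helper can be defined even without those optional dependencies:

```python
def image_to_text(path, lang="eng"):
    """Run Tesseract OCR on an image file and return the recognised text.

    Assumes the `pillow` and `pytesseract` packages are installed and
    the `tesseract` binary is on the PATH.
    """
    from PIL import Image  # deferred imports: optional dependencies
    import pytesseract
    return pytesseract.image_to_string(Image.open(path), lang=lang)
```

As the article notes, the recognised text comes back directly as a string rather than being written to a file.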

EasyOCR

EasyOCR is a Python package that provides a straightforward interface for performing OCR tasks. It is an open-source OCR engine that supports multiple languages and can be used with LLMs for text recognition and data extraction. It also offers pre-trained models for various use cases.
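The basic EasyOCR flow looks like the following (assuming the `easyocr` package is installed; its pretrained models are downloaded on first use, so the initial call can be slow, and the import is deferred accordingly):

```python
def read_image_text(path, langs=("en",)):
    """Extract plain text strings from an image with EasyOCR.

    Assumes the `easyocr` package is installed; pretrained
    detection/recognition models are downloaded on first use.
    """
    import easyocr  # deferred import: heavy optional dependency
    reader = easyocr.Reader(list(langs))
    return reader.readtext(path, detail=0)  # detail=0 -> strings only
```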

OpenCV

OpenCV (Open Source Computer Vision) is a collection of programming functions primarily focused on real-time computer vision tasks. While it may require more customisation, it can be used in conjunction with LLMs for OCR tasks. 

In Python, OpenCV facilitates image processing by providing functions for tasks such as image resizing, pixel manipulation, object detection, and more.
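A common example of such customisation is binarising an image before handing it to a recognition engine. A minimal sketch, assuming the `opencv-python` package is installed:

```python
def binarise_for_ocr(path):
    """Load an image in greyscale and apply Otsu thresholding,
    a typical cleanup step before OCR.

    Assumes the `opencv-python` package is installed.
    """
    import cv2  # deferred import: optional dependency
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, binary = cv2.threshold(img, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary
```

The thresholded image can then be passed to any of the OCR engines above.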

OCRopus

OCRopus is another open-source OCR engine that is designed for high accuracy and efficiency. It includes various preprocessing and post-processing techniques suitable for AI and ML applications. OCRopus commands typically display a stack trace alongside an error message, but this does not necessarily indicate a problem.

Kraken

Kraken is an OCR engine implemented in Python and optimised for historical and degraded document recognition. It can be used in AI and ML models for tasks involving challenging document images. Kraken can be run on Linux or Mac OS X (both x64 and ARM).


The post 8 Must-Know OCR Tools for Training AI/ML Models  appeared first on AIM.

]]>
Python Adds Support to JIT Compiler https://analyticsindiamag.com/ai-news-updates/python-adds-support-to-jit-compiler/ Wed, 10 Jan 2024 07:57:17 +0000 https://analyticsindiamag.com/?p=10110319

Python 3.13 adds a JIT compiler that converts code to machine code at runtime, boosting performance by 2-9%. 

The post Python Adds Support to JIT Compiler appeared first on AIM.

]]>

Python announced a new update yesterday that adds support for a JIT (Just in Time) compiler. This addition, made by CPython core developer Brandt Bucher at the end of 2023, is a substantial change to the CPython interpreter. 

The inclusion of the JIT compiler in Python 3.13 builds upon the earlier introduction of the Specialising Adaptive Interpreter in Python 3.11, continuing the trend of significant updates to boost Python’s performance.

The JIT compiler turns Python code into machine code the first time that code runs, unlike ahead-of-time (AOT) compilers such as GCC or Rust’s rustc. This new compiler copies a pre-built machine-code template for each bytecode instruction and fills in the blanks for its arguments. 
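The JIT works below the surface, but the bytecode it starts from can be inspected with the standard library's `dis` module. In the illustrative loop below (a made-up example, not from the release notes), each listed instruction corresponds to a template the copy-and-patch JIT can stitch together:

```python
import dis

def hot_loop(n):
    """A simple loop of the kind the specialising interpreter
    and the new JIT aim to speed up."""
    total = 0
    for i in range(n):
        total += i
    return total

# The interpreter executes these instructions one by one; the
# copy-and-patch JIT pastes a pre-compiled machine-code template
# for each and fills in its arguments.
for ins in dis.get_instructions(hot_loop):
    print(ins.opname)
```

Running this prints the instruction stream (`FOR_ITER` for the loop, among others); the exact opcodes vary between Python versions as the interpreter evolves.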

The copy-and-patch JIT was chosen for its simplicity and efficiency compared to a full JIT compiler. It avoids the complexity and resource demands of compiling high-level bytecodes into an Intermediate Language (IL) and then into machine code. This approach is particularly beneficial for Python, which runs on various CPU architectures.

The initial benchmarks show a performance improvement of 2-9%. While this might seem modest, it lays the groundwork for more significant optimisations in the future. This update is not just about immediate performance gains but also about setting a foundation for future advancements in Python’s efficiency.

The post Python Adds Support to JIT Compiler appeared first on AIM.

]]>
How GPT-4 Fast-Tracked Novice Developers to Pros in Less Than a Year https://analyticsindiamag.com/openai-updates/how-gpt-4-fast-tracked-novice-developers-to-pros-in-less-than-a-year/ Wed, 03 Jan 2024 12:09:30 +0000 https://analyticsindiamag.com/?p=10109897

GPT-4 not only helps you code in Python but also teaches you the language

The post How GPT-4 Fast-Tracked Novice Developers to Pros in Less Than a Year appeared first on AIM.

]]>

Josh Olin, entrepreneur and founder of WeGPT.ai, recently tweeted about how GPT-4 enabled him to build web requests and applications, learning Python in the process. Over a span of seven months, starting last April, Olin used GPT-4 to create the fundamental capability of fetching HTTP data from an API endpoint. 

The resulting code was then put in a Gist, and Olin repeatedly pointed GPT-4 at its own source code, providing new feature requests, suggesting improvements, and fixing bugs. He had no prior knowledge of Python. 
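The kind of starting point Olin describes, fetching HTTP data from an API endpoint, fits in a few lines of standard-library Python. This is a generic sketch of such a fetcher, not Olin's actual code:

```python
import json
from urllib import request

def fetch_json(url):
    """Fetch an API endpoint and decode its JSON body."""
    with request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `fetch_json("https://api.example.com/items")` would return the decoded response as Python dicts and lists, ready for further processing.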

As challenging as that sounded, GPT-4’s capability as a coding teacher has resulted in a number of people experimenting with the technology. 

OpenAI co-founder Greg Brockman’s tweet on GPT-4’s capabilities. Source: X

Moving Away from the Traditional Route 

With the versatility of the model, programming has become one of its prominent real-world use cases, allowing even those without coding knowledge to build apps, effectively enabling anyone to become a programmer. 

Traditionally, a software engineer with a formal four-year engineering degree is equipped to become a Python programmer, provided Python was taught as part of the curriculum. Failing a professional degree, certifications and courses can make a person proficient in Python in 6-12 months, with advanced courses taking even longer. 

With GPT-4, one can accomplish a programming task within minutes and even learn Python in the process. For the cost of an OpenAI API or ChatGPT subscription, a user can both program and learn. Furthermore, GPT-4 allows features to be combined, letting users tweak functions as required. 

Daniel Ávila Arias, co-founder of CodeGPT, an AI SaaS platform enabling developers and companies to build AI-based solutions, shared a video on using the CodeGPT extension with OpenAI to review Python code. 

With GPT-4 Vision, coding is accelerated to another level. From just an image of a basic drawing or whiteboard scribbles, a whole program can be generated. 

Relevance of Coding Teachers? 

GPT-4 and ChatGPT have enabled different forms of self-learning, raising questions about the fate of teachers. A study by the University of Toronto observed that AI coding assistant tools such as OpenAI Codex enhanced the performance of novice programmers, allowing them to write code more efficiently and with reduced frustration. 

While the tools may not quite replace a teacher, they work best as knowledge accelerators. In many cases, a person with prior knowledge of Python programming will use them more effectively than someone with none. 

A New Era of Effective Accelerationism 

Big tech companies’ shift towards educating users and teaching them programming languages is gaining pace. Google is not far behind, offering cloud platforms that let Python developers build applications as well as exclusive courses to learn the language. 

With the rapid advancement of AI in 2023, learning goals also shifted significantly. Discussions took shape on how people should study for jobs that do not yet exist. 

IBM’s global managing partner of generative AI, Mathew Candy, recently said that you don’t need a computer science degree to get a job in tech, and that it would be much easier for people without technical skills to build products. This hints at a shift in learning and employment towards self-learning, in this case of programming. 

It is possible that in 2024, we will witness further developments where more real use-cases and practical applications of GPT-4 in programming will emerge.

The post How GPT-4 Fast-Tracked Novice Developers to Pros in Less Than a Year appeared first on AIM.

]]>
Open-Source LLM360 Unveiled by Cerebras Systems, Petuum and MBZUAI https://analyticsindiamag.com/ai-news-updates/open-source-llm360-unveiled-by-cerebras-systems-petuum-and-mbzuai/ Tue, 12 Dec 2023 10:47:30 +0000 https://analyticsindiamag.com/?p=10104705

LLM360 releases Amber and CrystalCoder models that are built on Meta’s Llama architecture

The post Open-Source LLM360 Unveiled by Cerebras Systems, Petuum and MBZUAI appeared first on AIM.

]]>

AI supercomputer company Cerebras Systems, AI company Petuum, and Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) launched LLM360, a framework for creating open-source large language models (LLMs). Developed in partnership with MBZUAI’s Institute of Foundation Models, LLM360 empowers developers by providing detailed insights and methodologies, promising to simplify, expedite, and reduce costs in the development of LLMs. 

Two open-source large language models have been released: Amber, a 7-billion-parameter English-language model trained on 1.2 trillion tokens, and CrystalCoder, a 7-billion-parameter model trained on 1.4 trillion tokens and designed for English-language and coding tasks. Both models are released under the Apache 2.0 licence. A third model, Diamond, with 65 billion parameters, is set for release soon. These models were trained on the Condor Galaxy 1 supercomputer, built by G42 and Cerebras Systems. 

Both models are built on Meta’s LLaMA architecture; Amber is said to perform similarly to LLaMA-7B and OpenLLaMA-v2-7B, and to outperform Pythia-6.7B. 

Source: LLM360 Blog

CrystalCoder undergoes meticulous training, incorporating a thoughtful blend of text and code data to enhance its effectiveness in both domains. Notably, code data is introduced early in the pretraining stage, distinguishing it from Code Llama 2, which relies solely on code data during fine-tuning on Llama 2. Furthermore, CrystalCoder is specifically trained on Python and web programming languages, strategically designed to elevate its capabilities as a programming assistant.

UAE Heading Towards AI Dominance

With recent AI developments, the UAE is working towards becoming an AI superpower. Following TII’s Falcon and the demographic-specific Jais large language model, the UAE has also been rallying behind open-source models to promote research initiatives. And with AI71, an AI company launched a few weeks ago, the UAE looks to take on even AI giant OpenAI. 

The post Open-Source LLM360 Unveiled by Cerebras Systems, Petuum and MBZUAI appeared first on AIM.

]]>
NVIDIA Unveils CUDA Quantum 0.5 to Accelerate Quantum Workflows with GPUs https://analyticsindiamag.com/ai-news-updates/nvidia-unveils-cuda-quantum-0-5-to-accelerate-quantum-workflows-with-gpus/ Thu, 30 Nov 2023 07:00:36 +0000 https://analyticsindiamag.com/?p=10103900 Is IBM the NVIDIA of Quantum Computing?

It boasts an open-source programming model that seamlessly integrates quantum processor units (QPUs), GPUs, and CPUs.

The post NVIDIA Unveils CUDA Quantum 0.5 to Accelerate Quantum Workflows with GPUs appeared first on AIM.

]]>
Is IBM the NVIDIA of Quantum Computing?

NVIDIA announced the launch of CUDA Quantum 0.5, the latest iteration of its CUDA Quantum platform, tailored explicitly for developing quantum-classical computing applications. It boasts an open-source programming model that seamlessly integrates quantum processor units (QPUs), GPUs, and CPUs.

By accelerating workflows spanning quantum simulation, quantum machine learning, and quantum chemistry, CUDA Quantum optimises these intricate processes through its compiler toolchain, harnessing the immense power of GPUs.

At its core, CUDA Quantum 0.5 brings forth a suite of innovations. A significant addition is the support for adaptive quantum kernels, a development spearheaded by the QIR alliance. This advancement enables the platform to navigate complex quantum error correction and hybrid quantum-classical computations, crucial for intricate control flow and intertwined primitives.

Further augmenting its capabilities, CUDA Quantum 0.5 introduces Fermionic and Givens rotation kernels, catering specifically to quantum chemistry simulations. These kernels streamline operations on fermionic systems, empowering researchers to develop novel quantum algorithms tailored for applications in chemistry, thereby accelerating research in this domain.

In a significant stride towards quantum mechanics integration, the platform now supports exponentials of Pauli matrices. This enhancement proves invaluable for researchers engaged in quantum simulations of physical systems such as molecules, paving the way for the development of quantum algorithms tailored for optimization problems, thereby broadening the practical applications of quantum computing.

The integration of IQM and Oxford Quantum Circuits’ (OQC) QPU backends stands as a monumental achievement for CUDA Quantum 0.5. This integration expands its compatibility across a diverse range of quantum computing technologies, complementing the existing support for platforms from Quantinuum and IonQ. Developers and researchers now gain the flexibility to execute CUDA Quantum code seamlessly across multiple quantum platforms, opening doors to a myriad of possibilities.

A notable addition to this iteration is the advancement in tensor network-based simulators. These simulators prove invaluable for large-scale simulations of quantum circuits involving numerous qubits, surpassing the memory constraints of traditional state vector-based simulators. Moreover, the inclusion of a matrix product state (MPS) simulator, leveraging tensor decomposition techniques, facilitates handling a vast number of qubits and deeper gate depths within a relatively confined memory space, redefining the boundaries of quantum circuit simulations.

For those eager to explore the capabilities of CUDA Quantum 0.5, a comprehensive Getting Started guide lays out the steps for delving into Python and C++ examples. Advanced users can further explore the tutorials gallery to unleash the full potential of quantum-classical applications. To engage with the CUDA Quantum community, the open-source repository serves as a central hub for feedback, issue reporting, and collaborative feature suggestions.

The post NVIDIA Unveils CUDA Quantum 0.5 to Accelerate Quantum Workflows with GPUs appeared first on AIM.

]]>
What Can Java Do for Machine Learning? https://analyticsindiamag.com/innovation-in-ai/what-can-java-do-for-machine-learning/ Thu, 05 Oct 2023 05:46:45 +0000 https://analyticsindiamag.com/?p=10101123 Java role in ML

With flexible capabilities, vast libraries, and endorsements from major tech companies, Java is gaining traction in machine learning.

The post What Can Java Do for Machine Learning? appeared first on AIM.

]]>
Java role in ML

Python and R are undoubtedly the most widely used languages for machine learning, and yet there is no dearth of developers who use Java for the same purpose. In fact, the language is slowly catching up with Python. 

Meanwhile, LinkedIn and Oracle released the Dagli and Tribuo frameworks, respectively, in 2020, which also contribute to the Java Machine Learning Library (JavaML). The library gives users access to an extensive range of machine learning tools, apart from wrappers and APIs to integrate different frameworks with Java.  

How Java is used in ML

Java is the go-to tool for many machine learning tasks. Users can create algorithms, build models, and easily launch applications with the language. The good thing about Java is its flexibility: it can handle everything from preparing data to building models. 

Evelyn Miller, data science lead at Magnimind Academy, said, “You should remember that Java gives support for development in any field you want, and data science is no different.”

Developers can use Java to make it easy for different parts of their app to talk to its ML features. Using third-party open-source libraries and frameworks, users can leverage Java to implement what any other language does. For instance, the open-source TensorFlow Java library can run on any JVM for building, training and deploying machine learning models.

Java also helps make the launch of machine learning applications smooth and offers libraries with specific tools for different tasks. A popular Java machine learning toolkit, Weka, provides a graphical interface for data preprocessing, modelling, and evaluation. 

This library, developed by the University of Waikato, is as old as the language itself. However, it is still the most widely used library available, and its popularity continues to rise because of its flexible data mining software.

Even big tech companies, including Google, Amazon, and Microsoft, are leveraging Java for machine learning. Google developers use Java for various applications; in fact, much of the Google Suite is built in Java. 

Apart from Weka, Apache Mahout is another framework widely used by enterprises like Facebook, LinkedIn, Twitter, and Yahoo, mostly because the framework is scalable. Java can also manipulate complex data structures in ways that might not be possible in Python. 

This can be done using different frameworks: for example, Mahout uses a distributed linear algebra framework, while ADAMS (Advanced Data mining And Machine learning System) uses a tree-like structure. This allows data to be manipulated in a variety of ways. 

Adopting Java

There are 8-10 million Java developers in the world. Frank Greco, a senior consultant at Google, said at a talk, “All the big tech companies are interested to know more about using Java for ML.” 

He, along with his peers, is working on promoting the language for ML. “Java’s role in ML will come as a revelation,” Greco said. His team has engaged with major players, the likes of Twitter, Oracle, IBM, and Amazon. 

The excitement for using Java in ML is unanimous across these industry giants: there is a genuine interest in exploring how Java could be harnessed for ML. “It isn’t a case of dismissing Java in favour of Python; instead, all are keen to understand Java’s potential in the ML realm,” he explained. 

Greco built JSR 381, a Java-friendly API for visual recognition and generic ML that provides high-level abstractions. The API is not tied to any ML framework; developers can choose whichever framework best suits their needs. 

“The goal was to make visual recognition and ML easy to use by non-experts,” he said. Amazon implemented this API, and Greco considers it a good starting point for the language. “I believe that with feedback from the community, we can move this forward,” he added. 

The post What Can Java Do for Machine Learning? appeared first on AIM.

]]>
The Go-To Friend for AI Programming https://analyticsindiamag.com/ai-origins-evolution/the-go-to-language-for-ai-programming/ Mon, 04 Sep 2023 09:30:00 +0000 https://analyticsindiamag.com/?p=10099418

Python still remains a dominant force in AI development, with more than 275,495 companies using it.

The post The Go-To Friend for AI Programming appeared first on AIM.

]]>

Yes, we are talking about Python. This modern programming language is ubiquitous in machine learning, data analysis and pretty much the entire tech ecosystem. If you scroll through Papers with Code, you’ll find most machine learning research is done using PyTorch, a framework built on Python. The language isn’t used only in research but also in scripting, automation, web development, testing and more. But why is it so popular?

It has a simple, readable syntax that resembles natural language. With more than 137,000 libraries covering everything from data analysis and deep learning to computer vision and web development, Python serves as a general-purpose language with dynamic use cases. It also enjoys strong community support from active developers who contribute to its growth by creating libraries, frameworks and tools; the Python Package Index (PyPI), for example, hosts thousands of third-party packages that extend Python’s capabilities, enabling developers to solve complex problems efficiently. 

Python and AI

Python is used to build AI models more than any other language. Overall, it is the second most-used language because it is simple, direct, and easy to learn. Python also allows computationally expensive libraries to be written in C and then imported as Python modules, meaning users do not have to write in C, which is clunkier and more difficult to work with. 

This is done with Python’s CFFI. The module allows Python to leverage libraries written in C and, combined with tools like Cython, lets developers write Pythonic code while achieving speeds comparable to those of C, which is particularly useful for performance-critical applications. This is evident in its 30 million downloads per month.

This is not limited to C: other programming languages that provide C-compatible interfaces can also interact with Python through a C layer around their functions.
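The same bridging idea can be seen with the standard library's `ctypes` module, a sibling of CFFI, which loads a C shared library and calls into it directly. A minimal sketch; the fallback library name is a Linux-specific assumption and varies by platform:

```python
import ctypes
import ctypes.util

# Locate the C math library; the fallback name is Linux-specific.
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")

# Declare the C signature so ctypes converts arguments and the
# return value correctly (C doubles <-> Python floats).
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(9.0))  # -> 3.0
```

Python code stays Pythonic at the call site while the heavy lifting happens in compiled C.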

Most importantly, the Python community is better focused than previous cultures on finding a Pythonic way to proceed and then advocating for it. It has multiple independent communities of use: web, data science, ML, DevOps. It also built the right kind of libraries, like NumPy and pandas (for numerical computing and data analysis, respectively), which sealed the deal for it in the scientific and research communities. 

The language also saw massive support from corporates: Google invested heavily in building TensorFlow, while PyTorch is primarily developed and maintained by Facebook’s AI Research (FAIR) lab, part of Meta Platforms, Inc. It isn’t surprising that a bigger community usually means better support and more libraries, which in turn feeds the community’s growth. 

The Python Software Foundation has been responsible for maintaining and developing Python, constantly adding new features and functionality. Users can be sure that the language will be supported for the foreseeable future, which makes Python a good choice for AI development. 

Other languages catching up?

While none of the other languages hold up to the breadth of development in Python, they are nonetheless used for specific purposes. Rust is gaining attention in AI development due to its focus on memory safety, performance, and concurrent programming. Rust is known for preventing common programming errors that can lead to security vulnerabilities. This is crucial for AI systems that handle sensitive data. Its memory management is more manual compared to Python, but this provides fine-grained control over resources. 

Ruby’s adoption in AI is not as widespread as Python, but its ease of use and community support make it an attractive choice for AI development in certain contexts. Ruby has gained attention in AI development, especially in the context of web applications that leverage AI features. Ruby has libraries like TensorFlow.rb, which brings TensorFlow to the Ruby community, and other AI-related gems. 

Python still remains a dominant force in AI development, with more than 275,495 companies using it. The language is beginner friendly while at the same time being used by experts for the development of AI thanks to its extensive documentation. 

The language has a bright future as it’s now being taught to children in schools and is part of the curriculum for students as young as seven years old.

The post The Go-To Friend for AI Programming appeared first on AIM.

]]>
Fixit 2 vs Ruff https://analyticsindiamag.com/innovation-in-ai/fixit-2-vs-ruff/ Wed, 09 Aug 2023 06:58:53 +0000 https://analyticsindiamag.com/?p=10098302

Meta is back with the Python-based auto-fixing linter Fixit 2 to help you write code better, but is it better than the Rust-based Ruff?

The post Fixit 2 vs Ruff appeared first on AIM.

]]>

“Fixit is dead! Long live Fixit 2,” said Meta, releasing the latest version of its open-source auto-fixing linter. The all-new linter was launched with the intention of enhancing developers’ efficiency and competence, encompassing both open-source projects and an extensive array of projects within the internal monorepo. All of this is fine, but how does it fare against the Rust-based Ruff, released earlier this year?

Amethyst Reese, the project lead and primary engineer for Fixit 2, said on HN that these are two different linters with very different goals. “While Ruff prioritises speed over all other concerns, and chooses Rust as the method of achieving that goal with a set of generic lint rules, Fixit is focused on making it easy for Python developers to write custom lint rules,” she added.

Further, she said that the use of LibCST (which has a parser written in Rust) makes it easy to build lint rules in Python. Fixit permits engineers to craft specific linting rules within their project repository with minimal code, enabling prompt activation without the need to create personalised plugins or packages, and without rebuilding or deploying a fresh iteration of Fixit. “This lowers the barriers to writing rules, and provides a quick development and feedback cycle when testing them,” she added. 

Moreover, she also said that hierarchical configuration allows Fixit to fit well into larger monorepos, with each team able to easily control what lint rules run on their code base. “Open source projects homed in those monorepos also have tools to ensure that results are the same when running on internal vs external CI systems,” she added. 

Limitations of Fixit 

The auto-fixing linter, Fixit, was originally developed for Instagram and later released as an open-source tool, but faced limitations. It lacked the capability to accommodate local lint rules or hierarchical configuration, features that were crucial for the monorepo structure hosting a multitude of projects. Requests from developers to incorporate Fixit into the monorepo were numerous; however, various challenges emerged, resulting in only partial support for a limited set of security lint rules. This diminished the direct benefits that could have been derived for the Python codebase.

Considering the AI/ML shift, Meta embarked on a partial rewrite of Fixit, with the crucial emphasis on embracing an open-source-first approach. The new version, Fixit 2, meets the requirements of both internal monorepos and open-source projects. It also introduced support for local, in-repository lint rules similar to those in Flake8, a more refined command-line interface (CLI), and an improved application programming interface (API) to enable seamless integration with other tools and automation. Fixit 2 eliminates the risk of generating incorrect syntax, and it suggests and applies automated fixes based on the lint rules themselves, enhancing its utility and accuracy in code improvement. 

Creating a fresh lint rule requires just a few lines of code, often fewer than twelve, along with inline definition of test cases. Additionally, you have the flexibility to position the rule adjacent to the code it intends to lint, streamlining the process.

Source: engineering.fb

Moving Away from Flake8

At Meta, Python holds a prominent position as one of the most extensively employed programming languages. The company believes Python’s user-friendly, easy-to-understand syntax and its extensive collection of open-source libraries providing pre-built functionality make it a pivotal tool. 

There are a number of effective linters available in the Python ecosystem. At Meta, Flake8 has been used since 2016 and has proven highly successful in aiding developers to minimise bugs and maintain a well-structured codebase. Flake8 is a widely used Python tool that combines several static analysis tools to check Python code for adherence to coding style and potential errors. As a linter, it scans your Python code without actually executing it, helping you catch potential issues and maintain a consistent code style.

The flake8-bugbear plugin, an extension to Flake8 that provides additional linting rules to catch potential issues and improvements in Python code, was created by Łukasz Langa while working at Meta. Langa is a prominent figure in the Python community who has made significant contributions to projects such as Black, and has served as the Python Software Foundation’s developer-in-residence and as release manager for Python 3.8 and 3.9. 

Flake8 has been a cornerstone of Meta’s code linting approach, but it has limitations. Creating new linting rules requires complete plugins, leading to complex plugins that address multiple errors. When linting issues arise, Flake8 only provides location details, lacking suggestions for improvements; consequently, developers resort to trial and error to meet linting standards. Furthermore, Flake8 relies on the stdlib ast module, hindering the parsing of new syntax features. Thus, adopting new language features hinges on tool updates, potentially slowing down development.

While Meta has explained about Fixit 2, it did not mention anything about where it stands in terms of speed when compared to other linters. Ruff, which is written in Rust, as opposed to others which are Python-based, is the quickest. Ruff outpaces Flake8 by about 150 times in speed on macOS and surpasses Pycodestyle by 75 times, along with outstripping Pyflakes and Pylint by 50 times, among others. Ruff achieves a swift total processing time of around 60 milliseconds for a single file in CPython, making it notably faster. 

Source: GitHub

The post Fixit 2 vs Ruff appeared first on AIM.

]]>
Prompt Engineering is the New C++ https://analyticsindiamag.com/tech-ai-blend/prompt-engineering-is-the-new-c/ Fri, 09 Jun 2023 07:16:05 +0000 https://analyticsindiamag.com/?p=10094816 Prompt Engineering is the New C++

Just like C++ is considered a “dying language”, prompt engineering is regarded as a passing fad.

The post Prompt Engineering is the New C++ appeared first on AIM.

]]>
Prompt Engineering is the New C++

Can a programming language ever get replaced? One might say that C was replaced by C++, which was later replaced by Python. Though Python has been able to stand its ground amid rapidly rising languages like Rust, the latest competitor is our beloved English, riding on prompt engineering. 

Promptgramming is something that Python folks are forced to hate. As Aleksa Gordić, former DeepMind and ex-Microsoft researcher put it, “The history of computer science is the history of smirking at a thing that’s about to ‘replace’ you.”

At Microsoft Build 2023, there was great emphasis on the idea that “now everyone’s a developer”, with the rise of models that can write code from a single prompt. On the other hand, Sam Altman, the creator of models such as ChatGPT, believes otherwise.

He believes the current state of prompt engineering exists only because of the temporary limitations of large language models, and that anyone doing prompt engineering right now will not be doing it in five years. That is not because it will have become a fad, but because everyone, including the LLMs, will be proficient enough to understand what the user wants and generate it perfectly.

Is it a tall claim?

Firstly, it might be true technologically: as the creator of such models, Altman is well placed to estimate what these LLMs may be capable of in the future. Secondly, no one would deny that language is the most important tool in the world.

“Text is the projection of the world,” said OpenAI’s co-founder Ilya Sutskever. While he was talking about the capabilities of LLMs and how text-based models could possibly lead to AGI, it can also be said that the perfect arrangement of words to convey meaning and information will always be mightier than any sword. 

Coming back to programming languages: just as Python could not replace C++, prompts may not be able to replace coding. Some people still use Assembly, C, or C++ for building frameworks and algorithms. Python is good for machine learning, and prompt engineering is good for building models quickly, but to build a foundation from scratch, C or C++ will always be there. 

That is not to say that prompt engineering holds no value. It has already found its niche and will probably stay around for a while; in certain cases, prompt engineering jobs pay more than Python roles. Temporarily, it might look like “promptgramming” is actually replacing Python.

What will replace prompt engineering?

If we think about it, prompting is very similar to a lot of other languages; it is just one that feels closer to human language. Going on ChatGPT or Codex and simply typing in what you want does not work that easily. It is a skill to learn as well: anyone who has tried building models by throwing prompts at them knows it is no piece of cake. 

The best part about prompting is that it is mostly trial and error: if one prompt doesn’t work, try another, and it might work out eventually. But isn’t that the case with a lot of programming languages as well? You do need basic coding knowledge before even jumping into Python; prompt engineering simply removes that barrier to entry. 
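The trial-and-error loop described above can be sketched in a few lines of Python. The `generate` function here is a hypothetical stand-in stub for a real model API (a call to ChatGPT, say); its behaviour is invented purely for illustration.

```python
# A minimal sketch of prompt refinement by trial and error.
# `generate` is a hypothetical stub, not a real model API: it only
# "succeeds" when the prompt is specific enough.
def generate(prompt: str) -> str:
    if "sorted" in prompt and "descending" in prompt:
        return "def sort_desc(xs): return sorted(xs, reverse=True)"
    return "ERROR: ambiguous request"

# Refine the prompt until the output looks usable.
attempts = [
    "sort my list",
    "return the list sorted",
    "return the list sorted in descending order",
]
result = None
for prompt in attempts:
    output = generate(prompt)
    if not output.startswith("ERROR"):
        result = output
        break

print(result)
```

Only the third, most specific prompt gets a usable answer back, which is the whole "skill" of prompting in miniature.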

Apart from the legal and ethical issues around copying code from Stack Overflow and other programmers, the problem with these code-generation platforms is that they are not perfect. Even if two people enter the same prompt, the results might differ. Moreover, debugging the code written by these models is a task in itself, one that requires knowledge of programming languages.

So what would kill prompt engineering? Probably, in a few years, as LLMs get better, everyone will be able to generate perfect outputs every time. The most interesting potential killer, though, is thought prompting. With Neuralink and similar projects coming into the picture, soon, just as Python developers smirk at prompt engineering, prompt folks will jeer, “These thought prompters can’t even type their own prompts. Losers!”

The post Prompt Engineering is the New C++ appeared first on AIM.

]]>
PyTorch Tabular v1.0.2 Released https://analyticsindiamag.com/ai-news-updates/pytorch-tabular-new-version-v1-0-2-released/ Wed, 31 May 2023 10:55:34 +0000 https://analyticsindiamag.com/?p=10094231

The library now includes a new method in TabularModel for enabling feature importance

The post PyTorch Tabular v1.0.2 Released appeared first on AIM.

]]>

A new version, v1.0.2, of PyTorch Tabular has been released. The library now includes a new method in TabularModel for enabling feature importance, which has been enabled for the FTTransformer and GATE models. The release note was generated by ChatGPT using the git commit logs from the last release, wrote Manu Joseph, the creator of PyTorch Tabular, GATE and LAMA-Net, on LinkedIn.

Check out the GitHub repository to learn more about the update. 

PyTorch Tabular is a deep learning library that makes working with deep learning and tabular data easy and fast. The library has been built on frameworks PyTorch and PyTorch Lightning, and it works on pandas data frames directly. 

The framework makes the standard modeling pipeline simple enough for practitioners while also being reliable enough for production use. It also focuses on customization so that it can be used in a variety of research settings. The below picture depicts the structure of the framework.

Latest Additions

The latest release brings several enhancements to the library. 

Firstly, two additional parameters have been added to the GATE model, expanding its functionality and versatility. Additionally, the library configuration now includes the metric_prob_input parameter, providing improved control over metrics within the models. The GATE model has undergone slight improvements, including adjustments to defaults that enhance its overall performance. 

Furthermore, various minor bug fixes and improvements have been implemented, such as the addition of accelerator options in the configuration and enhancements to the progress bar. Alongside these enhancements, the library has been updated with newer versions of dependencies, including docformatter, pyupgrade, and ruff-pre-commit. These updates contribute to the library’s overall reliability, functionality, and performance.

Read: How to Handle Tabular Data for Deep Learning Using PyTorch Tabular?

The latest version has been released four months after v1.0.1. Other improvements include various code optimizations, bug fixes, and CI enhancements. For more details, refer to the commits on the library’s GitHub repository.

PyPi: https://pypi.org/project/pytorch-tabular/

Documentation: https://pytorch-tabular.readthedocs.io/en/latest

The post PyTorch Tabular v1.0.2 Released appeared first on AIM.

]]>
Now Everyone’s a Developer, Thanks to Microsoft https://analyticsindiamag.com/ai-origins-evolution/now-everyones-a-developer-thanks-to-microsoft/ Wed, 24 May 2023 09:30:00 +0000 https://analyticsindiamag.com/?p=10093848 Everyone is Now Officially a Developer, Thanks to Microsoft

Developers! You might have heard of “upskilling”, ever heard of “downskilling”?

The post Now Everyone’s a Developer, Thanks to Microsoft appeared first on AIM.

]]>
Everyone is Now Officially a Developer, Thanks to Microsoft

We are witnessing a generational shift in technology and the job market with AI. Until recently, coders were using low-code/no-code tools like Codex, GitHub Copilot, or Replit to write better code. Now, even ChatGPT or Bard can generate deployment-ready code from simple prompts in natural language. 

Nick Bostrom, in his TED Talk in 2015, said, “Machine intelligence is the last invention that humanity will ever need to make. Machines will then be better at inventing than we are.” 

This explains a lot of what is happening now. No one needs to count themselves out of the AI phenomenon anymore. Even if you have never written a single line of code, machines will now do it for you: tell the no-code platform exactly what you want to build, let the AI generate the code, and simply deploy it. 


“I believe everyone is a developer now,” was said multiple times at the Microsoft Build 2023 conference. Now everyone can code and land a job in AI, even without having learned how. “There are several opportunities for people who might not consider themselves traditional developers,” and Microsoft is introducing more tools to make this true.

At Microsoft Build 2023, it was clear that the company wants everything to be integrated with AI by introducing a Copilot in almost every single offering.

Is it that easy?

Andrej Karpathy posted in January, “The hottest new programming language is English.” Some still argue that programmers are needed, but this new software is making the job look obsolete. Soon, instead of an eligibility requirement, job listings for developers will say “knowledge of Python or C++ is an additional advantage, but not a requirement”. 

This is increasingly becoming true with prompt engineering. Moreover, this hot new job pays more than Python development; in certain cases, salaries run upwards of $335,000, higher than the majority of full-stack developer roles. 

There has always been a salary disparity between programmers, coders, or developers and jobs that do not require programming knowledge. Software engineering has been among the highest-paying jobs for a long time, and people who spend years and thousands of dollars learning to program expect higher salaries for their skills. But those skills are mostly no longer required. 


We have all heard of “upskilling”; now it’s time for “downskilling”. Developers with expertise in C++ or Python should start removing it from their resumes to get jobs quicker. However, if you are building an auto-coding platform similar to ChatGPT or Codex, you have no choice but to upskill. If not, you are by default a ‘prompt engineer’, as resonated at Microsoft Build. 

This is exactly what Mark Cuban said a few years ago: “Twenty years from now, if you are a coder, you might be out of a job.” It seems true now that AI is coming for developers’ jobs. This does not mean there is no need for them, but many of those without deep experience in coding and the inner workings of AI systems can now be replaced by anyone who can prompt AI well enough to perform simple tasks. 

So developers have two choices — either “upskill” yourself to build something to compete with Microsoft, OpenAI, and Google, or “downskill” yourself to get a job quicker.

‘Overskilled’ for a job

Microsoft, Google, and OpenAI have brought about a massive change, not just in their own companies but in every company. Watching the capabilities of these AI models, everyone feared losing their job as companies froze hiring and laid off people whose work could be done by AI. Now, it seems that not being skilled in programming languages might help people land a developer job even faster!

Companies would not want to hire a person who demands a higher salary just because they paid higher tuition to learn computer science and coding nuances. If they can reap the same benefits with someone who can simply prompt AI to generate the same results, in certain cases even quicker, what’s the use?

Prompt engineer salaries are not going to stay on top, though. They will drop significantly once everyone adopts the technology. Rob Lennon, a prompt engineering tutor, said, “In six months, 50,000 people will be able to do this job. The value of this knowledge is greater today than it will be tomorrow.”


People have already been using ChatGPT to land multiple jobs. Some have even started businesses with it, and some have developed their own apps. In certain cases, prompt engineering and ChatGPT have become important skills for bagging a job. 

Moreover, people who do not want to get into AI can adopt the technology to become better at their jobs. For example, a writer can use ChatGPT with the right prompts to write quicker, and even better. Good news for the writers protesting in Hollywood: they can prompt an AI trained on their own content to write much better!

On the other hand, this might also cost people artistic jobs. A person who has never written a poem in their life can now prompt ChatGPT to write a beautiful one, while the poet who never learned to prompt keeps struggling for ideas.

Now anyone can become a coder, developer, or programmer without ever having deployed a single line of code. Good luck, developers, stay strong. Meanwhile, a person who knows English will generate code to replace you. 

The post Now Everyone’s a Developer, Thanks to Microsoft appeared first on AIM.

]]>
This Job Role will Still be Relevant When Data Scientists be Gone https://analyticsindiamag.com/ai-origins-evolution/this-job-role-will-still-be-relevant-when-data-scientists-be-gone/ Fri, 12 May 2023 12:47:30 +0000 https://analyticsindiamag.com/?p=10093265

ML Engineers have the know-how to deploy these models into production environments, ensuring their stability, scalability, and robustness. 

The post This Job Role will Still be Relevant When Data Scientists be Gone appeared first on AIM.

]]>

In an era when being a Data Scientist was the epitome of cool, college graduates flocked to the field, drawn by the allure of its potential. The hype was real, and the demand for these professionals was skyrocketing. However, as artificial intelligence (AI) and machine learning (ML) continue to advance at an astonishing pace, doubts have arisen about the very existence of Data Scientists. The rapid adoption of AI and ML has ignited a passionate debate about the future of this once-revered profession.

On one side of the argument are those who assert that OpenAI’s recent announcement that it will introduce plugins to ChatGPT, while teasing the launch of a code interpreter and web-browser plugin, will render traditional data science roles obsolete. They believe the plugins may replace many of a data scientist’s common workflows, including visualization, trend analysis, and even data transformation. Looking at the code interpreter alongside other advances in the field, there is a notion that the algorithms and automation offered by AI will replace the need for human intervention in data analysis. Conversely, there are those who staunchly maintain that AI and ML will open up new and exciting opportunities in data science.

One such role is that of the ML Engineer, into which experts believe the role of Data Scientists will gradually transform. According to a report by Indeed, the job title “machine learning engineer” is growing at a rate of 344%, while “data scientist” is growing at a rate of 25%. Another report, by O’Reilly Media, found that 80% of data scientists plan to learn machine learning in the next year.

As generative AI takes hold, with models that often involve large-scale data processing and sophisticated algorithmic architectures, ML Engineers will be in more demand than ever. These engineers possess the technical expertise to handle the computational challenges of training and deploying such models effectively. They have a deep understanding of distributed computing, parallel processing, and GPU acceleration, allowing them to optimize the performance of generative AI models and scale them to handle vast amounts of data. 

Additionally, ML Engineers are skilled in the deployment and productionisation of ML models. Generative AI models are not just research prototypes; they are increasingly being integrated into real-world applications. ML Engineers have the know-how to deploy these models into production environments, ensuring their stability, scalability, and robustness. They are proficient in building end-to-end ML pipelines, handling data preprocessing, model deployment, and monitoring, which are crucial steps in incorporating generative AI into practical use cases. 

Furthermore, generative AI models often require fine-tuning and customization to align with specific business objectives and user requirements. ML Engineers possess the expertise to fine-tune and adapt these models, leveraging techniques such as transfer learning and hyperparameter tuning. They can tailor generative AI models to address specific challenges and optimize their performance for the intended application domain. Moreover, ML Engineers have a comprehensive understanding of the ethical implications and considerations associated with generative AI. They are aware of the potential biases, fairness issues, and privacy concerns that can arise when deploying AI models that generate content. ML Engineers are equipped to address these challenges and implement safeguards to ensure the responsible and ethical use of generative AI.

Data Designers 

The role of a “Data Designer” is also becoming increasingly crucial in today’s data-driven organizations, particularly in the era of Generative AI. These professionals hold the responsibility of defining the organization’s “unique norm of data,” encompassing aspects such as data literacy, models, topics, and ontology. Moreover, they play a pivotal role in establishing a unified and coherent data vision across the entire organization, ensuring that everyone adopts a “common language” when dealing with data. 

The primary focus of a data designer is to establish a structured framework for data management, ensuring that data is organized, accessible, and usable across the organization. They design and implement data models, which serve as blueprints for how data is structured, stored, and interconnected. These models help in capturing and representing the relationships between different data elements, enabling efficient data analysis and interpretation.

In addition to data modelling, data designers also define data standards and guidelines for data governance. They establish data quality criteria and ensure that data is accurate, consistent, and reliable. Data designers collaborate with various stakeholders, including data engineers, data scientists, and business analysts, to understand their data requirements and translate them into practical data design solutions.

Another important aspect of a data designer’s role is to create a common language or ontology for data within the organization. They develop a standardized vocabulary and terminology that allows different teams and departments to communicate effectively when working with data. This helps in avoiding confusion, improving collaboration, and promoting data literacy across the organization.
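One lightweight way to encode the “common language” described above is a shared vocabulary plus a typed record that every team reuses. The sketch below uses only the Python standard library; all the names (`Channel`, `SaleRecord`, the field names) are hypothetical illustrations, not part of any real standard.

```python
# A minimal sketch of a data designer's "common language": an Enum fixes
# the agreed terminology, and a frozen dataclass acts as the blueprint
# for how a record is structured. All names here are hypothetical.
from dataclasses import dataclass
from enum import Enum

class Channel(Enum):
    # The one agreed vocabulary for sales channels across teams.
    WEB = "web"
    STORE = "store"
    PARTNER = "partner"

@dataclass(frozen=True)
class SaleRecord:
    order_id: str
    channel: Channel
    amount_inr: float  # unit encoded in the name to avoid ambiguity

sale = SaleRecord(order_id="A-1001", channel=Channel.WEB, amount_inr=499.0)
print(sale.channel.value)  # -> web
```

Because every department constructs records through the same types, a misspelled channel name or an ambiguous currency column fails at creation time instead of surfacing later as inconsistent data.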

The post This Job Role will Still be Relevant When Data Scientists be Gone appeared first on AIM.

]]>
This New Programming Language is Likely to Replace Python https://analyticsindiamag.com/innovation-in-ai/this-new-programming-language-is-likely-to-replace-python/ Wed, 03 May 2023 09:30:00 +0000 https://analyticsindiamag.com/?p=10092672 This New Programming Language Likely to Replace Python

Who knows? Mojo might be just another ‘Julia moment’ in the making, or might actually be the language of the future

The post This New Programming Language is Likely to Replace Python appeared first on AIM.

]]>
This New Programming Language Likely to Replace Python

AI infrastructure company Modular AI recently unveiled Mojo, a new programming language that combines the syntax of Python with the portability and speed of C, making it ideal for both research and production. 

Besides this, in the Product Launch 2023 keynote, Tim Davis and Chris Lattner, of LLVM and Swift fame, also released one of the fastest unified inference engines, the Modular Platform. 

The creators of Mojo say they had no intention of creating a new programming language. “But as we were building our platform with the intent to unify the world’s ML/AI infrastructure, we realised that programming across the entire stack was too complicated,” reads the blog. 

This means building a programming language with powerful compile-time metaprogramming, integration of adaptive compilation techniques, caching throughout the compilation flow, and other features not supported by existing languages. That is the direction Mojo is heading in. The team claims it is 35,000x faster than Python.

Some of the key features are:

  • Native support for multiple hardware backends: Mojo allows for utilization of CPUs, GPUs, TPUs, and custom ASICs, catering to the strengths of each hardware type.
  • High-level syntax and semantics: Mojo’s high-level syntax and semantics are comparable to Python, making it easy for Python-savvy developers to learn and use.
  • Automatic parallelisation: Mojo simplifies writing of efficient, parallel code through automatic parallelisation across multiple hardware backends, without requiring low-level parallelisation knowledge.
  • Type inference and checking: Mojo offers a type inference and checking system, catching compile-time errors and reducing the likelihood of runtime errors.
  • Static compilation: Mojo is statically compiled, resulting in faster execution times and better optimization as code is compiled before execution.
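The automatic parallelisation Mojo advertises is worth contrasting with what Python requires today, where the work-splitting must be written by hand. Below is a minimal stdlib-only sketch of that manual approach, dividing a sum of squares across worker threads (note that, because of the GIL, threads give little speedup for pure-Python compute, which is precisely the gap Mojo is aiming at).

```python
# Manual parallelisation in today's Python: split a sum of squares into
# chunks and fan the chunks out to a thread pool. Mojo claims to do this
# kind of splitting automatically across hardware backends.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(bounds):
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

def parallel_sum_of_squares(n, workers=4):
    step = n // workers
    # The last chunk absorbs the remainder so the ranges cover 0..n exactly.
    chunks = [(k * step, (k + 1) * step if k < workers - 1 else n)
              for k in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))

print(parallel_sum_of_squares(10_000))  # -> 333283335000
```

Every line of chunking and pooling here is boilerplate that a language with automatic parallelisation would absorb into the compiler.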

New Programming Language, Really? 

It looks like Julia, the language touted as the Python replacement for its scalability and one of the most embraced languages of the last few years alongside Rust, finally has another competitor. 

Moreover, according to Mojo’s documentation, instead of starting from scratch, the language will leverage the entire ecosystem of Python libraries while being built on a brand-new codebase. This, along with the computational power of C and C++, will let AI Python developers rely on Mojo instead of falling back on C or C++.

One of the major motivations behind the new language, according to its developers, was that most modern programming systems rely on accelerators like GPUs for operations and only “fall back” on the main CPUs for supporting tasks like data loading, pre- and post-processing, and integrations into foreign systems written in other languages. The company wanted to support this full gamut in one language.

Moreover, rather than invent a new syntax and community, the company decided to go with Python and its ecosystem. A very smart move indeed!

Mojo will also remain open-source until it becomes a superset of Python 3. 

Competitions Galore

According to the Stack Overflow Developer Survey 2022, Rust is the most loved programming language, and has been for seven years running. The problem with Rust is its complex syntax, which makes for a steep learning curve. Even so, Rust is used by Meta and Dropbox, with Google planning to adopt it as well. 

In the same survey, Julia ranked in the top five most loved languages, beating Python, as it did the year before. Viral Shah, the co-creator of Julia, said in a decade-old interview with AIM, “We wanted a language that did not trade off performance for productivity and instead provided both high performance and high productivity.” 

Interestingly, Elon Musk recently tweeted that AGI will be built not on Python but on Rust, after saying last year that he is a fan of Rust. In the thread, some users replied that they are on the side of Chris Lattner and hope it is Swift, one of Lattner’s earlier offerings. Now, Modular asks, “What if it’s the best of all of them?”

Addressing many of these questions on Hacker News about comparisons with Julia and Rust, and future plans to compete with Python, Chris Lattner, one of the co-creators, praised Julia as “a wonderful language and a wonderful community,” calling himself a super fan. On the differences between Julia and Mojo, he stressed that Mojo carries a raft of technical advancements over languages like Swift, Rust, C++, and even Julia, because it has learnt from them and built on them.

He further added that there is definitely space in the AI/ML landscape for another language that makes it easier to deploy and scale models while supporting the full Python ecosystem, noting, “Julia is far more mature and advanced in many ways.” It is interesting how Lattner looks at a problem and decides to make a new programming language altogether, as a Twitter user pointed out. 

Though the developers have been humble about their approach to Python, the communities on Hacker News and Twitter are busy comparing Mojo with it. 

A Game Changer?

Python, and even Julia, is not a preferred language for systems programming; it is used mostly for building AI models. Python overcomes that limitation with low-level bindings to C and C++ for building libraries, but building these hybrid libraries is a complicated task that requires knowledge of C and C++ as well. This is where Mojo comes in, offering one integratable, backwards-compatible, Python-like language: a “Pythonic C”.
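The gap those C bindings close is easy to demonstrate with the standard library alone. In the sketch below, the built-in `sum` (implemented in C inside CPython) is timed against an explicit Python loop doing the same work; absolute timings vary by machine, so only the relative gap is meaningful.

```python
# Why Python libraries fall back on C: time a C-backed builtin against
# the equivalent pure-Python loop. The numbers are machine-dependent;
# the point is the consistent gap between the two.
import timeit

data = list(range(100_000))

def py_loop_sum(xs):
    total = 0
    for x in xs:
        total += x
    return total

t_builtin = timeit.timeit(lambda: sum(data), number=50)
t_loop = timeit.timeit(lambda: py_loop_sum(data), number=50)
print(f"C-backed sum: {t_builtin:.3f}s, pure-Python loop: {t_loop:.3f}s")
```

Scaled up to the inner loops of an ML library, this interpreter overhead is exactly what forces the hybrid Python/C design Mojo wants to eliminate.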

But whenever a new technology arrives, there are sceptics and naysayers, who sometimes raise interesting points. While some on the Hacker News forum argue that this might be a Python replacement, others remain sceptical of the performance improvements the language’s creators promise. Some don’t even call it ‘Pythony’, a comparison the developers behind the language have made efforts to stay away from. 

Another person on the forum calls Mojo the final nail in the coffin for “Julia as a replacement for Python”. Maybe Julia has missed out on its window of opportunity to replace Python, and Mojo is here to do the job. Still, the arena of programming languages remains unpredictable.

Moreover, this might be just another Julia moment in the world of programming, with Python syntax. In any case, OpenAI is on a somewhat similar mission with Triton, its own programming language. 

The post This New Programming Language is Likely to Replace Python appeared first on AIM.

]]>
6 Data Science Job Openings at Leading Indian Companies  https://analyticsindiamag.com/innovation-in-ai/6-data-science-job-openings-at-leading-indian-companies/ Fri, 28 Apr 2023 04:30:00 +0000 https://analyticsindiamag.com/?p=10092445

JP Morgan & Chase, Bosch, PayPal, LSEG, PwC, Walmart are among the top companies seeking talented individuals to join their workforce

The post 6 Data Science Job Openings at Leading Indian Companies  appeared first on AIM.

]]>

Industries spanning finance, healthcare, e-commerce, and marketing are harnessing the power of data science to boost growth, efficiency, and innovation, making it a lucrative field to be in. As the demand for data scientists continues to soar, this may be the right time for those seeking a profitable career shift. So if you are a budding or a seasoned data scientist looking for a job change, we have got you covered. 

JP Morgan & Chase 

Role: Associate Sr – Data Science

JP Morgan & Chase is seeking individuals for the position of Associate Sr – Data Science who are passionate about converting data into valuable insights and empowering business breakthroughs. They will analyse customer behaviours and predict financial needs to optimise sales performance, develop self-serve tools for real-time information, and identify potential attrition events for effective counteraction.

Minimum Qualifications:

  • Bachelor’s/Master’s Degree in quantitative analytics fields with over seven years of data science experience
  • Proficient in SAS/Python programming, data wrangling, and complex SQL scripting
  • Skilled in solving business problems with fact-based and scientific analytics

Preferred Qualifications:

  • Familiarity with financial services, consulting, or marketing agency insights functions
  • Experience in workforce analytics, sales performance analytics, or sales science organisation
  • Expertise in big data disciplines, Agile methodologies, and new technologies

Click here to apply. 

Read more: Data Science Hiring Process at Pepperfry

Bosch

Role: Data Scientist

As a data scientist in the Bosch global team, you’ll provide AI and ML solutions and collaborate with other departments, enhance existing cloud-based solutions, and explore new use cases. 

Minimum Qualifications: 

  • The ideal candidate must have a bachelor’s or master’s degree in software engineering, computer science, mathematics, or a similar field. 
  • At least two years of experience in professional data science. 
  • The candidate must possess extensive knowledge of Python (with knowledge of R being a plus), as well as object-oriented programming languages with a strong emphasis on clean code. They must have a proven track record in areas such as ML, neural networks, pattern analysis, time series forecasting, data analysis, and data-pipeline technologies (e.g., Kubernetes, Docker, NoSQL databases, Workflow-Engine). 
  • A strong understanding of statistics is necessary, and knowledge of cloud technologies (ideally Microsoft Azure) and SQL would also be needed.

Click here to learn more about it.

PayPal

Role: Manager – Data Science

As a Manager – Decision Science, you will lead and develop a team of skilled decision scientists, providing daily guidance to ensure exceptional results, and also oversee key metrics, making changes when necessary to improve performance. 

Minimum Qualifications:

  • Experience as either a manager or Lead Decision Scientist / Lead Data Scientist / Analytics roles
  • The ideal candidates must have excellent analytical skills, including expertise in SQL and data visualisation. 
  • They should also have experience in leading cross-functional collaborations and managing relationships with multiple stakeholders. 
  • The ability to effectively lead a team and promote teamwork is also essential. 
  • The company recognises that some candidates may lack confidence due to imposter syndrome and encourages them to apply regardless.

Apply here.

LSEG (London Stock Exchange Group)

Role: Data Scientist

The role of a data scientist involves data handling, programming, and developing automated solutions for sourcing information. It includes analysing content, building models, and using machine learning to improve core processes. Collaboration with various teams is required to tackle large-scale analytics issues and create visualisations and pipeline tools. 

Minimum Qualifications:

  • Higher education in statistics, mathematics or engineering in computer science with data science certification.
  • Proficiency in developing and delivering automation using Python and R. Additionally, they should be skilled in utilising one or more technologies such as VBA, SQL, JAVA, and RPA to drive improvements in business processes. 
  • Strong understanding of data science principles and have experience managing financial content. 
  • Adaptable to new technologies and can provide guidance to team members.

Preferred Qualifications:

  • The desired skills for this role include data mining, data sourcing, NLP, unsupervised and deep learning, and predictive modelling. 
  • They should also be experienced in using AWS and other cloud-based tools that facilitate the onboarding of Python codes.

Check out their careers page now.

PwC

Role: Senior Manager – Data Science

The role involves collaborating with US-based consultants and clients, and working closely with the business analytics teams in India. The main responsibilities will include leading high-level analytics consulting projects and providing sound business advice to project teams.

Minimum Qualifications: 

  • The candidate should be experienced in managing and deploying ML models in cloud environments, with a strong understanding of supervised and unsupervised ML algorithms, statistics, and data analysis. 
  • The candidate must also have extensive experience working with various ML frameworks and tools such as scikit-learn, mlr, caret, H2O, TensorFlow, PyTorch, and MLlib. 
  • They must have advanced-level programming skills in SQL and Python/PySpark, enabling them to guide teams, and must be proficient in using visualisation tools such as Tableau, Power BI, and AWS QuickSight to convey information to stakeholders.

Apply here. 

Walmart Global Tech

Role: Senior Data Scientist

As a senior data scientist, you’ll solve complex problems by analysing terabytes of data using data science tools and techniques. You’ll also be involved in developing PoCs and presenting them to the product team, then work with ML and software engineering teams to deploy solutions as APIs or pipelines. You’ll stay updated with the latest tech and mentor junior associates in providing robust data science solutions.

Minimum Qualifications:

The candidate needs one of the following:

  • A bachelor’s degree in a related field such as statistics, economics, analytics, mathematics, computer science, or information technology, along with at least three years of experience in an analytics-related field.
  • A master’s degree in one of the mentioned fields, with at least one year of experience in an analytics-related field.
  • A minimum of five years of experience in an analytics or related field.

Click here to apply.

Read more: Decoding the Stephen Wolfram Enigma

The post 6 Data Science Job Openings at Leading Indian Companies  appeared first on AIM.

]]>
Developer of Ruff Launches Astral, Chokes Python https://analyticsindiamag.com/ai-news-updates/developer-of-ruff-charlie-marsh-launches-astral-chokes-python/ Wed, 19 Apr 2023 07:41:40 +0000 https://analyticsindiamag.com/?p=10091775

Developers reacted to the release stating, "Rust is eating Python like it did JavaScript.”

The post Developer of Ruff Launches Astral, Chokes Python appeared first on AIM.

]]>

Charlie Marsh, former Spring Discovery and Khan Academy staff engineer, has announced the launch of his new company, Astral, which will build developer tools for the Python ecosystem. The company builds on the successful release of Ruff, a fast Python linter written in Rust that has gained significant traction. Reacting to the release, developers said, “Rust is eating Python like it did JavaScript.”

In just two months since its initial release, Ruff has garnered over one million monthly downloads and has been embraced by Airflow, FastAPI, Pandas, and SciPy. Tech companies including Amazon, Microsoft, Mozilla, and Netflix have also adopted Ruff for their development workflows. The overwhelming support and adoption of Ruff have validated Marsh’s belief that Python tooling could be significantly faster.

What sets Ruff apart from other linters in the market is its exceptional speed. On macOS, Ruff is approximately 150 times faster than Flake8, 75 times faster than pycodestyle, and 50 times faster than pyflakes and pylint.

The advantages of Ruff, as listed by the developer, include its second-mover advantage, having learned from the tools already in the market. Ruff also deduplicates work: it performs in a single pass tasks that would otherwise require several linters to each parse the code separately. Furthermore, Ruff’s performance owes much to Rust being faster than Python.

To propel the growth of Astral and continue developing Ruff, Marsh has raised $4 million in seed funding, led by Accel, a leading venture capital firm known for backing successful tech startups. Other investors include Caffeinated Capital, Guillermo Rauch (Vercel), Solomon Hykes (Docker), David Cramer (Sentry), Wes McKinney (Voltron), and Nick Schrock (Elementl), among others.

In the release statement, Marsh expressed his excitement about the future of Astral and his commitment to pushing the boundaries in the Python ecosystem. He believes Astral will empower developers to build cutting-edge applications with ease.

The post Developer of Ruff Launches Astral, Chokes Python appeared first on AIM.

]]>
ViperGPT vs GPT-4 https://analyticsindiamag.com/ai-origins-evolution/vipergpt-vs-gpt-4/ Tue, 21 Mar 2023 06:30:00 +0000 https://analyticsindiamag.com/?p=10089730

ViperGPT uses Python code to interpret and solve image queries.

The post ViperGPT vs GPT-4 appeared first on AIM.

]]>

Former Google Research Scientist Carl Vondrick, who is currently an Assistant Professor at Columbia University, along with two computer vision PhD researchers from the same university, Dídac Surís and Sachit Menon, proposed the ViperGPT, a framework for programmatic composition of specialised vision, language, math, and logic functions for complex visual queries. 

This new model is capable of connecting individual advances in vision and language, enabling capabilities beyond what any individual model can achieve on its own. Simply put, you can input your query in any visual format, including image and video, and obtain the desired result. Depending on the type of query, the output can be in either image or text format. In the case of GPT-4, however, the output is in text format only.

How does it work?

The framework uses a combination of step-by-step reasoning from Codex along with external knowledge queried from GPT-3’s text model which results in impressive performance in this setting.

ViperGPT currently uses the Codex API on top of GPT-3. The pre-trained model used for the LLM query function is GPT-3 (text-davinci-003), accessed through the official OpenAI Python API.

The LLM then uses the API to write a Python programme to solve the input query. This code is then used on the input image/video to generate the desired output via vision and language models. 

Providing Codex with an API exposing visual capabilities, such as “find” and “compute_depth”, is enough to create these programmes. With prior training on code, the model is able to reason about how to use these functions and implement the relevant logic. Through this approach, the model has delivered remarkable ‘zero-shot performance’, that is, without training on task-specific images.
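This generate-then-execute approach can be pictured with a toy sketch. Everything below is illustrative stand-in code, not ViperGPT’s actual library: stub functions play the role of the vision primitives, and a hand-written string stands in for the programme Codex would generate.

```python
def find(image, name):
    """Stub for an object detector: return bounding boxes of `name`."""
    return image.get(name, [])

def compute_depth(image, box):
    """Stub for a depth estimator: return a depth value for a region."""
    return box.get("depth", 0.0)

# A programme of the kind the LLM might emit for the query
# "Which mug is closer, the red one or the blue one?"
generated_program = """
def execute_query(image):
    red = find(image, "red mug")[0]
    blue = find(image, "blue mug")[0]
    if compute_depth(image, red) < compute_depth(image, blue):
        return "red mug"
    return "blue mug"
"""

# The framework exec()s the generated code against the vision API,
# then runs it on the input image to produce the answer.
namespace = {"find": find, "compute_depth": compute_depth}
exec(generated_program, namespace)

image = {"red mug": [{"depth": 1.2}], "blue mug": [{"depth": 3.4}]}
print(namespace["execute_query"](image))  # red mug
```

The real framework swaps the stubs for genuine detection and depth models and lets the LLM author the programme, but the execute-generated-Python pattern is the same.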

The paper also mentions that as the model improves, ViperGPT will produce improved results. To support research in the proposed model, the team said that a Python library will be developed that will promote rapid development for programme synthesis for visual tasks which will eventually become open source. 

Queries solved on ViperGPT (Source: arXiv)

Evaluation model

ViperGPT was evaluated on four tasks to understand the model’s diverse capabilities in varied contexts without additional training. The tasks include: 

  • Visual grounding, which implies associating language with visual input.
  • Compositional image question answering, which means that the model works on answering questions using complex compositional reasoning that combines multiple visual and textual inputs.
  • External knowledge-dependent image question answering, which is a framework to answer questions about images that require external knowledge beyond what information is shown in the image, such as general knowledge or factual information. 
  • Video causal and temporal reasoning, which indicates a model’s ability to reason about events and causal relationships in a video based on both visual and temporal cues.

The researchers considered these tasks to roughly build on one another, with visual grounding being a prerequisite for compositional image question answering, and so forth. The result: better handling of spatial relations and the best accuracy, outperforming all other zero-shot methods.

ViperGPT vs GPT-4

However, one question remains: How is it different from generative models such as GPT-4? The latest multimodal platform, ‘GPT-4’, takes inputs in text and image format. The AI model can receive text prompts and images where the user can specify any type of vision or language-related task. However, the image input capability is still a research prototype and the output provided will be in text alone. 

In ViperGPT, depending on the query, the model can produce an output that can be in any format such as text, multiple choice selection, or image regions. 

It is to be noted that the parameters used for training GPT-4 or ViperGPT models are not available. It is also to be seen in the long run whether ViperGPT can be used in tandem with other problem solving models, such as GPT-4 itself, to provide an integrated framework utilising recognition and generative models. 

The post ViperGPT vs GPT-4 appeared first on AIM.

]]>
PyCaret Releases New Update, Includes Time Series Forecasting, Object-Oriented API and more https://analyticsindiamag.com/ai-news-updates/pycaret-releases-new-update-includes-time-series-forecasting-object-oriented-api-and-more/ Mon, 20 Mar 2023 08:47:36 +0000 https://analyticsindiamag.com/?p=10089610

Pycaret, an open-source, low-code machine learning library, recently released a new update – PyCaret 3.0. The new update includes several new features and improvements. The library was created by Moez Ali, who is currently serving as product director for artificial intelligence at antuit.ai. The library, which automates machine learning workflows and makes experiments exponentially fast […]

The post PyCaret Releases New Update, Includes Time Series Forecasting, Object-Oriented API and more appeared first on AIM.

]]>

PyCaret, an open-source, low-code machine learning library, recently released a new update – PyCaret 3.0. The new update includes several new features and improvements. The library was created by Moez Ali, who is currently serving as product director for artificial intelligence at antuit.ai.

The library, which automates machine learning workflows and makes experiments exponentially faster and more efficient, now includes stable time series forecasting, a new object-oriented API, more options for experiment logging, a refactored preprocessing module, compatibility with the latest scikit-learn version, distributed parallel model training, and accelerated model training on CPU.

Key Features

The Time Series module, which is now stable, currently supports forecasting tasks, with time-series anomaly detection and clustering algorithms planned for the future. The object-oriented API allows multiple experiments to be conducted effortlessly in the same notebook, with parameters linked to an object and associated with various modelling and preprocessing options. Experiment logging now includes new options, such as wandb, cometml, and dagshub, alongside the default MLflow.

PyCaret 3.0 includes several new preprocessing functionalities, such as innovative categorical encoding techniques, support for text features in machine learning modelling, novel outlier detection methods, and advanced feature selection techniques. The library now guarantees no target leakage, as the entire pipeline is fitted at the fold level.

Moreover, PyCaret 3.0 is compatible with the latest scikit-learn version (1.X), making it easier to use both libraries simultaneously in the same environment. Distributed parallel model training and accelerated model training on CPU also improve the library’s performance, making it a far more productive tool for citizen data scientists and power users, who can perform simple and moderately sophisticated analytical tasks with ease.

The post PyCaret Releases New Update, Includes Time Series Forecasting, Object-Oriented API and more appeared first on AIM.

]]>
Ruff, the New Rust-Based Python Linter for Programmers https://analyticsindiamag.com/ai-news-updates/ruff-the-new-rust-based-language-for-programmers/ Wed, 15 Feb 2023 10:03:17 +0000 https://analyticsindiamag.com/?p=10087421

Although written in Rust, Ruff can be installed through pip, like other command-line tools.

The post Ruff, the New Rust-Based Python Linter for Programmers appeared first on AIM.

]]>

Former Spring Discovery and Khan Academy staff engineer Charlie Marsh recently released the latest version of ‘Ruff’, a fast Python linter written in Rust.

Ruff supports over 200 lint rules and is now being used in big open-source projects such as FastAPI, Bokeh, Zulip, and Pydantic. 

On macOS, Ruff is approximately 150 times faster than Flake8, 75 times faster than pycodestyle, and 50 times faster than pyflakes and pylint, among others.

A 25-times speedup is the difference between receiving real-time feedback (roughly 300–500 milliseconds) and waiting 12 or more seconds; at 150 times, 75 seconds shrinks to the same 300–500-millisecond range. Ruff’s total processing time for a single file in CPython is about 60 milliseconds.

Ruff employs RustPython’s AST parser and implements its own AST traversal, visitor abstraction, and lint-rule logic. In addition, it supports Python 3.10, including the new pattern-matching syntax.

Ruff can be installed through pip, like other command-line tools.

Like ESLint, Ruff enables caching, allowing single-file linting and full CPython codebase linting in 60 ms. Ruff also includes file watching, similar to TypeScript, making it a persistent linter that re-runs when the source code is altered. Ruff supports pyproject.toml-based setup, which is becoming more popular in the Python community.


[Update: 16 February 2023 15:32 | Previously, the headline erroneously mentioned that Ruff is a programming language instead of Python Linter. The headline has now been updated to reflect the changes.]

The post Ruff, the New Rust-Based Python Linter for Programmers appeared first on AIM.

]]>
10 Python Libraries For Your Coding Nightmares https://analyticsindiamag.com/ai-origins-evolution/10-python-libraries-for-your-coding-nightmares/ Mon, 06 Feb 2023 11:30:18 +0000 https://analyticsindiamag.com/?p=10086648

Everything from extracting data from PDFs to debugging now made easy!

The post 10 Python Libraries For Your Coding Nightmares appeared first on AIM.

]]>

It is perhaps not much of a surprise that, in the developer community, Python is considered one of the most popular programming languages of all time. The popularity of the language is often attributed to its versatile nature, along with the humongous collection of Python libraries that allows developers to pick their favourites. 

In this article, we shed light on some of the lucky-find libraries worth every developer’s time!

Black

An important aspect of coding is formatting. Small programmes are easy to understand, but as complexity increases, code becomes harder to follow, even for its own authors. To keep code in a readable, easy-to-understand format, ‘Black’ comes to the rescue. It ensures quality code through automated formatting. 

Apart from reporting formatting errors, Black also fixes them. It can also be integrated with Vim, Emacs, VSCode, Atom, or Git. 

Here’s the link to the GitHub repository

Camelot

Extracting crucial data tables from PDFs is difficult, and the problem only grows with the huge amount of data locked away in PDF files. 

This is where ‘Camelot’—an open source versatile library—helps in extracting information without compromising the quality. It comes packaged with a command-line interface and is built on pdfminer, another text extraction tool for PDFs.

Here’s the link to the GitHub repository

Colorama

The Colorama package prints coloured text in Python in a methodical way. It supports only the 16-colour scheme and produces the coloured text by emitting ANSI escape sequences. On Windows, Colorama strips these ANSI characters from stdout and converts them into equivalent Win32 calls.

Here’s the link to the Github repository
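Under the hood, those escape sequences are just control characters wrapped around the string. Here is a minimal stdlib sketch of the idea (the colour helper is illustrative and not part of colorama’s API; colorama’s own Fore.RED expands to the same "\x1b[31m" code):

```python
# ANSI SGR escape sequences have the shape ESC [ <code> m.
RESET = "\x1b[0m"

def colour(text, code):
    """Wrap `text` in an ANSI colour escape (30-37 are the 16-colour foregrounds)."""
    return f"\x1b[{code}m{text}{RESET}"

print(colour("error!", 31))  # renders in red on an ANSI-capable terminal
```

Colorama’s value-add is precisely that you never hand-build these strings, and that they still work on legacy Windows consoles.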

Livepython

Livepython lets coders visually track execution of their Python programmes. It traces changes in variables as they run the programme. The alpha software is meant to provide an insight into a given programme’s execution flow and highlights the lines as they are being executed. It contains three main components—a Python tracer, an Electron app along with a node.js gateway script to manage communication. 

Here’s the link to the GitHub repository.
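The hook this kind of tracer builds on is CPython’s own sys.settrace; a minimal sketch of line-by-line variable tracking (an illustration of the mechanism only, not Livepython’s actual tracer):

```python
import sys

def tracer(frame, event, arg):
    # Called for every traced event; on "line" events, dump the local variables.
    if event == "line":
        print(f"line {frame.f_lineno}: {dict(frame.f_locals)}")
    return tracer  # returning the function keeps tracing the current frame

def demo():
    x = 1
    x += 2
    return x

sys.settrace(tracer)   # install the tracer for newly created frames
result = demo()        # every line of demo() is reported as it runs
sys.settrace(None)     # remove the tracer
print(result)  # 3
```

Livepython wires the same per-line variable snapshots into an Electron front end instead of printing them.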

Ftfy (Fixes Text For You)

Designed by Robyn Speer, ftfy fixes broken text in Unicode. This works differently than turning non-Unicode into Unicode. 

Ftfy has the ability to fix encoding mistakes, commonly known as ‘mojibake’, as it detects character patterns meant to be UTF-8. Currently, it uses Twitter’s streaming API as a source of realistic sample data and works in Python 2.7, Python 3.2, or later versions.

Here’s the link to the GitHub repository.
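What mojibake looks like can be reproduced with the standard library alone: UTF-8 bytes decoded with the wrong codec produce the familiar garbage, and reversing the wrong decode recovers the text. This manual round-trip is what ftfy detects and applies automatically once it spots the tell-tale UTF-8 byte pattern:

```python
# "café" encoded as UTF-8 but decoded as Latin-1 yields classic mojibake.
garbled = "café".encode("utf-8").decode("latin-1")
print(garbled)  # cafÃ©

# Undoing the wrong decode recovers the original text.
fixed = garbled.encode("latin-1").decode("utf-8")
print(fixed)  # café
```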

Git-story

Using a single command, Git-story generates mp4 videos presenting the layout and progression of the Git commit history. It helps developers visualise aspects of their code projects. This is especially true for version control systems like Git, where understanding the team’s workflow is a priority.

Here’s the link to the GitHub repository

Rebound 

This command-line tool fetches Stack Overflow results when an exception is thrown. All one needs to do is execute files with the rebound command. Built on Urwid, Rebound works on macOS, Linux, and Windows. The tool runs files in a subprocess to capture errors and uses Beautiful Soup to scrape content from Stack Overflow.

Here’s the link to the GitHub repository

Icecream

Bugs are a developer’s worst nightmare, and sprinkling print() calls to understand a pipeline’s flow and spot errors is the most common debugging method. There are several reasons why ic(), from the icecream package, is better: it prints variable names as well as their values, it is 40% faster, and its output is highlighted.

Here’s the link to the GitHub repository.
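For a stdlib taste of the same convenience: since Python 3.8, the `=` specifier in f-strings prints an expression’s text alongside its value, which is the core of what ic() automates (ic() adds colour, call-site context, and zero-argument tracing on top):

```python
retries = 3

# The "=" specifier prints the name and the value together:
print(f"{retries=}")  # retries=3

# Plain print() by contrast shows only the value:
print(retries)  # 3
```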

Isort

In Django projects—especially in views, where a great number of imports are dealt with—‘Isort’ is extremely useful. It automatically organises the imports and aligns them in sections by type. This Python library provides a command-line utility as well as plugins for various editors. It requires Python 3.8+ to run but also supports formatting Python 2 code.

Here’s the link to the GitHub repository
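The gist of the transformation can be sketched in a few lines of stdlib Python (a toy alphabetiser only; real isort additionally separates stdlib, third-party, and local sections and understands multi-line imports):

```python
def sort_imports(source):
    """Toy isort: float import lines to the top of the source, alphabetised."""
    lines = source.strip().splitlines()
    is_import = lambda line: line.startswith(("import ", "from "))
    imports = sorted(line for line in lines if is_import(line))
    rest = [line for line in lines if not is_import(line)]
    return "\n".join(imports + rest)

print(sort_imports("import sys\nimport os\nx = 1"))
# import os
# import sys
# x = 1
```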

Manim

With 13k stars on GitHub, Manim enables creating animations for mathematical concepts using Python. Note that there are currently two versions: one created by Grant Sanderson (of 3Blue1Brown) and another forked and maintained by the Manim Community.

Here’s the link to the GitHub repository

The post 10 Python Libraries For Your Coding Nightmares appeared first on AIM.

]]>
OpenAI Launches Classifier to Tackle Misuse of AI-Generated Content https://analyticsindiamag.com/ai-news-updates/openai-launches-classifier-to-tackle-misuse-of-ai-generated-content/ Wed, 01 Feb 2023 06:35:52 +0000 https://analyticsindiamag.com/?p=10086301

The AI classifier can distinguish between text written by a human and text written by AIs from a variety of providers.

The post OpenAI Launches Classifier to Tackle Misuse of AI-Generated Content appeared first on AIM.

]]>

In a move that makes third-party AI detector tools redundant, OpenAI yesterday announced that it has trained a classifier and launched it as the AI Text Classifier, a tool to distinguish between text written by a human and text written by AIs from a variety of providers.

Citing automated misinformation campaigns, OpenAI said while it is impossible to reliably detect all AI-written text, it believes that classifiers can inform mitigations for false claims that AI-generated text was written by a human. 

Compared to its previously released classifier, this new classifier from OpenAI is significantly more reliable on text from more recent AI systems. The company has made it publicly available to gather feedback on whether tools like this are useful. 

Check out the AI Text Classifier here.

AI Text Classifier

AI Text Classifier is a fine-tuned GPT model that predicts how likely it is that a piece of text was generated by AI from a variety of sources, including ChatGPT and the like. OpenAI said that the classifier is available as a tool to spark discussions on AI literacy. 

Limitations of AI Text Classifier

  • A minimum of 1,000 characters (roughly 150–250 words) is needed for the classifier to run.
  • The classifier isn’t always accurate; it can mislabel both AI-generated and human-written text.
  • AI-generated content is easy to edit, and editing can be used to manipulate the classifier.
  • Since it was trained mostly on English content written by adults, the classifier is likely to make mistakes on text written in other languages or by children.
  • Highly generic, predictable text cannot be reliably identified; the first five letters of the alphabet, for instance, read the same no matter who wrote them.
  • Neural network-based classifiers perform poorly outside their training data. OpenAI said the classifier can occasionally be highly confident in a wrong prediction for inputs substantially different from the text in its training set.
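The first limitation is easy to enforce with a pre-check before submitting text; the helper below is a hypothetical sketch, not part of OpenAI’s tool:

```python
MIN_CHARS = 1000  # the classifier's stated minimum, roughly 150-250 words

def meets_minimum(text):
    """Return True if `text` is long enough for the classifier to accept it."""
    return len(text) >= MIN_CHARS

print(meets_minimum("too short"))  # False
print(meets_minimum("a" * 1500))   # True
```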

We tried the AI Classifier with an essay generated by ChatGPT and here is the result.

The classifier does not assert that the text is AI-generated but agrees that it ‘possibly’ is.

ChatGPT Ban 

This comes in the backdrop of multiple schools and educational institutions raising concerns about students using ChatGPT for answers. ChatGPT has drawn flak from educational institutions across the world, as it can easily write essays, code, scripts, and poetry and prose of all kinds. Considering the backlash, OpenAI has developed a preliminary resource for educators on the use of ChatGPT, which outlines some of its uses along with associated limitations and considerations. OpenAI is working with US schools to find solutions to the problem.

The post OpenAI Launches Classifier to Tackle Misuse of AI-Generated Content appeared first on AIM.

]]>
Google Unveils MusicLM, A Music DALL-E https://analyticsindiamag.com/ai-news-updates/google-unveils-musiclm-a-music-dall-e/ Fri, 27 Jan 2023 13:05:41 +0000 https://analyticsindiamag.com/?p=10086028

The team also released MusicCaps, a high-quality dataset of 5.5k music-text pairs prepared by musicians.

The post Google Unveils MusicLM, A Music DALL-E appeared first on AIM.

]]>

Google has released MusicLM, a generative model for creating high-fidelity music from text descriptions, such as “a calming violin melody supported by a distorted guitar riff”. MusicLM generates music at 24 kHz that remains consistent over several minutes by modelling the process of conditional music synthesis as a hierarchical sequence-to-sequence modelling problem. 

According to tests, MusicLM works better than older systems in terms of audio quality and fidelity to the written descriptions. MusicLM can be conditioned on both text and a melody by changing whistled and hummed melodies to match a text caption’s description of that style. 

It also unveiled MusicCaps, the first evaluation dataset collected specifically for the task of text-to-music generation. It is a hand-curated, high-quality dataset of 5.5k music-text pairs prepared by musicians. 

Read the full paper here.

Key Features

MusicLM can create music from any text description. Plus, given the audio of a melody, it can generate new music inspired by that melody and customised by prompts. It turned someone humming ‘Bella Ciao’ into an a cappella chorus. It can generate audio with story-like progression, and it can even generate music from paintings.

Training Process

Each stage is modelled as a sequence-to-sequence task leveraging decoder-only Transformers. 

During training, MuLan audio tokens, semantic tokens, and acoustic tokens from the audio-only training set are extracted. 

In the semantic modelling stage, semantic tokens are predicted using MuLan audio tokens as conditioning.

In the next acoustic modelling stage, the model predicts acoustic tokens with both MuLan audio tokens and semantic tokens. 

During inference, MuLan text tokens computed from the text prompt are used as the conditioning signal, and the generated audio tokens are converted to waveforms using the SoundStream decoder.

Limitations

Some limitations of the method are inherited from MuLan, in that the model misunderstands negations and does not adhere to the precise temporal ordering described in the text. 

The Music DALL-E

Similar to how DALL-E 2 uses CLIP for text encoding, MusicLM is based on a joint music-text embedding model for the same purpose. But unlike DALL-E 2, which uses a diffusion model as a decoder, MusicLM’s decoder is based on AudioLM. 

Two weeks ago, Microsoft released VALL-E, a new language model approach for text-to-speech synthesis (TTS) that uses audio codec codes as intermediate representations. It demonstrated in-context learning capabilities in zero-shot scenarios after being pre-trained on 60,000 hours of English speech data.

However, Google has announced it will not make MusicLM available to the public due to potential risks. These include the possibility of programming biases leading to underrepresentation and cultural appropriation, technical errors, and the risk of unauthorized use of creative content.

The post Google Unveils MusicLM, A Music DALL-E appeared first on AIM.

]]>
At Davos 2023, Tech Leaders Debate the Future of Generative AI https://analyticsindiamag.com/ai-news-updates/at-davos-2023-tech-leaders-debate-the-future-of-generative-ai/ Wed, 18 Jan 2023 08:53:49 +0000 https://analyticsindiamag.com/?p=10085376

CEO calls Generative AI the "Junior Coder", while Qualcomm projected that connected-internet glasses would replace smartphones in the coming decade

The post At Davos 2023, Tech Leaders Debate the Future of Generative AI appeared first on AIM.

]]>

Generative AI was one of the hot topics of the World Economic Forum’s (WEF) Annual Meeting at Davos, where ChatGPT became the ‘goblin word’ of the conference. Even in the shivering cold of  -7°C, tech representatives could not stop hyping the immense potential of AI advancements in every field.

The origins of generative AI lie in the advent of GANs. In an exclusive interaction with AIM, AI stalwart Yoshua Bengio raved about the distance that generative models have come since their emergence in 2014. On the recent advancements made by text-to-image generators like OpenAI’s DALL.E and StabilityAI’s Stable Diffusion, Bengio stated, “One of the things that impressed me the most is the progress in generative models.” 

Read more: What Excites Yoshua Bengio about the Future of Generative AI

At the Davos conference, Microsoft CEO Satya Nadella discussed the stir ChatGPT, the intelligent chatbot from OpenAI, has generated. He said that Microsoft intends to open up access to its cloud-based Azure OpenAI service so that anyone may use its AI tools for business and that customers can access ChatGPT through Azure. Furthermore, he stated that the ChatGPT API would be released soon.

Although Nadella acknowledged that generative AI advancements could be potentially dangerous, he also said they could help resolve problems rather than create new ones.

Another AI stalwart, Meta AI chief Yann LeCun, told AIM that data systems are entertaining but not really useful. “To be useful, they have to make sense of real problems for people, help them in their daily lives as if they were traditional assistants completely out of reach,” he added, painting the real picture. 

Talking about ChatGPT, LeCun added that many individuals are working on language models using slightly different methods. He said there are three to four companies producing GPT-X-like models. “But, they [OpenAI] have been able to deploy their systems in such a way that they have a data flywheel. So, the more feedback they get from the system, the better they can adjust it to produce better outputs,” he explained. 

Matthew Prince, chief executive officer of Cybersecurity company Cloudflare, stated that generative AI could serve as a “really good thought partner” or a junior coder. He said Cloudflare was writing code on its “Workers” platform using such tech. Cloudflare is also looking into how such technology could help its free-tier clients get answers to questions more quickly.

According to Alex Karp, CEO of Palantir Technologies, which develops software that helps governments track the movements of armies and businesses examine their supply networks, among other things, such AI might be used in the military. “The idea that an autonomous thing could generate results is useful for war,” Karp stated. However, he added that the nation with the fastest AI development would “define the law of the land,” and it is important to consider how technology will affect any war with China.

In other topics, CEO Cristiano Amon of chip maker Qualcomm said that internet-connected glasses are set to make a dramatic comeback.

According to him, connected glasses will eventually replace smartphones as the go-to computer for daily chores as computing grows and the metaverse becomes more pervasive in daily life. Although there are currently connected glasses on the market, the technology has yet to become widely adopted. A notable setback is the Google Glass technology. “The whole tech trend is the merging of digital and physical spaces; within the decade, it’s going to be as big as phones,” he said. However, Amon predicted that the metaverse’s speed would make the technology more resilient this time.

The World Economic Forum’s (WEF) Annual Meeting has started in Davos, Switzerland, with the central theme “Cooperation in a Fragmented World”. The conference is scheduled from January 16th to January 20th, 2023.

The post At Davos 2023, Tech Leaders Debate the Future of Generative AI appeared first on AIM.

]]>
Microsoft Announces Azure OpenAI Service for All https://analyticsindiamag.com/ai-news-updates/microsoft-announces-azure-openai-service-for-all/ Tue, 17 Jan 2023 05:12:28 +0000 https://analyticsindiamag.com/?p=10085240

Microsoft said that they would add ChatGPT to Azure soon. OpenAI also said that ChatGPT is coming to its API.

The post Microsoft Announces Azure OpenAI Service for All appeared first on AIM.

]]>

Microsoft today announced the general availability of its cloud-based Azure OpenAI Service, so that anyone can use its AI tools, such as GPT-3.5, Codex, and DALL•E 2, to enhance their work. It also said customers could access OpenAI’s flagship AI chatbot, ChatGPT, through Azure. OpenAI, too, announced that it would add ChatGPT to its API soon.

Those using the Azure Service already have access to tools like the GPT-3.5 language system that ChatGPT is based on and the Dall-E model for generating images from text prompts. “ChatGPT is coming soon to the Azure OpenAI Service, which is now generally available, as we help customers apply the world’s most advanced AI models to their own business imperatives,” tweeted Microsoft chief Satya Nadella.

Microsoft stated that organizations of all sizes and industries are utilizing the Azure OpenAI Service to achieve greater results with fewer resources, enhance the user experience, and streamline internal operations. Both small startups like Moveworks and large multinational corporations like KPMG are leveraging the capabilities of Azure OpenAI Service for advanced applications such as customer support, personalization, and extracting valuable insights from data through search, extraction, and classification.

Recently, OpenAI announced its monetisation plans for ChatGPT and posted a waitlist link on its Discord server along with a range of questions on payment preferences for the paid version, which will be called ChatGPT Professional. This news came soon after Microsoft revealed that it’s now looking at pumping an additional $10 billion in OpenAI, a huge leap from its initial $1 billion investment in the company in 2019. 

When OpenAI rolled out ChatGPT in November, it went viral in less than 10 days. Built on GPT-3.5, ChatGPT can interact with humans in natural language and remember context. Whether it is writing code or a joke, the futuristic chatbot can do both.

The post Microsoft Announces Azure OpenAI Service for All appeared first on AIM.

]]>