The race to build a team of AI software engineers shows no sign of slowing. After Cognition's Devin, Cosine, the self-described human reasoning lab, has introduced Genie, which the company bills as the most capable AI software engineering model in the world, scoring 30.08% on the SWE-Bench evaluation.
Genie is designed to emulate the cognitive processes of human engineers, enabling it to solve complex problems with remarkable accuracy and efficiency. “We believe that if you want a model to behave like a software engineer, it has to be shown how a human software engineer works,” said Alistair Pullen, the founder of Cosine.
The UK-based AI startup has secured $2.5 million in funding from SOMA and Uphonest, with additional investment from Lakestar and Focal, and is part of the YC W23 batch.
Billed as the first AI software engineering colleague, Genie is trained on data that mirrors the logic, workflow, and cognitive processes of human engineers.
This allows it to overcome the limitations of existing AI tools, which are often foundation models extended with features like web browsers or code interpreters. Unlike these, Genie can tackle unseen problems, iteratively test its solutions, and proceed logically, much like a human engineer.
Genie has set a new standard on SWE-Bench, achieving a score of 30.08%, a 57% improvement over the previous best results held by Amazon Q and Code Factory.
This milestone not only represents the highest score ever recorded but also the largest single increase in the benchmark’s history. Genie’s enhanced reasoning and planning capabilities extend beyond software engineering, positioning it as a versatile tool for various domains.
In its development, Genie was evaluated using SWE-Bench and HumanEval, with a strong focus on its ability to solve software engineering problems and retrieve the correct code for tasks.
Genie scored 64.27% at retrieving the necessary code, identifying 91,475 of the 142,338 lines required across the benchmark tasks. This marks significant progress, though Cosine acknowledges there is room for improvement in this area.
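For illustration only, that retrieval figure works out as a simple recall ratio; the snippet below is not Cosine's evaluation code, just a sketch of the arithmetic behind the reported percentage.

```python
# Hypothetical sketch of the line-retrieval figure quoted above.
# The variable names and the plain recall ratio are assumptions,
# not Cosine's published evaluation methodology.
retrieved_required_lines = 91_475   # required lines Genie actually located
total_required_lines = 142_338      # lines needed to solve the benchmark tasks

recall = retrieved_required_lines / total_required_lines
print(f"Line-retrieval recall: {recall:.2%}")  # prints "Line-retrieval recall: 64.27%"
```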
Genie’s development involved overcoming challenges related to training models with limited context windows. Early efforts using smaller models highlighted the need for a larger context model, leading to Genie’s training on billions of tokens. The training mix was carefully selected to ensure proficiency in the programming languages most relevant to users.
Cosine’s innovative approach to Genie’s development included the use of self-improvement techniques, where the model was exposed to imperfect scenarios and learned to correct its mistakes. This iterative process significantly strengthened Genie’s problem-solving abilities.
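Cosine has not published the internals of that loop, but the pattern it describes (propose a fix, run the checks, feed the failure back, and try again) can be sketched roughly as follows; the function signatures here are hypothetical stand-ins, not Cosine's actual tooling.

```python
from typing import Callable, Optional, Tuple

# A minimal, hypothetical sketch of an iterative fix-and-verify loop.
# The callables stand in for whatever model call, repo tooling, and test
# harness an agent like Genie might use; this is not Cosine's implementation.

def self_correcting_fix(
    issue: str,
    generate_patch: Callable[[str, str], str],   # (issue, feedback) -> patch text
    apply_patch: Callable[[str], None],          # applies a patch to the working tree
    run_tests: Callable[[], Tuple[bool, str]],   # () -> (passed, test output)
    max_attempts: int = 3,
) -> Optional[str]:
    feedback = ""  # accumulated error output from failed attempts
    for attempt in range(max_attempts):
        patch = generate_patch(issue, feedback)  # propose a fix, informed by past failures
        apply_patch(patch)
        passed, output = run_tests()
        if passed:
            return patch  # accept the fix once the test suite passes
        # Otherwise, expose the model to its own mistake and retry.
        feedback += f"\nAttempt {attempt + 1} failed:\n{output}"
    return None  # no passing patch within the attempt budget
```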
Looking ahead, Cosine plans to continue refining Genie, expanding its capabilities across more programming languages and frameworks. The company aims to develop smaller models for simpler tasks and larger ones for complex challenges, leveraging its unique dataset. Future developments include fine-tuning Genie on specific codebases, enabling it to understand large legacy systems written even in less common languages.