🤖 Creating a fully autonomous AI developer to write and maintain code is still the holy grail of software development. There are many brilliant teams currently working on this, each with its own unique approach. The team at Cosine has made a massive step towards making this a reality. “Cosine achieves state-of-the-art results on the SWE-bench benchmark” They have achieved a score of 43.8% on SWE-bench, verified by OpenAI. What's even more incredible is that they have managed to do this with a lean team and a relatively limited budget. It's important to note that Cosine isn't looking to replace human developers entirely, but rather to augment them with human-like assistants they can collaborate with on any kind of coding task. Yang Li + Sam Stenner + Alistair Pullen Read more about their collaboration with OpenAI 👇 https://lnkd.in/gu7xXVPn
-
Coding has long been the jewel in the IT crown; how long it can hold that hegemony, only time will tell. Artificial intelligence has changed it all. The "battle" aspect highlights the rivalry among platforms such as GitHub Copilot, OpenAI's Codex, Amazon's CodeWhisperer, Tabnine, and others, all of which aim to become the go-to solution for developers and companies. These tools vary in capabilities, supported languages, and features, so they're often compared against one another on criteria like accuracy, speed, ease of integration, and overall impact on developer productivity.
AI Coding Tools Battle: Which Tool Will Lead the Future of Coding?
https://www.youtube.com/
-
If you read only one thing about AI/LLMs this week, make it this: Hear me out… by Gergely Orosz https://lnkd.in/eFTYDSMA
A quick update this week — I am still catching up from vacation — but I loved this take. Gergely writes the excellent Pragmatic Engineer Substack (subscribed!), and his thinking on AI coding is still evolving (see this from a few weeks ago: https://lnkd.in/esDxHMUZ). But this tweetstorm is right on: coding is possibly the best-suited domain for LLMs, because we have so many existing tools and workflows to check code for correctness, style, adherence to patterns, etc.
There's still a lot to do to get the right context into the system and to set up the right checkpoints in process and output, but agentic feedback loops combined with compilers and linters get you 90% of the way there. It's not a question of if, but exactly how.
That said, we are living in an age of lofty promises. Devin and OpenDevin and SWE-Agent (and Stride Conductor) are amazing — with a few tools and some situational awareness, LLMs are able to work with codebases much as a junior engineer would. They can't yet do everything a smart, experienced human dev can do — and that's OK! But they're really well suited for dirty jobs — stuff like tech debt remediation, fixing broken tests, clearing out trivial backlog, etc. Enjoy the weekend!
Gergely Orosz (@GergelyOrosz) on X
twitter.com
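The "agentic feedback loops combined with compilers and linters" point is concrete enough to sketch. Below is a minimal, hypothetical Python loop that illustrates the idea: a placeholder LLM call (not any real API) produces code, a linter checks it, and the diagnostics are fed back into the next attempt.

```python
# Minimal sketch of an agentic feedback loop: generate code, check it with a
# linter, feed the diagnostics back, repeat. `generate_code` is a hypothetical
# stand-in for whatever LLM call you use; it is not part of any real library.
import pathlib
import subprocess
import tempfile


def lint(source: str) -> str:
    """Run pyflakes on the candidate source and return its diagnostics ('' if clean)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    result = subprocess.run(["pyflakes", path], capture_output=True, text=True)
    pathlib.Path(path).unlink(missing_ok=True)
    return (result.stdout + result.stderr).strip()


def agentic_loop(task: str, generate_code, max_rounds: int = 3) -> str:
    """Ask the model for code, then re-prompt with linter feedback until it is clean."""
    source, feedback = "", ""
    for _ in range(max_rounds):
        source = generate_code(task, feedback)  # hypothetical LLM call
        feedback = lint(source)
        if not feedback:  # no diagnostics left: accept this candidate
            break
    return source
```

The same shape generalizes: swap the linter for a compiler, a type checker, or a test suite, and the loop gets stricter without any change to the model.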
-
CodeMate #VSCode Extension 2.9.0 released yesterday. Check out what's new 🤯
- Code Actions and Code Lens added for modifying the codebase, creating docstrings or inline comments, or asking any question within the editor through in-line commands 💯
- Added shortcut commands to trigger CodeMate inside the editor:
- // Codemate: <query>: for asking any question related to the code
- // Generate: <query>: for generating code from natural language within the editor, keeping the existing code in context
Check out CodeMate AI now at https://codemate.ai
Ayush Singhal Kshitij S. Tyagi
#codemateai #softwaredevelopment #AI #Programming
-
Discover the power of #AI in coding! Tools like Tabnine (AI-driven code suggestions), Codex (AI-generated code snippets), and DeepCode (AI-based code review) are revolutionizing the way we code, making development faster, more efficient, and more accurate. #AICoding #FutureOfWork
-
I am optimistic about LLMs in software engineering and wanted to put this to the test by competing in the first AI competitive coding competition, the NeurIPS HackerCup AI Competition. My team, Matus Lecky and Isaac Ray, competed in the open track of the competition, where we faced the challenge of self-hosting inference and training on a 40GB A100, limiting us to small LLMs. We made it to Round 2, where we came 4th out of 872 participants; however, in Round 3 the questions got significantly harder and our agent was unable to solve a problem (in our defense, neither did almost any other team, including the closed-track teams with access to the superior o1 model). Despite the ending, I would still like to briefly describe our strategy, because we put considerable effort into it:
🔹 Scaffolding & pipeline: Careful prompt engineering was essential to getting the small LLM to produce high-quality code and reasoning that could be successfully parsed and executed.
🔹 Observations & CoT: We generated a pool of observations about the problem and step-by-step reasoning, which we randomly selected from to concatenate to the problem statement.
🔹 Codestral-22B: We tested many models and found this was the best base model that fit comfortably in 40GB.
🔹 Maj@128: We generated 128 code samples, tested them against the sample cases, and applied majority voting if multiple passed (see the sketch below).
🔹 Inference speed: Using vLLM for parallel inference and carefully tuned parameters, we reached an average output of 2,000 tokens/s, allowing more tokens per question without exceeding the strict time limit.
🔹 Code improvement: Repeatedly improve the best-scoring samples until they passed.
Using these strategies, we enabled Codestral-22B to solve the easier questions of the competition, which it previously couldn't handle in a zero-shot setup. Despite this progress, it is clear that open models are not yet competent competitive coders, but I'm optimistic that with current advancements in LLM reasoning, next year's competition will show major improvements. We'll also be open-sourcing our codebase at https://lnkd.in/eC9v9U4N for those interested in building on it or exploring further enhancements. We're excited to see where this work can lead!
GitHub - Joeclinton1/MapCoder-Hackercup: A modified version of the MapCoder project made for the Neurips 2024 Hackercup Ai track
github.com
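For anyone curious what the Maj@128 step looks like, here is a rough, simplified Python sketch under my own assumptions (the actual implementation is in the linked repo): `sample_program` is a hypothetical stand-in for a sampling call to the serving stack, not part of vLLM or any other real API.

```python
# Simplified sketch of the Maj@128 idea: sample k candidate programs, keep the
# ones that reproduce the sample output, run those on the contest input, and
# majority-vote over their outputs. `sample_program` is a hypothetical LLM call.
import subprocess
from collections import Counter


def run_candidate(source: str, stdin_text: str, timeout: int = 10) -> str:
    """Execute a candidate Python solution and capture its stdout."""
    proc = subprocess.run(
        ["python", "-c", source],
        input=stdin_text, capture_output=True, text=True, timeout=timeout,
    )
    return proc.stdout.strip()


def maj_at_k(problem: str, sample_in: str, sample_out: str,
             contest_in: str, sample_program, k: int = 128) -> str | None:
    outputs = []
    for _ in range(k):
        source = sample_program(problem)  # hypothetical sampling call
        try:
            if run_candidate(source, sample_in) == sample_out.strip():
                outputs.append(run_candidate(source, contest_in))
        except (subprocess.TimeoutExpired, subprocess.SubprocessError):
            continue
    if not outputs:
        return None  # nothing passed the sample case
    # Majority vote over the outputs of candidates that passed the sample case.
    return Counter(outputs).most_common(1)[0][0]
```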
-
𝐒𝐦𝐨𝐥 𝐀𝐠𝐞𝐧𝐭𝐬: 𝐓𝐡𝐞 𝐒𝐦𝐚𝐥𝐥 𝐀𝐈 𝐇𝐞𝐫𝐨𝐞𝐬 𝐌𝐚𝐤𝐢𝐧𝐠 𝐚 𝐁𝐢𝐠 𝐃𝐢𝐟𝐟𝐞𝐫𝐞𝐧𝐜𝐞!
Hugging Face introduces the most straightforward way yet to build AI agents! It has created the simplest framework so far, one that cuts through the complexity.
What makes this special:
• Works with ANY language model (OpenAI, Anthropic, or Hugging Face Hub models)
• First-class support for Code Agents: not just agents that write code, but agents that execute their actions through code
• Seamless Hugging Face Hub integration for sharing and loading tools
• Zero unnecessary abstractions, just clean, efficient code
Smol agents are powerful yet beautifully minimal.
#AI #Programming #TechInnovation #AIAgents #OpenSource #HuggingFace #DeveloperTools https://lnkd.in/gicmFGdc
smolagents
huggingface.co
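To make the "agents that execute their actions through code" point concrete, here is a minimal Python sketch in the style of the smolagents docs; the class names (e.g. HfApiModel) and defaults reflect the library at the time of writing and may differ between versions, so treat it as an illustration rather than the canonical API.

```python
# Minimal smolagents sketch: a CodeAgent writes and runs Python to answer the
# query. Class names and defaults are assumptions based on the docs and may
# have changed in newer releases.
from smolagents import CodeAgent, HfApiModel

model = HfApiModel()                      # defaults to a hosted Hugging Face model
agent = CodeAgent(tools=[], model=model,  # no custom tools for this example
                  add_base_tools=True)    # enable the built-in tool set

# The agent plans in code: each step is a Python snippet it executes itself.
print(agent.run("How many seconds are there in a leap year?"))
```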
-
In just ONE year, AI has gone from struggling with even the simplest bug fixes to successfully handling nearly three-quarters of a set of real-world coding tasks drawn from popular open-source software.
Why AI Progress Is Increasingly Invisible
time.com
-
AI-powered code reviews with CodeRabbit 🤖
As we head into the greatest era of AI, we now have this tool to review the code in a PR. 💤
I don't know how effective it is, since everyone's code follows a different sort of architecture. 🤷
Will you use this tool to review your PRs? 🤔
Get yours: https://lnkd.in/dGVYB3Ee
#codereviews #coderabbit #githubpr #codereviewai #github #ai
-
I've yet to understand how people even get running code with AI. Beyond basic repetitive lines, Copilot still makes up methods and syntax for me, so I end up debugging longer than it would take to just use the IDE's auto-completion. AI coding is amazing when you know what you want, but when it gets ahead of you it's not that different from copying mystery code from Stack Overflow. #programming #webDevelopment #ai https://lnkd.in/dbP9UfJ3
-
Tired of Prompt Engineering Hassles?
Are you tired of the frustration that comes with crafting complex prompts for large language models (LLMs)? You're not alone. Many developers struggle with the intricacies of prompt engineering, often feeling overwhelmed and stuck. DSPy is designed to simplify your life and empower you to focus on what you love most: building innovative solutions. Imagine a tool that allows you to create and optimize LLM applications effortlessly; this is the promise of DSPy.
** How DSPy Works
DSPy transforms your interaction with LLMs by shifting from manual prompting to a programmatic approach. It automates prompt generation and model optimization, significantly reducing the risk of errors. With DSPy, you can develop adaptive pipelines that learn and improve over time, making your workflow more efficient and effective.
** What You Can Achieve with DSPy
With DSPy, you can streamline your LLM development process and eliminate the hassle of complex prompt crafting. Build efficient applications that dynamically adjust to changing data and requirements, all while joining a vibrant community that continuously evolves and enhances the DSPy framework. Dive into the world of DSPy, where you focus on programming, not prompting. Discover how this innovative tool can transform your AI projects. Find the link to DSPy in the comments!
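For a taste of what "programming, not prompting" looks like, here is a minimal Python sketch in DSPy's declarative style; the model identifier is a placeholder and the exact configuration calls may vary between DSPy versions, so check the official docs before copying it.

```python
# Minimal DSPy sketch: declare a signature ("question -> answer") instead of
# hand-writing a prompt. The model name is a placeholder; configuration calls
# may differ between DSPy versions.
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # any supported provider works

qa = dspy.ChainOfThought("question -> answer")    # DSPy builds the prompt for you

result = qa(question="What does DSPy optimize instead of hand-written prompts?")
print(result.answer)
```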