AI in Software Engineering: The Rise of Code Assistants, Vibe Coding, and Smarter Infrastructure
Artificial intelligence isn’t just knocking on the door of software engineering; it’s already inside, rearranging the furniture. From initial keystrokes in new applications to overseeing complex cloud systems, AI is revolutionizing how we design, develop, test, and maintain our systems. The evolution is swift, opportunities are vast, and the learning journey can be challenging and humbling. In this issue of Digital Reflections, we explore these developments, focusing on the tools and companies leading the change in how software is created, tested, and released to the world.
AI Code Assistants
We have come a long way since the early days of code completion. A significant milestone was Microsoft’s introduction of IntelliSense, first tested in Visual Basic 5.0, and officially launched in the late 1990s, back when Windows 95 was still making waves.
Code completion shown in Microsoft Visual J++ circa the late 1990s
Code completion became a standard feature across various platforms by the end of the pre‑AI era. Notable examples include the Eclipse framework, JetBrains IntelliJ, and Borland’s Delphi and C++ IDEs. The feature was easier to implement in languages that supported introspection: the ability of a program to examine its own structure using code written in the same language.
Through the 2000s and 2010s, progress was steady but incremental. IDEs became faster at indexing large projects, reducing the lag between typing and suggestions. Completions grew more context‑aware: IntelliJ added semantic analysis to suggest methods relevant to the surrounding logic; Visual Studio’s IntelliSense incorporated type inference and symbol search; and Eclipse integrated static analysis to flag errors before compilation. By the late 2010s, cloud‑backed services like TabNine began using statistical models trained on public code, hinting at what was to come, but these tools still mainly worked at the line level, with no real grasp of the broader project.
Then, the ground shifted. Over the past few years, large language models and generative AI have evolved to transform code completion from a line‑by‑line helper into a full‑fledged collaborator. From GitHub Copilot to Claude Code, these assistants can parse your entire codebase, understand your intent, and generate anything from a missing function to a complete module, often before you have finished typing the first few characters. Unlike traditional AI chat interfaces, they work silently in the background, offering _ghost text_: suggestions you can accept, reject, or adapt. The effect can feel almost telepathic: start writing a function to translate English to Spanish, and the assistant will finish it for you in real time, as if it already knew where you were headed.
Example of GitHub Copilot integrated in the Neovim code editor
AI code assistants can also help answer questions about the structure of a code project, identify and suggest solutions for bugs, offer suggestions for performance improvements, and add the frequently neglected code documentation.
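To make the “ghost text” idea concrete, here is the English‑to‑Spanish example from above in miniature. The developer types only the signature and docstring; everything after the marker comment is the kind of completion an assistant might propose (the specific wording is hypothetical, not any particular product’s output):

```python
def translate_to_spanish(word: str) -> str:
    """Translate a common English word to Spanish."""
    # --- ghost text: a completion the assistant might suggest ---
    translations = {
        "hello": "hola",
        "world": "mundo",
        "thanks": "gracias",
    }
    # Fall back to the original word when no translation is known.
    return translations.get(word.lower(), word)
```

The developer can accept the suggestion as-is, adapt it (say, by swapping the dictionary for a real translation API), or reject it and keep typing.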
Notable code assistants in 2025:
AWS CodeWhisperer: AWS‑trained coding assistant with real‑time suggestions and security checks.
ChatGPT with Code Interpreter (Beta): AI for code writing, debugging, and execution via chat.
Claude Code: conversational AI built for coding, refactoring, and long‑context reasoning.
Continue: an open‑source framework to build custom AI coding agents in VS Code or JetBrains.
Cursor: AI‑first code editor that understands full repo context and natural‑language edits.
Gemini Code Assist: Google’s AI coder driven by Gemini 2.5, supporting completions, transformations, chat, agent mode, and code reviews.
GitHub Copilot: context‑aware AI pair‑programmer integrated into GitHub and IDEs for fast autocompletion & chat.
Replit Agent: AI that turns natural‑language prompts into working apps or code, with autocomplete.
Tabnine: privacy‑focused, team‑trained AI for code completion and in‑IDE chat.
Windsurf (formerly Codeium): free, multi‑language AI‑powered editor with autocompletion and multi‑line editing.
Vibe Coding, Reasoning Models, and Beyond
As coding assistants become mainstream, these tools have enabled individuals with limited or no technical backgrounds to create software by simply expressing their ideas in plain English. When the AI assistant makes mistakes, the human asks it to correct them, guiding it step by step toward the desired outcome. This is known as "Vibe Coding". The computer scientist Andrej Karpathy coined the term in February 2025, and it has been a sensation in the tech influencer world. The issue with Vibe Coding is that developers often rely on AI-generated code without fully understanding how it works, verifying its correctness, or addressing security risks. This results in undetected bugs and cybersecurity vulnerabilities.12
In response to these risks, the field has started turning its attention to reasoning. The conversation gained momentum with the arrival of DeepSeek,34 one of the first models explicitly designed to disclose structured reasoning to the end user. DeepSeek sparked broader recognition that AI tools could evolve beyond mere code generation to consider intent, logic, and edge cases. Its debut marked a shift in how the industry viewed reasoning as a safeguard against the brittleness of vibe coding.
After DeepSeek signaled that AI could start simulating reasoning, Apple’s June 2025 paper “The Illusion of Thinking” 5 brought that idea under public scrutiny. By testing models like OpenAI’s o1 and o3, Claude Sonnet Thinking, and DeepSeek‑R1 on structured puzzles such as the Tower of Hanoi6 and river‑crossing7 problems, Apple demonstrated that model accuracy collapsed as complexity increased, and that internal chain-of-thought traces inexplicably shortened even when token budgets remained plentiful.89 The paper rapidly polarized the AI community: some praised its controlled‐environment evaluation and interpretability goals, while others, including Anthropic and independent researchers, argued that experimental design flaws, evaluation bias, and token limitations, not fundamental reasoning failures, were the true culprits.1011 That debate has crystallized a deeper takeaway: reasoning can no longer be assessed solely by final‑answer correctness. We need evaluation frameworks that observe the process, effort allocation, and failure modes of AI when the questions scale beyond memorization.
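Part of what makes puzzles like the Tower of Hanoi attractive for such evaluations is that a tiny reference solver yields the optimal move sequence, so a model's answer can be checked move by move rather than only by its final state. A minimal sketch:

```python
def hanoi_moves(n, source="A", target="C", spare="B"):
    """Return the optimal Tower of Hanoi move list for n disks
    as (from_peg, to_peg) pairs."""
    if n == 0:
        return []
    # Move n-1 disks out of the way, move the largest disk, then stack
    # the n-1 disks back on top of it.
    return (hanoi_moves(n - 1, source, spare, target)
            + [(source, target)]
            + hanoi_moves(n - 1, spare, target, source))

moves = hanoi_moves(3)
# The optimal solution has 2^n - 1 moves, a yardstick against which
# a model's proposed sequence (and where it breaks down) can be scored.
assert len(moves) == 2**3 - 1
```

Because the ground truth is exact at every step, evaluators can observe not just whether a model fails as n grows, but at which move its reasoning trace goes off the rails.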
While researchers continue to debate what constitutes reasoning in AI, a new wave of startups is pushing the field forward in more pragmatic ways. These companies are not waiting for perfect solutions from frontier labs; instead, they are building tools that enhance how developers interact with AI, improve reasoning reliability in real-world workflows, and experiment with new architectures that go beyond next-token prediction. In doing so, they’re helping to redefine what useful, applied reasoning can look like today.
AI companies to look for in the space:
Aider: An open‑source, Git-based coding assistant that automates commits, messages, refactoring, and voice-triggered edits, all from the command line. Supports Claude and GPT‑4o.
Anysphere: Creator of Cursor, a full‑repo code editor powered by GPT‑4 and Claude, which supports natural‑language multi‑line edits and can autonomously refactor or manage tasks across entire codebases.
Augment: Augment is working on context-aware tooling that understands the whole structure of large codebases to adapt and generate code that fits team styles and architecture constraints.
Cognition AI: Developer of Devin, a so-called “AI software engineer” capable of executing entire development projects end-to-end: writing code, running builds, fixing bugs, and deploying apps. Also, Cognition has acquired Windsurf.
Lovable: A modular AI dev platform that generates component-based systems from natural language.
Magic: This LLM startup focuses on high-capacity models that can generate, explain, and optimize code across millions of lines, helping teams manage large software systems at scale.
Poolside AI: Poolside is building custom large language models tailored to developer workflows. Its AI assistant integrates deeply with IDEs and corporate environments.
Qodo (formerly CodiumAI): Qodo offers a “code integrity platform” that auto-generates unit tests, PR descriptions, refactors, and contextual code chat—all inside Git and IDEs.
Reflection: Founded by former Google DeepMind researchers, this startup has built Asimov, an AI agent that understands not just code but also associated emails, docs, and commit history.
Who Watches the Coders' Coders?
After testing many AI tools and code assistants on my day-to-day engineering work, I have observed a recurring issue: local minima loops. A model encounters an error, proposes a fix that causes a new mistake, and then circles back to the original solution, getting stuck in a loop without progressing. To make things even worse, over time, the model often forgets key contextual elements from earlier interactions, omitting functions or ignoring prior constraints. These issues are especially problematic for large codebases with complex logic.
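One pragmatic mitigation for these loops (a sketch under my own assumptions, not any vendor's actual implementation) is to fingerprint each proposed patch and abort the cycle the moment the model revisits one:

```python
import hashlib

def patch_fingerprint(patch: str) -> str:
    """Stable fingerprint for a proposed code change."""
    return hashlib.sha256(patch.encode()).hexdigest()

def run_fix_loop(propose_fix, max_rounds=10):
    """Apply successive AI-proposed fixes, aborting on a repeated proposal.

    `propose_fix` is a stand-in for a call to a coding model: given the
    history of prior patches, it returns the next candidate patch.
    """
    seen = set()
    history = []
    for _ in range(max_rounds):
        patch = propose_fix(history)
        fp = patch_fingerprint(patch)
        if fp in seen:
            # The model has circled back to an earlier "solution":
            # escalate to a human instead of looping forever.
            return history, "loop_detected"
        seen.add(fp)
        history.append(patch)
    return history, "budget_exhausted"

# A toy model that alternates between the same two fixes forever:
cycle = ["fix A", "fix B"]
history, status = run_fix_loop(lambda h: cycle[len(h) % 2])
```

Detecting the loop doesn't solve the underlying problem, but it converts a silent waste of tokens (and patience) into an explicit hand-off point.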
The underlying issue? Larger context windows are not always better (an active research topic as of this writing).1213 Moreover, much of the training data for these tools comes from public internet forums like Stack Overflow, where code is often rushed, hacked together, or tailored to narrow use cases. These low-quality examples lead models to reproduce brittle or inefficient code. Michael Paulson (a.k.a. The Primeagen) explains this problem by pointing to the shape of the statistical distribution for code available on the internet; only a tiny percentage is great code.
Addressing these challenges has become the focus of a growing ecosystem of AI tools specifically dedicated to code quality. From automated code review platforms to context-aware refactoring assistants, these solutions aim to identify bugs earlier, enforce style and security standards, and maintain architectural integrity, especially in large, evolving codebases where human review alone may struggle to keep pace.
AI tools focused on code quality:
All Hands: AI-assisted pull request summaries and codebase documentation generation to help teams maintain clarity and reduce onboarding time.
Codacy: Automates code quality checks, security scanning, and style enforcement for every commit or pull request, integrating with CI/CD pipelines.
CodeAnt AI: Automates code reviews and checks for code vulnerabilities.
CodeRabbit: An AI-powered code reviewer that leaves actionable comments directly in pull requests, helping teams enforce style guides and catch logical errors early.
PullRequest: Provides on-demand code review by combining AI-driven analysis with a network of vetted human reviewers to improve quality and reduce technical debt.
SonarQube AI Code Quality: Automatically detects bugs, vulnerabilities, and maintainability issues across dozens of programming languages.
What The Diff: Automatically writes pull request descriptions.
It’s hard not to notice the irony: when AI-generated code stumbles, we humans are adding another layer of AI to catch the mistakes. As the Roman poet Juvenal asked, “Quis custodiet ipsos custodes?”, who will watch the watchers? In practice, the most effective oversight often comes from diversity rather than duplication: different models, trained on various datasets, checking each other’s output. This not only spreads the risk but also gives us a better shot at catching the blind spots that any single AI, no matter how advanced, will inevitably have.
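A minimal sketch of that diversity principle, with hypothetical stub functions standing in for real reviewer models: route the same diff to several independent reviewers and surface any disagreement for human attention.

```python
def review_with_panel(diff: str, reviewers: dict) -> dict:
    """Collect verdicts from several independent reviewers and flag
    the change for human review when they disagree."""
    verdicts = {name: fn(diff) for name, fn in reviewers.items()}
    unanimous = len(set(verdicts.values())) == 1
    return {"verdicts": verdicts, "needs_human": not unanimous}

# Toy reviewers with deliberately different "blind spots":
reviewers = {
    "model_a": lambda d: "reject" if "eval(" in d else "approve",
    "model_b": lambda d: "reject" if len(d) > 500 else "approve",
}
result = review_with_panel("print(eval(user_input))", reviewers)
```

Here one reviewer catches the dangerous `eval` while the other waves it through; the disagreement itself is the signal that a human should look.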
Infrastructure Management
Once code is written, tested, and refined, the real challenge often begins: deploying and managing it at scale. Infrastructure provisioning, monitoring, and scaling can be more complex than the code itself. Now, AI is entering this space, not just as a helper but as a strategic orchestrator: automating configuration, spotting misconfigurations, tuning performance, and even cutting cloud costs.
Cloud computing has already revolutionized how software is launched, making it possible to launch a new product in hours instead of months. Developers can provision resources on demand, pay only for what they use, and deploy globally. But as systems grew larger and more interdependent, the need for repeatable, automated infrastructure became clear. This led to the emergence of infrastructure as code (IaC),14 with tools like Terraform and CloudFormation that allow teams to define and manage infrastructure through version-controlled configuration files.
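The core idea of IaC can be sketched in a few lines: desired state is declared as version-controllable data, and a planner diffs it against reality to produce the actions needed to reconcile the two. The dictionaries and action names below are illustrative only, not any real tool's schema; Terraform and CloudFormation implement this loop at cloud scale.

```python
def plan(desired: dict, actual: dict) -> list:
    """Diff desired vs. actual infrastructure state into a list of
    create/update/delete actions, in the spirit of `terraform plan`."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name, spec))
        elif actual[name] != spec:
            actions.append(("update", name, spec))
    # Anything deployed but no longer declared should be torn down.
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return actions

desired = {"web": {"instances": 3}, "db": {"size": "small"}}
actual = {"web": {"instances": 2}, "cache": {"size": "tiny"}}
actions = plan(desired, actual)
```

Because the desired state is just data, it can be code-reviewed, diffed, and rolled back like any other commit, which is precisely what made IaC so attractive.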
Managing cloud infrastructure remains challenging, particularly when multiple environment configurations must be maintained (e.g., testing vs. production). Recently, AI-powered tools have assisted with infrastructure configuration. These include cloud-native Amazon Q, Google Gemini Cloud Assist, Microsoft Copilot in Azure, and third-party tools like CloudAdvisor and Pulumi AI.
Today, most AI tools in this area operate as conversational bots: their effectiveness depends heavily on the engineer's prompts and the contextual data they are given. However, more capable tools are emerging with live telemetry, automated cybersecurity analysis, and configuration awareness to deliver tailored recommendations or even take corrective action automatically. Below are some notable examples.
AI applications for infrastructure management:
Aikido: A security automation platform that uses AI to prioritize vulnerabilities based on exploitability, context, and business impact, helping teams fix the most critical issues first.
Datadog Watchdog: Machine learning–driven anomaly detection and forecasting that integrates with logs, metrics, and traces to surface potential infrastructure or application issues.
Dynatrace: Observability platform with the Davis AI engine that automatically detects anomalies, identifies root causes, and provides remediation suggestions across complex multi-cloud environments.
Wiz: Uses an AI-backed Security Graph to map cloud resources, detect vulnerabilities, and highlight critical attack paths with environment-specific remediation guidance.
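Under the hood, the anomaly detection these platforms perform can be caricatured as flagging samples that deviate sharply from a rolling baseline. Real products use far richer models than this, but a simple z-score check shows the shape of the problem:

```python
from statistics import mean, stdev

def anomalies(samples, window=5, threshold=3.0):
    """Flag indices whose value lies more than `threshold` standard
    deviations from the mean of the preceding `window` samples."""
    flagged = []
    for i in range(window, len(samples)):
        base = samples[i - window:i]
        mu, sigma = mean(base), stdev(base)
        # Skip flat baselines (sigma == 0) to avoid division by zero.
        if sigma and abs(samples[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

latency_ms = [102, 99, 101, 100, 98, 480, 101]  # one obvious spike
spikes = anomalies(latency_ms)
```

The hard part in production is everything this sketch omits: seasonality, correlated metrics, and deciding which of thousands of simultaneous deviations actually deserves a page.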
Where Do We Go from Here?
Artificial intelligence is no longer a sidekick in engineering; it’s becoming an active participant in the design, coding, and operation of complex systems. From IntelliSense’s humble beginnings to reasoning‑aware assistants and AI‑driven infrastructure management, we are watching the profession shift from “engineer builds with tools” to “engineer collaborates with tools”. This evolution offers enormous leverage: faster iteration, broader accessibility, and automation at scales that once demanded entire teams. But it also introduces new fragilities: vibe‑coded bugs, reasoning blind spots, and AIs overseeing other AIs, all of which demand just as much creativity to manage as the original problems we set out to solve.
The next decade won’t simply be about building smarter AI coders; it will be about orchestrating diverse systems, human and machine, that can check each other’s work, preserve context, and reason about outcomes instead of just predicting the next token. In other words, engineering’s newest challenge is ensuring that in our race toward automation, we keep the ability to understand, explain, and, when needed, overrule our most sophisticated digital collaborators. Because in the end, the future of engineering isn’t AI replacing us, it’s AI raising the standard of what we, together, can create.
I hope you have enjoyed this installment of Digital Reflections! Remember to subscribe and follow for more.
Digital Reflections is available at digirex.substack.com and also on LinkedIn.