AI Talk: Dawn of the Era of Agents
- Juggy Jagannathan
- Feb 24
- 4 min read
With the advent of ever more powerful AI models, we are beginning to see task-specific agents pressed into action. In this blog, we examine what exactly an agent is, how one creates and deploys them, and what precautions one needs to take in utilizing them.

What is an agent?
The notion of an “agent” is as old as the field of AI itself. In the classic textbook by Russell and Norvig, Artificial Intelligence: A Modern Approach, the second chapter discusses this very subject. Their simple definition is:
“An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators.”
A Roomba vacuum cleaner fits the bill. So does a software agent that scans the Internet to inform you of interesting news. Google Cloud distinguishes between an AI agent, an AI assistant, and a bot. An AI agent acts autonomously: it plans, reasons, learns, and executes tasks. Its interaction with users and other agents is proactive and goal-oriented.
A slew of survey articles published on this topic in the past two months indicates that 'Agents AT (after Transformer)' have reached the peak of the agent hype cycle. Why Agents AT? I am simply clubbing all the previous efforts together as Agents BT (before Transformer). I recently spotted a publication from China titled "The Rise and Potential of Large Language Model Based Agents: A Survey," referencing 574 works—talk about dedication! Another is an IEEE survey titled "Agentic AI: Autonomous Intelligence for Complex Goals - A Comprehensive Survey." Here I will highlight a few key architectural considerations and practical tools used in implementing present-day solutions.
Problem Solving
Every agent is entrusted with doing something specific. For instance, a travel agent is given an itinerary and asked to make appropriate reservations. How does it go about doing that? This Continuum Labs blog on AI Agents summarizes efforts related to reasoning, planning, and tool calling. In the past, traditional AI approaches used rules to plan a strategy for what needs to be done and then executed that strategy step by step. The modern Generative AI (or LLM-based) approaches aren't so different: prompt the LLM to come up with a plan (using chain-of-thought reasoning) and then execute each step with API calls to the right tools. One of the first systems to use LLMs in this manner was the ReAct framework, which did precisely that: it combined CoT reasoning with tool invocation.
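To make the idea concrete, here is a minimal sketch of a ReAct-style loop. Everything in it is a stand-in for illustration: `stub_llm` is a scripted stand-in for a real LLM call, and `get_weather` is a canned hypothetical tool, not the original ReAct implementation.

```python
def stub_llm(transcript: str) -> str:
    """Stand-in for an LLM call: scripted responses for this sketch."""
    if "Observation: 22" in transcript:
        return "Final Answer: 22 C"
    return "Thought: I need the weather.\nAction: get_weather[Paris]"

def get_weather(city: str) -> str:
    return "22"  # canned tool result for the sketch

TOOLS = {"get_weather": get_weather}

def react(question: str, max_steps: int = 5) -> str:
    """Alternate Thought -> Action -> Observation until a final answer."""
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        reply = stub_llm(transcript)
        transcript += "\n" + reply
        if "Final Answer:" in reply:
            return reply.split("Final Answer:")[1].strip()
        # Parse "Action: tool[arg]" and invoke the matching tool.
        action = reply.split("Action:")[1].strip()
        name, arg = action.split("[", 1)
        result = TOOLS[name](arg.rstrip("]"))
        transcript += f"\nObservation: {result}"
    return "gave up"

print(react("What is the temperature in Paris?"))  # -> 22 C
```

The essential shape is the same in real systems: the model's text is parsed for an action, the action is executed against a tool, and the observation is appended to the prompt for the next turn.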
Once you come up with a plan, tool invocation and coordination of the results become paramount. A host of agentic frameworks now support this perceive–think–act paradigm. For a good list of these frameworks, see: "Building Intelligent Apps with Agentic AI" or "Top 7 Frameworks for Building AI Agents in 2025". Tool use can take many forms: it might invoke Retrieval Augmented Generation (RAG) to leverage unstructured content, GraphRAG for both structured and unstructured data, or other specialized strategies.
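As one example of such tool use, here is a toy Retrieval Augmented Generation retriever. It is purely illustrative: it scores each chunk by word overlap with the query, whereas real RAG systems use embedding similarity; the chunk texts and `build_prompt` helper are my own made-up examples.

```python
import string

CHUNKS = [
    "The Eiffel Tower is in Paris and is 330 metres tall.",
    "RAG retrieves relevant text and adds it to the model prompt.",
    "Roomba is a robotic vacuum cleaner made by iRobot.",
]

def tokens(text: str) -> set[str]:
    """Lowercase words with surrounding punctuation stripped."""
    return {w.strip(string.punctuation) for w in text.lower().split()}

def retrieve(query: str, chunks: list[str]) -> str:
    """Return the chunk sharing the most words with the query."""
    q = tokens(query)
    return max(chunks, key=lambda c: len(q & tokens(c)))

def build_prompt(query: str) -> str:
    # The retrieved chunk becomes context that grounds the LLM's answer.
    context = retrieve(query, CHUNKS)
    return f"Context: {context}\nQuestion: {query}\nAnswer:"

print(build_prompt("How tall is the Eiffel Tower?"))
```

GraphRAG and other variants change what gets retrieved (graph neighborhoods, structured records), but the pattern of fetch-then-prompt is the same.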
On a lighter note, I’d love someone to build me a spam phone call filter. I recently answered a spam call with my best robotic voice: “I am an AI assistant. How may I help you?” The caller hung up immediately.
Embodied Agents
Another category of agents is the embodied agent, primarily in robotics. ChatGPT once offered this succinct description:
“AI systems that operate within, or control, a physical or simulated body. These agents combine perception, action, and environment interactions in ways that go beyond purely text-based or abstract environments.”
Robotics has always had some notion of intelligence, captivating us in sci-fi films and novels as well as in real-world applications. Today, though, “embodiment” extends beyond humanoid forms. We now have toasters and refrigerators talking to us—though let’s hope they don’t start negotiating union wages anytime soon!
Vision and language understanding capabilities have converged to push embodied AI forward. Google DeepMind is pioneering work in Vision Language Models (VLMs), with its Robotic Transformer 2 (RT-2) that converts commands to robot actions. Apple's "Grounding Multimodal Large Language Models in Actions," also explores bridging text instructions and real-world robot operations.
These models are evolving rapidly. The automation they enable differs significantly from factory-floor robots (often referred to as “hard automation,” which are tightly scripted). By contrast, the next generation is far more flexible: given a task, they devise a solution and then translate it into actions—a huge leap in complexity. That said, these capabilities are often demonstrated in toy environments or limited domains. As with self-driving cars, full autonomy in the wild remains a big challenge.
Multi-Agent Systems (MAS)
Finally, let’s touch on multi-agent systems. This area is also well-established in AI. The 1970s saw the rise of the Blackboard architecture, which provided a global memory shared by a collection of knowledge sources (i.e., agents). One of its first applications was in speech recognition at Carnegie Mellon University, where different ways of interpreting speech signals cooperated to solve speech-to-text tasks. The architecture lends itself to all kinds of agent organization—structured hierarchically or as a group of peers opportunistically responding to a shared “blackboard” of world state.
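The architecture can be sketched in a few lines. This is a hypothetical miniature, loosely inspired by the speech example: two knowledge sources watch a shared dictionary (the blackboard) and contribute opportunistically when their preconditions are met; the "recognition" step is canned.

```python
blackboard = {"signal": "heh-loh"}

def phoneme_source(bb: dict) -> None:
    """Fires when a raw signal is present but not yet segmented."""
    if "signal" in bb and "phonemes" not in bb:
        bb["phonemes"] = bb["signal"].split("-")

def word_source(bb: dict) -> None:
    """Fires once phonemes exist; posts a word hypothesis."""
    if "phonemes" in bb and "word" not in bb:
        bb["word"] = "hello"  # canned hypothesis for the sketch

SOURCES = [phoneme_source, word_source]

def run(bb: dict, max_cycles: int = 10):
    """Cycle through sources until a solution appears on the blackboard."""
    for _ in range(max_cycles):
        if "word" in bb:
            return bb["word"]
        for source in SOURCES:
            source(bb)  # each source acts only when its trigger holds
    return None

print(run(blackboard))  # -> hello
```

The key property is that the sources never call each other: all coordination happens through the shared state, which is what lets the agents be organized hierarchically or as peers.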
Fast-forward 50 years: multi-agent systems remain incredibly relevant. On arXiv, where many research preprints appear, the multi-agent (MA) category is abuzz with new submissions every month. In February 2025 alone, it recorded 135 new publications—and the month isn’t even over yet! The range of problems MAS can tackle is effectively limitless, provided the individual agents can cooperate or compete intelligently.
RoboCup is a particularly captivating example: the organizers aim to build a robot soccer team capable of beating the human World Cup champions by 2050.
One noteworthy paper (by Penn State and Google Cloud AI Research) explores Chain-of-Agents (CoA) to improve Retrieval Augmented Generation. RAG traditionally grabs chunks of relevant data to answer a query but struggles when information is spread out across multiple sections. CoA addresses that gap by harnessing multi-agent collaboration—essentially letting agents “talk among themselves” to piece together a more complete answer.
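A toy version of that relay makes the idea concrete. This is my own illustrative sketch, not the paper's implementation: each "worker" is a stand-in for an LLM that reads one chunk plus the note passed from the previous worker, and a "manager" stand-in returns the accumulated evidence.

```python
CHUNKS = [
    "Alice joined the project in 2021.",
    "She later became its lead engineer.",
    "The project shipped in 2024 under her leadership.",
]

def worker(chunk: str, note: str, query: str) -> str:
    """Stand-in for a worker agent: keep chunks that look relevant."""
    if any(word in chunk.lower() for word in query.lower().split()):
        return (note + " " + chunk).strip()
    return note

def manager(note: str) -> str:
    """Stand-in for the manager agent that writes the final answer."""
    return note

def chain_of_agents(query: str, chunks: list[str]) -> str:
    note = ""
    for chunk in chunks:
        # Evidence accumulates as the note is handed agent to agent.
        note = worker(chunk, note, query)
    return manager(note)

print(chain_of_agents("project", CHUNKS))
```

Because the note travels across all chunks, facts scattered over distant sections (here, the 2021 start and the 2024 ship date) end up together in the manager's input, which is exactly the gap plain chunk retrieval leaves open.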
Concluding Thoughts
The field of agents is vibrant and rapidly evolving, with huge potential for real-world applications. However, with great power comes the need for responsible design. Proper guardrails, oversight, and verification are essential to prevent unwanted outcomes. As with all Generative AI solutions, the guiding principle is: “Don’t trust, always verify.”
May your agents be ever loyal, your spam calls ever fewer, and your robotic vacuum ever thorough!