AI agents acting autonomously in the world

babyAGI and AutoGPT: The rise of autonomous AI agents?

The tech elite of Silicon Valley are chattering excitedly over babyAGI, the new AI kid on the block. While it doesn’t quite live up to AGI (artificial general intelligence)—the omnipotent AI heavyweight that gives Elon Musk sleepless nights—babyAGI is still a dazzling and noteworthy addition to the AI playground.

babyAGI

In a nutshell, babyAGI morphs GPT-4 (OpenAI’s latest language model that usually spits out words) into a handy digital sidekick that can perform tasks and take actions online. Instead of receiving text-based responses from GPT-4, babyAGI lets you do things like launch and execute a Twitter follower campaign or create and manage a content marketing biz.

These agents use large language models (LLMs) to perform real-world tasks through a software layer built around a relatively simple action loop and prompt. AutoGPT-style agents, including babyAGI, work out of the box, and reports are piling up online of people ordering pizza through them and delegating other small tasks. Crucially, they can in principle interact not only with each other but also with humans.

AutoGPT

AutoGPT isn’t a new user interface like ChatGPT, but rather a concept: letting LLMs prompt themselves in order to automate tasks involving text and code. The “Auto” stands for autonomous, in the sense of self-prompting. After an initial prompt (a request in natural language), the LLM develops and executes further prompts, which can in turn lead to further instructions that the program issues to itself.

The approach is powerful because it can be connected to real-world tools. These tools enable agents to, for example, search the Internet, write code and also test it, and exchange instructions with other agents, programs, or even humans. Countless applications are conceivable, up to and including the control of robots that operate physically in the world. To this end, individual API calls to the language model are chained in a loop (the agent loop), which gives the impression that the agents can perceive, think, and act independently.
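To make the agent loop a little more concrete, here is a minimal sketch in Python. It is not AutoGPT's or babyAGI's actual code. The reply format ("CALL tool: argument" and "FINISH: answer"), the fake_llm stand-in, and the calculator tool are all assumptions invented for illustration, so that the example runs offline without any API key.

```python
from typing import Callable, Dict, List


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM API call (scripted, runs offline)."""
    if "Observation: 4" in prompt:
        return "FINISH: The answer is 4"
    return "CALL calculator: 2 + 2"


def calculator(expression: str) -> str:
    """Toy tool (an assumption for this sketch): adds two integers, e.g. '2 + 2'."""
    left, right = expression.split("+")
    return str(int(left) + int(right))


TOOLS: Dict[str, Callable[[str], str]] = {"calculator": calculator}


def run_agent(
    goal: str,
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    """Chain model calls in a loop: prompt, act, observe, then prompt again."""
    history: List[str] = [f"Goal: {goal}"]
    for step in range(max_steps):
        # The model sees the goal plus everything that has happened so far.
        reply = llm("\n".join(history))
        if reply.startswith("FINISH:"):
            return reply[len("FINISH:"):].strip()
        if reply.startswith("CALL "):
            name, _sep, argument = reply[len("CALL "):].partition(":")
            tool = tools.get(name.strip())
            result = tool(argument.strip()) if tool else f"unknown tool {name.strip()}"
            # The tool result is fed back, so the next prompt builds on it.
            history.append(f"Action: {reply}")
            history.append(f"Observation: {result}")
        else:
            history.append(f"Thought: {reply}")
    # A hard step limit keeps the loop from running forever.
    return "stopped: step limit reached"


print(run_agent("What is 2 + 2?", fake_llm, TOOLS))
```

In a real agent, fake_llm would be replaced by a call to a language-model API and the tool table would hold things like web search or code execution. The loop itself would stay roughly this simple.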

Particularly popular at the moment is the AutoGPT repository on GitHub, where experimental code is available “to make GPT-4 fully autonomous,” according to the repository description. The repository includes a demo, detailed installation instructions, usage information, a voice mode, configuration of Google API keys, instructions on how to set up the required memory, a “GPT-3.5-only mode,” tips on image synthesis, and notes on limitations as well as how to run tests. With this setup, agents are able to search the Internet to gather information, manage long- and short-term memory, set up GPT-4 instances for text generation, access popular websites and platforms, and both store and summarize files with GPT-3.5.

Free to use and open-source, AutoGPTs like babyAGI could spell trouble for well-funded startups aiming to create commercial AI assistants, such as Adept AI and Inflection AI.

Although they are not AGI, AutoGPTs come with risks. For one, the continuous loops of prompts can run up exorbitant bills with OpenAI, since every iteration is a paid API call. Other concerns include potential use in cyberattacks, fraud schemes, and misinformation mills. Plus, if users aren’t careful with their requests, the autonomous bots may perform actions on the user’s behalf (like making purchases or scheduling appointments) that the user didn’t intend.
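Two of these risks, runaway API bills and unintended actions, can be reduced with simple guardrails around the loop. The sketch below is one possible approach under stated assumptions, not something taken from AutoGPT or babyAGI. The Budget class, the is_allowed helper, the list of sensitive actions, and all cost figures are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


class BudgetExceeded(RuntimeError):
    """Raised when the next model call would exceed a configured limit."""


@dataclass
class Budget:
    """Hypothetical spending guard: caps both call count and estimated cost."""
    max_calls: int
    max_cost_usd: float
    calls: int = 0
    cost_usd: float = 0.0

    def charge(self, cost_usd: float) -> None:
        # Check before spending, so the limit is never overshot.
        if self.calls + 1 > self.max_calls:
            raise BudgetExceeded("call limit reached")
        if self.cost_usd + cost_usd > self.max_cost_usd:
            raise BudgetExceeded("cost limit reached")
        self.calls += 1
        self.cost_usd += cost_usd


# Assumed action names; a real agent would define its own set.
SENSITIVE_ACTIONS = {"purchase", "schedule_appointment"}


def is_allowed(action: str, approve: Callable[[str], bool]) -> bool:
    """Sensitive actions need explicit approval; everything else passes."""
    if action in SENSITIVE_ACTIONS:
        return approve(action)
    return True


budget = Budget(max_calls=2, max_cost_usd=1.00)
budget.charge(0.40)  # first model call (made-up cost estimate)
budget.charge(0.40)  # second model call
try:
    budget.charge(0.10)  # a third call would break the call limit
except BudgetExceeded as error:
    print(f"agent halted: {error}")
```

In practice, approve would ask the user for confirmation before the agent spends money or commits to an appointment, and the cost passed to charge would be estimated from the tokens actually used.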

Outlook: Skynet ante portas?

In the (very near?) future, it’s conceivable that robots, connected to large language models for information retrieval and processing, could similarly act and interact in the physical world. These robots would use sensors to perceive their surroundings. While this is still a futuristic vision, the rapid pace of AI development may bring such implementations within reach sooner than we think. The AI research division of Microsoft, which is OpenAI’s main partner, recently presented a research report on converting natural language instructions into executable robot actions using ChatGPT.

Materials related to this project can be found in a GitHub repository. The research team shares prompts used in human-robot communication, which, according to the project description, can be easily adapted and integrated into existing robotic and visual recognition systems.

In conclusion, while babyAGI isn’t the Skynet in diapers you may have feared, it’s still an impressive and significant development in the world of AI. Given its potential to connect to real-world tools, and the prospect of reaching into the physical world through robots, it deserves attention and careful examination. As these GPT-based agents become more integrated into our daily lives, it’s crucial to understand and manage the risks associated with their usage.
