Internet-Draft | Network Management Agent Concept | February 2025 |
Zhao, et al. | Expires 1 September 2025 | [Page] |
With the development of Artificial Intelligence (AI) technology, large models have shown significant advantages and great potential in recognition, understanding, decision-making, and generation. These capabilities match the self-intelligent network management requirements of autonomous networks and Intent-based Networking well, making large models one of the potential driving technologies for high-level autonomous networks. When introducing AI for network management, the focus of research and standardization is how to integrate AI technology and how to handle its relationship with existing network management entities (such as network controllers).¶
This document presents the concept of the AI-based Network Management Agent (NMA), provides the basic definition and reference architecture of the NMA, discusses the relationship of the NMA with traditional network controllers and other network management entities by exploring the deployment modes of the NMA, and proposes the common processing flow and typical application scenarios of the NMA.¶
This note is to be removed before publishing as an RFC.¶
Discussion of this document takes place on the Network Management Operations Working Group mailing list ([email protected]), which is archived at https://mailarchive.ietf.org/arch/browse/nmop/.¶
Source for this draft and an issue tracker can be found at https://github.com/ietf-wg-nmop/draft-ietf-nmop-digital-map-concept.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 1 September 2025.¶
Copyright (c) 2025 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
As the types of operator services become increasingly diverse, the complexity and difficulty of network operations and maintenance continue to grow. On one hand, new service scenarios such as industrial internet, vehicle-road collaboration, and 5GtoB for vertical industries are constantly emerging, and customer services like Extended Reality (XR), Virtual Reality (VR), and smart home are becoming more abundant, with a continuous increase in network access volume. On the other hand, with the popularization of 5G and gigabit optical networks, operators face a situation where networks from 2G to 5G coexist. The network protocols and characteristics vary across different network domains, which further increases the difficulty and complexity of network operations and maintenance. Traditional manual operations and maintenance methods alone can no longer meet these increasingly complex demands. The level of network intelligence has become a key factor directly affecting network performance and user experience. Against this backdrop, enhancing the level of network intelligence and creating Autonomous Networks (AN) [TMF-IG1230] or Intent-based Networking [RFC9315] has become a global consensus among operators.¶
Autonomous Networks provide an architecture for the delivery of services and capabilities with a “Zero-X” (Zero-wait, Zero-trouble, Zero-touch) experience for the users of vertical industries and consumers, and a “Self-X” (Self-configuration, Self-healing, Self-optimizing) experience for network operators. In particular, the AN framework defines 6 automation levels, spanning from Level 0 (L0), where operations and maintenance are fully manual, to Level 5 (L5), where the network is fully automated and managed by AI, with human intervention reduced to a minimum.¶
As of today, the level of automation differs considerably from operator to operator, but the average level is considered to be between L2 and L3. Mainstream operators are releasing goals and plans to achieve Level 4 (L4) autonomous networks by 2025. L4+ AN sets higher requirements for intent, decision-making, analysis, perception, and execution. Artificial Intelligence (AI) large model technology has shown significant advantages and great potential in identification, understanding, decision-making, and generation. Its technical features include multimodal fusion perception, more user-friendly human-computer interaction and knowledge Q&A, and content generation. These features match the new requirements of Level 4 Autonomous Networks well and make large models one of the core driving technologies for achieving high-level autonomous networks.¶
However, introducing AI into the network raises several key issues:¶
1) The application architecture and deployment methods of AI in network management are still unclear. That is, in what form can AI help network management?¶
2) The relationship between AI and the existing network controllers is not clear.¶
3) New interface capability requirements after AI is introduced are not clear either.¶
Therefore, it is necessary to define the general architecture and application form of AI in network management.¶
The concept of the Network Management Agent (NMA) draws inspiration from the “AI Agent”. According to the framework proposed in the blog [LLM-powered-autonomous-agents] by OpenAI's Lilian Weng, an LLM-powered agent includes several key components: planning, memory, and the use of tools to complete actions. Following the mainstream definition widely accepted in the industry, an AI Agent refers to “an intelligent entity with the ability to perceive the environment, make decisions, and execute actions, and can gradually achieve set goals through independent thinking and tool invocation”. In Google's latest Agent white paper [Agents], “a Generative AI agent can be defined as an application that attempts to achieve a goal by observing the world and acting upon it using the tools that it has at its disposal. Agents are autonomous and can act independently of human intervention, especially when provided with proper goals or objectives they are meant to achieve.”¶
The key features of AI agents include reasoning and decision-making abilities, goal-orientation, and autonomy. Among these, autonomy means that once the appropriate goals are provided, it can act independently without human intervention. As the concept of AI agent becomes widely accepted in the industry, it’s expected to become one of the most feasible application forms of AI.¶
Similarly, the Network Management Agent (NMA), which can be understood as the AI Agent for network management, refers to a network management entity built on ML/AI and equipped with autonomous closed-loop task processing capabilities. Based on user task intents or preset goals, it can automatically carry out network status perception, task intent interpretation, task planning, decision-making, and task execution, so as to achieve closed-loop processing of scenario-oriented network management tasks.¶
This document aims to provide a standardized common architecture for the use of AI in network management, which can take the form of the NMA. The following sections propose the concept of the AI-based NMA, define the reference architecture and the functional requirements of the NMA for different scenarios, clarify the relationship of the NMA with existing controllers and other control systems, and discuss the general task processing workflow and typical application scenarios of the NMA.¶
AI: Artificial Intelligence¶
LLM: Large Language Model¶
NMA: Network Management Agent, refers to AI based network management agent¶
The document defines the following terms:¶
Network Management Agent (NMA): A network management entity built on ML/AI and equipped with autonomous task processing capabilities. Based on user task intents or preset goals, it can automatically carry out network status perception, task intent [RFC9315] interpretation, task planning, decision-making, and task execution, so as to achieve closed-loop processing of scenario-oriented network management tasks. For different application scenarios, the NMA can be subdivided into multiple scenario-oriented agents.¶
In this section we’ll analyze the functional requirements and reference architecture of the NMA.¶
The NMA should support the following capabilities:¶
Support receiving task requests initiated by network operators or users through natural language. Natural language interaction is not the only way to use the NMA; network operators can also operate it through a Graphical User Interface (GUI). However, the NMA should be able to understand natural language and translate it into task intents through the reasoning capability of its built-in Large Language Models (LLMs).¶
Support perception of network status by querying the data of the controller and other network management tools. Network status includes network topology, service configuration, alarms, performance, and other information needed for processing the task.¶
Support task planning and breaking down task intent into specific operations based on the user input and network status perception. The task planning process can also utilize the reasoning capability of LLMs.¶
Support selecting appropriate tools and automatically invoking corresponding tools or APIs to complete the execution of each sub operation. The toolkit includes management functions from existing controller as well as other standalone management tools like Network Digital Twin (NDT) [I-D.irtf-nmrg-network-digital-twin], etc.¶
Support generating the task execution results based on the output of each operation and sending back to network operators or users.¶
Support analysis and self-assessment of execution results, and enable autonomous or human-assisted optimization based on the evaluation results to continuously improve the accuracy of task execution.¶
Support collaboration among multiple intelligent agents to complete complex tasks.¶
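The tool selection and invocation capability listed above can be illustrated with a minimal Python sketch. The names `Tool` and `Toolkit`, the keyword-based matching, and the two stub tools are assumptions made purely for illustration; this document does not define a tool interface, and a real NMA would presumably rely on model reasoning rather than keyword matching.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Tool:
    """Hypothetical wrapper around a controller function or standalone tool."""
    name: str
    keywords: List[str]          # words hinting that this tool fits an operation
    run: Callable[[str], str]    # executes the operation and returns a result


class Toolkit:
    """Hypothetical registry that the NMA consults when selecting tools."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def select(self, operation: str) -> Tool:
        # Naive keyword match; a stand-in for model-based tool selection.
        words = operation.lower().split()
        for tool in self._tools.values():
            if any(keyword in words for keyword in tool.keywords):
                return tool
        raise LookupError(f"no tool for operation: {operation!r}")


toolkit = Toolkit()
# Stub tools with canned answers (illustrative only).
toolkit.register(Tool("alarm_monitor", ["alarm", "alarms"],
                      lambda op: "alarms: none active"))
toolkit.register(Tool("topology", ["topology", "links"],
                      lambda op: "topology: 3 nodes"))

operation = "query active alarms on router R1"
selected = toolkit.select(operation)
result = selected.run(operation)
```

Keeping the registry separate from the selection logic means the naive matcher could later be swapped for a model-driven one without touching the registered tools.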
To achieve the above capabilities, and with reference to the common AI agent framework, this document presents the reference functional architecture of the NMA, as shown in Figure 1.¶
                    +--------------------------------------------+
                    |       Network Management Agent (NMA)       |
                    | +---------------------+ +----------------+ |
                    | |  Intent Management  | |     Memory     | |
                    | +---------------------+ | +------------+ | |
                    | +---------------------+ | | Long-term  | | |
                    | | Network Perception  | | +------------+ | |
                    | +---------------------+ | +------------+ | |
    Tool            | +---------------------+ | | Short-term | | |
    invocation      | |    Task Planning    | | +------------+ | |
                    | +---------------------+ +----------------+ |
 Controller<---+    | +---------------------+ +----------------+ |
               |    | |  Orchestration and  | |                | |
        NDT<---+----+->     Execution       | |                | |
               |    | +---------------------+ |  Multi-agents  | |
     Other <---+    | +---------------------+ | Collaboration  | |
 external tools     | |   Reflection and    | |                | |
                    | |  Self-optimization  | |                | |
                    | +---------------------+ +----------------+ |
                    +----------------------^---------------------+
                                           |
                    +----------------------v---------------------+
                    |           Common AI Service Layer          |
                    | +----------------++------------++--------+ |
                    | | Large language || Multimodal || Small  | |
                    | |  Models(LLMs)  || Models     || Models | |
                    | +----------------++------------++--------+ |
                    | +----------------------------------------+ |
                    | |             Knowledge Base             | |
                    | +----------------------------------------+ |
                    +--------------------------------------------+
The main functional components of the NMA include:¶
Intent Management: A basic capability provided by AI models, responsible for collecting the input task information and translating it into intents through AI model reasoning.¶
Network Perception: Performs real-time queries for the network status information related to the task intent. Network status information includes, but is not limited to, network topology, service configurations, device status, alarms, and performance data. The query source can be the controller, ENO, etc.¶
Task Planning: Based on the reasoning ability of AI models, breaks down the task intent into multiple sub-operations.¶
Orchestration and Execution: Selects the appropriate tools based on the specific operation, and automatically calls the relevant tools or interfaces to perform the operation. After all sub-operations are completed, their individual results are combined into the task execution result.¶
Reflection and Self-optimization: Analyzes and self-assesses the task execution results, and optimizes based on the evaluation results to continuously improve the accuracy of task execution.¶
Additionally, human evaluation can be integrated to further optimize the NMA's performance through human supervision, enhancing the NMA's intent understanding and task execution capabilities.¶
Memory: Responsible for storing and processing various types of information during the operation of the NMA, including long-term memory (LTM) and short-term memory (STM). STM stores the information that the NMA is currently aware of and needs in order to carry out complex cognitive tasks such as learning and reasoning. LTM can store information for a remarkably long time, ranging from a few days to months or years. To summarize, STM corresponds to in-context learning, which is short and finite because it is restricted by the finite context window length of the Transformer. LTM corresponds to an external vector store that the NMA can attend to at query time, accessible via fast retrieval.¶
Multi-agents Collaboration: Responsible for collaboration between multiple NMAs at different levels or in different application scenarios. The specific collaboration mechanism needs further research.¶
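The distinction between short-term and long-term memory described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class names `ShortTermMemory` and `LongTermMemory` are hypothetical, the bounded buffer merely stands in for a finite context window, and the two-dimensional embeddings are hand-written toy values rather than the output of any embedding model.

```python
import math
from collections import deque
from typing import Deque, List, Tuple


class ShortTermMemory:
    """Bounded buffer standing in for the model's finite context window."""

    def __init__(self, max_items: int) -> None:
        self._items: Deque[str] = deque(maxlen=max_items)

    def add(self, item: str) -> None:
        self._items.append(item)  # the oldest item is dropped when full

    def context(self) -> List[str]:
        return list(self._items)


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class LongTermMemory:
    """Toy external vector store, searched by similarity at query time."""

    def __init__(self) -> None:
        self._records: List[Tuple[List[float], str]] = []

    def store(self, embedding: List[float], text: str) -> None:
        self._records.append((embedding, text))

    def retrieve(self, query: List[float], k: int = 1) -> List[str]:
        ranked = sorted(self._records,
                        key=lambda rec: cosine(query, rec[0]), reverse=True)
        return [text for _, text in ranked[:k]]


stm = ShortTermMemory(max_items=2)
for turn in ["check link A", "alarm on port 3", "reroute traffic"]:
    stm.add(turn)

ltm = LongTermMemory()
ltm.store([1.0, 0.0], "fiber cut procedure")          # toy embedding
ltm.store([0.0, 1.0], "optical power tuning guide")   # toy embedding
hits = ltm.retrieve([0.9, 0.1])
```

After the three turns are added, the short-term buffer holds only the two most recent ones, while the long-term store keeps every record and returns the closest match for the query vector.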
In addition, there is a common AI service layer, including various large language models (LLMs), multimodal models, small models, and a knowledge base. Among them, the AI models provide public interactive intelligence capabilities as a unified agent engine, to simplify NMA development. The knowledge base provides unified search across multiple types of knowledge (including vector knowledge bases, system online help, and operation and maintenance data logs), works with the AI models to complete knowledge fusion and extraction, and improves the accuracy of NMA task execution.¶
Various NMAs can be constructed based on the common AI service layer. During the operation of NMA, it leverages the model reasoning capabilities and knowledge base provided by the AI service layer to achieve functions such as intent parsing and task planning. It should be noted that, depending on the actual deployment requirements, the AI basic service can also be deployed within the NMA.¶
For different application scenarios, there can be multiple scenario-oriented agents (similar to apps on a phone). For the network planning, construction, maintenance, optimization, and operation scenarios, the main NMAs could include:¶
Network Fault Handling Agent: This agent can be created by pre-training a specific AI model on network troubleshooting guidance documents, network equipment product documents, and other materials. The agent can capture the fault handling experience of experts, and realize fault impact analysis, root cause self-diagnosis, and self-repair of network faults by orchestrating and calling models or network control APIs. It also interfaces with the work order dispatching system to achieve automated closed-loop processing of work orders, etc.¶
To be discussed in a later version.¶
When deploying an NMA-based management and control architecture, two deployment modes can be considered: the NMA can be an independent system that is deployed separately and interacts with both the controller and the network, or it can be part of an existing network controller. These are called the independent deployment mode and the integrated deployment mode, and are shown in Figure 2(a) and Figure 2(b), respectively.¶
+-----------------------------+         +--------------------+
|                             |         |                    |
|           Network           <--C_A_I--> Network Management |
|         Controller          |         |     Agent(NMA)     |
|                             |         |                    |
+--------------^--------------+         +----------^---------+
               |                                   |
   Southbound Interface(SBI)            Intelligent SBI(I_SBI)
               |                                   |
+--------------v-----------------------------------v---------+
|                      Physical Network                      |
+------------------------------------------------------------+
                             (a)

+------------------------------------------------------------+
|                     Network Controller                     |
|                                                            |
|  +--------------------+           +--------------------+   |
|  | Original Function  <--Internal-> Network Management |   |
|  |      Modules       | Interface |     Agent(NMA)     |   |
|  +--------------------+   (I_I)   +--------------------+   |
|                                                            |
+------------------------------^-----------------------------+
                               |
                      Extended SBI(E_SBI)
                               |
+------------------------------v-----------------------------+
|                      Physical Network                      |
+------------------------------------------------------------+
                             (b)
As shown in Figure 2(a), NMA is independently deployed from the original network controller. NMA and controller are independent systems. A new east-west interface needs to be added between the NMA and the controller to achieve capability calling and result feedback operations. This interface can be called “C_A_I”. In this deployment mode, controller uses southbound interface (SBI) to interact with physical network, while an intelligent southbound interface (abbreviated as “I_SBI”) needs to be added between NMA and the underlying physical network.¶
As shown in Figure 2(b), the NMA is integrated and deployed with the original network controller, and serves as a function of the controller. The NMA interacts with the original function modules through an internal interface (abbreviated as “I_I”). The enhanced controller interacts with the underlying physical network through an extended SBI (abbreviated as “E_SBI”).¶
The specific functional requirements and information model definitions of the interfaces mentioned above will be discussed in a future version.¶
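As a compact summary of the two deployment modes, the following Python sketch maps each mode to the interface names introduced above. The types `DeploymentMode` and `InterfaceSet` and the helper `interfaces_for` are hypothetical names used only for illustration; only the interface labels C_A_I, I_SBI, I_I, and E_SBI come from this document.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class DeploymentMode(Enum):
    INDEPENDENT = "independent"   # Figure 2(a)
    INTEGRATED = "integrated"     # Figure 2(b)


@dataclass(frozen=True)
class InterfaceSet:
    """Interfaces involved in one deployment mode (illustrative grouping)."""
    nma_to_controller: str   # how the NMA reaches controller functions
    toward_network: str      # interface used toward the physical network


INTERFACES: Dict[DeploymentMode, InterfaceSet] = {
    # Independent mode: new east-west interface plus an intelligent SBI.
    DeploymentMode.INDEPENDENT: InterfaceSet("C_A_I", "I_SBI"),
    # Integrated mode: internal interface; the enhanced controller as a
    # whole reaches the network through the extended SBI.
    DeploymentMode.INTEGRATED: InterfaceSet("I_I", "E_SBI"),
}


def interfaces_for(mode: DeploymentMode) -> InterfaceSet:
    return INTERFACES[mode]


independent = interfaces_for(DeploymentMode.INDEPENDENT)
integrated = interfaces_for(DeploymentMode.INTEGRATED)
```

In the independent mode the controller also keeps its existing SBI, which this sketch omits because that interface is not specific to the NMA.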
The integrated deployment mode is relatively simple because the NMA and the controller communicate internally. The independent deployment mode, by contrast, introduces several challenges to be analyzed, which can be grouped into “single agent” and “multi agent” challenges.¶
Starting from an architecture with a single NMA, like the one shown in Figure 3 below, the challenges that need to be addressed are:¶
User input
^
| Trigger
+------------>-------+         +-------------------------+
+------------> Agent |<--------> Common AI Service Layer |
| Trigger    +---^---+         +-------------------------+
|                |   Existing interfaces: REST, RESTCONF, gRPC
| SSH            +-------+----------------+---------------+----------+
| NETCONF        |       |                |               |          |
| gRPC/gNMI/gNOI |       |                |               |          |
|                |       |                |               |          |
|                | +-----v------+ +-------v-------+ +-----v-----+ +--v--+
+----------------+-< Controller | | Observability | | Inventory | | ... |
                 | +-----^------+ +-------^-------+ +-----^-----+ +--^--+
                 |       |                |               |          |
             +---v-------v----------------v---------------v----------v--+
             |                  Network Infrastructure                  |
             +----------------------------------------------------------+
Things get a bit more complex when multiple NMAs are deployed and, in addition to interacting with existing controller, they need to interact with other NMAs as shown in Figure 4. In this case the challenges to consider are:¶
User input
^                                   +-----------+
| Trigger                       +--->  Agent B  <--------------+
+------------>---------+        |   +-----^-----+        +-----v-----+
+------------> Agent A |<-------+-------- |-------------->  Agent C  |
| Trigger    +---^--^--+                  |              +-----^-----+
|                |  |                     |                    |
|                |  |                     |                    |
| SSH            +  +----+                |               +----+-----+
| NETCONF        |       |                |               |          |
| gRPC/gNMI/gNOI |       |                |               |          |
|                |       |                |               |          |
|                | +-----v------+ +-------v-------+ +-----v-----+ +--v--+
+----------------+-< Controller | | Observability | | Inventory | | ... |
                 | +-----^------+ +-------^-------+ +-----^-----+ +--^--+
                 |       |                |               |          |
             +---v-------v----------------v---------------v----------v--+
             |                  Network Infrastructure                  |
             +----------------------------------------------------------+
The AI model embedded in the NMA serves as the interface for user input. Through this large-model interface, an NMA instance clarifies the problem over multiple rounds of interaction, analyzes and locates it, generates a plan, and invokes interfaces or tools to handle it. This completes the closed-loop processing of the problem and builds end-to-end problem-handling assistance capabilities.¶
          User/Network
+------> Management Task
|               |
|               v
|        Intent Analysis <-------+           +-- Service Configuration
|               |                |           |   API/Tool
|               |                v           |
|               |         Model Reasoning    |   Alarm Monitor
|               |                ^           |   API/Tool
|               v                |           |
|      Task Decomposition <------+           |   Performance Monitor
|               |                            |   API/Tool
|               v                            |
|      Tool/API Invocation-----> Toolkit ----+   Network Optimization
|               |                 |  ^       |   API/Tool
|               v                 |  |       |
|     Process Encapsulation       |  |       |   Topology Management
|               |                 |  |       |   API/Tool
|               v                 |  |       |
+---Executive Result Analysis     |  |       +-- other APIs/Tools
                                  |  |
                                  |  |
+---------------------------------v--+-------------------------------+
|                          Physical Network                          |
+--------------------------------------------------------------------+
The common processing flow of an NMA instance is shown in Figure 5. The processing steps include:¶
User/Network Management Task Input: Input the user’s task information through multiple rounds of natural language interaction.¶
Intent Analysis: Analyze the user's task intent through the AI model reasoning provided by the AI-based basic services within the NMA.¶
Task Decomposition: Split the task into detailed operations to be performed based on the analyzed intent of the task.¶
Tool/API Invocation: Call the corresponding tool or function API to complete the execution of each operation listed in step 3). The toolkit refers to the collection of all tools that can be used directly to manage and operate physical networks, which can include management functions from the existing controller, EMS, or other standalone management tools. The toolkit can include service configuration API/Tool, alarm monitor API/Tool, performance monitor API/Tool, network optimization API/Tool, topology management API/Tool, etc.¶
Process Encapsulation: Encapsulate each execution step. According to the order or dependency of all the operations, package the individual operation results into the execution result of the entire task.¶
Executive result analysis: Analyze the task processing results and return to the user.¶
Through the above processing flow, the NMA can achieve closed-loop automated processing of tasks and construct end-to-end intelligent network maintenance assistance capabilities. For example, in the intelligent troubleshooting scenario, the NMA can identify the cause of the fault and call the corresponding interfaces to handle it, such as creating a troubleshooting order, automatically initiating rerouting or optical power optimization and other troubleshooting operations, automatically verifying the progress of the order execution, and giving feedback on the troubleshooting results after the order is completed.¶
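The processing steps described above can be sketched end to end in Python. Everything in this sketch is an assumption made for illustration: the function names, the `TOOLKIT` dictionary with its canned results, and the rule-based stand-ins for intent analysis and task decomposition, which in an actual NMA would be driven by AI model reasoning.

```python
from typing import Callable, Dict, List

# Hypothetical toolkit: operation name -> callable returning a result string.
TOOLKIT: Dict[str, Callable[[], str]] = {
    "query_alarms": lambda: "1 alarm: LOS on port 3",
    "reroute": lambda: "traffic rerouted",
}


def analyze_intent(task: str) -> str:
    # Step 2: stand-in for intent analysis through AI model reasoning.
    return "restore_service" if "fault" in task.lower() else "unknown"


def decompose(intent: str) -> List[str]:
    # Step 3: stand-in for model-driven task decomposition.
    plans = {"restore_service": ["query_alarms", "reroute"]}
    return plans.get(intent, [])


def invoke(operations: List[str]) -> List[str]:
    # Step 4: call the tool behind each operation, in order.
    return [TOOLKIT[op]() for op in operations]


def encapsulate(operations: List[str], results: List[str]) -> Dict[str, str]:
    # Step 5: package the per-operation results into one task result.
    return dict(zip(operations, results))


def analyze_result(task_result: Dict[str, str]) -> str:
    # Step 6: summarize the outcome to return to the user.
    if not task_result:
        return "no operations executed"
    return "; ".join(f"{op}: {res}" for op, res in task_result.items())


def handle_task(task: str) -> str:
    # Step 1 is the task input itself; steps 2 to 6 run in sequence.
    intent = analyze_intent(task)
    operations = decompose(intent)
    results = invoke(operations)
    return analyze_result(encapsulate(operations, results))


report = handle_task("Fault reported on link A, please fix")
```

Each step is a separate function so that any one of them, for example the rule-based intent analysis, could be replaced by a model-backed implementation without changing the rest of the flow.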
The introduction of NMA can effectively improve the level of intelligent operation and maintenance of network, thus promoting the continuous evolution of communication network towards higher-level self-intelligence.¶
Typical applications of the NMA in networks can cover both the network operation and maintenance process and the operation process:¶
Intelligent planning and construction: such as broadband installation, resource/capacity planning, intelligent acceptance, site selection, etc.¶
including intelligent question and answer, customer service assistant, automatic classification of user complaints, customer retention, product recommendation, automatic flow of work orders, anti-fraud monitoring and identification, intelligent marketing and other value-added services. This part is outside the scope of this document.¶
The starting point for applying the NMA in the live network should comprehensively consider scenarios with strong demand, feasible technology, and a good input-output ratio. At the same time, the scenario should offer sufficient data for AI pre-training during the construction of the NMA instance, well-annotated data, and a high tolerance for faults. Based on the above considerations, the broadband installation and maintenance assistant, fault diagnosis, and the operation and maintenance assistant may become the first application scenarios.¶
This document has no requests for IANA action.¶