AI based Network Management Agent(NMA): Concepts and Architecture

Internet-Draft	Network Management Agent Concept	February 2025
Zhao, et al.	Expires 1 September 2025

Abstract

With the development of Artificial Intelligence (AI) technology, large models have shown significant advantages and great potential in recognition, understanding, decision-making, and generation. These capabilities match the self-intelligent network management requirements of Autonomous Networks and Intent-Based Networking, making large models one of the potential technologies for driving high-level autonomous networks. When introducing AI into network management, how to integrate AI technology and how to define its relationship with existing network management entities (such as network controllers) are the focus of research and standardization.

This document presents the concept of the AI-based Network Management Agent (NMA), provides the basic definition and reference architecture of the NMA, discusses the relationship between the NMA and traditional network controllers or other network management entities by exploring NMA deployment modes, and proposes the common processing flow and typical application scenarios of the NMA.

1. Introduction

1.1. Background

As the types of operator services become increasingly diverse, the complexity and difficulty of network operations and maintenance continue to grow. On one hand, new service scenarios such as industrial internet, vehicle-road collaboration, and 5GtoB for vertical industries are constantly emerging, and customer services like Extended Reality (XR), Virtual Reality (VR), and smart home are becoming more abundant, with a continuous increase in network access volume. On the other hand, with the popularization of 5G and gigabit optical networks, operators face a situation where networks from 2G to 5G coexist. Network protocols and characteristics vary across network domains, which further increases the difficulty and complexity of network operations and maintenance. Relying solely on traditional manual operations and maintenance methods can no longer meet these increasingly complex demands. The level of network intelligence has become a key factor directly affecting network performance and user experience. Against this backdrop, enhancing the level of network intelligence and creating Autonomous Networks (AN) [TMF-IG1230] or Intent-Based Networking [RFC9315] has become a global consensus among operators.

Autonomous Networks provide an architecture for the delivery of services and capabilities with a “Zero-X” (Zero-wait, Zero-trouble, Zero-touch) experience for the users of vertical industries and consumers, and a “Self-X” (Self-configuration, Self-healing, Self-optimizing) experience for network operators. In particular, the AN framework defines 6 automation levels, spanning from Level 0 (L0), where operations and maintenance are fully manual, to Level 5 (L5), where the network is fully automated and managed by AI, with human intervention reduced to a minimum.

As of today, the industry sees quite different levels of automation from operator to operator, but the average level is considered to be between L2 and L3. Mainstream operators are releasing goals and plans to achieve Level 4 (L4) autonomous networks by 2025. L4 and higher levels of AN set higher requirements for intent handling, decision-making, analysis, perception, and execution. Artificial Intelligence (AI) large model technology has shown significant advantages and great potential in identification, understanding, decision-making, and generation. It has technical features such as multimodal fusion perception, more user-friendly human-computer interaction and knowledge Q&A, and content generation. These features match the new requirements of Level 4 Autonomous Networks well, making AI large models one of the core technologies for achieving high-level autonomous networks.

However, introducing AI into network management raises several key issues:

1) The application architecture and deployment methods of AI in network management are still unclear; that is, in what form can AI help network management?

2) The relationship between AI and the existing network controllers is not clear.

3) The new interface capability requirements that arise after AI is introduced are not clear either.

Therefore, it is necessary to define the general architecture and application form of AI in network management.

1.2. Introduction of Network Management Agent (NMA)

The concept of the Network Management Agent (NMA) draws inspiration from the “AI Agent”. According to the framework proposed in the blog [LLM-powered-autonomous-agents] by OpenAI's Lilian Weng, an LLM-powered agent includes several key components: planning, memory, and the use of tools to complete actions. Following the mainstream definition widely accepted in the industry, an AI Agent refers to “an intelligent entity with the ability to perceive the environment, make decisions, and execute actions, and can gradually achieve set goals through independent thinking and tool invocation”. Google's Agents white paper [Agents] states that “a Generative AI agent can be defined as an application that attempts to achieve a goal by observing the world and acting upon it using the tools that it has at its disposal. Agents are autonomous and can act independently of human intervention, especially when provided with proper goals or objectives they are meant to achieve.”

The key features of AI agents include reasoning and decision-making abilities, goal orientation, and autonomy. Autonomy means that once appropriate goals are provided, the agent can act independently without human intervention. As the concept of the AI agent becomes widely accepted in the industry, it is expected to become one of the most feasible application forms of AI.

Similarly, the network management agent (NMA), which can be understood as an AI Agent for network management, is a network management entity built on ML/AI and equipped with autonomous closed-loop task processing capabilities. Based on user task intents or preset goals, it can automatically carry out network status perception, task intent interpretation, task planning, decision-making, and task execution, so as to achieve closed-loop processing of scenario-oriented network management tasks.

This document aims to provide a standardized common architecture for the use of AI in network management, in the form of the NMA. The following sections propose the concept of the AI-based NMA, define the reference architecture and functional requirements of the NMA for different scenarios, clarify the relationship between the NMA and existing controllers or other control systems, and discuss the general task processing workflow and typical application scenarios of the NMA.

3. Reference architecture of NMA

In this section we analyze the functional requirements and reference architecture of the NMA.

3.1. Function Requirements of NMA

The NMA should support the following capabilities:

Support receiving task requests initiated by network operators or users through natural language. Natural language interaction is not the only way to use the NMA; network operators can also operate it through a GUI (Graphical User Interface). However, the NMA should be able to understand natural language and translate it into task intents through the reasoning capability of built-in Large Language Models (LLMs).
Support perception of network status by querying data from the controller and other network management tools. Network status includes network topology, service configuration, alarms, performance, and other information needed for processing the task.
Support task planning, breaking down the task intent into specific operations based on the user input and the perceived network status. The task planning process can also utilize the reasoning capability of LLMs.
Support selecting appropriate tools and automatically invoking the corresponding tools or APIs to complete the execution of each sub-operation. The toolkit includes management functions from the existing controller as well as other standalone management tools such as the Network Digital Twin (NDT) [I-D.irtf-nmrg-network-digital-twin].
Support generating the task execution results based on the output of each operation and sending them back to network operators or users.
Support analysis and self-assessment of execution results, and enable autonomous or human-assisted optimization based on the evaluation results to continuously improve the accuracy of task execution.
Support collaboration among multiple intelligent agents to complete complex tasks.
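The tool selection and invocation capability listed above can be illustrated with a minimal sketch. It is not part of the NMA definition: the `Tool` and `Toolkit` names, the description strings, and the `query_alarms` stub are all hypothetical, and a real toolkit would wrap controller or NDT APIs rather than a local function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Tool:
    """A management function the NMA may call (hypothetical structure)."""
    name: str
    description: str              # text a model could use to choose the tool
    func: Callable[..., dict]


class Toolkit:
    """Registry of tools exposed by a controller, NDT, or other systems."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def describe(self) -> List[str]:
        # Descriptions that could be placed in a model prompt for tool selection.
        return [f"{t.name}: {t.description}" for t in self._tools.values()]

    def invoke(self, name: str, **kwargs) -> dict:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name].func(**kwargs)


# Stub standing in for a controller alarm-query API (assumption).
def query_alarms(device: str) -> dict:
    return {"device": device, "alarms": []}


toolkit = Toolkit()
toolkit.register(Tool("query_alarms", "List active alarms on a device", query_alarms))
result = toolkit.invoke("query_alarms", device="router-1")
```

Keeping a textual description next to each callable is one possible way to let a model choose among tools; how such descriptions should be standardized is left open by this document.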

3.2. Reference Architecture of NMA

To achieve the above capabilities, and by referring to the common AI agent framework, this document presents the reference functional architecture of the NMA, as shown in Figure 1.


                    +--------------------------------------------+
                    |        Network Management Agent (NMA)      |
                    | +---------------------+ +----------------+ |
                    | |  Intent Management  | |     Memory     | |
                    | +---------------------+ | +------------+ | |
                    | +---------------------+ | |  Long-term | | |
                    | | Network Perception  | | +------------+ | |
                    | +---------------------+ | +------------+ | |
          Tool      | +---------------------+ | | Short-term | | |
       invocation   | |     Task Planning   | | +------------+ | |
                    | +---------------------+ +----------------+ |
 Controller<---+    | +---------------------+ +----------------+ |
               |    | |  Orchestration and  | |                | |
        NDT<---+----+->      Execution      | |                | |
               |    | +---------------------+ |  Multi-agents  | |
     Other <---+    | +---------------------+ |  Collaboration | |
 external tools     | |   Reflection and    | |                | |
                    | |  Self-optimization  | |                | |
                    | +---------------------+ +----------------+ |
                    +----------------------^---------------------+
                                           |
                    +----------------------v---------------------+
                    |         Common AI Service Layer            |
                    | +----------------++------------++--------+ |
                    | | Large language || Multimodal || Small  | |
                    | |  Models(LLMs)  ||   Models   || Models | |
                    | +----------------++------------++--------+ |
                    | +----------------------------------------+ |
                    | |             Knowledge Base             | |
                    | +----------------------------------------+ |
                    +--------------------------------------------+

Figure 1: Reference function architecture of NMA

The main functional components of the NMA include:

Intent Management:

A basic capability provided by AI models. It is responsible for collecting the input task information and translating it into intents through AI model reasoning.

Network Perception:

Queries, in real time, the network status information related to the task intent. Network status information includes, but is not limited to, network topology, service configurations, device status, alarms, and performance data. The query source can be a controller, ENO, etc.

Task Planning:

Based on the reasoning ability of AI models, breaks down the task intent into multiple sub-operations.

Orchestration and execution:

Selects the appropriate tools for each specific operation and automatically calls the relevant tools or interfaces to perform it. After all sub-operations are completed, their execution results are combined into the task execution result.

Reflection and self-optimization:

Analyzes and self-assesses the task execution results, and optimizes autonomously based on the evaluation results. Additionally, human evaluation can be integrated to further optimize the NMA's performance through human supervision, enhancing the NMA's intent understanding and task execution capabilities.

Memory:

Responsible for storing and processing various types of information during the operation of the NMA, including long-term memory (LTM) and short-term memory (STM). STM stores the information that the NMA is currently aware of and needs in order to carry out complex cognitive tasks such as learning and reasoning. LTM can store information for a long time, ranging from a few days to months or years. In summary, STM corresponds to in-context learning, which is short and finite because it is restricted by the finite context window length of the Transformer. LTM corresponds to an external vector store that the NMA can attend to at query time, accessible via fast retrieval.

Multi-agents collaboration:

Responsible for collaboration between multiple NMAs at different levels or in different application scenarios. The specific collaboration mechanism needs further research.
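As an illustration of the Memory component described above, the following sketch models STM as a bounded buffer and LTM as an external store with retrieval. The class names are hypothetical, and word overlap is used only as a simple placeholder for the vector similarity search that a real vector store would perform.

```python
from collections import deque
from typing import Deque, List, Tuple


class ShortTermMemory:
    """Bounded buffer standing in for the model's finite context window."""

    def __init__(self, max_items: int) -> None:
        self._items: Deque[str] = deque(maxlen=max_items)

    def add(self, item: str) -> None:
        self._items.append(item)   # the oldest entry is dropped when full

    def context(self) -> List[str]:
        return list(self._items)


class LongTermMemory:
    """External store queried at inference time.

    Word overlap is a placeholder for vector similarity (assumption).
    """

    def __init__(self) -> None:
        self._records: List[str] = []

    def store(self, record: str) -> None:
        self._records.append(record)

    def retrieve(self, query: str, top_k: int = 1) -> List[str]:
        query_words = set(query.lower().split())
        scored: List[Tuple[int, str]] = [
            (len(query_words & set(r.lower().split())), r) for r in self._records
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Records with no overlap at all are never returned.
        return [record for score, record in scored[:top_k] if score > 0]


stm = ShortTermMemory(max_items=2)
for message in ["check router-1", "alarms found", "start diagnosis"]:
    stm.add(message)

ltm = LongTermMemory()
ltm.store("link flap on router-1 fixed by replacing optic")
ltm.store("bgp session reset after policy change")
hits = ltm.retrieve("router-1 link flap")
```

The bounded buffer makes the "short and finite" property of STM explicit, while LTM grows without limit and is only consulted through retrieval.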

In addition, there is a common AI service layer, which includes various large language models (LLMs), multimodal models, small models, and a knowledge base. The AI models provide common interactive intelligence capabilities as a unified agent engine, which simplifies NMA development. The knowledge base provides unified search across multiple types of knowledge (including vector knowledge bases, system online help, and operation and maintenance data logs), combines AI models to complete knowledge fusion and extraction, and improves the accuracy of NMA task execution.

Various NMAs can be constructed based on the common AI service layer. During operation, an NMA leverages the model reasoning capabilities and knowledge base provided by the AI service layer to achieve functions such as intent parsing and task planning. It should be noted that, depending on the actual deployment requirements, the common AI services can also be deployed within the NMA.

For different application scenarios, there can be multiple scenario-oriented agents (similar to apps on a phone). For the network planning, construction, maintenance, optimization, and operation scenarios, the main NMAs could include:

Network Fault Handling Agent: This agent can be created by pre-training a specific AI model on network troubleshooting guidance documents, network equipment product documents, and other materials. The agent can capture the fault handling experience of experts and realize fault impact analysis, root cause self-diagnosis, and self-repair of network faults by orchestrating and calling models or network control APIs. It also interfaces with the work order dispatching system to achieve automated closed-loop processing of work orders.
Network Planning Agent: Uses the capabilities of AI large models to understand the network planning intent (user intent, business development goals, network construction plans, etc.), and analyzes and forecasts network resource usage (traffic, performance, user scale, resource utilization, etc.) to output planning schemes.
Network Optimization Agent: Understands the network optimization goal expressed in natural language and converts the optimization intent into network optimization constraint rules, such as network load thresholds and service route optimization strategies. The agent can use traffic prediction models to predict the future traffic and bandwidth utilization of the entire network and to generate predictions for resources, potential risks, performance, and traffic. Based on these predictions, it can automatically generate optimization strategies and perform traffic pre-diversion, autonomous decision-making, and automatic execution, in order to achieve goals such as dynamic energy saving of equipment and optimal traffic distribution across the entire network.
Intelligent Assistant Agent: This agent can provide open Q&A capability based on an LLM, offering dialogue-style operation and maintenance. Users can enter fault descriptions or resource names in natural language with a single action, and the agent automatically performs intent recognition and queries, significantly improving the efficiency of knowledge lookup, fault reporting, and maintenance support.

To be discussed in a later version.

4. Network Automation Architecture Based on NMAs

When deploying an NMA-based management/control architecture, two deployment modes can be considered: the NMA can be part of an existing network controller, or it can be an independent system deployed separately that interacts with both the controller and the network. These are called the integrated deployment mode and the independent deployment mode, respectively, and are shown in Figure 2.


+-----------------------------+         +--------------------+
|                             |         |                    |
|          Network            <--C_A_I--> Network Management |
|        Controller           |         |    Agent(NMA)      |
|                             |         |                    |
+--------------^--------------+         +----------^---------+
               |                                   |
    Southbound Interface(SBI)           Intelligent SBI(I_SBI)
               |                                   |
+--------------v-----------------------------------v---------+
|                        Physical Network                    |
+------------------------------------------------------------+
                              (a)

+------------------------------------------------------------+
|                     Network Controller                     |
|                                                            |
|  +--------------------+           +--------------------+   |
|  | Original Function  <--Internal-> Network management |   |
|  |      Modules       | Interface |      Agent(NMA)    |   |
|  +--------------------+   (I_I)   +--------------------+   |
|                                                            |
+------------------------------^-----------------------------+
                               |
                      Extended SBI(E_SBI)
                               |
+------------------------------v-----------------------------+
|                       Physical Network                     |
+------------------------------------------------------------+
                              (b)

Figure 2: Deployment mode of network management agent (NMA)

Independent deployment mode:

As shown in Figure 2(a), the NMA is deployed independently of the original network controller, so the NMA and the controller are separate systems. A new east-west interface needs to be added between the NMA and the controller for capability calling and result feedback. This interface can be called “C_A_I”. In this deployment mode, the controller uses the southbound interface (SBI) to interact with the physical network, while an intelligent southbound interface (abbreviated as “I_SBI”) needs to be added between the NMA and the underlying physical network.

Integrated deployment mode:

As shown in Figure 2(b), the NMA is integrated and deployed with the original network controller, and serves as a function of the controller. The NMA interacts with the original function modules through an internal interface (abbreviated as “I_I”). The enhanced controller interacts with the underlying physical network through an extended SBI (abbreviated as “E_SBI”).

The specific functional requirements and information model definitions of the interfaces mentioned above will be discussed in a future version.
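The two deployment modes and their interfaces can be summarized in a small lookup structure. This is only an illustrative sketch: the interface names (C_A_I, SBI, I_SBI, I_I, E_SBI) come from the text above, while the Python names and the role strings are paraphrases introduced here.

```python
from enum import Enum
from typing import Dict, List


class DeploymentMode(Enum):
    INDEPENDENT = "independent"   # Figure 2(a)
    INTEGRATED = "integrated"     # Figure 2(b)


# Interface names are taken from the text; the role strings are paraphrases.
INTERFACES: Dict[DeploymentMode, Dict[str, str]] = {
    DeploymentMode.INDEPENDENT: {
        "C_A_I": "NMA <-> controller (east-west)",
        "SBI": "controller <-> physical network",
        "I_SBI": "NMA <-> physical network",
    },
    DeploymentMode.INTEGRATED: {
        "I_I": "NMA <-> original function modules (internal)",
        "E_SBI": "enhanced controller <-> physical network",
    },
}


def added_or_extended_interfaces(mode: DeploymentMode) -> List[str]:
    """Interfaces that are new or extended compared with a controller-only setup."""
    return [name for name in INTERFACES[mode] if name != "SBI"]
```

For example, `added_or_extended_interfaces(DeploymentMode.INDEPENDENT)` lists the two interfaces that would need new definitions in the independent mode.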

4.1. Deployment modes considerations and requirements

The integrated deployment mode is relatively simple, because communication between the NMA and the controller is internal. The independent deployment mode, in contrast, introduces several challenges to be analyzed, which can be grouped into “single agent” and “multi agent” challenges.

4.1.1. Single Agent Challenges

Starting from an architecture with a single NMA, like the one shown in Figure 3 below, the challenges that need to be addressed are:

NMA APIs: Agents use descriptions of APIs and tools in order to use them. A gap analysis against existing tools needs to be carried out to understand if the NMA API requirements can be met and if we can find an optimal or common way to describe network APIs for LLMs.¶
NMA triggers: Agents need to be triggered with an input, which can be “just” a natural language input or something with a more structured format. Is the trigger going to be initiated by a controller, or is it “just” a human-readable string?
NMA interaction with existing controller: A wide variety of protocols and models exist today for interacting with the different components of an existing controller. A gap analysis is needed to understand whether those protocols and models are sufficient, or whether extensions are needed so that they can be used not only by humans/UIs and higher-order orchestrators/controllers but also by NMAs.
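To make the API description and trigger questions above more concrete, the sketch below shows one possible shape for a machine-readable tool description and a trigger that combines natural language with optional structured context. The schema, field names, and example values are hypothetical and are not proposed by this document; they only illustrate the kind of format a gap analysis might consider.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, Optional


@dataclass
class ToolDescription:
    """Machine-readable description of one network API (hypothetical schema)."""
    name: str
    description: str                 # natural-language summary for the model
    protocol: str                    # e.g. "RESTCONF", "NETCONF", "gNMI"
    parameters: Dict[str, str] = field(default_factory=dict)  # name -> meaning


@dataclass
class Trigger:
    """Input that starts an NMA task."""
    source: str                                  # "human" or "controller"
    text: str                                    # natural-language request
    context: Optional[Dict[str, str]] = None     # optional structured fields

    def is_structured(self) -> bool:
        return self.context is not None


def to_model_input(trigger: Trigger, tool: ToolDescription) -> str:
    """Serialize the trigger and tool description into one JSON document."""
    return json.dumps({"trigger": asdict(trigger), "tool": asdict(tool)}, sort_keys=True)


get_interface = ToolDescription(
    name="get_interface_state",
    description="Read the operational state of one interface",
    protocol="RESTCONF",
    parameters={"device": "device name", "interface": "interface name"},
)
alarm_trigger = Trigger(
    source="controller",
    text="Link down reported, please diagnose",
    context={"device": "router-1", "interface": "eth0"},
)
payload = json.loads(to_model_input(alarm_trigger, get_interface))
```

A trigger without `context` corresponds to the "just a string" case, while a controller-initiated trigger can carry structured fields alongside the text.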

  User input
   ^
   | Trigger
   +------------>-------+         +-------------------------+
   +------------> Agent |<--------> Common AI Service Layer |
   | Trigger    +---^---+         +-------------------------+
   |                |      Existing interfaces: REST, RESTConf, gRPC
   |            SSH +-------+----------------+---------------+----------+
   |        NetConf |       |                |               |          |
   | gRPC/gNMI/gNOI |       |                |               |          |
   |                | +-----v------+ +-------v-------+ +-----v-----+ +--v--+
   +----------------+-< Controller | | Observability | | Inventory | | ... |
                    | +-----^------+ +-------^-------+ +-----^-----+ +--^--+
                    |       |                |               |          |
                +---v-------v----------------v---------------v----------v--+
                |                     Network Infrastructure               |
                +----------------------------------------------------------+

Figure 3: Network management architecture with single agent

4.1.2. Multi Agents Challenges

Things get a bit more complex when multiple NMAs are deployed and, in addition to interacting with existing controllers, they need to interact with other NMAs as shown in Figure 4. In this case the challenges to consider are:

Inter NMA communication: Is it just a natural language “string”, or do we need a more structured format/protocol? How can we ensure that agents have a common understanding of context and can interwork?
NMA discovery: How do agents know about each other? Do they need to advertise their existence and capabilities to other NMAs? How do we describe their capabilities, and how do we do it in a way that lets agents discover each other?
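One possible answer to the discovery questions above is a registry where each NMA advertises its capabilities and others look them up. The sketch below is an assumption made for illustration only: the `AgentAdvertisement` fields, the capability labels, and the example endpoints are hypothetical, and the document leaves the actual mechanism open.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class AgentAdvertisement:
    """What an NMA announces about itself (hypothetical fields)."""
    agent_id: str
    endpoint: str
    capabilities: FrozenSet[str]


class AgentRegistry:
    """Central directory; a distributed scheme would be equally possible."""

    def __init__(self) -> None:
        self._ads: Dict[str, AgentAdvertisement] = {}

    def advertise(self, ad: AgentAdvertisement) -> None:
        self._ads[ad.agent_id] = ad

    def discover(self, capability: str) -> List[str]:
        # Return the IDs of all agents that announced the capability.
        return sorted(
            ad.agent_id for ad in self._ads.values() if capability in ad.capabilities
        )


registry = AgentRegistry()
registry.advertise(
    AgentAdvertisement("agent-b", "https://nma-b.example", frozenset({"fault-diagnosis"}))
)
registry.advertise(
    AgentAdvertisement(
        "agent-c", "https://nma-c.example", frozenset({"optimization", "fault-diagnosis"})
    )
)
diagnosers = registry.discover("fault-diagnosis")
```

Plain capability labels are the simplest description format; whether labels, structured schemas, or natural language descriptions are appropriate is exactly the open question raised above.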

  User input
   ^                                   +-----------+
   | Trigger                       +--->  Agent B  <--------------+
   +------------>---------+        |   +-----^-----+        +-----v-----+
   +------------> Agent A |<-------+-------- |-------------->  Agent C  |
   | Trigger    +---^--^--+                  |              +---- ^-----+
   |                |  |                     |                    |
   |                |  |                     |                    |
   |            SSH +  +----+                |               +----+-----+
   |        NetConf |       |                |               |          |
   | gRPC/gNMI/gNOI |       |                |               |          |
   |                | +-----v------+ +-------v-------+ +-----v-----+ +--v--+
   +----------------+-< Controller | | Observability | | Inventory | | ... |
                    | +-----^------+ +-------^-------+ +-----^-----+ +--^--+
                    |       |                |               |          |
                +---v-------v----------------v---------------v----------v--+
                |                     Network Infrastructure               |
                +----------------------------------------------------------+

Figure 4: Network management architecture with multi agents

5. Common processing flow of NMA

The AI model embedded in the NMA serves as the interface for user input. An NMA instance uses the large model to clarify the problem through multiple rounds of interaction, analyze and locate the problem, generate plans, and invoke interfaces or tools to handle it, completing closed-loop processing and building end-to-end problem-handling assistance capabilities.

          User/Network
+-----> Management Task
|               |
|               v
|       Intent Analysis <-------+            +-- Service Configuration
|               |               |            |         API/Tool
|               |               v            |
|               |       Model Reasoning      |      Alarm Monitor
|               |               ^            |         API/Tool
|               v               |            |
|       Task Decomposition <----+            |   Performance Monitor
|               |                            |         API/Tool
|               v                            |
|      Tool/API Invocation-----> Toolkit ----+   Network Optimization
|               |                  |  ^      |         API/Tool
|               v                  |  |      |
|     Process Encapsulation        |  |      |   Topology Management
|               |                  |  |      |         API/Tool
|               v                  |  |      |
+---Executive Result Analysis      |  |      +-- other APIs/Tools
                                   |  |
                                   |  |
                                   |  |
                                   |  |
           +-----------------------v--+-----------------------------+
           |                   Physical Network                     |
           +--------------------------------------------------------+

Figure 5: Common processing flow of NMA

The common processing flow of an NMA instance is shown in Figure 5. The processing steps include:

User/Network Management Task Input: Input the user’s task information through multiple rounds of natural language interaction.
Intent Analysis: Analyze the user's task intent through AI model reasoning provided by the AI-based basic services within the NMA.
Task Decomposition: Split the task into detailed operations to be performed, based on the analyzed intent of the task.
Tool/API Invocation: Call the corresponding tool or function API to complete the execution of each operation identified in the Task Decomposition step. The toolkit refers to the collection of all tools that can be used directly to manage and operate physical networks, which can include management functions from the existing controller, EMS, or other standalone management tools. The toolkit can include service configuration API/Tool, alarm monitor API/Tool, performance monitor API/Tool, network optimization API/Tool, topology management API/Tool, etc.
Process Encapsulation: Encapsulate each execution step. According to the order or dependency of the operations, package the individual operation results into the execution result of the entire task.
Executive Result Analysis: Analyze the task processing results and return them to the user.
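The steps above can be sketched as a short pipeline. This is a minimal illustration under stated assumptions: keyword matching and a fixed plan table stand in for AI model reasoning, and the tool names and return values are stubs invented for the example, not real controller or EMS APIs.

```python
from typing import Callable, Dict, List


def analyze_intent(task_text: str) -> str:
    """Stand-in for model reasoning: map free text to an intent label."""
    return "check_alarms" if "alarm" in task_text.lower() else "unknown"


def decompose(intent: str) -> List[str]:
    """Stand-in for model-based task decomposition."""
    plans = {"check_alarms": ["query_alarms", "query_performance"]}
    return plans.get(intent, [])


# Stub tools; real ones would call controller or EMS APIs (assumption).
TOOLKIT: Dict[str, Callable[[], dict]] = {
    "query_alarms": lambda: {"alarms": 2},
    "query_performance": lambda: {"cpu_percent": 41},
}


def run_task(task_text: str) -> dict:
    intent = analyze_intent(task_text)                      # Intent Analysis
    operations = decompose(intent)                          # Task Decomposition
    outputs = [(op, TOOLKIT[op]()) for op in operations]    # Tool/API Invocation
    result = {"intent": intent, "steps": dict(outputs)}     # Process Encapsulation
    result["success"] = bool(outputs)                       # Executive Result Analysis
    return result


report = run_task("Show alarms on router-1")
```

In the flow of Figure 5, the result analysis can feed back into a new task; that loop is omitted here to keep the sketch short.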

Through the above processing flow, the NMA can achieve closed-loop automated processing of tasks and build end-to-end intelligent network maintenance assistance capabilities. For example, in the intelligent troubleshooting scenario, the NMA can identify the cause of a fault and call the corresponding interfaces to handle it, for instance by creating a troubleshooting order or automatically initiating rerouting or optical power optimization. It can then automatically verify the progress of the order execution and report the troubleshooting results after the order is completed.
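The troubleshooting example can be pictured as a diagnose, act, and verify loop. The sketch below is hypothetical throughout: the fault label, the repair action, and the simulated link state are placeholders for real diagnosis models and controller interfaces, and the retry limit is an arbitrary choice.

```python
from typing import Callable, Dict, List


def troubleshoot(
    diagnose: Callable[[], str],
    repair_actions: Dict[str, Callable[[], None]],
    verify: Callable[[], bool],
    max_attempts: int = 3,
) -> dict:
    """Repeat diagnose -> repair -> verify until resolved or attempts run out."""
    log: List[str] = []
    for attempt in range(1, max_attempts + 1):
        cause = diagnose()
        action = repair_actions.get(cause)
        if action is None:
            log.append(f"attempt {attempt}: no action for {cause}")
            break                      # would be escalated to a human operator
        action()
        log.append(f"attempt {attempt}: applied action for {cause}")
        if verify():
            return {"resolved": True, "log": log}
    return {"resolved": False, "log": log}


# Simulated network state (assumption, for illustration only).
link = {"optical_power_ok": False}


def diagnose_link() -> str:
    return "low_optical_power"


def optimize_power() -> None:
    link["optical_power_ok"] = True


outcome = troubleshoot(
    diagnose_link,
    {"low_optical_power": optimize_power},
    lambda: link["optical_power_ok"],
)
```

Returning the log together with the outcome mirrors the feedback on troubleshooting results described above.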

The introduction of the NMA can effectively improve the level of intelligent operation and maintenance of networks, thus promoting the continuous evolution of communication networks towards higher-level self-intelligence.

[Agents]: Wiesinger, J., Marlow, P., and V. Vuskovic, "Google Whitepaper: Agents", 10 September 2024.
[I-D.irtf-nmrg-ai-challenges]: François, J., Clemm, A., Papadimitriou, D., Fernandes, S., and S. Schneider, "Research Challenges in Coupling Artificial Intelligence and Network Management", Work in Progress, Internet-Draft, draft-irtf-nmrg-ai-challenges-03, 4 March 2024, <https://datatracker.ietf.org/doc/html/draft-irtf-nmrg-ai-challenges-03>.
[I-D.irtf-nmrg-network-digital-twin]: Zhou, C., Yang, H., Duan, X., Lopez, D., Pastor, A., Wu, Q., Boucadair, M., and C. Jacquenet, "Network Digital Twin: Concepts and Reference Architecture", Work in Progress, Internet-Draft, draft-irtf-nmrg-network-digital-twin-arch-09, 24 January 2025, <https://datatracker.ietf.org/doc/html/draft-irtf-nmrg-network-digital-twin-arch-09>.
[I-D.kdj-nmrg-ibn-usecases]: Yao, K., Chen, D., Jeong, J., Wu, Q., Yang, C., and L. Contreras, "Use Cases and Practices for Intent-Based Networking", Work in Progress, Internet-Draft, draft-kdj-nmrg-ibn-usecases-01, 8 July 2024, <https://datatracker.ietf.org/doc/html/draft-kdj-nmrg-ibn-usecases-01>.
[LLM-powered-autonomous-agents]: Weng, L., "LLM Powered Autonomous Agents", 23 June 2023.
[RFC7575]: Behringer, M., Pritikin, M., Bjarnason, S., Clemm, A., Carpenter, B., Jiang, S., and L. Ciavaglia, "Autonomic Networking: Definitions and Design Goals", RFC 7575, DOI 10.17487/RFC7575, June 2015, <https://www.rfc-editor.org/rfc/rfc7575>.
[RFC7576]: Jiang, S., Carpenter, B., and M. Behringer, "General Gap Analysis for Autonomic Networking", RFC 7576, DOI 10.17487/RFC7576, June 2015, <https://www.rfc-editor.org/rfc/rfc7576>.
[RFC9222]: Carpenter, B. E., Ciavaglia, L., Jiang, S., and P. Peloso, "Guidelines for Autonomic Service Agents", RFC 9222, DOI 10.17487/RFC9222, March 2022, <https://www.rfc-editor.org/rfc/rfc9222>.
[RFC9315]: Clemm, A., Ciavaglia, L., Granville, L. Z., and J. Tantsura, "Intent-Based Networking - Concepts and Definitions", RFC 9315, DOI 10.17487/RFC9315, October 2022, <https://www.rfc-editor.org/rfc/rfc9315>.
[TMF-IG1230]: McDonnell, K., Machwe, A., Milham, D., O’Sullivan, J., Clemm, A., and J. Niemöller, "Autonomous Networks Technical Architecture", TMF IG1230, December 2022.
