MCP Server On FC Tour Stop 4: Long Connection Idle Billing Reduces Costs by Up to 87%

See how Alibaba Cloud Function Compute cuts the idle cost of MCP Server long-lived connections, reducing hosting costs by up to 87%.
Core content:
1. Introduction to Alibaba Cloud Function Compute service and its support for MCP Server
2. Analysis of the idle long connection problem in MCP Server scenario
3. Function Compute's billing optimization solution for MCP Server long connection idleness
Function Compute (FC) is Alibaba Cloud's event-driven, fully managed compute service. With Function Compute, you do not need to purchase or manage servers or other infrastructure; you only write and upload code or images. Function Compute provisions the compute resources, runs your workloads elastically and reliably, and provides log query, performance monitoring, and alerting.

For MCP Server, Function Compute supports one-click hosting of community open source Stdio MCP Servers through MCP Runtime, and it solves the key problem of MCP Server session persistence through affinity scheduling. Building on its existing millisecond-level billing, Function Compute also adds idle billing for long-lived connections, tailored to the characteristics of MCP Server, so that an MCP Server deployed to Function Compute is billed by actual usage. In sparse-call scenarios, the hosting cost of an MCP Server can be reduced by up to 87%.
Why might MCP Server leave resources idle?
Cloud Native
In the first article of this series, MCP Server Practice Tour, Stop 1: MCP Protocol Analysis and Cloud Adaptation, we analyzed the MCP and SSE protocols in depth. The protocol defines standardized event types to implement client-server interaction control and session persistence. The interaction proceeds as follows:
1. The client opens the SSE connection (Connection 1). The server returns an event of type event:endpoint, with the sessionId carried in data.
2. The client sends the initialize request (Connection 2).
3. The client sends initialized as confirmation (Connection 3).
4. The client sends list tools (Connection 4).
5. The client sends call tool (Connection 5).

Because the MCP Server protocol requires session persistence, a long-lived connection is established once the session is initialized, and it stays bound to a fixed set of server resources. However, the traffic of most MCP Servers is sparse and bursty: requests are highly scattered in time and traffic swings sharply between peaks and valleys, so actual utilization of the bound server resources is very low. A typical scenario looks like this:
A user initializes an MCP Server through a large model to add search over a document library. The model session lasts 1 hour before it is closed, and 2 searches are performed during that time. Resources are held for the full hour, but initialization and the searches together take only 7.1 seconds, so the resources are idle 99.8% of the time. In complex AI Agent scenarios, a single session may connect to multiple MCP Servers for different purposes, multiplying the idle resources. Who bears this idle cost?
Function Compute passes the savings to users and bears the idle cost of MCP Server resources
Pay-as-you-go billing is how Function Compute aims to reduce costs for users. Through sustained technical investment, Function Compute has built three core capabilities:
1. High-density, diversified co-location enables staggered use of compute resources. Function Compute has active users across many industries, which produces a wide variety of workload types. Built on Alibaba Cloud's "sandbox containers" and "bare metal servers", Function Compute deploys workloads at high density on compute nodes so that resources are used at staggered times across scenarios, improving overall server efficiency.
2. Proactive scheduling based on function-level resource profiling reduces the risk of resource contention. Function Compute builds an accurate resource profile for each active function from historical data and can identify how much a function uses in different time periods. Using this profile, it proactively reschedules functions that occupy a large share of compute resources, lowering the probability that requests are delayed by contention on a compute node.
3. Elasticity within hundreds of milliseconds and smooth migration resolve contention quickly. Function Compute can scale within hundreds of milliseconds and migrate user workloads smoothly. When resource contention is detected, it recovers quickly by migrating the affected workloads.
Because these core capabilities have lowered its own costs through technology, Function Compute has chosen to pass the savings on to users and bear the cost of idle MCP Server resources.
Technical details of idle billing implementation
Function Compute already supports usage-based billing. As described in the Function Compute Billing Overview [1], per-request billing is a typical usage-based mode: fees are incurred only while requests execute. When no request is executing, the instance is in a "frozen" state; once the freeze has lasted a few minutes, the resources are released automatically and no additional fees are incurred. Even with reserved instances, the explicit "frozen" state identifies idleness, so a reserved instance pays only the memory cost while no request is executing. MCP Server, however, relies on session persistence, asynchronous submission, and streaming responses, so compute resources must stay active for the whole session, and the explicit "frozen" state cannot be used to reduce the idle cost. Function Compute therefore introduces an additional way to determine idleness for MCP Server scenarios, as follows:
Function Compute divides the lifetime of an MCP Server long-lived connection into multiple idle-detection cycles. If the CPU time actually consumed in a cycle is below a threshold, the cycle is considered idle. The threshold is set so that a function instance counts as active only when real calls such as Initialize, List Tools, or Call Tool occur.
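The cycle-based idle check can be sketched in a few lines of Python. This is only an illustration of the idea, not Function Compute's implementation: the article does not publish the real cycle length or CPU threshold, so `CYCLE_SECONDS`, `CPU_IDLE_THRESHOLD_SECONDS`, and the sample session are all assumed values.

```python
# Hypothetical parameters: the real cycle length and threshold are not published.
CYCLE_SECONDS = 1.0
CPU_IDLE_THRESHOLD_SECONDS = 0.01


def classify_cycles(cpu_per_cycle, threshold=CPU_IDLE_THRESHOLD_SECONDS):
    """Return one boolean per cycle: True if the cycle counts as active.

    A cycle is idle when the CPU time consumed in it is below the threshold.
    """
    return [cpu_seconds >= threshold for cpu_seconds in cpu_per_cycle]


def summarize(cpu_per_cycle, cycle_seconds=CYCLE_SECONDS,
              threshold=CPU_IDLE_THRESHOLD_SECONDS):
    """Return (active_seconds, idle_seconds) for a whole connection."""
    flags = classify_cycles(cpu_per_cycle, threshold)
    active_cycles = sum(flags)
    idle_cycles = len(flags) - active_cycles
    return active_cycles * cycle_seconds, idle_cycles * cycle_seconds


# Made-up 60-cycle session: background keep-alive noise plus three real calls.
session = [0.001] * 60
for cycle_index in (0, 20, 45):
    session[cycle_index] = 0.25  # a call consumes noticeable CPU time

active_seconds, idle_seconds = summarize(session)
print(active_seconds, idle_seconds)
```

Keep-alive traffic on the connection stays below the assumed threshold, so only the three cycles that contain real calls are counted as active.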
Take the sparse-call scenario above as an example:
Across the 4 actual operations, only 8 seconds are counted as active, and the remaining 3592 seconds are idle. During idle periods, CPU is exempt from metering and only memory is charged. For memory pricing, see the Function Compute Billing Overview [1]. With a 2-core, 3 GB configuration, memory accounts for only 18% of the cost, so the overall saving is:
1 - (8 + 3592 × 18%) / 3600 ≈ 82%
With the minimum memory ratio, 2 cores and 2 GB, the overall cost can be reduced by about 87%. To make idle billing in the MCP Server scenario easier to understand, the savings are converted directly into a reduction in CU compute time. The reduction is applied unconditionally and appears in metering as lower CU compute time and lower active vCPU time.
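The arithmetic can be reproduced with a short Python sketch. The function names are illustrative, and the model is the simplified one used in the text: active seconds are billed in full, and idle seconds are billed at the memory share of the full price. The article does not state the memory share for the 2-core, 2 GB configuration, so the second function only back-calculates the share that would be consistent with the quoted 87%; treat that number as an inference, not a published price.

```python
def idle_billing_savings(active_seconds, total_seconds, memory_cost_share):
    """Fraction of cost saved when idle seconds are billed for memory only."""
    idle_seconds = total_seconds - active_seconds
    billed_equivalent = active_seconds + idle_seconds * memory_cost_share
    return 1 - billed_equivalent / total_seconds


def implied_memory_share(target_savings, active_seconds, total_seconds):
    """Memory cost share that would produce the given savings (back-calculated)."""
    idle_seconds = total_seconds - active_seconds
    return ((1 - target_savings) * total_seconds - active_seconds) / idle_seconds


# 2-core, 3 GB example from the text: memory is 18% of the cost.
savings_2c3g = idle_billing_savings(8, 3600, 0.18)
print(f"2 cores / 3 GB: {savings_2c3g:.1%}")  # prints "2 cores / 3 GB: 81.8%"

# Share implied by the quoted 87% for 2 cores / 2 GB (an inference, roughly 13%).
share_2c2g = implied_memory_share(0.87, 8, 3600)
print(f"implied memory share for 2 cores / 2 GB: {share_2c2g:.1%}")
```

Because the active time is tiny compared with the session length, the saving is close to one minus the memory share, which is why a lower memory-to-CPU ratio gives a larger reduction.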
How do I enable idle billing for MCP Server?
The main goal of MCP Server idle billing is to solve the idleness caused by session affinity. When session affinity is enabled, idle billing is enabled by default. To enable session affinity, see MCP affinity scheduling [2].
When you develop an MCP service with the MCP runtime in the Function Compute console [3], or create an MCP service with Function AI [4], the resulting function has MCP Server idle billing built in:
Function Compute console:
Function AI console:
In addition, idle billing is also supported for scenarios where WebSocket holds long-running requests. It is enabled by default and requires no parameter settings, so feel free to try it out. For billing details, see Function Compute - Resource Usage Details [7].
Related links
【1】Overview of Function Compute Billing
https://www.alibabacloud.com/help/zh/functioncompute/fc-3-0/product-overview/billing-overview-1?spm=a2c63.p38356.help-menu-2508973.d_0_3_0.429913e5SVq5eU
【2】MCP Affinity Scheduling
https://help.aliyun.com/zh/functioncompute/fc-3-0/user-guide/mcp-sse-affinity-scheduling?spm=a2c4g.11186623.help-menu-2508973.d_2_6_0.12b85577qvYmAf&scm=20140722.H_2882558._.OR_help-T_cn~zh-V_1#b44485e04bfeo
【3】Function Compute Console
https://fcnext.console.aliyun.com/overview
【4】Create MCP service
https://help.aliyun.com/zh/functioncompute/fc-3-0/mcp-server
【5】CreateFunction – Create a function
https://help.aliyun.com/zh/functioncompute/fc-3-0/developer-reference/api-fc-2023-03-30-createfunction
【6】UpdateFunction - Update function
https://help.aliyun.com/zh/functioncompute/fc-3-0/developer-reference/api-fc-2023-03-30-updatefunction
【7】Function Compute - Resource Usage Details
https://fcnext.console.aliyun.com/billing