
MCP Server On FC Tour Stop 4: Long Connection Idle Billing Reduces Costs by Up to 87%

Written by

Caleb Hayes

Updated on: June 28, 2025

Function Compute (FC) is an event-driven, fully managed compute service from Alibaba Cloud. You do not need to purchase or manage servers or other infrastructure; you only write and upload code or images. Function Compute provisions the computing resources, runs tasks elastically and reliably, and provides log query, performance monitoring, and alerting. For MCP Server workloads, Function Compute supports one-click hosting of community open-source Stdio MCP Servers through the MCP Runtime, and it solves the key problem of MCP Server session retention through affinity scheduling. Building on its existing millisecond-level billing, Function Compute also adds idle billing for long connections, tailored to the traffic characteristics of MCP Server, so that an MCP Server deployed on Function Compute is billed by actual usage. In sparse-call scenarios, the hosting cost of an MCP Server can be reduced by up to 87%.

Why might MCP Server have a resource idleness problem?

In the first article of this series, MCP Server Practice Tour Stop 1: MCP Protocol Analysis and Cloud Adaptation, we analyzed the MCP and SSE protocols in depth. The protocol defines standardized event types to implement client-server interaction control and session persistence. The interaction proceeds as follows:

1. The client initiates a GET request to establish a persistent SSE connection. (Connection 1)

2. The server replies with an event of type endpoint, whose data field carries the sessionId. (Connection 1)

3. The client uses the sessionId returned in step 2 to send the first HTTP POST request (initialize). (Connection 2)

4. The server immediately responds with 202 and no content. (Connection 2)

5. The server returns the actual response to the request from step 3. (Connection 1)

6. The client uses the sessionId returned in step 2 to send an HTTP POST request carrying initialized as confirmation. (Connection 3)

7. The server immediately responds with 202 and no content. (Connection 3)

8. The client uses the sessionId returned in step 2 to send an HTTP POST request for list tools. (Connection 4)

9. The server immediately responds with 202 and no content. (Connection 4)

10. The server returns the actual response to the request from step 8, which is the tool list. (Connection 1)

11. The client uses the sessionId returned in step 2 to send an HTTP POST request for call tool. (Connection 5)

12. The server immediately responds with 202 and no content. (Connection 5)

13. The server returns the actual response to the request from step 11, which is the tool call result. (Connection 1)
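To make the role of Connection 1 concrete, the sketch below parses a small piece of SSE text and pulls the sessionId out of the endpoint event (step 2), which the client would then reuse for every POST. It is only an illustration: the `/message` path, the `abc123` session value, and the sample JSON-RPC payload are hypothetical, and a real client would read the stream incrementally rather than from a string.

```python
from urllib.parse import urlparse, parse_qs


def parse_sse_events(stream_text):
    """Split raw SSE text into a list of {'event': ..., 'data': ...} dicts."""
    events = []
    for block in stream_text.strip().split("\n\n"):
        event = {"event": "message", "data": ""}  # "message" is the SSE default type
        data_lines = []
        for line in block.splitlines():
            field, _, value = line.partition(":")
            # SSE allows one optional space after the colon.
            value = value[1:] if value.startswith(" ") else value
            if field == "event":
                event["event"] = value
            elif field == "data":
                data_lines.append(value)
        event["data"] = "\n".join(data_lines)
        events.append(event)
    return events


def session_id_from_endpoint(event):
    """Extract the sessionId query parameter from an endpoint event's data."""
    query = urlparse(event["data"]).query
    return parse_qs(query)["sessionId"][0]


# Hypothetical text received on Connection 1 (steps 2 and 5).
sample = (
    "event: endpoint\n"
    "data: /message?sessionId=abc123\n"
    "\n"
    "event: message\n"
    'data: {"jsonrpc": "2.0", "id": 1, "result": {}}\n'
    "\n"
)

events = parse_sse_events(sample)
session_id = session_id_from_endpoint(events[0])
# Every later POST (steps 3, 6, 8, 11) would target this URL on a new connection.
post_url = f"/message?sessionId={session_id}"
```

Because every response comes back on Connection 1, that connection has to stay open for the whole session, even when no POST is in flight.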

Because the MCP Server protocol requires session persistence, initialization establishes a long connection that pins a fixed set of server resources. Yet the traffic of most MCP Servers is sparse and bursty: requests are widely scattered in time and the peaks and valleys differ sharply, so the actual utilization of those resources is very low. A typical scenario is as follows:

A user initializes an MCP Server through a large model to search a document library. The model session lasts 1 hour before it is closed, and 2 searches are performed in that time. Resources are held for the full hour, while initialization and search together take only 7.1 seconds, so the resources sit idle 99.8% of the time. In complex AI Agent scenarios, one session may connect to several MCP Servers for different purposes, which multiplies the idle resources. Who bears this idle cost?
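The 99.8% figure follows directly from the two durations in the scenario. A minimal check, where the function name is ours and not part of any Function Compute API:

```python
def idle_ratio(session_seconds, busy_seconds):
    """Fraction of the session during which the held resources do no work."""
    return 1 - busy_seconds / session_seconds


# Values from the scenario: a 1-hour session with 7.1 seconds of real work.
ratio = idle_ratio(session_seconds=3600, busy_seconds=7.1)
print(f"{ratio:.1%}")  # 99.8%
```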

Function Compute passes the savings on to users by bearing the cost of idle MCP Server resources

Letting users pay only for what they use is how Function Compute aims to reduce their costs. Through sustained technical investment, Function Compute has built three core capabilities:

High-density, diversified co-location lets workloads use computing resources at staggered times: Function Compute serves active users across many industries, which produces a wide variety of load types. Built on Alibaba Cloud's sandbox containers and bare metal servers, it deploys functions at high density on each computing node, so workloads from different scenarios use the resources at different times and overall server efficiency improves.
Proactive scheduling based on function-level resource profiling reduces the risk of resource crowding: Function Compute builds an accurate resource profile for each active function from historical data and can identify how much the function uses in different time periods. Using these profiles, it proactively reschedules functions that occupy a large share of computing resources, lowering the probability that requests are delayed by crowding on a computing node.
Fast elasticity within hundreds of milliseconds and smooth migration resolve resource crowding quickly: Function Compute can scale within hundreds of milliseconds and can migrate user loads smoothly. When resource crowding is detected, smooth migration restores normal operation quickly.

These core capabilities have let Function Compute reduce its own costs through technology, so it has chosen to pass the savings on to users and bear the cost of idle MCP Server resources itself.

Technical details of idle billing implementation

Function Compute already supports usage-based billing. As described in the Function Compute Billing Overview ^[1], billing by request is the typical usage-based mode: fees are incurred only while requests are executing. When no request is executing, the instance is in a "frozen" state; after the freeze has lasted a few minutes the resources are released automatically, and no further fees are incurred. Even with a reserved instance, the explicit "frozen" state identifies idleness, so a reserved instance pays only for memory when no request is executing. An MCP Server, however, maintains sessions, accepts asynchronous submissions, and streams results back, so its computing resources must stay active for the whole session and an explicit "frozen" state cannot be used to cut the idle cost. Function Compute therefore introduces an additional way to determine idleness for MCP Server scenarios:

Function Compute divides the lifetime of an MCP Server long connection into multiple idle judgment cycles. If the CPU time actually consumed in a cycle is below a threshold, the cycle is considered idle. The threshold is set so that a function instance counts as active only when a real call such as Initialize, List Tools, or Call Tool occurs.
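The judgment rule can be pictured with a small sketch. The article gives neither the cycle length nor the threshold, so the 1-second cycle, the 0.05-second threshold, and the CPU samples below are all hypothetical values chosen for illustration; only the comparison itself comes from the text.

```python
def classify_cycles(cpu_seconds_per_cycle, threshold_seconds):
    """Label each judgment cycle 'idle' if its CPU time is below the threshold."""
    return [
        "idle" if cpu < threshold_seconds else "active"
        for cpu in cpu_seconds_per_cycle
    ]


# Hypothetical parameters: not published values.
CYCLE_SECONDS = 1
THRESHOLD_SECONDS = 0.05

# Hypothetical CPU time per cycle: keep-alive noise with two real tool calls.
cpu_samples = [0.001, 0.8, 0.002, 0.0, 0.6, 0.003]

labels = classify_cycles(cpu_samples, THRESHOLD_SECONDS)
active_seconds = labels.count("active") * CYCLE_SECONDS
idle_seconds = labels.count("idle") * CYCLE_SECONDS
```

With these samples, the two cycles that contain real calls are marked active and the four that hold only connection upkeep are marked idle.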

Take the above sparse call scenario as an example:

The 4 actual actions account for only 8 seconds of active cycles; the remaining 3,592 seconds are idle cycles. During idle cycles, CPU is exempt from metering and only memory is charged. For memory pricing, see the Function Compute Billing Overview ^[1]. With a 2-core, 3 GB configuration, for example, memory accounts for only about 18% of the cost, so the overall saving is:

1 - (8 + 3592 × 18%) / 3600 ≈ 82%

With the minimum memory ratio, a 2-core, 2 GB configuration, the overall cost can be reduced by 87%. To make idle billing in the MCP Server scenario easier to understand, the saving is converted directly into a reduction in CU computing time. The reduction is applied unconditionally and appears in metering as lower CU computing time and lower active vCPU time.
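The arithmetic can be reproduced with a short sketch. The 8 active seconds, 3,592 idle seconds, and 18% memory share come from the text. The memory share behind the 87% figure is not stated, so the second function only back-calculates what that share would have to be; treat its result as a derived estimate, not a published price.

```python
def idle_billing_saving(active_s, idle_s, memory_share):
    """Saving when idle seconds are billed at the memory share of the full rate."""
    total = active_s + idle_s
    billed = active_s + idle_s * memory_share
    return 1 - billed / total


def memory_share_for_saving(active_s, idle_s, target_saving):
    """Back-calculate the memory share that would yield a given saving."""
    total = active_s + idle_s
    return ((1 - target_saving) * total - active_s) / idle_s


saving_3gb = idle_billing_saving(active_s=8, idle_s=3592, memory_share=0.18)
implied_share_2gb = memory_share_for_saving(active_s=8, idle_s=3592, target_saving=0.87)

print(f"{saving_3gb:.0%}")         # 82%
print(f"{implied_share_2gb:.1%}")  # 12.8%
```

The second result suggests that memory makes up roughly 13% of the cost for the 2-core, 2 GB configuration, which is consistent with it carrying less memory per core than the 3 GB case.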

How to enable the MCP Server idle billing capability

The idle billing capability exists mainly to address the idleness caused by session affinity, so it is enabled by default whenever session affinity is enabled. To enable session affinity, see MCP affinity scheduling ^[2].

When you develop an MCP service with the MCP runtime in the Function Compute console ^[3] or create an MCP service with Function AI ^[4], the created function has MCP Server idle billing enabled out of the box:

Function Compute console:

Function AI console:

In other scenarios, enable it through a parameter when creating or updating the function: call the CreateFunction API ^[5] or the UpdateFunction API ^[6] and set the request affinity policy to "MCP_SSE" through the SessionAffinity field. Note: because GPU resources are scarce, idle billing for long connections is not supported on function instances configured with GPUs.
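As a rough illustration of the rule above, the sketch below builds a plain dictionary standing in for a CreateFunction or UpdateFunction request body and checks whether idle billing would apply. It does not call any SDK. Only the SessionAffinity field and the "MCP_SSE" value come from the text; the key casing, the `functionName` and `gpuConfig` keys, and both helper functions are hypothetical, so consult the API references ^[5] ^[6] for the real request shape.

```python
def build_function_config(name, with_gpu=False):
    """Assemble an illustrative request body (a plain dict, not an SDK call)."""
    config = {
        "functionName": name,          # hypothetical key name
        "sessionAffinity": "MCP_SSE",  # field and value named in the article
    }
    if with_gpu:
        config["gpuConfig"] = {"gpuType": "example-gpu"}  # hypothetical key and value
    return config


def idle_billing_applies(config):
    """MCP_SSE affinity enables idle billing, except on GPU-configured instances."""
    has_mcp_affinity = config.get("sessionAffinity") == "MCP_SSE"
    has_gpu = "gpuConfig" in config
    return has_mcp_affinity and not has_gpu


cpu_function = build_function_config("doc-search-mcp")
gpu_function = build_function_config("gpu-mcp", with_gpu=True)

cpu_ok = idle_billing_applies(cpu_function)
gpu_ok = idle_billing_applies(gpu_function)
```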

Idle billing is also supported for WebSocket scenarios that hold long-lived requests. It is enabled by default and needs no parameter settings, so you are welcome to try it out. For billing details, see Function Compute - Resource Usage Details ^[7].
