The main battlefield of large model inference: What is the standard communication protocol?

In the era of large model applications, how have SSE and WebSocket become the new standard communication protocols?
Core content:
1. Demand for large model inference has surged, and the main battlefield for performance improvement has shifted
2. The definition of SSE and WebSocket and their advantages in large model applications
3. The technical challenges and coping strategies of SSE and WebSocket
What are SSE and WebSocket?
What was the mainstream network communication protocol before the emergence of large model applications?
Why don't large-scale model applications use the mainstream communication protocols of Web applications?
Why are SSE and WebSocket more suitable for supporting large model applications?
Technical Challenges and Solutions of Real-time Communication Protocols
What's Next?
What are SSE and WebSocket?
SSE (Server-Sent Events) is a one-way push technology built on HTTP, and its characteristics match large model scenarios well:
Efficient one-way communication: designed for server-to-client pushing, which fits the large model pattern exactly (the client sends a request once, and the server continuously returns streaming results).
Low latency: each generated token or logical paragraph can be pushed immediately, avoiding the long wait of the traditional HTTP request-response model.
Lightweight protocol: runs over HTTP/HTTPS with no extra protocol handshake (such as WebSocket's two-way negotiation), reducing connection overhead.
WebSocket is a full-duplex communication protocol over a single TCP connection:
Full-duplex communication: the client and server can send and receive data at the same time.
Persistent connection: once established, the connection stays open until it is actively closed.
Low latency: data can be transmitted instantly, which suits real-time applications.
What was the mainstream network communication protocol before the emergence of large model applications?
Before large model applications emerged, the mainstream network communication protocol for Web applications was HTTP/HTTPS, based on the request-response model.
Stateless: each request is independent, and the server does not save the client's state.
Secure (HTTPS): data encryption prevents eavesdropping and tampering; identity authentication ensures the client communicates with the correct server; data integrity prevents data from being modified in transit. (HTTP itself transmits in plain text.)
Its advantages made it the mainstream choice:
Simple and easy to use: the HTTP protocol is simple in design and easy to implement and use.
Widely supported: almost all browsers, servers, and development frameworks support HTTP.
Flexible: supports multiple data formats (such as JSON, XML, HTML) and content types.
Stateless: simplifies server design and suits distributed systems.
Security and compliance: encryption protects data in transit against eavesdropping and tampering, and meets modern cybersecurity standards (such as GDPR and PCI DSS).
Why don't large-scale model applications use the mainstream communication protocols of Web applications?
Large model applications have several requirements that traditional Web applications do not:
Real-time dialogue: users interact with the model continuously, and the model needs to respond immediately. For example, Tongyi Qianwen and the Q&A bot on the Higress official website must answer customer questions as they arrive.
Streaming output: a large model returns generated content word by word or sentence by sentence rather than all at once. By contrast, in messaging applications such as DingTalk and WeChat, a message between two people is delivered in one piece, not streamed.
Long-running task processing: complex tasks may take a long time, and the client needs progress feedback, especially for long texts and for multimodal content such as images and videos. Responses that depend on large model computation consume far more resources than responses backed by hand-written business logic, which is also why large model computation relies on GPUs rather than CPUs: CPUs are far inferior to GPUs at parallel computing and large-scale matrix operations.
Multi-round interaction: users and models interact across multiple turns and must maintain context, a capability large model applications need to guarantee user experience.
Against these requirements, HTTP/HTTPS falls short:
One-way communication only: under the request-response model, the server can respond only when the client initiates a request. There is no server push, so streaming output and long-running tasks cannot be supported.
New connection per request: each request re-establishes a connection, which adds latency and makes real-time conversation impractical.
Stateless: each request is independent and the server does not save the client's state. Even though the client can resend the full context with every request, doing so adds network overhead and cannot efficiently support multi-round interaction (see the sketch below).
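To make the cost of statelessness concrete, here is a minimal sketch (the endpoint, model name, and payload shape follow the common OpenAI-style chat format and are only illustrative): every request must carry the whole conversation history, so the payload grows with each round even though only the last message is new.
// Sketch: over stateless HTTP, each round resends the entire history.
const history = [
  { role: 'user', content: 'What is SSE?' },
  { role: 'assistant', content: 'Server-Sent Events is a one-way push technology ...' },
  { role: 'user', content: 'How is it different from WebSocket?' } // the only new turn
];

// The server keeps no state between requests, so the body keeps growing.
const response = await fetch('https://example.com/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json', 'Authorization': 'Bearer YOUR_KEY' },
  body: JSON.stringify({ model: 'some-model', messages: history })
});
console.log(await response.json());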
Why are SSE and WebSocket more suitable for supporting large model applications?
The workflow of SSE is as follows:
1. The client initiates an SSE connection
The client uses the JavaScript EventSource API to initiate an HTTP request to the server. The request header contains Accept: text/event-stream, indicating that the client supports the SSE protocol. Sample code:
const eventSource = new EventSource('https://example.com/sse-endpoint');
2. The server returns an HTTP response
The server response header must contain the following fields:
Content-Type: text/event-stream: indicates that the response content is an SSE data stream.
Cache-Control: no-cache: disables caching to ensure data is updated in real time.
Connection: keep-alive: maintains a long-lived connection.
Example response header:
HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
3. The server pushes data
The server continuously pushes data over the HTTP persistent connection. Each message starts with data: and ends with two newline characters \n\n. Example data flow:
data: { "message" : "Hello" }
data: { "message" : "World" }
4. Client processes data
The client listens for data pushed by the server through the onmessage event of EventSource. Sample code:
eventSource.onmessage = (event) => {
  console.log('Received data:', event.data);
};
5. Connection closing or error handling
If the connection is lost (for example, due to a network problem), the client automatically attempts to reconnect. The server can send a retry: field to specify the reconnection interval in milliseconds. Example reconnect setting:
retry: 5000
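Putting steps 4 and 5 together, a slightly fuller client-side sketch looks like this (the URL is a placeholder); note that EventSource reconnects automatically after errors, honoring any retry: value, until close() is called:
const eventSource = new EventSource('https://example.com/sse-endpoint');

// Step 4: handle each pushed message
eventSource.onmessage = (event) => {
  console.log('Received data:', event.data);
};

// Step 5: on error the browser retries automatically; log it for observability
eventSource.onerror = () => {
  console.warn('SSE connection error, the browser will attempt to reconnect');
};

// When the conversation is over, close the connection explicitly to stop reconnecting:
// eventSource.close();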
Call the API and check the response headers:
Request a streaming response with the stream=true parameter, then check the returned Content-Type through the browser developer tools or curl. If it is text/event-stream, the API is clearly using SSE.
Example command:
curl -X POST "https://api.deepseek.com/v1/chat/completions" \ -H "Authorization: Bearer YOUR_KEY" \ -H "Content-Type: application/json" \ -d '{"model":"deepseek-chat", "messages":[{"role":"user","content":"Hello"}], "stream":true}' \ -v # View detailed response header
< HTTP/1.1 200 OK< Content-Type: text/event-stream< Transfer-Encoding: chunked
Data format verification:
The data body of the streaming response is formatted as data: {...}\n\n, in accordance with the SSE specification [1].
Example response snippet:
data: {"id":"123","choices":[{"delta":{"content":"Hi"}}]}data: [DONE]
The workflow of WebSocket is as follows:
1. The client initiates a WebSocket handshake request
The client initiates a WebSocket handshake via an HTTP request, and the request header contains the following fields:
Upgrade: websocket: indicates that the client wishes to upgrade to the WebSocket protocol.
Connection: Upgrade: indicates that the client wishes to upgrade the connection.
Sec-WebSocket-Key: a randomly generated Base64-encoded string used for handshake verification.
Example request header:
GET /ws-endpoint HTTP/1.1
Host: example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
2. The server returns a WebSocket handshake response
The server verifies the client's handshake request and returns an HTTP 101 status code (Switching Protocols), indicating that the protocol upgrade was successful. The response header contains the following fields:
Upgrade: websocket: confirms the protocol upgrade.
Connection: Upgrade: confirms the connection upgrade.
Sec-WebSocket-Accept: a value computed from the client's Sec-WebSocket-Key, used to verify the handshake.
Example response header:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
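As a side note, Sec-WebSocket-Accept is simply the Base64-encoded SHA-1 hash of the client's Sec-WebSocket-Key concatenated with a fixed GUID defined in RFC 6455. The following Node.js snippet reproduces the value shown in the example above:
// Compute Sec-WebSocket-Accept from Sec-WebSocket-Key (RFC 6455)
const crypto = require('crypto');

const key = 'dGhlIHNhbXBsZSBub25jZQ==';              // from the client request header
const GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B10'; // fixed magic string from RFC 6455

const accept = crypto.createHash('sha1').update(key + GUID).digest('base64');
console.log(accept); // s3pPLMBiTxaQ9kYGzzhZRbK+xOo=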
3. WebSocket connection establishment
After a successful handshake, the HTTP connection is upgraded to a WebSocket connection, and the client and server can communicate bidirectionally via the WebSocket protocol. The connection is based on TCP and supports full-duplex communication.
4. Data Transfer
The client and server send and receive data frames over the WebSocket protocol. Data frames can be in text or binary format.
Text frame: used to transmit text data such as JSON and strings. Example:
{"message": "Hello"}
Binary frame: used to transmit binary data such as images, audio, and video. Example:
[0x01, 0x02, 0x03]
5. Connection closed
The client or server can actively send a Close Frame to terminate the connection. The Close frame contains a close status code and an optional reason. Example close frame:
Close Frame:
- Code: 1000 (Normal Closure)
- Reason: "Connection closed by client"
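To tie steps 3 through 5 together, here is a small sketch using the popular ws package on the server and the browser's WebSocket API on the client (the port, path, and messages are placeholders, not part of any particular product):
// server.js -- WebSocket echo sketch (npm install ws)
const { WebSocketServer } = require('ws');

const wss = new WebSocketServer({ port: 8080, path: '/ws-endpoint' });

wss.on('connection', (ws) => {
  ws.on('message', (data) => {
    ws.send(`server received: ${data}`); // echo the text frame back
  });
  ws.on('close', (code, reason) => {
    console.log('closed:', code, reason.toString()); // e.g. 1000 "Connection closed by client"
  });
});

// client.js -- runs in the browser
const socket = new WebSocket('ws://localhost:8080/ws-endpoint');

socket.onopen = () => socket.send(JSON.stringify({ message: 'Hello' })); // text frame
socket.onmessage = (event) => console.log('Received:', event.data);

// Step 5: close gracefully with a normal-closure code and a reason
// socket.close(1000, 'Connection closed by client');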
Taking the Realtime API as an example of WebSocket in practice:
Event-based communication: the Realtime API uses WebSocket for stateful, event-driven interaction; the client and server communicate by sending and receiving events in JSON format.
Persistent connection: the WebSocket protocol keeps a persistent two-way connection open, allowing data to flow immediately, which is critical for real-time conversations and interactions.
Multimodal support: the API accepts not only text input but can also process audio data, providing a richer and more natural user experience.
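To illustrate the event-based style only (the URL, headers, and event names below are illustrative assumptions, not the actual Realtime API schema), a client exchanging JSON events over WebSocket might look like this:
// Sketch of event-style JSON messaging over WebSocket (npm install ws)
const WebSocket = require('ws');

const ws = new WebSocket('wss://example.com/v1/realtime', {
  headers: { Authorization: 'Bearer YOUR_KEY' } // illustrative auth header
});

ws.on('open', () => {
  // Every message is a JSON event with a "type" field
  ws.send(JSON.stringify({ type: 'user.message', text: 'Hello' }));
});

ws.on('message', (data) => {
  const event = JSON.parse(data);
  console.log('Server event:', event.type, event);
});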
Technical Challenges and Solutions of Real-time Communication Protocols
Stability risks caused by software changes and service scaling
The faster an application evolves, and the newer the field it is in, the more frequent its software changes are, and gateway upgrades are an important part of those changes. However, gateway upgrades usually involve service restarts, configuration changes, or network switching, which directly affect the stability of SSE and WebSocket connections. When a service scales out (adds instances), existing SSE and WebSocket connections may never reach the new instances; when it scales in (removes instances), existing connections may be forcibly closed as instances go offline. For applications with high real-time requirements, such as games and real-time large model chat, this degrades the user experience.
Lossless online/offline: this capability is widely used when microservices change and can effectively reduce the stability risk of releases and scaling. It is commonly offered as a commercial capability of cloud products; for example, Alibaba Cloud's cloud-native API gateway provides microservice governance for HTTPS/WebSocket.
Client reconnection mechanism: design an automatic reconnection mechanism on the client to reduce the impact of interruptions. As with lossless online/offline, a heartbeat packet detects the connection status; once the connection drops, the client reconnects automatically. In addition, the server can record the data already sent to support resuming from the breakpoint (a minimal sketch follows this list).
Protocol fallback mechanism: when SSE and WebSocket are unavailable, fall back to long polling, although this depends on whether the gateway itself supports such long-lived connections.
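Below is a minimal browser-side sketch of the heartbeat-plus-reconnect idea mentioned above (the URL, ping message, and intervals are arbitrary choices for illustration):
// Sketch: WebSocket client with heartbeat detection and automatic reconnection
function connect(url) {
  const ws = new WebSocket(url);
  let heartbeat;

  ws.onopen = () => {
    // Send a ping every 30s so a dead connection is detected quickly
    heartbeat = setInterval(() => ws.send(JSON.stringify({ type: 'ping' })), 30000);
  };

  ws.onmessage = (event) => {
    const msg = JSON.parse(event.data);
    if (msg.type === 'pong') return; // heartbeat reply, ignore
    console.log('Data:', msg);
  };

  ws.onclose = () => {
    // Connection dropped (gateway upgrade, scale-in, network issue): retry after 5s
    clearInterval(heartbeat);
    setTimeout(() => connect(url), 5000);
  };
}

connect('wss://example.com/ws-endpoint');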
High bandwidth usage leads to rapidly rising memory stability risks and bandwidth costs
High latency increases the resource cost of defending against malicious attacks
Authentication: check that requests from clients are compliant, and choose a third-party authentication protocol according to the specific business needs. Based on our experience with customers, most choose OAuth2 and JWT (a minimal JWT check is sketched after this list).
Security protection: design security protection measures through IP restrictions or based on features such as URLs and request headers.
Traffic control: token-level rate limiting based on URL parameters, HTTP request headers, client IP addresses, consumer names, or keys in cookies.
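As an illustration of the JWT option (using the widely adopted jsonwebtoken package; the secret, header, and rejection handling are placeholders), a gateway or backend could validate each SSE/WebSocket handshake request roughly like this:
// Sketch: verify a JWT carried on the SSE/WebSocket handshake request
// npm install jsonwebtoken
const jwt = require('jsonwebtoken');

const SECRET = process.env.JWT_SECRET; // placeholder: shared secret or public key

function authenticate(req) {
  const auth = req.headers['authorization'] || '';
  const token = auth.startsWith('Bearer ') ? auth.slice(7) : null;
  if (!token) return null;

  try {
    return jwt.verify(token, SECRET); // throws if the signature is invalid or expired
  } catch (err) {
    return null; // caller should reject the request with 401
  }
}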
What's Next?