AIGC Performance Insight: One-Click Analysis, Results Presented in Charts and Text

Analyze large model performance data with one click and present the results as charts and text.
Core content:
1. Background and core goals of the performance analysis tool
2. Tool architecture design and data processing flow
3. Performance metric analysis and visualization implementation
Performance Analysis Tool Code Walkthrough
Background
• Last time we implemented a script for benchmarking large models; this tool builds on that script, reading its Excel output to generate analysis charts
• The tool implements the core functions of performance data loading, analysis, visualization, and report generation
• The core goal is to help evaluate and analyze the performance of large language models across different test scenarios
Part 1: Tool Architecture Design
Core class design
Adopts an object-oriented design, encapsulating all functionality in a single performance analysis class (a skeleton is sketched below)
Implements modular functions for data loading, metric calculation, visualization, and report generation
Uses the logging module for log management, providing complete run-status tracking
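A minimal sketch of how such a class might be laid out. The class name PerformanceAnalyzer and the run() driver are illustrative assumptions; the method names match the snippets shown later, and the imports cover everything those snippets use:

import logging
import pandas as pd
import matplotlib.pyplot as plt  # used by create_visualizations below
from matplotlib.figure import Figure

class PerformanceAnalyzer:
    def __init__(self, data_file: str) -> None:
        self.data_file = data_file
        self.df = None                 # raw test data, loaded from Excel
        self.test_type_metrics = None  # aggregated metrics per test type
        # The logging module provides complete run-status tracking
        logging.basicConfig(level=logging.INFO)
        self.logger = logging.getLogger(self.__class__.__name__)

    def run(self) -> None:
        # Orchestrate the full pipeline: load, analyze, visualize
        self.logger.info("Loading data from %s", self.data_file)
        self._load_data()
        self._calculate_test_type_metrics()
        self.create_visualizations()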
Data processing flow
Accepts test data input in Excel format
Implements data preprocessing and validation, including timestamp conversion and test type mapping
Automatically calculates key performance metrics such as throughput and response time
def _load_data(self) -> None:
    # Read the Excel test data
    self.df = pd.read_excel(self.data_file)
    # Convert timestamps to datetime objects
    self.df['timestamp'] = pd.to_datetime(self.df['timestamp'])
    # Map test IDs to display names; unknown IDs fall back to the
    # concurrency label, matching the original nested conditional
    self.df['test_type'] = self.df['test_id'].apply(
        lambda x: test_type_map.get(x, test_type_map['concurrency_test'])
    )
    # Throughput = tokens generated per second of wall-clock time
    self.df['throughput'] = self.df['total_tokens_generated'] / self.df['total_time']
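The snippet references test_type_map, which is not shown in the excerpt. Given the bilingual test type display mentioned in Part 3, an assumed definition might look like this:

# Assumed mapping from test IDs to bilingual display names (not in the original excerpt)
test_type_map = {
    'basic_test': '基础测试 (Basic Test)',
    'long_text_test': '长文本测试 (Long Text Test)',
    'concurrency_test': '并发测试 (Concurrency Test)',
}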
Part 2: Performance Metric Analysis
Core performance metrics
Response time analysis: total response time, minimum/maximum response time
Token latency analysis: first-token latency, average per-token latency
Throughput analysis: tokens generated per second
Concurrency performance: request success rate, requests per second, etc. (a sketch follows this list)
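The excerpt does not show how the concurrency metrics are derived. A sketch under the assumption that each row carries a boolean success flag (the success column and both attribute names are assumptions):

def _calculate_concurrency_metrics(self) -> None:
    # Restrict to concurrency-test rows (test ID taken from the mapping above)
    conc = self.df[self.df['test_id'] == 'concurrency_test']
    # Success rate: share of requests flagged as completed successfully
    self.success_rate = conc['success'].mean()
    # Rough requests-per-second estimate: completed requests over summed wall-clock time
    self.requests_per_second = len(conc) / conc['total_time'].sum()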
Visualization analysis
Implements four core chart types (one plot helper is sketched after the aggregation code below):
1. Response time comparison by test type
2. Token latency comparison
3. Throughput comparison
4. Concurrency test response time trend
def _calculate_test_type_metrics(self) -> None:
    # Aggregate key metrics per test type: mean/min/max response time,
    # mean latencies, mean tokens generated, and mean throughput
    self.test_type_metrics = self.df.groupby('test_type').agg({
        'total_time': ['mean', 'min', 'max'],
        'first_token_latency': 'mean',
        'avg_token_latency': 'mean',
        'total_tokens_generated': 'mean',
        'throughput': 'mean'
    }).reset_index()
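None of the four plot helpers appears in the excerpt; the first might look like this sketch, which derives the bar heights directly from the raw frame rather than from the aggregated one:

def _plot_response_time_comparison(self, ax) -> None:
    # Mean total response time per test type, drawn as a bar chart
    means = self.df.groupby('test_type')['total_time'].mean()
    ax.bar(means.index, means.values)
    ax.set_title('Response Time by Test Type')
    ax.set_ylabel('Mean total time (s)')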
Part 3: Output and Reporting Functions
Data export capability
Supports exporting performance metrics to CSV format (see the sketch below)
Exports visualization charts in PNG format
Provides structured printouts of performance metrics
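The CSV export itself is not included in the excerpt; it could be as small as this sketch (the method name export_metrics is an assumption):

def export_metrics(self, output_path: str = 'performance_metrics.csv') -> None:
    # Persist the aggregated per-test-type metrics for downstream reporting
    self.test_type_metrics.to_csv(output_path, index=False)
    self.logger.info("Metrics exported to %s", output_path)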
Test type support
Basic Test
Long Text Test
Concurrency Test
Supports bilingual (Chinese/English) display of test type names
def create_visualizations(self, output_path: str = 'performance_analysis.png') -> Figure:
    # Create a 2x2 subplot layout
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    # Draw the four chart types described above
    self._plot_response_time_comparison(axes[0, 0])
    self._plot_token_latency_comparison(axes[0, 1])
    self._plot_throughput_comparison(axes[1, 0])
    self._plot_concurrency_response_time(axes[1, 1])
    # Save the figure as PNG and return it, as the signature promises
    fig.tight_layout()
    fig.savefig(output_path)
    return fig
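Putting the pieces together, a typical invocation might look like the following; the class name, method names, and input filename all follow the assumptions made in the earlier sketches:

if __name__ == '__main__':
    analyzer = PerformanceAnalyzer('performance_test_results.xlsx')
    analyzer.run()              # load Excel data, compute metrics, render the PNG
    analyzer.export_metrics()   # write aggregated metrics to CSV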