Python + Knowledge Graph: Big Data Audit Practice Revealed - Tracking 50 Million Yuan in Abnormal Funds

Written by
Audrey Miles
Updated on: June 27, 2025
Recommendation

In the era of the digital economy, Python and knowledge graph technology are powering an audit revolution: 50 million yuan in abnormal funds was traced in 72 hours.

Core content:
1. The three major difficulties facing traditional auditing
2. Developing Python tools to break through data processing barriers
3. Designing and applying a multi-dimensional feature analysis engine and a dynamic threshold warning system

Yang Fangxian
Founder of 53A / Tencent Cloud Valuable Professional (TVP)

In the era of the digital economy, traditional auditing methods are undergoing disruptive change. Using Python data analysis tools and knowledge graph technology developed in-house, a provincial audit office completed in just 72 hours a special audit that would previously have taken half a month. What technical breakthroughs and practical wisdom lie behind this contest of data and wits?



1. Breaking through the fog: Traditional auditing encounters three major difficulties


In the spring of 2023, a provincial audit office received a special assignment: a look-through (penetrating) audit of the fiscal expenditures of 87 state-owned enterprises in the province over the previous three years. The audit team faced a mountain of paper vouchers, electronic data scattered across 12 business systems, and a complex network of related companies.


"At first we used the traditional sampling method, and our 20-person team audited only three companies in a week," recalled Director Wang, the lead auditor. "Three major pain points became increasingly obvious: massive data we could not process efficiently, hidden associations that were hard to penetrate, and anomalies that were identified too late."


While spot-checking a construction company, auditors found an abnormal expenditure of 5 million yuan booked as "engineering consulting fees", but when they traced the flow of funds, they were blocked by a three-layer chain of nested related-party transactions. This episode exposed the fatal weakness of traditional auditing: manual verification struggles to cope with deliberately engineered, complex transaction structures.


2. Technical Icebreaker: Python scripts break through data processing barriers

The audit team assembled a technical task force overnight and developed three customized Python tools:

1. "Data Scavenger" cleaning tool (core code example):

Python

import re
import uuid

def generate_voucher_id():
    # Hypothetical placeholder: the article does not show this helper's implementation
    return 'ZZ' + f"{uuid.uuid4().int % 10**8:08d}"

def clean_financial_data(raw_df):
    # Normalize amounts recorded in units of 10,000 yuan (e.g. "12.5 10,000 yuan") to yuan
    raw_df['amount'] = raw_df['amount'].apply(
        lambda x: float(str(x).replace('10,000 yuan', '').replace(',', '')) * 10000
        if '10,000 yuan' in str(x) else float(str(x).replace(',', '')))
    # Replace missing or malformed voucher numbers (expected: 2 capital letters + 8 digits)
    pattern = re.compile(r'^[A-Z]{2}\d{8}$')
    raw_df['voucher number'] = raw_df['voucher number'].fillna('').apply(
        lambda x: x if pattern.match(str(x)) else generate_voucher_id())
    # Build a fingerprint identifying each fund flow, then drop duplicate transactions
    raw_df['Transaction fingerprint'] = raw_df.apply(
        lambda row: f"{row['payer']}_{row['recipient']}_{row['amount']}_{row['date']}", axis=1)
    return raw_df.drop_duplicates(subset=['Transaction fingerprint'])


2. Multi-dimensional feature analysis engine:

  • Builds a 12-dimensional anomaly indicator model, with features such as "share of nighttime transactions", "frequency of round-number amounts", and "density of related-party transactions".

  • Applies the Isolation Forest algorithm for unsupervised anomaly detection
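A minimal sketch of this kind of engine, assuming scikit-learn's `IsolationForest` and a transaction table with hypothetical columns `payer`, `timestamp`, `amount` and `is_related`. Only three of the twelve dimensions are illustrated; the 22:00-06:00 night window and the "multiple of 10,000 yuan" round-amount rule are assumptions, not the audit team's definitions.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

def build_features(tx: pd.DataFrame) -> pd.DataFrame:
    """Aggregate per-payer anomaly features.

    Expects columns: payer, timestamp (datetime64), amount, is_related (bool).
    """
    hour = tx['timestamp'].dt.hour
    tx = tx.assign(
        is_night=(hour >= 22) | (hour < 6),     # assumed nighttime window: 22:00-06:00
        is_round=(tx['amount'] % 10_000 == 0),  # assumed round amount: multiple of 10,000 yuan
    )
    return tx.groupby('payer').agg(
        night_ratio=('is_night', 'mean'),
        round_ratio=('is_round', 'mean'),
        related_ratio=('is_related', 'mean'),
    )

def flag_anomalies(features: pd.DataFrame, contamination: float = 0.1, seed: int = 0) -> pd.Series:
    """Fit an Isolation Forest and mark entities it labels as outliers (-1) with True."""
    model = IsolationForest(n_estimators=200, contamination=contamination, random_state=seed)
    labels = model.fit_predict(features.to_numpy())
    return pd.Series(labels == -1, index=features.index, name='is_anomaly')
```

Isolation Forest needs no labeled fraud cases, which suits audits where confirmed anomalies are rare; `contamination` roughly sets the share of entities that get flagged and would need tuning on real data.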


3. Dynamic threshold warning system:

Python

def dynamic_threshold(df, window=30):
    # df must be sorted chronologically; each rolling window includes the current row
    df['moving average'] = df['amount'].rolling(window=window).mean()
    df['standard deviation'] = df['amount'].rolling(window=window).std()
    # Flag amounts more than 3 standard deviations above the moving average
    df['abnormal threshold'] = df['moving average'] + 3 * df['standard deviation']
    return df[df['amount'] > df['abnormal threshold']]
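Because the function above computes the rolling statistics over a window that includes the payment being tested, a large payment partly raises its own threshold. One possible variant (a sketch, not the audit team's code) compares each payment only with the preceding `window` payments; the multiplier `k` is exposed as a parameter with the same default of 3.

```python
import pandas as pd

def dynamic_threshold_prior(df: pd.DataFrame, window: int = 30, k: float = 3.0) -> pd.DataFrame:
    """Flag rows whose amount exceeds mean + k * std of the preceding `window` rows.

    Assumes df is sorted chronologically. shift(1) excludes the current row,
    so a single large payment cannot inflate its own threshold.
    """
    out = df.copy()
    prior = out['amount'].shift(1).rolling(window=window)
    out['moving average'] = prior.mean()
    out['standard deviation'] = prior.std()
    out['abnormal threshold'] = out['moving average'] + k * out['standard deviation']
    # Rows without a full preceding window have a NaN threshold and are never flagged
    return out[out['amount'] > out['abnormal threshold']]
```

Working on a copy also leaves the caller's DataFrame unchanged, whereas the original function adds its helper columns in place.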

In its first run, the system cleaned and analyzed 100,000 payment records in 117 minutes and automatically flagged 382 high-risk transactions, including 23 consecutive "split" payments by an environmental protection company, each kept just under the approval limit.
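Split payments of this kind (a run of consecutive payments, each sitting just under an approval limit) can be screened with a simple run-detection rule. The sketch below is illustrative only: the columns `payer`, `recipient`, `date`, `amount`, the 5% margin below the limit, and the minimum run length of 3 are all assumptions.

```python
import pandas as pd

def find_split_payments(df: pd.DataFrame, limit: float, margin: float = 0.05,
                        min_run: int = 3) -> pd.DataFrame:
    """Return runs of >= min_run consecutive payments between the same payer and
    recipient that each fall just under `limit` (within `margin` of it)."""
    df = df.sort_values(['payer', 'recipient', 'date'])
    near_limit = (df['amount'] < limit) & (df['amount'] >= limit * (1 - margin))
    # A new run starts when the payer/recipient pair changes or the near-limit status flips
    pair_changed = (df['payer'] != df['payer'].shift()) | (df['recipient'] != df['recipient'].shift())
    status_flipped = near_limit != near_limit.shift(fill_value=False)
    run_id = (pair_changed | status_flipped).cumsum()
    flagged = df[near_limit].assign(run_id=run_id[near_limit])
    run_size = flagged.groupby('run_id')['amount'].transform('count')
    return flagged[run_size >= min_run]
```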


3. Unraveling the mystery: Knowledge graphs expose the network of benefit transfers

To investigate the flagged transactions, the audit team turned to its "killer weapon": a dynamic knowledge graph system. The platform integrates data from nine domains, including business registration, taxation, and judicial records, and delivered three major breakthroughs:

1. Equity look-through visualization:

  • Built a look-through map of "shareholder → legal representative → actual controller"

  • Identified a group that controlled three suppliers through four levels of nested shareholdings
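The look-through calculation behind such a map can be sketched as follows, under stated assumptions rather than as the platform's algorithm: ownership is given as a hypothetical `holdings` dictionary, an entity's effective stake is the sum over all ownership paths of the product of the percentages along each path, the graph is assumed acyclic, and an effective stake above 50% is used as a simple proxy for control (real actual-controller determinations also weigh voting agreements, board seats and nominee arrangements).

```python
from collections import defaultdict

def look_through_stakes(holdings, company):
    """Effective stake that every upstream entity holds in `company`.

    `holdings` maps each company to a list of (shareholder, fraction) pairs.
    The effective stake is the sum, over all ownership paths, of the product
    of fractions along the path. Assumes the ownership graph is acyclic.
    """
    stakes = defaultdict(float)
    stack = [(company, 1.0)]
    while stack:
        node, weight = stack.pop()
        for holder, fraction in holdings.get(node, []):
            stakes[holder] += weight * fraction
            stack.append((holder, weight * fraction))
    return dict(stakes)

def common_ultimate_controllers(holdings, companies, threshold=0.5):
    """Top-level entities (no recorded shareholders of their own) whose
    effective stake exceeds `threshold` in every listed company."""
    common = None
    for company in companies:
        owners = {h for h, s in look_through_stakes(holdings, company).items() if s > threshold}
        common = owners if common is None else common & owners
    return {e for e in (common or set()) if e not in holdings}
```

For example, if "Supplier A" is wholly owned by "Shell 3", which is 80% owned by "Shell 2", 90% owned by "Shell 1", 95% owned by "Group", then the group's effective stake in Supplier A is 0.8 × 0.9 × 0.95 = 0.684 even though it appears nowhere in the supplier's own shareholder register.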


2. Tracking of fund flows:

mermaid

graph LR
    A[City Investment Company] -->|2022.05.12 Transfer of 8 million| B(Building Materials Trading Company)
    B -->|2022.05.13 Transfer of 7.98 million| C(Consulting Service Company)
    C -->|2022.05.14 Transfer of 7.95 million| D(Offshore Company)
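The chain in this diagram has a recognizable signature: almost the full amount moves on within a day of arriving. A hypothetical tracing routine for that pattern, assuming transfers are available as `(payer, payee, date, amount)` tuples and that a "pass-through" hop leaves within 3 days and keeps at least 95% of the incoming amount (both thresholds are assumptions):

```python
from datetime import date, timedelta

def trace_pass_through(transfers, start, max_gap_days=3, min_ratio=0.95):
    """Follow money from `start` along transfers that look like pass-throughs.

    Each hop must leave the current holder within `max_gap_days` of the funds
    arriving and carry at least `min_ratio` of the previous amount.
    `transfers` is a list of (payer, payee, date, amount) tuples.
    Returns the longest such chain as a list of transfers.
    """
    def extend(holder, arrived, amount, visited):
        best = []
        for t in transfers:
            payer, payee, day, amt = t
            if (payer == holder and payee not in visited
                    and timedelta(0) <= day - arrived <= timedelta(days=max_gap_days)
                    and amt >= min_ratio * amount):
                chain = [t] + extend(payee, day, amt, visited | {payee})
                if len(chain) > len(best):
                    best = chain
        return best

    best = []
    for t in transfers:
        if t[0] == start:
            chain = [t] + extend(t[1], t[2], t[3], {start, t[1]})
            if len(chain) > len(best):
                best = chain
    return best
```

Run on the three transfers in the diagram plus an unrelated small payment, the routine returns the chain ending at the offshore company and ignores the unrelated payment, which fails both the timing and the amount tests.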


3. Transaction timing analysis:

  • Uncovered an abnormal time sequence in an infrastructure project: "contract signing → advance payment → change of related party"

  • Nailed down the chain of evidence showing 5 shell companies issuing circular invoices to one another
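Circular invoicing shows up as a cycle in a directed graph whose edges run from invoice issuer to invoice recipient. A plain-Python depth-first search is enough for a sketch; the `(issuer, recipient)` input format is an assumption, and on a large graph a library routine such as NetworkX's `simple_cycles` would be the more practical choice.

```python
def find_invoice_cycles(invoices):
    """Return the simple cycles in a directed 'issuer -> recipient' invoice graph.

    `invoices` is an iterable of (issuer, recipient) pairs. Each cycle is
    reported once, rotated so that its smallest company name comes first.
    """
    graph = {}
    for issuer, recipient in invoices:
        graph.setdefault(issuer, set()).add(recipient)
        graph.setdefault(recipient, set())
    cycles = set()

    def dfs(start, node, path):
        for nxt in graph[node]:
            if nxt == start:
                cycles.add(tuple(path))
            # Only visit names larger than `start`, so each cycle is found from its smallest member
            elif nxt not in path and nxt > start:
                dfs(start, nxt, path + [nxt])

    for start in graph:
        dfs(start, start, [start])
    return sorted(cycles)
```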


When the complete network of connections was projected onto the command center's large screen, a hidden channel for transferring benefits emerged clearly: over two years, a state-owned enterprise had routed 27 fund transfers through 6 affiliated companies, ultimately moving 42 million yuan of state-owned assets to a privately controlled overseas company.


4. Real-world confrontation: the cat-and-mouse game between auditors and audited entities

While the technical breakthroughs improved efficiency, they also provoked new countermeasures. When auditing a new energy company, the audit team encountered three major anti-audit tactics:

1. Data fog tactics:

  • Using "yin-yang contracts" (two conflicting versions of the same agreement) to create contradictory data

  • Inserting decoy fields into electronic ledgers

2. Time-difference attacks:

  • Artificially splitting transactions across fiscal years

  • Exploiting holidays to delay fund transfers

3. Relationship camouflage:

  • Concealing the actual controller through nominee shareholding agreements

  • Fabricating the identities of overseas strategic investors


The audit team developed targeted countermeasures:

  • Applying NLP to measure the similarity of contract texts

  • Building a fund-velocity indicator model

  • Introducing social network analysis (SNA) to identify hidden connections
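As one illustration of the contract-similarity countermeasure, the sketch below screens for possible yin-yang contracts: pairs whose texts are near-duplicates but whose stated amounts differ. The character-trigram cosine similarity, the 0.8 cutoff, and the `contracts` input format are assumptions chosen for simplicity (character n-grams avoid word segmentation, which matters for Chinese text); the team's actual NLP method is not described.

```python
import math
import re
from collections import Counter
from itertools import combinations

def char_ngrams(text, n=3):
    """Character n-gram counts, ignoring whitespace."""
    text = re.sub(r'\s+', '', text)
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def suspected_yin_yang_pairs(contracts, min_similarity=0.8):
    """Pairs of contracts with near-duplicate texts but different stated amounts.

    `contracts` maps contract id -> (text, amount).
    """
    grams = {cid: char_ngrams(text) for cid, (text, _) in contracts.items()}
    pairs = []
    for a, b in combinations(sorted(contracts), 2):
        sim = cosine(grams[a], grams[b])
        if sim >= min_similarity and contracts[a][1] != contracts[b][1]:
            pairs.append((a, b, round(sim, 3)))
    return pairs
```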

At a critical juncture, the technical team analyzed the metadata of the business registration change records and found traces of tampering in one company's shareholder change documents, which provided the breakthrough that exposed the entire counterfeiting network.


      5. Institutional innovation: three paradigm shifts in big data auditing

This real-world campaign produced a set of replicable innovations:

1. Smart audit workbench:

  • Integrates the entire process of data collection, cleaning, analysis, and visualization

  • Includes 35 built-in audit analysis models

  • Supports a custom rule engine

2. Continuous audit mode:

  • Established a "data probe" real-time monitoring system

  • Set up 14 types of automatic warning rules

  • Shifted from "after-the-fact audit" to "in-process prevention and control"

3. Audit knowledge base construction:

  • Accumulated a feature library of 230 typical cases

  • Built an industry risk indicator benchmark system

  • Formed a dynamically updated map of audit experience
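A custom rule engine of the kind mentioned above can be as simple as a list of named predicates evaluated against each transaction. This is a hypothetical sketch: the article does not list the 14 real warning rules, so the three rules and the transaction fields (`amount`, `hour`, `payee`, `related_parties`) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[Dict], bool]  # returns True when the transaction should trigger a warning

def evaluate(transaction: Dict, rules: List[Rule]) -> List[str]:
    """Return the names of all rules that the transaction triggers."""
    return [rule.name for rule in rules if rule.condition(transaction)]

# Hypothetical example rules; not the 14 rule types used in the audit
RULES = [
    Rule('large round amount', lambda t: t['amount'] >= 1_000_000 and t['amount'] % 10_000 == 0),
    Rule('late-night payment', lambda t: t['hour'] >= 22 or t['hour'] < 6),
    Rule('related-party payee', lambda t: t['payee'] in t.get('related_parties', set())),
]
```

Keeping rules as data rather than hard-coded branches is what lets auditors add or retire warning rules without touching the evaluation loop.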


One audited municipal enterprise remarked afterward: "It now feels as if a pair of digital eyes is watching every expenditure, which has pushed us to build a more standardized internal control system."


6. Lessons and Outlook: When Auditing Meets Artificial Intelligence

This special operation not only recovered 50 million yuan in funds but, more importantly, produced three industry standards:

      1. Fiscal Expenditure Audit Data Cleansing Specifications (2023 Edition)

      2. Technical Guidelines for Identifying Related-Party Transactions

      3. Guide to Building an Audit Knowledge Graph


The head of the technical team said they are experimenting with large language models for analyzing unstructured data, with the aim of enabling intelligent review of contract texts. Audit experts, however, stressed: "Technology can never replace the professional judgment of auditors; human-machine collaboration is the right direction."

In a provincial audit innovation laboratory, the author saw a striking slogan on the wall: "Data may lie, but logic will not; code may make mistakes, but common sense will not." This may well be the core principle that auditors uphold in the new era.


      Conclusion


From the abacus to Python, from ledgers to knowledge graphs, the evolution of audit technology is a history of contending with ever-changing fraud. When 100,000 lines of code meet funds on the scale of a billion yuan, this silent contest confirms a truth: technological innovation is not merely an efficiency tool but a strategic weapon for safeguarding public funds. In this endless contest of attack and defense, auditors are writing the supervisory wisdom of our era.