Research on the Application of Enterprise Knowledge Graphs Based on Large Language Model Technology

Written by
Clara Bennett
Updated on: June 7, 2025

Recommendation
A new exploration of combining FinTech with large language model technology to improve the efficiency of financial services and strengthen risk prevention and control capabilities.

Core content:

1. Background of financial technology development and the current status of enterprise knowledge graph application

2. Challenges and problems faced by large language model technology in the financial field

3. Knowledge graph construction and application results based on large language models


1. Background

As a programmatic document guiding the healthy and orderly development of FinTech, the FinTech Development Plan (2022-2025) emphasizes using emerging technologies to strengthen the ability of finance to serve the real economy, enhance financial risk prevention and control, and promote the efficient allocation of financial resources. Meanwhile, the "14th Five-Year Plan for the Development of Science and Technology in the Securities and Futures Industry", as a dedicated plan for the industry, calls for enhancing the core competitiveness and service effectiveness of securities and futures institutions by improving the FinTech standards system and deepening the application of big data, artificial intelligence and other technologies, so as to provide investors with richer, more convenient and safer financial services. Together, these two plans reflect the importance China attaches to FinTech and its forward-looking layout of the field. FinTech is expected to drive profound changes in the financial industry and its service models through technological innovation, providing strong support for the construction of a modern financial system.

In recent years, enterprise knowledge graphs and large language model technology have been widely applied in the finance and securities field. However, owing to the limitations of traditional natural language processing technology, the information contained in complex unstructured data such as news and public opinion cannot be efficiently and accurately extracted and integrated into the knowledge graph. At the same time, business queries against knowledge graphs and the generation of results rely heavily on manual intervention and cannot serve the business in a convenient way. Applying general-purpose large language models to securities and futures industry scenarios also exposes a number of problems and challenges, including a lack of industry knowledge, a tendency to hallucinate erroneous or inaccurate information, and an inability to update acquired knowledge efficiently and rapidly.

To address the problems faced by these two technologies, this project combines large language models with knowledge graphs through techniques such as retrieval-augmented generation (RAG) and agents. It explores and demonstrates the feasibility of the resulting solutions for enterprise association, event impact and public opinion pulse queries, and develops corresponding applications for business practice, with the aim of demonstrating their potential in securities and finance scenarios such as compliance due diligence, risk control and investment research decision-making.

2. Content and Achievements

This project has developed a set of technical solutions that organically combine large language models and knowledge graphs, covering three aspects: knowledge graph construction, knowledge graph interaction, and knowledge graph application, and has verified their effectiveness in different scenarios. Building on the graph construction method for complex text and the general knowledge graph query framework described below, the three query scenarios of enterprise association, event impact and public opinion pulse realize data parsing, natural language interaction and analysis result generation in ways that are difficult or impossible for traditional NLP technology and conventional human-computer interaction frameworks.

2.1 Enterprise Knowledge Graph Construction

In this project, we wrote and tuned prompt instructions that cover multiple task forms. For example, "event classification" is a text classification task, "event summary" is a summarization task, and "event geography" is an information extraction task. Each sub-task in "event impact" requires the large language model to complete all three of these tasks while also linking the relevant information together by understanding the surrounding context, which was extremely difficult for earlier deep learning models. During the research, we found that the large language model can not only parse information out of the original text according to the prompt instructions, but can also fill in some fields from its own knowledge based on the logical relationships between other fields and the extracted information.
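As an illustration, a minimal sketch of what such a multi-task prompt might look like is given below, assuming a JSON output convention; the field names, wording and the parse_event_response helper are illustrative and not the project's actual prompt text.

```python
import json

# Hypothetical multi-task prompt: field names and wording are illustrative,
# not the project's actual prompt text.
EVENT_PROMPT_TEMPLATE = """You are an information extraction assistant for financial news.
Given the article below, return a JSON object with the following fields:
- "event_classification": one label for the event type (text classification)
- "event_summary": a one- or two-sentence summary of the event (summarization)
- "event_geography": locations mentioned or implied by the event (information extraction)
- "event_subject": the main company or entity the event concerns

Article:
{article}

Return only the JSON object."""


def build_event_prompt(article: str) -> str:
    """Fill the template with the raw news text before sending it to the model."""
    return EVENT_PROMPT_TEMPLATE.format(article=article)


def parse_event_response(raw_response: str) -> dict:
    """Parse the model's JSON answer; production code would also need to handle malformed output."""
    return json.loads(raw_response)
```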

This project proposes a graph storage method for the outputs of the large language model, which stores the model's parsing results field by field into the graph database. In the vector database, the event title, as a high-level summary of the event, is encoded into a 1024-dimensional vector and stored together with the event body, the event summary, and a unique ID generated for each event; the vector database thus stores the indexing information of events as part of the knowledge graph. In the graph database, "event" exists as a dedicated node type. The unique ID previously stored in the vector database is used as the node ID of the specific event instance, establishing the correspondence between the data in the vector database and the graph database. Information such as the event occurrence time, event name, event classification, event subject and event locale is stored as node attributes in the graph database. Before the event itself and the affected entities extracted with it are written into the graph database, the database is first searched for similar events and entities, thereby avoiding duplicate entries.
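The dual-write described above could be sketched roughly as follows; embed, vector_store and graph_session are stand-ins for the project's actual embedding model, vector database client and graph database session, whose concrete interfaces are assumed here.

```python
import uuid

# Sketch of the dual-write into the vector database and the graph database.
# `embed`, `vector_store` and `graph_session` are assumed interfaces, not the
# project's actual components.

def store_event(event: dict, embed, vector_store, graph_session) -> str:
    """Write one parsed event into the vector DB (for retrieval) and the graph DB
    (as an 'Event' node), linked by a shared unique ID."""
    event_id = str(uuid.uuid4())

    # Vector database: the title (a high-level summary) is embedded into a
    # 1024-dimensional vector and stored with the body, summary and ID.
    title_vector = embed(event["title"])  # assumed to return a list of 1024 floats
    vector_store.add(
        id=event_id,
        vector=title_vector,
        payload={"title": event["title"], "body": event["body"], "summary": event["summary"]},
    )

    # Graph database: 'Event' is a dedicated node type; merging on the ID and on
    # the entity name avoids inserting duplicate events or entities.
    graph_session.run(
        """
        MERGE (e:Event {id: $id})
        SET e.time = $time, e.name = $name, e.category = $category, e.locale = $locale
        MERGE (s:Company {name: $subject})
        MERGE (e)-[:INVOLVES]->(s)
        """,
        id=event_id, time=event["time"], name=event["name"],
        category=event["category"], locale=event["locale"], subject=event["subject"],
    )
    return event_id
```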

 

2.2 Enterprise Knowledge Graph Query Framework Based on Large Language Models

In this project, we propose using Agent and Retrieval-Augmented Generation (RAG) technologies to integrate large language model capabilities into the knowledge graph query, data recall and result generation phases, and on this basis we define a generalized knowledge graph query architecture centered on a knowledge graph query agent.

To avoid "overloading" the large language model with a large amount of input data, the data is encapsulated into standard APIs according to usage scenario or business domain, so that the Agent only needs to make choices and judgments at the coarse-grained API level. In addition, we adopt a Multi-Agent scheme: an Agent first interacts with the user and narrows down the range of APIs to be called, and then the Agent built into the corresponding API completes the final matching when the executor calls that API.
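A minimal sketch of this two-stage routing, assuming a generic llm callable and an illustrative API catalog (the API names and descriptions below are hypothetical), might look like this:

```python
import json

# Two-stage (Multi-Agent) routing sketch. `llm` stands in for whatever
# chat-completion call the project uses; the API catalog is illustrative.
API_CATALOG = {
    "enterprise_association_query": "Query shareholders, investments and affiliations of a company",
    "event_impact_query": "Assess the impact of a news event on companies and industries",
    "opinion_pulse_query": "Summarize the public opinion timeline of a company",
}


def route_request(user_query: str, llm) -> str:
    """Stage 1: the interaction agent only chooses an API at the coarse-grained
    level, so the model never has to digest the underlying data."""
    menu = "\n".join(f"- {name}: {desc}" for name, desc in API_CATALOG.items())
    prompt = (f"Choose the single most suitable API for the request below.\n"
              f"APIs:\n{menu}\nRequest: {user_query}\nAnswer with the API name only.")
    return llm(prompt).strip()


def fill_parameters(api_name: str, user_query: str, llm) -> dict:
    """Stage 2: the agent built into the chosen API completes the fine-grained
    matching (which entity, which relation type) before the executor calls it."""
    prompt = (f"Extract the call parameters for API '{api_name}' from this request "
              f"and return them as JSON: {user_query}")
    return json.loads(llm(prompt))
```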

Retrieval and generation then rely on the RAG architecture to mitigate the large language model's hallucination problem. Text similarity and recall ranking algorithms are used to find the texts most similar to the search object from a large collection of data. The retrieved information is fed into a generative model (usually an LLM) together with the original query (or a query rewritten by an upstream model), and the final result is produced using the model's comprehension and generation capabilities.
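A rough sketch of such a retrieval-and-generation step is shown below; embed, vector_store.search, rerank and llm are assumed interfaces rather than the project's actual components.

```python
# RAG sketch: recall candidates by vector similarity, re-rank them, then let the
# LLM answer only from the retrieved context to reduce hallucination.

def rag_answer(query: str, embed, vector_store, rerank, llm, top_k: int = 20, keep: int = 5) -> str:
    query_vector = embed(query)
    candidates = vector_store.search(query_vector, limit=top_k)   # coarse recall by similarity
    best = rerank(query, candidates)[:keep]                       # recall ranking / re-ranking

    context = "\n\n".join(doc["summary"] for doc in best)
    prompt = (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so instead of guessing.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return llm(prompt)
```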

This project was put into practice in three scenarios, enterprise association query, event impact query and public opinion pulse query, in the big data laboratory jointly built by the three co-research institutions.

In the enterprise association query scenario, the user simply enters in the dialog box the enterprise and the type of association relationship to be queried. The interaction module of the knowledge graph query agent performs preliminary intent matching and, through the knowledge graph retrieval module, calls the business name matching agent to obtain the full registered company name corresponding to the enterprise's short name in the query, then generates the next command directing the retrieval module to call the enterprise association query agent. The enterprise association query agent matches the user's intent to more fine-grained business queries such as shareholders, outbound investment, beneficial shareholders and inter-enterprise affiliation, generates commands for the retrieval module containing the interfaces to call and their parameters, and returns the query results to the query agent, which generates a natural language answer.
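For illustration, the flow might be sketched as follows, with name_matching_agent, association_agent and llm as hypothetical callables standing in for the project's agents, and the association types listed only as examples.

```python
# Illustrative end-to-end flow for the enterprise association scenario; all
# callables and the association type list are hypothetical.

ASSOCIATION_TYPES = ["shareholders", "outbound_investment", "beneficial_shareholders", "affiliation"]


def enterprise_association_query(user_input: str, name_matching_agent, association_agent, llm) -> str:
    # 1. Resolve the company's short name to its full registered name.
    full_name = name_matching_agent(user_input)

    # 2. The association agent matches the intent to one of the supported query
    #    types and returns the interface to call plus its parameters.
    call_spec = association_agent(user_input, full_name, ASSOCIATION_TYPES)
    results = call_spec["interface"](**call_spec["parameters"])   # executor calls the chosen API

    # 3. The query agent turns the structured results into a natural-language answer.
    return llm(f"Summarize these association query results for the user: {results}")
```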

In the event impact query scenario, the knowledge graph query agent sends the question and other information obtained through interaction to the event impact query agent. That agent's interaction module converts the original query into an event description according to its instructions, and then, through the retrieval module, recalls the relevant event information from the previously constructed knowledge graph, together with related information such as the regions and industry chains reached by traversing the graph. After the data is acquired, the generation module filters the recalled data through semantic understanding based on the user's question, and finally organizes one or more impact description fragments into a complete event impact assessment. When there is no directly applicable impact analysis for the queried event, the system can look for similar cases in its stored knowledge to infer the possible impact and further trace how that impact may propagate.
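A condensed sketch of this flow, including the fallback to similar historical events, might look like the following; all callables are assumed rather than taken from the project.

```python
# Event impact flow sketch: rewrite the query into an event description, recall
# graph information, and fall back to similar events when no direct analysis exists.

def event_impact_query(question: str, to_event_description, recall_event, find_similar_events, llm) -> str:
    event_desc = to_event_description(question)     # interaction module rewrites the query
    hits = recall_event(event_desc)                 # recall event plus region / industry-chain info

    if not hits:
        # No directly applicable analysis: reason by analogy from similar events in the graph.
        hits = find_similar_events(event_desc)

    context = "\n".join(h["impact_fragment"] for h in hits)
    prompt = (f"Based on the impact descriptions below, write a complete impact assessment "
              f"for the question: {question}\n\n{context}")
    return llm(prompt)
```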

In the public opinion pulse query scenario, after receiving a user's request to query the event pulse of a certain enterprise, the query service agent sends the matched enterprise information together with the user's query to the public opinion pulse query agent. That agent combines the enterprise, event and time-range information to retrieve the relevant news from the knowledge graph, and then hands it over to the generation module, which produces the pulse in the format specified by the prompt.
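As an example, the generation prompt might resemble the hypothetical template below; the exact output format is illustrative only.

```python
# Hypothetical prompt used by the generation module to render retrieved news
# into a chronological public opinion pulse; the format is illustrative.

PULSE_PROMPT = """Below are news items about {company} retrieved from the knowledge graph.
Organize them into a public opinion pulse in the following format, ordered by date:

[YYYY-MM-DD] <event classification> - <one-sentence summary>

News items:
{news_items}"""


def build_pulse_prompt(company: str, news_items: list[dict]) -> str:
    """Flatten the retrieved news records into the prompt fed to the generation module."""
    lines = "\n".join(f"{n['time']} | {n['title']} | {n['summary']}" for n in news_items)
    return PULSE_PROMPT.format(company=company, news_items=lines)
```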

3. Summary and Outlook

This project innovatively combines a large language model with a knowledge graph, making full use of the knowledge graph's efficient memory capability and the large language model's powerful reasoning and generation capability to propose a construction method and query framework for enterprise knowledge graphs based on large language models. Taking securities and finance scenarios such as customer acquisition, investment research and risk control as examples, three query functions, enterprise association, event impact and public opinion pulse, were developed. They verify the advantages of the proposed combination of large language models and enterprise knowledge graphs over existing solutions in graph construction from unstructured text, user interaction, impact propagation mining, and complex logical summarization. As base large language models and related technologies continue to develop, the complementarity of the two technologies is expected to bring further benefits to the development of financial technology.

 

Note: This project was awarded third prize among the 2023 industry co-research topics by the Securities Information Technology Research and Development Center (Shanghai).