Knowledge Graph

In-depth exploration of the construction, representation and application of knowledge graphs, and insight into their core value in intelligent information processing.
Core content:
1. Definition and components of knowledge graphs
2. Representation methods of entities, relationships and attributes
3. Construction process and technical challenges of knowledge graphs
A knowledge graph is a semantic network that reveals the relationships between entities and formally describes real-world objects and the relationships among them. Google introduced the term in May 2012, originally to strengthen its search engine and improve search quality and the user's search experience. Today "knowledge graph" is used more broadly to refer to large-scale knowledge bases of many kinds.
Knowledge graph composition and representation
Components of the Knowledge Graph
A knowledge graph is a graph structure made up of nodes (entities) and edges (relationships). Nodes represent entities such as people, places, and events; edges represent the relationships between entities, such as "a person lives in a place". Each entity and relationship can be described by attributes, which lets the knowledge graph represent real-world knowledge in a structured, semantic form. It has three main parts:
1) Entities: nodes in the graph that represent concrete objects or abstract concepts in the real world. Each entity has a unique identifier and attributes that describe its characteristics. For example, the entity "Yao Ming" can have the attribute "height".
2) Relationships: edges that connect entities. Relationships are usually directed and typed, with each type representing a different kind of association between entities. For example, the entity "Yao Ming" can be connected to the entity "Ye Li" through the relationship "wife".
3) Attributes: characteristics of an entity or relationship. For example, the entity "Yao Ming" has the attribute "occupation".
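The three parts above can be sketched as a small in-memory structure. This is only an illustration: the class names `Entity` and `Relationship`, the helper `outgoing`, and the attribute value are assumptions made for the example, not part of any particular knowledge graph system.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A node: a unique identifier plus descriptive attributes."""
    id: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed edge from `source` to `target`."""
    source: str
    type: str
    target: str
    attributes: dict = field(default_factory=dict)


# Hypothetical sample data built from the examples in the text.
entities = {
    "Yao Ming": Entity("Yao Ming", {"occupation": "basketball player"}),
    "Ye Li": Entity("Ye Li"),
}
relationships = [Relationship("Yao Ming", "wife", "Ye Li")]


def outgoing(entity_id, relationships):
    """Return (relationship type, target id) for every edge leaving entity_id."""
    return [(r.type, r.target) for r in relationships if r.source == entity_id]


print(outgoing("Yao Ming", relationships))  # [('wife', 'Ye Li')]
print(outgoing("Ye Li", relationships))     # [] because the edge is directed
```

Keeping the edge directed means that asking for the outgoing edges of "Ye Li" returns nothing, which mirrors the directionality described in point 2.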
Representation Method of Knowledge Graph
In general, triples are a common representation of knowledge graphs, and there are three basic forms:
Construction of knowledge graph
How do you decide which are entities, which are attributes, and which are relationships? It depends on how you use the graph and what you want to accomplish.
The architecture of a knowledge graph covers both its logical structure and its system architecture. Logically, it has two layers: the data layer and the model layer. The data layer consists of a series of facts, and knowledge is stored with the fact as the unit, for example in a graph database such as Neo4j. The model layer is built on top of the data layer and uses an ontology library to standardize how the facts in the data layer are expressed. The ontology is the conceptual template of a structured knowledge base; a knowledge base built on an ontology library has a clear hierarchical structure and a low degree of redundancy.
The system architecture for constructing a knowledge graph is shown in the figure below:
Constructing a large-scale knowledge base relies on several intelligent information processing techniques. Knowledge extraction pulls entities, relationships, attributes and other elements out of unstructured data. Knowledge fusion removes ambiguity and improves data quality. Knowledge reasoning mines implicit knowledge from the existing knowledge base, thereby enriching and expanding it. Distributed knowledge representation produces vectors that support the construction, reasoning, fusion and application of the knowledge base.
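As a rough illustration of how a model layer can constrain a data layer, the sketch below checks facts against a toy ontology. The relation signatures, the entity types, and the function name `conforms` are all hypothetical; a real ontology library is far richer than a dictionary.

```python
# Model layer (assumed for illustration): each relation lists the entity type
# it may start from (domain) and the entity type it may point to (range).
ONTOLOGY = {
    "Director": ("Movie", "Person"),
    "Singer": ("Album", "Person"),
}

# Data layer: entity types plus facts stored as (head, relation, tail) triples.
ENTITY_TYPES = {
    "Secret": "Movie",
    "Secret Movie Soundtrack": "Album",
    "Jay Chou": "Person",
}


def conforms(fact, ontology=ONTOLOGY, types=ENTITY_TYPES):
    """Return True if the fact uses a known relation with matching types."""
    head, relation, tail = fact
    if relation not in ontology:
        return False
    domain, range_ = ontology[relation]
    return types.get(head) == domain and types.get(tail) == range_


print(conforms(("Secret", "Director", "Jay Chou")))   # True
print(conforms(("Jay Chou", "Director", "Secret")))   # False: types are reversed
```

Rejecting facts that do not fit the schema is one simple way the model layer keeps the data layer consistent.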
Knowledge Extraction
For unstructured data, automated techniques are used to extract usable knowledge units. This covers entity extraction, relationship extraction, and attribute extraction.
Entity extraction: Entities are the most basic elements of a knowledge graph, and the completeness, accuracy, and recall of their extraction directly affect the quality of the knowledge base. The task is more commonly known as named entity recognition (NER). It can be based on rules and dictionaries or on predictions from a machine learning model, and it is usually framed as a sequence labeling problem.
Relationship extraction: The goal is to establish the semantic links between entities. Early systems identified entity relationships with manually constructed semantic rules and templates; later, models of the relationships between entities gradually replaced the predefined grammatical rules.
Attribute extraction: An attribute can be regarded as a nominal relationship between an entity and an attribute value, so attribute extraction can be recast as a relationship extraction problem, and most relationship extraction methods carry over.
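To make the "sequence labeling" framing concrete, here is a minimal dictionary-based tagger that assigns one BIO tag per token. It is a sketch under simple assumptions: whitespace tokenization, a tiny hand-written dictionary, and the hypothetical function name `bio_tag`. A trained NER model would produce the same kind of tag sequence.

```python
def bio_tag(tokens, dictionary):
    """Label tokens with B-/I-/O tags, preferring the longest dictionary match."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest span starting at position i first.
        for j in range(len(tokens), i, -1):
            label = dictionary.get(" ".join(tokens[i:j]))
            if label:
                match = (j, label)
                break
        if match:
            end, label = match
            tags[i] = "B-" + label            # first token of the entity
            for k in range(i + 1, end):
                tags[k] = "I-" + label        # remaining tokens of the entity
            i = end
        else:
            i += 1                            # token is outside any entity
    return tags


# Hypothetical dictionary mapping entity names to entity types.
DICTIONARY = {"Jay Chou": "PER", "Tamkang High School": "ORG"}
tokens = "Jay Chou attended Tamkang High School".split()
print(list(zip(tokens, bio_tag(tokens, DICTIONARY))))
```

The output pairs each token with a tag such as `B-PER`, `I-PER`, or `O`, which is the format a sequence labeling model is trained to predict.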
Knowledge Fusion
Because the knowledge in a knowledge graph comes from many sources, its quality is uneven, the same knowledge is repeated across data sources, and the associations between pieces of knowledge are unclear, so knowledge fusion is necessary. Knowledge fusion is a high-level form of knowledge organization: under a unified framework, it integrates heterogeneous data from different knowledge sources, disambiguates it, processes it, verifies inferences, and updates it, bringing together data, information, methods, experience, and human thinking to form a high-quality knowledge base. It includes entity linking and knowledge merging:
Entity linking: linking each extracted entity mention to the correct corresponding entity in the knowledge base.
Knowledge merging: recognizing that knowledge from different sources describes the same real-world entity and combining it.
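The two steps can be sketched with a lookup table. This is a deliberately simple illustration: the alias table and the names `link` and `merge` are assumptions, and real entity linking normally also uses context to disambiguate mentions.

```python
# Hypothetical alias table: surface form (lower-cased) -> canonical entity id.
ALIASES = {
    "jay chou": "Jay Chou",
    "周杰伦": "Jay Chou",
}


def link(mention, aliases=ALIASES):
    """Entity linking: map a mention to a canonical entity id, or None."""
    return aliases.get(mention.strip().lower())


def merge(records, aliases=ALIASES):
    """Knowledge merging: combine attribute records that link to one entity."""
    merged = {}
    for mention, attributes in records:
        entity = link(mention, aliases)
        if entity is None:
            continue  # unlinkable mentions are skipped in this sketch
        merged.setdefault(entity, {}).update(attributes)
    return merged


# Two sources describe the same person under different names.
records = [
    ("Jay Chou", {"Height": "175cm"}),
    ("周杰伦", {"Blood type": "Type O"}),
    ("Unknown Artist", {"Height": "180cm"}),
]
print(merge(records))
```

After merging, both attribute records end up under the single entity "Jay Chou", and the mention that cannot be linked is left out.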
Knowledge Processing
Facts themselves are not equal to knowledge. To ultimately obtain a structured and networked knowledge system, it is necessary to go through a process of knowledge processing, including ontology construction, knowledge reasoning, and quality assessment.
Knowledge reasoning: mine implicit knowledge from the existing knowledge base to enrich and expand it.
For example, rule-based reasoning over a chain of relations: given A-son-B and B-son-C, infer the unknown relation A-?-C (here, grandson).
Model-based knowledge completion: predict the missing element of a triple, or judge a whole triple, for example h + r -> t (predict the tail entity), h + t -> r (predict the relation), and (h, r, t) -> {0, 1} (decide whether the triple holds).
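The rule-based case can be sketched as a single composition rule applied to a set of triples. The function name `apply_composition_rule` and the inferred relation name "grandson" are assumptions chosen for this example.

```python
def apply_composition_rule(facts, first, second, inferred):
    """If (a, first, b) and (b, second, c) both hold, infer (a, inferred, c).

    Returns only the triples that were not already present.
    """
    facts = set(facts)
    new_facts = {
        (a, inferred, c)
        for (a, p, b) in facts if p == first
        for (b2, q, c) in facts if q == second and b2 == b
    }
    return new_facts - facts


facts = {("A", "son", "B"), ("B", "son", "C")}
print(apply_composition_rule(facts, "son", "son", "grandson"))
# {('A', 'grandson', 'C')}
```

Applying such rules repeatedly until no new triples appear is one simple way to expand a knowledge base with implicit facts.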
Knowledge Representation
Convert entities, relations, attributes, etc. in the knowledge graph into vectors, and use the computational relations between vectors to reflect the associations between entities, so that the learned entity representations can be used in text-related tasks.
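One well-known family of representation methods treats a relation as a translation between entity vectors, so that h + r lands close to t for a true triple (the idea behind TransE). The sketch below uses hand-picked two-dimensional vectors rather than learned ones, and the names `distance`, `predict_tail`, and `holds` are hypothetical; it only shows how vector arithmetic can serve both h + r -> t prediction and (h, r, t) -> {0, 1} classification.

```python
import math

# Hand-picked toy vectors (not learned), for illustration only.
VECTORS = {
    "Secret": (1.0, 0.0),
    "Jay Chou": (1.0, 1.0),
    "Tamkang High School": (3.0, 0.0),
    "Director": (0.0, 1.0),
}


def distance(head, relation, tail, vectors=VECTORS):
    """Euclidean norm of h + r - t; smaller means a more plausible triple."""
    h, r, t = vectors[head], vectors[relation], vectors[tail]
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def predict_tail(head, relation, candidates, vectors=VECTORS):
    """h + r -> t: choose the candidate tail closest to h + r."""
    return min(candidates, key=lambda c: distance(head, relation, c, vectors))


def holds(head, relation, tail, threshold=0.5, vectors=VECTORS):
    """(h, r, t) -> {0, 1}: accept the triple if its distance is small."""
    return 1 if distance(head, relation, tail, vectors) < threshold else 0


print(predict_tail("Secret", "Director", ["Jay Chou", "Tamkang High School"]))
print(holds("Secret", "Director", "Tamkang High School"))
```

In practice the vectors are trained so that observed triples score well and corrupted ones score badly; the threshold used by `holds` is an arbitrary value for this toy data.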
Technical framework of knowledge graph
Building a knowledge graph usually requires the following technologies:
Graph Databases
Graph databases such as Neo4j are used to store the data structure of knowledge graphs. They are good at processing queries and operations on nodes (entities) and edges (relationships), and are particularly suitable for the storage and query of large-scale knowledge graphs.
Natural Language Processing (NLP)
NLP is used to extract entities and relations from unstructured text. For example, through named entity recognition (NER) and relation extraction, knowledge graphs can be automatically constructed from large amounts of text.
Simple implementation of knowledge graph
This article uses some of Jay Chou's songs, albums, films and TV series to build a graph database, and then implements a simple question-answering task on top of it.
Related graph database construction
Data preparation: building the graph database first requires two files, an entity-relationship-entity triple file and an entity-attribute-attribute value triple file, for example:
#Entity-relationship-entity triple file
Secret (2007 film written and directed by Jay Chou) Director Jay Chou
Secret (2007 film written and directed by Jay Chou) Dialogue language: Chinese
Secret (2007 film written and directed by Jay Chou) Filming location: Tamkang High School
Secret (2007 film written and directed by Jay Chou) Color
Secret Movie Soundtrack Album Singer Jay Chou
Jay Chou, Director of the MV "Not Your Friend"
Prague Square (sung by Jolin Tsai and Jay Chou) Arranged by Zhong Xingmin
Prague Square (sung by Jolin Tsai and Jay Chou) Song language: Mandarin
#Entity-attribute-attribute value triple file
Secret (2007 film written and directed by Jay Chou) imdb code tt1037850
Secret (2007 film written and directed by Jay Chou) Screenwriter Du Zhilang, Jay Chou
Secret (2007 film written and directed by Jay Chou) Produced by Edko Films Ltd.
Secret (2007 film written and directed by Jay Chou) Genre: drama, music, fantasy, emotion
Secret (2007 film written and directed by Jay Chou) Length: 101 minutes
Secret (2007 film written and directed by Jay Chou) Release date July 27, 2007 (Taiwan, China)
Secret (2007 film written and directed by Jay Chou)
Secret (2007 film written and directed by Jay Chou) Produced in Taiwan, China, Hong Kong, China
Secret (2007 film written and directed by Jay Chou) Producer Jiang Zhiqiang
Secret (2007 film written and directed by Jay Chou) Chinese title: Secret
Secret (2007 film written and directed by Jay Chou) Starring Jay Chou, Gui Lunmei, Anthony Wong, Janice Tseng, Su Mingming
Secret (2007 film written and directed by Jay Chou) Major awards: 44th Taiwan Golden Horse Film Awards Outstanding Film of the Year
Secret Soundtrack Producers Jay Chou, Terdsak Janpan
Read the data and build Cypher statements: collect all attributes of each entity into a dictionary-like expression, attach it to the entity, and run the resulting creation script.
// Names that contain spaces are backtick-quoted, as Cypher requires.
CREATE (Secret:Movie {`imdb code`: 'tt1037850', screenwriter: 'Du Zhilang, Jay Chou', `production company`: 'Edko Films Ltd.', type: 'drama, music, fantasy, emotion', length: '101 minutes', `release time`: 'July 27, 2007 (Taiwan, China)', `foreign name`: 'Secret', `production area`: 'Taiwan, China, Hong Kong, China', producer: 'Jiang Zhiqiang', `Chinese name`: 'Secret', starring: 'Jay Chou, Gui Lunmei, Anthony Wong, Kai Hsuan Tseng, Su Mingming', `major awards`: 'Outstanding Film of the Year at the 44th Taiwan Golden Horse Film Awards', NAME: 'Secret'})
CREATE (`Secret Movie Soundtrack` {Producer: 'Jay Chou, Terdsak Janpan', NAME: 'Secret Movie Soundtrack'})
CREATE (`Prague Square`:Song {`Release date`: 'March 2003', `Song duration`: '4:52', `Song original singer`: 'Jolin Tsai, Jay Chou', Album: '《Look at My 72 Changes》', `Chinese name`: 'Prague Square', NAME: 'Prague Square'})
CREATE (扯:Song {`track code`: '08', `Chinese name`: '扯', NAME: '扯'})
CREATE (`Chen Tianjia` {`Favorite celebrities`: 'Tang Yuzhe, Jay Chou', NAME: 'Chen Tianjia'})
CREATE (BedtimeStories:Song {`Release time`: '2016-06-24', `Song duration`: '3:45', `Foreign name`: 'BedtimeStories', `Chinese name`: 'BedtimeStories', NAME: 'BedtimeStories'})
CREATE (`Slam Dunk`:Movie {Starring: 'Jay Chou, Eric Tsang, Wang Gang, Charlene Choi, Bolin Chen, Ng Man-tat, Jacky Wu', NAME: 'Slam Dunk'})
CREATE (`Slam Dunk 2` {Starring: 'Jay Chou, Charlene Choi', NAME: 'Slam Dunk 2'})
CREATE (`Big Star`:Song {`Chinese name`: 'Big Star', NAME: 'Big Star'})
CREATE (`Tamkang High School` {`Famous alumni`: 'Jay Chou, Gu Long', NAME: 'Tamkang High School'})
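Statements of this shape can be generated from the attribute triples. The following is a minimal sketch, not the article's actual script: the function names, the optional `labels` mapping, and the choice to backtick-quote every identifier are assumptions, and names are assumed not to contain backticks themselves.

```python
from collections import defaultdict


def quote_name(name):
    """Backtick-quote an identifier so that it may contain spaces."""
    return "`" + name + "`"


def quote_value(value):
    """Single-quote a string value, escaping backslashes and quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_create_statements(attribute_triples, labels=None):
    """Group (entity, attribute, value) triples into one CREATE per entity."""
    labels = labels or {}
    properties = defaultdict(dict)
    for entity, attribute, value in attribute_triples:
        properties[entity][attribute] = value

    statements = []
    for entity, attrs in properties.items():
        attrs = dict(attrs, NAME=entity)  # every node also stores its own name
        body = ", ".join(
            f"{quote_name(key)}: {quote_value(val)}" for key, val in attrs.items()
        )
        label = ":" + quote_name(labels[entity]) if entity in labels else ""
        statements.append(f"CREATE ({quote_name(entity)}{label} {{{body}}})")
    return statements


triples = [
    ("Secret", "imdb code", "tt1037850"),
    ("Secret", "Length", "101 minutes"),
]
for statement in build_create_statements(triples, labels={"Secret": "Movie"}):
    print(statement)
```

For larger imports, passing values as query parameters through a database driver is generally safer than concatenating strings; string building is used here only because it mirrors the script-style output shown above.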
The final construction data example is as follows
Natural Language Processing
Extract the entities, relations, and attributes from the question. Entities can be found by dictionary lookup or with an NER model, and relations in the question can be identified with a text classification model. This example uses simple text matching to get the required information.
import itertools
import re

# The functions below are methods of the GraphQA class (excerpt).

# Get the entities mentioned in the question. A dictionary-based approach is
# used here; an NER model would also work. re.escape keeps names that contain
# characters such as parentheses from being read as regex syntax.
def get_mention_entitys(self, sentence):
    return re.findall("|".join(map(re.escape, self.entity_set)), sentence)

# Get the relations mentioned in the question. A text classification model
# could be used instead.
def get_mention_relations(self, sentence):
    return re.findall("|".join(map(re.escape, self.relation_set)), sentence)

# Get the attributes mentioned in the question.
def get_mention_attributes(self, sentence):
    return re.findall("|".join(map(re.escape, self.attribute_set)), sentence)

# Get the labels mentioned in the question.
def get_mention_labels(self, sentence):
    return re.findall("|".join(map(re.escape, self.label_set)), sentence)

# Preprocess the question and extract the required information.
def parse_sentence(self, sentence):
    entities = self.get_mention_entitys(sentence)
    relations = self.get_mention_relations(sentence)
    labels = self.get_mention_labels(sentence)
    attributes = self.get_mention_attributes(sentence)
    return {"%ENT%": entities,  # fixed: the original returned the undefined name `entitys`
            "%REL%": relations,
            "%LAB%": labels,
            "%ATT%": attributes}

# Assign the extracted values to the template slots.
# A slot needed once keeps its key (e.g. "%ENT%"); a slot needed several times
# gets numbered keys (e.g. "%ENT0%", "%ENT1%").
def decode_value_combination(self, value_combination, cypher_check):
    res = {}
    for index, (key, required_count) in enumerate(cypher_check.items()):
        if required_count == 1:
            res[key] = value_combination[index][0]
        else:
            for i in range(required_count):
                key_num = key[:-1] + str(i) + "%"
                res[key_num] = value_combination[index][i]
    return res

# When more values are found than the template requires, enumerate every
# combination of them.
# info: {"%ENT%": ["Jay Chou", "Vincent Fang"], "%REL%": ["Composer"]}
def get_combinations(self, cypher_check, info):
    slot_values = []
    for key, required_count in cypher_check.items():
        slot_values.append(itertools.combinations(info[key], required_count))
    value_combinations = itertools.product(*slot_values)
    combinations = []
    for value_combination in value_combinations:
        combinations.append(self.decode_value_combination(value_combination, cypher_check))
    return combinations
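Once slot values have been assigned, they still have to be substituted into a question template and its Cypher template. The standalone sketch below shows one way to do that. The template strings and the names `fill_template` and `expand` are hypothetical, and the slot numbering simply follows the `%ENT0%`, `%ENT1%` convention used above.

```python
import itertools


def fill_template(template, slots):
    """Replace each placeholder such as %ENT% with its assigned value."""
    for key, value in slots.items():
        template = template.replace(key, value)
    return template


def expand(question_template, cypher_template, check, info):
    """Yield (question text, Cypher text) for every valid slot assignment.

    `check` maps each placeholder to how many values the template needs.
    """
    pools = [itertools.combinations(info.get(key, []), count)
             for key, count in check.items()]
    for combo in itertools.product(*pools):
        slots = {}
        for (key, count), values in zip(check.items(), combo):
            if count == 1:
                slots[key] = values[0]
            else:
                for i, value in enumerate(values):
                    slots[key[:-1] + str(i) + "%"] = value
        yield (fill_template(question_template, slots),
               fill_template(cypher_template, slots))


info = {"%ENT%": ["Jay Chou", "Vincent Fang"], "%REL%": ["Composer"]}
check = {"%ENT%": 1, "%REL%": 1}
question = "Who is the %REL% of %ENT%"
cypher = 'MATCH (n)<-[:%REL%]-(m {NAME:"%ENT%"}) RETURN n.NAME'
for text, query in expand(question, cypher, check, info):
    print(text, "|", query)
```

With two candidate entities and one relation, the template expands into two candidate questions, one per entity; the best candidate is then chosen by text matching.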
Based on the extracted entities, relations and other information, each question template is expanded into candidate texts to be matched. The question is then matched against these candidates to choose the corresponding Cypher query, and the query result is parsed into an answer. Below, we ask a few questions and show the answers the system returns from the knowledge graph.
if __name__ == "__main__":
    graph = GraphQA()
    res = graph.query("Who directed Secret?")
    print(res)
    res = graph.query("Who composed the music for Hair Like Snow?")
    print(res)
    res = graph.query("Who composed the music for Love Before BC?")
    print(res)
    res = graph.query("What is Jay Chou's zodiac sign?")
    print(res)
    res = graph.query("What is Jay Chou's blood type?")
    print(res)
    res = graph.query("Jay Chou's height")
    print(res)
    res = graph.query("What is the relationship between Jay Chou and Tamkang High School?")
    print(res)
#output Final answer
The director of Secret is Jay Chou
Jay Chou
Jay Chou
Capricorn
Type O
175cm
Graduate School
# Process (each comparison line shows: question, candidate text, similarity score)
===============
============
Who directed Secret?
info: {'%ENT%': ['Secret'], '%REL%': ['Director'], '%LAB%': [], '%ATT%': ['Director']}
[['Who composed the music for Secret', 'Match (n)<-[:composer]-(m {NAME:"Secret"}) return n.NAME', 'n.NAME'], ['Who directed Secret', 'Match (n)<-[:director]-(m {NAME:"Secret"}) return n.NAME', 'The director of Secret is n.NAME'], ['What is the director of Secret', 'Match (n) where n.NAME="Secret" return n.director', 'n.director']]
Who directed Secret? Who composed the music for Secret 0.5833333333333334
Who directed Secret? Who directed Secret 1.0
Who directed Secret? What is the director of Secret 0.6666666666666666
The director of Secret is Jay Chou
============
Who composed the music for Hair Like Snow?
info: {'%ENT%': ['Hair Like Snow'], '%REL%': ['Composer'], '%LAB%': [], '%ATT%': ['Composer']}
[['Who composed the music for Hair Like Snow', 'Match (n)<-[:composer]-(m {NAME:"Hair Like Snow"}) return n.NAME', 'n.NAME'], ['Who directed Hair Like Snow', 'Match (n)<-[:director]-(m {NAME:"Hair Like Snow"}) return n.NAME', 'The director of Hair Like Snow is n.NAME'], ['What is the composer of Hair Like Snow', 'Match (n) where n.NAME="Hair Like Snow" return n.composer', 'n.composer']]
Who composed the music for Hair Like Snow? Who composed the music for Hair Like Snow 1.0
Who composed the music for Hair Like Snow? Who directed Hair Like Snow 0.5
Who composed the music for Hair Like Snow? What is the composer of Hair Like Snow 0.7
Jay Chou
============
Who composed the music for Love Before BC?
info: {'%ENT%': ['Love Before BC'], '%REL%': ['Composer'], '%LAB%': [], '%ATT%': ['Composer']}
[['Who composed the music for Love Before BC', 'Match (n)<-[:composer]-(m {NAME:"Love Before BC"}) return n.NAME', 'n.NAME'], ['Who directed Love Before BC', 'Match (n)<-[:director]-(m {NAME:"Love Before BC"}) return n.NAME', 'The director of Love Before BC is n.NAME'], ['What is the composer of Love Before BC', 'Match (n) where n.NAME="Love Before BC" return n.composer', 'n.composer']]
Who composed the music for Love Before BC? Who composed the music for Love Before BC 1.0
Who composed the music for Love Before BC? Who directed Love Before BC 0.5833333333333334
Who composed the music for Love Before BC? What is the composer of Love Before BC 0.75
Jay Chou
============
What is Jay Chou's zodiac sign?
info: {'%ENT%': ['Jay Chou'], '%REL%': [], '%LAB%': [], '%ATT%': ['Zodiac sign']}
[['Who composed the music for Jay Chou', 'Match (n)<-[:composer]-(m {NAME:"Jay Chou"}) return n.NAME', 'n.NAME'], ['Who directed Jay Chou', 'Match (n)<-[:director]-(m {NAME:"Jay Chou"}) return n.NAME', 'The director of Jay Chou is n.NAME'], ["What is Jay Chou's zodiac sign", 'Match (n) where n.NAME="Jay Chou" return n.Zodiac sign', 'n.Zodiac sign']]
What is Jay Chou's zodiac sign? Who composed the music for Jay Chou 0.4166666666666667
What is Jay Chou's zodiac sign? Who directed Jay Chou 0.3333333333333333
What is Jay Chou's zodiac sign? What is Jay Chou's zodiac sign 1.0
Capricorn
============
What is Jay Chou's blood type?
info: {'%ENT%': ['Jay Chou'], '%REL%': [], '%LAB%': [], '%ATT%': ['Blood type']}
[['Who composed the music for Jay Chou', 'Match (n)<-[:composer]-(m {NAME:"Jay Chou"}) return n.NAME', 'n.NAME'], ['Who directed Jay Chou', 'Match (n)<-[:director]-(m {NAME:"Jay Chou"}) return n.NAME', 'The director of Jay Chou is n.NAME'], ["What is Jay Chou's blood type", 'Match (n) where n.NAME="Jay Chou" return n.blood type', 'n.blood type']]
What is Jay Chou's blood type? Who composed the music for Jay Chou 0.4166666666666667
What is Jay Chou's blood type? Who directed Jay Chou 0.3333333333333333
What is Jay Chou's blood type? What is Jay Chou's blood type 1.0
Type O
============
Jay Chou's height
info: {'%ENT%': ['Jay Chou'], '%REL%': [], '%LAB%': [], '%ATT%': ['Height']}
[['Who composed the music for Jay Chou', 'Match (n)<-[:composer]-(m {NAME:"Jay Chou"}) return n.NAME', 'n.NAME'], ['Who directed Jay Chou', 'Match (n)<-[:director]-(m {NAME:"Jay Chou"}) return n.NAME', 'The director of Jay Chou is n.NAME'], ["What is Jay Chou's height", 'Match (n) where n.NAME="Jay Chou" return n.Height', 'n.Height']]
Jay Chou's height Who composed the music for Jay Chou 0.4
Jay Chou's height Who directed Jay Chou 0.4444444444444444
Jay Chou's height What is Jay Chou's height 0.6666666666666666
175cm
============
What is the relationship between Jay Chou and Tamkang High School?
info: {'%ENT%': ['Jay Chou', 'Tamkang High School'], '%REL%': [], '%LAB%': [], '%ATT%': []}
[['Who composed the music for Jay Chou', 'Match (n)<-[:composer]-(m {NAME:"Jay Chou"}) return n.NAME', 'n.NAME'], ['Who composed the music for Tamkang High School', 'Match (n)<-[:composer]-(m {NAME:"Tamkang High School"}) return n.NAME', 'n.NAME'], ['Who directed Jay Chou', 'Match (n)<-[:director]-(m {NAME:"Jay Chou"}) return n.NAME', 'The director of Jay Chou is n.NAME'], ['Who directed Tamkang High School', 'Match (n)<-[:director]-(m {NAME:"Tamkang High School"}) return n.NAME', 'The director of Tamkang High School is n.NAME'], ['What is the relationship between Jay Chou and Tamkang High School', 'Match (n {NAME:"Jay Chou"})-[REL]->(m {NAME:"Tamkang High School"}) return REL', 'REL']]
What is the relationship between Jay Chou and Tamkang High School? Who composed the music for Jay Chou 0.23529411764705882
What is the relationship between Jay Chou and Tamkang High School? Who composed the music for Tamkang High School 0.29411764705882354
What is the relationship between Jay Chou and Tamkang High School? Who directed Jay Chou 0.17647058823529413
What is the relationship between Jay Chou and Tamkang High School? Who directed Tamkang High School 0.23529411764705882
What is the relationship between Jay Chou and Tamkang High School? What is the relationship between Jay Chou and Tamkang High School 1.0
Graduate School
As the log shows, this example chooses the Cypher query by computing a similarity score between the question and each expanded template text and taking the best match. Other text matching methods can be used as well.
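The matching step can be sketched as follows. The source does not name the similarity measure it uses, so Jaccard similarity over character sets is an assumption made here for illustration, as are the function names `jaccard` and `best_match` and the sample candidates.

```python
def jaccard(a, b):
    """Jaccard similarity between the character sets of two strings."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty strings are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def best_match(question, candidates):
    """Return the (text, cypher) candidate whose text best matches the question."""
    return max(candidates, key=lambda candidate: jaccard(question, candidate[0]))


# Hypothetical expanded templates: (candidate text, Cypher query).
candidates = [
    ("Who composed Secret", 'MATCH (n)<-[:Composer]-(m {NAME:"Secret"}) RETURN n.NAME'),
    ("Who directed Secret", 'MATCH (n)<-[:Director]-(m {NAME:"Secret"}) RETURN n.NAME'),
]
text, query = best_match("Who directed Secret", candidates)
print(text)
print(query)
```

A character-set measure ignores word order and repeated characters, so edit distance, TF-IDF cosine similarity, or sentence embeddings may work better on longer or more varied questions.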