Entity Resolved Knowledge Graphs

This hands-on tutorial in Python demonstrates how to integrate Senzing and Neo4j to construct an Entity Resolved Knowledge Graph:

Use three datasets describing businesses in Las Vegas: ~85K records, ~2% duplicates.
Run entity resolution in Senzing to resolve duplicate business names and addresses.
Parse results to construct a knowledge graph in Neo4j.
Analyze and visualize the entity resolved knowledge graph.
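
The parsing step above can be sketched roughly as follows. This is a minimal illustration, not the tutorial's actual code: it assumes the entity resolution results arrive as JSON lines with RESOLVED_ENTITY, RECORDS, and RELATED_ENTITIES fields (check these names against your own export), and the sample records, data source names, and the parse_export helper are invented for the example.

```python
import json

def parse_export(lines):
    """Split JSON-lines ER results into rows for entity nodes,
    entity-to-record links, and entity-to-entity links.

    The field names used here are an assumed, simplified layout.
    """
    entities = []   # one entity id per resolved entity
    resolves = []   # (entity_id, data_source, record_id)
    related = []    # (entity_id, related_entity_id)

    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines

        dat = json.loads(line)
        ent = dat["RESOLVED_ENTITY"]
        ent_id = ent["ENTITY_ID"]
        entities.append(ent_id)

        # each input record that was resolved into this entity
        for rec in ent.get("RECORDS", []):
            resolves.append((ent_id, rec["DATA_SOURCE"], rec["RECORD_ID"]))

        # other entities that the resolver considers related
        for rel in dat.get("RELATED_ENTITIES", []):
            related.append((ent_id, rel["ENTITY_ID"]))

    return entities, resolves, related


# Invented sample data: two records resolve into entity 1.
SAMPLE = [
    json.dumps({
        "RESOLVED_ENTITY": {
            "ENTITY_ID": 1,
            "RECORDS": [
                {"DATA_SOURCE": "SOURCE_A", "RECORD_ID": "a-1"},
                {"DATA_SOURCE": "SOURCE_B", "RECORD_ID": "b-7"},
            ],
        },
        "RELATED_ENTITIES": [{"ENTITY_ID": 2}],
    }),
    json.dumps({
        "RESOLVED_ENTITY": {
            "ENTITY_ID": 2,
            "RECORDS": [{"DATA_SOURCE": "SOURCE_A", "RECORD_ID": "a-2"}],
        },
        "RELATED_ENTITIES": [],
    }),
]

entities, resolves, related = parse_export(SAMPLE)
```

Each of the three row lists maps naturally onto a batch of nodes or relationships to load into the graph, which keeps the parsing logic separate from the database calls.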

We'll walk through example code based on Neo4j Desktop and the Graph Data Science (GDS) library to run Cypher queries on the graph, preparing data for downstream analysis and visualizations with Jupyter, Pandas, Seaborn, and PyVis.

The code is simple to download, easy to follow, and presented so you can try it with your own data. The tutorial takes about 35 minutes to run.

Why? As one example, the popularity of retrieval augmented generation (RAG) for making AI applications more robust has boosted recent interest in knowledge graphs (KGs). When the entities, relations, and properties in a KG leverage your domain-specific data to strengthen your AI app, compliance issues and audits rush to the foreground.

TL;DR: sense-making of the data coming from a connected world. During the transition from data integration to KG construction, you need to make sure the entities in your graph get resolved correctly. Otherwise, your AI app downstream will struggle with the kinds of details that make people get concerned, very concerned, very quickly: e.g., billing, deliveries, voter registration, crucial medical details, credit reporting, industrial safety, security, and so on.

Highly recommended:

"Entity Resolved Knowledge Graphs"
"Analytics on Entity Resolved Knowledge Graphs", Mel Richey (2023)

Prerequisites

In this tutorial we'll work in two environments. The configuration and coding are at a level which should be comfortable for most people working in data science. You'll need to be familiar with how to:

clone a public repo from GitHub
launch a server in the cloud
use Linux command lines
write some code in Python

Total estimated project time: 35 minutes.

Cloud computing budget: running Senzing in this tutorial cost a total of $0.04 USD.

Set up local environment

After cloning this repo, change into the ERKG directory and set up your local environment:

git clone https://github.com/DerwenAI/ERKG.git
cd ERKG

python3.11 -m venv venv
source venv/bin/activate

python3 -m pip install -U pip wheel setuptools
python3 -m pip install -r requirements.txt

We're using Python 3.11 here, although this code should run with most of the recent Python 3.x versions.
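
If you want to confirm the environment before launching the notebooks, a quick check along these lines may help. It is only a sketch: the helper names are hypothetical, and the minimum version of 3.9 is an assumption, since this tutorial states only that 3.11 was used.

```python
import sys

def in_virtualenv() -> bool:
    """True when the interpreter is running inside a venv."""
    return sys.prefix != sys.base_prefix

def python_ok(minimum=(3, 9)) -> bool:
    """True when the interpreter is at least the given version.

    The default minimum is an assumption, not a tested requirement.
    """
    return sys.version_info >= minimum

if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("Python version acceptable:", python_ok())
```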

Run the tutorial notebooks

First, launch Jupyter:

./venv/bin/jupyter lab

Then based on the tutorial, follow the steps shown in these notebooks:

You can view the results -- an interactive visualization of the entity resolved knowledge graph -- by loading examples/big_vegas.2.html in a web browser. The full HTML+JavaScript is large and may take several minutes to load.

Deleting data

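If you need to clear the database and start over, run this query in Neo4j Desktop. In Neo4j Browser, prefix the query with :auto, since CALL { ... } IN TRANSACTIONS only runs in an implicit (auto-commit) transaction: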

MATCH (n)
CALL {
  WITH n
  DETACH DELETE n
} IN TRANSACTIONS

See: https://neo4j.com/docs/cypher-manual/current/subqueries/subqueries-in-transactions/#delete-with-call-in-transactions
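
The same cleanup could also be run from Python. The sketch below is an assumption about how you might wire it up, not part of the tutorial's notebooks: build_clear_query and clear_graph are hypothetical helpers, and the batch size is an arbitrary choice. The session argument is expected to behave like a session from the official neo4j Python driver, whose run() method executes in an auto-commit transaction.

```python
def build_clear_query(batch_size: int = 10_000) -> str:
    """Build the batched delete query shown above, with an explicit
    batch size added via OF ... ROWS."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")

    return (
        "MATCH (n)\n"
        "CALL {\n"
        "  WITH n\n"
        "  DETACH DELETE n\n"
        f"}} IN TRANSACTIONS OF {int(batch_size)} ROWS"
    )


def clear_graph(session, batch_size: int = 10_000):
    """Run the delete query on a session-like object.

    `session` only needs a run(query) method that returns an object
    with consume(), as sessions in the neo4j Python driver do.
    """
    result = session.run(build_clear_query(batch_size))
    return result.consume()


# Usage with the neo4j driver (URI and credentials are placeholders):
#
#   from neo4j import GraphDatabase
#
#   with GraphDatabase.driver("bolt://localhost:7687",
#                             auth=("neo4j", "<password>")) as driver:
#       with driver.session() as session:
#           clear_graph(session)
```

Deleting in batches keeps memory use bounded on larger graphs, which is the reason the query uses IN TRANSACTIONS rather than a single DETACH DELETE.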

Kudos

Many thanks to: @akollegger, @brianmacy
