Knowledge graphs can reshape how we think about Retrieval-Augmented Generation (RAG). Vector databases are great for semantic similarity, but they often miss deeper relationships hidden in the data. By storing information as nodes and edges, a graph database surfaces context that can help Large Language Models (LLMs) produce better, more grounded responses.
In this tutorial, we’ll walk through how to use a graph database to power a RAG pipeline. We’ll explore the ingestion steps, where we combine Named Entity Recognition (NER) with graph modeling, then see how to build queries that fetch relevant context for your Large Language Model. By the end, you’ll have a foundation for a graph-based approach that handles both structured and unstructured data in a single workflow.
🚀 What You’ll Learn
In this tutorial, you’ll learn how to build a Retrieval-Augmented Generation (RAG) agent using a graph database. We’ll cover how to ingest data into a graph database with Named Entity Recognition to create rich relationships, and then how to query those relationships to extract contextual snippets that drive better responses from a language model. Finally, you’ll see how to adapt the code to work with DigitalOcean’s GenAI Agent or 1-Click Models using an OpenAI-compatible API, providing a clear, step-by-step guide to combining structured graph data with powerful language generation.
🛠 What You’ll Need
To make the most out of this tutorial, you should ensure you have:
- A Linux or Mac-based developer’s laptop (Windows users should use a VM or cloud instance)
- Python installed: version 3.10 or higher
- (Recommended) A miniconda or venv virtual environment
- Docker (Linux or macOS) installed: for running a local Neo4j instance
- Basic familiarity with shell operations
- The dataset used in this tutorial. Source: BBC Full Text Document Classification
Why Choose Graph Databases for RAG?
RAG systems live and die by their ability to retrieve the right information. Vector stores are fast and excel at finding semantically similar passages, but they ignore the web of relationships that can matter in real-world data. For example, you might have customers, suppliers, orders, and products, each with relationships that go beyond text similarity. Graph databases track these links, letting you run multi-hop queries that answer more complex questions.
Another big benefit is transparency. Graph structures are easier to visualize and debug. If a model cites the wrong piece of information, you can trace the node and edge connections to see where it came from. This approach reduces hallucinations, increases trust, and helps developers fix issues quickly.
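To make the multi-hop idea concrete, here is a minimal sketch in plain Python (no database involved) of the kind of relationship traversal a graph database performs natively. The customer, order, product, and supplier names here are invented for illustration; in Neo4j the same question would be a single Cypher pattern such as `MATCH (c:Customer)-[:PLACED]->(:Order)-[:CONTAINS]->(:Product)-[:SUPPLIED_BY]->(s)`.

```python
# Toy in-memory graph: each edge is a (source, relation, target) triple.
edges = [
    ("Alice", "PLACED", "Order-1"),
    ("Order-1", "CONTAINS", "Widget"),
    ("Widget", "SUPPLIED_BY", "Acme Corp"),
]

def neighbors(node, relation):
    """Return all targets reachable from `node` via `relation`."""
    return [t for (s, r, t) in edges if s == node and r == relation]

def suppliers_for_customer(customer):
    """Multi-hop traversal: customer -> orders -> products -> suppliers."""
    suppliers = []
    for order in neighbors(customer, "PLACED"):
        for product in neighbors(order, "CONTAINS"):
            suppliers.extend(neighbors(product, "SUPPLIED_BY"))
    return suppliers

print(suppliers_for_customer("Alice"))  # ['Acme Corp']
```

A pure vector search over the same data could tell you which records *mention* Alice, but not which suppliers sit three hops away from her; that chain is exactly what the graph edges preserve.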
Step 1: Setup Project Dependencies
- Add the Python dependencies using pip:

pip install neo4j \
  requests \
  ctransformers \
  spacy \
  flask \
  openai

Then download the spaCy English model used for NER:

python -m spacy download en_core_web_sm
- Create a Neo4j graph database using Docker:

docker run \
    -d \
    --publish=7474:7474 --publish=7687:7687 \
    -v $HOME/neo4j/data:/data \
    -v $HOME/neo4j/logs:/logs \
    -v $HOME/neo4j/import:/var/lib/neo4j/import \
    -v $HOME/neo4j/plugins:/plugins \
    neo4j:5

By default, Neo4j starts with the credentials neo4j/neo4j and prompts you to set a new password on first login at http://localhost:7474; alternatively, you can set credentials up front by passing --env NEO4J_AUTH=neo4j/<your password> to docker run.
Step 2: Ingest The Dataset Into Our Graph Database
Before we query, we need to ingest. Below is a sample Python script that uses spaCy for NER and Neo4j as the storage layer. The script loops through the text files in the BBC dataset, tags the content with named entities, and creates connections in the database:
- Ingest the dataset into Neo4j using the Python application below.

import os
import uuid

import spacy
from neo4j import GraphDatabase

NEO4J_URI = "bolt://localhost:7687"
NEO4J_USER = "<YOUR USERNAME>"
NEO4J_PASSWORD = "<YOUR PASSWORD>"
DATASET_PATH = "./bbc"

def ingest_bbc_documents_with_ner():
    nlp = spacy.load("en_core_web_sm")
    driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD))

    with driver.session() as session:
        # Clear out any existing data before re-ingesting
        session.run("MATCH (n) DETACH DELETE n")

        for category in os.listdir(DATASET_PATH):
            category_path = os.path.join(DATASET_PATH, category)
            if not os.path.isdir(category_path):
                continue

            for filename in os.listdir(category_path):
                if filename.endswith(".txt"):
                    filepath = os.path.join(category_path, filename)
                    with open(filepath, "r", encoding="utf-8", errors="replace") as f:
                        text_content = f.read()

                    doc_uuid = str(uuid.uuid4())

                    # Create (or reuse) a Document node for this file
                    create_doc_query = """
                    MERGE (d:Document {doc_uuid: $doc_uuid})
                    ON CREATE SET
                        d.title = $title,
                        d.content = $content,
                        d.category = $category
                    RETURN d
                    """
                    session.run(
                        create_doc_query,
                        doc_uuid=doc_uuid,
                        title=filename,
                        content=text_content,
                        category=category
                    )

                    # Extract named entities and link them to the document
                    doc_spacy = nlp(text_content)
                    for ent in doc_spacy.ents:
                        # Skip very short, noisy entities
                        if len(ent.text.strip()) < 3:
                            continue

                        entity_uuid = str(uuid.uuid4())

                        # MERGE on (name, label) so repeated mentions
                        # reuse the same Entity node
                        merge_entity_query = """
                        MERGE (e:Entity {
                            name: $name,
                            label: $label
                        })
                        ON CREATE SET e.ent_uuid = $ent_uuid
                        RETURN e.ent_uuid AS eUUID
                        """
                        record = session.run(
                            merge_entity_query,
                            name=ent.text.strip(),
                            label=ent.label_,
                            ent_uuid=entity_uuid
                        ).single()
                        ent_id = record["eUUID"]

                        rel_query = """
                        MATCH (d:Document { doc_uuid: $docId })
                        MATCH (e:Entity { ent_uuid: $entId })
                        MERGE (d)-[:MENTIONS]->(e)
                        """
                        session.run(
                            rel_query,
                            docId=doc_uuid,
                            entId=ent_id
                        )

    print("Ingestion with NER complete!")

if __name__ == "__main__":
    ingest_bbc_documents_with_ner()
This code shows how to merge a Document node, link recognized entities, and store the full structure. You can swap in your own data, too. The core idea is that once these relationships exist, you can query them to get meaningful insights, rather than just retrieving text passages.
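If you want to sanity-check the shape of the graph before involving Neo4j, a small in-memory mock helps. This sketch mimics the MERGE semantics used above (create an entity node once per unique name/label pair, then attach one MENTIONS edge per document); the document titles and entities are invented for illustration:

```python
import uuid

documents = {}    # doc_uuid -> {"title": ..., "category": ...}
entities = {}     # (name, label) -> ent_uuid, the MERGE key used in the Cypher
mentions = set()  # (doc_uuid, ent_uuid) pairs; a set makes edges unique, like MERGE

def merge_entity(name, label):
    """Like Cypher MERGE: reuse the node if (name, label) already exists."""
    key = (name, label)
    if key not in entities:
        entities[key] = str(uuid.uuid4())
    return entities[key]

def ingest(title, category, ents):
    """Create a document node and MENTIONS edges for its entities."""
    doc_uuid = str(uuid.uuid4())
    documents[doc_uuid] = {"title": title, "category": category}
    for name, label in ents:
        mentions.add((doc_uuid, merge_entity(name, label)))
    return doc_uuid

# Two documents mentioning an overlapping entity:
ingest("article1.txt", "entertainment", [("Ernie Wise", "PERSON"), ("BBC", "ORG")])
ingest("article2.txt", "entertainment", [("Ernie Wise", "PERSON")])

print(len(entities))  # 2 distinct entities, even though "Ernie Wise" appears twice
print(len(mentions))  # 3 MENTIONS edges
```

The key property to notice is deduplication: the second mention of "Ernie Wise" reuses the existing entity node instead of creating a duplicate, which is exactly what makes the entity a useful join point between documents.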
Step 3: Query The RAG Agent Using Our Knowledge Graph
After ingesting your documents, you’ll want to ask questions. The next script extracts named entities from a user query, matches those entities against the Neo4j graph, and collects the top matching documents. Finally, it sends the combined context to a language model endpoint:
- Query the RAG agent using the Python application below.

import os

import openai
import spacy
from neo4j import GraphDatabase

NEO4J_URI = "bolt://localhost:7687"
NEO4J_USER = "<YOUR USERNAME>"
NEO4J_PASSWORD = "<YOUR PASSWORD>"

def connect_neo4j():
    return GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD))

def extract_entities_spacy(text, nlp):
    doc = nlp(text)
    return [
        (ent.text.strip(), ent.label_)
        for ent in doc.ents
        if len(ent.text.strip()) >= 3
    ]

def fetch_documents_by_entities(session, entity_texts, top_k=5):
    if not entity_texts:
        return []

    # Rank documents by how many of the query's entities they mention
    query = """
    MATCH (d:Document)-[:MENTIONS]->(e:Entity)
    WHERE toLower(e.name) IN $entity_list
    WITH d, count(e) AS matchingEntities
    ORDER BY matchingEntities DESC
    LIMIT $topK
    RETURN d.title AS title,
           d.content AS content,
           d.category AS category,
           matchingEntities
    """
    entity_list_lower = [txt.lower() for txt in entity_texts]
    results = session.run(query, entity_list=entity_list_lower, topK=top_k)

    docs = []
    for record in results:
        docs.append({
            "title": record["title"],
            "content": record["content"],
            "category": record["category"],
            "match_count": record["matchingEntities"]
        })
    return docs

def generate_answer(question, context):
    """
    Replaces the local LLM server call with a DigitalOcean GenAI Agent
    call, which is OpenAI API-compatible.
    """
    prompt = f"""You are given the following context from multiple documents:

{context}

Question: {question}

Please provide a concise answer.

Answer:
"""
    try:
        openai_client = openai.OpenAI(
            # base_url="<YOUR GENAI AGENT ENDPOINT>",
            # api_key="<YOUR GENAI AGENT KEY>",
        )
        completion = openai_client.chat.completions.create(
            model="n/a",
            messages=[
                {"role": "user", "content": prompt}
            ],
        )
        return completion.choices[0].message.content
    except Exception as e:
        print("Error calling the DigitalOcean GenAI Agent:", e)
        return "Error generating answer"

if __name__ == "__main__":
    user_query = "What do these articles say about Ernie Wise?"

    nlp = spacy.load("en_core_web_sm")
    recognized_entities = extract_entities_spacy(user_query, nlp)
    entity_texts = [ent[0] for ent in recognized_entities]

    driver = connect_neo4j()
    with driver.session() as session:
        docs = fetch_documents_by_entities(session, entity_texts, top_k=5)

        combined_context = ""
        for doc in docs:
            snippet = doc["content"][:300].replace("\n", " ")
            combined_context += (
                f"\n---\nTitle: {doc['title']} | Category: {doc['category']}\n"
                f"Snippet: {snippet}...\n"
            )

    final_answer = generate_answer(user_query, combined_context)
    print("RAG-based Answer:", final_answer)
The flow goes like this:
- Recognize entities in the user’s question with spaCy.
- Match those entities in Neo4j to find relevant documents.
- Concatenate snippets from those documents into a context string.
- Send the context and question to your language model.
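The retrieval-and-ranking steps above can also be sketched without a live database. This mock replicates the ordering the Cypher query produces (documents ranked by how many of the query’s entities they mention) and the snippet concatenation from the main script; the two documents are invented for illustration:

```python
# Mock documents, as if returned from the graph; `entities` stands in
# for the MENTIONS edges each document has in Neo4j.
docs = [
    {"title": "a.txt", "category": "entertainment",
     "content": "Ernie Wise made his career at the BBC...",
     "entities": {"ernie wise", "bbc"}},
    {"title": "b.txt", "category": "business",
     "content": "Quarterly results were announced today...",
     "entities": {"bbc"}},
]

def rank_documents(query_entities, docs, top_k=5):
    """Mirror the Cypher: count matching entities, order descending."""
    wanted = {e.lower() for e in query_entities}
    scored = [(len(d["entities"] & wanted), d) for d in docs]
    scored = [(n, d) for n, d in scored if n > 0]  # drop non-matches
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_k]]

def build_context(docs, max_chars=300):
    """Concatenate truncated snippets, as in the main script."""
    parts = []
    for d in docs:
        snippet = d["content"][:max_chars].replace("\n", " ")
        parts.append(
            f"---\nTitle: {d['title']} | Category: {d['category']}\n"
            f"Snippet: {snippet}..."
        )
    return "\n".join(parts)

top = rank_documents(["Ernie Wise", "BBC"], docs)
print(top[0]["title"])  # a.txt -- it mentions both query entities
```

Truncating each snippet keeps the combined context within the model’s input budget while still giving it enough of each matched document to ground the answer.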
This approach helps the model focus on precise information. Instead of searching a huge text index, you retrieve curated data based on structured relationships. That means higher-quality answers and a powerful way to handle complex queries that go beyond simple keyword matching.
To use a GenAI Agent or 1-Click Model as the LLM, you can simply uncomment the client configuration and fill in your agent’s endpoint and key:

openai_client = openai.OpenAI(
    # base_url="<YOUR GENAI AGENT ENDPOINT>",
    # api_key="<YOUR GENAI AGENT KEY>",
)

🤔 Final Thoughts
Graph databases add a new dimension to RAG workflows. They handle detailed relationships, reduce unhelpful answers, and let you trace how the system arrives at a conclusion. When you pair them with entity recognition and a large language model, you create a pipeline that captures nuance and context from your data.
With these code snippets, you have a starting point for building a robust RAG agent. Feel free to expand on this design by introducing your own data, adjusting the query logic, or experimenting with additional graph features. Whether you’re creating a customer-facing chatbot or an internal analytics tool, knowledge graphs can bring clarity and depth to your AI-driven experiences.