Introducing Darth Vecdor: A Free, Open-Source Platform to Create Knowledge Graphs Using LLMs (such as ChatGPT)

Picture by ChatGPT from prompt by (and with some image editing by) Jonathan A. Handler

by Jonathan A. Handler, MD, FACEP, FAMIA

I Wanted a Comprehensive Medical Knowledge Graph

For decades, I have hypothesized that a comprehensive medical knowledge database (aka, “knowledge base” or “knowledge graph”) would enable radical positive transformation in healthcare. I wanted a database containing relationships between concepts, like:

  1. All the symptoms of every medical problem.
  2. All the tests appropriate for every medical problem.
  3. All the medication treatments for every medical problem.
  4. All the complications of every medical problem.
  5. All the types of specialties that diagnose and/or treat every medical problem.
  6. All the surgeries (if any) appropriate for every medical problem.
  7. The potential “gravity” of every medical problem.
  8. For each “diagnosis code”, whether it was a symptom, a diagnosis, or something else.
  9. Etc.

With such a database, I imagined we could use it to automatically determine when patients were failing to get a diagnosis, getting the wrong diagnosis, failing to get any therapy when therapy was appropriate, getting the wrong therapy, suffering complications from their diagnosis or therapy, and so much more. With a knowledge graph, I believed we could build systems to help accomplish what I call the “3 M’s” of clinical quality: measure, manage, and maximize the quality of care for every patient on every visit (and between visits!).

Over the years, I found many knowledge graphs. I found free ones, commercial ones, and even aggregations of knowledge graphs (like this one). However, none seemed to have everything I needed. They either lacked my desired relationships, had incomplete coverage within the relationships, were too expensive, had limitations on use, weren’t consistently (or at all) maintained, seemed impossible to use, or had some other “gotcha”.

Not Finding What I Needed, I Made Darth Vecdor

The Idea of Darth Vecdor

Many LLMs, especially the “foundational models” like ChatGPT and others, hold vast amounts of knowledge. As I explored LLMs, it hit me… maybe we could extract that knowledge and store it in a database to finally make the knowledge base I always wanted! In fact, maybe it could create almost any knowledge base. It seemed like it could be a better, more modern version of my earlier search engine web scraping idea!

So began my latest labor of love, Darth Vecdor. Building on the decades of my percolating ideas in this space, I worked on it for more than a year. Now, Darth Vecdor is a platform intended to build (hopefully useful) knowledge bases from LLM responses to prompts.

What Does Darth Vecdor Do?

Darth Vecdor is designed to be able to import virtually any terminology or set of concepts (e.g., diagnosis codes, lab tests, medications, surgeries, medical specialties, types of commercial vehicles, household appliances, etc.). Then Darth Vecdor can take any of those lists and repeatedly query an LLM, using each item on the list, to get virtually any relationship(s). For example, for every clinical problem, it can query for all of its treatments, and/or all of its symptoms, and/or all of the tests one can perform to assess it. For every type of commercial vehicle (e.g., dump truck, excavator, etc.), it can query for all of its possible uses. Darth Vecdor stores the relationships in a knowledge graph having both strings (words) and associated embeddings (vectors, or sets of numbers, intended to represent the “meaning”, or “semantics,” of the strings).
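To make the query loop above concrete, here is a minimal sketch in Python. It is not Darth Vecdor's actual code: the names `Triple`, `build_relationships`, and `fake_llm` are hypothetical, the LLM is replaced by a canned stub so the example runs offline, and embeddings are left out entirely.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Triple:
    """One subject-relationship-object row of a knowledge graph (hypothetical shape)."""
    subject: str
    relationship: str
    obj: str


def build_relationships(
    terms: Iterable[str],
    relationship: str,
    prompt_template: str,
    ask_llm: Callable[[str], list[str]],
) -> list[Triple]:
    """Query the LLM once per term and collect each answer as a triple."""
    triples = []
    for term in terms:
        prompt = prompt_template.format(term=term)
        for answer in ask_llm(prompt):
            # Normalize the answer string a little before storing it.
            triples.append(Triple(term, relationship, answer.strip().lower()))
    return triples


def fake_llm(prompt: str) -> list[str]:
    """Stand-in for a real LLM call; the answers are made up for illustration."""
    canned = {
        "dump truck": ["Hauling gravel", "Hauling soil"],
        "excavator": ["Digging trenches"],
    }
    for term, uses in canned.items():
        if term in prompt:
            return uses
    return []


triples = build_relationships(
    terms=["dump truck", "excavator"],
    relationship="has possible use",
    prompt_template="List the possible uses of this type of commercial vehicle: {term}",
    ask_llm=fake_llm,
)
for t in triples:
    print(t.subject, "|", t.relationship, "|", t.obj)
```

Swapping `fake_llm` for a function that calls a real LLM would leave the loop unchanged, which is the main point of the sketch.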

Using embeddings, Darth Vecdor can attempt to match the relationship “objects” back to virtually any coding system or terminology, such as SNOMED-CT or ICD-10, or the custom terms and codes used at a particular company. An end result could be every clinical problem in SNOMED-CT and the LLM-provided medication treatments for each of those problems, with each such medication mapped by Darth Vecdor into an RxNorm medication terminology code.
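As a rough illustration of embedding-based matching, the sketch below picks the coded term whose vector is closest to the vector of an LLM answer. Everything here is an assumption for illustration: the codes `CODE_A` and `CODE_B` are placeholders rather than real RxNorm or SNOMED-CT codes, the three-number vectors are toy values rather than real embeddings, and cosine similarity is simply one common closeness measure (the post does not say which measure Darth Vecdor uses).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(query_vec, coded_terms):
    """Return (code, similarity) for the coded term closest to query_vec."""
    scored = ((code, cosine_similarity(query_vec, vec))
              for code, vec in coded_terms.items())
    return max(scored, key=lambda pair: pair[1])


# Toy "embeddings" for two placeholder codes in a target terminology.
coded_terms = {
    "CODE_A": [0.9, 0.1, 0.0],
    "CODE_B": [0.0, 0.2, 0.9],
}
# Toy "embedding" of a string returned by the LLM.
llm_answer_vec = [0.8, 0.2, 0.1]

code, score = best_match(llm_answer_vec, coded_terms)
print(code, round(score, 3))
```

In practice one would probably also set a minimum similarity below which no match is accepted, since the closest code is not necessarily a correct one.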

You may have concerns (and rightly so!) about all the ways Darth Vecdor could fail or provide incomplete and/or incorrect information. Darth Vecdor has mechanisms intended to (hopefully!) avoid those pitfalls. These, and much more, are extensively discussed in my Darth Vecdor preprint. Will Darth Vecdor actually be able to produce useful, high quality knowledge graphs? I hope so! FWIW, so far, anecdotally, I’ve been impressed with the outputs it can produce. That already feels like a big advance over my prior efforts in this space. I am cautiously optimistic. With that said, see the many caveats in an upcoming section of this post, in the preprint, and on the Darth Vecdor GitHub site.

Darth Vecdor is designed to be extensible to work with nearly “any” LLM that can be repeatedly queried using Python code. So far, virtually all of this work was done using ChatGPT (via its API) as the LLM (though some rudimentary, early work with Darth Vecdor was done with at least one other LLM). Code will need to be written to work with other LLMs, but hopefully that code will be “easy” for me, or other programmers, to write.
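One common way to make a system work with many LLMs is to hide each one behind a small shared interface. The sketch below shows that general pattern only; `LLMClient`, `EchoClient`, and `run_prompts` are hypothetical names, and this is not a description of how Darth Vecdor's own extension points are written.

```python
from typing import Protocol


class LLMClient(Protocol):
    """Anything with a complete() method can act as an LLM here."""

    def complete(self, prompt: str) -> str:
        ...


class EchoClient:
    """Toy client that returns a fixed-format string instead of calling an LLM."""

    def complete(self, prompt: str) -> str:
        return f"(stub response to: {prompt})"


def run_prompts(client: LLMClient, prompts: list[str]) -> list[str]:
    """Send each prompt to whichever client was supplied."""
    return [client.complete(p) for p in prompts]


responses = run_prompts(EchoClient(), ["What treats headache?"])
print(responses[0])
```

Supporting a new LLM would then mean writing one more class with a `complete` method, leaving the rest of the code alone.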

Where Can You Get Darth Vecdor and More Info About It?

  • For more information about Darth Vecdor, see the preprint here.
  • The Darth Vecdor source code can be downloaded here. Make sure to read the license, warnings, caveats, and other information on the GitHub site and at the bottom of this post.

A Hypothetical Example of Darth Vecdor in Action

Let’s imagine the following:

You have downloaded the Darth Vecdor source code onto your computer. You have Python installed and know how to code in it, since Darth Vecdor was written in the Python programming language. You have also installed the free, open-source PostgreSQL (“Postgres”) database program and have it running on your computer (since Darth Vecdor requires it). You have also imported a list of the names of thousands of medical problems (like headache or myocardial infarction) into a table called “medical_problems” in your Postgres database. After reading the Darth Vecdor license and all of its caveats and warnings (and the warnings and caveats in this post farther down), you decide to proceed. You configure Darth Vecdor to run on your laptop. You then launch a browser on your laptop to connect to the Darth Vecdor system that is running on your laptop. You click the menu in the upper right corner to load up your “medical_problems” into Darth Vecdor as a “terminology,” then fill out the form on screen. Here is a partial hypothetical example screenshot:

After Darth Vecdor imports your terminology, you click the Darth Vecdor menu to get the form you can fill out to make relationships for every medical problem in your terminology. Here is a partial example screenshot of the configuration form you fill out to ask an LLM for every medication that treats each problem in your terminology:

Darth Vecdor has a test mode on this form that allows you to get the relationships for a sample set of terms without the results getting written to the database. Doing test runs prior to doing a full production run may help you adjust the prompts and other configurations to get your desired response quality from the LLM. A full run may take hours or days (or more!), depending on the size of your terminology and the configurations you set in the form. However, if it runs successfully, when complete, your laptop’s PostgreSQL Darth Vecdor database should now contain the medications that treat each problem in your terminology, at least according to the responses from the LLM. Are those responses correct, complete, and useful? That’s up to you to assess.

You can query your PostgreSQL database and get results that might look something like this (a few rows of a hypothetical example using aliased column names):

Problem     Medication Treatment
headache    acetaminophen
headache    ibuprofen
headache    naproxen
headache    sumatriptan
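For a feel of what such a query could look like, here is a small sketch. It rests on several assumptions: Python's built-in `sqlite3` module stands in for PostgreSQL so the example runs anywhere, and the `relationships` table, its column names, and the predicate string are hypothetical rather than Darth Vecdor's real schema. The rows are the hypothetical ones shown above.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE relationships (subject TEXT, predicate TEXT, object_term TEXT)"
)
conn.executemany(
    "INSERT INTO relationships VALUES (?, ?, ?)",
    [
        ("headache", "treated by medication", medication)
        for medication in ("acetaminophen", "ibuprofen", "naproxen", "sumatriptan")
    ],
)

# Aliased column names make the output read like the example rows.
rows = conn.execute(
    """
    SELECT subject AS problem, object_term AS medication_treatment
    FROM relationships
    WHERE predicate = ?
    ORDER BY subject, object_term
    """,
    ("treated by medication",),
).fetchall()

for problem, medication in rows:
    print(f"{problem:<10} {medication}")
```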

If the results don’t look correct or meet your needs, you might select your prior configuration for creating this relationship in Darth Vecdor (it should save the configurations and show the most recent one in the form for recall when needed), make some modifications, and run it again.

In the same run, or in additional runs, for each medical problem in your terminology, you can create prompts for virtually any other relationships you want, such as the causes of each of the medical problems, or the specialties that most often treat each of the medical problems. Will the results be correct, usable, and complete? That likely mostly depends on your prompt, the LLM, and your configurations. If the Darth Vecdor code itself proves a problem in getting the responses you need, it’s open source, so you can edit it! Again, you should validate that the results meet your needs before using them for any purpose.

Darth Vecdor has other functions. One is the ability to use queries to create code sets from a terminology you have loaded into Darth Vecdor, like “only the neoplasm (cancers) in the SNOMED-CT terminology” or “only the neoplasms (cancers) among ICD-10-CM diagnosis codes.”

When you create code sets, and when you create relationships, you can configure “expansion sets” to have the LLM get synonymous terms for the code set’s code strings (associated terms) or relationship strings (the responses returned by the LLM to your relationship prompt). From these, embeddings are created from various combinations of the original terms and/or the synonyms. The intent of these combination embeddings is to attempt to improve the likelihood that comparing two concepts by comparing their embeddings will give a correct response regarding how close they are to one another in “meaning.” This may help when using another of Darth Vecdor’s functions: matching the response from an LLM to one of your code set codes (like attempting to match “myocardial infarction” to the closest equivalent SNOMED-CT code).
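The post does not spell out how the combination embeddings are built, so the following is only one plausible sketch: averaging the vectors of a term and its synonyms. The averaging approach, the `mean_vector` helper, the combination names, and the two-number toy vectors are all assumptions for illustration.

```python
def mean_vector(vectors):
    """Element-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Toy "embeddings": one for the original term, two for LLM-supplied synonyms.
term_vec = [1.0, 0.0]
synonym_vecs = [[0.5, 0.5], [0.75, 0.25]]

# Several combinations, each of which could be stored and compared separately.
combos = {
    "term_only": term_vec,
    "synonyms_only": mean_vector(synonym_vecs),
    "term_plus_synonyms": mean_vector([term_vec] + synonym_vecs),
}

for name, vec in combos.items():
    print(name, vec)
```

Keeping several combinations side by side would let you test which one matches best for your terminology instead of guessing up front.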

Darth Vecdor also has a function that allows you to write a query that will be run on every code in a selected code set, with the results stored in one of your database tables. If you will need those results frequently, and generating them requires slow, computationally expensive queries, this is intended to allow you to “pre-run” those queries and materialize (store) the results, so that future queries for those results on the materialized table should be much, much faster.
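The “pre-run and store” idea can be sketched as follows. As before, this is an illustration under assumptions, not Darth Vecdor's implementation: `sqlite3` stands in for PostgreSQL, the `relationships` and `treatment_counts` tables and their columns are hypothetical, and a trivial count plays the role of the slow query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE relationships (subject TEXT, predicate TEXT, object_term TEXT)"
)
conn.executemany(
    "INSERT INTO relationships VALUES (?, ?, ?)",
    [
        ("headache", "treated by medication", medication)
        for medication in ("acetaminophen", "ibuprofen", "naproxen", "sumatriptan")
    ],
)

# Hypothetical code set; here the "codes" are just problem names.
code_set = ["headache", "myocardial infarction"]

# Table that will hold the pre-computed (materialized) results.
conn.execute(
    "CREATE TABLE treatment_counts (problem TEXT PRIMARY KEY, n_treatments INTEGER)"
)

# Run the per-code query once for every code and store each result.
for code in code_set:
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM relationships WHERE subject = ? AND predicate = ?",
        (code, "treated by medication"),
    ).fetchone()
    conn.execute("INSERT INTO treatment_counts VALUES (?, ?)", (code, count))

# Later lookups read the small stored table instead of re-running the query.
result = dict(conn.execute("SELECT problem, n_treatments FROM treatment_counts"))
print(result)
```

The trade-off is the usual one for stored results: lookups get faster, but the table has to be rebuilt whenever the underlying relationships change.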

Why the Name “Darth Vecdor”?

I named it Darth Vecdor because vectors provide an important underpinning to its code and its supporting technologies. I like Star Wars and I thought the name was clever when I came up with it. At the time I concocted it, I couldn’t find any hits on a Google search using that exact spelling (with quotes around it). My hope is to avoid confusion if something else were to have the same name, and even at the moment of writing this sentence (12/12/2025 at 5:30 pm CT) I tried it again and still got no hits.

Important Warnings, Caveats, and More

Nothing in this post should be construed as medical advice. While I hope Darth Vecdor can provide value in many areas, Darth Vecdor is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. Any use is entirely at your own discretion and risk (and, of course, the risk of those for whom you have responsibility). You need the appropriate expertise and associated due diligence to safely, effectively, and appropriately use Darth Vecdor. There is no assurance that Darth Vecdor or any of its outputs meets or will meet any or all needs for any use. I have discussed its potential use in the medical domain, but that does not imply it is safe or appropriate for any use in healthcare or any other use at all. Darth Vecdor is highly configurable, so suitability or unsuitability for any use also may relate to your configuration and use of the system. I tried hard to build it with quality, but you should assume it has serious bugs and design flaws, and the system surely lacks critical functionality. Depending on whether and how it, and/or its outputs, are used, Darth Vecdor and/or its outputs could lead to dangerous outcomes. It’s entirely up to you to assess and validate its suitability (or the suitability of its outputs) for any purpose whatsoever. If there are disclaimers that I should have put here but didn’t, please imagine that any and all such disclaimers are here.

Conclusion

I intend to post more about Darth Vecdor in the coming weeks, months, etc. For example, at some point I will need to make enough information available for someone to reasonably be able to configure and run it without either my help or a lot of work (or both!). 😀 I hope to have additional posts on some of my use cases, perhaps even associated prompts or other configurations I used to build the knowledge graphs I hope to power some of those use cases. I may even share some of the knowledge graphs themselves. However, at least for now, I wanted to share Darth Vecdor’s availability for any early explorers who may have interest in it.

Hopefully this can provide some value to you and your efforts to beneficially serve the world, whatever those efforts may be. As always, YMMV.

All opinions expressed here are entirely those of the author(s) and do not necessarily represent the opinions or positions of their employers (if any), affiliates (if any), or anyone else. The author(s) reserve the right to change his/her/their minds at any time.

6 responses to “Introducing Darth Vecdor: A Free, Open-Source Platform to Create Knowledge Graphs Using LLMs (such as ChatGPT)”

  1. tfischer16

    This is so cool, Jon! Thank you for building in public and sharing it for others!!

  2. tfischer16

    So cool, Jon!! Thank you for building in public and sharing this!

  3. Nick Orr

    I provide an online wellness program originally designed for individuals with pre-diabetes, which includes an AI-generated progress report over the 12 weeks of the class. I’m curious about possible use cases of your platform in this context.

    Could the data you’ve developed support prevention-focused programs like mine? Is there potential for real-world participant data (e.g., lifestyle changes, progress metrics) to feed into your database to help identify the most effective strategies for preventing Type 2 diabetes? Also, can the knowledge map your system creates be multi-dimensional, or is it limited to 2D?

    1. jonhandlermd

      Hi Nick, thanks for writing. The knowledge map the system creates is a knowledge graph generally in the form of a “triple store”. Each “tidbit” of information is stored in a database as a subject concept, predicate (or relationship), and object concept. For example, one “tidbit” of information (also known as a “triple” because it has these 3 components) might be:

      subject concept = “strawberry”
      relationship = “has color of”
      object concept = “red”

      Darth Vecdor is intended to use LLMs to build up a “knowledge base” or “knowledge graph” composed of a whole lot of these “triples”. This is sometimes therefore called a “triple store.” The triple store can be represented as a big “spider web” kind of thing in which every concept (subject or object) is called a “node” (think of it like a dot holding the concept, like “strawberry” or “red”) and every relationship is called an “edge”, represented by the line connecting two dots. You might see this “web” (or “graph”) displayed with the concept name next to or inside each node (or dot), and each line connecting nodes has the name of the relationship it represents right next to it. There are other ways to display this. The Wikipedia article on knowledge graphs has a picture that might help (https://en.wikipedia.org/wiki/Knowledge_graph).

      Generally, I have seen these graphs displayed as two-dimensional objects and I haven’t seen graphs typically referred to as “dimensional” (though this could be unveiling some ignorance I have on this). One could consider each different relationship type as a “dimension”, and the graphs could even be visually represented in this multi-dimensional way. I’m guessing that’s been done before. So, I guess it depends on what you mean by “multi-dimensional.” I would say the graphs can have as many dimensions as you desire, if you define each dimension as a different type of relationship.

      With regard to the possible use cases in the specific context of a wellness program for individuals with pre-diabetes, I think that really depends on the need. The knowledge base Darth Vecdor is designed to generate is knowledge extracted from a large language model, such as ChatGPT. This is more about making “known knowledge hopefully more usable” than it is about creating new knowledge from a new dataset as you describe. With that said, I can imagine potential, hypothetical use cases in the space you describe, but one would have to be very thoughtful about how the knowledge base would be tested and validated, and how it would be implemented to ensure safety and effectiveness. I hope to post some thoughts about how I’m thinking of using, or actually using, Darth Vecdor in the coming weeks, and perhaps that will spur some potentially useful ideas that might be relevant for your efforts, which are focused on a really important area.

      1. Nick Orr

        Thank you for taking time to reply. I read the Wiki and did some AI searching. I didn’t know how to state my use case so I let Gemini state it for me … “My use case involves merging clinical prediabetes data with Social Determinants of Health (SDoH) to identify health disparities. By overlaying this with a Behavior Change Intervention Ontology, we can perform precision intervention mapping. This allows us to identify not just the necessary life skills, but the optimal adult learning techniques (e.g., andragogical vs. task-oriented coaching) that will most effectively drive behavior modification for that specific individual’s social context.”

        This is why I mentioned multidimensional. I want to combine the best individualized life skill training with the most effective lifestyle modifications that have the biggest impact on the individual based on their situation. Not everyone can afford a life coach or a personal trainer. I want to reduce the gap of well-being and see possibilities in using AI to do so. Prediabetes is just an example and where I have chosen to begin. But I think other Health Disparities, like smoking, can also be addressed in the same way.

        Thank you again. Nick
