
by Jonathan A. Handler, MD, FACEP, FAMIA
I Wanted a Comprehensive Medical Knowledge Graph
For decades, I have hypothesized that a comprehensive medical knowledge database (aka, “knowledge base” or “knowledge graph”) would enable radical positive transformation in healthcare. I wanted a database containing relationships between concepts, like:
- All the symptoms of every medical problem.
- All the tests appropriate for every medical problem.
- All the medication treatments for every medical problem.
- All the complications of every medical problem.
- All the types of specialties that diagnose and/or treat every medical problem.
- All the surgeries (if any) appropriate for every medical problem.
- The potential “gravity” of every medical problem.
- For each “diagnosis code”, whether it was a symptom, a diagnosis, or something else.
- Etc.
With such a database, I imagined we could use it to automatically determine when patients were failing to get a diagnosis, getting the wrong diagnosis, failing to get any therapy when therapy was appropriate, getting the wrong therapy, suffering complications from their diagnosis or therapy, and so much more. With a knowledge graph, I believed we could build systems to help accomplish what I call the “3 M’s” of clinical quality: measure, manage, and maximize the quality of care for every patient on every visit (and between visits!).
Over the years, I found many knowledge graphs. I found free ones, commercial ones, and even aggregations of knowledge graphs (like this one). However, none seemed to have everything I needed. They either lacked my desired relationships, had incomplete coverage within the relationships, were too expensive, had limitations on use, weren’t consistently (or at all) maintained, seemed impossible to use, or had some other “gotcha”.
Not Finding What I Needed, I Made Darth Vecdor
The Idea of Darth Vecdor
Many LLMs, especially the “foundational models” like ChatGPT and others, hold vast amounts of knowledge. As I explored LLMs, it hit me… maybe we could extract that knowledge and store it in a database to finally make the knowledge base I always wanted! In fact, maybe it could create almost any knowledge base. It seemed like it could be a better, more modern version of my earlier search engine web scraping idea!
So began my latest labor of love, Darth Vecdor. Building on the decades of my percolating ideas in this space, I worked on it for more than a year. Now, Darth Vecdor is a platform intended to build (hopefully useful) knowledge bases from LLM responses to prompts.
What Does Darth Vecdor Do?
Darth Vecdor is designed to be able to import virtually any terminology or set of concepts (e.g., diagnosis codes, lab tests, medications, surgeries, medical specialties, types of commercial vehicles, household appliances, etc.). Then Darth Vecdor can take any of those lists and repeatedly query an LLM, using each item on the list, to get virtually any relationship(s). For example, for every clinical problem, it can query for all of its treatments, and/or all of its symptoms, and/or all of the tests one can perform to assess it. For every type of commercial vehicle (e.g., dump truck, excavator, etc.), it can query for all of its possible uses. Darth Vecdor stores the relationships in a knowledge graph having both strings (words) and associated embeddings (vectors, or sets of numbers, intended to represent the “meaning”, or “semantics,” of the strings).
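To make the "query an LLM once per term" idea concrete, here is a minimal sketch. It is not Darth Vecdor's actual code: the function names, the prompt wording, the `treated_by` label, and the canned `fake_llm` stand-in (used so the example runs without an API key) are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; returns a canned, comma-separated answer."""
    canned = {"headache": "acetaminophen, ibuprofen"}
    for term, answer in canned.items():
        if term in prompt:
            return answer
    return ""  # the stand-in knows nothing about other terms

def build_relationships(
    terms: List[str],
    relationship: str,
    prompt_template: str,
    ask_llm: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Ask the LLM about each term and split its answer into (subject, relationship, object) rows."""
    triples = []
    for term in terms:
        response = ask_llm(prompt_template.format(term=term))
        for obj in response.split(","):
            obj = obj.strip()
            if obj:  # skip empty pieces, e.g. when the LLM returned nothing
                triples.append((term, relationship, obj))
    return triples

triples = build_relationships(
    terms=["headache", "insomnia"],
    relationship="treated_by",
    prompt_template="List the medications that treat {term}, separated by commas.",
    ask_llm=fake_llm,
)
for row in triples:
    print(row)
```

In a real run, `fake_llm` would be swapped for a function that calls your chosen LLM, and the rows would be written to a database rather than printed.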
Using embeddings, Darth Vecdor can attempt to match the relationship “objects” back to virtually any coding system or terminology, such as SNOMED-CT or ICD-10, or the custom terms and codes used at a particular company. An end result could be every clinical problem in SNOMED-CT and the LLM-provided medication treatments for each of those problems, with each such medication mapped by Darth Vecdor into an RxNorm medication terminology code.
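As a rough illustration of embedding-based matching, the sketch below picks the coded term whose vector is closest (by cosine similarity) to the vector of an LLM response. The three-number vectors and the placeholder codes are invented for the example; real embeddings come from an embedding model and have many more dimensions, and Darth Vecdor's own matching logic may well differ.

```python
import math
from typing import Dict, List, Tuple

def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def best_match(query_vec: List[float], code_vectors: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return the (code, similarity) pair with the highest similarity to the query vector."""
    scored = ((code, cosine_similarity(query_vec, vec)) for code, vec in code_vectors.items())
    return max(scored, key=lambda pair: pair[1])

# Made-up vectors keyed by made-up placeholder codes (not real RxNorm codes).
code_vectors = {
    "PLACEHOLDER_CODE_A": [0.9, 0.1, 0.0],
    "PLACEHOLDER_CODE_B": [0.0, 0.2, 0.9],
}
response_vector = [0.8, 0.2, 0.1]  # pretend embedding of an LLM-provided medication name

code, score = best_match(response_vector, code_vectors)
print(code, round(score, 3))
```

The closest vector is not guaranteed to be a correct match, which is one reason a similarity threshold or a human review step may be worth considering.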
You may have concerns (and rightly so!) about all the ways Darth Vecdor could fail or provide incomplete and/or incorrect information. Darth Vecdor has mechanisms intended to (hopefully!) avoid those pitfalls. These, and much more, are extensively discussed in my Darth Vecdor preprint. Will Darth Vecdor actually be able to produce useful, high quality knowledge graphs? I hope so! FWIW, so far, anecdotally, I’ve been impressed with the outputs it can produce. That already feels like a big advance over my prior efforts in this space. I am cautiously optimistic. With that said, see the many caveats in an upcoming section of this post, in the preprint, and on the Darth Vecdor GitHub site.
Darth Vecdor is designed to be extensible to work with nearly “any” LLM that can be repeatedly queried using Python code. So far, virtually all of this work was done using ChatGPT (via its API) as the LLM (though some rudimentary, early work with Darth Vecdor was done with at least one other LLM). Code will need to be written to work with other LLMs, but hopefully that code will be “easy” for me, or other programmers, to write.
Where Can You Get Darth Vecdor and More Info About It?
- For more information about Darth Vecdor, see the preprint here.
- The Darth Vecdor source code can be downloaded here. Make sure to read the license, warnings, caveats, and other information on the GitHub site and at the bottom of this post.
A Hypothetical Example of Darth Vecdor in Action
Let’s imagine the following:
You have downloaded the Darth Vecdor source code onto your computer. You have Python installed and know how to code in it, since Darth Vecdor was written in the Python programming language. You have also installed the free, open-source PostgreSQL (“Postgres”) database program on your computer (since Darth Vecdor requires it). You have also imported a list of the names of thousands of medical problems (like headache or myocardial infarction) into a table called “medical_problems” in your Postgres database. After reading the Darth Vecdor license and all of its caveats and warnings (and the warnings and caveats in this post farther down), you decide to proceed. You configure Darth Vecdor to run on your laptop. You then launch a browser on your laptop to connect to the Darth Vecdor system that is running on your laptop. You click the menu in the upper right corner to load up your “medical_problems” into Darth Vecdor as a “terminology,” then fill out the form on screen. Here is a partial hypothetical example screenshot:

After Darth Vecdor imports your terminology, you click the Darth Vecdor menu to get the form you can fill out to make relationships for every medical problem in your terminology. Here is a partial example screenshot of the configuration form you fill out to ask an LLM for every medication that treats each problem in your terminology:

Darth Vecdor has a test mode on this form that allows you to get the relationships for a sample set of terms without the results getting written to the database. Doing test runs prior to doing a full production run may help you adjust the prompts and other configurations to get your desired response quality from the LLM. A full run may take hours or days (or more!), depending on the size of your terminology and the configurations you set in the form. However, if it runs successfully, when complete, your laptop’s PostgreSQL Darth Vecdor database should now contain the medications that treat each problem in your terminology, at least according to the responses from the LLM. Are those responses correct, complete, and useful? That’s up to you to assess.
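The test-mode idea can be sketched in a few lines. This is an illustration only, not how Darth Vecdor implements it: the `run_relationships` function, its parameters, the Python list standing in for the database, and the lambda standing in for the LLM are all hypothetical.

```python
import random
from typing import Callable, List, Tuple

def run_relationships(
    terms: List[str],
    ask_llm: Callable[[str], str],
    store: List[Tuple[str, str]],
    test_mode: bool = False,
    sample_size: int = 2,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Query the LLM per term; in test mode, use only a sample and never write to the store."""
    if test_mode:
        rng = random.Random(seed)  # seeded so repeated test runs use the same sample
        terms = rng.sample(terms, min(sample_size, len(terms)))
    results = []
    for term in terms:
        answer = ask_llm(term)
        results.append((term, answer))
        if not test_mode:
            store.append((term, answer))  # only a production run persists results
    return results

database: List[Tuple[str, str]] = []  # stand-in for a real database table
terms = ["headache", "fever", "cough", "nausea"]

def stub_llm(term: str) -> str:
    """Hypothetical stand-in for a real LLM call."""
    return f"<LLM answer for {term}>"

preview = run_relationships(terms, stub_llm, database, test_mode=True)
print("test run:", preview, "| rows stored:", len(database))

full = run_relationships(terms, stub_llm, database)
print("full run rows stored:", len(database))
```

Reviewing the preview before the full run is the cheap step; the full run is where the time and LLM cost go.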
You can query your PostgreSQL database and get results that might look something like this (a few rows of a hypothetical example using aliased column names):
| Problem | Medication Treatment |
| --- | --- |
| headache | acetaminophen |
| headache | ibuprofen |
| headache | naproxen |
| headache | sumatriptan |
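A query producing rows like those above might look roughly like the following sketch. Two loud assumptions: the table and column names (`relationships`, `subject`, `predicate`, `object`) are invented here and are not Darth Vecdor's actual schema, and Python's built-in in-memory `sqlite3` is used in place of PostgreSQL so the example runs without a database server.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the schema below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE relationships (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO relationships VALUES (?, ?, ?)",
    [
        ("headache", "treated_by", "acetaminophen"),
        ("headache", "treated_by", "ibuprofen"),
        ("headache", "treated_by", "naproxen"),
        ("headache", "treated_by", "sumatriptan"),
        ("headache", "other_relationship", "placeholder"),  # filtered out by the query
    ],
)

# Alias the generic columns to friendlier names, as in the example table.
cursor = conn.execute(
    'SELECT subject AS "Problem", object AS "Medication Treatment" '
    "FROM relationships WHERE predicate = ? ORDER BY object",
    ("treated_by",),
)
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(columns)
for row in rows:
    print(row)
```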
If the results don’t look correct or meet your needs, you might select your prior configuration for creating this relationship in Darth Vecdor (it should save the configurations and show the most recent one in the form for recall when needed), make some modifications, and run it again.
In the same run, or in additional runs, for each medical problem in your terminology, you can create prompts for virtually any other relationships you want, such as the causes of each of the medical problems, or the specialties that most often treat each of the medical problems. Will the results be correct, usable, and complete? That likely mostly depends on your prompt, the LLM, and your configurations. If the Darth Vecdor code itself proves a problem in getting the responses you need, it’s open source, so you can edit it! Again, you should validate that the results meet your needs before using them for any purpose.
Darth Vecdor has other functions. One is the ability to use queries to create code sets from a terminology you have loaded into Darth Vecdor, like “only the neoplasms (cancers) in the SNOMED-CT terminology” or “only the neoplasms (cancers) among ICD-10-CM diagnosis codes.”
When you create code sets, and when you create relationships, you can configure “expansion sets” to have the LLM get synonymous terms for the code set’s code strings (associated terms) or relationship strings (the responses returned by the LLM to your relationship prompt). From these, embeddings are created from various combinations of the original terms and/or the synonyms. The intent of these combination embeddings is to improve the likelihood that comparing two concepts by comparing their embeddings will give a correct response regarding how close they are to one another in “meaning.” This may help when using another of Darth Vecdor’s functions: matching the response from an LLM to one of your code set codes (like attempting to match “myocardial infarction” to the closest equivalent SNOMED-CT code).
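Here is a toy sketch of why combining a term with its synonyms might help matching. Everything in it is an assumption for illustration: the synonyms are hard-coded rather than LLM-provided, the three combination names are invented, and `toy_embed` is a crude word-count vector standing in for a real embedding model (which captures meaning far better than shared words do).

```python
import math
from collections import Counter
from typing import Dict, List

def toy_embed(text: str) -> Counter:
    """Crude stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().replace(";", " ").split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def combination_texts(term: str, synonyms: List[str]) -> Dict[str, str]:
    """Build a few text combinations of a term and its synonyms to embed separately."""
    return {
        "term_only": term,
        "term_plus_synonyms": "; ".join([term] + synonyms),
        "synonyms_only": "; ".join(synonyms),
    }

combos = combination_texts("myocardial infarction", ["heart attack", "MI"])
response_vec = toy_embed("heart attack")  # pretend this string came back from the LLM

scores = {name: cosine(response_vec, toy_embed(text)) for name, text in combos.items()}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

With this toy embedding, the term alone shares no words with "heart attack" and scores zero, while the combinations that include synonyms score higher. Real embeddings would not start from zero, but the same intuition applies.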
Darth Vecdor also has a function that allows you to write a query that will be run on every code in a selected code set, with the results stored in one of your database tables. If you will need those results frequently, and generating them requires slow, computationally expensive queries, this is intended to allow you to “pre-run” those queries and materialize (store) the results, so that future queries against the materialized table should be much, much faster.
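The pre-run-and-store pattern can be sketched as follows. As before, this is a hedged illustration and not Darth Vecdor's code: the table names, columns, and placeholder codes are invented, the "expensive" query is just a simple count, and in-memory `sqlite3` stands in for PostgreSQL so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for PostgreSQL; all names are hypothetical
conn.executescript(
    """
    CREATE TABLE code_set (code TEXT PRIMARY KEY);
    CREATE TABLE relationships (code TEXT, object TEXT);
    INSERT INTO code_set VALUES ('P1'), ('P2');
    INSERT INTO relationships VALUES ('P1', 'a'), ('P1', 'b'), ('P2', 'c');
    """
)

# Table that will hold the pre-computed (materialized) results.
conn.execute("CREATE TABLE object_counts (code TEXT PRIMARY KEY, n_objects INTEGER)")

# Run the per-code query once for every code in the code set and store each result.
codes = [row[0] for row in conn.execute("SELECT code FROM code_set ORDER BY code")]
for code in codes:
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM relationships WHERE code = ?", (code,)
    ).fetchone()
    conn.execute("INSERT INTO object_counts VALUES (?, ?)", (code, count))
conn.commit()

# Later lookups read the stored table instead of recomputing.
materialized = conn.execute(
    "SELECT code, n_objects FROM object_counts ORDER BY code"
).fetchall()
print(materialized)
```

One trade-off to keep in mind: stored results go stale if the underlying relationships change, so the materialization would need to be re-run after updates.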
Why the Name “Darth Vecdor”?
I named it Darth Vecdor because vectors provide an important underpinning to its code and its supporting technologies. I like Star Wars and I thought the name was clever when I came up with it. At the time I concocted it, I couldn’t find any hits on a Google search using that exact spelling (with quotes around it). My hope is to avoid confusion with anything else that might have the same name. Even at the moment of writing this sentence (12/12/2025 at 5:30 pm CT), I tried the search again and still got no hits.
Important Warnings, Caveats, and More
Nothing in this post should be construed as medical advice. While I hope Darth Vecdor can provide value in many areas, Darth Vecdor is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. Any use is entirely at your own discretion and risk (and, of course, the risk of those for whom you have responsibility). You need the appropriate expertise and associated due diligence to safely, effectively, and appropriately use Darth Vecdor. There is no assurance that Darth Vecdor or any of its outputs meets or will meet any or all needs for any use. I have discussed its potential use in the medical domain, but that does not imply it is safe or appropriate for any use in healthcare or any other use at all. Darth Vecdor is highly configurable, so suitability or unsuitability for any use also may relate to your configuration and use of the system. I tried hard to build it with quality, but you should assume it has serious bugs and design flaws, and the system surely lacks critical functionality. Depending on whether and how it, and/or its outputs, are used, Darth Vecdor and/or its outputs could lead to dangerous outcomes. It’s entirely up to you to assess and validate its suitability (or the suitability of its outputs) for any purpose whatsoever. If there are disclaimers that I should have put here but didn’t, please imagine that any and all such disclaimers are here.
Conclusion
I intend to post more about Darth Vecdor in the coming weeks, months, etc. For example, at some point I will need to make enough information available for someone to reasonably be able to configure and run it without either my help or a lot of work (or both!). 😀 I hope to have additional posts on some of my use cases, perhaps even associated prompts or other configurations I used to build the knowledge graphs I hope to power some of those use cases. I may even share some of the knowledge graphs themselves. However, at least for now, I wanted to share Darth Vecdor’s availability for any early explorers who may have interest in it.
Hopefully this can provide some value to you and your efforts to beneficially serve the world, whatever those efforts may be. As always, YMMV.
All opinions expressed here are entirely those of the author(s) and do not necessarily represent the opinions or positions of their employers (if any), affiliates (if any), or anyone else. The author(s) reserve the right to change his/her/their minds at any time.