When finding answers from your own data feels like a PhD-level challenge… there’s a better way.

The Problem … why is it so hard just to ask a question?

Today, access to the knowledge locked in huge amounts of unstructured and structured data is essential, even for the most mundane aspects of life:

  • Can I install a 2m*3m garage at my home?
  • Who is managing the Chia account?
  • I want an overview of the XYZ project.
  • … and so on

With some notable exceptions, getting answers to your own questions from the knowledge sources that contain them probably requires a PhD. 

We show how it is possible to combine unstructured and structured data within a consolidated knowledge model, and then use LLMs to answer users’ own questions in their own language.

The Goals … to answer users’ questions about their own data, in their own language

My whole career has been spent in pursuit of enabling users to ask their own questions of information. There are several aspects of this pursuit:

  1. The information users want to question is available in a variety of formats, ranging from
    • highly unstructured (images etc), 
    • semi-unstructured (text, tables, etc), 
    • semi-structured (spreadsheets, document-stores), and 
    • highly-structured (relational stores, graph stores, etc), 
  2. Users want to ask questions in natural language, but they are offered:
    • SQL, SPARQL, etc., which require fluency in query languages
    • Lucene, Elasticsearch, etc., which require knowledge of the search syntax
    • Dashboards, which are limited and rigid

The Practice … is complicated

The solutions offered to users have fallen into two camps, driven by the format of the information.

  • If the information is in the form of unstructured documents, then the favored solution is some form of text search, such as Lucene or Elasticsearch.
    • Lexical search engines provide a user-friendly syntax for expressing queries, but getting the best results does require training in that syntax. The queries are not close to natural language.
  • If the information is already structured, say in a database (SQL, NoSQL, or whatever), then the favored solution is a database query engine.
    • Databases provide powerful structured query languages (SQL, SPARQL, Cypher, and so on), but these can scarcely be called user-friendly and are far from natural language. Thus most queries have to be written by specialists and published via some form of dashboard so that mere mortals can use them.

Thus the format within which the information is available dictates the language that users must use to ask their questions.
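To make the contrast concrete, here is the same hypothetical question expressed in a Lucene-style syntax and in SPARQL. The field names, the `ex:` prefix, and the IRIs are invented for illustration; they do not come from any real schema:

```python
# One information need, three very different user-facing languages.
question = "Which projects did Alice manage in 2023?"

# Lexical search (Lucene-style syntax): requires knowing the field names
# and the query operators.
lucene_query = 'manager:"Alice" AND year:2023 AND type:project'

# Graph query (SPARQL): requires fluency in the query language and in the
# dataset's schema (the ex: prefix here is an assumed, made-up namespace).
sparql_query = """
SELECT ?project WHERE {
  ?project a ex:Project ;
           ex:manager ex:Alice ;
           ex:year 2023 .
}
"""
```

Only the first line is something an untrained user could be expected to write.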

The Dream … ask anything, no PhD required

In reality, information comes in a mixture of formats: ranging from highly structured database tables, through spreadsheets, document tables, and document text, to images. The dream is:

  1. A consolidated knowledge model that can be populated with the full range of information, ranging from unstructured to structured.
  2. Users can query this Knowledge Model using their own language.
  3. The response can include any information from the Knowledge Model.
  4. The provenance of any response is fully explained to the users, allowing them to ‘drill’ into any response for greater detail.

The Solution … Knowledge Models and LLMs to the rescue

Imagine being able to ask these questions about the Kennedy family:

“Who is Jack Schlossberg?”

“Tell me about Rosemont College”

“How is Jack Schlossberg connected to Rosemont College”

We know the answers lie in a mixture of unstructured (text documents and pictures) and structured (database) data sources.

The questions, and the sources needed to answer them:

  • Who is Jack Schlossberg? (unstructured sources)
  • Tell me about Rosemont College (structured sources)
  • How is Jack Schlossberg indirectly connected to Rosemont College? (unstructured + structured sources)

Therefore, obtaining an answer to “How is Jack Schlossberg connected to Rosemont College” requires facts from unstructured and structured sources simultaneously.

However, let’s develop the querying of information with LLMs starting with unstructured, then structured, and finally a combination of both.

Example LLM Querying of Unstructured Information

OK, if I didn’t mention AI, LLM, RAG you probably wouldn’t have bothered opening this blog … with good reason. These technologies have transformed how we can solve such problems. 

A typical ‘unstructured’ text is the Kennedys Biography. So let’s apply LLM RAG as follows:

  • Specify the Kennedys Biography as a source of data.
  • Vector-index the document text.
  • A user query/question is submitted to the RAG-based application.
    • This uses natural language, achieving Goal 2.
  • The RAG engine retrieves the matching statements from the vector index.
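The retrieval step above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Dedoctive’s implementation: a toy bag-of-words “embedding” stands in for a real vector index, and the chunk texts are invented:

```python
# Minimal sketch of RAG retrieval: embed chunks, embed the question,
# rank by cosine similarity, and return the top-k chunks as LLM context.
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: term-frequency vector over lowercased words.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    # The top-k chunks become the context passed to the LLM
    # alongside the question itself.
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

# Invented chunks standing in for indexed biography text.
chunks = [
    "Jack Schlossberg is the son of Caroline Kennedy.",
    "Rosemont College is a college in Pennsylvania.",
    "The weather in Boston is often cold in winter.",
]
context = retrieve("Who is Jack Schlossberg?", chunks, k=1)
```

A real pipeline would use dense embeddings and an approximate-nearest-neighbor index, but the retrieve-then-prompt shape is the same.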

Let’s ask a simple question:

To which we get an answer as follows:

I’m happy with this answer, as it is much richer than a simple lexical search (Lucene, etc.), which would just return lexically matching sentences. The LLM has assembled the answer into a very readable narrative.

If I ask a question about something it does not know, then rather than hallucinating, Dedoctive (the framework introduced below) tells me there isn’t anything there:

Observations

  • Data is much more than groups or tables of facts. In the relational world we relate facts together; in the graph world we have a network of facts that are joined together.
  • The deficiencies of RAG-based analysis are discussed at length here.
    • Often ambiguous due to reliance on vector similarity
    • Struggles with multi-concept queries or logical connections
    • Only stores knowledge as vectors
    • Cannot connect structured and unstructured data

Example LLM Querying of Structured Information

The deficiencies of RAG and vector similarity are now well documented, especially in the case of structured information. So how can we make progress on achieving Goal 1: querying structured as well as unstructured data?

Dedoctive is a framework we have developed that creates a multi-index knowledge model of the supplied information. Since the information source can be structured as well as unstructured, we can use it to test typical questions against structured databases.

The Kennedys database is a sample database created by TopQuadrant to demonstrate features of its TopBraid Composer product. This database contains a comprehensive Kennedy family tree, in RDF format.

It’s about as structured as data gets, making it a great example for demonstrating an LLM querying structured data.
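To see just how structured this is, here is a toy sketch of reading N-Triples-style RDF lines into subject-predicate-object tuples. The IRIs below are invented stand-ins (not the actual Kennedys dataset), and a real pipeline would use an RDF library such as rdflib rather than string splitting:

```python
# Two made-up N-Triples-style statements: every fact is already an
# explicit subject-predicate-object triple.
ntriples = """\
<http://example.org/JFK> <http://example.org/is-son-of> <http://example.org/JosephPatrickKennedy> .
<http://example.org/JFK> <http://example.org/is-brother-of> <http://example.org/RFK> .
"""

def parse_line(line: str):
    # Drop the trailing " ." terminator, split on whitespace,
    # and strip the angle brackets from each IRI.
    s, p, o = line.rstrip(" .").split()[:3]
    return (s.strip("<>"), p.strip("<>"), o.strip("<>"))

triples = [parse_line(line) for line in ntriples.strip().splitlines()]
```

The point is that no extraction is needed: the data arrives as ready-made statements.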

For this example we used Dedoctive as follows:

  • Specify the Kennedys dataset as a source of data.
  • Load Kennedys into the Dedoctive knowledge model
    • Dedoctive creates semantic, lexical, and graphical indices of the submitted RDF.
  • A user query/question is submitted to Dedoctive
    • This uses natural language, achieving Goal 2
  • Dedoctive retrieves all of the relevant RDF statements from the knowledge model
    • Dedoctive uses a combination of semantic, lexical, and graphical search to fetch the ‘relevant’ statements
      • Semantic for closeness of knowledge fragments with the semantics of the question
      • Lexical for similarity of the words within a question to knowledge fragments
      • Graphical for relatedness of concepts within the question to knowledge fragment concepts.
  • These relevant RDF statements are used as the context for an LLM query along with the user’s question.
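The multi-index retrieval described above can be sketched as a weighted blend of three signals. The scorers, weights, facts, and entity list below are toy stand-ins, not Dedoctive’s actual ranking:

```python
# Toy blend of semantic, lexical, and graphical retrieval signals
# over "fact" statements (all scorers here are illustrative stand-ins).
import re

def tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))

def lexical_score(question: str, fact: str) -> float:
    # Word overlap (Jaccard) between question and fact.
    q, f = tokens(question), tokens(fact)
    return len(q & f) / len(q | f) if q | f else 0.0

def semantic_score(question: str, fact: str) -> float:
    # Placeholder for embedding similarity; reuses word overlap here.
    return lexical_score(question, fact)

def graphical_score(question: str, fact: str, entities: list[str]) -> float:
    # Fraction of known graph entities mentioned in both question and fact.
    shared = {e for e in entities
              if e.lower() in question.lower() and e.lower() in fact.lower()}
    return len(shared) / len(entities) if entities else 0.0

def rank(question, facts, entities, weights=(0.4, 0.3, 0.3)):
    ws, wl, wg = weights
    def score(fact):
        return (ws * semantic_score(question, fact)
                + wl * lexical_score(question, fact)
                + wg * graphical_score(question, fact, entities))
    return sorted(facts, key=score, reverse=True)

facts = [
    "Jack Schlossberg is-son-of Caroline Kennedy",
    "Caroline Kennedy is-daughter-of JFK",
    "Rosemont College is-located-in Pennsylvania",
]
entities = ["Jack Schlossberg", "Rosemont College", "Caroline Kennedy"]
ranked = rank("How is Jack Schlossberg connected to Rosemont College?", facts, entities)
```

The top-ranked statements become the LLM’s context, exactly as the chunks did in the unstructured case.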

Let’s explore this with a low-ball question:

To which Dedoctive, using the RDF statements from the Knowledge Model, responds as follows:

Not bad, but not a really challenging question. Note that it does provide the link to the RDF IRI.

What about the following:

To which we get the following response:

This shows that the LLM can capitalize on one of the main advantages of RDF graphs: we can navigate from one concept through many intermediate concepts to a target concept.

The LLM walks through the concepts in its answer and has conveniently provided links to the relevant IRIs within the graph model, so we can explore each IRI further.

Since we are querying graph data, it is reasonable to explore this answer (‘EXPLORE RESPONSE’) so that we can understand how it was constructed from the underlying RDF statements:

This demonstrates querying structured data (Goal 1) using natural language (Goal 2). But users want not just structured OR unstructured information; they want to query a combination of both.

Observations

  • It is possible to provide the LLM structured information as part of the context, in conjunction with the question, and get a well-reasoned answer.
  • Transforming the structured data into a graph form (subject-verb-object) aligns well with the chunking of narrative text into sentences.
  • With a suitable strategy for assembling the relevant ‘statements’ from the knowledge model, an LLM can answer complex structured-data questions, thus achieving Goals 1 and 2.
  • It handles multi-concept queries and logical connections, which are necessary when querying relational or graph information sources.

Example LLM Querying of Hybrid (Structured and Unstructured) Information

A far more realistic scenario is when a user wants answers from a hybrid of structured and unstructured information sources.

Now, exploring a family tree, especially the Kennedy family tree, is a popular occupation. Information will come from multiple sources, and in different formats. In this example we have a website with a detailed description of the Kennedy family, together with an RDF graph of the Kennedy family tree.

Although I refer to this website as ‘unstructured’, I would suggest it is typical of many factual texts in that it is a collection of ‘statements’ assembled as a narrative.

For example 

JFK, the son of Joseph Patrick Kennedy 

Could be the statement:

‘JFK’   ‘is-son-of’   ‘Joseph-Patrick-Kennedy’

Or equivalently

<Subject> <Verb> <Object>

Yes, you’ve guessed already where I am going with this😀
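As a toy illustration, the appositive phrase above can be converted mechanically into a triple. A real pipeline would use proper NLP parsing; the single regular-expression pattern here only handles this one phrase shape:

```python
import re

# Toy extraction of a subject-verb-object statement from an appositive
# phrase of the shape "X, the <relation> of Y" (illustrative only).
PATTERN = re.compile(r"^(?P<subj>[\w\s.]+?), the (?P<rel>\w+) of (?P<obj>[\w\s.]+)$")

def to_triple(phrase: str):
    m = PATTERN.match(phrase)
    if not m:
        return None
    subj = m.group("subj").strip().replace(" ", "-")
    obj = m.group("obj").strip().replace(" ", "-")
    return (subj, f"is-{m.group('rel')}-of", obj)

triple = to_triple("JFK, the son of Joseph Patrick Kennedy")
# triple is now ("JFK", "is-son-of", "Joseph-Patrick-Kennedy")
```

The output is exactly the statement form shown above, ready to sit alongside triples that came from an RDF graph.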

As with the structured information source example, we load this into Dedoctive’s knowledge model, the only difference being that we also index the website and its contents (images, etc.).

Let’s explore this with a typical family tree question: how are things related? These questions are easy to ask, but they require the LLM to discover a path between concepts:

To which we get a nicely detailed answer:

As before, each part of the answer is qualified with references to the source of facts. However, now those sources are a combination of the structured information in the graph, and the unstructured information from the website.

Additionally, we can explore the same path that the LLM took through these facts using ‘EXPLORE RESPONSE’ as shown below:

Observations

  • The multi-index knowledge model accommodates the extremes of unstructured and structured information sources
  • Query answers connect structured and unstructured data into harmonized results
  • It handles multi-concept queries and logical connections that span both the unstructured and structured information sources

Why … does an Unstructured and Structured Knowledge Model work with LLMs?

Querying unstructured texts with an LLM, or asking an LLM to construct a structured data query such as SQL, is a ‘solved’ problem. However, that is not the approach presented here. 

In this approach we get the LLM to assemble an answer to a query from a combination, aka ‘federation’, of unstructured and structured information. So how can we unify the treatment of knowledge across these extremes of data? 

Dedoctive uses a common knowledge model to capture the facts within the information sources. 

  • Unstructured information, such as documents, diagrams, and images, is converted to narrative text, which is then transformed into parts-of-speech statements: subject-verb-object
  • Structured information, such as tables, spreadsheets, and databases, is converted to directed graphs, which are then transformed into graph statements: subject-verb-object

Thus we unify the treatment of unstructured and structured information by transforming everything into subject-verb-object form. Therefore, when Dedoctive is asked to retrieve ‘fact’ statements from its knowledge model, they are drawn indirectly from both unstructured and structured data sources.
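A sketch of what such a unified store looks like, with invented converters: a sentence already reduced to subject-verb-object sits alongside triples derived from a hypothetical database row, and both are queried the same way:

```python
# Toy unified "fact" store fed from both kinds of source (the converters,
# table name, and row values are made up for illustration).

# From an unstructured sentence, already reduced to subject-verb-object:
text_facts = [("JFK", "is-son-of", "Joseph-Patrick-Kennedy")]

# From a structured table row: each (column, value) pair becomes a triple
# whose subject is the row's key.
def row_to_triples(table: str, key: str, row: dict) -> list[tuple]:
    return [(f"{table}/{key}", f"has-{col}", str(val)) for col, val in row.items()]

db_facts = row_to_triples("person", "JFK", {"birthYear": 1917, "spouse": "Jacqueline"})

# Both end up in one knowledge model, queried uniformly by subject.
knowledge_model = text_facts + db_facts

def facts_about(subject_fragment: str) -> list[tuple]:
    return [t for t in knowledge_model if subject_fragment in t[0]]
```

Once everything is a triple, a retrieval call like `facts_about("JFK")` neither knows nor cares which side of the unstructured/structured divide a fact came from.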

The Conclusion: Real Answers, No PhD Required

We’ve all been there—drowning in our own data, unsure how to even begin asking the right question. Until now, doing anything meaningful with that data usually meant brushing up on query languages, search syntax, or relying on dashboards someone else built.

But it doesn’t have to be that way.

What we’ve explored here is a new approach: combining structured and unstructured information into a single, unified knowledge model, then using LLMs to query it in natural language. The result? You ask questions in your own words—and get clear, traceable answers that pull from across your entire data landscape.

It all comes down to one simple shift: transforming everything—whether it’s documents, databases, spreadsheets, or graphs—into a common subject-verb-object format. That unification makes it possible to reason across formats, follow connections, and generate responses that are both accurate and easy to explore.

This isn’t just about AI. It’s about finally making your own data accessible to you—without needing to be a data scientist.

If that sounds like the future you want to build (or use), we’re already there.
Start asking better questions. Get better answers. No PhD required.

