I recently spoke to a YarcData customer who has a very effective data warehouse that has been built up over a number of years. As with most data warehouses, they fine-tuned the data models and normalization of their data to optimize for the queries behind the KPIs, reports, and dashboards that their business users identified and prioritized. This data warehouse has been dimensionalized (is that a word?) across all the germane business parameters so their business users can run queries for various analyses along any one of those parameters.
The problem they faced was that the business users kept coming up with queries that completely flew in the face of their organization, layout, and dimensionalization of the data. These queries were initially called the “maverick queries” since they were completely off the wall and required join after join after join after join. Every time someone ran one of these “maverick queries”, it brought the entire data warehouse to its knees and made it unusable for the large number of users who needed the existing reports and dashboards. Consequently, these “maverick queries” became the “forbidden queries”, since no one was allowed to run them anymore!
The “maverick” users, being of a strong and determined disposition, started pulling extracts of the data warehouse into data marts to run their “forbidden queries”. This had its own challenges: they were working off a subset of the data, and setting up and running these data marts took quite a bit of time. The worst part of the forbidden queries is that they are not only ad hoc but also highly dynamic. The queries were constantly changing, and by the time you built a data mart for one set of questions, the business was on to another set of questions.
Enter Graph Analytics. By representing the same data in the data warehouse as a graph, the data is now queryable (I'm pretty sure that's not a word!) along any dimension, any relationship. The approach is to load the YarcData graph analytics appliance every night from the main data warehouse and then run the “forbidden queries” on the YarcData appliance during the day, thereby freeing up the data warehouse to focus on the critical reports and dashboards. The users who wanted the traditional dashboards and reports were happy since the data warehouse was always available for their business-critical operations. The “maverick” users who wanted the “forbidden queries” were ecstatic since they could now run them. Some of the data might be a day old, but that is a huge improvement over not being able to run the queries at all.
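To make the idea a little more concrete, here is a minimal sketch in plain Python of what "representing warehouse data as a graph" can look like. This is not the YarcData appliance or its API. The tables, column names, and helper functions (`rows_to_edges`, `build_graph`, `find_path`) are all hypothetical and exist only for illustration. Each non-key column becomes an edge, and a question that would need a long chain of joins becomes a simple traversal.

```python
from collections import defaultdict, deque

# Hypothetical warehouse rows; table and column names are illustrative only.
customers = [{"id": "c1", "region": "west"}, {"id": "c2", "region": "east"}]
orders = [
    {"id": "o1", "customer": "c1", "product": "p1"},
    {"id": "o2", "customer": "c2", "product": "p2"},
]
products = [{"id": "p1", "supplier": "s1"}, {"id": "p2", "supplier": "s1"}]


def rows_to_edges(table, rows):
    """Turn each non-key column of a row into a (subject, predicate, object) edge."""
    for row in rows:
        for column, value in row.items():
            if column != "id":
                yield (row["id"], f"{table}.{column}", value)


def build_graph(edges):
    """Index edges in both directions so a traversal can follow any relationship."""
    graph = defaultdict(list)
    for subj, pred, obj in edges:
        graph[subj].append((pred, obj))
        graph[obj].append((f"inverse:{pred}", subj))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search; returns the shortest chain of nodes, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for _, neighbor in graph[path[-1]]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


edges = [
    *rows_to_edges("customers", customers),
    *rows_to_edges("orders", orders),
    *rows_to_edges("products", products),
]
graph = build_graph(edges)

# "How are customers c1 and c2 connected?" In SQL this takes several joins.
path = find_path(graph, "c1", "c2")
print(path)
```

With this toy data, the two customers turn out to be linked through a shared supplier. Nothing in the traversal had to know ahead of time which tables or dimensions the question would cross, which is the property that makes the “forbidden queries” tractable.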
While we have seen a lot of usage of Graph Analytics in traditional graph problems, this was an interesting, non-traditional use case for Graph Analytics – enabling existing data warehouses to handle their “forbidden queries”. What are your “forbidden queries”? How do you handle them?

Back around 2004 to 2008, I worked on a project that took the approach of modeling data via ontologies using some proprietary methods. As I sought out more standard methods for data modeling, I found some of the new methods we find in today’s Semantic Web technology stack. I can recall many, many conversations on the topic of “fusion”, the term we used to describe determining that two data instances were equivalent (and the need to maximize fusion to search for linkages between instances). For example, “Osama” in one context might be considered exactly the same as “Usama” in another, or similar, context. The problem is daunting and represents one of the more difficult challenges facing the natural language processing field today. Those challenges are beyond the scope of today’s blog, but a very similar problem arises in RDF, and that is the topic for today.
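As a rough illustration of the bookkeeping side of fusion, the sketch below groups identifiers into equivalence classes once someone (or something) has already asserted that pairs of them are the same. Deciding that two instances are equivalent is the hard part and is not attempted here. The identifiers and the `find` and `fuse` helpers are hypothetical, and this is a plain union-find, not the method used on that project. In RDF, assertions of this kind are commonly written with `owl:sameAs`.

```python
def find(parent, item):
    """Return the representative of item's group, shortening the path as we go."""
    parent.setdefault(item, item)
    while parent[item] != item:
        parent[item] = parent[parent[item]]
        item = parent[item]
    return item


def fuse(pairs):
    """Group identifiers into equivalence classes from pairwise assertions."""
    parent = {}
    for a, b in pairs:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_b] = root_a
    groups = {}
    for item in parent:
        groups.setdefault(find(parent, item), set()).add(item)
    return list(groups.values())


# Hypothetical "these two instances are the same" assertions from three sources.
assertions = [
    ("source_a:Osama", "source_b:Usama"),
    ("source_b:Usama", "source_c:person_17"),
    ("source_a:Kabul", "source_c:place_4"),
]
fused = fuse(assertions)
print(fused)
```

Note that `source_a:Osama` and `source_c:person_17` end up in the same group even though no one asserted that pair directly. Following equivalence transitively like this is what lets a search find linkages between instances that no single source connects.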
There is a certain amount of misguided belief in the market that Semantic Web technologies simply aren’t performant enough for the needs of a business, and I often hear this presented as a reason for choosing a traditional RDBMS, or another technology such as a NoSQL solution, over these technologies.