Search engines are currently based on Natural Language Processing (NLP), or more precisely on a tiny subset of NLP called “keyword” search. This technology has delivered quick information to millions of people over many years, but it has always fallen short when users need “exact answers” to ad hoc information requests.
Many NLP researchers believe that Semantics and Ontology may provide more exact answers through what is called Semantic Search. Note that the recent success of IBM’s Watson in winning a Jeopardy! contest against human Jeopardy! experts was attributed to its semantic ability to understand questions. But if this is the “only” success for Semantic Search, why is Semantic Search so difficult?
There are four main problems with Semantic Search when using Natural Language as the sole User input:
1. The lack of any Visual component to “pure” NL:
- Basic User Interface (UI) design has a “first principle”, usually reduced to the phrase “recognition is better than recall”. This principle is the basis for “lookups” that users can select from instead of trying to remember a name (even state abbreviations need lookups). Without a visual component, the user may use a different name for a term than the data source does (“account” instead of “customer”).
- Using Natural Language information requests introduces spelling errors and typos.
2. The information requested is “not in the data source”:
- The subject area is not specified (so the same term can apply to phrases in more than one subject area).
- The subject area is correct, but the data source(s) available to the search product don’t contain the correct “granularity” of data to produce the answer (Example: a database trivia question may ask for the age of players, but the source database doesn’t contain player dates of birth).
3. Generic phrases are used in the NL request:
- The user may use a generic or abstract term, but there is no inference path connecting that generic term to a concrete term that has “values” in the data source.
4. The Semantic Search product is based on unstructured or semi-structured data sources:
- Exact answers tend to “live” in well-structured databases, which support a first-order query language capable of generating a query that produces a precise answer.
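To make a few of these problems concrete, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not how any particular Semantic Search product works: the `SYNONYMS` vocabulary, the `customer` table, and the helper names `resolve_term`, `has_column`, and `count_rows` are all hypothetical. It shows a mistyped synonym being resolved to a known table (problem 1), a check for whether the data source even holds the requested field (problem 2), and a structured query returning one precise answer (problem 4).

```python
import difflib
import sqlite3

# Hypothetical vocabulary: words a user might type, mapped to a table name.
SYNONYMS = {"customer": "customer", "account": "customer", "client": "customer"}


def resolve_term(word):
    """Map a user's word to a known table name, tolerating small typos."""
    matches = difflib.get_close_matches(
        word.lower(), list(SYNONYMS), n=1, cutoff=0.8
    )
    return SYNONYMS[matches[0]] if matches else None


def has_column(conn, table, column):
    """Report whether the table holds the requested field at all."""
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


def count_rows(conn, table):
    """Run an exact count query; pass only table names taken from SYNONYMS."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


# Build a tiny in-memory relational database for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customer (name) VALUES (?)", [("Acme",), ("Globex",)]
)

table = resolve_term("acount")  # a misspelling of "account"
if table is not None:
    print(table, count_rows(conn, table))
    print(has_column(conn, table, "date_of_birth"))
```

With the two sample rows above, the misspelled request resolves to the `customer` table and the count query returns exactly 2, while the `date_of_birth` check comes back false, which mirrors the "not in the data source" case. A fuzzy string match is only a crude stand-in for the lookups a visual interface would offer; it does nothing for generic terms that need real inference (problem 3).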
Note: as a disclaimer, I am bringing a Semantic Search product to market that couples NL with a unique Visual Request Specification. The product can extract exact answers from any relational database.