Why is Semantic Search so difficult?

May 24th, 2011

Search engines today rely on a tiny subset of Natural Language Processing (NLP) known as “keyword” search. This technology has delivered quick information to millions of people for many years, but it has always fallen short when asked for “exact answers” to ad hoc information requests.

Many NLP researchers believe that Semantics and Ontology may provide more exact answers through what is called Semantic Search. The recent victory of IBM’s Watson over human Jeopardy experts was attributed to its semantic ability to understand questions. But if this is the “only” success for Semantic Search, why is Semantic Search so difficult?

There are four main problems with Semantic Search when using Natural Language as the sole User input:

1. The lack of any Visual component to “pure” NL:

  • Basic User Interface (UI) design has a “first principle”, usually reduced to the phrase “recognition is better than recall”. This principle is the basis for “lookups” that users can select from instead of trying to remember a name (even state abbreviations need lookups). Without a visual component, the user may refer to a concept by a different name than the data source uses (“account” instead of “customer”).
  • Using Natural Language information requests introduces spelling errors and typos.
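
As a rough illustration of both bullets, here is a minimal Python sketch of what a search product has to do when there is no lookup to pick from: map the user's wording onto the terms the data source actually uses, and absorb typos along the way. The vocabulary, the synonym table and the `resolve_term` helper are all hypothetical names invented for this example, and fuzzy matching with the standard `difflib` module is just one simple way to do it.

```python
import difflib

# Hypothetical controlled vocabulary: the terms the data source knows about.
CANONICAL_TERMS = ["customer", "order", "invoice", "product"]

# Hypothetical synonym table mapping user wording to a canonical term.
SYNONYMS = {"account": "customer", "client": "customer", "bill": "invoice"}


def resolve_term(user_term, cutoff=0.8):
    """Return the canonical term for user_term, or None if nothing is close."""
    term = user_term.strip().lower()
    if term in CANONICAL_TERMS:
        return term
    if term in SYNONYMS:
        return SYNONYMS[term]
    # Fall back to fuzzy matching to absorb typos such as "custmer".
    candidates = CANONICAL_TERMS + list(SYNONYMS)
    matches = difflib.get_close_matches(term, candidates, n=1, cutoff=cutoff)
    if not matches:
        return None
    best = matches[0]
    # A fuzzy hit on a synonym still has to be mapped to its canonical term.
    return SYNONYMS.get(best, best)


print(resolve_term("Account"))
print(resolve_term("custmer"))
print(resolve_term("zebra"))
```

A visual lookup avoids all of this guesswork, because the user picks “customer” from a list instead of typing something the product then has to interpret.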

2. The information requested is “not in the data source”:

  • The subject area is not specified, so the same term can match phrases in more than one subject area.
  • The subject area is correct, but the data source(s) available to the search product don’t contain the correct “granularity” of data to produce the answer (Example: a trivia question may ask for the age of players, but the source database doesn’t contain player dates of birth).

3. Generic phrases are used in the NL request:

  • The user may use a generic or abstract term, but nothing links that generic term by inference to a concrete term that has “values” in the data source.
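
To make the missing “inference connection” concrete, here is a small Python sketch under stated assumptions: a toy taxonomy in which each generic term points to narrower terms, and a set of terms that actually have values in the data. Both tables and the `concrete_terms` function are hypothetical and only meant to show the shape of the problem.

```python
# Hypothetical taxonomy: each generic term points to narrower terms.
NARROWER = {
    "vehicle": ["car", "truck"],
    "car": ["sedan", "coupe"],
}

# Hypothetical set of concrete terms that actually have values in the data.
VALUES_IN_DATA = {"sedan", "coupe", "truck"}


def concrete_terms(term):
    """Walk down the taxonomy and collect terms that have values in the data."""
    found = []
    stack = [term]
    seen = set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against cycles in the taxonomy
        seen.add(current)
        if current in VALUES_IN_DATA:
            found.append(current)
        stack.extend(NARROWER.get(current, []))
    return sorted(found)


print(concrete_terms("vehicle"))
print(concrete_terms("boat"))
```

Without an entry for the user's term (the “boat” case), the walk finds nothing, which is exactly the failure described above.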

4. The Semantic Search product is based on unstructured or semi-structured data sources:

  • Exact answers tend to “live” in well-structured databases, which support a first-order query language capable of generating a query that produces a precise answer.
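
Here is a minimal sketch of what “exact answer from a well-structured database” looks like, reusing the player-age example from point 2. It uses Python's standard `sqlite3` module with an in-memory database. The `players` table, its columns, the two sample rows and the `age_on` helper are all made up for illustration. Note that the answer is only possible because the table stores dates of birth.

```python
import sqlite3
from datetime import date

# In-memory database with a hypothetical players table and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT PRIMARY KEY, birth_date TEXT)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [("Ann Example", "1980-06-15"), ("Bob Sample", "1975-12-01")],
)


def age_on(name, as_of):
    """Return the player's age in whole years on as_of, or None if unknown."""
    row = conn.execute(
        "SELECT birth_date FROM players WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return None
    born = date.fromisoformat(row[0])
    # Subtract one year if the birthday has not yet occurred in as_of's year.
    had_birthday = (as_of.month, as_of.day) >= (born.month, born.day)
    return as_of.year - born.year - (0 if had_birthday else 1)


print(age_on("Ann Example", date(2011, 5, 24)))
```

The query either returns one precise value or nothing at all, which is the behavior an “exact answer” product needs and a ranked list of documents cannot give.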

Note: as a disclaimer, I am bringing a Semantic Search product to market that couples NL with a unique Visual Request Specification.  The product can extract exact answers from any relational database.

How semantics will revolutionize enterprise search

October 19th, 2011

Semantics is the new “hot” technology in Enterprise Search. Semantic search products, which attempt to produce “exact answers” based on Natural Language Processing (NLP), have been around seemingly forever (the earliest attempts date from the 1960s). Very few of these products were reliable enough to become commercially viable (EasyAsk being one).

But two recent successful implementations have propelled NL-based search to the forefront. Early in 2011 IBM’s Watson won a Jeopardy contest against two Jeopardy experts. And then Apple’s iPhone 4S release included Siri, which incorporates speech-to-text, text-to-speech and semantic search technology. Siri grew out of many years of research at SRI International, a leading research organization with extensive government-funded work.

Leading search vendors are well aware of the power of semantics, which is partly based on taxonomies or semantic models whose objects are related through inference. And now HTML5 offers a method of “semantic tagging” that will allow companies to build or modify websites so that their products, services and other information are tagged in a much more “findable” way. Search engine bots can match semantically tagged DOM objects to semantic objects found in published taxonomies, ontologies and semantic models. Google, Microsoft and Yahoo teamed up to launch Schema.org, a site that publishes taxonomies, ontologies and semantic models that web designers can select from to semantically tag “findable” DOM objects in their web pages. And of course “finding exact answers” on corporate websites or in corporate databases is one of the major tenets of Enterprise Search.
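
For readers who have not seen semantic tagging, here is a small Python sketch of how a bot might read it. It assumes the page uses HTML5 microdata attributes (`itemscope`, `itemtype`, `itemprop`) with the Schema.org `Product` type. The page fragment and the `MicrodataParser` class are made up for this example, and a real crawler would handle nesting and many more cases than this does.

```python
from html.parser import HTMLParser

# Made-up page fragment tagged with Schema.org microdata.
PAGE = """
<div itemscope itemtype="http://schema.org/Product">
  <span itemprop="name">Example Widget</span>
  <span itemprop="description">A placeholder product.</span>
</div>
"""


class MicrodataParser(HTMLParser):
    """Collect the item type and the text of each itemprop (flat items only)."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._pending_prop = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemtype" in attrs:
            self.item_type = attrs["itemtype"]
        if "itemprop" in attrs:
            # Remember the property name until its text content arrives.
            self._pending_prop = attrs["itemprop"]

    def handle_data(self, data):
        if self._pending_prop and data.strip():
            self.properties[self._pending_prop] = data.strip()
            self._pending_prop = None


parser = MicrodataParser()
parser.feed(PAGE)
print(parser.item_type)
print(parser.properties)
```

The point is that the bot no longer has to guess that “Example Widget” is a product name: the page says so, in terms taken from a published vocabulary.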

Disclaimer for this post: the author is the inventor of a semantic search engine being tested for release in the next few months.