Sunday, February 10, 2008

Natural Language Search - Playground...

The concept of keyword search - as we all use it with Google today - works well for finding documents that contain the facts we are looking for. Personally, I have gotten better over the many years of using Google, e.g. at choosing keywords to find even things I don't know a name for. The idea of getting facts as results vs. getting possible documents (= web pages) that might contain the fact is gaining ground. A new breed of start-ups is trying to extract facts from the web and give users the ability to pose queries as natural language questions (like asking a friend for something).

I'm talking about two things here:
  • 1st: fact extraction from information (semantic analysis)
  • 2nd: the ability to understand a natural language question
I'm playing around with some of the new search services - these include true knowledge and Powerset (both in private beta), plus some tests with hakia, Google and ask.com. Some of the services are still in an experimental state - so I'm not judging whether they have what it takes to be the next Google.

Questions are written in English - supporting multiple languages is probably the hardest part of natural language search (at least if the engine should really understand the question vs. using a statistical approach). None of the tested services understands anything other than English.

The response is either a straight fact or links to documents that contain the fact. The guys at true knowledge are building an engine that analyses the question and provides a straight answer - or a partial answer with the option to teach the site the missing facts. The response includes the detailed reasoning, so you can verify that the engine understood the question correctly.
  • My question: Who is president of Switzerland?
  • Computed question by true knowledge: Who is the president (head of a nation state) of switzerland, the country in Western Europe at the current time?
  • Answer by true knowledge: Pascal Couchepin (born 1942), the Swiss Federal Councilor
The answer is correct - see screenshot:

trueknowledge.com: Who is president of Switzerland?

Asking Google the same question also yields a straight answer (I was surprised), including links to the sources that back up the fact.
  • Answer by Google: Switzerland — President: Micheline CALMY-REY
That is not true (anymore) - Google is using "old" sources. Background: In Switzerland we have a new president every year - Micheline Calmy-Rey was president in 2007.

The team in the labs at Powerset uses Wikipedia content as the primary source for facts. Results are wiki articles that contain the fact (with clever hit highlighting).
Good - but the hit summary does not show the name - I have to follow the link to the source.

Hakia - serving semantically analysed results from the ask.com index. Same question:
  • Answer by Hakia: Possible answer: Jean-Marie Musy was a lawyer, Swiss Federal Councilor and was twice elected president of Switzerland.
Not true anymore (he was president in 1925 and 1930) - though the 1st search result (below the answer) presents a recent news article talking about the current president (topic: joke about a certain Dr.).

Funny detail - asking ask.com provides:
  • Answer from ask.com: The Chief of State of Switzerland is President Samuel Schmid, who is also Head of State
Old fact too (was president in 2005)

That overview shows the 2nd hardest problem (on my list) in semantic extraction - facts changing over time. How do you figure out whether a fact is still true, how long it has been true, and what the date of the source is? The web has a very weak infrastructure for providing this information.

Google and ask.com had the correct sources linked - and most of them are up to date - so it looks like they are not indexing and analysing that often. The true knowledge guys are taking extra care with that topic - facts are always put in a time context (including dependencies on inherited facts). So they are able to distinguish between the 1st question and this one:
  • My question: Who was president of Switzerland in 2007?
  • Answer by true knowledge: If there are any answers, I couldn't find any.
No answer - but a link with a wizard to add/reference the fact. I did add it - so if you run the same question now:
  • Answer by true knowledge: Micheline Calmy-Rey (born 1945), the Swiss politician
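
The time-context idea can be sketched in a few lines of Python. This is only an illustration of storing each fact together with a validity interval; it is not how true knowledge actually works. The `Fact` class and the `year_term` and `holder_at` helpers are made up for the example, and I'm assuming each presidency simply runs from January 1st to December 31st. Only presidencies mentioned in this post are stored.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    subject: str      # e.g. "Switzerland"
    relation: str     # e.g. "president"
    value: str        # e.g. "Pascal Couchepin"
    valid_from: date  # first day the fact is true
    valid_to: date    # last day the fact is true (inclusive)


def year_term(subject: str, relation: str, value: str, year: int) -> Fact:
    """Build a fact that holds for one calendar year (an assumption of this sketch)."""
    return Fact(subject, relation, value, date(year, 1, 1), date(year, 12, 31))


# Hypothetical fact store, filled only with presidencies named in this post.
FACTS: List[Fact] = [
    year_term("Switzerland", "president", "Samuel Schmid", 2005),
    year_term("Switzerland", "president", "Micheline Calmy-Rey", 2007),
    year_term("Switzerland", "president", "Pascal Couchepin", 2008),
]


def holder_at(facts: List[Fact], subject: str, relation: str, when: date) -> Optional[str]:
    """Return the value whose validity interval covers `when`, or None if nothing is stored."""
    for fact in facts:
        if (fact.subject == subject
                and fact.relation == relation
                and fact.valid_from <= when <= fact.valid_to):
            return fact.value
    return None


if __name__ == "__main__":
    # "Who is president?" becomes "who is president today?"
    print(holder_at(FACTS, "Switzerland", "president", date(2008, 2, 10)))
    # A question about the past only changes the date.
    print(holder_at(FACTS, "Switzerland", "president", date(2007, 6, 1)))
    # A year with no stored fact yields None instead of a stale answer.
    print(holder_at(FACTS, "Switzerland", "president", date(2006, 6, 1)))
```

The point of the design is that a missing fact gives "no answer" rather than an outdated one, which is exactly where the keyword engines above went wrong.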
So much for now.. a lot more exciting findings to tell.. in another post someday.

On my mind lately..

Had the pleasure of helping out on the Best of Swiss Web jury. You might remember from last year - local.ch won the Master award. It was an interesting learning experience to take apart some of the submitted websites. This year's award night is March 11th - see you there.


Gave a few speeches on emerging open web standards and mapping last year at webtuesday, blogcamp and tech talk. Although it takes a lot of time to prepare (including time to research facts..), it's fun and rewarding to do.
The next speech is for the fine folks of /ch/open - focusing on data portability with open web standards.

Update 18/02: More information about the speech


Good to see some more web start-ups in Switzerland:
  • wua.la - user-centric distributed storage (looks like the promotion tour was a success - great job Dominik & team)
  • amazee.com (blog) - a yet-to-be-revealed social collaboration thingy (best wishes to Markus for joining the crew)
  • exsila.ch - sharing community for entertainment goods


And yes.. Matchbox 20 are on tour, playing in the UK - for fans - book your flight to London for May 1st - I'm in :-)

creativecommons by-nc-sa