Examining Geobase
The database contains the following information:

Information about states:
1. Area of the state in square kilometers
2. Population of the state in citizens
3. Capital of the state
4. Which states border a given state
5. Rivers in the state
6. Cities in the state
7. Highest and lowest point in the state in meters

Information about rivers:
1. Length of river in kilometers

Information about cities:
1. Population of the city in citizens

Try to ask a few random questions. If Geobase doesn't understand a
question, it will tell you the word it can't parse.

Take a look at the following sample queries.

What are the states?
What are the cities of New York?
What is the highest mountain in
California?
What are the names of the states
which border New Mexico?
Which rivers run through the states
that border the state with the
capital Olympia?

The language is defined in the file GEOBASE.LAN, and the database is
defined in GEOBASE.DBA.

Be imaginative! Geobase will understand many English sentences, but
occasionally you will find a sentence that Geobase simply does not
recognize. This is the dilemma of a natural language interface. If you
find a question that you feel Geobase should be able to answer but
can't, you will need to improve Geobase so that it understands the
query!

The Idea Behind Geobase

Geobase illustrates one way
of implementing a natural language
interface to a database. However,
developing a complete natural
language interface to a database is
a very complicated task, as natural
languages are far more complex than
programming languages. There are far
more words in a natural language,
and natural languages contain
ambiguities that are hard to resolve. But Visual Prolog is
extremely well suited for natural
language processing, because the
backtracking mechanism can be used
to handle ambiguities.

In Geobase the stored data is a USA geographical database. However, you
could use the same approach for other types of data.

The key idea behind Geobase is
simple: The user views the database
as a network of entities connected
by associations. This is known as an
entity association network. The
entities are the items stored in the
database. In Geobase the entities
are states, cities, capitals of
states, rivers, lakes, etc. The
associations are words that connect
the entities in queries. For example:

    Cities in the state of California.

Here the two entities, cities and state, are connected by the
association in. The word "the" is just ignored here, and California is
regarded as an actual constant for the state entity.

Geobase is designed to accept simple English.
This means that, rather than
worrying whether a sentence is
grammatically correct, Geobase tries
to extract the meaning by attempting
to match the user's query with the
entity association network.

Queries can be combined to form rather complex queries. For example:

    which rivers run through states that border the state with the capital Austin?

In order to make the query
match the entity association network,
Geobase must simplify the various
forms of the query. This occurs
while Geobase "parses" the query.

The first step is to ignore certain words, such as:

    which, is, are, the, tell, me, what, give, as, that, please, to,
    how, many, live, lives, living, there, do, does

This step makes the query look like this:

    rivers run through states border state with capital Austin?

The next step is to find the internal names for entities and associations.
Entities can have synonyms, and the
query can use plural forms of the
entity names. Associations can
consist of several words, and they
can also have synonyms. After these conversions, the query looks like this:

    river in state border state with capital Austin?

Geobase can now classify the words as either entities or associations
and group the query into subqueries (E=entity, A=association,
C=constant):

    river in  state border  state with capital Austin?
    E     A  (E     A      (E     A    E       C))

Geobase can then evaluate the query by first
finding the name of the state with
the capital Austin, then finding all
the states that border this state,
and finally looking up which rivers
run through these states.

Adapting the Geobase Idea

Geobase is a natural language query interface to an existing database.
You can adapt the Geobase mechanisms to your own natural language query
interface; we explain how in this section.

Create Your Database

The first thing you
need to do is to create your
database. How the database is stored or was created has nothing to do
with Geobase. You can use internal database sections or Visual Prolog's
external database system, or you could even access some other database
files by means of the Visual Prolog Toolbox. Geobase accesses the actual
database through the predicates db and ent.

For simplicity, the geographical database is stored in an internal
database section, which you can load from disk by calling the consult
predicate. Here are some sample declarations from the geographical
database:

    /* state(Name,Abbreviation,Capital,Area,Admit,Population,City,City,City,City) */
    state(string,string,string,real,real,integer,string,string,string,string)

    /* city(State,Abbreviation,Name,Population) */
    city(string,string,string,real)

    /* river(Name,Length,StateList) */
    river(string,integer,list)

    /* border(State,Abbreviation,StateList) */
    border(string,string,list)

    /* etc. */

Porting Geobase

The first step in porting Geobase to your own database is to draw the
entity association network.
The next step is to model this
network with the database predicate
schema:

    schema(Entity,Assoc,Entity)

Here are some examples of schema clauses from Geobase:

    schema("capital","of","state")
    schema("state","with","capital")
    schema("population","of","state")
    schema("state","with","population")
    schema("area","of","state")
    schema("city","in","state")

After you have defined the entity association network, you should
implement Geobase's interface to the database. This requires that you
define clauses for the two predicates db and ent.

    Predicates
      db(ent,assoc,ent,string,string)
      ent(ent,string)

The ent Predicate

The ent predicate is
responsible for delivering all
instances of a given entity. In the
first argument of ent, Geobase
passes the name of an entity and
expects the second to return actual
string values for this entity.

Here are some example clauses of ent from Geobase:

    ent(continent,usa).
    ent(city,Name) :- city(_,_,Name,_).
    ent(state,Name) :- state(Name,_,_,_,_,_,_,_,_,_).
    ent(capital,Name) :- state(_,_,Name,_,_,_,_,_,_,_).
    ent(river,Name) :- river(Name,_,_).

The db
predicate is a bit more complicated
than ent. It is responsible for
modeling the relation between the
two entities (the association). You
can also regard the db predicate as a function between one entity value
and another value. All the arrows in the entity association network
(modeled by the schema relation) should be implemented in clauses for
the db predicate. Here are some examples from the geographical database:

    db(city,in,state,City,State) :-
        city(State,_,City,_).
    db(state,with,city,State,City) :-
        city(State,_,City,_).
    db(abbreviation,of,state,Ab,State) :-
        state(State,Ab,_,_,_,_,_,_,_,_).
    db(area,of,state,Area,State) :-
        state(State,_,_,_,Area1,_,_,_,_,_),
        str_real(Area,Area1).
    db(capital,of,state,Capital,State) :-
        state(State,_,Capital,_,_,_,_,_,_,_).
    db(state,border,state,State1,State2) :-
        border(State2,_,List),
        member(State1,List).
    db(length,of,river,Length,River) :-
        river(River,Length1,_),
        str_real(Length,Length1).
    db(state,with,river,State,River) :-
        river(River,_,List),
        member(State,List).

That's really all you need to do in order to provide a natural language
interface for your existing database.

Translating Natural Language Queries

Most natural
languages (and English in particular)
are not simple, straightforward, and
consistent. Nouns can be singular or
plural, verbs conjugate, synonyms
exist. Translating sentences from
natural language to something the
program recognizes is not a simple
task. In the following sections we discuss how the Geobase program deals
with these translation issues.

Internal Entity Names

Geobase needs to obtain an internal entity name from the words the user
has used. This breaks down into three separate problems:

1. Plural forms of entities. The user might use the word states, which
   is the entity name state with an s appended; or the word cities,
   which comes from the entity name city. The predicate entn is
   responsible for converting plural entities to their singular forms.
2. Synonyms for entities. The user might type town instead of city, or
   place instead of point. Synonyms for entities are stored in the
   database predicate synonym.
3. Compound entity values. The entity values might consist of more than
   one word, like new york or salt lake city. Geobase handles this
   situation during parsing with the predicate get_cmpent.

Some of the involved clauses look like these:

    Predicates
      ent_name(ent,string)        /* Converts between an entity name and an internal entity name */
      entn(string,string)         /* Converts an entity to singular form */
      entity(string)              /* Gets all entities */
      ent_synonym(string,string)  /* Synonyms for entities */

    Clauses
      ent_name(Ent,Navn) :-
          entn(E,Navn), ent_synonym(E,Ent), entity(Ent).

      ent_synonym(E,Ent) :- synonym(E,Ent).
      ent_synonym(E,E).

      entn(E,N) :- concat(E,"s",N).
      entn(E,N) :- free(E), bound(N), concat(X,"ies",N), concat(X,"y",E).
      entn(E,E).

      entity("name") :- !.
      entity("continent") :- !.
      entity(X) :- schema(X,_,_).

Internal Names for Associations

In the same way that
entities can have synonyms and
consist of several words, so can the
associations in the queries be
represented by several words. The
alternative forms for the
association names are stored in the
assoc database predicate. The assoc predicate stores a list of words
that can be used for the internal association name; for example:

    assoc("in",["in"])
    assoc("in",["running","through"])
    assoc("in",["runs","through"])
    assoc("in",["run","through"])
    assoc("with",["with"])
    assoc("with",["traversed"])
    assoc("with",["traversed","by"])

The predicate get_assoc is responsible for recognizing an association at
the beginning of a list of words. It does this by using the
nondeterministic version of append to split the list up into two parts.
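
To make the splitting step concrete, here is a minimal sketch in Python rather than Visual Prolog, so that it can be run on its own. It assumes the assoc alternatives listed above are held in a plain list; the names ASSOC, splits, and get_assoc are illustrative and are not Geobase's own code. A generator stands in for backtracking: each value it yields corresponds to one solution that Prolog would find.

```python
# Alternatives copied from the assoc examples in the text; holding them
# in a Python list is an assumption made for this sketch.
ASSOC = [
    ("in", ["in"]),
    ("in", ["running", "through"]),
    ("in", ["runs", "through"]),
    ("in", ["run", "through"]),
    ("with", ["with"]),
    ("with", ["traversed"]),
    ("with", ["traversed", "by"]),
]


def splits(words):
    """Yield every (front, rest) pair whose concatenation is `words`.

    This mimics calling append(Front, Rest, Words) with only Words bound:
    the shortest front comes first, and each later pair is what
    backtracking would produce next.
    """
    for i in range(len(words) + 1):
        yield words[:i], words[i:]


def get_assoc(words):
    """Yield (rest, internal_name) for each association that starts `words`."""
    for front, rest in splits(words):
        for name, alternative in ASSOC:
            if front == alternative:
                yield rest, name


print(list(get_assoc(["run", "through", "texas"])))
print(list(get_assoc(["traversed", "by", "rivers"])))
```

The second call yields two candidates, because both "traversed" and "traversed by" are stored alternatives. A backtracking parser would try the first candidate and fall back to the second if the rest of the parse fails.
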
If the first part of the list
matches an alternative for an
association in the assoc predicate,
the corresponding internal
association name is returned.

    get_assoc(IL,OL,A) :-
        append(ASL,OL,IL), assoc(A,ASL).

The parser is responsible for
recognizing the query sentence
structure. There are many types of
sentences, but these are classified
by the parser into nine different
cases. Each of these nine cases has
alternatives in the domain query.
The query domain is defined recursively, which means it can represent
nested queries.

    Give me cities                    ENT                     q_e(ENT)
    state with the city new york      ENT ASSOC ENT CONST     q_eaec(ENT,ASSOC,ENT,STR)
    rivers in (....)                  ENT ASSOC SUBQUERY      q_eaq(ENT,ASSOC,ENT,QUERY)
    rivers longer than 1000 miles     ENT REL UNIT VAL        q_sel(ENT,RELOP,UNIT,REAL)
    the smallest (...)                MIN SUBQUERY            q_min(ENT,QUERY)
    the biggest (..)                  MAX SUBQUERY            q_max(ENT,QUERY)
    rivers that does not traverse     ENT ASSOC NOT SUBQ      q_not(ENT,QUERY)
    rivers that are longer than       SUBQUERY OR SUBQUERY    q_or(QUERY,QUERY)
      1 thousand miles or that run
      through texas
    which state borders nevada        SUBQUERY AND SUBQUERY   q_and(QUERY,QUERY)
      and borders arizona

The words that
users can type for minimum, maximum,
units, etc., are stored in the
language database section. The definition in Geobase looks like this:

    entitysize(entity,keyword)
    relop(keywords,relative_size)   /* relational operator */
    assoc(association_between_entities,keyword)
    synonym(keyword,entity)
    ignore(keyword)
    min(keyword)
    max(keyword)
    size(entity,keyword)
    unit(keyword,keyword)

Parsing by Difference Lists

The parser uses a method called "parsing by difference lists." The first
two arguments of the parsing predicates are the input list and what
remains of the list after part of a query is stripped off. In the last
argument the parser builds up a structure for the query.

The parser consists of several
predicates and clauses, each of
which is responsible for handling
special cases in recognizing the
query. If you want to understand
everything about the parser, study
the comments and use trace mode to
follow how Geobase parses various
queries.

The following clause recognizes the query How large is the town new
york. The filter gives the parser the list "large", "town", "new",
"york".

    s_attr([BIG,ENAME|S1],S2,E1,q_eaec(E1,A,E2,X)) :-   /* First s_attr clause */
        ent_name(E2,ENAME),   /* Entity type town is a city. Look up entity in the language scheme */
        size(E2,BIG),         /* look up city size is large */
        entitysize(E2,E1),    /* look up city scale is population */
        schema(E1,A,E2),      /* look up scheme population of city */
        get_ent(S1,S2,X),!.   /* return an entity name and query */

The parser is also able to recognize the more ambiguous query How large
is new york. Given this query, the first clause for s_attr fails because
it expects an entity type (such as town or state). Then the program
calls the second clause for s_attr, shown here.

    s_attr([BIG|S1],S2,E1,q_eaec(E1,A,E2,X)) :-   /* Second s_attr clause */
        get_ent(S1,S2,X),
        size(E2,BIG),
        entitysize(E2,E1),
        schema(E1,A,E2),
        ent(E2,X),!.

Using this clause, the parser decides that new york refers to the city
and that large refers to the number of citizens.

Once the parser returns a query, Geobase calls the eval clause that
actually evaluates the query. The actual calls into the database are
made with the db and ent predicates.
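
As a rough illustration of this last step, the sketch below evaluates a nested query structure using stand-ins for db and ent. It is written in Python rather than Visual Prolog so that it can be run on its own, and it is not Geobase's actual eval code: the tuple encoding of queries, the function names, and the small hand-picked data set are all assumptions made for the example. Only three of the nine query forms (q_e, q_eaec, q_eaq) are handled.

```python
# Hand-picked sample facts for illustration only (not GEOBASE.DBA).
CAPITALS = {"texas": "austin", "oklahoma": "oklahoma city",
            "new mexico": "santa fe", "new york": "albany"}
BORDERS = {"texas": ["new mexico", "oklahoma", "arkansas", "louisiana"]}
RIVERS = {"rio grande": ["colorado", "new mexico", "texas"],
          "red": ["texas", "oklahoma", "arkansas", "louisiana"],
          "hudson": ["new york"]}


def ent(entity):
    """Yield every instance of an entity, in the spirit of the ent predicate."""
    if entity == "state":
        yield from CAPITALS
    elif entity == "capital":
        yield from CAPITALS.values()
    elif entity == "river":
        yield from RIVERS


def db(e1, assoc, e2):
    """Yield (value1, value2) pairs related by e1-assoc-e2, like db."""
    key = (e1, assoc, e2)
    if key == ("state", "with", "capital"):
        yield from CAPITALS.items()
    elif key == ("state", "border", "state"):
        for state2, neighbours in BORDERS.items():
            for state1 in neighbours:
                yield state1, state2
    elif key == ("river", "in", "state"):
        for river, states in RIVERS.items():
            for state in states:
                yield river, state


def evaluate(query):
    """Return the sorted answers to a query given as a nested tuple."""
    kind = query[0]
    if kind == "q_e":                       # q_e(ENT)
        return sorted(set(ent(query[1])))
    if kind == "q_eaec":                    # q_eaec(ENT,ASSOC,ENT,STR)
        _, e1, assoc, e2, const = query
        return sorted({v1 for v1, v2 in db(e1, assoc, e2) if v2 == const})
    if kind == "q_eaq":                     # q_eaq(ENT,ASSOC,ENT,QUERY)
        _, e1, assoc, e2, sub = query
        inner = set(evaluate(sub))          # the innermost subquery runs first
        return sorted({v1 for v1, v2 in db(e1, assoc, e2) if v2 in inner})
    raise ValueError("unsupported query form: " + kind)


# river in (state border (state with capital austin))
QUERY = ("q_eaq", "river", "in", "state",
         ("q_eaq", "state", "border", "state",
          ("q_eaec", "state", "with", "capital", "austin")))

print(evaluate(QUERY))
```

The recursion follows the order described earlier for the Austin example: the state with the capital Austin is found first, then the states that border it, and finally the rivers that run through those states.
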