DATE AND TIME: Monday, April 29, 2001 4:10 p.m.
PLACE: 319 Snedecor
SPEAKER:
Elizabeth D. Liddy, Center for Natural Language Processing,
School of Information Studies, Syracuse University, Syracuse, NY
TITLE:
Towards Improved Access to Statistical Information
ABSTRACT:
With the ever-increasing availability of statistical information on the
Web, it is important to understand why and how people seek and use
statistical information, and to develop and test prototype interfaces
that aid in finding, understanding, and using tables. Our research,
funded by NSF's Digital Government Program, is aimed at empowering users
to ask questions naturally, the same way they do when submitting email
queries to a virtual reference service, but with the added ability to
interact dynamically with the tables. This methodology uses Natural
Language Processing (NLP) to interpret and represent a user's needs and
to match this representation against the metadata representation of the
tables' contents to find the requested data.
Our research used 1,000 email queries gathered from the logs of users
seeking statistical information. These were analyzed to determine the
dimensions of interest in typical statistical queries, as well as the
linguistic regularities that can be captured in a statistical-query
sublanguage grammar. We developed an ontology of query dimensions from
this data-up analysis of the queries and, where necessary, extended the
ontology with values from actual tables. Next, we developed an NLP
statistical-query sublanguage grammar that enables the system to
semantically parse users' queries and produce a template-based internal
query representation, which will then be used to map these dimensions
onto the tables' metadata.
COFFEE: 3:45 p.m., 104 Snedecor Hall