Some examples of problems that may be addressed by RTG members
We list below some specific topics
for initial collaboration. These topics will pose significant statistical
challenges, lead to new statistical methodologies/paradigms, and suggest new
theoretical research and doctoral dissertations for students in the program. In
addition, the topics will lend themselves to experiential learning by allowing
doctoral students, post-docs, and graduate advisors to collaborate with
scientists conducting research at partner laboratories.
Reliability and Performance Assessment of Complex Systems
Large Wireless Networks (with Lucent Technologies). Wireless networks are
pervasive, dynamic, and increasingly important in national defense and
emergency response (Pei and Gerla 2001; Malone 2004). They are, however, so
complex and rapidly changing that their field performance is difficult to
monitor, evaluate, and improve. This issue will be addressed as follows.
· Data Sampling: Wireless networks produce
gigabytes of data in minutes on calls, signals, and network states, and the
data are complex because call control is dynamic and simultaneously shared by
several separately monitored base stations. Such data overwhelm the capability
of current analysis systems and are, consequently, little used. Thus, there is
an urgent need for coordinated, rapid, adaptive sampling that extends beyond
use of existing methods (e.g., Lynch 2003; Mattes and Mosig 2004) to extract
the rich information from such data for monitoring performance, detecting
network degradation, and optimizing quality of service.
· Performance Monitoring: Recent trends in
wireless communications include the transition to packet-based, data-carrying
wireless networks of increased complexity, the introduction of new services
like wireless video, and the explosion in number of users. Much work has
recently been done on process monitoring (e.g., Wu and Meeker 2002; Apley and Lee 2003; Grigg and
Farewell 2004). Some work has also been done on network event and intrusion
detection (e.g., Becker et al. 1998; Manikopoulos and Papavassiliou 2002). The
existing methods, however, are not sufficient here, and we will develop new
monitoring metrics and schemes to evaluate network performance and
user-experienced quality.
· Dynamic Visualization: Wireless networks are
complex: even a call from a stationary caller is maintained by several base
stations simultaneously, with the station of primary control depending on other
concurrent calls in the network. This leads to messy data with complex,
space/time-varying correlation structure, which requires new ways to visualize
performance in real time, across networks, equipment, users, and calls. This
capability is currently lacking, and we will extend existing methods (e.g., Pascoe
et al. 2002; Nagel and Granum 2004) to visualize performance.
· Hierarchical Spatial/Temporal Models: Wireless
networks involve multiple streams of signaling events, as well as packet flows
and network summary statistics arriving at different time scales (event-driven,
regularly spaced, etc.), and require multiple levels of analysis. Analysis is
across networks, base stations, communities of interest, callers, calls, etc.
This requires the development of new hierarchical spatial-temporal models,
because existing ones (e.g., Waller et al. 1997; Wu and David 2002; Wikle 2003)
are not suited for network modeling.
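As a point of reference for the performance-monitoring item above, a standard one-sided CUSUM scheme can be sketched as follows. This is only a minimal illustration of the kind of existing process-monitoring method referred to there, not one of the new schemes proposed. The dropped-call rates, target, slack, and threshold are hypothetical values chosen for the example.

```python
def cusum_upper(values, target, slack, threshold):
    """One-sided (upper) tabular CUSUM.

    Accumulates deviations above target + slack and returns the index of the
    first observation at which the cumulative sum exceeds the threshold,
    or None if no alarm is raised.
    """
    s = 0.0
    for t, x in enumerate(values):
        # Reset to zero whenever the accumulated evidence becomes negative.
        s = max(0.0, s + (x - target - slack))
        if s > threshold:
            return t
    return None


# Hypothetical dropped-call rates (percent) per monitoring interval;
# the level shifts upward starting at index 5.
rates = [1.0, 1.1, 0.9, 1.0, 1.05, 1.6, 1.7, 1.5, 1.8, 1.6]
alarm_at = cusum_upper(rates, target=1.0, slack=0.2, threshold=1.0)
print("first alarm at interval:", alarm_at)
```

A scheme of this type monitors a single, regularly sampled stream. The network setting described above involves many correlated streams arriving on different time scales, which is one reason such methods are not sufficient on their own.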
Modern Power Systems (with the NSF-supported I/UCRC Power Systems Engineering
Research Center at ISU). This center is led by Professor James McCalley
of the Electrical and Computer Engineering Department, and one of the center’s
focus areas is power systems reliability. The reliability of our electric power
generating, transmission, and distribution systems is of critical national
importance. Aging infrastructures and new technologies in data acquisition,
storage, and processing are giving rise to new problems on the interface of
statistics and power engineering. Statistical analysis of reliability data has
traditionally focused on failure-time data, modeled as a function of limited
available environmental data (e.g., Singpurwalla 1995; Meeker and Escobar 1998,
chap. 17-19; Meeker et al. 2002). Reliability data are getting much
richer, however, mainly because of advances in sensor technology and decreases
in sensor costs. For example, detailed information is becoming available about
degradation over time (condition-monitoring data; e.g., see Han and Song 2003)
and environmental conditions (e.g., temperatures and voltage surges) to which
systems have been exposed. Development of appropriate statistical and reliability
models and methods for such data will have a strong positive impact on allowing
power engineers to allocate limited resources to maximize system reliability
and availability.
Nano-Scale Metrology and Metrology for High-Throughput Devices (with NIST)
Computation has significantly
enabled research and development in science and engineering, complementing and
enhancing traditional approaches based solely on theory and physical
experimentation. The particular critical needs of nano-scale metrology and instrumentation
lie in the following two areas of the computational and statistical sciences.
· Modeling and Simulation Coordinated with High-Precision Nano-Scale Measurements: No single existing modeling approach can
adequately cover the enormous range of length and time scales that must be
addressed in nano-metrology. For example, while capable of capturing the
essential physics with a small number of atoms over a few nanoseconds, quantum
mechanical/electronic structure calculations quickly become difficult when the
number of atoms grows. A challenging problem in computational nanotechnology is
thus the development of multi-scale methods for the modeling of nano-scale
components (material, mechanical, fluidic, etc.) encountered in
nano-manufacturing systems (e.g., see La Magna et al. 2002; Srivastava and
Atluri 2002; Chandra and Namilae 2003; Shen and Atluri 2004). These methods
will combine molecular dynamics, coarse-grained meso-scopic dynamics,
stochastic dynamics, and continuum theories. Here, techniques for transforming
information from one type of simulation to another (such as homogenization for
coarse-graining) will be needed.
· Development of Statistical Tools to Effectively Manage and Exploit Nano-Scale Measurements and Other High-Volume Scientific Data:
Uncertainty and correlation are inherent in such data (Weckenmann et al. 2004).
Specialized statistical tools must be integrated with instrumentation to
collect, process, verify, correct, and condense the enormous volume of data,
and off-load the data for archiving. We will develop new statistical methods
for data sampling to monitor and control instruments, for real-time estimation,
and for stochastic signal detection, identification, and classification.
The above issues also arise in other modern physical
sciences with large-scale datasets. For example, gigabytes of data are common
in high-throughput experiments and in image data resulting from modern
instruments (e.g., scanning electron microscopes and atomic force microscopes).
When modeling and simulation is used to validate or stand in for physical
measurement, gigabytes of data are generated to describe state variables at
each point in space for each time step.
Computer Models as Alternatives to Physical Models (with LANL)
Interest in using computer models in physical sciences and engineering applications is
growing rapidly (e.g., see Currin et al. 1991; Welch et al. 1992; Berk et al.
2002). In particular, the LANL Statistics group contributes heavily to studies
involving models of military systems, industrial processes, transportation
systems, earth-ocean dynamics, and other large-scale phenomena. Three important
research topics in computer modeling are the following.
· Sensitivity/Uncertainty Analysis: The
importance of this topic can be seen from much of the recent work (e.g.,
Saltelli et al. 1999; Saltelli et al. 2000; Oakley and O’Hagan 2002, 2004;
Morris et al. 2004). The goal is to understand which specific inputs (often
among 1000 or more inputs) are most influential in determining output behavior,
generally using as few (often expensive) runs of the model as possible. This
allows subsequent effort to be much better focused, and is a critical first
step in understanding the empirical behavior of a computer model.
· Model Validation: In simple cases, this is
based on comparing computer output to data acquired in physical experiments to
determine whether the model matches reality. This is not, however, physically
possible in many systems, such as the weapons systems studied at LANL. The
physically observable quantities may be only intermediate values, or have
indirect relationships to the outputs produced by the model. Methodology
development for comprehensive model validation will require substantial and
fundamental research (Bayarri et al. 2002).
· Model Calibration: In almost any realistic
context, a computer model cannot possibly express all the complexity of the
system to be modeled. For use in specific predictive contexts, model
calibration (adjusting model parameters so that outputs are relevant to
the prediction context) is required (Kennedy and O’Hagan 2001). Of
particular interest is understanding the impact of this activity on the
uncertainty of model predictions.
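The screening goal described under sensitivity analysis above (finding the few influential inputs with a small number of model runs) can be illustrated with a simple one-at-a-time design. The sketch below is a simplified elementary-effects screen in the spirit of Morris-type methods, not the methodology proposed here. The function `toy_model`, the step size, and the number of base points are assumptions made purely for illustration.

```python
import random


def toy_model(x):
    """Hypothetical stand-in for an expensive computer model (not from the proposal)."""
    # Inputs 0 and 3 drive the output, input 1 has a weak effect, the rest are inert.
    return 5.0 * x[0] + 0.1 * x[1] + 3.0 * x[3] ** 2


def screen_inputs(model, n_inputs, n_base_points=10, delta=0.25, seed=0):
    """Mean absolute elementary effect of each input on the unit hypercube.

    Uses n_base_points * (n_inputs + 1) model runs in total.
    """
    rng = random.Random(seed)
    totals = [0.0] * n_inputs
    for _ in range(n_base_points):
        # Draw the base point so that x_i + delta stays inside [0, 1].
        base = [rng.uniform(0.0, 1.0 - delta) for _ in range(n_inputs)]
        y_base = model(base)
        for i in range(n_inputs):
            moved = list(base)
            moved[i] += delta  # perturb one input at a time
            totals[i] += abs(model(moved) - y_base) / delta
    return [t / n_base_points for t in totals]


scores = screen_inputs(toy_model, n_inputs=6)
ranking = sorted(range(6), key=lambda i: scores[i], reverse=True)
print("inputs ranked by influence:", ranking)
```

With six inputs and ten base points this uses 70 runs. For models with 1000 or more inputs and expensive runs, even this cost can be prohibitive, which motivates the more economical designs discussed in the cited work.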
Performance Assessment of Complex Systems without Conventional Testing (with LANL)
The reliability of newly designed complex systems is often difficult to assess by
traditional means, because direct testing is either impractical or impossible.
Examples include physical systems such as large weapons systems (e.g.,
Farquharson et al. 2001), and virtual systems such as detailed response plans
for dealing with large-scale national disasters (e.g., Olshansky and Wu 2001).
It is conceptually possible to describe systems of these kinds in great detail,
but their reliability and performance in practice cannot be assessed directly
because traditional experiments cannot be carried out. Valuable information that is
indirectly related to the reliability of such a system is, however, usually
available. For example:
· Physical testing data from similar systems, or from settings
not of present interest, e.g., weapons-test data on systems in use before the
adoption of the Nuclear Test Ban Treaty,
· Physical testing data from subsystems, e.g., data from
lab-scale tests of weapon guidance systems under carefully controlled
conditions,
· Expert opinion, e.g., opinion reflecting the experience of
disaster-relief managers in dealing with specific components of a response
strategy, and
· Numerical results of simulation studies, e.g., results based
on computer models of components of a disaster-response plan and interactions
between the components.
While such indirect
information is relevant to assessing the system’s performance, it is
challenging to develop appropriate methodology that effectively uses this
information (Fuentes et al. 2003). Because many of the systems that are central
to the LANL mission are difficult to assess directly, the LANL Statistics group
has substantial experience in research and consulting related to analysis
based on multiple sources of information (e.g., Johnson et al. 2002; Reese et
al. 2004). Nevertheless, statistical research in this area has just begun, and
we will work with the LANL Statistics group to develop statistical methodology
that will be effective for a wide variety of applications.
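The simplest possible instance of combining two of the information sources listed above is a conjugate Bayesian update, in which expert opinion about a single component is encoded as a Beta prior and then updated with pass/fail test data. The sketch below is a deliberately minimal illustration under that assumption. The prior parameters and test counts are hypothetical, and realistic assessments of complex systems require far richer models than this.

```python
def beta_posterior(prior_a, prior_b, successes, trials):
    """Update a Beta(prior_a, prior_b) prior with binomial pass/fail test data."""
    return prior_a + successes, prior_b + (trials - successes)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical expert opinion: about 80% reliable, weighted like 10 prior trials.
prior_a, prior_b = 8.0, 2.0

# Hypothetical subsystem test data: 18 successes in 20 lab-scale tests.
post_a, post_b = beta_posterior(prior_a, prior_b, successes=18, trials=20)

print("prior mean reliability:    ", beta_mean(prior_a, prior_b))
print("posterior mean reliability:", beta_mean(post_a, post_b))
```

The posterior mean lies between the expert's prior mean and the observed success fraction, weighted by the amount of information in each source.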
Non-Destructive Evaluation (with the NSF-supported I/
This center (CNDE), led
by Professor Bruce Thompson (member of the National Academy of Engineering),
has an interdisciplinary group of 50 faculty and staff members working in close
cooperation with industry to advance the field of non-destructive evaluation
(NDE). CNDE also has about 65 undergraduate and graduate students. As a
practical discipline, NDE is widely used in many areas of application, such as
aerospace reliability, nuclear and fossil-fuel power generation, chemical plant
reliability, and condition-based monitoring of operating systems (Krautkramer
and Krautkramer 1990; Halmshaw 1991). NDE involves data generation and
interpretation. Much NDE research focuses on the development of new and better
NDE methods and the development of physics-based models that provide
predictions of distributions of signal and noise for inspection/evaluation of
NDE systems (e.g., Krautkramer and Krautkramer 1990; Halmshaw 1991; von
Kreutzbruck et al. 2001; Liu and Forsyth 2004). Such physics-based models
further fundamental understanding of NDE systems, and reduce the amount of
expensive physical experimentation necessary in development and qualification
of new NDE systems. Statistics plays an important role in NDE (e.g., Hovey and
Berens 1989; Sweeting 1995; Olin and Meeker 1996; Leemans and Forsyth 2004).
Important performance metrics, such as probability of detection, depend on the
variability in NDE data. The use of physics-based models has, meanwhile,
increased opportunities for research in statistical theory and methods for NDE
applications. Since 1989, W. Meeker has served as the statistician on many CNDE
projects, and CNDE typically supports one or two Statistics graduate students
at ISU each year to do research and to help with projects.
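The dependence of probability of detection (POD) on variability in NDE data can be made concrete with the commonly used signal-response model, in which the log of the inspection signal is linear in the log of the flaw size with normally distributed noise, and a flaw is detected when the signal exceeds a decision threshold. The sketch below assumes that model. All parameter values are hypothetical and are not taken from any CNDE project.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pod(flaw_size, beta0, beta1, sigma, threshold):
    """POD under the assumed model log(signal) = beta0 + beta1*log(size) + N(0, sigma^2).

    A flaw counts as detected when the signal exceeds `threshold`.
    """
    mean_log_signal = beta0 + beta1 * math.log(flaw_size)
    return normal_cdf((mean_log_signal - math.log(threshold)) / sigma)


# Hypothetical parameters: the same mean response, two noise levels.
for size in (0.5, 1.0, 2.0):
    low_noise = pod(size, beta0=0.0, beta1=1.0, sigma=0.5, threshold=1.0)
    high_noise = pod(size, beta0=0.0, beta1=1.0, sigma=1.0, threshold=1.0)
    print(f"size={size}: POD={low_noise:.3f} (sigma=0.5), {high_noise:.3f} (sigma=1.0)")
```

With the mean response held fixed, a larger noise standard deviation flattens the POD curve: large flaws are detected less reliably, and flaws whose mean signal falls below the threshold produce more chance detections.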
NDE data are generated in both
actual applications (field data) and laboratory experiments. Field data are
used to quantify the actual properties of an operating NDE system, and
laboratory data are needed to quantify those parts of the inspection process
that are not yet understood well enough to have a physics-based model.
(Reduction in reliance on expensive laboratory experiments is an important
goal.) Inspection data are complex because they often involve contaminated
errors in the predictors (input variables), censoring, and/or truncation. In
efforts important to NDE scientists, we propose to develop new statistical methodology
for the analysis of complicated inspection data and the (highly multivariate)
data resulting from developing NDE methods. In addition, we will develop new
Bayesian statistical methods to deal with complicated inverse problems of flaw
characterization based on NDE data.
Combinatorial Discovery in Chemistry, Biology, Materials Science, and Engineering (with the Combinatorial Discovery Initiative at ISU)
Combinatorial Science embodies the use of massively parallel strategies for the creation and
high-throughput testing of enormous numbers of samples, organized in sample libraries, for accelerated scientific
discovery (Szostak 1997; Borman 1998; Watkins 2001). This approach to drug
discovery has received substantial attention in recent years (Borman 1998), and
analogous approaches are currently being developed in many other areas (e.g.,
Cawse 2001; Panicker et al. 2004). The Combinatorial Discovery Initiative at
ISU, led by Professor Marc Porter of the Chemistry Department, involves
participating faculty members from 10 academic departments and four colleges at
ISU. Specific research areas include the development of rapid assays for
investigating the structure and selectivity of catalysts, the role of
biomaterial structures in biochemical interactions, and the impact of
interfacial structures of various length scales in the development of
nano-materials. Broad application areas with potentially great benefit from the
new techniques developed by this work include energy conversion/generation and
drug delivery systems development. In each case, enormous and complex data
structures are used as input and generated as output. We will develop new
statistical concepts and methods for the design of appropriate libraries, and
analysis techniques for screening that take advantage of the properties of the
specific physical system under study. Analysis of massive observational data sets has drawn the
attention of many research statisticians over the last decade (e.g., Hastie et
al. 2001; Bozdogan 2004). The rapidly expanding area of Combinatorial Science, however, requires new statistical
approaches for massive design and analysis in carefully controlled experimental
contexts.