Some examples of problems that may be addressed by RTG members

 

We list below some specific topics for initial collaboration. These topics will pose significant statistical challenges, lead to new statistical methodologies/paradigms, and suggest new theoretical research and doctoral dissertations for students in the program. In addition, the topics will lend themselves to experiential learning by allowing doctoral students, post-docs, and graduate advisors to collaborate with scientists conducting research at partner laboratories.

 

Reliability and Performance Assessment of Complex Systems

          Large Wireless Networks (with Lucent Technologies). Wireless networks are pervasive, dynamic, and increasingly important in national defense and emergency response (Pei and Gerla 2001; Malone 2004). They are, however, so complex and rapidly changing that their field performance is difficult to monitor, evaluate, and improve. This issue will be addressed as follows.

·        Data Sampling ─ Wireless networks produce gigabytes of data in minutes on calls, signals, and network states, and the data are complex because call control is dynamic and simultaneously shared by several separately monitored base stations. Such data overwhelm the capability of current analysis systems and are, consequently, little used. Thus, there is an urgent need for coordinated, rapid, adaptive sampling that goes beyond existing methods (e.g., Lynch 2003; Mattes and Mosig 2004) to extract the rich information from such data for monitoring performance, detecting network degradation, and optimizing quality of service.

·        Performance Monitoring ─ Recent trends in wireless communications include the transition to packet-based, data-carrying wireless networks of increased complexity, the introduction of new services like wireless video, and the explosion in number of users. Much work has recently been done on process monitoring (e.g., Wu and Meeker 2002; Apley and Lee 2003; Grigg and Farewell 2004). Some work has also been done on network event and intrusion detection (e.g., Becker et al. 1998; Manikopoulos and Papavassiliou 2002). The existing methods, however, are not sufficient here, and we will develop new monitoring metrics and schemes to evaluate network performance and user-experienced quality.

·        Dynamic Visualization ─ Wireless networks are complex: even a call from a stationary caller is maintained by several base stations simultaneously, with the station of primary control depending on other concurrent calls in the network. This leads to messy data with complex, space/time-varying correlation structure, which requires new ways to visualize performance in real time, across networks, equipment, users, and calls. This capability is currently lacking, and we will extend existing methods (e.g., Pascoe et al. 2002; Nagel and Granum 2004) to visualize performance.

·        Hierarchical Spatial/Temporal Models ─ Wireless networks involve multiple streams of signaling events, as well as packet flows and network summary statistics arriving at different time scales (event-driven, regularly spaced, etc.), and require multiple levels of analysis. Analysis is across networks, base stations, communities of interest, callers, calls, etc. This requires the development of new hierarchical spatial-temporal models, because existing ones (e.g., Waller et al. 1997; Wu and David 2002; Wikle 2003) are not suited for network modeling.
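
As a point of reference for the Performance Monitoring item above, the following is a minimal sketch of a standard exponentially weighted moving average (EWMA) control chart, one of the existing process-monitoring schemes that the proposed work would need to go beyond. It is not the proposed methodology. The monitored quantity (a per-interval rate), the in-control mean and standard deviation, and the parameters lam and L are all illustrative assumptions.

```python
import math

def ewma_alarms(x, mu0, sigma0, lam=0.2, L=3.0):
    """Return indices at which the EWMA statistic leaves its control limits.

    x      : sequence of observations (a hypothetical per-interval metric)
    mu0    : assumed in-control mean
    sigma0 : assumed in-control standard deviation
    lam    : smoothing weight in (0, 1]
    L      : control-limit width in units of the EWMA standard deviation
    """
    z = mu0
    alarms = []
    for t, xt in enumerate(x, start=1):
        z = lam * xt + (1.0 - lam) * z
        # Exact (time-varying) variance of the EWMA statistic at step t.
        var = sigma0**2 * (lam / (2.0 - lam)) * (1.0 - (1.0 - lam) ** (2 * t))
        if abs(z - mu0) > L * math.sqrt(var):
            alarms.append(t - 1)
    return alarms

# Illustrative data: 20 in-control values followed by a sustained upward shift.
series = [0.02] * 20 + [0.05] * 10
alarm_idx = ewma_alarms(series, mu0=0.02, sigma0=0.01)
```

A chart of this kind assumes a single, independent, stationary stream. The multi-stream, multi-time-scale, spatially correlated data described above violate those assumptions, which is one reason the existing methods are not sufficient here.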

 

          Modern Power Systems (with the NSF-supported I/UCRC Power Systems Engineering Research Center at ISU). This center is led by Professor James McCalley of the Electrical and Computer Engineering Department, and one of the center’s focus areas is power systems reliability. The reliability of our electric power generating, transmission, and distribution systems is of critical national importance. Aging infrastructures and new technologies in data acquisition, storage, and processing are giving rise to new problems on the interface of statistics and power engineering. Statistical analysis of reliability data has traditionally focused on failure-time data, modeled as a function of limited available environmental data (e.g., Singpurwalla 1995; Meeker and Escobar 1998, chaps. 17-19; Meeker et al. 2002). Reliability data are getting much richer, however, mainly because of advances in sensor technology and decreases in sensor costs. For example, detailed information is becoming available about degradation over time (condition-monitoring data; e.g., see Han and Song 2003) and environmental conditions (e.g., temperatures and voltage surges) to which systems have been exposed. Developing appropriate statistical and reliability models and methods for such data will help power engineers allocate limited resources to maximize system reliability and availability.

 

Nano-Scale Metrology and Metrology for High-Throughput Devices (with NIST)

          Computation has significantly enabled research and development in science and engineering, complementing and enhancing traditional approaches based solely on theory and physical experimentation. The particular critical needs of nano-scale metrology and instrumentation lie in the following two areas of the computational and statistical sciences.

·        Modeling and Simulation Coordinated with High-Precision Nano-Scale Measurements ─ No single existing modeling approach can adequately cover the enormous range of length and time scales that must be addressed in nano-metrology. For example, while capable of capturing the essential physics with a small number of atoms over a few nanoseconds, quantum mechanical/electronic structure calculations quickly become computationally prohibitive as the number of atoms grows. A challenging problem in computational nanotechnology is thus the development of multi-scale methods for the modeling of nano-scale components (material, mechanical, fluidic, etc.) encountered in nano-manufacturing systems (e.g., see La Magna et al. 2002; Srivastava and Atluri 2002; Chandra and Namilae 2003; Shen and Atluri 2004). These methods will combine molecular dynamics, coarse-grained mesoscopic dynamics, stochastic dynamics, and continuum theories. Here, techniques for transforming information from one type of simulation to another (such as homogenization for coarse-graining) will be needed.

·        Development of Statistical Tools to Effectively Manage and Exploit Nano-Scale Measurements and Other High-Volume Scientific Data ─ Uncertainty and correlation are inherent in such data (Weckenmann et al. 2004). Specialized statistical tools must be integrated with instrumentation to collect, process, verify, correct, and condense the enormous volume of data, and off-load the data for archiving. We will develop new statistical methods for data sampling to monitor and control instruments, for real-time estimation, and for stochastic signal detection, identification, and classification.

 

The above issues also arise in other modern physical sciences with large-scale datasets. For example, gigabytes of data are common in high-throughput experiments and in image data resulting from modern instruments (e.g., scanning electron microscopes and atomic force microscopes). When modeling and simulation is used to validate or stand in for physical measurement, gigabytes of data are generated to describe state variables at each point in space for each time step.

 

Computer Models as Alternatives to Physical Models (with LANL)

          Interest in using computer models in physical sciences and engineering applications is growing rapidly (e.g., see Currin et al. 1991; Welch et al. 1992; Berk et al. 2002). In particular, the LANL Statistics group contributes heavily to studies involving models of military systems, industrial processes, transportation systems, earth-ocean dynamics, and other large-scale phenomena. Three important research topics in computer modeling are the following.

·        Sensitivity/Uncertainty Analysis ─ The importance of this topic can be seen from much of the recent work (e.g., Saltelli et al. 1999; Saltelli et al. 2000; Oakley and O’Hagan 2002, 2004; Morris et al. 2004). The goal is to understand which specific inputs (often among 1000 or more inputs) are most influential in determining output behavior, generally using as few (often expensive) runs of the model as possible. This allows subsequent effort to be much better focused, and is a critical first step in understanding the empirical behavior of a computer model.

·        Model Validation ─ In simple cases, this is based on comparing computer output to data acquired in physical experiments to determine whether the model matches reality. This is not, however, physically possible in many systems, such as the weapons systems studied at LANL. The physically observable quantities may be only intermediate values, or have indirect relationships to the outputs produced by the model. Methodology development for comprehensive model validation will require substantial and fundamental research (Bayarri et al. 2002).

·        Model Calibration ─ In almost any realistic context, a computer model cannot possibly express all the complexity of the system to be modeled. For use in specific predictive contexts, model calibration ─ adjusting model parameters so that outputs are relevant to the prediction context ─ is required (Kennedy and O’Hagan 2001). Of particular interest is understanding the impact of this activity on the uncertainty of model predictions.
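
To make the Sensitivity/Uncertainty Analysis item above concrete, the following is a minimal sketch of one-at-a-time elementary-effects screening, a simple existing approach to ranking inputs with a limited number of model runs. It is offered only as an illustration and is not the methodology proposed here. The function toy_model is a hypothetical stand-in for an expensive simulator, and the unit-cube input domain, n_base, and delta are assumptions.

```python
import random

def elementary_effects(model, n_inputs, n_base=20, delta=0.1, seed=0):
    """Mean absolute elementary effect of each input on [0, 1]^n_inputs.

    Uses n_base * (n_inputs + 1) model runs: one run at each random base
    point plus one run per input with that input shifted by delta.
    """
    rng = random.Random(seed)
    totals = [0.0] * n_inputs
    for _ in range(n_base):
        # Draw the base point so that the shifted point stays inside [0, 1].
        base = [rng.uniform(0.0, 1.0 - delta) for _ in range(n_inputs)]
        y0 = model(base)
        for i in range(n_inputs):
            shifted = list(base)
            shifted[i] += delta
            totals[i] += abs(model(shifted) - y0) / delta
    return [t / n_base for t in totals]

# Hypothetical stand-in for an expensive simulator: inputs 0 and 3 dominate,
# input 1 has a very small effect, and inputs 2, 4, and 5 are inert.
def toy_model(x):
    return 5.0 * x[0] + 2.0 * x[3] ** 2 + 0.01 * x[1]

scores = elementary_effects(toy_model, n_inputs=6)
ranking = sorted(range(6), key=lambda i: scores[i], reverse=True)
```

The run count grows linearly with the number of inputs, so with 1000 or more inputs and an expensive model even this simple design can be too costly, which illustrates why more economical screening strategies are of interest.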

 

Performance Assessment of Complex Systems without Conventional Testing (with LANL)

          The reliability of newly designed complex systems is often difficult to assess by traditional means, because direct testing is either impractical or impossible. Examples include physical systems such as large weapons systems (e.g., Farquharson et al. 2001), and virtual systems such as detailed response plans for dealing with large-scale national disasters (e.g., Olshansky and Wu 2001). It is conceptually possible to describe systems of these kinds in great detail, but their reliability and performance in practice cannot be assessed directly because traditional experiments cannot be carried out. Valuable information that is indirectly related to the reliability of such a system is, however, usually available. Examples include:

·        Physical testing data from similar systems, or from settings not of present interest, e.g., weapons-test data on systems in use before the adoption of the Nuclear Test Ban Treaty,

·        Physical testing data from subsystems, e.g., data from lab-scale tests of weapon guidance systems under carefully controlled conditions,

·        Expert opinion, e.g., opinion reflecting the experience of disaster-relief managers in dealing with specific components of a response strategy, and

·        Numerical results of simulation studies, e.g., results based on computer models of components of a disaster-response plan and interactions between the components.

While such indirect information is relevant to assessing the system’s performance, it is challenging to develop appropriate methodology that effectively uses this information (Fuentes et al. 2003). Because many of the systems that are central to the LANL mission are difficult to assess directly, the LANL Statistics group has substantial experience in research and consulting related to analyses based on multiple sources of information (e.g., Johnson et al. 2002; Reese et al. 2004). Nevertheless, statistical research in this area has just begun, and we will work with the LANL Statistics group to develop statistical methodology that will be effective for a wide variety of applications.
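
One simple way to see how two of the information sources listed above might be combined is a conjugate Bayesian calculation, sketched below under strong simplifying assumptions. Expert opinion about each component is encoded as a Beta prior, subsystem pass/fail tests update that prior, and the system is assumed to work only if every component works (a series structure with independent components). The component values are hypothetical, and this sketch should not be read as the methodology of the cited work or of this proposal.

```python
import random

def system_reliability_draws(components, n_draws=5000, seed=0):
    """Posterior draws of series-system reliability.

    components: list of (prior_a, prior_b, successes, failures) tuples, where
    the Beta(prior_a, prior_b) prior encodes expert opinion and the counts
    come from subsystem pass/fail tests.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        r = 1.0
        for a, b, s, f in components:
            # Conjugate update: Beta prior + binomial data -> Beta posterior.
            r *= rng.betavariate(a + s, b + f)
        draws.append(r)
    return draws

# Hypothetical inputs for a three-component series system.
components = [
    (9.0, 1.0, 48, 2),   # expert prior with mean 0.90, plus 50 lab tests
    (19.0, 1.0, 30, 0),  # expert prior with mean 0.95, plus 30 lab tests
    (4.0, 1.0, 0, 0),    # expert opinion only (prior mean 0.80), no test data
]
draws = sorted(system_reliability_draws(components))
post_mean = sum(draws) / len(draws)
interval_90 = (draws[int(0.05 * len(draws))], draws[int(0.95 * len(draws))])
```

In this toy setting the component with no test data dominates the uncertainty about the system. Realistic problems add data from similar but not identical systems, simulation output, and dependence among components, none of which this sketch handles.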

 

Non-Destructive Evaluation (with the NSF-supported I/UCRC Center for Non-Destructive Evaluation at ISU)

          This center (CNDE), led by Professor Bruce Thompson (member of the National Academy of Engineering), has an interdisciplinary group of 50 faculty and staff members working in close cooperation with industry to advance the field of non-destructive evaluation (NDE). CNDE also has about 65 undergraduate and graduate students. As a practical discipline, NDE is widely used in many areas of application, such as aerospace reliability, nuclear and fossil-fuel power generation, chemical plant reliability, and condition-based monitoring of operating systems (Krautkramer and Krautkramer 1990; Halmshaw 1991). NDE involves data generation and interpretation. Much NDE research focuses on the development of new and better NDE methods and the development of physics-based models that provide predictions of distributions of signal and noise for inspection/evaluation of NDE systems (e.g., Krautkramer and Krautkramer 1990; Halmshaw 1991; von Kreutzbruck et al. 2001; Liu and Forsyth 2004). Such physics-based models further fundamental understanding of NDE systems, and reduce the amount of expensive physical experimentation necessary in development and qualification of new NDE systems. Statistics plays an important role in NDE (e.g., Hovey and Berens 1989; Sweeting 1995; Olin and Meeker 1996; Leemans and Forsyth 2004). Important performance metrics, such as probability of detection, depend on the variability in NDE data. The use of physics-based models has, meanwhile, increased opportunities for research in statistical theory and methods for NDE applications. Since 1989, W. Meeker has served as the statistician on many CNDE projects, and CNDE typically supports one or two Statistics graduate students at ISU each year to do research and to help with projects.

 

NDE data are generated in both actual applications (field data) and laboratory experiments. Field data are used to quantify the actual properties of an operating NDE system, and laboratory data are needed to quantify those parts of the inspection process that are not yet understood well enough to have a physics-based model. (Reduction in reliance on expensive laboratory experiments is an important goal.) Inspection data are complex because they often involve contaminated errors in the predictors (input variables), censoring, and/or truncation. In efforts important to NDE scientists, we propose to develop new statistical methodology for the analysis of complicated inspection data and the (highly multivariate) data resulting from developing NDE methods. In addition, we will develop new Bayesian statistical methods to deal with complicated inverse problems of flaw characterization based on NDE data.
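
The dependence of probability of detection (POD) on variability in NDE data, noted above, can be illustrated with a commonly used signal-response model. The sketch below assumes that the log of the inspection signal is linear in the log of the flaw size with normally distributed noise, and that a flaw is called detected when the signal exceeds a threshold. The parameter values b0, b1, sigma, and y_th are hypothetical and chosen only for illustration.

```python
import math

def pod(a, b0, b1, sigma, y_th):
    """POD for flaw size a under ln(signal) = b0 + b1*ln(a) + N(0, sigma^2)."""
    z = (b0 + b1 * math.log(a) - math.log(y_th)) / sigma
    # Standard normal CDF evaluated at z.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical parameter values chosen only for illustration.
b0, b1, y_th = 1.0, 1.5, 2.0
for sigma in (0.2, 0.6):
    curve = [(a, round(pod(a, b0, b1, sigma, y_th), 3)) for a in (0.5, 0.8, 1.0, 1.5)]
    print(sigma, curve)
```

Under this model the flaw size detected half the time does not depend on sigma, but a larger sigma flattens the POD curve, so larger flaws are missed more often. This simple version ignores the censoring, truncation, and errors in the predictors described above, which are what make the analysis of real inspection data difficult.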

 

 

Combinatorial Discovery in Chemistry, Biology, Materials Science, and Engineering (with the Combinatorial Discovery Initiative at ISU)

          Combinatorial Science embodies the use of massively parallel strategies for the creation and high-throughput testing of enormous numbers of samples, organized in sample libraries, for accelerated scientific discovery (Szostak 1997; Borman 1998; Watkins 2001). This approach to drug discovery has received substantial attention in recent years (Borman 1998), and analogous approaches are currently being developed in many other areas (e.g., Cawse 2001; Panicker et al. 2004). The Combinatorial Discovery Initiative at ISU, led by Professor Marc Porter of the Chemistry Department, involves participating faculty members from ten academic departments and four colleges at ISU. Specific research areas include the development of rapid assays for investigating the structure and selectivity of catalysts, the role of biomaterial structures in biochemical interactions, and the impact of interfacial structures of various length scales in the development of nano-materials. Broad application areas that could benefit greatly from the techniques developed in this work include energy conversion/generation and drug delivery systems development. In each case, enormous and complex data structures are used as input and generated as output. We will develop new statistical concepts and methods for the design of appropriate libraries, and for screening analyses that take advantage of the properties of the specific physical system under study. Analysis of massive observational data sets has drawn the attention of many research statisticians over the last decade (e.g., Hastie et al. 2001; Bozdogan 2004). The rapidly expanding area of Combinatorial Science, however, requires new statistical approaches for massive design and analysis in carefully controlled experimental contexts.