# Statistics Research Experience for Undergraduate Students

*Apply to become an undergraduate intern for the Iowa State University Department of Statistics 2023 REU summer program.*

**2023 REU Program Information**

Program Dates: June 12-August 4, 2023

Application Deadline: March 24, 2023

Apply Online: http://bit.ly/3IE7jJ3

Compensation: All admitted students receive a weekly stipend of $550 plus an allowance for travel, housing, and meals.

Contact: Anthony Greiter, learning and development specialist, agreiter@iastate.edu

The Iowa State University Department of Statistics is accepting applications for its summer research experience for undergraduate students (REU). The 2023 program will provide students with the opportunity to conduct hands-on research using their computational and statistical skills.

During the eight-week immersive internship program, students will work closely with peer mentors, faculty members, and graduate students on current research projects. They will gain valuable research experience, leadership skills, and a deeper understanding of statistics.

The Statistics REU program will begin on June 12 and continue through August 4. Students are expected to participate in program activities for approximately 40 hours each week and join in team meetings and other scheduled events. Students will present their research findings at the end-of-program poster session.

**Application Process**

Students from underrepresented groups are strongly encouraged to apply. Preference will be given to students who have completed their sophomore year in an undergraduate degree program.

Applications for the 2023 Statistics REU will be accepted until all available spaces have been filled. To guarantee that an application is reviewed, please apply by March 24 at http://bit.ly/3IE7jJ3. Selected students will be notified by April 7.

**2023 Research Projects**

REU students will work on a research project in teams composed of three undergraduate students and at least one graduate student under the direction of a faculty member. The team will include one Iowa State statistics student and two visiting students.

Students will work on the following research projects:

**Project 1: Monte Carlo Estimation of Small Area Indicators**

Small area estimation is the problem of constructing estimates for domains with small sample sizes. Examples of important small area parameters include the Gini coefficient and the poverty gap. The predictors of these small area parameters are defined as complex integrals, so numerical methods are needed to approximate them. In undergraduate math classes, we learn to use a Riemann sum to approximate an integral. A statistical way to approximate an integral is to use simulation, also known as the Monte Carlo method. We will compare alternative integral approximations for complex small area parameters.
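As a rough illustration of the two kinds of approximation mentioned above, the following Python sketch compares a midpoint Riemann sum with a simple Monte Carlo estimate. The integrand, the interval, the sample sizes, and the function names are assumptions chosen for illustration only; they are not the small area predictors studied in the project.

```python
import math
import random


def riemann_midpoint(f, a, b, n):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    width = (b - a) / n
    return width * sum(f(a + (i + 0.5) * width) for i in range(n))


def monte_carlo(f, a, b, n, rng):
    """Approximate the same integral by averaging f at n uniform random points."""
    total = sum(f(rng.uniform(a, b)) for _ in range(n))
    return (b - a) * total / n


def integrand(x):
    # Illustrative integrand (an assumption): exp(-x^2), which has a known
    # integral on [0, 1] in terms of the error function.
    return math.exp(-x * x)


# Reference value: integral of exp(-x^2) from 0 to 1 = sqrt(pi)/2 * erf(1)
exact = math.sqrt(math.pi) / 2 * math.erf(1.0)

riemann_estimate = riemann_midpoint(integrand, 0.0, 1.0, 1000)
mc_estimate = monte_carlo(integrand, 0.0, 1.0, 100_000, random.Random(2023))

print(f"exact       = {exact:.6f}")
print(f"Riemann sum = {riemann_estimate:.6f}")
print(f"Monte Carlo = {mc_estimate:.6f}")
```

In one dimension the Riemann sum is typically far more accurate for the same number of function evaluations. Monte Carlo tends to become attractive when the integral is high-dimensional or is defined through a distribution that is easy to simulate from.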

**Project 2: Estimating the Poverty Rate in Small Areas**

Standard survey estimators can be unreliable when sample sizes in domains of interest are small. Small area estimation refers to the use of models to obtain efficient estimates for small domains. An important small area parameter is the poverty rate, defined formally as the proportion of individuals in a domain with income below the poverty line. We will compare two estimators of the poverty rate. For the first approach, we define binary indicators of whether or not individual units are classified as being in poverty and then fit a logistic mixed model to the indicators. For the second approach, we model the incomes directly and then regard the poverty rate as a nonlinear parameter. We will compare the two approaches using simulated data.
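The contrast between the two approaches can be sketched in a heavily simplified form for a single domain. The Python example below is an illustration under stated assumptions, not the project's method: it omits the mixed-model random effects entirely, assumes incomes are lognormal, and uses a hypothetical poverty line, sample size, and parameter values.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

rng = random.Random(2023)

# Hypothetical settings (assumptions for illustration only)
poverty_line = 15000.0
true_mu, true_sigma = 10.0, 0.8   # parameters of log-income
sample_size = 50                  # a "small" domain sample

# Simulate incomes for one domain from a lognormal distribution
incomes = [rng.lognormvariate(true_mu, true_sigma) for _ in range(sample_size)]

# Approach 1 (simplified): average the binary poverty indicators
indicator_estimate = fmean(1.0 if y < poverty_line else 0.0 for y in incomes)

# Approach 2 (simplified): model log-income as normal, then treat the poverty
# rate as a nonlinear function of the fitted mean and standard deviation
log_incomes = [math.log(y) for y in incomes]
mu_hat = fmean(log_incomes)
sigma_hat = stdev(log_incomes)
model_estimate = NormalDist(mu_hat, sigma_hat).cdf(math.log(poverty_line))

# Poverty rate implied by the simulation settings, for comparison
true_rate = NormalDist(true_mu, true_sigma).cdf(math.log(poverty_line))

print(f"true rate           = {true_rate:.3f}")
print(f"indicator estimate  = {indicator_estimate:.3f}")
print(f"income-model result = {model_estimate:.3f}")
```

Repeating the simulation over many samples and many domains would show how the variability of each estimator depends on the sample size and on how well the income model fits.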

**Project 3: Diagnostic Tests to Detect Avian Viral Pathogens**

Students will work with a team of statisticians, computer scientists, and veterinary professionals to develop new diagnostic tests for the rapid, *in situ* detection of avian viral pathogens, such as Avian Influenza Virus (IAV) and Infectious Bronchitis Virus (IBV), on the farm. Students will help to develop computational and statistical methods to identify these pathogens in modern Nanopore sequencing data applied to swab samples, with the long-term goal of rapidly detecting and managing these important diseases on the farm before outbreaks can spread. Students will develop skills in programming, big data management, genetics, and statistics to conduct the research.

**Project 4: Deep Learning Methods for Detecting Protein-DNA Binding Events**

Students will extend a modern deep learning method for detecting protein binding sites in a genome (generically, "peak calling") so that it can be applied to the relatively new technology of CUT&RUN sequencing. Knowing where a protein binds in a genome under particular biological conditions is important for elucidating how the cell responds to changing conditions. In this case, we are interested in understanding how hematopoietic stem cells differentiate into blood cells, with the ultimate goal of using a patient's blood to cure blood malignancies. Students will develop skills in computing, deep learning models, statistics, and genetics to conduct this project.

**Project 5: Estimating the Probability of True “Hits” when Searching Databases of Bullet and Cartridge Case Images**

A crime has been committed, and firearms examiners have recovered spent bullets and cartridge cases from the scene. To generate investigative leads, examiners compare images of those bullets and cartridge cases to thousands of images that other examiners have uploaded to a database. The hope is to get “a hit,” that is, to find one or more images in the database that look like the crime scene samples. The only database available at this time is called NIBIN. It is proprietary, as is the algorithm that produces a similarity score between the query sample and the images in the database. Consequently, there is no way to know how often a search may fail to find a real match or the probability that a search may result in false hits.

Using our images and algorithms developed by researchers at Iowa State University, we will assemble a small database and mimic what examiners do when they query NIBIN. To explore the probability of false positives, we will carry out searches when true matches are not included in the database. Students will learn to use R for the statistical calculations and to operate instruments that produce high-resolution images of the surface of bullets and cartridge cases.