CSAFE and STAT Students Participate in Summer Undergraduate Research Symposium

Several students participating in undergraduate statistics research this summer presented posters in the Great Hall of the Memorial Union on Thursday, August 3.

Estimating Probability of True ‘Hits’: Cathryn Barbour, Blanca Parker

Advisors: Dr. Alicia Carriquiry, Dr. Heike Hofmann

Abstract: A crime has been committed and firearms examiners have recovered a bullet from the scene. To generate investigative leads, examiners look for a “hit”, or a match between the crime scene bullet and a known source. The current practice is to use NIBIN, a large proprietary database of documented firearms cases. Third-party validation of the accuracy of the NIBIN algorithm, which returns a list of bullets potentially similar to the one submitted as a query, is not possible at this time. We seek to explore the probability of a true “hit” and the risk of a false positive. The densities of similarity scores for pairs of bullets fired from different guns (DS) and from the same gun (SS) are modeled by beta distributions. Assuming a single true match is in the database, we then simulate a search by sampling n items: n-1 from the DS density and one from the SS density. We find that false positives increase dramatically with the size of the database; our simulated databases contained 1,000 to 10,000 items. As of July 14th, 2023, NIBIN contained 5.7 million items.
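The search simulation described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the students' code: the beta parameters, the top-k definition of a "hit", and the function name `simulate_top_k_hit_rate` are all hypothetical choices made for the example.

```python
import random


def simulate_top_k_hit_rate(n_items, k=1, n_searches=500,
                            ds_params=(2.0, 8.0), ss_params=(8.0, 2.0),
                            seed=0):
    """Estimate how often the single true match ranks in the top k of a search.

    ds_params and ss_params are ASSUMED beta parameters for the
    different-source (DS) and same-source (SS) similarity scores;
    the abstract does not report the fitted values.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_searches):
        # One score for the true match, drawn from the SS density.
        ss_score = rng.betavariate(*ss_params)
        # n_items - 1 scores for non-matching items, drawn from the DS density;
        # count how many of them outrank the true match.
        n_higher = sum(
            rng.betavariate(*ds_params) > ss_score
            for _ in range(n_items - 1)
        )
        if n_higher < k:
            hits += 1
    return hits / n_searches


if __name__ == "__main__":
    # Database sizes taken from the range mentioned in the abstract.
    for size in (1_000, 10_000):
        rate = simulate_top_k_hit_rate(size, k=1, n_searches=50)
        print(f"n = {size}: estimated top-1 hit rate = {rate:.2f}")
```

Every different-source item that outranks the true match is a potential false positive, so tracking `n_higher` as the database grows gives a rough view of the effect the abstract reports.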

Assessing uniformity of coverage in PCR-amplified nanopore sequencing to diagnose avian infectious bronchitis virus in chickens: Emily Lopez, Jaanve Mehta

Advisor: Dr. Karin Dorman

Abstract: Avian Infectious Bronchitis Virus (IBV) causes a highly contagious respiratory disease in poultry. Symptoms include trouble breathing, nasal discharge, misshapen eggs, and/or reduced egg production, which can inflate egg prices when salable products are limited. Currently, there are no accurate, rapid tests to diagnose IBV. To diagnose IBV quickly and limit its spread, our collaborators have proposed using Nanopore sequencing technology to detect IBV and other pathogens through real-time sequencing of nucleic acids in field samples. However, two problems arise in accurately identifying IBV in a sample: low levels of IBV make it difficult to detect reliably, and IBV is highly susceptible to mutations, so new variants are constantly arising. We can accommodate strain variability and detect low-abundance IBV sequences by using PCR to amplify the whole S1 gene in 11 short (approx. 200 base pair) tiled amplicons (primers designed by PrimalScheme). The simultaneous use of 11 primer pairs allows IBV detection despite the failure of one or more primers due to mutation; however, the goal is to design primers that uniformly amplify the entire S1 gene in currently known IBV strains. To test the protocol, these primer sets were used on ten positive samples, including seven clinical samples. We designed a bioinformatics pipeline to obtain the number of reads of each amplicon for each of the ten samples. We discovered and accommodated unexpected amplicon products and amplicon fragmentation. We used a multinomial probability model to test for uniform coverage among the 11 amplicons and discovered significant non-uniformity. We investigated the causes of non-uniform amplicon coverage to suggest strategies for better primer design.
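One common way to test for uniform coverage under a multinomial model is a Pearson chi-square goodness-of-fit test against equal probabilities. The sketch below assumes that approach; the abstract does not say which test the students used. The read counts and the function name `uniform_coverage_test` are hypothetical, and the closed-form p-value only covers even degrees of freedom (which holds for 11 amplicons, giving 10).

```python
import math


def uniform_coverage_test(read_counts):
    """Pearson chi-square test of equal multinomial probabilities.

    Returns (statistic, degrees_of_freedom, p_value).
    """
    k = len(read_counts)
    total = sum(read_counts)
    if k < 2 or total == 0:
        raise ValueError("need at least two amplicons and one read")
    expected = total / k  # expected reads per amplicon under uniformity
    stat = sum((c - expected) ** 2 / expected for c in read_counts)
    df = k - 1
    if df % 2 != 0:
        raise ValueError("this closed form only covers even degrees of freedom")
    # Chi-square survival function for even df:
    #   P(X > x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    half = stat / 2.0
    p_value = math.exp(-half) * sum(
        half ** i / math.factorial(i) for i in range(df // 2)
    )
    return stat, df, p_value


# Hypothetical read counts for 11 amplicons in one sample (not real data).
counts = [950, 1020, 400, 1100, 980, 30, 1010, 990, 1200, 870, 1005]
stat, df, p = uniform_coverage_test(counts)
print(f"chi-square = {stat:.1f}, df = {df}, p = {p:.3g}")
```

With sequencing depths in the thousands of reads, even modest imbalances give very small p-values, so the per-amplicon deviations from the expected count are usually worth inspecting alongside the test result.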

Small Area Estimation for Poverty Rate Indicators: Lavion Sumerlin, Hazer Becic

Advisor: Dr. Emily Berg

Abstract: Small area estimation is the problem of constructing estimates for domains with small sample sizes. An important small area parameter is the poverty rate, defined formally as the proportion of individuals in a domain with income below the poverty line. We compare two estimators of the poverty rate. One specifies a logistic mixed model for binary indicators that individuals are below the poverty line, and the other assumes that the log incomes have normal distributions. We apply the methods to income data for individuals in Spanish provinces. We compare the two approaches for predicting province-level poverty rates.
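To illustrate the difference between the two modeling ideas, the sketch below compares a direct estimate (the sample proportion of binary below-the-line indicators) with a plug-in estimate that assumes log incomes are normally distributed, for a single domain. It is a deliberate simplification: it omits the random effects that let small area models borrow strength across provinces, and the function names and income values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def direct_poverty_rate(incomes, poverty_line):
    """Sample proportion of individuals with income below the poverty line."""
    return sum(y < poverty_line for y in incomes) / len(incomes)


def lognormal_poverty_rate(incomes, poverty_line):
    """Plug-in estimate assuming log income ~ Normal(mu, sigma^2).

    Under that assumption, P(income < z) = Phi((log z - mu) / sigma).
    """
    logs = [math.log(y) for y in incomes]
    mu, sigma = mean(logs), stdev(logs)
    return NormalDist(mu, sigma).cdf(math.log(poverty_line))


# Hypothetical incomes for one small domain (not the Spanish survey data).
incomes = [5200.0, 7400.0, 9100.0, 12500.0, 15800.0, 21000.0, 34000.0]
line = 8000.0
print("direct:   ", round(direct_poverty_rate(incomes, line), 3))
print("lognormal:", round(lognormal_poverty_rate(incomes, line), 3))
```

With only a handful of observations the direct estimate can take just a few distinct values, which is one reason model-based estimators are attractive for small domains.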


Applications of Confocal Microscopes on Toolmark Analysis: Eden Amin, Samantha Springer, Nate Simon, Sanika Gokakkar, Dr. Jeff Salyards, Dr. Heike Hofmann

Abstract: There is a lack of published articles demonstrating the potential of wire cutter toolmark comparisons. This study involves creating test cuts in aluminum wire using Kaiweets wire cutters. Our preliminary findings show that test cuts have a small area with no striations, or “smush”, before any striae appear. In the next steps, we will create a program to extract three-dimensional data, or a signature, from the test cuts, ultimately determining accuracy as a function of the area of the cut. For this study, we have created a method for achieving reliable scans of the aluminum wire on a confocal microscope. Our initial data collection, consisting of four wires cut per wire cutter at three separate locations (inner, middle, and outer), has shown some variance between the location of the test cut on the wire cutters and the scan itself. Statistical analysis of the cuts has shown that there are distinguishing factors between wire cuts and that replicate cuts can be identified. Each packaged wire has a corresponding file within an organized folder system so that it can be easily located and compared in future analysis.


Using Neural Networks to Identify Interest Points in Shoe Treads: Matthew Nissen, Gautham Venkatasubramanian

Abstract: The goal of this project was to create a machine learning algorithm to detect corner points in pictures of shoe treads. Current corner detection software is only good at picking up sharp, well-defined corners, which footprints rarely have. In the future, this type of technique could be used to aid further analysis by mapping the corner points onto shoe treads before more advanced techniques are applied.

    We first manually marked shoeprint photos with dots on the corners and sampled from these images to create our dataset of corner pixels and non-corner pixels. We then experimented with different neural network structures to optimize performance. We also ran many trials over the hyperparameters, finding the optimal settings one at a time. Examples of these hyperparameters are the learning rate, number of training cycles, image size, and number of layers.

    The project was largely successful, creating a model with over 90% accuracy on data it had never seen before. We found that simple models worked best overall, allowing for strong outcomes while preventing overfitting and speeding up computation times.

    This model will need to be continuously improved, however, because the pictures contain thousands of pixels and marking 10% of them incorrectly means hundreds of wrong marks. What we have now will work as a powerful starting point for this future research.
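The dataset-building step described in this abstract (sampling corner and non-corner pixels from marked photos) might look roughly like the sketch below. It assumes each training example is a small square patch centered on a pixel, which the abstract does not state; the patch size, the balanced sampling of negatives, and the function name `sample_patches` are hypothetical.

```python
import random


def sample_patches(image, corner_points, patch_size=5, n_negative=None, seed=0):
    """Build (patch, label) pairs: label 1 for marked corners, 0 otherwise.

    image is a 2D list of pixel intensities; corner_points is an iterable
    of (row, col) positions that were manually marked as corners.
    """
    half = patch_size // 2
    rows, cols = len(image), len(image[0])

    def inside(r, c):
        # Keep only centers whose full patch fits inside the image.
        return half <= r < rows - half and half <= c < cols - half

    def patch(r, c):
        return [row[c - half:c + half + 1]
                for row in image[r - half:r + half + 1]]

    corners = {(r, c) for r, c in corner_points if inside(r, c)}
    positives = [(patch(r, c), 1) for r, c in sorted(corners)]

    # Candidate non-corner centers: every valid pixel that was not marked.
    candidates = [(r, c)
                  for r in range(half, rows - half)
                  for c in range(half, cols - half)
                  if (r, c) not in corners]
    rng = random.Random(seed)
    k = len(positives) if n_negative is None else n_negative
    chosen = rng.sample(candidates, min(k, len(candidates)))
    negatives = [(patch(r, c), 0) for r, c in chosen]
    return positives + negatives


# Tiny synthetic image with two hypothetical marked corners.
image = [[0] * 12 for _ in range(12)]
dataset = sample_patches(image, [(4, 4), (7, 8)])
print(len(dataset), "examples,", sum(label for _, label in dataset), "corners")
```

Sampling as many non-corner patches as corner patches keeps the two classes balanced, which matters because corners are a tiny fraction of all pixels in a photo.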


Expanding ShoeCase: A Mock Crime Scene Footwear Impression Database: Saniya Lyles, Tiffany Ongtowasruk, Abigail Tibben, Gautham Venkatasubramanian

Abstract: One of the persistent problems in forensic science is a lack of data. Creating datasets representative of casework is a challenge because it can be difficult to obtain large quantities of images collected under the same protocol and with the same shoes. Data collected in this way, however, can be of significant use for research and training within forensic science disciplines.

    The current ShoeCase dataset is available on ISU Datashare, with mixed impression types, flooring, lift techniques, and digital file types. The complete dataset will include over 900 shoeprint images contained in more than 3,000 digital files. While this allows further research in forensics, these impressions were made with only two shoe types. The goal is to make this dataset as robust and authentic as possible, which will allow for higher levels of testing and certainty in current and future research.

    The REU team at the Center for Statistics and Applications in Forensic Evidence (CSAFE) worked to expand the existing data with new types of shoes and alternate collection protocols. Our poster presentation will explain why this dataset is important, how the images were collected and processed, and why documented variability within datasets matters.


Perceived housing problems and depressive symptoms among middle-aged and older Americans: Feifan Cao, Peiyi Lu, Mack Shelley

Abstract: Housing insecurity and depression are prevalent among older Americans. In 2019, 37.1 million US households were burdened by housing costs. About 10% of US older adults (aged 65+) reported depression in a national survey in 2014-2015. Housing insecurity has been shown to be associated with worse mental health in prior research. However, previous studies mostly examined one dimension of housing insecurity (e.g., affordability), and few focused on older adults. This study aimed to examine the relationship of perceived housing problems with depressive symptoms among middle-aged and older Americans. Negative binomial models estimated using data from the Health and Retirement Study from 2006, 2010, 2014, and 2018 showed the following: about 5%-7% of respondents had housing problems at each study visit; 5.67% of respondents experienced persistent housing problems over the 12-year study period; having housing problems was associated with a significantly higher risk of depressive symptoms; and a dose-response relationship was observed in the severity and duration of housing problems.