Research Themes

I integrate the theory and practice of artificial intelligence (AI) and machine learning (ML) with statistics. AI/ML methods often forgo the heavy modeling assumptions of classical statistics in favor of a model-free, data-driven approach. This makes them scalable and flexible, but it also means they frequently neglect the data's inherent structure. Classical statistics has long been used to model such domain-specific structure, with scientists' domain expertise as a foundation. I combine the power of statistical modeling with the scalability and flexibility of AI/ML, bringing together the best of both worlds.

I am especially interested in understanding the dependence structure in the data. AI/ML models often ignore this structure, assuming independence implicitly or explicitly. However, the dependence is frequently of crucial importance: failing to address it may degrade a method's performance, and the structure itself may be of scientific significance. I turn these challenges into assets by explicitly modeling the dependence using statistical tools. Knowing that the observations or features are dependent, I "borrow strength" across the rows or columns of the data, increasing the power and accuracy of AI/ML approaches while also providing insight into the dependence structure itself. I apply this concept to integrate spatial and temporal models into AI/ML frameworks, which is a key element of my methodological research.

I collaborate with scientists to solve open scientific challenges by merging AI/ML approaches with domain expertise via statistics. My research paradigm on data dependence is of fundamental interest in environmental sciences, biomedical sciences (e.g., genetics, genomics, neuroscience, and radiology), earth system sciences, finance, data privacy, and algorithmic fairness.