2023 IISA Conference

  • 1st June to 4th June, 2023
  • Colorado School of Mines, Golden, Colorado, USA
  • IISA2023@intindstat.org

Short Courses and Workshops

The following short courses and workshop on topics of current interest will be offered as part of the conference:

  • Short Course 1 :-

    Data Analysis after Record Linkage: Sources of Error, Consequences, and Possible Solutions

    Martin Slawski (George Mason University), Emanuel Ben-David (US Census Bureau), and Priyanjali Bukke (Ph.D. student at George Mason University) (June 4th, 2023, 09:00 - 12:30)

  • Workshop :-

    Workshop on Machine Learning (ML) Algorithms and Their Applications

    Anwesha Bhattacharya and Agus Sudjianto, Wells Fargo (3rd June 2023, 13:30 to 17:30)

  • Short Course 2 :-

    Statistical Inference of Network Data: Past, Present, and Future

    Srijan Sengupta (North Carolina State University) (June 1st, 2023, 13:30 - 17:00)


Short Courses                            Primary affiliation: India    Primary affiliation: Outside India
Half day Short course (26th December)    USD 12.00                     USD 50.00
Full day Short course (30th December)    USD 24.00                     USD 100.00

Workshop on Machine Learning (ML) Algorithms and Their Applications

Date: Saturday 3rd June 2023, 13:30 to 17:30.


Title: Workshop on Machine Learning (ML) Algorithms and Their Applications


Instructors: Anwesha Bhattacharya and Agus Sudjianto, Wells Fargo


Description: In this four-hour workshop, you will learn about commonly used ML algorithms and how they are applied in practice. The algorithms covered include Feedforward Neural Networks, Gradient Boosting, and Random Forest. PI_ML, an open-source toolbox with an easy-to-use interface, will give participants hands-on experience in training these algorithms and assessing their performance on real datasets. Participants will also learn how to interpret the results using post-hoc explainability techniques and how to assess model robustness and model weaknesses. Applications of these algorithms in banking will also be described.


Data Analysis after Record Linkage: Sources of Error, Consequences, and Possible Solutions

Date: Sunday June 4th, 2023, 09:00 - 12:30


Title: Data Analysis after Record Linkage: Sources of Error, Consequences, and Possible Solutions


Instructors: Martin Slawski (George Mason University), Emanuel Ben-David (US Census Bureau), and Priyanjali Bukke (Ph.D. student at George Mason University)


Description: Data analysis is often based on files that are the result of merging and integrating multiple data sets from different sources. In data integration, record linkage is an essential task for linking records across data sets that refer to the same entity. Record linkage is not error-free; there is a possibility that records belonging to different entities are mismatched, or that records belonging to the same entity are not identified. As a result, linkage error can significantly reduce the quality of the resulting data. In subsequent statistical analyses, it is, therefore, advisable to make suitable adjustments that account for potential bias caused by data contamination or sample selection introduced by record linkage. In this workshop, we present a tutorial covering probabilistic record linkage, sources of linkage error, and their consequences, as well as methods and software accounting for such errors to enable reliable post-linkage data analysis and inference.

Topics:

  • Overview of record linkage and entity resolution
  • Impact of linkage error on statistical analysis with linked data files
  • Linkage error adjustment and correction methods for a variety of regression and other popular multivariate analysis techniques
  • Hands-on training and practice of these techniques using R software
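
The effect of linkage error described above can be illustrated with a small simulation. The following is a minimal sketch in Python (the course itself uses R), not the instructors' method: the function names `ols_slope` and `simulate_mismatch`, the sample size, the true slope, and the mismatch rate are all hypothetical choices made for illustration. It assumes that a random fraction of records is linked to the wrong outcome, independently of the data, and compares the regression slope estimated from correctly linked data with the slope estimated from the contaminated file.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x (simple linear regression)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_mismatch(n=5000, true_slope=2.0, mismatch_rate=0.3, seed=0):
    """Return (slope from correct links, slope from partly mismatched links).

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [true_slope * xi + rng.gauss(0, 1) for xi in x]

    # Pick a random subset of records and permute their outcomes among
    # themselves, mimicking records linked to the wrong entity.
    k = int(mismatch_rate * n)
    idx = rng.sample(range(n), k)
    shuffled = idx[:]
    rng.shuffle(shuffled)
    y_linked = y[:]
    for i, j in zip(idx, shuffled):
        y_linked[i] = y[j]

    return ols_slope(x, y), ols_slope(x, y_linked)


if __name__ == "__main__":
    clean, linked = simulate_mismatch()
    print(f"slope with correct links:    {clean:.3f}")
    print(f"slope with mismatched links: {linked:.3f}")
```

Under this simplified mismatch mechanism, the slope estimated from the linked file is pulled toward zero, roughly by a factor of one minus the mismatch rate. This is the kind of bias that linkage error adjustment methods aim to correct.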


Bio:

  • Martin Slawski is an Assistant Professor in the Department of Statistics at George Mason University. His research interests include record linkage, data compression, high-dimensional data, the interface between statistics and optimization, and forensic statistics. His research has been supported by grants from NSF, the National Institute of Justice, and NIH. Before joining George Mason University, he was a postdoctoral associate in Statistics and Computer Science at Rutgers University. He received his Ph.D. in Computer Science from Saarland University, Germany, in 2015.

  • Emanuel Ben-David is a research mathematical statistician in the Center for Statistical Research and Methodology at the US Census Bureau. Before joining the US Census Bureau, he was a research assistant professor in the department of statistics at Columbia University (2012-2015), a postdoctoral associate in the department of statistics at Stanford University (2010-2012) and a postdoctoral fellow in the Statistical and Applied Mathematical Sciences Institute (2009-2010). Emanuel Ben-David received his PhD in statistics from Indiana University-Bloomington. His research interests include record linkage and data integration, survey sampling, graphical models, multivariate statistics, and applied optimization.

  • Priyanjali Bukke is a Ph.D. student in Statistical Sciences at George Mason University, co-advised by Martin Slawski and Brady West. She also completed her M.S. in Biostatistics and bachelor's degrees at Mason. Her research interests include approaches to analyzing linked data with potential mismatch errors and unstructured health-related data.

Statistical Inference of Network Data: Past, Present, and Future

Date: Thursday June 1st, 2023, 13:30 - 17:00


Title: Statistical Inference of Network Data: Past, Present, and Future


Instructor: Srijan Sengupta (North Carolina State University)


Description:

  • We live in a highly interconnected world where many physical, social, biological, and technological systems consist of agents or entities interacting with each other. Examples include a virus being transmitted over social contact networks, global trade between countries, and the human brain. Any such system can be represented as a network, by denoting the agents/entities as vertices, and the interactions between them as edges. This makes networks an important and ubiquitous type of data spanning a remarkable variety of complex systems.

  • The structure and configuration of networks are quite different from those of traditional forms of statistical data, necessitating the development of novel statistical methods for realistic modeling and reliable inference for network data. Fittingly, the last two decades have seen a remarkable surge in research aimed at developing such statistical methodology. In this short course, we will study this rapidly evolving field of statistics, which encompasses statistical models, algorithms, and inferential methods for analyzing data in the form of networks. We will also explore existing challenges and open problems in this area. The objective of the course is to provide a graduate-level introduction to the statistical inference of network data and to highlight research opportunities in the area.
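
The network representation described above (agents as vertices, interactions as edges) can be sketched in a few lines of Python. This is a minimal illustration and not course material: the function names `build_adjacency`, `degrees`, and `edge_density` and the toy edge list are hypothetical. It assumes an undirected network without self-loops, stores it as an adjacency structure, and computes two basic summaries.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency structure from (u, v) pairs.

    Self-loops are skipped and repeated edges are stored once.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def degrees(adj):
    """Number of neighbors of each vertex."""
    return {v: len(neighbors) for v, neighbors in adj.items()}


def edge_density(adj):
    """Observed edges divided by the number of possible vertex pairs."""
    n = len(adj)
    if n < 2:
        return 0.0
    m = sum(len(neighbors) for neighbors in adj.values()) // 2
    return m / (n * (n - 1) / 2)


if __name__ == "__main__":
    # Hypothetical toy network: four agents and their pairwise interactions.
    toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
    network = build_adjacency(toy_edges)
    print(degrees(network))
    print(edge_density(network))
```

The edge density is also the maximum likelihood estimate of the edge probability under the simplest random graph model, in which every pair of vertices is connected independently with the same probability, so even this small summary connects the data structure to statistical inference.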

Pre-requisites and Target Audience:

  • Participants should have at least Master's-level knowledge of the basic foundations of statistical inference, including common statistical distributions, basic probability theory, linear algebra, maximum likelihood estimation, Bayesian inference, and common statistical inferential concepts such as consistency of estimators, p-values, and power of tests. Familiarity with network data will be helpful but not necessary. The short course is expected to be beneficial to experienced practitioners of statistics, graduate students, and academic researchers.

Reference text:

The following materials are useful as a general reference for this short course. Please note that going through these materials is not a pre-requisite for the course. Rather, they should be considered companion materials that are useful for preparing prior to the course and for continued engagement after the course. Other references will be shared during the course.

  • Goldenberg, A., Zheng, A. X., Fienberg, S. E., & Airoldi, E. M. (2010). A survey of statistical network models. Foundations and Trends® in Machine Learning, 2(2), 129-233.

  • Newman, M. (2018). Networks (Second Edition). Oxford University Press. ISBN 9780198805090.

Instructor Bio:

  • Srijan Sengupta is an Assistant Professor of Statistics at North Carolina State University. He received his Ph.D. in Statistics from the University of Illinois at Urbana-Champaign in July 2016. He is working on formal inferential algorithms for network data and applying such algorithms to epidemiology, social sciences, and environmental health. He is also working on developing a statistical science of patient safety, focusing on adverse medical events due to human errors, medical devices, drug reactions, and radiation therapy.

Note:

  • Short Course/Workshop registration: USD 50 each