Short Courses and Workshops
The following short courses and workshop on topics of current interest will be offered as part of the Conference:
- Short Course 1:
Data Analysis after Record Linkage: Sources of Error, Consequences, and Possible Solutions
Martin Slawski (George Mason University), Emanuel Ben-David (US Census Bureau), and Priyanjali Bukke (Ph.D. Student at George Mason University) (June 4th, 2023, 09:00 - 12:30)
- Workshop:
Workshop on Machine Learning (ML) Algorithms and Their Applications
Anwesha Bhattacharya and Agus Sudjianto, Wells Fargo (June 3rd, 2023, 13:30 - 17:30)
- Short Course 2:
Statistical Inference of Network Data: Past, Present, and Future
Srijan Sengupta (North Carolina State University) (June 1st, 2023, 13:30 - 17:00)
Workshop on Machine Learning (ML) Algorithms and Their Applications
Date: Saturday June 3rd, 2023, 13:30 - 17:30
Title: Workshop on Machine Learning (ML) Algorithms and Their Applications
Instructors: Anwesha Bhattacharya and Agus Sudjianto, Wells Fargo
Description: In this four-hour workshop, you will learn about commonly used ML algorithms and how they are used in practice. The algorithms covered will include Feedforward Neural Networks, Gradient Boosting, and Random Forest. PI_ML, an open-source toolbox with an easy-to-use interface, will give participants hands-on experience in training these algorithms and assessing their performance on real datasets. Participants will also learn how to interpret the results using post-hoc explainability techniques and how to assess model robustness and weaknesses. Applications of these algorithms in banking will also be described.
Data Analysis after Record Linkage: Sources of Error, Consequences, and Possible Solutions
Date: Sunday June 4th, 2023, 09:00 - 12:30
Title: Data Analysis after Record Linkage: Sources of Error, Consequences, and Possible Solutions
Instructors: Martin Slawski (George Mason University), Emanuel Ben-David (US Census Bureau), and Priyanjali Bukke (Ph.D. Student at George Mason University)
Description: Data analysis is often based on files that are the result of merging and integrating multiple data sets from different sources. In data integration, record linkage is an essential task for linking records across data sets that refer to the same entity. Record linkage is not error-free; there is a possibility that records belonging to different entities are mismatched, or that records belonging to the same entity are not identified. As a result, linkage error can significantly reduce the quality of the resulting data. In subsequent statistical analyses, it is, therefore, advisable to make suitable adjustments that account for potential bias caused by data contamination or sample selection introduced by record linkage. In this workshop, we present a tutorial covering probabilistic record linkage, sources of linkage error, and their consequences, as well as methods and software accounting for such errors to enable reliable post-linkage data analysis and inference.
Topics:
- Overview of record linkage and entity resolution
- Impact of linkage error on statistical analysis with linked data files
- Linkage error adjustment and correction methods for a variety of regression and other popular multivariate analysis techniques
- Hands-on training and practice of these techniques using R software
Bios:
- Martin Slawski is an Assistant Professor in the Department of Statistics at George Mason University. His research interests include record linkage, data compression, high-dimensional data, the interface between statistics and optimization, and forensic statistics. His research has been supported by grants from NSF, the National Institute of Justice, and NIH. Before joining George Mason University, he was a postdoctoral associate in Statistics and Computer Science at Rutgers University. He received his Ph.D. in Computer Science from Saarland University, Germany, in 2015.
- Emanuel Ben-David is a research mathematical statistician in the Center for Statistical Research and Methodology at the US Census Bureau. Before joining the US Census Bureau, he was a research assistant professor in the Department of Statistics at Columbia University (2012-2015), a postdoctoral associate in the Department of Statistics at Stanford University (2010-2012), and a postdoctoral fellow at the Statistical and Applied Mathematical Sciences Institute (2009-2010). He received his Ph.D. in Statistics from Indiana University-Bloomington. His research interests include record linkage and data integration, survey sampling, graphical models, multivariate statistics, and applied optimization.
- Priyanjali Bukke is a Ph.D. student in Statistical Sciences at George Mason University, co-advised by Martin Slawski and Brady West. She also completed her M.S. in Biostatistics and bachelor's degrees at Mason. Her research interests include approaches to analyzing linked data with potential mismatch errors and unstructured health-related data.
Statistical Inference of Network Data: Past, Present, and Future
Date: Thursday June 1st, 2023, 13:30 - 17:00
Title: Statistical Inference of Network Data: Past, Present, and Future
Instructor: Srijan Sengupta (North Carolina State University)
Description:
Pre-requisites and Target Audience:
Reference text:
Instructor Bio:
Note:
- Short Course/Workshop registration: USD 50 each