August 15, 2022

Introduction to Machine Learning for Evaluators

TEI 332: Introduction to Machine Learning for Evaluators

Instructor: Peter York

Description: Public and private policymakers and funders increasingly want big data science and machine learning applied to evaluation. This demand stems from growing public awareness of how the private sector uses machine learning algorithms to build on-demand tools that cost-effectively augment human planning, assessment, prediction, and decision-making. Government agencies such as the National Science Foundation and the U.S. Department of Health and Human Services already use big data science and machine learning to evaluate their impact. When applied correctly, machine learning algorithms can significantly reduce the cost and time of conducting evaluations, and can produce actionable, quasi-experimental evidence on demand and on an ongoing basis.

In this introductory course, participants will learn the fundamentals of integrating the theory, methods, and machine learning algorithms of big data science into their evaluation approach. Topics include an introduction to Bayesian theory, machine learning algorithms, predictive and prescriptive analytics, causal modeling, and addressing selection and algorithmic bias.

The course will guide participants through an interactive, step-by-step process of building evaluation models using primary and secondary datasets: (1) finding existing data and assessing its quality; (2) cleaning and preparing the data; (3) framing and aligning the data to your theory of change or logic model; (4) staging the evaluation to mitigate selection bias; (5) training machine learning algorithms to find and evaluate naturally occurring counterfactual experiments in historical data; and (6) evaluating and addressing the level and types of algorithmic bias in the results.

The course will introduce machine learning algorithms for modeling both structured data (quantitative, ordinal, and categorical) and unstructured data (qualitative text), including how to train machine learning algorithms to support a mixed methods evaluation. For text analytics, participants will learn about natural language processing (NLP) algorithms that improve the breadth and depth of qualitative analyses while significantly reducing the time they take.

The course will use KNIME, an open-source, no-cost, visual analytics platform that requires no coding (knowledge of R or Python is not required), and will introduce participants to its suite of analytic tools and machine learning algorithms.
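For readers who do think in code, the idea behind steps (4) and (5) can be illustrated with a minimal Python sketch. It is not course material: the course itself works in KNIME without code, and the records, field names, and function names below are hypothetical. Simple stratification on one background characteristic stands in here for whatever matching or modeling approach the course actually teaches. The sketch compares a naive treated-versus-untreated difference with a difference computed only within groups of similar cases.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (risk_level, received_program, outcome).
# Low-risk cases are more likely to receive the program AND more likely
# to succeed anyway, which is a simple form of selection bias.
records = (
    [("low", True, o) for o in (1, 1, 1, 0)]
    + [("low", False, o) for o in (1, 0)]
    + [("high", True, o) for o in (1, 0)]
    + [("high", False, o) for o in (1, 0, 0, 0)]
)


def naive_effect(rows):
    """Treated-minus-untreated outcome gap, ignoring who was selected."""
    treated = [outcome for _, got_program, outcome in rows if got_program]
    untreated = [outcome for _, got_program, outcome in rows if not got_program]
    return mean(treated) - mean(untreated)


def stratified_effect(rows):
    """Average the outcome gap within each stratum, weighted by stratum size."""
    strata = defaultdict(lambda: {True: [], False: []})
    for stratum, got_program, outcome in rows:
        strata[stratum][got_program].append(outcome)

    weighted_sum = 0.0
    total_cases = 0
    for groups in strata.values():
        # Skip strata that lack a comparison group on either side.
        if not groups[True] or not groups[False]:
            continue
        size = len(groups[True]) + len(groups[False])
        gap = mean(groups[True]) - mean(groups[False])
        weighted_sum += size * gap
        total_cases += size

    if total_cases == 0:
        raise ValueError("no stratum contains both treated and untreated cases")
    return weighted_sum / total_cases


print(f"naive estimate:      {naive_effect(records):.2f}")
print(f"stratified estimate: {stratified_effect(records):.2f}")
```

In this made-up dataset the naive comparison overstates the program's effect, because it mixes the program's contribution with the advantage low-risk cases already had. Comparing like with like inside each stratum removes that part of the gap.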

Recommended Audience: This course is best suited for mid- to late-career evaluators with experience conducting quantitative and mixed methods evaluations, especially those who have prepared and analyzed primary and secondary datasets using analytic software packages such as SPSS, SAS, and Stata.