Leading with Data Science

LWDS consists of ten pre-recorded Real Estate Data Analytics (REDA) workshops and six live sessions. It also offers three optional capstone projects; participants who complete one of them receive a certificate.

Understanding of Housing Industry Data
Participants gained hands-on experience with current year-to-date secondary market data and with more than 2 billion rows of historical single-family and multifamily loan performance and mortgage transaction data, accessed through live Snowflake data shares and interactive visualizations in HMDAVision. Participants also had access to a class wiki with in-depth housing and mortgage industry backgrounders and strategy briefs.
Understanding of Data Science
Participants were introduced to different types of inference; to how data science models relate to rules, causes, and effects; to how the data science process can begin with either a focused question or the data itself; to balancing use cases, models, and data; and to where data wrangling and feature engineering fit in the data science pipeline. A live demo used AWS SageMaker with NY Fed mortgage debt and delinquency data.
Understanding of Machine Learning
Participants learned that generalization is the goal of machine learning and gained perspective on the difference between supervised and unsupervised ML. They took a deep dive into classification and linear regression models and gained hands-on experience both coding ML models in Python with statsmodels and scikit-learn and using the automated machine learning platform Qlik AutoML. They applied these models to questions about prepayment, forbearance, and refinancing, and learned how to communicate the results of ML experiments.
Unique to LWDS is the participants' live access to billions of rows of current and historical housing finance industry data.
Course Overview
Class Structure
LWDS consists of two tracks: single-family and multifamily. The single-family track is delivered on Tuesdays and the multifamily track on Thursdays. Both tracks cover largely the same concepts and lessons, but use different data.
Week 1: Descriptive and Inferential Data Science
What is data science? What are its tools and techniques? Learn how inference is the guiding force in data science and machine learning, and why some types of inference are attainable with data science while others may not be.

Start your deep dive into the housing industry by understanding the data sets it generates. Where do they come from? What are their strengths and limitations? We'll walk through live examples of key concepts, introduce the tools used in the class, and show you how to get set up with them.
Week 2: Predictive Data Science
Predictive data science is largely the domain of machine learning. If data science begins with acquiring data and asking a focused question, machine learning is where we answer that question.

How does supervised machine learning differ from unsupervised ML, and which is the better fit for answering questions in the housing industry? Learn about several machine learning algorithms, with an introduction to linear regression and a deep dive into classification. We'll get immersed in loan-level data in Snowflake and see live examples of machine learning both in Qlik AutoML (top-down) and in Python with the scikit-learn library (bottom-up), using both historical and current 2022 data.
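A bottom-up classification workflow in scikit-learn might look roughly like the sketch below. It is an illustration under stated assumptions, not the course's code: the loan-level data are synthetic, the features `credit_score` and `ltv` and the rule that generates the `delinquent` label are hypothetical, and logistic regression is just one of many classifiers that could be used.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, hypothetical loan-level data; NOT drawn from any course data set.
rng = np.random.default_rng(seed=42)
n = 2_000
credit_score = rng.normal(720, 50, n)  # hypothetical feature
ltv = rng.uniform(50, 100, n)          # hypothetical loan-to-value ratio, percent

# Assumed rule for illustration: lower scores and higher LTV raise the
# probability of the positive class (here labeled "delinquent").
logit = -0.03 * (credit_score - 720) + 0.08 * (ltv - 75) - 2.0
prob = 1 / (1 + np.exp(-logit))
delinquent = rng.binomial(1, prob)

X = np.column_stack([credit_score, ltv])
# Supervised learning: labels are known, and we hold out 25% for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, delinquent, test_size=0.25, random_state=0, stratify=delinquent
)

# Scale features, then fit a logistic regression classifier
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X_train, y_train)

test_acc = accuracy_score(y_test, clf.predict(X_test))
# Baseline: always predict whichever class is more common in the test set
baseline_acc = max(y_test.mean(), 1 - y_test.mean())
print(f"test accuracy: {test_acc:.3f}  majority-class baseline: {baseline_acc:.3f}")
```

Comparing against the majority-class baseline matters for imbalanced outcomes such as delinquency, where a model that always predicts the common class can already score a high accuracy.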
Week 3: Causal Inference
Having introduced our housing finance data, how to work with it at scale, and some key machine learning techniques, we turn to what can go wrong with our analysis and how to avoid these pitfalls. We tie this to the course theme of inference, and specifically causal inference, laying out its concepts and techniques in greater detail. We walk through a live example of data wrangling in Python with the pandas library, then return to live examples of predicting both discrete values (classification) and continuous values (regression) in Qlik AutoML and in Python. We'll revisit the question: why Python? Finally, we'll recap the course, tie course themes together through a brief discussion of the Alignment Problem, and share resources for building on course skills and applying them in the housing industry.
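As a rough illustration of the kind of data wrangling this session covers, the sketch below cleans a tiny, made-up loan table with pandas: it parses dates, coerces a numeric column that arrived as text, normalizes category labels, fills a missing value, derives a feature, and aggregates. The column names and values are hypothetical and not taken from any course data set; median imputation is one simple choice among many.

```python
import pandas as pd

# Tiny, made-up loan table; column names and values are hypothetical.
raw = pd.DataFrame({
    "loan_id": ["A1", "A2", "A3", "A4"],
    "orig_date": ["2021-03-15", "2021-07-01", "2022-01-20", "2022-02-11"],
    "upb": ["250000", "310000", "n/a", "180000"],  # unpaid principal balance, as text
    "property_value": [320000, 400000, 275000, 240000],
    "state": ["ny", "CA ", "TX", "ny"],
})

df = raw.copy()
# Parse date strings so rows can be compared and grouped by period
df["orig_date"] = pd.to_datetime(df["orig_date"])
# Coerce text to numbers; unparseable entries such as "n/a" become NaN
df["upb"] = pd.to_numeric(df["upb"], errors="coerce")
# Normalize inconsistent category labels (stray whitespace, mixed case)
df["state"] = df["state"].str.strip().str.upper()
# Fill the missing balance with the median of the observed balances
df["upb"] = df["upb"].fillna(df["upb"].median())
# Feature engineering: a simple loan-to-value ratio, in percent
df["ltv_pct"] = 100 * df["upb"] / df["property_value"]

# Aggregate: mean LTV by origination year
summary = df.groupby(df["orig_date"].dt.year)["ltv_pct"].mean().round(1)
print(summary)
```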