I’m currently a senior data scientist at Nielsen. I did my master’s in computer science and undergrad in computer science and the Integrated Science Program, both at Northwestern (go ‘cats!). My email address is me@(this domain).
This is a web version of my resume; if you’d like, you can view a PDF here.
Senior Data Scientist - Nielsen (July 2020 - present)
- Researched and shipped novel Bayesian inference methodology for measuring audiences for 100s of ad campaigns on 3 platforms, using PySpark, NumPyro, and Airflow (US patent pending).
- Designed and developed evaluation framework for benchmarking cutting-edge Bayesian PyTorch TV viewing model against 100s of GBs of historical data.
- Productionalized ad-hoc ETL runs and ML model training to be more reproducible, increasing iteration speed.
- Created matplotlib visualizations to evaluate and build trust in machine learning models.
Data Scientist - Nielsen (July 2018 - July 2020)
- Rebuilt flagship TV ratings model with machine learning in PySpark to train 10x faster and admit 4x fewer false positives (US patent pending).
- Developed alpha version of internal Python framework for unifying workflows for 1000+ data scientists.
- Automated documentation build of internal framework using Sphinx and AWS.
- Created PySpark libraries to unify team’s workflow and enable consistent comparison of different model candidates.
- Presented tech talks on Spark, Bayesian modeling, MLflow, and Python to mixed audiences of 200+ data scientists, software engineers, and business leaders.
Software Engineering Intern - Qualtrics (June 2017 - August 2017)
- Added pagination, custom data types, & UI enhancements to “action planning” module on Employee Experience platform (AngularJS, Java).
- Redesigned handling of page filters for action planning dashboards by refactoring shared and product-specific code.
- Increased test coverage for product by 10% and wrote test files from scratch for untested services.
Teaching Assistant - Northwestern University (March 2016 - June 2018)
- Mentored students in intro programming, intermediate Python, discrete mathematics, and data structures courses.
- Assisted with curriculum and exam design; led small-group tutorial sections; taught students individually.
Lead Helpdesk Analyst - Northwestern University (September 2014 - June 2018)
- Developed Chrome extension to automate often-forgotten parts of help desk tickets, deploying to 60 student staff members and reducing incomplete tickets by over 90%.
- Mentored, managed, trained, and completed performance reviews for 5 student consultants semiannually.
- Wrote Python scripts to assist with scheduling, accounting for staffing needs, class schedules, and individual preferences.
Python: fluent in core language features, scientific computing libraries (NumPy, pandas, scikit-learn, PyTorch), visualization (Matplotlib, Seaborn, Altair), Bayesian inference (PyMC3, Pyro, NumPyro)
Technologies: Git, Unix, unit testing and TDD, machine learning, statistical modeling, Bayesian inference, data viz in Python and d3.js, AWS, Docker (basic)
M.S. Computer Science - Northwestern University, 2018 (GPA 4.0, focusing on machine learning and data science)
B.S. Computer Science - Northwestern University, 2018 (GPA 3.96, summa cum laude)
- Student in the Integrated Science Program, a selective, research-oriented program in science and mathematics (isp.northwestern.edu).
- Member of Tau Beta Pi Engineering Honor Society.
- Built a responsive and mobile-friendly web tool for helping players of the video game Pokémon Mystery Dungeon: Rescue Team DX.
Tech for Campaigns (volunteer data scientist & engineer)
- Volunteer data scientist on a team building a model to predict state & local elections.
- Improved data ingestion pipeline for election prediction model, resulting in 200+ unit tests passing, fewer build failures, and 10% faster CI runs.
- Built tools to download & process messy, disparate electoral data from 4 state election boards.
- Investigated political polarization over time on Twitter by replicating methods of Barberá et al. using Python and R.
- Collected 53 million tweets over 3 weeks with the Twitter Streaming API, storing them in a MongoDB database.
- Used correspondence analysis to estimate political ideology of 3 million users and analyze online polarization.
- Leveraged Fitbit API to obtain two years of minute-by-minute sleep data.
- Analyzed and visualized data in Python to draw conclusions and gain insights about personal sleep patterns.