Peter Yang

Graduate Research Assistant

Interpretable Machine Learning Lab, Duke University

Biography

Peter Yang earned the “Highest Honor” distinction in Data Science from the University of Michigan and is now a Research Assistant at Duke University’s Interpretable Machine Learning Lab, working under PhD candidate Stephen Ni-Hahn and Professor Cynthia Rudin. He is also pursuing an MS in Interdisciplinary Data Science at Duke, with a focus on graph neural networks and generative AI.

Previously, Peter spent three years at Curinos, Inc., most recently as a Senior Data Science Analyst, where he developed financial and pricing models for major retail banks.

Interests

Graph Neural Networks
Gen AI
Music Generation
Financial Modeling

Experience

Graduate Research Assistant

Interpretable Machine Learning Lab, Duke University

August 2024 – Present Durham, NC

Researched and developed a custom PyTorch implementation of DiffPool for a heterogeneous GNN used in musical analysis, reducing validation cross-entropy loss by more than 60% with additional hyperparameter tuning.
Co-advised by PhD candidate Stephen Ni-Hahn and Prof. Cynthia Rudin.

Senior Data Science Analyst

Curinos

September 2023 – June 2024 Chicago, IL

Researched and developed industry-level nonlinear Asset-Liability Management (ALM) models to predict acquisition and other portfolio balances for smaller banks and credit unions, improving out-of-sample acquisition prediction over legacy models.
Built automated PySpark notebooks for ad-hoc regression that create, test, and validate models under different configurations, halving the time to build proof-of-concept models.

Data Science Analyst II

Curinos

April 2022 – September 2023 Chicago, IL

Led the ML engineering team in migrating the legacy modeling pipeline from Cloudera to Databricks, coordinating with DevSecOps and Application teams to schedule testing, promotion, and release plans. The migration saved more than $100k annually in data platform expenses and cut pipeline processing time by 30% on average. The work was publicly acknowledged at a company-wide town hall meeting.
Tuned nonlinear hierarchical price elasticity models en masse for multiple major US banks, each with 10,000+ model segments, resulting in improved fit in terms of both AIC and R² with a significantly higher rate of convergence.
Installed and managed more than 10,000 price elasticity models per client bank to predict and optimize their deposit portfolio across a wide range of interest rates, with precise Model Risk Management documentation.

Data Science Analyst

Curinos

August 2021 – April 2022 Chicago, IL

Converted local, single-threaded, legacy modeling pipeline to use SparkR and Cloudera, reducing run time for model fitting by up to 30 times.
Performed Exploratory Data Analysis (EDA) for client banks to tune and reconfigure their models and data segments, leading to better-performing price elasticity models in terms of MAPE, R², and rate of convergence.
Set up and automated custom SQL procedures to clean, wrangle, map, and transform client’s data feed to be used in the modeling pipeline, partially eliminating the need for manual model data refreshes.

Honors Student Researcher

University of Michigan - Ann Arbor

May 2020 – April 2021 Ann Arbor, MI

Researched a content-based music classification system with neural networks. Advised by Prof. Edward Ionides and Prof. Daniel Forger.
Developed new music classification methods using Musical Instrument Digital Interface (MIDI) data and LSTM neural networks, resulting in 82% accuracy in music classification, more than 10% improvement over conventional ML methods.
Improved models using supervised machine learning methods such as Support Vector Machines, Decision Trees, Ensemble Methods, and K-nearest neighbors.
Received “Highest Honor” distinction in Data Science from UMich, one of only 2 awarded in 2021.

Certificates

Generative AI with Large Language Models

Coursera Jul 2023

Gain foundational knowledge, practical skills, and a functional understanding of how generative AI works

See certificate

Introduction to Scala

DataCamp Nov 2022

See certificate

Deep Learning Specialization

Coursera May 2021

See certificate

Share Data Through the Art of Visualization

Coursera May 2021

See certificate

Projects

Duke Gen AI Hackathon — Duke ProfMatch

Developed an LLM-based professor recommendation system for Duke students using GPT-4o-mini with a state-of-the-art Graph-based Retrieval Augmented Generation system, LightRAG, enabling personalized recommendations.

Healthcare Data Analytics Platform - SanAssist

Built a scalable web app with a team of 4, leveraging a GPT-2 model fine-tuned with LoRA for healthcare analytics, achieving a perplexity of 3.32 (outperforming Google’s Med-PaLM). Deployed using Docker on AWS ECR and App Runner.

Muscribe: Transcribing Music to Scores

A research project into developing a model that can create scores from pieces of music.

Californian House Price Prediction with Kaggle Data

Performed EDA and fit a simple XGBoost model to predict house prices in California in a single Jupyter notebook. This is a simple data project to showcase how I’d approach a relatively straightforward modeling task.

Squirrels API - Use Case Development and Documentation

Developing use cases and documentation for the Squirrels API

Recent Publications

Peter Yang (2021). The Classical-Romantic Dichotomy: A Machine Learning Approach. Honors Thesis.

Contact

Feel free to leave me a message and I’ll get back to you as soon as possible!

chaopeter.yang@gmail.com
Chicago, IL