Biography

A distinguished honoree in Data Science from the University of Michigan, Peter Yang is now a Research Assistant at Duke University’s Interpretable Machine Learning Lab, working under PhD candidate Stephen Ni-Hahn and Professor Cynthia Rudin. He is also pursuing an MS in Interdisciplinary Data Science at Duke, with a focus on graph neural networks and generative AI.

Previously, Peter spent three years at Curinos, Inc., most recently as a Senior Data Science Analyst, where he developed innovative financial and pricing models for major retail banks.

Interests
  • Graph Neural Networks
  • Gen AI
  • Music Generation
  • Financial Modeling

Experience

Interpretable Machine Learning Lab, Duke University
Graduate Research Assistant
August 2024 – Present Durham, NC
  • Researched and developed a custom PyTorch implementation of DiffPool for a heterogeneous GNN used in musical analysis, reducing validation cross-entropy loss by more than 60% with additional hyperparameter tuning.
  • Co-advised by PhD candidate Stephen Ni-Hahn and Prof. Cynthia Rudin.
 
Curinos
Senior Data Science Analyst
September 2023 – June 2024 Chicago, IL
  • Researched and developed industry-level nonlinear Asset-Liability Management (ALM) models to predict acquisition and other portfolio balances for smaller banks and credit unions, improving out-of-sample acquisition prediction over legacy models.
  • Built automated ad-hoc regression notebooks in PySpark for creating, testing, and validating models under different configurations, halving the time needed to build proof-of-concept models.
 
Curinos
Data Science Analyst II
April 2022 – September 2023 Chicago, IL
  • Led the ML engineering team in migrating the legacy modeling pipeline from Cloudera to Databricks, coordinating with the DevSecOps and Application teams to schedule testing, promotion, and release plans. The migration saved more than $100k annually in data platform expenses and cut pipeline processing time by 30% on average. (Publicly acknowledged at a company-wide town hall meeting.)
  • Tuned nonlinear hierarchical price elasticity models en masse for multiple major US banks, each with 10,000+ model segments, improving fit in terms of both AIC and R² with a significantly higher rate of convergence.
  • Installed and managed more than 10,000 price elasticity models per client bank to predict and optimize their deposit portfolio across a wide range of interest rates, with precise Model Risk Management documentation.
 
Curinos
Data Science Analyst
August 2021 – April 2022 Chicago, IL
  • Converted a local, single-threaded legacy modeling pipeline to use SparkR and Cloudera, cutting model-fitting run time by a factor of up to 30.
  • Performed Exploratory Data Analysis (EDA) for client banks to tune and reconfigure their models and data segments, leading to better-performing price elasticity models in terms of MAPE, R², and rate of convergence.
  • Set up and automated custom SQL procedures to clean, wrangle, map, and transform clients’ data feeds for use in the modeling pipeline, partially eliminating the need for manual model data refreshes.
 
University of Michigan - Ann Arbor
Honors Student Researcher
May 2020 – April 2021 Ann Arbor, MI
  • Researched a content-based music classification system with neural networks. Advised by Prof. Edward Ionides and Prof. Daniel Forger.
  • Developed new music classification methods using Musical Instrument Digital Interface (MIDI) data and LSTM neural networks, reaching 82% accuracy in music classification, a more than 10% improvement over conventional ML methods.
  • Improved models using supervised machine learning methods such as Support Vector Machines, Decision Trees, Ensemble Methods, and K-nearest neighbors.
  • Received the “Highest Honor” distinction in Data Science from UMich, one of only two awarded in 2021.

Certificates

Gain foundational knowledge, practical skills, and a functional understanding of how generative AI works
See certificate
DataCamp
Introduction to Scala
See certificate
Coursera
Deep Learning Specialization
See certificate
Coursera
Share Data Through the Art of Visualization
See certificate

Projects

Duke Gen AI Hackathon — Duke ProfMatch
Developed an LLM-based professor recommendation system for Duke students using GPT-4o-mini with a state-of-the-art Graph-based Retrieval Augmented Generation system, LightRAG, enabling personalized recommendations.
Healthcare Data Analytics Platform - SanAssist
Built a scalable web app with a team of 4, leveraging a fine-tuned GPT-2 (LoRA) for healthcare analytics, achieving a perplexity of 3.32 (outperforming Google’s MedPaLM). Deployed using Docker on AWS ECR and App Runner.
Muscribe: Transcribing Music to Scores
A research project into developing a model that can create scores from pieces of music.
Californian House Price Prediction with Kaggle Data
Performed EDA and trained a simple XGBoost model to predict house prices in California in a single Jupyter notebook. This is a simple data project to showcase how I’d approach a relatively straightforward modeling task.
Squirrels API - Use Case Development and Documentation
Developing use cases and documentation for the Squirrels API.

Contact

Feel free to leave me a message and I’ll get back to you as soon as possible!