SanAssist: LLM-Powered Healthcare Data Dashboard


SanAssist is a web application that lets healthcare professionals analyze patient data and query it in natural language. The platform pairs interactive dashboards built with the Squirrels library with a GPT-2 model fine-tuned via LoRA to deliver context-aware insights from patient datasets.
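To give a sense of why LoRA keeps fine-tuning cheap, the sketch below counts the parameters a LoRA adapter adds. It assumes the smallest GPT-2 checkpoint (12 blocks, fused query/key/value projection of shape 768 x 2304) and a rank of 8; both are illustrative assumptions, since the GPT-2 size and LoRA settings used in SanAssist are not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraTarget:
    """Shape of one weight matrix that receives a LoRA adapter."""
    d_in: int
    d_out: int


def lora_trainable_params(target: LoraTarget, rank: int, n_layers: int) -> int:
    """Count parameters added by LoRA.

    Each adapted matrix gets two low-rank factors:
    A with shape (rank, d_in) and B with shape (d_out, rank).
    """
    per_matrix = rank * (target.d_in + target.d_out)
    return per_matrix * n_layers


# Assumed target: the fused QKV projection in each of GPT-2 small's 12 blocks.
GPT2_SMALL_C_ATTN = LoraTarget(d_in=768, d_out=2304)
FROZEN_PARAMS = 124_439_808  # parameter count of GPT-2 small

# rank=8 is a hypothetical setting chosen for illustration only.
added = lora_trainable_params(GPT2_SMALL_C_ATTN, rank=8, n_layers=12)
share = 100 * added / (FROZEN_PARAMS + added)
print(f"trainable: {added:,} parameters ({share:.2f}% of the adapted model)")
```

Under these assumptions only a fraction of a percent of the weights are trained, which is what makes it practical to adapt the model to a narrow domain.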

The system includes a complete ETL pipeline (Databricks + Pandas) for data ingestion, transformation, and storage, connected to a lightweight Python microservice. Healthcare professionals can upload datasets, explore interactive charts and tables, and query the integrated chatbot for insights like “Which patients are at the highest risk for sepsis?”.
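The extract, transform, and load stages described above can be sketched with Pandas alone. This is a minimal illustration, not SanAssist's pipeline: the column names (patient_id, heart_rate, lactate), the cleaning rules, and the CSV output are assumptions, and the Databricks side is left out entirely.

```python
import io

import pandas as pd

# Hypothetical upload: one duplicated row and one row with a missing vital sign.
RAW_CSV = """patient_id,age,heart_rate,lactate
p1,64,118,4.1
p1,64,118,4.1
p2,51,,1.0
p3,77,95,2.6
"""


def extract(source) -> pd.DataFrame:
    """Read an uploaded CSV (path or file-like object) into a DataFrame."""
    return pd.read_csv(source)


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate patients and rows missing the assumed required vitals."""
    cleaned = (
        df.drop_duplicates(subset="patient_id")
        .dropna(subset=["heart_rate", "lactate"])
        .astype({"heart_rate": "int64"})
    )
    return cleaned.reset_index(drop=True)


def load(df: pd.DataFrame, target) -> None:
    """Write the cleaned table to a path or file-like object as CSV."""
    df.to_csv(target, index=False)


clean = transform(extract(io.StringIO(RAW_CSV)))
buffer = io.StringIO()
load(clean, buffer)
print(clean)
```

Keeping each stage as a small function makes it straightforward to call the same steps from a microservice endpoint when a user uploads a dataset.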

SanAssist is fully containerized with Docker. Images are stored in AWS Elastic Container Registry (ECR) and served through App Runner, with builds and deployments automated via GitHub Actions CI/CD pipelines. Load tests with 10,000 concurrent users confirmed scalability, with throughput peaking at 602 requests per second (RPS). The fine-tuned model achieved a perplexity of 3.32 and outperformed general-purpose medical LLMs such as Med-PaLM in domain-specific accuracy.
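For readers unfamiliar with the metric, perplexity is the exponential of the average negative log-likelihood per token, so a perplexity of 3.32 corresponds to roughly 1.2 nats per token. The sketch below shows the calculation on made-up log-probabilities; the function name and inputs are illustrative and do not reproduce the SanAssist evaluation.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Return exp(mean negative log-likelihood) over natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("at least one token log-probability is required")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that assigns probability 0.25 to every token is as uncertain as a
# uniform choice among four options, so its perplexity is about 4.
uniform_guess = [math.log(0.25)] * 8
print(round(perplexity(uniform_guess), 6))

# Mean negative log-likelihood implied by the perplexity quoted above.
print(round(math.log(3.32), 3))
```

Lower is better: a perplexity near 1 means the model almost always assigns high probability to the token that actually comes next.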

With a robust data pipeline, secure deployment, and intuitive interface, SanAssist demonstrates how LLMs can transform healthcare analytics, bridging the gap between raw data and actionable clinical insights.


Chao Péter Yang
ML Research Assistant

My research focuses on symbolic music generation and graph machine learning, with broader interests in generative modeling, agentic AI systems, and large models. I aim to bridge theory and practice in data science to create both scientific and real-world impact.