Healthcare-focused data architect and analyst with hands-on experience in developing robust pipelines, integrating FHIR and claims data, and optimizing reporting processes. Adept at translating complex datasets into actionable insights that support clinical and business goals.
Led the end-to-end development of a data integration project using Python, SQL, SSIS, dbt, and RESTful APIs to enhance FHIR-based BCDA data with CCLF, improving data completeness. Built and maintained robust ETL pipelines for commercial claims and Epic Clarity EHR data, resolving legacy issues and integrating data into a customized Caboodle database. Improved SQL solutions for commercial supplemental files, boosting HEDIS performance metrics, and supported CMS reporting through data preparation and validation.
Built a Python pipeline using regex and NLP to extract over 95% of unstructured data from doctor notes. Wrote SQL scripts for HEDIS measures and built interactive dashboards in Tableau and Power BI to visualize key commercial metrics.
Developed a keyword-based search tool using Python and PowerShell to efficiently retrieve content from over 1,200 Power BI and SQL reports, improving accessibility for 60+ users. Co-created a machine learning pipeline to forecast cash balances across 204 branches, supporting a utility with projected savings of $5.5–10M annually. The prototype won first place in an internal datathon and advanced toward production deployment.
Proposed and implemented a bisecting hierarchical clustering algorithm for time series data to forecast college enrollment, improving accuracy by 15% over conventional methods.
Drexel University
Clark University
University of Maryland, College Park
Some highlights of my work in data, analytics, and engineering.
Developed a bisecting hierarchical clustering algorithm and implemented a hierarchical forecasting scheme using statistical models, improving college enrollment forecasts by 15%. The approach aggregates data so that institutions can represent racial/ethnic groups more fairly in their forecasts.
Used multiple statistical tests to investigate whether the predictive outcomes of a typical heart transplant decision-making platform show significant bias. The analysis revealed both gender bias and regional bias.
Predicted patient survival from data collected in the first 24 hours of intensive care, using ensemble machine learning methods such as XGBoost and LightGBM. Achieved high accuracy and finished in the top 10% of participants.
Feel free to reach out about data engineering, analytics, or healthcare data projects.