Yuji Mori

Data Scientist in the Healthcare Space

Impact-driven Data Scientist with 7+ years of experience applying advanced analytics and machine learning to complex business problems. Adept at translating ambiguous questions into data-supported solutions using causal inference and statistical modeling. A proven technical leader with a track record of building data pipelines and spearheading the adoption of Generative AI to achieve company-wide efficiencies.

Experience by Expertise

My work focuses on using advanced research methods to prove the value and effectiveness of healthcare services through financial and clinical outcomes.

  • Leveraged causal inference to quantify cost savings by analyzing medical claims data from two distinct populations: members who used our product and a comparable cohort who did not.
  • My research demonstrated that engaged members achieved significant savings in community healthcare utilization, including reductions in urgent care visits, hospitalizations, and specialty visits.
  • Presented the analysis and key insights to senior leadership, directly influencing major strategic decisions.
  • Helped defend and renew multi-million-dollar contracts by using my analytical findings to demonstrate the value of our product.
  • Measured the impact of treatments on clinical outcomes for chronic condition populations, such as diabetes (lowering A1c), obesity (reducing BMI), hypertension (lowering blood pressure), and mental health conditions (improving PHQ-9 and GAD-7 scores).
  • Extensive daily experience with medical data, including claims, diagnosis codes, CPT codes, HEDIS, and EMRs, to inform product and experimental design.

In my role, the patient's care journey is the product. My daily responsibilities focus on improving our delivery of healthcare, whether through clinic efficiencies, new app features, or better health outcomes. At the core, we use data to power our understanding of the patient experience.

  • Led comprehensive funnel and cohort analyses that mapped the member journey from acquisition through the entire course of care, identifying drop-off points to inform business strategy.
  • Conducted detailed cohort analyses, segmenting our population by chronic condition to understand long-term engagement and measure financial value over time.
  • Processed and analyzed patient satisfaction survey data to understand overall sentiment towards our individual clinics, providers, and services as a whole.

I have also worked on marketing analytics projects in my current role.

  • Conducted attribution analyses on clinical campaigns, such as flu shot initiatives and women's health awareness, to quantify a campaign's direct impact on new registrations and engagements.
  • My reports revealed that "on-site" campaigns were significantly more effective at driving engagement than digital outreach methods.

I have proactively identified and led the development of key initiatives to bring machine learning and deep learning capabilities into production. As the lead technical representative on our company’s AI Committee, I was tasked with guiding the organization in the strategic adoption of AI/ML tools.

  • Led the "Health Score" project, an algorithm to provide a comprehensive health score for each patient. The objective is to develop a predictive model that can assess a patient’s health trajectory and forecast adverse outcomes, enabling proactive interventions.
  • Championed the development of an internal, AI-powered knowledge base using Generative AI, allowing client-facing teams to quickly retrieve information on products, projects, and contracts.
  • Played a critical role in a large-scale data migration project, overseeing the automated translation of thousands of in-production SQL scripts from Microsoft SQL Server to Snowflake SQL.
  • Conceptualized predictive models to optimize marketing campaign performance by forecasting patient engagement and retention.

I have a strong background in mentoring and developing the next generation of data professionals, focusing on equipping junior data analysts with essential skills for success.

  • My guidance covered deep-dive analysis of complex datasets such as medical claims, dashboard creation, and writing clean, efficient SQL.
  • Supervised projects involving intricate data-matching processes to create a unified, person-level identity from disparate data sources—a foundational task for accurate reporting.
  • My goal has been to cultivate a culture of continuous career growth and technical excellence.

Education

University of California, Los Angeles (2020-2022)

Master of Applied Statistics

Master’s Thesis: Applications of NLP for Predicting Self-Harm Risk

  • Developed a self-harm risk prediction model using BERT-based Neural Networks, Random Forests, and sentiment analysis on social media text.
  • Demonstrated that fine-tuned BERT significantly outperformed Random Forests in classification accuracy while maintaining computational efficiency.
  • Read the Full Paper (PDF)

University of California, Davis (2014-2018)

Bachelor of Science (B.S.) in Statistics, Minor in Computer Science

Skills

Programming

  • Python, R, SQL, SAS, C++, Perl, Shell Scripting (Bash)

Statistics + ML

  • Causal Inference, Deep Learning (TensorFlow), NLP, Regression, A/B Testing, Multivariate Analysis, Clustering, Classification

Data Engineering

  • Snowflake, Hadoop Hive, Spark + MapReduce, AWS Suite, Azure

Visualization

  • Tableau, matplotlib, ggplot, Plotly, Shiny, Streamlit

Other Tools

  • Git/GitHub, Jupyter Notebooks, Airflow, dbt, Atlassian Jira

Personal Projects

I developed and evaluated multiple recommender systems using a large dataset of implicit user feedback from the Steam video game platform. The project involved implementing and comparing collaborative filtering techniques—specifically item-item similarity and matrix factorization with Alternating Least Squares (ALS)—alongside a content-based model using TF-IDF on game descriptions. Ultimately, the matrix factorization model proved to be the most reliable and highest-performing method for generating personalized top-10 recommendations.

Read the Report (PDF)
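
As a rough illustration of the item-item similarity idea mentioned in the recommender project above, here is a minimal sketch. It is not the project's actual code: the function names (`item_similarities`, `recommend`), the toy library data, and the choice of cosine similarity over binary ownership are all assumptions made for the example. It uses only the Python standard library and treats "user owns game" as the implicit feedback signal.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_similarities(user_items):
    """Cosine similarity between items over binary (implicit) user-item data."""
    item_counts = defaultdict(int)  # number of users who have each item
    co_counts = defaultdict(int)    # number of users who have both items of a pair
    for items in user_items.values():
        for item in items:
            item_counts[item] += 1
        for a, b in combinations(sorted(items), 2):
            co_counts[(a, b)] += 1

    sims = defaultdict(dict)
    for (a, b), n_ab in co_counts.items():
        # For binary vectors, cosine = |A and B| / sqrt(|A| * |B|)
        score = n_ab / sqrt(item_counts[a] * item_counts[b])
        sims[a][b] = score
        sims[b][a] = score
    return sims


def recommend(user, user_items, sims, top_n=10):
    """Rank unseen items by their summed similarity to the user's items."""
    owned = user_items[user]
    scores = defaultdict(float)
    for item in owned:
        for other, score in sims.get(item, {}).items():
            if other not in owned:
                scores[other] += score
    # Sort by score (descending), breaking ties alphabetically
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical toy data: which games each user has in their library
toy_library = {
    "u1": {"Portal", "Half-Life", "Dota 2"},
    "u2": {"Portal", "Half-Life"},
    "u3": {"Dota 2", "Team Fortress 2"},
    "u4": {"Portal", "Team Fortress 2"},
}

sims = item_similarities(toy_library)
print(recommend("u2", toy_library, sims, top_n=2))
# → ['Dota 2', 'Team Fortress 2']
```

A matrix factorization model such as ALS would replace the pairwise similarity table with learned user and item factor vectors; the ranking step stays conceptually the same.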

I submitted a research poster to the UC Davis Undergraduate Research Conference summarizing key findings and methodologies from my work as a Bioinformatics intern at the Michelmore Plant Genomics Lab.

View Poster (PDF)

Contact

Feel free to reach out. I'm always open to discussing new projects or opportunities.