Medical AI Research

1. Core Vision

Rather than performing simple classification on medical images, the overall research vision is to build an "End-to-End Clinical Reasoning Pipeline" that quantifies disease, produces structured clinical representations, and feeds them into LLM-based clinical reasoning.

This vision breaks down into the following core goals:

  • Medical images → lesion segmentation → quantification → prediction models built on quantitative indices
  • Quantitative indices + images + text reports → a multimodal-LLM-based AI Doctor Assistant
  • Stronger safety and reliability: formal verification of models, safety-neuron analysis, mechanistic interpretability

2. Research Theme A: Medical Image Quantification & Disease Modeling

A research line that goes beyond conventional CNN-based classification to quantify disease severity and produce clinically interpretable continuous biomarkers.

A.1. Ophthalmology (quantification from ophthalmic imaging)

  • Epiretinal membrane (ERM), diabetic retinopathy, macular disease, etc.
  • From fundus photographs / OCT B-scans:
    • Lesion segmentation (membrane, retinal layers, cystic regions)
    • Quantitative indices such as thickness maps, curvature, and reflectance profiles (a thickness-map sketch follows this list)
  • Disease staging and progression-prediction models driven by these quantitative indices
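
As one concrete example of turning a segmentation into a quantitative index, here is a minimal sketch. It assumes a hypothetical binary retina mask per OCT volume and an assumed axial pixel resolution; the project's actual quantification code may differ.

import numpy as np

def thickness_map(layer_mask: np.ndarray, axial_res_um: float = 3.9) -> np.ndarray:
    """En face retinal thickness from a binary layer segmentation.

    layer_mask: (n_bscans, depth, width) boolean array, True inside the
    segmented retina (hypothetical output of an upstream segmenter).
    axial_res_um: assumed axial pixel size in micrometers.
    Returns an (n_bscans, width) thickness map in micrometers.
    """
    # Count segmented pixels along the axial (depth) axis of each A-scan,
    # then convert the pixel count to physical thickness.
    return layer_mask.sum(axis=1) * axial_res_um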

A.2. Gait / Orthopedics (orthopedic gait analysis)

  • Markerless video (pose estimation) → biomechanical features → gait anomaly quantification (a cadence sketch follows this list)
  • Diagnosis and progression models built on quantitative features rather than subjective clinical grading
  • Extensible to pediatric and elderly imbalance assessment
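
To make "biomechanical features" concrete, here is a minimal sketch. It assumes a hypothetical pose estimator that yields one ankle keypoint's vertical trajectory; the real feature set is much richer.

import numpy as np

def step_cadence(ankle_y: np.ndarray, fps: float) -> float:
    """Estimate cadence (steps/min) from one ankle's vertical trajectory.

    ankle_y: (n_frames,) vertical coordinate of an ankle keypoint, as
    produced by a pose estimator (hypothetical upstream model).
    """
    # Foot strikes show up as local minima of the ankle-height signal;
    # count sign changes of the first difference from negative to positive.
    d = np.diff(ankle_y)
    strikes = np.sum((d[:-1] < 0) & (d[1:] >= 0))
    duration_min = len(ankle_y) / fps / 60.0
    return strikes / duration_min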

A.3. Multi-modal structured data integration

Integration of images, quantitative features, EMR data, and laboratory values (labs).
The end goal: building a disease-progression world model.


3. Research Theme B: Domain-Specialized Medical LLMs (the Ophtimus-V2 family)

This research line covers Ophtimus-V2-Tx, an ophthalmology-specialized LLM we developed in-house.

B.1. Clinical reasoning models

  • Fine-tuning on case reports
  • Learning the "clinical knowledge pathway" that runs from symptoms through imaging and diagnosis to treatment
  • LoRA and structured-LoRA experiments aimed at reducing hallucination and strengthening safety (a minimal adapter setup is sketched below)
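
The following is a minimal LoRA setup sketch using the peft library; the rank, scaling factor, and target modules shown are illustrative assumptions, not the Ophtimus-V2 training recipe.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Hypothetical base checkpoint; the concrete recipe may differ.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

lora_cfg = LoraConfig(
    r=16,                                  # low-rank update dimension
    lora_alpha=32,                         # scaling factor
    target_modules=["q_proj", "v_proj"],   # restrict updates to attention
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()         # only the adapter weights train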

B.2. Multi-modal input extension

  • Fundus / OCT (B-scan) embeddings + structured quantification + textual descriptions (see the prompt-construction sketch below)
  • Eventually combinable with a medical world model to drive a progression simulator
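
A minimal sketch of how quantitative indices and a textual report might be serialized into one structured context block for the LLM; the field names and values are hypothetical.

import json

def build_clinical_context(biomarkers: dict, report: str) -> str:
    """Serialize quantitative biomarkers and a free-text report into a
    structured prompt block for the LLM (field names are illustrative)."""
    payload = {
        "quantification": biomarkers,   # e.g. {"erm_area_mm2": 4.2, ...}
        "report": report,
    }
    return "### Structured clinical context\n" + json.dumps(payload, indent=2)

context = build_clinical_context(
    {"erm_area_mm2": 4.2, "central_thickness_um": 312},
    "B-scan shows a hyperreflective membrane over the macula.",
)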

B.3. Safety & Trustworthiness

  • "Safety Neurons" 분석
  • Mechanistic interpretability (circuit-level patterns in reasoning)
  • Clinically harmful output 검출 및 unlearning

4. Research Theme C: Formal Verification + AI Safety for Medical AI

An independent research line that combines formal methods with AI safety to support the reliability of medical AI and regulatory requirements such as medical-device approval.

C.1. Verified Environment Models

  • Timed-automata models of medical processes
  • Verification of safety constraints via model checking (PCTL, CTL, TCTL)
  • A control shield that keeps reinforcement learning or AI inference from violating those constraints (a runtime sketch follows this list)
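
A minimal runtime-shield sketch; is_safe stands in for a query against the verified timed-automaton model and is an assumed interface, not a specific tool's API.

from typing import Callable, Sequence

def shielded_action(state, candidates: Sequence, propose: Callable, is_safe: Callable):
    """Runtime shield: accept the policy's proposal only if it satisfies
    the verified safety predicate; otherwise fall back to a safe action.

    is_safe(state, action) stands in for a check against the verified
    timed-automaton model (hypothetical interface)."""
    action = propose(state)
    if is_safe(state, action):
        return action
    # Fall back to the first action the verified model admits.
    for a in candidates:
        if is_safe(state, a):
            return a
    raise RuntimeError("no safe action available in this state")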

C.2. Verified AI Controllers

  • Enforcing safety properties on the medical AI inference pipeline
  • Analyses that verify when, and on which inputs, dangerous outputs can occur
  • Verification-aware fine-tuning or pruning

C.3. Trustworthy Data & Contamination Check

  • Detecting LLM cheating in crowd annotation (peer-prediction based; a crude scoring sketch follows this list)
  • Securing the reliability of medical data labels
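
As a rough illustration only: full peer-prediction mechanisms are more involved, but even a chance-corrected peer-agreement score separates informative annotators from ones whose answers carry no information about the data. The shapes and scoring rule here are simplifying assumptions.

import numpy as np

def peer_agreement_scores(labels: np.ndarray) -> np.ndarray:
    """Score each annotator by agreement with peers, corrected for the
    agreement expected under the pooled label distribution.

    labels: (n_annotators, n_items) integer label matrix. Near-chance or
    negative scores flag annotators (or LLM 'cheaters') answering
    independently of the items. A crude proxy for peer prediction.
    """
    n_ann, _ = labels.shape
    # Chance agreement from the pooled label frequencies.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    chance = float((p ** 2).sum())
    scores = np.zeros(n_ann)
    for i in range(n_ann):
        peers = np.delete(labels, i, axis=0)
        scores[i] = (peers == labels[i]).mean() - chance
    return scores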

5. Research Theme D: Medical World Models & Embodied AI

A research direction directly aligned with core NeurIPS 2025 trends ("World Models", "Embodied AI for Healthcare").

D.1. Disease Progression World Model

  • A generative world model of retina / ERM progression dynamics
  • Temporal latent dynamics learned from longitudinal OCT B-scan series (a toy rollout is sketched below)
  • Counterfactual simulation such as "if the patient's state is X, what will the OCT look like in six months?"

D.2. Multi-modal Clinical Simulator

  • Incorporates images, quantitative biomarkers, text reports, and treatment history
  • Supplies the LLM with structured clinical-simulation context
  • Strengthens clinical decision support to the fullest extent

D.3. Reinforcement Learning in Verified Clinical Simulation

  • For settings where learning directly on real-world care is prohibited
  • Safe RL can be applied on top of the verified world model
  • Extensible to treatment-planning or screening-policy optimization

6. Research Theme E: Foundations for AI-Driven Clinical Decision Support

Integrates all of the axes above (A-D) in support of the ultimate medical-AI goal: automated clinical reasoning.

E.1. Image → Biomarker → Reasoner → Recommendation

  • Building a pipeline that can be wired fully end-to-end (sketched below)
  • Image-based quantification feeds directly into the input structure of LLM reasoning
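
A skeletal composition of the four stages; segmenter, quantifier, and llm are hypothetical callables standing in for the concrete models described in Themes A and B.

def clinical_pipeline(image, segmenter, quantifier, llm):
    """End-to-end sketch: image -> biomarkers -> structured prompt -> LLM.

    segmenter/quantifier/llm are hypothetical callables, not the
    project's concrete components."""
    mask = segmenter(image)
    biomarkers = quantifier(mask)          # e.g. thickness, ERM area
    prompt = (
        "Given these quantitative findings, propose a management plan:\n"
        + "\n".join(f"- {k}: {v}" for k, v in biomarkers.items())
    )
    return llm(prompt)                     # recommendation text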

E.2. Multi-lingual / Multi-institution Generalization

  • Built on data collaborations across Korea, the US (UPenn), and other institutions
  • Robustness and distribution-shift studies

E.3. Regulatory-readiness

  • Reliability metrics (sensitivity, specificity, behavior on FN-critical tasks; a metric sketch follows this list)
  • Medical-AI documentation structured as a "safety case"

7. Overall Theme Summary (One-page Executive Summary)

Beyond simple image classification, this Medical AI research program focuses on building the following integrated research ecosystem.

  1. Disease quantification technology
    • Image-based lesion analysis, quantification, and progression modeling
  2. Clinically specialized LLM development (Ophtimus-V2-Tx)
    • An ophthalmology-expert reasoning model
    • Multi-modal (OCT/fundus + EMR + biomarkers) processing
  3. Applied AI safety & formal verification
    • Guaranteed safety constraints for medical AI
    • Verified environment + verified inference
  4. World-model-based clinical simulation
    • Disease-progression simulation
    • A foundation for the LLM's clinical decision reasoning
  5. An overarching medical decision-support system
    • End-to-end: data → image → quantification → LLM → decision

Key Themes

  • Domain-specialized LLMs for ophthalmology (e.g., Ophtimus-V2-Tx)
  • Noise-robust medical image analysis and quantification
  • Reliable mapping from model outputs to clinical coding systems
  • Evaluation frameworks for safety, robustness, and explainability

Selected Projects

  • Ophtimus-V2-Tx: An 8B-parameter ophthalmology LLM trained on case reports and evaluated with CliBench-based coding.
  • ERM Quantification: Low-cost and fast SD-OCT based epiretinal membrane detection and thickness quantification.

Ophtimus: Ophthalmology-specific LLM

GitHub Repository

Python PyTorch Transformers LangChain Streamlit FastAPI

🤗 Models and Datasets  |  📕 AAAI 2025 workshop Paper

Introduction

Ophtimus is an open-source large language model (LLM) specialized in ophthalmology, built with 8 billion parameters based on the LLaMA architecture. It was trained on carefully curated ophthalmology-specific data, including medical papers, textbooks, and research reports. Through filtering, summarization, and preprocessing, only the most relevant and high-quality information was retained.

Designed to be both lightweight and high-performing, Ophtimus is suitable for real-world applications such as clinical decision support, medical education, and patient communication. The model and its training pipeline are fully open-sourced, providing a practical reference for developing similar domain-specific LLMs in other areas of medicine.

GitHub Repository: github.com/jinkimh/Ophtimus-Ophthalmology-LLM

Ophtimus Overall Architecture

Dataset Details

Note: All datasets were either newly constructed or adapted for this project. Pre-training datasets were curated from open-source ophthalmology materials, while instruction-tuning and evaluation datasets were built by extracting only ophthalmology-relevant samples from broader medical corpora. All data underwent preprocessing steps including deduplication, language filtering (English only), and removal of any personally identifiable information (PII).

| Dataset name | Source | Size | Purpose | Key Features |
| --- | --- | --- | --- | --- |
| Ophthalmology-pubmed-corpus [Link] | Ophthalmology papers | 18.4M tokens | Pre-training | Map-reduce summarization; broad ophthalmic keywords |
| Ophthalmology-textbook-corpus [Link] | Ophthalmology textbooks | 4M tokens | Pre-training | Trusted medical sources; rich in diagnostic cases |
| Ophthalmology MCQA Inst dataset [Link] | Ophthalmology docs | 51.7k QAs | Inst-tuning | Diverse multiple-choice formats; reasoning included; variety of ophthalmic topics |
| Ophthalmology EQA Inst dataset [Link] | Ophthalmology docs | 49.3k QAs | Inst-tuning | Variety of ophthalmic topics |
| Ophtimus-Eval-Dataset [Link] | Medical platform data | 2,153 QAs | Evaluation | Expert-verified; MCQA |
| PubMedQA-Ophthal-Dataset [Link] | PubMedQA | 297 QAs | Evaluation | Ophthalmology-filtered; True/False MCQA |
| MedMCQA-Ophthal-Dataset [Link] | MedMCQA | 6,932 QAs | Evaluation | Ophthalmology-filtered; MCQA |
| EQAEval-Dataset [Link] | MedQuAD, others | 1,389 QAs | Evaluation | Diverse open-source datasets; ophthalmology-filtered; essay QA |
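
The note above mentions deduplication and language filtering; the following is a minimal sketch of the exact-duplicate pass only. Near-duplicate, language, and PII filters are omitted, and this is not the project's actual preprocessing code.

import hashlib

def dedup_exact(texts):
    """Drop exact duplicates by content hash (one of the preprocessing
    steps noted above; near-duplicate and PII filters are omitted)."""
    seen, kept = set(), []
    for t in texts:
        h = hashlib.sha1(t.strip().lower().encode("utf-8")).hexdigest()
        if h not in seen:
            seen.add(h)
            kept.append(t)
    return kept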

Model Details

Note: The "pre-training" and "fine-tuning" columns in the table refer to the training performed in this project. The base models had already undergone pre-training and/or fine-tuning prior to this project, and we applied transfer learning using those models.

| Model name | Base model | Parameters | Pre-training | Instruction-tuning |
| --- | --- | --- | --- | --- |
| Ophtimus-Base [Link] | Llama-3.1-8B | 8B | | |
| Ophtimus-Llama-1B [Link] | Llama-3.2-1B-Instruct | 1B | | |
| Ophtimus-Llama-3B [Link] | Llama-3.2-3B-Instruct | 3B | | |
| Ophtimus-Llama-8B [Link] | Llama-3.1-8B-Instruct | 8B | | |
| Ophtimus-Instruct-8B [Link] | Ophtimus-Base | 8B | | |

Performance

Note: Multi-Choice QA: Ophtimus-Eval, MedMCQA, PubMedQA | Essay QA: MedQuAD, Medical Flashcards, Medical Wikidoc
Ophtimus-Eval is a proprietary dataset collected from a medical platform. The others are established medical benchmark datasets, from which only ophthalmology-related QA pairs were extracted for evaluation.

| Model | Ophtimus-Eval | MedMCQA (Ophth) | PubMedQA (Ophth) | ROUGE-L | BLEU | METEOR | SemScore |
| --- | --- | --- | --- | --- | --- | --- | --- |
| OpenAI GPT-4o | 71.95% | 81.95% | 89.90% | 0.193 | 0.082 | 0.341 | 0.761 |
| Llama-3-8B-Instruct | 48.60% | 74.02% | 63.97% | 0.193 | 0.064 | 0.244 | 0.684 |
| Llama-3.1-8B-Instruct | 39.78% | 57.96% | 83.84% | 0.177 | 0.054 | 0.215 | 0.641 |
| Eye-Llama | 32.56% | 59.43% | 66.11% | 0.183 | 0.062 | 0.211 | 0.686 |
| PMC-Llama-13B | 48.28% | 63.45% | 72.48% | 0.223 | 0.082 | 0.288 | 0.714 |
| Ophtimus-Llama-1B | 41.45% | 45.74% | 61.95% | 0.219 | 0.076 | 0.217 | 0.711 |
| Ophtimus-Llama-3B | 52.70% | 62.10% | 69.36% | 0.224 | 0.077 | 0.225 | 0.726 |
| Ophtimus-Llama-8B | 60.78% | 68.25% | 69.70% | 0.226 | 0.083 | 0.230 | 0.733 |
| Ophtimus-Instruct-8B | 63.85% | 71.51% | 72.73% | 0.222 | 0.079 | 0.224 | 0.735 |
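
For the essay-QA columns, scores of this kind can be computed with the Hugging Face evaluate package, as sketched below. The prediction/reference pair is a made-up placeholder, and SemScore (an embedding-similarity metric) is not part of evaluate, so it is omitted here.

import evaluate  # Hugging Face `evaluate` package

# Placeholder pair; the real evaluation runs over the ophthalmology-
# filtered essay-QA sets listed in the dataset table.
preds = ["ERM is treated with vitrectomy and membrane peeling."]
refs = ["Symptomatic epiretinal membrane is managed by pars plana vitrectomy."]

rouge = evaluate.load("rouge").compute(predictions=preds, references=refs)
bleu = evaluate.load("bleu").compute(predictions=preds, references=refs)
meteor = evaluate.load("meteor").compute(predictions=preds, references=refs)
print(rouge["rougeL"], bleu["bleu"], meteor["meteor"])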

Quickstart

Install Dependencies

git clone https://github.com/jinkimh/Ophtimus-Ophthalmology-LLM.git
cd Ophtimus-Ophthalmology-LLM
pip install -r requirements.txt

Ophtimus Inference

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# model_name example : BaekSeungJu/Ophtimus-Instruct-8B or Ophtimus-Llama-1B or Ophtimus-Llama-3B or Ophtimus-Llama-8B
model_name = "BaekSeungJu/Ophtimus-Instruct-8B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda")

tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")
tokenizer.pad_token = tokenizer.eos_token

system_instruction = (
    "You are an expert ophthalmologist. Please provide accurate and "
    "medically sound answers to the user's ophthalmology-related question."
)

# Enter your questions in the list
questions = [
    "Please describe the symptoms and treatment of epiretinal membrane.",
    "What's good for eyes?"
]

prompts = []
for question in questions:
    row_json = [
        {"role": "system", "content": system_instruction},
        {"role": "user", "content": question}
    ]
    prompt = tokenizer.apply_chat_template(row_json, add_generation_prompt=True, tokenize=False)
    prompts.append(prompt)

# Keep the attention mask alongside input_ids so left-padded batches
# generate correctly (pad_token == eos_token is ambiguous otherwise).
inputs = tokenizer(
    prompts,
    padding=True,
    return_tensors="pt",
).to("cuda")

model.eval()
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1024,
        do_sample=False,
    )

decoded = tokenizer.batch_decode(outputs, skip_special_tokens=False)
for i, text in enumerate(decoded):
    print(f"------------------------\nAnswer for question {i+1}:\n{text}")

For more details, visit the GitHub repository.

Ophtimus-V2-TX

To be Updated

SD-OCT-based Epiretinal Membrane Diagnostic Assistant System

Python PyTorch OpenCV YOLO Pillow

Introduction

This project presents a low-cost and efficient method for detecting and quantifying Epiretinal Membranes (ERM) using Spectral-Domain Optical Coherence Tomography (SD-OCT). By applying deep learning techniques—specifically, YOLO object detection—we generate en face "ERM Projection Images" from B-scan data, enabling intuitive visualization and accurate measurement of ERM areas. The method also introduces a novel approach to quantify the association between ERM and retinal thickness, enhancing clinical decision-making. Our approach aims to bridge the diagnostic performance gap between SD-OCT and Swept-Source OCT (SS-OCT) while maintaining accessibility and reducing diagnostic burden.

ERM System Architecture

Overall pipeline architecture for ERM detection & quantification
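
To make the projection step concrete, here is a minimal sketch. It assumes per-B-scan YOLO detections reduced to horizontal box extents; the actual pipeline's detection format may differ.

import numpy as np

def erm_projection(detections, n_bscans: int, width: int) -> np.ndarray:
    """Build an en face ERM projection from per-B-scan detection boxes.

    detections: list over B-scans; each entry is a list of (x1, x2)
    horizontal extents of detected ERM boxes (illustrative format).
    Marks each detected extent on its B-scan row; ERM area then follows
    from the marked pixel count times the per-pixel footprint.
    """
    proj = np.zeros((n_bscans, width), dtype=np.uint8)
    for row, boxes in enumerate(detections):
        for x1, x2 in boxes:
            proj[row, int(x1):int(x2)] = 1
    return proj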

YOLO Model Evaluation

We evaluated three YOLO model families (YOLOv5, YOLOv8, and YOLOv11) for ERM detection on SD-OCT B-scan images.
Each model was trained at two training-set sizes (Full: 2,200 images; Half: 1,100 images) and tested on 650 expert-labeled images. A minimal training/evaluation sketch follows the table.

| Model | Size | Params (M) | Precision | Recall | mAP@50 | mAP@50:95 | Dataset |
| --- | --- | --- | --- | --- | --- | --- | --- |
| YOLOv5 | S | 7.02 | 0.752 | 0.703 | 0.722 | 0.423 | Full |
| YOLOv5 | S | 7.02 | 0.694 | 0.642 | 0.664 | 0.376 | Half |
| YOLOv5 | M | 20.87 | 0.783 | 0.734 | 0.752 | 0.444 | Full |
| YOLOv5 | M | 20.87 | 0.723 | 0.685 | 0.701 | 0.396 | Half |
| YOLOv5 | L | 46.14 | 0.813 | 0.762 | 0.784 | 0.463 | Full |
| YOLOv5 | L | 46.14 | 0.745 | 0.704 | 0.726 | 0.414 | Half |
| YOLOv5 | X | 86.22 | 0.836 | 0.784 | 0.802 | 0.485 | Full |
| YOLOv5 | X | 86.22 | 0.763 | 0.725 | 0.743 | 0.437 | Half |
| YOLOv8 | S | 11.14 | 0.781 | 0.736 | 0.764 | 0.447 | Full |
| YOLOv8 | S | 11.14 | 0.723 | 0.676 | 0.701 | 0.393 | Half |
| YOLOv8 | M | 25.86 | 0.813 | 0.762 | 0.791 | 0.466 | Full |
| YOLOv8 | M | 25.86 | 0.748 | 0.705 | 0.724 | 0.412 | Half |
| YOLOv8 | L | 43.63 | 0.844 | 0.792 | 0.823 | 0.482 | Full |
| YOLOv8 | L | 43.63 | 0.774 | 0.731 | 0.754 | 0.436 | Half |
| YOLOv8 | X | 68.15 | 0.867 | 0.814 | 0.842 | 0.504 | Full |
| YOLOv8 | X | 68.15 | 0.793 | 0.752 | 0.772 | 0.454 | Half |
| YOLOv11 | S | 9.43 | 0.804 | 0.752 | 0.783 | 0.468 | Full |
| YOLOv11 | S | 9.43 | 0.746 | 0.692 | 0.714 | 0.417 | Half |
| YOLOv11 | M | 20.05 | 0.846 | 0.794 | 0.821 | 0.493 | Full |
| YOLOv11 | M | 20.05 | 0.774 | 0.736 | 0.757 | 0.443 | Half |
| YOLOv11 | L | 25.31 | 0.873 | 0.823 | 0.854 | 0.524 | Full |
| YOLOv11 | L | 25.31 | 0.807 | 0.773 | 0.793 | 0.476 | Half |
| YOLOv11 | X | 56.87 | 0.902 | 0.857 | 0.882 | 0.556 | Full |
| YOLOv11 | X | 56.87 | 0.836 | 0.803 | 0.826 | 0.507 | Half |
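
A minimal sketch of how such a run could be reproduced with the ultralytics package; the dataset YAML name, epoch count, and image size are placeholder assumptions, not the configuration behind the numbers above.

from ultralytics import YOLO

# Placeholder config: "erm_bscans.yaml" and the hyperparameters below are
# assumptions for illustration, not the project's actual settings.
model = YOLO("yolo11x.pt")                     # pretrained YOLOv11-X weights
model.train(data="erm_bscans.yaml", epochs=100, imgsz=640)

metrics = model.val()                          # evaluates on the val split
print(metrics.box.map50, metrics.box.map)      # mAP@50, mAP@50:95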

GitHub repository: github.com/jinkimh/SD-OCT-ERM-Quantification

Gait Anomaly Detection

To be Updated