This repository contains tools for classifying tumors according to the MSK OncoTree (OT) platform. See the MSK OncoTree publication for details. Proper tumor classification improves data interoperability, enables more reliable genomic and clinical outcome analyses, facilitates patient matching for biomarker-driven clinical trials, and ultimately supports the delivery of more precise, evidence-based treatment recommendations. This package enables OT classification of tumors at scale using a cloud-hosted or locally deployed LLM.
- OncoTreeClassifier - Uses an Ollama-deployed LLM to match tumor information to the best OT Tissue and then to the best OT Node within that Tissue.
- OncoTreeComparator - Benchmarks the classifier codes against a truth set of 100 Tempus tumor reports.
- OncoTreePrinter - Parses the OT data structure, pulls the referenced NCI Thesaurus codes, then filters, formats, and outputs text for LLM prompt construction.
- TempusPathoPrinter - (USeq Repo) Parses Tempus v3.3+ JSON reports for information useful to the OncoTreeClassifier.
- Resources - Reference files for the various applications. See the oncoTree10MinPres2April2026.pptx for a project overview.
- Install Java version 21 or later. Check the installed version with 'java -version'.
- Download the latest USeq_XXX.zip release and unzip it.
- Download the latest OncoTree OT_XXX.jar
- Download the latest OncoTree Resource folder OTResourcesXXX.zip and unzip it.
- Obtain an Ollama.com key and save it in a file called key.txt. Alternatively, see the RunScripts folder for bash and Snakemake workflow files that use local nodes and a Slurm cluster.
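The Java requirement in the first step can be checked programmatically before launching a large batch. The following is a minimal Python sketch, not part of this repository; the function names `java_major` and `installed_java_major` are hypothetical. It relies on `java -version` printing a banner containing `version "<number>"`, and on Java 8 and earlier reporting themselves as `1.x`.

```python
import re
import subprocess


def java_major(version_output: str) -> int:
    """Extract the major version number from `java -version` output."""
    m = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if not m:
        raise ValueError("could not find a Java version string")
    major = int(m.group(1))
    # Java 8 and earlier use the legacy 1.x scheme, e.g. "1.8.0_292" is Java 8.
    if major == 1 and m.group(2):
        major = int(m.group(2))
    return major


def installed_java_major() -> int:
    """Run `java -version`; the JVM writes its version banner to stderr."""
    proc = subprocess.run(
        ["java", "-version"], capture_output=True, text=True, check=True
    )
    return java_major(proc.stderr + proc.stdout)
```

A workflow script could then stop early when `installed_java_major() < 21`.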
Convert your tumor information into a structured JSON file with these elements:
{
"icd_code_descriptions": "Malignant neoplasm of pancreas; Malignant neoplasm of pancreas, unspecified; Adenocarcinoma; Pancreas",
"original_path_lab_diagnosis": "Adenocarcinoma",
"test_order_id": "2ZN719381V",
"sample_site": "Liver"
}
DO NOT insert any protected health information (PHI) in these JSON files.
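For tumor information that does not come from Tempus reports, files in this format can be generated with a few lines of Python. This is a sketch under stated assumptions: the four keys shown above are treated as the complete, required set (the source does not say whether any are optional), the file is named after `test_order_id`, and `write_tumor_json` is a hypothetical helper, not part of this package. Any other fields in the input record are dropped so that stray columns are not copied into the output.

```python
import json
from pathlib import Path

# Keys copied from the example above; treating all four as required is an assumption.
REQUIRED_KEYS = (
    "icd_code_descriptions",
    "original_path_lab_diagnosis",
    "test_order_id",
    "sample_site",
)


def write_tumor_json(record: dict, out_dir: str) -> Path:
    """Validate the four expected string fields and write <test_order_id>.json."""
    missing = [
        k for k in REQUIRED_KEYS
        if not (isinstance(record.get(k), str) and record[k].strip())
    ]
    if missing:
        raise ValueError(f"missing or empty fields: {missing}")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Keep only the expected keys so unrelated fields never reach the file.
    clean = {k: record[k] for k in REQUIRED_KEYS}
    path = out / f"{record['test_order_id']}.json"
    path.write_text(json.dumps(clean, indent=2) + "\n", encoding="utf-8")
    return path
```

Dropping unknown keys reduces the chance of leaking extra fields, but it does not detect PHI inside the four free-text values; those still need review upstream.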
For Tempus v3.3+ JSON reports, use the USeq/TempusPathoPrinter to create these JSON files:
java -jar USeq_9.3.9/Apps/TempusPathoPrinter -j TempusReports -s ParsedReports \
-i OTResources29June2026/ICD/ICD-10_Diagnosis.txt \
-m OTResources29June2026/ICD/ICD_Morphology.txt \
-t OTResources29June2026/ICD/ICD_Topology.txt -r
Execute the classifier using the Ollama.com service:
java -jar OT_0.1.jar Classifier \
-k $(cat key.txt) \
-m gemma4:31b-cloud \
-c 24000 \
-t OTResources29June2026/promptTissue.txt \
-n OTResources29June2026/tissueCodeNodeCodes.txt \
-a OTResources29June2026/TissueNodeCatalog \
-e OTResources29June2026/TissueNodeExamples \
-j OTResources29June2026/TestJsons \
-r Results
Results for the TestJsons:
- 2ZN719381V.PANCREAS.PAAD.json
- 6VE87GH83V.BRAIN.HGGNOS.json
- 7T3IRL8Y85.MYELOID.RDD.json
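The result file names above appear to follow a `<test_order_id>.<TISSUE>.<NODE>.json` pattern. That pattern is inferred from the three examples and is not documented here, so treat the following Python sketch as illustrative only. The names `Call`, `parse_result_name`, and `summarize` are hypothetical.

```python
from pathlib import Path
from typing import NamedTuple


class Call(NamedTuple):
    test_order_id: str
    tissue: str
    node: str


def parse_result_name(file_name: str) -> Call:
    """Split '<test_order_id>.<TISSUE>.<NODE>.json' into its three parts."""
    parts = Path(file_name).name.split(".")
    if len(parts) != 4 or parts[3] != "json":
        raise ValueError(f"unexpected result file name: {file_name}")
    return Call(*parts[:3])


def summarize(results_dir: str) -> list[Call]:
    """Return one Call per *.json file in results_dir, sorted by order id."""
    return sorted(
        parse_result_name(p.name) for p in Path(results_dir).glob("*.json")
    )
```

For example, `parse_result_name("2ZN719381V.PANCREAS.PAAD.json")` yields the order id `2ZN719381V`, tissue `PANCREAS`, and node `PAAD`, which makes it straightforward to tabulate a Results folder.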
HCI Data Science Hackathon - Many thanks to the 2025 'OncoTree LLM: AI Assisted Tumor Classification for Precision Oncology' first place team: Bradley Demarest, Gabby Fort, Chase Maughan, Jake Reed, and David Nix

