Ovarian Carcinoma Histopathology Dataset


The ovarian carcinoma (OC) dataset is a growing collection of whole slide histopathology images digitized from biopsy sections of five ovarian carcinoma subtypes: high grade serous (HGSC), low grade serous (LGSC), endometrioid (EN), mucinous (MC), and clear cell (CC) carcinomas. At present, the collection includes slides from 80 different patients, split equally between training and test sets.

This collection of whole slide images was acquired in the context of a Transcanadian study on the reproducibility of ovarian carcinoma subtyping across 6 different pathology centers. Each whole slide image was digitized with an AperioScope scanner at 40x magnification; expert pathologists selected it from the available tissue slides to cover as much of the lesion as possible. In addition, each image has associated metadata (immunocytology results) provided along with the final diagnosis.

The dataset was introduced to evaluate clinicians' agreement and diagnostic reproducibility, and was then extended to evaluate automatic multiclass classification systems for ovarian carcinomas, where the goal is to predict a carcinoma subtype for each whole slide image.



The following papers proposed automatic systems for ovarian carcinoma subtype classification using this dataset:

  • Aicha BenTaieb, Masoud Nosrati, Hector Li-Chang, David Huntsman, and Ghassan Hamarneh. Clinically-Inspired Automatic Classification of Ovarian Carcinoma Subtypes. Journal of Pathology Informatics, 7(1):1-28, 2016.

  • Aicha BenTaieb, Hector Li-Chang, David Huntsman, and Ghassan Hamarneh. Automatic Diagnosis of Ovarian Carcinomas via Sparse Multiresolution Tissue Representation. In Lecture Notes in Computer Science, Medical Image Computing and Computer-Assisted Intervention (MICCAI), volume 9349, pages 629-636, 2015.

  • Aicha BenTaieb, Hector Li-Chang, David Huntsman, and Ghassan Hamarneh. A Structured Latent Model for Ovarian Carcinoma Subtyping from Histopathology Slides. Medical Image Analysis (MedIA), 39:194-205, 2017.

This dataset is for academic, non-commercial use only. If you use this dataset in a publication, please cite the following paper:

title={Diagnosis of ovarian carcinoma cell type is highly reproducible: a transcanadian study},
author={K{\"o}bel, Martin and Kalloger, Steve E and Baker, Patricia M and Ewanowich, Carol A and Arseneau, Jocelyne and Zherebitskiy, Viktor and Abdulkarim, Soran and Leung, Samuel and Duggan, M{\'a}ire A and Fontaine, Dan and others},
journal={The American journal of surgical pathology},



  1. Download all files linked below

  2. Use the following command to merge all downloaded parts into a single zip archive, which can then be extracted into one folder. ⚠️ The total uncompressed size is >100 GB; the archive contains all histopathology slides in svs format.

cat data_part* > total_data.zip
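
The merge and extraction steps above can be sketched as two small shell functions. This is only a sketch under stated assumptions: the parts share the data_part prefix used in the command above, their names sort in the correct order, and unzip is installed. The function names and the ovarian_slides output folder are hypothetical and not part of the dataset.

```shell
# Sketch only: function names and the output folder are illustrative assumptions.

# merge_parts DIR ARCHIVE
# Concatenate every data_part* file found in DIR into ARCHIVE.
# The shell expands the glob in sorted name order, so the parts are
# joined in that order.
merge_parts() {
    cat "$1"/data_part* > "$2"
}

# check_and_extract ARCHIVE OUTDIR
# Test the archive first (unzip -t returns non-zero if it is corrupt or
# incomplete), and extract into OUTDIR only when the test passes.
check_and_extract() {
    unzip -tq "$1" && unzip -q "$1" -d "$2"
}
```

From the download folder, a typical call would be `merge_parts . total_data.zip && check_and_extract total_data.zip ovarian_slides`. Testing the archive before extracting avoids filling >100 GB of disk from a download that is missing a part.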