Digital pathology is essential for cancer diagnosis and treatment, and plays an important role in both medical care and pharmaceutical research and development. Pathology has traditionally relied on the expertise and experience of pathologists, who thoroughly inspect tissue samples to identify abnormalities. However, the complexity and volume of cases call for advanced tools that help pathologists provide more accurate diagnoses.
The digitization of pathology slides, known as whole slide images (WSIs), has given rise to the field of computational pathology. By applying AI to these digitized WSIs, researchers are working to unlock new insights and enhance current annotation workflows. A pivotal advancement in computational pathology is the emergence of large deep neural network architectures known as foundation models (FMs). These models are trained with self-supervised learning algorithms on vast datasets, which lets them capture the visual representations and patterns specific to pathology images. The power of FMs lies in their ability to learn data embeddings that can be effectively transferred and fine-tuned for a variety of downstream tasks, from automated disease detection and tissue characterization to quantitative biomarker analysis and pathological subtyping.
Recently, French startup Bioptimus released a new pathology vision FM: H-Optimus-0, the world's largest FM for pathology. With 1.1 billion parameters, H-Optimus-0 was trained on hundreds of millions of images extracted from over 500,000 histopathology slides. It sets a new benchmark for state-of-the-art performance in important medical diagnostic tasks, from identifying cancerous cells to detecting tumors.
The recent addition of H-Optimus-0 to Amazon SageMaker JumpStart marks an important milestone in making advanced AI capabilities available to healthcare organizations. Trained on over 500,000 histopathology slides, this powerful FM is a valuable tool for organizations that want to strengthen their digital pathology workflows.
This post shows how to use H-Optimus-0 for two common digital pathology tasks: patch-level analysis for detailed tissue examination and slide-level analysis for broader diagnostic assessment. Through practical examples, we show how to adapt this FM to these specific use cases while optimizing compute resources.
Solution overview
Our solution uses the integrated AWS ecosystem to create an efficient, scalable pipeline for digital pathology AI workflows. The architecture combines AWS CloudFormation for infrastructure setup, Amazon VPC for networking, Amazon EFS for slide storage, and Amazon SageMaker notebook instances and training jobs for model development.
The following figure shows the solution architecture for training and deploying fine-tuned FMs based on H-Optimus-0.

Example scripts and training notebooks for this post are available in the accompanying GitHub repository.
Prerequisites
We assume that you have access to an AWS account and are authenticated. The AWS CloudFormation template for this solution hosts the SageMaker notebook on a t3.medium instance. Feature extraction uses a g5.2xlarge instance, which is equipped with an NVIDIA A10G GPU, tested in the US-2 AWS Region. Training jobs run on p3.2xlarge and g5.2xlarge instances. Check your AWS service quotas and make sure you have access to these instance types.
Create AWS infrastructure
We use AWS CloudFormation to automate the setup of the core infrastructure needed to start a pathology AI workflow. The provided infra-stack.yml template creates a complete environment for fine-tuning and training models.
The CloudFormation stack uses Amazon Virtual Private Cloud (Amazon VPC) to configure a secure network environment, establishing both public and private subnets with the appropriate gateways for internet connectivity. Within this network, it creates an Amazon EFS file system to store and serve large-scale pathology slide images efficiently. The stack also provisions a SageMaker notebook instance that connects to the EFS storage automatically, providing seamless access to the training data.
The template handles all necessary security configuration, including AWS Identity and Access Management (IAM) roles. When you deploy the stack, note the IDs of the private subnet and the security group. You need them so that training jobs can access the EFS data storage.
See the README in the GitHub repository for detailed setup instructions and configuration options.
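To illustrate why the subnet and security group IDs matter, the following is a minimal sketch of how they might be combined with the EFS file system ID when describing a training job. The dictionary keys follow the request shape of the SageMaker `CreateTrainingJob` API (`VpcConfig` and `FileSystemDataSource`). The helper name `build_efs_training_config`, the directory path, and all resource IDs are hypothetical placeholders, and the repository may configure this differently (for example, through the SageMaker Python SDK).

```python
def build_efs_training_config(file_system_id, directory_path,
                              subnet_ids, security_group_ids,
                              channel_name="training"):
    """Build the network and input fragments of a training job request.

    The training job must run in a subnet and security group that can
    reach the EFS mount targets, otherwise it cannot read the data.
    """
    if not subnet_ids or not security_group_ids:
        raise ValueError("At least one subnet and one security group are required")

    vpc_config = {
        "Subnets": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
    }
    input_data_config = [
        {
            "ChannelName": channel_name,
            "DataSource": {
                "FileSystemDataSource": {
                    "FileSystemId": file_system_id,
                    "FileSystemAccessMode": "ro",  # read-only access to the slides
                    "FileSystemType": "EFS",
                    "DirectoryPath": directory_path,
                }
            },
        }
    ]
    return {"VpcConfig": vpc_config, "InputDataConfig": input_data_config}


# Placeholder IDs: replace them with the values from your deployed stack.
config = build_efs_training_config(
    file_system_id="fs-0123456789abcdef0",
    directory_path="/mhist",
    subnet_ids=["subnet-0123456789abcdef0"],
    security_group_ids=["sg-0123456789abcdef0"],
)
```

The resulting fragments could be merged into a full training job request together with the algorithm, role, and resource settings, which are omitted here.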
Use FMs for patch-level prediction tasks
Patch-level analysis is the foundation of digital pathology AI workflows. Instead of processing an entire WSI, which can exceed several gigabytes, patch-level analysis focuses on specific tissue regions. This targeted approach conserves compute resources and speeds up model development cycles. The following figure shows the workflow of a patch-level prediction task on a WSI.

Classification task: MHIST dataset
We demonstrate patch-level classification using the MHIST dataset, which contains colorectal polyp images. Early detection of cancerous polyps has a direct impact on patient survival, which makes this a clinically relevant use case. We add a simple classification head on top of the pre-trained H-Optimus-0 features and use linear probing to achieve 83% accuracy. This implementation uses Amazon EFS for efficient data streaming and a p3.2xlarge instance for optimal GPU utilization.
To access the MHIST dataset, submit a data request through the dataset portal to obtain the annotations.csv and images.zip files. Our repository includes a Download_mhist.sh script that automatically downloads and organizes the data in EFS storage.
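Linear probing means that the FM stays frozen and only a linear classifier is trained on its embeddings. The following is a minimal sketch of that idea in NumPy, assuming the embeddings have already been extracted. The random vectors below are synthetic stand-ins for H-Optimus-0 features, the function names are hypothetical, and the small dimension is chosen only to keep the example fast. It does not reproduce the training script in the repository.

```python
import numpy as np


def train_linear_probe(features, labels, lr=0.1, epochs=200):
    """Fit a logistic-regression head on frozen features with gradient descent."""
    n_samples, dim = features.shape
    weights = np.zeros(dim)
    bias = 0.0
    for _ in range(epochs):
        logits = features @ weights + bias
        probs = 1.0 / (1.0 + np.exp(-logits))
        error = probs - labels  # gradient of the binary cross-entropy loss
        weights -= lr * (features.T @ error) / n_samples
        bias -= lr * error.mean()
    return weights, bias


def predict(features, weights, bias):
    """Return binary class predictions (0 or 1) for each feature vector."""
    return (features @ weights + bias > 0).astype(int)


# Synthetic stand-ins for frozen FM embeddings of two patch classes.
rng = np.random.default_rng(0)
dim = 16
class0 = rng.normal(-1.0, 1.0, size=(100, dim))
class1 = rng.normal(1.0, 1.0, size=(100, dim))
features = np.vstack([class0, class1])
labels = np.concatenate([np.zeros(100), np.ones(100)])

weights, bias = train_linear_probe(features, labels)
accuracy = (predict(features, weights, bias) == labels).mean()
```

Because only the small head is trained, this kind of probe is cheap to fit and gives a quick measure of how linearly separable the frozen features are for a given task.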
Segmentation task: Lizard dataset
The second patch-level task demonstrates nuclear segmentation using the Lizard dataset, which requires accurate pixel-level predictions in colon tissue. By adding a Mask2Former ViT-Adapter head, we adapt H-Optimus-0 to segmentation so that the model can generate detailed segmentation masks while still using the FM's strong feature extraction capabilities.
The Lizard dataset is available on Kaggle. The repository contains scripts that automatically download and prepare the training data. The segmentation implementation runs on a g5.16xlarge instance to handle the computational demands of pixel-level prediction.
Use FMs for WSI-level tasks
Analyzing entire WSIs poses unique challenges because of their size, which often exceeds 50,000 x 50,000 pixels. To deal with this, we implement multiple instance learning (MIL), which treats each WSI as a collection of smaller patches. Our attention-based MIL approach automatically learns which regions are most relevant to the final prediction. The following figure shows the workflow of a WSI-level prediction task using MIL.

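The core of attention-based MIL is a pooling step that scores every patch embedding, normalizes the scores with a softmax, and forms the slide embedding as the weighted sum of the patches. The following is a minimal NumPy sketch of that pooling step. The particular scoring function (a tanh projection followed by a linear layer) is a commonly used formulation and is an assumption here, as are the function name and dimensions. The parameters are random in this example, whereas in a real model they are learned during training.

```python
import numpy as np


def attention_mil_pool(patch_features, proj, score_vec):
    """Pool patch embeddings into one slide embedding with attention weights.

    patch_features: (n_patches, dim) embeddings of one slide (the "bag")
    proj:           (dim, hidden) projection matrix
    score_vec:      (hidden,) vector that maps hidden features to a score
    """
    scores = np.tanh(patch_features @ proj) @ score_vec  # (n_patches,)
    scores = scores - scores.max()  # subtract the max for numerical stability
    exp_scores = np.exp(scores)
    attention = exp_scores / exp_scores.sum()  # softmax over patches
    slide_embedding = attention @ patch_features  # weighted sum, (dim,)
    return slide_embedding, attention


# Synthetic bag of patch embeddings and random (untrained) parameters.
rng = np.random.default_rng(0)
n_patches, dim, hidden = 500, 32, 8
bag = rng.normal(size=(n_patches, dim))
proj = rng.normal(scale=0.1, size=(dim, hidden))
score_vec = rng.normal(size=hidden)

slide_embedding, attention = attention_mil_pool(bag, proj, score_vec)
```

The attention weights sum to 1 across the patches of a slide, so they can also be inspected after training to see which regions contributed most to a prediction.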
WSI processing pipeline
Our implementation optimizes WSI analysis through the following methods:
- Intelligent patching: The WSI is loaded efficiently with the GPU-accelerated cuCIM library, and Canny edge detection is applied to identify and extract only the regions that contain tissue.
- Feature extraction: The selected patches are processed in parallel with GPU acceleration, and the features are stored in the space-efficient HDF5 format for downstream analysis.
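The patching step above can be sketched as a grid walk that keeps only tiles with enough tissue. The example below is a simplified CPU illustration, not the pipeline in the repository: it uses plain NumPy instead of cuCIM and a brightness threshold as a stand-in for Canny edge detection. The function name, threshold values, and the synthetic image are all assumptions made for illustration.

```python
import numpy as np


def tissue_patch_coords(thumbnail, patch_size,
                        white_threshold=220, min_tissue_fraction=0.2):
    """Return top-left (row, col) coordinates of tiles that contain tissue.

    thumbnail: (height, width, 3) uint8 RGB array of the slide.
    A pixel counts as tissue when it is darker than the white background.
    """
    gray = thumbnail.mean(axis=2)
    tissue_mask = gray < white_threshold
    height, width = tissue_mask.shape

    coords = []
    for row in range(0, height - patch_size + 1, patch_size):
        for col in range(0, width - patch_size + 1, patch_size):
            tile = tissue_mask[row:row + patch_size, col:col + patch_size]
            if tile.mean() >= min_tissue_fraction:
                coords.append((row, col))
    return coords


# Synthetic "slide": white background with one darker rectangle of tissue.
slide = np.full((512, 512, 3), 255, dtype=np.uint8)
slide[128:256, 128:384] = 120

coords = tissue_patch_coords(slide, patch_size=128)
```

In this synthetic example, only the two tiles that overlap the darker rectangle are kept, so 14 of the 16 grid tiles are skipped before any feature extraction takes place.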
MSI status prediction
We demonstrate the WSI pipeline by predicting microsatellite instability (MSI) status, an important biomarker that guides immunotherapy decisions in cancer treatment. The TCGA-COAD dataset used for this task can be accessed from the GDC Data Portal, and the repository provides detailed instructions for downloading the WSIs and the corresponding MSI labels.
Clean up
When you're done, remember to delete the associated resources (the Amazon EFS storage and the SageMaker notebook instance) to avoid unexpected costs.
Conclusion
This post showed how to build scalable digital pathology AI workflows with the H-Optimus-0 FM and AWS services. Through both patch-level tasks (MHIST classification and Lizard nuclear segmentation) and WSI analysis (MSI status prediction), we showed how to efficiently handle the unique challenges of computational pathology.
The implementation highlights the seamless integration between AWS services for processing large-scale pathology data. We used Amazon EFS in this demo to enable high-throughput training workflows, but production deployments may consider AWS HealthImaging for long-term storage of medical imaging data.
We hope this pipeline serves as a starting point for your pathology AI initiatives. The provided GitHub repository includes the components you need to build and scale pathology workflows for your specific use case. Clone the repository and set up the infrastructure using the provided CloudFormation template. Then fine-tune H-Optimus-0 on your own pathology datasets and downstream tasks, and compare the results with your current methods.
We'd love to hear about your experience and insights. To help advance the field of computational pathology, reach out to us or contribute to publicly available FMs.
About the authors
Pierre de Mariard is a Senior AI/ML Solutions Architect at Amazon Web Services who supports customers in the healthcare and life sciences industry. In his free time, Pierre enjoys skiing and exploring the New York food scene.
Christopher is a Senior Partner Account Manager at Amazon Web Services (AWS), where he helps independent software vendors (ISVs) innovate, build, and sell healthcare software as a service (SaaS) solutions in the public sector. Christopher is part of the Healthcare and Life Sciences Technical Field Community (TFC), which aims to accelerate the digitization and use of healthcare data, improve outcomes, and promote personalized care.