In today’s rapidly changing world, monitoring the health of the Earth’s vegetation is more important than ever. Vegetation plays an important role in maintaining ecological balance, providing nutrients, and acting as a carbon sink. Traditionally, monitoring vegetation health has been a difficult task. Methods such as field surveys and manual satellite data analysis are not only time-consuming, but also require significant resources and expertise. These cumbersome approaches often delay data collection and analysis, making it difficult to track and quickly respond to changes in the environment. Furthermore, the high costs associated with these methods limit their availability and frequency, impeding comprehensive and continuous vegetation monitoring on a global scale. Considering these challenges, we have developed a solution that streamlines the vegetation monitoring process and makes it more efficient at global scale.
Moving away from traditional, labor-intensive methods of monitoring vegetation health, Amazon SageMaker geospatial capabilities provide a streamlined and cost-effective solution. Amazon SageMaker supports geospatial machine learning (ML) capabilities that enable data scientists and ML engineers to build, train, and deploy ML models using geospatial data. These geospatial capabilities open up a new world of environmental monitoring possibilities. SageMaker allows users to access a wide range of geospatial datasets, efficiently process and enrich this data, and accelerate development timelines. Tasks that previously took days or even weeks to complete can now be completed in a fraction of the time.
In this post, we demonstrate the power of SageMaker’s geospatial capabilities by mapping the world’s vegetation in under 20 minutes. This example highlights not only the efficiency of SageMaker, but also the impact of how geospatial ML can be used to monitor the environment for sustainability and conservation purposes.
Identify your area of interest
First, we show you how to apply SageMaker to analyze geospatial data on a global scale. To get started, follow the steps described in Getting started with Amazon SageMaker geospatial capabilities. Start by specifying the geographic coordinates of a bounding box that covers the area of interest. This bounding box acts as a filter that selects only the relevant satellite images covering the Earth’s landmass.
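As a minimal sketch of this step, the bounding box can be expressed as a closed polygon ring in longitude/latitude order, which is the shape GeoJSON-style geometry inputs generally expect. The helper name `bbox_to_polygon` and the example extent are hypothetical and chosen only for illustration; they are not taken from the original walkthrough.

```python
from typing import List


def bbox_to_polygon(
    min_lon: float, min_lat: float, max_lon: float, max_lat: float
) -> List[List[List[float]]]:
    """Return a closed, counter-clockwise polygon ring for a lon/lat bounding box."""
    if not -180.0 <= min_lon < max_lon <= 180.0:
        raise ValueError("longitudes must satisfy -180 <= min_lon < max_lon <= 180")
    if not -90.0 <= min_lat < max_lat <= 90.0:
        raise ValueError("latitudes must satisfy -90 <= min_lat < max_lat <= 90")
    ring = [
        [min_lon, min_lat],
        [max_lon, min_lat],
        [max_lon, max_lat],
        [min_lon, max_lat],
        [min_lon, min_lat],  # repeat the first vertex to close the ring
    ]
    return [ring]


# Hypothetical extent for illustration only; adjust it to your own area of interest.
area_of_interest = bbox_to_polygon(-180.0, -60.0, 180.0, 85.0)
```

Keeping the geometry as plain nested lists makes it easy to pass to a search request or to plot for a quick visual check.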
Data acquisition
SageMaker geospatial capabilities provide access to a wide range of public geospatial datasets, including Sentinel-2, Landsat 8, Copernicus DEM, and NAIP. We chose Sentinel-2 for our vegetation mapping project because of its global coverage and update frequency. The Sentinel-2 satellites capture images of the Earth’s surface at a resolution of 10 meters every five days. In this example, we select the first week of December 2023. We keep only images with less than 10% cloud coverage so that most of the ground is visible. In this way, the analysis is based on clear and reliable images.
By utilizing the search_raster_data_collection API of SageMaker geospatial capabilities, we identified 8,581 unique Sentinel-2 images taken during the first week of December 2023. To verify the accuracy of our selection, we plotted the footprints of these images on a map to confirm that they were the correct images for our analysis.
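The search described above might be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the request field names reflect our understanding of the `search_raster_data_collection` request shape in the boto3 `sagemaker-geospatial` client and should be checked against the API reference. The helper names `build_query` and `search_all` are hypothetical, and the data collection ARN is left as a parameter rather than hard-coded.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterator, List


def build_query(
    polygon: List[List[List[float]]],
    start: datetime,
    end: datetime,
    max_cloud_cover: float = 10.0,
) -> Dict[str, Any]:
    """Build a raster data collection query (field names are assumed, see lead-in)."""
    return {
        "AreaOfInterest": {
            "AreaOfInterestGeometry": {"PolygonGeometry": {"Coordinates": polygon}}
        },
        "TimeRangeFilter": {"StartTime": start, "EndTime": end},
        "PropertyFilters": {
            "Properties": [
                {
                    "Property": {
                        # The upper bound is passed as given; whether it is
                        # inclusive is up to the service.
                        "EoCloudCover": {
                            "LowerBound": 0.0,
                            "UpperBound": max_cloud_cover,
                        }
                    }
                }
            ],
            "LogicalOperator": "AND",
        },
    }


def search_all(client: Any, collection_arn: str, query: Dict[str, Any]) -> Iterator[dict]:
    """Yield every matching item, following NextToken pagination until exhausted."""
    kwargs: Dict[str, Any] = {"Arn": collection_arn, "RasterDataCollectionQuery": query}
    while True:
        response = client.search_raster_data_collection(**kwargs)
        yield from response.get("Items", [])
        token = response.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token


# Usage (requires AWS credentials; the ARN is a placeholder you must supply):
#   import boto3
#   client = boto3.client("sagemaker-geospatial", region_name="us-west-2")
#   query = build_query(
#       polygon,
#       datetime(2023, 12, 1, tzinfo=timezone.utc),
#       datetime(2023, 12, 8, tzinfo=timezone.utc),
#   )
#   items = list(search_all(client, "<sentinel-2-collection-arn>", query))
```

Passing the client in as an argument keeps the pagination logic testable without network access.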
SageMaker geospatial processing jobs
When we queried the data using SageMaker geospatial capabilities, we received comprehensive details about the target images, including the data footprint, properties of the spectral bands, and hyperlinks for direct access. These hyperlinks let you bypass the memory- and storage-intensive traditional approach of first downloading images and then processing them locally. That approach becomes even more challenging given the size and scale of the dataset, which exceeds 4 TB: each of the more than 8,000 images has multiple channels and is approximately 500 MB in size. Processing terabytes of data on a single machine takes too much time. Setting up a processing cluster is an alternative, but it introduces its own complexities, from data distribution to infrastructure management.
SageMaker geospatial capabilities use Amazon SageMaker Processing to streamline this. Purpose-built geospatial containers combined with SageMaker processing jobs provide a simplified, managed experience for creating and running clusters. With just a few lines of code, you can scale out your geospatial workloads: simply specify a script that defines your workload, the location of your geospatial data on Amazon Simple Storage Service (Amazon S3), and the geospatial container. SageMaker Processing then provisions cluster resources to run geospatial ML workloads at city, country, or continent scale.
Our project uses 25 clusters, each consisting of 20 instances, to scale out the geospatial workload. We divided the 8,581 images into 25 batches of approximately 340 images each for efficient processing. Each batch is distributed evenly across the machines in a cluster. All batch manifests are uploaded to Amazon S3, ready for the processing jobs, so each segment is processed quickly and efficiently.
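The batching step could be sketched as follows. The helper names are hypothetical. The manifest layout, a JSON list whose first element is a `prefix` object followed by relative keys, follows the S3 manifest convention used by SageMaker inputs; the bucket and key names in the test data are placeholders.

```python
import json
from typing import List, Sequence


def split_into_batches(items: Sequence[str], n_batches: int) -> List[List[str]]:
    """Split items into n_batches contiguous batches whose sizes differ by at most one."""
    if n_batches <= 0:
        raise ValueError("n_batches must be positive")
    base, extra = divmod(len(items), n_batches)
    batches: List[List[str]] = []
    start = 0
    for i in range(n_batches):
        size = base + (1 if i < extra else 0)  # the first `extra` batches get one more
        batches.append(list(items[start:start + size]))
        start += size
    return batches


def to_manifest(prefix: str, relative_keys: Sequence[str]) -> str:
    """Serialize one batch as a manifest: a prefix object followed by relative keys."""
    return json.dumps([{"prefix": prefix}, *relative_keys], indent=2)
```

With 8,581 images and 25 batches, this produces 6 batches of 344 images and 19 batches of 343, which matches the roughly 340 images per batch mentioned above. Uploading each manifest to Amazon S3 is omitted here.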
Once the input data is ready, we move on to the core analysis, which reveals insights into vegetation health through the Normalized Difference Vegetation Index (NDVI). NDVI is calculated as the difference between near-infrared (NIR) and red reflectance, normalized by their sum, which yields a value between -1 and 1. Higher NDVI values indicate denser, healthier vegetation, values around 0 indicate no vegetation, and negative values typically indicate bodies of water. This index serves as an important tool for assessing vegetation health and distribution. Below is an example of what NDVI looks like.
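Written out, the index is NDVI = (NIR - Red) / (NIR + Red). The following is a minimal NumPy sketch of that calculation, not the authors' processing script. For Sentinel-2, the NIR and red inputs typically correspond to bands B08 and B04. The function name and the choice to return NaN where both bands are zero are assumptions made for this illustration.

```python
import numpy as np


def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """Compute NDVI = (NIR - Red) / (NIR + Red) element-wise.

    Pixels where NIR + Red == 0 are returned as NaN instead of raising
    a division warning.
    """
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    # Divide only where the denominator is non-zero; other pixels stay NaN.
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out
```

For example, a pixel with NIR reflectance 0.5 and red reflectance 0.1 gives (0.5 - 0.1) / (0.5 + 0.1), which is about 0.67 and indicates dense vegetation.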
Now that the calculation logic is defined, you are ready to start your geospatial SageMaker processing job. This involves a simple three-step process: setting up the compute cluster, defining the computation details, and organizing the input and output details.
First, set up your cluster by determining the number and type of instances you need for your job and ensuring they are suitable for geospatial data processing. The computing environment itself is prepared by selecting the geospatial container image, which comes with all the packages commonly used to process geospatial data.
Next, use the manifest you created earlier, which lists all image hyperlinks, as input. Also specify the S3 location where the results are saved.
With these elements configured, you can start multiple processing jobs at once and run them concurrently for greater efficiency.
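One way to picture the three steps is as the request body for the SageMaker `CreateProcessingJob` API, built once per batch. This is a sketch under assumptions rather than the authors' configuration: the instance type, volume size, local container paths, the use of `ShardedByS3Key` to spread a manifest across instances, and every name and URI below are illustrative placeholders. The geospatial container image URI is left as a parameter to look up for your Region.

```python
from typing import Any, Dict, List


def build_processing_job_request(
    job_name: str,
    role_arn: str,
    image_uri: str,
    script_s3_uri: str,
    manifest_s3_uri: str,
    output_s3_uri: str,
    instance_count: int = 20,
    instance_type: str = "ml.m5.xlarge",  # assumed; pick a type suited to your data
    volume_size_gb: int = 30,  # assumed
) -> Dict[str, Any]:
    """Assemble a CreateProcessingJob request for one batch manifest."""
    script_name = script_s3_uri.rsplit("/", 1)[-1]
    return {
        "ProcessingJobName": job_name,
        "RoleArn": role_arn,
        # Step 1: the compute cluster.
        "ProcessingResources": {
            "ClusterConfig": {
                "InstanceCount": instance_count,
                "InstanceType": instance_type,
                "VolumeSizeInGB": volume_size_gb,
            }
        },
        # Step 2: the computation, a script run inside the geospatial container.
        "AppSpecification": {
            "ImageUri": image_uri,
            "ContainerEntrypoint": [
                "python3",
                f"/opt/ml/processing/input/code/{script_name}",
            ],
        },
        # Step 3: inputs (script and batch manifest) and the output location.
        "ProcessingInputs": [
            {
                "InputName": "code",
                "S3Input": {
                    "S3Uri": script_s3_uri,
                    "LocalPath": "/opt/ml/processing/input/code",
                    "S3DataType": "S3Prefix",
                    "S3InputMode": "File",
                },
            },
            {
                "InputName": "manifest",
                "S3Input": {
                    "S3Uri": manifest_s3_uri,
                    "LocalPath": "/opt/ml/processing/input/data",
                    "S3DataType": "ManifestFile",
                    "S3InputMode": "File",
                    "S3DataDistributionType": "ShardedByS3Key",
                },
            },
        ],
        "ProcessingOutputConfig": {
            "Outputs": [
                {
                    "OutputName": "ndvi",
                    "S3Output": {
                        "S3Uri": output_s3_uri,
                        "LocalPath": "/opt/ml/processing/output",
                        "S3UploadMode": "EndOfJob",
                    },
                }
            ]
        },
    }


# One request per batch; all values are placeholders for illustration.
job_requests: List[Dict[str, Any]] = [
    build_processing_job_request(
        job_name=f"ndvi-batch-{i:02d}",
        role_arn="arn:aws:iam::123456789012:role/ExampleProcessingRole",
        image_uri="<geospatial-container-image-uri>",
        script_s3_uri="s3://example-bucket/code/compute_ndvi.py",
        manifest_s3_uri=f"s3://example-bucket/manifests/batch-{i:02d}.manifest",
        output_s3_uri=f"s3://example-bucket/output/batch-{i:02d}/",
    )
    for i in range(25)
]

# Submitting requires AWS credentials, for example:
#   import boto3
#   sm = boto3.client("sagemaker")
#   for request in job_requests:
#       sm.create_processing_job(**request)
```

Because each call returns as soon as the job is accepted, submitting the requests in a loop starts all jobs concurrently. The SageMaker Python SDK's `ScriptProcessor` offers a higher-level way to express the same configuration.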
When you launch a job, SageMaker automatically launches the required instances and configures the cluster to process the images listed in the input manifest. This entire setup works seamlessly and requires no manual management. You can use the SageMaker console to monitor and manage your processing jobs. The console provides real-time updates on the status and completion of processing tasks. In this example, 500 instances took less than 20 minutes to process all 8,581 images. SageMaker’s scalability allows you to reduce processing time by simply increasing the number of instances as needed.
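Besides the console, job status can also be checked programmatically. The following sketch polls the `describe_processing_job` call of the boto3 SageMaker client until every job reaches a terminal state. The helper name, the polling interval, and the injected `sleep` function are assumptions made for illustration and testability.

```python
import time
from typing import Any, Callable, Dict, Iterable

# Statuses after which a processing job no longer changes state.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_jobs(
    client: Any,
    job_names: Iterable[str],
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll until every job is in a terminal state; return the final status per job."""
    pending = set(job_names)
    final: Dict[str, str] = {}
    while pending:
        for name in sorted(pending):
            response = client.describe_processing_job(ProcessingJobName=name)
            status = response["ProcessingJobStatus"]
            if status in TERMINAL_STATUSES:
                final[name] = status
        pending.difference_update(final)
        if pending:
            sleep(poll_seconds)
    return final


# Usage (requires AWS credentials):
#   import boto3
#   statuses = wait_for_jobs(boto3.client("sagemaker"), ["ndvi-batch-00"])
```

For a large number of jobs, a longer polling interval helps avoid API throttling.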
Conclusion
The power and efficiency of SageMaker’s geospatial capabilities have opened new doors for environmental monitoring, especially in the area of vegetation mapping. This example showed how to process over 8,500 satellite images in under 20 minutes. We have demonstrated not only the technical feasibility, but also the efficiency gains of using the cloud for environmental analysis. This approach represents a major leap from traditional resource-intensive methods to a more agile, scalable, and cost-effective one. The flexibility to scale processing resources up or down as needed, and the ease of accessing and analyzing vast datasets, position SageMaker as a transformational tool in the field of geospatial analysis. SageMaker simplifies the complexity associated with large-scale data processing, allowing scientists, researchers, and businesses to focus on extracting insights rather than on infrastructure and data management.
Looking to the future, the integration of ML and geospatial analysis promises to further deepen our understanding of Earth’s ecosystems. The ability to monitor changes in real time, predict future trends, and respond with more informed decisions could significantly contribute to global conservation efforts. This vegetation mapping example is just the beginning of performing planetary-scale ML. For more information, see Amazon SageMaker geospatial capabilities.
About the authors
Shuo is a Senior Applied Scientist at AWS. He leads the science team for Amazon SageMaker geospatial capabilities. His current research interests include LLM evaluation and data generation. In his free time, he enjoys running, playing basketball, and spending time with his family.
Anirudh Viswanathan is a Senior Product Manager, Technical – External Services, on the SageMaker geospatial ML team. He holds a master’s degree in robotics from Carnegie Mellon University and an MBA from the Wharton School of Business, and is named an inventor on more than 40 patents. He enjoys long-distance running, visiting art galleries, and Broadway shows.
Janos Vositz is a Senior Solutions Architect at AWS, specializing in AI/ML. With over 15 years of experience, he leverages AI and ML to deliver innovative solutions and supports customers around the world in building ML platforms on AWS. His expertise spans machine learning, data engineering, and scalable distributed systems, enhanced by a strong background in software engineering and industry expertise in areas such as autonomous driving.
Lee Elan Lee is the Applied Science Manager for human-in-the-loop services at AWS AI, Amazon. His research interests include 3D deep learning and learning visual and linguistic representations. Previously, he served as Senior Scientist at Alexa AI, Head of Machine Learning at Scale AI, and Chief Scientist at Pony.ai. Before that, he worked on Uber ATG’s Perception team and Uber’s Machine Learning Platform team, where he worked on strategic initiatives in machine learning, machine learning systems, and AI for self-driving. He began his career at Bell Laboratories and served as an adjunct professor at Columbia University. He co-taught tutorials at ICML’17 and ICCV’19, and co-organized several workshops at NeurIPS, ICML, CVPR, and ICCV on machine learning for autonomous driving, 3D vision and robotics, machine learning systems, and adversarial machine learning. He holds a PhD in computer science from Cornell University. He is an ACM Fellow and an IEEE Fellow.
Amit Modi is the product lead for SageMaker MLOps, ML Governance, and Responsible AI on AWS. With over 10 years of B2B experience, he drives innovation and builds scalable products and teams that deliver value to customers around the world.
Chris Efland is a visionary technology leader with over 20 years of experience driving product innovation and growth. Chris has helped both startups and large corporations develop new products, including consumer electronics and enterprise software, across many industries. In his current role at Amazon Web Services (AWS), Chris leads the geospatial AI/ML category. He works on the front lines of Amazon SageMaker, Amazon’s fastest growing ML service, serving over 100,000 customers worldwide. He recently led the launch of new geospatial capabilities in Amazon SageMaker, a powerful toolset that enables data scientists and machine learning engineers to build, train, and deploy ML models using satellite imagery, maps, and location data. Prior to joining AWS, Chris was responsible for Lyft’s autonomous vehicle (AV) tools and AV maps, leading the company’s automated mapping efforts and the toolchain used to build and operate Lyft’s self-driving vehicle fleet. He also served as an engineering director at HERE Technologies and Nokia, and co-founded several startups.