Amazon SageMaker Data Wrangler provides a visual interface to streamline and accelerate data preparation for machine learning (ML), which is often the most time-consuming and tedious task in an ML project. Amazon SageMaker Canvas is a low-code, no-code visual interface for building and deploying ML models without writing any code. Based on customer feedback, we have integrated the advanced ML-specific data preparation capabilities of SageMaker Data Wrangler into SageMaker Canvas, providing users with an end-to-end, no-code workspace for preparing data and for building and deploying ML models.
SageMaker Canvas abstracts much of the complexity of the ML workflow so you can prepare your data and build or use models to generate highly accurate business insights without writing any code. Additionally, preparing your data in SageMaker Canvas offers many enhancements, including up to 10x faster page loads, a natural language interface for data preparation, the ability to see the size and shape of your data at each step, and improved options to replace and reorder transformations as you iterate on your data flow. Finally, you can create models with one click in the same interface, or create SageMaker Canvas datasets to fine-tune your foundation model (FM).
In this post, we show how to bring your existing SageMaker Data Wrangler flows (instructions created when building data transformations) from SageMaker Studio Classic to SageMaker Canvas. We provide an example of moving files from SageMaker Studio Classic to Amazon Simple Storage Service (Amazon S3) as an intermediate step before importing the files into SageMaker Canvas.
Solution overview
The high-level steps are as follows:
- Open a terminal in SageMaker Studio Classic and copy the flow files to Amazon S3.
- Import the flow files from Amazon S3 into SageMaker Canvas.
Prerequisites
In this example, a folder named data-wrangler-classic-flows serves as a staging folder for migrating flow files to Amazon S3. You don’t have to create a migration folder, but in this example the folder was created using the file system browser in SageMaker Studio Classic. After you create the folder, move the related SageMaker Data Wrangler flow files into it so they’re consolidated in one place. In the following screenshot, the three flow files required for migration have been moved into the data-wrangler-classic-flows folder. One of these files, titanic.flow, is shown in the left pane; when you choose it, it opens and appears in the right pane.
Copy the flow file to Amazon S3
To copy your flow files to Amazon S3, complete the following steps:
- To open a new terminal in SageMaker Studio Classic, on the File menu, choose New, then choose Terminal.
- Once you have a new terminal open, you can copy your flow files to your preferred Amazon S3 location by entering the following command (replace NNNNNNNNNNNN with your AWS account number):
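The following is a minimal sketch that assumes the flow files were staged in the data-wrangler-classic-flows folder and that the target is the default SageMaker bucket in us-east-1; adjust the folder, bucket, and Region to match your environment.

```
# Move into the staging folder that holds the flow files
cd data-wrangler-classic-flows

# Copy only the .flow files to an S3 staging prefix
# (replace NNNNNNNNNNNN with your AWS account number)
aws s3 sync . s3://sagemaker-us-east-1-NNNNNNNNNNNN/data-wrangler-classic-flows/ \
    --exclude "*" --include "*.flow"
```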
The following screenshot shows an example of the Amazon S3 sync process. A confirmation message appears after all the files have been uploaded. You can adjust the preceding command to match your own input folder and Amazon S3 location. If you didn’t create a staging folder, skip the change directory (cd) command; the sync command then copies all flow files across your SageMaker Studio Classic file system to Amazon S3, regardless of their original folder.
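As a rough sketch (assuming the same example bucket as before), running the sync from your home directory picks up every flow file it finds:

```
# Sync from the Studio Classic home directory instead of a staging folder;
# the filters still restrict the upload to .flow files, wherever they live
aws s3 sync ~ s3://sagemaker-us-east-1-NNNNNNNNNNNN/data-wrangler-classic-flows/ \
    --exclude "*" --include "*.flow"
```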
After you upload the files to Amazon S3, you can use the Amazon S3 console to verify that the files have been copied. In the following screenshot, the original three flow files are visible in the S3 bucket.
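If you prefer the terminal, the following command is a quick way to confirm the upload (again assuming the example bucket and prefix used earlier):

```
# List the flow files that were staged in the S3 prefix
aws s3 ls s3://sagemaker-us-east-1-NNNNNNNNNNNN/data-wrangler-classic-flows/
```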
Import the Data Wrangler flow files into SageMaker Canvas
To import your flow file into SageMaker Canvas, follow these steps:
- In the SageMaker Canvas application, choose Data Wrangler in the navigation pane.
- Choose Import data flow.
- For Select a data source, choose Amazon S3.
- For Input S3 endpoint, enter the Amazon S3 location you used earlier to copy the files from SageMaker Studio Classic, then choose Go. You can also navigate to the Amazon S3 location using the browser.
- Select the flow file you want to import, then choose Import.
After you import the file, the SageMaker Data Wrangler page updates to show the newly imported file, as shown in the following screenshot.
Transform data with SageMaker Data Wrangler in SageMaker Canvas
Select one of the flows (in this example, titanic.flow) to start the SageMaker Data Wrangler transformation.
You can now add analyses and transformations to your data flow using either the visual interface (see Accelerate Data Prep for ML with Amazon SageMaker Canvas) or the natural language interface (see Explore and prepare data in natural language using new features in Amazon SageMaker Canvas).
When you’re happy with your data, choose the plus sign and choose Create model, or choose Export to export the dataset, so you can build and use an ML model.
Alternative migration methods
This post described how to migrate SageMaker Data Wrangler flow files from a SageMaker Studio Classic environment to SageMaker Canvas using Amazon S3. Phase 3: (Optional) Migrate data from Studio Classic to Studio describes a second method that transfers flow files using a local machine. Additionally, you can download a single flow file from the SageMaker Studio Classic file browser to your local machine and manually import it into SageMaker Canvas. Choose the method that best suits your needs and use case.
Clean up
When you’re done, shut down any SageMaker Data Wrangler applications that are running in SageMaker Studio Classic. To save on storage costs, you can also delete the flow files from the SageMaker Studio Classic file browser, which is backed by an Amazon Elastic File System (Amazon EFS) volume, and delete the intermediate files in Amazon S3. After the flow files have been imported into SageMaker Canvas, the copies in Amazon S3 are no longer needed.
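If you want to remove the intermediate copies from the terminal rather than the Amazon S3 console, the following sketch (again assuming the example bucket and prefix used earlier) deletes only the staged .flow files; verify the path before you run it:

```
# Delete the staged .flow files from the S3 prefix used during migration
aws s3 rm s3://sagemaker-us-east-1-NNNNNNNNNNNN/data-wrangler-classic-flows/ \
    --recursive --exclude "*" --include "*.flow"
```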
You can log out of SageMaker Canvas when you are done with your work, and then relaunch it when you are ready to use it again.
Conclusion
Migrating your existing SageMaker Data Wrangler flows to SageMaker Canvas is a straightforward process, allowing you to use the advanced data preparation you’ve already developed while taking advantage of the end-to-end low-code, no-code ML workflow of SageMaker Canvas. By following the steps outlined in this post, you can seamlessly migrate your data wrangling artifacts to your SageMaker Canvas environment, streamlining your ML projects and empowering business analysts and non-technical users to build and deploy models more efficiently.
Try SageMaker Canvas today and experience the power of a unified platform for data preparation, model building, and deployment.
About the Authors
Charles Laughlin is a Principal AI Specialist at Amazon Web Services (AWS). He holds an MSc in Supply Chain Management and a PhD in Data Science. Charles works on the Amazon SageMaker service team, where he integrates research and customer feedback into the service roadmap. In his work, he partners daily with diverse AWS customers to help transform their businesses with cutting-edge AWS technologies and thought leadership.
Dan Schinreich is a Senior Product Manager at Amazon SageMaker, focused on expanding no-code/low-code offerings. He is passionate about making ML and generative AI more accessible to help solve tough problems. Outside of work, he enjoys playing hockey, scuba diving, and reading sci-fi novels.
Phuong Nguyen is a Senior Product Manager at AWS with 15 years of experience building customer-centric, data-driven products. He leads ML data preparation for SageMaker Canvas and SageMaker Data Wrangler.
Davide Galittelli is a Specialist Solutions Architect for AI/ML in EMEA. He is based in Brussels and works closely with customers throughout the Benelux. He has been a developer since he was very young, starting to code at the age of 7. He started learning AI/ML after graduating from university, and has been hooked ever since.