Machine learning (ML) projects are inherently complex, involving multiple intricate steps from data collection and preprocessing to model building, deployment, and maintenance. Data scientists face numerous challenges throughout this process, such as selecting appropriate tools, needing step-by-step instructions with code samples, and troubleshooting errors and issues. These recurring challenges can slow down project progress. Fortunately, generative AI-powered developer assistants such as Amazon Q Developer have emerged to help data scientists streamline their workflows and fast-track ML projects, saving them time to focus on strategic initiatives and innovation.
Amazon Q Developer is fully integrated with Amazon SageMaker Studio, an integrated development environment (IDE) that provides a single web-based interface to manage all stages of ML development. You can use this natural language assistant from your SageMaker Studio notebooks to get personalized assistance. It provides tool recommendations, step-by-step guidance, code generation, and troubleshooting support. This integration simplifies your ML workflow, allowing you to efficiently build, train, and deploy ML models without having to leave SageMaker Studio to search for additional resources or documentation.
In this post, we present a real-world use case where we develop an ML model that analyzes a diabetes dataset from 130 U.S. hospitals to predict the likelihood of post-discharge readmission. In this exercise, we use Amazon Q Developer in SageMaker Studio at different stages of the development lifecycle to experience first-hand how this natural language assistant can help streamline the development process and accelerate time to value for even the most experienced data scientists and ML engineers.
Solution overview
If you are an AWS Identity and Access Management (IAM) or AWS IAM Identity Center user, you can use the Amazon Q Developer Pro tier subscription within Amazon SageMaker. Administrators can subscribe users to the Pro tier in the Amazon Q Developer console, enable the Pro tier in the SageMaker domain settings, and specify the Amazon Resource Name (ARN) for the Amazon Q Developer profile. The Pro tier provides unlimited chat and inline code suggestions. For detailed instructions, see Setting up Amazon Q Developer for a User.
If you don’t have a Pro tier subscription but would like to try the feature, you can access the Amazon Q Developer Free tier by adding the relevant policies to the SageMaker service role. Administrators can go to the IAM console, search for the SageMaker Studio role, and add the policies described in Configure Amazon Q Developer for Your Users. The Free tier is available for both IAM users and IAM Identity Center users.
To begin your ML project to predict the likelihood of hospital readmission for diabetic patients, you need to download the US 130 Hospital Diabetes Dataset. This dataset contains 10 years (1999-2008) of clinical care data from 130 US hospitals and integrated delivery networks. Each row represents the hospital record of a patient who was diagnosed with diabetes and had a test performed.
At the time of writing, Amazon Q Developer support in SageMaker Studio is only available in JupyterLab spaces. Amazon Q Developer is not supported in shared spaces.
Amazon Q Developer Chat
After you have uploaded your data to SageMaker Studio, you can start working on your ML problem of reducing readmission rates for diabetic patients. First, use the chat feature next to the JupyterLab notebook. You can ask it, for example, to generate code that analyzes the diabetes data from 130 US hospitals, how this ML problem should be formulated, or how to plan building an ML model that predicts the likelihood of readmission after discharge. Amazon Q Developer uses AI to provide code recommendations, which are non-deterministic. The results you get may differ from those shown in the following screenshot.
You can ask Amazon Q Developer to help you plan your ML project. In this case, let the assistant show you how to train a random forest classifier using the Diabetes 130-US dataset. Enter the following prompts into the chat and Amazon Q Developer will generate a plan for you. Once the code is generated, you can use the UI to insert the code directly into your notebook:
You can ask Amazon Q Developer to generate code for a specific task by inserting the following prompt:
You can also ask Amazon Q Developer to walk you through your existing code or troubleshoot common errors by selecting the cell with the error and entering /fix in the chat.
The complete list of shortcut commands is as follows:
- /help – Show this help message
- /fix – Fix selected error cells in notebook
- /clear – Clear the chat window
- /export – Export chat history to a Markdown file
To get the most out of your Amazon Q Developer chat, we recommend following these best practices when creating your prompts:
- Be direct and specific – Ask precise questions. For example, instead of asking vague questions about AWS services, try asking, “Can you provide example code for training an XGBoost model in SageMaker using the SageMaker Python SDK library?” Specific questions allow the assistant to understand exactly what information you need, resulting in a more accurate and helpful answer.
- Provide context – The more context you provide, the better the results. This allows Amazon Q Developer to tailor its response to your specific situation. For example, instead of just asking for code to prepare your data, provide the first three rows of your data to get better code suggestions that require fewer changes.
- Avoid sensitive topics – Amazon Q Developer is designed with guardrail controls, so it is best to avoid asking questions related to security, account billing information, or other sensitive topics.
Following these guidelines will help you maximize the value of Amazon Q Developer’s AI-powered code recommendations and streamline your ML projects.
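As an illustration of the "provide context" practice, the following minimal sketch renders the first three rows of a DataFrame as text that can be pasted into a chat prompt. The sample DataFrame, its column names, and the prompt wording are assumptions made for illustration; in your notebook, `df` would be the dataset you uploaded.

```python
import pandas as pd

# Hypothetical sample standing in for the uploaded dataset; the column names
# and values are illustrative assumptions.
df = pd.DataFrame(
    {
        "age": ["[50-60)", "[60-70)", "[70-80)", "[40-50)"],
        "time_in_hospital": [3, 5, 2, 7],
        "readmitted": ["NO", "<30", ">30", "NO"],
    }
)

# Render the first three rows as CSV text that can be pasted into a chat prompt
sample_rows = df.head(3).to_csv(index=False)

# Combine the sample rows with a specific request (example wording only)
prompt = (
    "Here are the first three rows of my data:\n"
    f"{sample_rows}\n"
    "Generate pandas code to prepare this data for a classification model."
)
print(prompt)
```

Sharing a few rows gives the assistant the column names and value formats, which tends to reduce the edits needed afterward. Remove any sensitive values before pasting data into a prompt.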
Amazon Q Developer inline code suggestions
As you type in your JupyterLab notebooks, you also get real-time code suggestions. They provide contextual suggestions based on your existing code and comments, streamlining the coding process. In the following examples, we show how you can use the inline code suggestions feature to generate code blocks for various data science tasks, from data exploration to feature engineering, training a random forest model, evaluating the model, and finally deploying the model to predict the likelihood of hospital readmission for diabetic patients.
The following image shows a list of keyboard shortcuts for navigating Amazon Q Developer.
Let’s start by exploring the data.
First, import some required Python libraries such as pandas and NumPy. Add the following code to the first code cell in your Jupyter Notebook and run the cell.
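A minimal sketch of such a first cell, assuming the conventional aliases `np` and `pd`:

```python
# Conventional aliases for the two libraries named above
import numpy as np
import pandas as pd
```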
Add the following comment to the next code cell and, before running the cell, press Enter and Tab. Watch the status bar at the bottom to see Amazon Q Developer generate code suggestions.
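Because the suggestions are non-deterministic, the code you receive will vary. The following is a hedged sketch of the kind of exploration code such a comment might produce. To stay self-contained it reads a tiny inline sample rather than the real file; the column names and the use of "?" as a missing-value marker are assumptions about the dataset.

```python
import io

import pandas as pd

# Tiny inline stand-in for the dataset file so the example is self-contained.
# In the notebook you would pass the path of your uploaded CSV instead; the
# column names and the "?" missing-value marker are assumptions.
csv_text = """race,gender,age,time_in_hospital,readmitted
Caucasian,Female,[50-60),3,NO
?,Male,[60-70),5,<30
AfricanAmerican,Female,[70-80),2,>30
Caucasian,Male,[40-50),7,NO
"""

# Read the data and treat "?" as a missing value
df = pd.read_csv(io.StringIO(csv_text), na_values="?")

# Basic exploration: shape, column types, missing values, and label balance
print(df.shape)
print(df.dtypes)
missing_per_column = df.isna().sum()
print(missing_per_column)
label_counts = df["readmitted"].value_counts()
print(label_counts)
```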
You can also ask Amazon Q Developer to create a visualization for you.
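For example, a request for a chart of the readmission label distribution might yield code along these lines. This is a sketch using matplotlib; the `readmitted` column and its values are assumptions, and the small DataFrame stands in for the real data.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical label values standing in for the dataset's readmission column
df = pd.DataFrame({"readmitted": ["NO", "<30", ">30", "NO", "NO", ">30"]})

# Count how many records fall into each readmission class
counts = df["readmitted"].value_counts()

# Draw the class distribution as a bar chart
fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(counts.index.tolist(), counts.values)
ax.set_xlabel("readmitted")
ax.set_ylabel("Number of records")
ax.set_title("Distribution of readmission labels")
fig.tight_layout()
```

In a JupyterLab notebook, the figure is rendered below the cell when it runs.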
Now we can perform feature engineering to prepare the data for model training.
The provided dataset has missing data and some categorical features that need to be converted to numerical features. Add the following comment to the next code cell, then press Tab to see how Amazon Q Developer can help you:
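One plausible shape for the generated feature engineering code is sketched below. It is an illustration under stated assumptions, not the assistant's actual output: the sample columns, the "?" missing-value marker, the most-frequent-value imputation, and the binary definition of the target (any readmission versus none) are all choices made for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the columns and the "?" marker are assumptions
df = pd.DataFrame(
    {
        "race": ["Caucasian", "?", "AfricanAmerican", "Caucasian"],
        "gender": ["Female", "Male", "Female", "Male"],
        "time_in_hospital": [3, 5, 2, 7],
        "readmitted": ["NO", "<30", ">30", "NO"],
    }
)
categorical_cols = ["race", "gender"]

# Mark "?" placeholders as missing values
df[categorical_cols] = df[categorical_cols].replace("?", np.nan)

# Fill missing categorical values with each column's most frequent value
for col in categorical_cols:
    df[col] = df[col].fillna(df[col].mode()[0])

# Binary target (assumption): 1 if the patient was readmitted at all, else 0
y = (df["readmitted"] != "NO").astype(int)

# One-hot encode the categorical features so every column is numeric
X = pd.get_dummies(
    df.drop(columns=["readmitted"]), columns=categorical_cols, dtype=int
)
print(X.head())
```

One-hot encoding is used here because the categories have no natural order; ordinal columns such as age brackets could instead be mapped to integers.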
Finally, you can use Amazon Q Developer to help you create a simple ML model, a random forest classifier, using scikit-learn.
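The following sketch shows what training and evaluating such a classifier with scikit-learn can look like. To keep it runnable on its own, it uses synthetic features and labels in place of the engineered diabetes data; the hyperparameters and the 80/20 split are illustrative assumptions rather than tuned values.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the engineered feature matrix and binary target;
# in the notebook these would come from the prepared dataset.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hold out 20% of the rows for evaluation, keeping the class balance
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train a random forest with illustrative (untuned) settings
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out rows
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
```

For an imbalanced target such as readmission, metrics like precision, recall, or ROC AUC are usually more informative than accuracy alone.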
Amazon Q Developer in SageMaker Data Policy
If you use Amazon Q Developer in SageMaker Studio, your customer content will not be used to improve the service, regardless of whether you are using the Free tier or the Pro tier. For IDE-level telemetry sharing, Amazon Q Developer may track your usage of the service, such as the number of questions you ask and whether you accept or reject a recommendation. This information does not include customer content or any personally identifiable information, such as IP addresses. If you would like to opt out of IDE-level telemetry, complete the following steps:
- On the Settings menu, choose Settings Editor.
- Uncheck the option Share usage data with Amazon Q Developer.
Alternatively, ML platform administrators can use a lifecycle configuration script to disable this option by default for all users in JupyterLab. For more information, see Using Lifecycle Configuration in JupyterLab. To disable data sharing with Amazon Q Developer by default for all users in your SageMaker Studio domain, follow these steps:
- On the SageMaker console, choose Lifecycle configurations under Admin configurations in the navigation pane.
- Choose Create configuration.
- For Name, enter a name, such as disable-q-data-sharing.
- In the Scripts section, create a lifecycle configuration script that disables the shareCodeWhispererContentWithAWS settings flag for the jupyterlab-q extension.
- Attach the disable-q-data-sharing lifecycle configuration to the domain.
- Optionally, you can force the lifecycle configuration to run by selecting the Run by default option.
- Use this lifecycle configuration when creating a JupyterLab space. It is selected by default if the configuration is set to Run by default.
The configuration runs almost instantly and disables the Share usage data with Amazon Q Developer option in your JupyterLab space on launch.
Clean up
To avoid incurring AWS charges after testing this solution, delete the SageMaker Studio domain.
Conclusion
In this post, we covered a real-world use case, developing an ML model to predict the likelihood of post-discharge readmission for patients in a diabetes dataset from 130 US hospitals. This exercise used Amazon Q Developer in SageMaker Studio at different stages of the development lifecycle and demonstrated how this developer assistant can streamline the development process and accelerate time to value, even for experienced ML practitioners. You can access Amazon Q Developer in all AWS Regions where SageMaker is generally available. Get started with Amazon Q Developer in SageMaker Studio today to access the generative AI-powered assistant.
The assistant is available to all Amazon Q Developer Pro tier and Free tier users. For pricing information, see Amazon Q Developer Pricing.
About the Authors
James Wu is a Senior AI/ML Specialist Solutions Architect at AWS, helping customers design and build AI/ML solutions. James’ work covers a wide range of ML use cases with a focus on computer vision, deep learning, and scaling ML across the enterprise. Prior to joining AWS, James worked as an architect, developer, and technology leader for over 10 years, spending 6 years in the engineering industry and 4 years in the marketing and advertising industry.
Lauren Mullenex is a Senior AI/ML Specialist Solutions Architect at AWS with 10 years of experience in DevOps, infrastructure, and ML. Her focus areas include computer vision, MLOps/LLMOps, and generative AI.
Shivin Michaellaji is a Senior Product Manager on the Amazon SageMaker team, focusing on building AI/ML-based products for AWS customers.
Pranav Murti is an AI/ML Specialist Solutions Architect at AWS. He is focused on helping customers build, train, deploy, and migrate their machine learning (ML) workloads to SageMaker. He previously worked in the semiconductor industry, where he developed large-scale computer vision (CV) and natural language processing (NLP) models to improve semiconductor processes using cutting-edge ML techniques. In his spare time, Pranav enjoys playing chess and traveling. You can find Pranav on LinkedIn.
Badrinath Pani is a Software Development Engineer at Amazon Web Services working on Amazon SageMaker interactive ML products. He has 12+ years of software development experience in domains such as automotive, IoT, AR/VR, and computer vision. Currently, he is primarily focused on developing machine learning tools aimed at simplifying the experience for data scientists. In his spare time, he enjoys spending time with his family and exploring the beautiful landscapes of the Pacific Northwest.