How to Prepare for the AWS Machine Learning Specialty Certification Exam
If you are here on this page then you are probably considering the AWS Certified Machine Learning — Specialty exam (MLS-C01).
I recently cleared the AWS Certified Machine Learning Specialty exam on my first attempt. I attribute this to a bunch of luck :)
Why is this a Difficult Cert
I definitely feel it is one of the most difficult certification exams in the field of Machine Learning, so it is worth sharing my impressions for candidates who are considering this certification.
So — why is this exam hard?
- A wide range of ML topics is covered in the questions, such as algorithms, model tuning parameters, data ingestion & preparation techniques, performance, scalability and cost considerations
- Some of the ML development- and operations-related questions are very difficult to get right without practical experience
- There is a lot of emphasis on model performance, accuracy, and security that one would not come across doing “hello world” exercises and off-the-shelf training courses
- Deep knowledge of the nuances of SageMaker is required — e.g., knowing all the algorithms natively supported in SageMaker
How to Prepare for this Cert
Having gone through the grind, here is what I feel would be the most efficient way to prepare for this exam:
- Take a few real-world AI use cases and design a solution “pipeline”, end-to-end
- Consider all alternative ways to implement each use case using different types of solution “pipelines”:
- building custom components, and
- using off-the-shelf AWS services where possible
Example Scenario
Consider this scenario of Video Processing and applying AI to recognise human faces. This is a common use case. In this exam, questions are framed around common use cases like this and then multiple options are suggested as solutions. The candidate is expected to know which option is best given certain constraints.
Create Solution Pipelines
For example, this use case could be solved with more than one pipeline, e.g.
A) Solution Pipeline using AWS DeepLens
Even once DeepLens is chosen, there are many options at each stage of the solution pipeline that can lead to questions.
B) Solution Pipeline using customer’s own cameras
Even for this option, there are many possible choices at each point in the solution “pipeline”.
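To make this style of pipeline concrete, here is a minimal sketch of just one stage: detecting faces in a video that has already been uploaded to S3, using Amazon Rekognition Video through boto3. The bucket and file names are placeholders, and a real pipeline would typically use an SNS notification channel rather than polling.
```python
# A minimal sketch (not a full pipeline): asynchronous face detection on an
# S3-hosted video with Amazon Rekognition Video via boto3.
import time

import boto3

rekognition = boto3.client("rekognition")

# Kick off an asynchronous face-detection job on the S3-hosted video.
# "my-video-bucket" and "lobby-camera.mp4" are placeholder names.
response = rekognition.start_face_detection(
    Video={"S3Object": {"Bucket": "my-video-bucket", "Name": "lobby-camera.mp4"}}
)
job_id = response["JobId"]

# Poll until the job finishes (a production pipeline would normally subscribe
# to an SNS notification channel instead of polling).
while True:
    result = rekognition.get_face_detection(JobId=job_id)
    if result["JobStatus"] != "IN_PROGRESS":
        break
    time.sleep(5)

if result["JobStatus"] == "SUCCEEDED":
    for face in result["Faces"]:
        ts = face["Timestamp"]                      # millisecond offset into the video
        confidence = face["Face"]["Confidence"]
        print(f"Face at {ts} ms with confidence {confidence:.1f}")
```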
Consider Design Constraints for Each of these “Pipelines”
Having investigated a few options to build the solution “pipeline”, it is time to consider how each of these pipelines would play out under a variety of constraints, such as:
- Ease of development
- Ease of management and maintenance
- Nature of the solution — e.g. real-time vs batch
- Ability to handle volume and scale
- Ability to extend or expand the solution to flexibly meet future requirements
- Security and network topologies
- Cost etc.
Consider Solution Accuracy Aspects
Having built valid solution pipelines that work under different constraints, one needs to consider the accuracy, time, and performance aspects of an AI solution. This is where things get deeper into the ML domain.
- Do I have enough data?
- Do I have labelled data?
- Does the data have trend or seasonal characteristics?
- Do I need any pre-processing of data / features?
- Do I need to clean the data?
- Do I need to synthetically generate some missing data fields?
- Do I need to combine fields?
- Do I need to normalise some extreme ranges of certain data fields?
- Do I need to encode some fields in the data?
- Do I need to consider filtering out certain “noisy” data points?
- Do I need to add more features in the data, when is this needed?
- Do I have data with highly correlated fields?
- Do I need to sample the data to balance it? (see the preprocessing sketch below)
- Which algorithm should I use, and why? Consider options such as XGBoost vs. Random Cut Forest for detecting anomalies in financial transactions.
- Should I consider using Deep Learning or not, and why?
- What are my model performance metrics?
- How to know if the model is overfitting the training data?
- How to mitigate overfitting / underfitting?
- How to read a confusion matrix and apply it to a real-world use case?
- How to understand the Area Under the ROC Curve (AUC) and apply it to a real-world use case? (a short metrics sketch follows below)
- How to mitigate the issue of poor performance of models on new production data?
- How to improve generalisation of a model?
- How to know if a model is losing relevance over time?
- How to tune the hyperparameters of the most commonly used algorithms?
- How to make use of the automatic hyperparameter tuning capability of SageMaker? (sketched below)
Considering the above angles while preparing for the exam is extremely important; there is a lot of emphasis on these. A few hands-on sketches of the kind of practice that helps follow below.
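To rehearse the data-preparation questions above (normalising extreme ranges, encoding categorical fields, rebalancing skewed labels), a tiny scikit-learn exercise goes a long way. This is a minimal sketch with made-up column names and data, purely for illustration:
```python
# A minimal preprocessing sketch with pandas and scikit-learn. The DataFrame,
# its column names ("amount", "merchant_type", "label") and values are
# illustrative placeholders, not real data.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "amount": [12.5, 3000.0, 7.2, 15000.0, 42.0, 9.9],
    "merchant_type": ["grocery", "travel", "grocery", "jewelry", "grocery", "fuel"],
    "label": [0, 0, 0, 1, 0, 1],
})

# Normalise the skewed numeric field and one-hot encode the categorical one.
preprocess = ColumnTransformer([
    ("scale", StandardScaler(), ["amount"]),
    ("encode", OneHotEncoder(handle_unknown="ignore"), ["merchant_type"]),
])
X = preprocess.fit_transform(df.drop(columns="label"))
print(X.shape)

# A crude way to rebalance a skewed label: oversample the minority class.
minority = df[df["label"] == 1]
majority = df[df["label"] == 0]
balanced = pd.concat(
    [majority, minority.sample(len(majority), replace=True, random_state=0)]
)
print(balanced["label"].value_counts())
```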
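For the confusion matrix and ROC AUC questions, it helps to compute the numbers by hand at least once. Here is a minimal scikit-learn sketch, with made-up labels and scores:
```python
# A minimal sketch of reading a confusion matrix and an ROC AUC score with
# scikit-learn; the labels and scores below are made up for illustration.
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true   = [0, 0, 1, 1, 0, 1, 0, 1]                    # actual labels
y_pred   = [0, 1, 1, 1, 0, 0, 0, 1]                    # hard predictions at a 0.5 threshold
y_scores = [0.1, 0.6, 0.8, 0.9, 0.3, 0.4, 0.2, 0.7]    # predicted probabilities

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print("Precision:", tp / (tp + fp))
print("Recall:   ", tp / (tp + fn))

# AUC is threshold-independent: it uses the scores, not the hard predictions.
print("ROC AUC:  ", roc_auc_score(y_true, y_scores))
```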
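And for the automatic hyperparameter tuning bullet, a minimal sketch with the SageMaker Python SDK, assuming the built-in XGBoost algorithm and placeholder IAM role and S3 bucket values:
```python
# A minimal sketch of SageMaker automatic model tuning with the SageMaker
# Python SDK and the built-in XGBoost image. The IAM role ARN and S3 paths
# are placeholders you would replace with your own.
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/MySageMakerRole"   # placeholder IAM role

xgb = Estimator(
    image_uri=sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.5-1"),
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/output/",                 # placeholder bucket
    sagemaker_session=session,
)
xgb.set_hyperparameters(objective="binary:logistic", eval_metric="auc", num_round=100)

tuner = HyperparameterTuner(
    estimator=xgb,
    objective_metric_name="validation:auc",
    objective_type="Maximize",
    hyperparameter_ranges={
        "eta": ContinuousParameter(0.01, 0.3),
        "max_depth": IntegerParameter(3, 10),
    },
    max_jobs=10,
    max_parallel_jobs=2,
)

# Launches the tuning job; the channels point to CSV training data in S3.
tuner.fit({
    "train": TrainingInput("s3://my-bucket/train/", content_type="csv"),
    "validation": TrainingInput("s3://my-bucket/validation/", content_type="csv"),
})
```
The tuner launches up to max_jobs training jobs, searching the given ranges for the combination that maximises the validation AUC.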
What are the Key AWS Services to Learn/Know About
A lot of focus is given to the following AWS services in this exam. Listed below are the most common ones, in order of their importance for this exam:
- SageMaker
- Lambda
- Gluon
- S3
- Kinesis
- All the language-related services (Polly, Transcribe, Comprehend, Translate, Lex)
- Rekognition
- EMR
What Not to Worry About
It turns out that you do not need to memorise the limits and specifications of AWS services. For example:
- What are the GPU-type instances available in AWS
- What is the size limit of Lambda functions
- What services are available / not available in which regions
You are expected to know the functional aspects of these services and WHEN and HOW to use them.
Sample Questions
Take a look at these sample questions to get a flavour of the questions to expect in the exam.
Finally,
I hope that the inputs above are useful for you. Good Luck!