The inclusive NHS-R community have welcomed me back to complete my webinar hat trick for this year. In this webinar I focus on building an ML model from scratch. The webinar covers:
- Build your first Machine Learning classification model with tidymodels
- Understand data processing for machine learning
- Evaluate your machine learning models with ROC curves and confusion matrices
- Understand the tidymodels process for model creation
- Understand sampling methods in machine learning model creation
- Work with packages such as recipes, yardstick, rsample, tune, parsnip and caret
- Use the ConfusionTableR package to flatten confusion matrix outputs for storing in databases
- Serialise your models
- Assess global feature importance
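To give a flavour of the steps listed above, here is a minimal sketch of a tidymodels classification workflow. It is not the code from the webinar: the two_class_dat example data (from the modeldata package), the logistic regression model and the object names are all assumptions chosen for illustration.

```r
library(tidymodels)  # attaches rsample, recipes, parsnip, workflows, yardstick

set.seed(123)
# Assumed example data: two numeric predictors (A, B) and a two-level outcome (Class)
data(two_class_dat, package = "modeldata")

# Sampling: stratified train / test split
split <- initial_split(two_class_dat, prop = 0.75, strata = Class)
train <- training(split)
test  <- testing(split)

# Data processing: a recipe that normalises the numeric predictors
rec <- recipe(Class ~ ., data = train) %>%
  step_normalize(all_numeric_predictors())

# Model specification: logistic regression fitted with glm
spec <- logistic_reg() %>%
  set_engine("glm") %>%
  set_mode("classification")

# Bundle the recipe and model into a workflow, then fit on the training data
wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(spec)
fitted_wf <- fit(wf, data = train)

# Predict classes and class probabilities on the held-out test set
preds <- predict(fitted_wf, test) %>%
  bind_cols(predict(fitted_wf, test, type = "prob")) %>%
  bind_cols(test %>% select(Class))

# Evaluate with a confusion matrix and the area under the ROC curve
cm  <- conf_mat(preds, truth = Class, estimate = .pred_class)
auc <- roc_auc(preds, truth = Class, .pred_Class1)

# Serialise the fitted workflow so it can be reloaded later with readRDS()
model_path <- tempfile(fileext = ".rds")
saveRDS(fitted_wf, model_path)
```

Printing cm and auc shows the evaluation results. The same pattern should carry over to other parsnip models by swapping the model specification.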
Once you have conquered the basics of this, we will then go on to:
- Model improvement via model selection
- Model improvement by hyperparameter tuning
- Model improvement by resampling methods
- Model improvement using a combination of the three methods listed
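As a rough illustration of how resampling and hyperparameter tuning can be combined, the sketch below tunes a decision tree over cross-validation folds. Again, this is an assumption-laden example and not the webinar code: the data set, the choice of a decision tree, the grid size and the object names are all hypothetical.

```r
library(tidymodels)

set.seed(123)
# Assumed example data from the modeldata package
data(two_class_dat, package = "modeldata")

# Resampling: 5-fold cross-validation, stratified on the outcome
folds <- vfold_cv(two_class_dat, v = 5, strata = Class)

# Model specification with two hyperparameters marked for tuning
tune_spec <- decision_tree(cost_complexity = tune(), tree_depth = tune()) %>%
  set_engine("rpart") %>%
  set_mode("classification")

tune_wf <- workflow() %>%
  add_formula(Class ~ .) %>%
  add_model(tune_spec)

# Evaluate 10 candidate parameter combinations across the folds
tuned <- tune_grid(
  tune_wf,
  resamples = folds,
  grid = 10,
  metrics = metric_set(roc_auc)
)

# Pick the best combination by ROC AUC and lock it into the workflow
best <- select_best(tuned, metric = "roc_auc")
final_wf <- finalize_workflow(tune_wf, best)
```

The finalised workflow can then be fitted on the training data in the usual way with fit().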
If this sounds like your kind of thing, then watch the tutorial in the next section.
Where is the YouTube video for this?
Embedded in this post is the original webinar, hosted by the NHS-R community. I had an excellent group of attendees and I have had some lovely feedback on the back of the session.
Where are the code and workbooks to support this session?
All the content for this session is available on GitHub.
If you use this repo, please remember to star it. I know people are using my content but not starring it, and I need more stars going forward. If you are one of these naughty people, give some credit where it is due.
Want to deploy your model?
I have several tutorials on how to deploy your models. The links below will give you everything you need to get started:
- Introduction to Docker full course code: https://github.com/StatsGary/NHS_R_Community_Intro_to_Docker.
- Presentation from the session – this takes you through the session
- Deploying a CARET Machine Learning model as an API with PlumbeR – this shows how to create the ML model, the Swagger endpoint, the endpoint files needed and the OpenAPI.yaml file.
- Creating a classification model from scratch with TidyModels – this shows an alternative approach, using TidyModels in place of CARET.
- Assessing classification model with ConfusionTableR and outputting matrix to database – this shows you how to use the Confusion Matrix object in R and then store the results in a database with ConfusionTableR.
- Deploying our model to Docker – this steps you through how to create the Dockerfile, get everything into a docker folder for deployment, deploy to Docker with PowerShell / CMD and then consume the endpoint with Swagger and JSON – making the model platform agnostic.
- Accessing API and making predictions – this shows you how to use the Swagger API to make predictions on production / unseen data and return the results to R as JSON. Then we convert the JSON and push it back out.
- Full article taking you through model training and deploying our model to Docker – this is a link to the full article on my website.
Useful references
The most useful references for this session can be found on my site. For TidyModels, refer to the RStudio-created content on how and why you would use it.
Shout out and fin!
I would like to thank the NHS-R community for hosting me for this session. I really enjoyed it and have had some lovely feedback from the attendees. As I like to share everything in an open source guise, I have published this article so everyone can access the content.
Reminders – please star the repos, subscribe to my YouTube channel and just reach out if you have any questions.
In this tutorial you have learned how to build a machine learning classification model. Tom says it best: