The awesome TidyModels team have been working hard to populate the tidymodels package and make it even easier to get your foot in the door when it comes to development of models in R.
I have been planning this workshop for a long time with my good old colleagues at the NHS-R Community, and we thought it apt to run it in the run-up to the awesome NHS-R Conference 2022, which sadly I cannot make this year due to work commitments.
What did the workshop cover?
In the workshop we went over many concepts relating to machine learning, with the focus being on:
- Load in data from the MLDataR library
- Exploratory data analysis
- Create a recipe for model training
- Build a Parsnip baseline regression model and then compare it to a cutting-edge algorithm (XGBoost)
- Tune hyperparameters with dials and fit separate models
- Evaluate your model with ConfusionTableR
- Visualise and save your model results
- Serialise model
- Build inference script to pass production data through model
- Deploy your model with Vetiver (a new package for MLOps) which creates a Plumber API and docs for deploying to other services, such as a Dockerfile
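The steps above can be sketched roughly as follows. This is a minimal illustration and not the workshop's own code, which lives in the GitHub repository linked further down. It makes several assumptions: the `two_class_dat` dataset from modeldata stands in for the MLDataR data, `yardstick::conf_mat()` stands in for ConfusionTableR, the baseline is a logistic regression, and the object names, file path, and model name are all made up for the example. It also assumes the tidymodels, xgboost, vetiver, and plumber packages are installed.

```r
library(tidymodels)  # attaches recipes, parsnip, workflows, tune, yardstick, modeldata
library(vetiver)
library(plumber)

set.seed(123)

# Stand-in data (assumption): two_class_dat from modeldata, not an MLDataR dataset.
data(two_class_dat, package = "modeldata")

# Hold out a test set, stratified on the outcome.
data_split <- initial_split(two_class_dat, prop = 0.8, strata = Class)
train_data <- training(data_split)
test_data  <- testing(data_split)

# Recipe: preprocessing shared by both models.
model_recipe <- recipe(Class ~ ., data = train_data) %>%
  step_normalize(all_numeric_predictors())

# Baseline: logistic regression specified through parsnip.
baseline_spec <- logistic_reg() %>%
  set_engine("glm") %>%
  set_mode("classification")

baseline_fit <- workflow() %>%
  add_recipe(model_recipe) %>%
  add_model(baseline_spec) %>%
  fit(data = train_data)

# Challenger: XGBoost with two hyperparameters marked for tuning.
xgb_spec <- boost_tree(trees = 100, tree_depth = tune(), learn_rate = tune()) %>%
  set_engine("xgboost") %>%
  set_mode("classification")

xgb_wf <- workflow() %>%
  add_recipe(model_recipe) %>%
  add_model(xgb_spec)

# Tune over 5 candidate parameter sets using 5-fold cross-validation.
folds     <- vfold_cv(train_data, v = 5, strata = Class)
xgb_tuned <- tune_grid(xgb_wf, resamples = folds, grid = 5)

# Keep the best candidate by ROC AUC and refit on the full training set.
best_params <- select_best(xgb_tuned, metric = "roc_auc")
final_fit <- finalize_workflow(xgb_wf, best_params) %>%
  fit(data = train_data)

# Evaluate on the held-out data with a confusion matrix.
test_preds <- predict(final_fit, new_data = test_data) %>%
  bind_cols(test_data %>% select(Class))
conf_mat(test_preds, truth = Class, estimate = .pred_class)

# Serialise the fitted workflow (hypothetical path in the session temp directory).
model_path <- file.path(tempdir(), "final_fit.rds")
saveRDS(final_fit, model_path)

# Wrap the model for deployment and build a Plumber API around it.
v   <- vetiver_model(final_fit, model_name = "two-class-xgb")
api <- pr() %>% vetiver_api(v)
# pr_run(api, port = 8080)  # uncomment to serve the API locally
```

An inference script would then read the saved file back with `readRDS(model_path)` and call `predict()` on new data with the same columns as the training set.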
This was structured as a code-along workshop, and it was interesting dealing with and fixing issues on the spot.
Can I follow along with the workshop?
The workshop can be followed along below:
This contains the full two-hour tutorial and active workshop run on behalf of the NHS-R Community.
Where can I get the code?
The code can be obtained from the supporting GitHub repository. Please make sure you give it a star:
- The associated GitHub repository can be found here: https://github.com/StatsGary/NHS_R_Comm_Build_TM_from_scratch
- Associated resources:
- Building a classification model from scratch and deploying with Plumber: https://youtu.be/PtD5hgHM-DY
- TidyModels webinar on building TidyModels models: https://www.youtube.com/watch?v=hxRx7ozLNKw
- Deploying Plumber web service as a Docker micro-service: https://youtu.be/JK6VLAKRjO4
- Advanced Modelling with Caret in R: https://www.youtube.com/watch?v=9uLiSTc-MUs
- Introduction to Docker and R: https://github.com/StatsGary/NHS_R_Community_Intro_to_Docker
- Assessing classification model with ConfusionTableR and outputting matrix to database – this will show you how to use the Confusion Matrix object of R and then be able to store the results into a database with ConfusionTableR.
I really enjoyed this interactive session, and it has been good to have an opportunity to get feedback. This session has already attracted interest from the TidyModels team at RStudio (now Posit) and from members of the group:
Excellent workshop today on building a TidyModels from scratch, was incredibly helpful as a newbie to modelling in R! Thanks @StatsGary @NHSrCommunity 😁 pic.twitter.com/WWQqRRSivI— Craig (@Craig4Epi) November 1, 2022