DTPlyr – easier data.table for DPLYR users

Do you program in R and normally use dplyr for data wrangling, manipulation, or whatever term you prefer? Have you heard all the hype about data.table and how it can significantly improve the run time of your R scripts? Have you been meaning to get round to learning data.table and have never…


Build and improve a Machine Learning Classification model with TidyModels and R

This set of tutorials arose from my desire to use as many machine learning packages as possible. My favourites remain tensorflow, caret, scikit-learn and now TidyModels. Why TidyModels? Instead of replacing the modelling package, tidymodels replaces the interface. Better said, tidymodels provides a single set of functions and arguments to define a model. It then fits the…


NHSDataDictionaRy is back on CRAN

The NHSDataDictionaRy package is now back on CRAN, and I am pleased as punch. This update contains the OpenSafely scraper to get data from the website for lookups developed by Ben Goldacre’s team. Why did it disappear? I took the package down for major script and function updates. This has now…


Feature encoding methods – the Pandas way

This tutorial explores the various ways data can be encoded, using Pandas and NumPy, to prepare it for a machine learning or predictive model pipeline. Encoding methods: there are three main methods explored. Label encoding: encoding a value based on where the label order falls; could be good for rank…
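As a quick taste of label encoding, here is a minimal sketch in Pandas. The `size` column, its values and the `size_order` list are made up for illustration, and using an ordered `Categorical` is just one way to do it, not necessarily the approach taken in the full post.

```python
import pandas as pd

# Hypothetical example data: a column whose labels have a natural rank.
df = pd.DataFrame({"size": ["small", "large", "medium", "small"]})

# State the rank order explicitly so the integer codes follow it
# (otherwise Pandas would sort the labels alphabetically).
size_order = ["small", "medium", "large"]
df["size"] = pd.Categorical(df["size"], categories=size_order, ordered=True)

# Label encoding: each value becomes the position of its label in the order.
df["size_label"] = df["size"].cat.codes

print(df)
```

Because the order is declared up front, the codes keep the ranking (small < medium < large), which is the property that makes label encoding a candidate for ranked data.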


Creating Virtual Environments for Python Projects in VS Code

I had a similar problem recently, and then a request came through from a close friend (Chris Mainey) for the same purpose. I thought “I’ll write a blog post on this”. So, what are the benefits of creating virtual environments? First of all, what are virtual environments? A virtual environment is a Python environment such that the Python interpreter, libraries and…
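To make the idea concrete, here is a minimal sketch that creates a virtual environment with Python's standard-library `venv` module and checks what lands on disk. The `create_env` helper and the `.venv` folder name are assumptions for illustration; the full post may well use the command line or the VS Code interface instead.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_env(env_dir, with_pip=False):
    """Create a virtual environment and return its scripts folder.

    Hypothetical helper: with_pip=False keeps the sketch fast; set it to
    True if you want pip bootstrapped into the new environment.
    """
    venv.create(env_dir, with_pip=with_pip)
    # The scripts folder is "Scripts" on Windows and "bin" elsewhere.
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    return Path(env_dir) / bin_dir


# Build the environment in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / ".venv"
    bin_path = create_env(env_dir)
    has_config = (env_dir / "pyvenv.cfg").exists()
    has_bin = bin_path.is_dir()

print(f"pyvenv.cfg created: {has_config}, scripts folder created: {has_bin}")
```

In a real project you would create `.venv` inside the project folder rather than a temporary directory, then select it as the interpreter in your editor.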


ConfusionTableR package has a new function

The ConfusionTableR package has a new function. Welcome var_impeR, which takes a trained caret model in R and produces a tibble and a supporting variable importance plot. How to use the new var_impeR function: the following code shows how to use the new function. Training a CARET model: the following steps were used on the…
