Clustering Geolocation Data Intelligently in Python
In this 2-hour-long project, you will learn how to preprocess a text dataset of recipes and split it into training and validation sets. You will learn how to use the Hugging Face library to fine-tune a deep generative model, and specifically how to train such a model on Google Colab. Finally, you will learn how to use GPT-2 effectively to generate realistic and unique recipes from lists of ingredients, based on the aforementioned dataset. This project aims to teach you how to fine-tune a large-scale model and to show the sheer scale of resources these models require to learn. You will also learn about knowledge distillation and its efficacy in use cases such as this one.
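As a rough illustration of the preprocessing and splitting step described above, the sketch below turns each recipe into a single training string and then shuffles the examples into training and validation sets. It uses only the Python standard library. The function names (format_recipe, train_val_split), the "Ingredients:" and "Recipe:" markers, and the 10% validation fraction are assumptions made for this example, not the format or settings used in the project itself.

```python
import random


def format_recipe(ingredients, instructions):
    """Join an ingredient list and its instructions into one training string.

    The "Ingredients:" / "Recipe:" markers are illustrative placeholders,
    not necessarily the prompt format used in the project.
    """
    return "Ingredients: " + ", ".join(ingredients) + "\nRecipe: " + instructions.strip()


def train_val_split(examples, val_fraction=0.1, seed=0):
    """Shuffle examples reproducibly and return (train, validation) lists."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    shuffled = list(examples)              # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split
    n_val = max(1, int(len(shuffled) * val_fraction))  # keep at least one validation item
    return shuffled[n_val:], shuffled[:n_val]


if __name__ == "__main__":
    # Toy data standing in for the recipe dataset.
    raw = [(["flour", "water", "salt"], "Mix, knead, and bake.")] * 20
    texts = [format_recipe(ing, steps) for ing, steps in raw]
    train_texts, val_texts = train_val_split(texts, val_fraction=0.1)
    print(len(train_texts), len(val_texts))
```

The two lists of strings could then be tokenized and passed to whichever fine-tuning routine is used. A fixed seed keeps the validation set identical across runs, so validation losses from different training runs stay comparable.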