Date Updated: Feb 25, 2020
Welcome to the regression tutorial (#REG102). This tutorial assumes that you have completed Regression Tutorial (REG101) - Level Beginner. If you haven't used PyCaret before and this is your first tutorial, we strongly recommend that you go back and work through the beginner tutorial to understand the basics of working in PyCaret.
In this tutorial we will use the pycaret.regression module to learn:
Read Time: Approx. 60 Minutes
If you haven't installed PyCaret yet, please follow the link to the Beginner's Tutorial for instructions on how to install PyCaret.
If you are running this notebook on Google Colab, run the following code at the top of your notebook to display interactive visuals.
from pycaret.utils import enable_colab
enable_colab()
Before we get into the practical execution of the techniques mentioned above in Section 1, it is important to understand what these techniques are and when to use them. More often than not, these techniques help linear and parametric algorithms; however, it is not surprising to also see performance gains in tree-based models. The explanations below are only brief, and we recommend that you do extra reading to get a more thorough understanding of these techniques.
transformation technique explained above, with the exception that this is only applied to the target variable. Read more to understand the effects of transforming the target variable in regression.
Carat Weight in this experiment. It is a continuous distribution of numeric values that can be discretized into intervals. Binning may improve the accuracy of a predictive model by reducing the noise or non-linearity in the data. PyCaret automatically determines the number and size of bins using Sturges' rule. Read more
Boosting. Stacking is also a type of ensemble learning where predictions from multiple models are used as input features for a meta model that predicts the final outcome. Read more
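To make the binning idea described above for Carat Weight more concrete, here is a minimal sketch of Sturges' rule in plain Python. It assumes the common form of the rule, k = ceil(log2(n)) + 1, and uses equal-width bins purely for illustration. The function names and the sample values are hypothetical, and PyCaret's internal binning may choose bin boundaries differently.

```python
import math


def sturges_bin_count(n):
    """Number of bins suggested by Sturges' rule: ceil(log2(n)) + 1."""
    if n < 1:
        raise ValueError("need at least one observation")
    return math.ceil(math.log2(n)) + 1


def bin_equal_width(values):
    """Assign each value to an equal-width bin.

    Returns (bin_indices, bin_edges). Equal-width bins are an assumption
    made for this sketch, not necessarily what PyCaret does internally.
    """
    k = sturges_bin_count(len(values))
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values identical: everything falls into a single bin.
        return [0] * len(values), [lo, hi]
    width = (hi - lo) / k
    edges = [lo + i * width for i in range(k + 1)]
    # Clamp so that the maximum value lands in the last bin.
    indices = [min(int((v - lo) / width), k - 1) for v in values]
    return indices, edges


# Made-up carat weights, for illustration only.
carat_weights = [0.5, 0.75, 1.0, 1.01, 1.5, 2.0, 2.5, 3.0]
bin_indices, bin_edges = bin_equal_width(carat_weights)
print(sturges_bin_count(len(carat_weights)), bin_indices)
```

With eight observations the rule suggests four bins, so each continuous weight is replaced by one of four interval labels.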
For this tutorial we will be using the same dataset that was used in Regression Tutorial (REG101) - Level Beginner.
This case was prepared by Greg Mills (MBA ’07) under the supervision of Phillip E. Pfeifer, Alumni Research Professor of Business Administration. Copyright (c) 2007 by the University of Virginia Darden School Foundation, Charlottesville, VA. All rights reserved.
The original dataset and description can be found here.
You can download the data from the original source found here and load it using the pandas read_csv function, or you can use PyCaret's data repository to load the data using the get_data function (this requires an internet connection).
from pycaret.datasets import get_data
dataset = get_data('diamond', profile=True)