What is supervised learning?
Supervised learning (SL) is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. It infers a function from labeled training data consisting of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and the desired output value (also called the supervisory signal).
A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples. An optimal scenario will allow for the algorithm to correctly determine the class labels for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way (see inductive bias). This statistical quality of an algorithm is measured through the so-called generalization error.
How does supervised learning work?
Supervised learning uses a training set to teach models to yield the desired output. This training dataset includes inputs and correct outputs, which allow the model to learn over time. The algorithm measures its accuracy through the loss function, adjusting until the error has been sufficiently minimized.
Supervised learning can be separated into two types of problems when data mining—classification and regression:
Classification uses an algorithm to accurately assign test data into specific categories. It recognizes specific entities within the dataset and attempts to draw some conclusions on how those entities should be labeled or defined. Common classification algorithms are linear classifiers, support vector machines (SVM), decision trees, k-nearest neighbor, and random forest, which are described in more detail below.
Regression is used to understand the relationship between dependent and independent variables. It is commonly used to make projections, such as for sales revenue for a given business. Linear regression, logistical regression, and polynomial regression are popular regression algorithms.
What is the objective of supervised learning?
The objective of a supervised learning model is to predict the correct label for newly presented input data. Supervised learning also helps with:
- Supervised learning allows you to collect data or produce a data output from the previous experience.
- Helps you to optimize performance criteria using experience
- Supervised machine learning helps you to solve various types of real-world computation problems
What are the applications of supervised learning?
Linear regression in supervised learning is typically used in predicting, forecasting, and finding relationships between quantitative data. It is one of the earliest learning techniques, which is still widely used. For example, this technique can be applied to examine if there was a relationship between a company’s advertising budget and its sales. You could also use it to determine if there is a linear relationship between a particular radiation therapy and tumor sizes.
The classification techniques that will be discussed in this section are those focused on predicting a qualitative response by analyzing data and recognizing patterns. For example, this type of technique is used to classify whether or not a credit card transaction is fraudulent. There are many different classification techniques or classifiers, but some of the widely used ones include:
- Logistic regression,
- Linear discriminant analysis,
- K-nearest neighbors,
- Neural Networks, and
- Support Vector Machines
When should we use supervised learning?
Supervised learning is typically done in the context of classification, when we want to map input to output labels, or regression, when we want to map input to a continuous output. Common algorithms in supervised learning include logistic regression, naive bayes, support vector machines, artificial neural networks, and random forests. In both regression and classification, the goal is to find specific relationships or structure in the input data that allow us to effectively produce correct output data. Note that “correct” output is determined entirely from the training data, so while we do have a ground truth that our model will assume is true, it is not to say that data labels are always correct in real-world situations. Noisy, or incorrect, data labels will clearly reduce the effectiveness of your model.When conducting supervised learning, the main considerations are model complexity, and the bias-variance tradeoff. Note that both of these are interrelated.