What is Machine Learning (ML)? According to a definition given by Tom Mitchell, machine learning is the ability of a computer program to improve its performance at a given task (T), as measured by a performance measure (P), using prior experience (E).
ML problems can be broadly classified into Supervised and Unsupervised learning. These categories have further sub-categories.
- Supervised – You are given a data set in which each input is labeled with its correct output. The computer program uses this training data to learn the relation between input and output, and then uses that relation to predict the output for new inputs.
  - Regression – In this set of problems, the output is continuous. E.g., given a picture of a person, predict their age.
  - Classification – Here, the output is discrete. E.g., given a picture of a person, identify their race, gender, etc.
- Unsupervised – The computer program is not given labeled examples. It first identifies the different groups/classes that the data can be ‘classified’ into, and then uses that knowledge to predict where a particular data instance fits best.
  - Clustering
  - Non-clustering
Now that we are done with definitions, let's take up a simple regression problem and dive into the mathematics involved in arriving at an algorithm (gradient descent).
Problem – Given the age (x) of a house, predict its price (y).
Let's assume we are given a data set of 10,000 houses with their age and current market price. The training data for our ML program will then be of the form (x_i, y_i), where i = 1, 2, ..., 10000. We will feed these data instances to our learning algorithm and come out with a predictor function, h(x) = θ0 + θ1x, where θ0 and θ1 are parameters that we need to choose so that the predicted value h(x) is as close as possible to the actual y.
h(x) is known as the hypothesis function.
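A minimal sketch of the hypothesis function in Python. The function name `hypothesis`, the parameter values θ0 = 300 and θ1 = −2, and the unit (price in thousands) are assumptions for illustration only, not values fitted to real data.

```python
def hypothesis(x: float, theta0: float, theta1: float) -> float:
    """Predicted price for a house of age x: h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


# Hypothetical parameters: a new house costs 300 (thousand), and the price
# drops by 2 (thousand) for every year of age.
theta0, theta1 = 300.0, -2.0

for age in [0, 10, 25]:
    print(age, hypothesis(age, theta0, theta1))
```

Learning, in this setting, means choosing θ0 and θ1 from the training data instead of hard-coding them as done here.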
A diagram will make things easier…

This is a plot of y against x for all the training instances. Our objective is to find a straight line such that the average squared vertical distance between the data points and the line is minimized. That line can be represented by the equation y = θ0 + θ1x, where θ0 and θ1 are, respectively, the y-intercept and the slope.

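To find such a line, we will use the mean squared error (MSE), which measures how far the predictions are from the actual values:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2$$

where Ŷ_i ("Y hat") is the predicted value for the ith instance, Y_i is the actual value, and n is the number of instances (10,000 in our example).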
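Since the predicted value for the ith instance is θ0 + θ1x_i, the mean squared error depends on the parameters θ0 and θ1. Let's call this function our cost function, J(θ0, θ1).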
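A minimal sketch of J(θ0, θ1) in Python, computing the mean squared error of the line y = θ0 + θ1x over a data set. The function name `cost` and the four (age, price) pairs are hypothetical, chosen only to show that a line passing close to the points gets a much lower cost than one that does not.

```python
from typing import Sequence


def cost(theta0: float, theta1: float,
         xs: Sequence[float], ys: Sequence[float]) -> float:
    """Mean squared error J(theta0, theta1) of the line y = theta0 + theta1 * x."""
    n = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        y_hat = theta0 + theta1 * x      # predicted price for this house
        total += (y_hat - y) ** 2        # squared vertical distance to the line
    return total / n


# Hypothetical training data: (age in years, price in thousands).
ages = [1.0, 5.0, 10.0, 20.0]
prices = [298.0, 290.0, 281.0, 259.0]

print(cost(300.0, -2.0, ages, prices))   # 0.5, a line close to the points
print(cost(250.0, 0.0, ages, prices))    # 1236.5, a flat line far from most points
```

Some texts divide by 2n instead of n; this only scales J and does not change which θ0, θ1 minimize it.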
