What is Machine Learning (ML)? As per a definition given by Tom Mitchell, a computer program is said to learn from experience (E) with respect to some task (T) and performance measure (P) if its performance at T, as measured by P, improves with E.

ML problems can be broadly classified into Supervised and Unsupervised learning. These categories have further sub-categories.

- Supervised – You are given a data set in which the correct output is known for every input. The computer program uses this training data to learn the relation between input and output, and then uses that relation to predict the output for any given input.
  - Regression – In this set of problems, the output is a continuous function of the input, e.g. given a picture of a person, we have to predict their age.
  - Classification – Here, the output is discrete, e.g. given a picture of a person, we have to identify their race, gender, etc.

- Unsupervised – The computer program is not given labeled examples. It first identifies the different groups/classes that the data can be ‘classified’ into, and then uses that knowledge to predict which group a particular data instance fits best.
  - Clustering
  - Non-clustering
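
As a rough illustration of how the data differs between these categories, here is a minimal sketch. All values and label names are made up for illustration and are not taken from any real data set.

```python
# Supervised, regression: each input comes with a continuous target.
# Hypothetical (age in years, price) pairs.
regression_data = [(5, 310000.0), (20, 255000.0), (42, 180000.0)]

# Supervised, classification: each input comes with a discrete label.
# Hypothetical (age in years, label) pairs.
classification_data = [(5, "new"), (20, "old"), (42, "old")]

# Unsupervised: inputs only, with no target or label attached.
unlabeled_data = [5, 20, 42, 7, 39]

# The regression targets are real numbers...
regression_targets = [y for _, y in regression_data]

# ...while the classification targets come from a small fixed set.
class_labels = sorted({label for _, label in classification_data})
```

In the supervised cases the program can compare its predictions against the known targets; in the unsupervised case there is nothing to compare against, so it has to find structure in the inputs themselves.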

Now that we are done with definitions, let's take up a simple regression problem and dive into the mathematics involved to arrive at an algorithm (Gradient Descent).

Problem – Given the age (x) of a house, predict its price (y).

Let's assume we are given a data set of 10,000 houses with their age and current market price. So the training data for our ML program will be of the form (x_{i}, y_{i}), where i ∈ {1, …, 10000}. We will feed these data instances to our learning algorithm and come out with a predictor function, h(x) = θ_{0} + θ_{1}x, where θ_{0} and θ_{1} are parameters that we need to choose such that the predicted value h(x_{i}) is as close as possible to the actual y_{i}.

h(x) is known as the hypothesis function.
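
To make the hypothesis function concrete, here is a minimal sketch in Python. The parameter values are made up for illustration; in practice they are what the learning algorithm has to find.

```python
def h(x, theta0, theta1):
    """Hypothesis function: predicted price for a house of age x."""
    return theta0 + theta1 * x

# Hypothetical parameter values, chosen only for illustration.
theta0 = 300000.0   # intercept: predicted price of a brand-new house (x = 0)
theta1 = -2000.0    # slope: change in predicted price per year of age

# Predicted price of a 10-year-old house under these parameters:
# 300000 + (-2000 * 10) = 280000.0
predicted_price = h(10, theta0, theta1)
```

Different choices of θ_{0} and θ_{1} give different lines, and therefore different predictions for the same house.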

A diagram will make things easier…

This is a plot of y against x for all the training instances. Our objective is to find a straight line such that the average distance of the data points from the line is minimized. That line can be represented by the equation y = θ_{0} + θ_{1}x, where θ_{0} and θ_{1} are, respectively, the y-intercept and the slope.

To find such a line, we will use the mean squared error (MSE) method:

MSE = (1/n) Σ_{i=1}^{n} (Ŷ_{i} − Y_{i})^{2}

where Ŷ_{i} is the predicted value for the i^{th} instance, Y_{i} is the actual value, and n is the number of instances.

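Let's call this function our cost function, J(θ_{0}, θ_{1}). The predicted value for the i^{th} instance is h(x_{i}) = θ_{0} + θ_{1}x_{i}, so for a fixed data set the cost depends only on θ_{0} and θ_{1}.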
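
The cost function can be sketched in a few lines of Python. This assumes the plain mean squared error, that is, the average of the squared differences between predicted and actual values. Some texts scale it by an extra factor of 1/2 to simplify later derivatives, which does not change where the minimum lies. The data set below is a made-up toy example, not real house prices.

```python
def h(x, theta0, theta1):
    """Hypothesis function: predicted y for input x."""
    return theta0 + theta1 * x

def cost(theta0, theta1, xs, ys):
    """Mean squared error of the line (theta0, theta1) over the data set."""
    n = len(xs)
    squared_errors = [(h(x, theta0, theta1) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(squared_errors) / n

# Toy data lying exactly on the line y = 10 - 2x (hypothetical values).
ages = [0.0, 1.0, 2.0, 3.0]
prices = [10.0, 8.0, 6.0, 4.0]

# The line that passes through every point has zero cost.
perfect_fit = cost(10.0, -2.0, ages, prices)   # 0.0

# Lowering the intercept by 1 makes every prediction off by 1,
# so the mean of the squared errors is 1.0.
worse_fit = cost(9.0, -2.0, ages, prices)      # 1.0
```

The lower the value of J, the closer the line is to the data, so finding the best line amounts to finding the θ_{0} and θ_{1} that minimize J.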