Learning Machine Learning – Part 1

What is Machine Learning(ML)? As per a definition given by Tom Mitchell, Machine Learning is the ability of a computer program to improve its Performance(P) at a given task(T) using prior experience(E).

ML problems can be broadly classified into Supervised and Unsupervised learning. These categories have further sub-categories.

  • Supervised – You are given a data set and there is a known relation between input and output. The computer program uses that test data and to learn the relation and use it to predict the output for any given input.
    • Regression – In these set of problems, the output is a continuous function of input, eg. Given a picture of a person, we have to predict their age.
    • Classification – Here, the output is discrete. eg. Given a picture of a person, we have to identify their race/gender etc.
  • Unsupervised – The computer program is not fed with test instances. It first identifies all different groups/classes that the data can be ‘classified’ into. And then use that knowledge to predict where a particular data instance will fit best into.
    • Clustering
    • Non-clustering

Now that we are done with definitions, lets take up a simple regression problem and dive into the mathematics involved to arrive at an algorithm(Gradient Descent).

Problem – Given the age(x) of a house,  predict its price(y).

Lets assume we are given a data set of 10,000 houses with their age and current market price. So test data for our ML program will be of the form (xi, yi) where i ∈ [1,10000]. Now we will feed these data instances to our learning algorithm and come out with a predictor function, h(x) = y = θ0 + θ1x, where θ0, θ1 are variables that we need to find such that the predicted value of y is closest to the actual y.

h(x) is known as hypothesis function.

A diagram will make things easier…


This is a plot of y against x for all the test instances. Our objective is to find a straight line such that average distance of each data point from the line is minimized. That line can be represented by the equation, y = θ0 + θ1x, where θand θare respectively, the y-intercept and the slope.


To find such line, we will use the mean squared error method.

\operatorname {MSE}={\frac {1}{n}}\sum _{{i=1}}^{n}({\hat {Y_{i}}}-Y_{i})^{2}

where Y hat is the predicted value for the ith  instance and Y is the actual value.

Lets call this function, our cost function J(θ0, θ1).