Basics of Machine Learning - Part I
Photo by Franck V.
“Machine Learning is the science of getting computers to learn and act like humans do, and improve their learning over time in autonomous fashion, by feeding them data and information in the form of observations and real-world interactions.”
- “Machine Learning at its most basic is the practice of using algorithms to parse data, learn from it, and then make a determination or prediction about something in the world.” – Nvidia
- “Machine learning is the science of getting computers to act without being explicitly programmed.” – Stanford
- “Machine learning is based on algorithms that can learn from data without relying on rules-based programming.”- McKinsey & Co.
- “Machine learning algorithms can figure out how to perform important tasks by generalizing from examples.” – University of Washington
- “The field of Machine Learning seeks to answer the question “How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?” – Carnegie Mellon University
Types of Machine Learning Algorithms we use:
- Regression: In regression, we predict a value based on prior data, such as predicting a student's marks from their marks over the past five years. The value to be predicted is a continuous variable. Continuous variables are numeric variables that have an infinite number of possible values between any two values, e.g. a student's score, the length of a part, or the date and time a payment is received.
- Classification: In classification, we assign a label to the output value. E.g. checking whether an email is spam or ham.
- Clustering: There is no pre-defined label in clustering. We simply group the data based on some segmentation criterion. E.g. customer segmentation.
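To make the three problem types concrete, here is a small illustrative sketch in plain NumPy. All the data are made up, and the spam rule is just a hypothetical keyword check standing in for a trained classifier:

```python
import numpy as np

# Regression: predict a continuous value (a student's next score)
# by fitting a straight line to year vs. score.
years = np.array([1, 2, 3, 4, 5])
scores = np.array([62.0, 65.0, 70.0, 74.0, 79.0])
slope, intercept = np.polyfit(years, scores, 1)
predicted_year6 = slope * 6 + intercept   # continuous output

# Classification: assign a discrete label (spam vs. ham); a trivial
# keyword rule stands in for a trained model here.
def classify(email):
    return "spam" if "lottery" in email.lower() else "ham"

# Clustering: group unlabelled customers by a segmentation criterion
# (annual spend), with no pre-defined labels.
spend = np.array([120, 150, 130, 900, 950, 870])
clusters = (spend > 500).astype(int)   # 0 = low spenders, 1 = high spenders
```

Note how the regression output is a number, the classification output is a label, and the clustering output is a grouping that was never given in advance.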
Types of Learning Methods in Machine Learning:
- Supervised:
- In the supervised learning method, we have past data with labels on which to train a model.
- Regression and classification algorithms fall under this category.
- Unsupervised:
- No pre-defined labels are assigned to the past data.
- Here we derive information after processing the data, since we have no prior information about it.
- Clustering algorithms fall under this category.
Regression:
Best Fit Line
RSS (Residual Sum of Squares) = e1² + e2² + … + en²
We already know that a line can be represented by Y = β0 + β1X, so for each data point (xi, yi) the residual is ei = yi − (β0 + β1xi).
Substituting the values for Y, X and e, the final equation becomes:
RSS = [y1 − (β0 + β1x1)]² + [y2 − (β0 + β1x2)]² + … + [yn − (β0 + β1xn)]²
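The RSS computation can be sketched in a few lines of NumPy (the data points and the candidate line's coefficients below are made up for illustration):

```python
import numpy as np

# Hypothetical data points and a candidate line Y = b0 + b1*X.
X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
beta0, beta1 = 0.1, 1.95

y_pred = beta0 + beta1 * X      # the line's predictions
residuals = y - y_pred          # e_i = y_i - (b0 + b1*x_i)
rss = np.sum(residuals ** 2)    # sum of squared residuals
```

Minimising this quantity over β0 and β1 is exactly what fitting the best-fit line means.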
Cost Function:
For a single-variable cost function such as J(θ) = θ², the function J(θ) is minimum when θ = 0, the point where its derivative vanishes.
Similarly, for the two-variable RSS function J(m, c) = [y1 − (mx1 + c)]² + [y2 − (mx2 + c)]² + … + [yn − (mxn + c)]², we can get the minimum by setting both partial derivatives to zero: ∂J/∂m = 0 and ∂J/∂c = 0.
Here we are left with two equations in two variables (m, c), and these can be solved as simultaneous linear equations.
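Solving those two equations gives the familiar closed-form least-squares estimates m = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and c = ȳ − m·x̄. A quick NumPy check on data generated exactly from y = 2x + 1:

```python
import numpy as np

# Closed-form solution of the two normal equations,
# checked on data generated exactly from y = 2x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0

x_bar, y_bar = x.mean(), y.mean()
m = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
c = y_bar - m * x_bar
# m and c recover the true slope 2 and intercept 1
```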
We can minimize a cost function using the techniques below:
- Closed-form method: The function to be minimised is differentiated and equated to 0 to obtain a solution. The second derivative is also checked to confirm it is greater than 0, which verifies that the solution is a minimum.
- Gradient Descent: Gradient descent is an optimization algorithm used to find the values of the parameters (coefficients) of a function (f) that minimize a given cost function. It is an iterative minimisation method that reaches the minima step by step. We start with an initial assumed value of the parameter; this initial value can be anything (say θ0). We also choose α, the rate of learning. For the current value θ, we calculate the derivative of the cost function, dJ/dθ. The new value of the parameter then becomes θnew = θold − α·(dJ/dθ). We continue the process until the algorithm reaches an optimum point, i.e. the value of the parameter does not change effectively after this point.
The parameter α is the learning rate, and its magnitude determines the size of each iterative step. The range of α is (0, 1), but large values, for example α > 0.5, are not favoured: if α is too high, gradient descent may overshoot and miss the minima, forcing the iterative search to start again.
import numpy as np
import pandas as pd

# Takes in X, y, current m and c (both initialised to 0), the number of
# iterations and the learning rate; returns a DataFrame recording m, c
# and the cost at every iteration.
def gradient(X, y, m_current=0, c_current=0, iters=1000, learning_rate=0.01):
    N = float(len(y))
    gd_df = pd.DataFrame(columns=['m_current', 'c_current', 'cost'])
    for i in range(iters):
        y_current = (m_current * X) + c_current              # current predictions
        cost = np.sum((y - y_current) ** 2) / N              # mean squared error
        m_gradient = -(2 / N) * np.sum(X * (y - y_current))  # dJ/dm
        c_gradient = -(2 / N) * np.sum(y - y_current)        # dJ/dc
        m_current = m_current - (learning_rate * m_gradient)
        c_current = c_current - (learning_rate * c_gradient)
        gd_df.loc[i] = [m_current, c_current, cost]
    return gd_df
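One way to sanity-check the routine above is to run the same update rule on data generated from a known line, y = 2x + 1, and confirm that m and c converge to the true slope and intercept. A minimal sketch with the DataFrame bookkeeping stripped out (the data, learning rate and iteration count are made up):

```python
import numpy as np

# Data generated from a known line, y = 2x + 1 (no noise).
X = np.linspace(0.0, 1.0, 50)
y = 2.0 * X + 1.0

# The same gradient-descent updates as in the gradient() function.
m, c = 0.0, 0.0
N = float(len(y))
learning_rate = 0.1
for _ in range(5000):
    y_pred = m * X + c
    m_gradient = -(2 / N) * np.sum(X * (y - y_pred))
    c_gradient = -(2 / N) * np.sum(y - y_pred)
    m -= learning_rate * m_gradient
    c -= learning_rate * c_gradient
```

After enough iterations m and c end up very close to 2 and 1, which is a useful quick test before trusting the routine on real data.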
TSS, RSS and R Square
The total sum of squares, TSS = Σ(yi − ȳ)², measures how well the horizontal line through the mean ȳ fits the data, and R² = 1 − RSS/TSS measures how much better the model fits than that horizontal line.
Negative R Square
Example: fit data to a linear regression model constrained so that the line must pass through the point (0, 1500). The model makes no sense at all given these data. It is clearly the wrong model, perhaps chosen by accident.
The fit of the model (a straight line constrained to go through the point (0, 1500)) is worse than the fit of a horizontal line through the mean. Thus the sum-of-squares from the model (RSS) is larger than the total sum-of-squares (TSS).
When RSS > TSS, R² = 1 − RSS/TSS is negative.
With linear regression with no constraints, R² lies between 0 and 1, because the unconstrained best-fit line can never fit worse than the horizontal line through the mean.
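A tiny numeric check of the negative-R² case, with made-up data and a hypothetical model stuck predicting 1500 everywhere:

```python
import numpy as np

# R^2 = 1 - RSS/TSS turns negative when the model fits worse than
# the horizontal line through the mean.
y = np.array([10.0, 12.0, 11.0, 13.0, 12.0])
y_model = np.full_like(y, 1500.0)    # the badly constrained "model"

tss = np.sum((y - y.mean()) ** 2)    # fit of the horizontal line
rss = np.sum((y - y_model) ** 2)     # fit of the bad model
r_squared = 1 - rss / tss            # far below zero here
```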