This post demonstrates the mathematical model behind logistic regression, which serves as a building block of deep learning. I am writing this series to help others understand deep learning, and also to learn it more deeply myself.
I break each subject into two posts: a math post explaining how an idea works theoretically, and a code post demonstrating how to implement it. To demonstrate logistic regression, I will apply the theory to image recognition, a well-studied machine learning problem, in the code post. This post lays the foundation for understanding the model.
Modern deep learning techniques are modeled after biological cognition, whose basic unit is the neuron. Deep learning therefore starts with neuron-inspired mathematical modeling, and logistic regression is its building block. The math behind a simple logistic regression classifier looks like this:

$$\hat{y} = \sigma(w^T x + b), \quad \text{where} \quad \sigma(z) = \frac{1}{1 + e^{-z}}$$
Imagine this function represents a neuron in your brain: the input is the stimulus your brain receives (sound, touch, etc.), represented by the data captured in $x$, and the output is a binary decision of whether that neuron gets triggered or not, represented by the binary value of $\hat{y}$.
For the neuron to work properly, we need decent values for the weights and the bias term, denoted $w$ and $b$ respectively. However, good values for $w$ and $b$ don't come automatically; we need to train our model to acquire them. Imagine our function as a newborn baby: it takes training to teach a baby to walk, speak, and so on. Likewise, we need to train our neuron model to figure out good values for $w$ and $b$.
So the question is: how do we train a logistic regression model? Given a training dataset, we can measure the error of the current model by comparing predicted values against actual values, then use that error to improve the model. This breaks the question of how to train logistic regression into two smaller questions:

1. How do we compute predictions and measure the error of the current model?
2. How do we use that error to adjust the model's parameters?
The technique that addresses question 1 is called forward propagation, and the technique that addresses question 2 is called backward propagation.
Forward propagation enables two things. First, it produces a predicted value $\hat{y}$ based on the input $x$. Breaking the calculation down mathematically:

$$z = w^T x + b, \qquad \hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}$$
If the value of $\hat{y}$ is greater than or equal to 0.5, the prediction is 1; otherwise it is 0.
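The forward pass described above can be sketched in a few lines of NumPy. The specific weight, bias, and input values below are illustrative assumptions, not values from the post:

```python
import numpy as np

def sigmoid(z):
    """Squash any real value into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(w, b, x):
    """Forward propagation: compute y_hat = sigma(w^T x + b)."""
    return sigmoid(np.dot(w, x) + b)

# Illustrative neuron with two inputs.
w = np.array([0.5, -0.25])  # weights (made-up values)
b = 0.1                     # bias (made-up value)
x = np.array([1.0, 2.0])    # input stimulus

y_hat = forward(w, b, x)             # probability in (0, 1)
prediction = 1 if y_hat >= 0.5 else 0  # threshold at 0.5
```

Here `np.dot(w, x)` computes $w^T x$, and the final line applies the 0.5 threshold from above.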
Second, it allows the model to calculate an error based on a loss function; this error quantifies how well the current model is performing by comparing $\hat{y}$ and $y$. The loss function for logistic regression is:

$$\mathcal{L}(\hat{y}, y) = -\left(y \log \hat{y} + (1 - y) \log (1 - \hat{y})\right)$$
How does this loss function make sense? The way I think about it is that if the prediction is close to the actual value, the loss should be low, and if the prediction is far from the actual value, the loss should be high. Plugging in the two cases: when $y = 1$, the loss is $-\log \hat{y}$, which is small when $\hat{y}$ is close to 1; when $y = 0$, the loss is $-\log(1 - \hat{y})$, which is small when $\hat{y}$ is close to 0.
Next, because we have many rows of training data, say $m$ records, we define a cost function for the entire dataset based on the loss function. The cost is the loss averaged across all training examples:

$$J(w, b) = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}\left(\hat{y}^{(i)}, y^{(i)}\right)$$
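The loss and cost definitions above translate directly into code. This is a minimal sketch; the probabilities passed in at the end are made-up examples to show the intuition:

```python
import numpy as np

def loss(y_hat, y):
    """Cross-entropy loss for a single example."""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def cost(Y_hat, Y):
    """Cost J: the per-example losses averaged over all m records."""
    return float(np.mean(loss(Y_hat, Y)))

# A confident, correct prediction gives a low loss...
low = loss(0.9, 1)
# ...while a confident, wrong prediction gives a high loss.
high = loss(0.1, 1)

# Cost over a tiny (made-up) batch of two examples.
example_cost = cost(np.array([0.9, 0.2]), np.array([1.0, 0.0]))
```

This matches the intuition in the text: predictions near the true label drive the loss toward zero, while predictions far from it blow the loss up.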
Now that we have a way to measure the error of our prediction model, we can set the goal of minimizing that error by adjusting the model parameters, and this is where backward propagation comes in.
To tackle the problem of how to refine our model to reduce training error, we can define the problem more formally as follows:
To implement step 4, we need to apply gradient descent.
Quote from Wikipedia on gradient descent:
Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function.
Great! We want to find the minimum of our cost function $J(w, b)$ by adjusting $w$ and $b$. Following the gradient descent algorithm, we take the following steps:
The goal is to learn $w$ and $b$ by minimizing the cost function $J(w, b)$. For a parameter $\theta$, the update rule is $\theta := \theta - \alpha \frac{\partial J}{\partial \theta}$, where $\alpha$ is the learning rate.
Translating the steps above into math, we get:

$$w := w - \alpha \frac{\partial J(w, b)}{\partial w}, \qquad b := b - \alpha \frac{\partial J(w, b)}{\partial b}$$
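One gradient descent update can be sketched as below. For the cross-entropy cost, the gradients reduce to the well-known closed forms $\frac{\partial J}{\partial w} = \frac{1}{m} X (A - Y)$ and $\frac{\partial J}{\partial b} = \frac{1}{m} \sum (A - Y)$; the tiny dataset in the test is made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(w, b, X, Y, alpha):
    """One forward + backward pass and one parameter update.

    X has shape (n_features, m); Y has shape (m,).
    """
    m = X.shape[1]
    A = sigmoid(np.dot(w, X) + b)   # forward propagation, shape (m,)
    dw = np.dot(X, (A - Y)) / m     # dJ/dw for the cross-entropy cost
    db = float(np.sum(A - Y)) / m   # dJ/db
    return w - alpha * dw, b - alpha * db
```

Each call moves $w$ and $b$ a small step in the direction that decreases the cost, with the step size controlled by `alpha`.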
We now have a big picture of how a logistic regression classifier learns from a training dataset and predicts on future data. The high-level steps can be summarized as:

1. Initialize the parameters $w$ and $b$.
2. Run forward propagation to compute predictions and the cost $J(w, b)$.
3. Run backward propagation to compute the gradients.
4. Update $w$ and $b$ with gradient descent, and repeat steps 2-4 until the cost converges.
5. Use the learned $w$ and $b$ to predict on new data.
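Pulling the pieces together, the whole pipeline can be sketched as a single training loop. The hyperparameters and the toy dataset in the test are illustrative assumptions; the real image-recognition version is left for the code post:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, Y, alpha=0.1, iterations=1000):
    """Fit logistic regression with gradient descent.

    X: (n_features, m) training inputs; Y: (m,) binary labels.
    """
    n, m = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(iterations):
        A = sigmoid(np.dot(w, X) + b)   # forward propagation
        dw = np.dot(X, (A - Y)) / m     # backward propagation
        db = float(np.sum(A - Y)) / m
        w -= alpha * dw                 # gradient descent update
        b -= alpha * db
    return w, b

def predict(w, b, X):
    """Threshold the sigmoid output at 0.5 to get 0/1 predictions."""
    return (sigmoid(np.dot(w, X) + b) >= 0.5).astype(int)
```

On a small linearly separable dataset, this loop learns a boundary that separates the two classes, which is exactly the behavior the steps above describe.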
In the next blog post, the code post for logistic regression, we will dive into the implementation based on the model above.