Building a Simple Machine Learning Model from Scratch
A Beginner's Guide to Machine Learning with Python and Basic Libraries
If you have been anywhere near social media, or any media at all, you have probably heard the buzz around generative AI. Generative AI is a branch of artificial intelligence that can produce new content such as images, text, and music. It can create realistic faces, write catchy songs, generate funny captions and much more. But what is the magic behind it? And how can you learn to do it yourself? The first step is to understand the concept of machine learning.
Machine learning has revolutionized the way we approach problem-solving in the modern world. From image recognition to natural language processing, machine learning has made it possible to build intelligent systems that can learn from data and make predictions based on that learning.
While there are many powerful machine learning libraries and frameworks available today, building a simple model from scratch can be a rewarding and educational experience. In this article, we will guide you through that process using nothing but Python and some basic math, plus matplotlib for the final plot.
Before we dive into the technical details, let's first understand what machine learning is and how it works.
Machine learning is a subset of artificial intelligence that enables computers to learn from data without being explicitly programmed. The process of machine learning involves feeding the computer with data and allowing it to learn patterns and relationships within the data. Once a model has been trained on the data, it can be used to make predictions on new data.
Now that we have a basic understanding of machine learning, let's get started with building our own model.
Step 1: Collect and Prepare Data
The first step in building a machine learning model is to collect and prepare the data. For our simple example, we'll be using a dataset that contains information about the height and weight of a group of people. We'll use this data to predict the weight of a person based on their height.
To collect the data, you can either create a dataset manually or use an existing dataset. In our case, we'll create a dataset manually using Python's random module. Here's the code to generate a random dataset:
import random
data = []
for i in range(100):
    height = random.randint(140, 200)
    # Linear relationship plus integer noise in [-10, 10]
    weight = 50 + 0.5 * (height - 150) + random.randint(-10, 10)
    data.append((height, weight))
This code generates 100 data points. Each height is a random integer between 140 and 200 (inclusive), and the weight follows a simple linear equation with a slope of 0.5, anchored so that a height of 150 corresponds to a weight of 50 (equivalently, an intercept of -25 in y = mx + b form). We also add random integer noise between -10 and 10 to the weight using the random.randint() function.
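To make the arithmetic behind the generator concrete, here is a minimal sketch of the noise-free part of the equation. The function name expected_weight is an assumption introduced for illustration; it is not part of the article's code.

```python
def expected_weight(height):
    """Noise-free weight from the generator's equation (hypothetical helper)."""
    return 50 + 0.5 * (height - 150)

# A height of 170 is 20 above 150, so the noise-free weight is 50 + 0.5 * 20
print(expected_weight(170))

# Expanding the brackets gives the slope-intercept form 0.5 * height - 25
print(all(expected_weight(h) == 0.5 * h - 25 for h in range(140, 201)))
```

Each generated weight then lands within 10 units of this value, because the noise term is drawn from the range -10 to 10.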
Once we have the data, we need to prepare it for training our model. We'll split the data into two sets - a training set and a test set. The training set will be used to train the model, while the test set will be used to evaluate the performance of the model.
import random
data = []
for i in range(100):
    height = random.randint(140, 200)
    weight = 50 + 0.5 * (height - 150) + random.randint(-10, 10)
    data.append((height, weight))
# First 80 points for training, remaining 20 for testing
train_data = data[:80]
test_data = data[80:]
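Slicing works here because the generated points are already in random order. For a dataset that arrives sorted (for example by height), shuffling before the split is usually safer. The sketch below shows one way to do that; the helper name shuffled_split and its parameters are assumptions for illustration, not part of the article's code.

```python
import random

def shuffled_split(data, train_fraction=0.8, seed=None):
    """Shuffle a copy of data and return (train, test) lists (hypothetical helper)."""
    rng = random.Random(seed)      # private generator, leaves the global random state alone
    shuffled = list(data)          # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Placeholder (height, weight) pairs, deliberately sorted by height
example = [(h, 0.5 * h - 25) for h in range(140, 200)]
train, test = shuffled_split(example, train_fraction=0.8, seed=42)
print(len(train), len(test))
```

Passing a fixed seed makes the split repeatable, which helps when comparing runs.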
Step 2: Define the Model
The next step is to define the machine learning model. For our simple example, we'll be using a linear regression model.
A linear regression model is a type of supervised learning model that uses a linear equation to predict the output variable based on one or more input variables.
To define the linear regression model, we'll use the following equation:
y = mx + b
Where y is the output variable (weight), x is the input variable (height), m is the slope or weight of the line, and b is the intercept or bias of the line. Our goal is to find the values of m and b that best fit the data.
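One common way to make "best fit" precise is the sum of squared errors: the smaller it is, the closer the line is to the points. The sketch below compares two candidate lines on a tiny made-up dataset; the function name sum_squared_error and the sample values are assumptions for illustration.

```python
def sum_squared_error(m, b, xs, ys):
    """Total squared gap between the line y = m*x + b and the observed ys."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

# Made-up points that lie exactly on y = 2x + 1
xs = [1, 2, 3]
ys = [3, 5, 7]

print(sum_squared_error(2, 1, xs, ys))  # the true line: no error at all
print(sum_squared_error(1, 1, xs, ys))  # a worse candidate: larger error
```

Least squares fitting picks the m and b that make this quantity as small as possible.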
Here's the code to define the linear regression model:
class LinearRegression:
    def __init__(self):
        self.m = 0  # slope
        self.b = 0  # intercept

    def fit(self, x, y):
        # Closed-form least squares solution for a single input variable
        n = len(x)
        sum_x = sum(x)
        sum_y = sum(y)
        sum_xy = sum(xi * yi for xi, yi in zip(x, y))
        sum_x_squared = sum(xi ** 2 for xi in x)
        self.m = (n * sum_xy - sum_x * sum_y) / (n * sum_x_squared - sum_x ** 2)
        self.b = (sum_y - self.m * sum_x) / n

    def predict(self, x):
        return [self.m * xi + self.b for xi in x]
The LinearRegression class has two methods: fit() and predict(). The fit() method takes the input data x and output data y as arguments and computes the values of m and b using the closed-form least squares formulas. The predict() method takes a list of input values x and returns the predicted outputs using the learned values of m and b.
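For readers curious where the expressions for m and b come from, here is a brief sketch of the standard derivation, using the same symbols as the code (n data points, all sums running over i = 1 to n). The error function S is notation introduced here for illustration.

```latex
\begin{align}
S(m, b) &= \sum_{i=1}^{n} \left( y_i - m x_i - b \right)^2 \\
\frac{\partial S}{\partial b} = 0 \;&\Rightarrow\; \sum y_i = m \sum x_i + n b \\
\frac{\partial S}{\partial m} = 0 \;&\Rightarrow\; \sum x_i y_i = m \sum x_i^2 + b \sum x_i \\
m &= \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2},
\qquad
b = \frac{\sum y_i - m \sum x_i}{n}
\end{align}
```

Solving the second line for b and substituting it into the third gives the expression for m. The denominator is zero when every x value is identical, in which case no unique line exists.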
Step 3: Train the Model
With the data and model defined, we can now train the model. The training process involves feeding the input data and output data to the model and allowing it to learn the patterns and relationships within the data.
Here's the code to train our linear regression model using the training set:
model = LinearRegression()
X_train = [point[0] for point in train_data]
y_train = [point[1] for point in train_data]
model.fit(X_train, y_train)
In this code, we create an instance of the LinearRegression class and extract the input (height) and output (weight) values from the training set. We then call the model's fit() method with those values to train the model.
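A useful sanity check on the training step is to fit data where the answer is known in advance. The sketch below uses noise-free points from the generator's equation, so the fitted slope and intercept should come out very close to 0.5 and -25. The standalone helper fit_line is a hypothetical restatement of the same least squares formulas so that the snippet runs on its own; the zero-denominator guard is an addition, not something the article's class does.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least squares line through the points (hypothetical helper)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x_squared = sum(x ** 2 for x in xs)
    denominator = n * sum_x_squared - sum_x ** 2
    if denominator == 0:
        # Happens when every x is the same: the slope is undefined
        raise ValueError("cannot fit a line when all x values are identical")
    m = (n * sum_xy - sum_x * sum_y) / denominator
    b = (sum_y - m * sum_x) / n
    return m, b

# Noise-free heights and weights from weight = 50 + 0.5 * (height - 150)
heights = list(range(140, 201))
weights = [50 + 0.5 * (h - 150) for h in heights]
m, b = fit_line(heights, weights)
print(m, b)
```

With the noisy training data from Step 1, the learned values will differ somewhat from 0.5 and -25, which is expected.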
Step 4: Test the Model
Once the model has been trained, we can evaluate its performance on the test set. The test set contains data that the model has not seen before, and we use it to measure how well the model can generalize to new data.
Here's the code to test our model using the test set:
X_test = [point[0] for point in test_data]
y_test = [point[1] for point in test_data]
y_pred = model.predict(X_test)
for i in range(len(X_test)):
    print(f"Input: {X_test[i]}, Actual Output: {y_test[i]}, Predicted Output: {y_pred[i]}")
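Reading predictions line by line gets tedious, so it can help to summarize the test results in a single number. The sketch below computes two common summaries from scratch: mean absolute error (in the same units as weight) and R squared. The function names and sample values are assumptions for illustration, not results from the model above.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Fraction of the variance in y_true explained by the predictions.

    Undefined (division by zero) if every value in y_true is identical.
    """
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Placeholder values for illustration only
actual = [60.0, 65.0, 70.0]
predicted = [62.0, 65.0, 68.0]
print(mean_absolute_error(actual, predicted))
print(r_squared(actual, predicted))
```

To apply these to the tutorial's model, pass y_test and y_pred in place of the placeholder lists.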
Step 5: Visualize the Results
Finally, we can visualize the results by plotting the model's fitted line over the actual test data. This gives a quick visual sense of how well the model predicts weight from height.
Here's the code to plot the results:
import matplotlib.pyplot as plt
plt.scatter(X_test, y_test, color='black')
plt.plot(X_test, y_pred, color='blue', linewidth=3)
plt.xlabel('Height')
plt.ylabel('Weight')
plt.title('Linear Regression Model')
plt.show()
In this code, we use the matplotlib library to create a scatter plot of the test data, with height on the x-axis and weight on the y-axis. We then plot the predicted output against the input data using a blue line.
Conclusion
In this article, we have seen how to build a simple machine learning model from scratch using Python and some basic math. We started by collecting and preparing the data, then defined the linear regression model, trained the model on the data, tested its performance on a test set, and visualized the results.
While this example is simple, it demonstrates the fundamental concepts of machine learning and provides a foundation for building more complex models. By building models from scratch, we can gain a deeper understanding of how they work and develop the skills necessary to tackle more challenging problems in the field of machine learning.