ML: Depression Model
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
class DepressionModel:
    """A class created to predict the likelihood of depression based on daily lifestyle factors.
    """
    # a singleton instance of DepressionModel, created so the model is trained only once and reused for many predictions
    _instance = None

    # constructor that initializes the DepressionModel
    def __init__(self):
        """
        Initializes the DepressionModel with placeholders for the model, scaler, and test data.
        """
        # the Depression ML Model
        self.model = None
        self.scaler = None
        self.X_test = None
        self.y_test = None

    # function to train the model using linear regression
    def train_model(self, data_path):
        """
        Trains the model using linear regression based on the provided dataset.
        Args:
            data_path (str): Path to the dataset containing features and labels.
        """
        # Load data
        data = pd.read_csv(data_path)
        # Split the data into features and labels
        X = data.drop('Probability of Developing Depression', axis=1)
        y = data['Probability of Developing Depression']
        # Standardize the features
        self.scaler = StandardScaler()
        X_scaled = self.scaler.fit_transform(X)
        # Split the data into training and testing sets
        X_train, self.X_test, y_train, self.y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
        # Train a linear regression model
        self.model = LinearRegression()
        self.model.fit(X_train, y_train)

    # function to predict the chance of depression
    def predict_depression(self, age, stress_level, exercise_hours, sleep_hours):
        """
        Predicts the probability of depression for an individual based on specified lifestyle factors.
        Args:
            age (float): The age of the individual.
            stress_level (float): The stress level of the individual.
            exercise_hours (float): The number of hours the individual exercises per week.
            sleep_hours (float): The number of hours the individual sleeps per night.
        Returns:
            float: Probability of experiencing depression.
        """
        if self.model is None or self.scaler is None:
            raise ValueError("Model has not been trained yet. Please train the model first.")
        # Scale input data
        input_data = self.scaler.transform([[age, stress_level, exercise_hours, sleep_hours]])
        # Predict depression probability
        chance_of_depression = self.model.predict(input_data)[0]
        return chance_of_depression

    @classmethod
    def get_instance(cls):
        """
        Retrieves the singleton instance of DepressionModel.
        If instance doesn't exist, creates and trains it.
        Returns:
            DepressionModel: Singleton instance of DepressionModel.
        """
        # check for instance, if it doesn't exist, create it
        if cls._instance is None:
            cls._instance = cls()
            cls._instance.train_model('depression_dataset.csv')
        # return the instance, to be used for prediction
        return cls._instance
# Usage
# module-level handle used to reach get_instance(); it is not the trained singleton itself
depressionModel = DepressionModel()

def initDepression():
    """
    Initializes the Depression Model by loading it into memory.
    """
    depressionModel.get_instance()

def testDepression():
    """
    Tests the Depression Model by predicting the likelihood of depression based on daily lifestyle.
    Prints method documentation, individual data, and depression probability.
    """
    # Setup data for prediction
    print(" Step 1: Define individual data for prediction:")
    individual_data = {
        'age': 30,
        'stress_level': 5,
        'exercise_hours': 3,
        'sleep_hours': 7
    }
    print("\t", individual_data)
    print()
    # Get an instance of the trained Depression Model
    depressionModel = DepressionModel.get_instance()
    print(" Step 2:", depressionModel.get_instance.__doc__)
    # Predict the probability of depression
    print(" Step 3:", depressionModel.predict_depression.__doc__)
    # the dictionary keys match the parameter names, so it can be unpacked directly
    depression_probability = depressionModel.predict_depression(**individual_data)
    print('\t Probability of depression: {:.2%}'.format(depression_probability))
    print()

if __name__ == "__main__":
    print(" Begin:", testDepression.__doc__)
    testDepression()
Improvements and Accomplishments:
Going into this project, I was very unfamiliar with different types of models, besides the model that generates the database. I was unsure of how to start the machine learning model. Yet with help from friends and some trial and error, I was able to create both a model and an API for predicting the likelihood of depression.
After going through this project, I have a much better understanding of how models, as well as machine learning, work. While dealing with an issue caused by using the wrong type of model (linear or logistic regression), I did some research into how the different types of models work, and I now have a deeper understanding of each of them.
Linear Regression Model:
- Models the relationship between a dependent variable and one or more independent variables
- Assumes a linear relationship between the variables
- Used for prediction or forecasting tasks, as well as for understanding the relationship between variables
Logistic Regression Model:
- Used for binary classification tasks where the dependent variable is categorical and has two possible outcomes (e.g., 0 or 1, yes or no).
- Determines the probability that an input belongs to a specific category (e.g., spam email or not spam)
- Uses the logistic (sigmoid) function to map input values to a probability score between 0 and 1.
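The difference between the two model types can be sketched in a few lines. This is only an illustration and not part of the project code: the weight and bias values are made up, and linear_output and logistic_output are hypothetical helper names. It shows that a linear output is unbounded, while the same score passed through the sigmoid always stays between 0 and 1.

```python
import math

# Hypothetical weight and bias, chosen only for illustration
WEIGHT = 0.4
BIAS = -0.5

def linear_output(x):
    """Linear regression style output: an unbounded straight line."""
    return WEIGHT * x + BIAS

def logistic_output(x):
    """Logistic regression style output: the linear score passed through the sigmoid."""
    score = WEIGHT * x + BIAS
    return 1.0 / (1.0 + math.exp(-score))

# Compare both outputs for a small, a middle, and a large input
for x in (-10, 0, 10):
    print(x, round(linear_output(x), 3), round(logistic_output(x), 3))
```

For large or small inputs the linear output leaves the 0 to 1 range, which is why a plain linear model is not guaranteed to return a valid probability.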
I feel as though I also learned more about organizing and documenting code. With the example of the Titanic code provided, I learned how to properly document my code with multi-line comments (docstrings) and single-line comments. The multi-line comments describe the steps of each function of the machine learning model, and whenever I run the file they are printed to explain how the model works. The single-line comments help the reader understand what each function and line of code does.
Explaining the Code
_instance = None
_instance = None is a class-level variable declaration. This line initializes a variable named _instance with a value of None. This is used as part of the singleton design pattern. The singleton pattern ensures that a class has only one instance and provides a global point of access to that instance.
How it works:
- When the class is defined, _instance is set to None.
- When the get_instance() method is called, it checks if _instance is None. If it is, it creates an instance of the DepressionModel class, trains it, and assigns it to _instance.
- Subsequent calls to get_instance() return the same instance stored in _instance, so the program keeps reusing one trained DepressionModel throughout its execution.
Essentially, it works as a flag that shows whether an instance of DepressionModel has been created yet, and it holds that instance once it exists.
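The behavior described above can be checked with a stripped-down sketch. TrainOnceModel is a hypothetical stand-in for DepressionModel, and the training step is simulated with a counter instead of a real dataset, so this only illustrates the pattern.

```python
class TrainOnceModel:
    """Hypothetical stand-in for DepressionModel; training is simulated with a counter."""
    _instance = None   # holds the single shared instance once it has been created
    train_calls = 0    # counts how many times the training step ran

    def train(self):
        # Placeholder for the expensive training step
        TrainOnceModel.train_calls += 1

    @classmethod
    def get_instance(cls):
        # Create and train only on the first call
        if cls._instance is None:
            cls._instance = cls()
            cls._instance.train()
        return cls._instance

first = TrainOnceModel.get_instance()
second = TrainOnceModel.get_instance()
print(first is second)             # True: both names refer to the same object
print(TrainOnceModel.train_calls)  # 1: training ran only once
```

Note that calling the constructor directly still creates a separate, untrained object, so callers should always go through get_instance().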
def train_model(self, data_path):
    """
    Trains the model using linear regression based on the provided dataset.
    Args:
        data_path (str): Path to the dataset containing features and labels.
    """
    # Load data
    data = pd.read_csv(data_path)
    # Split the data into features and labels
    X = data.drop('Probability of Developing Depression', axis=1)
    y = data['Probability of Developing Depression']
    # Standardize the features
    self.scaler = StandardScaler()
    X_scaled = self.scaler.fit_transform(X)
    # Split the data into training and testing sets
    X_train, self.X_test, y_train, self.y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
    # Train a linear regression model
    self.model = LinearRegression()
    self.model.fit(X_train, y_train)
This code defines the train_model method of the DepressionModel class, which trains a linear regression model on the provided dataset.
How it works:
- It loads the dataset from the provided data_path using pandas’ read_csv function.
- It splits the loaded dataset into features (X) and labels (y). The features are all columns except the one labeled “Probability of Developing Depression”, which is used as the target variable.
- It standardizes the features using StandardScaler from scikit-learn. Standardization is a preprocessing step that rescales each feature to a mean of 0 and a standard deviation of 1.
- It splits the standardized features and labels into training and testing sets using train_test_split from scikit-learn. 80% of the data is used for training (X_train, y_train), and 20% is set aside for testing (self.X_test, self.y_test).
- It initializes a linear regression model (LinearRegression from scikit-learn) and fits it to the training data (X_train, y_train). The trained model is stored as self.model.
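The standardization step described above can be worked through by hand for a single feature. This is a minimal sketch rather than the project's code: the sleep-hour values are made up, fit_standardizer and standardize are hypothetical helper names, and it assumes the column's standard deviation is not zero. Like StandardScaler, it uses the population standard deviation.

```python
import statistics

def fit_standardizer(column):
    """Return the mean and population standard deviation of one feature column."""
    return statistics.mean(column), statistics.pstdev(column)

def standardize(value, mean, std):
    """Rescale one value with statistics learned during fitting (std must be non-zero)."""
    return (value - mean) / std

# Made-up sleep hours for five people
sleep_hours = [5.0, 6.0, 7.0, 8.0, 9.0]
mean, std = fit_standardizer(sleep_hours)
scaled = [standardize(v, mean, std) for v in sleep_hours]
print(scaled)

# A new input is scaled with the same mean and std; the statistics are not recomputed
print(standardize(7.5, mean, std))
```

The second print mirrors what happens at prediction time: new inputs are transformed with the statistics stored in the scaler during training.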
def predict_depression(self, age, stress_level, exercise_hours, sleep_hours):
    """
    Predicts the probability of depression for an individual based on specified lifestyle factors.
    Args:
        age (float): The age of the individual.
        stress_level (float): The stress level of the individual.
        exercise_hours (float): The number of hours the individual exercises per week.
        sleep_hours (float): The number of hours the individual sleeps per night.
    Returns:
        float: Probability of experiencing depression.
    """
    if self.model is None or self.scaler is None:
        raise ValueError("Model has not been trained yet. Please train the model first.")
    # Scale input data
    input_data = self.scaler.transform([[age, stress_level, exercise_hours, sleep_hours]])
    # Predict depression probability
    chance_of_depression = self.model.predict(input_data)[0]
    return chance_of_depression
This Python code defines a method predict_depression within a class. This method is responsible for predicting the probability of depression for an individual based on specified lifestyle factors using a previously trained linear regression model. It first checks whether the model and scaler attributes have been initialized (i.e., whether the model has been trained). If not, it raises a ValueError indicating that the model needs to be trained before making predictions.
If the model has been trained, it proceeds to scale the input data using the scaler object obtained during the training phase. The input data consists of the specified lifestyle factors: age, stress_level, exercise_hours, and sleep_hours. These values are passed as arguments to the method.
The scaled input data is then used to make a prediction using the trained linear regression model (self.model). The predict method of the model is called with the scaled input data, resulting in a predicted probability of experiencing depression for the individual.
The predicted probability of depression is returned as the output of the method.
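To make the prediction step more concrete, the sketch below shows what a fitted linear regression computes for one row of scaled inputs: a weighted sum of the features plus an intercept. The coefficient and intercept values are hypothetical (a real model stores its own in coef_ and intercept_), and linear_predict and clip_probability are helper names invented for this example. The clipping step is an optional addition and is not in the original code; it is shown because a linear model can return values below 0 or above 1.

```python
def linear_predict(coefficients, intercept, scaled_features):
    """Weighted sum of the scaled features plus the intercept."""
    return intercept + sum(w * x for w, x in zip(coefficients, scaled_features))

def clip_probability(value):
    """Limit a raw prediction to the valid probability range [0, 1]."""
    return max(0.0, min(1.0, value))

# Hypothetical coefficients for age, stress_level, exercise_hours, sleep_hours
coefficients = [0.02, 0.30, -0.10, -0.15]
intercept = 0.5

# Made-up scaled inputs for one individual
raw = linear_predict(coefficients, intercept, [1.0, 2.5, -1.0, -2.0])
print(raw, clip_probability(raw))
```

With these made-up numbers the raw prediction is above 1, so clipping it keeps the value in a range that can be reported as a percentage.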