
Introduction: Decision Trees and Random Forests

In the ever-evolving world of artificial intelligence and data-driven decision-making, two robust algorithms stand out for their simplicity and effectiveness: Decision Trees and Random Forests. Their interpretability and accuracy make these machine learning techniques widely used for classification and regression tasks. In this blog, we will look at how these models work and share some insights from an implementation on the famous Iris dataset.

Decision Trees and Random Forests: The Foundation of Predictive Power

Which one should we look at first, Decision Trees or Random Forests? Since a Random Forest is built from many decision trees, let's start with Decision Trees.

Decision Trees

Key Steps in Implementing Decision Trees and Random Forests

  • Data Preparation: Load and preprocess the dataset, then separate the input features (X) from the target variable (y) (see the sketch after the imports below).
  • Training: Fit a Decision Tree classifier on the training data.
  • Evaluation: Assess model performance on unseen test data using metrics such as accuracy.

Import Libraries and the Iris Dataset

Code:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Load the built-in iris dataset as a DataFrame
df = sns.load_dataset("iris")
df.head()  # show the first five rows
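
The Data Preparation step above calls for separating the input features (X) from the target variable (y). A minimal sketch; the seaborn iris frame stores the label in its "species" column:

# Input features: the four sepal/petal measurements
x = df.drop(columns=["species"])
# Target variable: the flower species
y = df["species"]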

Code for plotting the decision tree:

from sklearn.tree import DecisionTreeClassifier, plot_tree

# Fit a decision tree on the full dataset and draw it
model = DecisionTreeClassifier().fit(x, y)
plot_tree(model, filled=True)
plt.show()

The model trained on the Iris data is plotted as a decision tree.
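
If you prefer a text rendering of the same tree, scikit-learn's export_text helper prints the learned split rules; a quick sketch:

from sklearn.tree import export_text

# Print the tree's if/else split rules with readable feature names
print(export_text(model, feature_names=list(x.columns)))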


Code Example for Calculating Accuracy:

Now it’s time to check the model’s accuracy. First, import the libraries needed to train and evaluate the decision tree model:

from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

Splitting data into training and testing sets

Code:

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

Training the Decision Tree

dt_model = DecisionTreeClassifier()
dt_model.fit(x_train, y_train)

Making predictions

dt_predictions = dt_model.predict(x_test)

Evaluating the model

dt_accuracy = accuracy_score(y_test, dt_predictions)
print("Accuracy of the Decision Tree model:", dt_accuracy)

Output: 1.00
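
A perfect score on one 20% split of a 150-row dataset can be optimistic. One way to sanity-check it is cross-validation; a minimal sketch using scikit-learn's cross_val_score:

from sklearn.model_selection import cross_val_score

# Average accuracy over five different train/test splits
scores = cross_val_score(DecisionTreeClassifier(random_state=42), x, y, cv=5)
print("Cross-validated accuracy:", scores.mean())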

Random Forests: The Ensemble Advantage

Implementation Highlights:

  • Ensemble Creation: A Random Forest trains many decision trees on random subsets of the data and combines their predictions by majority vote (see the hand-rolled sketch after this list).
  • Performance Evaluation: Validate the ensemble’s effectiveness by measuring its accuracy on held-out test data.
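
To make the ensemble idea concrete before reaching for RandomForestClassifier, here is a hand-rolled sketch of bagging (scikit-learn adds random feature selection on top of this internally); it reuses x_train, x_test, and y_train from the earlier split:

from collections import Counter

# Train ten trees, each on a bootstrap sample (rows drawn with replacement)
rng = np.random.default_rng(42)
trees = []
for _ in range(10):
    idx = rng.integers(0, len(x_train), size=len(x_train))
    trees.append(DecisionTreeClassifier().fit(x_train.iloc[idx], y_train.iloc[idx]))

# Each tree votes; the majority label wins for every test row
all_preds = np.array([tree.predict(x_test) for tree in trees])
voted = [Counter(column).most_common(1)[0][0] for column in all_preds.T]
print("Bagged accuracy:", accuracy_score(y_test, voted))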

Code Example for Calculating Accuracy:

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

Splitting data into training and testing sets:

Again, split the data into training and test sets:

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

Training the Random Forest model:

rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(x_train, y_train)

Making predictions:

rf_predictions = rf_model.predict(x_test)

Evaluating the model:

rf_accuracy = accuracy_score(y_test, rf_predictions)
print("Accuracy of the Random Forest model:", rf_accuracy)

Key Insights:

  • Decision Trees are easy to interpret: the plotted tree shows exactly which feature thresholds drive each prediction.
  • Random Forests trade some of that transparency for robustness by averaging many trees trained on different subsets of the data.
  • On a small, clean dataset like Iris both models reach very high accuracy; on noisier data the ensemble typically generalizes better.

Final Thoughts:

Decision Trees and Random Forests are two of the most approachable tools in machine learning. Start with a single tree to understand your data, and move to a forest when you need more robust predictions.
