Housing Price Prediction Model

Overview

In this project, I developed a machine learning model to predict housing prices. Using a dataset of housing attributes, the model estimates the median house value from features such as location, number of rooms, and median income.

In the end, my random forest model predicted housing prices with an R² score of about 0.82 on the held-out test set.

Key Components

Data Preprocessing

In preprocessing, I cleaned, normalized, and feature-engineered the data to prepare it for model training. Specifically, I did the following (a condensed code sketch follows the list):

  • eliminated rows with any null values using the .dropna() method
  • separated the dataset into train and test X, y datasets
  • studied the correlations between all the fields in the training data to engineer meaningful new features
  • applied natural logarithms to some skewed fields of the training dataset to give them roughly normal distributions
  • derived one-hot-encoded columns from the categorical data
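A minimal sketch of that flow, mirroring the notebook below (column names come from the project's housing.csv):

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.read_csv("housing.csv")
data.dropna(inplace=True)                               # drop rows with null values
X = data.drop(['median_house_value'], axis=1)           # features
y = data['median_house_value']                          # target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
train_data = X_train.join(y_train)
for col in ['total_rooms', 'total_bedrooms', 'population', 'households']:
    train_data[col] = np.log(train_data[col] + 1)       # tame the skewed count fields
one_hot = pd.get_dummies(train_data.ocean_proximity).astype(int)
train_data = train_data.join(one_hot).drop(['ocean_proximity'], axis=1)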

Model Selection

I chose LinearRegression and RandomForestRegressor from scikit-learn.

Results

Running LinearRegression().score() on the held-out test set gave ~0.67, while RandomForestRegressor().score() gave ~0.82 (the exact outputs appear in the notebook below). Clearly, the random forest model worked better.
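Note that score() for both estimators returns the coefficient of determination R², not a percentage accuracy. A minimal illustration with scikit-learn's r2_score (the y_true and y_pred values here are made-up):

from sklearn.metrics import r2_score

y_true = [200000, 150000, 310000]   # hypothetical true median house values
y_pred = [190000, 160000, 300000]   # hypothetical model predictions
print(r2_score(y_true, y_pred))     # R² = 1 - SS_res / SS_tot; 1.0 is a perfect fit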

Hyperparameter Tuning

I performed hyperparameter tuning using GridSearchCV, exploring different combinations of n_estimators and max_features to optimize the model's performance.

Tools and Libraries Used

  • Python: Primary programming language
  • Pandas: Data manipulation and analysis
  • Scikit-learn: Machine learning library
  • JupyterLab: Development environment
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
data = pd.read_csv("housing.csv")  # read csv data and store it into a dataframe
data.info() # concise summary of the dataframe. There are some null values in the total_bedrooms column; since there aren't many, we will just drop the rows containing them
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20640 entries, 0 to 20639
Data columns (total 10 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   longitude           20640 non-null  float64
 1   latitude            20640 non-null  float64
 2   housing_median_age  20640 non-null  float64
 3   total_rooms         20640 non-null  float64
 4   total_bedrooms      20433 non-null  float64
 5   population          20640 non-null  float64
 6   households          20640 non-null  float64
 7   median_income       20640 non-null  float64
 8   median_house_value  20640 non-null  float64
 9   ocean_proximity     20640 non-null  object 
dtypes: float64(9), object(1)
memory usage: 1.6+ MB
data.dropna(inplace=True) #drop any rows in the dataframe with even one null value. Modifies inplace
data.info() #now we can see that the null values are gone
<class 'pandas.core.frame.DataFrame'>
Index: 20433 entries, 0 to 20639
Data columns (total 10 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   longitude           20433 non-null  float64
 1   latitude            20433 non-null  float64
 2   housing_median_age  20433 non-null  float64
 3   total_rooms         20433 non-null  float64
 4   total_bedrooms      20433 non-null  float64
 5   population          20433 non-null  float64
 6   households          20433 non-null  float64
 7   median_income       20433 non-null  float64
 8   median_house_value  20433 non-null  float64
 9   ocean_proximity     20433 non-null  object 
dtypes: float64(9), object(1)
memory usage: 1.7+ MB
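Dropping the 207 incomplete rows is the simplest option and costs about 1% of the data. An alternative, not used in this project, would be to impute the missing total_bedrooms values instead, for example with scikit-learn's SimpleImputer:

from sklearn.impute import SimpleImputer

imputer = SimpleImputer(strategy="median")  # fill NaNs with the column median
data[['total_bedrooms']] = imputer.fit_transform(data[['total_bedrooms']])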
from sklearn.model_selection import train_test_split

X = data.drop(['median_house_value'], axis = 1) # everything except 'median_house_value' is what we are going to train on
y = data['median_house_value']  # 'median_house_value' is what we want to predict
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2) # create separate train and test datasets
train_data = X_train.join(y_train) # Create a complete training dataset. Combines based on common indices
train_data
|       | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | ocean_proximity | median_house_value |
|-------|-----------|----------|--------------------|-------------|----------------|------------|------------|---------------|-----------------|--------------------|
| 12390 | -116.44 | 33.74 | 5.0 | 846.0 | 249.0 | 117.0 | 67.0 | 7.9885 | INLAND | 403300.0 |
| 11214 | -117.91 | 33.82 | 32.0 | 1408.0 | 307.0 | 1331.0 | 284.0 | 3.7014 | <1H OCEAN | 179600.0 |
| 3971 | -118.58 | 34.19 | 35.0 | 2329.0 | 399.0 | 966.0 | 336.0 | 3.8839 | <1H OCEAN | 224900.0 |
| 13358 | -117.61 | 34.04 | 8.0 | 4116.0 | 766.0 | 1785.0 | 745.0 | 3.1672 | INLAND | 150200.0 |
| 988 | -121.86 | 37.70 | 13.0 | 9621.0 | 1344.0 | 4389.0 | 1391.0 | 6.6827 | INLAND | 313700.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3662 | -118.38 | 34.25 | 38.0 | 983.0 | 185.0 | 513.0 | 170.0 | 4.8816 | <1H OCEAN | 231500.0 |
| 14732 | -117.02 | 32.81 | 26.0 | 1998.0 | 301.0 | 874.0 | 305.0 | 5.4544 | <1H OCEAN | 180900.0 |
| 16058 | -122.49 | 37.76 | 52.0 | 1792.0 | 305.0 | 782.0 | 287.0 | 4.0391 | NEAR BAY | 332700.0 |
| 6707 | -118.15 | 34.14 | 27.0 | 1499.0 | 426.0 | 755.0 | 414.0 | 3.8750 | <1H OCEAN | 258300.0 |
| 13764 | -117.13 | 34.06 | 4.0 | 3078.0 | 510.0 | 1341.0 | 486.0 | 4.9688 | INLAND | 163200.0 |

16346 rows × 10 columns
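Note: without a random_state, train_test_split produces a different split on every run, so the scores later in the notebook will vary slightly between executions. A fixed seed (42 here is arbitrary) makes the split reproducible:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)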

train_data.hist()
[Figure: 3×3 grid of histograms for longitude, latitude, housing_median_age, total_rooms, total_bedrooms, population, households, median_income, and median_house_value]

numeric_data = train_data.select_dtypes(include=[np.number]) # new dataframe 'numeric_data' with only the numeric columns
corr_matrix = numeric_data.corr() # matrix of correlation values between all pairs of numeric fields
corr_matrix
|  | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value |
|--|-----------|----------|--------------------|-------------|----------------|------------|------------|---------------|--------------------|
| longitude | 1.000000 | -0.924710 | -0.103642 | 0.044318 | 0.069334 | 0.097529 | 0.054852 | -0.016018 | -0.043630 |
| latitude | -0.924710 | 1.000000 | 0.007825 | -0.035261 | -0.065960 | -0.106548 | -0.069899 | -0.078715 | -0.145993 |
| housing_median_age | -0.103642 | 0.007825 | 1.000000 | -0.362495 | -0.324781 | -0.301916 | -0.306993 | -0.124138 | 0.100925 |
| total_rooms | 0.044318 | -0.035261 | -0.362495 | 1.000000 | 0.931237 | 0.864408 | 0.919232 | 0.199789 | 0.132022 |
| total_bedrooms | 0.069334 | -0.065960 | -0.324781 | 0.931237 | 1.000000 | 0.885152 | 0.978566 | -0.004929 | 0.048961 |
| population | 0.097529 | -0.106548 | -0.301916 | 0.864408 | 0.885152 | 1.000000 | 0.915534 | 0.010363 | -0.025150 |
| households | 0.054852 | -0.069899 | -0.306993 | 0.919232 | 0.978566 | 0.915534 | 1.000000 | 0.016691 | 0.064427 |
| median_income | -0.016018 | -0.078715 | -0.124138 | 0.199789 | -0.004929 | 0.010363 | 0.016691 | 1.000000 | 0.686716 |
| median_house_value | -0.043630 | -0.145993 | 0.100925 | 0.132022 | 0.048961 | -0.025150 | 0.064427 | 0.686716 | 1.000000 |
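The column that matters most is the last one, the correlations with the target. A convenient way to read it off sorted by strength:

corr_matrix['median_house_value'].sort_values(ascending=False)
# median_income stands out at ~0.69; the raw count fields correlate only weakly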
plt.figure(figsize=(15,8))
sns.heatmap(corr_matrix, annot=True,cmap= "YlGnBu") #Creating a figure with the correlation values
[Figure: annotated correlation heatmap of the numeric fields]

# Several fields have heavily skewed distributions, which is not ideal for training an ML model.
# Taking the natural logarithm of these columns (np.log(x + 1), i.e. np.log1p) preserves the
# correlations but yields roughly normal distributions.
train_data['total_rooms'] = np.log(train_data['total_rooms'] + 1)
train_data['total_bedrooms'] = np.log(train_data['total_bedrooms'] + 1)
train_data['population'] = np.log(train_data['population'] + 1)
train_data['households'] = np.log(train_data['households'] + 1)
train_data.hist(figsize=(15,8))  # the histograms now show roughly normal distributions
[Figure: the same 3×3 histogram grid after the log transform; total_rooms, total_bedrooms, population, and households are now roughly bell-shaped]

dummies = pd.get_dummies(train_data.ocean_proximity) # ocean_proximity is categorical data, so we one-hot encode it
one_hot_encoded = dummies.astype(int)
train_data = train_data.join(one_hot_encoded).drop(['ocean_proximity'], axis = 1) # join the one-hot-encoded columns with train_data and drop the original ocean_proximity column
train_data
|       | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | <1H OCEAN | INLAND | ISLAND | NEAR BAY | NEAR OCEAN |
|-------|-----------|----------|--------------------|-------------|----------------|------------|------------|---------------|--------------------|-----------|--------|--------|----------|------------|
| 12390 | -116.44 | 33.74 | 5.0 | 6.741701 | 5.521461 | 4.770685 | 4.219508 | 7.9885 | 403300.0 | 0 | 1 | 0 | 0 | 0 |
| 11214 | -117.91 | 33.82 | 32.0 | 7.250636 | 5.730100 | 7.194437 | 5.652489 | 3.7014 | 179600.0 | 1 | 0 | 0 | 0 | 0 |
| 3971 | -118.58 | 34.19 | 35.0 | 7.753624 | 5.991465 | 6.874198 | 5.820083 | 3.8839 | 224900.0 | 1 | 0 | 0 | 0 | 0 |
| 13358 | -117.61 | 34.04 | 8.0 | 8.322880 | 6.642487 | 7.487734 | 6.614726 | 3.1672 | 150200.0 | 0 | 1 | 0 | 0 | 0 |
| 988 | -121.86 | 37.70 | 13.0 | 9.171807 | 7.204149 | 8.387085 | 7.238497 | 6.6827 | 313700.0 | 0 | 1 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3662 | -118.38 | 34.25 | 38.0 | 6.891626 | 5.225747 | 6.242223 | 5.141664 | 4.8816 | 231500.0 | 1 | 0 | 0 | 0 | 0 |
| 14732 | -117.02 | 32.81 | 26.0 | 7.600402 | 5.710427 | 6.774224 | 5.723585 | 5.4544 | 180900.0 | 1 | 0 | 0 | 0 | 0 |
| 16058 | -122.49 | 37.76 | 52.0 | 7.491645 | 5.723585 | 6.663133 | 5.662960 | 4.0391 | 332700.0 | 0 | 0 | 0 | 1 | 0 |
| 6707 | -118.15 | 34.14 | 27.0 | 7.313220 | 6.056784 | 6.628041 | 6.028279 | 3.8750 | 258300.0 | 1 | 0 | 0 | 0 | 0 |
| 13764 | -117.13 | 34.06 | 4.0 | 8.032360 | 6.236370 | 7.201916 | 6.188264 | 4.9688 | 163200.0 | 0 | 1 | 0 | 0 | 0 |

16346 rows × 14 columns

plt.figure(figsize=(15,8))
sns.heatmap(train_data.corr(), annot=True,cmap= "YlGnBu")
[Figure: correlation heatmap including the one-hot-encoded ocean_proximity columns]

plt.figure(figsize=(15,8))
sns.scatterplot(x="latitude", y="longitude", data=train_data, hue="median_house_value", palette="coolwarm") # map the latitude and longitude of the districts, highlighting the ones with high prices

[Figure: scatter plot of latitude vs. longitude colored by median_house_value]

train_data['bedroom_ratio'] = train_data['total_bedrooms'] / train_data['total_rooms'] # new feature: how many bedrooms are there for every room in the district?
train_data['household_rooms'] = train_data['total_rooms'] / train_data['households'] # new feature: how many rooms are there per household?
plt.figure(figsize=(15,8))
sns.heatmap(train_data.corr(), annot=True,cmap= "YlGnBu")
[Figure: correlation heatmap including the engineered bedroom_ratio and household_rooms features]

from sklearn.linear_model import LinearRegression

X_train, y_train  = train_data.drop(['median_house_value'], axis = 1), train_data['median_house_value']

reg = LinearRegression()

reg.fit(X_train, y_train)
LinearRegression()
train_data
|       | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | <1H OCEAN | INLAND | ISLAND | NEAR BAY | NEAR OCEAN | bedroom_ratio | household_rooms |
|-------|-----------|----------|--------------------|-------------|----------------|------------|------------|---------------|--------------------|-----------|--------|--------|----------|------------|---------------|-----------------|
| 12390 | -116.44 | 33.74 | 5.0 | 6.741701 | 5.521461 | 4.770685 | 4.219508 | 7.9885 | 403300.0 | 0 | 1 | 0 | 0 | 0 | 0.819001 | 1.597746 |
| 11214 | -117.91 | 33.82 | 32.0 | 7.250636 | 5.730100 | 7.194437 | 5.652489 | 3.7014 | 179600.0 | 1 | 0 | 0 | 0 | 0 | 0.790289 | 1.282733 |
| 3971 | -118.58 | 34.19 | 35.0 | 7.753624 | 5.991465 | 6.874198 | 5.820083 | 3.8839 | 224900.0 | 1 | 0 | 0 | 0 | 0 | 0.772731 | 1.332219 |
| 13358 | -117.61 | 34.04 | 8.0 | 8.322880 | 6.642487 | 7.487734 | 6.614726 | 3.1672 | 150200.0 | 0 | 1 | 0 | 0 | 0 | 0.798100 | 1.258235 |
| 988 | -121.86 | 37.70 | 13.0 | 9.171807 | 7.204149 | 8.387085 | 7.238497 | 6.6827 | 313700.0 | 0 | 1 | 0 | 0 | 0 | 0.785467 | 1.267087 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3662 | -118.38 | 34.25 | 38.0 | 6.891626 | 5.225747 | 6.242223 | 5.141664 | 4.8816 | 231500.0 | 1 | 0 | 0 | 0 | 0 | 0.758275 | 1.340349 |
| 14732 | -117.02 | 32.81 | 26.0 | 7.600402 | 5.710427 | 6.774224 | 5.723585 | 5.4544 | 180900.0 | 1 | 0 | 0 | 0 | 0 | 0.751332 | 1.327909 |
| 16058 | -122.49 | 37.76 | 52.0 | 7.491645 | 5.723585 | 6.663133 | 5.662960 | 4.0391 | 332700.0 | 0 | 0 | 0 | 1 | 0 | 0.763996 | 1.322920 |
| 6707 | -118.15 | 34.14 | 27.0 | 7.313220 | 6.056784 | 6.628041 | 6.028279 | 3.8750 | 258300.0 | 1 | 0 | 0 | 0 | 0 | 0.828197 | 1.213152 |
| 13764 | -117.13 | 34.06 | 4.0 | 8.032360 | 6.236370 | 7.201916 | 6.188264 | 4.9688 | 163200.0 | 0 | 1 | 0 | 0 | 0 | 0.776406 | 1.297999 |

16346 rows × 16 columns

test_data = X_test.join(y_test)

test_data['total_rooms'] = np.log(test_data['total_rooms'] + 1)
test_data['total_bedrooms'] = np.log(test_data['total_bedrooms'] + 1)
test_data['population'] = np.log(test_data['population'] + 1)
test_data['households'] = np.log(test_data['households'] + 1)

test_dummies = pd.get_dummies(test_data.ocean_proximity)

test_one_hot_encoded = test_dummies.astype(int)
test_data = test_data.join(test_one_hot_encoded).drop(['ocean_proximity'], axis = 1) 

test_data['bedroom_ratio'] = test_data['total_bedrooms'] / test_data['total_rooms'] 
test_data['household_rooms'] = test_data['total_rooms'] / test_data['households'] 
new_X_test, new_y_test = test_data.drop(['median_house_value'], axis = 1), test_data['median_house_value']
new_X_test
X_train
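One caveat with encoding the test set separately: pd.get_dummies only creates columns for the categories it actually sees, so a category absent from the test split (the rare ISLAND value, for instance) would leave the test columns misaligned with the training columns. A defensive sketch, using the training columns as the reference:

new_X_test = new_X_test.reindex(columns=X_train.columns, fill_value=0)  # absent categories become all-zero columns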
reg.score(new_X_test, new_y_test)
0.6746783446725809

from sklearn.ensemble import RandomForestRegressor

forest = RandomForestRegressor()
forest.fit(X_train, y_train)
forest.score(new_X_test, new_y_test)
0.8206810649678673
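As a quick sanity check on the fitted forest, its feature_importances_ attribute shows which inputs drove the predictions; on this dataset median_income should rank near the top, consistent with the correlation matrix:

importances = pd.Series(forest.feature_importances_, index=X_train.columns)
print(importances.sort_values(ascending=False))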
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [3, 10, 30],
    "max_features": [2, 4, 6, 8]
}

grid_search = GridSearchCV(forest, param_grid, cv=5, scoring="neg_mean_squared_error", return_train_score=True)

grid_search.fit(X_train, y_train)  # tune on the training data, not the held-out test set
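Once the search finishes, the winning settings and the refit model can be pulled out and evaluated once on the held-out test set; a minimal sketch (the printed values are illustrative, not actual results):

print(grid_search.best_params_)             # e.g. {'max_features': 6, 'n_estimators': 30}
best_forest = grid_search.best_estimator_   # refit on the full training data by default
print(best_forest.score(new_X_test, new_y_test))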
