Loading the scikit-learn Boston housing dataset

import numpy as np
import pandas as pd

from sklearn import datasets

Load Boston Housing Dataset

The Boston housing dataset is a well-known dataset from the 1970s. It contains 506 observations of housing prices around Boston, each described by 13 features, and is often used in regression examples.

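# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this call requires an older release.
boston = datasets.load_boston()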
# Load features
X = boston.data

# Load target
y = boston.target

# View first observation
X[0]
array([6.320e-03, 1.800e+01, 2.310e+00, 0.000e+00, 5.380e-01, 6.575e+00,
       6.520e+01, 4.090e+00, 1.000e+00, 2.960e+02, 1.530e+01, 3.969e+02,
       4.980e+00])

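The features differ by several orders of magnitude (in the first observation, the smallest value is about 0.006 and the largest about 397), so it is beneficial to standardize them.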
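
As a minimal sketch of what standardizing means here, the snippet below rescales each column to zero mean and unit standard deviation using plain NumPy. The array `X_small` and the helper `standardize` are hypothetical names introduced for illustration; the values are the first five rows of the CRIM, RM and TAX columns shown in the DataFrame preview further down, used as a tiny stand-in for the full `X`.

```python
import numpy as np

# First five rows of three features (CRIM, RM, TAX), copied from the
# DataFrame preview in this document; a small stand-in for the full X.
X_small = np.array([
    [0.00632, 6.575, 296.0],
    [0.02731, 6.421, 242.0],
    [0.02729, 7.185, 242.0],
    [0.03237, 6.998, 222.0],
    [0.06905, 7.147, 222.0],
])

def standardize(X):
    """Rescale each column to zero mean and unit (population) std."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)  # ddof=0; a constant column would give std == 0
    return (X - mean) / std

X_std = standardize(X_small)

# Column means are now (numerically) zero and column stds are one.
print(X_std.mean(axis=0))
print(X_std.std(axis=0))
```

On the full dataset, scikit-learn's `StandardScaler().fit_transform(X)` performs the same kind of column-wise rescaling, and it also copes with constant columns, which this sketch does not.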

Load Boston Housing Dataset as a DataFrame (option 2)

data = np.c_[boston.target, boston.data]
columns = ['target'] + list(boston.feature_names)  # feature_names is a NumPy array
df = pd.DataFrame(data=data, columns=columns)
df.head()

   target     CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD    TAX  PTRATIO       B  LSTAT
0    24.0  0.00632  18.0   2.31   0.0  0.538  6.575  65.2  4.0900  1.0  296.0     15.3  396.90   4.98
1    21.6  0.02731   0.0   7.07   0.0  0.469  6.421  78.9  4.9671  2.0  242.0     17.8  396.90   9.14
2    34.7  0.02729   0.0   7.07   0.0  0.469  7.185  61.1  4.9671  2.0  242.0     17.8  392.83   4.03
3    33.4  0.03237   0.0   2.18   0.0  0.458  6.998  45.8  6.0622  3.0  222.0     18.7  394.63   2.94
4    36.2  0.06905   0.0   2.18   0.0  0.458  7.147  54.2  6.0622  3.0  222.0     18.7  396.90   5.33