Loading scikit-learn Iris dataset

from sklearn import datasets
import numpy as np
import pandas as pd

Load Iris Dataset

The Iris flower dataset is one of the best-known datasets for classification.

The dataset contains:

  • 3 classes (species of flowers)
  • 50 observations per class
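The counts above can be checked directly against the loaded data. The snippet below is a small sketch, not part of the original notebook; the variable names `n_samples`, `n_features`, and `class_counts` are chosen here for illustration.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# Rows are observations, columns are the measured features
n_samples, n_features = iris.data.shape

# Count how many observations carry each integer class label (0, 1, 2)
class_counts = np.bincount(iris.target)

print(n_samples, n_features)
for name, count in zip(iris.target_names, class_counts):
    print(str(name), int(count))
```

`np.bincount` works here because the labels are small non-negative integers; for string labels something like `pandas.Series.value_counts` would be the more natural choice.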
# Load Iris dataset
iris = datasets.load_iris()
dir(iris)
['DESCR', 'data', 'feature_names', 'filename', 'target', 'target_names']
# Create features
X = iris.data

# Create label
y = iris.target

# View first row
X[0]
array([5.1, 3.5, 1.4, 0.2])
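The raw array does not say what each number means. As a rough sketch, the first row can be paired with `iris.feature_names`, and its integer label decoded through `iris.target_names`; `first_row` and `first_species` are names made up for this example.

```python
from sklearn import datasets

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Pair each measurement in the first row with its feature name
first_row = dict(zip(iris.feature_names, X[0].tolist()))

# Look up the species name that the integer label stands for
first_species = str(iris.target_names[y[0]])

print(first_row)
print(first_species)
```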

Option 2: Load as a pandas DataFrame

# np.c_ stacks arrays column-wise (concatenation along the second axis);
# here it joins iris['data'] and iris['target'] into a single array.
# For the pandas columns argument, append a one-element list to
# iris['feature_names']; the label name can be anything you like
# (the original dataset would probably call it 'Species').
df = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                  columns=iris['feature_names'] + ['target'])

df.head()

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target
0                5.1               3.5                1.4               0.2     0.0
1                4.9               3.0                1.4               0.2     0.0
2                4.7               3.2                1.3               0.2     0.0
3                4.6               3.1                1.5               0.2     0.0
4                5.0               3.6                1.4               0.2     0.0
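Because np.c_ packs everything into one float array, the target shows up as 0.0 rather than 0. One possible alternative, sketched here under the assumption that a readable label column is wanted, builds the frame from the features alone and then attaches the species names as a categorical column. The column name `species` is a choice made for this example, not something the dataset defines.

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()

# Build the frame from the feature matrix only, so the labels are not cast to float
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Turn the integer codes 0, 1, 2 into a categorical column of species names
df["species"] = pd.Categorical.from_codes(iris.target,
                                          categories=list(iris.target_names))

print(df.head())
print(df["species"].value_counts())
```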

Load as frame (from scikit-learn 0.23)

From scikit-learn 0.23 onward, an even better way is to pass as_frame=True to load_iris, which returns the data as a pandas DataFrame instead of a NumPy array.

import sklearn
sklearn.__version__
'0.21.1'
# Commented out because as_frame requires scikit-learn 0.23 or later
# iris = datasets.load_iris(as_frame=True)
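For code that has to run on both old and new installs, one option is to try as_frame=True and fall back to building the frame by hand. The helper below is a sketch under that assumption; `load_iris_frame` is a hypothetical name, not a scikit-learn function, and the fallback relies on older versions raising a TypeError for the unknown keyword argument.

```python
import pandas as pd
from sklearn import datasets


def load_iris_frame():
    """Return the Iris features plus a 'target' column as one DataFrame."""
    try:
        # Available from scikit-learn 0.23: .frame holds features and target together
        return datasets.load_iris(as_frame=True).frame
    except TypeError:
        # Older versions reject the as_frame keyword, so assemble the frame manually
        iris = datasets.load_iris()
        frame = pd.DataFrame(iris.data, columns=iris.feature_names)
        frame["target"] = iris.target
        return frame


df = load_iris_frame()
print(df.shape)
print(df.dtypes)
```

Either path keeps the target as integers, unlike the np.c_ approach, which casts it to float.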