My Cats vs. Dogs solution using deep learning with a transfer-learning strategy

This post introduces a transfer-learning strategy: start from a network pre-trained on ImageNet, then fine-tune it to tell cats from dogs. The original dataset, 12,500 cat pictures and 12,500 dog pictures, was obtained from Kaggle Dogs vs. Cats Redux.

The framework for this project, built with Keras on the TensorFlow backend, was adapted from a similar solution using ResNet50 and from the official Keras guide.

1. Import modules

from __future__ import print_function
from __future__ import absolute_import

import os
import shutil
import random
import warnings

import cv2
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
np.random.seed(0)
from   tqdm import tqdm
from   sklearn.model_selection import train_test_split

import tensorflow as tf
from   keras           import backend as K
from   keras.models    import Model
from   keras.layers    import Dense, Input, BatchNormalization, Activation, merge, Dropout
from   keras.layers    import Conv2D, SeparableConv2D, MaxPooling2D, GlobalAveragePooling2D
from   keras.callbacks import ModelCheckpoint
from   keras.preprocessing       import image
from   keras.preprocessing.image import ImageDataGenerator
from   keras.engine.topology     import get_source_inputs
from   keras.utils.data_utils    import get_file
from   keras.applications.imagenet_utils import decode_predictions, _obtain_input_shape

%matplotlib inline

Using TensorFlow backend.

2. Preprocessing the input images

Let’s look at the file names in the training and testing directories:

root_prefix = "/hdfs/huboqiang/kaggle/CatDot"

train_filenames = os.listdir('%s/train/' % (root_prefix))
test_filenames  = os.listdir('%s/test/'  % (root_prefix))
print(train_filenames[0:10])
print(test_filenames[0:10])

['dog.11851.jpg', 'dog.4195.jpg', 'dog.8220.jpg', 'cat.8544.jpg', 'cat.619.jpg', 'dog.1116.jpg', 'dog.8316.jpg', 'cat.3990.jpg', 'cat.2850.jpg', 'cat.847.jpg']

['11399.jpg', '401.jpg', '10173.jpg', '3407.jpg', '11926.jpg', '1516.jpg', '4935.jpg', '7063.jpg', '4693.jpg', '7005.jpg']

Let’s check the total number of images in the training and testing sets:

train_cat = list(filter(lambda x: x.split(".")[0] == "cat", train_filenames))
train_dog = list(filter(lambda x: x.split(".")[0] == "dog", train_filenames))
x = ['train_cat', 'train_dog', 'test']
y = [len(train_cat), len(train_dog), len(test_filenames)]
ax = sns.barplot(x=x, y=y)

[Bar plot: image counts for train_cat, train_dog, and the test set]

The numbers of cat and dog images are roughly equal, and each group contains more than 10,000 images.

The training set was therefore split further: 90% for training the model and 10% for validation.

my_train, my_cv = train_test_split(train_filenames, test_size=0.1, random_state=0)
print(len(my_train), len(my_cv))

Output: 22500 2500
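Note that this split is not stratified, which is why the cat/dog counts in the two splits (shown in the tree output below) are close to, but not exactly, 50/50. If an exact class ratio is desired, train_test_split accepts a stratify argument; a sketch:

labels = [f.split(".")[0] for f in train_filenames]   # 'cat' or 'dog'
my_train, my_cv = train_test_split(train_filenames, test_size=0.1,
                                   random_state=0, stratify=labels)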

We then organized the files into the directory layout Keras expects, using soft links:

def remove_and_create_class(dirname):
    if os.path.exists(dirname):
        shutil.rmtree(dirname)
    os.mkdir(dirname)
    os.mkdir(os.path.join(dirname, 'cat'))
    os.mkdir(os.path.join(dirname, 'dog'))

remove_and_create_class('%s/mytrain' % root_prefix)
remove_and_create_class('%s/myvalid' % root_prefix)

# Link each image into <split>/<species>/ so that flow_from_directory
# can infer the class label from the directory name.
for subset, dirname in [(my_train, 'mytrain'), (my_cv, 'myvalid')]:
    for filename in subset:
        species = filename.split(".")[0]  # 'cat' or 'dog'
        os.symlink('%s/train/%s' % (root_prefix, filename),
                   '%s/%s/%s/%s' % (root_prefix, dirname, species, filename))

The resulting directory structure:

tree --filelimit 10 mytrain myvalid 

Results:

mytrain
├── cat [11238 entries exceeds filelimit, not opening dir]
└── dog [11262 entries exceeds filelimit, not opening dir]
myvalid
├── cat [1262 entries exceeds filelimit, not opening dir]
└── dog [1238 entries exceeds filelimit, not opening dir]

4 directories, 0 files

In order to make the most of our training examples, we “augment” them via a number of random transformations, so that the model never sees the exact same picture twice. This helps prevent overfitting and helps the model generalize better.

A generator is used to stream batches of images from the filesystem at a given batch size.

The augmentation options follow the official Keras guide (a full example appears after the list), where:

  • rotation_range is a value in degrees (0-180), a range within which to randomly rotate pictures
  • width_shift and height_shift are ranges (as a fraction of total width or height) within which to randomly translate pictures vertically or horizontally
  • rescale is a value by which we multiply the data before any other processing. Our original images consist of RGB coefficients in the 0-255 range, but such values would be too high for our model to process (given a typical learning rate), so we target values between 0 and 1 instead by scaling with a 1/255 factor.
  • shear_range is for randomly applying shearing transformations
  • zoom_range is for randomly zooming inside pictures
  • horizontal_flip is for randomly flipping half of the images horizontally – relevant when there is no assumption of horizontal asymmetry (e.g. real-world pictures).
  • fill_mode is the strategy used for filling in newly created pixels, which can appear after a rotation or a width/height shift.
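As a sketch, a generator exercising all of the options above might look like this (parameter values are illustrative, in the spirit of the Keras guide; the actual generators below use only a subset):

datagen = ImageDataGenerator(
    rotation_range=40,       # random rotations of up to 40 degrees
    width_shift_range=0.2,   # horizontal shifts, as a fraction of width
    height_shift_range=0.2,  # vertical shifts, as a fraction of height
    rescale=1.0/255,         # scale 0-255 RGB values into [0, 1]
    shear_range=0.2,         # random shearing transformations
    zoom_range=0.2,          # random zooms inside pictures
    horizontal_flip=True,    # random horizontal flips
    fill_mode='nearest')     # fill pixels created by shifts/rotations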

3. Setting up the input generators for training

image_width  = 299
image_height = 299
image_size = (image_width, image_height)

train_datagen = ImageDataGenerator(
    rescale=1.0/255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True
)

train_generator = train_datagen.flow_from_directory(
        '%s/mytrain' % (root_prefix),
        target_size=image_size,  # all images will be resized to 299x299
        batch_size=16,
        class_mode='binary')

validation_datagen = ImageDataGenerator(rescale=1.0/255)
validation_generator = validation_datagen.flow_from_directory(
        '%s/myvalid' % (root_prefix),
        target_size=image_size, 
        batch_size=16,
        class_mode='binary')

Found 22500 images belonging to 2 classes.

Found 2500 images belonging to 2 classes.
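The mapping from class name to label index can be verified through the generator’s class_indices attribute; flow_from_directory assigns indices alphabetically, so cat maps to 0 and dog to 1, matching the label == 1 check below:

print(train_generator.class_indices)   # {'cat': 0, 'dog': 1}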

x, y = next(train_generator)

plt.figure(figsize=(16, 8))
for i, (img, label) in enumerate(zip(x, y)):
    plt.subplot(3, 6, i+1)
    if label == 1:
        plt.title('dog')
    else:
        plt.title('cat')
    plt.axis('off')
    plt.imshow(img, interpolation="nearest")

[Figure: one batch of 16 augmented training images, each titled cat or dog]

4. Building the network for training

The model is adapted from https://github.com/fchollet/deep-learning-models/blob/master/xception.py (note that it uses legacy Keras 1 layer arguments such as subsample, border_mode, and merge).

prefix = "https://github.com/fchollet/deep-learning-models/releases/download/v0.4"
TF_WEIGHTS_PATH        = '%s/xception_weights_tf_dim_ordering_tf_kernels.h5'       % (prefix)
TF_WEIGHTS_PATH_NO_TOP = '%s/xception_weights_tf_dim_ordering_tf_kernels_notop.h5' % (prefix)


# Instantiate the Xception architecture, optionally loading weights pre-trained on ImageNet. 
def Xception(include_top=True, weights='imagenet'):
    input_shape = (None, None, 3)
    img_input = Input(shape=input_shape)

    x = Conv2D(32, 3, 3, subsample=(2, 2), bias=False, name='block1_conv1')(img_input)
    x = BatchNormalization(name='block1_conv1_bn')(x)
    x = Activation('relu', name='block1_conv1_act')(x)
    x = Conv2D(64, 3, 3, bias=False, name='block1_conv2')(x)
    x = BatchNormalization(name='block1_conv2_bn')(x)
    x = Activation('relu', name='block1_conv2_act')(x)

    residual = Conv2D(128, 1, 1, subsample=(2, 2),
                      border_mode='same', bias=False)(x)
    residual = BatchNormalization()(residual)

    x = SeparableConv2D(128, 3, 3, border_mode='same', bias=False, name='block2_sepconv1')(x)
    x = BatchNormalization(name='block2_sepconv1_bn')(x)
    x = Activation('relu', name='block2_sepconv2_act')(x)
    x = SeparableConv2D(128, 3, 3, border_mode='same', bias=False, name='block2_sepconv2')(x)
    x = BatchNormalization(name='block2_sepconv2_bn')(x)

    x = MaxPooling2D((3, 3), strides=(2, 2), border_mode='same', name='block2_pool')(x)
    x = merge([x, residual], mode='sum')

    residual = Conv2D(256, 1, 1, subsample=(2, 2),
                      border_mode='same', bias=False)(x)
    residual = BatchNormalization()(residual)

    x = Activation('relu', name='block3_sepconv1_act')(x)
    x = SeparableConv2D(256, 3, 3, border_mode='same', bias=False, name='block3_sepconv1')(x)
    x = BatchNormalization(name='block3_sepconv1_bn')(x)
    x = Activation('relu', name='block3_sepconv2_act')(x)
    x = SeparableConv2D(256, 3, 3, border_mode='same', bias=False, name='block3_sepconv2')(x)
    x = BatchNormalization(name='block3_sepconv2_bn')(x)

    x = MaxPooling2D((3, 3), strides=(2, 2), border_mode='same', name='block3_pool')(x)
    x = merge([x, residual], mode='sum')

    residual = Conv2D(728, 1, 1, subsample=(2, 2),
                      border_mode='same', bias=False)(x)
    residual = BatchNormalization()(residual)

    x = Activation('relu', name='block4_sepconv1_act')(x)
    x = SeparableConv2D(728, 3, 3, border_mode='same', bias=False, name='block4_sepconv1')(x)
    x = BatchNormalization(name='block4_sepconv1_bn')(x)
    x = Activation('relu', name='block4_sepconv2_act')(x)
    x = SeparableConv2D(728, 3, 3, border_mode='same', bias=False, name='block4_sepconv2')(x)
    x = BatchNormalization(name='block4_sepconv2_bn')(x)

    x = MaxPooling2D((3, 3), strides=(2, 2), border_mode='same', name='block4_pool')(x)
    x = merge([x, residual], mode='sum')

    for i in range(8):
        residual = x
        prefix = 'block' + str(i + 5)
        x = Activation('relu', name=prefix + '_sepconv1_act')(x)
        x = SeparableConv2D(728, 3, 3, border_mode='same', bias=False, name=prefix + '_sepconv1')(x)
        x = BatchNormalization(name=prefix + '_sepconv1_bn')(x)
        x = Activation('relu', name=prefix + '_sepconv2_act')(x)
        x = SeparableConv2D(728, 3, 3, border_mode='same', bias=False, name=prefix + '_sepconv2')(x)
        x = BatchNormalization(name=prefix + '_sepconv2_bn')(x)
        x = Activation('relu', name=prefix + '_sepconv3_act')(x)
        x = SeparableConv2D(728, 3, 3, border_mode='same', bias=False, name=prefix + '_sepconv3')(x)
        x = BatchNormalization(name=prefix + '_sepconv3_bn')(x)

        x = merge([x, residual], mode='sum')

    residual = Conv2D(1024, 1, 1, subsample=(2, 2),
                      border_mode='same', bias=False)(x)
    residual = BatchNormalization()(residual)

    x = Activation('relu', name='block13_sepconv1_act')(x)
    x = SeparableConv2D(728, 3, 3, border_mode='same', bias=False, name='block13_sepconv1')(x)
    x = BatchNormalization(name='block13_sepconv1_bn')(x)
    x = Activation('relu', name='block13_sepconv2_act')(x)
    x = SeparableConv2D(1024, 3, 3, border_mode='same', bias=False, name='block13_sepconv2')(x)
    x = BatchNormalization(name='block13_sepconv2_bn')(x)

    x = MaxPooling2D((3, 3), strides=(2, 2), border_mode='same', name='block13_pool')(x)
    x = merge([x, residual], mode='sum')

    x = SeparableConv2D(1536, 3, 3, border_mode='same', bias=False, name='block14_sepconv1')(x)
    x = BatchNormalization(name='block14_sepconv1_bn')(x)
    x = Activation('relu', name='block14_sepconv1_act')(x)

    x = SeparableConv2D(2048, 3, 3, border_mode='same', bias=False, name='block14_sepconv2')(x)
    x = BatchNormalization(name='block14_sepconv2_bn')(x)
    x = Activation('relu', name='block14_sepconv2_act')(x)

    if include_top:
        x = GlobalAveragePooling2D(name='avg_pool')(x)
        x = Dense(1000, activation='softmax', name='predictions')(x)

    # Create model
    model = Model(img_input, x)

    # load weights
    if weights == 'imagenet':
        if include_top:
            weights_path = get_file('xception_weights_tf_dim_ordering_tf_kernels.h5',
                                    TF_WEIGHTS_PATH,
                                    cache_subdir='models')
        else:
            weights_path = get_file('xception_weights_tf_dim_ordering_tf_kernels_notop.h5',
                                    TF_WEIGHTS_PATH_NO_TOP,
                                    cache_subdir='models')
        model.load_weights(weights_path)

    return model


base_model = Xception(include_top=False, weights='imagenet')
x = GlobalAveragePooling2D(name='avg_pool')(base_model.output)
x = Dropout(0.5)(x)
x = Dense(1, activation='sigmoid', name='output')(x)

Build up the full model, freeze the convolutional layers (keeping the downloaded ImageNet-trained parameters), and train only the newly added top layers.

top_num = 2
model   = Model(input=base_model.input, output=x)

for layer in model.layers[:-top_num]:
    layer.trainable = False

for layer in model.layers[-top_num:]:
    layer.trainable = True
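As a quick sanity check (an addition, not from the original notebook), count how many layers remain trainable after the freeze:

# Everything except the last top_num layers should now be frozen
n_trainable = sum(1 for layer in model.layers if layer.trainable)
print('%d of %d layers trainable' % (n_trainable, len(model.layers)))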

Define the loss function as binary cross-entropy and the optimizer as Adadelta. Train the model for 50 epochs and save the checkpoint with the lowest validation loss.

5. Training the network on a GPU

model.compile(loss='binary_crossentropy', optimizer='adadelta', metrics=['accuracy'])
best_model = ModelCheckpoint("xception_best.h5", monitor='val_loss', verbose=0, save_best_only=True)

model.fit_generator(
        train_generator,
        samples_per_epoch=2048,   # samples drawn per epoch (Keras 1 semantics)
        nb_epoch=50,
        validation_data=validation_generator,
        nb_val_samples=1024,      # validation samples evaluated per epoch
        callbacks=[best_model])

Results:

Epoch 1/50

2048/2048 [==============================] - 71s - loss: 0.4697 - acc: 0.8223 - val_loss: 0.2908 - val_acc: 0.9619

Epoch 2/50

2048/2048 [==============================] - 66s - loss: 0.2697 - acc: 0.9355 - val_loss: 0.2038 - val_acc: 0.9756

Epoch 3/50

2048/2048 [==============================] - 59s - loss: 0.1997 - acc: 0.9600 - val_loss: 0.1425 - val_acc: 0.9805

Epoch 4/50

2048/2048 [==============================] - 56s - loss: 0.1684 - acc: 0.9600 - val_loss: 0.1283 - val_acc: 0.9756

...

Epoch 49/50

2048/2048 [==============================] - 40s - loss: 0.0792 - acc: 0.9712 - val_loss: 0.0354 - val_acc: 0.9922

Epoch 50/50

2048/2048 [==============================] - 40s - loss: 0.0758 - acc: 0.9761 - val_loss: 0.0287 - val_acc: 0.9941

GPU status during training (nvidia-smi):

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48                 Driver Version: 367.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN X (Pascal)    Off  | 0000:01:00.0      On |                  N/A |
| 50%   83C    P2   162W / 250W |  11706MiB / 12186MiB |     38%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     27188    G   /usr/bin/Xorg                                   69MiB |
|    0     27236    G   gnome-shell                                     11MiB |
|    0     29952    C   /home/huboqiang/anaconda2/bin/python         11621MiB |
+-----------------------------------------------------------------------------+

Save the model architecture (to_json stores the structure only; the best weights were already checkpointed to xception_best.h5):

with open('xception.json', 'w') as f:
    f.write(model.to_json())
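To rebuild the model later, combine the JSON architecture with the checkpointed weights; a minimal sketch using standard Keras calls:

from keras.models import model_from_json

# Restore the architecture from JSON, then load the best checkpoint
with open('xception.json') as f:
    reloaded = model_from_json(f.read())
reloaded.load_weights('xception_best.h5')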

6. Predicting on the test set

model.load_weights('xception_best.h5')
def get_image(index):
    img = cv2.imread('/hdfs/huboqiang/kaggle/CatDot/test/%d.jpg' % index)
    img = cv2.resize(img, image_size)
    img = img[:, :, ::-1]                 # cv2 loads BGR; the model was trained on RGB
    img = img.astype(np.float32) / 255.0
    return img

test_num = 12500
image_matrix = np.zeros((test_num, image_width, image_height, 3), dtype=np.float32)

for i in tqdm(range(test_num)):
    image_matrix[i] = get_image(i+1)

predictions = model.predict(image_matrix, verbose=1)
s = 'id,label\n'
for i, p in enumerate(predictions):
    s += '%d,%f\n' % (i+1, p[0])

with open('KerasResultUsingXceptionNotop.csv', 'w') as f:
    f.write(s)    
    

Processing:

100%|██████████| 12500/12500 [00:56<00:00, 221.78it/s]

12500/12500 [==============================] - 98s
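Since Dogs vs. Cats Redux is scored by log loss, clipping overconfident predictions bounds the penalty for any confidently wrong answer. This optional step was not part of the submission above, and the 0.005/0.995 thresholds are illustrative:

# Clip predictions away from 0 and 1 to cap the worst-case log loss
clipped = np.clip(predictions.ravel(), 0.005, 0.995)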

7. Learning Deep Features for Discriminative Localization

This section visualizes the misclassified validation images using Class Activation Mapping (CAM), the algorithm from https://arxiv.org/abs/1512.04150. Because our classification head is a global average pooling followed by a single dense layer, the class score is a weighted sum of the pooled feature maps, so projecting the dense weights back onto the spatial feature maps shows which regions drove the prediction.
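A minimal sketch of the core CAM computation (shapes assume Xception’s final 10x10x2048 feature maps at 299x299 input; feature_maps and w are illustrative names):

# feature_maps: last conv-block activations, shape (10, 10, 2048)
# w: weights of the final Dense layer, shape (2048, 1)
# heatmap[i, j] = sum_k feature_maps[i, j, k] * w[k, 0]
heatmap = np.matmul(feature_maps, w)[:, :, 0]   # shape (10, 10)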

# Cats misclassified as dogs:
species = "cat"
def get_cv_cat0_image(infile):
    infile = "%s/myvalid/%s/%s" % (root_prefix, species, infile)
    img = cv2.imread(infile)
    img = cv2.resize(img, image_size)
    img = img[:, :, ::-1]                 # BGR -> RGB, matching the training input
    img = img.astype(np.float32) / 255.0
    return img

l_infiles = os.listdir("%s/myvalid/%s" % (root_prefix, species))
cv_cat_num = len(l_infiles)
image_matrix_cv_cat0 = np.zeros((cv_cat_num, image_width, image_height, 3), dtype=np.float32)
for i in tqdm(range(cv_cat_num)):
    image_matrix_cv_cat0[i] = get_cv_cat0_image(l_infiles[i])

predictions = model.predict(image_matrix_cv_cat0, verbose=1)
l_ids = list(map(lambda x: x.split(".")[1], l_infiles))
M_out = {"id" : l_ids, "label" : predictions.ravel()}
pd_out = pd.DataFrame(M_out)

weights = model.layers[-1].get_weights()[0]   # final Dense weights, shape (2048, 1)
# model2 returns the conv feature maps and the prediction in one forward pass
model2 = Model(input=model.input, output=[base_model.output, model.output])

plt.figure(figsize=(16, 8))

l_val = pd_out[pd_out['label']>0.5]['id'].values
if species != "cat":
    l_val = pd_out[pd_out['label']<0.5]['id'].values

for i,ids in enumerate(l_val):
    img = cv2.imread('%s/myvalid/%s/%s.%s.jpg' % (root_prefix, species, species, ids))
    img = cv2.resize(img, image_size)
    x = img[:, :, ::-1].astype(np.float32) / 255.0   # BGR -> RGB, scaled to [0, 1]
    [base_model_outputs, prediction] = model2.predict(np.expand_dims(x, axis=0))
    prediction = prediction[0]
    base_model_outputs = base_model_outputs[0]
    plt.subplot(3, 7, i+1)
    plt.title('cat %.2f%%' % (100 - prediction*100))
    
    cam = (prediction - 0.5) * np.matmul(base_model_outputs, weights)
    cam -= cam.min()
    cam /= cam.max()
    cam -= 0.2
    cam /= 0.8
    cam = cv2.resize(cam, image_size)
    heatmap = cv2.applyColorMap(np.uint8(255*cam), cv2.COLORMAP_JET)
    heatmap[np.where(cam <= 0.2)] = 0
    out = cv2.addWeighted(img, 0.9, heatmap, 0.2, 0)
    
    plt.axis('off')
    plt.imshow(out[:,:,::-1])

Output:

100%|██████████| 1262/1262 [00:04<00:00, 307.28it/s]

1262/1262 [==============================] - 9s

Resulting Images:

[Figure: misclassified cat images overlaid with class activation heatmaps]

Results:

Score after submitting to Kaggle: 0.06799 (log loss).

Summarizing the number of lines coded:

Step             Lines of code
Import           30
Pre-processing   6 + 5 + 2 + 21 = 34
Deep learning    23 + 121 + 8 + 10 + 21 = 183
Visualization    11 + 54 = 65

This post was written by Boqiang Hu; comments and discussion are welcome.
Please credit the source when reposting: My Cats vs. Dogs solution using deep learning with a transfer-learning strategy.