In this tutorial, you will learn how to take any pre-trained deep learning image classifier and turn it into an object detector using Keras, TensorFlow, and OpenCV.
Today, we’re starting a four-part series on deep learning and object detection:
The goal of this series of posts is to obtain a deeper understanding of how deep learning-based object detectors work, and more specifically:
Today, we’ll be starting with the fundamentals of object detection, including how to take a pre-trained image classifier and utilize image pyramids, sliding windows, and non-maxima suppression to build a basic object detector (think HOG + Linear SVM-inspired).

Over the coming weeks, we’ll learn how to build an end-to-end trainable network from scratch. But for today, let’s start with the basics.

To learn how to take any Convolutional Neural Network image classifier and turn it into an object detector with Keras and TensorFlow, just keep reading.

Turning any CNN image classifier into an object detector with Keras, TensorFlow, and OpenCV

In the first part of this tutorial, we’ll discuss the key differences between image classification and object detection tasks.

I’ll then show you how you can take any Convolutional Neural Network trained for image classification and turn it into an object detector, all in ~200 lines of code.

From there, we’ll implement the code necessary to take an image classifier and turn it into an object detector using Keras, TensorFlow, and OpenCV.

Finally, we’ll review the results of our work, noting some of the problems and limitations with our implementation, including how we can improve this method.

Image classification vs. object detection

Figure 1: Left: Image classification. Right: Object detection. In this blog post, we will learn how to turn any deep learning image classifier CNN into an object detector with Keras, TensorFlow, and OpenCV.

When performing image classification, given an input image, we present it to our neural network, and we obtain a single class label and a probability associated with the class label prediction (Figure 1, left).

This class label is meant to characterize the contents of the entire image, or at least the most dominant, visible contents of the image.

We can thus think of image classification as:
Object detection, on the other hand, not only tells us what is in the image (i.e., class label) but also where in the image the object is via bounding box (x, y)-coordinates (Figure 1, right). Therefore, object detection algorithms allow us to:
At its core, every object detection algorithm, whether based on traditional computer vision or state-of-the-art deep learning, follows the same pattern:
Today, you’ll see an example of this pattern in action.

How can we turn any deep learning image classifier into an object detector?

At this point, you’re likely wondering:
And essentially, that is correct: object detection does require a specialized network architecture.

Anyone who has read papers on Faster R-CNN, Single Shot Detectors (SSDs), YOLO, RetinaNet, etc. knows that object detection networks are more complex, more involved, and take orders of magnitude more effort to implement compared to traditional image classification.

That said, there is a hack we can leverage to turn our CNN image classifier into an object detector, and the secret sauce lies in traditional computer vision algorithms.

Back before deep learning-based object detectors, the state-of-the-art was to use HOG + Linear SVM to detect objects in an image. We’ll be borrowing elements from HOG + Linear SVM to convert any deep neural network image classifier into an object detector.

The first key ingredient from HOG + Linear SVM is to use image pyramids. An “image pyramid” is a multi-scale representation of an image.

Figure 2: Image pyramids allow us to produce images at different scales. When turning an image classifier into an object detector, it is important to classify windows at multiple scales. We will learn how to write an image pyramid Python generator and put it to work in our Keras, TensorFlow, and OpenCV script.

Utilizing an image pyramid allows us to find objects in images at different scales (i.e., sizes) of an image (Figure 2).

At the bottom of the pyramid, we have the original image at its original size (in terms of width and height). And at each subsequent layer, the image is resized (subsampled) and optionally smoothed (usually via Gaussian blurring).

The image is progressively subsampled until some stopping criterion is met, which is normally when a minimum size has been reached and no further subsampling needs to take place.

The second key ingredient we need is sliding windows.

Figure 3: We will classify regions of our multi-scale image representations. These regions are generated by means of sliding windows. The combination of image pyramids and sliding windows allows us to turn any image classifier into an object detector using Keras, TensorFlow, and OpenCV.

As the name suggests, a sliding window is a fixed-size rectangle that slides from left-to-right and top-to-bottom within an image. As Figure 3 demonstrates, our sliding window could be used to detect the face in the input image.

At each stop of the window we would:

- Extract the ROI (region of interest) under the window
- Pass it through our image classifier
- Obtain the classifier’s output predictions
Combined with image pyramids, sliding windows allow us to localize objects at different locations and multiple scales of the input image.

The final key ingredient we need is non-maxima suppression.

When performing object detection, our object detector will typically produce multiple, overlapping bounding boxes surrounding an object in an image.

Figure 4: One key ingredient to turning a CNN image classifier into an object detector with Keras, TensorFlow, and OpenCV is applying a process known as non-maxima suppression (NMS). We will use NMS to suppress weak, overlapping bounding boxes in favor of higher confidence predictions.

This behavior is totally normal. It simply implies that as the sliding window approaches an object, our classifier component returns larger and larger probabilities of a positive detection.

Of course, multiple bounding boxes pose a problem: there’s only one object there, and we somehow need to collapse/remove the extraneous bounding boxes.

The solution to the problem is to apply non-maxima suppression (NMS), which collapses weak, overlapping bounding boxes in favor of the more confident ones.

Figure 5: After non-maxima suppression (NMS) has been applied, we’re left with a single detection for each object in the image. TensorFlow, Keras, and OpenCV allow us to turn a CNN image classifier into an object detector.

On the left, we have multiple detections, while on the right, we have the output of non-maxima suppression, which collapses the multiple bounding boxes into a single detection.

Combining traditional computer vision with deep learning to build an object detector

Figure 6: The steps to turn a deep learning classifier into an object detector using Python and libraries such as TensorFlow, Keras, and OpenCV.

In order to take any Convolutional Neural Network trained for image classification and instead utilize it for object detection, we’re going to utilize the three key ingredients from traditional computer vision:

- Image pyramids: find objects at different scales (sizes)
- Sliding windows: find where in the image an object is
- Non-maxima suppression: collapse weak, overlapping bounding boxes into a single detection
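To make the NMS step concrete, here is a minimal greedy, IoU-based sketch in NumPy. It is an illustration, not the implementation this tutorial relies on (the script imports non_max_suppression from imutils, whose overlap computation may differ in detail). The helper name nms, the (startX, startY, endX, endY) box layout, and the 0.3 overlap threshold are assumptions chosen for the example.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.3):
    """Greedy IoU-based non-maxima suppression (hypothetical helper).

    boxes:  (N, 4) array-like of (startX, startY, endX, endY)
    scores: (N,) array-like of confidences
    Returns the indices of the boxes to keep, most confident first.
    """
    boxes = np.asarray(boxes, dtype="float")
    scores = np.asarray(scores, dtype="float")
    if len(boxes) == 0:
        return []
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(scores)[::-1]  # indices sorted by confidence, highest first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # intersection rectangle between the kept box and every remaining box
        xx1 = np.maximum(x1[i], x1[rest])
        yy1 = np.maximum(y1[i], y1[rest])
        xx2 = np.minimum(x2[i], x2[rest])
        yy2 = np.minimum(y2[i], y2[rest])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (areas[i] + areas[rest] - inter)
        # drop every remaining box that overlaps the kept box too much
        order = rest[iou <= iou_thresh]
    return keep

# two overlapping detections of one object, plus a separate object
boxes = [[10, 10, 110, 110], [12, 12, 112, 112], [200, 200, 300, 300]]
scores = [0.9, 0.8, 0.95]
print(nms(boxes, scores))  # [2, 0]
```

The weaker of the two overlapping boxes (index 1) is suppressed because its IoU with box 0 is about 0.92, while the non-overlapping box survives untouched.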
The general flow of our algorithm will be:
That may seem like a complicated process, but as you’ll see in the remainder of this post, we can implement the entire object detection procedure in < 200 lines of code!

Configuring your development environment

To configure your system for this tutorial, I first recommend following either of these tutorials:
Either tutorial will help you configure your system with all the necessary software for this blog post in a convenient Python virtual environment.

Please note that PyImageSearch does not recommend or support Windows for CV/DL projects.

Project structure

Once you extract the .zip from the “Downloads” section of this blog post, your directory will be organized as follows:

```
.
├── images
│   ├── hummingbird.jpg
│   ├── lawn_mower.jpg
│   └── stingray.jpg
├── pyimagesearch
│   ├── __init__.py
│   └── detection_helpers.py
└── detect_with_classifier.py

2 directories, 6 files
```

Today’s pyimagesearch module contains a Python file, detection_helpers.py, consisting of two helper functions:

- image_pyramid: generates progressively smaller copies of an image so that we can find objects of different sizes
- sliding_window: slides a fixed-size window left-to-right and top-to-bottom across an image so that we can find where an object is
Using the helper functions, our detect_with_classifier.py Python driver script accomplishes object detection by means of a classifier (using a sliding window and image pyramid approach). The classifier we’re using is a ResNet50 CNN pre-trained on the ImageNet dataset, which consists of 1,000 classes of objects.

Three images in the images/ directory are provided for testing purposes. You should also test this script with images of your own. Given that our classifier-based object detector can recognize 1,000 classes, many everyday objects and animals can be recognized. Have fun with it!

Implementing our image pyramid and sliding window utility functions

In order to turn our CNN image classifier into an object detector, we must first implement helper utilities to construct sliding windows and image pyramids.
Let’s implement these helper functions now. Open up the detection_helpers.py file in the pyimagesearch module, and insert the following code:

```python
# import the necessary packages
import imutils

def sliding_window(image, step, ws):
    # slide a window across the image; the "+ 1" keeps the last
    # position where the window fits exactly against the edge
    for y in range(0, image.shape[0] - ws[1] + 1, step):
        for x in range(0, image.shape[1] - ws[0] + 1, step):
            # yield the current window
            yield (x, y, image[y:y + ws[1], x:x + ws[0]])
```

We begin by importing my package of convenience functions, imutils.

From there, we dive right in by defining our sliding_window generator function. This function expects three parameters:

- image: The input image that we are going to loop over and generate windows from
- step: Our step size, which indicates how many pixels we are going to “skip” in both the x and y directions
- ws: The window size, a (width, height) tuple giving the size (in pixels) of the window we are going to extract from our image
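The actual “sliding” of our window takes place in the two nested for loops:

- The outer loop iterates over the rows (y-coordinates), moving down the image step pixels at a time
- The inner loop iterates over the columns (x-coordinates), moving right step pixels at a time
- At each (x, y) position, we yield the top-left coordinates of the window along with the window itself, extracted via NumPy array slicing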
The yield keyword is used in place of the return keyword because our sliding_window function is implemented as a Python generator.

For more information on our sliding windows implementation, please refer to my previous Sliding Windows for Object Detection with Python and OpenCV article.
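To see the generator in action without loading a real image, here is a small standalone sketch that runs the same nested-loop idea on a blank NumPy array; the 232x150 image size is made up for illustration. Because sliding_window is a generator, wrapping it in list() is what actually runs the loops. Note the + 1 in each range: without it, a window that fits exactly against the right or bottom edge is skipped, and in this example (where the image height equals the window height) no windows would be produced at all.

```python
import numpy as np

def sliding_window(image, step, ws):
    # ws is (width, height); "+ 1" keeps the last position where the
    # window fits exactly against the right/bottom edge
    for y in range(0, image.shape[0] - ws[1] + 1, step):
        for x in range(0, image.shape[1] - ws[0] + 1, step):
            yield (x, y, image[y:y + ws[1], x:x + ws[0]])

# hypothetical 232x150 image and a 200x150 window stepping 16 pixels
image = np.zeros((150, 232, 3), dtype="uint8")
windows = list(sliding_window(image, step=16, ws=(200, 150)))
print([(x, y) for (x, y, _) in windows])  # [(0, 0), (16, 0), (32, 0)]
print(windows[0][2].shape)                # (150, 200, 3)
```

Each yielded window is a view into the original array with shape (height, width, channels), which is why the ROI comes back as (150, 200, 3) for a (200, 150) window size.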
Now that we’ve successfully defined our sliding window routine, let’s implement our image_pyramid generator used to construct a multi-scale representation of an input image:

```python
def image_pyramid(image, scale=1.5, minSize=(224, 224)):
    # yield the original image
    yield image

    # keep looping over the image pyramid
    while True:
        # compute the dimensions of the next image in the pyramid
        w = int(image.shape[1] / scale)
        image = imutils.resize(image, width=w)

        # if the resized image does not meet the supplied minimum
        # size, then stop constructing the pyramid
        if image.shape[0] < minSize[1] or image.shape[1] < minSize[0]:
            break

        # yield the next image in the pyramid
        yield image
```

Our image_pyramid function accepts three parameters as well:

- image: The input image for which we wish to generate multi-scale representations
- scale: Our scale factor controls how much the image is resized at each layer. Larger scale values yield fewer layers, and smaller scale values yield more layers
- minSize: The minimum size, given as (width, height), of an output image (a layer of our pyramid). Without it, our while loop would have no stopping criterion
Now that we know the parameters that must be passed to the function, let’s dive into the internals of our image pyramid generator function.

Referring to Figure 2, notice that the largest representation of our image is the input image itself. The first yield statement of our generator simply yields the original, unaltered image the first time our generator is asked to produce a layer of our pyramid.

Subsequent generated images are controlled by the infinite while True loop.
Inside the loop, we first compute the dimensions of the next image in the pyramid according to our scale and the dimensions of the current image (i.e., the previous layer). In this case, we simply divide the width of the current image by the scale to determine the new width, w.
From there, we go ahead and resize the image down to width w while maintaining aspect ratio. As you can see, we are using the aspect-aware resizing helper built into my imutils package.

While we are effectively done (we’ve resized our image, and now we can yield it), we need to implement an exit condition so that our generator knows to stop. As we learned when we defined our parameters to the image_pyramid function, the exit condition is determined by the minSize parameter. Therefore, the if statement checks whether our resized image is too small (in height or width) and, if so, breaks out of the loop.
Assuming our scaled output image passes our minSize threshold, the final yield statement returns it to the caller.

For more details, please refer to my Image Pyramids with Python and OpenCV article, which also includes an alternative scikit-image image pyramid implementation that may be useful to you.
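How many layers does the pyramid actually produce? The following pure-Python sketch reproduces only the size arithmetic of image_pyramid, so it runs without imutils or a real image. The helper name pyramid_shapes and the 600x450 input are hypothetical. Because imutils.resize computes the new height with its own floating-point ratio, a layer could in principle differ from this sketch by a pixel.

```python
def pyramid_shapes(h, w, scale=1.5, min_size=(224, 224)):
    """List the (height, width) of each layer an image pyramid would yield.

    Hypothetical helper: it only does the size arithmetic, approximating
    an aspect-preserving resize (new height = height * new_width / width).
    """
    if scale <= 1:
        raise ValueError("scale must be > 1 or the pyramid never shrinks")
    shapes = [(h, w)]  # the original image is always yielded first
    while True:
        new_w = int(w / scale)
        h, w = int(h * new_w / w), new_w
        # min_size is (width, height), matching minSize in image_pyramid
        if h < min_size[1] or w < min_size[0]:
            break
        shapes.append((h, w))
    return shapes

# a hypothetical 600x450 input with scale=1.5 and a (200, 150) minimum size
print(pyramid_shapes(450, 600, scale=1.5, min_size=(200, 150)))
# [(450, 600), (300, 400), (199, 266)]
```

With scale=1.5 and a (200, 150) minimum size, only three layers survive; moving the scale closer to 1 produces more, finer-grained layers at the cost of more windows to classify.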
Using Keras and TensorFlow to turn a pre-trained image classifier into an object detector

With our sliding_window and image_pyramid functions implemented, let’s now use them to take a deep neural network trained for image classification and turn it into an object detector.
Open up a new file, name it detect_with_classifier.py, and let’s begin coding:

```python
# import the necessary packages
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet import preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.applications import imagenet_utils
from imutils.object_detection import non_max_suppression
from pyimagesearch.detection_helpers import sliding_window
from pyimagesearch.detection_helpers import image_pyramid
import numpy as np
import argparse
import imutils
import time
import cv2
```

This script begins with a selection of imports including:

- ResNet50: The ResNet CNN classifier, loaded with weights pre-trained on ImageNet
- preprocess_input: The ResNet-specific preprocessing function applied to each ROI before classification
- img_to_array: Converts an image into a NumPy array that Keras can work with
- imagenet_utils: Keras’ ImageNet helpers (e.g., decoding predictions into human-readable class labels)
- non_max_suppression: The imutils implementation of non-maxima suppression
- sliding_window and image_pyramid: Our two helper functions from detection_helpers.py
- numpy, argparse, imutils, time, and cv2: Array handling, command line parsing, image resizing, timing, and OpenCV image I/O
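Now that our imports are taken care of, let’s parse command line arguments:

```python
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
    help="path to the input image")
ap.add_argument("-s", "--size", type=str, default="(200, 150)",
    help="ROI size (in pixels)")
ap.add_argument("-c", "--min-conf", type=float, default=0.9,
    help="minimum probability to filter weak detections")
ap.add_argument("-v", "--visualize", type=int, default=-1,
    help="whether or not to show extra visualizations for debugging")
args = vars(ap.parse_args())
```

The following arguments can be supplied to this Python script at runtime from your terminal (only --image is required):

- --image: The path to the input image for classification-based object detection
- --size: The size of the sliding window, given as a (width, height) tuple string; the default is "(200, 150)"
- --min-conf: The minimum probability threshold used to filter out weak detections (default 0.9)
- --visualize: A switch that enables extra visualizations for debugging when set to a value greater than 0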
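We now have a handful of constants to define for our object detection procedures:

```python
# initialize variables used for the object detection procedure
WIDTH = 600
PYR_SCALE = 1.5
WIN_STEP = 16
ROI_SIZE = eval(args["size"])  # e.g., the string "(200, 150)" -> (200, 150)
INPUT_SIZE = (224, 224)
```

Our classifier-based object detection methodology constants include:

- WIDTH: The width every input image is resized to before detection begins
- PYR_SCALE: The scale factor passed to image_pyramid
- WIN_STEP: How many pixels sliding_window moves in both the x and y directions
- ROI_SIZE: The (width, height) of the sliding window, parsed from the --size argument
- INPUT_SIZE: The input dimensions expected by the CNN classifier; each ROI is resized to 224x224 for ResNet50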
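The ROI_SIZE line runs eval on a string taken straight from the command line, which will execute any Python expression a user passes to --size. If you would rather avoid that, here is a safer sketch. It is an alternative, not the tutorial’s code, and parse_roi_size is a hypothetical helper name. It uses ast.literal_eval from the standard library, which only evaluates literals such as tuples and numbers.

```python
import ast

def parse_roi_size(text):
    """Parse a --size string such as "(200, 150)" into a (width, height) tuple.

    Hypothetical alternative to eval(): ast.literal_eval only accepts
    Python literals, so it refuses to run arbitrary code from the command line.
    """
    size = ast.literal_eval(text)  # raises ValueError for non-literals
    if (not isinstance(size, tuple) or len(size) != 2
            or not all(isinstance(v, int) and v > 0 for v in size)):
        raise ValueError(f"expected a (width, height) tuple, got {text!r}")
    return size

print(parse_roi_size("(200, 150)"))  # (200, 150)
```

You would then write ROI_SIZE = parse_roi_size(args["size"]) in place of the eval call; malformed input such as "(1, 2, 3)" fails early with a clear error instead of surfacing later inside sliding_window.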
Understanding what each of the above constants controls is crucial to your understanding of how to turn an image classifier into an object detector with Keras, TensorFlow, and OpenCV. Be sure to mentally distinguish each of these before moving on.

Let’s load our ResNet classification CNN and input image:

```python
# load our network weights from disk
print("[INFO] loading network...")
model = ResNet50(weights="imagenet", include_top=True)

# load the input image from disk, resize it such that it has the
# supplied width, and then grab its dimensions
orig = cv2.imread(args["image"])
orig = imutils.resize(orig, width=WIDTH)
(H, W) = orig.shape[:2]
```

The call to ResNet50 loads ResNet pre-trained on ImageNet. If you choose to use a different pre-trained classifier, you can substitute one here for your particular project. To learn how to train your own classifier, I suggest you read Deep Learning for Computer Vision with Python.

We also load our input image, orig.
Once it is loaded, we resize it (while maintaining aspect ratio according to our constant WIDTH) and grab the resulting image dimensions.

From here, we’re ready to initialize our image pyramid generator object:

```python
# initialize the image pyramid
pyramid = image_pyramid(orig, scale=PYR_SCALE, minSize=ROI_SIZE)

# initialize two lists, one to hold the ROIs generated from the image
# pyramid and sliding window, and another list used to store the
# (x, y)-coordinates of where the ROI was in the original image
rois = []
locs = []

# time how long it takes to loop over the image pyramid layers and
# sliding window locations
start = time.time()
```

Here we supply the necessary parameters to our image_pyramid generator function.
Given that pyramid is a generator object at this point, we can loop over the values it produces. Before we do just that, we initialize two lists:

- rois: Holds the regions of interest (ROIs) generated from the image pyramid and sliding windows
- locs: Stores the (x, y)-coordinates of where each ROI was located in the original image
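Because every entry in locs is expressed in the original image’s coordinate frame, a window found on a smaller pyramid layer has to be scaled back up before it is stored. The following standalone sketch isolates that arithmetic; the helper name to_original_coords and the example numbers are hypothetical, but it performs the same int(value * scale) computations as the detection loop, with scale equal to the original width W divided by the layer width.

```python
def to_original_coords(x, y, roi_size, orig_w, layer_w):
    """Map a window from a pyramid layer back to original-image pixels.

    Hypothetical helper; roi_size is (width, height) like ROI_SIZE.
    Returns (startX, startY, endX, endY), the same layout stored in locs.
    """
    # 1.0 for the original layer, > 1 for every smaller layer
    scale = orig_w / float(layer_w)
    x, y = int(x * scale), int(y * scale)
    w, h = int(roi_size[0] * scale), int(roi_size[1] * scale)
    return (x, y, x + w, y + h)

# a 200x150 window at (32, 16) on the 400px-wide layer of a 600px-wide image
print(to_original_coords(32, 16, (200, 150), orig_w=600, layer_w=400))
# (48, 24, 348, 249)
```

A 200x150 window on the 400-pixel-wide layer therefore covers a 300x225 region of the 600-pixel-wide original, which is how a single fixed-size window can detect objects of different sizes.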
We also set a start timestamp so we can later determine how long our classification-based object detection method (given our parameters) took on the input image.

Let’s loop over each image our pyramid produces:

```python
# loop over the image pyramid
for image in pyramid:
    # determine the scale factor between the *original* image
    # dimensions and the *current* layer of the pyramid
    scale = W / float(image.shape[1])

    # for each layer of the image pyramid, loop over the sliding
    # window locations
    for (x, y, roiOrig) in sliding_window(image, WIN_STEP, ROI_SIZE):
        # scale the (x, y)-coordinates of the ROI with respect to the
        # *original* image dimensions
        x = int(x * scale)
        y = int(y * scale)
        w = int(ROI_SIZE[0] * scale)
        h = int(ROI_SIZE[1] * scale)

        # take the ROI and preprocess it so we can later classify
        # the region using Keras/TensorFlow
        roi = cv2.resize(roiOrig, INPUT_SIZE)
        roi = img_to_array(roi)
        roi = preprocess_input(roi)

        # update our list of ROIs and associated coordinates
        rois.append(roi)
        locs.append((x, y, x + w, y + h))
```

Looping over the layers of our image pyramid begins with the outer for loop. Our first step in the loop is to compute the scale factor between the original image dimensions (W) and current layer dimensions ( # import the necessary packages import imutils def sliding_window(image, step, ws): # slide a window across the image for y in range(0, image.shape[0] - ws[1], step): for x in
range(0, image.shape[1] - ws[0], step): # yield the current window yield (x, y, image[y:y + ws[1], x:x + ws[0]])01) of our pyramid (Line 61). We need this value to later upscale our object bounding boxes. Now we’ll cascade into our sliding window loop from this particular layer in our image pyramid. Our def image_pyramid(image, scale=1.5, minSize=(224, 224)): # yield the original image yield image # keep looping over the image pyramid while True: # compute the dimensions of the next image in the pyramid w = int(image.shape[1] / scale) image = imutils.resize(image, width=w) # if the resized image does not meet the supplied minimum # size, then stop constructing the pyramid if image.shape[0] < minSize[1] or image.shape[1] < minSize[0]: break # yield the next image in the pyramid yield image6 generator allows us to look side-to-side and up-and-down in our image. For each ROI that it generates, we’ll soon apply image classification. Line 65 defines our loop over our sliding windows. Inside, we:
We also handle optional visualization:

```python
        # check to see if we are visualizing each of the sliding
        # windows in the image pyramid
        if args["visualize"] > 0:
            # clone the original image and then draw a bounding box
            # surrounding the current region
            clone = orig.copy()
            cv2.rectangle(clone, (x, y), (x + w, y + h),
                (0, 255, 0), 2)

            # show the visualization and current ROI
            cv2.imshow("Visualization", clone)
            cv2.imshow("ROI", roiOrig)
            cv2.waitKey(0)
```

Here, we visualize both the original image, with a green box indicating where we are “looking,” and the current ROI, i.e., the raw window crop before it is resized for classification (Lines 85-95). As you can see, these visualizations are only shown when `--visualize` is set to a positive value on the command line.
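The scaling step is easy to get subtly wrong, so here is a minimal standalone sketch of it in plain Python. It assumes, as the loop above does, that each pyramid layer is a uniformly downscaled copy of the original, so one ratio (the original width `W` divided by the layer width) maps layer coordinates back to the original image. The helper name `to_original_coords` and the example sizes are hypothetical, not part of the tutorial’s script.

```python
def to_original_coords(x, y, roi_size, orig_width, layer_width):
    """Map a sliding-window ROI found in a pyramid layer back to the
    original image, returning (startX, startY, endX, endY).

    Hypothetical helper mirroring the scaling arithmetic in the loop:
    a single width ratio converts layer coordinates to original ones.
    """
    # ratio between the *original* width and the *current* layer width
    scale = orig_width / float(layer_width)

    # scale the top-left corner and the fixed window size
    x = int(x * scale)
    y = int(y * scale)
    w = int(roi_size[0] * scale)
    h = int(roi_size[1] * scale)

    # same (startX, startY, endX, endY) layout that is appended to locs
    return (x, y, x + w, y + h)


# a 200x150 window at (40, 30) in a layer half as wide as the original
print(to_original_coords(40, 30, (200, 150), orig_width=800, layer_width=400))
```

Because the pyramid resize rounds each layer to whole pixels, reusing the width ratio for the y-axis is a close approximation rather than an exact inverse, which is usually acceptable for drawing detection boxes.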
Next, we’ll (1) check our benchmark on the pyramid + sliding window process, (2) classify all of our `rois` in batch, and (3) decode the predictions.

First, we stop our pyramid + sliding window timer and show how long the process took (Lines 99-101). Then, we take the ROIs and pass them (in batch) through our pre-trained image classifier (i.e., ResNet) via the model’s `predict` method (Lines 104-118). As you can see, we print out a benchmark for the inference process here too. Finally, Line 117 decodes the predictions, grabbing only the top prediction for each ROI.
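Keras’ `decode_predictions(preds, top=1)` returns one list per input, each holding a single `(class_id, class_description, score)` tuple, which is why each prediction can be unpacked into an ImageNet ID, a class label, and a probability. To show what keeping only the top prediction per ROI means without downloading ResNet weights, here is a plain-Python stand-in for the top-1 case. The function name `decode_top1`, the toy class list, and the probability rows are all hypothetical.

```python
def decode_top1(preds, class_index):
    """Keep only the highest-scoring class for each ROI.

    Hypothetical stand-in for decode_predictions(preds, top=1): returns
    one list per ROI holding a single (class_id, class_name, prob) tuple.
    `class_index` lists (class_id, class_name) pairs in the same order
    as the model's output units.
    """
    decoded = []
    for row in preds:
        # index of the largest probability in this ROI's output vector
        best = max(range(len(row)), key=lambda j: row[j])
        class_id, class_name = class_index[best]
        decoded.append([(class_id, class_name, row[best])])
    return decoded


# three ROIs scored against a toy three-class "model output"
classes = [("c0", "stingray"), ("c1", "hummingbird"), ("c2", "lawn_mower")]
batch_preds = [
    [0.92, 0.05, 0.03],
    [0.10, 0.20, 0.70],
    [0.30, 0.60, 0.10],
]
for p in decode_top1(batch_preds, classes):
    (class_id, label, prob) = p[0]
    print(label, prob)
```

In the real script, the rows come from a single batched `predict` call over all ROIs, which is far cheaper than calling the model once per window.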
We’ll need a means to map class labels (keys) to the ROI locations associated with each label (values); a label dictionary (Line 118) serves that purpose. Let’s go ahead and populate that dictionary now.

Looping over predictions beginning on Line 121, we first grab the prediction information, including the ImageNet ID, class label, and probability (Line 123). From there, we check to see if the minimum confidence has been met (Line 127).
Assuming so, we update the label dictionary (Lines 130-136) with the bounding box and probability score tuple (value) associated with each class label (key). As a recap, so far, we have:

- generated scaled images with our image pyramid;
- generated ROIs from each pyramid layer with a sliding window;
- classified every ROI in a single batch and grouped the confident predictions by class label.
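Here is a minimal sketch of the filter-and-group step, assuming each decoded prediction is a one-element list holding a `(class_id, label, prob)` tuple (the shape a top-1 decode produces) and a `locs` list whose boxes line up index-for-index with the ROIs. The function name `group_by_label` and the sample values are hypothetical; the example also shows how raising the threshold from 0.90 to 0.95 drops a weaker prediction, which is the same lever the `--min-conf` argument provides.

```python
def group_by_label(decoded, locs, min_conf=0.9):
    """Map each class label to the (box, prob) pairs that pass min_conf.

    Hypothetical helper: decoded[i] is [(class_id, label, prob)] for the
    i-th ROI and locs[i] is that ROI's (startX, startY, endX, endY) box.
    """
    detections = {}
    for (i, p) in enumerate(decoded):
        (class_id, label, prob) = p[0]
        # drop weak detections below the confidence threshold
        if prob >= min_conf:
            # append this ROI's box and score under its class label
            detections.setdefault(label, []).append((locs[i], prob))
    return detections


decoded = [
    [("c0", "stingray", 0.97)],
    [("c0", "stingray", 0.93)],
    [("c1", "half_track", 0.91)],
    [("c2", "lawn_mower", 0.40)],
]
locs = [(10, 10, 110, 110), (20, 15, 120, 115),
        (300, 200, 400, 300), (0, 0, 50, 50)]

print(group_by_label(decoded, locs, min_conf=0.90))
print(group_by_label(decoded, locs, min_conf=0.95))
```

Grouping by label matters because non-maxima suppression is applied per class: overlapping boxes of the same label collapse, while boxes of different labels are left alone.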
We’re not quite done yet with turning our image classifier into an object detector with Keras, TensorFlow, and OpenCV. Now, we need to visualize the results. This is the time where you would implement logic to do something useful with the results (the label dictionary), whereas in our case, we’re simply going to annotate the objects. We will also have to handle our overlapping detections by means of non-maxima suppression (NMS). Let’s go ahead and loop over all keys in our label dictionary. Our loop over the labels for each of the detected objects begins on Line 139.
We make a copy of the original input image so that we can annotate it (Line 142). We then annotate all bounding boxes for the current class label (Lines 145-149). So that we can visualize the results before and after applying NMS, Line 154 displays the “before” image, and then we proceed to make another copy (Line 155).
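Since the next step relies on non-maxima suppression, it may help to see the core idea in isolation. The sketch below is a generic greedy, IoU-based NMS in plain Python: sort boxes by probability, keep the most confident one, drop any remaining box that overlaps it by more than a threshold, and repeat. It is not the imutils implementation the script calls, so its output can differ at the margins; the function names, the 0.3 threshold, and the sample boxes are assumptions for illustration.

```python
def iou(a, b):
    """Intersection-over-union of two (startX, startY, endX, endY) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, probs, overlap_thresh=0.3):
    """Keep the most confident box, suppress boxes whose IoU with it
    exceeds overlap_thresh, and repeat on whatever remains."""
    # indices sorted from most to least confident
    order = sorted(range(len(boxes)), key=lambda i: probs[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(boxes[best])
        # discard every remaining box that overlaps the kept one too much
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= overlap_thresh]
    return keep


# three overlapping windows on one object plus one separate window
boxes = [(10, 10, 110, 110), (20, 15, 120, 115),
         (15, 20, 115, 120), (300, 200, 400, 300)]
probs = [0.93, 0.97, 0.91, 0.95]
print(greedy_nms(boxes, probs))
```

The three overlapping windows collapse to the single most confident one, while the distant box survives because it does not overlap anything that was kept.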
Now, let’s apply NMS and display our “after” NMS visualization. To apply NMS, we first extract the bounding boxes and associated prediction probabilities via Lines 159 and 160. We then pass those results into my imutils implementation of NMS (Line 161). For more details on non-maxima suppression, be sure to refer to my blog post on the topic. After NMS has been applied, Lines 165-171 annotate bounding box rectangles and labels on the “after” image. Lines 174 and 175 display the results until a key is pressed, at which point all GUI windows close, and the script exits. Great job! In the next section, we’ll analyze the results of our method for using an image classifier for object detection purposes.

Image classifier to object detector results using Keras and TensorFlow

At this point, we are ready to see the results of our hard work. Make sure you use the “Downloads” section of this tutorial to download the source code and example images from this blog post.
From there, open up a terminal and run the detection script on the first example image.

Figure 7: Top: Classifier-based object detection. Bottom: Classifier-based object detection followed by non-maxima suppression. In this tutorial, we used TensorFlow, Keras, and OpenCV to turn a CNN image classifier into an object detector.

Here, you can see that I have input an example image containing a “stingray,” which CNNs trained on ImageNet will be able to recognize (since ImageNet contains a “stingray” class). Figure 7 (top) shows the original output from our object detection procedure. Notice how there are multiple, overlapping bounding boxes surrounding the stingray. Applying non-maxima suppression (Figure 7, bottom) collapses the bounding boxes into a single detection.

Let’s try another image, this one of a hummingbird (again, a class that networks trained on ImageNet will be able to recognize).

Figure 8: Turning a deep learning convolutional neural network image classifier into an object detector with Python, Keras, and OpenCV.

Figure 8 (top) shows the original output of our detection procedure, while the bottom shows the output after applying non-maxima suppression. Again, our “image classifier turned object detector” procedure performed well here.
But let’s now try an example image where our object detection algorithm doesn’t perform optimally.

Figure 9: Turning a deep learning convolutional neural network image classifier into an object detector with Python, Keras, and OpenCV. The bottom shows the result after NMS has been applied.

At first glance, it appears this method worked perfectly: we were able to localize the “lawn mower” in the input image. But there was actually a second detection for a “half-track” (a military vehicle that has regular wheels on the front and tank-like tracks on the back):

Figure 10: What do we do when we have a false-positive detection using our CNN image classifier-based object detector?

Clearly, there is not a half-track in this image, so how do we improve the results of our object detection procedure? The answer is to raise our minimum confidence threshold (the `--min-conf` command line argument) to remove false-positive predictions.

Figure 11: By increasing the confidence threshold in our classifier-based object detector (made with TensorFlow, Keras, and OpenCV), we’ve eliminated the false-positive “half-track” detection.
By increasing the minimum confidence to 95%, we have filtered out the less confident “half-track” prediction, leaving only the (correct) “lawn mower” object detection. While our procedure for turning a pre-trained image classifier into an object detector isn’t perfect, it can still be useful in certain situations, specifically when images are captured in controlled environments. In the rest of this series, we’ll be learning how to improve upon our object detection results and build a more robust deep learning-based object detector.

Problems, limitations, and next steps

If you carefully inspect the results of our object detection procedure, you’ll notice a few key takeaways:
Throughout this four-part series, we’ll be examining how to resolve these issues and build an object detector similar to the R-CNN family of networks.
Summary

In this tutorial, you learned how to take any pre-trained deep learning image classifier and turn it into an object detector using Keras, TensorFlow, and OpenCV. To accomplish this task, we combined deep learning with traditional computer vision algorithms:

- image pyramids
- sliding windows
- non-maxima suppression
The end results of our hacked-together object detection routine were fairly reasonable, but there were two primary problems:
In order to fix both of these problems, next week, we’ll start exploring the algorithms necessary to build an object detector from the R-CNN, Fast R-CNN, and Faster R-CNN family. This will be a great series of tutorials, so you won’t want to miss them!