Hazmat Part 3: Choosing the Class

Hey there, it’s Anthony, ready to bring an end to this epic trilogy of identifying hazmat signs.

In this part we will cover how we solved the second half of the generic object detection problem (as detailed in the first part): classification.

The algorithms used in this part are by far the most complicated. Rather than a tutorial on how the algorithms work, this is better described as a tutorial on how to use OpenCV functions that execute the algorithms. Therefore, there will be a lot more code and less explanations than in the first two parts.

First, an algorithm called Scale-Invariant Feature Transform (SIFT) can be used to find “key points” in an image. When applied to a sample image of a hazmat sign, the following output can be found:

It should be noted that SIFT has detected text as features. This enabled us to forego implementing standalone text detection. SIFT has been accurate enough to tell the difference between signs whose only difference is text (like explosives 1.1 and 1.2) surprisingly consistently. The above output was generated by the following code:

# imports
import cv2, numpy


# read the image
img = cv2.imread('template.png')


# make it grayscale
gray = cv2.cvtColor(img.copy(), cv2.COLOR_BGR2GRAY)


# make a new variable called 'out' as a copy of the input image
out = img.copy()


# make a new SIFT detector object as per the opencv implementation
sift = cv2.xfeatures2d.SIFT_create()


# store the key points of our SIFT detector's detections in the variable kp
kp = sift.detect(gray, None)


# draw them using opencv's keypoint drawing function onto the 'out' image defined above
cv2.drawKeypoints(gray, kp, outImage=out)


# append the first and second image together
final = numpy.hstack((img, out))


# display the two images appended together
cv2.imshow("Features", final)


# await a key press
cv2.waitKey(0)

# imports

import cv2, numpy

# read the image

img = cv2.imread('template.png')

# make it grayscale

gray = cv2.cvtColor(img.copy(), cv2.COLOR_BGR2GRAY)

# make a new variable called 'out' as a copy of the input image

out = img.copy()

# make a new SIFT detector object as per the opencv implementation

sift = cv2.xfeatures2d.SIFT_create()

# store the key points of our SIFT detector's detections in the variable kp

kp = sift.detect(gray, None)

# draw them using opencv's keypoint drawing function onto the 'out' image defined above

cv2.drawKeypoints(gray, kp, outImage=out)

# append the first and second image together

final = numpy.hstack((img, out))

# display the two images appended together

cv2.imshow("Features", final)

# await a key press

cv2.waitKey(0)

An algorithm called k-nearest-neighbors (KNN) can be used to match the similarity of two images based on their features, found by SIFT. The best matching features are chosen, appended to an array called ‘good’ and outputted, as in the following python function:

def bff_match(image, template):
	# Initiate SIFT detector
	sift = cv2.xfeatures2d.SIFT_create()

	# find the keypoints and descriptors with SIFT
	kp1, des1 = sift.detectAndCompute(image, None) # apparently this only works with 8-bit images
	kp2, des2 = sift.detectAndCompute(template, None)

	FLANN_INDEX_KDTREE = 0
	index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
	search_params = dict(checks = 50)
	flann = cv2.FlannBasedMatcher(index_params, search_params)
	matches = flann.knnMatch(des1, des2, k=2)

	# store all the good matches as per Lowe's ratio test.
	good = []
	for m,n in matches:
		if m.distance < MIN_MATCH_RATING*n.distance:
			good.append(m)

	# return the number of matches (the tutorial describes how to draw the features if interested)
	return good

def bff_match(image, template):

# Initiate SIFT detector

sift = cv2.xfeatures2d.SIFT_create()

# find the keypoints and descriptors with SIFT

kp1, des1 = sift.detectAndCompute(image, None) # apparently this only works with 8-bit images

kp2, des2 = sift.detectAndCompute(template, None)

FLANN_INDEX_KDTREE = 0

index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)

search_params = dict(checks = 50)

flann = cv2.FlannBasedMatcher(index_params, search_params)

matches = flann.knnMatch(des1, des2, k=2)

# store all the good matches as per Lowe's ratio test.

good = []

for m,n in matches:

if m.distance < MIN_MATCH_RATING*n.distance:

good.append(m)

# return the number of matches (the tutorial describes how to draw the features if interested)

return good

By feeding the output of that function into the following code:

if len(good)>MIN_MATCH_COUNT:
    src_pts = np.float32([ kp1[m.queryIdx].pt for m in good ]).reshape(-1,1,2)
    dst_pts = np.float32([ kp2[m.trainIdx].pt for m in good ]).reshape(-1,1,2)

    M, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC,5.0)
    matchesMask = mask.ravel().tolist()

    h,w = img1.shape
    pts = np.float32([ [0,0],[0,h-1],[w-1,h-1],[w-1,0] ]).reshape(-1,1,2)
    dst = cv2.perspectiveTransform(pts,M)

    img2 = cv2.polylines(img2,[np.int32(dst)],True,255,3, cv2.LINE_AA)

draw_params = dict(matchColor = (0,255,0), # draw matches in green color
                   singlePointColor = None,
                   matchesMask = matchesMask, # draw only inliers
                   flags = 2)

img3 = cv2.drawMatches(img1,kp1,img2,kp2,good,None,**draw_params)

plt.imshow(img3, 'gray'),plt.show()

if len(good)>MIN_MATCH_COUNT:

src_pts = np.float32([ kp1[m.queryIdx].pt for m in good ]).reshape(-1,1,2)

dst_pts = np.float32([ kp2[m.trainIdx].pt for m in good ]).reshape(-1,1,2)

M, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC,5.0)

matchesMask = mask.ravel().tolist()

h,w = img1.shape

pts = np.float32([ [0,0],[0,h-1],[w-1,h-1],[w-1,0] ]).reshape(-1,1,2)

dst = cv2.perspectiveTransform(pts,M)

img2 = cv2.polylines(img2,[np.int32(dst)],True,255,3, cv2.LINE_AA)

draw_params = dict(matchColor = (0,255,0), # draw matches in green color

singlePointColor = None,

matchesMask = matchesMask, # draw only inliers

flags = 2)

img3 = cv2.drawMatches(img1,kp1,img2,kp2,good,None,**draw_params)

plt.imshow(img3, 'gray'),plt.show()

We can see this output:

The best matching features have been mapped from a template image to a real-world image (of a hazmat sign on my phone).

Using the following function, we can loop over a series of templates and find the one that best matches an inputted real-world image:

# classify the input image
def classify(image, sign_list):

    # Loop through signs to store bff and color matches
    for i, sign in enumerate(sign_list):
        template = cv2.imread(sign.image)
        sign.bff_data = bff_match(image, template)
        sign.bff = len(sign.bff_data)
        
    # Sort the results using the bff match attribute in reverse order
    sign_list.sort(key=lambda x: x.bff, reverse=True)

    best = sign_list[0]
    best_image = cv2.imread(best.image)

    return best.name

# classify the input image

def classify(image, sign_list):

# Loop through signs to store bff and color matches

for i, sign in enumerate(sign_list):

template = cv2.imread(sign.image)

sign.bff_data = bff_match(image, template)

sign.bff = len(sign.bff_data)

# Sort the results using the bff match attribute in reverse order

sign_list.sort(key=lambda x: x.bff, reverse=True)

best = sign_list[0]

best_image = cv2.imread(best.image)

return best.name

In this code, a ‘sign’ class was used to abstract the process of storing information. See ‘classify_abstracted.py’ in SARTVision for the definition of that class, and ‘hazmat.py’ for where the class is used to define a series of templates, which is passed into classify() as the second parameter, sign_list.

The code from this blog post, which is also used by the SART robot, has been taken from the following tutorials, which I strongly recommend as further reading:

Leave a Reply Cancel Reply