Object detection is a computer technology related to computer vision and image processing that deals with detecting instances of semantic objects of a certain class (such as humans, buildings, or cars) in digital images and videos.[1] Well-researched domains of object detection include face detection and pedestrian detection. Object detection has applications in many areas of computer vision, including image retrieval and video surveillance.
Often, the test images are sampled from a different data distribution than the training images, making the object detection task significantly more difficult.[5] To address the challenges caused by this domain gap between training and test data, many unsupervised domain adaptation approaches have been proposed.[5][6][7][8][9] A simple and straightforward way to reduce the domain gap is to apply an image-to-image translation approach, such as CycleGAN.[10] Among other uses, cross-domain object detection is applied in autonomous driving, where models can be trained on large amounts of video game scenes, since labels for such scenes can be generated without manual labor.
Concept
Every object class has its own special features that help in classifying it; for example, all circles are round.
Object class detection uses these special features. For example, when looking for circles, objects whose points lie at a particular distance from a point (i.e. the center) are sought. Similarly, when looking for squares, objects that have perpendicular sides at their corners and equal side lengths are needed. A similar approach is used for face identification, where the eyes, nose, and lips can be located and features such as skin color and the distance between the eyes can be measured.
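The circle and square criteria described above can be sketched as simple geometric checks. This is only an illustrative sketch, not a practical detector: the function names, the tolerance values, and the assumption that an object arrives as a list of (x, y) contour points (or four ordered corners) are all hypothetical choices made for this example.

```python
import math

def looks_like_circle(points, tolerance=0.05):
    """True if every point lies at roughly the same distance from the centroid."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    radii = [math.hypot(x - cx, y - cy) for x, y in points]
    mean_radius = sum(radii) / len(radii)
    if mean_radius == 0:
        return False
    # Allow each radius to deviate by a small fraction of the mean radius.
    return all(abs(r - mean_radius) <= tolerance * mean_radius for r in radii)

def looks_like_square(corners, tolerance=1e-6):
    """True if four corners, given in order, form equal sides meeting at right angles."""
    if len(corners) != 4:
        return False
    sides = []
    for i in range(4):
        (x0, y0), (x1, y1) = corners[i], corners[(i + 1) % 4]
        sides.append((x1 - x0, y1 - y0))
    lengths = [math.hypot(dx, dy) for dx, dy in sides]
    if min(lengths) == 0:
        return False
    equal_sides = max(lengths) - min(lengths) <= tolerance * max(lengths)
    # Adjacent sides are perpendicular when their dot product is (near) zero.
    right_angles = all(
        abs(sides[i][0] * sides[(i + 1) % 4][0] + sides[i][1] * sides[(i + 1) % 4][1])
        <= tolerance * lengths[i] * lengths[(i + 1) % 4]
        for i in range(4)
    )
    return equal_sides and right_angles

# Hypothetical test shapes: 16 points on a unit circle, the same points
# stretched into an ellipse, a unit square, and a 2-by-1 rectangle.
circle_points = [(math.cos(t * math.pi / 8), math.sin(t * math.pi / 8)) for t in range(16)]
ellipse_points = [(2 * x, y) for x, y in circle_points]
square_corners = [(0, 0), (1, 0), (1, 1), (0, 1)]
rectangle_corners = [(0, 0), (2, 0), (2, 1), (0, 1)]
```

Real images rarely yield such clean point sets, so checks of this kind are usually applied only after edge or contour extraction and with looser tolerances.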
Methods
Simplified example of training a neural network in object detection: The network is trained on multiple images that are known to depict starfish and sea urchins, which are correlated with "nodes" that represent visual features. The starfish match with a ringed texture and a star outline, whereas most sea urchins match with a striped texture and oval shape. However, one instance of a ring-textured sea urchin creates a weakly weighted association between ringed texture and sea urchin.
Subsequent run of the network on an input image (left):[11] The network correctly detects the starfish. However, the weakly weighted association between ringed texture and sea urchin also confers a weak signal to the latter from one of two intermediate nodes. In addition, a shell that was not included in the training gives a weak signal for the oval shape, also resulting in a weak signal for the sea urchin output. These weak signals may result in a false positive result for sea urchin. In reality, textures and outlines would not be represented by single nodes, but rather by associated weight patterns of multiple nodes.
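The simplified starfish and sea urchin example can be mimicked with a toy weighted sum. The sketch below is only an illustration of how weak associations add up to a possible false positive: every weight, activation value, and threshold is a hypothetical number chosen for this example, and, as noted above, real networks do not represent textures and outlines with single nodes.

```python
# Hypothetical weights from feature nodes to output classes. The small
# ringed_texture -> sea_urchin weight stands for the weak association
# created by the one ring-textured sea urchin seen in training.
WEIGHTS = {
    "starfish": {"ringed_texture": 1.0, "star_outline": 1.0},
    "sea_urchin": {"striped_texture": 1.0, "oval_shape": 1.0, "ringed_texture": 0.2},
}

def class_scores(feature_activations):
    """Weighted sum of feature activations for each output class."""
    return {
        label: sum(w * feature_activations.get(name, 0.0) for name, w in weights.items())
        for label, weights in WEIGHTS.items()
    }

def detect(feature_activations, threshold):
    """Return the classes whose score reaches the decision threshold."""
    scores = class_scores(feature_activations)
    return sorted(label for label, score in scores.items() if score >= threshold)

# Input image: a starfish (strong ringed texture and star outline) plus an
# unfamiliar shell that weakly activates the oval-shape node.
image = {"ringed_texture": 1.0, "star_outline": 1.0, "oval_shape": 0.3}
scores = class_scores(image)
```

With these assumed numbers the sea urchin score stays well below the starfish score, so `detect(image, threshold=1.0)` reports only the starfish, whereas a lower threshold such as 0.4 would also report a sea urchin, which is the false positive described above.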
Methods for object detection generally fall into either neural network-based or non-neural approaches. For non-neural approaches, it is necessary to first define features using one of the methods below, and then to use a technique such as a support vector machine (SVM) to perform the classification. Neural techniques, on the other hand, are able to perform end-to-end object detection without specifically defining features, and are typically based on convolutional neural networks (CNNs).
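The two-stage non-neural pipeline (hand-defined features followed by an SVM) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not a reference implementation: the two features (mean intensity and mean absolute horizontal gradient), the tiny synthetic patches, and the hyperparameters are hypothetical, and the linear SVM is hand-rolled with batch subgradient descent on the regularized hinge loss, where a library implementation would normally be used.

```python
def extract_features(patch):
    """Hand-defined features: mean intensity and mean absolute horizontal gradient."""
    pixels = [v for row in patch for v in row]
    mean_intensity = sum(pixels) / len(pixels)
    grads = [abs(row[i + 1] - row[i]) for row in patch for i in range(len(row) - 1)]
    mean_gradient = sum(grads) / len(grads)
    return [mean_intensity, mean_gradient]

def train_linear_svm(samples, labels, epochs=500, lr=0.1, reg=0.01):
    """Fit weights w and bias b by batch subgradient descent on the
    L2-regularized hinge loss. Labels must be +1 or -1."""
    w = [0.0] * len(samples[0])
    b = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [reg * wi for wi in w]
        grad_b = 0.0
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # only margin violators contribute to the hinge subgradient
                for j, xj in enumerate(x):
                    grad_w[j] -= y * xj / n
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Classify a feature vector as +1 (object) or -1 (background)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical training data: striped 4x4 patches stand in for the object
# class (+1) and flat patches for the background (-1).
striped = [[[0, 1, 0, 1]] * 4, [[0.2, 0.9, 0.2, 0.9]] * 4, [[1, 0, 1, 0]] * 4]
flat = [[[0.2] * 4] * 4, [[0.5] * 4] * 4, [[0.8] * 4] * 4]
samples = [extract_features(p) for p in striped + flat]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(samples, labels)

# Patches not seen during training.
new_striped = [[0.1, 0.9, 0.1, 0.9]] * 4
new_flat = [[0.6] * 4] * 4
```

The classifier only ever sees the two numbers produced by `extract_features`, which is the key contrast with the end-to-end neural approach, where the features themselves are learned.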
Alsanabani, Ala; Ahmed, Mohammed; AL Smadi, Ahmad (2020). "Vehicle Counting Using Detecting-Tracking Combinations: A Comparative Analysis". 2020 the 4th International Conference on Video and Image Processing. pp. 48–54. doi:10.1145/3447450.3447458. ISBN 9781450389075. S2CID 233194604.
Zhang, Shifeng (2018). "Single-Shot Refinement Neural Network for Object Detection". Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 4203–4212. arXiv:1711.06897. Bibcode:2017arXiv171106897Z.