Abstract:
Trax is built to revolutionize the retail industry using cutting-edge computer vision techniques. Our challenges include fine-grained visual recognition of densely arranged items in store displays.
In this presentation, we will introduce a novel CNN architecture that learns both local visual features and neighboring class representations, and that extends the softmax function to define a probabilistic graphical model.
At test time, the graph is extracted from a context-enhanced detector, which converges to the correct localizations by iteratively reducing the attention regions and refining the object-proposal granularity.
The detected items are then classified using the marginal distributions of the graph's joint probability, thereby integrating local and spatial features.
We will further demonstrate how the algorithms can be optimized with dynamic programming techniques and a parallel pipeline design.
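
As a rough illustration of classifying by marginals and of the role dynamic programming can play, the Python sketch below computes per-item marginals on a chain of neighboring items with the standard sum-product (forward-backward) recursion. It is not the architecture presented in the talk: the chain-shaped graph, the use of softmax outputs as local scores, the `pairwise` neighbor-compatibility table, and all function names are assumptions made for this example only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one item's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def chain_marginals(unary, pairwise):
    """Per-item class marginals on a chain via sum-product.

    unary[i][c]    : local score of class c for item i (assumed: softmax output)
    pairwise[a][b] : compatibility of class a immediately left of class b
                     (hypothetical table, not taken from the abstract)
    """
    n, k = len(unary), len(unary[0])

    # Forward pass: fwd[i][b] summarizes everything to the left of item i.
    fwd = [[1.0] * k for _ in range(n)]
    for i in range(1, n):
        msg = [
            sum(fwd[i - 1][a] * unary[i - 1][a] * pairwise[a][b] for a in range(k))
            for b in range(k)
        ]
        z = sum(msg)
        fwd[i] = [v / z for v in msg]  # rescale to avoid underflow

    # Backward pass: bwd[i][a] summarizes everything to the right of item i.
    bwd = [[1.0] * k for _ in range(n)]
    for i in range(n - 2, -1, -1):
        msg = [
            sum(bwd[i + 1][b] * unary[i + 1][b] * pairwise[a][b] for b in range(k))
            for a in range(k)
        ]
        z = sum(msg)
        bwd[i] = [v / z for v in msg]

    # Combine local evidence with both incoming messages and normalize.
    marginals = []
    for i in range(n):
        belief = [fwd[i][c] * unary[i][c] * bwd[i][c] for c in range(k)]
        z = sum(belief)
        marginals.append([v / z for v in belief])
    return marginals


# Toy shelf row: three items, three candidate classes (made-up numbers).
logits = [[2.0, 0.5, 0.1], [0.9, 1.0, 0.8], [0.2, 0.3, 2.5]]
unary = [softmax(row) for row in logits]
# Assumed prior: identical products tend to sit next to each other.
pairwise = [[2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 2.0]]

marginals = chain_marginals(unary, pairwise)
labels = [max(range(len(m)), key=m.__getitem__) for m in marginals]
```

On a chain of n items with K classes, the recursion costs O(nK^2) instead of the O(K^n) needed to enumerate every joint labeling, which is the kind of saving dynamic programming provides. Graphs with loops would generally need approximate inference instead.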