In this talk we present the technology behind the Fully Automated Store by Checkout Technologies. The current version of the store is the result of the work of 12 engineers, spanning areas from hardware and design to the latest deep learning architectures. We also discuss the challenges and lessons learnt during this adventure, and what it means to deploy a system that has an AI engine at its core. Finally, we cover the creation of the dataset and the design of specific metrics capable of measuring the accuracy of the entire system.
8. Camera Positioning Study
1. Retrieve CAD drawing of the space
2. 3D modeling of the space
3. Define camera position and direction
4. Grasshopper algorithm to make
13. Pose-estimation
● Train 2D pose estimation model using a top view dataset
including renderings from the synthetic datasets
● GPU version of the upsampling model (main bottleneck right now)
● Cameras “software” synchronization
● Reduce CPU and GPU load
16. Tracking: The glossary
Detection: one pose, in a given frame, at a given time
Reconstruction: many detections, in different frames, at a given time
Track: many reconstructions, at different times
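The three glossary terms nest naturally, so a small data-structure sketch may help fix the vocabulary. This is only an illustration: the class and field names (`camera_id`, `keypoints_2d`, `track_id`) are assumptions, and "frame" is read here as "the image from one camera".

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Detection:
    """One pose, in a given frame (one camera image), at a given time."""
    camera_id: int
    timestamp: float
    keypoints_2d: Tuple[Tuple[float, float], ...]  # (x, y) per joint, in pixels


@dataclass
class Reconstruction:
    """Many detections, from different frames, at the same time."""
    timestamp: float
    detections: List[Detection]

    def __post_init__(self):
        # Every detection must share the reconstruction's timestamp...
        if any(d.timestamp != self.timestamp for d in self.detections):
            raise ValueError("detections must share one timestamp")
        # ...and come from a different camera.
        cameras = [d.camera_id for d in self.detections]
        if len(cameras) != len(set(cameras)):
            raise ValueError("at most one detection per camera")


@dataclass
class Track:
    """Many reconstructions, at different times."""
    track_id: int
    reconstructions: List[Reconstruction] = field(default_factory=list)

    def append(self, rec: Reconstruction) -> None:
        # Reconstructions in a track must be strictly ordered in time.
        if self.reconstructions and rec.timestamp <= self.reconstructions[-1].timestamp:
            raise ValueError("reconstructions must be strictly ordered in time")
        self.reconstructions.append(rec)
```

A track is then built by appending one reconstruction per time step, each of which groups the per-camera detections of the same person.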
17. Tracking: The approach
Hypergraphs for Joint Multi-view Reconstruction and Multi-object Tracking
by M. Hofmann, D. Wolf, G. Rigoll. 2013
18. Tracking: The approach
● Construct all possible reconstructions and links
● Associate probabilities to them
● Associate probabilities to links
● Create hypergraph
● Reduce to a BIP
● Boolean variable per vertex
● Boolean variable per link
● Two constraints per vertex
○ Incoming flow = vertex variable value
○ Outgoing flow = vertex variable value
● Additional constraints from detections
○ Each detection may belong to at most one flow
● Cost per vertex variable from reconstruction probability
● Cost per link variable from link probability
● Minimize cost of flow
Binary Integer Programming (BIP): minimize cost with integer variables satisfying given constraints
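To make the formulation above concrete, here is a minimal sketch of such a BIP on a toy graph, solved by brute-force enumeration. Several things are assumptions for illustration only: the log-odds vertex cost, the negative-log link cost, the fixed `START_END_COST` on source/sink links, and all names and probabilities. A real system would hand the same variables and constraints to an integer-programming solver rather than enumerate.

```python
import itertools
import math

START_END_COST = 1.0  # assumed cost of opening or closing a track


def vertex_cost(p):
    # Assumed mapping: likely reconstructions get a negative (rewarding) cost.
    return -math.log(p / (1.0 - p))


def link_cost(p):
    # Assumed mapping: likely links are cheap, unlikely links are expensive.
    return -math.log(p)


def solve_bip(vertices, links, detections_of):
    """Enumerate all 0/1 assignments and return (best_cost, x, y).

    vertices:      {vertex: cost}
    links:         {(u, v): cost}; "S" is the source, "T" is the sink
    detections_of: {vertex: set of detection ids used by that reconstruction}
    """
    v_names, l_names = list(vertices), list(links)
    best = (math.inf, None, None)
    for bits in itertools.product((0, 1), repeat=len(v_names) + len(l_names)):
        x = dict(zip(v_names, bits[:len(v_names)]))
        y = dict(zip(l_names, bits[len(v_names):]))
        # Two constraints per vertex: incoming flow = outgoing flow = x[v].
        if any(sum(y[l] for l in l_names if l[1] == v) != x[v]
               or sum(y[l] for l in l_names if l[0] == v) != x[v]
               for v in v_names):
            continue
        # Each detection may belong to at most one selected reconstruction.
        used = [d for v in v_names if x[v] for d in detections_of[v]]
        if len(used) != len(set(used)):
            continue
        cost = (sum(vertices[v] * x[v] for v in v_names)
                + sum(links[l] * y[l] for l in l_names))
        if cost < best[0]:
            best = (cost, x, y)
    return best


# Toy instance: A0 and B0 compete for detection 2 at time 0; A1 is at time 1.
vertices = {"A0": vertex_cost(0.9), "B0": vertex_cost(0.6), "A1": vertex_cost(0.8)}
detections_of = {"A0": {1, 2}, "B0": {2, 3}, "A1": {4, 5}}
links = {("A0", "A1"): link_cost(0.9), ("B0", "A1"): link_cost(0.5)}
for v in vertices:
    links[("S", v)] = START_END_COST
    links[(v, "T")] = START_END_COST

cost, x, y = solve_bip(vertices, links, detections_of)
```

On this toy instance the selected flow is the single track S → A0 → A1 → T. The enumeration is exponential in the number of variables, which is exactly why the size of the graph matters so much in practice.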
20. ● Stable tracks
● Flexible
● No ID switch
● CPU-expensive
● Complexity
● Sensitive to parameter calibration
● BIP is NP-hard
21. Tracking: The doing
Introduced the 3D geometry of the store.
● Use geometric information on cameras and obstacles to filter reconstructions
● Make all parameters position-dependent
RESULTS:
➔ Lighter graph (-50% variables, -20% equations)
➔ Reduced complexity → Better scalability to bigger stores
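As an illustration of how store geometry can prune candidates before the graph is built, here is a minimal sketch that drops candidate floor positions lying outside the store or inside an obstacle. It is a simplification under stated assumptions: the store and obstacles are modeled as hypothetical axis-aligned rectangles on the floor plan, and camera visibility is not modeled at all.

```python
from typing import List, Tuple

# Assumed representation: (x_min, y_min, x_max, y_max) on the floor plan, in metres.
Box = Tuple[float, float, float, float]
Point = Tuple[float, float]


def inside(box: Box, x: float, y: float) -> bool:
    """True if (x, y) lies within the axis-aligned rectangle."""
    x_min, y_min, x_max, y_max = box
    return x_min <= x <= x_max and y_min <= y <= y_max


def filter_reconstructions(points: List[Point], store: Box,
                           obstacles: List[Box]) -> List[Point]:
    """Keep candidates that are inside the store and not inside any obstacle."""
    return [
        (x, y) for (x, y) in points
        if inside(store, x, y) and not any(inside(o, x, y) for o in obstacles)
    ]


store = (0.0, 0.0, 10.0, 8.0)
shelves = [(4.0, 3.0, 6.0, 5.0)]          # hypothetical obstacle
candidates = [(1.0, 1.0), (5.0, 4.0), (11.0, 2.0), (9.0, 7.0)]
kept = filter_reconstructions(candidates, store, shelves)
```

Every candidate removed at this stage is one vertex, plus all of its links, that never enters the optimization problem.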
29. Assignment problem
The aim is to combine data from cameras and scales to predict events
e = (timestamp, action, scale, product, quantity, user)
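The event tuple above maps directly onto a small record type. The sketch below also folds a stream of events into per-user carts; the action vocabulary (`"take"`, `"put_back"`) and the rule for updating carts are assumptions made for illustration, not something stated on the slide.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, NamedTuple


class Event(NamedTuple):
    timestamp: float
    action: str
    scale: str
    product: str
    quantity: int
    user: str


# Hypothetical action vocabulary: taking adds to the cart, putting back removes.
ACTION_SIGN = {"take": +1, "put_back": -1}


def build_carts(events: Iterable[Event]) -> Dict[str, Dict[str, int]]:
    """Replay events in time order and return {user: {product: quantity}}."""
    carts = defaultdict(Counter)
    for e in sorted(events, key=lambda ev: ev.timestamp):
        carts[e.user][e.product] += ACTION_SIGN[e.action] * e.quantity
    # Drop products whose net quantity is zero or negative.
    return {user: {p: q for p, q in items.items() if q > 0}
            for user, items in carts.items()}


events = [
    Event(1.0, "take", "scale-1", "cola", 2, "u1"),
    Event(2.0, "put_back", "scale-1", "cola", 1, "u1"),
    Event(3.0, "take", "scale-2", "chips", 1, "u2"),
    Event(4.0, "put_back", "scale-2", "chips", 1, "u2"),
]
carts = build_carts(events)
```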
[Diagram labels: 2 input sources (cameras, scales); data processing (one block per source); data fusion; scale, action, product + quantity, user; final output: carts; trigger]
30. Assignment problem
For each user we compute the trajectories of the distances between relevant joints and the scale, around the timestamp of the action.
We train the model to classify the action on this data.
[Figure labels: wrist, elbow, shoulder]
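The feature described on this slide, joint-to-scale distances in a time window around the action, can be sketched as follows. The input layout (a list of `(timestamp, {joint: (x, y, z)})` pairs for one user), the `half_window` parameter and the function name are assumptions for illustration; the classifier itself is not shown.

```python
import math
from typing import Dict, List, Tuple

Point3D = Tuple[float, float, float]
Frame = Tuple[float, Dict[str, Point3D]]  # (timestamp, joint name -> 3D position)


def distance_trajectories(frames: List[Frame], scale_pos: Point3D,
                          t_action: float, half_window: float) -> Dict[str, List[float]]:
    """Return, per joint, the joint-to-scale distances for frames within
    half_window seconds of the action timestamp, in the order given."""
    trajectories: Dict[str, List[float]] = {}
    for timestamp, joints in frames:
        if abs(timestamp - t_action) <= half_window:
            for name, position in joints.items():
                trajectories.setdefault(name, []).append(math.dist(position, scale_pos))
    return trajectories


# Toy data: the wrist moves along x past a scale located at x = 2.
frames = [
    (float(t), {"wrist": (float(t), 0.0, 0.0),
                "elbow": (float(t), 0.0, 3.0),
                "shoulder": (float(t), 4.0, 0.0)})
    for t in range(5)
]
features = distance_trajectories(frames, scale_pos=(2.0, 0.0, 0.0),
                                 t_action=2.0, half_window=1.0)
```

Here `features["wrist"]` dips to zero at the action timestamp while the shoulder stays far away. A classifier trained on such curves, one set per candidate user, can learn which shapes correspond to which action.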
31. Metrics
We defined some metrics to evaluate how well the system is performing.
[Diagram: record data, annotation tool, calculate, store and analyze metrics]
The same metrics can be defined in spaces where we
ignore either the user or the action variable.
We also evaluate these metrics on the space of the carts.
Takes into account:
Back-projection
False positive rate
False negative rate
Expected detections of the scene
m1 tells us how many events are correctly predicted among all these possible outcomes.
m2 measures how many ground-truth events were correctly predicted.
m3 tells us how many predictions were right.
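The exact formulas are not given in the text, so the sketch below rests on one plausible reading of the three descriptions: m1 as a Jaccard-style index TP / (TP + FP + FN), m2 as recall TP / (TP + FN), and m3 as precision TP / (TP + FP), with a prediction counted as correct only when it matches a ground-truth event exactly. The `ignore` argument illustrates evaluating the same metrics in a space where the user or the action variable is dropped. All names are hypothetical.

```python
from collections import Counter

FIELDS = ("timestamp", "action", "scale", "product", "quantity", "user")


def project(events, ignore=()):
    """Drop the ignored fields and count identical events (multiset)."""
    keep = [i for i, name in enumerate(FIELDS) if name not in ignore]
    return Counter(tuple(e[i] for i in keep) for e in events)


def metrics(ground_truth, predictions, ignore=()):
    """Return (m1, m2, m3) under the assumed Jaccard / recall / precision reading."""
    gt, pred = project(ground_truth, ignore), project(predictions, ignore)
    tp = sum((gt & pred).values())   # events present in both multisets
    fn = sum(gt.values()) - tp       # ground-truth events that were missed
    fp = sum(pred.values()) - tp     # predictions with no matching ground truth
    m1 = tp / (tp + fp + fn) if tp + fp + fn else 1.0
    m2 = tp / (tp + fn) if tp + fn else 1.0
    m3 = tp / (tp + fp) if tp + fp else 1.0
    return m1, m2, m3


ground_truth = [
    (1, "take", "s1", "cola", 1, "u1"),
    (2, "take", "s2", "chips", 1, "u2"),
]
predictions = [
    (1, "take", "s1", "cola", 1, "u1"),
    (2, "take", "s2", "chips", 1, "u1"),   # right event, wrong user
    (3, "take", "s3", "gum", 1, "u2"),     # spurious event
]
full = metrics(ground_truth, predictions)
no_user = metrics(ground_truth, predictions, ignore=("user",))
```

Comparing `full` with `no_user` separates errors in user assignment from errors in event detection: the second prediction counts as wrong in the full space but as right once the user is ignored.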