Hardware, SLAM Tracking, Privacy, and AR Cloud Data
1.
UNIT 2
Hardware, SLAM, Tracking
Ms. P. Ananthi, Assistant Professor, Kongu Engineering College
2.
Agenda
• How the Computer Vision That Makes Augmented Reality Possible Works
• A Brief History of AR
• Select an AR Platform – Mapping
3.
6D.ai
I'm Matt Miesnieks, CEO of 6D.ai, a startup that emerged from Oxford University’s Active Vision Lab. My co-founder, Professor Victor Prisacariu, leads one of the world’s top AR computer vision research groups.
The Unique Approach of 6D.ai
• 6D.ai takes a distinct approach to Augmented Reality (AR). We focus on solving the most challenging technical
problems in AR and provide solutions through developer APIs. The privacy commitment means that Personally
Identifiable Information (PII) never leaves the device unless explicitly allowed by the user. This privacy-first
approach introduces a complex technical challenge: building and managing large SLAM (Simultaneous
Localization and Mapping) maps directly on the device, in real-time. While small maps are easier to handle,
managing large maps—such as those bigger than a large house—is significantly more difficult.
6D.ai's Technical Innovations
• It has developed advanced systems using next-generation 3D reconstruction, relocalization algorithms, and
neural networks. These systems are based on cutting-edge, unpublished research, aimed at delivering a
seamless, intuitive user experience (UX) for multiplayer and persistent AR.
4.
Key Features of 6D.ai’s Technology
• Real-Time On-Device Processing: All processing occurs in real-time on the device, with cloud storage used
for persistent maps and offline data merging and cleanup.
• Automatic Background Map Building: Maps are built in the background while the app runs. Updates from
all users are merged into a single map, enhancing space coverage.
• Privacy and Data Security: Anchors and maps contain no PII and are securely stored in the cloud, ensuring
that data cannot be reverse-engineered into visual images.
• Shared Map Access: Maps are accessible across all apps, meaning every user benefits from the
contributions of others.
• Offline and Peer-to-Peer Capabilities: 6D.ai's system works offline, in peer-to-peer environments, and in
secure/private settings, making it versatile and independent of cloud dependency.
5.
A Brief History of AR
Following is a summary of the key players who brought AR to consumer quality:
• Visual inertial odometry invented at Intersense in the early 2000s by Leonid Naimark → Dekko → Samsung → FB and Magic Leap and Tesla
• FlyBy VIO → Tango and Apple
• Oxford Active Vision Lab → Georg Klein (PTAM) → Microsoft
• Microsoft (David Nister) → Tesla
• Oxford → Gerhard Reitmayr → Vuforia
• Oxford → Gabe Sibley → Zoox
• Oxford + Cambridge + Imperial College → Kinect → Oculus and ML (Richard Newcombe, David Molyneaux)
• Vuforia → Eitan Pilipski → Snap
• FlyBy/Vuforia → Daqri
6.
• Scarcity of Expertise: Only a handful of people worldwide possess the expertise to build
high-quality AR systems.
• Monocular Visual Inertial Odometry (VIO): VIO has emerged as the leading solution for
mobile tracking, delivering the best user experience (UX) today.
• Origins of VIO: VIO was first implemented in the early 2000s by Intersense, a
military/industrial supplier based in Boston, Massachusetts.
• Development at Dekko: Leonid Naimark, a key figure in VIO's development, was the chief
scientist at Dekko, which proved the limitations of VIO on consumer devices like the iPad 2.
• Google and Apple’s Adoption: The VIO system, initially developed by FlyBy, was licensed
to Google for Tango and later became the core of Apple's ARKit after Apple acquired FlyBy.
7.
• Repetition of Lessons: Many ARKit demos mirror lessons learned from earlier platforms like Vuforia
and Layar, but on a much larger scale.
• Learning from Experience: Developers are revisiting concepts from previous AR generations,
highlighting the iterative learning process in AR development.
• Offering Feedback: The author offers to provide feedback and support for AR projects, based on
extensive experience with various types of AR apps.
• Embrace Novelty: Developers are encouraged to experiment with novelty apps, as they were the first
hits on smartphones, despite the challenge of finding practical AR use cases for handheld devices.
• Utility in Handheld AR: Finding use cases that provide real utility via AR on handheld see-through
devices is challenging but crucial for innovation.
8.
• Hololens and Tesla: Key developers from Oxford’s Active Vision Lab contributed to the development of the VIO system for Hololens and autonomy systems at Tesla.
• Vuforia’s Contributions: Engineers from Cambridge and other institutions played significant roles in developing
Vuforia’s SLAM and VIO systems, which influenced many current AR technologies.
• Snap's AR Engineering: Many experts from Vuforia, including those who studied at Cambridge, now lead AR
software engineering at Snap.
• Influence of Academic Institutions: Research teams from Oxford, Cambridge, and Imperial College have
significantly influenced the development of SLAM and AR tracking systems at companies like Oculus, Magic Leap,
and Apple.
• Lack of New AR Startups: Despite the small, highly skilled talent pool, there are few current AR startups in the AR
tracking domain led by this elite group of engineers.
9.
How and Why to Select an AR Platform
• Starting with ARKit: Begin developing your AR idea on a device that supports ARKit, as it's
widely accessible and functional.
• Real-World Challenges: Designing an AR app for the real world, where you don't control the
scene, is vastly different from designing smartphone or VR apps where every pixel is controlled.
• Transition to Advanced Platforms: After mastering ARKit, move to platforms like Magic Leap,
6D.ai, or Hololens, which allow for spatial mapping and interaction with 3D structures in
uncontrolled environments.
• Steep Learning Curve: Transitioning from ARKit to advanced platforms requires a significant
shift in thinking, even steeper than transitioning from web to mobile or mobile to VR.
• Rethinking UX and Use Cases: Developers must completely rethink how apps function and
determine what user experiences (UX) or use cases make sense in AR.
10.
Performance Is Statistics
Always demonstrate or test a system in the real world. There’s a huge gap between controlled and uncontrolled scenes. Never trust a demonstration video.
What does “work well” mean?
• No detectable user motion for initialization
• Instant convergence
• Metric scale <2% error
• No jitter
• No drift
• Low power
• Low BOM cost
• Hundreds of meters of range with <1% drift (prior to loop closure)
• Instant loop closures
• Loop closure from a wide range of angles
• Works in low-featured scenes (e.g., sky, white walls)
• Works in variably lit scenes/low light
• Works in repetitive or reflective scenes
11.
• A camera's digital image sensor, represented as a grid of pixels, captures varying intensities of light, affecting tracking stability.
• Changes in light conditions alter the photons hitting the sensor, causing the visual tracking system to misinterpret movement.
• AR platforms like ARKit must determine which points in the scene are reliable, averaging calculations to estimate the correct pose.
• To improve system robustness, there must be precise integration and calibration between camera hardware, IMU hardware, and
software algorithms.
• Developers should rigorously test their apps in diverse scenes and lighting conditions to ensure reliability in real-world scenarios.
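The pose-averaging idea above can be illustrated with a toy robust estimator. Everything here is invented for illustration (real trackers estimate a full 6DOF pose from hundreds of 2D feature tracks); it simply shows why a robust statistic such as the median survives one unreliable feature point where a plain mean does not.

```python
import statistics

def estimate_shift(per_feature_shifts):
    # Each entry is the camera shift implied by one tracked feature.
    # A median tolerates a minority of bad matches (e.g., a point on a
    # moving object or a misdetected corner); a mean is dragged off.
    return statistics.median(per_feature_shifts)

shifts = [1.0, 1.1, 0.9, 1.05, 0.95, 8.0]  # one bad feature match
robust = estimate_shift(shifts)            # ~1.0
naive = sum(shifts) / len(shifts)          # ~2.2, ruined by the outlier
```

Production systems use heavier machinery (RANSAC, robust cost functions), but the principle is the same: decide which points are reliable before averaging.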
12.
Integrating Hardware and Software
• VIO (Visual Inertial Odometry) is relatively easy to implement but challenging to perfect, requiring rapid convergence of inertial and optical
systems onto a stereoscopic map with high accuracy.
• The Dekko implementation needed users to make specific motions and move the device for about 30 seconds before achieving
convergence, highlighting the complexity of building a great inertial tracking system.
• Only a small number of engineers worldwide have the expertise to build these systems, with most working in specialized fields like missile
tracking or Mars rover navigation.
• Tight integration between hardware and software is crucial, particularly with accurately clock-synchronized IMU and camera systems,
which was previously difficult to achieve.
• The first Tango Peanut phone succeeded because it was the first consumer device to accurately synchronize all components, leading to
excellent tracking.
• Modern systems on chips, like those from Qualcomm, now have synchronized sensor hubs, making VIO viable on most current devices with
proper sensor calibration.
• The close hardware-software dependency has made it nearly impossible for software developers to create great systems without OEM
support.
• Google, Microsoft, Magic Leap, and Apple have succeeded in AR by either investing heavily in hardware specifications or building their own
hardware, with Apple's ARKit thriving due to its control over both hardware and software.
13.
Optical Calibration
• For the software to precisely determine whether a pixel on the camera sensor matches a point in the real world, the camera system needs to be accurately calibrated.
Optical calibration has two parts: geometric calibration and photometric calibration.
14.
Optical Calibration
• Geometric Calibration: This process corrects for lens distortions, such as the barrel effect, using a pinhole camera model. It typically involves using a checkerboard and basic camera specifications, which most software developers can handle without OEM involvement.
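As a sketch of what geometric calibration corrects, the snippet below applies the common two-parameter radial distortion model (the source of the barrel effect) to normalized pinhole coordinates and inverts it by fixed-point iteration. This is illustrative only — a real checkerboard calibration also recovers focal length, principal point, and tangential distortion terms.

```python
def distort(xy, k1, k2):
    # Two-parameter radial model: r' = r * (1 + k1*r^2 + k2*r^4),
    # applied to normalized (pinhole) image coordinates.
    x, y = xy
    r2 = x * x + y * y
    f = 1 + k1 * r2 + k2 * r2 * r2
    return (x * f, y * f)

def undistort(xy, k1, k2, iters=10):
    # Invert the model by fixed-point iteration; this converges quickly
    # for the mild distortion typical of phone cameras.
    xd, yd = xy
    xu, yu = xd, yd
    for _ in range(iters):
        r2 = xu * xu + yu * yu
        f = 1 + k1 * r2 + k2 * r2 * r2
        xu, yu = xd / f, yd / f
    return (xu, yu)
```

Calibration is the process of estimating `k1`, `k2`, and the other camera parameters from checkerboard images; once known, every pixel can be mapped back to an ideal pinhole ray.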
15.
• Photometric Calibration: This is more complex and usually requires OEM involvement. It addresses the image sensor's specifics, including lens coatings, to ensure accurate color and intensity mapping. This calibration is essential for robust optical tracking, reducing errors by increasing the certainty that a pixel on the sensor corresponds to a real-world point.
16.
Inertial Calibration
• IMU = Inertial Measurement Unit
• Combines:
• Accelerometer – Measures acceleration
• Gyroscope – Measures rotational motion
• Found in: Smartphones, VR devices, drones, etc.
• To estimate distance, you need to double-integrate acceleration
• Dead reckoning: Estimating distance traveled by double integrating the acceleration data.
• It’s essentially a "smart guess" based on sensor readings.
• However, errors accumulate very quickly, which makes long-term accuracy difficult.
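Why do errors accumulate so quickly? Because double integration turns a tiny constant bias b into a position error of roughly ½·b·t². The toy Python sketch below (all numbers invented) Euler-integrates a biased accelerometer for 10 seconds and ends up about 2.5 m off, even though the device never moved.

```python
def dead_reckon(accels, dt):
    # Double-integrate acceleration: a -> velocity -> position.
    v, x, xs = 0.0, 0.0, []
    for a in accels:
        v += a * dt   # first integration
        x += v * dt   # second integration
        xs.append(x)
    return xs

dt, n = 0.01, 1000                      # 100 Hz for 10 seconds
bias = 0.05                             # m/s^2, a small fixed bias
true_pos = dead_reckon([0.0] * n, dt)   # device actually at rest
est_pos = dead_reckon([bias] * n, dt)   # what the IMU alone reports
drift = est_pos[-1] - true_pos[-1]      # ~0.5 * 0.05 * 10**2 = 2.5 m
```

This quadratic growth is why raw IMU dead reckoning is only trusted for a few seconds, and why VIO continually corrects it with visual tracking.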
17.
• The goal of inertial calibration is to make short-term distance estimation accurate enough (for a few seconds).
• This helps when:
• The camera temporarily loses tracking (e.g., user covers the lens).
• Visual data is missing or obstructed.
• Repeated, controlled movements are used to study and model IMU behavior.
• Example: A robotic arm moves the device in exactly the same way multiple times.
• The actual motion (ground truth) is compared with the IMU’s output.
• Filters are written to reduce errors and improve accuracy.
• Achieving real accuracy with IMUs is very difficult. IMU data is affected by many hidden errors, which must be identified from signal traces (e.g., RGB lines in graphs).
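The robotic-arm procedure above amounts to regressing the IMU's output against known motion. A minimal Python sketch, using an invented scale-plus-bias error model and made-up numbers (real calibration fits many more error terms):

```python
import numpy as np

rng = np.random.default_rng(0)
true_acc = np.linspace(-5.0, 5.0, 200)   # ground truth from the rig
scale, bias = 1.02, 0.15                 # hidden sensor errors to recover
measured = scale * true_acc + bias + rng.normal(0.0, 0.01, true_acc.size)

# Fit measured = est_scale * true + est_bias by least squares,
# then invert the model to correct future readings.
est_scale, est_bias = np.polyfit(true_acc, measured, 1)
corrected = (measured - est_bias) / est_scale
```

With the ground truth from the robotic arm, even a simple least-squares fit recovers the scale factor and fixed bias closely; the residual noise is what the filter stage then has to handle.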
18.
• Common Accelerometer Errors
• Fixed Bias – Constant offset even when no acceleration exists
• Scale Factor Error – Output doesn’t match the expected model (often nonlinear)
• Cross-Coupling – Acceleration in one direction affects the measurement in another, due to sensor misalignment or imperfections
• Vibro-Pendulous Error – Vibration in sync with pendulum movement, like a child on a swing
• Clock Error – Incorrect measurement of integration time
• Nonzero Acceleration at Rest – Sensor gives false readings even at zero movement
• Graph Analysis – Errors like these are seen in RGB trace graphs (e.g., Figure )
19.
The Future of Tracking Beyond VIO in AR Systems
• VIO (Visual-Inertial Odometry) is still the best
solution for tracking over hundreds of meters:
• Low power, low cost, and high accuracy
• Ideal for Head-Mounted Displays (HMDs)
• Monocular VIO remains dominant due to efficiency
• Future Trends & Alternatives:
• 🔍 Deep Learning:
• Promising for tracking & relocalization
• Current error ~10% vs VIO’s <1%, but improving
• 📏 Depth Cameras:
• Provide metric scale & help in low-feature scenes
• Downsides: High power use, poor outdoor
performance, short range, costly
• 👀 Stereo RGB / Fisheye Lenses:
• Capture more features (e.g., walls, ceilings,
carpets)
• Offer some depth info at lower compute cost
• Limited range due to close camera spacing
• 🌍 Future Direction:
• Need to track across large outdoor areas
• Similar to self-driving car systems, but with:
• Fewer sensors
• Lower power
• Solutions will include cloud-based
services (e.g., Google Tango VPS)
20.
The Future of AR Computer Vision
• 6DOF Tracking is now standard in modern
devices (since ~2019).
• 3D Reconstruction (a.k.a. spatial mapping/depth
perception):
• Captures real-world shape & structure
• Enables occlusion (virtual objects hiding behind
real ones)
• Uses depth cameras to create point clouds → mesh → Unity render
• 🌐 Cloud Hosting & Multi-User Interaction:
• Share and extend 3D models
• Scale for real-time multiplayer AR
• 🧠 Semantic Scene Segmentation:
• Uses deep learning + 3D data (LIDAR/stereo)
• Identifies sidewalks vs roads (e.g., Pokémon Go
safety)
• 🎮 Vision: Real-World MMORPG
• Like World of Warcraft, but in the real world