Building a Real-Time Geospatial-Aware Recommendation Engine


Recommendation engines help your prospects and customers find the most relevant offers and content. In this presentation, you will learn how to use AWS building blocks to build your own location-aware recommendation engine. You’ll see how to store real-time events using Amazon Kinesis and Amazon DynamoDB. See how to easily move data into Amazon Redshift using Kinesis Firehose. As your site or app rises in popularity, you’ll need to track a wider variety of events and scale to handle traffic and usage spikes. Learn architectural patterns for processing large datasets and high-request volume applications.

Published in: Technology


  1. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Nate Slater, AWS Solution Architect. Wednesday, January 20, 2016. Build a Location-Aware Recommendation Engine: Power your Apps with AWS Managed Services
  2. Agenda • Overview of Key Concepts • Review App Requirements • Data Modeling • Data Collection and Storage • Machine Learning • Demo
  3. Overview of Key Concepts
  4. Location Awareness • A “location-aware” app tailors its content to be relevant for a given location. • Location is derived from the geo-coordinates reported by the mobile device or web browser. • Location data is used to filter search results to those near the location: • Local shops, merchants, or attractions • News and events • Location data is also used to tag photos or social media posts so that you can search for content by location: • “Show me the photos I took at dinner in San Francisco”
  5. Recommendation Engine • A recommendation engine supplies content to the app that it thinks the user will like. • Recommendations can be generated algorithmically, for example by a fixed rule: if (time == 12:00PM) { displayLocalRestaurants(); } • Recommendations can be generated by a predictive model, which maps an input record to a predicted class and probability: input {gender: "m", age: "18-25", budget: "$10-20", restaurant: "Taco Time"}, output {predictedClass: "3_star", probability: "0.478"}
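The two approaches on this slide can be sketched in Python as follows. This is only an illustration: the lunch window, the function names, and the star and probability thresholds are assumptions, not part of the talk. The prediction dictionary uses the field names shown on the slide.

```python
from datetime import time
from typing import Optional


def rule_based_recommendation(now: time) -> Optional[str]:
    """Algorithmic approach: a fixed rule keyed on the time of day."""
    # Hypothetical lunch window; the slide's rule fires at exactly 12:00 PM.
    if time(11, 30) <= now <= time(13, 30):
        return "local_restaurants"
    return None


def should_show_offer(prediction: dict,
                      min_stars: int = 3,
                      min_probability: float = 0.4) -> bool:
    """Model approach: act on a predicted class and its probability.

    `prediction` has the shape shown on the slide, e.g.
    {"predictedClass": "3_star", "probability": "0.478"}.
    The thresholds are illustrative assumptions.
    """
    stars = int(prediction["predictedClass"].split("_")[0])
    probability = float(prediction["probability"])
    return stars >= min_stars and probability >= min_probability
```

The rule is cheap and predictable but the same for every user; the model-based check depends on the quality of the prediction it is handed.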
  6. App Requirements
  7. App Requirements – Search and Location • Users should be able to search for local businesses on their mobile device. Required search terms: • Name • Address • Description • Keyword (tacos, beer, flowers, etc.)
  8. App Requirements – Search and Location • Search results should be filtered by location and displayed on a map. • Only show results that fall within the rectangular map, i.e. with geo-coordinates between the (x,y) of the top-right corner and the (x,y) of the bottom-left corner.
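A minimal sketch of the rectangular map filter, assuming each place carries `lat` and `lon` fields and the map is described by its bottom-left and top-right corners. The class and function names are hypothetical, and the sketch does not handle a map that crosses the 180° meridian.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class BoundingBox:
    """Rectangular map area given by two opposite corners."""
    bottom_left_lat: float
    bottom_left_lon: float
    top_right_lat: float
    top_right_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        # A point is inside when both coordinates lie between the corners.
        return (self.bottom_left_lat <= lat <= self.top_right_lat
                and self.bottom_left_lon <= lon <= self.top_right_lon)


def filter_by_map(places: Iterable[dict], box: BoundingBox) -> List[dict]:
    """Keep only the places whose coordinates fall inside the map."""
    return [p for p in places if box.contains(p["lat"], p["lon"])]


# Illustrative box roughly covering San Francisco.
sf_box = BoundingBox(37.70, -122.52, 37.83, -122.35)
places = [
    {"name": "Fog Harbor Fish House", "lat": 37.8090391877147, "lon": -122.410291436563},
    {"name": "Somewhere in Oakland", "lat": 37.8044, "lon": -122.2712},
]
visible = filter_by_map(places, sf_box)
```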
  9. App Requirements – Recommendations • Recommendations should be based on a predictive model. • User registration form will collect minimal information: • Email Address • Gender • Age Group (under 18, 18-25, over 65, etc.) • Postal Code • Recommendations will be offers from local merchants. • Must be able to track user response to offers: • Click-through • Conversions
  10. Data Modeling
  11. Characterize the Data • Location data: • Structured • Mutable • One-to-many relationships (e.g. keywords) • Click-stream data: • Semi-structured (JSON) • Immutable
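As an illustration of immutable, semi-structured click-stream data, the sketch below models one event as a frozen dataclass and serializes it to a JSON line. The field names and event types are assumptions chosen to match the click-through and conversion tracking requirement; the talk does not specify an event schema.

```python
import json
from dataclasses import FrozenInstanceError, asdict, dataclass


@dataclass(frozen=True)  # frozen: events are never modified after creation
class ClickEvent:
    user_id: str
    offer_id: str
    event_type: str   # assumed values: "click_through" or "conversion"
    timestamp: str    # ISO 8601 string, e.g. "2016-01-20T12:00:00Z"


def to_json_line(event: ClickEvent) -> str:
    """Serialize one event as a single JSON line, ready to append to a stream."""
    return json.dumps(asdict(event), sort_keys=True)


event = ClickEvent("user-1", "offer-9", "click_through", "2016-01-20T12:00:00Z")
line = to_json_line(event)
```

A correction is recorded as a new event rather than an update, which is what makes an append-only stream a natural fit for this data.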
  12. Location Data – ER Model
  13. Location Data – Document Model { "keywords": ["seafood"], "name": "Fog Harbor Fish House", "location": { "city": "San Francisco", "geocoordinate": { "lat": 37.8090391877147, "lon": -122.410291436563 }, "state": "CA", "postal_code": "94133", "address": ["Pier 39", "Ste A-202"] }, "type": "restaurant", "placeId": "a83d" }
  14. ER vs Document Models • ER Model: • Normalized • Transactional • Search requires indexing all attributes • Adding new attributes requires schema change • Document Model: • Denormalized • Attributes can be added without schema change • Entire document can be indexed for search
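The two document-model properties can be illustrated with a short sketch: a new attribute is added to the place document without touching any schema, and every string in the document is collected for search. The helper names and the `hours` attribute are hypothetical, and a real search index would do far more than split on whitespace.

```python
from typing import Iterator, Set


def iter_strings(value) -> Iterator[str]:
    """Yield every string leaf in a nested dict/list document."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for child in value.values():
            yield from iter_strings(child)
    elif isinstance(value, list):
        for child in value:
            yield from iter_strings(child)
    # Numbers and other types are skipped in this sketch.


def search_tokens(document: dict) -> Set[str]:
    """Build a lowercase token set from the entire document."""
    tokens: Set[str] = set()
    for text in iter_strings(document):
        tokens.update(text.lower().split())
    return tokens


place = {
    "keywords": ["seafood"],
    "name": "Fog Harbor Fish House",
    "location": {"city": "San Francisco", "address": ["Pier 39", "Ste A-202"]},
    "type": "restaurant",
    "placeId": "a83d",
}
place["hours"] = "11am-9pm"  # new attribute, no schema change needed
tokens = search_tokens(place)
```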
  15. Data Collection and Storage
  16. Types of Data • Transactional: • OLTP • Document • File: • Logs • Stream: • IoT • Click-stream
  17. Why Transactional Data Storage? • High throughput • Read-, write-, and update-intensive • Thousands or millions of concurrent interactions • Availability, speed, recoverability
  18. Amazon DynamoDB • Managed NoSQL database service • Supports both document and key-value data models • Highly scalable – no table size or throughput limits • Consistent, single-digit millisecond latency at any scale • Highly available – 3x replication • Simple and powerful API
  19. DynamoDB Streams • Stream of updates to a table • Asynchronous • Exactly once • Strictly ordered, per item • Highly durable; scales with the table • 24-hour lifetime • Sub-second latency
  20. DynamoDB Streams and AWS Lambda
  21. Architectural Pattern – Materialize Views on DynamoDB
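One way to picture the materialized-view pattern is a Lambda-style handler that consumes DynamoDB Streams records and keeps a keyword-to-places view up to date. The sketch below is an assumption-laden illustration: the view is an in-memory dictionary standing in for a second table, `placeId` is assumed to be the table key, `keywords` is assumed to be stored as a list of strings, and the stream is assumed to deliver both old and new images.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Stand-in for the materialized view (in practice, another table).
keyword_view: Dict[str, Set[str]] = defaultdict(set)


def _keywords(image: dict) -> List[str]:
    """Extract keywords from a DynamoDB-typed image: {"keywords": {"L": [{"S": ...}]}}."""
    return [item["S"] for item in image.get("keywords", {}).get("L", [])]


def handler(event: dict, context=None) -> int:
    """Apply each stream record to the view; returns the number of records seen."""
    for record in event["Records"]:
        data = record["dynamodb"]
        place_id = data["Keys"]["placeId"]["S"]
        # Remove the place from the keywords it used to have.
        for keyword in _keywords(data.get("OldImage", {})):
            keyword_view[keyword].discard(place_id)
        # Re-add it under its current keywords unless the item was deleted.
        if record["eventName"] != "REMOVE":
            for keyword in _keywords(data.get("NewImage", {})):
                keyword_view[keyword].add(place_id)
    return len(event["Records"])
```

Because the stream is strictly ordered per item, applying records in arrival order keeps the view consistent with the latest state of each place.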
  22. Amazon Kinesis • Managed service for streaming data ingestion and processing • Millions of sources producing 100s of terabytes per hour • Front end: authentication, authorization • Durable, highly consistent storage replicates data across three data centers (Availability Zones) • Ordered stream of events supports multiple readers: • Aggregate and archive to S3 • Real-time dashboards and alarms • Machine learning algorithms or sliding window analytics • Aggregate analysis in Hadoop or a data warehouse • Inexpensive: $0.028 per million puts
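To put the quoted put price in perspective, here is a small worked calculation. It uses only the $0.028 per million puts figure from the slide, which reflects pricing at the time of the talk and may differ today; the event rate is a made-up example, and other charges such as shard hours are not included.

```python
PRICE_PER_MILLION_PUTS_USD = 0.028  # figure quoted on the slide, not current pricing


def put_cost_usd(events_per_second: float, hours: float) -> float:
    """Estimate the put charge for a steady event rate over a number of hours."""
    total_puts = events_per_second * 3600 * hours
    return total_puts / 1_000_000 * PRICE_PER_MILLION_PUTS_USD


# Hypothetical workload: 1,000 click events per second for one day
# is 86.4 million puts, or about $2.42 at the slide's price.
daily_cost = put_cost_usd(events_per_second=1000, hours=24)
```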
  23. Kinesis Firehose • Makes stream processing even easier! • Automatically delivers data to S3 and Redshift • No Kinesis Client Library (KCL) or Lambda functions required • Handles all scaling of the Kinesis stream’s shards • Near real-time: • Data loaded into S3 or Redshift within 60 seconds of hitting the stream • Fully managed: • No operational overhead of managing streams, shards, or KCL applications
  24. Machine Learning
  25. Origins of Machine Learning
  26. Machine Learning – Key Concepts Segmentation: • Divide the population into subgroups that have different values for the target variable [Figure: a population of individuals labeled Yes or No]
  27. Machine Learning – Key Concepts [Figure: the Yes/No population divided into three segments, labeled Pure, Pure, and Mixed]
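The "Pure" and "Mixed" labels can be made quantitative. The slide does not name a purity measure, so the sketch below assumes Shannon entropy, a common choice: a pure segment scores 0 and an evenly mixed two-class segment scores 1. The example segments are invented for illustration and are not the ones in the figure.

```python
import math
from collections import Counter
from typing import Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of the target values in one segment."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    # Sum p * log2(1/p) over the classes present in the segment.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical segments of a Yes/No target variable.
pure_no = ["No", "No", "No"]
pure_yes = ["Yes", "Yes"]
mixed = ["Yes", "No"]
```

A good segmentation splits the population so that most segments have low entropy, meaning membership in a segment says a lot about the target variable.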
  28. Machine Learning – Key Concepts Linear Classification: • A linear function is used to separate the dataset into classes
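A minimal sketch of linear classification: a weighted sum of the features plus a bias is computed, and the sign of the result decides the class. The weights, bias, and Yes/No labels are illustrative assumptions; in practice a training algorithm learns the weights from data.

```python
from typing import Sequence


def linear_classify(features: Sequence[float],
                    weights: Sequence[float],
                    bias: float) -> str:
    """Classify a point by which side of the line w·x + b = 0 it falls on."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return "Yes" if score >= 0 else "No"


# Hypothetical 2-D decision line x1 - x2 = 0.
weights, bias = [1.0, -1.0], 0.0
above = linear_classify([2.0, 1.0], weights, bias)
below = linear_classify([1.0, 2.0], weights, bias)
```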
  29. Amazon Machine Learning • Easy to use, managed machine learning service built for developers • Robust, powerful machine learning technology based on Amazon’s internal systems • Create models using your data already stored in the AWS cloud • Deploy models to production in seconds
  30. Powerful Machine Learning Technology • Based on Amazon’s battle-hardened internal systems • Not just the algorithms: • Smart data transformations • Input data and model quality alerts • Built-in industry best practices • Grows with your needs: • Train on up to 100 GB of data • Generate billions of predictions • Obtain predictions in batches or real-time
  31. Machine Learning Workflow • Step 1: Build & train model • Step 2: Evaluate and optimize • Step 3: Retrieve predictions • Build & train model: • Create a Datasource object pointing to your data • Explore and understand your data • Transform data and train your model
  32. Add Predictions to Existing Data Flow • Your application • Amazon DynamoDB + Trigger event with Lambda + Query for predictions with the Amazon ML real-time API
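The flow on this slide might look roughly like the sketch below: a Lambda-style function receives DynamoDB Streams records, flattens each new item into a map of strings, and asks a prediction function for a label. The prediction call is injected as a plain callable so the sketch runs without AWS access; in a real deployment it could wrap the `predict` operation of the boto3 `machinelearning` client, which takes `MLModelId`, `Record`, and `PredictEndpoint`. The function names and the attribute layout are assumptions.

```python
from typing import Callable, Dict, List


def image_to_record(image: dict) -> Dict[str, str]:
    """Flatten a DynamoDB-typed image into the string map a prediction call expects."""
    record: Dict[str, str] = {}
    for name, typed_value in image.items():
        for type_code in ("S", "N"):  # only string and number attributes in this sketch
            if type_code in typed_value:
                record[name] = str(typed_value[type_code])
    return record


def handle_stream_event(event: dict,
                        predict: Callable[[Dict[str, str]], dict]) -> List[str]:
    """Return the predicted label for every inserted or modified item."""
    labels: List[str] = []
    for stream_record in event["Records"]:
        if stream_record["eventName"] == "REMOVE":
            continue  # deleted items have nothing to predict on
        record = image_to_record(stream_record["dynamodb"]["NewImage"])
        response = predict(record)
        # Assumed response shape: {"Prediction": {"predictedLabel": ...}}.
        labels.append(response["Prediction"]["predictedLabel"])
    return labels
```

Injecting `predict` also makes the handler easy to unit test with a stub before pointing it at a real-time endpoint.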
  33. Demo
  34. Data Architecture
  35. Thank You! Nate Slater, AWS Solution Architect
