Building a personalized web scale application - tht11005 - v1.1



A close look from the Oracle NoSQL Database product management group at the challenges associated with web-scale personalization workloads: how NoSQL technology is enabling this class of application, and how the inability to meet the demands of these emerging workloads can hurt the business financially.

Published in: Technology

  • Definition of Personalization: Wikipedia, “Personalization”. Retrieved 15 Sept 2013. References: Doman, James. “What is the definition of ‘personalization’?” Quora. Retrieved 19 March 2012.
  • Retail, especially brick and mortar retail, used to be very simple. A customer walked in, found what they wanted, paid for it and walked out. There was typically almost no interaction with the customer and certainly no personalized experience. Very simple, limited set of steps.
  • The store didn’t really interact with the customer at all, unless they needed help. The only record of the customer ever visiting the store was in the form of a sales receipt. Most purchases consisted of a few items. There was no opportunity to recommend other products, customize the experience or learn about how the customer experienced their purchase.
  • With the web, everything changes. A customer’s actions can be captured, and navigation and information can be tailored to that customer, providing a personalized experience. Customers can record comments, suggestions, reviews, etc. Every customer visit is an opportunity to learn more about the customer and guide their shopping experience.
  • Purchases over the web can include hundreds of steps, with a wealth of personalized data, dynamic content and navigation. Web sites can capture information specific to this customer: their product ratings, comments, shipping and packaging preferences, and lists. It is a much richer environment: how long did they stay, where did they look, what did they compare, how hard was it to get to the product? All of this information tells the store how it is doing and can be reused the next time the customer visits. Yes, you still need to capture the contents of the shopping cart, but that is only one aspect, albeit an important one, of the web retail experience.
  • How do the customer and the merchandiser interact over the web? Via personalization. What does that mean? Every step of the experience is personal and explicit for that customer. A few of the common interactions that people have come to expect: a personalized greeting; product recommendations (based on history, market segments, friends, product trends, etc.); product comments and ratings; remembered lists (birthdays, anniversaries, special events, personal “I wanna” lists, etc.); notifications and reminders of upcoming events and past experiences; remembered shipping and payment information, so that it is simple for me. Everyone has experienced this in one form or another on the web. The more these applications know, the more they can personalize the shopping experience for me. Every web page that I visit is an opportunity to provide personalized content.
  • All of this personalized interaction is based on a few simple concepts. Rich customer profiles: each customer is represented by a rich profile with information specific to that customer, past history and recommendations for the future. These profiles are not static; they evolve over time, capturing new types of information and new recommendations, and continually adding details that can be leveraged to further personalize the experience. Low-latency, simple access to relevant data: a web page is a wealth of dynamic content, generated on the fly by hundreds of individual queries. These queries need to return with ultra-low latency because people will not wait for web pages (Amazon: 10ms delay → 1% loss of revenue). Additionally, not every web page (that is, not every query) needs all of the information in a given customer profile; it needs only the information that is relevant. Scalability: data only grows (catalogs, product information, customer profiles, historic data, customer ratings and comments, etc.), so repositories need to be able to scale as the amount of data and processing increases.
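The profile ideas in the note above (profiles that evolve, and pages that read only the part they need) can be sketched with a plain dictionary standing in for a key-value store. This is a minimal illustration, not the Oracle NoSQL Database API: the `Store` type, the function names, the section names and the sample data are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical in-memory stand-in for a key-value store (not a real driver API).
# Keys are (customer_id, section) pairs, so each part of a profile is stored
# and fetched separately.
Store = Dict[Tuple[str, str], dict]


def put_section(store: Store, customer_id: str, section: str, data: dict) -> None:
    """Write one section of a customer's profile without touching the others."""
    store[(customer_id, section)] = data


def get_section(store: Store, customer_id: str, section: str) -> dict:
    """Read only the section a page needs; a missing section reads as empty."""
    return store.get((customer_id, section), {})


def get_profile(store: Store, customer_id: str) -> Dict[str, dict]:
    """Assemble the whole profile. This sketch scans every key; a real store
    would typically look up by key prefix instead."""
    return {sec: data for (cid, sec), data in store.items() if cid == customer_id}


store: Store = {}
put_section(store, "amy", "greeting", {"display_name": "Amy"})
put_section(store, "amy", "shipping", {"city": "Portland"})  # made-up sample data
# The profile evolves: a new kind of section is added later, with no schema change.
put_section(store, "amy", "wish_list", {"items": ["folk music box set"]})

# A page that only renders the greeting fetches only that section.
greeting = get_section(store, "amy", "greeting")
```

Splitting a profile into independently keyed sections is one way to keep each page query small; whether it suits a given application depends on which sections are usually read together.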
  • But think about it… this is not just about retail. This is about any internet-based interaction where there is an opportunity to deliver a customized, personalized experience. This includes activities like online gaming and gambling, travel, etc. It includes interactions with people AND devices. Knowing the location, status and history of a device allows the application or service to provide more personalized or relevant content. For example, whenever I travel I look at two or three travel web sites. Why? Because I want to a) compare prices, b) get product ratings, c) get personalized recommendations. With cell phones, for example, knowing the location AND direction of travel allows a service to send personalized, relevant updates directly to my phone. If I’m on the east side of the city, I want to hear about promotions or traffic problems where I am now.
  • So, what’s the problem? The problem is that capturing, leveraging and evolving this data is complex. The data itself is simple. Managing it can be a challenge. Doing it scalably and cost-effectively is hard. We’ve seen what happens when scalability or cost of operations is not considered: services become unresponsive or too expensive to maintain. Doing it in real time or near real time, without lengthy delays in processing, is harder still. For example, I may be male, live in the Northwest and be over 50 years old. Does that mean I want to purchase Birkenstocks and Classic Rock and Roll? Not really. I’m much more than just my market segment. In fact I don’t own Birkenstocks and I prefer Folk Music. In order for me to use your web site, you have to do better than to put me in a general box. You need to know what I like. But I’m not your only customer. You have millions of customers just like me. And I’m not willing to wait 10 seconds for a web page on your site. In fact, I’m not likely to wait more than a second or two. Welcome to the “Real Time, I want it Now” generation.
  • How do all of these business requirements for personalized, customized web experiences translate into technology requirements? First of all, you need to integrate with existing Enterprise Systems – that’s ERP, CRM, DW, Analytics, etc. Secondly, you need low latency operations and transactions. Thirdly, because information evolves you also need the flexibility to change the data and the applications and the scalability of the solution as the needs of the business change. That flexibility also has to be cost effective in order to provide real business value.
  • If we look at the various storage options available to handle Big Data, there are essentially three types: Hadoop, NoSQL databases and relational databases. HDFS is a great distributed file system: parallel, highly scalable and with no inherent structure. However, it’s tuned primarily for bulk sequential read/write of file blocks. There are no indices for fast access to specific data records, and it’s not well suited to lots of small files or to updating files that have already been written. It is primarily a batch system: write lots of data, then read it all in parallel over and over. It sounds like a data warehouse, but more unstructured. The relational database, on the other hand, is usually deployed on a big machine and supports complex data structures stored in tables with plenty of relationships. Data is manipulated and accessed using rich SQL to build mission-critical applications. There is support for a variety of data access protocols like ODBC/JDBC, along with an elaborate life cycle management infrastructure involving security and backup/restore operations. Enterprises run their mission-critical transaction processing systems on relational databases. The NoSQL database is the middle ground: a distributed key-value database with a simple data structure. It has indices. It can handle large volumes of data and is usually deployed on a distributed architecture consisting of several small machines. It’s designed for low-latency, high-volume reads and writes of simple data, which is typical of real-time and web-scale specialized applications. It’s not tuned for reading/writing huge files; use a file system for that. It has flexible configuration capabilities that make it well suited to rapid application development requirements. Data scalability at low cost.
  • One of the important features of the Oracle NoSQL Database is that it supports transactions. Why are database transactions important in web-scale transaction processing and personalization applications? Because “flaky” or “inconsistent” application actions will drive people elsewhere. Think about it: if you visit a web site that sometimes works and sometimes doesn’t, you’re likely to go elsewhere. I would. For example, let’s say that there’s one item left in inventory and two online shoppers both put it in their cart. That’s fine, because no one has purchased the item yet. However, this condition needs to be tracked and resolved at some point. This can be a challenge, especially in a globally distributed web application. In most NoSQL database products, the developer has to put special code in their application to handle it. Oracle NoSQL Database allows the developer to let the database handle the transaction consistency. Another example is the purchase of a shopping cart full of items. It is not acceptable for some, but not all, of the items in the cart to be successfully processed. Application developers should be able to rely on the database, in this case Oracle NoSQL Database, to enforce the proper transaction behavior, where all or none of the items are purchased.
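The two examples in the note above (two shoppers competing for the last item, and an all-or-none cart purchase) can be illustrated with a small in-memory sketch. It is a single-threaded toy under stated assumptions, not the Oracle NoSQL Database API: the `VersionedStore` class, its method names and the item names are hypothetical, and a real database must provide the same guarantees under concurrent, distributed access.

```python
from collections import Counter
from typing import Dict, List, Tuple


class VersionedStore:
    """Hypothetical in-memory store: each key maps to (stock, version)."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[int, int]] = {}

    def put(self, key: str, stock: int) -> None:
        """Unconditional write; bumps the version."""
        _, version = self._data.get(key, (0, 0))
        self._data[key] = (stock, version + 1)

    def get(self, key: str) -> Tuple[int, int]:
        """Return (stock, version) for a key."""
        return self._data[key]

    def put_if_version(self, key: str, stock: int, expected_version: int) -> bool:
        """Write only if the key has not changed since it was read."""
        _, version = self._data[key]
        if version != expected_version:
            return False  # someone else updated the key first
        self._data[key] = (stock, version + 1)
        return True

    def purchase_cart(self, cart: List[str]) -> bool:
        """Decrement stock for every item in the cart, or for none of them."""
        needed = Counter(cart)
        # Validate everything before modifying anything, so there is nothing to undo.
        if any(self._data[item][0] < qty for item, qty in needed.items()):
            return False
        for item, qty in needed.items():
            stock, version = self._data[item]
            self._data[item] = (stock - qty, version + 1)
        return True


store = VersionedStore()
store.put("lamp", 1)   # one lamp left in inventory
store.put("shade", 0)  # shades are sold out

# Two shoppers read the same stock level and version.
stock_a, ver_a = store.get("lamp")
stock_b, ver_b = store.get("lamp")

first = store.put_if_version("lamp", stock_a - 1, ver_a)   # first buyer wins
second = store.put_if_version("lamp", stock_b - 1, ver_b)  # stale version, rejected

# Restock the lamp, then try to buy a cart that contains a sold-out item.
store.put("lamp", 1)
cart_ok = store.purchase_cart(["lamp", "shade"])  # fails as a whole
```

The version check is one common way to detect the conflict; the point of the note is that this logic is better enforced by the database than re-implemented in every application.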
  • Oracle NoSQL Database provides both scalability (the ability to increase the size and throughput of a cluster) and predictable latency. We conducted this benchmark last year, working directly with technology partners like Intel and Cisco. This graph summarizes the results of running YCSB (the Yahoo! Cloud Serving Benchmark) on Oracle NoSQL Database over a set of increasingly large NoSQL Database clusters. The cluster started at 6 storage nodes (2 shards, or partitions, with 3 replicas each) and grew to 12, 24 and 30 storage nodes, running on Intel’s Xeon E5-2690s with a 95% read, 5% update workload. As you can see, as we added hardware (storage nodes) to the system, we were able to get a linear increase in throughput while still maintaining very low latency. Adding hardware increases throughput and capacity without adding significant latency to the operations. At the end of the day, why is this important? Because it shows that a) Oracle NoSQL Database can grow as your business, storage and processing needs grow, and b) increasing your hardware delivers the results that you would expect: more throughput without increased latency. Incidentally, this is more throughput than most companies need, running on a relatively small cluster. For example, Twitter does ~150K API calls per second.
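As a small worked example of the cluster sizes in the note above, the sketch below derives the shard count for each configuration and the speedup that perfectly linear scaling would imply relative to the 6-node baseline. It assumes every cluster kept 3 replicas per shard, which the note states only for the smallest one; no measured throughput figures are used, and the function names are illustrative.

```python
REPLICATION_FACTOR = 3            # from the notes: 3 replicas per shard
CLUSTER_SIZES = [6, 12, 24, 30]   # storage nodes in each benchmark configuration


def shard_count(storage_nodes: int, replication_factor: int = REPLICATION_FACTOR) -> int:
    """Number of shards when every shard has the same number of replicas."""
    if storage_nodes % replication_factor != 0:
        raise ValueError("storage nodes must be a multiple of the replication factor")
    return storage_nodes // replication_factor


def ideal_speedup(storage_nodes: int, baseline_nodes: int = CLUSTER_SIZES[0]) -> float:
    """Throughput multiple expected under perfectly linear scaling."""
    return storage_nodes / baseline_nodes


shards = [shard_count(n) for n in CLUSTER_SIZES]      # [2, 4, 8, 10]
speedups = [ideal_speedup(n) for n in CLUSTER_SIZES]  # [1.0, 2.0, 4.0, 5.0]
```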
  • Building a personalized web scale application - tht11005 - v1.1

    1. Copyright © 2013, Oracle and/or its affiliates. All rights reserved.
    2. Building a Personalized Web-Scale Application (THT11005). Dave Segleau, Dir. Product Management
    3. The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
    4. Agenda
         • What is Personalization?
         • Why does it matter?
         • Personalization Requirements
         • Managing Personalization using Oracle NoSQL Database
         • Q&A
    5. What is Personalization? “Personalization technology enables the dynamic insertion, customization or suggestion of content in any format that is relevant to the individual user, based on the user’s implicit behavior and preferences, and explicitly given details.” (Wikipedia, “Personalization”)
         • Any content: New & Existing content
         • Any format: Includes Web Services, Email, Applications, Kiosks, Cell Phones, etc.
         • Any preferences: Implicit & Explicit
    6. Shopping Used To Be Easy
         • A customer enters the store
         • They pick what they want
         • They pay
         • They walk out the door
    7. The Store...
         • ... Didn’t have to do anything until the customer checked out (unless the customer asked an assistant for help)
         • ... Didn’t really know who the customer was (unless the cashier recognized the customer)
         • ... Didn’t really know what the customer liked or disliked (unless somebody remembered talking to the customer and took notes)
    8. THE WEB CHANGES EVERYTHING
    9. Many Small Transactions & Interactions (could be 100’s per purchase)
         • Update Cart
         • Maintain Lists
         • Packaging Instructions
         • Capture Comments
         • Shipping Address
         • Track Ratings
    10. Web Applications Personalize the Experience
         Customer comments:
         • Amy (Amy Grant): “When I go online, I can’t easily find what I need. It’s too complicated to find what’s best for me.”
         • Joeuser 25: “Can’t their sales reps remember my web order? The Acme rep always knows all my orders.”
         • Sam 79: “I have a hard time locating the accessories after I buy the main item. I wish they arranged these better!”
         • Jose (Jose Ramirez): “I remember I saw it the last time I was on the website, can’t seem to locate it anymore!!”
         • Darlene 58: “I wish they won’t make me give my address and credit card info every time!”
         Application responses:
         • Add a chat agent: “Hello, Amy, can I help you?”
         • “We think you might be interested in these items.”
         • “Here are the orders that you placed recently.”
         • Save their preferences: “This is your current shipping address and payment information.”
         • Display their recently browsed items: “Here’s what you’ve been looking at recently.”
    11. The Online Profile Management Challenge (diagram label, repeated per profile: Customer ID)
         • Flexibility
         • Consistency
         • Scalability
         • Distribution
         • Latency
         Amazon: 10ms delay → 1% loss of revenue
    12. More Than Shopping
         • Other web applications: gambling, online games, travel sites, advertising, social, career, product online catalog processing, profile management
         • Network-based infrastructure: phones, smart grid, medical monitoring, factory automation, quality of service, oil & gas exploration, geo-location tracking, financial fraud detection
    13. Implementing a Personalized Web-Scale App: Best Practices
         • Identify Key Interactions: data capture; personalized data delivery
         • Application Drives Data Structure
         • Optimize Data Access: de-normalize, aggregate, last action
         • Find New Interactions: incorporate new data, integrate downstream
    14. The Challenge
         • Transactions and Personalization are Complex
         • Doing it at scale is Hard
         • Doing it in real-time is Harder Still
    15. Requirements
         • Low Latency: Transactions, Lookups
         • Flexible
         • Integrate with Enterprise Systems
    16. • Highly flexible key/value data model
         • Supports simple transactions and configurable consistency
         • Enables web-scale transaction processing and personalization
    17. Choose the RIGHT tool for the job
         • Hadoop Distributed File System (HDFS): File System. No inherent structure. High volume writes. Limited functionality, roll-your-own applications. Batch oriented.
         • Oracle NoSQL Database: Key Value Store. Simple data structure. High volume random or range based reads and writes. Simple get/put high speed storage, flex configuration. Real-time, web-scale specialized applications.
         • Oracle Database: Relational Database. Complex data structures, rich SQL. High volume OLTP with 2-PC. Security, Backup/Restore, Data life cycle mgmt, XML, etc. General purpose SQL platform, multiple applications, ODBC, JDBC.
    18. Oracle NoSQL Database: Scalable, Highly Available, Key-Value Database
         Features:
         • Key-value, JSON & RDF data
         • Large Object API
         • BASE & ACID transactions
         • Highly Available
         • Simple administration
         • Data Center Support
         • Online Rolling Upgrade
         • Online Cluster Management
         • Commercial grade support
         Diagram labels: Application, NoSQL DB Driver, Storage Nodes, Datacenter A, Datacenter B
    19. Atomic, Consistent, Isolated, Durable. Transaction Code Belongs In The Database, Not in Your Application
    20. Predictable Latency at Scale
         • Linear increase in throughput
         • Predictable latency
    21. Oracle NoSQL Database: Integrated for the Enterprise
         Diagram labels: Real Time Access; External Tables; MapReduce, OLH, ODC, ODI; NoSQL DB Driver; Application; GRAPH
    22. Deployment Options
         • Engineered System (Big Data Appliance)
         • Commodity Cluster
    23. Oracle NoSQL DB at OOW
         • Focus on NoSQL Database: sessions on JSON, Application Development, Data Centers, etc.
         • Hands on Lab: Application Development and Schema Design with Oracle NoSQL Database (HOL10085). Wednesday, 3:30 – 4:30 PM. Marriott Marquis, Salon 3/4
         • Demogrounds (Moscone South, Exhibition Hall left hand side, Booth SL-059)
    24. Oracle NoSQL DB at OOW: Special Customer Event, Tuesday, Sept 24th
         • 10:00 AM – 11:00 AM: NoSQL Office Hours. Buca di Beppo, 855 Howard St, San Francisco. Bring your questions about the product, competitors, etc. to an informal meeting before the Community Lunch and Demo. Meet and discuss with product development.
         • 11:00 AM – 1:00 PM: NoSQL Community Lunch and Demo. Buca di Beppo, 855 Howard St, San Francisco. Join special guests Andrew Mendelsohn and Mike Olson as we relax over beers with fellow architects and developers. What's more, customers will be able to check out the latest Oracle NoSQL Database solutions from Oracle partners.
    25. Oracle NoSQL DB Resources
         • Oracle Big Data Handbook (Amazon, Barnes & Noble, Oracle Press)
         • NoSQL DB Use Cases, White Papers, Data Sheets, Benchmarks
         • NoSQL DB Documentation
         • NoSQL DB Downloads
         • NoSQL DB OTN Forum
         • OU Training Classes
    26. Q&A