Creating a Single View: Overview and Analysis


Published on

This presentation is an overview of how to create a single view application.

Published in: Technology
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide
  • Good afternoon, we are here to talk about one of the most coveted and yet seemingly intractable challenges companies face, the single view of their data.
  • This is a three part talk, so if you want to hear the full story this is what your afternoon looks like.
  • We all have one of these. This kind of diagram cracks me up. For the last decade we have been building these high-level diagrams describing our data platforms. They have historically been documented as technology agnostic and included everything like security, business continuity, governance, standards, all in a neat set of boxes with arrows.

    The good news is that we have done the hard work of building skill around our data, formed the standards and processes that support managing the data space. The introduction of new technology doesn’t change that.
  • I come from large businesses, leading a team at JPMorgan and as a distinguished engineer at IBM.

    Everyone has islands of data … intentionally and intelligently defined, but disjointed and fragile.
  • So, lets get right into it.

    We are trying to get from multiple systems … in this case some we control and others we do not … in all their gnarly forms … and create richer document models.

    The use case is going to be customer centric – so a single view of the customer – and we are going to look how we bring together these sources into a more useful fashion and how that might evolve over time.
  • So, even though we plan to cover a customer centric use case, these single views exist in many industries. These are three examples.

    Single view of a trader

    Single view of a product

    Single patient view
  • Lets take a look at the high-level architecture for a single view. This should look a lot like what you document for your solutions.

    On the left we have the internet – our users, their devices, our applications – increasingly wearable devices.

    In the middle we have the web stack and hopefully the expression of some web services.
  • Let’s zoom in on the heart of this system…

    On the right we have a series of data sources from orders, content management, web logs, profiles etc.

    Some of that data is magically flowing across the ETL arrows … maybe it flows to Hadoop, maybe it doesn’t.

    You might have real-time event data, like a twitter feed or news, flowing to a stream analytics engine like Storm.

    What we do know is that the life of data in MongoDB is not static. It exists as and evolving data set where real-time and offline analytics continuously enhances the profile.
  • This approximates the relationships between some of the data sources we have been considering.

    The green elements are showing external systems – where we might simply be loading data (i.e. we do not control them)

    The other systems are ones we own and the colors are discreet systems.

    This is just a portion of what this might look like and it is useful when comparing it with a MongoDB mental model.
  • There are a few things happening here, so lets just take a moment with this diagram.

    First, we have two major views or documents being created – customers and products.

    A bunch of the data that was once external is not embedded into the document.

    Other elements are linked, but you can see some elements are now being aggregated and stored into the respective documents.
  • For example, the web traffic is being aggregated to produce the top pages for a give user … you could imagine linking that to the topics that might be of interest through categories or tags.

    You might use MapReduce to crunch a sentiment score … etc. etc.
  • Which highly rated reviews have “health” in the user supplied text e.g. find( { $text: { $search: “health" } } )

    How often has this person reviewed a product? e.g. reviews.count()

  • Here is the thing … this process is iterative and going to require the repetition to hone

    So focus on a single effort, stop overwhelming yourself.

    What would you ask if you could … this links you back to the business which justifies the whole activity in the first place.

  • Need to add … when do I embed vs when do I reference … and if I reference I need to be clear about the multiple queries or cache and of course the client side join
  • The good news is that MongoDB’s dynamic schemas make this easy.

    Whatever we think are today’s requirements, they will change.

    The data types we are dealing with certainly have changed.

    The key is that schemas reflect the access patterns – the use – of the data.
  • The simplest way to figure out which single view you are trying to create is to list out all the questions your business has interest in asking and see what the main nouns are.

    Items in RED are other nouns, but the prevailing theme is the customer.
  • So what do al these formats look like …

    Everything from opaque binaries with a little meta data, twitter feeds, product catalogs, social commenting, rating services, content management systems etc. This stuff is structure for the most part, but its understandable not linked and would be valuable if they were!


    Product catalog (ebay):
    Binary is from wikipedia
  • So lets actually look at what this might look like …

    Customer is modeled with metadata + a generic container called actions and more defined array for products.

    Notice how products are managed externally. This is because some of the content does not make sense to embed. What is being embedded is the key and whatever summary or metadata might be use. For example, did the customer return the item.
  • Let’s take a look at what the document that represents this structure might look like.

    At the top we have the typical metadata about the customer, then actions and then products. [explain what is in actions]
  • So, great! We have had success and just as would happen in real life, we need to add a feature. Not 24 hours ago we didn’t even have a single view and now all of a sudden we need to extend it.

    In this case we are being asked to include sentiment analytics in our system to help customer representatives and our offer systems be aware and target customers more effectively.

    Customer: Personal sentiment and sentiment to the company
    Product: Sentiment around a specific product / specific version of a product
  • So, again here is the document equivalent.

    It looks trivial because this part of the magic is trivial. Adding data to an existing model is fluid.

    The highlighted content in orange / red is describing that I am overall positive but neutral to the company.
  • So we have the starts of a model, but what if I need to look at the data in more than one way.

    Well, we likely need to be storing it in a few forms.

    Original, aggregates etc.

    The system should really be moving towards a view of incrementing counters with the original data to potential support a detailed view.

    ----- Meeting Notes (5/19/14 12:48) -----
    How do you get it in there and how do you make sure it doesnt escape (wrong door).
  • One thing everyone should be on the same page about …

    Developers have more responsibility than they have had in the past.
  • So many companies are facing change as a result of innovations…

    What questions do I want to answer…

    Now how do you get data in there?

    How do you keep it from walking out the back door?

    Stay tuned…

    Often I get questions about:

    the role of a dba…

    all the things that a software developer just got empowered to do …

    how long does any of this take?
  • Creating a Single View: Overview and Analysis

    1. 1. Enterprise Architect, MongoDB Brian D. Goodman Creating a single view: Overview and data analysis
    2. 2. What Is He Going To Talk About? What are we trying to do? How do approach doing it? How do you navigate to success? Overview & Data Analysis Data Design & Loading Strategies Securing Your Deployment Creating A Single View Part 1 Part 2 Part 3
    3. 3. A data platform is common process, tooling and management from ingestion to presentation • Integrating technologies • Streamlining ingestion • Centralizing access • Supporting analytics
    4. 4. I have sat in your chair looking for answers Source: Elephant by James Fujii @ Source: Gorilla by Luigi Lucarelli @ 13+ years 1.5 years Present
    5. 5. ID, name, address, phone … ID, name, description, qty … ID, ID_PRODUCT, ID_CUSTOMER, QTY, PRICE … - user [10/Oct/2000:13:55:36 -0700] "GET /product.html HTTP/1.0" 200 2326 10577, 41.041731, - 73.70972, Purchase, New York, 3454, 183686, 4 Apple, 0.938748, mixed, [ Company, Brand, … ], [ dbpedia, freebase, yago, website ] China, 0.519803, mixed, [ Country, Location, Admin_Div. … ], [ dbpedia, freebase … ]
    6. 6. Single view use cases abound Financial Services Creating a single view of a product, trader, client Generating recommendations to sales people Identifying anomalous patterns Data sources: client profile, trade data, market data, service catalog Retail Creating a single view of a product or customer Generating recommendations to customers Proactive, personal marketing to customers Data sources: customer profile, loyalty program, inventory, product catalog Healthcare Creating 360-degree patient view Managing electronic health-care records for patients Population management for at-risk demographics Data sources: patient records, sensor data, care center metrics
    7. 7. High-level architecture for a Single View
    8. 8. Zooming in to Single View backend
    9. 9. Traditional mental model
    10. 10. MongoDB mental model
    11. 11. Getting to a Single View of the customer Calculation using the aggregation framework MapReduce Analytics Embedding data in context Generic container to manage all the contact points and analytical observations Full-text search for ad hoc query Geo-indexing for extending insight
    12. 12. Questions we can now ask … • What kinds of products has this person purchased? e.g. distinct( “orders.category”, { “id” : 12345678 } ) • How close are they from a point of service? Now. e.g. find("location" : { $near : [40.8768, -73.787] } ) • Who is most dissatisfiedwithour service? e.g. find().sort({ sentiment : 1 } ).limit( 100 ) • What shouldmy customer representative mention in the next conversation? e.g. find( { “action.topic”: “talkingpoint” } ).sort( createdOn : -1 )
    13. 13. Start focused and iterate quickly 1. Focus on a single effort, a couple of data sources, and a familiar ‘actor” to avoid being overwhelmed 2. Consider what the data is and a few questions you might ask of it – this informs the initial data model 3. Prototype, prove then iterate
    14. 14. RDBMs terminology has an equivalent Source: RDBMS to MongoDB migration guide:
    15. 15. Normalization acts as a decision point to either embed or continue to link Source: RDBMS to MongoDB migration guide:
    16. 16. Embedding vs. linking is all about navigating the compromise between performance & data duplication Relationshi p Embed Link Example 1 : 1 If atomic ✔ ≈ person to twitter handle 1 : n ✖ ✔ person to click paths 1 : few ✔ ✖ person to content of interest n : n For performance ✔ ✔ person to content read
    17. 17. Dynamic schemas enables flexibility and agility • Data is coming in many forms • New data sources surface constantly • Single-views facilitate analytics and breed more data • Flexible schemas make it easy to evolve vs. reinvent • Schemas reflect the access patterns of the data Source: Sogeti: bigdata-introduction-sogeti-consultingservices-businesstechnology-20130628v5-130723151309-phpapp02.ppt
    18. 18. Modeling a single view of the customer begins with the types of queries we think we will want to make • What is the customer (individual or company)doing that is not normal? • What are the customer’s peers doing, that they would do, but haven’t? • What productshas the customer purchased and what is likelihood that they will buy something new in the next week | thirty days? • What willthe customer do this week? • What are customers tellingus about them, us, the world? • What content / products are effectiveat generating sales? • Who is influential – both us and customers? • What does the world look like from our proprietarypointof view vs. the what is happening in the public? • Which customers should I focus on, what should I show them and why is it smart?
    19. 19. What do all these crazy formats look like? Twitter timeline Product catalog Comments Content Binary
    20. 20. Centering around customers enables single access pattern for fast rich queries and iterative extension • Using arrays to allow for multiple relationships (organizations & locations) • Tags as a means to flexibly profile e.g. top_10_percent, badge_of_honor, flight_risk • Actions array to store analytical outputs that describe the customer to sales channels and people e.g. anomalies, recommendations, predictions • Products array to store products they customer has purchased and if they publicly engaged e.g. rating, commenting, tweeting returned etc. • Products collection to store details • Product comments reference Customers
    21. 21. A single-view of the customer enables fast, rich queries and iterative extension { "_id" : ObjectId("5063114bd386d8fadbd6b004"), "name" : "Brian D. Goodman", "organizations" : ["MongoDB”], "locations" : [ { "type" : "work", "address" : "229 W 34 St., 5th floor", "city" : "New York", "state" : "NY", "zipcode" : "10036" } ], "tags" : ["blogger”, "photographer”, "technology”], "lastAccess" : ISODate("2012-12-19T06:15:33.035Z"), "actions" : [ { "type" : "recommendation", "date" : "…", "tags" : [ ”photography", ”video" ], "description" : "Zack Arias released new video!", "action" : "http://…" } ], "products" : [ { "_id" : ObjectId("5063114bd333d8fadbd6b444"), "purchased" : ISODate("2012-12- 19T06:15:33.035Z"), "rated" : 1, "commented" : 1, "tweeted" : 0, "returned" : 0 } ]
    22. 22. Adding streaming customer sentiment to base model enables new analytics and customer insight • Sentiment analytics could be performed – Across all customer expressions – Towards the company – Towards specific products • An aggregate sentiment score might place the customer on a positive / negative scale as a means to identify predisposition and can be contrasted with sentiment towards the company • Sentiment towards products might be stored in an array, similar to comments, enabling context specific dispositions
    23. 23. Adding streaming customer sentiment to base model enables new analytics and customer insight { "_id" : ObjectId("5063114bd386d8fadbd6b004"), "name" : "Brian D. Goodman", "organizations" : ["MongoDB”], "locations" : [ { "type" : "work", "address" : "229 W 34 St., 5th floor", "city" : "New York", "state" : "NY", "zipcode" : "10036" } ], ”sentiment” : { ”personal” : 1, „towardsCompany” : 0 }, "tags" : ["blogger”, "photographer”, "technology”], "lastAccess" : ISODate("2012-12-19T06:15:33.035Z"), "actions" : [ . . . ], "products" : [ { "_id" : ObjectId("5063114bd333d8fadbd6b444"), "purchased" : ISODate("2012-12-19T06:15:33.035Z"), "rated" : 1, "commented" : 1, "tweeted" : 0, "returned" : 0 } ] }
    24. 24. So, we have a model, but what if I need more than one way to look at my data? • Aggregateinstead of recalculate Examples: Creating an activity score based on a few variables on the fly or… Storing a variety of counters, but not the individual records • Store the original granularity for a detailed query and embed aggregations Example: Total number of reviews vs. counting the reviews themselves • Embedding and linking gives you the best of performance and management Example: a master collection for addresses, but an embedded copy for performance
    25. 25. MongoDB shifts responsibilities to developers • Enforcing security, auditing, redaction, schema • Assuming change • Managing the data holistically • Balancing the decision to embed vs. link and the extra management of data that may be required
    26. 26. What does your single-view look like? • What is at the center of your single-view? • What questions are you asking of your data? • How do you plan to get the data in there? • How do you make sure it doesn’t exit out the backdoor?
    27. 27. • MongoDB Seattle : September 16, 2014 • MongoDB Boston : October 1, 2014 • MongoDB DC : October 14, 2014 • MongoDB SF : December 3, 2014 Coming to you soon… Register Now: Use code 25webinar for a 25% discount.
    28. 28. Questions? Stay tuned after the webinar and take our survey for your chance to win MongoDB swag.
    29. 29. Enterprise Architect, MongoDB Brian D. Goodman Thank You