Data-Centric and Message-Centric System Architecture



Presentation from April 2010 summarizing the principles of data-centric design and how they apply to DDS technology. Message-centric design is presented by way of contrast.

  • From the beginning, the data stream is associated with the schema of the data that will be propagated on that stream. Your applications already have some expectations; if you express those to a data-centric infrastructure, it can help you. For example, you can use this schema to automatically transform data into other formats. (This is how the Routing Service and Web Integration Service work.) The infrastructure can also dissect your data to filter on content (for example “give me updates where x > 5”).
    “Key” means “this field establishes the identity of a unique object,” like the key in a relational database table. In DDS, a key can consist of any number of fields of any type(s).
    A new track you’ve never seen before. Notice that since the type is already known, only the field values need to be sent, not the field names or types.
    Update to a track you’ve already seen
    Another new track – notice that the key is different
    A track you’ve seen before has gone away
  • The first thing to notice is that the knowledge of your data model that was associated with the data stream in the data-centric technology disappears when you use a message-centric technology. That makes it much harder to develop a generic component such as the Web Integration Service, which must transform arbitrary data types to and from XML, downsample data based on content, etc.
    First message arrives. It has the same structure as we saw before, except without a known type definition, the type information must be embedded within the message itself, significantly increasing its size.
    The second message arrives. It’s in a totally different format than the first! This one is just a blob of binary-encoded data.
    Maybe the consuming application understands how to decode it and maybe not. Each application connected to the network will have expectations about the formats of the messages it receives. But a messaging infrastructure can’t support those expectations, so they have to be enforced by an organizational policy. I write up a Word document that describes how you should format your messages and email it to you, and you have to follow my instructions. If you make a mistake, we’ll have to debug it at integration time. In a data-centric approach, data type enforcement is built in: developers work with typed objects in their programming languages, errors are detected when the code is compiled before it’s ever deployed, and runtime mismatches that do occur are detected automatically by the middleware.
    How do I describe a content-based filter on a binary blob? How do I transform it into another format? How do I map it into a database?
    The third message arrives. It’s in yet a third format: a plain text string. Because the messaging system doesn’t have any concept of object lifecycle, each system has to define its own ad hoc system of sentinels: “create” messages, “dispose” messages, etc. More work, and it makes it much more difficult to leverage something you’ve built for one project on the next project. By comparison, Web Integration Service takes advantage of the built-in lifecycle support in DDS – you saw that when tracks were marked with “X” or “?”.
    And without any knowledge of your objects or their lifecycle, a messaging infrastructure can only support qualities of service that make sense across an entire topic: for example time-to-live (“lifespan” in language of DDS).
  • The last point is the most subtle, but the lack of an explicit data model makes integration more difficult.
    If message definitions and formats are ad hoc and supported only in documentation, not by the infrastructure, problems arise:
    More teams need to share this documentation and implement it correctly. That means more chances for errors and more chances for change management to break down. Team members can call each other up and sort out these problems; teams that are geographically and organizationally separated have a harder time.
    Infrastructural components – persistence, logging, technology/protocol gateways, custom tooling – become coupled to system-specific message definitions, making them very brittle in the face of change. If one team manages everything, it can roll out changes to the whole line at once. But incremental updates across multiple teams are hard.
    This is how corporations roll out messaging solutions: one IT department from one company has total control over the deployment. Deployment may span several groups within the company, but almost never spans companies.
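The notes on the track example describe what a data-centric infrastructure can do once it knows the schema and the key. It can tell instances apart, track each instance's lifecycle (new, update, dispose), and evaluate content-based filters against typed fields. The Python sketch below models that subscriber-side bookkeeping under stated assumptions. The schema is the slide's (x, y, and id as the key). `Track`, `Kind`, `TrackCache`, and `replay` are hypothetical names for illustration, not the DDS API or any RTI product interface.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Track:
    """The example schema; `id` is the key that identifies a unique track."""
    x: float
    y: float
    id: str  # key field


class Kind(Enum):
    NEW = "new"
    UPDATE = "update"
    DISPOSE = "dispose"


class TrackCache:
    """Hypothetical subscriber-side cache keyed by Track.id (not the DDS API)."""

    def __init__(self, content_filter: Optional[Callable[[Track], bool]] = None):
        self._instances: Dict[str, Track] = {}
        self._filter = content_filter
        self.events: List[Tuple[Kind, str]] = []

    def on_sample(self, sample: Track) -> None:
        # A content-based filter can test typed fields because the schema is known.
        if self._filter is not None and not self._filter(sample):
            return
        # The key, not the message, decides whether this is a new instance.
        kind = Kind.UPDATE if sample.id in self._instances else Kind.NEW
        self._instances[sample.id] = sample
        self.events.append((kind, sample.id))

    def on_dispose(self, key: str) -> None:
        # Lifecycle is tracked per key instead of via ad hoc "dispose" messages.
        if self._instances.pop(key, None) is not None:
            self.events.append((Kind.DISPOSE, key))

    def current(self) -> Dict[str, Track]:
        """Current state of every live instance."""
        return dict(self._instances)


def replay(cache: TrackCache) -> TrackCache:
    # The sample sequence from the data-centric track example.
    cache.on_sample(Track(45.6, 78.9, "AA123"))  # new
    cache.on_sample(Track(56.7, 89.0, "AA123"))  # update
    cache.on_sample(Track(65.4, 32.1, "DL987"))  # new
    cache.on_dispose("AA123")                     # dispose
    return cache


cache = replay(TrackCache())
print(cache.events)
print(sorted(cache.current()))  # ['DL987']

# Hypothetical filter threshold, in the spirit of "give me updates where x > 5".
filtered = replay(TrackCache(content_filter=lambda t: t.x > 60))
print(filtered.events)
```

Because the key is part of the schema, the cache distinguishes AA123 from DL987 without any application-defined sentinel messages. The filter can also test `t.x` directly instead of first decoding an opaque payload.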

    1. The Real-Time Middleware Experts
       Data-Centric and Message-Centric System Architecture: Making Design Explicit to Reduce Cost and Risk
       Rick Warren, Principal Engineer, April 2010
    2. Data-Centric vs. Message-Centric Design
       • Anyone can publish and subscribe:
         – DDS
         – JMS
         – AMQP
         – WS-Eventing
         – REST-MS
       • …but different technologies have very different models
       • Treating them as interchangeable has ramifications:
         – Increased cost, risk
         – Decreased return on technology investment
         – Decreased performance
       © 2009 Real-Time Innovations, Inc. COMPANY CONFIDENTIAL
    3. Data-Centric vs. Message-Centric Design
       Data-Centric
       • Infrastructure does understand your data:
         – What data schema(s) will be used
         – Which objects are distinct from which other objects
         – What their lifecycles are
         – How to attach behavior (e.g. filters, QoS) to individual objects
       • Example technologies:
         – DDS API
         – RTPS (DDSI) protocol
       Message-Centric
       • Infrastructure does not understand your data:
         – Opaque contents vary from message to message
         – No object identity; messages indistinguishable
         – Ad hoc lifecycle mgmt
         – Behaviors can only apply to whole data stream
       • Example technologies:
         – JMS API
         – AMQP protocol
    4. Example: Data-Centric Track Data
       Publish → Subscribe, with a shared data schema: x : float, y : float, id : string (key)
         New       45.6   78.9   “AA123”
         Update    56.7   89.0   “AA123”
         New       65.4   32.1   “DL987”
         Dispose                 “AA123”   (marked X)
       • Map this into XML; rows + cols
       • Express content-based filters
       • Propagate data efficiently
    5. Example: Data-Centric Track Data
       Publish → Subscribe, with a shared data schema: x : float, y : float, id : string (key)
       Quality of Service: Deadline, Time-Based Filter, History
       • Once infrastructure understands objects, can attach QoS contracts to them
       • “Keep only the latest value” or “I need updates at this rate” make no sense unless per-object:
         – Flight AA123 updates shouldn’t overwrite DL987, even if AA123 is updated more frequently
         – Update rate for one track shouldn’t change just because another track appeared
    6. Example: Message-Centric Track Data
       Publish → Subscribe (No Data Schema, Limited QoS)
         x=float(45.6) y=float(78.9) id=“AA123”
         0x00000006 4141010203 0042366666 429DCCCD
         “My app knows this means dispose.”
       • Nothing to base filters, xforms on
       • Error checking: dev → integration
       • Self-describing data is verbose
    7. When Message-Centric Design Works Well (Example: Securities order processing system)
       • No notion of objects or state beyond individual message; e.g. “Buy 12 shares IBM @ $12”
       • No need to filter based on content; e.g. all orders need to be processed eventually
       • No need for real-time QoS; e.g. maybe to you, “real-time” just means “fast”
       • Messaging interface is not integration interface between systems, providers, versions: one team implements both sides of interaction at same time
    8. When Data-Centric Design Works Well (Examples: Distribution of track data, weather data)
       • Object lifecycle spans multiple updates; e.g. “Track AA123” or “Weather at (45.6°, 78.9°)”
       • Topic-per-object is impractical because
         – …objects are too numerous and/or
         – …their identities are unknown a priori and/or
         – …commonalities make them more manageable as a group.
         – Independent topic for each of 10K tracks? For each (latitude, longitude) tuple?
       • Need data-aware filtering and/or QoS enforcement; e.g.
         – “Give me the current state of all tracks” or
         – “Show me a weather map within this geographical region”
       • Integrating independently developed components and/or systems
    9. Don’t Confuse Architecture and Technology
       • Can implement message-centric design with data-centric technology
         – Use generic data schemas (e.g. an opaque binary buffer)
         – Don’t define QoS contracts
       • Can implement data-centric design with message-centric technology
         – Build layers on top to handle data schemas, data caching, QoS definition and enforcement, discovery, etc.
         – Capture service definitions informally in documentation
       • Define data/service architecture first, then select appropriate technology
    10. Recap
       • Start with architecture for data, services
         – Q: What data is to be exchanged under what conditions?
         – Explicit contracts allow:
           • Early error checking to lower costs
           • Automatic enforcement to avoid misbehavior
           • Independent development with lower-risk integration
           • Efficient data transmission for greater performance
           • COTS tools and integration components
       • …Then select technology
         – Data-centric and message-centric technologies have different capabilities, though both are described as “publish-subscribe messaging”
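The QoS slide in the track example argues that “keep only the latest value” and “I need updates at this rate” only make sense per object. The Python sketch below contrasts two approaches. The first is a stream-wide keep-last history, which is all an infrastructure without object identity can offer. The second keeps a per-key history and checks a deadline per key. It is a simplified model under stated assumptions. `Sample`, `StreamHistory`, and `PerInstanceHistory` are hypothetical names. The timestamps and the 1.5 s deadline are illustrative values, not from the slides. This is not how any DDS implementation enforces its History or Deadline QoS policies.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, NamedTuple


class Sample(NamedTuple):
    id: str   # key
    x: float
    y: float
    t: float  # hypothetical arrival timestamp, in seconds


class StreamHistory:
    """Stream-wide 'keep last N': all that is possible without object identity."""

    def __init__(self, depth: int):
        self._samples: Deque[Sample] = deque(maxlen=depth)

    def add(self, sample: Sample) -> None:
        self._samples.append(sample)

    def latest(self) -> List[Sample]:
        return list(self._samples)


class PerInstanceHistory:
    """Hypothetical per-key 'keep last N' with a per-key deadline check."""

    def __init__(self, depth: int, deadline_s: float):
        self._deadline_s = deadline_s
        self._samples: Dict[str, Deque[Sample]] = defaultdict(lambda: deque(maxlen=depth))

    def add(self, sample: Sample) -> None:
        # Each key gets its own bounded queue, so a busy track cannot evict a quiet one.
        self._samples[sample.id].append(sample)

    def latest(self) -> Dict[str, Sample]:
        return {key: q[-1] for key, q in self._samples.items()}

    def missed_deadline(self, now_s: float) -> List[str]:
        # Each track is judged against its own last update time.
        return sorted(
            key for key, q in self._samples.items() if now_s - q[-1].t > self._deadline_s
        )


updates = [
    Sample("AA123", 45.6, 78.9, 0.0),
    Sample("DL987", 65.4, 32.1, 0.5),
    Sample("AA123", 56.7, 89.0, 1.0),
]

stream = StreamHistory(depth=1)
per_key = PerInstanceHistory(depth=1, deadline_s=1.5)
for s in updates:
    stream.add(s)
    per_key.add(s)

print([s.id for s in stream.latest()])     # ['AA123']  (DL987 was overwritten)
print(sorted(per_key.latest()))            # ['AA123', 'DL987']
print(per_key.missed_deadline(now_s=2.2))  # ['DL987']
```

With a stream-wide depth of 1, the later AA123 update evicts DL987 entirely. Keyed per instance, both tracks keep their latest state, and the deadline check flags DL987 as silent even though AA123 kept updating.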