An Expressive and Scalable XML Event Service


Slide 1: An Expressive and Scalable XML Event Service for Service-Oriented Computing
Yi Huang, Extreme! Computing Lab, Department of Computer Science, Indiana University

Slide 2: Publish/Subscribe Event Services
- Subscribers express interests and are later notified of relevant data from publishers
- Event (brokering) services manage subscriptions and deliver events
- Enable loosely coupled application integration

Slide 3: Scalability for Pub/Sub Systems
- Scalability: the capability of a system to maintain QoS under an increased load when resources are added
  - Scale up (add resources to a single node)
  - Scale out (add more nodes)
- Load scalability: the ability of a distributed system to easily expand its resource pool to accommodate heavier loads
  - Number of publishers, number of consumers, number of subscriptions, message rate (peak, average), message size, etc.
- Geographic scalability
  - Accommodate clients (publishers/consumers) from across the Internet
- Administrative scalability
  - Enable different organizations to share a publish/subscribe service

Slide 4: Challenge: Expressiveness vs. Scalability
- Expressiveness and scalability have been treated as a trade-off in publish/subscribe systems
- Topic-based subscriptions need the least processing power => best scalability
- Content-based subscriptions (name-value pairs)
  - More expressive: consumers get exactly the messages they need
  - Reduce unnecessary transmission on the WAN
  - Need more processing power at each broker => less scalable
  - Existing solutions: covering subscriptions (SIENA); organized attributes (Gryphon)
- XPath-based subscriptions
  - Filter XML messages based on message structure and content, e.g. /a/b[@x="1"]
  - Parsing XML messages and evaluating them against XPath subscriptions is very expensive
  - Creates further challenges to scalability

Slide 5: Examples of Content-based Filtering
- Personalized news delivery
  - All the sports news
  - All the articles written by John Smith
  - All the articles referring to the one whose document id is 1234
  - All the events that will take place in Denver
- System monitoring
  - All the log messages with "Error" as status
  - All the log messages from the simulation service
- Stock quotes
  - IBM stock value > 100
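The system-monitoring subscriptions above can be written as XPath-style filters over XML event messages. A minimal sketch of the matching step a broker performs, using only Python's standard library (the element names `logEvent`, `status`, and `source` are illustrative, not a real LEAD schema):

```python
import xml.etree.ElementTree as ET

def matches(event_xml: str, xpath: str) -> bool:
    """Evaluate one XPath-style subscription against one XML event.
    ElementTree supports only a subset of XPath, which suffices here."""
    # Wrap the event in a synthetic root so the filter can name the event element.
    root = ET.fromstring("<stream>" + event_xml + "</stream>")
    return root.find(xpath) is not None

# Hypothetical event message.
event = "<logEvent><source>simulationService</source><status>Error</status></logEvent>"

matches(event, "logEvent[status='Error']")              # "Error"-status logs: True
matches(event, "logEvent[source='simulationService']")  # simulation-service logs: True
matches(event, "logEvent[status='Warning']")            # no match: False
```

Evaluating many such filters per message is what makes XML pub/sub expensive compared to a single topic-string comparison.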
Slide 6: Applications of Event Brokering Services
- Application integration
- Personalized news delivery
- Online auctions
- Stock tickers
- Human resource management (PeopleSoft)
- Network and application monitoring
- ...

Slide 7: Infrastructure to Support Scientific Research
- "e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it" -- John Taylor, Director General of Research Councils UK, Office of Science and Technology
- Scientific research is becoming data-centric
  - Unify theory, experiment, and simulation
  - Use data exploration and data mining
  - Data captured by instruments and generated by simulations is processed by software; scientists analyze databases and files
- An international Grid infrastructure is needed to support e-Science

Slide 8: Services and SOA
- Service: a "web server" that runs an application for you
  - You send it requests (XML documents); it processes the information and sends replies (notifications) when done
- Web service: uses standard web technology (e.g. XML, SOAP, WSDL) to create services
- Service-Oriented Architecture (SOA)
  - Promotes a pluggable framework to add new features and to virtualize access to resources
- Combining SOA and an event service enables loosely coupled systems
- (Diagram: 1. service request to business logic; 2. run weather simulation (WRF); 3. publish notifications)

Slide 9: My Contributions in the PhD
- Participated in the SOA design and implementation for the e-Science infrastructure in the LEAD project
- Created an Internet-scale, Web-service-based event service for SOA: the nervous system of the LEAD SOA
  - Made it reliable and scalable
  - Integrated it with various services
  - Built tools for management and debugging
- Research problem addressed: expressiveness vs. scalability in XML publish/subscribe systems, especially administrative scalability

Slide 10: Outline
- Achieving scalable and expressive XML event services
- Event service in service-oriented scientific workflow (the LEAD project)

Slide 11: Achieving Scalable and Expressive XML Event Services
- Separation of concerns with a layered model
  - WS-Messenger local broker for XML and Web services
  - OpenPS broker network
- Filtering Result Summary (FRS)
  - Reduces complicated event content matching to simple string matching in the routing broker

Slide 12: Load Scalability in a LAN
- Achieved with message queues, load sharing among servers, load balancing, and caching
- Achieved in most commercial systems
  - We use existing approaches in our system; this is not our research focus
Slide 13: Difficulties in Achieving Geographic Scalability
- Long network latency and limited bandwidth
  - RTT from Indiana to Germany is over 200 ms
- Communication in a WAN is inherently unreliable and virtually always point-to-point, whereas LAN communication is generally highly reliable and based on broadcasting
- Centralized solutions waste network resources and degrade system performance
- Deployment, monitoring, updating, and debugging are harder at a distance
- Firewalls
  - HTTP usually does not keep connections open
  - Cost of the initial three-way TCP handshake

Slide 14: Single Broker vs. Broker Network
- Single broker: duplicate transmissions of the same message; long waits for acknowledgements
- Broker network: shares the transmission bandwidth; brokers take advantage of locality
- (Diagram: single broker, 2-broker network, multi-broker network)

Slide 15: Filtering-based Routing
- Subscription propagation: subscriptions are propagated to every broker, leaving state along the path
- Notification delivery: matching notifications follow (backwards) the path set by subscriptions
- Reducing subscription propagation
  - Avoid propagating a subscription when a filter covering it has already been forwarded
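The covering check can be illustrated for simple dot-separated topic subscriptions with a trailing wildcard (the syntax is an illustrative convention, not SIENA's actual filter language):

```python
def covers(general: str, specific: str) -> bool:
    """True if every topic matched by `specific` is also matched by `general`."""
    if general.endswith(".*"):
        prefix = general[:-2]
        return specific == prefix or specific.startswith(prefix + ".")
    return general == specific

class Broker:
    """A broker forwards a subscription upstream only if no already-forwarded
    subscription covers it, which bounds the routing state along the path."""
    def __init__(self):
        self.forwarded = []

    def maybe_forward(self, sub: str) -> bool:
        if any(covers(f, sub) for f in self.forwarded):
            return False  # already covered: no need to propagate further
        self.forwarded.append(sub)
        return True
```

For example, once `sports.*` has been forwarded, a later subscription to `sports.nba` adds no new routing state upstream.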
Slide 16: Difficulties in Achieving Administrative Scalability
- Hindered by conflicting resource usage, management, and security policies across multiple independent administrative domains
  - E.g., a third-party service cannot be trusted to inspect message content for content-based filtering

Slide 17: Demand for Internet-scale Sharable Services
- Emerging with the concept of using the Internet as a computing platform
- Software-as-a-service: helps developers build the next generation of composite applications without maintaining infrastructure
- Requires administrative scalability
- Examples: Amazon Simple Queue Service; Amazon Simple Storage Service; Microsoft BizTalk Services (instead of BizTalk Server)
  - Announced 4/24/2007; simple pub/sub support added 5/2/2007, allowing multiple clients to subscribe to a service and receive notifications
  - Not production-level yet
Slide 18: Related Work on Messaging Middleware (rows: scalability achieved; columns: expressiveness)

| Scalability | Queue | Topic-based Pub/Sub | Content-based simple Pub/Sub | Content-based XML Pub/Sub |
|---|---|---|---|---|
| Load | IBM MQ Series, Microsoft MQ, Fiorano MQ, RabbitMQ | OpenJMS, ActiveMQ, Oracle Advanced Queuing | Le Subscribe, Xyleme, Elvin | XFilter, YFilter, XAOS, XSQ, XTrie, Index-Filter, XMLTK, Apache ServiceMix, WS-Messenger |
| Geographic + Load | | TIBCO, IBM MQ Series, Microsoft BizTalk Server, SCRIBE, Bayeux, Echo | SIENA, Gryphon, HERMES, JEDI | ONYX, XRoute, NaradaBrokering |
| Administrative + Geographic + Load | Amazon Simple Queue Service | Microsoft BizTalk Services, OpenPS | OpenPS | OpenPS |
Slide 19: Limitations of Existing Content-based Pub/Sub Services
- They lack administrative scalability (not sharable)
  - Homogeneity: require the same broker to be deployed across the Internet in every organization
  - Trust: brokers must inspect message content to make routing decisions
  - Not interoperable: clients and brokers need the same implementation, either a C/C++ or a Java library
- => Not economical: expensive to deploy and maintain a global network for one project
- The gap: few projects can afford to deploy and maintain a global network, yet many projects need a global messaging system
- How can we create a sharable pub/sub service?

Slide 20: Our Solution: Separation of Concerns
- Layered Publish/Subscribe (LPS) model
- A 5-layer model built on top of the TCP/IP model

Slide 21: LPS Model
- Traditional approaches to distributed content-based filtering include all 5 layers in intermediary brokers

Slide 22: Separation of Concerns
- Separate brokers into local brokers and routing brokers
- Local brokers (WS-Messenger or another broker) are set up locally
  - Users can trust them to inspect message content
  - Act as local agents
- A global network of routing brokers (the OpenPS network) interacts with local brokers through Web service interfaces
  - A brokering service for local brokers
  - Handles global message dissemination
  - Sharable by many local brokers
  - Acts like a post office
- Make the routing brokers as simple as possible
  - Improves scalability and interoperability
  - Limitation of the target deployment platform, the PlanetLab global testbed: shared virtual machines with CPU usage over 95%

Slide 23: Hierarchical Broker Network
- Analogous to postal service networks
- Economies of scale

Slide 24: Filtering Result Summary (FRS)
- Local brokers attach an FRS to messages
- Observation 1: topic-based message filtering in the intermediary broker can be reduced to 1-to-1 string matching
- Observation 2: content-based message filtering in the intermediary broker can be reduced to any-to-any string matching
- Simplifies complicated message matching to string matching
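The two observations can be sketched as follows: the local broker performs the expensive XPath evaluation once and attaches the identifiers of the matched subscriptions as the FRS, so routing brokers only compare strings. The FRS format and routing-table layout here are simplified illustrations, not OpenPS's actual wire format:

```python
# Local broker: evaluates the full subscriptions against the XML message once.
def build_frs(message_xml, subscriptions, evaluate):
    """subscriptions: {sub_id: xpath}; evaluate: a full XPath matcher.
    Returns the FRS: a comma-separated list of matched subscription ids."""
    matched = [sid for sid, xp in subscriptions.items() if evaluate(message_xml, xp)]
    return ",".join(sorted(matched))

# Routing broker: never parses the XML payload; it only string-matches the FRS
# against its routing table ({sub_id: set of next-hop brokers}).
def route(frs, routing_table):
    next_hops = set()
    for sub_id in frs.split(","):
        next_hops |= routing_table.get(sub_id, set())
    return next_hops
```

Because `route` is pure string lookup, a routing broker needs neither an XML parser nor access to the message content, which is what makes it cheap and trustworthy enough to share.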
Slide 25: Web Service-based Event Service (WS-Messenger Local Event Service)
- Focuses on the needs of local services: expressiveness, Web service interfaces, and local load scalability
- Y. Huang, A. Slominski, et al., "WS-Messenger: A Web Services based Messaging System for Service-Oriented Grid Computing," 6th IEEE International Symposium on Cluster Computing and the Grid (CCGrid06).

Slide 26: New Requirements from SOA
- Services are autonomous => may need format transformation
- Integrate heterogeneous services and work with existing event buses
  - Java, C++, C#, Windows, Linux, Unix, OpenJMS, ActiveMQ, ...
- XML processing and XPath-based filtering
- Internet scale: services from different organizations in different locations
- A shared global event service for SOA

Slide 27: Evolution of Pub/Sub Specifications
- CORBA (Common Object Request Broker Architecture) Event Service (3/1995)
- CORBA Notification Service (6/1997)
- Java Message Service (JMS) (4/2002)
- OGSI-Notification (6/2003)
- WS-Notification (1/2004; OASIS standard 10/2006)
- WS-Eventing (1/2004; submitted to W3C 3/2006)
- Y. Huang and D. Gannon, "A Comparative Study of Web Services-based Event Notification Specifications," Proc. of the Workshop on Web Services-based Grid Applications (WSGA), 2006
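For concreteness, a WS-Eventing Subscribe request looks roughly like the following (abbreviated; namespace URIs follow the 2004 submission, and the endpoint URLs and filter expression are placeholders):

```xml
<s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope"
            xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing"
            xmlns:wse="http://schemas.xmlsoap.org/ws/2004/08/eventing">
  <s:Header>
    <wsa:Action>http://schemas.xmlsoap.org/ws/2004/08/eventing/Subscribe</wsa:Action>
    <wsa:To>http://broker.example.org/EventSource</wsa:To>
  </s:Header>
  <s:Body>
    <wse:Subscribe>
      <wse:Delivery>
        <wse:NotifyTo>
          <wsa:Address>http://consumer.example.org/sink</wsa:Address>
        </wse:NotifyTo>
      </wse:Delivery>
      <!-- An XPath filter over the notification message -->
      <wse:Filter Dialect="http://www.w3.org/TR/1999/REC-xpath-19991116">/a/b[@x="1"]</wse:Filter>
    </wse:Subscribe>
  </s:Body>
</s:Envelope>
```

WS-Notification expresses the same subscription with different element names and semantics, which is the mismatch WS-Messenger's mediation approach reconciles.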
Slide 28: WS-Messenger Broker
- Implements both the WS-Eventing and WS-Notification specifications
- Uses a mediation approach to reconcile conflicts between WS-Eventing and WS-Notification
- Provides a generic interface to wrap existing local JMS messaging systems
  - Adapters created for OpenJMS, ActiveMQ, and NaradaBrokering
- Efficient XML processing
  - Over 4 times faster than the Globus Toolkit implementation at XML message processing
- Supports very expressive subscriptions
- Y. Huang, A. Slominski, et al., "WS-Messenger: A Web Services based Messaging System for Service-Oriented Grid Computing," CCGrid06.

Slide 29: Architecture

Slide 30: OpenPS Broker Network
- Addresses geographic scalability and administrative scalability
- Creates a sharable global event service without losing expressiveness

Slide 31: Design Goals of OpenPS
- Sharable by multiple unrelated projects
- Can integrate heterogeneous services and work with existing event buses
  - Java, C++, C#, Windows, Linux, Unix, OpenJMS, ActiveMQ, ...
  - Uses Web services as the "magic bullet"
- Internet scale
  - Can integrate services across organizations around the world
  - Supports collaboration among brokers from different vendors
- XML or simple messages as the message payload
- Reliable: a critical communication foundation
- Scalable: expect high load in short bursts and increasing load over time
- High performance: near-real-time message delivery

Slide 32: Implementation

Slide 33: Filtering Result Summary (FRS)
- Currently uses topics and XPath expressions directly as the FRS
- Embedded in the HTTP header
- Other FRS formats are allowed, e.g. encrypted strings, unique numbers, etc.
Slide 34: Performance Comparison
- 2500 subscriptions (200 unique) for 50 consumers
- FRS achieved efficient message matching
- Future work: improve subscription scalability (the FRS may get too long)

Slide 35: Evaluation on PlanetLab
- PlanetLab is a global research network for developing, deploying, and accessing planetary-scale services
- Consists of 804 nodes at 391 sites

Slide 36: Deployment
- Deployed to over 300 nodes
- The overlay network follows the physical network
- Created scripts to automatically start, update, and stop services

Slide 37: Latency (LAN)
- Local broker processing dominates (message size: 7079 bytes)

Slide 38: Latency (WAN)
- Delay in OpenPS nodes dominates (message size: 7079 bytes)
- Caused by the IP MTU (1.5 KB)

Slide 39: Throughput (WAN)
- Throughput vs. number of threads

Slide 40: Throughput (WAN)
- Achieves about 200-300 messages/sec with 600 XPath subscriptions and 7079-byte XML messages, using 400 threads
- Still room to improve, e.g. by batching (wrapping up) messages

Slide 41: Evaluation Results Summary
- Latency
  - In the LAN, local broker processing dominates latency
  - In the WAN, delay in OpenPS nodes dominates, due to network latency
- Throughput
  - Used a thread pool to compensate for latency and achieve higher throughput
  - Achieves about 200-300 msg/s with 600 XPath subscriptions and 7079-byte XML messages, using 400 threads (from Indiana to Germany)
    - Compared to about 3 msg/s using a local broker alone
    - Compared to about 20 msg/s processing capacity for PlanetLab nodes
  - Still room to improve, e.g. by batching (wrapping up) messages
Slide 42: Event Service in Service-Oriented Scientific Workflow (the LEAD Project)

Slide 43: The LEAD Project
- Creates infrastructure for better predictions of severe weather

Slide 44: The LEAD Project
- $11.25M over 5 years, since 2003

Slide 45: Traditional Methodology
- Static observations: radar data, mobile mesonets, surface observations, upper-air balloons, commercial aircraft, geostationary and polar-orbiting satellites, wind profilers, GPS satellites
- Analysis/assimilation: quality control, retrieval of unobserved quantities, creation of gridded fields
- Prediction/detection: PCs to teraflop systems
- Product generation, display, dissemination
- End users: NWS, private companies, students
- The process is entirely serial and static (pre-scheduled): no response to the weather!

Slide 46: Major Paradigm Shift: CASA NETRAD
- Adaptive Doppler radars

Slide 47: The LEAD Vision: Adaptive Cyberinfrastructure
- Dynamic observations
- Analysis/assimilation: quality control, retrieval of unobserved quantities, creation of gridded fields
- Prediction/detection: PCs to teraflop systems
- Product generation, display, dissemination
- End users: NWS, private companies, students
- Models and algorithms drive the sensors
- The CS challenge: build cyberinfrastructure services that provide adaptability, scalability, availability, usability, and real-time response

Slide 48: The Service Architecture
- Use services to promote a pluggable framework, to add new features, and to virtualize access to resources

Slide 49: A Closer Look at the LEAD Service Architecture
- Components: user's browser, portal server, data catalog service, MyLEAD user metadata catalog, MyLEAD agent service, data management service, workflow engine and workflow graph, provenance collection service, application factory, application services, data storage, compute engine
- Tied together by an Internet-scale event notification bus

Slide 50: Example LEAD Workflow

Slide 51: Event Service in LEAD Workflows
- Y. Huang, A. Slominski, et al., "WS-Messenger: A Web Services based Messaging System for Service-Oriented Grid Computing," 6th IEEE International Symposium on Cluster Computing and the Grid (CCGrid06).

Slide 52: Notification Views
- Real-time monitor; MyLEAD
Slide 53: Techniques for Firewalls
- Problem: how can events be delivered to consumers behind firewalls?
- Solution: use the MessageBox Web service
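The idea behind the MessageBox pattern can be sketched as follows: the broker pushes notifications into a box hosted outside the firewall, and the consumer retrieves them with an ordinary outbound pull. This is an illustrative in-memory sketch, not the actual MessageBox Web service API:

```python
import queue

class MessageBox:
    """Holds notifications for one consumer that cannot accept inbound connections."""
    def __init__(self):
        self._q = queue.Queue()

    def store(self, msg):
        # Called by the broker: the push side, running outside the firewall.
        self._q.put(msg)

    def take(self, max_messages=10):
        # Called by the consumer: the pull side, initiated from behind the
        # firewall, so only an outbound connection is needed.
        out = []
        while len(out) < max_messages:
            try:
                out.append(self._q.get_nowait())
            except queue.Empty:
                break
        return out
```

The consumer polls `take` periodically, trading some delivery latency for the ability to work through NAT and firewall boundaries.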
Slide 54: Tools: Subscription Manager Interface
- Check and delete subscriptions on different brokers from one simple interface

Slide 55: Tools: Event Notification Message Viewer
- Debugging, monitoring, and firewall traversal

Slide 56: Conclusions
- Extended the state of the art in publish/subscribe systems to Web services and SOA
  - Mediation among competing WS specifications
  - Achieved load, geographic, and administrative scalability
- Reduced message filtering to 1-to-1 (topic-based) and any-to-any (content-based) string matching
- A layered model for an Internet-scale sharable publish/subscribe service
  - Separation of concerns and economies of scale
- Applied to a real-world SOA project (LEAD workflows)

Slide 57: Thank You!