Large Java EAI Training

http://www.Intertech.com

This is a slide deck from JavaOne by Intertech.



  1. Design Guidelines for Large Java Message-based EAI Systems: An EAI Case Study
     Intertech, March 2006
  2. This Talk
     • Presents an EAI case study: a very large EAI system for a retail chain.
     • Identifies issues and challenges encountered in the project.
     • Identifies lessons learned and recommendations for your EAI projects.
     • Lets you know that others have it as bad as you do.
     • The story does have a happy ending – maybe providing hope to the hopeless.
  3. Large Java EAI
     • Messaging/EAI development is not the same as Web or other distributed application development – especially when very large.
     • Many new or significantly altered considerations:
       • Requirement differences
       • Time and space needs
       • Process control/orchestration
       • Failure handling
       • Monitoring
       • Proprietary nature of vendor solutions
       • Support turnover
       • Staffing needs
  4. The Case Study Situation
     • A major retail chain has dozens of distribution centers.
     • Each distribution center (warehouse) services hundreds of stores (>1,200 stores total).
     • Each distribution center moves thousands of "cartons" (i.e., boxes) around the warehouse each day:
       • Receiving them from trucks through dock doors
       • Moving them with forklifts to storage areas in the warehouse
       • Conveying them to "break down" areas for distribution to stores
       • Conveying them down belts to storage areas or outbound trucks
       • Moving them onto trucks that depart the warehouse
  5. The Case Study Situation
     • A box is tracked via labels and bar code readers.
     • Some "reads" are manual and some are automated.
     • Generating literally hundreds of "events" per second per warehouse.
     • RFID was about to create even more events:
       • More "reads" from more points in the warehouse
       • Potentially adding store "reads" to the event list
  6. "The System"
     • The retail chain wanted all the data on events regarding the movement of cartons sent to HQ:
       • Providing unparalleled real-time information on inventory levels and product status
       • Providing more accurate information for merchandise analysts and productivity monitoring for warehouse managers
       • Providing a Java Web application for nearly 10,000 users to access the data company-wide
       • Reports galore
       • Some limited ad hoc query reporting
  7. Let's do the math…
     • >25 warehouses
     • Each generating ~15–20 carton events per second
     • Averaging 400 messages a second incoming at HQ
     • Peaking around 1,300 messages a second incoming at HQ
     • Data around an event: ~200 bytes/message
     • 24x7x52 (31,449,600 seconds for those not counting)
     • = ~4–7 GB a day
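
A quick back-of-envelope check of those figures (the rates and message size come straight from the slide; actual traffic varied through the day, which is where the lower end of the 4–7 GB range comes from):

    // Back-of-envelope check of the volume figures above.
    public class VolumeCheck {
        public static void main(String[] args) {
            int avgMsgsPerSec  = 400;     // average incoming rate at HQ
            int peakMsgsPerSec = 1300;    // peak incoming rate at HQ
            int bytesPerMsg    = 200;     // rough data per carton event
            long secondsPerDay = 24L * 60 * 60;

            double avgGbPerDay  = avgMsgsPerSec  * (double) bytesPerMsg * secondsPerDay / 1e9;
            double peakGbPerDay = peakMsgsPerSec * (double) bytesPerMsg * secondsPerDay / 1e9;

            // Roughly 6.9 GB/day at the average rate, and over 20 GB/day
            // if the peak rate were sustained around the clock.
            System.out.printf("avg: %.1f GB/day, sustained peak: %.1f GB/day%n",
                              avgGbPerDay, peakGbPerDay);
        }
    }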
  8. …and the math wasn't getting any better
     • During Christmas time things were worse – much worse.
     • The organization wants to double its current size by 2010!
     • Oh yeah… did I mention RFID was coming?
       • Tripling or quadrupling the number of events
  9. My Challenge
     • Design and implement a system to get the data from the warehouses to HQ:
       • In near real time, to support the reporting needs
       • Using whatever makes sense (to some degree – more later)
       • With a good-sized team (20–25 people in various roles)
  10. My Background
     • 15-year "grizzled" veteran of software development.
     • 6 years of Java experience.
     • Author of a Java book.
     • Experienced architect, manager, mentor, trainer.
     • Eager to take on any software system challenge.
     • No experience in EAI!
     • An organization with limited EAI experience.
  11. "The Perfect Storm"
     • The size of the EAI project + the abilities of the development team = …
  12. The Solution
     • Significant company resources and investment in the SeeBeyond EAI product:
       • Put SeeBeyond at all the endpoints (warehouses and HQ).
       • All data would move through SeeBeyond.
       • SeeBeyond is Java based (also a company technology direction).
       • Write routing/minor processing code in Java in SeeBeyond.
     • Significant company resources and investment in the Oracle RDBMS:
       • Oracle already at the warehouses.
       • Obtain a "honking" big Oracle DB at HQ.
       • Use Oracle stored procedures for the heavy lifting (data processing – report data preparation).
  13. Solution diagram
  14. Problem 1 – We weren't ready
     • As an architect, I was not aware of how different an EAI/Java messaging system is:
       • Asynchronous-everywhere nature
       • Had no patterns to follow (no, I had not read the Hohpe/Woolf EAI book)
       • Did not have an awareness of the vendor landscape
       • Was easily talked into solutions by others
     • My organization didn't see how big it was:
       • Had only implemented smaller EAI solutions
     • Finding good help was hard – and a critical step:
       • Internally – lots of support but no experience
       • Contractors – lots of desire, but little implementation experience at this scale/level of effort
  15. Getting Yourself Ready
     • Get yourself ready:
       • Understand your options – all the three-letter E's (EAI, ETL, EII, etc.)
       • Read EAI patterns
       • Know the products (WBI, Vitria, Tibco, WebMethods, SeeBeyond, etc.)
     • Find people with real EAI experience:
       • Experienced with systems matching the size of your app
       • Find people with product expertise
       • Find people with design/pattern expertise
  16. EAI Patterns
     • Enterprise Integration Patterns – Hohpe/Woolf
     • Next Generation Application Integration – Linthicum
     • IT Architectures and Middleware – Britton
  17. Getting Resources Ready
     • Let the network engineers know of your plans:
       • You are going to be using a significant amount of pipe.
       • Have you considered failover/load balancing? (Comm lines around warehouses get cut on occasion.)
     • Let the database engineers know of your plans:
       • Terabytes of data to be stored and processed – where will it go?
       • Consider backup/recovery systems
       • Database logs/archiving
       • Performance tuning
  18. Getting Support Ready
     • Support staff will be lost at turnover.
     • How many of your support shops really know…
       • How to manage application servers?
       • How to manage web applications effectively?
     • Can you expect them to be able to operate, maintain and support component-based messaging systems?
       • Do they know what a message server or bus is?
       • Across a very distributed environment?
     • Get them trained early (in messaging infrastructure).
     • Have them help you design the monitoring tools and alert systems.
     • Work together to develop proactive system checks and troubleshooting procedures.
  19. Get Others Ready
     • If your development team isn't ready, what about…
       • Testing/QA teams?
       • Analysts?
       • Managers?
     • For example, finding experienced testers for asynchronous messaging systems is difficult.
       • They usually need intricate knowledge of the messaging subsystem's monitoring and admin capabilities.
  20. Problem 2 – Proprietary EAI
     • EAI products/solutions are many; EAI standards are few.
     • The EAI/ETL/EII/… marketplace is tumultuous:
       • Sun has purchased SeeBeyond.
       • IBM bought Ascential.
       • Everyone is calling their product an ESB.
     • Products/solutions have scale limits:
       • Some they know about
       • Others they do not
     • Java alone does not make you platform independent.
  21. Examine your solution options
     • See if what you already have would work:
       • There is a reason MQ has been around a long time.
       • Where possible, consider tried, true and already-deployed platforms (but again, do the math and see if they can support the extra load).
       • In-house support is probably already equipped (more in a bit).
     • Not everything has to travel by message:
       • Consider multiple/alternate technologies for parts of your solution.
       • ETL is great for certain parts of a large solution.
       • There is a reason why products like Oracle are expensive (technologies like Oracle Replication – more in a bit).
       • This does, however, create more issues of timing.
  22. Not everything has to travel by message
     • Consider multiple/alternate technologies for parts of your solution:
       • Replication of reference data
       • Bulk/batch transfers
       • Non-real-time needs
     • ETL is great for certain parts of a large solution.
     • Examine features in your DB/app servers:
       • There is a reason why products like Oracle are expensive (technologies like Oracle Replication – more in a bit).
       • How about those message beans in the app server? (See the sketch after this slide.)
     • This can, however, create more issues of timing.
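
As a rough sketch of the "message beans in the app server" option: a standard EJB 3 message-driven bean that consumes carton events and hands them to plain Java code, so the app server does the plumbing instead of the bus. The queue name and handler are hypothetical, and activation-config property names vary slightly between containers.

    import javax.ejb.ActivationConfigProperty;
    import javax.ejb.MessageDriven;
    import javax.jms.JMSException;
    import javax.jms.Message;
    import javax.jms.MessageListener;
    import javax.jms.TextMessage;

    // EJB 3 message-driven bean consuming carton events from a queue.
    @MessageDriven(activationConfig = {
        @ActivationConfigProperty(propertyName = "destinationType",
                                  propertyValue = "javax.jms.Queue"),
        @ActivationConfigProperty(propertyName = "destination",
                                  propertyValue = "queue/cartonEvents")   // hypothetical name
    })
    public class CartonEventBean implements MessageListener {

        public void onMessage(Message message) {
            try {
                if (message instanceof TextMessage) {
                    handle(((TextMessage) message).getText());
                }
            } catch (JMSException e) {
                // Rethrow so the container can roll back and redeliver;
                // a real system would also log and alert.
                throw new RuntimeException("Failed to read carton event", e);
            }
        }

        // Placeholder for the real processing (parse, validate, persist).
        private void handle(String payload) {
            System.out.println("Received carton event: " + payload);
        }
    }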
  23. Reference Data
     • In many applications, you need reference data on both ends of the messaging system.
     • You can build a "replicating" message engine to treat this like other message data (not recommended):
       • Referential integrity becomes a real problem.
       • Consider issues of message timing (PR becomes the 51st state, but messages with PR references start to arrive before the new state data does).
     • Use simple replication technologies where possible:
       • ETL tools, if reference data changes only happen at certain times
       • Technologies like Oracle Replication for real time (it can operate over a WAN)
  24. Java = interoperability (not always)
     • Even when you use Java, how is it being applied?
     • Java running inside proprietary components (like SeeBeyond eWays) does not make you portable.
     • Write component code that can be used by or incorporated into proprietary systems.
     • Under the covers, is the vendor using…
       • JMS?
       • JMX?
       • JAX-RPC?
       • Etc.
  25. Process Outside the Bus
     • Process outside the message bus/subsystem if you can – let the bus focus on delivering the goods.
     • Too much processing time in the bus will create:
       • Scalability problems
       • Monitoring problems
       • Possibly interoperability problems (especially when using proprietary technology/components)
     • Process with components that are:
       • Flexible
       • Easy to get at (and change)
       • Interoperable (if possible)
       • Carriers of reusable business logic (if possible)
     • (A minimal sketch of such a component follows.)
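
One way to act on slides 24 and 25 is to keep the business logic in a plain Java class with no JMS or vendor imports, and let whatever adapter the bus provides (a SeeBeyond eWay, the message-driven bean sketched earlier, or a standalone listener) act as a thin wrapper around it. A minimal sketch; the class name and routing rule are purely illustrative.

    // A plain business component: unit-testable on its own and callable
    // from any adapter, proprietary or standards-based.
    public class CartonEventProcessor {

        /** Examines one carton-event payload and returns a routing decision. */
        public String process(String payload) {
            if (payload == null || payload.trim().length() == 0) {
                throw new IllegalArgumentException("Empty carton event");
            }
            // Illustrative rule only: real code would parse and validate the
            // XML event and update inventory state before deciding its fate.
            return payload.indexOf("<outbound>") >= 0 ? "SHIP" : "STORE";
        }
    }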
  26. Problem 3 – We didn't figure, or figured poorly
     • We didn't do enough "math" up front.
     • We didn't plan for failure/growth.
     • The messages moved slower than anticipated.
     • The message processing took more time than expected.
     • The amount of data was larger than expected.
  27. Do the math…
     • How much time is it going to take to get a message from A to B?
       • Test that estimate early.
     • Work with the business analysts to figure out how many messages need to be moved:
       • Make volume estimates part of the non-functional requirements gathering process.
       • Check that against the existing databases if possible.
     • How much data needs to be packaged, shipped, processed, stored?
       • Design the messages and calculate the size of the overall message (XML and all) – see the sizing sketch below.
       • Calculate the rate and add up the total volume.
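
A sketch of that sizing exercise: measure a representative message, envelope and all, then multiply by the expected rate. The element names here are made up; the point is that the XML wrapper inflates the raw ~200-byte event estimate used earlier.

    // Rough sizing of a single carton event and the implied daily volume.
    public class MessageSizing {
        public static void main(String[] args) throws Exception {
            String sampleEvent =
                "<cartonEvent version=\"1\">" +
                "<warehouseId>DC042</warehouseId>" +
                "<cartonId>00012345678905</cartonId>" +
                "<scanPoint>DOCK-07</scanPoint>" +
                "<eventType>RECEIVED</eventType>" +
                "<timestamp>2006-03-15T14:22:31Z</timestamp>" +
                "</cartonEvent>";

            int bytesPerMessage   = sampleEvent.getBytes("UTF-8").length;
            int messagesPerSecond = 400;   // from the earlier volume estimate
            long bytesPerDay = (long) bytesPerMessage * messagesPerSecond * 24 * 60 * 60;

            System.out.println("bytes per message: " + bytesPerMessage);
            System.out.println("GB per day:        " + bytesPerDay / 1e9);
        }
    }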
  28. …and pad your answer
     • Do you have room to spare?
       • Can the messaging system handle that (on both ends)?
       • Can the consuming database handle that?
       • Can the hardware and network handle that?
     • Anticipate failure:
       • What happens if something/anything goes down for an hour?
       • What happens if you go down for a day?
       • What happens if you have unexpected growth?
  29. Problem 4 – Exception Handling Wasn't
     • More considerations for failover and redundancy than in a Web application.
     • We did not plan on downtime:
       • Unplanned system issues
       • Planned outages
     • We didn't build in enough redundancy:
       • Load balancing and failover were both afterthoughts.
     • All messages always correct all the time (NOT):
       • At first, we had no proper dead letter queuing
       • No proper exception processing
       • No means to properly see and react to issues
     • Many more points of failure and potential issues – more widely distributed.
  30. Design Load Balancing and Failover Up Front
     • Load balancing and failover must be accommodated.
     • Like security, you need a multi-layered approach:
       • Hardware (like BIG-IP)
       • Redundant message bus/message servers
       • Processing components
       • Database
       • EAI system throttling
     • How are you going to kick over to the failover systems (and return to the regular systems)?
       • Without losing messages
       • Without causing timing problems in message delivery/receipt
  31. Space, space and more space
     • Plan on extra space for failure:
       • A place for queued messages to sit if something goes down
       • Space in the DB or space in the message channels – or both
     • Plan on extra space for logs:
       • You are going to want to keep log files around for a while.
       • Some problems take time to manifest to a point of awareness.
       • Devise an automated archive/clean-up for logs (a simple sketch follows).
       • No… not all EAI systems provide log clean-up utilities.
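
Since not every EAI product ships a log clean-up utility, even a scheduled job as simple as the following keeps log space under control. The directory path and retention period are made up; a real job would archive before deleting and run from a scheduler.

    import java.io.File;

    // Deletes log files older than a retention window.
    public class LogSweeper {
        public static void main(String[] args) {
            File logDir = new File("/var/eai/logs");          // hypothetical path
            long retentionMillis = 14L * 24 * 60 * 60 * 1000; // keep 14 days
            long cutoff = System.currentTimeMillis() - retentionMillis;

            File[] files = logDir.listFiles();
            if (files == null) {
                return; // directory missing or unreadable
            }
            for (File f : files) {
                if (f.isFile() && f.lastModified() < cutoff) {
                    if (!f.delete()) {
                        System.err.println("Could not delete " + f.getAbsolutePath());
                    }
                }
            }
        }
    }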
  32. Anticipate Bad Messages
     • Build a Dead Letter Queue (see the EAI Patterns book).
     • Unless you have a simple system, you will have messages the system can't handle:
       • Improper format, wrong data, etc.
     • Build a means to capture and handle these, lest they clog your process:
       • Where do you put them? DB? Another queue?
       • Who checks them (is it a one-off issue or a systemic problem)?
     • (A minimal JMS sketch of dead-letter routing follows.)
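
A minimal sketch of dead-letter routing with standard JMS. Queue names, the failure criteria and the processing step are illustrative; a product like SeeBeyond would typically give you parts of this out of the box.

    import javax.jms.Destination;
    import javax.jms.JMSException;
    import javax.jms.Message;
    import javax.jms.MessageListener;
    import javax.jms.MessageProducer;
    import javax.jms.Session;

    // Consumes carton events; anything that fails processing is forwarded to a
    // dead letter queue instead of clogging the main flow.
    public class DeadLetterRouter implements MessageListener {
        private final MessageProducer deadLetterProducer;

        public DeadLetterRouter(Session session, Destination deadLetterQueue) throws JMSException {
            this.deadLetterProducer = session.createProducer(deadLetterQueue);
        }

        public void onMessage(Message message) {
            try {
                process(message);   // normal path
            } catch (Exception e) {
                try {
                    // Forward the original message untouched (JMS marks received
                    // messages read-only, so the failure reason is logged rather
                    // than stamped onto the message here).
                    System.err.println("Dead-lettering message: " + e);
                    deadLetterProducer.send(message);
                } catch (JMSException dlqFailure) {
                    // If even the DLQ is unreachable, fail loudly so monitoring sees it.
                    throw new RuntimeException(dlqFailure);
                }
            }
        }

        // Placeholder for parsing/validation/persistence of a carton event.
        private void process(Message message) throws Exception {
        }
    }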
  33. Message Repair
     • If possible, build a message triage mechanism to inspect, fix and resend DLQ'ed messages.
     • This can be built/improved over time:
       • More manual at first
       • Automated as you learn more
     • Considerations:
       • How are you going to clean up the error "droppings" (messages that are truly dead)?
       • Consider a "retry" queue with varied strategies to retry messages that have failed – failure may be due to row locks or reference updates that are just microseconds away from completion (see the retry sketch below).
       • Be cautious of when/why messages end up in the dead letter queue – you don't want it flooded because the DB is down.
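
A retry strategy can be sketched on top of plain JMS by carrying an attempt count in a message property and only dead-lettering after several tries. The property name, the attempt limit and the queue wiring are assumptions, and delayed redelivery itself is provider-specific and not shown.

    import javax.jms.JMSException;
    import javax.jms.MessageProducer;
    import javax.jms.Session;
    import javax.jms.TextMessage;

    // Decides whether a failed message goes back onto a retry queue or,
    // after too many attempts, onto the dead letter queue.
    public class RetryPolicy {
        private static final int MAX_ATTEMPTS = 5;

        private final Session session;
        private final MessageProducer retryProducer;       // feeds the retry queue
        private final MessageProducer deadLetterProducer;  // feeds the DLQ

        public RetryPolicy(Session session,
                           MessageProducer retryProducer,
                           MessageProducer deadLetterProducer) {
            this.session = session;
            this.retryProducer = retryProducer;
            this.deadLetterProducer = deadLetterProducer;
        }

        public void handleFailure(TextMessage failed) throws JMSException {
            // Received messages are read-only, so build a fresh copy to carry
            // the incremented attempt counter ("retryCount" is a made-up property).
            int attempts = failed.propertyExists("retryCount")
                    ? failed.getIntProperty("retryCount") : 0;

            TextMessage copy = session.createTextMessage(failed.getText());
            copy.setIntProperty("retryCount", attempts + 1);

            if (attempts + 1 < MAX_ATTEMPTS) {
                retryProducer.send(copy);       // try again later
            } else {
                deadLetterProducer.send(copy);  // give up; let triage look at it
            }
        }
    }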
  34. Dead Letter Queue (DLQ)
  35. Managing it / monitoring it
     • The multiple points of failure and issues in these systems make them complicated to manage and support.
     • Build in automated monitoring facilities and system health dashboards:
       • You need a one-stop shop for what's up, what's down, what's queuing properly, what's queuing too much, etc.
       • Consider the use of JMX (it is probably already built into some of your infrastructure components).
     • Calculate system thresholds and provide automated alerts to the dashboard and email/page/etc. systems when thresholds start to get close (not once they have been reached).
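
A small JMX sketch of the dashboard idea: a standard MBean exposing a queue depth and a warning threshold, so an existing JMX console or the alerting layer can poll it. The names, the threshold value and the way the depth gets updated are assumptions.

    import java.lang.management.ManagementFactory;
    import java.util.concurrent.atomic.AtomicLong;
    import javax.management.MBeanServer;
    import javax.management.ObjectName;

    // Standard MBean interface: a class named X implementing XMBean is
    // exposed by JMX as attributes and operations.
    interface QueueHealthMBean {
        long getDepth();
        long getWarningThreshold();
        boolean isNearThreshold();
    }

    public class QueueHealth implements QueueHealthMBean {
        private final AtomicLong depth = new AtomicLong();
        private final long warningThreshold;

        public QueueHealth(long warningThreshold) {
            this.warningThreshold = warningThreshold;
        }

        // Called by whatever observes the queue (bus API, DB count, ...).
        public void setDepth(long newDepth) { depth.set(newDepth); }

        public long getDepth() { return depth.get(); }
        public long getWarningThreshold() { return warningThreshold; }

        // Flag the problem while approaching the limit, not only once it is hit.
        public boolean isNearThreshold() { return depth.get() >= warningThreshold * 0.8; }

        public static void main(String[] args) throws Exception {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            QueueHealth health = new QueueHealth(50000);   // illustrative threshold
            server.registerMBean(health,
                    new ObjectName("eai.monitoring:type=QueueHealth,queue=cartonEvents"));
            // A dashboard or jconsole can now read the attributes remotely.
            Thread.sleep(Long.MAX_VALUE);
        }
    }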
  36. Problem 5 – Change is inevitable
     • The size and shape of our messages changed over time.
     • We had no way to deal effectively with change.
     • Consequently, new system versions/updates caused:
       • Shutdown
       • Replace (sometimes transforming data to a new structure)
       • Restart
     • The real world was the only place we saw some situations:
       • We had no effective test harness.
       • Typically leading to ugly back-outs.
  37. Version Strategy
     • EAI system stability/life span depends on its message structure.
     • Message structure is the hardest part to get exactly right up front.
     • When message formats need to change, this creates a real problem: the entire system must be down, queues emptied, etc.
     • Consider version information in the message and routing/processing instructions in the bus:
       • More complicated system
       • Can also affect performance
       • Allows for dual operation (old and new systems) without failure and major downtime
     • It's going to happen – especially early – so plan for it.
  38. Version Routing
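
A sketch of the version-routing idea from the previous two slides: read a version property from each message and forward it to the processing path built for that version, so old and new formats can flow side by side during a rollout. The property name and queue wiring are illustrative.

    import java.util.HashMap;
    import java.util.Map;
    import javax.jms.JMSException;
    import javax.jms.Message;
    import javax.jms.MessageListener;
    import javax.jms.MessageProducer;

    // Routes each message to the producer registered for its schema version.
    public class VersionRouter implements MessageListener {
        private final Map<String, MessageProducer> routesByVersion =
                new HashMap<String, MessageProducer>();
        private final MessageProducer deadLetterProducer;

        public VersionRouter(MessageProducer deadLetterProducer) {
            this.deadLetterProducer = deadLetterProducer;
        }

        public void addRoute(String version, MessageProducer producer) {
            routesByVersion.put(version, producer);
        }

        public void onMessage(Message message) {
            try {
                // "schemaVersion" is a made-up property name; it could equally be
                // carried in the XML payload at the cost of parsing every message.
                String version = message.getStringProperty("schemaVersion");
                MessageProducer route = routesByVersion.get(version);
                if (route != null) {
                    route.send(message);
                } else {
                    deadLetterProducer.send(message);  // unknown/unsupported version
                }
            } catch (JMSException e) {
                throw new RuntimeException(e);
            }
        }
    }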
  39. Testing EAI is a b*&%#
     • Consider collecting days' worth of messages, or message-generating data, and using it for replay scenarios.
       • Problem: even if you have all the data, you don't have the same timing issues you will see in the real world.
     • Testing all the potential message scenarios is impossible with any significantly sized system.
     • Consider developing a message "replicator" subsystem:
       • Send replicated messages to a test harness.
       • A "live" test faucet of messages ready whenever you need it.
       • Critical to be able to test new/updated processes, performance, etc.
       • Requires a fair amount of hardware and some switch to turn it on/off.
       • Will impact performance – consider putting the "faucet" on just one of the servers in a farm.
  40. Test "Faucet"
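
The "faucet" is essentially a wire tap: a pass-through step that always forwards the live message and, when the switch is on, also copies it to a test destination. A minimal JMS sketch; the toggle mechanism and the destinations are assumptions.

    import javax.jms.JMSException;
    import javax.jms.Message;
    import javax.jms.MessageListener;
    import javax.jms.MessageProducer;

    // Wire-tap style "faucet": forwards to the production flow and, when
    // opened, also copies each message into the test harness queue.
    public class TestFaucet implements MessageListener {
        private final MessageProducer productionProducer;
        private final MessageProducer testHarnessProducer;
        private volatile boolean open;   // flipped via JMX/admin tooling in practice

        public TestFaucet(MessageProducer productionProducer,
                          MessageProducer testHarnessProducer) {
            this.productionProducer = productionProducer;
            this.testHarnessProducer = testHarnessProducer;
        }

        public void setOpen(boolean open) { this.open = open; }

        public void onMessage(Message message) {
            try {
                productionProducer.send(message);       // live flow comes first
                if (open) {
                    testHarnessProducer.send(message);  // replica for replay/testing
                }
            } catch (JMSException e) {
                throw new RuntimeException(e);
            }
        }
    }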
  41. Wrap up
     • Despite the issues, the system is up and running today.
     • Extremely useful to the business – providing unparalleled distribution information.
     • Like most things in software system development, the lessons learned are more about organization, architecture and design than implementation.
     • Thank you for your time and attention.
