Don't Rock the Boat: Managing Data Flow


In any e-commerce solution, it is essential to manage every piece of data, whether it’s product, catalog, or merchandise data. And to drive successful e-commerce, a business must have complete, accurate, combined data available in a timely manner.

POV by Anand Raman, Commerce Technology Practice Manager, and Arvind Naik, Technical Architect, SapientNitro

Published in: Business, Technology

POINT OF VIEW

Don’t Rock the Boat: Managing Data Flow

By: Anand Raman, Commerce Technology Practice Manager, and Arvind Naik, Technical Architect, SapientNitro

THE BIG PICTURE

In any e-commerce solution, it is essential to manage every piece of data, whether it’s product, catalog, or merchandise data. And to drive successful e-commerce, a business must have complete, accurate, combined data available in a timely manner.

Often, data flow is not a top priority. In fact, it’s a secondary thought at best. And although each e-commerce project brings its own unique challenges, there are common basic data elements across the board.

After understanding the challenges and lessons in this paper, technical architects, developers, and project managers alike will be able to identify data flow design and data availability as a fundamental aspect of any e-commerce project and prepare a cohesive plan to address their unique business challenges.

WHY DOES DATA FLOW MATTER?

At the onset of an e-commerce project, businesses typically provide little to no specific requirements around data flow. They might plan to have some catalog systems, search features, and product data loaded through backend systems, but that’s about it. Typically the focus is on the customer experience and how to get the pertinent data to those customers and other business users. But without a comprehensive data flow, there will be setbacks in the future.

Attention must also be paid to the timing of flow development. During the later part of development, questions often arise such as: When should I expect my price or promotion to show up? What should I do if I want to remove a product right away? Unless you’ve thought about those questions early on, it could be too little, too late. For instance, if you’re in the testing cycle of a project, it’s likely impossible to design a solution for data flow issues unless you have been thinking about them early on. Likewise, it’s difficult to have timely, frequent site refreshes without a comprehensive data flow strategy.

It’s paramount to think about data flow as it pertains to every functional requirement (what kind of data, with what kind of system, within what time frame) in order to maintain efficiency and control throughout every process.

IDEA ENGINEERS © Sapient Corporation 2012
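One way to make "what kind of data, with what kind of system, within what time frame" concrete is to record each flow as a small structured entry. The following is a minimal sketch, not a prescribed format: the class `FlowRequirement`, its field names, the system names, and the latency figures are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class FlowRequirement:
    """One row of a data flow plan (all names here are illustrative)."""
    data_type: str          # what kind of data
    source_system: str      # where the data originates
    target_system: str      # where the change must become visible
    max_latency: timedelta  # within what time frame


def worst_case_latency(requirements, data_type):
    """Longest agreed time before a change to `data_type` is visible everywhere."""
    latencies = [r.max_latency for r in requirements if r.data_type == data_type]
    if not latencies:
        # No entry means the question was never asked: surface it early.
        raise KeyError(f"no data flow requirement recorded for {data_type!r}")
    return max(latencies)


# Hypothetical plan: the systems and time frames are assumptions, not recommendations.
PLAN = [
    FlowRequirement("price", "pricing system", "storefront", timedelta(minutes=15)),
    FlowRequirement("price", "pricing system", "search index", timedelta(hours=1)),
    FlowRequirement("promotion", "promotion system", "storefront", timedelta(minutes=5)),
]

print(worst_case_latency(PLAN, "price"))
```

With a plan like this, "When should I expect my price to show up?" has an answer that can be read off, and a data type with no entry raises an error instead of being discovered during testing.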
COMMON DATA TYPES

Fig. 1. A typical data flow diagram

Data mapping is a critical part of logical data flow design, and this pictorial diagram represents a typical e-commerce data flow solution. Of course, the process can be much more complicated, but this representation offers a basic outline of what to expect when planning requirements around data flow.

Mapping data in an accessible way will facilitate the discussion on data flow. Laying out the best-case, typical, and worst-case Service-Level Agreement (SLA) is paramount in order to arrive at an agreeable set of service levels. At times, new integration techniques and solutions may need to be identified if none of the existing integration techniques are sufficient. Be prepared to change even the foundation of the solution architecture if certain service levels are critical to the existence of the business.

Though each e-commerce system is unique and has special business needs, most share common basic categories of data. And all e-commerce systems need to handle many types of data, each with its own source, lifecycle, rules, and criticality. Add in the multiple systems, business logic, workflows, and processing businesses must go through, and you’ve got a tremendously complex maze on your hands.

The first step in defining a data flow strategy is to identify the data types relevant to you. They include, but are not limited to:

Product Data
• Product information (e.g., specifications)
• Product lifecycle (e.g., launch date)
• Product images (e.g., various renditions)
• Product rich content (e.g., multimedia)
• Product merchant relationships (e.g., cross-sell, up-sell)
• Product social data (e.g., ratings and reviews)
• Product pricing (e.g., MSRP, sale price)
• Pricing promotions and messages (e.g., discounts, clearance)

Category Data
• Category information (e.g., taxonomies: master, product, sales)
• Category images
• Category attributes
Marketing Data
• Marketing promotions (e.g., order or shipping offers)
• Merchandizing relationships (e.g., personalized recommendations)
• Shipping rate calculations

Inventory Data
• Availability
• Stock-in-hand
• Release/street date
• Backorder/pre-order

Search Index Data
• Searchable attributes
• Facets, keywords, SEO

Once data types are identified, understand the expectations of the data by engaging in conversations with business stakeholders, analysts, and other experts. Many times, the requirements are unclear, even for key stakeholders. In such situations, starting with the necessities that are practical and feasible is often the right approach.

It can also help to articulate relationships and dependencies using an entity-relationship diagram. A typical diagram may have hundreds of tables and a number of dependencies, which have a significant impact on the SLAs.

Fig. 2. An entity-relationship diagram

DATA SOURCES

Major corporations gather data from multiple sources; e-commerce data does not always originate from a single source. And, for each piece of data, you have to consider where the best source for that data lies. It is important to recognize the benefits and limitations of all sources upfront to make the best possible decision.
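Dependencies between entities, such as those an entity-relationship diagram exposes, constrain the order in which data can be loaded: an entity cannot be published before the entities it references. The sketch below shows one way to derive a load order from such dependencies using Python's standard `graphlib` module (Python 3.9 or later). The entity names and the dependencies in `DEPENDS_ON` are assumptions made up for illustration; a real model would be far larger.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each entity maps to the entities it references.
DEPENDS_ON = {
    "category": set(),
    "product": {"category"},
    "price": {"product"},
    "inventory": {"product"},
    "promotion": {"product", "price"},
}


def load_order(depends_on):
    """Return entities ordered so each one comes after everything it references.

    TopologicalSorter raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


print(load_order(DEPENDS_ON))
```

An entity late in this order can only be as fresh as the entities before it, which is one reason dependencies weigh on the service levels that can realistically be promised.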
Data can originate from a number of systems, such as Product Information Management (PIM) Systems, Content Management Systems (CMS), Marketing Categorization Systems, Pricing and Sales Management Systems, Marketing Promotion Management Systems, Social Network and Ratings Systems, and Analytics Data Systems.

Each system comes with its own technology, integration options, throughput, data quality, and error handling methods. Articulating these system boundaries is critical, as there may be a need to invest time and money to reduce the limitations of certain systems in the ecosystem.

DATA PROCESSING AND INTEGRATION SYSTEMS

Once you’ve chosen the kinds of data and sources you require, you can then choose your data processing and integration systems. Below is a list of commonly used ones, but it could get much longer in a real-life project:
• Standard DataStage (e.g., ETL)
• IBM BODL (Business Object Data Loader)
• IBM WebSphere MQ Broker
• Custom Integration Layer
• WCS Stage Propagation Utility
• Secure FTP/MFT

Regardless of what system(s) you choose, you must then optimize them. Optimization, the process of improving performance without compromising quality and maintainability, is a critical activity. Optimization challenges differ based on technology and integration techniques, but these strategies can help:
1. Tune Structured Query Language (SQL) several times to ensure efficiency.
2. Cache frequently used attributes to avoid unnecessary trips to the database.
3. Use batches to commit and process as much as possible, and to avoid high overheads.
4. Use parallel threads of processing wherever possible.
5. Use persistent MQ queues to protect the messages.
6. Pass only the required data to be updated to avoid unnecessary back-and-forth data.
7. Use smart updates when it’s not feasible to minimize the message payloads.
8. Conduct performance tests to ensure that the end-to-end data flow is optimized.

THE CHALLENGES

There is no shortage of challenges when it comes to data flow. A well-designed data solution requires that you recognize that:

Time matters. Every content type is different in terms of its lifecycle and frequency of change. A lot of content is refreshed monthly or weekly, but some content types (e.g., promotions) tend to change much more quickly, forcing related messages (e.g., promotional merchandizing content) to change at the same rate. And building any e-commerce system doesn’t happen overnight; instead of months, it typically takes years. Also, each system may be under a different development cycle or timeline, which can add to the complexity.

Data flow management requires constant attention. Providing a consistent customer experience in the face of ever-changing business and IT priorities is taxing. Businesses continue to adapt and change, as do their products and priorities. Combine that with ongoing maintenance, integration specifications, bug fixes, releases, and product upgrades, and this is no simple endeavor.
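Among the optimization strategies listed earlier, batching commits (strategy 3) is straightforward to illustrate. The sketch below is a minimal example that uses an in-memory SQLite database purely as a stand-in for a real commerce database; the `price` table, its columns, and the batch size are assumptions made for illustration.

```python
import sqlite3
from itertools import islice


def batched(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def load_prices(conn, rows, batch_size=500):
    """Upsert (sku, amount) rows, committing once per batch instead of once per row.

    Returns the number of batches committed.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS price (sku TEXT PRIMARY KEY, amount REAL)"
    )
    batches = 0
    for chunk in batched(rows, batch_size):
        conn.executemany(
            "INSERT OR REPLACE INTO price (sku, amount) VALUES (?, ?)", chunk
        )
        conn.commit()  # one commit per batch keeps transaction overhead low
        batches += 1
    return batches
```

The right batch size depends on the database and the payload, which is why the last strategy in the list, performance testing the end-to-end flow, still applies.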
Architecture and design choices have an impact. Caching, an integral part of any e-commerce implementation, can spoil the overall strategy if not attended to during the early stages of implementation and development. It is imperative that all architecture and design decisions take into account the entire strategy.

These all affect business decision-making and the stability of integration. So how, then, do we build systems to meet the ever-changing demands of business? And how do we build data flow around this fluid environment? The point is that when we think about data flow, all of these challenges (among others) must become considerations in order to guarantee data availability that’s quality-driven and timely, given that we’re standing on such shaky ground.

THE OBJECTIVES

When we think about data flow aspects in an e-commerce system, we need to stress several goals for a successful business solution.

First, identify critical data entities early on and identify the SLA requirements for them. It’s also crucial to identify how soon a data entity can be made available across the systems, because the changes may need to be reflected in multiple areas, not just at the front end. In addition, be sure to identify emergency scenarios. You must be prepared for any circumstance that may arise, since it could have a detrimental impact on legal exposure, profitability, customer satisfaction, and overall business success, to name a few.

Second, understand your technology and system limitations in order to deliver all data in a timely manner. What can seem sufficient in the beginning can later reveal gaping holes. It’s mandatory that you and your team are thorough and understand each and every data system critical to the data flow design, not just in the day-to-day but in extreme situations as well.

Third, set expectations for data availability. When you design a system, there are always limitations, and it’s important to set the expectations up front so the business can plan out solutions well in advance. Along those lines, make sure to understand the impact on the business if an entity is not available as expected.

And last, proactively determine solutions to improve the data flow and update SLAs as needed. Doing this upfront gives you the padding necessary to counteract any issues that may arise in the future, such as strains on budgets and timelines.

WHAT DOES YOUR BUSINESS NEED?

Fig. 3. The SLA consistency map (entities plotted: lifecycle changes, product attributes, promotions, price, inventory, image assets)
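A map like the one in Fig. 3 places each entity by how quickly it must be available and how consistent it must be. The same idea can be kept as data so that priorities are explicit and reviewable. The following is a minimal sketch only: the class `EntityNeed`, the 0 to 1 scoring scale, the 0.5 threshold, and the example scores are all hypothetical and are not taken from the client map.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityNeed:
    """A data entity scored on the two axes of the map (scores are illustrative)."""
    name: str
    availability: float  # 0..1: how quickly changes must appear (x-axis)
    consistency: float   # 0..1: how accurate and precise the data must be (y-axis)


def quadrant(entity, threshold=0.5):
    """Name the quadrant an entity falls into, given a simple cut-off on each axis."""
    fast = entity.availability >= threshold
    strict = entity.consistency >= threshold
    if fast and strict:
        return "high availability, high consistency"
    if fast:
        return "high availability, lower consistency"
    if strict:
        return "lower availability, high consistency"
    return "lower availability, lower consistency"


# Made-up scores, for illustration only.
EXAMPLES = [
    EntityNeed("image assets", availability=0.9, consistency=0.3),
    EntityNeed("lifecycle changes", availability=0.2, consistency=0.9),
]

for need in EXAMPLES:
    print(f"{need.name}: {quadrant(need)}")
```

Agreeing on the scores with business stakeholders is the real work; the code only records the outcome in a form that can be versioned and revisited as SLAs change.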
This is an example of an SLA consistency map we created for one of our clients, which used four quadrants to help them visualize and prioritize critical data entities. On the x-axis in this particular example we have Availability, the things you need as quickly as possible: in this case, entities like up-to-date images, inventory, and price data. On the y-axis we have Consistency, which reflects the importance of accuracy and precision for attributes like lifecycle changes. It is essential to keep in mind that these frequencies have a significant impact on the resources and cost required to architect a data flow management solution.

FINAL CONSIDERATIONS

The complexity and importance of a high-functioning data flow system should be clear at this point. And with so many systems and options available, there are several questions you should be asking yourself:
1. Do I really need this system? Make sure you’re picking the systems that will allow you to optimize the workflow and make data available as quickly and consistently as possible.
2. Is it fit to handle throughput? If the answer is yes, decide what entities it is suitable for.
3. Can I minimize the systems between the source and the destination? The more steps you take, the higher the risk of out-of-sync data, lost data, or increased time to availability.
4. Can a system be upgraded or replaced with a higher-performing system, and can I improve the systems for handling data? Again, these decisions will serve you best if you make them upfront.

Data flow management is critical to the success of an e-commerce site. It does not end once the data entities are identified and a reasonable data flow architecture and integration techniques are implemented. Constant communication to understand expectations, communicate changes, and ensure alignment on an ongoing basis is absolutely essential. Data flow management should not be an afterthought but must be a priority that is addressed during the early phases, and every phase thereafter, of any e-commerce solution.

About the Authors

Anand Raman has been involved in the design, implementation, and support of high-volume transactional applications for the retail and travel industries. Over the past few years he has been involved in the build and rollout of the platform. Prior to Sapient, Anand worked with one of India’s largest media houses, where he worked on putting their popular properties online.

Arvind Naik, Technical Architect, has rich experience implementing e-commerce sites at Borders, David’s Bridal, Agriliance, and Sprint in similar roles. He enjoys large-scale technical problem solving and working with data flows. He has been instrumental in end-to-end data flow management and in creating strategy roadmaps for several projects across clients. He is interested in adding Cloud, PIM, and MDM to his technical portfolio.