Importance of ‘Centralized Event Collection’ and a Big Data Platform for Analysis!

DevOpsDays India, Bangalore - 2013

  1. Importance of ‘Centralized Event Collection’ and a Big Data Platform for Analysis! DevOpsDays India, Bangalore - 2013. ~/Piyush, Manager, Website Operations at MakeMyTrip
  2. What to expect:
     • MakeMyTrip data challenges!
     • Event Data a.k.a. Logs, and Log Analysis
     • Why Centralized Logging …for systems and applications!
     • Capturing Events: why structured data emitted from apps for machines is a better approach!
     • Data Service Platform (DSP): why?
     • Inputs: data for DSP
     • Top architecture considerations
     • Top-level key tasks
     • Tools arsenal, API management and service cloud
     DevOpsDays India 2013 : ~/Piyush
  3. MakeMyTrip data challenges…!
     • Multi-DC/colocation setup
     • Different types of data sources, internal and external (structured, semi-structured, unstructured):
       – Online transaction data store
       – ERP
       – CRM
       – Email behavior / survey results
       – Web analytics
       – Logs: web application user-activity logs
       – Social media
       – Inventory / catalog data residing in Excel files
       – Monitoring metric data: Graphite (time-series Whisper), Splunk, ElasticSearch (Logstash)
       – Many other sources
     • Storing and analyzing huge volumes of event data!
  4. Some challenges…!
     • Aggregate web-usage data and transactional data to generate one view
     • Process multiple GBs to TBs of data every day
     • Serve more than a million data-services API requests per day
     • Ensure business continuity as reliance on MyDSP increases
     • Store terabytes of historical data
     • Mesh transactional (online and offline) data with consumer behavior and derive analytics
     • Build a flexible data-ingestion platform to manage many data feeds from multiple data sources
  5. Flow of an Event
  6. Event Data a.k.a. Logs
     • Event data: a set of chronologically sequenced data records that capture information about an event.
     • Virtually every form of system produces event data; capture it from all components, both client-side and server-side events!
     • You may think of logs as the footprint generated by any activity within the system/app.
     • Event data has different characteristics from data stored in traditional data warehouses:
       – Huge volume: event data accumulates rapidly and often must be stored for years; many organizations are managing hundreds of terabytes, and some petabytes.
       – Format: because of the huge variety of sources, event data is unstructured or semi-structured.
       – Velocity: new event data is constantly coming in.
       – Collection: event data is difficult to collect because systems and networks are broadly dispersed.
       – Time-stamped: event data is always inserted once with a timestamp; it never changes.
  7. Log Analysis
     • Logs are among the most useful things when it comes to analysis; in simple terms, log analysis is making sense of system/app-generated log messages (or just LOGS). Through logs we get insight into what is happening in the system.
     • Helps root-cause analysis after any incident.
     • Personalize user experience by analyzing web-usage data.
     • Security requirements: traditionally there are compliance requirements for log management too: SEM + SIM => SIEM.
     • For data security: one centralized platform for collecting ALL events (logs), correlating them, and providing real-time intelligent visibility.
     • Monitor not just the network, OS, devices, etc., but ALL applications and business processes too.
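As a toy illustration of "making sense of log messages", the first step of root-cause analysis is often just counting events by severity. A minimal Python sketch, assuming the JSON-formatted log lines this deck recommends (the sample messages here are invented):

```python
import json
from collections import Counter

# Hypothetical JSON log lines, one event per line.
log_lines = [
    '{"severity": "INFO",  "short_message": "payment ok"}',
    '{"severity": "ERROR", "short_message": "gateway timeout"}',
    '{"severity": "ERROR", "short_message": "gateway timeout"}',
]

def severity_counts(lines):
    """Count events per severity level across a stream of JSON log lines."""
    return Counter(json.loads(line)["severity"] for line in lines)

counts = severity_counts(log_lines)
```

A spike in the ERROR count, grouped by `short_message`, immediately points at the failing component.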
  8. Why Centralized Logging …for systems and applications!
     • The need for centralized logging is quite important nowadays due to:
       – growth in the number of applications,
       – distributed architectures (Service-Oriented Architecture),
       – cloud-based apps,
       – the number of machines and the infrastructure size increasing day by day.
     • This means that centralized logging, and the ability to spot errors in distributed systems and applications, has become even more "valuable" and "needed". And most importantly, it lets us:
       – understand customers and how they interact with websites;
       – understand change, whether using A/B or multivariate experiments, or tweaking and understanding new implementations.
  9. Capturing Events: Why structured data emitted from apps for machines is a better approach!
     • Need for standardization:
       – Developers assume that the first-level consumer of a log message is a human, and only they know what information is needed to debug an issue. But logs are not just for humans! The primary consumers of logs are shifting from humans to computers, which means log formats should have a well-defined structure that can be parsed easily and robustly.
       – Logs change! If logs never changed, writing a custom parser might not be too terrible: the engineer would write it once and be done. In reality, every time you add a feature you start logging more data, and as you add more data, the printf-style format inevitably changes. The custom parser then has to be updated constantly, consuming valuable development time.
     • Suggested approach: "Logging in JSON format"
       – To keep it simple and generic for any application, the recommended approach is a {key: value} JSON log format (structured/semi-structured).
       – This makes parsing and consumption easy, irrespective of whatever technology/tools we choose to use!
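A minimal sketch of the JSON-logging approach in Python, using the standard `logging` module. The field names (`facility`, `short_message`, `fields`) echo the sample message later in the deck but are illustrative, not a fixed schema:

```python
import json
import logging
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line (one event per line)."""
    def format(self, record):
        event = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "severity": record.levelname,
            "facility": "serverSide",  # illustrative fixed field
            "short_message": record.getMessage(),
        }
        # Merge any structured fields attached via logging's `extra=` mechanism.
        event.update(getattr(record, "fields", {}))
        return json.dumps(event)

logger = logging.getLogger("app")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Emits one machine-parseable JSON line instead of a printf-style string.
logger.warning("slow request", extra={"fields": {"sessionID": "111", "RT": 2}})

# Formatting a record directly, to show the output shape:
line = JsonFormatter().format(
    logging.LogRecord("app", logging.WARNING, "", 0, "slow request", None, None))
```

Because the structure is {key: value}, downstream consumers call `json.loads` instead of maintaining a custom parser that breaks every time the format changes.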
  10. Key things to keep in mind / Rules
     • Use timestamps for every event.
     • Use unique identifiers (IDs) like transaction ID / user ID / session ID, or append a unique user identification (UUID) number to track unique users.
     • Log in text format; avoid logging binary information!
     • Log anything that can add value when aggregated, charted, or further analyzed.
     • Use severity categories: INFO, WARN, ERROR, and DEBUG (e.g. "severity": "WARN").
     • The 80/20 rule: 80% of our goals can be achieved with 20% of the work, so don't log too much. :)
     • Keep the same NTP-synced date/time and timezone on every producer and collector machine (# ntpdate ntp.example.com).
     • Reliability: like video recordings, you don't want to lose the most valuable shot, so you record every frame and later, during analysis, throw away the rest and keep your best frame. Likewise, logs should be recorded with proper reliability so that you don't lose any important and usable part.
     • Correlation rules for the various event streams, to generate and minimize alerts/events.
     • Write connectors for integrations.
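The first few rules (timestamp every event, tag it with a UUID, keep it text not binary) can be sketched as a small Python helper; the field names here are illustrative, not part of the deck's schema:

```python
import json
import uuid
from datetime import datetime, timezone

def new_visitor_id():
    """Issue a UUID once per visitor (e.g. stored in a tracking cookie)."""
    return str(uuid.uuid4())

def make_event(visitor_id, **fields):
    """Stamp every event with a UTC timestamp and the visitor's UUID,
    and serialize it as a text (JSON) line, never binary."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "uuid": visitor_id,
    }
    event.update(fields)
    return json.dumps(event)

vid = new_visitor_id()
line = make_event(vid, pagename="funnel:example com:page1", severity="INFO")
```

With every producer NTP-synced, these timestamps make events from different machines safely sortable into one stream.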
  11. Data Service Platform: DSP
     Why do we need a data services platform?
     • An integration layer to bring data in from more sources in less time
     • Serves various components: applications, and also monitoring systems, etc.
  12. Inputs: Data – what data to include
     • Clickstream / web-usage data
       – User activity logs
     • Transactional data store
     • Offline
       – CRM
       – Email behavior -> logs/events
  13. Top Architecture Considerations
     • Non-blocking data ingestion
     • UUID-tagged events/messages
     • Load-balanced data processing across data centers
     • Memory-based data storage for real-time data systems
     • Easily scalable, highly available (HA), and easy-to-maintain large historical data sets
     • Data caching to achieve low latency
     • Parallel processing between two different data centers to ensure business continuity
     • A centralized service cloud for API management, security (authentication, authorization), metering, and integration
  14. Top-level key tasks for User Activity Logging & Analysis
     1. Data collection of both client-side and server-side user activity streams
        • Tag every website visitor with a UUID, similar to system UUIDs
        • Collect the activity streams on the Big Data platform for analysis, through Kafka queues & NoSQL data stores
     2. Near-real-time data processing
        • Preprocessing / aggregations
        • Filtering, etc.
        • Pattern discovery along with the already-available cooked data from step 4
          – Clustering / classification / association discovery / sequence mining
     3. Rule engine / recommendation algorithms
        • Rule engine: building an effective business rule engine / correlating events
        • Content-based filtering / collaborative filtering
     4. Batch processing / post-processing using the Hadoop ecosystem
        • Analyzing & storing cooked data in a NoSQL data store
     5. Data services (web services)
        • RESTful APIs to make the data/insights consumable through various data services
     6. Reporting/search interface & visualization for product development teams and other business owners
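A toy end-to-end sketch of steps 1 and 2 (collection, filtering, pre-aggregation) in plain Python. Ordinary lists stand in for the Kafka queues and NoSQL stores the deck actually uses, and all event data is invented:

```python
from collections import Counter

def collect(streams):
    """Step 1: merge client- and server-side activity streams into one feed.
    In production these would arrive via Kafka topics."""
    return [event for stream in streams for event in stream]

def filter_events(events, facility=None):
    """Step 2a: basic filtering during near-real-time processing."""
    return [e for e in events if facility is None or e.get("facility") == facility]

def aggregate_by_page(events):
    """Step 2b: a simple pre-aggregation (hits per page)."""
    return Counter(e["pagename"] for e in events if "pagename" in e)

client = [{"facility": "clientSide", "pagename": "home"},
          {"facility": "clientSide", "pagename": "search"}]
server = [{"facility": "serverSide", "pagename": "home"}]

events = collect([client, server])
hits = aggregate_by_page(filter_events(events, facility="clientSide"))
```

The later steps (rule engines, Hadoop batch jobs, REST data services) consume exactly these kinds of pre-aggregated views.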
  15. Data System
     Let's store everything! Query = function(data)
     Layered architecture:
     • Every event is data!
     • Precomputed views
     • Batch layer: Hadoop M/R
     • Speed layer: Storm near-real-time (NRT) computation
     • Serving layer
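The "query = function(data)" idea behind these layers can be illustrated in a few lines of Python. This is only a sketch of the pattern; in the deck's stack the batch view would be a Hadoop M/R job and the speed view a Storm topology:

```python
def batch_view(events):
    """Batch layer: precompute a view (hits per page) over all historical data."""
    view = {}
    for e in events:
        view[e["pagename"]] = view.get(e["pagename"], 0) + 1
    return view

def realtime_view(recent_events):
    """Speed layer: the same computation, over events the batch run
    has not yet absorbed."""
    return batch_view(recent_events)

def query(batch, realtime, page):
    """Serving layer: merge both views so the answer covers ALL data."""
    return batch.get(page, 0) + realtime.get(page, 0)

historical = [{"pagename": "home"}] * 100
recent = [{"pagename": "home"}, {"pagename": "search"}]
total_home = query(batch_view(historical), realtime_view(recent), "home")
```

Because every event is stored immutably, the batch view can always be recomputed from scratch, and the speed layer only has to cover the gap since the last batch run.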
  16. (Diagram slide; no text captured.)
  17. Clickstream / User Activities Capture: Data is "Events"
     • Tag every website visitor with a UUID using an Apache module (done)
       – https://github.com/piykumar/modified_mod_cookietrack
       – Cookie: UUID like 24617072-3124-674f-4b72-675746562434.1381297617597249
     • JSON messages like:
       {
         "timestamp": "2012-12-14T02:30:18",
         "facility": "clientSide",
         "clientip": "123.123.123.123",
         "uuid": "24617072-3124-5544-2f61-695256432432.1379399183414528",
         "domain": "www.example.com",
         "server": "abc-123",
         "request": "/page/request",
         "pagename": "funnel:example com:page1",
         "searchKey": "1234567890_",
         "sessionID": "11111111111111",
         "event1": "loading",
         "event2": "interstitial display banner",
         "severity": "WARN",
         "short_message": "....meaning short message for aggregation...",
         "full_message": "full LOG message",
         "userAgent": "...blah...blah..blah...",
         "RT": 2
       }
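A quick sketch of consuming such a message in Python, picking out a few fields from the sample above (the 1-second slow-response threshold is an invented example):

```python
import json

# Abbreviated copy of the sample clickstream event from the slide.
sample = '''{
  "timestamp": "2012-12-14T02:30:18",
  "facility": "clientSide",
  "uuid": "24617072-3124-5544-2f61-695256432432.1379399183414528",
  "pagename": "funnel:example com:page1",
  "severity": "WARN",
  "RT": 2
}'''

event = json.loads(sample)

# Because the message is structured JSON, no custom parser is needed:
is_client_side = event["facility"] == "clientSide"
visitor = event["uuid"].split(".")[0]  # UUID part, before the cookie timestamp
slow = event["RT"] > 1                 # e.g. flag responses slower than 1s
```

The same `json.loads` call works for every producer, which is exactly the payoff of the structured-logging approach argued for earlier in the deck.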
  18. Tools Arsenal
     • ETL: Talend
     • BI: SpagoBI & QlikView
     • Hadoop: Hortonworks
     • NRT computation: Twitter Storm
     • Document-oriented NoSQL DB: Couchbase
     • Distributed search: ElasticSearch
     • Log collection: Flume, Logstash, Syslog-NG
     • Distributed messaging systems: Kafka, RabbitMQ
     • NoSQL: Cassandra, Redis, Neo4j (graph)
     • API management: WSO2 API Manager, 3scale / Nginx
     • Programming languages: Java, Python, R
  19. API Management and Data Services Cloud
     • 3scale / Nginx, WSO2 API Manager, etc.
       – A centralized, distributed repository to serve APIs, providing throttling, metering, security features, etc.
     • Build the data-services layer into the culture: make sure that whatever components you create can be chained into the pipeline or called independently.
  20. Thanks! Questions, if any. :) ~/Piyush @piykumar http://piyush.me
