Importance of ‘Centralized Event collection’
and BigData platform for Analysis !

DevOpsDays India, Bangalore - 2013

~/Piyush
Manager, Website Operations at MakeMyTrip
What to expect:
• MakeMyTrip data challenges!
• Event Data a.k.a. Logs & Log Analysis
• Why Centralized Logging …for systems and applications !
• Capturing Events: Why structured data emitted from apps for machines is a better approach!
• Data Service Platform : DSP – Why ?
• Inputs: Data for DSP
• Top Architecture Considerations
• Top level key tasks
• Tools Arsenal and API Management and Service Cloud

DevOpsDays India 2013 : ~/Piyush
MakeMyTrip data challenges …!
• Multi-DC/colocation setup
• Different types of data sources : internal / external (structured, semi-structured, unstructured)
– Online Transaction Data Store
– ERP
– CRM
    • Email Behavior / Survey results
– Web Analytics
– Logs
    • Web
    • Application
    • User Activity logs
– Social Media
– Inventory / Catalog
– Data residing in Excel files
– Monitoring Metric Data :
    • Graphite (Time-series whisper),
    • Splunk , ElasticSearch (Logstash)
– Many other different sources
• Storing and Analyzing Huge Event Data !

Some challenges …!
• Aggregate web usage data and transactional data to generate one view
• Process multiple GBs to TBs of data every day
• Serve more than a million data services API requests / day
• Ensure business continuity as reliance on MyDSP keeps increasing
• Store terabytes of historical data
• Mesh transactional (online and offline) data with consumer behavior and derive analytics
• Build a flexible data ingestion platform to manage many data feeds from multiple data sources

Flow of an Event

Event Data a.k.a. Logs
• Event Data -> set of chronologically sequenced data records that capture
information about an event !
• Virtually every form of system produces event data
– Capture it from all components and both client and server side events!

• You can think of logs as the footprint left by any activity in the system/app.
• Event Data has different characteristics from data stored in traditional data warehouses
– Huge Volume: Event data accumulates rapidly and often must be stored for years; many organizations are managing hundreds of terabytes and some are managing petabytes.
– Format: Because of the huge variety of sources, event data is unstructured or semi-structured.
– Velocity: New event data is constantly coming in.
– Collection: Event data is difficult to collect because of broadly dispersed systems and networks.
– Time-stamped: Event data is always inserted once with a time-stamp. It never changes.

Log Analysis
• Logs are one of the most useful things when it comes to analysis; in simple terms, log analysis is making sense of system/app-generated log messages (or just LOGS). Through logs we get insight into what is happening in the system.
• Helps the root cause analysis that follows any incident.
• Personalize user experience by analyzing web usage data.
Security requirements:
• Traditionally there are some compliance requirements too: Log Management / SEM + SIM => SIEM
• For data security: have one centralized platform for collecting ALL events (logs), correlate them, and get real-time intelligent visibility.
• Monitor not just the network, OS, devices etc. but ALL applications and business processes too.

Why Centralized Logging …for systems and applications !
• Centralized Logging is quite important nowadays due to:
– growth in the number of applications,
– distributed architecture (Service Oriented Architecture),
– cloud-based apps,
– the number of machines and the infrastructure size increasing day by day.
• This means that centralized logging and the ability to spot errors in distributed systems & applications have become even more “valuable” & “needed”.
And most importantly
– be able to understand the customers and how they interact with websites;
– Understanding Change: whether running A/B or multivariate experiments, or tweaking / understanding new implementations.

Capturing Events: Why structured data emitted from apps for
machines is a better approach!
• Need for standardization:
– Developers assume that the first-level consumer of a log message is a human, and only they know what information is needed to debug an issue.
Logs are not just for humans!
The primary consumers of logs are shifting from humans to computers. This means log formats should have a well-defined structure that can be parsed easily and robustly.
Logs change!
If the logs never changed, writing a custom parser might not be too terrible. The engineer would write it once and be done. But in reality, logs change.
Every time you add a feature, you start logging more data, and as you add more data, the printf-style format inevitably changes. This implies that the custom parser has to be updated constantly, consuming valuable development time.

• Suggested Approach : “Logging in JSON Format”
– To keep it simple and generic for any application, the recommended approach is a {Key: Value} JSON log format (structured/semi-structured).
– This approach makes parsing and consumption easy, irrespective of whatever technology/tools we choose to use!
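As a minimal sketch of the JSON approach, assuming a Python application that uses the standard logging module: the formatter below emits one JSON object per line. The class name JsonFormatter, the logger name "app", and the "fields" convention for extra key/value pairs are hypothetical choices for illustration, not something taken from the MakeMyTrip setup.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line (hypothetical helper)."""

    def format(self, record):
        event = {
            # ISO 8601 UTC timestamp taken from the record's creation time
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "severity": record.levelname,
            "facility": record.name,
            "short_message": record.getMessage(),
        }
        # Extra key/value pairs passed as extra={"fields": {...}} are merged in
        event.update(getattr(record, "fields", {}))
        return json.dumps(event, sort_keys=True)


def build_logger(stream):
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("app")  # hypothetical logger name
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as build_logger(sys.stderr).warning("slow page", extra={"fields": {"RT": 2}}) would then produce a single parseable line. Adding a new field later changes no parser, which is the point of the "Logs change!" argument. Note that Python names the level WARNING rather than WARN.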

Key things to keep in mind / Rules
• Use timestamps for every event.
• Use unique identifiers (IDs) like Transaction ID / User ID / Session ID, or append a universally unique identifier (UUID) to track unique users.
• Log in text format, i.e. avoid logging binary information!
• Log anything that can add value when aggregated, charted, or further analyzed.
• Use categories, e.g. “severity”: “WARN”, with levels such as INFO, WARN, ERROR, and DEBUG.
• The 80/20 Rule: 80% of our goals can be achieved with 20% of the work, so don’t log too much.
• Keep date/time and timezone NTP-synced on every producer and collector machine (# ntpdate ntp.example.com).
• Reliability: Like video recordings, you don’t want to lose the most valuable shot, so you record every frame and later, during analysis, throw away the rest and pick your best shot/frame. Likewise, logs are events and should be recorded reliably, so that you don’t lose any important and usable part, like that important video frame.
• Correlation rules across the various event streams, to generate and minimize alerts/events.
• Write connectors for integrations.
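Several of these rules can be checked mechanically before an event leaves the producer. The following is only a sketch under assumptions: events are Python dicts, timestamps are ISO 8601 strings, and the function name check_event and the ID field names are hypothetical.

```python
from datetime import datetime

SEVERITIES = {"DEBUG", "INFO", "WARN", "ERROR"}      # levels named in the rules above
ID_FIELDS = ("uuid", "sessionID", "transactionID")   # hypothetical field names


def check_event(event):
    """Return a list of rule violations for one event dict (empty list means OK)."""
    problems = []
    # Rule: every event carries a timestamp
    try:
        datetime.fromisoformat(event["timestamp"])
    except (KeyError, TypeError, ValueError):
        problems.append("missing or unparseable timestamp")
    # Rule: at least one unique identifier is present
    if not any(event.get(field) for field in ID_FIELDS):
        problems.append("no unique identifier")
    # Rule: severity comes from a fixed set of categories
    if event.get("severity") not in SEVERITIES:
        problems.append("unknown severity")
    # Rule: text only, no binary payloads
    if any(isinstance(value, (bytes, bytearray)) for value in event.values()):
        problems.append("binary value present")
    return problems
```

Returning the list of violations instead of raising lets a producer log the problem and still ship the event, which fits the reliability rule better than dropping it.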

Data Service Platform : DSP
Why we need a data services platform ?
- Integration Layer to bring data from more sources in less time
- Serve various components – applications and also Monitoring systems etc.

Inputs : Data – what data to include
• Clickstream / Web Usage Data
– User Activity Logs

• Transactional Data Store
• Off-line
– CRM
– Email Behavior -> Logs/ Events

Top Architecture Considerations
• Non-blocking data ingestion
• UUID-tagged events / messages
• Load-balanced data processing across data centers
• Use of memory-based data storage for real-time data systems
• Easily scalable, highly available (HA) and easy-to-maintain large historical data sets
• Data caching to achieve low latency
• To ensure business continuity, parallel processing across two different data centers
• Use of a centralized service cloud for API management, security (authentication, authorization), metering and integration
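The first two considerations can be illustrated with a small in-process sketch, assuming a single Python producer: the caller hands an event to a bounded in-memory queue and returns immediately, while a worker thread forwards events to a sink. The class name Ingestor, the event_id field, and the sink callable are hypothetical. The deck's own pipeline uses Kafka queues for this role.

```python
import queue
import threading
import uuid


class Ingestor:
    """Accept events without blocking the caller; a worker thread drains them."""

    def __init__(self, sink, maxsize=1000):
        self._queue = queue.Queue(maxsize=maxsize)
        self._sink = sink        # callable that stores or forwards one event
        self.dropped = 0         # events rejected because the buffer was full
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def submit(self, event):
        """Tag the event with a UUID and enqueue it; never blocks the producer."""
        tagged = dict(event, event_id=str(uuid.uuid4()))
        try:
            self._queue.put_nowait(tagged)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def _drain(self):
        while True:
            event = self._queue.get()
            if event is None:    # sentinel: stop the worker
                break
            self._sink(event)

    def close(self):
        """Forward everything queued so far, then stop the worker."""
        self._queue.put(None)
        self._worker.join()
```

Dropping on a full buffer keeps the producer responsive but conflicts with the reliability rule, so a real deployment would more likely spill to local disk or a durable message queue instead of counting drops.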

Top level key tasks for User Activity Logging & Analysis
1. Data Collection of both Client-Side and Server-Side user activity streams
• Tag every Website visitor with UUID similar to the System UUIDs
• Collect the activity streams on BigData Platform for Analysis through Kafka Queues & NoSQL data stores

2. Near real-time Data Processing
• Preprocessing / Aggregations
– Filtering etc.
• Pattern Discovery along with the already available cooked data from point 4
– Clustering / Classification / Association discovery / Sequence Mining

3. Rule Engine / recommendation algorithms
• Rule Engine : Building effective business rule engine / Correlate Events
• Content-based filtering / Collaborative Filtering

4. Batch Processing / post processing using Hadoop Ecosystem
• Analysis & Storing Cooked data in NoSQL data store

5. Data Services (Web-services)
• RESTful APIs to make the data/insights consumable through various data services

6. Reporting/Search interface & Visualization for Product Development teams and other business owners.
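The preprocessing step in task 2 (filtering, then aggregation) might look like the sketch below. It is a plain Python function over a list of event dicts, standing in for what a stream processor would do continuously. The function name aggregate_page_views is hypothetical, and timestamps are assumed to be ISO 8601 strings.

```python
from collections import Counter


def aggregate_page_views(events, facility="clientSide"):
    """Filter events by facility, then count them per (minute, pagename)."""
    counts = Counter()
    for event in events:
        if event.get("facility") != facility:   # filtering step
            continue
        minute = event["timestamp"][:16]        # "YYYY-MM-DDTHH:MM" prefix of the timestamp
        counts[(minute, event.get("pagename", "unknown"))] += 1
    return counts
```

The resulting per-minute counts are the kind of "cooked" data that could be stored in a NoSQL store and served through the data services in task 5.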

Data System
• Let's store everything! Every event : Data !
• Query = function (data) : Precompute View
• Layered Architecture:
– Batch Layer : Hadoop M/R
– Speed Layer : Storm NRT Computation
– Serving Layer
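The "Query = function (data)" idea and the three layers can be sketched in a few lines. This is an in-memory toy, assuming events are dicts with a pagename key. In the deck the batch layer is Hadoop M/R and the speed layer is Storm, and the function names here are hypothetical.

```python
from collections import Counter


def batch_view(master_events):
    """Batch layer: recompute the view from the full, immutable event set."""
    return Counter(event["pagename"] for event in master_events)


def speed_view(recent_events):
    """Speed layer: the same computation over events the last batch run has not seen."""
    return Counter(event["pagename"] for event in recent_events)


def query(pagename, batch, speed):
    """Serving layer: answer a query by merging the two precomputed views."""
    return batch[pagename] + speed[pagename]
```

Because every event is stored and never changed, the batch view can always be thrown away and recomputed. The speed view only has to cover the gap since the last batch run.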

Clickstream / User Activities Capture : Data is -> “Events”
• Tag every Website visitor with UUID using Apache module - Done
– https://github.com/piykumar/modified_mod_cookietrack
– Cookie : UUID like 24617072-3124-674f-4b72-675746562434.1381297617597249
• JSON Messages like

{
"timestamp": "2012-12-14T02:30:18",
"facility": "clientSide",
"clientip": "123.123.123.123",
"uuid": "24617072-3124-5544-2f61-695256432432.1379399183414528",
"domain": "www.example.com",
"server": "abc-123",
"request": "/page/request",
"pagename": "funnel:example com:page1",
"searchKey": "1234567890_",
"sessionID": "11111111111111",
"event1": "loading",
"event2": "interstitial display banner",
"severity": "WARN",
"short_message": "....meaning short message for aggregation...",
"full_message": "full LOG message",
"userAgent": "...blah...blah..blah...",
"RT": 2
}
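A consumer of such messages needs very little code, which is the argument for JSON logging. The sketch below parses one message and splits the tracking cookie. It assumes that the digits after the dot are microseconds since the Unix epoch, which is a guess based on their magnitude, and the function and field names (split_tracking_cookie, visitor_id, cookie_issued_at) are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def split_tracking_cookie(value):
    """Split '<uuid>.<digits>' into (uuid, datetime).

    Assumption: the trailing digits are microseconds since the Unix epoch.
    """
    visitor_id, _, micros = value.rpartition(".")
    return visitor_id, EPOCH + timedelta(microseconds=int(micros))


def parse_event(line):
    """Parse one JSON log line and attach the split cookie parts."""
    event = json.loads(line)
    event["visitor_id"], event["cookie_issued_at"] = split_tracking_cookie(event["uuid"])
    return event
```

With the visitor ID separated out, events from client side and server side can be joined on the same key.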

Tools Arsenal
• ETL : Talend
• BI : SpagoBI & QlikView
• Hadoop : Hortonworks
• NRT Computation : Twitter Storm
• Document-Oriented NoSQL DB : Couchbase
• Distributed Search : ElasticSearch
• Log Collection : Flume, Logstash, Syslog-NG
• Distributed messaging system : Kafka, RabbitMQ
• NoSQL : Cassandra, Redis, Neo4J (Graph)
• API Management : WSO2 API Manager, 3Scale / Nginx
• Programming Languages : Java, Python, R

API Management and Data Services Cloud
• 3Scale / Nginx, WSO2 API Manager etc.
– A centralized, distributed repository to serve APIs that provides throttling, metering, security features etc.

• Make building a data services layer part of the culture, and make sure that whatever components you create can be chained into the pipeline or called independently.

Thanks!
Questions – If Any!

~/Piyush
@piykumar
http://piyush.me

When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 

Recently uploaded (20)

When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 

Importance of ‘Centralized Event collection’ and BigData platform for Analysis !

  • 1. Importance of ‘Centralized Event collection’ and BigData platform for Analysis!
      DevOpsDays India, Bangalore - 2013
      ~/Piyush, Manager, Website Operations at MakeMyTrip
  • 2. What to expect:
      • MakeMyTrip data challenges
      • Event Data a.k.a. Logs & Log Analysis
      • Why Centralized Logging for systems and applications
      • Capturing Events: why structured data emitted from apps for machines is a better approach
      • Data Service Platform (DSP): why?
      • Inputs: data for DSP
      • Top architecture considerations
      • Top-level key tasks
      • Tools Arsenal, API Management and Service Cloud
      DevOpsDays India 2013 : ~/Piyush
  • 3. MakeMyTrip data challenges
      • Multi-DC/colocation setup
      • Different types of data sources: internal/external (structured, semi-structured, unstructured)
          – Online Transaction Data Store
          – ERP
          – CRM
              • Email Behavior / Survey results
          – Web Analytics
          – Logs
              • Web
              • Application
              • User Activity logs
          – Social Media
          – Inventory / Catalog
          – Data residing in Excel files
          – Monitoring Metric Data:
              • Graphite (time-series Whisper)
              • Splunk, ElasticSearch (Logstash)
          – Many other different sources
      • Storing and analyzing huge event data
  • 4. Some challenges
      • Aggregate web usage data and transactional data to generate one view
      • Process multiple GBs to TBs of data every day
      • Serve more than a million data services API requests per day
      • Ensure business continuity as reliance on MyDSP increases
      • Store terabytes of historical data
      • Mesh transactional (online and offline) data with consumer behavior to derive analytics
      • Build a flexible data ingestion platform to manage many data feeds from multiple data sources
  • 5. Flow of an Event
  • 6. Event Data a.k.a. Logs
      • Event data: a set of chronologically sequenced data records that capture information about an event.
      • Virtually every form of system produces event data.
          – Capture it from all components, covering both client-side and server-side events.
      • Logs can be thought of as the footprint generated by any activity with the system/app.
      • Event data has different characteristics from data stored in traditional data warehouses:
          – Huge volume: event data accumulates rapidly and often must be stored for years; many organizations are managing hundreds of terabytes and some are managing petabytes.
          – Format: because of the huge variety of sources, event data is unstructured or semi-structured.
          – Velocity: new event data is constantly coming in.
          – Collection: event data is difficult to collect because of broadly dispersed systems and networks.
          – Time-stamped: event data is always inserted once with a time-stamp; it never changes.
  • 7. Log Analysis
      • Logs are among the most useful inputs for analysis. In simple terms, log analysis is making sense of system/app-generated log messages (or just logs). Through logs we get insight into what is happening in the system.
      • Helps with the root cause analysis that follows any incident.
      • Personalize the user experience by analyzing web usage data.
      “Security Req”:
      • Traditionally there are some compliance requirements too: Log Management / SEM + SIM => SIEM
      • For data security: have one centralized platform for collecting ALL events (logs), correlate them and get real-time intelligent visibility.
      • Monitor not just network, OS, devices etc., but ALL applications and business processes too.
  • 8. Why Centralized Logging for systems and applications!
      • Centralized logging is quite important nowadays due to:
          – growth in the number of applications,
          – distributed architecture (Service Oriented Architecture),
          – cloud-based apps,
          – the number of machines and the infrastructure size increasing day by day.
      • This means that centralized logging and the ability to spot errors in distributed systems and applications have become even more “valuable” and “needed”.
      • Most importantly, be able to understand the customers and how they interact with websites:
          – Understanding change: whether using A/B or multivariate experiments, or tweaking / understanding new implementations.
  • 9. Capturing Events: Why structured data emitted from apps for machines is a better approach!
      • Need for standardization:
          – Developers assume that the first-level consumer of a log message is a human, and that only they know what information is needed to debug an issue.
          – Logs are not just for humans! The primary consumers of logs are shifting from humans to computers. This means log formats should have a well-defined structure that can be parsed easily and robustly.
          – Logs change! If the logs never changed, writing a custom parser might not be too terrible: the engineer would write it once and be done. In reality, every time you add a feature you start logging more data, and as you add more data the printf-style format inevitably changes. The custom parser then has to be updated constantly, consuming valuable development time.
      • Suggested approach: “Logging in JSON Format”
          – To keep it simple and generic for any application, the recommended approach is a {Key: Value} JSON log format (structured/semi-structured).
          – This makes parsing and consumption easy, irrespective of whatever technology/tools we choose to use.
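As a rough illustration of the suggested {Key: Value} approach, the sketch below emits one JSON object per log line using only Python's standard `logging` and `json` modules. The field names mirror the sample message later in the deck; the `fields` attribute passed through `extra` and the `build_logger` helper are hypothetical conventions chosen for this example, not something the talk prescribes.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (one event per line)."""

    def format(self, record):
        event = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "severity": record.levelname,  # Python names this level WARNING, not WARN
            "facility": record.name,
            "short_message": record.getMessage(),
        }
        # Hypothetical convention: structured fields arrive via extra={"fields": {...}}.
        event.update(getattr(record, "fields", {}))
        return json.dumps(event, sort_keys=True)


def build_logger(name, stream):
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger
```

Adding a new field then only means adding a key to the dictionary; consumers that parse JSON keep working without a parser change, which is the point the slide makes about printf-style formats.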
  • 10. Key things to keep in mind / Rules
      • Use timestamps for every event.
      • Use unique identifiers (IDs) such as Transaction ID / User ID / Session ID, or append a unique user identification (UUID) number to track unique users.
      • Log in text format, i.e. avoid logging binary information!
      • Log anything that can add value when aggregated, charted, or further analyzed.
      • Use categories, e.g. “severity”: “WARN”, with levels such as INFO, WARN, ERROR and DEBUG.
      • The 80/20 rule: 80% of our goals can be achieved with 20% of the work, so don’t log too much :)
      • Keep the same NTP-synced date, time and timezone on every producer and collector machine (# ntpdate ntp.example.com).
      • Reliability: as with video recordings, you don’t want to lose the most valuable shot, so you record every frame and later, during analysis, throw away the rest and pick your best frame. Likewise, logs are recorded as events and should be recorded reliably so that you don’t lose any important and usable part of them.
      • Correlation rules for various event streams, to generate alerts and minimize the number of alerts/events.
      • Write connectors for integrations.
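Some of the rules above can be checked mechanically before an event leaves the producer. The following is a minimal sketch, assuming events are plain dictionaries shaped like the JSON sample in this deck; `check_event`, `ALLOWED_SEVERITIES` and the `transactionID` / `userID` key names are hypothetical and exist only for illustration.

```python
from datetime import datetime

# Severity levels listed on the slide.
ALLOWED_SEVERITIES = {"DEBUG", "INFO", "WARN", "ERROR"}
# "uuid" and "sessionID" appear in the deck's sample; the other two names are assumed.
ID_FIELDS = ("uuid", "sessionID", "transactionID", "userID")


def check_event(event):
    """Return a list of rule violations for one event (empty list means OK)."""
    problems = []
    try:
        # Rule: every event carries a timestamp (ISO 8601 assumed here).
        datetime.fromisoformat(event.get("timestamp", ""))
    except (TypeError, ValueError):
        problems.append("missing or unparseable timestamp")
    # Rule: at least one unique identifier to correlate events.
    if not any(event.get(field) for field in ID_FIELDS):
        problems.append("no unique identifier")
    # Rule: use categories such as severity.
    if event.get("severity") not in ALLOWED_SEVERITIES:
        problems.append("unknown severity")
    # Rule: log text, not binary.
    if any(isinstance(value, (bytes, bytearray)) for value in event.values()):
        problems.append("binary value")
    return problems
```

A producer could call `check_event` in tests or in a pre-send hook and count violations rather than reject events, so that the reliability rule is not undermined by overly strict validation.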
  • 11. Data Service Platform : DSP
      Why do we need a data services platform?
      – An integration layer to bring in data from more sources in less time
      – Serve various components: applications and also monitoring systems etc.
  • 12. Inputs : Data – what data to include
      • Clickstream / Web Usage Data
          – User Activity Logs
      • Transactional Data Store
      • Off-line
          – CRM
          – Email Behavior
      -> Logs / Events
  • 13. Top Architecture Considerations
      • Non-blocking data ingestion
      • UUID-tagged events / messages
      • Load-balanced data processing across data centers
      • Use of memory-based data storage for real-time data systems
      • Easily scalable, highly available (HA) and easy-to-maintain large historical data sets
      • Data caching to achieve low latency
      • To ensure business continuity, process in parallel across two different data centers
      • Use of a centralized service cloud for API management, security (authentication, authorization), metering and integration
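One way to picture “non-blocking data ingestion” with “UUID-tagged events” is a bounded in-memory buffer drained by a background thread, so the application thread never waits on the collector. This is only a sketch under that assumption: `NonBlockingEmitter`, the `event_id` key and the drop-and-count policy are hypothetical, and a production setup would more likely hand events to Kafka, Flume or a local spool instead of dropping on overflow.

```python
import queue
import threading
import uuid


class NonBlockingEmitter:
    """Buffer events in memory and ship them from a background thread."""

    def __init__(self, sink, maxsize=1000):
        self._queue = queue.Queue(maxsize=maxsize)
        self._sink = sink      # callable that delivers one event downstream
        self.dropped = 0       # approximate under concurrent producers
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def emit(self, event):
        """Tag the event with a UUID and enqueue it without ever blocking."""
        tagged = dict(event, event_id=str(uuid.uuid4()))
        try:
            self._queue.put_nowait(tagged)
            return True
        except queue.Full:
            # Overflow policy is an assumption; a real system might spill to disk.
            self.dropped += 1
            return False

    def _drain(self):
        while True:
            item = self._queue.get()
            if item is None:   # sentinel: stop the worker
                break
            self._sink(item)

    def close(self):
        """Flush remaining events and stop the background thread."""
        self._queue.put(None)
        self._worker.join()
```

The `dropped` counter makes the trade-off visible: the producer stays responsive, and any loss under back-pressure can itself be monitored.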
  • 14. Top-level key tasks for User Activity Logging & Analysis
      1. Data collection of both client-side and server-side user activity streams
          • Tag every website visitor with a UUID, similar to the system UUIDs
          • Collect the activity streams on the BigData platform for analysis through Kafka queues & NoSQL data stores
      2. Near real-time data processing
          • Preprocessing / aggregations
          • Filtering etc.
          • Pattern discovery, along with the already available cooked data from point 4
              • Clustering / classification / association discovery / sequence mining
      3. Rule engine / recommendation algorithms
          • Rule engine: building an effective business rule engine / correlating events
          • Content-based filtering / collaborative filtering
      4. Batch processing / post-processing using the Hadoop ecosystem
          • Analysis & storing cooked data in a NoSQL data store
      5. Data services (web services)
          • RESTful APIs to make the data/insights consumable through various data services
      6. Reporting/search interface & visualization for product development teams and other business owners
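The filtering and aggregation steps of task 2 can be sketched in a few lines. The example below assumes events shaped like the JSON sample in this deck (ISO 8601 `timestamp`, `severity`, `pagename`); the function name and the choice of one-minute buckets are illustrative assumptions, and in the pipeline described here this logic would presumably run inside a stream processor rather than over an in-memory list.

```python
from collections import Counter
from datetime import datetime


def aggregate_per_minute(events, keep_severities=frozenset({"WARN", "ERROR"})):
    """Filter events by severity, then count them per (minute, pagename)."""
    counts = Counter()
    for event in events:
        # Filtering step: ignore severities we are not interested in.
        if event.get("severity") not in keep_severities:
            continue
        # Aggregation step: truncate the timestamp to its minute bucket.
        ts = datetime.fromisoformat(event["timestamp"])
        minute = ts.replace(second=0, microsecond=0).isoformat()
        counts[(minute, event.get("pagename", "unknown"))] += 1
    return counts
```

The resulting counters are the kind of small, pre-aggregated records that could feed the rule engine in task 3 or be stored as cooked data.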
  • 15. Data System
      Let’s store everything!
      Query = function(data)
      Layered architecture:
      • Every event: data!
      • Precompute views
      • Batch layer: Hadoop M/R
      • Speed layer: Storm NRT computation
      • Serving layer
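The layering on this slide matches what is commonly called the Lambda Architecture: a batch layer precomputes views over all stored events, a speed layer covers events that arrived since the last batch run, and the serving layer answers queries by combining both. The toy sketch below uses in-memory counters in place of Hadoop and Storm; every name in it is hypothetical and it only illustrates the “Query = function(data)” idea.

```python
from collections import Counter


def batch_view(master_events):
    """Batch layer: recompute page-view counts from the full event history."""
    return Counter(event["pagename"] for event in master_events)


class SpeedView:
    """Speed layer: incrementally count events not yet covered by a batch run."""

    def __init__(self):
        self.counts = Counter()

    def update(self, event):
        self.counts[event["pagename"]] += 1


def query(pagename, batch, speed):
    """Serving layer: merge the precomputed view with the real-time delta."""
    return batch.get(pagename, 0) + speed.counts.get(pagename, 0)
```

Because the batch view is always recomputed from the immutable event store, an error in the speed layer is corrected by the next batch run, which is one reason to “store everything”.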
  • 16. DevOpsDays India 2013 : ~/Piyush
  • 17. Clickstream / User Activities Capture : Data is -> “Events”
      • Tag every website visitor with a UUID using an Apache module - Done
          – https://github.com/piykumar/modified_mod_cookietrack
          – Cookie: UUID like 24617072-3124-674f-4b72-675746562434.1381297617597249
      • JSON messages like
          {
            "timestamp": "2012-12-14T02:30:18",
            "facility": "clientSide",
            "clientip": "123.123.123.123",
            "uuid": "24617072-3124-5544-2f61-695256432432.1379399183414528",
            "domain": "www.example.com",
            "server": "abc-123",
            "request": "/page/request",
            "pagename": "funnel:example com:page1",
            "searchKey": "1234567890_",
            "sessionID": "11111111111111",
            "event1": "loading",
            "event2": "interstitial display banner",
            "severity": "WARN",
            "short_message": "....meaning short message for aggregation...",
            "full_message": "full LOG message",
            "userAgent": "...blah...blah..blah...",
            "RT": 2
          }
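On the consuming side, the cookie value shown above can be split into its two parts. The sketch below assumes the part before the dot is a UUID-formatted visitor ID and the numeric suffix is a creation time in microseconds since the Unix epoch; the slide does not state this, so treat the interpretation of the suffix (and the helper name `parse_tracking_cookie`) as an assumption.

```python
import uuid
from datetime import datetime, timezone


def parse_tracking_cookie(value):
    """Split '<uuid>.<suffix>' into a UUID and a first-seen timestamp."""
    uuid_part, _, suffix = value.partition(".")
    visitor_id = uuid.UUID(uuid_part)  # raises ValueError if malformed
    # Assumption: the suffix is microseconds since the Unix epoch.
    first_seen = datetime.fromtimestamp(int(suffix) / 1_000_000, tz=timezone.utc)
    return visitor_id, first_seen
```

Keeping the visitor ID as its own field makes it straightforward to join client-side and server-side events for the same visitor.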
  • 18. Tools Arsenal
      • ETL: Talend
      • BI: SpagoBI & QlikView
      • Hadoop: Hortonworks
      • NRT computation: Twitter Storm
      • Document-oriented NoSQL DB: Couchbase
      • Distributed search: ElasticSearch
      • Log collection: Flume, Logstash, Syslog-NG
      • Distributed messaging system: Kafka, RabbitMQ
      • NoSQL: Cassandra, Redis, Neo4J (graph)
      • API management: WSO2 API Manager, 3Scale / Nginx
      • Programming languages: Java, Python, R
  • 19. API Management and Data Services Cloud
      • 3Scale / Nginx, WSO2 API Manager etc.
          – A centralized, distributed repository to serve APIs, providing throttling, metering, security features etc.
      • Make building a data services layer part of the culture, and make sure that whatever components you create can be chained into the pipeline or called independently.
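To make “throttling” and “metering” concrete, here is a generic token-bucket sketch. It is not how 3Scale, Nginx or WSO2 API Manager implement these features (the slide only names the products); the class, its parameters and the injectable clock are assumptions made for the sake of a small, testable example.

```python
import time


class TokenBucket:
    """Allow up to `rate` requests per second with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock        # injectable so tests can control time
        self.last = clock()
        self.served = 0           # metering: number of requests let through

    def allow(self):
        """Return True if the request may proceed, False if it is throttled."""
        now = self.clock()
        # Refill tokens for the elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            self.served += 1
            return True
        return False
```

An API gateway would typically keep one such bucket per API key, and export the `served` counter for billing or capacity planning.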
  • 20. Thanks! Questions, if any? :)
      ~/Piyush @piykumar http://piyush.me