Using ELK to Explore Defect Data
Xu Yabin
Singapore
Content
Customer requirements and defect KPI definition
ELK solution
ELK compared to traditional analytics methods
Customer Requirement
• Online web applications that need to be deployed frequently
• Serious defects and quality issues
• Not enough testing before applications are deployed
• Defects are always out of control after applications are deployed
• Serious defects are always found after the application is deployed
• Serious defects are not fixed on time
• Implement continuous integration and a defect management system
• What the results are, and how to continuously assess DevOps activities
Defect KPI Definition
• Based on the customer’s requirements, the defect KPIs are defined as follows:
• Defect number and distribution
• Number of defects before and after applications are deployed
• Number of serious defects before and after applications are deployed
• Time to fix serious defects
Data analytics tools requirement
• What we need from a data analytics tool:
• Easily import defect data from the current defect system
• Easily configure and calculate the KPI data
• Explore defect data without any data model preparation
• Easily dig into the detailed information
• Easy to maintain
• We choose ELK (Elasticsearch, Logstash, Kibana)
Content
Customer requirements and defect KPI definition
ELK solution
ELK compared to traditional analytics methods
ELK Solution
Defect Management System (original defect data) → Logstash (data collector) → Elasticsearch (distributed data storage and search engine) → Kibana (data analytics and results)
• Most of the work is done through configuration, not coding
Original defect data
• The original defect data comes from the customer’s defect management system, in XML format
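• For illustration, one defect record might look like the sketch below (element names are hypothetical, not taken from the customer’s actual system):

<defect>
  <product>drivers</product>
  <severity>serious</severity>
  <open_time>2015-03-02 10:15:00</open_time>
  <close_time>2015-03-05 18:40:00</close_time>
</defect>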
ELK Data collector: Logstash
• Collect defect data using Logstash
• Compared to a traditional data collector (which requires a lot of code), Logstash needs no code, only a few lines of configuration
• Defect data is put into Elasticsearch through a Logstash pipeline
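• As a rough sketch, such a pipeline configuration might look like the following (the file path, field names, and date pattern are assumptions; option names vary slightly across Logstash versions, and a real deployment would also need multiline handling to join XML records spanning several lines):

input {
  file {
    path => "/data/defects/*.xml"   # hypothetical export location
    start_position => "beginning"
  }
}
filter {
  xml {
    source => "message"     # parse the raw XML event
    target => "parsed_xml"  # into a structured parsed_xml field
  }
  date {
    match => ["[parsed_xml][open_time]", "yyyy-MM-dd HH:mm:ss"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "defects"
  }
}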
ELK User interface configuration: Kibana
• Once data is imported into Elasticsearch, the UI can be configured using Kibana
• UI configuration focuses on what will be displayed
• Configuration is done in a very natural way
• No business data model is needed before doing the configuration
ELK: User interface
• Easily add query conditions and filters to dig into the data
ELK: Filter and dig into the data: defect distribution by time
• The defect data view shows all defect data
• Most defects were created in 2015; use the mouse to drag across that area of the chart
• The defect data is then filtered by the time range you selected
ELK: Filter and dig into the data: defect distribution by product
• The defect data view shows all defect data
• The green segment is one product; double-click it
• The defect data is then filtered by the green product, and the view changes to show only that product’s defects
ELK: Multidimensional analysis: defect distribution by product
• Defects of different products; each color stands for a different product
ELK: Defect KPIs displayed
• Severity
• Defects before or after release
• Defect close time
Content
Customer requirements and defect KPI definition
ELK solution
ELK compared to traditional analytics methods
ELK: Advantages
• Analyze data without coding
• Fast delivery and low cost
• High flexibility in analyzing data
• Easy to deploy and maintain
• Learn the business data before the data model is created
• Explore and dig into the data step by step, based on your understanding of the business
• Big data methods
• Performance
• High availability
• Extensibility
• Collect and import data easily
ELK: Why analyze data without coding
• Data analysis and display
• Traditional method
• The bottleneck is the relational database
• Aggregated analysis can’t be done by the database alone; we need to write SQL statements with GROUP BY and COUNT
• Even simple code makes the analytics difficult, because the data, the data processing, and the UI are coupled through the code
• ELK solution
• Powerful aggregated analysis and search capability
• The UI is not coupled with the data
• Query conditions and filters can easily be added to the current query
ELK: Query from configuration, not coding
• Simple and powerful aggregated analysis, like SQL GROUP BY
• Business concepts can be learned from data aggregation
• Below is the Elasticsearch aggregation code

GET _search
{
  "aggs": {
    "product": {
      "terms": { "field": "parsed_xml.product" }
    }
  }
}

• The search result can be used in another query:

"query_string": { "query": "parsed_xml.product:\"drivers\" AND (*)" }
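• For reference, the terms aggregation returns one bucket per product value, shaped roughly as below (product names and counts here are made up):

"aggregations": {
  "product": {
    "buckets": [
      { "key": "drivers", "doc_count": 42 },
      { "key": "portal", "doc_count": 17 }
    ]
  }
}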
Traditional data query issues
• Too much data returned from a select statement
• The main reason is that people don’t know how much data will be returned before running the select
• The data is not filtered
• Too much data in one single table
• If a table is split, the query code needs to be modified to merge the query results
• Too much impact on existing programs
• Not easy to extend as the data grows
How ELK deals with the data query issues
• Big data method and concept
• When the amount of data cannot be processed by a single pool of resources (machines, CPUs, etc.), the data and the processing power can be split horizontally without substantially affecting the existing architecture
• ELK solution:
• Too much data returned from a select statement
• Count before querying (see the count API sketch after this list)
• Filter before querying, using aggregation results
• Too much data in a single table
• A table can be split with no need to change the query statement
• Time sequences are supported, making it easy to split time series data
• Easy to extend through distributed data storage
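• As an illustration of “count before querying”, Elasticsearch’s count API returns only the number of matching documents, not the documents themselves (the index name defects is an assumption):

GET defects/_count
{
  "query": {
    "query_string": { "query": "parsed_xml.product:\"drivers\"" }
  }
}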
ELK data storage: Elasticsearch distributed data storage
• From: https://www.elastic.co/guide/en/elasticsearch/guide/current/replica-shards.html
• Elasticsearch allows you to start small and scale horizontally as you grow. Simply add more nodes, and let the cluster automatically take advantage of the extra hardware.
• Elasticsearch clusters are resilient — they will detect new or failed nodes, and reorganize and rebalance data automatically, to ensure that your data is safe and accessible.
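• For example, the number of replica shards can be adjusted on a live index with a standard settings call (the index name defects is an assumption):

PUT defects/_settings
{
  "index": { "number_of_replicas": 2 }
}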
Traditional data collector issues
• The database is strictly defined by data types (schema)
• The same data may have different data types in different systems
• The data schema relationship (data mapping) between different systems must be defined correctly before the data import
• Otherwise the data import will fail
How ELK deals with the data collector issues
• ELK solution:
• Schema-less data import
• No need to consider data types before the data import
• If a default data type is not right, it can be changed (a sketch follows)
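• For instance, if Elasticsearch guessed a field’s type wrongly, an explicit mapping can be supplied when the index is (re)created. A minimal sketch against a recent Elasticsearch (older versions nest the properties under a document type; the field name is hypothetical):

PUT defects
{
  "mappings": {
    "properties": {
      "fixed_time_hours": { "type": "float" }
    }
  }
}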
ELK data import: Logstash pipeline
• With the existing plug-ins, much less programming (or no programming at all) is needed
• Filtering, processing, and data enrichment can easily be added to an existing collection pipeline
• Input and output contents are flexible and extensible
• Original pipeline: Input (defect data file) → Filter1 (normalize XML format) → Filter2 (get and parse defect data) → Filter3 (change time format of the input data) → Output (Elasticsearch)
• To get the defect fixed time, add Filter4, which adds a defect-fixed-time field calculated as defect close time minus defect open time: Input (defect data file) → Filter1 → Filter2 → Filter3 → Filter4 → Output (Elasticsearch); a sketch of Filter4 follows
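• A rough sketch of Filter4 as a Logstash ruby filter, assuming both timestamps were already parsed into date fields and using hypothetical field names (newer Logstash versions use event.get/event.set as shown; 1.x used event['field'] access):

filter {
  ruby {
    code => "
      opened = event.get('[parsed_xml][open_time]')
      closed = event.get('[parsed_xml][close_time]')
      # store the fix time in hours when both timestamps are present
      if opened && closed
        event.set('fixed_time_hours', (closed.to_i - opened.to_i) / 3600.0)
      end
    "
  }
}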
ELK data import: Logstash architecture
• Architecture diagram from: https://www.elastic.co/guide/en/logstash/1.5/deploying-and-scaling.html