Modern cloud-based distributed systems have very complicated configurations in which a single service spans many components. Customer-facing applications are part of the product, so they need service quality targets that are directly linked to business indicators. Legacy monitoring systems based on traditional system management are neither linked to business indicators nor able to measure service quality. Google advocates site reliability engineering (SRE), which introduces practices for measuring quality of service. Based on the SRE concept, a service quality monitoring system collects and analyzes logs from all components, covering not only application code but also the whole infrastructure. Because very large amounts of data must be processed in real time, the system must be designed carefully with reference to big data architectures. With such a system, you can measure the quality of service and improve it continuously.
Service quality monitoring system architecture
1. Service Quality Monitoring System Architecture
Author: Matsuo Sawahashi
Division: GTS Japan, Solutioning, Chief Architect
Mail: matsuos@jp.ibm.com
2. Self-introduction
Name: Matsuo Sawahashi
Company: IBM Japan
Division: Global Technology Services
Title: Executive Architect / Chief Architect
Current job:
• Connected Vehicle Project at my client
• Design multi-cloud networking architecture leveraging SD-WAN and Cloud-Exchanges
• Design connected-vehicle platform architecture on Azure based on Zero Trust Security concept
• Design service quality monitoring system based on SRE (Site Reliability Engineering) principle
• GTS Japan Technical Vitalization Community Leader
• Provide mentoring and round table session for junior engineers
• Provide leading-edge technical seminars
• JUAS (Japan System Users Association) part time instructor
Certifications
• TOGAF9 certification
• The Open Group Distinguished Architect
Publications
• OpenStack Deep Technique Guide
3. 3
Executive
Summary
• The latest distributed system utilizing the cloud is a very
complicated configuration in which the components span a
plurality of components
• Applications for customers are part of products, and service
quality targets directly linked to business indicators are needed
• Legacy monitoring system based on traditional system
management is not linked not only to business indicators but also
to measure service quality
• Google advocates the idea of site reliability engineering (SRE)
and introduces efforts to measure quality of service
• Based on the concept of SRE, the service quality monitoring
system collects and analyzes logs from various components not
only application codes but also whole infrastructure components
• Since very large amounts of data must be processed in real time,
it is necessary to design carefully with reference to the big data
architecture
• To utilize this system, you can measure the quality of service,
and make it possible to continuously improve the service quality
4. Problem Statement
• Legacy approach to service management
  • Each component is monitored individually and independently
    • Access logs, error logs, CPU / RAM usage, etc.
    • Application, server, network and storage
  • Monitoring indicators are not tied to business indicators
• What is the problem with the legacy approach?
  • It is difficult to measure business service quality
  • It is difficult to understand users' frustrations directly
    • How many users are frustrated by the response time?
    • Which functions are rarely used?
    • Which components are degrading performance?
• Approach
  • We need to know what is going on in the whole system, including application, middleware, server, storage and network
5. Referenced Vision: SRE (Site Reliability Engineering)
• What is SRE?
  • A methodology for system management and service operation
  • Google advocates and practices it
  • The goal is to keep improving site reliability
• What to do in SRE
  • Define what business and IT alignment means in practice
  • Define a Service Level Indicator (SLI) to measure service reliability
  • Define a Service Level Objective (SLO) for each SLI
  • Monitor everything: performance, availability and scalability
  • Perform continuous improvement based on the monitoring results
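The relationship between an SLI and its SLO can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the `Request` record, its field names, the 300 ms latency threshold and the SLO targets are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """One observed user request (hypothetical record layout)."""
    latency_ms: float
    ok: bool


def availability_sli(requests):
    """SLI: fraction of requests that completed successfully."""
    if not requests:
        raise ValueError("no requests in the measurement window")
    return sum(r.ok for r in requests) / len(requests)


def latency_sli(requests, threshold_ms):
    """SLI: fraction of requests answered within threshold_ms."""
    if not requests:
        raise ValueError("no requests in the measurement window")
    return sum(r.latency_ms <= threshold_ms for r in requests) / len(requests)


def meets_slo(sli, slo):
    """An SLO is a target value for an SLI; it is met when the SLI reaches it."""
    return sli >= slo


# Hypothetical measurement window: nine fast successes and one slow failure.
window = [Request(120.0, True)] * 9 + [Request(900.0, False)]

# Hypothetical targets: 99% availability, 95% of requests within 300 ms.
print("availability SLO met:", meets_slo(availability_sli(window), 0.99))
print("latency SLO met:", meets_slo(latency_sli(window, 300.0), 0.95))
```

In practice the measurement window would come from the collected logs rather than an in-memory list, and the targets would be agreed with the business owners of the service.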
6. Service Quality Monitoring System
• What is this?
  • A system that collects and analyzes logs from all the components making up a system and presents statistics for evaluating whether each SLO has been achieved
• How does it work?
  • It captures all transaction logs related to a user's interaction, across both application components and infrastructure components
  • It provides a dashboard with search and analysis functions
• Benefits
  • The operating status can be monitored against business goals
  • The user experience (UX) can be understood systematically
  • The location of a problem can be identified immediately
  • The cause of a problem can be explained as soon as an inquiry arrives
8. 8
Big data architecture patterns: Lambda Architecture
• To realize a service quality monitoring system, we need to handle the huge volume of log data produced by a variety of components
• Very large data sets require a long processing time to run the kind of queries that clients need
• These queries need algorithms such as MapReduce that operate in parallel across the entire data set, and they cannot be performed in real time
• We want some results in real time, sometimes at the cost of some accuracy, so we combine the batch results and the real-time results using the architecture patterns below
[Diagram: Lambda architecture. Labels: Data Source; Queuing; Real-time processing; Hot path: Speed Layer (Real-time View); Cold path: Batch Layer (Master Data), Serving Layer (Batch View); Analytics Client]
• The Speed Layer (hot path) analyzes data in real time
• The Batch Layer (cold path) stores all incoming data in its raw form and performs batch processing on the data
• The Serving Layer indexes the batch view for efficient querying
• The Speed Layer updates the Serving Layer with incremental updates based on the most recent data
• The Lambda architecture was first proposed in 2012 by Nathan Marz, the creator of Apache Storm
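How a batch view and a real-time view are combined at query time can be sketched in a few lines. This is a toy, in-memory illustration and not the implementation from the slides: the record fields (`component`, `status`) and the choice of "error count per component" as the view are assumptions.

```python
from collections import Counter


def batch_view(master_data):
    """Cold path: recompute error counts per component from all raw records."""
    return Counter(r["component"] for r in master_data if r["status"] >= 500)


def update_realtime_view(view, record):
    """Hot path: incrementally update the view with one new record."""
    if record["status"] >= 500:
        view[record["component"]] += 1


def query(batch, realtime):
    """Serving side: merge the batch view with the real-time view."""
    return batch + realtime


# Hypothetical data: records already covered by the last batch run ...
master_data = [
    {"component": "login", "status": 500},
    {"component": "login", "status": 200},
    {"component": "search", "status": 503},
]
# ... and records that arrived after the batch run started.
recent = [{"component": "login", "status": 500}]

batch = batch_view(master_data)
realtime = Counter()
for record in recent:
    update_realtime_view(realtime, record)
merged = query(batch, realtime)
print(merged)
```

Note that the error-counting rule is written twice, once per path. In a real deployment the two copies would typically live in different frameworks, which is the maintenance cost of this pattern.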
9. 9
Big data architecture patterns: Kappa Architecture
• A drawback of the Lambda architecture is its complexity: processing logic appears in two different places, the cold and hot paths, using different frameworks
• The Kappa architecture uses a stream processing system, and all data flows through a single path
[Diagram: Kappa architecture. Labels: Data Source; Queuing; Real-time processing; Speed Layer (Real-time View); Analytics Client]
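The single-path idea can be sketched as one processing function that serves both live events and reprocessing. This is a toy illustration under assumptions: the retained log is modeled as a Python list, and the record fields and the error-count view are hypothetical.

```python
from collections import Counter


def process(view, record):
    """The only processing logic: used for live events and for replays."""
    if record["status"] >= 500:
        view[record["component"]] += 1


def build_view(log, from_offset=0):
    """Build a view by streaming the retained log from a given offset."""
    view = Counter()
    for record in log[from_offset:]:
        process(view, record)
    return view


# Hypothetical retained log (in practice, a queue that keeps past events).
log = []
live_view = Counter()
for record in [
    {"component": "login", "status": 500},
    {"component": "search", "status": 200},
    {"component": "login", "status": 503},
]:
    log.append(record)          # the event is appended to the log ...
    process(live_view, record)  # ... and processed as it arrives

# Long-period statistics are obtained by replaying the same log
# through the same code, instead of running a separate batch layer.
replayed_view = build_view(log)
print(live_view, replayed_view)
```

The design choice is that reprocessing means replaying the queue from an earlier offset, so there is only one code path to maintain.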
10. 10
Implementation Example
[Diagram: processing pipeline from log sources to dashboard. The stages, functions and products shown are listed below.]
• Components (App / Infra): write out logs
  • Application Code
  • OS
  • Azure Infrastructure Components
  • Azure services shown: Azure Blob, Azure Application Insight, Azure Monitor, Azure Hubs
• Log Collector: collecting logs
  • Filebeat
  • Logstash w/ Azure plugin
• Log Aggregator & Message Queuing: aggregating logs
  • Kafka
• Real-time Streaming Processing: filtering, indexing, joining
  • Apache Storm OR Apache Spark Streaming
• Store w/ Search & Analyze: storing, indexing, searching and analyzing data
  • Elasticsearch
• Visualization / Dashboard: dashboard
  • Kibana
• Batch processing loop if needed
11. 11
Architectural Decision Example
Issue: Which architecture should be adopted for processing large volumes of log data both in real time and in batches?
Decision: Kappa architecture
Status: Completed
Category: Platform
Assumptions: A real-time processing feature is required for viewing the latest service quality measures, and a large-scale batch processing feature is also required for viewing statistical data over a long period.
Options:
1. Lambda architecture
2. Kappa architecture
Arguments (Rationale): Both architectures would support our requirements. However, the Lambda architecture has a complex structure and requires many servers, so the running cost may increase. The Kappa architecture has a simple structure.
Risk: None
Implications: None
Notes: None
12. 12
Architectural Decision Example
Issue: Which product should be used to realize the Log Aggregator & Message Queuing function?
Decision: Kafka
Status: Completed
Category: Platform
Assumptions: This function collects large volumes of event logs from a variety of systems and data sources.
Options:
1. Kafka
2. Redis
Arguments (Rationale): Redis is an in-memory store and would be much faster than disk-based Kafka. However, Redis's in-memory store is small and cannot hold a large amount of data for long periods of time. Kafka supports parallelism through log partitioning of data. Redis does not have this parallelism.
Risk: None
Implications: None
Notes: None
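The parallelism that log partitioning provides can be sketched as follows. This is a simplified model rather than Kafka client code: CRC32 stands in for the hash used by a real partitioner, and the record fields and the choice of the transaction ID as the key are assumptions.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Map a record key to a partition using a stable hash (CRC32 here)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign(records, num_partitions):
    """Group records by partition; each partition can be consumed in parallel."""
    partitions = defaultdict(list)
    for record in records:
        index = partition_for(record["transaction_id"], num_partitions)
        partitions[index].append(record)
    return partitions


# Hypothetical records: two transactions, several log lines each.
records = [
    {"transaction_id": "TX-1", "api": "login_api"},
    {"transaction_id": "TX-2", "api": "search_api"},
    {"transaction_id": "TX-1", "api": "profile_api"},
]
partitions = assign(records, num_partitions=4)
for index, items in sorted(partitions.items()):
    print(index, [r["api"] for r in items])
```

Because the partition is derived from the key, all log lines of one transaction land in the same partition and keep their order, while different partitions can be read by different consumers at the same time.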
13. 13
Architectural Decision Example
Issue: Which product should be used to realize the Real-time Streaming Processing function?
Decision: (pending)
Status: Under investigation
Category: Platform
Assumptions: This function processes streaming data in real time, for example adding indexes and performing calculations. Exactly-once capability is as important as speed, since this system must be able to analyze the cause and location of a problem promptly and reliably.
Options:
1. Storm
2. Spark Streaming
Arguments (Rationale): Storm implements a true streaming model in its core layer, whereas Spark Streaming acts as a wrapper over batch processing (micro-batches). Storm supports three message processing modes: at-most-once, at-least-once and exactly-once (the last through the Trident API). For Spark Streaming, output to external systems is at-least-once by default, and exactly-once results require idempotent or transactional writes.
Risk: None
Implications: None
Notes: None
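Why the delivery guarantee matters for analysis can be shown with a small simulation. It is a sketch under assumptions, not Storm or Spark code: the store class and the record IDs are hypothetical, and the technique shown is idempotent writing, which turns at-least-once delivery into an exactly-once result in the store.

```python
class IdempotentStore:
    """Toy store whose writes are keyed by a unique record ID."""

    def __init__(self):
        self._docs = {}

    def upsert(self, record_id, doc):
        """Writing the same ID twice overwrites instead of duplicating."""
        self._docs[record_id] = doc

    def count(self):
        return len(self._docs)


def deliver_at_least_once(records, store, redeliver_ids=()):
    """Simulate at-least-once delivery: some records arrive twice."""
    deliveries = 0
    for record in records:
        store.upsert(record["id"], record)
        deliveries += 1
        if record["id"] in redeliver_ids:
            store.upsert(record["id"], record)  # duplicate after a retry
            deliveries += 1
    return deliveries


# Hypothetical records; "r2" is delivered twice.
records = [{"id": "r1"}, {"id": "r2"}, {"id": "r3"}]
store = IdempotentStore()
deliveries = deliver_at_least_once(records, store, redeliver_ids={"r2"})
print(deliveries, store.count())
```

Without the ID-keyed write, the duplicate delivery would inflate counts such as the error rate, which is why the guarantee is listed as a selection criterion.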
14. 14
Architectural Decision Example
Issue: Which product should be used to realize the Store w/ Search & Analyze function?
Decision: Elasticsearch
Status: Completed
Category: Platform
Assumptions: This function stores logs and adds indexes for analysis.
Options:
1. Elasticsearch
2. Splunk
Arguments (Rationale): Elasticsearch is open source software and would avoid vendor lock-in. Elasticsearch is free, but some extended features require a paid subscription. Splunk is proprietary commercial software with a high price level. Elasticsearch supports many plugins. Elasticsearch has now overtaken Splunk in terms of Google search popularity.
Risk: None
Implications: None
Notes: None
15. 15
Architectural Decision Example
Issue: Which product should be used to realize the Visualization with Dashboard function?
Decision: Kibana
Status: Completed
Category: Platform
Assumptions: This function displays analyzed log data and metrics, and provides a dashboard.
Options:
1. Kibana
2. Grafana
Arguments (Rationale): Grafana is designed for analyzing and visualizing metrics, and it does not allow full-text data querying. Kibana is the ‘K’ in the ELK Stack from Elastic, the most popular open source log analysis platform. Kibana supports not only metrics but also the analysis of log messages. Grafana has built-in user control and authentication features, whereas Kibana requires either X-Pack, a commercial (not free) bundle of ELK add-ons for access control and authentication, or an open source solution such as SearchGuard.
Risk: None
Implications: None
Notes: None
16. 16
Use Case Example
Columns: # / Trigger / Input / Outcome / TAT / Remark (the Remark column is empty for every use case)
UC001
• Trigger: Failure inquiries from end users (unavailable, freezing, different results, etc.)
• Input: User ID; Time (optional); Screen ID (optional); Error Code (optional)
• Outcome: Identification of where the failure (delay) occurred (application component or infrastructure component), and suggestion of a workaround and solution
• TAT: Within 5 minutes
UC002
• Trigger: Failure inquiries from monitoring operators (a large number of alerts, unknown alerts, events obviously different from normal times, etc.)
• Input: Time (optional); Alert Message (optional)
• Outcome: Same as above
• TAT: Within 5 minutes
UC003
• Trigger: Inquiries from the system administrator (Is it working normally? Is there any problem? What is the capacity situation? What is the performance situation?)
• Input: N/A
• Outcome: Dashboard (number of users within the last hour, error rate, delay rate, capacity upper limit and current usage rate for each component, delay rate within the most recent hour for each component, trend graph)
• TAT: Within 5 minutes
UC004
• Trigger: Monthly report
• Input: N/A
• Outcome: Trend graphs of the number of users, error rate and delay rate for the current month, and the same information per component
• TAT: Within 24 hours
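The dashboard figures in UC003 can be derived from the collected logs with a simple aggregation. The sketch below is an illustration under assumptions: the record fields are hypothetical, an "error" is taken to be a 5xx status, and "delay rate" is interpreted as the fraction of requests slower than a chosen threshold, which the slides do not define.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def component_rates(records, now, delay_threshold_ms, window=timedelta(hours=1)):
    """Return {component: (error_rate, delay_rate)} for records in the window."""
    totals = defaultdict(lambda: [0, 0, 0])  # requests, errors, delayed
    for r in records:
        if now - r["time"] > window:
            continue  # older than the window (here: the most recent hour)
        stats = totals[r["component"]]
        stats[0] += 1
        stats[1] += r["status"] >= 500
        stats[2] += r["response_ms"] > delay_threshold_ms
    return {c: (e / n, d / n) for c, (n, e, d) in totals.items()}


# Hypothetical log records, for illustration only.
now = datetime(2018, 9, 10, 10, 0, 0)
records = [
    {"component": "login", "time": datetime(2018, 9, 10, 9, 30), "status": 200, "response_ms": 21},
    {"component": "login", "time": datetime(2018, 9, 10, 9, 45), "status": 500, "response_ms": 900},
    {"component": "login", "time": datetime(2018, 9, 10, 8, 30), "status": 500, "response_ms": 900},
    {"component": "search", "time": datetime(2018, 9, 10, 9, 50), "status": 200, "response_ms": 400},
]
rates = component_rates(records, now, delay_threshold_ms=300)
for component, (error_rate, delay_rate) in sorted(rates.items()):
    print(component, error_rate, delay_rate)
```

In the pipeline described earlier, an aggregation like this would normally be expressed as a query against the log store rather than run in application code.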
17. 17
Log Data Structure and Format Example
Data architecture for gathering log data from application components
Log format (Item / Type / Sample)
• Transaction ID / Text / A3828OQZAG8367483
• Current time / Datetime / 2018-09-10T09:01:48Z
• Service name / Text / Authorization_Service
• Component name / Text / Login_Component
• API name (URL) / Text / http://www.company.com/login_api
• HTTP method / Text / GET
• HTTP status code / Text / 200
• Request status / Text / Success
• Response time / Text / 21
[Diagram: the concept of “Service”, “Component” and “API” structure, including logging points. Labels: Push Button1, Push Button2; Service 1, Service 2; Component A (API A-1, API A-2), Component B (API B-1), Component C (API C-1); Logging point]
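A log record in this format can be modeled as a typed structure, which also shows how the transaction ID ties records from different components together. This is a sketch, not the authors' schema: the class, the function names and the second sample record (`Profile_Component`) are hypothetical, and the response time is parsed as an integer with no unit assumed.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class LogRecord:
    transaction_id: str
    time: datetime
    service: str
    component: str
    api: str
    http_method: str
    http_status: str
    request_status: str
    response_time: int


def parse_record(fields):
    """Build a LogRecord from a dict keyed by the item names of the log format."""
    time = datetime.strptime(fields["Current time"], "%Y-%m-%dT%H:%M:%SZ")
    return LogRecord(
        transaction_id=fields["Transaction ID"],
        time=time.replace(tzinfo=timezone.utc),  # the trailing "Z" means UTC
        service=fields["Service name"],
        component=fields["Component name"],
        api=fields["API name (URL)"],
        http_method=fields["HTTP method"],
        http_status=fields["HTTP status code"],
        request_status=fields["Request status"],
        response_time=int(fields["Response time"]),
    )


def slowest_component(records, transaction_id):
    """Return the component with the largest response time in one transaction."""
    matching = [r for r in records if r.transaction_id == transaction_id]
    if not matching:
        return None
    return max(matching, key=lambda r: r.response_time).component


# The first record uses the sample values from the log format.
sample = {
    "Transaction ID": "A3828OQZAG8367483",
    "Current time": "2018-09-10T09:01:48Z",
    "Service name": "Authorization_Service",
    "Component name": "Login_Component",
    "API name (URL)": "http://www.company.com/login_api",
    "HTTP method": "GET",
    "HTTP status code": "200",
    "Request status": "Success",
    "Response time": "21",
}
# The second record is hypothetical: same transaction, another component.
other = dict(sample, **{"Component name": "Profile_Component", "Response time": "340"})

records = [parse_record(sample), parse_record(other)]
print(slowest_component(records, "A3828OQZAG8367483"))
```

A lookup of this kind is one way to locate where a delay occurred once the records of a transaction have been collected.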
18. 18
Performance & Capacity Assumption Example
• Average number of components per request
  • For instance: 10
• Average number of API calls in each component
  • For instance: 5
• Average log length
  • For instance: 2 KB
• Number of logs per request
  • 10 x 5 = 50 records
• Log size per request
  • 50 records x 2 KB = 100 KB
• Number of accesses to the application (peak)
  • For instance: 1,000 req/sec
• Number of accesses to the log store (peak)
  • 50 records x 1,000 req/sec = 50,000 req/sec
• Average number of accesses per 24 hours
  • For instance: 1,000,000 req/day
• Log size per day
  • 100 KB x 1,000,000 req/day = 100,000,000 KB/day ≈ 95 GB/day (with 1 GB = 1,024 x 1,024 KB)
[Diagram: Sizing Model. Each request passes through 10 components with 5 API calls each, and each call writes one 2 KB log record; 1,000 req/sec (peak) results in 50,000 req/sec to the log store, and 1,000,000 req/day results in 95 GB/day of logs]
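The sizing arithmetic can be captured in a small function so that the assumptions are easy to vary. The function and key names are chosen for this sketch; the input values are the example figures from the slide.

```python
def sizing(components_per_request, api_calls_per_component, log_kb,
           peak_requests_per_sec, requests_per_day):
    """Derive log volume figures from the per-request assumptions."""
    logs_per_request = components_per_request * api_calls_per_component
    kb_per_request = logs_per_request * log_kb
    return {
        "logs_per_request": logs_per_request,
        "kb_per_request": kb_per_request,
        "log_store_requests_per_sec": logs_per_request * peak_requests_per_sec,
        # Converted with 1 GB = 1,024 x 1,024 KB.
        "gb_per_day": kb_per_request * requests_per_day / (1024 * 1024),
    }


result = sizing(
    components_per_request=10,
    api_calls_per_component=5,
    log_kb=2,
    peak_requests_per_sec=1_000,
    requests_per_day=1_000_000,
)
for name, value in result.items():
    print(name, round(value, 1))
```

Changing a single input, for example doubling the average log length, shows directly how the daily storage requirement scales.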