2022 Trends in
Enterprise Advanced
Analytics
Presented by: William McKnight
“#1 Global Influencer in Big Data” Thinkers360
President, McKnight Consulting Group
A 2-Time Inc. 5000 Company
@williammcknight
www.mcknightcg.com
(214) 514-1444
Second Thursday of Every Month, at 2:00 ET
#AdvAnalytics
Know Better™
ChaosSearch helps modern organizations
Know Better™ by activating the data lake for
analytics.
The ChaosSearch Data Lake Platform indexes customers’ cloud
data, rendering it fully searchable and enabling analytics at scale
with massive reductions of time, cost and complexity.
© 2021 ChaosSearch, Inc.
The Data Analytics Challenge
Promise vs. Reality
The Promise:
Self-service access to all data for instant insights that maximize operational efficiency, security posture, and the user experience.
• Efficiently manage data growth
• Rapid time to insights
• Single repository for all data
DevOps | SecOps | ITOps | LOB
The Reality:
Complex data swamp that increases costs and inhibits actionable insights.
• Data growth and variety exceed infrastructure and resource capabilities
• Gaps in data access, time to access and loss of insights
• Complex data silos
DevOps | SecOps | ITOps | LOB
What if:
1. You could analyze any and all your data?
2. Automated and massive scale
3. Dramatically reduce time to insight and save up to 80%
You would
Know Better™
→ Insights at scale
→ Immediate time to insight
→ Free up critical resources
→ See the world the way you want it
Without changing the way your users work
Cloud Data Lake Platform
ChaosSearch helps modern organizations Know Better™ by activating the data lake for analytics.
* This is a roadmap item and subject to change.
Beneficial Outcomes
✔ One unified Data Lake for analytics at scale
✔ Log, BI and Product-led growth insights
✔ Game changing simplicity and automation
✔ No more data pipelines or data movements
✔ No more schema management, sharding or
managing server clusters and their uptime
✔ All while using the same set of analytic tools
✔ Scale and performance for analytic workloads with up to
80% cost savings
Your Cloud Object Storage
ChaosSearch Data Platform: Chaos Index®, Chaos Fabric®, Chaos Refinery®
Published Open APIs: Elastic API, SQL API*, ML APIs*
Data Consumers and their tools:
• DevOps/SecOps: Kibana
• SecOps: Elastic API
• CXO: Tableau/Looker
• Business Analyst: Tableau/Looker
• Data Scientist: TensorFlow/PyTorch
Insights at Scale
Easy as 1, 2, 3
Step 1: Store
Store any/all data in your cloud storage
• AWS S3 and GCP have industry-leading reliability, resiliency, scalability, cost effectiveness and security built in; simply use it
• No transformation required
Step 2: Connect
Connect in less than 5 minutes
• Click to configure S3 or GCP connectivity
– Read-only access to bucket data
– Location to write indices into bucket
• Click to Index (Chaos Index)
– Static, Live or Real-time data indexing
– Built-in schema detection/normalization
• Click to create a view (Chaos Refinery)
– Instant/Virtual Aggregation and Transformation of Indices
– Relational Joins for Correlations
– Advanced JSON exploitation
– Full RBAC controls
Step 3: Analyze
Analyze data using existing tools
• Use the Open APIs that ChaosSearch publishes to analyze/visualize the data.
• Data Consumers use their existing tools.
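As an illustration of the Analyze step, a request from an existing Elastic-compatible tool might look like the sketch below. It only builds a request body in the standard Elasticsearch Query DSL and does not call any endpoint. The view name `app-logs-view` and the field names `status`, `@timestamp` and `service` are hypothetical, since the slides do not document a schema or URL.

```python
import json


def build_error_count_query(min_status=500, window="now-1h"):
    """Build an Elasticsearch Query DSL body: recent error logs counted per service."""
    return {
        "size": 0,  # only the aggregation is needed, not individual hits
        "query": {
            "bool": {
                "filter": [
                    {"range": {"status": {"gte": min_status}}},
                    {"range": {"@timestamp": {"gte": window}}},
                ]
            }
        },
        "aggs": {
            "errors_by_service": {"terms": {"field": "service", "size": 10}}
        },
    }


# Hypothetical view name; an Elastic-compatible client would POST the body
# to /<view>/_search.
VIEW_NAME = "app-logs-view"
body = build_error_count_query()
print(f"POST /{VIEW_NAME}/_search")
print(json.dumps(body, indent=2))
```

Because the body is plain Query DSL, the same request could be issued from Kibana or any Elasticsearch client, which is the sense in which users keep their existing tools.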
Log Analytics Transformed
Before: Elasticsearch (ELK stack)
DevOps | SecOps | LOB ???
• Limited retention
• Expensive to scale
• Management and configuration challenges
• Downtime created by instability at scale
• Multiple data silos created due to the limits above
With ChaosSearch
Cloud Object Storage (e.g., Google GCS, AWS S3)
DevOps | SecOps | LOB ???
Published Elastic API
One unified data lake
Unlimited scale and retention.
Save up to 80% on Managed Service with 99.99% uptime.
“Our SRE teams used to struggle with managing
the vast amount of logs it takes to support
millions of users in real time in a consistent
manner across all our product lines. With
ChaosSearch, we are able to use a singular
solution for our various logs without the hassle
of managing the logging tools as well.”
Joel Snook, Director, DevOps Engineering
ChaosSearch Replaces Elasticsearch for Log Analytics
Activate your cloud object storage to become a hot, analytical data lake.
Thank you
Log Analytics at Scale
Optimizing cloud services and applications and mitigating persistent threats rely on complete log coverage
IT & Cloud Ops Optimization
• Efficiently capture all logs across distributed architecture, microservices, containers, etc. to prevent incidents and improve troubleshooting
• Eliminate pipelines and process and join multiple logs virtually for in-depth analysis in minutes instead of days/weeks
DevOps Efficiency
• Faster root cause analysis and troubleshooting
• Instant feedback into CI/CD pipeline to identify potential issues prior to production
• Minimize data filtering and prep – capture all log data efficiently and join multiple sources
SecOps & Threat Hunting
• Unlimited data retention - Keep logs indefinitely to thwart persistent threats and meet compliance mandates
• Centralize all logs for greater visibility, hunting, and threat mitigation
• Built-in alerts to tag and automate response to threats in near real time.
William McKnight
President, McKnight Consulting Group
• Frequent keynote speaker and trainer internationally
• Consulted to many Global 1000 companies
• Hundreds of articles, blogs and white papers in
publication
• Focused on delivering business value and solving
business problems utilizing proven, streamlined
approaches to information management
• Former Database Engineer, Fortune 50 Information
Technology executive and Ernst & Young Entrepreneur
of the Year Finalist
• Owner/consultant: 2018 and 2017 Inc. 5000 strategy &
implementation consulting firm
• 30 years of information management and DBMS
experience
William McKnight
Information Management: Strategies for Gaining a Competitive Advantage with Data
(The Savvy Manager’s Guide)
McKnight Consulting Group Offerings
Strategy
§ Trusted Advisor
§ Action Plans
§ Roadmaps
§ Tool Selections
§ Program Management
Training
§ Classes
§ Workshops
Implementation
§ Data/Data Warehousing/Business Intelligence/Analytics
§ Master Data Management
§ Governance/Quality
§ Big Data
Why Are Trends Important?
• It is imperative to see trends that affect your
business to know how to respond
• Plan for and deal with change
• Better to be at the beginning of the trend
rather than the end
• Wants, needs, and tastes of your customers
change
• Make you a leader, not a follower
• Grow your business ideas
• Give you ideas about what to improve in your
business
Information Management Leaders
• Information Management leaders of
tomorrow can advance maturity while also
solving business issues
– There’s no budget for “staying on trends”
• Information Management leaders must pick
their winning (i.e., multi-year sustainable)
approaches and get on board
Last Year’s Trends
• Remote Work Continues
• Led by Cloud Capabilities, Strong Tech Spending Rebound in 2021
• Leading Organizations are increasing a focus on AI/ML
• Model Deployment Takes Center Stage
• More Edge AI
• Explainable AI
• Strong Data Lake Adoption
• New Technology Stacks: Shift from only data warehouses, lakes, and
ETL to data fabrics, AI, and pipelines
• Strong DevOps Adoption
• Strong MLOps Adoption
• Automation
• Open Source Adoption
• Kubernetes Adoption
• We are at the start of General AI
Top Trends in Enterprise Analytics
for 2022 and Beyond
Edge AI and Edge Computing Dominate Architectures
• Embedded Databases at the edge
• AI baked into the chips
• Decision making at the edge
• High-Performance Edge AI
• Real-Time Data Wrangling
Data Scientists Start Doing More Data
Science Than Data Cultivation
Wide Adoption of Containerized Data
Kubernetes
• Data analytics stack goes Kubernetes for both open source and commercial
• Winners go from thought to POC quickly
• Serverlessness
Synthetic Data Used for Training AI Models
Data Fabric Sees Uptake, Becomes Newest
Hollow Term
AI-Enabled Applications
• Venture Funds will shift from AI tools and
technologies to AI-enabled applications
• AI is coding- and trial-heavy; customers will
begin demanding low- and no-code off-the-shelf AI
• AI will focus on automation and deep,
complex analysis of big data for immediate
action
Data Catalogs Cross Chasm in Data Stack
• Data Catalogs serve as metadata store for
all services including data integration,
prep/transformation, data lake, DW, ML
• Identifies relationships
• Identifies data pipelines
• Serves preferences in data set selection
• Documents all data sets (including
connection info)
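The catalog responsibilities listed above (documenting data sets with connection info, identifying relationships and pipelines) can be pictured with a small in-memory sketch. Every class, field and data set name below is hypothetical and chosen only for illustration; a real catalog product would persist this metadata and harvest it automatically.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """One documented data set (all field names are illustrative)."""
    name: str
    connection: str   # connection info, as the slide suggests documenting
    service: str      # e.g. "data lake", "DW", "ML"
    upstream: list = field(default_factory=list)  # data sets this one is derived from


class DataCatalog:
    """Minimal metadata store: register data sets and trace their pipelines."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.name] = entry

    def describe(self, name):
        return self._entries[name]

    def lineage(self, name):
        """Walk upstream relationships breadth-first, nearest sources first."""
        seen, order = set(), []
        queue = deque(self._entries[name].upstream)
        while queue:
            current = queue.popleft()
            if current in seen:
                continue
            seen.add(current)
            order.append(current)
            if current in self._entries:
                queue.extend(self._entries[current].upstream)
        return order


catalog = DataCatalog()
catalog.register(DatasetEntry("raw_orders", "s3://example-bucket/orders/", "data lake"))
catalog.register(DatasetEntry("customers", "postgres://example-host/crm", "DW"))
catalog.register(DatasetEntry("clean_orders", "s3://example-bucket/clean/", "data lake", ["raw_orders"]))
catalog.register(DatasetEntry("orders_mart", "warehouse.orders_mart", "DW", ["clean_orders", "customers"]))

print(catalog.lineage("orders_mart"))
```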
Data Quality Subsumed into Data
Observability (and Data Observability
Becomes Huge)
Predictive data quality & observability
• Scale detection: Leverage ML to generate explainable and adaptive DQ rules
• Scale architecture: Scan large and diverse databases, files and streaming data
• Scale adoption: Empower users with a unified scoring system and personal alerts
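To make the idea of an adaptive, explainable data quality rule concrete, here is a minimal sketch. It swaps in a simple statistical rule (mean plus or minus k standard deviations learned from history) for the ML the slide mentions, and the metric, numbers and function names are all hypothetical.

```python
from statistics import mean, stdev


def learn_rule(history, k=3.0):
    """Derive an acceptable range from past observations of a metric."""
    mu, sigma = mean(history), stdev(history)
    return {"low": mu - k * sigma, "high": mu + k * sigma}


def check(value, rule):
    """Return True when the new observation falls inside the learned range."""
    return rule["low"] <= value <= rule["high"]


# Hypothetical metric: daily row counts of one table over the past week.
daily_row_counts = [1000, 1020, 980, 1010, 990, 1005, 995]
rule = learn_rule(daily_row_counts)

for todays_count in (1015, 400):
    status = "ok" if check(todays_count, rule) else "ALERT"
    print(f"{todays_count}: {status} (expected {rule['low']:.0f} to {rule['high']:.0f})")
```

The rule is explainable because the range can be reported alongside the alert, and adaptive because re-running `learn_rule` on fresh history moves the bounds.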
Streaming Analytics Grows with IoT
[Diagram: Live device log data → Raw Data Topics (JSON, Avro) → Stream Processing, Data Prep/Enrichment, and/or SQL on Hadoop → Processed Data Topics]
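The flow from raw data topics through stream processing to processed data topics can be sketched with plain Python generators. This is only an illustration under stated assumptions: in-memory lists stand in for message-broker topics, and the device names, fields and threshold are hypothetical.

```python
import json

# Stand-in for a raw data topic carrying JSON device messages.
RAW_TOPIC = [
    '{"device": "sensor-1", "temp_c": 21.5}',
    '{"device": "sensor-2", "temp_c": 48.0}',
    'not-json',
    '{"device": "sensor-1", "temp_c": 22.1}',
]


def parse(messages):
    """Decode JSON messages, skipping any that are malformed."""
    for raw in messages:
        try:
            yield json.loads(raw)
        except json.JSONDecodeError:
            continue


def enrich(events, hot_threshold_c=40.0):
    """Data prep/enrichment: add a Fahrenheit reading and a 'hot' flag."""
    for event in events:
        yield {
            **event,
            "temp_f": round(event["temp_c"] * 9 / 5 + 32, 1),
            "hot": event["temp_c"] >= hot_threshold_c,
        }


# Stand-in for the processed data topic.
processed_topic = list(enrich(parse(RAW_TOPIC)))
for event in processed_topic:
    print(event)
```

Each stage consumes one event at a time, so the same structure would apply if the lists were replaced by a real consumer and producer.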
Sensors and Automation Drive Data Volume
Medicine Jumps Shark on Neurological
Disorders Leading to DNA Revolution
Artificial Intelligence, Based on Data, Moves
Hard into Design
That Design Extends to Tech and Software
AutoML Cements Itself as The Future of ML
Automate the design of machine learning models: AutoML
• Apply data preprocessing
• Research to pinpoint the right ML algorithm
• Optimize hyperparameters for selected algorithms
• Automatically build in parallel multiple models to select the best
Golden ML ensemble
AutoML features to look for:
• Algorithm availability
• Preprocessing capabilities
• Search methods
• Ensembles
• Explainability
Open-Source/Paid AutoML tools
• Auto-WEKA
• Auto-sklearn
• TPOT
• Google Cloud AutoML
• H2O AutoML
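The core AutoML loop (try several algorithms and hyperparameter settings, score each, keep the best) can be shown in a few lines of standard-library Python. This is a toy sketch, not how any of the listed tools work internally: the candidates are a mean baseline and a one-dimensional k-nearest-neighbours regressor, the data is synthetic, and the models are evaluated sequentially rather than in parallel.

```python
from statistics import mean


def mean_predict(train, x):
    """Baseline model: always predict the training mean."""
    return mean(y for _, y in train)


def knn_predict(train, x, k):
    """k-nearest-neighbours regression on a single feature."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return mean(y for _, y in nearest)


def cv_error(predict, data, folds=4):
    """Mean squared error under simple interleaved k-fold cross-validation."""
    errors = []
    for i in range(folds):
        test = data[i::folds]
        train = [p for j, p in enumerate(data) if j % folds != i]
        errors.extend((predict(train, x) - y) ** 2 for x, y in test)
    return mean(errors)


# Synthetic data for illustration only: y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in range(20)]

# Search space: one baseline plus a small hyperparameter grid for k-NN.
candidates = {"mean-baseline": mean_predict}
for k in (1, 3, 5):
    candidates[f"knn(k={k})"] = lambda train, x, k=k: knn_predict(train, x, k)

scores = {name: cv_error(fn, data) for name, fn in candidates.items()}
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: MSE={score:.3f}")
print("selected:", best)
```

Real AutoML tools add preprocessing search, smarter search methods than a fixed grid, and ensembling of the top candidates, which are the features the slide suggests comparing.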
GPT-3 Becomes Premier NLP
§ There’s more
maturity in moving
imperfectly than in
merely perfectly
defining the
shortcomings
§ Build credibility
§ Don’t be afraid to
fail
§ Don’t talk yourself
out of having a new
beginning
§ Have an open mind
§ No plateaus are
comfortable for long
§ That resistance is not
about making
progress, it’s the
journey
Winning Approaches in 2022
• Edge AI
• Containerized Data with Kubernetes
• Synthetic Data for Training AI Models
• Avoid Miscommunication: e.g., Data Fabric
• AI-Enabled Applications
• Data Catalogs
• Data Observability
• Streaming Analytics
• AI Design
• AutoML
• GPT-3
