DATEV eG
Data Beats Emotions
How DATEV Generates Business Value with Data-driven Decisions
matthias.mueller@datev.de / @bicaluv
Agenda
• About the data
• Processing
• Business values
• What’s next
22.04.2019 Data Beats Emotions 2
DATEV – Company
• Founded in 1966 as a co-operative organization
• Main business is software for tax consulting, accounting, and legal services
• Our customers are mostly tax consultants and their clients
• B2B market
• 7,500 employees (1,800 devs)
• 1 billion euro annual revenue in 2018
• A typical tax consultancy has around 10 employees; a few have up to 1,500
• 40,000 co-operative members
• 160,000 companies using our software on behalf of their tax consultants
DATEV – Software
On-premises, running at the customer's site
We do have data center applications, but they are not the focus of this talk
MS Windows based, incl. MS SQL Server
250 different applications
About the data
• Based on in-memory logs generated for every on-prem application
• Logs include
• Clicks / tracked user interactions
• Exceptions
• Performance data
• + metadata: OS, screen resolution, touch device, UI themes; no IP address!
About the data – General Data Protection Regulation (GDPR) Compliance
• Personal data tracking requires agreement / consent management
• Dialog shown to each user → no agreement, no tracking data
• 2 data schemas from client
• actual data with GUID (Globally Unique Identifier, generated at client site)
• agreement with GUID and User ID (for data warehouse joins)
• Essential for handling the right to be forgotten without requiring big data deletes
Click: { GUID, [data] }
Agreement: { GUID, UserID, [ true | false ] }
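A minimal sketch of how the two schemas could work together, assuming plain Python lists stand in for the click and agreement data sets. The names `Click`, `Agreement`, `clicks_by_user`, and `forget_user` are hypothetical and not taken from the talk. The idea illustrated here: click records carry only the GUID, so removing the small agreement record cuts the link to the user without touching the large click data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Click:
    guid: str  # client-generated GUID, carries no user identity
    data: str  # the tracked interaction, e.g. "File.Open"


@dataclass(frozen=True)
class Agreement:
    guid: str
    user_id: str
    agreed: bool


def clicks_by_user(clicks, agreements):
    """Join clicks to users via GUID, keeping only GUIDs with consent."""
    user_of = {a.guid: a.user_id for a in agreements if a.agreed}
    result = {}
    for c in clicks:
        if c.guid in user_of:
            result.setdefault(user_of[c.guid], []).append(c.data)
    return result


def forget_user(agreements, user_id):
    """Drop only the agreement rows of a user; click data stays untouched."""
    return [a for a in agreements if a.user_id != user_id]


# Example values taken from the GDPR slides.
clicks = [Click("A1", "File.Open"), Click("A1", "File.Quit")]
agreements = [Agreement("A1", "User42", True)]
```

After `forget_user(agreements, "User42")`, the join no longer resolves GUID `A1` to a user, while both click records remain in place.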
About the data – GDPR Compliance
Click1: { A1, "File.Open" }
Agreement: { A1, User42, true }
Click2: { A1, "File.Quit" }
Big Data World
…
About the data – Current Figures
1 2
Agreements
Consent Rate
Startup
of every
Application
60 GB log files per day (decompressed)
200 million events per day (6,000/s)
50 components with 1,250 dynamic trace points
30 billion total client events in the Hadoop cluster
Approx. 200,000 unique users per day
83%
In early 2015 we tried using an online tracking tool
…starting in 2016 we
experimented with
…at the end of 2016 it settled into a more mature approach
Actual Processing
Components: On-premises, Internet (HTTPS), Tracking Server, ISA, DMZ, Hadoop Cluster (Data Center), Reporting
DEV: team of 7, including devs, data scientist, master of ceremony, requirements engineer, and product owner
OP: team of 2, operating the data center platforms
Actual Processing – Client
• Continuous monitoring of client logs using a ring buffer
(remember: no individual agreement, no data)
• On-premises clients send data every 3 hours
(random distribution of sending time based on installation time)
→ continuous flow of data
• BTW: We dogfood the client-side data tracking itself, e.g. for buffer overruns, CPU, and memory usage
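The client behaviour described above can be sketched as follows. This is an illustration under assumptions, not DATEV's implementation: the class `LogRingBuffer`, the function `send_offset_seconds`, and the choice of hashing the installation time to get a stable offset are all hypothetical. The slide only states that a ring buffer is used and that sending times are spread based on installation time.

```python
import hashlib
from collections import deque

SEND_INTERVAL_S = 3 * 60 * 60  # clients send every 3 hours (from the slide)


class LogRingBuffer:
    """Fixed-size in-memory buffer; the oldest entry is overwritten when full."""

    def __init__(self, capacity):
        self._entries = deque(maxlen=capacity)
        self.overruns = 0  # number of entries lost to overwriting

    def append(self, entry):
        if len(self._entries) == self._entries.maxlen:
            self.overruns += 1
        self._entries.append(entry)

    def drain(self):
        """Return the buffered entries for sending and clear the buffer."""
        entries = list(self._entries)
        self._entries.clear()
        return entries


def send_offset_seconds(installation_time: str) -> int:
    """Derive a stable offset inside the 3-hour window from the installation time.

    Hashing is an assumption; any deterministic spread would serve the same purpose.
    """
    digest = hashlib.sha256(installation_time.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % SEND_INTERVAL_S
```

Because every installation gets a different but fixed offset, uploads are spread across the interval instead of arriving in bursts, which matches the "continuous flow of data" noted on the slide. The `overruns` counter is the kind of self-monitoring value the dogfooding remark refers to.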
Actual Processing – Ingestion
• Proprietary protocol to get from ISA to cluster (DMZ)
• Transfers incoming data from the unsecured zone to the secure data center every 5 minutes
→ continuous flow of data to the Hadoop edge node
Actual Processing – Ingestion
• CRON & batch: once every night, data gets processed
• Decompress
• Filter (valid timestamp, test data)
• Store and upload to HDFS in file chunks of 100 MB
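A rough sketch of the filter and chunking steps, assuming records are dicts with a `timestamp` field in ISO format and an optional `is_test` flag. These field names, and the rule that a valid timestamp must be parseable and not in the future, are assumptions made for illustration. The HDFS upload itself is left out.

```python
from datetime import datetime

CHUNK_BYTES = 100 * 1024 * 1024  # 100 MB chunks, as on the slide


def is_valid(record, now):
    """Keep records with a parseable, non-future timestamp that are not test data."""
    try:
        ts = datetime.fromisoformat(record["timestamp"])
    except (KeyError, TypeError, ValueError):
        return False
    return ts <= now and not record.get("is_test", False)


def chunk_lines(lines, max_bytes=CHUNK_BYTES):
    """Group encoded lines into chunks of at most max_bytes.

    A single line larger than max_bytes still forms its own chunk.
    """
    chunk, size = [], 0
    for line in lines:
        n = len(line)
        if chunk and size + n > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(line)
        size += n
    if chunk:
        yield chunk
```

Each yielded chunk would then be written as one file and uploaded, so the cluster receives a small number of large files rather than many small ones.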
Actual Processing – ETL Phase 1
• CRON & batch: once every night, data gets processed
• Start Spark job for agreement data
• Start Spark jobs for hot data (window of 5 days)
– De-duplicate data
– Add data that was received late
– Generate ORC files with data partitioned by day
– Optimize partitions (e.g. delete outdated partitions due to retention policy)
– Automated check of internal compliance regulations
(data must not contain confidential customer data)
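The hot-data steps can be illustrated in plain Python, which here stands in for the Spark jobs named on the slide. Everything in this sketch is an assumption: the event shape `(event_day, guid, payload)`, the function names, the reading of "window of 5 days" as today plus the four preceding days, and the `retention_days` parameter.

```python
from collections import defaultdict
from datetime import date, timedelta

HOT_WINDOW_DAYS = 5  # from the slide


def rebuild_hot_partitions(events, today, window_days=HOT_WINDOW_DAYS):
    """Return {day: sorted unique events} for days inside the hot window.

    Each event is a tuple (event_day, guid, payload). Exact duplicates are
    dropped, and late-arriving events land in the partition of their own day.
    """
    oldest = today - timedelta(days=window_days - 1)
    partitions = defaultdict(set)
    for event in events:
        event_day = event[0]
        if oldest <= event_day <= today:
            partitions[event_day].add(event)  # the set de-duplicates
    return {day: sorted(rows) for day, rows in partitions.items()}


def drop_expired(partition_days, today, retention_days):
    """Keep only partition days that are still inside the retention period."""
    cutoff = today - timedelta(days=retention_days)
    return [d for d in partition_days if d >= cutoff]
```

Rewriting only the partitions of the last few days keeps the nightly job cheap, while still letting delayed client uploads end up in the correct day.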
Actual Processing – ETL Phase 2
• Start Spark jobs to update data for reports
• Generate ORC files for star schema (facts and dimensions)
• Aggregations and calculations for reporting
• Update files of the report tool incrementally by reading ORC files using Hive ODBC (external tables)
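To illustrate what "facts and dimensions" means here, the following sketch splits flat events into one dimension table and one fact table and then aggregates over them. The fields `day`, `os`, and `duration_ms` and both function names are hypothetical. The real jobs run on Spark and write ORC files, which this plain-Python example does not attempt to reproduce.

```python
def build_star_schema(events):
    """Split flat events into an OS dimension and a fact table.

    Each event is a dict with 'day', 'os', and 'duration_ms' (assumed fields).
    """
    os_dim = {}  # OS name -> surrogate key
    facts = []
    for e in events:
        os_key = os_dim.setdefault(e["os"], len(os_dim) + 1)
        facts.append({"day": e["day"], "os_key": os_key, "duration_ms": e["duration_ms"]})
    return os_dim, facts


def avg_duration_by_os(os_dim, facts):
    """Aggregate the fact table per OS dimension entry, as a report might."""
    name_of = {key: name for name, key in os_dim.items()}
    totals = {}
    for f in facts:
        total, count = totals.get(f["os_key"], (0, 0))
        totals[f["os_key"]] = (total + f["duration_ms"], count + 1)
    return {name_of[k]: total / count for k, (total, count) in totals.items()}
```

The fact rows hold only keys and measures, so reports can group by any dimension attribute without scanning the raw events again.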
HDP 2.6.5 Production Cluster
Data Center
Rack 1 Rack 2
Edge
Master
Workers
…0001 …0003 …000 …0015 …0016 …0002 …0004 …0006 …0013 …0014
each 48 Cores, 512 GB RAM, 16 TB HDD, RHEL 7
each 48 Cores, 512 GB RAM, 1 TB HDD, RHEL 7
each 48 Cores, 512 GB RAM, 16 TB HDD, RHEL 7
Edge
Master
Workers
each 48 Cores, 512 GB RAM, 1 TB HDD, RHEL 7
Reporting
Guided Analytics using
Actual Processing – Reporting
22 different default reports
• UX (including click counts)
• Exceptions
• Performance
Actual Processing – Reporting Example
Chart: Top 10 Screen Resolution
Chart: Top 10 Screen Resolution by Target Market (legend: Clients / Companies, Tax Consultants, Data Warehouse, Other, Lawyers)
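A report like the ones above boils down to a grouped count. The sketch below shows one way to compute it, assuming each input row is a `(target_market, resolution)` pair per client installation. The row shape and the function name `top_resolutions` are hypothetical, and the actual reports are built in the reporting tool, not in Python.

```python
from collections import Counter, defaultdict


def top_resolutions(rows, n=10):
    """Return the overall top-n resolutions and the top-n per target market."""
    overall = Counter()
    by_market = defaultdict(Counter)
    for market, resolution in rows:
        overall[resolution] += 1
        by_market[market][resolution] += 1
    per_market = {m: c.most_common(n) for m, c in by_market.items()}
    return overall.most_common(n), per_market
```

Splitting by target market matters because the overall ranking can hide that one customer group uses noticeably different screens than another.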
Chart: Program Usage by Target Market (Member Count by Member Type; legend: Clients / Companies, Tax Consultants, Data Warehouse, Education Institutes, Public Sector, Lecturer, Other)
Business Values
• UX, e.g. optimized screen resolution
• Check actual program usage by "paid beta testers"
• A/B comparison (usage and performance)
• Proof of sales license bundles
• Performance anomaly detection, e.g. based on OS
Business Values
• Discontinuation of over 10 applications and over 30 features within apps
→ saves hours in dev and support → €
• Detailed field analysis for new application
→ "saved trouble" from 4,500 customers caused by missing features
• Counting of real SQL Server licenses in use
→ saves €
Obstacles
• Too many different reports requested
• Too many domain/application specific reports
• Too much domain specific know-how required
• Requested to support more data sources like Splunk, AppDynamics, and online apps
Evolve from Guided Analytics…
Data sources: On-Prem Statistics Data, Program Statistics, Add. Data Warehouse
Producer: Statistics Team only
Consumers: POs (Standard Reports)
Self-Service Analytics
• Decentralize analytics
• Open report generation for more users
• Supporting ad-hoc SQL queries using Hive 3 + LLAP
• Supporting Excel (remember: Excel is king for BI)
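To give an idea of the kind of ad-hoc query a power user might run, the sketch below uses Python's built-in `sqlite3` module as a stand-in for Hive, so that it runs without a cluster. The table `click_facts`, its columns, the application names, and the sample rows are invented for illustration. Only the SQL shape (filter, group, order) is the point.

```python
import sqlite3

# In-memory database standing in for a Hive table (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE click_facts (day TEXT, application TEXT, clicks INTEGER)")
conn.executemany(
    "INSERT INTO click_facts VALUES (?, ?, ?)",
    [
        ("2019-04-21", "AppA", 120),
        ("2019-04-22", "AppA", 80),
        ("2019-04-22", "AppB", 50),
        ("2019-03-01", "AppB", 999),
    ],
)

# Ad-hoc question: which applications were used most since a given day?
QUERY = """
SELECT application, SUM(clicks) AS total_clicks
FROM click_facts
WHERE day >= ?
GROUP BY application
ORDER BY total_clicks DESC
"""
rows = conn.execute(QUERY, ("2019-04-01",)).fetchall()
```

Questions like this previously required a new standard report from the statistics team; with self-service access they become a single query.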
…to Self-Service Analytics
Data sources: On-Prem / Online Statistics Data, Program Statistics, Add. Data Warehouse, Source A, Source B, Source …
Platform: Data Abstraction, Data Catalog, Reporting Environment
Roles (Producers, Consumers): Data Scientist, Power User, Manager
Processes: Data Governance Process, Publishing Workflow
New Challenges
• Data Governance / Guidance for KPIs
• Teaching
• Data literacy
Self-Service Analytics PoC Example
• Exception Path Analysis using Kibana + Elasticsearch
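The PoC itself was built with Kibana and Elasticsearch. The sketch below only illustrates one possible reading of "exception path analysis": counting which click sequences directly precede an exception. The event shape `(guid, kind, name)`, the function name, the `depth` parameter, and the sample names `File.Print` and `PrintError` are assumptions.

```python
from collections import Counter, deque


def exception_paths(events, depth=3):
    """Count the click sequences that directly precede each exception.

    events: time-ordered iterable of (guid, kind, name), where kind is
    'click' or 'exception'. Returns a Counter keyed by (path, exception).
    """
    recent = {}  # guid -> the last `depth` clicks of that client
    paths = Counter()
    for guid, kind, name in events:
        clicks = recent.setdefault(guid, deque(maxlen=depth))
        if kind == "click":
            clicks.append(name)
        elif kind == "exception":
            paths[(tuple(clicks), name)] += 1
    return paths
```

Paths with high counts point to interaction sequences that reliably trigger an exception, which gives developers a starting point for reproducing it.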
• Number of Exceptions on DVD after Release using Qlik Sense (example data only)
• Top 5 Exceptions by DVDs using Qlik Sense (example data only)
DATEV eG22.04.2019 Data Beats Emotions 40
© abramsdesign / fotolia.com
DATEV eG22.04.2019 Data Beats Emotions 41
© Brian Jackson / fotolia.com
DATEV eG22.04.2019 Data Beats Emotions 42
DATEV eG

More Related Content

What's hot

Consumption based analytics enabled by Data Virtualization
Consumption based analytics enabled by Data VirtualizationConsumption based analytics enabled by Data Virtualization
Consumption based analytics enabled by Data Virtualization
Denodo
 
Delivering Self-Service Analytics using Big Data and Data Virtualization on t...
Delivering Self-Service Analytics using Big Data and Data Virtualization on t...Delivering Self-Service Analytics using Big Data and Data Virtualization on t...
Delivering Self-Service Analytics using Big Data and Data Virtualization on t...
Denodo
 
Freddie Mac & KPMG Case Study – Advanced Machine Learning Data Integration wi...
Freddie Mac & KPMG Case Study – Advanced Machine Learning Data Integration wi...Freddie Mac & KPMG Case Study – Advanced Machine Learning Data Integration wi...
Freddie Mac & KPMG Case Study – Advanced Machine Learning Data Integration wi...
DataWorks Summit
 
End to End Supply Chain Control Tower
End to End Supply Chain Control TowerEnd to End Supply Chain Control Tower
End to End Supply Chain Control Tower
Databricks
 
Data Virtualization for Data Architects (New Zealand)
Data Virtualization for Data Architects (New Zealand)Data Virtualization for Data Architects (New Zealand)
Data Virtualization for Data Architects (New Zealand)
Denodo
 
Data Virtualization: From Zero to Hero
Data Virtualization: From Zero to HeroData Virtualization: From Zero to Hero
Data Virtualization: From Zero to Hero
Denodo
 
The Curse of the Data Lake Monster
The Curse of the Data Lake MonsterThe Curse of the Data Lake Monster
The Curse of the Data Lake Monster
Thoughtworks
 
Denodo DataFest 2017: Business Needs for a Fast Data Strategy
Denodo DataFest 2017: Business Needs for a Fast Data StrategyDenodo DataFest 2017: Business Needs for a Fast Data Strategy
Denodo DataFest 2017: Business Needs for a Fast Data Strategy
Denodo
 
Active Governance Across the Delta Lake with Alation
Active Governance Across the Delta Lake with AlationActive Governance Across the Delta Lake with Alation
Active Governance Across the Delta Lake with Alation
Databricks
 
Munich Re: Driving a Big Data Transformation
Munich Re: Driving a Big Data TransformationMunich Re: Driving a Big Data Transformation
Munich Re: Driving a Big Data Transformation
DataWorks Summit
 
The Role of the Logical Data Fabric in a Unified Platform for Modern Analytics
The Role of the Logical Data Fabric in a Unified Platform for Modern AnalyticsThe Role of the Logical Data Fabric in a Unified Platform for Modern Analytics
The Role of the Logical Data Fabric in a Unified Platform for Modern Analytics
Denodo
 
Demystifying Data Virtualization: Why it’s Now Critical for Your Data Strategy
Demystifying Data Virtualization: Why it’s Now Critical for Your Data StrategyDemystifying Data Virtualization: Why it’s Now Critical for Your Data Strategy
Demystifying Data Virtualization: Why it’s Now Critical for Your Data Strategy
Denodo
 
San Antonio’s electric utility making big data analytics the business of the ...
San Antonio’s electric utility making big data analytics the business of the ...San Antonio’s electric utility making big data analytics the business of the ...
San Antonio’s electric utility making big data analytics the business of the ...
DataWorks Summit
 
Organising the Data Lake - Information Management in a Big Data World
Organising the Data Lake - Information Management in a Big Data WorldOrganising the Data Lake - Information Management in a Big Data World
Organising the Data Lake - Information Management in a Big Data World
DataWorks Summit/Hadoop Summit
 
Smart data for a predictive bank
Smart data for a predictive bankSmart data for a predictive bank
Smart data for a predictive bank
DataWorks Summit/Hadoop Summit
 
Big Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case StudyBig Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case Study
Nati Shalom
 
Oil and gas big data edition
Oil and gas  big data editionOil and gas  big data edition
Oil and gas big data edition
Mark Kerzner
 
Cloud Modernization with Data Virtualization
Cloud Modernization with Data VirtualizationCloud Modernization with Data Virtualization
Cloud Modernization with Data Virtualization
Denodo
 
Transforming GE Healthcare with Data Platform Strategy
Transforming GE Healthcare with Data Platform StrategyTransforming GE Healthcare with Data Platform Strategy
Transforming GE Healthcare with Data Platform Strategy
Databricks
 
Pouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy IndustryPouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy Industry
DataWorks Summit
 

What's hot (20)

Consumption based analytics enabled by Data Virtualization
Consumption based analytics enabled by Data VirtualizationConsumption based analytics enabled by Data Virtualization
Consumption based analytics enabled by Data Virtualization
 
Delivering Self-Service Analytics using Big Data and Data Virtualization on t...
Delivering Self-Service Analytics using Big Data and Data Virtualization on t...Delivering Self-Service Analytics using Big Data and Data Virtualization on t...
Delivering Self-Service Analytics using Big Data and Data Virtualization on t...
 
Freddie Mac & KPMG Case Study – Advanced Machine Learning Data Integration wi...
Freddie Mac & KPMG Case Study – Advanced Machine Learning Data Integration wi...Freddie Mac & KPMG Case Study – Advanced Machine Learning Data Integration wi...
Freddie Mac & KPMG Case Study – Advanced Machine Learning Data Integration wi...
 
End to End Supply Chain Control Tower
End to End Supply Chain Control TowerEnd to End Supply Chain Control Tower
End to End Supply Chain Control Tower
 
Data Virtualization for Data Architects (New Zealand)
Data Virtualization for Data Architects (New Zealand)Data Virtualization for Data Architects (New Zealand)
Data Virtualization for Data Architects (New Zealand)
 
Data Virtualization: From Zero to Hero
Data Virtualization: From Zero to HeroData Virtualization: From Zero to Hero
Data Virtualization: From Zero to Hero
 
The Curse of the Data Lake Monster
The Curse of the Data Lake MonsterThe Curse of the Data Lake Monster
The Curse of the Data Lake Monster
 
Denodo DataFest 2017: Business Needs for a Fast Data Strategy
Denodo DataFest 2017: Business Needs for a Fast Data StrategyDenodo DataFest 2017: Business Needs for a Fast Data Strategy
Denodo DataFest 2017: Business Needs for a Fast Data Strategy
 
Active Governance Across the Delta Lake with Alation
Active Governance Across the Delta Lake with AlationActive Governance Across the Delta Lake with Alation
Active Governance Across the Delta Lake with Alation
 
Munich Re: Driving a Big Data Transformation
Munich Re: Driving a Big Data TransformationMunich Re: Driving a Big Data Transformation
Munich Re: Driving a Big Data Transformation
 
The Role of the Logical Data Fabric in a Unified Platform for Modern Analytics
The Role of the Logical Data Fabric in a Unified Platform for Modern AnalyticsThe Role of the Logical Data Fabric in a Unified Platform for Modern Analytics
The Role of the Logical Data Fabric in a Unified Platform for Modern Analytics
 
Demystifying Data Virtualization: Why it’s Now Critical for Your Data Strategy
Demystifying Data Virtualization: Why it’s Now Critical for Your Data StrategyDemystifying Data Virtualization: Why it’s Now Critical for Your Data Strategy
Demystifying Data Virtualization: Why it’s Now Critical for Your Data Strategy
 
San Antonio’s electric utility making big data analytics the business of the ...
San Antonio’s electric utility making big data analytics the business of the ...San Antonio’s electric utility making big data analytics the business of the ...
San Antonio’s electric utility making big data analytics the business of the ...
 
Organising the Data Lake - Information Management in a Big Data World
Organising the Data Lake - Information Management in a Big Data WorldOrganising the Data Lake - Information Management in a Big Data World
Organising the Data Lake - Information Management in a Big Data World
 
Smart data for a predictive bank
Smart data for a predictive bankSmart data for a predictive bank
Smart data for a predictive bank
 
Big Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case StudyBig Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case Study
 
Oil and gas big data edition
Oil and gas  big data editionOil and gas  big data edition
Oil and gas big data edition
 
Cloud Modernization with Data Virtualization
Cloud Modernization with Data VirtualizationCloud Modernization with Data Virtualization
Cloud Modernization with Data Virtualization
 
Transforming GE Healthcare with Data Platform Strategy
Transforming GE Healthcare with Data Platform StrategyTransforming GE Healthcare with Data Platform Strategy
Transforming GE Healthcare with Data Platform Strategy
 
Pouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy IndustryPouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy Industry
 

Similar to Data Beats Emotions – How DATEV Generates Business Value with Data-driven Decisions

Big data/Hadoop/HANA Basics
Big data/Hadoop/HANA BasicsBig data/Hadoop/HANA Basics
Big data/Hadoop/HANA Basics
Global Business Solutions SME
 
Paris FOD Meetup #5 Cognizant Presentation
Paris FOD Meetup #5 Cognizant PresentationParis FOD Meetup #5 Cognizant Presentation
Paris FOD Meetup #5 Cognizant Presentation
Abdelkrim Hadjidj
 
Big Data Expo 2015 - Pentaho The Future of Analytics
Big Data Expo 2015 - Pentaho The Future of AnalyticsBig Data Expo 2015 - Pentaho The Future of Analytics
Big Data Expo 2015 - Pentaho The Future of Analytics
BigDataExpo
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
Hortonworks
 
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
DataWorks Summit
 
A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)
Denodo
 
Matthew Johnston - Big Data Futures Outlook BCM
Matthew Johnston - Big Data Futures Outlook BCMMatthew Johnston - Big Data Futures Outlook BCM
Matthew Johnston - Big Data Futures Outlook BCM
Hoi Lan Leong
 
Production & Well Work Reporting: 7 Keys to Success
Production & Well Work Reporting: 7 Keys to SuccessProduction & Well Work Reporting: 7 Keys to Success
Production & Well Work Reporting: 7 Keys to Success
NeoFirma
 
Leap to Next Generation Data Management with Denodo 7.0
Leap to Next Generation Data Management with Denodo 7.0Leap to Next Generation Data Management with Denodo 7.0
Leap to Next Generation Data Management with Denodo 7.0
Denodo
 
Capturing big value in big data
Capturing big value in big data Capturing big value in big data
Capturing big value in big data BSP Media Group
 
ODSC May 2019 - The DataOps Manifesto
ODSC May 2019 - The DataOps ManifestoODSC May 2019 - The DataOps Manifesto
ODSC May 2019 - The DataOps Manifesto
DataKitchen
 
BDPA Cincinnati: 'Big Data - Friend or Foe?'
BDPA Cincinnati: 'Big Data - Friend or Foe?' BDPA Cincinnati: 'Big Data - Friend or Foe?'
BDPA Cincinnati: 'Big Data - Friend or Foe?'
BDPA Education and Technology Foundation
 
SEAGATE
SEAGATESEAGATE
Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)
Denodo
 
The Double win business transformation and in-year ROI and TCO reduction
The Double win business transformation and in-year ROI and TCO reductionThe Double win business transformation and in-year ROI and TCO reduction
The Double win business transformation and in-year ROI and TCO reduction
MongoDB
 
Master the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMaster the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - Snowflake
Matillion
 
AI Foundations: Simpler Technologies, Smarter Business
AI Foundations: Simpler Technologies, Smarter BusinessAI Foundations: Simpler Technologies, Smarter Business
AI Foundations: Simpler Technologies, Smarter Business
TIBCO_Software
 
Big Data & Analytics, Peter Jönsson
Big Data & Analytics, Peter JönssonBig Data & Analytics, Peter Jönsson
Big Data & Analytics, Peter JönssonIBM Danmark
 
Washington DC DataOps Meetup -- Nov 2019
Washington DC DataOps Meetup   -- Nov 2019Washington DC DataOps Meetup   -- Nov 2019
Washington DC DataOps Meetup -- Nov 2019
DataKitchen
 
Bridging the Data Divide: Using Automation to Unify Data Sources for Sustaina...
Bridging the Data Divide: Using Automation to Unify Data Sources for Sustaina...Bridging the Data Divide: Using Automation to Unify Data Sources for Sustaina...
Bridging the Data Divide: Using Automation to Unify Data Sources for Sustaina...
Urjanet
 

Similar to Data Beats Emotions – How DATEV Generates Business Value with Data-driven Decisions (20)

Big data/Hadoop/HANA Basics
Big data/Hadoop/HANA BasicsBig data/Hadoop/HANA Basics
Big data/Hadoop/HANA Basics
 
Paris FOD Meetup #5 Cognizant Presentation
Paris FOD Meetup #5 Cognizant PresentationParis FOD Meetup #5 Cognizant Presentation
Paris FOD Meetup #5 Cognizant Presentation
 
Big Data Expo 2015 - Pentaho The Future of Analytics
Big Data Expo 2015 - Pentaho The Future of AnalyticsBig Data Expo 2015 - Pentaho The Future of Analytics
Big Data Expo 2015 - Pentaho The Future of Analytics
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
 
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
 
A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)
 
Matthew Johnston - Big Data Futures Outlook BCM
Matthew Johnston - Big Data Futures Outlook BCMMatthew Johnston - Big Data Futures Outlook BCM
Matthew Johnston - Big Data Futures Outlook BCM
 
Production & Well Work Reporting: 7 Keys to Success
Production & Well Work Reporting: 7 Keys to SuccessProduction & Well Work Reporting: 7 Keys to Success
Production & Well Work Reporting: 7 Keys to Success
 
Leap to Next Generation Data Management with Denodo 7.0
Leap to Next Generation Data Management with Denodo 7.0Leap to Next Generation Data Management with Denodo 7.0
Leap to Next Generation Data Management with Denodo 7.0
 
Capturing big value in big data
Capturing big value in big data Capturing big value in big data
Capturing big value in big data
 
ODSC May 2019 - The DataOps Manifesto
ODSC May 2019 - The DataOps ManifestoODSC May 2019 - The DataOps Manifesto
ODSC May 2019 - The DataOps Manifesto
 
BDPA Cincinnati: 'Big Data - Friend or Foe?'
BDPA Cincinnati: 'Big Data - Friend or Foe?' BDPA Cincinnati: 'Big Data - Friend or Foe?'
BDPA Cincinnati: 'Big Data - Friend or Foe?'
 
SEAGATE
SEAGATESEAGATE
SEAGATE
 
Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)
 
The Double win business transformation and in-year ROI and TCO reduction
The Double win business transformation and in-year ROI and TCO reductionThe Double win business transformation and in-year ROI and TCO reduction
The Double win business transformation and in-year ROI and TCO reduction
 
Master the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMaster the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - Snowflake
 
AI Foundations: Simpler Technologies, Smarter Business
AI Foundations: Simpler Technologies, Smarter BusinessAI Foundations: Simpler Technologies, Smarter Business
AI Foundations: Simpler Technologies, Smarter Business
 
Big Data & Analytics, Peter Jönsson
Big Data & Analytics, Peter JönssonBig Data & Analytics, Peter Jönsson
Big Data & Analytics, Peter Jönsson
 
Washington DC DataOps Meetup -- Nov 2019
Washington DC DataOps Meetup   -- Nov 2019Washington DC DataOps Meetup   -- Nov 2019
Washington DC DataOps Meetup -- Nov 2019
 
Bridging the Data Divide: Using Automation to Unify Data Sources for Sustaina...
Bridging the Data Divide: Using Automation to Unify Data Sources for Sustaina...Bridging the Data Divide: Using Automation to Unify Data Sources for Sustaina...
Bridging the Data Divide: Using Automation to Unify Data Sources for Sustaina...
 

More from DataWorks Summit

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
DataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Data Beats Emotions – How DATEV Generates Business Value with Data-driven Decisions

  • 1. DATEV eG Data Beats Emotions How DATEV Generates Business Value with Data-driven Decisions matthias.mueller@datev.de / @bicaluv
  • 2. Agenda
      - About the data
      - Processing
      - Business values
      - What’s next
  • 3. DATEV – Company
      - Founded in 1966 as a co-operative organization
      - Main business is software for tax consulting, accounting, and law business
      - Our customers are mostly tax consultants and their clients
      - B2B market
      - 7,500 employees (1,800 devs)
      - 1 billion euro annual revenue in 2018
      - A typical tax consultant has around 10 employees; a few have up to 1,500
      - 40,000 co-operative members
      - 160,000 companies using our software on behalf of their tax consultants
  • 4. DATEV – Software
      - On-premises, running at the customers' sites
      - We also have data center applications, but they are not the focus of this talk
      - MS Windows based, incl. MS SQL Server
      - 250 different applications
  • 5. About the data
      - Based on in-memory logs generated for every on-prem application
      - Logs include:
          - Clicks / tracked user interactions
          - Exceptions
          - Performance data
          - Metadata: OS, screen resolution, touch device, UI themes (no IP address!)
  • 6. About the data – General Data Protection Regulation (GDPR) Compliance
      - Personal data tracking requires agreement / consent management
      - Dialog shown to each user → no agreement, no tracking data
      - 2 data schemas from client:
          - Actual data with GUID (Globally Unique Identifier, generated at client site)
          - Agreement with GUID and User ID (for data warehouse joins)
      - Essential for handling right to be forgotten without requiring big data deletes
      - Click: { GUID, [data] }
      - Agreement: { GUID, UserID, [ true | false ] }
  • 7. About the data – GDPR Compliance
      - Click1: { A1, "File.Open" }
      - Agreement: { A1, User42, true }
      - Click2: { A1, "File.Quit" }
      - Big Data World …
  • 8. About the data – GDPR Compliance
      - Click1: { A1, "File.Open" }
      - Agreement: { A1, User42, true }
      - Click2: { A1, "File.Quit" }
      - Big Data World …
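The two-schema design on slides 6 to 8 can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not DATEV's implementation: the class names, `attribute_clicks`, and `forget_user` are hypothetical, and the erasure step (dropping only the agreement record) is one reading of "without requiring big data deletes".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Click:
    guid: str    # client-generated GUID, the only key stored with the event
    action: str  # e.g. "File.Open"


@dataclass(frozen=True)
class Agreement:
    guid: str
    user_id: str
    consented: bool


def attribute_clicks(clicks, agreements):
    """Join clicks to users via GUID; keep only clicks with a positive agreement."""
    by_guid = {a.guid: a for a in agreements}
    result = []
    for click in clicks:
        agreement = by_guid.get(click.guid)
        if agreement is not None and agreement.consented:
            result.append((agreement.user_id, click.action))
    return result


def forget_user(agreements, user_id):
    """Assumed erasure step: drop the user's agreement rows, leave events untouched."""
    return [a for a in agreements if a.user_id != user_id]


clicks = [Click("A1", "File.Open"), Click("A1", "File.Quit")]
agreements = [Agreement("A1", "User42", True)]

print(attribute_clicks(clicks, agreements))
# [('User42', 'File.Open'), ('User42', 'File.Quit')]
print(attribute_clicks(clicks, forget_user(agreements, "User42")))
# []
```

Because the click records never carry the user ID, removing the single small agreement row is enough to break the link to the user; the large event tables do not need to be rewritten.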
  • 9. About the data – Current Figures
      - 60 GB log files per day (decompressed)
      - 200 million events per day (6,000/s)
      - 30 components with 1,250 dynamic trace points
      - Approx. 50 billion total client events in Hadoop cluster
      - 200,000 unique users per day
      - 83% consent rate
      - 1 2 Agreements, Startup of every Application
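A quick back-of-the-envelope check of the daily figures, as a sketch only. It assumes decimal units (1 GB = 10^9 bytes) and that the 60 GB and the 200 million events describe the same day of data; neither assumption is stated on the slide.

```python
LOG_BYTES_PER_DAY = 60 * 10**9   # 60 GB, decimal units assumed
EVENTS_PER_DAY = 200 * 10**6     # 200 million events
QUOTED_RATE_PER_S = 6_000        # rate quoted on the slide
SECONDS_PER_DAY = 24 * 60 * 60

avg_event_bytes = LOG_BYTES_PER_DAY / EVENTS_PER_DAY
avg_rate_per_s = EVENTS_PER_DAY / SECONDS_PER_DAY
hours_at_quoted_rate = EVENTS_PER_DAY / QUOTED_RATE_PER_S / 3600

print(f"{avg_event_bytes:.0f} bytes per event")            # 300 bytes per event
print(f"{avg_rate_per_s:.0f} events/s averaged over 24 h")  # 2315 events/s averaged over 24 h
print(f"{hours_at_quoted_rate:.1f} h at 6,000 events/s")    # 9.3 h at 6,000 events/s
```

Spread evenly over 24 hours, 200 million events would be about 2,300 per second, so the quoted 6,000/s is consistent with events concentrated in roughly nine hours of the day. That interpretation is a guess; the slide does not say how the rate was measured.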
  • 10. © Galusha Photography / fotolia.com
  • 11. In early 2015 we tried using an online tracking tool (© kirill_makarov / fotolia.com)
  • 12. …starting in 2016 we experimented with (© Henry Schmitt / fotolia.com)
  • 13. …at the end of 2016 it settled down to be a more mature approach (© joerg dirmeitis / fotolia.com)
  • 14. Actual Processing
      - Diagram labels: On-premises, Internet, HTTPS, DMZ, ISA, Tracking Server, Data Center, Hadoop Cluster, Reporting
      - DEV: team of 7, including devs, data scientist, Master of Ceremony, requirements engineer, and product owner
      - OP: team of 2, operating the data center platforms
  • 15. Actual Processing
      - Diagram labels: On-premises, Internet, HTTPS, DMZ, ISA, Tracking Server, Data Center, Hadoop Cluster, Reporting
  • 16. Actual Processing – Client
      - Continuous monitoring of client logs using a ring buffer (remember: no individual agreement, no data)
      - On-premises clients send data over HTTPS every 3 hours (sending times are randomly distributed, based on installation time) → continuous flow of data
      - BTW: We do dogfooding for client site data tracking, e.g. buffer overruns, CPU, and memory usage
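The client behaviour on slide 16 (a ring buffer plus staggered uploads every 3 hours) could look roughly like the sketch below. The class `ClientLog`, its `overruns` counter, and `next_send_time` are hypothetical names; anchoring the 3-hour grid at the installation time is an assumption about how "random distribution based on installation time" is achieved.

```python
from collections import deque
from datetime import datetime, timedelta

SEND_INTERVAL = timedelta(hours=3)  # interval taken from the slide


class ClientLog:
    """Bounded in-memory log: once full, the oldest entries are overwritten."""

    def __init__(self, capacity, consent_given):
        self.buffer = deque(maxlen=capacity)
        self.consent_given = consent_given
        self.overruns = 0  # hypothetical counter for the "buffer overruns" metric

    def record(self, event):
        if not self.consent_given:
            return  # no agreement, no data
        if len(self.buffer) == self.buffer.maxlen:
            self.overruns += 1  # the append below evicts the oldest entry
        self.buffer.append(event)

    def drain(self):
        """Return buffered events for upload and clear the buffer."""
        events = list(self.buffer)
        self.buffer.clear()
        return events


def next_send_time(installed_at, now):
    """Next upload slot on a 3-hour grid anchored at the installation time."""
    slots_passed = (now - installed_at) // SEND_INTERVAL
    return installed_at + (slots_passed + 1) * SEND_INTERVAL


log = ClientLog(capacity=3, consent_given=True)
for name in ["File.Open", "Edit.Copy", "Edit.Paste", "File.Quit"]:
    log.record(name)
print(log.drain(), log.overruns)
# ['Edit.Copy', 'Edit.Paste', 'File.Quit'] 1

print(next_send_time(datetime(2019, 4, 22, 10, 17), datetime(2019, 4, 22, 15, 0)))
# 2019-04-22 16:17:00
```

Since installations happen at effectively arbitrary times, grids anchored this way spread the uploads of many clients across the 3-hour window without any coordination between them.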
  • 17. Actual Processing – Ingestion
      - Proprietary protocol to get from ISA to cluster (DMZ)
      - Transfers incoming unsecured data to the secure data center every 5 minutes → continuous flow of data to the Hadoop edge node
  • 18. Actual Processing – Ingestion
      - CRON & batch: once every night, data gets processed:
          - Decompress
          - Filter (valid timestamp, test data)
          - Store and upload to HDFS in file chunks of 100 MB
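The nightly decompress, filter, and chunk steps on slide 18 can be illustrated with the standard library alone. Everything format-related here is an assumption: the slides do not say that the logs are gzip-compressed JSON lines, the `timestamp` and `is_test` fields are hypothetical, and "valid timestamp" is read as "parseable and not in the future". The upload to HDFS is left out.

```python
import gzip
import json
from datetime import datetime, timezone

CHUNK_BYTES = 100 * 1024 * 1024  # 100 MB chunks, per the slide


def is_valid(event, now):
    """Keep events with a parseable, non-future timestamp that are not test data."""
    if event.get("is_test"):  # hypothetical marker for test data
        return False
    try:
        ts = datetime.fromisoformat(event["timestamp"])
    except (KeyError, TypeError, ValueError):
        return False
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # assume UTC when no offset is given
    return ts <= now


def process(compressed_blobs, now, chunk_bytes=CHUNK_BYTES):
    """Decompress, filter, and yield newline-delimited chunks of bounded size."""
    chunk, size = [], 0
    for blob in compressed_blobs:
        for line in gzip.decompress(blob).splitlines():
            try:
                event = json.loads(line)
            except ValueError:  # malformed JSON or undecodable bytes
                continue
            if not isinstance(event, dict) or not is_valid(event, now):
                continue
            if chunk and size + len(line) + 1 > chunk_bytes:
                yield b"\n".join(chunk) + b"\n"
                chunk, size = [], 0
            chunk.append(line)
            size += len(line) + 1
    if chunk:
        yield b"\n".join(chunk) + b"\n"


now = datetime(2019, 4, 23, tzinfo=timezone.utc)
raw = b"\n".join([
    json.dumps({"timestamp": "2019-04-22T08:00:00", "action": "File.Open"}).encode(),
    json.dumps({"timestamp": "not-a-date", "action": "File.Quit"}).encode(),
    json.dumps({"timestamp": "2019-04-22T08:05:00", "is_test": True}).encode(),
    b"{broken json",
])
chunks = list(process([gzip.compress(raw)], now))
print(len(chunks))  # 1
```

Fixed-size chunks keep the number of files on HDFS small and predictable, which is presumably why the slide names a chunk size at all.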
  • 19. Actual Processing – ETL Phase 1
      - CRON & batch: once every night, data gets processed
      - Start Spark job for agreement data
      - Start Spark jobs for hot data (window of 5 days):
          - De-duplicate data
          - Add data that was received late
          - Generate ORC files with data partitioned by day
          - Optimize partitions (e.g. delete outdated partitions due to retention policy)
          - Automated check of internal compliance regulations (data must not contain customer-confidential data)
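The hot-window logic of ETL Phase 1 is sketched below in plain Python rather than Spark, so that it runs without a cluster. It is an illustration under assumptions: `event_id` as the de-duplication key, the 365-day retention period, and the function name are all hypothetical, and day partitions are modelled as a dict instead of ORC files.

```python
from collections import defaultdict
from datetime import date, timedelta

HOT_WINDOW_DAYS = 5    # from the slide
RETENTION_DAYS = 365   # hypothetical; the slides give no retention period


def refresh_hot_partitions(partitions, incoming, today):
    """Merge newly received events into the hot day partitions and de-duplicate them.

    partitions: dict mapping date -> list of event dicts
    incoming:   iterable of event dicts with 'event_id' and 'day' (a date)
    """
    hot_start = today - timedelta(days=HOT_WINDOW_DAYS - 1)
    merged = defaultdict(dict)

    # Existing hot partitions are rebuilt; cold partitions stay untouched.
    for day, events in partitions.items():
        if day >= hot_start:
            for e in events:
                merged[day][e["event_id"]] = e

    # Late arrivals land in the partition of their event day, if still hot.
    # Events older than the hot window are ignored in this sketch.
    for e in incoming:
        if hot_start <= e["day"] <= today:
            merged[e["day"]].setdefault(e["event_id"], e)

    result = {d: ev for d, ev in partitions.items() if d < hot_start}
    result.update({d: list(by_id.values()) for d, by_id in merged.items()})

    # Retention: drop partitions older than the cutoff.
    cutoff = today - timedelta(days=RETENTION_DAYS)
    return {d: ev for d, ev in result.items() if d >= cutoff}


today = date(2019, 4, 22)
partitions = {
    date(2019, 4, 21): [{"event_id": "e1", "day": date(2019, 4, 21)}],
    date(2019, 4, 1): [{"event_id": "c1", "day": date(2019, 4, 1)}],
    date(2018, 1, 1): [{"event_id": "old", "day": date(2018, 1, 1)}],
}
incoming = [
    {"event_id": "e1", "day": date(2019, 4, 21)},  # duplicate
    {"event_id": "e2", "day": date(2019, 4, 20)},  # late arrival, still hot
]
updated = refresh_hot_partitions(partitions, incoming, today)
print(sorted(updated))
```

Restricting the rewrite to a five-day window means only a handful of partitions are regenerated each night, while older partitions are only ever touched by the retention step.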
  • 20. Actual Processing – ETL Phase 2
      - Start Spark jobs to update data for reports
      - Generate ORC files for star schema (facts and dimensions)
      - Aggregations and calculations for reporting
      - Update files of report tool incrementally by reading ORC files using Hive ODBC (external tables)
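To make "star schema (facts and dimensions)" concrete, the following sketch splits flat events into a fact list and two dimension tables and then aggregates one measure. The dimensions (`application`, `os`), the `clicks` measure, and the sample values are hypothetical; the talk does not list the actual schema, and the real jobs write ORC files from Spark rather than Python dicts.

```python
def build_star(events):
    """Split flat events into one fact list and surrogate-keyed dimension tables."""
    dims = {"application": {}, "os": {}}
    facts = []
    for e in events:
        keys = {}
        for name, table in dims.items():
            # Assign the next surrogate key the first time a value is seen.
            keys[f"{name}_key"] = table.setdefault(e[name], len(table) + 1)
        facts.append({**keys, "day": e["day"], "clicks": e["clicks"]})
    return facts, dims


def clicks_by(facts, dims, dim_name):
    """Aggregate the 'clicks' measure by one dimension."""
    labels = {key: value for value, key in dims[dim_name].items()}
    totals = {}
    for f in facts:
        label = labels[f[f"{dim_name}_key"]]
        totals[label] = totals.get(label, 0) + f["clicks"]
    return totals


events = [
    {"application": "AppA", "os": "Windows 10", "day": "2019-04-21", "clicks": 5},
    {"application": "AppB", "os": "Windows 10", "day": "2019-04-21", "clicks": 2},
    {"application": "AppA", "os": "Windows 7", "day": "2019-04-22", "clicks": 3},
]
facts, dims = build_star(events)
print(clicks_by(facts, dims, "application"))  # {'AppA': 8, 'AppB': 2}
print(clicks_by(facts, dims, "os"))           # {'Windows 10': 7, 'Windows 7': 3}
```

Keeping descriptive values in small dimension tables and only integer keys in the facts is what lets a reporting tool slice the same measures by application, OS, or any other dimension without rescanning raw events.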
  • 21. HDP 2.6.5 Production Cluster
      - Data center with two racks (Rack 1, Rack 2), each with Edge, Master, and Worker nodes
      - Node labels: …0001 …0003 …000 …0015 …0016 …0002 …0004 …0006 …0013 …0014
      - Node specs shown: each 48 cores, 512 GB RAM, RHEL 7, with either 16 TB or 1 TB HDD
  • 22. Reporting: Guided Analytics using (© Saklakova / fotolia.com)
  • 23. Actual Processing – Reporting
      - UX (including click counts)
      - Exceptions
      - Performance
      - 22 different default reports
  • 24. Actual Processing – Reporting Example: Top 10 Screen Resolution
  • 25. Actual Processing – Reporting Example: Top 10 Screen Resolution by Target Market (legend: Clients / Companies, Tax Consultants, Data Warehouse, Other, Lawyers)
  • 26. Actual Processing – Reporting Example: Program Usage by Target Market (axes: Member Type, Member Count; legend: Clients / Companies, Tax Consultants, Data Warehouse, Education Institutes, Public Sector, Lecturer, Other)
  • 27. © artiemedvedev / fotolia.com
  • 28. Business Values
      - UX, e.g. optimized screen resolution
      - Check actual program usage of "paid beta testers"
      - A/B comparison (usage and performance)
      - Proof of sales license bundles
      - Performance anomaly detection, e.g. based on OSs
  • 29. Business Values
      - Discontinuation of over 10 applications and over 30 features within apps → saves hours in dev and support → €
      - Detailed field analysis for new application → "saved trouble" from 4,500 customers caused by missing features
      - Counting of real SQL Server licenses in use → saves €
  • 30. © bluedesign / fotolia.com
  • 31. Obstacles (© gustavofrazao / fotolia.com)
      - Too many different reports requested
      - Too many domain/application-specific reports
      - Too much domain-specific know-how required
      - Requested to support more data sources like Splunk, AppDynamics, and online apps
  • 32. Evolve from Guided Analytics…
      - Diagram labels: On-Prem Statistics Data Program Statistics Add. Data Warehouse Statistics Team only Producer Consumers POs Standard Reports
  • 33. Self-Service Analytics (© vege / fotolia.com)
      - Decentralize analytics
      - Open report generation for more users
      - Supporting ad hoc SQL queries using Hive 3 + LLAP
      - Supporting Excel (remember: Excel is king for BI)
  • 34. …to Self-Service Analytics
      - Diagram labels: On-Prem Online Statistics Data Source A Data Abstraction Data Catalog Reporting Environment Data Scientist Power User Producers Consumers Manager Data Governance Process Publishing Workflow Program Statistics Add. Data Warehouse Source B Source …
  • 35. New Challenges (© Neyro / fotolia.com)
      - Data governance / guidance for KPIs
      - Teaching
      - Data literacy
  • 36. Self-Service Analytics PoC Example: Exception Path Analysis using Kibana + Elasticsearch
  • 37. Self-Service Analytics PoC Example: Exception Path Analysis using Kibana + Elasticsearch
  • 38. Self-Service Analytics PoC Example: Number of Exceptions on DVD after Release using Qlik Sense (example data only)
  • 39. Self-Service Analytics PoC Example: Top 5 Exceptions by DVDs using Qlik Sense (example data only)
  • 40. © abramsdesign / fotolia.com
  • 41. © Brian Jackson / fotolia.com