DATEV eG
Data Beats Emotions
How DATEV Generates Business Value with Data-driven Decisions
matthias.mueller@datev.de / @bicaluv
Agenda
 About the data
 Processing
 Business values
 What’s next
22.04.2019 Data Beats Emotions 2
DATEV – Company
 Founded in 1966 as a co-operative organization
 Main business is software for tax consulting, accounting, and legal businesses
 Our customers are mostly tax consultants and their clients
 B2B market
 7,500 employees (1,800 devs)
 1 billion euro annual revenue in 2018
 A typical tax consultancy has around 10 employees; a few have up to 1,500
 40,000 co-operative members
 160,000 companies using our software on behalf of their tax consultants
DATEV – Software
On-premises, running at the customer's site
We also have data center applications, but they are not the focus of this talk
MS Windows based, incl. MS SQL Server
250 different applications
About the data
 Based on in-memory logs generated for every on-prem application
 Logs include
 Clicks / Tracked User Interactions
 Exceptions
 Performance data
 Plus metadata: OS, screen resolution, touch device, UI theme (no IP address!)
About the data – General Data Protection Regulation (GDPR) Compliance
 Personal data tracking requires agreement / consent management
 Dialog shown to each user: no agreement, no tracking data
 Two data schemas sent from the client
 Actual data with GUID (Globally Unique Identifier, generated at the client site)
 Agreement with GUID and User ID (for data warehouse joins)
 Essential for handling the right to be forgotten without requiring big data deletes
Click: { GUID, [data] }
Agreement: { GUID, UserID, [ true | false ] }
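A minimal sketch of how the two schemas could work together. This is an illustration, not DATEV's implementation: the class names, `attribute_clicks`, `forget_user`, and the in-memory lists are all assumptions. Click records carry only the GUID; the agreement record is the only place where GUID and user ID meet, so removing that one small record is enough to leave the bulk click data unattributable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Click:
    guid: str    # generated at the client site, carries no user identity
    action: str


@dataclass(frozen=True)
class Agreement:
    guid: str
    user_id: str
    consent: bool


def attribute_clicks(clicks, agreements):
    """Join clicks to users via the GUID; clicks without a consenting agreement are dropped."""
    user_by_guid = {a.guid: a.user_id for a in agreements if a.consent}
    return [(user_by_guid[c.guid], c.action) for c in clicks if c.guid in user_by_guid]


def forget_user(agreements, user_id):
    """Right to be forgotten: remove only the small agreement records of that user."""
    return [a for a in agreements if a.user_id != user_id]


clicks = [Click("A1", "File.Open"), Click("A1", "File.Quit")]
agreements = [Agreement("A1", "User42", True)]

print(attribute_clicks(clicks, agreements))   # clicks can be joined to User42
agreements = forget_user(agreements, "User42")
print(attribute_clicks(clicks, agreements))   # the same clicks are now anonymous
```

The click list is never touched by `forget_user`, which is the point of keeping the user ID out of the large data set.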
About the data – GDPR Compliance
Big Data World
Click1: { A1, "File.Open" }
Agreement: { A1, User42, true }
Click2: { A1, "File.Quit" }
…
About the data – Current Figures
1 2
Agreements
Consent Rate
Startup
of every
Application
60 GB logfiles per day (decompressed)
200 million events per day (6,000/s)
Components
with 1,250
dynamic trace
points 30
Total Client Events in
Hadoop Cluster
billion
Unique User per day
200,000
Approx.
50
83%
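The two event rates on this slide can be related with simple arithmetic. The slide does not say how 6,000/s was derived; the sketch below only shows that 200 million events spread evenly over 24 hours would average about 2,300/s, so 6,000/s matches the same volume arriving within roughly nine hours. Reading that as a working-day rate is an assumption.

```python
EVENTS_PER_DAY = 200_000_000   # from the slide
STATED_RATE = 6_000            # events per second, from the slide

# Average rate if the events were spread evenly over a full day.
avg_24h = EVENTS_PER_DAY / 86_400

# Hours needed to receive the daily volume at the stated rate.
hours_at_stated_rate = EVENTS_PER_DAY / STATED_RATE / 3_600

print(f"{avg_24h:.0f} events/s averaged over 24 h")
print(f"{hours_at_stated_rate:.1f} h at {STATED_RATE} events/s")
```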
In early 2015 we tried using an online tracking tool
…starting in 2016 we
experimented with
…by the end of 2016 it had settled into a more mature approach
Actual Processing
[Diagram: On-premises → Internet (HTTPS) → Tracking Server (ISA, in the DMZ) → Hadoop Cluster (Data Center) → Reporting]
DEV: team of 7, including devs, data scientist, master of ceremony, requirements engineer, and product owner
OP: team of 2, operating the data center platforms
Actual Processing – Client
 Continuous monitoring of client logs using a ring buffer
(remember: no individual agreement, no data)
 On-premises clients send data every 3 hours
(random distribution of sending time based on installation time)
 Result: a continuous flow of data
 BTW: we dogfood the client-side data tracking itself, e.g. buffer overruns, CPU, and memory usage
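The client behaviour described above could look roughly like this. It is a sketch under stated assumptions: `ClientLog`, `next_upload`, the buffer capacity, and the idea of anchoring the 3-hour grid at the installation time are illustrative choices, not the actual client code. Because installation times differ between clients, anchoring the schedule there spreads uploads over time instead of having all clients send at once.

```python
from collections import deque
from datetime import datetime, timedelta

UPLOAD_INTERVAL = timedelta(hours=3)  # from the slide


class ClientLog:
    """In-memory ring buffer that only records when the user has agreed."""

    def __init__(self, capacity, consent_given):
        self._buffer = deque(maxlen=capacity)  # oldest entry is dropped when full
        self._consent_given = consent_given
        self.overruns = 0                      # counted for self-monitoring

    def record(self, event):
        if not self._consent_given:            # no agreement, no data
            return
        if len(self._buffer) == self._buffer.maxlen:
            self.overruns += 1
        self._buffer.append(event)

    def drain(self):
        """Return the buffered events for upload and empty the buffer."""
        batch = list(self._buffer)
        self._buffer.clear()
        return batch


def next_upload(installed_at, now):
    """Next slot on a 3-hour grid that is anchored at the installation time."""
    intervals_done = (now - installed_at) // UPLOAD_INTERVAL
    return installed_at + (intervals_done + 1) * UPLOAD_INTERVAL


log = ClientLog(capacity=3, consent_given=True)
for name in ["File.Open", "Edit.Copy", "Edit.Paste", "File.Quit"]:
    log.record(name)
print(log.drain(), log.overruns)
print(next_upload(datetime(2019, 4, 22, 8, 17), datetime(2019, 4, 22, 12, 0)))
```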
Actual Processing – Ingestion
 Proprietary protocol to get from ISA to Cluster (DMZ)
 Transfers incoming unsecured data to the secure data center every 5 minutes
 Result: a continuous flow of data to the Hadoop edge node
 CRON & Batch: Once every night, data gets processed
 Decompress
 Filter (valid timestamp, test data)
 Store and upload to HDFS in file chunks of 100 MB
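The nightly ingestion steps above can be sketched as follows. The log format is not given in the talk, so everything concrete here is an assumption: gzip-compressed, newline-delimited JSON, the field names `ts` and `is_test`, and the rule that a "valid" timestamp is one that parses and is not in the future. The upload to HDFS itself is left out.

```python
import gzip
import json
from datetime import datetime, timezone

CHUNK_BYTES = 100 * 1024 * 1024  # 100 MB chunks, as on the slide


def is_valid(record, now):
    """Keep records that are not test data and carry a parseable, non-future timestamp."""
    if record.get("is_test"):
        return False
    try:
        ts = datetime.fromisoformat(record["ts"])
    except (KeyError, TypeError, ValueError):
        return False
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    return ts <= now


def filtered_lines(compressed, now):
    """Decompress one upload and yield only the lines that pass the filter."""
    for line in gzip.decompress(compressed).splitlines():
        try:
            record = json.loads(line)
        except ValueError:
            continue
        if isinstance(record, dict) and is_valid(record, now):
            yield line + b"\n"


def chunks(lines, max_bytes=CHUNK_BYTES):
    """Group lines into chunks of at most max_bytes (a single oversized line gets its own chunk)."""
    current, size = [], 0
    for line in lines:
        if current and size + len(line) > max_bytes:
            yield b"".join(current)
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        yield b"".join(current)


now = datetime(2019, 4, 22, tzinfo=timezone.utc)
records = [
    {"ts": "2019-04-21T10:00:00+00:00", "action": "File.Open"},
    {"ts": "2099-01-01T00:00:00+00:00", "action": "File.Open"},   # future timestamp
    {"ts": "not a date", "action": "File.Open"},                  # unparseable
    {"ts": "2019-04-21T11:00:00+00:00", "is_test": True},         # test data
]
raw = gzip.compress(b"\n".join(json.dumps(r).encode() for r in records))
kept = list(filtered_lines(raw, now))
print(len(kept), "of", len(records), "records kept")
```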
Actual Processing – ETL Phase 1
 CRON & Batch: Once every night, data gets processed
 Start Spark job for agreement data
 Start Spark jobs for hot data (window of 5 days)
– De-duplicate data
– Add data that was received late
– Generate ORC files with data partitioned by day
– Optimize partitions (e.g. delete outdated partitions due to retention policy)
– Automated check of internal compliance regulations
(the data must not contain customer-confidential data)
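The hot-data logic above (a 5-day window that is rebuilt so that duplicates are removed and late data is merged in) can be illustrated without a cluster. This is plain Python standing in for the Spark jobs; the event tuple layout and the function name `rebuild_hot_partitions` are assumptions. In Spark the same idea would typically be expressed with `DataFrame.dropDuplicates` and a write partitioned by day.

```python
from collections import defaultdict
from datetime import date, timedelta

HOT_WINDOW_DAYS = 5  # from the slide


def rebuild_hot_partitions(existing, incoming, today):
    """Return {day: sorted unique events} for the days inside the hot window.

    existing: {day: [event, ...]} partitions written by earlier runs
    incoming: iterable of events (day, guid, sequence, action); may contain
              duplicates and events that arrive days after they happened
    """
    oldest_hot = today - timedelta(days=HOT_WINDOW_DAYS - 1)
    merged = defaultdict(set)
    for day, events in existing.items():
        if day >= oldest_hot:
            merged[day].update(events)
    for event in incoming:
        if event[0] >= oldest_hot:        # older data is ignored in this sketch
            merged[event[0]].add(event)   # the set removes exact duplicates
    return {day: sorted(events) for day, events in merged.items()}


today = date(2019, 4, 22)
yesterday = date(2019, 4, 21)
existing = {yesterday: [(yesterday, "A1", 1, "File.Open")]}
incoming = [
    (yesterday, "A1", 1, "File.Open"),          # duplicate of a stored event
    (yesterday, "A1", 2, "File.Quit"),          # late arrival for yesterday
    (today, "B7", 1, "File.Open"),
    (date(2019, 4, 10), "A1", 0, "File.Open"),  # outside the hot window
]
partitions = rebuild_hot_partitions(existing, incoming, today)
for day, events in sorted(partitions.items()):
    print(day, len(events))
```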
Actual Processing – ETL Phase 2
 Start Spark jobs to update data for reports
 Generate ORC files for Star Schema (facts and dimensions)
 Aggregations and calculations for reporting
 Update files of report tool incrementally by reading ORC files using Hive ODBC
(external tables)
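To make the star schema step concrete, here is a small sketch that splits flat events into one dimension and one fact table and then aggregates for a report. The table and column names (`resolution`, `clicks`, `resolution_key`) are hypothetical; the real schema is not shown in the talk, and the real jobs write ORC files rather than Python dictionaries.

```python
def build_star(events):
    """Split flat events into a dimension (with surrogate keys) and a fact table."""
    resolution_dim = {}  # natural key -> surrogate key
    facts = []
    for event in events:
        key = resolution_dim.setdefault(event["resolution"], len(resolution_dim) + 1)
        facts.append({"day": event["day"], "resolution_key": key, "clicks": event["clicks"]})
    return resolution_dim, facts


def clicks_by_resolution(resolution_dim, facts):
    """Report-style aggregation: join facts back to the dimension and sum the clicks."""
    name_by_key = {key: name for name, key in resolution_dim.items()}
    totals = {}
    for fact in facts:
        name = name_by_key[fact["resolution_key"]]
        totals[name] = totals.get(name, 0) + fact["clicks"]
    return totals


events = [
    {"day": "2019-04-21", "resolution": "1920x1080", "clicks": 120},
    {"day": "2019-04-21", "resolution": "1366x768", "clicks": 40},
    {"day": "2019-04-22", "resolution": "1920x1080", "clicks": 80},
]
resolution_dim, facts = build_star(events)
print(resolution_dim)
print(clicks_by_resolution(resolution_dim, facts))
```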
HDP 2.6.5 Production Cluster
[Diagram: Data Center with Rack 1 and Rack 2, each containing Edge, Master, and Worker nodes]
Nodes: …0001 …0003 …000 …0015 …0016 …0002 …0004 …0006 …0013 …0014
Node specs shown: 48 cores, 512 GB RAM, 16 TB HDD, RHEL 7 each; and 48 cores, 512 GB RAM, 1 TB HDD, RHEL 7 each
Reporting
Guided Analytics using
Actual Processing - Reporting
 UX (including click counts)
 Exceptions
 Performance
22 different default reports
Actual Processing – Reporting Example
Top 10 Screen Resolution
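A report like this boils down to a count per metadata value. The sketch below assumes the input is a list of `(guid, resolution)` pairs and that each client should count once per resolution; both are assumptions, since the talk does not describe how the report is computed.

```python
from collections import Counter


def top_resolutions(sessions, n=10):
    """sessions: iterable of (guid, resolution); returns the n most common resolutions."""
    unique = set(sessions)  # count each client once per resolution
    return Counter(resolution for _, resolution in unique).most_common(n)


sessions = [
    ("A1", "1920x1080"),
    ("A1", "1920x1080"),  # same client again, not counted twice
    ("B7", "1920x1080"),
    ("C3", "1366x768"),
]
print(top_resolutions(sessions))
```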
[Chart: Top 10 Screen Resolution by Target Market. Legend: Clients / Companies, Tax Consultants, Data Warehouse, Other, Lawyers]
[Chart: Program Usage by Target Market. Axes: Member Type (1 to 7) and Member Count (0 to 25,000). Legend: Clients / Companies, Tax Consultants, Data Warehouse, Education Institutes, Public Sector, Lecturer, Other]
Business Values
 UX, e.g. optimized screen resolution
 Check actual program usage of "paid beta testers"
 A/B comparison (usage and performance)
 Proof of sales license bundles
 Performance anomaly detection, e.g. based on OSs
 Discontinuation of over 10 applications and over 30 features within apps
 Saves hours in dev and support, and thus €
 Detailed field analysis for a new application
 Spared 4,500 customers the trouble that missing features would have caused
 Counting the SQL Server licenses actually in use
 Saves €
Obstacles
 Too many different reports requested
 Too many domain/application-specific reports
 Too much domain-specific know-how required
 Requests to support more data sources like Splunk, AppDynamics, and online apps
Evolve from Guided Analytics…
[Diagram: sources On-Prem Statistics Data, Program Statistics, and Add. Data Warehouse feed Standard Reports. Producer: Statistics Team only. Consumers: POs]
Self-Service Analytics
 Decentralize analytics
 Open report generation to more users
 Support ad hoc SQL queries using Hive 3 + LLAP
 Support Excel (remember: Excel is king for BI)
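As an illustration of the kind of ad hoc SQL query a self-service user might write, the sketch below runs one against an in-memory SQLite database. SQLite is only a stand-in so the example runs anywhere; the talk names Hive 3 + LLAP as the engine, and the table `exceptions` with its columns is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE exceptions (day TEXT, application TEXT, exception_type TEXT)")
conn.executemany(
    "INSERT INTO exceptions VALUES (?, ?, ?)",
    [
        ("2019-03-30", "AppA", "Timeout"),        # before the queried period
        ("2019-04-02", "AppA", "Timeout"),
        ("2019-04-03", "AppA", "NullReference"),
        ("2019-04-03", "AppB", "Timeout"),
    ],
)

# Ad hoc question: which applications raised the most exceptions since a given day?
AD_HOC_QUERY = """
SELECT application, COUNT(*) AS exception_count
FROM exceptions
WHERE day >= ?
GROUP BY application
ORDER BY exception_count DESC, application
"""
rows = conn.execute(AD_HOC_QUERY, ("2019-04-01",)).fetchall()
print(rows)
```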
…to Self-Service Analytics
[Diagram: sources On-Prem and Online Statistics Data, Program Statistics, Add. Data Warehouse, Source A, Source B, Source … feed a Data Abstraction and Data Catalog layer and a Reporting Environment. Roles: Producers and Consumers, including Data Scientist, Power User, and Manager. Supporting processes: Data Governance Process, Publishing Workflow]
New Challenges
 Data Governance / Guidance for KPIs
 Teaching
 Data literacy
Self-Service Analytics PoC Example
 Exception Path Analysis
using Kibana + Elasticsearch
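The slide does not define "exception path". One plausible reading, assumed here, is the sequence of user interactions that preceded an exception on the same client. The sketch below counts such paths in plain Python; the event layout, the function name `exception_paths`, and the path depth are illustrative, and the PoC itself used Kibana and Elasticsearch rather than code like this.

```python
from collections import Counter, defaultdict, deque


def exception_paths(events, depth=3):
    """Count which click sequences preceded each exception.

    events: iterable of (guid, kind, name) in time order,
            where kind is "click" or "exception"
    returns: Counter mapping (exception name, tuple of preceding clicks) to a count
    """
    recent = defaultdict(lambda: deque(maxlen=depth))  # last clicks per client
    paths = Counter()
    for guid, kind, name in events:
        if kind == "click":
            recent[guid].append(name)
        elif kind == "exception":
            paths[(name, tuple(recent[guid]))] += 1
    return paths


events = [
    ("A1", "click", "File.Open"),
    ("B7", "click", "Edit.Copy"),
    ("A1", "click", "File.Print"),
    ("A1", "exception", "NullReference"),
    ("B7", "click", "File.Print"),
    ("B7", "exception", "NullReference"),
]
paths = exception_paths(events, depth=2)
for (exception, path), count in paths.items():
    print(exception, " > ".join(path), count)
```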
 Number of Exceptions on DVD after Release using Qlik Sense
Example Data only
 Top 5 Exceptions by DVDs using Qlik Sense
Example Data only
DATEV eG22.04.2019 Data Beats Emotions 40
© abramsdesign / fotolia.com
DATEV eG22.04.2019 Data Beats Emotions 41
© Brian Jackson / fotolia.com
DATEV eG22.04.2019 Data Beats Emotions 42
DATEV eG

More Related Content

What's hot

Consumption based analytics enabled by Data Virtualization
Consumption based analytics enabled by Data VirtualizationConsumption based analytics enabled by Data Virtualization
Consumption based analytics enabled by Data VirtualizationDenodo
 
Delivering Self-Service Analytics using Big Data and Data Virtualization on t...
Delivering Self-Service Analytics using Big Data and Data Virtualization on t...Delivering Self-Service Analytics using Big Data and Data Virtualization on t...
Delivering Self-Service Analytics using Big Data and Data Virtualization on t...Denodo
 
Freddie Mac & KPMG Case Study – Advanced Machine Learning Data Integration wi...
Freddie Mac & KPMG Case Study – Advanced Machine Learning Data Integration wi...Freddie Mac & KPMG Case Study – Advanced Machine Learning Data Integration wi...
Freddie Mac & KPMG Case Study – Advanced Machine Learning Data Integration wi...DataWorks Summit
 
End to End Supply Chain Control Tower
End to End Supply Chain Control TowerEnd to End Supply Chain Control Tower
End to End Supply Chain Control TowerDatabricks
 
Data Virtualization for Data Architects (New Zealand)
Data Virtualization for Data Architects (New Zealand)Data Virtualization for Data Architects (New Zealand)
Data Virtualization for Data Architects (New Zealand)Denodo
 
Data Virtualization: From Zero to Hero
Data Virtualization: From Zero to HeroData Virtualization: From Zero to Hero
Data Virtualization: From Zero to HeroDenodo
 
The Curse of the Data Lake Monster
The Curse of the Data Lake MonsterThe Curse of the Data Lake Monster
The Curse of the Data Lake MonsterThoughtworks
 
Denodo DataFest 2017: Business Needs for a Fast Data Strategy
Denodo DataFest 2017: Business Needs for a Fast Data StrategyDenodo DataFest 2017: Business Needs for a Fast Data Strategy
Denodo DataFest 2017: Business Needs for a Fast Data StrategyDenodo
 
Active Governance Across the Delta Lake with Alation
Active Governance Across the Delta Lake with AlationActive Governance Across the Delta Lake with Alation
Active Governance Across the Delta Lake with AlationDatabricks
 
Munich Re: Driving a Big Data Transformation
Munich Re: Driving a Big Data TransformationMunich Re: Driving a Big Data Transformation
Munich Re: Driving a Big Data TransformationDataWorks Summit
 
The Role of the Logical Data Fabric in a Unified Platform for Modern Analytics
The Role of the Logical Data Fabric in a Unified Platform for Modern AnalyticsThe Role of the Logical Data Fabric in a Unified Platform for Modern Analytics
The Role of the Logical Data Fabric in a Unified Platform for Modern AnalyticsDenodo
 
Demystifying Data Virtualization: Why it’s Now Critical for Your Data Strategy
Demystifying Data Virtualization: Why it’s Now Critical for Your Data StrategyDemystifying Data Virtualization: Why it’s Now Critical for Your Data Strategy
Demystifying Data Virtualization: Why it’s Now Critical for Your Data StrategyDenodo
 
San Antonio’s electric utility making big data analytics the business of the ...
San Antonio’s electric utility making big data analytics the business of the ...San Antonio’s electric utility making big data analytics the business of the ...
San Antonio’s electric utility making big data analytics the business of the ...DataWorks Summit
 
Organising the Data Lake - Information Management in a Big Data World
Organising the Data Lake - Information Management in a Big Data WorldOrganising the Data Lake - Information Management in a Big Data World
Organising the Data Lake - Information Management in a Big Data WorldDataWorks Summit/Hadoop Summit
 
Big Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case StudyBig Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case StudyNati Shalom
 
Oil and gas big data edition
Oil and gas  big data editionOil and gas  big data edition
Oil and gas big data editionMark Kerzner
 
Cloud Modernization with Data Virtualization
Cloud Modernization with Data VirtualizationCloud Modernization with Data Virtualization
Cloud Modernization with Data VirtualizationDenodo
 
Transforming GE Healthcare with Data Platform Strategy
Transforming GE Healthcare with Data Platform StrategyTransforming GE Healthcare with Data Platform Strategy
Transforming GE Healthcare with Data Platform StrategyDatabricks
 
Pouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy IndustryPouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy IndustryDataWorks Summit
 

What's hot (20)

Consumption based analytics enabled by Data Virtualization
Consumption based analytics enabled by Data VirtualizationConsumption based analytics enabled by Data Virtualization
Consumption based analytics enabled by Data Virtualization
 
Delivering Self-Service Analytics using Big Data and Data Virtualization on t...
Delivering Self-Service Analytics using Big Data and Data Virtualization on t...Delivering Self-Service Analytics using Big Data and Data Virtualization on t...
Delivering Self-Service Analytics using Big Data and Data Virtualization on t...
 
Freddie Mac & KPMG Case Study – Advanced Machine Learning Data Integration wi...
Freddie Mac & KPMG Case Study – Advanced Machine Learning Data Integration wi...Freddie Mac & KPMG Case Study – Advanced Machine Learning Data Integration wi...
Freddie Mac & KPMG Case Study – Advanced Machine Learning Data Integration wi...
 
End to End Supply Chain Control Tower
End to End Supply Chain Control TowerEnd to End Supply Chain Control Tower
End to End Supply Chain Control Tower
 
Data Virtualization for Data Architects (New Zealand)
Data Virtualization for Data Architects (New Zealand)Data Virtualization for Data Architects (New Zealand)
Data Virtualization for Data Architects (New Zealand)
 
Data Virtualization: From Zero to Hero
Data Virtualization: From Zero to HeroData Virtualization: From Zero to Hero
Data Virtualization: From Zero to Hero
 
The Curse of the Data Lake Monster
The Curse of the Data Lake MonsterThe Curse of the Data Lake Monster
The Curse of the Data Lake Monster
 
Denodo DataFest 2017: Business Needs for a Fast Data Strategy
Denodo DataFest 2017: Business Needs for a Fast Data StrategyDenodo DataFest 2017: Business Needs for a Fast Data Strategy
Denodo DataFest 2017: Business Needs for a Fast Data Strategy
 
Active Governance Across the Delta Lake with Alation
Active Governance Across the Delta Lake with AlationActive Governance Across the Delta Lake with Alation
Active Governance Across the Delta Lake with Alation
 
Munich Re: Driving a Big Data Transformation
Munich Re: Driving a Big Data TransformationMunich Re: Driving a Big Data Transformation
Munich Re: Driving a Big Data Transformation
 
The Role of the Logical Data Fabric in a Unified Platform for Modern Analytics
The Role of the Logical Data Fabric in a Unified Platform for Modern AnalyticsThe Role of the Logical Data Fabric in a Unified Platform for Modern Analytics
The Role of the Logical Data Fabric in a Unified Platform for Modern Analytics
 
Demystifying Data Virtualization: Why it’s Now Critical for Your Data Strategy
Demystifying Data Virtualization: Why it’s Now Critical for Your Data StrategyDemystifying Data Virtualization: Why it’s Now Critical for Your Data Strategy
Demystifying Data Virtualization: Why it’s Now Critical for Your Data Strategy
 
San Antonio’s electric utility making big data analytics the business of the ...
San Antonio’s electric utility making big data analytics the business of the ...San Antonio’s electric utility making big data analytics the business of the ...
San Antonio’s electric utility making big data analytics the business of the ...
 
Organising the Data Lake - Information Management in a Big Data World
Organising the Data Lake - Information Management in a Big Data WorldOrganising the Data Lake - Information Management in a Big Data World
Organising the Data Lake - Information Management in a Big Data World
 
Smart data for a predictive bank
Smart data for a predictive bankSmart data for a predictive bank
Smart data for a predictive bank
 
Big Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case StudyBig Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case Study
 
Oil and gas big data edition
Oil and gas  big data editionOil and gas  big data edition
Oil and gas big data edition
 
Cloud Modernization with Data Virtualization
Cloud Modernization with Data VirtualizationCloud Modernization with Data Virtualization
Cloud Modernization with Data Virtualization
 
Transforming GE Healthcare with Data Platform Strategy
Transforming GE Healthcare with Data Platform StrategyTransforming GE Healthcare with Data Platform Strategy
Transforming GE Healthcare with Data Platform Strategy
 
Pouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy IndustryPouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy Industry
 

Similar to Data Beats Emotions – How DATEV Generates Business Value with Data-driven Decisions

Paris FOD Meetup #5 Cognizant Presentation
Paris FOD Meetup #5 Cognizant PresentationParis FOD Meetup #5 Cognizant Presentation
Paris FOD Meetup #5 Cognizant PresentationAbdelkrim Hadjidj
 
Big Data Expo 2015 - Pentaho The Future of Analytics
Big Data Expo 2015 - Pentaho The Future of AnalyticsBig Data Expo 2015 - Pentaho The Future of Analytics
Big Data Expo 2015 - Pentaho The Future of AnalyticsBigDataExpo
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeHortonworks
 
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...DataWorks Summit
 
A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)Denodo
 
Matthew Johnston - Big Data Futures Outlook BCM
Matthew Johnston - Big Data Futures Outlook BCMMatthew Johnston - Big Data Futures Outlook BCM
Matthew Johnston - Big Data Futures Outlook BCMHoi Lan Leong
 
Production & Well Work Reporting: 7 Keys to Success
Production & Well Work Reporting: 7 Keys to SuccessProduction & Well Work Reporting: 7 Keys to Success
Production & Well Work Reporting: 7 Keys to SuccessNeoFirma
 
Leap to Next Generation Data Management with Denodo 7.0
Leap to Next Generation Data Management with Denodo 7.0Leap to Next Generation Data Management with Denodo 7.0
Leap to Next Generation Data Management with Denodo 7.0Denodo
 
Capturing big value in big data
Capturing big value in big data Capturing big value in big data
Capturing big value in big data BSP Media Group
 
ODSC May 2019 - The DataOps Manifesto
ODSC May 2019 - The DataOps ManifestoODSC May 2019 - The DataOps Manifesto
ODSC May 2019 - The DataOps ManifestoDataKitchen
 
Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Denodo
 
The Double win business transformation and in-year ROI and TCO reduction
The Double win business transformation and in-year ROI and TCO reductionThe Double win business transformation and in-year ROI and TCO reduction
The Double win business transformation and in-year ROI and TCO reductionMongoDB
 
Master the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMaster the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMatillion
 
AI Foundations: Simpler Technologies, Smarter Business
AI Foundations: Simpler Technologies, Smarter BusinessAI Foundations: Simpler Technologies, Smarter Business
AI Foundations: Simpler Technologies, Smarter BusinessTIBCO_Software
 
Big Data & Analytics, Peter Jönsson
Big Data & Analytics, Peter JönssonBig Data & Analytics, Peter Jönsson
Big Data & Analytics, Peter JönssonIBM Danmark
 
Washington DC DataOps Meetup -- Nov 2019
Washington DC DataOps Meetup   -- Nov 2019Washington DC DataOps Meetup   -- Nov 2019
Washington DC DataOps Meetup -- Nov 2019DataKitchen
 
Bridging the Data Divide: Using Automation to Unify Data Sources for Sustaina...
Bridging the Data Divide: Using Automation to Unify Data Sources for Sustaina...Bridging the Data Divide: Using Automation to Unify Data Sources for Sustaina...
Bridging the Data Divide: Using Automation to Unify Data Sources for Sustaina...Urjanet
 

Similar to Data Beats Emotions – How DATEV Generates Business Value with Data-driven Decisions (20)

Big data/Hadoop/HANA Basics
Big data/Hadoop/HANA BasicsBig data/Hadoop/HANA Basics
Big data/Hadoop/HANA Basics
 
Paris FOD Meetup #5 Cognizant Presentation
Paris FOD Meetup #5 Cognizant PresentationParis FOD Meetup #5 Cognizant Presentation
Paris FOD Meetup #5 Cognizant Presentation
 
Big Data Expo 2015 - Pentaho The Future of Analytics
Big Data Expo 2015 - Pentaho The Future of AnalyticsBig Data Expo 2015 - Pentaho The Future of Analytics
Big Data Expo 2015 - Pentaho The Future of Analytics
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
 
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
 
A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)
 
Matthew Johnston - Big Data Futures Outlook BCM
Matthew Johnston - Big Data Futures Outlook BCMMatthew Johnston - Big Data Futures Outlook BCM
Matthew Johnston - Big Data Futures Outlook BCM
 
Production & Well Work Reporting: 7 Keys to Success
Production & Well Work Reporting: 7 Keys to SuccessProduction & Well Work Reporting: 7 Keys to Success
Production & Well Work Reporting: 7 Keys to Success
 
Leap to Next Generation Data Management with Denodo 7.0
Leap to Next Generation Data Management with Denodo 7.0Leap to Next Generation Data Management with Denodo 7.0
Leap to Next Generation Data Management with Denodo 7.0
 
Capturing big value in big data
Capturing big value in big data Capturing big value in big data
Capturing big value in big data
 
ODSC May 2019 - The DataOps Manifesto
ODSC May 2019 - The DataOps ManifestoODSC May 2019 - The DataOps Manifesto
ODSC May 2019 - The DataOps Manifesto
 
BDPA Cincinnati: 'Big Data - Friend or Foe?'
BDPA Cincinnati: 'Big Data - Friend or Foe?' BDPA Cincinnati: 'Big Data - Friend or Foe?'
BDPA Cincinnati: 'Big Data - Friend or Foe?'
 
SEAGATE
SEAGATESEAGATE
SEAGATE
 
Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)
 
The Double win business transformation and in-year ROI and TCO reduction
The Double win business transformation and in-year ROI and TCO reductionThe Double win business transformation and in-year ROI and TCO reduction
The Double win business transformation and in-year ROI and TCO reduction
 
Master the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMaster the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - Snowflake
 
AI Foundations: Simpler Technologies, Smarter Business
AI Foundations: Simpler Technologies, Smarter BusinessAI Foundations: Simpler Technologies, Smarter Business
AI Foundations: Simpler Technologies, Smarter Business
 
Big Data & Analytics, Peter Jönsson
Big Data & Analytics, Peter JönssonBig Data & Analytics, Peter Jönsson
Big Data & Analytics, Peter Jönsson
 
Washington DC DataOps Meetup -- Nov 2019
Washington DC DataOps Meetup   -- Nov 2019Washington DC DataOps Meetup   -- Nov 2019
Washington DC DataOps Meetup -- Nov 2019
 
Bridging the Data Divide: Using Automation to Unify Data Sources for Sustaina...
Bridging the Data Divide: Using Automation to Unify Data Sources for Sustaina...Bridging the Data Divide: Using Automation to Unify Data Sources for Sustaina...
Bridging the Data Divide: Using Automation to Unify Data Sources for Sustaina...
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 




Data Beats Emotions – How DATEV Generates Business Value with Data-driven Decisions

  • 1. DATEV eG: Data Beats Emotions. How DATEV Generates Business Value with Data-driven Decisions. matthias.mueller@datev.de / @bicaluv
  • 2. Agenda (DATEV eG, Data Beats Emotions, 22.04.2019)
      - About the data
      - Processing
      - Business values
      - What’s next
  • 3. DATEV – Company
      - Founded in 1966 as a co-operative organization
      - Main business is software for tax consulting, accounting, and legal practices
      - Our customers are mostly tax consultants and their clients
      - B2B market
      - 7,500 employees (1,800 devs)
      - 1 billion euro annual revenue in 2018
      - A typical tax consultant has around 10 employees; a few have up to 1,500
      - 40,000 co-operative members
      - 160,000 companies using our software on behalf of their tax consultants
  • 4. DATEV – Software
      - On-premises, running at the customer's site
      - We also have data center applications, but they are not the focus of this talk
      - MS Windows based, incl. MS SQL Server
      - 250 different applications
  • 5. About the data
      - Based on in-memory logs generated for every on-prem application
      - Logs include:
          - Clicks / tracked user interactions
          - Exceptions
          - Performance data
          - Metadata: OS, screen resolution, touch device, UI themes (no IP address!)
  • 6. About the data – General Data Protection Regulation Compliance
      - Personal data tracking requires agreement / consent management
      - Dialog shown to each user → no agreement, no tracking data
      - 2 data schemas from the client:
          - Actual data with GUID (Globally Unique Identifier, generated at the client site)
          - Agreement with GUID and User ID (for data warehouse joins)
      - Essential for handling the right to be forgotten without requiring big data deletes
      - Click: { GUID, [data] }
      - Agreement: { GUID, UserID, [ true | false ] }
  • 7. About the data – GDPR Compliance
      - Click1: { A1, "File.Open" }
      - Agreement: { A1, User42, true }
      - Click2: { A1, "File.Quit" }
      - Big Data World …
  • 8. About the data – GDPR Compliance (same records as on slide 7)
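The two-schema idea from slides 6 to 8 can be sketched in a few lines. This is a minimal illustration, not DATEV's implementation: the class names, field names, and both helper functions are hypothetical, and the real join runs against a data warehouse rather than in-memory lists. It shows why removing one small agreement record is enough to cut the link between click data and a user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Click:
    guid: str   # client-generated GUID, the only identifier on tracking data
    data: str   # e.g. the tracked interaction


@dataclass(frozen=True)
class Agreement:
    guid: str
    user_id: str
    agreed: bool


def join_clicks_to_users(clicks, agreements):
    """Resolve clicks to user IDs via the agreement records (hypothetical helper)."""
    user_by_guid = {a.guid: a.user_id for a in agreements if a.agreed}
    return [(user_by_guid[c.guid], c.data) for c in clicks if c.guid in user_by_guid]


def forget_user(agreements, user_id):
    """Drop only the agreement records of one user; click data stays untouched."""
    return [a for a in agreements if a.user_id != user_id]


# Records taken from the slide example.
clicks = [Click("A1", "File.Open"), Click("A1", "File.Quit")]
agreements = [Agreement("A1", "User42", True)]

before = join_clicks_to_users(clicks, agreements)
after = join_clicks_to_users(clicks, forget_user(agreements, "User42"))
```

After `forget_user`, the click records still exist but can no longer be joined to `User42`, so no large delete over the click data is needed.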
  • 9. About the data – Current Figures
      - 60 GB log files per day (decompressed)
      - 200 million events per day (6,000/s)
      - Components with 1,250 dynamic trace points
      - Further labels: Agreements, Consent Rate, Startup of every Application, Total Client Events in Hadoop Cluster (billion), Unique Users per day
      - Further values: 1, 2, 30, 200,000, approx. 50, 83%
  • 10. Image slide. © Galusha Photography / fotolia.com
  • 11. In early 2015 we tried using an online tracking tool. © kirill_makarov / fotolia.com
  • 12. …starting in 2016 we experimented with. © Henry Schmitt / fotolia.com
  • 13. …at the end of 2016 it settled down to be a more mature approach. © joerg dirmeitis / fotolia.com
  • 14. Actual Processing
      - Diagram labels: On-premises, Internet, HTTPS, DMZ, ISA, Tracking Server, Data Center, Hadoop Cluster, Reporting
      - DEV: team of 7, including devs, data scientist, master of ceremony, requirements engineer, and product owner
      - OP: team of 2, operating the data center platforms
  • 15. Actual Processing (same diagram as slide 14, without the team annotations)
  • 16. Actual Processing – Client (HTTPS)
      - Continuous monitoring of client logs using a ring buffer (remember: no individual agreement, no data)
      - On-premises clients send data every 3 hours (sending times are randomly distributed, based on installation time) → continuous flow of data
      - We dogfood the client-side data tracking itself, e.g. buffer overruns, CPU, and memory usage
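One way to spread the 3-hour uploads is to derive a fixed offset within the interval from each client's installation time. The sketch below assumes that approach; the slide only states that the distribution is based on installation time, so the function names and the modulo scheme are illustrative guesses.

```python
from datetime import datetime, timedelta, timezone

SEND_INTERVAL = timedelta(hours=3)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def send_offset_seconds(installed_at: datetime) -> int:
    """Offset inside the 3-hour interval, derived from the installation time."""
    interval = int(SEND_INTERVAL.total_seconds())
    return int((installed_at - EPOCH).total_seconds()) % interval


def next_send_time(installed_at: datetime, now: datetime) -> datetime:
    """Earliest send slot at or after `now` for this installation."""
    interval = int(SEND_INTERVAL.total_seconds())
    now_seconds = int((now - EPOCH).total_seconds())
    wait = (send_offset_seconds(installed_at) - now_seconds) % interval
    return now + timedelta(seconds=wait)


installed = datetime(2019, 4, 22, 10, 17, 30, tzinfo=timezone.utc)
midnight = datetime(2019, 4, 23, 0, 0, 0, tzinfo=timezone.utc)
first_slot = next_send_time(installed, midnight)
second_slot = next_send_time(installed, first_slot + timedelta(seconds=1))
```

Because installation times differ between clients, their offsets differ too, so the server sees a steady trickle of uploads instead of a burst every 3 hours.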
  • 17. Actual Processing – Ingestion
      - Proprietary protocol to get from ISA to Cluster (DMZ)
      - Transfers incoming unsecured data to the secure data center every 5 minutes → continuous flow of data to the Hadoop Edge Node
  • 18. Actual Processing – Ingestion
      - CRON & Batch: once every night, data gets processed
          - Decompress
          - Filter (valid timestamp, test data)
          - Store and upload to HDFS in file chunks of 100 MB
  • 19. Actual Processing – ETL Phase 1
      - CRON & Batch: once every night, data gets processed
      - Start Spark job for agreement data
      - Start Spark jobs for hot data (window of 5 days):
          - De-duplicate data
          - Add data that was received late
          - Generate ORC files with data partitioned by day
          - Optimize partitions (e.g. delete outdated partitions due to retention policy)
          - Automated check of internal compliance regulations (data must not contain customer-confidential data)
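The hot-data steps can be illustrated without a cluster. The following plain-Python sketch stands in for the Spark jobs named on the slide; the event keys (`guid`, `ts`, `action`), the choice of de-duplication key, and the function name are assumptions for illustration only. It rebuilds the day partitions of a 5-day window, dropping duplicates and picking up late-arriving events for earlier days.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta

HOT_WINDOW_DAYS = 5


def rebuild_hot_partitions(events, today):
    """Group de-duplicated events of the hot window by day (hypothetical helper)."""
    oldest = today - timedelta(days=HOT_WINDOW_DAYS - 1)
    partitions = defaultdict(list)
    seen = set()
    for event in events:
        day = event["ts"].date()
        if not (oldest <= day <= today):
            continue  # outside the hot window, left to the retention policy
        key = (event["guid"], event["ts"], event["action"])
        if key in seen:
            continue  # duplicate delivery of the same event
        seen.add(key)
        partitions[day].append(event)
    return dict(partitions)


events = [
    {"guid": "A1", "ts": datetime(2019, 4, 22, 9, 0), "action": "File.Open"},
    {"guid": "A1", "ts": datetime(2019, 4, 22, 9, 0), "action": "File.Open"},  # duplicate
    {"guid": "A1", "ts": datetime(2019, 4, 20, 8, 0), "action": "File.Quit"},  # arrived late
    {"guid": "A1", "ts": datetime(2019, 4, 10, 8, 0), "action": "File.Open"},  # too old
]
partitions = rebuild_hot_partitions(events, today=date(2019, 4, 22))
```

Reprocessing the whole window every night is what lets late data land in the correct day partition without special handling.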
  • 20. Actual Processing – ETL Phase 2
      - Start Spark jobs to update data for reports
      - Generate ORC files for Star Schema (facts and dimensions)
      - Aggregations and calculations for reporting
      - Update files of the report tool incrementally by reading ORC files using Hive ODBC (external tables)
  • 21. HDP 2.6.5 Production Cluster
      - Data center with two racks (Rack 1, Rack 2); roles shown: Edge, Master, Workers
      - Node labels shown: …0001, …0003, …000, …0015, …0016, …0002, …0004, …0006, …0013, …0014
      - Node specs shown: 48 cores, 512 GB RAM, RHEL 7, with either 16 TB or 1 TB HDD
  • 22. Reporting: Guided Analytics using. © Saklakova / fotolia.com
  • 23. Actual Processing – Reporting: 22 different default reports
      - UX (including click counts)
      - Exceptions
      - Performance
  • 24. Actual Processing – Reporting Example: Top 10 Screen Resolution (chart)
  • 25. Actual Processing – Reporting Example: Top 10 Screen Resolution by Target Market (chart; categories: Clients / Companies, Tax Consultants, Data Warehouse, Other, Lawyers)
  • 26. Actual Processing – Reporting Example: Program Usage by Target Market (chart; axes: Member Type, Member Count from 0 to 25,000; categories: Clients / Companies, Tax Consultants, Data Warehouse, Education Institutes, Public Sector, Lecturer, Other)
  • 27. Image slide. © artiemedvedev / fotolia.com
  • 28. Business Values
      - UX, e.g. optimized screen resolution
      - Check the actual program usage of "Paid Beta Testers"
      - A/B comparison (usage and performance)
      - Proof of sales license bundles
      - Performance anomaly detection, e.g. by OS
  • 29. Business Values
      - Discontinuation of over 10 applications and over 30 features within apps → saves hours in dev and support → €
      - Detailed field analysis for a new application → "saved trouble" from 4,500 customers caused by missing features
      - Counting of real SQL Server licenses in use → saves €
  • 30. Image slide. © bluedesign / fotolia.com
  • 31. Obstacles. © gustavofrazao / fotolia.com
      - Too many different reports requested
      - Too many domain/application specific reports
      - Too much domain specific know-how required
      - Requested to support more data sources like Splunk, AppDynamics, and online apps
  • 32. Evolve from Guided Analytics…
      - Diagram labels: On-Prem, Statistics Data, Program Statistics, Add. Data Warehouse, Statistics Team only, Producer, Consumers, POs, Standard Reports
  • 33. Self-Service Analytics. © vege / fotolia.com
      - Decentralize analytics
      - Open report generation for more users
      - Supporting ad-hoc SQL queries using Hive 3 + LLAP
      - Supporting Excel (remember: Excel is king for BI)
  • 34. …to Self-Service Analytics
      - Diagram labels: On-Prem, Online, Statistics Data, Source A, Data Abstraction, Data Catalog, Reporting Environment, Data Scientist, Power User, Producers, Consumers, Manager, Data Governance Process, Publishing Workflow, Program Statistics, Add. Data Warehouse, Source B, Source …
  • 35. New Challenges. © Neyro / fotolia.com
      - Data Governance / Guidance for KPIs
      - Teaching
      - Data literacy
  • 36. Self-Service Analytics PoC Example: Exception Path Analysis using Kibana + Elasticsearch
  • 37. Self-Service Analytics PoC Example: Exception Path Analysis using Kibana + Elasticsearch
  • 38. Self-Service Analytics PoC Example: Number of Exceptions on DVD after Release using Qlik Sense (example data only)
  • 39. Self-Service Analytics PoC Example: Top 5 Exceptions by DVDs using Qlik Sense (example data only)
  • 40. Image slide. © abramsdesign / fotolia.com
  • 41. Image slide. © Brian Jackson / fotolia.com