DOES MORE DATA MEAN BETTER DECISION MAKING?
(Assessing Data Quality with a Unified-Log and a bit of Stream-Processing)
Scott Krueger
Data Architect
A TALE OF DATA-DRIVEN DECISION MAKING - ACT ONE
Me: "@StreamEngine - how many user sessions have we had from
Illinois in the last 4 hours from those #ChicagoRocks tweets?"
StreamEngine: "In the last 4 hours, we have had 43,578 new user sessions
from Ohio as a result of the #ChicagoRocks tweets."
Me: "How confident are you about that?" (I need to make a
quick call here)
StreamEngine: "Sorry, what was that?"
WHY ARE WE TALKING ABOUT THIS
Analytical Sciences
Data Quality
Business and Culture
Tech
Talks at Big Data Week, London 2016
2020
MOTIVATIONS
http://lemonly.com/work/the-cost-of-bad-data/
MOTIVATIONS
"From now on, our cars will more deeply understand that
buses (and other large vehicles) are less likely to yield to
us than other types of vehicles, and we hope to handle
situations like this more gracefully in the future."
http://www.bbc.co.uk/news/technology-35692845
SO WHAT'S GOING ON HERE? WE ARE CREATORS OF POOR QUALITY DATA
Machines (we make machines that consume / create data)
Software (we create software that consumes / creates data sets)
photo: Faruk Ates, https://flic.kr/p/stxXK
COMMON DATA PROBLEMS
https://github.com/Quartz/bad-data-guide
THE 2 (OR SO) PILLARS OF DATA QUALITY
Pillar 1: Data Integrity
Data completeness - is it there? Is it intact? Are the ‘required-to-be-of-value’ fields there?
Data interpretation - what is that thing? What does ‘cost’ mean?
Data change - we don't use this anymore, so I’m not writing it anymore. Oh, you’re still reading it?
Pillar 2: Data Validity - what's in there?
Do the values make sense?
Are the values expected?
Data presence* - are the messages making it? Are there as many as there should be when data was created? Is this an expected volume?
http://www.newyorkinternationallimousines.com/
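A minimal Python sketch of how the completeness and presence questions above could be turned into checks. The field names, the tolerance, and both function names are hypothetical and do not come from the talk.

```python
# Hypothetical 'required-to-be-of-value' fields for a session event.
REQUIRED_FIELDS = {"session_id", "timestamp", "region"}


def missing_fields(event: dict) -> set:
    """Completeness: return the required fields that are absent or empty."""
    return {f for f in REQUIRED_FIELDS if event.get(f) in (None, "")}


def volume_is_expected(observed: int, expected: int, tolerance: float = 0.2) -> bool:
    """Presence: is the observed message count within +/- tolerance of the expected count?"""
    if expected <= 0:
        return observed == 0
    return abs(observed - expected) <= tolerance * expected
```

For example, `missing_fields({"session_id": "s1", "region": ""})` reports `timestamp` and `region`, and `volume_is_expected(500, 1000)` is false because half the messages went missing.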
DATA INTEGRITY
A definition of what the data is so it can be turned into a meaningful
piece or set of information
Varies with approach and ‘structure’ of data
Event Schemas (not to be confused with the relational DB term)
Examples: Protocol Buffers, Thrift, Avro
DATA INTEGRITY: THE GREAT TRADE-OFF
Somewhere between left and right something has to prepare data for
usage elsewhere. There is a cost associated with every position.
Data-In Data-Out
Schema-on-Write Schema-on-Read
Schema-Inbetween
A TALE OF DATA-DRIVEN DECISION MAKING - ACT TWO
Me: "@StreamEngine - I would like to measure how effective all of our
data-driven decision making is. I need a measure of quality. I think you
can help me with this."
StreamEngine: "Are you from the future?"
Me: "I'm not from it, but I'm thinking about it..."
WHAT ARE WE DOING ABOUT IT?
This requires a brief understanding of our 'unified log' approach at
Skyscanner
SKYSCANNER EVENT DATA PLATFORM
EXPLOIT THE PLATFORM - MAKE USE OF WHAT
YOU HAVE
DATA INTEGRITY (SCHEMA VALIDATION)
Data definition - a message that doesn't fit throws an exception
Try...Catch...Log To SchemaValidation Failure “stream/topic” with message
SchemaValidation
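The try / catch / log pattern on this slide could be sketched roughly as follows. This is an illustration in plain Python, not the Skyscanner implementation: the schema, the `SchemaValidationError` class, and the use of in-memory lists as stand-ins for stream topics are all assumptions.

```python
import json


class SchemaValidationError(Exception):
    """Raised when a message does not fit the event definition."""


# Hypothetical event definition: field name -> required Python type.
SESSION_SCHEMA = {"session_id": str, "region": str, "duration_ms": int}


def validate(message: dict, schema: dict) -> dict:
    """Return the message unchanged, or raise if it does not fit the schema."""
    for field, expected_type in schema.items():
        if field not in message:
            raise SchemaValidationError(f"missing field: {field}")
        if not isinstance(message[field], expected_type):
            raise SchemaValidationError(f"wrong type for field: {field}")
    return message


def route(raw: bytes, good_topic: list, failure_topic: list) -> None:
    """Try to validate; on failure, log the message and reason to the failure topic."""
    try:
        message = json.loads(raw)
        if not isinstance(message, dict):
            raise SchemaValidationError("message is not an object")
        good_topic.append(validate(message, SESSION_SCHEMA))
    except (ValueError, SchemaValidationError) as exc:
        # Lists stand in for stream topics in this sketch.
        failure_topic.append({"error": str(exc), "raw": raw.decode("utf-8", "replace")})
```

Keeping the rejected message alongside the reason means consumers of the failure topic can debug the producer without replaying the original stream.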
DATA VALIDATION: EVERYONE PLAYS A PART
Everywhere between left and right everything has a data validation
opportunity
Data-In Data-Out
Validation-on-Write Validation-on-Read
Validation-Inbetween
DataValidation Reference YAML configuration
DataValidation Flow
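The slides name a YAML reference configuration but the transcript does not show its contents. As a rough sketch of the idea, the rules below are written as a Python dict standing in for that configuration; the rule keys (`allowed`, `min`, `max`), field names, and limits are all hypothetical.

```python
# Hypothetical validation rules; a dict stands in for the shared YAML file.
RULES = {
    "region": {"allowed": {"IL", "OH", "NY"}},
    "duration_ms": {"min": 0, "max": 86_400_000},
}


def validation_issues(event: dict, rules: dict = RULES) -> list:
    """Return a list of human-readable issues; an empty list means the event passed."""
    issues = []
    for field, rule in rules.items():
        value = event.get(field)
        if value is None:
            issues.append(f"{field}: missing")
            continue
        if "allowed" in rule and value not in rule["allowed"]:
            issues.append(f"{field}: unexpected value {value!r}")
        if "min" in rule and value < rule["min"]:
            issues.append(f"{field}: below minimum")
        if "max" in rule and value > rule["max"]:
            issues.append(f"{field}: above maximum")
    return issues
```

Because the rules are data rather than code, the same set can be applied on write, in between, or on read.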
WHAT DOES IT ALL LOOK LIKE?
alert!
alert!
TO IMPROVE QUALITY…
… IS TO CLOSE THE LOOP
EVERYONE PLAYS A PART
A shared repository for event structure and validation rules
Any service that logs events runs automated tests that use this repository
A generic stream service that assesses data quality and gives the heads up to consumers
Data consumers who find new data quality mishaps commit back to the repository
IF YOU CAN'T MEASURE IT HOW DO YOU KNOW
YOU ARE IMPROVING THINGS?
Quality of decision making (%) = 100 -
(((# of data issues detected + # of recent commits for improved detection)
/ # of high quality events logged) * 100)
example: 99.8%
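The formula above can be written as a small Python function. The function and parameter names are made up for illustration, and the counts in the usage note are invented numbers chosen to reproduce the 99.8% example, not figures from the talk.

```python
def decision_quality(issues_detected: int,
                     detection_commits: int,
                     high_quality_events: int) -> float:
    """Quality of decision making as a percentage, per the slide's formula."""
    if high_quality_events <= 0:
        raise ValueError("high_quality_events must be positive")
    penalty = (issues_detected + detection_commits) / high_quality_events
    return 100 - penalty * 100
```

With 15 detected issues, 5 detection commits, and 10,000 high quality events, the penalty is 20 / 10,000 = 0.2%, which gives a score of about 99.8.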
TIPS FOR IMPROVEMENT
Master Data Management
Metadata Management
Handling Data Change
Culture
MASTER DATA MANAGEMENT
(“ONE SOURCE OF REFERENCE/LOOKUP DATA”)
Simple rules to maintain consistency of reference data
use of enums/constants in schemas for reference data sets you
don't provide in your systems
authoritative data sources (your internal data services; industry
standard sets e.g. "IATA" for travel, ISO geography/timezones etc.)
Bring this ref data as close to the processing as possible
API, csv / json, tables, trans logs -> Unified Log Topic
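One way to read the "enums/constants" rule above, sketched in Python. The three IATA airport codes are real, but the `Airport` enum and `parse_airport` helper are hypothetical, and the subset is illustrative only; a real system would load the full set from an authoritative source.

```python
from enum import Enum


class Airport(Enum):
    # Small illustrative subset of IATA airport codes.
    ORD = "Chicago O'Hare"
    LHR = "London Heathrow"
    EDI = "Edinburgh"


def parse_airport(code: str) -> Airport:
    """Map free-text input onto the reference set, rejecting anything unknown."""
    try:
        return Airport[code.strip().upper()]
    except KeyError:
        raise ValueError(f"unknown airport code: {code!r}") from None
```

Rejecting unknown codes at the point of logging keeps inconsistent reference values out of the unified log in the first place.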
METADATA MANAGEMENT
(DATA PROVENANCE AND OTHER NICETIES)
Data Debugging
Data flow measurements - how long did it take for my message to go through this pipeline?
Historical records - transparency for everyone (you the business operator, and you the customer)
Governance and regulation - data quality laws? http://www.forbes.com/sites/forbestechcouncil/2016/04/29/how-companies-can-leverage-real-time-platforms-and-metadata-to-improve-healthcare-delivery/2/#53fbea89480b
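As a sketch of the data flow measurement mentioned above: if each pipeline stage appends its name and a timestamp to the message's provenance metadata, per-hop latency falls out directly. The provenance layout, stage names, and timestamps here are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def hop_latencies(provenance: list) -> list:
    """Given (stage_name, timestamp) pairs in pipeline order,
    return (from_stage, to_stage, seconds) for each consecutive hop."""
    return [
        (a_name, b_name, (b_time - a_time).total_seconds())
        for (a_name, a_time), (b_name, b_time) in zip(provenance, provenance[1:])
    ]


# Hypothetical provenance trail for one message.
start = datetime(2016, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
trail = [
    ("producer", start),
    ("validator", start + timedelta(seconds=2)),
    ("archive", start + timedelta(seconds=7)),
]
latencies = hop_latencies(trail)
```

Summing the third element of each tuple gives the end-to-end time for the message.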
DATA CHANGE: WE DON'T ALWAYS GET IT RIGHT
Event definition changes over time:
T1:
+ float device_diagonal_screen_size = 19;
T2:
+ float device_diagonal_screen_size = 19 [deprecated=true];
+ DisplayMeasurement diagonal_screen_size = 26;
T3:
+ // float device_diagonal_screen_size = 19 [deprecated=true];
+ DisplayMeasurement diagonal_screen_size = 26;
2 rules:
1. Maintain backwards compatibility.
2. Rebuild.
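On the consumer side, rule 1 might look something like the Python sketch below, which reads the new field when present and falls back to the deprecated one for older events. The two field names come from the slides; the shape of `DisplayMeasurement` (a `value` plus optional `unit`) and the function name are assumptions, since the slides do not show that message.

```python
def read_diagonal_screen_size(event: dict):
    """Prefer the new field; fall back to the deprecated one for old events."""
    new = event.get("diagonal_screen_size")
    if new is not None:
        # Assumed DisplayMeasurement shape: {"value": float, "unit": str}.
        return {"value": new["value"], "unit": new.get("unit")}
    old = event.get("device_diagonal_screen_size")
    if old is not None:
        # The old float field carried no unit information.
        return {"value": old, "unit": None}
    return None
```

A reader written this way keeps working across all three points in time, which is what makes a later rebuild of historical data possible.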
DATA QUALITY CULTURE
photo: scott krueger
WHAT ARE YOU GOING TO DO ABOUT IT?
Understand the causes and details of data problems in your services
Unify and simplify - one source of truth for everything: reference data,
reports, archive, formulae, validation rules, data definitions, metadata
Start measuring - this is your baseline and allows you to measure
confidence in decision making
Work (or evolve) your tech
Fix the system, stop moaning about it
It’s never too late
A TALE OF DATA-DRIVEN DECISION MAKING - ACT THREE - THE FINALE
Me: "@StreamEngine - give me some decision quality numbers!"
StreamEngine: "Right now, decision quality is at 96.05%. This
time last week it was 92.4%. Well done! In 1 week, sales are up
2% and this is positively correlated with the decisions you made
with this data. Would you like me to predict sales uplift over the
next month if you improve decision quality by 1%?"
Me: "You bet I would..."
THANKS FOR LISTENING

More Related Content

What's hot

Building the Next-gen Digital Meter Platform for Fluvius
Building the Next-gen Digital Meter Platform for FluviusBuilding the Next-gen Digital Meter Platform for Fluvius
Building the Next-gen Digital Meter Platform for Fluvius
Databricks
 
The Fast Path to Building Operational Applications with Spark
The Fast Path to Building Operational Applications with SparkThe Fast Path to Building Operational Applications with Spark
The Fast Path to Building Operational Applications with Spark
SingleStore
 
First Steps with Apache Kafka on Google Cloud Platform
First Steps with Apache Kafka on Google Cloud PlatformFirst Steps with Apache Kafka on Google Cloud Platform
First Steps with Apache Kafka on Google Cloud Platform
confluent
 
Driving the On-Demand Economy with Predictive Analytics
Driving the On-Demand Economy with Predictive AnalyticsDriving the On-Demand Economy with Predictive Analytics
Driving the On-Demand Economy with Predictive Analytics
SingleStore
 
Petabridge: The New .NET Enterprise Stack
Petabridge: The New .NET Enterprise StackPetabridge: The New .NET Enterprise Stack
Petabridge: The New .NET Enterprise Stack
DataStax Academy
 
O'Reilly Media Webcast: Building Real-Time Data Pipelines
O'Reilly Media Webcast: Building Real-Time Data PipelinesO'Reilly Media Webcast: Building Real-Time Data Pipelines
O'Reilly Media Webcast: Building Real-Time Data Pipelines
SingleStore
 
Auto-Train a Time-Series Forecast Model With AML + ADB
Auto-Train a Time-Series Forecast Model With AML + ADBAuto-Train a Time-Series Forecast Model With AML + ADB
Auto-Train a Time-Series Forecast Model With AML + ADB
Databricks
 
Microservice Plumbing - Glynn Bird - Codemotion Rome 2017
Microservice Plumbing  - Glynn Bird - Codemotion Rome 2017Microservice Plumbing  - Glynn Bird - Codemotion Rome 2017
Microservice Plumbing - Glynn Bird - Codemotion Rome 2017
Codemotion
 
Bank of China (HK) Tech Talk 1: Dive Into Apache Kafka
Bank of China (HK) Tech Talk 1: Dive Into Apache KafkaBank of China (HK) Tech Talk 1: Dive Into Apache Kafka
Bank of China (HK) Tech Talk 1: Dive Into Apache Kafka
confluent
 
Data Transformations on Ops Metrics using Kafka Streams (Srividhya Ramachandr...
Data Transformations on Ops Metrics using Kafka Streams (Srividhya Ramachandr...Data Transformations on Ops Metrics using Kafka Streams (Srividhya Ramachandr...
Data Transformations on Ops Metrics using Kafka Streams (Srividhya Ramachandr...
confluent
 
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and SparkSpark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
SingleStore
 
Dataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data ProcessingDataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data Processing
DoiT International
 
Machines and the Magic of Fast Learning
Machines and the Magic of Fast LearningMachines and the Magic of Fast Learning
Machines and the Magic of Fast Learning
SingleStore
 
Implementing a canonical IoT backend in Azure with Azure Stream Analytics
Implementing a canonical IoT backend in Azure with Azure Stream AnalyticsImplementing a canonical IoT backend in Azure with Azure Stream Analytics
Implementing a canonical IoT backend in Azure with Azure Stream Analytics
Marco Parenzan
 
Integrating Web and Business Data
Integrating Web and Business DataIntegrating Web and Business Data
Integrating Web and Business Data
Safe Software
 
Internet of Things and Multi-model Data Infrastructure
Internet of Things and Multi-model Data InfrastructureInternet of Things and Multi-model Data Infrastructure
Internet of Things and Multi-model Data Infrastructure
SingleStore
 
INTRODUCING: CREATE PIPELINE
INTRODUCING: CREATE PIPELINEINTRODUCING: CREATE PIPELINE
INTRODUCING: CREATE PIPELINE
SingleStore
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
confluent
 
Event Driven Architecture: Mistakes, I've made a few...
Event Driven Architecture: Mistakes, I've made a few...Event Driven Architecture: Mistakes, I've made a few...
Event Driven Architecture: Mistakes, I've made a few...
confluent
 
Jay Kreps | Kafka Summit NYC 2019 Keynote (Events Everywhere) | CEO, Confluent
Jay Kreps | Kafka Summit NYC 2019 Keynote (Events Everywhere) | CEO, ConfluentJay Kreps | Kafka Summit NYC 2019 Keynote (Events Everywhere) | CEO, Confluent
Jay Kreps | Kafka Summit NYC 2019 Keynote (Events Everywhere) | CEO, Confluent
confluent
 

What's hot (20)

Building the Next-gen Digital Meter Platform for Fluvius
Building the Next-gen Digital Meter Platform for FluviusBuilding the Next-gen Digital Meter Platform for Fluvius
Building the Next-gen Digital Meter Platform for Fluvius
 
The Fast Path to Building Operational Applications with Spark
The Fast Path to Building Operational Applications with SparkThe Fast Path to Building Operational Applications with Spark
The Fast Path to Building Operational Applications with Spark
 
First Steps with Apache Kafka on Google Cloud Platform
First Steps with Apache Kafka on Google Cloud PlatformFirst Steps with Apache Kafka on Google Cloud Platform
First Steps with Apache Kafka on Google Cloud Platform
 
Driving the On-Demand Economy with Predictive Analytics
Driving the On-Demand Economy with Predictive AnalyticsDriving the On-Demand Economy with Predictive Analytics
Driving the On-Demand Economy with Predictive Analytics
 
Petabridge: The New .NET Enterprise Stack
Petabridge: The New .NET Enterprise StackPetabridge: The New .NET Enterprise Stack
Petabridge: The New .NET Enterprise Stack
 
O'Reilly Media Webcast: Building Real-Time Data Pipelines
O'Reilly Media Webcast: Building Real-Time Data PipelinesO'Reilly Media Webcast: Building Real-Time Data Pipelines
O'Reilly Media Webcast: Building Real-Time Data Pipelines
 
Auto-Train a Time-Series Forecast Model With AML + ADB
Auto-Train a Time-Series Forecast Model With AML + ADBAuto-Train a Time-Series Forecast Model With AML + ADB
Auto-Train a Time-Series Forecast Model With AML + ADB
 
Microservice Plumbing - Glynn Bird - Codemotion Rome 2017
Microservice Plumbing  - Glynn Bird - Codemotion Rome 2017Microservice Plumbing  - Glynn Bird - Codemotion Rome 2017
Microservice Plumbing - Glynn Bird - Codemotion Rome 2017
 
Bank of China (HK) Tech Talk 1: Dive Into Apache Kafka
Bank of China (HK) Tech Talk 1: Dive Into Apache KafkaBank of China (HK) Tech Talk 1: Dive Into Apache Kafka
Bank of China (HK) Tech Talk 1: Dive Into Apache Kafka
 
Data Transformations on Ops Metrics using Kafka Streams (Srividhya Ramachandr...
Data Transformations on Ops Metrics using Kafka Streams (Srividhya Ramachandr...Data Transformations on Ops Metrics using Kafka Streams (Srividhya Ramachandr...
Data Transformations on Ops Metrics using Kafka Streams (Srividhya Ramachandr...
 
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and SparkSpark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
 
Dataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data ProcessingDataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data Processing
 
Machines and the Magic of Fast Learning
Machines and the Magic of Fast LearningMachines and the Magic of Fast Learning
Machines and the Magic of Fast Learning
 
Implementing a canonical IoT backend in Azure with Azure Stream Analytics
Implementing a canonical IoT backend in Azure with Azure Stream AnalyticsImplementing a canonical IoT backend in Azure with Azure Stream Analytics
Implementing a canonical IoT backend in Azure with Azure Stream Analytics
 
Integrating Web and Business Data
Integrating Web and Business DataIntegrating Web and Business Data
Integrating Web and Business Data
 
Internet of Things and Multi-model Data Infrastructure
Internet of Things and Multi-model Data InfrastructureInternet of Things and Multi-model Data Infrastructure
Internet of Things and Multi-model Data Infrastructure
 
INTRODUCING: CREATE PIPELINE
INTRODUCING: CREATE PIPELINEINTRODUCING: CREATE PIPELINE
INTRODUCING: CREATE PIPELINE
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
 
Event Driven Architecture: Mistakes, I've made a few...
Event Driven Architecture: Mistakes, I've made a few...Event Driven Architecture: Mistakes, I've made a few...
Event Driven Architecture: Mistakes, I've made a few...
 
Jay Kreps | Kafka Summit NYC 2019 Keynote (Events Everywhere) | CEO, Confluent
Jay Kreps | Kafka Summit NYC 2019 Keynote (Events Everywhere) | CEO, ConfluentJay Kreps | Kafka Summit NYC 2019 Keynote (Events Everywhere) | CEO, Confluent
Jay Kreps | Kafka Summit NYC 2019 Keynote (Events Everywhere) | CEO, Confluent
 

Viewers also liked

BDW16 London - Amjad Zaim, Cognitro Analytics: How Deep is Your Learning
BDW16 London - Amjad Zaim, Cognitro Analytics: How Deep is Your Learning BDW16 London - Amjad Zaim, Cognitro Analytics: How Deep is Your Learning
BDW16 London - Amjad Zaim, Cognitro Analytics: How Deep is Your Learning
Big Data Week
 
BDW16 London - Charlie Ballard, TripAdvisor - TripAdvisor and Constant Change...
BDW16 London - Charlie Ballard, TripAdvisor - TripAdvisor and Constant Change...BDW16 London - Charlie Ballard, TripAdvisor - TripAdvisor and Constant Change...
BDW16 London - Charlie Ballard, TripAdvisor - TripAdvisor and Constant Change...
Big Data Week
 
BDW16 London - Ingrid Funie, Imperial College London - Machine Learning and F...
BDW16 London - Ingrid Funie, Imperial College London - Machine Learning and F...BDW16 London - Ingrid Funie, Imperial College London - Machine Learning and F...
BDW16 London - Ingrid Funie, Imperial College London - Machine Learning and F...
Big Data Week
 
BDW16 London - John Callan, Boxever - Data and Analytics - The Fuel Your Bran...
BDW16 London - John Callan, Boxever - Data and Analytics - The Fuel Your Bran...BDW16 London - John Callan, Boxever - Data and Analytics - The Fuel Your Bran...
BDW16 London - John Callan, Boxever - Data and Analytics - The Fuel Your Bran...
Big Data Week
 
BDW16 London - Wael Elrifai, Pentaho - Big Data-Driven Innovatiom
BDW16 London - Wael Elrifai, Pentaho - Big Data-Driven InnovatiomBDW16 London - Wael Elrifai, Pentaho - Big Data-Driven Innovatiom
BDW16 London - Wael Elrifai, Pentaho - Big Data-Driven Innovatiom
Big Data Week
 
BDW16 London - Josh Partridge, Shazam - How Labels, Radio Stations and Brand...
BDW16 London - Josh Partridge, Shazam -  How Labels, Radio Stations and Brand...BDW16 London - Josh Partridge, Shazam -  How Labels, Radio Stations and Brand...
BDW16 London - Josh Partridge, Shazam - How Labels, Radio Stations and Brand...
Big Data Week
 
BDW16 London - Mark van Rijmenam, Datafloq - Big Data is Dead, Long Live Big ...
BDW16 London - Mark van Rijmenam, Datafloq - Big Data is Dead, Long Live Big ...BDW16 London - Mark van Rijmenam, Datafloq - Big Data is Dead, Long Live Big ...
BDW16 London - Mark van Rijmenam, Datafloq - Big Data is Dead, Long Live Big ...
Big Data Week
 
BDW16 London - Marius Boeru, Bigstep - How to Automate Big Data with Ansible
BDW16 London -  Marius Boeru, Bigstep - How to Automate Big Data with AnsibleBDW16 London -  Marius Boeru, Bigstep - How to Automate Big Data with Ansible
BDW16 London - Marius Boeru, Bigstep - How to Automate Big Data with Ansible
Big Data Week
 
BDW16 London - Mishal Patel, NHS - Modernising Routine Breast Cancer Using Bi...
BDW16 London - Mishal Patel, NHS - Modernising Routine Breast Cancer Using Bi...BDW16 London - Mishal Patel, NHS - Modernising Routine Breast Cancer Using Bi...
BDW16 London - Mishal Patel, NHS - Modernising Routine Breast Cancer Using Bi...
Big Data Week
 
BDW16 London - Deenar Toraskar, Think Reactive - Fast Data Key to Efficient C...
BDW16 London - Deenar Toraskar, Think Reactive - Fast Data Key to Efficient C...BDW16 London - Deenar Toraskar, Think Reactive - Fast Data Key to Efficient C...
BDW16 London - Deenar Toraskar, Think Reactive - Fast Data Key to Efficient C...
Big Data Week
 
BDW16 London - John Belchamber, Telefonica - New Data, New Strategies, New Op...
BDW16 London - John Belchamber, Telefonica - New Data, New Strategies, New Op...BDW16 London - John Belchamber, Telefonica - New Data, New Strategies, New Op...
BDW16 London - John Belchamber, Telefonica - New Data, New Strategies, New Op...
Big Data Week
 
BDW16 London - Nondas Sourlas, Bupa - Big Data in Healthcare
BDW16 London  - Nondas Sourlas, Bupa - Big Data in HealthcareBDW16 London  - Nondas Sourlas, Bupa - Big Data in Healthcare
BDW16 London - Nondas Sourlas, Bupa - Big Data in Healthcare
Big Data Week
 
BDW16 London - Roland Major, Transport for London - Cloud Search Secured
BDW16 London - Roland Major, Transport for London - Cloud Search SecuredBDW16 London - Roland Major, Transport for London - Cloud Search Secured
BDW16 London - Roland Major, Transport for London - Cloud Search Secured
Big Data Week
 
BDW16 London - Vojta Rocek, Trologic - Challenging Big Data
BDW16 London - Vojta Rocek, Trologic - Challenging Big DataBDW16 London - Vojta Rocek, Trologic - Challenging Big Data
BDW16 London - Vojta Rocek, Trologic - Challenging Big Data
Big Data Week
 
BDW16 London - Alex Bordei, Bigstep - Building Data Labs in the Cloud
BDW16 London - Alex Bordei, Bigstep - Building Data Labs in the CloudBDW16 London - Alex Bordei, Bigstep - Building Data Labs in the Cloud
BDW16 London - Alex Bordei, Bigstep - Building Data Labs in the Cloud
Big Data Week
 
BDW16 London - Jonny Voon, Innovate UK - Smart Cities and the Buzz Word Bingo
BDW16 London - Jonny Voon, Innovate UK - Smart Cities and the Buzz Word BingoBDW16 London - Jonny Voon, Innovate UK - Smart Cities and the Buzz Word Bingo
BDW16 London - Jonny Voon, Innovate UK - Smart Cities and the Buzz Word Bingo
Big Data Week
 
BDW16 London - Rob Anderson, MapR - Big Data and Everyday Lives
BDW16 London - Rob Anderson, MapR - Big Data and Everyday LivesBDW16 London - Rob Anderson, MapR - Big Data and Everyday Lives
BDW16 London - Rob Anderson, MapR - Big Data and Everyday Lives
Big Data Week
 
BDW16 London - Chris von Csefalvay, Helioserv - Cats and What They Tell us Ab...
BDW16 London - Chris von Csefalvay, Helioserv - Cats and What They Tell us Ab...BDW16 London - Chris von Csefalvay, Helioserv - Cats and What They Tell us Ab...
BDW16 London - Chris von Csefalvay, Helioserv - Cats and What They Tell us Ab...
Big Data Week
 
BDW16 London - Harry Powell & Raffael Strassnig, Barclays UK - Graph-Based Re...
BDW16 London - Harry Powell & Raffael Strassnig, Barclays UK - Graph-Based Re...BDW16 London - Harry Powell & Raffael Strassnig, Barclays UK - Graph-Based Re...
BDW16 London - Harry Powell & Raffael Strassnig, Barclays UK - Graph-Based Re...
Big Data Week
 
ETL Metadata Injection with Pentaho Data Integration
ETL Metadata Injection with Pentaho Data IntegrationETL Metadata Injection with Pentaho Data Integration
ETL Metadata Injection with Pentaho Data Integration
David Fombella Pombal
 

Viewers also liked (20)

BDW16 London - Amjad Zaim, Cognitro Analytics: How Deep is Your Learning
BDW16 London - Amjad Zaim, Cognitro Analytics: How Deep is Your Learning BDW16 London - Amjad Zaim, Cognitro Analytics: How Deep is Your Learning
BDW16 London - Amjad Zaim, Cognitro Analytics: How Deep is Your Learning
 
BDW16 London - Charlie Ballard, TripAdvisor - TripAdvisor and Constant Change...
BDW16 London - Charlie Ballard, TripAdvisor - TripAdvisor and Constant Change...BDW16 London - Charlie Ballard, TripAdvisor - TripAdvisor and Constant Change...
BDW16 London - Charlie Ballard, TripAdvisor - TripAdvisor and Constant Change...
 
BDW16 London - Ingrid Funie, Imperial College London - Machine Learning and F...
BDW16 London - Ingrid Funie, Imperial College London - Machine Learning and F...BDW16 London - Ingrid Funie, Imperial College London - Machine Learning and F...
BDW16 London - Ingrid Funie, Imperial College London - Machine Learning and F...
 
BDW16 London - John Callan, Boxever - Data and Analytics - The Fuel Your Bran...
BDW16 London - John Callan, Boxever - Data and Analytics - The Fuel Your Bran...BDW16 London - John Callan, Boxever - Data and Analytics - The Fuel Your Bran...
BDW16 London - John Callan, Boxever - Data and Analytics - The Fuel Your Bran...
 
BDW16 London - Wael Elrifai, Pentaho - Big Data-Driven Innovatiom
BDW16 London - Wael Elrifai, Pentaho - Big Data-Driven InnovatiomBDW16 London - Wael Elrifai, Pentaho - Big Data-Driven Innovatiom
BDW16 London - Wael Elrifai, Pentaho - Big Data-Driven Innovatiom
 
BDW16 London - Josh Partridge, Shazam - How Labels, Radio Stations and Brand...
BDW16 London - Josh Partridge, Shazam -  How Labels, Radio Stations and Brand...BDW16 London - Josh Partridge, Shazam -  How Labels, Radio Stations and Brand...
BDW16 London - Josh Partridge, Shazam - How Labels, Radio Stations and Brand...
 
BDW16 London - Mark van Rijmenam, Datafloq - Big Data is Dead, Long Live Big ...
BDW16 London - Mark van Rijmenam, Datafloq - Big Data is Dead, Long Live Big ...BDW16 London - Mark van Rijmenam, Datafloq - Big Data is Dead, Long Live Big ...
BDW16 London - Mark van Rijmenam, Datafloq - Big Data is Dead, Long Live Big ...
 
BDW16 London - Marius Boeru, Bigstep - How to Automate Big Data with Ansible
BDW16 London -  Marius Boeru, Bigstep - How to Automate Big Data with AnsibleBDW16 London -  Marius Boeru, Bigstep - How to Automate Big Data with Ansible
BDW16 London - Marius Boeru, Bigstep - How to Automate Big Data with Ansible
 
BDW16 London - Mishal Patel, NHS - Modernising Routine Breast Cancer Using Bi...
BDW16 London - Mishal Patel, NHS - Modernising Routine Breast Cancer Using Bi...BDW16 London - Mishal Patel, NHS - Modernising Routine Breast Cancer Using Bi...
BDW16 London - Mishal Patel, NHS - Modernising Routine Breast Cancer Using Bi...
 
BDW16 London - Deenar Toraskar, Think Reactive - Fast Data Key to Efficient C...
BDW16 London - Deenar Toraskar, Think Reactive - Fast Data Key to Efficient C...BDW16 London - Deenar Toraskar, Think Reactive - Fast Data Key to Efficient C...
BDW16 London - Deenar Toraskar, Think Reactive - Fast Data Key to Efficient C...
 
BDW16 London - John Belchamber, Telefonica - New Data, New Strategies, New Op...
BDW16 London - John Belchamber, Telefonica - New Data, New Strategies, New Op...BDW16 London - John Belchamber, Telefonica - New Data, New Strategies, New Op...
BDW16 London - John Belchamber, Telefonica - New Data, New Strategies, New Op...
 
BDW16 London - Nondas Sourlas, Bupa - Big Data in Healthcare
BDW16 London  - Nondas Sourlas, Bupa - Big Data in HealthcareBDW16 London  - Nondas Sourlas, Bupa - Big Data in Healthcare
BDW16 London - Nondas Sourlas, Bupa - Big Data in Healthcare
 
BDW16 London - Roland Major, Transport for London - Cloud Search Secured
BDW16 London - Roland Major, Transport for London - Cloud Search SecuredBDW16 London - Roland Major, Transport for London - Cloud Search Secured
BDW16 London - Roland Major, Transport for London - Cloud Search Secured
 
BDW16 London - Vojta Rocek, Trologic - Challenging Big Data
BDW16 London - Vojta Rocek, Trologic - Challenging Big DataBDW16 London - Vojta Rocek, Trologic - Challenging Big Data
BDW16 London - Vojta Rocek, Trologic - Challenging Big Data
 
BDW16 London - Alex Bordei, Bigstep - Building Data Labs in the Cloud
BDW16 London - Alex Bordei, Bigstep - Building Data Labs in the CloudBDW16 London - Alex Bordei, Bigstep - Building Data Labs in the Cloud
BDW16 London - Alex Bordei, Bigstep - Building Data Labs in the Cloud
 
BDW16 London - Jonny Voon, Innovate UK - Smart Cities and the Buzz Word Bingo
BDW16 London - Jonny Voon, Innovate UK - Smart Cities and the Buzz Word BingoBDW16 London - Jonny Voon, Innovate UK - Smart Cities and the Buzz Word Bingo
BDW16 London - Jonny Voon, Innovate UK - Smart Cities and the Buzz Word Bingo
 
BDW16 London - Rob Anderson, MapR - Big Data and Everyday Lives
BDW16 London - Rob Anderson, MapR - Big Data and Everyday LivesBDW16 London - Rob Anderson, MapR - Big Data and Everyday Lives
BDW16 London - Rob Anderson, MapR - Big Data and Everyday Lives
 
BDW16 London - Chris von Csefalvay, Helioserv - Cats and What They Tell us Ab...
BDW16 London - Chris von Csefalvay, Helioserv - Cats and What They Tell us Ab...BDW16 London - Chris von Csefalvay, Helioserv - Cats and What They Tell us Ab...
BDW16 London - Chris von Csefalvay, Helioserv - Cats and What They Tell us Ab...
 
BDW16 London - Harry Powell & Raffael Strassnig, Barclays UK - Graph-Based Re...
BDW16 London - Harry Powell & Raffael Strassnig, Barclays UK - Graph-Based Re...BDW16 London - Harry Powell & Raffael Strassnig, Barclays UK - Graph-Based Re...
BDW16 London - Harry Powell & Raffael Strassnig, Barclays UK - Graph-Based Re...
 
ETL Metadata Injection with Pentaho Data Integration
ETL Metadata Injection with Pentaho Data IntegrationETL Metadata Injection with Pentaho Data Integration
ETL Metadata Injection with Pentaho Data Integration
 

Similar to BDW16 London - Scott Krueger, skyscanner - Does More Data Mean Better Decision Making?

There’s data everywhere! - Simo Ahava
There’s data everywhere! - Simo AhavaThere’s data everywhere! - Simo Ahava
There’s data everywhere! - Simo Ahava
Web à Québec
 
Data quality and bi
Data quality and biData quality and bi
Data quality and bi
jeffd00
 
Jet Reports es la herramienta para construir el mejor BI y de forma mas rapida
Jet Reports es la herramienta para construir el mejor BI y de forma mas rapida  Jet Reports es la herramienta para construir el mejor BI y de forma mas rapida
Jet Reports es la herramienta para construir el mejor BI y de forma mas rapida
CLARA CAMPROVIN
 
Data quality
Data qualityData quality
Data quality
sethnainaa
 
Data quality
Data qualityData quality
Data quality
drishtipuro1234
 
CWIN17 India / Bigdata architecture yashowardhan sowale
CWIN17 India / Bigdata architecture  yashowardhan sowaleCWIN17 India / Bigdata architecture  yashowardhan sowale
CWIN17 India / Bigdata architecture yashowardhan sowale
Capgemini
 
Crosswalk
CrosswalkCrosswalk
Crosswalk
GBX Summits
 
SaaS Vs On Premise BI
SaaS Vs On Premise BISaaS Vs On Premise BI
SaaS Vs On Premise BI
LCWynne
 
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkData-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Databricks
 
Take Action: The New Reality of Data-Driven Business
Take Action: The New Reality of Data-Driven BusinessTake Action: The New Reality of Data-Driven Business
BDW16 London - Scott Krueger, skyscanner - Does More Data Mean Better Decision Making?

  • 1. DOES MORE DATA MEAN BETTER DECISION MAKING? (Assessing Data Quality with a Unified-Log and a bit of Stream-Processing) Scott Krueger Data Architect
  • 2. A TALE OF DATA-DRIVEN DECISION MAKING - ACT ONE Me: "@StreamEngine - how many user sessions have we had from Illinois in the last 4 hours from those #ChicagoRocks tweets?" StreamEngine: "In the last 4 hours, we have had 43,578 new user sessions from Ohio as a result of the #ChicagoRocks tweets" Me: "How confident are you about that?" (I need to make a quick call here) StreamEngine: "Sorry, what was that?"
  • 3. WHY ARE WE TALKING ABOUT THIS Analytical Sciences Data Quality Business and Culture Tech Talks at Big Data Week, London 2016
  • 7. MOTIVATIONS "From now on, our cars will more deeply understand that buses (and other large vehicles) are less likely to yield to us than other types of vehicles, and we hope to handle situations like this more gracefully in the future." http://www.bbc.co.uk/news/technology-35692845
  • 8. SO WHAT'S GOING ON HERE? WE ARE CREATORS OF POOR QUALITY DATA Machines (we make machines that consume / create data) Software (we create software that consumes / creates data sets) photo: Faruk Ates, https://flic.kr/p/stxXK
  • 10. Pillar 1: Data Integrity Data completeness - is it there? is it intact? are the ‘required-to-be-of-value’ fields there? Data Interpretation - what is that thing? what does ‘cost’ mean? Data change - we don't use this anymore so I’m not writing it anymore. Oh, you’re still reading it? Pillar 2: Data validity - what's in there? Values make sense? Values expected? Data presence* - are the messages making it? Are there as many as there should be when data was created? Is this an expected volume? THE 2 (OR SO) PILLARS OF DATA QUALITY http://www.newyorkinternationallimousines.com/
  • 11. DATA INTEGRITY A definition of what the data is so it can be turned into a meaningful piece or set of information Varies with approach and ‘structure’ of data Event Schemas (not to be confused with the relational DB term) Examples: Protocol Buffers, Thrift, Avro
  • 12. DATA INTEGRITY: THE GREAT TRADE-OFF Somewhere between left and right something has to prepare data for usage elsewhere. There is cost associated with every position. Data-In Data-Out Schema-on-Write Schema-on-Read Schema-Inbetween
  • 13. A TALE OF DATA-DRIVEN DECISION MAKING - ACT TWO Me: "@StreamEngine - I would like to measure how effective all of our data-driven decision making is. I need a measure of quality. I think you can help me with this." StreamEngine: "Are you from the future?" Me: "I'm not from it, but I'm thinking about it..."
  • 14. WHAT ARE WE DOING ABOUT IT? This requires a brief understanding of our 'unified log' approach at Skyscanner
  • 16. EXPLOIT THE PLATFORM - MAKE USE OF WHAT YOU HAVE
  • 17. DATA INTEGRITY (SCHEMA VALIDATION) Data definition - a message that doesn't fit throws an exception Try...Catch...Log To SchemaValidation Failure “stream/topic” with message
  • 19. DATA VALIDATION: EVERYONE PLAYS A PART Everywhere between left and right everything has a data validation opportunity Data-In Data-Out Validation-on-Write Validation-on-Read Validation-Inbetween
  • 22. WHAT DOES IT ALL LOOK LIKE? alert! alert!
  • 23. TO IMPROVE QUALITY… IS TO CLOSE THE LOOP
  • 24. EVERYONE PLAYS A PART A shared repository for event structure and validation rules. Any service that logs events runs automated tests that use this repository. A generic stream service that assesses data quality and gives the heads up to consumers. Data consumers who find new data quality mishaps commit back to the repository.
  • 25. IF YOU CAN'T MEASURE IT HOW DO YOU KNOW YOU ARE IMPROVING THINGS? Quality of decision making = 100 - (((# of data issues detected + recent commits for improved detection) / # of high-quality events logged) * 100) example: 99.8%
  • 26. TIPS FOR IMPROVEMENT Master Data Management Metadata Management Handling Data Change Culture
  • 27. MASTER DATA MANAGEMENT (“ONE SOURCE OF REFERENCE/LOOKUP DATA”) Simple rules to maintain consistency of reference data: use of enums/constants in schemas for reference data sets you don't provide in your systems; authoritative data sources (your internal data services; industry standard sets e.g. "IATA" for travel, ISO geography/timezones etc.). Bring this reference data as close to the processing as possible: API, csv / json, tables, trans logs -> Unified Log Topic
  • 28. METADATA MANAGEMENT (DATA PROVENANCE AND OTHER NICETIES) Data Debugging Data flow measurements - how long did it take for my message to go through this pipeline? Historical records - transparency for everyone (you the business operator, and you the customer) Governance and regulation - data quality laws? http://www.forbes.com/sites/forbestechcouncil/2016/04/29/how-companies-can-leverage-real-time-platforms-and-metadata-to-improve-healthcare-delivery/2/#53fbea89480b
  • 29. DATA CHANGE: WE DON'T ALWAYS GET IT RIGHT Event Definition Changes over Time: T1: + float device_diagonal_screen_size = 19; T2: + float device_diagonal_screen_size = 19 [deprecated=true]; + DisplayMeasurement diagonal_screen_size = 26; T3: + // float device_diagonal_screen_size = 19 [deprecated=true]; + DisplayMeasurement diagonal_screen_size = 26; 2 rules: 1. Maintain backwards compatibility. 2. Rebuild.
  • 31. WHAT ARE YOU GOING TO DO ABOUT IT? Understand the causes and details of data problems in your services. Unify and simplify - one source of truth for everything: reference data, reports, archive, formulae, validation rules, data definitions, metadata. Start measuring - this is your baseline and allows you to measure confidence in decision making. Work (or evolve) your tech. Fix the system, stop moaning about it. It’s never too late.
  • 32. A TALE OF DATA-DRIVEN DECISION MAKING - ACT THREE - THE FINALE Me: "@StreamEngine - give me some decision quality numbers!" StreamEngine: "Right now, decision quality is at 96.05%. This time last week it was 92.4%. Well done! In 1 week sales are up 2% and this is positively correlated to the decisions you made with this data. Would you like me to predict sales uplift over the next month if you improve decision quality by 1%?" Me: "You bet I would..."
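
A rough sketch of how slides 17 and 25 could fit together, not the implementation used at Skyscanner: events that fail a schema check are caught and diverted to a failure "topic", and the slide-25 formula is then computed from the counts. The field names in `REQUIRED_FIELDS`, the `QualityTracker` class and the in-memory lists standing in for streams are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical event definition; the talk does not list a concrete schema.
REQUIRED_FIELDS = {"event_id": str, "country": str, "session_count": int}


class SchemaValidationError(Exception):
    """Raised when an event does not fit the expected definition."""


def validate(event: dict) -> None:
    """Check that every required field is present and has the expected type."""
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in event:
            raise SchemaValidationError(f"missing field: {name}")
        if not isinstance(event[name], expected_type):
            raise SchemaValidationError(f"wrong type for field: {name}")


@dataclass
class QualityTracker:
    # Plain lists stand in for the real streams/topics (an assumption).
    good_events: list = field(default_factory=list)
    failure_topic: list = field(default_factory=list)

    def log(self, event: dict) -> None:
        """Try...catch...log: failures go to the failure topic with a message."""
        try:
            validate(event)
        except SchemaValidationError as exc:
            self.failure_topic.append({"event": event, "error": str(exc)})
        else:
            self.good_events.append(event)

    def decision_quality(self, recent_detection_commits: int = 0) -> float:
        """Slide 25: 100 - ((issues + recent commits) / high-quality events) * 100."""
        if not self.good_events:
            raise ValueError("no high-quality events logged yet")
        penalty = len(self.failure_topic) + recent_detection_commits
        return 100 - (penalty / len(self.good_events)) * 100
```

Under these assumptions, logging 1,000 valid events and one event with a missing field, then calling `decision_quality(recent_detection_commits=1)`, gives a score of about 99.8, matching the example on slide 25. How "recent commits for improved detection" should be counted is left open by the slide, so it is passed in as a plain number here.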