Making Big Data work
Lewis Crawford
Principal Architect @ the DataShed
thedatashed.co.uk
Lewis@thedatashed.co.uk
© the DataShed Limited 2015
intro
Who am I?
• For the last 3 years, the DataShed has been providing consultancy services to a vast array
of large clients. Our primary focus is ensuring that technology and analytical strategies
are truly aligned so that businesses can leverage the latest and greatest in technology to
model, mine and describe their data asset.
• We were working with Big Data technology before the term was coined. We have
experience delivering analytical systems driven by petabyte-scale data sets, and we have
designed, implemented and supported one of the largest real-time data integration and
predictive analytics platforms in the aviation world.
• Our model is based on using a small number of exceptionally skilled individuals to
deliver disruptive and innovative solutions in an agile and delivery-focused manner.
So what is ‘Big Data’?
Why do Big Data projects fail?
Too many people think that Big Data is:
“The belief that the more data you have, the more insights and
answers will rise automatically from the pool of ones and zeros.”
Gil Press, Forbes.com
How to make Big Data work?
1. Understand your problem
2. Apply appropriate tools
3. Automate everything.
Real-time data
Continuous Integration Demo
How to make Big Data work?
1. Understand your problem
2. Apply appropriate tools
3. Automate everything.
Little Big Data
A problem closer to home…
• Every business needs to understand:
• Their potential customers and market
• Current customers
• Their products and sales
• How and when they engage prospects and customers
• Analytics and data are expensive
• Many of the mandatory elements are very similar for everyone
• The DataShed is Analytics as a Service and Single Customer View as a
Service.
The deduplication problem…
• An SME has 250,000 customers (two systems of record)
• Brute-force duplicate detection compares every pair of records:
250,000 × 249,999 / 2 = 31,249,875,000 comparisons
• Building a system to process a minimum of 100 clients a day…
• At that size, roughly 3.1 trillion comparisons a day, using > 10 different algorithms
• A traditional scale-up approach would be expensive, and makes large
assumptions around blocking and partitioning rules
• A small data problem but a big data solution?
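The figures on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name is made up, the 100 clients are assumed to all be the same size as the example SME, and the blocking scenario (1,000 equal blocks, with no duplicate pair ever split across blocks) is a hypothetical chosen to show why blocking is tempting and why it rests on a large assumption.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of unordered record pairs among n records: n * (n - 1) / 2."""
    return n * (n - 1) // 2


customers = 250_000        # from the slide
clients_per_day = 100      # from the slide; assumes every client is this size

per_client = pairwise_comparisons(customers)
per_day = per_client * clients_per_day

# Hypothetical blocking scheme: 1,000 equal blocks of 250 records each,
# comparing only within a block. Any duplicate pair that straddles two
# blocks would be missed, which is the assumption the slide warns about.
blocks = 1_000
blocked = blocks * pairwise_comparisons(customers // blocks)

print(f"per client: {per_client:,}")   # per client: 31,249,875,000
print(f"per day:    {per_day:,}")      # per day:    3,124,987,500,000
print(f"blocked:    {blocked:,}")      # blocked:    31,125,000
```

Each comparison would then be repeated for every matching algorithm in use, so the per-day figure is a lower bound on the work.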
Title  First Name  Surname  Address 1       Address 2    Address 3
Dr     R J         Smith    Two Oaks        112 Old St.  County Durham
Mrs    Robyn       Smith    112 Old Street  Durham       DH1 5YJ
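The two example records show why exact matching is not enough: the same street appears in different columns and with different spelling. As a rough sketch of one possible comparison (not the DataShed's actual algorithms, which the slides do not describe), the code below pools the address fields into a set of tokens and scores the overlap with Jaccard similarity. The abbreviation table and all function names are assumptions made for illustration.

```python
import re

# Illustrative abbreviation table; a real system would need a far larger one.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}


def address_tokens(fields, expand=True):
    """Lower-case the fields, drop punctuation, and pool the words into a set."""
    tokens = set()
    for field in fields:
        for word in re.findall(r"[a-z0-9]+", field.lower()):
            tokens.add(ABBREVIATIONS.get(word, word) if expand else word)
    return tokens


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


record_a = {"title": "Dr", "first_name": "R J", "surname": "Smith",
            "address": ["Two Oaks", "112 Old St.", "County Durham"]}
record_b = {"title": "Mrs", "first_name": "Robyn", "surname": "Smith",
            "address": ["112 Old Street", "Durham", "DH1 5YJ"]}

raw = jaccard(address_tokens(record_a["address"], expand=False),
              address_tokens(record_b["address"], expand=False))
normalised = jaccard(address_tokens(record_a["address"]),
                     address_tokens(record_b["address"]))
same_surname = record_a["surname"].lower() == record_b["surname"].lower()
initial_match = record_a["first_name"][0].lower() == record_b["first_name"][0].lower()

print(round(raw, 2))         # 0.3
print(round(normalised, 2))  # 0.44
print(same_surname, initial_match)  # True True
```

Expanding "St." to "street" lifts the address score, and the surname and first initial agree, but the titles differ, so the pair could be one person or two members of the same household. A single score cannot settle that, which is one reason to combine several algorithms.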
The Shed demo
How to make Big Data work?
1. Understand your problem
2. Apply appropriate tools
3. Automate everything.
How to make Big Data work?
1. Understand your problem
• ‘Big Data’ challenges aren’t necessarily new, although much of the technology is
• Articulate and communicate – focus on distilling your problem down
• Incremental improvement not wholesale replacement
2. Apply appropriate tools
• Understand the economics as well as the technology
• New technologies need to be evaluated within the context of your problem scope
• New technologies are enablers not deliverables (#datalake)
• ‘Big Data’ technology should be seen as complementary to existing technology
3. Automate everything
• Continuous integration to include all testing
• Containerise where possible
• Measure everything
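As a small illustration of the third step, the sketch below pairs a data-cleaning function with a test that a continuous integration job could run on every commit, and records how long the test takes. Everything here is hypothetical: `normalise_postcode`, the `timed` helper and the `metrics` dictionary are invented for the example, and the postcode rule assumes UK-style postcodes whose last three characters form the inward code.

```python
import time
from contextlib import contextmanager

metrics = {}  # name -> list of durations in seconds


@contextmanager
def timed(name):
    """Record the wall-clock duration of the wrapped block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        metrics.setdefault(name, []).append(time.perf_counter() - start)


def normalise_postcode(raw: str) -> str:
    """Upper-case a postcode and put one space before the last three characters."""
    compact = "".join(raw.split()).upper()
    if len(compact) <= 3:
        return compact
    return f"{compact[:-3]} {compact[-3:]}"


def test_normalise_postcode():
    assert normalise_postcode("dh15yj") == "DH1 5YJ"
    assert normalise_postcode(" DH1  5YJ ") == "DH1 5YJ"


# A CI job would normally discover and run tests with a test runner;
# calling the test directly keeps this sketch self-contained.
with timed("test_normalise_postcode"):
    test_normalise_postcode()

print(sorted(metrics))  # ['test_normalise_postcode']
```

Keeping the timings alongside the pass/fail result means a slow-down shows up in the same place as a broken test.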
If you really want to get involved…
Get your hands dirty
If you’re interested in learning more, we’ll be hosting a hands-on labs
event in the near future.
Send your details to:
Email: hello@thedatashed.co.uk
Twitter: @thedatashed
Any questions?
Lewis Crawford
Principal Architect @ the DataShed
thedatashed.co.uk
Lewis@thedatashed.co.uk


Editor's Notes

  1. http://www.forbes.com/sites/gilpress/2014/09/03/12-big-data-definitions-whats-yours/2/ I like the last two. #11 is a warning against blindly collecting more data for the sake of collecting more data (see NSA). #12 is an acknowledgment that storing data in “data silos” has been the key obstacle to getting the data to work for us, to improve our work and lives. It’s all about attitude, not technologies or quantities.