Making Big Data work
Lewis Crawford
Principal Architect @ the DataShed
thedatashed.co.uk
Lewis@thedatashed.co.uk
© the DataShed Limited 2015
intro
Who am I?
• For the last 3 years, the DataShed has been providing consultancy services to a vast array
of large clients. Our primary focus is ensuring that technology and analytical strategies
are truly aligned so that businesses can leverage the latest and greatest in technology to
model, mine and describe their data asset.
• We were working with Big Data technology before the term was coined. We have
experience delivering analytical systems driven by petabyte-scale data sets, and we have designed,
implemented and supported one of the largest real-time data integration and predictive
analytics platforms in the aviation world.
• Our model is based on using a small number of exceptionally highly skilled individuals to
deliver disruptive and innovative solutions in an agile and delivery-focused manner.
So what is ‘Big Data’?
Why do Big Data projects fail?
Too many people think that Big Data is:
“The belief that the more data you have, the more insights and
answers will rise automatically from the pool of ones and zeros.”
Gil Press, Forbes.com
How to make Big Data work?
1. Understand your problem
2. Apply appropriate tools
3. Automate everything.
Real-time data
Continuous Integration Demo
Little Big Data
A problem closer to home…
• Every business needs to understand:
  • Their potential customers and market
  • Current customers
  • Their products and sales
  • How and when they engage prospects and customers
• Analytics and data are expensive
• Many of the mandatory elements are very similar for everyone
• The DataShed provides Analytics as a Service and Single Customer View as a
Service.
The deduplication problem…
• An SME has 250,000 customers (across two systems of record)
• Identifying duplicates by brute force means comparing every pair of records:
250,000 × 249,999 / 2 = 31,249,875,000 comparisons
• Building a system to process a minimum of 100 clients a day…
• Roughly 3.1 trillion comparisons a day, using more than 10 different algorithms
• A traditional scale-up approach would be expensive and would make large
assumptions around blocking and partitioning rules
• A small data problem but a big data solution?
Title  First Name  Surname  Address 1       Address 2    Address 3
Dr     R J         Smith    Two Oaks        112 Old St.  County Durham
Mrs    Robyn       Smith    112 Old Street  Durham       DH1 5YJ
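
The arithmetic on this slide, and the reason the two example records are hard to match, can be sketched in a few lines of Python. This is a minimal illustration only, not the DataShed's implementation: the function names, the small abbreviation table and the choice of the standard library's difflib.SequenceMatcher as the similarity measure are all assumptions made for the example.

```python
from difflib import SequenceMatcher
from math import comb

# Hypothetical abbreviation table; a real system would need a far larger one.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}


def brute_force_comparisons(n_records: int) -> int:
    """Count unordered pairs of records: n * (n - 1) / 2."""
    return comb(n_records, 2)


def normalise(value: str) -> str:
    """Lower-case, drop full stops and commas, expand known abbreviations."""
    tokens = value.lower().replace(".", " ").replace(",", " ").split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two strings after normalisation."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def best_address_score(record_a: dict, record_b: dict) -> float:
    """Best similarity between any address line of A and any line of B,
    so values that have shifted between columns can still be matched."""
    return max(
        similarity(line_a, line_b)
        for line_a in record_a["address"]
        for line_b in record_b["address"]
    )


# The two example records from the slide.
record_a = {"first_name": "R J", "surname": "Smith",
            "address": ["Two Oaks", "112 Old St.", "County Durham"]}
record_b = {"first_name": "Robyn", "surname": "Smith",
            "address": ["112 Old Street", "Durham", "DH1 5YJ"]}

per_client = brute_force_comparisons(250_000)
print(per_client)        # 31249875000
print(per_client * 100)  # 3124987500000, roughly 3.1 trillion
print(best_address_score(record_a, record_b))  # 1.0
```

Comparing every address line of one record against every line of the other is a deliberate choice here: "112 Old St." sits in Address 2 of the first record and Address 1 of the second, so a column-by-column comparison would miss it.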
The Shed demo
How to make Big Data work?
1. Understand your problem
  • ‘Big Data’ challenges aren’t necessarily new; however, much of the technology is
  • Articulate and communicate – focus on distilling your problem down
  • Incremental improvement, not wholesale replacement
2. Apply appropriate tools
  • Understand the economics as well as the technology
  • New technologies need to be evaluated within the context of your problem scope
  • New technologies are enablers, not deliverables (#datalake)
  • ‘Big Data’ technology should be seen as complementary to existing technology
3. Automate everything
  • Continuous integration to include all testing
  • Containerise where possible
  • Measure everything
If you really want to get involved…
Get your hands dirty
If you’re interested in learning more, we’ll be hosting a hands-on labs
event in the near future.
Send your details to:
Email: hello@thedatashed.co.uk
Twitter: @thedatashed
Any questions?
Lewis Crawford
Principal Architect @ the DataShed
thedatashed.co.uk
Lewis@thedatashed.co.uk


Editor's Notes

  1. http://www.forbes.com/sites/gilpress/2014/09/03/12-big-data-definitions-whats-yours/2/ I like the last two. #11 is a warning against blindly collecting more data for the sake of collecting more data (see NSA). #12 is an acknowledgment that storing data in “data silos” has been the key obstacle to getting the data to work for us, to improve our work and lives. It’s all about attitude, not technologies or quantities.