Making Big Data work
Lewis Crawford
Principal Architect @ the DataShed
thedatashed.co.uk
Lewis@thedatashed.co.uk
© the DataShed Limited 2015
intro
Who am I?
• For the last three years, the DataShed has provided consultancy services to a wide
range of large clients. Our primary focus is aligning technology and analytical
strategies so that businesses can use the latest technology to model, mine and
describe their data assets.
• We were working with Big Data technology before the term was coined. We have
experience delivering analytical systems driven by petabyte-scale data sets, and have
designed, implemented and supported one of the largest real-time data integration and
predictive analytics platforms in the aviation world.
• Our model is to use a small number of exceptionally skilled individuals to deliver
disruptive and innovative solutions in an agile, delivery-focused manner.
So what is ‘Big Data’?
Why do Big Data projects fail?
Too many people think that Big Data is:
“The belief that the more data you have, the more insights and
answers will rise automatically from the pool of ones and zeros.”
Gill Press, Forbes.com
How to make Big Data work?
1. Understand your problem
2. Apply appropriate tools
3. Automate everything
Real-time data
Continuous Integration Demo
Little Big Data
A problem closer to home…
• Every business needs to understand:
  • Their potential customers and market
  • Their current customers
  • Their products and sales
  • How and when they engage prospects and customers
• Analytics and data are expensive
• Many of the mandatory elements are very similar for everyone
• The DataShed provides Analytics as a Service and Single Customer View as a
Service.
The deduplication problem…
• An SME has 250,000 customers (two systems of record)
• Identifying duplicates by brute force takes 31,249,875,000 pairwise
comparisons
• Building a system to process a minimum of 100 clients a day…
• 3.1 trillion record pairs to compare, using more than 10 different algorithms
• A traditional scale-up approach would be expensive, and makes large
assumptions around blocking and partitioning rules
• A small data problem but a big data solution?
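The figures on this slide can be checked with a few lines of Python. This is a minimal sketch, not the DataShed's implementation: the customer and client counts come from the slide, while the `blocked_pairs` helper and the choice of 1,000 equal-sized blocks are hypothetical, added only to show how much work blocking removes if its assumptions hold.

```python
from math import comb

CUSTOMERS_PER_CLIENT = 250_000  # from the slide
CLIENTS_PER_DAY = 100           # from the slide


def brute_force_pairs(n: int) -> int:
    """Number of unordered record pairs: n * (n - 1) / 2."""
    return comb(n, 2)


def blocked_pairs(n: int, blocks: int) -> int:
    """Pairs compared when n records are split into near-equal blocks and
    only records inside the same block are compared (hypothetical scheme)."""
    size, remainder = divmod(n, blocks)
    # `remainder` blocks hold one extra record each.
    return remainder * comb(size + 1, 2) + (blocks - remainder) * comb(size, 2)


per_client = brute_force_pairs(CUSTOMERS_PER_CLIENT)
per_day = per_client * CLIENTS_PER_DAY

print(f"{per_client:,}")  # 31,249,875,000
print(f"{per_day:,}")     # 3,124,987,500,000
print(f"{blocked_pairs(CUSTOMERS_PER_CLIENT, 1_000):,}")  # 31,125,000
```

Blocking cuts the per-client work by roughly a factor of a thousand in this sketch, but only if true duplicates always land in the same block, which is the "large assumption" the slide warns about.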
Title | First Name | Surname | Address 1      | Address 2   | Address 3
Dr    | R J        | Smith   | TwoOaks        | 112 Old St. | County Durham
Mrs   | Robyn      | Smith   | 112 Old Street | Durham      | DH1 5YJ
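The two sample records above show why exact matching is not enough. The sketch below compares them field by field, then again after a simple normalisation of the address lines. It is an illustration only: the slide does not say which algorithms are used, and the abbreviation lookup, the token-set (Jaccard) score and the initial check are all assumptions chosen for brevity.

```python
import re

# Field values copied from the two sample rows on the slide.
RECORD_A = ("Dr", "R J", "Smith", "TwoOaks", "112 Old St.", "County Durham")
RECORD_B = ("Mrs", "Robyn", "Smith", "112 Old Street", "Durham", "DH1 5YJ")

# Hypothetical, deliberately tiny abbreviation lookup for illustration.
ABBREVIATIONS = {"st": "street"}


def exact_field_matches(a, b):
    """Count fields that are identical position by position."""
    return sum(x == y for x, y in zip(a, b))


def address_tokens(lines):
    """Lower-case the address lines, drop punctuation, expand abbreviations."""
    tokens = set()
    for line in lines:
        for word in re.findall(r"[a-z0-9]+", line.lower()):
            tokens.add(ABBREVIATIONS.get(word, word))
    return tokens


def jaccard(a, b):
    """Shared tokens divided by all distinct tokens (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def initials_agree(first_a, first_b):
    """True when the two first names start with the same letter."""
    return first_a[:1].lower() == first_b[:1].lower()


print(exact_field_matches(RECORD_A, RECORD_B))   # 1 (surname only)
print(jaccard(address_tokens(RECORD_A[3:]),
              address_tokens(RECORD_B[3:])))     # 0.5
print(initials_agree(RECORD_A[1], RECORD_B[1]))  # True
```

Only the surname matches exactly, yet half of the distinct address tokens are shared once the address lines are treated as an unordered set. Note also that the first record has no postcode, so a blocking rule keyed on postcode would never bring this pair together for comparison.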
The Shed demo
How to make Big Data work?
1. Understand your problem
  • ‘Big Data’ challenges aren’t necessarily new; however, much of the technology is
  • Articulate and communicate: focus on distilling your problem down
  • Incremental improvement, not wholesale replacement
2. Apply appropriate tools
  • Understand the economics as well as the technology
  • New technologies need to be evaluated within the context of your problem scope
  • New technologies are enablers, not deliverables (#datalake)
  • ‘Big Data’ technology should be seen as complementary to existing technology
3. Automate everything
  • Continuous integration to include all testing
  • Containerise where possible
  • Measure everything
If you really want to get involved…
Get your hands dirty
If you’re interested in learning more, we’ll be hosting a hands-on labs
event in the near future.
Send your details to:
Email: hello@thedatashed.co.uk
Twitter: @thedatashed
Any questions?
Lewis Crawford
Principal Architect @ the DataShed
thedatashed.co.uk
Lewis@thedatashed.co.uk

More Related Content

What's hot

Talend mike hirt
Talend mike hirtTalend mike hirt
Talend mike hirt
BigDataExpo
 
"Hadoop: What we've learned in 5 years", Martin Oberhuber, Senior Data Scient...
"Hadoop: What we've learned in 5 years", Martin Oberhuber, Senior Data Scient..."Hadoop: What we've learned in 5 years", Martin Oberhuber, Senior Data Scient...
"Hadoop: What we've learned in 5 years", Martin Oberhuber, Senior Data Scient...
Dataconomy Media
 
Anchormen corne versloot
Anchormen corne verslootAnchormen corne versloot
Anchormen corne versloot
BigDataExpo
 
Big data
Big dataBig data
Big data
promediakw
 
3D Data Strategy Framework
3D Data Strategy Framework3D Data Strategy Framework
3D Data Strategy Framework
Daniel Ren
 
Big Data
Big DataBig Data
Big Data
Kiran Jamil
 
A Comprehensive Guide to Data Management for Businesses by Infinit Datum
A Comprehensive Guide to Data Management for Businesses by Infinit DatumA Comprehensive Guide to Data Management for Businesses by Infinit Datum
A Comprehensive Guide to Data Management for Businesses by Infinit Datum
Infinit-O Global, Limited
 
Multi Cloud Data Integration- Retail
Multi Cloud Data Integration- RetailMulti Cloud Data Integration- Retail
Multi Cloud Data Integration- Retail
alanwaler
 
Alliander robin hagemans daniel peyron
Alliander robin hagemans daniel peyronAlliander robin hagemans daniel peyron
Alliander robin hagemans daniel peyron
BigDataExpo
 
Microsoft Next 2014 - Insights session 2 - Turning data into a business advan...
Microsoft Next 2014 - Insights session 2 - Turning data into a business advan...Microsoft Next 2014 - Insights session 2 - Turning data into a business advan...
Microsoft Next 2014 - Insights session 2 - Turning data into a business advan...
Microsoft
 
Data is cheap; strategy still matters by Jason Lee
Data is cheap; strategy still matters by Jason LeeData is cheap; strategy still matters by Jason Lee
Data is cheap; strategy still matters by Jason Lee
Data Con LA
 
Exploring the Heated-and Completely Unnecessary- Data Lake Debate
Exploring the Heated-and Completely Unnecessary- Data Lake DebateExploring the Heated-and Completely Unnecessary- Data Lake Debate
Exploring the Heated-and Completely Unnecessary- Data Lake Debate
Hortonworks
 
Big data, your data, all data - Frederik Vandeputte
Big data, your data, all data - Frederik VandeputteBig data, your data, all data - Frederik Vandeputte
Big data, your data, all data - Frederik Vandeputte
InspireX
 
Mastech digital to acquire infoTrellis
Mastech digital to acquire infoTrellisMastech digital to acquire infoTrellis
Mastech digital to acquire infoTrellis
Mastech Digital
 
The truth is out there
The truth is out thereThe truth is out there
The truth is out there
Mike Davis
 
London Jaspersoft Community User Group Event 2 KETL presentation
London Jaspersoft Community User Group Event 2 KETL presentationLondon Jaspersoft Community User Group Event 2 KETL presentation
London Jaspersoft Community User Group Event 2 KETL presentation
KETL Limited
 
Talk at IEEE Big Data/Cloud conference in Santa Clara, June 28th, 2013.
Talk at IEEE Big Data/Cloud conference in Santa Clara, June 28th, 2013.Talk at IEEE Big Data/Cloud conference in Santa Clara, June 28th, 2013.
Talk at IEEE Big Data/Cloud conference in Santa Clara, June 28th, 2013.
Jari Koister
 
Modern Data Integration Expert Session Webinar
Modern Data Integration Expert Session Webinar Modern Data Integration Expert Session Webinar
Modern Data Integration Expert Session Webinar
ibi
 
Talend community user group Bristol & SW UK event
Talend community user group Bristol & SW UK eventTalend community user group Bristol & SW UK event
Talend community user group Bristol & SW UK event
KETL Limited
 
Big Data API’s and Analytics
Big Data API’s and AnalyticsBig Data API’s and Analytics
Big Data API’s and Analytics
Andy Brauer
 

What's hot (20)

Talend mike hirt
Talend mike hirtTalend mike hirt
Talend mike hirt
 
"Hadoop: What we've learned in 5 years", Martin Oberhuber, Senior Data Scient...
"Hadoop: What we've learned in 5 years", Martin Oberhuber, Senior Data Scient..."Hadoop: What we've learned in 5 years", Martin Oberhuber, Senior Data Scient...
"Hadoop: What we've learned in 5 years", Martin Oberhuber, Senior Data Scient...
 
Anchormen corne versloot
Anchormen corne verslootAnchormen corne versloot
Anchormen corne versloot
 
Big data
Big dataBig data
Big data
 
3D Data Strategy Framework
3D Data Strategy Framework3D Data Strategy Framework
3D Data Strategy Framework
 
Big Data
Big DataBig Data
Big Data
 
A Comprehensive Guide to Data Management for Businesses by Infinit Datum
A Comprehensive Guide to Data Management for Businesses by Infinit DatumA Comprehensive Guide to Data Management for Businesses by Infinit Datum
A Comprehensive Guide to Data Management for Businesses by Infinit Datum
 
Multi Cloud Data Integration- Retail
Multi Cloud Data Integration- RetailMulti Cloud Data Integration- Retail
Multi Cloud Data Integration- Retail
 
Alliander robin hagemans daniel peyron
Alliander robin hagemans daniel peyronAlliander robin hagemans daniel peyron
Alliander robin hagemans daniel peyron
 
Microsoft Next 2014 - Insights session 2 - Turning data into a business advan...
Microsoft Next 2014 - Insights session 2 - Turning data into a business advan...Microsoft Next 2014 - Insights session 2 - Turning data into a business advan...
Microsoft Next 2014 - Insights session 2 - Turning data into a business advan...
 
Data is cheap; strategy still matters by Jason Lee
Data is cheap; strategy still matters by Jason LeeData is cheap; strategy still matters by Jason Lee
Data is cheap; strategy still matters by Jason Lee
 
Exploring the Heated-and Completely Unnecessary- Data Lake Debate
Exploring the Heated-and Completely Unnecessary- Data Lake DebateExploring the Heated-and Completely Unnecessary- Data Lake Debate
Exploring the Heated-and Completely Unnecessary- Data Lake Debate
 
Big data, your data, all data - Frederik Vandeputte
Big data, your data, all data - Frederik VandeputteBig data, your data, all data - Frederik Vandeputte
Big data, your data, all data - Frederik Vandeputte
 
Mastech digital to acquire infoTrellis
Mastech digital to acquire infoTrellisMastech digital to acquire infoTrellis
Mastech digital to acquire infoTrellis
 
The truth is out there
The truth is out thereThe truth is out there
The truth is out there
 
London Jaspersoft Community User Group Event 2 KETL presentation
London Jaspersoft Community User Group Event 2 KETL presentationLondon Jaspersoft Community User Group Event 2 KETL presentation
London Jaspersoft Community User Group Event 2 KETL presentation
 
Talk at IEEE Big Data/Cloud conference in Santa Clara, June 28th, 2013.
Talk at IEEE Big Data/Cloud conference in Santa Clara, June 28th, 2013.Talk at IEEE Big Data/Cloud conference in Santa Clara, June 28th, 2013.
Talk at IEEE Big Data/Cloud conference in Santa Clara, June 28th, 2013.
 
Modern Data Integration Expert Session Webinar
Modern Data Integration Expert Session Webinar Modern Data Integration Expert Session Webinar
Modern Data Integration Expert Session Webinar
 
Talend community user group Bristol & SW UK event
Talend community user group Bristol & SW UK eventTalend community user group Bristol & SW UK event
Talend community user group Bristol & SW UK event
 
Big Data API’s and Analytics
Big Data API’s and AnalyticsBig Data API’s and Analytics
Big Data API’s and Analytics
 

Viewers also liked

Don't put all your eggs in the SEO basket - amplify growth combining inbound ...
Don't put all your eggs in the SEO basket - amplify growth combining inbound ...Don't put all your eggs in the SEO basket - amplify growth combining inbound ...
Don't put all your eggs in the SEO basket - amplify growth combining inbound ...
Fruition Business Development
 
Big data hadoop architect program certificate
Big data hadoop architect program certificateBig data hadoop architect program certificate
Big data hadoop architect program certificate
Sumeet Khanna
 
Big Data presentation at GITPRO 2013
Big Data presentation at GITPRO 2013Big Data presentation at GITPRO 2013
Big Data presentation at GITPRO 2013
Sameer Wadkar
 
IoT and Big Data
IoT and Big DataIoT and Big Data
IoT and Big Data
sabnees
 
Oracle EBS Change Projects Process Flows
Oracle EBS Change Projects Process FlowsOracle EBS Change Projects Process Flows
Oracle EBS Change Projects Process Flows
Mahesh Vallampati
 
Pathway to solution architect
Pathway to solution architectPathway to solution architect
Pathway to solution architect
Volodymyr Yelchev
 
Big Data BluePrint
Big Data BluePrintBig Data BluePrint
Big Data BluePrint
Daan Gerits
 
Big data – solution architect
Big data – solution architectBig data – solution architect
Big data – solution architect
Jaya Prakash Mudugal
 
The Data Architect Manifesto
The Data Architect ManifestoThe Data Architect Manifesto
The Data Architect Manifesto
Mahesh Vallampati
 
Big Data: The 4 Layers Everyone Must Know
Big Data: The 4 Layers Everyone Must KnowBig Data: The 4 Layers Everyone Must Know
Big Data: The 4 Layers Everyone Must Know
Bernard Marr
 
Big Data Architectural Patterns
Big Data Architectural PatternsBig Data Architectural Patterns
Big Data Architectural Patterns
Amazon Web Services
 
Exploiting The Potential Of Big Data
Exploiting The Potential Of Big DataExploiting The Potential Of Big Data
Exploiting The Potential Of Big Data
Activate
 
A Brief History of Big Data
A Brief History of Big DataA Brief History of Big Data
A Brief History of Big Data
Bernard Marr
 
Big data ppt
Big data pptBig data ppt
Big data ppt
Thirunavukkarasu Ps
 
Sales and Big Data
Sales and Big DataSales and Big Data
Sales and Big Data
Consilium
 
Big Data: Architectures and Approaches
Big Data: Architectures and ApproachesBig Data: Architectures and Approaches
Big Data: Architectures and Approaches
Thoughtworks
 
Big Data and Advanced Analytics
Big Data and Advanced AnalyticsBig Data and Advanced Analytics
Big Data and Advanced Analytics
McKinsey on Marketing & Sales
 
IQ Crash Course - Big Data Analytics
IQ Crash Course - Big Data AnalyticsIQ Crash Course - Big Data Analytics
IQ Crash Course - Big Data Analytics
InterQuest Group
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
James Serra
 
Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka
Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka
Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka
Edureka!
 

Viewers also liked (20)

Don't put all your eggs in the SEO basket - amplify growth combining inbound ...
Don't put all your eggs in the SEO basket - amplify growth combining inbound ...Don't put all your eggs in the SEO basket - amplify growth combining inbound ...
Don't put all your eggs in the SEO basket - amplify growth combining inbound ...
 
Big data hadoop architect program certificate
Big data hadoop architect program certificateBig data hadoop architect program certificate
Big data hadoop architect program certificate
 
Big Data presentation at GITPRO 2013
Big Data presentation at GITPRO 2013Big Data presentation at GITPRO 2013
Big Data presentation at GITPRO 2013
 
IoT and Big Data
IoT and Big DataIoT and Big Data
IoT and Big Data
 
Oracle EBS Change Projects Process Flows
Oracle EBS Change Projects Process FlowsOracle EBS Change Projects Process Flows
Oracle EBS Change Projects Process Flows
 
Pathway to solution architect
Pathway to solution architectPathway to solution architect
Pathway to solution architect
 
Big Data BluePrint
Big Data BluePrintBig Data BluePrint
Big Data BluePrint
 
Big data – solution architect
Big data – solution architectBig data – solution architect
Big data – solution architect
 
The Data Architect Manifesto
The Data Architect ManifestoThe Data Architect Manifesto
The Data Architect Manifesto
 
Big Data: The 4 Layers Everyone Must Know
Big Data: The 4 Layers Everyone Must KnowBig Data: The 4 Layers Everyone Must Know
Big Data: The 4 Layers Everyone Must Know
 
Big Data Architectural Patterns
Big Data Architectural PatternsBig Data Architectural Patterns
Big Data Architectural Patterns
 
Exploiting The Potential Of Big Data
Exploiting The Potential Of Big DataExploiting The Potential Of Big Data
Exploiting The Potential Of Big Data
 
A Brief History of Big Data
A Brief History of Big DataA Brief History of Big Data
A Brief History of Big Data
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Sales and Big Data
Sales and Big DataSales and Big Data
Sales and Big Data
 
Big Data: Architectures and Approaches
Big Data: Architectures and ApproachesBig Data: Architectures and Approaches
Big Data: Architectures and Approaches
 
Big Data and Advanced Analytics
Big Data and Advanced AnalyticsBig Data and Advanced Analytics
Big Data and Advanced Analytics
 
IQ Crash Course - Big Data Analytics
IQ Crash Course - Big Data AnalyticsIQ Crash Course - Big Data Analytics
IQ Crash Course - Big Data Analytics
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka
Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka
Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka
 

Similar to Making Big Data Work

Big analytics best practices @ PARC
Big analytics best practices @ PARCBig analytics best practices @ PARC
Big analytics best practices @ PARC
Jim Kaskade
 
Accelerating Time to Success for Your Big Data Initiatives
Accelerating Time to Success for Your Big Data InitiativesAccelerating Time to Success for Your Big Data Initiatives
Accelerating Time to Success for Your Big Data Initiatives
☁Jake Weaver ☁
 
ZEDventures-highres
ZEDventures-highresZEDventures-highres
ZEDventures-highres
Jeremy Stierwalt
 
Big Data at a Glance
Big Data at a GlanceBig Data at a Glance
Big Data at a Glance
Softweb Solutions
 
IP&A109 Next-Generation Analytics Architecture for the Year 2020
IP&A109 Next-Generation Analytics Architecture for the Year 2020IP&A109 Next-Generation Analytics Architecture for the Year 2020
IP&A109 Next-Generation Analytics Architecture for the Year 2020
Anjan Roy, PMP
 
Key note big data analytics ecosystem strategy
Key note   big data analytics ecosystem strategyKey note   big data analytics ecosystem strategy
Key note big data analytics ecosystem strategy
IBM Sverige
 
BigInsights BigData Study 2013 - Exec Summary
BigInsights BigData Study 2013  - Exec SummaryBigInsights BigData Study 2013  - Exec Summary
BigInsights BigData Study 2013 - Exec Summary
BigInsights
 
MWLUG2017 - The Data & Analytics Journey 2.0
MWLUG2017 - The Data & Analytics Journey 2.0MWLUG2017 - The Data & Analytics Journey 2.0
MWLUG2017 - The Data & Analytics Journey 2.0
John Head
 
Data as a Service (DaaS): The What, Why, How, Who, and When
Data as a Service (DaaS): The What, Why, How, Who, and WhenData as a Service (DaaS): The What, Why, How, Who, and When
Data as a Service (DaaS): The What, Why, How, Who, and When
RocketSource
 
Data Virtualization - Enabling Next Generation Analytics
Data Virtualization - Enabling Next Generation AnalyticsData Virtualization - Enabling Next Generation Analytics
Data Virtualization - Enabling Next Generation Analytics
Denodo
 
Keyrus US Information
Keyrus US InformationKeyrus US Information
Keyrus US Information
Devon Ziegenfuss
 
Keyrus US Information
Keyrus US InformationKeyrus US Information
Keyrus US Information
Julian Tong
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Caserta
 
Big agendas for big data analytics projects
Big agendas for big data analytics projectsBig agendas for big data analytics projects
Big agendas for big data analytics projects
The Marketing Distillery
 
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and ClouderaIs your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera
Cloudera, Inc.
 
The Art of Data Science - event slides
The Art of Data Science - event slidesThe Art of Data Science - event slides
The Art of Data Science - event slides
RedPixie
 
Where the Warehouse Ends: A New Age of Information Access
Where the Warehouse Ends: A New Age of Information AccessWhere the Warehouse Ends: A New Age of Information Access
Where the Warehouse Ends: A New Age of Information Access
Inside Analysis
 
Getting down to business on Big Data analytics
Getting down to business on Big Data analyticsGetting down to business on Big Data analytics
Getting down to business on Big Data analytics
The Marketing Distillery
 
Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...
Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...
Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...
BigDataEverywhere
 
Data Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & ApproachesData Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & Approaches
DATAVERSITY
 

Similar to Making Big Data Work (20)

Big analytics best practices @ PARC
Big analytics best practices @ PARCBig analytics best practices @ PARC
Big analytics best practices @ PARC
 
Accelerating Time to Success for Your Big Data Initiatives
Accelerating Time to Success for Your Big Data InitiativesAccelerating Time to Success for Your Big Data Initiatives
Accelerating Time to Success for Your Big Data Initiatives
 
ZEDventures-highres
ZEDventures-highresZEDventures-highres
ZEDventures-highres
 
Big Data at a Glance
Big Data at a GlanceBig Data at a Glance
Big Data at a Glance
 
IP&A109 Next-Generation Analytics Architecture for the Year 2020
IP&A109 Next-Generation Analytics Architecture for the Year 2020IP&A109 Next-Generation Analytics Architecture for the Year 2020
IP&A109 Next-Generation Analytics Architecture for the Year 2020
 
Key note big data analytics ecosystem strategy
Key note   big data analytics ecosystem strategyKey note   big data analytics ecosystem strategy
Key note big data analytics ecosystem strategy
 
BigInsights BigData Study 2013 - Exec Summary
BigInsights BigData Study 2013  - Exec SummaryBigInsights BigData Study 2013  - Exec Summary
BigInsights BigData Study 2013 - Exec Summary
 
MWLUG2017 - The Data & Analytics Journey 2.0
MWLUG2017 - The Data & Analytics Journey 2.0MWLUG2017 - The Data & Analytics Journey 2.0
MWLUG2017 - The Data & Analytics Journey 2.0
 
Data as a Service (DaaS): The What, Why, How, Who, and When
Data as a Service (DaaS): The What, Why, How, Who, and WhenData as a Service (DaaS): The What, Why, How, Who, and When
Data as a Service (DaaS): The What, Why, How, Who, and When
 
Data Virtualization - Enabling Next Generation Analytics
Data Virtualization - Enabling Next Generation AnalyticsData Virtualization - Enabling Next Generation Analytics
Data Virtualization - Enabling Next Generation Analytics
 
Keyrus US Information
Keyrus US InformationKeyrus US Information
Keyrus US Information
 
Keyrus US Information
Keyrus US InformationKeyrus US Information
Keyrus US Information
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
 
Big agendas for big data analytics projects
Big agendas for big data analytics projectsBig agendas for big data analytics projects
Big agendas for big data analytics projects
 
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and ClouderaIs your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera
 
The Art of Data Science - event slides
The Art of Data Science - event slidesThe Art of Data Science - event slides
The Art of Data Science - event slides
 
Where the Warehouse Ends: A New Age of Information Access
Where the Warehouse Ends: A New Age of Information AccessWhere the Warehouse Ends: A New Age of Information Access
Where the Warehouse Ends: A New Age of Information Access
 
Getting down to business on Big Data analytics
Getting down to business on Big Data analyticsGetting down to business on Big Data analytics
Getting down to business on Big Data analytics
 
Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...
Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...
Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...
 
Data Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & ApproachesData Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & Approaches
 

More from Corecom Consulting

How to move to the cloud, get it right, stay secure and not cost a fortune
How to move to the cloud, get it right, stay secure and not cost a fortuneHow to move to the cloud, get it right, stay secure and not cost a fortune
How to move to the cloud, get it right, stay secure and not cost a fortune
Corecom Consulting
 
TestBoss Manchester Nov 2019 - What's Wrong with Accessibility
TestBoss Manchester Nov 2019 - What's Wrong with AccessibilityTestBoss Manchester Nov 2019 - What's Wrong with Accessibility
TestBoss Manchester Nov 2019 - What's Wrong with Accessibility
Corecom Consulting
 
TestBoss Manchester Nov 2019 - What's Wrong with Accessibility
TestBoss Manchester Nov 2019 - What's Wrong with AccessibilityTestBoss Manchester Nov 2019 - What's Wrong with Accessibility
TestBoss Manchester Nov 2019 - What's Wrong with Accessibility
Corecom Consulting
 
TestBoss October 2019
TestBoss October 2019TestBoss October 2019
TestBoss October 2019
Corecom Consulting
 
BIBoss: The Data Science Behind Personalisation & AI
BIBoss: The Data Science Behind Personalisation & AIBIBoss: The Data Science Behind Personalisation & AI
BIBoss: The Data Science Behind Personalisation & AI
Corecom Consulting
 
DevBoss May 2019 Presentation
DevBoss May 2019 Presentation DevBoss May 2019 Presentation
DevBoss May 2019 Presentation
Corecom Consulting
 
TestBoss April 2019 Discussion Notes
TestBoss April 2019 Discussion NotesTestBoss April 2019 Discussion Notes
TestBoss April 2019 Discussion Notes
Corecom Consulting
 
TestBoss Manchester March 2019 - Automation in Testing: The missing piece
TestBoss Manchester March 2019 - Automation in Testing: The missing pieceTestBoss Manchester March 2019 - Automation in Testing: The missing piece
TestBoss Manchester March 2019 - Automation in Testing: The missing piece
Corecom Consulting
 
Professional Networking Lecture
Professional Networking LectureProfessional Networking Lecture
Professional Networking Lecture
Corecom Consulting
 
University of Leeds Professional Networking Lecture
University of Leeds Professional Networking LectureUniversity of Leeds Professional Networking Lecture
University of Leeds Professional Networking Lecture
Corecom Consulting
 
TestBoss November 2018 - Ghost in the machine, how hackers break software
TestBoss November 2018 - Ghost in the machine, how hackers break softwareTestBoss November 2018 - Ghost in the machine, how hackers break software
TestBoss November 2018 - Ghost in the machine, how hackers break software
Corecom Consulting
 
BaBoss October 2018
BaBoss October 2018BaBoss October 2018
BaBoss October 2018
Corecom Consulting
 
Welcome to the team, Adam
Welcome to the team, AdamWelcome to the team, Adam
Welcome to the team, Adam
Corecom Consulting
 
Welcome to the team
Welcome to the team Welcome to the team
Welcome to the team
Corecom Consulting
 
WITBoss June 2018 - Confidence - if you can't make it, fake it
WITBoss June 2018 - Confidence - if you can't make it, fake itWITBoss June 2018 - Confidence - if you can't make it, fake it
WITBoss June 2018 - Confidence - if you can't make it, fake it
Corecom Consulting
 
TestBoss May 2018 - 'How to win with automation and influence people'
TestBoss May 2018 - 'How to win with automation and influence people'TestBoss May 2018 - 'How to win with automation and influence people'
TestBoss May 2018 - 'How to win with automation and influence people'
Corecom Consulting
 
TestBoss Manchester March 2018 - 'GDPR: The battles in store for Test Bosses'
TestBoss Manchester March 2018 - 'GDPR: The battles in store for Test Bosses'TestBoss Manchester March 2018 - 'GDPR: The battles in store for Test Bosses'
TestBoss Manchester March 2018 - 'GDPR: The battles in store for Test Bosses'
Corecom Consulting
 
BABoss February 2018
BABoss February 2018BABoss February 2018
BABoss February 2018
Corecom Consulting
 
The best bits of 2017
The best bits of 2017The best bits of 2017
The best bits of 2017
Corecom Consulting
 
TestBoss: Leaders in Software Testing
TestBoss: Leaders in Software TestingTestBoss: Leaders in Software Testing
TestBoss: Leaders in Software Testing
Corecom Consulting
 

Making Big Data Work

  • 1. Making Big Data work Lewis Crawford Principal Architect @ the DataShed thedatashed.co.uk Lewis@thedatashed.co.uk © the DataShed Limited 2015
  • 3. Who am I?
      • For the last 3 years, the DataShed has been providing consultancy services to a vast array of large clients. Our primary focus is ensuring that technology and analytical strategies are truly aligned so that businesses can leverage the latest and greatest in technology to model, mine and describe their data asset.
      • We were working with Big Data technology before the term was coined. We have experience delivering analytical systems driven by petabyte data sets, and have designed, implemented and supported one of the largest real-time data integration and predictive analytics platforms in the aviation world.
      • Our model is based on using a small number of exceptionally highly skilled individuals to deliver disruptive and innovative solutions in an agile and delivery-focused manner.
  • 4. So what is ‘Big Data’?
  • 6. Why do Big Data projects fail? Too many people think that Big Data is:
      “The belief that the more data you have, the more insights and answers will rise automatically from the pool of ones and zeros.” (Gill Press, Forbes.com)
  • 7. How to make Big Data work?
      1. Understand your problem
      2. Apply appropriate tools
      3. Automate everything
  • 15. A problem closer to home…
      • Every business needs to understand:
          • Their potential customers and market
          • Current customers
          • Their products and sales
          • How and when they engage prospects and customers
      • Analytics and data are expensive
      • Many of the mandatory elements are very similar for everyone
      • The DataShed offers Analytics as a Service and Single Customer View as a Service.
  • 16. The deduplication problem…
      • An SME has 250,000 customers (two systems of record)
      • Identifying duplicates by brute force means comparing every pair of records: 250,000 × 249,999 / 2 = 31,249,875,000 comparisons
      • Building a system to process a minimum of 100 clients a day…
      • That is roughly 3.1 trillion record pairs to compare, using more than 10 different algorithms
      • A traditional scale-up approach would be expensive, and makes large assumptions around blocking and partitioning rules
      • A small data problem but a big data solution?
      Example of two records that may refer to the same household:
      Title | First Name | Surname | Address 1      | Address 2   | Address 3
      Dr    | R J        | Smith   | TwoOaks        | 112 Old St. | County Durham
      Mrs   | Robyn      | Smith   | 112 Old Street | Durham      | DH1 5YJ
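The arithmetic on slide 16 can be checked with a short sketch. The pair-count formula n(n-1)/2 reproduces the slide's figures. The similarity function underneath it is only an illustrative assumption of what one of the "more than 10 algorithms" might look like: the field names, the crude normaliser and the use of Python's `difflib` are hypothetical choices, not the DataShed's method.

```python
from difflib import SequenceMatcher


def pair_count(n: int) -> int:
    """Number of unordered record pairs among n records: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def normalise(record: dict) -> str:
    """Join the address fields into one lower-case string.

    Hypothetical, deliberately crude normaliser: it only expands "st."
    to "street"; a real system would need far richer rules.
    """
    fields = ("address_1", "address_2", "address_3")
    parts = (record[f].lower().replace("st.", "street") for f in fields)
    return " ".join(p for p in parts if p)


def similarity(a: dict, b: dict) -> float:
    """Similarity score between 0.0 and 1.0 for two records' addresses."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


# Figures taken from the slide.
customers = 250_000
clients_per_day = 100

per_client = pair_count(customers)      # brute-force comparisons per client
per_day = per_client * clients_per_day  # comparisons per day across clients

# The two example records from the slide (field names are assumptions).
dr_smith = {"address_1": "TwoOaks", "address_2": "112 Old St.",
            "address_3": "County Durham"}
mrs_smith = {"address_1": "112 Old Street", "address_2": "Durham",
             "address_3": "DH1 5YJ"}

if __name__ == "__main__":
    print(f"Comparisons per client: {per_client:,}")
    print(f"Comparisons per day:    {per_day:,}")
    print(f"Address similarity:     {similarity(dr_smith, mrs_smith):.2f}")
```

Because the pair count grows quadratically with the number of records, doubling the customer base roughly quadruples the work, which is why the slide frames a modest data set as needing a scale-out solution.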
  • 20. How to make Big Data work?
      1. Understand your problem
          • ‘Big Data’ challenges aren’t necessarily new, but much of the technology is
          • Articulate and communicate – focus on distilling your problem down
          • Incremental improvement, not wholesale replacement
      2. Apply appropriate tools
          • Understand the economics as well as the technology
          • New technologies need to be evaluated within the context of your problem scope
          • New technologies are enablers, not deliverables (#datalake)
          • ‘Big Data’ technology should be seen as complementary to existing technology
      3. Automate everything
          • Continuous integration to include all testing
          • Containerise where possible
          • Measure everything
  • 21. If you really want to get involved…
  • 22. Get your hands dirty. If you’re interested in learning more, we’ll be hosting a hands-on labs event in the near future. Send your details to:
      • Email: hello@thedatashed.co.uk
      • Twitter: @thedatashed
  • 23. Any questions? Lewis Crawford, Principal Architect @ the DataShed, thedatashed.co.uk, Lewis@thedatashed.co.uk