©2015 Protegra Inc. All rights reserved.
Big Data
Terry Bunio - Protegra
Who Am I?
• Data Base Administrator
- Oracle
- SQL Server 6, 6.5, 7, 2000, 2005, 2008, 2012
- Informix
- ADABAS
• Data Modeler/Architect
- Investors Group, LPL Financial, Manitoba Blue Cross, Assante
Financial, CI Funds, Mackenzie Financial
- Normalized and Dimensional
• Agilist
- Innovation Gamer, Team Member, SQL Developer, Test writer, Sticky
Sticker, Project Manager, PMO on SAP Implementation
@tbunio
tbunio@protegra.com
agilevoyageur.com
www.protegra.com
Where can you find me?
Members of the Protegra Community of
software-driven businesses & solutions
Definition
Myths
• Big Data Myth #1: It’s Big
• Big Data Myth #2: You need to apply it right away
• Big Data Myth #3: The more granular the data, the better
• Big Data Myth #4: Big Data is good data
• Big Data Myth #5: Big Data means that analysts become all-important
• Big Data Myth #6: Big Data gives you concrete answers
• Big Data Myth #7: Big Data is a magic 8-ball
• Big Data Myth #8: Big Data can create self-learning algorithms
Evolution
Term                            Time Frame
Decision Support                1970-1985
Executive Support               1980-1990
Online Analytical Processing    1990-2000
Business Intelligence           1989-2005
Analytics                       2005-2010
Big Data                        2010-2015
Next??
Facts
• Every 2 days we create as much information as we did from the
beginning of time until 2003
• Over 90% of all the data in the world was created in the past 2 years
• It is expected that by 2020 the amount of digital information in
existence will have grown from 3.2 zettabytes today to 40 zettabytes
• The total amount of data being captured and stored by industry
doubles every 1.2 years
• Every minute we send 204 million emails, generate 1.8 million
Facebook likes, send 278 thousand Tweets, and upload 200,000
photos to Facebook
Facts
• Google alone processes on average over 40 thousand search
queries per second, making it over 3.5 billion in a single day
• Around 100 hours of video are uploaded to YouTube every
minute and it would take you around 15 years to watch every
video uploaded by users in one day
• If you burned all of the data created in just one day onto DVDs,
you could stack them on top of each other and reach the moon –
twice
• AT&T is thought to hold the world’s largest volume of data in one
unique database – its phone records database is 312 terabytes in
size, and contains almost 2 trillion rows
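Some of these figures can be sanity-checked with simple arithmetic. The sketch below uses only the numbers quoted on the slide. The variable names are illustrative, and terabytes are assumed to be decimal (10^12 bytes).

```python
# Sanity-check three of the quoted figures using only the numbers on the slide.
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60

# Google: 40 thousand queries per second, expressed per day.
queries_per_day = 40_000 * SECONDS_PER_DAY

# YouTube: 100 hours uploaded per minute, expressed as years of
# continuous viewing needed to watch one day of uploads.
hours_uploaded_per_day = 100 * MINUTES_PER_DAY
years_to_watch_one_day = hours_uploaded_per_day / (24 * 365)

# AT&T: 312 terabytes (assumed decimal) spread over 2 trillion rows.
bytes_per_row = 312 * 10**12 / (2 * 10**12)

print(f"{queries_per_day / 1e9:.2f} billion queries per day")
print(f"{years_to_watch_one_day:.1f} years to watch one day of uploads")
print(f"{bytes_per_row:.0f} bytes per row on average")
```

The results land close to the slide's rounded figures: roughly 3.5 billion queries a day, about 16 years of viewing, and about 156 bytes per phone record.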
Facts
• 570 new websites spring into existence every minute of every
day
• 1.9 million IT jobs will be created in the US by 2015 to carry out
big data projects. Each of those will be supported by 3 new jobs
created outside of IT – meaning a total of 6 million new jobs
thanks to big data
• Today’s data centres occupy an area of land equal in size to
almost 6,000 football fields
• Between them, companies monitoring Twitter to measure
“sentiment” analyze 12 terabytes of tweets every day
Facts
• The amount of data transferred over mobile networks increased
by 81% to 1.5 exabytes (1.5 billion gigabytes) per month between
2012 and 2014. Video accounts for 53% of that total
• The NSA is thought to analyze 1.6% of all global internet traffic –
around 30 petabytes (30 million gigabytes) every day
• The value of the Hadoop market is expected to soar from $2
billion in 2013 to $50 billion by 2020, according to market
research firm Allied Market Research
Facts
• The number of bits of information stored in the digital universe is
thought to have exceeded the number of stars in the physical
universe in 2007
• The boom of the Internet of Things will mean that the number of
devices connected to the Internet will rise from about 13 billion
today to 50 billion by 2020
• 12 million RFID tags – used to capture data and track movement
of objects in the physical world – had been sold by 2011. By
2021, it is estimated that number will have risen to 209 billion as
the Internet of Things takes off
Facts
• Big data has been used to predict crimes before they happen – a
“predictive policing” trial in California was able to identify areas
where crime will occur three times more accurately than existing
methods of forecasting
• By better integrating big data analytics into healthcare, the
industry could save $300 billion a year – that’s the equivalent of
reducing the healthcare costs of every man, woman and child by
$1,000 a year
• Retailers could increase their profit margins by more than 60%
through the full exploitation of big data analytics
Facts
• The big data industry is expected to grow from US$10.2 billion in
2013 to about US$54.3 billion by 2017
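To put a forecast like this in perspective, it helps to convert it into an implied yearly growth rate. The sketch below is a generic compound annual growth rate calculation applied to the two figures quoted above. The function name `cagr` is just an illustrative choice.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate implied by growing from start_value to end_value."""
    return (end_value / start_value) ** (1 / years) - 1

# US$10.2 billion in 2013 to US$54.3 billion in 2017, as quoted on the slide.
big_data_growth = cagr(10.2, 54.3, 2017 - 2013)
print(f"Implied annual growth: {big_data_growth:.1%}")
```

The forecast implies the industry growing by roughly half again every year over the four-year span.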
What Big Data is and what it isn’t. What
problems does it solve?
What is Big Data not?
• Not regular data
• Not data that fits into the existing analytic paradigm/toolset
• Doesn’t easily fit into the existing row/column structures
New Types of Data
• Activity Data
- Listening to music
- Watching movies
- Browsing
- Driving
- Walking
- Exercising
• Intentional and Non-Intentional
New Types of Data
• Conversations
- Audio
- Visual
- Textual
New Types of Data
New Types of Data
• Image data
- Photo
• Cameras
• Phones
- Video
• Cameras
• CCTV
• YouTube!
New Types of Data
• Machine to Machine
- Cell phone to towers
- Medical devices
• Sensors
- Location
- Speed
- Acceleration
- Health
- Altitude
- Temperature
- Humidity
- Among many others….
New Types of Data
• Internet of Things
- Toasters
- Fridges
- Phones
- Jet Engines
- Combines
- Manufacturing
Abandoned Activities
• Exciting new area of big data
- What mouse patterns and keystrokes are started but eventually abandoned?
- What articles do you almost comment on?
- What transactions do you start but not complete?
- Do these share common themes?
- At what step is the transaction abandoned?
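The last question, at what step a transaction is abandoned, can be answered by looking at the furthest step each session reached. A minimal sketch follows, assuming a hypothetical checkout flow and a hypothetical event log of (session, step) pairs. None of these names come from a real system.

```python
from collections import Counter

# Hypothetical ordered steps of a checkout transaction.
STEPS = ["view_cart", "enter_address", "enter_payment", "confirm"]

# Hypothetical event log: (session_id, step reached), in time order.
events = [
    ("s1", "view_cart"), ("s1", "enter_address"),
    ("s2", "view_cart"), ("s2", "enter_address"),
    ("s2", "enter_payment"), ("s2", "confirm"),
    ("s3", "view_cart"),
    ("s4", "view_cart"), ("s4", "enter_address"),
]

def abandonment_by_step(events, steps):
    """Count sessions by the furthest step they reached, skipping completed ones."""
    furthest = {}
    for session_id, step in events:
        index = steps.index(step)
        furthest[session_id] = max(furthest.get(session_id, -1), index)
    final_index = len(steps) - 1
    return Counter(steps[i] for i in furthest.values() if i < final_index)

abandoned = abandonment_by_step(events, STEPS)
print(abandoned.most_common())
```

In this made-up log, two sessions stop at the address step and one stops at the cart, which would point at the address form as the first place to investigate.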
What makes Data, Big Data?
• Volume
• Velocity
• Variety
• Veracity
Volume
• If we take all the data generated from the beginning of time to 2008
- That same amount of data is now generated almost every minute
• We can now store that data across immense networks
• But there is more data than we can possibly analyze
• One airplane generates 2 terabytes of data about its engines on a
flight across North America
• In 2012, half of the data available for analysis was surveillance video
Big Data versus Lotsa Data
• Lotsa Data
- Same structured data as you currently have
• Just more of it
- Can be analyzed using the same paradigms/toolsets
• May just take a bit longer
Velocity
• The speed at which data is generated, analyzed, and moved around
- Think of how quickly something can be trending on Twitter
• Technology now allows us to analyze data in memory without it ever
being stored
• 200 Billion tweets per year
• Many earlier analytics approaches assume static data
- But what happens to your analysis if the data is constantly changing?
• Sensor data
- Can’t store all of it, just too much data
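One way to analyze data in memory without ever storing it is to keep only running summaries. The sketch below uses Welford's online algorithm to maintain a mean and variance one reading at a time. The `sensor_readings` generator is a hypothetical stand-in for a live sensor feed.

```python
class RunningStats:
    """Welford's online algorithm: mean and variance without storing readings."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        """Sample variance; 0.0 until at least two readings have arrived."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


def sensor_readings():
    """Hypothetical stand-in for a live sensor feed."""
    yield from [20.0, 21.0, 19.0, 22.0, 18.0]


stats = RunningStats()
for reading in sensor_readings():
    stats.update(reading)  # each reading is discarded after the update

print(stats.count, stats.mean, stats.variance)
```

Memory use stays constant no matter how many readings arrive, which is the property that matters when the stream is too large to keep.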
Variety
• We used to be able to just focus on structured data in neat relational
table structures
• 80% of the data is now unstructured
- Text
- Image
- Video
- Voice
- Social Graph data
- NoSQL
• Big Data Technology can now bring different types of data together to
analyze
Variety
• Structured
• Unstructured
• Semi-structured
- XML
- JSON
- Usually applied to text to try and enhance the analysis that can be done
- Stored in NoSQL databases
• MongoDB
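Semi-structured data is easiest to see with an example. In the sketch below, three hypothetical JSON documents each carry a different set of fields. The code discovers the fields that occur and flattens the documents into rows, leaving gaps where a field is absent.

```python
import json

# Hypothetical semi-structured records: fields vary from one document to the next.
raw_documents = [
    '{"user": "ann", "text": "Great service", "tags": ["support"]}',
    '{"user": "bob", "text": "Late delivery", "location": {"city": "Winnipeg"}}',
    '{"user": "cy", "rating": 4}',
]

records = [json.loads(doc) for doc in raw_documents]

# Discover which fields appear anywhere; no schema was declared up front.
all_fields = sorted({field for record in records for field in record})

# Flatten into rows, using None where a document lacks a field.
rows = [[record.get(field) for field in all_fields] for record in records]

print(all_fields)
for row in rows:
    print(row)
```

The gaps and nested values in the rows show why this kind of data doesn't easily fit fixed row/column structures.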
Variety
• “Variety is the biggest factor leading companies to Big Data” – Gartner
• Do you need all three factors to make it big data?
- No
- Any two of these factors that cause the existing analysis to fall short can make
the data Big Data
• As long as one is variety
Veracity
• 4th V
- Not mentioned consistently
• Definition
- Messiness or trustworthiness of data
- Involves the lineage of the data
- Twitter versus corporate data
- Is there insightful value in the data?
• 5th V - Validity
Datafication
What Problems does it solve?
• Risk Modelling
- Banks and Insurance
- Credit Card activity
• Customer churn analysis
- What were people doing just before they left?
• Recommendation engines
- LinkedIn, Amazon, Netflix
• Ad targeting
• Aggregated Transactional Analysis
- Supply Chain Management
• Threat Analysis
Who is leading the way?
• LinkedIn
- People you may know
• Netflix
- Million dollar prize
• GE/John Deere
- Predictive servicing on industrial devices
• Amazon
- Books others have also bought
• Google
- For everything
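The "books others have also bought" idea can be sketched by counting how often two items appear in the same basket. This is only an illustration with hypothetical baskets and function names. It is not a description of how Amazon or any of the companies above actually build their recommendations.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets, one set of titles per customer.
baskets = [
    {"Dune", "Neuromancer", "Foundation"},
    {"Dune", "Foundation"},
    {"Dune", "Neuromancer"},
    {"Foundation", "Hyperion"},
]

def co_purchase_counts(baskets):
    """Count how often each ordered pair of titles shares a basket."""
    counts = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def also_bought(title, counts, top_n=2):
    """Titles most often bought with `title`; ties are broken alphabetically."""
    related = [(other, n) for (first, other), n in counts.items() if first == title]
    related.sort(key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in related[:top_n]]

counts = co_purchase_counts(baskets)
print(also_bought("Dune", counts))
```

Production systems use far more signal than raw co-occurrence, but the counting step above is the core of the "customers who bought this also bought" pattern.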
Who is leading the way?
• UPS
• United HealthCare
• Macys
• Bank of America
• Citigroup
• Verizon Wireless
• City of Brandon
- Snowclearing website
What can be achieved?
• Cost reductions
- Large Enterprises
• Time reductions
- Large Enterprises and Small Enterprises
• Better decisions
- Large Enterprises, Small Enterprises, and Start ups
• New Offerings/Product Innovations
- Large Enterprises, Small Enterprises, and Start ups
Why is it important?
• Big Data is here to stay
• If you don’t embrace it, your competition will. They will use it to:
- Deliver with less cost and faster
- Develop new innovations
- Understand customers better to attract them and prevent them from leaving
- Make quicker and better decisions
What is the technology behind Big Data?
Distributed Data
• Big Data is just too big to be stored on one computer
• Storing data across multiple computers allows you to take advantage
of the other computers’ processing power
• SANs were the first solution to distributed data
• Cloud data is now the second generation of the solution
- Amazon S3 – Netflix
- Amazon Glacier
Cloud Computing
• IaaS, PaaS, SaaS, DaaS
• All made possible with virtualization
Hadoop
CAP Theorem
• Consistency
• Availability
• Partition tolerance
• Transactional Data not a good fit for Hadoop
- Requires Consistency
• Behavioural Data is a good fit
- Health Care
- Social Media
Hadoop
• Hadoop was the name of a stuffed toy elephant that belonged to the
son of one of the developers
• Not a single product
• Collection of applications
• Framework or platform
• Several modules
• Not a database
- Alternative file system with a processing library
Hadoop
• Why use Hadoop?
- Cheaper
- Faster
- Better suited to unstructured data
HDFS
• Hadoop Distributed File System that is spread across many
computers – 100’s to 10,000’s
• Not a database
- These are individual files stored across the computers
• Based on Google’s GFS (Google File System)
- To index the Internet
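The idea of individual files spread across many computers can be sketched as splitting a file into fixed-size blocks and keeping several copies of each block on different machines. Everything below is an assumption made for illustration: the 128 MB block size, the three replicas, the node names, and the round-robin placement. Real HDFS placement also takes rack layout into account.

```python
def place_blocks(file_size, block_size, nodes, replication=3):
    """Split a file into fixed-size blocks and assign each block to distinct nodes.

    Round-robin placement is a simplification for illustration,
    not the placement policy HDFS actually uses.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    block_count = -(-file_size // block_size)  # ceiling division
    placement = {}
    for block in range(block_count):
        placement[block] = [
            nodes[(block + r) % len(nodes)] for r in range(replication)
        ]
    return placement

MB = 1024 * 1024
layout = place_blocks(
    file_size=300 * MB,
    block_size=128 * MB,
    nodes=["n1", "n2", "n3", "n4"],
)
for block, holders in layout.items():
    print(block, holders)
```

Because every block lives on more than one node, losing a single machine does not lose data, and work on different blocks can run on different machines at the same time.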
MapReduce
• Map splits a task into many pieces
- Split a task up and send it to many computers
• Reduce takes the results and combines them back together
• In Hadoop 2, cluster resource management moved into YARN
- MapReduce running on YARN is sometimes called MapReduce 2
• Important feature of YARN
- Other engines can do stream processing in addition to batch processing
- Can also do graph processing
MapReduce
• Programming paradigm
- Map component executes a function on each piece of data
• Execute on each node
• Bring the compute to the data
• Output key and value pairs on each node
- Reducer aggregates the key-value pairs from the nodes
• Outputs a combined list
- Mapper and Reducer are classes
• Really a Functional programming model though
• State is not shared
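The paradigm above can be sketched in a few lines on a single machine. This is a toy model and not Hadoop's API. It uses plain functions rather than Mapper and Reducer classes, and the partitions, store names, and the `run_map_reduce` helper are all hypothetical.

```python
from collections import defaultdict

def run_map_reduce(partitions, mapper, reducer):
    """Toy single-process model of map, shuffle, and reduce."""
    # Map: run the mapper on every record of every partition
    # (each partition stands in for the data held on one node).
    mapped = []
    for partition in partitions:
        for record in partition:
            mapped.extend(mapper(record))

    # Shuffle: group the emitted values by key.
    grouped = defaultdict(list)
    for key, value in mapped:
        grouped[key].append(value)

    # Reduce: combine the values for each key into one result.
    return {key: reducer(key, values) for key, values in grouped.items()}

# Hypothetical sales records (store, amount) held on two nodes.
partitions = [
    [("north", 120.0), ("south", 80.0)],
    [("north", 30.0), ("east", 50.0)],
]

def sales_mapper(record):
    store, amount = record
    return [(store, amount)]  # emit one key/value pair per record

def sales_reducer(store, amounts):
    return sum(amounts)

totals = run_map_reduce(partitions, sales_mapper, sales_reducer)
print(totals)
```

Neither function touches shared state, so the map calls could run on separate machines in any order, which is what makes the model easy to distribute.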
Hello World for MapReduce
• Wordcount
- How much wood could a woodchuck chuck if a woodchuck could chuck
wood?
- {how,1;much,1;wood,2;could,2;a,2;woodchuck,2;chuck,2;if,1}
• We are essentially labelling and counting components of data
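The word count above can be reproduced with a short sketch. It runs in one process, so it shows only the map and reduce logic and not the distribution across nodes. The function names are illustrative.

```python
import re
from collections import Counter
from functools import reduce

SENTENCE = "How much wood could a woodchuck chuck if a woodchuck could chuck wood?"

def map_words(text):
    """Map step: emit a (word, 1) pair for every word, ignoring case and punctuation."""
    return [(word, 1) for word in re.findall(r"[a-z]+", text.lower())]

def reduce_counts(totals, pair):
    """Reduce step: fold one (word, count) pair into the running totals."""
    word, count = pair
    totals[word] += count
    return totals

word_counts = reduce(reduce_counts, map_words(SENTENCE), Counter())
print(dict(word_counts))
```

The totals match the counts listed on the slide: "wood", "could", "a", "woodchuck", and "chuck" appear twice, and "how", "much", and "if" appear once.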
Pig and Hive
• Pig
- Platform used to write MapReduce programs
- Use Pig Latin Language
• Hive
- Summarizes and queries data
- Analyzes data
- Uses the HiveQL language
Additional Components
• HBase – NoSQL database
• Storm – process streaming data
• Spark – in-memory processing
• Giraph – graph processing
Hadoop is Open Source
• Developed by engineers at Yahoo
• Now an open-source project
- You may hear about Apache Hadoop, Apache Pig
• Hadoop is free
• Anyone can download or modify
What are large companies and startups using
Big Data for?
General Uses
• Monitoring production lines
• Smart meters for utilities
• Environmental Monitoring
• Infrastructure Management (bridges, railways)
• Supply chain network
• Predictive maintenance
• Energy Management
• Medical and Health Care systems
• Home Automation
Common Applications of Big Data
• For Consumers
- Siri and Yelp
- Spotify and Amazon
- LinkedIn
- Netflix
- Google Now
• GPS aware
• Provides proactive advice on traffic and meals
Common Applications of Big Data
• For Business
- Google Ad searches
- John Deere
- Boeing
- American Airlines
- Orbitz
- Fraud Detection
• The way you move your mouse and navigate a website is distinctive
Common Applications of Big Data
• Google flu trends
- Anybody know how it is generated?
- Based on Google search history rather than lab results
• It was initially reported to be more accurate, though later analyses found it overestimated flu cases
General Uses of Big Data
Monitoring and Anomaly Detection
• Monitoring detects specific events
• Anomaly Detection detects unexpected events
- Unusual Activity
- Could be on a combination of criteria
- Usually requires human attention
- Invites inspection
- Hey, I’m not sure this is an issue but it isn’t common
• Big Data allows for more detail
- Extremely rare events
- Combination of a large number of factors
• Measure 1,000 different factors at once
• SPAM – Big Data Collection - GMAIL
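A very simple form of anomaly detection flags values that sit far from the mean. The sketch below uses a z-score on a single measure. The data and the threshold of 2.5 standard deviations are hypothetical, and a real system would combine many factors rather than one.

```python
from statistics import mean, stdev

def find_anomalies(values, threshold=3.0):
    """Return (index, value) pairs more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [
        (i, v) for i, v in enumerate(values)
        if abs(v - mu) / sigma > threshold
    ]

# Hypothetical daily transaction counts with one unusual day.
daily_counts = [100, 102, 98, 101, 99, 103, 97, 100, 250, 101, 99, 102]

flagged = find_anomalies(daily_counts, threshold=2.5)
print(flagged)
```

The flagged day is not necessarily a problem. As the slide puts it, the result invites inspection and usually still needs human attention.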
Data Mining and Text Analytics
• Search for unexpected patterns
- Supermarket/stock investments/set of symptoms
• Text Mining
- Focuses on content
- Sentiment analysis
• Positive/negative
• Work best with very large data sets
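Positive/negative sentiment analysis can be sketched with a word list. The lexicon below is tiny and entirely hypothetical. Real sentiment models are trained on large data sets, which is why the slide notes they work best at scale.

```python
import re

# Tiny hypothetical lexicon; real systems learn these weights from data.
POSITIVE = {"great", "love", "fast", "helpful"}
NEGATIVE = {"slow", "broken", "hate", "late"}

def sentiment(text):
    """Label text by counting positive words minus negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum((word in POSITIVE) - (word in NEGATIVE) for word in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for message in [
    "Love the fast delivery",
    "The app is slow and broken",
    "Arrived on Tuesday",
]:
    print(message, "->", sentiment(message))
```

A word list like this misses sarcasm, negation, and context, which is where larger data sets and trained models earn their keep.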
Predictive Analytics
• Crystal Ball of Big Data
• Nate Silver
- Correctly predicted the winner of every state in the 2012 US presidential election
- Combined election polls and weighted them by reliability
- ESPN bought his website
• Go visit for March Madness
• Netflix Prize – awarded for a 10% improvement in recommendation accuracy
- Ensemble Model
- Go to kaggle.com
• Offer compensation for people to create predictive models for them
• Free data to teach predictive modeling
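Combining polls and weighting them by reliability comes down to a weighted average. The sketch below shows only that step, with made-up poll numbers and weights. It is not Nate Silver's actual model, which involves much more than this.

```python
def weighted_average(polls):
    """Average poll shares, weighting each poll by its reliability score."""
    total_weight = sum(weight for _, weight in polls)
    if total_weight == 0:
        raise ValueError("at least one poll needs a non-zero weight")
    return sum(share * weight for share, weight in polls) / total_weight

# Hypothetical polls: (candidate's share in percent, reliability weight).
polls = [(52.0, 3.0), (48.0, 1.0), (50.0, 2.0)]

estimate = weighted_average(polls)
print(f"Weighted estimate: {estimate:.1f}%")
```

The most reliable poll pulls the estimate toward its own number, so the result sits above the simple average of 50%.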
Visualization
• Computers spot certain patterns
• Computers excel at predictive models
• Computers excel at data mining
• Humans perceive and interpret better
• Human vision plays an important role in Big Data
What Humans do well
• Identifying visual patterns
• Identifying anomalies
• Seeing patterns across groups
• Interpreting content of images
Gestalt Patterns
How to create a Big Data strategy and what
people and skills will you need for Big Data?
Data Scientists
Data Scientists
• Data Scientists need all three competencies
- Coding
- Statistics
- Domain Knowledge
Domain Knowledge
Coding
• Competencies to combine a variety of data to determine patterns and
trends
Types and Skills in Data Science
• “Analyzing the analyzers”
- 40-page book
- Studied 250 data scientists
Types and Skills in Data Science
Types and Skills in Data Science
What should be your strategy?
• Are you Conservative/Moderate/Aggressive ?
• Factors
- Competitors
- Is Industry Technology focused?
- Availability of data
- Data expertise
Big Data Strategy
• Build awareness/competencies
• Low cost of entry
- Open Source
- Cloud based hosting
- Unlike expensive Analytics, this is available to everyone
• Create Big Data Targets - pain point for efficiency or improvement
- Which business process needs better decision making
- Which business process needs faster decision making
- Is someone likely to employ Big Data? If so, where?
- Are we processing large amounts of data that could be made better?
- Could we create a new/enhanced data driven product or service?
Big Data Strategy
• For the potential Big Data targets, is there additional data surrounding
the target that would allow for better decision making?
- Can we acquire that data and incorporate it into our analysis?
- How can we combine different types of data that have not been
combined before to improve our analysis?
• Refunds/length of time person spent in store originally?
• Refunds/Salesperson?
• Experiment with a solution and iterate
• Always start with a business problem that could have a Big Data
solution
- Too many refunds or losing clients
- Big Data is not a solution unto itself
- Learn from the Data Warehouse projects
Questions?

 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 

Ictam big data

  • 1. ©2015 Protegra Inc. All rights reserved. Big Data Terry Bunio - Protegra
  • 2. Who Am I? • Data Base Administrator - Oracle - SQL Server 6,6.5,7,2000,2005,2008,2012 - Informix - ADABAS • Data Modeler/Architect - Investors Group, LPL Financial, Manitoba Blue Cross, Assante Financial, CI Funds, Mackenzie Financial - Normalized and Dimensional • Agilist - Innovation Gamer, Team Member, SQL Developer, Test writer, Sticky Sticker, Project Manager, PMO on SAP Implementation
  • 4. Members of the Protegra Community of software-driven businesses & solutions
  • 5.
  • 6.
  • 8. Myths • Big Data Myth #1: It’s Big • Big Data Myth #2: You need to apply it right away • Big Data Myth #3: The more granular the data, the better • Big Data Myth #4: Big Data is good data • Big Data Myth #5: Big Data means that analysts become all-important • Big Data Myth #6: Big Data gives you concrete answers • Big Data Myth #7: Big Data is a magic 8-ball • Big Data Myth #8: Big Data can create self-learning algorithms
  • 9. Evolution Term Time Frame Decision Support 1970-1985 Executive Support 1980-1990 Online Analytical Processing 1990-2000 Business Intelligence 1989-2005 Analytics 2005-2010 Big Data 2010-2015 Next??
  • 10. Facts • Every 2 days we create as much information as we did from the beginning of time until 2003 • Over 90% of all the data in the world was created in the past 2 years • It is expected that by 2020 the amount of digital information in existence will have grown from 3.2 zettabytes today to 40 zettabytes • The total amount of data being captured and stored by industry doubles every 1.2 years • Every minute we send 204 million emails, generate 1.8 million Facebook likes, send 278 thousand Tweets, and upload 200,000 photos to Facebook
  • 11. Facts • Google alone processes on average over 40 thousand search queries per second, making it over 3.5 billion in a single day • Around 100 hours of video are uploaded to YouTube every minute and it would take you around 15 years to watch every video uploaded by users in one day • If you burned all of the data created in just one day onto DVDs, you could stack them on top of each other and reach the moon – twice • AT&T is thought to hold the world’s largest volume of data in one unique database – its phone records database is 312 terabytes in size, and contains almost 2 trillion rows
  • 12. Facts • 570 new websites spring into existence every minute of every day • 1.9 million IT jobs will be created in the US by 2015 to carry out big data projects. Each of those will be supported by 3 new jobs created outside of IT – meaning a total of 6 million new jobs thanks to big data • Today’s data centres occupy an area of land equal in size to almost 6,000 football fields • Between them, companies monitoring Twitter to measure “sentiment” analyze 12 terabytes of tweets every day
  • 13. Facts • The amount of data transferred over mobile networks increased by 81% to 1.5 exabytes (1.5 billion gigabytes) per month between 2012 and 2014. Video accounts for 53% of that total • The NSA is thought to analyze 1.6% of all global internet traffic – around 30 petabytes (30 million gigabytes) every day • The value of the Hadoop market is expected to soar from $2 billion in 2013 to $50 billion by 2020, according to market research firm Allied Market Research
  • 14. Facts • The number of bits of information stored in the digital universe is thought to have exceeded the number of stars in the physical universe in 2007 • The boom of the Internet of Things will mean that the number of devices connected to the Internet will rise from about 13 billion today to 50 billion by 2020 • 12 million RFID tags – used to capture data and track movement of objects in the physical world – had been sold by 2011. By 2021, it is estimated that number will have risen to 209 billion as the Internet of Things takes off
  • 15. Facts • Big data has been used to predict crimes before they happen – a “predictive policing” trial in California was able to identify areas where crime will occur three times more accurately than existing methods of forecasting • By better integrating big data analytics into healthcare, the industry could save $300 billion a year – that’s the equivalent of reducing the healthcare costs of every man, woman and child by $1,000 a year • Retailers could increase their profit margins by more than 60% through the full exploitation of big data analytics
  • 16. Facts • The big data industry is expected to grow from US$10.2 billion in 2013 to about US$54.3 billion by 2017
  • 17.
  • 18. What Big Data is and what it isn’t. What problems does it solve?
  • 19. What is Big Data not? • Not regular data • Not data that fits into the existing analytic paradigm/toolset • Doesn’t easily fit into the existing row/column structures
  • 20. New Types of Data • Activity Data - Listening to music - Watching movies - Browsing - Driving - Walking - Exercising • Intentional and Non-Intentional
  • 21. New Types of Data • Conversations - Audio - Visual - Textual
  • 22. New Types of Data
  • 23. New Types of Data • Image data - Photo • Cameras • Phones - Video • Cameras • CCTV • YouTube!
  • 24. New Types of Data • Machine to Machine - Cell phone to towers - Medical devices • Sensors - Location - Speed - Acceleration - Health - Altitude - Temperature - Humidity - Among many others….
  • 25. New Types of Data • Internet of Things - Toasters - Fridges - Phones - Jet Engines - Combines - Manufacturing
  • 26. Abandoned Activities • Exciting new area of big data - Which mouse patterns and keystrokes are started but eventually abandoned? - Which articles do you almost comment on? - Which transactions do you start but not complete? - Do these share common themes? - At what step is the transaction abandoned?
  • 27. What makes Data, Big Data? • Volume • Velocity • Variety • Veracity
  • 28. Volume • If we take all the data generated from the beginning of time to 2008 - That same amount of data is now generated almost every minute • We can now store that data across immense networks • But there is more data than we can possibly analyze • One airplane generates 2 terabytes of data about its engines on a flight across North America • Half of the data available for analysis in 2012 was surveillance video
  • 29. Big Data versus Lotsa Data • Lotsa Data - Same structured data as you currently have • Just more of it - Can be analyzed using the same paradigms/toolsets • May just take a bit longer
  • 30. Velocity • Speed at which data is generated and analyzed, and the speed at which data moves around - Think of how quickly something can be trending on Twitter • Technology now allows us to analyze data in memory without it ever being stored • 200 billion tweets per year • Many existing analytics are designed for static data - But what happens to your analysis if the data is constantly changing? • Sensor data - Can’t store all of it, just too much data
  • 31. Variety • We used to be able to just focus on structured data in neat relational table structures • 80% of the data is now unstructured - Text - Image - Video - Voice - Social Graph data - NoSQL • Big Data Technology can now bring different types of data together to analyze
  • 32. Variety • Structured • Unstructured • Semi-structured - XML - JSON - Usually applied to text to try and enhance the analysis that can be done - Stored in NoSQL databases • MongoDB
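
The semi-structured idea on slide 32 can be illustrated with a minimal Python sketch. The two JSON records and the `field_names` helper are made up for illustration and are not from the deck; the point is only that each record carries its own field names, so records in the same collection do not have to share one fixed row/column layout.

```python
import json

# Two hypothetical records: same collection, different fields.
records = [
    '{"user": "a", "text": "great service", "tags": ["praise"]}',
    '{"user": "b", "text": "late delivery", "location": {"city": "Winnipeg"}}',
]

def field_names(raw):
    """Return the top-level field names of one JSON record, sorted."""
    return sorted(json.loads(raw).keys())

for raw in records:
    doc = json.loads(raw)
    # A field that is missing from a record simply yields None here.
    city = doc.get("location", {}).get("city")
    print(doc["user"], field_names(raw), city)
```

A document store such as MongoDB keeps records of roughly this shape; the sketch uses only the Python standard library and does not show any MongoDB API.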
  • 33. Variety • “Variety is the biggest factor leading companies to Big Data” – Gartner • Do you need all three factors to make it big data? - No - Any two of these factors that cause the existing analysis to fall short can make the data Big Data • As long as one is variety
  • 34. Veracity • 4th V - Not mentioned consistently • Definition - Messiness or trustworthiness of data - Involves the lineage of the data - Twitter versus corporate data - Is there insightful value in the data? • 5th V - Validity
  • 36. What Problems does it solve? • Risk Modelling - Banks and Insurance - Credit Card activity • Customer churn analysis - What were people doing just before they left? • Recommendation engines - LinkedIn, Amazon, Netflix • Ad targeting • Aggregated Transactional Analysis - Supply Chain Management • Threat Analysis
  • 37. Who is leading the way? • LinkedIn - People you may know • Netflix - Million dollar prize • GE/John Deere - Predictive servicing on industrial devices • Amazon - Books others have also bought • Google - For everything
  • 38. Who is leading the way? • UPS • United HealthCare • Macys • Bank of America • Citigroup • Verizon Wireless • City of Brandon - Snowclearing website
  • 39. What can be achieved? • Cost reductions - Large Enterprises • Time reductions - Large Enterprises and Small Enterprises • Better decisions - Large Enterprises, Small Enterprises, and Start ups • New Offerings/Product Innovations - Large Enterprises, Small Enterprises, and Start ups
  • 40. Why is it important? • Big Data is here to stay • If you don’t embrace it, your competition will. They will use it to: - Deliver with less cost and faster - Develop new innovations - Understand customer better to attract them and prevent them from leaving - Make quicker and better decisions
  • 41. What is the technology behind Big Data?
  • 42. Distributed Data • Big Data is just too big to be stored on one computer • Storing data across multiple computers allows you to take advantage of the other computers’ processing power • SANs were the first solution to distributed data • Cloud data is now the second generation of the solution - Amazon S3 – Netflix - Amazon Glacier
  • 43. Cloud Computing • IaaS, PaaS, SaaS, DaaS • All made possible with virtualization
  • 45. CAP Theorem • Consistency • Availability • Partition tolerance • Transactional Data not a good fit for Hadoop - Requires Consistency • Behavioural Data is a good fit - Health Care - Social Media
  • 46. Hadoop • Hadoop was the name of the stuffed elephant that belonged to the son of one of the developers • Not a single product • Collection of applications • Framework or platform • Several modules • Not a database - Alternative file system with a processing library
  • 47. Hadoop • Why use Hadoop? - Cheaper - Faster - Better suited to unstructured data
  • 48. HDFS • Hadoop Distributed File System that is spread across many computers – 100s to 10,000s • Not a database - These are individual files stored across the computers • Based on Google’s GFS - To index the Internet
  • 49. MapReduce • Map splits a task into many pieces - Split a task up and send it to many computers • Reduce takes the results and combines them back together • In Hadoop 2, resource management moved to YARN - Sometimes called MapReduce 2 • Important feature of YARN - Allows stream processing in addition to batch processing - Also allows graph processing
  • 50. MapReduce • Programming paradigm - Map component executes a function on each piece of data • Execute on each node • Bring the compute to the data • Output key and value pairs on each node - Reducer Aggregates the key value pairs on the nodes • Outputs a combined list - Mapper and Reducer are classes • Really a Functional programming model though • State is not shared
  • 51. Hello World for MapReduce • Wordcount - How much wood could a woodchuck chuck if a woodchuck could chuck wood? - {how,1;much,1;wood,2;could,2;a,2;woodchuck,2;chuck,2;if,1} • We are essentially labelling and counting components of data
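
The word count on slide 51 can be sketched in plain Python to show the map, shuffle, and reduce steps described on slides 49 and 50. This is a single-process illustration, not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the two text chunks merely stand in for pieces of input that would live on different nodes.

```python
import re
from collections import defaultdict

def map_phase(chunk):
    """Mapper: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word, 1) for word in re.findall(r"[a-z]+", chunk.lower())]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reducer: combine all counts emitted for one word."""
    return key, sum(values)

def word_count(chunks):
    """Run map on every chunk, then shuffle and reduce the emitted pairs."""
    pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

# Two chunks stand in for pieces of the input stored on different nodes.
chunks = [
    "How much wood could a woodchuck chuck",
    "if a woodchuck could chuck wood?",
]
counts = word_count(chunks)
print(counts)
```

Neither the mapper nor the reducer shares state with any other call, which is what lets a real framework run them on many machines at once.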
  • 52. Pig and Hive • Pig - Platform used to write MapReduce programs - Uses the Pig Latin language • Hive - Summarizes, queries, and analyzes data - Uses the HiveQL language
  • 53. Additional Components • HBase – NoSQL database • Storm – process streaming data • Spark – in-memory processing • Giraph – graph processing
  • 54. Hadoop is Open Source • Developed by engineers at Yahoo • Now an open-source project - You may hear about Apache Hadoop, Apache Pig • Hadoop is free • Anyone can download or modify it
  • 55. What are large companies and startups using Big Data for?
  • 56. General Uses • Monitoring production lines • Smart meters for utilities • Environmental Monitoring • Infrastructure Management (bridges, railways) • Supply chain network • Predictive maintenance • Energy Management • Medical and Health Care systems • Home Automation
  • 57. Common Applications of Big Data • For Consumers - Siri and Yelp - Spotify and Amazon - LinkedIn - Netflix - Google Now • GPS aware • Provide pro-active advice on traffic and meals
  • 58. Common Applications of Big Data • For Business - Google Ad searches - John Deere - Boeing - American Airlines - Orbitz - Fraud Detection • The way you move your mouse and navigate a website is distinctive
  • 59. Common Applications of Big Data • Google flu trends - Anybody know how it is generated? - Based on Google search history rather than lab results • It was found this was more accurate
  • 60. General Uses of Big Data
  • 61. Monitoring and Anomaly Detection • Monitoring detects specific events • Anomaly Detection detects unexpected events - Unusual Activity - Could be on a combination of criteria - Usually requires human attention - Invites inspection - Hey, I’m not sure this is an issue but it isn’t common • Big Data allows for more detail - Extremely rare events - Combination of a large number of factors • Measure 1,000 different factors at once • SPAM – Big Data Collection - GMAIL
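
The difference between monitoring and anomaly detection on slide 61 can be sketched as follows. This is a minimal single-factor illustration under stated assumptions: the sensor readings, the fixed `limit`, and the z-score `threshold` are invented, and a z-score is only one simple way to flag unusual values, not a method named in the deck.

```python
from statistics import mean, stdev

def monitor(values, limit):
    """Monitoring: flag a specific, known event (a reading above a fixed limit)."""
    return [i for i, v in enumerate(values) if v > limit]

def find_anomalies(values, threshold=2.5):
    """Anomaly detection: flag readings that are unusual relative to the rest.

    A reading is flagged when it lies more than `threshold` sample standard
    deviations from the mean of all readings.
    """
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all readings identical, nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical temperature readings with one unexpected spike at index 9.
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 20.1, 19.7, 20.0, 35.0, 20.1, 19.9]

print(monitor(readings, limit=50.0))   # the known rule does not fire
print(find_anomalies(readings))        # the spike is still flagged for inspection
```

The flagged index is a prompt for a person to look, in the spirit of "I'm not sure this is an issue but it isn't common". Combining many factors at once, as the slide describes, would need a multivariate method that this sketch does not attempt.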
  • 62. Data Mining and Text Analytics • Search for unexpected patterns - Supermarket/stock investments/set of symptoms • Text Mining - Focuses on content - Sentiment analysis • Positive/negative • Work best with very large data sets
  • 63. Predictive Analytics • Crystal Ball of Big Data • Nate Silver - Accurately predicted every state in 2012 election - Combined election polls and weighted them by reliability - ESPN bought his website • Go visit for March Madness • Netflix – 10% prize - Ensemble Model - Go to kaggle.com • Offer compensation for people to create predictive models for them • Free data to teach predictive modeling
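
The idea on slide 63 of combining polls and weighting them by reliability can be sketched as a weighted average. The poll numbers and reliability weights below are invented for illustration, and this is not a reconstruction of Nate Silver's actual model, which the deck does not describe.

```python
def weighted_average(polls):
    """Combine (support_percent, reliability_weight) pairs into one estimate."""
    total_weight = sum(weight for _, weight in polls)
    if total_weight == 0:
        raise ValueError("at least one poll needs a non-zero weight")
    return sum(share * weight for share, weight in polls) / total_weight

# Hypothetical polls: (candidate support in percent, reliability weight).
polls = [(52.0, 3.0), (49.0, 1.0), (51.0, 2.0)]

estimate = weighted_average(polls)
plain_mean = sum(share for share, _ in polls) / len(polls)
print(round(estimate, 2), round(plain_mean, 2))
```

The more reliable polls pull the estimate toward their values, so the weighted figure differs from the plain mean of the same polls.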
  • 64. Visualization • Computers spot certain patterns • Computers excel at predictive models • Computers excel at data mining • Humans perceive and interpret better • Human vision plays an important role in Big Data
  • 65. What Humans do well • Identifying visual patterns • Identifying anomalies • Seeing patterns across groups • Interpreting content of images
  • 67. How to create a Big Data strategy and what people and skills will you need for Big Data?
  • 69. Data Scientists • Data Scientists need all three competencies - Coding - Statistics - Domain Knowledge
  • 71. Coding • Competencies to combine a variety of data to determine patterns and trends
  • 72. Types and Skills in Data Science • “Analyzing the analyzers” - 40 page book - Studied 250 data scientists
  • 73. Types and Skills in Data Science
  • 74. Types and Skills in Data Science
  • 75. What should be your strategy? • Are you Conservative/Moderate/Aggressive ? • Factors - Competitors - Is Industry Technology focused? - Availability of data - Data expertise
  • 76. Big Data Strategy • Build awareness/competencies • Low cost of entry - Open Source - Cloud based hosting - Unlike expensive Analytics, this is available to everyone • Create Big Data Targets - pain point for efficiency or improvement - Which business process needs better decision making - Which business process needs faster decision making - Is someone likely to employ Big Data? If so, where? - Are we processing large amounts of data that could be made better? - Could we create a new/enhanced data driven product or service?
  • 77. Big Data Strategy • For the potential Big Data targets, is there additional data surrounding the target that would allow for better decision making? - Can we acquire that data and incorporate it into our analysis? - How can we combine different types of data that have not been combined before to improve our analysis? • Refunds/length of time person spent in store originally? • Refunds/Salesperson? • Experiment with a solution and iterate • Always start with a business problem that could have a Big Data solution - Too many refunds or losing clients - Big Data is not a solution unto itself - Learn from the Data Warehouse projects