Big Data and
Classification
Paul Balas
Content Architect
303Computing
A Dystopian Future
George Orwell feared those who would deprive us of information.
He feared the truth would be concealed from us.
He never imagined “Big Data”
A glut of information that
would conceal truth
unintentionally
In 9.5 minutes or less…
Convince you that without classification
BIG DATA FAILS
Methods for Classification
Data Mining Classification
Machine-driven
Taxonomies
Human-driven
Classification Helps!
• Group information by common
attributes
• Easily compare similarities
and differences
People Classify
(Most) classification is done by
humans at some point in the life of a
datum
Not Machines
Not Algorithms
But Deep Learning is Changing This
• Machines can now ‘recognize’ complex
objects without supervision
Without Classification, finding information
is like finding a needle in a haystack…
Or, mistaking the haystack for a pile of needles
With Big Data, the haystack is huge
People don’t always agree on the rules
of the game. For example:
Super Bowl XL
Scott Steinmann
A Quiz for you…
On the next slide, I want you to tell
me what these four types of data
have in common
Raise your hand when you get the
answer…
(don’t worry, I won’t call on anyone)
“A computer would deserve to be called
intelligent if it could deceive a human into
believing that it was human.”
Did you get it right?
Alan Turing
The more data types we have
The harder the classification
Classification Cracked The Enigma Code
158,962,555,217,826,360,000
possibilities
Turing used Classification of the data to
narrow the problem set
1st: A letter can never be encrypted as itself
2nd: Known phrases, such as the weather report
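The two rules on this slide can be combined into a simple filter. The sketch below is only an illustration, not a reconstruction of the actual Bletchley Park procedure: it takes a guessed phrase (a "crib") and discards every alignment with the ciphertext where some letter would have to encrypt to itself. The function name, the sample ciphertext, and the choice of WETTERBERICHT (German for "weather report") as the crib are assumptions made for this example.

```python
def possible_crib_positions(ciphertext: str, crib: str) -> list[int]:
    """Return the offsets at which the crib could line up with the ciphertext."""
    ciphertext = ciphertext.upper()
    crib = crib.upper()
    positions = []
    for offset in range(len(ciphertext) - len(crib) + 1):
        window = ciphertext[offset:offset + len(crib)]
        # Rule 1: Enigma never encrypts a letter as itself, so a single
        # matching letter is enough to rule this alignment out.
        if all(c != p for c, p in zip(window, crib)):
            positions.append(offset)
    return positions


# Hypothetical intercepted text (made up for this example) and a guessed phrase.
sample_ciphertext = "QFZWRWIVTYRESXBFOGKUHQBAISEZ"
print(possible_crib_positions(sample_ciphertext, "WETTERBERICHT"))
```

Every offset the filter removes is a part of the haystack that never has to be searched, which is the sense in which classification narrows the problem set.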
Without Classification
There is no Correlation
Without Correlation
We are all out of jobs!
The ‘Classification Food Chain’
Classification shapes data
Shaped data enables data quality
Data Quality delivers confidence in results
Bad Classification Has Bad Consequences
Elections are won
Shuttles explode
Financial Markets
Meltdown
If you want to be confident in
your Big Data results…
Invest in your classifications as
they are critical to your success!
Thank You!
Paul Balas
303computing@gmail.com

Why Your Big Data Project Will Fail, and How to Avoid It

Editor's Notes

  1. Big Data and Classification: why, more than ever, classification and good data architecture are critical to providing confidence in the analytical outcomes of your big data project. Paul Balas: 25+ years of data architecture and data-centric implementations, from sourcing all types of data to mastering, modeling, and sharing it. Implemented numerous content architectures for Fortune 500 companies and for KAPSARC in the Kingdom of Saudi Arabia. Chair of the Big Data in Denver LinkedIn Group.
  2. He never imagined a world with so much data that information would be obscured simply by the volume and variety of data.
  3. If you can’t categorize your data, you can’t analyze it. If you aren’t performing data profiling on your big data as a first step to your analysis, you’ve already failed.
  4. The earliest known system of classification is that of Aristotle, who attempted in the 4th cent. B.C. to group animals according to such criteria as mode of reproduction and possession or lack of red blood. Aristotle's pupil Theophrastus classified plants according to their uses and methods of cultivation. Little interest was shown in classification until the 17th and 18th cent., when botanists and zoologists began to devise the modern scheme of categories. The designation of groups was based almost entirely on superficial anatomical resemblances.
  5. Machine-driven classification can assist in human analysis and refinement of classification systems, but as it stands, without human context, machine-driven classification is limited.
  6. Well-known classification systems such as GAAP-based accounting or plant taxonomies provide a common language that is widely accepted and therefore trusted. Common classification systems facilitate understanding and knowledge sharing.
  7. A new Big Data phenomenon is the ‘Data Lake’. I like to call it the ‘Data Swamp’, as the information added to the lake is useless until it’s classified. The excitement around Hadoop and other NoSQL technologies is that they allow you to defer classification, cleansing, and standardization until after the load and perform them on the fly, thus making the ingestion process and certain types of analytical workloads much faster.
  8. Billions of dollars and tens of thousands of person-years of effort have been spent on search technologies, all with the focus of classifying data on the fly to help people locate precise information. Most of this effort has been driven by the internet search engines and by firms trying to capitalize on e-commerce.
  9. Bad categorization of a population has the effect of completely misleading results and creating controversy. NASA: Ninety-seven percent of climate scientists agree that climate-warming trends over the past century are very likely due to human activities, and most of the leading scientific organizations worldwide have issued public statements endorsing this position. The Wall Street Journal, "The Myth of the Climate Change '97%'" by Joseph Bast and Roy Spencer: What is the origin of the false belief, constantly repeated, that almost all scientists agree about global warming? Ms. Oreskes's definition of consensus covered "man-made" but left out "dangerous", and scores of articles by prominent scientists such as Richard Lindzen, John Christy, Sherwood Idso and Patrick Michaels, who question the consensus, were excluded. The methodology is also flawed. A study published earlier this year in Nature noted that abstracts of academic papers often contain claims that aren't substantiated in the papers.
  10. According to IBM, global data volume in 2015 is about 8,000 exabytes, most of it sensor and social media data. By 2020, some predict 5x growth to 40,000 exabytes.
  11. Even though he was already ejected from the game, Scott Steinmann continues to argue with the umpire's call. Why were there so many controversial calls in Super Bowl XL? Were the rules for each penalty applied fairly? The outcome of that game was hotly debated.
  12. What was easy for those of you who knew the answer is exponentially more difficult for machines. Each data type has to be parsed and a common taxonomy applied as metadata to the data itself, then correlated to find the commonalities in each data source.
  13. That is 158 quintillion, if you wanted to know.
  14. Was the chad in favor of Bush or Gore? The risk of O-ring failure wasn’t correctly classified based on the temperatures it would encounter. Credit default swap risk wasn’t correctly categorized, and risky financial decisions ensued.
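
The 158 quintillion figure on the Enigma slide can be reproduced with a few lines of arithmetic. The deck does not say how the number is derived. The sketch below assumes the commonly cited breakdown for a three-rotor Enigma: three rotors chosen in order from a set of five, 26 starting positions per rotor, and ten plugboard cables. All variable names are illustrative.

```python
from math import factorial, perm

# Assumption: 3 rotors picked in order from a box of 5.
rotor_orders = perm(5, 3)

# Assumption: each of the 3 rotors can start at any of 26 letters.
rotor_positions = 26 ** 3

# Assumption: 10 plugboard cables, each swapping one pair of letters.
# Count the ways to choose 10 unordered pairs from 26 letters.
cables = 10
plugboard_pairings = factorial(26) // (
    factorial(26 - 2 * cables) * factorial(cables) * 2 ** cables
)

total = rotor_orders * rotor_positions * plugboard_pairings
print(f"{total:,}")
```

Under these assumptions the product matches the number shown on the slide, and almost all of it comes from the plugboard term.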