13. People don’t always agree with the rules
of the game. For example:
Super Bowl XL
Scott Steinmann
14. A Quiz for you…
On the next slide, I want you to tell
me what these four types of data
have in common
Raise your hand when you get the
answer…
(don’t worry, I won’t call on anyone)
15. “A computer would deserve to be called
intelligent if it could deceive a human into
believing that it was human.”
16. Did you get it right?
Alan Turing
The more data types we have,
the harder the classification
17. Classification Cracked The Enigma Code
158,962,555,217,826,360,000
possibilities
Turing used classification of the data to
narrow the problem space:
1st: A letter can never be enciphered as itself
2nd: Known phrases, such as the daily weather report
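The second constraint builds on the first: if you suspect a known phrase (a “crib”) appears in a message, the never-enciphered-as-itself rule eliminates every alignment where the crib and the ciphertext share a letter in the same position. A minimal sketch of that pruning step, with a made-up ciphertext (not real Enigma traffic):

```python
# Sketch of crib-based pruning. Enigma never enciphers a letter as itself,
# so any alignment where the crib and ciphertext agree at some position
# is impossible and can be discarded.

def possible_crib_positions(ciphertext: str, crib: str) -> list[int]:
    """Return the offsets at which the crib could align with the ciphertext."""
    positions = []
    for offset in range(len(ciphertext) - len(crib) + 1):
        window = ciphertext[offset:offset + len(crib)]
        # Reject any alignment that would encipher a letter as itself.
        if all(c != p for c, p in zip(window, crib)):
            positions.append(offset)
    return positions

ciphertext = "QFZWRWIVTYRESXBFOGKUHQBAISE"  # illustrative, made-up ciphertext
crib = "WETTERBERICHT"                       # German for "weather report"
print(possible_crib_positions(ciphertext, crib))
```

Each discarded alignment removes whole families of rotor settings from consideration, which is how classification of the data shrank that 158 quintillion down to something the Bombe machines could search.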
Big Data and Classification – why, more than ever, classification and good data architecture are critical to providing confidence in the analytical outcomes of your big data project.
Paul Balas
25+ years of data architecture and data-centric implementations, from sourcing all types of data to mastering, modeling, and sharing it
Implemented numerous content architectures for Fortune 500 companies and for the Kingdom of Saudi Arabia (KAPSARC)
Chair of the Big Data in Denver LinkedIn Group.
He never imagined a world with so much data that information would be obscured simply by the volume and variety of data.
If you can’t categorize your data, you can’t analyze it. If you aren’t performing data profiling on your big data as a first step to your analysis, you’ve already failed.
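What does that first profiling step look like in practice? A minimal sketch using only the standard library — the column names and sample rows are made up for illustration:

```python
# Minimal data-profiling sketch: per-column null counts, distinct counts,
# and inferred value types. Mixed types in one column are an early warning
# that classification is needed before analysis.
from collections import Counter

def profile(rows: list[dict]) -> dict:
    """Build a per-column profile report for a list of records."""
    report = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [r.get(col) for r in rows]
        types = Counter(type(v).__name__ for v in values if v is not None)
        report[col] = {
            "nulls": sum(v is None for v in values),
            "distinct": len(set(v for v in values if v is not None)),
            "types": dict(types),
        }
    return report

rows = [
    {"customer": "ACME", "spend": 120.5},
    {"customer": "ACME", "spend": None},
    {"customer": "Globex", "spend": "n/a"},  # mixed types: a red flag
]
print(profile(rows))
```

Even a crude report like this surfaces the null rates and type conflicts that would otherwise silently corrupt downstream analysis.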
The earliest known system of classification is that of Aristotle, who attempted in the 4th cent. B.C. to group animals according to such criteria as mode of reproduction and possession or lack of red blood. Aristotle's pupil Theophrastus classified plants according to their uses and methods of cultivation. Little interest was shown in classification until the 17th and 18th cent., when botanists and zoologists began to devise the modern scheme of categories. The designation of groups was based almost entirely on superficial anatomical resemblances.
Machine driven classification can assist in human analysis and refinement of classification systems, but as it stands, without human context, machine driven classification is limited.
Well-known classification systems such as GAAP-based accounting or plant taxonomies provide a common language that is widely accepted and therefore trusted. Common classification systems facilitate understanding and knowledge sharing.
A new Big Data phenomenon is the ‘Data Lake’. I like to call it the ‘Data Swamp’, as the information added to the lake is useless until it’s classified. The excitement around Hadoop and other NoSQL technologies is that they allow you to defer classification, cleansing, and standardization until after loading, applying them on the fly, which makes the ingestion process and certain types of analytical workloads much faster.
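This deferred, “schema-on-read” style can be sketched in a few lines — the field names, records, and conversion rules below are illustrative assumptions, not a real pipeline:

```python
# Schema-on-read sketch: raw records land in the "lake" untouched, and
# classification/standardization happens only when the data is read.
import json

RAW_LAKE = [
    '{"device": "sensor-7", "temp_f": "98.6"}',
    '{"device": "sensor-7", "temperature": 37.0, "unit": "C"}',  # different schema
]

def read_with_schema(raw_lines):
    """Parse and standardize records at read time, not at ingest time."""
    for line in raw_lines:
        rec = json.loads(line)
        # Standardize to Celsius regardless of how the record arrived.
        if "temp_f" in rec:
            rec["temp_c"] = round((float(rec.pop("temp_f")) - 32) * 5 / 9, 1)
        elif rec.get("unit") == "C":
            rec["temp_c"] = rec.pop("temperature")
            rec.pop("unit")
        yield rec

print(list(read_with_schema(RAW_LAKE)))
```

Ingestion is fast because nothing is enforced up front — but note that the classification work hasn’t disappeared; it has just moved to read time, which is exactly why an unclassified lake behaves like a swamp.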
Billions of dollars and tens of thousands of person-years of effort have been spent on search technologies, all focused on classifying data on the fly to help people locate precise information. Most of this effort has been driven by the internet search engines and firms trying to capitalize on e-commerce.
Bad categorization of a population completely misleads results and creates controversy.
NASA:
Ninety-seven percent of climate scientists agree that climate-warming trends over the past century are very likely due to human activities, and most of the leading scientific organizations worldwide have issued public statements endorsing this position. The following is a partial list of these organizations, along with links to their published statements and a selection of related resources.
The Wall Street Journal
The Myth of the Climate Change '97%'
What is the origin of the false belief—constantly repeated—that almost all scientists agree about global warming?
By Joseph Bast and Roy Spencer
Ms. Oreskes's definition of consensus covered "man-made" but left out "dangerous"—and scores of articles by prominent scientists such as Richard Lindzen, John Christy, Sherwood Idso and Patrick Michaels, who question the consensus, were excluded. The methodology is also flawed. A study published earlier this year in Nature noted that abstracts of academic papers often contain claims that aren't substantiated in the papers.
According to IBM, in 2015 global data volume was about 8,000 exabytes
Most of it is sensor and social media data
By 2020 some predict a 5x growth to 40,000 exabytes
Even though he had already been ejected from the game, Scott Steinmann continued to argue with the umpire’s call.
Why were there so many controversial calls in Super Bowl XL? Were the rules for each penalty applied fairly? The outcome of that game was hotly debated.
What was easy for those of you who knew the answer, is exponentially difficult for machines. Each data type has to be parsed and a common taxonomy applied as metadata to the data itself, then correlated to find the commonalities in each data source.
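The machine’s version of that task — apply a common taxonomy as metadata, then intersect across sources — can be sketched as follows. The taxonomy, keywords, and sample texts are made-up illustrations:

```python
# Sketch of taxonomy-based tagging and correlation: each source gets
# category tags as metadata, and the commonality across sources is
# the intersection of those tags.
TAXONOMY = {
    "machine": ["turing", "enigma", "computer"],
    "intelligence": ["intelligent", "deceive", "human"],
}

def tag(text: str) -> set[str]:
    """Attach every taxonomy category whose keywords appear in the text."""
    words = text.lower().split()
    return {cat for cat, kws in TAXONOMY.items()
            if any(kw in words for kw in kws)}

sources = {
    "quote": "a computer would deserve to be called intelligent",
    "bio": "turing cracked the enigma code",
}
tags = {name: tag(text) for name, text in sources.items()}
# The commonality across the sources is the intersection of their tags.
common = set.intersection(*tags.values())
print(common)
```

Real systems need parsers per data type (image, audio, text, structured) before any tagging can happen, which is where the exponential difficulty comes from — but the correlation step itself reduces to exactly this kind of set intersection over shared metadata.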
That is about 159 quintillion, if you wanted to know.
Was the chad in favor of Bush or Gore?
The risk of O-ring failure wasn’t correctly classified for the temperatures it would encounter.
Credit default swap risk wasn’t correctly categorized, and risky financial decisions ensued.