Big Data: How does it fit in your data strategy?
A Lunch & Learn webinar for IT Management
Brought to you by Performance Tuning Corporation
www.perftuning.com
Panelists
Mark Swanholm
Chief Strategy Officer
Performance Tuning Corporation
https://www.linkedin.com/in/mswanholm
Dan Morgan
Oracle ACE Director
Performance Tuning Corporation
https://www.linkedin.com/pub/dan-morgan/0/aa9/a5
Agenda
• Introduction
• What is Big Data?
• Big Data's Sources
• The Four Vs
• Three Paths
• Conclusion
About PTC
• Founded in 1997
– Team spun out of Compaq Performance Lab
– Focused on solving tough, complex, and messy data architecture problems
– Very senior team of experts
• Over 1000 clients and counting
• Key industries: Financial Services, Telecom, Oil & Gas, Healthcare
• Oracle Platinum Partner: Oracle ACE Director and Oracle ACE on staff
• Database & Engineered Systems
• Storage, Server and Network
• Consulting, Managed Services & Training
Focus on: High Performance Architectures
Introduction: Daniel Morgan
• Oracle ACE Director
• Wrote the Oracle curriculum and was the primary program instructor at the University of Washington
• Oracle consultant to Harvard University
• The Morgan behind Morgan's Library on the web: www.morganslibrary.org
• 10g, 11g, and 12c Beta tester
• Member: New York Oracle Users Group
• Retired chair Washington Software Assoc. Database SIG
• Co-Founder International GoldenGate Users Group
• Never an employee of Oracle Corp.
WHAT IS BIG DATA?
What pros say Big Data is
Sources:
http://www.sas.com/en_us/insights/big-data/what-is-big-data.html
http://www.mongodb.com/big-data-explained
What people think Big Data is
Definition of "Big Data" (survey chart; response shares shown: 3.5%, 7.5%, 8.8%, 10.4%, 17.4%, 52.4%)
Response options:
• None of the above
• I don't know
• All of the web-based data and content businesses use for their own operations
• I'm not really sure what "Big Data" refers to
• The mass amounts of internal information that is stored and managed by an…
• All of the external and internal web-based data available for business intelligence
Source: Connotate, “Connotate 2012 Big Data Attitudes and Perceptions Survey,” Oct. 1, 2012
To quote Forbes Magazine…
Source: http://www.forbes.com/sites/lisaarthur/2013/08/15/what-is-big-data/
So… what is “Big Data”?
• “Big Data” marketing focuses on “net new” infrastructure and techniques rather than existing solutions
• Generally refers to storing data in an unstructured manner and applying pattern recognition to extract value
The Four V’s at a glance
Volume
Velocity
35 ZB
~12 TB/Day
~1 TB/Day
Variety
Veracity
are probably wrong
The speed at which data is produced and collected.
In fraud detection, for example, minutes count.
~30B RFID sensors
The various forms of data and their origins
80% of the world's data is unstructured
The scale of the data
- Data quality uncertainty: that data exists does not mean it has value
- The fact that someone believes something does not make it true
1 in 3 business leaders don't trust the information they use to make decisions
2/3
Big Data volume: a few facts
Every day, we create 2.5 quintillion bytes of data: so much that 90% of the data in the world today has been created in the last two years alone.
This data comes from everywhere:
• sensors used to gather climate information
• posts to social media sites
• digital pictures and videos
• purchase transaction records
• cell phone GPS signals
to name a few.
From the beginning of recorded time until 2003, we created 5B gigabytes of data.
In 2011, the same amount was created every two days.
In 2013, the same amount of data was created every 10 minutes.
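The volume figures above can be cross-checked with a little unit arithmetic. This is a minimal sketch, assuming decimal SI units (1 GB = 10^9 bytes, 1 EB = 10^18 bytes, "quintillion" = 10^18); the helper name `to_exabytes` is hypothetical. It shows that "5B gigabytes every two days" and "2.5 quintillion bytes every day" describe the same rate, about 2.5 EB/day.

```python
# Decimal (SI) byte units; an assumption, since the slide does not state which units it uses.
GB = 10**9
EB = 10**18

def to_exabytes(n_bytes: float) -> float:
    """Convert a byte count to exabytes (10**18 bytes)."""
    return n_bytes / EB

# "From the beginning of recorded time until 2003, we created 5B gigabytes"
total_until_2003 = 5 * 10**9 * GB
# "In 2011, the same amount was created every two days"
daily_rate_2011 = total_until_2003 / 2
# "Every day, we create 2.5 quintillion bytes"
daily_rate_stated = 2.5 * 10**18
# "In 2013, the same amount ... every 10 minutes" (144 ten-minute intervals per day)
daily_rate_2013 = total_until_2003 * (24 * 60 // 10)

print(f"5B GB          = {to_exabytes(total_until_2003):.1f} EB")
print(f"2011 rate      = {to_exabytes(daily_rate_2011):.1f} EB/day")
print(f"Stated rate    = {to_exabytes(daily_rate_stated):.1f} EB/day")
print(f"2013 (literal) = {to_exabytes(daily_rate_2013):.1f} EB/day")
```

Taken literally, the 2013 figure implies 720 EB/day, roughly 288 times the stated daily rate, so that statistic deserves the same veracity scrutiny the rest of this deck recommends.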
“Big Data” variety
• Audio
• Digital TV
• Digital Photos
• Smart Phones
• Smart Appliances
• RFID Tags
• Medical Imaging
• Industrial Sensors
• Satellite Images
• Games
• Scanners
• Social Networks
• CAD/CAM Drawing
• Video Conferencing
• Digital Movies
• Search Engines
File Systems
Transactional Data
Content Management
Email
CRM
Supply Chain
ERP
RSS Feeds
Cloud
Custom Sources
DataExplorer
• ERP Systems
• HR Systems
• CSR Systems
• Point-of-Sale
• Credit Reports
• Public Records
• Property Taxes
• Smart Meters
• Automotive Systems
• GPS and AIS Data
• License Plate Readers
• Medical Records
• Stock Trades
• Scientific Research
• White Papers
• Weather Forecasts
Where is data collected?
ISPs, Movie Rentals, Retailers, Credit Cards, Search Engines, Phone Systems
What are they collecting?
Restaurant check
Grocery bill
Airline ticket
Hotel bill
and more…
Veracity Questioned:
Even with traditional data, there are issues around VERACITY
SOURCE SYSTEMS
SYSTEM   NAME              ADDRESS
-------- ----------------- --------------------------------------
CRM      J Robertson       35 West 15th, Pittsburgh, PA 15213
ERP      Janet Robertson   35 West 15th St., Pittsburgh, PA 15213
Legacy   Jan Robertson     36 West 15th St., Pittsburgh, PA 15213
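To make the veracity problem concrete, here is a minimal sketch of comparing the three source-system records field by field. The normalization rules (drop punctuation, drop a trailing "St"/"Street", treat one given name as compatible with another if it is a prefix) are simple assumptions for illustration, and all helper names are hypothetical; production entity resolution uses far more robust matching.

```python
import re

# The three source-system records shown on the slide.
records = {
    "CRM":    {"name": "J Robertson",     "street": "35 West 15th",     "city_zip": "Pittsburgh, PA 15213"},
    "ERP":    {"name": "Janet Robertson", "street": "35 West 15th St.", "city_zip": "Pittsburgh, PA 15213"},
    "Legacy": {"name": "Jan Robertson",   "street": "36 West 15th St.", "city_zip": "Pittsburgh, PA 15213"},
}

def normalize_street(street: str) -> str:
    """Lower-case, drop punctuation, and drop a trailing street-type suffix (assumed rule)."""
    s = re.sub(r"[^\w\s]", "", street.lower())
    return re.sub(r"\s+(st|street)$", "", s).strip()

def names_compatible(a: str, b: str) -> bool:
    """Same surname, and one given name is a prefix of the other ('J' ~ 'Jan' ~ 'Janet')."""
    first_a, last_a = a.lower().split()[0], a.lower().split()[-1]
    first_b, last_b = b.lower().split()[0], b.lower().split()[-1]
    return last_a == last_b and (first_a.startswith(first_b) or first_b.startswith(first_a))

def compare(src_a: str, src_b: str) -> dict:
    """Field-by-field agreement between two source systems' records."""
    a, b = records[src_a], records[src_b]
    return {
        "name": names_compatible(a["name"], b["name"]),
        "street": normalize_street(a["street"]) == normalize_street(b["street"]),
        "city_zip": a["city_zip"] == b["city_zip"],
    }

for pair in [("CRM", "ERP"), ("ERP", "Legacy"), ("CRM", "Legacy")]:
    print(pair, compare(*pair))
```

Normalization reconciles CRM and ERP, but no rule inside the data can tell whether "35" versus "36" is a typo or a different household; that question needs an outside reference, which is the point of the slide.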
468 countries? 93% valid
AD - Andora
AE - United Arab Emirates
AF - Afghanistan
AG - Antigua
AI - Albania
AM - Armenia
AN - Netherland Antilles
AO - Angola
AR - Argentina
AS - American Samoa
AT - Austria
AU - Australia
AW - Aruba
BB - Barbados
BE - Belgium
BG - Bulgaria
BH - Bahrain
BI - Burundi
States? 0.1% valid
ACCRA 00233 GH
ACCRA-GH
ACHAIA
ADMIRALTY
AE
AG
AGHIA PARASKEVI
AGUADA
AGUAS CLARAS
AICHI-KEN
AISNE - PICARDIE
AJMAN
AK
AKERSHUS
AL
AL NAHDA1
AL QUOZ
AL.
ALABAMA
ASEAN
ASIA
ASIAN PACIFIC
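Validity percentages like the ones on these slides come from checking each stored value against a reference domain. A minimal sketch follows; the reference set is a deliberately tiny, hypothetical subset of US state codes (a real check would load the full ISO 3166 or postal code lists), and `percent_valid` is a hypothetical helper. It does not attempt to reproduce the slide's percentages.

```python
def percent_valid(values, reference):
    """Percentage of values (after trimming and upper-casing) found in a reference set."""
    cleaned = [v.strip().upper() for v in values]
    if not cleaned:
        return 0.0
    hits = sum(1 for v in cleaned if v in reference)
    return 100.0 * hits / len(cleaned)

# Illustrative reference: a small subset of US state codes (assumption; use a complete list in practice).
US_STATE_CODES = {"AK", "AL", "AZ", "CA", "NY", "PA", "TX", "WA"}

# Some of the STATE values shown on the slide.
state_values = ["ACCRA 00233 GH", "ACCRA-GH", "ACHAIA", "AE", "AK", "AL", "AL.", "ALABAMA", "ASIA"]

print(f"{percent_valid(state_values, US_STATE_CODES):.1f}% valid")
```

Values such as "AL." and "ALABAMA" fail an exact-code check even though a human reads them as Alabama, so a cleansing step (mapping names and stray punctuation to codes) usually comes before the validity count.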
Canadian cities? 2.3% valid
CITY STATE
-------------------- ------
Calgar AB
Calgar7 AB
Calgaray AB
Calgaru AB
Calgary AB
Calgary AB AB
Calgary Alberta AB
Calgary Canada AB
Calgary NW AB
Calgary Nw AB
Calgary SW AB
Calgary T3B5Y4 AB
Calgary, AB
Calgary, AB AB
Calgary, Alberta AB
Calgary, Alta. AB
Calgary, T2K 1B7 AB
Calgay AB
Calgery AB
Calgry AB
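Variants like these are usually reconciled by stripping noise (province names, quadrants, postal codes) and then fuzzy-matching against a list of canonical names. The sketch below uses Python's standard `difflib.get_close_matches`; the canonical list, the noise patterns, and the 0.8 similarity cutoff are all assumptions chosen for illustration, and the helper names are hypothetical.

```python
import difflib
import re
from typing import Optional

# Canonical city names (illustrative subset; an assumption, not a real reference list).
CANONICAL_CITIES = ["Calgary", "Edmonton", "Red Deer"]

POSTAL_CODE = re.compile(r"\b[A-Z]\d[A-Z]\s?\d[A-Z]\d\b")            # e.g. T2K 1B7, T3B5Y4
NOISE_WORDS = re.compile(r"\b(AB|Alberta|Canada|NW|NE|SW|SE)\b", re.IGNORECASE)

def clean_city(raw: str) -> str:
    """Keep text before the first comma, then drop postal codes, province/country names, quadrants."""
    s = raw.split(",")[0]
    s = POSTAL_CODE.sub("", s)
    s = NOISE_WORDS.sub("", s)
    return " ".join(s.split())

def standardize_city(raw: str) -> Optional[str]:
    """Return the closest canonical city name, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(clean_city(raw).title(), CANONICAL_CITIES, n=1, cutoff=0.8)
    return matches[0] if matches else None

# City values taken from the slide.
variants = [
    "Calgar", "Calgar7", "Calgaray", "Calgaru", "Calgary", "Calgary AB", "Calgary Alberta",
    "Calgary Canada", "Calgary NW", "Calgary Nw", "Calgary SW", "Calgary T3B5Y4", "Calgary, AB",
    "Calgary, Alberta", "Calgary, Alta.", "Calgary, T2K 1B7", "Calgay", "Calgery", "Calgry",
]
mapped = sum(standardize_city(v) == "Calgary" for v in variants)
print(f"{mapped} of {len(variants)} variants map to Calgary")
```

The cutoff is a trade-off: too low and genuinely different towns get merged, too high and misspellings slip through, so matches near the threshold are worth routing to a data steward for review.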
TABLE_NAME COLUMN_NAME MAX_DATE
------------------------------ ------------------------------ --------------------
XX_LINEITEMS PICKUP_DATE 10-NOV-2013 11:00:00
CHECK_PAY_DETAIL CHECK_DATE 26-AUG-2014 00:00:00
AUTO_REORDER_ITEM START_DATE 07-OCT-2017 00:00:00
AUTO_REORDER_ITEM LAST_REORDER_DATE 10-APR-2020 00:00:00
XX_PRESCRIPTIONS LAST_PICKUP_DATE 05-FEB-2030 12:00:00
SUBORDERS SHIPPED_DATE 20-DEC-2058 00:00:47
STORECATS UPDATE_DATE 11-NOV-2099 00:00:00
INSURANCE_CARDS BIRTHDATE 30-DEC-2100 00:00:00
GIFT_CERTIFICATES DATE_EXPIRES 16-SEP-2111 00:00:00
ALTERNATE_PAYMENTS EXPIRE_DATE 16-SEP-2111 00:00:00
PRODUCTS AVAILABLE_DATE 20-AUG-2154 00:00:00
REFILL_TOO_SOON READY_TO_FILL_DATE 01-MAR-3011 00:00:00
RELATIONINSTANCE END_DATE 01-JUN-3011 00:00:00
VA_PROFILES NEXT_ORDER_DATE 18-MAR-7483 00:00:00
XXXXXXX_RX SCHEDULED_PICKUP_DATE 01-JAN-9865 14:00:00
CONTACT_LENS_INFO EXPIRE_DATE 01-JUN-9865 00:00:00
PHARMACY_PATIENTS BIRTHDATE 31-DEC-9999 00:00:00
XX_PRESCRIPTIONS LAST_DISPENSED_EXPIRES 31-DEC-9999 00:00:00
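A query like the one above (maximum value per date column) is a cheap plausibility screen. The sketch below applies two illustrative rules to a few rows copied from the listing: columns that should only hold past dates (here, anything ending in BIRTHDATE) must not be in the future, and other columns are flagged when they lie beyond a fixed horizon. The "as of" date of 1 January 2015, the 10-year horizon, and the function name are assumptions, not standards; time-of-day values are dropped for simplicity.

```python
from datetime import date

def flag_suspect_dates(rows, as_of, horizon_years=10, past_only_suffixes=("BIRTHDATE",)):
    """Return (table, column, max_date, reason) tuples for implausible maximum dates.

    Columns ending with one of past_only_suffixes should never hold a future date;
    any other column is flagged if its maximum lies after 31 December of the year
    horizon_years after as_of. Both rules are illustrative thresholds.
    """
    limit = date(as_of.year + horizon_years, 12, 31)
    suspects = []
    for table, column, max_date in rows:
        if column.endswith(past_only_suffixes) and max_date > as_of:
            suspects.append((table, column, max_date, "future date in a past-only column"))
        elif max_date > limit:
            suspects.append((table, column, max_date, f"more than {horizon_years} years ahead"))
    return suspects

# A few rows copied from the listing above (dates only).
rows = [
    ("XX_LINEITEMS", "PICKUP_DATE", date(2013, 11, 10)),
    ("AUTO_REORDER_ITEM", "LAST_REORDER_DATE", date(2020, 4, 10)),
    ("XX_PRESCRIPTIONS", "LAST_PICKUP_DATE", date(2030, 2, 5)),
    ("INSURANCE_CARDS", "BIRTHDATE", date(2100, 12, 30)),
    ("VA_PROFILES", "NEXT_ORDER_DATE", date(7483, 3, 18)),
    ("PHARMACY_PATIENTS", "BIRTHDATE", date(9999, 12, 31)),
]

AS_OF = date(2015, 1, 1)  # assumed reference date for the check

for table, column, max_date, reason in flag_suspect_dates(rows, AS_OF):
    print(f"{table}.{column} = {max_date}: {reason}")
```

Values such as 31-DEC-9999 are often sentinels meaning "unknown" or "never"; flagging them is still useful, because a sentinel that leaks into an age or expiry calculation produces exactly the kind of nonsense this slide illustrates.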
With MUCH of "Big Data" there are issues around veracity
Veracity explored
There is a belief, and a valid one, that certain search terms, such as “fever”, “chills”, and “aches”, are good indicators of flu activity
Veracity explored: Predicting flu trends
Veracity explored: Data interpretation
We can do better than this!
Truth or Consequences?
• Google was tracking a trend
– What mattered was direction
– What mattered was magnitude
• How did this affect a retail pharmacy chain?
– Made decisions to order different inventory and reorganize shelves
– Purchased substantially more inventory than they could sell
– Shipped the new inventory to their stores
– Paid employees to process the new inventory
– Shipped existing inventory from stores to warehouses
– Items that could have sold were removed from the shelves, while the excess inventory did not sell
Mid-course corrections
• Google's algorithm was written by professional statisticians and SMEs
• Google was able to rely on the CDC data to find its errors and correct its algorithm
How do we know what is real?
A standard yardstick
• Determining the magnitude of Google's error required a sanity check against CDC data, which is traditional relational data
• Only 12% of existing relational data is analyzed and used to make business decisions
• Accessing and querying the 88% of the data not being utilized requires:
– Improving data integrity
– Building data warehouses ... not data dumpsters
– Creating decision support systems with the help of SMEs
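The flu-trends story turns on the difference between getting the direction of a trend right and getting its magnitude right. A minimal sketch of a sanity check against a trusted reference series (the role the CDC data played) might measure both separately; the function name and the toy numbers are hypothetical and are not actual Google or CDC figures.

```python
def compare_to_reference(estimate, reference):
    """Compare an estimated series to a trusted reference series of equal length.

    Returns (direction_agreement, magnitude_ratio):
    - direction_agreement: share of period-over-period moves with the same sign in both series
    - magnitude_ratio: mean of estimate/reference (1.0 means no magnitude bias)
    Reference values must be non-zero.
    """
    if len(estimate) != len(reference) or len(estimate) < 2:
        raise ValueError("need two equal-length series with at least two points")
    agree = 0
    for i in range(1, len(estimate)):
        d_est = estimate[i] - estimate[i - 1]
        d_ref = reference[i] - reference[i - 1]
        # Same direction: both up, both down, or both flat.
        if (d_est > 0) == (d_ref > 0) and (d_est < 0) == (d_ref < 0):
            agree += 1
    direction_agreement = agree / (len(estimate) - 1)
    magnitude_ratio = sum(e / r for e, r in zip(estimate, reference)) / len(estimate)
    return direction_agreement, magnitude_ratio

# Toy numbers for illustration only: the estimate moves with the reference but runs at double its level.
reference = [1.0, 2.0, 4.0, 3.0]
estimate = [2.0, 4.0, 8.0, 6.0]
print(compare_to_reference(estimate, reference))  # → (1.0, 2.0)
```

A perfect direction score paired with a 2x magnitude ratio is exactly the failure that would mislead an inventory decision: the trend looks right, but every order quantity based on it would be twice too large.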
So what should managers do?
Three Paths
• Do it in-house
• Subcontract to a service provider
• Ignore it and more fully exploit existing data resources
Ignore it Do it in-house Hire a service provider
Since only 12% of existing relational data is analyzed and used to make business decisions:
• Create a data warehouse to properly organize existing data
• Create a decision support system with domain expertise
• Expect to need ETL expertise
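ETL (extract, transform, load) is the step that turns inconsistent source rows into warehouse-ready data. The sketch below is a minimal illustration using Python's standard `sqlite3` module with an in-memory database standing in for a warehouse; the rows are hypothetical values modeled on the messy city/state examples earlier in this deck, and the table name `location_dim` is an assumption.

```python
import sqlite3

# Extract: hypothetical rows as they might arrive from different source systems.
source_rows = [
    ("Pittsburgh", "PA"),
    ("pittsburgh ", "pa"),
    ("Calgary, AB", "AB"),
    ("Calgary", "ab"),
]

def transform(row):
    """Trim whitespace, keep the city text before any comma, and standardize case."""
    city, state = (field.strip() for field in row)
    return city.split(",")[0].strip().title(), state.upper()

def load(conn, rows):
    """Create the target table if needed and bulk-insert the cleaned rows."""
    conn.execute("CREATE TABLE IF NOT EXISTS location_dim (city TEXT, state TEXT)")
    conn.executemany("INSERT INTO location_dim VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory stand-in for a warehouse
load(conn, [transform(r) for r in source_rows])

query = "SELECT city, state, COUNT(*) FROM location_dim GROUP BY city, state ORDER BY city"
for city, state, n in conn.execute(query):
    print(city, state, n)
```

Without the transform step the same four rows would report four distinct locations; after it, reporting counts two rows each for two locations, which is why ETL expertise appears on this list.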
• Data Experts: Data Architects, Management, Governance, Policy
• Coders: Data masking
• Infrastructure: Servers, Storage
• Visualization Expertise: Data set interpretation and correlation; present results in a meaningful way
• Industry Vertical Domain Expertise: Develop hypotheses, identify relevant business issues, ask the right questions
• Math and Operations Research: Algorithm development
Share the costs by engaging a service provider, which:
• Provides the hardware and software
• Provides the source data
• Provides statisticians and programmers
You still need to:
• Ask the right questions
• Validate the results
• Interpret the results to derive actionable information
• Perform data masking and ETL if merging with internal data
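Data masking lets internal and provider data be merged without exposing identities. One common technique, swapped in here for illustration and not necessarily what any particular vendor tool does, is keyed pseudonymization: each identifier is replaced by an HMAC-SHA256 token, so equal inputs still join but the original value cannot be recovered without the key. The function name, the 16-hex-character token length, and the demo key are assumptions.

```python
import hashlib
import hmac

def mask_value(value: str, key: bytes, prefix: str = "ID") -> str:
    """Deterministically pseudonymize a value with HMAC-SHA256.

    The same (normalized) input and key always yield the same token, so masked data
    sets can still be joined. Without the key the token cannot be reversed; anyone
    holding the key could re-derive tokens for guessed inputs, so the key must be protected.
    """
    normalized = value.strip().lower().encode("utf-8")
    digest = hmac.new(key, normalized, hashlib.sha256).hexdigest()
    return f"{prefix}-{digest[:16]}"

# Assumption: in practice the key comes from a secrets store; it is hard-coded here only for the demo.
KEY = b"demo-key-do-not-use"

internal = {"Janet Robertson": "ERP customer"}
external = {"janet robertson ": "provider record"}

masked_internal = {mask_value(k, KEY): v for k, v in internal.items()}
masked_external = {mask_value(k, KEY): v for k, v in external.items()}

# The masked keys still line up, so the data sets can be merged without exposing names.
print(masked_internal.keys() & masked_external.keys())
```

Normalizing before hashing matters: without the trim and lower-case step, the two spellings above would produce different tokens and the merge would silently miss the match.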
Optimize What You Already Own
• The vast majority of organizations already own infrastructure sufficient to collect, collate, and analyze the data sets they own and whose veracity they can vouch for
• What these organizations need is to:
– Identify data sources, reporting resources, and data redaction gaps
– Improve existing systems
– If needed, obtain outside expertise to perform limited functions: setting up systems and training internal resources to maintain them
Writing better SQL: Real-life example (1:2)
A report that was supposed to be run every 5 minutes
Writing better SQL: Real-life example (2:2)
The same report from the same data that runs in less than 5 minutes
Same servers ...
Same storage ...
Same network ...
Same database ...
Same data ...
Same tables ...
Same indexes ...
Performance improved 739X
Analytics & Reporting
• Reporting – Historical; emphasis on counting and understanding what happened
• Analytics – Forward-looking; discovery; emphasis on predicting what will happen next and determining how to favorably influence the event
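The reporting/analytics distinction can be seen in a few lines of code: reporting counts what already happened, while analytics fits a model and projects forward. In this minimal sketch the weekly sales events are hypothetical, and a least-squares straight line stands in for predictive analytics only because it is the simplest possible model; real predictive tools (such as the Oracle Advanced Analytics algorithms listed later in this deck) are far richer.

```python
from collections import Counter

# Hypothetical sales events; each entry is the week number in which one sale happened.
sales_weeks = [1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4]

# Reporting: count what happened, per week.
weekly = Counter(sales_weeks)
weeks = sorted(weekly)
counts = [weekly[w] for w in weeks]
print("Report:", dict(zip(weeks, counts)))  # → Report: {1: 2, 2: 3, 3: 4, 4: 5}

# Analytics (simplest possible): fit a least-squares line and project it forward.
def linear_forecast(xs, ys, x_next):
    """Predict y at x_next from an ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y + slope * (x_next - mean_x)

print("Forecast for week 5:", linear_forecast(weeks, counts, 5))  # → Forecast for week 5: 6.0
```

The report is exact but backward-looking; the forecast is an assumption-laden estimate, which is why the deck stresses validating analytic results against trusted data.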
Oracle Analytics Database Evolution
Timeline: 1998, 1999, 2002, 2004, 2005, 2008, 2011, 2014
• 7 data mining “partners”
• Oracle acquires Thinking Machines Corp.’s development team and the “Darwin” data mining software
• Oracle Data Mining 9.2i launched: 2 algorithms (NB and AR) via Java API
• Oracle Data Mining 10g & 10gR2 introduce SQL data mining functions, 7 new SQL data mining algorithms, and the new Oracle Data Miner “Classic” wizard-driven GUI
• ODM 11g & 11gR2 add AutoDataPrep (ADP), text mining, and performance improvements
• SQLDEV/Oracle Data Miner 3.2 “work flow” GUI launched
• Integration with “R” and introduction of Oracle R Enterprise
• Product renamed “Oracle Advanced Analytics” (ODM + ORE)
• New algorithms (EM, PCA, SVD)
• Predictive Queries
• SQLDEV/Oracle Data Miner 4.0 SQL script generation and SQL Query node (R integration)
• OAA/ORE 1.3 + 1.4 add NN, Stepwise, and scalable R algorithms
• Oracle Advanced Analytics for Hadoop Connector launched with scalable BDA algorithms
Oracle’s Big Data Tools
SOURCES → DATA RESERVOIR → DATA WAREHOUSE
Data Reservoir (Big Data Appliance):
• Apache Flume
• Oracle GoldenGate
• Oracle Event Processing
• Cloudera Hadoop
• Oracle Big Data SQL
• Oracle NoSQL
• Oracle R Advanced Analytics for Hadoop
• Oracle R Distribution
Data Warehouse (Exadata):
• Oracle Database In-Memory, Multitenant
• Oracle Industry Models
• Oracle Advanced Analytics
• Oracle Spatial & Graph
• Oracle GoldenGate
• Oracle Event Processing
• Oracle Data Integrator
Between the reservoir and the warehouse:
• Oracle Big Data Connectors
• Oracle Data Integrator
WRAP-UP
Big Data has a place in IT!
Big Data has the potential to answer questions that cannot be answered with operational data
• What are people saying about us?
• What is trending in a way that might benefit us?
• What is trending in a way that might hurt us?
• What are people concerned about?
• What other interests do our customers and prospects have?
Planning and Budgeting impact
Planning:
• Define specific goals you expect to achieve through data analysis and reporting
• Define the skill sets required to extract actionable information from your data
• Define the training and consulting required to align your IT team with business needs
It is far less expensive to train your current employees to leverage what you already have than it is to purchase new systems and hire new FTEs with new skills
Big Data Initiatives have promise in some areas – the challenge isn’t technical though, it’s
about solving a business problem.
Before you jump into a new initiative be sure the objectives are clearly defined – and that
they match up with the strengths (and limitations) of Big Data
Conclusions
• You can maximize the profitability of your organization by leveraging your existing
data to gain insights into how to become more efficient or increase sales
• You can maximize the profitability of your organization by spending your money
conservatively and optimizing use of what you already own
• Services organizations like PTC can support you in optimizing your existing
assets and training your existing team members
– Experts in performance tuning
– Experts in data warehouse design
– Experts in ETL
• Explore Big Data initiatives cautiously – leverage external resources at first and have well-defined goals and objectives
– If you don’t know what questions to ask you can’t expect to find the answers in Big Data
Thank you!
EXPERTS
Expert Data Services team with deep
performance tuning and Oracle
technology backgrounds.
More info:
www.perftuning.com
info@perftuning.com
@perftuning


State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 

Big Data: How does it fit in your data strategy?

  • 1. Big Data: How does it fit in your data strategy? A Lunch & Learn webinar for IT Management Brought by Performance Tuning Corporation www.perftuning.com
  • 2. Panelists Mark Swanholm Chief Strategy Officer Performance Tuning Corporation https://www.linkedin.com/in/mswanholm Dan Morgan Oracle ACE Director Performance Tuning Corporation https://www.linkedin.com/pub/dan-morgan/0/aa9/a5
  • 3. Agenda • Introduction • What is Big Data? • Big Data's Sources • The Four Vs • Three Roads • Conclusion
  • 4. About PTC • Founded in 1997 – Team spun out of Compaq Performance Lab – Focused on solving tough, complex, and messy data architecture problems – Very senior team of EXPERTS • Over 1000 clients and counting • Key industries: Financial Services, Telecom, Oil & Gas, Healthcare • Oracle Platinum Partner: Oracle ACE Director and Oracle ACE on staff • Focus on high-performance architectures: Database & Engineered Systems; Storage, Server, and Network; Consulting, Managed Services & Training
  • 5. Introduction: Daniel Morgan • Oracle ACE Director • Wrote the Oracle curriculum and served as primary program instructor at the University of Washington • Oracle consultant to Harvard University • The Morgan behind Morgan's Library on the web (www.morganslibrary.org) • 10g, 11g, and 12c beta tester • Member: New York Oracle Users Group • Retired chair, Washington Software Assoc. Database SIG • Co-founder, International GoldenGate Users Group • Never an employee of Oracle Corp.
  • 10. What people think Big Data is. Chart: Definition of "Big Data". Response options: None of the above; I don't know; All of the web-based data and content businesses use for their own operations; I'm not really sure what "Big Data" refers to; The mass amounts of internal information that is stored and managed by an…; All of the external and internal web-based data available for business intelligence. Response shares shown: 3.5%, 7.5%, 8.8%, 10.4%, 17.4%, 52.4%. Source: Connotate, "Connotate 2012 Big Data Attitudes and Perceptions Survey," Oct. 1, 2012
  • 11. To quote Forbes Magazine… Source: http://www.forbes.com/sites/lisaarthur/2013/08/15/what-is-big-data/
  • 12. So… what is "Big Data"? • "Big Data" marketing focuses on net-new infrastructure and techniques rather than existing solutions • The term generally refers to storing data in an unstructured manner and applying pattern recognition to extract value
  • 13. The Four V's at a glance • Volume: the scale of the data • Velocity: the speed at which data is produced and collected; in fraud detection, for example, minutes count • Variety: the various forms of data and their origins; 80% of the world's data is unstructured • Veracity: uncertainty about data quality; that data exists does not mean it has value, and the fact that someone believes something does not make it true; 1 in 3 business leaders don't trust the information they use to make decisions • Figures shown on the slide: 35 ZB, ~12 TB/day, ~1 TB/day, ~30B RFID sensors
  • 14. Big Data volume: a few facts • Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone • This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few • From the beginning of recorded time until 2003, we created 5B gigabytes of data • In 2011, the same amount was created every two days • In 2013, the same amount was created every 10 minutes
  • 15. "Big Data" variety • Audio • Digital TV • Digital Photos • Smart Phones • Smart Appliances • RFID Tags • Medical Imaging • Industrial Sensors • Satellite Images • Games • Scanners • Social Networks • CAD/CAM Drawings • Video Conferencing • Digital Movies • Search Engines • (Diagram: File Systems, Transactional Data, Content Management, Email, CRM, Supply Chain, ERP, RSS Feeds, Cloud, Custom Sources, DataExplorer) • ERP Systems • HR Systems • CSR Systems • Point-of-Sale • Credit Reports • Public Records • Property Taxes • Smart Meters • Automotive Systems • GPS and AIS Data • License Plate Readers • Medical Records • Stock Trades • Scientific Research • White Papers • Weather Forecasts
  • 16. Where is data collected? ISPs, movie rentals, retailers, credit cards, search engines, phone systems
  • 17. What are they collecting? Restaurant checks, grocery bills, airline tickets, hotel bills, and more…
  • 18. Veracity Questioned: Even with traditional data, there are issues around VERACITY
  • 19. Source systems • CRM: Name: J Robertson; Address: 35 West 15th, Pittsburgh, PA 15213 • ERP: Name: Janet Robertson; Address: 35 West 15th St., Pittsburgh, PA 15213 • Legacy: Name: Jan Robertson; Address: 36 West 15th St., Pittsburgh, PA 15213
  • 20. 468 countries? 93% valid. Sample values: AD - Andora, AE - United Arab Emirates, AF - Afghanistan, AG - Antigua, AI - Albania, AM - Armenia, AN - Netherland Antilles, AO - Angola, AR - Argentina, AS - American Samoa, AT - Austria, AU - Australia, AW - Aruba, BB - Barbados, BE - Belgium, BG - Bulgaria, BH - Bahrain, BI - Burundi • States? 0.1% valid. Sample values: ACCRA 00233 GH, ACCRA-GH, ACHAIA, ADMIRALTY, AE, AG, AGHIA PARASKEVI, AGUADA, AGUAS CLARAS, AICHI-KEN, AISNE - PICARDIE, AJMAN, AK, AKERSHUS, AL, AL NAHDA1, AL QUOZ, AL., ALABAMA, ASEAN, ASIA, ASIAN PACIFIC • Canadian cities? 2.3% valid. Sample CITY / STATE values: Calgar / AB; Calgar7 / AB; Calgaray / AB; Calgaru / AB; Calgary / AB; Calgary AB / AB; Calgary Alberta / AB; Calgary Canada / AB; Calgary NW / AB; Calgary Nw / AB; Calgary SW / AB; Calgary T3B5Y4 / AB; Calgary, / AB; Calgary, AB / AB; Calgary, Alberta / AB; Calgary, Alta. / AB; Calgary, T2K 1B7 / AB; Calgay / AB; Calgery / AB; Calgry / AB
  • 21.
    TABLE_NAME          COLUMN_NAME             MAX_DATE
    XX_LINEITEMS        PICKUP_DATE             10-NOV-2013 11:00:00
    CHECK_PAY_DETAIL    CHECK_DATE              26-AUG-2014 00:00:00
    AUTO_REORDER_ITEM   START_DATE              07-OCT-2017 00:00:00
    AUTO_REORDER_ITEM   LAST_REORDER_DATE       10-APR-2020 00:00:00
    XX_PRESCRIPTIONS    LAST_PICKUP_DATE        05-FEB-2030 12:00:00
    SUBORDERS           SHIPPED_DATE            20-DEC-2058 00:00:47
    STORECATS           UPDATE_DATE             11-NOV-2099 00:00:00
    INSURANCE_CARDS     BIRTHDATE               30-DEC-2100 00:00:00
    GIFT_CERTIFICATES   DATE_EXPIRES            16-SEP-2111 00:00:00
    ALTERNATE_PAYMENTS  EXPIRE_DATE             16-SEP-2111 00:00:00
    PRODUCTS            AVAILABLE_DATE          20-AUG-2154 00:00:00
    REFILL_TOO_SOON     READY_TO_FILL_DATE      01-MAR-3011 00:00:00
    RELATIONINSTANCE    END_DATE                01-JUN-3011 00:00:00
    VA_PROFILES         NEXT_ORDER_DATE         18-MAR-7483 00:00:00
    XXXXXXX_RX          SCHEDULED_PICKUP_DATE   01-JAN-9865 14:00:00
    CONTACT_LENS_INFO   EXPIRE_DATE             01-JUN-9865 00:00:00
    PHARMACY_PATIENTS   BIRTHDATE               31-DEC-9999 00:00:00
    XX_PRESCRIPTIONS    LAST_DISPENSED_EXPIRES  31-DEC-9999 00:00:00
  • 22. With MUCH of "Big Data" there are issues around veracity
  • 23. Veracity explored: fever, chills, aches. There is a belief, valid in fact, that certain search terms are good indicators of flu activity
  • 25. Veracity explored: Data interpretation
  • 27. Truth or Consequences? • Google was tracking a trend – What mattered was direction – What mattered was magnitude • How did this affect a retail pharmacy chain? – Made decisions to order different inventory and reorganize shelves – Purchased substantially more inventory than they could sell – Shipped the new inventory to their stores – Paid employees to process the new inventory – Shipped existing inventory from stores to warehouses – Items that could have sold were removed from the shelves, while the excess inventory went unsold
  • 28. Mid-course corrections • Google's algorithm was written by professional statisticians and SMEs • Google was able to rely on the CDC data to find its errors and correct its algorithm
  • 30. A standard yardstick • Determining Google's error in magnitude required a sanity check provided by the CDC using traditional relational data • Only 12% of existing relational data is analyzed and used to make business decisions • Accessing and querying the 88% of the data not being utilized requires – Improving data integrity – Building data warehouses, not data dumpsters – Creating decision support systems with the help of SMEs
  • 31. So what should managers do?
  • 32. Three Paths • Do it in-house • Subcontract to a service provider • Ignore it and more fully exploit existing data resources
  • 33. Ignore it: Since only 12% of existing relational data is analyzed and used to make business decisions: • Create a data warehouse to properly organize existing data • Create a decision support system with domain expertise • Will require ETL expertise. Do it in-house: • Data Experts: Data Architects, Management, Governance, Policy • Coders: Data masking • Infrastructure: Servers, Storage • Visualization Expertise: Data set interpretation and correlation; present results in a meaningful way • Industry Vertical Domain Expertise: Develop hypotheses, identify relevant business issues, ask the right questions • Math and Operations Research: Algorithm development. Hire a service provider: Share the costs by engaging a service provider, which • Provides the hardware and software • Provides the source data • Provides statisticians and programmers. You still need to: • Ask the right questions • Validate the results • Interpret the results to derive actionable information • Perform data masking and ETL if merging with internal data
  • 34. Optimize What You Already Own • The vast majority of organizations already own infrastructure sufficient to collect, collate, and analyze the data sets they own and whose veracity they can vouch for • What these organizations need is to: – Identify data sources, reporting resources, and data redaction gaps – Improve existing systems – If needed, obtain outside expertise to perform limited functions: setting up systems and training internal resources to maintain them
  • 35. Writing better SQL: Real-life example (1:2) A report that was supposed to be run every 5 minutes
  • 36. Writing better SQL: Real-life example (2:2) The same report from the same data that runs in less than 5 minutes Same servers ... Same storage ... Same network ... Same database ... Same data ... Same tables ... Same indexes ... Performance improved 739X
  • 37. Analytics & Reporting • Reporting – Historical, emphasis on counting, understanding what happened • Analytics – Future looking, discovery, emphasis on predicting what will happen next and determining how to favorably influence the event
  • 38. Oracle Analytics Database Evolution • 1998: 7 data mining "partners" • 1999: Oracle acquires Thinking Machines Corp.'s development team + "Darwin" data mining software • 2002: Oracle Data Mining 9.2i launched, 2 algorithms (NB and AR) via Java API • 2004–2005: Oracle Data Mining 10g & 10gR2 introduce SQL data mining functions, 7 new SQL data mining algorithms, and the new wizard-driven Oracle Data Miner "Classic" GUI • 2008: ODM 11g & 11gR2 add AutoDataPrep (ADP), text mining, and performance improvements • 2011: SQLDEV/Oracle Data Miner 3.2 "work flow" GUI launched; integration with R and introduction of Oracle R Enterprise; product renamed "Oracle Advanced Analytics" (ODM + ORE) • 2014: New algorithms (EM, PCA, SVD); predictive queries; SQLDEV/Oracle Data Miner 4.0 SQL script generation and SQL Query node (R integration); OAA/ORE 1.3 + 1.4 add NN, stepwise, and scalable R algorithms; Oracle Advanced Analytics for Hadoop Connector launched with scalable BDA algorithms
  • 39. Oracle's Big Data Tools • Layers shown: Sources, Data Reservoir, Data Warehouse • Tools shown: Oracle Database (In-Memory, Multitenant), Oracle Industry Models, Oracle Advanced Analytics, Oracle Spatial & Graph, Big Data Appliance, Exadata, Apache Flume, Oracle GoldenGate, Oracle Event Processing, Cloudera Hadoop, Oracle Big Data SQL, Oracle NoSQL, Oracle R Advanced Analytics for Hadoop, Oracle R Distribution, Oracle Data Integrator, Oracle Big Data Connectors
  • 41. Big Data has a place in IT! Big Data has the potential to answer questions that cannot be answered with operational data • What are people saying about us? • What is trending in a way that might benefit us? • What is trending in a way that might hurt us? • What are people concerned about? • What other interests do our customers and prospects have?
  • 42. Planning and Budgeting impact. Planning: • Define specific goals you expect to achieve through data analysis and reporting • Define the skill sets required to extract actionable information from your data • Define the training and consulting required to align your IT team with business needs. It is far less expensive to train your current employees to leverage what you already have than it is to purchase new systems and hire new FTEs with new skills. Big Data initiatives have promise in some areas; the challenge isn't technical, though, it's about solving a business problem. Before you jump into a new initiative, be sure the objectives are clearly defined and that they match up with the strengths (and limitations) of Big Data
  • 43. Conclusions • You can maximize the profitability of your organization by leveraging your existing data to gain insights into how to become more efficient or increase sales • You can maximize the profitability of your organization by spending your money conservatively and optimizing use of what you already own • Service organizations like PTC can support you in optimizing your existing assets and training your existing team members – Experts in performance tuning – Experts in data warehouse design – Experts in ETL • Explore Big Data initiatives cautiously – leverage external resources at first and have well-defined goals and objectives – If you don't know what questions to ask, you can't expect to find the answers in Big Data
  • 44. Thank you! EXPERTS Expert Data Services team with deep performance tuning and Oracle technology backgrounds. More info: www.perftuning.com info@perftuning.com @perftuning
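Slide 14's volume figures can be cross-checked with simple arithmetic. The sketch below takes the slide's numbers at face value and assumes decimal units (1 GB = 10^9 bytes); the constant and function names are illustrative only. Under those assumptions, the 2011 "every two days" figure works out to exactly the stated 2.5 quintillion bytes per day, while the 2013 "every 10 minutes" figure implies a daily rate 288 times the stated one.

```python
# Figures as stated on slide 14, taken at face value (not independently verified).
BYTES_PER_DAY_STATED = 2.5e18        # "2.5 quintillion bytes" created every day
HISTORICAL_TOTAL_BYTES = 5e9 * 1e9   # "5B gigabytes", assuming decimal GB (10**9 bytes)

MINUTES_PER_DAY = 24 * 60


def implied_daily_rate(total_bytes: float, period_days: float) -> float:
    """Bytes per day implied by creating `total_bytes` once every `period_days` days."""
    return total_bytes / period_days


# 2011 claim: the historical total was created every two days.
rate_2011 = implied_daily_rate(HISTORICAL_TOTAL_BYTES, 2)
# 2013 claim: the historical total was created every 10 minutes.
rate_2013 = implied_daily_rate(HISTORICAL_TOTAL_BYTES, 10 / MINUTES_PER_DAY)

if __name__ == "__main__":
    print(f"2011 claim implies {rate_2011:.2e} bytes/day")
    print(f"2013 claim implies {rate_2013:.2e} bytes/day")
    print(f"2013 rate vs. stated daily rate: {rate_2013 / BYTES_PER_DAY_STATED:.0f}x")
```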
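Slide 19's three records show how veracity problems arise even in traditional systems. The sketch below compares the records field by field after a hypothetical normalization step (lowercasing and treating a trailing "St"/"St." as optional); it is an illustration, not any particular product's matching logic. Formatting differences disappear, but two genuine conflicts remain: the given name (J / Jan / Janet) and the house number (35 vs 36).

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerRecord:
    system: str
    name: str
    street: str
    city_state_zip: str


# The three records shown on slide 19.
RECORDS = [
    CustomerRecord("CRM", "J Robertson", "35 West 15th", "Pittsburgh, PA 15213"),
    CustomerRecord("ERP", "Janet Robertson", "35 West 15th St.", "Pittsburgh, PA 15213"),
    CustomerRecord("Legacy", "Jan Robertson", "36 West 15th St.", "Pittsburgh, PA 15213"),
]

# Hypothetical normalization rule: a trailing "St" or "St." is treated as optional.
_STREET_SUFFIX = re.compile(r"\s+st\.?$", re.IGNORECASE)


def normalize_street(street: str) -> str:
    """Lowercase the street and drop an optional trailing 'St'/'St.' suffix."""
    return _STREET_SUFFIX.sub("", street.strip()).lower()


def field_conflicts(records):
    """Return {field: set of distinct normalized values} for fields that still disagree."""
    values = {
        "surname": {r.name.split()[-1].lower() for r in records},
        "given_name": {r.name.split()[0].rstrip(".").lower() for r in records},
        "street": {normalize_street(r.street) for r in records},
        "city_state_zip": {r.city_state_zip for r in records},
    }
    return {field: vals for field, vals in values.items() if len(vals) > 1}


if __name__ == "__main__":
    for field, vals in field_conflicts(RECORDS).items():
        print(field, sorted(vals))
```

Normalization removes only differences in form. Deciding which house number is correct, or whether J, Jan, and Janet are the same person, still needs a trusted reference or a human data steward.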
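The "% valid" figures on slide 20 come from profiling a column against a list of known-good values. A minimal sketch of that profiling, plus fuzzy correction suggestions using Python's standard difflib module, follows. The reference list of cities and the 0.8 similarity cutoff are assumptions for illustration; the sample values are a subset of the Calgary variants on the slide.

```python
from difflib import get_close_matches


def percent_valid(values, reference):
    """Percentage of values that appear exactly in the reference collection."""
    if not values:
        return 0.0
    valid = sum(1 for v in values if v in reference)
    return 100.0 * valid / len(values)


def suggest_corrections(values, reference, cutoff=0.8):
    """Map each invalid value to its closest reference value, or None if nothing is close."""
    suggestions = {}
    for v in values:
        if v not in reference:
            match = get_close_matches(v, reference, n=1, cutoff=cutoff)
            suggestions[v] = match[0] if match else None
    return suggestions


# Subset of the CITY values on slide 20.
CITY_VALUES = ["Calgar", "Calgar7", "Calgaray", "Calgaru", "Calgary", "Calgary NW",
               "Calgary T3B5Y4", "Calgary, Alta.", "Calgay", "Calgery", "Calgry"]
# Hypothetical stand-in for an authoritative list of Alberta cities.
REFERENCE_CITIES = ["Calgary", "Edmonton", "Red Deer"]

if __name__ == "__main__":
    print(f"{percent_valid(CITY_VALUES, REFERENCE_CITIES):.1f}% valid")
    for bad, fix in suggest_corrections(CITY_VALUES, REFERENCE_CITIES).items():
        print(f"{bad!r} -> {fix!r}")
```

With these inputs, simple typos such as "Calgery" and "Calgry" map back to "Calgary", while values padded with postal codes or province names (for example "Calgary T3B5Y4" or "Calgary, Alta.") fall below the cutoff and need token-level cleanup rules instead.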
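The dates on slide 21 look like the output of a query that takes the maximum value of each date column and lists the results in date order. The sketch below reproduces the idea with Python's built-in sqlite3 module as a stand-in for Oracle; the tables, sample rows, and the 2015 plausibility horizon are hypothetical. Against an Oracle database, the list of date columns would more likely come from the data dictionary (for example ALL_TAB_COLUMNS filtered on DATA_TYPE = 'DATE').

```python
import sqlite3
from datetime import datetime

# Hypothetical (table, column) pairs named after rows on slide 21.
DATE_COLUMNS = [
    ("XX_LINEITEMS", "PICKUP_DATE"),
    ("PHARMACY_PATIENTS", "BIRTHDATE"),
    ("VA_PROFILES", "NEXT_ORDER_DATE"),
]


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in database with illustrative sample rows."""
    conn = sqlite3.connect(":memory:")
    samples = {
        ("XX_LINEITEMS", "PICKUP_DATE"): ["2013-11-09 08:30:00", "2013-11-10 11:00:00"],
        ("PHARMACY_PATIENTS", "BIRTHDATE"): ["1961-04-02 00:00:00", "9999-12-31 00:00:00"],
        ("VA_PROFILES", "NEXT_ORDER_DATE"): ["2014-01-15 00:00:00", "7483-03-18 00:00:00"],
    }
    for (table, column), values in samples.items():
        # SQLite has no DATE type; ISO-8601 text sorts chronologically for 4-digit years.
        conn.execute(f"CREATE TABLE {table} ({column} TEXT)")
        conn.executemany(f"INSERT INTO {table} VALUES (?)", [(v,) for v in values])
    return conn


def implausible_max_dates(conn, columns, horizon):
    """Return (table, column, max_value) for each column whose MAX date is later than horizon."""
    flagged = []
    for table, column in columns:
        # Identifiers cannot be bound as parameters; here they come from a trusted list.
        (max_value,) = conn.execute(f"SELECT MAX({column}) FROM {table}").fetchone()
        if max_value is not None and datetime.fromisoformat(max_value) > horizon:
            flagged.append((table, column, max_value))
    return flagged


if __name__ == "__main__":
    conn = build_demo_db()
    for row in implausible_max_dates(conn, DATE_COLUMNS, horizon=datetime(2015, 1, 1)):
        print(row)
```

A single horizon is a blunt rule: a birth date should never be in the future, while an expiry date a few years out is normal, so a real check would likely set a horizon per column.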
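Slide 33 lists data masking as a required skill, and as a remaining task when provider data is merged with internal data. One common technique, keyed pseudonymization with HMAC-SHA256, is sketched below using only the Python standard library. The key, field names, and rows are hypothetical, and this is not Oracle's data masking feature; it only shows how identifiers can be hidden while masked data sets stay joinable.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real key belongs in a secrets manager, not in code.
MASKING_KEY = b"example-key-do-not-use"


def pseudonymize(value: str, key: bytes = MASKING_KEY) -> str:
    """Replace an identifier with a deterministic keyed-hash token (HMAC-SHA256, truncated).

    Identical inputs (after trimming and lowercasing) yield identical tokens, so masked
    data sets can still be joined on the masked column; without the key, the token
    cannot be recomputed from a guessed value.
    """
    normalized = value.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def mask_rows(rows, sensitive_fields):
    """Return copies of the row dicts with the named fields pseudonymized."""
    return [
        {field: pseudonymize(val) if field in sensitive_fields else val
         for field, val in row.items()}
        for row in rows
    ]


if __name__ == "__main__":
    internal = [{"email": "jan@example.com", "lifetime_spend": 120}]
    provider = [{"email": "JAN@example.com ", "segment": "outdoor"}]
    masked_internal = mask_rows(internal, {"email"})
    masked_provider = mask_rows(provider, {"email"})
    # The masked keys still match, so the two data sets can be joined.
    print(masked_internal[0]["email"] == masked_provider[0]["email"])
```

Truncating the digest to 16 hex characters keeps tokens short; a full-length digest lowers the (already small) chance of two identifiers sharing a token.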
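Slide 36 does not show the SQL behind the 739X improvement. One frequent source of large gains is replacing row-by-row processing (one query per row, issued from application code) with a single set-based statement. The sketch below illustrates that contrast using Python's sqlite3 module and hypothetical customers and orders tables; it is not the report from the slide and makes no claim about the size of any speedup.

```python
import sqlite3


def build_demo_db(n_customers: int = 200, orders_per_customer: int = 5) -> sqlite3.Connection:
    """Create hypothetical customers/orders tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT)")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(i, "EAST" if i % 2 else "WEST") for i in range(n_customers)],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(i, float(j + 1)) for i in range(n_customers) for j in range(orders_per_customer)],
    )
    return conn


def totals_row_by_row(conn):
    """Row-by-row pattern: one extra query per customer, summed in application code."""
    totals = {}
    for cid, region in conn.execute("SELECT id, region FROM customers").fetchall():
        (amount,) = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer_id = ?", (cid,)
        ).fetchone()
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_set_based(conn):
    """Set-based pattern: one join + GROUP BY, letting the database do the work."""
    rows = conn.execute(
        "SELECT c.region, COALESCE(SUM(o.amount), 0) "
        "FROM customers c LEFT JOIN orders o ON o.customer_id = c.id "
        "GROUP BY c.region"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = build_demo_db()
    print(totals_row_by_row(conn))
    print(totals_set_based(conn))
```

Both functions return the same totals; the difference is that the row-by-row version issues one statement per customer plus one to list them (201 statements here), while the set-based version issues one. The LEFT JOIN keeps customers without orders, matching the row-by-row semantics.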

Editor's Notes

  1. Over the years, Oracle has increasingly invested in making the database and SQL more powerful…