Big Data: How does it fit in your
data strategy?
A Lunch & Learn webinar for IT Management
Brought to you by Performance Tuning Corporation
www.perftuning.com
Panelists
Mark Swanholm
Chief Strategy Officer
Performance Tuning Corporation
https://www.linkedin.com/in/mswanholm
Dan Morgan
Oracle ACE Director
Performance Tuning Corporation
https://www.linkedin.com/pub/dan-morgan/0/aa9/a5
Agenda
• Introduction
• What is Big Data?
• Big Data's Sources
• The Four Vs
• Three Roads
• Conclusion
About PTC
• Founded in 1997
– Team spun out of Compaq Performance Lab
– Focused on solving tough, complex, and messy data architecture problems
– Very senior team of EXPERTS
• Over 1000 clients & counting
• Key industries: Financial Services, Telecom, Oil & Gas, Healthcare
• Oracle Platinum Partner: Oracle ACE Director and Oracle ACE on staff
Focus on: High Performance Architectures
• Database & Engineered Systems
• Storage, Server and Network
• Consulting, Managed Services & Training
Select Clients
Introduction: Daniel Morgan
• Oracle ACE Director
• Wrote the Oracle curriculum and was primary program instructor at the University of Washington
• Oracle consultant to Harvard University
• The Morgan behind Morgan's Library on the web
www.morganslibrary.org
• 10g, 11g, and 12c Beta tester
• Member: New York Oracle Users Group
• Retired chair Washington Software Assoc. Database SIG
• Co-Founder International GoldenGate Users Group
• Never an employee of Oracle Corp.
WHAT IS
BIG DATA?
Sources:
http://www.sas.com/en_us/insights/big-data/what-is-big-data.html
http://www.mongodb.com/big-data-explained
What pros say Big Data is
What people think Big Data is
3.5%
7.5%
8.8%
10.4%
17.4%
52.4%
None of the above
I don't know
All of the web-based data and content businesses use for their own operations
I'm not really sure what "Big Data" refers to
The mass amounts of internal information that is stored and managed by an…
All of the external and internal web-based data available for business intelligence
Definition of "Big Data"
Source: Connotate, “Connotate 2012 Big Data Attitudes and Perceptions Survey,” Oct. 1, 2012
To quote Forbes Magazine…
Source: http://www.forbes.com/sites/lisaarthur/2013/08/15/what-is-big-data/
So… what is “Big Data”?
• “Big Data” marketing is focused on “net new” infrastructure and techniques over existing solutions
• Generally refers to storing data in an unstructured manner and applying pattern recognition to extract value
The Four V’s at a glance
Volume
Velocity
35 ZB
~12 TB/Day
~1 TB/Day
Variety
Veracity
are probably wrong
The speed at which data is produced and collected.
In fraud detection, for example, minutes count.
~30B RFID sensors
The varied forms of data and its origins
80% of the world's data is unstructured
The scale of the data
- Data quality uncertainty: that data exists does not mean it has value
- The fact that someone believes something does not make it true
1/3 of business leaders don't trust the information they use to make decisions
2/3
Big Data volume: a few facts
Every day, we create 2.5 quintillion bytes of data: so much that 90% of the data in the world today has been created in the last two years alone
This data comes from everywhere:
• sensors used to gather climate information
• posts to social media sites
• digital pictures and videos
• purchase transaction records
• cell phone GPS signals
to name a few
From the beginning of recorded time until 2003, we created 5B gigabytes of data
In 2011, the same amount was created every two days
In 2013, the same amount of data is created every 10 minutes
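The volume claims above can be turned into daily rates with a little arithmetic. The sketch below is only a sanity check on the slide's own numbers; it assumes decimal units (1 GB = 10**9 bytes, 1 EB = 10**18 bytes), and the names `baseline_bytes` and `daily_rate_eb` are made up for this illustration.

```python
# Worked arithmetic for the volume figures above.
# Assumption: decimal units (1 GB = 10**9 bytes, 1 EB = 10**18 bytes).
GB = 10**9
EB = 10**18
MINUTES_PER_DAY = 24 * 60

# "5B gigabytes" created from the beginning of recorded time until 2003.
baseline_bytes = 5 * 10**9 * GB  # 5 * 10**18 bytes, i.e. 5 EB


def daily_rate_eb(period_minutes):
    """Exabytes created per day if baseline_bytes is produced every period_minutes."""
    return baseline_bytes * (MINUTES_PER_DAY / period_minutes) / EB


rate_2011 = daily_rate_eb(2 * MINUTES_PER_DAY)  # "every two days"
rate_2013 = daily_rate_eb(10)                   # "every 10 minutes"
growth = rate_2013 / rate_2011

print(f"2011: {rate_2011:.1f} EB/day")
print(f"2013: {rate_2013:.1f} EB/day ({growth:.0f}x the 2011 rate)")
```

Under these assumptions the 2011 statement works out to 2.5 EB per day, which lines up with the "2.5 quintillion bytes" per day figure, while the 2013 statement implies a rate 288 times higher. Figures like these are worth checking against their original source before they are quoted.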
“Big Data” variety
• Audio
• Digital TV
• Digital Photos
• Smart Phones
• Smart Appliances
• RFID Tags
• Medical Imaging
• Industrial Sensors
• Satellite Images
• Games
• Scanners
• Social Networks
• CAD/CAM Drawing
• Video Conferencing
• Digital Movies
• Search Engines
File Systems
Transactional Data
Content Management
Email
CRM
Supply Chain
ERP
RSS Feeds
Cloud
Custom Sources
DataExplorer
• ERP Systems
• HR Systems
• CSR Systems
• Point-of-Sale
• Credit Reports
• Public Records
• Property Taxes
• Smart Meters
• Automotive Systems
• GPS and AIS Data
• License Plate Readers
• Medical Records
• Stock Trades
• Scientific Research
• White Papers
• Weather Forecasts
Where is data collected?
ISPs, Movie Rentals, Retailers, Credit Cards, Search Engines, Phone Systems
What are they collecting?
Restaurant check
Grocery Bill
Airline ticket
Hotel Bill
and more…
Veracity Questioned:
Even with traditional data, there are issues around VERACITY
SOURCE SYSTEMS
SYSTEM NAME            ADDRESS          ADDRESS
------ --------------- ---------------- --------------------
CRM    J Robertson     35 West 15th     Pittsburgh, PA 15213
ERP    Janet Robertson 35 West 15th St. Pittsburgh, PA 15213
Legacy Jan Robertson   36 West 15th St. Pittsburgh, PA 15213
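The three source-system records above probably describe one person, but no two of them agree exactly. As a minimal sketch of how such near-duplicates might be surfaced, the code below scores each pair of records with Python's standard `difflib.SequenceMatcher`. The `normalize` and `pairwise_scores` helpers are assumptions made for this illustration, not anything shown in the presentation.

```python
from difflib import SequenceMatcher

# The three records from the slide: (name, street address, city/state/ZIP).
records = {
    "CRM": ("J Robertson", "35 West 15th", "Pittsburgh, PA 15213"),
    "ERP": ("Janet Robertson", "35 West 15th St.", "Pittsburgh, PA 15213"),
    "Legacy": ("Jan Robertson", "36 West 15th St.", "Pittsburgh, PA 15213"),
}


def normalize(fields):
    """Lower-case, drop punctuation, and collapse whitespace across all fields."""
    text = " ".join(fields).lower().replace(".", "").replace(",", "")
    return " ".join(text.split())


def pairwise_scores(recs):
    """Return a similarity ratio between 0 and 1 for every pair of systems."""
    names = sorted(recs)
    scores = {}
    for i, left in enumerate(names):
        for right in names[i + 1:]:
            matcher = SequenceMatcher(None, normalize(recs[left]), normalize(recs[right]))
            scores[(left, right)] = matcher.ratio()
    return scores


scores = pairwise_scores(records)
for pair, score in scores.items():
    print(pair, round(score, 2))
```

A high score only says the records are candidates for merging. Deciding whether the street number is 35 or 36 still needs an authoritative source or a person who knows the data.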
468
?
??
AD - Andora
AE - United Arab Emirates
AF - Afghanistan
AG - Antigua
AI - Albania
AM - Armenia
AN - Netherland Antilles
AO - Angola
AR - Argentina
AS - American Samoa
AT - Austria
AU - Australia
AW - Aruba
BB - Barbados
BE - Belgium
BG - Bulgaria
BH - Bahrain
BI - Burundi
countries? 93% valid states? 0.1% valid
ACCRA 00233 GH
ACCRA-GH
ACHAIA
ADMIRALTY
AE
AG
AGHIA PARASKEVI
AGUADA
AGUAS CLARAS
AICHI-KEN
AISNE - PICARDIE
AJMAN
AK
AKERSHUS
AL
AL NAHDA1
AL QUOZ
AL.
ALABAMA
ASEAN
ASIA
ASIAN PACIFIC
Canadian cities? 2.3% valid
CITY STATE
-------------------- ------
Calgar AB
Calgar7 AB
Calgaray AB
Calgaru AB
Calgary AB
Calgary AB AB
Calgary Alberta AB
Calgary Canada AB
Calgary NW AB
Calgary Nw AB
Calgary SW AB
Calgary T3B5Y4 AB
Calgary, AB
Calgary, AB AB
Calgary, Alberta AB
Calgary, Alta. AB
Calgary, T2K 1B7 AB
Calgay AB
Calgery AB
Calgry AB
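The CITY column above shows how many spellings a single city can collect. The sketch below is one possible cleanup, assuming a trusted reference list of city names is available. `VALID_CITIES`, the noise-word pattern, and the 0.8 cutoff are all hypothetical choices for this illustration. It strips postal codes and province or quadrant tokens, then uses the standard library's `difflib.get_close_matches` to snap what remains to the reference list.

```python
import re
from difflib import get_close_matches

# Hypothetical reference list; a real one would come from an authoritative source.
VALID_CITIES = ["Calgary", "Edmonton", "Red Deer", "Lethbridge"]

# Canadian postal code, with or without the middle space (e.g. T3B5Y4, T2K 1B7).
POSTAL = re.compile(r"\b[A-Za-z]\d[A-Za-z]\s?\d[A-Za-z]\d\b")
# Province, country, and city-quadrant tokens that do not belong in a city name.
NOISE = re.compile(r"\b(AB|Alberta|Alta|Canada|NW|NE|SW|SE)\b\.?", re.IGNORECASE)


def clean_city(raw):
    """Return the closest valid city name, or None if nothing is close enough."""
    text = POSTAL.sub(" ", raw)
    text = NOISE.sub(" ", text)
    text = re.sub(r"[^A-Za-z ]", " ", text)  # drop digits and punctuation
    text = " ".join(text.split()).title()
    match = get_close_matches(text, VALID_CITIES, n=1, cutoff=0.8)
    return match[0] if match else None


# A sample of the raw values from the slide.
RAW = ["Calgar", "Calgar7", "Calgaray", "Calgaru", "Calgary AB",
       "Calgary Nw", "Calgary T3B5Y4", "Calgary, Alta.",
       "Calgary, T2K 1B7", "Calgay", "Calgery", "Calgry"]
cleaned = {raw: clean_city(raw) for raw in RAW}
```

Returning `None` for values that match nothing keeps the function from guessing; those rows are better routed to manual review than silently assigned to the nearest city.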
TABLE_NAME COLUMN_NAME MAX_DATE
------------------------------ ------------------------------ --------------------
XX_LINEITEMS PICKUP_DATE 10-NOV-2013 11:00:00
CHECK_PAY_DETAIL CHECK_DATE 26-AUG-2014 00:00:00
AUTO_REORDER_ITEM START_DATE 07-OCT-2017 00:00:00
AUTO_REORDER_ITEM LAST_REORDER_DATE 10-APR-2020 00:00:00
XX_PRESCRIPTIONS LAST_PICKUP_DATE 05-FEB-2030 12:00:00
SUBORDERS SHIPPED_DATE 20-DEC-2058 00:00:47
STORECATS UPDATE_DATE 11-NOV-2099 00:00:00
INSURANCE_CARDS BIRTHDATE 30-DEC-2100 00:00:00
GIFT_CERTIFICATES DATE_EXPIRES 16-SEP-2111 00:00:00
ALTERNATE_PAYMENTS EXPIRE_DATE 16-SEP-2111 00:00:00
PRODUCTS AVAILABLE_DATE 20-AUG-2154 00:00:00
REFILL_TOO_SOON READY_TO_FILL_DATE 01-MAR-3011 00:00:00
RELATIONINSTANCE END_DATE 01-JUN-3011 00:00:00
VA_PROFILES NEXT_ORDER_DATE 18-MAR-7483 00:00:00
XXXXXXX_RX SCHEDULED_PICKUP_DATE 01-JAN-9865 14:00:00
CONTACT_LENS_INFO EXPIRE_DATE 01-JUN-9865 00:00:00
PHARMACY_PATIENTS BIRTHDATE 31-DEC-9999 00:00:00
XX_PRESCRIPTIONS LAST_DISPENSED_EXPIRES 31-DEC-9999 00:00:00
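A report like the one above, showing the maximum value in each date column, makes implausible dates easy to spot by eye. A simple rule can also flag them automatically. The sketch below uses a few rows from the listing. The reference date (`as_of`) and the ten-year horizon are assumptions chosen for illustration, and the month table is hand-written so the parsing does not depend on the system locale.

```python
from datetime import datetime

MONTHS = {name: number for number, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}


def parse_date(text):
    """Parse a value such as '10-NOV-2013 11:00:00' into a datetime."""
    date_part, time_part = text.split()
    day, month, year = date_part.split("-")
    hour, minute, second = (int(part) for part in time_part.split(":"))
    return datetime(int(year), MONTHS[month.upper()], int(day), hour, minute, second)


def implausible(rows, as_of, horizon_years):
    """Return rows whose date lies beyond the end of the horizon year."""
    limit = datetime(as_of.year + horizon_years, 12, 31, 23, 59, 59)
    return [row for row in rows if parse_date(row[2]) > limit]


# A few (table, column, max date) rows copied from the listing above.
rows = [
    ("XX_LINEITEMS", "PICKUP_DATE", "10-NOV-2013 11:00:00"),
    ("AUTO_REORDER_ITEM", "LAST_REORDER_DATE", "10-APR-2020 00:00:00"),
    ("XX_PRESCRIPTIONS", "LAST_PICKUP_DATE", "05-FEB-2030 12:00:00"),
    ("INSURANCE_CARDS", "BIRTHDATE", "30-DEC-2100 00:00:00"),
    ("PHARMACY_PATIENTS", "BIRTHDATE", "31-DEC-9999 00:00:00"),
]

# Assumed reference date and horizon; pick values that fit the business rules.
flagged = implausible(rows, as_of=datetime(2014, 12, 31), horizon_years=10)
for table, column, value in flagged:
    print(table, column, value)
```

A single horizon is a blunt rule. Columns such as BIRTHDATE should never be in the future at all, and a value like 31-DEC-9999 is often a deliberate "no end date" placeholder, so per-column rules would be a natural refinement.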
With MUCH of "Big Data" there are issues around veracity
Veracity explored
fever chills aches
There is a belief, valid in fact, that certain search terms are good indicators of flu activity
Veracity explored: Predicting flu trends
Veracity explored: Data interpretation
We can do better than this!
Truth or Consequences?
• Google was tracking a trend
– What mattered was direction
– What mattered was magnitude
• How did this affect a retail pharmacy chain?
– Made decisions to order different inventory and reorganize shelves
– Purchased substantially more inventory than they could sell
– Shipped the new inventory to their stores
– Paid employees to process the new inventory
– Shipped existing inventory from stores to warehouses
– Items that could have sold were removed from the shelves and did not sell while excess inventory did not sell
Mid-course corrections
• Google's algorithm was written by professional statisticians and SMEs
• Google was able to rely on the CDC data to find their errors and correct their algorithm
How do we know what is real?
A standard yardstick
• Determining Google's error in magnitude required a sanity check provided by the CDC using traditional relational data
• Only 12% of existing relational data is analyzed and used to make business decisions
• Accessing and querying the 88% of the data not being utilized requires
– Improving data integrity
– Building data warehouses ... not data dumpsters
– Creating decision support systems with the help of SMEs
So what should managers do?
Three Paths
Do it in-house
Subcontract to a service provider
Ignore it and more fully exploit existing data resources
Ignore it
Since only 12% of existing relational data is analyzed and used to make business decisions:
• Create a data warehouse to properly organize existing data
• Create a decision support system with domain expertise
• Will require ETL expertise
Do it in-house
Data Experts: Data Architects, Management, Governance, Policy
Coders: Data masking
Infrastructure: Servers, Storage
Visualization Expertise: Data set interpretation and correlation; present results in a meaningful way
Industry Vertical Domain Expertise: Develop hypotheses, identify relevant business issues, ask the right questions
Math and Operations Research: Algorithm development
Hire a service provider
Share the costs by engaging a service provider
• Provides the hardware and software
• Provides the source data
• Provides statisticians and programmers
You still need to:
• Ask the right questions
• Validate the results
• Interpret the results to derive actionable information
• Perform data masking and ETL if merging with internal data
Optimize What You Already Own
• The vast majority of organizations already own infrastructure sufficient to collect, collate, and analyze the data sets they own and whose veracity they can vouch for
• What these organizations need is to:
– Identify data sources, reporting resources, and data redaction gaps
– Improve existing systems
– If needed, obtain outside expertise to perform limited functions to set up systems and to train internal resources to maintain them
Writing better SQL: Real-life example (1:2)
A report that was supposed to be run every 5 minutes
Writing better SQL: Real-life example (2:2)
The same report from the same data that runs in less than 5 minutes
Same servers ...
Same storage ...
Same network ...
Same database ...
Same data ...
Same tables ...
Same indexes ...
Performance improved 739X
Analytics & Reporting
• Reporting – Historical, emphasis on counting, understanding what happened
• Analytics – Future looking, discovery, emphasis on predicting what will happen next and determining how to favorably influence the event
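The distinction above can be made concrete with a toy example. In the sketch below, the reporting half simply counts and totals what already happened, while the analytics half fits a straight-line trend and projects the next period. The monthly sales figures are invented for illustration, and a least-squares line is just one simple stand-in for the predictive techniques an analytics team might actually use.

```python
# Hypothetical monthly unit sales, oldest first (invented for this sketch).
monthly_sales = [100, 110, 120, 130]


def report(values):
    """Reporting: summarize what already happened."""
    return {"periods": len(values),
            "total": sum(values),
            "average": sum(values) / len(values)}


def forecast_next(values):
    """Analytics: fit a least-squares line and project one period ahead."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * n


summary = report(monthly_sales)
prediction = forecast_next(monthly_sales)
print(summary)
print(f"Projected next period: {prediction:.1f}")
```

The report answers "what happened?" from data already in hand. The forecast answers "what is likely next?", and it is only as good as the assumption that the trend continues.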
Oracle Analytics Database Evolution
1998 1999 2002 2004 2005 2008 2011 2014
• 7 Data Mining “Partners”
• Oracle acquires Thinking Machines Corp’s dev. team + “Darwin” data mining software
• Oracle Data Mining 9.2i launched – 2 algorithms (NB and AR) via Java API
• Oracle Data Mining 10g & 10gR2 introduces SQL dm functions, 7 new SQL dm algorithms and new Oracle Data Miner “Classic” wizard-driven GUI
• ODM 11g & 11gR2 adds AutoDataPrep (ADP), text mining, perf. improvements
• SQLDEV/Oracle Data Miner 3.2 “work flow” GUI launched
• Integration with “R” and introduction/addition of Oracle R Enterprise
• Product renamed “Oracle Advanced Analytics” (ODM + ORE)
• New algorithms (EM, PCA, SVD)
• Predictive Queries
• SQLDEV/Oracle Data Miner 4.0 SQL script generation and SQL Query node (R integration)
• OAA/ORE 1.3 + 1.4 adds NN, Stepwise, scalable R algorithms
• Oracle Adv. Analytics for Hadoop Connector launched with scalable BDA algorithms
Oracle’s Big Data Tools
SOURCES
DATA RESERVOIR
DATA WAREHOUSE
Oracle Database
Oracle Industry Models
Oracle Advanced Analytics
Oracle Spatial & Graph
Big Data Appliance
Apache Flume
Oracle GoldenGate
Oracle Event Processing
Cloudera Hadoop
Oracle Big Data SQL
Oracle NoSQL
Oracle R Advanced Analytics for Hadoop
Oracle R Distribution
Oracle Database In-Memory, Multi-tenant
Oracle Industry Models
Oracle Advanced Analytics
Oracle Spatial & Graph
Exadata
Oracle GoldenGate
Oracle Event Processing
Oracle Data Integrator
Oracle Big Data Connectors
Oracle Data Integrator
WRAP-UP
Big Data has a place in IT!
Big Data has the potential to answer questions that cannot be answered with operational data
• What are people saying about us?
• What is trending in a way that might benefit us?
• What is trending in a way that might hurt us?
• What are people concerned about?
• What other interests do our customers and prospects have?
Planning and Budgeting impact
Planning:
• Define specific goals you expect to achieve through data analysis and reporting
• Define the skill sets required to extract actionable information from your data
• Define the training and consulting required to align your IT team with business needs
It is far less expensive to train your current employees in how to leverage what you already have than it is to purchase new systems and hire new FTEs with new skills
Big Data initiatives have promise in some areas. The challenge isn’t technical, though: it’s about solving a business problem.
Before you jump into a new initiative, be sure the objectives are clearly defined and that they match up with the strengths (and limitations) of Big Data
Conclusions
• You can maximize the profitability of your organization by leveraging your existing data to gain insights into how to become more efficient or increase sales
• You can maximize the profitability of your organization by spending your money conservatively and optimizing use of what you already own
• Services organizations like PTC can support you in optimizing your existing assets and training your existing team members
– Experts in performance tuning
– Experts in data warehouse design
– Experts in ETL
• Explore Big Data initiatives cautiously: leverage external resources at first and have well-defined goals and objectives
– If you don’t know what questions to ask, you can’t expect to find the answers in Big Data
Thank you!
EXPERTS
Expert Data Services team with deep performance tuning and Oracle technology backgrounds.
More info:
www.perftuning.com
info@perftuning.com
@perftuning


Editor's Notes

  • #39 Over the years, Oracle has increasingly invested in making the database and SQL more powerful….