Choosing the Right Big Data Architecture for your Business

Choosing the Right Big Data Architecture for your Business

  • 1. Choosing the Right Data Architecture for Your Big Data Projects Presentation 1
  • 2. “There isn’t a cluster big enough to hold your ego!”
  • 3. Presentation 1 Choosing the Right Data Architecture for Your Big Data Projects AGENDA
  • 4. Choosing the Right Data Architecture for Your Big Data Projects. Acknowledgements: Planning Your Enterprise Data Strategy; John Ladley, President, IMCue Solutions; Metrics for Information Management; Business Analysis Techniques for Data Professionals; Alec Sharp, Senior Consultant, Clariteq Systems Consulting; Steps to a Successful Enterprise Information Management Program; Michael F. Jennings, Executive Director - Data Governance, Walgreens; Meta Data Requirements for the Enterprise; David Loshin, President, Knowledge Integrity; Advanced MDM: Moving to the Next Level of MDM Success
  • 5. Choosing the Right Data Architecture for Your Big Data Projects Acknowledgements
  • 6. Choosing a Big Data Platform Big Data Platform
  • 10. Key Ideas: One Big Data database cannot accommodate all the Big Data types. One size DOES NOT fit all. You need to know the data type and data architecture to select the most appropriate Big Data database.
  • 11. Choosing a Big Data Architecture Big Data Platform Big Data Architecture
  • 12. Choosing a Big Data Architecture: What is Big Data? Big Data is about textual analytics (deriving data from unstructured content) [not dimension or fact tables]. Examples shown: web data; click stream data; social network data; semi-structured data; email; unstructured content; comments; sensor data; vertical industries; structured transaction data; tweets, text messages.
  • 13. Choosing a Big Data Architecture: What do we need to consider when classifying Big Data?
      Analysis Type: Real Time; Batch
      Processing Methodology: Predictive Analytics; Analytical Querying & Reporting; Misc.
      Data Type: Meta Data; Master Data; Historical; Transactional
      Data Frequency: On Demand Feeds; Continuous Feeds; Real Time Feeds; Time Series
      Structured; Unstructured; Semi-Structured
      Web and Social Media; Machine Generated; Human Generated; Internal Data Sources; Transaction Data; Biometric Data; Via Data Providers; Via Data Originators
      Data Consumers: Human; Business Process; Other Enterprise Applications; Other Data Repositories
      Hardware: Commodity Hardware; State of the Art Hardware
  • 14. Choosing a Big Data Architecture
  • 15. Choosing a Big Data Architecture
  • 17. Choosing a Big Data Architecture: Classify Big Data Type According to the Business Needs. Big data business problems by type:
      Business problem | Big Data type | Description
      Utilities: Predict power consumption | Machine-generated data | Utility companies have rolled out smart meters to measure the consumption of water, gas, and electricity at regular intervals of one hour or less. These smart meters generate huge volumes of interval data that needs to be analyzed. Utilities also run big, expensive, and complicated systems to generate power. Each grid includes sophisticated sensors that monitor voltage, current, frequency, and other important operating characteristics. To gain operating efficiency, the company must monitor the data delivered by the sensor. A big data solution can analyze power generation (supply) and power consumption (demand) data using smart meters.
      Telecommunications: Customer churn analytics | Web and social data; Transaction data | Telecommunications operators need to build detailed customer churn models that include social media and transaction data, such as CDRs, to keep up with the competition. The value of the churn models depends on the quality of customer attributes (customer master data such as date of birth, gender, location, and income) and the social behavior of customers. Telecommunications providers who implement a predictive analytics strategy can manage and predict churn by analyzing the calling patterns of subscribers.
      Marketing: Sentiment analysis | Web and social data | Marketing departments use Twitter feeds to conduct sentiment analysis to determine what users are saying about the company and its products or services, especially after a new product or release is launched. Customer sentiment must be integrated with customer profile data to derive meaningful results. Customer feedback may vary according to customer demographics.
  • 18. Choosing a Big Data Architecture: Classify Big Data Type According to the Business Needs. Big data business problems by type:
      Business problem | Big Data type | Description
      Customer service: Call monitoring | Human-generated | IT departments are turning to big data solutions to analyze application logs to gain insight that can improve system performance. Log files from various application vendors are in different formats; they must be standardized before IT departments can use them.
      Retail: Personalized messaging based on facial recognition and social media | Web and social data; Biometrics | Retailers can use facial recognition technology in combination with a photo from social media to make personalized offers to customers based on buying behavior and location. This capability could have a tremendous impact on retailers' loyalty programs, but it has serious privacy ramifications. Retailers would need to make the appropriate privacy disclosures before implementing these applications.
      Retail and marketing: Mobile data and location-based targeting | Machine-generated data; Transaction data | Retailers can target customers with specific promotions and coupons based on location data. Solutions are typically designed to detect a user's location upon entry to a store or through GPS. Location data combined with customer preference data from social networks enable retailers to target online and in-store marketing campaigns based on buying history. Notifications are delivered through mobile applications, SMS, and email.
      FSS, Healthcare: Fraud detection | Machine-generated data; Transaction data; Human-generated | Fraud management predicts the likelihood that a given transaction or customer account is experiencing fraud. Solutions analyze transactions in real time and generate recommendations for immediate action, which is critical to stopping third-party fraud, first-party fraud, and deliberate misuse of account privileges. Solutions are typically designed to detect and prevent myriad fraud and risk types across multiple industries, including: credit and debit payment card fraud; deposit account fraud; technical fraud; bad debt; healthcare fraud; Medicaid and Medicare fraud; property and casualty insurance fraud; worker compensation fraud; insurance fraud; telecommunications fraud.
  • 19. Key Idea There are guidelines to help suggest the Big Data Types that are commonly used by each industry.
  • 20. Choosing a Big Data Architecture Classify Big Data Type According to the Business Needs
  • 21. Critical Success Factor: Validate that the data being collected has business value. 55% of Big Data projects don’t get completed, and many others fall short of their objectives. Source: Report: CIOs & Big Data: What Your IT Team Wants You to Know, http://www.infochimps.com/resources/report-cios-big-data-what-your-it-team-wants-you-to-know-6/
  • 22. Choosing a Big Data Architecture Big Data Platform Big Data Architecture Big Data Business Needs by type
  • 23. Ten Big Data Schemas Big Data Architecture
  • 24. Ten Big Data Schemas: Relational - Graph. A graph database stores data in a graph, the most generic of data structures, capable of elegantly representing any kind of data in a highly accessible way. Graph databases can help you harvest more value from your data by examining its relationships. They provide index-free adjacency: every element contains a direct pointer to its adjacent elements, so no index lookups are necessary.
  • 25. Ten Big Data Schemas: Relational - Graph
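The index-free adjacency idea on slide 24 can be illustrated with a minimal Python sketch. The `Node` class, the `connect` helper, and the sample names are hypothetical and are not taken from any graph database product; the point is only that each node holds direct references to its neighbors, so a traversal follows pointers instead of consulting an index.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """A graph element that points directly at its adjacent elements."""
    name: str
    neighbors: list["Node"] = field(default_factory=list)


def connect(a: Node, b: Node) -> None:
    # Store the relationship on both endpoints (undirected edge).
    a.neighbors.append(b)
    b.neighbors.append(a)


def friends_of_friends(start: Node) -> set[str]:
    """Two-hop traversal that only follows neighbor pointers."""
    direct = {n.name for n in start.neighbors}
    found = set()
    for friend in start.neighbors:
        for candidate in friend.neighbors:
            if candidate is not start and candidate.name not in direct:
                found.add(candidate.name)
    return found


# Hypothetical sample data: alice - bob - carol - dave
alice, bob, carol, dave = (Node(n) for n in ("alice", "bob", "carol", "dave"))
connect(alice, bob)
connect(bob, carol)
connect(carol, dave)
```

Here `friends_of_friends(alice)` walks from `alice` to `bob` to `carol` without any lookup table, which is the access pattern the slide describes.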
  • 26. Ten Big Data Schemas: Relational - Analytics / MPP Columnar. Column-oriented storage organization, which increases performance of sequential record access at the expense of common transactional operations such as single record retrieval, updates, and deletes. Shared-nothing architecture, which reduces system contention for shared resources and allows gradual degradation of performance in the face of hardware failure.
  • 27. Ten Big Data Schemas: Relational - Analytics / MPP Columnar
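The trade-off on slide 26 can be sketched in a few lines of Python. The order data and the helper names (`to_columns`, `fetch_record`) are made up for illustration and do not reflect how any particular columnar engine stores data on disk; the sketch only shows why scanning one attribute is cheap in a column layout while reassembling a single record touches every column.

```python
# Row-oriented layout: one dict per record (hypothetical sample orders).
rows = [
    {"order_id": 1, "region": "east", "amount": 20.0},
    {"order_id": 2, "region": "west", "amount": 35.5},
    {"order_id": 3, "region": "east", "amount": 12.5},
]


def to_columns(records):
    """Pivot row records into one contiguous list per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def fetch_record(columns, position):
    """Single-record retrieval must visit every column list."""
    return {name: values[position] for name, values in columns.items()}


columns = to_columns(rows)

# Analytical scan: reads only the "amount" column.
total_amount = sum(columns["amount"])

# Transactional lookup: has to stitch the record back together.
second_order = fetch_record(columns, 1)
```

The aggregate reads a single list, whereas `fetch_record` touches all three, which mirrors the slide's point about sequential access versus single-record operations.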
  • 28. Ten Big Data Schemas: Relational - Analytics / MPP. Delivers extreme performance and scalability for all your database applications, including Online Transaction Processing (OLTP), data warehousing (DW), and mixed workloads.
  • 29. Ten Big Data Schemas: Relational - Analytics / MPP
  • 30. Ten Big Data Schemas: Relational - NewSQL. Scales out relational databases by virtualizing a distributed database environment. Gives organizations relational data integrity combined with the scalability and flexibility of a modern distributed, multi-site database to support an unlimited number of users, larger data volumes, and extremely high TPS.
  • 31. Ten Big Data Schemas: Relational - NewSQL
  • 32. Ten Big Data Schemas: PolyStructured – Document Indexing. Provides full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Provides distributed search and index replication. Highly scalable.
  • 33. Ten Big Data Schemas: PolyStructured – Document Indexing
  • 34. Ten Big Data Schemas: PolyStructured - Document. Document databases completely embrace the web. Store data with JSON documents. Access documents and query indexes with web browsers, via HTTP. Index, combine, and transform documents with JavaScript. Works well with modern web and mobile apps. Serve web apps directly. On-the-fly document transformation and real-time change notifications.
  • 35. Ten Big Data Schemas: PolyStructured - Document. Document databases lack a schema, that is, rigid pre-defined data structures such as tables. Data in document databases is commonly stored as JSON documents. JavaScript is used for MapReduce indexes.
  • 36. Ten Big Data Schemas: PolyStructured – Key Value Stored – InMemory - Data Grid. In-memory accelerator for Apache Hadoop, high performance computing, streaming and database, HDFS and MongoDB. Eliminates MapReduce overhead. Dynamically caches, partitions, replicates, and manages application data and business logic across multiple servers. Fully elastic memory-based storage grid. Virtualizes the free memory of a potentially large number of Java virtual machines and makes them behave like a single key-addressable storage pool for application state. Example: IBM WebSphere eXtreme Scale.
  • 37. Ten Big Data Schemas: PolyStructured – Key Value Stored – InMemory - Data Grid
  • 38. Ten Big Data Schemas: PolyStructured – Key Value Stored – InMemory - Caching. Run atomic operations like appending to a string; incrementing the value in a hash; pushing to a list; computing set intersection, union and difference; or getting the member with highest ranking in a sorted set. With an in-memory dataset, depending on your use case, you can persist it either by dumping the dataset to disk every once in a while, or by appending each command to a log.
  • 39. Ten Big Data Schemas: PolyStructured – Key Value Stored – InMemory - Caching
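The second persistence option on slide 38, appending each command to a log, can be sketched as follows. `TinyStore` and its methods are hypothetical names invented for this illustration; they are not the API of any real cache product, and the log is held in a Python list as a stand-in for an on-disk file.

```python
import json


class TinyStore:
    """Toy in-memory key-value store with a command log (illustrative only)."""

    def __init__(self):
        self.data = {}
        self.log = []  # stand-in for an append-only file on disk

    def _record(self, *command):
        # Each mutating command is serialized and appended, never rewritten.
        self.log.append(json.dumps(command))

    def append(self, key, suffix):
        self.data[key] = self.data.get(key, "") + suffix
        self._record("append", key, suffix)
        return len(self.data[key])

    def incr(self, key, amount=1):
        self.data[key] = self.data.get(key, 0) + amount
        self._record("incr", key, amount)
        return self.data[key]

    def push(self, key, item):
        self.data.setdefault(key, []).append(item)
        self._record("push", key, item)
        return len(self.data[key])

    @classmethod
    def replay(cls, log):
        """Rebuild the dataset by re-running every logged command in order."""
        store = cls()
        for line in log:
            name, *args = json.loads(line)
            getattr(store, name)(*args)
        return store


store = TinyStore()
store.append("greeting", "hello")
store.append("greeting", " world")
store.incr("hits")
store.incr("hits", 4)
store.push("queue", "job-1")

restored = TinyStore.replay(store.log)
```

Replaying the log reproduces the same in-memory state, which is the property that makes command logging a workable alternative to periodic full dumps.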
  • 40. Ten Big Data Schemas: PolyStructured – Key Value Stored – Columnar. Random, real-time read/write access to your Big Data. Hosting of very large tables (billions of rows by millions of columns) atop clusters of commodity hardware.
  • 41. Ten Big Data Schemas: PolyStructured – Key Value Stored – Columnar
  • 42. Ten Big Data Schemas: PolyStructured – Distributed File System. Storage and large-scale processing of data sets on clusters of commodity hardware. Distributed, scalable, and portable file system.
  • 43. Ten Big Data Schemas: PolyStructured – Distributed File System
  • 44. Key Ideas: Hadoop is the #1 distributed file system used for Big Data projects. Hadoop is used as the shared data source platform to merge and standardize big data with legacy data.
  • 45. Data as a Service. Single System Management APIs. Data as a Service applications (APIs) should be based on a single data source platform. Web and Social Media; Machine Generated; Human Generated; Internal Data Sources; Transaction Data; Biometric Data; Via Data Providers; Via Data Originators.
  • 46. Key Ideas: Hadoop is the #1 distributed file system used for Big Data projects. Hadoop is used as the shared data source platform to merge and standardize big data with legacy data. Hadoop is an excellent choice to start building your shared data source platform. Hadoop can become your System of Record (SOR) for Big Data and part of your Master Data Management (MDM) system.
  • 47. Critical Success Factors: The date-time format must be standardized across the data platform. The time format of International Standard ISO 8601 specifies numeric representations of date and time. YYYY-MM-DDThh:mm:ss.sTZD (e.g. 1997-07-16T19:20:30.45+01:00) is suggested and preferred. Unique identifiers (domain keys) must be clearly described using friendly terminology. For example: ‘ID’ should never be a column name; ‘Sales ID’ is too generic; ‘Sales Representative Reporting ID’ is friendly and clearly named.
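One way to enforce the date-time rule on slide 47 at ingestion is sketched below with Python's standard `datetime` module. The helper name `to_iso8601` and the decision to reject timestamps without an offset are assumptions made for this example, not something the slides prescribe.

```python
from datetime import datetime, timedelta, timezone


def to_iso8601(moment: datetime) -> str:
    """Render a timestamp as YYYY-MM-DDThh:mm:ss.sss+hh:mm."""
    if moment.tzinfo is None:
        # Assumption for this sketch: a value with no offset is ambiguous,
        # so it is rejected rather than silently treated as local or UTC.
        raise ValueError("timestamp has no time zone offset")
    return moment.isoformat(timespec="milliseconds")


# The example instant from the slide, one hour ahead of UTC.
sample = datetime(1997, 7, 16, 19, 20, 30, 450000,
                  tzinfo=timezone(timedelta(hours=1)))
formatted = to_iso8601(sample)
```

With these inputs `formatted` is `1997-07-16T19:20:30.450+01:00`, the same instant as the slide's example written with a three-digit fraction.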
  • 48. Key Idea Hadoop is used as the shared analytical platform to merge and standardize analytics
  • 49. Analytics as a Service. Single System Management. Analytics should be based on a single data source platform. IBM WebSphere eXtreme Scale.
  • 50. Key Ideas: Hadoop is used as the shared analytical platform to merge and standardize analytics. There are guidelines to help suggest the analytics, KPIs, and Profit Drivers for Big Data that are commonly used by each industry.
  • 51. Examples of tasks and algorithms to use (2):
      Task | Examples | Algorithms to use
      Predicting a discrete attribute | Flag the customers in a prospective buyers list as good or poor prospects. Calculate the probability that a server will fail within the next 6 months. Categorize patient outcomes and explore related factors. | Decision Trees Algorithm; Naive Bayes Algorithm; Clustering Algorithm; Neural Network Algorithm
      Predicting a continuous attribute | Forecast next year's sales. Predict site visitors given past historical and seasonal trends. Generate a risk score given demographics. | Decision Trees Algorithm; Time Series Algorithm; Linear Regression Algorithm
      Predicting a sequence | Perform clickstream analysis of a company's Web site. Analyze the factors leading to server failure. Capture and analyze sequences of activities during outpatient visits, to formulate best practices around common activities. | Sequence Clustering Algorithm
      Finding groups of common items in transactions | Use market basket analysis to determine product placement. Suggest additional products to a customer for purchase. Analyze survey data from visitors to an event, to find which activities or booths were correlated, to plan future activities. | Association Algorithm; Decision Trees Algorithm
      Finding groups of similar items | Create patient risk profile groups based on attributes such as demographics and behaviors. Analyze users by browsing and buying patterns. Identify servers that have similar usage characteristics. | Clustering Algorithm; Sequence Clustering Algorithm
  • 52. Key Ideas: Hadoop is used as the shared analytical platform to merge and standardize analytics. There are guidelines to help suggest the analytics, KPIs, and Profit Drivers for Big Data that are commonly used by each industry. You do not need to know how the algorithm works or is designed. You only need to know the parameters needed to run them.
  • 53. Data Mining Tasks (4):
      Task | Description | Algorithms
      Market Basket Analysis | Discover items sold together to create recommendations on-the-fly and to determine how product placement can directly contribute to your bottom line. | Association; Decision Trees
      Churn Analysis | Anticipate customers who may be considering canceling their service and identify the benefits that will keep them from leaving. | Decision Trees; Linear Regression; Logistic Regression
      Market Analysis | Define market segments by automatically grouping similar customers together. Use these segments to seek profitable customers. | Clustering; Sequence Clustering
      Forecasting | Predict sales and inventory amounts and learn how they are interrelated to foresee bottlenecks and improve performance. | Decision Trees; Time Series
      Data Exploration | Analyze profitability across customers, or compare customers that prefer different brands of the same product to discover new opportunities. | Neural Network
      Unsupervised Learning | Identify previously unknown relationships between various elements of your business to inform your decisions. | Neural Network
      Web Site Analysis | Understand how people use your Web site and group similar usage patterns to offer a better experience. | Sequence Clustering
      Campaign Analysis | Spend marketing funds more effectively by targeting the customers most likely to respond to a promotion. | Decision Trees; Naïve Bayes; Clustering
      Information Quality | Identify and handle anomalies during data entry or data loading to improve the quality of information. | Linear Regression; Logistic Regression
      Text Analysis | Analyze feedback to find common themes and trends that concern your customers or employees, informing decisions with unstructured input. | Text Mining
  • 54. Data Mining Algorithms (Analysis Services - Data Mining): Choosing an Algorithm by Task. To help you select an algorithm for use with a specific task, the following table provides suggestions for the types of tasks for which each algorithm is traditionally used.
      Task | Examples | Microsoft algorithms to use
      Predicting a discrete attribute | Flag the customers in a prospective buyers list as good or poor prospects. Calculate the probability that a server will fail within the next 6 months. Categorize patient outcomes and explore related factors. | Microsoft Decision Trees Algorithm; Microsoft Naive Bayes Algorithm; Microsoft Clustering Algorithm; Microsoft Neural Network Algorithm
      Predicting a continuous attribute | Forecast next year's sales. Predict site visitors given past historical and seasonal trends. Generate a risk score given demographics. | Microsoft Decision Trees Algorithm; Microsoft Time Series Algorithm; Microsoft Linear Regression Algorithm
      Predicting a sequence | Perform clickstream analysis of a company's Web site. Analyze the factors leading to server failure. Capture and analyze sequences of activities during outpatient visits, to formulate best practices around common activities. | Microsoft Sequence Clustering Algorithm
      Finding groups of common items in transactions | Use market basket analysis to determine product placement. Suggest additional products to a customer for purchase. Analyze survey data from visitors to an event, to find which activities or booths were correlated, to plan future activities. | Microsoft Association Algorithm; Microsoft Decision Trees Algorithm
      Finding groups of similar items | Create patient risk profile groups based on attributes such as demographics and behaviors. Analyze users by browsing and buying patterns. Identify servers that have similar usage characteristics. | Microsoft Clustering Algorithm; Microsoft Sequence Clustering Algorithm
  • 55. Analytic Algorithm Categories: Regression. A powerful and commonly used algorithm that evaluates the relationship of one variable, the dependent variable, with one or more other variables, called independent variables. By measuring exactly how large and significant each independent variable has historically been in its relation to the dependent variable, the future value of the dependent variable can be estimated. Regression models are widely used in applications such as seasonal forecasting, quality assurance, and credit risk analysis.
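As a small illustration of the regression idea on slide 55, the sketch below fits an ordinary least-squares line with one independent variable in plain Python. The monthly sales figures are invented and deliberately lie on an exact line so the result is easy to check by hand; `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by the variance of x gives the slope.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up history: month number (independent) and sales (dependent).
months = [1, 2, 3, 4, 5]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(months, sales)
forecast_month_6 = slope * 6 + intercept
```

The fitted slope measures how strongly the independent variable has historically moved the dependent one, and plugging in a future month estimates the future value, as the slide describes.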
  • 56. Analytic Algorithm Categories: Clustering / Segmentation. The process of grouping items together to form categories. You might look at a large collection of shopping baskets and discover that they are clustered corresponding to health food buyers, convenience food buyers, luxury food buyers, and so on. Once these characteristics have been grouped together, they can be used to find other customers with similar characteristics. This algorithm is used to create groups for applications such as customers for marketing campaigns, rate groups for insurance products, and crime statistics groups for law enforcement.
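The grouping step on slide 56 can be sketched with a one-dimensional k-means loop. K-means is used here only as one common clustering technique; the slides do not name a specific method. The basket totals and the starting centroids are invented for the example.

```python
def kmeans_1d(values, centroids, rounds=10):
    """Alternate between assigning values to the nearest centroid and
    moving each centroid to the mean of its assigned values."""
    groups = [[] for _ in centroids]
    for _ in range(rounds):
        groups = [[] for _ in centroids]
        for value in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(value - centroids[i]))
            groups[nearest].append(value)
        # An empty group keeps its previous centroid.
        centroids = [sum(g) / len(g) if g else c
                     for g, c in zip(groups, centroids)]
    return centroids, groups


# Hypothetical basket totals: small convenience buys and large weekly shops.
basket_totals = [5.0, 6.0, 7.0, 48.0, 50.0, 52.0]
centers, segments = kmeans_1d(basket_totals, centroids=[5.0, 52.0])
```

Each resulting segment can then be treated as a customer category, and a new basket is assigned to whichever center it lies closest to.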
  • 57. Analytic Algorithm Categories: Nearest Neighbor. Quite similar to clustering, but it only looks at other records in the dataset that are “nearest” to a chosen unclassified record based on a “similarity” measure. Records that are “near” to each other tend to have similar predictive values as well. Thus, if you know the prediction value of one of the records, you can predict its nearest neighbor. This algorithm works similarly to the way that people think, by detecting closely matching examples. Nearest Neighbor is often used in retail and life sciences applications.
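A minimal sketch of the nearest-neighbor idea on slide 57 follows, using Euclidean distance as the similarity measure (one possible choice; the slide does not specify one). The customer features and labels are invented, and `math.dist` requires Python 3.8 or later.

```python
import math


def nearest_neighbor(labelled, query):
    """Return the label of the known record closest to the query."""
    _, label = min(labelled, key=lambda item: math.dist(item[0], query))
    return label


# Hypothetical records: (store visits per month, average basket) -> outcome.
known_customers = [
    ((1.0, 10.0), "ignores promotion"),
    ((2.0, 12.0), "ignores promotion"),
    ((8.0, 40.0), "responds to promotion"),
    ((9.0, 45.0), "responds to promotion"),
]

frequent_shopper = nearest_neighbor(known_customers, (7.0, 38.0))
rare_shopper = nearest_neighbor(known_customers, (1.5, 11.0))
```

The unclassified record simply inherits the prediction value of the closest known record, which is the "closely matching examples" behavior the slide refers to.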
  • 58. Analytic Algorithm Categories: Association Rules. Detects related items in a dataset. Association analysis identifies and groups together similar records that would otherwise go unnoticed by a casual observer. This type of analysis is often used for market basket analysis to find popular bundles of products that are related by transaction, such as low-end digital cameras being associated with smaller capacity memory sticks to store the digital images.
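The market basket example on slide 58 can be made concrete with the two standard association-rule measures, support and confidence. The four baskets below are invented; the sketch computes the measures directly rather than running a full rule-mining algorithm.

```python
# Hypothetical transactions, each a set of purchased items.
baskets = [
    {"camera", "memory stick"},
    {"camera", "memory stick", "case"},
    {"camera", "case"},
    {"memory stick"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the share that
    also contain the consequent."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))


pair_support = support({"camera", "memory stick"}, baskets)
rule_confidence = confidence({"camera"}, {"memory stick"}, baskets)
```

Here the pair appears in half of the baskets, and two of the three camera purchases also include a memory stick, so the rule "camera implies memory stick" has a confidence of about 0.67.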
  • 59. Analytic Algorithm Categories: Decision Tree. A tree-shaped graphical predictive algorithm that represents alternative sequential decisions and the possible outcomes for each decision. This algorithm provides alternative actions that are available to the decision maker, the probabilistic events that follow from and affect these actions, and the outcomes that are associated with each possible scenario of actions and consequences. Applications range from credit card scoring to time series predictions of exchange rates.
  • 60. Analytic Algorithm Categories: Sequence Association. Detects causality and association between time-ordered events, although the associated events may be spread far apart in time and may seem unrelated. Tracking specific time-ordered records and linking these records to a specific outcome allows companies to predict a possible outcome based on a few occurring events. A sequence model can be used to reduce the number of clicks customers have to make when navigating a company’s website.
  • 61. Analytic Algorithm Categories: Neural Network. A sophisticated pattern detection algorithm that uses machine learning techniques to generate predictions. This technique models itself after the process of cognitive learning and the neurological functions of the brain, and is capable of predicting new observations from other known observations. Neural networks are very powerful, complex, and accurate predictive models that are used in detecting fraudulent behavior, in predicting the movement of stocks and currencies, and in improving the response rates of direct marketing campaigns.
  • 62. Choosing a Big Data Architecture Big Data Platform Big Data Analytical Platform Big Data Analytics Big Data Business Needs by type Big Data Architecture
  • 63. Analytics as a Service. Analytics Data Sources. Analytics should be based on a single data source platform. IBM WebSphere eXtreme Scale.
  • 64. Analytics as a Service. When you write data to a traditional database, either through loading external data, writing the output of a query, doing UPDATE statements, etc., the database has total control over the storage. The database is the "gatekeeper." An important implication of this control is that the database can enforce the schema as data is written. This is called schema on write. Hive has no such control over the underlying storage. There are many ways to create, modify, and even damage the data that Hive will query. Therefore, Hive can only apply the schema when a query reads the data. This is called schema on read. So what if the schema doesn’t match the file contents? Hive does the best that it can to read the data. You will get lots of null values if there aren’t enough fields in each record to match the schema. If some fields are numbers and Hive encounters nonnumeric strings, it will return nulls for those fields. Above all else, Hive tries to recover from all errors as best it can.
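The schema-on-read behavior described on slide 64 can be imitated in a few lines of Python. This is not Hive code and makes no claim about Hive internals; `SCHEMA`, `read_with_schema`, and the sample text are hypothetical. The sketch applies a schema only when the raw text is read and substitutes `None` (the analogue of a null) when a field is missing or cannot be converted.

```python
import csv
import io

# Hypothetical schema applied at read time: (column name, converter).
SCHEMA = [("user_id", int), ("country", str), ("amount", float)]


def read_with_schema(text, schema=SCHEMA):
    """Interpret raw delimited text through a schema while reading it."""
    for fields in csv.reader(io.StringIO(text)):
        row = {}
        for position, (name, convert) in enumerate(schema):
            try:
                row[name] = convert(fields[position])
            except (IndexError, ValueError):
                # Too few fields, or a value of the wrong type: yield a
                # null instead of rejecting the record.
                row[name] = None
        yield row


# The stored data is never validated on write; two records are "dirty".
raw = "1,US,9.50\n2,DE\nthree,FR,4.25\n"
rows = list(read_with_schema(raw))
```

The second record is short one field and the third has a non-numeric id, so both come back with a `None` in the affected column while the rest of each record is still usable, mirroring the recovery behavior the slide attributes to Hive.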
  • 66. Analytics as a Service.
      Benefits of schema on write:
      • Better type safety and data cleansing done for the data at rest
      • Typically more efficient (storage size and computationally) since the data is already parsed
      Downsides of schema on write:
      • You have to plan ahead of time what your schema is before you store the data (i.e., you have to do ETL)
      • Typically you throw away the original data, which could be bad if you have a bug in your ingest process
      • It's harder to have different views of the same data
      Benefits of schema on read:
      • Flexibility in defining how your data is interpreted at load time
      • This gives you the ability to evolve your "schema" as time goes on
      • This allows you to have different versions of your "schema"
      • This allows the original source data format to change without having to consolidate to one data format
      • You get to keep your original data
      • You can load your data before you know what to do with it (so you don't drop it on the ground)
      • Gives you flexibility in being able to store unstructured, unclean, and/or unorganized data
      Downsides of schema on read:
      • Generally it is less efficient because you have to reparse and reinterpret the data every time (this can be expensive with formats like XML)
      • The data is not self-documenting (i.e., you can't look at a schema to figure out what the data is)
      • More error prone and your analytics have to account for dirty data
      Source: http://nosql.mypopescu.com/post/48638541973/schema-on-writes-vs-schema-on-reads-apache-hadoop-and
  • 67. Reporting users make their own schemas and naming standards. Reporting users run their own analytics, as many times as they want.
  • 68. Key Ideas - Summary: One Big Data database cannot accommodate all the Big Data types. You need to know the data type and data architecture to select the most appropriate Big Data database. There are guidelines to help suggest the Big Data Types that are commonly used by each business type. Hadoop is used as the shared data source platform to merge and standardize big data with legacy data. Hadoop is used as the shared analytical platform to merge and standardize analytics. Hadoop is an excellent choice to start building your shared data source platform. Hadoop can become your System of Record (SOR) for Big Data and part of your Master Data Management (MDM) system. Hadoop is used to standardize and centralize the Key Performance Indicators (KPIs) and Profit Drivers for an Enterprise Analytical Platform. There are guidelines to help suggest the analytics, KPIs, and Profit Drivers for Big Data that are commonly used by each industry. Schema on read.
  • 69. Critical Success Factors - Summary: Validate that the data being collected has business value. The date-time format must be standardized across the data platform. Unique identifiers (domain keys) must be clearly described using friendly terminology.
  • 70. References
      1) Pervasive insights produce better business decisions: opening access to business intelligence by embedding analytics capabilities into everyday software tools pays substantial dividends. By Lauren Gibbons Paul
      2) Data Mining Algorithms (Analysis Services - Data Mining) http://msdn.microsoft.com/en-us/library/ms175595.aspx
      3) Data Mining Query Task http://msdn.microsoft.com/en-us/library/ms141728.aspx
      4) Predictive Analysis with SQL Server 2008 - White Paper - Microsoft - Published: November 2007
      5) Predictive Analytics for the Retail Industry - White Paper - Microsoft - Writer: Matt Adams, Technical Reviewer: Roni Karassik, Published: May 2008
      6) Breakthrough Insights using Microsoft SQL Server 2012 - Analysis Services https://www.microsoftvirtualacademy.com/tracks/breakthrough-insights-using-microsoft-sql-server-2012-a
      7) Useful DAX Starter Functions and Expressions http://thomasivarssonmalmo.wordpress.com/category/powerpivot-and-dax/
      8) Stairway to PowerPivot and DAX - Level 1: Getting Started with PowerPivot and DAX, by Bill_Pearson, 2011/12/21
      9) Data Mining Tool http://technet.microsoft.com/en-us/library/ms174467.aspx
      10) DAX Cheat Sheet http://powerpivot-info.com/post/439-dax-cheat-sheet
      11) Big Data Landscape - http://arnon.me/2012/11/nosql-landscape-diagrams/
  • 71. On the Internet, the World Wide Web Consortium (W3C) uses ISO 8601 in defining a profile of the standard that restricts the supported date and time formats to reduce the chance of error and the complexity of software. RFC 3339 defines a profile of ISO 8601 for use in Internet protocols and standards. It explicitly excludes durations and dates before the common era. The more complex formats such as week numbers and ordinal days are not permitted. RFC 3339 deviates from ISO 8601 in allowing a zero timezone offset to be specified as "-00:00", which ISO 8601 forbids. RFC 3339 intends "-00:00" to carry the connotation that it is not stating a preferred timezone, whereas the conforming "+00:00" or any non-zero offset connotes that the offset being used is preferred. This convention regarding "-00:00" is derived from earlier RFCs, such as RFC 2822, which uses it for timestamps in email headers. RFC 2822 made no claim that any part of its timestamp format conforms to ISO 8601, and so was free to use this convention without conflict. RFC 3339 errs in adopting this convention while also claiming conformance to ISO 8601.
      http://www.w3.org/TR/NOTE-datetime
      http://stackoverflow.com/questions/16307563/utc-time-explanation
      International Standard ISO 8601 specifies numeric representations of date and time: YYYY-MM-DDThh:mm:ss.sTZD (e.g. 1997-07-16T19:20:30.45+01:00), where:
      YYYY = four-digit year
      MM = two-digit month (01=January, etc.)
      DD = two-digit day of month (01 through 31)
      hh = two digits of hour (00 through 23) (am/pm NOT allowed)
      mm = two digits of minute (00 through 59)
      ss = two digits of second (00 through 59)
      s = one or more digits representing a decimal fraction of a second
      TZD = time zone designator (Z or +hh:mm or -hh:mm)
      Time zone offsets are handled in one of two ways. Either times are expressed in UTC (Coordinated Universal Time), with a special UTC designator ("Z"), or times are expressed in local time, together with a time zone offset in hours and minutes. A time zone offset of "+hh:mm" indicates that the date/time uses a local time zone which is "hh" hours and "mm" minutes ahead of UTC. A time zone offset of "-hh:mm" indicates that the date/time uses a local time zone which is "hh" hours and "mm" minutes behind UTC.
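To make the offset rules on slide 71 concrete, the sketch below parses a timestamp with a "+01:00" designator and converts it to UTC using Python's standard library. The three-digit fraction (".450" instead of the slide's ".45") is a choice made for this example, because `datetime.fromisoformat` in Python versions before 3.11 accepts only three or six fractional digits.

```python
from datetime import datetime, timedelta, timezone

# Local time that is one hour ahead of UTC, per the "+hh:mm" rule.
local_stamp = datetime.fromisoformat("1997-07-16T19:20:30.450+01:00")

# The parsed value carries its offset from UTC.
offset = local_stamp.utcoffset()

# Converting to UTC subtracts the offset: 19:20 local becomes 18:20 UTC.
utc_stamp = local_stamp.astimezone(timezone.utc)

# Both objects denote the same instant, so they compare equal.
same_instant = local_stamp == utc_stamp
```

Storing the UTC form (or always storing the offset) is what lets timestamps from different sources be merged and compared on a shared platform.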