Big Data Connection presents: Big Data: Cause of Confusion - Bob Samuels
A high-level view of the confusing world of 'Big Data'. The mission of the non-profit American Institute of Big Data Professionals (AIBDP) is to provide structure and standards around terminology, proficiency, methodology, and expectations for Big Data.
Reinventing the Modern Information Pipeline: Paxata and MapR - Lilia Gutnik
(Presented at MapR's Big Data Everywhere event in Redwood City, CA in December 2016)
The relationship between business teams and IT has changed as the complexity of data has increased. A traditional data pipeline, built for an IT-centered approach to information management, cannot meet the data demands of today's business decisions. Designing a big data strategy requires modernizing previous approaches. Self-service data preparation in a collaborative, intuitive, governed, and secure environment is the key to a nimble and decisive business unit.
Open Source Framework for Deploying Data Science Models and Cloud Based Appli... - ETCenter
Next generation applications address more sophisticated questions that go beyond 'What happened?' by using machine learning and statistical modelling to answer 'Why?' and 'What will happen next?' Data insights can be easily deployed and rapidly delivered to decision makers via cloud-based applications. This framework focuses on technologies available for the entire data workflow, from ingestion and modeling to cloud deployment: Hadoop, MADlib, Python, R, CloudFoundry, etc. This presentation also includes examples of how this framework and innovative data science techniques have been applied across diverse business units within Media, including pricing analyses for ad optimization and predicting viewership.
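The shift the abstract describes, from descriptive "what happened?" reporting to predictive "what will happen next?" modelling, can be sketched in a few lines. This is a hypothetical illustration, not code from the presentation: a descriptive aggregate next to a simple least-squares trend forecast over weekly viewership counts (all names and numbers here are made up).

```python
# Hypothetical illustration: descriptive analytics ("what happened?")
# versus a minimal predictive step ("what will happen next?") using
# ordinary least squares on a series of weekly viewership counts.

def describe(views):
    """What happened? A descriptive aggregate over past periods."""
    return {"total": sum(views), "mean": sum(views) / len(views)}

def forecast_next(views):
    """What will happen next? Least-squares trend extrapolation."""
    n = len(views)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(views) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, views))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    return intercept + slope * n  # predicted value for the next period

weekly_views = [100, 110, 120, 130]
print(describe(weekly_views))       # {'total': 460, 'mean': 115.0}
print(forecast_next(weekly_views))  # 140.0
```

A real deployment of the kind the talk covers would train such a model with R, Python, or MADlib and serve it from a cloud application, but the question being answered is the same.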
As anyone who has ever needed to understand the data model which underpins SAP or SAP BW to assist in a project will tell you, it is often a difficult, time-consuming and costly exercise, and accuracy is hard to ensure. In this webinar we explore the problem and discuss how Boeing, RS Components and Hydro Tasmania have used Safyr from Silwood Technology to meet this challenge while reducing the time, cost and risk associated with it.
When the IT department of a large US oil and gas company was tasked with improving the way in which vast amounts of data were analysed, manipulated and disseminated, it investigated a number of tools that would enable users to explore, document and visualise data structures for its large SAP® enterprise application, before deciding to implement Safyr.
A modern, flexible approach to Hadoop implementation incorporating innovations from HP Haven - DataWorks Summit
Jeff Veis
Vice President
HP Software Big Data
Gilles Noisette
Master Solution Architect
HP EMEA Big Data CoE
DesignMind is a technology consulting firm that develops Database, Business Intelligence, and Big Data solutions in San Francisco, Silicon Valley, and throughout the U.S.
This was presented at the SAS Visual Analytics Event on May 15, 2013 in Chennai. The presentation discussed how SAS Visual Analytics can empower your organisation to gain valuable insights from your data in the shortest amount of time.
UCSD: Building a Big Data Culture - It Takes a Village - Paul Barsch
Companies talk about the need to make decisions based on analytics, but there are people, process, technology, and strategy considerations to making it work. This presentation, given at UCSD in May 2017, discusses the journey companies take towards becoming "data-driven", including where they most often get stuck. Also discussed are the various roles required (e.g. data scientist, data engineer, data analyst and more) and the skills needed to succeed now and in the future. This presentation will show you how to stay relevant in an age of disruption by leveraging data to make the best decisions possible.
zData Inc. Big Data Consulting and Services - Overview and Summary - zData Inc.
This slide deck is a summary of zData Inc., a leading Big Data consulting and services provider. zData focuses on commercial and enterprise corporations, employing experts in all areas of the field, from software engineers to data scientists. They work with top hardware and software providers for on-site and off-site consulting, managed services, training, and long-term scalable data solutions.
[Strata NYC 2019] Turning big data into knowledge: Managing metadata and data... - Kaan Onuk
Discover how Uber thinks about building big data knowledge platforms to allow teams to discover, manage, and govern entities. Explore how to build an extensible metadata management platform and infrastructure to democratize data at Uber's scale.
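At its core, the kind of metadata platform the talk describes lets teams register data entities and then discover and govern them. The following is a minimal, hypothetical sketch of that idea; the class, field names, and tags are illustrative inventions, not Uber's actual API.

```python
# A minimal, hypothetical metadata registry: entities are registered with
# an owner, a schema, and governance tags, and can then be discovered by tag.

class MetadataCatalog:
    def __init__(self):
        self._entities = {}

    def register(self, name, owner, schema, tags=()):
        """Register a data entity so teams can discover and govern it."""
        self._entities[name] = {
            "owner": owner,
            "schema": schema,
            "tags": set(tags),
        }

    def search(self, tag):
        """Discover entities carrying a given governance tag."""
        return [n for n, e in self._entities.items() if tag in e["tags"]]

    def owner_of(self, name):
        """Governance: every entity has an accountable owner."""
        return self._entities[name]["owner"]

catalog = MetadataCatalog()
catalog.register("trips", owner="mobility-team",
                 schema={"trip_id": "string", "fare": "double"},
                 tags=["pii-free", "core"])
print(catalog.search("core"))    # ['trips']
print(catalog.owner_of("trips")) # mobility-team
```

A production platform adds lineage, freshness, and access control on top of this registry, but discovery-by-metadata is the common foundation.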
As users gain more experience with Hadoop, they are building on their early success and expanding the size and scope of Hadoop projects. Syncsort’s third annual Hadoop Market Adoption Survey reflects the fact that Hadoop is no longer considered a technology for the future as it was when we first started conducting this research.
Get an in-depth look at the survey results and five trends to watch for in 2017. You’ll also learn:
• The best uses for Hadoop in 2017 – real-world examples of how enterprises are realizing the value of Big Data
• Solutions to help you address the challenges enterprises still face in employing Hadoop
• What the future of Hadoop means for your business
AzureDay - Introduction to Big Data Analytics - Łukasz Grala
AzureDay North 2016, a conference about cloud solutions.
What is analytics? What is Big Data? Why do we have Big Data in the cloud? What does Microsoft offer for Big Data analytics? How do you start with Big Data analytics or advanced analytics? The session introduces the fundamentals of Big Data and advanced analytics.
By Data Scientist as a Service
Metadata discovery for enterprise packages - a better approach - Roland Bullivant
Safyr is a unique solution for helping companies accelerate and improve the quality of information management projects which involve packages from SAP, Oracle and Salesforce. Safyr does this by making their metadata available and understandable in a fraction of the time and cost it takes using traditional methods.
Big Data Day LA 2016 / Hadoop/Spark/Kafka track - Panel - Interactive Applic... - Data Con LA
In this interactive panel discussion, you will hear from these Spark experts as to why they chose to go "all-in" on Spark, leveraging the rich core capabilities that make Spark so exciting, and committing to significant IP that turns Spark into a world-class enterprise data preparation engine.
Raymond and David will explain specific cases where capabilities were built on top of core Spark to provide a true interactive data prep application experience. These innovations include a Domain Specific Language (DSL), an optimizing compiler, a persistent columnar caching layer, application-specific Resilient Distributed Datasets (RDDs), and on-line aggregation operators, which together overcome the core memory, pipelining, and shuffling obstacles to produce a highly interactive application with the user and data-volume scale-out benefits of Spark.
Benchmarking Digital Readiness: Moving at the Speed of the Market - Apigee | Google Cloud
Moving at the new speed of the market: benchmarking your digital readiness with real-world data
Companies are under pressure to move at the speed of digital natives. Benchmark your organization against empirical data and real-world case studies to see where you stand and what you can do to jumpstart your digital readiness.
You can watch the replay for this Geek Sync webcast in the IDERA Resource Center: http://ow.ly/sgto50A5d9J
Movement to the cloud is in full swing. Whether your company is considering Azure, Amazon Web Services, the Google Cloud, or any of the other providers, you will need a cloud strategy. However, with the cloud comes a different way of thinking. The good news is, you have a lot of options. The reality is, migrating to the cloud is not as simple as creating servers in the cloud and charging forward.
Join IDERA and Mike Fal as he discusses the different options for cloud migration, including Platform-as-a-Service, hybrid cloud footprints, disaster recovery, and many other facets of using the cloud. Attendees will get a high-level overview of what it means to move to the cloud and what to consider as they navigate their own cloud migration strategy.
About Mike: Mike Fal (@mike_fal) is a specialist in data management technologies. As a community advocate, public speaker, and blogger, Mike is a practicing thought leader for data and automation. He is passionate about DevOps and data, building platforms to optimize, protect, and use data efficiently. Mike has worked in the database field since 1999, focusing primarily on SQL Server, and specializes in automating data solutions to improve the reliability and efficiency of his environments. He currently works as a SQL Server consultant for UpSearch, LLC and has been caught playing trombone in public on more than one occasion.
Etu, a leading Hadoop product and solution provider in Asia, announced its "Five Trend Predictions for Taiwan's 2014 Big Data Market" at its annual Etu Solution Day (ESD). Etu also unveiled, for the first time, 21 proven Hadoop Big Data applications across 10 industries in Taiwan and mainland China, including operational analysis and customer-service queries in telecom, precision recommendation in e-commerce, content recommendation in digital media, user behaviour analysis in retail, data-warehouse workload offloading and process yield analysis in high-tech manufacturing, public sentiment analysis for government and real estate, energy management for the power sector, and management of huge volumes of small image files in insurance. Taiwan's Big Data market is expected to mature further in 2014; after the validation stage, the number of enterprises reaching final adoption is also expected to grow severalfold.
Etu head 蔣居裕 (Fred Chiang) said: "UDN's adoption shows that demand from Taiwanese enterprises for Big Data applications is clearly rising in specific industries, and the five trend predictions for Taiwan's 2014 Big Data market echo this view." He continued: "First, those who crossed the river earliest will begin to challenge the ocean of data value; the earlier they invested, the deeper they go, and the deeper they go, the broader. Second, Total Data BI is driving enterprises to adopt multi-structured data warehouses, with customer behaviour analysis, precision marketing, and customer experience as the application goals. Third, from integrating old and new systems to end-to-end solutions, most enterprises expect vendors to deliver complete Big Data applications along with professional technical consulting; 'ease' is the keyword for Big Data products entering the enterprise. Fourth, data exploration tools are on the rise, helping business users mine the value of Big Data even better than IT staff; 'discovery' is the essence of Big Data analysis: discovering correlations, discovering intent, and discovering what is missing. Fifth, Big Data training courses are rapidly expanding from processing technology to data analysis, all under the umbrella of 'data science'; true data scientists are one in ten thousand, so data science teams built on a professional division of labour are where the real hope of realizing data value lies."
ESD 2013 also showcased the Etu Ecosystem built around the Etu Appliance, demonstrating end-to-end solutions developed by Etu and its ISV partners. Etu Recommender, in addition to its original personalized precision recommendation, can now integrate with third-party tools for visual data exploration and for building user-behaviour analysis data warehouses. Partner solutions, such as 堂朝數位整合's cloud e-publication value-added platform, PilotTV's audience measurement system, 樺鼎商業資訊's visual analytics tool, and 衛信科技's complete SDN network management solution, use the Etu Appliance for massive, scalable file-format conversion, real-time processing and analysis of facial-recognition data, multi-structured data warehousing, and network packet pre-processing, respectively. What these solutions have in common is that they are all applications developed on, or integrated with, the repeatedly award-winning Etu Appliance.
Summary of Insights Learned from the Data Science Program Team Training - Fred Chiang
Who really has the skills and talents to leverage the most value out of data? The Data Science Program (DSP) was co-founded by Code for Tomorrow and Etu. We believe that building and deploying a data science team whose members bring, and can apply, different skill sets from a variety of industries is more practical and realistic than hoping to find an individual data scientist who is an expert in a wide variety of technical fields ranging from math, statistics, and visualization to business, communication, and more. The Data Science Program has identified four pertinent categories for its members: Campaigner, Data Analyst, Data Hygienist, and Designer. Each team has all four categories filled. During the training, every team learns how to do data processing, data analysis, and visualization together, with the sole purpose of using these skills to solve a common problem. After four weeks of intensive study, each team comes up with an enterprise-grade team project demonstrating the innovation of data-driven businesses.
After two rounds of DSP Team Training, DSP has accumulated 10 team projects and has graduated more than 60 alumni who are passionate about data science. During this journey of developing and deploying teams trained in data science, the most valuable aspects we walked away with was the witnessing of members growing in confidence from the learning and experience, the building of team work, and the overall growth of each individual. At the end of the day, our hope of as members of DSP, including myself is to instill and motivate more people to devote themselves to the exploration of data science. Now think about how you can do the same.
Opening Keynote for HadoopCon 2014
We are surrounded, both in daily life and online, by a great many Big Data narratives and technologies. The Hadoopers gathered here today are already stakeholders in Big Data. Yet most of what we understand about Big Data comes from our own experience, and the Hadoop ecosystem is vast and complex: different use cases require different open-source projects. Seen this way, which of us has ever taken in the entire Big Data landscape?
This talk shows how, through more windows onto the scenery, we can see the different worlds of Big Data more fully and more clearly.
There are patterns for things such as domain-driven design, enterprise architectures, continuous delivery, microservices, and many others.
But where are the data science and data engineering patterns?
Sometimes, data engineering reminds me of cowboy coding - many workarounds, immature technologies and lack of market best practices.
The Value of the Modern Data Architecture with Apache Hadoop and Teradata - Hortonworks
This webinar discusses why Apache Hadoop is most typically the technology underpinning "Big Data", how it fits into a modern data architecture, and the current landscape of databases and data warehouses that are already in use.
A modern data platform meets the needs of each type of data in your business - Marcos Quezada
For a little over 20 years our customers have confidently built the databases of their business-critical applications on robust commercial databases such as Oracle and DB2 on Power Systems. As the digital transformation of their companies evolves, driven by the migration to mobile and web platforms, they face the need to extract more value from their most precious asset: their data.
Many companies now need to begin exploring and exploiting other types and volumes of data; for them, Cognitive Systems presents solutions for a modern data platform based on key-value, document, graph, open-source, and parallel databases such as Hadoop.
Creating a Next-Generation Big Data Architecture - Perficient, Inc.
If you’ve spent time investigating Big Data, you quickly realize that the issues surrounding it are often complex to analyze and solve. The sheer volume, velocity, and variety change the way we think about data – including how enterprises approach data architecture.
Significant reduction in costs for processing, managing, and storing data, combined with the need for business agility and analytics, requires CIOs and enterprise architects to rethink their enterprise data architecture and develop a next-generation approach to solve the complexities of Big Data.
Creating the data architecture while integrating Big Data into the heart of the enterprise data architecture is a challenge. This webinar covered:
-Why Big Data capabilities must be strategically integrated into an enterprise’s data architecture
-How a next-generation architecture can be conceptualized
-The key components to a robust next generation architecture
-How to incrementally transition to a next generation data architecture
Extract business value by analyzing large volumes of multi-structured data from various sources such as databases, websites, blogs, social media, smart sensors...
Architecting Agile Data Applications for Scale - Databricks
Data analytics and reporting platforms have historically been rigid, monolithic, hard to change, and limited in their ability to scale up or down. I can’t tell you how many times I have heard a business user ask for something as simple as an additional column in a report, only for IT to say it will take six months because that column doesn’t exist in the data warehouse. As a former DBA, I can tell you the countless hours I have spent “tuning” SQL queries to hit pre-established SLAs. This talk covers how to architect modern data and analytics platforms in the cloud to support agility and scalability, including end-to-end data pipeline flow, data mesh and data catalogs, live data and streaming, performing advanced analytics, applying agile software development practices like CI/CD and testability to data applications, and finally taking advantage of the cloud for infinite scalability both up and down.
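One concrete piece of the agility argument above is testability: if each pipeline stage is a pure function, CI/CD can verify it on every commit without a cluster or production data. This is a hedged sketch of that practice; the stage, column names, and sample data are invented for illustration.

```python
# A pipeline stage written as a pure function over plain records, so a
# unit test can exercise it in CI long before the job touches real data.

def add_revenue_column(rows):
    """One pipeline stage: derive revenue = units * price for each record."""
    return [{**r, "revenue": r["units"] * r["price"]} for r in rows]

# The kind of check a CI pipeline runs on every commit.
sample = [{"units": 3, "price": 2.5}, {"units": 0, "price": 9.99}]
result = add_revenue_column(sample)
assert result[0]["revenue"] == 7.5
assert result[1]["revenue"] == 0
print("stage test passed")
```

Adding the proverbial "one more column" then becomes a small, tested change to one function rather than a six-month data warehouse project.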
SAP Data Hub and SUSE Container as a Service Platform - SUSE Italy
SAP Data Hub is a solution for the integration, orchestration, and governance of data of any type, variety, and volume; it uses Kubernetes as its platform and is certified on SUSE CaaS Platform.
In this session SAP and SUSE present an overview of the main features and benefits of integrating the two solutions. (Nicola Bertini, SAP Italia and SUSE)
We recently presented our technology solution for metadata discovery to the Boulder Business Intelligence Brains Trust in Colorado. (www.bbbt.us)
The whole session was also recorded, and there is a link to the recording at the end of the presentation.
A seasoned data professional with more than 25 years of experience building and managing practices and global delivery in Big Data analytics, Big Data migration from on-premise to GCP and Azure, EDW & BI, business analytics, SAP HANA, predictive analytics, data QA, automation of solutions, Big Data frameworks and methodologies, and data product development.
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016 - StampedeCon
This session will detail best practices for architecting, building, operating and managing an Analytics Data Lake platform. Key topics will include:
1) Defining next-generation Data Lake architectures. The de facto standard has been commodity DAS servers with HDFS, but there are now multiple solutions aimed at separating compute and storage, virtualizing or containerizing Hadoop applications, and utilizing Hadoop-compatible or embedded HDFS filesystems. This portion will explore the options available, and the pros and cons of each.
2) Data Ingest. There are many ways to load data into a Data Lake, including standardized Apache tools (Sqoop, Flume, Kafka, Storm, Spark, NiFi), standard file and object protocols (SFTP, NFS, REST, WebHDFS), and proprietary tools (e.g., Zaloni Bedrock, DataTorrent). This section will explore these options in the context of best fit to workflows; it will also look at key gaps and challenges, particularly in the areas of data formats and integration with metadata/cataloging tools.
3) Metadata & Cataloguing. One of the biggest inhibitors of successful Data Lake deployments is Data Governance, particularly in the areas of indexing, cataloguing and metadata management. It is nearly impossible to run analytics on top of a Data Lake and get meaningful & timely results without solving these problems. This portion will explore both emerging open standards (Apache Atlas, HCatalog) and proprietary tools (Cloudera Navigator, Zaloni Bedrock/Mica, Informatica Metadata Manager), and balance the pros, cons and gaps of each.
4) Security & Access Controls. Solving these challenges are key for adoption in regulatory driven industries like Healthcare & Financial Services. There are multiple Apache projects and proprietary tools to address this, but the challenge is making security and access controls consistent across the entire application and infrastructure stack, and over the data lifecycle, and being able to audit this in the face of legal challenges. This portion will explore available options and best practices.
5) Provisioning & Workflow Management. The real promise of the Data Lake is integrating analytics workflows and tools on converged infrastructure, with shared data, and building "as a service" architectures oriented towards self-service data exploration and analytics for end users. This is an emerging and immature area, but this session will explore some potential concepts, tools and options to achieve this.
This will be a moderately technical session, with the above topics being illustrated by real world examples. Attendees should have basic familiarity with Hadoop and the associated Apache projects.
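The session's metadata and cataloguing point can be made concrete with a minimal example of what a catalog entry has to capture for lake data to remain findable: location, format, schema, and lineage. This is an illustrative sketch only; the dataset path, field names, and tool names are hypothetical, not taken from Atlas, HCatalog, or any vendor product.

```python
# Illustrative only: a minimal catalog record for one dataset in a lake.
# Without at least this much metadata, data dropped into HDFS or object
# storage quickly becomes undiscoverable.

import json

entry = {
    "dataset": "raw/clickstream/2016-07-01",   # hypothetical lake path
    "format": "parquet",
    "schema": [
        {"name": "user_id", "type": "string"},
        {"name": "ts", "type": "timestamp"},
    ],
    "source": "kafka:clickstream",  # lineage: where the data was ingested from
    "ingested_by": "nifi",          # which ingest tool wrote it
    "pii": True,                    # governance flag for access controls
}

# Serialize the record as the catalog service would store or exchange it.
print(json.dumps(entry, indent=2))
```

Tools like Apache Atlas or Cloudera Navigator manage records of roughly this shape at scale, adding search, audit, and lifecycle tracking on top.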
Accelerate Your Big Data Analytics Efforts with SAS and Hadoop - DataWorks Summit
Analytics and machine learning continue to be the top use cases for deploying big data platforms such as Hadoop. SAS recognised the potential and power of Hadoop platform early on and has been integrating analytical solutions with Hadoop to leverage the power and flexibility of Hadoop for analytical workloads. The combination of SAS and Hadoop offers developers and organisations an approach that can accelerate the development and deployment of big data analytics applications that are mature, proven and scalable. Furthermore, by giving developers and analysts analytical applications that are rich, proven and collaborative, SAS allows more users across different skill levels to unleash the value of data stored in big data platform more easily and quickly.
In this session, we will cover common big data analytics use cases, the depth and breadth of SAS analytical capabilities on Hadoop, and how SAS solutions are integrated into the Hadoop ecosystem via technologies such as Hive, YARN and Spark.
Speaker
Felix Liao, SAS Institute Australia & New Zealand
Demystify big data data science
An overview of the shift to Data Science Platforms
The 3 critical components of a Data Science platform
Industries that are most likely to get disrupted and shift to Data Science
Characteristics of firms that get left behind the Data Science wave
Factors that push an industry towards Data Science
A brief overview of aspects of platform architecture beyond technology
The Common BI/Big Data Challenges and Solutions presented by seasoned experts, Andriy Zabavskyy (BI Architect) and Serhiy Haziyev (Director of Software Architecture).
This was a complimentary workshop where attendees had the opportunity to learn, network and share knowledge during the lunch and education session.
Epistemic Interaction - tuning interfaces to provide information for AI support - Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
UiPath Test Automation using UiPath Test Suite series, part 4 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... – Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. However, fostering a culture of innovation takes much work: it takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... – DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses.
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
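To give a feel for what a power-flow engine like PowSyBl computes, here is a from-scratch toy DC power flow on a three-bus network in plain Python. This is a sketch of the underlying method only, not the PowSyBl/pypowsybl API, and the network data (susceptances, injections) is invented for illustration:

```python
# Toy DC power flow on a 3-bus network: solve B * theta = p for bus angles,
# then derive line flows from angle differences. Bus 0 is the slack bus.

# Line data: (from_bus, to_bus, susceptance in per-unit) -- invented values.
lines = [(0, 1, 10.0), (0, 2, 8.0), (1, 2, 5.0)]
# Net injections at the non-slack buses (per-unit): bus 1 generates, bus 2 consumes.
p = {1: 0.9, 2: -1.5}

# Build the reduced susceptance matrix B for buses 1 and 2.
B = [[0.0, 0.0], [0.0, 0.0]]
for f, t, b in lines:
    for bus in (f, t):
        if bus != 0:
            B[bus - 1][bus - 1] += b
    if f != 0 and t != 0:
        B[f - 1][t - 1] -= b
        B[t - 1][f - 1] -= b

# Solve the 2x2 system B * theta = p with Cramer's rule.
det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
theta = {0: 0.0,
         1: (p[1] * B[1][1] - p[2] * B[0][1]) / det,
         2: (B[0][0] * p[2] - B[1][0] * p[1]) / det}

# Line flows follow from angle differences: P_ft = b * (theta_f - theta_t).
flows = {(f, t): b * (theta[f] - theta[t]) for f, t, b in lines}
print(flows)
```

Real engines solve the full non-linear AC equations iteratively over networks with thousands of buses; the linear DC approximation above is just the simplest member of that family.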
UiPath Test Automation using UiPath Test Suite series, part 3 – DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Speakers:
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Essentials of Automations: Optimizing FME Workflows with Parameters – Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... – UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Accelerate your Kubernetes clusters with Varnish Caching – Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
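For a flavour of what such a deployment configures, a minimal VCL snippet for caching an in-cluster service might look like the following. The backend hostname and TTL are invented for illustration; in a Helm-based deployment these values would normally come from the chart's configuration rather than hand-written VCL:

```vcl
vcl 4.1;

# Hypothetical in-cluster backend, addressed via its Kubernetes Service DNS name.
backend default {
    .host = "my-app.default.svc.cluster.local";
    .port = "8080";
}

sub vcl_backend_response {
    # Cache successful responses for two minutes unless the origin overrides it.
    if (beresp.status == 200) {
        set beresp.ttl = 120s;
    }
}
```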
Transcript: Selling digital books in 2024: Insights from industry leaders - T... – BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024 – Tobias Schneck
As AI technology pushes into IT, I found myself wondering, as an "infrastructure container Kubernetes guy", how this fancy AI technology gets managed from an infrastructure operations point of view. Is it possible to apply our lovely cloud-native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and provide you with a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply them to our own infrastructure and make them work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I already have working for real.
DevOps and Testing slides at DASA Connect – Kari Kakkonen
Slides by me and Rik Marselis at the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We also ran a lovely workshop in which the participants tried to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
JMeter webinar - integration with InfluxDB and Grafana – RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring of JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
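For reference, the JMeter-to-InfluxDB integration shown in the webinar is typically done with JMeter's built-in InfluxDB Backend Listener, configured as name/value parameters in the test plan. The parameter names below are the listener's standard ones; the host, database, and application values are illustrative placeholders:

```properties
# Backend Listener implementation:
#   org.apache.jmeter.visualizers.backend.influxdb.InfluxdbBackendListenerClient
influxdbMetricsSender = org.apache.jmeter.visualizers.backend.influxdb.HttpMetricsSender
influxdbUrl           = http://localhost:8086/write?db=jmeter   # illustrative host/db
application           = my-app     # illustrative tag, used for filtering in Grafana
measurement           = jmeter
summaryOnly           = false
samplersRegex         = .*
percentiles           = 90;95;99
```

Grafana then reads the `jmeter` measurement from InfluxDB as a data source for its dashboards.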
Big data connection overview by aibdp.org
1. Welcome
• Thank you: Francis - Silicon Valley Strategy, Innovation and Product Management group
• Thank you: Michael & Sam and the Microsoft Store
• Thank you: Aleks & David & SAP HANA
• Thank you: All of You… You are the 'Secret Sauce'
2. Agenda
• Quick Poll
• Overview – AIBDP / Big Data Connection
• Prasad Mavuduri – Board Member, AIBDP – "Demystifying Big Data"
• David Sonnenschein – Vice President & Aleks Swerdlow – Community Manager – SAP Labs – "HANA In-Memory – Start-ups Success Stories"
• Networking & Q&A
3. Quick Poll
• Relationship & Experience w/ Big Data
• Job Role
• Industry
• Company Years - Start-up?
• Big Data Implementation Status
• Biggest Challenges / Opportunities
• Vs Competitors?
4. Overview - Big Data Connections
Mission: Demystify Big Data
– Five E's – entertain, engage, educate, etc.
– Focus on Solutions (vs technology)
– Focus on Specific Verticals (ex Healthcare, Risk, eCom/eMarketing, Manufacturing, Logistics, Telecom…)
– Best Practices Case Study Reviews
– Networking & Shared Learning
– Sponsored by the American Institute of Big Data Professionals (AIBDP.org)
– Sponsored by Big Data consulting firm, Data-Magnum
5. THE CONFUSING WORLD OF BIG DATA
[Diagram: a landscape map of Applications, Tools, and Data Management layers across Structured and Unstructured data. Applications/tools: BI platform/reporting, OSS visualizations, unstructured search/indexing/metadata, NLP, text/sentiment analysis, Hadoop analytics, Hadoop dev platforms/automation, predictive analytics, vertical market applications, messaging, optimization, data integration/CEP, Impala, Drill, EMR. Data management: transactional DBs, high-performance analytical DBs, NewSQL, distributed NoSQL (graph, document, key-value/column), data warehouses, IMDG, Hadoop/HDFS and HDFS alternatives, Hadoop-as-a-Service, DBaaS, Data as a Service, HANA, GraphDB, vFabric, Redshift, filesystems. Data sources: enterprise apps, internet apps, social media, web content, mobile devices, camera/DVR, sensors/RFID, logfiles.]
Based on Source: Perella Weinberg Partners
17. It can be made more complicated…
o Hadoop
o NoSQL
o NewSQL
o Structured Databases
o NGDW (next generation data warehouse)
o Cloud Services
o Technical Services
o Professional Services
o Distributors
o Deployment services
o Deployment stack/appliances
o Development services
o Application stacks
o Database stacks
o Managed Monitoring
o Storage
o Security
Source: Sqrrl: To simplify the NoSQL world, let's take a look at the top 3 databases in terms of current popularity and how they compare to Apache Accumulo, which is at the core of our product, Sqrrl Enterprise.

MongoDB: It is a wonderfully easy-to-use document store that many select as a flexible replacement for a SQL database, as it (like all NoSQL databases) does not require pre-defined schemas. However, MongoDB has difficulty scaling to very large datasets (e.g., 100+ TB) and does not natively work with your Hadoop cluster. It also does not possess fine-grained security controls.

Cassandra: This is an excellent choice if your data is too big for MongoDB and you require multi-datacenter replication. Although Cassandra was not originally designed to run natively on your Hadoop cluster, it now has integrations with MapReduce, Pig, and Hive. It does not possess fine-grained security controls.

HBase: HBase natively integrates with Hadoop, and it can handle very large datasets. However, it does not have fine-grained security controls.

Accumulo: Accumulo has an architecture most similar to HBase, which allows it also to natively plug into your Hadoop cluster. It is far more scalable than MongoDB, and with reported cluster sizes in the multiple thousands within the Intelligence Community, it is also significantly more scalable than HBase and Cassandra. Accumulo is the only NoSQL database with cell-level security capabilities. Accumulo also has other features that could lead one to choose it over HBase or Cassandra for reasons other than security or scalability. For example, Accumulo has a powerful server-side programming mechanism called Iterators, which provides it with the capability to do a variety of real-time aggregations and analytics.

These high-level differences between MongoDB, Cassandra, HBase, and Accumulo are summarized in the decision tree diagram below. Of course, there are a wide variety of more detailed technical differences that will be explored in greater detail in a later post. This decision tree can be summarized with a few simple statements:
- If you need a quick, simple solution and have "small" Big Data (e.g., a few dozen terabytes), MongoDB may be the answer.
- If you need cell-level security or multi-petabyte scalability, Accumulo is the right answer.
- If you have data that is too big for MongoDB and don't need cell-level security or massive scalability, we would recommend testing HBase, Cassandra, and Accumulo for your specific workloads. Each has its own nuanced advantages and disadvantages.
- If you don't need real-time analytics, you are probably on the wrong decision tree and can stick with the Hadoop Distributed File System and batch analytics.

It is worth noting that the NoSQL databases above are all open source databases. Sqrrl Enterprise builds upon Accumulo and adds a number of additional features, including streaming ingest, JSON, encryption, identity management integrations, full-text search, SQL queries, graph search, and statistics. We believe that these features set Sqrrl Enterprise apart from other Big Data platforms.
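The decision tree described in these notes can be sketched as a small Python function. The terabyte thresholds and return labels paraphrase the text above and are illustrative, not exact figures from the source:

```python
def pick_nosql_store(data_tb, needs_cell_security, needs_realtime):
    """Return a shortlist of candidate stores for the stated requirements."""
    if not needs_realtime:
        # No real-time analytics needed: plain HDFS plus batch analytics suffices.
        return ["HDFS + batch analytics"]
    if needs_cell_security or data_tb >= 1000:
        # Cell-level security or multi-petabyte scale point to Accumulo.
        return ["Accumulo"]
    if data_tb < 100:
        # "Small" Big Data (a few dozen TB): a simple document store may do.
        return ["MongoDB"]
    # Too big for MongoDB, no special security/scale needs: benchmark these.
    return ["HBase", "Cassandra", "Accumulo"]

print(pick_nosql_store(30, False, True))    # ['MongoDB']
print(pick_nosql_store(500, False, True))   # ['HBase', 'Cassandra', 'Accumulo']
```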
http://www.capgemini.com/blog/capping-it-off/2012/09/big-data-vendors-and-technologies-the-list

Big Data Vendors and Technologies:
- Data Acquisition stream – technological providers: Ab Initio, HP, IBM (DataStage, Streams, Data Mirror), Informatica (PowerCenter, PowerExchange, CEP), Kalido, Microsoft, Numenta, Oracle, SAP, SAS, Splunk, Syncsort, Talend, Tibco.
- Data Providers: ComScore, Datasift, Experian, Factual, GfK, Gnip, IMS, Inrix, Kaggle, Knoema, LexisNexis, Microsoft (with their Windows Azure Marketplace data market), Nielsen, Reuters, Salesforce Radian6, Symphony IRI, social network websites like Facebook, Google+, LinkedIn, Tumblr, Twitter or Viadeo, and all the Open Data providers, like governments, regions, etc.
- Marshalling domain – Very Large Data Warehousing and BI Appliances: Actian, ParAccel, EMC² (Greenplum), HP (Vertica), IBM (Netezza), Kognitio, Microsoft (SQL 2012 and PDW), Oracle (Exadata), SAP (HANA and Sybase IQ), SAS, Teradata.
- NoSQL Domain – main technologies and vendors: Amazon (as cloud provider or with their own NoSQL solution), Cassandra, Cloudera (CDH, Hadoop distribution), CouchDB, EMC², Google, Hadoop (of course), Hortonworks (Hadoop distribution), HP, IBM, KX, MapR (Hadoop distribution), MarkLogic, Microsoft (Hadoop on Windows and Azure), MongoDB, Neo4J, Oracle, Palantir, Snaplogic, Sparsity, Splunk, Teradata (Aster Data), ZL Technologies.
- Content Management Space: Adobe, Alfresco, EMC² (Documentum), IBM (FileNet), HP (Autonomy), Microsoft, OpenText, Oracle.
- Analytics phase – predictive technologies (such as data mining) and vendors: Adobe, EMC², GoodData, Hadoop MapReduce, HP, IBM (SPSS), Karmasphere, Kxen, Microsoft, Mzinga, Oracle, R, Salesforce, SAS, SAP (R on HANA) and Teradata (Aprimo).
- Data Virtualization (and data federation): currently led by Composite, Denodo, HP (IDOL), IBM, Informatica, Microsoft, Oracle (Exalytics), SAP and Teiid (JBoss community).
- BI Tools Vendors: Actuate, Dassault Systèmes (Exalead), Domo, Esri, GoodData, Google, HP (Autonomy), IBM (Cognos suite), Information Builders, LogiXML (LogiAnalytics), Microsoft (SQL 2012), Microstrategy, NeutrinoBI, Oracle (OBI Foundation), Panopticon, Panorama, Pentaho, Qlikview, Roambi, SAP (BI4 suite), SAS, SpagoBI, Tableau, TIBCO Spotfire.
- Action Phase – the Data Acquisition providers plus the ERP, CRM and BPM actors: Adobe, Eloqua, EMC², IBM, iGrafx, Microsoft, OpenText, Oracle, Pega, Progress Software, SAP, Salesforce, Software AG, Teradata (Aprimo), Tibco.
- Data Governance area – Master Data Management (MDM), metadata and data quality tools: Adaptive, HP, IBM, Informatica, Kalido, Microsoft, Oracle, Orchestra Networks, SAP, SAS, Talend, Tibco.

Note that the Complex Event Processing (CEP) tools are part of the Acquisition (streaming data acquisition), Marshalling (e.g. in-memory storage as data is used or compared immediately) and Analytics (e.g. monitoring functions to detect abnormal activity) streams. Note that the BI tools are part of the Analytics (computing Key Performance Indicators) and Action (e.g. creating alerts in a push mode, by mail for instance) streams.
Citrusleaf = Aerospike. Couchbase – roots are in NorthScale – Membase … CouchDB; two focus audiences – Enterprise & funnel.
Analytics Infrastructure = MPP – distributed, open-source, Apache-licensed distribution of Apache Hadoop … open source, Massively Parallel Processing (MPP) query engine.
Infrastructure as a Service = Cloud IaaS.
Operational Infrastructure = structure of data – e.g. JSON; ad-hoc queries; unstructured data; behavioral, redundancy.
Not listed – Hardware / Storage – NetApp, EMC, HP.
Per Forbes (per Wikibon), Big Data is an $18 billion industry heading to $50 billion in five years. The companies in the inner circle (e.g. MapR, Cloudera, Splunk, Couchbase, etc.) are pure plays within Big Data. A theory is that these inner-circle players will probably get gobbled up by the big boys on the outside, who are just starting to play in the Big Data space (like SAP, Microsoft, Oracle, IBM…). In the meantime, the relative sizes of the circles reflect the relative sizes of the companies in terms of revenue. The percentages reflect the share of their current business that is 'big data'.
5/18/13 w/ Paul Hofmann:
- Palantir – just text; just Homeland Security
- Oracle Endeca – added
- HP Autonomy – added
- Attivio (partner with TIBCO) – added
- Saffron – Semantec and … (risk predictive) – added
- 0xData – changed logo
- Mu Sigma – consultant only
- Recorded Future – timeline; Opera – text-only? No predictive analytics?
- Kxen – nice company
- SAS – dead? Not scalable; Skytree – a platform / toolbox… you need to have your own data quant to create your own analytics
- Sociocast – Saffron partner
- Digital Reasoning – strong with Dept of Defense too
NoSQL databases currently available include:
- HBase (Apache)
- Cassandra (DataStax)
- MarkLogic (MarkLogic)
- Aerospike (Citrusleaf)
- MongoDB (10gen)
- Accumulo (Apache)
- Riak (Basho)
- CouchDB (CouchBase)
- DynamoDB (Amazon)
- Sqrrl (?)
- VoltDB (?)

http://thinkbiganalytics.com/leading_big_data_technologies/nosql/

NoSQL
NoSQL is an umbrella term for a broad class of database management systems that relax some of the traditional design constraints of relational database management systems (RDBMS) in order to meet goals of more cost-effective scalability, flexible tradeoffs of availability vs. consistency (as described by the CAP theorem), and flexibility for data structures that don't fit well into the relational model, such as key-value data and large graphs. NoSQL databases typically don't offer ACID transactions nor full SQL dialects.

The NoSQL ecosystem is very large. Among the better-known databases are HBase, Cassandra, Aerospike, DynamoDB, MongoDB, Riak, Redis, Accumulo, Datomic, and Couchbase. Of these, HBase and Accumulo are more closely tied to Hadoop than the others, as both use HDFS, by default, for persistent storage and Zookeeper for service federation.

NoSQL databases expose different information models, including key-value records, JSON or XML documents as records, or graph-oriented data. They expose corresponding programmer APIs and sometimes custom query languages that may or may not be SQL-based. However, a recent trend in this industry is the re-introduction of restricted SQL dialects to support the large user community accustomed to SQL, and improving support for transactions.

As an example of a scenario where a NoSQL database is a good fit, an event log for a web site might be captured in a key-value store, where fast appends and key-based retrievals are required, but not updates nor joins.

HBase
HBase is a distributed, column-oriented database, where each cell is versioned (a configurable number of previous values is retained). HBase provides Bigtable-like capabilities on top of Hadoop. SQL queries (but not updates) are supported using Hive, but with high latency. Eventually, Impala will also support Hive queries with lower latency. Like many NoSQL databases, HBase does not support complex transactions, SQL, or ACID transactions. However, HBase offers high read and write performance and is used in several large applications, such as Facebook's Messaging Platform. By default, HBase uses HDFS for durable storage, but it layers on top of this storage fast record-level queries and updates, which "raw" HDFS doesn't support. Hence, HBase is useful when fast, record-level queries and updates are required, but storage in HDFS is desired for use with Pig, Hive, or other MapReduce-based tools.

Cassandra
Cassandra is the most popular NoSQL database for very large data sets. It is a key-value, clustered database that uses column-oriented storage, sharding by key ranges, and redundant storage for scalability in both data sizes and read/write performance, as well as resiliency against "hot" nodes and node failures. Cassandra has configurable consistency vs. availability (CAP theorem) tradeoffs, such as a tunable quorum model for writes.

MongoDB
MongoDB is a document-oriented NoSQL database where each record is a JSON document. It has a rich, Javascript-based query language that exploits the implicit structure of JSON. MongoDB supports sharding for improved scalability and resilience. It is most popular for small to large data sets and less commonly used for very large data sets.

DynamoDB
DynamoDB is Amazon's highly scalable and available, key-value, NoSQL database. DynamoDB was one of the earliest NoSQL databases, and papers written about it influenced the design of many other NoSQL databases, such as Cassandra.

Couchbase
Couchbase is a key-value NoSQL database that is well suited for mobile applications where a copy of a data set is resident on many devices, where changes can be performed on any copy, and copies are synchronized when connectivity is available. Think of how an email client works with local copies of your email history and corresponding email servers.

Redis
Redis is a key-value store with specific support for fundamental data structures as values, including strings, hash maps, lists, sets, and sorted sets, whereas most key-value stores have limited understanding of a value's meaning, except to represent the value as column cells in many cases. For this reason, Redis is sometimes called a data structure server. Redis keeps all data in memory, which improves performance but limits the data set sizes it can manage. Durability is optional, by periodic flushing to disk or writing updates to an append log. Master-slave replication is also supported.

Datomic
Datomic is a newer entrant in the NoSQL landscape with a unique data model that remembers the state of the database at all points in the past, making historical reconstruction of events and state trivial. Many standard database operations are supported, including joins and ACID transactions. Deployments are distributed, elastic, and highly available.

Riak
Riak is a fault-tolerant, distributed, key-value NoSQL database designed for large-scale deployments in cloud or hosted environments. A Riak database is masterless, with no single points of failure. It is resilient against the failure of multiple nodes, and nodes can be added or removed easily. Riak is also optimized for read- and write-intensive applications.
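The "each cell is versioned" behaviour described for HBase can be illustrated with a tiny pure-Python sketch. This is a conceptual model of versioned cells only, not an HBase client API; the class name and column notation are invented for illustration:

```python
from collections import defaultdict

class VersionedTable:
    """Toy HBase-style store: each (row, column) cell keeps the N newest values."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self.cells = defaultdict(list)  # (row, column) -> [newest, ..., oldest]

    def put(self, row, column, value):
        versions = self.cells[(row, column)]
        versions.insert(0, value)
        del versions[self.max_versions:]  # drop versions beyond the retention limit

    def get(self, row, column, version=0):
        """version=0 is the latest value, 1 the previous one, and so on."""
        return self.cells[(row, column)][version]

t = VersionedTable(max_versions=2)
for v in ("a", "b", "c"):
    t.put("row1", "cf:col", v)
print(t.get("row1", "cf:col"))      # latest value: 'c'
print(t.get("row1", "cf:col", 1))   # previous value: 'b' ('a' was dropped)
```

Real HBase additionally timestamps every version and distributes cells across region servers backed by HDFS; the retention-and-lookup behaviour is what this sketch demonstrates.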