Big Data 2.0: ETL & Analytics
Implementing a next generation platform
Tyler Mitchell, Paul Dingman
Innovation Lab
January 2014
ACTIAN – PLATFORM FOR NEXT GENERATION ANALYTICS

[Diagram: Actian Analytics Platform]
Sources: Enterprise Applications, Data Warehouse, Social, WWW, Machine Data, Mobile, Internet of Things, Traditional, NoSQL, SaaS
Platform: Connect → Analyze → Act, with the DataFlow, Matrix and Vector accelerators
Outcomes: Customer Delight, Competitive Advantage, World-Class Risk Management, Disruptive New Business Models

• Rapid Time to Value
• Unlimited Scale
• Extreme Performance
• Disruptive price/performance

• Modern GUI development
• In-memory analytics
• Extends Hadoop and NoSQL analytics
• Complements traditional systems

• 200+ data connectors
• 600+ analytic functions
• Full deployment choice
• Certification with a broad set of analytics tools
Actian Matrix for High Performance Analytics at Any Scale
Serve up high-performance analytic processing for any app
• On-Demand Analytics: 700+ in-database analytic functions
• On-Demand Integration: connect to any data source at the point of the query
• Orchestration: manage dataflows across the entire analytic process
• Leader node: optimizer, analytic libraries
• Massively parallel, columnar, compressed, compiled, SQL, connected
• 5 levels of optimization: planning, execution, communications, memory
• Node-to-node, bidirectional sharing of analytics & processes with Hadoop nodes
[Diagram: Matrix leader node connected to a cluster of Hadoop (H) nodes]

Confidential © 2013 Actian Corporation
Actian DataFlow – High Speed Hadoop ETL, DQ, and Analytics, No Programming
• Choose from five sets of operators in the transformation & analytics libraries: Connections, Transformation, Data Quality, Analytics, Data Science
• Visual framework: manage the entire analytic process with no coding required, and reuse and share all components, from operators to workflows
• Automatically detect resources, plan optimal utilization, and parallelize all workloads on Hadoop
• Use dual pipeline parallelism to accelerate performance 10X
• Optimize with query pipelining and CPU pipelining
• Take processing to where the data lives: runs natively on any Hadoop distribution
• Actian Accelerator for Hadoop: run fully optimized processing directly on the Hadoop node or on any file system (optimized, on-HDFS processing)

ACTIAN DATAFLOW – ETL & ANALYTICS
• Predefined operators
• Reduced IO
• In-memory operations
• Pipeline parallelism
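Pipeline parallelism lets each operator work on a different record at the same moment, handing results downstream as soon as they are ready instead of materializing intermediate data. The sketch below is a minimal Python illustration of that general technique, assuming one thread per operator connected by bounded queues; `run_pipeline` and `stage` are hypothetical names, not DataFlow's API.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List, Optional

_DONE = object()  # sentinel that marks the end of the stream

def stage(func: Callable[[Any], Optional[Any]],
          inbox: queue.Queue, outbox: queue.Queue) -> None:
    """One operator: transform records as they arrive; returning None drops a record."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate end-of-stream downstream
            return
        result = func(item)
        if result is not None:
            outbox.put(result)

def run_pipeline(records: Iterable[Any],
                 funcs: List[Callable[[Any], Optional[Any]]],
                 capacity: int = 100) -> List[Any]:
    """Run each function as a concurrent stage linked by bounded queues (hypothetical helper)."""
    queues = [queue.Queue(maxsize=capacity) for _ in range(len(funcs) + 1)]

    def produce() -> None:
        for record in records:
            queues[0].put(record)  # blocks when the first stage falls behind
        queues[0].put(_DONE)

    threads = [threading.Thread(target=produce)]
    threads += [threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
                for i, f in enumerate(funcs)]
    for t in threads:
        t.start()

    # Collect from the last queue while the stages keep running.
    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results
```

The bounded queues provide back-pressure, so a slow stage throttles the ones upstream. Threads keep the example short; in CPython, CPU-bound stages would need processes to run truly in parallel.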

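Hadoop 2.0: what is the big deal?
• YARN, a new resource scheduler (“Yet Another Resource Negotiator”)
• The JobTracker and TaskTracker roles have been split up to increase scalability: cluster resource management moves to the ResourceManager and per-node NodeManagers, while job scheduling and monitoring move to a per-application ApplicationMaster
• MapReduce is removed from the core architecture and becomes one application framework among others running on YARN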
Now there is a
Operator Library – ETL/DQ
• Reading/Writing
• Text Processing
• Data Exploration
• Data Matching
• Aggregation
• Filtering
• Manipulation

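The operator categories above can be pictured as small composable stages. A minimal sketch, assuming line-oriented text input; the function names are hypothetical and only illustrate reading, text processing, filtering and aggregation chained lazily with generators, not DataFlow's own operators.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, Iterable, Iterator, List

def read_lines(text: str) -> Iterator[str]:
    """Reading: yield the non-blank lines of a text source."""
    for line in text.splitlines():
        if line.strip():
            yield line

def split_fields(lines: Iterable[str]) -> Iterator[List[str]]:
    """Text processing: split each line into whitespace-separated fields."""
    for line in lines:
        yield line.split()

def filter_records(records: Iterable[List[str]],
                   predicate: Callable[[List[str]], bool]) -> Iterator[List[str]]:
    """Filtering: keep only the records for which predicate returns True."""
    return (r for r in records if predicate(r))

def aggregate(records: Iterable[List[str]],
              key: Callable[[List[str]], Hashable]) -> Dict[Hashable, int]:
    """Aggregation: count records per key."""
    return dict(Counter(key(r) for r in records))
```

For example, `aggregate(filter_records(split_fields(read_lines(text)), lambda r: r[1] == "login"), key=lambda r: r[0])` counts logins per user; nothing is materialized until `aggregate` consumes the chain.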
Innovation Lab
Tactical mission:
• Driving platform integration
Strategic mission:
• Blueprint next-generation analytic apps & solution architectures
• Advance new science where data and algorithms intersect
• Solution demoware

Confidential © 2014 Actian Corporation

BIG DATA 2.0 – “BOWTIE” ARCHITECTURE

[Diagram: Event Logs feeding DBMS – SMP/MPP, Time Series, ETL & Analytics, and Semantic Web targets]

BIG DATA 2.0 – “BOWTIE” ARCHITECTURE

EVERYTHING IS LOG DATA

• Application logs
• System monitoring
• Real-time feeds


BIG DATA 2.0 – “BOWTIE” ARCHITECTURE

TIME-ORDERED PERSISTENCE

• Schema-less flexibility
• Extensible first-class citizens (Time, Location, Type)
• Universal accessibility
• Complete archive of raw events

BIG DATA 2.0 – “BOWTIE” ARCHITECTURE

VARIABLE OUTPUT TARGETS

• DBMS – SMP/MPP: traditional DW loading
• Time Series: time window analysis
• ETL & Analytics: load, analyze, re-feed
• Semantic Web: patterns, graph traversal, visuals

BIG DATA 2.0 – “BOWTIE” ARCHITECTURE

PUTTING A FACE TO THE NAME

• Source: Event Logs
• DBMS – SMP/MPP: Actian Matrix
• Time Series: OpenTSDB
• ETL & Analytics: Actian DataFlow (Dataflow Analytics)
• Semantic Web: SPARQLverse

DATA LOADING

ACTIAN DATA CLOUD LOG FILE EXAMPLE

• 2013-08-01T03:38:42.236-0500: Time
• [74.95.141.217, 10.120.245.3]: Space
• User[id=2162,name=tmitchell]: People
• login: Activity
• 57509328: Magnitude
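The five facets can be pulled out of a log line with a regular expression. This sketch assumes the fields sit on a single line, separated by spaces, in the order shown on the slide; the real Data Cloud log layout may differ, and `parse_event` is a hypothetical helper.

```python
import re
from datetime import datetime

# Assumed single-line layout: time [ip, ip] User[id=..,name=..] activity magnitude
LOG_PATTERN = re.compile(
    r"(?P<time>\S+)\s+"
    r"\[(?P<ips>[^\]]*)\]\s+"
    r"User\[id=(?P<user_id>\d+),name=(?P<user_name>[^\]]+)\]\s+"
    r"(?P<activity>\S+)\s+"
    r"(?P<magnitude>\d+)"
)

def parse_event(line: str) -> dict:
    """Split one log line into the Time/Space/People/Activity/Magnitude facets."""
    m = LOG_PATTERN.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    return {
        "time": datetime.strptime(m["time"], "%Y-%m-%dT%H:%M:%S.%f%z"),  # Time
        "ips": [ip.strip() for ip in m["ips"].split(",")],               # Space
        "user_id": int(m["user_id"]),                                    # People
        "user_name": m["user_name"],                                     # People
        "activity": m["activity"],                                       # Activity
        "magnitude": int(m["magnitude"]),                                # Magnitude
    }
```

The parsed timestamp keeps its -0500 offset, so it can be converted to UTC or epoch time before loading.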

DATA LOADING

HBASE LOADER

DataFlow workflow built in KNIME, the open source data mining application

DATA LOADING

HBASE STRUCTURED EVENT RECORD
• hasSource: IP address
• hasTime: timestamp
• hasValue: full source
• hasType: data cloud type
• hasLoadTimestamp: timestamp

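One common way to get time-ordered persistence in HBase is a composite row key: a prefix that groups rows (here the source IP) followed by a big-endian timestamp, so each source's events sort chronologically. The sketch assumes that key layout and a column family named `e`; neither is confirmed by the slides. It only builds the key and the column map, which could then be handed to an HBase client (for example happybase's `Table.put(row, data)`).

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
FAMILY = "e"  # hypothetical column family name

def event_row_key(source_ip: str, event_time: datetime) -> bytes:
    """Source IP, a 0x00 separator, then epoch milliseconds as 8 big-endian bytes.

    Unsigned big-endian integers compare byte-wise in numeric order, so HBase
    stores the events of one source in time order. event_time must be
    timezone-aware."""
    millis = (event_time - EPOCH) // timedelta(milliseconds=1)
    return source_ip.encode("ascii") + b"\x00" + struct.pack(">Q", millis)

def event_columns(source_ip: str, event_time: datetime, raw_line: str,
                  data_type: str, load_time: datetime) -> dict:
    """Column map mirroring the event record fields listed above."""
    values = {
        "hasSource": source_ip,
        "hasTime": event_time.isoformat(),
        "hasValue": raw_line,
        "hasType": data_type,
        "hasLoadTimestamp": load_time.isoformat(),
    }
    return {f"{FAMILY}:{name}".encode(): value.encode() for name, value in values.items()}
```

Putting the source first spreads writes across sources; a key that started with the timestamp would sort globally by time but would send all new writes to a single region.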
HBASE TO OPENTSDB

Optimized HBase reader that selects a time window and dumps it to text files, which are then served to OpenTSDB

• Perpetual load service
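With a row key that starts with the source and ends with a big-endian timestamp (an assumed layout, not one the slides specify), selecting a time window for one source reduces to a range scan between a start key and a stop key. The sketch computes those keys and simulates the scan over a sorted in-memory list with `bisect`; with a real HBase client the two keys would be passed as the scan's start and stop rows (happybase's `Table.scan` takes `row_start` and `row_stop`). All names are hypothetical.

```python
import bisect
import struct
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def row_key(source: str, t: datetime) -> bytes:
    """Hypothetical layout: source + 0x00 + epoch milliseconds (8 bytes, big-endian).
    t must be timezone-aware."""
    millis = (t - EPOCH) // timedelta(milliseconds=1)
    return source.encode("ascii") + b"\x00" + struct.pack(">Q", millis)

def window_keys(source: str, start: datetime, end: datetime) -> Tuple[bytes, bytes]:
    """Start key (inclusive) and stop key (exclusive) covering [start, end)."""
    return row_key(source, start), row_key(source, end)

def scan_window(sorted_keys: List[bytes], source: str,
                start: datetime, end: datetime) -> List[bytes]:
    """Simulate an HBase range scan over row keys that are already sorted."""
    start_key, stop_key = window_keys(source, start, end)
    lo = bisect.bisect_left(sorted_keys, start_key)
    hi = bisect.bisect_left(sorted_keys, stop_key)
    return sorted_keys[lo:hi]
```

A perpetual load service could repeat such a scan on a schedule, advancing the window by its own length each time so consecutive dumps neither overlap nor leave gaps.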
EMIT TO OPENTSDB

event.glassfish 1390373743 38720912 method=listUsers rowid=0548e8 id=79

• event.glassfish: metric name
• 1390373743: timestamp
• 38720912: execution time
• method=listUsers: method called
• rowid=0548e8: row ID
• id=79: user ID
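The emitted line follows OpenTSDB's text import layout: metric name, Unix timestamp in seconds, value, then one or more `tagk=tagv` pairs. A small formatter, assuming the record has already been reduced to those parts; `opentsdb_line` is a hypothetical helper.

```python
from typing import Dict

def opentsdb_line(metric: str, timestamp: int, value: int, tags: Dict[str, str]) -> str:
    """Format one data point as 'metric timestamp value tagk=tagv ...'."""
    if not tags:
        # OpenTSDB (1.x/2.x) expects at least one tag per data point.
        raise ValueError("at least one tag is required")
    for k, v in tags.items():
        # Spaces and '=' would break the whitespace/equals-delimited format.
        if any(c in f"{k}{v}" for c in " ="):
            raise ValueError(f"invalid tag {k}={v}")
    tag_str = " ".join(f"{k}={v}" for k, v in tags.items())
    return f"{metric} {timestamp} {value} {tag_str}"
```

Prefixing the same line with `put ` gives the form accepted by OpenTSDB's telnet-style interface.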

OPENTSDB UI

CUSTOM WEB VIZ

Built using:
• Autobahn Python WebSockets
• OpenTSDB Web API
• D3 visualization
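The browser side can pull data through OpenTSDB's HTTP interface. This sketch only builds a query URL, assuming the OpenTSDB 2.x `/api/query` endpoint and a TSD at a hypothetical `http://localhost:4242`; older 1.x releases exposed a different `/q` endpoint. A WebSocket server such as the Autobahn one listed above could poll a URL like this and push the results to the D3 page.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

def opentsdb_query_url(base_url: str, metric: str, start: str = "1h-ago",
                       aggregator: str = "sum", downsample: Optional[str] = None,
                       tags: Optional[Dict[str, str]] = None) -> str:
    """Build a GET URL using the m=aggregator:[downsample:]metric{tags} form."""
    sub_query = aggregator
    if downsample:  # e.g. "1m-avg"
        sub_query += ":" + downsample
    sub_query += ":" + metric
    if tags:
        sub_query += "{" + ",".join(f"{k}={v}" for k, v in tags.items()) + "}"
    # urlencode percent-escapes ':', '{', '}' and '=' inside the m parameter
    return f"{base_url.rstrip('/')}/api/query?" + urlencode({"start": start, "m": sub_query})
```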
Analytics Library

MACHINE LEARNING ON HBASE

Observe

Act!

BIG DATA 2.0 – “BOWTIE” ARCHITECTURE

PUTTING A FACE TO THE NAME

Actian Matrix, OpenTSDB, Event Logs, Dataflow Analytics, SPARQLverse*

* aka SPARQLBase.com

DATA LOADING

RDF/SEMANTIC WEB LOADER

RDF/Triples Writer: coming soon

FROM LOG TO SPARQLVERSE

From a single record

TRIPLES EXAMPLE

Agent <produces> Record

Record <logsDataAbout> User
Client <isCalledBy> User

Client <requestsFrom> S

Server <repliesTo> Cli
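Turning one parsed log record into triples can be sketched as a function that emits the relationships listed above. The namespace `http://example.org/event#` and the node names are hypothetical; the serializer writes N-Triples, one `<subject> <predicate> <object> .` statement per line.

```python
from typing import List, Tuple

BASE = "http://example.org/event#"  # hypothetical namespace for all terms

Triple = Tuple[str, str, str]

def record_triples(record: str, agent: str, user: str,
                   client: str, server: str) -> List[Triple]:
    """Relationships for a single log record, using the predicate names from the slide."""
    return [
        (agent, "produces", record),
        (record, "logsDataAbout", user),
        (client, "isCalledBy", user),
        (client, "requestsFrom", server),
        (server, "repliesTo", client),
    ]

def to_ntriples(triples: List[Triple]) -> str:
    """Serialize (subject, predicate, object) names as N-Triples statements."""
    return "\n".join(f"<{BASE}{s}> <{BASE}{p}> <{BASE}{o}> ." for s, p, o in triples)
```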

SAMPLE SPARQL QUERY
SELECT (count(*) as ?cntCalls) (sum(?time) as ?timeSum)
FROM <event>
WHERE {
?record :logsDataAbout ?client .
?user :initiates ?client .
?record :exectime ?time . }
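To make the query's semantics concrete, the sketch evaluates the same three triple patterns as a nested-loop join over an in-memory list of triples, then applies COUNT and SUM over the solutions. The data and the `calls_and_time` helper are made up for illustration; this is not how SPARQLverse executes queries.

```python
from typing import List, Tuple

def calls_and_time(triples: List[Tuple[str, str, object]]) -> Tuple[int, int]:
    """Evaluate, over an in-memory triple list:

    SELECT (count(*) AS ?cntCalls) (sum(?time) AS ?timeSum)
    WHERE { ?record :logsDataAbout ?client .
            ?user   :initiates     ?client .
            ?record :exectime      ?time . }
    """
    def pairs(predicate):
        # (subject, object) pairs for one predicate, i.e. one triple pattern
        return [(s, o) for s, p, o in triples if p == predicate]

    count, total = 0, 0
    for record, client in pairs("logsDataAbout"):
        for _user, initiated in pairs("initiates"):
            if initiated != client:
                continue  # ?client must bind to the same node in both patterns
            for timed_record, time in pairs("exectime"):
                if timed_record == record:  # ?record must match as well
                    count += 1              # one solution per binding combination
                    total += time
    return count, total
```

Records with no `exectime` triple produce no solution, so they are excluded from both the count and the sum.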

SAMPLE SPARQL QUERY

…
?record :logsDataAbout ?client .
?user :initiates ?client .
…

VISUALIZE DATA GRAPHS

Gephi desktop UI
- supports RDF import

D3 web UI example
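For a D3 web view, triples map naturally onto the node/link JSON shape commonly used with D3 force layouts (nodes with an `id`, links with `source` and `target`). A minimal converter; keeping the predicate as a `label` field for edge captions is a hypothetical choice, not part of any D3 API.

```python
import json
from typing import List, Tuple

def triples_to_d3(triples: List[Tuple[str, str, str]]) -> str:
    """Return {"nodes": [...], "links": [...]} JSON for a D3 force layout."""
    node_ids: List[str] = []
    seen = set()
    for subject, _, obj in triples:
        for node in (subject, obj):
            if node not in seen:  # keep first-appearance order, no duplicates
                seen.add(node)
                node_ids.append(node)
    graph = {
        "nodes": [{"id": n} for n in node_ids],
        "links": [{"source": s, "target": o, "label": p} for s, p, o in triples],
    }
    return json.dumps(graph)
```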
BIG DATA 2.0 – “BOWTIE” ARCHITECTURE

PUTTING A FACE TO THE NAME

Actian Matrix, OpenTSDB, Event Logs, Dataflow Analytics, SPARQLverse

• Actian Matrix: the technology used behind Amazon Redshift

EXPORT TO MATRIX/MPP

HBASE TO MATRIX LOADER

Load into Matrix MPP
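MPP databases are typically bulk-loaded from delimited text files rather than by row-by-row inserts. The sketch writes event records as pipe-delimited text; the column list and the `to_delimited` helper are hypothetical, and the Matrix load command itself is not shown because the slides do not give it.

```python
import csv
import io
from typing import Dict, Iterable

# Hypothetical target column order for the Matrix table.
COLUMNS = ["source", "event_time", "type", "value", "load_time"]

def to_delimited(rows: Iterable[Dict[str, str]], delimiter: str = "|") -> str:
    """Render rows as delimiter-separated text, one line per row.

    csv quotes any field that contains the delimiter, a quote or a newline."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter=delimiter,
                            lineterminator="\n", extrasaction="ignore")
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()
```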

FUTURE DIRECTION


Real-time processing
Semantic event processing
Continued integration

THANK YOU
www.actian.com
facebook.com/actiancorp

Tyler.Mitchell@actian.com
Paul.Dingman@actian.com

@actiancorp

Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Caserta
 
General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017Caserta
 
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Caserta
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteCaserta
 
Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Caserta
 
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Caserta
 
The Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's EnterpriseThe Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's EnterpriseCaserta
 
Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Caserta
 
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Caserta
 
You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?Caserta
 
The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation Caserta
 
Making Big Data Easy for Everyone
Making Big Data Easy for EveryoneMaking Big Data Easy for Everyone
Making Big Data Easy for EveryoneCaserta
 
Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure CloudCaserta
 
Big Data Analytics on the Cloud
Big Data Analytics on the CloudBig Data Analytics on the Cloud
Big Data Analytics on the CloudCaserta
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on HadoopCaserta
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data LakeCaserta
 
Not Your Father's Database by Databricks
Not Your Father's Database by DatabricksNot Your Father's Database by Databricks
Not Your Father's Database by DatabricksCaserta
 
Mastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkMastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkCaserta
 

More from Caserta (20)

Using Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven MarketingUsing Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven Marketing
 
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
 
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
 
General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017
 
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
 
Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)
 
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
 
The Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's EnterpriseThe Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's Enterprise
 
Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics
 
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
 
You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?
 
The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation
 
Making Big Data Easy for Everyone
Making Big Data Easy for EveryoneMaking Big Data Easy for Everyone
Making Big Data Easy for Everyone
 
Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure Cloud
 
Big Data Analytics on the Cloud
Big Data Analytics on the CloudBig Data Analytics on the Cloud
Big Data Analytics on the Cloud
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on Hadoop
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data Lake
 
Not Your Father's Database by Databricks
Not Your Father's Database by DatabricksNot Your Father's Database by Databricks
Not Your Father's Database by Databricks
 
Mastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkMastering Customer Data on Apache Spark
Mastering Customer Data on Apache Spark
 

Recently uploaded

The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 

Big Data 2.0: ETL & Analytics: Implementing a next generation platform

  • 1. Big Data 2.0: ETL & Analytics. Implementing a next-generation platform. Tyler Mitchell, Paul Dingman, Innovation Lab, January 2014
  • 2. ACTIAN – PLATFORM FOR NEXT GENERATION ANALYTICS. Sources: Enterprise Applications, Data Warehouse, Social, Internet of Things, WWW, Machine Data, Mobile, Traditional, NoSQL, SaaS. Actian Analytics Platform: Connect, Analyze, Act; DataFlow, Matrix, Vector; Accelerators. Outcomes: Customer Delight, Competitive Advantage, World-Class Risk Management, Disruptive New Business Models. Benefits: rapid time to value; unlimited scale; extreme performance; disruptive price/performance; modern GUI development; in-memory analytics; extends Hadoop and NoSQL analytics; complements traditional systems; 200+ data connectors; 600+ analytic functions; full deployment choice; certification with a broad set of analytics tools.
  • 3. Actian Matrix for High Performance Analytics at Any Scale. Serve up high-performance analytic processing for any app. On-Demand Analytics, On-Demand Integration, Orchestration: manage dataflows across the entire analytic process; connect to any data source at the point of the query; 700+ in-database analytic functions. Leader node with optimizer and analytic libraries; massively parallel, columnar, compressed, compiled, SQL, connected. 5 levels of optimization: planning, execution, communications, memory. Node-to-node, bidirectional sharing of analytics and processes with Hadoop nodes. Confidential © 2013 Actian Corporation
  • 4. Actian DataFlow – High Speed Hadoop ETL, DQ, and Analytics, No Programming. Choose from five sets of operators in the transformation and analytics libraries: Connections, Transformation, Data Quality, Analytics, Data Science. Automatically detect resources, plan optimal utilization, and parallelize all workloads on Hadoop. Use dual pipeline parallelism (query pipelining and CPU pipelining) to accelerate performance 10X. Manage the entire analytic process in a visual framework with no coding required. Reuse and share all components, from operators to workflows. Take processing to where the data lives: runs natively on any Hadoop distribution, from the Hadoop leader node. Actian Accelerator for Hadoop: run fully optimized, on-HDFS processing directly on the Hadoop node or on any file system.
  • 5. ACTIAN DATAFLOW – ETL & ANALYTICS
  • 6. ACTIAN DATAFLOW – ETL & ANALYTICS: predefined operators; reduced IO; in-memory operations; pipeline parallelism. Hadoop 2.0, what is the big deal? YARN ("Yet Another Resource Negotiator"), a new resource scheduler. The JobTracker and TaskTracker have been split up to increase scalability, and MapReduce has been removed from the core architecture. Now there is a …
  • 7. Operator Library – ETL/DQ: reading/writing; text processing; data exploration; data matching; aggregation; filtering; manipulation
  • 8. Innovation Lab. Tactical mission: driving platform integration. Strategic mission: blueprint next-generation analytic apps and solution architectures; advance new science where data and algorithms intersect; build solution demoware. Confidential © 2014 Actian Corporation
  • 9. BIG DATA 2.0 – "BOWTIE" ARCHITECTURE: Event Logs; ETL & Analytics; DBMS (SMP/MPP); Time Series; Semantic Web
  • 10. BIG DATA 2.0 – "BOWTIE" ARCHITECTURE: EVERYTHING IS LOG DATA. Event logs: application logs; system monitoring; real-time feeds.
  • 11. BIG DATA 2.0 – "BOWTIE" ARCHITECTURE: TIME-ORDERED PERSISTENCE. Schema-less flexibility; extendable first-class citizens (Time, Location, Type); universal accessibility; complete archive of raw events. (Diagram: Event Logs, ETL & Analytics, DBMS – SMP/MPP, Time Series, Semantic Web.)
  • 12. BIG DATA 2.0 – "BOWTIE" ARCHITECTURE: VARIABLE OUTPUT TARGETS. DBMS (SMP/MPP): traditional DW loading. Time Series: time window analysis. ETL & Analytics: load, analyze, re-feed. Semantic Web: patterns, graph traversal, visuals.
  • 13. BIG DATA 2.0 – "BOWTIE" ARCHITECTURE: PUTTING A FACE TO THE NAME. Event Logs; Actian DataFlow analytics; Actian Matrix; OpenTSDB; SPARQLverse
  • 14. DATA LOADING: ACTIAN DATA CLOUD LOG FILE EXAMPLE: 2013-08-01T03:38:42.236-0500 [74.95.141.217, 10.120.245.3] User[id=2162,name=tmitchell] login 57509328
  • 15. DATA LOADING: ACTIAN DATA CLOUD LOG FILE EXAMPLE: 2013-08-01T03:38:42.236-0500 (Time); [74.95.141.217, 10.120.245.3] (Space); User[id=2162,name=tmitchell] (People); login (Activity); 57509328 (Magnitude)
  • 16. DATA LOADING: HBASE LOADER. A DataFlow workflow built into KNIME, the open source data mining app.
  • 17. DATA LOADING: HBASE STRUCTURED EVENT RECORD: hasSource (IP address); hasTime (timestamp); hasValue (full source); hasType (data cloud type); hasLoadTimestamp (timestamp)
  • 18. BIG DATA 2.0 – "BOWTIE" ARCHITECTURE: PUTTING A FACE TO THE NAME. Event Logs; Actian DataFlow analytics; Actian Matrix; OpenTSDB; SPARQLverse
  • 19. HBASE TO OPENTSDB: an optimized HBase reader selects a time window and dumps it to text files for serving to OpenTSDB. Perpetual Load Service
  • 20. EMIT TO OPENTSDB: event.glassfish 1390373743 38720912 method=listUsers rowid=0548e8 id=79. Fields: metric name (event.glassfish); timestamp (1390373743); execution time (38720912); method called (method=listUsers); row ID (rowid=0548e8); user ID (id=79).
  • 21. OPENTSDB UI
  • 22. OPENTSDB UI
  • 23. CUSTOM WEB VIZ. Built using: Autobahn Python WebSockets; the OpenTSDB Web API; D3 visualization.
  • 24. BIG DATA 2.0 – "BOWTIE" ARCHITECTURE: PUTTING A FACE TO THE NAME. Event Logs; Actian DataFlow analytics; Actian Matrix; OpenTSDB; SPARQLverse
  • 26. MACHINE LEARNING ON HBASE: Observe, Act!
  • 27. BIG DATA 2.0 – "BOWTIE" ARCHITECTURE: PUTTING A FACE TO THE NAME. Event Logs; Actian DataFlow analytics; Actian Matrix; OpenTSDB; SPARQLverse (aka SPARQLBase.com)
  • 28. DATA LOADING: RDF/SEMANTIC WEB LOADER. RDF/Triples Writer (coming soon)
  • 29. FROM LOG TO SPARQLVERSE: from a single record
  • 30. TRIPLES EXAMPLE: Agent <produces> Record; Record <logsDataAbout> User; Client <isCalledBy> User; Client <requestsFrom> S …; Server <repliesTo> Cli …
  • 31. SAMPLE SPARQL QUERY: SELECT (count(*) as ?cntCalls) (sum(?time) as ?timeSum) FROM <event> WHERE { ?record :logsDataAbout ?client . ?user :initiates ?client . ?record :exectime ?time . }
  • 32. SAMPLE SPARQL QUERY: … ?record :logsDataAbout ?client . ?user :initiates ?client . …
  • 33. VISUALIZE DATA GRAPHS: Gephi desktop UI (supports RDF import); D3 web UI example
  • 34. BIG DATA 2.0 – "BOWTIE" ARCHITECTURE: PUTTING A FACE TO THE NAME. Event Logs; Actian DataFlow analytics; Actian Matrix (the technology used behind Amazon Redshift); OpenTSDB; SPARQLverse
  • 35. Actian Matrix for High Performance Analytics at Any Scale. Serve up high-performance analytic processing for any app. On-Demand Analytics, On-Demand Integration, Orchestration: manage dataflows across the entire analytic process; connect to any data source at the point of the query; 700+ in-database analytic functions. Leader node with optimizer and analytic libraries; massively parallel, columnar, compressed, compiled, SQL, connected. 5 levels of optimization: planning, execution, communications, memory. Node-to-node, bidirectional sharing of analytics and processes with Hadoop nodes.
  • 36. EXPORT TO MATRIX/MPP: HBASE TO MATRIX LOADER. Load Matrix MPP
  • 37. EXPORT TO MATRIX/MPP: HBase to Matrix Loader
  • 38. FUTURE DIRECTION
  • 39. FUTURE DIRECTION: real-time processing; semantic event processing; continued integration
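
Slide 6 lists pipeline parallelism among DataFlow's techniques. The sketch below illustrates the general idea only; it is not Actian DataFlow's engine or API. Each operator runs in its own thread and passes records to the next operator through a bounded queue, so every stage works on a different record at the same time. `run_pipeline` and `_stage` are hypothetical names. Because of CPython's global interpreter lock, these threads overlap waiting rather than CPU work, so the example shows the structure rather than the speed-up.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the record stream


def _stage(func, inbox, outbox):
    """One operator: take records from inbox, transform them, pass them on."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate end-of-stream to the next operator
            return
        outbox.put(func(item))


def run_pipeline(records, funcs, buffer_size=16):
    """Chain operators with bounded queues so all stages run concurrently."""
    queues = [queue.Queue(maxsize=buffer_size) for _ in range(len(funcs) + 1)]
    workers = [
        threading.Thread(target=_stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]

    def _feed():
        # The producer runs in its own thread, so a full queue cannot
        # deadlock the caller that drains the last queue.
        for record in records:
            queues[0].put(record)
        queues[0].put(_DONE)

    feeder = threading.Thread(target=_feed)
    for t in [feeder, *workers]:
        t.start()

    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)

    for t in [feeder, *workers]:
        t.join()
    return results
```

For example, `run_pipeline(["  login ", "logout"], [str.strip, str.upper])` returns `["LOGIN", "LOGOUT"]`. Each stage has a single thread and the queues are FIFO, so record order is preserved. The bounded queues also apply back-pressure when a fast stage feeds a slow one.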
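
Slides 14 and 15 split the example log line into Time, Space, People, Activity and Magnitude. The sketch below shows one way to do that split. It assumes every line follows the layout of that single example, which the slides do not guarantee. `LOG_PATTERN` and `parse_log_line` are hypothetical names, not part of the DataFlow operator library.

```python
import re
from datetime import datetime

# Layout assumed from the single example line on slides 14 and 15:
# <ISO-8601 time> [<ip>, <ip>] User[id=<n>,name=<s>] <activity> <magnitude>
LOG_PATTERN = re.compile(
    r"^(?P<time>\S+)\s+"
    r"\[(?P<ips>[^\]]*)\]\s+"
    r"User\[id=(?P<user_id>\d+),name=(?P<user_name>[^\]]+)\]\s+"
    r"(?P<activity>\S+)\s+"
    r"(?P<magnitude>\d+)$"
)


def parse_log_line(line):
    """Split one log line into the Time / Space / People / Activity / Magnitude facets."""
    m = LOG_PATTERN.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    return {
        # %f accepts the 3-digit millisecond field; %z accepts the -0500 offset.
        "time": datetime.strptime(m["time"], "%Y-%m-%dT%H:%M:%S.%f%z"),
        "space": [ip.strip() for ip in m["ips"].split(",")],
        "people": {"id": int(m["user_id"]), "name": m["user_name"]},
        "activity": m["activity"],
        "magnitude": int(m["magnitude"]),
    }
```

Parsing the time into a timezone-aware `datetime` keeps the -0500 offset. This lets events from different servers be ordered on a common UTC timeline later.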
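
Slide 17 names the columns of the HBase event record. Slide 11 asks for time-ordered persistence of raw events. The sketch below maps one raw line to those columns. The row-key layout (type prefix plus zero-padded epoch milliseconds) is a hypothetical design, not taken from the slides, and so is the function name `to_event_record`. The sketch stops short of any HBase client call: the returned dict would be handed to whatever client the loader uses.

```python
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_event_record(raw_line, data_type, load_ms):
    """Build (row_key, columns) for one raw log line, following slide 17's field list."""
    fields = raw_line.split()
    event_time = datetime.strptime(fields[0], "%Y-%m-%dT%H:%M:%S.%f%z")
    # Integer millisecond arithmetic avoids float rounding in datetime.timestamp().
    event_ms = (event_time - _EPOCH) // timedelta(milliseconds=1)
    source_ip = fields[1].strip("[],")  # first address of the bracketed list
    # Hypothetical row key: type prefix plus zero-padded epoch millis, so rows
    # of one type sort lexicographically in time order.
    row_key = f"{data_type}:{event_ms:013d}"
    columns = {
        "hasSource": source_ip,
        "hasTime": str(event_ms),
        "hasValue": raw_line,  # the full source line, kept as the raw archive
        "hasType": data_type,
        "hasLoadTimestamp": str(load_ms),
    }
    return row_key, columns
```

Zero-padding matters because HBase orders row keys as bytes. Without padding, a 12-digit timestamp would sort after a 13-digit one that starts with a larger digit.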
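
Slide 19 describes a reader that selects a time window from HBase and dumps it to text files for OpenTSDB. The sketch below shows the window selection, assuming events are already sorted by timestamp, with a plain list standing in for an HBase scan. `select_window` and `dump_window` are hypothetical names. Windows are half-open (`start <= t < end`), so consecutive passes of a perpetual loader never emit the same event twice.

```python
import bisect
import io


def select_window(events, start_ms, end_ms):
    """Return events with start_ms <= timestamp < end_ms.

    `events` is a list of (timestamp_ms, line) tuples sorted by timestamp,
    standing in for a time-ordered HBase scan.
    """
    keys = [ts for ts, _ in events]
    lo = bisect.bisect_left(keys, start_ms)
    hi = bisect.bisect_left(keys, end_ms)
    return events[lo:hi]


def dump_window(events, start_ms, end_ms, out):
    """Write one window's lines to a text stream; return how many were written."""
    selected = select_window(events, start_ms, end_ms)
    for _, line in selected:
        out.write(line + "\n")
    return len(selected)


if __name__ == "__main__":
    events = [(1000, "a"), (2000, "b"), (3000, "c"), (4000, "d")]
    # A perpetual loader would repeat this, advancing by one window per pass.
    for start in range(0, 5000, 2000):
        buf = io.StringIO()
        n = dump_window(events, start, start + 2000, buf)
        print(start, n, buf.getvalue().split())
```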
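
The line on slide 20 follows OpenTSDB's plain-text data-point layout: metric, timestamp in seconds, value, then one or more `tagk=tagv` pairs. As far as I know, the telnet-style interface expects the same fields prefixed with `put`. The sketch below formats such a line. The character check is a conservative ASCII reading of OpenTSDB's naming rules, and the at-least-one-tag check reflects its requirement as I understand it. `opentsdb_line` is a hypothetical helper.

```python
import re

# Conservative ASCII subset of the characters OpenTSDB accepts in metric
# names, tag keys and tag values.
_VALID_TOKEN = re.compile(r"^[A-Za-z0-9._/-]+$")


def opentsdb_line(metric, timestamp_s, value, tags, put=False):
    """Format one data point as 'metric timestamp value tagk=tagv ...'."""
    if not tags:
        raise ValueError("OpenTSDB expects at least one tag per data point")
    for token in [metric, *tags.keys(), *(str(v) for v in tags.values())]:
        if not _VALID_TOKEN.match(token):
            raise ValueError(f"invalid OpenTSDB token: {token!r}")
    tag_str = " ".join(f"{k}={v}" for k, v in tags.items())
    line = f"{metric} {int(timestamp_s)} {value} {tag_str}"
    # The telnet-style interface prefixes each data point with 'put'.
    return "put " + line if put else line
```

Called with the values from slide 20, `opentsdb_line("event.glassfish", 1390373743, 38720912, {"method": "listUsers", "rowid": "0548e8", "id": 79})` reproduces the slide's line exactly.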
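
Slide 23's custom visualization reads from the OpenTSDB Web API. The sketch below builds a query URL in the query-string form of OpenTSDB 2.x's `/api/query` endpoint, where `m` is `aggregator:[downsample:]metric[{tags}]`. OpenTSDB 1.x served graphs from a different endpoint, so the version is an assumption. The host, the helper name `opentsdb_query_url` and the `1m-avg` downsampler are illustrative. The Autobahn WebSocket relay and the D3 client are not shown.

```python
from urllib.parse import urlencode


def opentsdb_query_url(base_url, metric, start, tags=None, aggregator="sum",
                       downsample=None, end=None):
    """Build an OpenTSDB 2.x /api/query URL using the 'm' query-string form."""
    sub_query = aggregator + ":"
    if downsample:
        sub_query += downsample + ":"  # e.g. "1m-avg": one-minute averages
    sub_query += metric
    if tags:
        sub_query += "{" + ",".join(f"{k}={v}" for k, v in tags.items()) + "}"
    params = {"start": start, "m": sub_query}
    if end is not None:
        params["end"] = end
    # urlencode percent-escapes ':', '{', '=' and '}' inside the sub-query.
    return base_url.rstrip("/") + "/api/query?" + urlencode(params)
```

A WebSocket server could poll this URL every few seconds and push the JSON response to D3. That keeps the browser from needing direct access to the OpenTSDB host.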
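
Slides 29 and 30 turn a single log record into RDF triples. The sketch below emits the three complete triples from slide 30, plus the `exectime` property used by the sample query on slide 31, serialized as N-Triples. The namespace `http://example.org/event#`, the IRI layout and the `client_id` input are hypothetical. This is not SPARQLverse's loader or its upcoming RDF/Triples Writer.

```python
BASE = "http://example.org/event#"  # hypothetical namespace, not from the slides
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def iri(local_name):
    """Wrap a local name in angle brackets under the hypothetical namespace."""
    return f"<{BASE}{local_name}>"


def record_to_triples(record_id, agent, user_id, client_id, exec_time):
    """Turn one log record into (subject, predicate, object) triples."""
    record = iri(f"record/{record_id}")
    user = iri(f"user/{user_id}")
    client = iri(f"client/{client_id}")
    return [
        (iri(f"agent/{agent}"), iri("produces"), record),
        (record, iri("logsDataAbout"), user),
        (client, iri("isCalledBy"), user),
        # Typed literal so a SPARQL sum() treats the value as a number.
        (record, iri("exectime"), f'"{int(exec_time)}"^^<{XSD_INTEGER}>'),
    ]


def to_ntriples(triples):
    """Serialize triples as N-Triples, one 'subject predicate object .' per line."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)
```

With the values from slide 20, `record_to_triples("0548e8", "glassfish", 79, "c1", 38720912)` links row `0548e8` to user 79 and records its execution time. The client ID `"c1"` is made up.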
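
The sample query on slide 31 joins three triple patterns on `?record` and `?client`, then counts the matches and sums their execution times. The sketch below evaluates the same pattern over an in-memory list of triples. It is meant only to make the query's semantics concrete; it does not reflect how SPARQLverse executes queries. Prefixes are dropped, so predicates are plain strings, and `count_and_sum` is a hypothetical name.

```python
from collections import defaultdict


def count_and_sum(triples):
    """Evaluate: ?record logsDataAbout ?client . ?user initiates ?client .
    ?record exectime ?time, returning (count(*), sum(?time))."""
    by_pred = defaultdict(list)
    for s, p, o in triples:
        by_pred[p].append((s, o))

    # Index the second and third patterns by their join variables.
    initiators = defaultdict(list)  # ?client -> [?user]
    for user, client in by_pred["initiates"]:
        initiators[client].append(user)
    exec_times = defaultdict(list)  # ?record -> [?time]
    for record, t in by_pred["exectime"]:
        exec_times[record].append(t)

    count, total = 0, 0
    for record, client in by_pred["logsDataAbout"]:
        # Every (user, time) combination is a separate solution, as in SPARQL.
        for _user in initiators.get(client, []):
            for t in exec_times.get(record, []):
                count += 1
                total += t
    return count, total
```

A record whose client has no `initiates` triple produces no solution, so it adds nothing to either the count or the sum. That is the inner-join behaviour of a basic graph pattern.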

Editor's Notes

  1. Extreme Performance: runs natively on Hadoop, and is claimed to be 500% faster than MapReduce. Extreme Scale: runs on a laptop, or scales out to n nodes on any file system. Extreme Agility: ETL, DQ and analytics on Hadoop with no coding; move from any file system to any file system with no changes.