Big Data 2.0: ETL & Analytics: Implementing a next generation platform

At our most recent Big Data Warehousing Meetup, we learned about the transition from Big Data 1.0, built on Hadoop 1.x and nascent technologies, to Hadoop 2.x with YARN, which enables distributed ETL, SQL and analytics solutions. Caserta Concepts Chief Architect Elliott Cordo and an Actian engineer covered the complete data value chain of an enterprise-ready platform, including data connectivity, collection, preparation, optimization and analytics with end-user access.

Access additional slides from this meetup here:
http://www.slideshare.net/CasertaConcepts/big-data-warehousing-meetup-january-20

For more information on our services or upcoming events, please visit http://www.actian.com/ or http://www.casertaconcepts.com/.

Published in: Technology
  • Notes
    • Extreme Performance: runs natively on Hadoop, so 500% faster than MapReduce
    • Extreme Scale: run on a laptop; scale out to n nodes on any file system
    • Extreme Agility: ETL, DQ and Analytics on Hadoop with no coding; move from any FS to any FS with no changes
  • Transcript

    • 1. Big Data 2.0: ETL & Analytics: Implementing a next generation platform. Tyler Mitchell, Paul Dingman, Innovation Lab, January 2014
    • 2. ACTIAN – PLATFORM FOR NEXT GENERATION ANALYTICS. Sources: Enterprise Applications, Data Warehouse, Social, Internet of Things, WWW, Machine Data, Mobile, Traditional, NoSQL, SaaS. Actian Analytics Platform: Connect, Analyze, Act; Accelerators; DataFlow, Matrix, Vector. Outcomes: Customer Delight, Competitive Advantage, World-Class Risk Management, Disruptive New Business Models. Benefits: Rapid Time to Value; Unlimited Scale; Extreme Performance; Disruptive price/performance; Modern GUI Development; In-memory Analytics; Extends Hadoop and NoSQL analytics; Complements Traditional; 200+ data connectors; 600+ analytic functions; Full deployment choice; Certification with broad set of analytics tools
    • 3. Actian Matrix for High Performance Analytics at Any Scale. On-Demand Analytics: serve up high-performance analytic processing for any app. On-Demand Integration: connect to any data source at the point of the query. Orchestration: manage dataflows across the entire analytic process. Analytic Libraries: 700+ in-database analytic functions. Node-to-node, bidirectional sharing of analytics & processes with Hadoop nodes. Diagram labels: Optimizer, Massively Parallel, LEADER NODE, Columnar, 5 LEVELS OF OPTIMIZATION, Compressed, Compiled, SQL, Connected, Planning, Execution, Communications, Memory. Confidential © 2013 Actian Corporation
    • 4. Actian DataFlow – High Speed Hadoop ETL, DQ, and Analytics, No Programming. Choose from five sets of operators: Connections, Transformation, Data Quality, Analytics, Data Science. Automatically detect resources, plan optimal utilization, and parallelize all workloads on Hadoop. Use dual pipeline parallelism to accelerate performance 10X. Manage the entire analytic process in a visual framework with no coding required. Reuse and share all components from operators to workflows. Take processing to where the data lives; runs natively on any Hadoop distribution. Run fully optimized processing directly on the Hadoop node or on any file system. Diagram labels: Actian Dataflow, Transformation & Analytics Libraries, Visual Framework, Optimize, Query Pipelining, CPU Pipelining, Hadoop – Leader Node, Actian Accelerator for Hadoop, Optimized On-HDFS Processing
    • 5. ACTIAN DATAFLOW – ETL & ANALYTICS
    • 6. ACTIAN DATAFLOW – ETL & ANALYTICS: Predefined operators; Reduced IO; In-memory operations; Pipeline parallelism. Hadoop 2.0, what is the big deal? YARN ("Yet Another Resource Negotiator"), a new resource scheduler. Job Tracker and Task Tracker have been split up to increase scalability. MapReduce removed from the core architecture. Now there is a
    • 7. Operator Library – ETL/DQ: Reading/Writing; Text Processing; Data Exploration; Data Matching; Aggregation; Filtering; Manipulation
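Slide 6 lists pipeline parallelism among the DataFlow features. As a rough illustration of the general idea only, and not of Actian DataFlow's API or internals, the sketch below chains operators through bounded queues so that a downstream stage can start on early records while the upstream stage is still working on later ones. The function names, the buffer size and the two sample operators are all assumptions made for this example.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the record stream


def run_stage(func, inbox, outbox):
    """Apply func to each item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # pass end-of-stream on to the next stage
            return
        outbox.put(func(item))


def run_pipeline(records, stages, buffer_size=4):
    """Run each stage in its own thread, linked by bounded queues."""
    queues = [queue.Queue(maxsize=buffer_size) for _ in range(len(stages) + 1)]
    workers = [
        threading.Thread(target=run_stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(stages)
    ]

    def feed():
        for record in records:
            queues[0].put(record)
        queues[0].put(_DONE)

    workers.append(threading.Thread(target=feed))
    for worker in workers:
        worker.start()

    # Drain the last queue in the calling thread until the sentinel arrives.
    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)

    for worker in workers:
        worker.join()
    return results


# Hypothetical operators: trim whitespace, then upper-case.
cleaned = run_pipeline(["  login ", " logout  "], [str.strip, str.upper])
```

Each stage has a single worker reading from a FIFO queue, so record order is preserved. In CPython the threads mostly interleave rather than run simultaneously, so this shows the structure of a pipeline rather than a real speed-up.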
    • 8. Innovation Lab. Tactical mission: driving platform integration. Strategic mission: blueprint next-generation analytic apps & solution architectures; advance new science where data and algorithms intersect; solution demoware. Confidential © 2014 Actian Corporation
    • 9. BIG DATA 2.0 – “BOWTIE” ARCHITECTURE: DBMS – SMP/MPP, Time Series, Event Logs, ETL & Analytics, Semantic Web
    • 10. BIG DATA 2.0 – “BOWTIE” ARCHITECTURE. EVERYTHING IS LOG DATA: application logs; system monitoring; real-time feeds. Diagram label: Event Logs
    • 11. BIG DATA 2.0 – “BOWTIE” ARCHITECTURE. TIME-ORDERED PERSISTENCE: schema-less flexibility; extendable first-class citizens (Time, Location, Type); universal accessibility; complete archive of raw events. Diagram labels: DBMS – SMP/MPP, Time Series, Event Logs, ETL & Analytics, Semantic Web
    • 12. BIG DATA 2.0 – “BOWTIE” ARCHITECTURE. VARIABLE OUTPUT TARGETS. DBMS – SMP/MPP: traditional DW loading. Time Series: time window analysis. ETL & Analytics: load, analyze, re-feed. Semantic Web: patterns, graph traversal, visuals
    • 13. BIG DATA 2.0 – “BOWTIE” ARCHITECTURE. PUTTING A FACE TO THE NAME: Actian Matrix, OpenTSDB, Event Logs, Dataflow Analytics, ACTIAN DATAFLOW, SPARQLverse
    • 14. DATA LOADING. ACTIAN DATA CLOUD LOG FILE EXAMPLE: 2013-08-01T03:38:42.236-0500; [74.95.141.217, 10.120.245.3]; User[id=2162,name=tmitchell]; login; 57509328
    • 15. DATA LOADING. ACTIAN DATA CLOUD LOG FILE EXAMPLE: 2013-08-01T03:38:42.236-0500 (Time); [74.95.141.217, 10.120.245.3] (Space); User[id=2162,name=tmitchell] (People); login (Activity); 57509328 (Magnitude)
    • 16. DATA LOADING. HBASE LOADER: Dataflow workflow built into KNIME open source data mining app
    • 17. DATA LOADING. HBASE STRUCTURED Event Record: hasSource (IP address); hasTime (timestamp); hasValue (full source); hasType (data cloud type); hasLoadTimestamp (timestamp)
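Slides 14 to 17 go from a raw log entry to a structured event record. A minimal Python sketch of that mapping follows. It rests on several assumptions that the slides do not confirm: that the five fields sit on one line separated by spaces, that the first IP address in the bracketed pair is the source, and that the event type is supplied by the caller. The function names are hypothetical, and the actual loader shown in the deck is a Dataflow workflow in KNIME, not this code.

```python
import re
from datetime import datetime, timezone

# Assumption: the five fields from slide 15 appear on one space-separated line.
SAMPLE_LINE = (
    "2013-08-01T03:38:42.236-0500 [74.95.141.217, 10.120.245.3] "
    "User[id=2162,name=tmitchell] login 57509328"
)

LOG_PATTERN = re.compile(
    r"(?P<time>\S+)\s+"
    r"\[(?P<space>[^\]]*)\]\s+"
    r"User\[id=(?P<user_id>\d+),name=(?P<user_name>[^\]]+)\]\s+"
    r"(?P<activity>\S+)\s+"
    r"(?P<magnitude>\d+)"
)


def parse_log_line(line):
    """Split one log line into the Time/Space/People/Activity/Magnitude fields."""
    match = LOG_PATTERN.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    fields = match.groupdict()
    return {
        "time": datetime.strptime(fields["time"], "%Y-%m-%dT%H:%M:%S.%f%z"),
        "space": [ip.strip() for ip in fields["space"].split(",")],
        "people": {"id": int(fields["user_id"]), "name": fields["user_name"]},
        "activity": fields["activity"],
        "magnitude": int(fields["magnitude"]),
    }


def to_event_record(line, event_type, load_time=None):
    """Build a dict keyed by the slide 17 field names (values are assumptions)."""
    parsed = parse_log_line(line)
    if load_time is None:
        load_time = datetime.now(timezone.utc)
    return {
        "hasSource": parsed["space"][0],  # assumption: first IP is the source
        "hasTime": parsed["time"].isoformat(),
        "hasValue": line,  # the full source line, unmodified
        "hasType": event_type,  # caller-supplied; the real type names are unknown
        "hasLoadTimestamp": load_time.isoformat(),
    }


parsed = parse_log_line(SAMPLE_LINE)
event = to_event_record(
    SAMPLE_LINE, "datacloud.login", datetime(2014, 1, 22, tzinfo=timezone.utc)
)
```

Keeping the untouched line in `hasValue` matches the "complete archive of raw events" point on slide 11: any later parser can re-derive fields from the original text.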
    • 18. BIG DATA 2.0 – “BOWTIE” ARCHITECTURE. PUTTING A FACE TO THE NAME: Actian Matrix, OpenTSDB, Event Logs, Dataflow Analytics, Sparqlverse
    • 19. HBASE TO OPENTSDB: Optimized HBase reader selects a time window and dumps to text files for serving to OpenTSDB. Perpetual Load Service
    • 20. EMIT TO OPENTSDB: event.glassfish 1390373743 38720912 method=listUsers rowid=0548e8 id=79. Fields, in order: metric name, timestamp, execution time, method called, row ID, user ID
    • 21. OPENTSDB UI
    • 22. OPENTSDB UI
    • 23. CUSTOM WEB VIZ. Built using: Autobahn Python Websockets; OpenTSDB Web API; D3 visualization
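Slides 19 and 20 describe selecting a time window of events and writing them out as text lines for OpenTSDB. The sketch below shows one way that step could look in Python. The line layout (metric, epoch timestamp, value, then `key=value` tags) is taken from slide 20; the event dictionaries, their keys and both function names are hypothetical and are not the Actian reader.

```python
def select_window(events, start, end):
    """Keep events whose epoch-second timestamp falls in [start, end)."""
    return [event for event in events if start <= event["timestamp"] < end]


def format_datapoint(metric, timestamp, value, tags):
    """Render one data point as 'metric timestamp value tagk=tagv ...'."""
    if not tags:
        # OpenTSDB expects every data point to carry at least one tag.
        raise ValueError("at least one tag is required")
    tag_text = " ".join(f"{key}={val}" for key, val in tags.items())
    return f"{metric} {int(timestamp)} {value} {tag_text}"


# Hypothetical in-memory events; the first mirrors the values on slide 20.
events = [
    {"timestamp": 1390373743, "exec_time": 38720912,
     "tags": {"method": "listUsers", "rowid": "0548e8", "id": 79}},
    {"timestamp": 1390380000, "exec_time": 1200,
     "tags": {"method": "login", "rowid": "0548e9", "id": 80}},
]

lines = [
    format_datapoint("event.glassfish", e["timestamp"], e["exec_time"], e["tags"])
    for e in select_window(events, 1390370000, 1390375000)
]
```

With the window above, only the first event is selected, and its line reproduces the example from slide 20.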
    • 24. BIG DATA 2.0 – “BOWTIE” ARCHITECTURE. PUTTING A FACE TO THE NAME: Actian Matrix, OpenTSDB, Event Logs, Dataflow Analytics, Sparqlverse
    • 25. Analytics Library
    • 26. MACHINE LEARNING ON HBASE: Observe; Act!
    • 27. BIG DATA 2.0 – “BOWTIE” ARCHITECTURE. PUTTING A FACE TO THE NAME: Actian Matrix, OpenTSDB, Event Logs, Dataflow Analytics, SPARQLverse (aka SPARQLBase.com)
    • 28. DATA LOADING. RDF/SEMANTIC WEB LOADER: RDF/Triples Writer (Coming Soon)
    • 29. FROM LOG TO SPARQLVERSE: From Single Record
    • 30. TRIPLES EXAMPLE Agent <produces> Record Record <logsDataAbout> User Client <isCalledBy> User Client <requestsFrom> S Server <repliesTo> Cli
    • 31. SAMPLE SPARQL QUERY SELECT (count(*) as ?cntCalls) (sum(?time) as ?timeSum) FROM <event> WHERE { ?record :logsDataAbout ?client . ?user :initiates ?client . ?record :exectime ?time . }
    • 32. SAMPLE SPARQL QUERY … ?record :logsDataAbout ?client . ?user :initiates ?client . …
    • 33. VISUALIZE DATA GRAPHS: Gephi desktop UI (supports RDF import); D3 web UI example
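The query on slide 31 joins three triple patterns and then aggregates over the matches. To make the join concrete, here is a small pure-Python sketch that evaluates the same three patterns over an in-memory list of triples. It is a teaching aid and not a SPARQL engine: the sample triples, the identifiers in them and the helper names are invented for illustration, and prefixes and the `FROM <event>` graph are ignored.

```python
def match_pattern(pattern, triple, binding):
    """Extend binding so pattern matches triple, or return None on a conflict."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Variable: bind it, or check it agrees with an earlier binding.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None  # constant term does not match
    return extended


def solve(patterns, triples):
    """Join the patterns one after another, like a basic graph pattern."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Hypothetical data: record3 has no initiating user, so the join drops it.
TRIPLES = [
    ("record1", "logsDataAbout", "client1"),
    ("user79", "initiates", "client1"),
    ("record1", "exectime", 38720912),
    ("record2", "logsDataAbout", "client2"),
    ("user80", "initiates", "client2"),
    ("record2", "exectime", 1000),
    ("record3", "logsDataAbout", "client3"),
    ("record3", "exectime", 5),
]

PATTERNS = [
    ("?record", "logsDataAbout", "?client"),
    ("?user", "initiates", "?client"),
    ("?record", "exectime", "?time"),
]

solutions = solve(PATTERNS, TRIPLES)
cnt_calls = len(solutions)                      # count(*) as ?cntCalls
time_sum = sum(s["?time"] for s in solutions)   # sum(?time) as ?timeSum
```

The shared variables `?client` and `?record` are what tie the patterns together: a record only counts if its client also appears in an `initiates` triple.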
    • 34. BIG DATA 2.0 – “BOWTIE” ARCHITECTURE. PUTTING A FACE TO THE NAME: Actian Matrix (used behind Amazon Redshift), OpenTSDB, Event Logs, Dataflow Analytics, Sparqlverse
    • 35. Actian Matrix for High Performance Analytics at Any Scale. On-Demand Analytics: serve up high-performance analytic processing for any app. On-Demand Integration: connect to any data source at the point of the query. Orchestration: manage dataflows across the entire analytic process. Analytic Libraries: 700+ in-database analytic functions. Node-to-node, bidirectional sharing of analytics & processes with Hadoop nodes. Diagram labels: Optimizer, Massively Parallel, LEADER NODE, Columnar, 5 LEVELS OF OPTIMIZATION, Compressed, Compiled, SQL, Connected, Planning, Execution, Communications, Memory
    • 36. EXPORT TO MATRIX/MPP. HBASE TO MATRIX LOADER: Load Matrix MPP
    • 37. EXPORT TO MATRIX/MPP: HBase to Matrix Loader
    • 38. FUTURE DIRECTION
    • 39. FUTURE DIRECTION: Real-time processing; Semantic event processing; Continued integration
    • 40. THANK YOU. www.actian.com, facebook.com/actiancorp, @actiancorp, Tyler.Mitchell@actian.com, Paul.Dingman@actian.com
