Big Data 2.0: ETL & Analytics: Implementing a next generation platform

At our most recent Big Data Warehousing Meetup, we covered the transition from Big Data 1.0, built on Hadoop 1.x and nascent technologies, to Hadoop 2.x, where YARN enables distributed ETL, SQL and analytics solutions. Caserta Concepts Chief Architect Elliott Cordo and an Actian engineer walked through the complete data value chain of an enterprise-ready platform: data connectivity, collection, preparation, optimization and analytics with end-user access.

Access additional slides from this meetup here:
http://www.slideshare.net/CasertaConcepts/big-data-warehousing-meetup-january-20

For more information on our services or upcoming events, please visit http://www.actian.com/ or http://www.casertaconcepts.com/.

Published in: Technology

  • Extreme Performance: runs natively on Hadoop, so 500% faster than MapReduce. Extreme Scale: run on a laptop; scale out to n number of nodes on any file system. Extreme Agility: ETL, DQ and Analytics on Hadoop with no coding; move from any FS to any FS with no changes.
  • Big Data 2.0: ETL & Analytics: Implementing a next generation platform

    1. 1. Big Data 2.0: ETL & Analytics Implementing a next generation platform Tyler Mitchell, Paul Dingman Innovation Lab January 2014
    2. 2. ACTIAN – PLATFORM FOR NEXT GENERATION ANALYTICS. Sources: Enterprise Applications, Data Warehouse, Social, Internet of Things, WWW, Machine Data, Mobile, Traditional, NoSQL, SaaS. Actian Analytics Platform: Connect, Analyze, Act; Accelerators; DataFlow, Matrix, Vector. Outcomes: Customer Delight, Competitive Advantage, World-Class Risk Management, Disruptive New Business Models. Rapid Time to Value; Unlimited Scale; Extreme Performance; Disruptive price/performance. Modern GUI Development; In-memory Analytics; Extends Hadoop and NoSQL analytics; Complements Traditional. 200+ data connectors; 600+ analytic functions; Full deployment choice; Certification with broad set of analytics tools.
    3. 3. Actian Matrix for High Performance Analytics at Any Scale. On-Demand Analytics: serve up high-performance analytic processing for any app. On-Demand Integration: connect to any data source at the point of the query. Orchestration: manage dataflows across the entire analytic process. Analytic Libraries: 700+ in-database analytic functions. Optimizer; Massively Parallel; LEADER NODE; Columnar; 5 LEVELS OF OPTIMIZATION; Compressed; Compiled SQL; Connected; Planning; Execution; Communications; Memory. Node-to-node, bidirectional sharing of analytics & processes with Hadoop nodes. Confidential © 2013 Actian Corporation
    4. 4. Actian DataFlow – High Speed Hadoop ETL, DQ, and Analytics, No Programming. Actian Dataflow. Choose from five sets of operators. Transformation & Analytics Libraries; Connections; Visual Framework; Transformation. Automatically detect resources, plan optimal utilization, and parallelize all workloads on Hadoop. Data Quality. Use dual pipeline parallelism to accelerate performance 10X. Analytics; Data Science; Optimize; Query Pipelining. Manage the entire analytic process in a visual framework with no coding required. Hadoop – Leader Node. Reuse and share all components from operators to workflows. Take processing to where the data lives; runs natively on any Hadoop distribution. Actian Accelerator for Hadoop. Run fully optimized processing directly on the Hadoop node or on any file system. CPU Pipelining; Optimized, On-HDFS Processing.
    5. 5. ACTIAN DATAFLOW – ETL & ANALYTICS
    6. 6. ACTIAN DATAFLOW – ETL & ANALYTICS • Predefined operators • Reduced IO • In-memory operations • Pipeline parallelism. Hadoop 2.0: what is the big deal? YARN, a new resource scheduler ("Yet Another Resource Negotiator"). Job Tracker and Task Tracker have been split up to increase scalability. Remove MapReduce from core architecture. Now there is a
    7. 7. Operator Library – ETL/DQ • Reading/Writing • Text Processing • Data Exploration • Data Matching • Aggregation • Filtering • Manipulation
    8. 8. Innovation Lab. Tactical mission: • Driving platform integration. Strategic mission: • Blueprint next-generation analytic apps & solution architectures • Advance new science where data and algorithms intersect • Solution demoware. Confidential © 2014 Actian Corporation
    9. 9. BIG DATA 2.0 – “BOWTIE” ARCHITECTURE DBMS – SMP/MPP Time Series Event Logs ETL & Analytics Semantic Web Confidential © 2014 Actian Corporation
    10. 10. BIG DATA 2.0 – “BOWTIE” ARCHITECTURE EVERYTHING IS LOG DATA • Application logs • System monitoring • Real-time feeds Event Logs Confidential © 2014 Actian Corporation
    11. 11. BIG DATA 2.0 – “BOWTIE” ARCHITECTURE TIME-ORDERED PERSISTENCE DBMS – SMP/MPP Time Series Event Logs ETL & Analytics Semantic Web • Schema-less flexibility • Extendable first-class citizens (Time, Location, Type) • Universal accessibility • Complete archive of raw events
    12. 12. BIG DATA 2.0 – “BOWTIE” ARCHITECTURE VARIABLE OUTPUT TARGETS DBMS – SMP/MPP Traditional DW loading Time Series Time window analysis ETL & Analytics Load, analyze, re-feed Semantic Web Patterns, graph traversal, visuals Confidential © 2014 Actian Corporation
    13. 13. BIG DATA 2.0 – “BOWTIE” ARCHITECTURE PUTTING A FACE TO THE NAME Actian Matrix OpenTSDB Event Logs Dataflow Analytics ACTIAN DATAFLOW SPARQLverse Confidential © 2014 Actian Corporation
    14. 14. DATA LOADING ACTIAN DATA CLOUD LOG FILE EXAMPLE • 2013-08-01T03:38:42.236-0500 • [74.95.141.217, 10.120.245.3] • User[id=2162,name=tmitchell] • login • 57509328
    15. 15. DATA LOADING ACTIAN DATA CLOUD LOG FILE EXAMPLE • 2013-08-01T03:38:42.236-0500 - Time • [74.95.141.217, 10.120.245.3] - Space • User[id=2162,name=tmitchell] - People • login - Activity • 57509328 - Magnitude
    16. 16. DATA LOADING HBASE LOADER Dataflow workflow built into KNIME open source data mining app Confidential © 2014 Actian Corporation
    17. 17. DATA LOADING HBASE STRUCTURED Event Record • hasSource – IP Address • hasTime – timestamp • hasValue – full source • hasType – data cloud type • hasLoadTimestamp – timestamp
    18. 18. BIG DATA 2.0 – “BOWTIE” ARCHITECTURE PUTTING A FACE TO THE NAME Actian Matrix OpenTSDB Event Logs Dataflow Analytics Sparqlverse Confidential © 2014 Actian Corporation
    19. 19. HBASE TO OPENTSDB Optimized HBase reader, selects a time window and dumps to text files for serving to OpenTSDB Perpetual Load Service Confidential © 2014 Actian Corporation
    20. 20. EMIT TO OPENTSDB event.glassfish 1390373743 38720912 method=listUsers rowid=0548e8 id=79 • event.glassfish - metric name • 1390373743 - timestamp • 38720912 - execution time • method=listUsers - method called • rowid=0548e8 - row ID • id=79 - user ID
    21. 21. OPENTSDB UI Confidential © 2014 Actian Corporation
    22. 22. OPENTSDB UI Confidential © 2014 Actian Corporation
    23. 23. CUSTOM WEB VIZ Built using: • Autobahn Python Websockets • OpenTSDB Web API • D3 visualization Confidential © 2014 Actian Corporation
    24. 24. BIG DATA 2.0 – “BOWTIE” ARCHITECTURE PUTTING A FACE TO THE NAME Actian Matrix OpenTSDB Event Logs Dataflow Analytics Sparqlverse Confidential © 2014 Actian Corporation
    25. 25. Analytics Library
    26. 26. MACHINE LEARNING ON HBASE Observe Act! Confidential © 2014 Actian Corporation
    27. 27. BIG DATA 2.0 – “BOWTIE” ARCHITECTURE PUTTING A FACE TO THE NAME Actian Matrix OpenTSDB Event Logs Dataflow Analytics SPARQLverse * aka SPARQLBase.com Confidential © 2014 Actian Corporation
    28. 28. DATA LOADING RDF/SEMANTIC WEB LOADER RDF/Triples Writer Coming Soon
    29. 29. FROM LOG TO SPARQLVERSE From Single Record Confidential © 2014 Actian Corporation
    30. 30. TRIPLES EXAMPLE Agent <produces> Record Record <logsDataAbout> User Client <isCalledBy> User Client <requestsFrom> S Server <repliesTo> Cli Confidential © 2014 Actian Corporation
    31. 31. SAMPLE SPARQL QUERY SELECT (count(*) as ?cntCalls) (sum(?time) as ?timeSum) FROM <event> WHERE { ?record :logsDataAbout ?client . ?user :initiates ?client . ?record :exectime ?time . } Confidential © 2014 Actian Corporation
    32. 32. SAMPLE SPARQL QUERY … ?record :logsDataAbout ?client . ?user :initiates ?client . … Confidential © 2014 Actian Corporation
    33. 33. VISUALIZE DATA GRAPHS Gephi desktop UI - supports RDF import D3 web UI example Confidential © 2014 Actian Corporation
    34. 34. BIG DATA 2.0 – “BOWTIE” ARCHITECTURE PUTTING A FACE TO THE NAME Actian Matrix OpenTSDB Event Logs Dataflow Analytics Sparqlverse • Used behind Amazon Redshift Confidential © 2014 Actian Corporation
    35. 35. Actian Matrix for High Performance Analytics at Any Scale. On-Demand Analytics: serve up high-performance analytic processing for any app. On-Demand Integration: connect to any data source at the point of the query. Orchestration: manage dataflows across the entire analytic process. Analytic Libraries: 700+ in-database analytic functions. Optimizer; Massively Parallel; LEADER NODE; Columnar; 5 LEVELS OF OPTIMIZATION; Compressed; Compiled SQL; Connected; Planning; Execution; Communications; Memory. Node-to-node, bidirectional sharing of analytics & processes with Hadoop nodes.
    36. 36. EXPORT TO MATRIX/MPP HBASE TO MATRIX LOADER Load Matrix MPP Confidential © 2014 Actian Corporation
    37. 37. EXPORT TO MATRIX/MPP HBase to Matrix Loader Confidential © 2014 Actian Corporation
    38. 38. FUTURE DIRECTION Confidential © 2014 Actian Corporation
    39. 39. FUTURE DIRECTION • Real-time processing • Semantic event processing • Continued integration
    40. 40. THANK YOU www.actian.com facebook.com/actiancorp Tyler.Mitchell@actian.com Paul.Dingman@actian.com @actiancorp Confidential © 2014 Actian Corporation
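
Slides 14, 15 and 17 describe a log line split into Time, Space, People, Activity and Magnitude, then stored as an event record with hasSource, hasTime, hasValue, hasType and hasLoadTimestamp. The Python sketch below shows one way that mapping could look. It is not the Dataflow/KNIME loader from the talk. It assumes the five fields sit on one line separated by spaces, that the first address is the source, and that the type label is supplied by the caller; the function names are hypothetical.

```python
import re
from datetime import datetime, timezone

# Assumption: the five fields from slide 14 appear on one line, space separated.
LOG_PATTERN = re.compile(
    r"(?P<time>\S+)\s+"
    r"\[(?P<space>[^\]]+)\]\s+"
    r"User\[id=(?P<user_id>\d+),name=(?P<user_name>[^\]]+)\]\s+"
    r"(?P<activity>\S+)\s+"
    r"(?P<magnitude>\d+)$"
)


def parse_event(line):
    """Split one log line into the Time/Space/People/Activity/Magnitude fields."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    return {
        "time": datetime.strptime(match["time"], "%Y-%m-%dT%H:%M:%S.%f%z"),
        "addresses": [a.strip() for a in match["space"].split(",")],
        "user_id": int(match["user_id"]),
        "user_name": match["user_name"],
        "activity": match["activity"],
        "magnitude": int(match["magnitude"]),
    }


def to_event_record(line, data_cloud_type, now=None):
    """Build a dict shaped like the slide 17 event record (hypothetical mapping)."""
    event = parse_event(line)
    loaded = now or datetime.now(timezone.utc)
    return {
        "hasSource": event["addresses"][0],  # assumption: first address is the source
        "hasTime": event["time"].isoformat(),
        "hasValue": line.strip(),            # the full source line
        "hasType": data_cloud_type,          # caller-supplied label
        "hasLoadTimestamp": loaded.isoformat(),
    }


SAMPLE = (
    "2013-08-01T03:38:42.236-0500 [74.95.141.217, 10.120.245.3] "
    "User[id=2162,name=tmitchell] login 57509328"
)
```

Calling `to_event_record(SAMPLE, "example")` returns a plain dict that a loader could write as one row, with the original line preserved in `hasValue` so the raw event stays available.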
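
Slide 20 shows the text line emitted to OpenTSDB: metric name, timestamp, value, then `key=value` tags. As a minimal sketch, the helper below assembles such a line from its parts. The function name is hypothetical, and the talk's reader may build its output differently.

```python
def to_tsdb_line(metric, timestamp, value, tags):
    """Format one data point as 'metric timestamp value tag=value ...'."""
    if not tags:
        # OpenTSDB expects at least one tag on every data point.
        raise ValueError("at least one tag is required")
    tag_text = " ".join(f"{key}={val}" for key, val in tags.items())
    return f"{metric} {int(timestamp)} {value} {tag_text}"


line = to_tsdb_line(
    "event.glassfish",
    1390373743,   # Unix timestamp in seconds
    38720912,     # execution time, as on the slide
    {"method": "listUsers", "rowid": "0548e8", "id": "79"},
)
```

With the slide's values, `line` reproduces the example on slide 20. Tags are written in dict insertion order, so the caller controls their order.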
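
The query on slide 31 joins three triple patterns on `?record` and `?client`, then counts the matches and sums `?time`. The sketch below walks through that join in plain Python over a handful of invented triples. It only illustrates what the query computes and is not how SPARQLverse evaluates it. The sample data and helper names are assumptions.

```python
# Invented sample triples: (subject, predicate, object).
TRIPLES = [
    ("rec1", "logsDataAbout", "clientA"),
    ("rec1", "exectime", 120),
    ("rec2", "logsDataAbout", "clientA"),
    ("rec2", "exectime", 80),
    ("rec3", "logsDataAbout", "clientB"),  # no user initiates clientB
    ("rec3", "exectime", 999),
    ("tmitchell", "initiates", "clientA"),
]


def pairs(triples, predicate):
    """Return (subject, object) pairs for one predicate."""
    return [(s, o) for s, p, o in triples if p == predicate]


def count_and_sum_calls(triples):
    """Join the three patterns, then aggregate like count(*) and sum(?time)."""
    times = []
    for record, client in pairs(triples, "logsDataAbout"):
        for _user, initiated in pairs(triples, "initiates"):
            if initiated != client:
                continue  # ?client must match in both patterns
            for timed_record, time in pairs(triples, "exectime"):
                if timed_record == record:  # ?record must match in both patterns
                    times.append(time)
    return {"cntCalls": len(times), "timeSum": sum(times)}


result = count_and_sum_calls(TRIPLES)
```

Here `rec3` drops out because nothing initiates `clientB`, so only the two `clientA` records contribute to the count and the sum.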
