
Big Data 2.0: ETL & Analytics: Implementing a next generation platform

11,431 views

At our most recent Big Data Warehousing Meetup, we learned about the transition from Big Data 1.0, built on Hadoop 1.x and nascent technologies, to Hadoop 2.x with YARN, which enables distributed ETL, SQL and analytics solutions. Caserta Concepts Chief Architect Elliott Cordo and an Actian engineer covered the complete data value chain of an enterprise-ready platform, including data connectivity, collection, preparation, optimization and analytics with end-user access.

Access additional slides from this meetup here:
http://www.slideshare.net/CasertaConcepts/big-data-warehousing-meetup-january-20

For more information on our services or upcoming events, please visit http://www.actian.com/ or http://www.casertaconcepts.com/.

Published in: Technology

  1. 1. Big Data 2.0: ETL & Analytics Implementing a next generation platform Tyler Mitchell, Paul Dingman Innovation Lab January 2014
  2. 2. ACTIAN – PLATFORM FOR NEXT GENERATION ANALYTICS Sources: Enterprise Applications, Data Warehouse, Social, Internet of Things, WWW, Machine Data, Mobile, Traditional, NoSQL, SaaS. Actian Analytics Platform: Connect, Analyze, Act; Accelerators; DataFlow, Matrix, Vector. Outcomes: Customer Delight, Competitive Advantage, World-Class Risk Management, Disruptive New Business Models. → Rapid Time to Value → Unlimited Scale → Extreme Performance → Disruptive price/performance → Modern GUI Development → In-memory Analytics → Extends Hadoop and NoSQL analytics → Complements Traditional → 200+ data connectors → 600+ analytic functions → Full deployment choice → Certification with broad set of analytics tools
  3. 3. Actian Matrix for High Performance Analytics at Any Scale Serve up high-performance analytic processing for any app On-Demand Analytics On-Demand Integration Orchestration Manage dataflows across the entire analytic process Connect to any data source at the point of the query 700+ in-database analytic functions Analytic Libraries Optimizer Massively Parallel LEADER NODE Columnar 5 LEVELS OF OPTIMIZATION: Compressed Compiled SQL Connected Planning Execution Communications Memory Node-to-node, bidirectional sharing of analytics & processes with Hadoop nodes Confidential © 2013 Actian Corporation
  4. 4. Actian DataFlow – High Speed Hadoop ETL, DQ, and Analytics, No Programming Actian Dataflow Choose from five sets of operators: Transformation & Analytics Libraries Connections Visual Framework Transformation Automatically detect resources, plan optimal utilization, and parallelize all workloads on Hadoop Data Quality Use dual pipeline parallelism to accelerate performance 10X Analytics Data Science Optimize Query Pipelining Manage the entire analytic process in a visual framework with no coding required. Hadoop – Leader Node Reuse and share all components from operators to workflows Take processing to where the data lives, runs natively on any Hadoop distribution Actian Accelerator for Hadoop Run fully optimized processing directly on the Hadoop node or on any file system CPU Pipelining Optimized, On-HDFS Processing
  5. 5. ACTIAN DATAFLOW – ETL & ANALYTICS
  6. 6. ACTIAN DATAFLOW – ETL & ANALYTICS • Predefined operators • Reduced IO • In-memory operations • Pipeline parallelism Hadoop 2.0 - what is the big deal? YARN – a new resource scheduler! (“Yet Another Resource Negotiator”) DATAFLOW Job Tracker and Task Tracker have been split up to increase scalability Remove MapReduce from core architecture Now there is a
  7. 7. Operator Library – ETL/DQ  Reading/Writing  Text Processing  Data Exploration  Data Matching  Aggregation  Filtering  Manipulation
  8. 8. Innovation Lab  Tactical mission: • Driving platform integration  Strategic mission: • Blueprint next-generation analytic apps & solution architectures • Advance new science where data and algorithms intersect • Solution demoware Confidential © 2014 Actian Corporation
  9. 9. BIG DATA 2.0 – “BOWTIE” ARCHITECTURE DBMS – SMP/MPP Time Series Event Logs ETL & Analytics Semantic Web
  10. 10. BIG DATA 2.0 – “BOWTIE” ARCHITECTURE EVERYTHING IS LOG DATA • Application logs • System monitoring • Real-time feeds Event Logs
  11. 11. BIG DATA 2.0 – “BOWTIE” ARCHITECTURE TIME-ORDERED PERSISTENCE • Schema-less flexibility • Extendable first-class citizens (Time, Location, Type) • Universal accessibility • Complete archive of raw events (diagram labels: DBMS – SMP/MPP, Time Series, Event Logs, ETL & Analytics, Semantic Web)
  12. 12. BIG DATA 2.0 – “BOWTIE” ARCHITECTURE VARIABLE OUTPUT TARGETS DBMS – SMP/MPP: Traditional DW loading; Time Series: Time window analysis; ETL & Analytics: Load, analyze, re-feed; Semantic Web: Patterns, graph traversal, visuals
  13. 13. BIG DATA 2.0 – “BOWTIE” ARCHITECTURE PUTTING A FACE TO THE NAME Actian Matrix OpenTSDB Event Logs Dataflow Analytics ACTIAN DATAFLOW SPARQLverse
  14. 14. DATA LOADING ACTIAN DATA CLOUD LOG FILE EXAMPLE  2013-08-01T03:38:42.236-0500  [74.95.141.217, 10.120.245.3]  User[id=2162,name=tmitchell]  login  57509328
  15. 15. DATA LOADING ACTIAN DATA CLOUD LOG FILE EXAMPLE  2013-08-01T03:38:42.236-0500 - Time  [74.95.141.217, 10.120.245.3] - Space  User[id=2162,name=tmitchell] - People  login - Activity  57509328 - Magnitude
  16. 16. DATA LOADING HBASE LOADER Dataflow workflow built into KNIME open source data mining app
  17. 17. DATA LOADING HBASE STRUCTURED Event Record  hasSource – IP Address  hasTime – timestamp  hasValue – full source  hasType – data cloud type  hasLoadTimestamp – timestamp
  18. 18. BIG DATA 2.0 – “BOWTIE” ARCHITECTURE PUTTING A FACE TO THE NAME Actian Matrix OpenTSDB Event Logs Dataflow Analytics Sparqlverse
  19. 19. HBASE TO OPENTSDB Optimized HBase reader, selects a time window and dumps to text files for serving to OpenTSDB Perpetual Load Service
  20. 20. EMIT TO OPENTSDB event.glassfish 1390373743 38720912 method=listUsers rowid=0548e8 id=79 (fields in order: metric name, timestamp, execution time, method called, row ID, user ID)
  21. 21. OPENTSDB UI
  22. 22. OPENTSDB UI
  23. 23. CUSTOM WEB VIZ Built using: • Autobahn Python Websockets • OpenTSDB Web API • D3 visualization
  24. 24. BIG DATA 2.0 – “BOWTIE” ARCHITECTURE PUTTING A FACE TO THE NAME Actian Matrix OpenTSDB Event Logs Dataflow Analytics Sparqlverse
  25. 25. Analytics Library
  26. 26. MACHINE LEARNING ON HBASE Observe Act!
  27. 27. BIG DATA 2.0 – “BOWTIE” ARCHITECTURE PUTTING A FACE TO THE NAME Actian Matrix OpenTSDB Event Logs Dataflow Analytics SPARQLverse * aka SPARQLBase.com
  28. 28. DATA LOADING RDF/SEMANTIC WEB LOADER RDF/Triples Writer Coming Soon
  29. 29. FROM LOG TO SPARQLVERSE From Single Record
  30. 30. TRIPLES EXAMPLE Agent <produces> Record Record <logsDataAbout> User Client <isCalledBy> User Client <requestsFrom> S Server <repliesTo> Cli
  31. 31. SAMPLE SPARQL QUERY SELECT (count(*) as ?cntCalls) (sum(?time) as ?timeSum) FROM <event> WHERE { ?record :logsDataAbout ?client . ?user :initiates ?client . ?record :exectime ?time . }
  32. 32. SAMPLE SPARQL QUERY … ?record :logsDataAbout ?client . ?user :initiates ?client . …
  33. 33. VISUALIZE DATA GRAPHS Gephi desktop UI - supports RDF import D3 web UI example
  34. 34. BIG DATA 2.0 – “BOWTIE” ARCHITECTURE PUTTING A FACE TO THE NAME Actian Matrix OpenTSDB Event Logs Dataflow Analytics Sparqlverse • Used behind Amazon Redshift
  35. 35. Actian Matrix for High Performance Analytics at Any Scale Serve up high-performance analytic processing for any app On-Demand Analytics On-Demand Integration Orchestration Manage dataflows across the entire analytic process Connect to any data source at the point of the query 700+ in-database analytic functions Analytic Libraries Optimizer Massively Parallel LEADER NODE Columnar 5 LEVELS OF OPTIMIZATION: Compressed Compiled SQL Connected Planning Execution Communications Memory Node-to-node, bidirectional sharing of analytics & processes with Hadoop nodes
  36. 36. EXPORT TO MATRIX/MPP HBASE TO MATRIX LOADER Load Matrix MPP
  37. 37. EXPORT TO MATRIX/MPP HBase to Matrix Loader
  38. 38. FUTURE DIRECTION
  39. 39. FUTURE DIRECTION • Real-time processing • Semantic event processing • Continued integration
  40. 40. THANK YOU www.actian.com facebook.com/actiancorp Tyler.Mitchell@actian.com Paul.Dingman@actian.com @actiancorp
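
The log line on slides 14 and 15 and the event record on slide 17 can be sketched in Python. This is a minimal illustration, not the Dataflow/KNIME loader shown in the deck. It assumes the five fields sit on one whitespace-separated line (the slides list them as separate bullets), that hasSource takes the first IP address of the bracketed pair, and that the data cloud type is a plain string label. The names `parse_log_line` and `to_event_record` are hypothetical.

```python
import re
from datetime import datetime, timezone

# Assumed layout: time, [ip, ip], User[id=..,name=..], activity, magnitude,
# separated by whitespace on a single line.
LOG_PATTERN = re.compile(
    r"(?P<time>\S+)\s+"
    r"\[(?P<space>[^\]]*)\]\s+"
    r"User\[id=(?P<user_id>\d+),name=(?P<user_name>[^\]]+)\]\s+"
    r"(?P<activity>\S+)\s+"
    r"(?P<magnitude>\d+)"
)


def parse_log_line(line):
    """Split one log line into the Time/Space/People/Activity/Magnitude fields."""
    match = LOG_PATTERN.fullmatch(line.strip())
    if match is None:
        raise ValueError("unrecognized log line: " + repr(line))
    return {
        # %z accepts an offset such as -0500
        "time": datetime.strptime(match["time"], "%Y-%m-%dT%H:%M:%S.%f%z"),
        "space": [ip.strip() for ip in match["space"].split(",")],
        "people": {"id": int(match["user_id"]), "name": match["user_name"]},
        "activity": match["activity"],
        "magnitude": int(match["magnitude"]),
    }


def to_event_record(line, data_cloud_type, load_time):
    """Build a dict keyed by the five attributes named on slide 17."""
    fields = parse_log_line(line)
    return {
        "hasSource": fields["space"][0],   # assumption: first IP is the source
        "hasTime": fields["time"].isoformat(),
        "hasValue": line,                  # the full source line
        "hasType": data_cloud_type,        # assumption: a free-form label
        "hasLoadTimestamp": load_time.isoformat(),
    }
```

Calling `to_event_record(line, "login", datetime.now(timezone.utc))` on the sample line returns a dictionary whose keys match the attributes of the structured event record; how the deck's loader maps them onto HBase columns is not shown in the slides.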

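The data point on slide 20 follows OpenTSDB's text layout of metric, timestamp, value, then one or more tag=value pairs. The sketch below formats such a line in Python. The function name `format_tsdb_line` is hypothetical, and writing one data point per line is an assumption, since the slides do not show the text files produced by the HBase reader.

```python
def format_tsdb_line(metric, timestamp, value, tags):
    """Return 'metric timestamp value tagk=tagv ...' for one data point."""
    if not tags:
        # OpenTSDB expects every data point to carry at least one tag.
        raise ValueError("at least one tag is required")
    tag_text = " ".join(f"{key}={val}" for key, val in tags.items())
    return f"{metric} {int(timestamp)} {value} {tag_text}"


# Rebuild the example from slide 20: execution time as the value,
# method, row ID and user ID as tags.
example = format_tsdb_line(
    "event.glassfish",
    1390373743,
    38720912,
    {"method": "listUsers", "rowid": "0548e8", "id": 79},
)
print(example)
```

Keeping the execution time as the value and the method, row ID and user ID as tags mirrors the field labels on the slide; tags are emitted in the order they are given.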