Building a heterogeneous Hadoop Olap system with Microsoft BI stack. PABLO DOVAL & IBON LANDA at Big Data Spain 2012

Session presented at Big Data Spain 2012 Conference
16th Nov 2012
ETSI Telecomunicacion UPM Madrid
www.bigdataspain.org
More info: http://www.bigdataspain.org/es-2012/conference/building-a-heterogeneous-hadoop-olap-system-with-microsoft-bi-stack/pablo-doval-and-ibon-landa

Published in: Technology

  • Usual presentation and contact stuff… Greet Ibon, he couldn't make it to Madrid. Three questions: - Are you engaged in any Hadoop projects? - Have you played with Microsoft's Hadoop Distribution? - Did you know there was a Microsoft Hadoop Distribution? ;) Microsoft's Big Data Incubation Program.
  • Development as a Proof of Concept allows new scenarios to be thought out and developed in future iterations with minimum risk. We would start with a 10 min data storage and data warehouse, and a 1 min data storage. Then analytical processes.
  • Show HDInsight Service and HDInsight Server.

    1. BUILDING A HETEROGENEOUS HADOOP/OLAP SYSTEM WITH MICROSOFT'S BI STACK
    2. WHO… … AM I? • SQL/BI Team Lead at Plain Concepts • e-mail: pablod@plainconcepts.com • Blog: http://geek.ms/blogs/palvarez • Twitter: @PabloDoval … ARE YOU? • Quick poll in the room
    3. WHAT… … ARE WE GOING TO SEE? … I'M NOT GOING TO SHOW?
    4. SOME PICS…
    5. SHARP Overview: SCADA Historical Analysis and Reporting Platform. Demonstrate the feasibility of a custom end-to-end global architecture: • SCADA: Local, Mobile and Central • Historical Data: High speed and High volume • Reporting • Analysis
    6. SHARP (architecture diagram; labels: Production Centers A and B, each with MAGUS, Local Operation, Mobile Operation and MongoDB capped collections holding 2 months of 1s data and 1 year of 10m data; Central with MAGUS, Remote Operation and MongoDB capped collections; Mongo export to DAT files)
    7. SHARP Historical Data (diagram; labels: Production Centers with MAGUS and Mongo Export; Sources 1 to 7 delivering DAT files; Central with Loaders feeding DWH and Hadoop)
    8. SHARP Analysis and Reporting (diagram; labels: Events, StreamInsight, DWH, OLAP, Tabular, Power Pivot, Power View, Microsoft Office; Reporting Services: • Dynamic reports • Scheduled reports • Automatic distribution • Multiformat (PDF, XLS, etc.); Future: Cloud?; Production Centers, Central)
    9. INITIAL ASSESSMENT • Proof of Concept • Microsoft Ecosystem • On-Premise Infrastructure
    10. TOOLS OF THE TRADE • PowerPivot • Power View
    11. SO… WHAT DOES IT LOOK LIKE?
    12. CURRENT SHARP IMPLEMENTATION (diagram; labels: Map Reduce, HDFS, LoadService, HIVE, DWH, Hadoop, Azure Storage, SSIS, SSRS, PowerView)
    13. LET'S TAKE A DEEPER LOOK…
    14. FUTURE IMPROVEMENTS • New Analytical Processes • CEP Integration with StreamInsight • Improvements on the Higher Resolution data
    15. COMPLEX EVENT PROCESSING: StreamInsight (diagram; labels: Events, StreamInsight, DWH, OLAP, Tabular, Power Pivot, Power View, Microsoft Office; Reporting Services: • Dynamic reports • Scheduled reports • Automatic distribution • Multiformat (PDF, XLS, etc.); Future: Cloud?; Production Centers, Central)
    16. COMPLEX EVENT PROCESSING: StreamInsight (diagram; labels: Events, StreamInsight, Production Centers, Central)
    17. IMPROV. TO HIGHER RESOLUTION DATA: The Goal. Ability to work with data in DW and Hive seamlessly and in a performant way. (diagram label: Export)
    18. IMPROV. TO HIGHER RESOLUTION DATA: Sqoop Refresher
    19. IMPROV. TO HIGHER RESOLUTION DATA: Sqoop with PDW… (diagram; labels: Sqoop, Map/Reduce Job, … SQL Server ×4)
    20. IMPROV. TO HIGHER RESOLUTION DATA: Sqoop refresher… (diagram; labels: … SQL Server ×4, Sqoop, Hadoop Cluster)
    21. IMPROV. TO HIGHER RESOLUTION DATA: The Goal – Polybase! Ability to work with data in DW and Hive seamlessly and in a performant way. (diagram; labels: T-SQL Queries, SQL Server (PDW), SQL, HDF)
    22. IMPROV. TO HIGHER RESOLUTION DATA: Polybase parallelism via DMS (diagram; labels: … SQL Server ×4, Hadoop Cluster)
    23. IMPROV. TO HIGHER RESOLUTION DATA: Parallelism
    24. IMPROV. TO HIGHER RESOLUTION DATA: That's just the beginning… • Uses the same T-SQL syntax to query both worlds at the same time • The QO is able to check what data to push into what environment to process optimally.
    25. STORIES WE COULD TELL: What went right… • Cloud Environment • Tabular Model for OLAP • SSIS for ETL via ODBC Hive Driver
    26. STORIES WE COULD TELL: What was not so good… • Mappers and Reducers in C# via Hadoop Streaming
    27. CALL TO ACTION. LEARN MORE: 1. Microsoft Big Data Solution: www.microsoft.com/bigdata 2. Windows Azure: www.windowsazure.com/en-us/home/scenarios/big-data TRY NOW: 1. Preview of the Windows Azure HDInsight Service: https://www.hadooponazure.com 2. Developer CTP of Microsoft HDInsight Server for Windows Server: http://www.microsoft.com/bigdata
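
Slide 26 mentions mappers and reducers written in C# and run through Hadoop Streaming, but the deck does not show them. As a rough illustration of the Hadoop Streaming contract only (lines in, tab-separated key/value lines out, reducer input sorted by key), here is a minimal sketch in Python rather than C#. The record layout `epoch_seconds,tag,value`, the 10-minute bucket size (borrowed from the "10m data" label on slide 6) and the function names are all assumptions made for this example, not the SHARP implementation.

```python
import sys
from itertools import groupby

# Assumption: roll raw readings up into 10-minute buckets.
BUCKET_SECONDS = 600


def map_line(line):
    """Map one hypothetical 'epoch_seconds,tag,value' record to 'tag|bucket<TAB>value'."""
    ts, tag, value = line.strip().split(",")
    bucket = int(ts) // BUCKET_SECONDS * BUCKET_SECONDS
    return f"{tag}|{bucket}\t{float(value)}"


def reduce_lines(sorted_lines):
    """Average the values per key. Input must already be sorted by key,
    which is what Hadoop Streaming hands to a reducer."""
    pairs = (line.rstrip("\n").split("\t") for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        values = [float(v) for _, v in group]
        yield f"{key}\t{sum(values) / len(values)}"


# Run as a streaming step only when explicitly asked to: 'map' or 'reduce'.
if __name__ == "__main__" and len(sys.argv) > 1 and sys.argv[1] in ("map", "reduce"):
    if sys.argv[1] == "map":
        for raw in sys.stdin:
            if raw.strip():
                print(map_line(raw))
    else:
        for out in reduce_lines(sys.stdin):
            print(out)
```

Saved under a hypothetical name such as `rollup.py`, the script could be tried locally with a shell pipeline of the form `cat readings.csv | python rollup.py map | sort | python rollup.py reduce`, where `sort` stands in for the shuffle phase that Hadoop performs between the two steps.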
