SlideShare a Scribd company logo
BUILDING AN HETEROGENEOUS
 HADOOP/OLAP SYSTEM WITH
    MICROSOFT'S BI STACK
WHO…

… AM I?
 •   SQL/BI Team Lead at Plain Concepts
 •   e-mail: pablod@plainconcepts.com
 •   Blog: http://geek.ms/blogs/palvarez
 •   Twitter: @PabloDoval

… ARE YOU?
 •   Quick Poll in the Room 
WHAT…

… ARE WE GOING TO SEE?




… I’M NOT GOING TO SHOW?
SOME PICS…
SHARP
Overview



 SCADA Historical Analysis and Reporting Platform


 Demonstrate the feasibility of a custom end to end global
 architecture:
 • SCADA: Local, Mobile and Central
 • Historical Data: High speed and High volume
 • Reporting
 • Analysis
SHARP
  MAGUS
                                                                           MongoDB
          MongoDB                                                          Capped collections
          Capped collections                                               For each Production Center
MAGUS
        2 months of 1s data
                                                                MAGUS      2 months of 1s data
                                                                 Central   1 year of 10m data
        1 year of 10m data


          MAGUS
          Local Operation
          Mobile Operation


           Production Center A
                                                                           MAGUS
                                                                           Remote Operation
           MongoDB
           Capped collections
MAGUS
           2 months of 1s data
           1 year of 10m data

                                                                 Mongo
           MAGUS                                                                 DAT Files
                                                                 Export
           Local Operation
           Mobile Operation

           Production Center B

                                 Production Centers   Central
SHARP
Historical Data
 MAGUS
                           MAGUS     Mongo
                           Central   Export

           Source 1



                           Loader             DAT
 Source2                                      DAT
                           Loader             DAT
           Source3
                                              DAT

                           Loader



                                                             DWH
                                                    Hadoop
 Source4
                           Loader
                                              DAT
           Source5         Loader             DAT
                                              DAT
                                              DAT
                           Loader
 Source6


                           Loader
           Source7




  Production Centers   Central
SHARP
    Analysis and Reporting
                                                   Events




                                                            Power
                                                            Pivot

                                     DWH
                     StreamInsight


                                               Microsoft
                                                Office




                                                            •   Dynamic reports
                                               Reporting
                                                            •   Scheduled reports
                                                Services    •   Automatic Distribution
                                     OLAP                   •   Multiformat (PDF, XLS, etc.)
                                     Tabular



                                               Power View

                                     OLAP
                                     Tabular
                                                             Future
                                                            ¿Cloud?



Production Centers   Central
INITIAL ASSESMENT

  Proof of
  Concept

  Microsoft
 Ecosystem


 On Premise
Infrastructure
TOOLS OF THE TRADE




                     PowerPivot




                     Power View
SO… WHAT DOES IT LOOK LIKE?
CURRENT SHARP IMPLEMENTATION

                      Map
                     Reduce




              HDFS
 Load
Service
                     HIVE               DWH
           Hadoop
  Azure
 Storage


                               SSIS




                              SSRS PowerView
LET’S TAKE A DEEPER LOOK…
FUTURE IMPROVEMENTS

New Analytical Processes


CEP Integration with Stream Insight


Improvements on the Higher Resolution data
COMPLEX EVENT PROCESSING
    StreamInsight
                                                   Events




                                                            Power
                                                            Pivot

                                     DWH
                     StreamInsight


                                               Microsoft
                                                Office




                                                            •   Dynamic reports
                                               Reporting
                                                            •   Scheduled reports
                                                Services    •   Automatic Distribution
                                     OLAP                   •   Multiformat (PDF, XLS, etc.)
                                     Tabular



                                               Power View

                                     OLAP
                                     Tabular
                                                             Future
                                                            ¿Cloud?



Production Centers   Central
COMPLEX EVENT PROCESSING
    StreamInsight
                                     Events




                     StreamInsight




Production Centers   Central
IMPROV. TO HIGHER RESOLUTION
DATA
The Goal

Ability to work with data in DW and Hive seamlessly and in a
performant way.




                                       Export
IMPROV. TO HIGHER RESOLUTION
DATA
Sqoop Refresher
IMPROV. TO HIGHER RESOLUTION
Sqoop with PDW…
DATA
                                Map/
        Sqoop                  Reduce
                                 Job




                                              …
     SQL Server
                  SQL Server     SQL Server       SQL Server
IMPROV. TO HIGHER RESOLUTION
DATA
Sqoop refresher…


                                             …
     SQL Server
                   SQL Server   SQL Server       SQL Server




                                Sqoop




  Hadoop Cluster
IMPROV. TO HIGHER RESOLUTION
The Goal – Polybase!
DATA
Ability to work with data in DW and Hive seamlessly and in a
performant way.
                       T-SQL Queries




                SQL Server
                  (PDW)
            SQL HDF
IMPROV. TO HIGHER RESOLUTION
DATA
Polybase parallelism via DMS


                                             …
     SQL Server
                   SQL Server   SQL Server       SQL Server




  Hadoop Cluster
IMPROV. TO HIGHER RESOLUTION
DATA
Parallelism
IMPROV. TO HIGHER RESOLUTION
That’s just the beginning…
DATA

Uses the same T-SQL Syntax to query both worlds at the same
time


The QO is able to check what data to push into what
environment to process optimally.
STORIES WE COULD TELL
What went right…
 Cloud Environment


 Tabular Model for OLAP


 SSIS for ETL via ODBC Hive Driver
STORIES WE COULD TELL
What was not so good…
 Mappers and Reducers in C# via Hadoop Streaming
CALL TO ACTION
LEARN MORE
 1.   Microsoft Big Data Solution: www.microsoft.com/bigdata
 2.   Windows Azure: www.windowsazure.com/en-
      us/home/scenarios/big-data

TRY NOW
 1. Preview of the Windows Azure HDInsight Service:
      https://www.hadooponazure.com

 2. Developer CTP of Microsoft HDInsight Server for Windows Server:
      http://www.microsoft.com/bigdata
Building a heterogeneous Hadoop Olap system with Microsoft BI stack. PABLO DOVAL & IBON LANDA at Big Data Spain 2012

More Related Content

What's hot

Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...
Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...
Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...
Krishnan Parasuraman
 
SPS- Share Point 2010 and Windows Azure
SPS- Share Point 2010 and Windows AzureSPS- Share Point 2010 and Windows Azure
SPS- Share Point 2010 and Windows AzureShakir Majeed Khan
 
Oracle BI Server By AORTA
Oracle BI Server By AORTAOracle BI Server By AORTA
Oracle BI Server By AORTA
Aorta business intelligence
 
SQL Server 2008 Fast Track Data Warehouse
SQL Server 2008 Fast Track Data WarehouseSQL Server 2008 Fast Track Data Warehouse
SQL Server 2008 Fast Track Data Warehouse
Mark Ginnebaugh
 
Real-Time Loading to Sybase IQ
Real-Time Loading to Sybase IQReal-Time Loading to Sybase IQ
Real-Time Loading to Sybase IQ
Sybase Türkiye
 
Liquidity Risk Management powered by SAP HANA
Liquidity Risk Management powered by SAP HANALiquidity Risk Management powered by SAP HANA
Liquidity Risk Management powered by SAP HANA
SAP Technology
 
Apache Hadoop Now Next and Beyond
Apache Hadoop Now Next and BeyondApache Hadoop Now Next and Beyond
Apache Hadoop Now Next and Beyond
DataWorks Summit
 
SQL Server Reporting Services (SSRS) 101
 SQL Server Reporting Services (SSRS) 101 SQL Server Reporting Services (SSRS) 101
SQL Server Reporting Services (SSRS) 101
Sparkhound Inc.
 
Sql server2008 r2_bi_datasheet_final
Sql server2008 r2_bi_datasheet_finalSql server2008 r2_bi_datasheet_final
Sql server2008 r2_bi_datasheet_finalKlaudiia Jacome
 

What's hot (9)

Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...
Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...
Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...
 
SPS- Share Point 2010 and Windows Azure
SPS- Share Point 2010 and Windows AzureSPS- Share Point 2010 and Windows Azure
SPS- Share Point 2010 and Windows Azure
 
Oracle BI Server By AORTA
Oracle BI Server By AORTAOracle BI Server By AORTA
Oracle BI Server By AORTA
 
SQL Server 2008 Fast Track Data Warehouse
SQL Server 2008 Fast Track Data WarehouseSQL Server 2008 Fast Track Data Warehouse
SQL Server 2008 Fast Track Data Warehouse
 
Real-Time Loading to Sybase IQ
Real-Time Loading to Sybase IQReal-Time Loading to Sybase IQ
Real-Time Loading to Sybase IQ
 
Liquidity Risk Management powered by SAP HANA
Liquidity Risk Management powered by SAP HANALiquidity Risk Management powered by SAP HANA
Liquidity Risk Management powered by SAP HANA
 
Apache Hadoop Now Next and Beyond
Apache Hadoop Now Next and BeyondApache Hadoop Now Next and Beyond
Apache Hadoop Now Next and Beyond
 
SQL Server Reporting Services (SSRS) 101
 SQL Server Reporting Services (SSRS) 101 SQL Server Reporting Services (SSRS) 101
SQL Server Reporting Services (SSRS) 101
 
Sql server2008 r2_bi_datasheet_final
Sql server2008 r2_bi_datasheet_finalSql server2008 r2_bi_datasheet_final
Sql server2008 r2_bi_datasheet_final
 

Similar to Building a heterogeneous Hadoop Olap system with Microsoft BI stack. PABLO DOVAL & IBON LANDA at Big Data Spain 2012

OpenSpan - A Better Way to Work, A Better Way to Manage
OpenSpan - A Better Way to Work, A Better Way to ManageOpenSpan - A Better Way to Work, A Better Way to Manage
OpenSpan - A Better Way to Work, A Better Way to ManageFrank Wagman
 
MEW22 22nd Machine Evaluation Workshop Microsoft
MEW22 22nd Machine Evaluation Workshop MicrosoftMEW22 22nd Machine Evaluation Workshop Microsoft
MEW22 22nd Machine Evaluation Workshop Microsoft
Lee Stott
 
SnapLogic corporate presentation
SnapLogic corporate presentationSnapLogic corporate presentation
SnapLogic corporate presentation
pbridges
 
hadoop @ Ibmbigdata
hadoop @ Ibmbigdatahadoop @ Ibmbigdata
hadoop @ Ibmbigdata
Eric Baldeschwieler
 
SAP Sybase Event Streaming Processing
SAP Sybase Event Streaming ProcessingSAP Sybase Event Streaming Processing
SAP Sybase Event Streaming ProcessingSybase Türkiye
 
Hadoop - Now, Next and Beyond
Hadoop - Now, Next and BeyondHadoop - Now, Next and Beyond
Hadoop - Now, Next and Beyond
Teradata Aster
 
CA Nimsoft xen desktop monitoring
CA Nimsoft xen desktop monitoring CA Nimsoft xen desktop monitoring
CA Nimsoft xen desktop monitoring CA Nimsoft
 
Hadoop World 2011: Data Ingestion, Egression, and Preparation for Hadoop - Sa...
Hadoop World 2011: Data Ingestion, Egression, and Preparation for Hadoop - Sa...Hadoop World 2011: Data Ingestion, Egression, and Preparation for Hadoop - Sa...
Hadoop World 2011: Data Ingestion, Egression, and Preparation for Hadoop - Sa...
Cloudera, Inc.
 
Jaspersoft Dashboards Webinar Feb 2013
Jaspersoft Dashboards Webinar  Feb 2013Jaspersoft Dashboards Webinar  Feb 2013
Jaspersoft Dashboards Webinar Feb 2013Mike Boyarski
 
21st Century SOA
21st Century SOA21st Century SOA
21st Century SOA
Bob Rhubart
 
Sybase Complex Event Processing
Sybase Complex Event ProcessingSybase Complex Event Processing
Sybase Complex Event Processing
Sybase Türkiye
 
Con2012 salmon impact_of_sap_hana_revolutionary_changes_for_sap_controlling_data
Con2012 salmon impact_of_sap_hana_revolutionary_changes_for_sap_controlling_dataCon2012 salmon impact_of_sap_hana_revolutionary_changes_for_sap_controlling_data
Con2012 salmon impact_of_sap_hana_revolutionary_changes_for_sap_controlling_data
aadamserpcorp
 
Couchbase Server and IBM BigInsights: One + One = Three
Couchbase Server and IBM BigInsights: One + One = ThreeCouchbase Server and IBM BigInsights: One + One = Three
Couchbase Server and IBM BigInsights: One + One = Three
Dipti Borkar
 
21st Century Service Oriented Architecture
21st Century Service Oriented Architecture21st Century Service Oriented Architecture
21st Century Service Oriented Architecture
Bob Rhubart
 
Data Ingestion, Extraction & Parsing on Hadoop
Data Ingestion, Extraction & Parsing on HadoopData Ingestion, Extraction & Parsing on Hadoop
Data Ingestion, Extraction & Parsing on Hadoopskaluska
 
Controlling 2012 Impact of SAP HANA
Controlling 2012 Impact of SAP HANAControlling 2012 Impact of SAP HANA
Controlling 2012 Impact of SAP HANA
John Jordan
 
Impact of in-memory technology and SAP HANA on your business, IT, and career
Impact of in-memory technology and SAP HANA on your business, IT, and careerImpact of in-memory technology and SAP HANA on your business, IT, and career
Impact of in-memory technology and SAP HANA on your business, IT, and career
Vitaliy Rudnytskiy
 
How Service Mesh Fits into the Modern Data Stack
How Service Mesh Fits into the Modern Data StackHow Service Mesh Fits into the Modern Data Stack
How Service Mesh Fits into the Modern Data Stack
Fabian Hardt
 

Similar to Building a heterogeneous Hadoop Olap system with Microsoft BI stack. PABLO DOVAL & IBON LANDA at Big Data Spain 2012 (20)

OpenSpan - A Better Way to Work, A Better Way to Manage
OpenSpan - A Better Way to Work, A Better Way to ManageOpenSpan - A Better Way to Work, A Better Way to Manage
OpenSpan - A Better Way to Work, A Better Way to Manage
 
MEW22 22nd Machine Evaluation Workshop Microsoft
MEW22 22nd Machine Evaluation Workshop MicrosoftMEW22 22nd Machine Evaluation Workshop Microsoft
MEW22 22nd Machine Evaluation Workshop Microsoft
 
SnapLogic corporate presentation
SnapLogic corporate presentationSnapLogic corporate presentation
SnapLogic corporate presentation
 
hadoop @ Ibmbigdata
hadoop @ Ibmbigdatahadoop @ Ibmbigdata
hadoop @ Ibmbigdata
 
Cloud computing era
Cloud computing eraCloud computing era
Cloud computing era
 
SAP Sybase Event Streaming Processing
SAP Sybase Event Streaming ProcessingSAP Sybase Event Streaming Processing
SAP Sybase Event Streaming Processing
 
Zh tw cloud computing era
Zh tw cloud computing eraZh tw cloud computing era
Zh tw cloud computing era
 
Hadoop - Now, Next and Beyond
Hadoop - Now, Next and BeyondHadoop - Now, Next and Beyond
Hadoop - Now, Next and Beyond
 
CA Nimsoft xen desktop monitoring
CA Nimsoft xen desktop monitoring CA Nimsoft xen desktop monitoring
CA Nimsoft xen desktop monitoring
 
Hadoop World 2011: Data Ingestion, Egression, and Preparation for Hadoop - Sa...
Hadoop World 2011: Data Ingestion, Egression, and Preparation for Hadoop - Sa...Hadoop World 2011: Data Ingestion, Egression, and Preparation for Hadoop - Sa...
Hadoop World 2011: Data Ingestion, Egression, and Preparation for Hadoop - Sa...
 
Jaspersoft Dashboards Webinar Feb 2013
Jaspersoft Dashboards Webinar  Feb 2013Jaspersoft Dashboards Webinar  Feb 2013
Jaspersoft Dashboards Webinar Feb 2013
 
21st Century SOA
21st Century SOA21st Century SOA
21st Century SOA
 
Sybase Complex Event Processing
Sybase Complex Event ProcessingSybase Complex Event Processing
Sybase Complex Event Processing
 
Con2012 salmon impact_of_sap_hana_revolutionary_changes_for_sap_controlling_data
Con2012 salmon impact_of_sap_hana_revolutionary_changes_for_sap_controlling_dataCon2012 salmon impact_of_sap_hana_revolutionary_changes_for_sap_controlling_data
Con2012 salmon impact_of_sap_hana_revolutionary_changes_for_sap_controlling_data
 
Couchbase Server and IBM BigInsights: One + One = Three
Couchbase Server and IBM BigInsights: One + One = ThreeCouchbase Server and IBM BigInsights: One + One = Three
Couchbase Server and IBM BigInsights: One + One = Three
 
21st Century Service Oriented Architecture
21st Century Service Oriented Architecture21st Century Service Oriented Architecture
21st Century Service Oriented Architecture
 
Data Ingestion, Extraction & Parsing on Hadoop
Data Ingestion, Extraction & Parsing on HadoopData Ingestion, Extraction & Parsing on Hadoop
Data Ingestion, Extraction & Parsing on Hadoop
 
Controlling 2012 Impact of SAP HANA
Controlling 2012 Impact of SAP HANAControlling 2012 Impact of SAP HANA
Controlling 2012 Impact of SAP HANA
 
Impact of in-memory technology and SAP HANA on your business, IT, and career
Impact of in-memory technology and SAP HANA on your business, IT, and careerImpact of in-memory technology and SAP HANA on your business, IT, and career
Impact of in-memory technology and SAP HANA on your business, IT, and career
 
How Service Mesh Fits into the Modern Data Stack
How Service Mesh Fits into the Modern Data StackHow Service Mesh Fits into the Modern Data Stack
How Service Mesh Fits into the Modern Data Stack
 

More from Big Data Spain

Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data Spain
 
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Big Data Spain
 
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
Big Data Spain
 
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Big Data Spain
 
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Big Data Spain
 
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Big Data Spain
 
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Big Data Spain
 
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Big Data Spain
 
State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...
Big Data Spain
 
Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...
Big Data Spain
 
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Big Data Spain
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
Big Data Spain
 
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Big Data Spain
 
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Big Data Spain
 
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Big Data Spain
 
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Big Data Spain
 
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
Big Data Spain
 
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Big Data Spain
 
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
Big Data Spain
 
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Big Data Spain
 

More from Big Data Spain (20)

Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
 
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
 
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
 
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
 
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
 
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
 
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
 
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
 
State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...
 
Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...
 
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
 
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
 
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
 
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
 
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
 
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
 
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
 
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
 

Recently uploaded

Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 

Recently uploaded (20)

Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 

Building a heterogeneous Hadoop Olap system with Microsoft BI stack. PABLO DOVAL & IBON LANDA at Big Data Spain 2012

  • 1. BUILDING AN HETEROGENEOUS HADOOP/OLAP SYSTEM WITH MICROSOFT'S BI STACK
  • 2. WHO… … AM I? • SQL/BI Team Lead at Plain Concepts • e-mail: pablod@plainconcepts.com • Blog: http://geek.ms/blogs/palvarez • Twitter: @PabloDoval … ARE YOU? • Quick Poll in the Room 
  • 3. WHAT… … ARE WE GOING TO SEE? … I’M NOT GOING TO SHOW?
  • 4.
  • 6. SHARP Overview SCADA Historical Analysis and Reporting Platform Demonstrate the feasibility of a custom end to end global architecture: • SCADA: Local, Mobile and Central • Historical Data: High speed and High volume • Reporting • Analysis
  • 7. SHARP MAGUS MongoDB MongoDB Capped collections Capped collections For each Production Center MAGUS 2 months of 1s data MAGUS 2 months of 1s data Central 1 year of 10m data 1 year of 10m data MAGUS Local Operation Mobile Operation Production Center A MAGUS Remote Operation MongoDB Capped collections MAGUS 2 months of 1s data 1 year of 10m data Mongo MAGUS DAT Files Export Local Operation Mobile Operation Production Center B Production Centers Central
  • 8. SHARP Historical Data MAGUS MAGUS Mongo Central Export Source 1 Loader DAT Source2 DAT Loader DAT Source3 DAT Loader DWH Hadoop Source4 Loader DAT Source5 Loader DAT DAT DAT Loader Source6 Loader Source7 Production Centers Central
  • 9. SHARP Analysis and Reporting Events Power Pivot DWH StreamInsight Microsoft Office • Dynamic reports Reporting • Scheduled reports Services • Automatic Distribution OLAP • Multiformat (PDF, XLS, etc.) Tabular Power View OLAP Tabular Future ¿Cloud? Production Centers Central
  • 10. INITIAL ASSESMENT Proof of Concept Microsoft Ecosystem On Premise Infrastructure
  • 11. TOOLS OF THE TRADE PowerPivot Power View
  • 12.
  • 13. SO… WHAT DOES IT LOOK LIKE?
  • 14. CURRENT SHARP IMPLEMENTATION Map Reduce HDFS Load Service HIVE DWH Hadoop Azure Storage SSIS SSRS PowerView
  • 15. LET’S TAKE A DEEPER LOOK…
  • 16. FUTURE IMPROVEMENTS New Analytical Processes CEP Integration with Stream Insight Improvements on the Higher Resolution data
  • 17. COMPLEX EVENT PROCESSING StreamInsight Events Power Pivot DWH StreamInsight Microsoft Office • Dynamic reports Reporting • Scheduled reports Services • Automatic Distribution OLAP • Multiformat (PDF, XLS, etc.) Tabular Power View OLAP Tabular Future ¿Cloud? Production Centers Central
  • 18. COMPLEX EVENT PROCESSING StreamInsight Events StreamInsight Production Centers Central
  • 19. IMPROV. TO HIGHER RESOLUTION DATA The Goal Ability to work with data in DW and Hive seamlessly and in a performant way. Export
  • 20. IMPROV. TO HIGHER RESOLUTION DATA Sqoop Refresher
  • 21. IMPROV. TO HIGHER RESOLUTION Sqoop with PDW… DATA Map/ Sqoop Reduce Job … SQL Server SQL Server SQL Server SQL Server
  • 22. IMPROV. TO HIGHER RESOLUTION DATA Sqoop refresher… … SQL Server SQL Server SQL Server SQL Server Sqoop Hadoop Cluster
  • 23. IMPROV. TO HIGHER RESOLUTION The Goal – Polybase! DATA Ability to work with data in DW and Hive seamlessly and in a performant way. T-SQL Queries SQL Server (PDW) SQL HDF
  • 24. IMPROV. TO HIGHER RESOLUTION DATA Polybase parallelism via DMS … SQL Server SQL Server SQL Server SQL Server Hadoop Cluster
  • 25. IMPROV. TO HIGHER RESOLUTION DATA Parallelism
  • 26. IMPROV. TO HIGHER RESOLUTION That’s just the beginning… DATA Uses the same T-SQL Syntax to query both worlds at the same time The QO is able to check what data to push into what environment to process optimally.
  • 27. STORIES WE COULD TELL What went right… Cloud Environment Tabular Model for OLAP SSIS for ETL via ODBC Hive Driver
  • 28. STORIES WE COULD TELL What was not so good… Mappers and Reducers in C# via Hadoop Streaming
  • 29. CALL TO ACTION LEARN MORE 1. Microsoft Big Data Solution: www.microsoft.com/bigdata 2. Windows Azure: www.windowsazure.com/en- us/home/scenarios/big-data TRY NOW 1. Preview of the Windows Azure HDInsight Service: https://www.hadooponazure.com 2. Developer CTP of Microsoft HDInsight Server for Windows Server: http://www.microsoft.com/bigdata

Editor's Notes

  1. Usual presentation and contactstuf…Greet Ibon, he couldn’tmakeitto Madrid.Threequestions: - Are youengaged in any Hadoop projects? - HaveyouplayedwithMicrosoft’s Hadoop Distribution - Didyouknowtherewas a Microsoft’s Hadoop Distribution? ;)Microsoft’s Big Data IncubationProgram.
  2. Development as a Proof of Concept allowsfor new scenariosto be thought and developed in futureiterationswith mínimum risk.Wewouldstartwith a 10Min data storage and DataWarehouse, and 1Min data storage. Thenanalytical proceses.
  3. Show HDInsightService and HDInsight Server.