BIG DATA


Wesley Backelant
Technology Advisor
Microsoft
@WesleyBackelant

Nathan Bijnens
Big Data Consultant
DataCrunchers
@nathan_gs
AGENDA

•   Big Data
•   Hadoop (& Ecosystem)
•   How does it fit in the Microsoft world?
•   Demo
•   Resources
•   Q&A
THE WORLD OF DATA IS CHANGING
TODAY, A NEW SET OF QUESTIONS IS BEING ASKED OF
THE BUSINESS:


•   What’s the social sentiment for my brand or products?
•   How do I better predict future outcomes?
•   How do I optimize my fleet based on weather and traffic patterns?
TRANSFORMATION OF ONLINE MARKETING




            BLOGS.FORBES.COM/DAVEFEINLEIB
TRANSFORMATION OF OPERATIONS




            BLOGS.FORBES.COM/DAVEFEINLEIB
TRANSFORMATION OF CUSTOMER SERVICE




            BLOGS.FORBES.COM/DAVEFEINLEIB
TRANSFORMATION OF ENERGY
TRANSFORMATION OF FRAUD DETECTION




Then…                 Now…
NEW HARDWARE APPROACH
Traditional                Big Data
  Exotic HW                 Commodity HW
   • Big central servers         • racks of pizza boxes
   • SAN                         • Ethernet
   • RAID                        • JBOD
 Hardware reliability       Unreliable HW
 Limited scalability        Scales further
                            Cost effective
NEW SOFTWARE APPROACH
Traditional             Big Data
  Monolithic             Distributed
   • Centralized            - storage & compute nodes
   • RDBMS               Raw data
 Schema first
 Proprietary
HADOOP & BIG DATA ECOSYSTEM




               MapReduce


              HDFS
HDFS
MAPREDUCE
HIVE

A data warehouse infrastructure built on top of
 Hadoop for providing data summarization, query, and
 analysis.
   – Ideal for ad hoc querying
   – Query execution via MapReduce.

Key Building Principles:
   – SQL
   – Extensibility
       – Types
       – Functions
       – Scripts
HIVE

It supports many SQL features like:
    – Data partitioning
    – Aggregations
    – Grouping
    – Joins
HIVE

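It is also extensible through user-defined functions (UDFs).
 package com.example.hive.udf;

 import org.apache.hadoop.hive.ql.exec.UDF;
 import org.apache.hadoop.io.Text;

 // Example UDF: returns its string argument in lower case.
 public final class Lower extends UDF {
   // Hive calls evaluate() once per row; a NULL input yields NULL.
   public Text evaluate(final Text s) {
     if (s == null) { return null; }
     return new Text(s.toString().toLowerCase());
   }
 }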




There are many UDFs published by external parties, for:
- Loading / Saving (SerDe)
- Field Transformations
HADOOP PIG: INTRO




Pig is a high-level data flow language.
HADOOP PIG: 3 COMPONENTS

• Pig Latin

• Grunt

• PigServer
HADOOP PIG


data = LOAD 'employee.csv' USING PigStorage(',') AS (
                 first_name:chararray,
                 last_name:chararray,
                 age:int,
                 wage:float,
                 department:chararray
         );
HADOOP PIG



grouped_by_department = GROUP data BY department;

total_wage_by_department =
         FOREACH grouped_by_department
         GENERATE
                  group AS department,
                  COUNT(data) AS employee_count,
                  SUM(data.wage) AS total_wage;

total_ordered = ORDER total_wage_by_department BY total_wage;

total_limited = LIMIT total_ordered 10;
HADOOP PIG



DUMP total_limited;

STORE total_limited INTO '/test/';
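
For readers who do not know Pig Latin, the same data flow can be sketched in plain Python. This is only an illustration of what the load, group, aggregate, order and limit steps compute. It runs on a single machine, whereas Pig turns the script into MapReduce jobs. The sample rows and the function name are made up for the example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for employee.csv
# (first_name, last_name, age, wage, department).
SAMPLE_CSV = """Ann,Lee,34,3000.0,sales
Bob,Ray,45,4000.0,sales
Cid,Fox,29,3500.0,it
"""

def total_wage_by_department(csv_text, limit=10):
    # GROUP data BY department
    groups = defaultdict(list)
    for first_name, last_name, age, wage, department in csv.reader(io.StringIO(csv_text)):
        groups[department].append(float(wage))
    # FOREACH ... GENERATE department, employee count, total wage
    totals = [(dept, len(wages), sum(wages)) for dept, wages in groups.items()]
    # ORDER ... BY total_wage (ascending), then LIMIT
    totals.sort(key=lambda row: row[2])
    return totals[:limit]

result = total_wage_by_department(SAMPLE_CSV)
```

Each element of `result` is a `(department, employee_count, total_wage)` triple, matching the three fields generated by the Pig script.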
UDF

●   Custom Load and Store classes.
    ●   HBase
    ●   ProtocolBuffers
    ●   CombinedLog
●   Custom extraction
    e.g. date, ...


●   Take a look at the PiggyBank.
HBASE

  A distributed, versioned, column-oriented
   database.
• Main features:
  • Horizontal scalability
  • Machine failure tolerance
  • Row-level atomic operations including compare-and-swap ops like
    incrementing counters
  • Augmented key-value schemas: the user can group columns into families, which
    are configured independently
  • Multiple clients like its native Java library, Thrift, and REST
  • Upcoming Security
STORM

•   Message passing.
•   Distributed processing.
•   Horizontally scalable.
•   Incremental algorithms.
•   Fast.

• Data in motion.
STORM




[Diagram: Nimbus and Zookeeper at the top; below them three Worker Nodes, each with one Supervisor and several Workers.]
STORM

• Tuple




• Stream
STORM

• Spout




• Bolt
STORM

• Grouping
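
The five terms above can be illustrated with a toy, single-process sketch. It does not use the Storm API; the names `sentence_spout`, `SplitBolt`, `CountBolt` and `fields_grouping` are hypothetical. Here a tuple is a Python tuple, a stream is an iterator of tuples, a spout produces a stream, a bolt consumes tuples and may emit new ones, and a grouping decides which task of the next bolt receives each tuple.

```python
from collections import Counter
import zlib

def sentence_spout():
    # Spout: the source of a stream; every emitted item is a tuple.
    for sentence in ["data in motion", "data at rest"]:
        yield (sentence,)

class SplitBolt:
    # Bolt: consumes one tuple and emits zero or more new tuples.
    def process(self, tup):
        for word in tup[0].split():
            yield (word,)

class CountBolt:
    # Bolt with incremental state: a running count per word.
    def __init__(self):
        self.counts = Counter()

    def process(self, tup):
        self.counts[tup[0]] += 1

def fields_grouping(tup, n_tasks):
    # Grouping: tuples with the same field value always reach the same task.
    return zlib.crc32(tup[0].encode("utf-8")) % n_tasks

def run_topology(n_count_tasks=2):
    splitter = SplitBolt()
    counters = [CountBolt() for _ in range(n_count_tasks)]
    for sentence_tuple in sentence_spout():
        for word_tuple in splitter.process(sentence_tuple):
            task = fields_grouping(word_tuple, n_count_tasks)
            counters[task].process(word_tuple)
    return counters

counters = run_topology()
merged = sum((c.counts for c in counters), Counter())
```

Because the grouping is keyed on the word, each word is counted by exactly one `CountBolt` task, which is what allows the counting to be spread over several workers.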
A DATA SYSTEM
DATA IS MORE THAN INFORMATION




            Not all information is equal.
      Some information is derived from other pieces of information.
DATA IS MORE THAN INFORMATION




Eventually you will reach the most ‘raw’
         form of information.
      This is the information you hold true, simply because it exists.
                 Let’s call this ‘data’, very similar to ‘event’.
EVENTS
Everything we do generates events:
  •   Pay with Credit Card
  •   Commit to Git
  •   Click on a webpage
  •   Tweet
EVENTS - BEFORE




       Events used to manipulate
            the master data.
EVENTS - AFTER




       Today, events are the master
                  data.
DATA SYSTEM




         Let’s store everything.
EVENTS




         Data is Immutable
EVENTS




         Data is Time Based
CAPTURING CHANGE TRADITIONALLY




Person   Location          Person   Location
Nathan   Antwerp           Nathan   Ghent
Geert    Dendermonde       Geert    Dendermonde
John     Ghent             John     Ghent
CAPTURING CHANGE




Person   Location      Timestamp    Person   Location      Time
Nathan   Antwerp       2005-01-01   Nathan   Antwerp       2005-01-01
Geert    Dendermonde   2011-10-08   Geert    Dendermonde   2011-10-08
John     Ghent         2010-05-02   John     Ghent         2010-05-02
                                    Nathan   Ghent         2013-02-03
QUERY




        The data you query is often
        transformed, aggregated, ...
            Rarely used in its original form.
QUERY




   Query = function ( data )
NUMBER OF PEOPLE LIVING IN EACH CITY.




Person   Location      Time         Location      Count
Nathan   Antwerp       2005-01-01   Ghent         2
Geert    Dendermonde   2011-10-08   Dendermonde   1
John     Ghent         2010-05-02
Nathan   Ghent         2013-02-03
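
The idea that a query is a function of all the data can be made concrete with the four facts from this table. The sketch below is one possible way to compute the result, assuming that a person lives at the location of their most recent fact; the function name `people_per_city` is hypothetical.

```python
from collections import Counter

# The four timestamped facts from the table: (person, location, time).
EVENTS = [
    ("Nathan", "Antwerp", "2005-01-01"),
    ("Geert", "Dendermonde", "2011-10-08"),
    ("John", "Ghent", "2010-05-02"),
    ("Nathan", "Ghent", "2013-02-03"),
]

def people_per_city(events):
    # Replay the facts in time order and keep the latest location per person.
    # ISO dates sort correctly as plain strings.
    latest = {}
    for person, location, time in sorted(events, key=lambda e: e[2]):
        latest[person] = location
    return Counter(latest.values())

counts = people_per_city(EVENTS)
```

Nothing is updated in place: the older Antwerp fact for Nathan stays in the data set, and the query simply prefers the newer one.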
QUERY




        All Data → Query
QUERY: PRECOMPUTE




     All Data → Precomputed View → Query
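
A minimal sketch of the precompute idea, using hypothetical click events: one expensive pass over all data builds a view, after which each query is a single key lookup. The event layout and the names `precompute_view` and `query` are assumptions for illustration.

```python
from collections import Counter

# Hypothetical click events: (url, ISO timestamp).
CLICKS = [
    ("/home", "2013-02-03T10:05"),
    ("/home", "2013-02-03T10:40"),
    ("/demo", "2013-02-03T10:59"),
    ("/home", "2013-02-03T11:01"),
]

def precompute_view(all_data):
    # Expensive step, run rarely: one full scan builds (url, hour) -> count.
    return Counter((url, ts[:13]) for url, ts in all_data)

def query(view, url, hour):
    # Cheap step, run often: a single key lookup, no scan of all data.
    return view[(url, hour)]

view = precompute_view(CLICKS)
```

For example, `query(view, "/home", "2013-02-03T10")` reads the count for that URL and hour straight from the view.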
LAYERED ARCHITECTURE




                  Batch Layer


                 Speed Layer


                 Serving Layer
LAYERED ARCHITECTURE


[Diagram: Incoming Data, HDInsight, SQL, Column Store, Query]
BATCH LAYER




[Diagram: Incoming Data, HDInsight, Column Store]
BATCH LAYER




       Unrestrained computation.
BATCH LAYER




              Horizontally scalable.
BATCH LAYER




                  High Latency.
       Let’s pretend temporarily that update latency
                       doesn’t matter.
BATCH LAYER




      Stores master copy of data set...
               append only.
BATCH: VIEW GENERATION




     Master Dataset → MapReduce → View #1, View #2, View #3
MAPREDUCE


 MAP
          1. Take a large problem and divide it into sub-problems
          2. Perform the same function on all sub-problems:
                DoWork()      DoWork()         DoWork()      …
 REDUCE
          3. Combine the output from all sub-problems into a single Output
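
The three steps can be sketched in a few lines of single-process Python. This only shows the shape of the computation and is not Hadoop code; `split_into_chunks`, `do_work` and `combine` are hypothetical names, with `do_work` playing the role of DoWork().

```python
from collections import defaultdict
from functools import reduce

def split_into_chunks(records, n_chunks):
    # Step 1: divide the large problem into sub-problems.
    return [records[i::n_chunks] for i in range(n_chunks)]

def do_work(chunk):
    # Step 2 (map): the same function runs on every sub-problem.
    # Here it emits a (location, 1) pair for every record in the chunk.
    return [(location, 1) for _person, location in chunk]

def combine(mapped_chunks):
    # Step 3 (reduce): group the pairs by key and combine the partial outputs.
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return {key: reduce(lambda a, b: a + b, values) for key, values in grouped.items()}

records = [("Nathan", "Ghent"), ("Geert", "Dendermonde"), ("John", "Ghent")]
output = combine(do_work(chunk) for chunk in split_into_chunks(records, 2))
```

Since `do_work` looks at one chunk only, the chunks could be processed on different machines; only `combine` needs to see results from all of them.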
BATCH VIEW DATABASE




           Read only database.
             No random writes required.
BATCH LAYER



We are not done yet…
[Timeline up to Now: data absorbed into Batch Views, followed by just a few hours of data not yet absorbed.]
SPEED LAYER
OVERVIEW


[Diagram: Incoming Data, HDInsight, SQL, Column Store]
SPEED LAYER




              Stream processing.
SPEED LAYER




        Continuous computation.
SPEED LAYER




              Transactional.
SPEED LAYER




     Storing a limited window of data.
         Compensating for the last few hours of data.
SPEED LAYER




   All the complexity is isolated in the
  Speed layer. If anything goes wrong,
            it’s auto-corrected.
CAP

You have a choice between:
• Availability
  • Queries are eventually consistent.
• Consistency
  • Queries are consistent.
EVENTUAL ACCURACY




     Some algorithms are hard to
   implement in real time. For those
  cases we could estimate the results.
SPEED LAYER




Incoming Data → Real Time View 1, Real Time View 2
SPEED LAYER VIEWS
• The views are stored in a read/write database.
  •   MS SQL Server
  •   Column Store
  •   Cassandra
  •   …
• Much more complex than a read-only view.
SERVING LAYER
OVERVIEW


[Diagram: Incoming Data, HDInsight, SQL, Column Store, Query]
SERVING LAYER




   This layer queries the Batch & Real
        Time views and merges them.
SERVING LAYER




            Batch Views + Real Time Views → Merge
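
A minimal sketch of the merge step, assuming an additive metric (page-view counts) and assuming that the batch view and the real-time view cover non-overlapping time ranges. The view contents and the name `merge_views` are hypothetical.

```python
def merge_views(batch_view, realtime_view):
    # For an additive metric such as a count, merging is a per-key sum.
    merged = dict(batch_view)
    for key, value in realtime_view.items():
        merged[key] = merged.get(key, 0) + value
    return merged

# Hypothetical page-view counts.
batch_view = {"/home": 1000, "/demo": 40}     # everything absorbed by the last batch run
realtime_view = {"/home": 12, "/pricing": 3}  # only the last few hours

merged = merge_views(batch_view, realtime_view)
```

Metrics that are not additive, such as unique visitors, need a different merge function, which is one reason the merge logic depends on the view.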
SERVING LAYER




      PolyBase is a great fit.
OVERVIEW


[Diagram: Incoming Data, HDInsight, SQL, Column Store, Query]
LAMBDA ARCHITECTURE
• Any view, batch or real time, can be discarded and recreated
  from the master data.
• Mistakes are corrected via recomputation.
  • Write bad data? Remove the data & recompute.
  • Bug in view generation? Just recompute the view.
• Data storage is highly optimized.
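
Correction by recomputation can be sketched as follows, under the assumption that a view is a pure function of the master data. The event fields, the metric and the name `build_view` are hypothetical.

```python
def build_view(master_data):
    # View generation: total amount per customer (an example metric).
    view = {}
    for event in master_data:
        view[event["customer"]] = view.get(event["customer"], 0) + event["amount"]
    return view

master_data = [
    {"id": 1, "customer": "a", "amount": 10},
    {"id": 2, "customer": "a", "amount": -999},  # bad record written by mistake
    {"id": 3, "customer": "b", "amount": 5},
]

stale_view = build_view(master_data)

# Write bad data? Remove it from the master data and recompute the view.
bad_ids = {2}
master_data = [e for e in master_data if e["id"] not in bad_ids]
fresh_view = build_view(master_data)
```

The view itself is never patched by hand. A bug in `build_view` would be handled the same way: fix the function and run it again over the unchanged master data.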
MICROSOFT BIG DATA
WHAT IS MICROSOFT DOING ON
THE BI & DEVELOPMENT SIDE
INSIGHTS FROM ANY DATA, ANY SIZE, ANYWHERE




WE DELIVER INSIGHTS TO EVERYONE BY ENABLING BIG DATA
ANALYSIS WITH FAMILIAR END USER TOOLS
Benefits




               Interaction and analysis of
               unstructured data in Hadoop
Key Features




               Hive add-in for Excel
UNLOCKING IMMERSIVE INSIGHTS FROM ALL DATA
WITH MICROSOFT BI TOOLS
Benefits




               Familiar self service BI tools
Key Features




               Hive ODBC Driver integrates Hadoop
               to SQL Server Analysis Services,
               PowerPivot, and Power View
WHILE DRAMATICALLY SIMPLIFYING PROGRAMMING
ON HADOOP

Benefits
               Simplified programming: MapReduce programs in JavaScript
               Simplified deployment of MapReduce jobs
Key Features
               Integration with .NET and new JavaScript libraries for Hadoop
               Deploy JavaScript Hadoop jobs from a simple web browser on any supported device
WE MANAGE STREAMING DATA WITH STREAMINSIGHT
Benefits
Key Features




               StreamInsight   SQL StreamInsight
WHAT IS MICROSOFT DOING ON
THE HADOOP & INTEGRATION SIDE?
WE MANAGE RELATIONAL DATA WITH MICROSOFT
ENTERPRISE DATA WAREHOUSE SOLUTIONS
Reference Architectures
               Fast Track for
Appliances
               Dell Parallel Data Warehouse
               HP Enterprise Data Warehouse
               Dell Quickstart Data Warehouse
               HP Business Data Warehouse
INTRODUCING POLYBASE
Fundamental Breakthrough in Data Processing




Single Query; Structured and Unstructured
•     Query and join Hadoop tables with relational tables
•     Use standard SQL language
      •    Select, From, Where
SQL Server 2012 PDW powered by PolyBase

Existing SQL skillset     No IT intervention     Save time and costs     Analyze all data types
AND SUPPORT UNSTRUCTURED DATA WITH ENTERPRISE
CLASS HADOOP ON PREMISE AND IN THE CLOUD
Benefits
Key Features
MICROSOFT BRINGS THE SIMPLICITY AND MANAGEABILITY
OF WINDOWS AND SQL SERVER TO HADOOP
 Benefits
 Key Features
MICROSOFT DELIVERS BIG DATA THROUGH OPEN
PLATFORM AND A RICH PARTNER ECOSYSTEM
Benefits
Key Features
BIG DATA DEMO:
FROM DATA TO INSIGHTS!



Simplicity     Analysis with familiar tools     Collaboration on insights
THANK YOU!!!
RESOURCES
•   Microsoft Big Data Solution: www.microsoft.com/bigdata
•   Windows Azure: www.windowsazure.com/en-us/home/scenarios/big-data
•   Try Now: https://www.hadooponazure.com
•   HDInsight For Windows Beta Download: http://hortonworks.com/download/
•   HDInsight Services For Windows:
    http://social.technet.microsoft.com/wiki/contents/articles/6204.hdinsight-services-for-windows.aspx#videos
•   Hadoop in PowerPivot:
    http://social.technet.microsoft.com/wiki/contents/articles/6294.how-to-connect-excel-powerpivot-to-hive-on-azure-via-hiveodbc.aspx
•   Hadoop in SSIS: http://msdn.microsoft.com/en-us/library/jj720569.aspx
•   Hurricane Sandy:
    http://sqlcat.com/sqlcat/b/msdnmirror/archive/2013/02/01/hurricane-sandy-mash-up-hive-sql-server-powerpivot-amp-power-view.aspx
•   Hadoop PowerShell:
    http://blogs.msdn.com/b/cindygross/archive/2012/08/23/how-to-install-the-powershell-cmdlets-for-apache-hadoop-based-services-for-windows.aspx
•   SQL Server BCP to Hive:
    http://blogs.msdn.com/b/cindygross/archive/2012/09/28/load-sql-server-bcp-data-to-hive.aspx
•   Internal vs External Table Hive:
    http://blogs.msdn.com/b/cindygross/archive/2013/02/06/hdinsight-hive-internal-and-external-tables-intro.aspx
•   Microsoft.NET SDK for Hadoop: http://hadoopsdk.codeplex.com/
•   Twitter Analytics Example: http://twitterbigdata.codeplex.com/
DATACRUNCHERS

We help companies envision, define, and implement a data
strategy.
A one-stop-shop for all your Big Data needs.


The first Big Data Consultancy agency in Belgium.

More Related Content

What's hot

Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and DruidOpen Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and DruidDataWorks Summit
 
Programmatic Bidding Data Streams & Druid
Programmatic Bidding Data Streams & DruidProgrammatic Bidding Data Streams & Druid
Programmatic Bidding Data Streams & DruidCharles Allen
 
From stream to recommendation using apache beam with cloud pubsub and cloud d...
From stream to recommendation using apache beam with cloud pubsub and cloud d...From stream to recommendation using apache beam with cloud pubsub and cloud d...
From stream to recommendation using apache beam with cloud pubsub and cloud d...Neville Li
 
Resilience: the key requirement of a [big] [data] architecture - StampedeCon...
Resilience: the key requirement of a [big] [data] architecture  - StampedeCon...Resilience: the key requirement of a [big] [data] architecture  - StampedeCon...
Resilience: the key requirement of a [big] [data] architecture - StampedeCon...StampedeCon
 
Using Multiple Persistence Layers in Spark to Build a Scalable Prediction Eng...
Using Multiple Persistence Layers in Spark to Build a Scalable Prediction Eng...Using Multiple Persistence Layers in Spark to Build a Scalable Prediction Eng...
Using Multiple Persistence Layers in Spark to Build a Scalable Prediction Eng...StampedeCon
 
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Codemotion
 
Real-time analytics with Druid at Appsflyer
Real-time analytics with Druid at AppsflyerReal-time analytics with Druid at Appsflyer
Real-time analytics with Druid at AppsflyerMichael Spector
 
Lifting the hood on spark streaming - StampedeCon 2015
Lifting the hood on spark streaming - StampedeCon 2015Lifting the hood on spark streaming - StampedeCon 2015
Lifting the hood on spark streaming - StampedeCon 2015StampedeCon
 
Reference architecture for Internet Of Things
Reference architecture for Internet Of ThingsReference architecture for Internet Of Things
Reference architecture for Internet Of Thingselephantscale
 
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...Brian O'Neill
 
Real-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkReal-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkGuido Schmutz
 
Case Study: Realtime Analytics with Druid
Case Study: Realtime Analytics with DruidCase Study: Realtime Analytics with Druid
Case Study: Realtime Analytics with DruidSalil Kalia
 
July 2014 HUG : Pushing the limits of Realtime Analytics using Druid
July 2014 HUG : Pushing the limits of Realtime Analytics using DruidJuly 2014 HUG : Pushing the limits of Realtime Analytics using Druid
July 2014 HUG : Pushing the limits of Realtime Analytics using DruidYahoo Developer Network
 
Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Guido Schmutz
 
Strata lightening-talk
Strata lightening-talkStrata lightening-talk
Strata lightening-talkDanny Yuan
 
Azure + DataStax Enterprise Powers Office 365 Per User Store
Azure + DataStax Enterprise Powers Office 365 Per User StoreAzure + DataStax Enterprise Powers Office 365 Per User Store
Azure + DataStax Enterprise Powers Office 365 Per User StoreDataStax Academy
 
Architecting a next-generation data platform
Architecting a next-generation data platformArchitecting a next-generation data platform
Architecting a next-generation data platformhadooparchbook
 
Apache Druid Design and Future prospect
Apache Druid Design and Future prospectApache Druid Design and Future prospect
Apache Druid Design and Future prospectc-bslim
 

What's hot (20)

Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and DruidOpen Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
 
Programmatic Bidding Data Streams & Druid
Programmatic Bidding Data Streams & DruidProgrammatic Bidding Data Streams & Druid
Programmatic Bidding Data Streams & Druid
 
From stream to recommendation using apache beam with cloud pubsub and cloud d...
From stream to recommendation using apache beam with cloud pubsub and cloud d...From stream to recommendation using apache beam with cloud pubsub and cloud d...
From stream to recommendation using apache beam with cloud pubsub and cloud d...
 
Resilience: the key requirement of a [big] [data] architecture - StampedeCon...
Resilience: the key requirement of a [big] [data] architecture  - StampedeCon...Resilience: the key requirement of a [big] [data] architecture  - StampedeCon...
Resilience: the key requirement of a [big] [data] architecture - StampedeCon...
 
Using Multiple Persistence Layers in Spark to Build a Scalable Prediction Eng...
Using Multiple Persistence Layers in Spark to Build a Scalable Prediction Eng...Using Multiple Persistence Layers in Spark to Build a Scalable Prediction Eng...
Using Multiple Persistence Layers in Spark to Build a Scalable Prediction Eng...
 
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
 
Real-time analytics with Druid at Appsflyer
Real-time analytics with Druid at AppsflyerReal-time analytics with Druid at Appsflyer
Real-time analytics with Druid at Appsflyer
 
Lifting the hood on spark streaming - StampedeCon 2015
Lifting the hood on spark streaming - StampedeCon 2015Lifting the hood on spark streaming - StampedeCon 2015
Lifting the hood on spark streaming - StampedeCon 2015
 
Reference architecture for Internet Of Things
Reference architecture for Internet Of ThingsReference architecture for Internet Of Things
Reference architecture for Internet Of Things
 
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
 
Real-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkReal-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache Spark
 
Case Study: Realtime Analytics with Druid
Case Study: Realtime Analytics with DruidCase Study: Realtime Analytics with Druid
Case Study: Realtime Analytics with Druid
 
July 2014 HUG : Pushing the limits of Realtime Analytics using Druid
July 2014 HUG : Pushing the limits of Realtime Analytics using DruidJuly 2014 HUG : Pushing the limits of Realtime Analytics using Druid
July 2014 HUG : Pushing the limits of Realtime Analytics using Druid
 
Rapids: Data Science on GPUs
Rapids: Data Science on GPUsRapids: Data Science on GPUs
Rapids: Data Science on GPUs
 
Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016
 
Strata lightening-talk
Strata lightening-talkStrata lightening-talk
Strata lightening-talk
 
Azure + DataStax Enterprise Powers Office 365 Per User Store
Azure + DataStax Enterprise Powers Office 365 Per User StoreAzure + DataStax Enterprise Powers Office 365 Per User Store
Azure + DataStax Enterprise Powers Office 365 Per User Store
 
Architecting a next-generation data platform
Architecting a next-generation data platformArchitecting a next-generation data platform
Architecting a next-generation data platform
 
Yahoo compares Storm and Spark
Yahoo compares Storm and SparkYahoo compares Storm and Spark
Yahoo compares Storm and Spark
 
Apache Druid Design and Future prospect
Apache Druid Design and Future prospectApache Druid Design and Future prospect
Apache Druid Design and Future prospect
 

Viewers also liked

Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Nathan Bijnens
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategyJames Serra
 
PPT on Microsoft Corporation
PPT on Microsoft CorporationPPT on Microsoft Corporation
PPT on Microsoft CorporationVijaykumar Nishad
 
Big Data and Analytics: The IBM Perspective
Big Data and Analytics: The IBM PerspectiveBig Data and Analytics: The IBM Perspective
Big Data and Analytics: The IBM PerspectiveThe_IPA
 
Lambda Architecture Using SQL
Lambda Architecture Using SQLLambda Architecture Using SQL
Lambda Architecture Using SQLSATOSHI TAGOMORI
 
Ibm big data-platform
Ibm big data-platformIbm big data-platform
Ibm big data-platformIBM Sverige
 
How Google Does Big Data - DevNexus 2014
How Google Does Big Data - DevNexus 2014How Google Does Big Data - DevNexus 2014
How Google Does Big Data - DevNexus 2014James Chittenden
 
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...The Hive
 
Industrial Internet of Things -- Microsoft DC Azure Meetup
Industrial Internet of Things -- Microsoft DC Azure MeetupIndustrial Internet of Things -- Microsoft DC Azure Meetup
Industrial Internet of Things -- Microsoft DC Azure MeetupStephen Bates
 
Reactive Streams: Handling Data-Flow the Reactive Way
Reactive Streams: Handling Data-Flow the Reactive WayReactive Streams: Handling Data-Flow the Reactive Way
Reactive Streams: Handling Data-Flow the Reactive WayRoland Kuhn
 
SME Funding in Horizon 2020 - Are You Ready?
SME Funding in Horizon 2020 - Are You Ready?SME Funding in Horizon 2020 - Are You Ready?
SME Funding in Horizon 2020 - Are You Ready?Zaz Ventures
 
Understanding Akka Streams, Back Pressure, and Asynchronous Architectures
Understanding Akka Streams, Back Pressure, and Asynchronous ArchitecturesUnderstanding Akka Streams, Back Pressure, and Asynchronous Architectures
Understanding Akka Streams, Back Pressure, and Asynchronous ArchitecturesLightbend
 
Industrial Internet of Things and (Machine to Machine) M2M Overview
Industrial Internet of Things and (Machine to Machine) M2M OverviewIndustrial Internet of Things and (Machine to Machine) M2M Overview
Industrial Internet of Things and (Machine to Machine) M2M OverviewBryan Kester
 
Overview - IBM Big Data Platform
Overview - IBM Big Data PlatformOverview - IBM Big Data Platform
Overview - IBM Big Data PlatformVikas Manoria
 
Strategic Analysis of Microsoft Corp. (2014)
Strategic Analysis of Microsoft Corp. (2014)Strategic Analysis of Microsoft Corp. (2014)
Strategic Analysis of Microsoft Corp. (2014)Chinmay Chauhan
 
Presentation on microsoft
Presentation on microsoftPresentation on microsoft
Presentation on microsoftJoel Pais
 
Winning competition through organizational agility
Winning competition through organizational agilityWinning competition through organizational agility
Winning competition through organizational agilityMcKinsey & Company
 
Microsoft corporation case analysis
Microsoft corporation case analysisMicrosoft corporation case analysis
Microsoft corporation case analysisWasim Parmar
 
Choosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudChoosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudJames Serra
 

Viewers also liked (20)

Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
 
PPT on Microsoft Corporation
PPT on Microsoft CorporationPPT on Microsoft Corporation
PPT on Microsoft Corporation
 
Big Data and Analytics: The IBM Perspective
Big Data and Analytics: The IBM PerspectiveBig Data and Analytics: The IBM Perspective
Big Data and Analytics: The IBM Perspective
 
Lambda Architecture Using SQL
Lambda Architecture Using SQLLambda Architecture Using SQL
Lambda Architecture Using SQL
 
Ibm big data-platform
Ibm big data-platformIbm big data-platform
Ibm big data-platform
 
How Google Does Big Data - DevNexus 2014
How Google Does Big Data - DevNexus 2014How Google Does Big Data - DevNexus 2014
How Google Does Big Data - DevNexus 2014
 
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
Microsoft Big Data @ SQLUG 2013

  • 1. BIG DATA Wesley Backelant Technology Advisor Microsoft @WesleyBackelant Nathan Bijnens Big Data Consultant DataCrunchers @nathan_gs
  • 2. AGENDA • Big Data • Hadoop (& Ecosystem) • How does it fit in the Microsoft world? • Demo • Resources • Q&A
  • 3. THE WORLD OF DATA IS CHANGING
  • 4. TODAY A NEW SET OF QUESTIONS ARE BEING ASKED OF THE BUSINESS: What’s the social sentiment for my brand or products? How do I better predict future outcomes? How do I optimize my fleet based on weather and traffic patterns?
  • 5. TRANSFORMATION OF ONLINE MARKETING BLOGS.FORBES.COM/DAVEFEINLEIB
  • 6. TRANSFORMATION OF OPERATIONS BLOGS.FORBES.COM/DAVEFEINLEIB
  • 7. TRANSFORMATION OF CUSTOMER SERVICE BLOGS.FORBES.COM/DAVEFEINLEIB
  • 9. TRANSFORMATION OF FRAUD DETECTION Then… Now…
  • 10. NEW HARDWARE APPROACH Traditional: Exotic HW (big central servers, SAN, RAID); hardware reliability; limited scalability. Big Data: Commodity HW (racks of pizza boxes, Ethernet, JBOD); unreliable HW; scales further; cost effective.
  • 11. NEW SOFTWARE APPROACH Traditional: Monolithic (centralized, RDBMS); schema first; proprietary. Big Data: Distributed (storage & compute nodes); raw data.
  • 12. HADOOP & BIG DATA ECOSYSTEM MapReduce HDFS
  • 13.
  • 14. HDFS
  • 15. HDFS
  • 16.
  • 20. HIVE
  • 21. HIVE A data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis. – Ideal for ad hoc querying – Query execution via MapReduce. Key Building Principles: – SQL – Extensibility – Types – Functions – Scripts
  • 22. HIVE It supports many SQL features like: – Data partitioning – Aggregations – Grouping – Joins
  • 23. HIVE And it’s extendable using UDFs.

        package com.example.hive.udf;

        import org.apache.hadoop.hive.ql.exec.UDF;
        import org.apache.hadoop.io.Text;

        public final class Lower extends UDF {
          public Text evaluate(final Text s) {
            if (s == null) { return null; }
            return new Text(s.toString().toLowerCase());
          }
        }

    There are many UDFs published by external parties, for: Loading / Saving (SerDe), Field Transformations.
  • 24.
  • 25. HADOOP PIG: INTRO Pig is a high level data flow language.
  • 26. HADOOP PIG: 3 COMPONENTS • Pig Latin • Grunt • PigServer
  • 27. HADOOP PIG

        data = LOAD 'employee.csv' USING PigStorage() AS (
            first_name:chararray,
            last_name:chararray,
            age:int,
            wage:float,
            department:chararray
        );

  • 28. HADOOP PIG

        grouped_by_department = GROUP data BY department;
        total_wage_by_department = FOREACH grouped_by_department
            GENERATE group AS department,
                     COUNT(data) AS employee_count,
                     SUM(data.wage) AS total_wage;
        total_ordered = ORDER total_wage_by_department BY total_wage;
        total_limited = LIMIT total_ordered 10;

  • 29. HADOOP PIG

        DUMP total_limited;
        STORE total_limited INTO '/test/';
  • 30. UDF ● Custom Load and Store classes. ● HBase ● ProtocolBuffers ● CombinedLog ● Custom extraction, e.g. date, ... ● Take a look at the PiggyBank.
  • 31.
  • 32. HBASE A distributed, versioned, column-oriented database. Main features: • Horizontal scalability • Machine failure tolerance • Row-level atomic operations, including compare-and-swap ops like incrementing counters • Augmented key-value schemas: the user can group columns into families, which are configured independently • Multiple clients, like its native Java library, Thrift, and REST • Upcoming: security
  • 33.
  • 34. STORM
  • 35. STORM
  • 36. STORM • Message passing. • Distributed processing. • Horizontally scalable. • Incremental algorithms. • Fast. • Data in motion.
  • 37. STORM Cluster diagram: Nimbus, Zookeeper, and three Supervisors, each running several Workers on its own Worker Node.
  • 42. DATA IS MORE THAN INFORMATION Not all information is equal. Some information is derived from other pieces of information.
  • 43. DATA IS MORE THAN INFORMATION Eventually you will reach the most ‘raw’ form of information. This is the information you hold true, simply because it exists. Let’s call this ‘data’, very similar to ‘event’.
  • 44. EVENTS Everything we do generates events: • Pay with Credit Card • Commit to Git • Click on a webpage • Tweet
  • 45. EVENTS - BEFORE Events used to manipulate the master data.
  • 46. EVENTS - AFTER Today, events are the master data.
  • 47. DATA SYSTEM Let’s store everything.
  • 48. EVENTS Data is Immutable
  • 49. EVENTS Data is Time Based
  • 50. CAPTURING CHANGE TRADITIONALLY

        Person   Location          Person   Location
        Nathan   Antwerp           Nathan   Ghent
        Geert    Dendermonde       Geert    Dendermonde
        John     Ghent             John     Ghent

  • 51. CAPTURING CHANGE

        Person   Location      Timestamp         Person   Location      Time
        Nathan   Antwerp       2005-01-01        Nathan   Antwerp       2005-01-01
        Geert    Dendermonde   2011-10-08        Geert    Dendermonde   2011-10-08
        John     Ghent         2010-05-02        John     Ghent         2010-05-02
                                                 Nathan   Ghent         2013-02-03
  • 52. QUERY The data you query is often transformed, aggregated, ... Rarely used in its original form.
  • 53. QUERY Query = function ( data )
  • 54. NUMBER OF PEOPLE LIVING IN EACH CITY.

        Person   Location      Time              Location      Count
        Nathan   Antwerp       2005-01-01        Ghent         2
        Geert    Dendermonde   2011-10-08        Dendermonde   1
        John     Ghent         2010-05-02
        Nathan   Ghent         2013-02-03
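
The "query = function(data)" idea can be sketched in plain Java using the event table from the slide. This is only an illustration: the class and method names (CityCountQuery, LocationEvent, peoplePerCity, sampleEvents) are hypothetical and not part of the talk, and a real batch layer would run this kind of logic as a distributed job rather than in a single process.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: a query expressed as a pure function over immutable, time-stamped events. */
public final class CityCountQuery {

    /** One immutable fact: a person was recorded at a location at a given time. */
    public static final class LocationEvent {
        final String person;
        final String location;
        final LocalDate time;

        public LocationEvent(String person, String location, LocalDate time) {
            this.person = person;
            this.location = location;
            this.time = time;
        }
    }

    /** Number of people currently living in each city, derived from all events. */
    public static Map<String, Integer> peoplePerCity(List<LocationEvent> events) {
        // Keep only the most recent event per person.
        Map<String, LocationEvent> latest = new HashMap<>();
        for (LocationEvent e : events) {
            LocationEvent seen = latest.get(e.person);
            if (seen == null || e.time.isAfter(seen.time)) {
                latest.put(e.person, e);
            }
        }
        // Count people per current location (TreeMap keeps cities sorted by name).
        Map<String, Integer> counts = new TreeMap<>();
        for (LocationEvent e : latest.values()) {
            counts.merge(e.location, 1, Integer::sum);
        }
        return counts;
    }

    /** The four events shown on the slide. */
    public static List<LocationEvent> sampleEvents() {
        return Arrays.asList(
            new LocationEvent("Nathan", "Antwerp", LocalDate.of(2005, 1, 1)),
            new LocationEvent("Geert", "Dendermonde", LocalDate.of(2011, 10, 8)),
            new LocationEvent("John", "Ghent", LocalDate.of(2010, 5, 2)),
            new LocationEvent("Nathan", "Ghent", LocalDate.of(2013, 2, 3)));
    }

    public static void main(String[] args) {
        // Prints {Dendermonde=1, Ghent=2}
        System.out.println(peoplePerCity(sampleEvents()));
    }
}
```

Because no event is ever updated in place, the same function can be re-run at any time over the full history and gives the same answer for the same input.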
  • 55. QUERY All Data Query
  • 56. QUERY: PRECOMPUTE All Data Precomputed View Query
  • 57. LAYERED ARCHITECTURE Batch Layer Speed Layer Serving Layer
  • 58. LAYERED ARCHITECTURE SQL Query Incoming Data HD Insight Column Store
  • 60. BATCH LAYER Incoming Data HD Insight Column Store
  • 61. BATCH LAYER Unrestrained computation.
  • 62. BATCH LAYER Horizontally scalable.
  • 63. BATCH LAYER High Latency. Let’s pretend temporarily that update latency doesn’t matter.
  • 64. BATCH LAYER Stores master copy of data set... append only.
  • 66. BATCH: VIEW GENERATION View #1 Master Dataset View #2 MapReduce View #3
  • 67. MAPREDUCE 1. Take a large problem and divide it into sub-problems. 2. MAP: perform the same function, DoWork(), on all sub-problems. 3. REDUCE: combine the output from all sub-problems into the final Output.
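
The three steps on this slide can be sketched in a single Java process. This is a simplified illustration, not the Hadoop MapReduce API: real jobs work on key-value pairs spread over many nodes, with a shuffle between map and reduce. The class name MapReduceSketch and the choice of summing integers are assumptions made for the example; doWork mirrors the DoWork() label on the slide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Single-process sketch of divide, map, and reduce. */
public final class MapReduceSketch {

    /** Step 1: divide the input into sub-problems of at most chunkSize elements. */
    public static List<List<Integer>> split(List<Integer> input, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<Integer>> chunks = new ArrayList<>();
        for (int i = 0; i < input.size(); i += chunkSize) {
            chunks.add(input.subList(i, Math.min(i + chunkSize, input.size())));
        }
        return chunks;
    }

    /** Step 2 (MAP): the same function applied to every sub-problem; here a partial sum. */
    public static long doWork(List<Integer> chunk) {
        long sum = 0;
        for (int value : chunk) {
            sum += value;
        }
        return sum;
    }

    /** Step 3 (REDUCE): combine the partial results into one output. */
    public static long reduce(List<Long> partials) {
        long total = 0;
        for (long partial : partials) {
            total += partial;
        }
        return total;
    }

    /** Runs the three steps in order. */
    public static long run(List<Integer> input, int chunkSize) {
        List<Long> partials = new ArrayList<>();
        for (List<Integer> chunk : split(input, chunkSize)) {
            partials.add(doWork(chunk));
        }
        return reduce(partials);
    }

    public static void main(String[] args) {
        // Prints 55
        System.out.println(run(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), 3));
    }
}
```

Each call to doWork depends only on its own chunk, which is what lets a framework run the map step on many machines at once.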
  • 68. BATCH VIEW DATABASE Read only database. No random writes required.
  • 69. BATCH LAYER We are not done yet… Timeline up to now: data absorbed into Batch Views, followed by just a few hours of data not yet absorbed.
  • 71. OVERVIEW SQL Incoming Data HD Insight Column Store
  • 72. SPEED LAYER Stream processing.
  • 73. SPEED LAYER Continuous computation.
  • 74. SPEED LAYER Transactional.
  • 75. SPEED LAYER Storing a limited window of data. Compensating for the last few hours of data.
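
A limited window of recent data could be kept along these lines. This is a minimal single-threaded sketch under stated assumptions: the class name SpeedLayerWindow is hypothetical, timestamps are assumed to arrive in non-decreasing order, and a real speed layer (for example a Storm topology) would also need distribution and fault tolerance.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Sketch: counts events inside a sliding time window and forgets older ones. */
public final class SpeedLayerWindow {

    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    public SpeedLayerWindow(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Records one event; assumes timestamps never go backwards. */
    public void record(long timestampMillis) {
        timestamps.addLast(timestampMillis);
        evict(timestampMillis);
    }

    /** Number of events newer than (nowMillis - windowMillis). */
    public int count(long nowMillis) {
        evict(nowMillis);
        return timestamps.size();
    }

    /** Drops events that have fallen out of the window. */
    private void evict(long nowMillis) {
        while (!timestamps.isEmpty() && timestamps.peekFirst() <= nowMillis - windowMillis) {
            timestamps.removeFirst();
        }
    }
}
```

Evicted events are not lost in this architecture: by the time they leave the window, the batch layer is expected to have absorbed them into a batch view.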
  • 76. SPEED LAYER All the complexity is isolated in the Speed layer. If anything goes wrong, it’s auto-corrected.
  • 77. CAP You have a choice between: • Availability: queries are eventually consistent. • Consistency: queries are consistent.
  • 78. EVENTUAL ACCURACY Some algorithms are hard to implement in real time. For those cases we could estimate the results.
  • 79. SPEED LAYER Real Time View 1 Incoming Data Real Time View 2
  • 80. SPEED LAYER VIEWS • The views are stored in a read & write database. • MS SQL Server • Column Store • Cassandra • … • Much more complex than a read only view.
  • 82. OVERVIEW SQL Query Incoming Data HD Insight Column Store
  • 83. SERVING LAYER This layer queries the Batch & Real Time views and merges them.
  • 84. SERVING LAYER Batch Views Merge Real Time Views
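
The merge step can be sketched for the simple case where both views hold additive counts per key. That is an assumption made for illustration: other view types (distinct counts, averages) need a different merge function. The class name ServingLayerMerge and the page-view keys are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: answer a query by combining a batch view with a real-time view. */
public final class ServingLayerMerge {

    /** Adds the real-time counts on top of the batch counts, key by key. */
    public static Map<String, Long> merge(Map<String, Long> batchView,
                                          Map<String, Long> realTimeView) {
        Map<String, Long> merged = new TreeMap<>(batchView);
        for (Map.Entry<String, Long> entry : realTimeView.entrySet()) {
            merged.merge(entry.getKey(), entry.getValue(), Long::sum);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Long> batchView = new HashMap<>();
        batchView.put("/home", 100L);
        batchView.put("/about", 40L);

        Map<String, Long> realTimeView = new HashMap<>();
        realTimeView.put("/home", 3L);
        realTimeView.put("/pricing", 1L);

        // Prints {/about=40, /home=103, /pricing=1}
        System.out.println(merge(batchView, realTimeView));
    }
}
```

The merge only stays correct if the two views cover disjoint time ranges, so the real-time view should hold only data the batch view has not yet absorbed.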
  • 85. SERVING LAYER Polybase is a great fit.
  • 87. OVERVIEW SQL Query Incoming Data HD Insight Column Store
  • 88. LAMBDA ARCHITECTURE • Can discard any view, batch and real time, and just recreate everything from the master data. • Mistakes are corrected via recomputation. • Write bad data? Remove the data & recompute. • Bug in view generation? Just recompute the view. • Data storage is highly optimized.
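
Correcting mistakes by recomputation can be sketched as follows. The names (RecomputeSketch, buildView, removeAndRecompute) and the event-type strings are hypothetical; the only point is that a view is always a function of the master data, so it can be thrown away and rebuilt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

/** Sketch: views are disposable because they can be rebuilt from the master data. */
public final class RecomputeSketch {

    /** View generation: number of events per event type. */
    public static Map<String, Integer> buildView(List<String> masterData) {
        Map<String, Integer> view = new TreeMap<>();
        for (String eventType : masterData) {
            view.merge(eventType, 1, Integer::sum);
        }
        return view;
    }

    /** Wrote bad data? Remove it from the input and recompute the view from scratch. */
    public static Map<String, Integer> removeAndRecompute(List<String> masterData,
                                                          Predicate<String> isBad) {
        List<String> cleaned = new ArrayList<>();
        for (String eventType : masterData) {
            if (!isBad.test(eventType)) {
                cleaned.add(eventType);
            }
        }
        return buildView(cleaned);
    }

    public static void main(String[] args) {
        List<String> master = Arrays.asList("click", "tweet", "click", "clikc");
        // Prints {click=2, clikc=1, tweet=1}
        System.out.println(buildView(master));
        // Prints {click=2, tweet=1}
        System.out.println(removeAndRecompute(master, e -> e.equals("clikc")));
    }
}
```

A bug in view generation is handled the same way: fix buildView and run it again over the unchanged master data.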
  • 90. WHAT IS MICROSOFT DOING ON THE BI & DEVELOPMENT SIDE
  • 91. INSIGHTS FROM ANY DATA, ANY SIZE, ANYWHERE
  • 92. WE DELIVER INSIGHTS TO EVERYONE BY ENABLING BIG DATA ANALYSIS WITH FAMILIAR END USER TOOLS Benefits: Interaction and analysis of unstructured data in Hadoop. Key Features: Hive add-in for Excel.
  • 93. UNLOCKING IMMERSIVE INSIGHTS FROM ALL DATA WITH MICROSOFT BI TOOLS Benefits: Familiar self service BI tools. Key Features: Hive ODBC Driver integrates Hadoop to SQL Server Analysis Services, PowerPivot, and Power View.
  • 94. WHILE DRAMATICALLY SIMPLIFYING PROGRAMMING ON HADOOP Benefits: MapReduce programs in JavaScript; Simplified Programming; Simplified Deployment of MapReduce jobs. Key Features: Integration with .NET and new JavaScript libraries for Hadoop; Deploy JavaScript Hadoop jobs from a simple web browser on any supported device.
  • 95. WE MANAGE STREAMING DATA WITH STREAMINSIGHT Benefits Key Features StreamInsight SQL StreamInsight
  • 96. WHAT IS MICROSOFT DOING ON THE HADOOP & INTEGRATION SIDE?
  • 97. WE MANAGE RELATIONAL DATA WITH MICROSOFT ENTERPRISE DATA WAREHOUSE SOLUTIONS Reference Architectures: Fast Track for Data Warehouse. Appliances: Dell Parallel Data Warehouse; HP Enterprise Data Warehouse; Dell Quickstart Data Warehouse; HP Business Data Warehouse.
  • 98. INTRODUCING POLYBASE Fundamental Breakthrough in Data Processing. Single Query; Structured and Unstructured: • Query and join Hadoop tables with Relational Tables • Use Standard SQL language • Select, From, Where. SQL Server 2012 PDW, Powered by PolyBase. Existing SQL Skillset; No IT Intervention; Save Time and Costs; Analyze All Data Types.
  • 99. AND SUPPORT UNSTRUCTURED DATA WITH ENTERPRISE CLASS HADOOP ON PREMISE AND IN THE CLOUD Benefits Key Features
  • 100. MICROSOFT BRINGS THE SIMPLICITY AND MANAGEABILITY OF WINDOWS AND SQL SERVER TO HADOOP Benefits Key Features
  • 101. MICROSOFT DELIVERS BIG DATA THROUGH OPEN PLATFORM AND A RICH PARTNER ECOSYSTEM Benefits Key Features
  • 102. BIG DATA DEMO: FROM DATA TO INSIGHTS! Simplicity; Analysis with familiar tools; Collaboration on insights.
  • 104. RESOURCES
      • Microsoft Big Data Solution: www.microsoft.com/bigdata
      • Windows Azure: www.windowsazure.com/en-us/home/scenarios/big-data
      • Try Now: https://www.hadooponazure.com
      • HDInsight For Windows Beta Download: http://hortonworks.com/download/
      • HDInsight Services For Windows: http://social.technet.microsoft.com/wiki/contents/articles/6204.hdinsight-services-for-windows.aspx#videos
      • Hadoop in PowerPivot: http://social.technet.microsoft.com/wiki/contents/articles/6294.how-to-connect-excel-powerpivot-to-hive-on-azure-via-hiveodbc.aspx
      • Hadoop in SSIS: http://msdn.microsoft.com/en-us/library/jj720569.aspx
      • Hurricane Sandy: http://sqlcat.com/sqlcat/b/msdnmirror/archive/2013/02/01/hurricane-sandy-mash-up-hive-sql-server-powerpivot-amp-power-view.aspx
      • Hadoop PowerShell: http://blogs.msdn.com/b/cindygross/archive/2012/08/23/how-to-install-the-powershell-cmdlets-for-apache-hadoop-based-services-for-windows.aspx
      • SQL Server BCP to Hive: http://blogs.msdn.com/b/cindygross/archive/2012/09/28/load-sql-server-bcp-data-to-hive.aspx
      • Internal vs External Table Hive: http://blogs.msdn.com/b/cindygross/archive/2013/02/06/hdinsight-hive-internal-and-external-tables-intro.aspx
      • Microsoft.NET SDK for Hadoop: http://hadoopsdk.codeplex.com/
      • Twitter Analytics Example: http://twitterbigdata.codeplex.com/
  • 105. DATACRUNCHERS We help companies envision, define, and implement a data strategy. A one-stop shop for all your Big Data needs. The first Big Data consultancy agency in Belgium.