Big data
The technology landscape and its applications.




                                                 Natalino Busa - 12 Feb. 2013
Outline


          ● Big Data: Who art thou?
          ● Big Data: The technology landscape

          ● Hadoop: Overview
          ● Analytics & Machine Learning
          ● Opportunities




Hype cycle on new IT technologies

                                    Gartner 2012




What is big data?

        DATA (structured and unstructured, logs, ETL, social)


            Velocity               Diversity                Volume




                        BIG DATA


           Hardware                Software                Services

      Infrastructure            Marketing (e.g. Unica)    RDBMS
      (Private) Cloud           Analytics (Tableau)       OLAP
      Networking                Modeling (SAS)            Messaging



Big Data Heat map




How big is big?

Skytree (TM) defines the Analytics Requirements Index (ARI):

    ARI = (# Rows × # Columns) / Time (secs)

where

    # Rows      = number of records being analyzed
    # Columns   = number of variables captured in each record
    Time (secs) = the timeframe within which the analysis must complete




 Example: produce a personalized banner for each view (1000 views/sec).
 This requires analyzing 100 variables on 1000 records (historic data) every 1 ms:

 ARI = (1000 × 100) / 0.001 = 100 M values/sec
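
The ARI arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the function name `analytics_requirements_index` and its parameter names are illustrative assumptions, not part of any Skytree tooling.

```python
def analytics_requirements_index(rows: int, columns: int, seconds: float) -> float:
    """Return ARI = (rows * columns) / seconds, in values per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return rows * columns / seconds


# Banner example from the slide: 1000 records x 100 variables every 1 ms.
ari = analytics_requirements_index(rows=1000, columns=100, seconds=0.001)
print(f"ARI = {ari / 1e6:.0f} M values/sec")
```

Relaxing the deadline from 1 ms to 1 s lowers the index by a factor of 1000, which is why the response-time requirement matters as much as the raw data size.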




What data?

Big Data can imply:


           ●   Complex data refactoring in batch                  (lots of rows)
           ●   Real-time event processing                         (high-speed responses)
           ●   Multidimensional analysis                          (lots of parameters)

           ●   ... or any combination of the three

[Figure: three axes labeled Response time, Parameters, and Entities]

More data

Structured
   |   Database:          customers
   |   Databases:         customers + products
   |   Federated Data:    customers + products + surveys
   |   Aggregated Data:   customers + products + surveys + transactions
   |   Linked Data:       customers + products + surveys + transactions + social messages
   v   Just Data
Unstructured



   ●    In today's IT environments there is a gradual shift
        from structured data to unstructured data.

 RDBMSs are well suited to structured data ->
   but: ETL grows in volume and complexity; how do we deal with new data (structures)?

 Map-Reduce and NoSQL systems are good with unstructured data ->
   but: how do we query and analyze this data?



Big Data: how to deal with it



        ●   Big Data at rest     (storage, access)
        ●   Big Data in motion   (streaming, dataflows)


        ●   Big Data analytics   (OLAP, OTAP, BI)
        ●   Big Data modeling    (predictive, machine learning)




Big Data at rest

Analytical RDBMSs                (EDW) Oracle, IBM, and various MPPs

Hadoop distributed systems       HDFS (distributed file system)
                                 HBase (modeled on Bigtable)




[Diagram labels: Logs, HDFS, EDW; Cassandra, HBase; Batch, Real-time; Analytics]




  ●   Traditional EDW and distributed Big Data / NoSQL solutions are
      complementary to each other.
  ●   These systems do not exclude each other and can coexist to form a
      full enterprise-level solution.


Big Data at rest

Not everything has to come from the Hadoop ecosystem:

NoSQL DBMSs:            Couchbase ( ++ reads, caching)
                         Cassandra ( ++ writes, OLAP)

... hybrid solutions are also possible:

HDFS + Cassandra : in-memory analytics + large DFS
HDFS + Solr/Lucene: fast text search on a distributed file system




Big Data in motion

Stream processing // Dataflow architectures

Used to support the automatic analysis of data in motion, in real time or near real time.

- Identify meaningful patterns
- Trigger actions to respond to them as quickly as possible

- Storm (from Twitter)
  dataflow processing framework
  ++ multi-language

- Akka (from Typesafe)
  dataflow actor framework
  ++ speed

Both are:
distributed, fault-tolerant, streaming
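
The two goals above (identify a pattern, then trigger an action) can be sketched in plain Python. This is a single-process illustration only, not Storm or Akka code: the event shape, the `detect_bursts` function, and the burst rule (at least `threshold` events from one user within `window` seconds) are all assumptions made for the example.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, List, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, user id); hypothetical shape


def detect_bursts(
    events: Iterable[Event],
    window: float,
    threshold: int,
    on_match: Callable[[str, float], None],
) -> None:
    """Call on_match(user, ts) when a user emits `threshold` events within `window` seconds."""
    recent: Dict[str, Deque[float]] = {}
    for ts, user in events:
        timestamps = recent.setdefault(user, deque())
        timestamps.append(ts)
        # Drop timestamps that have fallen out of the sliding window.
        while timestamps and ts - timestamps[0] > window:
            timestamps.popleft()
        if len(timestamps) >= threshold:
            on_match(user, ts)   # trigger the action as soon as the pattern appears
            timestamps.clear()   # start a fresh window for this user


alerts: List[Tuple[str, float]] = []
stream = [(0.0, "a"), (0.2, "a"), (0.3, "b"), (0.4, "a"), (5.0, "a")]
detect_bursts(stream, window=1.0, threshold=3, on_match=lambda u, t: alerts.append((u, t)))
print(alerts)
```

A distributed framework would partition the stream by user id across workers and add fault tolerance; the per-key sliding window logic stays the same.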



Big Data Landscape

[Diagram: Machine Learning on Big Data. Labels that survive:]

  Data sources:          Unstructured, Logs, EDW
  Data interfaces:       REST, flume, scribe, sqoop, hiho
  Machine learning:      SAS, R over HDFS, Mahout
  Platform components:   HDFS, HBase, Cassandra, FS, MapR, Hive, Pig, Impala,
                         OLAP, OTAP, STORM
  BI:                    ●   Batch Analytics
                         ●   Visualization
                         ●   Monitoring
                         ●   Marketing
                         ●   Real-Time Analytics
                         ●   Streaming

Lambda Architecture




                                    Logic layer
                                                   Software as a Service
                                                   e.g. real-time predictor




from http://www.manning.com/marz/
Why do machine learning on big data




    http://www.skytree.net/why-do-machine-learning-on-big-data/



Machine Learning: What?
SIMILARITY SEARCH
Similarity search provides a way to find the objects that are the most
similar, in an overall sense, to the object(s) of interest.

PREDICTIVE ANALYTICS
Predictive analytics is the science of analyzing current and historical
facts/data to make predictions about future events.

CLUSTERING AND SEGMENTATION
Cluster analysis and segmentation represent a purely data-driven approach
to grouping similar objects, behaviors, or whatever is represented by the data.


From http://www.skytree.net/why-do-machine-learning-on-big-data/use-cases/
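
To make the similarity search definition concrete, here is a minimal brute-force sketch. It assumes objects are represented as numeric feature vectors and uses cosine similarity as the measure; both choices, and the names `cosine_similarity` and `most_similar`, are illustrative assumptions rather than anything the source prescribes.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(
    query: Sequence[float], items: Dict[str, Sequence[float]], k: int = 1
) -> List[Tuple[str, float]]:
    """Return the k items most similar to the query, best match first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in items.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical customer profiles: (spend on books, spend on music).
profiles = {"alice": [1.0, 0.0], "bob": [0.0, 1.0], "carol": [1.0, 1.0]}
top = most_similar([1.0, 0.1], profiles, k=2)
print([name for name, _ in top])
```

This linear scan compares the query against every item, so its cost grows with rows times columns, which is exactly the quantity the ARI measures.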
Word Counting on Map Reduce
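
A minimal, single-process sketch of word counting in the map-reduce style. It assumes whitespace tokenization and lowercase normalization; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are illustrative and are not Hadoop APIs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework would between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


counts = word_count(["big data", "Big Data at rest"])
print(counts)
```

On a cluster, the map calls run in parallel on separate input splits and the reduce calls run in parallel per key; only the shuffle moves data between machines.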




Machine learning on Map Reduce




     From http://www.slideshare.net/hadoop/modeling-with-hadoop-kdd2011




Machine learning on Map Reduce




From http://www.slideshare.net/hadoop/modeling-with-hadoop-kdd2011
Machine Learning: Use Cases

 E-Commerce / E-Tailing
 ● Product Recommendation Engines
 ● Cross Channel Analytics
 ● Events/Activity Behavior Segmentation

 Product Marketing
 ● Campaign management and optimization
 ● Market and consumer segmentations
 ● Pricing Optimization

 Customer Marketing
 ● Customer Churn Management
 ● (Mobile) User Behavior Prediction
 ● Offer Personalization


Big Data: Opportunities

 Unstructured Data
 ● Clustering
 ● Distributed processing
 ● Distributed Storage

 Modeling & Analytics
 ● Distributed Machine Learning
 ● Fast Online Analytics Cubes

 Streaming and Real-Time processing
 ● Build RT profiles
 ● Decision trees and Predictions
 ● Offer Personalization



Thanks


         linkedin:
         www.linkedin.com/in/natalinobusa

         blog:
         www.natalinobusa.com

Strata 2014: Data science and big data trending topics
 
Streaming computing: architectures, and tchnologies
Streaming computing: architectures, and tchnologiesStreaming computing: architectures, and tchnologies
Streaming computing: architectures, and tchnologies
 

Recently uploaded

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 

Recently uploaded (20)

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 

Big data landscape

  • 1. Big data: The technology landscape and its applications. Natalino Busa - 12 Feb. 2013
  • 2. Outline
      ● Big Data: Who art thou?
      ● Big Data: The technology landscape
      ● Hadoop: Overview
      ● Analytics & Machine Learning
      ● Opportunities
  • 3. Hype cycle on new IT technologies (Gartner 2012)
  • 4. What is big data?
      DATA (structured and unstructured: logs, ETL, social)
      Velocity, Diversity, Volume
      BIG DATA
      Hardware: Infrastructure, (Private) Cloud, Networking
      Software: Marketing (e.g. Unica), Analytics (Tableau), Modeling (SAS)
      Services: RDBMS, OLAP, Messaging
  • 5. Big Data Heat map
  • 6. How big is big?
      SkyTree (tm) defines the Analytics Requirements Index (ARI):
      ARI = (# Rows × # Columns) / Time (secs)
      where
      # Rows = number of records being analyzed
      # Columns = number of variables captured in each record
      Time (secs) = the timeframe within which to complete the analysis
      Example: for each view (1000 views/sec), produce a personalized banner. This requires analyzing 100 variables on 1000 records (historic data) every 1 ms:
      ARI = (1000 × 100) / 0.001 = 100 M values/sec
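The ARI arithmetic from slide 6 can be checked with a few lines of Python. This is only a sketch of the formula as stated on the slide; the function name `analytics_requirements_index` and its parameter names are illustrative assumptions, not part of any SkyTree API.

```python
def analytics_requirements_index(rows: int, columns: int, seconds: float) -> float:
    """Return ARI = (# rows x # columns) / time in seconds (hypothetical helper)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return rows * columns / seconds


# Slide example: 100 variables on 1000 historic records, every 1 ms.
ari = analytics_requirements_index(rows=1000, columns=100, seconds=0.001)
print(f"ARI = {ari / 1e6:.0f} M values/sec")  # prints "ARI = 100 M values/sec"
```

The same helper makes it easy to see how the index scales: doubling the number of variables, or halving the allowed response time, doubles the ARI.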
  • 7. What data? Big Data can imply:
      ● Complex data refactoring in batch (lots of rows)
      ● Real-time event processing (high-speed responses)
      ● Multidimensional analysis (lots of parameters)
      ● ... or any of those three
      [Figure: three axes labeled Response time, Parameters, Entities]
  • 8. More data
      [Figure: data sources accumulate (customers, products, surveys, transactions, social messages) along a spectrum from Structured to Unstructured: Database, Databases, Federated Data, Aggregated Data, Linked Data, Just Data]
      ● In today's IT environments there is a gradual shift from structured data to unstructured data.
      ● RDBMSs are well suited to structured data, but: ETL becomes larger and more complex, and how do we deal with new data (structures)?
      ● Map-Reduce and NoSQL systems are good with unstructured data, but: how do we query and analyze this data?
  • 9. Big Data: how to deal with it
      ● Big Data at rest (storage, access)
      ● Big Data in motion (streaming, dataflows)
      ● Big Data analytics (OLAP, OTAP, BI)
      ● Big Data modeling (predictive, machine learning)
  • 10. Big Data at rest
      Analytical RDBMSs (EDW): Oracle, IBM, and various MPPs
      Hadoop distributed systems: HDFS (distributed file system), HBase (Big Table)
      [Diagram labels: Batch, Real-time, Cassandra, HBase, Analytics, Logs, HDFS, EDW]
      ● Traditional EDW and distributed Big Data / NoSQL solutions are complementary to each other.
      ● These systems do not exclude each other and can coexist to form a full enterprise-level solution.
  • 11. Big Data at rest
      No need to get everything out of the Hadoop ecosystem.
      NoSQL DBMSs:
      - Couchbase (++ reads, caching)
      - Cassandra (++ writes, OLAP)
      Hybrid solutions are also possible:
      - HDFS + Cassandra: in-memory analytics + large DFS
      - HDFS + Solr/Lucene: fast text search on a distributed file system
  • 12. Big Data in motion
      Stream processing / dataflow architectures: used to support the automatic analysis of data in motion in real time or near real time.
      - Identify meaningful patterns
      - Trigger actions to respond to them as quickly as possible
      Frameworks:
      - Storm (from Twitter): dataflow processing framework, ++ multi-language
      - Akka (from Typesafe): dataflow actor framework, ++ speed
      Both are distributed, fault-tolerant, and streaming.
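The two steps named on slide 12 (identify a pattern, then trigger an action) can be illustrated with plain Python generators. This is a single-process sketch only: it does not use Storm or Akka and has none of their distribution or fault tolerance. The names `sliding_average` and `detect`, the window size, and the threshold are all assumptions made for illustration.

```python
from collections import deque
from typing import Callable, Iterable, Iterator


def sliding_average(events: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the last `window` values as each event arrives."""
    recent: deque = deque(maxlen=window)
    for value in events:
        recent.append(value)
        yield sum(recent) / len(recent)


def detect(events: Iterable[float], window: int, threshold: float,
           on_alert: Callable[[int, float], None]) -> None:
    """Call on_alert(index, average) when the moving average exceeds threshold."""
    for index, average in enumerate(sliding_average(events, window)):
        if average > threshold:
            on_alert(index, average)


alerts = []
detect([1, 2, 9, 10, 11, 2, 1], window=3, threshold=8.0,
       on_alert=lambda i, avg: alerts.append((i, avg)))
print(alerts)  # [(4, 10.0)]
```

Because the events are consumed one at a time and only the last `window` values are kept, the same structure works on an unbounded stream; a real dataflow framework adds partitioning and recovery on top of this idea.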
  • 13. Big Data Landscape
      [Diagram; recoverable labels:]
      Machine learning on Big Data: SAS, R over HDFS, Mahout
      Sources: Unstructured, Logs, EDW, Unstructured FS
      Ingestion: flume, scribe, sqoop, hiho
      Storage and query: HDFS, MapR, HBase, Cassandra, Hive, Pig, Impala
      Data interfaces: REST, BI, OLAP, OTAP
      ● Batch Analytics ● Visualization ● Monitoring ● Marketing
      ● Real-Time Analytics ● Streaming (STORM)
  • 14. Lambda Architecture: Logic layer, Software as a Service (e.g. real-time predictor). From http://www.manning.com/marz/
  • 15. Why do machine learning on big data http://www.skytree.net/why-do-machine-learning-on-big-data/
  • 16. Machine Learning: What?
      SIMILARITY SEARCH: Similarity search provides a way to find the objects that are the most similar, in an overall sense, to the object(s) of interest.
      PREDICTIVE ANALYTICS: Predictive analytics is the science of analyzing current and historical facts/data to make predictions about future events.
      CLUSTERING AND SEGMENTATION: Cluster analysis and segmentation represent a purely data-driven approach to grouping similar objects, behaviors, or whatever is represented by the data.
      From http://www.skytree.net/why-do-machine-learning-on-big-data/use-cases/
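As an illustration of the similarity search idea on slide 16, the sketch below ranks a few objects by cosine similarity to a query vector. Cosine similarity is one common choice and is an assumption here, as are the function names and the toy `profiles` data; the slides do not prescribe a particular measure.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query: Sequence[float], items: Dict[str, Sequence[float]],
                 k: int = 1) -> List[str]:
    """Return the names of the k items most similar to the query."""
    ranked = sorted(items, key=lambda name: cosine_similarity(query, items[name]),
                    reverse=True)
    return ranked[:k]


# Toy feature vectors, invented for this example.
profiles = {"alice": [5, 0, 1], "bob": [4, 1, 0], "carol": [0, 5, 4]}
print(most_similar([5, 1, 0], profiles, k=2))  # ['bob', 'alice']
```

This brute-force scan compares the query against every object, which is fine for a handful of records; at big data scale the comparison would have to be indexed or distributed.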
  • 17. Word Counting on Map Reduce
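Slide 17 shows word counting only as a diagram. The sketch below reproduces the usual three phases (map, shuffle, reduce) in a single Python process to make the data flow concrete. It is not Hadoop code, and the function names are assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (the word)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(word, counts) for word, counts in shuffle(mapped).items())


counts = word_count(["big data at rest", "big data in motion"])
print(counts["big"], counts["data"], counts["rest"])  # 2 2 1
```

In a real Map-Reduce cluster the map and reduce calls run on different machines and the shuffle moves data over the network; the per-key independence of `reduce_phase` is what makes that parallelism possible.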
  • 18. Machine learning on Map Reduce From http://www.slideshare.net/hadoop/modeling-with-hadoop-kdd2011
  • 19. Machine learning on Map Reduce From http://www.slideshare.net/hadoop/modeling-with-hadoop-kdd2011
  • 20. Machine Learning: Use Cases
      E-Commerce / E-Tailing
      ● Product Recommendation Engines
      ● Cross Channel Analytics
      ● Events/Activity Behavior Segmentation
      Product Marketing
      ● Campaign management and optimization
      ● Market and consumer segmentations
      ● Pricing Optimization
      Customer Marketing
      ● Customer Churn Management
      ● (Mobile) User Behavior Prediction
      ● Offer Personalization
  • 21. Big Data: Opportunities
      Unstructured Data
      ● Clustering
      ● Distributed processing
      ● Distributed Storage
      Modeling & Analytics
      ● Distributed Machine Learning
      ● Fast Online Analytics Cubes
      Streaming and Real-Time processing
      ● Build RT profiles
      ● Decision trees and Predictions
      ● Offer Personalization
  • 22. Thanks linkedin: www.linkedin.com/in/natalinobusa blog: www.natalinobusa.com