Big Trends in Big Data
2013 AITP Region-5 Technical Conference
- Naresh Chintalcheru
Agenda - Big Data Trends
● Batch to Real Time
● Sql, Sql, Sql …
● Cloud Platform Support
● Apache Hadoop 2.0
○ Improved Performance
○ Improved Scalability
○ Improved Security
● Applications
○ Pattern Discovery Analytics
○ Sophisticated Visualization
● BI & Data Warehouse
● Big Data Vision
Batch to Real Time
The image of Big Data is changing from batch to real time.
Hadoop + MapReduce = Batch Processing
● Companies need real-time processing of Big Data for
applications such as online fraud detection and
CEP (Complex Event Processing).
● Emerging frameworks, architectures and tools are
making real-time processing practical.
Big Data Real-Time Computing Systems
● Twitter’s Storm is an open source, distributed, fault-tolerant, real-time computation system.
○ Storm is a stream processing system
○ Unlike Hadoop jobs, Storm jobs never stop; they continue
to process data as it arrives
● Other real-time systems include StreamBase,
HStreaming, Apache S4, Dempsy and Esper.
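The batch versus stream distinction can be illustrated with a small Python sketch. It does not use Storm's API. The function names `batch_word_count` and `stream_word_count` are hypothetical, and a plain Python iterator stands in for a live data source.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List


def batch_word_count(lines: List[str]) -> Dict[str, int]:
    """Batch style: needs the whole input up front and returns one final result."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return dict(counts)


def stream_word_count(lines: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Stream style: yields an updated snapshot after each arriving line."""
    counts: Counter = Counter()
    for line in lines:  # the source may be unbounded, so this loop may never end
        counts.update(line.split())
        yield dict(counts)
```

The batch function only answers once all input has been read, while the streaming generator has a usable answer after every record, which is the property that fraud detection and CEP workloads rely on.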
Big Data SQL Tools
Big Data processing today includes ...
● Writing complex Java MapReduce jobs
● Apache Pig Latin scripting
● Slow SQL processing with Apache Hive
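To give a feel for why hand-written MapReduce jobs are considered complex, here is a minimal single-process sketch of the map, shuffle and reduce phases for a word count. It is plain Python, not Hadoop's Java API, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Even this toy version needs three explicit phases for what is a single aggregation, and a real job adds job configuration, serialization and cluster submission on top.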
Inspired by Google’s Dremel paper, many vendors now
offer faster SQL-based tools
● Google BigQuery
● Cloudera Impala
● IBM Big SQL
● Greenplum HAWQ
● Hortonworks Stinger (aims to improve Hive SQL performance by 100x)
● Apache Drill
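The appeal of SQL-based tools is that an aggregation becomes one declarative statement. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in engine. It is not Impala, BigQuery or Hive, and the `clicks` table and `top_pages` function are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def top_pages(clicks: List[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Count hits per page with a single GROUP BY query."""
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for a SQL engine
    try:
        conn.execute("CREATE TABLE clicks (user_id TEXT, page TEXT)")
        conn.executemany("INSERT INTO clicks VALUES (?, ?)", clicks)
        rows = conn.execute(
            "SELECT page, COUNT(*) AS hits FROM clicks "
            "GROUP BY page ORDER BY hits DESC, page ASC"
        ).fetchall()
    finally:
        conn.close()
    return rows
```

The query text is the part that would carry over to a SQL-on-Hadoop tool, subject to each tool's dialect. The engine behind it is what the vendors listed above compete on.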
Big Data And Cloud
Big Data needs many computing nodes for data storage
and data processing, and that demand is elastic in nature …
● Cloud VM-based computing is a natural fit for
Big Data infrastructure
● Public cloud leader Amazon AWS announced
support for Hadoop, which means you can spin up a VM with
Hadoop installed and a basic configuration in about 10 minutes
Hadoop 2.0
New in Hadoop 2.x
● Improved Performance with YARN aka MapReduce 2.0
● Improved Scalability with HDFS Federation
● Support for Microsoft Windows
● Improved Security
● HDFS Snapshots
Hadoop 2.0 - Performance
Improved Performance with YARN aka MapReduce 2.0
● Previously, the MapReduce JobTracker handled both resource
management and the application job life-cycle.
● Now the two functions are split into separate
components.
● A per-application ApplicationMaster negotiates with the global
ResourceManager for the resources its job needs
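The split of responsibilities can be pictured with a toy model. This is not YARN's API. The classes below are hypothetical and only borrow the component names: the resource manager knows nothing about jobs, and the application master knows nothing about cluster-wide capacity.

```python
from dataclasses import dataclass


@dataclass
class ResourceManager:
    """Toy global scheduler: only tracks free container slots."""
    free_containers: int

    def allocate(self, requested: int) -> int:
        """Grant as many containers as are free, up to the request."""
        granted = min(requested, self.free_containers)
        self.free_containers -= granted
        return granted

    def release(self, count: int) -> None:
        """Return finished containers to the pool."""
        self.free_containers += count


@dataclass
class ApplicationMaster:
    """Toy per-job master: owns the job life-cycle, asks the RM for capacity."""
    job_name: str
    pending_tasks: int
    completed_tasks: int = 0

    def run(self, rm: ResourceManager) -> int:
        """Run all tasks in waves and return the number of waves needed."""
        rounds = 0
        while self.pending_tasks > 0:
            granted = rm.allocate(self.pending_tasks)
            if granted == 0:
                raise RuntimeError("no capacity available")
            self.pending_tasks -= granted    # tasks run in the granted containers
            self.completed_tasks += granted
            rm.release(granted)              # containers go back when tasks finish
            rounds += 1
        return rounds
```

With 4 free containers and 10 pending tasks, the job finishes in three waves (4, 4, 2), and the resource manager never has to track task state.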
Hadoop 2.0 - Scalability
HDFS Federation
● No longer limited to a single NameNode (NN) and Secondary NameNode (SNN).
● HDFS Federation supports multiple independent
NameNodes and namespaces.
● Each DataNode (DN) registers with all the NameNodes in
the cluster. DNs send periodic heartbeats and block
reports to all NNs and handle commands from them.
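A rough sketch of the federation idea follows. It is a toy model, not HDFS code. The `NameNode` class, `register_datanode` and the prefix-based `resolve` function are hypothetical, and the mount table is an assumption used here to show how a path could be routed to one of several independent namespaces.

```python
from typing import Dict, List


class NameNode:
    """Toy NameNode: owns one namespace and a list of registered DataNodes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.datanodes: List[str] = []

    def register(self, datanode_id: str) -> None:
        self.datanodes.append(datanode_id)


def register_datanode(datanode_id: str, namenodes: List[NameNode]) -> None:
    """Each DataNode registers with every NameNode in the cluster."""
    for nn in namenodes:
        nn.register(datanode_id)


def resolve(path: str, mount_table: Dict[str, NameNode]) -> NameNode:
    """Pick the NameNode whose mount prefix is the longest match for the path."""
    matches = [
        prefix for prefix in mount_table
        if path == prefix or path.startswith(prefix.rstrip("/") + "/")
    ]
    if not matches:
        raise KeyError(path)
    return mount_table[max(matches, key=len)]
```

The NameNodes share the same pool of DataNodes for storage but never coordinate with each other, which is what lets the namespace scale horizontally.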
Hadoop 2.0 - Security
Improved Security
● HDFS file permissions enforced by the NN, with Access
Control Lists (ACLs) for users and groups
● Block Access Tokens for access control to data blocks
● Job Tokens to enforce task authorization
● Network encryption and Kerberos-authenticated RPC. HDFS file
transfer can now be configured for encryption
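The token idea can be sketched with a keyed hash: one party issues a token over the user, block and access mode, and another party holding the same secret verifies it before serving the block. This is an illustration of the general technique using Python's standard `hmac` module, not Hadoop's actual token format. The function names and the message layout are assumptions.

```python
import hashlib
import hmac


def issue_block_token(secret: bytes, user: str, block_id: str, mode: str) -> str:
    """Issuer side: sign (user, block, mode) with the shared secret."""
    message = f"{user}|{block_id}|{mode}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_block_token(
    secret: bytes, user: str, block_id: str, mode: str, token: str
) -> bool:
    """Verifier side: recompute the signature and compare in constant time."""
    expected = issue_block_token(secret, user, block_id, mode)
    return hmac.compare_digest(expected, token)
```

A token issued for reading a block fails verification if it is presented for writing, or by a different user, because the signed message no longer matches.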
Hadoop 2.0 - HDFS Snapshots
Improved Backup & Disaster Recovery
● HDFS Snapshots are read-only point-in-time copies of
the file system.
● Snapshots can be taken on a subtree or on the entire file
system.
● Useful for data backup, protection against user errors
and disaster recovery
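The read-only, point-in-time behaviour can be shown with a toy in-memory file system. This is a conceptual sketch, not HDFS code. The `ToyFileSystem` class is hypothetical, and it copies a dictionary for simplicity, where a real file system would avoid copying the data.

```python
from types import MappingProxyType
from typing import Dict, Mapping


class ToyFileSystem:
    """Toy file system mapping paths to contents, with named snapshots."""

    def __init__(self) -> None:
        self.files: Dict[str, str] = {}
        self.snapshots: Dict[str, Mapping[str, str]] = {}

    def write(self, path: str, data: str) -> None:
        self.files[path] = data

    def create_snapshot(self, root: str, name: str) -> None:
        """Freeze the current state of everything under root (a subtree or '/')."""
        prefix = root.rstrip("/") + "/"
        frozen = {p: d for p, d in self.files.items() if p.startswith(prefix)}
        # MappingProxyType gives a read-only view, so the snapshot cannot be edited
        self.snapshots[name] = MappingProxyType(frozen)
```

After a snapshot is taken, later writes change the live files but not the snapshot, which is what makes it useful for recovering from user errors.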
Big Data Applications
● The infrastructure layer of Big Data is largely solved (.........
secret Hadoop)
● Future innovation is now focused on applications and
analytics
Big Data Analytic Applications
Pattern Discovery and Sense-Making based analytic
applications.
● WibiData: Lessons learned and predictive apps
● Recorded Future: Web intelligence for business decisions
● Nutonian: Uncovers relationships hidden within complex
data
● RStudio: Data analysis tool
Big Data - Visualization Applications
Sophisticated Big Data Visualization tools.
● IBM BigSheets
● D3.js
● Fathom
● Processing.org
Big Data & Business Intelligence
BI vendors such as IBM Cognos, SAP BusinessObjects and Oracle
Hyperion support connecting directly to Hadoop data
through Apache Hive connectors.
Big Data & Data Warehouse
New unstructured data sources such as clickstreams, social
media, mobile, sensors and web logs require massive
processing, and traditional data warehouses are costly to
scale.
The big question: will the data warehouse survive Big Data?
More on this in my next presentation :)
Big Data Vision
Big Data requires a Big Vision
● Unlike Business Intelligence, Big Data is an innovation
that originated on the IT side.
● Business departments, which should come up with Big
Data usage requirements, need constant coaching on the
potential of Big Data intelligence and on success
stories.
Thank You
Feedback appreciated
Nash Chintalcheru
Chintal75@gmail.com
309-242-1615
Presentation pdf : www.slideshare.net/chintal75
