Trends in Big Data
Selvaraaju Murugesan
https://www.linkedin.com/in/selvaraaju/
Overview
• Big data evolution and role of data
• Big data ecosystem and tools
• Demo of MapR cluster
• Trends in next 5 years
• My personal recommendations
• Conclusion
When big data was small
• Most transactional data was (and still is) stored in
databases
• Social media was getting some traction
• Mobile phone penetration was very low
• Managers made decisions from reports based on
static data (there was no live data feed to inform
their decisions at the right time)
The big data landscape has changed!
Old vs New Paradigm
Distributed Storage
Big data ≠ Hadoop
Main players
MapR Converged Data Platform
David vs Goliath
https://www.dezyre.com/article/cloudera-vs-hortonworks-vs-mapr-hadoop-distribution-comparison-/190
Ecosystem tools
Big data process
Ingestion → Storage → Analysis → Presentation
Data Ingestion: Flume / StreamSets / Impala
Hive / Hue
Self-Service Data Exploration
Data Agility with Less IT Required
Single SQL Interface for Structured and
Semi-Structured Data
Data Exploration
Data Analytics – R / SparkR
Operationalise – Spark
Trends in next 5 years
• Homes will not have supercomputers, but they will have powerful nodes
that can do distributed computing and storage
• Data analysis and decision making will be simple enough for a
5-year-old to perform, using standard AI libraries and cheap hardware
• Big data will empower deep learning
• Bots will try to mimic human services
What comes after big data?
Recommendations
• A big data platform can be implemented in many ways; Hadoop is not
the only option!
• Analyse the data that is relevant to making business
decisions
• Beware of cloud providers and their traps
• Make data-driven decisions, but remember that a hunch is still very important
Creativity > big data
Editor's Notes

  • #17, #18, #19 With the volume and velocity of IoT data, an important advantage is the ability to explore the data directly, without requiring IT to set up the data first. This is where Apache Drill comes in: a SQL query engine that supports self-service data exploration without the need to predefine a schema. Drill is ANSI SQL compliant and plugs into the BI tools you are already accustomed to using. With Drill you simply query your data in place; there is no need to perform ETL or to move your data. After all, if you currently use business intelligence tools, you should be able to keep using them. Data exploration, before and after: no IT step required.
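
As a rough illustration of "query your data in place", the sketch below builds a request for Drill's REST query endpoint that selects rows straight from a JSON file, with no schema definition or ETL step. It is a minimal sketch, not part of the original talk: the host and port (localhost:8047), the `dfs` storage plugin, the file path `/data/sensors/readings.json`, and the helper names are all assumptions chosen for illustration.

```python
import json
from urllib import request

# Assumed address of a local Drillbit web endpoint (illustrative only).
DRILL_URL = "http://localhost:8047/query.json"


def build_drill_query(path: str, limit: int = 10) -> dict:
    """Build a REST payload that queries a raw file in place.

    `path` is a hypothetical file visible to Drill's `dfs` storage plugin.
    No table or schema is declared beforehand.
    """
    sql = f"SELECT * FROM dfs.`{path}` LIMIT {limit}"
    return {"queryType": "SQL", "query": sql}


def build_request(payload: dict, url: str = DRILL_URL) -> request.Request:
    """Wrap the payload in an HTTP POST request without sending it."""
    return request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(payload: dict, url: str = DRILL_URL) -> list:
    """Send the query; this needs a running Drill instance at `url`."""
    with request.urlopen(build_request(payload, url)) as resp:
        return json.load(resp).get("rows", [])


# Hypothetical sensor file, queried where it sits.
payload = build_drill_query("/data/sensors/readings.json")
req = build_request(payload)
```

Calling `run_query(payload)` against a live cluster would return the matching rows; BI tools would typically reach the same engine through its ODBC or JDBC drivers rather than through REST.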