Big Data Solutions on Cloud – the way forward
By: K. A. Kiththi Perera
Chief Enterprise and Wholesale Officer
Sri Lanka Telecom
ITU-TRCSL Symposium on Cloud Computing 2015
Colombo
Session 04: Big Data Strategy in the Cloud and Applications
Big Data Analytics and Cloud Computing
• Two ICT initiatives are currently top of mind for organizations:
– Big Data Analytics and
– Cloud Computing
• Big Data Analytics can:
– Deliver valuable insights to create competitive advantage
– Spark new innovations and
– Drive revenue
• Cloud Computing can:
– Enhance business agility and productivity
– Enable greater efficiencies and
– Reduce costs
Both technologies continue to evolve
Big Data
Harnessing Big Data
• OLTP: Online Transaction Processing (DBMSs)
• OLAP: Online Analytical Processing (Data Warehousing)
• RTAP: Real-Time Analytics and Processing (Big Data Architecture & technology)
Big Data – Variety and Complexity
What’s driving Big Data
- Ad-hoc querying and reporting
- Data mining techniques
- Structured data, typical sources
- Small to mid-size datasets
- Optimizations and predictive analytics
- Complex statistical analysis
- All types of data, and many sources
- Very large datasets
- More of a real-time
Value of Big Data Analytics
• Big Data is more real-time in nature than traditional DW applications
• Traditional DW architectures (e.g. Exadata, Teradata) are not well-suited for big data apps
• Shared-nothing, massively parallel processing, scale-out architectures are well-suited for big data apps
“Without big data, you are blind and deaf in the middle of a freeway”
Geoffrey Moore, management consultant and theorist
Organizations need a high-performance, easy-to-use data transformation and analytics solution for Big Data
Scale and Architectures
Hadoop Functional Blocks
Hive - A data warehouse layer with a SQL-like query language (HiveQL), built on top of MapReduce for analyzing large data sets.
Pig - Enables the analysis of large data sets using the Pig Latin scripting language.
Sqoop - ("SQL to Hadoop") A Java-based application designed for transferring bulk data between Apache Hadoop and non-Hadoop data stores.
Hadoop Core Components
• HDFS – Hadoop Distributed File System (Distributed Storage):
– Data is distributed across multiple “nodes”
– Natively redundant
– Self-healing, high-bandwidth clustered storage
– “NameNode” tracks data locations
• MapReduce (Distributed Processing):
– Splits a task across processors
– JobTracker manages TaskTrackers
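To make "split a task across processors" concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps, using word counting as the task. It is an illustration only, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made for this example, and a real cluster would run each split as a separate task on a different node.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(split: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(splits: List[str]) -> Dict[str, int]:
    """Run map over every split, then shuffle and reduce the results."""
    mapped: List[Tuple[str, int]] = []
    for split in splits:
        # On a cluster, each split would be processed by its own map task.
        mapped.extend(map_phase(split))
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    # Hypothetical input: three "splits" of a larger data set.
    print(word_count(["big data on cloud", "cloud for big data", "data gravity"]))
```

Because each map call only sees its own split and each reduce call only sees one key, both steps can be spread over many machines, which is the property the slide refers to.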
Big Data and EDW to coexist?
Alternatives to Hadoop
• Many believe that Hadoop is the only option for Big Data
• Hadoop's historic focus on batch processing of data was well supported by MapReduce
• But there is a need for a more flexible developer tool to support:
– The larger market of mid-size data sets and
– Use cases that call for real-time processing
• Apache Spark: Preparing for the Next Wave of Reactive Big Data
Survey on Apache Spark
Hadoop and Spark – work together
Cloud for Big Data?
Economics of Cloud Users
• Pay by use instead of provisioning for peak
Figure: demand and capacity plotted as resources over time, for a static data center (with unused resources) and a data center in the cloud
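The pay-by-use argument can be checked with simple arithmetic. The sketch below compares provisioning for peak demand against paying only for what is used. The demand figures and the function names are hypothetical, and it assumes the same unit price in both models, which may not hold in practice.

```python
from typing import Sequence


def static_cost(demand: Sequence[float], unit_price: float) -> float:
    """Cost when capacity equal to peak demand is paid for in every period."""
    return max(demand) * len(demand) * unit_price


def pay_per_use_cost(demand: Sequence[float], unit_price: float) -> float:
    """Cost when only the resources actually consumed are paid for."""
    return sum(demand) * unit_price


def unused_fraction(demand: Sequence[float]) -> float:
    """Share of peak-provisioned capacity that is never used."""
    provisioned = max(demand) * len(demand)
    return 1 - sum(demand) / provisioned


if __name__ == "__main__":
    # Hypothetical demand (resource units) over five periods, with one peak.
    demand = [20, 30, 100, 40, 10]
    print("static cost:", static_cost(demand, unit_price=1.0))
    print("pay-per-use cost:", pay_per_use_cost(demand, unit_price=1.0))
    print("unused share of static capacity:", round(unused_fraction(demand), 2))
```

The spikier the demand, the larger the unused share of a peak-sized static data center, and the stronger the case for elastic capacity.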
Cloud Computing Modalities
• Hosted Applications and services
• Pay-as-you-go model
• Scalability, fault-tolerance, elasticity, and self-manageability
• Very large data repositories
• Complex analysis
• Distributed and parallel data processing
“Can we outsource our IT software and hardware infrastructure?”
“We have terabytes of click-stream data – what can we do with it?”
EDBT 2011 Tutorial
Big Data - Cloud Option and Challenges
• Keys to big data success:
– Elastic infrastructure and
– Data gravity
• Cloud is emerging as an increasingly popular option for new analytics applications and for processing big data
• Challenge - moving hundreds of terabytes or petabytes of data across the network
– Traditional data is largely located in the Enterprise Data Warehouse
– WAN speed is limited
• New data sets – weather data, census data, machine and sensor data – originate from outside the enterprise
– Cloud becomes the ideal place to capture and process this data
Cloud Service Providers should offer “Hadoop/Spark as a service” bundled with “High Speed Connectivity”
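The data movement challenge is easy to quantify. The sketch below estimates how long a bulk transfer takes over a WAN link. The function name, the decimal unit convention (1 TB = 10^12 bytes), and the `efficiency` parameter for protocol overhead are assumptions made for this illustration.

```python
def transfer_days(data_terabytes: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Estimate the days needed to move a data set over a network link.

    data_terabytes: data volume in decimal terabytes (1 TB = 10**12 bytes).
    link_mbps: nominal link speed in megabits per second.
    efficiency: fraction of nominal speed actually achieved (assumed value).
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    bits = data_terabytes * 10**12 * 8
    seconds = bits / (link_mbps * 10**6 * efficiency)
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    # Hypothetical scenario: 100 TB moved over 100 Mbps, 1 Gbps and 10 Gbps links.
    for mbps in (100, 1_000, 10_000):
        print(f"{mbps} Mbps: {transfer_days(100, mbps):.1f} days")
```

Under these assumptions, 100 TB needs more than nine days even on a fully used 1 Gbps link, which is why data gravity and high-speed connectivity matter when choosing where to process the data.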
SLT “akaza” cloud services
• IaaS – Infrastructure as a Service
• SaaS – Software as a Service
• DaaS – Desktop as a Service
• CaaS – Communication as a Service
• PaaS – Platform as a Service
Big Data Use Cases
01 Optimize Funnel Conversion
02 Behavioral Analytics
03 Customer Segmentation
04 Predictive Support
05 Market Analysis and Pricing Optimization
06 Predict Security Threats
Optimize Funnel Conversion
• Big data analytics allows companies to track leads through the entire sales conversion process, from a click on an AdWords ad to the final transaction, in order to uncover insights on how the conversion process can be improved.
Company: T-Mobile
Industry: Communication
Employees: 38,000
Type: Optimize Funnel Conversion
Purpose: T-Mobile uses multiple indicators, such as billing and sentiment analysis, to identify customers who can be upgraded to higher-quality products, as well as those with a high customer lifetime value, so its team can focus on retaining those customers.
Behavioral Analytics
• With access to data on consumer behavior, companies can learn what prompts a customer to stick around longer, as well as learn more about their customers' characteristics and purchasing habits, in order to improve marketing efforts and boost profits.
Company: McDonald’s
Industry: Food and Beverage
Employees: 750,000
Type: Behavioral Analytics
Purpose: McDonald’s tracks vast amounts of data in order to improve operations and boost the customer experience. The company looks at factors such as the design of the drive-thru, information provided on the menu, wait times, size of orders and ordering patterns in order to optimize each restaurant for its particular market.
Customer Segmentation
• By accessing data about the consumer from multiple sources, such as social media data and transaction history, companies can better segment and target their customers and start to make personalized offers to those customers.
Company: InterContinental Hotels Group (IHG)
Industry: Hotel/Travel
Employees: 7,981
Type: Customer Segmentation
Purpose: IHG collects extensive data about its customers in order to provide a personalized web experience for each customer, so as to boost conversion rates. It also uses data analytics to evaluate and adjust its marketing mix.
Predictive Support
• Through sensors and other machine-generated data, companies can identify when a malfunction is likely to occur. The company can then proactively order parts and make repairs in order to avoid downtime and lost profits.
Company: Southwest Airlines
Industry: Travel
Employees: 45,000
Type: Predictive Support
Purpose: Southwest analyses sensor data from its planes in order to identify patterns that indicate a potential malfunction or safety issue. This allows the airline to address potential problems and make necessary repairs without interrupting flights or putting passengers in danger.
“Information is the oil of the 21st century, and analytics is the combustion engine.”
Peter Sondergaard, Gartner Research
References
• http://spark.apache.org/
• https://hadoop.apache.org/
• https://www.oracle.com/big-data/index.html
• http://www.computerworld.com/article/2929384/cloud-computing/
• http://www.thoughtworks.com/insights/blog/6-reasons-why-hadoop-cloud-makes-sense
• http://www.finance.gov.au/files/2013/03/Big-Data-Strategy-Issues-Paper1.pdf
• http://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/big-data-cloud-technologies-brief.pdf
• https://datafloq.com/read/Big-Data-Hadoop-Alternatives/1135
• http://www.slideshare.net/Dell/big-data-use-cases-36019892
• http://www.rackspace.com/big-data
• http://www.microsoft.com/en-us/server-cloud/solutions/big-data.aspx
• http://www.slideshare.net/BernardMarr/big-data-news-feb-2015
• http://aptuz.com/blog/is-apache-spark-going-to-replace-hadoop/
• https://adtmag.com/blogs/dev-watch/2015/03/hadoop-and-spark-friends-or-foes.aspx
• http://www.datastax.com/resources/webinars/choosing-a-big-data-solution
• http://www.infosys.com/cloud/resource-center/Documents/big-data-spectrum.pdf
• http://www.slideshare.net/nasrinhussain1/big-data-ppt-31616290
• http://www.adamadiouf.com/2013/03/22/bigdata-vs-enterprise-data-warehouse/