SlideShare a Scribd company logo
1 of 23
Big Data
Platform and Architecture Recommendation
Sofyan Hadi Ahmad
sofyan.ahmad@mandalalabs.com
November 2018
Why Big Data is Important?
Despite the hype, many organizations don’t realize they have a big data problem
or they simply don’t think of it in terms of big data. In general, an organization is
likely to benefit from big data technologies when existing databases and
applications can no longer scale to support sudden increases in volume, variety,
and velocity of data.
Failure to correctly address big data challenges can result in escalating costs, as
well as reduced productivity and competitiveness. On the other hand, a sound big
data strategy can help organizations reduce costs and gain operational
efficiencies by migrating heavy existing workloads to big data technologies; as
well as deploying new applications to capitalize on new opportunities.
How Does Big Data Work?
With new tools that address the entire data management cycle, big data technologies make it technically
and economically feasible, not only to collect and store larger datasets, but also to analyze them in order
to uncover new and valuable insights. In most cases, big data processing involves a common data flow –
from collection of raw data to consumption of actionable information.
● Collect. Collecting the raw data – transactions, logs, mobile devices and more – is the first challenge many
organizations face when dealing with big data. A good big data platform makes this step easier, allowing developers
to ingest a wide variety of data – from structured to unstructured – at any speed – from real-time to batch.
● Store. Any big data platform needs a secure, scalable, and durable repository to store data prior or even after
processing tasks. Depending on your specific requirements, you may also need temporary stores for data in-transit.
● Process & Analyze. This is the step where data is transformed from its raw state into a consumable format – usually
by means of sorting, aggregating, joining and even performing more advanced functions and algorithms. The
resulting data sets are then stored for further processing or made available for consumption via business intelligence
and data visualization tools.
● Consume & Visualize. Big data is all about getting high value, actionable insights from your data assets. Ideally,
data is made available to stakeholders through self-service business intelligence and agile data visualization tools
that allow for fast and easy exploration of datasets. Depending on the type of analytics, end-users may also consume
the resulting data in the form of statistical “predictions” – in the case of predictive analytics – or recommended
actions – in the case of prescriptive analytics.
Automate
Business Process
Self Serve
Reporting Platform
Machine Learning
Analytics & BI
Data Lakes
Big Data
Infrastructure
Big Data & ML Goals
Platform
Which platform is the best to implement big data?
● Cloud or on-Premise?
● If Cloud, Google, AWS, Azure, or Cloudera Cloud?
● If on-premise, is Cloudera is the best way to go?
Cloud or On-Premise?
● Why Cloud
○ Planning, research, analytics for a lower
○ No Server
○ Get Started Tomorrow
○ Grow as fast you can imagine
○ Get out of old IT business line
○ Part of your Opex, not Capex
○ Low commitment, you can jump from one to another provider, just like counting 1, 2, 3
○ Safer and 99% uptime
● Why On-Premise
○ Meet your inhouse policies
○ You own the software, and the hardware
Recommendation: Cloud
Why pick cloud?
1. Flexibility. Cloud-based services are ideal for businesses with growing or fluctuating bandwidth demands.
2. Disaster recovery. Businesses of all sizes should be investing in robust disaster recovery, by using cloud, you avoid
large up-front investment and roll up third-party expertise as part of the deal.
3. Automatic software updates. Provider take care that for you and roll-out regular software updates – including security
updates – so you don’t have to worry about wasting time maintaining the system yourself.
4. Capital-expenditure Free. You simply pay as you go and enjoy a subscription-based model that’s kind to your cash.
5. Increased collaboration. When your teams can access, edit and share documents anytime.
6. Work from anywhere. With cloud computing, if you’ve got an internet connection you can be at work. And with most
serious cloud services offering mobile apps, you’re not restricted by which device you’ve got to hand.
The result? Businesses can offer more flexible working perks to employees so they can enjoy the work-life balance
that suits them – without productivity taking a hit.
1. Document control. Before the cloud, workers had to send files back and forth as email attachments to be worked on
by one user at a time. Sooner or later – usually sooner – you end up with a mess of conflicting file content, formats
and titles.
2. Security. Lost laptops are a billion dollar business problem. And potentially greater than the loss of an expensive
piece of kit is the loss of the sensitive data inside it. Cloud computing gives you greater security when this happens.
3. Competitiveness. Wish there was a simple step you could take to become more competitive? Moving to the cloud
gives access to enterprise-class technology, for everyone.
4. Environmentally friendly. While the above points spell out the benefits of cloud computing for your business, moving
to the cloud isn’t an entirely selfish act. The environment gets a little love too. When your cloud needs fluctuate, your
server capacity scales up and down to fit.
Any three of the above benefits would be enough to convince many businesses to move their business into the cloud.
But when you add up all ten? It’s approaching no-brainer territory.
Google, AWS, Azure, or Cloudera Cloud?
Why Google
● Big Data, no waiting.
Based on the same distributed data services we use at Google, Google BigQuery, Google Cloud
Datalab and Google Cloud Dataproc are changing how you analyze and use data. Customers say
tools like BigQuery are “nearly magical” because of their performance. Queries that used to take
hours or days now take minutes or seconds. The result: more insights and value, realized by more
people in more companies. Learn more
● Context-rich applications
Good apps tailor their behavior to us. Great apps delight us by suggesting what we want before we
know it ourselves. Apps should respond to context, telling us the best route to drive right now, not
just when we started. Products like Google Cloud Dataflow and Google Cloud Pub/Sub make it
easier for your code to use huge amounts of data to deliver amazing context-rich experiences.
● Citizen Data Science
Rather than keep your most powerful data tools in the hands of a few experts, Google Cloud
Platform unlocks data to empower your entire organization. Our tools like BigQuery and Cloud
Datalab bring data directly to the people who run your business, because they’re most likely to find
the insights that create value.
● Machine Intelligence
Google Cloud Machine Learning gives everyone access to the deep learning systems that power
services like Google Translate, Google Photos, voice search in the Google app, and smart replies in
Gmail. Soon available as a cloud service (alpha), so it’s easy to incorporate in your app.
Why AWS
● AWS is the cloud market leader
AWS is a market leader with 40% of market capitalization in 2017. Its vast list of features makes it
lucrative for larger corporations whereas a growing infrastructure promises scalability and price cuts
for upstarts.
● Very Flexible
AWS enables you to select the operating system, programming language, web application platform,
database, and other services you need. With AWS, you receive a virtual environment that lets you
load the software and services your application requires. This eases the migration process for
existing applications while preserving options for building new solutions.
Why Azure (ADLA: Azure Data Lake Analytics)
● Only one language to learn
OK, this is kind of tricky, because, although there is only one language to learn, U-SQL, it's really an
amalgam of SQL and C# – so if you're familiar with either or both of those languages then you'll be
ready to tackle U-SQL. If you're not familiar with either, then it's OK, you don't have to be a SQL or
C# master to understand U-SQL. With Hadoop, you'll have a few more languages to learn – I'd say
at least six (Hive, Pig, Java, Scala, Python, Bash; yup, six).
● Only offered as a platform service
Hadoop comes in many different flavors, some running on-premises, others running in the cloud.
Some are managed BY you, others are managed FOR you. ADLA, however, is offered ONLY as a
platform service in the Microsoft Azure Cloud. It's managed by Microsoft — you'll never have to
troubleshoot a cluster problem with ADLA (only your own code). It's also integrated with Azure
Active Directory, so you don't have to manage security separately.
Really, all you have to worry about when you set up an Azure Data Lake Analytics account is
building your application.
● Pricing per job; not per hour
Most Big Data cloud offerings that are available are priced per hour based on how long you keep
your cluster up and running. ADLA takes a different approach to pricing. With ADLA, you pay for
each individual job that is run. As a matter of fact, just owning an Azure Data Lake Analytics account
doesn't cost anything. You aren't even billed for the account until you run a job. That's pretty unique
in the Big Data space.
Why Cloudera Cloud
● Backed by best Big Data Software
No one knows Apache Hadoop like Cloudera. Period!.
● Elastic
Cloudera’s platform can leverage elastic infrastructure to size compute and storage independently,
grow and shrink clusters dynamically, and clone net-new clusters for ad-hoc, transient workloads.
● Portable
Preserve business flexibility and minimize cloud lock-in. Run your Cloudera workloads on any public
cloud, or even in multi-cloud environments. You can even switch from Cloud to On Premise without
hassle
● Enterprise Grade
Reduce your risk with the only big data platform that delivers comprehensive manageability,
availability, security, and governance required for production workloads.
Recommendation: Google Cloud Platform
Benchmarks:
● Tools and Support: Google
● Platform Expertise: Google
● Performance: Google
● Integration: Google
● Flexibility: Cloudera
● Easy To Learn: AWS
● Pricing: Azure
Architecture
GCP: Big Data Life Cycle
Basic Data Lake Workflows
Real Time Self Reporting Big Data Dashboard
Machine Learning Integration
Data Warehouse transitions to GCP
How to run GCP and
Existing System On-
Premise?
Understanding data stream using Data Flow Engine
Ingest Data into Google BigQuery Using Talend
Question?
sofyan.h.ahmad@gmail.com

More Related Content

What's hot

Driving Business Insights with a Modern Data Architecture AWS Summit SG 2017
Driving Business Insights with a Modern Data Architecture  AWS Summit SG 2017Driving Business Insights with a Modern Data Architecture  AWS Summit SG 2017
Driving Business Insights with a Modern Data Architecture AWS Summit SG 2017Amazon Web Services
 
Auto AI : AI used to create AI applications
Auto AI : AI used to create AI applicationsAuto AI : AI used to create AI applications
Auto AI : AI used to create AI applicationsKaran Sachdeva
 
A Framework to Measure and Maximize Cloud ROI
A Framework to Measure and Maximize Cloud ROIA Framework to Measure and Maximize Cloud ROI
A Framework to Measure and Maximize Cloud ROIRightScale
 
Machine Learning and AI
Machine Learning and AIMachine Learning and AI
Machine Learning and AIJames Serra
 
5 Pillars of API Management
5 Pillars of API Management5 Pillars of API Management
5 Pillars of API ManagementRich Graham
 
Cloudera + Syncsort: Fuel Business Insights, Analytics, and Next Generation T...
Cloudera + Syncsort: Fuel Business Insights, Analytics, and Next Generation T...Cloudera + Syncsort: Fuel Business Insights, Analytics, and Next Generation T...
Cloudera + Syncsort: Fuel Business Insights, Analytics, and Next Generation T...Precisely
 
CONNtext presentation
CONNtext presentationCONNtext presentation
CONNtext presentationArmedia LLC
 
Hooduku - Cloud and Big data practice
Hooduku - Cloud and Big data practiceHooduku - Cloud and Big data practice
Hooduku - Cloud and Big data practicehooduku
 
Informatica + Hadoop = Best of Both Worlds
Informatica + Hadoop = Best of Both WorldsInformatica + Hadoop = Best of Both Worlds
Informatica + Hadoop = Best of Both WorldsAhmed Tayeh
 
Architecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the EnterpriseArchitecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the EnterpriseAmazon Web Services
 
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platformPivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platformEMC
 
Get Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber SolutionGet Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber SolutionCloudera, Inc.
 
SAP HANA Data Center Intelligence Overview
SAP HANA Data Center Intelligence OverviewSAP HANA Data Center Intelligence Overview
SAP HANA Data Center Intelligence OverviewSAP Technology
 
PgConf 2018 - Postgres in a World of DevOps
PgConf 2018 - Postgres in a World of DevOpsPgConf 2018 - Postgres in a World of DevOps
PgConf 2018 - Postgres in a World of DevOpsEDB
 
Azure Overview Business Model Overview
Azure Overview Business Model OverviewAzure Overview Business Model Overview
Azure Overview Business Model Overviewrramabad
 
Big Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesBig Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesJames Serra
 
The Vortex of Change - Digital Transformation (Presented by Intel)
The Vortex of Change - Digital Transformation (Presented by Intel)The Vortex of Change - Digital Transformation (Presented by Intel)
The Vortex of Change - Digital Transformation (Presented by Intel)Cloudera, Inc.
 
Hortonworks roadshow
Hortonworks roadshowHortonworks roadshow
Hortonworks roadshowAccenture
 
Keine Angst vorm Dinosaurier: Mainframe-Integration und -Offloading mit Confl...
Keine Angst vorm Dinosaurier: Mainframe-Integration und -Offloading mit Confl...Keine Angst vorm Dinosaurier: Mainframe-Integration und -Offloading mit Confl...
Keine Angst vorm Dinosaurier: Mainframe-Integration und -Offloading mit Confl...Precisely
 

What's hot (20)

Driving Business Insights with a Modern Data Architecture AWS Summit SG 2017
Driving Business Insights with a Modern Data Architecture  AWS Summit SG 2017Driving Business Insights with a Modern Data Architecture  AWS Summit SG 2017
Driving Business Insights with a Modern Data Architecture AWS Summit SG 2017
 
Auto AI : AI used to create AI applications
Auto AI : AI used to create AI applicationsAuto AI : AI used to create AI applications
Auto AI : AI used to create AI applications
 
A Framework to Measure and Maximize Cloud ROI
A Framework to Measure and Maximize Cloud ROIA Framework to Measure and Maximize Cloud ROI
A Framework to Measure and Maximize Cloud ROI
 
Machine Learning and AI
Machine Learning and AIMachine Learning and AI
Machine Learning and AI
 
5 Pillars of API Management
5 Pillars of API Management5 Pillars of API Management
5 Pillars of API Management
 
Cloudera + Syncsort: Fuel Business Insights, Analytics, and Next Generation T...
Cloudera + Syncsort: Fuel Business Insights, Analytics, and Next Generation T...Cloudera + Syncsort: Fuel Business Insights, Analytics, and Next Generation T...
Cloudera + Syncsort: Fuel Business Insights, Analytics, and Next Generation T...
 
CONNtext presentation
CONNtext presentationCONNtext presentation
CONNtext presentation
 
Hooduku - Cloud and Big data practice
Hooduku - Cloud and Big data practiceHooduku - Cloud and Big data practice
Hooduku - Cloud and Big data practice
 
Informatica + Hadoop = Best of Both Worlds
Informatica + Hadoop = Best of Both WorldsInformatica + Hadoop = Best of Both Worlds
Informatica + Hadoop = Best of Both Worlds
 
Architecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the EnterpriseArchitecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the Enterprise
 
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platformPivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
 
Get Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber SolutionGet Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber Solution
 
SAP HANA Data Center Intelligence Overview
SAP HANA Data Center Intelligence OverviewSAP HANA Data Center Intelligence Overview
SAP HANA Data Center Intelligence Overview
 
PgConf 2018 - Postgres in a World of DevOps
PgConf 2018 - Postgres in a World of DevOpsPgConf 2018 - Postgres in a World of DevOps
PgConf 2018 - Postgres in a World of DevOps
 
Azure Overview Business Model Overview
Azure Overview Business Model OverviewAzure Overview Business Model Overview
Azure Overview Business Model Overview
 
Big Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesBig Data: It’s all about the Use Cases
Big Data: It’s all about the Use Cases
 
Data Migration to Azure
Data Migration to AzureData Migration to Azure
Data Migration to Azure
 
The Vortex of Change - Digital Transformation (Presented by Intel)
The Vortex of Change - Digital Transformation (Presented by Intel)The Vortex of Change - Digital Transformation (Presented by Intel)
The Vortex of Change - Digital Transformation (Presented by Intel)
 
Hortonworks roadshow
Hortonworks roadshowHortonworks roadshow
Hortonworks roadshow
 
Keine Angst vorm Dinosaurier: Mainframe-Integration und -Offloading mit Confl...
Keine Angst vorm Dinosaurier: Mainframe-Integration und -Offloading mit Confl...Keine Angst vorm Dinosaurier: Mainframe-Integration und -Offloading mit Confl...
Keine Angst vorm Dinosaurier: Mainframe-Integration und -Offloading mit Confl...
 

Similar to Big Data Platform Recommendation: Cloud is Best for Flexibility, Security & Costs

The Superior Reasons to Go for Cloud App Development _ Complete Guide (1).pdf
The Superior Reasons to Go for Cloud App Development _ Complete Guide (1).pdfThe Superior Reasons to Go for Cloud App Development _ Complete Guide (1).pdf
The Superior Reasons to Go for Cloud App Development _ Complete Guide (1).pdfAlzenaLimon
 
GCP On Prem Buyers Guide - White-paper | Qubole
GCP On Prem Buyers Guide - White-paper | Qubole GCP On Prem Buyers Guide - White-paper | Qubole
GCP On Prem Buyers Guide - White-paper | Qubole Vasu S
 
Google Cloud Machine Learning
 Google Cloud Machine Learning  Google Cloud Machine Learning
Google Cloud Machine Learning India Quotient
 
Benefits Of Migrating Asp .Net Apps To The Cloud - GoDgtl
Benefits Of Migrating Asp .Net Apps To The Cloud - GoDgtlBenefits Of Migrating Asp .Net Apps To The Cloud - GoDgtl
Benefits Of Migrating Asp .Net Apps To The Cloud - GoDgtlMezzybatliwala
 
Google Cloud Data Platform - Why Google for Data Analysis?
Google Cloud Data Platform - Why Google for Data Analysis?Google Cloud Data Platform - Why Google for Data Analysis?
Google Cloud Data Platform - Why Google for Data Analysis?Andreas Raible
 
How Are Cloud Services Outweighing On-Premise Solutions In 2022.pptx
How Are Cloud Services Outweighing On-Premise Solutions In 2022.pptxHow Are Cloud Services Outweighing On-Premise Solutions In 2022.pptx
How Are Cloud Services Outweighing On-Premise Solutions In 2022.pptxArpitGautam20
 
2020 Cloud Data Lake Platforms Buyers Guide - White paper | Qubole
2020 Cloud Data Lake Platforms Buyers Guide - White paper | Qubole2020 Cloud Data Lake Platforms Buyers Guide - White paper | Qubole
2020 Cloud Data Lake Platforms Buyers Guide - White paper | QuboleVasu S
 
Advantages of Moving to the Cloud Platform
Advantages of Moving to the Cloud PlatformAdvantages of Moving to the Cloud Platform
Advantages of Moving to the Cloud PlatformSoluzione IT Services
 
Factors that Can Contribute in Enhancing Hybrid Cloud Benefits
Factors that Can Contribute in Enhancing Hybrid Cloud BenefitsFactors that Can Contribute in Enhancing Hybrid Cloud Benefits
Factors that Can Contribute in Enhancing Hybrid Cloud BenefitsWeb Werks Data Centers
 
Microsoft Whitepaper: Running Your Business in the Cloud
Microsoft Whitepaper: Running Your Business in the CloudMicrosoft Whitepaper: Running Your Business in the Cloud
Microsoft Whitepaper: Running Your Business in the CloudDWP Information Architects Inc.
 
SendGrid Improves Email Delivery with Hybrid Data Warehousing
SendGrid Improves Email Delivery with Hybrid Data WarehousingSendGrid Improves Email Delivery with Hybrid Data Warehousing
SendGrid Improves Email Delivery with Hybrid Data WarehousingAmazon Web Services
 
Cloud Computing & Control Auditing
Cloud Computing & Control AuditingCloud Computing & Control Auditing
Cloud Computing & Control AuditingNavin Malhotra
 
Cloud Adaption and Migration - Raghvendra Prabhu
Cloud Adaption and Migration - Raghvendra PrabhuCloud Adaption and Migration - Raghvendra Prabhu
Cloud Adaption and Migration - Raghvendra PrabhuRaghavendra Prabhu
 
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and ClouderaIs your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and ClouderaCloudera, Inc.
 
Cloud computing Paper
Cloud computing Paper Cloud computing Paper
Cloud computing Paper Assem mousa
 

Similar to Big Data Platform Recommendation: Cloud is Best for Flexibility, Security & Costs (20)

The Superior Reasons to Go for Cloud App Development _ Complete Guide (1).pdf
The Superior Reasons to Go for Cloud App Development _ Complete Guide (1).pdfThe Superior Reasons to Go for Cloud App Development _ Complete Guide (1).pdf
The Superior Reasons to Go for Cloud App Development _ Complete Guide (1).pdf
 
GCP On Prem Buyers Guide - White-paper | Qubole
GCP On Prem Buyers Guide - White-paper | Qubole GCP On Prem Buyers Guide - White-paper | Qubole
GCP On Prem Buyers Guide - White-paper | Qubole
 
Google Cloud Machine Learning
 Google Cloud Machine Learning  Google Cloud Machine Learning
Google Cloud Machine Learning
 
Benefits Of Migrating Asp .Net Apps To The Cloud - GoDgtl
Benefits Of Migrating Asp .Net Apps To The Cloud - GoDgtlBenefits Of Migrating Asp .Net Apps To The Cloud - GoDgtl
Benefits Of Migrating Asp .Net Apps To The Cloud - GoDgtl
 
Google Cloud Data Platform - Why Google for Data Analysis?
Google Cloud Data Platform - Why Google for Data Analysis?Google Cloud Data Platform - Why Google for Data Analysis?
Google Cloud Data Platform - Why Google for Data Analysis?
 
Cloud computing
Cloud computingCloud computing
Cloud computing
 
How Are Cloud Services Outweighing On-Premise Solutions In 2022.pptx
How Are Cloud Services Outweighing On-Premise Solutions In 2022.pptxHow Are Cloud Services Outweighing On-Premise Solutions In 2022.pptx
How Are Cloud Services Outweighing On-Premise Solutions In 2022.pptx
 
2020 Cloud Data Lake Platforms Buyers Guide - White paper | Qubole
2020 Cloud Data Lake Platforms Buyers Guide - White paper | Qubole2020 Cloud Data Lake Platforms Buyers Guide - White paper | Qubole
2020 Cloud Data Lake Platforms Buyers Guide - White paper | Qubole
 
Hadoop in the Cloud
Hadoop in the CloudHadoop in the Cloud
Hadoop in the Cloud
 
Advantages of Moving to the Cloud Platform
Advantages of Moving to the Cloud PlatformAdvantages of Moving to the Cloud Platform
Advantages of Moving to the Cloud Platform
 
IBM Cloud pak for data brochure
IBM Cloud pak for data   brochureIBM Cloud pak for data   brochure
IBM Cloud pak for data brochure
 
Factors that Can Contribute in Enhancing Hybrid Cloud Benefits
Factors that Can Contribute in Enhancing Hybrid Cloud BenefitsFactors that Can Contribute in Enhancing Hybrid Cloud Benefits
Factors that Can Contribute in Enhancing Hybrid Cloud Benefits
 
Choosing the Right Cloud Provider
Choosing the Right Cloud ProviderChoosing the Right Cloud Provider
Choosing the Right Cloud Provider
 
Modern Thinking área digital MSKM 21/09/2017
Modern Thinking área digital MSKM 21/09/2017Modern Thinking área digital MSKM 21/09/2017
Modern Thinking área digital MSKM 21/09/2017
 
Microsoft Whitepaper: Running Your Business in the Cloud
Microsoft Whitepaper: Running Your Business in the CloudMicrosoft Whitepaper: Running Your Business in the Cloud
Microsoft Whitepaper: Running Your Business in the Cloud
 
SendGrid Improves Email Delivery with Hybrid Data Warehousing
SendGrid Improves Email Delivery with Hybrid Data WarehousingSendGrid Improves Email Delivery with Hybrid Data Warehousing
SendGrid Improves Email Delivery with Hybrid Data Warehousing
 
Cloud Computing & Control Auditing
Cloud Computing & Control AuditingCloud Computing & Control Auditing
Cloud Computing & Control Auditing
 
Cloud Adaption and Migration - Raghvendra Prabhu
Cloud Adaption and Migration - Raghvendra PrabhuCloud Adaption and Migration - Raghvendra Prabhu
Cloud Adaption and Migration - Raghvendra Prabhu
 
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and ClouderaIs your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera
 
Cloud computing Paper
Cloud computing Paper Cloud computing Paper
Cloud computing Paper
 

Recently uploaded

GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learningmisbanausheenparvam
 

Recently uploaded (20)

GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learning
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 

Big Data Platform Recommendation: Cloud is Best for Flexibility, Security & Costs

  • 1. Big Data Platform and Architecture Recommendation Sofyan Hadi Ahmad sofyan.ahmad@mandalalabs.com November 2018
  • 2. Why Big Data is Important? Despite the hype, many organizations don’t realize they have a big data problem or they simply don’t think of it in terms of big data. In general, an organization is likely to benefit from big data technologies when existing databases and applications can no longer scale to support sudden increases in volume, variety, and velocity of data. Failure to correctly address big data challenges can result in escalating costs, as well as reduced productivity and competitiveness. On the other hand, a sound big data strategy can help organizations reduce costs and gain operational efficiencies by migrating heavy existing workloads to big data technologies; as well as deploying new applications to capitalize on new opportunities.
  • 3. How Does Big Data Work? With new tools that address the entire data management cycle, big data technologies make it technically and economically feasible, not only to collect and store larger datasets, but also to analyze them in order to uncover new and valuable insights. In most cases, big data processing involves a common data flow – from collection of raw data to consumption of actionable information. ● Collect. Collecting the raw data – transactions, logs, mobile devices and more – is the first challenge many organizations face when dealing with big data. A good big data platform makes this step easier, allowing developers to ingest a wide variety of data – from structured to unstructured – at any speed – from real-time to batch. ● Store. Any big data platform needs a secure, scalable, and durable repository to store data prior or even after processing tasks. Depending on your specific requirements, you may also need temporary stores for data in-transit. ● Process & Analyze. This is the step where data is transformed from its raw state into a consumable format – usually by means of sorting, aggregating, joining and even performing more advanced functions and algorithms. The resulting data sets are then stored for further processing or made available for consumption via business intelligence and data visualization tools. ● Consume & Visualize. Big data is all about getting high value, actionable insights from your data assets. Ideally, data is made available to stakeholders through self-service business intelligence and agile data visualization tools that allow for fast and easy exploration of datasets. Depending on the type of analytics, end-users may also consume the resulting data in the form of statistical “predictions” – in the case of predictive analytics – or recommended actions – in the case of prescriptive analytics.
  • 4. Automate Business Process Self Serve Reporting Platform Machine Learning Analytics & BI Data Lakes Big Data Infrastructure Big Data & ML Goals
  • 6. Which platform is the best to implement big data? ● Cloud or on-Premise? ● If Cloud, Google, AWS, Azure, or Cloudera Cloud? ● If on-premise, is Cloudera is the best way to go?
  • 7. Cloud or On-Premise? ● Why Cloud ○ Planning, research, analytics for a lower ○ No Server ○ Get Started Tomorrow ○ Grow as fast you can imagine ○ Get out of old IT business line ○ Part of your Opex, not Capex ○ Low commitment, you can jump from one to another provider, just like counting 1, 2, 3 ○ Safer and 99% uptime ● Why On-Premise ○ Meet your inhouse policies ○ You own the software, and the hardware Recommendation: Cloud
  • 8. Why pick cloud? 1. Flexibility. Cloud-based services are ideal for businesses with growing or fluctuating bandwidth demands. 2. Disaster recovery. Businesses of all sizes should be investing in robust disaster recovery, by using cloud, you avoid large up-front investment and roll up third-party expertise as part of the deal. 3. Automatic software updates. Provider take care that for you and roll-out regular software updates – including security updates – so you don’t have to worry about wasting time maintaining the system yourself. 4. Capital-expenditure Free. You simply pay as you go and enjoy a subscription-based model that’s kind to your cash. 5. Increased collaboration. When your teams can access, edit and share documents anytime. 6. Work from anywhere. With cloud computing, if you’ve got an internet connection you can be at work. And with most serious cloud services offering mobile apps, you’re not restricted by which device you’ve got to hand. The result? Businesses can offer more flexible working perks to employees so they can enjoy the work-life balance that suits them – without productivity taking a hit. 1. Document control. Before the cloud, workers had to send files back and forth as email attachments to be worked on by one user at a time. Sooner or later – usually sooner – you end up with a mess of conflicting file content, formats and titles. 2. Security. Lost laptops are a billion dollar business problem. And potentially greater than the loss of an expensive piece of kit is the loss of the sensitive data inside it. Cloud computing gives you greater security when this happens. 3. Competitiveness. Wish there was a simple step you could take to become more competitive? Moving to the cloud gives access to enterprise-class technology, for everyone. 4. Environmentally friendly. While the above points spell out the benefits of cloud computing for your business, moving to the cloud isn’t an entirely selfish act. The environment gets a little love too. When your cloud needs fluctuate, your server capacity scales up and down to fit. Any three of the above benefits would be enough to convince many businesses to move their business into the cloud. But when you add up all ten? It’s approaching no-brainer territory.
  • 9. Google, AWS, Azure, or Cloudera Cloud? Why Google ● Big Data, no waiting. Based on the same distributed data services we use at Google, Google BigQuery, Google Cloud Datalab and Google Cloud Dataproc are changing how you analyze and use data. Customers say tools like BigQuery are “nearly magical” because of their performance. Queries that used to take hours or days now take minutes or seconds. The result: more insights and value, realized by more people in more companies. Learn more ● Context-rich applications Good apps tailor their behavior to us. Great apps delight us by suggesting what we want before we know it ourselves. Apps should respond to context, telling us the best route to drive right now, not just when we started. Products like Google Cloud Dataflow and Google Cloud Pub/Sub make it easier for your code to use huge amounts of data to deliver amazing context-rich experiences. ● Citizen Data Science Rather than keep your most powerful data tools in the hands of a few experts, Google Cloud Platform unlocks data to empower your entire organization. Our tools like BigQuery and Cloud Datalab bring data directly to the people who run your business, because they’re most likely to find the insights that create value. ● Machine Intelligence Google Cloud Machine Learning gives everyone access to the deep learning systems that power services like Google Translate, Google Photos, voice search in the Google app, and smart replies in Gmail. Soon available as a cloud service (alpha), so it’s easy to incorporate in your app.
  • 10. Why AWS ● AWS is the cloud market leader AWS is a market leader with 40% of market capitalization in 2017. Its vast list of features makes it lucrative for larger corporations whereas a growing infrastructure promises scalability and price cuts for upstarts. ● Very Flexible AWS enables you to select the operating system, programming language, web application platform, database, and other services you need. With AWS, you receive a virtual environment that lets you load the software and services your application requires. This eases the migration process for existing applications while preserving options for building new solutions.
  • 11. Why Azure (ADLA: Azure Data Lake Analytics) ● Only one language to learn OK, this is kind of tricky, because, although there is only one language to learn, U-SQL, it's really an amalgam of SQL and C# – so if you're familiar with either or both of those languages then you'll be ready to tackle U-SQL. If you're not familiar with either, then it's OK, you don't have to be a SQL or C# master to understand U-SQL. With Hadoop, you'll have a few more languages to learn – I'd say at least six (Hive, Pig, Java, Scala, Python, Bash; yup, six). ● Only offered as a platform service Hadoop comes in many different flavors, some running on-premises, others running in the cloud. Some are managed BY you, others are managed FOR you. ADLA, however, is offered ONLY as a platform service in the Microsoft Azure Cloud. It's managed by Microsoft — you'll never have to troubleshoot a cluster problem with ADLA (only your own code). It's also integrated with Azure Active Directory, so you don't have to manage security separately. Really, all you have to worry about when you set up an Azure Data Lake Analytics account is building your application. ● Pricing per job; not per hour Most Big Data cloud offerings that are available are priced per hour based on how long you keep your cluster up and running. ADLA takes a different approach to pricing. With ADLA, you pay for each individual job that is run. As a matter of fact, just owning an Azure Data Lake Analytics account doesn't cost anything. You aren't even billed for the account until you run a job. That's pretty unique in the Big Data space.
  • 12. Why Cloudera Cloud ● Backed by best Big Data Software No one knows Apache Hadoop like Cloudera. Period!. ● Elastic Cloudera’s platform can leverage elastic infrastructure to size compute and storage independently, grow and shrink clusters dynamically, and clone net-new clusters for ad-hoc, transient workloads. ● Portable Preserve business flexibility and minimize cloud lock-in. Run your Cloudera workloads on any public cloud, or even in multi-cloud environments. You can even switch from Cloud to On Premise without hassle ● Enterprise Grade Reduce your risk with the only big data platform that delivers comprehensive manageability, availability, security, and governance required for production workloads.
  • 13. Recommendation: Google Cloud Platform Benchmarks: ● Tools and Support: Google ● Platform Expertise: Google ● Performance: Google ● Integration: Google ● Flexibility: Cloudera ● Easy To Learn: AWS ● Pricing: Azure
  • 15. GCP: Big Data Life Cycle
  • 16. Basic Data Lake Workflows
  • 17. Real Time Self Reporting Big Data Dashboard
  • 20. How to run GCP and Existing System On- Premise?
  • 21. Understanding data stream using Data Flow Engine
  • 22. Ingest Data into Google BigQuery Using Talend