With more and more data being generated and stored in the cloud, you need a modern data platform that can extend to any environment so you can derive value from all your data. Cloudera Enterprise is the leading enterprise Hadoop platform for cloud deployments. It’s the easiest way to manage and secure Hadoop data across any cloud environment and includes component-level support for cloud-native object stores. This makes the platform uniquely suited to handle transient jobs like ETL and BI analytics, as well as persistent workloads like stream processing and advanced analytics.
With the recent release of Cloudera 5.8, Apache Impala (incubating) has added support for Amazon S3, enabling business analysts to get instant insights from all data through high-performance exploratory analytics and BI.
3 Things to learn:
Join David Tishgart, Director of Product Marketing, and James Curtis, Senior Analyst, Data Platforms & Analytics at 451 Research, as they discuss:
* Best practices for analytic workloads in the cloud
* A live demo and real-world use cases
* What’s next for Cloudera and the cloud
4. 451 Research is a leading IT research & advisory company
Founded in 2000
250+ employees, including over 100 analysts
1,000+ clients: technology & service providers, corporate advisory, finance, professional services, and IT decision makers
50,000+ IT professionals, business users and consumers in our research community
Over 52 million data points published each quarter and 4,500+ reports published each year
2,000+ technology & service providers under coverage
451 Research and its sister company, Uptime Institute, are the two divisions of The 451 Group
Headquartered in New York City, with offices in London, Boston, San Francisco, Washington DC, Mexico, Costa Rica, Brazil, Spain, UAE, Russia, Taiwan, Singapore and Malaysia
Research & Data
Advisory
Events
Go 2 Market
6. Data Gravity: Play It as It Lies
Cloud Compute
In golf, USGA Rule 13 states that ‘the ball must be played as it lies…’
Data gravity: analyzing data where it resides, instead of moving it to a data warehouse for analysis
Our research shows 33% of IT storage professionals already use cloud storage services, and 23.4% plan to purchase cloud services in the next 90 days
Source: 451 Research, Voice of the Enterprise Storage, Q1 2016.
7. Public Cloud Infrastructure (IaaS, PaaS) Storage Growth
Source: 451 Research, Voice of the Enterprise Storage, Q1 2016.
[Chart: Database/DW, Data Analytics/BI and Big Data workloads, today vs. in 2 years; horizontal axis 0% to 45%]
8. Why Use Cloud for Analytics/BI and Big Data?
Source: 451 Research, Voice of the Enterprise Cloud, Workloads, and Key Projects 2016.
[Charts: Analytics/BI Workloads; Big Data Workloads]
9. Benefits—Moving Analytic Workloads to Cloud Storage
Financial: it costs significantly less to store data in S3 (AWS) than in HDFS running on EC2 (AWS)
HDFS requires three copies of each block, whereas cloud storage uses automatic backups and compression
Scalability
- HDFS scales, but it requires manual configuration and management. Cloud storage scales automatically as data is added, with no user involvement
- Ability to scale storage and compute separately, based on use case and need
Durability/Persistence: with HDFS on EC2, data persists only for the life of the instance. Cloud storage is designed to deliver 99.9999% data durability
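The cost and scaling points above come down to simple arithmetic. The sketch below assumes HDFS's default replication factor of 3 and a hypothetical cluster in which each node contributes a fixed amount of usable disk; it ignores reserved and temporary space, so treat it as an illustration rather than a sizing tool. The helper names (`raw_storage_tb`, `hdfs_cluster_nodes`, `object_store_cluster_nodes`) are hypothetical. The sketch shows that when data lives in HDFS, storage growth forces compute growth. When data lives in an object store such as S3, the node count tracks compute need alone.

```python
import math

# Assumption: HDFS default replication (dfs.replication = 3), i.e. the "three copies" above.
HDFS_REPLICATION = 3


def raw_storage_tb(logical_tb: float, replication: int = HDFS_REPLICATION) -> float:
    """Disk actually consumed in HDFS: every block is stored `replication` times."""
    return logical_tb * replication


def hdfs_cluster_nodes(logical_tb: float, disk_tb_per_node: float,
                       compute_nodes: int, replication: int = HDFS_REPLICATION) -> int:
    """Nodes needed when the data lives in HDFS on the same instances that run queries.

    The cluster must be large enough to hold every replica AND to run the workload,
    so storage and compute cannot be scaled independently.
    """
    storage_nodes = math.ceil(raw_storage_tb(logical_tb, replication) / disk_tb_per_node)
    return max(storage_nodes, compute_nodes)


def object_store_cluster_nodes(compute_nodes: int) -> int:
    """Nodes needed when the data lives in an object store such as S3.

    Durable copies are the object store's job, so the cluster is sized for compute only.
    """
    return compute_nodes


if __name__ == "__main__":
    # Hypothetical workload: 100 TB of data, 10 TB of usable disk per node, 5 nodes of compute.
    print(raw_storage_tb(100))             # 300
    print(hdfs_cluster_nodes(100, 10, 5))  # 30
    print(object_store_cluster_nodes(5))   # 5
```

With these illustrative numbers, 100 TB of data needs 30 nodes under HDFS even though the workload only needs 5. The object-store layout needs just the 5 compute nodes.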
10. Considerations: Leveraging the Cloud for Analytics
Remember data gravity. It’s much easier to analyze the data where it resides, particularly if it’s in the cloud
If you move to the cloud, know why and what you’re getting into. If you move on-premises data to the cloud, keep it in the cloud
Because cloud storage separates storage from compute (unlike local storage), performance may be affected
When moving objects to cloud storage, be aware of object size limits
Cloud providers enable security, but organizations will want to implement platform-level security as well
Avoid evaluating cloud or cloud storage on a single benefit. Does it benefit the organization across multiple dimensions?
It’s not necessarily an either/or proposition. It’s acceptable for enterprises to run a mix of cloud, cloud storage, and on-premises infrastructure
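The object size consideration can be made concrete for Amazon S3. The sketch below encodes S3's limits as we understand them from AWS documentation, expressed here in binary units: 5 GiB per single PUT, 5 TiB per object, multipart parts between 5 MiB and 5 GiB, and at most 10,000 parts. Treat these values as assumptions to verify against current AWS documentation; other object stores have different limits. `plan_upload` and `ceil_div` are hypothetical helpers, not part of any AWS SDK. `plan_upload` decides whether an object fits in a single PUT. If it does not, it picks the smallest whole-MiB part size that keeps a multipart upload within the part limit.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

# Assumed Amazon S3 limits; verify against current AWS documentation.
MAX_SINGLE_PUT = 5 * GIB   # largest object uploadable in one PUT
MAX_OBJECT_SIZE = 5 * TIB  # largest object S3 will store
MIN_PART_SIZE = 5 * MIB    # minimum size of every multipart part except the last
MAX_PART_SIZE = 5 * GIB    # maximum size of any multipart part
MAX_PARTS = 10_000         # maximum number of parts in one multipart upload


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding float rounding for multi-terabyte sizes."""
    return -(-a // b)


def plan_upload(size_bytes: int) -> tuple[str, int]:
    """Hypothetical helper: return (strategy, part_size_bytes) for an object.

    strategy is "single_put" (part_size 0) or "multipart".
    """
    if size_bytes < 0:
        raise ValueError("size must be non-negative")
    if size_bytes > MAX_OBJECT_SIZE:
        raise ValueError("object exceeds the maximum object size; split the data first")
    if size_bytes <= MAX_SINGLE_PUT:
        return ("single_put", 0)
    # Smallest part size that fits the object into MAX_PARTS parts, rounded up to a
    # whole MiB and never below the per-part minimum.
    part = max(MIN_PART_SIZE, ceil_div(size_bytes, MAX_PARTS))
    part = ceil_div(part, MIB) * MIB
    # Sanity guard: for any size up to MAX_OBJECT_SIZE the part stays under the maximum.
    if part > MAX_PART_SIZE:
        raise RuntimeError("no valid part size under the assumed limits")
    return ("multipart", part)


if __name__ == "__main__":
    print(plan_upload(100 * MIB))        # ('single_put', 0)
    print(plan_upload(6 * GIB))          # ('multipart', 5242880)
    print(plan_upload(MAX_OBJECT_SIZE))  # ('multipart', 550502400)
```

The sketch uses a single PUT whenever the size allows. A real client might switch to multipart much earlier, since individual parts can be retried after a failure.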