Big Data/Cloudera from Excelerate Systems

Learn how Big Data solutions from Excelerate Systems are driving next-generation data warehouse optimization. In other words: if you have big data, come and talk to us.

  1. Mobile, Big Data, Cloud, Security, Virtualization
     http://www.exceleratesystems.com
     David Bennett - CEO
  2. • Founded in 2008.
     • Excelerate Systems is a leading company in the Americas focusing on Big Data, Cloud, IT Operations and Security.
     • With offices in the US, Mexico, Chile and France, as well as individual contributors in Brazil, Uruguay, Argentina, Canada, Spain, China and India, we have a global delivery capability.
     • 125 customers in 25 countries.
  3. Four Technology Areas:
  4. Security; IT Operations & The Data Center; Cloud Services (Cloud Services, IaaS and SaaS); Big Data
  5. Big Data and Hadoop
  6. The Problems with Current Data Systems
     Diagram: Instrumentation → Collection → Storage-Only Grid (original raw data, mostly append) → ETL Compute Grid → RDBMS (aggregated data) → BI Reports + Interactive Apps
     1. Moving data to compute doesn't scale.
     2. Archiving = premature data death.
     3. Can't explore the original high-fidelity raw data.
  7. The Solution: A Combined Storage/Compute Layer
     Diagram: Instrumentation → Collection → Hadoop: Storage + Compute Grid (mostly append) → RDBMS (aggregated data) → BI Reports + Interactive Apps
     1. Scalable throughput for ETL and aggregation (ETL acceleration).
     2. Keep data alive forever (active archive).
     3. Data exploration and advanced analytics.
  8. So What Is Apache Hadoop?
     • A scalable, fault-tolerant distributed system for data storage and processing (open source under the Apache license).
     • Core Hadoop has two main systems:
       • Hadoop Distributed File System (HDFS): self-healing, high-bandwidth clustered storage.
       • MapReduce: distributed, fault-tolerant resource management and scheduling coupled with a scalable data programming abstraction.
     • Key business values:
       • Flexibility: store any data, run any analysis.
       • Scalability: start at 1 TB/3 nodes and grow to petabytes/1,000s of nodes.
       • Economics: cost per TB at a fraction of traditional options.
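The MapReduce programming abstraction can be illustrated with a minimal, single-process Python sketch of a word count. It only mimics the map, shuffle and reduce phases in memory; it is not Hadoop's API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical. On a real cluster, the framework runs many map and reduce tasks in parallel, close to the HDFS blocks that hold the data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Each record is mapped independently: this is the part that parallelizes across nodes.
    mapped = (pair for line in lines for pair in map_phase(line))
    # After the shuffle, each key is reduced independently (also parallelizable by key).
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    records = ["Store any data", "run any analysis", "store all data"]
    print(word_count(records))
```

The map and reduce functions stay the same whether the input is three lines or three billion; only the number of parallel tasks changes, which is the sense in which developers need not re-architect their algorithms as data grows.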
  9. The Hadoop Big Bang
     • Fastest sort of a TB: 62 seconds over 1,460 nodes.
     • Sorted a PB in 16.25 hours over 3,658 nodes.
     • Hadoop World 2009: 500 attendees.
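To put the sort figures in perspective, the aggregate and per-node throughput they imply can be worked out directly. This sketch assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and an even spread of data over all nodes; real benchmark runs also spend time reading input, shuffling over the network and writing output, so the results are only rough averages.

```python
from typing import Tuple


def throughput(total_bytes: float, seconds: float, nodes: int) -> Tuple[float, float]:
    """Return (aggregate MB/s, average per-node MB/s), assuming an even spread of work."""
    aggregate = total_bytes / seconds / 1e6
    return aggregate, aggregate / nodes


TB, PB = 1e12, 1e15  # decimal units (assumption)

tb_agg, tb_node = throughput(TB, 62, 1_460)
pb_agg, pb_node = throughput(PB, 16.25 * 3600, 3_658)

print(f"1 TB sort: {tb_agg:,.0f} MB/s aggregate, {tb_node:.1f} MB/s per node")
print(f"1 PB sort: {pb_agg:,.0f} MB/s aggregate, {pb_node:.1f} MB/s per node")
```

On these assumptions both runs sort roughly 16 to 17 GB/s in aggregate, while each node's average share is only about 5 to 11 MB/s: the speed comes from the number of nodes working in parallel rather than from any single machine.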
  10. The Key Benefit: Agility/Flexibility
     Schema-on-Write (RDBMS):
       • A schema must be created before any data can be loaded.
       • An explicit load operation has to take place, which transforms the data into the database's internal serialization format.
       • New columns must be added explicitly before data for those columns can be loaded into the database.
       Pros: OLAP is fast; standards/governance.
     Schema-on-Read (Hadoop):
       • Data is simply copied to the file store; no transformation is needed.
       • A SerDe (Serializer/Deserializer) is applied at read time to extract the required columns (late binding).
       • New data can start flowing at any time and will appear retroactively once the SerDe is updated to parse it.
       Pros: load is fast; flexibility/agility.
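The schema-on-read side of this comparison can be sketched in a few lines of Python. Raw records are stored untouched, and a parsing function plays the role of a SerDe, extracting columns only when the data is read. The JSON record layout, the field names and the `parse_v1`/`parse_v2`/`query` functions are hypothetical; this is not Hive's SerDe interface.

```python
import json
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# "Load" is only a copy: raw lines are stored exactly as they arrived.
raw_store: List[str] = [
    '{"user": "ana", "amount": 10}',
    '{"user": "luis", "amount": 25, "country": "MX"}',  # a new field starts flowing
]


def parse_v1(line: str) -> Row:
    """Original SerDe-like reader: knows only user and amount."""
    rec = json.loads(line)
    return {"user": rec.get("user"), "amount": rec.get("amount")}


def parse_v2(line: str) -> Row:
    """Updated reader: also extracts the new country column (None if absent)."""
    rec = json.loads(line)
    return {"user": rec.get("user"), "amount": rec.get("amount"),
            "country": rec.get("country")}


def query(store: List[str], serde: Callable[[str], Row]) -> List[Row]:
    # Late binding: the schema is applied while reading, not while loading.
    return [serde(line) for line in store]


print(query(raw_store, parse_v1))
print(query(raw_store, parse_v2))
```

Because the stored bytes never change, switching the reader from `parse_v1` to `parse_v2` makes the `country` column visible for every existing row at once, which is what "appear retroactively once the SerDe is updated" means.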
  11. Scalability: Scalable Software Development
     Grows without requiring developers to re-architect their algorithms/applications. AUTO SCALE
  12. Economics: Return on Byte
     • Return on Byte (ROB) = value to be extracted from a byte divided by the cost of storing that byte.
     • If ROB < 1, the byte gets buried in the tape wasteland; hence the need for more economical active storage.
     Diagram: scale from low ROB to high ROB.
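The Return on Byte ratio is simple enough to express directly. The sketch below computes ROB and applies the slide's rule of thumb; the value and cost figures are purely illustrative, not data from this presentation, and both must be measured in the same currency over the same period.

```python
def return_on_byte(value: float, storage_cost: float) -> float:
    """ROB = value extracted from the data / cost of storing it."""
    if storage_cost <= 0:
        raise ValueError("storage_cost must be positive")
    return value / storage_cost


def storage_tier(value: float, storage_cost: float) -> str:
    """Rule of thumb from the slide: data with ROB < 1 ends up archived (e.g. on tape)."""
    return "archive" if return_on_byte(value, storage_cost) < 1 else "keep active"


# Illustrative numbers only: the same data set on two hypothetical platforms.
value_per_tb = 500.0
print(storage_tier(value_per_tb, storage_cost=2000.0))  # -> archive
print(storage_tier(value_per_tb, storage_cost=300.0))   # -> keep active
```

Lowering the denominator is the lever the slide argues for: the same data crosses the ROB = 1 threshold and stays in active storage once storing it becomes cheap enough.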
  13. The Big Data Platform: CDH5
  14. CDH in the Enterprise Data Stack
     Diagram components:
     • Data sources: logs, files, web data, relational databases
     • Connected systems and tools: IDEs, BI/analytics, enterprise reporting, enterprise data warehouse, online serving systems, web/mobile applications, modeling tools, metadata/ETL tools
     • Integration: Sqoop, Flume; ODBC, JDBC, NFS, HTTP
     • Management: Cloudera Manager
     • Users: system operators, engineers, analysts, business users, data scientists, data architects, customers
  15. HBase versus HDFS
     HDFS:
       Optimized for: large files; sequential access (high throughput); append-only writes.
       Use for: fact tables that are mostly append-only and require sequential full-table scans.
     HBase:
       Optimized for: small records; random access (low latency); atomic record updates.
       Use for: dimension tables that are updated frequently and require random, low-latency lookups.
       Not suitable for: low-latency interactive OLAP.
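The two access patterns in this comparison can be contrasted with a toy in-memory model: an append-only list standing in for an HDFS fact table that is scanned end to end, and a keyed dictionary standing in for an HBase dimension table that is read and updated one row at a time. The class names and data are hypothetical, and nothing here uses the HDFS or HBase client APIs; the point is only the shape of each workload.

```python
from typing import Dict, List, Tuple


class AppendOnlyFactTable:
    """HDFS-style: records are appended and read back with full sequential scans."""

    def __init__(self) -> None:
        self._rows: List[Tuple[str, float]] = []

    def append(self, product_id: str, amount: float) -> None:
        self._rows.append((product_id, amount))  # no in-place updates

    def total_sales(self) -> float:
        # Sequential full-table scan: touches every record.
        return sum(amount for _, amount in self._rows)


class KeyedDimensionTable:
    """HBase-style: small records looked up and updated by row key."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, str]] = {}

    def put(self, row_key: str, column: str, value: str) -> None:
        # Random-access update of a single record.
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key: str) -> Dict[str, str]:
        # Random-access lookup of a single record by key (returns a copy).
        return dict(self._rows.get(row_key, {}))


facts = AppendOnlyFactTable()
for pid, amt in [("p1", 10.0), ("p2", 5.5), ("p1", 4.5)]:
    facts.append(pid, amt)

products = KeyedDimensionTable()
products.put("p1", "price", "9.99")
products.put("p1", "price", "8.99")  # a frequent update overwrites the cell

print(facts.total_sales())  # 20.0
print(products.get("p1"))   # {'price': '8.99'}
```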
  16. Use Case Examples
     • Retail: price optimization
     • Media: content targeting
     • Finance: fraud detection
     • Manufacturing: diagnostics
     • Info services: satellite imagery
     • Agriculture: seed optimization
     • Power: smart consumption
  17. Core Benefits of the Platform for Big Data
     1. Flexibility
        • Store any data.
        • Run any analysis.
        • Keeps pace with the rate of change of incoming data.
     2. Scalability
        • Proven growth to PBs/1,000s of nodes.
        • No need to rewrite queries; scales automatically.
        • Keeps pace with the rate of growth of incoming data.
     3. Economics
        • Cost per TB at a fraction of other options.
        • Keep all of your data alive in an active archive.
        • Powering the "data beats algorithms" movement.
  18. How Do I Start? 4 Options
     I. Cloudera cluster up and running in the cloud in 24 hours.
     II. Use an Excelerate Systems data scientist to set the customer's data strategy.
     III. Get an on-premises Cloudera cluster up and running in 5 days with 5 nodes and up to 10 TB of data.
     IV. Training: customers who invest in training are generally more successful than those who do not.
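The on-premises option implies a rough storage budget. Assuming all five nodes store data, HDFS's default replication factor of 3, and an illustrative 25% headroom for temporary and intermediate files (the headroom figure is an assumption, not a Cloudera or Excelerate Systems guideline), the raw disk needed per node can be estimated as follows.

```python
def raw_capacity_per_node(data_tb: float, nodes: int,
                          replication: int = 3, headroom: float = 0.25) -> float:
    """Raw disk (TB) each node needs to hold data_tb of user data.

    replication: copies HDFS keeps of each block (3 is the HDFS default).
    headroom: extra fraction reserved for temporary/intermediate files (assumed).
    """
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    total_raw = data_tb * replication * (1 + headroom)
    return total_raw / nodes


# 10 TB x 3 copies x 1.25 headroom / 5 nodes
print(raw_capacity_per_node(10, 5))  # 7.5
```

Replication is what makes HDFS self-healing, but it also means the "10 TB of data" figure corresponds to roughly three times that much raw disk across the cluster.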
  19. Cloudera from Excelerate Systems
     There is a worldwide shortage of Big Data skills, especially in Latin America. Excelerate Systems has invested heavily in building a global network of Cloudera-certified specialists who can design, implement, configure, develop and support Big Data solutions. No other company in the region has these skills yet. Excelerate Systems is Cloudera's primary partner in the region.
  20. Excelerate Systems Big Data Resources
     • 8 Certified Cloudera Developers
     • 6 Certified Cloudera Administrators
     • 2 HBase developers
     • 2 Hadoop developers
     • 2 data scientists
  21. Questions and Next Steps
     • David Bennett, CEO: David.bennett@exceleratesystems.net
     • Victor Pichardo, President: Victor.pichardo@exceleratesystems.net
     • Alex Campos, Systems Engineer: alex.campos@exceleratesystems.net
     • Plus consulting resources in various countries