Big Data/Cloudera from Excelerate Systems

Learn how Big Data solutions from Excelerate Systems are driving next-generation data warehouse optimization. In other words: if you have big data, come and talk to us.

Transcript

  • 1. Mobile, Big Data, Cloud, Security, Virtualization
    http://www.exceleratesystems.com
    David Bennett, CEO
  • 2. • Founded in 2008. • Excelerate Systems is a leading company in the Americas focusing on Big Data, Cloud, IT Operations and Security. • With offices in the US, Mexico, Chile and France, as well as individual contributors in Brazil, Uruguay, Argentina, Canada, Spain, China and India, we have a global delivery capability. • 125 customers in 25 countries.
  • 3. 4 Technology Areas:
  • 4. Security • IT Operations & the Data Center • Cloud Services (IaaS and SaaS) • Big Data
  • 5. Big Data and Hadoop
  • 6. The Problems with Current Data Systems
    Diagram: Instrumentation → Collection → Storage-Only Grid (original raw data, mostly append) → ETL Compute Grid → RDBMS (aggregated data) → BI Reports + Interactive Apps
    1. Moving data to compute doesn't scale.
    2. Archiving = premature data death.
    3. Can't explore the original high-fidelity raw data.
  • 7. The Solution: A Combined Storage/Compute Layer
    Diagram: Instrumentation → Collection → Hadoop: Storage + Compute Grid (mostly append) → RDBMS (aggregated data) → BI Reports + Interactive Apps
    1. Scalable throughput for ETL and aggregation (ETL acceleration).
    2. Keep data alive forever (active archive).
    3. Data exploration and advanced analytics.
  • 8. So What Is Apache Hadoop?
    • A scalable, fault-tolerant distributed system for data storage and processing (open source under the Apache license).
    • Core Hadoop has two main systems:
      • Hadoop Distributed File System (HDFS): self-healing, high-bandwidth clustered storage.
      • MapReduce: distributed, fault-tolerant resource management and scheduling coupled with a scalable data-programming abstraction.
    • Key business values:
      • Flexibility: store any data, run any analysis.
      • Scalability: start at 1 TB/3 nodes and grow to petabytes/1,000s of nodes.
      • Economics: cost per TB at a fraction of traditional options.
  • 9. The Hadoop Big Bang
    • Fastest sort of a TB: 62 seconds over 1,460 nodes.
    • Sorted a PB in 16.25 hours over 3,658 nodes.
    • Hadoop World 2009: 500 attendees.
  • 10. The Key Benefit: Agility/Flexibility
    Schema-on-Write (RDBMS):
      • The schema must be created before any data can be loaded.
      • An explicit load operation has to take place, which transforms the data into the database's internal serialization format.
      • New columns must be added explicitly before data for those columns can be loaded into the database.
      • Pros: OLAP is fast; standards/governance.
    Schema-on-Read (Hadoop):
      • Data is simply copied to the file store; no transformation is needed.
      • A SerDe (Serializer/Deserializer) is applied at read time to extract the required columns (late binding).
      • New data can start flowing at any time and will appear retroactively once the SerDe is updated to parse it.
      • Pros: load is fast; flexibility/agility.
  • 11. Scalability: Scalable Software Development
    Grows without requiring developers to re-architect their algorithms/applications (auto scale).
  • 12. Economics: Return on Byte
    • Return on Byte (ROB) = value to be extracted from a byte divided by the cost of storing that byte.
    • If ROB < 1, the data will be buried in the tape wasteland; thus we need more economical active storage.
    Chart: Low ROB vs. High ROB
  • 13. The Big Data Platform: CDH5
  • 14. CDH in the Enterprise Data Stack
    Diagram components:
      Data sources: logs, files, web data, relational databases
      Ingestion: Sqoop, Flume
      Management: Cloudera Manager
      Access interfaces: ODBC, JDBC, NFS, HTTP
      Connected systems: IDEs, BI/analytics, enterprise reporting, enterprise data warehouse, online serving systems, web/mobile applications, modeling tools, metadata/ETL tools
      Users: system operators, engineers, analysts, business users, customers, data scientists, data architects
  • 15. HBase versus HDFS
    HDFS:
      • Optimized for: large files; sequential access (high throughput); append only.
      • Use for: fact tables that are mostly append-only and require sequential full-table scans.
    HBase:
      • Optimized for: small records; random access (low latency); atomic record updates.
      • Use for: dimension tables that are updated frequently and require random, low-latency lookups.
    Not suitable for: low-latency interactive OLAP.
  • 16. Use Case Examples
    • Retail: price optimization
    • Media: content targeting
    • Finance: fraud detection
    • Manufacturing: diagnostics
    • Info services: satellite imagery
    • Agriculture: seed optimization
    • Power: smart consumption
  • 17. Core Benefits of the Platform for Big Data
    1. Flexibility: store any data; run any analysis; keeps pace with the rate of change of incoming data.
    2. Scalability: proven growth to petabytes/1,000s of nodes; no need to rewrite queries, scales automatically; keeps pace with the rate of growth of incoming data.
    3. Economics: cost per TB at a fraction of other options; keep all of your data alive in an active archive; powering the "data beats algorithms" movement.
  • 18. How do I start? 4 Options
    I. A Cloudera cluster up and running in the cloud in 24 hours.
    II. Use an Excelerate Systems data scientist to set the customer's data strategy.
    III. An on-premises Cloudera cluster up and running in 5 days, with 5 nodes and up to 10 TB of data.
    IV. Training: customers who invest in training are generally more successful than those who do not.
  • 19. Cloudera from Excelerate Systems
    There is a worldwide shortage of Big Data skills, especially in Latin America. Excelerate Systems has invested heavily in building a global network of certified Cloudera specialists who can design, implement, configure, develop and support Big Data solutions. No other company in the region has these skills yet. Excelerate Systems is Cloudera's primary partner in the region.
  • 20. Excelerate Systems Big Data Resources
    • 8 certified Cloudera developers
    • 6 certified Cloudera administrators
    • 2 HBase developers
    • 2 Hadoop developers
    • 2 data scientists
  • 21. Questions and Next Steps
    • David Bennett, CEO: David.bennett@exceleratesystems.net
    • Victor Pichardo, President: Victor.pichardo@exceleratesystems.net
    • Alex Campos, Systems Engineer: alex.campos@exceleratesystems.net
    • Plus consulting resources in various countries
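Slide 8 describes MapReduce as a scalable data-programming abstraction. As a rough illustration of that abstraction (not Hadoop's actual Java API), the single-process Python sketch below runs the three phases of a word-count job: map emits (word, 1) pairs, shuffle groups them by key, and reduce sums each group. The function names `map_phase`, `shuffle`, `reduce_phase` and `run_job` are hypothetical; on a real cluster the framework distributes the map and reduce calls across nodes and performs the shuffle itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all intermediate values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values seen for one key."""
    return key, sum(values)


def run_job(records: Iterable[str]) -> Dict[str, int]:
    """Hypothetical driver: map every record, shuffle, then reduce each key."""
    intermediate = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    logs = ["error disk full", "warn disk slow", "error network down"]
    print(run_job(logs))
    # {'disk': 2, 'down': 1, 'error': 2, 'full': 1, 'network': 1, 'slow': 1, 'warn': 1}
```

Because the mapper and reducer each see only one record or one key at a time, the same two functions could in principle run on three nodes or three thousand, which is the property slide 11 calls auto scaling.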
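The sort figures on slide 9 translate into throughput with simple arithmetic. This sketch assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and spreads the data evenly across nodes. A sort reads and writes data more than once, so the per-node figures describe data sorted per node per second, not measured disk rates.

```python
from typing import Tuple

# Assumption: decimal units, as used by the standard sort benchmarks.
TB = 10**12
PB = 10**15


def throughput(total_bytes: float, seconds: float, nodes: int) -> Tuple[float, float]:
    """Return (aggregate bytes/s, per-node bytes/s), assuming an even split."""
    aggregate = total_bytes / seconds
    return aggregate, aggregate / nodes


tb_agg, tb_node = throughput(TB, 62, 1460)            # 1 TB in 62 s on 1,460 nodes
pb_agg, pb_node = throughput(PB, 16.25 * 3600, 3658)  # 1 PB in 16.25 h on 3,658 nodes

print(f"1 TB sort: {tb_agg / 1e9:.1f} GB/s total, {tb_node / 1e6:.1f} MB/s per node")
print(f"1 PB sort: {pb_agg / 1e9:.1f} GB/s total, {pb_node / 1e6:.1f} MB/s per node")
# 1 TB sort: 16.1 GB/s total, 11.0 MB/s per node
# 1 PB sort: 17.1 GB/s total, 4.7 MB/s per node
```

Under these assumptions both runs sustain roughly 16 to 17 GB/s in aggregate; the petabyte run does so with less data per node per second, spread over more than twice as many machines.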
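To make slide 10's schema-on-read idea concrete, here is a minimal Python analogy. Raw records are stored untouched, and a deserializer function is applied only when the data is queried. This deserializer stands in for a Hive-style SerDe, which in practice is a Java class. The record layout and the `serde_v1`, `serde_v2` and `query` functions are hypothetical. Swapping in the updated deserializer makes the `amount` column appear for data that was stored before the change, without reloading anything.

```python
from typing import Callable, Dict, List

# Raw records are kept exactly as they arrived: no load-time transformation.
RAW_LOG: List[str] = [
    "2014-03-01|alice|login",
    "2014-03-01|bob|purchase|19.99",  # a newer producer added an amount field
]

Deserializer = Callable[[str], Dict[str, str]]


def serde_v1(line: str) -> Dict[str, str]:
    """Original deserializer: knows only date, user and action."""
    date, user, action = line.split("|")[:3]
    return {"date": date, "user": user, "action": action}


def serde_v2(line: str) -> Dict[str, str]:
    """Updated deserializer: also extracts the optional amount column."""
    fields = line.split("|")
    row = serde_v1(line)
    row["amount"] = fields[3] if len(fields) > 3 else ""
    return row


def query(raw: List[str], serde: Deserializer, column: str) -> List[str]:
    """Apply the schema at read time (late binding) and project one column."""
    return [serde(line).get(column, "") for line in raw]


print(query(RAW_LOG, serde_v1, "amount"))  # ['', '']
print(query(RAW_LOG, serde_v2, "amount"))  # ['', '19.99']
```

The contrast with schema-on-write is that nothing here had to be declared or transformed before `RAW_LOG` was stored; the cost is paid at read time instead.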
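Slide 12's Return on Byte metric is a simple ratio: ROB = value extracted from a byte / cost of storing that byte. The sketch below applies that ratio and the slide's ROB < 1 rule to two hypothetical scenarios. The dollar figures are invented purely for illustration; value and cost only need to share the same units for the ratio to be meaningful. It reproduces the slide's argument: the same data, with the same value, moves from "bury it on tape" to "keep it active" once storage becomes cheaper.

```python
def return_on_byte(value_per_byte: float, cost_per_byte: float) -> float:
    """ROB = value extracted from a byte divided by the cost of storing it."""
    if cost_per_byte <= 0:
        raise ValueError("storage cost must be positive")
    return value_per_byte / cost_per_byte


def placement(rob: float) -> str:
    """Slide 12's rule: data with ROB < 1 ends up archived on tape."""
    return "tape archive" if rob < 1 else "active storage"


# Hypothetical figures; value and cost use the same units (e.g. $ per TB per year).
scenarios = [
    ("same data, traditional storage", 500.0, 1000.0),
    ("same data, lower-cost active storage", 500.0, 250.0),
]
for label, value, cost in scenarios:
    rob = return_on_byte(value, cost)
    print(f"{label}: ROB = {rob:.2f} -> {placement(rob)}")
# same data, traditional storage: ROB = 0.50 -> tape archive
# same data, lower-cost active storage: ROB = 2.00 -> active storage
```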
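Slide 15's guidance can be illustrated with a toy in-memory model. The append-only list stands in for HDFS-style fact data consumed by full sequential scans. The keyed dictionary stands in for HBase-style dimension rows that are looked up and updated individually. Neither is an HDFS or HBase client. All names (`sales_facts`, `products`, `upsert_product` and so on) are hypothetical, and a plain Python dict offers none of HBase's distribution or row-level atomicity guarantees.

```python
from typing import Dict, List, Tuple

# Toy analogy only: neither structure talks to HDFS or HBase.

# "HDFS-style" fact data: append-only, consumed by sequential full scans.
sales_facts: List[Tuple[str, str, float]] = []  # (date, product_id, amount)


def append_fact(date: str, product_id: str, amount: float) -> None:
    sales_facts.append((date, product_id, amount))  # never updated in place


def total_by_product() -> Dict[str, float]:
    totals: Dict[str, float] = {}
    for _date, product_id, amount in sales_facts:  # full sequential scan
        totals[product_id] = totals.get(product_id, 0.0) + amount
    return totals


# "HBase-style" dimension data: keyed rows with random lookups and updates.
products: Dict[str, Dict[str, str]] = {}


def upsert_product(product_id: str, **columns: str) -> None:
    products.setdefault(product_id, {}).update(columns)  # single-row update


def lookup_product(product_id: str) -> Dict[str, str]:
    return products.get(product_id, {})  # point lookup by row key


append_fact("2014-03-01", "p1", 10.0)
append_fact("2014-03-01", "p2", 5.0)
append_fact("2014-03-02", "p1", 2.5)
upsert_product("p1", name="Widget", price="10.00")
upsert_product("p1", price="9.50")  # a frequent dimension change
print(total_by_product())    # {'p1': 12.5, 'p2': 5.0}
print(lookup_product("p1"))  # {'name': 'Widget', 'price': '9.50'}
```

The fact side only ever grows and is read end to end, while the dimension side is read and changed one key at a time. That split is the access-pattern distinction the slide uses to decide between HDFS and HBase.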