Emulex and the Evaluator Group Present Why I/O is Strategic for Big Data

This webcast is the fourth in a series on why I/O is strategic for the data center. John Webster, senior partner at the Evaluator Group, will discuss why I/O is critically important to meet the bandwidth demands of big data deployments. As the data center infrastructure scales upward, so will the need for I/O to scale dynamically to meet these needs.

  • Emulex Branding: Americas - Focus On Top OEM & DMR Groups; APAC and EMEA - 10Gb VAR Media; Emulex = Ethernet; #1 for Web Searches (Google, Yahoo, Bing, Baidu); DMR Search Engine Placement; Social Media Community Building; IO Blender.com & LinkedIn Convergence Community; ECE - Emulex Connected Experience - End User Loyalty Program; Customized Content Delivery; MYEMULEX.com, iPhone App - Connected Cards; Targeted Push (iSCSI, VMware, Oracle, MSFT, FC, Convergence); SF.com lead and community maturation

    1. Why I/O Is Strategic for Big Data. Presented by: Emulex and Evaluator Group
    2. Webcast Housekeeping
        1. All attendees will be on mute during the presentation
        2. Please submit your questions via the text/chat feature
        3. We will do all Q&A at the end of the presentation
    3. Why I/O Is Strategic. Katherine Lane, Director of Corporate Communications
    4. Why I/O Is Strategic? Building a Virtual Panel of Experts!
    5. Topics for the Virtual Panel: Server Virtualization, Cloud Computing, Big Data, Network Convergence
    6. Moving the Elephant Through the Pipes. John Webster, Senior Partner, Evaluator Group. © 2012 Evaluator Group, Inc.
    7. Overview (12/11/2012)
        - "Big data" can mean two different things
            - Storage for large amounts of data
            - Analytics against very large amounts of data
            - I/O is critical for both
        - Big Data Apps
            - Personalized healthcare
            - Online-style shopping for bricks-and-mortar retailers
            - Fraud detection
        - Marketing Needs It Now
            - Correlate customer data with social media data feeds
            - Understand the buyer as an individual
    8. Data Analytics Model for Individualized Marketing
        - [Diagram. Components: Customer Profiles, NoSQL DB, HDFS, High Scale Data Reductions, BI and Analytics, Low Latency NoSQL DB, Expert System. Inputs: Logs, Tweets, Location. Output: Predictions on Buying Behavior.]
        - 1) Identify User
        - 2a) Lookup User Profile
        - 2b) Lookup Location
        - 3) Input Into Expert System
        - 4) Real-time: Determine Best Offer For This Customer
    9. Distributed, Shared-Nothing Architectures for Big Data Analytics
        - [Diagram. Network Layer: switch. Compute Layer: Control Node plus Nodes 1, 2, 3, ... n. Storage Layer: DAS attached to each node.]
    10. CAP theorem
        - It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
            - Consistency (all nodes see the same data at the same time)
            - Availability (a guarantee that every request receives a response about whether it was successful or failed)
            - Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)
        - A distributed system can satisfy any two of these guarantees at the same time, but not all three
    11. The Impact of Network and I/O Performance
        - The effects of an analytics system's internal network performance, both positive and negative, are felt by the users of the analytics applications.
        - The rate at which data flows between storage and processors within a Hadoop cluster has a direct effect on cluster performance and scalability.
        - How fast data can be moved into and out of distributed computing clusters affects how quickly query results are delivered to users.
    12. Internal Network Throughput 1GbE [chart]
    13. Internal Network Throughput 10GbE [chart]
    14. Load/Unload Throughput [chart]
    15. Why Enterprise IT Is Now Involved
        - Distributed computing for analytics (Hadoop, for example) is moving from science experiment to mission-critical
        - Emerging enterprise Hadoop use cases include:
            - Hadoop for very large data sets that can't be analyzed economically by the data warehouse
            - Hadoop on the front end of the data warehouse
            - Hadoop as data convergence engine: combine new unstructured data sources with structured data warehouse data
            - Hadoop as the back end to the data warehouse
        - Also emerging is the need to bring Hadoop under the data governance umbrella
            - Use case for NAS/SAN attached to Hadoop clusters?
            - At what cost?
    16. Is Hadoop Ready for Prime Time?
        - Hadoop was not born and raised in the highly risk-averse enterprise data center
        - From the standpoint of enterprise IT, Hadoop puts forward a different and inefficient operational model
        - Hadoop introduces enterprise security and data governance issues
    17. Shared Storage as Secondary Storage
        - [Diagram. Network Layer: switch. Compute Layer: Control Node plus Nodes 1, 2, 3, ... n. Storage Layer: SAN/NAS.]
    18. Shared Storage as Primary Storage
        - [Diagram. Network Layer: switch. Compute Layer: Control Node plus Nodes 1, 2, 3, ... n. Storage Layer: SAN and Scale-out NAS.]
    19. Evaluating Hadoop as a Storage Device
        - Single points of failure eliminated?
        - SSD and automated tiering?
        - Dedupe?
        - Snapshots?
        - Insert your hot-button storage feature here: __________
    20. Enterprise IT and Big Data Analytics
        - There will be Big Data: Storage and Apps
        - The traditional data warehouse will continue to evolve
        - Distributed computing clusters (XxSQL, Hadoop) will achieve prominence in enterprise data centers
        - Shared storage, while controversial within some circles, can be applied
        - Communications bandwidth is as important a resource as compute and storage
    21. © 2011 Emulex Corporation
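
The throughput slides (12 to 14) contrast 1GbE and 10GbE, but their charts are not reproduced in this transcript. As a rough back-of-envelope illustration only, the sketch below estimates how long it takes to move a dataset over each link at ideal line rate. The 1 TB dataset size, the `transfer_seconds` helper and its `efficiency` parameter are assumptions made for this example. They are not figures or methods from the webcast, and real clusters see lower effective throughput because of protocol overhead and contention.

```python
def transfer_seconds(data_bytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Ideal time in seconds to move data_bytes over a link of link_gbps gigabits/s.

    efficiency is a hypothetical fudge factor (0 < efficiency <= 1) for the
    fraction of line rate actually achieved; 1.0 means perfect line rate.
    """
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = data_bytes * 8
    return bits / (link_gbps * 1e9 * efficiency)


# Hypothetical dataset size: 1 TB (decimal, 10**12 bytes). Not from the slides.
DATASET_BYTES = 1e12

for name, gbps in (("1GbE", 1.0), ("10GbE", 10.0)):
    secs = transfer_seconds(DATASET_BYTES, gbps)
    print(f"{name}: {secs / 60:.1f} minutes at ideal line rate")
```

Under these assumptions the same load takes about 133 minutes on 1GbE and about 13 minutes on 10GbE, which is the kind of gap behind the claim on slide 20 that communications bandwidth is as important a resource as compute and storage.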

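
The CAP trade-off stated on slide 10 can be made concrete with a toy model. The sketch below is not how Hadoop or any particular NoSQL database works; `ToyCluster`, `Replica` and `Unavailable` are hypothetical names invented for this illustration. It models two replicas and a network partition. In "CP" mode a write during the partition is refused, giving up availability. In "AP" mode the write is accepted locally and the replicas diverge, giving up consistency.

```python
class Unavailable(Exception):
    """Raised when a CP-mode cluster refuses a write during a partition."""


class Replica:
    def __init__(self, name: str):
        self.name = name
        self.data = {}


class ToyCluster:
    """Two replicas that normally mirror every write (illustrative only)."""

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True means a and b cannot communicate

    def write(self, replica: Replica, key, value) -> None:
        peer = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Prefer consistency: refuse rather than let replicas diverge.
                raise Unavailable(f"replica {replica.name} cannot reach its peer")
            # Prefer availability: accept locally, peer misses the update.
            replica.data[key] = value
            return
        replica.data[key] = value
        peer.data[key] = value

    def consistent(self) -> bool:
        return self.a.data == self.b.data


ap = ToyCluster("AP")
ap.partitioned = True
ap.write(ap.a, "offer", "10% off")
print("AP consistent after partitioned write:", ap.consistent())

cp = ToyCluster("CP")
cp.partitioned = True
try:
    cp.write(cp.a, "offer", "10% off")
except Unavailable as exc:
    print("CP write refused:", exc)
print("CP consistent after refused write:", cp.consistent())
```

With the partition in place, the AP cluster stays writable but its replicas disagree, while the CP cluster stays consistent only by refusing the write. Once a partition has to be tolerated, one of the other two guarantees has to give.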