For netapp haifa 2012 v3

Published in: Technology
  1. Infra Challenges 2012. Pini Cohen, VP and Senior Analyst, pini@stki.info, www.stki.info
  2. What is happening? Galit Fein’s work, Copyright STKI@2012. Do not remove source or attribution from any slide or graph.
  3. Something’s Happening Here. Personal Computers. Personal Computing.
  4. Game Changer
     Consumerized IT
       • “Personal” connectivity
       • Personal mobile computing
       • Cloud-based applications
     Knowledge Individuals
       • Always connected
       • Technology savvy
       • Multitasking
  5. Quiz #8: What does this product do?
     • Supports the ARM architecture
     • GUI based on touch
     • Online store (like the Apple Store or Android Market) for purchasing and distributing software
     • Geo-location services
     Is it a phone? Is it a tablet?
     Pini Cohen’s work, Copyright STKI@2012.
  6. It’s Windows 8!
     • Supports the ARM architecture (for tablets, smartphones?!)
     • New GUI based on touch (!) and Silverlight technology
     • Windows Store (like the Apple Store or Android Market) for purchasing and distributing software
     • Geo-location services
     • The big loser might be Intel!
  7. Social Networks. Source: http://www.freeiconsdownload.com/Free_Downloads.asp?id=661
  8. The result. Source: http://xnews.pk/numl/2011/08/01/result-anounced/
  9. The balance is changing. Cloud applications have different needs. Infra is better. Need for more.
  10. Big Data
      • Big Data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target, currently ranging from a few dozen terabytes to many petabytes of data in a single data set.
      • Examples include web logs, RFID, sensor networks, social networks, Internet text and documents, Internet search indexing, call detail records, genomics, astronomy, biological research, military surveillance, medical records, photography archives, video archives, and large-scale eCommerce. (Wikipedia)
      Source: http://fortunewallstreet.files.wordpress.com/2010/12/matrix.jpg
  11. From Data to Information
      • “Huston Storm”
      • “Low volume” but “High influence” Facebook client profile
      • Proactive intensive care
  12. Merging the Traditional and Big Data Approaches
      • Traditional Approach (Structured & Repeatable Analysis): business users determine what question to ask; IT structures the data to answer that question. Examples: monthly sales reports, profitability analysis, customer surveys.
      • Big Data Approach (Iterative & Exploratory Analysis): IT delivers a platform to enable creative discovery; business explores what questions could be asked. Examples: brand sentiment, product strategy, maximum asset utilization.
      Source: Haifa Labs IBM
  13. Brewer’s (CAP) Theorem
      • It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
        - Consistency (all nodes see the same data at the same time)
        - Availability (node failures do not prevent survivors from continuing to operate)
        - Partition Tolerance (the system continues to operate despite arbitrary message loss)
      Source: Scalebase, http://en.wikipedia.org/wiki/CAP_theorem
  14. Dealing With CAP
      • Drop Consistency
        - Welcome to the “Eventually Consistent” term.
          • At the end, everything will work out just fine. And hey, sometimes this is a good enough solution
        - When no updates occur for a long period of time, eventually all updates will propagate through the system and all the nodes will be consistent
        - For a given accepted update and a given node, eventually either the update reaches the node or the node is removed from service
        - Known as BASE (Basically Available, Soft state, Eventual consistency), as opposed to ACID
      Source: Scalebase
  15. NoSQL. Source: Scalebase, http://browsertoolkit.com/fault-tolerance.png
  16. Pros/Cons
      • Pros:
        - Performance
        - Big Data
        - Most solutions are open source
        - Data is replicated to nodes and is therefore fault-tolerant (partitioning)
        - Don’t require a schema
        - Can scale up and down
      • Cons:
        - Code change
        - No framework support
        - Not ACID
        - Ecosystem (BI, backup)
        - There is always a database at the backend
        - Some APIs are just too simple
      Source: Scalebase
  17. There are some NoSQL projects out there… Source: “NoSQL Databases: Providing Extreme Scale and Flexibility” by Matthew D. Sarrel
  18. HDFS architecture: replication, robustness, pipelining, data correctness, snapshots
  19. Dataflow in Hadoop (diagram labels: Master; Job: Word Count; Submit job; schedule; map; reduce). Source: Haifa Labs IBM
  20. Dataflow in Hadoop (diagram: read input file from HDFS)
      • Block 1, “Hello World Bye World”: map emits Hello 1, World 2, Bye
      • Block 2, “Hello Hadoop Goodbye Hadoop”: map emits Hello 1, Hadoop 2, Goodbye
      • The map outputs feed the reduce tasks
  21. Dataflow in Hadoop (diagram labels: Finished; Finished + Location; map; Local FS; reduce)
  22. Dataflow in Hadoop (diagram labels: map; Local FS; HTTP GET; reduce)
  23. Dataflow in Hadoop (diagram labels: reduce; Write; HDFS). Final answer: Bye 1, Goodbye 1, Hadoop 2, Hello 2, World 2
  24. Can we live with NoSQL limitations?
      • Facebook has dropped Cassandra
      • “...we found Cassandra’s eventual consistency model to be a difficult pattern to reconcile for our new Messages infrastructure”
      • Facebook has selected HBase (columnar DBMS)
      http://www.facebook.com/notes/facebook-engineering/the-underlying-technology-of-messages/454991608919
  25. Who Uses Hadoop?
      • Amazon/A9
      • AOL
      • Facebook
      • Fox Interactive Media
      • Netflix
      • New York Times
      • Quantcast
      • Rackspace/Mailtrust
      • Veoh
      • Yahoo!
      • PowerSet (now Microsoft)
      More at http://wiki.apache.org/hadoop/PoweredBy
  26. Storage Size and Growth in Selected Industries
      Industry                 2011 1Q Size RAW   2010 1Q Size RAW   Planned Growth per Year
      Defense                  500 TB-6 PB        500 TB-4 PB        50%-100%
      Finance                  600 TB-1.3 PB      400 TB-1 PB        40%-75%
      Health                   140 TB-550 TB      140 TB-350 TB      30%-50%
      Manufacturing - Retail   100 TB-250 TB      40 TB-200 TB       20%-50%
      Telco                    2 PB-3 PB          900 TB-2.5 PB      30%-50%
      Governmental Public      100 TB-300 TB      10 TB-30 TB        25%-100%
      High Tech                150 TB-550 TB      40 TB-150 TB       20%-30%
  27. Storage Ratios
      • Number of raw TB and usable TB per storage staff member FTE (including backup and DRP of storage):
        Per FTE           RAW Storage   Usable Storage
        25th percentile   97 TB         42 TB
        Median            225 TB        150 TB
        75th percentile   350 TB        238 TB
      • Moderate 25% increase from last year’s data
      Source: STKI
  28. Usable/Raw storage ratio
      • Net storage in this research (usable for applications):
        - After RAID
        - After replication to DRP
        - Without VTLs
        - The term “usable storage” is tricky, since with snapshots and thin provisioning an application can see more storage than “raw storage”
      • NET/RAW ratio:
        25th percentile   50%
        Median            60%
        75th percentile   71%
      Source: STKI
  29. STKI’s take
      • Storage growth rate will increase
      • Managing the storage will be a key priority
      • Automation, automation, automation (which also means standardization)
      • Users should have established storage metrics such as:
        - Storage managed per administrator
        - Storage in inventory: unused storage divided by total storage
        - Data multiplier (number of copies made of primary data)
        - Data availability (number of hours storage is
        - Mean time to recovery (MTTR)
      • Explore the business needs for Big Data
  30. Thanks. pini@stki.info
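
Slides 19 to 23 trace a Word Count job through Hadoop's map, shuffle, and reduce stages. The following is a minimal single-process sketch of that dataflow in plain Python, not Hadoop code: the function names are hypothetical, each string stands in for one HDFS block, and the map step is assumed to combine counts locally (which is why slide 20 shows "World 2" coming out of a map task).

```python
from collections import Counter, defaultdict


def map_block(block):
    """Map task: count the words in one input block (map plus local combine)."""
    return Counter(block.split())


def shuffle(map_outputs):
    """Group the partial counts by word, as the reduce side would collect them."""
    grouped = defaultdict(list)
    for output in map_outputs:
        for word, count in output.items():
            grouped[word].append(count)
    return grouped


def reduce_counts(grouped):
    """Reduce task: sum the partial counts for each word, in sorted word order."""
    return {word: sum(counts) for word, counts in sorted(grouped.items())}


def word_count(blocks):
    """Run map, shuffle, and reduce over a list of text blocks."""
    return reduce_counts(shuffle(map_block(block) for block in blocks))


if __name__ == "__main__":
    # The two input blocks shown on slide 20.
    blocks = ["Hello World Bye World", "Hello Hadoop Goodbye Hadoop"]
    for word, total in word_count(blocks).items():
        print(word, total)
```

Run on the two blocks from slide 20, the sketch produces the same totals as the final answer on slide 23. A real job would spread the map and reduce tasks across machines and move the intermediate data over the network.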

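Slide 14 describes eventual consistency: once updates stop, every accepted update eventually reaches every node. The toy model below illustrates that property under loud assumptions: the `Replica` class, the ring-shaped sync order, and the "highest version number wins" rule are all hypothetical simplifications, not how any particular NoSQL product works.

```python
class Replica:
    """One node holding a single versioned value (hypothetical toy model)."""

    def __init__(self, name):
        self.name = name
        self.version = 0
        self.value = None

    def write(self, value, version):
        # Accept the update only if it is newer than what this node holds.
        if version > self.version:
            self.version, self.value = version, value

    def sync_from(self, other):
        # Anti-entropy step: adopt the peer's state if it is newer.
        self.write(other.value, other.version)


def is_consistent(replicas):
    """True when every replica holds the same version and value."""
    first = replicas[0]
    return all(
        (r.version, r.value) == (first.version, first.value) for r in replicas
    )


def propagate(replicas):
    """Sync each replica from its ring neighbour until all agree; return rounds."""
    rounds = 0
    while not is_consistent(replicas):
        for i, replica in enumerate(replicas):
            replica.sync_from(replicas[i - 1])
        rounds += 1
    return rounds


if __name__ == "__main__":
    nodes = [Replica("a"), Replica("b"), Replica("c")]
    nodes[1].write("x=1", version=1)      # update accepted by one node only
    print("consistent right after the write:", is_consistent(nodes))
    print("sync rounds needed:", propagate(nodes))
    print("consistent after propagation:", is_consistent(nodes))
```

Between the write and the end of propagation, a read from node "a" or "c" returns stale data. That window is the trade-off slide 24 refers to when it quotes Facebook on Cassandra.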
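
Slides 27 to 29 recommend tracking storage metrics such as storage per administrator, unused storage as a share of the total, and the data multiplier. The sketch below shows one way those ratios could be computed. The `StorageSnapshot` class and its field names are hypothetical, "total storage" is assumed to mean raw capacity, and the data multiplier is assumed to be total copies divided by primary data.

```python
from dataclasses import dataclass


@dataclass
class StorageSnapshot:
    """Capacity figures in TB for one site (hypothetical field names)."""

    raw_tb: float           # raw capacity
    usable_tb: float        # usable capacity after RAID and DRP replication
    unused_tb: float        # capacity sitting in inventory
    primary_data_tb: float  # primary data
    all_copies_tb: float    # primary data plus every copy made of it
    admin_fte: float        # storage staff, in full-time equivalents

    def usable_to_raw(self) -> float:
        """Usable/raw ratio, as on slide 28."""
        return self.usable_tb / self.raw_tb

    def raw_per_admin(self) -> float:
        """Raw TB managed per storage FTE, as on slide 27."""
        return self.raw_tb / self.admin_fte

    def unused_share(self) -> float:
        """Unused storage divided by total (assumed raw) storage."""
        return self.unused_tb / self.raw_tb

    def data_multiplier(self) -> float:
        """Copies kept per unit of primary data."""
        return self.all_copies_tb / self.primary_data_tb


if __name__ == "__main__":
    # Raw and usable figures are the per-FTE medians from slide 27;
    # the remaining numbers are made up for illustration.
    site = StorageSnapshot(225, 150, 30, 50, 150, 1)
    print(f"usable/raw: {site.usable_to_raw():.0%}")
    print(f"raw TB per admin: {site.raw_per_admin():.0f}")
    print(f"unused share: {site.unused_share():.0%}")
    print(f"data multiplier: {site.data_multiplier():.1f}")
```

Dividing the two medians from slide 27 gives a usable/raw figure that need not match the 60% median ratio on slide 28, because a ratio of medians is not the same statistic as a median of ratios.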