Trends in        Infrastructure:        Paradigm ShiftsTell me and I’ll forgetShow me and I may         STKI Summit 2012re...
What do we do?             Pini Cohen’s work Copyright STKI@2012             Do not remove source or attribution from any ...
AgendaMajor paradigm shiftsDevelopment and SOAESM BSM CMDBDBMS and DATAPlatforms – ServersClientsStorage                  ...
Major paradigm shifts -mini agenda     • Why don’t we see a change when it is coming?     • Big Data and programming model...
Manager’s Dilemma     • Bingo! My product is a mainstream product (quartiles 2 and 3).     • Now, should I invest in quartil...
Prof. Clayton Christensen: Disruptive Innovation ModelRemember Digital Equipment Corporation (DEC). “Underdogs become  mai...
Last year my theme was “The Gap”              Pini Cohen’s work Copyright STKI@2012              Do not remove source or...
Major paradigm shifts-mini agenda     • Why don’t we see a change when it is coming?     • Big Data and programming models...
Big Data Definition – 4 V’s (or more…)     • Volume – tens of TBs and more (15-20TB+)     • Velocity – the speed in which ...
The origins of the 3V’s:      • 2002 research by Doug Laney from META Group (now        Gartner):                Pini Cohe...
“Big Data” theme main current usage:     • “Big Data" is just marketing jargon. -Doug Laney,       Gartner source: http://...
Big Data at work:     • Orbitz Worldwide has collected 750 terabytes of       unstructured data on their consumers’ behavi...
Example network flow data (possible use – Cyber)     • A huge amount of flow data        • Long-term collection of flow da...
DW appliances will be discussed later                Teradata                                                             ...
Several parts of paradigm changes Elements  Concepts     • Storing data for analytics (mainly):        • HDFS – Hadoop Fil...
Who Uses Hadoop?     •   Amazon/A9                                                              Quantcast     •   AOL    ...
Who Uses Cassandra?     •   Facebook                                                            SimpleGeo     •   Digg   ...
Big Data technologies (Hadoop etc.) vs. traditional IT  Traditional IT                                              Big Da...
The Basic Concept –the internet     • Think Distributed     • Think Parallel       Source: http://retedeicittadini.it/wp-c...
New type of scale:     • Hadoop:        • Up to 4,000 machines in a cluster        • Up to 20 PB in a cluster     • Curren...
Brewers (CAP) Theorem     • It is impossible for a distributed computer system to       simultaneously provide all three o...
Dealing With CAP     • Drop Consistency        • Welcome to the “Eventually Consistent” term.            • At the end – ev...
Hadoop    • Apache Hadoop is a software framework that supports      data-intensive distributed applications    • It enabl...
HDFS – Hadoop File System        • Parallel        • Distributed on commodity elements        • Throughput over latency   ...
HDFS motivation     • What if you needed to write a program that distributes       data on commodity HW (PC’s or Servers)....
HDFS: Hadoop Distributed File Systems              • Client requests meta data about a file from namenode              • D...
Datanode BlockreportsFile “part-0” will bereplicated twice and willpopulatesaved in blocks 1and 3 (file is big so it has t...
HDFS basic limitations     • Namenode is single point of failure     • Write-once model     • Plan to support appending-wr...
Map Reduce programming model    • In very basic – Brings the program to the data    • Contains two elements:        • Map:...
MapReduce motivation    • What if you needed to write a program that processes data      that’s on distributed computers? ...
MapReduce example:    map(String key, String value):    // key: document name    // value: document contents    for each w...
Dataflow in Hadoop                                                 Master                         Job: Word Count         ...
Dataflow in Hadoop                   Hello World Bye World Read                                                     Hello ...
Dataflow in Hadoop                              Finished                                      Finished + Location         ...
Dataflow in Hadoop                     map                      Local                                               FS    ...
Dataflow in Hadoop                                                                                           Write        ...
Example: Flow Analysis Map/Reduce                                                                                         ...
Components of Cluster Node            Flow File Input               Processor                                             ...
MapReduce helpers: Hive, Pig         • Make life easier – translate a friendlier language to MapReduce         ...
Hive: MapReduce helper:     • Code Example:        • hive> INSERT OVERWRITE TABLE events SELECT a.* FROM profiles a;      ...
NoSQL DBMS: storing and retrieving data     • Key/Value         • A big hash table         • Examples: Voldemort, Amazon’s...
Pros/Cons     • Pros:         • Performance         • BigData         • Most solutions are open source         • Data is r...
There are some NoSQL projects out there…                                                                            Source...
NoSQL Market Forecast 2011-2015            http://www.marketresearchmedia.com/2010/11/11/nosql-market/               Pini ...
Apache Cassandra     • Cassandra is a highly scalable, eventually       consistent, distributed, structured key-value     ...
Consistent Hashing• Partition using consistent hashing (for the  first node data is placed) based on MD5  Distributed hash...
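The partitioning idea on this slide can be sketched in a few lines. This is a minimal illustration, assuming one MD5-derived token per node and replicas placed on the next nodes clockwise on the ring; the class name HashRing and the node names are hypothetical and are not Cassandra's API.

```python
import bisect
import hashlib


def md5_token(key: str) -> int:
    """Map a string to a position on the ring using MD5 (as the slide mentions)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with a single token per node (an assumption)."""

    def __init__(self, nodes):
        self._ring = sorted((md5_token(n), n) for n in nodes)
        self._tokens = [token for token, _ in self._ring]

    def node_for(self, key: str) -> str:
        """First node clockwise from the key's token: where the first copy is placed."""
        i = bisect.bisect_right(self._tokens, md5_token(key)) % len(self._ring)
        return self._ring[i][1]

    def replicas_for(self, key: str, n: int):
        """The first node plus the following nodes on the ring, n in total."""
        start = bisect.bisect_right(self._tokens, md5_token(key))
        count = min(n, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


keys = [f"user:{i}" for i in range(200)]
small = HashRing(["node-a", "node-b", "node-c"])
big = HashRing(["node-a", "node-b", "node-c", "node-d"])

# Keys whose owner changes after adding node-d all move to node-d;
# every other key stays where it was.
moved = [k for k in keys if small.node_for(k) != big.node_for(k)]
```

The point of the design is that adding or removing a node only remaps the keys adjacent to that node on the ring, instead of reshuffling everything as a plain `hash(key) % number_of_nodes` scheme would.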
Write operation                                  Source: http://assets.en.oreilly.com/1/event/51/Scaling%20Web%20Applicati...
Cassandra’s tunable consistency (write)Level          Behavior               Ensure that the write has been written to at ...
Cassandra’s tunable consistency – readLevel        BehaviorANY          Not supported. You probably want ONE instead.     ...
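A common rule of thumb for combining the write and read levels in the two tables above: a read is guaranteed to overlap the latest acknowledged write when the number of replicas that acknowledge the write plus the number consulted on read exceeds the replication factor. The sketch below is only an illustration of that arithmetic; the function names are hypothetical, and the ANY level is left out because a hinted write does not count as a replica copy.

```python
def required_acks(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported level in this sketch: {level}")


def read_overlaps_write(write_level: str, read_level: str, replication_factor: int) -> bool:
    """True when W + R > N, so every read set intersects every write set."""
    w = required_acks(write_level, replication_factor)
    r = required_acks(read_level, replication_factor)
    return w + r > replication_factor


strong = read_overlaps_write("QUORUM", "QUORUM", 3)   # 2 + 2 > 3
weak = read_overlaps_write("ONE", "ONE", 3)           # 1 + 1 is not > 3
```

With ONE for both operations the system answers fastest but may return stale data for a while, which is the eventual consistency trade-off discussed earlier in the deck.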
Cassandra’s data model structure                 Think of cassandra as row-oriented      keyspace                         ...
Data Model – “flexible” scheme! ColumnFamily: RocketsKey                      Value 1                        Name         ...
Cassandra’s CQL – Cassandra SQL Language     • SQL like. Example:        • CREATE KEYSPACE test with strategy_class = Simp...
NoSQL benchmark – for scale!            Source: r esearch.yahoo.com/files/ycsb-v4.pdf                        Pini Cohen’s ...
Can we live with NoSQL limitations?     • Facebook has dropped Cassandra     • “..we found Cassandras eventual consistency...
What about other NoSQL DBMS?    • MongoDB    • Hbase    • CouchDB    • Maybe next session….               Pini Cohen’s wor...
Big Data potential implications on IT     • Will traditional RDBMS be obsolete? Surely no!     • Several areas are Big Dat...
Example of big data technology: SPLUNK     • Splunk is a traditional IT vendor based on MapReduce       (from 2009)       ...
Another aspect of Big Data - IBM Watson wins in Jeopardy                                                                  ...
DeepQA: the technology & architecture behind Watson                                                                       ...
Where did it acquire knowledge?   Three                      Domain Data                          Training and test   NLP ...
IBM’s Watson possible implications     If the computer understands my speech, why do I need a     keyboard?     If the com...
Major paradigm shifts -mini agenda     • Why don’t we see a change when it is coming?     • Big Data and programming model...
Mega-trend #1 of 21st century CONSUMERIZATION:         empowerment of people collaborating via             connected mobil...
User Interface Revolution – Touch / Sound(Voice) / Move Era                    Pini Cohen’s work Copyright STKI@2012      ...
2012: Sound/Voice is in               Pini Cohen’s work Copyright STKI@2012               Do not remove source or attribut...
2012: Face recognition is in                Pini Cohen’s work Copyright STKI@2012                Do not remove source or a...
Desktop and Mobile ecosystems begin to converge                  “BYOD : bring your own device"       employees asserting ...
Four screens of convergence: TV, PC, mobile and in-car• We want to be connected 7X24• Each of these screens is useful duri...
Can IT support all devices ?      • Employees will use as many        computers and mobile devices as        they wish.   ...
What about Productivity Software for non-wintel machines?                                                                 ...
Israel (expected end 2012):      Wintel: Q42011 compared to Q42010      Desktop PCs: -25% Notebooks: -35%               Pi...
Client/server v2                                                                                                          ...
Windows on ARMFeature                Windows 8 x86/64                        Windows 8 on ARM               Source: http:/...
Microsoft is fighting back    Win8 tabletsphone are:                                                  However:    • Easier...
A new era. We had it before:                                                                             Source: http://ww...
And the new era will look like :   Source: http://www.mobilemag.com/2011/01/06/samsungs-hybrid-sliding-pc-7-series-tabletn...
New Era: IT can no longer dictate a single device     • Looks like the dominance of Microsoft on Intel with C/S or WEB    ...
Major paradigm shifts -mini agenda     • Why don’t we see a change when it is coming?     • Big Data and programming model...
Infrastructure as code     • Treat your infrastructure as code:         •   AnalyzeDesign         •   Develop (the automat...
Some SW definitions:     • Software build - the process of converting source code files       into standalone software art...
Infrastructure as code     • This will enable frequent changes in production     • 180% change from current “versions” pol...
Opscode - Chef      • With Chef, you write abstract definitions as source code to describe        how you want each part o...
Opscode’s Chef     • Chef agent assures that the desired configuration is       installed!     • All install files  script...
Devops – Development and Operations     • Addresses the conflict between Development and       Operations:        • Develo...
Devops – Development from Mars, Operations from Venus     • Development and Operations are in different organization      ...
DeploymentRelease time is trouble time     • Development kicks things off by "tossing" a software release       "over the ...
Devops – new state of mind                                                                                 Source: http://...
Devops aims at:                                                                                                Source: htt...
DevOps Addresses Challenges     • DevOps is an operational approach that automates system       configuration and manageme...
Striving towards Devops state of mind:     • Measurement and incentives to change culture - metrics       based on joint p...
Devops Measurement    • Resource Utilization - How resources are allocated and how efficiently      they are used. Usually ...
Devops Measurement    • Operations Throughput - The volume and rate at which change      moves through your development to ...
Devops Measurement    • Agility - This looks at how quickly and efficiently your IT      operations can react to changes in...
Architecture Concepts related to Devops     • Devops is related to several technology       architecture and guidelines:  ...
Devops tools:     Source: http://doc36.controltie...
Devops vs. Private Cloud?     • In many aspects the objectives of Devops and Private Cloud       are overlapping     • Aut...
Some input from last year’s presentation     • Public cloud              Source: IDC https://www.eiseverywhere.com/file_up...
Summary – Major paradigm shifts     • Remember Digital Equipment       Corporation (DEC). “Underdogs       become mainstre...
STKI Round Tables     • Lots of useful information – use it !                 Pini Cohen’s work Copyright STKI@2012       ...
STKI Round Tables              Pini Cohen’s work Copyright STKI@2012              Do not remove source or attribution from...
We will present data on products and vendors:1. Israeli vendors rating – state of the current market focused on the   ente...
We will present data on products and vendors (cont.)3. Selected installations of products – projects in different stages ,...
103Pini Cohen’s work Copyright STKI@2012Do not remove source or attribution from any slide or graph     103
Ratio Analysis:                                                                                   Sorted Metric   Metric• ...
AgendaMajor paradigm shiftsDevelopment and SOAESM BSM CMDBDBMS and DATAPlatforms – ServersClientsStorage                  ...

Stki summit2012infra v7 - major trends - paradigm shifts


STKI infrastructure presentation 2012 - Paradigm Shifts - Big Data, Hadoop, MapReduce, Cassandra, Devops and more

Published in: Technology
  • DeepQA generates and scores many hypotheses using an extensible collection of Natural Language Processing, Machine Learning and Reasoning Algorithms. These gather and weigh evidence over both unstructured and structured content to determine the answer with the best confidence.

    Watson, the computer system we developed to play Jeopardy!, is based on the DeepQA software architecture. Here is a look at the DeepQA architecture. This is like looking inside the brain of the Watson system from about 30,000 feet high.

    Remember, the intended meaning of natural language is ambiguous, tacit and highly contextual. The computer needs to consider many possible meanings, attempting to find the evidence and inference paths that are most confidently supported by the data.

    So, the primary computational principle supported by the DeepQA architecture is to assume and pursue multiple interpretations of the question, to generate many plausible answers or hypotheses, and to collect and evaluate many different competing evidence paths that might support or refute those hypotheses. Each component in the system adds assumptions about what the question might mean, what the content means, what the answer might be, or why it might be correct. DeepQA is implemented as an extensible architecture and was designed at the outset to support interoperability.

    <UIMA Mention> For this reason it was implemented using UIMA, a framework and OASIS standard for interoperable text and multi-modal analysis contributed by IBM to the open-source community. Over 100 different algorithms, implemented as UIMA components, were integrated into this architecture to build Watson.

    In the first step, Question and Category Analysis, parsing algorithms decompose the question into its grammatical components. Other algorithms here identify and tag specific semantic entities like names, places or dates. In particular, the type of thing being asked for, if it is indicated at all, is identified. We call this the LAT or Lexical Answer Type, like this “FISH”, this “CHARACTER” or “COUNTRY”.

    In Query Decomposition, different assumptions are made about if and how the question might be decomposed into sub-questions. The original and each identified sub-part follow parallel paths through the system.

    In Hypothesis Generation, DeepQA does a variety of very broad searches for each of several interpretations of the question. Note that Watson, to compete on Jeopardy!, is not connected to the internet. These searches are performed over a combination of unstructured data (natural language documents) and structured data (available databases and knowledge bases) fed to Watson during training. The goal of this step is to generate possible answers to the question and/or its sub-parts. At this point there is very little confidence in these possible answers, since little intelligence has been applied to understanding the content that might relate to the question. The focus at this point is on generating a broad set of hypotheses, or, for this application, what we call “Candidate Answers”. To implement this step for Watson we integrated and advanced multiple open-source text and KB search components.

    After candidate generation, DeepQA also performs Soft Filtering, where it makes parameterized judgments about which and how many candidate answers are most likely worth investing more computation in, given specific constraints on time and available hardware. Based on a trained threshold for optimizing the tradeoff between accuracy and speed, Soft Filtering uses different light-weight algorithms to judge which candidates are worth gathering evidence for and which should get less attention and continue through the computation as-is. In contrast, if this were a hard filter, those candidates falling below the threshold would be eliminated from consideration entirely at this point.

    In Hypothesis & Evidence Scoring, the candidate answers are first scored independently of any additional evidence by deeper analysis algorithms. This may, for example, include Typing Algorithms. These are algorithms that produce a score indicating how likely it is that a candidate answer is an instance of the Lexical Answer Type determined in the first step, for example Country, Agent, Character, City, Slogan, Book, etc. Many of these algorithms may fire, using different resources and techniques to come up with a score. What is the likelihood that “Washington”, for example, refers to a “General” or a “Capital” or a “State” or a “Mountain” or a “Father” or a “Founder”?

    For each candidate answer, many pieces of additional evidence are searched for. Each of these pieces of evidence is subjected to more algorithms that deeply analyze the evidentiary passages and score the likelihood that the passage supports or refutes the correctness of the candidate answer. These algorithms may consider variations in grammatical structure, word usage, and meaning.

    In the Synthesis step, if the question had been decomposed into sub-parts, one or more synthesis algorithms will fire. They apply methods for inferring a coherent final answer from the constituent elements derived from the question’s sub-parts.

    Finally, arriving at the last step, Final Merging and Ranking, are many possible answers, each paired with many pieces of evidence, and each of these scored by many algorithms to produce hundreds of feature scores, all giving some evidence for the correctness of each candidate answer. Trained models are applied to weigh the relative importance of these feature scores. These models are trained with ML methods to predict, based on past performance, how best to combine all these scores to produce a final, single confidence number for each candidate answer and to produce the final ranking of all candidates. The answer with the strongest confidence would be Watson’s final answer, and Watson would try to buzz in provided that the top answer’s confidence was above a certain threshold.

    The DeepQA system defers commitments and carries possibilities through the entire process while searching for increasingly broader contextual evidence and more credible inferences to support the most likely candidate answers. All the algorithms used to interpret questions, generate candidate answers, score answers, collect evidence and score evidence are loosely coupled but work holistically by virtue of DeepQA’s pervasive machine learning infrastructure.

    No one component could realize its impact on end-to-end performance without being integrated and trained with the other components, AND they are all evolving simultaneously. In fact, what had a 10% impact on some metric one day might, one month later, contribute only 2% to overall performance due to evolving component algorithms and interactions. This is why the system, as it develops, is regularly trained and retrained.

    DeepQA is a complex system architecture designed to extensibly deal with the challenges of natural language processing applications and to adapt to new domains of knowledge. The Jeopardy! Challenge has greatly inspired its design and implementation for the Watson system.

    1. 1. Trends in Infrastructure: Paradigm Shifts. STKI Summit 2012. Pini Cohen, VP and Senior Analyst. “Tell me and I’ll forget. Show me and I may remember. Involve me and I’ll understand.”
    2. 2. What do we do? Pini Cohen’s work Copyright STKI@2012 Do not remove source or attribution from any slide or graph 2
    3. 3. Agenda: Major paradigm shifts; Development and SOA; ESM BSM CMDB; DBMS and DATA; Platforms – Servers; Clients; Storage
        Source: http://astonguild.org.uk/files/NEW_MENU_FRONT_RGB%5B1%5D.jpg
    4. 4. Major paradigm shifts - mini agenda
        • Why don’t we see a change when it is coming?
        • Big Data and programming models
        • The changing end-user device ecosystem
        • Infrastructure as Code and Devops
        Source: http://www.b2binbound.com/blog/?Tag=paradigm%20shift
    5. 5. Manager’s Dilemma
        • Bingo! My product is a mainstream product (quartiles 2 and 3).
        • Now, should I invest in quartile 1 or 4?
        • Most managers will invest in quartile 4
        [Figure: percentage of customers against the quality required by customers; the quality required improves gradually; a new product category is marked. Source of pic: http://www.buat-nadlan.com/2011/11/blog-post_3065.html]
    6. 6. Prof. Clayton Christensen: Disruptive Innovation Model
        • Remember Digital Equipment Corporation (DEC).
        • “Underdogs become mainstream faster than we think”.
        • Moving towards areas that look “non-mature” is crucial
        [Figure labels: T1, T2]
    7. 7. Last year my theme was “The Gap”
    8. 8. Major paradigm shifts - mini agenda
        • Why don’t we see a change when it is coming?
        • Big Data and programming models
        • The changing end-user device ecosystem
        • Infrastructure as Code and Devops
        Source: http://www.b2binbound.com/blog/?Tag=paradigm%20shift
    9. 9. Big Data Definition – 4 V’s (or more…)
        • Volume – tens of TBs and more (15-20TB+)
        • Velocity – the speed at which data is added (10M items per hour and more) and the speed at which the data needs to be processed
        • Variety – different types of data, structured and unstructured. In many cases this involves the internet of things and social media, but also voice, video, etc.
        • Variability – the ability to cope with new attributes and changing data types without interrupting the analytical process (without “import-export”)
        • Other optional V’s – validity, volatility, viscosity (resistance to flow), etc.
        Source: http://www.computerweekly.com/blogs/cwdn/2011/11/datas-main-drivers-volume-velocity-variety-and-variability.html
    10. 10. The origins of the 3V’s:
        • 2002 research by Doug Laney from META Group (now Gartner)
    11. 11. “Big Data” theme main current usage:
        • “Big Data" is just marketing jargon. - Doug Laney, Gartner
          Source: http://www.computerweekly.com/blogs/cwdn/2011/11/datas-main-drivers-volume-velocity-variety-and-variability.html
          Source: http://winnbadisa.com/wp-content/uploads/2011/12/marketing-career-cloud.jpg
        • STKI: doing something significantly different from what you’ve done until now
    12. 12. Big Data at work:
        • Orbitz Worldwide has collected 750 terabytes of unstructured data on its consumers’ behavior: detailed information from customer online visits and browsing sessions. Using Hadoop, models have been developed to improve search results and tailor the user experience based on everything from location and interest in family travel versus solo travel to the kind of device being used to explore travel options.
        • The result? To date, a 7% increase in interaction rate, 37% growth in stickiness of sessions and a net 2.6% in booking path engagement.
        Source: http://www.deloitte.com/assets/Dcom-UnitedStates/Local%20Assets/Documents/us_cons_techtrends2012_013112.pdf
    13. 13. Example network flow data (possible use – Cyber)
        • A huge amount of flow data
        • Long-term collection of flow data
          Flow data in our campus network (/16 prefix):
          # of Routers | 1 Day  | 1 Month | 1 Year
          1            | 1.2 GB | 13 GB   | 156 GB
          5            | 6 GB   | 65 GB   | 780 GB
          10           | 12 GB  | 130 GB  | 1.5 TB
          200          | 240 GB | 2.6 TB  | 30 TB
        • Short-term period of flow data
        • Massive flow data from anomaly traffic data of Internet worm and DDoS
        • Cluster file system and cloud computing platform
        • Google’s programming model, MapReduce, big table [8]
        • Open-source system, Hadoop [9]
        Source: www.caida.org/workshops/.../wide-casfi1004_wkang.ppt (STKI modifications)
    14. 14. DW appliances will be discussed later: Teradata, EMC Greenplum, Oracle Exadata, Microsoft Parallel Data Warehouse
        Source: http://www.asugnews.com/2011/09/06/inside-saps-product-naming-strategies/
    15. 15. Several parts of paradigm changes (Elements / Concepts)
        • Storing data for analytics (mainly):
            • HDFS – Hadoop File System
            • MapReduce – programming method, mainly for analytics
            • Other “add-ons”: Pig, Hive, JAQL (IBM)
        • Storing and retrieving data – DBMS:
            • NoSQL – DBMS (not only SQL):
                • Cassandra
                • MongoDB
                • CouchDB
                • Hbase
        • New ways of manipulating and analyzing all kinds of data (new algorithms). Example: how do you get a specific lead from a Facebook status “I wish I could see Messi next month in London”? Not discussed in this presentation (see Einat’s presentation).
    16. 16. Who Uses Hadoop?
        • Amazon/A9
        • AOL
        • Facebook
        • Fox Interactive Media
        • Netflix
        • New York Times
        • Quantcast
        • Rackspace/Mailtrust
        • Veoh
        • Yahoo!
        • PowerSet (now Microsoft)
        More at http://wiki.apache.org/hadoop/PoweredBy
    17. 17. Who Uses Cassandra?
        • Facebook
        • Digg
        • Despegar
        • Ooyala
        • Imagini
        • SimpleGeo
        • Rackspace
        • Shazam
        • SoftwareProjects
    18. 18. Big Data technologies (Hadoop etc.) vs. traditional IT
        Traditional IT                               | Big Data
        Centralized storage                          | Local storage
        Brand redundant servers                      | Cheap HW, white boxes
        Standard infrastructure and virtual servers  | Is standardization needed?! (at the HW level). No server virtualization.
        Well-established backup and DRP procedures   | Why do I need backup? How do I tackle DRP (compute clusters that are stretched over locations)?
        Traditional vendors                          | Open source solutions
        Mature products and procedures               | In a new patch for a specific issue it sometimes says “not implemented yet”
        Traditional programming, SQL                 | Different kind of programming (MapReduce), no joins
        Will Big Data infrastructure be part of the existing infrastructure, or will it be developed as a new domain?
    19. 19. The Basic Concept – the internet
        • Think Distributed
        • Think Parallel
        Source: http://retedeicittadini.it/wp-content/uploads/2011/02/network-distributed.gif
        Source: http://www.catonmat.net/blog/mit-introduction-to-algorithms-
    20. 20. New type of scale:
        • Hadoop:
            • Up to 4,000 machines in a cluster
            • Up to 20 PB in a cluster
        • Traditional IT technologies currently cannot handle this kind of scale.
        • This scale comes with a cost!
        Source: http://www.techsangam.com/wp-content/uploads/2012/01/i_love_scalability_mug.jpg
    21. 21. Brewer’s (CAP) Theorem (Professor Eric A. Brewer)
        • It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
            • Consistency (all nodes see the same data at the same time)
            • Availability (node failures do not prevent survivors from continuing to operate)
            • Partition Tolerance (the system continues to operate when split into partitions and despite arbitrary message loss)
        Source: Scalebase (STKI modifications)
    22. 22. Dealing With CAP
        • Drop Consistency
            • Welcome to the “Eventually Consistent” term.
            • At the end, everything will work out just fine. And hey, sometimes this is a good enough solution
            • When no updates occur for a long period of time, eventually all updates will propagate through the system and all the nodes will be consistent
            • For a given accepted update and a given node, eventually either the update reaches the node or the node is removed from service
            • Known as BASE (Basically Available, Soft state, Eventual consistency), as opposed to ACID
        Source: Scalebase
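The “eventually consistent” behaviour described on this slide can be illustrated with a toy model. This is only a sketch under stated assumptions: replicas resolve conflicts by last-write-wins on a caller-supplied timestamp, and a hypothetical anti_entropy function stands in for whatever propagation mechanism a real system uses.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One node holding key -> (timestamp, value); the newest timestamp wins."""
    name: str
    store: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        current = self.store.get(key)
        if current is None or ts > current[0]:
            self.store[key] = (ts, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def anti_entropy(replicas):
    """Push every replica's entries to all others so they converge on the newest value."""
    for source in replicas:
        for key, (ts, value) in list(source.store.items()):
            for target in replicas:
                target.write(key, value, ts)


a, b, c = Replica("a"), Replica("b"), Replica("c")
a.write("cart", ["book"], ts=1)          # accepted by replica a only
b.write("cart", ["book", "pen"], ts=2)   # a later update accepted by replica b only

stale = c.read("cart")                   # replica c has not seen any update yet
anti_entropy([a, b, c])                  # no new updates arrive; propagation runs
converged = [r.read("cart") for r in (a, b, c)]
```

Before propagation a reader may get an old value or nothing at all, depending on which node it asks; after propagation every node returns the same, newest value.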
    23. 23. Hadoop
        • Apache Hadoop is a software framework that supports data-intensive distributed applications
        • It enables applications to work with thousands of nodes and petabytes of data.
        • Hadoop was inspired by Google’s MapReduce and Google File System (GFS) papers
        • Contains (basically):
            • HDFS – Hadoop Distributed File System
            • MapReduce programming model
    24. 24. HDFS – Hadoop File System
        • Parallel
        • Distributed on commodity elements
        • Throughput over latency
        • Reliable and self-healing
        • For large scale: a typical file is gigabytes to terabytes (for one file!)
        • Applications need a write-once-read-many access model (mainly analytics)
    25. 25. HDFS motivation
        • What if you needed to write a program that distributes data on commodity HW (PCs or servers)? You would need to take care of:
            • Where the data is located
            • How to distribute data between the nodes
            • How many times you want to replicate the data
            • How to insert, select and update data
            • What to do if one or more nodes fail
            • How to add or remove a node
            • Managing and monitoring the environment
        • Hadoop File System did it for you!
    26. HDFS: Hadoop Distributed File System • Client requests metadata about a file from the namenode • Data is served directly from the datanode
        [Figure: the application's HDFS client sends (file name, block id) to the HDFS namenode, which holds the file namespace (e.g. /user/css534/input, block 3df2) and answers with (block id, block location). The client then sends (block id, byte range) to an HDFS datanode and receives block data from that node's Linux local file system. The namenode exchanges instructions and state with the datanodes.]
        source: http://www.google.co.il/url?sa=t&rct=j&q=Rob+Jordan++Chris+Livdahl+hadoop+filetype%3Apptx&source=web&cd=1&ved=0CCIQFjAA
    27. Datanode Blockreports • File “part-0” will be replicated twice and will be saved in blocks 1 and 3 (the file is big, so it has to be divided into 2 blocks) • Block 1 is on data nodes A and C source: http://www.google.co.il/url?sa=t&rct=j&q=Rob+Jordan++Chris+Livdahl+hadoop+filetype%3Apptx&source=web&cd=1&ved=0CCIQFjAA
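    The block and replica bookkeeping on the HDFS slides can be illustrated with a small sketch. It is not HDFS code: the function names `place_blocks` and `block_reports` are hypothetical, and the round-robin placement is an assumption made for simplicity (the slides do not describe the actual placement policy).

```python
def place_blocks(file_size, block_size, nodes, replication):
    """Split a file into fixed-size blocks and pick `replication` nodes per block.

    Round-robin placement is an illustrative assumption, not the HDFS policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    n_blocks = -(-file_size // block_size)  # ceiling division
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(n_blocks)
    }


def block_reports(placement):
    """Invert the placement: for each data node, list the blocks it stores."""
    reports = {}
    for block, holders in placement.items():
        for node in holders:
            reports.setdefault(node, []).append(block)
    return reports


# A 130 MB file with 64 MB blocks needs 3 blocks, each stored on 2 nodes.
placement = place_blocks(130, 64, ["A", "B", "C"], replication=2)
reports = block_reports(placement)
```

    The per-node lists in `reports` play the role of the block reports that data nodes send, so that the namenode knows where each block lives.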
    28. HDFS basic limitations • Namenode is a single point of failure • Write-once model • Plan to support appending writes • A namespace with an extremely large number of files exceeds the Namenode's capacity to maintain • Cannot be mounted by an existing OS • Getting data in and out is tedious • HDFS does not implement / support user quotas / access permissions • Data balancing schemes • No periodic checkpoints
    29. MapReduce programming model • In essence: brings the program to the data • Contains two elements: • Map: this part of the job is performed in parallel and asynchronously by each node • Reduce: gathers the results from the relevant nodes • In more detail: • Map: returns (writes to a temp file) a list containing zero or more (k, v) pairs • Output can have a different key from the input • Output can have the same key • Reduce: returns a new list of reduced output from the input
    30. MapReduce motivation • What if you needed to write a program that processes data that's on distributed computers? • You would need to write a distributed program that: • Finds where the data is located • Works on each node and then combines the results from all nodes • Decides where (on the local node) and how (in which format) to write the intermediate results • Detects when the jobs of all participating nodes have concluded and then starts the “aggregation” part • Handles a stuck job (restarts the job or turns to another node to perform the same job) • Hadoop MapReduce is the framework for you!
    31. MapReduce example:
        map(String key, String value):
          // key: document name
          // value: document contents
          for each word w in value:
            EmitIntermediate(w, "1");

        reduce(String key, Iterator values):
          // key: a word
          // values: a list of counts
          int result = 0;
          for each v in values:
            result += ParseInt(v);
          Emit(AsString(result));
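    The pseudocode above can be tried out on a single machine. The following is a minimal sketch, not Hadoop code: `run_job` is a hypothetical in-memory driver that stands in for the framework's shuffle step, and the sample text is the two-block input used in the word count dataflow slides.

```python
from collections import defaultdict


def map_fn(doc_name, contents):
    """Emit (word, 1) for every word in the document."""
    for word in contents.split():
        yield word, 1


def reduce_fn(word, counts):
    """Sum all counts emitted for one word."""
    return word, sum(counts)


def run_job(documents):
    """Hypothetical single-process driver: map, group by key, then reduce."""
    groups = defaultdict(list)
    for name, contents in documents.items():
        for key, value in map_fn(name, contents):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in sorted(groups.items()))


result = run_job({
    "block1": "Hello World Bye World",
    "block2": "Hello Hadoop Goodbye Hadoop",
})
```

    In a real cluster the grouping step is done by the framework across nodes; here it is a plain dictionary, which is enough to see how the map output feeds the reduce input.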
    32. 32. Dataflow in Hadoop Master Job: Word Count Submit job All elements – standard HW map schedule reduce map reduce Source: Haifa Labs IBM Pini Cohen’s work Copyright STKI@2012 Do not remove source or attribution from any slide or graph 32
    33. Dataflow in Hadoop
        [Figure: the input file is read from HDFS in two blocks. Block 1 (“Hello World Bye World”) goes to one map task, which emits Hello 1, World 2, Bye. Block 2 (“Hello Hadoop Goodbye Hadoop”) goes to another map task, which emits Hello 1, Hadoop 2, Goodbye. Map output is passed on to the reduce tasks.]
        Source: Haifa Labs IBM
    34. 34. Dataflow in Hadoop Finished Finished + Location map Local FS reduce Local map FS reduce Source: Haifa Labs IBM Pini Cohen’s work Copyright STKI@2012 Do not remove source or attribution from any slide or graph 34
    35. 35. Dataflow in Hadoop map Local FS reduce HTTP GET Local map FS reduce Source: Haifa Labs IBM Pini Cohen’s work Copyright STKI@2012 Do not remove source or attribution from any slide or graph 35
    36. Dataflow in Hadoop
        [Figure: the reduce tasks write the final answer to HDFS: Bye 1, Goodbye 1, Hadoop 2, Hello 2, World 2]
        Source: Haifa Labs IBM
    37. Example: Flow Analysis Map/Reduce • Read text flow files • Run map tasks • Read each line (validation check) • Parse flow data • Save result into temporary files (key, value) • Run reduce tasks • Read temporary files (key, List[value]) • Run sum process • Write results to a file
        [Figure: flow records carry Dst Port and Octet fields; for key 53 the values [64, 128] are summed to 192]
        Source: www.caida.org/workshops/.../wide-casfi1004_wkang.ppt
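    The map and reduce steps listed for flow analysis can be sketched as follows. The input format (one "dst_port octets" pair per text line) and the names `map_flow`, `reduce_flow` and `analyze` are assumptions made for illustration; the slide does not specify the actual record layout.

```python
from collections import defaultdict


def map_flow(line):
    """Validate and parse one flow line; emit (dst_port, octets) or nothing."""
    fields = line.split()
    if len(fields) != 2 or not all(f.isdigit() for f in fields):
        return []  # validation check failed: skip the line
    dst_port, octets = fields
    return [(int(dst_port), int(octets))]


def reduce_flow(dst_port, octet_list):
    """Sum the octets seen for one destination port."""
    return dst_port, sum(octet_list)


def analyze(lines):
    """Hypothetical local driver: map each line, group by port, reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for port, octets in map_flow(line):
            grouped[port].append(octets)
    return dict(reduce_flow(port, values) for port, values in grouped.items())


totals = analyze(["53 64", "53 128", "not a flow", "80 10"])
```

    With this input, port 53 receives the values 64 and 128, matching the example on the slide.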
    38. Components of Cluster Node • Flow file input processor • Flow analysis map/reduce • Flow-tools • Hadoop • HDFS • MapReduce • Java VM • OS: Linux
        [Figure: layered stack of a cluster node, top to bottom: flow file input processor and flow analysis map/reduce; flow-tools; Hadoop (cluster file system HDFS and MapReduce library); Java Virtual Machine; operating system (Linux); hardware (CPU, HDD, memory, NIC)]
        Source: www.caida.org/workshops/.../wide-casfi1004_wkang.ppt
    39. MapReduce helpers: Hive, Pig • Make life easier: translate a friendlier language to MapReduce
        Feature | Hive | Pig
        Language | SQL-like | PigLatin
        Schemas/Types | Yes (explicit) | Yes (implicit)
        Partitions | Yes | No
        Server | Optional (Thrift) | No
        User Defined Functions (UDF) | Yes (Java) | Yes (Java)
        Custom Serializer/Deserializer | Yes | Yes
        DFS Direct Access | Yes (implicit) | Yes (explicit)
        Streaming | Yes | Yes
        Web Interface | Yes | No
        JDBC/ODBC | Yes (limited) | No
    40. Hive: MapReduce helper • Code example:
        hive> INSERT OVERWRITE TABLE events SELECT a.* FROM profiles a;
        hive> INSERT OVERWRITE TABLE events SELECT a.* FROM profiles a WHERE a.key < 100;
        hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/reg_3' SELECT a.* FROM events a;
        hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_4' SELECT a.invites, a.pokes FROM profiles a;
        hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT COUNT(*) FROM invites a WHERE a.ds='2008-08-15';
        hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT a.foo, a.bar FROM invites a;
        hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/sum' SELECT SUM(a.pc) FROM pc1 a;
    41. 41. NoSQL DBMS: storing and retrieving data • Key/Value • A big hash table • Examples: Voldemort, Amazon’s Dynamo • Big Table • Big table, column families • Examples: Hbase, Cassandra • Document based • Collections of collections • Examples: CouchDB, MongoDB • Graph databases • Based on graph theory • Examples: Neo4J • Each solves a different problem Source: Scalebase Pini Cohen’s work Copyright STKI@2012 Do not remove source or attribution from any slide or graph 41
    42. Pros/Cons • Pros: • Performance • Big Data • Most solutions are open source • Data is replicated to nodes and is therefore fault-tolerant (partitioning) • Don't require a schema • Can scale up and down • Cons: • Code change • No framework support • Not ACID • Ecosystem (BI, backup) • There is always a database at the backend • Some APIs are just too simple Source: Scalebase
    43. 43. There are some NoSQL projects out there… Source: NoSQL Databases: Providing Extreme Scale and Flexibility By Matthew D. Sarrel Pini Cohen’s work Copyright STKI@2012 Do not remove source or attribution from any slide or graph 43
    44. 44. NoSQL Market Forecast 2011-2015 http://www.marketresearchmedia.com/2010/11/11/nosql-market/ Pini Cohen’s work Copyright STKI@2012 Do not remove source or attribution from any slide or graph 44
    45. Apache Cassandra • Cassandra is a highly scalable, eventually consistent, distributed, structured key-value store • Child of Google's BigTable and Amazon's Dynamo • Peer-to-peer architecture. All nodes are equal • Cassandra's replication factor (RF) is the total number of nodes onto which the data will be placed. RF of at least 2 is highly recommended, keeping in mind that your effective number of nodes is (N total nodes / RF). • CQL (Cassandra Query Language) command line • Time stamp for each value written Source: ids.snu.ac.kr/w/images/1/18/2011SS-03.ppt
    46. Consistent Hashing • Partition using consistent hashing (for the first node on which data is placed), based on an MD5 distributed hash table algorithm • Keys hash to a point on a fixed circular space • Ring is partitioned into a set of ordered slots; servers and keys are hashed over these slots • Nodes take positions on the circle • A, B, and D exist • B is responsible for the AB range (for replication factor=2 – default) • D is responsible for the BD range • A is responsible for the DA range • C joins • B, D split ranges • C gets BC from D
        [Figure: ring diagram with positions labeled A, B, C, D, V, S, R, H, M]
        Source: http://www.intertech.com/resource/usergroup/NoSQL.ppt
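    The ring behaviour described on this slide can be sketched in a few lines. This is an illustration, not Cassandra's partitioner: the class `HashRing` and its methods are hypothetical, each node gets exactly one position on the ring, and replicas are simply the next nodes clockwise.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring with one MD5-derived position per node."""

    def __init__(self, nodes=()):
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _position(name):
        return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        bisect.insort(self._ring, (self._position(node), node))

    def remove(self, node):
        self._ring.remove((self._position(node), node))

    def replicas(self, key, rf=1):
        """Return the first `rf` nodes found clockwise from the key's position."""
        if not self._ring:
            raise ValueError("ring is empty")
        rf = min(rf, len(self._ring))
        start = bisect.bisect_left(self._ring, (self._position(key), ""))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(rf)]


ring = HashRing(["A", "B", "D"])
keys = ["key%d" % i for i in range(200)]
before = {k: ring.replicas(k)[0] for k in keys}
ring.add("C")  # C joins: only keys in C's new range change owner
after = {k: ring.replicas(k)[0] for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

    Every key in `moved` now belongs to C and all other keys keep their owner, which is the property that makes adding a node cheap compared with rehashing everything.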
    47. 47. Write operation Source: http://assets.en.oreilly.com/1/event/51/Scaling%20Web%20Applications%20with%20Cassandra%20Presentation.ppt Pini Cohen’s work Copyright STKI@2012 47 Do not remove source or attribution from any slide or graph
    48. Cassandra’s tunable consistency (write)
        Level | Behavior
        ANY | Ensure that the write has been written to at least 1 node, including HintedHandoff recipients.
        ONE | Ensure that the write has been written to at least 1 replica's commit log and memory table before responding to the client.
        TWO | Ensure that the write has been written to at least 2 replicas before responding to the client.
        THREE | Ensure that the write has been written to at least 3 replicas before responding to the client.
        QUORUM | Ensure that the write has been written to N / 2 + 1 replicas before responding to the client.
        LOCAL_QUORUM | Ensure that the write has been written to <ReplicationFactor> / 2 + 1 nodes, within the local datacenter (requires NetworkTopologyStrategy).
        EACH_QUORUM | Ensure that the write has been written to <ReplicationFactor> / 2 + 1 nodes in each datacenter (requires NetworkTopologyStrategy).
        ALL | Ensure that the write is written to all N replicas before responding to the client. Any unresponsive replicas will fail the operation.
        Source: wiki
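    The quorum arithmetic in the table is easy to check by hand. The sketch below uses two hypothetical helper names, `quorum` and `read_sees_latest_write`; the second applies the general rule that a read is guaranteed to overlap the latest write when the number of replicas written plus the number of replicas read exceeds the replication factor.

```python
def quorum(replication_factor):
    """Majority of replicas: N / 2 + 1 with integer division."""
    return replication_factor // 2 + 1


def read_sees_latest_write(write_replicas, read_replicas, replication_factor):
    """True when the read set and the write set must share at least one replica."""
    return write_replicas + read_replicas > replication_factor


rf = 3
q = quorum(rf)  # 2 of 3 replicas
strong = read_sees_latest_write(q, q, rf)   # QUORUM write + QUORUM read
weak = read_sees_latest_write(1, 1, rf)     # ONE write + ONE read
```

    With a replication factor of 3, QUORUM writes combined with QUORUM reads always overlap, while ONE combined with ONE may return stale data until the replicas converge.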
    49. Cassandra’s tunable consistency – read
        Level | Behavior
        ANY | Not supported. You probably want ONE instead.
        ONE | Will return the record returned by the first replica to respond. A consistency check is always done in a background thread to fix any consistency issues when ConsistencyLevel.ONE is used. This means subsequent calls will have correct data even if the initial read gets an older value. (This is called ReadRepair.)
        TWO | Will query 2 replicas and return the record with the most recent timestamp. Again, the remaining replicas will be checked in the background.
        THREE | Will query 3 replicas and return the record with the most recent timestamp.
        QUORUM | Will query all replicas and return the record with the most recent timestamp once at least a majority of replicas (N / 2 + 1) have reported. Again, the remaining replicas will be checked in the background.
        LOCAL_QUORUM | Returns the record with the most recent timestamp once a majority of replicas within the local datacenter have replied.
        EACH_QUORUM | Returns the record with the most recent timestamp once a majority of replicas within each datacenter have replied.
        ALL | Will query all replicas and return the record with the most recent timestamp once all replicas have replied. Any unresponsive replicas will fail the operation.
        Source: wiki
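    The "most recent timestamp wins" rule and the repair of stale replicas can be sketched with plain dictionaries. This is a simplified model, not Cassandra's implementation: replicas are Python dicts mapping a key to a (timestamp, value) pair, the function `read` and its parameter `n_queried` are hypothetical, and the repair runs inline instead of in a background thread.

```python
def read(replicas, key, n_queried):
    """Query the first `n_queried` replicas and return the newest value.

    Afterwards, push the newest record seen to every replica that holds an
    older one (a synchronous stand-in for background read repair).
    """
    answers = [r[key] for r in replicas[:n_queried] if key in r]
    if not answers:
        return None
    newest = max(answers, key=lambda record: record[0])
    for replica in replicas:
        if replica.get(key, (-1, None))[0] < newest[0]:
            replica[key] = newest
    return newest[1]


def fresh_replicas():
    # Replica 0 is stale, replica 1 has the latest write, replica 2 missed both.
    return [{"k": (1, "old")}, {"k": (2, "new")}, {}]


stale_read = read(fresh_replicas(), "k", n_queried=1)

cluster = fresh_replicas()
quorum_read = read(cluster, "k", n_queried=2)
```

    Querying one replica can return the older value, while querying two of the three replicas returns the newer one and leaves all replicas holding it.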
    50. Cassandra’s data model structure • Think of Cassandra as row-oriented
        [Figure: nested structure: keyspace (settings, e.g. partitioner) contains column families (settings, e.g. comparator, type [Std]), which contain columns; each column has a name, a value and a clock]
        Source: http://assets.en.oreilly.com/1/event/51/Scaling%20Web%20Applications%20with%20Cassandra%20Presentation.ppt
    51. Data Model – “flexible” schema!
        ColumnFamily: Rockets
        Key | Name | Value
        1 | name | Rocket-Powered Roller Skates
        1 | toon | Ready, Set, Zoom
        1 | inventoryQty | 5
        1 | brakes | false
        2 | name | Little Giant Do-It-Yourself Rocket-Sled Kit
        2 | toon | Beep Prepared
        2 | inventoryQty | 4
        2 | brakes | false
        3 | name | Acme Jet Propelled Unicycle
        3 | toon | Hot Rod and Reel
        3 | inventoryQty | 1
        3 | wheels | 1
        Source: http://wenku.baidu.com/view/6e254321482fb4daa58d4b87.html
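    One way to picture this column family is as a map of row keys to maps of columns, where each row may carry a different set of column names. The sketch below models it with Python dictionaries; the helper `get_column` is hypothetical and only illustrates that a missing column is normal rather than an error.

```python
# Column family "Rockets": row key -> {column name -> value}.
rockets = {
    "1": {"name": "Rocket-Powered Roller Skates", "toon": "Ready, Set, Zoom",
          "inventoryQty": 5, "brakes": False},
    "2": {"name": "Little Giant Do-It-Yourself Rocket-Sled Kit", "toon": "Beep Prepared",
          "inventoryQty": 4, "brakes": False},
    "3": {"name": "Acme Jet Propelled Unicycle", "toon": "Hot Rod and Reel",
          "inventoryQty": 1, "wheels": 1},
}


def get_column(column_family, row_key, column, default=None):
    """Look up one column of one row; rows need not share the same columns."""
    return column_family.get(row_key, {}).get(column, default)


# Row 3 has a "wheels" column and no "brakes" column; rows 1 and 2 are the opposite.
all_columns = sorted({name for row in rockets.values() for name in row})
```

    No schema change is needed to give row 3 a column that the other rows lack, which is the "flexible" part of the slide title.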
    52. Cassandra’s CQL – Cassandra Query Language • SQL-like. Example:
        CREATE KEYSPACE test WITH strategy_class = 'SimpleStrategy' AND strategy_options:replication_factor = 1;
        CREATE INDEX ON users (birth_date);
        SELECT * FROM users WHERE state = 'UT' AND birth_date > 1970;
        • However: • No Joins • No UPDATES/DELETES
    53. NoSQL benchmark – for scale! Source: research.yahoo.com/files/ycsb-v4.pdf
    54. Can we live with NoSQL limitations? • Facebook has dropped Cassandra • “..we found Cassandra's eventual consistency model to be a difficult pattern to reconcile for our new Messages infrastructure” • Facebook has selected HBase (Columnar DBMS). http://www.facebook.com/notes/facebook-engineering/the-underlying-technology-of-messages/454991608919
    55. 55. What about other NoSQL DBMS? • MongoDB • Hbase • CouchDB • Maybe next session…. Pini Cohen’s work Copyright STKI@2012 Do not remove source or attribution from any slide or graph 55
    56. Big Data potential implications on IT • Will traditional RDBMS become obsolete? Surely not! • Several areas are Big Data zones by definition: Internet marketing, Cyber, DW, etc. • How well can we live with “Eventually Consistent”, which in most cases means a 1-2 minute delay?! • Can we define that all batch data can live well on Big Data technologies? • Will we see in the end (10 years from now) that only a small portion of data still resides on RDBMS and most of the data resides on Big Data technologies?!
    57. 57. Example of big data technology: SPLUNK • Splunk is a traditional IT vendor based on MapReduce (from 2009) Pini Cohen’s work Copyright STKI@2012 Do not remove source or attribution from any slide or graph 57
    58. 58. Another aspect of Big Data - IBM Watson wins in Jeopardy 58 Pini Cohen’s work Copyright STKI@2012 Do not remove source or attribution from any slide or graph
    59. DeepQA: the technology & architecture behind Watson
        [Figure: DeepQA pipeline, in order: Question; Question & Topic Analysis; Question Decomposition; Hypothesis Generation (Primary Search over Answer Sources, Candidate Answer Generation); Hypothesis and Evidence Scoring (Answer Scoring, Evidence Retrieval from Evidence Sources, Deep Evidence Scoring); Synthesis; Final Confidence Merging & Ranking; Answer & Confidence. Hypothesis Generation and Hypothesis and Evidence Scoring run in several parallel tracks. Learned models help combine and weigh the evidence.]
    60. Where did it acquire knowledge? Three types of knowledge: • Domain data (articles, books, documents) • Training and test question sets w/answer keys • NLP resources (vocabularies, taxonomies, ontologies)
        Source | Size
        Wikipedia | 17 GB
        Time, Inc. | 2.0 GB
        New York Times | 7.4 GB
        Encarta | 0.3 GB
        Oxford University | 0.11 GB
        Internet Movie Database | 0.1 GB
        IBM Dictionary | 0.01 GB
        ... J! Archive/YAGO/dbPedia… | XXX
        Total Raw Content | 70 GB
        Preprocessed Content | 500 GB
    61. 61. IBM’s Watson possible implications If the computer understands my speech, why do I need a keyboard? If the computer can talk, why do I need a screen? If the computer understands semantics and can act with its own reasoning – why do you need me?! 61 Pini Cohen’s work Copyright STKI@2012 Do not remove source or attribution from any slide or graph
    62. Major paradigm shifts - mini agenda • Why don’t we see a change when it is coming? • Big Data and programming models • The changing end user devices ecosystem • Infrastructure as Code and DevOps Source: http://www.b2binbound.com/blog/?Tag=paradigm%20shift
    63. 63. Mega-trend #1 of 21st century CONSUMERIZATION: empowerment of people collaborating via connected mobile devices Pini Cohen’s work Copyright STKI@2012 Do not remove source or attribution from any slide or graph
    64. 64. User Interface Revolution – Touch / Sound(Voice) / Move Era Pini Cohen’s work Copyright STKI@2012 Do not remove source or attribution from any slide or graph 64
    65. 65. 2012: Sound/Voice is in Pini Cohen’s work Copyright STKI@2012 Do not remove source or attribution from any slide or graph 65
    66. 66. 2012: Face recognition is in Pini Cohen’s work Copyright STKI@2012 Do not remove source or attribution from any slide or graph
    67. 67. Desktop and Mobile ecosystems begin to converge “BYOD : bring your own device" employees asserting control over the technology they use for work 4 Devices per employee?! Pini Cohen’s work Copyright STKI@2012 Do not remove source or attribution from any slide or graph
    68. Four screens of convergence: TV, PC, mobile and in-car • We want to be connected 7X24 • Each of these screens is useful during our day and each is connected to the cloud • IT should allow us to use the same business (IT supports ALL) and entertainment applications
    69. 69. Can IT support all devices ? • Employees will use as many computers and mobile devices as they wish. • Automatically keep their data in sync with a backup copy . • Solutions should be enterprise class : • secure • reliable • maintainable • integrated to critical back-office systems Pini Cohen’s work Copyright STKI@2012 Do not remove source or attribution from any slide or graph 69
    70. 70. What about Productivity Software for non-wintel machines? Office 2015 ARM W8 Pini Cohen’s work Copyright STKI@2012 Do not remove source or attribution from any slide or graph 70
    71. 71. Israel (expected end 2012): Wintel: Q42011 compared to Q42010 Desktop PCs: -25% Notebooks: -35% Pini Cohen’s work Copyright STKI@2012 Do not remove source or attribution from any slide or graph
    72. Client/server v2
        [Figure: evolution of client architectures, driven by ADVANCES/COST in 1. communications/networking 2. processor/storage 3. power/battery]
        • Terminals V1: always connected; I/O only at the local
        • Client/Server V1: 2 types of applications: 1. Off-line: processing and storage local 2. Always connected: data and processing @server; GUI++ @client
        • Terminals V2 (WEB/browser client): 2 types of applications: 1. Off-line: processing and storage local 2. Always connected: browser based applications
        • Client/Server V2: 1. Most apps work on/off line 2. Most of the time connected 3. Uses cloud/local applications
        Picture Source: http://sthvcarringtonmedia.blogspot.com/2011/02/emotions.html
    73. Windows on ARM (WOA)
        Feature | Windows 8 x86/64 | Windows 8 on ARM
        Device Branding | Such devices would be branded as x86/64 ones | These would also be branded as ARM
        Old Windows 7 Things | Everything that runs on Windows 7 would run on these platforms | Only selective things would be runnable
        Virtualization | Yes, if hardware supports it | Not supported
        Turn on/off options | Yes, on all devices | No, devices would keep running on Connected Standby power mode
        App Development | Yes, many tools are available | Yes, but with selective tools only, which are not yet available
        Availability | All the sources from where Windows 7 is available, e.g. online, DVD/CD and PCs etc. | Would be available only in ARM devices. No DVDs or online availability
        Driver availability | From respective company’s site, DVD/CDs and through Windows Update | Only through Windows Update
        Maintenance, e.g. Updates and Other Fixes | Through Windows Disks and Windows Update | Only through Windows Update
        Uniqueness | Any source would run on a wide variety of devices | Each source is unique to a unique device
        Source: http://lenzfire.com/2011/12/future-of-pc-is-soon-to-be-woa-windows-on-arm-than-to-wintel-85094/
    74. Microsoft is fighting back
        Win8 tablets/phones are: • Easier to manage/secure from an enterprise perspective • Easier to synchronize with enterprise data • Easier to enable enterprise applications (on Intel based devices) • Microsoft hopes to “Bring Your Enterprise to Home” (BYEH)
        However: • Microsoft starts from scratch in this market • The “influencers” are already heavy users, mainly of “stylish Apple” • There are strong forces within Microsoft to enable business applications on other platforms (Office on iPad/Android..)
        Will Microsoft's “hidden” dream of “IT enabling only Microsoft tablets and phones accessing mail/enterprise apps” come true?!
    75. 75. A new era. We had it before: Source: http://www.socialtechpop.com/2010/10/old-vs-new-trends-in-social-media/ Pini Cohen’s work Copyright STKI@2012 Do not remove source or attribution from any slide or graph 75
    76. And the new era will look like: • Computing as we know it today • Change at the device/UX level and change at the application level: mobility Source: http://www.mobilemag.com/2011/01/06/samsungs-hybrid-sliding-pc-7-series-tabletnotebook-thingy/
    77. 77. New Era: IT can no longer dictate a single device • Looks like the dominance of Microsoft on Intel with C/S or WEB app is over! • The new general purpose application architecture will support: • Data stored in a cloud and in local devices (appropriate formats per each device). • Data synchronization with conflict resolution between data instances • Continuous transaction processing between different devices = mobility • Different interfaces to the same application (mainly APPS but also browser based) • Application code is native or hybrid for each device • Offline work (read with update) • Automatic SW update • Voice • Face recognition • AI reasoning Pini Cohen’s work Copyright STKI@2012 Do not remove source or attribution from any slide or graph 77
    78. Major paradigm shifts - mini agenda • Why don’t we see a change when it is coming? • Big Data and programming models • The changing end user devices ecosystem • Infrastructure as Code and DevOps Source: http://www.b2binbound.com/blog/?Tag=paradigm%20shift
    79. Infrastructure as code • Treat your infrastructure as code: • Analyze/Design • Develop (the automation scripts) • Prepare the build • Test • Deploy the build • That means no more manual configurations • Automatic testing, not only at the apps level • Also: be sure that what is not in the build will not be installed • Is that possible in the current landscape?!
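    The idea that the build is the single source of truth, and that anything outside it must not be installed, can be sketched as a comparison between a desired state and the actual state of a server. This is a toy model with hypothetical names (`plan`, `apply_plan`) and made-up package versions; it does not call any real configuration management tool.

```python
def plan(desired, actual):
    """List the actions that make `actual` (package -> version) match `desired`."""
    actions = []
    for name, version in sorted(desired.items()):
        if actual.get(name) != version:
            actions.append(("install", name, version))
    for name in sorted(actual):
        if name not in desired:
            # Not in the build, so it must not stay on the server.
            actions.append(("remove", name, actual[name]))
    return actions


def apply_plan(actions, actual):
    """Return the new state after applying the actions (no real side effects)."""
    state = dict(actual)
    for operation, name, version in actions:
        if operation == "install":
            state[name] = version
        else:
            state.pop(name, None)
    return state


desired = {"nginx": "1.0", "ruby": "1.9"}   # hypothetical build definition
actual = {"nginx": "0.9", "telnet": "0.17"}  # hypothetical server state
actions = plan(desired, actual)
converged = apply_plan(actions, actual)
```

    Running `plan` again on the converged state yields no actions, so the same definition can be applied repeatedly and safely, which is what replaces manual configuration.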
    80. 80. Some SW definitions: • Software build - the process of converting source code files into standalone software artifact(s) that can be run on a computer. One of the most important steps of a software build is the compilation process where source code files are converted into executable code. • Build automation is the act of automating a wide variety of tasks that software developers do in their day-to-day activities including things like: • compiling computer source code into binary code • packaging binary code • running tests • deployment to production systems Source: Wiki STKI modifications Pini Cohen’s work Copyright STKI@2012 Do not remove source or attribution from any slide or graph 80
    81. Infrastructure as code • This will enable frequent changes in production • A 180° change from the current “versions” policy! Source: wiki
    82. 82. Opscode - Chef • With Chef, you write abstract definitions as source code to describe how you want each part of your infrastructure to be built, and then apply those descriptions to individual servers. • The result is a fully automated infrastructure: when a new server comes on line, the only thing you have to do is tell Chef what role it should play in your architecture. Source: opscode Pini Cohen’s work Copyright STKI@2012 Do not remove source or attribution from any slide or graph 82
    83. Opscode’s Chef • Chef agent assures that the desired configuration is installed! • All install files/scripts are located in a central repository (Chef Server) in CouchDB • Tracing what was successful and what was not • Documentation of everything • Major components: Cookbooks, Recipes, Knife, Shef • Pull model (cannot control when components are installed) • Ruby scripting language
    84. Devops – Development and Operations • Addresses the conflict between Development and Operations: • Development is paid for change • Operations: change is the enemy! • “Wall of Confusion”: a combination of conflicting motivations, processes, and tooling Source: http://dev2ops.org/blog/2010/2/22/what-is-devops.html
    85. 85. Devops – Development from Mars, Operations from Venus • Development and Operations are in different organization entities and use different tools Source: http://dev2ops.org/blog/2010/2/22/what-is-devops.html Pini Cohen’s work Copyright STKI@2012 Do not remove source or attribution from any slide or graph 85
    86. Deployment/Release time is trouble time • Development kicks things off by "tossing" a software release "over the wall" to Operations. • Operations also hand-edits configuration files to reflect the production environment, which is significantly different from the Development or QA environments. • At best they are duplicating work that was already done in previous environments; at worst they are about to introduce or uncover new bugs. Source: http://dev2ops.org/blog/2010/2/22/what-is-devops.html
    87. 87. Devops – new state of mind Source: http://dev2ops.org/blog/2010/2/22/what-is-devops.html Pini Cohen’s work Copyright STKI@2012 Do not remove source or attribution from any slide or graph 87
    88. 88. Devops aims at: Source: http://dev2ops.org/blog/2010/2/22/what-is-devops.html • DevOps enables the benefits of Agile development to be felt at the organizational level. DevOps does this by allowing for fast and responsive, yet stable, operations that can be kept in sync with the pace of innovation coming out of the development process. http://en.wikipedia.org/wiki/File:Devops.png Pini Cohen’s work Copyright STKI@2012 Do not remove source or attribution from any slide or graph 88
    89. 89. DevOps Addresses Challenges • DevOps is an operational approach that automates system configuration and management. • To manage cloud systems, customers • Need to manage servers as groups • Must respond to rapid infrastructure changes • Have repeatable automated deployments Pini Cohen’s work Copyright STKI@2012 Do not remove source or attribution from any slide or graph 89
    90. 90. Striving towards Devops state of mind: • Measurement and incentives to change culture - metrics based on joint performance • Unified processes • Unified tooling Pini Cohen’s work Copyright STKI@2012 Do not remove source or attribution from any slide or graph 90
    91. Devops Measurement • Resource Utilization - How resources are allocated and how efficiently they are used. Usually we're talking about people, but other kinds of resources can fall into this bucket as well. • How much time do developers and administrators spend on build and deployment activity? • How much productivity is lost to problems and bottlenecks? What is the ripple effect of that? • What’s the ratio of ad-hoc change or service recovery activity to planned change? • What’s the cost of moving a unit of change through your lifecycle? • What's the mean time to diagnose a service outage? Mean time to repair? • What was the true cost of each build or deployment problem (resource and schedule impact)? • What percentage of Development-driven changes require Operations to edit/change procedures or edit/change automation? • How much management time is spent dealing with build and deployment problems or change management overhead? • Can Development and QA successfully deploy their own environments? How long does it take per deployment? • How much of your team’s time is spent recreating and maintaining software infrastructure that already exists elsewhere? Source: http://dev2ops.org/blog/2010/1/21/how-to-measure-the-impact-of-it-operations-on-your-business.html
    92. Devops Measurement • Operations Throughput - The volume and rate at which change moves through your development to operations pipeline. • How long does it take to get a release from development, through testing, and into production? • How much of that is actual testing time, deployment time, handoff time, or waiting? • How many releases can you successfully deploy per period? • How many successful individual change requests can your operations team handle per period? • Are any build and deployment activities the rate-limiting step of your application lifecycle? How does that limit impact your business? • How many simultaneous changes can your team safely handle? • What is the business-perceived “wait time” from code completion to production deployment of a feature? Source: http://dev2ops.org/blog/2010/1/21/how-to-measure-the-impact-of-it-operations-on-your-business.html
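    Two of the throughput questions above (how long a release takes from code completion to production, and how many releases succeed per period) reduce to simple arithmetic once release records are available. The sketch below assumes a hypothetical record layout with `code_complete`, `deployed` and `success` fields; the dates are invented sample data, not measurements.

```python
from datetime import datetime


def lead_time_hours(code_complete, deployed):
    """Wait time from code completion to production deployment, in hours."""
    return (deployed - code_complete).total_seconds() / 3600.0


def summarize(releases, period_days):
    """Count successful releases and average their lead time over a period."""
    successful = [r for r in releases if r["success"]]
    lead_times = [lead_time_hours(r["code_complete"], r["deployed"]) for r in successful]
    return {
        "successful_releases": len(successful),
        "releases_per_day": len(successful) / period_days,
        "mean_lead_time_hours": sum(lead_times) / len(lead_times) if lead_times else None,
    }


# Hypothetical sample data for a 30-day period.
releases = [
    {"code_complete": datetime(2012, 3, 1, 9), "deployed": datetime(2012, 3, 2, 9), "success": True},
    {"code_complete": datetime(2012, 3, 10, 9), "deployed": datetime(2012, 3, 12, 9), "success": True},
    {"code_complete": datetime(2012, 3, 20, 9), "deployed": datetime(2012, 3, 25, 9), "success": False},
]
summary = summarize(releases, period_days=30)
```

    Tracking such numbers jointly for Development and Operations gives both sides one shared measure of the release pipeline.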
    93. Devops Measurement • Agility - This looks at how quickly and efficiently your IT operations can react to changes in the needs of your business. • How quickly can you scale up or scale down capacity to meet changing business demands? • What’s the change management overhead associated with increasing/decreasing capacity? What’s the risk? • How quickly could you adapt your build and deployment systems to automate any new applications or acquired business lines, and what would it cost? • What would it cost you to handle an x% growth in the number of applications or business lines (direct resource assignment plus any attention drain from other staff)? • Could your IT operations handle an x% growth in the number of applications or business lines? (i.e. could it even be done?) Source: http://dev2ops.org/blog/2010/1/21/how-to-measure-the-impact-of-it-operations-on-your-business.html
    94. 94. Architecture Concepts related to Devops • Devops is related to several technology architecture concepts and guidelines: • Build an application “as stateless as” and “as shared nothing as” possible • Try to have as little “technical debt” as possible (bugs that are in production, patches that are not installed, unsupported SW/HW, etc.) • Build an application with the ability to “turn off” some of its functionality while on air • Add new transaction versions rather than modifying or updating an existing transaction (enables rollback and working concurrently in several versions)
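The guideline about turning off functionality “while on air” is commonly realized with runtime feature toggles. The sketch below is one possible illustration in Python, not the presenter's design: the `FeatureToggles` class, the `recommendations` flag, and `render_product_page` are all hypothetical names.

```python
import threading


class FeatureToggles:
    """In-memory flag store that can be changed while the application runs."""

    def __init__(self, initial=None):
        self._flags = dict(initial or {})
        self._lock = threading.Lock()

    def set(self, name, enabled):
        with self._lock:
            self._flags[name] = bool(enabled)

    def is_enabled(self, name):
        # Unknown flags default to "off" so a missing entry degrades safely.
        with self._lock:
            return self._flags.get(name, False)


toggles = FeatureToggles({"recommendations": True})


def render_product_page(product):
    page = [f"Product: {product}"]
    # Optional functionality is wrapped in a toggle check, so operations can
    # switch it off under load without redeploying the application.
    if toggles.is_enabled("recommendations"):
        page.append("You may also like: ...")
    return page


print(render_product_page("laptop"))       # optional section included
toggles.set("recommendations", False)      # operator turns the feature off
print(render_product_page("laptop"))       # core page only
```

A real deployment would usually keep the flags in shared configuration so that every instance of a stateless application sees the same switch.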
    95. 95. Devops tools: Source: http://doc36.controltier.org/wiki/File:ProvisioningToolchain.png
    96. 96. Devops vs. Private Cloud? • In many aspects the objectives of Devops and Private Cloud overlap • Automation is at the core of both Private Cloud and Devops Source: http://www.pistoncloud.com/2012/01/devops-and-private-cloud-sitting-in-a-tree/
    97. 97. Some input from last year’s presentation • Public cloud Source: IDC https://www.eiseverywhere.com/file_uploads/7e2edb16ed28a2123cd21508f87be8b2_ITR_Boston_2011_Public_and_Private_Cloud_Track_RickVillars_IDC.pdf
    98. 98. Summary – Major paradigm shifts • Remember Digital Equipment Corporation (DEC). “Underdogs become mainstream faster than we think”. Change is crucial • Embrace big data experiments • Embrace Devops concepts – metrics, process and tools. Start with metrics • Devops tools might be our current configuration, CMDB, tools. • Embrace at least one SAAS application now (Email, Service desk, HR, ERP, CRM, etc.). Also IAAS, PAAS. • Standardization with processes. [Diagram labels: Technologies, Processes, Standardization]
    99. 99. STKI Round Tables • Lots of useful information – use it!
    100. 100. STKI Round Tables
    101. 101. We will present data on products and vendors: 1. Israeli vendors rating – state of the current market, focused on the enterprise market (not SMB) • X – Market penetration (sales + installed base + clients’ perspective) • Y – X plus localization, support, development center, number and kind of integrators, etc. • Worldwide leaders are marked, based on global positioning • Vendors to watch: vendors that are only just entering the Israeli market or making a big change, so they can’t be positioned but should be watched • Represents the current Israeli market and not necessarily what we recommend to our clients 2. Products and selected resellers / implementers • The location within the list is random
    102. 102. We will present data on products and vendors (cont.) 3. Selected installations of products – projects in different stages: production, implementation, after decision… 4. Service providers that are used by users. I asked users “which SI do you use in this category?” and counted the results. 5. Analysis by international and Israeli analysts • This complete information (1 to 5) should be used together, combined with the specific circumstances of each case, when making a decision. This subjective chart is the result of our objective research
    104. 104. Ratio Analysis: • 25% percentile • 50% percentile = median • 75% percentile. Metric (as collected): 57, 36, 117, 438, 60, 175, 150, 143, 120, 50, 250, 125, 280, 60, 200, 117, 100, 164, 125, 600, 192, 71, 120, 50, 188, 43, 109, 100. Sorted Metric: 36, 43, 50, 50, 57, 60, 60 | 25% percentile = 68.6 | 71, 100, 100, 109, 117, 117, 120 | 50% percentile = Median = 120.0 | 120, 125, 125, 143, 150, 164, 175 | 75% percentile = 178.1 | 188, 192, 200, 250, 280, 438, 600
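The quartile markers on this slide can be reproduced approximately from the listed values. The sketch below uses Python's `statistics.quantiles` with the `inclusive` method (linear interpolation between neighboring sorted values); which interpolation rule the slide actually used is not stated, so that choice is an assumption. On the rounded values shown, this method gives 68.25, 120.0 and 178.25, close to but not identical with the slide's 68.6 and 178.1. One possible explanation, which is only a guess, is that the slide was computed from unrounded ratios.

```python
import statistics

# The 28 metric values as they appear on the slide (rounded integers).
metric = [
    57, 36, 117, 438, 60, 175, 150, 143, 120, 50, 250, 125, 280, 60,
    200, 117, 100, 164, 125, 600, 192, 71, 120, 50, 188, 43, 109, 100,
]

sorted_metric = sorted(metric)

# n=4 returns the three cut points between quartiles: 25%, 50%, 75%.
q25, q50, q75 = statistics.quantiles(sorted_metric, n=4, method="inclusive")

print("25% percentile:", q25)
print("50% percentile (median):", q50)
print("75% percentile:", q75)
```

Reading a ratio this way places one organization's value relative to its peers: below the 25% percentile, between the quartiles, or above the 75% percentile.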
    105. 105. Agenda • Major paradigm shifts • Development and SOA • ESM BSM CMDB • DBMS and DATA • Platforms – Servers • Clients • Storage Source: http://astonguild.org.uk/files/NEW_MENU_FRONT_RGB%5B1%5D.jpg
