Trends in Infrastructure: Paradigm Shifts

STKI Summit 2012
Pini Cohen
VP and Senior Analyst

Tell me and I’ll forget
Show me and I may remember
Involve me and I’ll
What do we do?




             Pini Cohen’s work Copyright STKI@2012
             Do not remove source or attribution from any slide or graph   2
Agenda


Major paradigm shifts
Development and SOA
ESM BSM CMDB
DBMS and DATA
Platforms – Servers
Clients
Storage
                                                                         Source: http://astonguild.org.uk/files/NEW_MENU_FRONT_RGB%5B1%5D.jpg




Major paradigm shifts – mini agenda

     • Why don’t we see a change when it is coming?
     • Big Data and programming models
     • The changing end-user device ecosystem
     • Infrastructure as Code and DevOps




                                                                              Source: http://www.b2binbound.com/blog/?Tag=paradigm%20shift




Manager’s Dilemma

     • Bingo! My product is a mainstream product (quartiles 2 and 3).
     • Now, should I invest in quartile 1 or quartile 4?
     • Most managers will invest in quartile 4

     Figure axes: Percentage; Quality required by Customers
     Figure annotations: “New product category”; “Quality required is improving gradually”
     Source of pic: http://www.buat-nadlan.com/2011/11/blog-post_3065.html
Prof. Clayton Christensen: Disruptive Innovation Model

Remember Digital Equipment Corporation (DEC): “Underdogs become
mainstream faster than we think”. Moving toward areas that look
“non-mature” is crucial.

Figure labels: T1, T2




Last year my theme was “The Gap”




Big Data Definition – 4 V’s (or more…)

     • Volume – tens of TBs and more (15-20TB+)
     • Velocity – the speed at which data is added (10M items
       per hour and more) and the speed at which the data needs
       to be processed
     • Variety – different types of data, structured and
       unstructured. In many cases this means Internet of Things
       and social media data, but also voice, video, etc.
     • Variability – the ability to cope with new attributes and changing
       data types without interrupting the analytical process
       (without “import-export”)
     • Other optional V’s – validity, volatility, viscosity (resistance
       to flow), etc.
       Source: http://www.computerweekly.com/blogs/cwdn/2011/11/datas-main-drivers-volume-velocity-variety-and-variability.html




The origins of the 3 V’s:

      • 2001 research note by Doug Laney of META Group (now
        part of Gartner)




Main current usage of the “Big Data” theme:

     • “Big Data” is just marketing jargon. – Doug Laney, Gartner
       Source: http://www.computerweekly.com/blogs/cwdn/2011/11/datas-main-drivers-volume-velocity-variety-and-variability.html




                                                                                             Source: http://winnbadisa.com/wp-content/uploads/2011/12/marketing-career-cloud.jpg
     • STKI: Big Data means doing something significantly different from
       what you’ve done until now

Big Data at work:

     • Orbitz Worldwide has collected 750 terabytes of
       unstructured data on their consumers’ behavior – detailed
       information from customer online visits and browsing
       sessions. Using Hadoop, models have been developed
       intended to improve search results and tailor the user
       experience based on everything from location, interest in
       family travel versus solo travel, and even the kind of device
       being used to explore travel options.
     • The result? To date, a 7% increase in interaction rate, 37%
       growth in stickiness of sessions and a net 2.6% in booking
       path engagement.


             Source: http://www.deloitte.com/assets/Dcom-UnitedStates/Local%20Assets/Documents/us_cons_techtrends2012_013112.pdf




Example network flow data (possible use – Cyber)

     • A huge amount of flow data
        • Long-term collection of flow data

          Flow data in our campus network (/16 prefix)
          # of routers    1 day     1 month    1 year
          1               1.2 GB    13 GB      156 GB
          5               6 GB      65 GB      780 GB
          10              12 GB     130 GB     1.5 TB
          200             240 GB    2.6 TB     30 TB

        • Short-term period of flow data
           • Massive flow data from anomalous traffic such as Internet worms and DDoS attacks

     • Cluster file system and cloud computing platform
        • Google’s programming model MapReduce, and Bigtable [8]
        • Open-source system, Hadoop [9]
           Source: www.caida.org/workshops/.../wide-casfi1004_wkang.ppt STKI modifications




DW appliances will be discussed later




     • Teradata
     • EMC Greenplum
     • Oracle Exadata
     • Microsoft Parallel Data Warehouse

       Source: http://www.asugnews.com/2011/09/06/inside-saps-product-naming-strategies/
Several parts of the paradigm change: elements and concepts

     • Storing data for analytics (mainly):
        • HDFS – Hadoop Distributed File System
        • MapReduce – programming method, mainly for analytics
        • Other add-ons: Pig, Hive, JAQL (IBM)
     • Storing and retrieving data – DBMS:
        • NoSQL DBMS (“not only SQL”):
            •   Cassandra
            •   MongoDB
            •   CouchDB
            •   HBase
     • New ways and new algorithms for manipulating and analyzing all kinds
       of data. Example: how do you get a specific lead from a Facebook status
       such as “I wish I could see Messi next month in London”? Not
       discussed in this presentation (see Einat’s presentation).
Who Uses Hadoop?

     •   Amazon/A9
     •   AOL
     •   Facebook
     •   Fox Interactive Media
     •   Netflix
     •   New York Times
     •   Quantcast
     •   Rackspace/Mailtrust
     •   Veoh
     •   Yahoo!
     •   PowerSet (now Microsoft)



  More at http://wiki.apache.org/hadoop/PoweredBy




Who Uses Cassandra?

     •   Facebook
     •   Digg
     •   Despegar
     •   Ooyala
     •   Imagini
     •   SimpleGeo
     •   Rackspace
     •   Shazam
     •   SoftwareProjects




Big Data technologies (Hadoop etc.) vs. traditional IT


  Traditional IT                          Big Data
  Centralized storage                     Local storage
  Brand-name redundant servers            Cheap HW / white boxes
  Standard infrastructure and             Is standardization needed (at the HW
  virtual servers                         level)? No server virtualization.
  Well-established backup and DRP         Why do I need backup? How do I tackle
  procedures                              DRP (compute clusters that are stretched
                                          across locations)?
  Traditional vendors                     Open-source solutions
  Mature products and procedures          A new patch for a specific issue
                                          sometimes says “not implemented yet”
  Traditional programming, SQL            A different kind of programming
                                          (MapReduce), no joins

  Will Big Data infrastructure be part of the existing infrastructure, or will it
  be developed as a new domain?
The Basic Concept – the Internet

     • Think Distributed
     • Think Parallel




       Source: http://retedeicittadini.it/wp-content/uploads/2011/02/network-distributed.gif
       Source: http://www.catonmat.net/blog/mit-introduction-to-algorithms-




New type of scale:

     • Hadoop:
        • Up to 4,000 machines in a cluster
        • Up to 20 PB in a cluster
     • Currently, traditional IT technologies cannot handle this
       kind of scale.
     • This scale comes at a cost!




                                                                               Source: http://www.techsangam.com/wp-content/uploads/2012/01/i_love_scalability_mug.jpg




Brewer's (CAP) Theorem

     • It is impossible for a distributed computer system to
       simultaneously provide all three of the following
       guarantees:
        • Consistency (all nodes see the same data at the same time)
        • Availability (node failures do not prevent survivors from
          continuing to operate)
        • Partition Tolerance (the system continues to operate despite
          network partitions and arbitrary message loss)




                  Source: Scalebase STKI modifications

                                                                               Professor Eric A. Brewer
Dealing With CAP

     • Drop Consistency
        • Welcome to the term “eventually consistent”.
            • In the end, everything will work out just fine. And hey, sometimes
              this is a good enough solution
        • When no updates occur for a long period of time, eventually all
          updates will propagate through the system and all the nodes will
          be consistent
        • For a given accepted update and a given node, eventually either
          the update reaches the node or the node is removed from service
        • Known as BASE (Basically Available, Soft state, Eventual
          consistency), as opposed to ACID
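
The convergence described above can be illustrated with a toy model. This is a minimal sketch and not the mechanism of any particular product: the `Replica` class, the version counter with a last-write-wins rule, and the `anti_entropy` function are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A toy replica (hypothetical): maps key -> (version, value)."""
    name: str
    data: dict = field(default_factory=dict)

    def write(self, key, version, value):
        # Last-write-wins: keep the value with the highest version.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(replicas):
    """One propagation round: every replica pushes its entries to every other."""
    for source in replicas:
        for target in replicas:
            if source is not target:
                for key, (version, value) in list(source.data.items()):
                    target.write(key, version, value)


replicas = [Replica("A"), Replica("B"), Replica("C")]

# Each update is accepted by a single node; the others have not seen it yet.
replicas[0].write("cart", 1, ["book"])
replicas[1].write("cart", 2, ["book", "pen"])
stale_reads = [r.read("cart") for r in replicas]

# No new updates arrive; propagation makes all nodes consistent.
anti_entropy(replicas)
converged_reads = [r.read("cart") for r in replicas]

print(stale_reads)
print(converged_reads)
```

Before propagation, the three nodes answer the same read differently; afterwards they agree. Real systems use more careful conflict resolution than a single version counter.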




                                                           Source: Scalebase
Hadoop

    • Apache Hadoop is a software framework that supports
      data-intensive distributed applications
    • It enables applications to work with thousands of nodes
      and petabytes of data.
    • Hadoop was inspired by Google's MapReduce and Google
      File System (GFS) papers
    • Contains (basically):
         • HDFS – Hadoop Distributed File System
         • MapReduce programming model




HDFS – Hadoop Distributed File System

        • Parallel
        • Distributed on commodity elements
        • Throughput over latency
        • Reliable and self-healing
        • Built for large scale: a typical file is gigabytes to terabytes (for
          one file!)
        • Applications need a write-once-read-many access
          model (mainly analytics)




HDFS motivation

     • What if you needed to write a program that distributes
       data on commodity HW (PCs or servers)? You would need
       to take care of:
        •   Where the data is located
        •   How to distribute data between the nodes
        •   How many times to replicate the data
        •   How to insert, select and update data
        •   What to do if one or more nodes fail
        •   How to add or remove a node
        •   Managing and monitoring the environment
     • HDFS has done it for you!
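
To make the list above concrete, here is a minimal sketch of three of those chores: splitting a file into blocks, choosing where the replicas go, and finding what must be re-copied when a node fails. The function names, the round-robin placement and the tiny block size are assumptions for illustration; HDFS itself uses far larger blocks and a rack-aware placement policy.

```python
import itertools


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a file's contents into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, round-robin (toy policy)."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    ring = itertools.cycle(nodes)
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [next(ring) for _ in range(replication)]
    return placement


def blocks_needing_copy(placement, failed_node, replication):
    """After a node failure, list blocks that dropped below the target replica count."""
    under = []
    for block_id, holders in placement.items():
        alive = [n for n in holders if n != failed_node]
        if len(alive) < replication:
            under.append(block_id)
    return under


data = b"x" * 250  # a hypothetical 250-byte "file"
blocks = split_into_blocks(data, block_size=100)
placement = place_replicas(len(blocks), ["node1", "node2", "node3"], replication=2)
lost = blocks_needing_copy(placement, "node1", replication=2)

print(len(blocks), placement, lost)
```

Even this toy version needs bookkeeping for placement and failure handling, which is the work the file system takes off the application developer.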



HDFS: Hadoop Distributed File System

              • Client requests metadata about a file from the namenode
              • Data is served directly from the datanodes




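    Figure: HDFS architecture
      Application / HDFS client -> HDFS namenode: (file name, block id)
      HDFS namenode -> HDFS client: (block id, block location)
      HDFS namenode: holds the file namespace, e.g. /user/css534/input -> block 3df2
      HDFS namenode -> HDFS datanodes: instructions
      HDFS datanodes -> HDFS namenode: state
      HDFS client -> HDFS datanode: (block id, byte range)
      HDFS datanode -> HDFS client: block data
      Each HDFS datanode stores its blocks on the Linux local file system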

                                    source: http://www.google.co.il/url?sa=t&rct=j&q=Rob+Jordan++Chris+Livdahl+hadoop+filetype%3Apptx&source=web&cd=1&ved=0CCIQFjAA
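
The read path on this slide can be sketched in a few lines. This is a toy model under stated assumptions, not the Hadoop client API: the `NameNode`, `DataNode` and `read_file` names are hypothetical, and so is the second block id `9a01`; only the path `/user/css534/input` and block `3df2` come from the slide.

```python
class NameNode:
    """Holds only metadata: file name -> block ids, block id -> locations."""

    def __init__(self):
        self.file_blocks = {}      # file name -> ordered list of block ids
        self.block_locations = {}  # block id -> list of datanode names

    def get_block_locations(self, file_name):
        return [(block_id, self.block_locations[block_id])
                for block_id in self.file_blocks[file_name]]


class DataNode:
    """Holds the actual block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> bytes

    def read_block(self, block_id, start=0, end=None):
        # (block id, byte range) -> block data
        return self.blocks[block_id][start:end]


def read_file(file_name, namenode, datanodes):
    """Client side: metadata from the namenode, bytes from the datanodes."""
    content = b""
    for block_id, locations in namenode.get_block_locations(file_name):
        datanode = datanodes[locations[0]]  # simply pick the first replica
        content += datanode.read_block(block_id)
    return content


# Hypothetical cluster state for the example.
dn_a, dn_b = DataNode("A"), DataNode("B")
dn_a.blocks["3df2"] = b"hello "
dn_b.blocks["3df2"] = b"hello "
dn_b.blocks["9a01"] = b"hadoop"

nn = NameNode()
nn.file_blocks["/user/css534/input"] = ["3df2", "9a01"]
nn.block_locations = {"3df2": ["A", "B"], "9a01": ["B"]}

result = read_file("/user/css534/input", nn, {"A": dn_a, "B": dn_b})
print(result)
```

The point of the design is that file contents never pass through the namenode, so the metadata server does not become a bandwidth bottleneck.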




Datanode Blockreports


File “part-0” is replicated twice
and saved in blocks 1 and 3
(the file is big, so it has to
be divided into 2 blocks)




                                                 Block 1 is on data nodes A and C
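
A block report can be thought of as each datanode telling the namenode which blocks it holds. The sketch below is an illustration only: `build_block_map` and `under_replicated` are hypothetical helper names, and apart from "block 1 is on datanodes A and C" the placement data is made up.

```python
from collections import defaultdict


def build_block_map(block_reports):
    """Namenode side: turn per-datanode reports into block id -> datanodes."""
    block_map = defaultdict(list)
    for datanode, block_ids in sorted(block_reports.items()):
        for block_id in block_ids:
            block_map[block_id].append(datanode)
    return dict(block_map)


def under_replicated(block_map, expected_blocks, replication):
    """Blocks of a file that have fewer live replicas than requested."""
    return [b for b in expected_blocks
            if len(block_map.get(b, [])) < replication]


# Block 1 on A and C comes from the slide; the location of block 3 is assumed.
reports = {"A": [1, 3], "B": [3], "C": [1]}
block_map = build_block_map(reports)
healthy = under_replicated(block_map, expected_blocks=[1, 3], replication=2)

# If datanode C stops reporting, block 1 falls below two replicas.
reports_without_c = {"A": [1, 3], "B": [3]}
degraded = under_replicated(build_block_map(reports_without_c),
                            expected_blocks=[1, 3], replication=2)

print(block_map, healthy, degraded)
```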




                                                          source: http://www.google.co.il/url?sa=t&rct=j&q=Rob+Jordan++Chris+Livdahl+hadoop+filetype%3Apptx&source=web&cd=1&ved=0CCIQFjAA



HDFS basic limitations

     • Namenode is a single point of failure
     • Write-once model
     • Plans exist to support appending writes
     • A namespace with an extremely large number of files
       exceeds the namenode’s capacity to maintain it
     • Cannot be mounted by an existing OS
     • Getting data in and out is tedious
     • HDFS does not implement or support user quotas or access
       permissions
     • Data balancing schemes
     • No periodic checkpoints

MapReduce programming model

    • At its most basic: bring the program to the data
    • Contains two elements:
        • Map: this part of the job is performed in parallel and asynchronously
          by each node
        • Reduce: gathers the results from the relevant nodes
    • In more detail:
        • Map: returns (writes to a temp file) a list containing zero or more
          (k, v) pairs
            • The output key can differ from the input key
            • Several outputs can have the same key
        • Reduce: returns a new list of reduced output from its input
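
The two elements can be sketched as a single-process function. This is a minimal illustration of the programming model, not of Hadoop itself: `map_reduce`, `map_city`, `reduce_max` and the temperature readings are all hypothetical, and everything runs in memory instead of across nodes.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Run map over every record, group values by key, then reduce each group."""
    intermediate = defaultdict(list)
    for key, value in records:
        # A map call may emit zero or more (k, v) pairs.
        for out_key, out_value in map_fn(key, value):
            intermediate[out_key].append(out_value)
    return {key: reduce_fn(key, values)
            for key, values in sorted(intermediate.items())}


# Example: maximum temperature per city (made-up data).
def map_city(station_id, reading):
    city, temperature = reading
    yield city, temperature  # the output key differs from the input key


def reduce_max(city, temperatures):
    return max(temperatures)


readings = [("s1", ("Haifa", 24)), ("s2", ("Eilat", 35)), ("s3", ("Haifa", 27))]
hottest = map_reduce(readings, map_city, reduce_max)

print(hottest)
```

Two map outputs share the key "Haifa", so the reduce step receives both values for it.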




MapReduce motivation

    • What if you needed to write a program that processes data
      that sits on distributed computers?
    • You would need to write a distributed program that:
       • Finds where the data is located
       • Works on each node and then combines the results from all nodes
       • Decides where (on the local node) and how (in which format) to write the
         intermediate results
       • Detects when the jobs of all participating nodes have concluded and
         then starts the “aggregation” part
       • Decides what to do if a job is stuck (restart the job or turn to another node
         to perform the same job)
    • Hadoop MapReduce is the framework for you!

MapReduce example (word count, pseudocode):

    map(String key, String value):
      // key: document name
      // value: document contents
      for each word w in value:
        EmitIntermediate(w, "1");

    reduce(String key, Iterator values):
      // key: a word
      // values: a list of counts
      int result = 0;
      for each v in values:
        result += ParseInt(v);
      Emit(AsString(result));
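
A runnable single-process version of the same word count may help. This is a sketch under stated assumptions: the function names are hypothetical, the two input strings are the blocks used in the "Dataflow in Hadoop" slides, and each map task combines its counts locally, as those slides show, instead of emitting a "1" per word.

```python
from collections import Counter, defaultdict


def map_block(block_text):
    """Map task with local combining: count the words of one block."""
    return Counter(block_text.split())


def shuffle(map_outputs):
    """Group the per-block counts by word, as the reducers would receive them."""
    grouped = defaultdict(list)
    for output in map_outputs:
        for word, count in output.items():
            grouped[word].append(count)
    return grouped


def reduce_word(word, counts):
    """Reduce task: sum the partial counts of one word."""
    return sum(counts)


blocks = ["Hello World Bye World", "Hello Hadoop Goodbye Hadoop"]
map_outputs = [map_block(b) for b in blocks]
grouped = shuffle(map_outputs)
word_counts = {w: reduce_word(w, c) for w, c in sorted(grouped.items())}

print(word_counts)
```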

Dataflow in Hadoop



     Figure: a job (Word Count) is submitted to the master, which schedules
     the map tasks and the reduce tasks. All elements run on standard HW.




                                                                    Source: Haifa Labs IBM
Dataflow in Hadoop




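     Figure: the input file is read from HDFS in two blocks; each block goes to its own map task.
       Block 1: “Hello World Bye World”       -> map -> Hello 1, World 2, Bye 1
       Block 2: “Hello Hadoop Goodbye Hadoop” -> map -> Hello 1, Hadoop 2, Goodbye 1
     The map outputs are then passed on to the reduce tasks.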




                                                                             Source: Haifa Labs IBM
Dataflow in Hadoop




     Figure: each map task writes its output to its local FS and signals “Finished”;
     the reduce tasks receive “Finished + Location”.




                                                                  Source: Haifa Labs IBM
Dataflow in Hadoop




     Figure: the reduce tasks fetch the map outputs from the map nodes’ local FS
     using HTTP GET.




                                                                  Source: Haifa Labs IBM
Dataflow in Hadoop




     Figure: the reduce tasks write the final answer to HDFS:
       Bye 1
       Goodbye 1
       Hadoop 2
       Hello 2
       World 2

                                                                  Source: Haifa Labs IBM
Example: Flow Analysis Map/Reduce



     • Read text flow files
     • Run map tasks
         • Read each line (validation check)
         • Parse flow data
         • Save result into temporary files (key, value)
     • Run reduce tasks
         • Read temporary files (key, list[value])
         • Run sum process
     • Write results to a file

     Figure: flow records are mapped to (Dst Port, Flow Octet) pairs such as (53, 64) and
     (53, 128), grouped as 53 -> [64, 128], and reduced to 53 -> 192.
                 Source: www.caida.org/workshops/.../wide-casfi1004_wkang.ppt
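
The steps on this slide can be sketched as follows. This is an illustration under assumptions: the two-field "dst_port octets" line format is a simplification invented for the example, the function names are hypothetical, and the record for port 80 is made up. Only the port 53 values (64 and 128, summing to 192) come from the slide.

```python
from collections import defaultdict


def parse_flow_line(line):
    """Map step: validate one text flow record and emit (dst_port, octets)."""
    fields = line.split()
    if len(fields) != 2 or not all(f.isdigit() for f in fields):
        return None  # validation check failed; skip the line
    dst_port, octets = int(fields[0]), int(fields[1])
    return dst_port, octets


def sum_octets_by_port(lines):
    """Group the emitted values by key, then run the sum process per key."""
    grouped = defaultdict(list)  # key -> list of values
    for line in lines:
        pair = parse_flow_line(line)
        if pair is not None:
            grouped[pair[0]].append(pair[1])
    return {port: sum(values) for port, values in grouped.items()}


flow_lines = ["53 64", "53 128", "garbage line", "80 1500"]
totals = sum_octets_by_port(flow_lines)

print(totals)
```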




Components of Cluster Node


     • Flow file input processor
     • Flow analysis map/reduce
     • Flow-tools
     • Hadoop
         • HDFS
         • MapReduce
     • Java VM
     • OS: Linux

     Figure (software stack of a cluster node): flow file input processor and flow
     analysis map/reduce tasks; flow-tools, cluster file system (HDFS) and MapReduce
     library (Hadoop); Java Virtual Machine; operating system (Linux);
     hardware (CPU, HDD, memory, NIC).
               Source: www.caida.org/workshops/.../wide-casfi1004_wkang.ppt




MapReduce helpers: Hive, Pig

         • Make life easier: translate a friendlier language into MapReduce

  Feature                           Hive                 Pig
  Language                          SQL-like             Pig Latin
  Schemas/Types                     Yes (explicit)       Yes (implicit)
  Partitions                        Yes                  No
  Server                            Optional (Thrift)    No
  User Defined Functions (UDF)      Yes (Java)           Yes (Java)
  Custom Serializer/Deserializer    Yes                  Yes
  DFS Direct Access                 Yes (implicit)       Yes (explicit)
  Streaming                         Yes                  Yes
  Web Interface                     Yes                  No
  JDBC/ODBC                         Yes (limited)        No


Hive: MapReduce helper:

     • Code Example:
        • hive> INSERT OVERWRITE TABLE events SELECT a.* FROM profiles a;
        • hive> INSERT OVERWRITE TABLE events SELECT a.* FROM profiles a
         WHERE a.key < 100;
        • hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/reg_3' SELECT a.*
         FROM events a;
        • hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_4' select a.invites,
         a.pokes FROM profiles a;
        • hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT COUNT(*)
         FROM invites a WHERE a.ds='2008-08-15';
        • hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT a.foo, a.bar
         FROM invites a;
        • hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/sum' SELECT
         SUM(a.pc) FROM pc1 a;
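
To make the "helper" idea concrete, here is a rough sketch of the map and reduce steps that a query such as SELECT COUNT(*) FROM invites a WHERE a.ds='2008-08-15' stands in for. It is plain Python, not the plan Hive actually generates; the sample rows and the function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical rows standing in for the `invites` table.
invites = [
    {"foo": 1, "bar": "a", "ds": "2008-08-15"},
    {"foo": 2, "bar": "b", "ds": "2008-08-15"},
    {"foo": 3, "bar": "c", "ds": "2008-08-08"},
]


def map_phase(row):
    """Emit ("count", 1) for every row that passes the WHERE filter."""
    if row["ds"] == "2008-08-15":
        yield ("count", 1)


def reduce_phase(key, values):
    """Sum all values emitted for one key."""
    return key, sum(values)


def run_job(rows):
    """Run map, group the output by key (the shuffle), then reduce."""
    groups = defaultdict(list)
    for row in rows:
        for key, value in map_phase(row):
            groups[key].append(value)
    return dict(reduce_phase(k, v) for k, v in groups.items())


result = run_job(invites)
```

The one-line Hive statement replaces all three hand-written steps (map, shuffle, reduce).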




NoSQL DBMS: storing and retrieving data

     • Key/Value
         • A big hash table
         • Examples: Voldemort, Amazon’s Dynamo
     • Big Table
         • Big table, column families
         • Examples: HBase, Cassandra
     • Document based
         • Collections of collections
         • Examples: CouchDB, MongoDB
     • Graph databases
         • Based on graph theory
         • Examples: Neo4j
     • Each solves a different problem


                                                             Source: Scalebase
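
A rough way to picture the four families is by the shape of the data each one stores. The sketch below uses plain Python structures only; it does not use any of the products named above, and all keys, names and values are made up.

```python
# Key/value: one big hash table; the store does not look inside the value.
key_value = {"user:42": b"opaque serialized blob"}

# Big Table style: row key -> column family -> column -> value.
big_table = {
    "user:42": {
        "profile": {"name": "Dana", "city": "Haifa"},
        "stats": {"logins": 17},
    }
}

# Document based: nested, self-describing records grouped into collections.
documents = {"users": [{"_id": 42, "name": "Dana", "tags": ["admin", "ops"]}]}

# Graph: nodes plus typed edges between them.
graph = {"nodes": {42: "Dana", 43: "Avi"}, "edges": [(42, "FOLLOWS", 43)]}


def followers_of(g, node_id):
    """Return the ids of all nodes with a FOLLOWS edge into node_id."""
    return [src for src, rel, dst in g["edges"] if rel == "FOLLOWS" and dst == node_id]
```

Each shape makes a different question cheap to answer, which is why each family solves a different problem.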

Pros/Cons

     • Pros:
         • Performance
         • BigData
         • Most solutions are open source
         • Data is replicated to nodes and is therefore fault-tolerant
           (partitioning)
         • Don't require a schema
         • Can scale up and down
     • Cons:
         •   Code change
         •   No framework support
         •   Not ACID
         •   Ecosystem (BI, backup)
         •   There is always a database at the backend
         •   Some APIs are just too simple
                                                               Source: Scalebase

There are some NoSQL projects out there…




                                                                            Source: NoSQL Databases: Providing Extreme Scale and Flexibility By Matthew D. Sarrel
NoSQL Market Forecast 2011-2015




            http://www.marketresearchmedia.com/2010/11/11/nosql-market/

Apache Cassandra

     • Cassandra is a highly scalable, eventually
       consistent, distributed, structured key-value
       store
     • Child of Google’s BigTable and Amazon’s
       Dynamo
     • Peer to peer architecture. All nodes are equal
                                                                                    Source: ids.snu.ac.kr/w/images/1/18/2011SS-03.ppt




     • Cassandra’s replication factor (RF) is the total
       number of nodes onto which the data will be
       placed. RF of at least 2 is highly recommended,
       keeping in mind that your effective number of
       nodes is (N total nodes / RF).
     • CQL (Cassandra Query Language) command line
     • Time stamp for each value written
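
The replication factor arithmetic from the bullet above, as a small sketch. The function name and the example cluster size are assumptions; real capacity planning also has to account for compaction and other overheads that are ignored here.

```python
def effective_nodes(total_nodes: int, replication_factor: int) -> float:
    """Effective number of nodes = N total nodes / RF, as stated on the slide."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    if replication_factor > total_nodes:
        raise ValueError("replication factor cannot exceed the number of nodes")
    return total_nodes / replication_factor


# Hypothetical cluster: 6 nodes, every piece of data stored on 3 of them.
capacity_equivalent = effective_nodes(6, 3)
```

With 6 nodes and RF 3 the cluster holds roughly as much distinct data as 2 nodes would.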


Consistent Hashing

• Partition using consistent hashing (this determines the first node
  on which data is placed), based on an MD5 distributed hash table
  algorithm
• Keys hash to a point on a fixed circular space
• Ring is partitioned into a set of ordered slots; servers and keys
  are hashed over these slots
• Nodes take positions on the circle.
• A, B, and D exist.
•   B responsible for AB range (for replication factor=2 – default).
•   D responsible for BD range.
•   A responsible for DA range.
• C joins.
•   B, D split ranges.
•   C gets BC from D.

[Figure: ring with the labels A, B, C, D, H, M, R, S, V placed around the circle]
                                  Source: http://www.intertech.com/resource/usergroup/NoSQL.ppt
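
The ring described above can be sketched in a few lines. This is a minimal illustration, not Cassandra's partitioner: the class and method names are made up, each node gets a single position (no replication), and a node owns the range that ends at its own position, as in "B responsible for AB range".

```python
import bisect
import hashlib


def md5_position(name: str) -> int:
    """Map a node name or a key to a point on the ring using MD5."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical consistent-hashing ring with one position per node."""

    def __init__(self):
        self._positions = []  # sorted ring positions
        self._nodes = {}      # position -> node name

    def add_node(self, node: str) -> None:
        pos = md5_position(node)
        bisect.insort(self._positions, pos)
        self._nodes[pos] = node

    def owner(self, key: str) -> str:
        """Return the first node at or after the key's position, wrapping around."""
        if not self._positions:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect_left(self._positions, md5_position(key))
        return self._nodes[self._positions[idx % len(self._positions)]]


ring = Ring()
for node in ("A", "B", "D"):
    ring.add_node(node)

keys = [f"key-{i}" for i in range(200)]
before = {k: ring.owner(k) for k in keys}
ring.add_node("C")  # C joins
after = {k: ring.owner(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key in `moved` now belongs to C; keys owned by the other nodes stay where they were, which is the point of consistent hashing.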



Write operation




                                  Source: http://assets.en.oreilly.com/1/event/51/Scaling%20Web%20Applications%20with%20Cassandra%20Presentation.ppt




Cassandra’s tunable consistency (write)

Level          Behavior
ANY            Ensure that the write has been written to at least 1 node, including HintedHandoff
               recipients.
ONE            Ensure that the write has been written to at least 1 replica's commit log and
               memory table before responding to the client.
TWO            Ensure that the write has been written to at least 2 replicas before responding to
               the client.
THREE          Ensure that the write has been written to at least 3 replicas before responding to
               the client.
QUORUM         Ensure that the write has been written to N / 2 + 1 replicas before responding to the
               client.
LOCAL_QUORUM   Ensure that the write has been written to <ReplicationFactor> / 2 + 1 nodes, within
               the local datacenter (requires NetworkTopologyStrategy)
EACH_QUORUM    Ensure that the write has been written to <ReplicationFactor> / 2 + 1 nodes in each
               datacenter (requires NetworkTopologyStrategy)
ALL            Ensure that the write is written to all N replicas before responding to the client. Any
               unresponsive replicas will fail the operation.

                                                                                          Source: wiki
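
The table boils down to "how many acknowledgements must arrive before the client gets an answer". A minimal sketch of that arithmetic follows; the function names are assumptions, and the datacenter-aware levels (LOCAL_QUORUM, EACH_QUORUM) are left out because they need a per-datacenter replication factor.

```python
def quorum(replication_factor: int) -> int:
    """N / 2 + 1 with integer division, as in the QUORUM row."""
    return replication_factor // 2 + 1


def required_acks(level: str, replication_factor: int) -> int:
    """Acknowledgements needed before a write at this level is reported as done."""
    # ANY also counts a hinted-handoff recipient as an acknowledgement.
    fixed = {"ANY": 1, "ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        return quorum(replication_factor)
    if level == "ALL":
        return replication_factor
    raise ValueError(f"level not covered by this sketch: {level}")
```

For example, with a replication factor of 5 a QUORUM write waits for 3 replicas, while ALL waits for all 5.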
Cassandra’s tunable consistency – read

Level          Behavior
ANY            Not supported. You probably want ONE instead.
ONE            Will return the record returned by the first replica to respond. A consistency check is always
               done in a background thread to fix any consistency issues when ConsistencyLevel.ONE is used.
               This means subsequent calls will have correct data even if the initial read gets an older
               value. (This is called ReadRepair)
TWO            Will query 2 replicas and return the record with the most recent timestamp. Again, the
               remaining replicas will be checked in the background.
THREE          Will query 3 replicas and return the record with the most recent timestamp.
QUORUM         Will query all replicas and return the record with the most recent timestamp once it has at
               least a majority of replicas (N / 2 + 1) reported. Again, the remaining replicas will be
               checked in the background.
LOCAL_QUORUM   Returns the record with the most recent timestamp once a majority of replicas within the
               local datacenter have replied.
EACH_QUORUM    Returns the record with the most recent timestamp once a majority of replicas within each
               datacenter have replied.
ALL            Will query all replicas and return the record with the most recent timestamp once all replicas
               have replied. Any unresponsive replicas will fail the operation.

                                                                                          Source: wiki
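
Two ideas sit behind the read table: the reply with the most recent timestamp wins, and reads are guaranteed to see the latest write when the read and write replica counts overlap. The overlap rule (R + W > replication factor) is general quorum reasoning and is not stated on the slide; all names below are made up for the sketch.

```python
from typing import NamedTuple


class Versioned(NamedTuple):
    value: str
    timestamp: int  # the time stamp stored with each written value


def resolve_read(responses):
    """Return the reply with the most recent timestamp."""
    return max(responses, key=lambda r: r.timestamp)


def reads_see_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    return read_replicas + write_replicas > replication_factor


replies = [Versioned("old address", 100), Versioned("new address", 250)]
winner = resolve_read(replies)
```

With a replication factor of 3, QUORUM writes plus QUORUM reads (2 + 2) overlap, while ONE plus ONE (1 + 1) do not, which is where "eventually consistent" behavior comes from.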
Cassandra’s data model structure

                 Think of Cassandra as row-oriented

[Diagram: keyspace (settings, e.g. partitioner) contains column family (settings, e.g. comparator, type [Std]) contains column (name, value, clock)]




                                                                    Source: http://assets.en.oreilly.com/1/event/51/Scaling%20Web%20Applications%20with%20Cassandra%20Presentation.ppt

Data Model – “flexible” schema!

 ColumnFamily: Rockets

Key                      Value

 1                        Name                                                          Value

                          name                                                          Rocket-Powered Roller Skates
                          toon                                                          Ready, Set, Zoom
                          inventoryQty                                                  5
                          brakes                                                        false


 2                        Name                                                          Value

                          name                                                          Little Giant Do-It-Yourself Rocket-Sled Kit
                          toon                                                          Beep Prepared
                          inventoryQty                                                  4
                          brakes                                                        false


 3                        Name                                                          Value

                          name                                                          Acme Jet Propelled Unicycle
                          toon                                                          Hot Rod and Reel
                          inventoryQty                                                  1
                          wheels                                                        1
                                   Source: http://wenku.baidu.com/view/6e254321482fb4daa58d4b87.html
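
The Rockets column family above, modeled as nested Python dictionaries to show what "flexible" means: rows in the same column family do not need the same columns. This is only a picture of the data shape, not a Cassandra client; the helper names are assumptions.

```python
# Row key -> {column name -> value}; row 3 has "wheels" instead of "brakes".
rockets = {
    "1": {"name": "Rocket-Powered Roller Skates", "toon": "Ready, Set, Zoom",
          "inventoryQty": "5", "brakes": "false"},
    "2": {"name": "Little Giant Do-It-Yourself Rocket-Sled Kit", "toon": "Beep Prepared",
          "inventoryQty": "4", "brakes": "false"},
    "3": {"name": "Acme Jet Propelled Unicycle", "toon": "Hot Rod and Reel",
          "inventoryQty": "1", "wheels": "1"},
}


def column_names(column_family):
    """All column names that appear in at least one row."""
    names = set()
    for columns in column_family.values():
        names |= columns.keys()
    return sorted(names)


def get(column_family, key, column, default=None):
    """Read one column; a missing column is simply absent, not NULL."""
    return column_family.get(key, {}).get(column, default)
```

The application, not the database, has to cope with a column that exists in some rows and not in others.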




Cassandra’s CQL – Cassandra Query Language

     • SQL like. Example:
        • CREATE KEYSPACE test with strategy_class = 'SimpleStrategy' and
          strategy_options:replication_factor=1;
        • CREATE INDEX ON users (birth_date);
        • SELECT * FROM users WHERE state='UT' AND birth_date > 1970;
     • However:
        • No Joins
        • No UPDATES/DELETES




NoSQL benchmark – for scale!




            Source: research.yahoo.com/files/ycsb-v4.pdf




Can we live with NoSQL limitations?

     • Facebook has dropped Cassandra
     • “..we found Cassandra's eventual consistency model to be a
       difficult pattern to reconcile for our new Messages
       infrastructure”
     • Facebook has selected HBase (Columnar DBMS).
     Source: http://www.facebook.com/notes/facebook-engineering/the-underlying-technology-of-messages/454991608919




What about other NoSQL DBMS?

    • MongoDB
    • HBase
    • CouchDB
    • Maybe next session….




Big Data potential implications on IT

     • Will traditional RDBMS be obsolete? Surely not!
     • Several areas are Big Data zones by definition: Internet
       marketing, cyber, DW, etc.
     • How well can we live with “Eventually Consistent”, which in
       most cases means a 1-2 minute delay?!
     • Can we define that all batch data can live well on Big Data
       technologies?
     • Will we see in the end (10 years from now) that only a small
       portion of data still resides on RDBMS and most of the data
       resides on Big Data technologies?!




Example of big data technology: SPLUNK

     • Splunk is a traditional IT vendor based on MapReduce
       (from 2009)




Another aspect of Big Data - IBM Watson wins in Jeopardy




DeepQA: the technology & architecture behind Watson

[Diagram: DeepQA pipeline, starting from the Initial Question]
     1. Question & Topic Analysis
     2. Question Decomposition
     3. Hypothesis Generation: Primary Search and Candidate Answer Generation (using Answer Sources)
     4. Hypothesis & Evidence Scoring: Answer Scoring, Evidence Retrieval and Deep Evidence Scoring
        (using Evidence Sources)
     5. Synthesis
     6. Final Confidence Merging & Ranking (Learned Models help combine and weigh the Evidence)
     7. Answer & Confidence
     Hypothesis Generation and Hypothesis and Evidence Scoring appear as several parallel branches.




Where did it acquire knowledge?

   Three types of knowledge:
     • Domain Data (articles, books, documents)
     • Training and test question sets w/answer keys
     • NLP Resources (vocabularies, taxonomies, ontologies)

     Source                                 Size
     • Wikipedia                            17 GB
     • Time, Inc.                           2.0 GB
     • New York Times                       7.4 GB
     • Encarta                              0.3 GB
     • Oxford University                    0.11 GB
     • Internet Movie Database              0.1 GB
     • IBM Dictionary                       0.01 GB
     • ... J! Archive/YAGO/dbPedia…         XXX
     • Total Raw Content                    70 GB
     • Preprocessed Content                 500 GB
IBM’s Watson possible implications

     If the computer understands my speech, why do I need a
     keyboard?
     If the computer can talk, why do I need a screen?
     If the computer understands semantics and can act with its
     own reasoning – why do you need me?!




Major paradigm shifts -mini agenda

     • Why don’t we see a change when it is coming?
     • Big Data and programming models
     • The changing end user devices ecosystem
     • Infrastructure as Code and DEVOPS




                                                                              Source: http://www.b2binbound.com/blog/?Tag=paradigm%20shift




Mega-trend #1 of 21st century


 CONSUMERIZATION:
         empowerment of people collaborating via
             connected mobile devices




User Interface Revolution – Touch / Sound(Voice) / Move Era




2012: Sound/Voice is in




2012: Face recognition is in




Desktop and Mobile ecosystems begin to converge




                  “BYOD: bring your own device”
       employees asserting control over the technology they use for work
                           4 devices per employee?!
Four screens of convergence: TV, PC, mobile and in-car

• We want to be connected 24x7
• Each of these screens is useful during our
  day and each is connected to the 'cloud'




• IT should allow us to use the same
  business (IT supports ALL) and
  entertainment applications

Can IT support all devices?

      • Employees will use as many
        computers and mobile devices as
        they wish.

      • Automatically keep their data in
        sync with a backup copy.

      • Solutions should be enterprise class:
          • secure
          • reliable
          • maintainable
          • integrated to critical back-office
            systems

What about Productivity Software for non-Wintel machines?




                                                                               Office 2015
                                                                                ARM W8




Israel (expected end 2012):




      Wintel: Q4 2011 compared to Q4 2010
      Desktop PCs: -25%   Notebooks: -35%
Client/server v2



Terminals V1
  • Always connected
  • I/O only at the local

Client/Server V1
  2 types of applications:
  1. Off-line: processing and storage local
  2. Always connected: data and processing @server; GUI++ @client

Terminals V2
  • WEB/Browser client
  2 types of applications:
  1. Off-line: processing and storage local
  2. Always connected: browser based applications

Client/Server V2
  1. Most apps work on/off line
  2. Most of the time connected
  3. Uses cloud/local applications

ADVANCES/COST
  1. Communications/networking
  2. Processor/storage
  3. Power/battery

Picture Source: http://sthvcarringtonmedia.blogspot.com/2011/02/emotions.html
Windows on ARM

WOA – Windows on ARM
Source: http://lenzfire.com/2011/12/future-of-pc-is-soon-to-be-woa-windows-on-arm-than-to-wintel-85094/

Feature                Windows 8 x86/64                        Windows 8 on ARM

Device Branding        Such devices would be                   These would also be
                       branded as x86/64 ones                  branded as ARM
Old Windows 7 Things   Everything that runs on                 Only selected things
                       Windows 7 would run on                  would be runnable
                       these platforms
Virtualization         Yes, if hardware supports it            Not supported
Turn on/off options    Yes, on all devices                     No, devices would keep
                                                               running in Connected
                                                               Standby power mode
App Development        Yes, many tools are                     Yes, but only with selected
                       available                               tools, which are not
                                                               yet available
Availability           All the sources from which              Would be available only
                       Windows 7 is available, e.g.            on ARM devices. No
                       online, DVD/CD and PCs                  DVD or online
                                                               availability
Driver availability    From the respective company’s           Only through Windows
                       site, DVD/CDs and through               Update
                       Windows Update
Maintenance, e.g.      Through Windows disks and               Only through Windows
updates and other      Windows Update                          Update
fixes
Uniqueness             Any source would run on a               Each source is unique to
                       wide variety of devices                 a specific device



Microsoft is fighting back

    Win8 tablets/phones are:
    • Easier to manage/secure from an enterprise perspective
    • Easier to synchronize with enterprise data
    • Easier to enable enterprise applications (on Intel based devices)
    • Microsoft hopes to “Bring Your Enterprise to Home” (BYEH)

    However:
    • Microsoft starts from scratch in these markets
    • The “influencers” are already heavy users, mainly of “stylish Apple”
    • There are strong forces within Microsoft to enable business
      applications on other platforms (Office on iPad, Android..)

     Will Microsoft’s “hidden” dream of “IT enabling only Microsoft tablets and
             phones to access mail/enterprise apps” come true?!
A new era. We had it before:

                                                                             Source: http://www.socialtechpop.com/2010/10/old-vs-new-trends-in-social-media/




And the new era will look like :




   Source: http://www.mobilemag.com/2011/01/06/samsungs-hybrid-sliding-pc-7-series-tabletnotebook-thingy/




   Computing as we know it today

   Change at the device/UX level and change at the application level - mobility


New Era: IT can no longer dictate a single device

     • Looks like the dominance of Microsoft on Intel with C/S or WEB
       app is over!
     • The new general purpose application architecture will support:
         • Data stored in a cloud and in local devices (appropriate formats per
           each device).
         • Data synchronization with conflict resolution between data instances
         • Continuous transaction processing between different devices =
           mobility
         • Different interfaces to the same application (mainly APPS but also
           browser based)
         • Application code is native or hybrid for each device
         • Offline work (read with update)
         • Automatic SW update
         • Voice
         • Face recognition
         • AI reasoning
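
One item in the list, data synchronization with conflict resolution, can be sketched in a few lines. The policy used here is "last write wins", which is only one possible choice and is not prescribed by the slide; the data layout (key mapped to a value and a modification time) and the function name are assumptions.

```python
def merge_last_write_wins(local, remote):
    """Merge two copies of the same data; on conflict keep the newer entry.

    Each store maps key -> (value, modified_at).
    """
    merged = dict(remote)
    for key, (value, modified_at) in local.items():
        if key not in merged or modified_at > merged[key][1]:
            merged[key] = (value, modified_at)
    return merged


# Hypothetical example: a phone that was edited offline, then synced with the cloud copy.
phone = {"note": ("edited offline", 20), "draft": ("new on phone", 5)}
cloud = {"note": ("original", 10), "todo": ("buy milk", 7)}
synced = merge_last_write_wins(phone, cloud)
```

Last write wins silently drops the older edit, so applications that cannot afford that need a richer policy (for example asking the user).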

Major paradigm shifts -mini agenda

     • Why don’t we see a change when it is coming?
     • Big Data and programming models
     • The changing end users devices ecosystem
     • Infrastructure as Code and DEVOPS




                                                                              Source: http://www.b2binbound.com/blog/?Tag=paradigm%20shift




Infrastructure as code

     • Treat your infrastructure as code:
         •   Analyze/Design
         •   Develop (the automation scripts)
         •   Prepare the Build
         •   Test
         •   Deploy the Build
     • That means – no more manual configurations
     • Automatic testing – not only for the apps level
     • Also – be sure that what is not in the build – will not be
       installed
     • Is that possible in the current landscape?!
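The rule "what is not in the build will not be installed" can be sketched as a comparison between a declared build and the actual state of a host. This is only an illustration under stated assumptions: the function name `plan_changes`, the dict-based data model, and the package names are hypothetical and not taken from any specific tool.

```python
def plan_changes(declared: dict[str, str], installed: dict[str, str]) -> dict[str, list[str]]:
    """Compare the declared build with what is actually on a host.

    Both arguments map package name -> version (a hypothetical data model).
    """
    # Declared but not yet present on the host.
    to_install = sorted(p for p in declared if p not in installed)
    # Present, but at a different version than the build declares.
    to_change = sorted(
        p for p in declared if p in installed and installed[p] != declared[p]
    )
    # Present on the host but absent from the build: it must go.
    to_remove = sorted(p for p in installed if p not in declared)
    return {"install": to_install, "change": to_change, "remove": to_remove}


# Hypothetical example data.
declared = {"nginx": "1.2.0", "ruby": "1.9.3"}
installed = {"nginx": "1.0.15", "telnetd": "0.17"}

plan = plan_changes(declared, installed)
print(plan)
```

Running the same comparison again after the plan has been applied should yield three empty lists, which is what makes the automation safe to repeat.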



Some SW definitions:

     • Software build - the process of converting source code files
       into standalone software artifact(s) that can be run on a
       computer. One of the most important steps of a software
       build is the compilation process where source code files
       are converted into executable code.
     • Build automation is the act of automating a wide variety of
       tasks that software developers do in their day-to-day
       activities including things like:
        •   compiling computer source code into binary code
        •   packaging binary code
        •   running tests
        •   deployment to production systems

                      Source: Wiki STKI modifications
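A minimal sketch of build automation as described above: the steps run in a fixed order and the build stops at the first failure, so a failing test prevents deployment. The step functions here are placeholders (assumptions); in practice each would wrap a real compiler, packager, test runner, or deployment tool.

```python
from typing import Callable


def run_build(steps: list[tuple[str, Callable[[], bool]]]) -> list[str]:
    """Run build steps in order and stop at the first failing step.

    Returns the names of the steps that completed successfully.
    """
    completed = []
    for name, step in steps:
        if not step():
            print(f"build failed at step: {name}")
            break
        completed.append(name)
    return completed


# Placeholder steps; each lambda stands in for a real tool invocation.
steps = [
    ("compile", lambda: True),
    ("package", lambda: True),
    ("test", lambda: False),   # simulate a failing test run
    ("deploy", lambda: True),  # not reached, because the tests failed
]

completed = run_build(steps)
print(completed)
```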


Infrastructure as code

     • This will enable frequent changes in production
     • 180° change from current “versions” policy!




                         Source: wiki
Opscode - Chef

      • With Chef, you write abstract definitions as source code to describe
        how you want each part of your infrastructure to be built, and then
        apply those descriptions to individual servers.
      • The result is a fully automated infrastructure: when a new server
        comes on line, the only thing you have to do is tell Chef what role it
        should play in your architecture.




  Source: opscode


Opscode’s Chef

     • Chef agent assures that the desired configuration is
       installed!
     • All install files / scripts are located in a central repository
       (Chef Server) in CouchDB
     • Tracing of what succeeded and what did not
     • Documentation of everything
     • Major components: Cookbooks, Recipes, Knife, Shef
     • Pull model (cannot control when components are
       installed)
     • Ruby scripting language
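The pull model and the idea of telling a server only its role can be sketched as follows. This is plain Python and not Chef's actual API: the `ROLES` table, the `Node` class, and the recipe names are hypothetical stand-ins for a central repository and an agent.

```python
# Stand-in for the central repository: role -> ordered list of recipes.
ROLES = {
    "web": ["base", "nginx"],
    "db": ["base", "postgresql"],
}


class Node:
    """A server that pulls its desired configuration based on its role."""

    def __init__(self, name: str, role: str):
        self.name = name
        self.role = role
        self.applied: list[str] = []  # recipes already converged on this node

    def converge(self, roles: dict[str, list[str]]) -> list[str]:
        """Pull the recipe list for this node's role and apply what is missing."""
        newly_applied = []
        for recipe in roles[self.role]:
            if recipe not in self.applied:  # idempotent: skip what is already done
                self.applied.append(recipe)
                newly_applied.append(recipe)
        return newly_applied


node = Node("server-01", role="web")
first = node.converge(ROLES)   # first run applies everything for the role
second = node.converge(ROLES)  # second run finds nothing left to do
print(first, second)
```

Because the node initiates each run itself, the central repository does not decide when a change lands, which is the trade-off the "pull model" bullet points at.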



Devops – Development and Operations

     • Addresses the conflict between Development and
       Operations:
        • Development – are paid for change
        • Operations – change is the enemy!
     • “Wall of Confusion” - combination of conflicting
       motivations, processes, and tooling




                                                                               Source: http://dev2ops.org/blog/2010/2/22/what-is-devops.html
Devops – Development from Mars, Operations from Venus

     • Development and Operations are in different organizational
       entities and use different tools




                                                                                   Source: http://dev2ops.org/blog/2010/2/22/what-is-devops.html
Deployment/Release time is trouble time

     • Development kicks things off by "tossing" a software release
       "over the wall" to Operations.
     • Operations also hand-edit configuration files to reflect the
       production environment, which is significantly different from
       the Development or QA environments.
     • At best they are duplicating work that was already done in
       previous environments, at worst they are about to introduce
       or uncover new bugs.




                                                                                   Source: http://dev2ops.org/blog/2010/2/22/what-is-devops.html
Devops – new state of mind




                                                                                 Source: http://dev2ops.org/blog/2010/2/22/what-is-devops.html
Devops aims at:




                                                                                                Source: http://dev2ops.org/blog/2010/2/22/what-is-devops.html
     • DevOps enables the benefits of Agile development to be
       felt at the organizational level. DevOps does this by
       allowing for fast and responsive, yet stable, operations that
       can be kept in sync with the pace of innovation coming out
       of the development process.




                                                                                     Source: http://en.wikipedia.org/wiki/File:Devops.png
DevOps Addresses Challenges

     • DevOps is an operational approach that automates system
       configuration and management.

     • To manage cloud systems, customers
        • Need to manage servers as groups
        • Must respond to rapid infrastructure changes
        • Have repeatable automated deployments




Striving towards Devops state of mind:

     • Measurement and incentives to change culture - metrics
       based on joint performance
     • Unified processes
     • Unified tooling




Devops Measurement

    • Resource Utilization - How resources are allocated and how efficiently
      they are used. Usually we're talking about people, but other kinds of
      resources can fall into this bucket as well.
        • How much time do developers and administrators spend on build and deployment
          activity?
        • How much productivity is lost to problems and bottlenecks? What is the ripple
          effect of that?
        • What’s the ratio of ad-hoc change or service recovery activity to
          planned change?
        • What’s the cost of moving a unit of change through your lifecycle?
        • What's the mean time to diagnose a service outage? Mean time to repair?
        • What was the true cost of each build or deployment problem (resource and
          schedule impact)?
        • What percentage of Development driven changes require Operations to
          edit/change procedures or edit/change automation?
        • How much management time is spent dealing with build and deployment problems
          or change management overhead?
        • Can Development and QA successfully deploy their own
          environments? How long does it take per deployment?
        • How much of your team’s time is spent recreating and maintaining software
          infrastructure that already exists elsewhere?

        Source: http://dev2ops.org/blog/2010/1/21/how-to-measure-the-impact-of-it-operations-on-your-business.html


Devops Measurement

    • Operations Throughput - The volume and rate at which change
      moves through your development to operations pipeline.
        • How long does it take to get a release from development,
          through testing, and into production?
        • How much of that is actual testing time, deployment time, handoff
          time, or waiting?
        • How many releases can you successfully deploy per period?
        • How many successful individual change requests can your operations
          team handle per period?
        • Are any build and deployment activities the rate limiting step of your
          application lifecycle? How does that limit impact your business?
        • How many simultaneous changes can your team safely handle?
        • What is the business’s perceived “wait time” from code
          completion to production deployment of a feature?

        Source: http://dev2ops.org/blog/2010/1/21/how-to-measure-the-impact-of-it-operations-on-your-business.html
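Two of the throughput questions above (time from code completion to production, and successful releases per period) can be computed from a simple release log. The sketch below assumes a hypothetical log of `(code complete, deployed, succeeded)` tuples; the dates and function names are invented for illustration only.

```python
from datetime import datetime
from statistics import mean

# Hypothetical release log: (code complete, deployed to production, succeeded?)
releases = [
    (datetime(2012, 3, 1), datetime(2012, 3, 15), True),
    (datetime(2012, 3, 10), datetime(2012, 3, 20), True),
    (datetime(2012, 3, 12), datetime(2012, 4, 11), False),
]


def lead_times_days(records):
    """Days from code completion to production deployment, per release."""
    return [(deployed - completed).days for completed, deployed, _ in records]


def successful_per_month(records):
    """Count successful releases by (year, month) of deployment."""
    counts = {}
    for _, deployed, ok in records:
        if ok:
            key = (deployed.year, deployed.month)
            counts[key] = counts.get(key, 0) + 1
    return counts


lead = lead_times_days(releases)
average_lead = mean(lead)
per_month = successful_per_month(releases)
print(lead, average_lead, per_month)
```

Tracking these numbers over time, rather than as a one-off, is what makes them usable as the joint Development and Operations metrics the deck recommends.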

Devops Measurement

    • Agility - This looks at how quickly and efficiently your IT
      operations can react to changes in the needs of your
      business.
       • How quickly can you scale up or scale down capacity to meet
         changing business demands?
       • What’s the change management overhead associated with
         increasing/decreasing capacity? What’s the risk?
       • How quickly, and at what cost, could you adapt your build and
         deployment systems to automate any new applications or
         acquired business lines?
       • What would it cost you to handle an x% growth in the number of
         applications or business lines (direct resource assignment plus any
         attention drain from other staff)?
       • Could your IT operations handle an x% growth in the number of
         applications or business lines? (i.e. could it even be done?)

       Source: http://dev2ops.org/blog/2010/1/21/how-to-measure-the-impact-of-it-operations-on-your-business.html
Architecture Concepts related to Devops

     • Devops is related to several technology
       architecture guidelines:
        • Build an application “as stateless as” and “as shared
          nothing as” possible
        • Carry as little “technical debt” as possible (bugs
          that are in production, patches that are not installed,
          unsupported SW/HW, etc.)
        • Build an application with the ability to “turn off” some
          of its functionality while on air
        • Adding new transaction versions vs. modifying or
          updating existing ones (enables rollback and working
          concurrently in several versions)
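Two of these guidelines lend themselves to a short sketch: switching functionality off at runtime, and keeping versions instead of updating in place. The class names `FeatureToggles` and `VersionedRecord`, and the example data, are hypothetical; this is one possible reading of the guidelines, not a prescribed design.

```python
class FeatureToggles:
    """Runtime switches so functionality can be turned off without a redeploy."""

    def __init__(self, **flags: bool):
        self._flags = dict(flags)

    def is_on(self, name: str) -> bool:
        return self._flags.get(name, False)  # unknown features count as off

    def turn_off(self, name: str) -> None:
        self._flags[name] = False


class VersionedRecord:
    """Append-only history: an update adds a version, nothing is overwritten."""

    def __init__(self, initial: dict):
        self._versions = [dict(initial)]

    def update(self, **changes) -> int:
        self._versions.append({**self._versions[-1], **changes})
        return len(self._versions) - 1  # index of the new version

    def get(self, version: int = -1) -> dict:
        return dict(self._versions[version])

    def roll_back(self) -> None:
        # Rolling back appends a copy of the previous version, so history is kept.
        if len(self._versions) >= 2:
            self._versions.append(dict(self._versions[-2]))


toggles = FeatureToggles(recommendations=True)
toggles.turn_off("recommendations")  # switched off while the application keeps running

order = VersionedRecord({"status": "new", "amount": 100})
order.update(status="paid")
order.roll_back()
print(toggles.is_on("recommendations"), order.get(), order.get(1))
```

Because old versions remain readable by index, components running different releases can each work against the version they understand.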

Devops tools:




                                                                              Source: http://doc36.controltier.org/wiki/File:ProvisioningToolchain.png
Devops vs. Private Cloud?

     • In many respects, the objectives of Devops and Private Cloud
       overlap
     • Automation is at the core of both Private Cloud and Devops




                                         Source: http://www.pistoncloud.com/2012/01/devops-and-private-cloud-sitting-in-a-tree/




Some input from last year’s presentation

     • Public cloud




              Source: IDC https://www.eiseverywhere.com/file_uploads/7e2edb16ed28a2123cd21508f87be8b2_ITR_Boston_2011_Public_and_Private_Cloud_Track_RickVillars_IDC.pdf




Summary – Major paradigm shifts

     • Remember Digital Equipment
       Corporation (DEC). “Underdogs
       become mainstream faster than we
       think”. Change is crucial
     • Embrace big data experiments
     • Embrace Devops concepts – metrics,
       process and tools. Start with metrics
     • Devops tools might be our current
       configuration and CMDB tools.
     • Embrace at least one SAAS application
       now (Email, Service desk, HR, ERP,
       CRM, etc.). Also IAAS, PAAS.
     • Standardization with processes.

       [Diagram labels: Technologies, Processes, Standardization]


STKI Round Tables

     • Lots of useful information – use it!




STKI Round Tables




We will present data on products and vendors:


1. Israeli vendors rating – state of the current market focused on the
   enterprise market (not SMB)
           X – Market penetration (sales + installed base+ clients
             perspective)
           Y – is X plus localization, support, development center, number
             and kind of integrators, etc.
           Worldwide leaders marked, based on global positioning
           Vendors to watch: Are only just entering Israeli market or
             making a big change so can’t be positioned but should be
             watched
      Represents the current Israeli market and not necessarily what we
        recommend to our clients
2. Products and selected resellers / implementers
      The location within the list is random


We will present data on products and vendors (cont.)




3. Selected installations of products – projects in different stages:
   production, implementation, after decision…

4. Service providers that are used by users. I asked users – “which
   SI do you use in this category” and counted the results.

5. Analysis by international and Israeli analysts
      This complete information (1 to 5) should be used together,
       combined with the specific circumstances of each case when
       making a decision
      This subjective chart is the result of our
      objective research
Ratio Analysis:

• 25% percentile: 68.6
• 50% percentile (median): 120.0
• 75% percentile: 178.1

Sorted Metric: 36, 43, 50, 50, 57, 60, 60, 60, 71, 100, 100, 109, 117, 117, 120, 120, 125, 125, 143, 150, 164, 175, 188, 192, 200, 250, 280, 438, 600

Metric (unsorted): 57, 36, 117, 438, 60, 175, 150, 143, 120, 50, 250, 125, 280, 60, 200, 117, 100, 164, 125, 600, 192, 71, 120, 50, 188, 43, 109, 100
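As an illustration of how quartiles can be derived from a sorted list like the one on this slide, here is a sketch using Python's standard `statistics.quantiles`. The slide does not state how its percentile figures were calculated, so this is one common method (linear interpolation over the whole data set) and need not reproduce the slide's values exactly, for example if the slide worked from unrounded data or a different interpolation rule.

```python
from statistics import quantiles

# The "Sorted Metric" column as shown on the slide (values appear rounded).
sorted_metric = [
    36, 43, 50, 50, 57, 60, 60, 60, 71, 100,
    100, 109, 117, 117, 120, 120, 125, 125, 143, 150,
    164, 175, 188, 192, 200, 250, 280, 438, 600,
]

# n=4 returns the three quartile cut points; method="inclusive" treats the
# list as the complete population rather than a sample of a larger one.
q25, q50, q75 = quantiles(sorted_metric, n=4, method="inclusive")
print(q25, q50, q75)
```

An organization can then place its own ratio against these cut points to see which quartile it falls into.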

More Related Content

Similar to Stki summit2012infra v7 - major trends - paradign shifts

Teaching IT one trick or two
Teaching IT one trick or twoTeaching IT one trick or two
Teaching IT one trick or twoPini Cohen
 
Secure development 2014
Secure development 2014Secure development 2014
Secure development 2014Ariel Evans
 
For netapp haifa 2012 v3
For netapp haifa 2012 v3For netapp haifa 2012 v3
For netapp haifa 2012 v3Pini Cohen
 
5. big data vs it stki - pini cohen
5. big data vs  it    stki - pini cohen5. big data vs  it    stki - pini cohen
5. big data vs it stki - pini cohenTaldor Group
 
Summit 2011 infra_general_trends
Summit 2011 infra_general_trendsSummit 2011 infra_general_trends
Summit 2011 infra_general_trendsPini Cohen
 
Delivery 2015 pini
Delivery 2015 piniDelivery 2015 pini
Delivery 2015 piniPini Cohen
 
Summit 2011 infra_pini_v6
Summit 2011 infra_pini_v6Summit 2011 infra_pini_v6
Summit 2011 infra_pini_v6Pini Cohen
 
Summit 2017 cyber delivery v4 long version
Summit 2017 cyber delivery v4 long versionSummit 2017 cyber delivery v4 long version
Summit 2017 cyber delivery v4 long versionPini Cohen
 
SharePoint Best Practices Conference 2013
SharePoint Best Practices Conference 2013SharePoint Best Practices Conference 2013
SharePoint Best Practices Conference 2013davidrknight
 
Building Resiliency and Agility with Data Virtualization for the New Normal
Building Resiliency and Agility with Data Virtualization for the New NormalBuilding Resiliency and Agility with Data Virtualization for the New Normal
Building Resiliency and Agility with Data Virtualization for the New NormalDenodo
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and PythonTravis Oliphant
 
Pini sigal Summit 2014 final
Pini sigal  Summit 2014 finalPini sigal  Summit 2014 final
Pini sigal Summit 2014 finalAriel Evans
 
STKI Summit 2014 Infra Trends - How CIO Deliver - complete infra trends
STKI Summit 2014 Infra Trends - How CIO Deliver - complete infra trendsSTKI Summit 2014 Infra Trends - How CIO Deliver - complete infra trends
STKI Summit 2014 Infra Trends - How CIO Deliver - complete infra trendsPini Cohen
 
STKI Summit 2010 Infra Pini
STKI Summit 2010 Infra PiniSTKI Summit 2010 Infra Pini
STKI Summit 2010 Infra PiniPini Cohen
 
Still on IBM BigInsights? We have the right path for you
Still on IBM BigInsights? We have the right path for youStill on IBM BigInsights? We have the right path for you
Still on IBM BigInsights? We have the right path for youModusOptimum
 
Snowflake: The most cost-effective agile and scalable data warehouse ever!
Snowflake: The most cost-effective agile and scalable data warehouse ever!Snowflake: The most cost-effective agile and scalable data warehouse ever!
Snowflake: The most cost-effective agile and scalable data warehouse ever!Visual_BI
 
Summit 2011 infra_dbms
Summit 2011 infra_dbmsSummit 2011 infra_dbms
Summit 2011 infra_dbmsPini Cohen
 

Similar to Stki summit2012infra v7 - major trends - paradign shifts (20)

Teaching IT one trick or two
Teaching IT one trick or twoTeaching IT one trick or two
Teaching IT one trick or two
 
Secure development 2014
Secure development 2014Secure development 2014
Secure development 2014
 
For netapp haifa 2012 v3
For netapp haifa 2012 v3For netapp haifa 2012 v3
For netapp haifa 2012 v3
 
5. big data vs it stki - pini cohen
5. big data vs  it    stki - pini cohen5. big data vs  it    stki - pini cohen
5. big data vs it stki - pini cohen
 
Summit 2011 infra_general_trends
Summit 2011 infra_general_trendsSummit 2011 infra_general_trends
Summit 2011 infra_general_trends
 
Delivery 2015 pini
Delivery 2015 piniDelivery 2015 pini
Delivery 2015 pini
 
Summit 2011 infra_pini_v6
Summit 2011 infra_pini_v6Summit 2011 infra_pini_v6
Summit 2011 infra_pini_v6
 
SGI Big Data Launch
SGI Big Data LaunchSGI Big Data Launch
SGI Big Data Launch
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
 
Summit 2017 cyber delivery v4 long version
Summit 2017 cyber delivery v4 long versionSummit 2017 cyber delivery v4 long version
Summit 2017 cyber delivery v4 long version
 
SharePoint Best Practices Conference 2013
SharePoint Best Practices Conference 2013SharePoint Best Practices Conference 2013
SharePoint Best Practices Conference 2013
 
Building Resiliency and Agility with Data Virtualization for the New Normal
Building Resiliency and Agility with Data Virtualization for the New NormalBuilding Resiliency and Agility with Data Virtualization for the New Normal
Building Resiliency and Agility with Data Virtualization for the New Normal
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and Python
 
Pini sigal Summit 2014 final
Pini sigal  Summit 2014 finalPini sigal  Summit 2014 final
Pini sigal Summit 2014 final
 
STKI Summit 2014 Infra Trends - How CIO Deliver - complete infra trends
STKI Summit 2014 Infra Trends - How CIO Deliver - complete infra trendsSTKI Summit 2014 Infra Trends - How CIO Deliver - complete infra trends
STKI Summit 2014 Infra Trends - How CIO Deliver - complete infra trends
 
STKI Summit 2010 Infra Pini
STKI Summit 2010 Infra PiniSTKI Summit 2010 Infra Pini
STKI Summit 2010 Infra Pini
 
Still on IBM BigInsights? We have the right path for you
Still on IBM BigInsights? We have the right path for youStill on IBM BigInsights? We have the right path for you
Still on IBM BigInsights? We have the right path for you
 
Snowflake: The most cost-effective agile and scalable data warehouse ever!
Snowflake: The most cost-effective agile and scalable data warehouse ever!Snowflake: The most cost-effective agile and scalable data warehouse ever!
Snowflake: The most cost-effective agile and scalable data warehouse ever!
 
Summit 2011 infra_dbms
Summit 2011 infra_dbmsSummit 2011 infra_dbms
Summit 2011 infra_dbms
 
Bigdata-Intro.pptx
Bigdata-Intro.pptxBigdata-Intro.pptx
Bigdata-Intro.pptx
 

More from Pini Cohen

Cto 2021 markets v2
Cto 2021 markets v2Cto 2021 markets v2
Cto 2021 markets v2Pini Cohen
 
Workato integrators corrections stki Israeli VAS market research 2020 v1
Workato integrators corrections stki Israeli VAS  market research 2020 v1Workato integrators corrections stki Israeli VAS  market research 2020 v1
Workato integrators corrections stki Israeli VAS market research 2020 v1Pini Cohen
 
It procurement 2019 v3
It procurement 2019 v3It procurement 2019 v3
It procurement 2019 v3Pini Cohen
 
STKI summit CTO presentation 2019
STKI summit CTO presentation 2019STKI summit CTO presentation 2019
STKI summit CTO presentation 2019Pini Cohen
 
STKI IT Delivery staffing ratios 2018 v3
STKI IT Delivery staffing ratios 2018 v3STKI IT Delivery staffing ratios 2018 v3
STKI IT Delivery staffing ratios 2018 v3Pini Cohen
 
Stkisummi18 i taa_s_cybergov_long_version_v2
Stkisummi18 i taa_s_cybergov_long_version_v2Stkisummi18 i taa_s_cybergov_long_version_v2
Stkisummi18 i taa_s_cybergov_long_version_v2Pini Cohen
 
Dev trends 18_q1
Dev trends 18_q1Dev trends 18_q1
Dev trends 18_q1Pini Cohen
 
Stkisummi18 i taa_s_cybergov_long_version_v1
Stkisummi18 i taa_s_cybergov_long_version_v1Stkisummi18 i taa_s_cybergov_long_version_v1
Stkisummi18 i taa_s_cybergov_long_version_v1Pini Cohen
 
Delivery positionnig 2017 v2
Delivery positionnig 2017   v2Delivery positionnig 2017   v2
Delivery positionnig 2017 v2Pini Cohen
 
IT procurement cloud (and other) recommandations
IT procurement cloud (and other) recommandationsIT procurement cloud (and other) recommandations
IT procurement cloud (and other) recommandationsPini Cohen
 
IT procurement v2
IT procurement v2IT procurement v2
IT procurement v2Pini Cohen
 
Cyber ratios 2017 v1
Cyber ratios 2017 v1Cyber ratios 2017 v1
Cyber ratios 2017 v1Pini Cohen
 
Delivery positionnig 2016 v1
Delivery positionnig 2016 v1Delivery positionnig 2016 v1
Delivery positionnig 2016 v1Pini Cohen
 
Ratios 2016 v1
Ratios 2016 v1Ratios 2016 v1
Ratios 2016 v1Pini Cohen
 
It delivery 2016 v5
It delivery 2016 v5It delivery 2016 v5
It delivery 2016 v5Pini Cohen
 
Positioning stki pini 2015 v1
Positioning stki  pini 2015 v1Positioning stki  pini 2015 v1
Positioning stki pini 2015 v1Pini Cohen
 
Stki ratios 2015 v1
Stki ratios 2015 v1Stki ratios 2015 v1
Stki ratios 2015 v1Pini Cohen
 
STKI staffing ratios ratios 2014
STKI staffing ratios ratios 2014STKI staffing ratios ratios 2014
STKI staffing ratios ratios 2014Pini Cohen
 
STKI Summit 2014 - Trends and Positioning - Delivery domain
STKI Summit 2014 - Trends and Positioning - Delivery domain STKI Summit 2014 - Trends and Positioning - Delivery domain
STKI Summit 2014 - Trends and Positioning - Delivery domain Pini Cohen
 
STKI Summit 2014 - How does CIO deliver?
STKI Summit 2014 - How does CIO deliver?STKI Summit 2014 - How does CIO deliver?
STKI Summit 2014 - How does CIO deliver?Pini Cohen
 

More from Pini Cohen (20)

Cto 2021 markets v2
Cto 2021 markets v2Cto 2021 markets v2
Cto 2021 markets v2
 
Workato integrators corrections stki Israeli VAS market research 2020 v1
Workato integrators corrections stki Israeli VAS  market research 2020 v1Workato integrators corrections stki Israeli VAS  market research 2020 v1
Workato integrators corrections stki Israeli VAS market research 2020 v1
 
It procurement 2019 v3
It procurement 2019 v3It procurement 2019 v3
It procurement 2019 v3
 
STKI summit CTO presentation 2019
STKI summit CTO presentation 2019STKI summit CTO presentation 2019
STKI summit CTO presentation 2019
 
STKI IT Delivery staffing ratios 2018 v3
STKI IT Delivery staffing ratios 2018 v3STKI IT Delivery staffing ratios 2018 v3
STKI IT Delivery staffing ratios 2018 v3
 
Stkisummi18 i taa_s_cybergov_long_version_v2
Stkisummi18 i taa_s_cybergov_long_version_v2Stkisummi18 i taa_s_cybergov_long_version_v2
Stkisummi18 i taa_s_cybergov_long_version_v2
 
Dev trends 18_q1
Dev trends 18_q1Dev trends 18_q1
Dev trends 18_q1
 
Stkisummi18 i taa_s_cybergov_long_version_v1
Stkisummi18 i taa_s_cybergov_long_version_v1Stkisummi18 i taa_s_cybergov_long_version_v1
Stkisummi18 i taa_s_cybergov_long_version_v1
 
Delivery positionnig 2017 v2
Delivery positionnig 2017   v2Delivery positionnig 2017   v2
Delivery positionnig 2017 v2
 
IT procurement cloud (and other) recommandations
IT procurement cloud (and other) recommandationsIT procurement cloud (and other) recommandations
IT procurement cloud (and other) recommandations
 
IT procurement v2
IT procurement v2IT procurement v2
IT procurement v2
 
Cyber ratios 2017 v1
Cyber ratios 2017 v1Cyber ratios 2017 v1
Cyber ratios 2017 v1
 
Delivery positionnig 2016 v1
Delivery positionnig 2016 v1Delivery positionnig 2016 v1
Delivery positionnig 2016 v1
 
Ratios 2016 v1
Ratios 2016 v1Ratios 2016 v1
Ratios 2016 v1
 
It delivery 2016 v5
It delivery 2016 v5It delivery 2016 v5
It delivery 2016 v5
 
Positioning stki pini 2015 v1
Positioning stki  pini 2015 v1Positioning stki  pini 2015 v1
Positioning stki pini 2015 v1
 
Stki ratios 2015 v1
Stki ratios 2015 v1Stki ratios 2015 v1
Stki ratios 2015 v1
 
STKI staffing ratios ratios 2014
STKI staffing ratios ratios 2014STKI staffing ratios ratios 2014
STKI staffing ratios ratios 2014
 
STKI Summit 2014 - Trends and Positioning - Delivery domain
STKI Summit 2014 - Trends and Positioning - Delivery domain STKI Summit 2014 - Trends and Positioning - Delivery domain
STKI Summit 2014 - Trends and Positioning - Delivery domain
 
STKI Summit 2014 - How does CIO deliver?
STKI Summit 2014 - How does CIO deliver?STKI Summit 2014 - How does CIO deliver?
STKI Summit 2014 - How does CIO deliver?
 



STKI Summit 2012 Infra v7 - major trends - paradigm shifts

  • 1. Trends in Infrastructure: Paradigm Shifts
      STKI Summit 2012. Pini Cohen, VP and Senior Analyst
      Tell me and I’ll forget
      Show me and I may remember
      Involve me and I’ll
  • 2. What do we do?
      Pini Cohen’s work. Copyright STKI@2012. Do not remove source or attribution from any slide or graph.
  • 3. Agenda
      • Major paradigm shifts
      • Development and SOA
      • ESM BSM CMDB
      • DBMS and DATA
      • Platforms – Servers
      • Clients
      • Storage
      Source: http://astonguild.org.uk/files/NEW_MENU_FRONT_RGB%5B1%5D.jpg
  • 4. Major paradigm shifts – mini agenda
      • Why don’t we see a change when it is coming?
      • Big Data and programming models
      • The changing end-user device ecosystem
      • Infrastructure as Code and DevOps
      Source: http://www.b2binbound.com/blog/?Tag=paradigm%20shift
  • 5. Manager’s Dilemma
      • Bingo! My product is a mainstream product (quartiles 2 and 3).
      • Now, should I invest in quartile 1 or quartile 4?
      • Most managers will invest in quartile 4
      Chart labels: Percentage; Quality required by customers; New product category. Chart note: quality required is improving gradually.
      Source of pic: http://www.buat-nadlan.com/2011/11/blog-post_3065.html
  • 6. Prof. Clayton Christensen: Disruptive Innovation Model
      • Remember Digital Equipment Corporation (DEC).
      • “Underdogs become mainstream faster than we think”.
      • Moving toward areas that look “non-mature” is crucial
      Chart labels: T1, T2
  • 7. Last year my theme was “The Gap”
  • 8. Major paradigm shifts – mini agenda
      • Why don’t we see a change when it is coming?
      • Big Data and programming models
      • The changing end-user device ecosystem
      • Infrastructure as Code and DevOps
      Source: http://www.b2binbound.com/blog/?Tag=paradigm%20shift
  • 9. Big Data Definition – 4 V’s (or more…)
      • Volume – tens of TBs and more (15-20TB+)
      • Velocity – the speed at which data is added (10M items per hour and more), and the speed at which the data needs to be processed
      • Variety – different types of data, structured and unstructured. In many cases deals with the internet of things and social media, but also with voice, video, etc.
      • Variability – able to cope with new attributes and changing data types without interrupting the analytical process (without “import-export”)
      • Other optional V’s – validity, volatility, viscosity (resistance to flow), etc.
      Source: http://www.computerweekly.com/blogs/cwdn/2011/11/datas-main-drivers-volume-velocity-variety-and-variability.html
  • 10. The origins of the 3V’s:
      • 2002 research by Doug Laney from META Group (now Gartner):
  • 11. “Big Data” theme main current usage:
      • “Big Data” is just marketing jargon. - Doug Laney, Gartner
        Source: http://www.computerweekly.com/blogs/cwdn/2011/11/datas-main-drivers-volume-velocity-variety-and-variability.html
      • STKI: doing something significantly different from what you’ve done until now
      Picture source: http://winnbadisa.com/wp-content/uploads/2011/12/marketing-career-cloud.jpg
  • 12. Big Data at work:
      • Orbitz Worldwide has collected 750 terabytes of unstructured data on their consumers’ behavior: detailed information from customer online visits and browsing sessions. Using Hadoop, models have been developed intended to improve search results and tailor the user experience based on everything from location, interest in family travel versus solo travel, and even the kind of device being used to explore travel options.
      • The result? To date, a 7% increase in interaction rate, 37% growth in stickiness of sessions and a net 2.6% in booking path engagement.
      Source: http://www.deloitte.com/assets/Dcom-UnitedStates/Local%20Assets/Documents/us_cons_techtrends2012_013112.pdf
  • 13. Example: network flow data (possible use – Cyber)
      • A huge amount of flow data
      • Long-term collection of flow data
      Flow data in our campus network (/16 prefix):
        # of Routers | 1 Day  | 1 Month | 1 Year
        1            | 1.2 GB | 13 GB   | 156 GB
        5            | 6 GB   | 65 GB   | 780 GB
        10           | 12 GB  | 130 GB  | 1.5 TB
        200          | 240 GB | 2.6 TB  | 30 TB
      • Short-term period of flow data
      • Massive flow data from anomaly traffic data of Internet worm and DDoS
      • Cluster file system and cloud computing platform
      • Google’s programming model, MapReduce, big table [8]
      • Open-source system, Hadoop [9]
      Source: www.caida.org/workshops/.../wide-casfi1004_wkang.ppt (STKI modifications)
  • 14. DW appliances will be discussed later
      • Teradata
      • EMC Greenplum
      • Oracle Exadata
      • Microsoft Parallel Data Warehouse
      Source: http://www.asugnews.com/2011/09/06/inside-saps-product-naming-strategies/
  • 15. Several parts of paradigm changes
      Elements:
      • Storing data for analytics (mainly):
         • HDFS – Hadoop File System
         • MapReduce – programming method, mainly for analytics
         • Other “add-ons”: Pig, Hive, JAQL (IBM)
      • Storing and retrieving data – DBMS:
         • NoSQL DBMS (not only SQL): Cassandra, MongoDB, CouchDB, HBase
      Concepts:
      • New ways of manipulating and analyzing all kinds of data. Example: how do you get a specific lead from a Facebook status “I wish I could see Messi next month in London”?
      • New algorithms.
      • Not discussed in this presentation (see Einat’s presentation)
  • 16. Who Uses Hadoop?
      • Amazon/A9
      • Quantcast
      • AOL
      • Rackspace/Mailtrust
      • Facebook
      • Fox Interactive Media
      • Veoh
      • Netflix
      • Yahoo!
      • New York Times
      • PowerSet (now Microsoft)
      More at http://wiki.apache.org/hadoop/PoweredBy
  • 17. Who Uses Cassandra?
      • Facebook
      • SimpleGeo
      • Digg
      • Rackspace
      • Despegar
      • Shazam
      • Ooyala
      • SoftwareProjects
      • Imagini
  • 18. Big Data technologies (Hadoop etc.) vs. traditional IT
        Traditional IT | Big Data
        Centralized storage | Local storage
        Brand redundant servers | Cheap HW, white boxes
        Standard infrastructure and virtual servers | Is standardization needed?! (at the HW level). No server virtualization.
        Well established backup and DRP procedures | Why do I need backup? How do I tackle DRP (compute clusters that are stretched over locations)?
        Traditional vendors | Open Source solutions
        Mature products and procedures | In a new patch for specific issues, sometimes it is written “not implemented yet”
        Traditional programming, SQL | Different kind of programming (map-reduce), no Joins
      Will Big Data infrastructure be part of the existing infrastructure, or will it be developed as a new domain?
  • 19. The Basic Concept – the internet
      • Think Distributed
      • Think Parallel
      Source: http://retedeicittadini.it/wp-content/uploads/2011/02/network-distributed.gif
      Source: http://www.catonmat.net/blog/mit-introduction-to-algorithms-
  • 20. New type of scale:
      • Hadoop:
         • Up to 4,000 machines in a cluster
         • Up to 20 PB in a cluster
      • Currently, traditional IT technologies cannot handle this kind of scale.
      • This scale comes with a cost!
      Source: http://www.techsangam.com/wp-content/uploads/2012/01/i_love_scalability_mug.jpg
  • 21. Brewer's (CAP) Theorem (Professor Eric A. Brewer)
      • It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
         • Consistency (all nodes see the same data at the same time)
         • Availability (node failures do not prevent survivors from continuing to operate)
         • Partition Tolerance (the system continues to operate in many partitions and despite arbitrary message loss)
      Source: Scalebase (STKI modifications)
  • 22. Dealing With CAP
      • Drop Consistency
         • Welcome to the “Eventually Consistent” term.
         • At the end, everything will work out just fine. And hey, sometimes this is a good enough solution
         • When no updates occur for a long period of time, eventually all updates will propagate through the system and all the nodes will be consistent
         • For a given accepted update and a given node, eventually either the update reaches the node or the node is removed from service
         • Known as BASE (Basically Available, Soft state, Eventual consistency), as opposed to ACID
      Source: Scalebase
  • 23. Hadoop
      • Apache Hadoop is a software framework that supports data-intensive distributed applications
      • It enables applications to work with thousands of nodes and petabytes of data.
      • Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers
      • Contains (basically):
         • HDFS – Hadoop File System
         • MapReduce programming model
  • 24. HDFS – Hadoop File System
      • Parallel
      • Distributed on commodity elements
      • Throughput over latency
      • Reliable and self healing
      • For large scale: a typical file is gigabytes to terabytes (for one file!)
      • Applications need a write-once-read-many access model (mainly analytics)
  • 25. HDFS motivation
      • What if you needed to write a program that distributes data on commodity HW (PCs or servers)? You would need to take care of:
         • Where the data is located
         • How to distribute data between the nodes
         • How many times you want to replicate the data
         • How to insert, select and update data
         • What to do if one or more nodes fail
         • How to add or remove a node
         • Managing and monitoring the environment
      • Hadoop File System did it for you!
  • 26. HDFS: Hadoop Distributed File System
      • Client requests metadata about a file from the namenode
      • Data is served directly from the datanode
      Diagram: the application's HDFS client sends (file name, block id) to the HDFS namenode, which holds the file namespace (e.g. /user/css534/input, block 3df2) and answers with (block id, block location). The client then sends (block id, byte range) to an HDFS datanode and receives block data. The namenode and the datanodes exchange instructions and state. Each datanode stores its blocks on the Linux local file system.
      Source: http://www.google.co.il/url?sa=t&rct=j&q=Rob+Jordan++Chris+Livdahl+hadoop+filetype%3Apptx&source=web&cd=1&ved=0CCIQFjAA
  • 27. Datanode Blockreports
      • File “part-0” will be replicated twice and will be saved in blocks 1 and 3 (the file is big, so it has to be divided into 2 blocks)
      • Block 1 is on data nodes A and C
      Source: http://www.google.co.il/url?sa=t&rct=j&q=Rob+Jordan++Chris+Livdahl+hadoop+filetype%3Apptx&source=web&cd=1&ved=0CCIQFjAA
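The block and replica bookkeeping described in slides 26 and 27 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `split_into_blocks` and `place_replicas` are hypothetical, and the round-robin placement is a simplification chosen for readability, not the placement policy HDFS actually uses.

```python
def split_into_blocks(file_size, block_size):
    """Return the sizes of the blocks a file of file_size bytes is cut into."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks, datanodes, replication):
    """Map each block index to `replication` distinct datanodes.

    Assumption: simple round-robin placement, used here only to make the
    namenode's (block id -> block locations) table concrete.
    """
    if replication > len(datanodes):
        raise ValueError("not enough datanodes for the requested replication")
    return {
        block: [datanodes[(block + i) % len(datanodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


# A 100 MB file with 64 MB blocks needs two blocks, each stored on two nodes.
blocks = split_into_blocks(100, 64)
locations = place_replicas(len(blocks), ["A", "B", "C"], replication=2)
```

The `locations` dictionary plays the role of the namenode metadata: a client would look up a block there and then read the block data directly from one of the listed datanodes.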
  • 28. HDFS basic limitations
      • Namenode is a single point of failure
      • Write-once model
         • Plan to support appending-writes
      • A namespace with an extremely large number of files exceeds the Namenode’s capacity to maintain
      • Cannot be mounted by an existing OS
         • Getting data in and out is tedious
      • HDFS does not implement / support user quotas / access permissions
      • Data balancing schemes
      • No periodic checkpoints
  • 29. MapReduce programming model
      • In very basic terms: brings the program to the data
      • Contains two elements:
         • Map: this part of the job is performed in parallel and asynchronously by each node
         • Reduce: gathers the results from the relevant nodes
      • In more detail:
         • Map: returns (writes to a temp file) a list containing zero or more (k, v) pairs
            • Output can have a different key from the input
            • Output can have the same key
         • Reduce: returns a new list of reduced output from the input
  • 30. MapReduce motivation
      • What if you needed to write a program that processes data that’s on distributed computers?
      • You would need to write a distributed program that:
         • Finds where the data is located
         • Works on each node and then combines the results from all nodes
         • Decides where (on the local node) and how (in which format) to write the intermediate results
         • Finds out when the jobs of all participating nodes have concluded and then starts the “aggregation” part
         • Decides what to do if a job is stuck (restart the job or turn to another node to perform the same job)
      • Hadoop MapReduce is the framework for you!
  • 31. MapReduce example:
        map(String key, String value):
          // key: document name
          // value: document contents
          for each word w in value:
            EmitIntermediate(w, "1");

        reduce(String key, Iterator values):
          // key: a word
          // values: a list of counts
          int result = 0;
          for each v in values:
            result += ParseInt(v);
          Emit(AsString(result));
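The word-count pseudocode above can be tried on a single machine with a small Python sketch. It is not Hadoop code: `run_job` is a hypothetical stand-in for the framework that groups intermediate pairs by key in memory, and the two input strings are the block contents shown in the “Dataflow in Hadoop” slides.

```python
from collections import defaultdict


def map_words(doc_name, contents):
    """Map step: emit (word, 1) for every word in the document."""
    for word in contents.split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce step: sum all counts emitted for one word."""
    return word, sum(counts)


def run_job(documents):
    """Hypothetical single-process driver: map, group by key, then reduce."""
    intermediate = defaultdict(list)
    for name, contents in documents.items():
        for key, value in map_words(name, contents):
            intermediate[key].append(value)
    return dict(reduce_counts(key, values) for key, values in sorted(intermediate.items()))


blocks = {
    "block-1": "Hello World Bye World",
    "block-2": "Hello Hadoop Goodbye Hadoop",
}
result = run_job(blocks)
print(result)  # {'Bye': 1, 'Goodbye': 1, 'Hadoop': 2, 'Hello': 2, 'World': 2}
```

In a real cluster the grouping step is the shuffle between map and reduce nodes, and each map task runs on the node that holds its block.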
  • 32. Dataflow in Hadoop
      Diagram: a job (Word Count) is submitted to the master, which schedules the map and reduce tasks. All elements are standard HW.
      Source: Haifa Labs IBM
  • 33. Dataflow in Hadoop
      Diagram: the map tasks read the input file from HDFS. Block 1 (“Hello World Bye World”) maps to Hello 1, World 2, Bye 1. Block 2 (“Hello Hadoop Goodbye Hadoop”) maps to Hello 1, Hadoop 2, Goodbye 1.
      Source: Haifa Labs IBM
  • 34. Dataflow in Hadoop
      Diagram labels: map tasks write to the local FS; messages “Finished” and “Finished + Location”; reduce tasks.
      Source: Haifa Labs IBM
  • 35. Dataflow in Hadoop
      Diagram: the reduce tasks fetch the map output from the local FS of the map nodes with HTTP GET.
      Source: Haifa Labs IBM
  • 36. Dataflow in Hadoop
      Diagram: the reduce tasks write the final answer to HDFS: Bye 1, Goodbye 1, Hadoop 2, Hello 2, World 2.
      Source: Haifa Labs IBM
  • 37. Example: Flow Analysis Map/Reduce
      • Read text flow files
      • Run map tasks
         • Read each line (validation check)
         • Parse flow data
         • Save result into temporary files (key, value)
      • Run reduce tasks
         • Read temporary files (key, list[value])
         • Run sum process
         • Write results to a file
      Diagram example: key = Dst Port, value = Octet. The map tasks emit (53, 64) and (53, 128); the reduce task receives 53 with [64, 128] and writes 53 with 192.
      Source: www.caida.org/workshops/.../wide-casfi1004_wkang.ppt
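The flow-analysis steps above (validate and parse each line, emit destination port and octets, sum per port) can be sketched as follows. The record layout is an assumption made for illustration: each text line is taken to hold only a destination port and an octet count separated by whitespace, which is far simpler than real flow-tools output.

```python
from collections import defaultdict


def map_flow(line):
    """Map step: validate one text record and emit (dst_port, octets).

    Assumed record format: "<dst_port> <octets>". Invalid lines emit nothing.
    """
    fields = line.split()
    if len(fields) != 2 or not all(field.isdigit() for field in fields):
        return
    yield int(fields[0]), int(fields[1])


def reduce_flow(dst_port, octet_list):
    """Reduce step: total octets seen for one destination port."""
    return dst_port, sum(octet_list)


def analyze(lines):
    """Hypothetical driver standing in for the Hadoop framework."""
    grouped = defaultdict(list)
    for line in lines:
        for port, octets in map_flow(line):
            grouped[port].append(octets)
    return dict(reduce_flow(port, values) for port, values in grouped.items())


flows = ["53 64", "53 128", "80 1500", "garbage line"]
totals = analyze(flows)
print(totals)  # {53: 192, 80: 1500}
```

Port 53 reproduces the slide's example: the values 64 and 128 are grouped into one list and summed to 192, while the malformed line is dropped by the validation check.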
  • 38. Components of Cluster Node
      Layers, top to bottom:
      • Flow file input processor
      • Flow analysis map/reduce (flow-tools)
      • Hadoop: cluster file system (HDFS) and MapReduce library
      • Java Virtual Machine
      • Operating system (Linux)
      • Hardware (CPU, HDD, memory, NIC)
      Source: www.caida.org/workshops/.../wide-casfi1004_wkang.ppt
  • 39. MapReduce helpers: Hive, Pig
      • Make life easier: translate a friendlier language to MapReduce
        Feature | Hive | Pig
        Language | SQL-like | PigLatin
        Schemas/Types | Yes (explicit) | Yes (implicit)
        Partitions | Yes | No
        Server | Optional (Thrift) | No
        User Defined Functions (UDF) | Yes (Java) | Yes (Java)
        Custom Serializer/Deserializer | Yes | Yes
        DFS Direct Access | Yes (implicit) | Yes (explicit)
        Streaming | Yes | Yes
        Web Interface | Yes | No
        JDBC/ODBC | Yes (limited) | No
  • 40. Hive: MapReduce helper
      Code example:
      • hive> INSERT OVERWRITE TABLE events SELECT a.* FROM profiles a;
      • hive> INSERT OVERWRITE TABLE events SELECT a.* FROM profiles a WHERE a.key < 100;
      • hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/reg_3' SELECT a.* FROM events a;
      • hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_4' SELECT a.invites, a.pokes FROM profiles a;
      • hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT COUNT(*) FROM invites a WHERE a.ds='2008-08-15';
      • hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT a.foo, a.bar FROM invites a;
      • hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/sum' SELECT SUM(a.pc) FROM pc1 a;
  • 41. NoSQL DBMS: storing and retrieving data
      • Key/Value
         • A big hash table
         • Examples: Voldemort, Amazon’s Dynamo
      • Big Table
         • Big table, column families
         • Examples: HBase, Cassandra
      • Document based
         • Collections of collections
         • Examples: CouchDB, MongoDB
      • Graph databases
         • Based on graph theory
         • Examples: Neo4J
      • Each solves a different problem
      Source: Scalebase
  • 42. Pros/Cons
      • Pros:
         • Performance
         • Big Data
         • Most solutions are open source
         • Data is replicated to nodes and is therefore fault-tolerant (partitioning)
         • Don't require a schema
         • Can scale up and down
      • Cons:
         • Code change
         • No framework support
         • Not ACID
         • Ecosystem (BI, backup)
         • There is always a database at the backend
         • Some APIs are just too simple
      Source: Scalebase
  • 43. There are some NoSQL projects out there…
      Source: NoSQL Databases: Providing Extreme Scale and Flexibility, by Matthew D. Sarrel
  • 44. NoSQL Market Forecast 2011-2015
      Source: http://www.marketresearchmedia.com/2010/11/11/nosql-market/
  • 45. Apache Cassandra
      • Cassandra is a highly scalable, eventually consistent, distributed, structured key-value store
      • Child of Google’s BigTable and Amazon’s Dynamo
      • Peer-to-peer architecture. All nodes are equal
      • Cassandra’s replication factor (RF) is the total number of nodes onto which the data will be placed. An RF of at least 2 is highly recommended, keeping in mind that your effective number of nodes is (N total nodes / RF).
      • CQL (Cassandra Query Language) command line
      • Time stamp for each value written
      Source: ids.snu.ac.kr/w/images/1/18/2011SS-03.ppt
  • 46. Consistent Hashing
      • Partition using consistent hashing (for the first node on which data is placed), based on MD5
      • Distributed hash table algorithm
      • Keys hash to a point on a fixed circular space
      • Ring is partitioned into a set of ordered slots, and servers and keys are hashed over these slots
      • Nodes take positions on the circle.
      • A, B, and D exist.
         • B responsible for AB range (for replication factor=2 – default).
         • D responsible for BD range.
         • A responsible for DA range.
      • C joins.
         • B, D split ranges.
         • C gets BC from D.
      Diagram labels on the ring: A, C, V, B, S, D, R, H, M
      Source: http://www.intertech.com/resource/usergroup/NoSQL.ppt
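A minimal Python sketch of the ring described above, using MD5 to place both nodes and keys on the circle. The class name `HashRing` and its methods are hypothetical, each node gets a single position, and replicas are taken from the next nodes clockwise, which is an assumption made to keep the example short rather than a description of Cassandra's internals.

```python
import bisect
import hashlib


def md5_position(name):
    """Position of a node name or key on the ring, as a 128-bit integer."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical single-token consistent-hashing ring."""

    def __init__(self, nodes=()):
        self._positions = []  # sorted ring positions
        self._owners = {}     # position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        position = md5_position(node)
        if position in self._owners:
            raise ValueError(f"position already taken by {self._owners[position]}")
        bisect.insort(self._positions, position)
        self._owners[position] = node

    def replicas(self, key, replication_factor=1):
        """First node clockwise from the key, plus the following nodes."""
        if not self._positions:
            raise LookupError("ring has no nodes")
        start = bisect.bisect_left(self._positions, md5_position(key))
        count = min(replication_factor, len(self._positions))
        return [
            self._owners[self._positions[(start + i) % len(self._positions)]]
            for i in range(count)
        ]

    def owner(self, key):
        return self.replicas(key, 1)[0]


ring = HashRing(["A", "B", "D"])
keys = [f"key-{i}" for i in range(200)]
before = {key: ring.owner(key) for key in keys}
ring.add_node("C")  # C joins: only keys in C's new range change owner
after = {key: ring.owner(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
```

Every key in `moved` now belongs to C, and all other keys keep their previous owner. That limited data movement when a node joins is the reason for hashing onto a ring instead of using `hash(key) % number_of_nodes`.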
  • 47. Write operation
      Source: http://assets.en.oreilly.com/1/event/51/Scaling%20Web%20Applications%20with%20Cassandra%20Presentation.ppt
  • 48. Cassandra’s tunable consistency (write)
        Level | Behavior
        ANY | Ensure that the write has been written to at least 1 node, including HintedHandoff recipients.
        ONE | Ensure that the write has been written to at least 1 replica's commit log and memory table before responding to the client.
        TWO | Ensure that the write has been written to at least 2 replicas before responding to the client.
        THREE | Ensure that the write has been written to at least 3 replicas before responding to the client.
        QUORUM | Ensure that the write has been written to N / 2 + 1 replicas before responding to the client.
        LOCAL_QUORUM | Ensure that the write has been written to <ReplicationFactor> / 2 + 1 nodes within the local datacenter (requires NetworkTopologyStrategy).
        EACH_QUORUM | Ensure that the write has been written to <ReplicationFactor> / 2 + 1 nodes in each datacenter (requires NetworkTopologyStrategy).
        ALL | Ensure that the write is written to all N replicas before responding to the client. Any unresponsive replicas will fail the operation.
      Source: wiki
  • 49. Cassandra’s tunable consistency (read)
        Level | Behavior
        ANY | Not supported. You probably want ONE instead.
        ONE | Will return the record returned by the first replica to respond. A consistency check is always done in a background thread to fix any consistency issues when ConsistencyLevel.ONE is used. This means subsequent calls will have correct data even if the initial read gets an older value. (This is called ReadRepair.)
        TWO | Will query 2 replicas and return the record with the most recent timestamp. Again, the remaining replicas will be checked in the background.
        THREE | Will query 3 replicas and return the record with the most recent timestamp.
        QUORUM | Will query all replicas and return the record with the most recent timestamp once at least a majority of replicas (N / 2 + 1) have reported. Again, the remaining replicas will be checked in the background.
        LOCAL_QUORUM | Returns the record with the most recent timestamp once a majority of replicas within the local datacenter have replied.
        EACH_QUORUM | Returns the record with the most recent timestamp once a majority of replicas within each datacenter have replied.
        ALL | Will query all replicas and return the record with the most recent timestamp once all replicas have replied. Any unresponsive replicas will fail the operation.
      Source: wiki
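Two ideas from the consistency tables can be made concrete with a short Python sketch: the quorum size N / 2 + 1 (with integer division) and the rule that a read returns the record with the most recent timestamp. The names `quorum`, `Versioned` and `resolve_read` are hypothetical and do not come from any Cassandra client library.

```python
from dataclasses import dataclass


def quorum(replication_factor):
    """Majority of replicas: N / 2 + 1 using integer division."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


@dataclass(frozen=True)
class Versioned:
    """One replica's answer: a value and the timestamp it was written with."""
    value: str
    timestamp: int


def resolve_read(responses, required):
    """Return the newest record once `required` replicas have answered."""
    if len(responses) < required:
        raise RuntimeError("not enough replicas responded for this consistency level")
    return max(responses, key=lambda record: record.timestamp)


# Replication factor 3: a QUORUM read needs 2 replies.
replies = [Versioned("old address", timestamp=100), Versioned("new address", timestamp=250)]
latest = resolve_read(replies, required=quorum(3))
```

Here one replica still holds the older value, but the read returns "new address" because its timestamp is higher. The stale replica would later be fixed in the background, which is what the table calls ReadRepair.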
  • 50. Cassandra’s data model structure • Think of Cassandra as row-oriented • Keyspace: settings (e.g., partitioner) • Column family: settings (e.g., comparator, type [Std]) • Column: name, value, clock Source: http://assets.en.oreilly.com/1/event/51/Scaling%20Web%20Applications%20with%20Cassandra%20Presentation.ppt
  • 51. Data Model – “flexible” schema!
      ColumnFamily: Rockets (Key, then Name: Value pairs)
      • Key 1: name: Rocket-Powered Roller Skates; toon: Ready, Set, Zoom; inventoryQty: 5; brakes: false
      • Key 2: name: Little Giant Do-It-Yourself Rocket-Sled Kit; toon: Beep Prepared; inventoryQty: 4; brakes: false
      • Key 3: name: Acme Jet Propelled Unicycle; toon: Hot Rod and Reel; inventoryQty: 1; wheels: 1
      Source: http://wenku.baidu.com/view/6e254321482fb4daa58d4b87.html
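The "flexible" part of the Rockets example is that each row carries its own set of columns. A rough Python sketch of that idea, using plain dictionaries as a stand-in for a column family (this is an illustration only, not a Cassandra client; the helper names are hypothetical):

```python
# Column family "Rockets": row key -> {column name: value}.
# Values are shown as native Python types for readability.
rockets = {
    "1": {"name": "Rocket-Powered Roller Skates", "toon": "Ready, Set, Zoom",
          "inventoryQty": 5, "brakes": False},
    "2": {"name": "Little Giant Do-It-Yourself Rocket-Sled Kit",
          "toon": "Beep Prepared", "inventoryQty": 4, "brakes": False},
    "3": {"name": "Acme Jet Propelled Unicycle", "toon": "Hot Rod and Reel",
          "inventoryQty": 1, "wheels": 1},
}


def column_names(column_family):
    """All column names used by at least one row."""
    return sorted({col for row in column_family.values() for col in row})


def rows_with(column_family, column):
    """Row keys that actually define the given column."""
    return [key for key, row in column_family.items() if column in row]
```

Row 3 has a `wheels` column and no `brakes` column, and nothing in the structure forces the rows to agree.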
  • 52. Cassandra’s CQL – Cassandra Query Language • SQL-like. Example: • CREATE KEYSPACE test with strategy_class = 'SimpleStrategy' and strategy_options:replication_factor=1; • CREATE INDEX ON users (birth_date); • SELECT * FROM users WHERE state='UT' AND birth_date > 1970; • However: • No Joins • No UPDATES/DELETES
  • 53. NoSQL benchmark – for scale! Source: research.yahoo.com/files/ycsb-v4.pdf
  • 54. Can we live with NoSQL limitations? • Facebook has dropped Cassandra • “..we found Cassandra's eventual consistency model to be a difficult pattern to reconcile for our new Messages infrastructure” • Facebook has selected HBase (Columnar DBMS). http://www.facebook.com/notes/facebook-engineering/the-underlying-technology-of-messages/454991608919
  • 55. What about other NoSQL DBMS? • MongoDB • HBase • CouchDB • Maybe next session…
  • 56. Big Data potential implications on IT • Will traditional RDBMS be obsolete? Surely not! • Several areas are Big Data zones by definition – Internet marketing, Cyber, DW, etc. • How well can we live with “Eventually Consistent”, which in most cases means a 1-2 minute delay?! • Can we decide that all batch data can live well on Big Data technologies? • Will we see in the end (10 years from now) that only a small portion of data still resides on RDBMS and most of the data resides on Big Data technologies?!
  • 57. Example of big data technology: SPLUNK • Splunk is a traditional IT vendor based on MapReduce (from 2009)
  • 58. Another aspect of Big Data - IBM Watson wins in Jeopardy
  • 59. DeepQA: the technology & architecture behind Watson
      [Architecture diagram] Learned models help combine and weigh the evidence.
      • Question → Question & Topic Analysis → Question Decomposition
      • Hypothesis Generation: Primary Search, Candidate Answer Generation (drawing on Answer Sources)
      • Hypothesis and Evidence Scoring: Answer Scoring, Evidence Retrieval, Deep Evidence Scoring (drawing on Evidence Sources)
      • Synthesis → Final Confidence Merging & Ranking → Answer & Confidence
  • 60. Where did it acquire knowledge?
      Three types of knowledge: • Domain Data (articles, books, documents) • Training and test question sets w/answer keys • NLP Resources (vocabularies, taxonomies, ontologies)
      Content sizes:
      • Wikipedia: 17 GB
      • Time, Inc.: 2.0 GB
      • New York Times: 7.4 GB
      • Encarta: 0.3 GB
      • Oxford University: 0.11 GB
      • Internet Movie Database: 0.1 GB
      • IBM Dictionary: 0.01 GB
      • ... J! Archive/YAGO/dbPedia…: XXX
      • Total Raw Content: 70 GB
      • Preprocessed Content: 500 GB
  • 61. IBM’s Watson possible implications • If the computer understands my speech, why do I need a keyboard? • If the computer can talk, why do I need a screen? • If the computer understands semantics and can act with its own reasoning – why do you need me?!
  • 62. Major paradigm shifts -mini agenda • Why don’t we see a change when it is coming? • Big Data and programming models • The changing end user devices ecosystem • Infrastructure as Code and DEVOPS Source: http://www.b2binbound.com/blog/?Tag=paradigm%20shift
  • 63. Mega-trend #1 of 21st century CONSUMERIZATION: empowerment of people collaborating via connected mobile devices
  • 64. User Interface Revolution – Touch / Sound(Voice) / Move Era
  • 65. 2012: Sound/Voice is in
  • 66. 2012: Face recognition is in
  • 67. Desktop and Mobile ecosystems begin to converge • “BYOD: bring your own device": employees asserting control over the technology they use for work • 4 devices per employee?!
  • 68. Four screens of convergence: TV, PC, mobile and in-car • We want to be connected 7x24 • Each of these screens is useful during our day and each is connected to the 'cloud' • IT should allow us to use the same business (IT supports ALL) and entertainment applications
  • 69. Can IT support all devices? • Employees will use as many computers and mobile devices as they wish. • Automatically keep their data in sync with a backup copy. • Solutions should be enterprise class: • secure • reliable • maintainable • integrated with critical back-office systems
  • 70. What about Productivity Software for non-Wintel machines? Office 2015 ARM W8
  • 71. Israel (expected end 2012) • Wintel, Q4 2011 compared to Q4 2010: Desktop PCs: -25%, Notebooks: -35%
  • 72. Client/server v2
      • Terminals V1: always connected; I/O only at the local
      • Client/Server V1: 2 types of applications: 1. Off-line: processing and storage local 2. Always connected: data and processing @server; GUI++ @client
      • WEB/Browser client (Terminals V2): 2 types of applications: 1. Off-line: processing and storage local 2. Always connected: browser based applications
      • Client/Server V2: 1. Most apps work on/off line 2. Most of the time connected 3. Uses cloud/local applications
      • ADVANCES/COST: 1. Communications/networking 2. Processor/storage 3. Power/battery
      Picture Source: http://sthvcarringtonmedia.blogspot.com/2011/02/emotions.html
  • 73. Windows on ARM (WOA – Windows on ARM)
      Feature: Windows 8 x86/64 vs. Windows 8 on ARM
      • Device Branding: x86/64: Such devices would be branded as x86/64 ones. ARM: These would also be branded as ARM.
      • Old Windows 7 Things: x86/64: Everything that runs on Windows 7 would run on these platforms. ARM: Only selective things would be runnable.
      • Virtualization: x86/64: Yes, if hardware supports it. ARM: Not supported.
      • Turn on/off options: x86/64: Yes, on all devices. ARM: No, devices would keep running on Connected Standby power mode.
      • App Development: x86/64: Yes, many tools are available. ARM: Yes, but with selective tools only, which are not yet available.
      • Availability: x86/64: All the sources from where Windows 7 is available, e.g. online, DVD/CD and PCs etc. ARM: Would be available only in ARM devices. No DVDs or online availability.
      • Driver availability: x86/64: From respective company’s site, DVD/CDs and through Windows Update. ARM: Only through Windows Update.
      • Maintenance, e.g. Updates and Other Fixes: x86/64: Through Windows Disks and Windows Update. ARM: Only through Windows Update.
      • Uniqueness: x86/64: Any source would run on a wide variety of devices. ARM: Each source is unique to a unique device.
      Source: http://lenzfire.com/2011/12/future-of-pc-is-soon-to-be-woa-windows-on-arm-than-to-wintel-85094/
  • 74. Microsoft is fighting back
      Win8 tablets/phones are: • Easier to manage/secure from an enterprise perspective • Easier to synchronize with enterprise data • Easier to enable enterprise applications (on Intel based devices) • Microsoft hopes to “Bring Your Enterprise to Home” (BYEH)
      However: • Microsoft starts from scratch in this market • The “influencers” already are heavy users, mainly of “stylish Apple” • There are strong forces within Microsoft to enable business applications on other platforms (Office on iPad, Android..)
      Will Microsoft’s “hidden” dream of “IT enabling only Microsoft tablets and phones accessing mail/enterprise apps” come true?!
  • 75. A new era. We had it before: Source: http://www.socialtechpop.com/2010/10/old-vs-new-trends-in-social-media/
  • 76. And the new era will look like: Source: http://www.mobilemag.com/2011/01/06/samsungs-hybrid-sliding-pc-7-series-tabletnotebook-thingy/ • Computing as we know it today • Change at the device/UX level and change at the application level - mobility
  • 77. New Era: IT can no longer dictate a single device • Looks like the dominance of Microsoft on Intel with C/S or WEB app is over! • The new general purpose application architecture will support: • Data stored in a cloud and in local devices (appropriate formats per each device). • Data synchronization with conflict resolution between data instances • Continuous transaction processing between different devices = mobility • Different interfaces to the same application (mainly APPS but also browser based) • Application code is native or hybrid for each device • Offline work (read with update) • Automatic SW update • Voice • Face recognition • AI reasoning
  • 78. Major paradigm shifts -mini agenda • Why don’t we see a change when it is coming? • Big Data and programming models • The changing end users devices ecosystem • Infrastructure as Code and DEVOPS Source: http://www.b2binbound.com/blog/?Tag=paradigm%20shift
  • 79. Infrastructure as code • Treat your infrastructure as code: • Analyze/Design • Develop (the automation scripts) • Prepare the Build • Test • Deploy the Build • That means – no more manual configurations • Automatic testing – not only at the apps level • Also – be sure that what is not in the build will not be installed • Is that possible in the current landscape?!
  • 80. Some SW definitions: • Software build - the process of converting source code files into standalone software artifact(s) that can be run on a computer. One of the most important steps of a software build is the compilation process where source code files are converted into executable code. • Build automation is the act of automating a wide variety of tasks that software developers do in their day-to-day activities including things like: • compiling computer source code into binary code • packaging binary code • running tests • deployment to production systems Source: Wiki, with STKI modifications
  • 81. Infrastructure as code • This will enable frequent changes in production • A 180-degree change from the current “versions” policy! Source: wiki
  • 82. Opscode - Chef • With Chef, you write abstract definitions as source code to describe how you want each part of your infrastructure to be built, and then apply those descriptions to individual servers. • The result is a fully automated infrastructure: when a new server comes on line, the only thing you have to do is tell Chef what role it should play in your architecture. Source: opscode
  • 83. Opscode’s Chef • Chef agent assures that the desired configuration is installed! • All install files/scripts are located in a central repository (Chef Server) in CouchDB • Tracing what was successful and what was not • Documentation of everything • Major components: Cookbooks, Recipes, Knife, Shef • Pull model (cannot control when components are installed) • Ruby scripting language
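The "desired configuration" idea behind the Chef slides can be sketched as a small convergence loop. This is an illustration in Python under simplifying assumptions, not Chef's API (Chef recipes are written in Ruby); the `converge` function and the package dictionaries are hypothetical.

```python
def converge(desired: dict, actual: dict) -> list:
    """Bring `actual` in line with `desired` and return the actions taken.

    Both arguments map a package name to a version string. `actual` is
    modified in place, standing in for the state of a real server.
    """
    actions = []
    # Install or upgrade whatever differs from the desired state.
    for name, version in sorted(desired.items()):
        if actual.get(name) != version:
            actions.append(f"install {name}={version}")
            actual[name] = version
    # Remove anything that is not part of the build.
    for name in sorted(set(actual) - set(desired)):
        actions.append(f"remove {name}")
        del actual[name]
    return actions
```

Running `converge` a second time against the same desired state returns an empty list, which is the property that makes it safe for an agent to pull and apply the configuration on every check-in.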
  • 84. Devops – Development and Operations • Addresses the conflict between Development and Operations: • Development – is paid for change • Operations – change is the enemy! • “Wall of Confusion” - combination of conflicting motivations, processes, and tooling Source: http://dev2ops.org/blog/2010/2/22/what-is-devops.html
  • 85. Devops – Development from Mars, Operations from Venus • Development and Operations are in different organizational entities and use different tools Source: http://dev2ops.org/blog/2010/2/22/what-is-devops.html
  • 86. Deployment/Release time is trouble time • Development kicks things off by "tossing" a software release "over the wall" to Operations. • Operations also hand-edit configuration files to reflect the production environment, which is significantly different from the Development or QA environments. • At best they are duplicating work that was already done in previous environments; at worst they are about to introduce or uncover new bugs. Source: http://dev2ops.org/blog/2010/2/22/what-is-devops.html
  • 87. Devops – new state of mind Source: http://dev2ops.org/blog/2010/2/22/what-is-devops.html
  • 88. Devops aims at: • DevOps enables the benefits of Agile development to be felt at the organizational level. DevOps does this by allowing for fast and responsive, yet stable, operations that can be kept in sync with the pace of innovation coming out of the development process. Source: http://dev2ops.org/blog/2010/2/22/what-is-devops.html http://en.wikipedia.org/wiki/File:Devops.png
  • 89. DevOps Addresses Challenges • DevOps is an operational approach that automates system configuration and management. • To manage cloud systems, customers • Need to manage servers as groups • Must respond to rapid infrastructure changes • Need repeatable automated deployments
  • 90. Striving towards Devops state of mind: • Measurement and incentives to change culture - metrics based on joint performance • Unified processes • Unified tooling
  • 91. Devops Measurement • Resource Utilization - How resources are allocated and how efficiently they are used. Usually we're talking about people, but other kinds of resources can fall into this bucket as well. • How much time do developers and administrators spend on build and deployment activity? • How much productivity is lost to problems and bottlenecks? What is the ripple effect of that? • What’s the ratio of ad-hoc change or service recovery activity to planned change? • What’s the cost of moving a unit of change through your lifecycle? • What's the mean time to diagnose a service outage? Mean time to repair? • What was the true cost of each build or deployment problem (resource and schedule impact)? • What percentage of Development driven changes require Operations to edit/change procedures or edit/change automation? • How much management time is spent dealing with build and deployment problems or change management overhead? • Can Development and QA successfully deploy their own environments? How long does it take per deployment? • How much of your team’s time is spent recreating and maintaining software infrastructure that already exists elsewhere? Source: http://dev2ops.org/blog/2010/1/21/how-to-measure-the-impact-of-it-operations-on-your-business.html
  • 92. Devops Measurement • Operations Throughput - The volume and rate at which change moves through your development to operations pipeline. • How long does it take to get a release from development, through testing, and into production? • How much of that is actual testing time, deployment time, handoff time, or waiting? • How many releases can you successfully deploy per period? • How many successful individual change requests can your operations team handle per period? • Are any build and deployment activities the rate limiting step of your application lifecycle? How does that limit impact your business? • How many simultaneous changes can your team safely handle? • What is the business' perceived “wait time” from code completion to production deployment of a feature? Source: http://dev2ops.org/blog/2010/1/21/how-to-measure-the-impact-of-it-operations-on-your-business.html
  • 93. Devops Measurement • Agility - This looks at how quickly and efficiently your IT operations can react to changes in the needs of your business. • How quickly can you scale up or scale down capacity to meet changing business demands? • What’s the change management overhead associated with increasing/decreasing capacity? What’s the risk? • How quickly, and at what cost, could you adapt your build and deployment systems to automate any new applications or acquired business lines? • What would it cost you to handle an x% growth in the number of applications or business lines (direct resource assignment plus any attention drain from other staff)? • Could your IT operations handle an x% growth in the number of applications or business lines? (i.e. could it even be done?) Source: http://dev2ops.org/blog/2010/1/21/how-to-measure-the-impact-of-it-operations-on-your-business.html
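A few of the questions above (mean time to repair, lead time from code completion to production, ratio of ad-hoc to planned change) reduce to simple arithmetic once the events are recorded. A minimal Python sketch, assuming a hypothetical record format of (start, end) timestamp pairs and change labels of "adhoc" or "planned":

```python
from datetime import datetime, timedelta


def mean_duration(pairs):
    """Average of (end - start) over a list of (start, end) datetime pairs.

    Feed it outage windows to get mean time to repair, or
    (code complete, deployed) pairs to get mean lead time.
    """
    if not pairs:
        raise ValueError("no records to average")
    total = sum((end - start for start, end in pairs), timedelta())
    return total / len(pairs)


def adhoc_ratio(changes):
    """Ratio of ad-hoc changes to planned changes."""
    adhoc = sum(1 for kind in changes if kind == "adhoc")
    planned = sum(1 for kind in changes if kind == "planned")
    if planned == 0:
        raise ValueError("no planned changes recorded")
    return adhoc / planned
```

The hard part in practice is collecting the timestamps consistently across Development and Operations; the calculation itself is the easy part.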
  • 94. Architecture Concepts related to Devops • Devops is related to several technology architecture concepts and guidelines: • Build an application “as stateless as” and “as shared nothing as” possible • Try to have as little “technical debt” as possible (bugs that are in production, patches that are not installed, unsupported sw/hw, etc.) • Build an application with the ability to “turn off” some of its functionality while on air • Expanding transaction versions vs. modifying or updating a transaction (enables roll back and working concurrently in several versions)
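The ability to “turn off” functionality while on air is commonly implemented with feature toggles. A minimal Python sketch of the idea; the `FeatureToggles` class, the `checkout` function and the flag name are all hypothetical, and a real system would read the flags from shared configuration rather than from memory.

```python
class FeatureToggles:
    """In-memory feature flags that can be flipped while the app is running."""

    def __init__(self, **flags):
        self._flags = dict(flags)

    def is_on(self, name: str) -> bool:
        # Unknown features default to off.
        return self._flags.get(name, False)

    def turn_off(self, name: str) -> None:
        self._flags[name] = False

    def turn_on(self, name: str) -> None:
        self._flags[name] = True


def checkout(toggles: FeatureToggles, cart_total: float) -> dict:
    """Core flow always runs; the optional part is guarded by a toggle."""
    result = {"total": cart_total}
    if toggles.is_on("recommendations"):
        result["recommendations"] = ["hypothetical item"]
    return result
```

If the recommendations feature misbehaves in production, operations can call `turn_off("recommendations")` and the core checkout keeps working without a new deployment.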
  • 95. Devops tools: Source: http://doc36.controltier.org/wiki/File:ProvisioningToolchain.png
  • 96. Devops vs. Private Cloud? • In many aspects the objectives of Devops and Private Cloud overlap • Automation is at the core of both Private Cloud and Devops Source: http://www.pistoncloud.com/2012/01/devops-and-private-cloud-sitting-in-a-tree/
  • 97. Some input from last year’s presentation • Public cloud Source: IDC https://www.eiseverywhere.com/file_uploads/7e2edb16ed28a2123cd21508f87be8b2_ITR_Boston_2011_Public_and_Private_Cloud_Track_RickVillars_IDC.pdf
  • 98. Summary – Major paradigm shifts • Remember Digital Equipment Corporation (DEC). “Underdogs become mainstream faster than we think”. Change is crucial • Embrace big data experiments • Embrace Devops concepts – metrics, process and tools. Start with metrics • Devops tools might be our current configuration, CMDB, tools. • Embrace at least one SaaS application now (Email, Service desk, HR, ERP, CRM, etc.). Also IaaS, PaaS. • Standardization with processes. [Diagram labels: Technologies, Processes, Standardization]
  • 99. STKI Round Tables • Lots of useful information – use it!
  • 100. STKI Round Tables
  • 101. We will present data on products and vendors:
      1. Israeli vendors rating – state of the current market, focused on the enterprise market (not SMB)
          • X – Market penetration (sales + installed base + clients perspective)
          • Y – is X plus localization, support, development center, number and kind of integrators, etc.
          • Worldwide leaders marked, based on global positioning
          • Vendors to watch: are only just entering the Israeli market or making a big change, so can’t be positioned but should be watched
          • Represents the current Israeli market and not necessarily what we recommend to our clients
      2. Products and selected resellers / implementers
          • The location within the list is random
  • 102. We will present data on products and vendors (cont.)
      3. Selected installations of products – projects in different stages: production, implementation, after decision…
      4. Service providers that are used by users. I asked users “which SI do you use in this category” and counted the results.
      5. Analysis by international and Israeli analysts
          • This complete information (1 to 5) should be used together, combined with the specific circumstances of each case, when making a decision
      This subjective chart is the result of our objective research
  • 104. Ratio Analysis:
      • Metric values, sorted: 36, 43, 50, 50, 57, 60, 60, 71, 100, 100, 109, 117, 117, 120, 120, 125, 125, 143, 150, 164, 175, 188, 192, 200, 250, 280, 438, 600
      • 25% percentile: 68.6
      • 50% percentile = median: 120.0
      • 75% percentile: 178.1
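The quartile calculation on this slide can be reproduced approximately with the Python standard library. This is a sketch under assumptions: the values are the ones read from the slide's sorted column, and the interpolation method (`inclusive`) is a guess, so the results may differ slightly from the slide's 68.6 and 178.1, for example if the original figures were rounded for display.

```python
import statistics

# Metric values as read from the slide's sorted column (28 values).
metric = [36, 43, 50, 50, 57, 60, 60, 71, 100, 100, 109, 117, 117, 120,
          120, 125, 125, 143, 150, 164, 175, 188, 192, 200, 250, 280, 438, 600]


def quartiles(values):
    """Return the 25th, 50th and 75th percentiles using linear interpolation."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, median, q3


if __name__ == "__main__":
    print(quartiles(metric))
```

Reporting the quartiles rather than the mean keeps outliers such as 438 and 600 from dominating the ratio.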
  • 105. Agenda • Major paradigm shifts • Development and SOA • ESM BSM CMDB • DBMS and DATA • Platforms – Servers • Clients • Storage Source: http://astonguild.org.uk/files/NEW_MENU_FRONT_RGB%5B1%5D.jpg

Editor's Notes

  1. DeepQA generates and scores many hypotheses using an extensible collection of Natural Language Processing, Machine Learning and Reasoning Algorithms. These gather and weigh evidence over both unstructured and structured content to determine the answer with the best confidence.

     Watson, the computer system we developed to play Jeopardy!, is based on the DeepQA software architecture. Here is a look at the DeepQA architecture. This is like looking inside the brain of the Watson system from about 30,000 feet high.

     Remember, the intended meaning of natural language is ambiguous, tacit and highly contextual. The computer needs to consider many possible meanings, attempting to find the evidence and inference paths that are most confidently supported by the data.

     So, the primary computational principle supported by the DeepQA architecture is to assume and pursue multiple interpretations of the question, to generate many plausible answers or hypotheses, and to collect and evaluate many different competing evidence paths that might support or refute those hypotheses. Each component in the system adds assumptions about what the question might mean, what the content means, what the answer might be, or why it might be correct. DeepQA is implemented as an extensible architecture and was designed at the outset to support interoperability.

     [UIMA mention] For this reason it was implemented using UIMA, a framework and OASIS standard for interoperable text and multi-modal analysis contributed by IBM to the open-source community. Over 100 different algorithms, implemented as UIMA components, were integrated into this architecture to build Watson.

     In the first step, Question and Category Analysis, parsing algorithms decompose the question into its grammatical components. Other algorithms here will identify and tag specific semantic entities like names, places or dates. In particular, the type of thing being asked for, if it is indicated at all, will be identified. We call this the LAT or Lexical Answer Type, like this “FISH”, this “CHARACTER” or “COUNTRY”.

     In Query Decomposition, different assumptions are made about if and how the question might be decomposed into sub-questions. The original and each identified sub-part follow parallel paths through the system.

     In Hypothesis Generation, DeepQA does a variety of very broad searches for each of several interpretations of the question. Note that Watson, to compete on Jeopardy!, is not connected to the internet. These searches are performed over a combination of unstructured data (natural language documents) and structured data (available databases and knowledge bases) fed to Watson during training. The goal of this step is to generate possible answers to the question and/or its sub-parts. At this point there is very little confidence in these possible answers, since little intelligence has been applied to understanding the content that might relate to the question. The focus at this point is on generating a broad set of hypotheses or, for this application, what we call “Candidate Answers”. To implement this step for Watson we integrated and advanced multiple open-source text and KB search components.

     After candidate generation DeepQA also performs Soft Filtering, where it makes parameterized judgments about which and how many candidate answers are most likely worth investing more computation in, given specific constraints on time and available hardware. Based on a trained threshold for optimizing the tradeoff between accuracy and speed, Soft Filtering uses different light-weight algorithms to judge which candidates are worth gathering evidence for and which should get less attention and continue through the computation as-is. In contrast, if this were a hard filter, those candidates falling below the threshold would be eliminated from consideration entirely at this point.

     In Hypothesis & Evidence Scoring, the candidate answers are first scored independently of any additional evidence by deeper analysis algorithms. This may, for example, include Typing Algorithms. These are algorithms that produce a score indicating how likely it is that a candidate answer is an instance of the Lexical Answer Type determined in the first step, for example Country, Agent, Character, City, Slogan, Book, etc. Many of these algorithms may fire, using different resources and techniques to come up with a score. What is the likelihood that “Washington”, for example, refers to a “General” or a “Capital” or a “State” or a “Mountain” or a “Father” or a “Founder”?

     For each candidate answer, many pieces of additional evidence are searched for. Each of these pieces of evidence is subjected to more algorithms that deeply analyze the evidentiary passages and score the likelihood that the passage supports or refutes the correctness of the candidate answer. These algorithms may consider variations in grammatical structure, word usage, and meaning.

     In the Synthesis step, if the question had been decomposed into sub-parts, one or more synthesis algorithms will fire. They will apply methods for inferring a coherent final answer from the constituent elements derived from the question's sub-parts.

     Finally, arriving at the last step, Final Merging and Ranking, are many possible answers, each paired with many pieces of evidence, and each of these scored by many algorithms to produce hundreds of feature scores, all giving some evidence for the correctness of each candidate answer. Trained models are applied to weigh the relative importance of these feature scores. These models are trained with ML methods to predict, based on past performance, how best to combine all these scores to produce a final, single confidence number for each candidate answer and to produce the final ranking of all candidates. The answer with the strongest confidence would be Watson’s final answer, and Watson would try to buzz in provided that the top answer’s confidence was above a certain threshold.

     The DeepQA system defers commitments and carries possibilities through the entire process while searching for increasingly broader contextual evidence and more credible inferences to support the most likely candidate answers. All the algorithms used to interpret questions, generate candidate answers, score answers, collect evidence and score evidence are loosely coupled but work holistically by virtue of DeepQA’s pervasive machine learning infrastructure.

     No one component could realize its impact on end-to-end performance without being integrated and trained with the other components, AND they are all evolving simultaneously. In fact, what had a 10% impact on some metric one day might, one month later, contribute only 2% to overall performance due to evolving component algorithms and interactions. This is why the system, as it develops, is regularly trained and retrained.

     DeepQA is a complex system architecture designed to extensibly deal with the challenges of natural language processing applications and to adapt to new domains of knowledge. The Jeopardy! Challenge has greatly inspired its design and implementation for the Watson system.
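The Final Merging and Ranking step described in these notes can be illustrated with a toy model. This is a sketch only: the notes say trained models weigh the feature scores but do not specify their form, so the logistic combination, the feature names, and the weights below are all assumptions made for illustration.

```python
import math


def confidence(features: dict, weights: dict, bias: float = 0.0) -> float:
    """Combine feature scores into one confidence value in (0, 1).

    Assumes a logistic model; features without a weight are ignored.
    """
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank(candidates: dict, weights: dict, bias: float = 0.0) -> list:
    """Return (confidence, answer) pairs sorted from most to least confident."""
    scored = [(confidence(feats, weights, bias), answer)
              for answer, feats in candidates.items()]
    return sorted(scored, reverse=True)


def should_buzz(ranked: list, threshold: float) -> bool:
    """Buzz in only when the top answer's confidence clears the threshold."""
    return bool(ranked) and ranked[0][0] >= threshold
```

In the real system the weights would come from training on past questions, which is why the notes stress that the whole pipeline is regularly retrained as components change.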