STKI Summit 2012 infra v7 - major trends - paradigm shifts

Trends in
Infrastructure:
Paradigm Shifts

Tell me and I’ll forget
Show me and I may remember
Involve me and I’ll

STKI Summit 2012
Pini Cohen
VP and Senior Analyst

What do we do?

Pini Cohen’s work Copyright STKI@2012
Do not remove source or attribution from any slide or graph

Agenda

Major paradigm shifts
Development and SOA
ESM BSM CMDB
DBMS and DATA
Platforms – Servers
Clients
Storage
Source: http://astonguild.org.uk/files/NEW_MENU_FRONT_RGB%5B1%5D.jpg


Major paradigm shifts - mini agenda

• Why don’t we see a change when it is coming?
• Big Data and programming models
• The changing end users devices ecosystem
• Infrastructure as Code and DEVOPS

Source: http://www.b2binbound.com/blog/?Tag=paradigm%20shift


Managers’ Dilemma

• Bingo! My product is a mainstream product (quartiles 2 and 3).
• Now, should I invest in quartile 1 or 4?
• Most managers will invest in quartile 4

[Figure: Prof. Clayton Christensen: Disruptive Innovation Model. Labels: quality required by customers (improving gradually); new product category; percentage; T1, T2.]

Source of pic: http://www.buat-nadlan.com/2011/11/blog-post_3065.html

Remember Digital Equipment Corporation (DEC). “Underdogs become mainstream faster than we think”. Change toward what looks like “non-mature” areas is crucial.

Last year my theme was “The Gap”


Major paradigm shifts - mini agenda

• Infrastructure as Code and Devops



Big Data Definition – 4 V’s (or more…)

• Volume – tens of TBs and more (15-20TB+)
• Velocity – the speed at which data is added (10M items per hour and more), and the speed at which the data needs to be processed
• Variety – different types of data, structured & unstructured. In many cases this involves the Internet of Things and social media, but also voice, video, etc.
• Variability – the ability to cope with new attributes and changing data types without interrupting the analytical process (without “import-export”)
• Other optional V’s – validity, volatility, viscosity (resistance to flow), etc.
Source: http://www.computerweekly.com/blogs/cwdn/2011/11/datas-main-drivers-volume-velocity-variety-and-variability.html


The origins of the 3V’s:

• 2002 research by Doug Laney from META Group (now
Gartner):


“Big Data” theme main current usage:

• “Big Data” is just marketing jargon. – Doug Laney, Gartner
Source: http://www.computerweekly.com/blogs/cwdn/2011/11/datas-main-drivers-volume-velocity-variety-and-variability.html
• STKI: doing something significantly different from what you’ve done until now

Source: http://winnbadisa.com/wp-content/uploads/2011/12/marketing-career-cloud.jpg


Big Data at work:

• Orbitz Worldwide has collected 750 terabytes of unstructured data on its consumers’ behavior – detailed information from customer online visits and browsing sessions. Using Hadoop, models have been developed to improve search results and tailor the user experience based on everything from location, interest in family travel versus solo travel, and even the kind of device being used to explore travel options.
• The result? To date, a 7% increase in interaction rate, 37% growth in stickiness of sessions and a net 2.6% increase in booking path engagement.

Source: http://www.deloitte.com/assets/Dcom-UnitedStates/Local%20Assets/Documents/us_cons_techtrends2012_013112.pdf


Example network flow data (possible use – Cyber)

• A huge amount of flow data
• Long-term collection of flow data
Flow data in our campus network (/16 prefix):

# of Routers | 1 Day  | 1 Month | 1 Year
1            | 1.2 GB | 13 GB   | 156 GB
5            | 6 GB   | 65 GB   | 780 GB
10           | 12 GB  | 130 GB  | 1.5 TB
200          | 240 GB | 2.6 TB  | 30 TB

• Short-term period of flow data
• Massive flow data from anomaly traffic data of Internet worm and DDoS

• Cluster file system and cloud computing platform
• Google’s programming model, MapReduce, BigTable [8]
• Open-source system, Hadoop [9]
Source: www.caida.org/workshops/.../wide-casfi1004_wkang.ppt STKI modifications


DW appliances will be discussed later

Teradata, EMC Greenplum, Oracle Exadata, Microsoft Parallel Data Warehouse

Source: http://www.asugnews.com/2011/09/06/inside-saps-product-naming-strategies/

Several parts of the paradigm change – elements and concepts:

• Storing data for analytics (mainly):
• HDFS – Hadoop Distributed File System
• MapReduce – a programming method, mainly for analytics
• Other add-ons: Pig, Hive, JAQL (IBM)
• Storing and retrieving data – DBMS:
• NoSQL DBMS (not only SQL):
• Cassandra
• MongoDB
• CouchDB
• HBase
• New ways of manipulating and analyzing all kinds of data, with new algorithms. Example: how do we get a specific lead from a Facebook status such as “I wish I could see Messi next month in London”? Not discussed in this presentation (see Einat’s presentation).

Who Uses Hadoop?

• Amazon/A9
• AOL
• Facebook
• Fox Interactive Media
• Netflix
• New York Times
• PowerSet (now Microsoft)
• Quantcast
• Rackspace/Mailtrust
• Veoh
• Yahoo!

More at http://wiki.apache.org/hadoop/PoweredBy


Who Uses Cassandra?

• Facebook
• Digg
• Despegar
• Ooyala
• Imagini
• SimpleGeo
• Rackspace
• Shazam
• SoftwareProjects


Big Data technologies (Hadoop etc.) vs. traditional IT

Traditional IT | Big Data
Centralized storage | Local storage
Brand-name redundant servers | Cheap HW, white boxes
Standard infrastructure and virtual servers | Is standardization needed?! (at the HW level). No server virtualization.
Well-established backup and DRP procedures | Why do I need backup? How do I tackle DRP (compute clusters that are stretched over locations)?
Traditional vendors | Open source solutions
Mature products and procedures | A new patch for a specific issue sometimes says “not implemented yet”
Traditional programming, SQL | A different kind of programming (MapReduce), no joins

Will Big Data infrastructure be part of the existing infrastructure, or will it be developed as a new domain?

The Basic Concept – the Internet

• Think Distributed
• Think Parallel

Source: http://retedeicittadini.it/wp-content/uploads/2011/02/network-distributed.gif Source: http://www.catonmat.net/blog/mit-introduction-to-algorithms-


New type of scale:

• Hadoop:
• Up to 4,000 machines in a cluster
• Up to 20 PB in a cluster
• Currently, traditional IT technologies cannot handle this kind of scale.
• This scale comes with a cost!

Source: http://www.techsangam.com/wp-content/uploads/2012/01/i_love_scalability_mug.jpg


Brewer's (CAP) Theorem

• It is impossible for a distributed computer system to
simultaneously provide all three of the following
guarantees:
• Consistency (all nodes see the same data at the same time)
• Availability (node failures do not prevent survivors from
continuing to operate)
• Partition Tolerance (the system continues to operate despite arbitrary message loss, even when the network is split into partitions)

Source: Scalebase STKI modifications

Professor Eric A. Brewer

Dealing With CAP

• Drop Consistency
• Welcome to the “Eventually Consistent” term.
• In the end, everything will work out just fine, and sometimes this is a good enough solution
• When no updates occur for a long period of time, eventually all
updates will propagate through the system and all the nodes will
be consistent
• For a given accepted update and a given node, eventually either
the update reaches the node or the node is removed from service
• Known as BASE (Basically Available, Soft state, Eventual
consistency), as opposed to ACID

Source: Scalebase
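The two eventual-consistency properties above (with no new updates, every accepted update eventually reaches every node) can be seen in a small simulation. This is a minimal sketch, not how any particular NoSQL product works: it assumes last-writer-wins by timestamp as the conflict rule and a simple pairwise "anti-entropy" exchange between replicas, and the class and function names are hypothetical.

```python
from itertools import permutations


class Replica:
    """One node's copy of the data: key -> (timestamp, value)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, ts):
        # Last-writer-wins: accept the update only if it is newer than what we hold.
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(a, b):
    """Exchange all entries between two replicas; both end up with the newest versions."""
    for key, (ts, value) in list(a.data.items()):
        b.write(key, value, ts)
    for key, (ts, value) in list(b.data.items()):
        a.write(key, value, ts)


def converged(replicas):
    return all(r.data == replicas[0].data for r in replicas)


nodes = [Replica("A"), Replica("B"), Replica("C")]
# Two updates accepted by different nodes: for a while the replicas disagree.
nodes[0].write("user:1", "Tel Aviv", ts=1)
nodes[2].write("user:1", "Haifa", ts=2)
print([n.read("user:1") for n in nodes])  # ['Tel Aviv', None, 'Haifa']

# No new updates arrive; pairwise exchanges propagate the latest value everywhere.
for a, b in permutations(nodes, 2):
    anti_entropy(a, b)
print([n.read("user:1") for n in nodes], converged(nodes))  # ['Haifa', 'Haifa', 'Haifa'] True
```

Between the two prints any node may return a stale value or nothing at all; that window is exactly what "eventually consistent" allows.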

Hadoop

• Apache Hadoop is a software framework that supports
data-intensive distributed applications
• It enables applications to work with thousands of nodes
and petabytes of data.
• Hadoop was inspired by Google's MapReduce and Google
File System (GFS) papers
• Contains (basically):
• HDFS – Hadoop Distributed File System
• MapReduce programming model


HDFS – Hadoop Distributed File System

• Parallel
• Distributed on commodity elements
• Throughput over latency
• Reliable and self-healing
• Built for large scale – a typical file is gigabytes to terabytes (for one file!)
• Applications need a write-once-read-many access model (mainly analytics)


HDFS motivation

• What if you needed to write a program that distributes
data on commodity HW (PCs or servers)? You would need
to take care of:
• Where is the data located
• How to distribute data between the nodes
• How many times you want to replicate the data
• How to insert, select and update data
• What to do if one node or more fails
• How to add node or to take out a node
• Manage and monitor the environment
• The Hadoop Distributed File System does it for you!


HDFS: Hadoop Distributed File System

• Client requests meta data about a file from namenode
• Data is served directly from datanode

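[Figure: HDFS read path. The application’s HDFS client sends (file name, block id) to the HDFS namenode, which holds the file namespace (e.g. /user/css534/input, block 3df2) and replies with (block id, block location). The client then sends (block id, byte range) directly to an HDFS datanode, which returns the block data. Each datanode stores its blocks on the Linux local file system; the namenode sends instructions to the datanodes and receives their state.]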

source: http://www.google.co.il/url?sa=t&rct=j&q=Rob+Jordan++Chris+Livdahl+hadoop+filetype%3Apptx&source=web&cd=1&ved=0CCIQFjAA
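The two-step read (metadata from the namenode, bytes from a datanode) can be sketched with in-memory objects. The real HDFS client is a Java library; the classes `DataNode`, `NameNode`, `HdfsClient` and the block ids `blk_1`/`blk_2` below are hypothetical, and only the flow itself comes from the slide.

```python
class DataNode:
    """Holds block replicas; a dict stands in for the node's Linux local file system."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> bytes

    def read_block(self, block_id, start=0, end=None):
        # Data is served directly by the datanode for the requested byte range.
        return self.blocks[block_id][start:end]


class NameNode:
    """Keeps only metadata: the file namespace and where each block lives."""

    def __init__(self):
        self.namespace = {}  # file name -> ordered list of block ids
        self.locations = {}  # block id -> names of datanodes holding a replica

    def get_block_locations(self, path):
        return [(block_id, self.locations[block_id]) for block_id in self.namespace[path]]


class HdfsClient:
    def __init__(self, namenode, datanodes):
        self.namenode = namenode
        self.datanodes = {d.name: d for d in datanodes}

    def read(self, path):
        parts = []
        # Step 1: ask the namenode for (block id, block locations).
        for block_id, holders in self.namenode.get_block_locations(path):
            # Step 2: read the block itself from a datanode (here simply the first replica).
            parts.append(self.datanodes[holders[0]].read_block(block_id))
        return b"".join(parts)


dn_a, dn_b = DataNode("A"), DataNode("B")
dn_a.blocks["blk_1"] = dn_b.blocks["blk_1"] = b"Hello World "
dn_b.blocks["blk_2"] = b"Bye World"

nn = NameNode()
nn.namespace["/user/css534/input"] = ["blk_1", "blk_2"]
nn.locations = {"blk_1": ["A", "B"], "blk_2": ["B"]}

client = HdfsClient(nn, [dn_a, dn_b])
print(client.read("/user/css534/input"))  # b'Hello World Bye World'
```

Keeping file contents out of the namenode is what lets reads scale with the number of datanodes, and it is also why the namenode's memory limits the number of files.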


Datanode Blockreports

File “part-0” is replicated twice and saved in blocks 1 and 3 (the file is big, so it has to be divided into 2 blocks).

Block 1 is on datanodes A and C.

source: http://www.google.co.il/url?sa=t&rct=j&q=Rob+Jordan++Chris+Livdahl+hadoop+filetype%3Apptx&source=web&cd=1&ved=0CCIQFjAA
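The block-report idea can be sketched as: a file is cut into fixed-size blocks, each block is stored on RF datanodes, each datanode reports the blocks it holds, and the namenode inverts those reports into a block-to-datanodes map. Assumptions: a tiny block size (real HDFS blocks are tens of megabytes) and simple round-robin placement, so this example puts block 1 on A and B rather than A and C as on the slide; real HDFS uses rack-aware placement. All names are hypothetical.

```python
from collections import defaultdict


def split_into_blocks(data: bytes, block_size: int):
    """Cut a file into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(block_ids, datanodes, replication):
    """Round-robin placement: each block is stored on `replication` distinct datanodes."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of datanodes")
    stored = defaultdict(set)  # datanode -> block ids it holds (its block report)
    for i, block_id in enumerate(block_ids):
        for r in range(replication):
            stored[datanodes[(i + r) % len(datanodes)]].add(block_id)
    return stored


def build_block_map(block_reports):
    """Namenode side: invert the datanodes' block reports into block -> datanodes."""
    block_map = defaultdict(list)
    for datanode, blocks in sorted(block_reports.items()):
        for block_id in sorted(blocks):
            block_map[block_id].append(datanode)
    return dict(block_map)


blocks = split_into_blocks(b"x" * 100, block_size=64)  # a "big" file: 64 + 36 bytes
reports = place_replicas(["blk_1", "blk_3"], ["A", "B", "C"], replication=2)
print([len(b) for b in blocks])   # [64, 36]
print(build_block_map(reports))   # {'blk_1': ['A', 'B'], 'blk_3': ['B', 'C']}
```

Because the map is rebuilt from reports, the namenode learns about a lost datanode simply by its reports stopping, which is the basis of HDFS self-healing.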


HDFS basic limitations

• Namenode is a single point of failure
• Write-once model
• Support for appending writes is planned
• A namespace with an extremely large number of files exceeds the Namenode’s capacity to maintain it
• Cannot be mounted by an existing OS
• Getting data in and out is tedious
• HDFS does not implement/support user quotas or access permissions
• Data balancing schemes
• No periodic checkpoints


Map Reduce programming model

• In essence, it brings the program to the data
• Contains two elements:
• Map: this part of the job is performed in parallel and asynchronously by each node
• Reduce: gathers the results from the relevant nodes
• In more detail:
• Map: returns (writes to a temporary file) a list containing zero or more (k, v) pairs
• The output key can differ from the input key
• The output can contain the same key more than once
• Reduce: returns a new list of reduced output from its input


MapReduce motivation

• What if you needed to write a program that processes data
that’s on distributed computers?
• You would need to write distributed program that:
• Finds where the data is located
• Works on each node and then combines the results from all nodes
• Decides where (on the local node) and how (in what format) to write the intermediate results
• Finds when the jobs of all participating nodes have concluded and then starts the “aggregation” part
• Decides what to do if a job is stuck (restart the job or turn to another node to perform the same job)
• Hadoop MapReduce is the framework for you!


MapReduce example:

map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
        EmitIntermediate(w, "1");

reduce(String key, Iterator values):
    // key: a word
    // values: a list of counts
    int result = 0;
    for each v in values:
        result += ParseInt(v);
    Emit(AsString(result));
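The word-count pseudocode above can be made runnable in Python. This sketch runs map, the group-by-key shuffle and reduce in a single process instead of on a cluster, and emits integer counts rather than the strings used in the pseudocode; `run_mapreduce` is a hypothetical stand-in for the framework.

```python
from collections import defaultdict


def map_fn(doc_name, contents):
    # key: document name (unused), value: document contents
    for word in contents.split():
        yield word, 1


def reduce_fn(word, counts):
    # key: a word, values: the counts emitted for that word
    return word, sum(counts)


def run_mapreduce(documents, mapper, reducer):
    """Single-process stand-in for the framework: map, group by key (shuffle), reduce."""
    groups = defaultdict(list)
    for name, contents in documents.items():
        for key, value in mapper(name, contents):
            groups[key].append(value)
    return dict(reducer(k, groups[k]) for k in sorted(groups))


docs = {"block1": "Hello World Bye World", "block2": "Hello Hadoop Goodbye Hadoop"}
print(run_mapreduce(docs, map_fn, reduce_fn))
# {'Bye': 1, 'Goodbye': 1, 'Hadoop': 2, 'Hello': 2, 'World': 2}
```

The programmer writes only `map_fn` and `reduce_fn`; everything inside `run_mapreduce` is what Hadoop distributes, schedules and restarts on failure.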


Dataflow in Hadoop

[Figure: Job: Word Count. The job is submitted to the Master, which schedules map and reduce tasks on the worker nodes. All elements are standard HW.]

Source: Haifa Labs IBM

Dataflow in Hadoop

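[Figure: the input file in HDFS is split into Block 1 (“Hello World Bye World”) and Block 2 (“Hello Hadoop Goodbye Hadoop”). Each block is read by its own map task: Block 1 yields Hello 1, World 2, Bye 1; Block 2 yields Hello 1, Hadoop 2, Goodbye 1. The map outputs then go to the reduce tasks.]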
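The per-block counts in this figure (World 2 already leaving the first map task) look like local pre-aggregation, which Hadoop supports through an optional combiner. A Python sketch of what that step does to one map task's output, shrinking 4 pairs to 3 before they cross the network; the function names are hypothetical.

```python
from collections import Counter


def map_block(text):
    """Map task for one input block: emit (word, 1) for every word."""
    return [(word, 1) for word in text.split()]


def combine(pairs):
    """Combiner: pre-aggregate one map task's output locally, before it is shuffled."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


for block in ("Hello World Bye World", "Hello Hadoop Goodbye Hadoop"):
    raw = map_block(block)
    print(len(raw), "pairs ->", combine(raw))
# 4 pairs -> [('Bye', 1), ('Hello', 1), ('World', 2)]
# 4 pairs -> [('Goodbye', 1), ('Hadoop', 2), ('Hello', 1)]
```

This works for word count because summing is associative and commutative, so summing partial sums in the reducer gives the same answer.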


Dataflow in Hadoop

[Figure: each map task writes its output to its local FS and reports “Finished” together with the output location, so the reduce tasks know where to fetch it.]


Dataflow in Hadoop

[Figure: each reduce task fetches the relevant map output from the map nodes’ local FS via HTTP GET.]
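Before the HTTP GET step, each map output has to be split by reducer so that every occurrence of a key reaches the same reduce task. Hadoop's default partitioner hashes the key modulo the number of reduce tasks; the sketch below follows that idea but uses `zlib.crc32` instead of Java's `hashCode`, so the concrete bucket assignments are illustrative only. `partition` and `shuffle` are hypothetical names.

```python
import zlib
from collections import defaultdict


def partition(key: str, num_reducers: int) -> int:
    """Choose the reducer for a key; every occurrence of a key lands on the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(map_outputs, num_reducers):
    """Route each map task's (key, value) pairs to per-reducer buckets, grouped by key."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in map_outputs:
        for key, value in pairs:
            buckets[partition(key, num_reducers)][key].append(value)
    return buckets


map_outputs = [
    [("Hello", 1), ("World", 2), ("Bye", 1)],       # output of the map task on block 1
    [("Hello", 1), ("Hadoop", 2), ("Goodbye", 1)],  # output of the map task on block 2
]
for r, bucket in enumerate(shuffle(map_outputs, num_reducers=2)):
    print("reducer", r, {k: sum(v) for k, v in sorted(bucket.items())})
```

Whatever the hash, "Hello" from both blocks ends up in one bucket, which is what lets each reducer produce a final count without talking to the others.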


Dataflow in Hadoop

[Figure: the reduce tasks write the final answer to HDFS: Bye 1, Goodbye 1, Hadoop 2, Hello 2, World 2.]


Example: Flow Analysis Map/Reduce

• Read text flow files
• Run map tasks
• Read each line (validation check)
• Parse flow data
• Save results into temporary files as (key, value) pairs, e.g. (Dst Port, Flow Octet) = (53, 64) and (53, 128)
• Run reduce tasks
• Read temporary files as (Key, List[Value]), e.g. (53, [64, 128])
• Run sum process, e.g. (53, 192)
• Write results to a file
Source: www.caida.org/workshops/.../wide-casfi1004_wkang.ppt
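The flow-analysis job can be sketched the same way. The input format here is an assumption: one whitespace-separated record per line (source IP, destination IP, destination port, octets); real flow-tools exports look different. The map step validates and parses each line and emits (destination port, octets); the reduce step sums octets per port, reproducing the 53: [64, 128] to 192 example.

```python
from collections import defaultdict


def map_flow_line(line):
    """Map: validate one text flow record and emit (dst_port, octets)."""
    fields = line.split()
    if len(fields) != 4 or not fields[2].isdigit() or not fields[3].isdigit():
        return []  # validation check failed: skip malformed lines
    _src, _dst, dst_port, octets = fields
    return [(int(dst_port), int(octets))]


def reduce_flows(port, octet_list):
    """Reduce: total octets sent to one destination port."""
    return port, sum(octet_list)


def analyze(lines):
    grouped = defaultdict(list)  # the (key, list[value]) view the reducers read
    for line in lines:
        for port, octets in map_flow_line(line):
            grouped[port].append(octets)
    return dict(reduce_flows(p, v) for p, v in sorted(grouped.items()))


flows = [
    "10.0.0.1 10.0.0.53 53 64",
    "10.0.0.2 10.0.0.53 53 128",
    "10.0.0.3 10.0.0.80 80 1500",
    "garbage line",
]
print(analyze(flows))  # {53: 192, 80: 1500}
```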


Components of Cluster Node

[Figure: layered software stack of a cluster node, from hardware at the bottom to the flow analysis tools at the top]

• Flow file input processor
• Flow analysis map/reduce
• Flow-tools
• Hadoop:
• HDFS (cluster file system)
• MapReduce library
• Java VM
• OS: Linux
• Hardware (CPU, HDD, Memory, NIC)
Source: www.caida.org/workshops/.../wide-casfi1004_wkang.ppt


MapReduce helpers: Hive, Pig

• Make life easier – translate a friendlier language into MapReduce

Feature | Hive | Pig
Language | SQL-like | PigLatin
Schemas/Types | Yes (explicit) | Yes (implicit)
Partitions | Yes | No
Server | Optional (Thrift) | No
User Defined Functions (UDF) | Yes (Java) | Yes (Java)
Custom Serializer/Deserializer | Yes | Yes
DFS Direct Access | Yes (implicit) | Yes (explicit)
Streaming | Yes | Yes
Web Interface | Yes | No
JDBC/ODBC | Yes (limited) | No


Hive: MapReduce helper:

• Code example:
• hive> INSERT OVERWRITE TABLE events SELECT a.* FROM profiles a;
• hive> INSERT OVERWRITE TABLE events SELECT a.* FROM profiles a WHERE a.key < 100;
• hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/reg_3' SELECT a.* FROM events a;
• hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_4' select a.invites, a.pokes FROM profiles a;
• hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT COUNT(*) FROM invites a WHERE a.ds='2008-08-15';
• hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT a.foo, a.bar FROM invites a;
• hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/sum' SELECT SUM(a.pc) FROM pc1 a;
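To show what "translate a friendlier language into MapReduce" means, here is the COUNT(*) query above hand-translated into the two phases. It shows the idea only, not Hive's actual execution plan, and the `invites` rows are made-up sample data.

```python
def map_phase(rows, predicate):
    """Map: evaluate the WHERE clause on each row and emit (constant key, 1)."""
    for row in rows:
        if predicate(row):
            yield ("count(*)", 1)


def reduce_phase(pairs):
    """Reduce: a global COUNT(*) has a single group, so one reducer sums everything."""
    total = 0
    for _key, value in pairs:
        total += value
    return total


invites = [
    {"foo": 1, "bar": "a", "ds": "2008-08-15"},
    {"foo": 2, "bar": "b", "ds": "2008-08-15"},
    {"foo": 3, "bar": "c", "ds": "2008-08-08"},
]
# Rough equivalent of: SELECT COUNT(*) FROM invites a WHERE a.ds='2008-08-15';
print(reduce_phase(map_phase(invites, lambda a: a["ds"] == "2008-08-15")))  # 2
```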


NoSQL DBMS: storing and retrieving data

• Key/Value
• A big hash table
• Examples: Voldemort, Amazon’s Dynamo
• Big Table
• Big table, column families
• Examples: HBase, Cassandra
• Document based
• Collections of collections
• Examples: CouchDB, MongoDB
• Graph databases
• Based on graph theory
• Examples: Neo4J
• Each solves a different problem

Source: Scalebase


Pros/Cons

• Pros:
• Performance
• BigData
• Most solutions are open source
• Data is replicated to nodes and is therefore fault-tolerant
(partitioning)
• Don't require a schema
• Can scale up and down
• Cons:
• Code change
• No framework support
• Not ACID
• Ecosystem (BI, backup)
• There is always a database at the backend
• Some APIs are just too simple
Source: Scalebase


There are some NoSQL projects out there…

Source: NoSQL Databases: Providing Extreme Scale and Flexibility By Matthew D. Sarrel

NoSQL Market Forecast 2011-2015

http://www.marketresearchmedia.com/2010/11/11/nosql-market/


Apache Cassandra

• Cassandra is a highly scalable, eventually
consistent, distributed, structured key-value
store
• Child of Google’s BigTable and Amazon’s
Dynamo
• Peer-to-peer architecture: all nodes are equal
Source: ids.snu.ac.kr/w/images/1/18/2011SS-03.ppt

• Cassandra’s replication factor (RF) is the total number of nodes onto which the data will be placed. An RF of at least 2 is highly recommended, keeping in mind that your effective number of nodes is (N total nodes / RF).
• CQL (Cassandra Query Language) command line
• Timestamp for each value written


Consistent Hashing

• Partition using consistent hashing (to pick the first node the data is placed on), based on MD5
• Distributed hash table algorithm
• Keys hash to a point on a fixed circular space
• The ring is partitioned into a set of ordered slots, and servers and keys are hashed over these slots
• Nodes take positions on the circle.
• A, B, and D exist.
• B is responsible for the AB range (for replication factor=2 – default).
• D is responsible for the BD range.
• A is responsible for the DA range.
• C joins.
• The BD range is split.
• C gets BC from D.

[Figure: ring with nodes A, B, C, D and keys hashed onto the circle]

Source: http://www.intertech.com/resource/usergroup/NoSQL.ppt
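A minimal consistent-hashing ring in Python, following the slide: MD5 positions on a circle, each node owning the range that ends at its own position (so B owns AB), and replicas on the next nodes clockwise. Assumptions: a node's position is the MD5 of its name (so C will not necessarily land between B and D as in the picture), and there are no virtual nodes; Cassandra's own token assignment and replica strategies differ. `Ring` and its methods are hypothetical names.

```python
import bisect
import hashlib


def md5_token(value: str) -> int:
    """Position on the ring: the MD5 digest read as a 128-bit integer."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes=()):
        self._tokens = []  # sorted node positions on the circle
        self._owner = {}   # position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        token = md5_token(node)  # assumption: a node's position is the hash of its name
        bisect.insort(self._tokens, token)
        self._owner[token] = node

    def replicas_for(self, key: str, rf: int):
        """Walk clockwise from the key's position: the first node owns the key,
        the next rf-1 nodes hold the extra replicas (wrapping around the circle)."""
        start = bisect.bisect_left(self._tokens, md5_token(key))
        n = len(self._tokens)
        return [self._owner[self._tokens[(start + i) % n]] for i in range(min(rf, n))]

    def node_for(self, key: str) -> str:
        return self.replicas_for(key, 1)[0]


keys = [f"row-{i}" for i in range(1000)]
ring = Ring(["A", "B", "D"])
before = {k: ring.node_for(k) for k in keys}

ring.add("C")  # C joins and takes over part of one existing range
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "keys moved; all to C:", all(after[k] == "C" for k in moved))
```

The point of the design is the last line: a join moves only the keys in the new node's range, instead of reshuffling everything as `hash(key) % number_of_nodes` would.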


Write operation

Source: http://assets.en.oreilly.com/1/event/51/Scaling%20Web%20Applications%20with%20Cassandra%20Presentation.ppt


Cassandra’s tunable consistency (write)

Level | Behavior
ANY | Ensure that the write has been written to at least 1 node, including HintedHandoff recipients.
ONE | Ensure that the write has been written to at least 1 replica's commit log and memory table before responding to the client.
TWO | Ensure that the write has been written to at least 2 replicas before responding to the client.
THREE | Ensure that the write has been written to at least 3 replicas before responding to the client.
QUORUM | Ensure that the write has been written to N / 2 + 1 replicas before responding to the client.
LOCAL_QUORUM | Ensure that the write has been written to <ReplicationFactor> / 2 + 1 nodes, within the local datacenter (requires NetworkTopologyStrategy).
EACH_QUORUM | Ensure that the write has been written to <ReplicationFactor> / 2 + 1 nodes in each datacenter (requires NetworkTopologyStrategy).
ALL | Ensure that the write is written to all N replicas before responding to the client. Any unresponsive replicas will fail the operation.

Source: wiki

Cassandra’s tunable consistency – read

Level | Behavior
ANY | Not supported. You probably want ONE instead.
ONE | Will return the record returned by the first replica to respond. A consistency check is always done in a background thread to fix any consistency issues when ConsistencyLevel.ONE is used. This means subsequent calls will have correct data even if the initial read gets an older value. (This is called ReadRepair.)
TWO | Will query 2 replicas and return the record with the most recent timestamp. Again, the remaining replicas will be checked in the background.
THREE | Will query 3 replicas and return the record with the most recent timestamp.
QUORUM | Will query all replicas and return the record with the most recent timestamp once it has at least a majority of replicas (N / 2 + 1) reported. Again, the remaining replicas will be checked in the background.
LOCAL_QUORUM | Returns the record with the most recent timestamp once a majority of replicas within the local datacenter have replied.
EACH_QUORUM | Returns the record with the most recent timestamp once a majority of replicas within each datacenter have replied.
ALL | Will query all replicas and return the record with the most recent timestamp once all replicas have replied. Any unresponsive replicas will fail the operation.

Source: wiki
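Why a QUORUM read after a QUORUM write returns the latest value: when R + W > N, every set of replicas read overlaps every set that acknowledged the write, so at least one replica consulted holds the newest timestamp. The check below enumerates all possible choices of replicas. It illustrates that overlap argument only; it is not Cassandra code and ignores hinted handoff, read repair and failures.

```python
from itertools import combinations


def quorum(n: int) -> int:
    """QUORUM as in the tables: N / 2 + 1, using integer division."""
    return n // 2 + 1


def always_fresh(n: int, w: int, r: int) -> bool:
    """True if every possible set of r replicas read overlaps every possible set of
    w replicas that acknowledged the latest write (checked exhaustively)."""
    replicas = range(n)
    for written in combinations(replicas, w):
        for read in combinations(replicas, r):
            if not set(written) & set(read):
                return False  # this read could see only stale replicas
    return True


for n in (3, 5):
    q = quorum(n)
    # QUORUM write + QUORUM read vs. ONE write + ONE read
    print(n, q, always_fresh(n, q, q), always_fresh(n, 1, 1))
# 3 2 True False
# 5 3 True False
```

The same check explains the other safe combinations, for example a write at ALL with a read at ONE (W = N, R = 1).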

Cassandra’s data model structure

Think of Cassandra as row-oriented.

[Figure: a keyspace (with settings, e.g. partitioner) contains column families (with settings, e.g. comparator, type [Std]); each column consists of a name, a value and a clock.]

Source: http://assets.en.oreilly.com/1/event/51/Scaling%20Web%20Applications%20with%20Cassandra%20Presentation.ppt


Data Model – “flexible” schema!

ColumnFamily: Rockets

Key 1 (Name: Value)
  name: Rocket-Powered Roller Skates
  toon: Ready, Set, Zoom
  inventoryQty: 5
  brakes: false

Key 2 (Name: Value)
  name: Little Giant Do-It-Yourself Rocket-Sled Kit
  toon: Beep Prepared
  inventoryQty: 4
  brakes: false

Key 3 (Name: Value)
  name: Acme Jet Propelled Unicycle
  toon: Hot Rod and Reel
  inventoryQty: 1
  wheels: 1
Source: http://wenku.baidu.com/view/6e254321482fb4daa58d4b87.html
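A rough Python model of the Rockets column family: nested dictionaries (column family, then row key, then column), where each column carries a value and a timestamp, echoing "timestamp for each value written". Rows 1 and 3 end up with different column sets, which is the "flexible" schema. The helper names are hypothetical, and the tie rule (equal timestamps keep the old value) is a simplification, not Cassandra's own rule.

```python
import time

# keyspace -> column family -> row key -> column name -> (value, timestamp)
keyspace = {"Rockets": {}}


def insert(cf, row_key, column, value, ts=None):
    """Write one column. Rows in the same column family need not share columns."""
    ts = time.time_ns() if ts is None else ts
    row = keyspace[cf].setdefault(row_key, {})
    old = row.get(column)
    if old is None or ts > old[1]:  # newer timestamp wins; ties keep the old value here
        row[column] = (value, ts)


def columns(cf, row_key):
    """Column names of one row, sorted by name (like a column comparator)."""
    return sorted(keyspace[cf][row_key])


insert("Rockets", "1", "name", "Rocket-Powered Roller Skates", ts=1)
insert("Rockets", "1", "brakes", "false", ts=1)
insert("Rockets", "3", "name", "Acme Jet Propelled Unicycle", ts=1)
insert("Rockets", "3", "inventoryQty", "1", ts=1)
insert("Rockets", "3", "wheels", "1", ts=1)

print(columns("Rockets", "1"))  # ['brakes', 'name']
print(columns("Rockets", "3"))  # ['inventoryQty', 'name', 'wheels']
```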


Cassandra’s CQL – Cassandra Query Language

• SQL-like. Example:
• CREATE KEYSPACE test with strategy_class = 'SimpleStrategy' and strategy_options:replication_factor=1;
• CREATE INDEX ON users (birth_date);
• SELECT * FROM users WHERE state='UT' AND birth_date > 1970;
• However:
• No Joins
• No UPDATES/DELETES


NoSQL benchmark – for scale!

Source: research.yahoo.com/files/ycsb-v4.pdf


Can we live with NoSQL limitations?

• Facebook has dropped Cassandra
• “..we found Cassandra's eventual consistency model to be a
difficult pattern to reconcile for our new Messages
infrastructure”
• Facebook has selected HBase (columnar DBMS).
http://www.facebook.com/notes/facebook-engineering/the-underlying-technology-of-messages/454991608919


What about other NoSQL DBMS?

• MongoDB
• HBase
• CouchDB
• Maybe next session….


Big Data potential implications on IT

• Will traditional RDBMS become obsolete? Surely not!
• Several areas are Big Data zones by definition – Internet marketing, Cyber, DW, etc.
• How well can we live with “Eventually Consistent”, which in most cases means a 1-2 minute delay?!
• Can we define that all batch data can live well on Big Data technologies?
• Will we see in the end (10 years from now) that only a small portion of data still resides on RDBMS and most of the data resides on Big Data technologies?!


Example of big data technology: SPLUNK

• Splunk is a traditional IT vendor based on MapReduce
(from 2009)


Another aspect of Big Data - IBM Watson wins in Jeopardy


DeepQA: the technology & architecture behind Watson

[Figure: DeepQA pipeline. Question → Question & Topic Analysis → Question Decomposition → Hypothesis Generation (Primary Search and Candidate Answer Generation over Answer Sources) → Hypothesis and Evidence Scoring (Answer Scoring, Evidence Retrieval over Evidence Sources, Deep Evidence Scoring) → Synthesis → Final Confidence Merging & Ranking → Answer & Confidence. Learned models help combine and weigh the evidence.]


Where did it acquire knowledge?

Three types of knowledge:
• Domain data (articles, books, documents)
• Training and test question sets w/answer keys
• NLP resources (vocabularies, taxonomies, ontologies)

Sources and sizes:
• Wikipedia – 17 GB
• Time, Inc. – 2.0 GB
• New York Times – 7.4 GB
• Encarta – 0.3 GB
• Oxford University – 0.11 GB
• Internet Movie Database – 0.1 GB
• IBM Dictionary – 0.01 GB
• ... J! Archive/YAGO/dbPedia… – XXX
• Total raw content – 70 GB
• Preprocessed content – 500 GB

IBM’s Watson possible implications

If the computer understands my speech, why do I need a
keyboard?
If the computer can talk, why do I need a screen?
If the computer understands semantics and can act with its
own reasoning – why do you need me?!



• The changing end user devices ecosystem
• Infrastructure as Code and DEVOPS



Mega-trend #1 of 21st century

CONSUMERIZATION:
empowerment of people collaborating via
connected mobile devices


User Interface Revolution – Touch / Sound(Voice) / Move Era


2012: Sound/Voice is in


2012: Face recognition is in


Desktop and Mobile ecosystems begin to converge

“BYOD : bring your own device"
employees asserting control over the technology they use for work
4 Devices per employee?!

Four screens of convergence: TV, PC, mobile and in-car

• We want to be connected 7X24
• Each of these screens is useful during our
day and each is connected to the 'cloud'

• IT should allow us to use the same
business (IT supports ALL) and
entertainment applications


Can IT support all devices ?

• Employees will use as many
computers and mobile devices as
they wish.

• Automatically keep their data in
sync with a backup copy.

• Solutions should be enterprise class:
• secure
• reliable
• maintainable
• integrated with critical back-office
systems


What about Productivity Software for non-wintel machines?

Office 2015
ARM W8


Israel (expected end 2012):

Wintel: Q4 2011 compared to Q4 2010
Desktop PCs: -25% Notebooks: -35%

Client/server v2

[Figure: evolution of client architectures]
• Terminals V1: always connected; I/O only at the local terminal
• Client/Server V1: 2 types of applications:
1. Off-line: processing and storage local
2. Always connected: data and processing @server; GUI++ @client
• Terminals V2 (WEB/Browser client): 2 types of applications:
1. Off-line: processing and storage local
2. Always connected: browser-based applications
• Client/Server V2:
1. Most apps work on/off line
2. Most of the time connected
3. Uses cloud/local applications

ADVANCES/COST:
1. Communications/networking
2. Processor/storage
3. Power/battery

Picture Source: http://sthvcarringtonmedia.blogspot.com/2011/02/emotions.html

Windows on ARM

Feature | Windows 8 x86/64 | Windows 8 on ARM (WOA – Windows on ARM)
Device branding | Such devices would be branded as x86/64 ones | These would also be branded as ARM
Old Windows 7 things | Everything that runs on Windows 7 would run on these platforms | Only selective things would be runnable
Virtualization | Yes, if hardware supports it | Not supported
Turn on/off options | Yes, on all devices | No, devices would keep running in Connected Standby power mode
App development | Yes, many tools are available | Yes, but only with selective tools, which are not yet available
Availability | All the sources from which Windows 7 is available, e.g. online, DVD/CD and PCs | Would be available only in ARM devices; no DVDs or online availability
Driver availability | From the respective company’s site, DVD/CDs and through Windows Update | Only through Windows Update
Maintenance (e.g. updates and other fixes) | Through Windows disks and Windows Update | Only through Windows Update
Uniqueness | Any source would run on a wide variety of devices | Each source is unique to a unique device

Source: http://lenzfire.com/2011/12/future-of-pc-is-soon-to-be-woa-windows-on-arm-than-to-wintel-85094/


Microsoft is fighting back

Win8 tablets/phones are:
• Easier to manage/secure from an enterprise perspective
• Easier to synchronize with enterprise data
• Easier to enable enterprise applications (on Intel-based devices)
• Microsoft hopes to “Bring Your Enterprise to Home” (BYEH)

However:
• Microsoft starts from scratch in these markets
• The “influencers” are already heavy users, mainly of “stylish Apple”
• There are strong forces within Microsoft to enable business applications on other platforms (Office on iPad, Android..)

Will Microsoft’s “hidden” dream of “IT enabling only Microsoft tablets and phones to access mail and enterprise apps” come true?!

A new era. We had it before:

Source: http://www.socialtechpop.com/2010/10/old-vs-new-trends-in-social-media/


And the new era will look like :

Source: http://www.mobilemag.com/2011/01/06/samsungs-hybrid-sliding-pc-7-series-tabletnotebook-thingy/

Computing as we know it today
Change at the device/UX level and change at the application level – mobility


New Era: IT can no longer dictate a single device

• Looks like the dominance of Microsoft on Intel with C/S or WEB
app is over!
• The new general purpose application architecture will support:
• Data stored in a cloud and in local devices (appropriate formats per
each device).
• Data synchronization with conflict resolution between data instances
• Continuous transaction processing between different devices =
mobility
• Different interfaces to the same application (mainly APPS but also
browser based)
• Application code is native or hybrid for each device
• Offline work (read with update)
• Automatic SW update
• Voice
• Face recognition
• AI reasoning
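"Data synchronization with conflict resolution between data instances" can be made concrete with a three-way merge. A minimal sketch, assuming each device keeps (value, timestamp) records plus the last synced state as the common base, and resolving true conflicts by latest timestamp (last-writer-wins); the function name and sample records are hypothetical, and real sync products may use other strategies such as version vectors or asking the user.

```python
def sync(base, local, remote):
    """Three-way merge of two device copies against the last synced state (base).
    Records are (value, timestamp); a missing key means "not present / deleted".
    Returns the merged records and the list of keys that were in conflict."""
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(local) | set(remote)):
        b, l, r = base.get(key), local.get(key), remote.get(key)
        if l == r:
            value = l              # both copies agree (possibly both deleted)
        elif l == b:
            value = r              # only the remote copy changed
        elif r == b:
            value = l              # only the local copy changed
        else:
            # Both changed: last writer (latest timestamp) wins; an edit beats a delete here.
            conflicts.append(key)
            value = max((v for v in (l, r) if v is not None), key=lambda v: v[1])
        if value is not None:
            merged[key] = value
    return merged, conflicts


base = {"contact:1": ("v1", 1), "note:7": ("draft", 1)}
laptop = {"contact:1": ("v2-laptop", 5), "note:7": ("draft", 1), "note:8": ("new", 6)}
phone = {"contact:1": ("v2-phone", 7), "note:7": ("final", 4)}

merged, conflicts = sync(base, laptop, phone)
print(merged)     # {'contact:1': ('v2-phone', 7), 'note:7': ('final', 4), 'note:8': ('new', 6)}
print(conflicts)  # ['contact:1']
```

Keeping the base is what separates "the other device changed it" from "both devices changed it"; without it every difference would look like a conflict.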



• Infrastructure as Code and DEVOPS



Infrastructure as code

• Treat your infrastructure as code:
• Analyze/Design
• Develop (the automation scripts)
• Prepare the Build
• Test
• Deploy the Build
• That means – no more manual configurations
• Automatic testing – not only for the apps level
• Also – be sure that what is not in the build – will not be
installed
• Is that possible in the current landscape?!
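"No more manual configurations" and "what is not in the build will not be installed" amount to comparing a desired state with the actual state of a server and converging. A minimal sketch of that comparison, with hypothetical package names and versions; real tools such as Chef do far more, and nothing here reflects their APIs.

```python
def plan(desired: dict, actual: dict):
    """Compare the build (package -> version) with what a server actually has and
    return the actions needed to converge. An empty list means nothing to do."""
    actions = []
    for pkg, version in sorted(desired.items()):
        if actual.get(pkg) != version:
            actions.append(("install", pkg, version))
    for pkg in sorted(set(actual) - set(desired)):
        # Not in the build, so it must not stay installed.
        actions.append(("remove", pkg, actual[pkg]))
    return actions


def apply_actions(actual: dict, actions):
    """Return the new server state after performing the planned actions."""
    state = dict(actual)
    for op, pkg, version in actions:
        if op == "install":
            state[pkg] = version
        else:
            state.pop(pkg, None)
    return state


desired = {"nginx": "1.0.11", "openjdk": "6"}   # hypothetical build definition
server = {"nginx": "0.8.54", "telnetd": "0.17"}  # hypothetical drifted server

actions = plan(desired, server)
print(actions)
# [('install', 'nginx', '1.0.11'), ('install', 'openjdk', '6'), ('remove', 'telnetd', '0.17')]
server = apply_actions(server, actions)
print(plan(desired, server))  # [] (running it again changes nothing)
```

The empty second plan is the property that makes automated runs safe to repeat: applying the same definition twice does no harm.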


Some SW definitions:

• Software build - the process of converting source code files
into standalone software artifact(s) that can be run on a
computer. One of the most important steps of a software
build is the compilation process where source code files
are converted into executable code.
• Build automation is the act of automating a wide variety of
tasks that software developers do in their day-to-day
activities including things like:
• compiling computer source code into binary code
• packaging binary code
• running tests
• deployment to production systems
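The build-automation steps listed above form an ordered pipeline in which a failing step stops everything after it, so nothing untested reaches production. A minimal sketch with placeholder steps that just return success or failure; the stage names mirror the list, and everything else is hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[List[str], Optional[str]]:
    """Run build stages in order; stop at the first stage that fails.
    Returns the completed stage names and the failed stage (or None)."""
    completed = []
    for name, step in stages:
        if not step():
            return completed, name  # later stages, such as deploy, never run
        completed.append(name)
    return completed, None


# Placeholder steps: each returns True on success. Real steps would call a compiler,
# a packager, a test runner and a deployment tool.
stages = [
    ("compile", lambda: True),
    ("package", lambda: True),
    ("test", lambda: False),  # simulate a failing test suite
    ("deploy", lambda: True),
]
print(run_pipeline(stages))  # (['compile', 'package'], 'test')
```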

Source: Wiki STKI modifications


Infrastructure as code

• This will enable frequent changes in production
• A 180-degree change from the current “versions” policy!

Source: wiki

Opscode - Chef

• With Chef, you write abstract definitions as source code to describe
how you want each part of your infrastructure to be built, and then
apply those descriptions to individual servers.
• The result is a fully automated infrastructure: when a new server
comes online, the only thing you have to do is tell Chef what role it
should play in your architecture.

Source: opscode


Opscode’s Chef

• The Chef agent ensures that the desired configuration is installed!
• All install files and scripts are located in a central repository (Chef Server) in CouchDB
• Tracing what was successful and what was not
• Documentation of everything
• Major components: Cookbooks, Recipes, Knife, Shef
• Pull model (you cannot control exactly when components are installed)
• Ruby scripting language


Devops – Development and Operations

• Addresses the conflict between Development and
Operations:
• Development – is paid for change
• Operations – change is the enemy!
• “Wall of Confusion” - combination of conflicting
motivations, processes, and tooling

Source: http://dev2ops.org/blog/2010/2/22/what-is-devops.html

Devops – Development from Mars, Operations from Venus

• Development and Operations are in different organizational
entities and use different tools


Deployment/Release time is trouble time

• Development kicks things off by "tossing" a software release "over the wall" to Operations.
• Operations then hand-edit configuration files to reflect the production environment, which is significantly different from the Development or QA environments.
• At best they are duplicating work that was already done in previous environments; at worst they are about to introduce or uncover new bugs.


Devops – new state of mind


Devops aims at:

• DevOps enables the benefits of Agile development to be
felt at the organizational level. DevOps does this by
allowing for fast and responsive, yet stable, operations that
can be kept in sync with the pace of innovation coming out
of the development process.

http://en.wikipedia.org/wiki/File:Devops.png

DevOps Addresses Challenges

• DevOps is an operational approach that automates system
configuration and management.

• To manage cloud systems, customers
• Need to manage servers as groups
• Must respond to rapid infrastructure changes
• Have repeatable automated deployments


Striving towards Devops state of mind:

• Measurement and incentives to change culture - metrics
based on joint performance
• Unified processes
• Unified tooling


Devops Measurement

• Resource Utilization - How resources are allocated and how efficiently
they are used. Usually we're talking about people, but other kinds of
resources can fall into this bucket as well.
• How much time do developers and administrators spend on build and deployment
activity?
• How much productivity is lost to problems and bottlenecks? What is the ripple effect of that?
• What’s the ratio of ad-hoc change or service recovery activity to planned change?
• What’s the cost of moving a unit of change through your lifecycle?
• What's the mean time to diagnose a service outage? Mean time to repair?
• What was the true cost of each build or deployment problem (resource and schedule impact)?
• What percentage of Development driven changes require Operations to edit/change procedures or edit/change automation?
• How much management time is spent dealing with build and deployment problems or change management overhead?
• Can Development and QA successfully deploy their own environments? How long does it take per deployment?
• How much of your team’s time is spent recreating and maintaining software infrastructure that already exists elsewhere?

Source: http://dev2ops.org/blog/2010/1/21/how-to-measure-the-impact-of-it-operations-on-your-business.html


Devops Measurement

• Operations Throughput - The volume and rate at which change
moves through your development to operations pipeline.
• How long does it take to get a release from development,
through testing, and into production?

• How much of that is actual testing time, deployment time, handoff
time, or waiting?
• How many releases can you successfully deploy per period?
• How many successful individual change requests can your operations
team handle per period?
• Are any build and deployment activities the rate limiting step of your
application lifecycle? How does that limit impact your business?
• How many simultaneous changes can your team safely handle?
• What is the business’s perceived “wait time” from code
completion to production deployment of a feature?
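The throughput questions above can be answered from a few timestamps per release. The minimal Python sketch below computes lead time from code completion to production and splits it into active stage time and waiting/handoff time. It also counts releases deployed in a period. The `Release` record and the stage names are hypothetical, and the sketch assumes that the recorded stages do not overlap.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class Release:
    code_complete: datetime
    deployed: datetime
    # Active work intervals as (stage_name, start, end); gaps between them count as waiting
    stages: list = field(default_factory=list)

def lead_time(release):
    """Time from code completion to production deployment (the business's "wait time")."""
    return release.deployed - release.code_complete

def breakdown(release):
    """Split lead time into per-stage active time plus a 'waiting' remainder (handoffs, queues)."""
    totals = {}
    for name, start, end in release.stages:
        totals[name] = totals.get(name, timedelta()) + (end - start)
    totals["waiting"] = lead_time(release) - sum(totals.values(), timedelta())
    return totals

def releases_per_period(releases, start, period):
    """Number of releases deployed in the half-open window [start, start + period)."""
    end = start + period
    return sum(1 for r in releases if start <= r.deployed < end)

if __name__ == "__main__":
    r = Release(
        code_complete=datetime(2012, 1, 2, 9),
        deployed=datetime(2012, 1, 4, 9),
        stages=[
            ("testing", datetime(2012, 1, 2, 13), datetime(2012, 1, 3, 13)),
            ("deployment", datetime(2012, 1, 4, 8), datetime(2012, 1, 4, 9)),
        ],
    )
    print(lead_time(r), breakdown(r))
```

In this made-up example, testing accounts for 24 of the 48 hours of lead time and deployment for 1 hour, so nearly half of the lead time is waiting. That is the kind of rate-limiting step the questions aim to expose.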


Devops Measurement

• Agility - This looks at how quickly and efficiently your IT
operations can react to changes in the needs of your
business.
• How quickly can you scale up or scale down capacity to meet
changing business demands?
• What’s the change management overhead associated with
increasing/decreasing capacity? What’s the risk?
• How quickly and what would it cost to adapt your build and
deployment systems to automate any new applications or
acquired business lines?
• What would it cost you to handle an x% growth in the number of
applications or business lines (direct resource assignment plus any
attention drain from other staff)?
• Could your IT operations handle an x% growth in the number of
applications or business lines? (i.e. could it even be done?)
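To answer "how quickly can you scale", the scaling decision itself has to be something a tool can compute. The minimal Python sketch below shows a proportional scaling rule that sizes a server group so that average utilization returns to a target. It also gives a linear estimate for an x% growth. The 60% target, the 2 to 50 server bounds, and the linear-growth assumption are all hypothetical. The sketch also assumes that load spreads evenly and that capacity comes in whole servers. Integer percentages keep the rounding exact.

```python
def ceil_div(a, b):
    """Ceiling of a / b for positive integers, without floating-point error."""
    return -(-a // b)

def desired_servers(current, utilization_pct, target_pct=60, min_servers=2, max_servers=50):
    """Servers needed so that average utilization returns to target_pct (hypothetical bounds)."""
    if current < 1 or utilization_pct < 0 or not 0 < target_pct <= 100:
        raise ValueError("invalid capacity inputs")
    needed = ceil_div(current * utilization_pct, target_pct)
    return max(min_servers, min(max_servers, needed))

def servers_for_growth(current, growth_pct):
    """Linear assumption: x% more applications need x% more servers, rounded up."""
    return ceil_div(current * (100 + growth_pct), 100)

if __name__ == "__main__":
    print(desired_servers(10, 90))     # scale up: 10 servers at 90% load
    print(desired_servers(10, 20))     # scale down: 10 servers at 20% load
    print(servers_for_growth(20, 30))  # 20 servers, 30% growth in applications
```

Real growth rarely scales linearly, and staff effort does not either. The value of writing the rule down is that an automated system can apply it, and someone can question it.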

Architecture Concepts related to Devops

• Devops is related to several technology
architecture concepts and guidelines:
• Build applications to be as stateless and as “shared
nothing” as possible
• Keep “technical debt” as low as possible (bugs
in production, patches that are not installed,
unsupported software/hardware, etc.)
• Build applications with the ability to “turn off” some
of their functionality while live in production
• Add new transaction versions rather than modifying or
updating transactions in place (this enables rollback and working
concurrently with several versions)
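Two of these guidelines lend themselves to a small illustration. One is turning functionality off while the application is live. The other is adding new versions instead of updating in place. The Python sketch below uses an in-memory feature flag and an append-only versioned store. The class names, the `discount` feature, and the `order-1` key are hypothetical, and a production system would keep flags and versions in shared storage, not in process memory.

```python
class FeatureFlags:
    """Runtime on/off switches: a feature can be disabled without redeploying the application."""

    def __init__(self, **initial):
        self._flags = dict(initial)

    def is_on(self, name):
        return self._flags.get(name, False)

    def set(self, name, on):
        self._flags[name] = bool(on)


class VersionedStore:
    """Append-only store: every update adds a new version instead of overwriting the record."""

    def __init__(self):
        self._versions = {}  # key -> list of values, oldest first

    def put(self, key, value):
        self._versions.setdefault(key, []).append(value)
        return len(self._versions[key])  # 1-based version number

    def get(self, key, version=None):
        history = self._versions[key]
        if version is None:
            return history[-1]
        if not 1 <= version <= len(history):
            raise IndexError(f"{key} has no version {version}")
        return history[version - 1]

    def rollback(self, key):
        """Roll back by re-appending the previous version, so the full history is kept."""
        history = self._versions[key]
        if len(history) < 2:
            raise ValueError(f"{key} has no earlier version")
        return self.put(key, history[-2])

    def history(self, key):
        return list(self._versions[key])


def checkout_total(prices_in_cents, flags):
    """Hypothetical feature: a 10% discount that can be switched off while live."""
    total = sum(prices_in_cents)
    if flags.is_on("discount"):
        total = total * 90 // 100
    return total


if __name__ == "__main__":
    flags = FeatureFlags(discount=True)
    print(checkout_total([1000, 500], flags))
    flags.set("discount", False)  # turned off without a redeploy
    print(checkout_total([1000, 500], flags))
```

The rollback in this sketch never deletes anything. It re-appends an earlier version, so old and new versions stay side by side, which is what makes rollback and concurrent versions practical.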


Devops tools:

Source: http://doc36.controltier.org/wiki/File:ProvisioningToolchain.png

Devops vs. Private Cloud?

• In many respects the objectives of Devops and Private Cloud
overlap
• Automation is at the core of both Private Cloud and Devops

Source: http://www.pistoncloud.com/2012/01/devops-and-private-cloud-sitting-in-a-tree/


Some input from last year’s presentation

• Public cloud

Source: IDC https://www.eiseverywhere.com/file_uploads/7e2edb16ed28a2123cd21508f87be8b2_ITR_Boston_2011_Public_and_Private_Cloud_Track_RickVillars_IDC.pdf


Summary – Major paradigm shifts

• Remember Digital Equipment
Corporation (DEC). “Underdogs
become mainstream faster than we
think”. Change is crucial
• Embrace big data experiments
• Embrace Devops concepts – metrics,
process and tools. Start with metrics
• Devops tools might be our current
configuration and CMDB tools
• Embrace at least one SAAS application
now (Email, Service desk, HR, ERP,
CRM, etc.). Also IAAS, PAAS.
• Standardization with processes.

[Figure: Technologies, Processes, Standardization]


STKI Round Tables

• Lots of useful information – use it!


STKI Round Tables


We will present data on products and vendors:

1. Israeli vendors rating – state of the current market, focused on the
enterprise market (not SMB)
   • X – Market penetration (sales + installed base + clients’
   perspective)
   • Y – X plus localization, support, development center, number
   and kind of integrators, etc.
   • Worldwide leaders are marked, based on global positioning
   • Vendors to watch: vendors that are only just entering the Israeli
   market, or are making a big change, so they can’t be positioned
   yet but should be watched
   • The rating represents the current Israeli market and not necessarily
   what we recommend to our clients
2. Products and selected resellers / implementers
   • The order within the list is random


We will present data on products and vendors (cont.)

3. Selected installations of products – projects in different stages
(production, implementation, after decision, etc.)

4. Service providers used by users. I asked users “Which
SI do you use in this category?” and counted the results.

5. Analysis by international and Israeli analysts
   • All of this information (1 to 5) should be used together,
   combined with the specific circumstances of each case, when
   making a decision
This subjective chart is the result of our
objective research


Ratio Analysis:

• 25% percentile
• 50% percentile = median
• 75% percentile

Sorted metric   Metric
36              57
43              36
50              117
50              438
57              60
60              175
60              150
71              143
100             120
100             50
109             250
117             125
117             280
120             60
120             200
125             117
125             100
143             164
150             125
164             600
175             192
188             71
192             120
200             50
250             188
280             43
438             109
600             100

25% percentile = 68.6
50% percentile = median = 120.0
75% percentile = 178.1
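The percentile markers above can be reproduced with a few lines of code. The minimal Python sketch below sorts the values and interpolates linearly between the closest ranks at position (n - 1) × p. This is the same "inclusive" definition used by Python's `statistics.quantiles(..., method="inclusive")`, and as far as I know it matches spreadsheet PERCENTILE functions. Applied to the whole numbers listed in the table, it gives 68.25, 120.0 and 178.25. The slide shows 68.6 and 178.1, probably because the underlying ratios were rounded for display.

```python
import math

# Metric values as listed in the table above (displayed as whole numbers)
METRIC = [57, 36, 117, 438, 60, 175, 150, 143, 120, 50, 250, 125, 280, 60,
          200, 117, 100, 164, 125, 600, 192, 71, 120, 50, 188, 43, 109, 100]

def percentile_inc(values, p):
    """Percentile p (0..1) with linear interpolation between closest ranks, rank = (n - 1) * p."""
    if not values or not 0 <= p <= 1:
        raise ValueError("need non-empty data and 0 <= p <= 1")
    s = sorted(values)
    rank = (len(s) - 1) * p
    lo = math.floor(rank)
    hi = min(lo + 1, len(s) - 1)  # guard for p == 1
    return s[lo] + (rank - lo) * (s[hi] - s[lo])

if __name__ == "__main__":
    for p in (0.25, 0.5, 0.75):
        print(f"{p:.0%} percentile: {percentile_inc(METRIC, p)}")
```

Comparing an organization's own ratio against these quartiles shows whether it sits in the bottom, middle, or top of the group.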

Agenda

Major paradigm shifts
Development and SOA
ESM BSM CMDB
DBMS and DATA
Platforms – Servers
Clients
Storage

Source: http://astonguild.org.uk/files/NEW_MENU_FRONT_RGB%5B1%5D.jpg

