8. Data sets so large and complex
that they become difficult to
process using on-hand database
management tools.
Wikipedia
Big Data: Teams not Tech
Nov 2012 8
20. Infrastructure
Made for the cloud – the challenge is
economic
21. Application Stack
Data flow: Operational Systems, Capture, Cleansing & Transformation, Visualisation & Analysis, Operational Delivery
Stack layers:
System Management: ZettaSet, ZooKeeper, …
Scheduling, Workflow: Oozie, Pig, …
Resource Management: MapReduce, …, …
Record Management: SQL, NoSQL, …
File System: HDFS, …, …
Storage Infrastructure
Tooling, Learning, Integration, Stabilising for Ops
22. Skills
Organisational context for multi-skilled
teams
23. Experimentation
Create capacity to try stuff. Encourage
failure.
24. Valuation
Find specific value drivers & business
models
25. Fragmentation
Systems (human and technical) that
bypass the silos
Data quality. Data quality. Data quality.
26. Infrastructure
Application Stack
Skills
Experimentation
Valuation
Fragmentation
28. Build teams that can deal with variety
Core: Infrastructure & Operations, Service Mgt, Service & Application Development
Business-led teams: CMO, COO, …, Data Exploitation
Operate in tight cycles – value experiments
(Build technical Infrastructure & Ops support this)
30. Graham Oakes Ltd
Making sense of technology…
Many organisations are caught up in the
complexity of technology and systems.
This complexity may be inherent to the
technology itself. It may be created by the pace of technology change. Or it may arise from
the surrounding process, people and governance structures.
We help untangle this complexity and define business strategies that both can be
implemented and will be adopted by people throughout the organisation and its partner
network. We then help assure delivery of implementation projects.
Clients…
Cisco Worldwide Education – Architecture and research for e-learning and educational systems
Council of Europe – Systems for monitoring compliance with international treaties; e-learning systems
Dover Harbour Board – Systems and architecture review
MessageLabs – Architecture and assurance for partner management portal
National Savings & Investments – Helped NS&I and BPO partner develop joint IS strategy
The Open University – Enterprise architecture, CRM and product development strategies
Oxfam – Content management, CRM, e-Commerce
Thames Valley Police – Internet Consultancy
Sony Computer Entertainment – Global process definition
Skype – product development lifecycle
Amnesty International, Endemol, tsoosayLabs, Vodafone, …
Editor's Notes
Basket Promise: Use behavioural data to segment customers into tight segments and you get
  Improved sales metrics (more click through, bigger basket, …)
  At lower cost (e.g. smaller campaigns to get same number of responses)
Conversation Promise: Monitor what people are saying and respond to them in order to
  Improve brand perception
  Drive purchasing behaviour
  Identify opportunities for new products
Process Promise: Process enough jet engine data quickly enough, and you can sell “power by the hour” rather than engines
  New products
  Improved operational metrics
Question Mark: But how real is this? Is it really possible? Are people actually doing it? Or is it all vendor hype?
I’ve seen enough glimpses to believe that it’s real
  University – identify students who are most likely to succeed (links to funding; reduces campaign costs)
  Truck Manufacturer – use data from service history to identify product recalls and do them more efficiently; use data to identify operational “best practice”
  Startup – use data from secondhand market in consumer appliances to identify scope for supporting services, parts, etc
  Travel Firm -- data warehouse to reduce campaign costs
  Telco – make it easier for people to aggregate and manage contact and event data, and use this data to connect to new people, services, etc
  Web analytics – dipping toes in the water
But the reality is that we’re nowhere near what’s promised. There’s a lot of talk, but not a lot of action beyond BAU analytics & suchlike. Why haven’t we gone further?
Part of the problem is that big data has been driven by the tech vendors – they’ve turned it into an opportunity to sell more kit, software & services. To do this, they’ve let the definition become very fuzzy.
And it was fuzzy to begin with – “Exploiting data sets that push the technology capabilities you have available.”
By that definition we’ve had big data ever since the invention of the stone tablet, let alone the iOS or Android one…
Volume, Velocity, Variety: The defining characteristics. They’ve always been there – the scale has shifted from kilo to mega to giga to tera to peta & beyond, but at roughly the same pace as the underlying tech capabilities (cost/byte for storage, speed of processors, etc)
So I’m going to spend 5-10 mins going through my personal history of big data, stretching back over the last 3 decades. See what that suggests we need to do to address big data.
Then I’ll summarise what I see as the key challenges, and a way to start dealing with them.
Not a definitive answer – just a kind of “lessons learned” from my experience with processing lots of data for a variety of business objectives.
Processing geophysical data – seismic, aerial surveys
  Volume -- Megabytes to Gigabytes
  Velocity -- Weeks to months
  Variety -- Single, technical domain – Limited data fusion
  Tech -- Challenged all but specialist computers so created a driver for supercomputing
  Team -- Domain experts made the technology work: if you wanted to handle data, you learned to drive the technology. One team, not organisation silos.
Processing satellite imagery
  Volume -- Megabytes to gigabytes
  Velocity -- Weeks to months (Specialised research into real-time processing)
  Variety -- Patchy data fusion – again, this was mostly research rather than operational reality
  Tech -- Mostly handled by specialist hardware
  Team -- Domain experts made the technology work – again, you had to learn technology if you wanted to handle data
Customer data warehouses
  Volume -- Gigabytes to terabytes
  Velocity -- Days to weeks
  Variety -- Some data fusion via ETL (the crane), but vision was “single source of truth” – aimed to try to massage reality into a single, well-defined data model
  Tech -- Mostly handled by general purpose computers (some specialist)
  Team -- Separated business and technical skills – sometimes as cross-functional teams, but more often as silos
Web analytics
  Volume -- Gigabytes to terabytes
  Velocity -- Minutes to days (real time handled via abstracted rules)
  Variety -- Web data is in a separate silo & low aspirations to integrate it (some people dream of integrating with other corporate data, but organisational silos & legacy tech get in the way) – a step backwards from the customer data of the 90’s, where we were beginning to integrate…?
  Tech -- Mostly handled via specialist services (SaaS)
  Team -- Separated business and technical skills – sometimes as cross-functional teams, but more often as silos
Real time view into complex, real-world problems: Analysing multi-channel / social media data
  Volume -- Terabytes and beyond
  Velocity -- Increasing call to be able to see trends in near-real-time
  Variety -- Fuse data from multiple sources; Handle evolving data definitions
  Tech -- Within capabilities of cloud-based services (IaaS & SaaS) – compute & storage not really a constraint (bandwidth may be)
  Team -- Silos between marketing, sales, IT, …, have become more entrenched, just as we need to bring together wider knowledge…
Real time view into complex, real-world problems: Threat detection, e.g. crime and national security
Real time view into complex, real-world problems: Routing and optimisation of complex logistics networks; product data management; service scheduling
Volume, velocity, variety went up, but pretty much in sync with tech capabilities
Technology: from bespoke s/w and specialised h/w to standard s/w stack and commodity compute capacity – makes it easier to focus on the data, not the tech
Teams: From individual scientist, to small multi-skilled teams who worked together (often in tough conditions), to large teams of specialists split across org boundaries
About 2000, we swapped from generalist teams using specialist hardware to specialist teams using generalist hardware. That’s where we went wrong.
What’s stopping us exploiting Big Data as well as we might?
I see 6 broad classes of challenge arising from these trends…
Big Data consumes a lot – storage, compute & bandwidth
It creates highly variable workloads
Made for cloud, so get on the tech curve
Need to get the economics right
Complex & variable maturity
Dozen apps in Cloudera Hadoop distribution
Relatively new & variable maturity
Creates challenges for choice of tools, learning curve, integrating, creating a stable stack for operations
Complex, but can be learned. This is what people in IT do – they learn new tech. You shouldn’t be in the game if you can’t or won’t do it.
Need a deep stack of skills – technical, data-centric (data scientists / analysts / visualisation designers), problem domain expertise
Debate about data vs domain expertise – you need both, with tech support
Challenges
  Finding people with right mindset & building their skills
  Building integrated teams
  Changing mgt hierarchy, reward structures, org structures, etc, to support teams & their analytical/experiment-based approach
I’d focus here – automate the infra & technology or push it to partners, & build inhouse teams to focus on the high value stuff
Cycle is not plan & do. It’s experiment, learn and evolve.
Lean startups are doing this with their systems.
Traditional IT needs to internalise it to make big data work. Agile is a step in the right direction, but need to turn the dial full up
  Form long-running, integrated, business-led teams
  Create capacity for people to think & try stuff – Google 20%
  Encourage failure – 50% of experiments should fail if we’re being radical enough – this maximises learning
Can only do this if we can understand the value of the outcomes. Without value, can’t build an effective portfolio of experiments and investments.
Still hard to write an effective business case for many orgs – it’s hard to ascribe clear value while still experimenting – needs a bit of a “leap of faith”
May mean flipping our business model on its head to exploit value in data
  Rolls Royce – value migrates from engineering to services, through value of engine operational data
  Mecado – niche ecommerce platform. There is value in customer data, but there may be more in product data – seeing lifecycle of products & their service needs, accessories, etc, can provide valuable insight to product manufacturers
Again, the business domain specialists and the data scientists need to be close to the IT guys to make the value clear
Web team is doing this on their data. So is Sales. And Marketing & Ops & IT.
Creates challenges for data fusion
  Data definition (syntax & semantics)
  Data cleansing, transformation & integration
DATA QUALITY HAS ALWAYS BEEN AN ISSUE – BIG DATA SILOS JUST MULTIPLY THE PROBLEM – not a new problem, just a bigger one
Heard this all before for data warehouses, but now we have
  Semi-structured as well as structured data
  Rapidly changing data definitions
  People have built deep, fortified silos in the cloud
Right now we’re building systems (human & technical) to play with data – shift it around and look at bits of it. Need to build systems to exploit value in integrated data.
Technical – is being talked about, because the vendors have something to sell
  The vendors are talking about this & they’re selling solutions that at least start to address the issues (varying levels of maturity)
  This is what vendors do best, so let them do it / work with them
Organisational – is known & some are talking about it, but it’s less discussed because there’s nothing easy for vendors to sell
  Need skills & processes to manage the technical (& the associated vendors)
  Need skills & processes to manage the valuation challenges
Business model – not really even being discussed: people assume/hope we’ll solve it with pilots. Need to address it more specifically.
  I’m not sure how to solve this – I think we need to put the teams together & make it their explicit goal to solve it.
I think we lost it in items 3 & 4 – we stopped exploring the data and started getting bogged down in the inter-team dynamics
Who sees this in their daily lives – time goes into managing organisational dynamics, not delivering results?
Can’t avoid org dynamics, but if we focus elsewhere (on the tech), then it becomes more invidious. Need to manage it actively to get best results, and big data is making us stare in just the opposite direction.
Solution:
  Organise ourselves to build skills & place them within cross-functional teams, put the right context around those teams, and focus our initial experiments on ranging widely to explore new value models
  I think we probably need new org models to do this
If I was building an IT org from scratch, this is what it’d look like… (widths are not proportional – just what I could fit)
  Infra & Ops: May be outsourced
  Service Mgt: Must be inhouse. Needs to be skillful enough to manage the infra & ops core, and to add new services into it.
  Service & application development: Innovation core. Needs core of skills & capability inhouse, perhaps with supplementation from external agencies, system integrators, etc. Partnering approach, but internally led.
  “Petals”: Cross-functional, business-led teams. Sit outside IT control, but with IT dev specialists within the team to support it. Also sit outside any business silo, but will have specialists from the relevant silos. Also have data scientists & other expertise. Led from the business domain. Experiment-led service development.
Build the team skills and experimental approach, for cross-functional teams. The infra and app stack will follow easily. Over time, will start to get handle on valuation and reduce fragmentation.
I know startups which are effectively organised this way. Now larger orgs need to do it too.