Basket Promise: Use behavioural data to segment customers into tight segments and you get:
- Improved sales metrics (more click-through, bigger baskets, …)
- At lower cost (e.g. smaller campaigns to get the same number of responses)
Conversation Promise: Monitor what people are saying and respond to them in order to:
- Improve brand perception
- Drive purchasing behaviour
- Identify opportunities for new products
Process Promise: Process enough jet engine data quickly enough, and you can sell “power by the hour” rather than engines:
- New products
- Improved operational metrics
Question Mark: But how real is this? Is it really possible? Are people actually doing it? Or is it all vendor hype?
I’ve seen enough glimpses to believe that it’s real:
- University – identify students who are most likely to succeed (links to funding; reduces campaign costs)
- Truck manufacturer – use data from service history to identify product recalls and do them more efficiently; use data to identify operational “best practice”
- Startup – use data from the second-hand market in consumer appliances to identify scope for supporting services, parts, etc.
- Travel firm – data warehouse to reduce campaign costs
- Telco – make it easier for people to aggregate and manage contact and event data, and use this data to connect to new people, services, etc.
- Web analytics – dipping toes in the water
But the reality is that we’re nowhere near what’s promised. There’s a lot of talk, but not a lot of action beyond business-as-usual (BAU) analytics and suchlike. Why haven’t we gone further?
Part of the problem is that big data has been driven by the tech vendors – they’ve turned it into an opportunity to sell more kit, software & services. To do this, they’ve let the definition become very fuzzy.
And it was fuzzy to begin with – “Exploiting data sets that push the technology capabilities you have available.” By that definition we’ve had big data ever since the invention of the stone tablet, let alone the iOS or Android one…
Volume, Velocity, Variety: the defining characteristics. They’ve always been there – the scale has shifted from kilo to mega to giga to tera to peta & beyond, but at roughly the same pace as the underlying tech capabilities (cost/byte for storage, speed of processors, etc.)
So I’m going to spend 5–10 mins going through my personal history of big data, stretching back over the last three decades, and see what that suggests we need to do to address big data. Then I’ll summarise what I see as the key challenges, and a way to start dealing with them. This isn’t a definitive answer – just a kind of “lessons learned” from my experience with processing lots of data for a variety of business objectives.
Processing geophysical data – seismic, aerial surveys
Volume -- Megabytes to gigabytes
Velocity -- Weeks to months
Variety -- Single, technical domain – limited data fusion
Tech -- Challenged all but specialist computers, so created a driver for supercomputing
Team -- Domain experts made the technology work: if you wanted to handle data, you learned to drive the technology. One team, not organisation silos.
Processing satellite imagery
Volume -- Megabytes to gigabytes
Velocity -- Weeks to months (specialised research into real-time processing)
Variety -- Patchy data fusion – again, this was mostly research rather than operational reality
Tech -- Mostly handled by specialist hardware
Team -- Domain experts made the technology work – again, you had to learn technology if you wanted to handle data
Customer data warehouses
Volume -- Gigabytes to terabytes
Velocity -- Days to weeks
Variety -- Some data fusion via ETL (the crane), but the vision was a “single source of truth”: trying to massage reality into a single, well-defined data model
Tech -- Mostly handled by general-purpose computers (some specialist)
Team -- Separated business and technical skills – sometimes as cross-functional teams, but more often as silos
Web analytics
Volume -- Gigabytes to terabytes
Velocity -- Minutes to days (real time handled via abstracted rules)
Variety -- Web data sits in a separate silo, with low aspirations to integrate it (some people dream of integrating it with other corporate data, but organisational silos & legacy tech get in the way). This is a step backwards from the customer data of the ’90s, where we were beginning to integrate…?
Tech -- Mostly handled via specialist services (SaaS)
Team -- Separated business and technical skills – sometimes as cross-functional teams, but more often as silos
Real-time view into complex, real-world problems: analysing multi-channel / social media data
Volume -- Terabytes and beyond
Velocity -- Increasing call to be able to see trends in near-real time
Variety -- Fuse data from multiple sources; handle evolving data definitions
Tech -- Within the capabilities of cloud-based services (IaaS & SaaS) – compute & storage not really a constraint (bandwidth may be)
Team -- Silos between marketing, sales, IT, … have become more entrenched, just as we need to bring together wider knowledge…
Real-time view into complex, real-world problems: threat detection, e.g. crime and national security
Real-time view into complex, real-world problems: routing and optimisation of complex logistics networks; product data management; service scheduling
Volume, velocity and variety went up, but pretty much in sync with tech capabilities.
Technology: from bespoke s/w and specialised h/w to a standard s/w stack and commodity compute capacity – makes it easier to focus on the data, not the tech.
Teams: from individual scientists, to small multi-skilled teams who worked together (often in tough conditions), to large teams of specialists split across org boundaries.
Around 2000, we swapped from generalist teams using specialist hardware to specialist teams using generalist hardware. That’s where we went wrong.
What’s stopping us from exploiting Big Data as well as we might?
I see six broad classes of challenge arising from these trends…
- Big Data consumes a lot – storage, compute & bandwidth
- It creates highly variable workloads
- Made for cloud, so get on the tech curve
- Need to get the economics right
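Here is a minimal sketch of why the economics depend on variable workloads. It assumes a simple model in which owned capacity must be sized for the peak and is paid for every hour, while pay-per-use cloud capacity is paid for only in the node-hours actually used. The demand profile and the per-node-hour rates are hypothetical placeholders, not real vendor prices.

```python
def owned_cost(hourly_demand, rate_per_node_hour):
    """Owned capacity is sized for the peak and paid for in every hour."""
    peak = max(hourly_demand)
    return peak * rate_per_node_hour * len(hourly_demand)


def cloud_cost(hourly_demand, rate_per_node_hour):
    """Pay-per-use capacity is paid for only in the node-hours actually used."""
    return sum(hourly_demand) * rate_per_node_hour


def breakeven_utilisation(owned_rate, cloud_rate):
    """Average/peak demand ratio below which pay-per-use is the cheaper option."""
    return owned_rate / cloud_rate


# Hypothetical day: light load for 20 hours, then a 4-hour batch spike.
demand = [2] * 20 + [50] * 4  # nodes needed in each hour

# Hypothetical rates: assume rented capacity costs more per node-hour than owned.
OWNED_RATE, CLOUD_RATE = 0.30, 0.50

owned = owned_cost(demand, OWNED_RATE)
cloud = cloud_cost(demand, CLOUD_RATE)
utilisation = sum(demand) / len(demand) / max(demand)

print(f"owned: {owned:.2f}, cloud: {cloud:.2f}")
print(f"utilisation {utilisation:.0%} vs break-even "
      f"{breakeven_utilisation(OWNED_RATE, CLOUD_RATE):.0%}")
```

Under these assumptions, the break-even rule is straightforward: pay-per-use wins whenever average demand divided by peak demand falls below the ratio of the owned rate to the cloud rate. The hypothetical day above runs at 20% of its peak, against a break-even of 60%. That is the kind of spiky profile that makes getting the economics right matter more than the raw capacity.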
Complex & variable maturity
- A dozen apps in the Cloudera Hadoop distribution
- Relatively new & variable maturity
- Creates challenges for choice of tools, learning curve, integration, and creating a stable stack for operations
Complex, but can be learned. This is what people in IT do – they learn new tech. You shouldn’t be in the game if you can’t or won’t do it.
Need a deep stack of skills – technical, data-centric (data scientists / analysts / visualisation designers), problem domain expertise
Debate about data vs domain expertise – you need both, with tech support
Challenges:
- Finding people with the right mindset & building their skills
- Building integrated teams
- Changing management hierarchy, reward structures, org structures, etc., to support teams & their analytical/experiment-based approach
I’d focus here – automate the infrastructure & technology or push it to partners, & build in-house teams to focus on the high-value stuff
The cycle is not plan & do. It’s experiment, learn and evolve.
Lean startups are doing this with their systems. Traditional IT needs to internalise it to make big data work. Agile is a step in the right direction, but we need to turn the dial full up:
- Form long-running, integrated, business-led teams
- Create capacity for people to think & try stuff – e.g. Google’s 20% time
- Encourage failure – 50% of experiments should fail if we’re being radical enough; this maximises learning
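The claim that a 50% failure rate maximises learning can be read through Shannon entropy. This is our interpretation, not something the slide spells out. If an experiment succeeds with probability p, its outcome carries H(p) = -p log2(p) - (1-p) log2(1-p) bits of information, and that value peaks at one bit when p = 0.5. Here is a minimal sketch:

```python
import math


def outcome_entropy(p_success):
    """Shannon entropy, in bits, of a pass/fail experiment with success probability p."""
    if p_success in (0.0, 1.0):
        return 0.0  # a certain outcome tells us nothing we didn't already know
    p_fail = 1.0 - p_success
    return -(p_success * math.log2(p_success) + p_fail * math.log2(p_fail))


for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"P(success) = {p:.1f}: {outcome_entropy(p):.3f} bits")
```

Experiments that almost always succeed, or almost always fail, sit near zero bits. A portfolio tuned to succeed every time therefore learns very little per experiment. This is a simplification: it treats each experiment as an independent yes/no question and ignores the cost and size of each bet.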
We can only do this if we can understand the value of the outcomes. Without value, we can’t build an effective portfolio of experiments and investments.
It’s still hard for many orgs to write an effective business case – it’s hard to ascribe clear value while still experimenting, so it needs a bit of a “leap of faith”.
It may mean flipping our business model on its head to exploit the value in data:
- Rolls-Royce – value migrates from engineering to services, through the value of engine operational data
- Mecado – niche e-commerce platform. Value is in customer data, but may be more in product data: seeing the lifecycle of products & their service needs, accessories, etc. can provide valuable insight to product manufacturers
Again, the business domain specialists and the data scientists need to be close to the IT guys to make the value clear.
The web team is doing this on their data. So is Sales. And Marketing & Ops & IT.
This creates challenges for data fusion:
- Data definition (syntax & semantics)
- Data cleansing, transformation & integration
DATA QUALITY HAS ALWAYS BEEN AN ISSUE – BIG DATA SILOS JUST MULTIPLY THE PROBLEM – not a new problem, just a bigger one.
We heard all this before for data warehouses, but now we have:
- Semi-structured as well as structured data
- Rapidly changing data definitions
- People who have built deep, fortified silos in the cloud
Right now we’re building systems (human & technical) to play with data – shift it around and look at bits of it. We need to build systems to exploit the value in integrated data.
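Here is a minimal sketch of what fusing two silos involves, using hypothetical extracts from a web team and a sales team. All field names, formats and values are invented for illustration. The sketch walks through cleansing (normalising the join key), transformation into a shared definition (pence to pounds, blank strings to missing values) and integration. It then produces a data quality report on the gaps that only become visible once the silos are joined.

```python
# Hypothetical extracts from two silos describing overlapping customers
# with different field names and formats.
web_records = [
    {"Email": " Alice@Example.com ", "visits": 12, "spend_gbp": "19.99"},
    {"Email": "bob@example.com", "visits": 3, "spend_gbp": ""},
]
sales_records = [
    {"customer_email": "alice@example.com", "orders": 2, "revenue_pence": 4500},
    {"customer_email": "carol@example.com", "orders": 1, "revenue_pence": 1200},
]


def normalise_email(raw):
    """Cleansing: trim whitespace and lower-case so keys match across silos."""
    return raw.strip().lower()


def to_common_schema(web, sales):
    """Transform both silos into one shared definition, keyed on email."""
    merged = {}
    for r in web:
        rec = merged.setdefault(normalise_email(r["Email"]), {})
        # A blank string becomes an explicit missing value, not zero spend.
        rec["web_spend_gbp"] = float(r["spend_gbp"]) if r["spend_gbp"] else None
        rec["web_visits"] = r["visits"]
    for r in sales:
        rec = merged.setdefault(normalise_email(r["customer_email"]), {})
        rec["sales_revenue_gbp"] = r["revenue_pence"] / 100  # pence -> pounds
        rec["sales_orders"] = r["orders"]
    return merged


def quality_issues(merged):
    """List gaps exposed by integration: missing values, single-silo customers."""
    issues = []
    for key, rec in sorted(merged.items()):
        if "web_spend_gbp" in rec and rec["web_spend_gbp"] is None:
            issues.append((key, "missing web spend"))
        if "web_visits" not in rec:
            issues.append((key, "unknown to web team"))
        if "sales_orders" not in rec:
            issues.append((key, "unknown to sales"))
    return issues


merged = to_common_schema(web_records, sales_records)
for customer, problem in quality_issues(merged):
    print(f"{customer}: {problem}")
```

Web spend and sales revenue are deliberately kept as separate fields. The two silos may not mean the same thing by “spend”, and settling that is a semantic question for the people who own the data, not something the code can decide.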
Technical – is being talked about, because the vendors have something to sell
- The vendors are talking about this & they’re selling solutions that at least start to address the issues (varying levels of maturity)
- This is what vendors do best, so let them do it / work with them
Organisational – is known & some are talking about it, but it’s less discussed because there’s nothing easy for vendors to sell
- Need skills & processes to manage the technical (& the associated vendors)
- Need skills & processes to manage the valuation challenges
Business model – not really even being discussed: people assume/hope we’ll solve it with pilots. We need to address it more specifically.
- I’m not sure how to solve this – I think we need to put the teams together & make it their explicit goal to solve it.
I think we lost it in items 3 & 4 – we stopped exploring the data and started getting bogged down in the inter-team dynamics.
Who sees this in their daily lives – time goes into managing organisational dynamics, not delivering results?
We can’t avoid org dynamics, but if we focus elsewhere (on the tech), they become more invidious. We need to manage them actively to get the best results, and big data is making us stare in just the opposite direction.
Solution: organise ourselves to build skills & place them within cross-functional teams, put the right context around those teams, and focus our initial experiments on ranging widely to explore new value models.
I think we probably need new org models to do this.
If I were building an IT org from scratch, this is what it’d look like… (widths are not proportional – just what I could fit)
- Infra & Ops: may be outsourced.
- Service Mgt: must be in-house. Needs to be skilful enough to manage the infra & ops core, and to add new services into it.
- Service & application development: the innovation core. Needs a core of skills & capability in-house, perhaps supplemented by external agencies, system integrators, etc. Partnering approach, but internally led.
- “Petals”: cross-functional, business-led teams. They sit outside IT control, but with IT dev specialists within the team to support it. They also sit outside any business silo, but will have specialists from the relevant silos, plus data scientists & other expertise. Led from the business domain. Experiment-led service development.
Build the team skills and experimental approach for cross-functional teams. The infra and app stack will follow easily. Over time, we’ll start to get a handle on valuation and reduce fragmentation.
I know startups which are effectively organised this way. Now larger orgs need to do it too.
Big Data Teams, not Technology
Big Data: Teams not Tech, Nov 2012
Infrastructure
Made for the cloud – the challenge is economic
Image credit: jblyberg
Application Stack
[Figure: big data application stack. Labels shown: Operational Systems; System Management; Cleansing & Transformation; Visualisation & Analysis; Operational Delivery; Scheduling, Workflow; Capture; Resource Management; Record Management; File System; Storage; Infrastructure. Example components: ZettaSet, ZooKeeper, Oozie, Pig, MapReduce, SQL, NoSQL, HDFS. Also labelled: Tooling, Learning, Integration, Stabilising for Ops.]
Skills
Organisational context for multi-skilled teams
Image credit: izzyplante
Experimentation
Create capacity to try stuff. Encourage failure.
Image credit: NOAA Photo Library
Valuation
Find specific value drivers & business models
Image credit: dogbomb
Fragmentation
Systems (human and technical) that bypass the silos
Data quality. Data quality. Data quality.
Image credit: turloughmor
Infrastructure, Application Stack, Skills, Experimentation, Valuation, Fragmentation
Image credit: ell brown
[Figure: organisation model. Labels shown: CMO, COO, …; Data Exploitation; Service & Application Development; Service Mgt; Infrastructure & Operations.]
Build teams that can deal with variety.
Operate in tight cycles – value experiments.
(Build technical infrastructure & ops to support this.)
Thank You
graham@grahamoakes.co.uk
@GrahamDOakes
Graham Oakes Ltd
Making sense of technology…
Many organisations are caught up in the complexity of technology and systems. This complexity may be inherent to the technology itself. It may be created by the pace of technology change. Or it may arise from the surrounding process, people and governance structures. We help untangle this complexity and define business strategies that both can be implemented and will be adopted by people throughout the organisation and its partner network. We then help assure delivery of implementation projects.
Clients…
- Cisco Worldwide Education – Architecture and research for e-learning and educational systems
- Council of Europe – Systems for monitoring compliance with international treaties; e-learning systems
- Dover Harbour Board – Systems and architecture review
- MessageLabs – Architecture and assurance for partner management portal
- National Savings & Investments – Helped NS&I and BPO partner develop joint IS strategy
- The Open University – Enterprise architecture, CRM and product development strategies
- Oxfam – Content management, CRM, e-Commerce
- Thames Valley Police – Internet Consultancy
- Sony Computer Entertainment – Global process definition
- Skype – product development lifecycle
- Amnesty International, Endemol, tsoosayLabs, Vodafone, …