Hi I’m… Welcome to my talk on Big Data and Corporate Evolution
Please feel free to contact me or tweet at me or about this talk while I’m up here.
I might be taking a bit of a risk with this talk, because rather than diving into the technology behind “big data” I’m going to try to talk about big data in the context of the industrial age and our continuing transition into the information age. Then I want to explore the impact of big data on the evolution of the corporate form. The industrial age is often split into the periods that were based on steam and electricity. The information age also has significant moments of technological discontinuity, and one of my arguments here is that “big data” is one of them. This is a talk about what the corporation was, what it is becoming, and how the idea of big data might influence it. My core argument is that big data isn’t just another technology trend that will come and go, but that, like the Internet before it, it is a major discontinuity and will cause major changes in the form of the corporation. It isn’t a technology, it is a technology epoch.
Let’s go back to the industrial age. It was an era of vast change when we harnessed first steam and then electricity to achieve huge increases in productivity.
And it was the first time in human history to see a sustained growth in GDP. For the first time ever the average human got wealthier.
It was an era characterized by ever more specialized work that often occurred in huge industrial settings. To coordinate the efforts of all of that specialized labor across these vast enterprises, we used lots of rules and a hierarchy of authority. At scale the corporation was no longer the exercise of an owner’s will, it was a kind of organism.
An organism whose systems of control were based on the bureaucracy they adapted from governments. Bureaucracy, literally “desks” from the French, consisted of functional departments focused on specific tasks of the organization. They held and moved paper between them for a combination of record keeping and control.
In short, bureaucracy is an … The corporation and bureaucracy became synonymous during the industrial revolution.
So, getting back to the subject of this talk, let’s make an evolutionary analogy. At that stage, as a bureaucracy of functional departments based on paper memory and signaling, the corporation was something like a nematode: an incredibly plentiful and diverse but very simple worm with a simple nervous system. It turns out that over 16,000 nematode species are parasitic, but that is not relevant to my talk.
A typical nematode nervous system consists of only 302 neurons. In fact, its core pharyngeal nervous system (essentially its brain) contains only 20, yet it is able to effectively manage homeostasis, direct movement, detect information in its environment, and create complex responses. Of course the nematode isn’t “conscious” of any of this. These are dispositional responses; essentially deterministic reflexes encoded in the simple network. The worm has dispositions to move toward food, for example. These dispositions aren’t unlike the rules and processes encoded in corporate bureaucracy. They too are adequate to manage its response to market and other stimuli. At least those that fall within a range of expected conditions.
In 1954, Joe Glickauf of Arthur Andersen implemented a payroll system for General Electric on a UNIVAC I. This was one of the first general purpose computers used to automate a traditionally paper-based business process in the U.S. (and the beginning of IT consulting). Systems like this were adopted rapidly throughout the 1950’s, and thus began the corporate shift into the information age.
Of course, even after corporations began to rely on computers for a variety of data processing tasks, business remained bureaucratic. Which is to say it was still hierarchical and based on fixed rules and specialized functions.
In fact, as we automated those existing bureaucratic components, we usually just adapted the previous paper-based systems into code. Invoices and trades and accounts and inventories and so on migrated into the machine. We emptied our filing cabinets into database tables but we didn’t immediately change much about how the business worked.
If we summarize the high-level characteristics of information technology over this phase of the information age, it might look like this. It essentially mirrored the bureaucracy it was automating and, for most industries, focused on costs and efficiency.
During those first projects in the 1950’s we didn’t fundamentally evolve the worm’s nervous system, we mostly set about digitizing it in its existing form. Substituting digital automation, controls, and record keeping for paper. In fact, that’s mostly what we’ve all been doing for the last 55 years, wiring the worm and automating bureaucracy. And while for a long time we didn’t really change the characteristic nature of the corporation (it remained dispositional and reactive), it did become more responsive, efficient, and scalable. And all of this work was a departure point in the corporation’s evolutionary history. That digital foundation would become the substrate on which further evolutionary processes could occur.
And then, right in the middle of this process, about 30 years ago or so, Leonard Kleinrock, Lawrence Roberts, Robert Kahn, and Vint Cerf invented the computer network that ultimately became the internet, and by the mid to late 90’s these technologies began to have an impact on the corporation…
Now instead of just automating internal processes, we began to focus on integration with trading partners, etc.
Our little worm was beginning to sense, and in very rudimentary ways, interact with its more remote surroundings. It could see further and respond faster.
But that network connectivity isn’t just changing the corporation’s external interactions. With the rise of new communication and collaboration mechanisms it is changing how we organize internally, if not intentionally, then in an ad hoc emergent way. More and more we are ignoring our org charts and organizing organically in direct response to the work. Finally, bureaucracy is beginning to give way to other models of organization, and some businesses, particularly on the web and in other information-focused industries, are beginning to fundamentally change what it means to be a business.
And we *need* that increasing internal organizational complexity. A hierarchical organization can never be smarter than those at the top, but our worm’s extended ecosystem is now too complex for that small group to deal with. So we need internal structures, processes, and time cycles that are better able to cope.
Ok, so we’ve made it to the part of the information age that is based on computing and internetworking and are about to enter the period of big data. So, let’s take a moment to talk about what big data is. We’ll come back to our meta discussion after this short explanatory digression.
This term is getting a lot of mileage these days, but what does it mean?
Here is one definition. It’s a pretty good one, I think.
But that’s probably way too specific to really capture what is going on… There isn’t really a single definition right now. As a term, “big data” is more like a word cloud of related ideas that are influencing significant changes in how we store, manage, and analyze information in the corporate enterprise. Let’s just skim a few of these…
And it all probably started with web logs, back when someone at Amazon or wherever said “You know, these are useful for more than just troubleshooting web servers.” Logs, the first big data source, provided direct observation of customer behavior in near real time. That is a powerful thing, and companies away from the web are waking up to the possibilities of similar kinds of data in other domains. Think of Path Intelligence in retail, Progressive insurance, set-top boxes, smart meters, phone location data (traffic monitoring), hardware and software heartbeat / phone home data, video and audio, … Some of this is data that they already had but viewed only in an operational context until now, and some of it is newly acquired data, where they are purposely designing products to not just serve their customers but to also capture new forms of useful data.
Since that time we’ve seen massive growth in non-transactional, and generally less structured, sources of data that are greatly outstripping the growth in core transaction activity. These data sources are dwarfing traditional transactional sources. And to reiterate, it’s not just web logs, it’s all kinds of semi-structured and unstructured data that once would have been considered useless from an analytical point of view. The corporate enterprise is discovering the value in all kinds of data beyond the transaction: geo-locations, unstructured text (e.g. Twitter), machine logs, sensor data, … other “data exhaust” from our increasingly digital lives…
And let’s not forget “other people’s data”, which is even bigger still. Our worm is beginning to see the outside world and the outside world generates a LOT of data. We all know about things like using Twitter and other social streams to conduct sentiment analysis, but governments are opening up massive data sets through their open data initiatives and companies like Infochimps and Microsoft are creating data markets that make all kinds of massive data sets available.
Traditional: enterprise data. Usually transactional records originating in an OLTP system.
New: unstructured data. Previously thrown away or ignored. Things like web and server logs, VRU records, …
Other People’s Data: the result of open data initiatives, data brokers, etc. Often unstructured web or social media data.
Why is it happening now? What is causing our basic architecture for data storage to change? Storage is getting cheaper faster than networks are getting faster, and while CPUs are getting faster, they are doing it with multiple cores. Moore’s law on a single core is basically dead, and that is driving architectures toward parallelism. It’s happening later in the corporate enterprise, but we are going to be following the path the web blazed. The future is parallel.
The first implication of all that cheap disk is that we are going to fill it up. We’ll always fill it up.
The other implication is that the data on those disks is going to be heavy and is going to tend to stay put. That’s why we are seeing architectures like Hadoop become popular, because unlike the traditional RDBMS where you run a query to move the data to the analyst, we are going to leave the data where it is and send the analytical algorithm to local compute nodes. No longer just a “persistence layer” in our applications, data is going to be the platform on which future applications are built.
Beyond that relationship between disk, CPU, and network, there is a whole confluence of forces that will make us think differently about data. And in fact, data is going to be much more central to our enterprise. One of the important ones that I want to point out here is the shifting enterprise IT focus: from an inward / cost focus we can expect to be more and more focused on using data to enhance revenue. We are going to experience significant cultural changes as even everyday corporate IT jobs take on more of the characteristics of web and product orientation.
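If it helps to see that “send the algorithm to the data” idea in miniature, here is a rough sketch. It is not Hadoop’s API; it just simulates the pattern in a single Python process. The node names, the log lines, and the functions `count_paths`, `run_on_nodes`, and `merge` are all made up for illustration.

```python
from collections import Counter
from functools import reduce

# Hypothetical "nodes": each one holds its own slice of web log lines locally.
NODES = {
    "node-a": ["GET /home", "GET /cart", "GET /home"],
    "node-b": ["GET /home", "POST /checkout"],
    "node-c": ["GET /cart", "GET /cart"],
}

def count_paths(lines):
    """The 'algorithm' we ship to each node: count requests per URL path."""
    return Counter(line.split()[1] for line in lines)

def run_on_nodes(func, nodes):
    """Run func next to each node's data; only the small summaries come back."""
    return [func(lines) for lines in nodes.values()]

def merge(partials):
    """Combine the per-node summaries into one overall result."""
    return reduce(lambda a, b: a + b, partials, Counter())

totals = merge(run_on_nodes(count_paths, NODES))
print(totals)
```

The point of the sketch is what crosses the “network”: the raw log lines never leave their node, only the per-node counts do. In a real cluster the per-node step would run in parallel on separate machines.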
Once an enterprise has all this data, what are they doing with it? If we can process it at scale we can conduct all kinds of analysis: machine learning, statistics at scale, “data science”, behavioral analysis, … in short, reasoning.
Which brings us to the other popular term in this space, Data Science. Data science is about applying a facsimile of the scientific method to our data with the goal of turning data into products. We won’t just be recording the sale of products, or automating that sale; data and its analysis will be a core component of the products we build and sell, whether we are on the web or in more traditional industries. Think about how web companies turn real time data analytics into re-ordered search results, for example, then apply that kind of thinking to other businesses. Or as Brian Dolan of Fox Interactive Media says, “I turn XML into cash.”
But it’s not enough to analyze data. For it to impact the products we make and sell, we have to close the loop – whether it be an “act” step at the end of the scientific method, or …
…or the OODA loop (if you are familiar with that term). It’s not just the data and our ability to analyze it at scale, it’s the closing of the loop of your choice. It’s making the data analysis an inline process, moving beyond dispositional responses to something much more intelligent. For example, on the web if you back up your analytics with strong agile development and devops you can nearly continuously deploy new features. Or even better, the applications themselves can be data and analytically driven. This will look different in other industries, but expect to look for new data sources, new sources of feedback, and new ways to close the insight to product loop more quickly. On the web this loop is becoming more and more automated. Data comes in, is analyzed (reasoned on), decisions are made and deployed into the product automatically. That may not happen in your industry, but more data will give you more ways to more rapidly change your interface to your customers. The latter bits of this cycle are like motor nerves in an animal. You have to be able to act on what you learn and think.
So, let’s get back to our meta story about corporate evolution. Big data is giving corporations the ability to greatly increase the sensing resolution of their environments, to understand the behaviors they see, and to predict future behaviors to create more attractive products. In a very real sense the corporation is developing the ability to map its environment and reason on those maps.
When it comes to its primary businesses, the analytically-enabled corporation may turn out to be smarter than the collection of humans that run it. The corporation is a legal fiction that permits a large group of people whose efforts are coordinated to appear before the law as a single entity. Historically its organized labor was much greater than the individual labor of its participants, but its organized sum total intelligence was less than the sum of the intelligence of its participants. Generally a corporation could be no smarter in the marketplace than the smartest person at the top. But the combination of less hierarchical, more participative organization, and closed big data feedback loops to product are changing that. The corporation is becoming smarter. It is evolving the early beginnings of something like a mind.
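To make the automated version of that loop a little more concrete, here is a toy sketch of data coming in, being analyzed, and a decision being deployed into the product. Everything in it is an assumption for illustration: the `Product` class, the layout variant names, and the simple “pick the highest click-through rate” rule are mine, not a description of how any particular web company does it.

```python
def analyze(events):
    """Reason on the data: compute click-through rate per product variant."""
    stats = {}
    for variant, clicked in events:
        shown, clicks = stats.get(variant, (0, 0))
        stats[variant] = (shown + 1, clicks + int(clicked))
    return {v: clicks / shown for v, (shown, clicks) in stats.items()}

def decide(rates, current):
    """Switch only if some variant strictly beats the one currently deployed."""
    if not rates:
        return current
    best = max(rates, key=rates.get)
    return best if rates[best] > rates.get(current, 0.0) else current

class Product:
    """Hypothetical stand-in for the live product and its deploy step."""
    def __init__(self, variant):
        self.variant = variant

    def deploy(self, variant):
        self.variant = variant

def close_the_loop(product, events):
    """Observe -> analyze -> decide -> act, with no human in between."""
    rates = analyze(events)
    product.deploy(decide(rates, product.variant))
    return rates

# Observed behavior: (variant shown, did the customer click?)
events = [
    ("layout_a", False), ("layout_a", True),
    ("layout_b", True), ("layout_b", True),
]
product = Product("layout_a")
rates = close_the_loop(product, events)
print(product.variant, rates)
```

The `deploy` call is the “motor nerve” here. Without it, the analysis is just a report; with it, what the data says changes what the customer sees.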
So, we are at a point of discontinuity. For fifty years we’ve focused on bureaucracy automation.
But now we are shifting our focus. Let’s make up an absurdly difficult to say word to capture the goal of IT in the future. Our jobs are going to be migrating from making the corporation more efficient, to more intelligent.
So, prepare to set down your ESB and grab a compute cluster full of data. To make companies smarter our jobs are going to be changing. We are going to be processing existing data, acquiring external data, and looking for ways to create more. You may find yourself designing product features for their data gathering potential. In addition to growing data, we will be building more and smarter ways to process it and analyze it. And don’t forget, we will be re-architecting our systems, processes, and culture to be able to act on what we learn in real time. Our corporate organism doesn’t just need a brain, it needs motor neurons as well. An intelligent corporation is one with a post-dispositional mind wired to action, one that participates in closed loops.
And now our primary job is to keep making it smarter.
As the corporation evolves the characteristics of its information technology evolve with it.
This doesn’t mean that our current jobs disappear. Far from it. The corporation will have “vestigial IT” too just like the human brain still has dispositional regions. After all, we still pull our hands away from a hot stove without thinking about it first, and companies will continue to automatically resupply empty shelves. Our existing automation and transactional systems will still be there working in concert with this new “big data” layer of sensing, mapping, and reasoning.
Ok, so let me leave you with a silly picture to show how this new company will combine traditional dispositional and new “image mapping” and reasoning capabilities in a single architecture. Big data doesn’t do away with any of the current things that corporate I.T. does, but it adds to the overall architecture by adding memory and reason to the existing dispositionally oriented systems.
Jim Stogdill / Accenture. Big Data and Corporate Evolution: A meta discussion
When the data size and performance requirements “become significant design and decision factors for implementing a data management and analysis system.” Roger Magoulas and Ben Lorica, O’Reilly Media
Machine Learning; Parallel; Privacy; Open Source; Petabytes; NoSQL; Data Exhaust; Cassandra; Hadoop; R; Sensors; Unstructured; Analytics; Cloud; Open Data; Data > Algo; Creepy; Predictive
Industrial Revolution Information Age Steam Electricity Computing Internetworking Big Data Brain image source: http://mset.rst2.edu/portfolios/t/thoman_j/toolsvis/mapplerproject/brain.html
Industrial Revolution Information Age Steam Electricity Computing Internetworking Big Data Dispositional Automated Intelligent Brain image source: http://mset.rst2.edu/portfolios/t/thoman_j/toolsvis/mapplerproject/brain.html
Emergent; Agile; Insight; Networked; Reasoning; Responsive; Intelligent; Sensing, external focus; Hypothesize and Test; Maps and Images; OODA; Profit Center; Resilient
“The two spaces point to different ages in brain evolution, one in which dispositions sufficed to guide adequate behavior and another in which maps gave rise to images and to an upgrade of the quality of behavior. Today they are seamlessly integrated.” From Self Comes to Mind: Constructing the Conscious Brain, Antonio Damasio