So what I like to talk about, and what seems to be resonating really well with our clients, is a set of four paradigm shifts that we’re seeing in the market. The first paradigm shift enabled by big data is that organizations are now able to leverage more of the data being captured. Let me explain what I mean by this. If you look at how most companies today do analytics, how they generate insight, how they support their decision-making process, they analyze a small subset of information. They collect what is often referred to as a statistically relevant sample of information, analyze that small subset to identify patterns and trends, and identify what they think might be happening or what they think is most relevant to their business, to then generate insights upon which they make all of their different business decisions. That includes how to identify risk with customers, how to identify what equipment may be failing and needs to be addressed, how to identify security and fraud risks, how to identify what to offer to customers in cross-sell efforts, and how to identify which customers to target. Basically, almost any type of analytic decision in any organization is focused around this approach. The fundamental change we’re seeing here is that organizations are now looking to analyze all available information to generate more relevant insights, and we have some great examples of this, including a credit card company that was generating offers for balance transfer campaigns, those offers you probably all get in the mail offering to transfer your balance from one credit card to another. The way they had identified those was through the traditional approach, where they identify a subset of customers they think may be interested and use that to build their model, and what they found was that they were getting declining response rates on those offers.
So we were actually able to go in and show them how they could change the way they develop their models: instead of building their models against a small subset of their customer base, they could analyze their entire customer data set. That gets to the volume of information. The other thing they looked at doing was incorporating additional parameters, looking beyond the standard metrics like demographics, household income, and average balance to transaction details and other types of information they hadn’t been able to incorporate into their models before. And this gets to what you often hear people talk about as the variety of information. Now, oftentimes we relate variety just to unstructured information, but variety is not only about unstructured data; it’s also about being able to leverage different types of data, even other types of structured data, other parameters, and incorporate them into your analysis. So this is the first major paradigm shift we’re seeing: our clients are looking to leverage more information than they ever have in the past, and that is going to enable them to generate much more relevant and accurate insights upon which to base their business decisions.
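To make the volume point concrete, here is a minimal sketch, with an entirely made-up customer population and response rates, of why the full-data estimate is more reliable than the traditional small sample: the sample estimate of a campaign response rate carries far more error than an estimate computed over everyone.

```python
import random

random.seed(7)

# Hypothetical customer base: a small hidden segment responds at 30%,
# everyone else at 5%. All numbers are invented for illustration.
population = [random.random() < (0.30 if random.random() < 0.1 else 0.05)
              for _ in range(100_000)]

def response_rate(customers):
    # Fraction of customers who responded to the offer.
    return sum(customers) / len(customers)

full_rate = response_rate(population)                        # analyze everyone
sample_rate = response_rate(random.sample(population, 500))  # traditional sample

print(f"full-data estimate: {full_rate:.4f}")
print(f"500-row sample:     {sample_rate:.4f}")
```

The full-data estimate lands very close to the true blended rate, while the 500-row sample can drift well away from it, which is exactly the effect driving declining offer response rates in the story above.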
Now the second major paradigm shift we’re seeing is very much related to that, and it’s the fact that when you are making your decisions based upon a small amount of information, it’s critical that the information is extremely accurate. So most organizations have had to carefully cleanse all the information before analysis. They have to spend a lot of time and effort pulling together that data and addressing data quality, and in fact, what we find is that most people doing the analytical work spend 75-80% of their time actually preparing the data and only 20-25% of their time doing the actual analysis. Here we’re seeing a fundamental paradigm shift: when you are analyzing a small, statistically relevant sample of data, it’s critical that the sample is completely accurate, which is why all that cleansing was so important. But when you’re doing analysis on larger data sets, you can analyze the information as is, because you can deal with the noise more easily. So it’s changing these environments, and what we’re seeing is that organizations are looking to do at least some initial level of analysis before they go through all the effort to cleanse and prepare the data and make it ready for broader consumption. The idea is that they start analyzing the information as is, then figure out what information they need for the next level of analysis, and then cleanse only as needed. That way they can accelerate the time to insight and start delivering insights more quickly while reducing the cost and level of effort. It also enables them to fail fast, and this is something you’ll hear a lot of your clients talking about. In the past, when they wanted to generate some insights, when they had an idea they wanted to pursue, it took a lot of time, effort, and money to explore that idea and determine whether the insight was valuable before they actually got an answer.
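The "analyze as is, cleanse as needed" idea can be sketched in a few lines. The feed, field names, and noise below are all made up: a first pass tolerates bad rows instead of halting to fix everything, and only the one field the next level of analysis needs gets cleansed.

```python
# Hypothetical raw feed with typical noise: unparsable balances,
# inconsistent region codes, stray whitespace.
raw = [
    {"cust": "A1", "balance": "1200.50", "region": "NE "},
    {"cust": "A2", "balance": "n/a",     "region": "ne"},
    {"cust": "A3", "balance": "980",     "region": "NE"},
    {"cust": "A4", "balance": "2,100",   "region": "SW"},
]

def safe_float(s):
    """First pass: skip noise instead of halting to cleanse everything."""
    try:
        return float(s.replace(",", ""))
    except ValueError:
        return None

# Initial analysis on the data as is; bad rows are simply tolerated.
balances = [b for r in raw if (b := safe_float(r["balance"])) is not None]
print("rows usable as-is:", len(balances), "avg:", sum(balances) / len(balances))

# Only now cleanse the single field the next level of analysis needs.
cleansed_regions = {r["cust"]: r["region"].strip().upper() for r in raw}
print(cleansed_regions)
```

The point is the ordering: the first insight arrives before any cleansing effort is spent, and the cleansing that does happen is scoped to what the deeper analysis actually requires.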
And as a result, oftentimes when they failed, it was only after having invested a lot of time, effort, and money. So the idea is that if we can enable them to do that initial analysis more quickly, when they fail they’ll fail faster, they’ll have wasted less time, money, and effort, and they can actually do a greater amount of analysis. This leads to the third paradigm shift we’re seeing, and this one is really challenging many of our clients because it’s changing the way they think about doing analytics and the way they think about looking at the information across their enterprise. If you look at how most companies have done analysis in the past, they start with a hypothesis. They say, for example, I think this set of customers that looks like this will be interested in these products. Or, I think that under these conditions, this set of equipment is going to have problems and may fail. They then figure out what questions they need to answer to test that hypothesis, they pull together the data required to answer those questions, and then they prove or disprove the hypothesis. The challenge we often hear from customers related to this is that they often don’t know what they don’t know. They don’t always know exactly what questions to ask, and they may be missing things. So the change in approach is to let the data lead the way. Start with the data, explore the data, identify any correlations and patterns in the data, and then, from that, generate insights. So explore all the data and identify correlations, and sometimes correlations are good enough. Now, it’s important for us to recognize, and to acknowledge in front of our clients, that this isn’t always necessarily the end of the process. They will generate some insights that they will then need to explore further and do deeper analysis of.
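The "let the data lead the way" step can be sketched as a pairwise scan for correlations with no starting hypothesis. The telemetry fields and the hidden temperature-vibration link below are invented for illustration.

```python
import math
import random

random.seed(0)

def pearson(x, y):
    # Standard Pearson correlation coefficient.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical equipment telemetry: vibration secretly tracks temperature.
n = 1000
temp      = [random.gauss(70, 5) for _ in range(n)]
vibration = [t * 0.4 + random.gauss(0, 2) for t in temp]
humidity  = [random.gauss(50, 8) for _ in range(n)]
fields = {"temp": temp, "vibration": vibration, "humidity": humidity}

# Scan every pair of fields and surface strong correlations for deeper analysis.
names = list(fields)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        r = pearson(fields[a], fields[b])
        if abs(r) > 0.5:
            print(f"{a} ~ {b}: r = {r:.2f}")
```

Only the temperature-vibration pair surfaces; the exploration itself nominated the question worth asking, which is exactly the inversion of the hypothesis-first approach described above.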
And, in fact, there are some great examples out there that we should be wary of, and that we should share with our clients to show them we recognize further work needs to be done. One that some of you may have heard me talk about is an analysis that identified a high correlation between the cattle population in India and the performance of the stock exchange. Now obviously, we don’t want our financial advisors giving us stock recommendations based on the cattle population in India. But it gives you some insight into what might be found in the data that you need to be wary of, explore further, and apply some human intelligence to. Sometimes, though, those seemingly unrelated correlations may actually have some deeper-rooted causality behind them. Another great example comes from an analysis of data from the 1700s in the Netherlands, where people actually believed that storks brought babies. What I mean by this is that there was a high correlation between the birth rate in the Netherlands and the stork population: when the stork population went up, there were a lot more new births. Now, these seemingly unrelated activities turned out, on further analysis, to be very much related, because they were both being caused by the same thing. When the wheat crop output was flourishing, people were healthier, and as a result there were fewer infant deaths and more births. At the same time, because of the higher crop output, there was more feed for the storks, so the stork population grew. So there actually was a relationship; while the two things didn’t directly cause each other, there was another factor they identified that was causing both.
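The stork story can be simulated directly (all the numbers below are invented): a common cause makes two otherwise unrelated series correlate strongly, and removing the common cause’s contribution makes the correlation largely disappear.

```python
import math
import random

random.seed(1)

def pearson(x, y):
    # Standard Pearson correlation coefficient.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Wheat output drives both the stork population and the birth rate;
# the two never influence each other directly.
years = 200
wheat  = [random.gauss(100, 15) for _ in range(years)]
storks = [w * 0.5 + random.gauss(0, 4) for w in wheat]
births = [w * 0.3 + random.gauss(0, 3) for w in wheat]

r_raw = pearson(storks, births)
print(f"storks ~ births: r = {r_raw:.2f}")   # strong, but spurious

# Remove the common cause's contribution and correlate the residuals:
# the apparent relationship collapses toward zero.
resid_s = [s - w * 0.5 for s, w in zip(storks, wheat)]
resid_b = [b - w * 0.3 for b, w in zip(births, wheat)]
print(f"after removing wheat: r = {pearson(resid_s, resid_b):.2f}")
```

This is the "apply some human intelligence" step made concrete: the raw correlation is real and even useful for prediction, but the causal story only becomes clear once the common factor is accounted for.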
So the point here is to provide some analogies that our clients can relate to, and to show how oftentimes just identifying these correlations can provide significant benefit and value.
So the fourth paradigm shift is equally important for us to be able to talk about, and you’ll see as we go through all of these paradigm shifts that there’s a key reason we’re bringing them up: they really lead to discussions we can be having with our clients about ways we can competitively differentiate ourselves. This fourth paradigm shift is critical to that. In the past, organizations have taken data, landed it in a repository, done some analysis, and then generated their insight. So they analyze data after it’s been processed and landed in a warehouse. The paradigm shift we’re seeing is that with these growing volumes of data, and with the need to deliver faster insights to the business, organizations are looking to leverage the data as it’s captured. They want to analyze the data while it’s in motion and generate insights in real time. Sometimes this is to generate insights in real time, and sometimes it’s just to enable them to process the extreme volumes of data they’re now dealing with, because their traditional batch-oriented processes for capturing and extracting even general information, let alone insight, are being challenged. So they’re looking at how they can leverage the data and process it in motion. And, of course, as we’ll talk about later, IBM is uniquely positioned to address this particular requirement, so it’s important that we highlight this paradigm shift to our clients.
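A minimal sketch of analyzing data in motion, as opposed to landing it first: a sliding window scores each event as it arrives and raises a flag in the same pass. The event stream, window size, and threshold below are all made up.

```python
from collections import deque

class SlidingAverage:
    """Keep a fixed-size window over the most recent events."""

    def __init__(self, size):
        self.window = deque(maxlen=size)  # old events fall off automatically

    def push(self, value):
        self.window.append(value)
        return sum(self.window) / len(self.window)

monitor = SlidingAverage(size=3)
alerts = []
for txn_amount in [40, 55, 48, 300, 290, 310]:  # transactions arriving in motion
    avg = monitor.push(txn_amount)
    if avg > 150:                                # fraud-style flag, in real time
        alerts.append(txn_amount)

print("flagged while in motion:", alerts)
```

The insight (a sudden jump in spending) is produced per event, while the data is still moving, rather than hours later after a batch load into a warehouse.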
And this high-level picture that you’re seeing here on slide eight is a great example of that. You have transaction and application data generated in operational systems across the enterprise. That data is often pulled and dumped into some staging area, or pulled directly into the warehouse, where you provide a normalized view and a common data model for the enterprise, so that people across the enterprise can review that information, analyze it, report on it, et cetera. The idea is that you go through a lot of data quality work and data transformation to pull that data into that common environment. Then you pull it into different views or into data marts to enable people to do business-specific reporting and some ad hoc analysis and review of that data. Then, separately, you’re pulling data from the warehouse, and oftentimes also directly from your operational systems, into your predictive modeling and analytic environments, your SPSS or SAS or other environments, to do deep analytics and predictive modeling. And finally, what we see is that people have been trying to reduce and manage the growing cost of the warehouse, so they archive the less frequently used data from the warehouse. I heard a great analogy from someone about this: they said that basically, they believe the archive is where data goes to die, and that you can only get data out of the archive through an act of God or an act of government. In most cases, the only way you get data out of an archive is an act of God, where there’s some natural catastrophe and you need to go through your DR plans to recover the data, or an act of government, where there’s some lawsuit that requires you to pull data for legal or regulatory reasons.
So I thought it was a great analogy that we can relate to: these archives often become, basically, graveyards for data, and organizations can’t necessarily get the data back out in the future. And of course, you’ll see why this becomes important as we talk through the next-generation architecture and capabilities.
One other great analogy that I’ll share with you came from a guy responsible for the information management environment at a large company. He told me that he had an eight-hour window in which to get data loaded into all of his analytic environments, and his current batch processes were taking seven hours. Basically, what that meant to him was that he was an hour away from being fired every day. If anything ever went wrong, or if that time window ever increased past eight hours, there was no way to catch up. So this is what a lot of our clients are dealing with, and oftentimes they are limiting how many processes they run or taking other measures. This is a way we can help them deal with that and start leveraging more of the data across the organization. So the next set of capabilities is equally important, and this is another key area where you’ll see how the big data story expands to the rest of our portfolio. This is a key thing we need to start talking about with our customers, because we’re the only vendor that can truly address this broad set of capabilities. And that’s the fact that information integration and governance capabilities need to be applied to big data, to all your data. There is a set of key capabilities required not just to move data between environments, but really as a backplane to their big data environments. This is something that a lot of our customers in this experimentation mode haven’t been thinking about, but will need to think about as they move into production. And it starts with a standard information integration capability.
So, just the ability to move data across all of these different zones: as we move toward these purpose-built components, as we move toward having multiple zones for data, information integration capabilities are going to become critical. In fact, one of the things we see is that the information integration approaches people are taking with Hadoop are very immature and require a lot of coding; many people have told me it’s like going back to the days of COBOL coding. This is what we need to position our technologies to help customers address. We’ve done a tremendous amount of work to allow customers to apply DataStage and Information Server jobs and run them directly within our Hadoop environment. So we have tremendous capabilities and competitive advantages in this space, and we need to be talking about how our information integration offerings can be used across this next-generation data architecture for big data, and how they can be used to our clients’ advantage to enable them to really take advantage of their data. Another capability that’s just as important, and going to become even more important as you start bringing in these huge volumes of data and more data sets from more sources: if you look on the left side, what’s happened through this presentation is that as we’ve added zones, we’ve actually added the ability to handle additional data types. When we added the exploration and landing zone a few slides ago, you saw that we added the ability to capture and manage image and video, enterprise content, social data, and third-party data. When we added real-time processing and analytics, we added the ability to handle machine and sensor data and other types of data.
Well, now, with all these different data types and a broader set of data sources, it’s even more critical that we can identify all the data related to a particular customer, all the data related to a particular piece of equipment, et cetera. So master data management capabilities are going to become even more important in this big data environment, and I think we’re going to see a resurgence of MDM discussions around that; it’s a huge opportunity we should be leveraging and taking advantage of. At the same time, there are other, more advanced data matching technologies and capabilities that we can provide, even beyond and separate from MDM: the ability to connect the dots. There’s a great example in retail. Think about how a retailer gets information about its customers. When you walk into a store and make a purchase with a credit or debit card, all they really have is your card number, the amount, and what you purchased; they have no information on who you are. When you go online and start navigating their site, just browsing without logging in, you might throw things in your cart and then abandon it, or just look at certain products. They have your browser footprint and can see what products you looked at, but they don’t know exactly who you are. Now, all of a sudden, you go online and purchase something. When you do, they have the browser ID, the cookie they’ve dropped, and the card that you used. By connecting the dots, they can see that’s the same card you used when you bought something in the store. So now they can identify that the person who bought this in the store is the same person who just bought this online, and they can relate those two things together.
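The retail example can be sketched as a tiny record-linkage pass (all the identifiers below are invented): the online purchase carries both the cookie and the card number, so it bridges the anonymous browsing session and the in-store purchase into one customer profile.

```python
# Hypothetical event logs from three channels. Only the online purchase
# carries both identifiers, making it the bridge record.
in_store   = [{"card": "c-7421", "item": "jacket"}]
browsing   = [{"cookie": "ck-99", "viewed": "boots"}]
online_buy = [{"cookie": "ck-99", "card": "c-7421", "item": "boots"}]

profiles = {}
for o in online_buy:
    key = (o["card"], o["cookie"])                # one profile per linked pair
    profiles[key] = {"events": [("online_buy", o["item"])]}
    for s in in_store:
        if s["card"] == o["card"]:                # match on card number
            profiles[key]["events"].append(("in_store", s["item"]))
    for b in browsing:
        if b["cookie"] == o["cookie"]:            # match on browser cookie
            profiles[key]["events"].append(("browsed", b["viewed"]))

print(profiles)
```

Three events that arrived with no shared customer ID end up attached to a single profile; at enterprise scale this same matching logic is what MDM and entity resolution platforms generalize.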
They can also see that the browser footprint, the cookie, is from the same browser that browsed different products on their site. So this is what we mean by connecting the dots and data matching. Our analytics offerings, and the next generation of that platform, G2, provide some unique capabilities for our customers that we need to be able to talk about, and that all has a huge play in our big data story. The next piece is metadata and lineage, and this is even more critical in these big data environments; it’s going to be one of those things that blindsides a lot of our customers. A lot of our customers are just dumping a bunch of data into Hadoop now for experimentation, for playing around, or to try to reduce the cost of their warehouse. What they’re quickly going to find, especially in regulated environments, is that they’re losing all that metadata and lineage. That’s a unique capability we can bring to the table: enabling people to maintain that lineage and metadata when they put data in our Hadoop environment and apply our Information Server platform and our metadata management technologies. Then, of course, security and privacy become a key issue. Clients are going to start putting customer-sensitive information, personally identifiable information, into these Hadoop environments, which today don’t have any true security or privacy capabilities. So being able to apply security and privacy, to apply things like Guardium and Optim, to do data masking, et cetera, in big data environments is going to be a key requirement and something we should be going after. And then, of course, there’s being able to manage the life cycle of information in these environments.
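As a generic illustration of data masking before data lands in a shared environment (this is not the Optim or Guardium API, just a sketch with made-up field names and salt): sensitive fields are replaced with deterministic one-way tokens, so joins across data sets still work but the original values are not recoverable.

```python
import hashlib

# Made-up salt; in practice this would be managed per environment.
SALT = b"rotate-me-per-environment"

def mask(value: str) -> str:
    """Deterministic one-way token: the same input always yields the
    same token (joins keep working), but the value can't be reversed."""
    return hashlib.sha256(SALT + value.encode()).hexdigest()[:12]

record = {"cust_name": "Jane Doe", "ssn": "123-45-6789", "balance": 1200.50}
SENSITIVE = {"cust_name", "ssn"}   # hypothetical field classification

# Mask only the sensitive fields; analytic fields pass through untouched.
masked = {k: (mask(v) if k in SENSITIVE else v) for k, v in record.items()}
print(masked)
```

Because the tokens are deterministic, the masked records can still be matched and aggregated downstream, which is the property that makes masking viable in analytic zones rather than only in archives.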
I talked about Visa, which has over nine petabytes of data in its environment, and that’s only going to continue growing. What they haven’t actually figured out yet is when to remove data from that environment. When do they delete it? When do they get rid of it? How long do they retain it? There are no built-in capabilities for doing that in most Hadoop environments. We actually have a unique capability there that no one else has in the market.