16. On-demand and cost-effective cloud infrastructure
[Slide diagram labels: BI engineer, Data scientist, Data engineer; SQL interface; Unified and secure data platform; Notebook environment; Deployment of AI models; Job execution; Workflow management; Self-service]
17. Analytics is high on the corporate agenda
More data available than ever before
A shift in consumption of data
Focus on digital feedback loop
Open data by default
Build common data infra
Data and analytics have become a key focus point for many organisations; analytics is high on the corporate agenda.
The effective use of data can reduce cost, increase revenue and create new business models.
But the reality is that most organisations are not well equipped to deal with this change.
Any analytics beyond Excel is new. Only a small fraction (10%) of the value that could be unlocked by analytics has been unlocked.
Traditional BI and data warehousing departments are where most organisations got started. Maybe I’m biased by the clients we see, but if there is one thing that comes back time and again, it is that these BI departments were ivory towers. Everything is centralised, it takes years of development, and information is only shared piecemeal. All control is with one department, and the business is usually not satisfied with the results. Even in the good cases, the only results that come out of these departments are a bunch of reports. How actionable are reports? Sure, they do provide value, but there is more to data analytics than offering reports.
Another thing we’ve seen is the data science lab. They put a bunch of smart people in a room and shield them off from the business. More often than not these people are hired from the outside. Their goal is to innovate, run cool experiments and try out new algorithms.
The problem with this approach, of course, is that it is very much a walled garden, and little to no attention is given to business value. This is a problem for the business, because sooner or later someone will start asking questions about where all that money is going. And no matter how cool your technical experiments are, no matter how many GPUs you’ve purchased for TensorFlow processing, if the business doesn’t see any value, they will stop that funding.
What people don’t expect is that this also isn’t ideal for the data scientists. Data scientists, like every other employee, want to feel that they contribute to something bigger. If they see their efforts go to waste, it’s a big, big demotivator. I’m sure you’ve all been on projects that got cancelled. The feeling of emptiness is not pleasant.
How can we do it better? Well, analytics succeeds if it really impacts the organisation. Your analytics is successful if it results in more engaged customers, more empowered employees, optimised operations and transformed products. Microsoft calls this the digital feedback loop and I think it’s a really helpful way of looking at it.
If your people have the right data at any time, they can make the right decisions.
If you better understand your customers through data, you get an opportunity to engage them more.
If you make the performance of your operations transparent to everyone, you will surely be able to optimise them.
And finally, if you let data inform you how your products are being used, this will transform how you go to market and what you offer.
These projects don’t have to be complex. For instance, we’ve done a predictive maintenance project for rolling stock, where we simply plotted the distance each vehicle travelled against the amount of maintenance it got. From that one chart, the organisation revised its maintenance schedule, which saved millions per year.
Where to start? Very often, when we come to organisations, they already know what they want. If not, you can go all-out consulting and offer them a 2 by 2 matrix. On the X-axis, you plot the feasibility of a data analytics idea. On the Y-axis, you plot the potential business value. Like any good consultant, you tell them they should be high and to the right. In other words: which projects are quite feasible and offer an immediate return? It is important that you identify business sponsors. Projects without business sponsors have a fairly high risk of not being adopted, and you get that data science lab effect.
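The 2 by 2 matrix can also be kept as a small scoring sheet. The sketch below is only an illustration: the `Idea` fields, the 0 to 1 scales, the 0.5 threshold, the quadrant labels and the example ideas are all assumptions made up for this example, not part of any client method.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Idea:
    name: str
    feasibility: float              # X-axis: 0 (hard) to 1 (easy); assumed scale
    business_value: float           # Y-axis: 0 (low) to 1 (high); assumed scale
    sponsor: Optional[str] = None   # business sponsor, if one is identified


def quadrant(idea: Idea, threshold: float = 0.5) -> str:
    """Place an idea in one of the four cells of the matrix."""
    feasible = idea.feasibility >= threshold
    valuable = idea.business_value >= threshold
    if feasible and valuable:
        return "start here"         # high and to the right
    if valuable:
        return "high value, hard to deliver"
    if feasible:
        return "easy, low value"
    return "park"


def shortlist(ideas: List[Idea], threshold: float = 0.5) -> List[Idea]:
    """Keep top-right ideas that have a sponsor, best combined score first."""
    picks = [
        idea for idea in ideas
        if quadrant(idea, threshold) == "start here" and idea.sponsor
    ]
    return sorted(
        picks,
        key=lambda idea: idea.feasibility * idea.business_value,
        reverse=True,
    )


# Hypothetical ideas and scores, for illustration only.
ideas = [
    Idea("churn prediction", 0.7, 0.9, sponsor="Head of Sales"),
    Idea("maintenance schedule review", 0.9, 0.8, sponsor="Operations"),
    Idea("image recognition experiment", 0.3, 0.4),
    Idea("pricing optimisation", 0.8, 0.9),  # top right, but no sponsor yet
]

for idea in shortlist(ideas):
    print(idea.name, "->", idea.sponsor)
```

Requiring a sponsor in `shortlist` is a deliberate choice here: an idea can sit high and to the right and still be left out until someone in the business owns it.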
There is more data available than ever before. You have your traditional financial and operational data. But next to that, you have product usage data, website data, marketing data, sensor data, … And so much more. We’ve never been able to leverage as much data for our analytics projects as we can now.
The irony is that there have never been more data silos. Studies show that data scientists spend 80% of their time collecting, cleaning and organising data. Imagine, if you can break down the silos, how
It reminded me of a story in this book: Team of Teams by General McChrystal. He talks about how, until 9/11, different intelligence agencies kept their information in silos. After 9/11 they decided to share by default. This brings with it a lot of risks. Most notably, Chelsea Manning and Edward Snowden got access to tons and tons of information and were able to leak it, in Manning’s case to WikiLeaks. Now, let’s leave politics out of it. It doesn’t matter whether you see Manning and Snowden as patriots or traitors. What matters is that, when you start opening up silos, there is a clear risk of data being used against you or even sold to the competition. You should have the right governance in place to make sure this doesn’t happen. Yet, at the same time, General McChrystal is very clear on the fact that the ROI of that decision has been tremendously positive. In your organisation, that means maybe shielding off things like HR data or hypersensitive commercial data, but being very open about 99% of your data. Only then can everyone in the organisation make the right decision without delay. The ROI of that capability is extremely positive.
Related to this information sharing, this brings us to a hybrid organisational model, where not everything is completely centralised, because that would create insane bottlenecks, but also not completely decentralised, as then you would create the same solution 5x and you would have to spend an enormous amount of money on every new analytics project. We’ve seen analytics projects across the entire spectrum. At the completely centralised end, all information is protected by a central BI Competence Center. One such centre didn’t want to share data with its users, whom it referred to as “monkeys with hand grenades”. I can tell you, if you are this disrespectful about your own (internal) customers, you’re always losing.
How can you be this open while we see so many different use cases for data? How we consume data has changed dramatically.
A very interesting article about how to deal with this appeared on Martin Fowler’s blog. The idea is that you shouldn’t make functional groups: these are the data engineers, these are the web engineers, these are the data scientists, … No, you create domain-specific teams. Each team owns the data that belongs to its domain. You have source domains: data that comes from operational sources. You have customer domains: outputs that you generate in an analytics project, e.g. a churn prediction model or a data mart for a report. And you also have shared domains: intermediate data sets that are generated from source domains and are reused across different customer domains.
All of these domain teams rely on the same data infrastructure. That doesn’t mean one data lake or one data warehouse. It means one team that is responsible for providing the capabilities: storage, data pipelines, catalogs, access controls, … This is called a data mesh.
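To make the three kinds of domain concrete, here is a minimal sketch of a shared catalog in which every data set has an owning team and declares which other data sets it is built from. The class names, the `source_lineage` helper and the example data sets are assumptions invented for this illustration; they are not taken from the article or from any particular catalog product.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set


class DomainKind(Enum):
    SOURCE = "source"      # data coming from operational systems
    SHARED = "shared"      # intermediate data sets reused across projects
    CUSTOMER = "customer"  # outputs such as a model or a data mart


@dataclass
class DataProduct:
    name: str
    kind: DomainKind
    owner_team: str                       # the domain team that owns this data
    inputs: List[str] = field(default_factory=list)


class Catalog:
    """One catalog for all domain teams, run by the infrastructure team."""

    def __init__(self) -> None:
        self.products: Dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        # Source data sets come from operational systems, not other products.
        if product.kind is DomainKind.SOURCE and product.inputs:
            raise ValueError(f"{product.name}: source data cannot have inputs")
        # Every declared input must already be known to the catalog.
        for name in product.inputs:
            if name not in self.products:
                raise ValueError(f"{product.name}: unknown input {name!r}")
        self.products[product.name] = product

    def source_lineage(self, name: str) -> Set[str]:
        """Trace a data set back to the source data sets it depends on."""
        product = self.products[name]
        if product.kind is DomainKind.SOURCE:
            return {name}
        sources: Set[str] = set()
        for input_name in product.inputs:
            sources |= self.source_lineage(input_name)
        return sources


# Hypothetical data sets and teams, for illustration only.
catalog = Catalog()
catalog.register(DataProduct("crm_accounts", DomainKind.SOURCE, "crm-team"))
catalog.register(DataProduct("billing_invoices", DomainKind.SOURCE, "finance-team"))
catalog.register(DataProduct(
    "customer_360", DomainKind.SHARED, "customer-data-team",
    inputs=["crm_accounts", "billing_invoices"],
))
catalog.register(DataProduct(
    "churn_model_scores", DomainKind.CUSTOMER, "retention-team",
    inputs=["customer_360"],
))

print(sorted(catalog.source_lineage("churn_model_scores")))
```

The point of the sketch is the split in responsibility: each team registers and owns its own data sets, while the catalog itself is a shared capability that nobody has to rebuild per project.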