See what's new in our latest version - http://www.talend.com/products
Most enterprises understand that digital transformation is necessary for a successful enterprise, but the digital divide between leaders and laggards is widening. Here's how IT can enable the business with self-service data.
2. #TalendConnect
Most enterprises run their digital transformations…
… but the digital divide is accelerating
1 Digital America: A tale of the haves and have-mores - McKinsey Global Institute report, December 2015
http://www.mckinsey.com/industries/high-tech/our-insights/digital-america-a-tale-of-the-haves-and-have-mores
[Chart: degree of digitalization. Leading sectors (base 100% in 1997): 4.1x increase in digitization. Rest of economy: 1.7x increase in digitization. Data labels: 1997: 8%, 2005: 12%, 2013: 14%.]
4. #TalendConnect
Bring Your Own Data
Driven by Lines of Business
• Quick and dirty
• Ad-Hoc
• No control
But Self-Service brings new challenges for Data Governance
Authoritative Governance
Driven by Central Orgs
• Producers define the rules
• Consumers follow the rules
• Heavy control
5. #TalendConnect
Data Governance 2.0: the case for collaborative governance
Collaborative Governance
Hybrid organizations
• Everyone is a producer
• Policies are a work in progress
• Data is a team game
6. #TalendConnect
Use case and demo
Personalized marketing campaign to
tailor offers based on customer’s
attributes (size, color preferences,
purchase history, clickstreams)
ALTO
Note : Alto Fashion is a fictitious company
7. #TalendConnect
Introducing the data-driven task force & the demo
The Data Protection Officer
• Ensures compliance with
privacy rules (opt-in)
The Regional Marketing Manager
• Runs the campaign in France
The Data Architect
• Implements the whole
data-driven process
The Data Scientist
• Designs the
recommendation model
8. #TalendConnect
Roadmap 2017
Addressing the needs of large enterprises
Big Data
1st on Spark 2.0
&
Data Prep on Big
Data
Data Prep
&
Data Ingestion
Cloud Self-service
Data Stewardship
&
Self-service
connectors
Governance
Apache Atlas
9. #TalendConnect
Paving the road for governed self-service
Run Talend 6.2
or beyond
2 FREE USERS
Activate your
2 Free Users
Introductory video
Download Free Desktop
Getting Started Guide
Discover Data Prep Get trained
Attend the Free On
Demand Training
FREE UNTIL 12/31/16
1’
We all know that our world is becoming digital: every day we hear new stories of data-driven companies outperforming their competition. This affects every organization, whatever its industry or size. Today, most organizations have already embarked on their digital transformation.
However, McKinsey's most recent research shows that the majority are reaping only a small part of the potential benefits of their digital transformation (an estimated 18% in the US).
And McKinsey sees a widening gap between what it calls the “haves” and the “have-mores”: despite their ongoing transformation programs, many companies are falling behind the “have-mores” at an accelerating pace (http://www.mckinsey.com/industries/high-tech/our-insights/digital-america-a-tale-of-the-haves-and-have-mores).
The digital divide is accelerating, and the risk of being left behind is high.
In particular, the research shows that the “have-mores” succeed in empowering operational workers to do their jobs more efficiently, by digitalizing routine tasks and creating new digital jobs faster (http://www.mckinsey.com/mgi/overview/in-the-news/the-most-digital-companies-are-leaving-the-rest-behind).
With employees across the company empowered with greater access to technology, the tech expertise inside your business is no longer limited to the IT department.
Cloud applications often came through Sales, Marketing, HR, operations or Finance to complement the centrally designed legacy apps, such as the ERP, Data warehouse or CRM.
Digital and mobile applications provided new channels to connect with external parties.
New insights and data sources that were not controlled centrally could be reached as well, which gave birth to the Big Data era and is now exploding with the Internet of Things.
So we see the rise of data-driven talent. This all started in analytics with the rise of self-service Business Intelligence. According to the US Bureau of Labor Statistics, there might be between 4 and 6 million people employed as data analysts in the US alone, and this is growing by 27%. We’ve all heard about the shortage of data scientists as well, with many companies struggling to staff such roles.
As regulations evolve, we also see the rise of new roles for assessing, certifying and documenting data. A perfect example is the Data Protection Officer, a role that the new General Data Protection Regulation mandates by 2018 for any company processing personal data related to European citizens on a large scale. It is estimated that 28,000 jobs for those data specialists will be created by 2018 (https://www.euractiv.com/section/social-europe-jobs/news/new-eu-privacy-rules-will-create-28000-jobs-says-industry-group/).
In the meantime, everyone across the lines of business is morphing into a data worker. Think of the marketing department, whose activities have been profoundly digitalized and which needs data at each and every step.
All those people need to be empowered with data, and that’s the promise of self-service. Indeed, the goal is to boost their productivity with data, and there’s a lot to gain in this area: surveys say that people spend 60 to 80% of their time manually crunching data instead of reaping the insights.
2’
But self-service is not only a personal productivity issue. Data is everywhere: in centralized or decentralized apps, in data warehouses, in data lakes, or in Excel spreadsheets. When people become self-sufficient with data, this creates silos, which goes against the ultimate goal. This is what we call Bring Your Own Data.
That’s why we believe that the goal of self-service is not self-sufficiency, but rather allowing people to work on shared data without putting that data at risk. There’s no self-service without proper governance.
Today, organizations are struggling to establish that governance layer.
The reason is that traditional governance practices were designed for very centralized data management approaches.
In those approaches, a small number of data specialists design the data models, define the rules on data, and establish the policies for data quality, data protection, and data access. Then they monitor data usage with audit trails. This approach has proven successful when applied to a small set of highly shared data, especially heavily regulated data. Talend provides components to address those use cases, such as Master Data Management or Metadata Manager.
But, unfortunately, those centralized approaches might not be a good fit for self-service. This happens when data management is driven by multiple stakeholders in the organization, rather than by a single data specialist. A recent survey from Experian shows that trend, highlighting that data management approaches are shifting from centralized (31%) to hybrid (51%), and highlighting the new roles of data experts such as data analysts (42%), data scientists (28%), Chief Data Officer (22%), CFO (22%), or CMO (14%) in supporting the data management strategy.
(Source: The 2016 global data management benchmark report)
Collaborative Governance
3’
Overall : 10 minutes
So an alternative approach to data governance is emerging, and we call it collaborative governance.
Think about models like Wikipedia, where everyone can be a provider as well as a consumer. There are established curation policies in place to make sure that the content is accurate and tagged as quality-proofed.
Those models established themselves with the proliferation of data that came with the internet era, and we see similar models establishing themselves for data governance with the proliferation of data that is coming with the Big Data, self-service and cloud era.
We will showcase how it works in real life with the demo, but let me first lay out the principles.
Self-service not only empowers line-of-business users with data; it is also a way to reclaim all the data that is being processed by data workers. And it’s not only about capturing the data sets that are being used, but also the actions that are taken to put this data to work.
And this is precious, because it shows how data scientists are turning raw data into smart data, and how data workers are cleansing their data before their analysis.
Then you put more methodology and processes on top of it.
Let’s take the example of fixing data quality issues in your marketing database. In most cases, there is no central organization that cleanses the database; this is the responsibility of the marketing organization. Collaborative governance would allow them to organize themselves as a team. To tackle bad contact data, they would form a task force for a couple of days to fix data quality issues. They would assign roles: some would work on suspicious data for existing customers, some would have similar responsibilities for new contacts acquired from third parties, while others would focus on the contacts captured through the web or social networks.
Overall, engaging those who know the data best would bring a huge improvement in the quality of your contact data. And this would translate into big benefits for those who engaged, now that their campaigns can reach the proper targets.
1’ So let’s dig into our use case. This is not a real case, although it is inspired by a couple of real-life stories.
But, for a while, let’s consider that Gwendal and I are working for Alto Fashion, a rapidly growing footwear, apparel and fashion company.
This year we want to significantly grow our Black Friday sales, and this growth will be data-driven. Through personalized campaigns, customers will be guided to the products that fit their size and typical budget and meet their preferences and buying intentions.
This is the result of a Big Data project that started earlier this year. The CEO is pushing this after the rollout of a quick win in Spain that demonstrated a 40% uplift in sales conversion rates. Applied to Black Friday sales, that would drive €3M in additional sales.
To achieve this objective, a task force has been created.
2’ It is composed of 4 stakeholders.
The data scientist has already designed the recommendation model. Using Talend Real-Time Big Data, he ingested the data needed to train the Spark-based recommendation engine in MLlib. Together with his IT colleagues, he operationalized the model so that it can be invoked by the marketing automation application for outbound campaigns and by the websites and mobile apps for real-time personalization.
But Alto Fashion is very serious about its customers’ privacy. So they want to be sure that this campaign applies only to customers who gave their consent: you know, the famous opt-in that every company that wants to do business with European citizens has to consider in the context of the upcoming European General Data Protection Regulation. And they have appointed a Data Protection Officer for that, as the new regulation suggests and will mandate. So the Data Protection Officer is the guardian of customer privacy, and for that he uses Talend Master Data Management together with self-service apps.
Let’s now talk about me. I’m the marketing country manager for France, and I need to make sure that the campaign delivers on its promises in my region. The bad news is that in my region, the marketing database has been outsourced to a third party, and this is where all the opt-in data for French customers sits.
So the IT department asked me to bring that precious data back into our IT system.
And for that, I’ll work with Gwendal, our data architect.
Gwendal: Hi Jean-Michel, let me start with the basics. Do you have the opt-in data?
Jean-Michel: Well, yes, the marketing agency sent me the file. But I’m afraid it’s not exactly in the form it should have in our CRM.
Gwendal: But you can use Data Prep for that.
Jean-Michel: Oh, you’re right. I thought it was just for analytics, but I guess it would fit here as well.
Let me try. I’ll start by opening the file from my partner in Data Prep. There it goes.
Let’s first remove the opt-out customers.
Then I need to cleanse and standardize my data.
Let’s standardize the phone numbers.
Now I can drill down directly to invalid data. As you can see, Data Prep is smart enough to understand my data. It found that the first column is a first name, the second a last name, the third an e-mail, and the fourth a city. And the magic is that it can point out the invalid values to me as well. Look at this e-mail flagged in orange. It looks like the domain extension is missing. Let’s correct it. And there’s no city called Montouge; it’s a typo, it should be Montrouge. Let’s fix it.
Now I have fixed all my invalid data.
But I can do more. Data Prep highlights some spaces in the first and last names; let’s remove them. I also see contacts with empty e-mails; let’s drill down to those. Well, no address, no phone, no e-mail: these are useless, so let’s delete them.
I’ll also retrieve the sales org related to each contact (Extract part of Text + Lookup).
I’m done. See my preparation? Just 7 actions, and my data is ready for you.
Now I can export the file and send it to you via e-mail.
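The preparation steps Jean-Michel walks through can be pictured in plain Python. This is only an illustrative sketch, not Talend Data Preparation's API: the field names (`opt_in`, `first_name`, `email`, `phone`, `address`), the French phone format and the e-mail pattern are all assumptions made for the example, since the demo file's real schema is not shown.

```python
import re

# Assumed pattern: something@domain.extension (the demo flags a missing extension).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")

def standardize_phone(raw):
    """Keep digits only and rewrite French numbers as +33XXXXXXXXX (assumed format)."""
    digits = re.sub(r"\D", "", raw or "")
    if digits.startswith("0033"):
        digits = digits[4:]
    elif digits.startswith("33") and len(digits) == 11:
        digits = digits[2:]
    elif digits.startswith("0") and len(digits) == 10:
        digits = digits[1:]
    # Anything that is not a 9-digit national number is treated as missing.
    return "+33" + digits if len(digits) == 9 else ""

def prepare(contacts):
    """Return (clean rows, rows whose e-mail needs a manual fix)."""
    clean, to_review = [], []
    for source_row in contacts:
        if not source_row.get("opt_in"):       # remove opt-out customers
            continue
        row = dict(source_row)                 # never mutate the partner's data
        row["first_name"] = (row.get("first_name") or "").strip()  # trim spaces
        row["last_name"] = (row.get("last_name") or "").strip()
        row["email"] = (row.get("email") or "").strip().lower()
        row["phone"] = standardize_phone(row.get("phone"))
        # Drop contacts that cannot be reached at all.
        if not (row["email"] or row["phone"] or row.get("address")):
            continue
        # Flag invalid e-mails for a human instead of guessing a correction.
        if row["email"] and not EMAIL_RE.match(row["email"]):
            to_review.append(row)
        else:
            clean.append(row)
    return clean, to_review
```

Invalid values are routed to a review list rather than corrected automatically, which mirrors the demo: the tool points out the suspicious e-mail, and the business user decides on the fix.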
Gwendal
Don’t do that, please!
Don’t you remember that our Data Protection Officer asked us to avoid using personal files and e-mail for our customers’ personal data? There’s a better way; let me show you how you can share your preparation with me.
<Gwendal explains how to share a preparation in Data Prep>
Jean-Michel
Wow, that’s pretty cool. Could I share with my colleagues that way as well?
Gwendal
Of course.
But now let me show you how I can bring your data and cleansing rules into my tool, called Talend Studio. Don’t be afraid, you will never have to use it; it’s a tool that we use in IT. But, believe me, it’s pretty powerful.
You see how I can get your data and connect it with our CRM database. For that, I will use a matching component. When a contact in your list isn’t in our CRM yet, I just add it. If it is already there, I just add the opt-in info from your file. But in some cases I would have to guess, for example when the contact data in your file and in the CRM look similar but don’t completely match. In that case, I need to ask you. But let’s launch the job and see how that goes.
The first thing to notice is that by leveraging the preparation that you built in 2 minutes and 7 clicks, I got rid of 15% of your data, because it didn’t match your business rules. If I had had to do it on my side, it would have taken days, if not weeks. We would first need to align on your business rules because I cannot guess them, then I would implement them, then I would ask for test data, and then we would probably go back and forth to adjust the rules. Here, I know that your preparation is trustworthy because you designed it while being able to see the outcome of your work.
And now we do have known contacts that have been updated in the CRM with the opt-in info, and we have several new contacts. And for a few contacts, I do need your help, because they look like what we have in the CRM but are not identical.
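The three outcomes Gwendal describes (update a known contact, add a new one, ask the business about a look-alike) can be sketched with Python's standard `difflib`. This is a simplified illustration and not how the Talend matching component works internally; the `route` and `contact_key` helpers, the field names and the 0.85 similarity threshold are assumptions chosen for the example.

```python
from difflib import SequenceMatcher

def contact_key(contact):
    """Concatenate the fields used for comparison (hypothetical choice of fields)."""
    return f'{contact["first_name"]} {contact["last_name"]} {contact["email"]}'

def similarity(a, b):
    """Case-insensitive similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def route(incoming, crm, threshold=0.85):
    """Split incoming contacts into CRM updates, CRM inserts and stewardship tasks."""
    updates, inserts, tasks = [], [], []
    for contact in incoming:
        best, best_score = None, 0.0
        for existing in crm:
            score = similarity(contact_key(contact), contact_key(existing))
            if score > best_score:
                best, best_score = existing, score
        if best_score == 1.0:
            updates.append((best, contact))   # known contact: add the opt-in info
        elif best_score >= threshold:
            tasks.append((best, contact))     # look-alike: ask the business user
        else:
            inserts.append(contact)           # unknown contact: add it to the CRM
    return updates, inserts, tasks
```

The threshold is the design lever: raising it sends fewer pairs to the business user but risks creating duplicates, while lowering it does the opposite.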
Jean-Michel
Well, just tell me.
Gwendal
There’s a better way with Talend: it is called the Stewardship App. So you see, through this tMatchGroup component, I just bring the data into this tool for your attention. There you go! Now you just have to connect.
Jean-Michel
Ah, now I see.
So this is pretty simple.
Let’s look at the tasks that are flagged as high priority. The others I’ll manage once our guests have left.
On the first line, there’s obviously a typo in the first name from our CRM system; let’s choose the value from my supplier’s source for this one.
For the second record, I’ll do the opposite: I trust the CRM e-mail more than the one from digital marketing, where customers might give their spam box rather than their real e-mail.
For the third one, I’ll choose the e-mail from the external data source, as the other one is empty.
For the fourth one, the phone numbers are different and I’ll choose the one from the external data source, as the customer may have a new phone. Plus, I know thanks to the semantic analysis that this phone number is not a dummy one.
So what’s next?
Gwendal
Well just save and we are good.
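The four choices Jean-Michel made in the Stewardship App follow a pattern that could be written down as default suggestions. The sketch below is a hypothetical illustration of such rules in Python, not a feature of the Talend product: the `suggest` function and its field names are invented for the example, and a `None` result means the decision stays with the human steward.

```python
def suggest(field, crm_value, external_value):
    """Suggest which value survives; None means 'let the steward decide'."""
    if not crm_value:
        return external_value      # one side is empty: keep the other one
    if not external_value:
        return crm_value
    if crm_value == external_value:
        return crm_value           # no conflict
    if field == "email":
        return crm_value           # CRM e-mails are trusted over campaign ones
    if field == "phone":
        return external_value      # the customer may have a new phone
    return None                    # e.g. a name typo needs a human eye

# Hypothetical conflicting record, mirroring the cases in the demo.
record = {
    "first_name": ("Jaen", "Jean"),
    "email": ("jean@example.com", "spam.box@example.com"),
    "phone": ("+33123456789", "+33198765432"),
}
suggestions = {field: suggest(field, crm, ext) for field, (crm, ext) in record.items()}
```

Keeping the name conflict undecided is deliberate: spotting which spelling is the typo is exactly the kind of knowledge the business user brings.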
Gwendal wraps up
So as you’ve seen in this scenario:
- costs and delays can be drastically reduced
- a simple and efficient way to improve data quality is to engage the business users who know the data best
- collaboration between a business user and IT is a pragmatic approach to governance. It made it possible to connect two data sets: one that was fully under the control of the enterprise and its IT, and another one that was not managed, and not even known, by a central authority. In our example, it is crucial to reconcile those disparate data sets to achieve outcomes such as revenue growth, and to comply with new regulations like GDPR. Only sound collaboration between business and IT can make this happen. And we’ve also shown how the Talend Platform orchestrates this kind of collaboration, through a set of interconnected tools that match the profile and level of expertise of each persona.
Big Data
Big data and cloud innovations including Spark 2.0
Staying on the cutting edge of big data innovation, processing big data at the fastest speeds possible.
ML on Spark: what we saw today, but at a greater scale
Data Preparation for Big Data lets anyone access and cleanse data.
Enables the information worker to turn data into insight at scale
Enables the entire organization to access “trusted” data in the lake
Cloud
- Data Prep as a Service
- Democratize Data ingestion via tools accessible to Data scientists, with a similar UX to what we presented earlier
Self-Service
New Data Stewardship App helps orchestrate data governance between IT and business. It empowers the business to ensure data integrity at the source.
Data Prep Self-service connectors: Big Data, Cloud, but applicative connectors too (SFDC, MKTO, …)
Governance:
We talked a lot about it
In Big Data, you have a lot of data about which you know very little. Atlas integration provides traceability & lineage.
Speaker JMichel
Risk and compliance with data transparency
Increase agility across the data landscape
Turn data into a business language
3 thing
1’
In particular, the research shows that the “have-mores” make the difference in the last mile of their data-driven efforts.
They succeed in empowering operational workers to do their jobs more efficiently, by digitalizing routine tasks and creating new digital jobs faster (http://www.mckinsey.com/mgi/overview/in-the-news/the-most-digital-companies-are-leaving-the-rest-behind).
Through governed self-service you avoid the creation of shadow IT and the ‘wild-wild west’ within your enterprise. Business users gain full access to the data they want in a timely manner which makes them more successful in their jobs and IT maintains control and protection of the data quality.
Balancing act: the Line of Business wants user freedom and autonomy, while IT needs to ensure governance and control over the enterprise data.
With Talend Data Preparation, commercial edition, you can achieve both and have harmony between business and IT.