Global organizations are investing aggressively in data lake infrastructures in the pursuit of new, breakthrough business insights. At the same time, however, 2 out of 3 business executives are not highly confident in the accuracy and reliability of their own Big Data. Regaining that confidence requires utilizing proven data quality tools at Big Data scale.
In this on-demand webinar, discover how to ensure your data lake is a trusted source for advanced business insights that lead to new revenue, cost savings and competitiveness. You will have the opportunity to:
• Compare your organization’s data lake “readiness” against initial findings from our upcoming annual Big Data Trends survey
• Gain insight into where and how to leverage data quality best practices for Big Data use cases
• Explore how a ‘Develop Once, Deploy Anywhere’ approach, including to native Big Data infrastructures such as Hadoop and Spark, facilitates consistent data quality patterns
MLOps - Getting Machine Learning Into Production (Michael Pearce)
Creating autonomy and self-sufficiency by giving people what they need in order to do the things they need to do! What gets in the way, and how can we overcome those barriers? How do we get started quickly, effectively and safely? We'll come together to look at what MLOps entails, some of the tools available and what common MLOps pipelines look like.
Data Profiling: The First Step to Big Data Quality (Precisely)
Big data offers the promise of a data-driven business model generating new revenue and competitive advantage fueled by new business insights, AI, and machine learning. Yet without high quality data that provides trust, confidence, and understanding, business leaders continue to rely on gut instinct to drive business decisions.
The critical foundation and first step to deliver high quality data in support of a data-driven view that truly leverages the value of big data is data profiling - a proven capability to analyze the actual data content and help you understand what's really there.
View this webinar on-demand to learn five core concepts to effectively apply data profiling to your big data, assess and communicate the quality issues, and take the first step to big data quality and a data-driven business.
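The core idea the webinar builds on, that profiling summarizes what is actually in each column, can be sketched in plain Python. The records and field names below are hypothetical, and real profiling tools cover far more checks:

```python
def profile(rows):
    """Summarize each column: row count, null count, distinct count, min/max."""
    columns = {key for row in rows for key in row}
    report = {}
    for col in sorted(columns):
        values = [row.get(col) for row in rows]
        non_null = [v for v in values if v is not None]
        report[col] = {
            "rows": len(values),
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
            "min": min(non_null) if non_null else None,
            "max": max(non_null) if non_null else None,
        }
    return report

# Hypothetical sample records with a quality issue (a missing country).
rows = [
    {"customer_id": 101, "country": "US"},
    {"customer_id": 102, "country": None},
    {"customer_id": 101, "country": "US"},
]
stats = profile(rows)
print(stats["country"])  # null and distinct counts surface quality issues
```

Even a summary this simple reveals the questions profiling answers: how complete is the column, how many values does it really take, and what range does it span.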
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat... (DATAVERSITY)
There’s a lot of confusion out there about the differences between a data catalog, a data dictionary, and a business glossary, and it's not always easy to understand who needs which and why. Join Malcolm Chisholm, Ph.D., President of Data Millennium, and Amichai Fenner, Product Lead at Octopai, as they help decode the mystery. Spoiler alert: one of these enables collaboration across BI and IT, but which is it?
The last year has put a new lens on what speed to insights actually means: day-old data became useless, and only in-the-moment insights remained relevant, pushing data and analytics teams to their breaking point. As a result, everyone has fast-forwarded their transformation and modernization plans, and we now look differently at dashboards and the type of information we're giving the business. Join this live event and hear about the data teams ditching their dashboards to embrace modern cloud analytics.
Big Data Analytics Architecture PowerPoint Presentation Slides (SlideTeam)
Presenting this set of slides, titled Big Data Analytics Architecture PowerPoint Presentation Slides. This PPT deck contains twenty-six slides backed by in-depth research. Our topic-oriented presentation deck is a helpful tool to plan, prepare, document, and analyze the topic with a clear approach. We provide a ready-to-use deck with relevant topics, subtopics, templates, charts and graphs, overviews, and analysis templates, so you can outline all the important aspects without any hassle. It showcases editable templates and infographics for an inclusive and comprehensive presentation. Professionals, managers, and individuals or teams in any organization, from any field, can use them as required.
Data Management Meets Human Management - Why Words Matter (DATAVERSITY)
At Fifth Third Bank, about 450 people use data every day. They all start with Alation. But this wasn't always the case. In fact, getting hundreds of folks working in sync has been a monumental task.
Just ask Greg Swygart, VP of enterprise data at Fifth Third Bank. Greg has led data consumption and interaction efforts since adopting Alation. Currently he’s scaling out data literacy for Fifth Third, replicating data capabilities to all roles across the company.
Join Greg to learn how Fifth Third Bank moved from a command-and-control governance approach to non-invasive — and reaped the benefits. Greg will be followed by Bob Seiner, creator of Non-Invasive Data Governance, who will speak to data governance’s evolution, with an eye to what’s next.
In this webinar, you'll learn:
• About Fifth Third’s transition away from command-and-control governance
• How Fifth Third leverages Alation as its data marketplace for curation & consumption
• Why words matter when driving adoption
• About the data catalog — and its role in human management
Speed Matters - Intelligent Strategies to Accelerate Data-Driven Decisions (DATAVERSITY)
COVID-19 has shown us the importance of data in being able to quickly make decisions when market variables are out of our control. In order to accelerate and harness the process, an organization needs an agile approach to data integration and analytics that avoids the limitations of predefined schemas and data models.
Learn from 451 Research, now part of S&P Global Market Intelligence, a leading global IT research and advisory firm, and Qlik about best practices that can help you accelerate the data to decision path with agility. You’ll understand how to:
• Rethink traditional assumptions about data management and analytic roles and technologies
• Recognize trends that drive the demand to reduce the time required to investigate, analyze, and take action on business data
See a new state of business intelligence, where the data pipeline is optimized to enable organizations to make decisions and act in real-time. Seeking alternatives to the traditional approaches to become more agile in today’s evolving market and economy? Then don’t miss this presentation!
Subscribing to Your Critical Data Supply Chain - Getting Value from True Data... (DATAVERSITY)
Operational Data Governance is more than a stewardship process for critical business assets. As organizations build structure around KPIs and other critical data, a workflow develops around the sources and supply chain for that critical data. Many kinds of changes and inconsistencies can affect the final results of the supply chain, and inaccurate usage of data can result in audit penalties as well as erroneous report summaries and conclusions.
Is it coming from the correct authoritative source? Has the data been profiled? Has it met its threshold?
Gaps in the supply chain from incorrect pathways may lead to dead ends or lost sources.
The value of understanding the entire supply chain cannot be overstated. When changes occur at any point, end users can validate that the correct business standards, rules, and policies have been applied to the critical data within the supply chain. Your organization can rest easy that it is not at risk of exposure due to improper usage or lapses in security and compliance.
Join this webinar to uncover how companies are using data lineage to accomplish data supply chain transparency. You’ll also see the direct value clear data lineage can give to your business and IT landscape today.
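The supply-chain transparency described above rests on lineage: knowing every upstream source behind a critical number. A rough sketch of walking a lineage graph back to its authoritative sources (the dataset names and edges are hypothetical, not from the webinar):

```python
# Each dataset maps to the upstream datasets it is built from;
# empty lists mark authoritative sources.
lineage = {
    "revenue_kpi": ["sales_summary"],
    "sales_summary": ["crm_extract", "erp_extract"],
    "crm_extract": [],
    "erp_extract": [],
}

def upstream_sources(dataset, graph):
    """Walk the lineage graph and return every upstream dataset feeding `dataset`."""
    seen, stack = set(), [dataset]
    while stack:
        node = stack.pop()
        for parent in graph.get(node, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

print(upstream_sources("revenue_kpi", lineage))
# every dataset feeding the KPI; a missing node would surface as a gap here
```

With a graph like this in hand, validating that a KPI traces back only to approved sources becomes a set comparison rather than an archaeology exercise.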
It’s been almost two years since the General Data Protection Regulation shook up how organizations manage data security and privacy, ushering in a new focus on Data Governance. This complex but critical practice still has most enterprises grappling to master it for a myriad of reasons.
In this webinar, we’ll examine how Data Governance attitudes and practices continue to evolve and discuss what new research reveals as the most predominant challenges. We’ll delve into technology trends, including how adding certain capabilities will benefit your organization in terms of data asset availability, quality, and usability, including data consumer literacy and confidence.
When you attend this webinar, you will learn about:
• The requirements for a successful and sustainable Data Governance program
• Increasing confidence in data analytics for faster speed to insights
• How to automate data preparation and intelligence and where to start
Slides: Applying Artificial Intelligence (AI) in All the Right Places in the ... (DATAVERSITY)
Data and Analytics are fundamental to digital transformation, yet many companies are still under-utilizing them. To go full throttle, AI and automation technologies can be added across the full spectrum of your data journey to truly re-imagine processes and business models.
Join Information Builders for this webinar on how AI:
• Augments your traditional business intelligence and analytics systems
• Minimizes manual inefficiencies with the way data is generated, collected, cleansed, and organized
• Helps you realize substantial performance gains with use cases such as churn forecasting, predictive maintenance, supply chain planning, risk mitigation, and more
Slides: Achieving a “Single Source of Truth” with BI in Your Enterprise (DATAVERSITY)
The ability to drive consistent use and widespread adoption of Business Intelligence is an ongoing challenge for many companies, and the inability to achieve this consistency and uniform adoption can significantly impede their progress in becoming information- and data-driven organizations. Departmental silos, tool proliferation, end-user Data Literacy, and other challenges too often produce an environment in which a shared, common understanding of the organization’s key performance indicators fails to materialize. In addition, metrics and measurements — the much-discussed “single source of truth” — often fail to take shape, which in turn leads to competing versions of the truth, a lack of trust in available decision-making data, and degradation in decision-making speed and effectiveness.
In this webinar, we will:
• Explore the underlying conditions that lead to the challenges of driving consistent and company-wide adoption of Business Intelligence
• Examine case studies of companies that have successfully solved these challenges
• Suggest solutions to the issues preventing organizations from building the necessary but elusive “Single Source of Truth”
Building an Effective Data & Analytics Operating Model: A Data Modernization G... (Mark Hewitt)
This is the age of analytics—information resulting from the systematic analysis of data.
Insights gained from applying data and analytics to business allow large and small organizations across diverse industries—be it healthcare, retail, manufacturing, financial services, or others—to identify new opportunities, improve core processes, enable continuous learning and differentiation, remain competitive, and thrive in an increasingly challenging business environment.
The key to building a data-driven practice is a Data and Analytics Operating Model (D&AOM) which enables the organization to establish standards for data governance, controls for data flows (both within and outside the organization), and adoption of appropriate technological innovations.
Success measures of a data initiative may include:
• Creating a competitive advantage by fulfilling unmet needs,
• Driving adoption and engagement of the digital experience platform (DXP),
• Delivering industry standard data and metrics, and
• Reducing the lift on service teams.
This green paper lays out the framework for building and customizing an effective data and analytics operating model.
Business models across industries around the world are becoming customer centric. Recent studies show that “knowing” customers based on internal as well as external data is one of the top priorities of business leaders. Various surveys also reveal that customers do not mind sharing their semi-personal data in exchange for differentiated service. In that context, the 360-degree view of the customer – once thought to be a business process, master data management, data integration, and data warehouse/business intelligence problem – has now entered the whole new world of Big Data, including integration with unstructured data sources. The impact of Big Data on customer master data management spans from the integration and linkage of unstructured or semi-structured data with the structured master data maintained within the enterprise, to the analysis and visualization of that data to generate useful insights about customers. There are various patterns for handling the challenges across the steps: acquire, link, manage, analyze, and distribute the enhanced customer data for differentiated products or services.
When and How Data Lakes Fit into a Modern Data Architecture (DATAVERSITY)
Whether to take data ingestion cycles off the ETL tool and the data warehouse or to facilitate competitive Data Science and building algorithms in the organization, the data lake – a place for unmodeled and vast data – will be provisioned widely in 2020.
Though it doesn’t have to be complicated, the data lake has a few key design points that are critical, and it does need to follow some principles for success. Build the data lake, not the data swamp! The tool ecosystem is building up around the data lake, and soon many organizations will have both a robust lake and a data warehouse. We will discuss policies to keep them straight, send data to its best platform, and keep users’ confidence in their data platforms high.
Data lakes will be built in cloud object storage. We’ll discuss the options there as well.
Get this data point for your data lake journey.
Noise to Signal - The Biggest Problem in Data (DATAVERSITY)
Our ability to produce, ingest and store data has grown exponentially, but our ability to parse out insights from data has not. In the 90s, an organization’s data would live in a data warehouse with an ETL pipeline and one reporting layer on top. Information was well controlled, if somewhat limited in breadth and slow to trickle down. Now with the onset of self-service analytics, anyone can create a report and an insight, and there are many different sources of “truth.” For example, a seemingly straightforward question like "how many customers do we have?" will likely return different answers from sales, finance and customer success, depending on their definitions and the data at hand. There is simply too much data (and duplicate data), too many tools, and too many systems storing data -- leading to time-consuming searches, confusion and a lack of trust. Hear Stephanie discuss how a data catalog can help solve the noise-to-signal problem - making information easier to find, easier to understand and more trustworthy. She will describe how organizations like Safeway, Albertsons, Munich Re and Pfizer leverage a data catalog to find data and collaborate on data, gain a fuller understanding of its meaning and ultimately, solve important problems.
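At its simplest, the catalog idea described above is a single searchable index over dataset names and descriptions, so there is one place to find data and agree on what it means. A toy sketch (the entries and fields are hypothetical):

```python
# A hypothetical, deliberately tiny catalog: real catalogs add ownership,
# lineage, usage statistics, and collaboration on top of search.
catalog = [
    {"name": "dim_customer",
     "description": "master list of active customers, owned by sales ops"},
    {"name": "fct_orders",
     "description": "order transactions from the commerce platform"},
]

def search(term, entries):
    """Return names of catalog entries whose name or description mentions the term."""
    term = term.lower()
    return [e["name"] for e in entries
            if term in e["name"].lower() or term in e["description"].lower()]

print(search("customer", catalog))  # → ['dim_customer']
```

The point is not the search itself but the shared index: when "how many customers do we have?" starts from one agreed-upon entry, sales, finance, and customer success stop producing competing answers.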
Slides: Accelerate and Assure the Adoption of Cloud Data Platforms Using Inte... (DATAVERSITY)
Greater agility, scalability, and lower total cost of ownership made the decision to move key elements of your organization’s data capability to the cloud easy. The real challenge is migrating data from your legacy systems to your new cloud platform so you can unleash its potential and value while minimizing the migration risks.
Combining erwin‘s data modeling, governance, and intelligence solutions with Snowflake’s modern cloud data platform, organizations can realize a scalable, governed, and transparent enterprise data capability.
In this session, we’ll show you how enterprise stakeholders with different skills and needs can work together to accelerate and assure the success of cloud migration projects of any size. You’ll learn how to:
• Reduce costs and mitigate risks when migrating legacy applications to Snowflake with erwin’s model-driven schema design and transformation capabilities
• Increase the precision, speed, and agility of Snowflake deployments with erwin data automation
• Assure transparency, compliance, and governance for Snowflake data and processes
• Increase the efficiency and accuracy of analytics and other data usage on the Snowflake Cloud Platform
Analytics is all about course-correcting the future. While this starts with accurate predictions of the future, without resultant actions steering the future toward company goals, knowing that future is academic. Successful companies must be grounded in successful data-based prescription. In this webinar, William will present a data maturity model with a focus on how analytic competitors outdo the competition by looking forward to a data-influenced future.
Emerging Trends in Data Architecture – What’s the Next Big Thing (DATAVERSITY)
Digital Transformation is a top priority for many organizations, and a successful digital journey requires a strong data foundation. Achieving this digital transformation requires a number of core data management capabilities such as MDM. With technological innovation and change occurring at an ever-increasing rate, it’s hard to keep track of what’s hype and what can provide practical value for your organization. Join this webinar to see the results of a recent DATAVERSITY survey on emerging trends in Data Architecture, along with practical commentary and advice from industry expert Donna Burbank.
Slides: Taking an Active Approach to Data Governance (DATAVERSITY)
A Look at How Riot Games Implemented Non-Invasive Data Governance
Riot Games created and runs “League of Legends,” the world’s most-played PC game and most viewed eSport — and is now transforming to become a multi-title publisher. To keep pace with this transformation and support a growing player base of millions, Riot Games is taking a page from Bob Seiner’s book, “Non-Invasive Data Governance: The Path of Least Resistance and Greatest Success” and leveraging the Alation Data Catalog to help guide accurate, well-governed analysis.
Bob Seiner will join Riot Games’ Chris Kudelka, Technical Product Manager, and Michael Leslie, Senior Data Governance Architect, and Alation’s John Wills, VP of Professional Service, for an inside look at Data Governance at one of the world’s leading gaming companies.
Join this webinar to learn:
• How Riot Games is implementing Non-Invasive Data Governance
• How this new approach to Data Governance helps to drive the business
• How the Alation Data Catalog helps Riot Games create the foundation for guiding accurate, well-governed data use
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E... (DATAVERSITY)
Many data scientists are well grounded in delivering results within the enterprise, but many come from outside – from academia, PhD programs, and research. They have the necessary technical skills, but those skills don’t count until their product gets to production and into use. The speaker recently helped a struggling data scientist understand his organization and how to create success in it. That experience turned into this presentation, because many new data scientists struggle with the complexities of an enterprise.
DAS Slides: Data Architect vs. Data Engineer vs. Data Modeler (DATAVERSITY)
The increasing focus on data in today’s organization has increased demand for critical roles such as data architect, data engineer, and data modeler. But there is often confusion and ambiguity around what these roles entail, and what overlap exists between them. This webinar will discuss these data-centric roles and their place in the data-driven organization.
Emerging Data Quality Trends for Governing and Analyzing Big Data (DATAVERSITY)
Business initiatives across industries are applying more data than ever to drive analytics and AI in the quest for new competitive insights. As the volume and variety of data gathered by organizations continues to escalate, both on-premises and in the cloud, traditional methods of Data Quality are transforming to meet this Big Data challenge. This webinar looks at these emerging trends in Data Quality to address Data Governance, entity resolution at scale, AI and machine learning, and establishing Data Quality as a core tenet of data literacy.
Emerging Data Quality Trends for Governing and Analyzing Big Data (Precisely)
Business initiatives across industries are applying more data than ever to drive analytics and AI in the quest for new competitive insights. As the volume and variety of data gathered by organizations continues to escalate, both on-premises and in the cloud, traditional methods of data quality are transforming to meet this Big Data challenge.
View this Dataversity-sponsored webinar on-demand as we look at these emerging trends in data quality to address data governance, entity resolution at scale, AI and machine learning, and establishing data quality as a core tenet of data literacy.
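Entity resolution at scale, one of the trends named above, typically starts by grouping records under normalized match keys before any finer matching is applied. A deliberately simplified sketch (the fields and records are hypothetical; production systems layer probabilistic matching on top of this kind of blocking):

```python
import re
from collections import defaultdict

def match_key(record):
    """Build a blocking key from a normalized name plus postal code."""
    name = re.sub(r"[^a-z]", "", record["name"].lower())  # drop case, punctuation, spaces
    return (name, record["postal"])

def resolve(records):
    """Group records that share a match key into candidate entities."""
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec["id"])
    return list(groups.values())

# Hypothetical customer records: two spellings of the same company,
# plus a same-named company at a different location.
records = [
    {"id": 1, "name": "Acme Corp.", "postal": "10001"},
    {"id": 2, "name": "ACME CORP",  "postal": "10001"},
    {"id": 3, "name": "Acme Corp",  "postal": "94105"},
]
print(resolve(records))  # → [[1, 2], [3]]
```

Normalization collapses superficial differences (case, punctuation) while the postal code keeps genuinely distinct entities apart, which is the basic trade-off every resolution pipeline tunes.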
Data Management Meets Human Management - Why Words MatterDATAVERSITY
At Fifth Third Bank, about 450 people use data every day. They all start with Alation. But this wasn't always the case. In fact, getting hundreds of folks working in sync has been a monumental task.
Just ask Greg Swygart, VP of enterprise data at Fifth Third Bank. Greg has led data consumption and interaction efforts since adopting Alation. Currently he’s scaling out data literacy for Fifth Third, replicating data capabilities to all roles across the company.
Join Greg to learn how Fifth Third Bank moved from a command-and-control governance approach to non-invasive — and reaped the benefits. Greg will be followed by Bob Seiner, creator of Non-Invasive Data Governance, who will speak to data governance’s evolution, with an eye to what’s next.
In this webinar, you'll learn:
• About Fifth Third’s transition away from command-and-control governance
• How Fifth Third leverages Alation as its data marketplace for curation & consumption
• Why words matter when driving adoption
• About the data catalog — and its role in human management
Speed Matters - Intelligent Strategies to Accelerate Data-Driven DecisionsDATAVERSITY
COVID-19 has shown us the importance of data in being able to quickly make decisions when market variables are out of our control. In order to accelerate and harness the process, an organization needs an agile approach to data integration and analytics that avoids the limitations of predefined schemas and data models.
Learn from 451 Research, now part of S&P Global Market Intelligence, a leading global IT research and advisory firm, and Qlik about best practices that can help you accelerate the data to decision path with agility. You’ll understand how to:
-Rethink traditional assumptions about data management and analytic roles and technologies
-Recognize trends that drive the demand to reduce the time required to investigate, analyze and take action on business data.
See a new state of business intelligence, where the data pipeline is optimized to enable organizations to make decisions and act in real-time. Seeking alternatives to the traditional approaches to become more agile in today’s evolving market and economy? Then don’t miss this presentation!
Subscribing to Your Critical Data Supply Chain - Getting Value from True Data...DATAVERSITY
Operational Data Governance is more than a stewardship process for critical Business Assets. As organizations build structure around KPI’s and other critical data, a workflow develops that revolves around the sources and supply chain for that critical data. There can be many aspects to changes and inconsistencies affecting the final results of the supply chain. Inaccurate usage of data can result in audit penalties as well as erroneous report summaries and conclusions.
Is it coming from the correct authoritative source? Has the data been profiled? Has it met it’s threshold?
Gaps in the supply chain from incorrect pathways may lead dead ends or lost sources.
The value of understanding the entire supply chain cannot be overstated. When changes occur at and point, end users can validate that correct business standards, rules and policies have been applied to the critical data within the supply chain. Your organization can rest easy that you are not at risk for exposure due to improper usage, security, and compliance.
Join this webinar to uncover how companies are using data lineage to accomplish data supply chain transparency. You’ll also see the direct value clear data lineage can give to your business and IT landscape today.
It’s been almost two years since the General Data Protection Regulation shook up how organizations manage data security and privacy, ushering in a new focus on Data Governance. This complex but critical practice still has most enterprises grappling to master it for a myriad of reasons.
In this webinar, we’ll examine how Data Governance attitudes and practices continue to evolve and discuss what new research reveals as the most predominant challenges. We’ll delve into technology trends, including how adding certain capabilities will benefit your organization in terms of data asset availability, quality, and usability, including data consumer literacy and confidence.
When you attend this webinar, you will learn about:
• The requirements for a successful and sustainable Data Governance program
• Increasing confidence in data analytics for faster speed to insights
• How to automate data preparation and intelligence and where to start
Slides: Applying Artificial Intelligence (AI) in All the Right Places in the ...DATAVERSITY
Data and Analytics are fundamental to digital transformation, yet many companies are still under-utilizing them. To go full throttle, AI and automation technologies can be added across the full spectrum of your data journey to truly re-imagine processes and business models.
Join Information Builders for this webinar on how AI:
• Augments your traditional business intelligence and analytics systems
• Minimizes manual inefficiencies with the way data is generated, collected, cleansed, and organized
• Helps you realize substantial performance gains with use cases such as churn forecasting, predictive maintenance, supply chain planning, risk mitigation, and more
Slides: Achieving a “Single Source of Truth” with BI in Your EnterpriseDATAVERSITY
The ability to drive consistent use and widespread adoption of Business Intelligence is an ongoing challenge for many companies, and the inability to achieve this consistency and uniform adoption can significantly impede their progress in becoming information and data-driven organizations. Departmental siloes, tool proliferation, end-user Data Literacy, and other challenges too often produce an environment in which a shared, common understanding of the organization’s key performance indicators fail to materialize. In addition, metrics and measurements — the much-discussed “single-source-of-truth” — often fail to take shape, which in turn leads to competing versions of the truth, a lack of trust in available decision-making data, and degradation in decision-making speed and effectiveness.
In this webinar, we will:
• Explore the underlying conditions that lead to the challenges of driving consistent and company-wide adoption of Business Intelligence
• Examine case studies of companies that have successfully solved these challenges
• Suggest solutions to the issues preventing organizations from building the necessary but elusive “Single Source of Truth”
Building an Effective Data & Analytics Operating Model A Data Modernization G...Mark Hewitt
This is the age of analytics—information resulting from the systematic analysis of data.
Insights gained from applying data and analytics to business allows large and small organizations across diverse industries—be it healthcare, retail, manufacturing, financial, or others—to identify new opportunities, improve core processes, enable continuous learning and differentiation, remain competitive, and thrive in an increasingly challenging business environment.
The key to building a data-driven practice is a Data and Analytics Operating Model (D&AOM) which enables the organization to establish standards for data governance, controls for data flows (both within and outside the organization), and adoption of appropriate technological innovations.
Success measures of a data initiative may include:
• Creating a competitive advantage by fulfilling unmet needs,
• Driving adoption and engagement of the digital experience platform (DXP),
• Delivering industry standard data and metrics, and
• Reducing the lift on service teams.
This green paper lays out the framework for building and customizing an effective data and analytics operating model.
Business models across industries around the world are becoming customer-centric. Recent studies show that “knowing” customers based on internal as well as external data is one of the top priorities of business leaders. Various surveys also reveal that customers do not mind sharing their semi-personal data in exchange for differentiated service. In that context, the 360-degree view of the customer – once thought of as a business process, master data management, data integration, and data warehouse / business intelligence problem – has now entered the much larger world of Big Data, including integration with unstructured data sources. The impact of Big Data on Customer Master Data Management spans the integration and linkage of unstructured or semi-structured data with the structured master data maintained within the enterprise, through to the analysis and visualization of that data to generate useful insights about customers. Various patterns exist to handle the challenges across the steps – acquire, link, manage, analyze, and distribute the enhanced customer data for differentiated products or services.
When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
Whether to take data ingestion cycles off the ETL tool and the data warehouse or to facilitate competitive Data Science and building algorithms in the organization, the data lake – a place for unmodeled and vast data – will be provisioned widely in 2020.
Though it doesn’t have to be complicated, the data lake has a few key design points that are critical, and it does need to follow some principles for success. Build the data lake, not the data swamp! The tool ecosystem is building up around the data lake, and soon many organizations will have both a robust lake and a data warehouse. We will discuss policy to keep them straight, send data to its best platform, and keep users’ confidence in their data platforms high.
Data lakes will be built in cloud object storage. We’ll discuss the options there as well.
Get this data point for your data lake journey.
Noise to Signal - The Biggest Problem in DataDATAVERSITY
Our ability to produce, ingest, and store data has grown exponentially, but our ability to parse out insights from data has not. In the 90s, an organization’s data would live in a data warehouse with an ETL pipeline and one reporting layer on top. Information was well controlled, if somewhat limited in breadth and slow to trickle down. Now, with the onset of self-service analytics, anyone can create a report and an insight, and there are many different sources of “truth.” For example, a seemingly straightforward question like “How many customers do we have?” will likely return different answers from sales, finance, and customer success, depending on their definitions and the data at hand. There is simply too much data (and duplicate data), too many tools, and too many systems storing data, leading to time-consuming searches, confusion, and a lack of trust. Hear Stephanie discuss how a data catalog can help solve the noise-to-signal problem by making information easier to find, easier to understand, and more trustworthy. She will describe how organizations like Safeway, Albertsons, Munich Re, and Pfizer leverage a data catalog to find and collaborate on data, gain a fuller understanding of its meaning, and ultimately solve important problems.
Slides: Accelerate and Assure the Adoption of Cloud Data Platforms Using Inte...DATAVERSITY
Greater agility, scalability, and lower total cost of ownership made the decision to move key elements of your organization’s data capability to the cloud easy. The real challenge is migrating data from your legacy systems to your new cloud platform so you can unleash its potential and value while minimizing the migration risks.
Combining erwin‘s data modeling, governance, and intelligence solutions with Snowflake’s modern cloud data platform, organizations can realize a scalable, governed, and transparent enterprise data capability.
In this session, we’ll show you how enterprise stakeholders with different skills and needs can work together to accelerate and assure the success of cloud migration projects of any size. You’ll learn how to:
• Reduce costs and mitigate risks when migrating legacy applications to Snowflake with erwin’s model-driven schema design and transformation capabilities
• Increase the precision, speed, and agility of Snowflake deployments with erwin data automation
• Assure transparency, compliance, and governance for Snowflake data and processes
• Increase the efficiency and accuracy of analytics and other data usage on the Snowflake Cloud Platform
Analytics is all about course-correcting the future. While this starts with accurate predictions of the future, without resultant actions steering the future toward company goals, knowing that future is academic. Successful companies must be grounded in successful data-based prescription. In this webinar, William will present a data maturity model with a focus on how analytic competitors outdo the competition by looking forward to a data-influenced future.
Emerging Trends in Data Architecture – What’s the Next Big ThingDATAVERSITY
Digital Transformation is a top priority for many organizations, and a successful digital journey requires a strong data foundation built on core data management capabilities such as MDM. With technological innovation and change occurring at an ever-increasing rate, it’s hard to keep track of what’s hype and what can provide practical value for your organization. Join this webinar to see the results of a recent DATAVERSITY survey on emerging trends in Data Architecture, along with practical commentary and advice from industry expert Donna Burbank.
Slides: Taking an Active Approach to Data GovernanceDATAVERSITY
A Look at How Riot Games Implemented Non-Invasive Data Governance
Riot Games created and runs “League of Legends,” the world’s most-played PC game and most viewed eSport — and is now transforming to become a multi-title publisher. To keep pace with this transformation and support a growing player base of millions, Riot Games is taking a page from Bob Seiner’s book, “Non-Invasive Data Governance: The Path of Least Resistance and Greatest Success” and leveraging the Alation Data Catalog to help guide accurate, well-governed analysis.
Bob Seiner will join Riot Games’ Chris Kudelka, Technical Product Manager, and Michael Leslie, Senior Data Governance Architect, and Alation’s John Wills, VP of Professional Service, for an inside look at Data Governance at one of the world’s leading gaming companies.
Join this webinar to learn:
• How Riot Games is implementing Non-Invasive Data Governance
• How this new approach to Data Governance helps to drive the business
• How the Alation Data Catalog helps Riot Games create the foundation for guiding accurate, well-governed data use
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...DATAVERSITY
Many data scientists are well grounded in delivering results in the enterprise, but many come from outside – from academia, from PhD programs and research. They have the necessary technical skills, but those skills don’t count until their product gets to production and into use. The speaker recently helped a struggling data scientist understand his organization and how to create success in it. That experience turned into this presentation, because many new data scientists struggle with the complexities of an enterprise.
DAS Slides: Data Architect vs. Data Engineer vs. Data ModelerDATAVERSITY
The increasing focus on data in today’s organization has increased demand for critical roles such as data architect, data engineer, and data modeler. But there is often confusion and ambiguity around what these roles entail, and what overlap exists between them. This webinar will discuss these data-centric roles and their place in the data-driven organization.
Emerging Data Quality Trends for Governing and Analyzing Big DataDATAVERSITY
Business initiatives across industries are applying more data than ever to drive analytics and AI in the quest for new competitive insights. As the volume and variety of data gathered by organizations continues to escalate, both on-premises and in the cloud, traditional methods of Data Quality are transforming to meet this Big Data challenge. This webinar looks at these emerging trends in Data Quality to address Data Governance, entity resolution at scale, AI and machine learning, and establishing Data Quality as a core tenet of data literacy.
Emerging Data Quality Trends for Governing and Analyzing Big DataPrecisely
Business initiatives across industries are applying more data than ever to drive analytics and AI in the quest for new competitive insights. As the volume and variety of data gathered by organizations continues to escalate, both on-premises and in the cloud, traditional methods of data quality are transforming to meet this Big Data challenge.
View this Dataversity-sponsored webinar on-demand as we look at these emerging trends in data quality to address data governance, entity resolution at scale, AI and machine learning, and establishing data quality as a core tenet of data literacy.
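Entity resolution at scale is normally handled by dedicated matching engines, but the core loop is easy to illustrate. The sketch below is a hypothetical example using only Python’s standard library (`difflib`), not any vendor’s matching logic: normalize each record, score pairwise similarity, and greedily cluster records above a threshold.

```python
from difflib import SequenceMatcher

def normalize(record: str) -> str:
    """Crude standardization: lowercase and collapse whitespace."""
    return " ".join(record.lower().split())

def similarity(a: str, b: str) -> float:
    """Similarity score in [0, 1] between two normalized records."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def match_entities(records, threshold=0.85):
    """Greedy single-pass clustering: put each record into the first
    cluster whose representative (first member) is similar enough."""
    clusters = []
    for rec in records:
        for cluster in clusters:
            if similarity(rec, cluster[0]) >= threshold:
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters

customers = [
    "Acme Corp, 12 Main St",
    "ACME Corp., 12 Main Street",
    "Beta LLC, 9 Oak Ave",
]
clusters = match_entities(customers)  # the two Acme variants merge; Beta stands alone
```

At Big Data scale, the quadratic pairwise loop is replaced with blocking keys and probabilistic or ML-based matching, but the normalize/score/cluster shape stays the same.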
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on TrackPrecisely
With recent studies indicating that 80% of AI and machine learning projects fail due to data quality related issues, it’s critical to think about this problem holistically. This is not a simple topic: data quality issues can occur anywhere from project kickoff through to model implementation and usage.
View this webinar on-demand, where we start with four foundational data steps to get our AI and ML projects grounded and underway, specifically:
• Framing the business problem
• Identifying the “right” data to collect and work with
• Establishing baselines of data quality through data profiling and business rules
• Assessing fitness for purpose for training and evaluating the subsequent models and algorithms
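The data-profiling step in the third bullet can be sketched in a few lines. This is an illustrative example, not any vendor’s profiling tool: for each column, compute completeness, distinct-value count, and the most frequent values.

```python
from collections import Counter

def profile(rows, columns):
    """Baseline per-column data quality profile over a list of row dicts:
    completeness ratio, distinct-value count, and top value frequencies."""
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        non_null = [v for v in values if v not in (None, "")]
        report[col] = {
            "completeness": len(non_null) / len(values) if values else 0.0,
            "distinct": len(set(non_null)),
            "top_values": Counter(non_null).most_common(3),
        }
    return report

rows = [
    {"id": 1, "country": "US"},
    {"id": 2, "country": "us"},
    {"id": 3, "country": None},
]
report = profile(rows, ["id", "country"])
# "country" is 2/3 complete with 2 distinct values ("US" vs "us"),
# hinting at both a completeness rule and a case-standardization rule.
```

Numbers like these feed directly into the business rules mentioned in the same bullet, e.g. “country must be populated and drawn from a standard code list.”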
Enabling Success With Big Data-Driven Talent AcquisitionDavid Bernstein
Adopting an evidence-based recruitment marketing strategy is not just reserved for large employers. In fact, a targeted sourcing strategy can in some ways have a greater impact on small and mid-size businesses who need to allocate already-limited resources to the areas that will provide the most value. Ultimately, hiring the right candidate means profitability for your business. How can talent acquisition professionals gain the insights their organizations need to make better-informed decisions about their recruitment marketing efforts?
This talk is an introduction to Data Science. It explains Data Science from two perspectives: as a profession and as a discipline. While covering the benefits of Data Science for business, it also explains how to get started embracing data science in business.
Learn more about a world beyond CRM suites and how your company can build the customer data technology stack that matches the reality of today’s multi-channel, digital era.
Data Lake Architecture – Modern Strategies & ApproachesDATAVERSITY
Data Lake or Data Swamp? By now, we’ve likely all heard the comparison. Data Lake architectures have the opportunity to provide the ability to integrate vast amounts of disparate data across the organization for strategic business analytic value. But without a proper architecture and metadata management strategy in place, a Data Lake can quickly devolve into a swamp of information that is difficult to understand. This webinar will offer practical strategies to architect and manage your Data Lake in a way that optimizes its success.
The Data Driven Enterprise - Roadmap to Big Data & Analytics SuccessBigInsights
The Data Driven Enterprise - Roadmap to Big Data & Analytics Success
Presentation used at the series of Breakfast seminar around Australia hosted by Lenovo/Intel/SAP/EY
What is the impact of Big Data on Analytics from a Data Science perspective.
Presented at the Big Data and Analytics Summit 2014, Nasscom by Mamatha Upadhyaya.
Similar to Applying Data Quality Best Practices at Big Data Scale (20)
AI-Ready Data - The Key to Transforming Projects into Production.pptxPrecisely
Moving AI projects from the laboratory to production requires careful consideration of data preparation. Join us for a fireside chat where industry experts, including Antonio Cotroneo (Director, Product Marketing, Precisely) and Sanjeev Mohan (Principal, SanjMo), will discuss the crucial role of AI-ready data in achieving success in AI projects. Gain essential insights and considerations to ensure your AI solutions are built on a solid foundation of accurate, consistent, and context-rich data. Explore practical insights and learn how data integrity drives innovation and competitive advantage. Transform your approach to AI with a focus on data readiness.
Building a Multi-Layered Defense for Your IBM i SecurityPrecisely
In today's challenging security environment, new vulnerabilities emerge daily, leaving even patched systems exposed. While IBM works tirelessly to release fixes as they discover vulnerabilities, bad actors are constantly innovating. Don't settle for reactive defense – secure your IT with a layered approach!
This holistic strategy builds multiple security walls, making it far harder for attackers to breach your defenses. Even if a certain vulnerability is exploited, one of the controls could stop the attack or at least delay it until you can take action.
Join us for this webcast to hear about:
• How security risks continue to evolve and change
• The importance of keeping all your systems patched and up-to-date
• A multi-layered approach to network, system object and data security
Navigating the Cloud: Best Practices for Successful MigrationPrecisely
In today's digital landscape, migrating workloads and applications to the cloud has become imperative for businesses seeking scalability, flexibility, and efficiency. However, executing a seamless transition requires strategic planning and careful execution. Join us as we delve into cloud migration, where we will explore three key topics:
i. Considerations to take when planning for cloud migration
ii. Best practices for successfully migrating to the cloud
iii. Real-world customer stories
Unlocking the Power of Your IBM i and Z Security Data with Google ChroniclePrecisely
In today's ever-evolving threat landscape, siloed systems and data leave organizations vulnerable. This is especially true when mission-critical systems like IBM i and IBM Z mainframes are not included in your security planning. Valuable security data from these systems often remains isolated, hindering your ability to detect and respond to threats effectively.
Ironstream bridges this gap for IBM systems by integrating the important security data from these mission-critical systems into Google Chronicle, where it can be seen, analyzed, and correlated with data from other enterprise systems. Here's what you'll learn:
• The unique challenges of securing IBM i and Z mainframes
• Why traditional security tools fall short for mainframe data
• The power of Google Chronicle for unified security intelligence
• How to gain comprehensive visibility into your entire IT ecosystem
• Real-world use cases for integrating IBM i and Z security data with Google Chronicle
Join us for this webcast to hear about:
• The unique challenges of securing IBM i and IBM Z systems
• Real-world use cases for integrating IBM i and IBM Z security data with Google Chronicle
• Combining Ironstream and Google Chronicle to deliver faster threat detection, investigation, and response times
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
Are you considering leveraging the cloud alongside your existing IBM AIX and IBM i systems infrastructure? There are likely benefits to be realized in scalability, flexibility, and even cost.
However, to realize these benefits, you need to be aware of the challenges and opportunities that come with integrating your IBM Power Systems in the cloud. These challenges range from data synchronization to testing to planning for fallback in the event of problems.
Join us for this webcast to hear about:
• Seamless migration strategies
• Best practices for operating in the cloud
• Benefits of cloud-based HA/DR for IBM AIX and IBM i
It can be challenging to display and share capacity data in a way that is meaningful to end users. There is an overabundance of data points related to capacity, and a summary of this data is difficult to construct and display.
You are already spending time and money to handle the critical need to manage systems capacity and performance and to estimate future needs. Are you spending it wisely? Are you getting the level of results from your investment that you really need? Can you prove it?
The good news is that the return on investment of implementing capacity management and capacity planning is most definitely positive and provable, both in terms of tangible monetary value and in some less tangible but no-less-valuable benefits.
Join us for this webinar and learn:
• Top Trends in Capacity Management
• Common customer pain points
• Ways to demonstrate these benefits to your company
Automate Studio Training: Materials Maintenance Tips for Efficiency and Ease ...Precisely
Ready to improve efficiency, provide easy-to-use data automations, and take materials master (MM) data maintenance to the next level?
Find out how during our Automate Studio training on March 28 – led by Sigrid Kok, Principal Sales Engineer, and Isra Azam, Sales Engineer, at Precisely.
This session’s for you if you want to discover the best approaches for creating, extending or maintaining different types of materials, as well as automating the tricky parts of these processes that slow you down.
Greater control over your Automate Studio business processes means bigger, better results. We’ll show you how to enable your business users to interact with SAP from Microsoft Office and other familiar platforms – resulting in more efficient SAP data management, along with improved data integrity and accuracy.
This 90-minute session will be filled with a variety of topics, including:
• Real-world approaches for creating multiple types of materials, balancing flexibility and power with simplicity and ease of use
• Tips on material creation, including:
– Downloading the generated material number
– Using formulas to format data prior to upload, such as capitalization or zero padding, to make it easy to get the data right the first time
– Conditionally requiring fields based on other field entries
– Using lists of values (LOVs) for free-form entry fields with standard values
• Tips on modifying alternate units of measure, building from scratch using GUI scripting
• Modifying multiple language descriptions, building from scratch using a standard BAPI
• Making end-to-end MM process flows more of a reality with features including APIs and predictive AI
Through these topics, you’ll gain plenty of actionable takeaways that you can start implementing right away – including how to:
• Improve your data integrity and accuracy
• Make scripts flexible and usable for automation users
• Seamlessly handle both simple and complex parts of material master maintenance
• Interact with SAP from both business user and script developer perspectives
• Easily upload and download data between SAP and Excel, and format the data before upload using simple formulas
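The zero-padding tip above matters because SAP stores purely numeric material numbers zero-padded internally (the classic MATNR field is 18 characters), so “4711” and its padded form must not end up as two different keys. The webinar does this with spreadsheet formulas; below is a generic Python illustration of the same idea, not Automate Studio functionality:

```python
def pad_material_number(value, width=18):
    """Zero-pad purely numeric material numbers to the given width,
    mirroring SAP's internal storage; leave alphanumeric keys as-is."""
    text = str(value).strip()
    return text.zfill(width) if text.isdigit() else text

pad_material_number("4711")      # zero-padded to 18 characters
pad_material_number("MAT-4711")  # alphanumeric key left unchanged
```

The same pre-upload formatting idea applies to capitalization: normalize once in the spreadsheet or script so the data is right the first time.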
You’ll leave this session feeling ready and empowered to save time, boost efficiency, and change the way you work.
Automate Studio reduces your dependency on technical resources to help you create automation scenarios – and our team of experts is here to make sure you get the most out of our solution throughout the journey.
Questions? Sigrid & Isra will be ready to answer them during a live Q&A at the end of the session.
Who should attend:
Attendees who will get the most out of this session are Automate Studio developers and runners familiar with SAP MM. Knowledge of Automate Studio script creation is nice to have, but not required.
Leveraging Mainframe Data in Near Real Time to Unleash Innovation With Cloud:...Precisely
Join us for an insightful roundtable discussion featuring experts from AWS, Confluent, and Precisely as they delve into the complexities and opportunities of migrating mainframe data to the cloud.
In this engaging webinar, participants will learn about the various considerations, strategies, and customer challenges associated with replicating mainframe data to cloud environments.
Our panelists will share practical insights, real-world experiences, and best practices to help organizations successfully navigate this transformative journey.
Whether you're considering migrating and modernizing your mainframe applications to cloud, or augmenting mainframe-based applications with data replication to cloud, this roundtable will provide valuable perspectives and insights to maximize the benefits of migrating mainframe data to the cloud.
Join us on March 27 to gain a deeper understanding of the opportunities and challenges in this evolving landscape.
Data Innovation Summit: Data Integrity TrendsPrecisely
Data integrity remains an evolving process of discovery, identification, and resolution. With public confidence in data used for decision-making at an all-time low, attention has gradually shifted to data quality and data integration across multiple systems and frameworks. Data integrity is becoming a focal point again as companies make strategic moves in the face of an evolving economy.
Key takeaways:
· How to build a data-driven culture within your organization
· Tips to engage with key stakeholders in your business and examples from other businesses around the world
· How to establish and maintain a business-first approach to data governance
· A summary of the findings from a recent survey of global data executives by Drexel University's LeBow College of Business
AI You Can Trust - Ensuring Success with Data Integrity WebinarPrecisely
Artificial Intelligence (AI) has become a strategic imperative in a rapidly evolving business landscape. However, the rush to embrace AI comes with risks, as illustrated by instances of AI-generated content with fake citations and potentially dangerous recommendations. The critical factor underpinning trustworthy AI is data integrity, ensuring data is accurate, consistent, and full of rich context.
Attend our upcoming webinar, "AI You Can Trust: Ensuring Success with Data Integrity," as we explore organizational challenges in maintaining data integrity for AI applications and real-world use cases showcasing the transformative impact of high-integrity data on AI success.
During this panel discussion, we'll highlight everything from personalized recommendations and AI-powered workflows to machine learning applications and innovative AI assistants.
Key Topics:
AI Use Cases with Data Integrity: Discover how data integrity shapes the success of AI applications through six compelling use cases.
Solving AI Challenges: Uncover practical solutions to common AI challenges such as bias, unreliable results, lack of contextual relevance, and inadequate data security.
Three Considerations of Data Integrity for AI: Learn the essential pillars—complete, trusted, and contextual—that underpin data integrity for AI success.
Precisely and AWS Partnership: Explore how the collaboration between Precisely and Amazon Web Services (AWS) addresses these challenges and empowers organizations to achieve AI-ready data.
Join our panelists to unlock the full potential of AI by starting your data integrity journey today. Trust in AI begins with trusted data – let's future-proof your AI together.
Less Bias. More Accurate. Relevant Outcomes.
Optimize the Finance Function by Automating Your SAP ProcessesPrecisely
The finance function is at the heart of company success, and it must evolve to meet today's challenges: moving faster, processing more information, and ensuring flawless data quality.
Join us to discover how to meet these challenges, including the following points:
• Managing accounting and financial master data: G/L accounts, customers, vendors, cost centers, profit centers, and more
• Accelerating period-end closes by posting the necessary accounting entries, running the appropriate reports, and extracting information in real time
• Organizing tasks by assigning them in an orchestrated way to their owners or launching them automatically, and tracking them at a granular level
Our webinar will illustrate this range of capabilities, available to business users with little or no code. We hope to see many of you there.
In this presentation, we discuss which tools, in our view, help make the transformation to SAP S/4HANA as smooth as possible. But we also look ahead!
Our talk focuses not only on short-term solutions but also on sustainability and on investments for the future.
That includes developments that will change the SAP world for the long term.
We look at future technologies, such as AI and machine learning, that help optimize data-intensive SAP processes, improve data quality, reduce manual processes, and relieve employees.
Join us for a look into the future and help shape the digital transformation in your company.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring & observability to the purview of ops, infra, and SRE teams. This is a mistake: achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and I will share these foundational concepts to build on.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. Fostering a culture of innovation takes real work: it takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at every stage.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfPeter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Applying Data Quality Best Practices at Big Data Scale
1. Applying Data Quality Best Practices at Big Data Scale
Harald Smith, Director of Product Management
Michael Urbonas, Director of Product Marketing
2. Speakers
Mike Urbonas, Director of Product Marketing, Syncsort
15 years of software experience including:
– BI/DW & data visualization
– Data management & ETL
– Text analytics
– Enterprise search
Harald Smith, Director of Product Management, Syncsort
20 years in Information Management incl. data quality, integration, and governance
– Consulting, product management, software & solution development
– Co-author of Patterns of Information Management, as well as two Redbooks on Information Governance and Data Integration
3. Today’s agenda
Problem: Huge Big Data investments, Scarce Big Data trust
– New insights from Syncsort 2017 Big Data Trends survey
– Root causes of Data Lake distrust
Sample use cases at Big Data scale
– 360 degree view of the customer, product or other core entity
– Anti-fraud
Solution: Bringing Data Quality best practices into the Data Lake
– “Design once, deploy anywhere” with Syncsort/Trillium technology approach
– “Intelligent execution” to leverage the strength of the Big Data platform
4. Nobody wants a data swamp!
“This sure looked a lot nicer
on the whiteboard…”
5.–7. Key problem:
Big Data deemed untrustworthy by business managers and leaders
– Only 33% of senior execs have a high level of trust in the accuracy of their Big Data analytics ~ KPMG 2016
– 59% of global execs do not believe their company has capabilities to generate meaningful business insights from their data ~ Bain 2015
– 85% of global execs say major investments are required to update their existing data platform, including data cleaning and consolidating ~ Bain 2015
8.–10. Fresh insights from Syncsort 2017 Big Data Trends survey
Data Quality is recognized as a mission-critical data lake success factor
– Data Quality tops the list of challenges of data lake implementation, followed closely by Data Governance
Financial services and insurance industry is most focused on Data Quality and Data Governance
– Named Data Quality as a top priority 50% more often than participants from other industries
– Also identified Data Governance as a top priority at more than twice the rate of those from other industries
But… not everyone is making the connection between Data Quality and Big Data success
– Participants who did not include data quality as a top-3 priority for implementing the data lake expressed the most interest in analytically intensive data lake uses… which are highly dependent on proper data quality
12. Root causes of Big Data mistrust
– Are these numbers accurate? Are calculations using correctly aggregated data?
– Is this data current? When was it last updated?
– Are these terms consistent with our business definitions?
– Can I trust this data enough to make key decisions and/or allow the data to be used in real time?
– Did we include all of the data we should have? Are additional data sources missing?
13.–15. Root causes of Big Data mistrust… examples
False Assumptions
– A Pinterest targeted marketing campaign mistakenly congratulated single women on upcoming weddings…
Miscoded/Misinterpreted Data
– Predictive analysis falsely found call center workers without a HS diploma were 3x more likely to remain on board for at least 9 months…
Duplicate Data
– A fraud examination revealed massive import-tariff evasion on eggs, only to find there was no case to crack…
16. Sample use cases at Big Data scale…
360 view of customer (or product, or other key entity)
Is a Data Lake essential for this use case?
– YES… The purpose of customer 360 is to optimize customer experience management
– An increasingly broad spectrum of data sources is involved in and required for effectively personalizing customer experiences and targeted marketing offers
What types of data?
– Internal sources (often many/overlapping): customer master, point-of-sale, contact form, loyalty program, ecommerce, and customer service data
– Suppression data (keeping customer information updated): change of address, mortality, Do Not Call
– Third-party data (demographics): age, occupation, education, gender, income, geographic
– New sources: mobile, social media
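The consolidate-then-suppress flow this slide implies can be sketched as follows. This is a minimal illustration only: the source names, field names, and the `consolidate`/`apply_suppression` helpers are hypothetical, not Trillium or Syncsort functionality.

```python
# Hypothetical internal sources keyed by a shared customer id.
crm = [{"id": "c1", "email": "ann@example.com", "phone": "555-0100"}]
loyalty = [{"id": "c1", "tier": "gold"}, {"id": "c2", "tier": "silver"}]

# Suppression data: a Do Not Call list of phone numbers.
do_not_call = {"555-0100"}

def consolidate(*sources):
    """Merge records sharing a customer id into one profile."""
    profiles = {}
    for source in sources:
        for rec in source:
            profiles.setdefault(rec["id"], {}).update(rec)
    return profiles

def apply_suppression(profiles, dnc):
    """Blank the phone contact for numbers on the Do Not Call list."""
    for profile in profiles.values():
        if profile.get("phone") in dnc:
            profile["phone"] = None
    return profiles

view = apply_suppression(consolidate(crm, loyalty), do_not_call)
print(view["c1"])  # gold-tier profile with the phone number suppressed
```

The same pattern extends to change-of-address and mortality suppression: each list updates or blanks one facet of the consolidated profile.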
17. Sample use cases at Big Data scale…
Anti-Fraud/Anti-Money Laundering
Is a Data Lake essential for this use case?
– YES… Fraudulent transaction detection requires huge volumes of customer profile data, recent transaction activity with “last known” values, device data with geolocation and time-based tagging, and 3rd-party news/alerts
– Data is used to refine machine learning models (e.g., anomaly detection, implausible behavior analysis) to review new transactions in real time
What types of data?
– Internal sources (often many/overlapping): customer master, point-of-sale, contact form, loyalty program, ecommerce, and customer service data
– Suppression data (keeping customer information updated): change of address, mortality, Do Not Call
– Mobile data (devices, locations): device, location, wearables, mobile wallets
– Social data: sentiment, opinions, interests, social handles
– New sources: social media, 3rd-party data, …
18. The Fundamental Data Quality Question: What are you trying to do?
“Never lead with a data set; lead with a question.”
– Anthony Scriffignano, Chief Data Scientist, Dun & Bradstreet; Forbes Insights, May 31, 2017, “The Data Differentiator”
“If you don’t know what you want to get out of the data, how can you know what data you need – and what insight you’re looking for?”
– Wolf Ruzicka, Chairman of the Board at EastBanc Technologies; blog post, June 1, 2017, “Grow A Data Tree Out Of The ‘Big Data’ Swamp”
19. Understanding Data Quality best practices: Where to start?
Establishing scope: asking the “right questions” about your data (not just “what” and “how”)
– “Why” questions to understand the core business problem
– “Who” questions to understand the varying needs of all involved users (role, function, etc.)
Empowering users (“Who”) to gain new clarity into the core problem (“Why”)
– Bringing together data sources relevant to asking insightful questions of the data
– Enabling the data to answer the questions freely
– Building data analytics, algorithms, machine learning, etc. to expedite and broadcast answers
These lines of inquiry inform what Data Quality processing is required
– How, what, and where Data Quality is established depends on the business problem
– The definition of “high-quality data” will vary by business problem
20. Understanding Data Quality best practices: What’s the end goal?
The end goal drives Data Quality requirements and processes.
Do you have all the data required?
– What’s the central entity? E.g., customer, product, asset
– What’s the definition? E.g., “customers” may mean customers, prospects, store visitors, …
– Are the sources comprehensive? E.g., are there data silos? Do they cover all geographies?
– Will “new” information be added? E.g., demographics, geolocation, …
How will data be matched, consolidated, or connected?
– One “golden” record? Or multiple links to connect all the dots?
What’s needed to facilitate the matching, consolidation, or connection required?
– E.g., a customer may need: name, address, geolocation, phone, email
Have you evaluated the sources?
– Are the data sources “fit for purpose”?
21. Applying Data Quality best practices: Identifying required Data Quality dimensions
What data do we care about?
• What are the Critical Data Elements?
What measures can we take advantage of?
1) Completeness – Are the relevant fields populated?
2) Integrity – Does the data maintain an internal structural integrity, or a relational integrity across sources?
3) Uniqueness – Are keys or records unique?
4) Validity – Does the data have the correct values?
5) Consistency – Is the data at consistent levels of aggregation, and does it have consistent valid values over time?
6) Timeliness – Did the data arrive in a time period that makes it useful or usable?
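Several of these dimensions reduce to simple ratios over a data set. The sketch below computes completeness, uniqueness, validity, and timeliness for a toy record set; the records, field names, and rules are illustrative assumptions, not part of any product.

```python
from datetime import date

# Hypothetical sample records; field names are illustrative only.
records = [
    {"id": 1, "email": "a@example.com", "amount": 10.0, "loaded": date(2017, 6, 1)},
    {"id": 2, "email": None,            "amount": 20.0, "loaded": date(2017, 6, 1)},
    {"id": 2, "email": "c@example.com", "amount": -5.0, "loaded": date(2017, 1, 1)},
]

def completeness(rows, field):
    """Share of rows where the field is populated."""
    return sum(r[field] is not None for r in rows) / len(rows)

def uniqueness(rows, key):
    """Share of rows carrying a distinct key value."""
    return len({r[key] for r in rows}) / len(rows)

def validity(rows, field, rule):
    """Share of rows whose value passes a domain rule."""
    return sum(rule(r[field]) for r in rows) / len(rows)

def timeliness(rows, field, cutoff):
    """Share of rows that arrived on or after a cutoff date."""
    return sum(r[field] >= cutoff for r in rows) / len(rows)

print(completeness(records, "email"))                  # one email missing
print(uniqueness(records, "id"))                       # id 2 repeats
print(validity(records, "amount", lambda v: v >= 0))   # one negative amount
print(timeliness(records, "loaded", date(2017, 5, 1))) # one stale load
```

Integrity and cross-source consistency need reference data (foreign keys, valid-value lists) rather than a single table, so they are omitted here.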
23. Example: Call Center Record
[Annotated sample record illustrating the dimensions: Unique, Integrity, Complete, Consistent, Timely, Valid]
– Is Duration = 0 important?
– Is 01/01/20xx a defaulted date?
– How will this be linked or connected with my other data?
– The file appears complete, but does it cover all call centers?
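The slide's questions can be turned into explicit quality flags rather than silent assumptions. A minimal sketch, in which the field names, the default-date list, and the `flag_call_record` helper are all hypothetical:

```python
from datetime import date

# Dates that commonly appear when a system defaults a missing value.
SUSPECT_DEFAULT_DATES = {date(2000, 1, 1), date(2001, 1, 1)}

def flag_call_record(rec):
    """Return quality flags for review instead of silently dropping the record."""
    flags = []
    if rec.get("duration_sec") == 0:
        flags.append("zero duration: dropped call or logging artifact?")
    if rec.get("call_date") in SUSPECT_DEFAULT_DATES:
        flags.append("defaulted date: real value may be missing")
    if not rec.get("customer_id"):
        flags.append("no customer key: cannot link to other data")
    return flags

suspect = {"duration_sec": 0, "call_date": date(2000, 1, 1), "customer_id": None}
print(flag_call_record(suspect))  # three flags, one per question above
```

Whether a zero duration or a defaulted date matters depends on the business problem, which is exactly the point of the slide: the flags surface the question, and the use case decides the answer.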
26. Applying Data Quality best practices: ‘New’ or ‘Extended’ Measures of Data Quality
What else can we review or measure?
1) Coverage (Relevance) – How well does the data source meet the defined needs?
– E.g., does it cover the relevant geography? Is it biased?
2) Continuity – Are there data points for all intervals or expected intervals?
– E.g., sensors, weather records, call data records
3) Triangulation – What Gartner describes as ‘consistency of data across proximate data points’, i.e., consistent measurements from related points of reference
– E.g., if temperatures in Chicago and Louisville are 30° and 32°, then the temperature in Indianapolis for the same day is unlikely to be 70°
4) Provenance – Where did the data originate, who gathered it, and what criteria were used to create it?
– E.g., government agency, 3rd-party provider, free or paid data
5) Transformation from origin – How many layers and/or changes has the data passed through?
– E.g., has the original data source already been merged with two other record sources? And is the result accurate?
6) Repetition or duplication of data patterns – Data points exactly the same across multiple recording intervals or across multiple sensors
– E.g., is there tampering with sensors or call data?
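Two of these extended measures, continuity and repetition, lend themselves to short checks. The sketch below is an illustration under assumed inputs (integer hour stamps, raw sensor readings); the helper names are made up for this example.

```python
def continuity_gaps(timestamps, expected_step):
    """Intervals where consecutive readings are further apart than expected."""
    ts = sorted(timestamps)
    return [(a, b) for a, b in zip(ts, ts[1:]) if b - a > expected_step]

def repeated_runs(values, min_run=3):
    """Runs of identical consecutive readings (a possible stuck or tampered sensor)."""
    runs, start = [], 0
    for i in range(1, len(values) + 1):
        if i == len(values) or values[i] != values[start]:
            if i - start >= min_run:
                runs.append((values[start], i - start))
            start = i
    return runs

hourly = [0, 1, 2, 5, 6]                        # readings for hours 3-4 missing
print(continuity_gaps(hourly, expected_step=1))  # [(2, 5)]
print(repeated_runs([30, 30, 30, 30, 32, 31]))   # [(30, 4)]
```

Triangulation checks follow the same shape: compare each reading against reference points nearby and flag values that fall outside a plausible band.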
29. Applying Data Quality best practices: Understanding Context
Context is critical:
– Even for data that is considered “common” or “understood”, such as name, address, or product description
– To parse or standardize data into useful and usable components for additional processing
– To determine when and where to verify or enrich the data content
– To determine whether and how to match records to a given entity
– To identify whether to consolidate data, and if so, what other data drives the consolidation
30. Applying Data Quality best practices: Assessing Quality Requirements
Entity data (customer, product, asset, …):
– Requires understanding data provenance and context
– Requires integrating data from multiple data sources
– Requires determining whether specific data should even be included
– Presents differences in coverage, completeness, consistency, provenance, …
– Comes from different points in time
– May contain repetitions, particularly from 3rd-party data sources
– May contain data at different levels of consolidation or aggregation
[Example record – Name: Robert Smith Jr; Address: 3 Davy Drive; City: Rotherham; Postal Code: S66 7EN; Phone: +44(0)1189 823606; Email: bsmith850@gmail.com; source: 3rd Party]
31. Applying Data Quality best practices: Utilizing Data Quality functions to achieve required DQ dimensions
– Parse data values from unstructured fields to their correct domains
– Standardize values to enable higher quality matching and linkage
– Verify and enrich global postal addresses and geolocations
– Enrich data from external, third-party sources to create comprehensive, unified records
– Match and link like records
– Consolidate and aggregate to a “golden” record, if appropriate, based on factors such as data source, date, …
– Match records that belong to the same domain (i.e., household or business)
[Household view example – Name: Smith; Address: 3 Davy Drive; City: Rotherham; Postal Code: S66 7EN]
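The standardize-then-match step above can be sketched as follows. The abbreviation table and matching rule here are toy stand-ins for the commercial parsing, standardization, and linkage the slide describes; the record contents echo the household example.

```python
import re

# Toy abbreviation expansions; a real standardizer uses far richer reference data.
ABBREV = {"rd": "road", "dr": "drive", "st": "street", "jr": "junior"}

def standardize(text):
    """Lowercase, strip punctuation, and expand common abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(ABBREV.get(t, t) for t in tokens)

def match(rec_a, rec_b, fields=("address", "postcode")):
    """Link two records when all standardized match fields agree."""
    return all(standardize(rec_a[f]) == standardize(rec_b[f]) for f in fields)

a = {"name": "Robert Smith Jr", "address": "3 Davy Dr.",   "postcode": "S66 7EN"}
b = {"name": "Smith",           "address": "3 Davy Drive", "postcode": "s66 7en"}
print(match(a, b))  # True: same household despite surface differences
```

Standardizing before matching is what makes "3 Davy Dr." and "3 Davy Drive" link; matching on raw values would miss the household entirely.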
32. Applying Data Quality best practices: Example
Large telco organization: “What are our customers saying about us in the marketplace? Where are the most common complaints coming from?”
– Issue: sparse results concentrated in one region
– Required: standardization, enrichment, geocoding, matching/record linkage, address verification
[Before/After Data Quality comparison]
33. Applying Data Quality best practices: Example
Large telco organization: “What are our most profitable regions on a daily basis?”
– Issue: poor geolocation identifying wrong regions
– Required: parsing, standardization, address verification, enrichment, geocoding, matching/record linkage
[Before/After Data Quality comparison]
34. Applying Data Quality best practices: Consistent processing
Big Data at scale distributes data across many nodes – not necessarily with other relevant data!
– Implications for joining, sorting, and matching data, whether for enrichment, verification against trusted sources, or a consolidated single view
Data Quality functions must be performed in a consistent manner, no matter where the actual processing takes place, how the data is segmented, or what the data volume is
– Processing routines must apply the same approach and logic each time
– Critical to establishing, building, and maintaining trust
Source: HP Analyst Briefing
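The consistency requirement above comes down to determinism: a pure cleansing function produces identical output however the data is partitioned across nodes. A minimal sketch, with an illustrative `cleanse` function standing in for a real Data Quality routine:

```python
def cleanse(record):
    """A pure, deterministic cleansing step: same input, same output, always."""
    return record.strip().upper()

data = [" acme corp", "Acme Corp ", "ACME CORP"]

# One node processing everything:
whole = sorted(cleanse(r) for r in data)

# Two "nodes" processing arbitrary partitions, results merged:
part1, part2 = data[:1], data[1:]
merged = sorted([cleanse(r) for r in part1] + [cleanse(r) for r in part2])

print(whole == merged)  # True: same logic, same result, regardless of segmentation
```

Operations that need to see related records together (joins, matching, deduplication) are where distribution bites: the framework must co-locate candidate records (e.g., by match key) before a deterministic function can compare them.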
35. Trillium Quality for Big Data
Focus on Data Quality, not the Big Data platform
• Use existing Data Quality skills and expertise
• No need to worry about mappers, reducers, big side or small side of joins, etc.
• Automatic optimization for best performance, load balancing, etc.
• No changes or tuning required, even if you change execution frameworks
• Future-proof job designs for emerging compute frameworks, e.g. Spark 2.x
• Run multiple execution frameworks in a single job
Single GUI – Execute Anywhere!
Intelligent Execution – insulate your organization from the underlying complexities of Hadoop
36. Bring Data Quality best practices into the Data Lake: The Syncsort/Trillium technology approach
“Design once, deploy anywhere”
– Visually design data quality jobs once and run them anywhere (MapReduce, Spark, Linux, Unix, Windows; on premise or in the cloud)
– Use-case templates to fast-track development
– Test and debug locally in Windows/Linux; deploy to Big Data
– Intelligent Execution dynamically optimizes data processing at run time based on the chosen compute framework; no changes or tuning required
Benefit: significantly reduce manual data preparation
– A major time sink for data scientists, architects, and analysts
– Reduces the risk of inconsistent or incomplete data preparation
Benefit: significantly increase trust in data
– Data distrust is a major time sink for executives
– Reduces the risk of poor data-based business decisions
Single GUI – Execute Anywhere!
37. Data Quality remains Data Quality, even at scale
“Data is useful. High-quality, well-understood, auditable data is priceless.”
– Ted Friedman, VP Distinguished Analyst; article in CRM.com, Mar 8, 2005, “The Coming of BI Competency Centers”
“Data is the new science. Big Data holds the answers. Are you asking the right questions?”
– Pat Gelsinger, President and COO at EMC; Forbes Insights, June 22, 2012, “Big Bets On Big Data”
38. Questions and Next Steps
For more information on Trillium Quality for Big Data, visit: trilliumsoftware.com/products/big-data
Contact Info:
Mike Urbonas, Director of Product Marketing, Syncsort/Trillium Software
murbonas@syncsort.com
https://www.linkedin.com/in/mikeu
Harald Smith, Director of Product Management, Syncsort/Trillium Software
harald_smith@trilliumsoftware.com
https://www.linkedin.com/in/harald-smith-71028b