Achieving agility in data and analytics is hard. It’s no secret that most data organizations struggle to deliver the on-demand data products their business customers expect. Recently, there has been much hype around new design patterns that promise to deliver this much sought-after agility.
In this webinar, Chris Bergh, CEO and Head Chef of DataKitchen, will cut through the noise and describe several elegant and effective data architecture design patterns that deliver low errors, rapid development, and high levels of collaboration. He’ll cover:
• DataOps, Data Mesh, Functional Design, and Hub & Spoke design patterns;
• Where Data Fabric fits into your architecture;
• How different patterns can work together to maximize agility; and
• How a DataOps platform serves as the foundational superstructure for your agile architecture.
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines – DATAVERSITY
With the aid of any number of data management and processing tools, data flows through multiple on-prem and cloud storage locations before it’s delivered to business users. As a result, IT teams — including IT Ops, DataOps, and DevOps — are often overwhelmed by the complexity of creating a reliable data pipeline that includes the automation and observability they require.
The answer to this widespread problem is a centralized data pipeline orchestration solution.
Join Stonebranch’s Scott Davis, Global Vice President, and Ravi Murugesan, Sr. Solution Engineer, to learn how DataOps teams orchestrate their end-to-end data pipelines with a platform approach to managing automation.
Key Learnings:
- Discover how to orchestrate data pipelines across a hybrid IT environment (on-prem and cloud)
- Find out how DataOps teams are empowered with event-based triggers for real-time data flow
- See examples of reports, dashboards, and proactive alerts designed to help you reliably keep data flowing through your business — with the observability you require
- Discover how to replace clunky legacy approaches to streaming data in a multi-cloud environment
- See what’s possible with the Stonebranch Universal Automation Center (UAC)
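The event-based triggering described in these learnings can be sketched in miniature. The following is an illustrative stand-in only, not Stonebranch UAC's actual API: a watcher polls a landing directory and fires a pipeline run as soon as a new file arrives, instead of waiting for a fixed schedule. All names are hypothetical.

```python
# Minimal sketch of event-based pipeline triggering: react to a new file
# landing rather than running on a timer. A real orchestrator provides this
# as a managed, observable capability; this toy uses polling for clarity.
from pathlib import Path


def run_pipeline(path: Path) -> str:
    # Stand-in for a real extract/transform/load job.
    return f"processed {path.name}"


def poll_once(landing_dir: Path, seen: set[str]) -> list[str]:
    """One polling cycle: trigger the pipeline for each unseen file."""
    results = []
    for f in sorted(landing_dir.glob("*.csv")):
        if f.name not in seen:
            seen.add(f.name)          # remember the event so it fires once
            results.append(run_pipeline(f))
    return results


if __name__ == "__main__":
    import tempfile
    with tempfile.TemporaryDirectory() as d:
        landing = Path(d)
        (landing / "orders.csv").write_text("id,amount\n1,9.99\n")
        seen: set[str] = set()
        print(poll_once(landing, seen))   # pipeline fires for the new file
        print(poll_once(landing, seen))   # no new event, nothing fires
```

In a production platform the polling loop is replaced by genuine event sources (file watchers, message queues, webhooks), but the trigger-once-per-event semantics are the same.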
Data Architecture, Solution Architecture, Platform Architecture — What’s the ... – DATAVERSITY
A solid data architecture is critical to the success of any data initiative. But what is meant by “data architecture”? Throughout the industry, there are many different “flavors” of data architecture, each with its own unique value and use cases for describing key aspects of the data landscape. Join this webinar to demystify the various architecture styles and understand how they can add value to your organization.
Becoming a Data-Driven Organization - Aligning Business & Data Strategy – DATAVERSITY
More organizations are aspiring to become “data-driven businesses”. But all too often this aim fails, because business goals and IT and data realities are misaligned, with IT lagging behind rapidly changing business needs. So how do you get the perfect fit, where data strategy is driven by and underpins business strategy? This webinar will show you how, by demystifying the building blocks of a global data strategy and highlighting a number of real-world success stories. Topics include:
• How to align data strategy with business motivation and drivers
• Why business & data strategies often become misaligned & the impact
• Defining the core building blocks of a successful data strategy
• The role of business and IT
• Success stories in implementing global data strategies
Enterprise Architecture vs. Data Architecture – DATAVERSITY
Enterprise Architecture (EA) provides a visual blueprint of the organization, and shows key interrelationships between data, process, applications, and more. By abstracting these assets in a graphical view, it’s possible to see key interrelationships, particularly as they relate to data and its business impact across the organization. Join us for a discussion on how Data Architecture is a key component of an overall Enterprise Architecture for enhanced business value and success.
Building a Logical Data Fabric using Data Virtualization (ASEAN) – Denodo
Watch full webinar here: https://bit.ly/3FF1ubd
In TDWI’s recent report Building the Unified Data Warehouse and Data Lake, 64% of organizations stated that the objective of unifying the data warehouse and data lake is to get more business value, and 84% of those polled felt that a unified approach to data warehouses and data lakes was either extremely or moderately important.
In this session, you will learn how your organization can apply a logical data fabric, together with the associated technologies of machine learning, artificial intelligence, and data virtualization, to reduce time to value and thereby increase the overall business value of your data assets.
KEY TAKEAWAYS:
- How a Logical Data Fabric is the right approach to help organizations unify their data.
- The advanced features of a Logical Data Fabric that assist with the democratization of data, providing an agile and governed approach to business analytics and data science.
- How a Logical Data Fabric with Data Virtualization enhances your legacy data integration landscape to simplify data access and encourage self-service.
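The data virtualization idea behind a logical data fabric can be illustrated with a toy federation: one logical view answers queries by joining two "sources" at query time, without copying either into a central store. The dicts below stand in for a warehouse table and a SaaS API; all names are hypothetical.

```python
# Illustrative-only sketch of query-time federation, the core move of data
# virtualization: data stays in its systems of record, and a logical view
# assembles the answer on demand.
WAREHOUSE_CUSTOMERS = {1: {"name": "Acme", "region": "EMEA"}}   # "warehouse"
CRM_HEALTH_SCORES = {1: {"health": 0.92}}                        # "SaaS API"


def customer_360(customer_id: int) -> dict:
    """Logical view: join both sources at query time, no data copied."""
    base = WAREHOUSE_CUSTOMERS.get(customer_id, {})
    crm = CRM_HEALTH_SCORES.get(customer_id, {})
    return {**base, **crm}


print(customer_360(1))  # one answer drawn from two systems of record
```

A real platform adds query pushdown, caching, and governance on top of this pattern; the design point is that consumers see a single governed view while the data itself is never duplicated.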
Wondering what this data mesh stuff is all about? What are the principles of data mesh? Can you, or should you, consider data mesh as the approach for your analytics platform? And most importantly, how can Snowflake help?
Given in Montreal on 14-Dec-2021
Data Governance and Metadata Management – DATAVERSITY
Metadata is a tool that improves data understanding, builds end-user confidence, and improves the return on investment in every asset associated with becoming a data-centric organization. Metadata’s use has expanded beyond “data about data” to cover every phase of data analytics, protection, and quality improvement. Data Governance and metadata are joined at the hip in every way possible. As the song goes, “You can’t have one without the other.”
In this RWDG webinar, Bob Seiner will provide a way to renew your energy by focusing on the valuable asset that can make or break your Data Governance program’s success. The truth is metadata is already inherent in your data environment, and it can be leveraged by making it available to all levels of the organization. At issue is finding the most appropriate ways to leverage and share metadata to improve data value and protection.
Throughout this webinar, Bob will share information about:
- Delivering an improved definition of metadata
- Communicating the relationship between successful governance and metadata
- Getting your business community to embrace the need for metadata
- Determining the metadata that will provide the most bang for your buck
- The importance of Metadata Management to becoming data-centric
Is the traditional data warehouse dead? – James Serra
With new technologies such as Hive LLAP or Spark SQL, do I still need a data warehouse, or can I just put everything in a data lake and report off of that? No! In this presentation, I’ll discuss why you still need a relational data warehouse and how to use a data lake and an RDBMS data warehouse together to get the best of both worlds. I will go into detail on the characteristics of a data lake and its benefits, and why you still need to perform data governance tasks in a data lake. I’ll also discuss using Hadoop as the data lake, data virtualization, and the need for OLAP in a big data solution. And I’ll put it all together by showing common big data architectures.
The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.
Then one day the world discovered a whole new kind of data being generated by organizations: machine-generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
DataOps is a methodology and culture shift that brings the successful combination of development and operations (DevOps) to data processing environments. It breaks down silos between developers, data scientists, and operators, resulting in lean data feature development processes with quick feedback. In this presentation, we will explain the methodology, and focus on practical aspects of DataOps.
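A defining practical habit of DataOps is putting automated tests inside the pipeline itself, so bad data fails the run (with quick feedback to developers) rather than silently reaching consumers. The sketch below is illustrative; the checks and field names are assumptions, not any particular vendor's framework.

```python
# Minimal sketch of a DataOps-style pipeline step with embedded data tests:
# the tests run as part of the pipeline, mirroring how DevOps runs unit tests
# as part of the build.
def transform(rows: list[dict]) -> list[dict]:
    # Stand-in transformation: normalize amounts from strings to floats.
    return [{**r, "amount": float(r["amount"])} for r in rows]


def data_tests(rows: list[dict]) -> None:
    # Assertions about the data, not the code. Failing here stops the run.
    assert rows, "output must not be empty"
    assert all(r["amount"] >= 0 for r in rows), "amounts must be non-negative"


def pipeline(raw: list[dict]) -> list[dict]:
    out = transform(raw)
    data_tests(out)        # fail the run, not the downstream dashboard
    return out


print(pipeline([{"id": 1, "amount": "9.99"}]))
```

The feedback loop this creates, where a broken feed halts the pipeline minutes after it appears, is what "lean data feature development with quick feedback" looks like in practice.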
Data Catalog for Better Data Discovery and Governance – Denodo
Watch full webinar here: https://buff.ly/2Vq9FR0
Data catalogs are en vogue, answering critical data governance questions like “Where does all my data reside?”, “What other entities are associated with my data?”, “What are the definitions of the data fields?”, and “Who accesses the data?” Data catalogs maintain the necessary business metadata to answer these questions and many more. But that’s not enough. To be useful, data catalogs need to deliver these answers to business users right within the applications they use.
In this session, you will learn:
* How data catalogs enable enterprise-wide data governance regimes
* What key capability requirements you should expect in data catalogs
* How data virtualization combines dynamic data catalogs with delivery
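The governance questions a catalog answers map directly onto the business metadata it stores. This hypothetical miniature shows that mapping only; real catalogs add lineage, search, stewardship workflow, and integration into end-user tools.

```python
# Toy model of a catalog entry: enough business metadata to answer
# "where does my data reside?", "what does this field mean?", and
# "who accesses it?". All asset names and fields are illustrative.
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    location: str                                      # where the data resides
    definitions: dict = field(default_factory=dict)    # field -> business meaning
    consumers: list = field(default_factory=list)      # who accesses the data


catalog = {
    "orders": CatalogEntry(
        name="orders",
        location="warehouse.sales.orders",
        definitions={"amount": "order total in USD, tax included"},
        consumers=["finance-bi", "forecasting"],
    )
}


def where_is(asset: str) -> str:
    """Answer the first governance question from catalog metadata."""
    return catalog[asset].location


print(where_is("orders"))
```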
Big data architectures and the data lake – James Serra
With so many new technologies, it can be confusing to choose the best approach to building a big data architecture. The data lake is a great new concept, usually built in Hadoop, but what exactly is it, and how does it fit in? In this presentation I’ll discuss the four most common patterns in big data production implementations, the top-down vs. bottom-up approach to analytics, and how you can use a data lake and an RDBMS data warehouse together. We will go into detail on the characteristics of a data lake and its benefits, and how you still need to perform the same data governance tasks in a data lake as you do in a data warehouse. Come to this presentation to make sure your data lake does not turn into a data swamp!
To take a “ready, aim, fire” approach to implementing Data Governance, many organizations assess themselves against industry best practices. The process is not difficult or time-consuming, and it can directly ensure that your activities target your specific needs. Best practices are always a strong place to start.
Join Bob Seiner for this popular RWDG topic, where he will provide the information you need to set your program in the best possible direction. Bob will walk you through the steps of conducting an assessment and share with you a set of typical results from taking this action. You may be surprised at how easy it is to organize the assessment and may hear results that stimulate the actions that you need to take.
In this webinar, Bob will share:
- The value of performing a Data Governance best practice assessment
- A practical list of industry Data Governance best practices
- Criteria to determine if a practice is best practice
- Steps to follow to complete an assessment
- Typical recommendations and actions that result from an assessment
Data Architecture Strategies: Data Architecture for Digital Transformation – DATAVERSITY
Digital transformation rests on a solid data foundation: MDM, data quality, data architecture, and more. At the same time, combining these foundational data management approaches with other innovative techniques can help drive organizational change as well as technological transformation. This webinar will provide practical steps for creating a data foundation for effective digital transformation.
Databricks CEO Ali Ghodsi introduces Databricks Delta, a new data management system that combines the scale and cost-efficiency of a data lake, the performance and reliability of a data warehouse, and the low latency of streaming.
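Delta's headline capability, reliable upserts over data lake files, can be illustrated with a toy MERGE: match on a key, update matched rows, insert unmatched ones. This pure-Python sketch shows the semantics only; it is emphatically not how Delta is implemented (Delta uses a transaction log over Parquet files to make such operations atomic at scale).

```python
# Toy illustration of MERGE (upsert) semantics, the operation that a
# lakehouse-style system makes reliable on top of cheap object storage.
def merge(target: list[dict], updates: list[dict], key: str) -> list[dict]:
    by_key = {row[key]: dict(row) for row in target}
    for row in updates:
        # Matched key: update in place. Unmatched key: insert as a new row.
        by_key[row[key]] = {**by_key.get(row[key], {}), **row}
    return sorted(by_key.values(), key=lambda r: r[key])


orders = [{"id": 1, "status": "open"}, {"id": 2, "status": "open"}]
changes = [{"id": 2, "status": "shipped"}, {"id": 3, "status": "open"}]
print(merge(orders, changes, "id"))
```

Doing this naively against raw files is exactly where plain data lakes break down (partial writes, concurrent readers), which is the gap the transaction layer is designed to close.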
Data Catalogs Are the Answer – What Is the Question? – DATAVERSITY
Organizations with governed metadata made available through their data catalog can answer questions their people have about the organization’s data. These organizations get more value from their data, protect their data better, gain improved ROI from data-centric projects and programs, and have more confidence in their most strategic data.
Join Bob Seiner for this lively webinar where he will talk about the value of a data catalog and how to build the use of the catalog into your stewards’ daily routines. Bob will share how the tool must be positioned for success and viewed as a must-have resource that is a steppingstone and catalyst to governed data across the organization.
In this webinar, Bob will focus on:
- Selecting the appropriate metadata to govern
- The business and technical value of a data catalog
- Building the catalog into people’s routines
- Positioning the data catalog for success
- Questions the data catalog can answer
Data Lakehouse, Data Mesh, and Data Fabric (r1) – James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. I’ll include use cases so you can see what approach will work best for your big data needs.
This is Part 4 of the GoldenGate series on Data Mesh – a series of webinars helping customers understand how to move off of old-fashioned monolithic data integration architectures and get ready for more agile, cost-effective, event-driven solutions. The Data Mesh is a kind of Data Fabric that emphasizes business-led data products running on event-driven streaming architectures and serverless, microservices-based platforms. These emerging solutions are essential for enterprises that run data-driven services on multi-cloud, multi-vendor ecosystems.
Join this session to get a fresh look at Data Mesh; we'll start with core architecture principles (vendor agnostic) and transition into detailed examples of how Oracle's GoldenGate platform is providing capabilities today. We will discuss essential technical characteristics of a Data Mesh solution, and the benefits that business owners can expect by moving IT in this direction. For more background on Data Mesh, Part 1, 2, and 3 are on the GoldenGate YouTube channel: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
Mr. Pollock is an expert technology leader for data platforms, big data, data integration, and governance. Jeff has been CTO at California startups and a senior exec at Fortune 100 tech vendors. He is currently Oracle VP of Products and Cloud Services for Data Replication, Streaming Data and Database Migrations. While at IBM, he was head of all Information Integration, Replication and Governance products, and previously Jeff was an independent architect for the US Defense Department, VP of Technology at Cerebra, and CTO of Modulant – he has been engineering artificial intelligence-based data platforms since 2001. As a business consultant, Mr. Pollock was a Head Architect at Ernst & Young’s Center for Technology Enablement. Jeff is also the author of “Semantic Web for Dummies” and “Adaptive Information,” a frequent keynote speaker at industry conferences, a contributor to books and industry journals, formerly a contributing member of the W3C and OASIS, and an engineering instructor with UC Berkeley’s Extension for object-oriented systems, software development process, and enterprise architecture.
Modernizing to a Cloud Data Architecture – Databricks
Organizations with on-premises Hadoop infrastructure are bogged down by system complexity, unscalable infrastructure, and the increasing burden on DevOps to manage legacy architectures. Costs and resource utilization continue to go up while innovation has flatlined. In this session, you will learn why, now more than ever, enterprises are looking for cloud alternatives to Hadoop and are migrating off of the architecture in large numbers. You will also learn how the benefits of elastic compute models helped one customer scale their analytics and AI workloads, along with best practices drawn from their successful migration of data and workloads to the cloud.
In this session, Sergio covered the Lakehouse concept and how companies implement it, from data ingestion to insight. He showed how you can use Azure Data Services to speed up your analytics project, from ingesting and modelling data to delivering insights to end users.
Why an AI-Powered Data Catalog Tool is Critical to Business Success – Informatica
Imagine a faster, more efficient business thriving on trusted data-driven decisions. An intelligent data catalog can help your organization discover, organize, and inventory all data assets across the organization and democratize data with the right balance of governance and flexibility. Informatica’s data catalog tools are powered by AI and can automate tedious data management tasks and offer immediate recommendations based on derived business intelligence. We offer data catalog workshops globally. Visit Informatica.com to attend one near you.
Business Intelligence & Data Analytics – An Architected Approach – DATAVERSITY
Business intelligence (BI) and data analytics are increasing in popularity as more organizations are looking to become more data-driven. Many tools have powerful visualization techniques that can create dynamic displays of critical information. To ensure that the data displayed on these visualizations is accurate and timely, a strong Data Architecture is needed. Join this webinar to understand how to create a robust Data Architecture for BI and data analytics that takes both business and technology needs into consideration.
Slides: Taking an Active Approach to Data Governance – DATAVERSITY
A Look at How Riot Games Implemented Non-Invasive Data Governance
Riot Games created and runs “League of Legends,” the world’s most-played PC game and most-viewed eSport — and is now transforming to become a multi-title publisher. To keep pace with this transformation and support a growing player base of millions, Riot Games is taking a page from Bob Seiner’s book, “Non-Invasive Data Governance: The Path of Least Resistance and Greatest Success” and leveraging the Alation Data Catalog to help guide accurate, well-governed analysis.
Bob Seiner will join Riot Games’ Chris Kudelka, Technical Product Manager; Michael Leslie, Senior Data Governance Architect; and Alation’s John Wills, VP of Professional Service, for an inside look at Data Governance at one of the world’s leading gaming companies.
Join this webinar to learn:
• How Riot Games is implementing Non-Invasive Data Governance
• How this new approach to Data Governance helps to drive the business
• How the Alation Data Catalog helps Riot Games create the foundation for guiding accurate, well-governed data use
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit... – Amazon Web Services
Struggling to keep up with an ever-increasing demand for data at your organisation? Do you spend hours tinkering with your streaming data pipelines? Does that one data scientist with direct EDW access keep you up at night? Introducing Snowflake, a brand new SQL data warehouse built for the cloud. We’ve designed and implemented a unique cloud-based architecture that addresses the most common shortcomings of existing data solutions. With Snowflake, you can unlock unlimited concurrency, enable instant scalability, and take advantage of built-in tuning and optimisation. Join us and find out what Netflix, Adobe, and Nike all have in common.
Data Catalog in Denodo Platform 7.0: Creating a Data Marketplace with Data Vi... – Denodo
Watch Alberto's session from Fast Data Strategy on-demand here: https://buff.ly/2wByS41
Gartner’s recently published report, “Data Catalogs Are the New Black in Data Management and Analytics,” emphasizes the importance of data catalogs.
Watch this session to learn more about:
• The vision behind the Denodo Data Catalog
• How to maximize information value with the Denodo Data Catalog
• Why it is essential to combine data delivery with a data catalog
Data-Ed Online: Approaching Data Quality – DATAVERSITY
Good data is like good water: best served fresh, and ideally well-filtered. Data Management strategies can produce tremendous procedural improvements and increased profit margins across the board, but only if the data being managed is of high quality. Determining how Data Quality should be engineered provides a useful framework for applying Data Quality management effectively in support of business strategy. This, in turn, allows organizations to more quickly identify business problems, distinguish structural defects in Data Management from practice-oriented ones, and proactively prevent future issues. This webinar will illustrate how organizations with chronic business challenges can often trace the root of the problem to poor Data Quality, and what it means to put Data Quality engineering to work in support of business strategy.
Learning Objectives:
Help you understand foundational Data Quality concepts based on the DAMA Guide to the Data Management Body of Knowledge (DAMA DMBoK), as well as guiding principles, best practices, and steps for improving Data Quality at your organization
Demonstrate how chronic business challenges for organizations are often rooted in poor Data Quality
Share case studies illustrating the hallmarks and benefits of Data Quality success
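"Engineering" Data Quality, as the session above frames it, means turning the fresh-and-well-filtered intuition into explicit, reusable checks that run before data is used. The sketch below is illustrative; the thresholds and field names are assumptions, not a specific methodology from the webinar.

```python
# Two representative engineered Data Quality checks: freshness (is the data
# recent enough to serve?) and completeness (are required fields populated,
# i.e. is the data "well-filtered" of structural defects?).
from datetime import datetime, timedelta, timezone


def check_freshness(loaded_at: datetime, max_age: timedelta) -> bool:
    """True if the data was loaded within the allowed age window."""
    return datetime.now(timezone.utc) - loaded_at <= max_age


def check_completeness(rows: list[dict], required: list[str]) -> list[str]:
    """Return the names of required fields that any row is missing or blank."""
    missing = []
    for name in required:
        if any(name not in row or row[name] in (None, "") for row in rows):
            missing.append(name)
    return missing


rows = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": ""}]
print(check_completeness(rows, ["id", "email"]))  # structural defect surfaced
```

Running checks like these routinely, rather than investigating after a report looks wrong, is what lets teams separate structural defects from practice-oriented ones and prevent recurrence.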
Data and Application Modernization in the Age of the Cloud – redmondpulver
Data modernization is key to unlocking the full potential of your IT investments, both on premises and in the cloud. Enterprises and organizations of all sizes rely on their data to power advanced analytics, machine learning, and artificial intelligence.
Yet the path to modernizing legacy data systems for the cloud is full of pitfalls that cost time, money, and resources: high hardware and staffing costs, difficulty moving data and analytical processes to cloud environments, and inadequate support for real-time use cases. Together, these issues delay delivery timelines and increase costs, undermining the return on investment for new, cutting-edge applications.
Watch this webinar in which James Kobielus, TDWI senior research director for data management, explores how enterprises are modernizing their mainframe data and application infrastructures in the cloud to sustain innovation and drive efficiencies. Kobielus will engage John de Saint Phalle, senior product manager at Precisely, in a discussion that addresses the following key questions:
- When should enterprises consider migrating and replicating all their data assets to modern public clouds vs. retaining some on-premises in hybrid deployments?
- How should enterprises modernize their legacy data and application infrastructures to unlock innovation and value in the age of cloud computing?
- What are the key investments that enterprises should make to modernize their data pipelines to deliver better AI/ML applications in the cloud?
- What is the optimal data engineering workflow for building, testing, and operationalizing high-quality modern AI/ML applications in the cloud?
- What role does real-time replication play in migrating data and applications to modern cloud data architectures?
- What challenges do enterprises face in ensuring and maintaining the integrity, fitness, and quality of the data that they migrate to modern clouds?
- What tools and methodologies should enterprise application developers use to refactor and transform legacy data applications that have migrated to modern clouds?
Watch this webinar in full here: https://buff.ly/2MVTKqL
Self-Service BI promises to remove the bottleneck that exists between IT and business users. The truth is, if data is handed over to a wide range of data consumers without proper guardrails in place, it can result in data anarchy.
Attend this session to learn why data virtualization:
• Is a must for implementing the right self-service BI
• Makes self-service BI useful for every business user
• Accelerates any self-service BI initiative
Is the traditional data warehouse dead?James Serra
With new technologies such as Hive LLAP or Spark SQL, do I still need a data warehouse or can I just put everything in a data lake and report off of that? No! In the presentation I’ll discuss why you still need a relational data warehouse and how to use a data lake and a RDBMS data warehouse to get the best of both worlds. I will go into detail on the characteristics of a data lake and its benefits and why you still need data governance tasks in a data lake. I’ll also discuss using Hadoop as the data lake, data virtualization, and the need for OLAP in a big data solution. And I’ll put it all together by showing common big data architectures.
The world of data architecture began with applications. Next came data warehouses. Then textual data was organized into the data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
DataOps is a methodology and culture shift that brings the successful combination of development and operations (DevOps) to data processing environments. It breaks down silos between developers, data scientists, and operators, resulting in lean data feature development processes with quick feedback. In this presentation, we will explain the methodology, and focus on practical aspects of DataOps.
Data Catalog for Better Data Discovery and Governance (Denodo)
Watch full webinar here: https://buff.ly/2Vq9FR0
Data catalogs are in vogue, answering critical data governance questions like “Where all does my data reside?” “What other entities are associated with my data?” “What are the definitions of the data fields?” and “Who accesses the data?” Data catalogs maintain the necessary business metadata to answer these questions and many more. But that’s not enough. For them to be useful, data catalogs need to deliver these answers to business users right within the applications they use.
In this session, you will learn:
*How data catalogs enable enterprise-wide data governance regimes
*What key capability requirements should you expect in data catalogs
*How data virtualization combines dynamic data catalogs with delivery
Big data architectures and the data lake (James Serra)
With so many new technologies, it can be confusing to choose the best approach to building a big data architecture. The data lake is a great new concept, usually built in Hadoop, but what exactly is it and how does it fit in? In this presentation I'll discuss the four most common patterns in big data production implementations, the top-down vs. bottom-up approach to analytics, and how you can use a data lake and an RDBMS data warehouse together. We will go into detail on the characteristics of a data lake and its benefits, and how you still need to perform the same data governance tasks in a data lake as you do in a data warehouse. Come to this presentation to make sure your data lake does not turn into a data swamp!
To take a “ready, aim, fire” approach to implementing Data Governance, many organizations assess themselves against industry best practices. The process is not difficult or time-consuming and can directly ensure that your activities target your specific needs. Best practices are always a strong place to start.
Join Bob Seiner for this popular RWDG topic, where he will provide the information you need to set your program in the best possible direction. Bob will walk you through the steps of conducting an assessment and share with you a set of typical results from taking this action. You may be surprised at how easy it is to organize the assessment and may hear results that stimulate the actions that you need to take.
In this webinar, Bob will share:
- The value of performing a Data Governance best practice assessment
- A practical list of industry Data Governance best practices
- Criteria to determine if a practice is best practice
- Steps to follow to complete an assessment
- Typical recommendations and actions that result from an assessment
Data Architecture Strategies: Data Architecture for Digital Transformation (DATAVERSITY)
Digital transformation requires a strong data foundation: MDM, data quality, data architecture, and more. At the same time, combining these foundational data management approaches with other innovative techniques can help drive organizational change as well as technological transformation. This webinar will provide practical steps for creating a data foundation for effective digital transformation.
Databricks CEO Ali Ghodsi introduces Databricks Delta, a new data management system that combines the scale and cost-efficiency of a data lake, the performance and reliability of a data warehouse, and the low latency of streaming.
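The key mechanism behind that combination is an ordered transaction log layered over ordinary data files in the lake, which is what gives a cheap object store warehouse-like reliability. A toy sketch of the idea (this illustrates the concept only; it is not the actual Delta format or API):

```python
import json, os, tempfile

class TinyDeltaTable:
    """Toy illustration of a Delta-style table: plain data files plus an
    append-only transaction log that defines the table's visible state.
    A teaching sketch of the concept, not the real Delta Lake protocol."""

    def __init__(self, path):
        self.path = path
        self.log = os.path.join(path, "_txn_log")
        os.makedirs(self.log, exist_ok=True)

    def commit(self, rows):
        version = len(os.listdir(self.log))
        data_file = os.path.join(self.path, f"part-{version}.json")
        with open(data_file, "w") as f:
            json.dump(rows, f)
        # The write becomes visible atomically once the log entry lands,
        # so readers never see a half-finished commit.
        with open(os.path.join(self.log, f"{version:08d}.json"), "w") as f:
            json.dump({"add": data_file}, f)

    def read(self):
        rows = []
        for entry in sorted(os.listdir(self.log)):  # replay log in order
            with open(os.path.join(self.log, entry)) as f:
                action = json.load(f)
            with open(action["add"]) as f:
                rows.extend(json.load(f))
        return rows

table = TinyDeltaTable(tempfile.mkdtemp())
table.commit([{"id": 1}])
table.commit([{"id": 2}])
print(table.read())  # [{'id': 1}, {'id': 2}]
```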
Data Catalogs Are the Answer – What Is the Question? (DATAVERSITY)
Organizations with governed metadata made available through their data catalog can answer questions their people have about the organization’s data. These organizations get more value from their data, protect their data better, gain improved ROI from data-centric projects and programs, and have more confidence in their most strategic data.
Join Bob Seiner for this lively webinar where he will talk about the value of a data catalog and how to build the use of the catalog into your stewards’ daily routines. Bob will share how the tool must be positioned for success and viewed as a must-have resource that is a steppingstone and catalyst to governed data across the organization.
In this webinar, Bob will focus on:
-Selecting the appropriate metadata to govern
-The business and technical value of a data catalog
-Building the catalog into people’s routines
-Positioning the data catalog for success
-Questions the data catalog can answer
Data Lakehouse, Data Mesh, and Data Fabric (r1) (James Serra)
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. I’ll include use cases so you can see what approach will work best for your big data needs.
This is Part 4 of the GoldenGate series on Data Mesh - a series of webinars helping customers understand how to move off old-fashioned monolithic data integration architectures and get ready for more agile, cost-effective, event-driven solutions. The Data Mesh is a kind of Data Fabric that emphasizes business-led data products running on event-driven streaming architectures and serverless, microservices-based platforms. These emerging solutions are essential for enterprises that run data-driven services on multi-cloud, multi-vendor ecosystems.
Join this session to get a fresh look at Data Mesh; we'll start with core architecture principles (vendor agnostic) and transition into detailed examples of how Oracle's GoldenGate platform is providing capabilities today. We will discuss essential technical characteristics of a Data Mesh solution, and the benefits that business owners can expect by moving IT in this direction. For more background on Data Mesh, Part 1, 2, and 3 are on the GoldenGate YouTube channel: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
Mr. Pollock is an expert technology leader for data platforms, big data, data integration and governance. Jeff has been CTO at California startups and a senior exec at Fortune 100 tech vendors. He is currently Oracle VP of Products and Cloud Services for Data Replication, Streaming Data and Database Migrations. While at IBM, he was head of all Information Integration, Replication and Governance products, and previously Jeff was an independent architect for US Defense Department, VP of Technology at Cerebra and CTO of Modulant – he has been engineering artificial intelligence based data platforms since 2001. As a business consultant, Mr. Pollock was a Head Architect at Ernst & Young’s Center for Technology Enablement. Jeff is also the author of “Semantic Web for Dummies” and "Adaptive Information,” a frequent keynote at industry conferences, author for books and industry journals, formerly a contributing member of W3C and OASIS, and an engineering instructor with UC Berkeley’s Extension for object-oriented systems, software development process and enterprise architecture.
Modernizing to a Cloud Data Architecture (Databricks)
Organizations with on-premises Hadoop infrastructure are bogged down by system complexity, unscalable infrastructure, and the increasing burden on DevOps to manage legacy architectures. Costs and resource utilization continue to go up while innovation has flatlined. In this session, you will learn why, now more than ever, enterprises are looking for cloud alternatives to Hadoop and are migrating off of the architecture in large numbers. You will also learn how the benefits of elastic compute models helped one customer scale their analytics and AI workloads, along with best practices from their successful migration of data and workloads to the cloud.
In this session, Sergio covered the Lakehouse concept and how companies implement it, from data ingestion to insight. He showed how you can use Azure Data Services to speed up your analytics project, from ingestion and modelling to delivering insights to end users.
Why an AI-Powered Data Catalog Tool is Critical to Business Success (Informatica)
Imagine a fast, more efficient business thriving on trusted data-driven decisions. An intelligent data catalog can help your organization discover, organize, and inventory all data assets across the org and democratize data with the right balance of governance and flexibility. Informatica's data catalog tools are powered by AI and can automate tedious data management tasks and offer immediate recommendations based on derived business intelligence. We offer data catalog workshops globally. Visit Informatica.com to attend one near you.
Business Intelligence & Data Analytics – An Architected Approach (DATAVERSITY)
Business intelligence (BI) and data analytics are increasing in popularity as more organizations are looking to become more data-driven. Many tools have powerful visualization techniques that can create dynamic displays of critical information. To ensure that the data displayed on these visualizations is accurate and timely, a strong Data Architecture is needed. Join this webinar to understand how to create a robust Data Architecture for BI and data analytics that takes both business and technology needs into consideration.
Slides: Taking an Active Approach to Data Governance (DATAVERSITY)
A Look at How Riot Games Implemented Non-Invasive Data Governance
Riot Games created and runs “League of Legends,” the world’s most-played PC game and most viewed eSport — and is now transforming to become a multi-title publisher. To keep pace with this transformation and support a growing player base of millions, Riot Games is taking a page from Bob Seiner’s book, “Non-Invasive Data Governance: The Path of Least Resistance and Greatest Success” and leveraging the Alation Data Catalog to help guide accurate, well-governed analysis.
Bob Seiner will join Riot Games’ Chris Kudelka, Technical Product Manager, and Michael Leslie, Senior Data Governance Architect, and Alation’s John Wills, VP of Professional Service, for an inside look at Data Governance at one of the world’s leading gaming companies.
Join this webinar to learn:
• How Riot Games is implementing Non-Invasive Data Governance
• How this new approach to Data Governance helps to drive the business
• How the Alation Data Catalog helps Riot Games create the foundation for guiding accurate, well-governed data use
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit... (Amazon Web Services)
Struggling to keep up with an ever-increasing demand for data at your organisation? Do you spend hours tinkering with your streaming data pipelines? Does that one data scientist with direct EDW access keep you up at night? Introducing Snowflake, a brand new SQL data warehouse built for the cloud. We’ve designed and implemented a unique cloud-based architecture that addresses the most common shortcomings of existing data solutions. With Snowflake, you can unlock unlimited concurrency, enable instant scalability, and take advantage of built-in tuning and optimisation. Join us and find out what Netflix, Adobe, and Nike all have in common.
Data Catalog in Denodo Platform 7.0: Creating a Data Marketplace with Data Vi... (Denodo)
Watch Alberto's session from Fast Data Strategy on-demand here: https://buff.ly/2wByS41
Gartner’s recently published report “Data Catalogs Are the New Black in Data Management and Analytics” emphasizes the importance of data catalogs.
Watch this session to learn more about:
• The vision behind the Denodo Data Catalog
• How to maximize information value with the Denodo Data Catalog
• Why it is essential to combine data delivery with a data catalog
Data-Ed Online: Approaching Data Quality (DATAVERSITY)
Good data is like good water: best served fresh, and ideally well-filtered. Data Management strategies can produce tremendous procedural improvements and increased profit margins across the board, but only if the data being managed is of high quality. Organizations must understand what it means to apply Data Quality engineering in support of business strategy. This webinar will illustrate how organizations with chronic business challenges can often trace the root of the problem to poor Data Quality. Showing how Data Quality should be engineered provides a useful framework for developing an effective approach. This, in turn, allows organizations to more quickly identify business problems, distinguish data problems caused by structural issues from practice-oriented defects, and prevent these issues from recurring.
Learning Objectives:
Help you understand foundational Data Quality concepts based on the DAMA Guide to Data Management Book of Knowledge (DAMA DMBoK), as well as guiding principles, best practices, and steps for improving Data Quality at your organization
Demonstrate how chronic business challenges for organizations are often rooted in poor Data Quality
Share case studies illustrating the hallmarks and benefits of Data Quality success
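To make "engineering" Data Quality concrete, here is a sketch of the kind of rule-based profiling such a program might start with (the field names, rules, and sample records are invented examples, not DMBoK prescriptions):

```python
import re

# Illustrative Data Quality rules; fields and thresholds are invented.
# Each rule returns truthy when the value passes the check.
RULES = {
    "email": lambda v: v is not None and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v),
    "age":   lambda v: v is not None and 0 <= v <= 120,
}

def profile(records):
    """Count rule failures per field: the raw material for a DQ scorecard
    that separates structural defects from practice-oriented ones."""
    failures = {field: 0 for field in RULES}
    for rec in records:
        for field, check in RULES.items():
            if not check(rec.get(field)):
                failures[field] += 1
    return failures

records = [
    {"email": "ana@example.com", "age": 34},
    {"email": "not-an-email",    "age": 34},   # practice defect: bad entry
    {"email": None,              "age": 400},  # structural defect: missing/invalid
]
print(profile(records))  # {'email': 2, 'age': 1}
```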
Data and Application Modernization in the Age of the Cloud (redmondpulver)
Data modernization is key to unlocking the full potential of your IT investments, both on premises and in the cloud. Enterprises and organizations of all sizes rely on their data to power advanced analytics, machine learning, and artificial intelligence.
Yet the path to modernizing legacy data systems for the cloud is full of pitfalls that cost time, money, and resources: high hardware and staffing costs, difficulty moving data and analytical processes to cloud environments, and inadequate support for real-time use cases. These issues delay delivery timelines and increase costs, reducing the return on investment for new, cutting-edge applications.
Watch this webinar in which James Kobielus, TDWI senior research director for data management, explores how enterprises are modernizing their mainframe data and application infrastructures in the cloud to sustain innovation and drive efficiencies. Kobielus will engage John de Saint Phalle, senior product manager at Precisely, in a discussion that addresses the following key questions:
- When should enterprises consider migrating and replicating all their data assets to modern public clouds vs. retaining some on-premises in hybrid deployments?
- How should enterprises modernize their legacy data and application infrastructures to unlock innovation and value in the age of cloud computing?
- What are the key investments that enterprises should make to modernize their data pipelines to deliver better AI/ML applications in the cloud?
- What is the optimal data engineering workflow for building, testing, and operationalizing high-quality modern AI/ML applications in the cloud?
- What role does real-time replication play in migrating data and applications to modern cloud data architectures?
- What challenges do enterprises face in ensuring and maintaining the integrity, fitness, and quality of the data that they migrate to modern clouds?
- What tools and methodologies should enterprise application developers use to refactor and transform legacy data applications that have migrated to modern clouds?
Watch this webinar in full here: https://buff.ly/2MVTKqL
Self-Service BI promises to remove the bottleneck that exists between IT and business users. The truth is, if data is handed over to a wide range of data consumers without proper guardrails in place, it can result in data anarchy.
Attend this session to learn why data virtualization:
• Is a must for implementing the right self-service BI
• Makes self-service BI useful for every business user
• Accelerates any self-service BI initiative
Modernize your Infrastructure and Mobilize Your Data (Precisely)
Modernizing your infrastructure can get complicated very fast. The keys to success are breaking down data silos and moving data to the cloud in real time. But building data pipelines to mobilize your data in the cloud can be time-consuming. You need solutions that reduce bandwidth consumption, ensure data consistency, and enable data migration and replication in real time; solutions that help you build data pipelines in hours, not days.
Watch this on-demand webinar to learn about the trends and pitfalls related to modernizing your infrastructure to cloud, how the pace of on-prem data growth demands accelerating data streaming to analytics platforms, and why mobilizing your data for the cloud improves business outcomes.
Bridging the Last Mile: Getting Data to the People Who Need It (APAC) (Denodo)
Watch full webinar here: https://bit.ly/34iCruM
Many organizations are embarking on strategically important journeys to embrace data and analytics. The goal can be to improve internal efficiencies, improve the customer experience, drive new business models and revenue streams, or – in the public sector – provide better services. All of these goals require empowering employees to act on data and analytics and to make data-driven decisions. However, getting data – the right data at the right time – to these employees is a huge challenge, and traditional technologies and data architectures are simply not up to the task. This webinar will look at how organizations are using Data Virtualization to quickly and efficiently get data to the people who need it.
Attend this session to learn:
- The challenges organizations face when trying to get data to the business users in a timely manner
- How Data Virtualization can accelerate time-to-value for an organization’s data assets
- Examples of leading companies that used data virtualization to get the right data to the users at the right time
Bridging the Gap: Analyzing Data in and Below the Cloud (Inside Analysis)
The Briefing Room with Dean Abbott and Tableau Software
Live Webcast July 23, 2013
http://www.insideanalysis.com
Today’s desire for analytics extends well beyond the traditional domain of Business Intelligence. That’s partly because business users are realizing the value of mixing and matching all kinds of data, from all kinds of sources. One emerging market driver is Cloud-based data, and the desire companies have to analyze this data cohesively with their on-premise data sets.
Register for this episode of The Briefing Room to learn from Analyst Dean Abbott, who will explain how the ability to access data in the cloud can play a critical role for generating business value from analytics. He’ll be briefed by Ellie Fields of Tableau Software who will tout Tableau’s latest release, which includes native connectors to cloud-based applications like Salesforce.com, Amazon Redshift, Google Analytics and BigQuery. She’ll also demonstrate how Tableau can combine cloud data with other data sources, including spreadsheets, databases, cubes and even Big Data.
When and How Data Lakes Fit into a Modern Data Architecture (DATAVERSITY)
Whether to take data ingestion cycles off the ETL tool and the data warehouse or to facilitate competitive Data Science and building algorithms in the organization, the data lake – a place for unmodeled and vast data – will be provisioned widely in 2020.
Though it doesn’t have to be complicated, the data lake has a few key design points that are critical, and it does need to follow some principles for success. Build the data lake, not the data swamp! The tool ecosystem is building up around the data lake, and soon many organizations will have both a robust lake and a data warehouse. We will discuss policies to keep them straight, send data to its best platform, and keep users’ confidence high in their data platforms.
Data lakes will be built in cloud object storage. We’ll discuss the options there as well.
Get this data point for your data lake journey.
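As a hedged illustration of such a routing policy, the sketch below encodes the lake-vs-warehouse decision as a plain function (the criteria are invented examples of the kind of rules a real policy would codify, not a standard):

```python
def best_platform(dataset: dict) -> str:
    """Toy routing policy: decide whether a dataset belongs in the
    data lake or the data warehouse. Criteria are illustrative only."""
    if not dataset["modeled"]:
        return "data lake"        # unmodeled, vast data lands in the lake
    if dataset["consumers"] == "data scientists":
        return "data lake"        # exploration favors raw, flexible storage
    return "data warehouse"       # governed, modeled data for BI reporting

print(best_platform({"modeled": False, "consumers": "data scientists"}))  # data lake
print(best_platform({"modeled": True,  "consumers": "analysts"}))         # data warehouse
```

Writing the policy down, even this simply, is what keeps the two platforms straight and keeps users confident about where to look for data.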
Simplifying Building Automation: Leveraging Semantic Tagging with a New Breed... (Memoori)
Memoori's 10th Webinar in the 2019 Smart Buildings Series. We spoke with Chris Irwin, VP Sales EMEA & Asia at J2 Innovations about the FIN 5 software framework and “Simplifying Building Automation by Leveraging Semantic Tagging with a New Breed of Software”.
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo... (Data Con LA)
Ravi Pillala, Chief Data Architect & Distinguished Engineer at Intuit
TurboTax is one of the best-known consumer software brands, serving 385K+ concurrent users at its peak. In this session, we start by looking at how user behavioral data and tax domain events are captured in real time on the event bus and analyzed to drive real-time personalization in various TurboTax data pipelines. We will also look at solutions performing analytics on these events with the help of Kafka, Apache Flink, Apache Beam, Spark, Amazon S3, Amazon EMR, Redshift, Athena, and AWS Lambda functions. Finally, we look at how SageMaker is used to create the TurboTax model that predicts whether a customer is at risk or needs help.
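The capture-and-analyze pattern in that pipeline can be sketched in miniature. This toy uses an in-process queue as a stand-in for the Kafka/Flink stack the session describes, and the event fields and "at risk" rule are invented:

```python
from queue import Queue
from collections import Counter

# Toy stand-in for an event bus; the real pipeline uses Kafka topics
# consumed by Flink/Beam jobs. Event fields here are invented.
bus = Queue()
for event in [
    {"user": "u1", "action": "opened_form"},
    {"user": "u1", "action": "error_shown"},
    {"user": "u2", "action": "opened_form"},
    {"user": "u1", "action": "error_shown"},
]:
    bus.put(event)
bus.put(None)  # end-of-stream marker for this sketch

# "Streaming" analytics: count error events per user as they arrive.
errors = Counter()
while (event := bus.get()) is not None:
    if event["action"] == "error_shown":
        errors[event["user"]] += 1

# A downstream model might flag users with repeated errors as needing help.
at_risk = [user for user, n in errors.items() if n >= 2]
print(at_risk)  # ['u1']
```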
Horses for Courses: Database Roundtable (Eric Kavanagh)
The blessing and curse of today's database market? So many choices! While relational databases still dominate the day-to-day business, a host of alternatives has evolved around very specific use cases: graph, document, NoSQL, hybrid (HTAP), column store, the list goes on. And the database tools market is teeming with activity as well. Register for this special Research Webcast to hear Dr. Robin Bloor share his early findings about the evolving database market. He'll be joined by Steve Sarsfield of HPE Vertica, and Robert Reeves of Datical in a roundtable discussion with Bloor Group CEO Eric Kavanagh. Send any questions to info@insideanalysis.com, or tweet with #DBSurvival.
How the world of data analytics, science, and insights is failing, and how the principles of Agile, DevOps, and Lean are the way forward. #DataOps. Given at DevOps Enterprise Summit 2019.
Architecting for Big Data: Trends, Tips, and Deployment Options (Caserta)
Joe Caserta, President at Caserta Concepts addressed the challenges of Business Intelligence in the Big Data world at the Third Annual Great Lakes BI Summit in Detroit, MI on Thursday, March 26. His talk "Architecting for Big Data: Trends, Tips and Deployment Options," focused on how to supplement your data warehousing and business intelligence environments with big data technologies.
For more information on this presentation or the services offered by Caserta Concepts, visit our website: http://casertaconcepts.com/.
Watch here: https://bit.ly/3i2iJbu
You will often hear that "data is the new gold". In this context, data management is one of the areas that has received the most attention from the software community in recent years. From Artificial Intelligence and Machine Learning to new ways to store and process data, the landscape for data management is in constant evolution. From the privileged perspective of an enterprise middleware platform, we at Denodo have the advantage of seeing many of these changes happen.
Join us for an exciting session that will cover:
- The most interesting trends in data management.
- Our predictions on how those trends will change the data management world.
- How these trends are shaping the future of data virtualization and our own software.
Watch full webinar here: https://buff.ly/2mHGaLA
Having started as the most agile, real-time enterprise data fabric, data virtualization is proving to go beyond its initial promise and is becoming one of the most important enterprise big data fabrics.
Attend this session to learn:
• What data virtualization really is
• How it differs from other enterprise data integration technologies
• Why data virtualization is finding enterprise-wide deployment inside some of the largest organizations
It is a fascinating, explosive time for enterprise analytics.
It is from the position of analytics leadership that the mission will be executed and company leadership will emerge. The data professional sits squarely on the performance of the company in this information economy and has an obligation to demonstrate the possibilities and originate the architecture, data, and projects that will deliver analytics. After all, no matter what business you’re in, you’re in the business of analytics.
The coming years will be full of big changes in enterprise analytics and Data Architecture. William will kick off the fourth year of the Advanced Analytics series with a discussion of the trends winning organizations should build into their plans, expectations, vision, and awareness now.
Watch full webinar here: https://bit.ly/3mdj9i7
You will often hear that "data is the new gold". In this context, data management is one of the areas that has received the most attention from the software community in recent years. From Artificial Intelligence and Machine Learning to new ways to store and process data, the landscape for data management is in constant evolution. From the privileged perspective of an enterprise middleware platform, we at Denodo have the advantage of seeing many of these changes happen.
In this webinar, we will discuss the technology trends that will drive the enterprise data strategies in the years to come. Don't miss it if you want to keep yourself informed about how to convert your data to strategic assets in order to complete the data-driven transformation in your company.
Watch this on-demand webinar as we cover:
- The most interesting trends in data management
- How to build a data fabric architecture
- How to manage your data integration strategy in the new hybrid world
- Our predictions on how those trends will change the data management world
- How companies can monetize data through data-as-a-service infrastructure
- The role of voice computing in future data analytics
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le... (DATAVERSITY)
Organizations today need a broad set of enterprise data cloud services with key data functionality to modernize applications and utilize machine learning. They need a comprehensive platform designed to address multi-faceted needs by offering multi-function data management and analytics to solve the enterprise’s most pressing data and analytic challenges in a streamlined fashion.
In this research-based session, I’ll discuss what the components are in multiple modern enterprise analytics stacks (i.e., dedicated compute, storage, data integration, streaming, etc.) and focus on total cost of ownership.
A complete machine learning infrastructure cost for the first modern use case at a midsize to large enterprise will be anywhere from $3 million to $22 million. Get this data point as you take the next steps on your journey into the highest spend and return item for most companies in the next several years.
Data at the Speed of Business with Data Mastering and Governance (DATAVERSITY)
Do you ever wonder how data-driven organizations fuel analytics, improve customer experience, and accelerate business productivity? They succeed by governing and mastering data effectively so they can get trusted data to those who need it faster. Efficient data discovery, mastering, and democratization are critical for swiftly linking accurate data with business consumers. When business teams can quickly and easily locate, interpret, trust, and apply data assets to support sound business judgment, it takes less time to see value.
Join data mastering and data governance experts from Informatica—plus a real-world organization empowering trusted data for analytics—for a lively panel discussion. You’ll hear more about how a single cloud-native approach can help global businesses in any economy create more value—faster, more reliably, and with more confidence—by making data management and governance easier to implement.
What is data literacy? Which organizations, and which workers in those organizations, need to be data-literate? There are seemingly hundreds of definitions of data literacy, along with almost as many opinions about how to achieve it.
In a broader perspective, companies must consider whether data literacy is an isolated goal or one component of a broader learning strategy to address skill deficits. How does data literacy compare to other types of skills or “literacy” such as business acumen?
This session will position data literacy in the context of other worker skills as a framework for understanding how and where it fits and how to advocate for its importance.
Building a Data Strategy – Practical Steps for Aligning with Business Goals (DATAVERSITY)
Developing a Data Strategy for your organization can seem like a daunting task – but it’s worth the effort. Getting your Data Strategy right can provide significant value, as data drives many of the key initiatives in today’s marketplace – from digital transformation, to marketing, to customer centricity, to population health, and more. This webinar will help demystify Data Strategy and its relationship to Data Architecture and will provide concrete, practical ways to get started.
Uncover how your business can save money and find new revenue streams.
Driving profitability is a top priority for companies globally, especially in uncertain economic times. It's imperative that companies reimagine growth strategies and improve process efficiencies to help cut costs and drive revenue – but how?
By leveraging data-driven strategies layered with artificial intelligence, companies can achieve untapped potential and help their businesses save money and drive profitability.
In this webinar, you'll learn:
- How your company can leverage data and AI to reduce spending and costs
- Ways you can monetize data and AI and uncover new growth strategies
- How different companies have implemented these strategies to achieve cost optimization benefits
Because every organization produces and propagates data as part of its day-to-day operations, data trends are becoming more and more prominent in the mainstream business world’s consciousness. For many organizations in various industries, though, comprehension of this development begins and ends with buzzwords: “Big Data,” “NoSQL,” “Data Scientist,” and so on. Few realize that all solutions to their business problems, regardless of platform or relevant technology, rely to a critical extent on the data model supporting them. As such, data modeling is not an optional task for an organization’s data effort, but rather a vital activity that facilitates the solutions driving your business. Since quality engineering and architecture work products do not happen accidentally, the more your organization depends on automation, the more important the data models driving its engineering and architecture activities become. This webinar illustrates data modeling as a key activity upon which so much technology and business investment depends.
Specific learning objectives include:
- Understanding what types of challenges require data modeling to be part of the solution
- How automation requires standards derivable via data modeling techniques
- Why only a working partnership between data and the business can produce useful outcomes
Analytics play a critical role in supporting strategic business initiatives. Despite the obvious value to analytic professionals of providing the analytics for these initiatives, many executives question the economic return of analytics as well as data lakes, machine learning, master data management, and the like.
Technology professionals need to calculate and present business value in terms business executives can understand. Unfortunately, most IT professionals lack the knowledge required to develop comprehensive cost-benefit analyses and return on investment (ROI) measurements.
This session provides a framework to help technology professionals research, measure, and present the economic value of a proposed or existing analytics initiative, no matter what form the business benefit takes. The session will provide practical advice on how to calculate ROI, the formulas involved, and how to collect the necessary information.
How a Semantic Layer Makes Data Mesh Work at Scale – DATAVERSITY
Data Mesh is a trending approach to building a decentralized data architecture by leveraging a domain-oriented, self-service design. However, the pure definition of Data Mesh lacks a center of excellence or central data team and doesn’t address the need for a common approach for sharing data products across teams. The semantic layer is emerging as a key component to supporting a Hub and Spoke style of organizing data teams by introducing data model sharing, collaboration, and distributed ownership controls.
This session will explain how data teams can define common models and definitions with a semantic layer to decentralize analytics product creation using a Hub and Spoke architecture.
Attend this session to learn about:
- The role of a Data Mesh in the modern cloud architecture.
- How a semantic layer can serve as the binding agent to support decentralization.
- How to drive self service with consistency and control.
Enterprise data literacy. A worthy objective? Certainly! A realistic goal? That remains to be seen. As companies consider investing in data literacy education, questions arise about its value and purpose. While the destination – having a data-fluent workforce – is attractive, we wonder how (and if) we can get there.
Kicking off this webinar series, we begin with a panel discussion to explore the landscape of literacy, including expert positions and results from focus groups:
- why it matters,
- what it means,
- what gets in the way,
- who needs it (and how much they need),
- what companies believe it will accomplish.
In this engaging discussion about literacy, we will set the stage for future webinars to answer specific questions and feature successful literacy efforts.
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re... – DATAVERSITY
Change is hard, especially in response to negative stimuli, or what is perceived as negative stimuli. Organizations need to reframe how they think about data privacy, security, and governance, treating them as value centers to 1) ensure enterprise data can flow where it needs to, 2) prevent internal and external threats rather than just react to them, and 3) comply with data privacy and security regulations.
Working together, these roles can accelerate faster access to approved, relevant and higher quality data – and that means more successful use cases, faster speed to insights, and better business outcomes. However, both new information and tools are required to make the shift from defense to offense, reducing data drama while increasing its value.
Join us for this panel discussion with experts in these fields as they discuss:
- Recent research about where data privacy, security and governance stand
- The most valuable enterprise data use cases
- The common obstacles to data value creation
- New approaches to data privacy, security and governance
- Their advice on how to shift from a reactive to resilient mindset/culture/organization
You’ll be educated, entertained and inspired by this panel and their expertise in using the data trifecta to innovate more often, operate more efficiently, and differentiate more strategically.
Emerging Trends in Data Architecture – What’s the Next Big Thing? – DATAVERSITY
With technological innovation and change occurring at an ever-increasing rate, it’s hard to keep track of what’s hype and what can provide practical value for your organization. Join this webinar to see the results of a recent DATAVERSITY survey on emerging trends in Data Architecture, along with practical commentary and advice from industry expert Donna Burbank.
Data Governance Trends - A Look Backwards and Forwards – DATAVERSITY
As DATAVERSITY’s RWDG series hurtles into its 12th year, this webinar takes a quick look behind us, evaluates the present, and predicts the future of Data Governance. Based on webinar numbers, hot Data Governance topics have evolved over the years from policies and best practices, roles and tools, and data catalogs and frameworks, to supporting data mesh and fabric, artificial intelligence, virtualization, literacy, and metadata governance.
Join Bob Seiner as he reflects on the past and what has and has not worked, while sharing examples of enterprise successes and struggles. In this webinar, Bob will challenge the audience to stay a step ahead by learning from the past and blazing a new trail into the future of Data Governance.
In this webinar, Bob will focus on:
- Data Governance’s past, present, and future
- How trials and tribulations evolve to success
- Leveraging lessons learned to improve productivity
- The great Data Governance tool explosion
- The future of Data Governance
Data Governance Trends and Best Practices To Implement Today – DATAVERSITY
Would you share your bank account information on social media? How about shouting your social security number on the New York City subway? We didn’t think so either – that’s why data governance is consistently top of mind.
In this webinar, we’ll discuss the common Cloud data governance best practices – and how to apply them today. Join us to uncover Google Cloud’s investment in data governance and learn practical and doable methods around key management and confidential computing. Hear real customer experiences and leave with insights that you can share with your team. Let’s get solving.
Topics that you will hear addressed in this webinar:
- Understanding the basics of Cloud Incident Response (IR) and anticipated data governance trends
- Best practices for key management and applying data governance to your day-to-day
- The next wave of Confidential Computing and how to get started, including a demo
It is a fascinating, explosive time for enterprise analytics.
It is from the position of analytics leadership that the enterprise mission will be executed and company leadership will emerge. The data professional is absolutely sitting on the performance of the company in this information economy and has an obligation to demonstrate the possibilities and originate the architecture, data, and projects that will deliver analytics. After all, no matter what business you’re in, you’re in the business of analytics.
The coming years will be full of big changes in enterprise analytics and data architecture. William will kick off the fifth year of the Advanced Analytics series with a discussion of the trends winning organizations should build into their plans, expectations, vision, and awareness now.
Too often I hear the question “Can you help me with our data strategy?” Unfortunately, for most, this is the wrong request because it focuses on the least valuable component: the data strategy itself. A more useful request is: “Can you help me apply data strategically?” Yes, at early maturity phases the process of developing strategic thinking about data is more important than the actual product! Trying to write a good (much less perfect) data strategy on the first attempt is generally not productive – particularly given the widespread acceptance of Mike Tyson’s truism: “Everybody has a plan until they get punched in the face.” This program refocuses efforts on learning how to iteratively improve the way data is strategically applied. This will permit data-based strategy components to keep up with agile, evolving organizational strategies. It also contributes to three primary organizational data goals. Learn how to improve the following:
- Your organization’s data
- The way your people use data
- The way your people use data to achieve your organizational strategy
This will help in ways never imagined. Data are your sole non-depletable, non-degradable, durable strategic assets, and they are pervasively shared across every organizational area. Addressing existing challenges programmatically includes overcoming necessary but insufficient prerequisites and developing a disciplined, repeatable means of improving business objectives. This process (based on the theory of constraints) is where the strategic data work really occurs as organizations identify prioritized areas where better assets, literacy, and support (data strategy components) can help an organization better achieve specific strategic objectives. Then the process becomes lather, rinse, and repeat. Several complementary concepts are also covered, including:
- A cohesive argument for why data strategy is necessary for effective data governance
- An overview of prerequisites for effective strategic use of data strategy, as well as common pitfalls
- A repeatable process for identifying and removing data constraints
- The importance of balancing business operation and innovation
Who Should Own Data Governance – IT or Business? – DATAVERSITY
The question is asked all the time: “What part of the organization should own your Data Governance program?” The typical answers are “the business” and “IT (information technology).” Another answer to that question is “Yes.” The program must be owned and reside somewhere in the organization. You may ask yourself if there is a correct answer to the question.
Join this new RWDG webinar with Bob Seiner where Bob will answer the question that is the title of this webinar. Determining ownership of Data Governance is a vital first step. Figuring out the appropriate part of the organization to manage the program is an important second step. This webinar will help you address these questions and more.
In this session Bob will share:
- What is meant by “the business” when it comes to owning Data Governance
- Why some people say that Data Governance in IT is destined to fail
- Examples of IT positioned Data Governance success
- Considerations for answering the question in your organization
- The final answer to the question of who should own Data Governance
It is clear that Data Management best practices exist, and so does a useful process for improving existing Data Management practices. The question arises: since we understand the goal, how does one design a process for achieving Data Management goals? This program describes what must be done at the programmatic level to achieve better data use and a way to implement this as part of your data program. The approach combines DMBoK content and CMMI/DMM processes – providing organizations with the opportunity to benefit from the best of both. It also permits organizations to understand:
- Their current Data Management practices
- Strengths that should be leveraged
- Remediation opportunities
MLOps – Applying DevOps to Competitive Advantage – DATAVERSITY
MLOps is a practice for collaboration between Data Science and operations to manage production machine learning (ML) lifecycles. As an amalgamation of “machine learning” and “operations,” MLOps applies DevOps principles to ML delivery, enabling the delivery of ML-based innovation at scale to result in:
- Faster time to market of ML-based solutions
- More rapid rate of experimentation, driving innovation
- Assurance of quality, trustworthiness, and ethical AI
MLOps is essential for scaling ML. Without it, enterprises risk struggling with costly overhead and stalled progress. Several vendors have emerged with offerings to support MLOps: the major offerings are Microsoft Azure ML and Google Vertex AI. We looked at these offerings from the perspective of enterprise features and time-to-value.
Keeping the Pulse of Your Data – Why You Need Data Observability to Improve D... – DATAVERSITY
With the explosive growth of DataOps to drive faster and more confident business decisions, proactively understanding the quality and health of your data is more important than ever. Data observability is an emerging discipline within data quality used to expose anomalies in data by continuously monitoring and testing data using artificial intelligence and machine learning to trigger alerts when issues are discovered.
Join Julie Skeen and Shalaish Koul from Precisely to learn how data observability can be used as part of a DataOps strategy to improve data quality and reliability, prevent data issues from wreaking havoc on your analytics, and ensure that your organization can confidently rely on the data used for advanced analytics and business intelligence.
Topics you will hear addressed in this webinar:
- Data observability – what it is and how it can complement your data quality strategy
- Why now is the time to incorporate data observability into your DataOps strategy
- How data observability helps prevent data issues from impacting downstream analytics
- Examples of how data observability can be used to prevent real-world issues
Empowering the Data Driven Business with Modern Business Intelligence – DATAVERSITY
By consolidating data engineering, data warehouse, and data science capabilities under a single fully-managed platform, BigQuery can accelerate computation, reduce data analysis costs, and streamline data management.
Following in-depth interviews with a security services provider and a telecommunications company, Nucleus Research found that customers moving to Google Cloud BigQuery from on-premises data warehouse solutions accelerate data processing by over 75 percent while reducing ongoing administrative expenses by over 25 percent.
As BigQuery continues to optimize its platform architecture for compute efficiency and multicloud support, Nucleus expects the vendor to see rapid adoption and further penetrate the data warehouse market.
Adjusting OpenMP PageRank : SHORT REPORT / NOTES – Subhajit Sahu
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take advantage of a shared memory system with multiple CPUs, each with multiple cores, to accelerate PageRank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. To take steps in this direction, experiments are conducted to implement PageRank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for PageRank in OpenMP mode (with multiple threads). On the other hand, the hybrid approach runs certain primitives in sequential mode (i.e., sumAt, multiply).
Adjusting primitives for graph : SHORT REPORT / NOTES – Subhajit Sahu
Graph algorithms like PageRank often operate on Compressed Sparse Row (CSR), an adjacency-list based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
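The sequential vs. OpenMP "modes" compared for the element-sum (reduce) primitive can be sketched like this. This is an illustrative sketch, not the report's code; accumulating float inputs into a double mirrors the notes' concern with storage type (float vs. bfloat16) versus accumulation accuracy.

```cpp
#include <vector>

// Sequential mode: plain single-threaded accumulation.
double sumSeq(const std::vector<float>& x) {
  double a = 0.0;
  for (float v : x) a += v;
  return a;
}

// OpenMP mode: each thread keeps a private partial sum, and the
// reduction clause combines the partials at the end of the loop.
double sumOmp(const std::vector<float>& x) {
  double a = 0.0;
  long n = (long)x.size();
  #pragma omp parallel for reduction(+ : a)
  for (long i = 0; i < n; ++i) a += x[i];
  return a;
}
```

For small vectors the sequential mode typically wins (no thread startup cost), which is one motivation for the hybrid approach in the companion report.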
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... – pchutichetpong
M Capital Group (“MCG”) expects demand to grow and supply to evolve, facilitated by institutional investment rotating out of offices and into work from home (“WFH”), alongside the ever-expanding need for data storage as global internet usage grows, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain – represented by the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services” – the industry has seen key adjustments, and MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment, will drive market momentum forward. The continuous injection of capital by alternative investment firms, as well as growing infrastructural investment from cloud service providers and social media companies – whose revenues are expected to grow over 3.6x larger by value in 2026 – will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will present on related topics such as vector databases, LLMs, and managing data at scale. The intended audience includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... – Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
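The control flow the abstract describes can be sketched as below. This is a hedged illustration under stated assumptions, not the paper's implementation: the SCC decomposition into topologically ordered levels is assumed to have been computed elsewhere, the graph has no dead ends (as the abstract requires), and all names (in-edge CSR offsets/sources, outdeg) are illustrative.

```cpp
#include <cmath>
#include <vector>

// Ranks in a level depend only on that level and earlier ones, so each
// level can be iterated to convergence on its own, with no per-iteration
// communication across levels.
void levelwisePagerank(const std::vector<int>& offsets,   // in-edge CSR
                       const std::vector<int>& sources,
                       const std::vector<int>& outdeg,
                       const std::vector<std::vector<int>>& levels,
                       std::vector<double>& rank,
                       double damping = 0.85, double tol = 1e-12) {
  int n = (int)offsets.size() - 1;
  for (const auto& level : levels) {   // topological order of SCC levels
    double delta;
    do {                               // iterate only this level's vertices
      delta = 0.0;
      for (int v : level) {
        double s = 0.0;
        for (int i = offsets[v]; i < offsets[v + 1]; ++i)
          s += rank[sources[i]] / outdeg[sources[i]];
        double r = (1.0 - damping) / n + damping * s;
        delta += std::fabs(r - rank[v]);
        rank[v] = r;
      }
    } while (delta > tol);
  }
}
```

Monolithic PageRank, by contrast, would run the inner loop over all n vertices in every iteration.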
2. Copyright 2021 by DataKitchen, Inc. All Rights Reserved.
Agenda
Four Data Architecture Mega-Patterns for Agility
1. DataOps
2. Data Fabric
3. Data Mesh
4. Functional Data Engineering
An Example that Combines all Four Patterns
Conclusion and More Information
DataOps, Data Fabric, Data Mesh, Functional Data Engineering
3. Copyright 2021 by DataKitchen, Inc. All Rights Reserved.
Our Focus Is The River Of Work Right In Front Of Us
• The Model,
• The Algorithm,
• The Data Pipeline,
• The Data Visualization,
• The Governance,
• The Data Itself
What is my next task?
4. Copyright 2021 by DataKitchen, Inc. All Rights Reserved.
Next Task Focus Is Making Us Blind To Failure
• The Model,
• The Algorithm,
• The Data Pipeline,
• The Data Visualization,
• The Governance,
• The Data Itself
Task Focus Not Working
5. Copyright 2021 by DataKitchen, Inc. All Rights Reserved.
Look Upstream At The Source Of The Problem
• Develop
• Deploy
• Iterate
• Monitor
• Test
• Collaborate
How You Do It
6. Copyright 2021 by DataKitchen, Inc. All Rights Reserved.
How? Focus On Four Key Upstream Processes
Decrease The Cycle Time: Continuously Deploy Innovation
Lower Error Rates: Increasing Customer Data Trust
Improve Collaboration: Less Meetings & Bureaucracy
Measure Your Team: And Show Everyone Your Success
7. Copyright 2021 by DataKitchen, Inc. All Rights Reserved.
DataOps Aligns People, Processes, and Technology
Rapid experimentation and innovation enables faster delivery
Low error rates
Collaboration across complex sets of people, technology, and environments
Clear measurement and monitoring of results
8. Copyright 2021 by DataKitchen, Inc. All Rights Reserved.
Agenda
What Problems Do We Need To Solve With Architecture for AI and Data Analytics?
Four Data Architecture Mega-Patterns for Agility
1. DataOps
2. Data Fabric
3. Data Mesh
4. Functional Data Engineering
An Example that Combines all Four Patterns
Conclusion and More Information
10. Copyright 2021 by DataKitchen, Inc. All Rights Reserved.
Gartner Data Fabric
“Data fabric focuses on composability, allowing users to build a flexible, agile, scalable architecture that will be able to supply data to humans or machine users. Data fabric is a design concept, not just a set of technology components.”
11. Copyright 2021 by DataKitchen, Inc. All Rights Reserved.
Data Fabric Toolchain Elements
[Diagram] Store; Transform (SQL Code, ETL); Govern (Catalog)
12. Copyright 2021 by DataKitchen, Inc. All Rights Reserved.
Data Fabric Toolchain Elements
[Diagram] Store; Transform (SQL Code, ETL); Virtualize (layer); Govern (Catalog)
Includes Data Virtualization in Reference Fabric Design
Includes Data Streaming in Reference Fabric Design
13. Copyright 2021 by DataKitchen, Inc. All Rights Reserved.
Data Fabric: Beware Magic of ‘AI Inside’
[Diagram] Store; Transform (SQL Code, ETL); Virtualize (layer); Govern (Catalog) – each labeled with ‘AI’
Magic AI: Danger Will Robinson
14. Copyright 2021 by DataKitchen, Inc. All Rights Reserved.
Data Fabric: Beware Magic of ‘AI Inside’
Think of ‘AI Inside’ of Data Fabric like autonomous driving:
• Level 1: Simple, keep your hands on wheel
• Level 5: Cross Boston, in the snow, at night
We are at Level 1 of AI in the Data Fabric
15. Copyright 2021 by DataKitchen, Inc. All Rights Reserved.
AI + New Tools Agility
16. Copyright 2021 by DataKitchen, Inc. All Rights Reserved.
People & Tools in a DataOps Architecture
Agility
AI + New Tools
17. Copyright 2021 by DataKitchen, Inc. All Rights Reserved.
Canonical ‘Factory’ Data Architecture / Fabric
18. Copyright 2021 by DataKitchen, Inc. All Rights Reserved.
DataOps Functional Architecture
[Diagram] Within a Cloud/On-Prem data center, Production, Test, and Dev environments carry Source Data through Raw Lake → Data Engineering → Refined Data → Data Science / Data Viz. → Data Customers, with Data Governance alongside; each environment is orchestrated, monitored, and tested.
The DataOps Platform underneath provides: Storage & Version Control; History & Metadata; Auth & Permissions; Environment Secrets; DataOps Metrics & Reports; Automated Deployment; Environment Creation and Management. It connects the DataOps Team and a second Cloud/On-Prem Data Center.
19. Copyright 2021 by DataKitchen, Inc. All Rights Reserved.
DataOps Physical Architecture
[Diagram] A Cloud/On-Prem Data Center hosts Production, Test, and Dev environments, each with an Agent, moving Source Data through Raw Lake → Data Engineering → Refined Data → Data Science / Data Viz. → Data Customers, with Data Governance alongside. The DataOps Platform (Storage, Metadata, Auth, Secrets, Metrics) coordinates these Agents, the DataOps Team, and an Agent in a second Cloud/On-Prem Data Center.
20. Copyright 2021 by DataKitchen, Inc. All Rights Reserved.
DataOps Spans Environments
[Diagram] Cloud/On-Prem #1 runs Production, Test, and Dev environments, with Agents executing a DataOps Pipeline; Cloud/On-Prem #2 runs Production and Dev environments, with its own Agents and DataOps Pipeline. Both connect to the DataOps Platform (Storage & Version Control; History & Metadata; Auth & Permissions; Environment Secrets; DataOps Metrics & Reports) and to the DataOps Team.
21. Copyright 2021 by DataKitchen, Inc. All Rights Reserved.
Data Fabric – A New Fashion Trend?
• It's Hot Stuff: Gartner View, Forrester View. Top 10 downloaded report 2020, top inquiry
• What is a data fabric?
  • All the stuff you do with centralized data infrastructure: ETL, DB, governance, store, lake, warehouse, stream/batch transformation.
  • Plus, some fancy new stuff:
    1. AI component – magic pixie dust of self-driving data
    2. Data virtualization/semantic layer
• However, it is missing other parts of the data value chain: models, visualizations, self service. It’s more ‘hub’ than ‘spoke’.
• Why? Moniker that covers the latest trends in data management.
• Caveat: The goal of implementing a data fabric is agility – agility is a second-order effect from better tools. The primary driver is people & process following DataOps.
22. Copyright 2021 by DataKitchen, Inc. All Rights Reserved.
Agenda
Four Data Architecture Mega-Patterns for Agility
1. DataOps
2. Data Fabric
3. Data Mesh
4. Functional Data Engineering
An Example that Combines all Four Patterns
Conclusion and More Information
23. Copyright 2021 by DataKitchen, Inc. All Rights Reserved.
Data Mesh 101
Why Data Mesh?
• Centralized Systems Fail
• Skill-based roles are unable to respond to rapid customer needs
• Data domain knowledge matters
• Universal, one-size-fits-all patterns fail
• General Data Analytic Project Failure
• Inspired by domain-driven design (DDD) in software
The main idea is to take best practices from developing software & apply them to data analytics. (Sound familiar?)
24. Copyright 2021 by DataKitchen, Inc. All Rights Reserved.
The Human Side of a Data Mesh: Main Idea
• The organization structure builds walls & barriers to change
• When you make a change, you need to update each component & coordinate between several different teams
The organization creates walls & changes need to cross the traditional organizational boundaries
25. Copyright 2021 by DataKitchen, Inc. All Rights Reserved.
No, Data Engineers Are Not Perfectly Fungible
Data Mesh = Organization Mesh
The use of domain-driven / data mesh design as the primary means of:
1. Assigning full end-to-end ownership of a domain to one cross-functional team that gets the necessary support to fulfil that responsibility
2. Structuring data
3. Building composable systems
Data Organization Keys
Letting the small team continually own the data set & not move from project to project is key.
‘You own the product’ thinking provides the right incentives between the producers & consumers.
Source: thoughtworks.com/insights/blog/data-mesh-its-not-about-tech-its-about-ownership-and-communication
26. Copyright 2021 by DataKitchen, Inc. All Rights Reserved.
● Take the ideas of microservices, where a team owns the dev, test, deploy & running of the microservice (5-9 people)
● Organize around the domain, not the technology
● The Operational & Data products are created by the same team
● Domain data as a product – domain data teams must consider their data assets & artifacts as their products & others as their customers
● Data Engineers must live, work & understand a finite number of data sets to really add value
The Human Side of a Data Mesh: Main Idea
The organization creates walls & changes need to cross the traditional organizational boundaries
27. Copyright 2021 by DataKitchen, Inc. All Rights Reserved.
What Data is in a Domain?
Domains Aligned with Sources / Types of Data
• ‘Mastered’ Data:
  • Entities of business / subject areas
  • Customers, products, etc.
• ‘Sources’ of Data:
  • Business reality: facts on the ground
  • Weblogs, user interaction history
Domains Aligned with Consumption of Data
• Integrated Data / Ready for Consumption
  • Facts / Dimensions / Star Schemas
  • Aggregated Views
• Product View
  • Never Done, Always Improving
  • Customer Usage Focus
28. Copyright 2021 by DataKitchen, Inc. All Rights Reserved.
What are the Domain’s Components?
1. Data
2. Artifacts created from that data: models, views, reports, dashboards, etc.
3. Code that acts upon that data: pipelines, toolchains, etc.
4. Team used to create/update/run that Domain
5. Metadata: catalogs, lineage, test results, processing history, etc.
[Diagram] Data Domain 1
29. Copyright 2021 by DataKitchen, Inc. All Rights Reserved.
Domain Must Be Composable & Controllable
[Diagram] Data Domain 1, Data Domain 2, and Data Domain 3 composed together
30. Copyright 2021 by DataKitchen, Inc. All Rights Reserved.
Domain Interfaces
[Diagram] A Data Domain with five interfaces:
The Where: how to find & access data securely; e.g., DB connect string
The What: description of the data; e.g., data catalog URL
The When: processing results, timing, test results, status, etc.
The How: steps, code/config, toolchain & processing pipeline
The With: raw data (or other Data Domain), hopefully immutable
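The five interfaces above can be sketched as a plain record. This is a hypothetical illustration: the struct, its field names (mirroring "The Where / What / When / How / With"), and the example.com values are assumptions, though the jdbc connect string echoes the one shown on the next slide.

```cpp
#include <string>

// Hypothetical descriptor for a data domain's five interfaces.
struct DataDomainInterface {
  std::string where;  // how to find & access data securely, e.g. a DB connect string
  std::string what;   // description of the data, e.g. a data catalog URL
  std::string when;   // processing results, timing, test results, status
  std::string how;    // steps, code/config, toolchain & processing pipeline
  std::string with;   // raw input data (or another data domain), hopefully immutable
};

// Build an example descriptor with placeholder values.
DataDomainInterface exampleDomain() {
  return {"jdbc:redshift://endpoint:port/database",
          "https://catalog.example.com/physician-domain",
          "last-run: pending",
          "https://pipelines.example.com/physician-domain",
          "s3://raw/physician/"};
}
```

Representing every interface as a URL-like string is what lets domains inter-operate without sharing infrastructure, as the following slides show.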
31. Copyright 2021 by DataKitchen, Inc. All Rights Reserved.
Domain Interfaces as URLs
https://cloud.datakitchen.io/#/recipes/dc/Production/agile-analytic-ops/variations/prod-env-DevSprint-build-now
https://cloud.datakitchen.io/#/orders/dc/Production/runs/60e82aa8-2518-11eb-8653-c2e92ba8ebec
jdbc:redshift://endpoint:port/database
https://dkimplementation.atlassian.net/wiki/spaces/DC/pages/9306114/Dimension+Tables
Data Domain
32. Copyright 2021 by DataKitchen, Inc. All Rights Reserved.
What Do You Want Out of a Domain?
A series of independent domains of data that are:
1. Trusted
2. Usable by the team’s customer
3. Discoverable / Findable
4. Understandable & well-described
5. Secure & permissioned
6. URL/API driven & able to inter-operate with other domains
7. Have a ‘single throat to choke’ for the customer to easily:
  • Report problems & get updates on fixes
  • Ask for new insights / improvements & get them into production quickly
33. Copyright 2021 by DataKitchen, Inc. All Rights Reserved.
Data Mesh Change in Focus
1. Domains & the grouping of your work into small teams & partitions, over ‘one platform to rule them all’
2. What services you are providing your customer, rather than what data you are loading
3. Discovering & using, over extracting & loading
4. Decentralization & the freedom to innovate, over central control
5. An ecosystem of data products linked together, over a centralized lake / warehouse
An Example of Domains
US Commercial Pharma Domains
• NPP (Non-Personal Promotion): emails, web site visits, even radio ads
• Physician: doctor (& other outlets) sales, claims data, anonymized patient data
• Payer: Payer/Plan, rebates, formulary
Product lifecycle & domains: Launch (NPP Domain) → Growth (Physician Domain) → Mature (Payer Domain), together serving Commercial Pharma Analytics.
What About the Data?
What about the data in each domain?
• Each domain has separate data sources
• Overlapping entities (e.g., physicians) exist in
each domain
• Each domain has different product cycle times (e.g., daily, weekly, hourly)
• Each data domain has its own unique characteristics. For instance, sub-national physician data from IQVIA (purchased by pharma companies) may not match claims data 1:1, which in turn may not match payer data, due to data supplier issues & timing/projection algorithms.
Example raw data sources (from the slide graphic): Sub-national Weekly Data, Sub-national Payer Data, Sub-national Institutional (DDD) Data, National Prescription Audit Data, Sales Force Alignment Data, Longitudinal Patient Data, Sub-national Profit and Loss Data, Sub-national Claims and Co-pay Data, Payer and Plan Formulary Data, Census Data, Stocking Data, Source of Business, AMA Data, Retail OTC Data, Buy and Bill Data, Field Calls and Promotional Activity Data, Rep Expenses and Vacancy Data, Hotline Verification Data, Contract and Payer Rebates Data, Veeva CRM Data, ERP Data, NPP Data, Forecast Data, Primary Research Data.
Pharma Sales & Marketing Teams
NPP Domain Marketing & Sales Team
One part of the pharma brand team focused on ads, digital & other non-personal
promotions. This team matters most pre-launch & during the growth phase of a product
Physician Domain Marketing & Sales Team
Another part of the pharma team focused on in-person sales. Those are the good-looking
people you see in doctors' waiting rooms: sales calls, samples, doctor visits, messages,
call alignments, etc. This team matters the most during the first years of a pharma launch.
Payer Domain Marketing & Sales Team
A third part is focused on Payer Marketing. This part is - in essence - controlling the price
of a pharmaceutical product due to the rebate given to any payer. They are concerned
about the rebate contract, being on formulary & tier & copays. Payer Marketing matters
more during the 'mature' phase of a pharma product lifecycle.
Domain Layers
1. Mastering & small foundational files are a domain layer
There are 1M physicians in the US, but the company master of
physicians is only 40K. This work is done by separate teams working
independently.
2. Of course, the main data warehouse is a domain layer
There are facts & dimensions, along with multiple tables built for specific
analyst needs.
3. Self-Service & Data Science are domain layers
They can keep their own cached data sets (e.g., Tableau extracts) or
have their own small data sets that they mix with the central data in
Alteryx (or other) tools. Data Science teams have their own segmentation
models dependent on specific views or extracts of data.
[Diagram: Mastered Data Sets (IT) → Integrated Data Sets (Data Engineers) → Self-Service Tools (Analyst)]
Domain Layers
[Diagram: the raw, sourced data sets (various) feed two Mastering Domains (Physician MDM; Target Lists, Product Market Baskets), whose Mastered Data Sets (IT) flow into Integrated Data Sets (Data Engineers) and then Self-Service Tools (Analyst) via the Brand Team Reporting Domain and the Field Sales Reporting Domain, reaching the Business Customer.]
Domain Layers Processing Relationships
[Diagram: the same layered picture as the previous slide, with arrows showing the processing relationships between the raw sources, the Mastering Domains, the Reporting Domains, and the Business Customer.]
Domain Layers Processing Steps
[Diagram: the same layered picture, highlighting the processing steps from Raw, Sourced Data (Various) through Mastered Data Sets (IT), Integrated Data Sets (Data Engineers), and Self-Service Tools (Analyst) to the Business Customer, via the Mastering Domain (Physician MDM) and the Brand Team Reporting Domain.]
Benefits of Approach
• Yes, you can do all four of these Data Architecture Mega-Patterns for Agility!
• Benefits
• Support over $10 Billion in sales
• Integrated 100s of data sets
• Very, very few errors or missed SLAs
• > 50,000 automated tests
• > 100 schema/data changes per week
• Staff of seven data and DataOps engineers
• Low total yearly costs (hardware/hosting/software/staffing)
• DataKitchen software enables those four patterns: Recipes, Tests, Kitchens & especially Ingredients can handle all the needs
Agenda
Four Data Architecture Mega-Patterns for Agility
1. DataOps
2. Data Fabric
3. Data Mesh
4. Functional Data Engineering
An Example that Combines all Four Patterns
Conclusion and More Information
Built With Functional Programming
• Start with immutable (never
changing) data
• Pure functions (you put some
data in & get some data out)
• Idempotency (you can run it over
again & get the same thing)
• No side effects
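The four bullets above can be sketched in a few lines of Python. This is a minimal illustration with invented data and function names, not the deck's actual pipeline code:

```python
def load_raw(partition_date):
    """Immutable input: raw rows for one date are written once and never changed."""
    # Stand-in for reading an immutable raw partition (e.g., from object storage)
    return [{"date": partition_date, "physician_id": 1, "sales": 100.0},
            {"date": partition_date, "physician_id": 1, "sales": 50.0}]

def aggregate_sales(rows):
    """Pure function: data in, data out; no globals, no side effects."""
    totals = {}
    for r in rows:
        totals[r["physician_id"]] = totals.get(r["physician_id"], 0.0) + r["sales"]
    return totals

def build_partition(partition_date):
    """Idempotent: running this twice for the same date yields the same result,
    because it depends only on the immutable raw partition."""
    return aggregate_sales(load_raw(partition_date))

# Idempotency check: same date in, same output out
assert build_partition("2021-06-01") == build_partition("2021-06-01")
```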
Functional Approach Benefits
Reproducibility
• Foundational to the scientific method
and data science / AI
• Critical from a legal standpoint and
sanity standpoint
Complexity Reduction
Cloud Native
• Storage and compute are cheap
Faster Time To Value
Functional Data Mesh Systems
[Diagram: a Production Team's Data Domain turns Production Data into results for Analytic Customers. "Yeah! All my tests & monitors are passing!" Happy customers!]
Think of all your data & analytic work as a
"Big Function" in a domain
• In that function are your data & AI toolchain
• Everybody works in that function
(whether they know it or not!)
• Re-running a task for the same date should
always produce the same output
• Data can be repaired by re-running the new code
• A 'big red/green light' on the system telling you
everything is OK
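The "repaired by re-running the new code" idea can be sketched as an overwrite of one date partition. The storage layout and names here are assumptions for illustration only:

```python
# A toy warehouse keyed by (table, date): each run overwrites its own partition,
# so re-running the same date with fixed code repairs the data in place.
warehouse = {}

def run_task(table, partition_date, transform, source_rows):
    """Re-running for the same date replaces (not appends to) that partition."""
    warehouse[(table, partition_date)] = transform(source_rows)

rows = [{"sales": 100.0}, {"sales": 50.0}]

buggy = lambda rs: sum(r["sales"] for r in rs) * 2   # wrong code shipped
fixed = lambda rs: sum(r["sales"] for r in rs)       # corrected code

run_task("sales_daily", "2021-06-01", buggy, rows)   # bad partition written
run_task("sales_daily", "2021-06-01", fixed, rows)   # rerun repairs it in place
```

Because the raw rows are immutable and the transform is pure, the rerun is safe: no partial state from the bad run survives.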
Functional Data Systems Are Easier to Test & Deploy
"Yeah! All my tests & monitors are passing!
I did not break any code!
I can safely push to production!"
A safe, controlled process.
[Diagram: the Production Team's Data Domain runs on Production Data; the Development Team's Data Domain runs on Test Data. To deploy: "Just flip the DNS entry for the production URL!"]
Agenda
Four Data Architecture Mega-Patterns for Agility
1. DataOps
2. Data Fabric
3. Data Mesh
4. Functional Data Engineering
An Example that Combines all Four Patterns
Why DataKitchen supports these four patterns
easily!
Domain Layers Processing Relationships
How do we update the data?
• Each domain layer has its own domain update processing
• Each layer has its own toolchain (e.g., SQL, Python, Informatica, etc.)
• Each layer has a series of sub-steps (i.e., a 'DAG')
• Each layer wants to know if the build is completed, the tests applied & the data correct
What causes the update of each domain?
• Time / schedule
• Order of operations: a meta-orchestrated coupling of domains, where one part may need to finish before or after another
• Event-orchestrated coupling: when new data arrives, kick off a change
You need a 'Master DAG' to run them all
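A 'Master DAG' can be sketched as a topological walk over domain-level dependencies. The domain names and `run_domain` function are illustrative assumptions, not DataKitchen's API; in practice each node would itself launch a whole domain pipeline (a DAG of sub-steps), on a schedule or an event trigger:

```python
from graphlib import TopologicalSorter  # Python 3.9+ standard library

# Each node is a whole domain pipeline; edges say which domain
# must finish before another may start (order-of-operations coupling).
domain_deps = {
    "physician_mdm": set(),                    # mastering domain, no upstream
    "brand_reporting": {"physician_mdm"},      # needs mastered physicians first
    "field_sales_reporting": {"physician_mdm"},
}

def run_domain(name):
    # Stand-in for launching that domain's own DAG, then its tests
    print(f"running {name} pipeline, then its tests")

# The Master DAG: run every domain in dependency order
for domain in TopologicalSorter(domain_deps).static_order():
    run_domain(domain)
```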
Inter-Domain Communication Links
Inter-Domain Communication: Questions / Steps Asked
• Domain Query: "When was the last time you were updated? Success or failure? Warnings?"
• Domain Query: "Is the data or artifacts in your domain good? Can you prove it with some test results?"
• Process Linkage: "OK, you start. I am done."
• Process Linkage: "OK, you start. I am done & here are a bunch of parameters you need to keep going."
• Event Linkage: "Here is an event: e.g., processing completed, error, warnings, etc."
• Data Linkage: "We share a common table (e.g., a dimension table) in our domain."
• Development Linkage: "Can I re-create your domain in development? Can I see the code you used to create it? Can I modify that code in development? Is there a path to production?"
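A Domain Query answer might be a small status document each domain publishes at a well-known URL. The payload shape and field names below are assumptions for illustration, not a defined protocol:

```python
import json
from datetime import datetime, timezone

def domain_status(last_run_ok, warnings, tests_passed, tests_run):
    """Payload answering 'when were you last updated?' and 'is your data good?'."""
    return json.dumps({
        "updated_at": datetime.now(timezone.utc).isoformat(),
        "last_run": "success" if last_run_ok else "failure",
        "warnings": warnings,
        "test_results": {"passed": tests_passed, "run": tests_run},
    })

# A domain reporting a clean run backed by its automated tests
status = domain_status(True, [], 49998, 50000)
```

Serving this from each domain gives neighbors a machine-readable answer to both Domain Query questions without coupling to the domain's internal toolchain.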
DataKitchen Supported Inter-Domain Communication Links
Inter-Domain Communication: DataKitchen Support
• Domain Query: YES
• Domain Query: YES
• Process Linkage: YES
• Process Linkage: YES
• Event Linkage: YES
• Data Linkage: NO
• Development Linkage: YES
Domain Development Process
The development process is essential.
• Code changes or new data sets may affect
downstream parts of the mesh.
• DataKitchen encapsulates the development
& production environments
Key Questions
• How does a developer change one part
& not break things?
• How do you allow local change to a
domain alongside global governance & control?
[Diagram: Production Domains and Development of Domains, each showing a Mastering Domain (Physician MDM) feeding a Brand Team Reporting Domain. "How do I change this part & not break things?"]
DataKitchen Software's Role (Recipes)
DataKitchen DataOps Capability
Intelligent, test-informed, system-wide production
orchestration (meta-orchestration)
What workflow tools like Airflow, Control-M, or Azure Data Factory do not have:
• Integrated production testing & monitoring
• A set of connectors to the complex chain of data engineering, science, analytics, self-service, governance & database tools
• DataKitchen Recipes meta-orchestration, or a 'DAG of DAGs'
[Diagram: a DataKitchen Recipe spanning the Mastering Domain (Physician MDM) and the Brand Team Reporting Domain.]
DataKitchen Domain Interfaces As URLs
Data Domain
The How: DataKitchen Recipe
https://cloud.datakitchen.io/#/recipes/dc/Production/agile-analytic-ops/variations/prod-env-DevSprint-build-now
The When: DataKitchen OrderRun information
https://cloud.datakitchen.io/#/orders/dc/Production/runs/60e82aa8-2518-11eb-8653-c2e92ba8ebec
DataKitchen Ingredients Allow Composition
• DataKitchen Ingredients are reusable components that
can be incorporated into other processing
• Each domain can change independently, with a centralized
process to make sure the entire system is correct
• While DataKitchen Kitchens let people work
independently, Ingredients let people work dependently:
• Recipes can reuse the data or artifacts that other Recipe
Variations produce
• Recipes can incorporate other Recipe Variations
when they run
Conclusion
Data Fabric, Data Mesh, and Functional Data Engineering are exciting new paradigms.
However, the DataOps part is of paramount importance!
• The lineages & composition between domains are important
• Managing central process control & governance alongside local domain independence is very important
DataKitchen features (e.g., Recipes, Tests, Kitchens & Ingredients) can handle all the needs of
the DataOps part of the mesh
Accelerate These Patterns With DataKitchen Software
The DataKitchen DataOps Software Platform delivers new business insights by enabling the rapid development and deployment of innovative, high-quality data analytic pipelines.
Learn More !
Sign The DataOps Manifesto:
http://dataopsmanifesto.org
Free DataOps Cookbook:
https://datakitchen.io/the-dataops-cookbook/
Free DataOps Transformation Book
https://datakitchen.io/recipes-for-dataops-success-guide-to-dataops-transformation/