Yelp has operated our connector ecosystem to feed vital data to domain-specific teams and data stores. We share some of our learning and experiences on operating such system. We will touch on what is the next phase of the system evolution.
This is Part 4 of the GoldenGate series on Data Mesh - a series of webinars helping customers understand how to move off of old-fashioned monolithic data integration architecture and get ready for more agile, cost-effective, event-driven solutions. The Data Mesh is a kind of Data Fabric that emphasizes business-led data products running on event-driven streaming architectures, serverless, and microservices based platforms. These emerging solutions are essential for enterprises that run data-driven services on multi-cloud, multi-vendor ecosystems.
Join this session to get a fresh look at Data Mesh; we'll start with core architecture principles (vendor agnostic) and transition into detailed examples of how Oracle's GoldenGate platform is providing capabilities today. We will discuss essential technical characteristics of a Data Mesh solution, and the benefits that business owners can expect by moving IT in this direction. For more background on Data Mesh, Part 1, 2, and 3 are on the GoldenGate YouTube channel: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
Mr. Pollock is an expert technology leader for data platforms, big data, data integration and governance. Jeff has been CTO at California startups and a senior exec at Fortune 100 tech vendors. He is currently Oracle VP of Products and Cloud Services for Data Replication, Streaming Data and Database Migrations. While at IBM, he was head of all Information Integration, Replication and Governance products, and previously Jeff was an independent architect for US Defense Department, VP of Technology at Cerebra and CTO of Modulant – he has been engineering artificial intelligence based data platforms since 2001. As a business consultant, Mr. Pollock was a Head Architect at Ernst & Young’s Center for Technology Enablement. Jeff is also the author of “Semantic Web for Dummies” and "Adaptive Information,” a frequent keynote at industry conferences, author for books and industry journals, formerly a contributing member of W3C and OASIS, and an engineering instructor with UC Berkeley’s Extension for object-oriented systems, software development process and enterprise architecture.
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...Dr. Arif Wider
A talk presented by Max Schultze from Zalando and Arif Wider from ThoughtWorks at NDC Oslo 2020.
Abstract:
The Data Lake paradigm is often considered the scalable successor of the more curated Data Warehouse approach when it comes to democratization of data. However, many who went out to build a centralized Data Lake came out with a data swamp of unclear responsibilities, a lack of data ownership, and sub-par data availability.
At Zalando - europe’s biggest online fashion retailer - we realised that accessibility and availability at scale can only be guaranteed when moving more responsibilities to those who pick up the data and have the respective domain knowledge - the data owners - while keeping only data governance and metadata information central. Such a decentralized and domain focused approach has recently been coined a Data Mesh.
The Data Mesh paradigm promotes the concept of Data Products which go beyond sharing of files and towards guarantees of quality and acknowledgement of data ownership.
This talk will take you on a journey of how we went from a centralized Data Lake to embrace a distributed Data Mesh architecture and will outline the ongoing efforts to make creation of data products as simple as applying a template.
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Tristan Baker
Past, present and future of data mesh at Intuit. This deck describes a vision and strategy for improving data worker productivity through a Data Mesh approach to organizing data and holding data producers accountable. Delivered at the inaugural Data Mesh Leaning meetup on 5/13/2021.
this is part 3 of the series on Data Mesh ... looking at the intersection of microservices architecture concepts, data integration / replication technologies and log-based stream integration techniques. This webinar was mostly a demonstration, but several slides used to setup the demo are included here as a PDF for viewers.
Every business today wants to leverage data to drive strategic initiatives with machine learning, data science and analytics — but runs into challenges from siloed teams, proprietary technologies and unreliable data.
That’s why enterprises are turning to the lakehouse because it offers a single platform to unify all your data, analytics and AI workloads.
Join our How to Build a Lakehouse technical training, where we’ll explore how to use Apache SparkTM, Delta Lake, and other open source technologies to build a better lakehouse. This virtual session will include concepts, architectures and demos.
Here’s what you’ll learn in this 2-hour session:
How Delta Lake combines the best of data warehouses and data lakes for improved data reliability, performance and security
How to use Apache Spark and Delta Lake to perform ETL processing, manage late-arriving data, and repair corrupted data directly on your lakehouse
Wonder what this data mesh stuff is all about? What are the principles of data mesh? Can you or should you consider data mesh as the approach for your analytics platform? And most important - how can Snowflake help?
Given in Montreal on 14-Dec-2021
This is Part 4 of the GoldenGate series on Data Mesh - a series of webinars helping customers understand how to move off of old-fashioned monolithic data integration architecture and get ready for more agile, cost-effective, event-driven solutions. The Data Mesh is a kind of Data Fabric that emphasizes business-led data products running on event-driven streaming architectures, serverless, and microservices based platforms. These emerging solutions are essential for enterprises that run data-driven services on multi-cloud, multi-vendor ecosystems.
Join this session to get a fresh look at Data Mesh; we'll start with core architecture principles (vendor agnostic) and transition into detailed examples of how Oracle's GoldenGate platform is providing capabilities today. We will discuss essential technical characteristics of a Data Mesh solution, and the benefits that business owners can expect by moving IT in this direction. For more background on Data Mesh, Part 1, 2, and 3 are on the GoldenGate YouTube channel: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
Mr. Pollock is an expert technology leader for data platforms, big data, data integration and governance. Jeff has been CTO at California startups and a senior exec at Fortune 100 tech vendors. He is currently Oracle VP of Products and Cloud Services for Data Replication, Streaming Data and Database Migrations. While at IBM, he was head of all Information Integration, Replication and Governance products, and previously Jeff was an independent architect for US Defense Department, VP of Technology at Cerebra and CTO of Modulant – he has been engineering artificial intelligence based data platforms since 2001. As a business consultant, Mr. Pollock was a Head Architect at Ernst & Young’s Center for Technology Enablement. Jeff is also the author of “Semantic Web for Dummies” and "Adaptive Information,” a frequent keynote at industry conferences, author for books and industry journals, formerly a contributing member of W3C and OASIS, and an engineering instructor with UC Berkeley’s Extension for object-oriented systems, software development process and enterprise architecture.
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...Dr. Arif Wider
A talk presented by Max Schultze from Zalando and Arif Wider from ThoughtWorks at NDC Oslo 2020.
Abstract:
The Data Lake paradigm is often considered the scalable successor of the more curated Data Warehouse approach when it comes to democratization of data. However, many who went out to build a centralized Data Lake came out with a data swamp of unclear responsibilities, a lack of data ownership, and sub-par data availability.
At Zalando - europe’s biggest online fashion retailer - we realised that accessibility and availability at scale can only be guaranteed when moving more responsibilities to those who pick up the data and have the respective domain knowledge - the data owners - while keeping only data governance and metadata information central. Such a decentralized and domain focused approach has recently been coined a Data Mesh.
The Data Mesh paradigm promotes the concept of Data Products which go beyond sharing of files and towards guarantees of quality and acknowledgement of data ownership.
This talk will take you on a journey of how we went from a centralized Data Lake to embrace a distributed Data Mesh architecture and will outline the ongoing efforts to make creation of data products as simple as applying a template.
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Tristan Baker
Past, present and future of data mesh at Intuit. This deck describes a vision and strategy for improving data worker productivity through a Data Mesh approach to organizing data and holding data producers accountable. Delivered at the inaugural Data Mesh Leaning meetup on 5/13/2021.
this is part 3 of the series on Data Mesh ... looking at the intersection of microservices architecture concepts, data integration / replication technologies and log-based stream integration techniques. This webinar was mostly a demonstration, but several slides used to setup the demo are included here as a PDF for viewers.
Every business today wants to leverage data to drive strategic initiatives with machine learning, data science and analytics — but runs into challenges from siloed teams, proprietary technologies and unreliable data.
That’s why enterprises are turning to the lakehouse because it offers a single platform to unify all your data, analytics and AI workloads.
Join our How to Build a Lakehouse technical training, where we’ll explore how to use Apache SparkTM, Delta Lake, and other open source technologies to build a better lakehouse. This virtual session will include concepts, architectures and demos.
Here’s what you’ll learn in this 2-hour session:
How Delta Lake combines the best of data warehouses and data lakes for improved data reliability, performance and security
How to use Apache Spark and Delta Lake to perform ETL processing, manage late-arriving data, and repair corrupted data directly on your lakehouse
Wonder what this data mesh stuff is all about? What are the principles of data mesh? Can you or should you consider data mesh as the approach for your analytics platform? And most important - how can Snowflake help?
Given in Montreal on 14-Dec-2021
Architect’s Open-Source Guide for a Data Mesh ArchitectureDatabricks
Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?
In this session, we will review the importance of core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges with implementation of Data Mesh systems and focus on the role of open-source projects for it. Projects like Apache Spark can play a key part in standardized infrastructure platform implementation of Data Mesh. We will examine the landscape of useful data engineering open-source projects to utilize in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on what work (culture, tools, mindset) needs to be done to ensure Data Mesh is more accessible for engineers in the industry.
The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems.
This session is targeted for architects, decision-makers, data-engineers, and system designers.
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...DATAVERSITY
A solid data architecture is critical to the success of any data initiative. But what is meant by “data architecture”? Throughout the industry, there are many different “flavors” of data architecture, each with its own unique value and use cases for describing key aspects of the data landscape. Join this webinar to demystify the various architecture styles and understand how they can add value to your organization.
A Work of Zhamak Dehghani
Principal consultant
ThoughtWorks
https://martinfowler.com/articles/data-monolith-to-mesh.html
https://fast.wistia.net/embed/iframe/vys2juvzc3?videoFoam
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh
Many enterprises are investing in their next generation data lake, with the hope of democratizing data at scale to provide business insights and ultimately make automated intelligent decisions. Data platforms based on the data lake architecture have common failure modes that lead to unfulfilled promises at scale. To address these failure modes we need to shift from the centralized paradigm of a lake, or its predecessor data warehouse. We need to shift to a paradigm that draws from modern distributed architecture: considering domains as the first class concern, applying platform thinking to create self-serve data infrastructure, and treating data as a product.
Five Things to Consider About Data Mesh and Data GovernanceDATAVERSITY
Data mesh was among the most discussed and controversial enterprise data management topics of 2021. One of the reasons people struggle with data mesh concepts is we still have a lot of open questions that we are not thinking about:
Are you thinking beyond analytics? Are you thinking about all possible stakeholders? Are you thinking about how to be agile? Are you thinking about standardization and policies? Are you thinking about organizational structures and roles?
Join data.world VP of Product Tim Gasper and Principal Scientist Juan Sequeda for an honest, no-bs discussion about data mesh and its role in data governance.
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...HostedbyConfluent
Organizations have been chasing the dream of data democratization, unlocking and accessing data at scale to serve their customers and business, for over a half a century from early days of data warehousing. They have been trying to reach this dream through multiple generations of architectures, such as data warehouse and data lake, through a cambrian explosion of tools and a large amount of investments to build their next data platform. Despite the intention and the investments the results have been middling.
In this keynote, Zhamak shares her observations on the failure modes of a centralized paradigm of a data lake, and its predecessor data warehouse.
She introduces Data Mesh, a paradigm shift in big data management that draws from modern distributed architecture: considering domains as the first class concern, applying self-sovereignty to distribute the ownership of data, applying platform thinking to create self-serve data infrastructure, and treating data as a product.
This talk introduces the principles underpinning data mesh and Zhamak's recent learnings in creating a path to bring data mesh to life in your organization.
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...Databricks
The Data Lake paradigm is often considered the scalable successor of the more curated Data Warehouse approach when it comes to democratization of data. However, many who went out to build a centralized Data Lake came out with a data swamp of unclear responsibilities, a lack of data ownership, and sub-par data availability.
Presentation on Data Mesh: The paradigm shift is a new type of eco-system architecture, which is a shift left towards a modern distributed architecture in which it allows domain-specific data and views “data-as-a-product,” enabling each domain to handle its own data pipelines.
What’s New with Databricks Machine LearningDatabricks
In this session, the Databricks product team provides a deeper dive into the machine learning announcements. Join us for a detailed demo that gives you insights into the latest innovations that simplify the ML lifecycle — from preparing data, discovering features, and training and managing models in production.
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a modern data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. They all may sound great in theory, but I'll dig into the concerns you need to be aware of before taking the plunge. I’ll also include use cases so you can see what approach will work best for your big data needs. And I'll discuss Microsoft version of the data mesh.
How a Semantic Layer Makes Data Mesh Work at ScaleDATAVERSITY
Data Mesh is a trending approach to building a decentralized data architecture by leveraging a domain-oriented, self-service design. However, the pure definition of Data Mesh lacks a center of excellence or central data team and doesn’t address the need for a common approach for sharing data products across teams. The semantic layer is emerging as a key component to supporting a Hub and Spoke style of organizing data teams by introducing data model sharing, collaboration, and distributed ownership controls.
This session will explain how data teams can define common models and definitions with a semantic layer to decentralize analytics product creation using a Hub and Spoke architecture.
Attend this session to learn about:
- The role of a Data Mesh in the modern cloud architecture.
- How a semantic layer can serve as the binding agent to support decentralization.
- How to drive self service with consistency and control.
Data Mesh is a new socio-technical approach to data architecture, first described by Zhamak Dehghani and popularised through a guest blog post on Martin Fowler's site.
Since then, community interest has grown, due to Data Mesh's ability to explain and address the frustrations that many organisations are experiencing as they try to get value from their data. The 2022 publication of Zhamak's book on Data Mesh further provoked conversation, as have the growing number of experience reports from companies that have put Data Mesh into practice.
So what's all the fuss about?
On one hand, Data Mesh is a new approach in the field of big data. On the other hand, Data Mesh is application of the lessons we have learned from domain-driven design and microservices to a data context.
In this talk, Chris and Pablo will explain how Data Mesh relates to current thinking in software architecture and the historical development of data architecture philosophies. They will outline what benefits Data Mesh brings, what trade-offs it comes with and when organisations should and should not consider adopting it.
Analytics play a critical role in supporting strategic business initiatives. Despite the obvious value to analytic professionals of providing the analytics for these initiatives, many executives question the economic return of analytics as well as data lakes, machine learning, master data management, and the like.
Technology professionals need to calculate and present business value in terms business executives can understand. Unfortunately, most IT professionals lack the knowledge required to develop comprehensive cost-benefit analyses and return on investment (ROI) measurements.
This session provides a framework to help technology professionals research, measure, and present the economic value of a proposed or existing analytics initiative, no matter the form that the business benefit arises. The session will provide practical advice about how to calculate ROI and the formulas, and how to collect the necessary information.
Data-Ed Slides: Best Practices in Data Stewardship (Technical)DATAVERSITY
In order to find value in your organization's data assets, heroic data stewards are tasked with saving the day- every single day! These heroes adhere to a data governance framework and work to ensure that data is: captured right the first time, validated through automated means, and integrated into business processes. Whether its data profiling or in depth root cause analysis, data stewards can be counted on to ensure the organization's mission critical data is reliable. In this webinar we will approach this framework, and punctuate important facets of a data steward’s role.
Learning Objectives:
- Understand the business need for a data governance framework
- Learn why embedded data quality principles are an important part of system/process design
- Identify opportunities to help drive your organization to a data driven culture
Data Architecture Best Practices for Advanced AnalyticsDATAVERSITY
Many organizations are immature when it comes to data and analytics use. The answer lies in delivering a greater level of insight from data, straight to the point of need.
There are so many Data Architecture best practices today, accumulated from years of practice. In this webinar, William will look at some Data Architecture best practices that he believes have emerged in the past two years and are not worked into many enterprise data programs yet. These are keepers and will be required to move towards, by one means or another, so it’s best to mindfully work them into the environment.
BDW16 London - Scott Krueger, skyscanner - Does More Data Mean Better Decisio...Big Data Week
We have seen vast improvements to data collection, storage, processing and transport in recent years. An increasing number of networked devices are emitting data and all of us are preparing to handle this wave of valuable data.
Have we, as data professionals, been too focused on the technical challenges and analytical results?
What about the data quality? Are we confident about it? How can we be sure we are making good decisions?
We need to revisit methods of assessing data quality on our modernized data platforms. The quality of our decision making depends on it.
Architect’s Open-Source Guide for a Data Mesh ArchitectureDatabricks
Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?
In this session, we will review the importance of core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges with implementation of Data Mesh systems and focus on the role of open-source projects for it. Projects like Apache Spark can play a key part in standardized infrastructure platform implementation of Data Mesh. We will examine the landscape of useful data engineering open-source projects to utilize in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on what work (culture, tools, mindset) needs to be done to ensure Data Mesh is more accessible for engineers in the industry.
The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems.
This session is targeted for architects, decision-makers, data-engineers, and system designers.
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...DATAVERSITY
A solid data architecture is critical to the success of any data initiative. But what is meant by “data architecture”? Throughout the industry, there are many different “flavors” of data architecture, each with its own unique value and use cases for describing key aspects of the data landscape. Join this webinar to demystify the various architecture styles and understand how they can add value to your organization.
A Work of Zhamak Dehghani
Principal consultant
ThoughtWorks
https://martinfowler.com/articles/data-monolith-to-mesh.html
https://fast.wistia.net/embed/iframe/vys2juvzc3?videoFoam
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh
Many enterprises are investing in their next generation data lake, with the hope of democratizing data at scale to provide business insights and ultimately make automated intelligent decisions. Data platforms based on the data lake architecture have common failure modes that lead to unfulfilled promises at scale. To address these failure modes we need to shift from the centralized paradigm of a lake, or its predecessor data warehouse. We need to shift to a paradigm that draws from modern distributed architecture: considering domains as the first class concern, applying platform thinking to create self-serve data infrastructure, and treating data as a product.
Five Things to Consider About Data Mesh and Data GovernanceDATAVERSITY
Data mesh was among the most discussed and controversial enterprise data management topics of 2021. One of the reasons people struggle with data mesh concepts is we still have a lot of open questions that we are not thinking about:
Are you thinking beyond analytics? Are you thinking about all possible stakeholders? Are you thinking about how to be agile? Are you thinking about standardization and policies? Are you thinking about organizational structures and roles?
Join data.world VP of Product Tim Gasper and Principal Scientist Juan Sequeda for an honest, no-bs discussion about data mesh and its role in data governance.
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...HostedbyConfluent
Organizations have been chasing the dream of data democratization, unlocking and accessing data at scale to serve their customers and business, for over a half a century from early days of data warehousing. They have been trying to reach this dream through multiple generations of architectures, such as data warehouse and data lake, through a cambrian explosion of tools and a large amount of investments to build their next data platform. Despite the intention and the investments the results have been middling.
In this keynote, Zhamak shares her observations on the failure modes of a centralized paradigm of a data lake, and its predecessor data warehouse.
She introduces Data Mesh, a paradigm shift in big data management that draws from modern distributed architecture: considering domains as the first class concern, applying self-sovereignty to distribute the ownership of data, applying platform thinking to create self-serve data infrastructure, and treating data as a product.
This talk introduces the principles underpinning data mesh and Zhamak's recent learnings in creating a path to bring data mesh to life in your organization.
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...Databricks
The Data Lake paradigm is often considered the scalable successor of the more curated Data Warehouse approach when it comes to democratization of data. However, many who went out to build a centralized Data Lake came out with a data swamp of unclear responsibilities, a lack of data ownership, and sub-par data availability.
Presentation on Data Mesh: The paradigm shift is a new type of eco-system architecture, which is a shift left towards a modern distributed architecture in which it allows domain-specific data and views “data-as-a-product,” enabling each domain to handle its own data pipelines.
What’s New with Databricks Machine LearningDatabricks
In this session, the Databricks product team provides a deeper dive into the machine learning announcements. Join us for a detailed demo that gives you insights into the latest innovations that simplify the ML lifecycle — from preparing data, discovering features, and training and managing models in production.
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a modern data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. They all may sound great in theory, but I'll dig into the concerns you need to be aware of before taking the plunge. I’ll also include use cases so you can see what approach will work best for your big data needs. And I'll discuss Microsoft version of the data mesh.
How a Semantic Layer Makes Data Mesh Work at ScaleDATAVERSITY
Data Mesh is a trending approach to building a decentralized data architecture by leveraging a domain-oriented, self-service design. However, the pure definition of Data Mesh lacks a center of excellence or central data team and doesn’t address the need for a common approach for sharing data products across teams. The semantic layer is emerging as a key component to supporting a Hub and Spoke style of organizing data teams by introducing data model sharing, collaboration, and distributed ownership controls.
This session will explain how data teams can define common models and definitions with a semantic layer to decentralize analytics product creation using a Hub and Spoke architecture.
Attend this session to learn about:
- The role of a Data Mesh in the modern cloud architecture.
- How a semantic layer can serve as the binding agent to support decentralization.
- How to drive self service with consistency and control.
Data Mesh is a new socio-technical approach to data architecture, first described by Zhamak Dehghani and popularised through a guest blog post on Martin Fowler's site.
Since then, community interest has grown, due to Data Mesh's ability to explain and address the frustrations that many organisations are experiencing as they try to get value from their data. The 2022 publication of Zhamak's book on Data Mesh further provoked conversation, as have the growing number of experience reports from companies that have put Data Mesh into practice.
So what's all the fuss about?
On one hand, Data Mesh is a new approach in the field of big data. On the other hand, Data Mesh is application of the lessons we have learned from domain-driven design and microservices to a data context.
In this talk, Chris and Pablo will explain how Data Mesh relates to current thinking in software architecture and the historical development of data architecture philosophies. They will outline what benefits Data Mesh brings, what trade-offs it comes with and when organisations should and should not consider adopting it.
Analytics play a critical role in supporting strategic business initiatives. Despite the obvious value to analytic professionals of providing the analytics for these initiatives, many executives question the economic return of analytics as well as data lakes, machine learning, master data management, and the like.
Technology professionals need to calculate and present business value in terms business executives can understand. Unfortunately, most IT professionals lack the knowledge required to develop comprehensive cost-benefit analyses and return on investment (ROI) measurements.
This session provides a framework to help technology professionals research, measure, and present the economic value of a proposed or existing analytics initiative, no matter the form that the business benefit arises. The session will provide practical advice about how to calculate ROI and the formulas, and how to collect the necessary information.
Data-Ed Slides: Best Practices in Data Stewardship (Technical)DATAVERSITY
In order to find value in your organization's data assets, heroic data stewards are tasked with saving the day- every single day! These heroes adhere to a data governance framework and work to ensure that data is: captured right the first time, validated through automated means, and integrated into business processes. Whether its data profiling or in depth root cause analysis, data stewards can be counted on to ensure the organization's mission critical data is reliable. In this webinar we will approach this framework, and punctuate important facets of a data steward’s role.
Learning Objectives:
- Understand the business need for a data governance framework
- Learn why embedded data quality principles are an important part of system/process design
- Identify opportunities to help drive your organization to a data driven culture
Data Architecture Best Practices for Advanced AnalyticsDATAVERSITY
Many organizations are immature when it comes to data and analytics use. The answer lies in delivering a greater level of insight from data, straight to the point of need.
There are so many Data Architecture best practices today, accumulated from years of practice. In this webinar, William will look at some Data Architecture best practices that he believes have emerged in the past two years and are not worked into many enterprise data programs yet. These are keepers and will be required to move towards, by one means or another, so it’s best to mindfully work them into the environment.
BDW16 London - Scott Krueger, skyscanner - Does More Data Mean Better Decisio...Big Data Week
We have seen vast improvements to data collection, storage, processing and transport in recent years. An increasing number of networked devices are emitting data and all of us are preparing to handle this wave of valuable data.
Have we, as data professionals, been too focused on the technical challenges and analytical results?
What about the data quality? Are we confident about it? How can we be sure we are making good decisions?
We need to revisit methods of assessing data quality on our modernized data platforms. The quality of our decision making depends on it.
Data warehousing and business intelligence project reportsonalighai
Developed Data warehouse project with a structured, semi-structured and unstructured sources of data
and generated Business Intelligence reports. Topic for the project was Tobacco products consumption in
America. Studied on which products are more famous among people across and also got to know that
middle school students are the soft targets for the tobacco companies as maximum people start taking
tobacco products at this age.
Tools used: SSMS, SSIS, SSAS, SSRS, R-Studio, Power BI, Excel
Purpose of this presentation is to highlight how end to end machine learning looks like in real world enterprise. This is to provide insight to aspiring data scientist who have been through courses or education in ML that mostly focus on ML algorithms and not end to end pipeline.
Architecture and components mentioned in Slide 11 will be discussed in detailed in series of post on LinkedIn over the course of next few month
To get updates on this follow me on LinkedIn or search/follow hashtag #end2endDS. Post will be active in August 2019 and will be posted till September 2019
The Right Data Warehouse: Automation Now, Business Value ThereafterInside Analysis
The Briefing Room with Dr. Robin Bloor and WhereScape
Live Webcast on April 1, 2014
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=7b23b14b532bd7be60a70f6bd5209f03
In the Big Data shuffle, everyone is looking at Hadoop as “the answer” to collect interesting data from a new set of sources. While Hadoop has given organizations the power to gather more information assets than ever before, the question still looms: which data, regardless of source, structure, volume and all the rest, are significant for affecting business value – and how do we harness it? One effective approach is to bolster the data warehouse environment with a solution capable of integrating all the data sources, including Hadoop, and automating delivery of key information into the rights hands.
Register for this episode of The Briefing Room to hear veteran Analyst Robin Bloor as he explains how a rapidly changing information landscape impacts data management. He will be briefed by Mark Budzinski of WhereScape, who will tout his company’s data warehouse automation solutions. Budzinski will discuss how automation can be the cornerstone for closing the gap between those responsible for data management and the people driving business decisions.
Visit InsideAnlaysis.com for more information.
Big data journey to the cloud maz chaudhri 5.30.18Cloudera, Inc.
We hope this session was valuable in teaching you more about Cloudera Enterprise on AWS, and how fast and easy it is to deploy a modern data management platform—in your cloud and on your terms.
Why Data Virtualization? An Introduction by DenodoJusto Hidalgo
Data Virtualization means Real-time Data Access and Integration. But why do I need it? This presentation tries to answer it in a simple yet clear way.
By Alberto Pan, CTO of Denodo, and Justo Hidalgo, VP Product Management.
Coping with Data Variety in the Big Data Era: The Semantic Computing ApproachAndre Freitas
Big Data is based on the vision of providing users and applications with a more complete picture of the reality supported and mediated by data. This vision comes with the inherent price of data variety, i.e. data which is semantically heterogeneous, poorly structured, complex and with data quality issues. Despite the hype on technologies targeting data volume and velocity, solutions for coping with data variety remain fragmented and with limited adoption. In this talk we will focus on emerging data management approaches, supported by semantic technologies, to cope with data variety. We will provide a broad overview of semantic computing approaches and how they can be applied to data management challenges within organizations today. This talk will allow the audience to have a glimpse into the next-generation, Big Data-driven information systems.
The Evolving Role of the Data Engineer - Whitepaper | QuboleVasu S
A whitepaper about how the evolving data engineering profession helps data-driven companies work smarter and lower cloud costs with Qubole.
https://www.qubole.com/resources/white-papers/the-evolving-role-of-the-data-engineer
Student information management system project report ii.pdfKamal Acharya
Our project explains about the student management. This project mainly explains the various actions related to student details. This project shows some ease in adding, editing and deleting the student details. It also provides a less time consuming process for viewing, adding, editing and deleting the marks of the students.
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Dr.Costas Sachpazis
Terzaghi's soil bearing capacity theory, developed by Karl Terzaghi, is a fundamental principle in geotechnical engineering used to determine the bearing capacity of shallow foundations. This theory provides a method to calculate the ultimate bearing capacity of soil, which is the maximum load per unit area that the soil can support without undergoing shear failure. The Calculation HTML Code included.
Water scarcity is the lack of fresh water resources to meet the standard water demand. There are two type of water scarcity. One is physical. The other is economic water scarcity.
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdffxintegritypublishin
Advancements in technology unveil a myriad of electrical and electronic breakthroughs geared towards efficiently harnessing limited resources to meet human energy demands. The optimization of hybrid solar PV panels and pumped hydro energy supply systems plays a pivotal role in utilizing natural resources effectively. This initiative not only benefits humanity but also fosters environmental sustainability. The study investigated the design optimization of these hybrid systems, focusing on understanding solar radiation patterns, identifying geographical influences on solar radiation, formulating a mathematical model for system optimization, and determining the optimal configuration of PV panels and pumped hydro storage. Through a comparative analysis approach and eight weeks of data collection, the study addressed key research questions related to solar radiation patterns and optimal system design. The findings highlighted regions with heightened solar radiation levels, showcasing substantial potential for power generation and emphasizing the system's efficiency. Optimizing system design significantly boosted power generation, promoted renewable energy utilization, and enhanced energy storage capacity. The study underscored the benefits of optimizing hybrid solar PV panels and pumped hydro energy supply systems for sustainable energy usage. Optimizing the design of solar PV panels and pumped hydro energy supply systems as examined across diverse climatic conditions in a developing country, not only enhances power generation but also improves the integration of renewable energy sources and boosts energy storage capacities, particularly beneficial for less economically prosperous regions. Additionally, the study provides valuable insights for advancing energy research in economically viable areas. Recommendations included conducting site-specific assessments, utilizing advanced modeling tools, implementing regular maintenance protocols, and enhancing communication among system components.
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
3. Who am I?
My name is
Steven, my
preferred
pronoun is “he”
I graduated from UC Berkeley EECS in 2005
This is my second term in Yelp (2017 - now)
Last term is 2011 - 2015
I consider myself a generalist in the field
4. Who am I?
I work in team
metrics-data
within
metrics-platform
5. Who am I?
I work in team
metrics-data
within
metrics-platform
6. Data powers
decision making
OnLine Transaction Processing (OLTP)
We use MySQL to power yelp.com
Each transaction interacts with small amount of
data
Display reviews, photos, tips of a business
OLTP queries’ results are expected to return quickly
No one wants to wait for more than 2 seconds for a
business page to load
7. OLTP example:
find the titles an
author has
written. Take
advantage of an
index
https://en.wikipedia.org/wiki/Library_catalog#/media/File:Schlagwortkatalog.jpg
8. Data powers
decision making
Developers want to find out what local business has
the most reviews
Table scan on the review table?
OnLine Analytical Processing (OLAP)
Queries that scan majority of data relative to total
amount of data
Need specialized system to support such queries
Yelp uses AWS Redshift as a data warehouse to
support OLAP queries.
9. OLAP example:
average number
of pages in a
book stored
inside main
stack. Need to
scan all the titles.
https://www.dailycal.org/2013/12/08/best-worst-foods-sneak-main-stacks/
12. Data Fabric We want to avoid n * m programs to transport data
n is the number of source, and m is the number of sink
Domain specific data stores are here to stay
Stonebraker, “One Size Fits All”: An Idea Whose Time
Has Come and Gone”
Stream-Table Duality
We can formulate the transport of data as streams
17. Benefits
Connector
Ecosystem
Lower the barrier of entry
It’s easy to move data between data stores
High performance implementation
Each data store has its own performance
characteristics.
Streams-processing over batch processing
Near real-time data availability
19. Lesson Learned
Connector
Ecosystem
Schematized data is good
Lessen the likelihood of malformed data
Schema evolution can be difficult
Making incompatible schema change can break many
things. Discourage them in registration phase.
Decouple data producers and data consumers
We need automation to inform data producers how to
manage data life cycle as producers do not think about
who uses the data.
21. Desirable
Improvements
Data Producers should own their data life cycle
Specific connector owner does not have visibility of
data semantics.
Data Consumers are stakeholders
Consumers don’t want to out incompatible changes
after its been rolled out.
Self-serve mechanism accelerates changes
The only way to rapidly evolves is to self-serve
22. Data Mesh Data specifications are like microservices APIs
They are contracts between producers and consumers
Each team owns their data specifications
To avoid accidentally abstraction leakage
Decentralization allows rapid experiments
Common conventions are promoted to minimize
frictions among different domain systems
24. yelp.com/dataset_challenge
Academic
dataset from 10
cities across the
globe!
Your academic project, research or visualizations
submitted by December 31, 2019
=
a $5,000 prize* !
*See full terms on website
6M reviews
1M business attributes
190K businesses
200K photos