The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
Data Lakehouse Symposium | Day 1 | Part 2Databricks
The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
Learn to Use Databricks for Data ScienceDatabricks
Data scientists face numerous challenges throughout the data science workflow that hinder productivity. As organizations continue to become more data-driven, a collaborative environment is more critical than ever — one that provides easier access and visibility into the data, reports and dashboards built against the data, reproducibility, and insights uncovered within the data.. Join us to hear how Databricks’ open and collaborative platform simplifies data science by enabling you to run all types of analytics workloads, from data preparation to exploratory analysis and predictive analytics, at scale — all on one unified platform.
Databricks CEO Ali Ghodsi introduces Databricks Delta, a new data management system that combines the scale and cost-efficiency of a data lake, the performance and reliability of a data warehouse, and the low latency of streaming.
Data Lakehouse Symposium | Day 1 | Part 2Databricks
The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
Learn to Use Databricks for Data ScienceDatabricks
Data scientists face numerous challenges throughout the data science workflow that hinder productivity. As organizations continue to become more data-driven, a collaborative environment is more critical than ever — one that provides easier access and visibility into the data, reports and dashboards built against the data, reproducibility, and insights uncovered within the data.. Join us to hear how Databricks’ open and collaborative platform simplifies data science by enabling you to run all types of analytics workloads, from data preparation to exploratory analysis and predictive analytics, at scale — all on one unified platform.
Databricks CEO Ali Ghodsi introduces Databricks Delta, a new data management system that combines the scale and cost-efficiency of a data lake, the performance and reliability of a data warehouse, and the low latency of streaming.
Modernizing to a Cloud Data ArchitectureDatabricks
Organizations with on-premises Hadoop infrastructure are bogged down by system complexity, unscalable infrastructure, and the increasing burden on DevOps to manage legacy architectures. Costs and resource utilization continue to go up while innovation has flatlined. In this session, you will learn why, now more than ever, enterprises are looking for cloud alternatives to Hadoop and are migrating off of the architecture in large numbers. You will also learn how elastic compute models’ benefits help one customer scale their analytics and AI workloads and best practices from their experience on a successful migration of their data and workloads to the cloud.
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. I’ll include use cases so you can see what approach will work best for your big data needs.
Want to see a high-level overview of the products in the Microsoft data platform portfolio in Azure? I’ll cover products in the categories of OLTP, OLAP, data warehouse, storage, data transport, data prep, data lake, IaaS, PaaS, SMP/MPP, NoSQL, Hadoop, open source, reporting, machine learning, and AI. It’s a lot to digest but I’ll categorize the products and discuss their use cases to help you narrow down the best products for the solution you want to build.
Architect’s Open-Source Guide for a Data Mesh ArchitectureDatabricks
Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?
In this session, we will review the importance of core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges with implementation of Data Mesh systems and focus on the role of open-source projects for it. Projects like Apache Spark can play a key part in standardized infrastructure platform implementation of Data Mesh. We will examine the landscape of useful data engineering open-source projects to utilize in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on what work (culture, tools, mindset) needs to be done to ensure Data Mesh is more accessible for engineers in the industry.
The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems.
This session is targeted for architects, decision-makers, data-engineers, and system designers.
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...DataScienceConferenc1
Dragan Berić will take a deep dive into Lakehouse architecture, a game-changing concept bridging the best elements of data lake and data warehouse. The presentation will focus on the Delta Lake format as the foundation of the Lakehouse philosophy, and Databricks as the primary platform for its implementation.
Big data architectures and the data lakeJames Serra
With so many new technologies it can get confusing on the best approach to building a big data architecture. The data lake is a great new concept, usually built in Hadoop, but what exactly is it and how does it fit in? In this presentation I'll discuss the four most common patterns in big data production implementations, the top-down vs bottoms-up approach to analytics, and how you can use a data lake and a RDBMS data warehouse together. We will go into detail on the characteristics of a data lake and its benefits, and how you still need to perform the same data governance tasks in a data lake as you do in a data warehouse. Come to this presentation to make sure your data lake does not turn into a data swamp!
A Work of Zhamak Dehghani
Principal consultant
ThoughtWorks
https://martinfowler.com/articles/data-monolith-to-mesh.html
https://fast.wistia.net/embed/iframe/vys2juvzc3?videoFoam
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh
Many enterprises are investing in their next generation data lake, with the hope of democratizing data at scale to provide business insights and ultimately make automated intelligent decisions. Data platforms based on the data lake architecture have common failure modes that lead to unfulfilled promises at scale. To address these failure modes we need to shift from the centralized paradigm of a lake, or its predecessor data warehouse. We need to shift to a paradigm that draws from modern distributed architecture: considering domains as the first class concern, applying platform thinking to create self-serve data infrastructure, and treating data as a product.
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...Dr. Arif Wider
A talk presented by Max Schultze from Zalando and Arif Wider from ThoughtWorks at NDC Oslo 2020.
Abstract:
The Data Lake paradigm is often considered the scalable successor of the more curated Data Warehouse approach when it comes to democratization of data. However, many who went out to build a centralized Data Lake came out with a data swamp of unclear responsibilities, a lack of data ownership, and sub-par data availability.
At Zalando - europe’s biggest online fashion retailer - we realised that accessibility and availability at scale can only be guaranteed when moving more responsibilities to those who pick up the data and have the respective domain knowledge - the data owners - while keeping only data governance and metadata information central. Such a decentralized and domain focused approach has recently been coined a Data Mesh.
The Data Mesh paradigm promotes the concept of Data Products which go beyond sharing of files and towards guarantees of quality and acknowledgement of data ownership.
This talk will take you on a journey of how we went from a centralized Data Lake to embrace a distributed Data Mesh architecture and will outline the ongoing efforts to make creation of data products as simple as applying a template.
DI&A Slides: Data Lake vs. Data WarehouseDATAVERSITY
Modern data analysis is moving beyond the Data Warehouse to the Data Lake where analysts are able to take advantage of emerging technologies to manage complex analytics on large data volumes and diverse data types. Yet, for some business problems, a Data Warehouse may still be the right solution.
If you’re on the fence, join this webinar as we compare and contrast Data Lakes and Data Warehouses, identifying situations where one approach may be better than the other and highlighting how the two can work together.
Get tips, takeaways and best practices about:
- The benefits and problems of a Data Warehouse
- How a Data Lake can solve the problems of a Data Warehouse
- Data Lake Architecture
- How Data Warehouses and Data Lakes can work together
Netflix is a famously data-driven company. Data is used to make informed decisions on everything from content acquisition to content delivery, and everything in-between. As with any data-driven company, it’s critical that data used by the business is accurate. Or, at worst, that the business has visibility into potential quality issues as soon as they arise. But even in the most mature data warehouses, data quality can be hard. How can we ensure high quality in a cloud-based, internet-scale, modern big data warehouse employing a variety of data engineering technologies?
In this talk, Michelle Ufford will share how the Data Engineering & Analytics team at Netflix is doing exactly that. We’ll kick things off with a quick overview of Netflix’s analytics environment, then dig into the architecture of our current data quality solution. We’ll cover what worked, what didn’t work so well, and what we're working on next. We’ll conclude with some tips & lessons learned for ensuring high quality on big data.
This talk was presented at DataWorks/Hadoop Summit 2017 on June 13, 2017.
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...DATAVERSITY
A solid data architecture is critical to the success of any data initiative. But what is meant by “data architecture”? Throughout the industry, there are many different “flavors” of data architecture, each with its own unique value and use cases for describing key aspects of the data landscape. Join this webinar to demystify the various architecture styles and understand how they can add value to your organization.
This is Part 4 of the GoldenGate series on Data Mesh - a series of webinars helping customers understand how to move off of old-fashioned monolithic data integration architecture and get ready for more agile, cost-effective, event-driven solutions. The Data Mesh is a kind of Data Fabric that emphasizes business-led data products running on event-driven streaming architectures, serverless, and microservices based platforms. These emerging solutions are essential for enterprises that run data-driven services on multi-cloud, multi-vendor ecosystems.
Join this session to get a fresh look at Data Mesh; we'll start with core architecture principles (vendor agnostic) and transition into detailed examples of how Oracle's GoldenGate platform is providing capabilities today. We will discuss essential technical characteristics of a Data Mesh solution, and the benefits that business owners can expect by moving IT in this direction. For more background on Data Mesh, Part 1, 2, and 3 are on the GoldenGate YouTube channel: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
Mr. Pollock is an expert technology leader for data platforms, big data, data integration and governance. Jeff has been CTO at California startups and a senior exec at Fortune 100 tech vendors. He is currently Oracle VP of Products and Cloud Services for Data Replication, Streaming Data and Database Migrations. While at IBM, he was head of all Information Integration, Replication and Governance products, and previously Jeff was an independent architect for US Defense Department, VP of Technology at Cerebra and CTO of Modulant – he has been engineering artificial intelligence based data platforms since 2001. As a business consultant, Mr. Pollock was a Head Architect at Ernst & Young’s Center for Technology Enablement. Jeff is also the author of “Semantic Web for Dummies” and "Adaptive Information,” a frequent keynote at industry conferences, author for books and industry journals, formerly a contributing member of W3C and OASIS, and an engineering instructor with UC Berkeley’s Extension for object-oriented systems, software development process and enterprise architecture.
Building Lakehouses on Delta Lake with SQL Analytics PrimerDatabricks
You’ve heard the marketing buzz, maybe you have been to a workshop and worked with some Spark, Delta, SQL, Python, or R, but you still need some help putting all the pieces together? Join us as we review some common techniques to build a lakehouse using Delta Lake, use SQL Analytics to perform exploratory analysis, and build connectivity for BI applications.
Making Data Timelier and More Reliable with Lakehouse TechnologyMatei Zaharia
Enterprise data architectures usually contain many systems—data lakes, message queues, and data warehouses—that data must pass through before it can be analyzed. Each transfer step between systems adds a delay and a potential source of errors. What if we could remove all these steps? In recent years, cloud storage and new open source systems have enabled a radically new architecture: the lakehouse, an ACID transactional layer over cloud storage that can provide streaming, management features, indexing, and high-performance access similar to a data warehouse. Thousands of organizations including the largest Internet companies are now using lakehouses to replace separate data lake, warehouse and streaming systems and deliver high-quality data faster internally. I’ll discuss the key trends and recent advances in this area based on Delta Lake, the most widely used open source lakehouse platform, which was developed at Databricks.
Enterprise Architecture vs. Data ArchitectureDATAVERSITY
Enterprise Architecture (EA) provides a visual blueprint of the organization, and shows key interrelationships between data, process, applications, and more. By abstracting these assets in a graphical view, it’s possible to see key interrelationships, particularly as they relate to data and its business impact across the organization. Join us for a discussion on how Data Architecture is a key component of an overall Enterprise Architecture for enhanced business value and success.
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...HostedbyConfluent
Apache Kafka With Spark Structured Streaming With Emma LIU, Nitin Saksena, Ram Dhakne | Current 2022
A well-architected data lakehouse provides an open data platform that combines streaming with data warehousing, data engineering, data science and ML. This opens a world beyond streaming to solving business problems in real-time with analytics and AI. See how companies like Albertsons have used Databricks and Confluent together to combine Kafka streaming with Databricks for their digital transformation.
In this talk, you will learn:
- The built-in streaming capabilities of a lakehouse
- Best practices for integrating Kafka with Spark Structured Streaming
- How Albertsons architected their data platform for real-time data processing and real-time analytics
Embarking on building a modern data warehouse in the cloud can be an overwhelming experience due to the sheer number of products that can be used, especially when the use cases for many products overlap others. In this talk I will cover the use cases of many of the Microsoft products that you can use when building a modern data warehouse, broken down into four areas: ingest, store, prep, and model & serve. It’s a complicated story that I will try to simplify, giving blunt opinions of when to use what products and the pros/cons of each.
A View on AI in Insurance - Chris Madsen - H2O AI World London 2018Sri Ambati
This talk was recorded in London on October 30th, 2018 and can be viewed here: https://youtu.be/LFVIGMMlfhI
A view on what is driving AI and ML developments in insurance and why.
• What is driving the change in insurance and why is AI/ML so important?
• What does the future look like?
• Which AI/ML use cases are being worked on in the industry?
• Which ones are needed?
Chris Madsen is Chairman and CEO of Blue Square Re N.V., Aegon’s internal reinsurer and a company he co-founded in 2010.
Mr. Madsen holds a Masters in Engineering from Princeton University in Princeton, USA. His undergraduate degree is in Mathematics and Economics. He is an Associate of the Society of Actuaries, a Member of the American Academy of Actuaries and a Chartered Financial Analyst.
He started his professional career in New York in 1990, working as Consulting Actuary and later Principal. Mr. Madsen has published numerous articles on innovative underwriting risk solutions and is a frequent speaker on the topic and related developments.
Mr. Madsen is an avid proponent and driver of integrating start-up and insurtech expertise into insurance solutions - including internet-of-things applications as well as blockchain initiatives such as “B3i”. He is also responsible for the ground-breaking longevity solutions that Aegon brought to the capital markets totalling over EUR 20bn of reserves.
Critical Relationships for HR Professionals to Mitigate Risks and Navigate Ch...Aggregage
HR professionals know that risk mitigation is a complicated subject to broach. As COVID continues to create new opportunities and risks for organizations, HR professionals must be prepared to balance the needs of the workforce with the needs of the organization. The pace of certain changes in technology, however, has made it hard to support the organization in isolation. Wouldn't you like to better understand which changes make your role more effective?
Whether you support an in-office, hybrid, or fully remote workforce, there are measures and technologies to implement that can help create a more inclusive and resilient organization, reduce the risk of workplace violence, and bolster safety.
Join Kayne McGladrey, CISSP, Cybersecurity Strategist at Ascent Solutions for this discussion on the intersection of ethics and technology. In this session, you will learn how to mitigate risks by building key relationships with three different parts of your organization:
• The digital collaboration team
• The cybersecurity team
• The executive leadership team
Modernizing to a Cloud Data ArchitectureDatabricks
Organizations with on-premises Hadoop infrastructure are bogged down by system complexity, unscalable infrastructure, and the increasing burden on DevOps to manage legacy architectures. Costs and resource utilization continue to go up while innovation has flatlined. In this session, you will learn why, now more than ever, enterprises are looking for cloud alternatives to Hadoop and are migrating off of the architecture in large numbers. You will also learn how elastic compute models’ benefits help one customer scale their analytics and AI workloads and best practices from their experience on a successful migration of their data and workloads to the cloud.
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. I’ll include use cases so you can see what approach will work best for your big data needs.
Want to see a high-level overview of the products in the Microsoft data platform portfolio in Azure? I’ll cover products in the categories of OLTP, OLAP, data warehouse, storage, data transport, data prep, data lake, IaaS, PaaS, SMP/MPP, NoSQL, Hadoop, open source, reporting, machine learning, and AI. It’s a lot to digest but I’ll categorize the products and discuss their use cases to help you narrow down the best products for the solution you want to build.
Architect’s Open-Source Guide for a Data Mesh ArchitectureDatabricks
Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?
In this session, we will review the importance of core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges with implementation of Data Mesh systems and focus on the role of open-source projects for it. Projects like Apache Spark can play a key part in standardized infrastructure platform implementation of Data Mesh. We will examine the landscape of useful data engineering open-source projects to utilize in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on what work (culture, tools, mindset) needs to be done to ensure Data Mesh is more accessible for engineers in the industry.
The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems.
This session is targeted for architects, decision-makers, data-engineers, and system designers.
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...DataScienceConferenc1
Dragan Berić will take a deep dive into Lakehouse architecture, a game-changing concept bridging the best elements of data lake and data warehouse. The presentation will focus on the Delta Lake format as the foundation of the Lakehouse philosophy, and Databricks as the primary platform for its implementation.
Big data architectures and the data lakeJames Serra
With so many new technologies it can get confusing on the best approach to building a big data architecture. The data lake is a great new concept, usually built in Hadoop, but what exactly is it and how does it fit in? In this presentation I'll discuss the four most common patterns in big data production implementations, the top-down vs bottoms-up approach to analytics, and how you can use a data lake and a RDBMS data warehouse together. We will go into detail on the characteristics of a data lake and its benefits, and how you still need to perform the same data governance tasks in a data lake as you do in a data warehouse. Come to this presentation to make sure your data lake does not turn into a data swamp!
A Work of Zhamak Dehghani
Principal consultant
ThoughtWorks
https://martinfowler.com/articles/data-monolith-to-mesh.html
https://fast.wistia.net/embed/iframe/vys2juvzc3?videoFoam
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh
Many enterprises are investing in their next generation data lake, with the hope of democratizing data at scale to provide business insights and ultimately make automated intelligent decisions. Data platforms based on the data lake architecture have common failure modes that lead to unfulfilled promises at scale. To address these failure modes we need to shift from the centralized paradigm of a lake, or its predecessor data warehouse. We need to shift to a paradigm that draws from modern distributed architecture: considering domains as the first class concern, applying platform thinking to create self-serve data infrastructure, and treating data as a product.
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...Dr. Arif Wider
A talk presented by Max Schultze from Zalando and Arif Wider from ThoughtWorks at NDC Oslo 2020.
Abstract:
The Data Lake paradigm is often considered the scalable successor of the more curated Data Warehouse approach when it comes to democratization of data. However, many who went out to build a centralized Data Lake came out with a data swamp of unclear responsibilities, a lack of data ownership, and sub-par data availability.
At Zalando - europe’s biggest online fashion retailer - we realised that accessibility and availability at scale can only be guaranteed when moving more responsibilities to those who pick up the data and have the respective domain knowledge - the data owners - while keeping only data governance and metadata information central. Such a decentralized and domain focused approach has recently been coined a Data Mesh.
The Data Mesh paradigm promotes the concept of Data Products which go beyond sharing of files and towards guarantees of quality and acknowledgement of data ownership.
This talk will take you on a journey of how we went from a centralized Data Lake to embrace a distributed Data Mesh architecture and will outline the ongoing efforts to make creation of data products as simple as applying a template.
DI&A Slides: Data Lake vs. Data WarehouseDATAVERSITY
Modern data analysis is moving beyond the Data Warehouse to the Data Lake where analysts are able to take advantage of emerging technologies to manage complex analytics on large data volumes and diverse data types. Yet, for some business problems, a Data Warehouse may still be the right solution.
If you’re on the fence, join this webinar as we compare and contrast Data Lakes and Data Warehouses, identifying situations where one approach may be better than the other and highlighting how the two can work together.
Get tips, takeaways and best practices about:
- The benefits and problems of a Data Warehouse
- How a Data Lake can solve the problems of a Data Warehouse
- Data Lake Architecture
- How Data Warehouses and Data Lakes can work together
Netflix is a famously data-driven company. Data is used to make informed decisions on everything from content acquisition to content delivery, and everything in-between. As with any data-driven company, it’s critical that data used by the business is accurate. Or, at worst, that the business has visibility into potential quality issues as soon as they arise. But even in the most mature data warehouses, data quality can be hard. How can we ensure high quality in a cloud-based, internet-scale, modern big data warehouse employing a variety of data engineering technologies?
In this talk, Michelle Ufford will share how the Data Engineering & Analytics team at Netflix is doing exactly that. We’ll kick things off with a quick overview of Netflix’s analytics environment, then dig into the architecture of our current data quality solution. We’ll cover what worked, what didn’t work so well, and what we're working on next. We’ll conclude with some tips & lessons learned for ensuring high quality on big data.
This talk was presented at DataWorks/Hadoop Summit 2017 on June 13, 2017.
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...DATAVERSITY
A solid data architecture is critical to the success of any data initiative. But what is meant by “data architecture”? Throughout the industry, there are many different “flavors” of data architecture, each with its own unique value and use cases for describing key aspects of the data landscape. Join this webinar to demystify the various architecture styles and understand how they can add value to your organization.
This is Part 4 of the GoldenGate series on Data Mesh - a series of webinars helping customers understand how to move off of old-fashioned monolithic data integration architecture and get ready for more agile, cost-effective, event-driven solutions. The Data Mesh is a kind of Data Fabric that emphasizes business-led data products running on event-driven streaming architectures, serverless, and microservices based platforms. These emerging solutions are essential for enterprises that run data-driven services on multi-cloud, multi-vendor ecosystems.
Join this session to get a fresh look at Data Mesh; we'll start with core architecture principles (vendor agnostic) and transition into detailed examples of how Oracle's GoldenGate platform is providing capabilities today. We will discuss essential technical characteristics of a Data Mesh solution, and the benefits that business owners can expect by moving IT in this direction. For more background on Data Mesh, Part 1, 2, and 3 are on the GoldenGate YouTube channel: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
Mr. Pollock is an expert technology leader for data platforms, big data, data integration and governance. Jeff has been CTO at California startups and a senior exec at Fortune 100 tech vendors. He is currently Oracle VP of Products and Cloud Services for Data Replication, Streaming Data and Database Migrations. While at IBM, he was head of all Information Integration, Replication and Governance products, and previously Jeff was an independent architect for US Defense Department, VP of Technology at Cerebra and CTO of Modulant – he has been engineering artificial intelligence based data platforms since 2001. As a business consultant, Mr. Pollock was a Head Architect at Ernst & Young’s Center for Technology Enablement. Jeff is also the author of “Semantic Web for Dummies” and "Adaptive Information,” a frequent keynote at industry conferences, author for books and industry journals, formerly a contributing member of W3C and OASIS, and an engineering instructor with UC Berkeley’s Extension for object-oriented systems, software development process and enterprise architecture.
Building Lakehouses on Delta Lake with SQL Analytics PrimerDatabricks
You’ve heard the marketing buzz, maybe you have been to a workshop and worked with some Spark, Delta, SQL, Python, or R, but you still need some help putting all the pieces together? Join us as we review some common techniques to build a lakehouse using Delta Lake, use SQL Analytics to perform exploratory analysis, and build connectivity for BI applications.
Making Data Timelier and More Reliable with Lakehouse TechnologyMatei Zaharia
Enterprise data architectures usually contain many systems—data lakes, message queues, and data warehouses—that data must pass through before it can be analyzed. Each transfer step between systems adds a delay and a potential source of errors. What if we could remove all these steps? In recent years, cloud storage and new open source systems have enabled a radically new architecture: the lakehouse, an ACID transactional layer over cloud storage that can provide streaming, management features, indexing, and high-performance access similar to a data warehouse. Thousands of organizations including the largest Internet companies are now using lakehouses to replace separate data lake, warehouse and streaming systems and deliver high-quality data faster internally. I’ll discuss the key trends and recent advances in this area based on Delta Lake, the most widely used open source lakehouse platform, which was developed at Databricks.
Enterprise Architecture vs. Data ArchitectureDATAVERSITY
Enterprise Architecture (EA) provides a visual blueprint of the organization, and shows key interrelationships between data, process, applications, and more. By abstracting these assets in a graphical view, it’s possible to see key interrelationships, particularly as they relate to data and its business impact across the organization. Join us for a discussion on how Data Architecture is a key component of an overall Enterprise Architecture for enhanced business value and success.
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...HostedbyConfluent
Apache Kafka With Spark Structured Streaming With Emma LIU, Nitin Saksena, Ram Dhakne | Current 2022
A well-architected data lakehouse provides an open data platform that combines streaming with data warehousing, data engineering, data science and ML. This opens a world beyond streaming to solving business problems in real-time with analytics and AI. See how companies like Albertsons have used Databricks and Confluent together to combine Kafka streaming with Databricks for their digital transformation.
In this talk, you will learn:
- The built-in streaming capabilities of a lakehouse
- Best practices for integrating Kafka with Spark Structured Streaming
- How Albertsons architected their data platform for real-time data processing and real-time analytics
Embarking on building a modern data warehouse in the cloud can be an overwhelming experience due to the sheer number of products that can be used, especially when the use cases for many products overlap others. In this talk I will cover the use cases of many of the Microsoft products that you can use when building a modern data warehouse, broken down into four areas: ingest, store, prep, and model & serve. It’s a complicated story that I will try to simplify, giving blunt opinions of when to use what products and the pros/cons of each.
A View on AI in Insurance - Chris Madsen - H2O AI World London 2018Sri Ambati
This talk was recorded in London on October 30th, 2018 and can be viewed here: https://youtu.be/LFVIGMMlfhI
A view on what is driving AI and ML developments in insurance and why.
• What is driving the change in insurance and why is AI/ML so important?
• What does the future look like?
• Which AI/ML use cases are being worked on in the industry?
• Which ones are needed?
Chris Madsen is Chairman and CEO of Blue Square Re N.V., Aegon’s internal reinsurer and a company he co-founded in 2010.
Mr. Madsen holds a Masters in Engineering from Princeton University in Princeton, USA. His undergraduate degree is in Mathematics and Economics. He is an Associate of the Society of Actuaries, a Member of the American Academy of Actuaries and a Chartered Financial Analyst.
He started his professional career in New York in 1990, working as Consulting Actuary and later Principal. Mr. Madsen has published numerous articles on innovative underwriting risk solutions and is a frequent speaker on the topic and related developments.
Mr. Madsen is an avid proponent and driver of integrating start-up and insurtech expertise into insurance solutions - including internet-of-things applications as well as blockchain initiatives such as “B3i”. He is also responsible for the ground-breaking longevity solutions that Aegon brought to the capital markets totalling over EUR 20bn of reserves.
Critical Relationships for HR Professionals to Mitigate Risks and Navigate Ch...Aggregage
HR professionals know that risk mitigation is a complicated subject to broach. As COVID continues to create new opportunities and risks for organizations, HR professionals must be prepared to balance the needs of the workforce with the needs of the organization. The pace of certain changes in technology, however, has made it hard to support the organization in isolation. Wouldn't you like to better understand which changes make your role more effective?
Whether you support an in-office, hybrid, or fully remote workforce, there are measures and technologies to implement that can help create a more inclusive and resilient organization, reduce the risk of workplace violence, and bolster safety.
Join Kayne McGladrey, CISSP, Cybersecurity Strategist at Ascent Solutions for this discussion on the intersection of ethics and technology. In this session, you will learn how to mitigate risks by building key relationships with three different parts of your organization:
• The digital collaboration team
• The cybersecurity team
• The executive leadership team
Globally, the extensive use of smartphone devices has led to an increase in storage and transmission of enormous volumes of data that could be potentially be used as digital evidence in a forensic investigation. Digital evidence can sometimes be difficult to extract from these devices given the various versions and models of smartphone devices in the market. Forensic analysis of smartphones to extract digital evidence can be carried out in many ways, however, prior knowledge of smartphone forensic tools is paramount to a successful forensic investigation. In this paper, the authors outline challenges, limitations and reliability issues faced when using smartphone device forensic tools and accompanied forensic techniques. The main objective of this paper is intended to be consciousness-raising than suggesting best practices to these forensic work challenges.
Data Driven Disruption - Why Marketing and Advertising in WA lags - ADMA WA 2...Coert Du Plessis (杜康)
WA is in a state of rapid transformation with the changes in Energy, Resources and support industries. At ADMA WA's 2015 annual conference, we explored why disruptive data activity in Marketing and Advertising is lagging the East Coast and Global stage
Improving Life With Connected Medical DevicesAryanRaj496746
IoT and 5G has bring a technological reform in every field. Healthcare sector is one such field which got most benefitted with these technologies. In this document, we have discussed about benefits of IoT in medical and advantages of connected medical devices. We have also discussed about market prospective and future opportunity of this technology in healthcare taking reference from Delvens' market analysis report of global medical device connectivity market.
Preparing Testimony about Cellebrite UFED In a Daubert or Frye HearingCellebrite
The Cellebrite UFED is among the best known and most used mobile forensic extraction and analysis tools in the digital forensics industry. However, its complex technical processes are not as well understood outside of training. The following information is presented in an effort to help U.S.-based attorneys prepare themselves and their witnesses for Daubert, Frye, or related challenges to the admissibility of UFED-extracted mobile device evidence.
Architecting, designing and building medical devices in an outcomes focused B...Shahid Shah
Keeping your medical device designs relevant in an era of value based and outcome driven care is not easy. In this talk, I cover the following topics:
* “Connected EHRs”, device interoperability, and “Accountable Tech” are the future of med devices
* Hardware, sensors, and software are transient businesses but data lives forever. He who owns, integrates, and uses data wins in the end.
* Data from devices is too important and specialized to be left to software vendors, managed service providers, and system integrators.
Outline
Value Based Healthcare System – How it is seen today
Healthcare Challenge & IoT as a Solution
IoT – Big Data Structure
Recent Trends in IoT Big Data Analytics
Challenges & Our Future
In-depth Knowledge of
What causes the most premature death?
Distribution of Disease burden from 1990 - 2020
Challenges in Healthcare
Future Healthcare
IoT Machine Talking to Machine
Prediction of IoT Usage
About PEPGRA HEALTHCARE,
A leading healthcare communication firm with years of excellence serving clients with a dedicated team of Medical, Regulatory and Scientific writers specialized in all therapeutic areas.
Contact us at :
UK: +44-1143520021
US/Canada: +1-972-502-9262
India: +91-8754446690
info@pepgra.com
www.pepgra.com
The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
In this session, learn how to quickly supplement your on-premises Hadoop environment with a simple, open, and collaborative cloud architecture that enables you to generate greater value with scaled application of analytics and AI on all your data. You will also learn five critical steps for a successful migration to the Databricks Lakehouse Platform along with the resources available to help you begin to re-skill your data teams.
Democratizing Data Quality Through a Centralized PlatformDatabricks
Bad data leads to bad decisions and broken customer experiences. Organizations depend on complete and accurate data to power their business, maintain efficiency, and uphold customer trust. With thousands of datasets and pipelines running, how do we ensure that all data meets quality standards, and that expectations are clear between producers and consumers? Investing in shared, flexible components and practices for monitoring data health is crucial for a complex data organization to rapidly and effectively scale.
At Zillow, we built a centralized platform to meet our data quality needs across stakeholders. The platform is accessible to engineers, scientists, and analysts, and seamlessly integrates with existing data pipelines and data discovery tools. In this presentation, we will provide an overview of our platform’s capabilities, including:
Giving producers and consumers the ability to define and view data quality expectations using a self-service onboarding portal
Performing data quality validations using libraries built to work with spark
Dynamically generating pipelines that can be abstracted away from users
Flagging data that doesn’t meet quality standards at the earliest stage and giving producers the opportunity to resolve issues before use by downstream consumers
Exposing data quality metrics alongside each dataset to provide producers and consumers with a comprehensive picture of health over time
Why APM Is Not the Same As ML MonitoringDatabricks
Application performance monitoring (APM) has become the cornerstone of software engineering allowing engineering teams to quickly identify and remedy production issues. However, as the world moves to intelligent software applications that are built using machine learning, traditional APM quickly becomes insufficient to identify and remedy production issues encountered in these modern software applications.
As a lead software engineer at NewRelic, my team built high-performance monitoring systems including Insights, Mobile, and SixthSense. As I transitioned to building ML Monitoring software, I found the architectural principles and design choices underlying APM to not be a good fit for this brand new world. In fact, blindly following APM designs led us down paths that would have been better left unexplored.
In this talk, I draw upon my (and my team’s) experience building an ML Monitoring system from the ground up and deploying it on customer workloads running large-scale ML training with Spark as well as real-time inference systems. I will highlight how the key principles and architectural choices of APM don’t apply to ML monitoring. You’ll learn why, understand what ML Monitoring can successfully borrow from APM, and hear what is required to build a scalable, robust ML Monitoring architecture.
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
Autonomy and ownership are core to working at Stitch Fix, particularly on the Algorithms team. We enable data scientists to deploy and operate their models independently, with minimal need for handoffs or gatekeeping. By writing a simple function and calling out to an intuitive API, data scientists can harness a suite of platform-provided tooling meant to make ML operations easy. In this talk, we will dive into the abstractions the Data Platform team has built to enable this. We will go over the interface data scientists use to specify a model and what that hooks into, including online deployment, batch execution on Spark, and metrics tracking and visualization.
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
In this talk, I will dive into the stage level scheduling feature added to Apache Spark 3.1. Stage level scheduling extends upon Project Hydrogen by improving big data ETL and AI integration and also enables multiple other use cases. It is beneficial any time the user wants to change container resources between stages in a single Apache Spark application, whether those resources are CPU, Memory or GPUs. One of the most popular use cases is enabling end-to-end scalable Deep Learning and AI to efficiently use GPU resources. In this type of use case, users read from a distributed file system, do data manipulation and filtering to get the data into a format that the Deep Learning algorithm needs for training or inference and then sends the data into a Deep Learning algorithm. Using stage level scheduling combined with accelerator aware scheduling enables users to seamlessly go from ETL to Deep Learning running on the GPU by adjusting the container requirements for different stages in Spark within the same application. This makes writing these applications easier and can help with hardware utilization and costs.
There are other ETL use cases where users want to change CPU and memory resources between stages, for instance there is data skew or perhaps the data size is much larger in certain stages of the application. In this talk, I will go over the feature details, cluster requirements, the API and use cases. I will demo how the stage level scheduling API can be used by Horovod to seamlessly go from data preparation to training using the Tensorflow Keras API using GPUs.
The talk will also touch on other new Apache Spark 3.1 functionality, such as pluggable caching, which can be used to enable faster dataframe access when operating from GPUs.
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
In this talk, I would like to introduce an open-source tool built by our team that simplifies the data conversion from Apache Spark to deep learning frameworks.
Imagine you have a large dataset, say 20 GBs, and you want to use it to train a TensorFlow model. Before feeding the data to the model, you need to clean and preprocess your data using Spark. Now you have your dataset in a Spark DataFrame. When it comes to the training part, you may have the problem: How can I convert my Spark DataFrame to some format recognized by my TensorFlow model?
The existing data conversion process can be tedious. For example, to convert an Apache Spark DataFrame to a TensorFlow Dataset file format, you need to either save the Apache Spark DataFrame on a distributed filesystem in parquet format and load the converted data with third-party tools such as Petastorm, or save it directly in TFRecord files with spark-tensorflow-connector and load it back using TFRecordDataset. Both approaches take more than 20 lines of code to manage the intermediate data files, rely on different parsing syntax, and require extra attention for handling vector columns in the Spark DataFrames. In short, all these engineering frictions greatly reduced the data scientists’ productivity.
The Databricks Machine Learning team contributed a new Spark Dataset Converter API to Petastorm to simplify these tedious data conversion process steps. With the new API, it takes a few lines of code to convert a Spark DataFrame to a TensorFlow Dataset or a PyTorch DataLoader with default parameters.
In the talk, I will use an example to show how to use the Spark Dataset Converter to train a Tensorflow model and how simple it is to go from single-node training to distributed training on Databricks.
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
There is no doubt Kubernetes has emerged as the next generation of cloud native infrastructure to support a wide variety of distributed workloads. Apache Spark has evolved to run both Machine Learning and large scale analytics workloads. There is growing interest in running Apache Spark natively on Kubernetes. By combining the flexibility of Kubernetes and scalable data processing with Apache Spark, you can run any data and machine pipelines on this infrastructure while effectively utilizing resources at disposal.
In this talk, Rajesh Thallam and Sougata Biswas will share how to effectively run your Apache Spark applications on Google Kubernetes Engine (GKE) and Google Cloud Dataproc, orchestrate the data and machine learning pipelines with managed Apache Airflow on GKE (Google Cloud Composer). Following topics will be covered: – Understanding key traits of Apache Spark on Kubernetes- Things to know when running Apache Spark on Kubernetes such as autoscaling- Demonstrate running analytics pipelines on Apache Spark orchestrated with Apache Airflow on Kubernetes cluster.
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
Pipelines have become ubiquitous, as the need for stringing multiple functions to compose applications has gained adoption and popularity. Common pipeline abstractions such as “fit” and “transform” are even shared across divergent platforms such as Python Scikit-Learn and Apache Spark.
Scaling pipelines at the level of simple functions is desirable for many AI applications, however is not directly supported by Ray’s parallelism primitives. In this talk, Raghu will describe a pipeline abstraction that takes advantage of Ray’s compute model to efficiently scale arbitrarily complex pipeline workflows. He will demonstrate how this abstraction cleanly unifies pipeline workflows across multiple platforms such as Scikit-Learn and Spark, and achieves nearly optimal scale-out parallelism on pipelined computations.
Attendees will learn how pipelined workflows can be mapped to Ray’s compute model and how they can both unify and accelerate their pipelines with Ray.
Sawtooth Windows for Feature AggregationsDatabricks
In this talk about zipline, we will introduce a new type of windowing construct called a sawtooth window. We will describe various properties about sawtooth windows that we utilize to achieve online-offline consistency, while still maintaining high-throughput, low-read latency and tunable write latency for serving machine learning features.We will also talk about a simple deployment strategy for correcting feature drift – due operations that are not “abelian groups”, that operate over change data.
We want to present multiple anti patterns utilizing Redis in unconventional ways to get the maximum out of Apache Spark.All examples presented are tried and tested in production at Scale at Adobe. The most common integration is spark-redis which interfaces with Redis as a Dataframe backing Store or as an upstream for Structured Streaming. We deviate from the common use cases to explore where Redis can plug gaps while scaling out high throughput applications in Spark.
Niche 1 : Long Running Spark Batch Job – Dispatch New Jobs by polling a Redis Queue
· Why?
o Custom queries on top a table; We load the data once and query N times
· Why not Structured Streaming
· Working Solution using Redis
Niche 2 : Distributed Counters
· Problems with Spark Accumulators
· Utilize Redis Hashes as distributed counters
· Precautions for retries and speculative execution
· Pipelining to improve performance
Re-imagine Data Monitoring with whylogs and SparkDatabricks
In the era of microservices, decentralized ML architectures and complex data pipelines, data quality has become a bigger challenge than ever. When data is involved in complex business processes and decisions, bad data can, and will, affect the bottom line. As a result, ensuring data quality across the entire ML pipeline is both costly, and cumbersome while data monitoring is often fragmented and performed ad hoc. To address these challenges, we built whylogs, an open source standard for data logging. It is a lightweight data profiling library that enables end-to-end data profiling across the entire software stack. The library implements a language and platform agnostic approach to data quality and data monitoring. It can work with different modes of data operations, including streaming, batch and IoT data.
In this talk, we will provide an overview of the whylogs architecture, including its lightweight statistical data collection approach and various integrations. We will demonstrate how the whylogs integration with Apache Spark achieves large scale data profiling, and we will show how users can apply this integration into existing data and ML pipelines.
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
Machine learning (ML) models are typically part of prediction queries that consist of a data processing part (e.g., for joining, filtering, cleaning, featurization) and an ML part invoking one or more trained models. In this presentation, we identify significant and unexplored opportunities for optimization. To the best of our knowledge, this is the first effort to look at prediction queries holistically, optimizing across both the ML and SQL components.
We will present Raven, an end-to-end optimizer for prediction queries. Raven relies on a unified intermediate representation that captures both data processing and ML operators in a single graph structure.
This allows us to introduce optimization rules that
(i) reduce unnecessary computations by passing information between the data processing and ML operators
(ii) leverage operator transformations (e.g., turning a decision tree to a SQL expression or an equivalent neural network) to map operators to the right execution engine, and
(iii) integrate compiler techniques to take advantage of the most efficient hardware backend (e.g., CPU, GPU) for each operator.
We have implemented Raven as an extension to Spark’s Catalyst optimizer to enable the optimization of SparkSQL prediction queries. Our implementation also allows the optimization of prediction queries in SQL Server. As we will show, Raven is capable of improving prediction query performance on Apache Spark and SQL Server by up to 13.1x and 330x, respectively. For complex models, where GPU acceleration is beneficial, Raven provides up to 8x speedup compared to state-of-the-art systems. As part of the presentation, we will also give a demo showcasing Raven in action.
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
Semantic segmentation is the classification of every pixel in an image/video. The segmentation partitions a digital image into multiple objects to simplify/change the representation of the image into something that is more meaningful and easier to analyze [1][2]. The technique has a wide variety of applications ranging from perception in autonomous driving scenarios to cancer cell segmentation for medical diagnosis.
Exponential growth in the datasets that require such segmentation is driven by improvements in the accuracy and quality of the sensors generating the data extending to 3D point cloud data. This growth is further compounded by exponential advances in cloud technologies enabling the storage and compute available for such applications. The need for semantically segmented datasets is a key requirement to improve the accuracy of inference engines that are built upon them.
Streamlining the accuracy and efficiency of these systems directly affects the value of the business outcome for organizations that are developing such functionalities as a part of their AI strategy.
This presentation details workflows for labeling, preprocessing, modeling, and evaluating performance/accuracy. Scientists and engineers leverage domain-specific features/tools that support the entire workflow from labeling the ground truth, handling data from a wide variety of sources/formats, developing models and finally deploying these models. Users can scale their deployments optimally on GPU-based cloud infrastructure to build accelerated training and inference pipelines while working with big datasets. These environments are optimized for engineers to develop such functionality with ease and then scale against large datasets with Spark-based clusters on the cloud.
Massive Data Processing in Adobe Using Delta LakeDatabricks
At Adobe Experience Platform, we ingest TBs of data every day and manage PBs of data for our customers as part of the Unified Profile Offering. At the heart of this is a bunch of complex ingestion of a mix of normalized and denormalized data with various linkage scenarios power by a central Identity Linking Graph. This helps power various marketing scenarios that are activated in multiple platforms and channels like email, advertisements etc. We will go over how we built a cost effective and scalable data pipeline using Apache Spark and Delta Lake and share our experiences.
What are we storing?
Multi Source – Multi Channel Problem
Data Representation and Nested Schema Evolution
Performance Trade Offs with Various formats
Go over anti-patterns used
(String FTW)
Data Manipulation using UDFs
Writer Worries and How to Wipe them Away
Staging Tables FTW
Datalake Replication Lag Tracking
Performance Time!
Machine Learning CI/CD for Email Attack DetectionDatabricks
Detecting advanced email attacks at scale is a challenging ML problem, particularly due to the rarity of attacks, adversarial nature of the problem, and scale of data. In order to move quickly and adapt to the newest threat we needed to build a Continuous Integration / Continuous Delivery pipeline for the entire ML detection stack. Our goal is to enable detection engineers and data scientists to make changes to any part of the stack including joined datasets for hydration, feature extraction code, detection logic, and develop/train ML models.
In this talk, we discuss why we decided to build this pipeline, how it is used to accelerate development and ensure quality, and dive into the nitty-gritty details of building such a system on top of an Apache Spark + Databricks stack.
Jeeves Grows Up: An AI Chatbot for Performance and QualityDatabricks
Sarah: CEO-Finance-Report pipeline seems to be slow today. Why
Jeeves: SparkSQL query dbt_fin_model in CEO-Finance-Report is running 53% slower on 2/28/2021. Data skew issue detected. Issue has not been seen in last 90 days.
Jeeves: Adding 5 more nodes to cluster recommended for CEO-Finance-Report to finish in its 99th percentile time of 5.2 hours.
Who is Jeeves? An experienced Spark developer? A seasoned administrator? No, Jeeves is a chatbot created to simplify data operations management for enterprise Spark clusters. This chatbot is powered by advanced AI algorithms and an intuitive conversational interface that together provide answers to get users in and out of problems quickly. Instead of being stuck to screens displaying logs and metrics, users can now have a more refreshing experience via a two-way conversation with their own personal Spark expert.
We presented Jeeves at Spark Summit 2019. In the two years since, Jeeves has grown up a lot. Jeeves can now learn continuously as telemetry information streams in from more and more applications, especially SQL queries. Jeeves now “knows” about data pipelines that have many components. Jeeves can also answer questions about data quality in addition to performance, cost, failures, and SLAs. For example:
Tom: I am not seeing any data for today in my Campaign Metrics Dashboard.
Jeeves: 3/5 validations failed on the cmp_kpis table on 2/28/2021. Run of pipeline cmp_incremental_daily failed on 2/28/2021.
This talk will give an overview of the newer capabilities of the chatbot, and how it now fits in a modern data stack with the emergence of new data roles like analytics engineers and machine learning engineers. You will learn how to build chatbots that tackle your complex data operations challenges.
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + FugueDatabricks
Hyperparameter tuning is critical in model development. And its general form: parameter tuning with an objective function is also widely used in industry. On the other hand, Apache Spark can handle massive parallelism, and Apache Spark ML is a solid machine learning solution.
But we have not seen a general and intuitive distributed parameter tuning solution based on Apache Spark, why?
Not every tuning problem is on Apache Spark ML models. How can Apache Spark handle general models?
Not every tuning problem is a parallelizable grid or random search. Bayesian optimization is sequential, how can Apache Spark help in this case?
Not every tuning problem is single epoch, deep learning is not. How to fit algos such as hyperband and ASHA into Apache Spark?
Not every tuning problem is a machine learning problem, for example simulation + tuning is also common. How to generalize?
In this talk, we are going to show how using Fugue-Tune and Apache Spark together can eliminate these painpoints
Fugue-Tune like Fugue, is a “super framework” – an absraction layer unifying existing solutions such as Hyperopt and Optuna
It firstly models the general tuning problems, independent from machine learning
It is designed for both small and large scale problems. It can always fully parallelize the distributable part of a tuning problem
It works for both classical and deep learning models. With Fugue, running hyperband and ASHA becomes possible on Apache Spark.
In the demo, you will see how to do any type of tuning in a consistent, intuitive, scalable and minimal way. And you will see a live demo of the amazing performance.
When it comes to Large Scale data processing and Machine Learning, Apache Spark is no doubt one of the top battle-tested frameworks out there for handling batched or streaming workloads. The ease of use, built-in Machine Learning modules, and multi-language support makes it a very attractive choice for data wonks. However bootstrapping and getting off the ground could be difficult for most teams without leveraging a Spark cluster that is already pre-provisioned and provided as a managed service in the Cloud, while this is a very attractive choice to get going, in the long run, it could be a very expensive option if it’s not well managed.
As an alternative to this approach, our team has been exploring and working a lot with running Spark and all our Machine Learning workloads and pipelines as containerized Docker packages on Kubernetes. This provides an infrastructure-agnostic abstraction layer for us, and as a result, it improves our operational efficiency and reduces our overall compute cost. Most importantly, we can easily target our Spark workload deployment to run on any major Cloud or On-prem infrastructure (with Kubernetes as the common denominator) by just modifying a few configurations.
In this talk, we will walk you through the process our team follows to make it easy for us to run a production deployment of our Machine Learning workloads and pipelines on Kubernetes which seamlessly allows us to port our implementation from a local Kubernetes set up on the laptop during development to either an On-prem or Cloud Kubernetes environment
Improving Apache Spark for Dynamic Allocation and Spot InstancesDatabricks
This presentation will explore the new work in Spark 3.1 adding the concept of graceful decommissioning and how we can use this to improve Spark’s performance in both dynamic allocation and spot/preemptable instances. Together we’ll explore how Spark’s dynamic allocation has evolved over time, and why the different changes have been needed. We’ll also look at the multi-company collaboration that resulted in being able to deliver this feature and I’ll end with encouraging pointers on how to get more involved in Spark’s development.
The session includes a brief history of the evolution of search before diving into the roles technology, content, and links play in developing a powerful SEO strategy in a world of Generative AI and social search. Discover how to optimize for TikTok searches, Google's Gemini, and Search Generative Experience while developing a powerful arsenal of tools and templates to help maximize the effectiveness of your SEO initiatives.
Key Takeaways:
Understand how search engines work
Be able to find out where your users search
Know what is required for each discipline of SEO
Feel confident creating an SEO Plan
Confidently measure SEO performance
When most people in the industry talk about online or digital reputation management, what they're really saying is Google search and PPC. And it's usually reactive, left dealing with the aftermath of negative information published somewhere online. That's outdated. It leaves executives, organizations and other high-profile individuals at a high risk of a digital reputation attack that spans channels and tactics. But the tools needed to safeguard against an attack are more cybersecurity-oriented than most marketing and communications professionals can manage. Business leaders Leaders grasp the importance; 83% of executives place reputation in their top five areas of risk, yet only 23% are confident in their ability to address it. To succeed in 2024 and beyond, you need to turn online reputation on its axis and think like an attacker.\
Key Takeaways:
- New framework for examining and safeguarding an online reputation
- Tools and techniques to keep you a step ahead
- Practical examples that demonstrate when to act, how to act and how to recover
How to Run Landing Page Tests On and Off Paid Social PlatformsVWO
Join us for an exclusive webinar featuring Mariate, Alexandra and Nima where we will unveil a comprehensive blueprint for crafting a successful paid media strategy focused on landing page testing.With escalating costs in paid advertising, understanding how to maximize each visitor’s experience is crucial for retention and conversion.
This session will dive into the methodologies for executing and analyzing landing page tests within paid social channels, offering a blend of theoretical knowledge and practical insights.
The Pearmill team will guide you through the nuances of setting up and managing landing page experiments on paid social platforms. You will learn about the critical rules to follow, the structure of effective tests, optimal conversion duration and budget allocation.
The session will also cover data analysis techniques and criteria for graduating landing pages.
In the second part of the webinar, Pearmill will explore the use of A/B testing platforms. Discover common pitfalls to avoid in A/B testing and gain insights into analyzing A/B tests results effectively.
It's another new era of digital and marketers are faced with making big bets on their digital strategy. If you are looking at modernizing your tech stack to support your digital evolution, there are a few can't miss (often overlooked) areas that should be part of every conversation. We'll cover setting your vision, avoiding siloes, adding a democratized approach to data strategy, localization, creating critical governance requirements and more. Attendees will walk away with actions they can take into initiatives they are running today and consider for the future.
For too many years marketing and sales have operated in silos...while in some forward thinking companies, the two organizations work together to drive new opportunity development and revenue. This session will explore the lessons learned in that beautiful dance that can occur when marketing and sales work together...to drive new opportunity development, account expansion and customer satisfaction.
No, this is not a conversation about MQLs and SQLs. Instead we will focus on a framework that allows the two organizations to drive company success together.
Mastering Local SEO for Service Businesses in the AI Era is tailored specifically for local service providers like plumbers, dentists, and others seeking to dominate their local search landscape. This session delves into leveraging AI advancements to enhance your online visibility and search rankings through the Content Factory model, designed for creating high-impact, SEO-driven content. Discover the Dollar-a-Day advertising strategy, a cost-effective approach to boost your local SEO efforts and attract more customers with minimal investment. Gain practical insights on optimizing your online presence to meet the specific needs of local service seekers, ensuring your business not only appears but stands out in local searches. This concise, action-oriented workshop is your roadmap to navigating the complexities of digital marketing in the AI age, driving more leads, conversions, and ultimately, success for your local service business.
Key Takeaways:
Embrace AI for Local SEO: Learn to harness the power of AI technologies to optimize your website and content for local search. Understand the pivotal role AI plays in analyzing search trends and consumer behavior, enabling you to tailor your SEO strategies to meet the specific demands of your target local audience. Leverage the Content Factory Model: Discover the step-by-step process of creating SEO-optimized content at scale. This approach ensures a steady stream of high-quality content that engages local customers and boosts your search rankings. Get an action guide on implementing this model, complete with templates and scheduling strategies to maintain a consistent online presence. Maximize ROI with Dollar-a-Day Advertising: Dive into the cost-effective Dollar-a-Day advertising strategy that amplifies your visibility in local searches without breaking the bank. Learn how to strategically allocate your budget across platforms to target potential local customers effectively. The session includes an action guide on setting up, monitoring, and optimizing your ad campaigns to ensure maximum impact with minimal investment.
Videos are more engaging, more memorable, and more popular than any other type of content out there. That’s why it’s estimated that 82% of consumer traffic will come from videos by 2025.
And with videos evolving from landscape to portrait and experts promoting shorter clips, one thing remains constant – our brains LOVE videos.
So is there science behind what makes people absolutely irresistible on camera?
The answer: definitely yes.
In this jam-packed session with Stephanie Garcia, you’ll get your hands on a steal-worthy guide that uncovers the art and science to being irresistible on camera. From body language to words that convert, she’ll show you how to captivate on command so that viewers are excited and ready to take action.
When most people in the industry talk about online or digital reputation management, what they're really saying is Google search and PPC. And it's usually reactive, left dealing with the aftermath of negative information published somewhere online. That's outdated. It leaves executives, organizations and other high-profile individuals at a high risk of a digital reputation attack that spans channels and tactics. But the tools needed to safeguard against an attack are more cybersecurity-oriented than most marketing and communications professionals can manage. Business leaders Leaders grasp the importance; 83% of executives place reputation in their top five areas of risk, yet only 23% are confident in their ability to address it. To succeed in 2024 and beyond, you need to turn online reputation on its axis and think like an attacker.
Key Takeaways:
- New framework for examining and safeguarding an online reputation
- Tools and techniques to keep you a step ahead
- Practical examples that demonstrate when to act, how to act and how to recover
Most small businesses struggle to see marketing results. In this session, we will eliminate any confusion about what to do next, solving your marketing problems so your business can thrive. You’ll learn how to create a foundational marketing OS (operating system) based on neuroscience and backed by real-world results. You’ll be taught how to develop deep customer connections, and how to have your CRM dynamically segment and sell at any stage in the customer’s journey. By the end of the session, you’ll remove confusion and chaos and replace it with clarity and confidence for long-term marketing success.
Key Takeaways:
• Uncover the power of a foundational marketing system that dynamically communicates with prospects and customers on autopilot.
• Harness neuroscience and Tribal Alignment to transform your communication strategies, turning potential clients into fans and those fans into loyal customers.
• Discover the art of automated segmentation, pinpointing your most lucrative customers and identifying the optimal moments for successful conversions.
• Streamline your business with a content production plan that eliminates guesswork, wasted time, and money.
Financial curveballs sent many American families reeling in 2023. Household budgets were squeezed by rising interest rates, surging prices on everyday goods, and a stagnating housing market. Consumers were feeling strapped. That sentiment, however, appears to be waning. The question is, to what extent?
To take the pulse of consumers’ feelings about their financial well-being ahead of a highly anticipated election, ThinkNow conducted a nationally representative quantitative survey. The survey highlights consumers’ hopes and anxieties as we move into 2024. Let's unpack the key findings to gain insights about where we stand.
Mastering Local SEO for Service Businesses in the AI Era is tailored specifically for local service providers like plumbers, dentists, and others seeking to dominate their local search landscape. This session delves into leveraging AI advancements to enhance your online visibility and search rankings through the Content Factory model, designed for creating high-impact, SEO-driven content. Discover the Dollar-a-Day advertising strategy, a cost-effective approach to boost your local SEO efforts and attract more customers with minimal investment. Gain practical insights on optimizing your online presence to meet the specific needs of local service seekers, ensuring your business not only appears but stands out in local searches. This concise, action-oriented workshop is your roadmap to navigating the complexities of digital marketing in the AI age, driving more leads, conversions, and ultimately, success for your local service business.
Key Takeaways:
Embrace AI for Local SEO: Learn to harness the power of AI technologies to optimize your website and content for local search. Understand the pivotal role AI plays in analyzing search trends and consumer behavior, enabling you to tailor your SEO strategies to meet the specific demands of your target local audience. Leverage the Content Factory Model: Discover the step-by-step process of creating SEO-optimized content at scale. This approach ensures a steady stream of high-quality content that engages local customers and boosts your search rankings. Get an action guide on implementing this model, complete with templates and scheduling strategies to maintain a consistent online presence. Maximize ROI with Dollar-a-Day Advertising: Dive into the cost-effective Dollar-a-Day advertising strategy that amplifies your visibility in local searches without breaking the bank. Learn how to strategically allocate your budget across platforms to target potential local customers effectively. The session includes an action guide on setting up, monitoring, and optimizing your ad campaigns to ensure maximum impact with minimal investment.
In this presentation, Danny Leibrandt explains the impact of AI on SEO and what Google has been doing about it. Learn how to take your SEO game to the next level and win over Google with his new strategy anyone can use. Get actionable steps to rank your name, your business, and your clients on Google - the right way.
Key Takeaways:
1. Real content is king
2. Find ways to show EEAT
3. Repurpose across all platforms
Digital Money Maker Club – von Gunnar Kessler digital.focsh890
Title One is a comprehensive examination of the impact of digital technologies on
modern society. In a world where technology continues to advance rapidly, this article delves into the nuances and complexities of the digital age, exploring Its implications across various sectors and aspects of life.
SEO as the Backbone of Digital MarketingFelipe Bazon
In this talk Felipe Bazon will share how him and his team at Hedgehog Digital share our journey of making C-Levels alike, specially CMOS realize that SEO is the backbone of digital marketing by showing how SEO can contribute to brand awareness, reputation and authority and above all how to use SEO to create more robust global marketing strategies.
4. Structured
data Usually transaction based
record
key attribute
index
Bank transactions
Point of sale
Telephone call
Payments made
Payments received
…………………….
9. Analog/IoT
telemetry
Date Sept 2, 2021
Time 11:21 am
Location from Denver
Location to Co Spgs
Elevation
786
792
812
854
901
978
1012
1256
1469
1672
2018
2259
2871
……..
Speed
0
35
79
124
197
276
367
416
521
702
835
915
…..
Telemetry data is generated as the
rocket is launched and is measured
throughout the flight
10. Analog/IoT
The data lake is created by throwing data
all the data into the lake
Textual
data
Structured
data
16. Analog/IoT
Machine generated
Time – 0912
Time – 0916
Time – 1002
Time – 1008
Time – 1017
…………….
Basic, raw measurements
High probability
High performance
Low probability
Bulk storage
Analog/IoT data is often
segmented
17. High probability
High performance
Low probability
Bulk storage
Date of launch
Ultimate speed
Ultimate height
Final landing point
Second by second
measurements
34. What medicines
have been
purchased
By state
By age
By gender
What medicines
have been
prescribed and/or
discussed with
doctors
By state
By age
By gender
What outcomes have
been achieved
By state
By age
By gender
Analyses –
how does treatment in Utah vary from treatment in Oregon?
is Prolia more effective than estrogen?
when patients are treated with Algaecal, what other side effects are noticed?
do women have better results than men?
how much does age affect –
the types of treatment for osteoporosis
the effectiveness of treatment
whether men react differently than women
35. What medicines
have been
purchased
By state
By age
By gender
What medicines
have been
prescribed and/or
discussed with
doctors
By state
By age
By gender
What outcomes have
been achieved
By state
By age
By gender
When you have both treatment and outcome data together, you can
answer – for the first time – important questions about treatment,
medication, dosage, side, effects, demographics of treatment
You can match outcome with treatment
36. What medicines
have been
purchased
By state
By age
By gender
What medicines
have been
prescribed and/or
discussed with
doctors
By state
By age
By gender
What outcomes have
been achieved
By state
By age
By gender
The result is healthier people
and longer life and better quality
of life
40. Units sold
Date of sale
Location of sale
Unit id
Defect description
Date of warranty
Unit id
Machine used for manufacture
Date of manufacture
Operator
Lot id
Manufacture telemetry
Analyses –
what manufacturing machines are producing defects
what manufacturing machines are not producing defects
what operators are producing defects
what operators are not producing defects
what telemetry needs to be adjusted
under what conditions are defects created
…………………………………………………………
41. Units sold
Date of sale
Location of sale
Unit id
Defect description
Date of warranty
Unit id
Machine used for manufacture
Date of manufacture
Operator
Lot id
Manufacture telemetry
With all of this data together and able to be analyzed
you can now tell what defects can be corrected and what
conditions cause defects to occur. The manufacturing
process can be materially improved
42. Units sold
Date of sale
Location of sale
Unit id
Defect description
Date of warranty
Unit id
Machine used for manufacture
Date of manufacture
Operator
Lot id
Manufacture telemetry
Now manufacturing can be done
efficiently and in a cost effective
manner
43. With analytics from the data lakehouse, you can improve the
lives and livelihood of many people