Sports organizations have always strived for success on the field. As sports entertainment evolves, however, winning alone is no longer sufficient for an organization to succeed.
Preventing Abuse Using Unsupervised Learning (Databricks)
Detection of abusive activity on a large social network is an adversarial challenge with quickly evolving behavior patterns and imperfect ground truth labels.
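Unsupervised detection like this often starts from simple outlier statistics. As a hedged illustration (not the talk's actual method; the activity numbers are invented), a robust median-absolute-deviation check can flag accounts whose behavior departs sharply from the population:

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Flag indices whose modified z-score, based on the median absolute
    deviation (robust to the outliers themselves), exceeds the threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]

# messages sent per hour for a sample of accounts; the last is abusive
rates = [2, 3, 3, 4, 4, 5, 5, 6, 500]
print(mad_outliers(rates))  # [8]
```

The median-based score is used instead of a plain z-score because a single extreme account inflates the mean and standard deviation enough to hide itself.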
Machine Learning Application to Manufacturing using Tableau and Google by Pluto7 (Manju Devadas)
This document summarizes the backgrounds and proposed presentation by Manju Devadas and Salil Amonkar of Pluto7. Manju has over 25 years of experience in business process transformation, IT strategy, and management consulting. Salil has over 20 years of supply chain and technology experience. Their proposed presentation will provide an introduction to machine learning and supply chain use cases, including forecasting, predictive analytics, and price optimization. It will also demonstrate visualizing insights from machine learning models in Tableau.
This document provides an overview of Microsoft's Azure Machine Learning solution. It discusses why machine learning is important, describes the Cortana Analytics Suite and its components, and provides a high-level view of Azure Machine Learning. It then demonstrates Azure ML Studio with a sample demo and discusses market trends in machine learning platforms and vendors.
H2O World - What you need before doing predictive analysis - Keen.io (Sri Ambati)
This document provides guidance on setting up predictive analytics. It recommends being proactive rather than reactive, understanding how to acquire and analyze the right data, and creating a diverse team that includes various domain expertise. It also stresses the importance of performance, knowing your tools, and addressing challenges like deciding which analytics tools to use, combining data sources, and collecting data while maintaining privacy and integrity. The goal is to figure out what the existing data reveals, agree on business problems, identify useful predictions, and build an iterative pipeline to feed predictive algorithms.
Machine Learning Application to Manufacturing using Tableau, Tableau and Goog... (Manju Devadas)
This document discusses using machine learning to improve forecast accuracy. It notes that profitability depends on having the right product quantities in the right places at the right times. Machine learning can derive demand insights from various internal and external data sources to enhance forecasting. The document outlines steps for data preparation, model building using machine learning algorithms, and outputting results for analysis and integration into forecasting and business processes. Case studies show improvements in key metrics like ROI, revenue, and forecast accuracy when combining machine learning and predictive analytics with demand management.
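As a rough illustration of the forecasting step (not the models from the document; the demand figures are invented), single exponential smoothing shows the basic shape of turning a demand history into a one-step-ahead forecast:

```python
def ses_forecast(history, alpha=0.5):
    """Single exponential smoothing: each update blends the latest
    observation with the previous smoothed level."""
    level = history[0]
    for demand in history[1:]:
        level = alpha * demand + (1 - alpha) * level
    return level  # one-step-ahead forecast

weekly_units = [100, 120, 110, 130, 125]
print(round(ses_forecast(weekly_units), 1))  # 122.5
```

Real demand-sensing models add the external signals the document mentions as extra features, but the output feeding the business process is the same kind of per-period forecast.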
SigOpt at O'Reilly - Best Practices for Scaling Modeling Platforms (SigOpt)
Companies are increasingly building modeling platforms to empower their researchers to efficiently scale the development and productionization of their models. Scott Clark and Matt Greenwood share a case study from a leading algorithmic trading firm to illustrate best practices for building these types of platforms in any industry. Join in to learn how Two Sigma, a leading quantitative investment and technology firm, solved its model optimization problem.
H2O World - Machine Learning at Comcast - Andrew Leamon & Chushi Ren (Sri Ambati)
1) Comcast uses machine learning for personalized recommendations on their X1 platform and to predict trending shows.
2) They apply predictive modeling to identify customer service issues that could be resolved without a truck roll.
3) Comcast also uses machine learning to develop a customer experience metric to help prioritize network deployments and understand customer needs.
SigOpt at GTC - Reducing operational barriers to optimization (SigOpt)
Advanced hardware like NVIDIA technology lowers technical barriers to model size and scope, but issues remain in areas like model performance and training infrastructure management. We'll discuss operational challenges to training models at scale with a particular focus on how training management and hyperparameter tuning can inform each other to accomplish specific goals. We'll also explore techniques like parallelism and scheduling, discuss their impact on model optimization, and compare various techniques. We'll also evaluate results of this approach. In particular, we'll focus on how new tools that automate training orchestration accelerate model development and increase the volume and quality of models in production.
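The tuning-plus-parallelism interplay can be sketched with a toy parallel random search; the objective function and parameter ranges below are stand-ins for real training runs, not SigOpt's API:

```python
import random
from concurrent.futures import ThreadPoolExecutor

def objective(lr, depth):
    # stand-in for validation loss after training with these settings
    return (lr - 0.1) ** 2 + (depth - 5) ** 2 * 0.01

def random_search(n_trials=50, workers=4, seed=0):
    """Sample hyperparameter settings, evaluate them in parallel,
    and return the best (params, loss) pair."""
    rng = random.Random(seed)
    trials = [(rng.uniform(0.001, 0.5), rng.randint(1, 10))
              for _ in range(n_trials)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        losses = list(pool.map(lambda p: objective(*p), trials))
    best = min(range(n_trials), key=losses.__getitem__)
    return trials[best], losses[best]

params, loss = random_search()
print(params, loss)
```

The scheduling question the talk raises is visible even here: the worker pool, trial budget, and search strategy all trade off against each other once each `objective` call is hours of GPU time rather than a formula.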
Learn more about a world beyond CRM suites and how your company can build the customer data technology stack that matches the reality of today’s multi-channel, digital era.
Venyoo DECK for location aware data analytics (TeamVenyoo)
Venyoo is an early-stage tech company focused on location-based big data analytics in the live event space, based out of San Francisco/LA. We have built a strong team internally and have an all-star group of advisors (CTO of Ticketmaster, ex-Google/Twitter) leading the way.
Additional information on our company:
Venyoo is an enterprise mobile platform (www.venyoo.co). Our technology solution enables customers to gather extensive data on their fans and visitors while offering maps as a utility option.
Venyoo's first and flagship customer is the New England Patriots (www.patriots.com).
Big Data & Analytics 101: How Customer Lifetime Value Enhances Predictive Mar... (Big Cloud Analytics, Inc.)
The document discusses how organizations can fund their big data and analytics initiatives through incremental revenues, cost savings, and more effective marketing spending. It outlines typical stages in a company's analytics journey and provides case studies showing how analytics has been used to increase revenues by 2.5x, decrease costs by $2,190, and improve customer satisfaction by 63%. The key message is that most companies are not fully leveraging the data they already have and that properly implemented analytics can significantly improve customer engagement and drive strong returns.
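One standard way such initiatives are justified is a customer-lifetime-value calculation. The sketch below uses the common constant-margin, geometric-retention model; the numbers are illustrative and unrelated to the case studies above:

```python
def customer_lifetime_value(margin, retention, discount):
    """Expected discounted margin over an infinite horizon:
    CLV = margin * retention / (1 + discount - retention)."""
    return margin * retention / (1 + discount - retention)

# $100 annual margin, 80% retention, 10% discount rate
print(round(customer_lifetime_value(100.0, 0.8, 0.1), 2))  # 266.67
```

Comparing CLV against acquisition cost is then the simplest gate for deciding which segments merit the marketing spend the document describes.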
Team Tokens is a white-label platform that enables professional sports organizations to create digital experiences for fans using blockchain technology. The platform focuses on using web3 to connect organizations with new audiences and increase fan engagement through features like creating NFT-based collectibles, collaborating with partners to offer NFT rewards, and integrating new technologies into existing programs. The document discusses challenges around fan engagement that Team Tokens can address and provides examples of how the platform can be used to create engaging experiences for fans.
Real-time Single Customer View
Create a single customer view of your prospects and customers with data from your website, mobile apps, social channels, and phone calls. Use the out-of-the-box dashboards to generate advanced and actionable insights based on your customer data.
Global Intelligence Alliance (GIA) is a market intelligence and advisory firm that provides customized solutions and services to help clients understand, compete and grow in international markets. GIA has expertise in 11 industries and offices in 12 countries, enabling it to provide local market knowledge in over 100 countries. GIA's offerings include customized market monitoring, analytical insights and advisory services, on-demand research, intelligence software, and training to help clients strengthen their intelligence operations. Many multinational companies rely on GIA as a trusted partner.
Our solution to the PS4 groundbreaker challenge:
"The landscape for sports has changed. Viewers have shorter attention spans, and they now prefer to watch games on-demand.
As a result, how do we diversify the target audience, produce new content, and create new revenue streams to attract further investments into cricket, so as to grow the sport for future generations?"
Winstep marketing research and analytics 1.2 (Vinay Darp)
Winstep provides integrated marketing solutions including marketing research, branding & events, and analytics services. It has 17 years of experience working with clients across diverse industries in India. The company collects and analyzes data to provide insights and recommendations to help clients make better business decisions. It offers various marketing research services including surveys, qualitative research, and data collection & management. Winstep also handles branding and event planning needs. Additionally, the company provides analytics solutions including market intelligence reports, marketing campaign analysis, and social media sentiment analysis using technologies like machine learning.
Automating and Orchestrating Processes and Decisions Across the Enterprise (Denis Gagné)
PRESENTED BY
Carl Lehmann – Principal Analyst, 451 Research
Denis Gagne – CEO and CTO, Trisotech
DESCRIPTION
When business processes must execute complex decisions across the enterprise, most process automation platforms and rules management engines fall short. While competent in rules-based process modeling and automation, they're unable to model, automate, and orchestrate complex decision-making processes in areas such as clinical contexts, insurance risk management, and structured financial services, among others.
451 Research is tracking an emerging class of automation and orchestration technology that is becoming competent in both.
Join us to explore:
- The industry trends driving the need for joint process and decision automation.
- The benefits derived from a unified approach to both.
- The technical apparatus needed to automate and orchestrate process and decision models.
- Lead scoring is a methodology used to rank marketing leads based on their perceived value. It helps sales and marketing prioritize which leads to engage with.
- Traditional lead scoring relied on limited data and rules-based scoring by contact centers. Modern approaches use machine learning on digital user behavior data from websites and apps combined with CRM data.
- The presentation provides an example of a company that saw a 22% increase in conversion rates and 18x higher return on ad spend by implementing a lead scoring solution combining online behavior data with ML models.
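A minimal sketch of such an ML-based lead scorer (a toy logistic regression trained by gradient descent; the behavioral features and labels are invented, not the presenter's data):

```python
import math

def train_lead_scorer(X, y, lr=0.1, epochs=2000):
    """Fit logistic-regression weights with per-sample gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = 1 / (1 + math.exp(-(sum(wj * xj for wj, xj in zip(w, xi)) + b)))
            err = p - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def score(w, b, x):
    """Probability-of-conversion score in [0, 1]."""
    return 1 / (1 + math.exp(-(sum(wj * xj for wj, xj in zip(w, x)) + b)))

# features: [page views (tens), email opens]; label: converted?
X = [[0.1, 0], [0.2, 1], [1.5, 3], [2.0, 5], [0.3, 0], [1.8, 4]]
y = [0, 0, 1, 1, 0, 1]
w, b = train_lead_scorer(X, y)
print(round(score(w, b, [1.9, 4]), 2))  # highly engaged lead scores near 1
```

The score becomes the ranking signal: sales works the leads from the top down instead of applying hand-written contact-center rules.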
The document provides an overview of the CrickedIn platform, which allows cricketers to create profiles, networks, and share information. Individual users can build profiles and organize matches, while corporate users like clubs can manage players, events, and leagues. The platform aims to connect talent with opportunities through its viral growth engine and identity ecosystem. It seeks global monetization through various marketing solutions and subscriptions. The technology infrastructure supports this professional networking platform through features like a professional graph and data infrastructure.
Are you getting the most out of your data? (SAS Canada)
Data is an organization's most valuable asset, but raw data by itself has little value. To drive data's worth, it must be managed and processed to extract value and information that decision makers can leverage and turn into actionable insights. The ways in which a company chooses to put that information to use will determine the true value of its data.
Through business intelligence and business analytic tools, businesses are enabling themselves to make more strategic, accurate decisions, while optimizing business processes. Hear from Info-Tech Research Group and learn what you need to consider when choosing an analytics solution provider. The webinar will highlight Info-Tech Research Group’s recently published vendor landscape for selecting and implementing Business Intelligence and Business Analytics solutions. The report positions SAS as the only leader across all four categories of Enterprise BI, Mid-Market BI, Enterprise BA and Mid-Market BA.
This document outlines an agenda for a Retail Media Rundown event taking place on September 13, 2023. It includes four sessions focused on meeting consumer mindsets in Q4, crafting success through audience insights and data-driven strategies, revolutionizing retail strategy with a unified approach, and maximizing impact in Q4 through retail media synergy. Each session is scheduled between 10:00 am and 11:45 am Pacific Time and between 1:00 pm and 2:45 pm Eastern Time.
Webinar | Using Big Data and Predictive Analytics to Empower Distribution and... (NICSA)
With the proliferation of Big Data-oriented technology and its accompanying applications of advanced statistical techniques, asset managers are enabling their sales and marketing teams with more insight into the preferences and proclivities of their clients, both advisors and investors. This webinar will give attendees a general understanding of Big Data’s technologies and techniques especially as they pertain to using predictive analytics for more effective and targeted marketing and distribution.
Desired Outcomes:
- Understanding Big Data and how it is enabling adopters to use data more effectively than in the past
- Familiarity with some of the technological and analytical approaches Big Data enables
- Understanding of attribution models for measuring advisor and investor responsiveness
- Knowledge of how to prioritize campaigns and contacts by combining measures of valuation and responsiveness
- Grasp of some of the more effective ways to adopt predictive analysis for sales and marketing
- Understanding basics of recommender systems and how next best action is determined
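A common baseline behind recommender-style "next best action" logic is item-to-item co-occurrence; the sketch below uses invented holding histories, not any asset manager's data:

```python
from collections import Counter

def recommend(histories, user_items, k=2):
    """Recommend up to k items that co-occur most often with what the
    user already holds, excluding items already held."""
    counts = Counter()
    for history in histories:
        if set(history) & set(user_items):
            for item in history:
                if item not in user_items:
                    counts[item] += 1
    return [item for item, _ in counts.most_common(k)]

histories = [
    ["fund_a", "fund_b"],
    ["fund_a", "fund_b", "fund_c"],
    ["fund_b", "fund_c"],
    ["fund_d"],
]
print(recommend(histories, ["fund_a"]))  # ['fund_b', 'fund_c']
```

Production systems layer responsiveness and valuation scores on top, but the "people who held X also held Y" core is the same.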
The document discusses how Cloudera helps customers with their data and analytics journeys. It recommends that customers (1) build a data-driven culture, (2) assemble the right cross-functional team, and (3) adopt an agile approach to data projects by starting small and iterating often. Successful customers operationalize insights efficiently and implement data governance appropriately for their needs and maturity.
6D Global's analytics team provides clients with support through all stages of the digital analytics lifecycle to deliver data that gives clients a deeper understanding of their customers. They help clients maximize return on investment through data-driven decision making. Using optimization and testing on digital analytics platforms, their technical experts generate valuable data to show clients how initiatives are performing and help clients understand how to increase performance and profitable spending.
VENYOO deck for location aware data analytics (TeamVenyoo)
Venyoo is developing a cloud-based mobile mapping and data analytics platform to enhance the fan experience at live events. Their solution will provide venues with hyper-local maps, location-based analytics, and tools to identify and learn more about attendees. This will help venues improve operations, personalize experiences, and potentially increase revenues. Venyoo is seeking $750,000 in funding to further develop their product and platform, hire additional staff, and expand sales and marketing efforts to onboard new customers. They have an initial customer deployment with the New England Patriots and are building a strategic pipeline and growth plan to capitalize on the large market opportunity in sports, entertainment and other venues.
The document discusses migrating a data warehouse to the Databricks Lakehouse Platform. It outlines why legacy data warehouses are struggling, how the Databricks Platform addresses these issues, and key considerations for modern analytics and data warehousing. The document then provides an overview of the migration methodology, approach, strategies, and key takeaways for moving to a lakehouse on Databricks.
Data Lakehouse Symposium | Day 1 | Part 1 (Databricks)
The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
More Related Content
Similar to Using Machine Learning to Evolve Sports Entertainment
Data Lakehouse Symposium | Day 1 | Part 2Databricks
The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
The document discusses the challenges of modern data, analytics, and AI workloads. Most enterprises struggle with siloed data systems that make integration and productivity difficult. The future of data lies with a data lakehouse platform that can unify data engineering, analytics, data warehousing, and machine learning workloads on a single open platform. The Databricks Lakehouse platform aims to address these challenges with its open data lake approach and capabilities for data engineering, SQL analytics, governance, and machine learning.
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
In this session, learn how to quickly supplement your on-premises Hadoop environment with a simple, open, and collaborative cloud architecture that enables you to generate greater value with scaled application of analytics and AI on all your data. You will also learn five critical steps for a successful migration to the Databricks Lakehouse Platform along with the resources available to help you begin to re-skill your data teams.
Democratizing Data Quality Through a Centralized PlatformDatabricks
Bad data leads to bad decisions and broken customer experiences. Organizations depend on complete and accurate data to power their business, maintain efficiency, and uphold customer trust. With thousands of datasets and pipelines running, how do we ensure that all data meets quality standards, and that expectations are clear between producers and consumers? Investing in shared, flexible components and practices for monitoring data health is crucial for a complex data organization to rapidly and effectively scale.
At Zillow, we built a centralized platform to meet our data quality needs across stakeholders. The platform is accessible to engineers, scientists, and analysts, and seamlessly integrates with existing data pipelines and data discovery tools. In this presentation, we will provide an overview of our platform’s capabilities, including:
Giving producers and consumers the ability to define and view data quality expectations using a self-service onboarding portal
Performing data quality validations using libraries built to work with Spark
Dynamically generating pipelines that can be abstracted away from users
Flagging data that doesn’t meet quality standards at the earliest stage and giving producers the opportunity to resolve issues before use by downstream consumers
Exposing data quality metrics alongside each dataset to provide producers and consumers with a comprehensive picture of health over time
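The self-service expectations described above can be pictured with a small stdlib sketch. This is not Zillow's platform API; the `validate` helper, the rule names (`max_null_rate`, `min`, `max`), and the sample data are all invented for illustration: producers declare per-column expectations, and a validation step flags a batch before downstream consumers read it.

```python
# Hypothetical sketch: producers declare expectations for a dataset, and a
# validation step flags batches that violate them before consumers use the
# data. Names and rule formats are illustrative, not a real platform API.

def validate(rows, expectations):
    """Check a batch of row dicts against simple column expectations."""
    failures = []
    for column, rule in expectations.items():
        values = [r.get(column) for r in rows]
        null_rate = sum(v is None for v in values) / len(values)
        if null_rate > rule.get("max_null_rate", 1.0):
            failures.append(f"{column}: null rate {null_rate:.2f} exceeds limit")
        non_null = [v for v in values if v is not None]
        lo, hi = rule.get("min"), rule.get("max")
        if lo is not None and any(v < lo for v in non_null):
            failures.append(f"{column}: value below minimum {lo}")
        if hi is not None and any(v > hi for v in non_null):
            failures.append(f"{column}: value above maximum {hi}")
    return failures

batch = [{"price": 350000, "sqft": 1200},
         {"price": 4200000, "sqft": None}]
expectations = {"price": {"min": 0, "max": 50_000_000},
                "sqft": {"max_null_rate": 0.25}}
print(validate(batch, expectations))  # flags the sqft null rate
```

In a real pipeline the same check would run at the earliest stage, so producers can resolve issues before downstream use.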
Learn to Use Databricks for Data ScienceDatabricks
Data scientists face numerous challenges throughout the data science workflow that hinder productivity. As organizations continue to become more data-driven, a collaborative environment is more critical than ever — one that provides easier access and visibility into the data, reports and dashboards built against the data, reproducibility, and insights uncovered within the data. Join us to hear how Databricks’ open and collaborative platform simplifies data science by enabling you to run all types of analytics workloads, from data preparation to exploratory analysis and predictive analytics, at scale — all on one unified platform.
Why APM Is Not the Same As ML MonitoringDatabricks
Application performance monitoring (APM) has become the cornerstone of software engineering, allowing engineering teams to quickly identify and remedy production issues. However, as the world moves to intelligent software applications that are built using machine learning, traditional APM quickly becomes insufficient to identify and remedy production issues encountered in these modern software applications.
As a lead software engineer at New Relic, I built high-performance monitoring systems with my team, including Insights, Mobile, and SixthSense. As I transitioned to building ML Monitoring software, I found the architectural principles and design choices underlying APM not to be a good fit for this brand new world. In fact, blindly following APM designs led us down paths that would have been better left unexplored.
In this talk, I draw upon my (and my team’s) experience building an ML Monitoring system from the ground up and deploying it on customer workloads running large-scale ML training with Spark as well as real-time inference systems. I will highlight how the key principles and architectural choices of APM don’t apply to ML monitoring. You’ll learn why, understand what ML Monitoring can successfully borrow from APM, and hear what is required to build a scalable, robust ML Monitoring architecture.
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
Autonomy and ownership are core to working at Stitch Fix, particularly on the Algorithms team. We enable data scientists to deploy and operate their models independently, with minimal need for handoffs or gatekeeping. By writing a simple function and calling out to an intuitive API, data scientists can harness a suite of platform-provided tooling meant to make ML operations easy. In this talk, we will dive into the abstractions the Data Platform team has built to enable this. We will go over the interface data scientists use to specify a model and what that hooks into, including online deployment, batch execution on Spark, and metrics tracking and visualization.
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
In this talk, I will dive into the stage level scheduling feature added to Apache Spark 3.1. Stage level scheduling extends upon Project Hydrogen by improving big data ETL and AI integration and also enables multiple other use cases. It is beneficial any time the user wants to change container resources between stages in a single Apache Spark application, whether those resources are CPU, Memory or GPUs. One of the most popular use cases is enabling end-to-end scalable Deep Learning and AI to efficiently use GPU resources. In this type of use case, users read from a distributed file system, do data manipulation and filtering to get the data into a format that the Deep Learning algorithm needs for training or inference and then sends the data into a Deep Learning algorithm. Using stage level scheduling combined with accelerator aware scheduling enables users to seamlessly go from ETL to Deep Learning running on the GPU by adjusting the container requirements for different stages in Spark within the same application. This makes writing these applications easier and can help with hardware utilization and costs.
There are other ETL use cases where users want to change CPU and memory resources between stages, for instance when there is data skew or when the data size is much larger in certain stages of the application. In this talk, I will go over the feature details, cluster requirements, the API, and use cases. I will demo how the stage level scheduling API can be used by Horovod to seamlessly go from data preparation to training using the TensorFlow Keras API on GPUs.
The talk will also touch on other new Apache Spark 3.1 functionality, such as pluggable caching, which can be used to enable faster dataframe access when operating from GPUs.
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
In this talk, I would like to introduce an open-source tool built by our team that simplifies the data conversion from Apache Spark to deep learning frameworks.
Imagine you have a large dataset, say 20 GBs, and you want to use it to train a TensorFlow model. Before feeding the data to the model, you need to clean and preprocess your data using Spark. Now you have your dataset in a Spark DataFrame. When it comes to the training part, you may have the problem: How can I convert my Spark DataFrame to some format recognized by my TensorFlow model?
The existing data conversion process can be tedious. For example, to convert an Apache Spark DataFrame to a TensorFlow Dataset file format, you need to either save the Apache Spark DataFrame on a distributed filesystem in parquet format and load the converted data with third-party tools such as Petastorm, or save it directly in TFRecord files with spark-tensorflow-connector and load it back using TFRecordDataset. Both approaches take more than 20 lines of code to manage the intermediate data files, rely on different parsing syntax, and require extra attention for handling vector columns in the Spark DataFrames. In short, all these engineering frictions greatly reduced the data scientists’ productivity.
The Databricks Machine Learning team contributed a new Spark Dataset Converter API to Petastorm to simplify this tedious data conversion process. With the new API, it takes a few lines of code to convert a Spark DataFrame to a TensorFlow Dataset or a PyTorch DataLoader with default parameters.
In the talk, I will use an example to show how to use the Spark Dataset Converter to train a TensorFlow model and how simple it is to go from single-node training to distributed training on Databricks.
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
There is no doubt Kubernetes has emerged as the next generation of cloud-native infrastructure to support a wide variety of distributed workloads. Apache Spark has evolved to run both machine learning and large-scale analytics workloads. There is growing interest in running Apache Spark natively on Kubernetes. By combining the flexibility of Kubernetes with scalable data processing in Apache Spark, you can run data and machine learning pipelines on this infrastructure while effectively utilizing the resources at your disposal.
In this talk, Rajesh Thallam and Sougata Biswas will share how to effectively run your Apache Spark applications on Google Kubernetes Engine (GKE) and Google Cloud Dataproc, and how to orchestrate data and machine learning pipelines with managed Apache Airflow on GKE (Google Cloud Composer). The following topics will be covered:
– Understanding key traits of Apache Spark on Kubernetes
– Things to know when running Apache Spark on Kubernetes, such as autoscaling
– Demonstrating analytics pipelines on Apache Spark orchestrated with Apache Airflow on a Kubernetes cluster
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
Pipelines have become ubiquitous, as the need for stringing multiple functions to compose applications has gained adoption and popularity. Common pipeline abstractions such as “fit” and “transform” are even shared across divergent platforms such as Python Scikit-Learn and Apache Spark.
Scaling pipelines at the level of simple functions is desirable for many AI applications; however, it is not directly supported by Ray’s parallelism primitives. In this talk, Raghu will describe a pipeline abstraction that takes advantage of Ray’s compute model to efficiently scale arbitrarily complex pipeline workflows. He will demonstrate how this abstraction cleanly unifies pipeline workflows across multiple platforms such as Scikit-Learn and Spark, and achieves nearly optimal scale-out parallelism on pipelined computations.
Attendees will learn how pipelined workflows can be mapped to Ray’s compute model and how they can both unify and accelerate their pipelines with Ray.
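The "fit"/"transform" abstraction the abstract refers to can be sketched in a few lines. This is an illustrative interface only, not Ray's or Scikit-Learn's implementation: a plain function is adapted to the shared protocol, so function stages and learned stages compose in one pipeline.

```python
# Minimal sketch of a shared fit/transform pipeline interface. Real systems
# (Scikit-Learn, Spark ML, the Ray-based approach described above) add
# parallel scheduling on top; this only shows the unifying abstraction.

class FunctionStage:
    """Adapt a plain function to the fit/transform protocol."""
    def __init__(self, fn):
        self.fn = fn
    def fit(self, data):
        return self          # stateless: nothing to learn
    def transform(self, data):
        return [self.fn(x) for x in data]

class Pipeline:
    def __init__(self, stages):
        self.stages = stages
    def fit_transform(self, data):
        for stage in self.stages:
            data = stage.fit(data).transform(data)
        return data

pipe = Pipeline([FunctionStage(lambda x: x * 2),
                 FunctionStage(lambda x: x + 1)])
print(pipe.fit_transform([1, 2, 3]))  # [3, 5, 7]
```

Because every stage exposes the same two methods, a scheduler can dispatch each stage's work to parallel workers without caring which platform the stage came from.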
Sawtooth Windows for Feature AggregationsDatabricks
In this talk about Zipline, we will introduce a new type of windowing construct called a sawtooth window. We will describe various properties of sawtooth windows that we utilize to achieve online-offline consistency, while still maintaining high throughput, low read latency, and tunable write latency for serving machine learning features. We will also talk about a simple deployment strategy for correcting feature drift due to operations that are not “abelian groups” operating over change data.
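One way to picture a sawtooth window, under the assumption (not stated in the abstract) that it combines precomputed whole-interval "hops" with a raw head of recent events: the effective window length oscillates between N and N+1 hops, which lets a batch path and a streaming path compute the same value.

```python
# Illustrative sketch, not Zipline's implementation: a "last ~7 days"
# sawtooth window is served as complete day-hop aggregates (precomputed
# offline in a real system) plus the raw events since midnight (served
# online). The window length varies in a sawtooth between 7 and 8 days.

DAY = 86_400  # seconds in a day

def sawtooth_sum(events, query_ts, days=7):
    """Sum event values over `days` complete day-hops plus the partial head."""
    day_start = (query_ts // DAY) * DAY          # midnight before query_ts
    hop_start = day_start - days * DAY           # oldest complete hop
    # Offline part: complete day hops.
    hops = sum(v for ts, v in events if hop_start <= ts < day_start)
    # Online part: raw events since midnight.
    head = sum(v for ts, v in events if day_start <= ts <= query_ts)
    return hops + head

events = [(0, 1), (DAY * 3, 10), (DAY * 7 + 100, 5)]
print(sawtooth_sum(events, query_ts=DAY * 7 + 200))  # 11 from hops + 5 from head
```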
We want to present multiple anti-patterns utilizing Redis in unconventional ways to get the maximum out of Apache Spark. All examples presented are tried and tested in production at scale at Adobe. The most common integration is spark-redis, which interfaces with Redis as a DataFrame backing store or as an upstream for Structured Streaming. We deviate from the common use cases to explore where Redis can plug gaps while scaling out high-throughput applications in Spark.
Niche 1: Long-Running Spark Batch Job – Dispatch New Jobs by Polling a Redis Queue
· Why?
o Custom queries on top of a table; we load the data once and query N times
· Why not Structured Streaming?
· Working solution using Redis
Niche 2: Distributed Counters
· Problems with Spark Accumulators
· Utilize Redis Hashes as distributed counters
· Precautions for retries and speculative execution
· Pipelining to improve performance
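The distributed-counter pattern above can be sketched with an in-memory stand-in for Redis, so the example is self-contained. In production the calls would be Redis `HINCRBY` on a hash plus a `SETNX`-style claim key; the retry precaution shown is one common approach (assumed here, not quoted from the talk): claim a per-task key first, so a retried or speculative task cannot double-count.

```python
# Sketch of Redis hashes as distributed counters, using a fake in-memory
# client so the example runs standalone. The retry/speculative-execution
# precaution: claim a unique per-task key (SETNX-style) before HINCRBY,
# so each logical task increments the shared counter exactly once.

class FakeRedis:
    def __init__(self):
        self.hashes, self.keys = {}, set()
    def setnx(self, key):
        """Return True only for the first caller to claim `key`."""
        if key in self.keys:
            return False
        self.keys.add(key)
        return True
    def hincrby(self, name, field, amount):
        h = self.hashes.setdefault(name, {})
        h[field] = h.get(field, 0) + amount

def record(redis, job_id, task_id, counter, amount):
    """Increment a shared counter exactly once per task, even on retry."""
    if redis.setnx(f"{job_id}:{task_id}:{counter}"):  # first attempt wins
        redis.hincrby(f"counters:{job_id}", counter, amount)

r = FakeRedis()
record(r, "job1", "task-0", "rows_read", 100)
record(r, "job1", "task-0", "rows_read", 100)   # speculative retry: ignored
record(r, "job1", "task-1", "rows_read", 50)
print(r.hashes["counters:job1"]["rows_read"])   # 150
```

Unlike Spark accumulators, the counter lives outside the driver, so any executor (or an external dashboard) can read it mid-job.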
Re-imagine Data Monitoring with whylogs and SparkDatabricks
In the era of microservices, decentralized ML architectures, and complex data pipelines, data quality has become a bigger challenge than ever. When data is involved in complex business processes and decisions, bad data can, and will, affect the bottom line. As a result, ensuring data quality across the entire ML pipeline is both costly and cumbersome, while data monitoring is often fragmented and performed ad hoc. To address these challenges, we built whylogs, an open-source standard for data logging. It is a lightweight data profiling library that enables end-to-end data profiling across the entire software stack. The library implements a language- and platform-agnostic approach to data quality and data monitoring. It can work with different modes of data operations, including streaming, batch, and IoT data.
In this talk, we will provide an overview of the whylogs architecture, including its lightweight statistical data collection approach and various integrations. We will demonstrate how the whylogs integration with Apache Spark achieves large scale data profiling, and we will show how users can apply this integration into existing data and ML pipelines.
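The core idea of lightweight data profiling can be conveyed with a stdlib-only toy. This is not the whylogs API (whylogs uses mergeable statistical sketches); it only shows the shape of the approach: each batch is reduced to a small summary that can be logged and compared over time instead of shipping raw rows.

```python
# Toy illustration of lightweight data profiling: reduce a batch to a
# small per-column summary (counts, nulls, min/max/mean). Real profilers
# like whylogs use mergeable sketches; this stdlib version only conveys
# the concept. Column and field names are invented.

def profile(rows, column):
    values = [r.get(column) for r in rows]
    non_null = [v for v in values if v is not None]
    return {
        "count": len(values),
        "null_count": len(values) - len(non_null),
        "min": min(non_null),
        "max": max(non_null),
        "mean": sum(non_null) / len(non_null),
    }

rows = [{"latency_ms": 12}, {"latency_ms": 48}, {"latency_ms": None}]
print(profile(rows, "latency_ms"))
```

Logging one such summary per batch makes drift visible: a monitoring job compares today's profile to yesterday's rather than re-scanning raw data.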
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
Machine learning (ML) models are typically part of prediction queries that consist of a data processing part (e.g., for joining, filtering, cleaning, featurization) and an ML part invoking one or more trained models. In this presentation, we identify significant and unexplored opportunities for optimization. To the best of our knowledge, this is the first effort to look at prediction queries holistically, optimizing across both the ML and SQL components.
We will present Raven, an end-to-end optimizer for prediction queries. Raven relies on a unified intermediate representation that captures both data processing and ML operators in a single graph structure.
This allows us to introduce optimization rules that
(i) reduce unnecessary computations by passing information between the data processing and ML operators
(ii) leverage operator transformations (e.g., turning a decision tree to a SQL expression or an equivalent neural network) to map operators to the right execution engine, and
(iii) integrate compiler techniques to take advantage of the most efficient hardware backend (e.g., CPU, GPU) for each operator.
We have implemented Raven as an extension to Spark’s Catalyst optimizer to enable the optimization of SparkSQL prediction queries. Our implementation also allows the optimization of prediction queries in SQL Server. As we will show, Raven is capable of improving prediction query performance on Apache Spark and SQL Server by up to 13.1x and 330x, respectively. For complex models, where GPU acceleration is beneficial, Raven provides up to 8x speedup compared to state-of-the-art systems. As part of the presentation, we will also give a demo showcasing Raven in action.
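Optimization (ii) above, turning a decision tree into a SQL expression, can be illustrated with a toy compiler. The tree format here is hypothetical, not Raven's internal representation: a recursive walk emits a nested `CASE WHEN` expression the database engine can evaluate inline with the rest of the query.

```python
# Toy version of operator transformation (ii): compile a small decision
# tree into an equivalent SQL CASE expression so the SQL engine can
# evaluate the model next to the data processing operators. The dict-based
# tree format is invented for illustration.

def tree_to_sql(node):
    if "leaf" in node:
        return str(node["leaf"])
    cond = f"{node['feature']} <= {node['threshold']}"
    left = tree_to_sql(node["left"])
    right = tree_to_sql(node["right"])
    return f"CASE WHEN {cond} THEN {left} ELSE {right} END"

tree = {"feature": "age", "threshold": 30,
        "left": {"leaf": 0},
        "right": {"feature": "income", "threshold": 50000,
                  "left": {"leaf": 0}, "right": {"leaf": 1}}}
print(tree_to_sql(tree))
```

Once the model is a SQL expression, the optimizer can push predicates through it, prune unreachable branches, or choose a different backend for it, which is exactly the kind of cross-component optimization the abstract describes.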
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
Semantic segmentation is the classification of every pixel in an image/video. The segmentation partitions a digital image into multiple objects to simplify/change the representation of the image into something that is more meaningful and easier to analyze [1][2]. The technique has a wide variety of applications ranging from perception in autonomous driving scenarios to cancer cell segmentation for medical diagnosis.
Exponential growth in the datasets that require such segmentation is driven by improvements in the accuracy and quality of the sensors generating the data extending to 3D point cloud data. This growth is further compounded by exponential advances in cloud technologies enabling the storage and compute available for such applications. The need for semantically segmented datasets is a key requirement to improve the accuracy of inference engines that are built upon them.
Streamlining the accuracy and efficiency of these systems directly affects the value of the business outcome for organizations that are developing such functionalities as a part of their AI strategy.
This presentation details workflows for labeling, preprocessing, modeling, and evaluating performance/accuracy. Scientists and engineers leverage domain-specific features/tools that support the entire workflow from labeling the ground truth, handling data from a wide variety of sources/formats, developing models and finally deploying these models. Users can scale their deployments optimally on GPU-based cloud infrastructure to build accelerated training and inference pipelines while working with big datasets. These environments are optimized for engineers to develop such functionality with ease and then scale against large datasets with Spark-based clusters on the cloud.
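For the evaluation step mentioned above, the standard accuracy metric for semantic segmentation is per-class intersection-over-union (IoU) averaged across classes. A minimal sketch, with flat lists of class labels standing in for image pixels:

```python
# Mean IoU: for each class, intersection / union of predicted and true
# pixel sets, averaged over classes present. Flat lists stand in for
# per-pixel label masks.

def mean_iou(pred, truth, num_classes):
    ious = []
    for c in range(num_classes):
        inter = sum(p == c and t == c for p, t in zip(pred, truth))
        union = sum(p == c or t == c for p, t in zip(pred, truth))
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious)

truth = [0, 0, 1, 1]
pred  = [0, 1, 1, 1]
print(mean_iou(pred, truth, num_classes=2))  # (1/2 + 2/3) / 2
```

At ADAS scale, the same per-class intersection and union counts would be accumulated across a Spark cluster and combined before the final division.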
Massive Data Processing in Adobe Using Delta LakeDatabricks
At Adobe Experience Platform, we ingest TBs of data every day and manage PBs of data for our customers as part of the Unified Profile offering. At the heart of this is complex ingestion of a mix of normalized and denormalized data with various linkage scenarios, powered by a central Identity Linking Graph. This helps power marketing scenarios that are activated in multiple platforms and channels like email and advertisements. We will go over how we built a cost-effective and scalable data pipeline using Apache Spark and Delta Lake and share our experiences.
What are we storing?
Multi-Source, Multi-Channel Problem
Data Representation and Nested Schema Evolution
Performance Trade-Offs with Various Formats
Go over anti-patterns used (String FTW)
Data Manipulation using UDFs
Writer Worries and How to Wipe Them Away
Staging Tables FTW
Data Lake Replication Lag Tracking
Performance Time!
Generative Classifiers: Classifying with Bayesian decision theory, Bayes’ rule, Naïve Bayes classifier.
Discriminative Classifiers: Logistic Regression, Decision Trees: Training and Visualizing a Decision Tree, Making Predictions, Estimating Class Probabilities, The CART Training Algorithm, Attribute selection measures- Gini impurity; Entropy, Regularization Hyperparameters, Regression Trees, Linear Support vector machines.
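The two attribute selection measures named above, Gini impurity and entropy, are short formulas over a node's class-probability distribution; lower impurity means a purer split.

```python
# Gini impurity: 1 - sum(p_i^2). Entropy: -sum(p_i * log2(p_i)).
# Both take the class probabilities at a decision tree node; CART uses
# Gini by default, ID3/C4.5-style trees use entropy (information gain).

from math import log2

def gini(probs):
    return 1.0 - sum(p * p for p in probs)

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

print(gini([0.5, 0.5]))     # 0.5  (maximally impure for two classes)
print(entropy([0.5, 0.5]))  # 1.0 bit
print(gini([1.0, 0.0]))     # 0.0  (pure node)
```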
Open Source Contributions to Postgres: The Basics POSETTE 2024ElizabethGarrettChri
Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.
4. About us
David Cunningham has more than 20 years of Information Technology experience supporting advanced technology programs and major transformational programs within the Department of Defense, Intelligence Community, Civilian Agencies, and commercial clients. He is consistently at the forefront of technology evolutions to drive customer success and meet critical missions and business objectives. From architecting complex, geographically dispersed systems to evolving and strategizing the transition into data-driven organizations, David has provided exceptional support to meet his clients’ demands for industry-leading cloud and data solutions.
DAVID CUNNINGHAM
CEO, Insipher
As the Chief Growth Officer at Atlas Research, Mr. Bang uses his broad knowledge of the Federal market (Health, Defense, Intelligence, and Civil) to guide the firm’s growth in existing, adjacent, and new markets. He is a recognized expert and speaker in health IT, artificial intelligence, data science and big data, DevSecOps, orchestration, containerization, systems engineering, the software development lifecycle, cloud computing, product development, portfolio management, biometrics, acquisition process, telecommunications, logistics, supply and maintenance, and manufacturing.
YOUNG BANG
Chief Growth Officer, Atlas Research
5. Integrated Sports Analytics
▪ Fan Engagement
▪ Sponsorship
▪ Game and Player Analysis
▪ Business Operations
In today’s environment, all aspects of a sports organization are highly connected and dependent on each other to drive success on and off the field
6. Fan engagement
▪ By integrating multiple data sources, organizations can build an enhanced fan profile that highlights
▪ Who they are,
▪ Where they come from,
▪ What they bought,
▪ And their interests
▪ Leading to an increased fan base and higher stadium attendance
Analytic Opportunities
▪ Live engagement - earning rewards for concessions / merchandise
▪ Request songs for the in-game playlist and be recognized
▪ Track history of attendance - longest consecutive streak / awards / discounts
▪ Enhanced targeting of fans that attend and those who do not
▪ Alumni nights through fan profiling - if certain alumni are coming, drive higher participation through alumni nights
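One of the attendance-tracking opportunities listed above, the longest consecutive-game streak, is a simple computation once ticketing data is integrated. Field names and sample data here are invented for illustration:

```python
# Sketch: compute a fan's longest consecutive home-game attendance streak
# from the ordered game schedule and the set of games they attended.
# Game ids and data are hypothetical.

def longest_streak(schedule, attended):
    """schedule: ordered list of game ids; attended: set of game ids."""
    best = current = 0
    for game in schedule:
        current = current + 1 if game in attended else 0
        best = max(best, current)
    return best

schedule = ["g1", "g2", "g3", "g4", "g5", "g6"]
attended = {"g1", "g2", "g4", "g5", "g6"}
print(longest_streak(schedule, attended))  # 3
```

The resulting streak value can then drive the rewards and discounts mentioned above.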
7. Sponsorship
▪ By using an enhanced fan profile, organizations can maximize their sponsorship opportunities through
▪ Targeted marketing,
▪ Cross-promotions,
▪ And in-stadium advertising
▪ Leading to increased customer reach and sales opportunities
Analytic Opportunities
▪ Profile ticket purchases to determine a “relationship” that would drive sponsor in-game engagement
▪ Post-game outreach based on attendance and concessions
▪ Targeted advertising based on profiles - “target east side differently than west side based on attendance”
▪ Track activation with non-sports related events - how many fans are also going to the concert at the stadium?
8. Game and player analysis
▪ By capturing and analyzing game footage, player metrics, and skill assessments, organizations gain a competitive edge
▪ Leading to success on the field
Analytic Opportunities
▪ Engage fans with the players - beat your favorite player’s training data for this day and receive a discount
▪ Pre-, during-, and post-game analysis using image recognition and video analysis
▪ Identify “tendencies” of players to find opportunities to exploit a potential weakness
▪ Use biometrics integrated with game analysis to identify if a player shows tendencies as the game goes on and their fitness level decreases
9. Business operations
▪ By integrating business systems (e.g., financial, ticketing, CRM), organizations can
▪ Immediately identify how one part of the organization impacts the other
▪ Leading to increased revenue and overall success
Analytic Opportunities
▪ Most teams share stadiums with other teams; drive data integration across teams to increase the overall picture of a stadium’s activity
▪ Integrate concession revenue with merchandise, fan attendance, and in-game statistics to identify opportunities to increase revenue
▪ Integrate ticketing with sponsors and revenue to determine return on investment
12. Examples to date
▪ Leveraging Spark and Insipher, we have conducted analysis within minutes to
▪ Identify sentiment on social media before, during, and after the game to address fan feedback and potential concerns
▪ Run geographic analysis against ticketing to determine reach in and around a stadium and, more importantly, “where is the message not going”
▪ Map 2-3 degrees of connectivity between fans and companies to identify potential sponsor targets or campaigns
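The last bullet above is a graph problem at heart: a breadth-first search over a fan/company relationship graph finds everything within 2-3 hops of a given fan. The graph, node names, and hop limit below are illustrative, not real Insipher data:

```python
# Sketch: find nodes within `max_degree` hops of a starting fan using BFS
# over a fan/company relationship graph. Graph contents are hypothetical.

from collections import deque

def within_degrees(graph, start, max_degree):
    """Return nodes reachable from `start` in at most `max_degree` hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_degree:
            continue          # don't expand past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return {n for n, d in seen.items() if 0 < d <= max_degree}

graph = {"fan_a": ["fan_b", "acme_co"],
         "fan_b": ["gadget_inc"],
         "gadget_inc": ["mega_corp"]}
print(sorted(within_degrees(graph, "fan_a", 2)))
```

Companies surfaced at degree 2 or 3 but with no direct tie become the candidate sponsor targets the slide describes.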