Slides from the webinar "Five Ways to Leverage AI and Tableau". Full webinar recording: https://starschema.com/kb/five-ways-to-leverage-ai-and-tableau
Sources & Workbooks: https://github.com/starschema/tableau-ai-use-cases
Prasad Narasimhan discusses various applications of predictive analytics across different domains including business, marketing, operations, collections, customer segmentation, telecom, sports, social media, and insurance. Predictive analytics uses statistical techniques to analyze current and historical data to predict future events or outcomes. It has various uses such as predicting customer churn, credit risk, response to marketing campaigns, fraud detection, and more. The document provides examples of how predictive analytics is applied in areas like customer retention, cross-sell, collections, credit risk management, and churn prediction in telecom.
This document provides an overview and agenda for Week 8 of the Data Scientist Enablement (DSE) 400 program. It outlines the week's discussions on ethics in big data, recommended learning materials, activities including exploring datasets and starting a blog, and an assignment to cleanse and visualize a sensor dataset or complete an alternative task. The timeline for the full DSE program and options for adaptive learning and proficiency certification are also summarized.
Artificial intelligence, using machine learning techniques such as churn and recommender models, can help relationship managers connect with dormant clients and recommend stocks and mutual funds through existing applications on different devices.
SPPM Clinical 7 Best Practices In Forecasting & Planning (guest1fe658d)
Strategic Project Portfolio Management for Clinical Trials
- Plan clinical trial expenditure using a top-down approach based on empirical or historical data
- Adjust the plan bottom-up after assessing individual site and region enrolment plans
- Update the plan using actual study performance data
- Manage accruals and payments
This presentation lays out the benefits of data science for those in the retail broking practice. By employing machine learning techniques and text analytics, you not only gain a competitive edge but also earn the customer's satisfaction and loyalty.
Predictive Analytics: Advanced techniques in data mining (SAS Asia Pacific)
The document discusses predictive analytics techniques including defining objectives, data preparation, modeling, deployment, and model monitoring. It describes preparing data through transformation, deriving behavioral variables, and quality checks. Modeling techniques covered include decision trees, regression, neural networks, and ensemble modeling. Model monitoring compares actual and predicted values, and analyzes variable distributions and predicted scores.
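The monitoring step summarized above (comparing distributions of actual and predicted scores) is often implemented with a population stability index. As a rough sketch that is not taken from the deck itself, here is a minimal PSI check in Python; the function name and binning scheme are illustrative assumptions:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between two score distributions.

    Bins are derived from the `expected` (baseline) scores; each bin's
    share is floored at 1e-6 so the log term stays defined.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def frac(values, b):
        # fraction of `values` landing in bin b
        count = sum(1 for v in values
                    if int(min((v - lo) / width, bins - 1)) == b)
        return max(count / len(values), 1e-6)

    return sum((frac(actual, b) - frac(expected, b))
               * math.log(frac(actual, b) / frac(expected, b))
               for b in range(bins))

# identical distributions: no drift at all
baseline = [i / 100 for i in range(100)]
print(psi(baseline, baseline))  # 0.0
```

A common rule of thumb reads PSI below 0.1 as stable and above 0.25 as drift worth investigating, which is exactly the "compare predicted-score distributions over time" idea the summary describes.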
Visuals deliver better, quicker insights when forecasting sales. At a glance, business strategies can be planned: time periods, geographic locations, variables that highlight what works or doesn't and where it scores or doesn't, combinations of two or more variables that work in specific geographic locations or don't, and so on. All of this together makes data visualization a very nifty tool for projecting what can make or break your sales predictions!
Time Series Anomaly Detection with .NET and Azure (Marco Parenzan)
If you have a device or source that generates values over time (even a log from a service), you want to determine whether the time series within a given window looks correct or contains anomalies. What can you do as a developer (not a data scientist) with .NET or Azure? Let's see how in this session.
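The session targets .NET and Azure, but the underlying idea is language-agnostic. As a hedged illustration, with names and thresholds that are my own rather than the talk's, a rolling z-score detector can be sketched in a few lines of Python:

```python
import statistics

def detect_anomalies(series, window=10, threshold=3.0):
    """Flag indices whose value lies more than `threshold` standard
    deviations from the mean of the preceding `window` values."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mean = sum(history) / window
        stdev = statistics.stdev(history)
        if stdev and abs(series[i] - mean) / stdev > threshold:
            anomalies.append(i)
    return anomalies

# a gently varying signal with one spike injected at index 15
signal = [10.0 + 0.1 * (i % 3) for i in range(20)]
signal[15] = 42.0
print(detect_anomalies(signal))  # [15]
```

Comparing each point only against its trailing window keeps the check cheap enough to run as values stream in, which is the developer-friendly framing the abstract is after.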
Analytic Snapshots: Common Use Cases that Everyone Can Utilize (Dreamforce 2...) (Rhonda Ross)
Have you heard about Analytic Snapshots, but been intimidated about setting your first one up? Or, are you simply unsure of how you can use them? Join us as we show you how setting up a snapshot can be as simple as performing three easy steps. Several use cases that virtually every organization can take advantage of will be discussed, such as tracking user logins over time, measuring Chatter adoption, reporting on changes to key fields, and more! Step-by-step instructions for setting up your own snapshots will be available so you can put what you learn into practice immediately.
The document discusses big data and predictive analytics. It defines big data as large volumes of diverse data that require new techniques and technologies to analyze. Predictive analytics uses statistical modeling of historical data to predict future outcomes. The document provides examples of how predictive models are used in weather forecasting, customer service, and marketing. It also distinguishes predictive analytics from machine learning and discusses common predictive modeling techniques like decision trees, neural networks, and regression.
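Of the modeling techniques the document lists, regression is the easiest to show end to end. As an illustrative sketch that is not from the document itself, ordinary least squares for a single feature fits a line to historical data:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# historical observations following y = 2x + 1 exactly
xs, ys = [1, 2, 3, 4], [3, 5, 7, 9]
a, b = fit_line(xs, ys)
print(a, b)  # 2.0 1.0
```

The fitted slope and intercept are then applied to new inputs to produce predictions, the same "model historical data, predict future outcomes" loop the summary describes.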
In this fast-paced data-driven world, the fallout from a single data quality issue can cost thousands of dollars in a matter of hours. To catch these issues quickly, system monitoring for data quality requires a different set of strategies from other continuous regression efforts. Like a race car pit crew, you need detection mechanisms that not only don’t interfere with what you are monitoring but also allow for strategic analysis off-track. You need to use every second your subject is at rest to repair and clean up problems that could affect performance. As the systems in race cars vary, the tools and resources available to the data quality professional vary from one organization to the next. You need to be able to leverage the tools at hand to implement your solutions. Shauna Ayers and Catherine Cruz Agosto show you how to develop testing strategies to detect issues with data integration timing, operational dependencies, reference data management, and data integrity—even in production systems. See how you can leverage this testing to provide proactive notification alerts and feed business intelligence dashboards to communicate the health of your organization’s data systems to both operation support and non-technical personnel.
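One concrete detection mechanism of the kind described, a load-volume check against recent history, can be sketched briefly; the function name and threshold below are illustrative assumptions, not taken from the talk:

```python
def volume_alert(current_rows, recent_loads, tolerance=0.5):
    """True when the current load's row count deviates from the average
    of recent loads by more than `tolerance` (a fraction of baseline)."""
    baseline = sum(recent_loads) / len(recent_loads)
    return abs(current_rows - baseline) > tolerance * baseline

history = [1000, 1040, 980, 1010]   # row counts from the last four loads
print(volume_alert(200, history))   # True: a 5x drop trips the alert
print(volume_alert(1005, history))  # False: normal volume
```

Because it only reads row counts, a check like this doesn't interfere with the system being monitored, matching the "pit crew" framing, and its boolean output can feed an alert or a dashboard tile directly.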
The document outlines a three-phase approach to developing an intelligent monitoring platform:
Phase 1 involves interviewing dev and ops teams to understand current monitoring practices.
Phase 2 focuses on improving the postmortem process and outage understanding.
Phase 3 aims to reduce the time to identify and resolve outages through expanded data collection, correlation analysis, and predictive capabilities.
How to Effectively Migrate Data From Legacy Apps (CloverDX)
** Watch the webinar to accompany these slides: https://www.cloverdx.com/webinars/how-to-effectively-migrate-data-from-legacy-system **
TIPS FOR PLANNING A DATA MIGRATION
Old HCM, ERP or CRM systems are often business critical since they are ingrained into many processes within a company. But their age often means that the knowledge about how they work is mostly lost and it can be daunting to replace them with something newer and more streamlined.
We'll show you some tips and best practices to help you migrate from a legacy system in a stress-free way.
More CloverDX webinars: https://www.cloverdx.com/webinars
Twitter: https://twitter.com/cloverdx
LinkedIn: https://www.linkedin.com/company/cloverdx/
Get a free 45 day trial of the CloverDX Data Management Platform: https://www.cloverdx.com/trial-platform
Top 30 Data Analyst Interview Questions.pdf (ShaikSikindar1)
Data analytics has emerged as one of the central aspects of business operations. Consequently, the quest for professional positions within the data analytics domain has grown to unimaginable proportions. So if you too happen to be someone aspiring to make it through a Data Analyst interview, these questions are worth reviewing.
Rise of the machines -- OWASP Israel -- June 2014 meetup (Shlomo Yona)
Shlomo Yona presents why it is a good idea to use machine learning in security, explains some machine learning jargon, and demonstrates two fingerprinting examples: a Wi-Fi device (PHY) and a browser (L7).
Roger S. Barga discusses his experience in data science and predictive analytics projects across multiple industries. He provides examples of predictive models built for customer segmentation, predictive maintenance, customer targeting, and network intrusion prevention. Barga also outlines a sample predictive analytics project for a real estate client to predict whether they can charge above or below market rates. The presentation emphasizes best practices for building predictive models such as starting small, leveraging third-party tools, and focusing on proxy metrics that drive business outcomes.
Time series analysis allows Data Scientists to recognize trends, seasonality, and correlations within past data related to an organization to make predictions on which business decisions are based.
Let’s take a look at how various industries use Time series analysis to make crucial decisions.
~ The airline industry can optimize travel routes by predicting future weather patterns, seasonal demands, or unexpected events.
~ Stockbrokers use it to predict correlations within stocks & market conditions to decide where to invest.
~ Supply Chain companies can predict weather conditions, traffic patterns, and expected delivery times to optimize routes.
But did you know that you can build time series models with minimal coding knowledge? The KNIME Analytics Platform makes this possible: its graphical user interface lets data scientists who are just starting out, and who do not have extensive coding experience, build models visually.
In this webinar, our Machine Learning expert will help you build time series models using the KNIME Analytics Platform. Business leaders and Data scientists must not miss this opportunity to arrive at smart, data-driven business decisions with the help of this platform.
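The trend and seasonality ideas above do not require any particular tool. As a minimal illustration (a plain-Python sketch, not a KNIME workflow), a moving average whose window equals the seasonal period cancels the seasonal component and exposes the trend:

```python
def moving_average(series, window):
    """Trailing moving average over `window` points."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

# three years of monthly sales: upward trend plus a December bump
sales = [100 + 2 * m + (15 if m % 12 == 11 else 0) for m in range(36)]
trend = moving_average(sales, 12)

# a window equal to the seasonal period cancels the bump,
# leaving the trend: consecutive values rise by exactly 2 per month
print(trend[1] - trend[0])  # 2.0
```

Subtracting this trend from the raw series would leave the seasonal component, the first step of the classical decomposition that time series tools automate.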
This document provides an agenda for a presentation on data analytics. It includes an introduction to data analytics concepts and examples of applications in various industries. A demo of Instant Insights, a cloud-based analytics tool, is presented. The document emphasizes that building analytics solutions through testing and iteration is better than just discussing ideas. Users should test solutions in the real world and measure user behavior to make data-driven decisions.
Slides from my presentation at the Data Intelligence conference in Washington DC (6/23/2017). See this link for the abstract: http://www.data-intelligence.ai/presentations/36
UNIT I Streaming Data & Architectures.pptx (Rahul Borate)
The document provides an introduction and overview of streaming data. It discusses sources of streaming data such as operational monitoring, web analytics, online advertising, social media, and mobile/IoT data. It explains that streaming data is different from other data types in that it is always flowing in, loosely structured, and can have high-cardinality dimensions. Real-time architectures for streaming data need to have high availability, low latency, and horizontal scalability.
This document provides an overview of a course on business intelligence. It discusses how BI allows people at all levels of organizations to access, interact with, and analyze data to manage business operations more efficiently. The course aims to develop advanced business users with a deep understanding of business needs and good technical knowledge. It covers BI and social analytics in the first part and process modeling in the second part. The document also provides examples of how BI has helped companies in supply chain management, vaccine distribution, and beverage sales to improve operations through predictive and prescriptive analytics.
Confluent Partner Tech Talk with BearingPoint (confluent)
This document discusses best practices for debugging client applications in Kafka streams. It begins by asking a question about debugging practices for producers, consumers, and Kafka streams applications. It then describes a Partner Technical Sales Enablement offering that includes live sessions and on-demand learning paths on topics like Confluent fundamentals and use cases. It outlines additional support for partners through technical workshops, coaching, and solution discovery sessions. The document concludes by stating the goal of Partner Tech Talks is to provide insights and inspiration through use case discussions.
Data science involves analyzing data to extract meaningful insights. It uses principles from fields like mathematics, statistics, and computer science. Data scientists analyze large amounts of data to answer questions about what happened, why it happened, and what will happen. This helps generate meaning from data. There are different types of data analysis including descriptive analysis, which looks at past data, diagnostic analysis, which finds causes of past events, and predictive analysis, which forecasts future trends. The data analysis process involves specifying requirements, collecting and cleaning data, analyzing it, interpreting results, and reporting findings. Tools like SAS, Excel, R and Python are used for these tasks.
Are you collecting just about every metric under the sun and the kitchen sink too? Understanding the cost of collecting metrics and the usefulness of those metrics is the only way to scale in a cloud native world. You can’t get away with just collecting everything as you grow. Your observability teams need to make decisions about what to collect, what to drop, what to aggregate, and still be able to alert, triage, remediate, and do their root cause analysis on a daily basis. Gain immediate insights into high cost data (DPPS), when to drop time series data, and how to determine when the value of that data is at its lowest. Session includes a recorded demo video of it in action.
Automatic Data Reconciliation, Data Quality, and Data Observability.pdf (4dalert)
Automating Data Reconciliation, Data Observability, and Data Quality Check After Each Data Load, read more: https://medium.com/@nihar.rout_analytics/automatic-data-reconciliation-data-quality-and-data-observability-3eeca4650cd
Open Source Contributions to Postgres: The Basics (POSETTE 2024) (ElizabethGarrettChri)
Postgres is the most advanced open-source database in the world, and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently submitted a patch that was committed, and I want to share what I learned in that process. I'll give you an overview of Postgres versions and how the underlying project codebase functions. I'll also show you the process for submitting a patch and getting it tested and committed.
Similar to Five Ways to Leverage AI and Tableau (20)
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You... (Aggregage)
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag... (sameer shah)
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
Global Situational Awareness of A.I. and where it's headed (vikram sood)
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we're lucky, we'll be in an all-out race with the CCP; if we're unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
End-to-end pipeline agility - Berlin Buzzwords 2024Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long time does it take for all downstream pipelines to be adapted to an upstream change," the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
Analysis insight about a Flyball dog competition team's performanceroli9797
Insight of my analysis about a Flyball dog competition team's last year performance. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
Build applications with generative AI on Google CloudMárton Kodok
We will explore Vertex AI - Model Garden powered experiences, we are going to learn more about the integration of these generative AI APIs. We are going to see in action what the Gemini family of generative models are for developers to build and deploy AI-driven applications. Vertex AI includes a suite of foundation models, these are referred to as the PaLM and Gemini family of generative ai models, and they come in different versions. We are going to cover how to use via API to: - execute prompts in text and chat - cover multimodal use cases with image prompts. - finetune and distill to improve knowledge domains - run function calls with foundation models to optimize them for specific tasks. At the end of the session, developers will understand how to innovate with generative AI and develop apps using the generative ai industry trends.
5. info@starschema.com starschema.com
About Kristóf
Dr Kristóf Cséfalvay
• Originally trained in law, then in clinical epidemiology
• 10-year track record in data science, including roles at companies like Volkswagen AG and RB plc, and various roles in public service
• Special interest in neural networks/deep learning, geospatial analytics and data exploration
• Joined Starschema as Principal Data Scientist, currently serving as VP Special Projects
VP Special Projects, Starschema
8.
• Natural Language Processing (NLP) is the data science discipline that pertains to extracting information from, and learning on, unstructured or semi-structured textual data:
• computer-generated text: logs, status messages,...
• VOC (Voice of the Customer): e-mails, call transcripts, chatbot interactions, customer reviews...
• social media, brand surveillance, topic surveillance...
• internal and external knowledge bases, documents, reports, legal documentation (contracts, deeds &c.)...
• Based on research in computational linguistics since the 1970s, it has been enormously successful in a range of applications – to mention only a few examples:
• determining sentiment from a large volume of data, e.g. thousands of product reviews or tweets,
• categorising institutional knowledge by the most pertinent key concepts,
• analysing error tickets to prioritise support skillsets,
• understanding ratings from data sets that couple a rating with a review.
About NLP
9.
• 23,486 anonymised reviews of women's clothing from an unnamed e-commerce distributor
• Variables:
• Review text
• Rating (numeric, 1–5)
• Review weight (number of users who have found the review useful)
• Approach: tf/idf association
• Isolates terms that are relatively frequent within one review when compared to all reviews
• In other words: which terms are particularly salient to
• each review,
• good reviews (rating 4 or 5),
• bad reviews (rating 1 or 2)
• From this, we can create a picture of what drives customers' interest
Example data set: women's clothing reviews
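The tf/idf association described above can be sketched in a few lines of plain Python. This is a minimal illustrative implementation, not the workbook's actual code (see the linked GitHub repository for that); the function name and the naive whitespace tokenisation are assumptions, and a real pipeline would typically use a library such as scikit-learn's TfidfVectorizer.

```python
import math
from collections import Counter

def tfidf_salient_terms(reviews, top_n=3):
    """Rank the terms most salient to each review via tf/idf.

    tf is a term's relative frequency within one review; idf
    down-weights terms that appear in many reviews overall.
    """
    docs = [r.lower().split() for r in reviews]
    n = len(docs)
    # document frequency: in how many reviews does each term occur?
    df = Counter(t for d in docs for t in set(d))
    salient = []
    for d in docs:
        tf = Counter(d)
        scores = {t: (c / len(d)) * math.log(n / df[t]) for t, c in tf.items()}
        salient.append(sorted(scores, key=scores.get, reverse=True)[:top_n])
    return salient

reviews = [
    "lovely dress true to size",
    "dress runs small returned it",
    "soft comfortable top lovely fit",
]
print(tfidf_salient_terms(reviews))
```

Terms that occur in every review receive an idf of zero, so generic words drop out and review-specific terms (sizing complaints, fit, fabric) rise to the top – exactly the "salient terms" picture described above.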
11.
• Some processes over time exhibit a combination of a periodic effect (seasonality) and an aperiodic effect (trend), combined with a noise variable to account for everything else.
• Consider air traffic passenger counts:
• Overall, as air travel has become more affordable, more people travel by air today than 10 years ago (aperiodic effect/trend).
• Every year, people generally travel more by air during Christmas and the summer school holidays than at other times of the year (periodic effect/seasonality, since it recurs in every year).
• Beyond this, there are variations that aren't adequately accounted for by either seasonal or trend effects – a particularly bad air traffic disaster, for instance, may result in reduced bookings (accounted for by the noise variable).
• Seasonality detection allows
• a better understanding of the cyclic processes that drive demand effects,
• while also revealing more about trends by controlling for seasonal effects.
About seasonality detection
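To make the trend/seasonality/noise split concrete, here is a minimal additive-decomposition sketch in plain Python. It assumes y = trend + seasonality + noise with a known, odd period; this is an illustrative assumption rather than the demo's code (production work would typically use something like `seasonal_decompose` from statsmodels).

```python
def decompose(y, period):
    """Split a series into trend and seasonal components (additive model).

    Assumes an odd `period` so a centred moving average spans exactly
    one full cycle; whatever is left over after subtracting both
    components is the noise variable.
    """
    n = len(y)
    half = period // 2
    # trend: centred moving average over one full period
    trend = [None] * n
    for i in range(half, n - half):
        window = y[i - half:i + half + 1]
        trend[i] = sum(window) / len(window)
    # seasonal component: average detrended value at each phase of the cycle
    buckets = [[] for _ in range(period)]
    for i in range(n):
        if trend[i] is not None:
            buckets[i % period].append(y[i] - trend[i])
    seasonal = [sum(b) / len(b) if b else 0.0 for b in buckets]
    # centre the seasonal effects so they sum to zero over one period
    mean_s = sum(seasonal) / period
    seasonal = [s - mean_s for s in seasonal]
    return trend, seasonal
```

Because the moving-average window covers exactly one cycle, the periodic effect averages out of the trend estimate, which is precisely how the decomposition "reveals more about trends while controlling for seasonal effects".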
13.
• Anomaly detection identifies data points that fall outside a 'normal' pattern:
• abnormal combinations: people who buy bleach and duct tape (and nothing else)
• time series anomalies: sudden spikes or outliers
• Many of these can be easily captured visually, but machine learning can be useful in identifying and quantifying them in real time – allowing immediate response:
• fraud prevention: acting on evidence of anomalous transactions by blocking them (e.g. credit card lockdowns after a sudden sequence of cash withdrawals)
• maintaining quality of service: responding to increased demand due to extraneous factors by bringing in additional resources
• predictive maintenance and fault detection: using anomalies to detect prodromic signs of impending breakdown
• pattern break recognition: using an established pattern, such as average key press duration and average time between key presses within a word, as an 'identity' and alerting when this pattern is deviated from, a technique widely used to detect fraud on sensitive systems
About anomaly detection
14.
• 22,695 data points consisting of temperature sensor measurements from within a machine, at a temporal resolution of 5 minutes. The accumulation of an initial error and increasing degradation led to catastrophic failure of the machine at the end.
• The data set is a good and very realistic stand-in for a single dimension of the real-world predictive maintenance data we encounter on a regular basis – especially because it contains not one catastrophic anomaly but a sequence of a very overt and a much less overt anomaly culminating in catastrophic breakdown.
• Timely detection of such anomalies can
• save repair costs,
• reduce outages and disruptions to supply chain processes,
• prevent catastrophic breakdowns, and
• reduce the potential risk of personal injury or loss of property associated with serious defects.
Example data set: NAB Machine Temperature
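As one concrete way to flag spikes in a sensor trace like this in real time, here is a hedged rolling z-score sketch in plain Python. The function name, window size and threshold are illustrative assumptions, not the NAB benchmark's scoring code or the workbook's model.

```python
import statistics

def rolling_zscore_anomalies(series, window=5, threshold=3.0):
    """Flag readings whose distance from the trailing-window mean
    exceeds `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(series)):
        past = series[i - window:i]  # trailing window only: no look-ahead
        mu = statistics.mean(past)
        sigma = statistics.stdev(past)
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)      # index of the anomalous reading
    return anomalies

# steady readings around 90 degrees with one sudden spike,
# loosely mimicking a machine temperature trace
readings = [90.1, 90.3, 89.8, 90.0, 90.2, 90.1, 97.5, 90.0, 90.2]
print(rolling_zscore_anomalies(readings))  # → [6]
```

Because only past readings feed each decision, this kind of detector can run online on streaming sensor data, which is what enables the real-time responses (blocking, resourcing, maintenance alerts) listed above.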
16.
• Natural Language Generation (NLG) refers to the synthesis of natural language responses to data.
• NLG makes data intelligible by summarising it in a clear, cogent natural language output. While visualisations show what the data is, NLG automatically creates a data narrative based on the saliency of the information. This saves the lengthy process of manually writing reports for stakeholders who rely on quick textual summaries.
• In the following demo, a finance dashboard will be accompanied by an NLG output using a solution developed by our partner, Arria. The NLG output will put the often complex visuals into plain text, while also being tied closely to the data and remaining responsive to selections on the main dashboard.
About Natural Language Generation
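To illustrate the underlying idea only (this is a toy sketch, not Arria's engine or API), a template-based NLG function can choose its wording from the saliency of a metric's change – the names, thresholds and phrasing below are all assumptions:

```python
def narrate(region, revenue, change_pct):
    """Turn one row of finance metrics into a narrative sentence,
    picking the verb phrase from the size and sign of the change."""
    if abs(change_pct) < 1:
        movement = "was broadly flat"
    elif change_pct > 0:
        movement = f"grew {change_pct:.1f}%"
    else:
        movement = f"fell {abs(change_pct):.1f}%"
    return f"Revenue in {region} {movement}, reaching ${revenue:,.0f}."

print(narrate("EMEA", 1250000, 4.2))
# → "Revenue in EMEA grew 4.2%, reaching $1,250,000."
```

Production NLG systems layer many such saliency-driven choices (what to mention, in what order, with what wording) and, as in the demo, regenerate the narrative whenever the dashboard selection changes.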