The document provides an overview of the methodology for optimal individual sales allocation and forecasting for Astra Honda Motor. It involves several steps: 1) dealer-level forecasting, 2) allocation of the total forecast to dealers, 3) secondary forecasting at the dealer-level, and 4) validation and adjustment of the allocation and forecasts. It also discusses data preparation involving 32 variables related to customer and sales data. The goal is to optimize sales allocation and forecasts for individual dealers.
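The summary does not say how the total forecast is split across dealers in step 2. As one possible illustration, here is a minimal Python sketch that assumes each dealer receives a share proportional to its historical sales. It uses largest-remainder rounding so that whole units add up exactly to the total. The dealer names, the history figures, and the `allocate` helper are hypothetical.

```python
from typing import Dict

def allocate(total: int, history: Dict[str, float]) -> Dict[str, int]:
    """Split an integer total across dealers in proportion to historical sales.

    Assumption for illustration: proportional shares with largest-remainder
    rounding, so the integer allocations always sum exactly to `total`.
    """
    base = sum(history.values())
    if base <= 0:
        raise ValueError("historical sales must sum to a positive number")
    exact = {d: total * s / base for d, s in history.items()}
    alloc = {d: int(v) for d, v in exact.items()}  # floor of each (non-negative) share
    leftover = total - sum(alloc.values())
    # Hand the remaining units to the dealers with the largest fractional parts.
    for d in sorted(exact, key=lambda d: exact[d] - alloc[d], reverse=True)[:leftover]:
        alloc[d] += 1
    return alloc

# Hypothetical dealer history (units) and a hypothetical total forecast.
history = {"Dealer A": 500, "Dealer B": 300, "Dealer C": 200}
print(allocate(1001, history))  # {'Dealer A': 501, 'Dealer B': 300, 'Dealer C': 200}
```

Largest-remainder rounding is used here because naive rounding of each share can over- or under-allocate the total by a few units.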
Steve Molis shared his world-famous Salesforce formulas presentation with the Salesforce Wellington Trailblazer Community group on May 6, 2020. Check out the meeting and the wrap-up for more info, links, and photos.
SteveMo's Formulas and Life Hacks, Frankfurt DE, 2020-05-07, Alan Thomas Payne
Our world-renowned special Guest SteveMo has the Low-Down on Formulas, Reports, and a S**t-Load of great Admin LifeHacks!
So we're #StayingSafe and #StayingHome in the 2nd installment of our #VirtualEvents
Lightning talk on the future of analytics, CloudCamp London 2016, Jon Hawes
The document proposes a model for using data and analytics to generate insights that drive business processes and decisions. It outlines a move from a past model, where data was a byproduct, to a new model, where data drives decisions. However, simply hoping that data will provide answers is unrealistic given the limited number of data scientists. The document advocates combining business knowledge with data to validate, optimize, and predict outcomes, focusing first on "solid ground" opportunities before riskier exploratory efforts. It presents a structured approach involving missions, recipes, and runbooks to formalize problem-solving and make the process clear and repeatable. The goal is a feedback loop in which insights inform new questions and improvements.
This document discusses techniques for data visualization, including the basics of common chart types like line graphs, bar charts, scatter plots, and pie charts. It then addresses additional challenges involved in visualizing big data, such as handling large data volumes, different data varieties, visualization velocity, and filtering big data. SAS Visual Analytics is introduced as a solution that uses autocharting and high-performance analytics to help users quickly visualize and explore huge amounts of data.
This document discusses a learning activity that helps students practice multiplication and division with integers by having them create a family budget. It provides context on wages, expenses, and costs for a family of six. Students are asked to identify the family's monthly income and expenses, calculate the total budget and per-person spending, and explain their understanding of operations with integers. The activity aims to help students understand the meaning of positive and negative signs and use integer operations to model real-world financial situations.
The document discusses requirements analysis and provides guidance on taking an agile approach. It introduces the "Requirements Iceberg" concept and identifies relevant aspects of analysis. It emphasizes identifying customer needs rather than just the initial request, and gives examples of diagrams and models that can help uncover additional requirements, such as business process flows and conceptual data models. The document recommends iterating on the analysis to build understanding incrementally and avoid analysis paralysis.
This document provides an overview and agenda for a one-day data analysis training. The training will cover foundational concepts of data analysis including data preparation, visualization, and effective data presentation. It will include exercises in data gathering, graph types, pivot tables, and developing data stories. The goal is to help participants turn data into meaningful insights through analysis and visualization.
Data visualization is widely used across industries in infographic design, business analytics, data analytics, advanced analytics, business intelligence dashboards, and content marketing. This is the first part of a three-part series on data visualization. These techniques will help you create good UI/UX designs. It contains R code that programmers can use to create effective charts and tell a story to clients, customers, senior management, and others.
Digging into the Dirichlet Distribution by Max Sklar, Hakka Labs
When it comes to recommendation systems and natural language processing, data that can be modeled as a multinomial or as a vector of counts is ubiquitous. For example, if there are 2 possible user-generated ratings (like and dislike), then each item is represented as a vector of 2 counts. In a higher-dimensional case, each document may be expressed as a count of words, and the vector is large enough to encompass all the important words in that corpus of documents. The Dirichlet distribution is one of the basic probability distributions for describing this type of data.
The Dirichlet distribution is surprisingly expressive on its own, but it can also be used as a building block for even more powerful and deep models such as mixtures and topic models.
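A minimal Python sketch of the like/dislike case described above may help. Each item is a vector of two counts, and a Dirichlet prior with hypothetical concentration parameters is combined with the observed counts. Because the Dirichlet is conjugate to the multinomial, the posterior is again a Dirichlet, with parameters `prior + counts`. Sampling uses the standard construction of normalizing independent Gamma draws. The prior values and counts are illustrative assumptions, not taken from the talk.

```python
import random
from typing import List

def dirichlet_sample(alpha: List[float], rng: random.Random) -> List[float]:
    """Draw one probability vector from Dirichlet(alpha) by normalizing Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def posterior_alpha(prior: List[float], counts: List[int]) -> List[float]:
    """Dirichlet prior plus multinomial counts gives a Dirichlet posterior (conjugacy)."""
    return [a + c for a, c in zip(prior, counts)]

def dirichlet_mean(alpha: List[float]) -> List[float]:
    """Expected probability vector of Dirichlet(alpha): alpha_i / sum(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]

# Hypothetical item with 8 likes and 2 dislikes, and a weak symmetric prior.
prior = [1.0, 1.0]
counts = [8, 2]
post = posterior_alpha(prior, counts)
print(post)                  # [9.0, 3.0]
print(dirichlet_mean(post))  # [0.75, 0.25]

rng = random.Random(0)
sample = dirichlet_sample(post, rng)  # one plausible (p_like, p_dislike) vector
print(round(sum(sample), 9))          # 1.0
```

The same functions work unchanged for the high-dimensional word-count case; only the length of the vectors grows.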
[DSC Europe 22] Why is a good data scientist a package of professions? - Neza..., DataScienceConferenc1
Working with data isn’t just a technical challenge. Once you leave the safe nest of your data science department, you often need to act as a lawyer, therapist, writer, or creative director. Storytelling turns out to be one of the key skills when explaining data to others. How can you avoid overburdening product managers with data, and how can you present findings in an understandable way? We will discuss ways to hack the counterproductive dynamic between two professions, data scientist and product manager, where one is experimenting while the other is boxing it in. This talk is about how this collaboration can be pushed to the next level and how a data scientist can expand her skill set beyond technical proficiency and critical thinking.
Charts are a beautiful way to visualize your data. In this session, we'll introduce the dozens of options to create unique Charts and take questions on how to best optimize your reporting.
Tutorial for beginning graduate students. Basic exploration of multivariate experimental data can be done with freely downloadable software. We also discuss the use of Excel because it is so widely used.
Mehar Singh, CEO of ProCogia, and Jason Grahn, Senior Business Analyst at Apptio, co-present on the journey from Excel to R at the second Bellevue chapter useR Group Meetup.
If we’re producing analysis that drives business decision making, that’s production-grade code! This talk addresses that claim and shows why R is the way to go: assumptions are built into the code, which enables the analyst to automate and reproduce their work.
This presentation includes:
- Data importing (opening a CSV file or connecting to a SQL database in both tools)
- Filtering, grouping, and summarizing (pivot tables in Excel vs. tidy code in R)
- Visualizations (charts in Excel vs. ggplot in R)
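To show what the filter, group, and summarize step looks like as code rather than as a pivot table, here is a minimal sketch in standard-library Python. The talk itself uses R (tidy code), so this is a stand-in, not the presenters' code. The inline CSV, its column names (`region`, `product`, `units`), and the `summarize` helper are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sales extract; in practice this would come from a CSV file or a SQL query.
RAW = """region,product,units
North,A,10
North,B,5
South,A,7
South,A,3
North,A,2
"""

def summarize(raw: str, min_units: int = 0) -> dict:
    """Filter rows, group by (region, product), and sum units, like a pivot table."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(raw)):
        units = int(row["units"])
        if units < min_units:  # filter step: drop small transactions
            continue
        totals[(row["region"], row["product"])] += units  # group + summarize
    return dict(totals)

pivot = summarize(RAW, min_units=3)
for (region, product), units in sorted(pivot.items()):
    print(region, product, units)
# North A 10
# North B 5
# South A 10
```

Unlike a pivot table built by clicking, the filter threshold and grouping keys are written down in the code, which is the reproducibility argument the talk makes for R.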
The document discusses data mining and treats the gender gap as a fundamental issue for metrics. Data mining models and patterns can help resolve this issue. Achieving gender equality and empowering women and girls is Sustainable Development Goal 5. Experience with data mining techniques such as clustering and pattern discovery can turn data into information and knowledge. However, visually counting large numbers is risky, because the patterns produced are variable and ambiguous.
This document introduces intervention analysis as it applies to regression models. It describes a case scenario in which a student and an instructor build a regression model to explain and forecast monthly sales (FRED.SALE) for Fred's Deli, using monthly advertising (FRED.ADVERT) as a predictor variable. Initially, plotting the sales data over time shows no clear pattern. The autocorrelations suggest some seasonality in sales at 12 months, but the values are not significant. Cross-correlating sales with advertising shows a significant correlation at a 2-month lag, so a regression model is estimated with sales as the dependent variable and advertising lagged by 2 months as the explanatory variable. Diagnostic checks of the residuals from this model show …
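The two modeling steps in that case (find the lag at which advertising correlates most strongly with later sales, then regress sales on advertising lagged by that amount) can be sketched in Python. As a simplification, "cross-correlation" here is plain Pearson correlation on the overlapping parts of the two series. The series are synthetic and constructed so that sales depend exactly on advertising two months earlier; the actual FRED.SALE and FRED.ADVERT data are not reproduced.

```python
from statistics import mean
from typing import List, Tuple

def cross_corr(x: List[float], y: List[float], lag: int) -> float:
    """Pearson correlation between x[t - lag] and y[t] (advertising leading sales)."""
    xs, ys = x[: len(x) - lag], y[lag:]
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

def ols(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Simple least squares y = b0 + b1 * x; returns (b0, b1)."""
    mx, my = mean(x), mean(y)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1

# Synthetic monthly data: sales respond to advertising two months later.
advert = [5, 3, 8, 6, 2, 9, 4, 7, 1, 6, 8, 3, 5, 7]
sales = [0.0, 0.0] + [10 + 2.5 * a for a in advert[:-2]]

# Step 1: pick the lag (1 to 3 months) with the strongest correlation.
best_lag = max(range(1, 4), key=lambda k: cross_corr(advert, sales, k))
# Step 2: regress sales[t] on advert[t - best_lag].
b0, b1 = ols(advert[: len(advert) - best_lag], sales[best_lag:])
print(best_lag, round(b0, 6), round(b1, 6))  # 2 10.0 2.5
```

With real data, the next step would be the residual diagnostics the case goes on to discuss, since a good fit on a lagged predictor does not rule out autocorrelated errors.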
Watson Analytics provides self-service data analytics capabilities including data acquisition, cleansing, insights discovery, outcome prediction, visualization, and action without requiring data expert assistance. It handles large volumes of rapidly accessible data and automates data preparation, refinement, management, and analysis from the cloud. Statistical analysis, correlations, and predictions help users gain a deeper understanding of their business to see relevant information, take action, and anticipate opportunities.
A big data analytics company focused on ‘Individual Persona Data-driven Insights’ at mass scale, to significantly enhance the effectiveness of business and marketing actions. We provide uniquely tailored solutions to each client, rather than a standard package. Free Trial Available
Financial Forecasting for WordPress Businesses, Caldera Labs
A financial forecast estimates a company's financial performance over a period of time. It is created using historical internal data, external indicators, and qualitative and quantitative methods. Qualitative methods include customer research while quantitative methods include time series analysis, regression analysis, and decomposition. The forecasting process involves identifying the problem, relevant variables, data collection approach, assumptions, model selection, forecasting, and verification. Examples of financial forecasts for WordPress businesses include estimating increased support needs from sales growth and creating a 3-year forecast for theme revenue, costs, and profits.
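One of the quantitative methods listed, regression on a time trend, can be sketched in a few lines of Python. The sketch fits a straight line to historical annual revenue, extends it three years, and derives costs and profit. The revenue history, the fixed cost ratio, and the helper names are hypothetical placeholders, not figures from the presentation. A real forecast would also document its assumptions and be verified against actuals, as the process above describes.

```python
from typing import Dict, List, Tuple

def linear_trend(values: List[float]) -> Tuple[float, float]:
    """Least-squares line through (year_index, value); returns (intercept, slope)."""
    n = len(values)
    mx = (n - 1) / 2  # mean of indices 0..n-1
    my = sum(values) / n
    num = sum((x - mx) * (y - my) for x, y in enumerate(values))
    den = sum((x - mx) ** 2 for x in range(n))
    slope = num / den
    return my - slope * mx, slope

def forecast(history: List[float], years: int, cost_ratio: float) -> List[Dict[str, float]]:
    """Project revenue along the fitted trend; costs are an assumed fixed share of revenue."""
    intercept, slope = linear_trend(history)
    rows = []
    for k in range(years):
        revenue = intercept + slope * (len(history) + k)
        costs = revenue * cost_ratio
        rows.append({"year": k + 1, "revenue": revenue, "costs": costs, "profit": revenue - costs})
    return rows

# Hypothetical annual theme revenue for the last four years.
history = [100_000, 120_000, 140_000, 160_000]
for row in forecast(history, years=3, cost_ratio=0.6):
    print(row)
```

A straight-line trend is the simplest choice; seasonal or decomposition-based methods would replace `linear_trend` without changing how costs and profit are derived.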
This document provides an overview of SAS Visual Programmer (VP) and walks through an example of analyzing retail store data without coding. It explains that VP allows analysts to visually map out the logical data flow from raw data to actionable insights. The example demonstrates importing retail store data, sorting and filtering it to find the highest selling and highest profit items, and presenting the results in charts for the CEO. The learning objectives are to effectively use VP, import data into it, and create different types of charts.
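SAS Visual Programmer performs these steps without code. For comparison, here is the same sort-and-filter logic (highest-selling and highest-profit items) as a short Python sketch. The item records, field names, and the `top_n` helper are hypothetical.

```python
from typing import List, NamedTuple

class Item(NamedTuple):
    name: str
    units_sold: int
    profit: float

# Hypothetical retail store records.
ITEMS = [
    Item("Kettle", 120, 840.0),
    Item("Toaster", 95, 1140.0),
    Item("Blender", 60, 1500.0),
    Item("Mug", 400, 400.0),
    Item("Lamp", 15, -30.0),
]

def top_n(items: List[Item], key: str, n: int = 3, min_profit: float = 0.0) -> List[str]:
    """Keep items with profit above min_profit, sort descending by `key`, return the first n names."""
    kept = [it for it in items if it.profit > min_profit]
    ranked = sorted(kept, key=lambda it: getattr(it, key), reverse=True)
    return [it.name for it in ranked[:n]]

print(top_n(ITEMS, "units_sold"))  # ['Mug', 'Kettle', 'Toaster']
print(top_n(ITEMS, "profit"))      # ['Blender', 'Toaster', 'Kettle']
```

The two rankings differ, which is exactly the point such a CEO-facing chart would make: the best-selling item is not the most profitable one.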
Training Taster: Leading the way to become a data-driven organization, GoDataDriven
The document discusses becoming a data-driven organization. It provides an overview of the value chain of data science and an analytics maturity journey. The value chain of data science shows how data can be measured, optimized, used to generate predictions and insights, and ultimately turned into value. It emphasizes starting with the desired value and working backwards to the necessary data. The analytics maturity journey outlines four phases (initialization, continuous experimentation, enterprise empowerment, and data democratization), each with a different focus for building analytical capabilities and business adoption of data and analytics. It also outlines the key roles in a minimal viable data science team.
Intuitions and Formulations for Data Science Problems, Musfir Mohammed
What are the challenges the IT industry is facing now? And how is data science the savior?
How do you convert a real-world problem into a data science problem?
An introduction to modeling a known problem in order to formulate and simulate the real-world scenario
Visualize data using the split-apply-combine approach, Luca Candela
An easy-to-understand primer on the "split-apply-combine" concept popularized by Hadley Wickham, applied to data visualization. After that, I give a simple introduction to the perceptual variables available for data visualization and cover some common mistakes.
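The split-apply-combine idea can be sketched in plain Python: split records by a grouping key, apply a summary function to each group, and combine the results into one table that is ready to chart. Here the combined table is drawn as a crude text bar chart. The records, the use of `mean` as the summary, and the helper names are illustrative assumptions, not taken from the primer.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, Iterable, List, Tuple

def split_apply_combine(
    rows: Iterable[Tuple[str, float]],
    summary: Callable[[List[float]], float],
) -> Dict[str, float]:
    """Split values by key, apply `summary` to each group, and combine into one dict."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in rows:  # split
        groups[key].append(value)
    return {key: summary(vals) for key, vals in sorted(groups.items())}  # apply + combine

def text_bars(table: Dict[str, float], scale: float = 1.0) -> str:
    """Draw each group as a row of '#' marks whose length is proportional to its value."""
    return "\n".join(
        f"{key:<11}{'#' * round(val * scale)} {val:g}" for key, val in table.items()
    )

# Hypothetical (species, petal length) observations.
rows = [("setosa", 1.4), ("setosa", 1.6), ("virginica", 5.5), ("virginica", 6.1), ("versicolor", 4.2)]
table = split_apply_combine(rows, mean)
print(text_bars(table, scale=2))
```

Swapping `mean` for `max`, `len`, or any other function changes the "apply" step without touching the split or combine logic, which is what makes the pattern reusable across chart types.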
This document outlines an activity to practice modeling and predicting values using simple linear regression. Students are asked to:
1. Record guesses and actual values for various jars of jelly beans to see how far off their guesses are.
2. Use the differences between guesses and actuals to develop a formula to "correct" future guesses.
3. Apply the same process to guessing college football wins to refine their predictive model.
4. Complete tables and calculations in StatCrunch to fit linear and quadratic models to their data and evaluate which model fits best. They are asked to use the model to predict further values and evaluate residuals.
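Steps 2 and 4 amount to regressing actual values on guesses and comparing a linear fit with a quadratic fit. StatCrunch does this through menus; the following is a minimal Python sketch of the same idea, not StatCrunch's interface. It solves the least-squares normal equations, centering and scaling the guesses first so that the quadratic fit stays numerically well conditioned. It then reports each model's residual sum of squares. The guess/actual pairs and helper names are hypothetical.

```python
from typing import List, Tuple

Model = Tuple[float, float, List[float]]  # (center, scale, coefficients in t = (x - center) / scale)

def fit_poly(x: List[float], y: List[float], degree: int) -> Model:
    """Least-squares polynomial fit via the normal equations on centered, scaled x."""
    center = sum(x) / len(x)
    scale = max(abs(xi - center) for xi in x) or 1.0
    t = [(xi - center) / scale for xi in x]  # keeps the normal equations well conditioned
    m = degree + 1
    a = [[sum(ti ** (i + j) for ti in t) for j in range(m)] for i in range(m)]
    b = [sum(yi * ti ** i for ti, yi in zip(t, y)) for i in range(m)]
    for col in range(m):  # Gaussian elimination with partial pivoting
        p = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[p], b[col], b[p] = a[p], a[col], b[p], b[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * m
    for i in reversed(range(m)):  # back substitution
        coef[i] = (b[i] - sum(a[i][j] * coef[j] for j in range(i + 1, m))) / a[i][i]
    return center, scale, coef

def predict(model: Model, x: float) -> float:
    center, scale, coef = model
    t = (x - center) / scale
    return sum(c * t ** k for k, c in enumerate(coef))

def sse(model: Model, x: List[float], y: List[float]) -> float:
    """Sum of squared residuals (actual minus predicted)."""
    return sum((yi - predict(model, xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical class data: guessed vs. actual jelly-bean counts.
guesses = [150, 200, 320, 400, 500, 650]
actuals = [210, 260, 400, 470, 580, 720]

lin = fit_poly(guesses, actuals, 1)
quad = fit_poly(guesses, actuals, 2)
print("linear SSE:", round(sse(lin, guesses, actuals), 2))
print("quadratic SSE:", round(sse(quad, guesses, actuals), 2))
print("corrected guess for 300:", round(predict(lin, 300)))
```

The quadratic SSE can never exceed the linear one, because the linear model is a special case of it. So a lower SSE alone does not show that the quadratic "fits best"; the residual plots the activity asks for are what reveal whether the extra term is warranted.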
The document discusses data science projects and their evolution over time. It covers several frameworks for data science projects including SEMMA, KDD, and CRISP-DM. It provides examples of descriptive and predictive analytics applied to automotive sales data. Finally, it discusses evaluating analytical models and assessing discrepancies between sales forecasts and actual sales.
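Assessing discrepancies between sales forecasts and actual sales usually comes down to a few error measures. The following Python sketch computes three common ones: mean absolute error, mean absolute percentage error, and bias. The choice of metrics and the monthly figures are assumptions made for illustration, not taken from the document.

```python
from typing import Dict, List

def forecast_errors(actual: List[float], forecast: List[float]) -> Dict[str, float]:
    """MAE, MAPE (in percent), and bias (mean of forecast minus actual) for paired series."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    n = len(actual)
    diffs = [f - a for a, f in zip(actual, forecast)]
    mae = sum(abs(d) for d in diffs) / n
    mape = 100 * sum(abs(d) / abs(a) for d, a in zip(diffs, actual)) / n
    bias = sum(diffs) / n  # positive means the forecast runs high on average
    return {"MAE": mae, "MAPE": mape, "bias": bias}

# Hypothetical monthly unit sales.
actual = [100, 120, 80, 150]
forecast = [110, 115, 90, 140]
print(forecast_errors(actual, forecast))
# MAE 8.75, bias 1.25 (forecast slightly high on average), MAPE about 8.33%
```

MAE and MAPE measure the size of the misses, while bias shows whether they lean consistently in one direction. A systematic bias is often the first discrepancy worth investigating.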
Big Data Analytics Tools (Final Exam: PROJECT - BETTER UNDERSTAND ATTRITION.docx), tangyechloe
FINAL EXAM – EXERCISE – To Better Understand Attrition.
This is a final project: you are going to examine the HR-BalanceSheet dataset and write a short report on what you found. I will guide you through the analysis, but as we go through it, you will need to capture data for the final report.
1. Load the dataset into Statistica
2. Generate Histograms for all of the data
a. Make notes on what you observe from the histograms. Can you learn anything about the business from these histograms?
b. Capture all of the histograms.
3. Now generate a correlation matrix to see if any variables are highly correlated. If variables are highly correlated and you are doing a supervised method (e.g., decision tree), then one of them must be omitted from the analysis. Do you know why?
Go to Statistics->Nonparametrics->Correlations, then click OK.
Now select ALL of the variables and select “Spearman rank R”.
4. Let’s copy this out to Excel.
a. Open a blank Excel file
b. Go to the output correlation matrix in Statistica.
i. Press Ctrl+A to select everything.
ii. Right-click and select “Copy with Headers”.
iii. Go to Excel and select Paste.
5. Select all of the numbers in Excel
a. Go to Conditional Formatting
i. Highlight all values greater than 0.70
6. This shows which variables are highly correlated (ignore the diagonal, where each variable correlates perfectly with itself). Record them: these variables cannot be used together in a supervised modeling exercise. For example, JobLevel and TotalWorkingYears are highly correlated.
a. Make a list of all of the variable pairs that are highly correlated (>0.7).
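Outside Statistica, steps 3 to 6 (Spearman rank correlations, then flagging pairs above 0.70) can be sketched in standard-library Python. The sketch ranks each column, giving tied values their average rank, and computes Pearson correlation on those ranks, which is Spearman's rho. It then flags pairs whose absolute value exceeds the threshold. Using the absolute value, so that strong negative correlations are also caught, is a choice made here rather than something the exercise specifies. The column values are hypothetical, not the HR-BalanceSheet data.

```python
from itertools import combinations
from typing import Dict, List, Tuple

def average_ranks(values: List[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j are tied
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation (undefined, division by zero, for a constant column)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x: List[float], y: List[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))

def high_pairs(data: Dict[str, List[float]], threshold: float = 0.70) -> List[Tuple[str, str, float]]:
    """All variable pairs with |Spearman rho| above threshold (the diagonal is never compared)."""
    out = []
    for a, b in combinations(sorted(data), 2):
        rho = spearman(data[a], data[b])
        if abs(rho) > threshold:
            out.append((a, b, round(rho, 3)))
    return out

# Hypothetical extract with three of the dataset's columns.
data = {
    "JobLevel": [1, 2, 2, 3, 4, 5, 1, 3],
    "TotalWorkingYears": [2, 8, 6, 12, 20, 30, 1, 15],
    "DistanceFromHome": [5, 2, 20, 8, 3, 11, 9, 1],
}
for pair in high_pairs(data):
    print(pair)  # ('JobLevel', 'TotalWorkingYears', 0.982)
```

Because `combinations` only visits each pair once, the diagonal and the mirrored upper/lower halves of the matrix never need to be filtered out by hand.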
BUSINESS PROBLEM: The company has employee data for the last several years. This data set covers a wide range of data, including whether or not each employee left the company (i.e., Attrition). If Attrition is set to “Yes”, the employee left the company. If Attrition is set to “No”, the employee did not leave the company.
The first thing we want to do is take a high-level look at the people who left the company.
Open Selection Criteria, which is accessible through the Sel:Off setting at the bottom of the Statistica window: click “Sel:Off”.
Set the selection criteria to Attrition = “Yes”.
7. Generate Histograms for all of the data
a. Make notes on what you observe from the histograms. Can you learn anything about the business from these histograms?
b. Capture the histograms that tell you something about the business.
Go back to the selection criteria and turn the Sel: back to “Off”.
8. Now build a decision tree (C&RT) to see if we can find out what influences whether or not individuals decide to leave the company.
If you exclude the variables that are highly correlated, you can generate a tree.
Generate a C&RT tree
Pick your variables (Quick)
· Attrition is your dependent variable
· Select the categorical and continuous v.
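For a categorical target such as Attrition, a CART-style classification tree grows by repeatedly choosing the split that most reduces node impurity, commonly measured by Gini impurity. Below is a minimal Python sketch of finding the best single threshold split on one continuous predictor. The records and helper names are hypothetical, and a full C&RT implementation also handles categorical predictors, stopping rules, and pruning.

```python
from typing import List, Optional, Tuple

def gini(labels: List[str]) -> float:
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(x: List[float], y: List[str]) -> Optional[Tuple[float, float]]:
    """Return (threshold, weighted Gini) of the best 'x <= threshold' split, or None if x is constant."""
    n = len(x)
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold midway between neighbouring distinct values
        left = [lab for xi, lab in zip(x, y) if xi <= t]
        right = [lab for xi, lab in zip(x, y) if xi > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = (t, score)
    return best

# Hypothetical records: years at the company vs. whether the employee left.
years = [1, 2, 2, 3, 8, 10, 12, 15]
attrition = ["Yes", "Yes", "Yes", "No", "No", "No", "Yes", "No"]
print(f"root Gini: {gini(attrition):.3f}")  # root Gini: 0.500
t, score = best_split(years, attrition)
print(f"best split: YearsAtCompany <= {t} (weighted Gini {score:.3f})")
# best split: YearsAtCompany <= 2.5 (weighted Gini 0.200)
```

A tree is built by running this search over every predictor, keeping the best split overall, and repeating inside each child node.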
We are pleased to share with you the latest VCOSA statistical report on the cotton and yarn industry for the month of May 2024.
Starting from January 2024, the full weekly and monthly reports will only be available for free to VCOSA members. To access the complete weekly report with figures, charts, and detailed analysis of the cotton fiber market in the past week, interested parties are kindly requested to contact VCOSA to subscribe to the newsletter.
06-18-2024 Princeton Meetup: Introduction to Milvus, Timothy Spann
tim.spann@zilliz.com
https://www.linkedin.com/in/timothyspann/
https://x.com/paasdev
https://github.com/tspannhw
https://github.com/milvus-io/milvus
Get Milvused!
https://milvus.io/
Read my Newsletter every week!
https://github.com/tspannhw/FLiPStackWeekly/blob/main/142-17June2024.md
For more cool Unstructured Data, AI and Vector Database videos check out the Milvus vector database videos here
https://www.youtube.com/@MilvusVectorDatabase/videos
Unstructured Data Meetups -
https://www.meetup.com/unstructured-data-meetup-new-york/
https://lu.ma/calendar/manage/cal-VNT79trvj0jS8S7
https://www.meetup.com/pro/unstructureddata/
https://zilliz.com/community/unstructured-data-meetup
https://zilliz.com/event
Twitter/X: https://x.com/milvusio https://x.com/paasdev
LinkedIn: https://www.linkedin.com/company/zilliz/ https://www.linkedin.com/in/timothyspann/
GitHub: https://github.com/milvus-io/milvus https://github.com/tspannhw
Invitation to join Discord: https://discord.com/invite/FjCMmaJng6
Blogs: https://milvusio.medium.com/ https://www.opensourcevectordb.cloud/ https://medium.com/@tspann
Expand LLMs' knowledge by incorporating external data sources into LLMs and your AI applications.
Did you know that drowning is a leading cause of unintentional death among young children? According to recent data, children aged 1-4 years are at the highest risk. Let's raise awareness and take steps to prevent these tragic incidents. Supervision, barriers around pools, and learning CPR can make a difference. Stay safe this summer!
Discover the cutting-edge telemetry solution implemented for Alan Wake 2 by Remedy Entertainment in collaboration with AWS. This comprehensive presentation dives into our objectives, detailing how we utilized advanced analytics to drive gameplay improvements and player engagement.
Key highlights include:
Primary Goals: Implementing gameplay and technical telemetry to capture detailed player behavior and game performance data, fostering data-driven decision-making.
Tech Stack: Leveraging AWS services such as EKS for hosting, WAF for security, Karpenter for instance optimization, S3 for data storage, and OpenTelemetry Collector for data collection. EventBridge and Lambda were used for data compression, while Glue ETL and Athena facilitated data transformation and preparation.
Data Utilization: Transforming raw data into actionable insights with technologies like Glue ETL (PySpark scripts), Glue Crawler, and Athena, culminating in detailed visualizations with Tableau.
Achievements: Successfully managing 700 million to 1 billion events per month at a cost-effective rate, with significant savings compared to commercial solutions. This approach has enabled simplified scaling and substantial improvements in game design, reducing player churn through targeted adjustments.
Community Engagement: Enhanced ability to engage with player communities by leveraging precise data insights, despite having a small community management team.
This presentation is an invaluable resource for professionals in game development, data analytics, and cloud computing, offering insights into how telemetry and analytics can revolutionize player experience and game performance optimization.
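The description says only that EventBridge and Lambda were used for data compression. It does not give the format. As a sketch under assumed conditions, a batch of telemetry events could be stored as newline-delimited JSON and gzip-compressed before landing in S3. The event fields are hypothetical, and no AWS API is used here. The last lines convert the quoted monthly volume into an average per-second rate, assuming a 30-day month.

```python
import gzip
import json


def compress_batch(events):
    """Serialize events as newline-delimited JSON and gzip the whole batch."""
    lines = (json.dumps(e, separators=(",", ":")) for e in events)
    return gzip.compress("\n".join(lines).encode("utf-8"))


def decompress_batch(blob):
    """Inverse of compress_batch."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines()]


# Hypothetical gameplay events; field names are illustrative only.
events = [{"event": "checkpoint", "chapter": 3, "t": i} for i in range(1000)]
blob = compress_batch(events)
raw_size = sum(len(json.dumps(e, separators=(",", ":"))) + 1 for e in events)
print(f"{raw_size} bytes raw -> {len(blob)} bytes gzipped")

# Average rate implied by 700 million to 1 billion events per month (30-day month assumed).
seconds_per_month = 30 * 24 * 3600
for per_month in (700_000_000, 1_000_000_000):
    print(f"{per_month:,} events/month ~ {per_month / seconds_per_month:.0f} events/s")
```

Newline-delimited JSON is a common choice because Glue crawlers and Athena can read gzipped JSON-lines files directly. The actual format used in the project is not stated.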
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases, Timothy Spann
Tech Talk: Unstructured Data and Vector Databases
Speaker: Tim Spann (Zilliz)
Abstract: In this session, I will discuss unstructured data and the world of vector databases, and we will see how they differ from traditional databases: in which cases you need one and in which you probably don't. I will also go over similarity search, where vectors come from, and an example of a vector database architecture, wrapping up with an overview of Milvus.
Introduction
Unstructured data, vector databases, traditional databases, similarity search
Vectors
Where, What, How, Why Vectors? We’ll cover a Vector Database Architecture
Introducing Milvus
What drives Milvus' emergence as the most widely adopted vector database
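To make "similarity search" concrete, here is a minimal exact (brute-force) cosine-similarity top-k search over a few hypothetical 3-dimensional vectors. This is not the Milvus API. Milvus typically uses approximate nearest-neighbor indexes to avoid scanning every vector, but the ranking idea is the same.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [(cosine(query, v), doc_id) for doc_id, v in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical embeddings; real ones come from an embedding model and have hundreds of dimensions.
vectors = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.85, 0.15, 0.05],
    "car": [0.0, 0.2, 0.95],
}
query = [1.0, 0.1, 0.0]
print(top_k(query, vectors, k=2))
```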
Hi Unstructured Data Friends!
I hope this video had all the unstructured data processing, AI, and vector database demos you needed for now. If not, there's a ton more linked below.
My source code is available here
https://github.com/tspannhw/
Let me know in the comments if you liked what you saw, how I can improve, and what I should show next. Thanks, and I hope to see you soon at a Meetup in Princeton, Philadelphia, or New York City, or here in the YouTube Matrix.
Read my Newsletter every week!
https://github.com/tspannhw/FLiPStackWeekly/blob/main/141-10June2024.md
https://www.meetup.com/unstructured-data-meetup-new-york/events/301383476/?slug=unstructured-data-meetup-new-york&eventId=301383476
https://www.aicamp.ai/event/eventdetails/W2024062014
32. WIKIPEDIA:
Data
A scientist is a person engaging in a systematic activity to acquire knowledge that describes and predicts the natural world. → Data
Data: a set of values of qualitative or quantitative variables.
70. Now assume that you have a cleansed big data set...
- Describe the data using visualization or other appropriate measurements.
- Define the problem.
  - Supervised VS Unsupervised
  - Balanced VS Unbalanced
  - Cross-section VS Time-Series VS Panel
  - Prediction: Estimation VS Forecasting
  - Improvement: Accuracy VS Insight
- Modeling.
  - Expertise
  - Econometric
  - AI
  - Hybrid
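Two of the "Define the problem" questions can be answered mechanically before modeling. The sketch below computes a class-imbalance ratio (Balanced VS Unbalanced) and classifies a data set as cross-section, time-series, or panel, based on how many entities and periods it contains. The dealer and month records are hypothetical. The helpers are illustrative, and the slide does not prescribe them.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (1.0 = perfectly balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def data_structure(records, entity_key, time_key):
    """Classify a data set as cross-section, time-series, or panel."""
    entities = {r[entity_key] for r in records}
    periods = {r[time_key] for r in records}
    if len(periods) == 1:
        return "cross-section"  # many entities, one period
    if len(entities) == 1:
        return "time-series"    # one entity, many periods
    return "panel"              # many entities observed over many periods


# Hypothetical examples.
labels = ["No"] * 8 + ["Yes"] * 2
dealer_months = [
    {"dealer": "D1", "month": "2016-01"},
    {"dealer": "D1", "month": "2016-02"},
    {"dealer": "D2", "month": "2016-01"},
    {"dealer": "D2", "month": "2016-02"},
]
print(imbalance_ratio(labels), data_structure(dealer_months, "dealer", "month"))
```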
71. [Figure: Statistical learning flowchart. Components: Data; Features Selection; Clustering; Training set, Validation set, and Test set; Individual classification algorithm; Homogeneous ensemble algorithm; Heterogeneous ensemble algorithm; Train classifier; Apply model; Classification models; Validation set predictions; Ensemble model; Test set prediction; Estimated Value.]
PLIZ, OJO NGE-LIB!!
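The flowchart splits the data into training, validation, and test sets, then combines individual models into an ensemble using their validation-set predictions. The sketch below shows one simple way to do that combination: each model's vote is weighted by its validation accuracy. This is a swapped-in technique; the slide does not say which combiner was used. The individual classifiers are stood in for by hypothetical hard-coded prediction lists.

```python
import random
from collections import defaultdict


def train_valid_test_split(rows, train=0.6, valid=0.2, seed=42):
    """Shuffle a copy of rows and cut it into training, validation, and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_valid = int(len(shuffled) * valid)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])


def accuracy(predicted, actual):
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def fit_vote_weights(valid_predictions, valid_labels):
    """One weight per individual model: its accuracy on the validation set."""
    return [accuracy(p, valid_labels) for p in valid_predictions]


def weighted_vote(predictions, weights):
    """Combine per-model predictions; each model's vote counts with its weight."""
    combined = []
    for votes in zip(*predictions):  # the votes of all models for one row
        score = defaultdict(float)
        for label, weight in zip(votes, weights):
            score[label] += weight
        combined.append(max(score, key=score.get))
    return combined


rows = list(range(10))
train, valid, test = train_valid_test_split(rows)

# Hypothetical predictions of three trained models on the validation set.
valid_labels = ["Yes", "No", "Yes", "No"]
valid_preds = [["Yes", "No", "Yes", "No"],
               ["Yes", "Yes", "Yes", "Yes"],
               ["No", "No", "Yes", "No"]]
weights = fit_vote_weights(valid_preds, valid_labels)

# Hypothetical predictions of the same three models on two test rows.
test_preds = [["No", "No"], ["Yes", "Yes"], ["No", "Yes"]]
print(weights, weighted_vote(test_preds, weights))
```

A stacked ensemble would instead train a second-level classifier on the validation-set predictions. The weighted vote is the simplest version of that idea.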
73. Variable | Description | Value
ROW_ID | Row ID | NUMERIC
MAIN_PARTNER | Referral ID number from Astra World (AWO) | NUMERIC
FRAME_NO | Frame number of the customer's motorcycle | TEXT
CUST_ID | Customer ID number taken from the KTP/SIM (identity card / driving licence) | TEXT
SALES_DATE | Date the Honda motorcycle was purchased | DATE (YYYY-MM-DD HH:MM:SS)
KODE_MESIN | Engine code; each motorcycle type has an engine code that differs from other types | NOMINAL, 75 values {JF81E, ...}
SEQUENCE_MESIN | Sequence number of the engine code | NUMERIC
VARIAN_MOTOR | Variant of the customer's motorcycle | NOMINAL, 76 values {ALL NEW VARIO, …}
COLOR | Color of the customer's motorcycle | NOMINAL, 73 values {HITAM (black), …}
KODE_CUSTOMER | Customer type | {INDIVIDUAL, COLLECTIVE, GROUP, JOINT PROMO}
JENIS_KELAMIN | Customer gender | {LAKI-LAKI (male), PEREMPUAN (female)}
TANGGAL_LAHIR | Customer's month and year of birth | DATE (MM/YYYY)
KELURAHAN_SURAT | Kelurahan (urban village) of the customer's mailing address | NOMINAL, 1251 values {KETEWEL, …}
KECAMATAN_SURAT | Kecamatan (district) of the customer's mailing address | NOMINAL, 120 values {SUKAWATI, …}
KOTA_SURAT | City/regency of the customer's mailing address | NOMINAL, 30 values {KAB. GIANYAR, …}
KODE_POS | Postal code of the customer's mailing address | NUMERIC
PROPINSI | Province of the customer's mailing address | NOMINAL, 8 values {BALI, …}
STATUS_RUMAH | Customer's housing status | {RUMAH SENDIRI (own house), RUMAH SEWA (rented house), RUMAH ORANG TUA/KELUARGA (parents'/family house)}
JENIS_PENJUALAN_STNK | Sale type when the invoice is issued (i.e., actually sold) | {CASH, CREDIT}
JENIS_PENJUALAN_SSU | Sale type at the time of the deal; may change during the transaction | {CASH, CREDIT}
NAMA_LEASING_COMPANY | Name of the leasing company handling the customer's installments | TEXT
BESAR_DP | Down payment amount paid by the customer | TEXT
BESAR_CICILAN | Monthly installment amount | NUMERIC
LAMA_CICILAN | Installment term until paid off (months) | NUMERIC
AGAMA | Customer religion | {HINDU, KRISTEN (Protestant), ISLAM, KATOLIK (Catholic), LAIN-LAIN (other), BUDHA (Buddhist)}
PEKERJAAN | Customer occupation | NOMINAL, 16 values {PEGAWAI SWASTA (private-sector employee), …}
PENGELUARAN | Customer's monthly expenditure (category) | {1,2,3,4,5,6,7}
PENDIDIKAN | Customer's highest education level | {SLTA/SMU (senior high school), AKADEMI/DIPLOMA (academy/diploma), TIDAK TAMAT SD (did not finish primary school), SD (primary school), SLTP/SMP (junior high school), SARJANA (bachelor's degree), PASCA SARJANA (postgraduate)}
NO_HP | Customer mobile phone number | TEXT
STATUS_NOMOR_HP | Customer's mobile SIM card type | {PRABAYAR (prepaid), PASCABAYAR (postpaid)}
NO_TLP | Customer telephone number | TEXT
KEBERSEDIAAN DIHUBUNGI | Customer's willingness to be contacted again in the future | {YES, NO}
MERK_MOTOR_SBLMNYA | Brand of the customer's previous motorcycle | {HONDA, YAMAHA, SUZUKI, BELUM PERNAH MEMILIKI (never owned one), KAWASAKI, MOTOR LAIN (other brand)}
TYPE_MOTOR_SBLMNYA | Type of the customer's previous motorcycle | {AT AUTOMATIC, CUB BEBEK (cub/underbone), SPORT, BELUM PERNAH MEMILIKI (never owned one)}
SMH_DIGUNAKAN_UNTUK | Purpose of buying the motorcycle | {LAIN-LAIN (other), KEBUTUHAN KELUARGA (family needs), KE SEKOLAH/ KE KAMPUS (to school/campus), BERDAGANG (trading), PEMAKAIAN JARAK DEKAT (short-distance use), REKREASI / OLAH RAGA (recreation/sport), BEKERJA (work)}
YG_MENGGUNAKAN_SMH | Who will ride the purchased motorcycle | {ANAK (child), LAIN-LAIN (other), PASANGAN SUAMI ATAU ISTRI (spouse), SAYA SENDIRI (myself)}
MD | Code of the Main Dealer overseeing the dealer where the customer bought the Honda motorcycle | {N01}
DEALER_CODE | Code of the dealer where the customer bought the Honda motorcycle | NOMINAL, 77 values {06877, …}
KODE_SALES_PERSON | Code of the salesperson who sold the Honda motorcycle to the customer | NOMINAL, 1718 values {218595, …}
TGL_MASUK_DATA | Date the record entered AHM from the MD | DATE (YYYY-MM-DD HH:MM:SS)
STATUS_VALIDASI | Validation flag from the MD indicating whether the corresponding CDB data row has been verified as correct | {1,2}
UPLOADED_ON | Date the record entered AWO from AHM | DATE (YYYY-MM-DD HH:MM:SS)
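Several fields in the data dictionary need parsing before modeling. The sketch below derives the customer's age at purchase from TANGGAL_LAHIR (MM/YYYY) and SALES_DATE (YYYY-MM-DD HH:MM:SS), using the formats given in the table. Because BESAR_DP is stored as TEXT, the sketch also includes a hypothetical `parse_amount` helper that assumes the amount uses non-digit thousands separators (such as "1.500.000") and has no decimal part. The real text format of BESAR_DP is not documented here.

```python
from datetime import datetime


def age_at_purchase(tanggal_lahir, sales_date):
    """Whole years between TANGGAL_LAHIR (MM/YYYY) and SALES_DATE (YYYY-MM-DD HH:MM:SS).

    Only month and year of birth are recorded, so age is computed at month resolution.
    """
    born = datetime.strptime(tanggal_lahir, "%m/%Y")
    sold = datetime.strptime(sales_date, "%Y-%m-%d %H:%M:%S")
    months = (sold.year - born.year) * 12 + (sold.month - born.month)
    return months // 12


def parse_amount(text):
    """Hypothetical parser for BESAR_DP (stored as TEXT).

    Assumes separators such as "." or "," are thousands separators, not decimals;
    returns None when the text contains no digits.
    """
    digits = "".join(ch for ch in text if ch.isdigit())
    return int(digits) if digits else None


print(age_at_purchase("08/1990", "2016-03-15 10:20:00"))
print(parse_amount("1.500.000"))
```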