Objective - to analyse the data to identify items based on the transaction history of customers.
Identify patterns of relationships in customer data using association rules.
The importance of this type of research in the telecom market is to help companies make more profit.
Predicting churn has become known as one of the most important sources of income for telecom companies.
Hence, this research aimed to build a system that predicts customer churn in a telecom company.
These prediction models need to achieve high AUC values. To train and test the model, the sample data is divided into 70% for training and 30% for testing.
A Market Basket Analysis of bakery shop data using the Apriori algorithm and association rule mining. Application and Benefits of Market Basket Analytics in Retail Management
Busy consumers who want home-cooked healthy meals but don't have the time to shop are becoming more attracted to the convenience of meal kit delivery services.
Consumers who subscribe to these services typically receive a box containing fresh, prepared ingredients for one or more meals and the corresponding recipes. They usually receive them once a week and some even have same-day delivery.
As the meal kit sector continues to grow, so does demand for supply chain flexibility and meal kit delivery services.
In order to succeed, subscription meal kit companies not only need to offer good, quality meals and an intuitive user interface, but also must have excellent supply chain management skills.
Many services promise some mix of local, fresh, reduced-calorie, gluten-free, or organic products and may provide hard-to-find items.
Because of the growth of food home deliveries, the distance between stops has been reduced and the number of items delivered to each stop has been increased. This lowers the carrier’s cost resulting in overall lower shipping costs.
The attention to fresh ingredients is often ideal for consumers looking for healthy, tasty meal alternatives with many delivery options. Local farmers also benefit from the increased ongoing demand for their perishable foods.
BigInsights and Text Analytics.
As enterprises seek to gain operational efficiencies and competitive advantage through greater use of analytics, much of the new information they need to analyze is found in text documents and, increasingly, in a wide variety of social media sites and portals. A critical step in gaining insights from this information is extracting core data from huge volumes of text. That data is then available for downstream analytic, mining and machine learning tools. AQL (Annotator Query Language) is a powerful declarative, rule-based language for the extraction of information from text documents.
Customer segmentation is a Project on Machine learning that is developed by using Clustering & clustering is the technique that comes under unsupervised learning of machine learning.
Segmentation groups prospects based on their wants and needs. It helps identify the most valuable customer segments, on the basis of which vendors can improve their return on marketing investment by targeting only those likely to be their best customers.
Market Basket Analysis in SQL Server Machine Learning Services (Luca Zavarella)
Market Basket Analysis is a methodology that allows the identification of the relationships between a large number of products purchased by different consumers. It was born as a Data Mining technique to support cross-selling and shelf placement of products; but it is also used in medical diagnosis, in bioinformatics, in the analysis of society on the basis of personal data, etc. In this session we will see how the new Machine Learning Services allow us to derive insights from this analysis directly in SQL Server, using the programming language R.
Instacart was founded in 2012 by a former Amazon engineer who realized there was a gap between what an online ordering and delivery service was supposed to be and what he was experiencing from grocery delivery services while living in San Francisco. Read more: http://www.infigic.com/instacart-business-model-revenue-how-instacart-works/
Three case studies deploying cluster analysis (Greg Makowski)
Three case studies are discussed that include cluster analysis as a component.
1) Customer description for a credit card attrition model, to describe how to talk to customers.
2) Hotel price optimization. Use clusters to find subsets of similar behavior, and optimize prices within each cluster. Use a neural net as the objective function.
3) Retail supply chain, planning replenishment using 52 week demand curves using thousands of seasonal "profiles" or clusters.
How DoorDash Works - Insights into Business Model (OyeLabs)
DoorDash is an American on-demand prepared-food delivery service founded in 2013 by four Stanford students: Tony Xu, Stanley Tang, Andy Fang, and Evan Moore. Since its inception, DoorDash has been on a roll.
I did this analysis using SAS on a dataset of 5,000 records. I used CART and logistic regression to build a predictive model that identifies customers who are likely to shift to a competitor's network.
Frequent pattern mining is an analytical technique used by businesses and accessible in some self-serve business intelligence solutions. The FP-Growth technique finds frequent patterns, associations, or causal structures in data sets held in various kinds of databases, such as relational databases, transactional databases, and other forms of data repositories.
Data Science - Part VI - Market Basket and Product Recommendation Engines (Derek Kane)
This lecture provides an overview of association analysis, which includes topics such as market basket analysis and product recommendation engines. The first practical example centers around analyzing supermarket retailer product receipts and the second example touches upon the use of the association rules in the political arena.
Predicting online user behaviour using deep learning algorithms (Armando Vieira)
We propose a robust classifier to predict buying intentions based on user behaviour within a large e-commerce website. In this work we compare traditional machine learning techniques with the most advanced deep learning approaches. We show that both Deep Belief Networks and Stacked Denoising Auto-Encoders achieved a substantial improvement by extracting features from high-dimensional data during the pre-training phase. They also prove more convenient for dealing with severe class imbalance.
Adjusting OpenMP PageRank : SHORT REPORT / NOTES (Subhajit Sahu)
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take advantage of a shared-memory system with multiple CPUs, each with multiple cores, to accelerate PageRank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. To take steps in this direction, experiments are conducted to implement PageRank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for PageRank in OpenMP mode (with multiple threads), while the hybrid approach runs certain primitives (namely sumAt and multiply) in sequential mode.
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... (Subhajit Sahu)
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It comes, however, with the precondition that the input graph contain no dead ends. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
Global Situational Awareness of A.I. and where it's headed (Vikram Sood)
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we're lucky, we'll be in an all-out race with the CCP; if we're unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
The Building Blocks of QuestDB, a Time Series Database (Javier Ramirez)
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read replicas, and faster batch ingestion.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... (John Andrews)
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
2. Table of Contents
• Scope and objectives
• Introduction
• Modelling process
Data extraction
Data cleansing
• Association analysis
• Conclusion
3. Objective & Scope
Objective
• Our main objective was to analyze our data to identify items based on the transaction history of customers.
• Identify patterns of relationships in customer data using association rules.
Scope
• Association Rules
• Tools used: RStudio, Microsoft Excel
4. What is Instacart?
• Online grocery ordering app and store.
• Aims to deliver groceries in an hour.
5. Modelling Process
– Data Extraction
Data is extracted from Kaggle. This is anonymized data on customer orders over time.
6. - Data Cleaning
The data is naturally unstructured, so data cleaning (or cleansing, scrubbing) is important for further analysis. In the Orders data, days_since_prior_order contains some missing values, so we first replace all missing values with the mode of the column.
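The mode-imputation step described above can be sketched in a few lines. This is an illustrative Python sketch, not the deck's actual code (the deck used RStudio), and the values are invented:

```python
from collections import Counter

# days_since_prior_order with missing entries (None), invented for illustration
days = [7.0, None, 30.0, 7.0, None]

# Find the mode of the observed (non-missing) values
observed = [d for d in days if d is not None]
mode_value = Counter(observed).most_common(1)[0][0]

# Replace every missing value with the mode
cleaned = [mode_value if d is None else d for d in days]
```

The same idea in R would use `which.max(table(x))` to find the mode before filling the NA entries.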
10. While most users have 8 products in their baskets, the average basket contains 10 products.
For determining the number of products in future baskets, the idea is to look at the purchase history of each user, get the average number of items in their baskets, and use this number to predict the number of items in future baskets.
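The prediction heuristic above, using a user's mean historical basket size as the predicted size of their next basket, can be sketched as follows. The user histories here are invented; the deck's own computation was done in R:

```python
# Per-user basket sizes from past orders (invented toy data)
history = {
    "user_1": [8, 10, 9],
    "user_2": [12, 14],
}

# Predicted size of the next basket = rounded mean of past basket sizes
predicted = {user: round(sum(sizes) / len(sizes)) for user, sizes in history.items()}
```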
11. Count and list of the 15 most popular products in baskets
12. Fresh veggies and fresh fruits are the items most often sold, by aisle.
So we conclude that fruit and veggie products have a high probability of being ordered by customers on their next purchase.
13. Milk and dairy products are the items most often reordered by customers.
So we conclude that milk/dairy products have a high probability of being ordered by customers on their next purchase.
14. Association Analysis:
Association identifies how the data items are associated with each other. Association rules are created by analyzing data patterns and using the criteria of support and confidence to identify the most important relationships.
15. Support and Confidence
Support
• Support measures the probability of a collection of items being bought together.
Confidence
• Confidence measures the likelihood that if a customer buys product A, they will also buy product B, written A => B. The confidence of A => B can be estimated as the frequency with which someone buys both A and B divided by the probability that they buy A.
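The two definitions above can be written directly as code. This is a minimal sketch over invented toy baskets (the deck computed these metrics in R on the Instacart data):

```python
# Toy transactions, each a set of items in one basket (invented for illustration)
transactions = [
    {"milk", "bread"},
    {"milk", "eggs"},
    {"bread", "eggs"},
    {"milk", "bread", "eggs"},
]

def support(itemset, transactions):
    """Fraction of baskets that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) = support(A and B) / support(A)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)
```

For example, `support({"milk", "bread"}, transactions)` is 0.5 (two of four baskets), and `confidence({"milk"}, {"bread"}, transactions)` is 0.5 / 0.75, about 0.67.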
16. Rule 1: Low Support and High Confidence
Support = 0.003269976
Confidence = 0.01
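Rules such as Rule 1 are kept or discarded by comparing their support and confidence against thresholds. A brute-force sketch of this filtering over single-item pairwise rules, with invented baskets and thresholds (the deck mined its rules in R, and real miners like Apriori handle larger itemsets efficiently):

```python
from itertools import combinations

# Invented toy baskets and thresholds for illustration
transactions = [
    {"milk", "bread"},
    {"milk", "eggs"},
    {"bread", "eggs"},
    {"milk", "bread", "eggs"},
]
min_support, min_confidence = 0.3, 0.6

def support(itemset):
    """Fraction of baskets containing every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

# Enumerate directed single-item rules A => B and keep those above both thresholds
rules = []
for a, b in combinations(sorted(set().union(*transactions)), 2):
    for ant, cons in (({a}, {b}), ({b}, {a})):
        s = support(ant | cons)
        c = s / support(ant)
        if s >= min_support and c >= min_confidence:
            rules.append((tuple(ant)[0], tuple(cons)[0], s, c))
```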
22. Conclusion
Using the association rules (rules 1-3), the next purchase of a customer can be predicted based on their purchase history. Rules can be refined further based on combinations of support and confidence. Using the Jaccard index, the affinity between different item combinations can be calculated, which would help in predicting a customer's next purchase.
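The Jaccard-index affinity mentioned in the conclusion can be computed directly from two item sets; the baskets below are invented for illustration:

```python
def jaccard(a, b):
    """Jaccard affinity between two item sets: |A & B| / |A | B|."""
    return len(a & b) / len(a | b)

# Two invented baskets sharing two of four distinct items
affinity = jaccard({"milk", "bread", "eggs"}, {"milk", "bread", "butter"})
```

An affinity near 1 indicates item combinations that frequently co-occur; near 0 indicates little overlap.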