The presentation describes what business goals can be driven by Recommender Systems, how to estimate their economic impact, and how to determine when to start spending resources on RS.
Managing Supplier Performance with Advanced Analytics - Dan Traub
2017 NAEP Annual Meeting presentation by Dan Traub of FinVantage Solutions LLC and Judy Smith of HHMI. Reno, NV.
The Howard Hughes Medical Institute has implemented an innovative program based on advanced analytics which is elevating its strategic supplier management efforts. Using data from multiple sources, including the latest Consumer Price Index, HHMI created a supplier scorecard that presents metrics on spend channels, top customers, order trends, freight percentage, and more. An in-depth pricing analysis measures pricing variation and computes a market basket for comparison to the CPI. Suppliers also receive a view into their relative market share against competitors. Attendees will learn the techniques involved and hear anecdotes of suppliers’ reactions to the new program.
How to Perform Churn Analysis for Your Mobile Application? - Tatvic Analytics
For any mobile application marketer, acquiring new customers requires more time and money than retaining existing ones. A firm can instead focus on maintaining its existing customer base and getting the most out of it. In that case, predictive analysis is the right approach.
The primary goal of this webinar is to predict the segments of mobile application users who:
* will uninstall the app
* will remain inactive for a long time (also termed churners) and are expected to churn
Churn analysis is the approach by which we predict the likelihood of this event occurring.
Our webinar covers:
* How to extract data from Google Analytics using R
* How to build a churn model in R (a minimal model sketch follows the link below)
* How to identify the customer/subscriber segments, classified from past data patterns, that are likely to churn (studying customer behavior patterns)
Watch Full Webinar - http://www.tatvic.com/webinar/churn-analysis-for-mobile-application/
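The webinar builds its model in R on data pulled from Google Analytics; as a rough, language-shifted sketch of the same churn-classification step, here is a minimal Python example using scikit-learn. Every column name and value below is fabricated for illustration and stands in for whatever behavioural features you export from your analytics tool.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Toy behavioural features, one row per user; in the webinar these come
# from Google Analytics (via R). All names and values are fabricated.
df = pd.DataFrame({
    "sessions_last_30d":    [12, 0, 5, 1, 20, 2, 0, 8, 15, 1, 3, 0],
    "days_since_last_open": [1, 45, 10, 30, 2, 25, 60, 4, 1, 40, 12, 55],
    "avg_session_seconds":  [300, 20, 120, 40, 400, 35, 10, 200, 350, 30, 90, 15],
    "churned":              [0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1],
})

X, y = df.drop(columns="churned"), df["churned"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Likelihood of churn per user, as described above.
print(model.predict_proba(X_test)[:, 1])
print(classification_report(y_test, model.predict(X_test)))
```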
SPARK 2016: Why Your Social Content is Killing Your Bottom Line - TrackMaven
See Noah Lomax's presentation from Spark 2016, TrackMaven's annual digital marketing summit focused on the intersection of marketing art and science. Learn more at spark.trackmaven.com
We’ve all heard it: “Organic reach is dead.” “There’s no point in posting if you aren’t using paid.” And for what? A goal of 4% engagement rates? With all of the data and insights at our fingertips, isn’t there another way? Turns out there is. Join us for a quick look at how some brands are reaching more than 60% of their fan base with engagement rates >30% while spending little or nothing.
QuestionPro Audience Webinar - How to Improve Data Quality For Your Research - QuestionPro
With millions of people interacting online, it only makes sense for market researchers to conduct research with online communities. The internet allows increased access to a wide range of diverse audiences, but gathering quality data from these participants can be challenging. This webinar will help you understand how to improve your data collection practices through better survey design methodology, tips to avoid response bias, variations in question styles, and optimal data analysis. John Barrett, CEO of Priority Metrics Group, will share strategies regarding design methodologies that will keep your respondents engaged. With over 20 years' experience working with Fortune 500 companies, John Barrett knows how to optimize results.
NCompass Live - 2/8/17
http://nlc.nebraska.gov/ncompasslive/
How do researchers engage with special collections? Over the past three years, the Society of American Archivists and the Association of College and Research Libraries have been teaming up to create, for the first time, a statistical standard that enables archival repositories and special collections libraries to assess the services they provide their users according to common definitions. This webinar will provide you with an opportunity to learn about the standard and to give the task force charged with its development vital feedback on the final stages of its drafting.
Christian Dupont (Boston College), Emilie Hardman (Harvard University), and Amy Schindler (University of Nebraska at Omaha) will review the task force’s most recent draft, which has benefited from important community feedback over the past few months. The task force is currently seeking comments on version 2 of the proposed new standard for archival repositories and special collections libraries.
This webinar will provide an overview of the current work to rewrite the techniques for electronic resource management to incorporate open access workflow management. It will provide insight into the key areas under exploration and summarize the feedback compiled from the two interactive sessions held at the UKSG Annual Conference. We will also describe the next steps for sharing the development of this project.
Manuscript editing | Research data analyst | Data analysis - Pubrica
Big data analytics is significant because it allows businesses to use their data to uncover areas for improvement and optimization. Increased efficiency leads to smarter operations, higher earnings, and satisfied customers across all business sectors.
Read more @ https://pubrica.com/academy/manuscript-writing/how-to-prepare-a-manuscript-on-big-data-analytics/
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
In this talk I describe two approaches for improving the recall and precision of an enterprise search engine using machine learning techniques. The main focus is improving relevancy with ML while using your existing search stack, be that Lucene, Solr, Elasticsearch, Endeca, or something else.
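One common technique in this spirit, learning related terms from your own corpus and using them to expand queries before they reach the existing search engine, can be sketched as follows. This is an illustrative sketch, not necessarily the talk's exact method; the toy corpus, the top-k cut-off, and the OR-joined query syntax are all assumptions.

```python
# Learn term-term similarities from the corpus (terms that occur in
# similar documents score high), then expand queries with related terms
# before sending them to the existing search engine.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "java developer spring hibernate microservices",
    "python developer django flask data pipelines",
    "machine learning engineer python tensorflow",
]

vectorizer = TfidfVectorizer()
doc_term = vectorizer.fit_transform(corpus)
term_sim = cosine_similarity(doc_term.T)       # terms x terms similarity
terms = vectorizer.get_feature_names_out()
index = {t: i for i, t in enumerate(terms)}

def expand_query(query, top_k=2):
    """Append the most related corpus terms to each query token."""
    expanded = list(query.split())
    for token in query.split():
        if token not in index:
            continue
        scores = term_sim[index[token]]
        related = sorted(range(len(terms)), key=lambda j: -scores[j])
        expanded += [terms[j] for j in related[1:top_k + 1]]
    return " OR ".join(dict.fromkeys(expanded))  # de-dupe, keep order

print(expand_query("python developer"))
```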
An in-depth presentation on big data analysis and its application in the advertising industry to reach the maximum number of optimal customers.
Tags: big data analytics, advertising and big data, big data in advertising
Recommendations are everywhere: music, movies, books, social media, e-commerce websites… The Web is leaving the era of search and entering one of discovery. This quick introduction will help you understand this vast topic and why you should use it.
*Updated and reorganised following feedback in the breakouts*
While many librarians have developed mechanisms and structures for managing local scholarship separate from their standard resource management practices, the intersection of the two content streams is occurring at many institutions. During the past decade the presenters have dedicated themselves to capturing best practices of electronic resource management and mapping out paths for creating open access workflows. Join them for a lively discussion and interactive session where they outline ways to bring these two initiatives together and identify the teams needed.
Graham Stone
Jisc Collections
Peter McCracken
Cornell University
Jill Emery
Portland State University Library
The future for performance management, quality and true continuous improvement for local council planning services. Uses much of the data that councils already send to government, supplements it with some new approaches to customer and quality feedback, and brings it all together in one tidy, holistic report.
PCG's Emilie Delquie's presentation to the ASA Conference on 28th February 2012 on why the best way of predicting the future is inventing it. New roles for the publishing intermediary.
GPT and other Text Transformers: Black Swans and Stochastic Parrots - Konstantin Savenkov
Over the last year, we have seen increasingly performant text transformer models, such as GPT-3 from OpenAI, Turing from Microsoft, and T5 from Google. They are capable of transforming text in very creative and unexpected ways, like generating a summary of an article, explaining complex concepts in simple language, or synthesizing realistic datasets for AI training. Unlike more traditional machine learning models, they do not require vast training datasets and can start from just a few examples.
In this talk, we give a short overview of such models, share the first experimental results, and ask questions about the future of the content creation process. Are these models ready for prime time? What will happen to professional content creators? Will they be able to compete against such powerful models? Will we see GPT post-editing similar to MT post-editing? We share some answers based on extensive experimentation and the first production projects that employ this new technology.
Dodging AI biases in future-proof Machine Translation solutions - Konstantin Savenkov
We all want to act locally while going global, and maintain an inclusive multilingual work environment for the international workforce. Every AI model has its linguistic, cultural, and geopolitical biases. Besides providing better linguistic quality for specific languages and domains, a particular Machine Translation system may not be fully compliant with local dialect, tone of voice, gender, and data locality rules. In this talk, we consider practical cases when those biases create obstacles in building a global presence and an inclusive multilingual work environment for an international company. We discuss how to dodge those biases by using multi-vendor international AI, and in some cases go further, by leveraging those biases to create more diverse and inclusive translations.
This presentation covers our approach to building multi-purpose MT deployments. We talk about different enterprise use-cases for MT and the requirements of those use-cases. Since those requirements often have nothing to do with the objective linguistic quality, sometimes you don't want to select a specific MT engine just to meet them. Therefore, we provide some examples of how it's possible to fulfill those requirements by building NLP on top of your favorite Machine Translation black box.
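A toy illustration of the "NLP on top of an MT black box" idea, not Intento's actual implementation: protect do-not-translate terms by masking them before the engine call and restoring them afterwards. The translate() stub and the glossary list are assumptions.

```python
# Layering NLP on top of a black-box MT engine: mask do-not-translate
# terms (brand names, product SKUs), translate, then restore the terms.
def translate(text, target_lang):
    # Placeholder standing in for whatever MT API you actually call.
    return text

DO_NOT_TRANSLATE = ["Bookmate", "Intento"]   # assumed glossary

def translate_with_glossary(text, target_lang):
    masks = {}
    for i, term in enumerate(DO_NOT_TRANSLATE):
        token = f"__DNT{i}__"
        if term in text:
            masks[token] = term
            text = text.replace(term, token)   # hide term from the engine
    translated = translate(text, target_lang)
    for token, term in masks.items():
        translated = translated.replace(token, term)  # restore term
    return translated

print(translate_with_glossary("Bookmate recommends new books daily.", "de"))
```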
Talk at Stanford HAI Workshop on "Measurement in AI Policy: Opportunities and Challenges", October 30, 2019, Stanford, USA
When we procure Machine Translation vendors for the multi-vendor MT solutions we build for enterprises, we run a lot of MT evaluation projects. We evaluate commercial MT systems on public and private data to find the best system for a specific language pair and domain. These evaluations are quite different from what you see in WMT benchmarks, as we evaluate commercial systems, which are optimized for economic efficiency and real-time performance.
Talk by Konstantin Savenkov (Intento, Inc.) at Developer Week 2019 (Seattle: Cloud Edition).
There are already hundreds of AI functions available via different APIs. Pick machine translation, sentiment analysis, image tagging, or anything else, and there is already a choice of 20-30 AI vendors. I will give a brief overview of what types of models are already available in the cloud, which of them enable customization, and important things to look out for when selecting the model (or a set of them) for a specific project. I will also touch on the AI API developer experience, to give an idea of what a developer should be prepared for when choosing an API to work with.
https://devweeksea2019.sched.com/event/OGSp/pro-talk-cloud-ai-landscape
State of the Machine Translation by Intento (stock engines, Jun 2019) - Konstantin Savenkov
Evaluation of 25 major Cloud Machine Translation Services with Stock (pre-trained) models (Alibaba, Amazon, Baidu, CloutTranslate, DeepL, Google Translate, GTCom Yeecloud, IBM Watson v3, Microsoft Text Translator v3, ModernMT, Naver Papago, Niutrans, PROMT, SAP Translation Hub, SDL Language Cloud and BeGlobal, Systran SMT and PNMT, Sogou, Tencent, Tilde, Yandex, Youdao) for 48 language pairs: pricing, performance, quality, and language coverage. We also analyze how the MT landscape changed over the last year.
State of the Machine Translation by Intento (stock engines, Jan 2019) - Konstantin Savenkov
Evaluation of 23 major Cloud Machine Translation Services with Stock (pre-trained) models (Alibaba, Amazon, Baidu, DeepL, Google Translate, GTCom Yeecloud, IBM Watson v3, Microsoft Text Translator v3, ModernMT, Naver Papago, Niutrans, PROMT, SAP Translation Hub, SDL Language Cloud and BeGlobal, Systran SMT and PNMT, Sogou, Tencent, Yandex, Youdao) for 48 language pairs: pricing, performance, quality, and language coverage. We also analyze how the MT landscape changed over the last year.
State of the Domain-Adaptive Machine Translation by Intento (November 2018) - Konstantin Savenkov
In this report, we have evaluated 6 modern domain-adaptive NMT engines on Biomedical dataset (English to German). ModernMT, Globalese, Google AutoML, IBM Custom NMT, Microsoft Custom Translate, and Tilde. We explored how they compare by performance (using reference-based scores, linguistic quality analysis and automatic quality estimation), total cost of ownership, dataset size requirements, training time, data protection policy and how to start using this advanced technology.
EVALUATION IN USE: NAVIGATING THE MT ENGINE LANDSCAPE WITH THE INTENTO EVALUA... - Konstantin Savenkov
We discuss the importance of evaluating pre-built and customizable MT engines against different goals in Post-Edited Machine Translation (PEMT) and raw MT settings, as well as different approaches to those evaluations. We cover the main pitfalls on the path to choosing the right MT engine and possible workarounds. The primary focus is on reference-based assessment and how we run it at Intento (a minimal scoring sketch follows the event details below).
School of Advanced Technologies for Translators
Friday 14 and Saturday 15 September 2018 - Milano (Italy)
https://satt2018.fbk.eu/
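As a minimal example of the reference-based assessment mentioned above (illustrative only; a production evaluation pipeline is richer), the sacrebleu Python library can score hypotheses against references:

```python
# Reference-based MT scoring with sacrebleu (pip install sacrebleu).
# The sentences are illustrative; a real evaluation would use a held-out
# test set per language pair and domain.
import sacrebleu

hypotheses = ["The cat sits on the mat.", "He read the book quickly."]
# One reference stream, aligned one-to-one with the hypotheses.
references = [["The cat is sitting on the mat.", "He quickly read the book."]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
print(f"BLEU: {bleu.score:.1f}  chrF: {chrf.score:.1f}")
```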
Improving the Demand Side of the AI Economy (API World 2018) - Konstantin Savenkov
Training AI in-house is often infeasible, as it requires a critical mass of talent and data and carries high R&D risks. For cognitive AI, like machine translation and speech recognition, hundreds of pre-trained and adaptive models are already available on the market via APIs from many vendors. Their performance varies case by case and changes often. Their prices differ by 100x-200x, hence a wrong choice may be a complete miss.
In this talk, we argue that the only way to go is to evaluate and continuously optimize AI vendor portfolio and introduce our vendor-agnostic demand-side API platform for AI.
Evaluation of 19 major Cloud Machine Translation Engines (Alibaba, Amazon, Baidu, DeepL, Google, GTCom, IBM SMT and NMT, Microsoft SMT and NMT, ModernMT, PROMT, SAP, SDL Language Cloud, Systran SMT and PNMT, Tencent, Yandex, Youdao) for 48 language pairs: pricing, performance, quality, and language coverage. We also analyze how the MT landscape changed over the last year.
In this survey, we compare features, language support, and pricing for 15 vendors of Sentiment Analysis. We consider only hosted services with a public API: several algorithms on the Algorithmia marketplace, Microsoft Text Analytics, Repustate, Google Cloud Natural Language, IBM Watson NLU, MeaningCloud, TheySay PreCeive, AWS Comprehend, Aylien, BosonNLP, Salesforce Einstein Language, and Twinword.
Evaluation of 14 major Cloud Machine Translation Engines (Google, Microsoft, IBM, IBM NMT, SAP, Amazon, Yandex, SDL, Systran, Systran PNMT, Baidu, GTCom, PROMT, DeepL) for 48 language pairs: performance, quality, language coverage, API update frequency.
State of the Machine Translation by Intento (November 2017) - Konstantin Savenkov
Evaluation of 11 major Machine Translation (Google, Microsoft, IBM, SAP, Yandex, SDL, Systran, Baidu, GTCom, PROMT, DeepL) providers for 35 most popular language pairs: performance, quality, language coverage, API update frequency.
We have evaluated intent prediction performance, false positives, learning rate, language coverage, response time and pricing for 7 NLU providers: Amazon Lex, Facebook’s wit.ai, IBM Watson Conversation, Google’s API.ai, Microsoft LUIS, Recast.ai, Snips.ai
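A hedged sketch of how such results can be scored once provider responses are collected; the providers and outputs below are fabricated, and a real run would call each vendor's API:

```python
# Score intent predictions from several NLU providers against a labelled
# test set: overall accuracy plus false positives (firing an intent on an
# out-of-scope phrase).
from sklearn.metrics import accuracy_score

true_intents = ["book_flight", "cancel", "book_flight", "none"]
provider_outputs = {                      # fabricated for illustration
    "provider_a": ["book_flight", "cancel", "cancel", "none"],
    "provider_b": ["book_flight", "cancel", "book_flight", "book_flight"],
}

for name, predicted in provider_outputs.items():
    acc = accuracy_score(true_intents, predicted)
    fp = sum(1 for t, p in zip(true_intents, predicted)
             if t == "none" and p != "none")
    print(f"{name}: accuracy={acc:.2f}, false positives={fp}")
```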
Quantitative Data Analysis: Reliability Analysis (Cronbach Alpha), Common Method...
Quantitative Data Analysis
Overview:
Reliability Analysis (Cronbach's Alpha)
Common Method Bias (Harman Single Factor Test)
Frequency Analysis (Demographics)
Descriptive Analysis
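A small Python sketch of the first two checks in this outline, using the standard formulas; the survey responses are fabricated for illustration:

```python
import pandas as pd
from sklearn.decomposition import PCA

items = pd.DataFrame({          # respondents x Likert-scale items
    "q1": [4, 5, 3, 4, 5, 2],
    "q2": [4, 4, 3, 5, 5, 2],
    "q3": [5, 5, 2, 4, 4, 3],
})

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance)
k = items.shape[1]
item_var = items.var(axis=0, ddof=1).sum()
total_var = items.sum(axis=1).var(ddof=1)
alpha = k / (k - 1) * (1 - item_var / total_var)
print(f"Cronbach's alpha: {alpha:.3f}")

# Harman's single-factor test: if one factor explains most (>50%) of the
# variance, common method bias is a concern.
standardized = (items - items.mean()) / items.std(ddof=1)
pca = PCA(n_components=1).fit(standardized)
print(f"Variance explained by first factor: "
      f"{pca.explained_variance_ratio_[0]:.1%}")
```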
Techniques to optimize the PageRank algorithm usually fall into two categories. One tries to reduce the work per iteration, and the other tries to reduce the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance; final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
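A compact sketch of the first optimization mentioned above, skipping rank computation for vertices whose rank has already converged, on a toy graph in plain Python. This is a didactic sketch, not the STICD implementation:

```python
# Power-iteration PageRank that stops updating vertices once their rank
# change drops below a tolerance. Marking a vertex converged is a
# heuristic: its in-neighbours may still change slightly afterwards,
# which is the trade-off this optimization accepts.
damping, tol = 0.85, 1e-9
graph = {0: [1, 2], 1: [2], 2: [0], 3: [2, 0]}   # vertex -> out-links
n = len(graph)
in_links = {v: [u for u in graph for w in graph[u] if w == v] for v in graph}

rank = {v: 1.0 / n for v in graph}
converged = set()
while len(converged) < n:
    for v in graph:
        if v in converged:
            continue                     # skip already-converged vertices
        new = (1 - damping) / n + damping * sum(
            rank[u] / len(graph[u]) for u in in_links[v])
        if abs(new - rank[v]) < tol:
            converged.add(v)
        rank[v] = new
print(rank)
```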
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... - pchutichetpong
M Capital Group (“MCG”) expects demand to keep evolving alongside supply, shaped by institutional investment rotating out of offices and into work-from-home (“WFH”) assets, while the need for data storage keeps expanding as global internet usage grows, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
The Building Blocks of QuestDB, a Time Series Database - Javier Ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read replicas, and faster batch ingestion.
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... - Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
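A single-machine sketch of the levelwise idea using networkx: build the condensation (the DAG of strongly connected components) and rank one component at a time in topological order. It assumes no dangling nodes, matching the precondition stated in the abstract, and is far simpler than the CPU/GPU implementations compared in the report:

```python
import networkx as nx

def levelwise_pagerank(G, d=0.85, tol=1e-12):
    n = G.number_of_nodes()
    rank = {v: 1.0 / n for v in G}
    dag = nx.condensation(G)                 # DAG of SCCs
    for scc in nx.topological_sort(dag):
        comp = dag.nodes[scc]["members"]
        # Contributions from outside the component are already final.
        external = {v: sum(rank[u] / G.out_degree(u)
                           for u in G.predecessors(v) if u not in comp)
                    for v in comp}
        while True:                          # iterate within the component
            error = 0.0
            for v in comp:
                new = (1 - d) / n + d * (external[v] + sum(
                    rank[u] / G.out_degree(u)
                    for u in G.predecessors(v) if u in comp))
                error += abs(new - rank[v])
                rank[v] = new
            if error < tol:
                break
    return rank

# Toy graph with two SCCs and no dangling nodes.
G = nx.DiGraph([(0, 1), (1, 0), (1, 2), (2, 3), (3, 2)])
print(levelwise_pagerank(G))
```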
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers present on related topics such as vector databases, LLMs, and managing data at scale. The intended audience includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
Adjusting OpenMP PageRank : SHORT REPORT / NOTES - Subhajit Sahu
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take advantage of a shared-memory system with multiple CPUs, each with multiple cores, to accelerate PageRank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. To take steps in this direction, experiments are conducted to implement PageRank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for PageRank in OpenMP mode (with multiple threads). The hybrid approach, on the other hand, runs certain primitives in sequential mode (i.e., sumAt, multiply).
Adjusting primitives for graph : SHORT REPORT / NOTES - Subhajit Sahu
Graph algorithms, like PageRank, typically operate on Compressed Sparse Row (CSR), an adjacency-list based graph representation (a small CSR sketch follows this list). The notes compare:
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
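A minimal illustration of the CSR layout referenced at the top of these notes; the toy edge list is arbitrary:

```python
# CSR stores all destination vertices in one flat array, plus per-vertex
# offsets into it, giving cache-friendly neighbour iteration.
import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 1)]   # toy graph
num_vertices = 4

edges.sort()                                        # group by source vertex
destinations = np.array([dst for _, dst in edges])
offsets = np.zeros(num_vertices + 1, dtype=int)
for src, _ in edges:
    offsets[src + 1] += 1                           # out-degree counts
offsets = np.cumsum(offsets)                        # prefix sums -> offsets

def neighbors(v):
    """Out-neighbours of v, read directly from the CSR arrays."""
    return destinations[offsets[v]:offsets[v + 1]]

for v in range(num_vertices):
    print(v, "->", list(neighbors(v)))
```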
1. The Economics of Recommender Systems
Konstantin Savenkov, COO at Bookmate
http://bookmate.com
2. Target audience
• RS enthusiasts, to get context they may otherwise lack
• B2C services and apps, to understand how many resources to spend on RS
• data scientists and evangelists, to sell your idea inside the company
• big data startups, to justify the business model and sell it to investors
• big data businesses, to set fair prices and convince potential customers
3.
4. Agenda
• Academy vs. industrial settings in RS
• Recommender Systems for content discovery
• Business model for a B2C content service
• Unit economics and underlying KPIs
• Driving business goals with RS:
– conversion
– retention
– catalogue exploitation
– reactivation
5. Different shades of RS: BASIC RESEARCH
Methods, e.g.: preserving locality during matrix factorisation; speeding up Gradient Descent using Alternating Least Squares
6. Different shades of RS: BASIC RESEARCH → APPLIED RESEARCH
Tools, e.g.: achieve better filtering of historical data; combine several methods to apply to a new domain and prove NDCG is better
7. Different shades of RS: BASIC RESEARCH → APPLIED RESEARCH → TECHNOLOGY TRANSFER
Using it in production, e.g.: pick a paper and reproduce the result on live users; achieve appropriate response time; combine offline and online model updates to simulate feedback on user actions
8. Different shades of RS: BASIC RESEARCH → APPLIED RESEARCH → TECHNOLOGY TRANSFER → BUSINESS
Does it push the needle? What are the benefits? How do we estimate them? How do we justify expenses on RS? When should we start spending resources? Should we invest in RS, in better UX, or in adding some social features?
9. Different shades of RS (build slide): the same BASIC RESEARCH → APPLIED RESEARCH → TECHNOLOGY TRANSFER → BUSINESS diagram and questions, now with callouts marking what "this course" and "this lecture" cover.
10. Academy vs. Tech vs. Business
A: How to improve performance by X%?
T: How hard is it to implement that?
B: When do the gains match the costs?
11. “It’s tempting, if the only tool you have is a hammer, to treat everything as a nail.”
Abraham Maslow, The Psychology of Science, 1966
* Despite the topic of the course, try to avoid the BigData bias
12. Setting scope #1: Content discovery
Importance of Recommender Systems for content discovery:
– hard to describe preferences in textual form (“I WANT TO READ SOMETHING…”)
– textual relevance doesn’t work well (EVEN FOR BOOKS!)
– preference elicitation (LOOKING FOR UNKNOWN UNKNOWNS)
– limited catalogue (REGIONAL SEGMENTATION)
13. User with a book problem
Search case
Recommendation case
14. RS in the Interface
• Any place in the interface where the number of objects to show exceeds the available space
• Most interfaces are list-based
• Hence, the order and size of the list can be defined by either a personalized or a non-personalized algorithm
• Explaining recommendations is a different topic
! There is no “no recommender system” setting. If there’s “just something” or “popularity sorted”, that’s your RS
17. Setting scope #2: B2C Content Service
• The user pays either a subscription, per download, or a hybrid
• The user has limited attention and time to share with the service
• Content may have different costs for the service
• Content itself is not a competitive advantage
• Helping the user select the proper content is a competitive advantage
18. Unit Economics
• Business at scale (marginal revenue and expenses per user)
(diagram: ARPU repeated over the user lifetime sums to LTV; PROFIT = LTV - CAC - cost of content)
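A back-of-the-envelope version of the diagram above; all figures are model assumptions, not Bookmate data:

```python
# Marginal unit economics per user: ARPU accumulated over the lifetime
# gives LTV, then acquisition and content costs come out of it.
arpu = 5.0              # $ per paying user per month (assumed)
lifetime_months = 9     # average paying lifetime (assumed)
cac = 5.0               # customer acquisition cost (assumed)
content_cost = 15.0     # content cost per user over the lifetime (assumed)

ltv = arpu * lifetime_months            # ARPU accumulated over lifetime
profit = ltv - cac - content_cost       # the diagram's PROFIT!
print(f"LTV = ${ltv:.2f}, profit per user = ${profit:.2f}")
```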
19. How the product works
• Each connection here is driven and improved by business activities
• The content itself fits into a sort of a BCG matrix
(diagram labels: GROWTH, COSTS, CAC)
22. Recommender Systems & KPIs
• Users mostly convert via content (paywall)
– content is responsible for up to a 10x difference in conversion
– recommending content to new users raises conversion
• Users need help discovering content during their lifetime
– recurrent reading achieves recurrent payments
– customized aid increases user loyalty
– recommending content to loyal users increases lifetime
• Long-tail content costs less
– recommending for diversity reduces costs
25. Setting scope #3: Lean formulation
1. We reduce resources wasted on everything that doesn’t push the needle.
2. There are no recipes at the start; all we can do is propose a hypothesis and experiment.
Conclusions:
• if there’s a proper place in the interface, you may apply RS and see the effect
• offline and online testing results often don’t correlate
NO ALL-INS AND LEAPS OF FAITH
26. RS for Conversion / CAC
• Hypotheses to prove:
1. There are enough users who will use the RS output
2. Their conversion will be above average
• A/B testing is the only way:
– different channels convert with up to a 20x difference
– the current traffic mix is unpredictable and hard to control in the case of app installs
• Do pilots:
– run with limited resources, then extrapolate and decide whether to run full-scale
27. RS for Conversion / CAC
• Two approaches to estimation:
1. increase in revenue from additionally converted users
2. decrease in CAC
• the same marketing spend attracts more customers due to the raised conversion, therefore CAC is reduced
• This suits estimating various models of RS costs:
– upfront costs (then the investment pays back)
– flat fee (monthly license or added headcount)
– variable costs (CPA or PaaS model)
28. Case Study (Bookmate / E-Contenta)
• New users get 3 books as a starter
– group A: editorial books (non-personalised)
– group B: personalized based on social profile (cold-start recommender) provided by the E-Contenta service
• Two steps in the funnel:
1. The user didn’t know what to read and used the RS
2. The user converted afterwards
• Straight to the results:
– step 1: 2.17x higher for RS; step 2: a bit lower
– overall, a 1.4x increase in conversion for such users (3 sigma)
• Sounds promising! Did 40% more users become converted?
• Not really, as only 7% of users didn’t know what to read to start with
29. Let’s look at the economics
• Assume we attract 1000 new customers/month, CAC = $5 (model data), and the conversion from traffic is X%
• A 1.4x increase in conversion for 7% of overall traffic results in a 1.028x increase in overall conversion
• That is, we get 28 more customers for the same $5000
• That’s equivalent to:
– reducing CAC by 14 cents
– reducing the marketing budget by $136/month
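The slide's arithmetic can be reproduced in a few lines; the only inputs are the model numbers quoted above:

```python
# A 1.4x conversion uplift that touches only 7% of traffic lifts overall
# conversion by 1 + 0.07 * 0.4 = 1.028x.
budget, cac = 5000.0, 5.0          # model data from the slide
customers = budget / cac           # 1000 new customers/month
uplift = 1 + 0.07 * (1.4 - 1)      # = 1.028

extra_customers = customers * (uplift - 1)   # ~28 extra customers
new_cac = budget / (customers * uplift)      # same spend, more customers
budget_saving = budget - budget / uplift     # same customers, less spend
print(f"+{extra_customers:.0f} customers, "
      f"CAC down {100 * (cac - new_cac):.0f} cents, "
      f"budget saving ${budget_saving:.0f}/month")
```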
30. Conclusions from the pilot
• When using a third-party RS on a CPA basis (payment per converted user), the CPA is capped at 14 cents per user
– actually, it should be less, as both sides should get benefits
• With a flat license fee of, say, $1000, this is economically efficient starting from 7143 new customers per month
– or a $35,000 monthly marketing budget
31. RS for Retention / LTV
• Hypotheses to prove:
1. The user pays as long as he finds what to read
2. There are enough users who will use the RS output
3. This channel has above-average discoverability
• Ideal experiment: A/B, then count the actual lifetime
– with a lifetime close to a year, that is too long to wait
• Solution:
– run separate A/B tests for different user cohorts (new, 1 month old, 2 months old, etc.)
– estimate the significant change in month-to-month retention for each cohort
32. Model case
• The recommender system led to an increase in month-to-month retention of 3% (fresh cohorts) down to 0.5% (old cohorts)*
(chart annotation: here’s the benefit; the area is equal to the number of ARPU gains)
* The numbers are not from the actual case and are provided to showcase estimations.
33. Let’s look at the economics
• The increase in month-to-month retention leads to an increase in user lifetime:
– group A: 9 months
– group B: 11.6 months
• That means a 29% increase in LTV
• This can be spent either on attracting more users with the same marginal earnings or on increasing profitability
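A quick check of these numbers, using the simplifying assumption that expected lifetime = 1 / (1 - monthly retention):

```python
# Lifetime grows from 9 to 11.6 months, so LTV (ARPU x lifetime) grows
# by the same factor.
lifetime_a, lifetime_b = 9.0, 11.6           # months, from the A/B groups
ltv_gain = lifetime_b / lifetime_a - 1
print(f"LTV increase: {ltv_gain:.0%}")       # ~29%

# Implied month-to-month retention under a geometric lifetime model.
retention_a = 1 - 1 / lifetime_a             # ~0.889
retention_b = 1 - 1 / lifetime_b             # ~0.914
print(f"implied retention: {retention_a:.3f} -> {retention_b:.3f}")
```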
34. If this is still too long…
• Older cohorts may have too few users to achieve statistical significance
• Proxy metrics may be estimated instead
– content discovery funnels: conversion of books from opened to read
– to use that, the hypothesis “more reads lead to increased retention” needs to be proven
35. RS for catalogue exploitation
• a complex case, as it affects both conversion and retention
• hypotheses to prove:
1. The recommender system can expose users to a content mix with a higher marginal profit
2. Conversion and retention stay the same, or the decrease in costs outweighs the decrease in conversion and retention
3. There are enough users who will use the RS output
36. Case Study
• A bit too big to roll out in a presentation
• OK, just a bit: adding a recommender system to the interface really drives users out of search
• as homework, you may estimate how good the RS should be at reducing costs to justify $1000/month in expenses
37. Wrapping up
• The proper business approach to Recommender Systems: run a pilot to estimate some numbers, then decide whether you have enough scale to afford the expenses
• The simplest recommender will probably get you 80% of the possible performance
– if it doesn’t, the problem is most likely not in the algorithm
• And again: it’s tempting, if the only tool you have is a hammer, to treat everything as a nail
38. Questions?
• Can you provide some data for my academic
research?
– Yes, probably!
• Do you have enough scale to hire me as a
Recommender Systems specialist?
– Most likely!
• May I ask some questions via email?
– Sure!
KS@BOOKMATE.COM