This presentation gives a high-level overview of recommender systems and active learning, covering the viewpoint of startups vs. established companies, the cold-start problem, and related topics.
Cities and Startups: Cultivating Deep Engagement - Code for America
Cities and Startups: Cultivating Deep Engagement
FastFWD, City of Philadelphia
Story Bellows, co-director of the Philadelphia Mayor's Office of New Urban Mechanics
Watch the video online: https://www.youtube.com/watch?v=PRKUCCHj-08&list=PL65XgbSILalVoej11T95Tc7D7-F1PdwHq&index=4
Get involved with Code for America: www.codeforamerica.org/action
In this presentation, we’ll go over real-world use cases of Machine Learning and Artificial Intelligence in web and mobile applications, and we’ll explain how they work. We’ll discuss opportunities for startups in all domains to create value from data (big or small) and to create innovative, predictive features in their applications.
We’ll review existing technologies that make Machine Learning accessible, in particular with automatic selection of algorithms, auto-tuning of parameters, and auto-scaling. Deep Learning (a subset of Machine Learning techniques which is getting a lot of press due to recent advances and successes) is also being made accessible without costly hardware and, in certain cases, without requiring specialized knowledge.
The main message for developers is that they can easily use the power of machine intelligence without having to rely on a team of Data Scientists. This will be illustrated in more detail with concrete use cases: priority detection and image categorization.
Ultra brief and ultra draft overview of investor's look at machine learning / deep learning startups by Victor Osyka of Almaz Capital, https://www.linkedin.com/in/victorosyka or http://fb.com/victor.osika
SUPERSMART LEARNING TOOLS for Lean Startups: Volume 1 - Six Question (Q) Temp... - Rod King, Ph.D.
Fast Validated Learning is at the core of the Lean Startup Method. However, learning and mastering the Lean Startup Method is a time-consuming, arduous, and expensive venture. The main reason is that Lean Startup tools are developed, learned, and applied using a Fragmented Learning approach. There is an exponential increase in the number of Lean Startup tools. However, Lean Startup tools hardly talk to each other; they do not share a register or common vocabulary of topics,
Question-tags are very powerful tools for organizing and managing ideas as well as tools in any methodology including the Lean Startup Method. In this presentation, six question-tags and basic templates are presented. These question-tags and templates can be used as the basic building blocks or "atoms" for creating tools ("molecules" and "compounds") for Universal Problem Solving & Project Management (UPSPM). In other words, the presented blank and annotated Question (Q)-Templates can be used for discovering, solving, and managing problems in every domain.
For Lean Startups, these Q-Templates are the basic tools for effectively and efficiently organizing and managing Lean Startup projects. These Q-Templates can be put together to function like any Lean Startup tool; for instance, Validation Board, Value Proposition Canvas, Business Model Canvas, and Lean Canvas. Also, all business tools can be deconstructed or decomposed using the Q-Templates.
Investors foresee a safe bet on deep tech startups - eTailing India
Indian deep technology start-ups have become the most sought after bets for angels and venture capital (VC) funds for their potential to scale up rapidly and be able to offer an opportunity for early exit for the investors.
Self-Service.AI - Pitch Competition for AI-Driven SaaS Startups - Datentreiber
SELF-SERVICE.AI IN A NUTSHELL
Background: Artificial intelligence enables SaaS companies to build intelligent self-service solutions for complex tasks such as customer service, personal scheduling, dynamic pricing, and ad targeting.
Objective: Provide a networking platform for AI-driven SaaS start-ups to present their product and team to high-profile clients, partners, and investors by organizing a start-up pitch competition.
Audience: Start-ups from any country, at any stage, with a Software-as-a-Service product that uses artificial intelligence (e.g., machine and deep learning, predictive and prescriptive analytics) to provide a self-service solution for companies or consumers that solves a concrete business problem or serves a certain need.
Examples: Existing AI-driven SaaS startups include Clarifai, x.ai, Api.ai, Versium, Gpredictive, collectAI, trbo, DigitalGenius, DataMinr, and many more to come.
Investor's view on machine intelligence startups, 2.0, Jan 2017 - Victor Osyka
Updated deeper overview of investor's look at machine learning / deep learning startups, with slight Russian accent. =)
Some slides are courtesy of Russia.ai and personally great friend @Petr Zhegin:
#23, #28 are from http://www.russia.ai/single-post/2016/09/21/Ten-Russian-speaking-venture-capital-funds-one-may-consider-to-back-an-AI-startup
#30 insights are from http://www.slideshare.net/RussiaAI/artificial-intelligence-investment-trends-and-applications-h1-2016
Victor Osyka of Almaz Capital, http://fb.com/victor.osika, http://medium.com/@victorosyka
Deep learning in production with the best - Adam Gibson
Getting deep learning adopted at your company. The current landscape of academia vs industry. Presentation at AI with the best (online conference):
http://ai.withthebest.com/
BootstrapLabs - Tracxn Report - artificial intelligence for the Applied Arti... - BootstrapLabs
This report covers companies that provide the infrastructure for creating Artificial Intelligence. These Infrastructure companies include those working on Machine Learning and Deep Learning based platforms and libraries. Some of these companies also provide platforms for Natural Language Processing and Visual Recognition. In the Applications section, the report covers companies leveraging AI techniques to build applications tailored for end use in Enterprise, Industry & Consumer sectors.
Over $1B has been invested in AI-Infrastructure startups since 2010, with ~$340M invested in 2015. Over $7.5B has been invested in AI-Applications startups since 2010, with $2.3B invested in 2015.
These are the slides from a presentation that Terry T. Um gave at Kookmin University on 22 June 2014. Feel free to share them, and please let me know if you find any misconceptions.
(http://t-robotics.blogspot.com)
(http://terryum.io)
Scalable Data Science and Deep Learning with H2O
In this session, we introduce the H2O data science platform. We will explain its scalable in-memory architecture and design principles and focus on the implementation of distributed deep learning in H2O. Advanced features such as adaptive learning rates, various forms of regularization, automatic data transformations, checkpointing, grid-search, cross-validation and auto-tuning turn multi-layer neural networks of the past into powerful, easy-to-use predictive analytics tools accessible to everyone. We will present a broad range of use cases and live demos that include world-record deep learning models, anomaly detection tools and approaches for Kaggle data science competitions. We also demonstrate the applicability of H2O in enterprise environments for real-world customer production use cases.
By the end of the hands-on session, attendees will have learned to perform end-to-end data science workflows with H2O using both the easy-to-use web interface and the flexible R interface. We will cover data ingest, basic feature engineering, feature selection, hyperparameter optimization with N-fold cross-validation, multi-model scoring and taking models into production. We will train supervised and unsupervised methods on realistic datasets. With best-of-breed machine learning algorithms such as elastic net, random forest, gradient boosting and deep learning, you will be able to create your own smart applications.
A local installation of RStudio is recommended for this session.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
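The "hyperparameter optimization with N-fold cross-validation" step mentioned in the session description can be illustrated with a minimal sketch. This is plain Python and does not use H2O's API; the function names, the one-dimensional ridge model, and the candidate grid are all assumptions chosen to keep the example small.

```python
def k_fold_indices(n, k):
    """Yield (train, validation) index lists for k contiguous folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, val
        start += size


def fit_ridge_1d(xs, ys, lam):
    """Closed-form weight for y ~ w * x with L2 penalty lam (hypothetical toy model)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def cv_error(xs, ys, lam, k):
    """Mean squared validation error of the toy model across k folds."""
    total = 0.0
    for train, val in k_fold_indices(len(xs), k):
        w = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], lam)
        total += sum((ys[i] - w * xs[i]) ** 2 for i in val)
    return total / len(xs)


def grid_search(xs, ys, grid, k=5):
    """Return the candidate in grid with the lowest cross-validated error."""
    return min(grid, key=lambda lam: cv_error(xs, ys, lam, k))


if __name__ == "__main__":
    xs = [float(i) for i in range(1, 11)]
    ys = [2.0 * x for x in xs]  # noiseless toy data
    print(grid_search(xs, ys, [0.0, 1.0, 10.0], k=5))
```

Each candidate value is scored only on data the model did not see during fitting, which is the point of cross-validation; a real platform would add shuffling, richer models, and parallel evaluation.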
How to win data science competitions with Deep Learning - Sri Ambati
Note: Please download the slides first, otherwise some links won't work!
How to win Kaggle-style data science competitions and influence decisions with R, Deep Learning and H2O's fast algorithms.
We take a few public and Kaggle datasets and build models that compete on accuracy and scoring speed.
H2O Distributed Deep Learning by Arno Candel 071614 - Sri Ambati
Deep Learning R Vignette Documentation: https://github.com/0xdata/h2o/tree/master/docs/deeplearning/
Deep Learning has been dominating recent machine learning competitions with better predictions. Unlike the neural networks of the past, modern Deep Learning methods have cracked the code for training stability and generalization. Deep Learning is not only the leader in image and speech recognition tasks, but is also emerging as the algorithm of choice in traditional business analytics.
This talk introduces Deep Learning and implementation concepts in the open-source H2O in-memory prediction engine. Designed for the solution of enterprise-scale problems on distributed compute clusters, it offers advanced features such as adaptive learning rate, dropout regularization and optimization for class imbalance. World record performance on the classic MNIST dataset, best-in-class accuracy for eBay text classification and others showcase the power of this game changing technology. A whole new ecosystem of Intelligent Applications is emerging with Deep Learning at its core.
About the Speaker: Arno Candel
Prior to joining 0xdata as Physicist & Hacker, Arno was a founding Senior MTS at Skytree where he designed and implemented high-performance machine learning algorithms. He has over a decade of experience in HPC with C++/MPI and had access to the world's largest supercomputers as a Staff Scientist at SLAC National Accelerator Laboratory where he participated in US DOE scientific computing initiatives. While at SLAC, he authored the first curvilinear finite-element simulation code for space-charge dominated relativistic free electrons and scaled it to thousands of compute nodes.
He also led a collaboration with CERN to model the electromagnetic performance of CLIC, a ginormous e+e- collider and potential successor of LHC. Arno has authored dozens of scientific papers and was a sought-after academic conference speaker. He holds a PhD and Masters summa cum laude in Physics from ETH Zurich.
Transform your Business with AI, Deep Learning and Machine Learning - Sri Ambati
Video: https://www.youtube.com/watch?v=R3IXd1iwqjc
Meetup: http://www.meetup.com/SF-Bay-ACM/events/231709894/
In this talk, Arno Candel presents a brief history of AI and how Deep Learning and Machine Learning techniques are transforming our everyday lives. Arno will introduce H2O, a scalable open-source machine learning platform, and show live demos on how to train sophisticated machine learning models on large distributed datasets. He will show how data scientists and application developers can use the Flow GUI, R, Python, Java, Scala, JavaScript and JSON to build smarter applications, and how to take them to production. He will present customer use cases from verticals including insurance, fraud, churn, fintech, and marketing.
BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b... - Spark Summit
BigDL is a distributed deep learning framework built for Big Data platforms using Apache Spark. It combines the benefits of “high performance computing” and “Big Data” architecture, providing native support for deep learning functionalities in Spark, orders-of-magnitude speedup over out-of-the-box open source DL frameworks (e.g., Caffe/Torch) in single-node performance (by leveraging Intel MKL), and the scale-out of deep learning workloads based on the Spark architecture. We’ll also share how our users adopt BigDL for their deep learning applications (such as image recognition, object detection, NLP, etc.), which allows them to use their Big Data (e.g., Apache Hadoop and Spark) platform as the unified data analytics platform for data storage, data processing and mining, feature engineering, traditional (non-deep) machine learning, and deep learning workloads.
Some resources on how to navigate the hardware space in order to build your own workstation for training deep learning models.
Alternative download link: https://www.dropbox.com/s/o7cwla30xtf9r74/deepLearning_buildComputer.pdf?dl=0
Link to our Github:
https://github.com/tadeha/music-recommender-system
Authors:
Niloufar Farajpour, Mohamadreza Kiani, Mohamadreza Fereydooni, Tadeh Alexani
In this project, we build a music recommender system model to predict a playlist for each user of the BeepTunes dataset according to their taste and collection of track info.
BeepTunes is the largest digital music store in Iran.
We used a hybrid approach combining Collaborative Filtering and Content-Based Filtering techniques.
Technologies Used: Machine Learning, Hadoop, Spark MLlib, Python, Flask, MongoDB
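A hybrid of collaborative filtering and content-based filtering, as described above, can be sketched as a weighted blend of two scores. This is only an illustration in plain Python, not the project's Spark MLlib implementation; the function names, the `alpha` blending weight, the toy data, and the assumption that preferences are scaled to the range 0 to 1 are all hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_scores(user, ratings, track_features, alpha=0.5):
    """Rank unseen tracks by alpha * collaborative + (1 - alpha) * content score.

    ratings: {user: {track: preference in [0, 1]}}  (assumed scale)
    track_features: {track: {feature: weight}}
    """
    seen = ratings[user]
    # Content-based profile: preference-weighted sum of features of seen tracks.
    profile = {}
    for track, pref in seen.items():
        for feat, weight in track_features[track].items():
            profile[feat] = profile.get(feat, 0.0) + pref * weight
    scores = {}
    for track, feats in track_features.items():
        if track in seen:
            continue
        content = cosine(profile, feats)
        # Collaborative part: similarity-weighted average of other users' preferences.
        num = den = 0.0
        for other, other_ratings in ratings.items():
            if other == user or track not in other_ratings:
                continue
            sim = cosine(seen, other_ratings)
            num += sim * other_ratings[track]
            den += abs(sim)
        collab = num / den if den else 0.0
        scores[track] = alpha * collab + (1 - alpha) * content
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    ratings = {
        "u1": {"t1": 1.0, "t2": 0.8},
        "u2": {"t1": 0.9, "t3": 1.0},
        "u3": {"t2": 0.2, "t4": 0.9},
    }
    track_features = {
        "t1": {"pop": 1.0},
        "t2": {"pop": 1.0},
        "t3": {"pop": 1.0},
        "t4": {"classical": 1.0},
    }
    print(hybrid_scores("u1", ratings, track_features))
```

The content-based term lets a track with no listening history still receive a score, which is one common reason for combining the two techniques.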
Lessons learnt at building recommendation services at industry scale - Domonkos Tikk
Industry day keynote presentation held at ECIR 2016, Padova. The talk presents the algorithmic, technical and business challenges Gravity R&D encountered in building a recommender system vendor company after starting out as a top Netflix Prize contender.
[UPDATE] Udacity webinar on Recommendation Systems - Axel de Romblay
A 1h webinar on RecSys for the Udacity NanoDegree Program "How to become a Data Scientist" : https://in.udacity.com/course/data-scientist-nanodegree--nd025.
The link to the ipynb : https://www.kaggle.com/axelderomblay/udacity-workshop-on-recommendation-systems
Lean Startup + Story Mapping = Awesome Products Faster - Brad Swanson
To deliver the right outcomes, you need to learn your customers' needs and validate your assumptions as early as possible. This means getting an early version of your product completed to start testing, validating and improving. This session will demonstrate how to combine Lean Startup and User Story Mapping techniques to determine where to start and how to learn early and often.
Participants will start with a partially completed Lean Canvas to flesh out and then define a product roadmap by building a Story Map. We will use the Lean Startup concepts of Minimum Viable Product (MVP) and validated learning to focus on outcome over output.
Learning objectives:
Understand the importance of accelerated learning and techniques to achieve it
How a Lean Canvas can help shape your product vision and MVP
How to build a story map to create a product roadmap
How to use a story map to validate your users' journey
Solving the AL Chicken-and-Egg Corpus and Model Problem - Neil Rubens
paper: http://www.lrec-conf.org/proceedings/lrec2016/pdf/28_Paper.pdf
tool: https://github.com/move-tool/gephi-plugins
Active learning (AL) is often used in corpus construction (CC) for selecting “informative” documents for annotation. This is ideal for focusing annotation efforts, but has the limitation that it is carried out in a closed-loop manner, selecting points that will improve an existing model. When there is no model, or the task(s) is even under-defined (such as studying corpora-less phenomena), use of traditional AL is inapplicable. To remedy this, we propose a novel method for model-free AL that focuses on utilising phenomena as desirable characteristics. We introduce a tool, MOVE, that helps iteratively visualise and refine these characteristics. We show its potential on a real world case-study of a corpus we are developing.
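For contrast with the model-free method proposed in the paper, the traditional closed-loop setting can be sketched as uncertainty sampling: an existing model scores unlabeled documents and the least certain ones are sent for annotation. The sketch below is plain Python; the function names and the use of entropy as the informativeness measure are assumptions, and it does not represent the MOVE tool.

```python
from math import log


def entropy(probs):
    """Shannon entropy of a predicted class distribution (higher = less certain)."""
    return -sum(p * log(p) for p in probs if p > 0)


def select_for_annotation(predict_proba, unlabeled, batch_size=2):
    """Return the batch_size documents the current model is least certain about.

    predict_proba: callable mapping a document to a list of class probabilities.
    """
    ranked = sorted(unlabeled, key=lambda doc: entropy(predict_proba(doc)), reverse=True)
    return ranked[:batch_size]


if __name__ == "__main__":
    # Hypothetical model output for three unlabeled documents.
    predictions = {
        "doc_a": [0.5, 0.5],
        "doc_b": [0.9, 0.1],
        "doc_c": [0.6, 0.4],
    }
    print(select_for_annotation(predictions.get, list(predictions)))
```

The selection step needs `predict_proba`, that is, a model that already exists. When there is no model yet, this loop cannot start, which is the chicken-and-egg problem the abstract refers to.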
Presentation given by Neil Rubens at the Centre for Database and Information Systems (Prof. Ricci), Free University of Bozen-Bolzano
For more information see http://activeintelligence.org/research/al-rs/
Analysis insight about a Flyball dog competition team's performance - roli9797
Insights from my analysis of a Flyball dog competition team's performance over the last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf - GetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is growing interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLM context and prompt augmentation when building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built a robust Data Copilot on these three concepts, one that can help democratize access to company data assets and boost the performance of everyone working with data platforms.
Why do we need yet another (open-source) Copilot?
How can we build one?
Architecture and evaluation
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... - John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Adjusting OpenMP PageRank : SHORT REPORT / NOTESSubhajit Sahu
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take
advantage of a shared memory system with multiple CPUs, each with multiple cores, to
accelerate pagerank computation. If the NUMA architecture of the system is properly taken
into account with good vertex partitioning, the speedup can be significant. To take steps in
this direction, experiments are conducted to implement pagerank in OpenMP using two
different approaches, uniform and hybrid. The uniform approach runs all primitives required
for pagerank in OpenMP mode (with multiple threads). On the other hand, the hybrid
approach runs certain primitives in sequential mode (i.e., sumAt, multiply).
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeWalaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
6. Value of RS
• Amazon: 35% of sales from recommendations
• Netflix: 2/3 of the movies watched are recommended
• Choicestream: 28% of the people would buy more music if they found what they liked
• Google News: recommendations generate 38% more click-throughs
www.slideshare.net/kerveros99/machine-learning-for-recommender-systems-mlss-2015-sydney
9. Common Approach
• Assumption: preferences of “similar” items/users stay similar
• Similarity: variety of ways to define
10. Collaborative Filtering (CF)
Use ratings to estimate “similarity”
[Figure: Users × Items ratings matrix; rating scale: Love, Like, Okay, Dislike, Hate]
https://buildingrecommenders.wordpress.com/2015/11/23/overview-of-recommender-algorithms-part-5/
11. User-based CF / Item-based CF
• User-based CF: users with similar dis/likes are similar, e.g., if you and Sarah have similar tastes, then you will likely enjoy what Sarah likes (and vice versa)
• Item-based CF: similar items will have similar ratings, e.g., if you liked book A, you will likely also like book B, which received similar ratings
https://buildingrecommenders.wordpress.com/2015/11/23/overview-of-recommender-algorithms-part-5/
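The user-based idea can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the ratings, the user names other than Sarah, the 1-5 scale, and the choice of cosine similarity are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical ratings on a 1-5 scale (1 = Hate ... 5 = Love); not from the slides.
ratings = {
    "you":   {"A": 5, "B": 4, "C": 1},
    "sarah": {"A": 5, "B": 5, "C": 1, "D": 4},
    "tom":   {"A": 1, "B": 2, "C": 5, "D": 1},
}

def cosine(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict_user_based(user, item):
    """Similarity-weighted average of the other users' ratings for `item`."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += abs(sim)
    return num / den if den else None

prediction = predict_user_based("you", "D")
```

Here Sarah's rating of item D counts for more than Tom's because her past ratings are closer to yours. Transposing the dictionary (items as keys, users as inner keys) and reusing the same similarity function gives an item-based variant. Mean-centering each user's ratings before computing similarity is a common refinement that this sketch leaves out.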
15. Established Companies vs. Startups
Established companies: “cruise mode”
• Many existing loyal users
• RS used to increase per-user metrics, e.g. revenue, profit, etc.
Startups: “launch mode”
• Still building user-base
• RS used to attract/retain new users
16. Startups = Growth
“The only essential thing is growth. Everything else we associate with startups follows from growth.”
(Paul Graham, Y Combinator)
18. “Cold Start” Problem
• RS needs user/item data to make recommendations with CF
• For new users/new items, no data is available yet:
  • New item problem
  • New user problem
[Figure: ratings matrix with unknown (“?”) entries for the New User and the New Item]
19. New Item Problem
• Problem: don’t have any reviews yet (to base recommendations on)
• Solution: can use content-based item similarity (to bootstrap recommendations)
[Figure: example products: Jordan Jumpman Team II; Air Jordan 1 Retro High Nouveau; Hurley One And Only Printed; Air Jordan 1 Retro High OG]
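One way to bootstrap recommendations for an unreviewed item is to compare its attributes with those of items already in the catalogue. The sketch below uses Jaccard overlap of attribute sets. The product names come from the slide, but the attribute tags, the choice of which product is "new", and the use of Jaccard similarity are assumptions made for illustration.

```python
# Hypothetical attribute tags; the slide shows only product images and names.
catalog = {
    "Air Jordan 1 Retro High OG": {"shoes", "basketball", "high-top", "jordan"},
    "Jordan Jumpman Team II": {"shoes", "basketball", "jordan"},
    "Hurley One And Only Printed": {"shorts", "surf", "printed"},
}

# Assumed to be the newly added item with no reviews yet.
new_item_name = "Air Jordan 1 Retro High Nouveau"
new_item_attrs = {"shoes", "basketball", "high-top", "jordan", "suede"}

def jaccard(a, b):
    """Share of attributes that two items have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def most_similar(attrs, catalog):
    """Rank catalogued items by attribute overlap with the new item."""
    return sorted(catalog, key=lambda name: jaccard(attrs, catalog[name]), reverse=True)

ranked = most_similar(new_item_attrs, catalog)
```

Users who liked the top-ranked catalogued items are reasonable first candidates to be shown the new item, until it has collected enough ratings for CF to take over.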
23. Indirect Data
• Contacts: friends may already be users of the app (likely to have similar interests)
• Location
• Device type
• Social profile
NOTE: should not be intrusive
26. Item Selection
• RS presents items for two primary purposes:
  • Recommend an item that a user will like: popular items, i.e., items that everyone likes (but these provide little info about the user’s preferences)
  • Present an item to learn about the user’s preferences (Active Learning, AL): contentious items, i.e., items that many people either like or dislike (informative about the user’s preferences)
• In practice, multiple items are shown for different objectives
27. AL Categories
• Item-based AL: analyse items and select items that seem most informative
• Model-based AL: analyse model and select items that seem most informative
28. Item Categories
• Popular: rated by many users [Rashid 2002]
• High Variance in Ratings: items that people either like or hate [Rashid 2002]
• Best/Worst: ask user which items s/he likes most/least [Leino & Raiha 2007]
• Influential: items on which ratings of many other items depend (representative + not represented) [Rubens & Sugiyama 2007]
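The first two categories above can be computed directly from the ratings an item has already received. A minimal sketch, assuming a hypothetical dictionary of per-item ratings on a 1-5 scale (the item names and values are invented for the example):

```python
from statistics import pvariance

# Hypothetical ratings per item on a 1-5 scale; not from the slides.
item_ratings = {
    "blockbuster": [4, 4, 5, 4, 4, 5, 4, 4],
    "cult_film": [1, 5, 1, 5, 5, 1],
    "obscure": [3, 4],
}

def popularity(ratings):
    """Popularity strategy: number of users who rated the item."""
    return len(ratings)

def rating_variance(ratings):
    """Variance strategy: spread of the ratings (high when people love or hate it)."""
    return pvariance(ratings)

most_popular = max(item_ratings, key=lambda i: popularity(item_ratings[i]))
most_contentious = max(item_ratings, key=lambda i: rating_variance(item_ratings[i]))
```

With this data the popular item is one that nearly everyone rates highly, so a new user's rating of it says little about their taste, whereas the high-variance item splits opinion and is more informative to ask about.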
29. Item-based AL
[Figure: candidate points (a), (b), (c), (d) plotted over input1 and input2]
• 3R Properties:
  • Represented by the existing training set? E.g., (b) is already represented
  • Representative of others? E.g., (a) is not representative
  • Results in achieving objective? E.g., (d) → max coverage
[Rubens & Kaplan, 2010]
36. AL Model Error
$g$: optimal function (in the solution space)
$\hat{f}$: learned function
$\hat{f}_i$'s: learned functions from a slightly different training set
$EG = B + V + C$
$B = \left( E\hat{f}(x) - g(x) \right)^2$
$V = \left( \hat{f}(x) - E\hat{f}(x) \right)^2$
$C = \left( g(x) - f(x) \right)^2$
• Model Error ($C$): constant and is ignored
• Bias ($B$): hard to estimate, but is assumed to vanish (asymptotically)
• Variance ($V$): estimate and minimize
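The variance term can be estimated empirically by fitting several models on slightly different training sets and measuring how much their predictions disagree. The sketch below does this with bootstrap resampling and a one-dimensional least-squares line. The data, the model class, the function names, and the rule of querying the highest-variance candidate are assumptions for illustration, not the method from the slides.

```python
import random
from statistics import mean, pvariance

def fit_line(points):
    """Least-squares line through (x, y) points; returns (slope, intercept)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # all x values equal: fall back to a constant prediction
        return 0.0, my
    slope = sum((x - mx) * (y - my) for x, y in points) / sxx
    return slope, my - slope * mx

def prediction_variance(train, candidates, n_models=200, seed=0):
    """Estimate V(x) at each candidate from models fitted on resampled training sets."""
    rng = random.Random(seed)
    models = [fit_line([rng.choice(train) for _ in train]) for _ in range(n_models)]
    return {
        x: pvariance([slope * x + intercept for slope, intercept in models])
        for x in candidates
    }

# Hypothetical 1-D training data clustered in [0, 2]; x = 10 lies far outside it.
train = [(0.0, 0.1), (0.5, 0.4), (1.0, 1.2), (1.5, 1.4), (2.0, 2.1)]
variances = prediction_variance(train, candidates=[1.0, 10.0])
query = max(variances, key=variances.get)
```

Asking for a label where the resampled models disagree most is a common heuristic for reducing variance. A fuller treatment would instead estimate how much each candidate is expected to lower the variance over all inputs once it is added to the training set.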
37. Table 1: Performance comparison of active learning strategies (“XX” Very Good, “X” Good, “ ” Poor, “-” Not Available)
ML: Movielens, NF: Netflix, EM: EachMovie, AWM: Active Web Museum, MP: MyPersonality, STS: South Tyrol Suggests, LF: Last.fm
Columns: Type | Strategy | Metric (MAE/RMSE, NDCG/MAP, Precision, #Rating) | Eval. (Online, Offline) | Compar. Strategies | Datasets
Non-Personalized
Single
uncertainty based
1. variance [59, 61] X - - - - y 2, 4, 6, 9, 24 AWM, EM
2. entropy [20, 67] - - - - y 3, 6, 8, 9, 11, 13, 22 EM
3. entropy0 [67] XX - - XX y y 2, 6, 8, 11, 13, 22 ML
error reduction
4. greedy extend [68] X - - - - y 2, 3, 6, 7, 10, 11 NF
5. representative [69] - XX XX - - y 6 NF, ML, LF
attention based
6. popularity [20, 67] X - - XX y y 2, 8, 9, 11, 13, 22 ML
7. co-coverage [68] - - - - y 2, 3, 4, 6, 10, 11 NF
Combined
static combin.
8. rand-pop [20, 67] - - y y 2, 3, 6, 11, 13, 22 ML
9. log(pop)*entropy [20] XX - - X y y 3, 6, 8, 13 ML
10. sqrt(pop)*var [68] X - - - - y 2, 3, 4, 6, 7, 11 NF
11. HELF [67] XX - - y y 2, 3, 6, 8, 13, 22 ML
12. non-pers-part rand. [11] X XX X - y 1, 6, 9, 12, 14, 20, 21, 28, 29 ML, NF
Personalized
Single
acquisition prob.
13. item-item [20, 67] - - XX y y 2, 3, 6, 8, 9, 11, 22 ML
14. binary-pred [11, 12] X XX X - y 1, 6, 9, 12, 20, 21, 28, 29 ML, NF
15. personality-based [70, 97] XX XX - XX y y 3, 9, 14 STS, MP
16. impact analysis [71] XX - - - - y 9 ML
prediction based
17. aspect model [72, 73] X - - - - y 2 EM, ML
18. min rating [74] X - - - - y 19,25 ML
19. min norm [74] - - - - y 18,25 ML
20. highest-pred [11, 12] X XX X - y 1, 6, 9, 12, 14, 21, 28, 29 ML, NF
21. lowest-pred [11, 12] X X - y 1, 6, 9, 12, 14, 20, 28, 29 ML, NF
user partitioning
22. IGCN [67] XX - - X y y 2, 3, 6, 8, 11, 13 ML
23. decision tree [64] XX - - - - y 3, 4, 10, 11 NF
Combined
static combin.
24. influence based [61] XX - - - - y 1, 4, 6, 9 ML
25. non-myopic [74] X - - - - y 18, 19 ML
26. treeU [75] X - - - - y 23, 27 ML, EM, NF
27. fMF [75] XX - - - - y 23, 26 ML, EM, NF
28. pers-partially rand. [11] X XX X - y 1, 6, 9, 12, 14, 20, 21, 28, 29 ML, NF
29. voting [11, 12] XX XX - y 1, 6, 9, 12, 14, 20, 21, 28 ML, NF
adaptive combin.
30. switching [76] XX XX - XX - y 9, 20, 29 ML
Mehdi Elahi, Francesco Ricci, Neil Rubens. A survey of active learning in collaborative filtering recommender systems. Computer Science Review, Elsevier, 2016.
The table shows that different strategies can improve different aspects of recommendation quality. In terms of rating prediction accuracy (MAE/RMSE), various strategies have shown excellent performance. While some of these strategies are easy to implement (e.g., Entropy0 and Log(popularity)*Entropy), others are more complex and use more sophisticated Machine Learning algorithms (e.g., Decision Tree and Personality-based FM). The strategies that have shown excellent performance in terms of ranking quality (NDCG/MAP) are the Representative-based and Voting strategies. In terms of precision, prediction-based strategies (Highest-predicted and Binary-predicted) have shown excellent performance. In terms of number of ratings acquired (#Rating), as expected, strategies that consider the popularity of items (Popularity and Entropy0) can acquire the largest number of ratings. However, other strategies that maximize the chance that the selected items are familiar to the user (Item-item and Personality-based) can also elicit a considerable number of ratings. For these strategies the success ratio (#acquired_ratings/#requested_items) is the largest. This is an important factor, since strategies that focus only on the informativeness of the items may fail to actually acquire ratings, by selecting obscure items that users do not know and cannot rate.
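The trade-off between informativeness and familiarity can be made concrete with a small sketch of the log(popularity)*entropy score and the success ratio. The item names and ratings are hypothetical, and the function names are chosen for this example rather than taken from the survey.

```python
from collections import Counter
from math import log

def entropy(ratings):
    """Shannon entropy (in bits) of an item's rating distribution."""
    counts = Counter(ratings)
    total = len(ratings)
    return -sum((c / total) * log(c / total, 2) for c in counts.values())

def log_pop_entropy(ratings):
    """log(popularity) * entropy: favours items that are both well known and divisive."""
    return log(len(ratings)) * entropy(ratings) if ratings else 0.0

def success_ratio(requested_items, known_items):
    """#acquired_ratings / #requested_items for a single user."""
    acquired = [i for i in requested_items if i in known_items]
    return len(acquired) / len(requested_items)

# Hypothetical ratings per item on a 1-5 scale; not from the survey.
item_ratings = {
    "blockbuster": [4, 4, 5, 4, 4, 5, 4, 4],
    "cult_film": [1, 5, 1, 5, 5, 1],
    "obscure": [1, 5],
}
ranking = sorted(item_ratings, key=lambda i: log_pop_entropy(item_ratings[i]), reverse=True)

# A user who only knows two of the three films is asked about two items.
ratio = success_ratio(["cult_film", "obscure"], known_items={"cult_film", "blockbuster"})
```

On entropy alone, the obscure film would look as informative as the cult film, since both split opinion evenly. Weighting by log(popularity) pushes the obscure film to the bottom of the ranking, which is the point made above: an item the user cannot rate yields no information, however divisive it is.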
MANY AL-RS APPROACHES
Tailored to:
• different objectives
• different data & settings
39. Take-home Messages
• RS shows users items they want
• RS accounts for a large portion of purchases
• RS methods: user/item-based
• RS is crucial for user growth, and:
• addressing new items/users (“cold start”) with:
• indirect data acquisition
• content-based item similarity
• informative item selection with AL
• Many RS components could be tuned to achieve high performance