Basics of recommender systems and machine learning: supervised classification and unsupervised classification via clustering. Code in Python, with simple examples of k-Nearest Neighbors, Naive Bayes, and Non-negative Matrix Factorization.
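As a taste of the kind of simple Python examples mentioned above, here is a minimal k-Nearest Neighbors classifier in plain stdlib Python (the toy points and labels are hypothetical, not from the talk):

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k nearest training points."""
    # train: list of (feature_vector, label) pairs
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy 2-D data: two well-separated clusters labeled "a" and "b"
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(knn_predict(train, (0.5, 0.5)))  # → a
print(knn_predict(train, (5.5, 5.5)))  # → b
```

The whole algorithm is the distance computation plus a vote, which is why kNN is a common first example in introductory material.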
Codebits 2010 presentation.
Building a Scalable Inbox System with MongoDB and Java (antoinegirbal)
Many user-facing applications present some kind of news feed/inbox system. You can think of Facebook, Twitter, or Gmail as different types of inboxes where the user can see data of interest, sorted by time, popularity, or other parameter. A scalable inbox is a difficult problem to solve: for millions of users, varied data from many sources must be sorted and presented within milliseconds. Different strategies can be used: scatter-gather, fan-out writes, and so on. This session presents an actual application developed by 10gen in Java, using MongoDB. This application is open source and is intended to show the reference implementation of several strategies to tackle this common challenge. The presentation also introduces many MongoDB concepts.
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo! (Daniel Cousineau)
Let's learn the philosophy behind NoSQL (from a developer's standpoint), the changes you will (and won't) have to make, discuss Mongo, and see some practical examples! This is the first revision of this talk; organizational improvements are coming later.
The document discusses schema design basics for MongoDB, including terms, considerations for schema design, and examples of modeling different types of data structures like trees, single table inheritance, and many-to-many relationships. It provides examples of creating indexes, evolving schemas, and performing queries and updates. Key topics covered include embedding data versus normalization, indexing, and techniques for modeling one-to-many and many-to-many relationships.
MongoDB San Francisco 2013: Data Modeling Examples From the Real World presen... (MongoDB)
In this session, we'll examine schema design insights and trade-offs using real world examples. We'll look at three example applications: building an email inbox, selecting a shard key for a large scale web application, and using MongoDB to store user profiles. From these examples you should leave the session with an idea of the advantages and disadvantages of various approaches to modeling your data in MongoDB. Attendees should be well versed in basic schema design and familiar with concepts in the morning's basic schema design talk. No beginner topics will be covered in this session.
A short presentation for beginners introducing machine learning: what it is, how it works, the popular machine learning techniques and learning models (supervised, unsupervised, semi-supervised, reinforcement learning), and how they work, with various industry use cases and popular examples.
Talk at KTH, 14 May 2014, about matrix factorization, different latent and neighborhood models, graphs and energy diffusion for recommender systems, as well as what makes good/bad recommendations.
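The latent-factor models mentioned above are often trained by stochastic gradient descent on the observed ratings. A minimal sketch in plain Python (the ratings triples, learning rate, and dimensions here are hypothetical illustration, not from the talk):

```python
import random

def factorize(ratings, n_users, n_items, k=2, steps=2000, lr=0.01, reg=0.02):
    """Approximate R ≈ P·Qᵀ by SGD on observed (user, item, rating) triples."""
    random.seed(0)  # deterministic demo
    P = [[random.random() * 0.1 for _ in range(k)] for _ in range(n_users)]
    Q = [[random.random() * 0.1 for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        for u, i, r in ratings:
            pred = sum(P[u][f] * Q[i][f] for f in range(k))
            err = r - pred
            for f in range(k):  # gradient step with L2 regularization
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

# Hypothetical ratings for 3 users and 3 items on a 1-5 scale
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 4), (2, 2, 5)]
P, Q = factorize(ratings, 3, 3)
pred = sum(P[0][f] * Q[0][f] for f in range(2))
print(round(pred, 1))  # close to the observed rating of 5
```

The learned rows of P and Q are the latent user and item factors; a missing cell of R is predicted by the same dot product, which is the recommendation step.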
This document provides an overview of probabilistic approaches to information retrieval. It discusses why probabilities are useful for IR given the inherent uncertainty. It covers the Probability Ranking Principle, which aims to rank documents by estimated probability of relevance. Other probabilistic techniques discussed include probabilistic indexing, probabilistic inference using logic representations, and using Bayesian networks for IR. The document notes open issues with some of these approaches and concludes by surveying existing survey papers on probabilistic IR.
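The Probability Ranking Principle mentioned above reduces, in the classic Binary Independence Model, to summing per-term log-odds weights and sorting documents by the resulting score. A toy sketch (documents as term sets; the example collection and relevance judgments are hypothetical):

```python
import math

def bim_weights(docs, relevant, query_terms):
    """Binary Independence Model term weights with add-0.5 smoothing."""
    N, R = len(docs), len(relevant)
    w = {}
    for t in query_terms:
        n = sum(t in d for d in docs)            # docs containing t
        r = sum(t in docs[i] for i in relevant)  # relevant docs containing t
        p = (r + 0.5) / (R + 1)                  # est. P(t | relevant)
        q = (n - r + 0.5) / (N - R + 1)          # est. P(t | non-relevant)
        w[t] = math.log((p * (1 - q)) / (q * (1 - p)))
    return w

def rsv(doc, w):
    """Retrieval status value: sum of weights for query terms present."""
    return sum(wt for t, wt in w.items() if t in doc)

docs = [{"mongo", "schema"}, {"probability", "ranking"}, {"probability", "ir"}]
w = bim_weights(docs, relevant=[1, 2], query_terms={"probability", "ranking"})
ranking = sorted(range(len(docs)), key=lambda i: rsv(docs[i], w), reverse=True)
print(ranking)  # → [1, 2, 0]
```

Sorting by RSV is exactly ranking by estimated probability of relevance, which is what the Probability Ranking Principle prescribes.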
This document provides an overview of machine learning concepts including:
1. Machine learning aims to create computer programs that improve with experience by learning from data. It involves tasks like classification, regression, and clustering.
2. Data comes in different types like text, numbers, images and is generated in massive quantities daily from sources like Google, Facebook, and sensors.
3. Machine learning algorithms are either supervised, using labeled training data, or unsupervised, using unlabeled data. Common supervised techniques are decision trees, neural networks, and support vector machines while clustering is a major unsupervised technique.
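As a toy instance of the supervised setting in point 3 (labeled training data), here is a decision stump — a one-level decision tree — learned from hypothetical labeled examples:

```python
def train_stump(xs, ys):
    """Learn a threshold t minimizing errors for the rule: predict 1 if x >= t."""
    best_t, best_err = None, float("inf")
    for t in sorted(set(xs)):  # candidate thresholds are the observed values
        err = sum((x >= t) != y for x, y in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

# Supervised learning: features paired with known labels
xs = [1.0, 1.5, 2.0, 8.0, 9.0, 9.5]
ys = [0, 0, 0, 1, 1, 1]
t = train_stump(xs, ys)
print(t)  # → 8.0, i.e. predict 1 for x >= 8.0
```

A full decision tree repeats this split search recursively on each side of the threshold; unsupervised methods like clustering, by contrast, never see the `ys` at all.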
This document discusses classification and clustering techniques used in search engines. It covers classification tasks like spam detection, sentiment analysis, and ad classification. Naive Bayes and support vector machines are described as common classification approaches. Features, feature selection, and evaluation metrics for classifiers are also summarized.
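A Naive Bayes classifier of the kind mentioned above can be sketched in a few lines: count word frequencies per class, smooth them, and pick the class with the highest log-probability. The tiny spam/ham corpus below is a hypothetical illustration:

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (tokens, label). Returns log-priors and smoothed log-likelihoods."""
    labels = [lab for _, lab in docs]
    priors = {lab: math.log(c / len(docs)) for lab, c in Counter(labels).items()}
    counts = {lab: Counter() for lab in priors}
    for toks, lab in docs:
        counts[lab].update(toks)
    vocab = {t for toks, _ in docs for t in toks}
    loglik = {}
    for lab, c in counts.items():
        total = sum(c.values())
        # Laplace (add-one) smoothing so unseen words don't zero out a class
        loglik[lab] = {t: math.log((c[t] + 1) / (total + len(vocab))) for t in vocab}
    return priors, loglik

def classify(tokens, priors, loglik):
    scores = {lab: priors[lab] + sum(loglik[lab].get(t, 0) for t in tokens)
              for lab in priors}
    return max(scores, key=scores.get)

docs = [(["win", "money", "now"], "spam"), (["cheap", "money"], "spam"),
        (["meeting", "tomorrow"], "ham"), (["project", "meeting", "notes"], "ham")]
priors, loglik = train_nb(docs)
print(classify(["money", "now"], priors, loglik))  # → spam
```

The "naive" independence assumption is what lets the score decompose into a simple sum over tokens.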
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp... (MongoDB)
This document discusses using machine learning and various machine learning platforms like MongoDB, Spark, Watson, Azure, and AWS to engage customers. It provides examples of using these platforms for tasks like topic detection on tweets, sentiment analysis, recommendation engines, forecasting, and marketing response prediction. It also discusses architectures, languages, and functions supported by tools like Mahout, MLlib, and Watson Developer Cloud.
"Towards a Science of Reproducible Science?" DPRMA Workshop talk at JCDL 2013, Indianapolis, 25th July 2013. Workshop website is http://dprma.oerc.ox.ac.uk/
The paper is:
David De Roure. 2013. Towards computational research objects. In Proceedings of the 1st International Workshop on Digital Preservation of Research Methods and Artefacts (DPRMA '13). ACM, New York, NY, USA, 16-19. DOI=10.1145/2499583.2499590 http://doi.acm.org/10.1145/2499583.2499590
PHASE (Philly Area Scala Enthusiasts) - Word2vec in Scala. Talk explains concrete examples of how Word2vec works, built around a demo of constructing email alerts using concept search.
Helping Travelers Using 500 Million Hotel Reviews per Month (Big Data Colombia)
This document discusses how TrustYou processes large amounts of hotel review data to provide summaries to travelers. It crawls over 30 million reviews daily across 25 languages. Natural language processing and machine learning techniques are used to analyze the text and provide recommendations. Workflows are managed through Luigi and tasks include crawling, text processing, modeling word embeddings, and powering a sample application. Hadoop and Python are used extensively to handle the large scale processing.
Multi-model Databases and Tightly Integrated Polystores (Jiaheng Lu)
One of the most challenging issues in the era of Big Data is the “Variety” of the data. In general, there are two solutions to directly manage multi-model data currently: a single integrated multi-model database system or a tightly-integrated middleware over multiple single-model data stores. In this tutorial, we review and compare these two approaches giving insights on their advantages, tradeoffs, and research opportunities. In particular, we dive into four key aspects of technology for both types of systems, namely (1) theoretical foundation of multi-model data management, (2) storage strategies for multi-model data, (3) query languages across models, and (4) query evaluation and its optimization. We provide a comparison of performance for the two approaches and discuss related open problems and remaining challenges.
Interest in neural networks is growing, with many areas from image recognition to speech processing reporting impressive results. Neural networks have found multiple applications in natural language processing. With advances in software and hardware technologies, and growing interest in AI-based applications, it is time to better understand neural networks applied to natural language processing!
In this workshop, we will cover the basics of neural networks and natural language processing and discuss how neural approaches differ from traditional natural language modeling techniques, with practical applications.
Visually Exploring Patent Collections for Events and Patterns (Xiaoyu Wang)
My talk on Patent Visualization at The 3rd IEEE Workshop on Interactive Visual Text Analytics. Primary focus is to introduce the Scalable Visual Analytics research that my team is working on. Workshop paper can be found at: http://vialab.science.uoit.ca/textvis2013/papers/Ankam-TextVis2013.pdf
The document provides an overview of NoSQL databases and discusses various types including document databases, column-family stores, and key-value pairs. It provides examples of MongoDB, CouchDB, Redis, HBase and their data models, query operations, and architectures.
Human-in-a-loop: a design pattern for managing teams which leverage ML (Paco Nathan)
Big Data Spain, 2017-11-16
https://www.bigdataspain.org/2017/talk/human-in-the-loop-a-design-pattern-for-managing-teams-which-leverage-ml
Human-in-the-loop is an approach which has been used for simulation, training, UX mockups, etc. A more recent design pattern is emerging for human-in-the-loop (HITL) as a way to manage teams working with machine learning (ML). A variant of semi-supervised learning called _active learning_ allows for mostly automated processes based on ML, where exceptions get referred to human experts. Those human judgements in turn help improve new iterations of the ML models.
This talk reviews key case studies about active learning, plus other approaches for human-in-the-loop which are emerging among AI applications. We'll consider some of the technical aspects -- including available open source projects -- as well as management perspectives for how to apply HITL:
* When is HITL indicated vs. when isn't it applicable?
* How do HITL approaches compare/contrast with more "typical" use of Big Data?
* What's the relationship between use of HITL and preparing an organization to leverage Deep Learning?
* Experiences training and managing a team which uses HITL at scale
* Caveats to know ahead of time
* In what ways do the humans involved learn from the machines?
In particular, we'll examine use cases at O'Reilly Media where ML pipelines for categorizing content are trained by subject matter experts providing examples, based on HITL and leveraging the open source [Project Jupyter](https://jupyter.org/) for implementation.
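At its core, the active-learning loop described above (exceptions referred to human experts) often comes down to uncertainty sampling: route the unlabeled example the model is least sure about to a human. A minimal sketch with a hypothetical probability model:

```python
def most_uncertain(pool, predict_proba):
    """Pick the unlabeled example whose predicted probability is closest to 0.5."""
    return min(pool, key=lambda x: abs(predict_proba(x) - 0.5))

# Hypothetical binary model: probability of class 1 grows linearly with x
predict_proba = lambda x: min(1.0, max(0.0, x / 10))
pool = [0.5, 2.0, 4.9, 8.0, 9.5]
ask = most_uncertain(pool, predict_proba)
print(ask)  # → 4.9, the example nearest the decision boundary goes to a human
```

The human's label on that example is then added to the training set and the model retrained, which is how the human judgements improve each new iteration of the model.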
This document provides information and resources for students on conducting research for coursework. It includes tips on searching the library database effectively using keywords, filters, and search limits. Various library databases are introduced for finding academic sources like journal articles. Criteria for coursework assessments focus on problem description, solution, evaluation and language quality. Strategies are presented for evaluating online information sources based on their authority, relevance, objectivity and currency. Students are directed to additional guides and contacts for research help.
Classical modeling techniques like statistics and clustering are difficult to apply to large datasets. Statistics involves collecting and describing data to find patterns and build predictive models. Clustering groups similar objects together based on distance between objects. These techniques were developed before terms like "data mining" and are still commonly used, but can struggle with very large datasets.
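The distance-based clustering described above can be illustrated with a minimal k-means in plain Python (naive initialization, hypothetical 2-D points):

```python
import math

def kmeans(points, k, iters=20):
    """Plain k-means: assign each point to its nearest centroid, then recompute means."""
    centroids = points[:k]  # naive init: first k points (real code would randomize)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[j].append(p)
        # recompute each centroid as the mean of its cluster (skip empty clusters)
        centroids = [tuple(sum(dim) / len(cl) for dim in zip(*cl))
                     for cl in clusters if cl]
    return centroids, clusters

points = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
cents, clusters = kmeans(points, 2)
print(sorted(len(c) for c in clusters))  # → [3, 3], one cluster per group
```

The quadratic-ish cost of repeated all-pairs distance computation is precisely why such classical techniques struggle at very large scale, motivating the approximate and distributed variants used in data mining.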
Information to Wisdom: Commonsense Knowledge Extraction and Compilation - Part 3 (Dr. Aparna Varde)
This is the 3rd part of the tutorial on commonsense knowledge (CSK) at ACM WSDM 2021 by Simon Razniewski, Niket Tandon and Aparna Varde. It focuses on evaluation of the acquired knowledge, both intrinsic & extrinsic, as well as highlights, outlook with a brief perspective on COVID and open issues for further research.
Abstract: Commonsense knowledge is a foundational cornerstone of artificial intelligence applications. Whereas information extraction and knowledge base construction for instance-oriented assertions, such as Brad Pitt’s birth date, or Angelina Jolie’s movie awards, has received much attention, commonsense knowledge on general concepts (politicians, bicycles, printers) and activities (eating pizza, fixing printers) has only been tackled recently. In this tutorial we present state-of-the-art methodologies towards the compilation and consolidation of such commonsense knowledge (CSK). We cover text-extraction-based, multi-modal and Transformer-based techniques, with special focus on the issues of web search and ranking, as of relevance to the WSDM community.
Data mining involves analyzing large datasets to extract useful patterns. It is needed due to the huge amounts of data being generated from various sources like transactions, web documents, social media, sensors, etc. This data contains valuable information but requires analysis to extract knowledge. Data mining techniques are useful for tasks like recommendations, predictions, grouping similar items, and understanding relationships in the data. The main types of data include numeric, categorical, text, transactions, sequences, graphs and different analyses can be done depending on the domain and data type.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features provide convenience and capability at the expense of security. This best-practices guide outlines steps users can take to better protect personal devices and information.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
TrustArc Webinar - 2024 Global Privacy Survey (TrustArc)
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0! (SOFTTECHHUB)
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
Driving Business Innovation: Latest Generative AI Advancements & Success Story (Safe Software)
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceIndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
3. “The greatest problem of today is how to teach people to ignore the
irrelevant, how to refuse to know things, before they are suffocated. For
too many facts are as bad as none at all.”
(W. H. Auden)
“The key in business is to know something that nobody else knows.”
(Aristotle Onassis)
5. Tools
• Python vs C or C++
• feedparser, Beautiful Soup (scrape web pages)
• NumPy, SciPy
• Weka
• R
• Libraries
http://mloss.org/software/
6. Down The Rabbit Hole
• In 2006, Google's search crawler used
850 TB of data. The total web history is
around 3 PB
• Think of all the audio, photos & videos
• That’s a lot of data
• Open formats (HTML, RSS, PDF, ...)
• Everyone + their dog has an API
• facebook, twitter, flickr, last.fm,
delicious, digg, gowalla, ...
• Think about:
• news articles published every day
• status updates / day
8. The Netflix Prize
• In October 2006 Netflix launched an open competition for the best
collaborative filtering algorithm
• at least 10% improvement over Netflix's own algorithm
• Predict user ratings for films based on previous ratings (by all users)
• US$1,000,000 prize won in Sep 2009
9. The Three Acts
I: The Pledge
The magician shows you something ordinary. But of course... it
probably isn't.
II: The Turn
The magician takes the ordinary something and makes it do
something extraordinary. Now you're looking for the secret...
III: The Prestige
But you wouldn't clap yet. Because making something disappear
isn't enough; you have to bring it back.
11. I. Collecting Preferences
• yes/no votes
• Ratings in stars
• Purchase history
• Who you follow/who’s your
friend.
• The music you listen to or the
movies you watch
• Comments (“Bad”, “Great”, “Lousy”, ...)
12. II. Similarity
• Euclidean Distance
• √(Σ(aᵢ - bᵢ)²)
• Pearson Correlation
• > 0.0 (positive correlation)
• < 1.0 (not equal)
Olsen Twins - notice the similarity!
Same eyes, nose, ...
Different hair color, dress, earrings, ...
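Both measures can be written in a few lines of Python. A minimal sketch (function names are mine, not from the talk), treating each user as a vector of ratings:

```python
from math import sqrt

def euclidean_similarity(a, b):
    """Similarity from the Euclidean distance between two rating vectors.

    Returns a score in (0, 1]: 1.0 for identical vectors, approaching
    0.0 as the vectors grow apart.
    """
    distance = sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + distance)

def pearson_similarity(a, b):
    """Pearson correlation between two rating vectors (-1.0 to 1.0)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    std_a = sqrt(sum((x - mean_a) ** 2 for x in a))
    std_b = sqrt(sum((y - mean_b) ** 2 for y in b))
    if std_a == 0 or std_b == 0:
        return 0.0   # a flat vector has no correlation with anything
    return cov / (std_a * std_b)
```

Pearson ignores scale: a user who rates everything one star higher than another still correlates perfectly, which Euclidean distance penalizes.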
15. Users Vs Items
• Find similar items instead of similar users!
• Same recommendation process:
• just switch users with items & vice versa (conceptually)
• Why?
• Works for new users
• Might be more accurate (might not)
• It can be useful to have both
16. Cross-Validation
• How good are the recommendations?
• Partitioning the data: Training set vs Test set
• Size of the sets? 95/5
• Variance
• Multiple rounds with different partitions
• How many rounds? 1? 2? 100?
• Measure of “goodness” (or rather, the error): Root
Mean Square Error
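The whole loop above (partition, predict, measure RMSE, repeat) fits in a short sketch. Names and the `(user, item, rating)` tuple layout are assumptions of mine:

```python
import random
from math import sqrt

def rmse(predicted, actual):
    """Root Mean Square Error between two equal-length rating lists."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def cross_validate(ratings, predict, rounds=100, test_fraction=0.05):
    """Average RMSE over several random 95/5 train/test partitions.

    ratings is a list of (user, item, rating) tuples and predict is
    any function (train_set, user, item) -> predicted rating.
    """
    errors = []
    for _ in range(rounds):
        shuffled = ratings[:]
        random.shuffle(shuffled)                    # a fresh partition each round
        cut = int(len(shuffled) * test_fraction)
        test, train = shuffled[:cut], shuffled[cut:]
        predicted = [predict(train, user, item) for user, item, _ in test]
        actual = [rating for _, _, rating in test]
        errors.append(rmse(predicted, actual))
    return sum(errors) / len(errors)
```

More rounds cost more compute but reduce the variance of the estimate, which is exactly the trade-off the slide is pointing at.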
17. Case Study: Francesinhas.com
• Django project by 1 programmer
• Users give ratings to restaurants
• 0 to 5 stars (0-100 internally)
• Challenge: recommend users
restaurants they will probably like
25. Case Study: Twitter Follow
•Recommend users to follow
•Users don’t have ratings
•implied rating:
“follow” (binary)
•Recommend users that the
people the target user
follows also follow (but that the
target user doesn’t)
this was stuff I presented @codebits in 2008
before twitter had follow recommendations
(code was rewritten)
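A set-based sketch of this follow-of-follows idea (the data layout and names are mine, not the talk's code): score each candidate by how many of the target's followees follow them.

```python
from collections import Counter

def recommend_follows(target, follows, top=5):
    """Recommend accounts followed by the people `target` follows.

    follows maps each user to the set of users they follow; a
    candidate's score is how many of target's followees follow it.
    Candidates the target already follows are excluded.
    """
    already = follows.get(target, set())
    scores = Counter()
    for friend in already:
        for candidate in follows.get(friend, set()):
            if candidate != target and candidate not in already:
                scores[candidate] += 1   # one implied "vote" per followee
    return [user for user, _ in scores.most_common(top)]
```

The binary "follow" stands in for a rating, so plain counting replaces the weighted similarity sums used for star ratings.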
27. A KNN in 1 minute
• Calculate the nearest neighbors (similarity)
• e.g. the other users with the highest number of equal ratings
to the customer
• For the k nearest neighbors:
• neighbor base predictor (e.g. avg rating for neighbor)
• s += sim * (rating - nbp)
• d += sim
• prediction = cbp + s/d (cbp = customer base predictor, e.g. average customer rating)
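The bullets above translate almost line for line into Python. A sketch with dictionaries standing in for the real data stores (all names are mine):

```python
def knn_predict(similarities, neighbor_ratings, neighbor_base, customer_base, k=10):
    """Weighted kNN rating prediction, following the slide's recipe.

    similarities:     {neighbor: similarity to the customer}
    neighbor_ratings: {neighbor: rating the neighbor gave this item}
    neighbor_base:    {neighbor: base predictor, e.g. their avg rating}
    customer_base:    the customer's base predictor (e.g. avg rating)
    """
    # keep the k most similar neighbors that actually rated the item
    neighbors = sorted(
        (n for n in similarities if n in neighbor_ratings),
        key=lambda n: similarities[n], reverse=True)[:k]
    s = d = 0.0
    for n in neighbors:
        sim = similarities[n]
        s += sim * (neighbor_ratings[n] - neighbor_base[n])  # weighted deviation
        d += sim
    if d == 0:
        return customer_base    # no usable neighbors: fall back to the base
    return customer_base + s / d
```

Predicting deviations from each neighbor's base rather than raw ratings cancels out neighbors who are simply harsher or kinder graders than the customer.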
28. Classifying
•Assign an item into a category
•An email as spam (document classification)
•A set of symptoms to a particular disease
•A signature to an individual (biometric identification)
•An individual as credit worthy (credit scoring)
•An image as a particular letter (Optical Character Recognition)
29. Common Algorithms
• Supervised
• Neural Networks
• Support Vector Machines
• Genetic Algorithms
• Naive Bayes Classifier
• Unsupervised:
• Usually done via Clustering (clustering hypothesis)
• i.e. similar contents => similar classification
31. Case Study: A Spam Filter
• The item (document) is an email message
• 2 Categories: Spam and Ham
• What do we need?
fc: {'python': {'spam': 0, 'ham': 6}, 'the': {'spam': 3, 'ham': 3}}
cc: {'ham': 6, 'spam': 6}
32. Feature Extraction
• Input data can be way too large
• Think every pixel of an image
• It can also be mostly useless
• A signature is the same regardless of color (B&W
will suffice)
• And incredibly redundant (lots of data, little info)
• The solution is to transform the input into a
smaller representation - a feature vector!
• A feature is either present or not
33. Get Features
• Word Vector: Features are words (basic for doc classification)
• An item (document) is an email message and can:
• contain a word (feature is present)
• not contain a word (feature is absent)
[‘date', 'don', 'mortgage', 'taint',‘you’,‘how’,‘delay’, ...]
Other ideas: use capitalization, stemming, tf-idf
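A minimal word-vector extractor might look like this (the regex and the 3-19 character cutoffs are my assumptions, a common trick to drop noise tokens):

```python
import re

def get_features(document):
    """Extract a word-vector feature set from a document.

    Lower-cases everything and keeps only words of 3-19 characters,
    so capitalization and very short/long tokens are ignored.
    """
    words = re.findall(r'[a-z]{3,19}', document.lower())
    return set(words)   # a feature is either present or not
```

Returning a `set` encodes exactly the present/absent model from the slide: how many times a word occurs in one message is thrown away.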
34. I. Training
For every training example (item, category):
1. Extract the item's features
2. For each feature:
• Increment the count for this (feature, category) pair
3. Increment the category count (+1 example)
fc: {'feature': {'category': count, ...}}
cc: {'category': count, ...}
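The three steps above, with `fc` and `cc` as plain dicts in the shape the slide shows (the toy training calls are mine):

```python
fc = {}   # {feature: {category: count}} - feature counts
cc = {}   # {category: count}            - category (example) counts

def train(features, category):
    """Record one training example: its feature set and its category."""
    for feature in features:
        fc.setdefault(feature, {}).setdefault(category, 0)
        fc[feature][category] += 1             # count for (feature, category)
    cc[category] = cc.get(category, 0) + 1     # one more example of category

# toy examples, assuming features were already extracted
train({'python', 'the'}, 'ham')
train({'mortgage', 'the'}, 'spam')
```

After these two calls, `fc['the']` holds a count for both categories while `fc['mortgage']` only knows about spam, which is all the probability step needs.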
35. II. Probabilities
P(word | category): the probability that a word appears in a particular category (classification)
P(w | c) = P(c ∩ w) / P(c)
Assumed Probability
Using only the information seen so far makes the classifier incredibly sensitive to words
that appear very rarely.
It would be much more realistic for the value to gradually change as a word is
found in more and more documents with the same category.
A weight of 1 means the assumed probability is weighted the same as one word.
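Both probabilities in code, using the `fc`/`cc` counts from the training slide (the sample values below are the ones shown on slide 31; the 0.5 assumed prior and function names are my choices):

```python
# counts as built by the training step (sample values from the slide)
fc = {'python': {'spam': 0, 'ham': 6}, 'the': {'spam': 3, 'ham': 3}}
cc = {'ham': 6, 'spam': 6}

def feature_prob(feature, category):
    """Plain P(feature | category): the fraction of the category's
    training examples that contain the feature."""
    if cc.get(category, 0) == 0:
        return 0.0
    return fc.get(feature, {}).get(category, 0) / cc[category]

def weighted_prob(feature, category, weight=1.0, assumed=0.5):
    """Blend the observed probability with an assumed prior of 0.5.

    total is how often the feature was seen across all categories;
    with weight=1 the assumed probability counts as much as one real
    occurrence, so rare words drift gradually away from 0.5.
    """
    total = sum(fc.get(feature, {}).values())
    basic = feature_prob(feature, category)
    return (weight * assumed + total * basic) / (weight + total)
```

A word never seen before gets exactly the assumed 0.5, and 'python' (seen 6 times, always ham) gets a small but non-zero spam probability instead of a hard 0.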
36. P(Document | Category): the probability that a given doc belongs in a particular category
P(d | c) = P(w1 | c) × P(w2 | c) × ... × P(wn | c), for every word in the document
Yeah that's nice... but what we want is
P(Category | Document)!
*note: Decimal vs float (a product of many small probabilities can underflow a float)
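The product is a one-liner; here it takes the per-word estimate as a parameter so it composes with the weighted probability from the previous slide (signature is my choice):

```python
def document_prob(features, category, word_prob):
    """P(document | category) under the naive independence assumption:
    the product of the per-word probabilities.

    word_prob(feature, category) is any per-word estimate, e.g. the
    weighted probability from the previous slide. Multiplying many
    small floats can underflow to 0.0 - hence the slide's note about
    Decimal vs float; for large documents sum math.log(p) instead.
    """
    p = 1.0
    for feature in features:
        p *= word_prob(feature, category)
    return p
```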
38. III. Bayes' Theorem
P(c | d) = P(d | c) × P(c) / P(d)
P(d | c) = P(w1 | c) × P(w2 | c) × ... × P(wn | c)
P(d) is the same for every category, so it can be ignored
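Putting the theorem to work: score each category by P(d | c) × P(c) and keep the best, dropping the constant P(d). The toy `word_prob` table below is made up for illustration:

```python
def classify(features, categories, word_prob, category_counts):
    """Pick the category c maximizing P(d | c) x P(c).

    P(d) is the same for every candidate category, so it can be
    dropped from Bayes' theorem when comparing scores.
    """
    total = sum(category_counts.values())
    best, best_score = None, -1.0
    for c in categories:
        p_d_given_c = 1.0
        for f in features:
            p_d_given_c *= word_prob(f, c)          # naive product
        score = p_d_given_c * category_counts[c] / total  # P(d|c) * P(c)
        if score > best_score:
            best, best_score = c, score
    return best

# toy per-word probabilities (made up for illustration)
def word_prob(f, c):
    table = {'spam': {'mortgage': 0.9}, 'ham': {'mortgage': 0.1}}
    return table[c].get(f, 0.5)

counts = {'spam': 6, 'ham': 6}
```

Real filters often also require the winning score to beat the runner-up by some threshold before acting, since misfiling ham as spam is costlier than the reverse.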
40. • If you’re thinking of filtering spam, go with akismet
• If you really want to do your own Bayesian spam filter,
a good start is Wikipedia
• Training datasets are available online - for spam and
pretty much everything else
http://en.wikipedia.org/wiki/Bayesian_spam_filter
http://akismet.com/
http://spamassassin.apache.org/publiccorpus/
41. Clustering
• Find structure in datasets:
• Groups of things, people, concepts
• Unsupervised (i.e. there is no training)
• Common algorithms:
• Hierarchical clustering
• K-means
• Non Negative Matrix Approximation
A, B, C, D, F, G, I, J
A, C
B, D, G, F
I, J
42. Non Negative Matrix
Approximation (or Factorization)
I. Get the data
• in matrix form!
II. Factorize the matrix
III. Present the results
yeah the matrix is kind of magic
44. I. The Data
Matrix: rows are items (articles), columns are properties (words),
each value is the word frequency per article
article vector: ['A', 'B', 'C', 'D', ...]
word vector: ['sapo', 'codebits', 'haiti', 'iraq', ...]
[[7, 8, 1, 10, ...]
 [2, 0, 16, 1, ...]
 [22, 3, 0, 0, ...]
 [9, 12, 5, 4, ...]
 ...]
e.g. Article D contains the word 'iraq' 4 times
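Building that matrix from tokenized articles is mostly bookkeeping. A sketch (names and the dict-of-token-lists input are assumptions of mine):

```python
def make_matrix(articles):
    """Build the articles x words count matrix from tokenized articles.

    articles maps article name -> list of words; returns the sorted
    row labels, the sorted word vector, and the count matrix.
    """
    # the word vector: every word that appears in any article
    words = sorted({w for doc in articles.values() for w in doc})
    names = sorted(articles)
    matrix = [[articles[name].count(w) for w in words] for name in names]
    return names, words, matrix
```

In practice you would also drop words that appear in almost every article (or almost none), since they carry no distinguishing signal for the factorization.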
46. II. Factorize
data matrix = weights matrix × features matrix
[[23, 24]    [[7, 8]     [[1, 0]
 [2,  0]] =   [2, 0]]  ×  [2, 3]]
weights matrix (article × feature): how much the feature applies to the article
features matrix (feature × word): the importance of the word to the feature
48. III. The Results
• For every feature:
• Display the top X words (from the features matrix)
• Display the top Y articles for this feature (from the weights matrix)
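The two display steps can be sketched like this (the function name and the returned shape are my own; it returns data rather than printing so it is easy to test):

```python
def summarize_features(words, titles, feats, weights, n_words=6, n_articles=3):
    """For every feature, collect its top words and top articles.

    feats:   features x words matrix (importance of word to feature)
    weights: articles x features matrix (how much a feature applies)
    Returns a list of (top_words, [(weight, title), ...]) per feature.
    """
    results = []
    for f, feature_row in enumerate(feats):
        # words with the highest importance for this feature
        top_words = [words[i] for i in
                     sorted(range(len(words)),
                            key=lambda i: feature_row[i], reverse=True)[:n_words]]
        # articles this feature applies to the most
        top_articles = [(weights[a][f], titles[a]) for a in
                        sorted(range(len(titles)),
                               key=lambda a: weights[a][f], reverse=True)[:n_articles]]
        results.append((top_words, top_articles))
    return results
```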
50. ['adobe', 'flash', 'platform', 'acrobat', 'software', 'reader']
(0.0014202284481846406, u"Apple, Adobe, and Openness: Let's Get Real")
(0.00049914481067248734, u'Piggybacking on Adobe Acrobat and others')
(0.00047202214371591086, u'CVE-2010-3654 - New dangerous 0-day authplay library adobe products')
['macbook', 'hard', 'only', 'much', 'drive', 'screen']
(0.0017976618817123543, u'The new MacBook Air')
(0.00067015549607138966, u'Revisiting Solid State Hard Drives')
(0.00035732495413261966, u"The new MacBook Air's SSD performance")
['apps', 'mobile', 'business', 'other', 'good', 'application']
(0.0013598162030796167, u'Which mobile apps are making good money?')
(0.00054549656743046277, u'An open enhancement request to the Mobile Safari team for sane bookmarklet installation or alternatives')
(0.00040802131970223176, u'Google Apps highlights – 10/29/2010')
['quot', 'strike', 'operations', 'forces', 'some', 'afghan']
(0.002464522414843272, u'Kandahar diary: Watching conventional forces conduct a successful COIN')
(0.00027058999725999285, u'How universities can help in our wars - By Tom Ricks')
(0.00026940637538539202, u"This Weekend's News: Afghanistan's Long-Term Stability")
*note: this was created using an OPML file exported from my Google
Reader (260 subscriptions)
51. Food for the Brain
Machine Learning
Tom Mitchell
Neural Networks:
A Comprehensive Foundation
Simon Haykin
Programming Collective Intelligence:
Building Smart Web 2.0 Applications
Toby Segaran
Data Mining: Practical Machine
Learning Tools and Techniques
Ian H. Witten, Eibe Frank