Opinion stream mining encompasses methods for monitoring and understanding how people's attitudes towards products change. Understanding which product features influence a buyer's choice positively or negatively allows decision makers to make well-informed decisions on improving their products or marketing them properly. We propose OPINSTREAM, a framework for the discovery and polarity monitoring of implicit product features deemed important in people's reviews of different products.
This document is Schering-Plough Corporation's annual report (Form 10-K) filed with the SEC for the year ending December 31, 2008. It provides an overview of the company and its three business segments: Prescription Pharmaceuticals, Animal Health, and Consumer Health Care. It discusses key products and therapeutic areas for each segment. It also describes recent acquisitions, strategic plans, and challenges facing the company in the current environment.
PPG reported record sales and earnings for the third quarter of 2008 despite economic headwinds. Sales increased 37% year-over-year to $4.2 billion due to acquisitions and higher prices offsetting lower volumes. Net income was $117 million compared to $191 million last year, though adjusted earnings per share were comparable at $1.37. The company strengthened its balance sheet by reducing debt by $650 million already in 2008. PPG expects further financial benefits from divesting its automotive glass business and business restructuring initiatives.
The document outlines 10 ways to sabotage an employee survey and not improve organizational culture. It suggests buying an expensive, outdated survey; writing poor questions; using the same old survey; asking irrelevant questions; not emphasizing confidentiality; having HR administer it; not sharing results; not taking action on findings; making action planning a burden; and only doing the survey once without follow-up. The goal is to appear to conduct a survey but actually make problems worse rather than better.
- PPG Industries reported net sales of $2.87 billion for the quarter and $11.2 billion for the year, up compared to the same periods in 2006. Net income was $200 million for the quarter and $834 million for the year.
- By business segment, Performance Coatings and Industrial Coatings experienced the largest sales increases both for the quarter and full year.
- Total current assets at the end of 2007 were $7.14 billion, up from $4.86 billion at the end of 2006, mainly due to increases in cash, short-term investments, and receivables.
This document introduces an employee feedback survey tool that assesses organizational health across six key factors - accountability, communication, fairness, growth, relationships, and trust - in a way that is designed not to damage company culture. The survey takes about 10 minutes to complete online and provides results and recommendations within one week. It uses a slider scale interface instead of forced rating scales. The tool is aimed at startups and smaller firms that want to monitor culture without overly formal programs.
NeeMo is a tool for automatically summarizing the Greek blogosphere.
At regular intervals, NeeMo collects posts from all blogs, groups them into topics, and presents them according to how much they interest public opinion.
The document is in Spanish and appears to be about Holy Tuesday in 2010. It mentions "Misa Crismal, Martes Santo 2010", which translates to "Chrism Mass, Holy Tuesday 2010". Chrism Mass is a special Mass that is usually celebrated on Holy Tuesday where the bishop blesses the holy oils used in sacraments.
Not Interested in ICT? A Case Study to Explore How a Meaningful m-Learning Ac... (Patricia Santos)
(1) Researchers conducted a case study to explore how a meaningful mobile learning (m-learning) activity could foster engagement among older users not initially interested in information and communication technologies (ICT).
(2) They designed a context-aware m-learning activity where participants answered questions about a novel on a route using mobile devices. Participants were involved in co-design workshops to create the questions.
(3) Results showed that the meaningful, collaborative nature of the activity improved participants' views of technology over time and engaged them in learning, despite initial reluctance towards ICT. When mobile devices supported an enjoyable experience, anxiety towards them reduced.
There are many types of cars that serve different purposes. Sedans are common passenger vehicles that can carry 4-5 people in comfort. Sports cars are low and sleek vehicles designed for speed and performance rather than passenger capacity. SUVs provide ample passenger and cargo space but are taller and often all-wheel drive for navigating different terrains.
PPG Industries reported financial results for Q4 and full year 2008. Q4 sales declined 18% to $3.1 billion due to a severe drop in global demand. However, full year sales increased 30% to $15.8 billion due to growth in coatings segments and acquisitions. Earnings per share were $0.41 for Q4 and $4.59 for the full year after adjustments. PPG expects global demand and currency rates to impact Q1 2009 results. The company generated strong cash flow in 2008 and repaid debt ahead of schedule.
This document discusses the importance of love and wisdom in solving life's problems. It states that faults are more apparent where love is lacking and that love waxes and wanes like the moon. It also advises that it is better to sit with an owl, representing wisdom, than to fly with a falcon representing rashness.
PPG Industries is a leading global manufacturer of coatings, glass, and chemicals. In 2004, the company achieved record sales of $9.5 billion, a 9% increase over 2003. Net income was $683 million, a 38% rise, driven by higher volumes, prices, and manufacturing efficiencies. PPG used its strong cash flow to repurchase shares, increase dividends, pay down debt, and fund capital expenditures including acquisitions, while still ending the year with $700 million in cash. The company aims to increase growth and consistency through strategies like building a better mix of businesses, creating breakthrough products, and improving customer results.
An ethical will is a non-legal document that allows people to define and pass on their values, beliefs, life lessons, dreams, and hopes to family and community after death. Ethical wills have biblical roots and were traditionally oral but are now often written. They are one of three types of wills, along with legal wills and living wills. Ethical wills help honor one's past, capture important present values, and inform the future by creating a legacy. Writing an ethical will can provide personal and emotional benefits for both the writer and their descendants.
PPG Industries is a leading global manufacturer that supplies coatings, glass, and chemical products. In 2005, PPG reported record sales of $10.2 billion, with 10 of its 15 businesses posting annual sales records. Net income was $596 million. To accelerate growth, PPG is expanding in key markets like Asia through acquisitions and new facilities. PPG is also investing in new technologies to strengthen its businesses and remain competitive, such as adding coating capabilities and installing new membrane cell technology.
Trizivir is a fixed-dose combination tablet containing abacavir, lamivudine and zidovudine used to treat HIV-1 infection. It contains the same doses of the individual drugs as taking them separately. Studies found Trizivir to be clinically equivalent to taking the drugs separately in terms of virologic response and similar rates of adverse effects. Switching stable patients from other regimens to Trizivir was also found to be effective with similar adverse event rates and potential cholesterol and triglyceride benefits due to the simplified regimen. The presentation recommends adding Trizivir to the VA formulary due to its benefits of being bioequivalent, offering a simple regimen with low pill burden, good
Interest in Deep Learning has been growing in the past few years. With advances in software and hardware technologies, Neural Networks are making a resurgence. With interest in AI based applications growing, and companies like IBM, Google, Microsoft, NVidia investing heavily in computing and software applications, it is time to understand Deep Learning better!
In this lecture, we will get an introduction to Autoencoders and Recurrent Neural Networks and understand the state-of-the-art in hardware and software architectures. Functional Demos will be presented in Keras, a popular Python package with a backend in Theano. This will be a preview of the QuantUniversity Deep Learning Workshop that will be offered in 2017.
This document provides an overview of recommendation systems including: how they work using content-based, collaborative filtering, and hybrid approaches; challenges like performance, modelization, and biases; common evaluation methods including offline metrics, A/B testing, and the online-offline gap; interactive learning using reinforcement learning; and concludes with best practices like prioritizing criteria, blending approaches, and continuous evaluation and monitoring. A hands-on tutorial is also proposed to implement a neural network-based recommendation system.
The document outlines an agenda for a meeting to review a previous project and define improvement actions. The objectives are to work better together and improve future projects. Ground rules include a blame-free discussion, listing both successes and problems, and focusing on solutions. The agenda includes reviewing feedback, identifying root causes through a fishbone analysis, prioritizing issues, and developing an action plan to prevent issues from reoccurring. Participants completed pre-work to identify causes and brainstorm solutions which will be discussed in the meeting.
Recsys 2014 Tutorial - The Recommender Problem Revisited (Xavier Amatriain)
This document summarizes Xavier Amatriain's presentation on recommender systems. It discusses traditional recommendation methods like collaborative filtering, content-based recommendations, and hybrid approaches. It also covers newer methods that go beyond traditional techniques, such as learning to rank, deep learning, social recommendations, and context-aware recommendations. Throughout the presentation, Amatriain discusses challenges like cold starts, popularity bias, and limitations of different recommendation approaches. He also shares lessons learned from the Netflix Prize competition, including how SVD and RBM models were used.
[UPDATE] Udacity webinar on Recommendation Systems (Axel de Romblay)
A one-hour webinar on recommender systems for the Udacity NanoDegree Program "How to become a Data Scientist": https://in.udacity.com/course/data-scientist-nanodegree--nd025.
The link to the ipynb notebook: https://www.kaggle.com/axelderomblay/udacity-workshop-on-recommendation-systems
This document provides an introduction to recommender systems. It discusses how recommender systems help users discover new content in the "age of recommendation" by providing personalized recommendations. Common techniques for building recommender systems include collaborative filtering, which looks at ratings from similar users to provide recommendations. Memory-based collaborative filtering uses item or user similarities to make predictions, while model-based approaches like matrix factorization use dimensionality reduction techniques to learn latent factors from user-item rating matrices. Matrix factorization approaches like SVD have been shown to provide accurate recommendations.
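As a rough illustration of the memory-based, item-similarity idea mentioned in this summary, the sketch below predicts a missing rating as a similarity-weighted average of the user's other ratings. The toy ratings, the names `item_vector`, `cosine`, and `predict`, and the choice of cosine similarity over co-rating users are all assumptions made for this example; they are not taken from the summarized slides.

```python
from math import sqrt

# Toy user -> {item: rating} data; every value here is made up for illustration.
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5},
    "cat": {"A": 1, "C": 5},
}


def item_vector(item):
    """Return the ratings given to one item, keyed by user."""
    return {user: r[item] for user, r in ratings.items() if item in r}


def cosine(v, w):
    """Cosine similarity computed over the users who rated both items."""
    common = set(v) & set(w)
    if not common:
        return 0.0
    dot = sum(v[u] * w[u] for u in common)
    norm_v = sqrt(sum(v[u] ** 2 for u in common))
    norm_w = sqrt(sum(w[u] ** 2 for u in common))
    return dot / (norm_v * norm_w)


def predict(user, item):
    """Similarity-weighted average of the user's ratings on other items."""
    target = item_vector(item)
    num = den = 0.0
    for other, rating in ratings[user].items():
        if other == item:
            continue
        sim = cosine(target, item_vector(other))
        num += sim * rating
        den += abs(sim)
    # None signals that no rated item is comparable to the target item.
    return num / den if den else None


print(predict("bob", "C"))
```

With only one co-rating user, the cosine similarity is trivially 1, which is one reason practical systems usually require a minimum overlap or shrink similarities computed from few ratings.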
Context-aware recommender systems (CARS) help improve the effectiveness of recommendations by adapting to users' preferences in different contextual situations. One approach to CARS that has been shown to be particularly effective is Context-Aware Matrix Factorization (CAMF). CAMF incorporates contextual dependencies into the standard matrix factorization (MF) process, where users and items are represented as collections of weights over various latent factors. In this paper, we introduce another CARS approach based on an extension of matrix factorization, namely, the Sparse Linear Method (SLIM). We develop a family of deviation-based contextual SLIM (CSLIM) recommendation algorithms by learning rating deviations in different contextual conditions. Our CSLIM approach is better at explaining the underlying reasons behind contextual recommendations, and our experimental evaluations over five context-aware data sets demonstrate that these CSLIM algorithms outperform the state-of-the-art CARS algorithms in the top-$N$ recommendation task. We also discuss the criteria for selecting the appropriate CSLIM algorithm in advance based on the underlying characteristics of the data.
[WI 2014] Context Recommendation Using Multi-label Classification (YONG ZHENG)
This document proposes a new type of recommender system called a context recommender that recommends appropriate contexts (e.g. time, location, companion) for users to consume items. It discusses how context recommenders are different than traditional and context-aware recommenders. It also presents the framework for context recommenders including algorithms using multi-label classification to directly predict contexts. The document reports on experiments comparing these algorithms on several datasets and finds that personalized algorithms outperform non-personalized ones and that certain multi-label classification algorithms like label powerset using support vector machines achieve the best performance.
Detecting Food and Activities in Lifelogging Images (Rami Albatal)
The document discusses detecting food and activities in lifelogs (records of daily experiences). It describes a system using deep learning convolutional neural networks to automatically extract features and classify images from lifelogs into concepts like food items and locations. The system was the top performer in the NTCIR-12 Lifelogging evaluation campaign for tasks like finding images of a person at a specific location. The document explains how deep learning can build multimedia indexing and retrieval systems for lifelogs without needing manual feature extraction.
Context-aware Recommendation: A Quick View (YONG ZHENG)
Context-aware recommendation systems take into account additional contextual information beyond just the user and item, such as time, location, and companion. There are three main approaches: contextual prefiltering splits items or users based on context; contextual modeling directly integrates context into models like matrix factorization; and CARSKit is an open source Java library for building context-aware recommender systems.
Automating Speed: A Proven Approach to Preventing Performance Regressions in ... (HostedbyConfluent)
"Regular performance testing is one of the pillars of Kafka Streams’ reliability and efficiency. Beyond ensuring dependable releases, regular performance testing supports engineers in new feature development with the ability to easily test the performance impact of their features, compare different approaches, etc.
In this session, Alex and John share their experience from developing, using, and maintaining a performance testing framework for Kafka Streams that has prevented multiple performance regressions over the last 5 years. They cover guiding principles and architecture, how to ensure statistical significance and stability of results, and how to automate regression detection for actionable notifications.
This talk sheds light on how Apache Kafka is able to foster a vibrant open-source community while maintaining a high performance bar across many years and releases. It also empowers performance-minded engineers to avoid common pitfalls and bring high-quality performance testing to their own systems."
The document outlines the STATIK framework for implementing Kanban, which involves understanding the system's purpose, sources of dissatisfaction, demand and capability, knowledge discovery processes, classes of service, Kanban systems design, and rollout. It emphasizes understanding multiple perspectives, balancing demand and capability, discovering needs through collaboration and validation, recognizing different expectations through classes of service, and designing systems to pursue sustained evolutionary change through Kanban values like transparency and respect.
How to gain a foothold in the world of classification (Torsten Schön)
This talk is supposed to serve as a basic introduction to classification. I will explain some common classification algorithms and fundamentals in the field of classification. Before starting to learn a model, it is crucial to explore and understand the underlying data. Based on these findings, proper feature engineering and selection is to be performed in order to get appropriate results. After choosing a model and classifying data instances, we will see different methods of evaluating the results, using techniques like cross-validation.
Modern Perspectives on Recommender Systems and their Applications in Mendeley (Maya Hristakeva)
Presentation given for one of Pearson's Data Research teams. It motivates the use of recommender systems, describes common approaches to building and evaluating them and gives examples of how they are used in Mendeley. Joint work with Kris Jack, Chief Data Scientist at Mendeley.
The document provides guidance for students completing the Mini-PACT assessment and reflection. It instructs students to plan an individual assessment to include in the assessment section using actual student scores and identifying a cut point on the rubric. For the reflection, students should cite what they learned from daily reflections and instruction/assessment analyses, properly cite research, and be specific about next steps. It also includes instructions for a self-assessment where students should work quietly and honestly assess themselves, noting any questions.
This document discusses recommender systems and approaches used at Netflix. It covers collaborative filtering using user-user and item-item methods, content-based recommendations using item attributes, and hybrid approaches. It provides examples of how Netflix uses collaborative filtering to generate personalized genre rows and social recommendations. Netflix combines many data sources and machine learning models to power its highly personalized recommendation engine.
The document discusses the Analytic Hierarchy Process (AHP) decision-making method. It begins with an introduction to AHP and the speaker. It then outlines the basic steps of AHP, including defining the problem as a hierarchy, pairwise comparisons of criteria, calculating weights, and consistency checks. Finally, it provides an example of using AHP to choose an automobile, walking through creating the hierarchy and making pairwise comparisons to derive weights and determine the best alternative.
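The weighting and consistency-check steps listed in this summary can be sketched as follows. This is a minimal illustration, not the presenter's procedure: it assumes the geometric-mean approximation of the priority vector (rather than an exact eigenvector computation), uses Saaty's commonly tabulated random index values, and the three criteria and their pairwise judgments are hypothetical.

```python
from math import prod

# Saaty's random consistency index for small matrix sizes.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}


def ahp_weights(matrix):
    """Approximate priority weights via normalized row geometric means."""
    n = len(matrix)
    geo_means = [prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights):
    """Consistency ratio CR = CI / RI, with CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    if n <= 2:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    # Estimate lambda_max as the average of (A w)_i / w_i.
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


# Hypothetical pairwise comparisons for three car criteria:
# cost vs safety vs comfort (entry [i][j] = how much i is preferred over j).
comparisons = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]

weights = ahp_weights(comparisons)
print(weights, consistency_ratio(comparisons, weights))
```

A consistency ratio below roughly 0.1 is the conventional threshold for accepting the judgments; above it, the pairwise comparisons are usually revisited.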
The document discusses the Analytic Hierarchy Process (AHP) decision-making method. It begins with an introduction to the speaker and overview of the tutorial. Then, it outlines the typical steps in the AHP decision-making process, including identifying the decision, gathering information, identifying alternatives, weighting criteria, choosing among alternatives, and reviewing the decision. The remainder of the document provides an in-depth explanation of applying the AHP process through pairwise comparisons and calculating weights and consistency. Examples are provided to illustrate how AHP can be used to evaluate multiple criteria in complex decision problems.
The document discusses Unit II of a syllabus which covers class diagrams, including elaboration, domain modeling, finding conceptual classes and relationships. It discusses when to use class diagrams and provides examples of a class diagram for a hotel management system. It also discusses inception and elaboration phases in software development processes and provides artifacts used in elaboration. Finally, it discusses domain modeling including how to identify conceptual classes, draw associations, and avoid adding too many associations.
Continuous Learning Systems: Building ML systems that learn from their mistakes (Anuj Gupta)
This document discusses building machine learning systems that can continuously learn from feedback to improve over time.
It describes how models trained on static datasets degrade over time when applied to non-stationary domains like customer support conversations. A two-model approach is proposed using a global model trained on large datasets and local models that learn from feedback to adapt to each brand.
Local models use online learning algorithms to incorporate feedback from predictions and continuously update. This approach improved accuracy by 7% on a customer support test dataset by adapting to each brand's definitions. Combining global and local model scores provided further gains. Future work focuses on improving adaptation and reducing bias from feedback.
Similar to Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM (20)
AAISI AI Colloquium 30/3/2021: Bias in AI systemsEirini Ntoutsi
The document summarizes a presentation about bias in AI systems. It discusses understanding bias by examining how human biases enter AI systems through data and algorithms. It also covers approaches for mitigating bias, including pre-processing the data, changing the learning algorithm, and post-processing models. As an example, it describes changing decision tree algorithms to incorporate fairness metrics when selecting attributes for splits. The overall goal is to deal with bias at different stages of AI system development and deployment.
Fairness-aware learning: From single models to sequential ensemble learning a...Eirini Ntoutsi
An overview of fairness-aware learning: from batch-learning with single models to batch-learning with sequential ensembles and to fairness-aware learning over non-stationary data streams
Invited talk on fairness in AI systems at the 2nd Workshop on Interactive Natural Language Technology for Explainable AI co-located with the International Conference on Natural Language Generation, 18/12/2020.
Sentiment Analysis of Social Media Content: A multi-tool for listening to you...Eirini Ntoutsi
This document discusses sentiment analysis of social media content. It begins by noting the abundance of opinion-rich resources online due to the social nature of Web 2.0. Sentiment analysis aims to automatically analyze opinions expressed in text to determine sentiment. This is challenging for social media due to informal language, short texts, and continuous new content. The document outlines challenges in building sentiment classifiers from social media data due to the need for large labeled training datasets but few available labels. It describes using emoticons and sentiment dictionaries to implicitly label some data for use in classifier training and evaluates the agreement between these labeling methods.
This document provides an introduction to machine learning, including definitions of key concepts like supervised vs. unsupervised learning. It discusses traditional machine learning assumptions like having fully labeled small datasets and stationary data. However, it notes real-world data often has huge amounts of unlabeled or streaming data. It emphasizes tackling these challenges with approaches like semi-supervised learning and stream mining. Overall, the document aims to give an overview of machine learning tasks and algorithms, highlighting the need to select the right methods for problems while working with domain experts on complex real-world data.
Density-based Projected Clustering over High Dimensional Data StreamsEirini Ntoutsi
The document presents HDDStream, a new density-based projected clustering algorithm for clustering high dimensional streaming data. It models microclusters with a content summary and dimension preference vector. It defines core, potential core, and outlier microclusters based on density, radius, and dimensionality criteria. It shows HDDStream improves clustering quality over existing approaches and can detect changes and be used for monitoring by adapting the number of summaries to the underlying population.
Density Based Subspace Clustering Over Dynamic DataEirini Ntoutsi
The document summarizes a presentation on incremental subspace clustering over dynamic data. It introduces the challenges of high dimensionality and dynamic data. It describes the PreDeCon algorithm for static subspace clustering and proposes an incremental version called incPreDeCon that maintains density-based subspace clusters as new data arrives over time in a streaming fashion. Experimental results show that incPreDeCon requires significantly fewer range queries than recomputing clusters from scratch using PreDeCon as new data is inserted.
Summarizing Cluster Evolution in Dynamic EnvironmentsEirini Ntoutsi
The document presents a framework for summarizing cluster evolution over data streams called FINGERPRINT. It proposes representing cluster changes across time as an evolution graph and then condensing this graph into a FINGERPRINT structure. The FINGERPRINT uses virtual centers to summarize similar clusters and traces of cluster survivals, reducing information loss and space requirements. Experiments on three datasets show the framework achieves significant space reductions with minimal information loss by varying a threshold parameter. The authors discuss avenues for future work, like summarizing splits and absorptions in addition to survivals.
Build applications with generative AI on Google CloudMárton Kodok
We will explore Vertex AI - Model Garden powered experiences, we are going to learn more about the integration of these generative AI APIs. We are going to see in action what the Gemini family of generative models are for developers to build and deploy AI-driven applications. Vertex AI includes a suite of foundation models, these are referred to as the PaLM and Gemini family of generative ai models, and they come in different versions. We are going to cover how to use via API to: - execute prompts in text and chat - cover multimodal use cases with image prompts. - finetune and distill to improve knowledge domains - run function calls with foundation models to optimize them for specific tasks. At the end of the session, developers will understand how to innovate with generative AI and develop apps using the generative ai industry trends.
Open Source Contributions to Postgres: The Basics POSETTE 2024ElizabethGarrettChri
Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...Social Samosa
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeWalaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
The Ipsos - AI - Monitor 2024 Report.pdfSocial Samosa
According to Ipsos AI Monitor's 2024 report, 65% Indians said that products and services using AI have profoundly changed their daily life in the past 3-5 years.
Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM
1. Discovering and monitoring product features
and the opinions on them with the
OPINSTREAM framework
Max Zimmermann1, Eirini Ntoutsi2,
Myra Spiliopoulou1
Hellenic Data Management Symposium
July 2014
1 2 Institute for Informatics, Ludwig‐
Maximilians‐Universität (LMU)
München, Germany
2. o Opinions from TripAdvisor on Vienna Marriott Hotel
2/5/2014: Great hotel, very nice rooms, perfect location, very nice staffs except for a mid‐aged
female receptionist who tried to charge me extra for wifi fees when checking out. It was waived
at the desk when I checked‐in. And she started treating me with an attitude after she found out
that I got a great deal through priceline.com. ….
26/1/2014: Spent a long weekend here. Rooms clean and functional without being spectacular
and a nice pool etc. Staff in pool weren't Good and I found them actually quite rood. Executive
lounge was ok and not busy but selection of wine and beer wasn't great. The reception has
many shops and a bar at the end which kind of males it feel like a shopping centre. Overall great
for business travel but not sure id come again for leisure.
7/5/2013: The Vienna Marriott has all you expect; no frills, but solid service and they get all the
basic stuff done right.
It's in a fine location, maybe 10 minute walk from the major city attractions while being in a
quiet area. Breakfast buffet exceptional and good fitness center. Very helpful and happy staff.
Lobby lounge just okay. Not a good wine selection and the Sinatra‐like singer adds nothing.
Maybe just a little more expensive than it should be, too.
3. • Opinionated streams
• Opinion stream mining & the OPINSTREAM framework
• Extracting an initial opinionated hierarchy of product (sub)features
• Online (sub)feature hierarchy maintenance
• Online opinion classifier maintenance
• Experiments
• Summary
4. o In a static setting, i.e., given a collection of opinionated documents:
• How to extract the different, ad hoc aspects discussed in the collection?
• How to associate a dominant polarity to each aspect?
o When the set becomes a stream
• How to extract the ad hoc aspects over time?
• How to update their sentiment over time?
o The OPINSTREAM framework
• Feature discovery -> adaptive unsupervised learning (clustering)
    Features are defined as cluster centroids
    Features evolve over time
• Polarity learning -> adaptive semi‐supervised learning (classification)
    Majority voting within the cluster for the polarity
    Learning under concept drift
    Learn with a small amount of labeled reviews
Polarized aspect
5. o Reviews arrive at distinct timepoints t0, t1, …., ti, …. in batches of fixed size.
o We don’t receive class labels from the stream; the only sentiment‐related
information we assume is an initial seed set S of labeled reviews.
o Reviews are subject to ageing according to the exponential ageing function
• Recent reviews are more important for learning
• Gradual decrease of the weight of a review over time
• Ageing affects both clustering (feature extraction) and classification (sentiment learning)
o The importance of a review r with respect to a set of reviews R, is defined as the
number of reviews in R that have r among their kNN, also considering age.
• For the kNN, cosine similarity is employed
o Important reviews Rβ: those exceeding the review importance threshold β.
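The stream model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the decay rate `decay`, the function names, and the choice to weight each kNN "vote" by the age weight of the voting review are hypothetical, since the slide gives neither the ageing formula nor the exact way age enters the importance score.

```python
import math

def age_weight(t_now, t_review, decay=0.1):
    """Exponential ageing: 1.0 for a brand-new review, shrinking with age.
    The decay rate is an assumed parameter."""
    return math.exp(-decay * (t_now - t_review))

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def importance(r_idx, vectors, timestamps, t_now, k=2, decay=0.1):
    """Age-weighted count of reviews that have review r_idx among their kNN.
    Assumption: each vote counts with the age weight of the voting review."""
    score = 0.0
    for i, vec in enumerate(vectors):
        if i == r_idx:
            continue
        others = [j for j in range(len(vectors)) if j != i]
        others.sort(key=lambda j: cosine(vec, vectors[j]), reverse=True)
        if r_idx in others[:k]:
            score += age_weight(t_now, timestamps[i], decay)
    return score

def important_reviews(vectors, timestamps, t_now, beta, k=2, decay=0.1):
    """Indices of reviews whose importance exceeds the threshold beta."""
    return [i for i in range(len(vectors))
            if importance(i, vectors, timestamps, t_now, k, decay) > beta]
```

With `decay=0` the score reduces to a plain count of reverse nearest neighbours; a larger `decay` makes old reviews contribute less to the importance of their neighbours.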
6. • Opinionated streams
• Opinion stream mining & the OPINSTREAM framework
• Extracting an initial opinionated hierarchy of product (sub)features
• Online (sub)feature hierarchy maintenance
• Online opinion classifier maintenance
• Experiments
• Summary
7. Initial seed set S (labeled)
Extract clusters: global/ 1st level clusters
Cluster 1: battery (battery, weight, life, …)
Cluster 2: picture (picture, quality, size, …)
…
Cluster k: … (Label k)
Clustering process
• Extract important reviews R from S.
• Define dimensions F as all nouns in R
• Vectorize S over F through TFIDF.
• Apply fuzzy c‐means to derive the 1st level clusters.
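The clustering process above (vectorize over noun dimensions, then fuzzy c‐means) could look roughly as follows. This is a minimal pure-Python sketch, not the authors' implementation: noun extraction is assumed to have happened already (the vocabulary `vocab` is passed in), and the fuzzifier `m`, the Euclidean distance, the iteration count and the random initialization are all assumptions.

```python
import math
import random

def tfidf(docs, vocab):
    """TF-IDF vectors (lists aligned with vocab) for tokenized documents."""
    n = len(docs)
    df = {w: sum(1 for d in docs if w in d) for w in vocab}
    vectors = []
    for d in docs:
        vec = []
        for w in vocab:
            idf = math.log(n / df[w]) if df[w] else 0.0
            vec.append(d.count(w) * idf)
        vectors.append(vec)
    return vectors

def fuzzy_c_means(points, c, m=2.0, iters=50, seed=0):
    """Plain fuzzy c-means; returns (membership matrix, centroids)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        u.append([x / s for x in row])
    centroids = []
    for _ in range(iters):
        # Centroid update: mean of the points weighted by membership^m.
        centroids = []
        for j in range(c):
            weights = [u[i][j] ** m for i in range(n)]
            total = sum(weights) or 1.0
            centroids.append([
                sum(weights[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)
            ])
        # Membership update: inverse-distance weighting with exponent 2/(m-1).
        for i in range(n):
            dists = [math.dist(points[i], centroids[j]) for j in range(c)]
            if any(d == 0.0 for d in dists):
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                s = sum(hits)
                u[i] = [h / s for h in hits]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                s = sum(inv)
                u[i] = [x / s for x in inv]
    return u, centroids
```

Each row of the returned membership matrix sums to 1, so a review can belong to several features to different degrees; taking the column with the largest membership gives a crisp assignment when one is needed.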
8. Initial seed set S (labeled)
Extract clusters
Cluster 1: battery (battery, weight, life, …)
Cluster 2: picture (picture, quality, size, …)
…
Extract subclusters: local/ 2nd level clusters
subCluster 11: battery weight
subCluster 12: battery life
subCluster 21: picture quality
subCluster 22: picture size
…
Similar to 1st level clustering, though the dataset is the population of each cluster.
Clustering process
• Extract important reviews R from cluster c.
• Define dimensions F as all nouns in R
• Vectorize S over F through TFIDF.
• Apply fuzzy c‐means to derive the subclusters of c.
9. Definition: Polarized feature
The polarized feature represented by a cluster c defined in a feature space FR consists of:
• The centroid vector <w1, …, w|FR|>, where wi is the average TF‐IDF of word i in c.
• The polarity label cpolarity, i.e., the majority class label of the reviews in c.
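As a small illustration of the definition above, the following Python sketch derives a polarized feature from the reviews of one cluster. The function name and the crisp (non-fuzzy) cluster membership are assumptions made for brevity.

```python
from collections import Counter

def polarized_feature(vectors, labels):
    """Return (centroid, polarity) for one cluster.

    vectors: TF-IDF vectors (equal-length lists) of the reviews in the cluster.
    labels:  class label of each review in the cluster.
    """
    dim = len(vectors[0])
    # Centroid: average TF-IDF weight per dimension.
    centroid = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    # Polarity: majority class label among the cluster's reviews.
    polarity = Counter(labels).most_common(1)[0][0]
    return centroid, polarity
```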
Topic‐specific classifiers
o Within each (sub)cluster, a topic‐specific classifier is trained based on the
corresponding instances from the initial (labeled) seed set S.
o We focus on adverbs and adjectives as sentiment words.
o Multinomial Naïve Bayes (MNB) sentiment learner
• Laplace correction factor for unknown words
10. Here is what the hierarchy looks like (ignore the containers for the moment)
11. • Opinionated streams
• Opinion stream mining & the OPINSTREAM framework
• Extracting an initial opinionated hierarchy of product (sub)features
• Online (sub)feature hierarchy maintenance
• Online opinion classifier maintenance
• Experiments
• Summary
12. o So far, we discussed the initialization of the polarized hierarchy.
o How do we place a newly arriving (unlabeled) review r in the hierarchy?
Is r novel?
No:
• Assign it to its closest (sub)cluster w.r.t. δg, δl
• Update centroid
• Learn its polarity via its topic‐specific classifier.
Yes:
• Difficult to decide whether it is an outlier or the beginning of a new feature.
• Wait and accumulate evidence until a decision can be made
• Novel reviews are maintained in containers
Definition: Review novelty
A review r is novel w.r.t. a set of clusters Θ if its cosine similarity to the closest cluster centroid in Θ is less than δ (0 ≤ δ ≤ 1).
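The novelty definition translates almost directly into code. The sketch below assumes reviews and centroids are sparse vectors stored as Python dicts; the function names are illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def is_novel(review, centroids, delta):
    """True if the similarity to the closest centroid is below delta."""
    best = max((cosine(review, c) for c in centroids), default=0.0)
    return best < delta
```

For example, a review that only mentions "wifi" is novel with respect to centroids about "battery" and "picture" for any δ > 0, because its similarity to both is 0.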
13. Global container: accumulates reviews that are novel w.r.t. global clusters
Local container (1 per global cluster): accumulates reviews that are close to its cluster centroid but far away from all centroids of its subclusters
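The interplay of global and local containers can be sketched as a routing function. The data layout (a list of dicts with `centroid`, `subclusters` and `container` keys) and the function name `place` are hypothetical; the centroid update and polarity prediction that follow a successful assignment are left out.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def place(review, hierarchy, global_container, delta_g, delta_l):
    """Route a review to a subcluster, a local container or the global container."""
    # Closest global cluster.
    g_idx = max(range(len(hierarchy)),
                key=lambda i: cosine(review, hierarchy[i]["centroid"]))
    cluster = hierarchy[g_idx]
    if cosine(review, cluster["centroid"]) < delta_g:
        global_container.append(review)       # novel w.r.t. all global clusters
        return ("global_container",)
    # Closest subcluster inside the chosen global cluster.
    subs = cluster["subclusters"]
    s_idx = max(range(len(subs)),
                key=lambda j: cosine(review, subs[j]["centroid"]))
    if cosine(review, subs[s_idx]["centroid"]) < delta_l:
        cluster["container"].append(review)   # close to cluster, far from subclusters
        return ("local_container", g_idx)
    return ("subcluster", g_idx, s_idx)
```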
14. o Adapt the hierarchy when enough novelty has been accumulated
• Question 1: How to define adequate novelty?
• Question 2: How to incorporate novel reviews in the hierarchy?
o Always‐reclustering approach: if the size of a container exceeds a threshold, reorganize its corresponding local/global clusters from scratch (reclustering)
• Results in a lot of local/global reclusterings
• A lot of effort for the end user to re‐comprehend the hierarchy
• Such drastic changes should be avoided unless necessary.
o Merge‐or‐reclustering approach: instead of a drastic change like reclustering, first try to merge containers into their corresponding clusters.
• Rationale: Clusters and containers might “start moving towards each other” due to ageing.
• Merge option 1: Merge b and c w.r.t. the feature space of c
• Merge option 2: Merge b and c w.r.t. a new feature space derived from b ∪ c
15. o Check model quality before and after the merge
• Cluster description length as quality indicator
o The cluster description length (CDL) of a cluster c defined over dimensions Dc:
o Merge strategy 1: Do we gain in quality if we merge c with b, while retaining the
feature space of c?
o Merge strategy 2: Do we gain in quality if we merge c with b, using a new set of
dimensions derived from b and c?
This actually results in a new cluster.
16. o We should keep the number of rebuilds low to minimize the effort of the end user
• The mental effort is modeled by fatigue.
Definition: Fatigue
Let M(t) be the hierarchy at t and n the number of reviews contained in its clusters.
Fatigue is defined as the percentage of reviews involved in rebuilt clusters (computed over the sets of clusters involved in rebuilds).
o At the end of each batch and for each global cluster
• Try Merge Strategy I
• If not satisfied, try Merge Strategy II
o Compute fatigue (based on all global clusters for which Merge Strategy II is true)
• If fatigue < γ, go on with local reclusterings
• Otherwise, rebuild the whole hierarchy from scratch.
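The fatigue check at the end of a batch can be illustrated as follows. The sketch takes the verbal definition literally (share of reviews that sit in clusters involved in rebuilds); the dictionary layout and the function names are assumptions.

```python
def fatigue(cluster_sizes, rebuilt):
    """Fraction of all clustered reviews that belong to rebuilt clusters.

    cluster_sizes: dict mapping cluster id -> number of reviews in it.
    rebuilt:       ids of the clusters involved in rebuilds.
    """
    n = sum(cluster_sizes.values())
    involved = sum(cluster_sizes[c] for c in rebuilt)
    return involved / n if n else 0.0

def end_of_batch_action(cluster_sizes, rebuilt, gamma):
    """Local reclustering while fatigue stays below gamma, full rebuild otherwise."""
    if fatigue(cluster_sizes, rebuilt) < gamma:
        return "local reclustering"
    return "rebuild hierarchy"
```

With cluster sizes 60, 30 and 10, rebuilding only the smallest cluster touches 10% of the reviews, which stays below a threshold of γ = 0.3; rebuilding the two larger clusters touches 90% and triggers a full rebuild.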
17. • Opinionated streams
• Opinion stream mining & the OPINSTREAM framework
• Extracting an initial opinionated hierarchy of product (sub)features
• Online (sub)feature hierarchy maintenance
• Online opinion classifier maintenance
• Experiments
• Summary
18. o Static Multinomial Naïve Bayes
o Probability of class c for word wi, estimation based on seed set S:
Laplace correction for
unknown words
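The formula on this slide did not survive the transcript. For orientation, a textbook Multinomial Naïve Bayes with Laplace (add-one) correction is sketched below; it is the standard formulation and may differ in detail from the estimator used in OPINSTREAM. Class and method names are illustrative.

```python
import math
from collections import Counter, defaultdict

class MultinomialNB:
    """Textbook Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_docs = Counter()              # documents per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.vocab = set()

    def fit(self, docs, labels):
        for words, label in zip(docs, labels):
            self.class_docs[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def word_prob(self, word, label):
        """P(word | class); the +1 keeps unknown words from zeroing the product."""
        counts = self.word_counts[label]
        return (counts[word] + 1) / (sum(counts.values()) + len(self.vocab))

    def predict(self, words):
        total = sum(self.class_docs.values())

        def score(label):
            prior = math.log(self.class_docs[label] / total)
            return prior + sum(math.log(self.word_prob(w, label)) for w in words)

        return max(self.class_docs, key=score)
```

In the topic‐specific setting, one such classifier would be trained per (sub)cluster on the seed reviews assigned to it, using only the adverbs and adjectives of each review as `words`.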
19. o Backward adaptation version
• Old reviews have gradually less effect on the classifier
o Effect of age in the word counts
o Effect of age in the probability estimates
Correction factor
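The age-dependent formulas are missing from the transcript. One plausible reading of backward adaptation, given here purely as an assumption, is that each word occurrence contributes its review's exponential age weight instead of 1 to the word counts, and the probability estimates are then computed from these weighted counts with an add-one correction. All names and the decay rate are hypothetical.

```python
import math
from collections import defaultdict

def aged_word_counts(docs, labels, timestamps, t_now, decay=0.1):
    """Per-class word counts where each occurrence is weighted by its review's age."""
    counts = defaultdict(lambda: defaultdict(float))
    for words, label, t in zip(docs, labels, timestamps):
        weight = math.exp(-decay * (t_now - t))  # older review -> smaller weight
        for word in words:
            counts[label][word] += weight
    return counts

def aged_word_prob(counts, vocab_size, word, label):
    """Smoothed P(word | class) computed from the age-weighted counts."""
    class_counts = counts[label]
    total = sum(class_counts.values())
    return (class_counts.get(word, 0.0) + 1.0) / (total + vocab_size)
```

Under this reading, a word seen once in a fresh review counts more than the same word seen once in an old review, so the classifier gradually forgets outdated sentiment vocabulary.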
20. o Forward adaptation version
• Labels are expensive, can we extend the initial seed set S?
• Yes, by adding useful reviews to S
o Definition: Usefulness of a review
Let d be a new review, to which Δ(S) assigns the label c. The usefulness of d is:
o A document is useful if its usefulness exceeds a threshold α in (‐1,0]
• ~0 values promote smooth adaptation (d should “agree” with existing model)
• ~‐1 values promote diversity
VS ={perfect, friendly}
d ={expensive, noisy}
VS ={perfect, friendly, expensive, noisy}
21. • Opinionated streams
• The OPINSTREAM framework
• Extracting an initial opinionated hierarchy of product (sub)features
• Online hierarchy maintenance
• Online opinion classifier maintenance
• Experiments
• Summary
22. o Yu et al. dataset: 12,825 reviews on 327 product features
• The number of features varies strongly from one batch to the next, making adaptation challenging
• High entropy values indicate that there is no clear sentiment per batch
23. o Kg=9, δg=0.1, σg=500; Kl=9, δl=0.2, σl=150;
o BaselineClu: is the container size‐based approach
Smooth adaptation leads to clusters of better quality
24. o Kg=6, δg=0.2, σg=500; Kl=6, δl=0.3, σl=150;
o BaselineClas: non‐adaptive semi‐supervised approach
o Silva: the classification rule learner by Silva et al.
Performance oscillates for all methods; OPINSTREAM and BaselineClas perform similarly and better than Silva.
25. o We presented OPINSTREAM, a framework for the extraction of implicit features
and their associated polarities from a stream of reviews.
o Feature extraction -> unsupervised task (stream clustering)
• Accumulate novelty in containers
• Cluster description length as an indicator of cluster quality
• Fatigue for quantifying the mental effort caused by rebuilds
o Sentiment learning -> supervised/semi‐supervised task (stream classification)
• Learn with only an initial seed of labeled documents
• Backward adaptation to account for ageing
• Forward adaptation to expand the seed set by incorporating useful documents
o Ongoing/future work
• Simplifying the framework, parameter tuning, new datasets/domains
• Semi‐supervised sentiment learning
• Evaluation of backward propagation in fully supervised learning
• Evaluation of polarity and aspect learning simultaneously
27. Original class distribution
Twitter sentiment dataset, 1.6M tweets
28. o Product reviews dataset: 540 reviews on 9 products, where each review refers to 1 out of 38 (implicit) product features.
Label & Polarity evolution per cluster: polarity is captured as a shade of gray, where dark stands for positive