Determine the sentiment of a sentence as positive or negative based on the part-of-speech tags and the emoticons present in the sentence. For this research we use Twitter, the most popular microblogging site, for sentiment orientation. In this paper we extract tweets from Twitter related to products such as mobile phones, home appliances, and vehicles. After retrieving the tweets we apply several preprocessing steps: we remove retweets, remove tweets containing fewer than five words, and remove tweets containing only URLs. The remaining tweets are then normalized: all letters are converted to lower case, punctuation is removed (since it reduces the accuracy of the results), and extra white space is stripped. We then apply a POS tagger to tag each word, so that after these steps each token is represented as a tuple (word, POS tag, English-word, stop-word). We are interested only in tweets that contain an opinion, and we eliminate the remaining non-opinion tweets from the data set; for this we use the Naïve Bayes classification algorithm. We then address short-text classification on tweets, i.e., the problem that a word can have different meanings in different domains. To solve this problem we use two feature-selection algorithms: mutual information (MI) and χ² (chi-squared) feature selection. At the final stage we predict the orientation of an opinion sentence as positive or negative, as mentioned above, using two models: a unigram model and an opinion miner.
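The filtering and normalization steps described above can be sketched in Python. This is a minimal illustration only: the regular expressions, the `RT ` retweet convention, and the exact URL test are assumptions standing in for the paper's implementation, and the POS-tagging and Naïve Bayes stages are omitted.

```python
import re

MIN_TOKENS = 5  # threshold from the paper: drop tweets with fewer than five words

def clean_tweet(text):
    """Normalization steps: lowercase, strip punctuation, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s@#]", " ", text)    # remove punctuation (keep @ and #)
    text = re.sub(r"\s+", " ", text).strip()  # collapse extra white space
    return text

def keep_tweet(text):
    """Filtering rules: drop retweets, URL-only tweets, and very short tweets."""
    if text.startswith("RT "):  # assumed retweet marker
        return False
    words = [w for w in text.split() if not w.startswith("http")]
    if not words:               # tweet contained only URLs
        return False
    return len(words) >= MIN_TOKENS

tweets = [
    "RT @user great phone!",
    "http://example.com",
    "Loving the new phone, battery life is amazing after one week!",
]
kept = [clean_tweet(t) for t in tweets if keep_tweet(t)]
print(kept)  # only the third tweet survives filtering
```

In the paper's pipeline, the cleaned tokens would then be passed through a POS tagger and the Naïve Bayes opinion filter before feature selection.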
Sentiment Analysis on Amazon Movie Reviews Dataset - Maham F'Rajput
The document summarizes a project analyzing sentiment in Amazon movie reviews using machine learning techniques. It discusses gathering an Amazon movie reviews dataset containing over 8 million reviews spanning 10+ years. The project aims to help users make more informed decisions about movies by calculating sentiment scores for each review and movie, along with point-wise mutual information scores. Experimental results show that the sentiment analysis produces accurate results on the Amazon Movie Reviews dataset, despite requiring some human labeling effort. The document outlines the problem statement, introduction, data collection, model selection, results, and areas for potential improvement.
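The point-wise mutual information score mentioned above measures how strongly a word co-occurs with a sentiment label. A minimal sketch, using a tiny hypothetical labeled corpus (the reviews and labels below are invented for illustration, not drawn from the Amazon dataset):

```python
import math
from collections import Counter

# Hypothetical labeled reviews, for illustration only
reviews = [
    ("great movie loved it", "pos"),
    ("great acting great plot", "pos"),
    ("terrible movie hated it", "neg"),
    ("boring and terrible", "neg"),
]

word_label = Counter()   # counts of (word, label) pairs
word_count = Counter()   # counts of each word
label_count = Counter()  # token count per label
total = 0
for text, label in reviews:
    for w in text.split():
        word_label[(w, label)] += 1
        word_count[w] += 1
        label_count[label] += 1
        total += 1

def pmi(word, label):
    """PMI(word, label) = log2( P(word, label) / (P(word) * P(label)) )."""
    p_joint = word_label[(word, label)] / total
    p_word = word_count[word] / total
    p_label = label_count[label] / total
    return math.log2(p_joint / (p_word * p_label))

print(pmi("great", "pos"))  # positive: "great" co-occurs with positive labels
```

A positive PMI means the word appears with that label more often than chance would predict, which is what lets per-word scores roll up into a sentiment score for a review.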
This document provides an overview of opinion mining and sentiment analysis. It defines opinion mining as attempting to automatically determine human opinion from natural language text. It discusses some key applications, such as classifying reviews and understanding public opinion. The document also outlines some challenges, such as understanding context and differing domains. It then describes common models for sentiment analysis, including preparing data, analyzing reviews linguistically, and classifying sentiment using techniques like machine learning classifiers.
Natural Language Processing in Artificial Intelligence.
What is the basic concept of text normalization? How does it work while processing human languages? What is the difference between stemming and lemmatization? How do Term Frequency and Inverse Document Frequency contribute to TF-IDF?
This version of the NLP PPT contains updated content. In the earlier one, the stemming and lemmatization processes were not taken into consideration when working with the Bag of Words algorithm; this PPT includes those corrections.
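How term frequency and inverse document frequency combine into a TF-IDF weight can be shown with a short sketch. The toy corpus below is an illustrative assumption, and note that real implementations differ in log base and smoothing choices:

```python
import math

docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats are pets",
]

def tf(term, doc_tokens):
    # term frequency: how often the term appears in this document
    return doc_tokens.count(term) / len(doc_tokens)

def idf(term, corpus):
    # inverse document frequency: terms rare across the corpus score higher
    df = sum(1 for d in corpus if term in d.split())
    return math.log(len(corpus) / df)

def tfidf(term, doc, corpus):
    return tf(term, doc.split()) * idf(term, corpus)

# "the" appears in two of three documents, so its IDF is low;
# "mat" appears in only one document, so it is weighted higher.
print(tfidf("the", docs[0], docs))
print(tfidf("mat", docs[0], docs))
```

The key point the slides make is visible here: TF rewards terms frequent within a document, while IDF discounts terms common across the whole corpus.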
This document provides an overview of natural language processing (NLP) and the use of deep learning for NLP tasks. It discusses how deep learning models can learn representations and patterns from large amounts of unlabeled text data. Deep learning approaches are now achieving superior results to traditional NLP methods on many tasks, such as named entity recognition, machine translation, and question answering. However, deep learning models do not explicitly model linguistic knowledge. The document outlines common NLP tasks and how deep learning algorithms like LSTMs, CNNs, and encoder-decoder models are applied to problems involving text classification, sequence labeling, and language generation.
Big Data and Natural Language Processing - Michel Bruley
Natural Language Processing (NLP) is the branch of computer science focused on developing systems that allow computers to communicate with people using everyday language.
The document provides a mini review of chatbots, from the early ELIZA chatbot created in 1966 to modern conversational agents like Alexa. It summarizes the key developments in chatbots, including ELIZA which used simple pattern matching to simulate conversations, early natural language processing chatbots like Jabberwacky and Dr. Sbaitso, and modern voice assistants from Apple, Google, Microsoft and Amazon that incorporate more advanced AI techniques. The implications of the original ELIZA chatbot are discussed, namely the tendency of users to perceive computer systems as more intelligent than their underlying programming allows.
From Natural Language Processing to Artificial Intelligence - Jonathan Mugan
Overview of natural language processing (NLP) from both symbolic and deep learning perspectives. Covers tf-idf, sentiment analysis, LDA, WordNet, FrameNet, word2vec, and recurrent neural networks (RNNs).
Foundations: Understanding Users and Interactions - Preeti Mishra
This document discusses qualitative user research methods. It explains that qualitative research helps understand user behavior, which is too complex to understand solely through quantitative data. Qualitative research methods include interviews, observation, and persona creation. Personas are fictional user archetypes created from interview data to represent different types of users. They are useful for product design by providing empathy for users and guiding decisions. The document provides details on creating personas and using scenarios to represent how personas would interact with a product.
The presentation discusses relationship elements in RDA, which have increased significantly in the new RDA Toolkit. Relationship elements describe associations between entities, with each entity playing a domain or range role. The presentation explores where RDA relationships originated from the FRBR entity-relationship model. It examines the related corporate body relationship in detail and briefly discusses other new relationship concepts like meta-data works. The goals are to help participants better understand relationships and explore additional elements in the new RDA Toolkit.
This document discusses the semantic web and its graphical representation of information. It begins with an introduction to the topic and a discussion of the need to change how the internet is used from a "web of documents" to a "web of data." It then explains key concepts of the semantic web including what semantic means, what the semantic web is, how it can be built, and the differences between the original web and semantic web. Applications of the semantic web are provided like knowledge graphs, information verification, and social networks. The document concludes by discussing the future of the semantic web.
Language is Infrastructure, for InteractConf London 2014 - Andrew Hinton
I had the pleasure of speaking at Interact London in October 2014. I presented an updated version of this talk, which I originally gave at IA Summit earlier in the spring. The talk is based on content from my book, Understanding Context. You can read more about it at http://contextbook.com.
In this version, I have updated the way I'm talking about how language works as environment: instead of 'semantic affordance' I'm now calling it 'semantic function.' (Which is in keeping with how it's now being described in the book.)
Metaphic, or the Art of Looking Another Way - Suresh Manian
For all intents and purposes, we are our words. And verbs and adjectives capture actions and sentiments better than any other tool. Metaphic is premised on the belief that a grammar book and a calculator are all you really need to make sense of web search and social media chatter, apart from all text, in general.
This document presents a project report on sarcasm analysis using machine learning techniques. It discusses how sarcasm detection is a challenging task in natural language processing due to the gap between the literal and intended meaning of sarcastic texts. The report outlines a methodology to detect sarcasm in tweets by extracting features like intensifiers and interjections and training machine learning classifiers. Naive Bayes, maximum entropy, and decision tree classifiers are tested, with decision trees achieving the highest accuracy of 63%. The conclusion discusses how accuracy could be improved by incorporating better features, and future work includes adding context and detecting sarcasm in other languages.
COMM 100 Mass Media and Society Reflection Paper Outline, Guest S.docx - monicafrancis71118
COMM 100 Mass Media and Society Reflection Paper Outline
Guest Speech Reflection Paper Outline
The reflection paper should be between 2 to 4 pages. The paper should be written in narrative form and include the following two elements (do not number them in your essay). I have provided some fictitious examples for each part of the narrative, however these are merely suggestions for what the narrative might look like and do not prescribe a specific approach. Remember, you must have both elements in your paper. Each element counts 10 points, and the structure, organization, transition, and grammar of the paper count 5 points.
1. Briefly summarize some of the key points of the presentation, and comment on them. (10 points)
For example: A social virtual reality (SVR) exists when the user feels that the environment is "real" and the user has some sense of presence in the environment. If these two conditions are not met, then the experience is not a SVR. Presence was defined as the effect the user has of feeling outside the physical body and inside a virtual body. I took this emphasis on user perception to mean that if there is more than one person in an environment at the same time (picture a virtual chat room with avatars) one person may perceive it as SVR and the other might not. For me there was ambiguity in this definition, …
2. Discuss your reactions to or impressions of the presentation. This could include thoughts about extending the research, your reaction to the talk, your observation about others’ reaction to the talk or anything else that relates to the presentation (10 points).
My final thoughts reflect on the question about the "dark side" of technological communication. Given that technology is wrought from human endeavor, I believe a dark side would exist. I think though, that if its dangers are identified and recognized that the negative effects could be mitigated. The example raised dealt with hate groups, and their ability to create a presence and rally supporters. This is a valid concern and does need to be addressed by social scientists in many fields including those in computer mediated communication. To me one of the first defenses…
Natural Language Processing PPT Presentation - Sai Mohith
A PPT presentation for a technical seminar on the topic of Natural Language Processing.
References used:
Slideshare.net
wikipedia.org NLP
Stanford NLP website
Semantic analysis is the process of machines understanding relationships between words and concepts in text to derive meaning. It involves analyzing grammatical structure and identifying connections between individual words in context. Semantic analysis tools can automatically extract meaningful information from unstructured data like emails and customer feedback. Machine learning algorithms are trained using samples of text labeled with semantic information like word meanings, relationships between entities, and more to enable accurate text analysis. The results can then be used for tasks like text classification, sentiment analysis, intent analysis, keyword and entity extraction.
This document provides a live transcript of an ALA webinar on relationship elements in RDA. The webinar host introduces the presenter, Thomas Brenndorfer, and covers some logistical information. Brenndorfer then begins his presentation on relationship elements, discussing how they have been treated more consistently in the new RDA toolkit compared to the original. He outlines the topics that will be covered, including the concepts of relationships, a case study on a specific relationship element, and how relationship elements tie into other new RDA concepts.
The document discusses the use of neural networks and deep learning techniques like word2vec and seq2seq models to develop representations of language that computers can understand without explicit symbolic representations or rules. It notes that while these techniques have achieved success, computers still lack a grounded understanding of language and the ability to reason about language based on real-world experiences and commonsense knowledge.
This is a deck I would often use highlighting the mess of website irrelevance that I call today's Microsoft.com and its associated sites.
There is way too much noise and not enough signal, and the deck hopefully highlights one slice of this reasoning.
Beyond Buzz - Web 2.0 Expo - K. Niederhoffer & M. Smith - kategn
A framework to measure a conversation based on approaches from social psychology and sociology. Beyond quantity of buzz, we propose measuring the context of conversation: the signal, person, role, and ecosystem.
Phil 2 Puzzles and Paradoxes, Prof. Sven B.docx - lorainedeserre
This document discusses Grelling's Paradox, which is a semantic paradox similar to the liar paradox. It defines the terms "heterological" and "autological" and examines whether the term "heterological" is itself heterological. It leads to a contradiction, as both assuming that "heterological" is and is not heterological results in a contradiction. The document then shifts topics to discuss future trends in training and development, including increased use of new technologies, sustainability initiatives, and advances in areas like neuroscience and data analysis that will influence the field.
Rules For Writing Numbers: Know When To Spell Them Out (YourDictionary) - Allison Thompson
Internal equity refers to fairness and consistency within an organization's compensation system. It means that employees in similar jobs with similar levels of skills, effort, responsibility, and working conditions should be paid fairly in comparison to one another.
Job evaluation is a process used by organizations to analyze and compare the relative value of different jobs. It helps establish internal equity by determining appropriate pay ranges or grades for different jobs based on factors like skill, effort, responsibility, and working conditions. Job evaluation helps ensure that pay is aligned with job content and responsibilities, thus supporting internal equity within the compensation system.
2. What are the main steps in conducting a job evaluation?
The main steps in conducting a job evaluation include:
1. Selecting
Natural Language Processing (NLP) is a branch of artificial intelligence that enables computers to understand, interpret, and generate human language. NLP analyzes text to determine meaning and relationships between words in order to automatically perform tasks like translation, information extraction, and sentiment analysis. Common applications of NLP include virtual assistants, chatbots, language translation, text extraction, and sentiment analysis of customer feedback.
Similar to Web & Social Media Analystics - Workshop Semantica (20)
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"sameer shah
Embark on a captivating financial journey with 'Financial Odyssey,' our hackathon project. Delve deep into the past performance of two companies as we employ an array of financial statement analysis techniques. From ratio analysis to trend analysis, uncover insights crucial for informed decision-making in the dynamic world of finance."
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Kaxil Naik
Navigating today's data landscape isn't just about managing workflows; it's about strategically propelling your business forward. Apache Airflow has stood out as the benchmark in this arena, driving data orchestration forward since its early days. As we dive into the complexities of our current data-rich environment, where the sheer volume of information and its timely, accurate processing are crucial for AI and ML applications, the role of Airflow has never been more critical.
In my journey as the Senior Engineering Director and a pivotal member of Apache Airflow's Project Management Committee (PMC), I've witnessed Airflow transform data handling, making agility and insight the norm in an ever-evolving digital space. At Astronomer, our collaboration with leading AI & ML teams worldwide has not only tested but also proven Airflow's mettle in delivering data reliably and efficiently—data that now powers not just insights but core business functions.
This session is a deep dive into the essence of Airflow's success. We'll trace its evolution from a budding project to the backbone of data orchestration it is today, constantly adapting to meet the next wave of data challenges, including those brought on by Generative AI. It's this forward-thinking adaptability that keeps Airflow at the forefront of innovation, ready for whatever comes next.
The ever-growing demands of AI and ML applications have ushered in an era where sophisticated data management isn't a luxury—it's a necessity. Airflow's innate flexibility and scalability are what makes it indispensable in managing the intricate workflows of today, especially those involving Large Language Models (LLMs).
This talk isn't just a rundown of Airflow's features; it's about harnessing these capabilities to turn your data workflows into a strategic asset. Together, we'll explore how Airflow remains at the cutting edge of data orchestration, ensuring your organization is not just keeping pace but setting the pace in a data-driven future.
Session in https://budapestdata.hu/2024/04/kaxil-naik-astronomer-io/ | https://dataml24.sessionize.com/session/667627
Build applications with generative AI on Google CloudMárton Kodok
We will explore Vertex AI - Model Garden powered experiences, we are going to learn more about the integration of these generative AI APIs. We are going to see in action what the Gemini family of generative models are for developers to build and deploy AI-driven applications. Vertex AI includes a suite of foundation models, these are referred to as the PaLM and Gemini family of generative ai models, and they come in different versions. We are going to cover how to use via API to: - execute prompts in text and chat - cover multimodal use cases with image prompts. - finetune and distill to improve knowledge domains - run function calls with foundation models to optimize them for specific tasks. At the end of the session, developers will understand how to innovate with generative AI and develop apps using the generative ai industry trends.
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeWalaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
2. Part 1: WARM UP
(10 min)
Exercise
Ask for a volunteer
Inside the box, ask the volunteer to look for 1 specific item
Blindfold the volunteer and ask her to find similar items, retrieve them and put them in
different places depending on their similarity
3. Part 1: WARM UP - EXPLANATION
What does it all mean?
What you have just experienced is the problem that your computer faces:
if you ask it to "find" the item that you need, it will: "find" actually
means "match what I give you with what you have in your db".
What your computer is not able to do is to put similar things
together or to separate the different ones.
Or better, it's not able to make new categories which include similar items.
In other words, topics ;)
4. But why isn't your computer able to make up such new categories?
The answer is pretty straightforward... because it does not know what those objects are and "mean".
Everything your computer sees in a text is a series of characters. So, in a sentence like
"Roberto is having great fun in this workshop!",
what your computer actually sees is just...
"Xxxxxx zv gdatdin dhdp3 axnwbx sdxn hwbxwbx xbwxhbjwx!"
That is, it just does not know what those sequences of chars stand for.
And the only things it can put together are "similar shapes".
That's why you need to tag...
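The "find means match" point above can be made concrete with a tiny sketch (all data here is invented for illustration): a literal search only retrieves exact character sequences, so a paraphrase that a human would instantly group with the query is simply missed.

```python
# A literal search only matches character sequences, not meaning.
documents = [
    "Roberto is having great fun in this workshop!",
    "The workshop participants are enjoying themselves.",
    "Quarterly revenue grew by 4%.",
]

def find(query, docs):
    """'Find' for a computer: match the given characters against the db."""
    return [d for d in docs if query.lower() in d.lower()]

print(find("workshop", documents))    # matches 2 docs, purely by characters
print(find("having fun", documents))  # -> []: misses "having great fun"
```

The second query fails even though a human reads it as the same idea — the computer has only "similar shapes" to go on, which is exactly why tagging (adding meaning from outside) is needed.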
5. PART 2
WHAT IS “SEMANTICS”
AND WHAT IS IT FOR A COMPUTER?
6. THERE ARE MANY TYPES OF MEANING
Semantics is meaning. But first of all, let’s broaden the
meaning of what “meaning” means :P
Actually, we should better talk with plurals, ie. meanings.
There are several types of meanings, and each one
depends on the purpose of the communication (or better,
communicative action)
7. Some examples of types of meaning could be:
Referents, labels, relations
Events
Text cohesion
Units of analysis
User intentions
Context: implications and consequences
Homo Sapiens (or just people, ehe) can understand all these types of meaning, and many more.
8. BUT WHAT CAN A COMPUTER UNDERSTAND?
A lot, for sure. But a computer does not have (yet) the knowledge about the
world that a human being has gained since she was born. By "knowledge of
the world" we mean all possible information registered in many ways,
from biological perception to cultural education and social habits.
If this all sounds far away from your need to make a tool work, think of it like this:
there are so many implied and underlying meanings in the text your tool is
processing that it just does not know anything about. That's why you need
to cover its gaps in knowledge.
9. REFERENTS, LABELS AND RELATIONS
A REFERENT is the OBJECT: a person, a company, an event (oh, btw, these are also the
SmartThemes in TalkWalker!).
A Referent can carry more than just one LABEL: a name is a label for a person, or a
company, or even an event (eg. a title for a conference). For example, the curly guy writing
these slides is named Roberto. And he's also a Digital Analyst. And he's also the guy travelling
to and from Bergamo every day. These are all ways (labels) that could be used to refer to the
same Referent (object) ALTERNATIVELY.
This is important to your computer (and to you behind it; btw, why aren't you sitting in the front?)
because the writer of a text could refer to the same object in many ways, and you could miss
out on some results because you didn't set up those keywords in your query.
10. REFERENTS, LABELS AND RELATIONS
And then there are SYNONYMS, ie. things that are similar and thus sometimes occur in
the same contexts, close to one another. Unfortunately, a computer doesn't know that 2
words are synonyms, unless you instruct it that they are. Thus, 2 or more words can have a
RELATION of synonymy (or antonymy, to put it veeeery simply) and belong to the same
SEMANTIC AREA.
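One simple way to "instruct" the computer that words belong to the same semantic area is a hand-built synonym dictionary used for query expansion. A minimal sketch (the lexicon below is invented for illustration):

```python
# Hand-built semantic areas: the computer only 'knows' these words are
# related because we tell it so.
SEMANTIC_AREAS = {
    "cheap": {"cheap", "inexpensive", "affordable", "budget"},
    "expensive": {"expensive", "pricey", "costly"},
}

def expand_query(keyword):
    """Return the keyword plus every synonym in its semantic area."""
    for area in SEMANTIC_AREAS.values():
        if keyword in area:
            return sorted(area)
    return [keyword]  # unknown word: no expansion possible

print(expand_query("pricey"))  # -> ['costly', 'expensive', 'pricey']
```

This is exactly the trick behind broad-match search: the query is silently rewritten to cover the whole semantic area, so you stop missing results just because the writer picked a different label.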
11. Referents, labels, relations
Entities
Have attributes and features and are involved in events
Could be referred to with different labels
Are in relations with other similar entities, which have
different names, but which sometimes are used in their
place
12. BUT WHAT CAN A COMPUTER UNDERSTAND?
First of all, a computer lacks all of this knowledge
about the world and the language. This is why tech
giants are building it.
Schema.org
Google’s Knowledge graph
15. EVENT STRUCTURES - FRAMES
The basic idea is that one cannot understand the meaning of a single word without
access to all the essential knowledge that relates to that word.
For example, one would not be able to understand the word "sell" without knowing
anything about the situation of commercial transfer, which also involves, among
other things, a seller, a buyer, goods, money, the relation between the money
and the goods, the relations between the seller and the goods and the money, the
relation between the buyer and the goods and the money and so on.
Thus, a word activates, or evokes, a frame of semantic knowledge relating to the
specific concept it refers to (or highlights, in frame semantic terminology)
17. EVENT STRUCTURES
So, in events there are
PEOPLE DOING THINGS TO OTHER PEOPLE, maybe WITH SOME THING
So far, so good.
Imagine if you could relate the entities you identify to the actions they take.
Imagine if your computer could do that….
And indeed, there are projects working on the description of events. Ever heard of FrameNet?
18. UNDERSTANDING TEXTS
Understanding a text is definitely much more than plainly reading it (or its
"graphical shapes"). Especially when it comes to relating it to other texts and
making coherent, meaningful collections, ie. grouping into topics.
In order to have a computer understand even the plainest meaning of a sentence
(let alone a text) we would need at least:
A dictionary: to provide linguistic information (eg. grammatical metadata)
An ontology: to relate entities and frame them into event structures
This is the future of the Semantic Web. But that is already another story...
19. USER'S INTENTIONS
What is an "intention"?
It's a form of meaning, ie. pragmatic: something that is
present in one's mind.
Intentions are made of both the motivation to take an action
and the result one wants to achieve by that action.
Intentions show the background where the motivation was
formed (eg. emotionally) and the direction where the user's
attention is heading.
20. In search: how does a search engine satisfy the query with a broader
scope?
It uses a dictionary (with synonyms, variants, etc. all included in the
algorithm): the case of Google's broad match type.
In social networks:
Posts, comments, shares: which one counts more?
A scale of "original" content
Why did Fb introduce reactions?
To give a limited and predefined set of emotions beyond the Like button
Are they a source for sentiment analysis? (yes, to count in social media analytics, with
breakdown)
21. And what about text? How can a computer understand the intention of a (written or spoken)
text?
IT IS A HUUUUUUUUUUUUGGGEEE QUESTION
Most tech giants are working hard on this, making great efforts in developing algorithms and
computing systems. Here's an excerpt from the Microsoft Speech Technology Dept.:
Intent understanding is about identifying the action a user wants a computer to take or the
information she/he would like to obtain, conveyed in a spoken utterance or a text query.
22. USER'S INTENTIONS
... and in what way are they of interest to ORM?
Identifying the background and the direction of an intention may provide a "path" of
action. Which could potentially be a pattern (in which it could be possible to intervene)
Words can be ambiguous (not clearly connoted, or not used in their literal meaning,
eg. irony). Identifying the intention of use can help attribute the best-fitting sentiment
Computers don't have (yet) the ability to "sense" the overall mood of a situation
Considering the pragmatic dimension of intention (and context, later) broadens the
perspective of ORM beyond keywords and metrics, and can help in writing more significant
insights
23. Context: implications and consequences
Context meaning can be conceived of as the meaning that is received by people (more
or less) independently of the speaker's intentions
Or, said differently, the effects that a communicative act brings about within the environment
it enters
Shares on social networks: a case of "mute meaning"
The intention is to amplify the attention on the news, to make it one's own, to show support
If no comments are added, shares are examples of how content spreads the consequences of the
original content
24. PART 3: SENTIMENT ANALYSIS
Units of analysis for sentiment attribution
Word?
Sentence?
Document?
Discourse?
Topic / Theme?
Data-driven approach
Exc. 2
Top-down?
Bottom-up?
25. WHAT IS AN OPINION? ABSTRACTION 1
“I bought an iPhone a few days ago. It is such a nice phone. The touch screen
is really cool. The voice quality is clear too. It is much better than my old
Blackberry, which was a terrible phone and so difficult to type with its tiny
keys. However, my mother was mad with me as I did not tell her before I
bought the phone. She also thought the phone was too expensive, ...” (Liu, Ch.
in NLP handbook, 2010)
One can look at this review/blog at the:
document level, i.e., is this review + or -?
sentence level, i.e., is each sentence + or -?
entity and feature/aspect level
26. Entity and aspect/feature level
“I bought an iPhone a few days ago. It is such a nice phone. The touch screen is really cool.
The voice quality is clear too. It is much better than my old Blackberry, which was a terrible
phone and so difficult to type with its tiny keys. However, my mother was mad with me as I
did not tell her before I bought the phone. She also thought the phone was too expensive,
...”
What do we see?
Opinion targets: entities and their features/aspects
Sentiments: positive and negative
Opinion holders: persons who hold the opinions
Time: when opinions are expressed
27. OPINION LOGIC STRUCTURE
An opinion is a quintuple
(e_j, a_jk, so_ijkl, h_i, t_l)
where
e_j is a target entity.
a_jk is an aspect/feature of the entity e_j.
so_ijkl is the sentiment value of the opinion from the opinion holder h_i on aspect a_jk of entity
e_j at time t_l. so_ijkl is positive, negative, or neutral, or a more granular rating.
h_i is an opinion holder.
t_l is the time when the opinion is expressed.
Opinion definition (Liu, Ch. in NLP handbook, 2010)
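Liu's quintuple maps naturally onto a record type. A minimal sketch in Python (the field names are ours, chosen to follow the definition above; the example values come from the iPhone review):

```python
from dataclasses import dataclass

@dataclass
class Opinion:
    """Liu's opinion quintuple (e_j, a_jk, so_ijkl, h_i, t_l)."""
    entity: str     # e_j: the target entity
    aspect: str     # a_jk: an aspect/feature of the entity
    sentiment: str  # so_ijkl: 'positive', 'negative', 'neutral', or a rating
    holder: str     # h_i: who holds the opinion
    time: str       # t_l: when the opinion is expressed

op = Opinion("iPhone", "touch screen", "positive",
             "the reviewer", "a few days ago")
print(op.entity, op.aspect, op.sentiment)
```

Once every opinionated sentence is reduced to such a record, the "unstructured" text behaves like any other table of data.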
28. HOW TO USE THIS OPINION LOGIC STRUCTURE?
With this logic, it's possible to tackle the problem of structuring the unstructured.
Goal: given an opinionated document,
discover all quintuples (e_j, a_jk, so_ijkl, h_i, t_l),
or solve some simpler form of the problem, e.g., sentiment classification at the document or
sentence level.
With the quintuples, it's possible to convert unstructured text into structured data.
Traditional data and visualization tools can then be used to slice, dice and visualize the results.
However, as seen in the logic structure, tools need to have dictionaries and ontologies built in.
It is then possible to enable qualitative and quantitative analysis.
29. OPINION SUMMARY (ABSTRACTION 2)
With a lot of opinions, a summary is necessary.
It's a multi-document summarization task.
For factual texts, summarization means selecting the most important facts and presenting them in a
sensible order while avoiding repetition:
1 fact = any number of repetitions of the same fact.
But for opinion documents it is different, because opinions have a quantitative side and have
targets:
1 opinion ≠ any number of the same opinion (the count matters).
An aspect-based summary is more suitable:
the quintuples form the basis for opinion summarization.
30. Aspect-based opinion summary
(Hu & Liu, 2004)
"I bought an iPhone a few days ago. It is such a nice phone. The touch screen is really cool. The voice quality is clear
too. It is much better than my old Blackberry, which was a terrible phone and so difficult to type with its tiny keys.
However, my mother was mad with me as I did not tell her before I bought the phone. She also thought the phone was
too expensive, ...”
Feature Based Summary of iPhone:
Feature1: Touch screen
Positive: 212
The touchscreen was really cool.
The touch screen was so easy to use and can do amazing things.
...
Negative: 6
The screen is easily scratched.
I have a lot of difficulty in removing finger marks from the touch screen.
...
Feature2: voice quality
…
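The Hu & Liu-style feature-based summary above is, at bottom, a group-by on (entity, aspect) with the supporting sentences collected per polarity. A sketch (plain tuples instead of the full quintuple for brevity; the data is invented, echoing the iPhone example):

```python
from collections import defaultdict

# (entity, aspect, sentiment, sentence) tuples extracted from reviews
opinions = [
    ("iPhone", "touch screen", "positive", "The touch screen is really cool."),
    ("iPhone", "touch screen", "positive", "So easy to use."),
    ("iPhone", "touch screen", "negative", "The screen is easily scratched."),
    ("iPhone", "voice quality", "positive", "The voice quality is clear too."),
]

def aspect_summary(opinions):
    """Group opinions by (entity, aspect), collecting sentences per polarity."""
    summary = defaultdict(lambda: {"positive": [], "negative": []})
    for entity, aspect, sentiment, sentence in opinions:
        summary[(entity, aspect)][sentiment].append(sentence)
    return summary

s = aspect_summary(opinions)
print(len(s[("iPhone", "touch screen")]["positive"]))  # -> 2
```

The counts per polarity ("Positive: 212 / Negative: 6" in the slide) are just the lengths of these lists, and the example sentences shown under each count are the list contents.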
31. ASPECT-BASED OPINION SUMMARY
This approach also seems more suitable for ORM purposes,
because the variety and fragmentation of target objects is extremely
wide when it comes to summarizing the reputation of products and
people.
Indeed, it allows one to break down more aspects of an object and to
assess them.
33. DATA-DRIVEN APPROACH
Exc 2.
Think of today's presentation's parts.
On colored sticky notes, write the part that you liked the most (green), medium (yellow) and
the least (red).
The parts were:
Topic grouping
Semantics
Sentiment and opinions
34. Top-down?
The one that we currently use:
we identify some documents and assign them some sentiment.
It's a "fake bottom-up approach", because we can't read all documents (for limited time and
resources).
Bottom-up?
A more fine-grained approach: sentiment tagging at word, sentence, or document level?
App2check tags at sentence level, then calculates the average sentiment of all sentences and assigns it to the
document (single review).
The document sentiment is compared and matched against the user's rating, as a control measure.
Topics are also rated: topics are identified as keywords within the opinionated sentences, and their
sentiment is calculated on average.
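The bottom-up scheme just described (tag each sentence, average into a document score, then check it against the user's star rating as a control) can be sketched like this. All scores, the threshold, and the 1-5 rating scale are assumptions for illustration, not App2check's actual implementation:

```python
def document_sentiment(sentence_scores):
    """Average sentence-level polarity scores (-1..+1) into one document score."""
    return sum(sentence_scores) / len(sentence_scores)

def matches_rating(doc_score, user_rating, threshold=0.0):
    """Control measure: does the computed polarity agree with the star rating?"""
    predicted_positive = doc_score > threshold
    rated_positive = user_rating >= 3  # assuming a 1-5 star scale
    return predicted_positive == rated_positive

scores = [0.8, 0.5, -0.2]            # one polarity score per sentence
doc = document_sentiment(scores)     # ~0.37: a mildly positive review
print(matches_rating(doc, user_rating=4))  # -> True
```

A mismatch between the averaged score and the user's rating flags the review for inspection, which is exactly the point of using the rating as a control measure.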