Introduction to Natural Language Processing - Pranav Gupta
The presentation gives an overview of the major tasks and challenges involved in natural language processing. In the second part, it covers one technique each for part-of-speech tagging and automatic text summarization.
Stemming And Lemmatization Tutorial | Natural Language Processing (NLP) With ... - Edureka!
( **Natural Language Processing Using Python**: https://www.edureka.co/python-natural... )
This PPT will provide you with detailed and comprehensive knowledge of two important aspects of Natural Language Processing, i.e., Stemming and Lemmatization. It will also cover the differences between the two, with a demo of each. The following topics are covered in this PPT:
Introduction to Big Data
What is Text Mining?
What is NLP?
Introduction to Stemming
Introduction to Lemmatization
Applications of Stemming & Lemmatization
Difference between Stemming & Lemmatization
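The contrast between the two can be sketched in a few lines of plain Python. This is an illustrative toy, not the deck's NLTK demo: the suffix rules and the lemma dictionary below are invented examples.

```python
# Illustrative sketch of stemming vs. lemmatization in plain Python.
# The suffix rules and lemma dictionary are invented toy examples.

def stem(word):
    """Crude suffix-stripping stemmer (Porter-style in spirit only)."""
    for suffix in ("ies", "ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# A lemmatizer maps a word to its dictionary form, so it needs a lexicon.
LEMMAS = {"studies": "study", "better": "good", "ran": "run", "caring": "care"}

def lemmatize(word):
    return LEMMAS.get(word, word)

for w in ("studies", "caring"):
    print(w, "->", stem(w), "|", lemmatize(w))
```

Note the key difference the deck highlights: the stemmer produces truncated non-words ("stud", "car"), while the lemmatizer returns valid dictionary forms ("study", "care").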
Follow us to never miss an update in the future.
Instagram: https://www.instagram.com/edureka_learning/
Facebook: https://www.facebook.com/edurekaIN/
Twitter: https://twitter.com/edurekain
LinkedIn: https://www.linkedin.com/company/edureka
This talk is about how we applied deep learning techniques to achieve state-of-the-art results in various NLP tasks such as sentiment analysis and aspect identification, and how we deployed these models at Flipkart.
Natural Language Processing (NLP) & Text Mining Tutorial Using NLTK | NLP Tra... - Edureka!
** NLP Using Python: - https://www.edureka.co/python-natural-language-processing-course **
This Edureka PPT will provide you with a comprehensive and detailed knowledge of Natural Language Processing, popularly known as NLP. You will also learn about the different steps involved in processing the human language like Tokenization, Stemming, Lemmatization and much more along with a demo on each one of the topics.
The following topics are covered in this PPT:
1. The Evolution of Human Language
2. What is Text Mining?
3. What is Natural Language Processing?
4. Applications of NLP
5. NLP Components and Demo
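Tokenization, the first processing step mentioned above, can be sketched with a one-line regex tokenizer. This is a rough stand-in for NLTK's `word_tokenize`, not the course's actual demo:

```python
import re

# Minimal word tokenizer: split out runs of word characters and keep
# each punctuation mark as its own token.
def tokenize(text):
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Hello, world!"))
```

Real tokenizers handle contractions, abbreviations, and language-specific conventions that this toy ignores.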
Automatic text summarization is the process of reducing the text content while retaining the important points of the document. Generally, there are two approaches to automatic text summarization: extractive and abstractive. The process of extractive text summarization can be divided into two phases: pre-processing and processing. In this paper, we discuss some of the extractive text summarization approaches used by researchers. We also describe the features used in the extractive summarization process, and present the available linguistic preprocessing tools, with their features, that are used for automatic text summarization. The tools and parameters useful for evaluating the generated summary are also discussed. Moreover, we explain our proposed lexical chain analysis approach, with sample generated lexical chains, for extractive automatic text summarization, and provide the evaluation results of our system-generated summary. The proposed lexical chain analysis approach can be used to solve different text mining problems such as topic classification, sentiment analysis, and summarization.
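As a rough illustration of the extractive approach, here is a minimal frequency-based sentence scorer in Python. It is a toy sketch of the kind of scoring features such surveys discuss, not the paper's lexical chain method; the sample document is invented.

```python
import re
from collections import Counter

# Toy extractive summarizer: score each sentence by the corpus frequency
# of its words, then keep the top-scoring sentences in original order.
def summarize(text, n_sentences=1):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"\w+", text.lower()))
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: -sum(freq[w] for w in re.findall(r"\w+", sentences[i].lower())),
    )
    keep = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in keep)

doc = ("Text summarization reduces text. "
       "Summarization keeps the important text content. "
       "Cats are unrelated.")
print(summarize(doc, 2))
```

With two sentences requested, the off-topic low-frequency sentence is dropped; lexical chains replace these raw counts with semantically linked word groups.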
This lecture provides students with an introduction to natural language processing, with a specific focus on the basics of two applications: vector semantics and text classification.
(Lecture at the QUARTZ PhD Winter School, http://www.quartz-itn.eu/training/winter-school/, in Padua, Italy on February 12, 2018)
Explore detailed Topic Modeling via LDA (Latent Dirichlet Allocation) and its steps.
Thanks for your time. If you enjoyed this short video, there are tons of topics in advanced analytics, data science, and machine learning available in my Medium repo: https://medium.com/@bobrupakroy
Presentation given during the Computational Linguistics course at UPF (Universitat Pompeu Fabra), covering the following topic:
Information Extraction
Jerry R. Hobbs, University of Southern California Ellen Riloff, University of Utah
NLP is the branch of computer science focused on developing systems that allow computers to communicate with people using everyday language. It is also called computational linguistics, a field that also concerns how computational methods can aid the understanding of human language.
Natural Language Processing (NLP) is often taught at the academic level from the perspective of computational linguists. However, as data scientists, we have a richer view of the world of natural language - unstructured data that by its very nature has important latent information for humans. NLP practitioners have benefitted from machine learning techniques to unlock meaning from large corpora, and in this class we’ll explore how to do that particularly with Python, the Natural Language Toolkit (NLTK), and to a lesser extent, the Gensim Library.
NLTK is an excellent library for machine learning-based NLP, written in Python by experts from both academia and industry. Python allows you to create rich data applications rapidly, iterating on hypotheses. Gensim provides vector-based topic modeling, which is currently absent in both NLTK and Scikit-Learn. The combination of Python + NLTK means that you can easily add language-aware data products to your larger analytical workflows and applications.
BERT: Bidirectional Encoder Representations from Transformers.
BERT is a pretrained model from Google for state-of-the-art NLP tasks.
BERT has the ability to take into account the syntactic and semantic meaning of text.
Introduction to seq2seq (sequence-to-sequence) and RNN - Hye-min Ahn
These are my slides introducing the sequence-to-sequence model and Recurrent Neural Networks (RNNs) to my laboratory colleagues.
Hyemin Ahn, @CPSLAB, Seoul National University (SNU)
Presentation of work that will be published at EMNLP 2016.
Ben Eisner, Tim Rocktäschel, Isabelle Augenstein, Matko Bošnjak, Sebastian Riedel. emoji2vec: Learning Emoji Representations from their Description. SocialNLP at EMNLP 2016. https://arxiv.org/abs/1609.08359
Georgios Spithourakis, Isabelle Augenstein, Sebastian Riedel. Numerically Grounded Language Models for Semantic Error Correction. EMNLP 2016. https://arxiv.org/abs/1608.04147
Isabelle Augenstein, Tim Rocktäschel, Andreas Vlachos, Kalina Bontcheva. Stance Detection with Bidirectional Conditional Encoding. EMNLP 2016. https://arxiv.org/abs/1606.05464
The Semantic Web meets the Code of Federal Regulations - tbruce
Semantic Web and natural-language-processing techniques meet the Code of Federal Regulations. Presentation from CALICON12 by the Legal Information Institute, covering work on definition extraction, linked data publishing, search enhancement, and vocabulary discovery.
Joint presentation with Nuria Casellas.
Iulia Pasov, Sixt. Trends in sentiment analysis. The entire history from rule... - IT Arena
Iulia Pasov is a senior Data Scientist working for Sixt SE, as well as a PhD student in Artificial Intelligence and Psychology and a WiDS Ambassador. As a Data Scientist, Iulia focuses on building AI-based services meant to optimize car rental processes, as well as pipelines for automatic training and deploying of machine learning models. For her studies, she searches ways to improve learning in online knowledge building communities with the use of artificial intelligence.
Speech Overview:
Sentiment analysis is one of the best-known sub-domains of Natural Language Processing (NLP), especially used in the classification of feedback messages. This talk will condense over 15 years of research on different approaches to sentiment analysis, as they evolved over time. The audience will be guided through the advantages and disadvantages of each method, in order to understand how to approach the topic given their needs.
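The earliest rule-based family of approaches such a history covers can be sketched as a lexicon-based scorer. The tiny word lists below are invented for illustration; real lexicons contain thousands of scored entries.

```python
# Minimal lexicon-based sentiment scorer: count positive and negative
# lexicon hits and compare. The word lists are invented toy examples.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "slow"}

def sentiment(text):
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("the service was great"))
```

The limitations of this method (negation, sarcasm, domain-specific vocabulary) are exactly what motivated the later machine learning and deep learning approaches.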
Presented by Ted Xiao at RobotXSpace on 4/18/2017. This workshop covers the fundamentals of Natural Language Processing, crucial NLP approaches, and an overview of NLP in industry.
Beyond the Symbols: A 30-minute Overview of NLP - MENGSAYLOEM1
This presentation delves into the world of Natural Language Processing (NLP), exploring its goal to make human language understandable to machines. The complexities of language, such as ambiguity and complex structures, are highlighted as major challenges. The talk underscores the evolution of NLP through deep learning methodologies, leading to a new era defined by large-scale language models. However, obstacles like low-resource languages and ethical issues including bias and hallucination are acknowledged as enduring challenges in the field. Overall, the presentation provides a condensed, yet comprehensive view of NLP's accomplishments and ongoing hurdles.
Presentation from March 18th, 2013 Triangle Java User Group on Taming Text. Presentation covers search, question answering, clustering, classification, named entity recognition, etc. See http://www.manning.com/ingersoll for more.
An overview of some core concept in natural language processing, some example (experimental for now!) use cases, and a brief survey of some tools I have explored.
ULM-1 Understanding Languages by Machines: The borders of Ambiguity - Rubén Izquierdo Beviá
In this presentation we will explore the closed world of language as a system of word relations. Words and texts are highly ambiguous, but we believe the complete scope and complexity of this ambiguity is not yet well defined. The goal is to define the problem more precisely and find the optimal solution given the vast volumes of textual data that are available.
Most WSD systems are not tackling the problem properly, and the context is not being modelled adequately. Besides this, WSD has lately shifted from a purely lexical approach (static view) to a reference approach (dynamic view). Considering these two facts, the role of background and discourse information is crucial.
To test our hypothesis about what WSD systems are not handling properly, we performed an error analysis on the participant outputs of the SensEval/SemEval WSD competitions. Interesting and surprising conclusions came out of this analysis.
Finally, we present our participation in the last SemEval-2015 Task 13: Multilingual All-Words WSD and Entity Linking. In our system we implement our ideas about using background information to perform WSD.
In this paper we present an approach to Word Sense Disambiguation based on Topic Modeling (LDA). Our approach consists of two steps: first a binary classifier is applied to decide whether the most frequent sense applies or not, and then another classifier deals with the non-most-frequent-sense cases. An exhaustive evaluation is performed on the Spanish corpus Ancora to analyze the performance of our two-step system and the impact of the context and the different parameters on the system. Our best experiment reaches an accuracy of 74.53, which is 6 points above the highest baseline. All the software developed for these experiments has been made freely available, to enable reproducibility and allow re-use of the software.
CLIN-2015 Presentation
Word Sense Disambiguation is still an unsolved problem in Natural Language Processing.
We claim that most approaches do not model the context correctly, by relying
too much on the local context (the words surrounding the word in question), or on
the most frequent sense of a word. In order to provide evidence for this claim, we
conducted an in-depth analysis of all-words tasks of the competitions that have been
organized (Senseval 2&3, Semeval-2007, Semeval-2010, Semeval 2013). We focused
on the average error rate per competition and across competitions per part of speech,
lemma, relative frequency class, and polysemy class. In addition, we inspected the
“difficulty” of a token (word) by calculating the average polysemy of the words in the
sentence of a token. Finally, we inspected to what extent systems always chose the
most frequent sense. The results from Senseval 2, which are representative of other
competitions, showed that the average error rate for monosemous words was 33.3%
due to part of speech errors. This number was 71% for multiword and phrasal verbs.
In addition, we observe that higher polysemy yields a higher error rate. Moreover, we
do not observe a drop in the error rate if there are multiple occurrences of the same
lemma, which might indicate that systems rely mostly on the sentence itself. Finally,
out of the 799 tokens for which the correct sense was not the most frequent sense, systems
still assigned the most frequent sense in 84% of the cases. For future work, we plan
to develop a strategy in order to determine in which context the predominant sense
should be assigned, and more importantly when it should not be assigned. One of the
most important parts of this strategy would be to not only determine the meaning of
a specific word, but to also know its referential meaning. For example, in the case of
the lemma ‘winner’, we do not only want to know what ‘winner’ means, but we also
want to know what this ‘winner’ won and who this ‘winner’ was.
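The most-frequent-sense baseline this abstract keeps returning to is trivial to implement once sense frequencies are available, which is part of why systems fall back on it so often. The sense inventory below is a made-up toy, not WordNet:

```python
# The most-frequent-sense (MFS) baseline: always pick the sense with the
# highest corpus frequency, ignoring context entirely. The sense
# inventory and frequencies here are invented toy examples.
SENSE_FREQ = {
    "bank": {"bank%financial": 120, "bank%river": 15},
    "winner": {"winner%person": 40, "winner%thing": 3},
}

def mfs(lemma):
    senses = SENSE_FREQ[lemma]
    return max(senses, key=senses.get)

print(mfs("bank"))
```

Because sense distributions are highly skewed, this context-free baseline is hard to beat, which is exactly the behaviour the error analysis documents: systems assigned the most frequent sense in 84% of the cases where it was wrong.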
2. Text Mining Course
• 1) Introduction to Text Mining
• 2) Introduction to NLP
• 3) Named Entity Recognition and Disambiguation
• 4) Opinion Mining and Sentiment Analysis
• 5) Information Extraction
• 6) NewsReader and Visualisation
• 7) Guest Lecture and Q&A
3. Outline
1. What is Information Extraction
2. Main goals of Information Extraction
3. Information Extraction Tasks and Subtasks
4. MUC conferences
5. Main domains of Information Extraction
6. Methods for Information Extraction
o Cascaded finite-state transducers
o Regular expressions and patterns
o Supervised learning approaches
o Weakly supervised and unsupervised approaches
7. How far we are with IE
4. What is IE?
• Late 1970s within NLP field
• Automatically find and extract limited relevant
parts of texts
• Merge information from many pieces of text
5. What is IE?
• Quite often in specialized domains
• Move from unstructured/semi-structured data to
structured data
o Schemas
o Relations (as a database)
o Knowledge base
o RDF triples
6. What is IE?
Unstructured text
• Natural language sentences
• Historically NLP systems have been designed to process this type of data
• The meaning → linguistic analysis and natural language understanding
7. What is IE?
Semi-structured text
• The physical layout helps the interpretation
• Processing halfway between linguistic features ↔ positional features
9. Main goals of IE
• Fill a predefined “template” from raw text
• Extract who did what to whom and when?
o Event extraction
• Organize information so that it is useful to people
• Put information in a form that allows further
inferences by computers
o Big data
10. IE. Task & Subtasks
• Named Entity Recognition
o Detection → Mr. Smith eats bitterballen [Mr. Smith] : ENTITY
o Classification → Mr. Smith eats bitterballen [Mr. Smith] : PERSON
• Event extraction
o The thief broke the door with a hammer
• CAUSE_HARM → Verb: break
Agent: the thief
Patient: the door
Instrument: a hammer
• Coreference resolution
o [Mr. Smith] eats bitterballen. Besides this, [he] only drinks Belgian beer.
11. IE. Task & Subtasks
• Relationship extraction
o Bill works for IBM → PERSON works for ORGANISATION
• Terminology extraction
o Finding relevant multi-word terms in a given corpus
• Some concrete examples
o Extracting earnings, profits, board members, headquarters from company
reports
o Searching on the WWW for e-mails for advertising (spamming)
o Learn drug-gene product interactions from biomedical research papers
13. MUC conferences
• Message Understanding Conference (MUC), held
between 1987 and 1998.
• Domain specific texts + training examples + template
definition
• Precision, Recall and F1 as evaluation
• Domains
o MUC-1 (1987), MUC-2 (1989): Naval operations messages.
o MUC-3 (1991), MUC-4 (1992): Terrorism in Latin American countries.
o MUC-5 (1993): Joint ventures and microelectronics domain.
o MUC-6 (1995): News articles on management changes.
o MUC-7 (1998): Satellite launch reports.
14. MUC conferences
Bridgestone Sports Co. said Friday it has set up a joint venture in
Taiwan with a local concern and a Japanese trading house to produce
golf clubs to be shipped to Japan.
The joint venture, Bridgestone Sports Taiwan Co., capitalized at 20
million new Taiwan dollars, will start production in January 1990 with
production of 20,000 iron and “metal wood” clubs a month.
Example from MUC-5
15. Main domains of IE
• Terrorist events
• Joint ventures
• Plane crashes
• Disease outbreaks
• Seminar announcements
• Biological and medical domain
16. Outline
1. What is Information Extraction
2. Main goals of Information Extraction
3. Information Extraction Tasks and Subtasks
4. MUC conferences
5. Main domains of Information Extraction
6. Methods for Information Extraction
o Cascaded finite-state transducers
o Regular expressions and patterns
o Supervised learning approaches
o Weakly supervised and unsupervised approaches
7. How far we are with IE
17. Methods for IE
• Cascaded finite-state transducers
o Rule based
o Regular expressions
• Learning based approaches
o Traditional classifiers
• Naïve Bayes, MaxEnt, SVM …
o Sequence labeling models
• HMM, CMM, CRF
• Unsupervised approaches
• Hybrid approaches
18. Cascaded finite-state
transducers
• Emerging idea from MUC participants and
approaches
• Decompose the task into small sub-tasks
• One element is read at a time from a sequence
o Depending on the type, a certain transition is produced in the automaton
to a new state
o Some states are considered final (the input matches a certain pattern)
• Can be defined as a regular expression
20. Cascaded finite-state
transducers
• Earlier stages recognize smaller linguistic objects
o Usually domain independent
• Later stages build on top of the previous ones
o Usually domain dependent
• Typical IE systems
1. Complex words
2. Basic phrases
3. Complex phrases
4. Domain events
5. Merging structures
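The cascade above can be sketched in Python with two regex stages over the MUC-5 example: an early, domain-independent stage tags a company name, and a later, domain-specific stage matches an event pattern over the tagged output. Both patterns are illustrative toys, not rules from any actual MUC system.

```python
import re

text = "Bridgestone Sports Co. said Friday it has set up a joint venture in Taiwan"

# Stage 1 (complex words): tag a crude company-name pattern.
stage1 = re.sub(r"([A-Z]\w+(?: [A-Z]\w+)* Co\.)", r"<COMPANY>\1</COMPANY>", text)

# Stage 2 (domain events): match an event pattern built on stage-1 output.
event = re.search(r"<COMPANY>(.+?)</COMPANY>.*set up a joint venture", stage1)
if event:
    print("SET-UP-JOINT-VENTURE:", event.group(1))
```

Each stage only has to know about the output alphabet of the previous one, which is what makes the cascade modular.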
21. Cascaded finite-state
transducers
• Complex words
o Multiwords: “set up” “trading house”
o NE: “Bridgestone Sports Co”
• Basic Phrases
o Syntactic chunking
• Noun groups (head noun + all modifiers)
• Verb groups
23. Cascaded finite-state
transducers
• Complex phrases
o Complex noun and verb groups on the basis of syntactic information
• The attachment of appositives to their head noun group
o “The joint venture, Bridgestone Sports Taiwan Co.,”
• The construction of measure phrases
o “20,000 iron and ‘metal wood’ clubs a month”
24. Cascaded finite-state
transducers
• Domain events
o Recognize events and match with “fillers” detected in previous steps
o Requires domain specific patterns
• To recognize phrases of interest
• To define what the roles are
o Patterns can also be defined as finite-state machines or regular
expressions
• <Company/ies><Set-up><Joint-Venture> with <Company/ies>
• <Company><Capitalized> at <Currency>
26. Regular Expressions
• 1950’s Stephen Kleene
• A string pattern that describes/matches a set of
strings
• A regular expression consists of:
o Characters
o Operation symbols
• Boolean (and/or)
• Grouping (for defining scopes)
• Quantification
27. Regular Expressions
Character  Description
a          The character a
.          Any single character
[abc]      Any character in the brackets (OR): 'a' or 'b' or 'c'
[^abc]     Any character not in the brackets: any symbol that is not 'a', 'b' or 'c'
*          Quantifier; matches the preceding element ZERO or more times
+          Quantifier; matches the preceding element ONE or more times
?          Matches the preceding element zero or one time
|          Choice (OR); matches one of the expressions (before or after the |)
30. Regular Expressions
① .at → hat cat bat xat …
② [hc]at → hat cat
③ [^b]at → everything matched by .at except “bat”
④ [^hc]at → everything matched by .at except “hat” and “cat”
⑤ s.* → s sssss ssbsd2ck3e
⑥ [hc]*at → hat cat hhat chat cchhat at …
⑦ cat|dog → cat dog
31. Using Regular
Expressions
• Typically, extracting information from automatically
generated webpages is easy
o Wikipedia
• To know the country for a given city
o Amazon webpage
• From a list of hits
o Weather forecast webpages
o DBpedia
35. Using Regular
Expressions
• Some “unstructured” pieces of information keep
some structure and are easy to capture by means
of regular expressions
o Phone numbers
o What else?
o …
o ...
36. Using Regular
Expressions
• Some “unstructured” pieces of information keep
some structure and are easy to capture by means
of regular expressions
o Phone numbers
o E-mails
o URL Websites
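A minimal sketch of such patterns in Python; the three expressions below are deliberately simplified (real-world phone numbers, e-mail addresses, and URLs are far messier than these toys cover).

```python
import re

# Illustrative, simplified patterns for semi-structured snippets.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d -]{7,}\d")
URL   = re.compile(r"https?://[^\s]+")

text = "Contact info@example.org or +31 20 1234567, see https://example.org/jobs"
print(EMAIL.findall(text))  # ['info@example.org']
print(PHONE.findall(text))  # ['+31 20 1234567']
print(URL.findall(text))    # ['https://example.org/jobs']
```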
37. Using Regular
Expressions
• Also to detect relations and fill events
• Higher level regular expressions make use of
“objects” detected by lower level patterns
• Some NLP information may help (pos tags, phrases,
semantic word categories)
o Crime-Victim can use things matched by “noun-group”
• Prefiller: [pos: V, type-of-verb: KILL] WordNet MCR
• Filler: [phrase: NOUN-GROUP]
38. Using Regular
Expressions
• Extraction relations between entities
o Which PERSON holds what POSITION in what ORGANIZATION
• [PER], [POSITION] of [ORG]
Entities:
PER: Jose Mourinho
POSITION: trainer
ORG: Chelsea
Relation:
(Jose Mourinho, trainer, Chelsea)
39. Using Regular
Expressions
• Extraction relations between entities
o Which PERSON holds what POSITION in what ORGANIZATION
• [PER], [POSITION ] of [ORG]
• [ORG] (named, appointed,…) [PER] Prep [POSITION]
o Nokia has appointed Rajeev Suri as President
o Where an ORGANIZATION is located
• [ORG] headquarters in [LOC]
o NATO headquarters in Brussels
• [ORG][LOC] (division, branch, headquarters…)
o KFOR Kosovo headquarters
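A sketch of the "[PER], [POSITION] of [ORG]" pattern in Python, assuming a prior (hypothetical) NER step has already wrapped entity mentions in inline tags:

```python
import re

# Hypothetical NER output: mentions wrapped in inline tags.
tagged = "<PER>Jose Mourinho</PER>, <POSITION>trainer</POSITION> of <ORG>Chelsea</ORG>"

# Higher-level pattern over the lower-level entity tags.
pattern = re.compile(
    r"<PER>(.+?)</PER>, <POSITION>(.+?)</POSITION> of <ORG>(.+?)</ORG>"
)
m = pattern.search(tagged)
if m:
    person, position, org = m.groups()
    print((person, position, org))  # ('Jose Mourinho', 'trainer', 'Chelsea')
```

This is the "higher level regular expressions over lower level objects" idea from the previous slide: the relation pattern never sees raw words, only entity placeholders.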
41. Extracting relations with
patterns
• Hearst 1992
• What does Gelidium mean?
• “Agar is a substance prepared from a mixture of red
algae, such as Gelidium, for laboratory or industrial
use”
• How do you know?
42. Extracting relations with
patterns
• Hearst 1992: Automatic Acquisition of Hyponyms (IS-A)
X → Gelidium (sub-type) Y → red algae (super-type)
X IS-A Y
• “Y such as X”
• “Y, such as X”
• “X or other Y”
• “X and other Y”
• “Y including X”
• ….
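One of these Hearst patterns can be sketched as a regular expression; the toy pattern below handles only the simple "Y, such as X" case, with a one- or two-word hypernym, applied to the Gelidium sentence.

```python
import re

# Sketch of one Hearst pattern, "Y, such as X" -> X IS-A Y.
HEARST_SUCH_AS = re.compile(r"(\w+(?: \w+)?), such as (\w+)")

text = ("Agar is a substance prepared from a mixture of red algae, "
        "such as Gelidium, for laboratory use")
for m in HEARST_SUCH_AS.finditer(text):
    hypernym, hyponym = m.group(1), m.group(2)
    print(f"{hyponym} IS-A {hypernym}")  # Gelidium IS-A red algae
```

A real implementation would match NP chunks rather than bare `\w+` word runs, but the lexico-syntactic idea is the same.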
44. Hand-built patterns
• Positive
o Tend to be high-precision
o Can be adapted to specific domains
• Negative
o Human patterns are usually low-recall
o A lot of work to think of all possible patterns
o Need to create a lot of patterns for every relation
45. Learning-based
Approaches
• Statistical techniques and machine learning
algorithms
o Automatically learn patterns and models for new domains
• Some types
o Supervised learning of patterns and rules
o Supervised Learning for relation extraction
o Supervised learning of Sequential Classifier Methods
o Weakly supervised and unsupervised
46. Supervised Learning of
Patterns and Rules
• Aiming to reduce the knowledge-engineering
bottleneck of creating an IE system in a new domain
• AutoSlog and PALKA → first IE pattern learning
systems
o AutoSlog: syntactic templates, lexico-syntactic patterns and manual
review
• Learning Algorithms → generate rules from
annotated text
o LIEP (Huffman 1996): syntactic paths, role fillers. Patterns that work well in
training are kept
o (LP)2 uses tagging rules and correction rules
47. Supervised Learning of
Patterns and Rules
• Relational learning methods
o RAPIER: rules for pre-filler, filler, and post-filler component. Each
component is a pattern that consists of words, POS tags, and semantic
classes.
48. Supervised Learning for
relation extraction (I)
• Design a supervised machine learning framework
• Decide what relations we are interested in
• Choose what entities are relevant
• Find (or create) labeled data
o Representative corpus
o Label the entities in the corpus (Automatic NER)
o Hand label relation between these entities
o Split into train + dev + test
• Train, improve and evaluate
49. Supervised Learning for
relation extraction (II)
• Relation extraction as a classification problem
• 2 classifiers
o To decide if two entities are related
o To decide the class for a pair of related entities
• Why 2?
o Faster training by eliminating most pairs
o Appropriate feature sets for each task
• Find all pairs of NE (restricted to the sentence)
o For every pair
1. Are the entities related? (classifier 1)
• No → END
• Yes → guess the class (classifier 2)
50. Supervised Learning for
relation extraction (III)
• Are the two entities related?
• What is the type of relation?
51. Supervised Learning for
relation extraction (IV)
“[American Airlines], a unit of AMR, immediately
matched the move, spokesman [Tim Wagner] said”
• What features?
o Head words of entity mentions and combination
• Airlines Wagner Airlines-Wagner
o Bag-of-words in the two entity mentions
• American, Airlines, Tim, Wagner, American Airlines, Tim Wagner
o Words/bigrams in particular positions to the left and right
• M2#-1: spokesman M2#+1: said
o Bag-of-words (or bigrams) between the 2 mentions
• a, AMR, of, immediately, matched, move, spokesman, the, unit
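These lexical features can be sketched as follows; the token indices of the two mentions are given by hand here, standing in for the output of a NER step.

```python
# Feature sketch for the American Airlines / Tim Wagner example.
tokens = ["American", "Airlines", ",", "a", "unit", "of", "AMR", ",",
          "immediately", "matched", "the", "move", ",", "spokesman",
          "Tim", "Wagner", "said"]
m1, m2 = (0, 2), (14, 16)  # spans of [American Airlines] and [Tim Wagner]

features = {
    "head_m1": tokens[m1[1] - 1],   # last token of mention 1: "Airlines"
    "head_m2": tokens[m2[1] - 1],   # last token of mention 2: "Wagner"
    "head_pair": tokens[m1[1] - 1] + "-" + tokens[m2[1] - 1],
    "m2_left": tokens[m2[0] - 1],   # word just before mention 2: "spokesman"
    "m2_right": tokens[m2[1]],      # word just after mention 2: "said"
    "bow_between": sorted(set(tokens[m1[1]:m2[0]])),  # bag of words between mentions
}
print(features)
```

A real system would feed a dict like this (plus entity types, chunk and parse-path features from the next slides) into the classifier.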
52. Supervised Learning for
relation extraction (V)
“[American Airlines], a unit of AMR, immediately
matched the move, spokesman [Tim Wagner] said”
• What features?
o Named entity types
• M1: ORG M2: PERSON
o Entity level (Name, Nominal (NP), Pronoun)
• M1: NAME (“it” or “he” would be PRONOUN)
• M2: NAME (“the company” would be NOMINAL)
o Basic chunk sequence from one entity to the other
• NP NP PP VP NP NP
o Constituency path on the parse tree
• NP ↑ NP ↑ S ↑ S ↓ NP
53. Supervised Learning for
relation extraction (VI)
“[American Airlines], a unit of AMR, immediately
matched the move, spokesman [Tim Wagner] said”
• What features?
• Trigger lists
o For family → parent, wife, husband… (WordNet)
• Gazetteers
o List of countries…
• …
54. Supervised Learning for
relation extraction (VII)
• Decide your algorithm
o MaxEnt, Naïve Bayes, SVM
• Train the system on the training data
• Tune it on the dev set
• Test on the evaluation set
o Traditional Precision, Recall and F-score
55. Sequential Classifier
Methods
• IE as a classification problem using sequential
learning models.
• A classifier is induced from annotated data to
sequentially scan a text from left to right and
decide whether each piece of text must be extracted or not
• Decide what you want to extract
• Represent the annotated data in a proper way
57. Sequential Classifier
Methods
• Typical steps for training
o Get the annotated training data
o Represent the data in IOB
o Design feature extractors
o Decide the algorithm to use
o Train the models
• Testing steps
o Get the test documents
o Extract features
o Run the sequence models
o Extract the recognized entities
58. Sequential Classifier
Methods
• Algorithms
o HMM
o CMM
o CRF
• Features
o Words (current, previous, next)
o Other linguistic information (PoS, chunks…)
o Task specific features (NER…)
• Word shapes: abstract representation for words
59. Sequential Classifier
Methods
• Algorithms
o HMM
o SVM
o CRF
• Features
o Words (current, previous, next)
o Other linguistic information (PoS, chunks…)
o Task specific features (NER…)
• Word shapes: abstract representation for words
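The word-shape feature can be sketched as a small function; the X/x/d alphabet and the collapsed variant follow the common convention (uppercase → X, lowercase → x, digit → d, everything else kept as-is).

```python
def word_shape(word, collapse=False):
    """Abstract representation of a word's orthography."""
    shape = "".join(
        "X" if c.isupper() else "x" if c.islower() else "d" if c.isdigit() else c
        for c in word
    )
    if collapse:  # squeeze runs of the same symbol: "Xxxxx" -> "Xx"
        out = []
        for c in shape:
            if not out or out[-1] != c:
                out.append(c)
        shape = "".join(out)
    return shape

print(word_shape("Bridgestone"))           # Xxxxxxxxxxx
print(word_shape("MUC-7"))                 # XXX-d
print(word_shape("MUC-7", collapse=True))  # X-d
```

Shapes let the classifier generalize from seen names to unseen ones with the same capitalization/digit pattern.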
60. Weakly supervised and
unsupervised
• Manual annotation is also “expensive”
o IE is quite domain specific → little reuse across domains
• AutoSlog-TS:
o Just needs 2 sets of documents: relevant/irrelevant
o Syntactic templates + relevance according to the relevant set
• ExDisco (Yangarber et al. 2000)
o No need for a preclassified corpus
o Uses a small set of seed patterns to decide relevant/irrelevant
61. Weakly supervised and
unsupervised
• OpeNER:
• European project dealing with entity recognition,
sentiment analysis and opinion mining mainly in
hotel reviews (also restaurants, attractions, news)
• Double propagation
o Method to automatically gather opinion words and targets
• From a large raw hotel corpus
• Providing a set of seeds and patterns
62. Weakly supervised and
unsupervised
• Seed list
• + → good, nice
• - → bad, ugly
• Patterns
• a [EXP] [TAR]
• the [EXP] [TAR]
• Polarity patterns
• = (same polarity): [EXP] and [EXP]; [EXP], [EXP]
• ! (opposite polarity): [EXP] but [EXP]
63. Weakly supervised and
unsupervised
• Propagation method
o 1) Get new targets using the seed expressions and the
patterns
• a nice [TAR] a bad [TAR] the ugly [TAR]
• Output → new targets (hotel, room, location)
o 2) Get new expression using the previous targets and the
patterns
• a [EXP] hotel the [EXP] location
• Output → new expressions (expensive, cozy, perfect…)
o Keep running 1 and 2 to get new EXP and TAR
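The propagation loop above can be sketched on a toy corpus; the bigram-style "determiner [EXP] [TAR]" patterns and the seed lists are illustrative stand-ins for the real OpeNER patterns.

```python
# Toy double propagation: seed opinion expressions bootstrap targets,
# and the harvested targets bootstrap new opinion expressions.
corpus = ["a nice hotel", "a bad room", "the ugly location",
          "a cozy hotel", "an expensive room", "a cozy location"]

expressions = {"good", "nice", "bad", "ugly"}  # seed list
targets = set()

for _ in range(5):  # keep running steps 1 and 2 until nothing changes
    for phrase in corpus:          # 1) known expressions -> new targets
        *_, exp, tar = phrase.split()
        if exp in expressions:
            targets.add(tar)
    for phrase in corpus:          # 2) known targets -> new expressions
        *_, exp, tar = phrase.split()
        if tar in targets:
            expressions.add(exp)

print(sorted(targets))                                        # ['hotel', 'location', 'room']
print(sorted(expressions - {"good", "nice", "bad", "ugly"}))  # ['cozy', 'expensive']
```

Note how "cozy" and "expensive" are never seeded: they are reached only through the targets discovered in step 1.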
64. Weakly supervised and
unsupervised
• Polarity guessing
o Apply the polarity patterns to guess the polarity
• = a nice(+) and cozy(?) → cozy(+)
• ! clean(+) but expensive(?) → expensive(-)
https://github.com/opener-project/opinion-domain-lexicon-acquisition
65. Outline
1. What is Information Extraction
2. Main goals of Information Extraction
3. Information Extraction Tasks and Subtasks
4. MUC conferences
5. Main domains of Information Extraction
6. Methods for Information Extraction
o Cascaded finite-state transducers
o Regular expressions and patterns
o Supervised learning approaches
o Weakly supervised and unsupervised approaches
7. How far we are with IE
67. How good is IE
• Some progress has been made
• Still, the 60% barrier seems difficult to break
• Most errors are in entity and event coreference
• Propagation of errors
o Entity recognition → 90%
o One event → 4 entities
o 0.9^4 ≈ 0.66 → close to the 60% barrier
• A lot of knowledge is implicit or “common world
knowledge”
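The arithmetic behind that figure is just compounding, using the illustrative numbers from the slide: if entity recognition is 90% accurate and an event needs 4 entities to all be right, the event-level ceiling is 0.9 to the 4th power.

```python
# Error propagation from entities to events (illustrative numbers).
p_entity, n_entities = 0.9, 4
p_event = p_entity ** n_entities
print(round(p_event, 3))  # 0.656 -- roughly the 60% barrier
```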
68. How good is IE
Information Type    Accuracy
Entities            90 – 98%
Attributes          80%
Relations           60 – 70%
Events              50 – 60%
• Very optimistic numbers for well-established tasks
• The numbers go down for specific/new tasks