This document provides an overview of natural language processing (NLP), including common methods like part-of-speech tagging, entity extraction, and relation extraction. It discusses applications of NLP like machine translation, spam filtering, and question answering. Sentiment analysis is explained in more detail, including how it is used for applications like predicting elections and analyzing reviews on TripAdvisor. Machine learning approaches to sentiment analysis like using word embeddings are also summarized.
2. Agenda
• Natural Language Processing Background
• Methods used in NLP
• Applications
• Sentiment Analysis
• Usage in TripAdvisor
• Challenges
3. What is Natural Language Processing?
[Diagram] Text → NLP → Structured Data → Applications (e.g., Machine Reading)
4. Methods in NLP
• Automatic Summarization:
• There are basically two types of auctions.
• There are two types of auctions.
• Part-of-speech Tagging: classify and label words
• They refuse to permit us to obtain the refuse permit
• [('They', 'pronoun'), ('refuse', 'verb'), ('to', 'preposition'), ('permit', 'verb'), …]
• Entity Extraction:
• People, organizations, locations, times, dates, prices, …
• Relation Extraction:
• Located in, employed by, part of, married to, ...
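To make the tagging step concrete, here is a minimal sketch of part-of-speech tagging on the slide's example sentence. It uses NLTK, which the slides do not name, so treat the library choice and the one-time model downloads as assumptions:

```python
import nltk

# One-time model downloads (assumes network access; skip if already installed).
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

sentence = "They refuse to permit us to obtain the refuse permit"
tokens = nltk.word_tokenize(sentence)
print(nltk.pos_tag(tokens))
# The two occurrences of "refuse" and "permit" get different tags (verb vs. noun),
# e.g. ('refuse', 'VBP') ... ('refuse', 'NN').
```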
5. Applications
• Machine Translation: Google Translate
• “An electric guitar and bass player stand off…” vs. “…fish such as Pacific salmon and striped bass”
• The same word “bass” must be translated differently in each context
• Email Spam Filters: Gmail
• Naive Bayes classifier is used to identify spam/ham emails
• P(spam|word) = P(word|spam)*P(spam)/P(word)
• Question-Answering: Amazon’s Alexa , Google Home
• Amazon Lex: AI API used in Amazon’s Alexa
• Sentiment Analysis: Opinion Mining
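As a worked example of the Bayes formula on the slide, the toy counts below are invented purely for illustration (they are not from any real spam corpus):

```python
# Hypothetical corpus counts for a single word, e.g. "free".
spam_emails, ham_emails = 400, 600            # so P(spam) = 0.4
spam_with_word, ham_with_word = 120, 30       # how often the word appears in each class

p_spam = spam_emails / (spam_emails + ham_emails)
p_word_given_spam = spam_with_word / spam_emails
p_word = (spam_with_word + ham_with_word) / (spam_emails + ham_emails)

# P(spam|word) = P(word|spam) * P(spam) / P(word)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 2))   # 0.8
```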
6. Sentiment Analysis
• What is it?
• Determine the emotional tone behind a series of words
• Uses
• Political Polling: 2012 Presidential Election
• Business Purpose: TripAdvisor
11. Example
• “Beautiful impressionist paintings and outstanding sculptures. For me, the original buildings were the best bit! The renovations and creation of an amazing museum are a work of art in themselves. Loved the paintings although a bit disappointed with the low number of Van Gogh.” 😄
• Score: 0.301644
12. Example
beautiful, impressionist, outstanding, … best, … amazing, … love, … disappoint, …
• Pre-Tagged Dictionary
• Positive: [beautiful, wonderful, best, outstanding, amazing, love, …]
• Negative: [disappoint, sad, unhappy, …]
• Score: 0.301644
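A minimal sketch of this dictionary-based scoring, using the word lists from the slide; the normalization rule is an assumption for illustration and will not reproduce the exact 0.301644 score:

```python
positive = {"beautiful", "wonderful", "best", "outstanding", "amazing", "love"}
negative = {"disappoint", "sad", "unhappy"}

def lexicon_score(text):
    # Count positive and negative matches and normalize by review length.
    words = [w.strip(".,!?") for w in text.lower().split()]
    pos = sum(w in positive for w in words)
    neg = sum(w in negative for w in words)
    return (pos - neg) / max(len(words), 1)

print(lexicon_score("Beautiful impressionist paintings and outstanding sculptures. Loved it!"))
```

In practice the stemming step from the pre-processing slide is what lets inflected forms such as “loved” or “disappointed” match the dictionary entries.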
13. Machine Learning Based Approach
Load & Pre-Process Data → Extract Features → Train Model → Evaluate Model
14. ML Based Approach
• Load Data
• 25,000 labeled training tweets
• Another 25,000 validation tweets
• 50,000 test tweets
15. ML Based Approach
• Pre-Process Data:
• Remove punctuation: “I like this one!!!!!” -> “I like this one”
• Filter out stopwords: “this”, “the”
• Normalize each contiguous run of whitespace to a single space: “ goodd” -> “goodd”
• Convert to lowercase: “Upper” -> “upper”
• Stemming: “Learning” -> “learn”, “Done” -> “do”
• Tokenization
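The pre-processing bullets translate almost line-for-line into code. A minimal sketch using NLTK's stopword list and Porter stemmer (the library choices are assumptions; the slides do not name them):

```python
import re
import string
from nltk.corpus import stopwords      # assumes the NLTK stopwords corpus is downloaded
from nltk.stem import PorterStemmer

stop_words = set(stopwords.words("english"))
stemmer = PorterStemmer()

def preprocess(tweet):
    tweet = tweet.lower()                                                  # "Upper" -> "upper"
    tweet = tweet.translate(str.maketrans("", "", string.punctuation))    # "one!!!!!" -> "one"
    tweet = re.sub(r"\s+", " ", tweet).strip()                            # collapse whitespace
    tokens = tweet.split()                                                 # tokenization
    tokens = [t for t in tokens if t not in stop_words]                   # drop "this", "the", ...
    return [stemmer.stem(t) for t in tokens]                              # "learning" -> "learn"

print(preprocess("I like this one!!!!!"))   # ['like', 'one']
```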
16. ML Based Approach
• Extract Features
• Use Word2Vec model to map each word into an n-dimensional vector
• Each element of the vector can be viewed as a feature
17. What Is Word2Vec Model
• Use:
• Map each word into a high-dimensional (> 100) vector
• Input: a large corpus of text
• Output: vector spaces: w=(w1,w2…..wn)
• Given a word, get the similar words
• Advantage:
• Preserve semantic relationship between each word
18. What Is Word2Vec Model
vec(“king”) – vec(“man”) + vec(“woman”) =~ vec(“queen”)
[Figure: the vectors for man, woman, king, and queen; the man→king and woman→queen offsets are roughly parallel]
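The king/queen analogy can be queried directly against any set of trained vectors. A sketch using Gensim's downloader with a small pretrained GloVe model (the pretrained model is an assumption for illustration and requires a one-time download; the slides train their own vectors instead):

```python
import gensim.downloader as api

# Small pretrained word vectors, used here only to illustrate the analogy query.
wv = api.load("glove-wiki-gigaword-50")

# vec("king") - vec("man") + vec("woman") should land near vec("queen").
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```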
19. What Is Word2Vec Model
• Use: Map each word into a high-dimensional (> 100) vector
• Input: a large corpus of text
• Output: vector spaces: w=(w1,w2…..wn)
• Advantage:
• Preserve semantic relationship between each word
• Feature:
• How “close” words or phrases are to each other
• The angle between the vectors of two words is an indicator of how similar the words are
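Since the angle between two word vectors is the similarity signal, the usual measure is cosine similarity. A small NumPy sketch with toy 3-dimensional vectors (real Word2Vec vectors have 100+ dimensions):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between the two vectors: near 1.0 = same direction, near 0 or negative = unrelated.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

good = np.array([0.9, 0.1, 0.3])
great = np.array([0.8, 0.2, 0.4])
kangaroo = np.array([-0.5, 0.9, 0.1])

print(cosine_similarity(good, great))      # high: similar words
print(cosine_similarity(good, kangaroo))   # low: unrelated words
```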
21. How To Train A Word2Vec Model?
• Build the model using Gensim: an open-source Python toolkit
• model = Word2Vec(tweets, size=200, window=2, min_count=5, workers=4)
The quick brown fox jumps over the lazy dog.
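Putting the Gensim call together as a runnable sketch, with a toy corpus standing in for the tokenized tweets (note that in Gensim 4 and later the `size` parameter from the slide is named `vector_size`):

```python
from gensim.models import Word2Vec

# Toy corpus: each "sentence" is a list of tokens; repeated so every word clears min_count.
tweets = [
    ["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"],
    ["the", "quick", "brown", "rabbit", "jumps", "out", "of", "the", "sink"],
] * 200

model = Word2Vec(tweets, vector_size=200, window=2, min_count=5, workers=4)
print(model.wv.most_similar("fox", topn=3))
```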
22. How To Train A Word2Vec Model?
Source text (window = 2): The quick brown fox jumps over the lazy dog
Training samples, by center word:
the → (the, quick), (the, brown)
quick → (quick, the), (quick, brown), (quick, fox)
brown → (brown, the), (brown, quick), (brown, fox), (brown, jumps)
fox → (fox, quick), (fox, brown), (fox, jumps), (fox, over)
23. How To Train A Word2Vec Model?
Source text (window = 2): The quick brown rabbit jumps out of the sink
Training samples, by center word:
the → (the, quick), (the, brown)
quick → (quick, the), (quick, brown), (quick, rabbit)
brown → (brown, the), (brown, quick), (brown, rabbit), (brown, jumps)
rabbit → (rabbit, quick), (rabbit, brown), (rabbit, jumps), (rabbit, out)
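The training-sample tables above can be generated mechanically. A small sketch of the sliding-window pairing (a plain re-implementation for illustration, not Gensim's internal code):

```python
def skipgram_pairs(sentence, window=2):
    """Yield (center, context) pairs for every word, as in the tables above."""
    tokens = sentence.lower().split()
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs("The quick brown fox jumps over the lazy dog"))
# [('the', 'quick'), ('the', 'brown'), ('quick', 'the'), ('quick', 'brown'), ('quick', 'fox'), ...]
```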
24. How To Train A Word2Vec Model?
For a given word (rabbit), we get similar words that appear in the same contexts:
• Input:
• tweet_w2v.most_similar('rabbit')
• Output:
• [(u'fox', 0.7355118989944458), (u'jump', 0.7164269685745239), …]
25. How To Train A Word2Vec Model?
• Input:
• tweet_w2v.most_similar('good')
• Output:
• [(u'goood', 0.7355118989944458), (u'great', 0.7164269685745239),…]
26. Word2Vec Usage in TripAdvisor
User browse sequence: Madrid, Lisbon, Barcelona, Boston
Sentence fed to Word2Vec: “Madrid, Lisbon, Barcelona, Boston”
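The same training code works when each user's browse sequence of destinations is treated as a sentence. A sketch with made-up sequences (only the Madrid, Lisbon, Barcelona, Boston example comes from the slides; the other sequences and counts are hypothetical):

```python
from gensim.models import Word2Vec

# Each user's browse sequence of destinations is one "sentence".
browse_sequences = [
    ["madrid", "lisbon", "barcelona", "boston"],
    ["madrid", "barcelona", "lisbon"],
    ["boston", "madrid", "barcelona"],
] * 100

geo_model = Word2Vec(browse_sequences, vector_size=50, window=2, min_count=5, workers=2)
print(geo_model.wv.most_similar("boston", topn=3))
```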
27. ML Based Approach
• Train the Model
• Represent each word using Word2Vec
• Combine these word vectors
• Train the classifier
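A self-contained sketch of this step with toy data: the word vectors are combined by simple averaging and fed to a logistic-regression classifier. The slides do not say how the vectors are combined or which classifier is used, so both choices here are assumptions for illustration:

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

# Toy labeled tweets standing in for the 25,000 training tweets (1 = positive, 0 = negative).
labeled = [
    (["great", "amazing", "love", "it"], 1),
    (["beautiful", "outstanding", "best", "bit"], 1),
    (["sad", "unhappy", "disappoint", "it"], 0),
    (["bad", "terrible", "hate", "bit"], 0),
] * 50

w2v = Word2Vec([t for t, _ in labeled], vector_size=50, window=2, min_count=1, workers=2)

def tweet_vector(tokens):
    # Combine the word vectors of a tweet by averaging them.
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

X = np.array([tweet_vector(t) for t, _ in labeled])
y = np.array([label for _, label in labeled])

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict([tweet_vector(["love", "this", "amazing", "place"])]))   # predicted label for a new tweet
```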
28. ML Based Approach
• Evaluate the Model
• Use the 50,000 test tweets to assess the model
• Accuracy: 0.78984528240986307
29. Challenges
• Some challenging examples
• “My flight’s been delayed. Brilliant!” ☹️ (Sarcasm)
• “I do not dislike cabin cruisers.” (Negation handling)
• Some promising works, but still low accuracy
• Contextualized Sarcasm Detection on Twitter – David Bamman and Noah A. Smith
30. • Online course:
• https://www.coursera.org/learn/natural-language-processing
• Open resource:
• https://nlp.stanford.edu/ : Stanford NLP Group
• https://arxiv.org/
Each tweet is labeled 1 when it is positive and 0 when it is negative.
Validation tweets are used to tune the model and prevent overfitting. A neural network is used to train the hidden and output layers.
For example, patterns such as “Man is to Woman as King is to Queen” can be generated through algebraic operations on the vector representations of these words: the vector for “Brother” minus the vector for “Man” plus the vector for “Woman” produces a result that is closest to the vector representation of “Sister” in the model.
The vector offsets are roughly parallel to each other.
Now that we have some background on Word2Vec, let me continue with how to train a Word2Vec model.
The common way is to use Gensim. Calling this builds a model for us; we feed the model a large corpus of sentences, which is used to build the vocabulary.
The size is the word vector dimension.
min_count: ignore all words with total frequency lower than this. workers: use this many worker threads to train the model. Because the text corpus is really large, I set the number of threads to 4.
The window is the maximum distance between the current and predicted word within a sentence.
What happens if we set the window size to 2 and the dimension to 200? Let me demonstrate with only one input sentence:
size is the dimensionality of the feature vectors.
window is the maximum distance between the current and predicted word within a sentence.
Given a specific word in the middle of a sentence (the input word), look at the words nearby.
The output probabilities relate to how likely it is to find each vocabulary word near our input word.
For example, if you gave the trained network the input word “Soviet”, the output probabilities are going to be much higher for words like “Union” and “Russia” than for unrelated words like “watermelon” and “kangaroo”.
min_count: ignore all words with total frequency lower than this. workers: use this many worker threads to train the model.
The TripAdvisor recommendation system uses a Word2Vec model.
For example, a user’s browse sequence is “Madrid, Lisbon, Barcelona, Boston”, which means this user actually searched or browsed Madrid first, then the other cities, and finally Boston.
So we can make up a sentence from the user’s browse sequence. The sentence we feed to the Word2Vec model is “Madrid, Lisbon, Barcelona, Boston”, just like we did for “The quick brown fox jumps over the lazy dog”.
After feeding in many such sentences from different users, the model learns quite well which geos are similar in meaning. Then, after I book a vacation rental in Boston, it can also recommend other places in Spain.
Sarcasm is hard even for people to detect, and it depends heavily on its context.
They think the relationship between author and audience is central to understanding the sarcasm phenomenon. The promising work looks at attributes of the author (author features), attributes of the intended recipient of a tweet (audience features), and attributes of responses to potentially sarcastic tweets (response features).
For negation handling, one approach uses grammatical relations among words to model a sentence, and hence to determine which words are affected by negation.
Another approach uses a static window and punctuation marks to determine the scope of negation.
Using natural language processing to detect sarcasm on the internet still has a long way to go and may never be particularly reliable