This document discusses relation extraction from text with minimal supervision. It proposes using a small number of entity pairs, each labeled as having or not having a given relation, to generate positive and negative bags of sentences from a large corpus. These bags are then used to train a relation extraction model with multiple instance learning, which copes with the noise that arises because labels are attached to whole bags rather than to individual sentences. A support vector machine framework is presented that turns the problem into standard supervised learning by assigning each bag's label to the instances in it. The approach is evaluated with a subsequence kernel customized for relation extraction.
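The bag construction and label assignment described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the corpus, the entity pairs, and the function names `build_bags` and `instances_from_bags` are hypothetical, entity mentions are found by plain substring matching, and the SVM training and subsequence kernel are not shown.

```python
from collections import defaultdict

# Hypothetical toy data: a few labeled entity pairs (+1 = relation holds,
# -1 = relation does not hold) and a tiny "corpus" of sentences.
pairs = {
    ("Google", "YouTube"): 1,
    ("Yahoo", "Microsoft"): -1,
}
corpus = [
    "Google bought YouTube in 2006.",
    "Google and YouTube both stream video.",  # mentions the pair, but not the relation
    "Yahoo competes with Microsoft.",
    "Microsoft released a new product.",      # mentions only one entity, so it is skipped
]

def build_bags(corpus, pairs):
    """Collect, for each entity pair, the bag of sentences mentioning both entities."""
    bags = defaultdict(list)
    for sentence in corpus:
        for e1, e2 in pairs:
            # Assumption: naive substring matching stands in for entity detection.
            if e1 in sentence and e2 in sentence:
                bags[(e1, e2)].append(sentence)
    return bags

def instances_from_bags(bags, pairs):
    """Give every sentence the label of its bag, turning bags into labeled instances."""
    return [(s, pairs[pair]) for pair, sentences in bags.items() for s in sentences]

bags = build_bags(corpus, pairs)
instances = instances_from_bags(bags, pairs)
for sentence, label in instances:
    print(label, sentence)
```

The second sentence shows where the noise comes from: it lands in a positive bag and inherits a positive label even though it does not express the relation. The resulting `(sentence, label)` list is in the shape a standard supervised learner expects.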
Learning to Extract Relations from the Web using Minimal Supervision
1. Learning to Extract Relations from the Web
using Minimal Supervision
Razvan C. Bunescu
Machine Learning Group
Department of Computer Sciences
University of Texas at Austin
razvan@cs.utexas.edu
Raymond J. Mooney
Machine Learning Group
Department of Computer Sciences
University of Texas at Austin
mooney@cs.utexas.edu
2. Introduction: Relation Extraction
• People are often interested in finding relations between
entities:
– What proteins interact with IRAK1?
– Which companies were acquired by Google?
– In which city was Mozart born?
• Relation Extraction (RE) is the task of automatically
locating predefined types of relations in text documents.
3. Introduction: Relation Extraction
• Relation Examples:
1) Protein Interactions:
– The phosphorylation of Pellino2 by activated IRAK1 could trigger
the translocation of IRAKs from complex I to II.
2) Company Acquisitions:
– Search engine giant Google has bought video-sharing website
YouTube in a controversial $1.6 billion deal.
3) People Birthplaces:
– Wolfgang Amadeus Mozart was born to Leopold and Ana Maria
Mozart, in the front room of Getreidegasse 9 in Salzburg.
4. Motivation: Minimal Supervision
• Developing an RE system usually requires a significant
amount of human effort:
– Extraction patterns designed by a human expert [Blaschke et al.,
2002].
– Extraction patterns learned from a corpus of manually annotated
examples [Zelenko et al., 2003; Culotta and Sorensen, 2004].
• A different RE approach:
– Extraction patterns learned from weak supervision derived from a
significantly reduced amount of human supervision.
5. Relation Extraction with Minimal Supervision
• Human supervision = a handful of pairs of entities known
to exhibit (+) or not exhibit (–) a particular relation.
• Weak supervision = bags of sentences containing the
pairs, automatically extracted from a very large corpus.
• Use bags of sentences in a Multiple Instance Learning
framework [Dietterich et al., 1997] to train a relation
extraction model.
6. Types of Supervision for RE
• Single Instance Learning (SIL):
– A corpus of positive and negative sentence examples, with the two
entity names annotated.
– A sentence example is positive iff it explicitly asserts the target
relationship between the two annotated entities.
• Multiple Instance Learning (MIL):
– A corpus of positive and negative bags of sentences.
– A bag is positive iff it contains at least one positive sentence
example.
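The bag-labeling rule above can be sketched in a few lines of Python. The per-sentence labels and the helper name are hypothetical, chosen only for illustration; in the MIL setting the individual sentence labels are not observed at training time, only the bag label is.

```python
def bag_label(instance_labels):
    """MIL rule: a bag is positive iff at least one instance is positive."""
    return any(instance_labels)


# Hypothetical per-sentence labels (True = the sentence asserts the relation).
positive_bag = [True, False, True]
negative_bag = [False, False, False]

print(bag_label(positive_bag))  # True
print(bag_label(negative_bag))  # False
```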
7. RE from Web with Minimal Supervision
+/–   Argument a1      Argument a2
+     Google           YouTube
+     Adobe Systems    Macromedia
+     Viacom           DreamWorks
+     Novartis         Eon Labs
–     Yahoo            Microsoft
–     Pfizer           Teva
Example pairs of named entities for R = Corporate Acquisitions.
8. Minimal Supervision: Positive bags
Use a search engine to extract bags of sentences containing
both entities in a pair.
Google, YouTube
S1: Search engine giant Google has bought video-sharing website YouTube in a
controversial $1.6 billion deal.
S2: The companies will merge Google's search expertise with YouTube's video
expertise, pushing what executives believe is a hot emerging market of video
offered over the Internet.
. . .
Sn: Google has acquired social media company YouTube for $1.65 billion in a
stock-for-stock transaction as announced by Google Inc. on October 9, 2006.
11. Minimal Supervision: Negative Bags
Use a search engine to extract bags of sentences containing
both entities in a pair.
Yahoo, Microsoft
S1: Yahoo is starting to look more like Microsoft and less like the innovative,
unified service that got my loyalty in the first place.
S2: Whatever it is, Yahoo is dashing in front, with Microsoft close behind.
. . .
Sn: Yahoo and Microsoft teamed up on October 12 to make their instant
messaging software compatible.
13. MIL Background: Domains
• Originally introduced to solve a Drug Activity prediction
problem in biochemistry [Dietterich et al., 1997]
– Each molecule has a limited set of low energy conformations ⇒
bags of 3D conformations.
– A bag is positive if at least one of the conformations binds to a
predefined target.
– MUSK dataset [Dietterich et al., 1997]
• A bag is positive if the molecule smells “musky”.
• Content Based Image Retrieval [Zhang et al., 2002]
• Text categorization [Andrews et al., 03], [Ray et al., 05].
14. MIL Background: Algorithms
• Axis Parallel Rectangles [Dietterich et al., 1997]
• Diverse Density [Maron, 1998]
• Multiple Instance Logistic Regression [Ray & Craven, 05]
• Multi-Instance SVM kernels of [Gartner et al., 2002]
– Normalized Set Kernel.
– Statistic Kernel.
15. MIL for Relation Extraction
• Focus on SVM approaches
– Through kernels, can work efficiently with instances that implicitly
belong to high-dimensional feature spaces.
– Can reuse existing relation extraction kernels.
• Multi-Instance kernels of [Gartner et al., 2002] are not appropriate
when there are very few bags:
– Bags (not instances) are considered as training examples.
– The number of SVs is upper bounded by the number of bags.
– Very few bags ⇒ very few SVs ⇒ insufficient capacity.
16. MIL for Relation Extraction
• A simple approach to MIL is to transform it into a standard supervised
learning problem:
– Apply the bag label to all instances inside the bag.
– Train a standard supervised algorithm on the transformed dataset.
– Despite class noise, obtains competitive results [Ray & Craven, 05]
Google, YouTube
S1 Search engine giant Google has bought video-sharing website YouTube in a controversial
$1.6 billion deal.
S2 The companies will merge Google's search expertise with YouTube's video expertise, pushing
what executives believe is a hot emerging market of video offered over the Internet.
. .
. .
. .
Sn Google has acquired social media company YouTube for $1.65 billion in a stock-for-stock
transaction as announced by Google Inc. on October 9, 2006.
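The transformation described on this slide can be sketched as follows. This is a minimal illustration, assuming bags are given as (label, sentences) pairs; the function name and the shortened sentences are made up for the example.

```python
def bags_to_instances(bags):
    """Flatten (label, sentences) bags into (sentence, label) training pairs.

    Every sentence inherits the label of its bag, so positive bags contribute
    some mislabeled (noisy) positive instances.
    """
    return [(sentence, label) for label, sentences in bags for sentence in sentences]


# Hypothetical, abridged bags for illustration only.
bags = [
    (+1, ["Google has bought YouTube.", "Google and YouTube will merge expertise."]),
    (-1, ["Yahoo and Microsoft teamed up."]),
]
instances = bags_to_instances(bags)
for sentence, label in instances:
    print(label, sentence)
```

The second sentence of the positive bag does not assert an acquisition, yet it is labeled +1; this is the class noise the slide refers to.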
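18. SVM Framework with MIL Supervision
minimize:
$$J(w) = \frac{1}{2}\|w\|^2 + \frac{C}{L}\left( c_p \frac{L_n}{L} \sum_{X \in \mathcal{X}_p} \sum_{x \in X} \xi_x + c_n \frac{L_p}{L} \sum_{X \in \mathcal{X}_n} \sum_{x \in X} \xi_x \right)$$
subject to:
$$w \cdot \phi(x) + b \ge +1 - \xi_x, \quad \forall x \in X \in \mathcal{X}_p$$
$$w \cdot \phi(x) + b \le -1 + \xi_x, \quad \forall x \in X \in \mathcal{X}_n$$
$$\xi_x \ge 0$$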
19. SVM Framework with MIL Supervision
• Regularization term: $\frac{1}{2}\|w\|^2$
• Error on positive bags: $\frac{C}{L} \, c_p \frac{L_n}{L} \sum_{X \in \mathcal{X}_p} \sum_{x \in X} \xi_x$
• Error on negative bags: $\frac{C}{L} \, c_n \frac{L_p}{L} \sum_{X \in \mathcal{X}_n} \sum_{x \in X} \xi_x$
22. SVM Framework with MIL Supervision
• c_p, c_n > 0, with c_p + c_n = 1, control the relative influence that
false negatives vs. false positives have on the objective
function.
• Want c_p < 0.5 (penalize false negatives less than false
positives); used c_p = 0.1.
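To make the effect of c_p concrete, here is a minimal Python sketch of the slack-cost part of the objective, J(w) = 1/2 ||w||^2 + (C/L) (c_p (L_n/L) Σ_pos ξ + c_n (L_p/L) Σ_neg ξ). It assumes that L_p and L_n are the numbers of sentences in positive and negative bags and that L = L_p + L_n, which the slides do not state explicitly; the function names and toy slack values are illustrative only.

```python
def slack_weights(L_p, L_n, C=1.0, c_p=0.1):
    """Per-instance slack costs for positive and negative instances.

    Assumption: L_p and L_n count instances in positive and negative bags,
    and L = L_p + L_n. c_n is derived from c_p + c_n = 1.
    """
    L = L_p + L_n
    c_n = 1.0 - c_p
    w_pos = (C / L) * c_p * (L_n / L)
    w_neg = (C / L) * c_n * (L_p / L)
    return w_pos, w_neg


def objective(w_norm_sq, pos_slacks, neg_slacks, C=1.0, c_p=0.1):
    """Value of J(w) given ||w||^2 and the slack of every training instance."""
    w_pos, w_neg = slack_weights(len(pos_slacks), len(neg_slacks), C, c_p)
    return 0.5 * w_norm_sq + w_pos * sum(pos_slacks) + w_neg * sum(neg_slacks)


w_pos, w_neg = slack_weights(L_p=50, L_n=50)
print(w_neg / w_pos)  # with balanced classes, a negative slack costs about 9x more
```

With c_p = 0.1, a margin violation on a sentence from a negative bag is penalized more heavily than one from a positive bag, which matches the intent that sentences in positive bags may legitimately be negative.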
23. SVM Framework with MIL Supervision
• Dual formulation ⇒ kernel between bag instances: K(x1, x2) = φ(x1)·φ(x2).
• Use SSK, a subsequence kernel customized for relation extraction
[Bunescu & Mooney, 2005].
24. The Subsequence Kernel for Relation Extraction
• Implicit features are sequences of words anchored at the
two entity names [Bunescu & Mooney, 2005].
s = a word sequence:
e1 … bought … e2 … billion … deal.
x = an example sentence, containing s as a subsequence:
Google has bought video-sharing website YouTube in a controversial $1.6 billion deal.
Gaps: g1 = 1, g2 = 3, g3 = 4, g4 = 0
φ_s(x) = the value of feature s in example x:
$$\phi_s(x) = \lambda^{\mathrm{gap}(s,x)} = \lambda^{\sum_i g_i} = \lambda^{1+3+4+0}$$
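As a quick numeric check of the feature value, the sketch below evaluates λ raised to the total gap for the gap values listed on the slide. The decay factor λ = 0.5 is an arbitrary choice for illustration; the slides do not give the value that was actually used.

```python
def feature_value(gaps, lam):
    """phi_s(x) = lam ** gap(s, x), where gap(s, x) is the sum of all gaps."""
    return lam ** sum(gaps)


gaps = [1, 3, 4, 0]  # g1..g4 from the slide
print(sum(gaps))                     # 8
print(feature_value(gaps, lam=0.5))  # 0.5 ** 8 = 0.00390625
```

Because 0 < λ < 1, occurrences of the pattern that are spread over more intervening words contribute less.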
25. The Subsequence Kernel for Relation Extraction
• K(x1, x2) = φ(x1)·φ(x2) = the number of common “anchored”
subsequences between x1 and x2, weighted by their total gap.
• Many relations require at least one content word ⇒ modify the
kernel to optionally ignore sequences formed exclusively of
stop words and punctuation signs.
• Kernel is computed efficiently by a generalized version of
the dynamic programming procedure from [Lodhi et al., 2002].
[Bunescu & Mooney, 2005].
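For very short sentences the kernel can be illustrated by brute force. The sketch below is not the dynamic programming procedure cited on the slide: it enumerates subsequences explicitly, uses a single fixed length n, and treats a subsequence as "anchored" simply when it contains both the e1 and e2 tags. All of these are simplifying assumptions, as are the function names and the value λ = 0.5.

```python
from collections import defaultdict
from itertools import combinations


def anchored_features(tokens, n, lam):
    """Map each length-n subsequence containing both e1 and e2 to its weight.

    Every occurrence contributes lam ** gap, where gap is the number of
    tokens skipped between the first and last matched positions.
    """
    features = defaultdict(float)
    for idx in combinations(range(len(tokens)), n):
        seq = tuple(tokens[i] for i in idx)
        if "e1" in seq and "e2" in seq:
            gap = (idx[-1] - idx[0] + 1) - n
            features[seq] += lam ** gap
    return features


def brute_force_kernel(x1, x2, n=3, lam=0.5):
    """K(x1, x2) as the dot product of the explicit feature vectors."""
    f1 = anchored_features(x1, n, lam)
    f2 = anchored_features(x2, n, lam)
    return sum(value * f2[seq] for seq, value in f1.items() if seq in f2)


x1 = "e1 bought e2".split()
x2 = "e1 has bought e2".split()
print(brute_force_kernel(x1, x2))  # 0.5: shared pattern (e1, bought, e2), gaps 0 and 1
```

Enumeration is exponential in sentence length, which is why an efficient dynamic programming formulation is needed in practice.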
26. Two Types of Bias
• The MIL approach to RE differs from other MIL problems
in two respects:
– The training dataset contains very few bags.
– The bags can be very large.
• These properties lead to two types of bias:
– [Type I] Combinations of words that are correlated to the two
relation arguments are given too much weight in the learned
model.
– [Type II] Words specific to a particular relation instance are given
too much weight.
27. Type I Bias
Google, YouTube
S1 Search engine giant Google has bought video-sharing website YouTube
in a controversial $1.6 billion deal.
S2 The companies will merge Google's search expertise with YouTube's
video expertise, pushing what executives believe is a hot emerging
market of video offered over the Internet.
• Overweighted Patterns:
– search … e1 … video … e2
– … e1 … video … e2
– e1 … search … e2
– e1 … search … e2 … video
28. Type II Bias
Google, YouTube
S1: Ever since Google paid $1.65 billion for YouTube in October, plenty of
pundits from Mark Cuban to yours truly have been waiting for the other
shoe to drop.
S2: Google Gobbles Up YouTube for $1.6 BILLION October 9, 2006
S3: Google has acquired social media company YouTube for $1.65 billion in a
stock-for-stock transaction as announced by Google Inc. on October 9, 2006.
• Overweighted Patterns:
– … e1 … for … e2 … October
– … e1 … has … e2 … October
29. A Solution for Type I Bias
• Use the SSK approach, with a new feature weight:
$$\phi_s(x) = \lambda^{\mathrm{gap}(s,x)} \quad\Rightarrow\quad \phi_s(x) = \lambda^{\mathrm{gap}(s,x)} \cdot \prod_{w \in s} \gamma(w)$$
• Modify subsequence kernel computations to use word
weights γ(w).
• Want small γ(w) for words w correlated with either of the
two relation arguments.
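A small sketch of the reweighted feature value, assuming word weights are supplied as a dictionary and that words without an entry get weight 1.0 (a default chosen here for illustration, not taken from the slides). The pattern, the gaps, and the weight for "video" are made-up values.

```python
from math import prod


def weighted_feature_value(words, gaps, gamma, lam=0.5):
    """phi_s(x) = lam ** gap(s, x) * product of gamma(w) over the words w in s.

    Words missing from `gamma` default to weight 1.0 (an assumption).
    """
    return lam ** sum(gaps) * prod(gamma.get(w, 1.0) for w in words)


# Hypothetical weights: "video" is strongly correlated with the argument YouTube.
gamma = {"video": 0.1}
print(weighted_feature_value(["bought"], gaps=[1, 0], gamma=gamma))
print(weighted_feature_value(["video"], gaps=[1, 0], gamma=gamma))
```

A pattern built on "video" is scaled down by its small weight, while a pattern built on "bought" keeps its original gap-based value.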
30. A Solution for Type I Bias: Word Weights
Use a formula for word weights γ(w) that discounts the effect
of correlations of w with either of the two arguments a1 and a2:
$$\gamma(w) = \frac{C(X, w) - P(w \mid X.a_1 \vee X.a_2) \cdot C(X)}{C(X, w)}$$
31. A Solution for Type I Bias: Word Weights
• C(X) = the # of sentences in bag X.
• C(X, w) = the # of sentences in bag X that contain word w.
33. A Solution for Type I Bias: Word Weights
• P(w | X.a1 ∨ X.a2) = the probability that the word w appears in a sentence due
only to the presence of X.a1 or X.a2, assuming X.a1 and
X.a2 are independent causes for w:
$$P(w \mid X.a_1 \vee X.a_2) = 1 - \bigl(1 - P(w \mid X.a_1)\bigr)\bigl(1 - P(w \mid X.a_2)\bigr)$$
$$= P(w \mid X.a_1) + P(w \mid X.a_2) - P(w \mid X.a_1) \cdot P(w \mid X.a_2)$$
• P(w|a) is the probability that w appears in a sentence due to the presence of a.
• Estimate P(w|a) using counts from a separate bag of sentences containing a.
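The word weight can be worked through numerically. The sketch below assumes the weight is γ(w) = (C(X, w) − P(w | X.a1 ∨ X.a2) · C(X)) / C(X, w), with the noisy-or combination given on this slide; the counts and probabilities are invented for the example, and whether negative weights are clipped is not stated in the slides, so no clipping is applied.

```python
def noisy_or(p_w_a1, p_w_a2):
    """P(w | a1 or a2), treating a1 and a2 as independent causes of w."""
    return 1.0 - (1.0 - p_w_a1) * (1.0 - p_w_a2)


def word_weight(c_X, c_Xw, p_w_a1, p_w_a2):
    """Fraction of the occurrences of w in bag X not explained by the arguments.

    c_X  : number of sentences in bag X
    c_Xw : number of sentences in bag X that contain w (must be > 0)
    """
    expected_from_arguments = noisy_or(p_w_a1, p_w_a2) * c_X
    return (c_Xw - expected_from_arguments) / c_Xw


# Hypothetical counts: w occurs in 40 of 100 sentences, and the arguments
# alone would be expected to account for 37 of those occurrences.
print(noisy_or(0.3, 0.1))              # approximately 0.37
print(word_weight(100, 40, 0.3, 0.1))  # approximately 0.075
```

A word that appears about as often as its correlation with the arguments predicts gets a weight near zero, so patterns containing it contribute little to the kernel.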
34. MIL Relation Extraction Datasets
• Given two arguments a1 and a2, submit query string
“a1 * * * * * * * a2” to Google.
• Download the resulting documents (fewer than 1000).
• Split text into sentences and tokenize using the OpenNLP
package.
• Keep only sentences containing both a1 and a2.
• Replace closest occurrences of a1 and a2 with generic tags
e1 and e2.
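The last two steps of the pipeline (filtering and tagging) can be sketched in plain Python. This is an illustrative approximation: it works on raw strings rather than OpenNLP tokens, measures "closest" in characters, and uses hypothetical function names; the search-engine query and download steps are not shown.

```python
import re


def tag_closest(sentence, a1, a2):
    """Replace the closest a1/a2 occurrences with e1/e2; None if either is absent."""
    m1 = list(re.finditer(re.escape(a1), sentence))
    m2 = list(re.finditer(re.escape(a2), sentence))
    best = None
    for x in m1:
        for y in m2:
            if x.start() < y.end() and y.start() < x.end():
                continue  # skip overlapping matches
            dist = max(x.start(), y.start()) - min(x.end(), y.end())
            if best is None or dist < best[0]:
                best = (dist, x, y)
    if best is None:
        return None
    _, x, y = best
    # Replace the right-most span first so the earlier offsets stay valid.
    spans = sorted([(x.start(), x.end(), "e1"), (y.start(), y.end(), "e2")], reverse=True)
    for start, end, tag in spans:
        sentence = sentence[:start] + tag + sentence[end:]
    return sentence


def build_bag(sentences, a1, a2):
    """Keep only sentences containing both arguments, with the closest pair tagged."""
    tagged = (tag_closest(s, a1, a2) for s in sentences)
    return [t for t in tagged if t is not None]


sentences = [
    "Google has acquired YouTube, as announced by Google Inc.",
    "Google reported quarterly earnings.",
]
print(build_bag(sentences, "Google", "YouTube"))
```

Only the occurrence of "Google" nearest to "YouTube" is replaced; the later mention is left as an ordinary word.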
35. MIL Relation Extraction Datasets
Corporate Acquisitions Dataset
Training Pairs:
+/–   Argument a1      Argument a2           Bag size
+     Google           YouTube               1375
+     Adobe Systems    Macromedia            622
+     Viacom           DreamWorks            323
+     Novartis         Eon Labs              311
–     Yahoo            Microsoft             163
–     Pfizer           Teva                  247
Testing Pairs (all bag sentences manually labeled):
+/–   Argument a1      Argument a2           Bag size
+     Pfizer           Rinat Neuroscience    50 (41)
+     Yahoo            Inktomi               433 (115)
–     Google           Apple                 281
–     Viacom           NBC                   231
36. MIL Relation Extraction Datasets
Person Birthplace Dataset
Training Pairs:
+/–   Argument a1         Argument a2    Bag size
+     Franz Kafka         Prague         522
+     Andre Agassi        Las Vegas      386
+     Charlie Chaplin     London         292
+     George Gershwin     New York       260
–     Luc Besson          New York       74
–     W. A. Mozart        Vienna         288
Testing Pairs (all bag sentences manually labeled):
+/–   Argument a1         Argument a2    Bag size
+     Luc Besson          Paris          126 (6)
+     Marie Antoinette    Vienna         39 (10)
–     Charlie Chaplin     Hollywood      266
–     George Gershwin     London         104
37. Experimental Results: Systems
• [SSK-MIL] MIL formulation using the original SSK.
• [SSK-T1] MIL formulation with the SSK modified to use
word weights in order to reduce Type I bias.
• [BW-MIL] MIL formulation using a bag-of-words kernel.
• [SSK-SIL] SIL formulation using the original subsequence
kernel:
– Use manually labeled instances from the test bags.
– Train on instances from one positive bag and one negative bag, test
on instances from the other two bags.
– Average results over all four combinations.
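The SSK-SIL protocol above yields four train/test splits when there are two positive and two negative test bags. A minimal sketch of the enumeration follows; the bag names are just labels for the Corporate Acquisitions test pairs, and the function name is illustrative.

```python
from itertools import product


def sil_splits(pos_bags, neg_bags):
    """Yield (train, test) splits: train on one positive and one negative bag,
    test on the remaining positive and negative bags."""
    for p, n in product(range(len(pos_bags)), range(len(neg_bags))):
        train = [pos_bags[p], neg_bags[n]]
        test = [b for i, b in enumerate(pos_bags) if i != p]
        test += [b for i, b in enumerate(neg_bags) if i != n]
        yield train, test


pos_bags = ["Pfizer/Rinat Neuroscience", "Yahoo/Inktomi"]
neg_bags = ["Google/Apple", "Viacom/NBC"]
splits = list(sil_splits(pos_bags, neg_bags))
for train, test in splits:
    print("train:", train, "test:", test)
```

The reported SSK-SIL figure would then be the mean of the per-split scores.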
38. Experimental Results: Evaluation
1) Plot Precision vs. Recall (PR) graphs:
– vary a threshold on the extraction confidence.
2) Report Area Under PR Curve (AUC).
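The evaluation procedure can be sketched as follows. The slides do not say how the area is computed, so the sketch uses average precision (mean precision at each correctly extracted positive) as one common estimate; the scores and labels are made up, ties between scores are ignored, and at least one positive example is assumed.

```python
def pr_curve(scores, labels):
    """(recall, precision) after each example, highest confidence first.

    Equivalent to lowering the confidence threshold past one example at a time.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    total_pos = sum(labels)
    points, tp = [], 0
    for k, (_, y) in enumerate(ranked, start=1):
        tp += y
        points.append((tp / total_pos, tp / k))
    return points


def average_precision(scores, labels):
    """Estimate of the area under the PR curve: mean precision at each positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    tp, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y:
            tp += 1
            total += tp / k
    return total / tp if tp else 0.0


scores = [0.9, 0.8, 0.7, 0.6]  # hypothetical extraction confidences
labels = [1, 0, 1, 0]          # 1 = sentence truly expresses the relation
print(pr_curve(scores, labels))
print(average_precision(scores, labels))  # (1/1 + 2/3) / 2
```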
41. Experimental Results: AUC
• SSK-T1 is significantly more accurate than SSK-MIL.
• SSK-T1 is competitive with SSK-SIL, however:
– SSK-T1 supervision = only 6 pairs (4 positive).
– SSK-SIL average supervision:
• ~500 manually labeled sentences (78 positive) for Acquisitions.
• ~300 manually labeled sentences (22 positive) for Birthplaces.
Dataset                 SSK-MIL   SSK-T1   BW-MIL   SSK-SIL
Company Acquisitions    76.9%     81.1%    45.8%    80.4%
People Birthplace       72.5%     78.2%    69.2%    73.4%
42. Applications & Extensions
• A “Google Sets” system for relation extraction
– Ideally, the user provides only positive pairs.
– Likely negative examples are created by pairing the argument
entity with other named entities in the same sentence.
– Any pair of entities different from the relation pair is likely to be
negative ⇒ implicit negative evidence.
Input:
Google           YouTube
Adobe Systems    Macromedia
Viacom           DreamWorks
Novartis         Eon Labs
Output:
Pfizer           Rinat Neuroscience
Yahoo            Inktomi
. . .
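One possible reading of the implicit-negative idea is sketched below. The slides do not specify which argument is paired with the other entities, so this sketch pairs each argument of the positive pair with every other named entity in the sentence; that choice, the function name, and the example entity list are all assumptions.

```python
def implicit_negatives(positive_pair, sentence_entities):
    """Pair each argument of a positive pair with the other entities in a sentence.

    Returns candidate pairs that are likely (not guaranteed) to be negative.
    """
    a1, a2 = positive_pair
    others = [e for e in sentence_entities if e not in (a1, a2)]
    return [(a1, e) for e in others] + [(e, a2) for e in others]


# Hypothetical named entities found in one sentence.
entities = ["Google", "YouTube", "Yahoo"]
print(implicit_negatives(("Google", "YouTube"), entities))
# [('Google', 'Yahoo'), ('Yahoo', 'YouTube')]
```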
43. Future Work
• Investigate methods for reducing Type II bias.
• Experiment with other, more sophisticated MIL algorithms.
• Explore the effect of Type I and Type II bias when using
dependency information in the relation extraction kernel.
44. Conclusion
• Presented a new approach to Relation Extraction, trained
using only a handful of pairs of entities known to exhibit or
not exhibit the target relationship.
• Extended an existing subsequence kernel to resolve
problems caused by the minimal supervision provided.
• The new MIL approach is competitive with its SIL
counterpart that uses significantly more human supervision.