mlas06_nigam_tie_01.ppt
Speaker Notes

  • Not only did this support extremely useful and targeted search and data navigation, it also allowed a highly summarized display.
  • CiteSeer, which we all know and love, is made possible by the automatic extraction of title and author information from research paper headers and references in order to build the reference graph that we use to find related work.
  • ML… although this is an area where ML has not yet trounced the hand-built systems. In some of the latest evaluations, a hand-built system shared first place with an ML system. Many companies are now making a business from IE (from the Web): WhizBang, Inxight, Intelliseek, ClearForest.
  • Technical approach? At CMU I worked with HMMs. HMMs are the standard tool: a model that makes certain Markov and independence assumptions, as depicted in this graphical model. The probability of a particular state sequence (the values of these random variables) is a product of independent probabilities. Emissions are traditionally atomic entities, like words. A big part of the power is the finite-state context: Viterbi decoding. We say this is a generative model because it has parameters for the probability of producing the observed emissions (given the state).
  • Both in academia and in industry, FSMs are the dominant method of information extraction. The HMM acts as a generative model of a research paper header. It has the traditional parameters: emissions are multinomial distributions over words, and the parameters are set to maximize the likelihood of the emissions in the training data. HMMs work better than several alternative methods, but...
  • 1. Prefer a richer representation: one that allows for multiple, arbitrary features. In many cases, for example names and other large-vocabulary fields, the identity of the word alone is not very predictive because you have probably never seen it before. For example, "Wisniewski": a name, not a city. In many other cases, such as FAQ segmentation, the relevant features are not really the words at all, but attributes of the line or paragraph as a whole, so emissions are whole lines of text at a time.
  • MEMMs do have one technical problem: they are biased towards states with few siblings. No time to explain the intricacies now. CRFs are a generalization of MEMMs: move the normalizer. Don't let all the equations scare you: this slide is easier than it looks. Here is the old, traditional HMM: the probability of a state and observation sequence is the product of each state transition and each observation emission in the sequence. MEMMs, as we discussed, replace this with a conditional model, in which we no longer generate the observations; we write it as an exponential function. The only difference between MEMMs and CRFs is that we move the normalizer outside the product. Inference in MRFs usually requires nasty Gibbs sampling, but due to the special structure, clever dynamic programming comes to the rescue again.
  • The graphical model on the previous slide was a special case. In general, the observations don't have to be chopped up, and feature functions can ask arbitrary questions about any range of the observation sequence. I'm not going to get into the method for learning the lambda weights from training data; it is simply a matter of maximum likelihood: this is what we try to maximize, differentiate, and solve with conjugate gradient. Again, dynamic programming makes it efficient.

mlas06_nigam_tie_01.ppt Presentation Transcript

  • 1. Machine Learning for Information Extraction: An Overview Kamal Nigam Google Pittsburgh With input, slides and suggestions from William Cohen, Andrew McCallum and Ion Muslea
  • 2. Example: A Problem Genomics job Mt. Baker, the school district Baker Hostetler , the company Baker, a job opening
  • 3. Example: A Solution
  • 4. Job Openings: Category = Food Services Keyword = Baker Location = Continental U.S.
  • 5. Extracting Job Openings from the Web Title: Ice Cream Guru Description: If you dream of cold creamy… Contact: [email_address] Category: Travel/Hospitality Function: Food Services
  • 6. Potential Enabler of Faceted Search
  • 7. Lots of Structured Information in Text
  • 8. IE from Research Papers
  • 9. What is Information Extraction?
    • Recovering structured data from formatted text
  • 10. What is Information Extraction?
    • Recovering structured data from formatted text
      • Identifying fields (e.g. named entity recognition)
  • 11. What is Information Extraction?
    • Recovering structured data from formatted text
      • Identifying fields (e.g. named entity recognition)
      • Understanding relations between fields (e.g. record association)
  • 12. What is Information Extraction?
    • Recovering structured data from formatted text
      • Identifying fields (e.g. named entity recognition)
      • Understanding relations between fields (e.g. record association)
      • Normalization and deduplication
  • 13. What is Information Extraction?
    • Recovering structured data from formatted text
      • Identifying fields (e.g. named entity recognition)
      • Understanding relations between fields (e.g. record association)
      • Normalization and deduplication
    • Today, focus mostly on field identification & a little on record association
  • 14. IE Posed as a Machine Learning Task
    • Training data: documents marked up with ground truth
    • In contrast to text classification, local features crucial. Features of:
      • Contents
      • Text just before item
      • Text just after item
      • Begin/end boundaries
    Example: "… 00 : pm Place : Wean Hall Rm 5409 Speaker : Sebastian Thrun …", annotated with prefix, contents, and suffix spans around the extracted item.
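    A minimal sketch of how one marked-up training candidate might be represented, pairing the candidate span with its prefix and suffix context. The class name, field label, and window size below are illustrative assumptions, not part of the original slides.

    # Minimal sketch: one labeled candidate (here a "speaker" field) together
    # with prefix and suffix context windows. Names and the window size are
    # illustrative assumptions.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Candidate:
        tokens: List[str]   # the whole tokenized document
        start: int          # index of the first token of the candidate span
        end: int            # index one past the last token of the span
        label: str          # ground-truth field label, e.g. "speaker"

        def contents(self) -> List[str]:
            return self.tokens[self.start:self.end]

        def prefix(self, k: int = 3) -> List[str]:
            return self.tokens[max(0, self.start - k):self.start]

        def suffix(self, k: int = 3) -> List[str]:
            return self.tokens[self.end:self.end + k]

    doc = "00 : pm Place : Wean Hall Rm 5409 Speaker : Sebastian Thrun".split()
    cand = Candidate(doc, start=11, end=13, label="speaker")
    print(cand.prefix(), cand.contents(), cand.suffix())
    # ['5409', 'Speaker', ':'] ['Sebastian', 'Thrun'] []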
  • 15. Good Features for Information Extraction
    • Example word features:
      • identity of word
      • is in all caps
      • ends in “-ski”
      • is part of a noun phrase
      • is in a list of city names
      • is under node X in WordNet or Cyc
      • is in bold font
      • is in hyperlink anchor
      • features of past & future
      • last person name was female
      • next two words are “and Associates”
    Example line/paragraph features: begins-with-number, begins-with-ordinal, begins-with-punctuation, begins-with-question-word, begins-with-subject, blank, contains-alphanum, contains-bracketed-number, contains-http, contains-non-space, contains-number, contains-pipe, contains-question-mark, contains-question-word, ends-with-question-mark, first-alpha-is-capitalized, indented, indented-1-to-4, indented-5-to-10, more-than-one-third-space, only-punctuation, prev-is-blank, prev-begins-with-ordinal, shorter-than-30. Creativity and Domain Knowledge Required!
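    A minimal sketch of how a few of the word features above might be computed. The feature names, the toy city list, and the single-token interface are illustrative assumptions rather than any particular system's implementation.

    # Minimal sketch: computing a handful of the word features listed above.
    CITY_NAMES = {"pittsburgh", "boston", "new york"}   # toy gazetteer

    def word_features(token: str) -> dict:
        return {
            "identity": token.lower(),
            "is_all_caps": token.isupper(),
            "ends_in_ski": token.lower().endswith("ski"),
            "in_city_list": token.lower() in CITY_NAMES,
            "begins_with_number": token[:1].isdigit(),
            "contains_question_mark": "?" in token,
        }

    print(word_features("Wisniewski"))
    # {'identity': 'wisniewski', 'is_all_caps': False, 'ends_in_ski': True, ...}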
  • 16.
    • Word Features
      • lists of job titles,
      • Lists of prefixes
      • Lists of suffixes
      • 350 informative phrases
    • HTML/Formatting Features
      • {begin, end, in} x {<b>, <i>, <a>, <hN>} x {lengths 1, 2, 3, 4, or longer}
      • {begin, end} of line
    Good Features for Information Extraction. Example features: is capitalized; is mixed caps; is all caps; initial cap; contains digit; all lowercase; is initial; punctuation (period, comma, apostrophe, dash); preceded by HTML tag; character n-gram classifier says string is a person name (80% accurate); in stopword list (the, of, their, etc.); in honorific list (Mr, Mrs, Dr, Sen, etc.); in person suffix list (Jr, Sr, PhD, etc.); in name particle list (de, la, van, der, etc.); in Census lastname list, segmented by P(name); in Census firstname list, segmented by P(name); in locations lists (states, cities, countries); in company name list ("J. C. Penny"); in list of company suffixes (Inc, & Associates, Foundation). Creativity and Domain Knowledge Required!
  • 17. IE History
    • Pre-Web
    • Mostly news articles
      • De Jong’s FRUMP [1982]
        • Hand-built system to fill Schank-style “scripts” from news wire
      • Message Understanding Conference (MUC) DARPA [’87-’95], TIPSTER [’92-’96]
    • Most early work dominated by hand-built models
      • E.g. SRI’s FASTUS , hand-built FSMs.
      • But by 1990’s, some machine learning: Lehnert, Cardie, Grishman and then HMMs: Elkan [Leek ’97], BBN [Bikel et al ’98]
    • Web
    • AAAI ’94 Spring Symposium on “Software Agents”
      • Much discussion of ML applied to Web. Maes, Mitchell, Etzioni.
    • Tom Mitchell’s WebKB, ‘96
      • Build KB’s from the Web.
    • Wrapper Induction
      • Initially hand-built, then ML: [Soderland '96], [Kushmerick '97], …
  • 18. Landscape of ML Techniques for IE: Any of these models can be used to capture words, formatting or both. Illustrated on the sentence "Abraham Lincoln was born in Kentucky.": Classify Candidates (classifier asks: which class?); Sliding Window (classifier asks: which class? try alternate window sizes); Boundary Models (classifier labels BEGIN and END positions); Finite State Machines (most likely state sequence?); Wrapper Induction ("<b><i> Abraham Lincoln </i></b> was born in Kentucky." — learn and apply a <b> <i> PersonName pattern for a website).
  • 19. Sliding Windows & Boundary Detection
  • 20. Information Extraction by Sliding Windows GRAND CHALLENGES FOR MACHINE LEARNING Jaime Carbonell School of Computer Science Carnegie Mellon University 3:30 pm 7500 Wean Hall Machine learning has evolved from obscurity in the 1970s into a vibrant and popular discipline in artificial intelligence during the 1980s and 1990s. As a result of its success and growth, machine learning is evolving into a collection of related disciplines: inductive concept acquisition, analytic learning in problem solving (e.g. analogy, explanation-based learning), learning theory (e.g. PAC learning), genetic algorithms, connectionist learning, hybrid systems, and so on. CMU UseNet Seminar Announcement
  • 21. Information Extraction by Sliding Windows GRAND CHALLENGES FOR MACHINE LEARNING Jaime Carbonell School of Computer Science Carnegie Mellon University 3:30 pm 7500 Wean Hall Machine learning has evolved from obscurity in the 1970s into a vibrant and popular discipline in artificial intelligence during the 1980s and 1990s. As a result of its success and growth, machine learning is evolving into a collection of related disciplines: inductive concept acquisition, analytic learning in problem solving (e.g. analogy, explanation-based learning), learning theory (e.g. PAC learning), genetic algorithms, connectionist learning, hybrid systems, and so on. CMU UseNet Seminar Announcement
  • 22. Information Extraction by Sliding Window GRAND CHALLENGES FOR MACHINE LEARNING Jaime Carbonell School of Computer Science Carnegie Mellon University 3:30 pm 7500 Wean Hall Machine learning has evolved from obscurity in the 1970s into a vibrant and popular discipline in artificial intelligence during the 1980s and 1990s. As a result of its success and growth, machine learning is evolving into a collection of related disciplines: inductive concept acquisition, analytic learning in problem solving (e.g. analogy, explanation-based learning), learning theory (e.g. PAC learning), genetic algorithms, connectionist learning, hybrid systems, and so on. CMU UseNet Seminar Announcement
  • 23. Information Extraction by Sliding Window GRAND CHALLENGES FOR MACHINE LEARNING Jaime Carbonell School of Computer Science Carnegie Mellon University 3:30 pm 7500 Wean Hall Machine learning has evolved from obscurity in the 1970s into a vibrant and popular discipline in artificial intelligence during the 1980s and 1990s. As a result of its success and growth, machine learning is evolving into a collection of related disciplines: inductive concept acquisition, analytic learning in problem solving (e.g. analogy, explanation-based learning), learning theory (e.g. PAC learning), genetic algorithms, connectionist learning, hybrid systems, and so on. CMU UseNet Seminar Announcement
  • 24. Information Extraction by Sliding Window GRAND CHALLENGES FOR MACHINE LEARNING Jaime Carbonell School of Computer Science Carnegie Mellon University 3:30 pm 7500 Wean Hall Machine learning has evolved from obscurity in the 1970s into a vibrant and popular discipline in artificial intelligence during the 1980s and 1990s. As a result of its success and growth, machine learning is evolving into a collection of related disciplines: inductive concept acquisition, analytic learning in problem solving (e.g. analogy, explanation-based learning), learning theory (e.g. PAC learning), genetic algorithms, connectionist learning, hybrid systems, and so on. CMU UseNet Seminar Announcement
  • 25. Information Extraction with Sliding Windows [Freitag 97, 98; Soderland 97; Califf 98]. Example: "… 00 : pm Place : Wean Hall Rm 5409 Speaker : Sebastian Thrun …", with the window w_{t} … w_{t+n} as contents, w_{t-m} … w_{t-1} as prefix, and w_{t+n+1} … w_{t+n+m} as suffix.
    • Standard supervised learning setting
      • Positive instances: Candidates with real label
      • Negative instances: All other candidates
      • Features based on candidate, prefix and suffix
    • Special-purpose rule learning systems work well
    courseNumber(X) :- tokenLength(X, =, 2), every(X, inTitle, false), some(X, A, <previousToken>, inTitle, true), some(X, B, <>, tripleton, true)
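    A hedged sketch of the sliding-window setup just described: enumerate candidate windows of a few sizes, build features from the window contents and its prefix/suffix, and hand them to any standard classifier. The window sizes, context length, and feature function below are illustrative choices.

    # Minimal sketch of sliding-window candidate generation; the candidates
    # (with prefix/suffix features) would be fed to a learned classifier.
    from typing import Iterator, List, Tuple

    def candidate_windows(tokens: List[str],
                          sizes=(1, 2, 3),
                          context: int = 2) -> Iterator[Tuple[dict, Tuple[int, int]]]:
        for size in sizes:
            for start in range(len(tokens) - size + 1):
                end = start + size
                feats = {
                    "contents": " ".join(tokens[start:end]).lower(),
                    "prefix": " ".join(tokens[max(0, start - context):start]).lower(),
                    "suffix": " ".join(tokens[end:end + context]).lower(),
                    "length": size,
                }
                yield feats, (start, end)

    tokens = "Speaker : Sebastian Thrun will talk at 3:30 pm".split()
    for feats, span in candidate_windows(tokens, sizes=(2,)):
        if feats["prefix"].endswith("speaker :"):
            print("candidate speaker span:", span, tokens[span[0]:span[1]])
    # candidate speaker span: (2, 4) ['Sebastian', 'Thrun']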
  • 26. Rule-learning approaches to sliding-window classification: Summary
    • Representations for classifiers allow restriction of the relationships between tokens, etc
    • Representations are carefully chosen subsets of even more powerful representations based on logic programming (ILP and Prolog)
    • Use of these “heavyweight” representations is complicated , but seems to pay off in results
  • 27. IE by Boundary Detection GRAND CHALLENGES FOR MACHINE LEARNING Jaime Carbonell School of Computer Science Carnegie Mellon University 3:30 pm 7500 Wean Hall Machine learning has evolved from obscurity in the 1970s into a vibrant and popular discipline in artificial intelligence during the 1980s and 1990s. As a result of its success and growth, machine learning is evolving into a collection of related disciplines: inductive concept acquisition, analytic learning in problem solving (e.g. analogy, explanation-based learning), learning theory (e.g. PAC learning), genetic algorithms, connectionist learning, hybrid systems, and so on. CMU UseNet Seminar Announcement
  • 28. IE by Boundary Detection GRAND CHALLENGES FOR MACHINE LEARNING Jaime Carbonell School of Computer Science Carnegie Mellon University 3:30 pm 7500 Wean Hall Machine learning has evolved from obscurity in the 1970s into a vibrant and popular discipline in artificial intelligence during the 1980s and 1990s. As a result of its success and growth, machine learning is evolving into a collection of related disciplines: inductive concept acquisition, analytic learning in problem solving (e.g. analogy, explanation-based learning), learning theory (e.g. PAC learning), genetic algorithms, connectionist learning, hybrid systems, and so on. CMU UseNet Seminar Announcement
  • 29. IE by Boundary Detection GRAND CHALLENGES FOR MACHINE LEARNING Jaime Carbonell School of Computer Science Carnegie Mellon University 3:30 pm 7500 Wean Hall Machine learning has evolved from obscurity in the 1970s into a vibrant and popular discipline in artificial intelligence during the 1980s and 1990s. As a result of its success and growth, machine learning is evolving into a collection of related disciplines: inductive concept acquisition, analytic learning in problem solving (e.g. analogy, explanation-based learning), learning theory (e.g. PAC learning), genetic algorithms, connectionist learning, hybrid systems, and so on. CMU UseNet Seminar Announcement
  • 30. IE by Boundary Detection GRAND CHALLENGES FOR MACHINE LEARNING Jaime Carbonell School of Computer Science Carnegie Mellon University 3:30 pm 7500 Wean Hall Machine learning has evolved from obscurity in the 1970s into a vibrant and popular discipline in artificial intelligence during the 1980s and 1990s. As a result of its success and growth, machine learning is evolving into a collection of related disciplines: inductive concept acquisition, analytic learning in problem solving (e.g. analogy, explanation-based learning), learning theory (e.g. PAC learning), genetic algorithms, connectionist learning, hybrid systems, and so on. CMU UseNet Seminar Announcement
  • 31. IE by Boundary Detection GRAND CHALLENGES FOR MACHINE LEARNING Jaime Carbonell School of Computer Science Carnegie Mellon University 3:30 pm 7500 Wean Hall Machine learning has evolved from obscurity in the 1970s into a vibrant and popular discipline in artificial intelligence during the 1980s and 1990s. As a result of its success and growth, machine learning is evolving into a collection of related disciplines: inductive concept acquisition, analytic learning in problem solving (e.g. analogy, explanation-based learning), learning theory (e.g. PAC learning), genetic algorithms, connectionist learning, hybrid systems, and so on. CMU UseNet Seminar Announcement
  • 32. BWI: Learning to detect boundaries
    • Another formulation: learn three probabilistic classifiers:
      • START(i) = Prob( position i starts a field)
      • END(j) = Prob( position j ends a field)
      • LEN(k) = Prob( an extracted field has length k )
    • Then score a possible extraction (i,j) by
      • START(i) * END(j) * LEN(j-i)
    • LEN(k) is estimated from a histogram
    [Freitag & Kushmerick, AAAI 2000]
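    A minimal sketch of the scoring rule above, assuming START, END, and LEN have already been estimated; the probability tables here are toy numbers for illustration.

    # Minimal sketch: score extraction (i, j) by START(i) * END(j) * LEN(j - i).
    # LEN is a histogram estimated from field lengths seen in training data.
    from collections import Counter

    def len_histogram(training_field_lengths):
        counts = Counter(training_field_lengths)
        total = sum(counts.values())
        return {k: v / total for k, v in counts.items()}

    def score(i, j, start_prob, end_prob, len_hist):
        return start_prob.get(i, 0.0) * end_prob.get(j, 0.0) * len_hist.get(j - i, 0.0)

    start_prob = {4: 0.8, 9: 0.1}     # P(position i starts a field), toy values
    end_prob = {6: 0.7, 12: 0.2}      # P(position j ends a field), toy values
    len_hist = len_histogram([2, 2, 3, 2, 4])

    best = max(((i, j) for i in start_prob for j in end_prob if j > i),
               key=lambda ij: score(*ij, start_prob, end_prob, len_hist))
    print(best, score(*best, start_prob, end_prob, len_hist))
    # (4, 6) 0.336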
  • 33. BWI: Learning to detect boundaries
    • BWI uses boosting to find “detectors” for START and END
    • Each weak detector has a BEFORE and AFTER pattern (on tokens before/after position i).
    • Each “pattern” is a sequence of tokens and/or wildcards like: anyAlphabeticToken, anyToken, anyUpperCaseLetter, anyNumber, …
    • Weak learner for “patterns” uses greedy search (+ lookahead) to repeatedly extend a pair of empty BEFORE,AFTER patterns
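    A rough sketch of what a single weak START detector could look like: a BEFORE pattern and an AFTER pattern over literal tokens and wildcards, matched around a position. The wildcard names and matching code are assumptions for illustration, not BWI's actual implementation.

    # Minimal sketch of a BWI-style weak boundary detector.
    def wildcard_match(pattern_tok: str, tok: str) -> bool:
        if pattern_tok == "<anyToken>":
            return True
        if pattern_tok == "<anyCapitalizedWord>":
            return tok[:1].isupper() and tok.isalpha()
        if pattern_tok == "<anyNumber>":
            return tok.isdigit()
        return pattern_tok == tok      # literal token

    def detector_fires(before, after, tokens, i) -> bool:
        # Does position i match the BEFORE pattern (tokens before i) and the
        # AFTER pattern (tokens starting at i)?
        if i < len(before) or i + len(after) > len(tokens):
            return False
        ctx_before = tokens[i - len(before):i]
        ctx_after = tokens[i:i + len(after)]
        return (all(wildcard_match(p, t) for p, t in zip(before, ctx_before)) and
                all(wildcard_match(p, t) for p, t in zip(after, ctx_after)))

    tokens = "Speaker : Sebastian Thrun".split()
    # A weak detector for the START boundary of a speaker-name field:
    print(detector_fires(["Speaker", ":"], ["<anyCapitalizedWord>"], tokens, 2))  # True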
  • 34. BWI: Learning to detect boundaries. Results (F1 by field): Person Name: 30%; Location: 61%; Start Time: 98%
  • 35. Problems with Sliding Windows and Boundary Finders
    • Decisions in neighboring parts of the input are made independently from each other.
      • Naïve Bayes Sliding Window may predict a “seminar end time” before the “seminar start time”.
      • It is possible for two overlapping windows to both be above threshold.
      • In a Boundary-Finding system, left boundaries are laid down independently from right boundaries, and their pairing happens as a separate step.
  • 36. Finite State Machines
  • 37. Hidden Markov Models. [Finite state model / graphical model over states s_{t-1}, s_t, s_{t+1} and observations o_{t-1}, o_t, o_{t+1}, with transitions and observations.] Parameters, for all states S = {s_1, s_2, …}: start state probabilities P(s_1); transition probabilities P(s_t | s_{t-1}); observation (emission) probabilities P(o_t | s_t), usually a multinomial over an atomic, fixed alphabet. Training: maximize the probability of the training observations (with a prior). The model generates a state sequence and an observation sequence o_1 o_2 o_3 o_4 o_5 o_6 o_7 o_8. HMMs are the standard sequence modeling tool in genomics, music, speech, NLP, …
  • 38. IE with Hidden Markov Models. Given a sequence of observations ("Yesterday Lawrence Saul spoke this example sentence.") and a trained HMM, find the most likely state sequence (Viterbi). Any words said to be generated by the designated "person name" state are extracted as a person name: Person name: Lawrence Saul
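    A compact sketch of Viterbi decoding for a trained HMM, as used on this slide to recover the most likely state sequence. The dictionary-based model representation and the toy two-state model are illustrative assumptions; log probabilities are used to avoid underflow.

    # Minimal Viterbi sketch: given start, transition, and emission log-probs,
    # recover the most likely state sequence for an observation sequence.
    import math

    def viterbi(obs, states, log_start, log_trans, log_emit):
        # delta[t][s] = best log-prob of a state sequence ending in s at time t
        delta = [{s: log_start[s] + log_emit[s].get(obs[0], -1e9) for s in states}]
        back = [{}]
        for t in range(1, len(obs)):
            delta.append({})
            back.append({})
            for s in states:
                prev = max(states, key=lambda p: delta[t - 1][p] + log_trans[p][s])
                delta[t][s] = (delta[t - 1][prev] + log_trans[prev][s]
                               + log_emit[s].get(obs[t], -1e9))
                back[t][s] = prev
        last = max(states, key=lambda s: delta[-1][s])
        path = [last]
        for t in range(len(obs) - 1, 0, -1):
            path.append(back[t][path[-1]])
        return list(reversed(path))

    # Toy two-state model ("name" vs. "other"); all numbers are made up.
    lg = math.log
    states = ["name", "other"]
    log_start = {"name": lg(0.1), "other": lg(0.9)}
    log_trans = {"name": {"name": lg(0.6), "other": lg(0.4)},
                 "other": {"name": lg(0.2), "other": lg(0.8)}}
    log_emit = {"name": {"lawrence": lg(0.3), "saul": lg(0.3)},
                "other": {"yesterday": lg(0.3), "spoke": lg(0.3)}}
    print(viterbi(["yesterday", "lawrence", "saul", "spoke"],
                  states, log_start, log_trans, log_emit))
    # ['other', 'name', 'name', 'other']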
  • 39. Generative Extraction with HMMs
    • Parameters: { P(s_t | s_{t-1}), P(o_t | s_t) } for all states s_t and words o_t
    • Parameters define generative model:
    [McCallum, Nigam, Seymore & Rennie ‘00]
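    The generative model these parameters define (the formula itself appears to have been lost from this transcript) is, in standard HMM notation, the joint probability of a state sequence and an observation sequence:

    P(s_{1:T}, o_{1:T}) = P(s_1) P(o_1 | s_1) \prod_{t=2}^{T} P(s_t | s_{t-1}) P(o_t | s_t)

    Training sets the parameters to maximize this quantity over the labeled training sequences (with a prior); extraction then uses Viterbi to find the state sequence that maximizes it for a new observation sequence.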
  • 40. HMM Example: "Nymble" [Bikel et al. 97]. Other examples of HMMs in IE: [Leek '97; Freitag & McCallum '99; Seymore et al. 99]. Task: Named Entity Extraction; train on 450k words of news wire text. [State diagram: start-of-sentence, Person, Org, five other name classes, Other, end-of-sentence.] Transition probabilities: P(s_t | s_{t-1}, o_{t-1}), backing off to P(s_t | s_{t-1}), then P(s_t). Observation probabilities: P(o_t | s_t, s_{t-1}) or P(o_t | s_t, o_{t-1}), backing off to P(o_t | s_t), then P(o_t). Results (Case / Language / F1): Mixed English 93%; Upper English 91%; Mixed Spanish 90%.
  • 41. Regrets from Atomic View of Tokens Would like richer representation of text: multiple overlapping features, whole chunks of text.
    • line, sentence, or paragraph features:
      • length
      • is centered in page
      • percent of non-alphabetics
      • white-space aligns with next line
      • containing sentence has two verbs
      • grammatically contains a question
      • contains links to “authoritative” pages
      • emissions that are uncountable
      • features at multiple levels of granularity
    • Example word features:
      • identity of word
      • is in all caps
      • ends in “-ski”
      • is part of a noun phrase
      • is in a list of city names
      • is under node X in WordNet or Cyc
      • is in bold font
      • is in hyperlink anchor
      • features of past & future
      • last person name was female
      • next two words are “and Associates”
  • 42. Problems with Richer Representation and a Generative Model
    • These arbitrary features are not independent:
      • Overlapping and long-distance dependences
      • Multiple levels of granularity (words, characters)
      • Multiple modalities (words, formatting, layout)
      • Observations from past and future
    • HMMs are generative models of the text:
    • Generative models do not easily handle these non-independent features. Two choices:
      • Model the dependencies . Each state would have its own Bayes Net. But we are already starved for training data!
      • Ignore the dependencies . This causes “over-counting” of evidence (ala naïve Bayes). Big problem when combining evidence, as in Viterbi!
  • 43. Conditional Sequence Models
    • We would prefer a conditional model: P(s|o) instead of P(s,o):
      • Can examine features, but not responsible for generating them.
      • Don’t have to explicitly model their dependencies.
      • Don’t “waste modeling effort” trying to generate what we are given at test time anyway.
    • If successful, this answers the challenge of integrating the ability to handle many arbitrary features with the full power of finite state automata.
  • 44. Conditional Markov Models. [Two graphical models over states s_{t-1}, s_t, s_{t+1} and observations o_{t-1}, o_t, o_{t+1}: generative (traditional HMM) vs. conditional transitions and observations.] Standard belief propagation: forward-backward procedure; Viterbi and Baum-Welch follow naturally. Examples: Maximum Entropy Markov Models [McCallum, Freitag & Pereira, 2000]; MaxEnt POS Tagger [Ratnaparkhi, 1996]; SNoW-based Markov Model [Punyakanok & Roth, 2000]
  • 45. Exponential Form for the "Next State" Function. Capture the dependency on s_{t-1} with |S| independent functions, P_{s_{t-1}}(s_t | o_t). Each state contains a "next-state classifier" that, given the next observation, produces a probability of the next state, P_{s_{t-1}}(s_t | o_t). Recipe: labeled data is assigned to transitions; train each state's exponential model, with a weight for each feature, by maximum entropy.
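    Following McCallum, Freitag & Pereira (2000), each per-state next-state classifier takes the maximum-entropy (exponential) form

    P_{s'}(s | o) = (1 / Z(o, s')) \exp( \sum_k \lambda_k f_k(o, s) )

    where the f_k are features of the observation and the next state, the \lambda_k are learned weights, and Z(o, s') normalizes over the possible next states for the previous state s'.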
  • 46. Label Bias Problem
    • Consider this MEMM, and enough training data to perfectly model it:
    [State diagram: path 0 -> 1 -> 2 -> 3 spelling "rob"; path 0 -> 4 -> 5 -> 3 spelling "rib".]
    Pr(0123 | rob) = Pr(1|0,r)/Z1 * Pr(2|1,o)/Z2 * Pr(3|2,b)/Z3 = 0.5 * 1 * 1
    Pr(0453 | rib) = Pr(4|0,r)/Z1' * Pr(5|4,i)/Z2' * Pr(3|5,b)/Z3' = 0.5 * 1 * 1
    Pr(0123 | rib) = 1    Pr(0453 | rob) = 1
  • 47. From HMMs to MEMMs to CRFs. [Three graphical models, HMM, MEMM, and CRF, over states s_{t-1}, s_t, s_{t+1} and observations o_{t-1}, o_t, o_{t+1}; the linear chain shown is a special case of MEMMs and CRFs.] Conditional Random Fields (CRFs) [Lafferty, McCallum, Pereira 2001]
  • 48. Conditional Random Fields (CRFs). [Linear-chain graphical model: states s_t, s_{t+1}, …, s_{t+4} conditioned on the whole observation sequence o = o_t, o_{t+1}, …, o_{t+4}.] Markov on s, conditional dependency on o. The Hammersley-Clifford-Besag theorem stipulates that the CRF has this form: an exponential function of the cliques in the graph. Assuming that the dependency structure of the states is tree-shaped (a linear chain is a trivial tree), inference can be done by dynamic programming in time O(|o| |S|^2), just like HMMs.
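    Concretely, the linear-chain form (following Lafferty, McCallum & Pereira, 2001) is

    P_\Lambda(s | o) = (1 / Z_\Lambda(o)) \exp( \sum_t \sum_k \lambda_k f_k(s_{t-1}, s_t, o, t) )

    where Z_\Lambda(o) sums the exponential over all state sequences; this is exactly the normalizer that has moved outside the per-position product relative to MEMMs.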
  • 49. Training CRFs
    • Methods:
    • iterative scaling (quite slow)
    • conjugate gradient (much faster)
    • conjugate gradient with preconditioning (super fast)
    • limited-memory quasi-Newton methods (also super fast)
    • Complexity comparable to standard Baum-Welch
    [Sha & Pereira 2002] & [Malouf 2002]
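    All of these optimizers work on the same objective, the (typically Gaussian-regularized) conditional log-likelihood of the training data, whose gradient for each weight is the familiar difference between empirical and expected feature counts; the expectations are computed with forward-backward:

    L(\Lambda) = \sum_i \log P_\Lambda(s^{(i)} | o^{(i)}) - \sum_k \lambda_k^2 / (2 \sigma^2)

    \partial L / \partial \lambda_k = \sum_i ( F_k(s^{(i)}, o^{(i)}) - E_{P_\Lambda(s | o^{(i)})}[ F_k(s, o^{(i)}) ] ) - \lambda_k / \sigma^2

    where F_k(s, o) = \sum_t f_k(s_{t-1}, s_t, o, t).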
  • 50. Sample IE Applications of CRFs
    • Noun phrase segmentation [Sha & Pereira, 03]
    • Named entity recognition [McCallum & Li 03]
    • Protein names in bio abstracts [Settles 05]
    • Addresses in web pages [Culotta et al. 05]
    • Semantic roles in text [Roth & Yih 05]
    • RNA structural alignment [Sato & Sakakibara 05]
  • 51. Examples of Recent CRF Research
    • Semi-Markov CRFs [Sarawagi & Cohen 05]
      • Awkwardness of token level decisions for segments
      • Segment sequence model alleviates this
      • Two-level model with sequences of segments, which are sequences of tokens
    • Stochastic Meta-Descent [Vishwanathan 06]
      • Stochastic gradient optimization for training
      • Take gradient step with small batches of examples
      • Order of magnitude faster than L-BFGS
      • Same resulting accuracies for extraction
  • 52. Further Reading about CRFs
    • Charles Sutton and Andrew McCallum. An Introduction to Conditional Random Fields for Relational Learning . In Introduction to Statistical Relational Learning . Edited by Lise Getoor and Ben Taskar. MIT Press. 2006.
    • http://www.cs.umass.edu/~mccallum/papers/crf-tutorial.pdf