STAT Requirement Analysis




STAT Requirement Analysis: Presentation Transcript

  • THE STAT PROJECT: Requirement Analysis (Milestone 1 Report)
  • QUESTIONS: To design a framework, how many variations do we need to protect? How many functionalities do we need to provide to support all these variations?
  • Variations for importing datasets (file sources)
  • Variations for importing datasets (file formats)
  • Variations for importing datasets (schemas): even if we only consider datasets in XML, each dataset may have its own schema.
  • Reuters dataset example
  • Simplified approach
    • One approach: a high-level reader class per dataset
    • ReutersReader
    • RCV1Reader
    • Once written, a reader can be shared by the community
    Observation: for the sake of comparison, researchers usually work with a few well-known datasets (e.g., Reuters, RCV1)
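The reader-per-dataset idea above can be sketched as follows. This is a minimal illustration, not the STAT API: the `CorpusReader` interface, the `ReutersDocument` fields, and the tab-separated input format are all assumptions made for the example (the real Reuters collection is marked up and would need a proper parser). The reader takes the file content as a string so the sketch stays independent of file sources.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal stand-ins for the domain concepts; not the STAT API.
abstract class RawDocument {}

class RawCorpus {
    final List<RawDocument> rawDocuments = new ArrayList<>();
}

// Assumed common contract: every dataset-specific reader yields a RawCorpus.
interface CorpusReader {
    RawCorpus read(String source);
}

class ReutersDocument extends RawDocument {
    String title;
    String body;
}

// Illustrative only: treats each non-blank line as "title<TAB>body".
class ReutersReader implements CorpusReader {
    @Override
    public RawCorpus read(String source) {
        RawCorpus corpus = new RawCorpus();
        for (String line : source.split("\n")) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines between records
            }
            String[] parts = line.split("\t", 2);
            ReutersDocument doc = new ReutersDocument();
            doc.title = parts[0];
            doc.body = parts.length > 1 ? parts[1] : "";
            corpus.rawDocuments.add(doc);
        }
        return corpus;
    }
}
```

An `RCV1Reader` would implement the same interface with its own parsing logic, so code downstream of the reader does not need to know which dataset it is working on.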
  • Able to persist and read back in-memory objects
  • Able to visualize in-memory objects
  • STAT (brief) Domain Model. Note: connector labels are omitted for brevity, and some connections are not drawn because of space limitations.
  • STAT framework sample code (conceptual)
  • Domain Concept: RawCorpus. A collection of RawDocument objects, supporting collection operations:
    • Add a new RawDocument element
    • Remove an existing RawDocument element
    • Access elements in the collection
    • …
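  • Domain Concept: RawCorpus
        import java.util.List;

        abstract class RawCorpus {
            // Unprocessed documents held by this corpus.
            List<RawDocument> rawDocuments;

            abstract RawDocument getDocument(int index);
            abstract void setDocument(int index, RawDocument doc);
            abstract void removeDocument(int index);
        }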
  • Domain Concept: RawDocument. An object with one or more string fields, serving as an unprocessed, in-memory representation of a document unit:
    • Like a Java bean, with getters and setters
    • All fields must be of string type, even for numbers
  • Domain Concept: RawDocument
        abstract class RawDocument {
            public RawDocument() {}
        }

        // Example subclass: every field is a String, including numeric ones.
        class MyRawDocument extends RawDocument {
            String title;
            String author;
            String body;
            String date;
            String numOfClicks;
            String topicType;
            // …
        }
  • Domain Concept: Processor. An object that processes a RawCorpus and produces a Corpus:
    • Linguistic: Tokenizer, Stemmer, StopRemover, PosTagger, …
    • Machine learning: feature-specific, document-specific
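A processor built from linguistic steps might look like the sketch below. The `TokenStep` interface, the method names, and the simplification of raw documents to plain strings are assumptions for illustration; only the names Tokenizer, StopRemover, Processor, and Document come from the slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical processed document: just a list of tokens in this sketch.
class Document {
    final List<String> tokens;
    Document(List<String> tokens) { this.tokens = tokens; }
}

// Assumed shape of a linguistic step that rewrites a token list.
interface TokenStep {
    List<String> apply(List<String> tokens);
}

class Tokenizer {
    // Lowercases the text and splits on runs of non-letter characters.
    List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }
}

class StopRemover implements TokenStep {
    private final Set<String> stopWords;

    StopRemover(String... words) {
        stopWords = new HashSet<>(Arrays.asList(words));
    }

    @Override
    public List<String> apply(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!stopWords.contains(t)) {
                kept.add(t);
            }
        }
        return kept;
    }
}

// Processor: tokenizes each raw text, then runs the configured steps in order.
class Processor {
    private final Tokenizer tokenizer = new Tokenizer();
    private final List<TokenStep> steps;

    Processor(List<TokenStep> steps) { this.steps = steps; }

    List<Document> process(List<String> rawTexts) {
        List<Document> corpus = new ArrayList<>();
        for (String text : rawTexts) {
            List<String> tokens = tokenizer.tokenize(text);
            for (TokenStep step : steps) {
                tokens = step.apply(tokens);
            }
            corpus.add(new Document(tokens));
        }
        return corpus;
    }
}
```

Keeping each step behind one small interface lets a Stemmer or PosTagger be added to the list without changing the Processor itself.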
  • Domain Concept: Corpus. An object representing a collection of Document objects for use by the machine learning side of the framework. It provides the commonly used notion of splits (e.g., train, test).
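One way to model the notion of splits is a map from split name to documents, as in the sketch below. The method names (`add`, `getSplit`, `size`) and the `Document` fields are hypothetical and chosen only for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical processed document: tokens plus a gold label.
class Document {
    final List<String> tokens;
    final String label;

    Document(List<String> tokens, String label) {
        this.tokens = tokens;
        this.label = label;
    }
}

// Sketch of a Corpus that keeps its documents grouped by named split.
class Corpus {
    private final Map<String, List<Document>> splits = new LinkedHashMap<>();

    // Adds a document to the given split, creating the split on first use.
    void add(String split, Document doc) {
        splits.computeIfAbsent(split, k -> new ArrayList<>()).add(doc);
    }

    // Read-only view of one split; empty if the split does not exist.
    List<Document> getSplit(String split) {
        List<Document> docs = splits.get(split);
        if (docs == null) {
            return Collections.emptyList();
        }
        return Collections.unmodifiableList(docs);
    }

    // Total number of documents across all splits.
    int size() {
        int n = 0;
        for (List<Document> docs : splits.values()) {
            n += docs.size();
        }
        return n;
    }
}
```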
  • Domain Concept: Trainer. A representation of a machine learning algorithm, which can learn from a Corpus and produce a Model.
  • Domain Concept: Model. The object that a machine learning algorithm (i.e., a Trainer) creates to store the parameters "learned" from the data (i.e., a Corpus).
  • Domain Concept: Classifier. An object that maps Documents to target values (label, number, probability). It takes a Corpus and a Model as inputs, and produces a Prediction associated with the Corpus according to the Model.
  • Domain Concept: Prediction. A collection of target values (label, number, probability) associated with a Corpus, i.e., with a collection of Documents.
  • Domain Concept: Evaluator. An object that compares a Prediction against its associated Corpus and generates an Evaluation.
  • Domain Concept: Evaluation. A summarized representation of the evaluation result produced by an Evaluator.
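The relationship between Trainer, Model, Classifier, and Prediction can be sketched with a deliberately trivial learner that always predicts the most frequent training label. The majority-label algorithm is swapped in only to keep the example short; it is not something the slides prescribe. All method names are hypothetical, and a corpus is simplified to a list of documents.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical labeled document; the label may be null at prediction time.
class Document {
    final String text;
    final String label;

    Document(String text, String label) {
        this.text = text;
        this.label = label;
    }
}

// Model: stores what was learned, here just the most frequent label.
class Model {
    final String majorityLabel;
    Model(String majorityLabel) { this.majorityLabel = majorityLabel; }
}

// Trainer: learns a Model from a labeled corpus.
class MajorityTrainer {
    Model train(List<Document> corpus) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (Document d : corpus) {
            int c = counts.merge(d.label, 1, Integer::sum);
            if (c > bestCount) { // ties go to the label that reached the count first
                best = d.label;
                bestCount = c;
            }
        }
        return new Model(best);
    }
}

// Prediction: one target value per document, in corpus order.
class Prediction {
    final List<String> labels = new ArrayList<>();
}

// Classifier: applies a Model to a corpus and returns a Prediction.
class Classifier {
    Prediction classify(List<Document> corpus, Model model) {
        Prediction prediction = new Prediction();
        for (int i = 0; i < corpus.size(); i++) {
            prediction.labels.add(model.majorityLabel);
        }
        return prediction;
    }
}
```

Because the Model is a separate object, it could be persisted after training and read back later for classification, which matches the requirement to persist in-memory objects.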
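As an illustration of the Evaluator and Evaluation pair, the sketch below computes plain accuracy. The choice of accuracy as the summary, the class name `AccuracyEvaluator`, and the use of label lists in place of Prediction and Corpus objects are assumptions for the example.

```java
import java.util.List;

// Evaluation: a summarized result, here only counts and accuracy.
class Evaluation {
    final int correct;
    final int total;

    Evaluation(int correct, int total) {
        this.correct = correct;
        this.total = total;
    }

    // Fraction of documents labeled correctly; 0.0 for an empty corpus.
    double accuracy() {
        return total == 0 ? 0.0 : (double) correct / total;
    }
}

// Evaluator: compares predicted labels with the gold labels of the corpus.
class AccuracyEvaluator {
    Evaluation evaluate(List<String> predicted, List<String> gold) {
        if (predicted.size() != gold.size()) {
            throw new IllegalArgumentException("prediction and corpus sizes differ");
        }
        int correct = 0;
        for (int i = 0; i < gold.size(); i++) {
            if (gold.get(i).equals(predicted.get(i))) {
                correct++;
            }
        }
        return new Evaluation(correct, gold.size());
    }
}
```

Other summaries, such as precision and recall per label, would fit the same shape by adding fields to the Evaluation object.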
  • THE STAT PROJECT Thanks
  • STAT (brief) Domain Model. Note: connector labels are omitted for brevity, and some connections are not drawn because of space limitations. Entities shown: Corpus, Reader, Processor, RawCorpus, Trainer, Model, Classifier, Prediction, Evaluator, Evaluation, Writer, Vocabulary.
  • STAT Domain Model. Note: text above the lines is omitted for brevity. Entities shown: Corpus, Reader, Processor, RawCorpus, Trainer, Model, Classifier, Prediction, Evaluator, Evaluation, Writer.
  • STAT Domain Model. Note: text above the lines is omitted for brevity. Entities shown: Corpus, Reader, Processor, RawCorpus, Trainer, Model, Classifier, Prediction, Evaluator, Evaluation, Document, RawDocument.