3/31/05 CSKD Funding Meeting



Notes from CSKD Grant Meeting of 3/31/05
By Kathryn Clodfelter

Attendees:
Elin Jacob
Kiduk Yang
Kathryn Clodfelter

Kiduk created a diagram of how he envisions the grant project. (See separate PowerPoint
document: http://elvis.slis.indiana.edu/index.shtml )

Collection builder:
   • Intelligent web harvest – a customized crawler will identify:
           o syllabus
           o topics in schedule format
           o reading assignments associated
           o lecture itself, exercises/homework/problems/questions (application of lectures)
           o labs/exercises
           o Link to external resources – e.g., physics on-line dictionary, encyclopedia

Question: how do we identify these?
   • Heuristic: based on URL information:
           o extension (e.g. pdf)
           o name of file
           o path
           o home page
   • Actual content of the page:
           o Automatic classifier – identify a subset of the data (like all the lectures), then train the
              classifier, which will identify what type of material it is; this is the machine learning
              approach: get training data (identified and labeled documents, e.g., some set of
              lectures, papers) and run it through a statistical process – will fail if content is exactly the
              same
           o Rule-based classifier: looking for lexicons (specific words), e.g., Linguistic: look
              at the training data manually and come up with a heuristic, whether visual format or
              whatever it may be – use human intelligence to identify the rules for classifying
              things
                    ▪ Computational Linguistics: comes from NLP, looks at part of speech; still a
                       statistical approach, but utilizes the linguistic structure that may be
                       necessary to identify these things
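
A minimal sketch of the URL-based heuristic listed above (extension, name of file, path, home page). The cue words in `URL_CUES`, the function names, and the example.edu URLs are all assumptions made for illustration; the notes do not specify any of them, and real rules would come from looking at harvested data.

```python
import posixpath
import re
from urllib.parse import urlparse

# Hypothetical cue words per resource type (assumption, not from the notes).
URL_CUES = {
    "syllabus": ("syllabus",),
    "schedule": ("schedule", "calendar"),
    "lecture": ("lecture", "lectures", "lec", "slides"),
    "assignment": ("homework", "hw", "assignment", "assignments", "problems"),
    "lab": ("lab", "labs"),
    "reading": ("reading", "readings"),
}

HOME_PAGE_NAMES = ("", "index.html", "index.htm", "index.shtml")


def url_features(url):
    """Split a URL into the parts the notes list: extension, file name, path, home page."""
    path = urlparse(url).path
    directory, filename = posixpath.split(path)
    stem, ext = posixpath.splitext(filename)
    return {
        "extension": ext.lstrip(".").lower(),
        "name": stem.lower(),
        "path": directory.lower(),
        "is_home_page": filename.lower() in HOME_PAGE_NAMES,
    }


def guess_type_from_url(url):
    """Return a resource-type guess from URL tokens, or None if no cue matches."""
    feats = url_features(url)
    # Break the file name and directory path into alphabetic tokens.
    tokens = re.split(r"[^a-z]+", feats["name"]) + re.split(r"[^a-z]+", feats["path"])
    for label, cues in URL_CUES.items():
        if any(tok in cues for tok in tokens):
            return label
    return "home page" if feats["is_home_page"] else None
```

A URL heuristic like this is cheap but shallow, which is why the notes pair it with classifiers that look at the actual content of the page.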

Significant question: Machine learning has never been applied to classify these types of
resources. That's why we're going to use all three. Has done a combination of manual and
automatic for TREC – use as reference. We're going to leverage machine learning, which is a





stable area of research. In building rule-based classifier, we're going to discover a combined
method of classifying the resource type, as opposed to topical classification.

One of the significant contributions to research will be to combine the machine-learning
approach with the heuristic (rule-based) approach to identify the different types/formats of documents. May
look at actual markup as a basis (some research on that from Lizzie and Howard).
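
One way the combination might look, as a rough sketch only: a hand-written lexicon rule answers when it is confident, and a trained statistical classifier handles everything else. The lexicon words, the `min_hits` threshold, and the choice of multinomial naive Bayes are assumptions for illustration; the notes do not name a specific learning method or rule set.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical lexicon rules; real ones would be written by hand from training data.
LEXICON_RULES = {
    "syllabus": {"prerequisites", "grading", "textbook"},
    "assignment": {"due", "submit", "homework"},
}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def rule_based(text, min_hits=2):
    """Return a label only when enough lexicon words appear; otherwise None."""
    words = set(tokenize(text))
    best_label, best_hits = None, 0
    for label, lexicon in LEXICON_RULES.items():
        hits = len(words & lexicon)
        if hits > best_hits:
            best_label, best_hits = label, hits
    return best_label if best_hits >= min_hits else None


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (stand-in statistical classifier)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)
        self.doc_counts = Counter()
        self.vocab = set()

    def train(self, labeled_docs):
        for text, label in labeled_docs:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in self.doc_counts:
            total_words = sum(self.word_counts[label].values())
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


def classify(text, model):
    """Hybrid: use a confident hand-written rule, else fall back to the trained model."""
    return rule_based(text) or model.predict(text)


# Tiny made-up training set, only to show the calling pattern.
model = NaiveBayes()
model.train([
    ("lecture slides on newton laws of motion", "lecture"),
    ("lecture notes on momentum and energy", "lecture"),
    ("lab procedure measure pendulum period record data", "lab"),
    ("lab report measure voltage record data", "lab"),
])
```

Here the classes are resource types (lecture, lab, assignment), not subject topics, which matches the distinction the notes draw between resource-type and topical classification.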

All href anchor text is separated from content for retrieval purposes more than straight text.
We're going beyond that. Why and what for? To get a high quality collection – items that aren't
outside

To avoid intellectual property issues, every piece of content clicked on will take the user to the original site.
We have data internally for mapping and crunching, but we deliver the original site. Collection
builder can run from any web connection – doesn't have to be IU.

Schedule – has a list of topics, etc., presented in a particular order
Plan to give the user:
Intelligent sample sites: we give the user 3 options – if you have a schedule in mind for a specific
topic, use this, etc. – try the FAQ – come up with different things – key thing is Introductory IR –
sample course – has an entire website (of someone else's). Then it is not an intelligent service that
combines everything. We've manually selected the best web site. The FAQ will be manual.

Elin wants an overview and synthesis of what everyone's doing out there. What's going on at
higher level.
FAQ #1: Sample Site – can select a good source for top physics or most popular (link analysis and
page rank)
FAQ #2: Overview – instead of just pointing to a site, decide on structure, schedule, questions,
lectures with visual representation, a little bit of analysis by looking at all schedules, ranked by
popularity, use-based, links (if we have identified vocab and topics, can map relationships between topics
in various schedules). I want to know how the topics are related; the schedule is based on concepts
and the relationships between them; analyze the linear relationships (a line of info is the
connections between the points); the syllabus schedule doesn't follow the concept relationships;
have to find a way to get to that structure through mapping of concept relationships (concept maps
are part of KB search; facet search)
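
A rough sketch of analyzing the linear relationships across schedules: count how often one topic is immediately followed by another. The topic names and the three schedules are made up for illustration, and this only captures schedule order; it is an assumed starting point, not the concept mapping itself.

```python
from collections import Counter


def topic_transitions(schedules):
    """Count how often topic A is immediately followed by topic B across schedules."""
    transitions = Counter()
    for topics in schedules:
        # zip pairs each topic with the one that follows it in the same schedule.
        for earlier, later in zip(topics, topics[1:]):
            transitions[(earlier, later)] += 1
    return transitions


# Hypothetical harvested schedules for an intro physics course.
schedules = [
    ["kinematics", "forces", "energy", "momentum"],
    ["kinematics", "forces", "momentum", "energy"],
    ["forces", "energy", "momentum"],
]
transitions = topic_transitions(schedules)
```

Pairs that recur in many schedules (here kinematics followed by forces) are candidate edges for a concept map; pairs that appear in both orders suggest the schedule order is not carrying a concept relationship.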
A faceted structure is just like a database
When do a topic search browse, it shows a concept mapping
Facet search has something based on our faceted prototype
What are the concepts –
Q2: what are the concepts?
#1 doesn't give me the kind of help I need, not knowing anything
Maybe combine 1 & 2
We have to come up with the questions – from Gregor – get by talking to the users/faculty
That kind of FAQ doesn't come from what we're good at
It's there because it's part of Gregor's stuff
2 things in intelligent service:
Do sample sites as one aspect of structure/template component





Our focus is not to develop the FAQ – we already have several, the sample size
Example: what are the most heavily-used courses on intro physics? What are most highly linked
sites on physics? A lot of lecture content is repeated.
2nd question: show me what the topics are for a nuclear fusion course. This was what Gregor
talked about with the intelligent web service. KY has steered Gregor toward a glorified FAQ – it's an
expert system. That's why it's called an intelligent FAQ. It's a KB. Show me the syllabus overview,
then do some analysis on all the syllabus materials we have. Then do things by frequency, some kind
of structure – these are the common ones.
Where's the data on how people build their syllabi?
What we have is the middle one – concept/topic search/browse
The FAQ whose answer requires analysis has to use the data we create
Show me concept map:
Show me the concept relationships on intro physics, linked to graphical concept map
Concept/topic search is when you have a specific topic and you want to find material
Structure/template populator is when, instead of using a concept/topic as the query, you use a
structure or template (e.g., schedule) – get a structure that's hyperlinked
If everything has to be linked back to original, result page is organized and filtered to find what
you want

Needs to be diff from doing a Google search
this is a digital indexing service, you're pointing me somewhere else, not giving me the
resources; therefore not a DL
example: get frustrated with SLIS course page, having to go back to main page to go to a different session
don't want something where I have to keep going back
not giving them value-added

3rd component of structure/template: I came up with course structure, there's a form where I type
in and create a schedule, got result page, everything is linked,
That person comes in with their structure in hand
I put in the topic I want, and then this table comes back all linked

Date               Topic             Lecture & Lab        Readings             Assignments


Can add description
The link is not specific reading
Remember CSKD is all about fusion – give humans options
Have copyright issue – identify the available readings – can do by link analysis popularity, user
popularity; user can sort by source, trustworthiness, usage
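
A small sketch of how the structure/template populator could fill that table: the user supplies (date, topic) rows, and each row gets links back to the original sites, grouped by column and sorted by a user-chosen key. The `Resource` fields, the `trust` and `usage` scores, and the example.edu/example.org URLs are hypothetical; the notes name only the columns and the sort options.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    url: str      # link back to the original site; the content itself is not copied
    topic: str
    kind: str     # "lecture", "lab", "reading", or "assignment"
    source: str
    trust: float  # hypothetical trustworthiness score
    usage: int    # hypothetical usage count, e.g. from system logs


# Table columns from the notes, mapped to the resource types that fill them.
COLUMNS = {
    "Lecture & Lab": ("lecture", "lab"),
    "Readings": ("reading",),
    "Assignments": ("assignment",),
}


def populate(schedule, resources, sort_by="usage", per_cell=2):
    """Fill each (date, topic) row of the user's schedule with linked resources."""
    # Higher trust/usage first; sources sort alphabetically.
    descending = sort_by in ("trust", "usage")
    rows = []
    for date, topic in schedule:
        matches = sorted(
            (r for r in resources if r.topic == topic),
            key=lambda r: getattr(r, sort_by),
            reverse=descending,
        )
        row = {"Date": date, "Topic": topic}
        for column, kinds in COLUMNS.items():
            row[column] = [r.url for r in matches if r.kind in kinds][:per_cell]
        rows.append(row)
    return rows


# Made-up data to show the calling pattern.
resources = [
    Resource("http://example.edu/a/lec1.pdf", "kinematics", "lecture", "example.edu", 0.9, 40),
    Resource("http://example.org/b/lec.pdf", "kinematics", "lecture", "example.org", 0.5, 75),
    Resource("http://example.edu/a/read1.html", "kinematics", "reading", "example.edu", 0.9, 10),
    Resource("http://example.edu/a/hw1.pdf", "forces", "assignment", "example.edu", 0.9, 5),
]
schedule = [("Week 1", "kinematics"), ("Week 2", "forces")]
rows = populate(schedule, resources, sort_by="usage", per_cell=1)
```

Exact topic matching is used here only to keep the sketch short; in the design described in these notes, the concept/topic search would supply the matches for each row.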
Need to have something like this for someone who doesn't have the structure
FAQs are for the person who has no idea
It's the who, what, when, where, how and why
Minimum cognitive requirement from user – don't even have to come up with question
Some type of reasonable answer – samples, site, overview /analysis of schedule
FAQs about how to use this service
It's a FAQ – where do we get the questions?




Then user contribution module will stabilize if not used; we keep track of the system log, the
usage of these services
The way we bill it is that we will build it over time – it's not the focus – where are we going to get the FAQs?
Schedule overview – click
Do by top 10 overviews by linkage popularity, usage popularity, source importance, reading
material, overview of readings
How do we come up with the data? We have all the harvested schedules, do link analysis for the top 10
Has nothing to do with what we do, it's purely statistical
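
A minimal sketch of a purely statistical top 10 by linkage popularity, assuming the harvest gives us a mapping from each page to the URLs it links to. It counts distinct linking pages (plain in-link counting, not PageRank); the function name and the graph format are assumptions.

```python
from collections import defaultdict


def top_by_inlinks(link_graph, n=10):
    """Rank target URLs by the number of distinct pages linking to them.

    link_graph maps a page URL to the list of URLs it links to.
    """
    inlinks = defaultdict(set)
    for source, targets in link_graph.items():
        for target in targets:
            if target != source:  # ignore self-links
                inlinks[target].add(source)
    # Most in-links first; ties broken alphabetically so the order is stable.
    ranked = sorted(inlinks, key=lambda url: (-len(inlinks[url]), url))
    return [(url, len(inlinks[url])) for url in ranked[:n]]


# Made-up link graph: three course pages linking to two schedules.
graph = {
    "course-a": ["sched-1", "sched-2"],
    "course-b": ["sched-1"],
    "sched-2": ["sched-1", "sched-2"],
}
top = top_by_inlinks(graph, n=2)
```

The same ranking function could be fed usage counts from the system log instead of links to get the usage-popularity ordering.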
Can use the intelligent classifier to identify resources by structure, add the content, and plug it in
Structure/template populator: the core module would be concept/topic search – each concept
employs that search to populate
Middle one – you're trying to find a single topic; last one is a list of all topics grouped

Concept/Topic Search:
2 of that has concept relationships as part of result – one is facet search result with a list of
results organized using our classified data
Get list of resources & list of relationships (remember facet search interface)
Still want thing that gives me sense of how to structure my course – that's the concept/topic
browse
Concept is that
Develop FAQ as part of user contribution module, they will contribute data as well as
Some will have structure, some won't
Run thru classifier and concept mapper
On top of this is digital object management
Parser will try to automatically pull metadata info – e.g., author

Rule based classifier construction will be done manually, then becomes automatic
Automatic classifier
Once rule identified, becomes automatic classifier

Manual & automatic & hybrid: whole thing about maximizing the utility/performance/task
completion by leveraging both machine capability and human intelligence

next time:
1) hybrid, etc.
2) flag where can make significant contribution for IR and classification




                                            Page 4

More Related Content

What's hot

Mapping a path to the empowered searcher
Mapping a path to the empowered searcherMapping a path to the empowered searcher
Mapping a path to the empowered searcherSheila Webber
 
Social Work Masters Student Introduction
Social Work Masters Student IntroductionSocial Work Masters Student Introduction
Social Work Masters Student IntroductionLucia Ravi
 
TSEM Cooper - Fall 2011 Research
TSEM Cooper - Fall 2011 ResearchTSEM Cooper - Fall 2011 Research
TSEM Cooper - Fall 2011 ResearchLaksamee Putnam
 
Topic-oriented writing at McAfee
Topic-oriented writing at McAfeeTopic-oriented writing at McAfee
Topic-oriented writing at McAfeeJohn Sarr
 
Literature searching techniques and free online resources for scholars by Nad...
Literature searching techniques and free online resources for scholars by Nad...Literature searching techniques and free online resources for scholars by Nad...
Literature searching techniques and free online resources for scholars by Nad...Nadeem Sohail
 
Literature searching
Literature searchingLiterature searching
Literature searchingazjackson
 
Scholarly search vs.public
Scholarly search vs.publicScholarly search vs.public
Scholarly search vs.publicForuz Soltani
 
GLIT 6757 (PEI) Winter 2012: Seminar 2
GLIT 6757 (PEI) Winter 2012: Seminar 2GLIT 6757 (PEI) Winter 2012: Seminar 2
GLIT 6757 (PEI) Winter 2012: Seminar 2Michele Knobel
 
GLIT mississauga, Seminar 2
GLIT  mississauga, Seminar 2GLIT  mississauga, Seminar 2
GLIT mississauga, Seminar 2Michele Knobel
 
Final to send m2 bapp arts ethical practice
Final to send m2 bapp arts ethical practiceFinal to send m2 bapp arts ethical practice
Final to send m2 bapp arts ethical practicePaula Nottingham
 
13.4.16 module 1 session 3
13.4.16 module 1 session 313.4.16 module 1 session 3
13.4.16 module 1 session 3Paula Nottingham
 
Getting started with your research skills
Getting started with your research skillsGetting started with your research skills
Getting started with your research skillsL. D. Morris
 
Structuring Information Before DITA
Structuring Information Before DITAStructuring Information Before DITA
Structuring Information Before DITASteven Jong
 
Mid-term presentation.pdf
Mid-term presentation.pdfMid-term presentation.pdf
Mid-term presentation.pdfZixunZhou
 
Nursing resources & research tutorial
Nursing resources & research tutorialNursing resources & research tutorial
Nursing resources & research tutorialSeth Porter, MA, MLIS
 
Nursing resources & research tutorial
Nursing resources & research tutorialNursing resources & research tutorial
Nursing resources & research tutorialSeth Porter, MA, MLIS
 
PSYC 3401
PSYC 3401PSYC 3401
PSYC 3401Traciwm
 

What's hot (20)

Mapping a path to the empowered searcher
Mapping a path to the empowered searcherMapping a path to the empowered searcher
Mapping a path to the empowered searcher
 
Social Work Masters Student Introduction
Social Work Masters Student IntroductionSocial Work Masters Student Introduction
Social Work Masters Student Introduction
 
TSEM Cooper - Fall 2011 Research
TSEM Cooper - Fall 2011 ResearchTSEM Cooper - Fall 2011 Research
TSEM Cooper - Fall 2011 Research
 
Topic-oriented writing at McAfee
Topic-oriented writing at McAfeeTopic-oriented writing at McAfee
Topic-oriented writing at McAfee
 
Literature searching techniques and free online resources for scholars by Nad...
Literature searching techniques and free online resources for scholars by Nad...Literature searching techniques and free online resources for scholars by Nad...
Literature searching techniques and free online resources for scholars by Nad...
 
670-11 Analysis of Urban Conversations 675-5
670-11 Analysis of Urban Conversations 675-5670-11 Analysis of Urban Conversations 675-5
670-11 Analysis of Urban Conversations 675-5
 
Literature searching
Literature searchingLiterature searching
Literature searching
 
Introduction to Library Research Skills
Introduction to Library Research Skills Introduction to Library Research Skills
Introduction to Library Research Skills
 
TSEM Spring 2012 - Wood
TSEM Spring 2012 - WoodTSEM Spring 2012 - Wood
TSEM Spring 2012 - Wood
 
Scholarly search vs.public
Scholarly search vs.publicScholarly search vs.public
Scholarly search vs.public
 
GLIT 6757 (PEI) Winter 2012: Seminar 2
GLIT 6757 (PEI) Winter 2012: Seminar 2GLIT 6757 (PEI) Winter 2012: Seminar 2
GLIT 6757 (PEI) Winter 2012: Seminar 2
 
GLIT mississauga, Seminar 2
GLIT  mississauga, Seminar 2GLIT  mississauga, Seminar 2
GLIT mississauga, Seminar 2
 
Final to send m2 bapp arts ethical practice
Final to send m2 bapp arts ethical practiceFinal to send m2 bapp arts ethical practice
Final to send m2 bapp arts ethical practice
 
13.4.16 module 1 session 3
13.4.16 module 1 session 313.4.16 module 1 session 3
13.4.16 module 1 session 3
 
Getting started with your research skills
Getting started with your research skillsGetting started with your research skills
Getting started with your research skills
 
Structuring Information Before DITA
Structuring Information Before DITAStructuring Information Before DITA
Structuring Information Before DITA
 
Mid-term presentation.pdf
Mid-term presentation.pdfMid-term presentation.pdf
Mid-term presentation.pdf
 
Nursing resources & research tutorial
Nursing resources & research tutorialNursing resources & research tutorial
Nursing resources & research tutorial
 
Nursing resources & research tutorial
Nursing resources & research tutorialNursing resources & research tutorial
Nursing resources & research tutorial
 
PSYC 3401
PSYC 3401PSYC 3401
PSYC 3401
 

Similar to CSKD Grant Meeting Notes

NE7012- SOCIAL NETWORK ANALYSIS
NE7012- SOCIAL NETWORK ANALYSISNE7012- SOCIAL NETWORK ANALYSIS
NE7012- SOCIAL NETWORK ANALYSISrathnaarul
 
Non-MARC metadata training for "traditional" catalogers: the role and importa...
Non-MARC metadata training for "traditional" catalogers: the role and importa...Non-MARC metadata training for "traditional" catalogers: the role and importa...
Non-MARC metadata training for "traditional" catalogers: the role and importa...Kelly Thompson
 
Open domain Question Answering System - Research project in NLP
Open domain  Question Answering System - Research project in NLPOpen domain  Question Answering System - Research project in NLP
Open domain Question Answering System - Research project in NLPGVS Chaitanya
 
Using Computer as a Research Assistant in Qualitative Research
Using Computer as a Research Assistant in Qualitative ResearchUsing Computer as a Research Assistant in Qualitative Research
Using Computer as a Research Assistant in Qualitative ResearchJoshuaApolonio1
 
discussion_3_project.pdf
discussion_3_project.pdfdiscussion_3_project.pdf
discussion_3_project.pdfKuan-Tsae Huang
 
Modules module5mod5home.htmlmodule 5 homecomparing models
Modules module5mod5home.htmlmodule 5   homecomparing modelsModules module5mod5home.htmlmodule 5   homecomparing models
Modules module5mod5home.htmlmodule 5 homecomparing modelsPOLY33
 
Lecture Notes on Recommender System Introduction
Lecture Notes on Recommender System IntroductionLecture Notes on Recommender System Introduction
Lecture Notes on Recommender System IntroductionPerumalPitchandi
 
ExperTwin: An Alter Ego in Cyberspace for Knowledge Workers
ExperTwin: An Alter Ego in Cyberspace for Knowledge WorkersExperTwin: An Alter Ego in Cyberspace for Knowledge Workers
ExperTwin: An Alter Ego in Cyberspace for Knowledge WorkersCarlos Toxtli
 
Sweeny ux-seo om-cap 2014_v3
Sweeny ux-seo om-cap 2014_v3Sweeny ux-seo om-cap 2014_v3
Sweeny ux-seo om-cap 2014_v3Marianne Sweeny
 
Recommenders, Topics, and Text
Recommenders, Topics, and TextRecommenders, Topics, and Text
Recommenders, Topics, and TextNBER
 
Nbe rtopicsandrecomvlecture1
Nbe rtopicsandrecomvlecture1Nbe rtopicsandrecomvlecture1
Nbe rtopicsandrecomvlecture1NBER
 
Research Report on Document Indexing-Nithish Kumar
Research Report on Document Indexing-Nithish KumarResearch Report on Document Indexing-Nithish Kumar
Research Report on Document Indexing-Nithish KumarNithish Kumar
 
Research report nithish
Research report nithishResearch report nithish
Research report nithishNithish Kumar
 
Search Solutions 2011: Successful Enterprise Search By Design
Search Solutions 2011: Successful Enterprise Search By DesignSearch Solutions 2011: Successful Enterprise Search By Design
Search Solutions 2011: Successful Enterprise Search By DesignMarianne Sweeny
 
Knowledge Representation on the Web
Knowledge Representation on the WebKnowledge Representation on the Web
Knowledge Representation on the WebRinke Hoekstra
 
NORMAN, ELTON_BUS7380-8-62NORMAN, ELTON_BUS7380-8-61.docx
NORMAN, ELTON_BUS7380-8-62NORMAN, ELTON_BUS7380-8-61.docxNORMAN, ELTON_BUS7380-8-62NORMAN, ELTON_BUS7380-8-61.docx
NORMAN, ELTON_BUS7380-8-62NORMAN, ELTON_BUS7380-8-61.docxvannagoforth
 

Similar to CSKD Grant Meeting Notes (20)

NE7012- SOCIAL NETWORK ANALYSIS
NE7012- SOCIAL NETWORK ANALYSISNE7012- SOCIAL NETWORK ANALYSIS
NE7012- SOCIAL NETWORK ANALYSIS
 
Non-MARC metadata training for "traditional" catalogers: the role and importa...
Non-MARC metadata training for "traditional" catalogers: the role and importa...Non-MARC metadata training for "traditional" catalogers: the role and importa...
Non-MARC metadata training for "traditional" catalogers: the role and importa...
 
Data informed decision making - Yaz El Hakim
Data informed decision making - Yaz El HakimData informed decision making - Yaz El Hakim
Data informed decision making - Yaz El Hakim
 
Open domain Question Answering System - Research project in NLP
Open domain  Question Answering System - Research project in NLPOpen domain  Question Answering System - Research project in NLP
Open domain Question Answering System - Research project in NLP
 
qualitative.ppt
qualitative.pptqualitative.ppt
qualitative.ppt
 
Using Computer as a Research Assistant in Qualitative Research
Using Computer as a Research Assistant in Qualitative ResearchUsing Computer as a Research Assistant in Qualitative Research
Using Computer as a Research Assistant in Qualitative Research
 
WORD
WORDWORD
WORD
 
discussion_3_project.pdf
discussion_3_project.pdfdiscussion_3_project.pdf
discussion_3_project.pdf
 
Research Project Management
Research Project ManagementResearch Project Management
Research Project Management
 
Modules module5mod5home.htmlmodule 5 homecomparing models
Modules module5mod5home.htmlmodule 5   homecomparing modelsModules module5mod5home.htmlmodule 5   homecomparing models
Modules module5mod5home.htmlmodule 5 homecomparing models
 
Lecture Notes on Recommender System Introduction
Lecture Notes on Recommender System IntroductionLecture Notes on Recommender System Introduction
Lecture Notes on Recommender System Introduction
 
ExperTwin: An Alter Ego in Cyberspace for Knowledge Workers
ExperTwin: An Alter Ego in Cyberspace for Knowledge WorkersExperTwin: An Alter Ego in Cyberspace for Knowledge Workers
ExperTwin: An Alter Ego in Cyberspace for Knowledge Workers
 
Sweeny ux-seo om-cap 2014_v3
Sweeny ux-seo om-cap 2014_v3Sweeny ux-seo om-cap 2014_v3
Sweeny ux-seo om-cap 2014_v3
 
Recommenders, Topics, and Text
Recommenders, Topics, and TextRecommenders, Topics, and Text
Recommenders, Topics, and Text
 
Nbe rtopicsandrecomvlecture1
Nbe rtopicsandrecomvlecture1Nbe rtopicsandrecomvlecture1
Nbe rtopicsandrecomvlecture1
 
Research Report on Document Indexing-Nithish Kumar
Research Report on Document Indexing-Nithish KumarResearch Report on Document Indexing-Nithish Kumar
Research Report on Document Indexing-Nithish Kumar
 
Research report nithish
Research report nithishResearch report nithish
Research report nithish
 
Search Solutions 2011: Successful Enterprise Search By Design
Search Solutions 2011: Successful Enterprise Search By DesignSearch Solutions 2011: Successful Enterprise Search By Design
Search Solutions 2011: Successful Enterprise Search By Design
 
Knowledge Representation on the Web
Knowledge Representation on the WebKnowledge Representation on the Web
Knowledge Representation on the Web
 
NORMAN, ELTON_BUS7380-8-62NORMAN, ELTON_BUS7380-8-61.docx
NORMAN, ELTON_BUS7380-8-62NORMAN, ELTON_BUS7380-8-61.docxNORMAN, ELTON_BUS7380-8-62NORMAN, ELTON_BUS7380-8-61.docx
NORMAN, ELTON_BUS7380-8-62NORMAN, ELTON_BUS7380-8-61.docx
 

More from butest

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEbutest
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jacksonbutest
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer IIbutest
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazzbutest
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.docbutest
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1butest
 
Facebook
Facebook Facebook
Facebook butest
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...butest
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...butest
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTbutest
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docbutest
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docbutest
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.docbutest
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!butest
 

More from butest (20)

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
 
PPT
PPTPPT
PPT
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
 
Facebook
Facebook Facebook
Facebook
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
 
hier
hierhier
hier
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
 

CSKD Grant Meeting Notes

  • 1. 3/31/05 CSKD Funding Meeting Notes from CSKD Grant Meeting of 3/31/05 By Kathryn Clodfelter Attendees: Elin Jacob Kiduk Yang Kathryn Clodfelter Kiduk created a diagram of how he envisions the grant project. (See separate PowerPoint document: http://elvis.slis.indiana.edu/index.shtml ) Collection builder: • Intelligent web harvest – a customized crawler will identify: o syllabus o topics in schedule format o reading assignments associated o lecture itself, exercises/homework/problems/questions (application of lectures) o labs/exercises o Link to external resources – eg. Physics on-line dictionary, encyclopedia Question: how do we identify these? • Heuristic: based on URL information: o extension (e.g. pdf) o name of file o path o home page • Actual content of the page: o Automatic classifier – identify subset of data (like all the lectures), then train the classifier, who will identify what type of material it is; this is machine learning approach: get training data, identified and labeled documents (some set of lectures, papers), run thru statistical process – will fail if content is exactly the same o Rule-based classifier: looking for lexicons (specific words), e.g., Linguistic: look at training data manually and come up with a heuristic; whether visual format or whatever it may be – use human intelligence to identify the rules for classifying things  Computational Linguistics: comes from NLP, look at part of speech; still statistical approach, but utilizes the linguistic structure that may be necessary to identify these things Significant question: Machine learning has never been applied to classify these types of resources. That's why we're going to use all three. Has done combination of manual and automatic for TREC – use as reference. We're going to leverage machine learning, which is Page 1
  • 2. 3/31/05 CSKD Funding Meeting stable area of research. In building rule-based classifier, we're going to discover a combined method of classifying the resource type, as opposed to topical classification. One of the significant contributions to research will be to combine the machine-learning approach with heuristic (rule-based) to identify the different types/formats of documents. May look at actual markup as a basis (some research on that from Lizzie and Howard). All href anchor text is separated from content for retrieval purposes more than straight text. We're going beyond that. Why and what for? To get a high quality collection – items that aren't outside To avoid intellectual property issues, every piece of content clicked on will take to original site. We have data internally for mapping and crunching, but we deliver the original site. Collection builder can run from any web connection – doesn't have to be IU. Schedule – has list of topics, etc. presented in a particular order Plan to give the user: Intelligent sample sites: we give him 3 options – if you have a schedule in mind for specific topic, use this, etc. – try the FAQ – come up with diff things – key thing is Introductory IR – sample course – has entire website (of someone else's). then not an intelligent service where combines everything. We've manually selected the best web site. The FAQ will be manual. Elin wants an overview and synthesis of what everyone's doing out there. What's going on at higher level. 
FAQ #1: Sample Site – can select good source for top physics or most popular (link analysis and page rank) FAQ #2: Overview – instead of just pointing to site, decide on structure, schedule, questions, lectures with visual representation, little bit of analysis by looking at all schedules, ranked by popularity, use-based, links (if have identified vocab and topics, can map relationships bet topics in various schedules) I want to know how the topics are related; schedule is based on concepts and the relationships between them, analyze the linear relationships (a line of info is the connections between the points); the syllabus schedule doesn't follow the concept relationships; have to find a way to get to that structure thru mapping of concept relationships (concept maps are part of KB search; facet search) A faceted structure is just like a database When do a topic search browse, it shows a concept mapping Facet search has something based on our faceted prototype What are the concepts – Q2: what are the concepts? #1 doesn't give me kind of help I need not knowing anything Maybe combine 1&2, We have to come up with the questions – from Gregor – get by talking to the users/faculty That kind of FAQ doesn't come from what we're good at It's there because it's part of Gregor stuff 2 things in intelligent service: Do sample sites as one aspect of structure/template component Page 2
Our focus is not to develop the FAQ – we already have several, e.g. the sample site.
   • Example: what are the most heavily-used courses on intro physics? What are the most highly linked sites on physics? A lot of lecture content is repeated.
   • 2nd question: show me what the topics are for a nuclear fusion course

This was what Gregor talked about with intelligent web service. KY has steered Gregor toward a glorified FAQ – it's an expert system. That's why it's called intelligent FAQ. It's a KB.

Show me the syllabus overview, then do some analysis on all the syllabus materials we have. Then do things by frequency, some kind of structure – these are the common ones. Where's the data on how people build their syllabi?

What we have is the middle one – concept/topic search/browse. The FAQ whose answer requires analysis has to use the data we create.

Show me concept map: show me the concept relationships on intro physics, linked to a graphical concept map.
   • Concept/topic search is when you have a specific topic and you want to find material
   • Structured template populator is when, instead of using a concept/topic as the query, you use a structure or template (e.g. schedule) – get a structure that's hyperlinked

If everything has to be linked back to the original, the result page is organized and filtered to find what you want. Needs to be different from doing a Google search.
   • This is a digital indexing service: you're pointing me somewhere else, not giving me the resources; therefore not a DL
   • Example: get frustrated with SLIS course page, having to go back to main page to go to a different session
   • Don't want something where I have to keep going back – that's not giving them value-added

3rd component of structure/template: I came up with a course structure; there's a form where I type in and create a schedule, get a result page, everything is linked. That person comes in with their structure in hand. I put in the topic I want, and then this table comes back all linked:

   Date     Topic     Lecture & Lab     Readings     Assignments

   • Can add description
   • The link is not a specific reading
   • Remember CSKD is all about fusion – give humans options
   • Have copyright issue – identify the available readings – can do by link analysis popularity, user popularity; user can sort by source, trustworthiness, usage

Need to have something like this for someone who doesn't have the structure. FAQs are for the person who has no idea.
   • It's the who, what, when, where, how and why
   • Minimum cognitive requirement from user – don't even have to come up with a question
   • Some type of reasonable answer – samples, site, overview/analysis of schedule
   • FAQs about how to use this service
   • It's a FAQ – where do we get the questions?
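The structure/template populator described above (user supplies a schedule, a fully linked table comes back) could be sketched as follows. This is only an illustration under assumptions: the record fields (`topic`, `type`, `url`, `usage`), the column names, and the example.edu URLs are all hypothetical, and only links to the original sites are returned, in line with the intellectual property point.

```python
def populate_schedule(schedule, resources, sort_key="usage"):
    """Fill each (date, topic) row with links to original sites, by resource type."""
    columns = ("lecture", "reading", "assignment")
    table = []
    for date, topic in schedule:
        row = {"date": date, "topic": topic}
        for col in columns:
            matches = [r for r in resources
                       if r["topic"] == topic and r["type"] == col]
            # Highest-scoring resources first; only the original URL is delivered.
            matches.sort(key=lambda r: r[sort_key], reverse=True)
            row[col] = [r["url"] for r in matches]
        table.append(row)
    return table


# Hypothetical user schedule and classified resource records
schedule = [("Week 1", "kinematics"), ("Week 2", "energy")]
resources = [
    {"topic": "kinematics", "type": "reading",
     "url": "http://example.edu/a/kin-notes", "usage": 12},
    {"topic": "kinematics", "type": "reading",
     "url": "http://example.edu/b/kin-text", "usage": 40},
    {"topic": "kinematics", "type": "lecture",
     "url": "http://example.edu/a/kin-lecture", "usage": 7},
]

for row in populate_schedule(schedule, resources):
    print(row)
```

Passing a different `sort_key` (for example a trustworthiness score, if such a field were recorded) would give the user the sort options mentioned in the notes.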
Then user contribution module will stabilize if not used; we keep track of the system log, the usage of these services. The way we bill it is that we will build over time – it's not the focus – where we're going to get FAQs.

Schedule overview – click
   • Do by top 10 overviews by linkage popularity, usage popularity, source importance, reading material, overview of readings
   • How do we come up with data? We have all harvested schedules; do link analysis for top 10
   • Has nothing to do with what we do, it's purely statistical
   • Can use the intelligent classifier to identify resources by structure, add the content, and plug it in

Structure/template populator: the core module would be concept/topic search – each concept employs that search to populate
   • Middle one – you're trying to find a single topic
   • Last one is a list of all topics grouped

Concept/Topic Search: 2 of that has concept relationships as part of result – one is facet search result with a list of results organized using our classified data
   • Get list of resources & list of relationships (remember facet search interface)
   • Still want a thing that gives me a sense of how to structure my course – that's the concept/topic browse

Concept is to develop the FAQ as part of the user contribution module; they will contribute data as well
   • Some will have structure, some won't
   • Run through classifier and concept mapper
   • On top of this is digital object management
   • Parser will try to automatically pull metadata info – e.g. author

Rule-based classifier construction will be done manually, then becomes automatic
   • Automatic classifier
   • Once rule identified, becomes automatic classifier

Manual & automatic & hybrid: whole thing is about maximizing the utility/performance/task completion by leveraging both machine capability and human intelligence

Next time:
   1) hybrid, etc.
   2) flag where we can make significant contribution for IR and classification
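The hybrid idea in these notes (manually built rules that then run automatically, backed by a trained classifier) could look roughly like the sketch below. Everything here is an assumption for illustration: the rule patterns, the labels, and the word-count fallback, which is a deliberately crude stand-in for a real statistical classifier, not the project's method.

```python
import re
from collections import Counter, defaultdict

# Hand-written rules (hypothetical): (pattern on URL or text, resource type)
RULES = [
    (re.compile(r"syllabus", re.I), "syllabus"),
    (re.compile(r"\b(homework|problem set|assignment)\b", re.I), "assignment"),
    (re.compile(r"\blecture\b", re.I), "lecture"),
]


def train_word_counts(labeled_docs):
    """Count words per label from (text, label) training pairs."""
    counts = defaultdict(Counter)
    for text, label in labeled_docs:
        counts[label].update(text.lower().split())
    return counts


def classify(url, text, word_counts):
    """Try the manual rules first; fall back to the learned word counts."""
    for pattern, label in RULES:
        if pattern.search(url) or pattern.search(text):
            return label, "rule"
    words = text.lower().split()
    # Score each label by how often the page's words appeared in its training text.
    scores = {label: sum(c[w] for w in words) for label, c in word_counts.items()}
    if not scores or max(scores.values()) == 0:
        return "unknown", "none"
    return max(scores, key=scores.get), "learned"


# Hypothetical labeled training data
training = [
    ("week one reading chapter two textbook", "reading"),
    ("lab experiment pendulum measure", "lab"),
]
word_counts = train_word_counts(training)

print(classify("http://example.edu/p101/syllabus.html", "Course overview", word_counts))
print(classify("http://example.edu/p101/w3.html", "pendulum experiment setup", word_counts))
```

Returning which path produced the label ("rule" or "learned") makes it easier to see where the manual rules stop helping and where the trained part takes over.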