SchemaCMD An XML-based storage schema for the compilation of mixed-source CMD corpora Cornelius Puschmann University of Düsseldorf [email_address] Towards a Reference Corpus of Web Genres University of Birmingham 27 July 2007
Contents of this presentation
A) Problems of classifying digital genres
B) Granularity and meta-data in blogs
C) An example: the corporate web log corpus
D) Variation and genre
E) Individuated CL?
A) Problems of classifying digital genres
Approaches to genre and text typology
A faceted classification scheme (Herring 2007)
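Herring (2007) describes a type of computer-mediated discourse by combining medium factors and situation factors instead of assigning it one genre label. The sketch below models that idea as facet/value pairs in Python. The four facets, their values and the helper functions are illustrative assumptions, not Herring's complete list of factors.

```python
from typing import Dict, Set

# A CMD type is described as facet -> value pairs rather than a single genre label.
# These four facets are an illustrative subset, not Herring's complete list.
Facets = Dict[str, str]

corporate_blog: Facets = {
    "synchronicity": "asynchronous",   # medium factor
    "persistence": "archived",         # medium factor
    "participation": "one-to-many",    # situation factor
    "purpose": "public relations",     # situation factor
}

personal_blog: Facets = {
    "synchronicity": "asynchronous",
    "persistence": "archived",
    "participation": "one-to-many",
    "purpose": "self-expression",
}


def shared_facets(a: Facets, b: Facets) -> Facets:
    """Facets on which two descriptions agree (hypothetical helper)."""
    return {k: v for k, v in a.items() if b.get(k) == v}


def differing_facets(a: Facets, b: Facets) -> Set[str]:
    """Facet names whose values differ, or that only one description specifies."""
    return {k for k in a.keys() | b.keys() if a.get(k) != b.get(k)}
```

In this toy example the two descriptions differ only in purpose, a distinction that a single label such as "blog" would not capture.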
Aspects of digital genres (diagram contrasting concrete and abstract aspects)
Are blogs a genre?
B) Granularity and meta-data in blogs
Blog content syndication
A sample RSS 2.0 feed
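The sample feed on the original slide did not survive extraction. As an illustration of how per-post metadata can be pulled from an RSS 2.0 feed, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The feed content, the blog name and the `parse_rss_items` helper are invented for the example, and Atom feeds, which use a different namespace and element names, are not handled.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime
from typing import Dict, List

# A minimal RSS 2.0 document; the blog, URLs and text are invented placeholders.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Corporate Blog</title>
    <link>http://blog.example.com/</link>
    <description>Posts from an example company</description>
    <item>
      <title>Launching our new product</title>
      <link>http://blog.example.com/2007/07/launch</link>
      <description>Today we are very excited to announce...</description>
      <pubDate>Mon, 23 Jul 2007 09:30:00 GMT</pubDate>
      <guid>http://blog.example.com/2007/07/launch</guid>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text: str) -> List[Dict[str, str]]:
    """Extract per-post metadata from the <item> elements of an RSS 2.0 channel."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 document: no <channel> element")
    source = channel.findtext("title", default="")
    posts = []
    for item in channel.findall("item"):
        pub = item.findtext("pubDate")
        posts.append({
            "source": source,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "text": item.findtext("description", default=""),
            # RSS 2.0 uses RFC 822 dates; normalise them to ISO 8601
            "date": parsedate_to_datetime(pub).isoformat() if pub else "",
        })
    return posts
```

Calling `parse_rss_items(SAMPLE_FEED)` yields one dictionary per post, which can then be stored alongside the post text.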
C) An example: the corporate web log corpus
The corpus tool
MySQL data structure
- sources
- BNC top 100 words
- sub-types of corp. blogs
- blogs, press eds., ...
- n-grams (not computed due to cost)
- POS frequencies by post
- post data (via RSS/Atom)
- additional post statistics
- tokens (depends on types)
- types (string + POS)
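A few of the tables listed above (sources, post data, types as string + POS, tokens depending on types) can be sketched as follows. This uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs without a server. All table and column names, the coarse POS labels and the helper functions are assumptions. POS frequencies are derived with a query here rather than stored in their own table.

```python
import sqlite3

# Hypothetical DDL loosely mirroring the listed tables; column names are assumptions.
SCHEMA = """
CREATE TABLE sources (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    subtype TEXT NOT NULL          -- e.g. 'blog', 'press release', 'editorial'
);
CREATE TABLE posts (
    id        INTEGER PRIMARY KEY,
    source_id INTEGER NOT NULL REFERENCES sources(id),
    title     TEXT,
    published TEXT,                -- date taken from the RSS/Atom feed
    body      TEXT
);
CREATE TABLE types (
    id     INTEGER PRIMARY KEY,
    string TEXT NOT NULL,
    pos    TEXT NOT NULL,
    UNIQUE (string, pos)           -- a type is a (string, POS) pair
);
CREATE TABLE tokens (
    post_id  INTEGER NOT NULL REFERENCES posts(id),
    position INTEGER NOT NULL,     -- token offset within the post
    type_id  INTEGER NOT NULL REFERENCES types(id),
    PRIMARY KEY (post_id, position)
);
"""


def create_corpus_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_tagged_post(conn, source_id, title, published, tagged):
    """Store a post and its (word, POS) tokens, reusing types already in the table."""
    cur = conn.execute(
        "INSERT INTO posts (source_id, title, published, body) VALUES (?, ?, ?, ?)",
        (source_id, title, published, " ".join(w for w, _ in tagged)),
    )
    post_id = cur.lastrowid
    for position, (word, pos) in enumerate(tagged):
        conn.execute("INSERT OR IGNORE INTO types (string, pos) VALUES (?, ?)", (word, pos))
        (type_id,) = conn.execute(
            "SELECT id FROM types WHERE string = ? AND pos = ?", (word, pos)
        ).fetchone()
        conn.execute(
            "INSERT INTO tokens (post_id, position, type_id) VALUES (?, ?, ?)",
            (post_id, position, type_id),
        )
    return post_id


def pos_frequencies(conn, post_id):
    """POS frequencies for one post, computed by joining its tokens to their types."""
    rows = conn.execute(
        "SELECT t.pos, COUNT(*) FROM tokens k JOIN types t ON k.type_id = t.id "
        "WHERE k.post_id = ? GROUP BY t.pos",
        (post_id,),
    )
    return dict(rows.fetchall())
```

Storing each type once and pointing tokens at it is why the token table depends on the type table. A repeated word adds a token row but no new type row.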
Corpus data
D) Variation and genre
Measuring text formality via f-scores
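Assuming the f-score here is the formality measure proposed by Heylighen and Dewaele, it is computed from part-of-speech frequencies expressed as percentages of all words: F = (nouns + adjectives + prepositions + articles − pronouns − verbs − adverbs − interjections + 100) / 2. The score therefore ranges from 0 to 100, and higher values indicate more formal text. In this sketch the coarse tag names are hypothetical. A real tagger's tagset would first need mapping onto them.

```python
from collections import Counter
from typing import Iterable

# Coarse POS classes; mapping a real tagset onto these labels is assumed.
FORMAL = {"NOUN", "ADJ", "PREP", "ART"}
DEICTIC = {"PRON", "VERB", "ADV", "INTJ"}


def f_score(pos_tags: Iterable[str]) -> float:
    """Formality score: (formal % - deictic % + 100) / 2, percentages of all words."""
    counts = Counter(pos_tags)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot compute an f-score for an empty text")

    def pct(tags):
        # Share of all words (including untracked tags such as CONJ) in these classes
        return 100.0 * sum(counts[t] for t in tags) / total

    return (pct(FORMAL) - pct(DEICTIC) + 100.0) / 2.0
```

For example, "the company announces strong results" (ART NOUN VERB ADJ NOUN) scores 80.0, while "I really like it" (PRON ADV VERB PRON) scores 0.0.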
Example: high f-score (press release)
Example: low f-score (blog entry)
F-score over time for two sources (figure; light blue = Jonathan Schwartz (blog), dark blue = New York Times (editorials))
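One way to prepare such a time series is to average the per-post f-scores for each source by calendar month. This is a minimal sketch. It assumes one (source, ISO date, f-score) row per post, and it assumes monthly granularity, which the slide does not state. The input rows in the usage example are invented.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical input: one (source, ISO date, f-score) row per post.
Post = Tuple[str, str, float]


def monthly_f_scores(posts: List[Post]) -> Dict[str, Dict[str, float]]:
    """Average f-score per source per calendar month, ready to plot as a time series."""
    buckets: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for source, date, score in posts:
        buckets[(source, date[:7])].append(score)  # 'YYYY-MM' prefix of an ISO date
    series: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (source, month), scores in sorted(buckets.items()):
        series[source][month] = mean(scores)
    return dict(series)
```

With invented rows such as `("blog", "2007-05-03", 50.0)` and `("blog", "2007-05-20", 60.0)`, the blog's value for `"2007-05"` is their mean, 55.0.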
F-score and standard deviation for all sources (figure; x-axis = standard deviation, y-axis = f-score, dot size = number of posts; labelled groups: editorials, press releases)
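The three quantities plotted per source (mean f-score, standard deviation, number of posts) can be computed as below. This sketch assumes (source, f-score) pairs as input and uses the sample standard deviation from Python's `statistics` module. Because `statistics.stdev` needs at least two values, sources with a single post are given 0.0 here by convention.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, Iterable, List, Tuple


def source_profile(posts: Iterable[Tuple[str, float]]) -> Dict[str, Tuple[float, float, int]]:
    """Map each source to (mean f-score, sample standard deviation, number of posts)."""
    scores: Dict[str, List[float]] = defaultdict(list)
    for source, score in posts:
        scores[source].append(score)
    return {
        src: (mean(vals), stdev(vals) if len(vals) > 1 else 0.0, len(vals))
        for src, vals in scores.items()
    }
```

Each tuple maps directly onto one dot in the scatter plot: y = mean, x = standard deviation, size = number of posts.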
E) Individuated CL?
Observations
Thanks for listening!