Final presentation al_8760_ansari_fernandez_hercules



  1. 1. Zakkiyyah Ansari<br />Maria Fernandez<br /> Christopher Hercules<br />A Comparison of Three Corpora from Internet-based Registers<br />
  2. 2. Introduction/Background<br />Email has effectively replaced the letter in recent years as the frequency of computer-mediated communication has increased. Being neither quite a letter nor quite conversation, email has been described by some as “written speech” or “computer conversation” (Perez, Turney, & Montero, 2008). Such a classification allows for the creation of a new register for corpus linguists to examine. The ENRON corpus, an example of this register, is the largest real email corpus (Yang, Zeng, & Chau, 2007). It contains e-mails that were used for a variety of purposes (not all of them professional), which makes it a very versatile corpus to work with: many forms of communication are represented, furthering the notion of a register between writing and conversation.<br />Similarly, another common web-based register that corpus linguists can examine is online news. Slate is an online magazine that has never appeared in print. Its topics include arts, business, sports, news, and politics, although it is primarily politically focused. It is the property of the Microsoft Corporation, which contributed 4,694 articles published between 1996 and 2000 from its archives to the American National Corpus.<br />
  3. 3. Introduction/Background (cont.)<br />Lastly, there are blogs, perhaps one of the most frequent web-based registers in current use. Blogs are so varied in their uses and topics that it is almost impossible to talk about them without some context. For example, blog writers use their blogs as personal journals and for politics, business, academic purposes, and literary criticism, to name a few (Schmidt, 2007).<br />
  4. 4. Goals of the Project<br />Compare different yet related registers from internet sources to discover similar or dissimilar linguistic features.
  5. 5. Interpret common features found in the three registers to understand their level of formality or informality.
  6. 6. Confirm or disprove the intuition that certain registers are truly more formal or informal than others.
  7. 7. Describe and interpret the common linguistic features of each register by utilizing Biber’s 5 major dimensions of English.<br />Intended Audience and Purpose Across Registers<br />Register | Audience | Purpose<br />Email | Enron employees/other | Enron affairs/other<br />Blog | Blog readers | Topical knowledge<br />Slate | Online public | News/politics<br />
  8. 8. Dimensional Discourse Analysis<br />Dimension 1: Edited informational discourse vs. on-line informational discourse<br />Dimension 2: Involved personal discourse vs. non-personal uninvolved discourse<br />(Biber, 1998, p. 183) <br />
  9. 9. Enron Email Example<br />Hi. This is ***, and now ***'s, friend ***. I know that she spoke <br />with you about me, and I wanted to see if you might be interested in getting <br />together sometime. You probably want to know a little about me. I work at <br />Enron as an electricity trader, and I've lived in Houston for about three <br />years. I've known Richard 9 years from our fraternity days together. I'm 27 <br />and I graduated from UT 4 1/2 years ago. Of course, I would be interested in <br />hearing something about you as well. *** just told me that she had a good <br />friend in Houston that she thought I should meet. Let me know what you <br />think, and we can go from there.<br />***Wow, I almost forgot. Thank you. I will contact her soon. Thanks ***<br />Monday Feb 4th<br />
  10. 10. An Example Slate Article Excerpt<br />The papers report that the federal judge who presided over the Paula Jones lawsuit yesterday ordered President Clinton to pay nearly $90,000 to Jones' legal team for giving false testimony about his relationship with Monica Lewinsky. The sum is less than the Jones lawyers asked for, but more than Clinton offered. Clinton and his lawyer said that they will pay the money without further legal protest. The WP and LAT play the story below the fold, the NYT reefers it, and USAT runs it on Page 5. This play, given the story's unprecedented content, signifies that editors have just plain had it with the whole topic and are sure that readers feel likewise.<br />The WP , NYT , LAT and Wall Street Journal report that China has issued an arrest warrant for Li Hongzhi, the New York-based leader of the now-banned sect Falun Gong, charging him with the deaths of hundreds of his followers. The papers note that the warrant is much more political than legal, in that the U.S. Does not have an extradition treaty with China.<br />The NYT reports that immediately after last summer's terrorist bombing of two American embassies in Africa, the government of Sudan detained two suspects, only to angrily release them after the U.S. conducted a cruise missile strike against an alleged chemical weapons facility in Sudan. Also, the paper says, Sudanese officials claim that the U.S. ignored their message that they had suspects in the case in custody. The NYT says that Sudan's notification has been confirmed by some American officials.<br />– Slate Article 247_3303 from the American National Corpus<br />
  11. 11.
  12. 12. Word Count Result/Interpretation<br />In the chart of average word count, there is an exceptional difference across the three registers' word counts. The register with the highest average word count is the Blog (4,615.5). One possible explanation for this difference is that blogs tend to be less concise and less informal. Blog writers are self-conscious of their audience as they write. Therefore, to attract readers, bloggers must use attractive language that holds a reader's attention. In that process, the writing becomes more drawn out and detail oriented.<br />The register with the second highest average word count is the Enron emails. After analyzing random samples of the Enron emails, we can state that many of them are informal and involved. Thus, both the Blog and Enron Email registers can be characterized as on-line informational in Discourse Dimension 1. In Discourse Dimension 2, the Enron emails and the blogs can be generalized to exhibit features of involved personal discourse.<br />
  13. 13. Dimension 1 Analysis<br />Positive features to be analyzed and interpreted:<br />1.) First Person Pronouns<br />2.) Second Person Pronouns<br />3.) It Occurrence<br />4.) Private Verbs<br />
  14. 14. Dimension 1 Analysis (cont.)<br />Negative features to be analyzed and interpreted:<br />1.) Noun Usage<br />2.) Word Length<br />
  15. 15. Dimension 1 Analysis (cont.)<br />According to Biber, a higher frequency of positive features in Dimension 1 generally suggests more involved discourse.<br />Conversely, a higher frequency of negative features suggests more informational production.<br />(Biber, 1998)<br />
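As a rough illustration of how the positive Dimension 1 features listed above could be counted, the sketch below computes a rate per 1,000 words for each of them. The word lists, the function names, and the per-1,000-word normalization are all assumptions made for this example; the slides do not state which tools, word lists, or normalization were actually used.

```python
import re

# Illustrative word lists only (assumptions, not the lists used in the presentation).
FIRST_PERSON = {"i", "me", "my", "mine", "myself", "we", "us", "our", "ours", "ourselves"}
SECOND_PERSON = {"you", "your", "yours", "yourself", "yourselves"}
PRIVATE_VERBS = {"think", "feel", "know", "believe", "guess", "hope"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens.

    Contractions are split at the apostrophe ("I've" -> "i", "ve"),
    which slightly inflates the token count but keeps the pronoun countable.
    """
    return re.findall(r"[a-z]+", text.lower())


def rate_per_1000(tokens, vocabulary):
    """Return how many tokens per 1,000 belong to the given vocabulary."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in vocabulary)
    return 1000.0 * hits / len(tokens)


def dimension1_positive_rates(text):
    """Rates for the four positive Dimension 1 features named on the slide."""
    tokens = tokenize(text)
    return {
        "first_person": rate_per_1000(tokens, FIRST_PERSON),
        "second_person": rate_per_1000(tokens, SECOND_PERSON),
        "it": rate_per_1000(tokens, {"it"}),
        "private_verbs": rate_per_1000(tokens, PRIVATE_VERBS),
    }
```

Calling `dimension1_positive_rates` on each text and averaging the results per corpus would give figures comparable across registers of different lengths. A real study would more likely rely on a part-of-speech tagger than on fixed word lists.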
  16. 16.
  17. 17. First Person Interpretation/Analysis<br />According to the Longman Grammar of Spoken and Written English, “first person pronouns function to refer to the speaker/writer” (Biber et al., 2007, p. 41).<br />Furthermore, according to Ingrid Westin (2002), first person pronouns are generally avoided in newspaper writing.<br />Thus, given the data, it is sensible to conclude that the infrequent use of first person pronouns in the Slate corpus is predominantly due to the fact that it is a news source.<br />
  18. 18.
  19. 19. Second Person Result/Interpretation<br />The use of the 2nd person in ENRON e-mails is not particularly surprising, given that e-mails are prone to less formality and are usually written as a dialogue between two people (with the exception of office memos, which include an audience of more than one person). However, even e-mails addressed to more than one person would still utilize the 2nd person.<br />The same can be said for blogs, as blogs may be less formal than Slate magazine columns, for example. We attribute the lower 2nd person usage in blogs to the wide variety of blogs included in the corpus. Many may resemble Slate magazine columns and be more informational, while others may be more involved.<br />
  20. 20.
  21. 21. Results & Interpretation of it Occurrence<br />When we measured this feature, we anticipated that blogs and ENRON would show the involved feature of “pronoun it” as described in Dimension 1. The data suggests the contrary, as ENRON shows the lowest frequency of the pronoun it.<br />
  22. 22. Results and Interpretation of it Occurrence<br />With that said, Biber, Conrad, Johansson, Leech, & Finegan also state in the Longman Grammar of Spoken and Written English (2007, p. 235) that news registers have a very high count of both nouns and pronouns in general, much higher than conversational registers. Figure 4.1 from p. 235 of Longman Grammar demonstrates this.<br />
  23. 23. Private Verbs<br />In Variation Across Speech and Writing, Biber states that private verbs are associated with expressing intellectual states or non-observable intellectual acts (1992, p. 242). In addition, he states, “private verbs (e.g. think and feel) are used for the overt expression of private attitudes, thoughts, and emotions.”<br />Since we can assume from the data that at least some of the blogs in the corpus were written in a diary or journal format, the similar private verb counts between the blogs and ENRON e-mails are unsurprising. Slate, as a magazine, would have fewer of these, as it is focused on reporting news and covering politics.<br />
  24. 24.
  25. 25. Noun Usage Result/Interpretation<br />The register with the highest noun usage is Slate (320.6), in comparison to Enron (245.5) and Blog (261.0). This suggests that Slate, being a magazine, involves more informational production. For instance, when reporting events, detailed descriptions are given. This example from Slate demonstrates this:<br />In the 1970s, W. Glenn Campbell had a brilliant idea for reviving the backwater California think tank he ran: He would hire pre-eminent scholars who were being let go from their universities because they had reached the age of mandatory retirement. So in the 1970s, Campbell lured philosopher Sidney Hook, physicist Edward Teller, and Nobel Laureate economist Milton Friedman to the Hoover Institution at Stanford University.<br />
  26. 26.
  27. 27. Word Length Result/Interpretation<br />The high average word length for the Slate corpus suggests that Slate falls under Dimension 1’s informational production (Biber et al., 1998). As stated in Corpus Linguistics: Investigating Language Structure and Use, word length can indicate an “informational focus and a careful integration of information in a text” (1998, p. 149).<br />The low word length counts for the blogs and ENRON e-mails suggest the opposite: an involved focus.<br />
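The word length measure discussed above can be sketched as a mean number of letters per word. This is only a minimal illustration under the assumption that a "word" is a run of alphabetic characters; the function name is hypothetical, and the slides do not describe how the authors tokenized their texts.

```python
import re


def mean_word_length(text):
    """Average number of letters per word, treating any run of letters as a word."""
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return 0.0
    return sum(len(word) for word in words) / len(words)
```

Under this definition, a phrase such as "The extradition treaty" scores higher than "Let me know", which matches the intuition that longer words accompany a more informational focus.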
  28. 28. Dimension 2 Analysis<br />Positive features to be analyzed and interpreted:<br />1.) Third Person Pronouns<br />2.) Public Verbs<br />
  29. 29. Dimension 2 Analysis (cont.)<br />Higher frequency counts in the positive features of Dimension 2 suggest a more narrative discourse.<br />Conversely, lower counts in the positive features of Dimension 2 (and higher counts in the negative features) suggest a more non-narrative discourse.<br />(Biber et al., 1998)<br />
  30. 30.
  31. 31. Third Person Result/Interpretation<br />While Slate (22.6) and Blog (21.8) have close counts of 3rd person usage, the count for the ENRON e-mails corpus (12.1) is considerably lower. Generally, magazines use the 3rd person frequently because of reporting. Magazines address the audience indirectly, given that information in magazines is reported following a formal format.<br />Blogs, however, can be formal or informal depending on the type and topic. One explanation for such a close count between the Slate and Blog corpora is the type of blogs used in the sample.<br />For example, here are excerpts from two different blogs that represent formal language and informal language:<br />
  32. 32. Informal<br />I normally would not post this type of recipe until summer when I would probably feature it as a main course for a poolside buffet ; but I was so excited to try it I decided to think of it as a practice run.  It all started with leftover pineapple and  grew from there.  Although I found many recipes for this particular rice dish, the one I decided to try came from Closet Cooking, a food blogger I follow from the West Coast.<br />Formal<br />Scientists recently gleaned valuable information from emails sent by its employees in the 18 months prior to the company’s collapse. New Scientist reported that two researchers at the Florida Institute of Technology assessed 517,000 emails sent to approximately 15,000 employees at the now defunct energy company.<br />(Courtesy Lindaraxa & Hoban)<br />
  33. 33. Public Verbs<br />The public verb (e.g. say) counts for ENRON and the blogs averaged out to 4.03 and 4.15 respectively. Slate had a higher count of 7.13. We attribute this to the fact that Slate should have more positive narrative features, much like the American history articles mentioned in Biber, Conrad, and Reppen (1998) on pp. 159-160.<br />Slate is concerned with reporting events and offering commentary on these events. As a result, there should be more past tense and perfect aspect verbs, more third-person referents, and more reported speech.<br />
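To make the Dimension 2 comparison concrete, the sketch below combines the two reported positive features (third person pronouns and public verbs) into a single illustrative score per register. It standardizes each feature and sums the results, in the spirit of Biber's dimension scores. Two caveats apply: Biber standardizes against a large reference corpus, whereas this example standardizes across the three registers only, and the function and variable names are hypothetical. The numbers are the averages reported on the slides.

```python
from statistics import mean, pstdev

# Averages reported on the slides for the two positive Dimension 2 features.
DIMENSION2_POSITIVE = {
    "third_person_pronouns": {"ENRON": 12.1, "Blogs": 21.8, "Slate": 22.6},
    "public_verbs": {"ENRON": 4.03, "Blogs": 4.15, "Slate": 7.13},
}


def standardize(values_by_register):
    """Convert raw averages to z-scores across the registers supplied.

    Assumes the values are not all identical (otherwise the spread is zero).
    """
    values = list(values_by_register.values())
    centre = mean(values)
    spread = pstdev(values)
    return {reg: (val - centre) / spread for reg, val in values_by_register.items()}


def dimension_score(positive, negative=None):
    """Sum the z-scores of positive features and subtract those of negative features."""
    registers = next(iter(positive.values())).keys()
    scores = {reg: 0.0 for reg in registers}
    for feature in positive.values():
        for reg, z in standardize(feature).items():
            scores[reg] += z
    for feature in (negative or {}).values():
        for reg, z in standardize(feature).items():
            scores[reg] -= z
    return scores


scores = dimension_score(DIMENSION2_POSITIVE)
# Registers ordered from most to least narrative under this toy score.
ranking = sorted(scores, key=scores.get, reverse=True)
```

With the reported averages, this ordering places Slate highest and ENRON lowest, with the blogs in between. With only three data points per feature, the scores are indicative at best.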
  34. 34. Dimension 3 Analysis<br />Positive features to be analyzed and interpreted:<br />1.) wh- Relative Clauses<br />2.) Nominalizations<br />
  35. 35. Dimension 3 Analysis (cont.)<br />Negative features to be analyzed and interpreted:<br />1.) Time Adverbials<br />2.) Place Adverbials<br />
  36. 36. Wh- Relative Clauses<br />
  37. 37. Wh- Relative Clauses<br />In the wh- relative clause average frequency counts, the data suggests that the Slate and Blogs corpora utilize more elaborated reference.<br />The ENRON corpus contains significantly fewer uses of wh- relative clauses, which suggests situation-dependent reference.<br />Because wh- relative clauses are a positive feature in Dimension 3, a low average frequency count suggests situation-dependent reference, whereas a high average frequency count suggests elaborated reference.<br />
  38. 38. Time Adverbials<br />
  39. 39. Place Adverbials<br />
  40. 40. Time/Place Adverbials<br />In the time and place adverbials average frequency counts, the data suggests that the Blogs and ENRON corpora utilize more situation-dependent reference.<br />The Slate corpus contains significantly fewer uses of time and place adverbials, which suggests elaborated reference.<br />Because time and place adverbials are a negative feature in Dimension 3, a low average frequency count suggests elaborated reference, whereas a high average frequency count suggests situation-dependent reference.<br />
  41. 41. Nominalizations<br />
  42. 42. Nominalizations<br />In the nominalization average frequency counts, the data suggests that the Blogs and ENRON corpora utilize more elaborated reference.<br />The Slate corpus contains significantly fewer uses of nominalization, which suggests situation-dependent reference.<br />Because nominalizations are a positive feature in Dimension 3, a high average frequency count suggests elaborated reference, whereas a low average frequency count suggests situation-dependent reference.<br />
  43. 43. Interpretation and Analysis of Dimension 3<br />Although the ENRON and Blog corpora may seem to have a contradictory relationship with Dimension 3 (i.e., more elaborated reference according to the positive features and more situation-dependent reference according to the negative features, except in the case of wh- relative clauses), it is important to note that time and place adverbials are usually used for “text-external references to the physical context of the discourse” (Biber et al., 1998, p. 153).<br />Moreover, wh- relative clauses are used to specify the identity of referents within a text in an explicit/elaborated manner (Biber et al., 1998).<br />
  44. 44. Interpretation and Analysis of Dimension 3 (cont.)<br />Given this information, it is not hard to consider the reasons behind such a discrepancy.<br />1.) The Slate and Blogs corpora are highly elaborated in wh- relative clauses because there must be constant reference to those who are being addressed, most likely due to the nature of weblogs and news writing. In a sense, this is because they are (most times) not formally addressed to a single individual and are thus more elaborated or explicit in their use of reference, as opposed to, say, an e-mail, which generally already has a clear and concise personal reference (i.e., the receiver of the e-mail). However, it is also important to note that according to Ingrid Westin, there was a decrease in the use of wh- relative clauses in news writing that started in the 1970s (Westin, 2003). Up to this point there have been no significant statistical changes in the use of wh- relative clauses in news writing. Although only speculative, perhaps this resurgence of wh- relative clauses in Slate has to do with the nature of web-based news media and some more elaborated/involved tendencies that the register of online news is beginning to adopt.<br />2.) In terms of time and place adverbials, it would also make sense that Blogs and ENRON contain more situation-dependent reference, as there would be more text-external references to physical contexts outside of the discourse. This would likely signal more involved production (see the Dimension 1 analysis) and thus would be less related to Slate, given its generally informational nature.<br />
  45. 45. Interpretation and Analysis of Dimension 3 (cont.)<br />However, in terms of nominalizations, the data suggests findings that are quite contrary to intuition. Given the informational and professional nature of Slate, one could assume that nominalization would be more frequent in the Slate corpus than in the ENRON or Blog corpora. In a sense, this would relate the Slate corpus to things like AWL or other corpora that involve academic/professional language use, where nominalizations tend to be more frequent.<br />The data suggests instead that there is significantly less use of nominalization in Slate than in the ENRON and Blog corpora. In fact, there is about half as much nominalization in Slate as in the other two corpora.<br />
  46. 46. Interpretation and Analysis of Dimension 3 (cont.)<br />One explanation for this comes from Maurizio Gotti’s text Investigating Specialized Discourse. In Gotti’s words, “one effect of nominalization is the simplification of syntactic structures within a sentence” (Gotti, 2008, p. 83).<br />In e-mails and blogs, a simplification of syntactic structures would be extremely important given the often brief and concise nature of both registers. This, in part, can help explain why the Slate nominalization counts would be so low and the ENRON and Blog counts so high.<br />
  47. 47. Interpretation and Analysis of Dimension 3 (cont.)<br />Furthermore, according to Gotti, the increase in the use of nominalization is part of a “gradual tendency towards a loss of importance with the verb” (Gotti, 2008, p. 167.)<br />In addition to news writing’s attempt to target a vast audience (and thus lowering the required “reading level” for a text – which could also help explain the low nominalization counts and low word length), news articles tend to be more active in order to convey important information in such a way that is digestible to the reader. <br />For instance, according to Westin “the increase of present tense verbs suggests an increased interest in topics of current relevance [i.e., news]” (Westin, 2003, p. 39.)<br />
  48. 48. Interpretation and Analysis of Dimension 3 (cont.)<br />Finally, one last way to look at the counterintuitively low counts in Slate’s use of nominalization is to consider the counterintuitively high counts of nominalization in the ENRON and Blog corpora.<br />In the blog corpus, this could very well be due to the wide range of blogs contained within the corpus itself (i.e., some more formal or professional and some more informal or personal). The use of nominalization would thus be subject to all sorts of different criteria.<br />In the ENRON corpus, however, one would expect nominalization to be less frequent given the (generally perceived) conversational nature of e-mails. According to Suzanne Eggins, conversational interactions generally contain very little nominalization (Eggins, 2004). With that said, perhaps this goes back to e-mail being somewhere between conversation and a letter, making it the intermediary web-based register that it is. E-mails, then, are not specifically conversation, nor are blogs.<br />
  49. 49. Dimension 4<br />Biber’s Dimension 4 is described as the overt expression of argumentation. In other words, this dimension includes features that are associated with persuasive language.<br />Infinitives, the highest feature associated with Dimension 4, were very similar in all three registers. ENRON had an average count of 15.5, blogs had a count of 14.94, and Slate had the lowest count at 12.45.<br />The next feature in Dimension 4 is the occurrence of prediction modals. ENRON had an exceptionally high count of prediction modals at 13.04, while the blog average was 6.81. The average for Slate was again the lowest at 5.62.<br />
  50. 50. Dimension 4 (cont.)<br />Suasive verbs occurred at roughly the same frequencies, with ENRON having an average of 1.22, blogs having an average of 1.07, and Slate having an average of 1.05.<br />Necessity modals occurred most frequently in blogs (with an average of 3.51), with ENRON and Slate having averages of 3.22 and 2.22 respectively.<br />Possibility modals occurred more frequently in the ENRON emails, with an average of 8.77. The blogs and Slate had averages of 6.41 and 4.88 respectively.<br />
  51. 51. Interpretation of Dimension 4 Results<br />ENRON had a surprisingly high average of prediction modals and a slightly high average of possibility modals. A study by Carmen Frehner (2008) offers an explanation: prediction modals and possibility modals are most common in conversation, and email more closely reflects conversation or dialogue than the other corpora we are examining.<br />On all features in Dimension 4, Slate had the lowest averages. The reasons for this will be discussed in the conclusions.<br />
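The Dimension 4 averages reported above can be collected into one table to check which register is highest or lowest on each feature. The sketch below is only a convenience for reading the figures side by side; the dictionary layout and the helper name are assumptions, while the numbers are the averages given on the two preceding slides.

```python
# Average counts reported on the slides for each Dimension 4 feature.
DIMENSION4_AVERAGES = {
    "infinitives": {"ENRON": 15.5, "Blogs": 14.94, "Slate": 12.45},
    "prediction_modals": {"ENRON": 13.04, "Blogs": 6.81, "Slate": 5.62},
    "suasive_verbs": {"ENRON": 1.22, "Blogs": 1.07, "Slate": 1.05},
    "necessity_modals": {"ENRON": 3.22, "Blogs": 3.51, "Slate": 2.22},
    "possibility_modals": {"ENRON": 8.77, "Blogs": 6.41, "Slate": 4.88},
}


def extreme_register(averages, pick=min):
    """Return the register with the lowest (pick=min) or highest (pick=max) average."""
    return pick(averages, key=averages.get)


lowest = {name: extreme_register(avgs, min) for name, avgs in DIMENSION4_AVERAGES.items()}
highest = {name: extreme_register(avgs, max) for name, avgs in DIMENSION4_AVERAGES.items()}

# True when Slate has the lowest average on every listed feature.
slate_lowest_everywhere = all(register == "Slate" for register in lowest.values())
```

Laid out this way, the figures agree with the statement that Slate has the lowest average on every Dimension 4 feature, and they show that ENRON is highest on every feature except necessity modals, where the blogs lead.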
  52. 52. Prediction & Possibility Modals in Emails<br />Frehner, 2008, p.71<br />
  53. 53. Dimension 5<br />Dimension 5 is described as “impersonal versus non-impersonal style.” This dimension was formerly known as abstract versus non-abstract.<br />Although not many of the features listed in Dimension 5 were available to analyze, we were able to extract data on both by-passives and agentless passives.<br />Agentless passives were most common in Slate at 8.18, while ENRON and the blogs had averages of 5.08 and 7.43 respectively.<br />By-passives were also highest in Slate at 1.22, while ENRON and the blogs had similar counts of 0.68 and 0.87 respectively.<br />We think that this may suggest that Slate has a slightly more abstract or impersonal style.<br />
  54. 54. Other Features to Consider<br />Overall Verb Usage<br />Activity Verbs<br />
  55. 55.
  56. 56. Overall Verb Usage Result/Interpretation<br />Although we could not find any research showing that high or low verb usage alone is significant, we do find that certain types of verbs occurred more or less frequently within the different corpora. As stated in our Dimension 1 analysis, the private verb counts for the Blogs and ENRON corpora are expected, given that some of the blogs can be assumed to have been written in a diary/journal format. Furthermore, Slate, as a magazine, would have fewer private verbs, as it is focused on reporting news and covering politics.<br />By contrast, the public verb (e.g. say) counts for ENRON and the blogs averaged out to 4.03 and 4.15 respectively. Slate had a higher count of 7.13. We attribute this to the fact that Slate should have more positive narrative features, as mentioned in Dimension 2. Moreover, Slate is concerned with reporting events and offering commentary on these events. As a result, there should be more past tense and perfect aspect verbs, more third-person referents, and more reported speech.<br />
  57. 57.
  58. 58. Activity Verbs Result/Interpretation<br />According to Biber et al. (2007) on p. 336 of Longman Grammar, across semantic domains, activity verbs have a high frequency in conversation. The only place where they have a higher frequency is fiction.<br />We attribute the high frequency of activity verbs in both blogs and ENRON e-mails to these corpora leaning toward the positive features of Biber’s Dimension 1, which suggests more involved production. Blogs may be more informal, but this depends on the type of blog. E-mail, however, often functions much more like conversation.<br />
  59. 59. Our Intuitions about the ENRON Corpus<br />Generally, company emails are viewed as tools to use in the office: strictly professional tools for communicating with co-workers about work-related affairs. Therefore, the assumption would be that a company such as Enron would tend toward more formal language, conciseness of language, frequent usage of 1st or 2nd person pronouns, and less usage of nouns.<br />Reasons:<br />Conciseness of language - Specific tasks are explained within an email. Email is not used for informal conversation, so there is less reason for a high word count.<br />Frequent 1st or 2nd person pronouns - Usually the employee is writing about an office task that they are engaged in or want others to take part in. Words such as “I” and “you” are commonly used.<br />Less usage of nouns - Emails consist of directives, which request participation from another. Therefore, activity is more frequent, and the text is less informational, a quality that is associated with nouns.<br />In order to illustrate this, here is an example of a standard professional e-mail:<br />
  60. 60. To: abc@wyz.com<br />CC: Accounts Payable<br />Subject: Request for copy of invoice<br />Dear ABC,<br />I'm LMN form the Accounts Payable department at GHI. Ltd. I understand that we have an invoice outstanding with your company since 07/01/2010. This email is to request you for a copy of the invoice, so that we can clear it for payment at the earliest. First of all, apologies for the delay in payment. The accounts team has been reshuffled and this case came to my notice just an hour ago and I am writing to you immediately. The invoice in question is invoice number 246849, for Mr.JKI who stayed at your hotel for a period of 4 days. That is, from 06/28/2010 to 07/01/2010. We cannot seem to locate the invoice, so I request you to email me a copy of the invoice, so that I can issue the payment right away. Please send it to the email address mentioned below and mark it for my attention. Once again, sincere apologies for the delay.<br />Thank you,<br />LMN,<br />Senior Executive<br />Accounts Payable,<br />GHI. Ltd<br />email: <br />Courtesy: (Iyer, 2010)<br />
  61. 61. Our Intuitions about Slate<br />As a group, we had little to no familiarity with Slate Magazine’s content. However, in our early research, we uncovered a few comments regarding Slate’s “left-leaning, liberal bias.” Given this, our intuitions told us that there would be a tendency towards persuasion; which, subsequently, could be reflected in an analysis of dimension 4 (i.e., overt expression of argumentation).<br />We also thought that Slate, given its web-based nature, would reflect patterns as seen in blogs, thus making it more informal.<br />
  62. 62. Our Intuitions about Blogs<br />Given the lack of metadata about the Blogs corpus, we could only rely on the raw data to inform us as to the content of the overall corpus.<br />However, we did have a few preconceived notions about what the Blogs corpus might be like:<br />1.) They would score high in terms of involved production via dimension 1.<br />2.) They would score high in terms of narrative discourse via dimension 2.<br />3.) They would tend to more closely mirror the frequency counts of the ENRON corpus.<br />
  63. 63. Conclusion<br />While the data regarding the Enron Emails corpus suggests more informal and non-professional discourse, the data also provides partial support for some of the intuitions previously stated. Based on the data gathered, the Enron emails, in comparison to the Slate and Blog corpora, had more concise, less detailed language and tended to directly address the reader.
  64. 64. In addition, ENRON tended to reflect conversational discourse, as seen in our analysis of Dimension 1: its frequency counts suggested that it was more involved.
  65. 65. As for the Blogs corpus, according to the data, the blogs were high in involved production and narrative discourse. However, they did not always mirror the frequency counts of ENRON (as seen in the uses of prediction modals and third person pronouns). Also, according to our analysis of Dimension 4, the Blogs corpus was more persuasive than Slate (although less than ENRON). These factors led us to conclude that the blogs, as a register, fall somewhere in between the ENRON and Slate corpora.<br />Conclusion (cont.)<br />In terms of Slate, the data suggested that the corpus was the least persuasive and less informal than expected (as reflected in Dimensions 4 and 5).<br />However, our analysis of wh- relative clauses in Dimension 3 suggested that wh- relative clauses were more frequent in Slate (which is rare for a news source according to Ingrid Westin), possibly because Slate is entirely Web-based. Although intriguing, such a speculation merits further research.<br />
  66. 66. References<br />Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics: Investigating language structure and use. Cambridge: Cambridge University Press.<br />Biber, D., Conrad, S., Johansson, S., Leech, G., & Finegan, E. (2007). Longman grammar of spoken and written English. Essex, England: Pearson Education Limited.<br />Biber, D. (1992). Variation across speech and writing. New York, NY: Cambridge University Press.<br />Cho, T. (2010). language@internet. Retrieved 03 26, 2011, from Linguistic Features of Electronic Mail in the Workplace: A Comparison with Memoranda:<br />Eggins, S. (2004). An introduction to systemic functional linguistics. New York: Continuum International Publishing Group.<br />Frehner, C. (2008). Email, SMS, MMS: The linguistic creativity of asynchronous discourse in the new media age. New York: Peter Lang.<br />Gotti, M. (2008). Investigating specialized discourse. New York: Peter Lang.<br />Hoban, J. (2009, June 24). Hidden Patterns – Enron Email Predicted Collapse. Retrieved 03 26, 2011, from Xobni Blog:<br />Iyer, S. (2010, August 13). Business Email Sample. Retrieved 04 2011, from<br />Lindaraxa. (2011, April 13). Lindaraxa: Thai Pineapple Fried Rice With Shrimp. Retrieved 04 14, 2011, from<br />Perez Sabater, C., Turney, E., & Montero Fleta, B. (2008). Orality and literacy, formality and informality in email communication. Iberica: Revista De La Asociacion Europea De Lenguas Para Fines Especificos/Journal of the European Association of Languages for Specific Purposes (AELFE) (IbericaR), 15, 71-88.<br />
  67. 67. References<br />Schmidt, J. (2007). Blogging practices: An analytical framework. Journal of Computer-Mediated Communication, 12(4), article 13.<br />Westin, I. (2003). Language change in English newspaper editorials. New York: Editions Rodopi B.V.<br />Yang, C. C., Zeng, D., & Chau, M. (Eds.). (2007). Proceedings from the Pacific Asia Workshop, PAISI ‘07: Intelligence and Security Informatics. Chengdu, China: Springer.<br />Yoffe, E. (2011, April 14). Please Take the Gold Watch. Please! Retrieved 04 14, 2011, from Slate:<br />