The document summarizes the results of analyzing metadata and conducting topic modeling on articles published in the MIS Quarterly journal over the last 20 years. Some key findings include:
- The number of articles published per year and the average number of keywords per article have both doubled over 20 years. Most articles fall into the research article or special issue categories.
- Average article length and abstract length have increased, while average title length has remained consistent. The average numbers of tables, figures, and references per article have also seen small increases.
- Topic modeling of abstracts identified 8 dominant topics discussed in articles, including user-centric approaches and product attributes. Some topics like ethics have decreased while others like firm investments have increased.
Admixture of Poisson MRFs: A New Topic Model with Word Dependencies – David Inouye
Given a large collection of uncategorized text documents such as blogs, news articles, research papers or historical documents, how can we automatically discover major subject areas or topics in the collection? In addition, how should the abstract notion of "topic" be mathematically represented and presented to an end-user? For example, a document describing UTCS intuitively might be a combination of the topic "computer science" and the topic "University of Texas". Most topic models--and in particular the most common model, Latent Dirichlet Allocation (LDA)--attempt to answer these questions by proposing that each topic can be represented as a simple frequency distribution over possible words (i.e., a Multinomial distribution). With this representation of a topic, the ubiquitous presentation of the topic to an end-user is a list of the top 10 or 20 words. While LDA has been useful in many applications, we suggest that a simple frequency distribution is an oversimplified notion of topic and hinders both interpretation and further analysis of these topics. Thus, we propose the new topic model Admixture of Poisson MRFs (APM). Unlike in previous models, the topic representations in APM allow dependencies between words. For example, if a computer science paper contains the word "programming", it is more likely to contain the word "languages" than a random computer science paper. This talk describes the APM model, the optimization algorithm for fitting APM, some preliminary results, and some future directions.
Introduction to Latent Dirichlet Allocation (LDA). We cover the basic ideas necessary to understand LDA, then construct the model from its generative process. Intuitions are emphasized, but little guidance is given for fitting the model, since the fitting details are not especially insightful.
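The generative process that LDA is built from can be sketched in a few lines of Python; the corpus sizes and Dirichlet hyperparameters below are illustrative choices, not tied to any particular dataset:

```python
import numpy as np

rng = np.random.default_rng(0)

n_topics, vocab_size, doc_len = 3, 20, 50
alpha, beta = 0.1, 0.01            # illustrative Dirichlet hyperparameters

# Each topic is a Multinomial (frequency) distribution over the vocabulary.
topics = rng.dirichlet(np.full(vocab_size, beta), size=n_topics)

# Generate one document: draw its topic mixture, then draw each word.
theta = rng.dirichlet(np.full(n_topics, alpha))   # per-document topic weights
words = []
for _ in range(doc_len):
    z = rng.choice(n_topics, p=theta)             # pick a topic for this word
    w = rng.choice(vocab_size, p=topics[z])       # pick a word from that topic
    words.append(w)

# The ubiquitous presentation of a topic: its top-10 most probable words.
top10 = np.argsort(topics[0])[::-1][:10]
```

Fitting LDA inverts this story: given only the observed words, inference recovers plausible topic distributions and per-document mixtures.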
Topic Modeling for Learning Analytics Researchers LAK15 Tutorial – Vitomir Kovanovic
Slides from the introductory tutorial on topic modeling with R and the LSA, pLSA, and LDA algorithms, organized at the LAK15 conference in Poughkeepsie, NY, on March 17, 2015
Towards Automated Classification of Discussion Transcripts: A Cognitive Prese... – Vitomir Kovanovic
LAK'16 Conference paper presentation:
abstract:
In this paper, we present the results of an exploratory study that examined the problem of automating content analysis of student online discussion transcripts. We looked at the problem of coding discussion transcripts for the levels of cognitive presence, one of the three main constructs in the Community of Inquiry (CoI) model of distance education. Using Coh-Metrix and LIWC features, together with a set of custom features developed to capture discussion context, we developed a random forest classification system that achieved 70.3% classification accuracy and 0.63 Cohen’s kappa, which is significantly higher than the values reported in previous studies. Besides the improvement in classification accuracy, the developed system is also less sensitive to overfitting, as it uses only 205 classification features, around 100 times fewer than in similar systems based on bag-of-words features. We also provide an overview of the classification features most indicative of the different phases of cognitive presence, which gives additional insight into the nature of the cognitive presence learning cycle. Overall, our results show the great potential of the proposed approach, with the added benefit of providing further characterization of the cognitive presence coding scheme.
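Cohen's kappa, reported above alongside raw accuracy, corrects agreement for chance, which is why it is useful for imbalanced coding tasks. A minimal self-contained sketch with made-up labels (not the paper's data) might look like:

```python
import numpy as np

def cohens_kappa(y_true, y_pred):
    """Agreement between two label sequences, corrected for chance."""
    labels = np.unique(np.concatenate([y_true, y_pred]))
    idx = {label: i for i, label in enumerate(labels)}
    n = len(y_true)
    # Build the confusion matrix: rows = true labels, columns = predicted.
    cm = np.zeros((len(labels), len(labels)))
    for t, p in zip(y_true, y_pred):
        cm[idx[t], idx[p]] += 1
    po = np.trace(cm) / n                      # observed agreement (= accuracy)
    pe = (cm.sum(0) * cm.sum(1)).sum() / n**2  # agreement expected by chance
    return (po - pe) / (1 - pe)

# Toy example with made-up coding labels:
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 0])
y_pred = np.array([0, 0, 1, 2, 2, 2, 1, 0])
kappa = cohens_kappa(y_true, y_pred)
```

Here accuracy is 0.75 while kappa is lower, since some of that agreement would occur by chance alone.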
I will try to explain what QA is, how we can get answers to questions posed in natural language, and how successful we have been in that domain.
I have gained all of my knowledge from the three proposed papers and related reading around them.
Keynote at the Insight@DCU Deep Learning Workshop (https://www.eventbrite.ie/e/insightdcu-deep-learning-workshop-tickets-45474212594) on successes and frontiers of Deep Learning, particularly unsupervised learning and transfer learning.
A summary of my personal expertise and knowledge, complemented by a description of some of the most relevant research and development engagements carried out so far, illustrated with specific examples
Transfer Learning for Natural Language Processing – Sebastian Ruder
Slides on Transfer Learning for Natural Language Processing by Sebastian Ruder. Talk given at Natural Language Processing Copenhagen Meetup on 31 May 2017.
Apply Chinese radicals into neural machine translation: deeper than character... – Lifeng (Aaron) Han
LPRC 2018: Limerick Postgraduate Research Conference
Lifeng Han and Shaohui Kuang. 2018. Apply Chinese radicals into neural machine translation: Deeper than character level. ArXiv pre-print https://arxiv.org/abs/1805.01565v1
Contributions to the multidisciplinarity of computer science and IS – Saïd Assar
Slides of my presentation for my Habilitation (HDR) defense in computer science (CNU section 27, Informatique) at University Paris 1 Panthéon-Sorbonne on Friday, 20 January 2017.
A video recording of the defense is available at https://www.youtube.com/watch?v=1ro_iaI-roA
Tutorial given at LAK13 conference, Leuven, April, 9th, 2013. The presentation is informed by WP2 of the LinkedUp-project.eu that develops an Evaluation Framework for Open Web Data (Linked Data) Applications for Education purposes.
Providing Cognitive Scaffolding within Computer-Supported Adaptive Learning E... – Olga Maksimenkova
Presentation at ICL-2018 by Fedor Dudyrev, Olga Maksimenkova, and Alexey Neznanov. The paper deals with an adaptive learning system that provides cognitive scaffolding in Vygotsky's sense, built on Formal Concept Analysis (FCA) and description logic foundations.
To download this, please follow this link: http://sdrv.ms/SkymHg
A simpler version of my previous seminar slides, but one that provides a clearer explanation of LDA.
The European Industrial Minerals Association issues Recognition Awards every 2 years.
The IMA-Europe Recognition Awards are granted to outstanding projects that make significant contributions to Safety, Innovation, Biodiversity, and Public Awareness, Acceptance & Trust.
Website: http://www.ima-europe.eu/award/
There are four main components to our Active Communication Experience: Listen, Collaborate, Perform, and Report. These pieces combine to form a Continuous Loop Process; once we engage, the process does not end; it only builds momentum, creating scalable historical knowledge that ties your current event to all upcoming events.
Just how far are companies willing to go for buzz? – Xavier Chefneux
More and more companies are investing in the Internet. To support their content, these companies often deploy substantial financial or human resources to provoke "buzz", whether through advertising proper or through communication strategies akin to public relations. In the current context, where social media are growing in importance, any action, good or bad, can become the subject of discussion, hijacking, or accusation. The strategies of some companies can then quickly be labeled "good buzz" or "bad buzz". Through a case study of Gamma and the "Kluspoezen", this analysis aims to answer the following question: just how far are companies willing to go for "buzz"?
Manuscript editing | Research data analyst | Data analysis – Pubrica
Big data analytics is significant because it allows businesses to use their data to uncover areas for improvement and optimization. Increased efficiency leads to smarter operations, higher earnings, and satisfied customers across all company sectors.
Read more @ https://pubrica.com/academy/manuscript-writing/how-to-prepare-a-manuscript-on-big-data-analytics/
School of Accounting Trimester 3A 2013 Information Sheet – kenjordan97598
School of Accounting Trimester 3A 2013 Information Sheet
Test 2 (15%) -Essay
Due Week 9 (5 pm on Friday 3rd January 2014 uploaded through Turnitin on Blackboard)
In the March 2001 edition of Australian CPA there was an article by Ian Nash and Adam Awty entitled “Just clowning around?”. The following is a quote from the article:
Basically, environmental and social reporting is when the accounting profession eases into its Birkenstock sandals and becomes green, fluffy and friendly. It’s the type of reporting that nobody in the market could possibly take seriously, and even if it’s on the horizon, it’s a long way from becoming a regulatory and legal issue. True or false?
With reference to accounting theory critically evaluate the above quotation and provide an opinion on the ‘true or false’ question
As outlined in the unit outline page 5, students are required to write an essay and address the following
The essay should be no shorter than 1000 words and no longer than 1500 words. (Use the word count in Microsoft Office and write the number of words at the end of the essay.)
Required Format: Students are required to upload their document through “Turnitin” in Blackboard by no later than 5pm on Friday 3rd January 2014. Essays should be typed using Microsoft Word with a minimum size 11 font and minimum 1.5 line spacing (no single-spaced submissions please). Left and right page margins should be at least 3 cm. Chicago referencing style is required for in-text and end-text referencing.
A completed assignment cover sheet should be included with the assignment, and the declaration signed by the student indicating that the work submitted is his/her own work. University policies and procedures for academic misconduct and plagiarism will be applied. Further information is available at academicintegrity.curtin.edu.au. Unsigned declarations will not be accepted.
Originality reports can be viewed by students to ensure they have referenced where appropriate and not plagiarised. (In summary, plagiarism is not giving due reference to work that is not your own, whether copied or paraphrased.) Students can reload their edited documents multiple times prior to the submission time. Assignment cover sheets may increase the percentage of similarity, but this can be ignored along with percentages related to end-text references. Other similarity matches will all be examined closely to ensure that students submit their own work. As a guide, try to keep similarity below 20%.
IMPORTANT - The file name of the Word document submitted (i.e., the submission title) needs to reflect your location and student ID. For example, if you are from Sydney your file name should be (SYD_12345678), or Hong Kong (HK_12345678), or Singapore (SING_12345678). The file name should not include your name or the title "test 2"; you can include these in your actual document, NOT THE FILE NAME.
Failure to comply with labelling and formatting instructions will result in loss of u.
IEEE Workshop at USP – How to increase the impact of your research and publications – SIBiUSP
The workshop "IEEE at USP – How to increase the impact of your research and publications" was held on June 5, 2018 in the Electrical Engineering Auditorium of the Escola Politécnica da USP. The event was organized by the Integrated Library System of USP (SIBiUSP), the Library Division of the Escola Politécnica da USP, and the IME USP Library, in partnership with EBSCO, and aimed to present tips on publishing with the IEEE to increase researchers' visibility, research activity, and international reputation. Presenter: Paul Canning.
Individual Research Paper Topics – oswald1horne84988
Individual Research Paper Topics
Discussion Topic
I'm Done
Research the speculations on where the state-of-the-art will be in the near future for one of the following technologies. Your paper should include a description of the state-of-the-art in your technology, a discussion of where the sources that you read believe the technology is heading in the near future, and a discussion of how this technology will affect the choices you would make if you were making purchase recommendations for a client. Although there is room for personal opinion in your paper, you must justify your conclusions.
Firewall policies and methodologies
Intrusion Detection
Routing protocols
Wireless network quality of services
Compare layer 2 wireless network with layer 2 wired-line network
Comparing transport layer protocols – more than TCP and UDP
Service Oriented Architecture (SOA)
Network virtualization
Video and Voice over Internet (VVoIP) or Voice over Internet (VoIP)
Cellular network infrastructure
Big Data
Fog Computing
Cloud Computing
The Internet of Everything (IoE)
Network management
Disaster Recovery
Quality of Services (QoS) at different layers
Cyber security
Note: Most of the listed topics are very broad, so you should narrow your research to some specific technical aspects related to the subject.
· Research Paper Guidelines
Discussion Topic
I'm Done
The different types of research can be classified as Theoretical, Empirical, and Evaluation. Theoretical research is focused on explaining phenomena through the logical analysis and synthesis of theories, principles, and the results of other forms of research such as empirical studies. Empirical research is focused on testing conclusions related to theories. Evaluation research is focused on a particular program, product or method, usually in an applied setting, for the purpose of describing, improving, or estimating its effectiveness and worth.
Research methods are broadly classified as Quantitative and Qualitative.
· Quantitative research includes experimental, quasi-experimental, correlational, and other methods that primarily involve collection of quantitative data and its analysis using inferential statistics such as t-tests, ANOVA, correlation, and regression analysis.
· Qualitative research includes observation, case studies, diaries, interviews, and other methods that primarily involve the collection of qualitative data and its analysis using grounded theory and ethnographic approaches. The Case Study method provides a way of studying human events and actions in their natural surroundings. It captures people and events as they appear in their daily circumstance. It can offer a researcher empirical and theoretical gains in understanding phenomena.
You, as an adult learner, bring a wealth of expertise to your studies. This knowledge and these skills should be used to formulate a research paper that raises new questions and new possibilities, and regards existing problems from a new angle. Effecti.
Ringgold Webinar Series: 3. Lean and Mean - Publication Metadata to Enhance D... – Ringgold Inc
The third session took place on Wednesday 15 February and covered making content easily discoverable. Well-structured and complete metadata about your published works are the key to ensuring content can be easily found, purchased, and used - particularly within the emerging Demand Driven Acquisition Model. The discussion explored:
- The changing landscape of discovery and collection development
- Current industry initiatives surrounding publication metadata
- Review of discovery platforms and discovery layers
- Ringgold's ProtoView service - supporting publishers with the creation and targeted dissemination of quality metadata
If You Tag it, Will They Come? Metadata Quality and Repository Management – Sarah Currier
Presentation to Metadata Perspectives 2009, a conference held in Vienna, Austria in November 2009.
When we build collections of scholarly works, learning materials, or other educational "stuff", we want people to be able to find it. This raises a number of problems, including ensuring that resources are tagged with adequate metadata. In 2004 a pioneering paper on this issue noted:
"At its best, “accurate, consistent, sufficient, and thus reliable” (Greenberg & Robertson, 2002) metadata is a powerful tool that enables the user to discover and retrieve relevant materials quickly and easily and to assess whether they may be suitable for reuse. At worst, poor quality metadata can mean that a resource is essentially invisible within the repository and remains unused." (Currier et al, 2004).
Have the five years since the above-quoted paper was published borne out its prediction: that simply expecting resource authors to create their own metadata at upload would lead to metadata of insufficient quality? Have repository managers been able to persuade funders that including professional metadata augmentation is worth the money? What has been the impact of recent Web developments allowing easier exposure, searching and sharing of resources? How is metadata being treated within the emerging domain of open educational resources? And what does all this mean for repository managers wanting to increase the discoverability of their resources, and to implement workflows for creation of good quality metadata?
Currier, S. et al (2004) Quality assurance for digital learning object repositories: issues for the metadata creation process, ALT-J, Research in Learning Technology, Vol. 12, No. 1, March 2004
http://repository.alt.ac.uk/616/1/ALT_J_Vol12_No1_2004_Quality%20assurance%20for%20digital%20.pdf
Greenberg, J. & Robertson, W. (2003) Semantic web construction: an inquiry of authors’ views on collaborative metadata generation, Proceedings of the International Conference on Dublin Core and Metadata for e-Communities 2002, 45–52.
http://dcpapers.dublincore.org/ojs/pubs/article/viewArticle/693
Metrics for continual improvements – Nolwenn Kerzreho, Lavacon Dublin 2016 – IXIASOFT
The switch to DITA is often justified using a business plan based on the expected Return On Investment (ROI). However, DITA metrics aren't just about cost savings. They are also extremely valuable in evaluating and optimizing your production process as they can help you answer the following questions:
• Is your content effectively serving your audiences?
• Is reuse optimal?
• What are the ongoing content costs?
In this session, you will learn:
• How to set the right metrics for your organization
• How to use DITA metrics beyond cost savings
• How DITA metrics can contribute to a continual improvement process
Smart Marketing for Engineers - IEEE GlobalSpec and TREW Marketing - 2017 Res... – Jenn Corcoran
Industrial marketers need to understand what content engineers consume, why they look for it, and how they find it.
Therefore, IEEE GlobalSpec and TREW Marketing partnered to conduct a survey in major regions of the world to learn critical information from technical professionals.
This research report details our findings, including the types of content engineers prefer, what portion of their buying process happens online, how they use content during the buying process, and more.
The content starts with why Text Analytics needs a special session on convincing the boss, followed by a role play summarizing current mistakes, a sample elevator pitch for your boss, and a proposed execution plan. The content is tailored for mid- to senior-level managers trying to convince leaders/executives/heads. It doesn’t provide any technical details – methodologies, tools, vendors, or hardware investments.
This was presented at the Text Analytics West Summit 2014 in San Francisco. Questions? Reach out to Ramkumar Ravichandran on LinkedIn.
Assignment 2: LASA: Research Proposal – BenitoSumpter862
Assignment 2: LASA: Research Proposal
Submit your final research paper to the Submissions Area by the due date assigned.
It should include a cover page, an abstract, an introduction, a literature review, a methodology, and a reference page.
Your final paper should be double-spaced, 8–10 pages in length, and properly edited.
Please use the following outline:
Introduction (2–3 pages)
Introduction (including the statement of the problem)
Purpose of the study
Research question and hypotheses
Theoretical framework
Operational definitions
Literature review (3 pages)
Introduction
Review of research topic (as covered by the literature)
Conclusion
Methodology (3–4 pages)
Introduction
Research design
Participants
Instruments
Procedures
Data analysis
Limitations of the study (i.e., threats to validity)
Ethical issues
Dissemination strategy
Summary
Reference page
All written assignments and responses should follow APA rules for attributing sources.
Submission Details:
By the due date assigned, save your document as M5_A2_Lastname_Firstname.doc and submit it to the Submissions Area.
This LASA is worth 300 points and will be graded according to the following rubric.
Assignment Component | Proficient | Maximum Points Possible
Articulate the problem to be researched, purpose of the study, the research question and hypotheses in operational terms aligned with the theoretical framework of the research. | States the research question in operational terms that make the question measurable, but neglects to articulate the primary hypothesis and the null hypothesis in operational terms or the relationship between them. Addresses the importance of the research with limited examples of appropriate scholarly support. Mentions the theoretical framework but only superficially developed. | 40
Presents a comprehensive literature review in support of the proposed research question. Presents and defines the research design. | Presents limited scholarly research to support the selected research design. | 40
Identify and define all relevant variables (e.g., participants). Present procedures for obtaining informed consent. | States most appropriate variables with the appropriate statistical research questions for each variable. Provides a general description of informed consent. | 40
Present a systematic description of the methodology to be used in the proposed research. | States the type of data being collected. Partially defines how that data would be collected. Addresses some limitations, but neglected others. | 40
Identify and discuss the assessment instruments to be administered and rationale. Present the empirical support for the assessments you have suggested. | Stated tests or assessment procedures proposed to address forensic issues are accurate based on the information provided in the vignette and empirically supported, but underdeveloped. Accurate but incomplete description of how these tests woul ...
ASSIGNMENT 2 - Research Proposal – sherni1
ASSIGNMENT 2 - Research Proposal
Weighting: 30% towards final grade
Word limit: 3000 (-/+10%) – text only, excluding tables, appendices, references, cover page, contents.
This is an individual piece of work
Apply the requirements of the Harvard Referencing System throughout the report.
Use the structure appearing below:
Research Proposal Specifics
You are about to commence a new research project in a field of your choice.
You are expected to write a report that constitutes a research proposal.
1. Working individually, you will:
- Have chosen a clear and specific research question/ aim/ hypothesis for your research;
- Have contextualised your research question/ aim within the academic literature;
- Understand the philosophical and methodological bases for your research;
- Have a sound method to address the research question/ aim/ hypothesis.
2. Use Harvard style in-text citation and referencing.
3. Do not copy any materials you use word for word unless you identify these sections clearly as quotations.
4. If you paraphrase any materials, you must identify sources through in-text referencing.
5. This is an individual assignment please do not work closely with anyone else.
6. Write 3000 words (+ or – 10%) excluding the header sheet, cover page, contents page, reference list, footnotes and appendices.
Marks for criteria:
10% Focus and Completion – Does the proposal address the set tasks in a meaningful manner?
20% Research Objective – Does the proposal clearly articulate ...
20% Synthesis and Soundness – Does the proposal place the research objective in the context of the relevant academic literature and any relevant past studies? Does the discussion demonstrate a comprehensive understanding of that literature?
30% Research Methods and Methodology – Does the proposal sensibly outline methods for accessing sources of data that will address or answer the research objective? Is the method consistent with the methodology?
10% Clarity of Approach – Is the proposal well organised, logically constructed and attentive to the needs of the reader? Does the timeline include a Gantt chart or key milestones for research?
10% Mechanical Soundness – Is the portfolio clearly written, spell ...
Structuring the research proposal
1. Introduction (~200 words)
Explain the issue you are examining and why it is significant.
Describe the general area to be studied.
Explain why this area is important to the general area under study (e.g., psychology of language, second language acquisition, teaching methods).
2. Background/Review of the Literature (~1000 words)
A description of what is already known about this area and a short discussion of why the background studies are not sufficient.
Summarise what is already known about the field. Include a summary of the basic background information on the topic gleaned from your literature re ...
Successful Single-Source Content Development Xyleme
This presentation looks at why single-source content development is rapidly becoming a strategic initiative within organizations. Content management experts Dawn Stevens of Comtech and Stuart Grossman of Xyleme show you how to design granular content for reusability across products, functions, and delivery modalities, and how to assess your organization's readiness for the move to single source. To view the webinar, please visit: http://www.xyleme.com/download-form?type_of_download=Webinar&nid=218
1. Analysis of Metadata and Topic Modeling for Academic Articles - MIS Quarterly Journal
Under the Supervision of Dr. Arun Rai
By Jigar Mehta
May 12th, 2016
GRA Work Report Submission
Spring 2016
2. Results and Insights – MIS Quarterly Journal – Descriptive Stats
• #Articles published per year has increased twofold over 20 years
• Avg. #Keywords per article has doubled over 20 years
• 82% of articles belong to two dominant categories: Research Article/Note (60%) and Special Issue (20%)
• Avg. length of articles (number of pages) per year has seen a threefold increase over the last 20 years
• Avg. abstract length per article was higher in '05–'10 but has been consistent since then (~1,500 characters)
• No significant trend in avg. title length (~100 characters) per article apart from small year-to-year variations
• For the last 5 years: avg. #Tables per article ranges from 7 to 8, while avg. #Figures per article is around 4
• For the last 5 years: avg. #References per article per year has seen a small increase (avg. ~85 references)
• On average, there are two authors per article
3. Results and Insights – MIS Quarterly Journal – Content Analysis
• Based on topic modeling of abstracts alone over the last 20 years, these 8 topics are the most widely discussed by authors:
User/customer-centric approach and attributes
Product/service attributes
Ethics and legal issues
Project outsourcing, teams and offshoring
Scientific studies, analysis methods and models
Firms investments, working and capabilities
Decision support systems and framework
Organizational process development and framework
Topics and their weightage: Ethics and legal issues (6%); Product/service attributes (11%); Project outsourcing, teams and offshoring (11%); Scientific studies, analysis methods and models (12%); Decision support systems and framework (13%); Firms investments, working and capabilities (14%); User-centric (16%); Organizational process development (18%)
Increasing trend of topics:
1. Product/service attributes
2. User-centric focused approach
3. Firms investment & capability alignment
Decreasing trend of topics:
1. Ethics and legal issues
2. Project outsourcing
Consistent trend of topics:
1. Organizational process development
2. Scientific studies and models
3. Decision support systems
4. Project Objective and Framework Discussion
Workflow: MISQ Journal data fetch → Python script to create metadata and other tables → Python script for base table preparation for analysis → R code for word clouds and keyword trend analysis of academic papers → descriptive analysis (code and results) → topic modeling (R code, results and presentation) → topic modeling trend analysis and presentation → topic modeling multiple iterations & Tableau final results
Visualizing Work Progress – milestones: Jan 12th, Jan 19th, Feb 2nd, Feb 16th, Mar 1st, Mar 8th, Mar 22nd, Apr 5th, Apr 19th, May 4th
7. Textual Analysis – Topic Modeling on Abstracts of Papers
Documents and words can be directly observed; topics are latent.
8. Assumptions
Documents
• A document is a mix of topics
• A single document can consist of many topics, but in different proportions
• A topic is a mix of words
• Two documents with the same topics will have overlapping words
• Use statistics to find latent topics represented by groups of words
Topics
• Find topics that are as distinct from each other as possible
• Highlight the most heavily discussed topic(s) in each paper
• Keeping α low leads to a sparse topic distribution per document
• Keeping β low leads to topics sharing fewer common words
10. Understanding the Alpha and Beta Parameters
α
• A high alpha value means that each document is likely to contain a mixture of most of the topics, and not any single topic specifically.
• A low alpha value puts fewer such constraints on documents: a document is more likely to contain a mixture of just a few, or even only one, of the topics.
β
• A high beta value means that each topic is likely to contain a mixture of most of the words, and not any word specifically.
• A low beta value means that a topic may contain a mixture of just a few of the words.
Impact on Content
• In practice, a high alpha value leads to documents being more similar in terms of the topics they contain.
• A high beta value similarly leads to topics being more similar in terms of the words they contain.
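The effect of α described above can be sketched numerically. The snippet below is an illustration (not part of the original analysis): it samples per-document topic proportions from a Dirichlet prior with K = 8 topics and compares how concentrated the mixtures are for a low versus a high α.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 8  # number of topics, matching the models in this report

def avg_dominant_weight(alpha, n_docs=2000):
    """Sample theta ~ Dirichlet(alpha) for n_docs documents and return
    the mean proportion held by each document's single dominant topic."""
    theta = rng.dirichlet([alpha] * K, size=n_docs)
    return theta.max(axis=1).mean()

sparse = avg_dominant_weight(0.1)   # low alpha: one or two topics dominate
dense = avg_dominant_weight(10.0)   # high alpha: near-even mixture of topics

print(f"alpha = 0.1 -> dominant topic holds ~{sparse:.0%} of a document")
print(f"alpha = 10  -> dominant topic holds ~{dense:.0%} of a document")
```

With a low α the dominant topic carries most of a document's mass; with a high α every document looks like an even blend of all topics, which is exactly the "documents become more similar" effect noted above.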
11. Multiple Iterations – Tuning α, β, K and N – 60 Topic Models
N iterations (run 1) | N iterations (run 2) | α    | β
700                  | 1500                 | 0.02 | 0.02
1000                 | 1500                 | 0.1  | 0.08
2000                 | 1500                 | 0.3  | 0.1
5000                 | 1500                 | 0.6  | 0.4
8000                 | 1500                 | 0.8  | 0.6
10000                | 1500                 | 1    | 0.8
K ∈ {5, 8, 12, 16, 20}
Insights
• As α increases, topics are more evenly distributed in terms of the proportion of documents they hold. Low values cause a sparse topic distribution; high values cause topics to share common themes and hence overlap.
• As β increases, topics become more similar in terms of the words they are made up of. Low values yield unique topics; high values cause topics to be similar and overlap.
• As K increases, more topics are discovered. Low values cause significant topics to be missed, while higher values can produce overlapping, similar topics.
• As N increases, topic discovery stabilizes as the model approaches convergence. Low values yield unstable, unreliable topic discovery.
12. Topic Model Result 1
(Topics= 8, Iterations = 1800, alpha = 0.61, beta = 0.4)
13. Topic Trend over years and Top words for each Topic
Top words per topic:
• User-centric behavior: user, influence, adoption, users, perceived, usage, factors, intention, security, behaviors, behavior, training, individual, acceptance, relationship, affect, support, efficacy, implementation, computer, beliefs
• Product/Service attributes: product, service, quality, trust, privacy, price, consumer, electronic, markets, perceived, products, impact, content, market, effects, uncertainty, consumers, internet, sales, find, feedback
• Epistemological perspectives in IS: work, theories, managers, professionals, quandaries, deception, ethical, term, increase, stakeholder, normative, challenges, managerial, explored, resolve, law, conflict, turnover, reported, ethics, violating
• IS development/Project management (outsourcing/offshoring): project, task, time, communication, projects, groups, group, media, team, teams, members, differences, control, client, tasks, development, cultural, offshore, offshoring, learning, support
• Research Design and Methods: studies, field, analysis, modeling, researchers, interpretive, constructs, methods, models, evaluation, case, science, measurement, construct, approach, validity, statistical, principles, structural, issues, techniques
• IT Strategy/Business Value: firm, firms, strategic, strategy, risk, alignment, resource, capability, resources, investments, capabilities, level, significant, investment, outsourcing, benefits, industry, findings, network, governance, agility
• Changing nature of computing: decision, support, making, virtual, effectiveness, complexity, problem, users, tools, effects, search, user, approach, world, develop, explanations, present, framework, existing, important, interface
• Organizational processes: development, innovation, organizations, practice, technologies, analysis, develop, context, work, change, understanding, action, practices, theoretical, framework, case, processes, concept, developing, role, mechanisms
[Chart: Topic Trend over the years, 1997–2015 — yearly proportions for the eight topics: user-centric; product/service attributes; ethics and legal issues; project outsourcing, teams and offshoring; scientific studies, analysis methods and models; firms investments, working and capabilities; decision support systems and framework; organizational process development]
14. Pearson Correlation (Linear) amongst the Topics
Topics: T1 User-centric behavior; T2 Product/Service attributes; T3 Epistemological perspectives in IS; T4 IS development/Project management (outsourcing/offshoring); T5 Research Design and Methods; T6 IT Strategy/Business Value; T7 Changing nature of computing; T8 Organizational processes

       T1     T2     T3     T4     T5     T6     T7     T8
T1   1.00  -0.45   0.08   0.13  -0.12  -0.47  -0.49   0.12
T2  -0.45   1.00  -0.54  -0.27   0.22   0.21   0.04  -0.23
T3   0.08  -0.54   1.00   0.20  -0.24  -0.27   0.47  -0.20
T4   0.13  -0.27   0.20   1.00  -0.17  -0.48  -0.06  -0.17
T5  -0.12   0.22  -0.24  -0.17   1.00  -0.04  -0.10  -0.38
T6  -0.47   0.21  -0.27  -0.48  -0.04   1.00   0.15  -0.17
T7  -0.49   0.04   0.47  -0.06  -0.10   0.15   1.00  -0.49
T8   0.12  -0.23  -0.20  -0.17  -0.38  -0.17  -0.49   1.00
15. Topic Model Result 2
(Topics = 8, Iterations =1500, alpha = 0.02, beta = 0.02)
20. Semantic Relatedness and TF-IDF
Dimensionality Reduction
• Reduce the high-dimensional term vector space to a low-dimensional 'latent' topic space
Semantic Analysis
• Two words co-occurring in a text signal that they are related
• Document frequency determines the strength of the signal (co-occurrence index)
TF-IDF
• TF (Term Frequency): terms appearing more frequently in a document are more important
• IDF (Inverse Document Frequency): terms appearing in fewer documents are more specific
• TF * IDF indicates the importance of a term relative to the document
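The TF * IDF weighting above can be computed by hand. A minimal sketch over three toy "abstracts" (invented for illustration), using the common log(N/df) form of IDF:

```python
import math

# Three toy "abstracts" (illustrative data only)
docs = [
    "user adoption of information systems",
    "user trust in electronic markets",
    "topic models for text analysis",
]
tokenized = [d.split() for d in docs]
N = len(tokenized)

def tf_idf(term, doc_tokens):
    tf = doc_tokens.count(term) / len(doc_tokens)  # term frequency in this doc
    df = sum(term in d for d in tokenized)         # document frequency
    if df == 0:
        return 0.0
    idf = math.log(N / df)                         # inverse document frequency
    return tf * idf

# "user" appears in two documents, "topic" in only one,
# so "topic" is more specific and gets a higher weight.
print(tf_idf("user", tokenized[0]), tf_idf("topic", tokenized[2]))
```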
21. Topic Modeling Process – LDA Implementation Steps (Part 1)
• Clean the abstracts of as much noise as possible and lowercase all abstract text
• Replace all special characters and perform n-gram tokenization
• Lemmatize – reduce words to their root form, e.g., "reviews" and "reviewing" to "review"
• Remove numbers (e.g., "2014") and strip HTML tags and symbols
• Create dictionaries and a bag-of-words corpus
• Pass through the LDA algorithm and evaluate
Pipeline: Preprocessing (tokenization → lemmatization → stopwords removal) → Dictionaries and Bag-of-Words (vector space model) → LDA → Topics and their words → Tuning parameters
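The preprocessing steps can be sketched as one small function. This is an illustration with a toy stopword list and lemma table, not the author's actual script:

```python
import re

STOPWORDS = {"the", "a", "of", "and", "in", "to"}        # toy stopword list
LEMMAS = {"reviews": "review", "reviewing": "review"}    # toy lemma table

def preprocess(abstract):
    """Lowercase, strip HTML tags and numbers, tokenize,
    remove stopwords, and lemmatize."""
    text = abstract.lower()
    text = re.sub(r"<[^>]+>", " ", text)   # drop HTML tags
    text = re.sub(r"\d+", " ", text)       # drop numbers like "2014"
    tokens = re.findall(r"[a-z]+", text)
    return [LEMMAS.get(t, t) for t in tokens if t not in STOPWORDS]

print(preprocess("Reviewing <b>2014</b> reviews of the systems"))
# -> ['review', 'review', 'systems']
```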
22. Topic Modeling Generative Process – LDA Implementation Steps (Part 2)
For LDA, the generative model consists of the following three steps:
Step 1: Select β
• The term distribution β is determined for each topic by β ∼ Dirichlet(δ).
Step 2: Select θ
• The proportions θ of the topic distribution for the document w are determined by θ ∼ Dirichlet(α).
Step 3: Iterate
• For each of the N words wi:
• (a) Choose a topic zi ∼ Multinomial(θ).
• (b) Choose a word wi from a multinomial probability distribution conditioned on the topic zi: p(wi | zi, β).
* β is the term distribution of the topics and contains the probability of a word occurring in a given topic.
* The process is purely based on frequency and co-occurrence of words.
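The three-step generative process can be simulated directly. The dimensions below are toy values chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
V, K, N = 6, 3, 20       # vocabulary size, topics, words per document
delta, alpha = 0.1, 0.5  # Dirichlet hyperparameters

# Step 1: term distribution for each topic, beta_k ~ Dirichlet(delta)
beta = rng.dirichlet([delta] * V, size=K)

# Step 2: topic proportions for one document, theta ~ Dirichlet(alpha)
theta = rng.dirichlet([alpha] * K)

# Step 3: for each word, draw a topic z_i ~ Multinomial(theta),
# then a word w_i from the chosen topic's term distribution beta[z_i]
doc = []
for _ in range(N):
    z = rng.choice(K, p=theta)
    w = rng.choice(V, p=beta[z])
    doc.append(w)

print(doc)  # N word indices drawn from the document's topic mixture
```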
25. Number of Articles Published by the Year of Publication (1977 – 2015)
Total Papers = 1081
26. Number of Articles Published by Category of Paper (2000–2015)
# Articles by Category: Research Article (281), Special Issue (111), Research Note (69), Issues and Opinions (41), Research Essay (25), Theory and Review (21), MISQ Review (7), SIM Paper Competition (3)
Total Papers = 551
27. Trend of Average # Keywords Per Article by Year (1996 – 2015)
Avg. #Keywords per article has doubled over 20 years
Total Papers = 584