Scholar Plot –
Scalable Data Visualization Methods for Academic Careers
Kyeongan (Karl) Kwon
PhD Dissertation
Department of Computer Science
University of Houston
Monday July 18, 2016
9. Article | Author | Year | Conclusion
"Visualization of the citation impact environments of scientific journals", Journal of the American Society for Information Science and Technology | L Leydesdorff | 2007 | Effort focused on visualizing citation patterns using a journal data set
"Augmenting the exploration of digital libraries with web-based visualizations", IEEE Fourth International Conference on Digital Information Management (ICDIM 2009) | P Bergstrom, D Atkinson | 2009 | Exploring patterns in the literature using a static data set at CiteSeer
"SciVal experts: A collaborative tool", Medical Reference Services Quarterly | E Vardell, T Feddern-Bekcan, M Moore | 2011 | Summary of researchers' profiles using Scopus
"Scholarometer: A system for crowdsourcing scholarly impact metrics", Proceedings of the 2014 ACM Conference on Web Science (WebSci 2014) | J Kaur, M JafariAsbagh, F Radicchi, F Menczer | 2014 | Citation analysis using Google Scholar, but no Impact Factor and no funding information
31. [Figure: matching one scholar's name between the Google profile ("Daniel M. Smith") and the funding records ("Daniel Michael Smith"), with the patterns "M % Daniel", "Daniel M %", and "Daniel Michael"]
32. • 1.1: The scheme should reveal causal relationships among merit criteria
• Funding + pre-production credit + post-production credit
• 1.2: The scheme should be scale invariant
• Individual or Department or College (composite personhood)
33. • Scholar Plot is good for individuals
• Not scalable to groups
No!!!
34. • 1.1: The scheme should reveal causal relationships among merit criteria
• Funding + pre-production credit + post-production credit
• 1.2: The scheme should be scale invariant
• Individual or Department or College (composite personhood)
• Scholar Plot is good for individuals
• Not scalable to groups
• Scholar Plot draws from Google Scholar, Thomson Reuters, and OpenGov
• It is a public product working flawlessly! (ScholarPlot.com)
• Scaling interface was still pending
Work-in-progress
Done
Done
Done
Work-in-progress
40. • Local – same department • Global – same discipline
41. • Local - same department • Global - same discipline
42. • Local – same department • Global – same discipline
49. • 1.1: The scheme should reveal causal relationships among merit criteria
• Funding + pre-production credit + post-production credit
• 1.2: The scheme should be scale invariant
• Individual or Department or College (composite personhood)
• Scholar Plot is good for individuals
• Not scalable to groups
• Scholar Plot draws from Google Scholar, Thomson Reuters, and OpenGov
• It is a public product working flawlessly! (ScholarPlot.com)
• Scaling interface is still pending
• Validates the design choice of the three criteria for the visualization
Done
Done
Done
50. Fall 2011 (1st year), Spring 2012, Fall 2012 (2nd year), Spring 2013, Fall 2013 (3rd year), Spring 2014, Fall 2014 (4th year), Spring 2015, Fall 2015 (5th year), Spring 2016
S Taamneh, M Dcosta, K Kwon and I Pavlidis, "SubjectBook: Web-based Visualization Of Multimodal Affective Datasets", ACM Human Factors in Computing Systems, CHI 2016, San Jose, CA
D Majeti, K Kwon, P Tsiamyrtzis and I Pavlidis, "Dissecting Scholarly Patterns in Biology and Computer Science", The Science of Team Science, SciTS 2015, Bethesda, MD
K Kwon, D Shastri and I Pavlidis, "Information Visualization in Affective User Studies", The IEEE Visual Analytics Science and Technology, IEEE Information Visualization, and IEEE Scientific Visualization, VIS 2014, Paris, France
K Kwon, D Shastri and I Pavlidis, "Interfacing Information in Affective User Studies", The 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing, Ubicomp 2014, Seattle, WA
T Feng, Z Liu, K Kwon, W Shi, B Carbunar, Y Jiang and N Nguyen, "Enhancing Mobile Security with Continuous Authentication Based on Touchscreen Gestures", The twelfth annual IEEE Conference on Technologies for Homeland Security, HST 2012, Waltham, MA
J Lee, Z Liu, X Tian, D Woo, W Shi, D Boumber, Y Yan, and K Kwon, "Acceleration of Bulk Memory Operations in a Heterogeneous Multicore Architecture", 21st International Conference on Parallel Architectures and Compilation Techniques, PACT 2012, Minneapolis
Conference Presentations
K Kwon, "Design Principles: Information Visualization in User Studies", Proceedings of the 2015 US-Korea Conference on Science, Technology and Entrepreneurship, UKC 2015 Atlanta
K Kwon, "Interfacing Information with Mixed Methods", Proceedings of the 2014 US-Korea Conference on Science, Technology and Entrepreneurship, UKC 2014 San Francisco, CA
Activities / Membership
2012 PhD Student Association Officer
2014 Computer Science PhD Showcase
2014 Graduate Research and Scholarship Projects (GRaSP)
2015 Graduate Research and Scholarship Projects (GRaSP)
2016 Volunteering Judges
M.S.Switched Lab
Released Released
Thank you for coming to my presentation today.
My name is Kyeongan Kwon.
Today, I am going to present my PhD dissertation about scalable data visualization for academic careers.
This is an overview of today's presentation.
In the introduction, I am going to talk about the research problem and related work.
In design philosophy and methodology, I am going to cover the design philosophy and how it applies to the product.
Before I present my research,
I would like to talk a little about what data visualization is exactly.
It has many benefits, but I will point out two main ones.
Qualitative analysis is the scientific study of data that can be observed, but not measured.
Because of the way the human brain processes information, using charts or graphs to visualize large amounts of complex data is easier than poring over spreadsheets or reports.
A Good Research Problem
Appraising academic careers involves difficult and time-consuming tasks.
For example,
the CV is still very popular for evaluation.
- It is where your publishing and research history, as well as your memberships and fellowships, are listed.
Why is it inconsistent? For example, the way people list their publications is inconsistent.
Journals (with Impact Factor or not), conferences (with acceptance rate or not).
There are three goals of the research.
Some related work is out there: software products.
There is also related work in the literature.
There is no definitive scheme for evaluating academics.
That is because everybody has their own ideas.
Publishing in a big journal is good, and having many cited publications is also good.
It is not clear how these are related to each other.
Let me tell you how I structured this visual abstract based on three merit criteria.
- Impact of the intellectual products: what happened after you published.
- Prestige of the venues where intellectual products appear: how famous the places where you published are.
- Funding that enables intellectual production
Why is impact on the vertical scale measured by citations and not by Impact Factor?
Because the Impact Factor has a fixed scale (1-60), while citations can reach 100,000.
I have three figures of the same scholar to show the difference.
Google Scholar Profile: only one bar chart and a publication list.
Curriculum Vitae, a brief account of a person's education, qualifications, and previous experience.
However, Scholar Plot conveys more, yet simply. It includes all the publications, with different colors and symbols that let people distinguish the type of publication quickly. It also includes funding information.
This is the result for one scholar who has about 300 publications, including journal papers, conference papers, and patents.
It is easy to identify what type of publication powers the individual’s scholarship.
How to differentiate prestige
Journals are a little different when it comes to prestige, because only they have a widely accepted ranking system, the Journal Impact Factor.
Conferences have acceptance rates and some general guidelines that rank them, such as A+ / A. However, some conferences have no such information.
Patents have no information of this type.
We chose disks to be the symbols of journal publications, as disk scaling can be done very effectively by simply varying its radius (A ∼ r²).
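To make the A ∼ r² point concrete, here is a minimal sketch of area-proportional disk scaling. It assumes the disk area (not the radius) should grow linearly with the plotted value; the function name `disk_radius` and the parameters `max_value` and `max_radius` are hypothetical and are not taken from the Scholar Plot code.

```python
import math


def disk_radius(value, max_value, max_radius=20.0):
    """Return a radius whose disk AREA is proportional to `value`.

    Because A ~ r^2, the radius must grow with the square root of the value.
    `max_value` maps to `max_radius`; both are illustrative assumptions.
    """
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    # Clamp the value into [0, max_value] so the radius stays in range.
    clamped = max(0.0, min(float(value), float(max_value)))
    return max_radius * math.sqrt(clamped / max_value)
```

With this choice, a value four times larger gives a disk with four times the area but only twice the radius.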
Size
How do I present this value from lower to higher IF?
I did some data analytics for this.
Frequency of journal Impact Factors published by Thomson Reuters
Four categories based on histogram analysis. I use these to make the disk size commensurate with the Impact Factor.
Two different scale views: to create a standardized scale for the y-axis for comparison, I introduced a log10 scale for the default plot and an option to toggle to the decimal scale view.
If you plot a senior record on a decimal scale, it compresses the visualization.
For junior records, however, the choice of scale matters little because you can hardly see the difference.
That’s why the logarithmic scale is the default. It makes the differences visible.
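The four-category idea can be sketched as a simple lookup. The category boundaries and radii below are hypothetical placeholders, not the values obtained from the histogram analysis; only the mechanism (binning the Impact Factor into four categories and assigning one disk size per category) follows the slide.

```python
from bisect import bisect_right

# Hypothetical category boundaries; NOT the values from the histogram analysis.
IF_EDGES = (2.0, 5.0, 10.0)
# One disk radius per category, smallest to largest (also hypothetical).
DISK_RADII = (4, 6, 8, 10)


def disk_radius_for(impact_factor):
    """Map a journal Impact Factor to one of four disk radii."""
    # bisect_right counts how many boundaries are <= impact_factor,
    # which is exactly the index of the category.
    return DISK_RADII[bisect_right(IF_EDGES, impact_factor)]
```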
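A small sketch of the two scale views follows. It assumes citations are shifted by one before taking the logarithm so that papers with zero citations can still be placed; that shift, the function name `y_fraction`, and its parameters are my assumptions for illustration.

```python
import math


def y_fraction(citations, max_citations, scale="log10"):
    """Return the vertical position of a paper as a fraction of the axis height."""
    if max_citations <= 0:
        raise ValueError("max_citations must be positive")
    if scale == "log10":
        # Shift by one (an assumption) so that zero citations maps to 0.
        return math.log10(citations + 1) / math.log10(max_citations + 1)
    if scale == "decimal":
        return citations / max_citations
    raise ValueError("scale must be 'log10' or 'decimal'")
```

For example, a paper with 9 citations on an axis that tops out at 99 sits halfway up in the log10 view, but below one tenth of the height in the decimal view, which is why the low-citation papers become hard to tell apart on the decimal scale.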
Scholar Plot also depicts the NSF/NIH/NASA funding of an individual as a multiline in log2 scale.
A line chart is appropriate for visualizing funding data because it is commonly used, as in stock market charts.
There are some patterns of scholarly profiles.
I bring three examples.
You do not necessarily need to look at the main plots; you can simply look at the vertical projection and tell the type of pattern.
Now I had a prototype that included publication records and funding records.
I wanted to improve it.
So I ran two evaluations: a focus group and a user study.
A total of n = 15 participants from various natural, mathematical, and social sciences participated in the user study.
Usability, accuracy and intuitive understanding of Scholar Plot scored the highest (μ = 4.2).
The participants were planning to use Scholar Plot frequently (μ = 4.1).
Overall, the survey confirmed that the base level of Scholar Plot is a user-friendly tool that academic users find of interest and value.
Through a focus group,
I added 4 panels below the Scholar Plot results.
Team science information.
- the number and intensity of collaborations for the depicted scholar.
Impact information
- Summarize highly cited papers.
Prestige information
- the specific journals where the scholar publishes most often and their impact factors.
We have another feature that lets you interact with each plot.
When you interact with a plot, it shows a tooltip, which includes the paper title, year of publication, number of citations, journal or conference name, and journal impact factor.
This is optional.
Data sources
Impact: citations from Google Scholar
Prestige: Impact Factor from Thomson Reuters
The user visits Scholar Plot.
The user types the scholar name, which sends an HTTP request via Ajax (asynchronous JavaScript and XML) and jQuery.
The web server fetches the scholar data from the Google Scholar Profile using the user input parameter.
The web server connects to the database server to fetch the data (author, Impact Factor, funding information).
The web server returns the data in JSON format to the user in the HTTP response.
The web browser renders the data in SVG using HTML5+CSS.
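The server-side step in the flow above, where the fetched publications are joined with the Impact Factor and funding data and returned as JSON, could look roughly like this. It is only a sketch under assumptions: the function name `build_scholar_response` and all field names are hypothetical, and the inputs are plain Python structures standing in for the Google Scholar fetch and the database query.

```python
import json


def build_scholar_response(name, publications, impact_factors, funding):
    """Join publication records with venue Impact Factors and serialize to JSON.

    publications:   list of dicts with title, year, citations, venue
    impact_factors: dict mapping venue name to Impact Factor
    funding:        list of dicts describing grants
    All field names here are illustrative assumptions.
    """
    records = []
    for pub in publications:
        records.append({
            "title": pub["title"],
            "year": pub["year"],
            "citations": pub["citations"],
            "venue": pub["venue"],
            # Conferences and patents have no Impact Factor, so this may be None.
            "impact_factor": impact_factors.get(pub["venue"]),
        })
    return json.dumps({"scholar": name, "publications": records, "funding": funding})
```

The browser side would then parse this JSON and draw one SVG symbol per publication record.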
The biggest problem is the middle name.
Matching
Title will be placed
1. remove
2. matching
I presented Scholar Plot (SP), a compact and scalable (individual-department-college) visualization interface for academic merit. We released SP at http://scholarplot.com.
The basic idea behind SP is to facilitate instant deeper insights regarding different strengths of academic records, supporting the work of evaluation committees and the curious academic in search of an advisor or department. One of SP’s strengths is that it draws data from open sources that are inclusive.
1. Funding information - NSF/NIH/NASA
2. Citation
3. Impact Factor - size and colors
You can easily understand the concept and the information of a department.
Local - In the local scale, the y-axis is automatically scaled to the highest citation count received by faculty in the department. Quartiles are calculated using the faculty within the same department. Citation / Impact Factor / Funding
Global – Quartiles are calculated using the faculty within the same discipline.
We used CIP codes (the Classification of Instructional Programs, designed by the National Center for Education Statistics) to map the departments to disciplines.
In Global, the scale below the cloud is fixed at 20,000, which is the 90th percentile of all faculty. The scale above the cloud is fixed at 300,000, which is approximately the highest citation count of any faculty member.
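The difference between the local and global views can be sketched as the same quartile calculation run against two different reference groups. This is an illustration only: the function name `quartile_of`, the numbering (4 = top quartile), and the use of Python's default quantile method are my assumptions.

```python
from bisect import bisect_right
from statistics import quantiles


def quartile_of(value, reference_values):
    """Return the quartile (1 = bottom, 4 = top) of `value` within a reference group.

    Local view:  reference_values are the faculty of the same department.
    Global view: reference_values are the faculty of the same discipline.
    """
    # Three cut points split the reference group into four quartiles.
    cuts = quantiles(reference_values, n=4)
    return 1 + bisect_right(cuts, value)
```

The same faculty member can therefore land in the top quartile locally and in a lower quartile globally, depending on how strong the rest of the discipline is.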
Guess what you would conclude from the local view:
truly, everyone in the department is good.
To validate design choices – prominent group (chaired)
1. Ground truth (standard, base)
2. Linear Model – proof
3. Variables - Mirror to visualization
I ran a data analysis to validate the Academic Garden design.
CS: n = 248 of 515, n = 61 of 130
Bio: n = 152 of 381, n = 32 of 97
This linear model evaluates the correlation of chaired professors with the three variables displayed in Academic Garden.
Total Citations, Mean IF, Total Funding (NSF, NIH, NASA)
Is_chair (1): the faculty member is in the top quartile for any of the three criteria (funding, mean IF, total citations).
Is_chair (0): the faculty member is not in the top quartile for any of the three criteria.
Quartiles are calculated with respect to the department to which the faculty member belongs.
E.g., if a faculty member is in the top quartile for citations but not for funding and impact factor, they still get 1 according to this metric.
All three criteria as separate factors.
In computer science, citations are very important, and you can clearly see the chaired faculty: being chaired is significantly related to citations.
Because they do not publish much in journals, impact factor is not significant. Also, our funding sources do not include everything that computer science faculty received funding from.
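The top-quartile indicator described above can be sketched as follows. This is a minimal illustration, not the analysis script: the field names, the function name `top_quartile_flags`, and the choice to count a value as top quartile only when it is strictly above the department's 75th-percentile cut point are all assumptions.

```python
from statistics import quantiles

# The three criteria shown in the visualization (field names are hypothetical).
CRITERIA = ("total_citations", "mean_if", "total_funding")


def top_quartile_flags(department):
    """Return {name: 1 or 0} for one department.

    A faculty member gets 1 if they are in the department's top quartile
    for ANY of the three criteria, and 0 otherwise.
    """
    # Third cut point of the quartiles = 75th percentile within the department.
    cutoffs = {
        c: quantiles([member[c] for member in department], n=4)[2]
        for c in CRITERIA
    }
    return {
        member["name"]: int(any(member[c] > cutoffs[c] for c in CRITERIA))
        for member in department
    }
```

Because the cut points are computed per department, the function has to be called once per department rather than on the pooled faculty list.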
local_top_q_any_of_three
1: the faculty member is in the top quartile for any of the three criteria (funding, mean IF, total citations).
0: the faculty member is not in the top quartile for any of the three criteria.
Linear Model: Academic Garden
This linear model evaluates the correlation chaired professors have with the three variables displayed in Academic Garden.
Why ? Validate design choices – prominent group (chaired)
1. Ground truth (standard, base)
2. Linear Model – proof
3. Quartile - Mirror to visualization
This data analysis validates the design choice of the three criteria for our visualization!
We presented Scholar Plot (SP), a compact and scalable (individual-department-college) visualization interface for academic merit. We released SP at http://scholarplot.com.
The basic idea behind SP is to facilitate instant deeper insights regarding different strengths of academic records, supporting the work of evaluation committees and the curious academic in search of an advisor or department. One of SP’s strengths is that it draws data from open sources that are inclusive.
I would like to thank Dr. Pavlidis for being my research advisor. And I would like to thank each of you for serving on my committee.
Done.