A large amount of data is created in an LMS instance, and much of it can be analyzed for insight. In 2016, Instructure, the maker of Canvas, made its LMS data available to customers through a data portal (updated monthly). This portal provides access to a number of flat files related to that particular instance. This presentation shows how these big data were analyzed on a regular laptop with basic office software to summarize Kansas State University’s use of the LMS. Methods for analysis include basic descriptive statistics, survival analysis, computational linguistic analysis, and others.
The results are reported with both numbers and data visualizations, including classic pie charts, line graphs, bar charts, mixed charts, word clouds, and others. The findings provide some insights about how to approach the data, how to use a data dictionary, and other methods for extracting the data for awareness and practical decision-making. This work also suggests next steps for more advanced analysis (using the flat files in a SQL database).
More information about this may be accessed at http://scalar.usc.edu/works/c2c-digital-magazine-spring--summer-2017/wrangling-big-data-in-a-small-tech-ecosystem.
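Analyses like these can start small. The following sketch shows the kind of descriptive summary the abstract describes, using only the Python standard library; the file contents and column names here are hypothetical stand-ins, not the portal's actual schema:

```python
# Hypothetical sketch: summarizing one Canvas data-portal flat file with only
# the Python standard library. The column names below are assumptions for
# illustration, not the portal's actual schema.
import csv
import io
import statistics

# Stand-in for a small slice of a "requests"-style flat-file export.
sample_flat_file = io.StringIO(
    "course_id,user_id,interaction_seconds\n"
    "101,u1,120\n"
    "101,u2,300\n"
    "102,u3,45\n"
    "102,u4,600\n"
)

rows = list(csv.DictReader(sample_flat_file))
seconds = [int(r["interaction_seconds"]) for r in rows]

# Basic descriptive statistics over the extract.
summary = {
    "rows": len(rows),
    "courses": len({r["course_id"] for r in rows}),
    "mean_seconds": statistics.mean(seconds),
    "median_seconds": statistics.median(seconds),
}
print(summary)
```

The same pattern scales to the real monthly exports: read a flat file, select a numeric column, and summarize it before charting in office software.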
NCERT Class 10 First Flight Chapter 1: A Letter to God (PragyaC1)
Hi guys!
This is my PPT for Chapter 1 of the Class 10 First Flight textbook. I have made sure to include all the topics necessary in the chapter. It will be easily understood by you. Like for more chapter presentations in English and Science.
This is a book review of Faces in the Water, written by Ranjit Lal. It's a nice book for children that deals with the topic of female infanticide.
The K-State Online Canvas LMS Data Portal and Five Years of Activated Third-P... (Shalin Hai-Jew)
The presenter will introduce the K-State LMS data portal, share some of the insights available from it, and focus on one particular facet of this big data: the third-party apps that K-State faculty, administrators, and staff have activated, and what that says about how we're using Canvas.
Canvas LMS data portal for the Kansas State University instance
A data dictionary: Version 1.16.2 (https://portal.inshosteddata.com/docs)
Data extraction and processing
What it can tell us: (un)available data and information
Activated third-party tools in K-State Online Canvas LMS instance
Some caveats
What this says about what K-Staters (early adopters) are using
Practical applications of this third-party app activation data
Adding value to LMS data portal data
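The extraction-and-tally step in the outline above can be sketched as follows; the table shape and column names are assumptions for illustration, and the real schema lives in the portal's data dictionary:

```python
# Hypothetical sketch: tallying third-party (LTI) tool activations from a
# flat-file extract. The column names and tool names are invented for
# illustration; consult the data dictionary for the actual schema.
import csv
import io
from collections import Counter

# Stand-in rows resembling an external-tool activation table.
activations = io.StringIO(
    "tool_name,account_id,workflow_state\n"
    "Piazza,1,active\n"
    "Piazza,2,active\n"
    "Zoom,1,active\n"
    "OldTool,1,deleted\n"
)

# Count only currently active activations, skipping deleted ones.
counts = Counter(
    row["tool_name"]
    for row in csv.DictReader(activations)
    if row["workflow_state"] == "active"
)
print(counts.most_common())  # most frequently activated tools first
```

A tally like this, run over the real extract, is the raw material for the "what K-Staters are using" discussion above.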
Education must capitalize on the trend within technology toward big data. New types of data are becoming available. From evidence approaches to xAPI and the whole Training and Learning Architecture (TLA), big data is the foundation of it all.
Question answering has been a well-researched NLP area over recent years. It has become necessary for users to be able to query the variety of information available, be it structured or unstructured. In this paper, we propose a question-answering module which a) can consume a variety of data formats through a heterogeneous data pipeline that ingests data from product manuals, technical data forums, internal discussion forums, groups, etc.; b) addresses practical challenges faced in real-life situations by pointing to the exact segment of the manual or chat thread that can resolve a user query; and c) provides segments of text when deemed relevant, based on the user query and business context. Our solution provides a comprehensive and detailed pipeline composed of elaborate data ingestion, data parsing, indexing, and querying modules. It is capable of handling a plethora of data sources such as text, images, tables, community forums, and flow charts. Our studies, performed on a variety of business-specific datasets, demonstrate the necessity of custom pipelines like the proposed one for solving real-world document question-answering tasks.
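The indexing and querying modules the abstract names can be illustrated with a minimal inverted-index retriever. This is a generic sketch of the technique, not the paper's actual system, and the manual segments below are invented:

```python
# Minimal sketch: an inverted index over text segments that returns the exact
# segment best matching a query, as in "pointing to the exact segment of the
# manual." A toy illustration of the general technique only.
from collections import defaultdict

segments = [
    "To reset the device, hold the power button for ten seconds.",
    "The warranty covers manufacturing defects for one year.",
    "Connect the unit to Wi-Fi through the settings menu.",
]

# Build the inverted index: token -> set of segment ids containing it.
index = defaultdict(set)
for seg_id, text in enumerate(segments):
    for token in text.lower().split():
        index[token.strip(".,")].add(seg_id)

def answer(query: str) -> str:
    """Return the segment sharing the most tokens with the query."""
    scores = defaultdict(int)
    for tok in query.lower().split():
        for seg_id in index.get(tok, ()):
            scores[seg_id] += 1
    best = max(scores, key=scores.get) if scores else None
    return segments[best] if best is not None else "No match found."

print(answer("how do I reset the power"))
```

A production pipeline would replace the token-overlap score with proper ranking (e.g., TF-IDF or dense retrieval), but the index-then-query shape is the same.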
Toward a System Building Agenda for Data Integration (and Dat.docx (juliennehar)
Toward a System Building Agenda for Data Integration
(and Data Science)
AnHai Doan, Pradap Konda, Paul Suganthan G.C., Adel Ardalan, Jeffrey R. Ballard, Sanjib Das,
Yash Govind, Han Li, Philip Martinkus, Sidharth Mudgal, Erik Paulson, Haojun Zhang
University of Wisconsin-Madison
Abstract
We argue that the data integration (DI) community should devote far more effort to building systems,
in order to truly advance the field. We discuss the limitations of current DI systems, and point out that
there is already an existing popular DI “system” out there, which is PyData, the open-source ecosystem
of 138,000+ interoperable Python packages. We argue that rather than building isolated monolithic DI
systems, we should consider extending this PyData “system”, by developing more Python packages that
solve DI problems for the users of PyData. We discuss how extending PyData enables us to pursue an
integrated agenda of research, system development, education, and outreach in DI, which in turn can
position our community to become a key player in data science. Finally, we discuss ongoing work at
Wisconsin, which suggests that this agenda is highly promising and raises many interesting challenges.
1 Introduction
In this paper we focus on data integration (DI), broadly interpreted as covering all major data preparation steps
such as data extraction, exploration, profiling, cleaning, matching, and merging [10]. This topic is also known
as data wrangling, munging, curation, unification, fusion, preparation, and more. Over the past few decades, DI
has received much attention (e.g., [37, 29, 31, 20, 34, 33, 6, 17, 39, 22, 23, 5, 8, 36, 15, 35, 4, 25, 38, 26, 32, 19,
2, 12, 11, 16, 2, 3]). Today, as data science grows, DI is receiving even more attention. This is because many
data science applications must first perform DI to combine the raw data from multiple sources, before analysis
can be carried out to extract insights.
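As a toy illustration of one such DI step, the sketch below matches records across two sources using only the standard library. Real PyData DI packages are far richer, and the source lists and threshold here are arbitrary choices for the example:

```python
# Illustrative sketch of one DI step -- matching records across two sources --
# using only the Python standard library. This only shows the shape of the
# entity-matching problem; real DI packages offer much more.
from difflib import SequenceMatcher

source_a = ["Kansas State University", "Univ. of Wisconsin-Madison"]
source_b = ["kansas state univ", "university of wisconsin madison", "MIT"]

def similarity(x: str, y: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()

# Greedy match: for each record in A, take the best-scoring record in B,
# keeping it only if it clears a (hand-picked) threshold.
matches = []
for a in source_a:
    best = max(source_b, key=lambda b: similarity(a, b))
    if similarity(a, best) > 0.6:
        matches.append((a, best))
print(matches)
```

The hard parts of DI, as the paper argues, are everything around this core: blocking to avoid all-pairs comparison, cleaning, and combining matched records for analysis.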
Yet despite all this attention, today we do not really know whether the field is making good progress. The
vast majority of DI works (with the exception of efforts such as Tamr and Trifacta [36, 15]) have focused on
developing algorithmic solutions. But we know very little about whether these (ever-more-complex) algorithms
are indeed useful in practice. The field has also built mostly isolated system prototypes, which are hard to use and
combine, and are often not powerful enough for real-world applications. This makes it difficult to decide what
to teach in DI classes. Teaching complex DI algorithms and asking students to do projects using our prototype
systems can train them well for doing DI research, but are not likely to train them well for solving real-world DI
problems in later jobs. Similarly, outreach to real users (e.g., domain scientists) is difficult. Given that we have
Copyright 0000 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for
advertising or promotional purpose ...
Modern Database Management 12th Global Edition by Hoffer solution manual.docx (ssuserf63bd7)
https://qidiantiku.com/solution-manual-for-modern-database-management-12th-global-edition-by-hoffer.shtml
name: Solution manual for Modern Database Management, 12th Global Edition, by Hoffer
edition: 12th Global Edition
author: Hoffer
ISBN: ISBN-10: 0133544613 / ISBN-13: 9780133544619
type: solution manual
format: Word/zip
All chapters included
Focusing on what leading database practitioners say are the most important aspects of database development, Modern Database Management presents sound pedagogy and topics that are critical for the practical success of database professionals. The 12th Edition further facilitates learning with illustrations that clarify important concepts and new media resources that make some of the more challenging material more engaging. Also included are general updates and expanded material in the areas undergoing rapid change due to improved managerial practices, database design tools and methodologies, and database technology.
IWMW 2002: The Value of Metadata and How to Realise It (IWMW)
Workshop session at IWMW 2002 on "The Value of Metadata and How to Realise It" facilitated by Dennis Nicholson.
See http://www.ukoln.ac.uk/web-focus/events/workshops/webmaster-2002/materials/nicholson/
The project is to ask college-related queries and get responses through a chatbot, an Artificial Conversational Entity. This system is a web application that provides answers to students' queries. Students just have to query through the chat bot; there is no specific format the user has to follow. This system helps students stay updated about college activities.
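A minimal sketch of the format-free matching idea might look like this; the questions and answers below are made up for illustration:

```python
# Toy sketch of a format-free FAQ chatbot: match a student's free-form
# message to a small FAQ by word overlap, so no fixed input format is
# required. The FAQ entries here are invented for illustration.
faq = {
    "when does the semester start": "Classes begin on August 21.",
    "how do i pay my fees": "Fees can be paid online through the student portal.",
    "where is the library": "The library is in the main campus building.",
}

def bot_reply(message: str) -> str:
    """Return the answer whose question shares the most words with the message."""
    words = {w.strip("?,.!") for w in message.lower().split()}
    best_q = max(faq, key=lambda q: len(words & set(q.split())))
    if not words & set(best_q.split()):
        return "Sorry, I don't know about that yet."
    return faq[best_q]

print(bot_reply("hey, where exactly is the library??"))
```

A real deployment would sit behind a web front end and use a stronger matcher, but the question-to-answer lookup is the core of the system described.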
Over recent years, big data, a huge amount of structured and unstructured data, has been generated from social networks. There is a need to extract valuable information from this social big data. Traditional analytic platforms need to be scaled up to analyze social big data in an efficient and timely manner. Sentiment analysis of social big data helps organizations by providing business insights based on public opinion. Sentiment analysis based on a multi-class classification scheme is oriented toward classifying text into more detailed sentiment labels. Multi-class classification with a single-tier architecture, where a single model is developed and trained on the entire labeled data, may increase the classification complexity. In this paper, a multi-tier sentiment analysis system on a big data analytics platform (MSABDP) is proposed to reduce multi-class classification complexity and efficiently analyze large-scale data sets. Hadoop is built for big data analytics; it is a good platform for managing large data at scale, and it can improve scalability and efficiency by adopting a distributed processing environment, since it is implemented using the MapReduce framework and the Hadoop Distributed File System (HDFS). The MSABDP is implemented by combining the SentiStrength lexicon and a learning-based classification scheme in a multi-tier architecture, run on a big data analytics platform to manage large data at scale. The proposed system collects a large amount of real Twitter data using Apache Flume, and this data was used for evaluation. The evaluation results show that the proposed multi-class classification system with a multi-tier architecture significantly improves classification accuracy over multi-class classification based on a single-tier architecture, by 7%.
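The lexicon-based tier of such a scheme can be sketched as follows. The tiny lexicon and label buckets below are invented for illustration; the real system combines a full SentiStrength lexicon with learned classifiers running on Hadoop:

```python
# Hedged sketch of lexicon-based multi-class sentiment scoring: words carry
# signed strengths, and the summed score is bucketed into one of several
# labels. The lexicon and buckets here are made up for illustration.
LEXICON = {"love": 3, "great": 2, "good": 1, "bad": -1, "awful": -2, "hate": -3}

# (low, high, label) score buckets, from most negative to most positive.
LABELS = [(-100, -2, "very negative"), (-1, -1, "negative"),
          (0, 0, "neutral"), (1, 1, "positive"), (2, 100, "very positive")]

def classify(tweet: str) -> str:
    """Sum word strengths, then map the total score to a sentiment label."""
    score = sum(LEXICON.get(w.strip("?,.!"), 0) for w in tweet.lower().split())
    for lo, hi, label in LABELS:
        if lo <= score <= hi:
            return label
    return "neutral"

print(classify("i love this great phone"))
print(classify("bad service"))
```

In a multi-tier design, a fast lexicon pass like this can route each text to a more detailed learned classifier, reducing the complexity of a single monolithic multi-class model.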
Module 3 SLP will introduce the basic concepts of computer network.docx (raju957290)
Module 3 SLP will introduce the basic concepts of computer networks. The IT infrastructure uses a mixture of computer hardware from different vendors. Large and complex databases that need central storage are found on mainframes or specialized servers, whereas smaller databases and parts of large databases are loaded on PCs and small servers. Client-server computing is often used to distribute more processing power to the desktop. The course materials take a look at the different types of networks that exist, with the primary focus on the LAN. The readings in computer networks continue with an introduction to the concept of layers, which is central to understanding how computer networks operate.
SLP Assignment Expectations
After reading the articles, please answer the following questions and prepare a PPT presentation with 10-12 slides, excluding cover slide and reference list slide.
What is the significance of telecommunications for organizations and society? What is a telecommunications system? What are the principal functions of all telecommunications systems? Briefly describe the company where these systems will be in place, and then explain your reasoning for its details.
Assignment Expectations
Your presentation will be evaluated on the following criteria:
Answers to the questions and the accompanying explanation must be given in 10-12 slides excluding cover and reference slides.
· Precision: You see what the module is all about and structure your paper accordingly. You draw on a range of sources and establish your understanding of the historical context of the question. You carry out the exercise as assigned or carefully explain the limitations that prevented you from completing some parts. (Running out of time isn’t generally considered an adequate limitation.)
· Clarity: Your answers are clear and show your good understanding of the topic. You see what the module is all about and structure your paper accordingly.
· Critical thinking: The paper incorporates your reactions, examples, and applications of the material to business and illustrates your reflective judgment and good understanding of the concepts. It is important to read the "Required Reading" in the Background material plus other sources you find relevant.
· Breadth and Depth: You provide informed commentary and analysis—simply repeating what your sources say does not constitute an adequate paper. The scope covered in your paper is directly related to the questions of the assignment and the learning outcomes of the module.
· Overall quality: You apply the professional language and terminology of systems design and analysis correctly and in context; you are familiar with this language and use it appropriately. Your paper is well written, and the references, where needed, are properly cited and listed (refer to the APA Purdue Online Writing Lab at https://owl.english.purdue.edu/owl/resource/560/01/) if you are uncertain about formats or other issues.
Long nonfiction chapters are not in style and may never have been. While average nonfiction book chapters run about 4,000 to 7,000 words, some are several times that upper bound. The usual explanation is that there is some irreducible complexity in what the chapter addresses that cannot be covered in shorter form. This slideshow explores some methods for writing longer chapters while still maintaining coherence, focus, and reader interest, and while using some technological tools to write and edit more efficiently.
Overcoming Reluctance to Pursuing Grant Funds in Academia (Shalin Hai-Jew)
Starting as an organization’s new grant writer can be a challenge, especially in a case where there has been a time lapse since the last one left. People get out of the habit of pursuing grant funds. This slideshow addresses some of the reasons for such reluctance and proposes some ways to mitigate these.
More Related Content
Similar to Leveraging Flat Files from the Canvas LMS Data Portal at K-State
Overcoming Reluctance to Pursuing Grant Funds in Academia (Shalin Hai-Jew)
Writing grants is one common way that those in institutions of higher education may acquire some funds—small and big, one-off and continuing—to conduct research, hire faculty and researchers and learners and others, update equipment, update or build up new buildings, and achieve other work. This slideshow explores some aspects of the work of grant writing in the present moment in higher education.
Contrasting My Beginner Folk Art vs. Machine Co-Created Folk Art with an Art-... (Shalin Hai-Jew)
The SARS-CoV-2 pandemic inspired several years of experimentation with common or folk art, involving mixed media, alcohol ink painting, and other explorations. Then, with the emergence of art-making generative AIs, there were further experiments, particularly with one that enables the generation of visuals from scanned art and photos, text prompts, style overlays, and text-based visual modifiers. While both types of artmaking are emotionally satisfying and helpful for stress management, there are some contrasting differences. This exploratory slideshow examines some of these differences to partially shed light on the informal usage of an art-making generative AI (artificial intelligence).
Creating Seeding Visuals to Prompt Art-Making Generative AIs (Shalin Hai-Jew)
Art-making generative AIs have come to the fore. A basic work pipeline typically involves starting with text prompts -> generated images. That image may be used to seed further iterations. Deep Dream Generator (DDG) enables the application of “modifiers” of various types (artist styles, visual adjectives, others) to be applied in addition to the text prompt.
Another approach involves beginning with a “seeding image,” a born-digital or digitized (born-analog) visual on which AI-generated art may be based for a multi-channel and multi-modal prompt. This slideshow provides some observations of how to think about seeding images, particularly in terms of how the DDG handles them, with its “algorithmic pareidolia” (“Deep Dream,” Wikipedia, July 3, 2023).
Human art-making is often about throwing open mass-scale conversations. Artists are thought to help bridge humanity into the future. Whether generative AI art enables this or not is still not clear.
Common Neophyte Academic Book Manuscript Reviewer MistakesShalin Hai-Jew
The work of academic book reviewing, as a volunteer (most often), is a common academic practice. The presenter served as a neophyte reviewer for some years before settling into this invited volunteer work for several decades. There have been lessons learned over time about avoidable mistakes…from both experience and observation.
Fashioning Text (and Image) Prompts for the CrAIyon Art-Making Generative AIShalin Hai-Jew
CrAIyon (formerly DALL-E mini, named after Salvador Dalí) is a web-facing art-making generative AI tool online (https://www.craiyon.com/) that enables the use of text (and image) prompts for the creation of watermarked, lightweight visuals. Counterintuitively, the rough visuals are much more usable for recombinations, remixes, and recreations into usable digital visuals for various digital learning objects. The textual prompts are not particularly intuitive because of how the generative AI program was trained on mass-scale visuals. There is an art and occasional indirection to working prompts after each try, with the resulting nine-image proof sheets that CrAIyon outputs. The tool can be used iteratively for different outputs.
The tool sometimes turns out serendipitous surprises, including an occasional work so refined that it can be used / shared almost unedited. One challenge in using CrAIyon comes from their request for credit (for all non-subscribers to their service). Another comes from the visual watermarking (orange crayon at the bottom right of the image). However, this tool is quite useful for practical applications if one is willing to engage deep digital image editing (Adobe Photoshop, Adobe Illustrator).
Augmented Reality in Multi-Dimensionality: Design for Space, Motion, Multiple...Shalin Hai-Jew
Augmented reality (AR)—the use of digital overlays over physical space—manifests in a wide range of spaces (indoor, outdoor; virtual) and ways (in real space (with unaided human vision); in head gear; in smart glasses; on mobile devices, and others). There are various authoring technologies that enable the making of AR experiences for various users. This work uses a particular tool (Adobe Aero®) to explore ways to build AR for multiple dimensions, including the fourth dimension (motion, changes over time).
Based on the respective purposes of the AR experience, some basic heuristics are captured for
space design (1),
motion design (2),
multiple perception design (sight, smell, taste, sound, touch) (3),
and virtual- and tangible- interactivity (4).
Some Ways to Conduct SoTL Research in Augmented Reality (AR) for Teaching and...Shalin Hai-Jew
One of the extant questions about augmented reality (AR) is how (in)effective it is for the teaching and learning in various formal, nonformal, and informal contexts. The research literature shows mixed findings, which are often highly context-based (and not generalizable). There are some non-trivial costs to the design/development/deployment of AR for teaching and learning. For the users, there is cognitive load on the working memory [(1) extraneous/poor design, (2) intrinsic/inherent difficulty in topic, and (3) germane/forming schemas]. For teachers, there are additional knowledge, skills, and abilities / attitudes (KSAs) that need to be brought to bear.
Exploring the Deep Dream Generator (an Art-Making Generative AI) Shalin Hai-Jew
The Deep Dream Generator was created by Google engineer Alexander Mordvintsev in 2014. It has a public facing instance at https://deepdreamgenerator.com/, which enables people to use text prompts and image prompts (individually or in combination) to inspire the art-generating generative AI to output images. This work highlights some process-based walk-throughs of the tool, some practical uses, some lightweight art learning, some aspects of the online social community on this platform, and other insights. Some works by the AI prompted by the presenter may be seen here: https://deepdreamgenerator.com/u/sjjalinn.
(This is the first draft of a slideshow that will be used in a conference later in the year.)
Augmented Reality for Learning and AccessibilityShalin Hai-Jew
Recently, the presenter conducted a systematic review of the academic literature and an environmental scan to learn how to set up an augmented reality (AR) shop at an institution of higher education. The ambition was to not only set up AR in an accessible and legal way but also be able to test for potential +/- effects of AR on teaching and learning. The research did not go past the review stage, because of a lack of funding, but some insights about accessibility in AR were acquired.
(The visuals are from Deep Dream Generator and CrAIyon.)
Engaging Pixabay as an open-source contributor to hone digital image editing,...Shalin Hai-Jew
This slideshow describes the author's early experiences with creating two accounts on Pixabay in order to advance digital editing skills in multimedia. The two accounts are located at https://pixabay.com/users/sjjalinn-28605710/ and https://pixabay.com/users/wavegenerics-29440244/ ...
This work explores four main spaces where researchers publish about educational technology: academic-commercial, open-access, open-source, and self-publishing.
Human-Machine Collaboration: Using art-making AI (CrAIyon) as cited work, o...Shalin Hai-Jew
It is early days for generative art AIs. What are some ways to use these to complement one's work while staying legal (legal-ish)?
Correction: .webp is a raster format
Getting Started with Augmented Reality (AR) in Online Teaching and Learning i...Shalin Hai-Jew
University creative shops are exploring whether they can get into the game of producing AR-enhanced experiences: campus tours, interactive gaming, virtual laboratories, exploratory art spaces, simulations, design labs, online / offline / blended teaching and learning modules, and other AR applications.
This work offers a basic environmental scan of the AR space for online teaching and learning, and it includes pedagogical design leads from the current research, technological knowhow, hands-on design / development / deployment of learning objects, and online teaching and learning methods.
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfEnterprise Wired
In this guide, we'll explore the key considerations and features to look for when choosing a Trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
Adjusting OpenMP PageRank : SHORT REPORT / NOTESSubhajit Sahu
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take
advantage of a shared memory system with multiple CPUs, each with multiple cores, to
accelerate pagerank computation. If the NUMA architecture of the system is properly taken
into account with good vertex partitioning, the speedup can be significant. To take steps in
this direction, experiments are conducted to implement pagerank in OpenMP using two
different approaches, uniform and hybrid. The uniform approach runs all primitives required
for pagerank in OpenMP mode (with multiple threads). On the other hand, the hybrid
approach runs certain primitives in sequential mode (i.e., sumAt, multiply).
Techniques to optimize the pagerank algorithm usually fall in two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before pagerank computation to improve performance. Final ranks of chain nodes can be easily calculated. This could reduce both the iteration time, and the number of iterations. If a graph has no dangling nodes, pagerank of each strongly connected component can be computed in topological order. This could help reduce the iteration time, no. of iterations, and also enable multi-iteration concurrency in pagerank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfGetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source ) Copilot?
How can we build one?
Architecture and evaluation
The Building Blocks of QuestDB, a Time Series Databasejavier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
Learn SQL from basic queries to Advance queriesmanishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms, like PageRank, commonly operate on Compressed Sparse Row (CSR), an adjacency-list-based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Analysis insight about a Flyball dog competition team's performanceroli9797
Insight of my analysis about a Flyball dog competition team's last year performance. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
Leveraging Flat Files from the Canvas LMS Data Portal at K-State
1. Leveraging “Flat Files”
from the Canvas LMS
Data Portal (at K-State)
SIDLIT 2017 | LMS Preconference
Colleague 2 Colleague
August 2, 2017
2. Presentation
A lot of data are created in an LMS instance, and much of this can be
analyzed for insight. In 2016, Instructure, the makers of Canvas, made their
LMS data available to their customers through a data portal (updated
monthly). This portal enables access to a number of flat files related to that
particular instance. This presentation showcases how this big data was
analyzed on a regular laptop with basic office software, to summarize
Kansas State University’s use of the LMS. Methods for analysis include the
following: basic descriptive statistics, survival analysis, computational
linguistic analysis, and others.
2
3. Presentation (cont.)
The results are reported out with both numbers and data visualizations,
including classic pie charts, line graphs, bar charts, mixed-charts, word
clouds, and others. The findings provide some insights about how to
approach the data, how to use a data dictionary, and other methods for
extracting the data for awareness and practical decision-making. This
work also is suggestive of next steps for more advanced analysis (using the
flat files in a SQL database).
More information about this experience may be accessed on SlideShare
through an article download titled “Wrangling Big Data in a Small Tech
Ecosystem” at http://www.slideshare.net/ShalinHaiJew/wrangling-big-data-
in-a-small-tech-ecosystem (orig. from Oct. 2016). The original article
“Wrangling Big Data in a Small Tech Ecosystem” is from C2C Digital
Magazine.
3
4. Presentation Order
Canvas LMS at Kansas State University (K-State)
Canvas LMS Data Portal and Flat Files
The Summary Data
Some Practical Applications
Moving Forward with the Data
4
5. General Approach
Framework
Approaches
An instructional design approach
What can enhance teaching and
learning?
A researcher approach
What can enhance accurate
data collection, usage, researcher
awareness, and decision-making?
Using all data (every part!)
Using all basic software tools
available on a regular machine
Data Clients on a
Campus
Faculty
Staff
System Administrators
Leaders
Students
Analysts
5
7. LMS History at K-State
Homegrown Learning Management System (LMS) (Axio Learning)
Informed by faculty, admin, and staff needs (IT Help Desk tickets, focus groups
with faculty and staff)
Software updates rolled out annually with some patches in-between
Built mostly by K-State graduates and professional developers (often hired from
student ranks)
Instructure’s Canvas LMS at K-State (2013 – present)
Availability of the data portal in 2016
Monthly updates of select data from the particular instance
Accessed at K-State in October 2016
7
8. An Early Brainstorm
Brainstorm beneficial questions (data queries) before exploring the data, so
you’re not limited by the found data, and keep these in mind even after
the initial data exploration. It is important to conceptualize what may be
practically helpful through the informed imagination first.
It would be helpful to continue with the brainstorming as the data are
explored.
8
9. Initial Brainstormed Questions
What can be reported out at various levels: university, college,
department, course, and individual?
Is it possible to make observations about course design? Learner
engagement (Discussions? Conversations?)? Advising? Technology usage
(such as external tools)? Uses of the LMS site for non-course applications?
What sorts of manual-created courses exist, and how are these used?
What percentage of the courses are these manual types of courses?
9
10. Initial Brainstormed Questions (cont.)
How closely is it possible to map the data of a learner’s trajectory? A
group’s trajectory?
What are some attributes to use to identify various groups? Which attributes
would be helpful? What sorts of group-specific questions may be asked?
For example, is it possible to identify high-performing groups vs. low-performing
groups in order to run analytics to see what differences there may be between
the two?
What may be understood about the learning going on in a particular
course? A learning sequence?
Are there ways to understand effective support for learners and support for
learning from this data?
10
11. Required Preliminary Understandings
Need to understand the front-end view of the LMS and its general uses on
campus; otherwise, the back-end data view will be looking through a mirror
darkly
Need to understand what terms are applied to the various types of data
(because you want to be on the same page with the creators and users of the
LMS)
Need to have experience with the various analytical technologies applied to
the particular data because various queries require different data processing
and data structures
Will be applying the following: descriptive statistics, inferential statistics, direct
data queries, linguistic analysis, survival analysis, sentiment analysis, topic
modeling, and others
Will ultimately be applying more complex machine learning as well
11
12. Required Preliminary Understandings
(cont.)
Need understandings of “states” of being for various objects in an LMS
Need ability to identify anomalies and the skills to interpret what these
might mean
Need to know what data mean and where to dig deeper for more relevant
information
Need to know where noise might enter a particular dataset or an analytical
process…and to head off the introduction of or inclusion of noise
12
14. Canvas Data Portal
Data updated once a month (at the time; now updated daily)
Live dynamic data may be accessed via a higher level of service
Flat files (in compressed .gz format for download with 7Zip) downloaded
from SQL servers
Also known as table data (albeit without defined structural relationships between
records and therefore “flat”)
May contain labeled data like numbers
May contain unstructured or semi-structured data like texts, names, messages, and
others
Contain content data (messaging), trace data (interaction data), and some
metadata (data about data, often riding on imagery and multimedia)
Data described in a formal data dictionary
14
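A minimal sketch of reading one of these flat files on a regular laptop, using only the Python standard library. The flat files are gzip-compressed, tab-delimited tables without header rows, so the column names must be supplied from the data dictionary; the field names passed in here are placeholders, not the actual schema.

```python
import csv
import gzip

def load_flat_file(path, fieldnames):
    """Read a gzipped, tab-delimited flat file (no header row) into dicts.

    `fieldnames` supplies the column names in the order given by the
    data dictionary, since the export itself carries no header.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as fh:
        reader = csv.DictReader(fh, fieldnames=fieldnames, delimiter="\t")
        return list(reader)
```

From here, the rows can be summarized in Python itself or exported for basic office software.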
15. “Flat Files” Strengths and Weaknesses
Strengths
Manageable on a small-scale
laptop
Can ask questions across several
flat files
Weaknesses
Lack relational data between the
various flat files
Cannot query data effectively
across the various data tables
(because the relationships are not
defined)
Lack access to identifier column
Lack access to the foreign key
15
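One way around these weaknesses, as the deck suggests for next steps, is to re-animate the flat files in a SQL database where the relationships can be declared. A minimal sketch with SQLite; the table names, columns, and rows below are hypothetical miniatures, with only the assignment_id foreign key echoing the data dictionary example in this deck.

```python
import sqlite3

# Hypothetical miniature versions of two flat files, related by assignment_id.
assignments = [(1, "Essay 1"), (2, "Quiz 3")]
overrides = [(10, 1, "2017-03-01"), (11, 1, "2017-03-08")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assignment (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute(
    "CREATE TABLE assignment_override ("
    " id INTEGER PRIMARY KEY,"
    " assignment_id INTEGER REFERENCES assignment(id),"
    " due_at TEXT)"
)
conn.executemany("INSERT INTO assignment VALUES (?, ?)", assignments)
conn.executemany("INSERT INTO assignment_override VALUES (?, ?, ?)", overrides)

# With the relationship declared, a cross-table question becomes one query.
rows = conn.execute(
    "SELECT a.title, COUNT(o.id) FROM assignment a"
    " LEFT JOIN assignment_override o ON o.assignment_id = a.id"
    " GROUP BY a.id ORDER BY a.id"
).fetchall()
```

The same join is effectively impossible to ask of the raw flat files alone, since the relationships between tables are not defined there.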
16. Data Dictionary
A reference resource that describes particular data
Documentation of data captured in the Canvas Data warehouse
Helpful for understanding naming protocols of the various data types
The following is a verbatim example:
16
Name: assignment_id
Type: bigint (big integer)
Description: Foreign key to the assignment the override is associated with. May be empty.
23. Order: First Data Visualizations and
Then Light Text Commentary
The data visualizations come first…so that the audience may analyze the
data to see what it says
The summary analyses come directly after the visualization, so there is a
kind of debriefing
23
25. Purposeful Blur and Block
Need to know how to protect against data leakage
Never share the underlying dataset
Never share unique identifiers
Always double check screen grabs against accidental inclusion of personally
identifiable data (PII); use effective redaction if PII is viewable
When redacting, make sure that the redaction cannot be reversed (backwards iterated
or some other strategy) and a person re-identified
Check that no metadata is riding with multimedia being released
Any personally identifiable information (PII) is obfuscated here
No granular level of data was captured in the article
25
27. Workflow
1. Conceptualizing questions and applications of the data
2. Review of the dataset information
3. Data download
4. Data extraction
5. Data processing (cleaning) and analytics
6. Validating / invalidating the findings
7. Additional data analytics
8. Write-up for presentation
9. Data and informational materials archival
27
37. Date Restriction Accesses for Course
Sections
Non-defined (default) as the majority
Restricted section access (by learner name) to defined dates
Non-restricted (all participants in the course welcome) section access to
defined dates
37
44. Time Features for Assignments
Half of assignments with no time allotment
Other half with time features
Due_at, no unlock_at, no lock_at
Due_at, lock_at, unlock_at (all three)
44
48. Some Linguistic Features of the
Assignment Titles and Descriptions
Analytic: 91.69
“Formal, logical, and hierarchical thinking” vs. “more informal, personal, here-and-
now, and narrative thinking”
Clout: 73.25
“perspective of high expertise” and confidence vs. “more tentative, humble, even
anxious style”
Authentic: 11.83
“more honest, personal, and disclosing text” vs. “a more guarded, distanced form of
discourse”
Tone: 64.98
“a more positive, upbeat style” vs. “greater anxiety, sadness, or hostility” (emotional
tone) (“Linguistic Inquiry and Word Count: LIWC2015 Operator’s Manual,” 2015, p. 22)
48
50. Delving into Topics of Interest
Identifying words (names, formulas, dates, symbols, etc.)-of-interest
Using NVivo 11 Plus to create word trees with the target term as the seeding
topic
Ability to double-click on the respective branches to link back to the
original source data files
50
56. Survival Function of Assignments to
Update
How long does it take before an assignment is updated?
At what point does an assignment seem to be “safe” against update?
What are some ways to understand assignments that are updated some
1,000 days after the date of creation?
Is it possible that some assignments were transferred over from a prior LMS
through an LTI-enabled process that might have captured the very first moment
of creation for that assignment? (“LTI” refers to the Learning Tools Interoperability
standard created by the IMS Global Learning Consortium.)
56
62. A Survey of Quiz Types
Assignment
Practice quiz
Graded survey
Survey
Affordances of the various quiz types change over time, so it is important to
update on the various functions and capabilities even as one is looking at
the data.
62
66. Quiz Question Workflow States
unpublished (default)
published
deleted
So a majority of quiz questions are created / drafted but held in reserve
and not published.
What are some possible inferences that can be made from the instance-
scale statistics and numbers?
66
68. An Inclusive Scatterplot of Quiz Point
Values
min-max range: 0 – 23,700 points per quiz
average quiz value: 33 points (w/o zeroes averaged in) and 28 points (with
zeroes averaged in)
The 23,700 occurred twice, which suggests that it might be purposeful. That
huge number, though, pulls the curve, and in a normal research context,
such an outlier would likely be omitted to erase its pull on the curve, which
would result in skew. A zoom-in would require going to the particular
instructor and course. That might require a different approach to the data
than described in this work…such as re-animating all the flat files in a SQL
database and using unique identifiers to connect related data.
68
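The with-zeroes vs. without-zeroes averages, and the pull of an extreme value like the 23,700-point quiz, can be illustrated with a small sketch. The point values below are invented for illustration, not the K-State data.

```python
from statistics import mean, stdev

quiz_points = [0, 0, 10, 20, 25, 30, 50, 100, 23700]  # hypothetical values

mean_with_zeroes = mean(quiz_points)
mean_without_zeroes = mean(p for p in quiz_points if p > 0)

# Flag extreme values. The single huge quiz inflates the spread, so a modest
# z-score threshold is used to isolate it.
mu, sigma = mean(quiz_points), stdev(quiz_points)
outliers = [p for p in quiz_points if abs(p - mu) > 2 * sigma]
```

Dropping the flagged value before charting avoids the skew noted above, though (as the slide cautions) a repeated extreme value may be purposeful rather than an error.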
70. Histogram of Quiz Point Values in LMS
Instance (with a normal curve)
Frequency of point values for quizzes
Tendencies
Most at the lower number values
70
72. Survival Curve of Deleted Quizzes in
LMS Instance
Based on timestamp data, how long does it take for a deleted quiz to
achieve “event” or be deleted (from its moment of creation)?
In this dataset, 22% of quizzes were deleted (14,769/66,366).
The min-max day range for the quiz deletions ranged from 0 - 813 days.
A survival analysis showed that the estimated survival time of quizzes that
were deleted was 23.6 days, with a lower bound of 22.7 and an upper
bound of 24.4 in the 95% confidence interval; the standard error was .419.
The median survival time--of the deleted quizzes--was a low 2 days, which
means if a quiz is to be deleted, it usually happens fairly early.
The drop-off in the curve below is steep but tapers off after several
months.
72
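Because every quiz in this subset reached the event (deletion), the median survival time reduces to a plain median of the creation-to-deletion durations, which can be sketched from timestamp pairs. The dates below are invented for illustration.

```python
from datetime import datetime
from statistics import median

# Hypothetical (created_at, deleted_at) pairs for quizzes that were deleted.
events = [
    ("2016-01-10", "2016-01-10"),
    ("2016-01-10", "2016-01-11"),
    ("2016-02-01", "2016-02-03"),
    ("2016-02-01", "2016-02-04"),
    ("2016-03-05", "2016-09-15"),
]

def days_to_event(created, deleted):
    """Days from creation to deletion, parsed from ISO-style date strings."""
    fmt = "%Y-%m-%d"
    return (datetime.strptime(deleted, fmt) - datetime.strptime(created, fmt)).days

durations = sorted(days_to_event(c, d) for c, d in events)
median_days = median(durations)  # most deletions happen early; one straggler
```

A full Kaplan-Meier estimator would only be needed if some quizzes were censored (still alive at the end of observation), which is not the case for this set.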
74. One Minus Survival Function Curve for
Deleted Quizzes in the LMS Instance
Shows how long a quiz survives before it is deleted from a set of quizzes that
were ultimately deleted
74
76. Hazard Function for Deleted Quizzes in
the LMS Instance
All quizzes in the set were ultimately deleted
This line graph shows time-to-event of when quizzes were deleted from their
respective creation-dates in the LMS instance.
The hazard function curve sometimes shows particular time-patterns of
when a quiz is most at risk of deletion…but this curve only generally shows a
steep rise initially and then a gradual achievement of time-to-event.
76
90. Submission Comment Participation
Type
Admin
Submitter
Author
So all administrators comment on learner submissions, but not all authors or
submitters do. In other words, the creator of contents may submit the file
without comment.
90
93. Uploads and Revisions of Files to the
LMS Instance by Year
A sense of the university’s transition to the LMS, over multiple years (so
caution)
93
101. Wikis and Wiki Pages
A “wiki” in Canvas is a page with its history captured and able to be
reinstituted (enabled by wiki software)
Pages may be interconnected
A page may be set as the home page
A page may be embedded in a modular sequence
A page may contain a MediaSite video
A page may contain any number of contents: imagery, iframes, videos,
and other contents
101
103. Parent Types for Wiki Pages in the LMS
Instance
Course
Group
In other words, the administrators (instructors) of courses are the ones who
create a majority of the pages. The learners in groups create fewer of the
wiki pages.
Note that the sense of a “wiki” page is different here.
103
105. Wiki Page Workflow
Null (default)
Active
Unpublished
Deleted
This needs more insight, but the data dictionary does not explain the
different states and what they mean. For example, is a “null” wiki page
published? Is an “active” wiki page something that is included in a
sequence? Is a “deleted” wiki page recoverable or not?
105
109. About Enrollment Role Types
Role Name Basic Role Type
Librarian TAEnrollment
StudentEnrollment StudentEnrollment
TeacherEnrollment TeacherEnrollment
TAEnrollment TAEnrollment
DesignerEnrollment DesignerEnrollment
ObserverEnrollment ObserverEnrollment
Grader TAEnrollment
GradeObserver TAEnrollment
109
116. Request Types in the LMS Instance
GET (Read)
POST (Create)
PUT (Create / Replace)
HEAD (Retrieve headers only)
DELETE (Remove)
PATCH (Update, Modify)
116
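Tallying how often each request type appears in a requests flat file is a simple frequency count; the sample rows below are hypothetical, not the actual instance data.

```python
from collections import Counter

# Hypothetical HTTP methods pulled from a requests flat file, one per row.
methods = ["GET", "GET", "POST", "GET", "PUT", "DELETE", "GET", "PATCH"]

tallies = Counter(methods)
for method, count in tallies.most_common():
    print(method, count)  # GET leads in this sample, i.e., reads dominate
```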
126. User “Workflow” States in the LMS
Instance
registered
pre_registered
deleted
creation_pending
The “creation_pending” may well refer to a process of approval for people
to have access—for a level of security.
126
128. Years of Origination of User Accounts
Initial exploration in 2013
Big push in 2014
New accounts in 2015 and 2016 indicating not only students but also
employment churn and stragglers slow to change to a new LMS
128
130. Retired Accounts = Registered False
2013 – early May 2017
Word frequency count from unigrams (so no full names represented as
such)
First names more common and so better represented
One number removed in the “stopwords” list
130
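The unigram frequency count with a stopwords list (including one stopped-out number, as noted above) can be sketched as follows; the stopword set and sample text are placeholders, not the actual account data.

```python
import re
from collections import Counter

# Hypothetical stopword list; note one number is stopped out, as on the slide.
STOPWORDS = {"the", "and", "a", "of", "to", "2017"}

def unigram_frequencies(text):
    """Lowercase, tokenize into unigrams, and count, dropping stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

freqs = unigram_frequencies("Anna joined the 2017 cohort and Anna later left")
```

Because only unigrams are counted, full names never appear as units, and common first names surface near the top of the tally, as the slide observes.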
132. Pseudonyms
Pseudonyms = “logins associated with users”
Seems to be the connection between the LMS and various university
information systems
Seems like partial data (extracted in May 2017)
132
140. Conversations with Media Objects
Included
False
True
So when people use the email system inside Canvas, they do not generally
attach media objects (like digital imagery, slideshows, audio, video, or
other digital files).
140
142. Conversations w/ or without
Attachments
A majority of conversations are without attachments
A minority of conversations are with attachments
142
146. Conversation Messages Word
Frequency Count
482,339 conversation messages
Texts with 60,509,894 words
2/3 analyzed for textual contents (because of data size)
146
148. Mass Conversation Message Contents
Analytic: 82.33
“Formal, logical, and hierarchical thinking” vs. “more informal, personal, here-and-
now, and narrative thinking”
Clout: 80.21
“perspective of high expertise” and confidence vs. “more tentative, humble, even
anxious style”
Authentic: 26.41
“more honest, personal, and disclosing text” vs. “a more guarded, distanced form of
discourse”
Tone: 66.24
“a more positive, upbeat style” vs. “greater anxiety, sadness, or hostility” (emotional
tone) (“Linguistic Inquiry and Word Count: LIWC2015 Operator’s Manual,” 2015, p. 22)
148
150. Messaging about “Human Drives” in
the Mass Conversation Messages
Affiliation (2.35)
Power (2.19)
Achievement (1.46)
Reward (1.3)
Risk (0.37)
“The focus on affiliation and social identity seems reasonable, given the
typical college age of learners. The "power" language may come from
faculty speaking from positions of authority. The low level of focus on risk is
intriguing here (maybe young learners are not thought to have developed
the efficacy and confidence to take on uncontrolled risks?). Clearly, there
is a role for theorizing and interpretation, even with computation-based
analytics.”
152. Sentiment Analysis of Sample of
Conversation Messaging
A smaller sample of the conversation messages was analyzed for
sentiment. This set consisted of 72,377 messages.
The automated observations of sentiment showed two tendencies: messages
were categorized as either very positive or moderately negative.
In this software tool, it is possible to explore which texts were assigned to
which sentiment categories (very negative, moderately negative,
moderately positive, or very positive) in the comparisons between the
target text and the built-in sentiment dictionary.
In other words, actual exploration of the content is possible through both
machine reading and human close reading.
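The sentiment work here was done with a commercial tool's built-in dictionary. Purely as an illustration of how dictionary-based binning into the four categories named above works, here is a minimal Python sketch; the lexicon and cutoffs are invented for the example.

```python
# Invented toy lexicon; a real sentiment dictionary has thousands of
# weighted entries.
LEXICON = {"great": 2, "thanks": 1, "helpful": 1,
           "late": -1, "confusing": -1, "terrible": -2}

def sentiment_bin(message):
    """Score a message by summing lexicon weights, then bin it into one of
    the four sentiment categories. Messages with no lexicon hits are left
    unclassified (None). Cutoffs here are arbitrary illustrations."""
    score = sum(LEXICON.get(w, 0) for w in message.lower().split())
    if score == 0:
        return None  # no lexicon hits; unclassified
    if score <= -2:
        return "very negative"
    if score < 0:
        return "moderately negative"
    if score >= 2:
        return "very positive"
    return "moderately positive"
```

Keeping the scored messages linked back to their source text is what makes the combined machine reading / human close reading described above possible.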
154. Auto-Extracted Theme Based Hierarchy
Chart of Conversation Messaging Sample
(as a Treemap)
Class
Assignment
Time
Paper
Questions
Exam
Online
Group, etc.
156. Auto-extracted Themes from
Conversation Messaging Sample
The themes are in alphabetical order
They are listed in a human-readable way, going clockwise around the pie
chart
158. Auto-Coded Theme-Based Hierarchy Chart of
Topics and Subtopics from Conversation
Messaging Sample (as a Sunburst Diagram)
This sunburst diagram is somewhat interactive in the software
Double-clicking a topic digs down into its subtopic contents
If a sliver is too thin, hovering the mouse reveals the subtopic along with
the statistics and quantitative data available for viewing
160. Contexts of “Help” in a Word Tree
It is possible to analyze the various contexts in which “help” was used in the
conversation messaging in the prior word tree
In the software (NVivo 11 Plus), the word tree is interactive and is linked to
the original sources where the word appears, so it is possible to achieve
close reading of every use of “help” from the underlying dataset
The challenge is engaging a full dataset of millions of words
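The word tree itself is an NVivo feature, but the underlying keyword-in-context (KWIC) idea is easy to sketch. This minimal Python version (function name invented) collects each use of a word with its left and right context, which is the raw material for close reading.

```python
import re

def kwic(messages, keyword, window=3):
    """Keyword-in-context: return (left, keyword, right) windows so each
    use of the keyword can be close-read against its context.
    Assumes the keyword is given in lowercase."""
    hits = []
    for msg in messages:
        tokens = re.findall(r"\w+", msg.lower())
        for i, tok in enumerate(tokens):
            if tok == keyword:
                hits.append((" ".join(tokens[max(0, i - window):i]),
                             tok,
                             " ".join(tokens[i + 1:i + 1 + window])))
    return hits

hits = kwic(["Can you help me with the quiz",
             "Thanks for the help yesterday"], "help")
```

Grouping the right-hand contexts is essentially how a word tree arranges its branches; the scaling challenge noted above comes from doing this over millions of words.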
169. External Tool Activations in 2014
There is an increase in both the variety and number of external tool activations
No deeper analysis was applied, but it could be, to explore the external tool
types and the changing sense of needs
171. External Tool Activations in 2015
There is an increase in both the variety and number of external tool activations
No deeper analysis was applied, but it could be, to explore the external tool
types and the changing sense of needs
173. External Tool Activations in 2016
There is an increase in both the variety and number of external tool activations
No deeper analysis was applied, but it could be, to explore the external tool
types and the changing sense of needs
176. Course User Interface Navigation Item
State
Visible
Hidden
This refers to users’ ability to keep the pre-set functions in a course shell’s
left navigation active or to place them in a “hidden” state.
There are “hidden” navigation element presets as well, which users may
choose to activate.
179. Delimiting the Analytics from the LMS
Data Portal Data
The concept behind delimiting is to make conclusions more accurate by
representing how confident one may be about the results.
As noted, challenges and noise may enter the data at any step of the
workflow, and there are also inherent limits to the various data analytics
types, as shown in the visualization on the prior slide.
181. Some Practical Applications
Self-awareness (holding up a mirror to the campus for its use of its LMS)
Analytics
To improve usage of the LMS
To know what functions and features are desirable
To support learner usage
To support teaching and learning
To support non-teaching and learning approaches to the data
Decision-making
Instructional design
Administrative awareness, decision-making, funding, and others
183. What are Ways to Go Beyond?
Other Analytical Methods
Reconnecting the flat files as relational tables in a SQL server
Designing specific cross-file queries for data analytics
Applying more and varied computational text analysis
Engaging machine learning for patterns (such as decision trees for
predicting classifications based on available information)
Bringing in More Data
Comparing macro-level data with other instances of the Canvas LMS (such
as at comparable institutions of higher education)
Using additional data to enable close-in reads (without compromising
people’s privacy)
Keeping confidential information confidential
185. Assessing the Initial Haul of Biggish Data
Formulating askable questions
Analyzing the columnar data (and variables)
Understanding where the data comes from and how it is processed by Instructure
Analyzing the date data
Analyzing the textual data
Understanding ways to mix data in various datasets for enriched querying
Conceptualizing mixes of questions and potential findings based on the
available data
186. Assessing the Initial Haul of Biggish Data
(cont.)
Understanding the types of software that may be used to engage the data
Software enables cross-sectional base rate counts from flat files
Software enables cross-tabulation analysis and assessments of statistical
significance (rarity of patterns)
Software enables finding patterns through machine learning (like applying
decision trees to see what variables help determine classifications)
Software enables the identification of text-based patterns
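The decision-tree idea mentioned above rests on choosing split variables by information gain. A minimal pure-Python sketch follows; the toy records and field names are invented, and a real learner (and the actual software used for the deck) would do far more.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """How much knowing `feature` reduces label entropy: the quantity a
    decision-tree learner maximizes when choosing which variable to
    split on."""
    gain = entropy(labels)
    n = len(labels)
    for v in set(r[feature] for r in rows):
        subset = [lab for r, lab in zip(rows, labels) if r[feature] == v]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

# Invented toy records: does a user's role predict account retirement?
rows = [{"role": "student"}, {"role": "student"},
        {"role": "staff"}, {"role": "staff"}]
labels = ["kept", "kept", "retired", "retired"]
gain = information_gain(rows, labels, "role")
```

Here "role" separates the labels perfectly, so its gain equals the full label entropy; ranking variables by this number is what "seeing which variables help determine classifications" amounts to.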
187. Some Early Lessons Learned
Data visualizations show only summary data, and it’s important to get to the
actual underlying data to understand some dynamics.
It helps to theorize or hypothesize broadly to understand what may be
going on with the observed empirical data.
It is always wise to “sanity check” data extractions and data processing to
see what is going on.
It is important to understand the LMS data portal’s default settings and the
rationales behind those defaults to make sure that they make sense for the
particular context.
188. Some Early Lessons Learned (cont.)
Avoid double-counting for complex data with similar lead-in terms.
Watch out for typing errors.
Do not ignore error messages; figure out why they’re happening and deal
with the issues.
Slow down the process, so you’re certain of what is happening at every
step. Be careful not to lose data.
Be careful about moving to Excel, which has a limit of 1,048,576 rows per
worksheet. Be careful also of OS clipboards, which have limits of about
65,000 records. Do not let such limits stall the work and result in lost data.
Go to MS Access first, or to SQL Server.
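One way to stay under Excel's per-sheet row limit without silently losing data is to split a large flat file before opening it. A stdlib Python sketch (the helper name is invented):

```python
import csv

EXCEL_ROW_LIMIT = 1_048_576  # Excel's hard per-sheet limit, header included

def split_csv(path, rows_per_file=EXCEL_ROW_LIMIT - 1):
    """Split a large flat file into Excel-safe pieces, repeating the header
    in each piece, so no rows are dropped on open. Returns the part paths."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        part, count, out, writer, paths = 0, 0, None, None, []
        for row in reader:
            if count % rows_per_file == 0:  # start a new part file
                if out:
                    out.close()
                part += 1
                p = f"{path}.part{part}.csv"
                paths.append(p)
                out = open(p, "w", newline="", encoding="utf-8")
                writer = csv.writer(out)
                writer.writerow(header)
            writer.writerow(row)
            count += 1
        if out:
            out.close()
    return paths
```

Repeating the header in every part keeps each piece self-describing when it is later re-imported into Access or SQL Server.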
189. Some Early Lessons Learned (cont.)
Use the LMS data portal “data dictionary” for the LMS data, but realize that
it may be dated, incomplete, or inaccurate. A particular LMS instance has its
own specifics, so a general dictionary offers a general view, not a
specific one. Use the data dictionary in an attentive way.
Realize that there are nuances in the data that may not be apparent initially.
With computational text analysis, oftentimes, foreign languages will get
short shrift. There may be effective ways to address this.
With any sort of automation, there will be trade-offs. It is important to check
findings against the data and conduct data queries on multiple software
tools.
190. Some Early Lessons Learned (cont.)
Data is messy. It is totally possible (even probable) for a process to appear
to go smoothly when something has glitched in a data download.
Sometimes, no matter what, it is not possible to import the data for processing
into either Microsoft Access or SQL Server. In that case, there may need to be
a data “substitution” by extracting the “same-ish” set from the LMS data portal
(days after the first set was extracted).
The assumption is that new data is appended to the end of the existing data,
so if the file is the proper one, a “later” version should still be accurate.
Depending on the data handling, though, that assumption may not hold. It is
important to check.
191. Some Early Lessons Learned (cont.)
Don’t just go with how software is designed. For example, with a word
frequency count, don’t just go with the high counts, but analyze the “long
tail” of the low counts.
The “power law” does often apply to word counts in language. The long tail
shows something of outlier data in terms of single mentions (but you have to slog
through misspellings, strange alphanumeric strings, and other noise first).
There are certain data visualizations that work better for certain types of
data.
All data visualizations should be sufficiently labeled.
It helps to calculate not only raw numbers but percentages, where possible.
192. Some Early Lessons Learned (cont.)
Data portals contain personally identifiable information (PII), so extra care
has to be taken to ensure that people’s private information is not misused
or leaked.
What is knowable depends on what other datasets one has access to and
how one sets up the analyses…
It helps to know what is possible to know from the data (full universe)
It helps to know what is politically viable to ask and capture (subset) (people
may ask for the moon)
It helps to use resources wisely to pursue asks that create constructive awareness
and good decision-making (sub-subset)
Recording steps is important (in notes and in macros)…so everything can
be repeated as needed.
193. To a Relational Database
Flat files are downloaded as compressed .gz files and opened with 7-Zip as
.csv files.
Microsoft offers SQL Server Express as a free tool, but it is limited to one
CPU (up to four cores), 1 GB of RAM, and a 10 GB database size (“Limitations
of SQL Server Express”).
Set this up on a dedicated machine, so the setup does not disrupt other work.
In shifting to SQL Server Express, the flat files have to be properly processed
for the data to move without lossiness or other problems.
It may help to process the data first in MS Access (as long as the flat file data is
not too large to handle in Access). Treat text columns as “Long Text,” not “Short
Text.” Label Date fields not as text but “Date with Time.” The idea is to have the
proper settings for appropriate receipt in SQL.
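As an alternative to 7-Zip, the .gz-to-.csv decompression step can also be scripted with Python's standard library (the helper name is invented):

```python
import gzip
import shutil

def gunzip(src, dst):
    """Decompress a data-portal .gz flat file to a .csv on disk.
    Streams bytes, so it handles files too large to fit in memory."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
```

Scripting this step makes the monthly refresh repeatable, in line with the earlier advice to record steps so everything can be re-run as needed.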
194. To a Relational Database (cont.)
Then, export the object from Access to Excel 2016 with the formatting and proper
data structure.
If a table has more than 65,000 records, MS Access is unable to export it.
195. To a Relational Database (cont.)
One option is to split the dataset in Access (highlight the table -> go to the
Database Tools tab -> click Access Database -> Split Database). The problem
with this is that a dataset would have to be split quite a few times to get
down to 65,000 records, and then, after ingestion into SQL, any repeated
data would have to be deleted. This path is too onerous to be helpful,
especially with LMS data portal data, which can easily run into millions of
rows.
A more direct option follows on the next slide.
196. To a Relational Database (cont.)
When files are too large (anything over the 65,000 records that will fit in a
clipboard), it makes better sense to clean the data on import into SQL. The
sequence goes like this: .gz -> .csv (using 7-Zip) -> open SQL Server Management
Studio -> import the data, changing “DT_STR” columns to “DT_TEXT” (a “text
stream”), so there is not a 50-character constraint on the columns, and the
data import generally goes well. (This solution takes up more computer memory
and is inelegant, but it heads off the many issues that would crop up with a
straight import without the data type adjustments.)
There is no import of column names in the first row.
In SQL Server Management Studio 17, go to Databases -> System
Databases -> “master” database (right-click) -> Tasks -> Import Data … and
specify that the original source is from Microsoft Excel. The flat files are now
database objects (dbos) in the master database. Do keep the original file
names, for ease-of-reference.
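The import path above runs through SQL Server Management Studio's wizard. As a rough stand-in illustration of the same idea (every column declared as unbounded text so nothing is truncated, with the header row kept as column names), here is a Python sketch using SQLite in place of SQL Server Express; the function and table names are invented.

```python
import csv
import sqlite3

def ingest_csv(db_path, table, csv_path):
    """Load a flat file into a database table, declaring every column as
    TEXT so long strings are not cut off (the analog of switching DT_STR
    columns to DT_TEXT in the import wizard). SQLite stands in for
    SQL Server Express here purely for illustration."""
    conn = sqlite3.connect(db_path)
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # keep the first row as column names
        cols = ", ".join(f'"{c}" TEXT' for c in header)
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
        placeholders = ", ".join("?" for _ in header)
        conn.executemany(
            f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()
    return conn
```

Typing everything as text first and converting later mirrors the slide's advice: get the data in without lossiness, then worry about proper types.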
197. To a Relational Database (cont.)
Re-indexing needed?
If so, the foreign keys may have to be reconnected to the correct primary
keys for the relating in a relational database to make sense and for SQL
queries across the files to make sense.
Foreign keys point to primary keys in another table; they connect related
data between tables.
Primary keys are unique identifiers (“reserved” against reuse in that sense),
and they indicate unique records in data tables (and databases).
If not, it may be possible to run SQL queries by loading the tables with
primary keys first and those with referring foreign keys second…but I am not
there yet. Working on it.
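The primary-key/foreign-key relationship, and the parents-first load order mentioned above, can be shown concretely. The table and column names below are invented, and SQLite stands in for SQL Server purely for illustration.

```python
import sqlite3

# Toy schema: each pseudonym row points back at exactly one user row,
# mirroring the "logins associated with users" relationship.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE pseudonyms (
                    id INTEGER PRIMARY KEY,
                    user_id INTEGER NOT NULL REFERENCES users(id),
                    login TEXT)""")
# Load the parent table first, then the child table that refers to it.
conn.execute("INSERT INTO users VALUES (1, 'Ada')")
conn.execute("INSERT INTO pseudonyms VALUES (10, 1, 'ada@example.edu')")
```

With enforcement on, inserting a pseudonym whose user_id matches no user fails, which is exactly why the referring tables have to be loaded after the tables holding their primary keys.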
198. To a Relational Database (cont.)
Proceed with a good basic text on SQL server. Give it a good read-through
before actually going too far into a project. (Experimentation is always
good, but time wastage—not so much.)
If local support with a database administrator (DBA) is available, that would
be optimal.
199. References
Pennebaker, J.W., Booth, R.J., Boyd, R.L., & Francis, M.E. (2015). Linguistic
Inquiry and Word Count: LIWC2015 Operator’s Manual. Retrieved from
https://s3-us-west-2.amazonaws.com/downloads.liwc.net/LIWC2015_OperatorManual.pdf
200. Contact and Conclusion
Dr. Shalin Hai-Jew
iTAC
Kansas State University
212 Hale / Farrell Library
shalin@k-state.edu
785-532-5262