
Open learning- Text analysis basics


Text Wandering exchanging experiences: crowdsourced based interactive awareness.
Presented at the Territoires Innovants conference (Essaouira, Morocco, 28 February 2019)

Published in: Education

  1. 1. Open Learning [0] Text analysis Basics [Innovation and Sustainability] 2019_04_11 Text Wandering exchanging experiences: crowdsourced based interactive awareness. Stefano Lariccia (Sapienza Università di Roma) stefano.lariccia@uniroma1.it Fernando Martínez de Carnero Calzada (Sapienza University) fernando.martinez@uniroma1.it UROMA 28/02/2019 Territoires Innovants: Essaouira 1
  2. 2. Creating a new protocol framework for the collection of data and direct personal experiences. Authors: Stefano Lariccia (Sapienza University) stefano.lariccia@uniroma1.it; Fernando Martínez de Carnero Calzada (Sapienza University) fernando.martinez@uniroma1.it; Giovanni Toffoli (Link R&D) toffoli@linkroma.it
  3. 3. Computing with Python: what is Python?
  4. 4. Computing with Python: why Python?
  5. 5. Computing with Python: why Python?
  6. 6. Computing with Python: what can you do with Python?
  7. 7. Computing with Python: what can you do with Python?
  8. 8. Computing with Python: how to proceed with Python?
  9. 9. ● >>> from nltk.book import *
        *** Introductory Examples for the NLTK Book ***
        Loading text1, ..., text9 and sent1, ..., sent9
        Type the name of the text or sentence to view it.
        Type: 'texts()' or 'sents()' to list the materials.
        text1: Moby Dick by Herman Melville 1851
        text2: Sense and Sensibility by Jane Austen 1811
        text3: The Book of Genesis
        text4: Inaugural Address Corpus
        text5: Chat Corpus
        text6: Monty Python and the Holy Grail
        text7: Wall Street Journal
        text8: Personals Corpus
        text9: The Man Who Was Thursday by G . K . Chesterton 1908
        >>>
        Computing with Language: Texts and Words
  10. 10. ● Any time we want to find out about these texts, we just have to enter their names at the Python prompt: ○ >>> text1 ○ >>> text2 ○ >>> ● Now that we can use the Python interpreter, and have some data to work with, we’re ready to get started. Computing with Language: Texts and Words
  11. 11. ● Searching Text: there are many ways to examine the context of a text apart from simply reading it. A concordance view shows us every occurrence of a given word, together with some context. ● Here we look up the word monstrous in Moby Dick by entering text1 followed by a period, then the term concordance, and then placing "monstrous" in parentheses: >>> text1.concordance("monstrous") Computing with Language: Texts and Words
  12. 12. ● Let’s begin by finding out the length of a text from start to finish, in terms of the words and punctuation symbols that appear. We use the term len to get the length of something, which we’ll apply here to the book of Genesis: >>> len(text3) 44764 >>> ● So Genesis has 44,764 words and punctuation symbols, or “tokens.” ● A token is the technical name for a sequence of characters—such as hairy, his, or :)—that we want to treat as a group. ● When we count the number of tokens in a text, say, the phrase to be or not to be, we are counting occurrences of these sequences. Thus, in our example phrase there are two occurrences of to, two of be, and one each of or and not. ● But there are only four distinct vocabulary items in this phrase. How many distinct words does the book of Genesis contain? Computing with Language: Counting Vocabulary
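
The difference between tokens and distinct vocabulary items on this slide can be checked on the example phrase itself. This is a minimal sketch that does not use NLTK: it assumes that splitting on whitespace is an acceptable stand-in for a real tokenizer, and the variable names are illustrative only.

```python
# Minimal sketch: tokens vs. distinct vocabulary items.
# Assumption: whitespace splitting stands in for a real tokenizer.
phrase = "to be or not to be"

tokens = phrase.split()            # every occurrence counts as a token
vocabulary = sorted(set(tokens))   # set() collapses repeated tokens

print(len(tokens))       # number of tokens in the phrase
print(vocabulary)        # the distinct vocabulary items
print(len(vocabulary))   # number of distinct vocabulary items
```

The same two calls, len(text3) and len(set(text3)), should answer the question about Genesis once the NLTK book texts are loaded.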
  13. 13. ● There are many ways to examine the context of a text apart from simply reading it. A concordance view shows us every occurrence of a given word, together with some context. ● Here we look up the word monstrous in Moby Dick by entering text1 followed by a period, then the term concordance, and then placing "monstrous" in parentheses: >>> text1.concordance("monstrous") Computing with Language: Searching Text
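
To make the idea of a concordance view concrete, here is a small pure-Python sketch. It is not NLTK's implementation: the function name, the width parameter, and the sample sentence are all made up for illustration, and the sample is not a quotation from Moby Dick.

```python
def concordance(tokens, word, width=3):
    """Return each occurrence of `word` with up to `width` tokens of context per side."""
    target = word.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Mark the matched word with brackets so it stands out in the context.
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Hypothetical sample text, used only to exercise the function.
sample = "a most monstrous size and a monstrous fable indeed".split()
for line in concordance(sample, "monstrous", width=2):
    print(line)
```

With NLTK installed and the book texts loaded, text1.concordance("monstrous") produces a comparable view directly.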
  14. 14. ● Now, let’s calculate a measure of the lexical richness of the text. The next example shows us that each word is used 16 times on average (we need to make sure Python uses floating-point division): >>> from __future__ import division >>> len(text3) / len(set(text3)) 16.050197203298673 >>> (The __future__ module makes features of later Python versions available in the current one. In Python 3 the / operator already performs floating-point division, so this import is only needed in Python 2.) ● Now let’s calculate the variance of each of these texts: ○ Text1 ○ Text2 ○ Text3 ○ Text4 ○ Text5 ○ Text6 ○ Text7 ○ Text8 ○ Text9 Computing with Language: Counting Vocabulary
  15. 15. ● Now, let’s calculate a measure of the lexical richness of the text. The next example shows us that each word is used 16 times on average (we need to make sure Python uses floating-point division): >>> from __future__ import division >>> len(text3) / len(set(text3)) 16.050197203298673 >>> ● (The __future__ module makes features of later Python versions available in the current one. In Python 3 the / operator already performs floating-point division, so this import is only needed in Python 2.) ● Now let’s calculate the variance of each of these texts: ○ Text1 ○ Text2 ○ Text3 ○ …..
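
The lexical richness measure above can be wrapped in a small helper so that it is easy to repeat for several texts. This is a sketch under stated assumptions: the function name lexical_richness is hypothetical, and the whitespace-split phrase is used only so the example runs without NLTK.

```python
def lexical_richness(tokens):
    """Average number of times each distinct token is used in the text."""
    if len(tokens) == 0:
        raise ValueError("cannot measure an empty text")
    # In Python 3, / is already floating-point division.
    return len(tokens) / len(set(tokens))


# Assumption: a whitespace-split phrase stands in for a tokenized text.
sample = "to be or not to be".split()
print(lexical_richness(sample))  # 6 tokens over 4 distinct items
```

Since the NLTK book texts support len() and iteration, the same helper should accept text1 to text9 directly once they are loaded.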
  16. 16. ● Counting Vocabulary
  17. 17. Innovation in creativity: Up2U an Open Flexible ecosystem (Screenshot of the Up2U gateway)
  18. 18. Innovation in creativity: Up2U an Open Flexible ecosystem: CommonSpaces Social Learning Platform
  19. 19. Innovation in creativity: Up2U an Open Flexible ecosystem: Learning Experiences Traceability in Up2U (Learning Analytics, Learning Locker)
  20. 20. Innovation in creativity: Up2U an Open Flexible ecosystem: Learning Experiences Traceability in Up2U (Learning Analytics, Learning Locker)
  21. 21. Innovation in creativity: Up2U an Open Flexible ecosystem: Learning Experiences Traceability in Up2U (Learning Analytics, Learning Locker)
  22. 22. • Demo Learning Units preparation (part of Up2U Pedagogical WP5)
        • Unit 1: New technologies, new learning methods
        • Unit 2: Language Technologies and Learning Assessment
        • Unit 3: Language analysis and Critical Thinking. How to increase your ability to understand texts and documents
        • Unit 4: GDPR (General Data Protection Regulation)
        • Unit 5: Climate Change risk mitigation
        • Unit 6: Education to environmental and seismic emergency and to
        • Unit 7: Valorisation of Archaeological sites
        • Unit 8: Landscape monitoring and navigation
        Innovation in creativity: Up2U an Open Flexible ecosystem
  23. 23. 1. Social movements promoting Open Source data initiatives are trying to alert research centres, universities, governments and other institutions to the risk that people will be alienated from "the oil of the future": Big Data. 2. What analogies are there between the alienation of data from local communities (information alienation) and the alienation of commodities produced by workers? Is there any theoretical, systematic comparison? 3. Developing countries are reluctant to accept limits on their growth in response to Climate Change alerts: they argue that, since Western countries did as they pleased for centuries, they should not now have to accept limits in the name of the planet.
  24. 24. 1. Why should Europe express a specifically European position? 2. Open Source movements are increasingly visible and authoritative 3. The Linux case 4. The Galileo case 5. The Up2U case 6. How Up2U could help European and international schools raise awareness of Climate Change problems and mitigation policies 7. A pan-European community for geographical monitoring and survey could be extended to a wider community of countries (Mediterranean area, Middle East)
  25. 25. ● Why a citizen science? Academic level: Alan Irwin (1995): collaboratively determine the objectives of the research. Bonney (1996): collective investigations at the Cornell Ornithology Laboratory. Institutional level: Holdren (2015): voluntary participation of the public in the scientific process. ● Causes of the shift in participation from the scientist to the experimenter ○ Management of business and financial models at the public level: users and quality control. ○ Semantic Web and Web 3.0: the user as a data source. From hunter to hunted. ○ The cult of storytelling, micro-stories and emotionality. ○ Interactive devices, smartphones and social networks. ○ Traceability and GPS. ○ Big data, processing capacity and interpretation algorithms. ○ Ethical and unethical uses, accepted or ignored. Data and its theft. Cambridge Analytica as the limit case. ○ Profiles and behavioral patterns.
  26. 26. ● Areas of application of citizen science. ○ A digital revolution tailored to technology or a use of technology tailored to a cultural model? ○ Transversality and interdisciplinarity. ○ Impact factors: social and educational. ○ Technologies and social networks as development instruments. ○ Applications for tourist studies. ○ Ecotourism, food and wine tourism and cultural tourism: sustainability, diversification and complementarity of the offer. ○ Growing interest in financing research.
  27. 27. ● The Ten Principles of Citizen Science. ○ 1. Citizen science projects actively involve citizens in scientific endeavour that generates new knowledge or understanding. ○ 2. Citizen science projects have a genuine science outcome. ○ 3. Both the professional scientists and the citizen scientists benefit from taking part. ○ 4. Citizen scientists may, if they wish, participate in multiple stages of the scientific process. ○ 5. Citizen scientists receive feedback from the project. ○ 6. Citizen science is considered a research approach like any other, with limitations and biases that should be considered and controlled for. ○ 7. Citizen science project data and metadata are made publicly available and where possible, results are published in an open-access format. ○ 8. Citizen scientists are acknowledged in project results and publications. ○ 9. Citizen science programmes are evaluated for their scientific output, data quality, participant experience and wider societal or policy impact. ○ 10. The leaders of citizen science projects take into consideration legal and ethical issues surrounding copyright, intellectual property, data-sharing agreements, confidentiality, attribution and the environmental impact of any activities.
  28. 28. ● Tourist experience and experiential tourism. ○ Is a non-experiential tourism possible? ○ Experiential tourism as an emerging collaborative form. ○ The gaps in the tourism: the demand that does not cover the offer. ○ Educational instruments and appreciation of the learning process.
  29. 29. [Marco Ramazzotti] 1. What kinds of tools can we put to work to automate the analysis of landscape photographs and satellite images, with the goal of monitoring trends of change on our planet? ○ Climate Change Emergency: the lack of time can boost the new adaptation process 2. Neural Network and Machine Learning methods to analyse images (aerial photogrammetry, drone images and satellite images) ○ Much progress has been made in image analysis and pattern recognition ○ Another big step forward can come from human-machine interaction 3. Mixed methods (collaboration between humans and automata) ○ Secondary school students are invited to collaborate in communities with each other and in communities with well-trained automata
  30. 30. 26/02/2019 32 • Mentoring Pilot organization: Italy • CYC2 Module 1: 1-2 weeks; CYC2 Module 2: 12 weeks, Online • Distribution of Learning Path 1, 2 and 3 (the last 2 need to be translated) • Schools engagement: GARR mailing the schools, and regional districts • Supporting Pilot organization: Greece • Schools mailing: expected engagement of 100 schools • Modeling a common collection of data • Supporting Pilot organization: Lithuania • Schools engagement: active engagement of 80 schools • Modeling a common collection of data • Supporting Pilot organization: Hungary ● Schools engagement: expected engagement of 50 schools Essaouira 28 January 2019
  31. 31. 26/02/2019 33 • Supporting Pilots, general overview: • Supporting Pilot organization: Poland ■ Schools mailing: expected engagement of 100 schools ■ Modeling a common collection of data • Supporting Pilot organization: Portugal ■ Schools mailing: expected engagement of 30 schools • Supporting Pilot organization: Spain ■ Schools mailing: expected engagement of 30 schools • Supporting Pilot organization: Switzerland ■ Schools recruitment: expected engagement of 5 schools Essaouira Up2U January 2019
  32. 32. • Further piloting activities in the Up2U ecosystem • “Light” integration of existing Jupyter notebooks based on CommonSpaces, integrated for formal/informal use in Up2U; • University as a Hub is planning the first “flipped” class for Italian secondary schools based on CommonSpaces. 1 Pilot will start in the classroom in February • Planning has started for the May Meeting in Rome at SLERD 2019 • In Italy (to be integrated into WP7): negotiation has started with the Heritage Ministry to co-author a pilot in Italian secondary schools about “Emergency and seismic education and risk mitigation” and “Valorisation of Archaeological sites”; the latter activity will converge during SLERD 2019 in a demonstration of the Up2U system used in Heritage studies. January - February 2019 [3]
  33. 33. • Planned piloting activities in the Up2U ecosystem • “Light” integration of existing Jupyter notebooks based on CommonSpaces, integrated for formal/informal use in Up2U; • UaH is planning the first “flipped” class for Italian secondary schools based on CommonSpaces. 1 Pilot will start in the classroom in February / March • Planning the May Meeting in Rome at SLERD 2019 • Negotiation with the Italian Heritage Ministry to co-author a pilot in Italian secondary schools about “Emergency and seismic education and risk mitigation” and “Valorisation of Archaeological sites” March - December 2019
