This document describes Series-O-Rama, a system that allows users to search for and get recommendations on TV series using SQL. It mines subtitles to extract terms for each series. These terms are indexed and weighted using TF-IDF to model each series as a vector. Series similarity is calculated based on shared terms. Queries can retrieve matching series based on term weights and series can be recommended based on a user's interests. The system provides search, browsing and recommendation capabilities through its GUI and uses a database to store the subtitle data and indexes.
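The term-weighting and similarity steps described above can be sketched as follows. This is a minimal illustration in Python, not the deck's SQL implementation. The TF-IDF variant (relative term frequency times log(N/df)), the use of cosine similarity for "shared terms", the function names, and the three made-up series with their terms are all assumptions chosen for the example.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Map each series name to a {term: tf-idf weight} vector.

    docs maps a series name to the list of terms mined from its subtitles.
    Assumed weighting: (count / total terms) * log(N / document frequency).
    """
    n = len(docs)
    df = Counter()  # number of series in which each term occurs
    for terms in docs.values():
        df.update(set(terms))
    vectors = {}
    for name, terms in docs.items():
        tf = Counter(terms)
        total = len(terms)
        vectors[name] = {t: (c / total) * math.log(n / df[t]) for t, c in tf.items()}
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; only shared terms contribute."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(vectors, query_terms):
    """Rank series by the summed weights of the query terms, best first."""
    scores = {
        name: sum(vec.get(t, 0.0) for t in query_terms)
        for name, vec in vectors.items()
    }
    hits = [(name, s) for name, s in scores.items() if s > 0]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy corpus: series names and terms are invented for illustration.
docs = {
    "Medical Drama": ["surgery", "hospital", "doctor", "love", "hospital"],
    "Hospital Comedy": ["hospital", "doctor", "joke", "love"],
    "Crime Show": ["murder", "detective", "evidence", "love"],
}
vectors = tf_idf_vectors(docs)
ranking = search(vectors, ["hospital"])
```

In this toy corpus the term "love" occurs in every series, so its weight is zero and it contributes nothing to similarity, while "hospital" separates the two hospital series from the crime series. A term shared by all series carrying no information is the main reason to weight by inverse document frequency rather than by raw counts.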
Searching and Recommending TV series with SQL
1. Series-O-Rama
Search & Recommend TV series with SQL
http://bit.ly/series-o-rama
Guillaume Cabanac
guillaume.cabanac@univ-tlse3.fr
2. Toulouse: A Picture is Worth a Thousand Words
Series-O-Rama: Search & Recommend TV series with SQL Guillaume Cabanac
Toulouse
population: 437 000
students: 97 000
Capbreton: 3h ride
Ax-les-Thermes: 1h40 ride
Collioure: 2h30 ride
3. Telly Addicts Need Help to Find TV Series
Main Topics of Grey’s Anatomy?
Text mining, Visualization
Series about ‘plane crash island’
Search engine
What should I watch next?
Recommender system
4. Text Mining: Let’s Crunch Subtitles
Main Topics of Grey’s Anatomy?
Text mining, Visualization
Cold Case
Grey’s Anatomy
5. What’s in a Subtitle File?
Title – Season – Episode – Language.srt
1 episode = 1 plain text file
Synchronization
start --> stop
Dialogue
We can easily extract words
[ a, again*2, and, but, com, cuban,
different, favorite, food, for*2, forum,
going, great, happen*2, has, hungry, i*2,
is, it, love, m, my, nice, night*2, miami,
now, pork, s*2, sandwiches, something, the,
to*2, tonight, town, www ]
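The word extraction described on this slide can be sketched in a few lines of Python. This is only an illustration: the system itself is stated to be 100% Java and Oracle, and the sample subtitle text, the `extract_words` name and the "letters only" tokenization rule are assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical sample in the .srt layout described above:
# cue number, "start --> stop" line, then dialogue lines.
SAMPLE_SRT = """1
00:00:01,000 --> 00:00:03,500
I'm hungry. Something great is going to happen tonight!

2
00:00:04,000 --> 00:00:06,000
I love Cuban pork sandwiches.
"""

def extract_words(srt_text: str) -> Counter:
    """Count lowercase words found in the dialogue lines of an .srt file."""
    counts = Counter()
    for line in srt_text.splitlines():
        line = line.strip()
        # Skip blank lines, cue numbers and synchronization lines.
        # (A dialogue line made only of digits would also be skipped.)
        if not line or line.isdigit() or "-->" in line:
            continue
        # Keep runs of letters only, so "I'm" yields "i" and "m".
        counts.update(re.findall(r"[a-z]+", line.lower()))
    return counts

words = extract_words(SAMPLE_SRT)
```

Splitting on non-letters is what produces one-letter tokens such as `m` and `s` in the word list above.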
6. DB technology at Work! [Home]
7 527 files = 337 MB
100% Java and Oracle
7. DB technology at Work! [Search engine]
Ranked list of results
8. DB technology at Work! [Infos]
Most popular terms
Most related series
9. DB technology at Work! [Recommendations]
10. DB technology at Work! [Recommendations]
I liked / I disliked
What should I watch next?
11. DB technology at Work! [Recommendations]
Ranked list of recommendations
12. How Does this Work?
13. Architecture and Data Model
Offline: subtitles → indexing → DB
Online: GUI → searching, browsing, recommending → DB
Dict = { idT, term }
  idT   term
  8     plane
  27    killer
  29    crash
Posting = { idT*, idS*, nb }
  idT   idS   nb
  27    45    89
  8     45    3
  8     12    90
Foreign keys (*): Posting[idT] ⊆ Dict[idT], Posting[idS] ⊆ Series[idS]
14. Theory − Text Indexing Pipeline
Sample text:
In 1720 Robert Gordon retired to Aberdeen having amassed a considerable fortune in Poland. On his death 11 years later he willed his entire estate to build a residential school for educating young boys. In the summer of 1750 the Robert Gordon’s Hospital was born. In 1881 this was converted into a day school to be known as Robert Gordon’s College. This school also began to hold day and evening classes for boys girls and adults in primary secondary mechanical and other subjects …
Tokenization + lowercase: [the, plane, crashed, ..., planes, ..., is]
Stopwords removal: [plane, crashed, ..., planes, ...]
Stemming: [plane, crash, ..., plane, ...]
Porter’s Stemmer (1980): http://qaa.ath.cx/porter_js_demo.html
Counting: {(plane, 48), (crash, 15) ...}
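The four pipeline steps can be chained as in the following Python sketch. It is not the system's indexer: the stopword list is a tiny assumed sample, and `naive_stem` is a crude suffix stripper standing in for Porter's stemmer, which applies many more rules.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "is", "a", "an", "and", "of", "to", "in"}

def naive_stem(word: str) -> str:
    """Crude suffix stripping; a stand-in for Porter's stemmer, not equivalent to it."""
    if word.endswith("ed") and len(word) > 4:
        return word[:-2]
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word

def index_text(text: str) -> Counter:
    tokens = re.findall(r"[a-z]+", text.lower())        # tokenization + lowercase
    tokens = [t for t in tokens if t not in STOPWORDS]  # stopwords removal
    stems = [naive_stem(t) for t in tokens]             # stemming
    return Counter(stems)                               # counting

counts = index_text("The plane crashed. The planes crash; the plane is lost.")
```

Here `plane`, `planes` and `crashed`, `crash` collapse onto the same stems, so their occurrences are counted together.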
15. Theory − Similarity of Paired Series
Dice’s Coefficient (1945)
Based on set theory
Example: let us model a series as a set of terms
House = {hospital, doctor, crazy, psycho}
Grey’s = {doctor, care, hospital}
A Big Limitation
The distribution of terms among series is ignored
It makes no difference whether a term occurs 1 time or 1,000,000 times
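As a worked example, Dice's coefficient in its usual set form, 2·|A ∩ B| / (|A| + |B|), can be applied to the two term sets of this slide. The function name and the convention of returning 0 for two empty sets are choices made for this sketch.

```python
def dice(a: set, b: set) -> float:
    """Dice's coefficient: twice the shared terms over the total set sizes."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return 2 * len(a & b) / (len(a) + len(b))

house = {"hospital", "doctor", "crazy", "psycho"}
greys = {"doctor", "care", "hospital"}

# Shared terms: hospital, doctor -> 2 * 2 / (4 + 3)
score = dice(house, greys)
```

Because the inputs are sets, a term contributes the same amount whether it occurs once or a million times, which is exactly the limitation noted on the slide.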
16. Theory − Vector Space Model, Term Weighting
Vocabulary term: survive ?
Raw TF: dexter > lost
Normalization, TF / max(TF): dexter < lost
17. Theory − Best Match Retrieval
1 TV series = 1 vector
Vector dimensions (terms): 1, 45, 1467, 6790, n
Now, we know how to:
Find most popular terms for a TV series
Compute similarity between TV series
Find TV series matching a query
18. Theory − More on Term Weighting
1 TV series = 1 vector
Vector dimensions (terms): 1, 45, 1467, 6790, n
All terms are supposed to be equally representative
… but ‘survive’ is way more unusual than ‘people’
⇒ ‘survive’ better represents Lost than ‘people’ does
IDF: Inverse Document Frequency
19. Theory − The Big Picture: TF*IDF
An important term for series S is frequent in S and globally unusual.
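To make the TF*IDF idea concrete, here is a small Python sketch. The slides do not give the exact IDF formula used by the system, so the common variant ln(N / df) is assumed, with TF normalized as nb / max(nb). The term counts are toy data invented for the example.

```python
import math

# Hypothetical term counts per series (toy data, not taken from the real index).
counts = {
    "Lost":   {"people": 50, "survive": 20, "plane": 10},
    "Dexter": {"people": 40, "killer": 30},
    "House":  {"people": 60, "doctor": 45},
}

def tf_idf(counts: dict) -> dict:
    """Weight each term by normalized TF times IDF (assumed form: ln(N / df))."""
    n_series = len(counts)
    # Document frequency: number of series in which each term occurs.
    df = {}
    for terms in counts.values():
        for term in terms:
            df[term] = df.get(term, 0) + 1
    weights = {}
    for series, terms in counts.items():
        max_nb = max(terms.values())
        weights[series] = {
            term: (nb / max_nb) * math.log(n_series / df[term])
            for term, nb in terms.items()
        }
    return weights

weights = tf_idf(counts)
```

In this toy data 'people' is the most frequent term of Lost but occurs in every series, so its weight drops to zero, while the rarer 'survive' gets a positive weight.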
20. Theory … and Practice
Series = { idS, name, maxNb }
  idS   name     maxNb
  12    Lost     540
  45    Dexter   125
Dict = { idT, term, idf }
  idT   term     idf
  8     plane    1.25
  27    killer   2.87
  29    crash    3.07
Posting = { idT*, idS*, nb, tf }
  idT   idS   nb   tf
  27    45    89   0.71
  8     45    3    0.02
  8     12    90   0.16
Foreign keys (*): Posting[idT] ⊆ Dict[idT], Posting[idS] ⊆ Series[idS]
21. Description of a TV Series
Example: Lost
Many surnames need to be filtered out
22. Retrieval of TV Series − queries with 1 term
Query: survive
Importance of normalization
• Stargate Atlantis
nb/maxNb = 63/1116 = 0.05645
• Blade
nb/maxNb = 9/163 = 0.05521
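The two figures above can be checked with a short Python sketch (the function name is an assumption; the numbers are those of the slide). The raw counts differ by a factor of seven, yet the normalized values are almost equal.

```python
def normalized_tf(nb: int, max_nb: int) -> float:
    """Term frequency divided by the count of the series' most frequent term."""
    return nb / max_nb

stargate = normalized_tf(63, 1116)  # Stargate Atlantis
blade = normalized_tf(9, 163)       # Blade
```

Without normalization, long-running series would dominate every ranking simply because they have more dialogue.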
23. Retrieval of TV Series − queries with n terms
Query: survive mulder
67 | The Vampire Diaries
  term      tf      idf     tf * idf
  survive   0.028   0.107   0.003
  mulder    0.007   3.977   0.028
  sum                       0.031
18 | X-Files
  term      tf      idf     tf * idf
  survive   0.014   0.107   0.001
  mulder    1.000   3.977   3.977
  sum                       3.978
⁞
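The scores above can be reproduced with a few lines of Python. The tf and idf values are the ones shown on this slide; the `rsv` helper and the dictionary layout are assumptions made for this sketch.

```python
# tf and idf values copied from the slide.
IDF = {"survive": 0.107, "mulder": 3.977}
TF = {
    "The Vampire Diaries": {"survive": 0.028, "mulder": 0.007},
    "X-Files": {"survive": 0.014, "mulder": 1.000},
}

def rsv(series_tf: dict, query_terms: list) -> float:
    """Score of a series for a query: sum of tf * idf over the query terms."""
    return sum(series_tf.get(t, 0.0) * IDF.get(t, 0.0) for t in query_terms)

query = ["survive", "mulder"]
ranking = sorted(TF, key=lambda name: rsv(TF[name], query), reverse=True)
```

X-Files ranks first because 'mulder' is both its most frequent term (tf = 1.000) and a globally rare one (idf = 3.977).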
24. Computing Similarities Among TV Series 1/2
Similar to House?
First, let’s compute the numerator Σ Ai·Bi where:
Ai = Terms from House
Bi = Terms from Another TV series
25. Computing Similarities Among TV Series 2/2
Similar to House?
-- Description of a TV Series: terms of 'Lost' ranked by TF*IDF
select term, tf*idf score
from posting p, dict d
where p.idT = d.idT
  and idS = (select idS from series where name = 'Lost')
order by 2 desc, 1 ;

-- Retrieval of TV Series, queries with 1 term: series ranked by tf for 'survive'
select name, term, nb, tf
from posting p, series s, dict d
where p.idS = s.idS
  and p.idT = d.idT
  and term = 'survive'
order by tf desc, name ;

-- Retrieval of TV Series, queries with n terms: sum of tf*idf over the query terms
select name, sum(tf*idf) rsv
from posting p, series s, dict d
where p.idS = s.idS
  and p.idT = d.idT
  and term in ('survive', 'mulder')
group by p.idS, name
order by 2 desc, 1 ;
-- Computing Similarities Among TV Series: series most similar to 'House'
with numerator as (
  select pHouse.idS idHouseS, pOther.idS idOtherS,
         sum(pHouse.tf*idf * pOther.tf*idf) numValue
  from posting pHouse, posting pOther, dict d
  where pHouse.idT = pOther.idT  -- common terms
    and pHouse.idT = d.idT       -- for IDF
    and pHouse.idS <> pOther.idS
    and pHouse.idS = (select idS from series where name = 'House')
  group by pHouse.idS, pOther.idS
)
select name,
       numValue / (
         sqrt((select sum(power(tf*idf, 2))
               from posting p, dict d
               where p.idT = d.idT and p.idS = n.idHouseS))
         * sqrt((select sum(power(tf*idf, 2))
                 from posting p, dict d
                 where p.idT = d.idT and p.idS = n.idOtherS))) score
from numerator n, series s
where n.idOtherS = s.idS
order by 2 desc, 1 ;
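The similarity query divides a sum of products over common terms by the product of two square roots of summed squares, which is the cosine of the angle between two weight vectors. The same computation can be sketched in Python on hypothetical TF*IDF weights (the term weights below are invented for the example).

```python
import math

def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse term -> weight vectors."""
    common = a.keys() & b.keys()                        # common terms
    numerator = sum(a[t] * b[t] for t in common)        # sum of Ai * Bi
    norm_a = math.sqrt(sum(w * w for w in a.values()))  # sqrt(sum of Ai^2)
    norm_b = math.sqrt(sum(w * w for w in b.values()))  # sqrt(sum of Bi^2)
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return numerator / (norm_a * norm_b)

# Hypothetical TF*IDF weights, for illustration only.
house = {"hospital": 0.9, "doctor": 0.8, "psycho": 0.3}
greys = {"hospital": 0.7, "doctor": 0.9, "care": 0.4}
dexter = {"killer": 1.0, "blood": 0.6}

most_similar = max([("greys", greys), ("dexter", dexter)],
                   key=lambda item: cosine(house, item[1]))[0]
```

One difference from the SQL: series sharing no term with House do not appear in the query result at all, because the self-join on `idT` yields no row for them, whereas this sketch returns a score of 0.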