https://bigscience.huggingface.co/
EN: Presentation of the BigScience project, a research initiative launched by HuggingFace that aims to train a large multilingual language model (inspired by OpenAI and GPT-x) on a very large processing cluster. The participants plan to investigate the dataset and the model from all angles: bias, social impact, capabilities, limitations, ethics, potential improvements, domain-specific performance, carbon impact, and the general AI/cognitive research landscape.
FR (translated): Presentation of the BigScience project, an open research project launched by HuggingFace that aims to build a language model (i.e. somewhat like OpenAI and GPT-3) while exploring the issues raised by the dataset and the model from the angles of cognitive bias, social and environmental impact, ethical limits, possible performance gains, and the general impact of this kind of approach when the goal is not merely "to have a bigger model".
Building and Evaluating Collection Dashboards | Richard Urban
This slide deck served as an introduction to the Building and Evaluating Collections Dashboards Workshop, held at Museums & the Web 2010, Denver, CO.
Piotr Adamczyk
Richard Urban
Michael Twidale
http://archimuse.com/mw2010/abstracts/prg_335002328.html
Introduction to Singularity and Data Containers | Vanessa S
A presentation for the encapsulation section at the Dataverse 2020 Community Meeting. Includes an overview of Singularity and dummy implementation / idea for data containers.
Your Content hides a treasure (and you might have not found it) - ForgetIT Pr... | Olivier Dobberkau
What is the value of the content on your website? Which one is creating value for your business? Who created it and how does the network of your editors perform?
In this presentation we introduce the ideas behind our work in the ForgetIT project. We also give an insight into how we think CMIS will be implemented in TYPO3 CMS so that content can be exchanged through a content repository.
Last but not least we will give you a brief view into our semantic and concept detection services that we will introduce to TYPO3 CMS.
Connections that work: Linked Open Data demystified | Jakob .
Keynote given 2014-10-22 at the National Library of Finland at Kirjastoverkkopäivät 2014 (https://www.kiwi.fi/pages/viewpage.action?pageId=16767828) #kivepa2014
Treading out Technology theory meets praxis during a thousand mile walk round... | Alan Dix
Treading out Technology: theory meets praxis during a thousand mile walk round Wales
Swansea Nov 2012
In 2012 I will be walking around Wales following the Wales Coast Path that opened this year and Offa's Dyke long distance path up the borders. This is partly a personal journey reconnecting with the country of my childhood, but also a technological journey investigating the IT needs of the walker and the local communities through which I pass. Some of this will be mundane technologically speaking, but hopefully transformative in practice. However, I also expect to be pushed to the limits cartographically and theoretically. In particular, aspects of Semantic Web and the odd Galois Connection will be essential parts of the need to synchronise data between heterogeneous sources and following disconnection.
"Applied capability graphs: measuring large-scale technological changes using big data and network science"
Pedro Parraguez is Co-founder at Dataverz and Postdoctoral Researcher at the Technical University of Denmark
Connected heritage: How should Cultural Institutions Open and Connect Data? | Mia
Keynote for the International Digital Culture Forum 2017, Taichung, Taiwan, August 2017
I approach the question by describing the mechanisms organisations have used to open and connect data, then I look at some of the positive outcomes that resulted from their actions. This is not a technical talk about different acronyms, it's about connecting people to our shared heritage.
Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013) | Jeff Magnusson
Overview of the data platform as a service architecture at Netflix. We examine the tools and services built around the Netflix Hadoop platform that are designed to make access to big data at Netflix easy, efficient, and self-service for our users.
From the perspective of a user of the platform, we walk through how various services in the architecture can be used to build a recommendation engine. Sting, a tool for fast in memory aggregation and data visualization, and Lipstick, our workflow visualization and monitoring tool for Apache Pig, are discussed in depth. Lipstick is now part of Netflix OSS - clone it on github, or learn more from our techblog post: http://techblog.netflix.com/2013/06/introducing-lipstick-on-apache-pig.html.
Drupal Day 2011 - Thinking spatially with your open data | DrupalDay
Talk by Juan Arevalo & Marco Giacomassi | Drupal Day Roma 2011
The Open Data movement is moving a step forward: many governments, institutions and businesses have recently started making information available to citizens and customers. Data is now seen as a powerful instrument to increase transparency in public administration and in business policies. About 80% of this information has a spatial component that is not yet fully exploited. A range of open source solutions is now available to address this challenge; in this session we will explore their potential and possible applications. The so-called “data deluge” is here, but we can build good umbrellas. Please come to learn more about it!
Domain Driven Design is a software development process that focuses on finding a common language for the involved parties. This language and the resulting models are taken from the domain rather than from the technical details of the implementation. The goal is to improve communication between customers, developers and all other involved groups. Even though Eric Evans's book on this topic was written almost ten years ago, the topic remains important because a lot of projects fail for communication reasons.
Relational databases have their own language and push the design of software further away from the domain: entities have to be created for the sole purpose of adhering to the best practices of relational databases. Two kinds of NoSQL databases are changing that: document stores and graph databases. In a document store you can model a "contains" relation in a more natural way and thereby express whether an entity can exist outside of its surrounding entity. A graph database allows you to model relationships between entities in a straightforward way that can be expressed in the language of the domain.
In this talk I want to look at how a multi-model database that combines a document store and a graph database can help you model your problems in a way that is understandable for all parties involved, and explain the benefits of this approach for the software development process.
Béatrice Markhoff - Semantic mediation ArSol and CIDOC CRM | ariadnenetwork
Presentation given by Béatrice Markhoff of the University of Tours at the ARIADNE winter school on work carried out to integrate data and to implement ArSol (Archives du Sol). The presentation describes the mapping to the CIDOC CRM and how it has been implemented to provide a web-based application.
Preserving born digital architectural drawings: developing preservation stra... | datable_be
Presentation held at the conference 'Between Paper and Pixels: Transmedial traffic in architectural drawing' at TU Delft and Het Nieuwe Instituut, Rotterdam, Nov 30 – Dec 1, 2016
ECHO-core: a metadata schema for describing study collections | datable_be
Presentation held at ‘Studiecollecties, een uitdagende context binnen universiteiten en musea’, Campus Mutsaard – Universiteit Antwerpen
Thursday, 29 September 2016
Presentation held at the workshop 'Using the Europeana Fashion platform. Tools, best practices and use cases' (Europeana Fashion International Conference: Digital Fashion Futures in Antwerp, February 24-25, 2015)
Basic training: digitising, digital preservation and online publishing | datable_be
Slides from the session on digitising, digital preservation and online publishing (part 6 of the 'Duurzaam Digitaal Beheren' track, a modular training programme on the sustainable management of digital collections).
This module offers an introduction to the world of digitising, digitally preserving and publishing documentary collections online.
More and more staff in heritage institutions are confronted with digitising and digitally preserving collections. New staff in heritage organisations are also often not very familiar with this subject. The course addresses the following questions: How do you organise a digitisation project? What are standards and file formats? The technical side of digitising is also covered: what is the most suitable equipment, and how do you get started in practice?
Attention is also paid to archiving, describing and providing access to digital files, covering both the technical aspects and copyright issues.
7 building blocks for a digital strategy | datable_be
What impact does digitisation have on a heritage organisation, and how do you develop a strategy for it? This presentation explains 7 building blocks for tackling the challenge of digitisation at both the technical and the policy level.
Presentation at the OKBN meeting in the Van Abbemuseum, 27 October 2014.
Presentation at the Cultuurforum (Leuven, 23 April 2014).
The importance of open data for the cultural sector is climbing steeply. What are open data? How can we stimulate their use in the heritage sector and create momentum? Several experts sketch the general background, and you will also see apps built with open cultural datasets that are already available.
Digitising, archiving and online publishing for local heritage managers | datable_be
Presentations from sessions 5 & 6 of the series 'Aan de slag met archief en documentatie', organised by Heemkunde Vlaanderen in cooperation with PACKED vzw (Maasmechelen, March 2014)
The memory of the architect: archiving 2D and 3D born digital architectural a... | datable_be
A presentation of survey and testbed results, investigating the way architects manage their digital assets, and a round-up of issues encountered when ingesting architects' born-digital archives
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... | Subhajit Sahu
Abstract: Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It does, however, require that the input graph has no dead ends. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by the submission of a large number of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
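The decomposition described in this abstract can be illustrated with a small, single-threaded Python sketch. It is not the report's implementation: the function names, the damping factor of 0.85, the tolerance, the in-place update within each component, and the toy graph are all assumptions made for illustration. It uses Tarjan's algorithm to find strongly connected components, then converges each component in topological order so that upstream ranks are already final.

```python
from collections import defaultdict


def strongly_connected_components(graph):
    """Tarjan's algorithm. Returns SCCs in reverse topological order."""
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            sccs.append(comp)

    for v in graph:
        if v not in index:
            visit(v)
    return sccs


def levelwise_pagerank(graph, damping=0.85, tol=1e-10, max_iter=500):
    """graph maps every vertex to its list of out-neighbours (no dead ends)."""
    if any(len(outs) == 0 for outs in graph.values()):
        raise ValueError("this sketch assumes a graph without dead ends")
    n = len(graph)
    in_links = defaultdict(list)
    for u, outs in graph.items():
        for v in outs:
            in_links[v].append(u)
    ranks = {v: 1.0 / n for v in graph}
    # Reversing Tarjan's output yields a topological order of components.
    for comp in reversed(strongly_connected_components(graph)):
        for _ in range(max_iter):
            delta = 0.0
            for v in comp:
                contrib = sum(ranks[u] / len(graph[u]) for u in in_links[v])
                new_rank = (1 - damping) / n + damping * contrib
                delta += abs(new_rank - ranks[v])
                ranks[v] = new_rank
            if delta < tol:  # this component has converged
                break
    return ranks


# Hypothetical toy graph: component {a, b} feeds component {c, d}.
example = {"a": ["b"], "b": ["a", "c"], "c": ["d"], "d": ["c"]}
ranks = levelwise_pagerank(example)
```

Because each component is iterated only after everything upstream of it has converged, vertices in earlier components are never revisited, which is the property that removes the need for per-iteration communication in a distributed setting.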
Techniques to optimize the PageRank algorithm usually fall into two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, which have the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance. Final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
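As a small illustration of one of these ideas, the sketch below finds groups of in-identical vertices, meaning vertices that share exactly the same set of in-links and therefore receive the same rank. The function name and the toy graph are hypothetical, and the sketch assumes a simple graph (parallel edges are ignored); it shows only the grouping step, not the STICD algorithm itself.

```python
from collections import defaultdict


def in_identical_groups(graph):
    """Return groups (size > 1) of vertices with identical in-link sets.

    graph maps every vertex to its list of out-neighbours.
    """
    in_links = defaultdict(set)
    for u, outs in graph.items():
        for v in outs:
            in_links[v].add(u)
    groups = defaultdict(list)
    for v in graph:
        # Vertices with the same frozenset of in-neighbours land together.
        groups[frozenset(in_links[v])].append(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]


# Hypothetical toy graph: "c" and "d" are both pointed to by "a" and "b".
example = {"a": ["c", "d"], "b": ["c", "d"], "c": ["a"], "d": ["b"]}
groups = in_identical_groups(example)
```

In a PageRank loop, the rank of one representative per group could then be computed once and copied to the other members, which is where the saving in per-iteration work would come from.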
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag... | sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
Learn SQL from basic queries to Advance queries | manishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
The Building Blocks of QuestDB, a Time Series Database | javier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps as just another data type. However, when performing real-time analytics, timestamps should be first-class citizens, and we need rich time semantics to get the most out of our data. We also need to deal with ever-growing datasets while staying performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
Global Situational Awareness of A.I. and where its headed | vikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
Adjusting primitives for graph : SHORT REPORT / NOTES | Subhajit Sahu
Graph algorithms, like PageRank ... Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is ...
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... | John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf | GetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source) Copilot?
How can we build one?
Architecture and evaluation
Heritage data beyond the GLAM
1. ./datable
KNOWLEDGE CENTERS
BEYOND THE (G)LAM
Duurzaam digitaliseren in de museale wereld
Museum voor Schone Kunsten Gent
December 12, 2016
Henk Vanstappen
DATABLE
Hoe zet je in op duurzame data
en flexibele systemen?
30. ./datable
Visitor’s behavior in Louvre
(Yuji Yoshimura, Stanislav Sobolevsky, Carlo Ratti Fabien Girardin Juan
Pablo Carrascal, Josep Blat Roberta Sinatra, 2016)