"Alea iacta est! Understanding historical dynamics using Monte Carlo simulations" with Xavier Rubio-Campillo, researcher at the Barcelona Supercomputing Center - @xrubiocampillo
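As a minimal illustration of the Monte Carlo idea the talk title alludes to (and its dice metaphor), here is a sketch that estimates a probability by repeated random trials. The dice scenario, function name, and trial count are my own assumptions for illustration, not taken from the talk.

```python
import random

def estimate_two_sixes(trials=100_000, seed=42):
    """Estimate P(two dice both show six) by repeated random trials."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    hits = sum(
        1 for _ in range(trials)
        if rng.randint(1, 6) == 6 and rng.randint(1, 6) == 6
    )
    return hits / trials

print(estimate_two_sixes())  # close to the exact value 1/36 ≈ 0.0278
```

The same recipe — simulate many random runs of a model, then summarize the outcome distribution — is what scales up to simulating historical dynamics.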
Pythagoras proved the Pythagorean theorem using squares and triangles of equal area to show that the square of the hypotenuse is equal to the sum of the squares of the other two sides. While he is credited with the proof, the theorem was used over 1,000 years earlier. According to legend, Pythagoras was so pleased with his proof that he sacrificed 100 oxen. He was part of a group called the Pythagoreans who made discoveries in irrational numbers.
#2 DataBeersBCN - "Analog data visualitzation in the digital age" by Alberto ...DataBeersBCN
This document provides a summary of 6 examples of analog data visualization projects. It lists the names and website URLs of projects like Data Cuisine, Decoding Dom Perignon, Dear Data, and Handmade Dataviz Kit, which create non-digital visualizations of data through means like postcards, paper crafts and physical installations. The document serves to highlight creative approaches to representing data and insights in tangible, non-screen based forms.
#5 DataBeersBCN -"Dos and Don'ts of Data Viz"DataBeersBCN
This document provides dos and don'ts for data visualization. It discusses how to properly scale and represent proportions in charts. Common misleading techniques are shown such as rescaling axes, omitting the y-axis origin, using different scales for the same axis, and including meaningless or invented data. The document advocates showing only relevant information without crowding plots. It also notes that the visualization should fit the intended audience and goal. While rules can be broken, the overall message is that visualizations should accurately and honestly portray the data.
#5 DataBeersBCN -"Location Based Business Oportunity Detector"DataBeersBCN
The document discusses a potential location-based business opportunity detector service called an LBBOD. The LBBOD would gather sensor, business, and fieldwork data around a user to detect opportunities near them and save them time spent searching. It would provide information on competitors and collaborators, the type of business environment, and the suitability of different locations for businesses. The LBBOD aims to answer questions about opportunities, risks, commercial activity in an area, and whether specific locations would work for a user's business.
This document discusses designing cities with consideration for urban smells. It mentions conducting "smell walks" in several cities to collect words associated with urban smells. A wheel was created to categorize both positive and negative urban smells into groups like nature, animals, and emissions. Data from London and Barcelona showed correlations between reported smells and air pollution levels. The document advocates that city planning should account for different urban smells in addition to other senses like sight and sound.
#2 DataBeersBCN - "Why counting people at public transport" by Caterina FontDataBeersBCN
Counting passengers at public transportation helps transportation agencies understand capacity levels over time, passenger travel patterns including popular routes and times of trips, and passenger demographics. This data allows agencies to appropriately size vehicles, adjust prices, improve scheduling flexibility and route planning to better match capacity with demand. Mobility surveys using people counters provide passenger numbers, origin-destination data and other insights over periods of weeks to help transportation planning.
The document discusses the income of average US teachers compared to Bill Gates. It notes that the average yearly income for a US teacher is $40,000, while Bill Gates' net worth was listed as $4 billion in Forbes in 2015. Charts on the pages compare the "Resistance to Pay" or willingness to spend money for common expenses like rice or a car between a teacher making $40,000 a year and Bill Gates with a net worth of $4 billion. The document also provides brief descriptions of the scientific method process.
International Biodiversity Projects and Natural History Museums: Current stat...Klaus Riede
Background / Purpose: The 21st century started with an impressive number of international biodiversity initiatives, such as the International Year of Biodiversity (2010) and the recently launched United Nations Decade on Biodiversity (http://www.cbd.int/2011-2020/). Main conclusion: Most nations are now members of the Convention on Biological Diversity and have expressed a strong commitment to safeguarding Earth's biodiversity through their National Biodiversity Action Plans and work programs supporting taxonomy, such as the Global Taxonomy Initiative. Internet projects such as the Global Biodiversity Information Facility provide unprecedented opportunities for taxonomists and Natural History Museums to make their efforts visible through the federation of separate museum databases: users can search for species, visualise localities on a map and recall pictures of museum specimens made available by “Virtual Museums”. However, availability of multimedia data is still limited, particularly for type specimens. Taking European museums as an example, I demonstrate the potential of successful virtual museum projects and analyse priorities and needs for further digitisation, which is a prerequisite for repatriation of biodiversity data from tropical countries. Improved access to collections is also among the main tasks of the recently established CETAF secretariat in Brussels (Consortium of European Taxonomic Facilities). This new institution will function as a European voice for taxonomy and systematics, and will hopefully help to sustain orphaned EU activities from former projects supporting taxonomy, such as the European Distributed Institute of Taxonomy.
This document discusses the Biodiversity Heritage Library (BHL) project and its role in supporting other biodiversity initiatives like the Encyclopedia of Life (EOL). The BHL aims to digitize published literature on biodiversity and make it openly accessible online. It has already digitized over 4 million pages and works closely with groups like EOL to integrate taxonomic data. The document outlines the BHL's goals, partnerships, digitization process, and how it brings together distributed information on species through its use of taxonomic intelligence.
Slides for the talk on DNA data storage, molecular recording using CRISPR and molecular machines for biologically embedded computational functions. #science #scicomm #outreach
This document discusses new directions for e-science in the arts and humanities. Specifically, it discusses using networks to connect resources like virtual libraries and museums. It also addresses challenges like dealing with large datasets from simulations and linking heterogeneous resources. Finally, it provides examples of past e-science projects in areas like dance documentation, image analysis, and musicology that have helped map e-science approaches to digital humanities research.
Interpretation, Context, and Metadata: Examples from Open ContextEric Kansa
Presentation given at the International Digital Curation Conference (#IDCC16) in Amsterdam, at the "A Context-driven Approach to Data Curation for Reuse" workshop (organized by Ixchel Faniel and Elizabeth Yakel) on Monday, February 22, 2016
Oh Time, Thy Pyramids! The Biodiversity Heritage Library and the Unchaining o...Martin Kalfatovic
Oh Time, Thy Pyramids! The Biodiversity Heritage Library and the Unchaining of the Universal Library(?). Martin Kalfatovic. Information Futures Institute. Berkman Center for Internet & Society. April 12, 2008. Cambridge, MA.
The document discusses best practices for preserving digital research data for future use. It emphasizes the importance of thorough documentation, structured file organization, and open file formats to ensure digital research can be understood and built upon over time. Proper documentation should explain the data sources and limitations, document any changes or iterations clearly, and be in a machine-readable format like plain text. File naming, versioning, and folder structures also impact how understandable and reusable the data will be in the future.
The document provides an overview of what life was like for soldiers living in trenches during World War 1. It describes trenches as long narrow ditches dug for shelter from enemy fire, with the German and Allied trenches on opposite sides separated by no man's land. Life in the trenches was difficult, as soldiers lived in cold, muddy conditions without proper sanitation or sleeping accommodations. Trenches were infested with rats carrying disease, and soldiers faced the constant threat of attack when going over the top into no man's land.
Biodiversity Heritage Library: A Conversation About A Collaborative Digitizin...Martin Kalfatovic
The document discusses the Biodiversity Heritage Library (BHL), a collaborative project to digitize literature related to biodiversity and make it openly accessible online. It describes the goals of the BHL, participating institutions like natural history museums and botanical gardens, the types of literature being digitized, and challenges around metadata and linking digitized content to taxonomic databases.
Analysis of technical papers and terms from technical dictionaries (e.g. the CIRP dictionary, CIRPedia, etc.). Disambiguation of technical texts through high-quality dictionaries.
This presentation outlines the need to invest intellectual and expert human effort in data publication in order to see compelling research outcomes.
I gave this presentation on April 10th, 2014 at the University of Pennsylvania in an event sponsored by the Penn Humanities forum (http://humanities.sas.upenn.edu/13-14/dhf_opendata.shtml)
Digital Scholarship Intersection Scale Social MachinesDavid De Roure
This document discusses digital scholarship and social machines. It begins with an overview of digital humanities and social machines. It then provides examples of digital scholarship projects that utilize large datasets, citizen science, and social annotation. These examples demonstrate how digital methods can facilitate collaboration at scale. The document argues that a digital strategy is needed to guide investment and support for research using digital infrastructure and methods at universities.
The All Birds Barcoding Initiative has opened new research opportunities with its global atlas of avian mitochondrial DNA diversity. The initiative has published over 13,000 barcode records from museum collections, representing over 3,000 bird species. Private records include an additional 12,000 sequences and 1,000 species. The avian barcode library is a valuable research tool that can be used to analyze large-scale patterns in biodiversity and evolution, population biology, and biogeography.
IB Biology Option D.3: Human evolutionJason de Nys
This document provides an overview of topics related to human evolution, including:
- Methods for radioactive dating of rocks and fossils using carbon-14 and potassium-40 isotopes.
- Key anatomical features that define humans as primates, such as grasping limbs and binocular vision.
- Major trends seen in hominid fossils like Ardipithecus, Australopithecus, and Homo species showing brain size increase and facial shortening over time.
- Potential for multiple hominid species to coexist and uncertainties due to an incomplete fossil record.
- Relationship between increased brain size and diet change in hominids, correlated with meat consumption.
- Distinction between genetic evolution through natural selection
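The radiometric dating mentioned in the first bullet rests on exponential decay: a sample's age follows from how much of the parent isotope remains and the isotope's half-life. A small sketch of that arithmetic (the ~5,730-year carbon-14 half-life is a standard value; the 25% remaining fraction is an invented example):

```python
import math

def age_from_fraction(remaining_fraction, half_life_years):
    """Solve N/N0 = (1/2)**(t / half_life) for t, the sample's age."""
    return half_life_years * math.log(remaining_fraction) / math.log(0.5)

# A hypothetical sample retaining 25% of its original carbon-14:
print(round(age_from_fraction(0.25, 5730)))  # 11460 — exactly two half-lives
```

Potassium-40, with its far longer half-life (~1.25 billion years), plugs into the same formula, which is why it is used to date much older rocks.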
1) Fossil evidence and similarities in DNA sequences provide evidence that evolution has occurred.
2) Archaeopteryx lithographica is considered a transitional form that shares characteristics with both birds and their reptilian predecessors.
3) DNA evidence from related species would likely show similarities, suggesting a shared common ancestor.
#6 DataBeersBCN -"The (Big) Data behind the brain"DataBeersBCN
This document discusses different methods for measuring and mapping the brain at micro, meso, and macro scales. It outlines technologies like electron microscopy, axonal tracing, Clarity brainbow techniques, MRI, CT, PET, and SPECT imaging. Important historical developments are noted, such as the invention of X-ray, CT, PET and MRI imaging technologies. Challenges and opportunities in open data initiatives and the emerging field of connectomics are also mentioned.
#5 DataBeersBCN -"How to do Data Journalism… and not die trying"DataBeersBCN
1. The document discusses the history and evolution of data journalism, from early examples in the 1800s to modern practices using new digital tools.
2. It outlines key aspects of modern data journalism, such as multidisciplinary teams and making sources and methods transparent.
3. The author argues that data journalism is increasingly important for accountability by enabling investigative reporting using transparency laws and open data.
#5 DataBeersBCN -"The gripping potentials of Sociothermodynamics"DataBeersBCN
This document discusses the potential of sociothermodynamics to model human decision making and behavior using concepts from quantum mechanics. It provides several examples showing how human decisions can exhibit quantum-like effects, such as order effects and superposition of states. The document suggests this approach could help explain viral spreading of messages, electoral outcomes, and designing sustainable transportation networks by better understanding how individuals in a society interact similarly to quantum particles.
#4 DataBeersBCN - "We know what you did last sonar" by Fernando CucchiettiDataBeersBCN
This document discusses using Cassandra DB for data storage, analytics, real-time processing, recommendations, aggregated statistics tracking, sensors, and visualization for business intelligence and t-shirts. It also mentions Sonar+D PlantaComplex Village, Sonardome Hall, and the number of iOS devices compared to people and addresses, with signals lasting less than 20 minutes. A schedule is listed for various days of the week.
#3 DataBeersBCN - "The impact of data in reality" by Karina GibertDataBeersBCN
The document discusses the impact of data on decision making in complex real-world domains. It notes that while data availability is growing exponentially, less than 1% of data is currently analyzed, leading to suboptimal decisions. Two main research fields—data mining and intelligent decision support systems—aim to extract better knowledge from data and support decision makers, but there remains a gap between these fields. The document advocates for an integrative, multidisciplinary approach combining data mining, modeling, and decision theory to develop intelligent decision support systems that can help decision makers understand data and make more informed choices.
#3 DataBeersBCN - "Big Fun Data" by Xavier GuardiolaDataBeersBCN
The document discusses how King.com uses big data and the scientific method to optimize user engagement and retention in their mobile and web games. It notes that King.com has 185 games with 364 million monthly unique players generating 14 billion rows of data per day. King focuses on key metrics like retention, engagement, monetization, conversion, and virality. The presentation explains how King designs experiments by making small changes to levels and uses data science to analyze results and improve the user experience.
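The change-measure-compare experiment loop described above can be sketched as a two-proportion comparison between a control group and a variant. This is a generic illustration, not King's actual methodology, and all the player counts are invented:

```python
from math import sqrt

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z-statistic comparing retention/conversion rates of two game variants."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)           # pooled rate under H0
    se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))  # pooled standard error
    return (p_b - p_a) / se

# Hypothetical level tweak: variant B retains 4,300/10,000 vs control's 4,100/10,000
z = two_proportion_z(4100, 10_000, 4300, 10_000)
print(round(z, 2))  # ≈ 2.87; |z| > 1.96 → significant at the 5% level
```

At the data volumes quoted in the talk, even tiny effects reach significance, so the harder question is usually whether the effect is large enough to matter.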
#4 DataBeersBCN - "When a Movement Becomes a Party" by Pablo AragonDataBeersBCN
The document analyzes how the political movement 15M transformed into a political party called Barcelona en Comú for the 2015 Barcelona City Council election. Through analysis of over 500k tweets, it finds that Barcelona en Comú exhibits both centralized and decentralized structures on Twitter. Specifically, it has a centralized and less resilient party cluster, as well as a decentralized and more resilient movement cluster, suggesting the party acts as an interface between minor political parties and 15M activists.
#2 DataBeersBCN - "Using data to make great and succesful mobile games" by J...DataBeersBCN
This document discusses using data to create successful mobile games. It highlights the importance of analyzing funnels to track the percentage of users completing tutorials and engagement events. Natural language processing is also mentioned as a way to understand what players say about games to help guide improvements. The overall message is that collecting and leveraging data on player behavior and feedback is key to developing great mobile games.
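The funnel analysis mentioned above boils down to dividing the count at each step by the count at the funnel's entry point. A minimal sketch, with a made-up tutorial funnel (step names and numbers are assumptions for illustration):

```python
def funnel_completion(step_counts):
    """Per-step completion rates relative to the first step of a funnel."""
    start = step_counts[0][1]
    return {name: count / start for name, count in step_counts}

# Hypothetical tutorial funnel for a mobile game
steps = [("install", 10_000), ("tutorial_start", 7_500),
         ("tutorial_done", 5_000), ("first_purchase", 400)]
for name, rate in funnel_completion(steps).items():
    print(f"{name}: {rate:.0%}")
```

The biggest drop between adjacent steps marks where to focus design changes — here, the hypothetical gap between finishing the tutorial and making a first purchase.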
#2 DataBeersBCN - "Govern Obert - Opengov.cat" by Concha CatalanDataBeersBCN
The document discusses a seismometer, an API that tracks daily changes, and tweets from Govern and others about opening data and requests for information. Visualizar15 thanked Concha Catalan, MVTango, LasaRux, Albert Carles, and others for a new design and open government in Catalonia.
“Pear Campaigns - Publicity Effective AwaReness Campaigns. 1st prize BBVA Innova Challenge Mx” with David Solans from Centre Innovació i Tecnologia, UPC
#1 DataBeersBCN - Dani Villatoro from BBVA DATA ANALYTICSDataBeersBCN
This document provides information about Databeers, events that combine presentations on data-related topics with socializing over beers. The events follow a standard format: talks no longer than 6 minutes on data-related subjects, followed by more socializing. Anyone can give a talk, as long as it avoids code and formulas and tells the presenter's data story. The document promotes following the Databeers social media accounts and attending future related events that combine data, innovation challenges, food, and drinks.
#1 DataBeersBCN - Oscar Marin from Outliers.CollectiveDataBeersBCN
This document contains links to various projects and analyses done by Óscar Marín Miró including mood analysis of text using lexicons and corpora, geohashing to analyze events and locations on social media, viralization analysis, and profile mining of political affiliations on Twitter through lexicon and corpus analysis. Many of the links provide more information on analyzing emotions in text and geolocating social media posts.
The Ipsos - AI - Monitor 2024 Report.pdfSocial Samosa
According to Ipsos AI Monitor's 2024 report, 65% of Indians said that products and services using AI have profoundly changed their daily lives in the past 3-5 years.
End-to-end pipeline agility - Berlin Buzzwords 2024 (Lars Albertsson)
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long does it take for all downstream pipelines to be adapted to an upstream change?", the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake (Walaa Eldin Moustafa)
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
Open Source Contributions to Postgres: The Basics - POSETTE 2024 (ElizabethGarrettChri)
Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag... (sameer shah)
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
Build applications with generative AI on Google Cloud (Márton Kodok)
We will explore Vertex AI Model Garden powered experiences and the integration of these generative AI APIs. We will see in action what the Gemini family of generative models offers developers for building and deploying AI-driven applications. Vertex AI includes a suite of foundation models, referred to as the PaLM and Gemini families of generative AI models, which come in different versions. We will cover how to use the API to: execute prompts in text and chat; cover multimodal use cases with image prompts; fine-tune and distill models to improve knowledge domains; and run function calls with foundation models to optimize them for specific tasks. By the end of the session, developers will understand how to innovate with generative AI and develop apps following generative AI industry trends.
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens" (sameer shah)
Embark on a captivating financial journey with 'Financial Odyssey,' our hackathon project. Delve deep into the past performance of two companies as we employ an array of financial statement analysis techniques. From ratio analysis to trend analysis, uncover insights crucial for informed decision-making in the dynamic world of finance."
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
1.
Alea Iacta Est!
Understanding historical dynamics using Monte Carlo simulations
Xavier Rubio Campillo
xavier.rubio@bsc.es
@xrubiocampillo
@DataBeersBCN
2.
3.
Understanding human culture
Gray, R. D.; Atkinson, Q. D. (2003). "Language-tree divergence times support the Anatolian theory of Indo-European origin." Nature 426(6965): 435-439.
9. Can we test this hypothesis?
"Vauban reversed the dominance of the trace italienne and overturned the pattern of long sieges of early centuries"
Ostwald, J. (2007). Vauban under Siege: Engineering Efficiency and Martial Vigor in the War of the Spanish Succession. Brill Academic.
11. 1. Formalise hypotheses
During the period 1702-1714...
H1 – The duration of sieges increased
H2 – The duration of sieges decreased
H3 – The uncertainty of sieges increased
H4 – The uncertainty of sieges decreased
12. 2. Define a model
For each year between 1702-1714:
Sample the number of sieges
For each siege, sample its duration
Add a fixed value to the mean/variance of the duration
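As a sketch, the generative model described on this slide could look like the following Python. All numeric values (baseline mean and variance of siege duration, average number of sieges per year) are illustrative placeholders, not historical estimates from the talk:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_sieges(mean_mod, var_mod, base_mean=40.0, base_var=100.0):
    """Toy model of siege durations (in days) for 1702-1714.

    mean_mod / var_mod are the fixed values added each year to the mean
    and variance of the duration distribution, matching hypotheses H1-H4.
    Baseline numbers are illustrative, not historical estimates.
    """
    durations = []
    mean, var = base_mean, base_var
    for year in range(1702, 1715):
        n = rng.poisson(5)                            # sieges this year
        durations.extend(rng.normal(mean, np.sqrt(var), size=n))
        mean += mean_mod                              # H1/H2: trend in duration
        var = max(var + var_mod, 1.0)                 # H3/H4: trend in uncertainty
    return np.asarray(durations)

durations = simulate_sieges(mean_mod=1.5, var_mod=-0.3)
```

Positive modifiers produce a rising trend over the war, negative ones a falling trend, so each (mean_mod, var_mod) pair encodes one of the competing hypotheses.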
14. 4. The dice are cast!
Execute the model with randomly sampled values from your prior:
run 152112:
mean duration modifier = +1.5
variance duration modifier = -0.3
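Sketched in Python, drawing one run's parameters from a flat prior might look like this (the uniform bounds are assumptions for illustration, not values from the talk):

```python
import numpy as np

rng = np.random.default_rng()

# Flat (uniform) prior over both modifiers; the bounds are illustrative
mean_duration_modifier = rng.uniform(-3.0, 3.0)
variance_duration_modifier = rng.uniform(-3.0, 3.0)
# These two sampled values parameterise a single run of the model
```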
15. 5. Approximate Bayesian Computation
Execute this algorithm for millions of runs
Store the parameters of the 1k runs whose output is most similar to the historical data
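A minimal end-to-end sketch of this rejection-ABC loop in Python, using a toy siege model, synthetic "observed" data, and far fewer runs than the millions used in practice (all numbers are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(mean_mod, var_mod, base_mean=40.0, base_var=100.0):
    """Toy siege-duration model for the 13 war years (illustrative numbers)."""
    durations, mean, var = [], base_mean, base_var
    for _ in range(13):
        n = rng.poisson(5)
        durations.extend(rng.normal(mean, np.sqrt(var), size=n))
        mean += mean_mod
        var = max(var + var_mod, 1.0)
    return np.asarray(durations)

def summary(d):
    # Summary statistics compared against the historical record
    return np.array([d.mean(), d.std()])

# Stand-in for the observed summary statistics from the sources
observed = summary(simulate(1.5, -0.3))

n_runs, n_keep = 5_000, 50            # the talk: millions of runs, keep 1k
prior = rng.uniform(-3.0, 3.0, size=(n_runs, 2))   # flat prior over modifiers
dist = np.array([np.linalg.norm(summary(simulate(m, v)) - observed)
                 for m, v in prior])

# Rejection step: keep the parameters of the closest runs; together
# they approximate the posterior distribution over the modifiers
posterior = prior[np.argsort(dist)[:n_keep]]
```

The retained parameter sets form an empirical approximation of the posterior, so the relative support for H1-H4 can be read off from where the posterior mass lies (positive vs. negative modifiers).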
20. All text and image content in this document is licensed under the Creative Commons Attribution-Share Alike 3.0 License (unless otherwise specified). "LibreOffice" and "The Document Foundation" are registered trademarks. Their respective logos and icons are subject to international copyright laws; their use is therefore subject to the trademark policy.
Moltes gràcies! (Many thanks!)
Xavier Rubio Campillo
xavier.rubio@bsc.es
@xrubiocampillo