Presentation for GOTO Berlin 2014.
Sorting algorithms are computational processes used to organize the elements of a sequence in a certain order. Over the last few months I have tried to understand the data left behind by sorting algorithms and to transform it into elegant visual forms that help highlight the unique characteristics of each algorithm and reveal hidden patterns.
SORTING (http://sorting.at) is the result of this exploration.
This session explored some of the most famous and interesting sorting algorithms through their history, visualization and implementation with D3.js and require.js.
Chasing Bugs with the BeepBeep Event Stream Processor – Sylvain Hallé
Runtime verification is the process of observing a sequence of events generated by a running system and comparing it to some formal specification for potential violations. We show how the use of the BeepBeep event stream processor can greatly speed up the testing phase of a video game under development, by automating the detection of bugs when the game is being played. This process generalizes to a wide number of other use cases, including web application debugging and network intrusion detection.
Dyablox is a toolkit for designing systems and devices for the internet of things.
Project for my master thesis in Interaction Design at Domus Academy.
Provide The Key To A First Class Education – pjdemees
Canadians with a university bachelor's degree earn 50% more than someone with a high school diploma.
The question is – did these individuals have a plan to pay for their education or are they still paying for it now?
Let’s take a closer look.
Using Topological Data Analysis on your Big Data – AnalyticsWeek
Synopsis:
Topological Data Analysis (TDA) is a framework for data analysis and machine learning and represents a breakthrough in how to effectively use geometric and topological information to solve 'Big Data' problems. TDA provides meaningful summaries (in a technical sense to be described) and insights into complex data problems. In this talk, Anthony will begin with an overview of TDA and describe the core algorithm that is utilized. This talk will include both the theory and real world problems that have been solved using TDA. After this talk, attendees will understand how the underlying TDA algorithm works and how it improves on existing “classical” data analysis techniques as well as how it provides a framework for many machine learning algorithms and tasks.
Speaker:
Anthony Bak, Senior Data Scientist, Ayasdi
Prior to coming to Ayasdi, Anthony was at Stanford University, where he did a postdoc with Ayasdi co-founder Gunnar Carlsson, working on new methods and applications of Topological Data Analysis. He completed his Ph.D. in algebraic geometry with applications to string theory at the University of Pennsylvania and, along the way, worked at the Max Planck Institute in Germany, Mount Holyoke College in Massachusetts, and the American Institute of Mathematics in California.
Persistent Data Structures - partial::Conf – Ivan Vergiliev
The slides from my talk on Persistent Data Structures at http://partialconf.com/ . The "Implementation" part assumes a bit of prior knowledge on how persistent data structures work, but the rest should be generally accessible.
Seven Ineffective Coding Habits of Many Programmers – Kevlin Henney
Presented at Build Stuff (20th November 2014)
Habits help you manage the complexity of code. You apply existing skill and knowledge automatically to the detail while focusing on the bigger picture. But because you acquire habits largely by imitation, and rarely question them, how do you know your habits are effective? Many of the habits and conventions programmers have for naming, formatting, commenting and unit testing do not stand up as rational and practical on closer inspection.
This session examines seven coding habits that are not as effective as many programmers — whether working with Java, .NET, native or scripting languages — might believe, and suggests alternatives.
DN18 | A/B Testing: Lessons Learned | Dan McKinley | Mailchimp – Dataconomy Media
Abstract of the Presentation:
Introducing A/B testing to a large team that has never done it before is a weird and bewildering thing that Dan McKinley has somehow done twice. This has burdened him with many opinions about how to achieve this with minimal wailing and gnashing of teeth.
About the Author:
Dan McKinley is a Co-Founder of Skyliner in Los Angeles. Previously he worked at Stripe and spent nearly 7 years building Etsy, during which he worked on “pretty much every feature and backend facility on the site”. He resides in LA with his wife and son.
This is a talk I presented at University Limerick to give people an introduction into CouchDB.
What is it? How does it generally work? Introducing new concepts, etc.
An Incomplete Introduction to Artificial Intelligence – Steven Beeckman
This is the releasable version of an internal presentation on artificial intelligence. It includes a brief history of AI, a mathematical approach to deep learning and an overview of some use-cases of deep learning.
Spellcheck: "General Adversarial Networks" are actually called "Generative Adversarial Networks".
This talk covers the indexing structures considered and ultimately implemented in the Apache Lucene Open Source Project along with the 25 - 30X boost in performance and centimeter spatial accuracy achieved in the latest release. Have a look and see what's next for scalable Geospatial Search in Apache Lucene and Elasticsearch.
A slightly-modified version of my IPRUG talk, this time for the BT DevCon5 developer conference at Adastral Park on 25 May 2012.
The main changes are the addition of the Ruby section and the increased number of HHGTTG references in honour of towel day.
The things we don't see – stories of Software, Scala and Akka – Konrad Malawski
Opening keynote for Scalapeno, Tel Aviv 2016.
The talk focuses on and explains the things we don't often see explicitly or don't notice in our daily work, yet which make up a large part of the ecosystem and its maturity as a whole. We also dive into some of the more confusing cases of using the same word for different things in software.
From Research Objects to Reproducible Science Tales – Bertram Ludäscher
University of Southampton. Electronics & Computer Science. Research Seminar (Invited Talk).
TITLE: From Research Objects to Reproducible Science Tales
ABSTRACT. Rumor has it that there is a reproducibility crisis in science. Or maybe there are multiple crises? What do we mean by reproducibility and replicability anyways? In this talk I will first make an attempt at sorting out some of the terminological confusion in this area, focusing on computational aspects. The PRIMAD model is another attempt to describe different aspects of reproducibility studies by focusing on the "delta" between those studies and the original study. In addition to these more theoretical investigations, I will discuss practical efforts to create more reproducible and more transparent computational platforms such as the one developed by the Whole-Tale project: here 'tales' are executable research objects that may combine data, code, runtime environments, and narratives (i.e., the traditional "science story"). I will conclude with some thoughts about the remaining challenges and opportunities to bridge the large conceptual gaps that continue to exist despite the recognition of problems of reproducibility and transparency in science.
ABOUT the Speaker. Bertram Ludäscher is a professor at the School of Information Sciences at the University of Illinois, Urbana-Champaign and a faculty affiliate with the National Center for Supercomputing Applications (NCSA) and the Department of Computer Science at Illinois. Until 2014 he was a professor at the Department of Computer Science at the University of California, Davis. His research interests range from practical questions in scientific data and workflow management, to database theory and knowledge representation and reasoning. Prior to his faculty appointments, he was a research scientist at the San Diego Supercomputer Center (SDSC) and an adjunct faculty at the CSE Department at UC San Diego. He received his M.S. (Dipl.-Inform.) in computer science from the University of Karlsruhe (now K.I.T.), and his PhD (Dr. rer. nat.) from the University of Freiburg, in Germany.
graph2tab, a library to convert experimental workflow graphs into tabular for... – Rothamsted Research, UK
A generic implementation of a method for producing spreadsheets out of pipeline graphs. See https://github.com/ISA-tools/graph2tab for details.
Presentation given to my group at EBI, on Feb 2, 2012.
Build, Branded and Coded - Placemaking in the Digital Era – Tom Beck
Our experience of place has always been a mash-up of the personal, social, natural, and manufactured environments. But what happens when an always-on layer of digital technology is added to the mix? This presentation explores three major themes at the intersection of placemaking and digital media and challenges us to consider the evolving role of design in a world where everything has the potential to become an interface.
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake – Walaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today's world, where data privacy and compliance are a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) they are auto-generated from declarative data annotations; (2) they respect user-level consent and preferences; (3) they are context-aware, encoding a different set of transformations for different use cases; (4) they are portable: while the SQL logic is implemented in only one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, AI, big data, real-time systems, robots, and Milvus.
A lively discussion with NJ Gen AI Meetup Lead Prasad and Procure.FYI's Co-Founder.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... – John Andrews
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Adjusting primitives for graph : SHORT REPORT / NOTES – Subhajit Sahu
Graph algorithms, like PageRank, often operate on the Compressed Sparse Row (CSR) format, an adjacency-list based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution vs OpenMP-based vector multiply.
2. Comparing various launch configs for CUDA-based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution vs OpenMP-based vector element sum.
2. Performance of memcpy vs in-place CUDA-based vector element sum.
3. Comparing various launch configs for CUDA-based vector element sum (memcpy).
4. Comparing various launch configs for CUDA-based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA-based vector element sum (in-place).
The Building Blocks of QuestDB, a Time Series Database – javier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first-class citizens, and we need rich time semantics to get the most out of our data. We also need to deal with ever-growing datasets while remaining performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read replicas, and faster batch ingestion.
17. Walter Hickey / BI
http://www.businessinsider.com/pie-charts-are-the-worst-2013-6
19. pie charts are the Aquaman of data visualization
20. pie charts are good at one thing: comparing 2-3 different data points with very different amounts of information
21. Walter Hickey / BI
http://visual.ly/impact-social-media-pr-industry-infographic // via @WTFViz
26. 2nd try: I study data and I transform it into some type of visual stuff.
28. 3rd and last try: Out there, there is a lot of data generated by people and the environment. Sometimes it is very scary to be put face to face with this giant amount of data. My job is to take all the information, understand it, and transform it into some type of interactive tool that simplifies the understanding of the data. Usually I generate a web application that can be used by people who have no knowledge of the data...
30. the answer: I like to generate order before people's brains try to do it on their own.
92. bubble sort O(n^2)
O(n) if the list is already sorted
always compare elements next to each other
also known as the “sinking sort”; “it has only a catchy name” (Donald Knuth)
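The bullets above can be sketched in JavaScript (the language of the talk's D3.js demos); `bubbleSort` is an illustrative helper, not code from the slides:

```javascript
// Bubble sort sketch: repeatedly swap adjacent out-of-order elements.
function bubbleSort(a) {
  const arr = a.slice();               // don't mutate the input
  let swapped = true;
  while (swapped) {                    // one pass with no swaps ⇒ sorted
    swapped = false;
    for (let i = 0; i < arr.length - 1; i++) {
      if (arr[i] > arr[i + 1]) {       // always compare neighbours
        [arr[i], arr[i + 1]] = [arr[i + 1], arr[i]];
        swapped = true;
      }
    }
  }
  return arr;
}

bubbleSort([5, 1, 4, 2, 8]); // → [1, 2, 4, 5, 8]
```

An already-sorted input needs a single pass with no swaps, which is the O(n) best case mentioned above.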
93. comb sort O(n^2)
improves bubble sort by eliminating turtles (small values near the end, which move slowly) and rabbits (large values near the start, which move quickly)
gap between compared elements is bigger than 1 and shrinks at every iteration
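A minimal JavaScript sketch of the shrinking-gap idea (the 1.3 shrink factor is a common illustrative choice, not taken from the slides):

```javascript
// Comb sort sketch: bubble sort with a gap that shrinks each pass.
function combSort(a) {
  const arr = a.slice();
  let gap = arr.length;
  let swapped = true;
  while (gap > 1 || swapped) {
    gap = Math.max(1, Math.floor(gap / 1.3)); // shrink the gap every iteration
    swapped = false;
    for (let i = 0; i + gap < arr.length; i++) {
      if (arr[i] > arr[i + gap]) {            // compare elements `gap` apart
        [arr[i], arr[i + gap]] = [arr[i + gap], arr[i]];
        swapped = true;
      }
    }
  }
  return arr;
}
```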
94. selection sort O(n^2)
search for the smallest element and put it in first position
inefficiently searches for the element to be placed at its right position in the list
at most n swaps are needed, so it is useful where swapping is expensive
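In JavaScript, the slide's description looks roughly like this (an illustrative sketch):

```javascript
// Selection sort sketch: find the smallest remaining element and
// swap it into place — one swap per position at most.
function selectionSort(a) {
  const arr = a.slice();
  for (let i = 0; i < arr.length - 1; i++) {
    let min = i;
    for (let j = i + 1; j < arr.length; j++) {
      if (arr[j] < arr[min]) min = j;  // scan the unsorted suffix
    }
    if (min !== i) [arr[i], arr[min]] = [arr[min], arr[i]];
  }
  return arr;
}
```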
95. insertion sort O(n^2)
makes space for the current item by moving larger items to the right
shifting all elements is expensive
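The shifting the slide refers to is visible in this JavaScript sketch (illustrative, not from the slides):

```javascript
// Insertion sort sketch: shift larger items right to open a slot
// for the current element.
function insertionSort(a) {
  const arr = a.slice();
  for (let i = 1; i < arr.length; i++) {
    const item = arr[i];
    let j = i - 1;
    while (j >= 0 && arr[j] > item) {
      arr[j + 1] = arr[j];  // the expensive part: shifting elements one by one
      j--;
    }
    arr[j + 1] = item;      // drop the item into the opened slot
  }
  return arr;
}
```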
96. shell sort O(n log² n) – O(n^(3/2))
variant of insertion sort based on pre-defined gaps
works on shrinking gaps, complexity based on the gaps
97. quick sort O(n log n)
divide and conquer algorithm
based on partitioning and pivot selection
all elements smaller than the pivot are moved before it, greater after it
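A compact JavaScript sketch of the partition-and-recurse idea; picking the last element as pivot is an illustrative choice, not a recommendation from the slides:

```javascript
// Quick sort sketch: partition around a pivot, then recurse on both sides.
function quickSort(a) {
  if (a.length <= 1) return a.slice();
  const pivot = a[a.length - 1];
  const smaller = [], greater = [];
  for (const x of a.slice(0, -1)) {
    // elements smaller than the pivot go before it, the rest after it
    (x < pivot ? smaller : greater).push(x);
  }
  return [...quickSort(smaller), pivot, ...quickSort(greater)];
}
```

This out-of-place version trades memory for clarity; in-place partitioning (e.g. Hoare or Lomuto schemes) is the usual production variant.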
98. heap sort O(n log n)
improves selection sort by selecting the largest element and placing it at the end
uses a heap (a binary tree) to rearrange the list
finding the next largest element takes O(log n)
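A JavaScript sketch of the heap-based selection the slide describes (illustrative implementation using an implicit array-backed binary max-heap):

```javascript
// Heap sort sketch: build a max-heap in the array, then repeatedly
// move the largest element to the end; each extraction is O(log n).
function heapSort(a) {
  const arr = a.slice();
  const siftDown = (start, end) => {
    let root = start;
    while (2 * root + 1 <= end) {
      let child = 2 * root + 1;                         // left child
      if (child + 1 <= end && arr[child + 1] > arr[child]) child++;
      if (arr[root] >= arr[child]) return;              // heap property holds
      [arr[root], arr[child]] = [arr[child], arr[root]];
      root = child;
    }
  };
  // heapify: sift down every internal node
  for (let i = Math.floor(arr.length / 2) - 1; i >= 0; i--) siftDown(i, arr.length - 1);
  // repeatedly move the current maximum to the end of the unsorted region
  for (let end = arr.length - 1; end > 0; end--) {
    [arr[0], arr[end]] = [arr[end], arr[0]];
    siftDown(0, end - 1);
  }
  return arr;
}
```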
99. inversions count
An inversion is a pair of positions in a sequence where the elements located there are out of their natural order.
It indicates the distance of that sequence from being sorted. A pair of elements (A[i], A[j]) of a sequence A is called an inversion if i < j and A[i] > A[j].
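The definition translates directly into code; this naive O(n²) JavaScript sketch counts every out-of-order pair (a merge-sort based variant would do it in O(n log n)):

```javascript
// Count pairs (i, j) with i < j and a[i] > a[j].
function countInversions(a) {
  let count = 0;
  for (let i = 0; i < a.length; i++) {
    for (let j = i + 1; j < a.length; j++) {
      if (a[i] > a[j]) count++; // this pair is out of natural order
    }
  }
  return count;
}

countInversions([1, 2, 3]); // → 0 (already sorted)
countInversions([3, 2, 1]); // → 3 (maximally unsorted for n = 3)
```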
107. d3.timer
to repeat a batch of operations with requestAnimationFrame
prevent path rendering from locking up the UI
d3.timer(function(elapsed) {
  element
    .selectAll("path")
    .attr("d", function(d, i) {
      // evaluate path_coordinates for this frame
      return path_coordinates;
    });
  // evaluate flag: return true to stop the timer at the end of the loop
  return flag;
});
108. attrTween(t) + stroke-dashoffset
to animate the drawing
The attrTween operator is used when you need a custom interpolator, such as one that understands the semantics of SVG path data.
.transition()
  .duration(DURATION)
  .attrTween("stroke-dashoffset", function(d, i) {
    // assumes stroke-dasharray is set to the path length
    var len = this.getTotalLength();
    return function(t) {
      return len * (1 - t); // offset shrinks from len to 0 as t goes 0 → 1
    };
  })
109. attrTween(t) + path.getPointAtLength(len*t)
to animate along a path
.transition()
  .duration(DURATION)
  .attrTween("transform", function(d) {
    return function(t) {
      var len = path.getTotalLength();
      var p = path.getPointAtLength(len * t);
      // move the element to the point at fraction t along the path
      return "translate(" + p.x + "," + p.y + ")";
    };
  })
111. and now
the best interpretation of
sorting algorithms EVER