Graph Traversal at 30 billion edges per second with NVIDIA GPUs: I will discuss current research on the MapGraph platform. MapGraph is a new and disruptive technology for ultra-fast processing of large graphs on commodity many-core hardware. On a single GPU you can analyze the bitcoin transaction graph in .35 seconds. With MapGraph on 64 NVIDIA K20 GPUs, you can traverse a scale-free graph of 4.3 billion directed edges in .13 seconds for a throughput of 32 Billion Traversed Edges Per Second (32 GTEPS). I will explain why GPUs are an interesting option for data intensive applications, how we map graphs onto many-core processors, and what the future looks like for the MapGraph platform.
MapGraph provides a familiar vertex-centric abstraction, but its GPU acceleration is 100s of times faster than main memory CPU-only technologies and up to 100,000 times faster than graph technologies based on MapReduce or key-value stores such as HBase, Titan, and Accumulo. Learn more athttp://MapGraph.io.
Jeff Johnson, Research Engineer, Facebook at MLconf NYCMLconf
Hacking GPUs for Deep Learning: GPUs have revolutionized machine learning in recent years, and have made both massive and deep multi-layer neural networks feasible. However, misunderstandings on why they seem to be winning persist. Many of deep learning’s workloads are in fact “too small” for GPUs, and require significantly different approaches to take full advantage of their power. There are many differences between traditional high-performance computing workloads, long the domain of GPUs, and those used in deep learning. This talk will cover these issues by looking into various quirks of GPUs, how they are exploited (or not) in current model architectures, and how Facebook AI Research is approaching deep learning programming through our recent work.
Ted Willke, Senior Principal Engineer, Intel Labs at MLconf NYCMLconf
You Thought What?! The Promise of Real-Time Brain Decoding: What can faster machine learning and new model-based approaches tell us about what someone is really thinking? Recently, Intel joined up with some of the pioneers of brain decoding to understand exactly that. Using functional MRI as our microscope, we began analyzing large amounts of high-dimensional 4-D image data to uncover brain networks that support cognitive processes. But existing image preprocessing, feature selection, and classification techniques are too slow and inaccurate to facilitate the most exciting breakthroughs. In this talk, we’ll discuss the promise of accurate real-time brain decoding and the computational headwinds. And we’ll look at some of the approaches to algorithms and optimization that Intel Labs and its partners are taking to reduce the barriers.
https://telecombcn-dl.github.io/2017-dlsl/
Winter School on Deep Learning for Speech and Language. UPC BarcelonaTech ETSETB TelecomBCN.
The aim of this course is to train students in methods of deep learning for speech and language. Recurrent Neural Networks (RNN) will be presented and analyzed in detail to understand the potential of these state of the art tools for time series processing. Engineering tips and scalability issues will be addressed to solve tasks such as machine translation, speech recognition, speech synthesis or question answering. Hands-on sessions will provide development skills so that attendees can become competent in contemporary data analytics tools.
A more deeper talk on the Transformer architecture from the webinar at NTR
https://www.ntr.ai/webinar/transformery
Google slides version: https://docs.google.com/presentation/d/1dIadh_nIszxXG8-672vJmvFGT6jBp0mOqzNV4g3e2Lc/edit?usp=sharing
Parallel implementation of geodesic distance transform with application in su...Tuan Q. Pham
This poster presents a parallel implementation of geodesic distance transform using OpenMP. This work forms part of a C implementation for geodesic superpixel segmentation of natural images. Presented at DICTA 2013 conference
Jeff Johnson, Research Engineer, Facebook at MLconf NYCMLconf
Hacking GPUs for Deep Learning: GPUs have revolutionized machine learning in recent years, and have made both massive and deep multi-layer neural networks feasible. However, misunderstandings on why they seem to be winning persist. Many of deep learning’s workloads are in fact “too small” for GPUs, and require significantly different approaches to take full advantage of their power. There are many differences between traditional high-performance computing workloads, long the domain of GPUs, and those used in deep learning. This talk will cover these issues by looking into various quirks of GPUs, how they are exploited (or not) in current model architectures, and how Facebook AI Research is approaching deep learning programming through our recent work.
Ted Willke, Senior Principal Engineer, Intel Labs at MLconf NYCMLconf
You Thought What?! The Promise of Real-Time Brain Decoding: What can faster machine learning and new model-based approaches tell us about what someone is really thinking? Recently, Intel joined up with some of the pioneers of brain decoding to understand exactly that. Using functional MRI as our microscope, we began analyzing large amounts of high-dimensional 4-D image data to uncover brain networks that support cognitive processes. But existing image preprocessing, feature selection, and classification techniques are too slow and inaccurate to facilitate the most exciting breakthroughs. In this talk, we’ll discuss the promise of accurate real-time brain decoding and the computational headwinds. And we’ll look at some of the approaches to algorithms and optimization that Intel Labs and its partners are taking to reduce the barriers.
https://telecombcn-dl.github.io/2017-dlsl/
Winter School on Deep Learning for Speech and Language. UPC BarcelonaTech ETSETB TelecomBCN.
The aim of this course is to train students in methods of deep learning for speech and language. Recurrent Neural Networks (RNN) will be presented and analyzed in detail to understand the potential of these state of the art tools for time series processing. Engineering tips and scalability issues will be addressed to solve tasks such as machine translation, speech recognition, speech synthesis or question answering. Hands-on sessions will provide development skills so that attendees can become competent in contemporary data analytics tools.
A more deeper talk on the Transformer architecture from the webinar at NTR
https://www.ntr.ai/webinar/transformery
Google slides version: https://docs.google.com/presentation/d/1dIadh_nIszxXG8-672vJmvFGT6jBp0mOqzNV4g3e2Lc/edit?usp=sharing
Parallel implementation of geodesic distance transform with application in su...Tuan Q. Pham
This poster presents a parallel implementation of geodesic distance transform using OpenMP. This work forms part of a C implementation for geodesic superpixel segmentation of natural images. Presented at DICTA 2013 conference
Deep Learning in Python with Tensorflow for FinanceBen Ball
Speaker: Ben Ball
Abstract: Python is becoming the de facto standard for many machine learning applications. We have been using Python with deep learning and other ML techniques, with a focus in prediction and exploitation in transactional markets. I am presenting one of our implementations (A dueling double DQN - a class of Reinforcement Learning algorithm) in Python using TensorFlow, along with information and background around the class of deep learning algorithm, and the application to financial markets we have employed. Attendees will learn how to implement a DQN using Tensorflow, and how to design a system for deep learning for solving a wide range of problems. The code will be available on github for attendees.
Bio: Ben is a believer in making a career out of what you love. He is inspired by the joining of excellent technology and research and likes building software that is easy to use, but does amazing things. His work has spanned 15 years, with a dual focus in AI Software Engineering and Algorithmic Trading. He is currently working as the CTO of http://prediction-machines.com
Video of the presentation:
https://engineers.sg/video/deep-learning-with-python-in-finance-singapore-python-user-group--1875
By Tobias Grosser, Scalable Parallel Computing Laboratory
The COSMO climate and weather model delivers daily forecasts for Switzerland and many other nations. As a traditional HPC application it was developed with SIMD-CPUs in mind and large manual efforts were required to enable the 2016 move to GPU acceleration. As today's high-performance computer systems increasingly rely on accelerators to reach peak performance and manual translation to accelerators is both costly and difficult to maintain, we propose a fully automatic accelerator compiler for the automatic translation of scientific Fortran codes to CUDA GPU accelerated systems. Several challenges had to be overcome to make this reality: 1) improved scalability, 2) automatic data placement using unified memory, 3) loop rescheduling to expose coarse-grained parallelism, 4) inter-procedural loop optimization, and 5) plenty of performance tuning. Our evaluation shows that end-to-end automatic accelerator compilation is possible for non-trivial portions of the COSMO climate model, despite the lack of complete static information. Non-trivial loop optimizations previously implemented manually are performed fully automatically and memory management happens fully transparently using unified memory. Our preliminary results show notable performance improvements over sequential CPU code (40s to 8s reduction in execution time) and we are currently working on closing the remaining gap to hand-tuned GPU code. This talk is a status update on our most recent efforts and also intended to gather feedback on future research plans towards automatically mapping COSMO to FPGAs.
Tobias Grosser Bio
Tobias Grosser is a senior researcher in the Scalable Parallel Computing Laboratory (SPCL) of Torsten Hoefler at the Computer Science Department of ETH Zürich. Supported by a Google PhD Fellowship he received his doctoral degree from Universite Pierre et Marie Curie under the supervision of Albert Cohen. Tobias' research is taking place at the border of low-level compilers and high-level program transformations with the goal of enabling complex - but highly-beneficial - program transformations in a production compiler environment. He develops with the Polly loop optimizer a loop transformation framework which today is a community project supported throught the Polly Labs research laboratory. Tobias also developed advanced tiling schemes for the efficient execution of iterated stencils. Today Tobias leads the heterogeneous compute efforts in the Swiss University funded ComPASC project and is about to start a three year NSF Ambizione project on advancing automatic compilation and heterogenization techniques at ETH Zurich.
Email
bgerofi@riken.jp
For more info on The Linaro High Performance Computing (HPC) visit https://www.linaro.org/sig/hpc/
Introduction For seq2seq(sequence to sequence) and RNNHye-min Ahn
This is my slides for introducing sequence to sequence model and Recurrent Neural Network(RNN) to my laboratory colleagues.
Hyemin Ahn, @CPSLAB, Seoul National University (SNU)
Learn about Tensorflow for Deep Learning now! Part 1Tyrone Systems
In this comprehensive workshop, learn how to use TensorFlow, how to build data pipelines and implement a simple deep learning model using Tensorflow Keras. Enhance your knowledge and skills by have better understanding of Tensorflow with all the resources we have available for you!
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2Tyrone Systems
For all that we're unable to attend or would like to recap our live webinar Deep Learning for Tensorflow Series part 2, we have all the information for you so would not miss out!
Crude-Oil Scheduling Technology: moving from simulation to optimizationBrenno Menezes
Scheduling technology either commercial or homegrown in today’s crude-oil refining industries relies on a complex simulation of scenarios where the user is solely responsible for making many different decisions manually in the search for feasible solutions over some limited time-horizon i.e., trial-and-error heuristics. As a normal outcome, schedulers abandon these solutions and then return to their simpler spreadsheet simulators due to: (i) time-consuming efforts to configure and manage numerous scheduling scenarios, and (ii) requirements of updating premises and situations that are constantly changing. Moving to solutions based in optimization rather than simulation, the lecture describes the future steps in the refactoring of the scheduling technology in PETROBRAS considering in separate the graphic user interface (GUI) and data communication developments (non-modeling related), and the modeling and process engineering related in an automated decision-making with built-in problem representation facilities and integrated data handling features among other techniques in a smart scheduling frontline.
Deep Learning in Python with Tensorflow for FinanceBen Ball
Speaker: Ben Ball
Abstract: Python is becoming the de facto standard for many machine learning applications. We have been using Python with deep learning and other ML techniques, with a focus in prediction and exploitation in transactional markets. I am presenting one of our implementations (A dueling double DQN - a class of Reinforcement Learning algorithm) in Python using TensorFlow, along with information and background around the class of deep learning algorithm, and the application to financial markets we have employed. Attendees will learn how to implement a DQN using Tensorflow, and how to design a system for deep learning for solving a wide range of problems. The code will be available on github for attendees.
Bio: Ben is a believer in making a career out of what you love. He is inspired by the joining of excellent technology and research and likes building software that is easy to use, but does amazing things. His work has spanned 15 years, with a dual focus in AI Software Engineering and Algorithmic Trading. He is currently working as the CTO of http://prediction-machines.com
Video of the presentation:
https://engineers.sg/video/deep-learning-with-python-in-finance-singapore-python-user-group--1875
By Tobias Grosser, Scalable Parallel Computing Laboratory
The COSMO climate and weather model delivers daily forecasts for Switzerland and many other nations. As a traditional HPC application it was developed with SIMD-CPUs in mind and large manual efforts were required to enable the 2016 move to GPU acceleration. As today's high-performance computer systems increasingly rely on accelerators to reach peak performance and manual translation to accelerators is both costly and difficult to maintain, we propose a fully automatic accelerator compiler for the automatic translation of scientific Fortran codes to CUDA GPU accelerated systems. Several challenges had to be overcome to make this reality: 1) improved scalability, 2) automatic data placement using unified memory, 3) loop rescheduling to expose coarse-grained parallelism, 4) inter-procedural loop optimization, and 5) plenty of performance tuning. Our evaluation shows that end-to-end automatic accelerator compilation is possible for non-trivial portions of the COSMO climate model, despite the lack of complete static information. Non-trivial loop optimizations previously implemented manually are performed fully automatically and memory management happens fully transparently using unified memory. Our preliminary results show notable performance improvements over sequential CPU code (40s to 8s reduction in execution time) and we are currently working on closing the remaining gap to hand-tuned GPU code. This talk is a status update on our most recent efforts and also intended to gather feedback on future research plans towards automatically mapping COSMO to FPGAs.
Tobias Grosser Bio
Tobias Grosser is a senior researcher in the Scalable Parallel Computing Laboratory (SPCL) of Torsten Hoefler at the Computer Science Department of ETH Zürich. Supported by a Google PhD Fellowship he received his doctoral degree from Universite Pierre et Marie Curie under the supervision of Albert Cohen. Tobias' research is taking place at the border of low-level compilers and high-level program transformations with the goal of enabling complex - but highly-beneficial - program transformations in a production compiler environment. He develops with the Polly loop optimizer a loop transformation framework which today is a community project supported throught the Polly Labs research laboratory. Tobias also developed advanced tiling schemes for the efficient execution of iterated stencils. Today Tobias leads the heterogeneous compute efforts in the Swiss University funded ComPASC project and is about to start a three year NSF Ambizione project on advancing automatic compilation and heterogenization techniques at ETH Zurich.
Email
bgerofi@riken.jp
For more info on The Linaro High Performance Computing (HPC) visit https://www.linaro.org/sig/hpc/
Introduction For seq2seq(sequence to sequence) and RNNHye-min Ahn
This is my slides for introducing sequence to sequence model and Recurrent Neural Network(RNN) to my laboratory colleagues.
Hyemin Ahn, @CPSLAB, Seoul National University (SNU)
Learn about Tensorflow for Deep Learning now! Part 1Tyrone Systems
In this comprehensive workshop, learn how to use TensorFlow, how to build data pipelines and implement a simple deep learning model using Tensorflow Keras. Enhance your knowledge and skills by have better understanding of Tensorflow with all the resources we have available for you!
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2Tyrone Systems
For all that we're unable to attend or would like to recap our live webinar Deep Learning for Tensorflow Series part 2, we have all the information for you so would not miss out!
Crude-Oil Scheduling Technology: moving from simulation to optimizationBrenno Menezes
Scheduling technology either commercial or homegrown in today’s crude-oil refining industries relies on a complex simulation of scenarios where the user is solely responsible for making many different decisions manually in the search for feasible solutions over some limited time-horizon i.e., trial-and-error heuristics. As a normal outcome, schedulers abandon these solutions and then return to their simpler spreadsheet simulators due to: (i) time-consuming efforts to configure and manage numerous scheduling scenarios, and (ii) requirements of updating premises and situations that are constantly changing. Moving to solutions based in optimization rather than simulation, the lecture describes the future steps in the refactoring of the scheduling technology in PETROBRAS considering in separate the graphic user interface (GUI) and data communication developments (non-modeling related), and the modeling and process engineering related in an automated decision-making with built-in problem representation facilities and integrated data handling features among other techniques in a smart scheduling frontline.
ScaleGraph - A High-Performance Library for Billion-Scale Graph AnalyticsToyotaro Suzumura
Please cite the following paper:
Toyotaro Suzumura and Koji. Ueno, "ScaleGraph: A high-performance library for billion-scale graph analytics," 2015 IEEE International Conference on Big Data (Big Data), Santa Clara, CA, 2015, pp. 76-84.
doi: 10.1109/BigData.2015.7363744
Recently, large-scale graph analytics has become a very popular topic owing to the emergence of gigantic graphs whose number of vertices and edges is in millions, billions or even trillions. Many graph analytics libraries and frameworks have been proposed with various computational models and programming languages to deal with such graphs. X10 programming language is a PGAS language that aims at both software performance and programmer's productivity. We introduce ScaleGraph library developed using X10 programming to illustrate the use of X10 for large-scale graph analytics. ScaleGraph library provides XPregel framework that is inspired by Google's Pregel computation model, serving as a building block for implementing graph kernels. We also optimized X10 runtime in some parts such as collective communication and memory management. We evaluated the performance and scalability of ScaleGraph libraries. The result shows that most graph kernels have good performance and scalability. ScaleGraph library is 9.4 times faster than Giraph in the experiment of PageRank with 16 machine nodes. To the best of our knowledge, ScaleGraph is the first X10-based library to address performance, scalability and productivity issues in dealing with large-scale graph analytics.
Global Knowledge Collaboration to Cure Cancer: How GPUs Impact Graph & Predictive Analytics
Brad Bebee, CEO of Blazegraph
Video of this session at the Database Camp conference at the UN is on http://www.Database.Camp
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Airbnb, Bloomberg, Comcast, Facebook, FINRA, LinkedIn, Lyft, Netflix, Twitter, and Uber, in the last few years Presto experienced an unprecedented growth in popularity in both on-premises and cloud deployments over Object Stores, HDFS, NoSQL and RDBMS data stores.
With the ever-growing list of connectors to new data sources such as Azure Blob Storage, Elasticsearch, Netflix Iceberg, Apache Kudu, and Apache Pulsar, recently introduced Cost-Based Optimizer in Presto must account for heterogeneous inputs with differing and often incomplete data statistics. This talk will explore this topic in detail as well as discuss best use cases for Presto across several industries. In addition, we will present recent Presto advancements such as Geospatial analytics at scale and the project roadmap going forward.
CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonza...AMD Developer Central
Presentation CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin at the AMD Developer Summit (APU13) Nov. 11-13, 2013.
Results of the GPUs for GEC Competition held at GECCO 2013.
Organizers
Daniele Loiacono, Politecnico di Milano
Antonino Tumeo, Pacific Northwest National Laboratory
Webpage
http://gpu.geccocompetitions.com
Deep learning algorithms have benefited greatly from the recent performance gains of GPUs. However, it has been unclear whether GPUs can speed up machine learning algorithms such as generalized linear modeling, random forests, gradient boosting machines, and clustering. H2O.ai, the leading open source AI company, is bringing the best-of-breed data science and machine learning algorithms to GPUs.
We introduce H2O4GPU, a fully featured machine learning library that is optimized for GPUs with a robust python API that is drop dead replacement for scikit-learn. We'll demonstrate benchmarks for the most common algorithms relevant to enterprise AI and showcase performance gains as compared to running on CPUs.
Jon’s Bio:
https://umdphysics.umd.edu/people/faculty/current/item/337-jcm.html
Please view the video here:
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDESSubhajit Sahu
For the IPDPS ParSocial event a presentation submission is required by 15th May. The event is on 3rd June.
https://gist.github.com/wolfram77/51b15ca09eb28f6909673a2deb1a314d
DYNAMIC BATCH PARALLEL
ALGORITHMS FOR UPDATING
PAGERANK
Subhajit Sahut, Kishore Kothapallit and Dip Sankar Banerjeet
tInternational Institute of Information Technology Hyderabad, India.
tIndian Institute of Technology Jodhpur, India.
subhajit.sahu@research. ,kkishore@iiit.ac.in, dipsankarb@iitj.ac.in
This work is partially supported by a grant from the Department of Science and Technology (DST), India, under the
National Supercomputing Mission (NSM) R&D in Exascale initiative vide Ref. No: DST/NSM/R&D Exascale/2021/16.
FACEBOOK 15 TAKING A PAGE OUT
OF GOOGLE’S PLAYBOOK 10 STOP
FAKE NEWS FROM GOING VIRAL
PUBLISHED APR 2015 BY SALVADOR RODRIGUEZ
Click-Gap: When is Facebook
is driving disproportionate
amounts of traffic to
websites.
Effort to rid fakes news
from Facebook’s services.
Is a website relying on
Facebook to drive
significant traffic, but not
well ranked by the rest of
the web?
Also News Citation Graph.
PAGERANK APPLICATIONS
Ranking of websites.
Measuring scientific impact of researchers.
Finding the best teams and athletes.
Ranking companies by talent concentration.
Predicting road/foot traffic in urban spaces.
Analysing protein networks.
Finding the most authoritative news sources
Identifying parts of brain that change jointly.
Toxic waste management.
PAGERANK APPLICATIONS
Debugging complex software systems (Moni torRank)
Finding the most original writers (BookRank)
Finding topical authorities (TwitterRank)
WHAT IS PAGERANK
l—-d
Plu = Cus + ——
UCIiNny
Pru
u->v = (1-—d) x
“us ( ) outdegy,
PageRank is a lLink-analysis algorithm.
By Larry Page and Sergey Brin in 1996.
For ordering information on the web.
Represented with a random-surfer model.
Rank of a page is defined recursively.
Calculate iteratively with power-iteration.
OpenPOWER summit NA 2019 topics.
Contents cited from here:
https://events19.linuxfoundation.org/events/openpower-summit-north-america-2019/program/schedule/
GraphChi (Michael Leznik, Head of BI - London, King)
GraphChi, a disk-based system for computing efficiently on graphs with billions of edges. By using a well-known method to break large graphs into small parts, and a novel parallel sliding windows method, GraphChi is able to execute several advanced data mining, graph mining, and machine learning algorithms on very large graphs, using just a single consumer-level computer.
Big Stream Processing Systems, Big GraphsPetr Novotný
Big Data, a recent phenomenon. Everyone talks about it, but do you really know what Big Data is? Join our four-part series about Big Data and you will get answers to your questions!
We will cover Introduction to Big Data and available platforms which we can use to deal with Big Data. And in the end, we are going to give you an insight into the possible future of dealing with Big Data.
After the two previous episodes you know the basics about Big Data. Yet, it might get a bit more complicated than that. Usually when you have to deal with data which is generated in real-time. In this case, you are dealing with Big Stream.
This episode of our series will be focussed on processing systems capable of dealing with Big Streams. But analysing data lacking graphical representation will not be very convenient for us. And this is where we have to use a platform capable of visualising Big Graphs. All these topics will be covered in today’s presentation.
#CHEDTEB
www.chedteb.eu
String Comparison Surprises: Did Postgres lose my data?Jeremy Schneider
Comparisons are fundamental to computing - and comparing strings is not nearly as straightforward as you might think. Come learn about the history, nuance and surprises of “putting words in order” that you never knew existed in computer science, and how that nuance impacts both general programming and SQL programming. Next, walk through a few actual scenarios and demonstrations using PostgreSQL as a user and administrator, which you can re-run yourself later for further study, including one way you could easily corrupt your self-managed PostgreSQL database if you aren't prepared. Finally we’ll dive into an explanation of the surprising behaviors we saw in PostgreSQL, and learn more about user and administrative features PostgreSQL provides related to localized string comparison.
Similar to Bryan Thompson, Chief Scientist and Founder at SYSTAP, LLC at MLconf NYC (20)
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...MLconf
Understanding Human Impact: Social and Equity Assessments for AI Technologies
Social and Equity Impact Assessments have broad applications but can be a useful tool to explore and mitigate for Machine Learning fairness issues and can be applied to product specific questions as a way to generate insights and learnings about users, as well as impacts on society broadly as a result of the deployment of new and emerging technologies.
In this presentation, my goal is to advocate for and highlight the need to consult community and external stakeholder engagement to develop a new knowledge base and understanding of the human and social consequences of algorithmic decision making and to introduce principles, methods and process for these types of impact assessments.
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingMLconf
The Brain’s Guide to Dealing with Context in Language Understanding
Like the visual cortex, the regions of the brain involved in understanding language represent information hierarchically. But whereas the visual cortex organizes things into a spatial hierarchy, the language regions encode information into a hierarchy of timescale. This organization is key to our uniquely human ability to integrate semantic information across narratives. More and more, deep learning-based approaches to natural language understanding embrace models that incorporate contextual information at varying timescales. This has not only led to state-of-the art performance on many difficult natural language tasks, but also to breakthroughs in our understanding of brain activity.
In this talk, we will discuss the important connection between language understanding and context at different timescales. We will explore how different deep learning architectures capture timescales in language and how closely their encodings mimic the brain. Along the way, we will uncover some surprising discoveries about what depth does and doesn’t buy you in deep recurrent neural networks. And we’ll describe a new, more flexible way to think about these architectures and ease design space exploration. Finally, we’ll discuss some of the exciting applications made possible by these breakthroughs.
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...MLconf
Applying Computer Vision to Reduce Contamination in the Recycling Stream
With China’s recent refusal of most foreign recyclables, North American waste haulers are scrambling to figure out how to make on-shore recycling cost-effective in order to continue providing recycling services. Recyclables that were once being shipped to China for manual sorting are now primarily being redirected to landfills or incinerators. Without a solution, a nearly $5 billion annual recycling market could come to a halt.
Purity in the recycling stream is key to this effort as contaminants in the stream can increase the cost of operations, damage equipment and reduce the ability to create pure commodities suitable for creating recycled goods. This market disruption as a result of China’s new regulations, however, provides us the chance to re-examine and improve our current disposal & collection habits with modern monitoring & artificial intelligence technology.
Using images from our in-dumpster cameras, Compology has developed an ML-based process that helps identify, measure and alert for contaminants in recycling containers before they are picked-up, helping keep the recycling stream clean.
Our convolutional neural network flags potential instances of contamination inside a dumpster, enabling garbage haulers to know which containers have the wrong type of material inside. This allows them to provide targeted, timely education, and when appropriate, assess fines, to improve recycling compliance at the businesses and residences they serve, helping keep recycling services financially viable.
In this presentation, we will walk through our ML-based contamination measurement and scoring process by showing how Waste Management, a national waste hauler, has experienced 57% contamination reduction in nearly 2,000 containers over six months, This progress shows significant strides towards financially viable recycling services.
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushMLconf
Quantum Computing: a Treasure Hunt, not a Gold Rush
Quantum computers promise a significant step up in computational power over conventional computers, but also suffer a number of counterintuitive limitations --- both in their computational model and in leading lab implementations. In this talk, we review how quantum computers compete with conventional computers and how conventional computers try to hold their ground. Then we outline what stands in the way of successful quantum ML applications.
Josh Wills - Data Labeling as Religious ExperienceMLconf
Data Labeling as Religious Experience
One of the most common places to deploy a production machine learning systems is as a replacement for a legacy rules-based system that is having a hard time keeping up with new edge cases and requirements. I'll be walking through the process and tooling we used to help us design, train, and deploy a model to replace a set of static rules we had for handling invite spam at Slack, talk about what we learned, and discuss some problems to solve in order to make these migrations easier for everyone.
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...MLconf
Project GaitNet: Ushering in the ImageNet moment for human Gait kinematics
The emergence of the upright human bipedal gait can be traced back 4 to 2.8 million years ago, to the now extinct hominin Australopithecus afarensis. Fine grained analysis of gait using the modern MEMS sensors found on all smartphones not just reveals a lot about the person’s orthopedic and neuromuscular health status, but also has enough idiosyncratic clues that it can be harnessed as a passive biometric. While there were many siloed attempts made by the machine learning community to model Bipedal Gait sensor data, these were done with small datasets oft collected in restricted academic environs. In this talk, we will introduce the ImageNet moment for human gait analysis by presenting 'Project GaitNet', the largest ever planet-sized motion sensor based human bipedal gait dataset ever curated. We’ll also present the associated state-of-the-art results in classifying humans harnessing novel deep neural architectures and the related success stories we have enjoyed in transfer-learning into disparate domains of human kinematics analysis.
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...MLconf
Machine Learning Methods in Detecting Alzheimer’s Disease from Speech and Language
Alzheimer's disease affects millions of people worldwide, and it is important to predict the disease as early and as accurate as possible. In this talk, I will discuss development of novel ML models that help classifying healthy people from those who develop Alzheimer's, using short samples of human speech. As an input to the model, features of different modalities are extracted from speech audio samples and transcriptions: (1) syntactic measures, such as e.g. production rules extracted from syntactic parse trees, (2) lexical measures, such as e.g. features of lexical richness and complexity and lexical norms, and (3) acoustic measures, such as e.g. standard Mel-frequency cepstral coefficients. I will present the ML model that detects cognitive impairment by reaching agreement among modalities. The resulting model is able to achieve state of the art performance in both supervised and semi-supervised manner, using manual transcripts of human speech. Additionally, I will discuss potential limitations of any fully-automated speech-based Alzheimer's disease detection model, focusing mostly on the analysis of the impact of a not-so-accurate automatic speech recognition (ASR) on the classification performance. To illustrate this, I will present the experiments with controlled amounts of artificially generated ASR errors and explain how the deletion errors affect Alzheimer's detection performance the most, due to their impact on the features of syntactic and lexical complexity.
Meghana Ravikumar - Optimized Image Classification on the CheapMLconf
Optimized Image Classification on the Cheap
In this talk, we anchor on building an image classifier trained on the Stanford Cars dataset to evaluate two approaches to transfer learning -fine tuning and feature extraction- and the impact of hyperparameter optimization on these techniques. Once we define the most performant transfer learning technique for Stanford Cars, we will double the size of the dataset through image augmentation to boost the classifier’s performance. We will use Bayesian optimization to learn the hyperparameters associated with image transformations using the downstream image classifier’s performance as the guide. In conjunction with model performance, we will also focus on the features of these augmented images and the downstream implications for our image classifier.
To both maximize model performance on a budget and explore the impact of optimization on these methods, we apply a particularly efficient implementation of Bayesian optimization to each of these architectures in this comparison. Our goal is to draw on a rigorous set of experimental results that can help us answer the question: how can resource-constrained teams make trade-offs between efficiency and effectiveness using pre-trained models?
Noam Finkelstein - The Importance of Modeling Data CollectionMLconf
The Importance of Modeling Data Collection
Data sets used in machine learning are often collected in a systematically biased way - certain data points are more likely to be collected than others. We call this "observation bias". For example, in health care, we are more likely to see lab tests when the patient is feeling unwell than otherwise. Failing to account for observation bias can, of course, result in poor predictions on new data. By contrast, properly accounting for this bias allows us to make better use of the data we do have.
In this presentation, we discuss practical and theoretical approaches to dealing with observation bias. When the nature of the bias is known, there are simple adjustments we can make to nonparametric function estimation techniques, such as Gaussian Process models. We also discuss the scenario where the data collection model is unknown. In this case, there are steps we can take to estimate it from observed data. Finally, we demonstrate that having a small subset of data points that are known to be collected at random - that is, in an unbiased way - can vastly improve our ability to account for observation bias in the rest of the data set.
My hope is that attendees of this presentation will be aware of the perils of observation bias in their own work, and be equipped with tools to address it.
The Uncanny Valley of ML
Every so often, the conundrum of the Uncanny Valley re-emerges as advanced technologies evolve from clearly experimental products to refined accepted technologies. We have seen its effects in robotics, computer graphics, and page load times. The debate of how to handle the new technology detracts from its benefits. When machine learning is added to human decision systems a similar effect can be measured in increased response time and decreased accuracy. These systems include radiology, judicial assignments, bus schedules, housing prices, power grids and a growing variety of applications. Unfortunately, the Uncanny Valley of ML can be hard to detect in these systems and can lead to degraded system performance when ML is introduced, at great expense. Here, we'll introduce key design principles for introducing ML into human decision systems to navigate around the Uncanny Valley and avoid its pitfalls.
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksMLconf
Deep Learning Architectures for Semantic Relation Detection Tasks
Recognizing and distinguishing specific semantic relations from other types of semantic relations is an essential part of language understanding systems. Identifying expressions with similar and contrasting meanings is valuable for NLP systems which go beyond recognizing semantic relatedness and require to identify specific semantic relations. In this talk, I will first present novel techniques for creating labelled datasets required for training deep learning models for classifying semantic relations between phrases. I will further present various neural network architectures that integrate morphological features into integrated path-based and distributional relation detection algorithms and demonstrate that this model outperforms state-of-the-art models in distinguishing semantic relations and is capable of efficiently handling multi-word expressions.
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...MLconf
Building an Incrementally Trained, Local Taste Aware, Global Deep Learned Recommender System Model
At Netflix, our main goal is to maximize our members’ enjoyment of the selected show by minimizing the amount of time it takes for them to find it. We try to achieve this goal by personalizing almost all the aspects of our product -- from what shows to recommend, to how to present these shows and construct their home-pages to what images to select per show, among many other things. Everything is recommendations for us and as an applied Machine Learning group, we spend our time building models for personalization that will eventually increase the joy and satisfaction of our members. In this talk we will primarily focus our attention on a) making a global deep learned recommender model that is regional tastes and popularity aware and b) adapting this model to changing taste preferences as well as dynamic catalog availability.
We will first go through some standard recommender system models that use Matrix Factorization and Topic Models and then compare and contrast them with more powerful and higher capacity deep learning based models such as sequence models that use recurrent neural networks. We will show what it entails to build a global model that is aware of regional taste preferences and catalog availability. We will show how models that are built on simple Maximum Likelihood principle fail to do that. We will then describe one solution that we have employed in order to enable the global deep learned models to focus their attention on capturing regional taste preferences and changing catalog.In the latter half of the talk, we will discuss how we do incremental learning of deep learned recommender system models. Why do we need to do that ? Everything changes with time. Users’ tastes change with time. What’s available on Netflix and what’s popular also change over time. Therefore, updating or improving recommendation systems over time is necessary to bring more joy to users. In addition to how we apply incremental learning, we will discuss some of the challenges we face involving large-scale data preparation, infrastructure setup for incremental model training as well as pipeline scheduling. The incremental training enables us to serve fresher models trained on fresher and larger amounts of data. This helps our recommender system to nicely and quickly adapt to catalog and users’ taste changes, and improve overall performance.
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldMLconf
Vito Ostuni - The Voice: New Challenges in a Zero UI World
The adoption of voice-enabled devices has seen an explosive growth in the last few years and music consumption is among the most popular use cases. Music personalization and recommendation plays a major role at Pandora in providing a daily delightful listening experience for millions of users. In turn, providing the same perfectly tailored listening experience through these novel voice interfaces brings new interesting challenges and exciting opportunities. In this talk we will describe how we apply personalization and recommendation techniques in three common voice scenarios which can be defined in terms of request types: known-item, thematic, and broad open-ended. We will describe how we use deep learning slot filling techniques and query classification to interpret the user intent and identify the main concepts in the query.
We will also present the differences and challenges regarding evaluation of voice powered recommendation systems. Since pure voice interfaces do not contain visual UI elements, relevance labels need to be inferred through implicit actions such as play time, query reformulations or other types of session level information. Another difference is that while the typical recommendation task corresponds to recommending a ranked list of items, a voice play request translates into a single item play action. Thus, some considerations about closed feedback loops need to be made. In summary, improving the quality of voice interactions in music services is a relatively new challenge and many exciting opportunities for breakthroughs still remain. There are many new aspects of recommendation system interfaces to address to bring a delightful and effortless experience for voice users. We will share a few open challenges to solve for the future.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.