A review of fast techniques for computing the distance between any two points in a graph. Applications span many fields such as telecom, internet routing, social network analysis, etc.
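As a concrete illustration (our own sketch, not taken from the review itself), breadth-first search is the textbook way to compute the hop-count distance between two nodes of an unweighted graph:

```python
from collections import deque

def bfs_distance(graph, source, target):
    """Shortest hop-count between two nodes of an unweighted graph via BFS."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr == target:
                return dist + 1
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None  # target unreachable from source

# toy network with edges A-B, B-C, C-D, D-A
g = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["A", "C"]}
print(bfs_distance(g, "A", "C"))  # → 2
```

Weighted graphs would need Dijkstra or, for repeated all-pairs queries at scale, the precomputed distance oracles such reviews typically cover.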
MLconf - Distributed Deep Learning for Classification and Regression Problems... - Sri Ambati
Video recording (no audio?): http://new.livestream.com/accounts/7874891/events/3565981/videos/68114143 from 32:00 to 54:30
Deep Learning has been dominating recent machine learning competitions with better predictions. Unlike the neural networks of the past, modern Deep Learning methods have cracked the code for training stability and generalization. Deep Learning is not only the leader in image and speech recognition tasks, but is also emerging as the algorithm of choice for highest predictive performance in traditional business analytics. This talk introduces Deep Learning and implementation concepts in the open-source H2O in-memory prediction engine. Designed for the solution of business-critical problems on distributed compute clusters, it offers advanced features such as adaptive learning rate, dropout regularization, parameter tuning and a fully-featured R interface. World record performance on the classic MNIST dataset, best-in-class accuracy for a high-dimensional eBay text classification problem and other relevant datasets showcase the power of this game-changing technology. A whole new ecosystem of Intelligent Applications is emerging with Deep Learning at its core.
Bio:
Prior to joining 0xdata as Physicist & Hacker, Arno was a founding Senior MTS at Skytree where he designed and implemented high-performance machine learning algorithms. He has over a decade of experience in HPC with C++/MPI and had access to the world’s largest supercomputers as a Staff Scientist at SLAC National Accelerator Laboratory where he participated in US DOE scientific computing initiatives. While at SLAC, he authored the first curvilinear finite-element simulation code for space-charge dominated relativistic free electrons and scaled it to thousands of compute nodes. He also led a collaboration with CERN to model the electromagnetic performance of CLIC, a ginormous e+e- collider and potential successor of LHC. Arno has authored dozens of scientific papers and was a sought-after academic conference speaker. He holds a PhD and Masters summa cum laude in Physics from ETH Zurich. Arno was named 2014 Big Data All-Star by Fortune Magazine.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Deep Learning in the Wild with Arno Candel - Sri Ambati
"Deep Learning in the Wild" Meetup at H2O, Mountain View
Livestream: http://t.co/o7p2hYcWgy (includes part 2 with Alex Tellez)
Suggestions:
1) For best quality, download the PDF before viewing.
2) Open at least two windows: one for the YouTube video, one for the screencast (link below), and optionally one for the slides themselves.
3) The YouTube video is shown on the first page of the slide deck; for the slides, skip to page 2.
Screencast: http://youtu.be/VoL7JKJmr2I
Video recording: http://youtu.be/CJRvb8zxRdE (Thanks to Al Friedrich!)
In this talk, we put Deep Learning to the test on real-world data puzzles.
Data:
- Higgs binary classification dataset (10M rows, 29 cols)
- MNIST 10-class dataset
- Weather categorical dataset
- eBay text classification dataset (8500 cols, 500k rows, 467 classes)
- ECG heartbeat anomaly detection
Scalable Data Science and Deep Learning with H2O
In this session, we introduce the H2O data science platform. We will explain its scalable in-memory architecture and design principles and focus on the implementation of distributed deep learning in H2O. Advanced features such as adaptive learning rates, various forms of regularization, automatic data transformations, checkpointing, grid-search, cross-validation and auto-tuning turn multi-layer neural networks of the past into powerful, easy-to-use predictive analytics tools accessible to everyone. We will present a broad range of use cases and live demos that include world-record deep learning models, anomaly detection tools and approaches for Kaggle data science competitions. We also demonstrate the applicability of H2O in enterprise environments for real-world customer production use cases.
By the end of the hands-on session, attendees will have learned to perform end-to-end data science workflows with H2O using both the easy-to-use web interface and the flexible R interface. We will cover data ingest, basic feature engineering, feature selection, hyperparameter optimization with N-fold cross-validation, multi-model scoring and taking models into production. We will train supervised and unsupervised methods on realistic datasets. With best-of-breed machine learning algorithms such as elastic net, random forest, gradient boosting and deep learning, you will be able to create your own smart applications.
A local installation of RStudio is recommended for this session.
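The N-fold cross-validation mentioned above partitions the rows so that every row serves as a validation row exactly once. A minimal sketch of the splitting idea in pure Python (illustrative only, not H2O's actual implementation):

```python
def kfold_indices(n_rows, k):
    """Split row indices 0..n_rows-1 into k roughly equal folds and
    yield (train_idx, valid_idx) pairs, as used in N-fold cross-validation."""
    folds = [list(range(i, n_rows, k)) for i in range(k)]
    for i, valid in enumerate(folds):
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        yield train, valid

# 10 rows, 5 folds: each row is validated exactly once across the folds
splits = list(kfold_indices(10, 5))
```

A model is then trained k times, each time scored on the held-out fold, and the k scores are averaged to estimate out-of-sample performance.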
Alex Tellez's slides on Deep Learning Applications, including using auto-encoders, finding better Bordeaux wine, and fighting crime in Chicago, from the 3/11/15 Meetup at H2O.ai HQ and the 3/12/15 Meetup at Mills College.
H2O Distributed Deep Learning by Arno Candel 071614 - Sri Ambati
Deep Learning R Vignette Documentation: https://github.com/0xdata/h2o/tree/master/docs/deeplearning/
Deep Learning has been dominating recent machine learning competitions with better predictions. Unlike the neural networks of the past, modern Deep Learning methods have cracked the code for training stability and generalization. Deep Learning is not only the leader in image and speech recognition tasks, but is also emerging as the algorithm of choice in traditional business analytics.
This talk introduces Deep Learning and implementation concepts in the open-source H2O in-memory prediction engine. Designed for the solution of enterprise-scale problems on distributed compute clusters, it offers advanced features such as adaptive learning rate, dropout regularization and optimization for class imbalance. World record performance on the classic MNIST dataset, best-in-class accuracy for eBay text classification and others showcase the power of this game-changing technology. A whole new ecosystem of Intelligent Applications is emerging with Deep Learning at its core.
About the Speaker: Arno Candel
Prior to joining 0xdata as Physicist & Hacker, Arno was a founding Senior MTS at Skytree where he designed and implemented high-performance machine learning algorithms. He has over a decade of experience in HPC with C++/MPI and had access to the world's largest supercomputers as a Staff Scientist at SLAC National Accelerator Laboratory where he participated in US DOE scientific computing initiatives. While at SLAC, he authored the first curvilinear finite-element simulation code for space-charge dominated relativistic free electrons and scaled it to thousands of compute nodes.
He also led a collaboration with CERN to model the electromagnetic performance of CLIC, a ginormous e+e- collider and potential successor of LHC. Arno has authored dozens of scientific papers and was a sought-after academic conference speaker. He holds a PhD and Masters summa cum laude in Physics from ETH Zurich.
- The document describes a presentation on deep learning given by Arno Candel of H2O.ai.
- The presentation covered deep learning methods and implementations, results from case studies in Higgs boson classification, handwritten digit recognition, and text classification.
- It also demonstrated H2O's scalability and the ability of its deep learning algorithm to achieve state-of-the-art results on benchmark datasets.
H2O Open Source Deep Learning, Arno Candel 03-20-14 - Sri Ambati
More information in our Deep Learning webinar: http://www.slideshare.net/0xdata/h2-o-deeplearningarnocandel052114
Latest slide deck: http://www.slideshare.net/0xdata/h2o-distributed-deep-learning-by-arno-candel-071614
Ted Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SF - MLconf
Abstract: How graphs became just another big data primitive
Graph-shaped data is used in product recommendation systems, social network analysis, network threat detection, image de-noising, and many other important applications. And, a growing number of these applications will benefit from parallel distributed processing for graph feature engineering, model training, and model serving. But today’s graph tools are riddled with limitations and shortcomings, such as a lack of language bindings, streaming support, and seamless integration with other popular data services. In this talk, we’ll argue that the key to doing more with graphs is doing less with specialized systems and more with systems already good at handling data of other shapes. We’ll examine some practical data science workflows to further motivate this argument and we’ll talk about some of the things that Intel is doing with the open source community and industry to make graphs just another big data primitive.
How to win data science competitions with Deep Learning - Sri Ambati
This document summarizes a presentation about how to win data science competitions using deep learning with H2O. It discusses H2O's architecture and capabilities for deep learning. It then demonstrates live modeling on Kaggle competitions, providing step-by-step explanations of building and evaluating deep learning models on three different datasets - an African soil properties prediction challenge, a display advertising challenge, and a Higgs boson machine learning challenge. It concludes with tips and tricks for deep learning with H2O and an invitation to the H2O World conference.
Deep Learning with Python: Getting started and getting from ideas to insights in minutes.
PyData Seattle 2015
Alex Korbonits (@korbonits)
This presentation was given July 25, 2015 at the PyData Seattle conference hosted by PyData and NumFocus.
Mining Frequent Closed Graphs on Evolving Data Streams - Albert Bifet
Graph mining is a challenging task by itself, and even more so when processing data streams which evolve in real-time. Data stream mining faces hard constraints regarding time and space for processing, and also needs to provide for concept drift detection. In this talk we present a framework for studying graph pattern mining on time-varying streams and large datasets.
Note: Make sure to download the slides to get the high-resolution version!
Also, you can find the webinar recording here (please also download for better quality): https://www.dropbox.com/s/72qi6wjzi61gs3q/H2ODeepLearningArnoCandel052114.mov
Come hear how Deep Learning in H2O is unlocking never before seen performance for prediction!
H2O is a Google-scale, open-source machine learning engine for R and Big Data. Enterprises can now use all of their data without sampling and build intelligent applications. This live webinar introduces Distributed Deep Learning concepts, implementation and results from recent developments. Real-world classification and regression use cases from the eBay text dataset, MNIST handwritten digits and cancer datasets showcase the power of this game-changing technology.
The document provides an overview of deep learning and reinforcement learning. It discusses the current state of artificial intelligence and machine learning, including how deep learning algorithms have achieved human-level performance in various tasks such as image recognition and generation. Reinforcement learning is introduced as learning through trial-and-error interactions with an environment to maximize rewards. Examples are given of reinforcement learning algorithms solving tasks like playing Atari games.
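The trial-and-error idea behind reinforcement learning can be boiled down to a multi-armed bandit, far simpler than the Atari-scale examples above. A hedged sketch of epsilon-greedy action selection (our illustration, with made-up arm rewards):

```python
import random

def eps_greedy_bandit(means, steps=5000, eps=0.1, seed=0):
    """Epsilon-greedy bandit: explore a random arm with probability eps,
    otherwise exploit the arm with the highest running mean reward."""
    rng = random.Random(seed)
    counts = [0] * len(means)
    values = [0.0] * len(means)
    for _ in range(steps):
        if rng.random() < eps:
            a = rng.randrange(len(means))          # explore
        else:
            a = max(range(len(means)), key=lambda i: values[i])  # exploit
        reward = rng.gauss(means[a], 1.0)          # noisy reward from arm a
        counts[a] += 1
        values[a] += (reward - values[a]) / counts[a]  # incremental mean
    return values, counts

values, counts = eps_greedy_bandit([0.1, 0.5, 0.9])
# the best arm (index 2) should end up both best-valued and most-pulled
```

Full RL adds states and delayed rewards on top of this exploration/exploitation trade-off.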
San Francisco Hadoop User Group Meetup Deep Learning - Sri Ambati
Hadoop User Group, San Francisco, Dec 10 2014.
Video: http://new.livestream.com/accounts/10932136/events/3649553 (starting at 48 minutes)
Deep Learning has been dominating recent machine learning competitions with better predictions. Unlike the neural networks of the past, modern Deep Learning methods have cracked the code for training stability and generalization. Deep Learning is not only the leader in image and speech recognition tasks, but is also emerging as the algorithm of choice for highest predictive performance in traditional business analytics. This talk introduces Deep Learning and implementation concepts in the open-source H2O in-memory prediction engine. Designed for the solution of business-critical problems on distributed compute clusters, it offers advanced features such as adaptive learning rate, dropout regularization, parameter tuning and a fully-featured R interface. World record performance on the classic MNIST dataset, best-in-class accuracy for a high-dimensional eBay text classification problem and other relevant datasets showcase the power of this game-changing technology. A whole new ecosystem of Intelligent Applications is emerging with Deep Learning at its core.
Bio:
Prior to joining 0xdata as Physicist & Hacker, Arno was a founding Senior MTS at Skytree where he designed and implemented high-performance machine learning algorithms. He has over a decade of experience in HPC with C++/MPI and had access to the world’s largest supercomputers as a Staff Scientist at SLAC National Accelerator Laboratory where he participated in US DOE scientific computing initiatives. While at SLAC, he authored the first curvilinear finite-element simulation code for space-charge dominated relativistic free electrons and scaled it to thousands of compute nodes. He also led a collaboration with CERN to model the electromagnetic performance of CLIC, a ginormous e+e- collider and potential successor of LHC. Arno has authored dozens of scientific papers and was a sought-after academic conference speaker. He holds a PhD and Masters summa cum laude in Physics from ETH Zurich. Arno was named 2014 Big Data All-Star by Fortune Magazine.
Deep Learning Cases: Text and Image Processing - Grigory Sapunov
Deep learning has achieved superhuman performance on tasks like image classification, object detection, and traffic sign recognition. Several examples are provided, including algorithms that outperform humans on German traffic sign recognition by 2-6 times. Deep learning has also been applied to tasks involving text, video, speech recognition and generation, question answering, and reinforcement learning. Libraries and frameworks like TensorFlow and Caffe have helped spread deep learning techniques.
Semi-Supervised Classification with Graph Convolutional Networks @ ICLR2017 reading group - Eiji Sekiya
This document describes research on semi-supervised learning on graph-structured data using graph convolutional networks. It proposes a layer-wise propagation model for graph convolutions that is more efficient than previous methods. The model is tested on several datasets, achieving state-of-the-art results for semi-supervised node classification while training faster than alternative methods. Future work to address limitations regarding memory requirements, directed graphs, and locality assumptions is also discussed.
Deep Learning with TensorFlow: Understanding Tensors, Computation Graphs, Im... - Altoros
1. The elements of Neural Networks: Weights, Biases, and Gating functions
2. MNIST (handwriting recognition) using a simple NN in TensorFlow (introduces Tensors, Computation Graphs)
3. MNIST using Convolution NN in TensorFlow
4. Understanding words and sentences as Vectors
5. word2vec in TensorFlow
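Point 4 above, words and sentences as vectors, usually comes down to comparing embeddings by cosine similarity. A toy sketch with hand-made vectors (real word2vec embeddings are learned, not hard-coded):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors: dot product over norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# hypothetical 3-d "embeddings" chosen so related words point the same way
vec = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.85, 0.82, 0.15],
    "apple": [0.1, 0.2, 0.95],
}
print(cosine(vec["king"], vec["queen"]) > cosine(vec["king"], vec["apple"]))  # → True
```

word2vec's contribution is learning such vectors from raw text so that this geometric closeness tracks semantic closeness.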
This document provides an overview of machine learning and artificial intelligence presented by Arno Candel, Chief Architect at H2O.ai. It discusses the history and evolution of AI from early concepts in the 1950s to recent advances in deep learning. It also describes H2O.ai's platform for scalable machine learning and how it works, allowing users to easily build and deploy models on big data using APIs for R, Python, and other languages.
STRIP: Stream Learning of Influence Probabilities - Albert Bifet
This document presents a method called STRIP (Streaming Learning of Influence Probabilities) for learning influence probabilities between users in a social network from a streaming log of propagations. It describes three solutions: (1) storing the whole social graph in memory, (2) using min-wise independent hashing to estimate probabilities while using sublinear space, and (3) estimating probabilities only for the most active users to be more space efficient. Experimental results on a Twitter dataset showed these solutions provided good approximations while using reasonable memory and processing time.
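The min-wise hashing idea in solution (2) rests on one fact: the probability that two sets share the same minimum hash value equals their Jaccard similarity. A toy sketch of the estimator (our illustration, not the STRIP implementation):

```python
import random

def minhash_signature(items, seeds):
    """One minimum hash per seed; the vector of minima is a compact
    fingerprint of the set (min-wise independent hashing)."""
    return [min(hash((seed, x)) for x in items) for seed in seeds]

def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching minima ≈ Jaccard similarity of the sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

random.seed(1)
seeds = [random.random() for _ in range(256)]
a = set(range(0, 80))    # overlap with b is 60 of 100 → true Jaccard 0.6
b = set(range(20, 100))
est = estimate_jaccard(minhash_signature(a, seeds), minhash_signature(b, seeds))
```

This is why the streaming setting works: signatures use space proportional to the number of seeds, not to the size of the sets.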
1. Real-time analytics of social networks can help companies detect new business opportunities by understanding customer needs and reactions in real-time.
2. MOA and SAMOA are frameworks for analyzing massive online and distributed data streams. MOA deals with evolving data streams using online learning algorithms. SAMOA provides a programming model for distributed, real-time machine learning on data streams.
3. Both tools allow companies to gain insights from social network and other real-time data to understand customers and react to opportunities.
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-... - Greg Makowski
This talk covers 4 configurations of deep learning to solve different types of application needs. Also, strategies for speed up and real-time scoring are discussed.
TensorFrames: Google TensorFlow on Apache Spark - Databricks
Presentation at Bay Area Spark Meetup by Databricks Software Engineer and Spark committer Tim Hunter.
This presentation covers how you can use TensorFrames with TensorFlow for distributed computing on GPUs.
The document discusses data stream classification and algorithms for handling data streams. It begins with an introduction to data stream characteristics and challenges. It then discusses approximation algorithms for data streams, including maintaining statistics over sliding windows. Classification algorithms for data streams discussed include Naive Bayes classifiers, perceptrons, and Hoeffding trees, which are decision trees adapted for data streams using the Hoeffding bound inequality to determine the optimal split attribute.
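The Hoeffding bound that gives Hoeffding trees their name is compact: with probability 1 - δ, the true mean of a variable with range R lies within ε = sqrt(R² ln(1/δ) / (2n)) of the mean of n observations. A quick sketch:

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound epsilon: with probability 1 - delta, the true mean
    is within epsilon of the mean observed over n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

# a Hoeffding tree splits once the observed gain gap between the two best
# attributes exceeds epsilon; more samples shrink epsilon
eps = hoeffding_bound(value_range=1.0, delta=1e-7, n=1000)
```

Because ε shrinks as 1/√n, the tree can commit to a split after a bounded number of stream examples instead of storing the whole stream.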
Applying your Convolutional Neural Networks - Databricks
Part 3 of the Deep Learning Fundamentals Series, this session starts with a quick primer on activation functions, learning rates, optimizers, and backpropagation. Then it dives deeper into convolutional neural networks discussing convolutions (including kernels, local connectivity, strides, padding, and activation functions), pooling (or subsampling to reduce the image size), and fully connected layer. The session also provides a high-level overview of some CNN architectures. The demos included in these slides are running on Keras with TensorFlow backend on Databricks.
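The convolution and pooling steps described above can be shown bare-bones in pure Python (stride 1, no padding; real frameworks like Keras operate on batched tensors, but the arithmetic is the same):

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D convolution (cross-correlation, as in most DL frameworks):
    slide the kernel over the image with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)]
            for i in range(oh)]

def max_pool2(fmap):
    """2x2 max pooling with stride 2: keep the strongest activation per patch."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

# a vertical-edge kernel fires where pixel intensity rises left to right
image = [[0, 0, 1, 1]] * 4
kernel = [[-1, 1], [-1, 1]]
fmap = conv2d_valid(image, kernel)   # responds only at the 0→1 boundary
pooled = max_pool2(fmap)             # subsample, keeping the peak response
```

Local connectivity and weight sharing are both visible here: every output cell reuses the same 2x2 kernel over a small patch of the input.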
Neural networks are composed of interconnected neurons arranged in layers that can learn patterns from data. They consist of an input layer, hidden layers, and an output layer. Each neuron receives weighted inputs, passes them through an activation function, and outputs the result. Backpropagation allows neural networks to learn by calculating error derivatives to update weights between layers. Deeper networks can model more complex patterns using techniques like convolutional neural networks for images and recurrent neural networks for sequential data. While powerful, neural networks require large datasets and computational resources to train effectively.
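The backpropagation idea in the summary above, reduced to its smallest case: one sigmoid neuron trained by gradient descent on squared loss (a toy sketch, not a full multi-layer network):

```python
import math

def train_neuron(data, epochs=3000, lr=1.0):
    """One sigmoid neuron trained by gradient descent. For squared loss the
    error derivative at the output is (y_hat - y) * y_hat * (1 - y_hat)."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            z = w[0] * x[0] + w[1] * x[1] + b
            y_hat = 1.0 / (1.0 + math.exp(-z))      # sigmoid activation
            delta = (y_hat - y) * y_hat * (1.0 - y_hat)
            w = [wi - lr * delta * xi for wi, xi in zip(w, x)]
            b -= lr * delta
    return w, b

# learn OR, a linearly separable function one neuron can represent
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, b = train_neuron(data)
```

In a deep network the same delta is propagated backwards through each layer via the chain rule, which is all "backpropagation" means.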
ACT Talk, Giuseppe Totaro: High Performance Computing for Distributed Indexin... - Advanced-Concepts-Team
Searching for information within large sets of unstructured, heterogeneous scientific data can be very challenging unless an inverted index has been created in advance. Several solutions, mainly based on the Hadoop ecosystem, have been proposed to accelerate the process of index construction. These solutions perform well when data are already distributed across the cluster nodes involved in the elaboration. On the other hand, the cost of distributing data can introduce noticeable overhead. We propose ISODAC, a new approach aimed at improving efficiency without sacrificing reliability. Our solution reduces to the bare minimum the number of I/O operations by using a stream of in-memory operations to extract and index heterogeneous data. We further improve the performance by using GPUs and POSIX Threads programming for the most computationally intensive tasks of the indexing procedure. ISODAC indexes heterogeneous documents up to 10.6x faster than other widely adopted solutions, such as Apache Spark.
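For context, the inverted index such systems build maps each term to the documents containing it, so queries become posting-list intersections rather than full scans. A minimal sketch (our illustration, not ISODAC's code):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Minimal inverted index: term -> sorted list of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search_all(index, terms):
    """Conjunctive query: intersect the posting lists of all terms."""
    postings = [set(index.get(t, ())) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []

docs = ["spark indexes data", "hadoop indexes data streams", "gpu threads"]
index = build_inverted_index(docs)
```

The heavy lifting in production indexers is exactly the tokenize-and-insert loop above, which is why streaming it through memory (and offloading to GPUs) pays off.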
Cluster analysis is an unsupervised learning technique used to group similar data objects into clusters. It aims to partition data into groups called clusters such that objects within a cluster are as similar as possible while objects in different clusters are as dissimilar as possible. The k-means algorithm is commonly used for partitioning-based clustering. It works by randomly selecting k initial cluster centroids and then iteratively assigning data points to their nearest centroid and recalculating the centroids until cluster membership stabilizes. However, k-means is sensitive to outliers and noise since outliers can distort cluster centroids.
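The assign-then-recompute loop described above can be sketched in a few lines of pure Python (a minimal illustration; library versions add smarter initialization and convergence checks):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: alternately assign each point to its nearest
    centroid, then move each centroid to its cluster's mean."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        centroids = [tuple(sum(d) / len(c) for d in zip(*c)) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# two well-separated 2D blobs: centroids should land near (0,0) and (10,10)
pts = [(0.1, 0.2), (0.0, -0.1), (0.2, 0.0),
       (10.1, 9.9), (9.8, 10.2), (10.0, 10.0)]
centroids, clusters = kmeans(pts, 2)
```

The outlier sensitivity noted above is visible in the mean step: a single extreme point shifts its cluster's centroid, which is why k-medoids is preferred for noisy data.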
H2O Open Source Deep Learning, Arno Candel 03-20-14Sri Ambati
More information in our Deep Learning webinar: http://www.slideshare.net/0xdata/h2-o-deeplearningarnocandel052114
Latest slide deck: http://www.slideshare.net/0xdata/h2o-distributed-deep-learning-by-arno-candel-071614
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Ted Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SF (MLconf)
Abstract: How graphs became just another big data primitive
Graph-shaped data is used in product recommendation systems, social network analysis, network threat detection, image de-noising, and many other important applications. A growing number of these applications will benefit from parallel distributed processing for graph feature engineering, model training, and model serving. But today's graph tools are riddled with limitations and shortcomings, such as a lack of language bindings, streaming support, and seamless integration with other popular data services. In this talk, we'll argue that the key to doing more with graphs is doing less with specialized systems and more with systems already good at handling data of other shapes. We'll examine some practical data science workflows to further motivate this argument, and we'll talk about some of the things that Intel is doing with the open source community and industry to make graphs just another big data primitive.
How to win data science competitions with Deep Learning (Sri Ambati)
This document summarizes a presentation about how to win data science competitions using deep learning with H2O. It discusses H2O's architecture and capabilities for deep learning. It then demonstrates live modeling on Kaggle competitions, providing step-by-step explanations of building and evaluating deep learning models on three different datasets - an African soil properties prediction challenge, a display advertising challenge, and a Higgs boson machine learning challenge. It concludes with tips and tricks for deep learning with H2O and an invitation to the H2O World conference.
Deep Learning with Python: Getting started and getting from ideas to insights in minutes.
PyData Seattle 2015
Alex Korbonits (@korbonits)
This presentation was given July 25, 2015 at the PyData Seattle conference hosted by PyData and NumFocus.
Mining Frequent Closed Graphs on Evolving Data Streams (Albert Bifet)
Graph mining is a challenging task by itself, and even more so when processing data streams which evolve in real-time. Data stream mining faces hard constraints regarding time and space for processing, and also needs to provide for concept drift detection. In this talk we present a framework for studying graph pattern mining on time-varying streams and large datasets.
Note: Make sure to download the slides to get the high-resolution version!
Also, you can find the webinar recording here (please also download for better quality): https://www.dropbox.com/s/72qi6wjzi61gs3q/H2ODeepLearningArnoCandel052114.mov
Come hear how Deep Learning in H2O is unlocking never before seen performance for prediction!
H2O is a google-scale, open-source machine learning engine for R and Big Data. Enterprises can now use all of their data without sampling and build intelligent applications. This live webinar introduces Distributed Deep Learning concepts, implementation and results from recent developments. Real-world classification and regression use cases from an eBay text dataset, MNIST handwritten digits and cancer datasets will present the power of this game-changing technology.
The document provides an overview of deep learning and reinforcement learning. It discusses the current state of artificial intelligence and machine learning, including how deep learning algorithms have achieved human-level performance in various tasks such as image recognition and generation. Reinforcement learning is introduced as learning through trial-and-error interactions with an environment to maximize rewards. Examples are given of reinforcement learning algorithms solving tasks like playing Atari games.
San Francisco Hadoop User Group Meetup Deep Learning (Sri Ambati)
Hadoop User Group, San Francisco, Dec 10 2014.
Video: http://new.livestream.com/accounts/10932136/events/3649553 (starting at 48 minutes)
Deep Learning Cases: Text and Image Processing (Grigory Sapunov)
Deep learning has achieved superhuman performance on tasks like image classification, object detection, and traffic sign recognition. Several examples are provided, including algorithms that outperform humans on German traffic sign recognition by 2-6 times. Deep learning has also been applied to tasks involving text, video, speech recognition and generation, question answering, and reinforcement learning. Libraries and frameworks like TensorFlow and Caffe have helped spread deep learning techniques.
Semi-Supervised Classification with Graph Convolutional Networks @ICLR2017 reading group (Eiji Sekiya)
This document describes research on semi-supervised learning on graph-structured data using graph convolutional networks. It proposes a layer-wise propagation model for graph convolutions that is more efficient than previous methods. The model is tested on several datasets, achieving state-of-the-art results for semi-supervised node classification while training faster than alternative methods. Future work to address limitations regarding memory requirements, directed graphs, and locality assumptions is also discussed.
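The layer-wise propagation rule in that paper is H' = sigma(D^{-1/2} (A + I) D^{-1/2} H W). A toy sketch on a 3-node path graph (all numbers illustrative, single feature channel, identity-like weight) shows how a node's feature spreads to its neighbors:

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Path graph 0-1-2 with self-loops added: A_hat = A + I.
A_hat = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
deg = [sum(row) for row in A_hat]
# Symmetric normalization D^{-1/2} A_hat D^{-1/2}.
A_norm = [[A_hat[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(3)]
          for i in range(3)]

H = [[1.0], [0.0], [0.0]]   # only node 0 carries a feature
W = [[1.0]]                 # trivial 1x1 weight matrix

def relu(M):
    return [[max(0.0, x) for x in row] for row in M]

# One graph-convolution layer: node 1 picks up signal from node 0,
# node 2 (two hops away) stays at zero after a single layer.
H_next = relu(matmul(matmul(A_norm, H), W))
```

Stacking layers widens the receptive field one hop at a time, which is why depth matters for this model.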
Deep Learning with TensorFlow: Understanding Tensors, Computations Graphs, Im... (Altoros)
1. The elements of Neural Networks: Weights, Biases, and Gating functions
2. MNIST (Hand writing recognition) using simple NN in TensorFlow (Introduce Tensors, Computation Graphs)
3. MNIST using Convolution NN in TensorFlow
4. Understanding words and sentences as Vectors
5. word2vec in TensorFlow
This document provides an overview of machine learning and artificial intelligence presented by Arno Candel, Chief Architect at H2O.ai. It discusses the history and evolution of AI from early concepts in the 1950s to recent advances in deep learning. It also describes H2O.ai's platform for scalable machine learning and how it works, allowing users to easily build and deploy models on big data using APIs for R, Python, and other languages.
STRIP: stream learning of influence probabilities (Albert Bifet)
This document presents a method called STRIP (Streaming Learning of Influence Probabilities) for learning influence probabilities between users in a social network from a streaming log of propagations. It describes three solutions: (1) storing the whole social graph in memory, (2) using min-wise independent hashing to estimate probabilities while using sublinear space, and (3) estimating probabilities only for the most active users to be more space efficient. Experimental results on a Twitter dataset showed these solutions provided good approximations while using reasonable memory and processing time.
1. Real-time analytics of social networks can help companies detect new business opportunities by understanding customer needs and reactions in real-time.
2. MOA and SAMOA are frameworks for analyzing massive online and distributed data streams. MOA deals with evolving data streams using online learning algorithms. SAMOA provides a programming model for distributed, real-time machine learning on data streams.
3. Both tools allow companies to gain insights from social network and other real-time data to understand customers and react to opportunities.
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-... (Greg Makowski)
This talk covers 4 configurations of deep learning to solve different types of application needs. Also, strategies for speed up and real-time scoring are discussed.
TensorFrames: Google Tensorflow on Apache Spark (Databricks)
Presentation at Bay Area Spark Meetup by Databricks Software Engineer and Spark committer Tim Hunter.
This presentation covers how you can use TensorFrames with TensorFlow to do distributed computing on GPUs.
The document discusses data stream classification and algorithms for handling data streams. It begins with an introduction to data stream characteristics and challenges. It then discusses approximation algorithms for data streams, including maintaining statistics over sliding windows. Classification algorithms for data streams discussed include Naive Bayes classifiers, perceptrons, and Hoeffding trees, which are decision trees adapted for data streams using the Hoeffding bound inequality to determine the optimal split attribute.
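The Hoeffding bound that underlies Hoeffding trees can be computed directly. The confidence half-width is epsilon = sqrt(R^2 ln(1/delta) / (2n)) for a statistic with range R observed over n stream samples; this sketch (names are my own) shows it shrinking as more data arrives, which is what lets the tree commit to a split attribute:

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Half-width epsilon such that the true mean lies within epsilon of the
    observed mean of n samples with probability at least 1 - delta."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

# For an information-gain statistic in [0, 1] at 95% confidence, the
# uncertainty narrows as the stream delivers more examples:
widths = [hoeffding_bound(1.0, 0.05, n) for n in (100, 1000, 10000)]
```

Once the observed gain difference between the two best attributes exceeds epsilon, the split decision is statistically safe without storing the stream.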
Applying your Convolutional Neural Networks (Databricks)
Part 3 of the Deep Learning Fundamentals Series, this session starts with a quick primer on activation functions, learning rates, optimizers, and backpropagation. Then it dives deeper into convolutional neural networks discussing convolutions (including kernels, local connectivity, strides, padding, and activation functions), pooling (or subsampling to reduce the image size), and fully connected layer. The session also provides a high-level overview of some CNN architectures. The demos included in these slides are running on Keras with TensorFlow backend on Databricks.
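The convolution and pooling operations that session covers can be sketched in plain Python (a toy valid-mode example with an assumed edge-detecting kernel; frameworks like Keras do the same thing vectorized):

```python
def conv2d(image, kernel):
    """Valid-mode 2-D convolution (strictly, cross-correlation, as in most
    deep learning frameworks): slide the kernel and take dot products."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            row.append(sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw)
            ))
        out.append(row)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: subsample, keeping the strongest activation."""
    return [
        [max(fmap[i + a][j + b] for a in range(size) for b in range(size))
         for j in range(0, len(fmap[0]) - size + 1, size)]
        for i in range(0, len(fmap) - size + 1, size)
    ]

image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edge_kernel = [[-1, 1]]          # responds to left-to-right intensity steps
fmap = conv2d(image, edge_kernel)
pooled = max_pool(fmap)
```

The feature map lights up only along the vertical edge, and pooling halves the spatial size while keeping that response, which is the locality-plus-subsampling idea behind CNNs.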
This document discusses kriging interpolation theory and spatial data visualization. It begins with an introduction to kriging theory, describing how kriging estimates unknown points as a weighted average of nearby sample points based on their variances and covariances. Several case studies applying kriging to problems in environmental science, hydrogeology and mining are presented. Methods for visualizing spatial data using APIs like Google Maps and Baidu Maps or non-API tools are then explored. Finally, the document compares Leaflet, Baidu Maps and Google Maps for interactive spatial data visualization.
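The weighted-average idea behind kriging can be made concrete with a toy 1-D simple-kriging estimate. This sketch assumes an exponential covariance model and zero-nugget, unit-sill parameters of my own choosing (it is not code from the document):

```python
import math

def cov(h, sill=1.0, length=10.0):
    # Exponential covariance model: similarity decays with separation h.
    return sill * math.exp(-h / length)

# Two sampled locations on a line with known values; estimate at x = 4.
xs = [0.0, 10.0]
zs = [1.0, 3.0]
x0 = 4.0

# Simple-kriging weights solve C w = c0, where C holds covariances between
# samples and c0 the covariances between each sample and the target point.
C = [[cov(abs(a - b)) for b in xs] for a in xs]
c0 = [cov(abs(a - x0)) for a in xs]

# Solve the 2x2 system by Cramer's rule.
det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
w0 = (c0[0] * C[1][1] - C[0][1] * c0[1]) / det
w1 = (C[0][0] * c0[1] - c0[0] * C[1][0]) / det

mean = sum(zs) / len(zs)
estimate = mean + w0 * (zs[0] - mean) + w1 * (zs[1] - mean)
```

The nearer sample receives the larger weight, so the estimate at x = 4 lands closer to the value at x = 0, which is exactly the variance-based weighting the document describes.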
[ICLR/ICML2019 reading group] A Wrapped Normal Distribution on Hyperbolic Space for Grad... (Yoshihiro Nagano)
1. The document presents a novel hyperbolic distribution called the pseudo-hyperbolic Gaussian, which is a Gaussian distribution on hyperbolic space that can be evaluated analytically and differentiated with respect to parameters.
2. This distribution enables gradient-based learning of probabilistic models on hyperbolic space. It also allows sampling from the hyperbolic probability distribution without auxiliary means like rejection sampling.
3. As applications of the distribution, the authors develop a hyperbolic variational autoencoder and a method for probabilistic word embedding on hyperbolic space. They demonstrate the efficacy of the distribution on datasets including MNIST, Atari 2600 Breakout, and WordNet.
An Effective PSO-inspired Algorithm for Workflow Scheduling (IJECEIAES)
The Cloud is a computing platform that provides on-demand access to a shared pool of configurable resources such as networks, servers and storage that can be rapidly provisioned and released with minimal management effort from clients. At its core, Cloud computing focuses on maximizing the effectiveness of the shared resources. Therefore, workflow scheduling is one of the challenges that the Cloud must tackle, especially if a large number of tasks are executed on geographically distributed servers. This entails the need to adopt an effective scheduling algorithm in order to minimize task completion time (makespan). Although workflow scheduling has been the focus of many researchers, only a handful of efficient solutions have been proposed for Cloud computing. In this paper, we propose LPSO, a novel algorithm for the workflow scheduling problem that is based on the Particle Swarm Optimization method. Our proposed algorithm not only ensures fast convergence but also prevents getting trapped in local extrema. We ran realistic scenarios using CloudSim and found that LPSO is superior to previously proposed algorithms, and we noticed that the deviation between the solution found by LPSO and the optimal solution is negligible.
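The personal-best/global-best dynamics that LPSO builds on can be sketched with a bare-bones 1-D PSO (this is generic PSO on a toy objective, not the LPSO algorithm itself; all parameter values are my own):

```python
import random

def pso(f, n_particles=10, iters=50, lo=-10.0, hi=10.0, seed=1):
    """Bare-bones PSO: each particle is pulled toward its personal best
    and the swarm's global best, with inertia damping the velocity."""
    rng = random.Random(seed)
    xs = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vs = [0.0] * n_particles
    pbest = xs[:]
    gbest = min(xs, key=f)
    for _ in range(iters):
        for i in range(n_particles):
            vs[i] = (0.7 * vs[i]
                     + 1.5 * rng.random() * (pbest[i] - xs[i])
                     + 1.5 * rng.random() * (gbest - xs[i]))
            xs[i] += vs[i]
            if f(xs[i]) < f(pbest[i]):
                pbest[i] = xs[i]
            if f(xs[i]) < f(gbest):
                gbest = xs[i]
    return gbest

# Toy "makespan" surrogate with its optimum at x = 3.
best = pso(lambda x: (x - 3.0) ** 2)
```

A scheduling variant replaces the 1-D position with a task-to-server assignment vector and the quadratic with the simulated makespan; getting stuck in local extrema of that discrete landscape is exactly what LPSO's modifications target.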
This document summarizes a student's research project on approximate matching on graph databases using the GeX approach. It introduces graph databases and the need for approximate matching. It describes testing the GeX Top-K query algorithm on biological interaction data from multiple organisms. While accurate, the algorithm's performance decreases with larger datasets. Future work could approximate edge labels as well to improve scalability.
This document provides an overview of graph edit distance, including its definition, history, and algorithms. It begins by defining an edit path as a sequence of node/edge insertions, deletions, and substitutions that transforms one graph into another. The graph edit distance is the cost of the lowest cost edit path. It describes tree search algorithms used to explore the space of possible edit paths efficiently. It also explains how edit paths can be modeled as assignment problems that are solved using techniques like the Hungarian algorithm to find approximations of the graph edit distance.
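The assignment-problem view of graph edit distance can be illustrated with a brute-force stand-in for the Hungarian algorithm (fine for tiny matrices; the cost values below are invented for illustration):

```python
from itertools import permutations

# Toy node-substitution cost matrix between graphs G1 (rows) and G2 (columns):
# cost[i][j] is the price of mapping node i of G1 onto node j of G2.
cost = [
    [4, 2, 3],
    [2, 4, 1],
    [3, 1, 4],
]

def cheapest_assignment(cost):
    """Exhaustively solve the assignment problem (what the Hungarian
    algorithm does in O(n^3)); the minimum total cost approximates
    the graph edit distance."""
    n = len(cost)
    return min(
        (sum(cost[i][p[i]] for i in range(n)), p)
        for p in permutations(range(n))
    )

total, mapping = cheapest_assignment(cost)
```

In practice the cost matrix also carries rows/columns for node insertions and deletions, and `scipy.optimize.linear_sum_assignment` replaces the factorial search.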
Feature Extraction Based Estimation of Rain Fall By Cross Correlating Cloud R... (IOSR Journals)
In this paper we present feature-extraction-based estimation of rainfall by cross-correlating cloud RADAR data. The idea is to select a square box of around 200x200 pixels around the point of interest and take the cross-correlation between the latest picture and one that is 5 or 10 minutes older. We then determine the wind direction and speed by finding the highest point in the correlation. The last step is to interpolate the data, acquired in a tagged format, to the latest data in the up-wind direction to get a prediction for the near future. The basic principle works, but it is hard to get a good estimate of the wind direction.
Keywords – Feature Extraction, Cross correlation, Rain Fall, RADAR, Image Processing.
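The correlation-peak step in the abstract above can be sketched in 1-D (a toy version with invented pixel values; the paper works on 200x200 2-D boxes, but the idea of reading the displacement off the correlation maximum is the same):

```python
def best_shift(older, newer, max_shift):
    """Find the displacement that best aligns two 1-D 'radar frames'
    by maximizing their cross-correlation over candidate shifts."""
    best = None
    for s in range(-max_shift, max_shift + 1):
        score = sum(
            older[i] * newer[i + s]
            for i in range(len(older))
            if 0 <= i + s < len(newer)
        )
        if best is None or score > best[0]:
            best = (score, s)
    return best[1]

older = [0, 0, 5, 9, 5, 0, 0, 0, 0, 0]   # rain cell centered at index 3
newer = [0, 0, 0, 0, 0, 5, 9, 5, 0, 0]   # same cell three pixels down-wind
shift = best_shift(older, newer, max_shift=4)
```

Dividing the recovered shift by the 5- or 10-minute frame interval gives the wind speed, and in 2-D the direction of the peak offset gives the wind direction.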
ArrayUDF: User-Defined Scientific Data Analysis on Arrays (Goon83)
User-Defined Functions (UDF) allow application programmers to specify analysis operations on data, while leaving the data management and other non-trivial tasks to the system. This general approach is at the heart of modern Big Data systems, such as MapReduce/Spark and SciDB. However, a wide variety of common scientific data operations -- such as computing the moving average of a time series, the vorticity of a fluid flow, etc. -- are hard to express and slow to execute with these Big Data systems. In this talk, we will introduce a brand new Big Data system, namely ArrayUDF (https://bitbucket.org/arrayudf/arrayudf), for scientific data sets, especially multi-dimensional arrays. ArrayUDF allows flexible expression of UDFs for scientific data analysis by exploiting their common character: structural locality. ArrayUDF executes the UDF directly on arrays stored in files, such as HDF5, without any data-loading overhead. ArrayUDF's design and implementation considerations for parallel data processing on large-scale HPC will also be introduced. Performance tests on Edison at NERSC show that ArrayUDF is around 2000X faster than Spark on processing large scientific datasets.
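The moving-average example mentioned above is a textbook structural-locality UDF. A plain-Python sketch of the access pattern (ArrayUDF itself operates on HDF5 arrays in C++; this only shows the neighbor-window shape of the computation):

```python
def moving_average_udf(arr, radius=1):
    """A structural-locality UDF: each output cell is a function of the
    cell and its nearby neighbors, the pattern ArrayUDF optimizes for."""
    out = []
    for i in range(len(arr)):
        window = arr[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out

result = moving_average_udf([1.0, 2.0, 3.0, 4.0, 5.0])
```

Because each output depends only on a fixed neighborhood, the array can be partitioned across nodes with small halo regions instead of the full shuffle a generic MapReduce formulation would require.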
Slides from our PacificVis 2015 presentation.
The paper tackles the problem of “giant hairballs”, the dense and tangled structures often resulting from visualization of large social graphs. Proposed is a high-dimensional rotation technique called AGI3D, combined with an ability to filter elements based on social centrality values. AGI3D is targeted at a high-dimensional embedding of a social graph and its projection onto 3D space. It allows the user to rotate the social graph layout in the high-dimensional space by mouse-dragging a vertex. Its high-dimensional rotation effects give the user the illusion of destructively reshaping the social graph layout, but in reality it assists the user in finding a preferred position and direction in the high-dimensional space from which to view the internal structure of the social graph layout, keeping it unmodified. A prototype implementation of the proposal, called Social Viewpoint Finder, is tested with about 70 social graphs, and this paper reports four of the analysis results.
The computational infrastructure is becoming a vast interconnected fabric of formal methods, including a major shift from 2D grids to 3D graphs in machine learning architectures.
The implication is systems-level digital science at unprecedented scale for discovery in a diverse range of scientific disciplines
OPTICS: Ordering Points To Identify the Clustering Structure (Rajesh Piryani)
The presentation summarizes the OPTICS (Ordering Points To Identify the Clustering Structure) algorithm, a density-based clustering algorithm that addresses some limitations of DBSCAN. OPTICS does not produce an explicit clustering but instead outputs an ordering of all objects based on their reachability distances, representing the intrinsic clustering structure. It works by iteratively expanding clusters and updating an ordering-seeds list to generate the output ordering, without requiring the single global density parameter that DBSCAN needs. The ordering can then be used to extract clusters for a range of density parameter values. An example applying OPTICS to a 2D dataset is provided to illustrate the algorithm.
The document discusses hashing techniques for embedding objects into binary codes to enable efficient similarity search of large datasets. It provides an overview of locality sensitive hashing and learning-based hashing methods, including data-oblivious techniques like SimHash and data-aware approaches like spectral hashing. Examples of hashing research from ICML and other conferences in 2013 are also summarized, focusing on improving accuracy, utilizing multiple data views, and updating hash functions for new data.
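A minimal data-oblivious SimHash sketch illustrates the idea of embedding objects into binary codes (token sets and the 32-bit code size are my own illustrative choices):

```python
import hashlib

def simhash(tokens, bits=32):
    """Data-oblivious SimHash: each token votes +1/-1 per bit position;
    similar token sets end up with codes at small Hamming distance."""
    v = [0] * bits
    for tok in tokens:
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        for i in range(bits):
            v[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if v[i] > 0)

def hamming(a, b):
    return bin(a ^ b).count("1")

doc1 = "the quick brown fox jumps over the lazy dog".split()
doc2 = "the quick brown fox leaps over the lazy dog".split()
doc3 = "completely different words appear in this sentence".split()

d_similar = hamming(simhash(doc1), simhash(doc2))
d_different = hamming(simhash(doc1), simhash(doc3))
```

Near-duplicate documents differ in only a few bits, so candidate pairs can be found by comparing short codes instead of full documents; the data-aware methods surveyed (e.g. spectral hashing) learn the bit functions from data instead of hashing blindly.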
My invited talk at the 2018 Annual Meeting of SIAM (Society of Industrial and... (Anirbit Mukherjee)
This is a slightly expanded version of the talk I gave at the 2018 ISMP (International Symposium on Mathematical Programming). This SIAM talk has some more introductory material than the ISMP talk.
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural N... (Scientific Review SR)
This document summarizes a study that evaluated the performance of a kernel radial basis probabilistic neural network (Kernel RBPNN) model for classifying iris data, compared to backpropagation, radial basis function, and radial basis probabilistic neural network models. The Kernel RBPNN model achieved the highest classification accuracy of 89.12% on test data from the iris dataset, performing better than the other models. It also had the fastest training time, being over 80 times faster than the radial basis function model. Analysis of the receiver operating characteristic curves showed that the Kernel RBPNN model had the largest area under the curve, indicating it had the best classification prediction capability out of the four models evaluated.
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne... (Scientific Review)
The Radial Basis Probabilistic Neural Network (RBPNN) has broad generalization capability and has been successfully applied in multiple fields. In this paper, the Euclidean distance of each data point in the RBPNN is replaced by its kernel-induced distance instead of the conventional sum-of-squares distance. The kernel function is a generalization of the distance metric that measures the distance between two data points as if they were mapped into a high-dimensional space. Comparing the four constructed classification models (Kernel RBPNN, Radial Basis Function networks, RBPNN, and Back-Propagation networks), results showed that classification of the Iris data with Kernel RBPNN displays outstanding performance.
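The kernel-induced distance described above expands, via the feature map phi, as ||phi(x) - phi(y)||^2 = K(x,x) - 2 K(x,y) + K(y,y). A small sketch with an assumed RBF kernel (parameter values are illustrative, not from the paper):

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def kernel_distance(x, y, k=rbf_kernel):
    """Distance in the feature space induced by kernel k:
    ||phi(x) - phi(y)||^2 = k(x,x) - 2*k(x,y) + k(y,y)."""
    return math.sqrt(k(x, x) - 2.0 * k(x, y) + k(y, y))

d_near = kernel_distance((1.0, 1.0), (1.1, 0.9))
d_far = kernel_distance((1.0, 1.0), (4.0, 5.0))
```

Note that for the RBF kernel the distance saturates at sqrt(2) for very distant points, so substituting it for the sum-of-squares distance changes how the RBPNN weighs far-away training points.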
Improving search time for content-based image retrieval via LSH, MTRee, ... (IOSR Journals)
This document proposes a new index structure called LSH-LUBMTree to improve search time for content-based image retrieval using the Earth Mover's Distance metric. LSH-LUBMTree combines Locality Sensitive Hashing (LSH) and the LUBMTree index. Images hashed to the same bucket via LSH are then stored in the LUBMTree to reduce false positives and accelerate search time. Experimental results show LSH-LUBMTree performs better than standard LSH in terms of search time by leveraging advantages of both LSH and LUBMTree indexing.
Similar to Distance oracle - fast querying of the distance between any two points on a graph (20)
Feast Feature Store - An In-depth Overview Experimentation and Application in... (Hong Ong)
In this event, we will dive into the world of Feast and explore its numerous benefits and applications. 🌐 During the session, we'll showcase:
✅ How Feast optimizes team collaboration and enhances data versioning, storage, and serving
✅ How we can store and serve Feast features through some scenarios
✅ Quick experimentation based on pre-calculated features and quick serving online API
✅ The important role of FeatureService in data versioning and feature selection
Don't miss out on this opportunity to expand your knowledge and leverage the power of Feast for data operations and collaboration.
Dagster - DataOps and MLOps for Machine Learning Engineers.pdf (Hong Ong)
In this session, we will introduce Dagster, a cutting-edge framework that simplifies DataOps and MLOps for machine learning engineers. We will explore the benefits of this powerful tool, learn how to implement it in your machine learning workflows, and discuss practical use cases to help you enhance productivity, collaboration, and deployment of ML models.
Data Products for Mobile Commerce in Real-time and Real-life.pdf (Hong Ong)
🌀 The strong growth of Mobile has helped M-Commerce (Mobile Commerce) rise to become an inevitable trend in the near future. Mobile Commerce not only attracts attention with great utilities for users, but is also a great opportunity for business owners to develop their brands and promote online business in the Vietnamese market.
🌀 Keeping pace with the times, overcoming customers' "pain points" when shopping online is one of the key problems of concern. Building data products is one of the solutions to these problems. So how do we do that?
Algorithmic foundations of AI, Machine Learning, and Big Data (Hong Ong)
Organizer: TopDev.
Topic: Algorithmic foundations of AI, Machine Learning, and Big Data.
Speaker: Ông Xuân Hồng - Research engineer @ Trusting Social.
Date: 15/10/2017.
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati... (AbdullaAlAsif1)
The pygmy halfbeak, Dermogenys colletei, is known for its viviparous nature and presents an intriguing case of relatively low fecundity, raising questions about potential compensatory reproductive strategies employed by this species. Our study delves into the examination of fecundity and the Gonadosomatic Index (GSI) in the pygmy halfbeak, D. colletei (Meisner, 2001), an intriguing viviparous fish indigenous to Sarawak, Borneo. We hypothesize that D. colletei may exhibit unique reproductive adaptations to offset its low fecundity, thus enhancing its survival and fitness. To address this, we conducted a comprehensive study utilizing 28 mature female specimens of D. colletei, carefully measuring fecundity and GSI to shed light on the reproductive adaptations of this species. Our findings reveal that D. colletei indeed exhibits low fecundity, with a mean of 16.76 ± 2.01, and a mean GSI of 12.83 ± 1.27, providing crucial insights into the reproductive mechanisms at play in this species. These results underscore the existence of unique reproductive strategies in D. colletei, enabling its adaptation and persistence in Borneo's diverse aquatic ecosystems, and call for further ecological research to elucidate these mechanisms. This study leads to a better understanding of viviparous fish in Borneo and contributes to the broader field of aquatic ecology, enhancing our knowledge of species adaptations to unique ecological challenges.
Describing and Interpreting an Immersive Learning Case with the Immersion Cub... (Leonel Morgado)
Current descriptions of immersive learning cases are often difficult or impossible to compare. This is due to a myriad of different options on what details to include, which aspects are relevant, and on the descriptive approaches employed. Also, these aspects often combine very specific details with more general guidelines, or indicate intents and rationales without clarifying their implementation. In this paper we provide a method to describe immersive learning cases that is structured to enable comparisons, yet flexible enough to allow researchers and practitioners to decide which aspects to include. This method leverages a taxonomy that classifies educational aspects at three levels (uses, practices, and strategies) and then utilizes two frameworks, the Immersive Learning Brain and the Immersion Cube, to enable a structured description and interpretation of immersive learning cases. The method is then demonstrated on a published immersive learning case on training for wind turbine maintenance using virtual reality. Applying the method results in a structured artifact, the Immersive Learning Case Sheet, that tags the case with its proximal uses, practices, and strategies, and refines the free-text case description to ensure that matching details are included. This contribution is thus a case description method in support of future comparative research of immersive learning cases. We then discuss how the resulting description and interpretation can be leveraged to change immersive learning cases, by enriching them (considering low-effort changes or additions) or innovating (exploring more challenging avenues of transformation). The method holds significant promise to support better-grounded research in immersive learning.
The technology uses reclaimed CO₂ as the dyeing medium in a closed loop process. When pressurized, CO₂ becomes supercritical (SC-CO₂). In this state CO₂ has a very high solvent power, allowing the dye to dissolve easily.
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx (MAGOTI ERNEST)
Although Artemia has been known to man for centuries, its use as a food for the culture of larval organisms apparently began only in the 1930s, when several investigators found that it made an excellent food for newly hatched fish larvae (Litvinenko et al., 2023). As aquaculture developed in the 1960s and '70s, the use of Artemia also became more widespread, due both to its convenience and to its nutritional value for larval organisms (Arenas-Pardo et al., 2024). The fact that Artemia dormant cysts can be stored for long periods in cans, and then used as an off-the-shelf food requiring only 24 h of incubation, makes them the most convenient, least labor-intensive live food available for aquaculture (Sorgeloos & Roubach, 2021). The nutritional value of Artemia, especially for marine organisms, is not constant, but varies both geographically and temporally. During the last decade, however, both the causes of Artemia's nutritional variability and methods to improve poor-quality Artemia have been identified (Loufi et al., 2024).
Brine shrimp (Artemia spp.) are used in marine aquaculture worldwide. Annually, more than 2,000 metric tons of dry cysts are used for the cultivation of fish, crustacean, and shellfish larvae. Brine shrimp are important to aquaculture because newly hatched brine shrimp nauplii (larvae) provide a food source for many fish fry (Mozanzadeh et al., 2021). Culture and harvesting of brine shrimp eggs represent another aspect of the aquaculture industry. Nauplii and metanauplii of Artemia, commonly known as brine shrimp, play a crucial role in aquaculture due to their nutritional value and suitability as live feed for many aquatic species, particularly in larval stages (Sorgeloos & Roubach, 2021).
Phenomics assisted breeding in crop improvementIshaGoswami9
As the population is increasing and will reach about 9 billion upto 2050. Also due to climate change, it is difficult to meet the food requirement of such a large population. Facing the challenges presented by resource shortages, climate
change, and increasing global population, crop yield and quality need to be improved in a sustainable way over the coming decades. Genetic improvement by breeding is the best way to increase crop productivity. With the rapid progression of functional
genomics, an increasing number of crop genomes have been sequenced and dozens of genes influencing key agronomic traits have been identified. However, current genome sequence information has not been adequately exploited for understanding
the complex characteristics of multiple gene, owing to a lack of crop phenotypic data. Efficient, automatic, and accurate technologies and platforms that can capture phenotypic data that can
be linked to genomics information for crop improvement at all growth stages have become as important as genotyping. Thus,
high-throughput phenotyping has become the major bottleneck restricting crop breeding. Plant phenomics has been defined as the high-throughput, accurate acquisition and analysis of multi-dimensional phenotypes
during crop growing stages at the organism level, including the cell, tissue, organ, individual plant, plot, and field levels. With the rapid development of novel sensors, imaging technology,
and analysis methods, numerous infrastructure platforms have been developed for phenotyping.
Immersive Learning That Works: Research Grounding and Paths ForwardLeonel Morgado
We will metaverse into the essence of immersive learning, into its three dimensions and conceptual models. This approach encompasses elements from teaching methodologies to social involvement, through organizational concerns and technologies. Challenging the perception of learning as knowledge transfer, we introduce a 'Uses, Practices & Strategies' model operationalized by the 'Immersive Learning Brain' and ‘Immersion Cube’ frameworks. This approach offers a comprehensive guide through the intricacies of immersive educational experiences and spotlighting research frontiers, along the immersion dimensions of system, narrative, and agency. Our discourse extends to stakeholders beyond the academic sphere, addressing the interests of technologists, instructional designers, and policymakers. We span various contexts, from formal education to organizational transformation to the new horizon of an AI-pervasive society. This keynote aims to unite the iLRN community in a collaborative journey towards a future where immersive learning research and practice coalesce, paving the way for innovative educational research and practice landscapes.
ESR spectroscopy in liquid food and beverages.pptxPRIYANKA PATEL
With increasing population, people need to rely on packaged food stuffs. Packaging of food materials requires the preservation of food. There are various methods for the treatment of food to preserve them and irradiation treatment of food is one of them. It is the most common and the most harmless method for the food preservation as it does not alter the necessary micronutrients of food materials. Although irradiated food doesn’t cause any harm to the human health but still the quality assessment of food is required to provide consumers with necessary information about the food. ESR spectroscopy is the most sophisticated way to investigate the quality of the food and the free radicals induced during the processing of the food. ESR spin trapping technique is useful for the detection of highly unstable radicals in the food. The antioxidant capability of liquid food and beverages in mainly performed by spin trapping technique.
When I was asked to give a companion lecture in support of ‘The Philosophy of Science’ (https://shorturl.at/4pUXz) I decided not to walk through the detail of the many methodologies in order of use. Instead, I chose to employ a long standing, and ongoing, scientific development as an exemplar. And so, I chose the ever evolving story of Thermodynamics as a scientific investigation at its best.
Conducted over a period of >200 years, Thermodynamics R&D, and application, benefitted from the highest levels of professionalism, collaboration, and technical thoroughness. New layers of application, methodology, and practice were made possible by the progressive advance of technology. In turn, this has seen measurement and modelling accuracy continually improved at a micro and macro level.
Perhaps most importantly, Thermodynamics rapidly became a primary tool in the advance of applied science/engineering/technology, spanning micro-tech, to aerospace and cosmology. I can think of no better a story to illustrate the breadth of scientific methodologies and applications at their best.
Distance oracle - Fast querying of the distance between any two points on a graph
1. A Geometric Distance Oracle for Large Real-World
Graphs
Hong, Ong Xuan
Data Science School
November 16, 2017
Hong, Ong Xuan (Data Science School), A Geometric Distance Oracle for Large Real-World Graphs, November 16, 2017
2. Contents
1 Introduction
2 Background
3 Related works
4 Proposed method
5 Evaluation
6 Results
7 Discussion
3. Introduction
Explosion of available information → mining information about interactions
between subscribers, groups, people, objects, etc.
A fundamental graph computation is the shortest-path distance between
arbitrary nodes, but:
Calculating and querying distances is slow.
Memory for storing the graph is limited.
How can we do this analysis effectively?
5. Background
Graph theory.
Distance oracle.
Approximate distance.
Metric space: Euclidean, Hyperbolic.
δ - hyperbolic metric space.
6. Graph theory
Let G(V , E) be an undirected, weighted graph with n = |V | nodes and
m = |E| edges. What is the distance between the nodes s and t?
Dijkstra's algorithm: O(m + n log n) with a Fibonacci heap, requires no
extra space.
Adjacency matrix: query time O(1), requires O(n²) extra space.
Floyd-Warshall algorithm: returns all-pairs shortest paths, initialized
in time O(n³).
How can we use less than O(n²) space and answer queries in less than
O(m + n log n)?
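As a point of reference for the Dijkstra baseline above, here is a minimal sketch in Python. It uses the standard-library binary heap rather than the Fibonacci heap the O(m + n log n) bound assumes, so its complexity is O((m + n) log n); the adjacency-list format is an illustrative choice.

```python
import heapq

def dijkstra(adj, s):
    # Single-source shortest paths from s.
    # adj maps each node to a list of (neighbor, edge_weight) pairs.
    dist = {s: 0}
    pq = [(0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; u was already settled with a shorter path
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

adj = {0: [(1, 1), (2, 4)], 1: [(0, 1), (2, 2)], 2: [(0, 4), (1, 2)]}
print(dijkstra(adj, 0))  # → {0: 0, 1: 1, 2: 3}
```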
7. Distance oracle
A distance oracle (ideally with constant query time) is a data structure
that is cheap to compute, fast to query, and satisfies four properties:
Preprocessing time: O(n) or O(n log n).
Storage: less than O(n²).
Query time: less than O(m + n log n).
Fidelity: approximate distances as close as possible to the actual
distances.
8. Approximate distance oracles
Thorup and Zwick use spanning trees and distance labeling to approximate
distances:
Preprocessing time: O(k·m·n^(1/k)).
Storage: O(k·n^(1+1/k)).
Query time: O(k).
Fidelity: estimated vs. actual distance within the stretch range [1, 2k − 1].
Note: k takes values 1, 2, ..., log n; values of k above log n do not
improve the space or preprocessing time.
9. Metric space
An ordered pair (M, d), where M is a set and d is a metric
d : M × M → R
such that ∀x, y, z ∈ M the following hold:
d(x, y) ≥ 0 (non-negativity)
d(x, y) = 0 ⇐⇒ x = y (identity of indiscernibles)
d(x, y) = d(y, x) (symmetry)
d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality)
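These axioms are easy to machine-check on a finite sample of points. The sketch below is illustrative (the function name and tolerance are choices of this text, not from the slides):

```python
import itertools
import math

def is_metric(points, d, tol=1e-9):
    # Check the four metric axioms on every pair/triple from a finite sample.
    for x, y in itertools.product(points, repeat=2):
        if d(x, y) < -tol:
            return False                      # non-negativity violated
        if (d(x, y) < tol) != (x == y):
            return False                      # identity of indiscernibles violated
        if abs(d(x, y) - d(y, x)) > tol:
            return False                      # symmetry violated
    for x, y, z in itertools.product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:
            return False                      # triangle inequality violated
    return True

pts = [(0, 0), (3, 4), (1, 1)]
print(is_metric(pts, math.dist))  # → True
```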
10. Euclidean distance
d(p, q) = d(q, p) = √((q1 − p1)² + (q2 − p2)² + ... + (qn − pn)²)
         = √(Σ_{i=1}^{n} (qi − pi)²)
11. Hyperbolic distance
d(⟨x1, y1⟩, ⟨x2, y2⟩) = arcosh(cosh(y1)·cosh(x2 − x1)·cosh(y2) − sinh(y1)·sinh(y2))
Where:
sinh(x) = (e^x − e^(−x)) / 2 (hyperbolic sine).
cosh(x) = (e^x + e^(−x)) / 2 (hyperbolic cosine).
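A direct transcription of this formula into Python (the function name is illustrative; the arcosh argument is clamped to guard against floating-point values slightly below its domain):

```python
import math

def hyperbolic_distance(p, q):
    # d(<x1, y1>, <x2, y2>) = arcosh(cosh y1 * cosh(x2 - x1) * cosh y2
    #                                - sinh y1 * sinh y2)
    x1, y1 = p
    x2, y2 = q
    arg = (math.cosh(y1) * math.cosh(x2 - x1) * math.cosh(y2)
           - math.sinh(y1) * math.sinh(y2))
    return math.acosh(max(arg, 1.0))  # clamp: arg can dip below 1 numerically

print(round(hyperbolic_distance((0.0, 0.0), (1.0, 0.0)), 9))  # → 1.0
```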
12. δ - hyperbolic metric space
A metric space (V , d) embeds into a tree metric iff the four-point
condition holds. ∀w, x, y, z ∈ V , consider the three pairwise sums:
S := S(w, x, y, z) = d(w, x) + d(y, z)
M := M(w, x, y, z) = d(x, y) + d(w, z)
L := L(w, x, y, z) = d(x, z) + d(w, y)
labeled so that S ≤ M ≤ L.
The space is δ-hyperbolic, for δ ≥ 0, if (L − M)/2 ≤ δ for every quadruple.
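The condition can be evaluated per quadruple with a few lines of Python; a tree metric (here, points on a line, an assumption chosen for illustration) yields (L − M)/2 = 0:

```python
def four_point_delta(d, w, x, y, z):
    # Sort the three pairwise sums so S <= M <= L, then return (L - M) / 2.
    S, M, L = sorted([d(w, x) + d(y, z),
                      d(x, y) + d(w, z),
                      d(x, z) + d(w, y)])
    return (L - M) / 2

line = lambda a, b: abs(a - b)  # distances on a line: a tree metric
print(four_point_delta(line, 0, 1, 2, 3))  # → 0.0
```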
14. Related works
Theoretical results provide guaranteed approximation bounds for
specific graph classes:
Distance labeling in hyperbolic graphs
A Note on Distance Approximating Trees in Graphs
Additive spanners and distance and routing labeling schemes for
hyperbolic graphs
A compact routing scheme and approximate distance oracle for
power-law graphs
Reconstructing approximate tree metrics
Essays in Group Theory
Diameters, centers, and approximating trees of δ-hyperbolic geodesic
spaces and graphs
However, these methods have not been empirically evaluated on real-world graphs.
15. Related works
Spanning trees:
Quick queries: O(n log n).
Reduced storage space.
16. Related works
Approximate distance oracles have been developed on empirical graphs:
small-world graphs, hypergrid graphs, Facebook, telecom, the Google news
graph, the web graph, etc.
Efficient Shortest Paths on Massive Social Graphs
Fast fully dynamic landmark-based estimation of shortest path
distances in very large graphs
Querying Shortest Path Distance with Bounded Errors in Large
Graphs
Orion: shortest path estimation for large social graphs
Approximating Shortest Paths in Social Graphs
Fast exact shortest-path distance queries on large networks by pruned
landmark labeling
Toward a distance oracle for billion-node graphs
These heuristics lack a theoretical foundation.
17. Related works
19. Proposed method
Hyperbolicity-based Breadth-First Search (HyperBFS). It exploits the
hyperbolicity of real-world networks to develop spanning trees with:
Height ≤ O(log n)
Distance queries: O(log n)
Storage: O(n) words of space for an n-node graph.
20. Algorithm
Hyperbolicity-based Tree Oracle: constructing the geometric oracle
Choose a highly central vertex (centrality measured via shortest paths) as
the root. In practice the out-degree is used instead, since degree and
centrality are correlated in power-law networks.
Build 1-10 trees (via BFS) with distinct roots, ordered by degree, for the
approximation → distance labeling can be computed in parallel.
The distance between x and y is the minimum of the distances in the
different trees constructed.
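A minimal sketch of this construction in Python (illustrative names; the LCA is found by naive pointer-walking rather than the distance labeling used at scale, and degree-ordered roots stand in for centrality):

```python
from collections import deque

def bfs_tree(adj, root):
    # BFS spanning tree: parent pointer and depth for every reachable node.
    parent, depth = {root: None}, {root: 0}
    q = deque([root])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v], depth[v] = u, depth[u] + 1
                q.append(v)
    return parent, depth

def tree_distance(parent, depth, x, y):
    # d_T(x, y) = depth[x] + depth[y] - 2 * depth[lca(x, y)]
    a, b = x, y
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:  # walk both up until they meet at the LCA
        a, b = parent[a], parent[b]
    return depth[x] + depth[y] - 2 * depth[a]

def build_oracle(adj, num_trees=3):
    # Roots: highest-degree vertices, the degree-based proxy for centrality.
    roots = sorted(adj, key=lambda u: len(adj[u]), reverse=True)[:num_trees]
    return [bfs_tree(adj, r) for r in roots]

def estimate_distance(trees, x, y):
    # The oracle's answer: minimum tree distance over all trees.
    return min(tree_distance(p, d, x, y) for p, d in trees)

adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
print(estimate_distance(build_oracle(adj, 2), 0, 3))  # → 2
```

Each tree distance upper-bounds the true graph distance, so taking the minimum over several trees can only tighten the estimate.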
21. Algorithm
Step 1: Embedding the graph into a multi-dimensional geometric space
Map the nodes of the graph to points in hyperbolic space.
The distance between two d-dimensional points x = (x1, x2, ..., xd) and
y = (y1, y2, ..., yd) is defined as:
d(x, y) = arcosh(√((1 + Σ_{i=1}^{d} xi²)(1 + Σ_{i=1}^{d} yi²)) − Σ_{i=1}^{d} xi·yi) · |c|
Note: no guarantees on the distance estimation error.
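A direct transcription of this embedding distance (the function name and default curvature c = −1 are illustrative choices, and the arcosh argument is clamped for floating-point safety):

```python
import math

def embedding_distance(x, y, c=-1.0):
    # arcosh( sqrt((1 + sum xi^2)(1 + sum yi^2)) - sum xi*yi ) * |c|
    sq_x = sum(v * v for v in x)
    sq_y = sum(v * v for v in y)
    dot = sum(a * b for a, b in zip(x, y))
    arg = math.sqrt((1 + sq_x) * (1 + sq_y)) - dot
    return math.acosh(max(arg, 1.0)) * abs(c)

print(embedding_distance((0.0, 0.0), (0.0, 0.0)))  # → 0.0
```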
22. Algorithm
Step 2: Gromov-type tree contraction, which improves the accuracy of
distance estimates:
Partition the tree into i-level connected components (coalescing multiple
edges into a single edge).
The additive error is guaranteed not to exceed 2δ log n, where δ is the
hyperbolicity constant of the graph.
24. Evaluation
Four benchmarked methods:
Gromov-type contraction-based tree.
Steiner trees with proven multiplicative bound.
Rigel: landmark-based approach.
HyperBFS: centrality-based spanning tree oracle.
25. Setup
2.4 GHz Intel(R) Xeon(R) processor with 190GB of RAM.
Distortion measures: let x, y be vertices of a graph G, let dG be their
true graph distance, and let dA be the distance approximated by a distance
oracle:
Additive distortion: dG − dA.
Absolute distortion: |dG − dA|.
Multiplicative distortion: |dG − dA| / dG.
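The three measures reduce to one-liners; a small helper (an illustrative name, not from the slides) makes the definitions concrete:

```python
def distortions(d_graph, d_oracle):
    # Additive, absolute, and multiplicative distortion of one estimate.
    additive = d_graph - d_oracle
    absolute = abs(d_graph - d_oracle)
    multiplicative = abs(d_graph - d_oracle) / d_graph
    return additive, absolute, multiplicative

# e.g. true distance 4, oracle answers 5:
print(distortions(4, 5))  # → (-1, 1, 0.25)
```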
Figure: Computational Time of Hyper BFS on Call Graph II.
27. Average absolute error
Figure: Average absolute error on various real-world graphs.
28. Average additive and multiplicative error
Figure: Average additive and multiplicative error on the Santa Barbara Facebook graph.
30. Discussion
Exact and approximate algorithms for computing the hyperbolicity of
large-scale graphs (N. Cohen, D. Coudert, A. Lancin):
Indexing and space: O(nm) vs. O(n).
Query: O(n) vs. O(log n).
Exact distances vs. an error bound of 2δ log n.
Extending metrics:
Local clustering coefficient: Ci = 2·|{ejk : vj, vk ∈ Ni, ejk ∈ E}| / (ki·(ki − 1))
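The local clustering coefficient counts the edges among a node's neighbors; a small Python sketch (illustrative adjacency-list form):

```python
def local_clustering(adj, i):
    # C_i = 2 * |{edges between neighbors of i}| / (k_i * (k_i - 1))
    nbrs = set(adj[i])
    k = len(nbrs)
    if k < 2:
        return 0.0  # undefined for degree < 2; report 0 by convention
    # each neighbor-neighbor edge is seen from both endpoints, so halve it
    links = sum(1 for u in nbrs for v in adj[u] if v in nbrs) // 2
    return 2 * links / (k * (k - 1))

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(local_clustering(triangle, 0))  # → 1.0
```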