Machine Learning for Smarter Apps with Tom Kraljevic
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Alex Tellez's slides on Deep Learning Applications, including using auto-encoders, finding better Bordeaux wine, and fighting crime in Chicago, from the 3/11/15 Meetup at H2O.ai HQ and the 3/12/15 Meetup at Mills College.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Note: Make sure to download the slides to get the high-resolution version!
Also, you can find the webinar recording here (please also download for better quality): https://www.dropbox.com/s/72qi6wjzi61gs3q/H2ODeepLearningArnoCandel052114.mov
Come hear how Deep Learning in H2O is unlocking never before seen performance for prediction!
H2O is google-scale open source machine learning engine for R & Big Data. Enterprises can now use all of their data without sampling and build intelligent applications. This live webinar introduces Distributed Deep Learning concepts, implementation and results from recent developments. Real world classification & regression use cases from eBay text dataset, MNIST handwritten digits and Cancer datasets will present the power of this game changing technology.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Scalable Data Science and Deep Learning with H2Oodsc
The era of Big Data has passed, and the era of sensory overload – that is, the proliferation of sensor data – is upon us. The challenge today is how to create the next generation of business and consumer applications that transform how we interact with sensors themselves. Applications need to learn from every user interaction and data point and predict what can happen next. The future depends on Machine Learning, as much as it depends on the data itself, to change the way we interact with these systems.
In this talk, we explain H2O’s scalable distributed in-memory math architecture and its design principles. The platform was built alongside (and on top of) both Hadoop and Spark clusters and includes interfaces for R, Python, Scala, Java, JavaScript and JSON, along with its interactive graphical Flow interface that make it easier for non-engineers to stitch together complete analytic workflows. We outline the implementation of distributed machine learning algorithms such as Elastic Net, Random Forest, Gradient Boosting and Deep Learning. We will present a broad range of use cases and live demos that include world-record deep learning models, anomaly detection tools and approaches for Kaggle data science competitions. We also demonstrate the applicability of H2O in enterprise environments for real-world customer production use cases. By the end of this presentation, you will know how to create your own machine learning workflows on your data using R, Python (iPython Notebooks) or the Flow GUI.
MLconf - Distributed Deep Learning for Classification and Regression Problems...Sri Ambati
Video recording (no audio?): http://new.livestream.com/accounts/7874891/events/3565981/videos/68114143 from 32:00 to 54:30
Deep Learning has been dominating recent machine learning competitions with better predictions. Unlike the neural networks of the past, modern Deep Learning methods have cracked the code for training stability and generalization. Deep Learning is not only the leader in image and speech recognition tasks, but is also emerging as the algorithm of choice for highest predictive performance in traditional business analytics. This talk introduces Deep Learning and implementation concepts in the open-source H2O in-memory prediction engine. Designed for the solution of business-critical problems on distributed compute clusters, it offers advanced features such as adaptive learning rate, dropout regularization, parameter tuning and a fully-featured R interface. World record performance on the classic MNIST dataset, best-in-class accuracy for a high-dimensional eBay text classification problem and other relevant datasets showcase the power of this game-changing technology. A whole new ecosystem of Intelligent Applications is emerging with Deep Learning at its core.
Bio:
Prior to joining 0xdata as Physicist & Hacker, Arno was a founding Senior MTS at Skytree where he designed and implemented high-performance machine learning algorithms. He has over a decade of experience in HPC with C++/MPI and had access to the world’s largest supercomputers as a Staff Scientist at SLAC National Accelerator Laboratory where he participated in US DOE scientific computing initiatives. While at SLAC, he authored the first curvilinear finite-element simulation code for space-charge dominated relativistic free electrons and scaled it to thousands of compute nodes. He also led a collaboration with CERN to model the electromagnetic performance of CLIC, a ginormous e+e- collider and potential successor of LHC. Arno has authored dozens of scientific papers and was a sought-after academic conference speaker. He holds a PhD and Masters summa cum laude in Physics from ETH Zurich. Arno was named 2014 Big Data All-Star by Fortune Magazine.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
H2O Open Source Deep Learning, Arno Candel 03-20-14Sri Ambati
More information in our Deep Learning webinar: http://www.slideshare.net/0xdata/h2-o-deeplearningarnocandel052114
Latest slide deck: http://www.slideshare.net/0xdata/h2o-distributed-deep-learning-by-arno-candel-071614
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Deep Learning in the Wild with Arno CandelSri Ambati
"Deep Learning in the Wild" Meetup at H2O, Mountain View
Livestream: http://t.co/o7p2hYcWgy (includes part 2 with Alex Tellez)
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Alex Tellez's slides on Deep Learning Applications, including using auto-encoders, finding better Bordeaux wine, and fighting crime in Chicago, from the 3/11/15 Meetup at H2O.ai HQ and the 3/12/15 Meetup at Mills College.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Note: Make sure to download the slides to get the high-resolution version!
Also, you can find the webinar recording here (please also download for better quality): https://www.dropbox.com/s/72qi6wjzi61gs3q/H2ODeepLearningArnoCandel052114.mov
Come hear how Deep Learning in H2O is unlocking never before seen performance for prediction!
H2O is google-scale open source machine learning engine for R & Big Data. Enterprises can now use all of their data without sampling and build intelligent applications. This live webinar introduces Distributed Deep Learning concepts, implementation and results from recent developments. Real world classification & regression use cases from eBay text dataset, MNIST handwritten digits and Cancer datasets will present the power of this game changing technology.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Scalable Data Science and Deep Learning with H2Oodsc
The era of Big Data has passed, and the era of sensory overload – that is, the proliferation of sensor data – is upon us. The challenge today is how to create the next generation of business and consumer applications that transform how we interact with sensors themselves. Applications need to learn from every user interaction and data point and predict what can happen next. The future depends on Machine Learning, as much as it depends on the data itself, to change the way we interact with these systems.
In this talk, we explain H2O’s scalable distributed in-memory math architecture and its design principles. The platform was built alongside (and on top of) both Hadoop and Spark clusters and includes interfaces for R, Python, Scala, Java, JavaScript and JSON, along with its interactive graphical Flow interface that make it easier for non-engineers to stitch together complete analytic workflows. We outline the implementation of distributed machine learning algorithms such as Elastic Net, Random Forest, Gradient Boosting and Deep Learning. We will present a broad range of use cases and live demos that include world-record deep learning models, anomaly detection tools and approaches for Kaggle data science competitions. We also demonstrate the applicability of H2O in enterprise environments for real-world customer production use cases. By the end of this presentation, you will know how to create your own machine learning workflows on your data using R, Python (iPython Notebooks) or the Flow GUI.
MLconf - Distributed Deep Learning for Classification and Regression Problems...Sri Ambati
Video recording (no audio?): http://new.livestream.com/accounts/7874891/events/3565981/videos/68114143 from 32:00 to 54:30
Deep Learning has been dominating recent machine learning competitions with better predictions. Unlike the neural networks of the past, modern Deep Learning methods have cracked the code for training stability and generalization. Deep Learning is not only the leader in image and speech recognition tasks, but is also emerging as the algorithm of choice for highest predictive performance in traditional business analytics. This talk introduces Deep Learning and implementation concepts in the open-source H2O in-memory prediction engine. Designed for the solution of business-critical problems on distributed compute clusters, it offers advanced features such as adaptive learning rate, dropout regularization, parameter tuning and a fully-featured R interface. World record performance on the classic MNIST dataset, best-in-class accuracy for a high-dimensional eBay text classification problem and other relevant datasets showcase the power of this game-changing technology. A whole new ecosystem of Intelligent Applications is emerging with Deep Learning at its core.
Bio:
Prior to joining 0xdata as Physicist & Hacker, Arno was a founding Senior MTS at Skytree where he designed and implemented high-performance machine learning algorithms. He has over a decade of experience in HPC with C++/MPI and had access to the world’s largest supercomputers as a Staff Scientist at SLAC National Accelerator Laboratory where he participated in US DOE scientific computing initiatives. While at SLAC, he authored the first curvilinear finite-element simulation code for space-charge dominated relativistic free electrons and scaled it to thousands of compute nodes. He also led a collaboration with CERN to model the electromagnetic performance of CLIC, a ginormous e+e- collider and potential successor of LHC. Arno has authored dozens of scientific papers and was a sought-after academic conference speaker. He holds a PhD and Masters summa cum laude in Physics from ETH Zurich. Arno was named 2014 Big Data All-Star by Fortune Magazine.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
H2O Open Source Deep Learning, Arno Candel 03-20-14Sri Ambati
More information in our Deep Learning webinar: http://www.slideshare.net/0xdata/h2-o-deeplearningarnocandel052114
Latest slide deck: http://www.slideshare.net/0xdata/h2o-distributed-deep-learning-by-arno-candel-071614
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Deep Learning in the Wild with Arno CandelSri Ambati
"Deep Learning in the Wild" Meetup at H2O, Mountain View
Livestream: http://t.co/o7p2hYcWgy (includes part 2 with Alex Tellez)
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Ted Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SFMLconf
Abstract: How graphs became just another big data primitive
Graph-shaped data is used in product recommendation systems, social network analysis, network threat detection, image de-noising, and many other important applications. And, a growing number of these applications will benefit from parallel distributed processing for graph featuring engineering, model training, and model serving. But today’s graph tools are riddled with limitations and shortcomings, such as a lack of language bindings, streaming support, and seamless integration with other popular data services. In this talk, we’ll argue that the key to doing more with graphs is doing less with specialized systems and more with systems already good at handling data of other shapes. We’ll examine some practical data science workflows to further motivate this argument and we’ll talk about some of the things that Intel is doing with the open source community and industry to make graphs just another big data primitive.
H2O Deep Water - Making Deep Learning Accessible to EveryoneSri Ambati
Deep Water is H2O's integration with multiple open source deep learning libraries such as TensorFlow, MXNet and Caffe. On top of the performance gains from GPU backends, Deep Water naturally inherits all H2O properties in scalability. ease of use and deployment. In this talk, I will go through the motivation and benefits of Deep Water. After that, I will demonstrate how to build and deploy deep learning models with or without programming experience using H2O's R/Python/Flow (Web) interfaces.
Jo-fai (or Joe) is a data scientist at H2O.ai. Before joining H2O, he was in the business intelligence team at Virgin Media in UK where he developed data products to enable quick and smart business decisions. He also worked remotely for Domino Data Lab in the US as a data science evangelist promoting products via blogging and giving talks at meetups. Joe has a background in water engineering. Before his data science journey, he was an EngD research engineer at STREAM Industrial Doctorate Centre working on machine learning techniques for drainage design optimization. Prior to that, he was an asset management consultant specialized in data mining and constrained optimization for the utilities sector in the UK and abroad. He also holds an MSc in Environmental Management and a BEng in Civil Engineering.
Streaming data analysis in real time is becoming the fastest and most efficient way to obtain useful knowledge from what is happening now, allowing organizations to react quickly when problems appear or to detect new trends helping to improve their performance. Evolving data streams are contributing to the growth of data created over the last few years. We are creating the same quantity of data every two days, as we created from the dawn of time up until 2003. Evolving data streams methods are becoming a low-cost, green methodology for real time online prediction and analysis. We discuss the current and future trends of mining evolving data streams, and the challenges that the field will have to overcome during the next years.
Dmitry will show the audience on how get started with Mxnet and building Deep Learning models to classify images, sound and text.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Arno candel scalabledatascienceanddeeplearningwithh2o_reworkboston2015Sri Ambati
https://www.re-work.co/events/deep-learning-boston-2015
Scalable Data Science & Deep Learning with H2O
H2O is fast scalable open-source machine learning and deep learning for Smarter Applications. Using in-memory compression techniques, H2O can handle billions of data rows in-memory — even on small compute clusters. The platform includes interfaces for R, Python, Scala, Java, JS and JSON, along with its interactive graphical Flow interface that make it easier for non-engineers to stitch together complete analytic workflows. H2O was built alongside (and on top of) both Hadoop and Spark clusters and is deployed within minutes. Sparkling Water combines the flexibility of Spark with the speed and accuracy of H2O's Machine Learning solution.
In this workshop, we explain H2O's scalable in-memory architecture and design principles and outline the implementation of distributed machine learning algorithms such as Elastic Net, Random Forest, Gradient Boosting and Deep Learning. We present a broad range of use cases and live demos that include world-record deep learning models, anomaly detection tools and approaches for Kaggle data science competitions. We also demonstrate the applicability of H2O in enterprise environments for real-world customer production use cases. We cover data ingest, feature engineering, model tuning, model validation and model selection; and how to take models into production. Live demos will be run on distributed systems. By the end of this workshop, you will know how to create your own machine learning models on your data using R, Python (iPython Notebooks) or Flow.
Arno is the Chief Architect of H2O, a distributed and scalable open-source machine learning platform. He is also the main author of H2O's Deep Learning. Before joining H2O, Arno was a founding Senior MTS at Skytree where he designed and implemented high-performance machine learning algorithms. He has over a decade of experience in HPC with C++/MPI and had access to the world’s largest supercomputers as a Staff Scientist at SLAC National Accelerator Laboratory where he participated in US DOE scientific computing initiatives and collaborated with CERN on next-generation particle accelerators.
Arno holds a PhD and Masters summa cum laude in Physics from ETH Zurich, Switzerland. He has authored dozens of scientific papers and is a sought-after conference speaker. Arno was named “2014 Big Data All-Star” by Fortune Magazine. Follow him on Twitter: @ArnoCandel.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Deep Water - GPU Deep Learning for H2O - Arno CandelSri Ambati
Deep Water brings the latest and greatest in the Deep Learning space all under the H2O hood. Use Tensorflow, Mxnet & Caffe all from standard H2O interface's including R, Python & Flow. Also deploy your models easily using the H2O platform.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
PyData 2015 Keynote: "A Systems View of Machine Learning" Joshua Bloom
Despite the growing abundance of powerful tools, building and deploying machine-learning frameworks into production continues to be major challenge, in both science and industry. I'll present some particular pain points and cautions for practitioners as well as recent work addressing some of the nagging issues. I advocate for a systems view, which, when expanded beyond the algorithms and codes to the organizational ecosystem, places some interesting constraints on the teams tasked with development and stewardship of ML products.
About: Dr. Joshua Bloom is an astronomy professor at the University of California, Berkeley where he teaches high-energy astrophysics and Python for data scientists. He has published over 250 refereed articles largely on time-domain transients events and telescope/insight automation. His book on gamma-ray bursts, a technical introduction for physical scientists, was published recently by Princeton University Press. He is also co-founder and CTO of wise.io, a startup based in Berkeley. Josh has been awarded the Pierce Prize from the American Astronomical Society; he is also a former Sloan Fellow, Junior Fellow at the Harvard Society, and Hertz Foundation Fellow. He holds a PhD from Caltech and degrees from Harvard and Cambridge University.
Deep Water - Bringing Tensorflow, Caffe, Mxnet to H2OSri Ambati
Arno Candel introduces Deep Water, which brings Tensorflow, Caffe, Mxnet to H2O. It also brings support for GPUs, image classification, NLP and much more to H2O.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
H2O.ai basic components and model deployment pipeline presented. Benchmark for scalability, speed and accuracy of machine learning libraries for classification presented from https://github.com/szilard/benchm-ml.
Scalable Data Science and Deep Learning with H2O
GOTO Conference Chicago May 12 2015
http://gotocon.com/chicago-2015/speaker/Arno+Candel
H2O is fast scalable open-source machine learning and deep learning for Smarter Applications. Using in-memory compression techniques, H2O can handle billions of data rows in-memory — even on small compute clusters. The platform includes interfaces for R, Python, Scala, Java, JS and JSON, along with its interactive graphical Flow interface that make it easier for non-engineers to stitch together complete analytic workflows. H2O was built alongside (and on top of) both Hadoop and Spark clusters and is deployed within minutes. Sparkling Water combines the flexibility of Spark with the speed and accuracy of H2O's Machine Learning solution.
In this talk, we explain H2O's scalable in-memory architecture and design principles and outline the implementation of distributed machine learning algorithms such as Elastic Net, Random Forest, Gradient Boosting and Deep Learning. We will present a broad range of use cases and live demos that include world-record deep learning models, anomaly detection tools and approaches for Kaggle data science competitions. We also demonstrate the applicability of H2O in enterprise environments for real-world customer production use cases. We will cover data ingest, feature engineering, model tuning, model validation and model selection; and how to take models into production. Live demos will be run on distributed systems. By the end of this presentation, you will know how to create your own machine learning models on your data using R, Python (iPython Notebooks) or Flow.
Bio:
Arno is the Chief Architect of H2O, a distributed and scalable open-source machine learning platform. He is also the main author of H2O's Deep Learning. Before joining H2O, Arno was a founding Senior MTS at Skytree where he designed and implemented high-performance machine learning algorithms. He has over a decade of experience in HPC with C++/MPI and had access to the world’s largest supercomputers as a Staff Scientist at SLAC National Accelera
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Productive Use of the Apache Spark Prompt with Sam PenroseDatabricks
Effective programmers work in tight loops: making a small code edit, observing its effect on their system, and repeating. When your data is too big to read and your system isn’t local, println() won’t work. Fortunately, the Spark DataFrame and Dataset APIs have your back. Attendees will leave with better tools for exploring large datasets and debugging distributed code with Spark, and a better mental model of distributed programming at scale.
Building High Available and Scalable Machine Learning ApplicationsYalçın Yenigün
The slide contains some high level information about some machine learning algorithms, cross validation and feature extraction techniques. It also contains high level techniques about high available and scalable ML products.
These slides will show how to approach a multi-class (classification) problem using H2O. The data that is being used is an aggregated log of multiple systems that are constantly providing information about their status, connections and traffic. In large organizations, these log datasets can be very huge and unidentifiable due to the number of sources, legacy systems etc. In our example, we use a created response for each source. The use H2O to classify the source of data.
Author Bio: Ashrith Barthur is a Security Scientist at H2O currently working on algorithms that detect anomalous behaviour in user activities, network traffic, attacks, financial fraud and global money movement. He has a PhD from Purdue University in the field of information security, specialized in Anomalous behaviour in DNS protocol.
Don’t forget to download H2O!
http://www.h2o.ai/download/
Ted Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SFMLconf
Abstract: How graphs became just another big data primitive
Graph-shaped data is used in product recommendation systems, social network analysis, network threat detection, image de-noising, and many other important applications. And, a growing number of these applications will benefit from parallel distributed processing for graph featuring engineering, model training, and model serving. But today’s graph tools are riddled with limitations and shortcomings, such as a lack of language bindings, streaming support, and seamless integration with other popular data services. In this talk, we’ll argue that the key to doing more with graphs is doing less with specialized systems and more with systems already good at handling data of other shapes. We’ll examine some practical data science workflows to further motivate this argument and we’ll talk about some of the things that Intel is doing with the open source community and industry to make graphs just another big data primitive.
H2O Deep Water - Making Deep Learning Accessible to EveryoneSri Ambati
Deep Water is H2O's integration with multiple open source deep learning libraries such as TensorFlow, MXNet and Caffe. On top of the performance gains from GPU backends, Deep Water naturally inherits all H2O properties in scalability. ease of use and deployment. In this talk, I will go through the motivation and benefits of Deep Water. After that, I will demonstrate how to build and deploy deep learning models with or without programming experience using H2O's R/Python/Flow (Web) interfaces.
Jo-fai (or Joe) is a data scientist at H2O.ai. Before joining H2O, he was in the business intelligence team at Virgin Media in UK where he developed data products to enable quick and smart business decisions. He also worked remotely for Domino Data Lab in the US as a data science evangelist promoting products via blogging and giving talks at meetups. Joe has a background in water engineering. Before his data science journey, he was an EngD research engineer at STREAM Industrial Doctorate Centre working on machine learning techniques for drainage design optimization. Prior to that, he was an asset management consultant specialized in data mining and constrained optimization for the utilities sector in the UK and abroad. He also holds an MSc in Environmental Management and a BEng in Civil Engineering.
Streaming data analysis in real time is becoming the fastest and most efficient way to obtain useful knowledge from what is happening now, allowing organizations to react quickly when problems appear or to detect new trends helping to improve their performance. Evolving data streams are contributing to the growth of data created over the last few years. We are creating the same quantity of data every two days, as we created from the dawn of time up until 2003. Evolving data streams methods are becoming a low-cost, green methodology for real time online prediction and analysis. We discuss the current and future trends of mining evolving data streams, and the challenges that the field will have to overcome during the next years.
Dmitry will show the audience on how get started with Mxnet and building Deep Learning models to classify images, sound and text.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Arno candel scalabledatascienceanddeeplearningwithh2o_reworkboston2015Sri Ambati
https://www.re-work.co/events/deep-learning-boston-2015
Scalable Data Science & Deep Learning with H2O
H2O is fast scalable open-source machine learning and deep learning for Smarter Applications. Using in-memory compression techniques, H2O can handle billions of data rows in-memory — even on small compute clusters. The platform includes interfaces for R, Python, Scala, Java, JS and JSON, along with its interactive graphical Flow interface that make it easier for non-engineers to stitch together complete analytic workflows. H2O was built alongside (and on top of) both Hadoop and Spark clusters and is deployed within minutes. Sparkling Water combines the flexibility of Spark with the speed and accuracy of H2O's Machine Learning solution.
In this workshop, we explain H2O's scalable in-memory architecture and design principles and outline the implementation of distributed machine learning algorithms such as Elastic Net, Random Forest, Gradient Boosting and Deep Learning. We present a broad range of use cases and live demos that include world-record deep learning models, anomaly detection tools and approaches for Kaggle data science competitions. We also demonstrate the applicability of H2O in enterprise environments for real-world customer production use cases. We cover data ingest, feature engineering, model tuning, model validation and model selection; and how to take models into production. Live demos will be run on distributed systems. By the end of this workshop, you will know how to create your own machine learning models on your data using R, Python (iPython Notebooks) or Flow.
Arno is the Chief Architect of H2O, a distributed and scalable open-source machine learning platform. He is also the main author of H2O's Deep Learning. Before joining H2O, Arno was a founding Senior MTS at Skytree where he designed and implemented high-performance machine learning algorithms. He has over a decade of experience in HPC with C++/MPI and had access to the world’s largest supercomputers as a Staff Scientist at SLAC National Accelerator Laboratory where he participated in US DOE scientific computing initiatives and collaborated with CERN on next-generation particle accelerators.
Arno holds a PhD and Masters summa cum laude in Physics from ETH Zurich, Switzerland. He has authored dozens of scientific papers and is a sought-after conference speaker. Arno was named “2014 Big Data All-Star” by Fortune Magazine. Follow him on Twitter: @ArnoCandel.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Deep Water - GPU Deep Learning for H2O - Arno CandelSri Ambati
Deep Water brings the latest and greatest in the Deep Learning space all under the H2O hood. Use Tensorflow, Mxnet & Caffe all from standard H2O interface's including R, Python & Flow. Also deploy your models easily using the H2O platform.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
PyData 2015 Keynote: "A Systems View of Machine Learning" Joshua Bloom
Despite the growing abundance of powerful tools, building and deploying machine-learning frameworks into production continues to be major challenge, in both science and industry. I'll present some particular pain points and cautions for practitioners as well as recent work addressing some of the nagging issues. I advocate for a systems view, which, when expanded beyond the algorithms and codes to the organizational ecosystem, places some interesting constraints on the teams tasked with development and stewardship of ML products.
About: Dr. Joshua Bloom is an astronomy professor at the University of California, Berkeley where he teaches high-energy astrophysics and Python for data scientists. He has published over 250 refereed articles largely on time-domain transients events and telescope/insight automation. His book on gamma-ray bursts, a technical introduction for physical scientists, was published recently by Princeton University Press. He is also co-founder and CTO of wise.io, a startup based in Berkeley. Josh has been awarded the Pierce Prize from the American Astronomical Society; he is also a former Sloan Fellow, Junior Fellow at the Harvard Society, and Hertz Foundation Fellow. He holds a PhD from Caltech and degrees from Harvard and Cambridge University.
Deep Water - Bringing Tensorflow, Caffe, Mxnet to H2OSri Ambati
Arno Candel introduces Deep Water, which brings Tensorflow, Caffe, Mxnet to H2O. It also brings support for GPUs, image classification, NLP and much more to H2O.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
H2O.ai basic components and model deployment pipeline presented. Benchmark for scalability, speed and accuracy of machine learning libraries for classification presented from https://github.com/szilard/benchm-ml.
Scalable Data Science and Deep Learning with H2O
GOTO Conference Chicago May 12 2015
http://gotocon.com/chicago-2015/speaker/Arno+Candel
H2O is fast scalable open-source machine learning and deep learning for Smarter Applications. Using in-memory compression techniques, H2O can handle billions of data rows in-memory — even on small compute clusters. The platform includes interfaces for R, Python, Scala, Java, JS and JSON, along with its interactive graphical Flow interface that make it easier for non-engineers to stitch together complete analytic workflows. H2O was built alongside (and on top of) both Hadoop and Spark clusters and is deployed within minutes. Sparkling Water combines the flexibility of Spark with the speed and accuracy of H2O's Machine Learning solution.
In this talk, we explain H2O's scalable in-memory architecture and design principles and outline the implementation of distributed machine learning algorithms such as Elastic Net, Random Forest, Gradient Boosting and Deep Learning. We will present a broad range of use cases and live demos that include world-record deep learning models, anomaly detection tools and approaches for Kaggle data science competitions. We also demonstrate the applicability of H2O in enterprise environments for real-world customer production use cases. We will cover data ingest, feature engineering, model tuning, model validation and model selection; and how to take models into production. Live demos will be run on distributed systems. By the end of this presentation, you will know how to create your own machine learning models on your data using R, Python (iPython Notebooks) or Flow.
Bio:
Arno is the Chief Architect of H2O, a distributed and scalable open-source machine learning platform. He is also the main author of H2O's Deep Learning. Before joining H2O, Arno was a founding Senior MTS at Skytree where he designed and implemented high-performance machine learning algorithms. He has over a decade of experience in HPC with C++/MPI and had access to the world’s largest supercomputers as a Staff Scientist at SLAC National Accelera
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Productive Use of the Apache Spark Prompt with Sam PenroseDatabricks
Effective programmers work in tight loops: making a small code edit, observing its effect on their system, and repeating. When your data is too big to read and your system isn’t local, println() won’t work. Fortunately, the Spark DataFrame and Dataset APIs have your back. Attendees will leave with better tools for exploring large datasets and debugging distributed code with Spark, and a better mental model of distributed programming at scale.
Building High Available and Scalable Machine Learning ApplicationsYalçın Yenigün
The slide contains some high level information about some machine learning algorithms, cross validation and feature extraction techniques. It also contains high level techniques about high available and scalable ML products.
These slides will show how to approach a multi-class (classification) problem using H2O. The data that is being used is an aggregated log of multiple systems that are constantly providing information about their status, connections and traffic. In large organizations, these log datasets can be very huge and unidentifiable due to the number of sources, legacy systems etc. In our example, we use a created response for each source. The use H2O to classify the source of data.
Author Bio: Ashrith Barthur is a Security Scientist at H2O currently working on algorithms that detect anomalous behaviour in user activities, network traffic, attacks, financial fraud and global money movement. He has a PhD from Purdue University in the field of information security, specialized in Anomalous behaviour in DNS protocol.
Don’t forget to download H2O!
http://www.h2o.ai/download/
H2O Deep Learning through Examples, Silicon Valley Big Data Science Meetup, Mountain View, 2/12/15
http://www.meetup.com/Silicon-Valley-Big-Data-Science/events/219790984/?a=md1_grp&rv=md1&_af_eid=219790984&_af=event
Live Stream: http://new.livestream.com/accounts/10932136/events/3806139
H2O World 2015
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Strata San Jose 2016: Scalable Ensemble Learning with H2OSri Ambati
Erin LeDell's presentation on Scalable Ensemble Learning with H2O at Strata + Hadoop World San Jose, 03.29.16
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Michal Malohlava from H2O.ai talks about the new features in Sparkling Water 2.0 and the future roadmap.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
33insights: Investing in WineTech, Global InsightsVincent PRETET
At 33entrepreneurs, we focus on three verticals: wine, food and tourism. In those fields, we are looking for the most promising teams, the boldest ideas, the most exciting technology. As a consequence, we invest significantly into market analysis and surveys. We have a team of Tech Analysts investigating WineTech, FoodTech and TourismTech StartUps. We also have some secret sauce... We are pleased to share some insights here on the 5,500 StartUps we are monitoring.
You will find the following survey based on 550 Wine StartUps worldwide, that we keep up to date on. More is to come on 33insights.
HKIWSF Conference - Liv-ex on Fine Wine Investment Seminar 2009Livex
Liv-ex Director James Miles' presentation on “The history and development of the fine wine investment market”, given at the Hong Kong International Wine & Sprits Fair Trade Conference.
H2O’s AI platform provides open source machine learning framework that works with sparklyr and PySpark. H2O’s Sparkling Water allows users to combine the fast, scalable machine learning algorithms of H2O with the capabilities of Spark. With Sparkling Water, users can drive computation from Scala/R/Python and utilize the H2O Flow UI, providing an ideal machine learning platform for application developers. H2O's open AutoML also fully automates the process training ML algorithms, tuning the right parameters and building ensemble models. Setting up an environment to perform advanced analytics on top of big data is hard, but with H2O Sparkling Water for HDInsight, customers can get started with just a few clicks. This solution will install Sparkling Water on an HDInsight Spark cluster so you can exploit all the benefits from both Spark and H2O. The solution can access data from Azure Blob storage and/or Azure Data Lake Store in addition to all the standard data sources that H2O support. It also provides Jupyter Notebooks with in-built examples for an easy jumpstart, and a user-friendly H2O FLOW UI to monitor and debug the applications.
Hadoop and the Data Warehouse: Point/Counter PointInside Analysis
Robin Bloor and Teradata
Live Webcast on April 22, 2014
Watch the archive:
https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=2e69345c0a6a4e5a8de6fc72652e3bc6
Can you replace the data warehouse with Hadoop? Is Hadoop an ideal ETL subsystem? And what is the real magic of Hadoop? Everyone is looking to capitalize on the insights that lie in the vast pools of big data. Generating the value of that data relies heavily on several factors, especially choosing the right solution for the right context. With so many options out there, how do organizations best integrate these new big data solutions with the existing data warehouse environment?
Register for this episode of The Briefing Room to hear veteran analyst Dr. Robin Bloor as he explains where Hadoop fits into the information ecosystem. He’ll be briefed by Dan Graham of Teradata, who will offer perspective on how Hadoop can play a critical role in the analytic architecture. Bloor and Graham will interactively discuss big data in the big picture of the data center and will also seek to dispel several common misconceptions about Hadoop.
Visit InsideAnlaysis.com for more information.
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Precisely
Tackling the challenge of designing a machine learning model and putting it into production is the key to getting value back – and the roadblock that stops many promising machine learning projects. After the data scientists have done their part, engineering robust production data pipelines has its own set of challenges. Syncsort software helps the data engineer every step of the way.
Building on the process of finding and matching duplicates to resolve entities, the next step is to set up a continuous streaming flow of data from data sources so that as the sources change, new data automatically gets pushed through the same transformation and cleansing data flow – into the arms of machine learning models.
Some of your sources may already be streaming, but the rest are sitting in transactional databases that change hundreds or thousands of times a day. The challenge is that you can’t affect performance of data sources that run key applications, so putting something like database triggers in place is not the best idea. Using Apache Kafka or similar technologies as the backbone to moving data around doesn’t solve the problem of needing to grab changes from the source pushing them into Kafka and consuming the data from Kafka to be processed. If something unexpected happens – like connectivity is lost on either the source or the target side, you don’t want to have to fix it or start over because the data is out of sync.
View this 15-minute webcast on-demand to learn how to tackle these challenges in large scale production implementations.
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022HostedbyConfluent
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
Event-first thinking and streaming help organizations transition from followers to leaders in the market. A reliable, scalable, and economical streaming architecture helps them get there.
This talk first explores the ""classic streaming stack,"" based on the Lambda architecture, its origin, and why it didn't pick up amongst data-driven organizations. The modern streaming stack (MSS) is a lean, cloud-native, and economical alternative to classic streaming architectures, where it aims to make event-driven real-time applications viable for organizations.
The second half of the talk explores the MSS in detail, including its core components, their purposes, and how Kappa architecture has influenced it. Moreover, the talk lays out a few considerations before planning a new streaming application within an organization. The talk concludes by discussing the challenges in the streaming world and how vendors are trying to overcome them in the future.
Eric Proegler Oredev Performance Testing in New ContextsEric Proegler
Virtualization, Cloud Deployments, and Cloud-Based Tools have challenged and changed performance testing practices. Today’s performance tester can summons tens of thousands of virtual users from the cloud in a few minutes at a cost far lower than the expensive on-premise installations of yesteryear.
Meanwhile, systems under test have changed more. Updated software stacks have increased the complexity of scripting and performance measurement, but the biggest changes are in the nature and quantities of resources powering the systems. Interpreting resource usage when resources are shared on a private virtualization platform is exceedingly difficult. Understanding resources when they live in a large public cloud is impossible.
New developers and teams are now polyglot :
- they use multiple programming languages (Java, Javascript, Ruby, ...)
- they use multiple persistence store (RDBMS, NoSQL, Hadoop)
In this talk you will learn about the benefits if being polyglot: use the good language or framework for the good cause, select the good persistence for specific constraints.
This presentation will show how developer could mix the Java platform with other technologies such as NodeJS and AngularJS to build application in a more productive way. This is also the opportunity to talk about the new Command Query Responsibility Segregation (CQRS) pattern to allow developers to be more effective and deliver the proper application to the user quicker.
This presentation was delivered during Devfest Nantes 2014
Intro to Machine Learning with H2O and AWSSri Ambati
Navdeep Gill @ Galvanize Seattle- May 2016
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...MSAdvAnalytics
Lance Olson. Cortana Analytics is a fully managed big data and advanced analytics suite that helps you transform your data into intelligent action. Come to this two-part session to learn how you can do "big data" processing and storage in Cortana Analytics. In the first part, we will provide an overview of the processing and storage services. We will then talk about the patterns and use cases which make up most big data solutions. In the second part, we will go hands-on, showing you how to get started today with writing batch/interactive queries, real-time stream processing, or NoSQL transactions all over the same repository of data. Crunch petabytes of data by scaling out your computation power to any sized cluster. Store any amount of unstructured data in its native format with no limits to file or account size. All of this can be done with no hardware to acquire or maintain and minimal time to setup giving you the value of "big data" within minutes. Go to https://channel9.msdn.com/ to find the recording of this session.
(Big) Data Processing for Next Generation Business Value. Presented at the Leaders Buildings Leaders Conference, held at Union College on April 3, 2015.
https://www.ucollege.edu/academics/business-and-computer-science/leaders-building-leaders
This presentation is a semi-technical overview of big data and related use-cases, the Apache Hadoop software stack, and some example data-science / analysis models.
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
Whether to take data ingestion cycles off the ETL tool and the data warehouse or to facilitate competitive Data Science and building algorithms in the organization, the data lake – a place for unmodeled and vast data – will be provisioned widely in 2020.
Though it doesn’t have to be complicated, the data lake has a few key design points that are critical, and it does need to follow some principles for success. Avoid building the data swamp, but not the data lake! The tool ecosystem is building up around the data lake and soon many will have a robust lake and data warehouse. We will discuss policy to keep them straight, send data to its best platform, and keep users’ confidence up in their data platforms.
Data lakes will be built in cloud object storage. We’ll discuss the options there as well.
Get this data point for your data lake journey.
The next generation user experience should move to customer engagement zones along their preferred channels with desired action to outcome approaches. With scores of information ranging from inventory to inquiry, weather to warehouse alerts, product to promotion info at disposal, enterprise digitization can create value at every customer touch point. Attendees witnessed the manifestation of TCS’ Thought Leadership in the Game of Retail.
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Sri Ambati
Sandeep Singh, Head of Applied AI Computer Vision, Beans.ai
H2O Open Source GenAI World SF 2023
In the modern era of machine learning, leveraging both open-source and closed-source solutions has become paramount for achieving cutting-edge results. This talk delves into the intricacies of seamlessly integrating open-source Large Language Model (LLM) solutions like Vicuna, Falcon, and Llama with industry giants such as ChatGPT and Google's Palm. As the demand for fine-tuned and specialized datasets grows, it is imperative to understand the synergy between these tools. Attendees will gain insights into best practices for building and enriching datasets tailored for fine-tuning tasks, ensuring that their LLM projects are both robust and efficient. Through real-world examples and hands-on demonstrations, this talk will equip attendees with the knowledge to harness the power of both open and closed-source tools in a coherent and effective manner.
Patrick Hall, Professor, AI Risk Management, The George Washington University
H2O Open Source GenAI World SF 2023
Language models are incredible engineering breakthroughs but require auditing and risk management before productization. These systems raise concerns about toxicity, transparency and reproducibility, intellectual property licensing and ownership, disinformation and misinformation, supply chains, and more. How can your organization leverage these new tools without taking on undue or unknown risks? While language models and associated risk management are in their infancy, a small number of best practices in governance and risk are starting to emerge. If you have a language model use case in mind, want to understand your risks, and do something about them, this presentation is for you!
Dr. Alexy Khrabrov, Open Source Science Community Director, IBM
H2O Open Source GenAI World SF 2023
In this talk, Dr. Alexy Khrabrov, recently elected Chair of the new Generative AI Commons at Linux Foundation for AI & Data, outlines the OSS AI landscape, challenges, and opportunities. With new models and frameworks being unveiled weekly, one thing remains constant: community building and validation of all aspects of AI is key to reliable and responsible AI we can use for business and society needs. Industrial AI is one key area where such community validation can prove invaluable.
Michelle Tanco, Head of Product, H2O.ai
H2O Open Source GenAI World SF 2023
Learn how the makers at H2O.ai are building internal tools to solve real use cases using H2O Wave and h2oGPT. We will walk through an end-to-end use case and discuss how to incorporate business rules and generated content to rapidly develop custom AI apps using only Python APIs.
Applied Gen AI for the Finance Vertical Sri Ambati
Megan Kurka, Vice President, Customer Data Scientist, H2O.ai
H2O Open Source GenAI World SF 2023
Discover the transformative power of Applied Gen AI. Learn how the H2O team builds customized applications and workflows that integrate capabilities of Gen AI and AutoML specifically designed to address and enhance financial use cases. Explore real world examples, learn best practices, and witness firsthand how our innovative solutions are reshaping the landscape of finance technology.
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Sri Ambati
Pascal Pfeiffer, Principal Data Scientist, H2O.ai
H2O Open Source GenAI World SF 2023
This talk dives into the expansive ecosystem of Large Language Models (LLMs), offering practitioners an insightful guide to various relevant applications, from natural language understanding to creative content generation. While exploring use cases across different industries, it also honestly addresses the current limitations of LLMs and anticipates future advancements.
Introducción al Aprendizaje Automatico con H2O-3 (1)Sri Ambati
En esta reunión virtual, damos una introducción a la plataforma de aprendizaje automático de código abierto número 1, H2O-3 y te mostramos cómo puedes usarla para desarrollar modelos para resolver diferentes casos de uso.
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...Sri Ambati
Numerai is an open, crowd-sourced hedge fund powered by predictions from data scientists around the world. In return, participants are rewarded with weekly payouts in crypto.
In this talk, Joe will give an overview of the Numerai tournament based on his own experience. He will then explain how he automates the time-consuming tasks such as testing different modelling strategies, scoring new datasets, submitting predictions to Numerai as well as monitoring model performance with H2O Driverless AI and R.
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...Sri Ambati
In this session, you will learn about what you should do after you’ve taken an AI transformation baseline. Over the span of this session, we will discuss the next steps in moving toward AI readiness through alignment of talent and tools to drive successful adoption and continuous use within an organization.
To find additional videos on AI courses, earn badges, join the courses at H2O.ai Learning Center: https://training.h2o.ai/products/ai-foundations-course
To find the Youtube video about this presentation: https://youtu.be/K1Cl3x3rd8g
Speaker:
Chemere Davis (H2O.ai - Senior Data Scientist Training Specialist)
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Globus
The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its management, curation, sharing, delivering, and preservation approaches for large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows leveraging its current investment in Globus. The primary outcome of this partnership includes the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of the Globus platform offerings, including Globus Flow, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed and provide relevant project progress.
top nidhi software solution freedownloadvrstrong314
This presentation emphasizes the importance of data security and legal compliance for Nidhi companies in India. It highlights how online Nidhi software solutions, like Vector Nidhi Software, offer advanced features tailored to these needs. Key aspects include encryption, access controls, and audit trails to ensure data security. The software complies with regulatory guidelines from the MCA and RBI and adheres to Nidhi Rules, 2014. With customizable, user-friendly interfaces and real-time features, these Nidhi software solutions enhance efficiency, support growth, and provide exceptional member services. The presentation concludes with contact information for further inquiries.
Cyaniclab : Software Development Agency Portfolio.pdfCyanic lab
CyanicLab, an offshore custom software development company based in Sweden,India, Finland, is your go-to partner for startup development and innovative web design solutions. Our expert team specializes in crafting cutting-edge software tailored to meet the unique needs of startups and established enterprises alike. From conceptualization to execution, we offer comprehensive services including web and mobile app development, UI/UX design, and ongoing software maintenance. Ready to elevate your business? Contact CyanicLab today and let us propel your vision to success with our top-notch IT solutions.
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Mind IT Systems
Healthcare providers often struggle with the complexities of chronic conditions and remote patient monitoring, as each patient requires personalized care and ongoing monitoring. Off-the-shelf solutions may not meet these diverse needs, leading to inefficiencies and gaps in care. It’s here, custom healthcare software offers a tailored solution, ensuring improved care and effectiveness.
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...Juraj Vysvader
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I didn't get rich from it but it did have 63K downloads (powered possible tens of thousands of websites).
Developing Distributed High-performance Computing Capabilities of an Open Sci...Globus
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...informapgpstrackings
Keep tabs on your field staff effortlessly with Informap Technology Centre LLC. Real-time tracking, task assignment, and smart features for efficient management. Request a live demo today!
For more details, visit us : https://informapuae.com/field-staff-tracking/
May Marketo Masterclass, London MUG May 22 2024.pdfAdele Miller
Can't make Adobe Summit in Vegas? No sweat because the EMEA Marketo Engage Champions are coming to London to share their Summit sessions, insights and more!
This is a MUG with a twist you don't want to miss.
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisGlobus
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
Understanding Globus Data Transfers with NetSageGlobus
NetSage is an open privacy-aware network measurement, analysis, and visualization service designed to help end-users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks world wide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for Flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several different example use cases that NetSage can answer, including: Who is using Globus to share data with my institution, and what kind of performance are they able to achieve? How many transfers has Globus supported for us? Which sites are we sharing the most data with, and how is that changing over time? How is my site using Globus to move data internally, and what kind of performance do we see for those transfers? What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to the Globus users?
In software engineering, the right architecture is essential for robust, scalable platforms. Wix has undergone a pivotal shift from event sourcing to a CRUD-based model for its microservices. This talk will chart the course of this pivotal journey.
Event sourcing, which records state changes as immutable events, provided robust auditing and "time travel" debugging for Wix Stores' microservices. Despite its benefits, the complexity it introduced in state management slowed development. Wix responded by adopting a simpler, unified CRUD model. This talk will explore the challenges of event sourcing and the advantages of Wix's new "CRUD on steroids" approach, which streamlines API integration and domain event management while preserving data integrity and system resilience.
Participants will gain valuable insights into Wix's strategies for ensuring atomicity in database updates and event production, as well as caching, materialization, and performance optimization techniques within a distributed system.
Join us to discover how Wix has mastered the art of balancing simplicity and extensibility, and learn how the re-adoption of the modest CRUD has turbocharged their development velocity, resilience, and scalability in a high-growth environment.
How to Position Your Globus Data Portal for Success Ten Good PracticesGlobus
Science gateways allow science and engineering communities to access shared data, software, computing services, and instruments. Science gateways have gained a lot of traction in the last twenty years, as evidenced by projects such as the Science Gateways Community Institute (SGCI) and the Center of Excellence on Science Gateways (SGX3) in the US, The Australian Research Data Commons (ARDC) and its platforms in Australia, and the projects around Virtual Research Environments in Europe. A few mature frameworks have evolved with their different strengths and foci and have been taken up by a larger community such as the Globus Data Portal, Hubzero, Tapis, and Galaxy. However, even when gateways are built on successful frameworks, they continue to face the challenges of ongoing maintenance costs and how to meet the ever-expanding needs of the community they serve with enhanced features. It is not uncommon that gateways with compelling use cases are nonetheless unable to get past the prototype phase and become a full production service, or if they do, they don't survive more than a couple of years. While there is no guaranteed pathway to success, it seems likely that for any gateway there is a need for a strong community and/or solid funding streams to create and sustain its success. With over twenty years of examples to draw from, this presentation goes into detail for ten factors common to successful and enduring gateways that effectively serve as best practices for any new or developing gateway.
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdfJay Das
With the advent of artificial intelligence or AI tools, project management processes are undergoing a transformative shift. By using tools like ChatGPT, and Bard organizations can empower their leaders and managers to plan, execute, and monitor projects more effectively.
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work the team is investigating ways to speedup the time to solution for many different parts of the DIII-D workflow including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks and we describe a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
Software Engineering, Software Consulting, Tech Lead.
Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Security,
Spring Transaction, Spring MVC,
Log4j, REST/SOAP WEB-SERVICES.
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteGoogle
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
👉👉 Click Here To Get More Info 👇👇
https://sumonreview.com/ai-pilot-review/
AI Pilot Review: Key Features
✅Deploy AI expert bots in Any Niche With Just A Click
✅With one keyword, generate complete funnels, websites, landing pages, and more.
✅More than 85 AI features are included in the AI pilot.
✅No setup or configuration; use your voice (like Siri) to do whatever you want.
✅You Can Use AI Pilot To Create your version of AI Pilot And Charge People For It…
✅ZERO Manual Work With AI Pilot. Never write, Design, Or Code Again.
✅ZERO Limits On Features Or Usages
✅Use Our AI-powered Traffic To Get Hundreds Of Customers
✅No Complicated Setup: Get Up And Running In 2 Minutes
✅99.99% Up-Time Guaranteed
✅30 Days Money-Back Guarantee
✅ZERO Upfront Cost
See My Other Reviews Article:
(1) TubeTrivia AI Review: https://sumonreview.com/tubetrivia-ai-review
(2) SocioWave Review: https://sumonreview.com/sociowave-review
(3) AI Partner & Profit Review: https://sumonreview.com/ai-partner-profit-review
(4) AI Ebook Suite Review: https://sumonreview.com/ai-ebook-suite-review
Quarkus Hidden and Forbidden ExtensionsMax Andersen
Quarkus has a vast extension ecosystem and is known for its subsonic and subatomic feature set. Some of these features are not as well known, and some extensions are less talked about, but that does not make them less interesting - quite the opposite.
Come join this talk to see some tips and tricks for using Quarkus and some of the lesser known features, extensions and development techniques.
2. • About H2O.ai and H2O (10 minutes)
• H2O in Big Data Environments (10 minutes)
• How H2O Processes Data (10 minutes)
• Building Smarter Apps with H2O (20 minutes)
• (Demo) Storm App w/ Streaming Predictions (10 minutes)
• Q & A (20 minutes)
Content for today’s talk can be found at:
https://github.com/h2oai/h2o-
meetups/tree/master/2015_01_28_MLForSmarterApps
Outline for today’s talk
H2O.ai
Machine Intelligence
5. H2O.ai
Machine Intelligence
cientific Advisory Council
Stephen Boyd
Professor of EE Engineering
Stanford University
Rob Tibshirani
Professor of Health Research
and Policy, and Statistics
Stanford University
Trevor Hastie
Professor of Statistics
Stanford University
6. H2O.ai
Machine Intelligence
What is H2O?
Open source in-memory prediction engineMath Platform
• Parallelized and distributed algorithms making the most use out of
multithreaded systems
• GLM, Random Forest, GBM, PCA, etc.
Easy to use and adoptAPI
• Written in Java – perfect for Java Programmers
• REST API (JSON) – drives H2O from R, Python, Excel, Tableau
More data? Or better models? BOTHBig Data
• Use all of your data – model without down sampling
• Run a simple GLM or a more complex GBM to find the best fit for the data
• More Data + Better Models = Better Predictions
7. Ensembles
Deep Neural Networks
Algorithms on H2O
• Generalized Linear Models: Binomial,
Gaussian, Gamma, Poisson and Tweedie
• Cox Proportional Hazards Models
• Naïve Bayes
• Distributed Random Forest: Classification or
regression models
• Gradient Boosting Machine: Produces an
ensemble of decision trees with increasing
refined approximations
• Deep learning: Create multi-layer feed
forward neural networks starting with an input
layer followed by multiple layers of nonlinear
transformations
Supervised Learning
Statistical
Analysis
H2O.ai
Machine Intelligence
8. Dimensionality Reduction
Anomaly Detection
Algorithms on H2O
• K-means: Partitions observations into k
clusters/groups of the same spatial size
• Principal Component Analysis: Linearly
transforms correlated variables to independent
components
• Autoencoders: Find outliers using a nonlinear
dimensionality reduction using deep learning
Unsupervised Learning
Clustering
H2O.ai
Machine Intelligence
9. Per Node
2M Row ingest/sec
50M Row Regression/sec
750M Row Aggregates / sec
On Premise
On Hadoop & Spark
On EC2
Tableau
R
JSON
Scala
Java
H2O Prediction Engine
Nano Fast Scoring Engine
Memory Manager
Columnar Compression
Rapids Query R-engine
In-Mem Map Reduce
Distributed fork/join
Python
HDFS S3 SQL NoSQL
SDK / API
Excel
H2O.ai
Machine Intelligence
Deep Learning
Ensembles
11. H2O on YARN Deployment
Hadoop Gateway Node
hadoop jar h2odriver.jar …
YARN RM
HDFSHDFS
Data Nodes
Hadoop Cluster
H2
O
H2O Mappers
(YARN Containers)
YARN Worker Nodes
NM NM NM NM NM NM
YARN Node
Managers (NM)
H2
O
H2
O
12. Now You Have an H2O Cluster
Hadoop Gateway Node
hadoop jar h2odriver.jar …
HDFSHDFS
Data Nodes
Hadoop Cluster
H2
O
H2O Mappers
(YARN Containers)
YARN Worker Nodes
NM NM NM
YARN Node
Managers (NM)
H2
O
H2
O
YARN RM
13. Read Data from HDFS Once
Hadoop Gateway Node
hadoop jar h2odriver.jar …
HDFSHDFS
Data Nodes
Hadoop Cluster
H2
O
H2O Mappers
(YARN Containers)
YARN Worker Nodes
NM NM NM
YARN Node
Managers (NM)
H2
O
H2
O
YARN RM
14. Build Models in-Memory
Hadoop Gateway Node
hadoop jar h2odriver.jar …
Hadoop Cluster
H2
O
H2O Mappers
(YARN Containers)
YARN Worker Nodes
NM NM NM
YARN Node
Managers (NM)
H2
O
H2
O
YARN RM
17. Distributed Data Taxonomy
Vector
The vector may be very large
(billions of rows)
- Stored as a compressed column (often 4x)
- Access as Java primitives with
on-the-fly decompression
- Support fast Random access
- Modifiable with Java memory semantics
H2O.ai
Machine Intelligence
18. Distributed Data Taxonomy
Vector Large vectors must be distributed
over multiple JVMs
- Vector is split into chunks
- Chunk is a unit of parallel access
- Each chunk ~ 1000 elements
- Per-chunk compression
- Homed to a single node
- Can be spilled to disk
- GC very cheap
H2O.ai
Machine Intelligence
19. Distributed Data Taxonomy
age gender zip_code ID
A row of data is always
stored in a single JVM
Distributed data frame
- Similar to R data frame
- Adding and removing
columns is cheap
H2O.ai
Machine Intelligence
25. A Propensity To Buy model (P2B) is a statistical model that tries to predict whether
a certain company will buy a certain product in a given time frame in the future.
What is a Propensity To Buy model?
Build
Statistical
Model
Historical
• Firmographics
• Past Purchase Behavior
• Contacts (# and types)
• Marketing Interactions
• Cust Sat surveys
• Macroeconomic indicators
• Purchase / Non Purchase
• SCORE: probability that a company
will buy a specific technology in the
next quarter
• VALUE: bookings amount that
Cisco will likely see IF the company
in fact buys the technology
t
…
Q1 Q2Q4
(most recent
closed quarter)
Q3Q2Q1Q4
past purchase behavior
and marketing interactions
firmographics
and contact data
…
predicted
purchase
window
Scoring
happens
here
Latest
• Firmographics
• Past Purchase Behavior
• Contacts (# and types)
• Marketing Interactions
• Customer satis. surveys
• Macroeconomic indicators
H2O.ai
Machine Intelligence
26. H2O Business Benefit with P2B
Q1 Q2
P2B Training
Scoring
models
Data
Refresh Q2
Data
Refresh Q1
Prepare,
execute
Mktg & Sales
activities
Before, without H2O
Q1 Q2
Train
&
score
Data
Refresh
Prepare, execute
Mktg & Sales
activities
Train
&
score
Data
Refresh
Prepare, execute
Mktg & Sales
activities
Now, with H2O
H2O.ai
Machine Intelligence
Without H2O:
• Models needed to be
prepared in advance, not to
delay scoring
• More time preparing
models, less time left for
using the scores in the
sales activities
With H2O:
• Newer buying patterns
incorporated immediately
into models
• Scores are published
sooner
• More time for planning
and executing
activities
27. Results & Lessons Learned
Improvements
• P2B factory is 15x faster with H2O
• Quicker techniques for simpler problems, deeper for harder ones
(grid searches!)
• Ensembles improved accuracy and stability of models significantly
Lessons Learned
• Even with few nodes speed improvement over traditional data
mining tools is substantial
• H2O becomes really powerful and robust when combined with R
• Rely on H2O’s extremely responsive support
H2O.ai
Machine Intelligence
29. Fraud Prevention at PayPal
H2O.ai
Machine Intelligence
• Employs state of the art machine learning and
statistical models to flag fraudulent behavior
upfront
• More sophisticated algorithms after
transaction complete
Transaction Level
• Monitor activity to identify abusive behavior
• Frequent payments, suspicious profile
changes
Account Level
• Monitor account to account interaction
• Frequent transfer of money from sever
accounts to one central account suspicious
Network Level
30. Fraud Prevention at PayPal
H2O.ai
Machine Intelligence
• Unearth low-level complex
abstractions
• Learn complex, highly
varying functions not
present in the training
examples
• Widely employed for
image/video processing
and object recognition
Why Deep
Learning?
• Highly scalable
• Superior performance
• Flexible deployment
• Works seamlessly with
other big data frameworks
• Simple interface
Why H2O?
Experiment
• Dataset
− 160 million records
− 1500 features (150 categorical)
− 0.6TB compressed in HDFS
• Infrastructure
− 800 node Hadoop (CDH3)
cluster
• Decision
− Fraud/not-fraud
31. Conclusions using H2O
Deep Learning with H2O is beneficial for payment fraud prevention
• Network architecture- 6 layers with 600 neurons each performed the
best
• Activation function
− RectifierWithDropout performed the best
• Improved performance with limited feature set & a deep network
− 11% improvement with a third of the original feature set, 6 hidden layers, 600
neurons each
• Robust to temporal variations
H2O.ai
Machine Intelligence
33. Unsupervised Learning & Bordeaux Wine
• A personal and expen$$$ive obsession
• Alex Tellez - h2o.ai
34. BORDEAUX WINE
Largest wine-growing region in France
+ 700 Million bottles of wine produced / year !
Some years better than others: Great ($$$) vs. Typical ($)
Last Great years: 2010, 2009, 2005, 2000
35. ‘EN PRIMEUR’
While wine is still barreled, purchasers can ‘invest’ in the wine
before bottling and official public release.
Advantage: Wines may be considerably cheaper during
‘en primeur’ period than @ bottling.
Great Years: 2009, ’05, ‘00
36. GREAT VS. TYPICAL VINTAGE?
Question:
Can we study weather patterns in Bordeaux
leading up to harvest to identify ‘anomalous’ weather years >>
correlates to Great ($$$) vs. Typical ($) Vintage?
The Bordeaux Dataset (1952 - 2014 Yearly)
Amount of Winter Rain (Oct > Apr of harvest year)
Average Summer Temp (Apr > Sept of harvest year)
Rain during Harvest (Aug > Sept)
Years since last Great Vintage
37. AUTOENCODER + ANOMALY
DETECTION
In Steps:
1) Train deep autoencoder to learn ‘typical’ vintage weather pattern
2) Append ‘great’ vintage year weather data to original dataset
3) IF great vintage year weather data does NOT match learned
weather pattern, autoencoder will produce high reconstruction
error (MSE)
‘en primeur of en primeur’
- Can we use weather patterns to identify anomalous years
>> indicates great vintage quality?
Goal:
38. RESULTS (MSE > 0.10)
MeanSquareError
1961 V 2009 V
2005 V
2000 V
1990 V
1989 V
1982 V
2010 V
41. DESCRIBING WINES
=
…wet forest floor
…decaying animal skin
…cat pee
…grandmother’s closet
…barnyardy (AKA sh*t)
…asian plum sauce
…chewy tannins
Can we use Parker reviews to recommend wines we would like??
42. WINE RECOMMENDATIONS
I like: 2000 Cheval Blanc (Red)
Robert Parker Tasting Notes:
“…sweet nose of menthol, melted licorice, boysenberry, blueberry,
creme de cassis…sweet tannins w/ hints of coffee & earth”
So I’m at BevMo…now what?
43. APPROACH
In Steps:
1. Collect hundreds of Robert Parker reviews of wine
2. Create bag-of-words dataset & scale word counts between 0 & 1
(~ probability a particular word occurs in a wine review)
3. Build deep autoencoder from sparse bag-of-words matrix
>> reduce to 10 ‘deep features’
4. Take Euclidian distance of wine I like (2000 Cheval Blanc) to
other wines >> Information Retrieval
5. ‘Nearby’ wine review-vectors = recommendations!
49. Q & A
H2O.ai
Machine Intelligence
Thanks for attending!
Content for today’s talk can be found at:
https://github.com/h2oai/h2o-
meetups/tree/master/2015_01_28_MLForSmarterApps
Editor's Notes
H2O MapReduce is not Hadoop MapReduce
For more details, go visit cliff in his Hacker corner.