- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Java heap memory model has wasteful memory usage. References, object headers, internal collection structure, extra fields such as String.hashCode… This talk shows practical ways to reduce memory usage and fit more data into memory: primitive types, specialized java collections, bit packing, reducing number of pointers, replacing String with char[], semi-serialized objects… As bonus we get lower GC overhead by reducing number of references.
Java heap memory model has wasteful memory usage. References, object headers, internal collection structure, extra fields such as String.hashCode… This talk shows practical ways to reduce memory usage and fit more data into memory: primitive types, specialized java collections, bit packing, reducing number of pointers, replacing String with char[], semi-serialized objects… As bonus we get lower GC overhead by reducing number of references.
Advanced Non-Relational Schemas For Big DataVictor Smirnov
This is the presentation from barcamp in Altoros where I was explaining how various advanced non-relational schemas (or, simply, data structures) can be modelled on top of Key/Value storage. The set of covered schemas includes Dynamic Vector, File System, Searchable Bitmap, LOUDS Tree, Wavelet Tree and Inverted Index.
See https://bitbucket.org/vsmirnov/memoria/wiki/MemoriaForBigData
for additional details.
This presentation deals with RealmDB, which is a convenient replacement for SQLite & Core Data in mobile development.
Presentation by Anton Minashkin (Software Engineer, GlobalLogic, Lviv), delivered at Mobile TechTalk Lviv on April 28, 2015.
More details - http://globallogic.com.ua/mobile-techtalk-lviv-2015-report
MapDB - taking Java collections to the next levelJavaDayUA
Java collections have several limitations. But imagine library without limits, which could even replace your database. This session talks about drop-in replacement with many new possibilities. MapDB provides Java collections backed by in-memory or on-disk store. It adds extra features to traditional collections (entry expiration, binding, secondary collections…). It is also proper database engine and has transactions, snapshots, incremental backups… And finally it is not affected by GC, so it can take a billion entries without a hiccup.
This presentation helps discover some of the specific features of Java 8, (in particular, atomics and parallelism) and start using them effectively.
This presentation by Maksym Voronyi (Software Engineer, GlobalLogic) was delivered at Java.io 3.0 conference in Kharkiv on March 24, 2016, and GlobalLogic Mykolaiv Java Conference on June 11, 2016.
The No SQL Principles and Basic Application Of Casandra ModelRishikese MR
The slides discuss various matters of the No SQL and casandra Models, the slide gives a complete picture of the both topics and its relations. Also it discuss the merits and demerits of the topics and its features and examples are also described.
Parallel programming patterns - Олександр ПавлишакIgor Bronovskyy
Паралелізм та concurrency -- напрямок, в якому технології програмування прямують зараз і, без сумніву, прямуватимуть в майбутньому. Багатоядерними процесорами оснащуються комп'ютери навіть початкового рівня, що відкриває можливості для створення швидких ефективних програм із "живими" інтерфейсами. Тому навіть ті розробники, які раніше не стикались із concurrent кодом, будуть все частіше і частіше програмувати із врахуванням паралелізму. В доповіді буде розглянуто шаблони паралельного програмування, способи асинхронного програмування, а також тенденції окремих сучасних технологій в області паралелізму та асинхронності. Слухачі зможуть отримати знання про основні способи організації паралельних обчислень у desktop-, web- і серверних аплікаціях, засоби досягнення responsive GUI, техніки вирішення проблем, що виникають у concurrent програмуванні.
Apache Tajo on Swift: Bringing SQL to the OpenStack WorldJihoon Son
This slide was presented at the SK Telecom T Developer Forum. It contains the brief evaluation results of the query execution performance of Tajo on Swift.
I conducted two kinds of experiments; The first experiment was to compare the performance of Tajo with on another distributed storage, i.e., HDFS. And the second experiment was the scalability test of Swift.
Interestingly, the scan performance on Swift is slower more than two times than that on HDFS. In addition, the task scheduling time on Swift is much greater than that on HDFS, which means the query initialization cost is very high.
Speaker: Oscar Westra van Holthe - Kind
Genre & level: Backend, Medior
Data congestion slows down even the fastest database. Want to avoid getting your data jammed? I’ll show you how you can turn your database into an information highway. Get your recipes for query performance improvements here.
Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...Tathagata Das
Spark Streaming is a framework for processing large volumes of streaming data in near-real-time. This is an introductory presentation about how Spark Streaming and Kafka can be used for high volume near-real-time streaming data processing in a cluster. This was a guest lecture in a Stanford course.
More information on the course at http://stanford.edu/~rezab/dao/
Slides of the workshop conducted in Model Engineering College, Ernakulam, and Sree Narayana Gurukulam College, Kadayiruppu
Kerala, India in December 2010
21st Century University feasibility study Jouni Eho
This feasibility study looked at the disruption taking place in the higher education space and sketched an MVP prototype of a radical new 21UNI concept to be tested in Kotka, Finland
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Advanced Non-Relational Schemas For Big DataVictor Smirnov
This is the presentation from barcamp in Altoros where I was explaining how various advanced non-relational schemas (or, simply, data structures) can be modelled on top of Key/Value storage. The set of covered schemas includes Dynamic Vector, File System, Searchable Bitmap, LOUDS Tree, Wavelet Tree and Inverted Index.
See https://bitbucket.org/vsmirnov/memoria/wiki/MemoriaForBigData
for additional details.
This presentation deals with RealmDB, which is a convenient replacement for SQLite & Core Data in mobile development.
Presentation by Anton Minashkin (Software Engineer, GlobalLogic, Lviv), delivered at Mobile TechTalk Lviv on April 28, 2015.
More details - http://globallogic.com.ua/mobile-techtalk-lviv-2015-report
MapDB - taking Java collections to the next levelJavaDayUA
Java collections have several limitations. But imagine library without limits, which could even replace your database. This session talks about drop-in replacement with many new possibilities. MapDB provides Java collections backed by in-memory or on-disk store. It adds extra features to traditional collections (entry expiration, binding, secondary collections…). It is also proper database engine and has transactions, snapshots, incremental backups… And finally it is not affected by GC, so it can take a billion entries without a hiccup.
This presentation helps discover some of the specific features of Java 8, (in particular, atomics and parallelism) and start using them effectively.
This presentation by Maksym Voronyi (Software Engineer, GlobalLogic) was delivered at Java.io 3.0 conference in Kharkiv on March 24, 2016, and GlobalLogic Mykolaiv Java Conference on June 11, 2016.
The No SQL Principles and Basic Application Of Casandra ModelRishikese MR
The slides discuss various matters of the No SQL and casandra Models, the slide gives a complete picture of the both topics and its relations. Also it discuss the merits and demerits of the topics and its features and examples are also described.
Parallel programming patterns - Олександр ПавлишакIgor Bronovskyy
Паралелізм та concurrency -- напрямок, в якому технології програмування прямують зараз і, без сумніву, прямуватимуть в майбутньому. Багатоядерними процесорами оснащуються комп'ютери навіть початкового рівня, що відкриває можливості для створення швидких ефективних програм із "живими" інтерфейсами. Тому навіть ті розробники, які раніше не стикались із concurrent кодом, будуть все частіше і частіше програмувати із врахуванням паралелізму. В доповіді буде розглянуто шаблони паралельного програмування, способи асинхронного програмування, а також тенденції окремих сучасних технологій в області паралелізму та асинхронності. Слухачі зможуть отримати знання про основні способи організації паралельних обчислень у desktop-, web- і серверних аплікаціях, засоби досягнення responsive GUI, техніки вирішення проблем, що виникають у concurrent програмуванні.
Apache Tajo on Swift: Bringing SQL to the OpenStack WorldJihoon Son
This slide was presented at the SK Telecom T Developer Forum. It contains the brief evaluation results of the query execution performance of Tajo on Swift.
I conducted two kinds of experiments; The first experiment was to compare the performance of Tajo with on another distributed storage, i.e., HDFS. And the second experiment was the scalability test of Swift.
Interestingly, the scan performance on Swift is slower more than two times than that on HDFS. In addition, the task scheduling time on Swift is much greater than that on HDFS, which means the query initialization cost is very high.
Speaker: Oscar Westra van Holthe - Kind
Genre & level: Backend, Medior
Data congestion slows down even the fastest database. Want to avoid getting your data jammed? I’ll show you how you can turn your database into an information highway. Get your recipes for query performance improvements here.
Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...Tathagata Das
Spark Streaming is a framework for processing large volumes of streaming data in near-real-time. This is an introductory presentation about how Spark Streaming and Kafka can be used for high volume near-real-time streaming data processing in a cluster. This was a guest lecture in a Stanford course.
More information on the course at http://stanford.edu/~rezab/dao/
Slides of the workshop conducted in Model Engineering College, Ernakulam, and Sree Narayana Gurukulam College, Kadayiruppu
Kerala, India in December 2010
21st Century University feasibility study Jouni Eho
This feasibility study looked at the disruption taking place in the higher education space and sketched an MVP prototype of a radical new 21UNI concept to be tested in Kotka, Finland
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Machine Learning with H2O, Spark, and Python at Strata 2015Sri Ambati
Machine Learning with H2O, Spark, and Python at Strata SJ 2015-by Cliff Click and Michal Malohlava
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
PayPal's Fraud Detection with Deep Learning in H2O World 2014Sri Ambati
PayPal's Fraud Detection with Deep Learning in H2O World 2014 -
Flexible Deployment, Seamlessly with Big Data, Accuracy and Responsive support.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Suggestions:
1) For best quality, download the PDF before viewing.
2) Open at least two windows: One for the Youtube video, one for the screencast (link below), and optionally one for the slides themselves.
3) The Youtube video is shown on the first page of the slide deck, for slides, just skip to page 2.
Screencast: http://youtu.be/VoL7JKJmr2I
Video recording: http://youtu.be/CJRvb8zxRdE (Thanks to Al Friedrich!)
In this talk, we take Deep Learning to task with real world data puzzles to solve.
Data:
- Higgs binary classification dataset (10M rows, 29 cols)
- MNIST 10-class dataset
- Weather categorical dataset
- eBay text classification dataset (8500 cols, 500k rows, 467 classes)
- ECG heartbeat anomaly detection
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Applied Machine learning using H2O, python and R WorkshopAvkash Chauhan
Note: Get all workshop content at - https://github.com/h2oai/h2o-meetups/tree/master/2017_02_22_Seattle_STC_Meetup
Basic knowledge of R/python and general ML concepts
Note: This is bring-your-own-laptop workshop. Make sure you bring your laptop in order to be able to participate in the workshop
Level: 200
Time: 2 Hours
Agenda:
- Introduction to ML, H2O and Sparkling Water
- Refresher of data manipulation in R & Python
- Supervised learning
---- Understanding liner regression model with an example
---- Understanding binomial classification with an example
---- Understanding multinomial classification with an example
- Unsupervised learning
---- Understanding k-means clustering with an example
- Using machine learning models in production
- Sparkling Water Introduction & Demo
MLconf - Distributed Deep Learning for Classification and Regression Problems...Sri Ambati
Video recording (no audio?): http://new.livestream.com/accounts/7874891/events/3565981/videos/68114143 from 32:00 to 54:30
Deep Learning has been dominating recent machine learning competitions with better predictions. Unlike the neural networks of the past, modern Deep Learning methods have cracked the code for training stability and generalization. Deep Learning is not only the leader in image and speech recognition tasks, but is also emerging as the algorithm of choice for highest predictive performance in traditional business analytics. This talk introduces Deep Learning and implementation concepts in the open-source H2O in-memory prediction engine. Designed for the solution of business-critical problems on distributed compute clusters, it offers advanced features such as adaptive learning rate, dropout regularization, parameter tuning and a fully-featured R interface. World record performance on the classic MNIST dataset, best-in-class accuracy for a high-dimensional eBay text classification problem and other relevant datasets showcase the power of this game-changing technology. A whole new ecosystem of Intelligent Applications is emerging with Deep Learning at its core.
Bio:
Prior to joining 0xdata as Physicist & Hacker, Arno was a founding Senior MTS at Skytree where he designed and implemented high-performance machine learning algorithms. He has over a decade of experience in HPC with C++/MPI and had access to the world’s largest supercomputers as a Staff Scientist at SLAC National Accelerator Laboratory where he participated in US DOE scientific computing initiatives. While at SLAC, he authored the first curvilinear finite-element simulation code for space-charge dominated relativistic free electrons and scaled it to thousands of compute nodes. He also led a collaboration with CERN to model the electromagnetic performance of CLIC, a ginormous e+e- collider and potential successor of LHC. Arno has authored dozens of scientific papers and was a sought-after academic conference speaker. He holds a PhD and Masters summa cum laude in Physics from ETH Zurich. Arno was named 2014 Big Data All-Star by Fortune Magazine.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Transformation, H2O Open Dallas 2016, Keynote by Sri Ambati, Sri Ambati
Transformation with Data and AI, H2O Open Dallas 2016, Keynote by Sri Ambati, founder @h2o.ai @srisatish
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
H2O is the open source math & machine learning engine for big data that brings distribution and parallelism to powerful algorithms while keeping the widely used languages of R and JSON as an API. H2O brings and elegant lego-like infrastructure that brings fine-grained parallelism to math over simple distributed arrays.
H2O World - Sparkling Water - Michal MalohlavaSri Ambati
H2O World 2015 - Michal Malohlava
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Michal Malohlava talks about the PySparkling Water package for Spark and Python users.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Ray Peck from H2O.ai talks about the roadmap for the upcoming AutoML product in H2O.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
L’ utilizzo delle applicazioni di messaging ha superato quello dei social network. In che modo il modello dei chatbot sta cambiando le regole di engagement con i consumatori?
Transform your Business with AI, Deep Learning and Machine LearningSri Ambati
Video: https://www.youtube.com/watch?v=R3IXd1iwqjc
Meetup: http://www.meetup.com/SF-Bay-ACM/events/231709894/
In this talk, Arno Candel presents a brief history of AI and how Deep Learning and Machine Learning techniques are transforming our everyday lives. Arno will introduce H2O, a scalable open-source machine learning platform, and show live demos on how to train sophisticated machine learning models on large distributed datasets. He will show how data scientists and application developers can use the Flow GUI, R, Python, Java, Scala, JavaScript and JSON to build smarter applications, and how to take them to production. He will present customer use cases from verticals including insurance, fraud, churn, fintech, and marketing.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Building a Big Data Machine Learning PlatformCliff Click
H2O - It's open source, in-memory, big data, clustered computing - Math At Scale. We got the Worlds Fastest Logistic Regression (by a lot!), world's first (and fastest) distributed Gradient Boosted Method (GBM), plus Random Forest, PCA, KMeans++, etc... R's "plyr" style data munging at-scale, including ddply (Group-By for you SQL'rs) and much of R's expressive coding style.
We built H2O, an open-source platform for working with in-memory distributed data. Then we built on top of H2O state-of-the-art predictive modeling and analytics (e.g. GLM & Logistic Regression, GBM, Random Forest, Neural Nets, PCA to name a few) that's 1000x faster than the disk-bound alternatives, and 100x faster than R (we love R but it's tooo slow on big data!). We can run R expressions on tera-scale datasets, or munge data from Scala & Python. We're building our newest algorithms in a few weeks, start to finish, because the platform makes Big Math easy. We routinely test on 100G datasets, have customers using 1T datasets.
This talk is about the platform, coding style & API that lets us seamlessly deal with datasets from 1K to 1TB without changing a line of code, lets us use clusters ranging from your laptop to 100 server clusters with many many TB of ram and hundreds of CPUs.
Sandy Ryza – Software Engineer, Cloudera at MLconf ATLMLconf
Unsupervised Learning on Huge Data with Apache Spark
Unsupervised learning refers to a branch of algorithms that try to find structure in unlabeled data. Spark’s MLLib module contains implementations of several unsupervised learning algorithms that scale to large datasets. In this talk, we’ll discuss how to use and implement large-scale machine learning algorithms with the Spark programming model, diving into MLLib’s K-means clustering and Principal Component Analysis (PCA).
Доклад рассказывает об устройстве и опыте применения инструментов динамического тестирования C/C++ программ — AddressSanitizer, ThreadSanitizer и MemorySanitizer. Инструменты находят такие ошибки, как использование памяти после освобождения, обращения за границы массивов и объектов, гонки в многопоточных программах и использования неинициализированной памяти.
Bill explains some of the ways that the Vertex Shader can be used to improve performance by taking a fast path through the Vertex Shader rather than generating vertices with other parts of the pipeline in this AMD technology presentation from the 2014 Game Developers Conference in San Francisco March 17-21. Check out more technical presentations at http://developer.amd.com/resources/documentation-articles/conference-presentations/
Options and trade offs for parallelism and concurrency in Modern C++Satalia
While threads have become a first class citizen in C++ since C++11, it is not always the case that they are the best abstraction to express parallelism where the objective is to speed up computations. OpenMP is a parallelism API for C/C++ and Fortran that has been around for a long time. Intel's Threading Building Blocks (TBB) is only a little bit more than 10 years old, but is very mature, and specifically for C++.
Mats will introduce OpenMP and TBB and their use in modern C++ and provide some best practices for them as well as try to predict what the C++ standard has in store for us when it comes to parallelism in the future.
4Developers 2018: Ile (nie) wiesz o strukturach w .NET (Łukasz Pyrzyk)PROIDEA
Kiedy ostatnio stworzyłeś nową strukturę pisząc aplikację w .NET? Wiesz do czego wykorzystywać struktury i jak mogą one zwiększyć wydajność Twojego programu? W prezentacji pokażę czym charakteryzują się struktury, jak dużo różni je od klas oraz opowiem o kilku ciekawych eksperymentach.
When you think about C#, you'll usually think about a high-level language, one that is utilized to build websites, APIs, and desktop applications. However, from its inception, C# had the foundation to be used as a system language, with facilities that allow you direct memory access and fine-grained control over memory and execution.
In the last five years, there has been a huge emphasis on making C# a more capable language for system development. Oren Eini, the founder of RavenDB, has used C# as the base language to build a distributed document database for over a decade.
In this talk, Oren will discuss the features that make C# a viable system language for building high-end systems. Learn how you can mix and match, in a single project, both high-level concepts and intimate control over every single thing that is happening in your system.
Similar to GBM in H2O with Cliff Click: H2O API (20)
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Sri Ambati
Sandeep Singh, Head of Applied AI Computer Vision, Beans.ai
H2O Open Source GenAI World SF 2023
In the modern era of machine learning, leveraging both open-source and closed-source solutions has become paramount for achieving cutting-edge results. This talk delves into the intricacies of seamlessly integrating open-source Large Language Model (LLM) solutions like Vicuna, Falcon, and Llama with industry giants such as ChatGPT and Google's Palm. As the demand for fine-tuned and specialized datasets grows, it is imperative to understand the synergy between these tools. Attendees will gain insights into best practices for building and enriching datasets tailored for fine-tuning tasks, ensuring that their LLM projects are both robust and efficient. Through real-world examples and hands-on demonstrations, this talk will equip attendees with the knowledge to harness the power of both open and closed-source tools in a coherent and effective manner.
Patrick Hall, Professor, AI Risk Management, The George Washington University
H2O Open Source GenAI World SF 2023
Language models are incredible engineering breakthroughs but require auditing and risk management before productization. These systems raise concerns about toxicity, transparency and reproducibility, intellectual property licensing and ownership, disinformation and misinformation, supply chains, and more. How can your organization leverage these new tools without taking on undue or unknown risks? While language models and associated risk management are in their infancy, a small number of best practices in governance and risk are starting to emerge. If you have a language model use case in mind, want to understand your risks, and do something about them, this presentation is for you!
Dr. Alexy Khrabrov, Open Source Science Community Director, IBM
H2O Open Source GenAI World SF 2023
In this talk, Dr. Alexy Khrabrov, recently elected Chair of the new Generative AI Commons at Linux Foundation for AI & Data, outlines the OSS AI landscape, challenges, and opportunities. With new models and frameworks being unveiled weekly, one thing remains constant: community building and validation of all aspects of AI is key to reliable and responsible AI we can use for business and society needs. Industrial AI is one key area where such community validation can prove invaluable.
Michelle Tanco, Head of Product, H2O.ai
H2O Open Source GenAI World SF 2023
Learn how the makers at H2O.ai are building internal tools to solve real use cases using H2O Wave and h2oGPT. We will walk through an end-to-end use case and discuss how to incorporate business rules and generated content to rapidly develop custom AI apps using only Python APIs.
Applied Gen AI for the Finance Vertical Sri Ambati
Megan Kurka, Vice President, Customer Data Scientist, H2O.ai
H2O Open Source GenAI World SF 2023
Discover the transformative power of Applied Gen AI. Learn how the H2O team builds customized applications and workflows that integrate capabilities of Gen AI and AutoML specifically designed to address and enhance financial use cases. Explore real world examples, learn best practices, and witness firsthand how our innovative solutions are reshaping the landscape of finance technology.
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Sri Ambati
Pascal Pfeiffer, Principal Data Scientist, H2O.ai
H2O Open Source GenAI World SF 2023
This talk dives into the expansive ecosystem of Large Language Models (LLMs), offering practitioners an insightful guide to various relevant applications, from natural language understanding to creative content generation. While exploring use cases across different industries, it also honestly addresses the current limitations of LLMs and anticipates future advancements.
Introducción al Aprendizaje Automatico con H2O-3 (1)Sri Ambati
En esta reunión virtual, damos una introducción a la plataforma de aprendizaje automático de código abierto número 1, H2O-3 y te mostramos cómo puedes usarla para desarrollar modelos para resolver diferentes casos de uso.
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...Sri Ambati
Numerai is an open, crowd-sourced hedge fund powered by predictions from data scientists around the world. In return, participants are rewarded with weekly payouts in crypto.
In this talk, Joe will give an overview of the Numerai tournament based on his own experience. He will then explain how he automates the time-consuming tasks such as testing different modelling strategies, scoring new datasets, submitting predictions to Numerai as well as monitoring model performance with H2O Driverless AI and R.
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...Sri Ambati
In this session, you will learn about what you should do after you’ve taken an AI transformation baseline. Over the span of this session, we will discuss the next steps in moving toward AI readiness through alignment of talent and tools to drive successful adoption and continuous use within an organization.
To find additional videos on AI courses, earn badges, join the courses at H2O.ai Learning Center: https://training.h2o.ai/products/ai-foundations-course
To find the Youtube video about this presentation: https://youtu.be/K1Cl3x3rd8g
Speaker:
Chemere Davis (H2O.ai - Senior Data Scientist Training Specialist)
The Metaverse and AI: how can decision-makers harness the Metaverse for their...Jen Stirrup
The Metaverse is popularized in science fiction, and now it is becoming closer to being a part of our daily lives through the use of social media and shopping companies. How can businesses survive in a world where Artificial Intelligence is becoming the present as well as the future of technology, and how does the Metaverse fit into business strategy when futurist ideas are developing into reality at accelerated rates? How do we do this when our data isn't up to scratch? How can we move towards success with our data so we are set up for the Metaverse when it arrives?
How can you help your company evolve, adapt, and succeed using Artificial Intelligence and the Metaverse to stay ahead of the competition? What are the potential issues, complications, and benefits that these technologies could bring to us and our organizations? In this session, Jen Stirrup will explain how to start thinking about these technologies as an organisation.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfPeter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™UiPathCommunity
In questo evento online gratuito, organizzato dalla Community Italiana di UiPath, potrai esplorare le nuove funzionalità di Autopilot, il tool che integra l'Intelligenza Artificiale nei processi di sviluppo e utilizzo delle Automazioni.
📕 Vedremo insieme alcuni esempi dell'utilizzo di Autopilot in diversi tool della Suite UiPath:
Autopilot per Studio Web
Autopilot per Studio
Autopilot per Apps
Clipboard AI
GenAI applicata alla Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
2. 0xdata.com 2
H2O is...
● Pure Java, Open Source: 0xdata.com
● https://github.com/0xdata/h2o/
● A Platform for doing Math
● Parallel Distributed Math
● In-memory analytics: GLM, GBM, RF, Logistic Reg
● Accessible via REST & JSON
● A K/V Store: ~150ns per get or put
● Distributed Fork/Join + Map/Reduce + K/V
3. 0xdata.com 3
Agenda
● Building Blocks For Big Data:
● Vecs & Frames & Chunks
● Distributed Tree Algorithms
● Access Patterns & Execution
● GBM on H2O
● Performance
4. 0xdata.com 4
A Collection of Distributed Vectors
// A Distributed Vector
// much more than 2billion elements
class Vec {
long length(); // more than an int's worth
// fast random access
double at(long idx); // Get the idx'th elem
boolean isNA(long idx);
void set(long idx, double d); // writable
void append(double d); // variable sized
}
5. 0xdata.com 5
JVM 4
Heap
JVM 1
Heap
JVM 2
Heap
JVM 3
Heap
Frames
A Frame: Vec[]
age sex zip ID car
●Vecs aligned
in heaps
●Optimized for
concurrent access
●Random access
any row, any JVM
●But faster if local...
more on that later
6. 0xdata.com 6
JVM 4
Heap
JVM 1
Heap
JVM 2
Heap
JVM 3
Heap
Distributed Data Taxonomy
A Chunk, Unit of Parallel Access
Vec Vec Vec Vec Vec
●Typically 1e3 to
1e6 elements
●Stored compressed
●In byte arrays
●Get/put is a few
clock cycles
including
compression
7. 0xdata.com 7
JVM 4
Heap
JVM 1
Heap
JVM 2
Heap
JVM 3
Heap
Distributed Parallel Execution
Vec Vec Vec Vec Vec
●All CPUs grab
Chunks in parallel
●F/J load balances
●Code moves to Data
●Map/Reduce & F/J
handles all sync
●H2O handles all
comm, data manage
8. 0xdata.com 8
Distributed Data Taxonomy
Frame – a collection of Vecs
Vec – a collection of Chunks
Chunk – a collection of 1e3 to 1e6 elems
elem – a java double
Row i – i'th elements of all the Vecs in a Frame
9. 0xdata.com 9
Distributed Coding Taxonomy
● No Distribution Coding:
● Whole Algorithms, Whole Vector-Math
● REST + JSON: e.g. load data, GLM, get results
● Simple Data-Parallel Coding:
● Per-Row (or neighbor row) Math
● Map/Reduce-style: e.g. Any dense linear algebra
● Complex Data-Parallel Coding
● K/V Store, Graph Algo's, e.g. PageRank
10. 0xdata.com 10
Distributed Coding Taxonomy
● No Distribution Coding:
● Whole Algorithms, Whole Vector-Math
● REST + JSON: e.g. load data, GLM, get results
● Simple Data-Parallel Coding:
● Per-Row (or neighbor row) Math
● Map/Reduce-style: e.g. Any dense linear algebra
● Complex Data-Parallel Coding
● K/V Store, Graph Algo's, e.g. PageRank
Read the docs!
This talk!
Join our GIT!
11. 0xdata.com 11
Simple Data-Parallel Coding
● Map/Reduce Per-Row: Stateless
●
Example from Linear Regression, Σ y2
● Auto-parallel, auto-distributed
● Near Fortran speed, Java Ease
double sumY2 = new MRTask() {
double map( double d ) { return d*d; }
double reduce( double d1, double d2 ) {
return d1+d2;
}
}.doAll( vecY );
14. 0xdata.com 14
Distributed Trees
● Overlay a Tree over the data
● Really: Assign a Tree Node to each Row
● Number the Nodes
● Store "Node_ID" per row in a temp Vec
● Make a pass over all Rows
● Nodes not visited in order...
● but all rows, all Nodes efficiently visited
● Do work (e.g. histogram) per Row/Node
Vec nids = v.makeZero();
… nids.set(row,nid)...
15. 0xdata.com 15
Distributed Trees
● An initial Tree
● All rows at nid==0
● MRTask: compute stats
● Use the stats to make a decision...
● (varies by algorithm)!
nid=0
X Y nids
A 1.2 0
B 3.1 0
C -2. 0
D 1.1 0
nid=0nid=0
Tree
MRTask.sum=3.4
16. 0xdata.com 16
Distributed Trees
● Next layer in the Tree (and MRTask across rows)
● Each row: decide!
– If "1<Y<1.5" go left else right
● Compute stats per new leaf
● Each pass across all
rows builds entire layer
nid=0
X Y nids
A 1.2 1
B 3.1 2
C -2. 2
D 1.1 1
nid=01 < Y < 1.5
Tree
sum=1.1
nid=1 nid=2
sum=2.3
17. 0xdata.com 17
Distributed Trees
● Another MRTask, another layer...
● i.e., a 5-deep tree
takes 5 passes
●
nid=0nid=01 < Y < 1.5
Tree
sum=1.1
Y==1.1 leaf
nid=3 nid=4
X Y nids
A 1.2 3
B 3.1 2
C -2. 2
D 1.1 4 sum= -2. sum=3.1
18. 0xdata.com 18
Distributed Trees
● Each pass is over one layer in the tree
● Builds per-node histogram in map+reduce calls
class Pass extends MRTask2<Pass> {
void map( Chunk chks[] ) {
Chunk nids = chks[...]; // Node-IDs per row
for( int r=0; r<nids.len; r++ ){// All rows
int nid = nids.at80(i); // Node-ID THIS row
// Lazy: not all Chunks see all Nodes
if( dHisto[nid]==null ) dHisto[nid]=...
// Accumulate histogram stats per node
dHisto[nid].accum(chks,r);
}
}
}.doAll(myDataFrame,nids);
19. 0xdata.com 19
Distributed Trees
● Each pass analyzes one Tree level
● Then decide how to build next level
● Reassign Rows to new levels in another pass
– (actually merge the two passes)
● Builds a Histogram-per-Node
● Which requires a reduce() call to roll up
● All Histograms for one level done in parallel
20. 0xdata.com 20
Distributed Trees: utilities
● “score+build” in one pass:
● Test each row against decision from prior pass
● Assign to a new leaf
● Build histogram on that leaf
● “score”: just walk the tree, and get results
● “compress”: Tree from POJO to byte[]
● Easily 10x smaller, can still walk, score, print
● Plus utilities to walk, print, display
21. 0xdata.com 21
GBM on Distributed Trees
● GBM builds 1 Tree, 1 level at a time, but...
● We run the entire level in parallel & distributed
● Built breadth-first because it's "free"
● More data offset by more CPUs
● Classic GBM otherwise
● Build residuals tree-by-tree
● Tuning knobs: trees, depth, shrinkage, min_rows
● Pure Java
22. 0xdata.com 22
GBM on Distributed Trees
● Limiting factor: latency in turning over a level
● About 4x faster than R single-node on covtype
● Does the per-level compute in parallel
● Requires sending histograms over network
– Can get big for very deep trees
●
23. 0xdata.com 23
Summary: Write (parallel) Java
● Most simple Java “just works”
● Fast: parallel distributed reads, writes, appends
● Reads same speed as plain Java array loads
● Writes, appends: slightly slower (compression)
● Typically memory bandwidth limited
– (may be CPU limited in a few cases)
● Slower: conflicting writes (but follows strict JMM)
● Also supports transactional updates
24. 0xdata.com 24
Summary: Writing Analytics
● We're writing Big Data Analytics
● Generalized Linear Modeling (ADMM, GLMNET)
– Logistic Regression, Poisson, Gamma
● Random Forest, GBM, KMeans++, KNN
● State-of-the-art Algorithms, running Distributed
● Solidly working on 100G datasets
● Heading for Tera Scale
● Paying customers (in production!)
● Come write your own (distributed) algorithm!!!
25. 0xdata.com 25
Cool Systems Stuff...
● … that I ran out of space for
● Reliable UDP, integrated w/RPC
● TCP is reliably UNReliable
● Already have a reliable UDP framework, so no prob
● Fork/Join Goodies:
● Priority Queues
● Distributed F/J
● Surviving fork bombs & lost threads
● K/V does JMM via hardware-like MESI protocol
26. 0xdata.com 26
H2O is...
● Pure Java, Open Source: 0xdata.com
● https://github.com/0xdata/h2o/
● A Platform for doing Math
● Parallel Distributed Math
● In-memory analytics: GLM, GBM, RF, Logistic Reg
● Accessible via REST & JSON
● A K/V Store: ~150ns per get or put
● Distributed Fork/Join + Map/Reduce + K/V
28. 0xdata.com 28
Other Simple Examples
● Filter & Count (underage males):
● (can pass in any number of Vecs or a Frame)
long sumY2 = new MRTask() {
long map( long age, long sex ) {
return (age<=17 && sex==MALE) ? 1 : 0;
}
long reduce( long d1, long d2 ) {
return d1+d2;
}
}.doAll( vecAge, vecSex );
29. 0xdata.com 29
Other Simple Examples
● Filter into new set (underage males):
● Can write or append subset of rows
– (append order is preserved)
class Filter extends MRTask {
void map(Chunk CRisk, Chunk CAge, Chunk CSex){
for( int i=0; i<CAge.len; i++ )
if( CAge.at(i)<=17 && CSex.at(i)==MALE )
CRisk.append(CAge.at(i)); // build a set
}
};
Vec risk = new AppendableVec();
new Filter().doAll( risk, vecAge, vecSex );
...risk... // all the underage males
30. 0xdata.com 30
Other Simple Examples
● Filter into new set (underage males):
● Can write or append subset of rows
– (append order is preserved)
class Filter extends MRTask {
void map(Chunk CRisk, Chunk CAge, Chunk CSex){
for( int i=0; i<CAge.len; i++ )
if( CAge.at(i)<=17 && CSex.at(i)==MALE )
CRisk.append(CAge.at(i)); // build a set
}
};
Vec risk = new AppendableVec();
new Filter().doAll( risk, vecAge, vecSex );
...risk... // all the underage males
31. 0xdata.com 31
Other Simple Examples
● Group-by: count of car-types by age
class AgeHisto extends MRTask {
long carAges[][]; // count of cars by age
void map( Chunk CAge, Chunk CCar ) {
carAges = new long[numAges][numCars];
for( int i=0; i<CAge.len; i++ )
carAges[CAge.at(i)][CCar.at(i)]++;
}
void reduce( AgeHisto that ) {
for( int i=0; i<carAges.length; i++ )
for( int j=0; i<carAges[j].length; j++ )
carAges[i][j] += that.carAges[i][j];
}
}
32. 0xdata.com 32
class AgeHisto extends MRTask {
long carAges[][]; // count of cars by age
void map( Chunk CAge, Chunk CCar ) {
carAges = new long[numAges][numCars];
for( int i=0; i<CAge.len; i++ )
carAges[CAge.at(i)][CCar.at(i)]++;
}
void reduce( AgeHisto that ) {
for( int i=0; i<carAges.length; i++ )
for( int j=0; i<carAges[j].length; j++ )
carAges[i][j] += that.carAges[i][j];
}
}
Other Simple Examples
● Group-by: count of car-types by age
Setting carAges in map() makes it an output field.
Private per-map call, single-threaded write access.
Must be rolled-up in the reduce call.
Setting carAges in map makes it an output field.
Private per-map call, single-threaded write access.
Must be rolled-up in the reduce call.
33. 0xdata.com 33
Other Simple Examples
● Uniques
● Uses distributed hash set
class Uniques extends MRTask {
DNonBlockingHashSet<Long> dnbhs = new ...;
void map( long id ) { dnbhs.add(id); }
void reduce( Uniques that ) {
dnbhs.putAll(that.dnbhs);
}
};
long uniques = new Uniques().
doAll( vecVistors ).dnbhs.size();
34. 0xdata.com 34
Other Simple Examples
● Uniques
● Uses distributed hash set
class Uniques extends MRTask {
DNonBlockingHashSet<Long> dnbhs = new ...;
void map( long id ) { dnbhs.add(id); }
void reduce( Uniques that ) {
dnbhs.putAll(that.dnbhs);
}
};
long uniques = new Uniques().
doAll( vecVistors ).dnbhs.size();
Setting dnbhs in <init> makes it an input field.
Shared across all maps(). Often read-only.
This one is written, so needs a reduce.