SystemML was designed with the main goal of lowering the complexity required to maintain and scale machine learning algorithms. It provides a Declarative Machine Learning (DML) language with R-like and Python-like syntax that simplifies the specification of machine learning algorithms. This significantly increases the productivity of data scientists, as it provides flexibility in how custom analytics are expressed, along with data independence from the underlying input formats and physical data representations.
This presentation gives a quick introduction to Apache SystemML, provides an update on the areas recently being developed by the project community, and goes over a tutorial that enables one to quickly get up to speed with SystemML.
2014-06-20 Multinomial Logistic Regression with Apache Spark - DB Tsai
Logistic regression can be used not only for modeling binary outcomes but also multinomial outcomes, with some extension. In this talk, DB will explain the basic idea of binary logistic regression step by step, and then extend it to the multinomial case. He will show how easy it is with Spark to parallelize this iterative algorithm by utilizing the in-memory RDD cache to scale horizontally (in the number of training examples). However, there are mathematical limitations on scaling vertically (in the number of training features), while many recent applications, from document classification to computational linguistics, are of this type. He will talk about how to address this problem by using the L-BFGS optimizer instead of the Newton optimizer.
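The binary-to-multinomial extension mentioned above boils down to replacing the sigmoid with a softmax over one score vector per class. A minimal pure-Python sketch of that idea follows (plain batch gradient descent on a tiny made-up 1-D dataset; this is an illustration of the math, not the Spark/L-BFGS implementation from the talk):

```python
import math

def softmax(scores):
    # numerically stable softmax over one score vector
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train(X, y, num_classes, lr=0.1, epochs=5000):
    """Batch gradient descent on the multinomial cross-entropy loss.

    X: rows of features (first feature is a constant 1.0 bias term),
    y: integer class labels in [0, num_classes).
    """
    d, n = len(X[0]), len(X)
    W = [[0.0] * d for _ in range(num_classes)]  # one weight vector per class
    for _ in range(epochs):
        grad = [[0.0] * d for _ in range(num_classes)]
        for xi, yi in zip(X, y):
            probs = softmax([sum(w * x for w, x in zip(W[k], xi))
                             for k in range(num_classes)])
            for k in range(num_classes):
                err = probs[k] - (1.0 if k == yi else 0.0)  # dLoss/dScore_k
                for j in range(d):
                    grad[k][j] += err * xi[j]
        for k in range(num_classes):
            for j in range(d):
                W[k][j] -= lr * grad[k][j] / n
    return W

def predict(W, xi):
    scores = [sum(w * x for w, x in zip(wk, xi)) for wk in W]
    return scores.index(max(scores))

# three well-separated clusters on a line, labeled 0, 1, 2
X = [[1.0, 0.0], [1.0, 0.2], [1.0, 2.0], [1.0, 2.2], [1.0, 4.0], [1.0, 4.2]]
y = [0, 0, 1, 1, 2, 2]
W = train(X, y, num_classes=3)
```

The horizontal-scaling point in the abstract is visible in the inner loop: the per-example gradient contributions are independent, which is exactly what a Spark RDD aggregation parallelizes.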
Bio:
DB Tsai is a machine learning engineer working at Alpine Data Labs. He has recently been working with the Spark MLlib team to add support for the L-BFGS optimizer and multinomial logistic regression upstream. He also led Apache Spark development at Alpine Data Labs. Before joining Alpine Data Labs, he worked on large-scale optimization of optical quantum circuits at Stanford as a PhD student.
Discrete Logarithm Problem - Basis of Elliptic Curve Cryptosystems - NIT Sikkim
ECC was developed in 1985 independently by Neal Koblitz and Victor Miller. Both men saw the elliptic curve discrete log problem (ECDLP) as a replacement for the conventional discrete log problem (DLP) used in DSA, and for the integer factorization problem found in RSA. For both of those problems, sub-exponential solutions have been found; the same cannot be said for ECDLP. In addition to offering increased security for a smaller key size, the point addition and doubling operations can be optimized successfully on a mobile platform. ECC offers a viable replacement for the most common public-key cryptography algorithms on mobile devices.
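The adding and doubling operations mentioned above are plain modular arithmetic. A minimal sketch over a tiny prime field follows, using the small textbook curve y² = x³ + 2x + 2 over GF(17); real ECC uses curves over primes of roughly 256 bits:

```python
P, A = 17, 2                      # field prime and parameter a in y^2 = x^3 + ax + b

def inv(x):
    return pow(x, P - 2, P)       # modular inverse via Fermat's little theorem

def add(p, q):
    """Add two curve points; None represents the point at infinity (identity)."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P == 0:
        return None               # p + (-p) = infinity
    if p == q:
        s = (3 * x1 * x1 + A) * inv(2 * y1) % P   # tangent slope (doubling)
    else:
        s = (y2 - y1) * inv(x2 - x1) % P          # chord slope (addition)
    x3 = (s * s - x1 - x2) % P
    y3 = (s * (x1 - x3) - y1) % P
    return (x3, y3)

def scalar_mult(k, pt):
    """Double-and-add: computes k*pt in O(log k) group operations."""
    result = None
    while k:
        if k & 1:
            result = add(result, pt)
        pt = add(pt, pt)
        k >>= 1
    return result

G = (5, 1)                        # a generator of order 19 on this curve
```

The asymmetry underlying ECDLP is visible here: computing k·G forward takes O(log k) steps via double-and-add, while recovering k from G and k·G has no known sub-exponential algorithm on well-chosen curves.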
Big Data is a new term used in business analytics to identify datasets that we cannot manage with current methodologies or data mining software tools due to their large size and complexity. Big Data mining is the capability of extracting useful information from these large datasets or streams of data. New mining techniques are necessary due to the volume, variability, and velocity of such data.
In this talk, we will focus on advanced techniques in Big Data mining in real time using evolving data stream techniques: using a small amount of time and memory resources, and being able to adapt to changes. We will discuss a social network application of data stream mining to compute user influence probabilities. And finally, we will present the MOA software framework with classification, regression, and frequent pattern methods, and the SAMOA distributed streaming software that runs on top of Storm, Samza and S4.
This slide deck first introduces the sequential pattern mining problem and presents some definitions required to understand the GSP algorithm. At the end, there is a brief introduction to the GSP algorithm and some of the practical constraints it supports.
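The core primitive behind GSP is support counting: a pattern (a sequence of itemsets) is frequent if it appears, in order, in at least min_sup of the data sequences. A small illustrative sketch with made-up data follows; the candidate-generation and pruning machinery of the full GSP algorithm is omitted:

```python
def is_subsequence(pattern, sequence):
    """True if each itemset of `pattern` is contained, in order, in some
    itemset of `sequence` (the GSP containment test, ignoring time
    constraints such as max-gap/min-gap)."""
    i = 0
    for itemset in sequence:
        if i < len(pattern) and pattern[i] <= itemset:  # subset test
            i += 1
    return i == len(pattern)

def support(pattern, database):
    """Number of data sequences that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)

# a tiny sequence database: each sequence is a list of itemsets
db = [
    [{'a'}, {'b', 'c'}, {'d'}],
    [{'a'}, {'c'}],
    [{'b'}, {'d'}],
]
```

For example, the pattern [{'a'}, {'c'}] is supported by the first two sequences but not the third, because the containment test requires the itemsets to occur in order.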
Algebraic Approach to Implementing an ATL Model Checker - infopapers
Laura Florentina Stoica, Florian Mircea Boian, Algebraic Approach to Implementing an ATL Model Checker, STUDIA Univ. Babes Bolyai, INFORMATICA, Volume LVII, Number 2, 2012, pp. 73-82
"Java 8, Lambda e la programmazione funzionale" by Theodor Dumitrescu - ThinkOpen
Theodor Dumitrescu explains why functional programming is being talked about more and more often, especially when the term "lambda" comes up.
Machine learning in the enterprise is an iterative process. Data scientists will tweak or replace their learning algorithm on a small data sample until they find an approach that works for the business problem, and then apply the analytics to the full data set. Apache SystemML is a new system that accelerates this kind of exploratory algorithm development for large-scale machine learning problems. SystemML provides a high-level language to quickly implement and run machine learning algorithms on Spark. SystemML’s cost-based optimizer takes care of low-level decisions about how to use Spark’s parallelism, allowing users to focus on the algorithm and the real-world problem that the algorithm is trying to solve. This talk will introduce you to SystemML and get you started building declarative analytics with SystemML using a simple Zeppelin notebook running in an Apache Spark environment.
30-minute talk from Spark Summit East about the internals of Apache SystemML. Apache SystemML is a system that automatically parallelizes machine learning algorithms, greatly improving the productivity of data scientists. For more information about Apache SystemML, please go to the project's home page at http://systemml.apache.org
The presentation is a brief case study of R Programming Language. In this, we discussed the scope of R, Uses of R, Advantages and Disadvantages of the R programming Language.
This presentation is an introduction to the R programming language. We will talk about the usage, history, data structures, and features of the R programming language.
Is it easier to add functional programming features to a query language, or to add query capabilities to a functional language? In Morel, we have done the latter.
Functional and query languages have much in common, and yet much to learn from each other. Functional languages have a rich type system that includes polymorphism and functions-as-values and Turing-complete expressiveness; query languages have optimization techniques that can make programs several orders of magnitude faster, and runtimes that can use thousands of nodes to execute queries over terabytes of data.
Morel is an implementation of Standard ML on the JVM, with language extensions to allow relational expressions. Its compiler can translate programs to relational algebra and, via Apache Calcite’s query optimizer, run those programs on relational backends.
In this talk, we describe the principles that drove Morel’s design, the problems that we had to solve in order to implement a hybrid functional/relational language, and how Morel can be applied to implement data-intensive systems.
(A talk given by Julian Hyde at Strange Loop 2021, St. Louis, MO, on October 1st, 2021.)
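As a rough analogy in Python (not Morel's Standard ML syntax, and with hypothetical sample data): the kind of program Morel's compiler translates to relational algebra is a query written as an ordinary comprehension over collections, e.g. a join plus a projection, which an optimizer is then free to re-order or push down to a relational backend:

```python
# two small "relations" as lists of records (made-up sample data)
emps = [
    {"name": "Shaggy", "deptno": 10},
    {"name": "Scooby", "deptno": 20},
    {"name": "Velma", "deptno": 10},
]
depts = [
    {"deptno": 10, "dname": "Mystery Inc."},
    {"deptno": 30, "dname": "Accounting"},
]

# a join + projection expressed as a plain comprehension; a query
# optimizer could evaluate this with a hash join rather than the
# nested loop that the literal semantics suggest
result = [(e["name"], d["dname"])
          for e in emps
          for d in depts
          if e["deptno"] == d["deptno"]]
```

The point of the hybrid design is that the same expression stays a first-class value in the host language (it can be passed to functions, composed, typed polymorphically) while still being optimizable as a query.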
These slides describe the basic concepts of industrial-strength compiler design, including the basic concept of static single-assignment form (SSA) and various optimizations such as dead code elimination, global value numbering, and constant propagation. This is intended for a 150-minute undergraduate compiler class.
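Two of the optimizations listed above are easy to sketch on a toy SSA-style IR, where each instruction assigns a fresh variable exactly once (the IR format and operator set here are invented for the example, not taken from the slides):

```python
OPS = {'add': lambda a, b: a + b, 'mul': lambda a, b: a * b}

def const_prop(prog):
    """Constant propagation/folding: evaluate instructions whose operands
    are all known constants. Returns the residual program plus the
    mapping from folded variables to their constant values."""
    env, residual = {}, []
    for dest, op, args in prog:
        vals = [env.get(a, a) for a in args]        # substitute known constants
        if all(isinstance(v, int) for v in vals):
            env[dest] = OPS[op](*vals)              # fold at compile time
        else:
            residual.append((dest, op, vals))
    return residual, env

def dce(prog, live_out):
    """Dead code elimination: in SSA a single backward pass suffices.
    Keep an instruction only if its result is live; its string operands
    (variables) then become live too."""
    live, kept = set(live_out), []
    for dest, op, args in reversed(prog):
        if dest in live:
            kept.append((dest, op, args))
            live |= {a for a in args if isinstance(a, str)}
    return list(reversed(kept))

# t1 and t2 fold to constants; t4 is dead if only t3 is needed
prog = [('t1', 'add', [2, 3]),
        ('t2', 'mul', ['t1', 4]),
        ('t3', 'add', ['x', 1]),
        ('t4', 'mul', ['t3', 2])]
residual, env = const_prop(prog)
```

In a real compiler these passes iterate to a fixpoint and interact with global value numbering, but the single-pass forms above already show why SSA makes both analyses simple: each variable has exactly one defining instruction.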
A Jupyter kernel for Scala and Apache Spark - Luciano Resende
Many data scientists are already making heavy usage of the Jupyter ecosystem for analyzing data using interactive notebooks. Apache Toree (incubating) is a Jupyter kernel designed to enable data scientists and data engineers to easily connect to and leverage Apache Spark and its powerful APIs from a standard Jupyter notebook to execute their analytics workloads. In this talk, we will go over what's new in the most recent Apache Toree release. We will cover the available magics and visualization extensions that can be integrated with Toree to enable better data exploration and data visualization. We will also describe some high-level designs of Toree and how users can extend its functionality through Apache Toree's powerful plugin system. And all of this with multiple live demos that demonstrate how Toree can help with your analytics workloads in an Apache Spark environment.
In this session, Luciano will walk you through a real use-case pipeline that uses Elyra features to help analyze COVID-19 related datasets. He will introduce Elyra, a project built to extend JupyterLab with AI-centric capabilities, and showcase the extensions that allow you to build notebook pipelines and execute them in a Kubeflow environment, execute notebooks as batch jobs, and create, edit, and execute Python scripts directly from JupyterLab.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks - Luciano Resende
In this session Luciano will explore the different projects that compose the Jupyter ecosystem; including Jupyter Notebooks, JupyterLab, JupyterHub and Jupyter Enterprise Gateway. Jupyter Notebooks are the current open standard for data science and AI model development, and IBM is dedicated to contributing to their success and adoption. Continuing the trend of building out the Jupyter ecosystem, Luciano will introduce Elyra. It's a project built to extend JupyterLab with AI-centric capabilities. He'll showcase the extensions that allow you to build Notebook Pipelines, execute notebooks as batch jobs, navigate and execute Python scripts, and tie neatly into Notebook versioning.
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me... - Luciano Resende
The IBM Center for Open Source, Data and AI Technology "CODAIT" (https://developer.ibm.com/code/open/centers/codait/) works on multiple open-source Data and AI projects. In this section we will introduce these projects around Jupyter Notebooks, reusable Model and Data assets, Trusted AI among others.
The Jupyter Notebook has become the de facto platform used by data scientists and AI engineers to build interactive applications and develop their AI/ML models. In this scenario, it’s very common to decompose various phases of the development into multiple notebooks to simplify the development and management of the model lifecycle.
Luciano Resende details how to schedule together these multiple notebooks that correspond to different phases of the model lifecycle into notebook-based AI pipelines and walk you through scenarios that demonstrate how to reuse notebooks via parameterization.
Strata - Scaling Jupyter with Jupyter Enterprise Gateway - Luciano Resende
Born in academia, Jupyter notebooks are prevalent in both learning and research environments throughout the scientific community. Due to the widespread adoption of big data, AI, and deep learning frameworks, notebooks are also finding their way into the enterprise, which introduces a different set of requirements.
Alan Chin and Luciano Resende explain how to introduce Jupyter Enterprise Gateway into new and existing notebook environments to enable a “bring your own notebook” model while simultaneously optimizing resources consumed by the notebook kernels running across managed clusters within the enterprise. Along the way, they detail how to use different frameworks with Enterprise Gateway to meet the needs of data scientists operating within the AI and deep learning ecosystems.
Scaling notebooks for Deep Learning workloads - Luciano Resende
Deep learning workloads are computing intensive, and training these types of models is better done with specialized hardware like GPUs. Luciano Resende outlines a pattern for building deep learning models using the Jupyter Notebook's interactive development on commodity hardware, and leveraging platforms and services such as Fabric for Deep Learning (FfDL) for cost-effective full-dataset training of deep learning models.
Jupyter Enterprise Gateway enables Jupyter Notebook to launch remote kernels in a distributed cluster, including Apache Spark managed by YARN, IBM Spectrum Conductor or Kubernetes.
It provides out of the box support for the following kernels:
Python using IPython kernel
R using IRkernel
Scala using Apache Toree kernel
Artificial intelligence, open source, and the IBM Call for Code - Luciano Resende
In this talk we will cover some of the trends in artificial intelligence and the difficulties in adopting AI. We will also present some tools available as open source that can help simplify AI adoption. And we will give a brief introduction to "Call for Code", an IBM initiative to build solutions for natural disaster prevention and response.
IoT Applications and Patterns using Apache Spark & Apache Bahir - Luciano Resende
The Internet of Things (IoT) is all about connected devices that produce and exchange data, and building applications that produce insights from these high volumes of data is very challenging and requires an understanding of multiple protocols, platforms, and other components. In this session, we will start by providing a quick introduction to IoT and some of the common analytic patterns used in IoT, and also touch on the MQTT protocol, how it is used by IoT solutions, and some of the quality-of-service tradeoffs to be considered when building an IoT application. We will also discuss some of the Apache Spark platform components, in particular the ones utilized by IoT applications to process device streaming data.
We will then talk about Apache Bahir and some of its IoT connectors available for the Apache Spark platform, and go over the details of how to build, test, and deploy an IoT application for Apache Spark using the MQTT data source for the new Apache Spark Structured Streaming functionality.
Getting insights from IoT data with Apache Spark and Apache Bahir - Luciano Resende
The Internet of Things (IoT) is all about connected devices that produce and exchange data, and producing insights from these high volumes of data is challenging. In this session, we will start by providing a quick introduction to the MQTT protocol, and then focus on using AI and machine learning techniques to provide insights from data collected from IoT devices. We will present some common AI concepts and techniques used by the industry to deploy state-of-the-art smart IoT systems. These techniques allow systems to determine patterns in the data, predict and prevent failures, and suggest actions that can minimize or avoid IoT device breakdowns in an intelligent way, going beyond rule-based and database search approaches. We will finish with a demo that puts together all the techniques discussed in an application that uses Apache Spark and Apache Bahir support for MQTT.
This presentation describes some of the Open Source Ai projects we are working at the Center for Open Source, Data and AI Technologies (CODAIT), including Model Asset Exchange (MAX), Fabric for Deep Learning (FfDL) and Jupyter Enterprise Gateway.
Building analytical microservices powered by Jupyter kernels - Luciano Resende
Jupyter kernels, which abstract the computing engine used in Jupyter Notebooks, are a very powerful component that can be reused in different scenarios to bring analytical capabilities to applications. In this session, we will discuss how you can build a simple Python-based microservice that leverages Jupyter kernels to incorporate sentiment analysis into the service it provides.
Building IoT applications with Apache Spark and Apache Bahir - Luciano Resende
We live in a connected world where connected devices are becoming part of our day-to-day lives and are providing invaluable streams of data. In this talk, we will introduce you to Apache Bahir and some of its IoT connectors available for Apache Spark. We will also go over the details of how to build, test, and deploy an IoT application for Apache Spark using the MQTT data source for the new Apache Spark Structured Streaming functionality.
An Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark - Luciano Resende
IBM has built a “Data Science Experience” cloud service that exposes notebook services at web scale. Behind this service are the various components that power the platform, including Jupyter Notebooks, an enterprise gateway that manages the execution of the Jupyter kernels, and an Apache Spark cluster that powers the computation. In this session we will describe our experience and best practices putting together this analytical platform as a service based on Jupyter Notebooks and Apache Spark, in particular how we built the Enterprise Gateway that enables all the notebooks to share the Spark cluster's computational resources.
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017 - Luciano Resende
JupyterCon meetup - Extended Jupyter Kernel Gateway - Luciano Resende
Data scientists are becoming a necessity for every company in today's data-centric world, and with them comes the requirement to make available an elastic and interactive analytics platform. This session will describe our experience and best practices putting together an analytical platform based on the Jupyter stack and different kernels running on a distributed Apache Spark cluster.
Writing Apache Spark and Apache Flink Applications Using Apache Bahir - Luciano Resende
Big Data is all about being able to access and process data in various formats and from various sources. Apache Bahir provides extensions to distributed analytic platforms, giving them access to different data sources. In this talk we will introduce you to Apache Bahir and its various connectors that are available for Apache Spark and Apache Flink. We will also go over the details of how to build, test, and deploy a Spark application using the MQTT data source for the new Apache Spark 2.0 Structured Streaming functionality.
How mentoring can help you start contributing to open source - Luciano Resende
As adoption of Open Source code and development practices continues to gain momentum, more newcomers have become interested in getting involved and contributing to Open Source. However, it's usually not easy for newcomers to start contributing to open source projects. This session will discuss how community mentors can ease the way for newcomers to get started with open source, and will provide an overview of existing mentoring programs such as Google Summer of Code that can help you get paired with community mentors and start contributing to open source right away.
Smart TV Buyer Insights Survey 2024 by 91mobiles - 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
UiPath Test Automation using UiPath Test Suite series, part 3 - DianaGray10
Welcome to part 3 of the UiPath Test Automation using UiPath Test Suite series. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... - Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 - Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
Kubernetes & AI - Beauty and the Beast!?! @KCD Istanbul 2024 - Tobias Schneck
As AI technology pushes into IT, I was wondering, as an "infrastructure container Kubernetes guy", how does this fancy AI technology get managed from an infrastructure operations view? Is it possible to apply our lovely cloud-native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and provide you a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need in order to apply it to our own infrastructure and get it to work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could be beneficial for, or limiting to, your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I have already gotten working in practice.
Connector Corner: Automate dynamic content and events by pushing a button - DianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Elevating Tactical DDD Patterns Through Object Calisthenics - Dorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... - UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Key Trends Shaping the Future of Infrastructure - Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
DevOps and Testing slides at DASA Connect - Kari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
What's new in Apache SystemML - Declarative Machine Learning
1. IBM Spark Technology Center
Apache SystemML
Declarative Machine Learning
Luciano Resende
IBM | Spark Technology Center
Big Data Developers Meetup – Spain/Madrid – Nov 2017
3. Open Source Community Leadership
Spark Technology Center: Founding Partner · 188+ Project Committers · 77+ Projects
Key open source steering committee memberships; OSS Advisory Board
4. IBM Spark Technology Center
Founded in 2015.
Location:
Physical: 505 Howard St., San Francisco CA
Web: http://spark.tc Twitter: @apachespark_tc
Mission:
Contribute intellectual and technical capital to the Apache Spark
community.
Make the core technology enterprise- and cloud-ready.
Build data science skills to drive intelligence into business applications
— http://bigdatauniversity.com
Key statistics:
About 50 developers, co-located with 25 IBM designers.
Major contributions to Apache Spark http://jiras.spark.tc
Apache SystemML is now an Apache Incubator project.
Founding member of UC Berkeley AMPLab and RISE Lab
Member of R Consortium and Scala Center
Spark Technology Center
5. STC Impact on the Community
Contributions: 46,385 Spark LOC · 863 Spark JIRAs · 457 SystemML JIRAs · 67 speakers at events
Focus on meaningful code contributions across all major Spark projects.
863 code contributions (JIRAs) and counting – check out http://jiras.spark.tc
Over 422 commits in Spark 2.0, and continuing major contributions in 2.x.
Contributions by the Spark Technology Center span almost all components of Spark — Spark Core, SparkR, SQL, MLlib, Streaming, PySpark, build and infrastructure, etc.
6. Project Focus Areas
Machine Learning: Spark MLlib, R4ML, Online Retraining, Apache Arrow, SystemML, Deep Learning
Consumability: Reference architectures, Spark Notebook stack, Spark resource optimization, Spark Web UI, Apache Bahir, RedRock, Immersive Insights
SQL: TPC-DS and performance, query pushdown/federation
8. Origins of the SystemML Project
2007-2008: Multiple projects at IBM Research – Almaden involving machine
learning on Hadoop.
2009: We create a dedicated team for scalable ML.
2009-2010: Through engagements with customers, we observe how data scientists
create machine learning algorithms.
11. State-of-the-Art: Big Data
The data scientist writes the algorithm in R or Python; a systems programmer translates it to Scala to produce results at scale.
😞 Days or weeks per iteration
😞 Errors while translating algorithms
14. Linear Algebra is the Language of Machine Learning
Linear algebra is powerful, precise, and high-level.
It can express complex transformations over large arrays of data…
…using a small number of instructions
…in a clear and unambiguous way.
SystemML provides highly optimized distributed linear algebra.
15. Running Example: Alternating Least Squares
Problem: movie recommendations. Start from a sparse matrix of movies and users, where a nonzero entry (i, j) means user i liked movie j.
Factor this matrix into a Movies factor and a Users factor. Multiplying these two factors produces a less-sparse matrix; the new nonzero values become movie suggestions.
16. Alternating Least Squares (in R)
U = rand(nrow(X), r, min = -1.0, max = 1.0);
V = rand(r, ncol(X), min = -1.0, max = 1.0);
while(i < mi) {
i = i + 1; ii = 1;
if (is_U)
G = (W * (U %*% V - X)) %*% t(V) + lambda * U;
else
G = t(U) %*% (W * (U %*% V - X)) + lambda * V;
norm_G2 = sum(G ^ 2); norm_R2 = norm_G2;
R = -G; S = R;
while(norm_R2 > 10E-9 * norm_G2 & ii <= mii) {
if (is_U) {
HS = (W * (S %*% V)) %*% t(V) + lambda * S;
alpha = norm_R2 / sum (S * HS);
U = U + alpha * S;
} else {
HS = t(U) %*% (W * (U %*% S)) + lambda * S;
alpha = norm_R2 / sum (S * HS);
V = V + alpha * S;
}
R = R - alpha * HS;
old_norm_R2 = norm_R2; norm_R2 = sum(R ^ 2);
S = R + (norm_R2 / old_norm_R2) * S;
ii = ii + 1;
}
is_U = ! is_U;
}
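To make the script's structure concrete, here is an illustrative NumPy re-implementation of the same alternating update with a conjugate-gradient inner loop. This is a small dense sketch that mirrors the variable names of the script above; it is not SystemML code and makes no claims about SystemML's runtime behavior:

```python
import numpy as np

def als_cg(X, W, r=2, mi=10, mii=5, lam=0.01, seed=0):
    """Weighted ALS via conjugate gradient, mirroring the DML script.

    X: data matrix, W: weight matrix (1 where observed), r: rank,
    mi/mii: outer/inner iteration limits, lam: L2 regularization."""
    rng = np.random.default_rng(seed)
    U = rng.uniform(-1.0, 1.0, (X.shape[0], r))
    V = rng.uniform(-1.0, 1.0, (r, X.shape[1]))
    is_U = True
    for _ in range(mi):
        # Gradient of the regularized weighted squared error
        if is_U:
            G = (W * (U @ V - X)) @ V.T + lam * U
        else:
            G = U.T @ (W * (U @ V - X)) + lam * V
        norm_G2 = (G ** 2).sum(); norm_R2 = norm_G2
        R = -G; S = R.copy()
        ii = 0
        # Conjugate-gradient inner loop on the fixed factor
        while norm_R2 > 1e-8 * norm_G2 and ii < mii:
            if is_U:
                HS = (W * (S @ V)) @ V.T + lam * S
                alpha = norm_R2 / (S * HS).sum()
                U = U + alpha * S
            else:
                HS = U.T @ (W * (U @ S)) + lam * S
                alpha = norm_R2 / (S * HS).sum()
                V = V + alpha * S
            R = R - alpha * HS
            old_norm_R2 = norm_R2; norm_R2 = (R ** 2).sum()
            S = R + (norm_R2 / old_norm_R2) * S
            ii += 1
        is_U = not is_U   # alternate between the two factors
    return U, V
```

On a small synthetic rank-2 matrix with all entries observed, a few outer iterations recover the factorization closely — the same alternating structure the DML script expresses, just without the distributed execution.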
17. Alternating Least Squares (in R)
1. Start with random factors.
2. Hold the Movies factor constant and find the best value for the Users factor (the value that most closely approximates the original matrix).
3. Hold the Users factor constant and find the best value for the Movies factor.
4. Repeat steps 2-3 until convergence.
Every line of the script has a clear purpose!
23. Alternating Least Squares (in R)
24. Alternating Least Squares (in SystemML's subset of R)
SystemML can compile and run this algorithm at scale.
No additional performance code needed!
25. How fast does it run?
Running time comparisons between machine learning algorithms are problematic:
there are different, equally-valid answers, and different convergence rates on different data.
But we'll do one anyway.
26. Performance Comparison: ALS
[Chart: running time (sec, 0–20,000) for R, MLlib, and SystemML on 1.2GB (sparse binary), 12GB, and 120GB inputs. R and MLlib hit >24h running times or out-of-memory (OOM) failures on the larger inputs; SystemML completes all three.]
Details: synthetic data, 0.01 sparsity, 10^5 products × {10^5, 10^6, 10^7} users. Data generated by multiplying two rank-50 matrices of normally-distributed data, sampling from the resulting product, then adding Gaussian noise. Cluster of 6 servers with 12 cores and 96GB of memory per server. Number of iterations tuned so that all algorithms produce comparable result quality.
27. Takeaway Points
SystemML runs the R script in parallel.
Same answer as the original R script.
Performance is comparable to a low-level RDD-based implementation.
Also, for Python lovers, an equivalent Python syntax (PyDML) exists!
How does SystemML achieve this result?
28. The SystemML Optimizer and Runtime for Spark
Automates critical performance decisions:
Distributed or local computation?
How to partition the data?
To persist or not to persist?
Distributed vs. local: hybrid runtime
Multithreaded computation in the Spark driver
Distributed computation on the Spark executors
The optimizer makes a cost-based choice.
High-Level Operations (HOPs): general representation of statements in the data analysis language.
Low-Level Operations (LOPs): general representation of operations in the runtime framework.
The cost-based optimizer connects high-level language front-ends to multiple execution environments.
29. But wait, there's more!
Many other rewrites
Cost-based selection of operators
Dynamic recompilation for accurate stats
Parallel FOR (ParFor) optimizer
Direct operations on RDD partitions
YARN and MapReduce support
New in next release: Compressed Linear Algebra
30. Summary
Cost-based compilation of machine learning algorithms generates execution plans
for single-node in-memory, cluster, and hybrid execution
for varying data characteristics:
varying number of observations (1,000s to 10s of billions), number of variables (10s to 10s of millions), dense and sparse data
for varying cluster characteristics (memory configurations, degree of parallelism)
Out-of-the-box, scalable machine learning algorithms
e.g. descriptive statistics, regression, clustering, and classification
"Roll-your-own" algorithms
Enable programmer productivity (no worry about scalability, numeric stability, and optimizations)
Fast turn-around for new algorithms
Higher-level language shields algorithm development investment from platform
progression
Yarn for resource negotiation and elasticity
Spark for in-memory, iterative processing
34. Expressing Algorithms with SystemML
Gaussian Nonnegative Matrix Factorization
in DML (SystemML’s R-like syntax)
while (i < max_iteration) {
H <- H * ((t(W) %*% V) /
(((t(W) %*% W) %*% H)+Eps))
W <- W * ((V %*% t(H)) /
((W %*% (H %*% t(H)))+Eps))
i <- i + 1
}
Gaussian Nonnegative Matrix Factorization
in PyDML (SystemML’s Python-like syntax)
while (i < max_iteration):
H = H * (dot(W.transpose(), V) /
(dot(dot(W.transpose(), W), H)
+ Eps))
W = W * (dot(V, H.transpose()) /
(dot(W, dot(H, H.transpose()))
+ Eps))
i = i + 1
SystemML users write machine learning algorithms in a domain specific language.
SystemML has APIs for embedding these algorithms in Python, Scala, or Java Spark applications
The R4ML project provides similar functionality for SparkR.
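To see what the multiplicative updates above compute, here is a plain NumPy sketch of the same Gaussian NMF loop. This is an illustrative dense re-implementation for checking the math locally, not SystemML code; `eps` plays the role of `Eps`, guarding against division by zero:

```python
import numpy as np

def gnmf(V, r, max_iteration=100, eps=1e-8, seed=0):
    """Gaussian nonnegative matrix factorization via multiplicative updates.

    Factors nonnegative V (m x n) into nonnegative W (m x r) and H (r x n)."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(0.0, 1.0, (V.shape[0], r))
    H = rng.uniform(0.0, 1.0, (r, V.shape[1]))
    for _ in range(max_iteration):
        # Same updates as the DML/PyDML loop, element-wise scale factors
        H = H * (W.T @ V) / (W.T @ W @ H + eps)
        W = W * (V @ H.T) / (W @ (H @ H.T) + eps)
    return W, H
```

Because the updates are pure ratios of matrix products, the loop body translates almost line-for-line between DML, PyDML, and NumPy — which is exactly the point of expressing the algorithm in linear algebra.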
35. Scikit-Learn Compatibility: The MLLearn API
Python API designed to be compatible with scikit-learn and Spark ML Pipelines.
Algorithms that are currently part of the mllearn API:
• LogisticRegression, LinearRegression, SVM, NaiveBayes, and Caffe2DML (discussed later)
Hyperparameter naming/initialization similar to scikit-learn (penalty, fit_intercept, normalize, …) to reduce the learning curve.
Supports loading and saving the model.
36. Linear Regression Example
From http://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html
Python script using sklearn
Changes required to run on SystemML
37. Integration with Apache Spark’s ML Pipelines
Changes required to run on SystemML
From https://spark.apache.org/docs/latest/ml-pipeline.html
38. Caffe2DML (experimental)
Caffe2DML is a tool that converts the specification for a Caffe deep learning model into a SystemML script to perform training or scoring at scale.
The generated scripts produce TensorBoard-compatible log output.
[Diagram: a Caffe network file and a Caffe solver file feed into Caffe2DML, which emits a generated DML script for Apache SystemML, plus a log.]
40. SystemML Deep Learning `nn` Library
• Deep learning library written in DML.
• Multiple layers:
• Core: Affine, 2D Conv, 2D Transpose Conv, 2D Max Pooling, 1D/2D Batch Norm, RNN, LSTM
• Nonlinearity/Transfer: ReLU, Sigmoid, Tanh, Softmax
• Regularization: Dropout, L1, L2
• Loss: Log-loss, Cross-entropy, L1, L2
• Multiple optimizers: SGD, SGD w/ momentum, SGD w/ Nesterov momentum, Adagrad, RMSprop, Adam
• Layers have a simple `forward` & `backward` API.
• Optimizers have a simple `update` API.
https://github.com/apache/systemml/tree/master/scripts/nn
(Example: a LeNet-like convnet)
41. GPU Support in SystemML
Benefits of the SystemML approach:
Simplifies algorithm development.
Makes experimentation easier.
Your code gets faster as the system improves.
42. GPU Support in SystemML
SystemML's optimizer can target multiple runtime back ends:
Single-node SMP
Multi-node Spark
Hybrid: large SMP plus a pool of Spark workers
We are adding new GPU-accelerated runtimes to SystemML:
Single-node single GPU
Single-node multi-GPU
Distributed multi-GPU on Spark
GPU-accelerate an algorithm without changing its code.
43. GPU Support in SystemML: Current Status
(In Progress) Single Node, Single GPU Support
• Deep Neural Network Operators
conv2d, conv2d_backward_data, conv2d_backward_filter, bias_add, bias_multiply,
max_pooling, max_pooling_backward, relu_max_pooling,
relu_max_pooling_backward
• Unary Aggregates
{All/Row/Col}-Sum, Mean, Variance, Min, Max & All-Product
• Matrix Multiplication
Various shapes & sparsities
• Transpose
• Matrix-Matrix and Matrix-Scalar Element-Wise
+, -, *, /, ^
• Trigonometric & Mathematical Operations (on entire Matrices)
sin, cos, tan, asin, acos, atan, log, sqrt, abs, floor, round, ceil, solve
• Some Fused/Special Case Operators
Ax+y, X*t(X), Max(X, 0.0)
• (In Progress) Automatically determine whether to use the GPU or not
(In Progress) - Single Node, Multiple GPU Support
(Planned) - Multiple Node, Multiple GPU Support
44. Summary: Cool New Stuff in Apache SystemML
Top-level Apache project
API improvements
Deep learning
Code generation
Compressed linear algebra
47. SystemML Tutorial
Tutorial hosted at IBM developerWorks Code Patterns:
https://developer.ibm.com/code/patterns/perform-a-machine-learning-exercise/
Tutorial source code available on GitHub:
https://github.com/IBM/SystemML_Usage?cm_sp=Developer-_-perform-a-machine-learning-exercise-_-Get-the-Code
Try this on DSX/IBM Cloud:
https://ibm.biz/BdjJJG
49. For More Information…
Try Apache SystemML!
http://systemml.apache.org
Read our VLDB 2016 paper on compressed linear algebra (Best Paper award!):
Ahmed Elgohary et al., "Compressed Linear Algebra for Large-Scale Machine Learning." VLDB 2016
Read our CIDR 2017 paper on codegen:
Tarek Elgamal et al., "SPOOF: Sum-Product Optimization and Operator Fusion for Large-Scale Machine Learning." CIDR 2017
Get the slides for our Strata 2016 talk on deep learning with SystemML:
"Leveraging deep learning to predict breast cancer proliferation scores with Apache Spark and Apache SystemML"