Big Data with Hadoop & Spark Training: http://bit.ly/2Jab9wX
This CloudxLab Advanced Spark Programming tutorial helps you to understand Advanced Spark Programming in detail. Below are the topics covered in this tutorial:
1) Persistence (Caching) in Spark
2) Persistence Storage Level
3) Which Storage Level to Choose?
4) Data Partitioning in Spark
Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2kyP2Ct
This CloudxLab Introduction to NoSQL tutorial helps you to understand NoSQL in detail. Below are the topics covered in this slide:
1) Introduction to NoSQL
2) Scaling Out vs Scaling Up
3) ACID - Properties of DB Transactions
4) RDBMS - Story
5) What is NoSQL?
6) Types Of NoSQL Stores
7) CAP Theorem
8) Serialization
9) Column Oriented Database
10) Column Family Oriented DataStore
Apache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2IUsWca
This CloudxLab Running in a Cluster tutorial helps you to understand running Spark in the cluster in detail. Below are the topics covered in this tutorial:
1) Spark Runtime Architecture
2) Driver Node
3) Scheduling Tasks on Executors
4) Understanding the Architecture
5) Cluster Managers
6) Executors
7) Launching a Program using spark-submit
8) Local Mode & Cluster-Mode
9) Installing Standalone Cluster
10) Cluster Mode - YARN
11) Launching a Program on YARN
12) Cluster Mode - Mesos and AWS EC2
13) Deployment Modes - Client and Cluster
14) Which Cluster Manager to Use?
15) Common flags for spark-submit
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2L4rPmM
This CloudxLab Basics of RDD tutorial helps you to understand Basics of RDD in detail. Below are the topics covered in this tutorial:
1) What is RDD - Resilient Distributed Datasets
2) Creating RDD in Scala
3) RDD Operations - Transformations & Actions
4) RDD Transformations - map() & filter()
5) RDD Actions - take() & saveAsTextFile()
6) Lazy Evaluation & Instant Evaluation
7) Lineage Graph
8) flatMap and Union
9) Scala Transformations - Union
10) Scala Actions - saveAsTextFile(), collect(), take() and count()
11) More Actions - reduce()
12) Can We Use reduce() for Computing Average?
13) Solving Problems with Spark
14) Compute Average and Standard Deviation with Spark
15) Pick Random Samples From a Dataset using Spark
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2spQIBA
This CloudxLab Introduction to Apache Spark tutorial helps you to understand Spark in detail. Below are the topics covered in this tutorial:
1) Spark Architecture
2) Why Apache Spark?
3) Shortcoming of MapReduce
4) Downloading Apache Spark
5) Starting Spark With Scala Interactive Shell
6) Starting Spark With Python Interactive Shell
7) Getting started with spark-submit
A very short set of slides to describe an RDD data structure.
Extracted from my 3-day course: www.sparkInternals.com
There is also a video of this on YouTube: http://youtu.be/odcEg515Ne8
Introduction to Structured Streaming | Big Data Hadoop Spark Tutorial | Cloud... (CloudxLab)
Big Data with Hadoop & Spark Training: http://bit.ly/2so9ZDk
This CloudxLab Introduction to Structured Streaming tutorial helps you to understand Structured Streaming in detail. Below are the topics covered in this tutorial:
1) Structured Streaming - Introduction
2) Word Count using Structured Streaming
3) Programming Model
4) Output Modes - Complete, Append and Update
Apache Spark Introduction and Resilient Distributed Dataset basics and deep dive (Sachin Aggarwal)
We will give a detailed introduction to Apache Spark and explain why and how Spark can change the analytics world. Apache Spark's memory abstraction is the RDD (Resilient Distributed Dataset). One of the key reasons Apache Spark is so different is the introduction of the RDD; you cannot do anything in Apache Spark without knowing about RDDs. We will give a high-level introduction to RDDs, and in the second half we will take a deep dive into them.
Introduction to Apache Spark. With an emphasis on the RDD API, Spark SQL (DataFrame and Dataset API) and Spark Streaming.
Presented at the Desert Code Camp:
http://oct2016.desertcodecamp.com/sessions/all
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2LCTufA
This CloudxLab Introduction to SparkR tutorial helps you to understand SparkR in detail. Below are the topics covered in this tutorial:
1) SparkR (R on Spark)
2) SparkR DataFrames
3) Launch SparkR
4) Creating DataFrames from Local DataFrames
5) DataFrame Operation
6) Creating DataFrames - From JSON
7) Running SQL Queries from SparkR
Advanced Spark Programming - Part 2 | Big Data Hadoop Spark Tutorial | CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2kyRTuW
This CloudxLab Advanced Spark Programming tutorial helps you to understand Advanced Spark Programming in detail. Below are the topics covered in this slide:
1) Shared Variables - Accumulators & Broadcast Variables
2) Accumulators and Fault Tolerance
3) Custom Accumulators - Version 1.x & Version 2.x
4) Examples of Broadcast Variables
5) Key Performance Considerations - Level of Parallelism
6) Serialization Format - Kryo
7) Memory Management
8) Hardware Provisioning
This deep dive attempts to "de-mystify" Spark by touching on some of the main design philosophies and diving into some of the more advanced features that make it such a flexible and powerful cluster computing framework. It will touch on some common pitfalls and attempt to build some best practices for building, configuring, and deploying Spark applications.
These slides give an overview of the different parts of Apache Spark.
We analyze the Spark shell in both Scala and Python. Then we consider Spark SQL, with an introduction to the DataFrame API. Finally, we describe Spark Streaming and give some code examples.
Topics: spark-shell, pyspark, HDFS, how to copy a file to HDFS, Spark transformations, Spark actions, Spark SQL (Shark),
Spark Streaming, stateless vs. stateful streaming transformations, sliding windows, examples
This presentation shows the main Spark characteristics, like RDDs, Transformations, and Actions.
I used this presentation for many Spark intro workshops of the Cluj-Napoca Big Data community: http://www.meetup.com/Big-Data-Data-Science-Meetup-Cluj-Napoca/
This slide deck is used as an introduction to the internals of Apache Spark, as part of the Distributed Systems and Cloud Computing course I hold at Eurecom.
Course website:
http://michiard.github.io/DISC-CLOUD-COURSE/
Sources available here:
https://github.com/michiard/DISC-CLOUD-COURSE
We will see the internal architecture of a Spark cluster, i.e. what the driver, workers, executors, and cluster manager are, how a Spark program runs on a cluster, and what jobs, stages, and tasks are.
We will see an overview of Spark in Big Data. We will start with an introduction to Apache Spark programming, then move on to Spark's history and why Spark is needed. Afterward, we will cover the fundamentals of Spark components and learn about Spark's core abstraction, the RDD. For more detailed insights, we will also cover Spark features, Spark limitations, and Spark use cases.
Apache Spark - Dataframes & Spark SQL - Part 2 | Big Data Hadoop Spark Tutori... (CloudxLab)
Big Data with Hadoop & Spark Training: http://bit.ly/2sm9c61
This CloudxLab Introduction to Spark SQL & DataFrames tutorial helps you to understand Spark SQL & DataFrames in detail. Below are the topics covered in this slide:
1) Loading XML
2) What is RPC - Remote Procedure Call
3) Loading AVRO
4) Data Sources - Parquet
5) Creating DataFrames From Hive Table
6) Setting up Distributed SQL Engine
This is the presentation I gave at JavaDay Kiev 2015 on the architecture of Apache Spark. It covers the memory model, the shuffle implementations, data frames, and some other high-level stuff, and can be used as an introduction to Apache Spark.
In this second part, we'll continue the review of Spark and introduce Spark SQL, which allows you to use data frames in Python, Java, and Scala; read and write data in a variety of structured formats; and query Big Data with SQL.
Spark 101 – First Steps To Distributed Computing - Demi Ben-Ari @ Ofek Alumni
The world has changed, and having one huge server won't do the job anymore. When you're talking about vast amounts of data, growing all the time, the ability to scale out will be your saviour.
This lecture will be about the basics of Apache Spark and distributed computing and the development tools needed to have a functional environment.
Bio:
Demi Ben-Ari, Sr. Data Engineer @Windward, Ofek Alumni
Has over 9 years of experience building various systems, from near-real-time applications to Big Data distributed systems.
Co-Founder of the “Big Things” Big Data community: http://somebigthings.com/big-things-i...
Understanding computer vision with Deep Learning (CloudxLab)
Computer vision is a branch of computer science which deals with recognising objects, people and identifying patterns in visuals. It is basically analogous to the vision of an animal.
Topics covered:
1. Overview of Machine Learning
2. Basics of Deep Learning
3. What is computer vision and its use-cases?
4. Various algorithms used in Computer Vision (mostly CNN)
5. Live hands-on demo of either Auto Cameraman or Face recognition system
6. What next?
( Machine Learning & Deep Learning Specialization Training: https://goo.gl/5u2RiS )
This CloudxLab Reinforcement Learning tutorial helps you to understand Reinforcement Learning in detail. Below are the topics covered in this tutorial:
1) What is Reinforcement?
2) Reinforcement Learning an Introduction
3) Reinforcement Learning Example
4) Learning to Optimize Rewards
5) Policy Search - Brute Force Approach, Genetic Algorithms and Optimization Techniques
6) OpenAI Gym
7) The Credit Assignment Problem
8) Inverse Reinforcement Learning
9) Playing Atari with Deep Reinforcement Learning
10) Policy Gradients
11) Markov Decision Processes
Apache Spark - Key Value RDD - Transformations | Big Data Hadoop Spark Tutori... (CloudxLab)
Big Data with Hadoop & Spark Training: http://bit.ly/2sm5Ekd
This CloudxLab Key-Value RDD Transformations tutorial helps you to understand Key-Value RDD transformations in detail. Below are the topics covered in this tutorial:
1) Transformations on Key-Value Pair RDD - keys(), values(), groupByKey(), combineByKey(), sortByKey(), subtractByKey(), join(), leftOuterJoin(), rightOuterJoin(), cogroup(), countByKey() and lookup()
Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori... (CloudxLab)
Big Data with Hadoop & Spark Training: http://bit.ly/2sf2z6i
This CloudxLab Introduction to Spark SQL & DataFrames tutorial helps you to understand Spark SQL & DataFrames in detail. Below are the topics covered in this slide:
1) Introduction to DataFrames
2) Creating DataFrames from JSON
3) DataFrame Operations
4) Running SQL Queries Programmatically
5) Datasets
6) Inferring the Schema Using Reflection
7) Programmatically Specifying the Schema
Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial... (CloudxLab)
Big Data with Hadoop & Spark Training: http://bit.ly/2sh5b3E
This CloudxLab Hadoop Streaming tutorial helps you to understand Hadoop Streaming in detail. Below are the topics covered in this tutorial:
1) Hadoop Streaming and Why Do We Need it?
2) Writing Streaming Jobs
3) Testing Streaming jobs and Hands-on on CloudxLab
Introduction To TensorFlow | Deep Learning Using TensorFlow | CloudxLab
( Machine Learning & Deep Learning Specialization Training: https://goo.gl/6n3vko )
This CloudxLab TensorFlow tutorial helps you to understand TensorFlow in detail. Below are the topics covered in this tutorial:
1) Why TensorFlow?
2) What are Tensors?
3) What is TensorFlow?
4) Creating your First Graph
5) Linear Regression with TensorFlow
6) Implementing Gradient Descent using TensorFlow
7) Implementing Gradient Descent Using autodiff
8) Implementing Gradient Descent Using an Optimizer
9) Graph Visualization using TensorBoard
10) Name Scopes in TensorFlow
11) Modularity in TensorFlow
12) Sharing Variables in TensorFlow
Introduction to Deep Learning | CloudxLab
( Machine Learning & Deep Learning Specialization Training: https://goo.gl/goQxnL )
This CloudxLab Deep Learning tutorial helps you to understand Deep Learning in detail. Below are the topics covered in this tutorial:
1) What is Deep Learning
2) Deep Learning Applications
3) Artificial Neural Network
4) Deep Learning Neural Networks
5) Deep Learning Frameworks
6) AI vs Machine Learning
In this tutorial, we will learn the following topics -
+ The Curse of Dimensionality
+ Main Approaches for Dimensionality Reduction
+ PCA - Principal Component Analysis
+ Kernel PCA
+ LLE
+ Other Dimensionality Reduction Techniques
In this tutorial, we will learn the following topics -
+ Voting Classifiers
+ Bagging and Pasting
+ Random Patches and Random Subspaces
+ Random Forests
+ Boosting
+ Stacking
In this tutorial, we will learn the following topics -
+ Training and Visualizing a Decision Tree
+ Making Predictions
+ Estimating Class Probabilities
+ The CART Training Algorithm
+ Computational Complexity
+ Gini Impurity or Entropy?
+ Regularization Hyperparameters
+ Regression
+ Instability
In this tutorial, we will learn the following topics -
+ Linear SVM Classification
+ Soft Margin Classification
+ Nonlinear SVM Classification
+ Polynomial Kernel
+ Adding Similarity Features
+ Gaussian RBF Kernel
+ Computational Complexity
+ SVM Regression
Introduction to Linux | Big Data Hadoop Spark Tutorial | CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2wLh5aF
This CloudxLab Introduction to Linux helps you to understand Linux in detail. Below are the topics covered in this tutorial:
1) Linux Overview
2) Linux Components - The Programs, The Kernel, The Shell
3) Overview of Linux File System
4) Connect to Linux Console
5) Linux - Quick Start Commands
6) Overview of Linux File System
Introduction to Pig & Pig Latin | Big Data Hadoop Spark Tutorial | CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2LF3pBA
This CloudxLab Introduction to Pig & Pig Latin tutorial helps you to understand Pig and Pig Latin in detail. Below are the topics covered in this tutorial:
1) Introduction to Pig
2) Why Do We Need Pig?
3) Pig - Usecases
4) Pig - Philosophy
5) Pig Latin - Data Flow Language
6) Pig - Local and MapReduce Mode
7) Pig Data Types
8) Load, Store, and Dump in Pig
9) Lazy Evaluation in Pig
10) Pig - Relational Operators - FOREACH, GROUP and FILTER
11) Hands-on on Pig - Calculate Average Dividend of NYSE
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2JjXp2u
This CloudxLab Oozie tutorial helps you to understand Oozie in detail. Below are the topics covered in this tutorial:
1) Introduction to Oozie
2) Oozie - Workflow & Coordinator Jobs
3) Oozie - Workflow jobs - DAG (Directed Acyclic Graph)
4) Oozie Use cases
5) Oozie Workflow - XML
6) Oozie Hands-on on the command line and Hue
7) Oozie WorkFlow for Hive
8) Execute shell script using Oozie Workflow
9) Run and debug the Spark task on Oozie
Elevating Tactical DDD Patterns Through Object Calisthenics (Dorra BARTAGUIZ)
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Key Trends Shaping the Future of Infrastructure.pdf (Cheryl Hung)
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
GraphRAG is All You need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Essentials of Automations: Optimizing FME Workflows with Parameters (Safe Software)
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview (Prayukth K V)
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities, spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Transcript: Selling digital books in 2024: Insights from industry leaders - T... (BookNet Canada)
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
JMeter webinar - integration with InfluxDB and Grafana (RTTS)
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality (Inflectra)
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
UiPath Test Automation using UiPath Test Suite series, part 4 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimizing testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
4. Advanced Spark Programming
1. RDDs are lazily evaluated
2. An RDD and its dependencies are recomputed on each action
Persistence (caching)
5. Advanced Spark Programming
1. RDDs are lazily evaluated
2. An RDD and its dependencies are recomputed on each action
3. We may wish to use the same RDD multiple times.
Persistence (caching)
6. Advanced Spark Programming
1. RDDs are lazily evaluated
2. An RDD and its dependencies are recomputed on each action
3. We may wish to use the same RDD multiple times.
4. To avoid re-computing, we can persist the RDD
Persistence (caching)
7. Advanced Spark Programming
1. If a node that has data persisted on it fails, Spark will
recompute the lost partitions of the data when needed.
Persistence (caching)
8. Advanced Spark Programming
Persistence (caching)
1. If a node that has data persisted on it fails, Spark will
recompute the lost partitions of the data when needed.
2. We can also replicate our data on multiple nodes if we
want to be able to handle node failure without slowdown.
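A minimal sketch of replicated persistence, assuming an already-created RDD named rdd; the _2 levels introduced later keep each partition on two nodes, so one node's failure causes no recomputation:
import org.apache.spark.storage.StorageLevel
rdd.persist(StorageLevel.MEMORY_ONLY_2) // each partition is stored on 2 nodes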
11. Advanced Spark Programming
var nums = sc.parallelize(1 to 100000, 50)
def mysum(itr: Iterator[Int]): Iterator[Int] = {
  Array(itr.sum).toIterator
}

var partitions = nums.mapPartitions(mysum)
Persistence (caching) - Example
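To make the example concrete: nums has 50 partitions of 2,000 consecutive integers each, and mapPartitions replaces every partition with a single partial sum, so:
partitions.count() // 50: one partial sum per partition
partitions.first() // 2001000 = 1 + 2 + ... + 2000
partitions.map(_.toLong).reduce(_ + _) // 5000050000L, the sum of 1 to 100000 (as Long, to avoid Int overflow)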
12. Advanced Spark Programming
var nums = sc.parallelize(1 to 100000, 50)
def mysum(itr: Iterator[Int]): Iterator[Int] = {
  Array(itr.sum).toIterator
}

var partitions = nums.mapPartitions(mysum)
def incrByOne(x: Int): Int = {
  x + 1
}

var partitions1 = partitions.map(incrByOne)
partitions1.collect()
// say, partitions1 is going to be used very frequently
Persistence (caching) - Example
13. Advanced Spark Programming
Persistence (caching) - Example
persist()
partitions1.persist()
res21: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[4] at map at <console>:29
partitions1.getStorageLevel
res27: org.apache.spark.storage.StorageLevel = StorageLevel(false, true, false, true, 1)
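The five fields printed by getStorageLevel are (useDisk, useMemory, useOffHeap, deserialized, replication), so StorageLevel(false, true, false, true, 1) is exactly MEMORY_ONLY, the default level used by a bare persist() or cache():
import org.apache.spark.storage.StorageLevel
partitions1.getStorageLevel == StorageLevel.MEMORY_ONLY // true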
14. Advanced Spark Programming
// To unpersist an RDD; before re-persisting we need to unpersist
partitions1.unpersist()
Persistence (caching) - Example
persist()
15. Advanced Spark Programming
// To unpersist an RDD; before re-persisting we need to unpersist
partitions1.unpersist()
import org.apache.spark.storage.StorageLevel
// This persists our RDD in memory and on disk
partitions1.persist(StorageLevel.MEMORY_AND_DISK)
Persistence (caching) - Example
persist()
16. Advanced Spark Programming
// To unpersist an RDD; before re-persisting we need to unpersist
partitions1.unpersist()
import org.apache.spark.storage.StorageLevel
// This persists our RDD in memory and on disk
partitions1.persist(StorageLevel.MEMORY_AND_DISK)
partitions1.getStorageLevel
res2: org.apache.spark.storage.StorageLevel = StorageLevel(true, true, false, true, 1)
StorageLevel.MEMORY_AND_DISK
res7: org.apache.spark.storage.StorageLevel = StorageLevel(true, true, false, true, 1)
Persistence (caching) - Example
persist()
26. Advanced Spark Programming
Store RDD as deserialized Java objects in the JVM. If the RDD does not fit
in memory, some partitions will not be cached and will be recomputed on
the fly each time they're needed. This is the default level.
Persistence (caching) - StorageLevel
useDisk: no | useMemory: yes | useOffHeap: no | deserialized: yes | replication: 1
partitions1.persist(StorageLevel.MEMORY_ONLY)
27. Advanced Spark Programming
Store RDD as deserialized Java objects in the JVM. If the RDD does not fit
in memory, store the partitions on disk that don't fit in memory, and read
them from there when they're needed.
Persistence (caching) - StorageLevel
useDisk: yes | useMemory: yes | useOffHeap: no | deserialized: yes | replication: 1
partitions1.persist(StorageLevel.MEMORY_AND_DISK)
28. Advanced Spark Programming
Store RDD as serialized Java objects (one byte array per partition). This is
generally more space-efficient than deserialized objects, especially when
using a fast serializer, but more CPU-intensive to read.
Persistence (caching) - StorageLevel
useDisk: no | useMemory: yes | useOffHeap: no | deserialized: no | replication: 1
partitions1.persist(StorageLevel.MEMORY_ONLY_SER)
29. Advanced Spark Programming
Similar to MEMORY_ONLY_SER, but spill partitions that don't fit in
memory to disk instead of recomputing them on the fly each time they're
needed.
Persistence (caching) - StorageLevel
useDisk: yes | useMemory: yes | useOffHeap: no | deserialized: no | replication: 1
partitions1.persist(StorageLevel.MEMORY_AND_DISK_SER)
30. Advanced Spark Programming
Store the RDD partitions only on disk.
Persistence (caching) - StorageLevel
useDisk: yes | useMemory: no | useOffHeap: no | deserialized: no | replication: 1
partitions1.persist(StorageLevel.DISK_ONLY)
31. Advanced Spark Programming
Same as the levels above, but replicate each partition on two cluster nodes.
Persistence (caching) - StorageLevel
useDisk / useMemory / useOffHeap / deserialized: same as the corresponding base level | replication: 2
MEMORY_ONLY_2, MEMORY_AND_DISK_2, etc.
32. Advanced Spark Programming
Store RDD in serialized format in Tachyon.
Persistence (caching) - StorageLevel
useDisk: no | useMemory: no | useOffHeap: yes | deserialized: no | replication: 1
OFF_HEAP (experimental)
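If none of the named levels fits, the same five flags can be combined directly through the StorageLevel factory; a sketch, reusing partitions1 from the earlier example:
import org.apache.spark.storage.StorageLevel
// memory-only, serialized, replicated on 2 nodes (equivalent to MEMORY_ONLY_SER_2)
val customLevel = StorageLevel(useDisk = false, useMemory = true, useOffHeap = false, deserialized = false, replication = 2)
partitions1.persist(customLevel)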
35. Advanced Spark Programming
● If the RDD fits comfortably in memory, use MEMORY_ONLY
● If not, try MEMORY_ONLY_SER
Which Storage Level to Choose?
36. Advanced Spark Programming
Which Storage Level to Choose?
● If the RDD fits comfortably in memory, use MEMORY_ONLY
● If not, try MEMORY_ONLY_SER
● Don't persist to disk unless the computation is really expensive
37. Advanced Spark Programming
Which Storage Level to Choose?
● If the RDD fits comfortably in memory, use MEMORY_ONLY
● If not, try MEMORY_ONLY_SER
● Don't persist to disk unless the computation is really expensive
● For fast fault recovery, use a replicated storage level
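The same rules of thumb as code; a sketch assuming an RDD named rdd (pick one level, and remember to unpersist() before switching levels):
import org.apache.spark.storage.StorageLevel
rdd.persist(StorageLevel.MEMORY_ONLY)         // fits comfortably in memory
rdd.persist(StorageLevel.MEMORY_ONLY_SER)     // tight on memory: trade CPU for space
rdd.persist(StorageLevel.MEMORY_AND_DISK_SER) // recomputation is really expensive
rdd.persist(StorageLevel.MEMORY_ONLY_2)       // fast fault recovery via replication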
39. Advanced Spark Programming
Data Partitioning
● Lay out data to minimize network transfer
● Not helpful if you need to scan the dataset only once
40. Advanced Spark Programming
● Lay out data to minimize network transfer
● Not helpful if you need to scan the dataset only once
● Helpful when the dataset is reused multiple times in key-oriented operations
Data Partitioning
41. Advanced Spark Programming
Data Partitioning
● Lay out data to minimize network transfer
● Not helpful if you need to scan the dataset only once
● Helpful when the dataset is reused multiple times in key-oriented operations
● Available for key-value RDDs
42. Advanced Spark Programming
Data Partitioning
● Lay out data to minimize network transfer
● Not helpful if you need to scan the dataset only once
● Helpful when the dataset is reused multiple times in key-oriented operations
● Available for key-value RDDs
● Causes the system to group elements based on key
43. Advanced Spark Programming
Data Partitioning
● Lay out data to minimize network transfer
● Not helpful if you need to scan the dataset only once
● Helpful when the dataset is reused multiple times in key-oriented operations
● Available for key-value RDDs
● Causes the system to group elements based on key
● Spark does not give explicit control over which worker node has which key
44. Advanced Spark Programming
Data Partitioning
● Lay out data to minimize network transfer
● Not helpful if you need to scan the dataset only once
● Helpful when the dataset is reused multiple times in key-oriented operations
● Available for key-value RDDs
● Causes the system to group elements based on key
● Spark does not give explicit control over which worker node has which key
● It lets the program control/ensure which sets of keys will appear together, as sketched below
○ based on some hash value, for example
○ or by range-sorting
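A minimal sketch of hash partitioning on a small, hypothetical pair RDD:
import org.apache.spark.HashPartitioner
val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
val partitioned = pairs.partitionBy(new HashPartitioner(4)).persist()
// all elements with key "a" now sit in the same partition, on the same node
partitioned.partitioner // Some(org.apache.spark.HashPartitioner@...)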
45. Advanced Spark Programming
Data Partitioning - Example - Subscriptions
Objective:
Count how many users visited a link that was not one of their subscribed topics.
[Diagram: two datasets keyed by UserID: UserData (subscriptions), holding UserSubscriptionTopicsInfo per UserID, and Events, holding LinkInfo per UserID]
46. Advanced Spark Programming
Data Partitioning - Example
The code will run fine as is, but it will be inefficient: join() does not know how the data is partitioned, so by default both RDDs are hashed and shuffled across the network on every call, causing a lot of network transfer.
Periodically, we combine UserData with the last five minutes of events (a much smaller dataset).
47. Advanced Spark Programming
Data Partitioning - Example
val sc = new SparkContext(...)
val userData = sc
  .sequenceFile[UserID, UserInfo]("hdfs://...")
  //.partitionBy(new HashPartitioner(100))
  .persist()

// Function called periodically to process a logfile of events in the past 5 minutes;
// we assume that this is a SequenceFile containing (UserID, LinkInfo) pairs.
def processNewLogs(logFileName: String) {
  val events = sc.sequenceFile[UserID, LinkInfo](logFileName)
  val joined = userData.join(events) // RDD of (UserID, (UserInfo, LinkInfo)) pairs
  val offTopicVisits = joined.filter({
    case (userId, (userInfo, linkInfo)) => !userInfo.topics.contains(linkInfo.topic)
  }).count()
  println("Number of visits to non-subscribed topics: " + offTopicVisits)
}
48. Advanced Spark Programming
Data Partitioning - Example
val sc = new SparkContext(...)
val userData = sc
  .sequenceFile[UserID, UserInfo]("hdfs://...")
  .persist()
49. Advanced Spark Programming
Data Partitioning - Example
val sc = new SparkContext(...)
val userData = sc
  .sequenceFile[UserID, UserInfo]("hdfs://...")
  .persist()

// Function called periodically to process a logfile of events in the past 5 minutes;
// we assume that this is a SequenceFile containing (UserID, LinkInfo) pairs.
def processNewLogs(logFileName: String) {
50. Advanced Spark Programming
Data Partitioning - Example
val sc = new SparkContext(...)
val userData = sc
  .sequenceFile[UserID, UserInfo]("hdfs://...")
  .persist()

// Function called periodically to process a logfile of events in the past 5 minutes;
// we assume that this is a SequenceFile containing (UserID, LinkInfo) pairs.
def processNewLogs(logFileName: String) {
  val events = sc.sequenceFile[UserID, LinkInfo](logFileName)
51. Advanced Spark Programming
Data Partitioning - Example
val sc = new SparkContext(...)
val userData = sc
  .sequenceFile[UserID, UserInfo]("hdfs://...")
  .persist()

// Function called periodically to process a logfile of events in the past 5 minutes;
// we assume that this is a SequenceFile containing (UserID, LinkInfo) pairs.
def processNewLogs(logFileName: String) {
  val events = sc.sequenceFile[UserID, LinkInfo](logFileName)
  val joined = userData.join(events) // RDD of (UserID, (UserInfo, LinkInfo)) pairs
52. Advanced Spark Programming
Data Partitioning - Example
val sc = new SparkContext(...)
val userData = sc
  .sequenceFile[UserID, UserInfo]("hdfs://...")
  .persist()

// Function called periodically to process a logfile of events in the past 5 minutes;
// we assume that this is a SequenceFile containing (UserID, LinkInfo) pairs.
def processNewLogs(logFileName: String) {
  val events = sc.sequenceFile[UserID, LinkInfo](logFileName)
  val joined = userData.join(events) // RDD of (UserID, (UserInfo, LinkInfo)) pairs
  val offTopicVisits = joined.filter {
    case (userId, (userInfo, linkInfo)) => !userInfo.topics.contains(linkInfo.topic)
  }.count()
53. Advanced Spark Programming
Data Partitioning - Example
val sc = new SparkContext(...)
val userData = sc
  .sequenceFile[UserID, UserInfo]("hdfs://...")
  //.partitionBy(new HashPartitioner(100))
  .persist()

// Function called periodically to process a logfile of events in the past 5 minutes;
// we assume that this is a SequenceFile containing (UserID, LinkInfo) pairs.
def processNewLogs(logFileName: String) {
  val events = sc.sequenceFile[UserID, LinkInfo](logFileName)
  val joined = userData.join(events) // RDD of (UserID, (UserInfo, LinkInfo)) pairs
  val offTopicVisits = joined.filter {
    case (userId, (userInfo, linkInfo)) => !userInfo.topics.contains(linkInfo.topic)
  }.count()
  println("Number of visits to non-subscribed topics: " + offTopicVisits)
}
55. Advanced Spark Programming
Data Partitioning - Example
import org.apache.spark.HashPartitioner
val sc = new SparkContext(...)
val basedata = sc.sequenceFile[UserID, UserInfo]("hdfs://...")
var userData = basedata.partitionBy(new HashPartitioner(8)).persist()
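Two details in this sketch are worth noting: partitionBy returns a new RDD (basedata itself is unchanged), and the persist() after it is essential; without it, the shuffle implied by partitionBy would be redone every time userData is used:
userData.partitioner // Some(HashPartitioner) with 8 partitions
// From now on, userData.join(events) shuffles only the small events RDD;
// userData's partitions stay put on their nodes.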
56. Advanced Spark Programming
Data Partitioning - Partitioners
1. Reorganize the keys of a PairRDD
2. Can be applied using partitionBy
3. Examples
a. HashPartitioner
b. RangePartitioner
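Beyond the built-in partitioners, a custom Partitioner needs only two overrides; a hypothetical sketch that groups URLs by domain so all pages of one site land in the same partition:
import org.apache.spark.Partitioner

class DomainPartitioner(parts: Int) extends Partitioner {
  override def numPartitions: Int = parts
  override def getPartition(key: Any): Int = {
    // hash the host part of the URL, mapped to a non-negative partition index
    val h = new java.net.URL(key.toString).getHost.hashCode % numPartitions
    if (h < 0) h + numPartitions else h
  }
}

// usage (urlCounts being a hypothetical RDD[(String, Int)]):
// urlCounts.partitionBy(new DomainPartitioner(16))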
57. Advanced Spark Programming
● Implements hash-based partitioning using Java's Object.hashCode.
● Does not work if the key is an array (Java arrays hash by identity, not by contents)
Data Partitioning - HashPartitioner
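A quick way to see where HashPartitioner sends a key; the partition index is the non-negative remainder of key.hashCode divided by numPartitions:
import org.apache.spark.HashPartitioner
val hp = new HashPartitioner(4)
hp.getPartition("spark") // a fixed index in 0 to 3; the same key always maps to the same partition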
58. Advanced Spark Programming
● Partitions sortable records by range into roughly equal ranges
● The ranges are determined by sampling
Data Partitioning - RangePartitioner
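A small sketch of RangePartitioner; its constructor samples the RDD it is given to choose split points, and keys are then routed by range:
import org.apache.spark.RangePartitioner
val pairs = sc.parallelize(Seq((8, "h"), (1, "a"), (5, "e"), (3, "c")))
val rp = new RangePartitioner(2, pairs)
val ranged = pairs.partitionBy(rp)
// partition 0 holds the lower key range, partition 1 the upper one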