Shared by Mansoor Mirza
Distributed Computing
What is it?
Why & when we need it?
Comparison with centralized computing
‘MapReduce’ (MR) Framework
Theory and practice
‘MapReduce’ in Action
Using Hadoop
Lab exercises
Harry Potter and Enormous Data (Pavlo Baron)
Slides of the talk I gave at the JBoss One Day Talk 2012. It explains the typical use cases, theoretical aspects and practical implementations of something one calls "big data". The point is to understand how to deal with enormous data amounts.
8. Petabytes of data every hour, on different continents, with complex relations, and the need to analyze them in almost real time for anomalies and to visualize them for your management.
9. You can easily call this big data. Everything below this you can call Mickey Mouse data.
10. Good news is: you can intentionally grow – collect more data. And you should!
16. Oh. When you expect big data, you need to scale very far, and thus build on distribution and combine a theoretically unlimited number of machines into one single distributed storage.
17. There is no way around it. Unless you invent the BlackHoleDB.
18. np, dude! NoSQL scales and is cool. They achieve this through sharding, ya know? Sharding hides this distribution stuff.
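The sharding that dude is waving away can be sketched in a few lines. This is a minimal, hypothetical illustration (the node names and the CRC32 hash are my own choices, not any particular NoSQL store's scheme): a key's hash modulo the node count picks the node that owns it.

```python
import zlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster

def shard_for(key: str, nodes=NODES) -> str:
    """Naive hash sharding: hash the key, take it modulo the node count."""
    return nodes[zlib.crc32(key.encode()) % len(nodes)]

# Every key deterministically lands on exactly one node:
owner = shard_for("user:42")
```

Note that the placement depends directly on `len(nodes)` – which is exactly the weakness the next slides get to.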
22. The only thing that is absolutely certain about distributed systems is that parts of them will fail and you will have no idea where and what the hell is going on.
23. So your P must be a given in a distributed system. And you want to play with C vs. A, not just take black or white.
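One concrete way to play with C vs. A is quorum tuning, in the usual Dynamo-style convention (the parameter names here are that convention, not from the slides): with N replicas, R read acknowledgements and W write acknowledgements, choosing R + W > N forces every read quorum to overlap the latest write quorum (stronger consistency, less availability under failures), while smaller R and W trade the other way. A toy check of the overlap rule:

```python
def read_sees_latest_write(n: int, r: int, w: int) -> bool:
    """True when every read quorum must intersect every write quorum:
    R + W > N guarantees a read touches at least one up-to-date replica."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("quorums must be between 1 and N")
    return r + w > n

# Typical settings for N = 3 replicas:
# consistency-leaning: R=2, W=2 -> True  (reads overlap writes)
# availability-leaning: R=1, W=1 -> False (a read may miss the latest write)
```

The point is that the knob is continuous in practice: you pick R and W per use case instead of declaring the whole system "C" or "A".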
25. Seriously? For example, one of the hardest challenges with big data is to distribute/shard parts over several machines while still having fast traversals and reads, thus keeping related data together. Valid for graphs and any other data store, also NoSQL, kind of.
26. Another hard challenge with sharding is to avoid naive hashing. Naive hashing would make you depend on the number of nodes and would not allow you to easily add or remove nodes to/from the system.
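The standard answer to that problem is consistent hashing: place each node at several virtual points on a hash ring and assign each key to the next node point clockwise, so adding or removing a node only remaps the keys adjacent to its points. A rough sketch, with made-up node names (MD5 is used here only as a stable hash, not a recommendation):

```python
import bisect
import hashlib

def _h(s: str) -> int:
    """Stable 64-bit ring position for a string."""
    return int(hashlib.md5(s.encode()).hexdigest()[:16], 16)

class HashRing:
    def __init__(self, nodes, vnodes=64):
        # Each node contributes `vnodes` points on the ring to even out load.
        self._ring = sorted(
            (_h(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    def node_for(self, key: str) -> str:
        """Walk clockwise from the key's position to the next node point."""
        i = bisect.bisect(self._keys, _h(key)) % len(self._ring)
        return self._ring[i][1]
```

Growing a three-node ring to four now moves only roughly a quarter of the keys, instead of nearly all of them as with the modulo scheme.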
27. And still, the trade-off between data locality, consistency, availability, read/write/search speed, latency etc. is hard.
33. Data locality, redundancy, consistent hashing and eventual consistency, combined with use-case-driven storage design, are key principles in succeeding with a huge distributed data storage. That's big data development.
36. When you have thousands or millions of parallel requests per second begging for data, the first mile will (also) quickly become the bottleneck. Requests will get queued and discarded as soon as your server doesn't bring data fast enough through the pipe.
38. I bet you will. But under high load, your hardware will more or less quickly start to crack.
39. You'll burn your hard disks, boards and cards. And wires. And you'll heat up to a maximum.
40. It's not about sexy hardware, but about being able to quickly replace it. Ideally while the system keeps running.
41. But anyway. Keeping the first mile scalable and fast would lead to some expensive network infrastructure. You need to get the maximum out of your servers in order to reduce their number.
42. np, dude! I will use an event-driven, C10K-problem-solving, awesome web server. Or I'll write one on my own.
43. Maybe. But when your users are coming from all over the world, it won't help you much, since the network latency from them to your server will kill them.
44. You would have to go for a CDN one day, statically pre-computing content. You would use their infrastructure and reduce the number of hits on your own servers to a minimum.
45. np, dude! I'll push my whole platform out to the cloud. It's even more flexible and scales like hell.
46. Well. You cannot reliably predict on which physical machine, and actually how close to the data, your program will run. Whenever virtual machines or storage fragments get moved, your world stops.
47. You can easily force data locality and shorter stop-the-world phases by paying higher bills.
48. Data locality, geographic spatiality, dedicated virtualization and content pre-computability, combined with use-case-driven cloudification, are key principles in succeeding with provisioning of huge data amounts. That's big data development.
52. The slowest of those two is definitely “split”. Moving data from one huge pile to another before map/reduce is damn expensive.
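For reference, the map/reduce model itself is simple – the cost sits in moving and grouping the data between the phases. A toy in-process word count illustrates the three steps (real frameworks run the map tasks where the data already lives, which is exactly why the big up-front "split" is what you want to avoid):

```python
from collections import defaultdict
from itertools import chain

def map_phase(line: str):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group values by key – the step between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the emitted counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data is big", "data flies by"]
result = reduce_phase(shuffle(chain.from_iterable(map(map_phase, lines))))
# result == {"big": 2, "data": 2, "is": 1, "flies": 1, "by": 1}
```

In a real cluster the shuffle step is the expensive one: it is the moment data crosses machine boundaries.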
53. np, dude! I'll write my data straight to the storage of my map/reduce tool. It will then tear
54. It can. But what if you need to search during the map phase – full-text, meta?
55. np, dude! I'll use a cool indexing search engine or library. It can find my data in a snap.
56. Would it? A very hard challenge is to partition the index and to couple its related parts to the corresponding data. With data locality, of course, keeping index pieces on the related machines.
57. Data and index locality and direct filling of data pots as the data flies by, combined with use-case-driven technology usage, are key principles in succeeding with processing of huge data amounts. That's big data development.
64. I'm sure you will. But: you cannot predict and fix the map/reduce time. You cannot ensure the completeness of data. You cannot guarantee causality knowledge.
66. If you need to predict better, to know about data/event causality, and to be fast, you need to CEP data streams as the data flies by. There is no (simple, fast) way around it.
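A minimal flavor of that idea: keep a sliding window over the stream and flag values as they fly by, instead of collecting everything first. This is an illustrative sketch only – the window size and the three-standard-deviations threshold are arbitrary choices of mine, and real CEP engines do far more (patterns, joins, time semantics):

```python
from collections import deque
from statistics import mean, stdev

def detect_anomalies(stream, window=20, k=3.0):
    """Yield (index, value) for values more than k standard deviations
    away from the mean of the recent window, as the stream arrives."""
    recent = deque(maxlen=window)
    for i, x in enumerate(stream):
        if len(recent) >= 2:  # stdev needs at least two samples
            m, s = mean(recent), stdev(recent)
            if s > 0 and abs(x - m) > k * s:
                yield i, x
        recent.append(x)
```

Because it is a generator over the stream, an anomaly is reported the moment its value passes by – no batch, no second pass over the pile.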
67. But the most important thing is: none of the BI tools you know will adequately support your NoSQL data store, so you're all alone in the world of proprietary, immature tool combinations. The world of pain.
69. There is no point in fearing math/statistics. You just need it.
70. Separation of immediate and post-fact analytics and CEP of data streams as the data flies by, combined with use-case-driven technology usage and statistical knowledge, are key principles in succeeding with analytics of huge data amounts. That's big data development.
73. Me neither. I just know that you can't visualize huge data amounts using classic spreadsheets. There are better ways, tools and ideas to do this – find them. That's big data development.
75. Almost. In one of my humble moments, I would suggest you do the following:
76. Stop thinking you gain adequately deep knowledge through reading half-baked blog posts. Get yourself some of these:
77. Statistics, Visualization
Distribution
Network
Different languages
Tools, chains
Know and use full stack
Data stores
Different platforms
OS
Storage
Machine
Algorithms
Math
78. Know your point of pain. You must be Twitter, Facebook or Google to have them all at the same time. If you're none of them, you can have one or two. Or even none. Go for them with the right tool chain.
79. The first and most important tool in the chain is your brain.
81. Most images originate from istockphoto.com, except a few taken from Wikipedia or Flickr (CC), product pages, or generated through public online generators.