R user group 2011 09

Talk given in September 2011 to the Bay Area R User Group. The talk walks a stochastic projection SVD algorithm through the steps from an initial implementation in R to a proposed map-reduce implementation that integrates cleanly with R via NFS export of the distributed file system. Not surprisingly, this algorithm is essentially the same as the one used by Mahout.

8/9/2013 © MapR Confidential 1
R, Hadoop and MapR

The bad old days (i.e. now)
• Hadoop is a silo
• HDFS isn’t a normal file system
• Hadoop doesn’t really like C++
• R is limited
• One machine, one memory space
• Isn’t there any way we can just get along?

The white knight
• MapR changes things
• Lots of new stuff like snapshots, NFS
• All you need to know, you already know
• NFS provides cluster wide file access
• Everything works the way you expect
• Performance high enough to use as a message bus
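
Because the cluster is visible through an NFS mount, ordinary R file functions should be enough to exchange data with it. A minimal sketch, assuming the cluster file system is mounted on the local machine: the `CLUSTER_MOUNT` environment variable and the file name are hypothetical, and a temporary directory stands in when the variable is unset.

```r
# Hypothetical mount point; falls back to a temp dir so the sketch runs anywhere
mount = Sys.getenv("CLUSTER_MOUNT", unset = tempdir())
path = file.path(mount, "features.csv")

# Write and read with plain R I/O, exactly as for a local file
x = data.frame(id = 1:3, value = c(0.5, 1.5, 2.5))
write.csv(x, path, row.names = FALSE)
y = read.csv(path)
```

Nothing in the sketch is specific to Hadoop, which is the point of the slide: a map-reduce job could write the file and R could read it, or the other way round.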

Example, out-of-core SVD
• SVD provides compressed matrix form
• Based on sum of rank-1 matrices
$$A = \sigma_1 u_1 v_1' + \sigma_2 u_2 v_2' + e$$
[Figure: A ≈ rank-1 matrix + rank-1 matrix + ?]
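
The rank-1 decomposition can be checked numerically with base R. This is a sketch on a made-up random matrix; the helper name `rank.sum` is an assumption introduced here, not something from the talk.

```r
set.seed(1)
a = matrix(rnorm(20 * 10), nrow = 20)
s = svd(a)

# Sum of the first r rank-1 terms sigma_i * u_i * v_i'
rank.sum = function(s, r) {
  total = matrix(0, nrow(s$u), nrow(s$v))
  for (i in seq_len(r)) {
    total = total + s$d[i] * s$u[, i] %*% t(s$v[, i])
  }
  total
}

# Residual e after keeping two terms; its Frobenius norm equals
# the square root of the sum of the remaining squared singular values
e = a - rank.sum(s, 2)
c(norm(e, "F"), sqrt(sum(s$d[-(1:2)]^2)))
```

Keeping all ten terms reproduces `a` up to rounding, which is why dropping the small trailing terms gives a compressed form of the matrix.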

More on SVD
• SVD provides a very nice basis
$$Ax = A \sum_i a_i v_i = \left[ \sum_j \sigma_j u_j v_j' \right] \left[ \sum_i a_i v_i \right] = \sum_i a_i \sigma_i u_i$$

• And a nifty approximation property
$$Ax = \sigma_1 a_1 u_1 + \sigma_2 a_2 u_2 + \sum_{i>2} \sigma_i a_i u_i$$
$$\|e\|^2 \le \sum_{i>2} \sigma_i^2$$
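
Both properties can be verified on a small example. The sketch below uses a made-up random matrix and writes the coefficients of x as `alpha` to avoid clashing with the matrix `a`. It assumes x has unit length, which keeps every coefficient at most 1 in magnitude; the slide does not state this condition, but the bound needs something like it.

```r
set.seed(2)
a = matrix(rnorm(30 * 8), nrow = 30)
s = svd(a)

# A unit-length x and its coordinates alpha_i in the basis v_i
x = rnorm(8)
x = x / sqrt(sum(x^2))
alpha = as.vector(t(s$v) %*% x)

# A x rebuilt as the sum of alpha_i * sigma_i * u_i
ax = as.vector(s$u %*% (alpha * s$d))
max(abs(ax - as.vector(a %*% x)))   # close to zero

# Tail beyond the first two terms and the bound on its squared norm
e = as.vector(s$u[, -(1:2)] %*% (alpha[-(1:2)] * s$d[-(1:2)]))
c(sum(e^2), sum(s$d[-(1:2)]^2))
```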

Also known as …
• Latent Semantic Indexing
• PCA
• Eigenvectors

An application, approximate translation
• Translation distributes over concatenation
• But counting turns concatenation into addition
• This means that translation is linear!
$$T(s_1 \mid s_2) = T(s_1) \mid T(s_2)$$
$$k(s_1 \mid s_2) = k(s_1) + k(s_2)$$
$$k(T(s_1 \mid s_2)) = k(T(s_1)) + k(T(s_2))$$
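
The counting step can be illustrated with word counts over a fixed vocabulary. This is a sketch only: the vocabulary, the sentences and the choice of word counts for k are assumptions made up for the example.

```r
# Fixed vocabulary so that every count vector has the same length
vocab = c("the", "cat", "dog", "sat", "ran")

# k(s): word counts of sentence s over the vocabulary
k = function(s) {
  words = strsplit(s, " ", fixed = TRUE)[[1]]
  as.vector(table(factor(words, levels = vocab)))
}

s1 = "the cat sat"
s2 = "the dog ran"

# Concatenation of sentences becomes addition of count vectors
k(paste(s1, s2))
k(s1) + k(s2)
```

Both expressions give the same vector, so any map that respects concatenation acts additively on counts.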

Traditional computation
• Products of A are dominated by large singular values and corresponding vectors
• Subtracting these dominant singular values allows the next ones to appear
• Lanczos method, generally Krylov sub-space
$$A (A'A)^n = U \Sigma^{2n+1} V'$$
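
A small numerical sketch of why repeated products expose the largest singular value first. The matrix is built from made-up singular values so that the expected result is known in advance. The loop is plain power iteration, a simpler relative of the Lanczos method named above, not Lanczos itself.

```r
set.seed(3)
# Made-up matrix with known singular values sigma
sigma = c(4, 3, 2, 1.5, 1, 0.5)
u = qr.Q(qr(matrix(rnorm(15 * 6), nrow = 15)))
v = qr.Q(qr(matrix(rnorm(6 * 6), nrow = 6)))
a = u %*% diag(sigma) %*% t(v)

# A (A'A)^n keeps the singular vectors and raises sigma to the power 2n+1
n = 2
p = a
for (i in seq_len(n)) p = p %*% crossprod(a)
rbind(svd(p)$d, sigma^(2 * n + 1))

# Power iteration: repeated multiplication by A'A turns x toward v[, 1]
x = rnorm(6)
for (i in 1:100) {
  x = as.vector(crossprod(a, a %*% x))
  x = x / sqrt(sum(x^2))
}
abs(sum(x * v[, 1]))
```

Each pass over the loop needs a full multiplication by A, which is the per-iteration cost that becomes a problem once A lives in Hadoop.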

The gotcha
• Iteration in Hadoop is death
• Huge process invocation costs
• Lose all memory residency of data
• Total lost cause

Randomness to the rescue
• To save the day, run all iterations at the same time
$$Y = A \Omega$$
$$QR = Y$$
$$B = Q'A$$
$$U \Sigma V' = B$$
$$(QU) \Sigma V' \approx A$$

In R
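lsa = function(a, k, p) {
  # a: n x m matrix, k: rank to keep, p: extra random columns
  m = dim(a)[2]
  # Y = A Omega: random projection onto k+p columns
  y = a %*% matrix(rnorm(m * (k + p)), nrow = m)
  # Q R = Y: orthonormal basis for the range of Y
  y.qr = qr(y)
  q = qr.Q(y.qr)
  # B = Q'A is small: (k+p) x m
  b = t(q) %*% a
  # QR of B' so that the SVD below is only (k+p) x (k+p)
  b.qr = qr(t(b))
  s = svd(t(qr.R(b.qr)))
  list(u = q %*% s$u[, 1:k, drop = FALSE],
       d = s$d[1:k],
       v = qr.Q(b.qr) %*% s$v[, 1:k, drop = FALSE])
}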
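
A quick check of `lsa` against the exact SVD. The block repeats a compact copy of the function so that it runs on its own. The test matrix, its sizes and the seed are made up for illustration.

```r
# Compact copy of lsa from the slide, repeated so this block runs standalone
lsa = function(a, k, p) {
  m = dim(a)[2]
  y.qr = qr(a %*% matrix(rnorm(m * (k + p)), nrow = m))
  b = t(qr.Q(y.qr)) %*% a
  b.qr = qr(t(b))
  s = svd(t(qr.R(b.qr)))
  list(u = qr.Q(y.qr) %*% s$u[, 1:k, drop = FALSE],
       d = s$d[1:k],
       v = qr.Q(b.qr) %*% s$v[, 1:k, drop = FALSE])
}

set.seed(4)
# Made-up test matrix of exact rank 10
a = matrix(rnorm(200 * 10), nrow = 200) %*% matrix(rnorm(10 * 50), nrow = 10)
r = lsa(a, k = 5, p = 5)
s = svd(a)

# Leading singular values next to those of the full SVD
rbind(r$d, s$d[1:5])

# Columns of u should be orthonormal
max(abs(crossprod(r$u) - diag(5)))
```

Here k + p equals the rank of the test matrix, so the random projection captures the whole range of `a` and the leading singular values should match the exact ones up to rounding. On noisy full-rank data the match is only approximate.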

Not good enough yet
• Limited to memory size
• After memory limits, feature extraction dominates

Hybrid architecture
[Diagram. Labels: Input; Side-data; Data join; Feature extraction and down sampling; Map-reduce; Via NFS; Sequential SVD]

Hybrid architecture
[Diagram. Labels: Input; Side-data; Data join; Feature extraction and down sampling; Map-reduce; Via NFS; R; Visualization; Sequential SVD]

Randomness to the rescue
• To save the day again, use blocks
$$Y_i = A_i \Omega$$
$$R'R = Y'Y = \sum_i Y_i' Y_i$$
$$B_j = \sum_i (A_i \Omega R^{-1})' A_{ij}$$
$$LL' = BB'$$
$$U \Sigma V' = L$$
$$(A \Omega R^{-1} U) \, \Sigma \, (V' L^{-1} B) \approx A$$
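
The block-wise steps can be sketched in plain R, with a list of row blocks standing in for map-reduce input splits. This is an in-memory illustration under stated assumptions, not the proposed map-reduce implementation: the function name `block.svd` is made up, only row blocks are used (the column blocks A_ij are not split), and the test matrix is random.

```r
block.svd = function(blocks, k, p) {
  m = ncol(blocks[[1]])
  omega = matrix(rnorm(m * (k + p)), nrow = m)

  # Pass 1: Y_i = A_i Omega, accumulate Y'Y = sum_i Y_i'Y_i, then R'R = Y'Y
  yty = Reduce(`+`, lapply(blocks, function(ai) crossprod(ai %*% omega)))
  r.inv = solve(chol(yty))

  # Pass 2: B = sum_i (A_i Omega R^-1)' A_i
  b = Reduce(`+`, lapply(blocks, function(ai)
    crossprod(ai %*% omega %*% r.inv, ai)))

  # Small in-memory step: L L' = B B', then U S V' = L
  l = t(chol(b %*% t(b)))
  s = svd(l)

  # Left vectors are computed block by block: (A_i Omega R^-1) U
  u = lapply(blocks, function(ai)
    ai %*% omega %*% r.inv %*% s$u[, 1:k, drop = FALSE])
  v = t(solve(l, b)) %*% s$v[, 1:k, drop = FALSE]
  list(u = do.call(rbind, u), d = s$d[1:k], v = v)
}

set.seed(5)
# Made-up matrix of exact rank 10, split into three row blocks
a = matrix(rnorm(300 * 10), nrow = 300) %*% matrix(rnorm(10 * 40), nrow = 10)
blocks = list(a[1:100, ], a[101:200, ], a[201:300, ])
r = block.svd(blocks, k = 5, p = 5)
s = svd(a)
rbind(r$d, s$d[1:5])
```

Each pass touches every block exactly once and only adds up small (k+p)-sized matrices, so no pass depends on the result of another pass over the same data beyond R and U. In this sketch `chol` needs Y'Y and BB' to be positive definite, so k + p should not exceed the numerical rank of the input.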

Hybrid architecture
[Diagram. Labels: Map-reduce; Feature extraction and down sampling; Via NFS; R; Visualization; Map-reduce; Block-wise parallel SVD]

Conclusions
• Inter-operability allows massive scalability
• Prototyping in R not wasted
• Map-reduce iteration not needed for SVD
• Feasible scale ~10^9 non-zeros or more

Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.

How world-class product teams are winning in the AI era by CEO and Founder, P...

Product School

GraphRAG is All You need? LLM & Knowledge Graph

Guy Korland

Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs. 1. Unifying Large Language Models and Knowledge Graphs: A Roadmap. https://arxiv.org/abs/2306.08302 2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs: https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/

Recently uploaded (20)

Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024

From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...

Connector Corner: Automate dynamic content and events by pushing a button

State of ICS and IoT Cyber Threat Landscape Report 2024 preview

Securing your Kubernetes cluster_ a step-by-step guide to success !

Generating a custom Ruby SDK for your web service or Rails API using Smithy

Neuro-symbolic is not enough, we need neuro-*semantic*

Epistemic Interaction - tuning interfaces to provide information for AI support

Bits & Pixels using AI for Good.........

When stars align: studies in data quality, knowledge graphs, and machine lear...

Assuring Contact Center Experiences for Your Customers With ThousandEyes

Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf

Mission to Decommission: Importance of Decommissioning Products to Increase E...

The Art of the Pitch: WordPress Relationships and Sales

Monitoring Java Application Security with JDK Tools and JFR Events

Accelerate your Kubernetes clusters with Varnish Caching

Leading Change strategies and insights for effective change management pdf 1.pdf

GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...

How world-class product teams are winning in the AI era by CEO and Founder, P...

GraphRAG is All You need? LLM & Knowledge Graph

R user group 2011 09

2. The bad old days (i.e. now)
• Hadoop is a silo
• HDFS isn’t a normal file system
• Hadoop doesn’t really like C++
• R is limited
• One machine, one memory space
• Isn’t there any way we can just get along?

3. The white knight
• MapR changes things
• Lots of new stuff like snapshots, NFS
• All you need to know, you already know
• NFS provides cluster wide file access
• Everything works the way you expect
• Performance high enough to use as a message bus

4. Example, out-of-core SVD
• SVD provides compressed matrix form
• Based on sum of rank-1 matrices
$$A = \sigma_1 u_1 v_1' + \sigma_2 u_2 v_2' + \epsilon$$
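
The rank-1 sum can be checked directly with base R's `svd()`. This is only an illustrative sketch: the matrix `a`, its size, and the helper `rank1` are assumptions made up for the example, not part of the talk.

```r
set.seed(1)
a <- matrix(rnorm(20), nrow = 5)   # arbitrary 5 x 4 example matrix
s <- svd(a)

# i-th rank-1 term: sigma_i * u_i * v_i'
rank1 <- function(i) s$d[i] * (s$u[, i] %o% s$v[, i])

# Summing every term rebuilds a
full <- Reduce(`+`, lapply(seq_along(s$d), rank1))

# Keeping only the first two terms leaves the residual epsilon
approx2 <- rank1(1) + rank1(2)
eps <- a - approx2
```

Storing `approx2` only requires two singular values and two pairs of vectors, which is the compression the slide refers to.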

5. More on SVD
• SVD provides a very nice basis
$$Ax = A \sum_i a_i v_i = \Big[ \sum_j \sigma_j u_j v_j' \Big] \Big[ \sum_i a_i v_i \Big] = \sum_i a_i \sigma_i u_i$$

6.
• And a nifty approximation property
$$Ax = \sigma_1 a_1 u_1 + \sigma_2 a_2 u_2 + \sum_{i>2} \sigma_i a_i u_i$$
$$\|\epsilon\|^2 \le \sum_{i>2} \sigma_i^2$$
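
A small numerical check of the basis expansion and the error bound, as a sketch under assumed inputs: the matrix `a` and the unit-length vector `x` are invented for illustration, and `coef` holds the coefficients a_i = v_i' x.

```r
set.seed(2)
a <- matrix(rnorm(60), nrow = 10)       # arbitrary 10 x 6 example matrix
s <- svd(a)

x <- rnorm(6)
x <- x / sqrt(sum(x^2))                 # unit-length test vector

coef <- drop(t(s$v) %*% x)              # a_i = v_i' x
ax <- s$u %*% (s$d * coef)              # sum_i a_i sigma_i u_i

# Terms with i > 2, i.e. what is lost by keeping two components
resid <- s$u[, -(1:2)] %*% (s$d[-(1:2)] * coef[-(1:2)])
bound <- sum(s$d[-(1:2)]^2)             # sum_{i>2} sigma_i^2
```

For a unit-length `x` every coefficient has magnitude at most 1, so `sum(resid^2)` cannot exceed `bound`.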

8. An application, approximate translation
• Translation distributes over concatenation
• But counting turns concatenation into addition
• This means that translation is linear!
$$T(s_1 \mid s_2) = T(s_1) \mid T(s_2)$$
$$k(s_1 \mid s_2) = k(s_1) + k(s_2)$$
$$k(T(s_1 \mid s_2)) = k(T(s_1)) + k(T(s_2))$$
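
The counting identity can be illustrated with a toy word counter. This sketch covers only k(s1 | s2) = k(s1) + k(s2); it does not model translation itself. The vocabulary, the sentences, and the implementation of `k` are assumptions for illustration.

```r
# Hypothetical counting function: occurrences of each vocabulary word in s
k <- function(s, vocab) {
  words <- strsplit(s, " ", fixed = TRUE)[[1]]
  tabulate(match(words, vocab), nbins = length(vocab))
}

vocab <- c("the", "cat", "sat", "on", "mat")
s1 <- "the cat sat"
s2 <- "on the mat"

counts.joined <- k(paste(s1, s2), vocab)     # count the concatenation
counts.summed <- k(s1, vocab) + k(s2, vocab) # add the separate counts
```

Both vectors hold the same counts, which is what lets a count-based representation treat concatenation as vector addition.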

10. Traditional computation
• Products of A are dominated by large singular values and corresponding vectors
• Subtracting these dominant singular values allows the next ones to appear
• Lanczos method, generally Krylov sub-space
$$A (A'A)^n = U \Sigma^{2n+1} V'$$
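
The power identity is easy to verify numerically. The following is a sketch with a made-up matrix `a` and exponent `n`; it also compares the share of the leading singular value before and after powering, to show why repeated products are dominated by the largest singular values.

```r
set.seed(3)
a <- matrix(rnorm(30), nrow = 6)   # arbitrary 6 x 5 example matrix
s <- svd(a)
n <- 3

ata <- crossprod(a)                # A'A
powered <- a
for (i in seq_len(n)) powered <- powered %*% ata   # A (A'A)^n

# Same matrix built from the SVD factors: U Sigma^(2n+1) V'
rebuilt <- s$u %*% diag(s$d^(2 * n + 1)) %*% t(s$v)

# Share of the leading singular value before and after powering
w.before <- s$d[1] / sum(s$d)
w.after  <- s$d[1]^(2 * n + 1) / sum(s$d^(2 * n + 1))
```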

12. The gotcha
• Iteration in Hadoop is death
• Huge process invocation costs
• Lose all memory residency of data
• Total lost cause

13. Randomness to the rescue
• To save the day, run all iterations at the same time
$$Y = A\Omega$$
$$QR = Y$$
$$B = Q'A$$
$$U \Sigma V' = B$$
$$(QU)\,\Sigma V' \approx A$$

14. In R
lsa = function(a, k, p) {
  n = dim(a)[1]
  m = dim(a)[2]
  # project onto k+p random directions: Y = A %*% Omega
  y = a %*% matrix(rnorm(m*(k+p)), nrow=m)
  # orthonormal basis Q for the range of Y
  y.qr = qr(y)
  # B = Q'A
  b = t(qr.Q(y.qr)) %*% a
  # QR of B' reduces the SVD to a small (k+p) x (k+p) problem
  b.qr = qr(t(b))
  svd = svd(t(qr.R(b.qr)))
  list(u=qr.Q(y.qr) %*% svd$u[,1:k],
       d=svd$d[1:k],
       v=qr.Q(b.qr) %*% svd$v[,1:k])
}
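
A usage sketch for the `lsa` function from the slide, repeated here (with `<-` assignment and the unused `n` dropped) so the example runs on its own. The test matrix is an assumption: it is built with exact rank 8 so that the k + p = 8 random directions capture its range and the result can be compared against base `svd()`.

```r
lsa <- function(a, k, p) {
  m <- dim(a)[2]
  y <- a %*% matrix(rnorm(m * (k + p)), nrow = m)
  y.qr <- qr(y)
  b <- t(qr.Q(y.qr)) %*% a
  b.qr <- qr(t(b))
  s <- svd(t(qr.R(b.qr)))
  list(u = qr.Q(y.qr) %*% s$u[, 1:k],
       d = s$d[1:k],
       v = qr.Q(b.qr) %*% s$v[, 1:k])
}

set.seed(4)
# 200 x 50 matrix of rank 8 (sizes chosen only for illustration)
a <- matrix(rnorm(200 * 8), nrow = 200) %*% matrix(rnorm(8 * 50), nrow = 8)

r <- lsa(a, k = 3, p = 5)

# Reference values from the full decomposition
exact <- svd(a)
best3 <- exact$u[, 1:3] %*% diag(exact$d[1:3]) %*% t(exact$v[, 1:3])
rebuilt <- r$u %*% diag(r$d) %*% t(r$v)
```

On real data the spectrum does not cut off exactly, so the leading singular values are approximated rather than reproduced; the oversampling parameter `p` controls how much slack the projection has.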

15. Not good enough yet
• Limited to memory size
• After memory limits, feature extraction dominates

16. Hybrid architecture
[Diagram labels: Input; Feature extraction and down sampling; Side-data; Data join; Map-reduce; Via NFS; Sequential SVD]

17. Hybrid architecture
[Diagram labels: Input; Feature extraction and down sampling; Side-data; Data join; Map-reduce; Via NFS; R; Visualization; Sequential SVD]

18. Randomness to the rescue
• To save the day again, use blocks
$$Y_i = A_i \Omega$$
$$R'R = Y'Y = \sum_i Y_i' Y_i$$
$$B_j = \sum_i \left( A_i \Omega R^{-1} \right)' A_{ij}$$
$$LL' = BB'$$
$$U \Sigma V' = L$$
$$\left( A \Omega R^{-1} U \right) \Sigma \left( V' L^{-1} B \right) \approx A$$
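
A single-machine sketch of the block-wise scheme in R, under several assumptions: the data are split into row blocks only (the column blocks A_ij of the slide are not modelled), `lapply` stands in for the map-reduce passes, and the test matrix has exact rank k + p so the reconstruction can be checked. All variable names are illustrative.

```r
set.seed(5)
k <- 3; p <- 5
# 120 x 40 matrix of rank 8 (sizes chosen only for illustration)
a <- matrix(rnorm(120 * 8), nrow = 120) %*% matrix(rnorm(8 * 40), nrow = 8)
omega <- matrix(rnorm(ncol(a) * (k + p)), nrow = ncol(a))

# Row indices of the blocks A_i; each block could live on a different worker
blocks <- split(seq_len(nrow(a)), rep(1:4, each = 30))

# Pass 1: accumulate Y'Y = sum_i Y_i' Y_i without holding all of Y
yty <- Reduce(`+`, lapply(blocks, function(i) {
  crossprod(a[i, , drop = FALSE] %*% omega)
}))
r <- chol(yty)                       # upper triangular, R'R = Y'Y
r.inv <- solve(r)

# Pass 2: B = sum_i (A_i Omega R^-1)' A_i
b <- Reduce(`+`, lapply(blocks, function(i) {
  a.i <- a[i, , drop = FALSE]
  crossprod(a.i %*% omega %*% r.inv, a.i)
}))

l <- t(chol(tcrossprod(b)))          # lower triangular, LL' = BB'
s <- svd(l)                          # small (k+p) x (k+p) problem

u <- a %*% omega %*% r.inv %*% s$u   # (A Omega R^-1) U
v <- t(solve(l, b)) %*% s$v          # transpose of V' L^-1 B
rebuilt <- u %*% diag(s$d) %*% t(v)
```

Only the small matrices `yty` and `b` are summed across blocks, which is what makes a single accumulation pass per step sufficient. Using Cholesky on Y'Y squares the condition number of Y, so this sketch trades some numerical robustness for the ability to work block by block.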

19. Hybrid architecture
[Diagram labels: Map-reduce; Feature extraction and down sampling; Via NFS; R; Visualization; Map-reduce; Block-wise parallel SVD]

20. Conclusions
• Inter-operability allows massive scalability
• Prototyping in R not wasted
• Map-reduce iteration not needed for SVD
• Feasible scale ~10^9 non-zeros or more
