The document discusses streaming data and concurrency in R. It notes that R is inherently single-threaded, but there are efforts to enable parallelization through distributed computation and interfaces to other technologies. While multithreading R itself is challenging due to its internal workings, concurrency could be useful for real-time streaming data analysis. The document presents an example application using a C++ extension to subscribe to real-time market data feeds in R, but notes issues with blocking the interpreter thread. An alternative approach using inter-process communication is suggested to decouple the subscription logic.
A survey of data visualization functions and packages in R. In particular, I discuss three approaches for data visualization in R: (i) the built-in base graphics functions, (ii) the ggplot2 package, and (iii) the lattice package. I also discuss some methods for visualizing large data sets.
A survey of data visualization functions and packages in R. In particular, I discuss three approaches for data visualization in R: (i) the built-in base graphics functions, (ii) the ggplot2 package, and (iii) the lattice package. I also discuss some methods for visualizing large data sets.
Unleashing the Potential: Navigating the Versatility and Simplicity of Python...Flexsin
Discover Python's versatility, simplicity, and efficiency in modern software development. From web dev to AI, explore its key features, applications, and why it's a developer favorite. Python isn't just a language; it's an ecosystem of innovation, simplicity, and limitless possibilities.
https://www.flexsin.com/open-source/python-development/
Fully featured, commercially supported machine learning suites that can build Decision Trees in Hadoop are few and far between. Addressing this gap, Revolution Analytics recently enhanced its entire scalable analytics suite to run in Hadoop. In this talk, I will explain how our Decision Tree implementation exploits recent research reducing the computational complexity of decision tree estimation, allowing linear scalability with data size and number of nodes. This streaming algorithm processes data in chunks, allowing scaling unconstrained by aggregate cluster memory. The implementation supports both classification and regression and is fully integrated with the R statistical language and the rest of our advanced analytics and machine learning algorithms, as well as our interactive Decision Tree visualizer.
Python for Data Engineering: Why Do Data Engineers Use Python?hemayadav41
Discover why data engineers prefer using Python for their data engineering tasks. Explore the versatility, ease of use, extensive libraries, integration with big data technologies, and data visualisation capabilities that make Python an invaluable tool in the field. Institutes like Uncodemy, Udemy, Simplilearn, Ducat, and 4achivers, provide the best Python Course with Job Placement in Jaipur, Kanpur, Gorakhpur, Mumbai, Pune, Delhi, Noida, and all over India."
(Costless) Software Abstractions for Parallel ArchitecturesJoel Falcou
Performing large, intensive or non-trivial computing on array like data structures is one of the most common task in scientific computing, video game development and other fields. This matter of fact is backed up by the large number of tools, languages and libraries to perform such tasks. If we restrict ourselves to C++ based solutions, more than a dozen such libraries exists from BLAS/LAPACK C++ binding to template meta-programming based Blitz++ or Eigen. If all of these libraries provide good performance or good abstraction, none of them seems to fit the need of so many different user types.
Moreover, as parallel system complexity grows, the need to maintain all those components quickly become unwieldy. This talk explores various software design techniques - like Generative Programming, MetaProgramming and Generic Programming - and their application to the implementation of a parallel computing librariy in such a way that:
- abstraction and expressiveness are maximized - cost over efficiency is minimized
We'll skim over various applications and see how they can benefit from such tools. We will conclude by discussing what lessons were learnt from this kind of implementation and how those lessons can translate into new directions for the language itself.
Introduction to Spark R with R studio - Mr. Pragith Sigmoid
R is a programming language and software environment for statistical computing and graphics.
The R language is widely used among statisticians and data miners for developing statistical
software and data analysis.
RStudio IDE is a powerful and productive user interface for R.
It’s free and open source, and available on Windows, Mac, and Linux.
Using Aspects for Language Portability (SCAM 2010)lennartkats
Software platforms such as the Java Virtual Machine or the CLR .NET virtual machine have their own ecosystem of a core programming language or instruction set, libraries, and developer community. Programming languages can target multiple software platforms to increase interoperability or to boost performance. Introducing a new compiler backend for a language is the first step towards targeting a new platform, translating the language to the platform\'s language or instruction set. Programs written in modern languages generally make extensive use of APIs, based on the runtime system of the software platform, introducing additional portability concerns. They may use APIs that are implemented by platform-specific libraries. Libraries may perform platform-specific operations, make direct native calls, or make assumptions about performance characteristics of operations or about the file system. This paper proposes to use aspect weaving to invasively adapt programs and libraries to address such portability concerns, and identifies four classes of aspects for this purpose. We evaluate this approach through a case study where we retarget the Stratego program transformation language towards the Java Virtual Machine.
In this presentation from Revolution Analytics, Bill Jacobs presents: Are You Ready for Big Data Analytics?
"Revolution Analytics delivers advanced analytics software at half the cost of existing solutions. By building on open source R—the world's most powerful statistics software—with innovations in big data analysis, integration and user experience, Revolution Analytics meets the demands and requirements of modern data-driven businesses."
Learn more: http://www.revolutionanalytics.com
Watch the presentation video: http://wp.me/p3RLEV-12S
Unleashing the Potential: Navigating the Versatility and Simplicity of Python...Flexsin
Discover Python's versatility, simplicity, and efficiency in modern software development. From web dev to AI, explore its key features, applications, and why it's a developer favorite. Python isn't just a language; it's an ecosystem of innovation, simplicity, and limitless possibilities.
https://www.flexsin.com/open-source/python-development/
Fully featured, commercially supported machine learning suites that can build Decision Trees in Hadoop are few and far between. Addressing this gap, Revolution Analytics recently enhanced its entire scalable analytics suite to run in Hadoop. In this talk, I will explain how our Decision Tree implementation exploits recent research reducing the computational complexity of decision tree estimation, allowing linear scalability with data size and number of nodes. This streaming algorithm processes data in chunks, allowing scaling unconstrained by aggregate cluster memory. The implementation supports both classification and regression and is fully integrated with the R statistical language and the rest of our advanced analytics and machine learning algorithms, as well as our interactive Decision Tree visualizer.
Python for Data Engineering: Why Do Data Engineers Use Python?hemayadav41
Discover why data engineers prefer using Python for their data engineering tasks. Explore the versatility, ease of use, extensive libraries, integration with big data technologies, and data visualisation capabilities that make Python an invaluable tool in the field. Institutes like Uncodemy, Udemy, Simplilearn, Ducat, and 4achivers, provide the best Python Course with Job Placement in Jaipur, Kanpur, Gorakhpur, Mumbai, Pune, Delhi, Noida, and all over India."
(Costless) Software Abstractions for Parallel ArchitecturesJoel Falcou
Performing large, intensive or non-trivial computing on array like data structures is one of the most common task in scientific computing, video game development and other fields. This matter of fact is backed up by the large number of tools, languages and libraries to perform such tasks. If we restrict ourselves to C++ based solutions, more than a dozen such libraries exists from BLAS/LAPACK C++ binding to template meta-programming based Blitz++ or Eigen. If all of these libraries provide good performance or good abstraction, none of them seems to fit the need of so many different user types.
Moreover, as parallel system complexity grows, the need to maintain all those components quickly become unwieldy. This talk explores various software design techniques - like Generative Programming, MetaProgramming and Generic Programming - and their application to the implementation of a parallel computing librariy in such a way that:
- abstraction and expressiveness are maximized - cost over efficiency is minimized
We'll skim over various applications and see how they can benefit from such tools. We will conclude by discussing what lessons were learnt from this kind of implementation and how those lessons can translate into new directions for the language itself.
Introduction to Spark R with R studio - Mr. Pragith Sigmoid
R is a programming language and software environment for statistical computing and graphics.
The R language is widely used among statisticians and data miners for developing statistical
software and data analysis.
RStudio IDE is a powerful and productive user interface for R.
It’s free and open source, and available on Windows, Mac, and Linux.
Using Aspects for Language Portability (SCAM 2010)lennartkats
Software platforms such as the Java Virtual Machine or the CLR .NET virtual machine have their own ecosystem of a core programming language or instruction set, libraries, and developer community. Programming languages can target multiple software platforms to increase interoperability or to boost performance. Introducing a new compiler backend for a language is the first step towards targeting a new platform, translating the language to the platform\'s language or instruction set. Programs written in modern languages generally make extensive use of APIs, based on the runtime system of the software platform, introducing additional portability concerns. They may use APIs that are implemented by platform-specific libraries. Libraries may perform platform-specific operations, make direct native calls, or make assumptions about performance characteristics of operations or about the file system. This paper proposes to use aspect weaving to invasively adapt programs and libraries to address such portability concerns, and identifies four classes of aspects for this purpose. We evaluate this approach through a case study where we retarget the Stratego program transformation language towards the Java Virtual Machine.
In this presentation from Revolution Analytics, Bill Jacobs presents: Are You Ready for Big Data Analytics?
"Revolution Analytics delivers advanced analytics software at half the cost of existing solutions. By building on open source R—the world's most powerful statistics software—with innovations in big data analysis, integration and user experience, Revolution Analytics meets the demands and requirements of modern data-driven businesses."
Learn more: http://www.revolutionanalytics.com
Watch the presentation video: http://wp.me/p3RLEV-12S
Can you teach coding to kids in a mobile game app in local languages. Do you need to be good in English to learn coding in R or Python?
How young can we train people in coding-
something we worked on for six months but now we are giving up due to lack of funds is this idea.
Feel free to use it, it is licensed cc-by-sa
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
2. About Me
Independent Software Consultant
M.Sc. Applied Computing, 2000
M.Sc. Finance, 2008
Apache Committer
Working in the financial sector for the last 7 years or so
Interested in practical applications of functional languages and
machine learning
Relatively recent convert to R ( ≈ 2 years)
3. R - Pros and Cons
Pro
Designed by statisticians Con
Can be extremely elegant
Designed by statisticians
Comprehensive extension
Can be clunky (S4)
library
Bewildering array of
Open-source
overlapping extensions
Huge parallelization effort
Inherently single-threaded
Fantastic reporting
Incredibly Popular
capabilities
Incredibly Popular
4. R - Pros and Cons
Pro
Designed by statisticians Con
Can be extremely elegant
Designed by statisticians
Comprehensive extension
Can be clunky (S4)
library
Bewildering array of
Open-source
overlapping extensions
Huge parallelization effort
Inherently single-threaded
Fantastic reporting
Incredibly Popular
capabilities
Incredibly Popular
5. R - Pros and Cons
Pro
Designed by statisticians Con
Can be extremely elegant
Designed by statisticians
Comprehensive extension
Can be clunky (S4)
library
Bewildering array of
Open-source
overlapping extensions
Huge parallelization effort
Inherently single-threaded
Fantastic reporting
Incredibly Popular
capabilities
Incredibly Popular
6. R - Pros and Cons
Pro
Designed by statisticians Con
Can be extremely elegant
Designed by statisticians
Comprehensive extension
Can be clunky (S4)
library
Bewildering array of
Open-source
overlapping extensions
Huge parallelization effort
Inherently single-threaded
Fantastic reporting
Incredibly Popular
capabilities
Incredibly Popular
7. R - Pros and Cons
Pro
Designed by statisticians Con
Can be extremely elegant
Designed by statisticians
Comprehensive extension
Can be clunky (S4)
library
Bewildering array of
Open-source
overlapping extensions
Huge parallelization effort
Inherently single-threaded
Fantastic reporting
Incredibly Popular
capabilities
Incredibly Popular
8. R - Pros and Cons
Pro
Designed by statisticians Con
Can be extremely elegant
Designed by statisticians
Comprehensive extension
Can be clunky (S4)
library
Bewildering array of
Open-source
overlapping extensions
Huge parallelization effort
Inherently single-threaded
Fantastic reporting
Incredibly Popular
capabilities
Incredibly Popular
9. R - Pros and Cons
Pro
Designed by statisticians Con
Can be extremely elegant
Designed by statisticians
Comprehensive extension
Can be clunky (S4)
library
Bewildering array of
Open-source
overlapping extensions
Huge parallelization effort
Inherently single-threaded
Fantastic reporting
Incredibly Popular
capabilities
Incredibly Popular
10. R - Pros and Cons
Pro
Designed by statisticians Con
Can be extremely elegant
Designed by statisticians
Comprehensive extension
Can be clunky (S4)
library
Bewildering array of
Open-source
overlapping extensions
Huge parallelization effort
Inherently single-threaded
Fantastic reporting
Incredibly Popular
capabilities
Incredibly Popular
11. R - Pros and Cons
Pro
Designed by statisticians Con
Can be extremely elegant
Designed by statisticians
Comprehensive extension
Can be clunky (S4)
library
Bewildering array of
Open-source
overlapping extensions
Huge parallelization effort
Inherently single-threaded
Fantastic reporting
Incredibly Popular
capabilities
Incredibly Popular
12. R - Pros and Cons
Pro
Designed by statisticians Con
Can be extremely elegant
Designed by statisticians
Comprehensive extension
Can be clunky (S4)
library
Bewildering array of
Open-source
overlapping extensions
Huge parallelization effort
Inherently single-threaded
Fantastic reporting
Incredibly Popular
capabilities
Incredibly Popular
13. R - Pros and Cons
Pro
Designed by statisticians Con
Can be extremely elegant
Designed by statisticians
Comprehensive extension
Can be clunky (S4)
library
Bewildering array of
Open-source
overlapping extensions
Huge parallelization effort
Inherently single-threaded
Fantastic reporting
Incredibly Popular
capabilities
Incredibly Popular
14. R - Pros and Cons
Pro
Designed by statisticians Con
Can be extremely elegant
Designed by statisticians
Comprehensive extension
Can be clunky (S4)
library
Bewildering array of
Open-source
overlapping extensions
Huge parallelization effort
Inherently single-threaded
Fantastic reporting
Incredibly Popular
capabilities
Incredibly Popular
15. R - Pros and Cons
Pro
Designed by statisticians Con
Can be extremely elegant
Designed by statisticians
Comprehensive extension
Can be clunky (S4)
library
Bewildering array of
Open-source
overlapping extensions
Huge parallelization effort
Inherently single-threaded
Fantastic reporting
Incredibly Popular
capabilities
Incredibly Popular
16. R - Pros and Cons
Pro
Designed by statisticians Con
Can be extremely elegant
Designed by statisticians
Comprehensive extension
Can be clunky (S4)
library
Bewildering array of
Open-source
overlapping extensions
Huge parallelization effort
Inherently single-threaded
Fantastic reporting
Incredibly Popular
capabilities
Incredibly Popular
17. R - Pros and Cons
Pro
Designed by statisticians Con
Can be extremely elegant
Designed by statisticians
Comprehensive extension
Can be clunky (S4)
library
Bewildering array of
Open-source
overlapping extensions
Huge parallelization effort
Inherently single-threaded
Fantastic reporting
Incredibly Popular
capabilities
Incredibly Popular
18. Parallelization vs. Concurrency
R interpreter is single threaded
Some historical context for this (BLAS implementations)
Not necessarily a limitation in the general context
Multithreading can be complex and problematic
Instead a focus on parallelization:
Distributed computation: gridR, nws, snow
Multicore/multi-cpu scaling: Rmpi, Romp, pnmath/pnmath0
Interfaces to Pthreads/PBLAS/OpenMP/MPI/Globus/etc.
Parallelization suits cpu-bound large data processing
applications
19. Other Scalability and Performance Work
JIT/bytecode compilation (Ra)
Implicit vectorization a la Matlab (code analysis)
Large (≥ RAM) dataset handling (bigmemory,ff)
Many incremental performance improvements (e.g. less
internal copying)
Next: GPU/massive multicore...?
20. What Benefit Concurrency?
Real-time (streaming to be more precise) data analysis
Growing Interest in using R for streaming data, not just offline
analyis
GUI toolkit integration
Fine-grained control over independent task execution
"I believe that explicit concurrency management tools (i.e. a
threads toolkit) are what we really need in R at this point." -
Luke Tierney, 2001
21. Will There Be A Multithreaded R?
Short answer is: probably not
At least not in its current incarnation
Internal workings of the interpreter not particularly amenable
to concurrency:
Functions can manipulate caller state («- vs. <-)
Lazy evaluation machinery (promises)
Dynamic State, garbage collection, etc.
Scoping: global environments
Management of resources: streams, I/O, connections, sinks
Implications for current code
Possibly in the next language evolution (cf. Ihaka?)
Large amount of work (but potentially do-able)
22. Will There Be A Multithreaded R?
Short answer is: probably not
At least not in its current incarnation
Internal workings of the interpreter not particularly amenable
to concurrency:
Functions can manipulate caller state («- vs. <-)
Lazy evaluation machinery (promises)
Dynamic State, garbage collection, etc.
Scoping: global environments
Management of resources: streams, I/O, connections, sinks
Implications for current code
Possibly in the next language evolution (cf. Ihaka?)
Large amount of work (but potentially do-able)
23. Will There Be A Multithreaded R?
Short answer is: probably not
At least not in its current incarnation
Internal workings of the interpreter not particularly amenable
to concurrency:
Functions can manipulate caller state («- vs. <-)
Lazy evaluation machinery (promises)
Dynamic State, garbage collection, etc.
Scoping: global environments
Management of resources: streams, I/O, connections, sinks
Implications for current code
Possibly in the next language evolution (cf. Ihaka?)
Large amount of work (but potentially do-able)
24. Will There Be A Multithreaded R?
Short answer is: probably not
At least not in its current incarnation
Internal workings of the interpreter not particularly amenable
to concurrency:
Functions can manipulate caller state («- vs. <-)
Lazy evaluation machinery (promises)
Dynamic State, garbage collection, etc.
Scoping: global environments
Management of resources: streams, I/O, connections, sinks
Implications for current code
Possibly in the next language evolution (cf. Ihaka?)
Large amount of work (but potentially do-able)
25. Will There Be A Multithreaded R?
Short answer is: probably not
At least not in its current incarnation
Internal workings of the interpreter not particularly amenable
to concurrency:
Functions can manipulate caller state («- vs. <-)
Lazy evaluation machinery (promises)
Dynamic State, garbage collection, etc.
Scoping: global environments
Management of resources: streams, I/O, connections, sinks
Implications for current code
Possibly in the next language evolution (cf. Ihaka?)
Large amount of work (but potentially do-able)
26. Will There Be A Multithreaded R?
Short answer is: probably not
At least not in its current incarnation
Internal workings of the interpreter not particularly amenable
to concurrency:
Functions can manipulate caller state («- vs. <-)
Lazy evaluation machinery (promises)
Dynamic State, garbage collection, etc.
Scoping: global environments
Management of resources: streams, I/O, connections, sinks
Implications for current code
Possibly in the next language evolution (cf. Ihaka?)
Large amount of work (but potentially do-able)
27. Will There Be A Multithreaded R?
Short answer is: probably not
At least not in its current incarnation
Internal workings of the interpreter not particularly amenable
to concurrency:
Functions can manipulate caller state («- vs. <-)
Lazy evaluation machinery (promises)
Dynamic State, garbage collection, etc.
Scoping: global environments
Management of resources: streams, I/O, connections, sinks
Implications for current code
Possibly in the next language evolution (cf. Ihaka?)
Large amount of work (but potentially do-able)
28. Will There Be A Multithreaded R?
Short answer is: probably not
At least not in its current incarnation
Internal workings of the interpreter not particularly amenable
to concurrency:
Functions can manipulate caller state («- vs. <-)
Lazy evaluation machinery (promises)
Dynamic State, garbage collection, etc.
Scoping: global environments
Management of resources: streams, I/O, connections, sinks
Implications for current code
Possibly in the next language evolution (cf. Ihaka?)
Large amount of work (but potentially do-able)
29. Will There Be A Multithreaded R?
Short answer is: probably not
At least not in its current incarnation
Internal workings of the interpreter not particularly amenable
to concurrency:
Functions can manipulate caller state («- vs. <-)
Lazy evaluation machinery (promises)
Dynamic State, garbage collection, etc.
Scoping: global environments
Management of resources: streams, I/O, connections, sinks
Implications for current code
Possibly in the next language evolution (cf. Ihaka?)
Large amount of work (but potentially do-able)
30. Will There Be A Multithreaded R?
Short answer is: probably not
At least not in its current incarnation
Internal workings of the interpreter not particularly amenable
to concurrency:
Functions can manipulate caller state («- vs. <-)
Lazy evaluation machinery (promises)
Dynamic State, garbage collection, etc.
Scoping: global environments
Management of resources: streams, I/O, connections, sinks
Implications for current code
Possibly in the next language evolution (cf. Ihaka?)
Large amount of work (but potentially do-able)
31. Will There Be A Multithreaded R?
Short answer is: probably not
At least not in its current incarnation
Internal workings of the interpreter not particularly amenable
to concurrency:
Functions can manipulate caller state («- vs. <-)
Lazy evaluation machinery (promises)
Dynamic State, garbage collection, etc.
Scoping: global environments
Management of resources: streams, I/O, connections, sinks
Implications for current code
Possibly in the next language evolution (cf. Ihaka?)
Large amount of work (but potentially do-able)
32. Will There Be A Multithreaded R?
Short answer is: probably not
At least not in its current incarnation
Internal workings of the interpreter not particularly amenable
to concurrency:
Functions can manipulate caller state («- vs. <-)
Lazy evaluation machinery (promises)
Dynamic State, garbage collection, etc.
Scoping: global environments
Management of resources: streams, I/O, connections, sinks
Implications for current code
Possibly in the next language evolution (cf. Ihaka?)
Large amount of work (but potentially do-able)
33. Example Application
Based on work I did last year and presented at UseR! 2008
Wrote a real-time and historical market data service from
Reuters/R
The real-time interface used the Reuters C++ API
R extension in C++ that spawned listening thread and
handled updates
35. Example Usage
rsub <- function(duration, items, callback)
The call rsub will subscribe to the specified rate(s) for the duration
of time specified by duration (ms). When a tick arrives, the
callback function callback is invoked, with a data frame
containing the fields specified in items.
Multiple market data items may be subscribed to, and any
combination of fields may be be specified.
Uses the underlying RFA API, which provides a C++ interface to
real-time market updates.
36. Real-Time Example
# Specify field names to retrieve
fields <- c("BID","ASK","TIMCOR")
# Subscribe to EUR/USD and GBP/USD ticks
items <- list()
items[[1]] <- c("IDN_SELECTFEED", "EUR=", fields)
items[[2]] <- c("IDN_SELECTFEED", "GBP=", fields)
# Simple Callback Function
callback <- function(df) { print(paste("Received",df)) }
# Subscribe for 1 hour
ONE_HOUR <- 1000*(60)^2
rsub(ONE_HOUR, items, callback)
37. Issues With This Approach
As R interpreter is single threaded, cannot spawn thread for
callbacks
Thus, interpreter thread is locked for the duration of
subscription
Not a great user experience
Need to find alternative mechanism
38. Alternative Approach
If we cannot run subscriber threads in-process, need to
decouple
Standard approach: add an extra layer and use some form of
IPC
For instance, we could:
Subscribe in a dedicated R process (A)
Push incoming data onto a socket
R process (B) reads from a listening socket
Sockets could also be another IPC primitive, e.g. pipes
Also note that R supports asynchronous I/O (?isIncomplete)
Look at the ibrokers package for examples of this
39. The bigmemoRy package
From the description: "Use C++ to create, store,
access, and manipulate massive matrices"
Allows creation of large matrices
These matrices can be mapped to files/shared memory
It is the shared memory functionality that we will use
The next version (3.0) will be unveiled at UseR! 2009
big.matrix(nrow, ncol, type = "integer", ....)
shared.big.matrix(nrow, ncol, type = "integer", ...)
filebacked.big.matrix(nrow, ncol, type = "integer", ...)
40. Sample Usage
> library(bigmemory) # Note: I'm using pre-release
> X <- shared.big.matrix(type="double", ncol=1000, nrow=1000)
> X
An object of class “big.matrix”
Slot "address":
<pointer: 0x7378a0>
42. Export the Descriptor
In R session 1:
> dput(desc, file="~/matrix.desc")
In R session 2:
> library(bigmemory)
> desc <- dget("~/matrix.desc")
> X <- attach.big.matrix(desc)
Now R sessions A and B share the same big.matrix instance
43. Share Data Between Sessions
R session 1:
> X[1,1] <- 1.2345
R session 2:
> X[1,1]
[1] 1.2345
Thus, streaming data can be continuously fed into session A
And concurrently processed in session B
44. Summary
Lack of threads not a barrier to concurrent analysis
Packages like bigmemory, nws, etc. facilitate decoupling via
IPC
nws goes a step further, with a distributed workspace
Many applications for streaming data:
Data collection/monitoring
Development of pricing/risk algorithms
Low-frequency execution (??)
...