RHadoop is an effective platform for doing exploratory data analysis over big data sets. The convenience of an interactive command-line interpreter and the wealth of statistical and machine learning routines implemented in R libraries make it a highly effective environment for elementary data science.
We'll discuss the basics of RHadoop: what it is, how to install it, and the API fundamentals. Next we'll discuss common use cases for RHadoop. Finally, we'll run through an interactive example.
When people speak of big data analysis, what comes to mind is probably HDFS and MapReduce within Hadoop. But to write a MapReduce program, one must learn to write native Java. One might wonder: is it possible to use R, one of the most popular languages among data scientists, to implement MapReduce programs? And through the integration of R and Hadoop, can one truly unleash the power of parallel computing for big data analysis?
These slides introduce how to install RHadoop step by step and how to write a MapReduce program in R. More importantly, they discuss whether RHadoop is really a guiding light for big data analysis, or just another way to write MapReduce programs.
Please email me if you find any problems with these slides. EMAIL: tr.ywchiu@gmail.com
Hadoop is commonly used for processing large swaths of data in batch. While many of the necessary building blocks for data processing exist within the Hadoop ecosystem – HDFS, MapReduce, HBase, Hive, Pig, Oozie, and so on – it can be a challenge to assemble and operationalize them as a production ETL platform. This presentation covers one approach to data ingest, organization, format selection, process orchestration, and external system integration, based on collective experience acquired across many production Hadoop deployments.
R and Hadoop are changing the way organizations manage and utilize big data. Think Big Analytics and Revolution Analytics are helping clients plan, build, test and implement innovative solutions based on the two technologies that allow clients to analyze data in new ways; exposing new insights for the business. Join us as Jeffrey Breen explains the core technology concepts and illustrates how to utilize R and Revolution Analytics’ RevoR in Hadoop environments.
This presentation demonstrates some of the resources available in a Hadoop cluster, as well as the main components of the ecosystem used at Magazine Luiza. It also includes a comparison with major market players that use this technology.
Hadoop Installation, Configuration, and MapReduce Program - Praveen Kumar Donta
This presentation contains a brief description of big data, along with Hadoop installation, configuration, and a MapReduce word-count program with its explanation.
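The word count walked through in such decks is typically written for Hadoop in Java. As a hedged, language-neutral sketch of the same map/shuffle/reduce steps, here is a self-contained Python simulation (the function names and sample lines are invented for illustration, not taken from the presentation):

```python
# A minimal word count in the MapReduce style the slides describe.
# The same mapper/reducer pair of phases is what Hadoop runs at scale;
# here the "shuffle" is just an in-process sort-and-group.
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Emit a (word, 1) pair for every word in the input line.
    for word in line.split():
        yield (word.lower(), 1)

def reducer(word, counts):
    # Sum the 1s collected for a single word.
    return (word, sum(counts))

def run_wordcount(lines):
    # Shuffle step: sort the mapper output by key, then group by word.
    pairs = sorted(kv for line in lines for kv in mapper(line))
    return dict(
        reducer(word, (c for _, c in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    )

counts = run_wordcount(["big data big analysis", "big insight"])
print(counts)  # {'analysis': 1, 'big': 3, 'data': 1, 'insight': 1}
```

In a real Hadoop job the mapper and reducer run as separate distributed tasks and the framework performs the sort/group between them; the control flow above only mirrors that contract.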
Apache Drill is the next generation of SQL query engines. It builds on ANSI SQL 2003 and extends it to handle new formats like JSON, Parquet, and ORC, as well as the usual CSV, TSV, XML, and other Hadoop formats. Most importantly, it melts away the barriers that have caused databases to become silos of data. It does so by handling schema changes on the fly, enabling a whole new world of self-service and data agility never seen before.
There's a big shift at both the architecture and API level from Hadoop 1 to Hadoop 2, particularly YARN, and we held our first meetup to talk about this (http://www.meetup.com/Atlanta-YARN-User-Group/) on 10/13/2013.
Hadoop institutes: Kelly Technologies is a Hadoop training institute in Hyderabad, providing Hadoop training by real-time faculty.
These slides cover the very basics of Hadoop architecture, in particular HDFS. This was my presentation at the first Delhi Hadoop User Group (DHUG) meetup, held in Gurgaon on 10th September 2011. Loved the positive feedback. I'll also upload a more elaborate version covering the Hadoop MapReduce architecture soon. Most of the material in these slides can be found in Tom White's book as well (see the last slide).
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera, Inc.
Attend this session and walk away armed with solutions to the most common customer problems. Learn proactive configuration tweaks and best practices to keep your cluster free of fetch failures, job tracker hangs, and the like.
The Yahoo! Hadoop grid makes use of a managed service to pull data into the clusters. However, when it comes to getting data out of the clusters, the choices are limited to proxies such as HDFSProxy and HTTPProxy. With the introduction of HCatalog services, customers of the grid now have their data represented in a central metadata repository. HCatalog abstracts away file locations and the underlying storage format of data for users, along with several other advantages such as sharing of data among MapReduce, Pig, and Hive. In this talk, we will focus on how the ODBC/JDBC interface of HiveServer2 accomplishes the use case of getting data out of the clusters when HCatalog is in use and users no longer want to worry about files, partitions, and their locations. We will also demo the data-out capabilities and go through other nice properties of the data-out feature.
Presenter(s):
Sumeet Singh, Director, Product Management, Yahoo!
Chris Drome, Technical Yahoo!
OBIEE Answers Vs Data Visualization: A Cage Match - Michelle Kolbe
With Oracle's new tool in 12c called Data Visualization, when do you use Answers and when do you use Data Visualization? This presentation included a live demo of the two tools. The slides walk step by step through this demo. You can follow along yourself using the Sample App data.
See my blog post walking through these slides here: https://medium.com/@datacheesehead/the-cage-match-between-obiee-answers-and-data-visualization-73496bbf4dfe#.thiuznp0z
Overview of accessing relational databases from R. Focuses on and demonstrates the DBI family (RMySQL, RPostgreSQL, ROracle, RJDBC, etc.) but also introduces RODBC. Highlights DBI's dbApply() function to combine the strengths of SQL and *apply() on large data sets. Demonstrates the sqldf package, which provides SQL access to standard R data.frames.
Presented at the May 2011 meeting of the Greater Boston useR Group.
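sqldf is an R package; purely as a cross-language analogy, Python's standard library supports the same idea of querying in-memory data with plain SQL via SQLite. A minimal sketch, assuming made-up fare rows (none of this is the talk's own code):

```python
# sqldf-style idea: load ordinary in-memory rows into a throwaway
# SQLite database and let SQL do the grouping and aggregation.
import sqlite3

def avg_fare_by_dest(rows):
    # Load the rows into an in-memory SQLite table...
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE fares (origin TEXT, dest TEXT, fare REAL)")
    con.executemany("INSERT INTO fares VALUES (?, ?, ?)", rows)
    # ...then aggregate in SQL instead of looping in the host language.
    result = con.execute(
        "SELECT dest, AVG(fare) FROM fares GROUP BY dest ORDER BY dest"
    ).fetchall()
    con.close()
    return result

rows = [("BOS", "SFO", 350.0), ("BOS", "ORD", 180.0), ("BOS", "SFO", 410.0)]
print(avg_fare_by_dest(rows))  # [('ORD', 180.0), ('SFO', 380.0)]
```

The appeal in both languages is the same: one declarative query replaces hand-written grouping code, while the data never leaves the host process.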
Adoption of the R language has grown rapidly in the last few years, and is ranked as the number-one data science language in several surveys. This accelerating R adoption curve has been driven by the Big Data revolution, and the fact that so many data scientists — having learned R at university — are actively unlocking the secrets hidden in these new, vast data troves. In more than 6 years of writing for the Revolutions blog, I’ve discovered hundreds of applications of R in business, in government, and in the non-profit sector. Sometimes the use of R is obvious, and sometimes it takes a little bit of detective work to learn how R is operating behind the scenes. In this talk, I'll recount some of my favourite applications of R, and show how R is behind some amazing innovations in today’s world.
HP Distributed R is a high-performance, scalable platform for the R language. It enables R to leverage multiple cores and multiple servers to perform Big Data advanced analytics. It consists of new R language constructs to easily parallelize algorithms across multiple R processes.
HP Distributed R simplifies large-scale analysis by extending R. Because R is a single-threaded environment, it has limited utility for Big Data analytics. HP Distributed R allows you to specify that parts of programs be run in multiple single-threaded R processes. This approach results in significantly reduced execution times for Big Data analysis.
Slides from my lightning talk at the Boston Predictive Analytics Meetup hosted at Predictive Analytics World, Boston, October 1, 2012.
Full code and data are available on github: http://bit.ly/pawdata
Data profiling comprises a broad range of methods to efficiently analyze a given data set. In a typical scenario, which mirrors the capabilities of commercial data profiling tools, tables of a relational database are scanned to derive metadata, such as data types and value patterns, completeness and uniqueness of columns, keys and foreign keys, and occasionally functional dependencies and association rules. Individual research projects have proposed several additional profiling tasks, such as the discovery of inclusion dependencies or conditional functional dependencies.
Data profiling deserves a fresh look for three reasons: First, the area itself is neither established nor defined in any principled way, despite significant research activity on individual parts in the past. Second, current data profiling techniques hardly scale beyond what can only be called small data. Third, more and more data beyond traditional relational databases are being created and beg to be profiled. The talk proposes new research directions and challenges, including interactive and incremental profiling and profiling heterogeneous and non-relational data.
Speaker: Felix Naumann studied mathematics, economy, and computer sciences at the University of Technology in Berlin. After receiving his diploma (MA) in 1997 he joined the graduate school "Distributed Information Systems" at Humboldt University of Berlin. He completed his PhD thesis on "Quality-driven Query Answering" in 2000. In 2001 and 2002 he worked at the IBM Almaden Research Center on topics around data integration. From 2003 - 2006 he was assistant professor for information integration at the Humboldt-University of Berlin. Since then he holds the chair for information systems at the Hasso Plattner Institute at the University of Potsdam in Germany.
(Presented by Antonio Piccolboni to Strata 2012 Conference, Feb 29 2012).
RHadoop is an open source project spearheaded by Revolution Analytics to grant data scientists access to Hadoop's scalability from their favorite language, R. RHadoop is comprised of three packages:
- rhdfs provides file-level manipulation for HDFS, the Hadoop file system
- rhbase provides access to HBase, the Hadoop database
- rmr allows writing MapReduce programs in R
rmr allows R developers to program in the MapReduce framework, and offers all developers an alternative way to implement MapReduce programs that strikes a delicate compromise between power and usability. It lets you write general MapReduce programs with the full power and ecosystem of an existing, established programming language. It doesn't force you to replace the R interpreter with a special run-time; it is just a library. You can write logistic regression in half a page and even understand it. It feels and behaves almost like the usual R iteration and aggregation primitives. It is comprised of a handful of functions with a modest number of arguments and sensible defaults that combine in many useful ways. But there is no way to prove that an API works: one can only show examples of what it enables, and we will do that, covering a few from machine learning and statistics. Finally, we will discuss how to get involved.
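rmr's actual API is in R and is not reproduced here. The following Python sketch only mimics the shape of the key-value map/reduce contract the abstract describes: map emits key-value pairs, the framework groups them by key, and reduce folds each group (the mapreduce helper and the fare data are invented for illustration):

```python
# Shape of the rmr-style contract: a map function emits (key, value)
# pairs, the framework groups values by key, and a reduce function
# summarizes each group. This in-process version has none of Hadoop's
# distribution; it only models the programming interface.
from collections import defaultdict

def mapreduce(records, map_fn, reduce_fn):
    # Group the mapper's (key, value) output by key...
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    # ...then hand each key and its values to the reducer.
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# Per-carrier mean fare, in the spirit of common RHadoop demos
# (carrier codes and fares here are made up).
fares = [("AA", 210.0), ("AA", 190.0), ("UA", 305.0)]

result = mapreduce(
    fares,
    map_fn=lambda rec: [(rec[0], rec[1])],
    reduce_fn=lambda key, vals: sum(vals) / len(vals),
)
print(result)  # {'AA': 200.0, 'UA': 305.0}
```

The "just a library" point in the abstract is visible even in this toy: the user supplies two ordinary functions in the host language, and everything else is plumbing.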
Overview of a few ways to group and summarize data in R, using sample airfare data from DOT/BTS's O&D Survey.
Starts with a naive approach using subset() and loops, shows base R's tapply() and aggregate(), and highlights the doBy and plyr packages.
Presented at the March 2011 meeting of the Greater Boston useR Group.
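The talk's naive-versus-grouped progression is R-specific (subset() in a loop versus tapply()/aggregate()). As a rough cross-language analogy only, the same contrast can be sketched in Python (the fare data is invented):

```python
# Naive versus one-pass grouping, mirroring the talk's progression.
from collections import defaultdict

fares = [("SFO", 350.0), ("ORD", 180.0), ("SFO", 410.0)]

# Naive: one full scan of the data per distinct key, which is what
# calling subset() inside a loop over keys does in R.
naive = {}
for key in {k for k, _ in fares}:
    vals = [f for k, f in fares if k == key]
    naive[key] = sum(vals) / len(vals)

# One pass: accumulate values per key, then summarize each group,
# in the spirit of tapply()/aggregate().
groups = defaultdict(list)
for key, fare in fares:
    groups[key].append(fare)
one_pass = {k: sum(v) / len(v) for k, v in groups.items()}

assert naive == one_pass == {"SFO": 380.0, "ORD": 180.0}
```

Both produce the same summary; the one-pass version avoids rescanning the data once per group, which is the practical point behind moving from loops to grouped aggregation.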
Step-1 Tableau Introduction
Step-2 Connecting to Data
Step-3 Building basic views
Step-4 Data manipulations and Calculated fields
Step-5 Tableau Dashboards
Step-6 Advanced Data Options
Step-7 Advanced graph Options
Different application domains, including sensor networks, social networks, science, financial services, and condition monitoring systems, demand the storage of vast amounts of data in the petabyte range. Prominent examples are Google, Facebook, Yahoo!, and Amazon, just to name a few.
This data volume can't be tackled with conventional relational database technologies anymore, whether for technical reasons, licensing reasons, or both. It demands a scale-out environment that allows reliable, scalable, and distributed processing. This trend in big data management is more and more often approached with NoSQL solutions like Apache HBase on top of Apache Hadoop.
This session discusses big data management and their scalability challenges in general with a short introduction into Apache Hadoop/HBase and a case study on the co-existence of Apache Hadoop/HBase with Firebird in a sensor data aquisition system.
Have you ever heard the buzzword "big data"? Briefly described, big data means collecting massive amounts of data, extracting both the small details and the larger trends within it, summarizing the output, and generating important insights about customers and competitors.
Enterprises seem to have sensed that something is in the air and have started to shop for technology. So what does the world have to offer enterprises that have an unknown number of petabytes flowing through their systems daily? There are a few options, but very few that can match the popularity of Hadoop. Hadoop can store and process large amounts of data, it has a large and diverse toolset for integration, operations, and processing, and it is open source!
Hadoop is emerging as the preferred solution for big data analytics across unstructured data. Using real world examples learn how to achieve a competitive advantage by finding effective ways of analyzing new sources of unstructured and machine-generated data.
Comparison between RDBMS, Hadoop and Apache based on parameters like Data Variety, Data Storage, Querying, Cost, Schema, Speed, Data Objects, Hardware profile, and Used cases. It also mentions benefits and limitations.
Apache Hadoop started as batch: simple, powerful, efficient, scalable, and a shared platform. However, Hadoop is more than that. Its true strengths are:
Scalability – it's affordable due to it being open-source and its use of commodity hardware for reliable distribution.
Schema on read – you can afford to save everything in raw form.
Data is better than algorithms – More data and a simple algorithm can be much more meaningful than less data and a complex algorithm.
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos - Lester Martin
A walk-thru of core Hadoop, the ecosystem tools, and Hortonworks Data Platform (HDP) followed by code examples in MapReduce (Java and C#), Pig, and Hive.
Presented at the Atlanta .NET User Group meeting in July 2014.
Integrating R & Hadoop - Text Mining & Sentiment Analysis - Aravind Babu
This project examines the sentiment expressed in social media (Twitter) toward smartphones: how to perform text mining on Hadoop data and analyze it by integrating R with Hadoop.
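The project's actual pipeline uses R on Hadoop. Purely to illustrate the lexicon-scoring idea that simple sentiment analyses often start from, here is a toy Python sketch (the word lists and example tweets are invented, not taken from the project):

```python
# Toy lexicon-based sentiment scoring: count positive-word hits minus
# negative-word hits per tweet. Real pipelines use far larger lexicons
# and handle negation, but the core scoring step looks like this.
POSITIVE = {"great", "fast", "love"}
NEGATIVE = {"slow", "broken", "hate"}

def sentiment(tweet):
    # Tokenize naively on whitespace, then score the tokens.
    words = tweet.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(sentiment("love this phone great camera"))  # 2
print(sentiment("battery is slow and broken"))    # -2
```

On Hadoop, a scoring function like this would run inside the map phase over the tweet corpus, with aggregation of scores per product or per time window happening in the reduce phase.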
My talk at August's joint meeting of Chicago's R and Hadoop user groups providing an introduction to using R with Hadoop. It starts with a quick introduction to and overview of available options, then focuses on using RHadoop's rmr library to perform an analysis on the publicly-available 'airline' data set.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... - BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Essentials of Automations: Optimizing FME Workflows with Parameters - Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... - James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
GraphRAG is All You need? LLM & Knowledge Graph - Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Neuro-symbolic is not enough, we need neuro-*semantic* - Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
2. Would You Like to…
• Predict X?
  – The outcome of a future event
  – Who is likely to do something
  – Genetic factors leading to disease
• Pre-filter things so humans can accomplish more?
• Do all of this faster and better?
This document is company confidential and is intended solely for the use and information of Booz Allen Hamilton
3. Why R and Hadoop?
• R is a fantastic platform for data science
  – Has a peer-reviewed community and journal that vets libraries
  – (Mostly) intuitive language
• Hadoop is the de-facto platform for parallel processing
• Today, we'll be talking about rmr, but there are two more packages: rhbase and rhdfs
4. Nothing Has Changed. Everything Has Changed.
• Some of the most effective techniques for data mining are relatively old
  – Modern SVM dates back to '92
  – Logistic regression dates back to '44
  – Important elements of the algorithms date back to Newton
• Accessibility and relevance have changed
  – Accessibility to data
  – Accessibility of computational power
  – Necessity of methods
5. Some Criticisms of R & RHadoop
• R docs are written in their own language (using data frames, etc.) that is unfamiliar to computer scientists
• R and CRAN documentation are more like old-school GNU than most Apache projects
  – Get used to Googling and using R's help() function
• R's data management facilities are inconsistent
• Streaming API isn't super fast
• (get over it)
6. Comparison to Other R Parallelism Frameworks
• SNOW/SNOWFALL
  – Operates over MPI, Sockets, or PVM
  – No tie-in to a DFS (bad for data-intensive computing)
  – Handles matrix multiplication well (perhaps better)
  – Doesn't handle other non-trivial IPC well (basically for parallel linear algebra and simulations)
• Rmpi
  – More code
  – All synchronization constructs are user-built (just like MPI)
7. Comparison to Other R Parallelism Frameworks
• Others…
  – Only other Hadoop libraries have integration with HDFS / are appropriate for data-intensive computing
  – Only RHadoop supports both local and cluster-based backends and has an intuitive interface that duplicates closures in the remote environment
  – Most environments are targeted towards modeling and simulation
8. Installation – Local Workstation
• Install R
  – MacPorts – sudo port install r-framework
  – Ubuntu – sudo apt-get install r-base
  – RHEL – sudo yum install R
• Install R dependencies (inside R)
  – install.packages(c("Rcpp", "RJSONIO", "itertools", "digest"), repos="http://watson.nci.nih.gov/cran_mirror/")
• Install RMR
  – curl http://cloud.github.com/downloads/RevolutionAnalytics/RHadoop/rmr_1.3.1.tar.gz > rmr.tar.gz
  – install.packages("rmr.tar.gz") # from inside R, in the same directory
• Configure the local backend each time you run R
  – rmr.options.set(backend="local")
9. Installation – Cluster
• Install R and all packages you plan on using (rmr, e1071, topicmodels, tm, etc.) on each node.
• Use a compatible version of Hadoop 1 (1.0.3+ or CDH3+). Hadoop 2 may or may not work.
• The example on the previous slide installs R packages in your home directory; you probably want to install them to the root install.
• Configure environment variables:
  export HADOOP_CMD=/usr/bin/hadoop
  export HADOOP_STREAMING=/usr/lib/hadoop/contrib/streaming/hadoop-streaming-<version>.jar
10. The Curse of Dimensionality
[Chart: Volume of the Unit Ball vs. Dimensionality]
• The volume of the unit sphere tends towards 0 as the dimensionality of hyperspace increases
• Intuitively, this means that there is more "slop room" for your dividing hyperplane to fall into
• The amount of data we need to train a model rises with the feature space, tending towards infinity, making the problem untenable
• With a small feature space, there is no need for lots of data
• Thus, there is little point in using Hadoop to implement many classic machine learning models
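The vanishing-volume claim is easy to verify numerically: the volume of the n-dimensional unit ball has the closed form π^(n/2) / Γ(n/2 + 1). A minimal base-R check (the function name is mine):

```r
# Volume of the n-dimensional unit ball: pi^(n/2) / Gamma(n/2 + 1)
unit_ball_volume <- function(n) pi^(n / 2) / gamma(n / 2 + 1)

# Volume grows up to n = 5, then collapses towards zero
sapply(c(1, 2, 3, 5, 10, 20, 50), unit_ball_volume)
```

By n = 20 the volume is already below 0.03, which is the geometric intuition behind needing exponentially more data as the feature space grows.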
11. The Hadoop Data Science Flow
• Join
• Sample
• Model
• Repeat
12. Join
• Put two pieces of data together using a common key
• Scenario:
  – Data is in two flat files in HDFS
  – Turn rows into rows of key-value pairs, where the key is the join key and the value is the rest of the row
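A minimal sketch of that mapping step, assuming rmr 1.3's keyval() helper and a hypothetical comma-delimited row layout (the field positions are illustrative, not from the deck):

```r
library(rmr)  # rmr 1.3

# Hypothetical layout: each row is "joinkey,field1,field2,..."
# Emit the join key as the map key; the rest of the row is the value.
join.map <- function(k, row) {
  parts <- strsplit(row, ",")[[1]]
  keyval(parts[1], parts[-1])
}

# Running this map over both flat files sends all rows that share a
# join key to the same reducer, where they can be combined into
# joined records.
```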
13. Sample
• Take a sample of your (maybe) joined data
• The most common method is probabilistic
• Numerous other techniques can leverage partitions and the randomness of the key hash
• Scenarios (a precursor for):
  – Supervised learning/classification
  – Unsupervised learning/clustering
  – Regression
  – Distribution modeling
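The probabilistic method can be sketched as a map-only job: keep each record with some probability and drop the rest. This assumes rmr 1.3 semantics where a map function that returns NULL emits nothing; the 10% rate is illustrative:

```r
library(rmr)  # rmr 1.3

# Keep each record with probability 0.1; returning NULL drops it.
sample.map <- function(k, v) {
  if (runif(1) < 0.1) keyval(k, v) else NULL
}

# sampled = mapreduce(input = joined.data, map = sample.map)
```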
14. Model
• Supervised learning: I want to predict something and I already know (some) of the answers. Also called classification and binary classification
• Unsupervised learning: I want to find natural groupings in the data that I might not have known about
• Regression, probability modeling – I want to fit a curve to my data
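Each of the three model families above has a one-line counterpart in base R; for example, on built-in datasets:

```r
# Supervised: logistic regression predicting a known binary label
fit <- glm(am ~ mpg + wt, data = mtcars, family = binomial("logit"))

# Unsupervised: find natural groupings with no labels
groups <- kmeans(iris[, 1:4], centers = 3)

# Regression: fit a curve to the data
curve.fit <- lm(dist ~ speed, data = cars)
```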
15. Repeat
• Gain insight about the data
• Change your procedure (select only outliers, etc.)
• Gain more insight
16. RHadoop Impact: Join, Sample
• Work totally in R
• Execute large, complex joins such as cross joins
17. RHadoop Impact: Model
• Most algorithms work perfectly well (or better) over a sample of the data
• Train and cross-validate a large number of models in parallel
• Perform model selection in the reduce phase
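One way this parallel train-and-select pattern could look, sketched against the rmr 1.3 API (the key name, parameter grid, and use of e1071's cross-validated accuracy are my assumptions, not from the deck):

```r
library(rmr)     # rmr 1.3
library(e1071)   # provides svm()

# Map: train one SVM per cost parameter; emit every candidate under a
# single key so they all meet in one reducer.
train.map <- function(k, cost) {
  m <- svm(Species ~ ., iris, cost = cost, cross = 5)
  keyval("best", list(cost = cost, accuracy = m$tot.accuracy, model = m))
}

# Reduce: model selection - keep the candidate with the highest
# cross-validated accuracy.
select.reduce <- function(k, candidates) {
  accs <- sapply(candidates, function(cand) cand$accuracy)
  keyval(k, candidates[[which.max(accs)]])
}

# best = from.dfs(mapreduce(to.dfs(list(0.1, 1, 10, 100)),
#                           map = train.map, reduce = select.reduce))
```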
18. RHadoop API
mapreduce(
  input,
  output = NULL,
  map = to.map(identity),
  reduce = NULL,
  combine = NULL,
  reduce.on.data.frame = FALSE,
  input.format = "native",
  output.format = "native",
  vectorized = list(map = FALSE, reduce = FALSE),
  structured = list(map = FALSE, reduce = FALSE),
  backend.parameters = list(),
  verbose = TRUE)
19. RHadoop API
rmr.options.set(backend = c("hadoop", "local"),
  profile.nodes = NULL, vectorized.nrows = NULL)

to.dfs(object, output = dfs.tempfile(),
  format = "native")

from.dfs(input, format = "native",
  to.data.frame = FALSE, vectorized = FALSE,
  structured = FALSE)
20. Doing Things the R Way
• Objects
  – my_car = list(color="green", model="volt")
• Transforming a vector (list), iterating
  – lapply/sapply/tapply – functional programming constructs
• Loops (not preferred)
  – for (i in 1:100) {…}
  – Note this is the same as lapply(1:100, function(i){…})
• Other control structures – basically as you would expect
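The loop/lapply equivalence noted above, spelled out in runnable base R:

```r
# The loop...
squares <- numeric(100)
for (i in 1:100) squares[i] <- i^2

# ...and its functional equivalent
squares2 <- sapply(1:100, function(i) i^2)

identical(squares, squares2)  # TRUE
```

The functional form matters for rmr, whose map and reduce arguments are exactly this kind of anonymous function.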
21. Vectors in R
• R helps you! O_o
• Every object has a mode and a length, and hence can be interpreted as some sort of vector – even primitives!
• Even primitives such as strings or integers are stored in a vector of length 1, never free-standing
• There are lots of types of vectors
  – Lists (think linked list)
  – Atomic vectors (think array)
  http://cran.r-project.org/doc/manuals/R-intro.html#The-intrinsic-attributes-mode-and-length
• Type coercion usually works the way you would expect
  – But… you may find yourself using as.list() or as.vector(), or doing manual coercion frequently, depending on what libraries you're using, due to modes not matching
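A quick demonstration of the mode/length and coercion points above:

```r
x <- "hello"
mode(x)      # "character"
length(x)    # 1 - even a single string is a vector of length 1

v <- c(1, "two", TRUE)   # mixed types coerce to a common mode
mode(v)                  # "character"

# Manual coercion out of a list, as often needed with some libraries
as.numeric(unlist(list("1", "2", "3")))
```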
22. Example – Fake Data
fakedata = data.frame(
  x = c(rnorm(100)*.25, rep(.75,100) + rnorm(100)*.25),
  y = c(rnorm(100), rep(1,100) + rnorm(100)),
  z = c(rep(0,100), rep(1,100)))

plot(fakedata[,"x"], fakedata[,"y"],
  col = sapply(fakedata[,"z"], function(z) ifelse(z > 0, "blue", "green")))
23. Examples – Simple Parallelism
rmr.options.set(backend="local")

ints = to.dfs(1:100)

squares = mapreduce(ints,
  map = function(k, v) keyval(NULL, v^2))

from.dfs(squares)

# notice the result will be
# keyvals
24. Examples – Trying Lots of SVM Kernels
library(e1071)  # provides svm()

kernels = to.dfs(list("linear", "polynomial", "radial", "sigmoid"))

models = from.dfs(mapreduce(kernels,
  map = function(nothing, kern)
    keyval(NULL, svm(factor(z) ~ ., fakedata, kernel = kern))))

plot(models[[1]][["val"]], fakedata)
25. Examples – Different Models
calls = to.dfs(list(
  list("glm", z ~ ., family = binomial("logit"), fakedata),
  list("svm", z ~ ., fakedata)))

models = from.dfs(mapreduce(calls,
  map = function(nothing, callsig)
    keyval(NULL, do.call(callsig[[1]], callsig[2:length(callsig)]))))

models[[1]][["val"]]