This document discusses how data science platforms can be built on cloud computing infrastructure such as Amazon Web Services (AWS). It highlights how AWS provides scalable, on-demand computing and storage resources, so that capacity can grow quickly as data and compute needs change. Example applications and customer case studies show how various organizations use AWS for large-scale data analysis, including genomics, computational fluid dynamics, and more. The document argues that distributed, programmable cloud infrastructure can support new types of data-driven science by providing massive, rapidly scaling resources.
53.
# Chef recipe: install Phusion Passenger as an Apache module
include_recipe "packages"
include_recipe "ruby"
include_recipe "apache2"

if platform?("centos", "redhat")
  if dist_only?
    # just the gem, we'll install the apache module within apache2
    package "rubygem-passenger"
    return
  else
    package "httpd-devel"
  end
else
  %w{ apache2-prefork-dev libapr1-dev }.each do |pkg|
    package pkg do
      action :upgrade
    end
  end
end

gem_package "passenger" do
  version node[:passenger][:version]
end

execute "passenger_module" do
  # pipe newlines into the interactive installer so it accepts its defaults
  command 'echo -en "\n\n\n\n" | passenger-install-apache2-module'
  # skip this step once the compiled module already exists
  creates node[:passenger][:module_path]
end
54.
# Launch a streaming word-count job flow on Elastic MapReduce with boto
import os
import time

import boto
import boto.emr
from boto.emr.bootstrap_action import BootstrapAction
from boto.emr.step import StreamingStep

# set your aws keys and S3 bucket, e.g. from environment or .boto
# (reading the bucket from an S3_BUCKET environment variable is an assumption)
AWSKEY = os.environ["AWS_ACCESS_KEY_ID"]
SECRETKEY = os.environ["AWS_SECRET_ACCESS_KEY"]
S3_BUCKET = os.environ["S3_BUCKET"]
NUM_INSTANCES = 1
OUTPUT_KEY = "/output/wordcount_output"

# Connect to Elastic MapReduce
conn = boto.connect_emr(AWSKEY, SECRETKEY)

# Install packages
bootstrap_step = BootstrapAction("download.tst",
    "s3://elasticmapreduce/bootstrap-actions/download.sh", None)

# Set up mappers & reduces
step = StreamingStep(name='Wordcount',
    mapper='s3n://elasticmapreduce/samples/wordcount/wordSplitter.py',
    cache_files=["s3n://" + S3_BUCKET + "/boto.mod#boto.mod"],
    reducer='aggregate',
    input='s3n://elasticmapreduce/samples/wordcount/input',
    output='s3n://' + S3_BUCKET + OUTPUT_KEY)

jobid = conn.run_jobflow(
    name="testbootstrap",
    log_uri="s3://" + S3_BUCKET + "/logs",
    steps=[step],
    bootstrap_actions=[bootstrap_step],
    num_instances=NUM_INSTANCES)
print("finished spawning job (note: starting still takes time)")

# job state: poll until the job flow reaches a terminal state
state = conn.describe_jobflow(jobid).state
print("job state = %s" % state)
print("job id = %s" % jobid)
while state not in (u'COMPLETED', u'FAILED', u'TERMINATED'):
    print(time.localtime())
    time.sleep(30)
    state = conn.describe_jobflow(jobid).state
    print("job state = %s" % state)
    print("job id = %s" % jobid)

print("final output can be found in s3://" + S3_BUCKET + OUTPUT_KEY)
print("try: $ s3cmd sync s3://" + S3_BUCKET + OUTPUT_KEY + " .")
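The polling loop on slide 54 can also be factored into a small helper that gives up after a deadline, so a stuck job flow does not block the script indefinitely. The following is a minimal sketch, not part of the original slides: the name `wait_for_jobflow`, its parameters, and the default timeout are illustrative assumptions, and the state is fetched through an injected callable so the helper can be exercised without an AWS connection.

```python
import time

# Job flow states after which polling should stop.
TERMINAL_STATES = ("COMPLETED", "FAILED", "TERMINATED")


def wait_for_jobflow(get_state, poll_seconds=30, timeout_seconds=3600,
                     sleep=time.sleep):
    """Poll get_state() until it returns a terminal state.

    get_state is any zero-argument callable returning the current state
    string (hypothetical hook; see the usage note below).
    Raises RuntimeError if timeout_seconds elapse first.
    """
    waited = 0
    state = get_state()
    while state not in TERMINAL_STATES:
        if waited >= timeout_seconds:
            raise RuntimeError(
                "timed out waiting for job flow; last state: %s" % state)
        sleep(poll_seconds)
        waited += poll_seconds
        state = get_state()
    return state
```

With the connection from slide 54, this could be called as `wait_for_jobflow(lambda: conn.describe_jobflow(jobid).state)`.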
55.
56. “I terminate the instance and relaunch it. That's my error handling”
Source: @jtimberman on Twitter
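The replace-rather-than-repair idea in the quote can be sketched in a few lines. This is an illustration only, not code from the talk: `ensure_healthy` and the `is_healthy`, `terminate`, and `launch` callables are hypothetical stand-ins for whatever health check and cloud API calls a real deployment would use.

```python
def ensure_healthy(instance, is_healthy, terminate, launch, max_relaunches=3):
    """Return a healthy instance, replacing unhealthy ones instead of fixing them.

    All three callables are hypothetical hooks supplied by the caller:
    is_healthy(instance) -> bool, terminate(instance), launch() -> instance.
    """
    for _ in range(max_relaunches):
        if is_healthy(instance):
            return instance
        # No in-place repair: throw the instance away and start a fresh one.
        terminate(instance)
        instance = launch()
    if is_healthy(instance):
        return instance
    raise RuntimeError(
        "instance still unhealthy after %d relaunches" % max_relaunches)
```

Capping the number of relaunches keeps a systematic failure, such as a broken machine image, from launching instances indefinitely.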
77. Linpack benchmark
880-instance CC1 cluster
Performance: 41.82 TFlops*
*#231 in Nov 2010 Top 500 rankings
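As a rough sanity check on these figures, the aggregate Linpack result can be divided by the instance count to get an average per-instance rate. The sketch below uses only the two numbers on the slide; the variable names are illustrative.

```python
# Figures taken from the slide.
TOTAL_TFLOPS = 41.82
INSTANCES = 880

# Average sustained Linpack rate per instance, in GFlops.
per_instance_gflops = TOTAL_TFLOPS * 1000.0 / INSTANCES
print("%.1f GFlops per instance" % per_instance_gflops)  # 47.5 GFlops per instance
```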
78. Credit: K. Jorissen, F. D. Villa, and J. J. Rehr
WIEN2k Parallel Performance (U. Washington)
KS for huge system at 1 k-point
H size: 56,000 (25 GB)
Runtime (16x8 processors):
  Local (Infiniband): 3h:48
  Cloud (10 Gbps): 1h:30 ($40)
VERY DEMANDING network performance
• 1200-atom unit cell; SCALAPACK+MPI diagonalization, matrix size 50k-100k
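The 25 GB figure quoted for a matrix of size 56,000 is consistent with a dense matrix of 8-byte entries. The sketch below works through that arithmetic; the dense, double-precision storage is an assumption made for illustration and is not stated on the slide, and `dense_matrix_gb` is an illustrative name.

```python
def dense_matrix_gb(n, bytes_per_entry=8):
    """Memory for a dense n x n matrix in decimal gigabytes.

    Assumes every entry is stored, at 8 bytes each by default.
    """
    return n * n * bytes_per_entry / 1e9


for n in (50000, 56000, 100000):
    print("n = %6d: %.1f GB" % (n, dense_matrix_gb(n)))
```

Under the same assumption, the 50k-100k range on the slide corresponds to roughly 20 to 80 GB per matrix, which is why the diagonalization has to be distributed across nodes and leans so heavily on the interconnect.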
124. deesingh@amazon.com
Twitter: @mndoci
http://slideshare.net/mndoci
http://mndoci.com
Inspiration and ideas from Matt Wood & Larry Lessig
Credit: Oberazzi under a CC-BY-NC-SA license