Organizations need to perform increasingly complex analysis on their data — streaming analytics, ad-hoc querying, and predictive analytics — in order to get better customer insights and actionable business intelligence. However, the growing volume, velocity, and complexity of diverse data formats make current tools inadequate or difficult to use. Apache Spark has recently emerged as the framework of choice to address these challenges. Spark is a general-purpose processing framework that follows a DAG execution model and provides high-level APIs, making it more flexible and easier to use than MapReduce. Thanks to its use of in-memory datasets (RDDs), embedded libraries, fault tolerance, and support for a variety of programming languages, Apache Spark enables developers to implement and scale far more complex big data use cases, including real-time data processing, interactive querying, graph computations, and predictive analytics. In this session, we present a technical deep dive on Spark running on Amazon EMR. You will learn why Spark is great for ad-hoc interactive analysis and real-time stream processing, how to deploy and tune scalable clusters running Spark on Amazon EMR, how to use EMRFS with Spark to query data directly in Amazon S3, and best practices and patterns for Spark on Amazon EMR.
Latest AWS Black Belt Online Seminar content: https://aws.amazon.com/jp/aws-jp-introduction/#new
Index of content from past online seminars: https://aws.amazon.com/jp/aws-jp-introduction/aws-jp-webinar-service-cut/
Organizations need to perform increasingly complex analysis on data — streaming analytics, ad-hoc querying, and predictive analytics — in order to get better customer insights and actionable business intelligence. Apache Spark has recently emerged as the framework of choice to address many of these challenges. In this session, we show you how to use Apache Spark on AWS to implement and scale common big data use cases such as real-time data processing, interactive data science, predictive analytics, and more. We will talk about common architectures, best practices to quickly create Spark clusters using Amazon EMR, and ways to integrate Spark with other big data services in AWS.
Learning Objectives:
• Learn why Spark is great for ad-hoc interactive analysis and real-time stream processing.
• How to deploy and tune scalable clusters running Spark on Amazon EMR.
• How to use EMR File System (EMRFS) with Spark to query data directly in Amazon S3.
• Common architectures to leverage Spark with Amazon DynamoDB, Amazon Redshift, Amazon Kinesis, and more.
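The EMRFS objective above can be made concrete with a short sketch. The following is a minimal, hypothetical PySpark job — the bucket name, path, and column name are placeholders, not from the source — that reads JSON directly from an s3:// URI, which EMRFS resolves to Amazon S3 objects on an EMR cluster. Because it only runs where Spark is available, the query is wrapped in a function rather than executed at import time.

```python
def count_events_by_day(s3_path="s3://example-bucket/events/"):
    """Ad-hoc query over data left in place on S3 (no copy into HDFS).

    Assumes this runs on an Amazon EMR cluster with Spark installed,
    where EMRFS maps s3:// URIs to objects in Amazon S3. Bucket and
    column names are illustrative placeholders.
    """
    from pyspark.sql import SparkSession  # provided on EMR Spark nodes

    spark = SparkSession.builder.appName("s3-adhoc-query").getOrCreate()
    events = spark.read.json(s3_path)      # read directly from S3 via EMRFS
    return (events.groupBy("event_date")   # hypothetical column
                  .count()
                  .orderBy("event_date"))
```

The point of the pattern is that nothing is staged into HDFS first: the same S3 data can be shared by many transient clusters.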
Apache Hadoop and Spark on AWS: Getting started with Amazon EMR - Pop-up Loft... (Amazon Web Services)
Amazon EMR is a managed service that makes it easy for customers to use big data frameworks and applications like Apache Hadoop, Spark, and Presto to analyze data stored in HDFS or on Amazon S3, Amazon’s highly scalable object storage service. In this session, we will introduce Amazon EMR and the greater Apache Hadoop ecosystem, and show how customers use them to implement and scale common big data use cases such as batch analytics, real-time data processing, interactive data science, and more. Then, we will walk through a demo to show how you can start processing your data at scale within minutes.
AWS April 2016 Webinar Series - Best Practices for Apache Spark on AWS (Amazon Web Services)
Organizations need to perform increasingly complex analysis on data — streaming analytics, ad-hoc querying, and predictive analytics — in order to get better customer insights and actionable business intelligence. Apache Spark has recently emerged as the framework of choice to address many of these challenges.
In this webinar, we show you how to use Apache Spark on AWS to implement and scale common big data use cases such as real-time data processing, interactive data science, predictive analytics, and more. We will talk about common architectures and best practices to quickly create Spark clusters using Amazon Elastic MapReduce (EMR), and ways to use Spark with Amazon Redshift, Amazon DynamoDB, Amazon Kinesis, and other big data applications in the Apache Hadoop ecosystem.
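Creating a Spark cluster on EMR typically comes down to a single CLI call. The sketch below (stdlib Python only) assembles such an `aws emr create-cluster` invocation; the release label, instance type, node count, and log bucket are illustrative assumptions, not values from the source, and the resulting command still needs the AWS CLI and valid credentials to actually launch anything.

```python
import shlex


def emr_create_cluster_cmd(name, instance_type="m5.xlarge", count=3,
                           release="emr-6.15.0",
                           log_uri="s3://example-bucket/logs/"):
    """Build an `aws emr create-cluster` command that launches Spark.

    All defaults (release label, instance type, bucket) are illustrative
    placeholders; adjust them to your account and region.
    """
    args = [
        "aws", "emr", "create-cluster",
        "--name", name,
        "--release-label", release,        # EMR release to run
        "--applications", "Name=Spark",    # install Spark on the cluster
        "--instance-type", instance_type,
        "--instance-count", str(count),
        "--use-default-roles",             # default EMR service/EC2 roles
        "--log-uri", log_uri,              # cluster logs land in S3
    ]
    return " ".join(shlex.quote(a) for a in args)


print(emr_create_cluster_cmd("spark-demo"))
```

Generating the command programmatically like this makes it easy to template cluster launches for transient, per-job clusters.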
Learning Objectives:
Learn why Spark is great for ad-hoc interactive analysis and real-time stream processing
How to deploy and tune scalable clusters running Spark on Amazon EMR
How to use EMR File System (EMRFS) with Spark to query data directly in Amazon S3
Common architectures to leverage Spark with DynamoDB, Redshift, Kinesis, and more
Organizations need to perform increasingly complex analysis on data — streaming analytics, ad-hoc querying, and predictive analytics — in order to get better customer insights and actionable business intelligence. Apache Spark has recently emerged as the framework of choice to address many of these challenges. In this session, we show you how to use Apache Spark on AWS to implement and scale common big data use cases such as real-time data processing, interactive data science, predictive analytics, and more. We will talk about common architectures, best practices to quickly create Spark clusters using Amazon EMR, and ways to integrate Spark with other big data services in AWS.
Learning Objectives:
• Learn why Spark is great for ad-hoc interactive analysis and real-time stream processing.
• How to deploy and tune scalable clusters running Spark on Amazon EMR.
• How to use EMR File System (EMRFS) with Spark to query data directly in Amazon S3.
• Common architectures to leverage Spark with Amazon DynamoDB, Amazon Redshift, Amazon Kinesis, and more.
Data Science & Best Practices for Apache Spark on Amazon EMR (Amazon Web Services)
Data Science with Spark on Amazon EMR - Pop-up Loft Tel Aviv (Amazon Web Services)
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMR (Amazon Web Services)
Amazon EMR is a managed service that lets you process and analyze extremely large data sets using the latest versions of over 15 open-source frameworks in the Apache Hadoop and Spark ecosystems. In this session, we introduce you to Amazon EMR design patterns such as using Amazon S3 instead of HDFS, taking advantage of both long and short-lived clusters, and other Amazon EMR architectural best practices. We talk about how to scale your cluster up or down dynamically and introduce you to ways you can fine-tune your cluster. We also share best practices to keep your Amazon EMR cluster cost-efficient. Finally, we dive into some of our recent launches to keep you current on our latest features. This session will feature Asurion, a provider of device protection and support services for over 280 million smartphones and other consumer electronics devices.
by Dario Rivera, Solutions Architect, AWS
Learn how to use Apache Spark on AWS to implement and scale common big data use cases such as real-time data processing, interactive data science, and more.
Apache Spark is the fast, open-source engine that is rapidly becoming the most popular choice for big data processing. Running it on AWS is especially powerful, as you get scale, elasticity, and agility from the AWS platform coupled with the rich functionality that Spark provides. In this session we will explore how to get the most out of Spark on AWS.
Speaker: Nam Je Cho, Enterprise Solutions Architect, Amazon Web Services
(BDT208) A Technical Introduction to Amazon Elastic MapReduce (Amazon Web Services)
Amazon EMR provides a managed framework that makes it easy, cost-effective, and secure to run data processing frameworks such as Apache Hadoop, Apache Spark, and Presto on AWS. In this session, you learn the key design principles behind running these frameworks in the cloud and the feature set that Amazon EMR offers. We discuss the benefits of decoupling compute and storage, and strategies to take advantage of the scale and parallelism that the cloud offers while lowering costs. Additionally, you hear from AOL's Senior Software Engineer on how they used these strategies to migrate their Hadoop workloads to the AWS cloud and lessons learned along the way.
In this session, you learn the benefits of decoupling storage and compute and allowing them to scale independently; how to run Hadoop, Spark, Presto, and other supported Hadoop applications on Amazon EMR; how to use Amazon S3 as a persistent data store and process data directly from Amazon S3; deployment strategies and how to avoid common mistakes when deploying at scale; and how to use Spot Instances to scale your transient infrastructure effectively.
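The Spot-instance point above is easy to quantify. The sketch below is a toy cost model, assuming an m5.xlarge-like on-demand price of $0.192/hr and a 70% Spot discount — both illustrative numbers, not from the source — comparing an all-on-demand cluster with one that keeps two on-demand core nodes and runs eight task nodes on Spot.

```python
def cluster_hourly_cost(on_demand_nodes, spot_nodes,
                        on_demand_price=0.192, spot_discount=0.70):
    """Toy hourly cost for a mixed on-demand/Spot cluster.

    Prices are illustrative assumptions: $0.192/hr on-demand and a
    70% Spot discount; real Spot prices vary by region and demand.
    """
    spot_price = on_demand_price * (1 - spot_discount)
    return on_demand_nodes * on_demand_price + spot_nodes * spot_price


all_on_demand = cluster_hourly_cost(10, 0)  # 10 on-demand nodes
mixed = cluster_hourly_cost(2, 8)           # 2 on-demand core + 8 Spot task
savings = 1 - mixed / all_on_demand
print(f"hourly: ${mixed:.4f} vs ${all_on_demand:.2f} ({savings:.0%} saved)")
```

Under these assumptions, shifting 8 of 10 nodes to Spot cuts the hourly bill by more than half, which is why the session recommends Spot capacity for transient infrastructure.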
Introduction to Amazon EMR design patterns such as using Amazon S3 instead of HDFS, taking advantage of Spot EC2 instances to reduce costs, and other Amazon EMR architectural best practices.
(BDT303) Running Spark and Presto on the Netflix Big Data Platform (Amazon Web Services)
In this session, we discuss how Spark and Presto complement the Netflix big data platform stack that started with Hadoop, and the use cases that Spark and Presto address. Also, we discuss how we run Spark and Presto on top of the Amazon EMR infrastructure; specifically, how we use Amazon S3 as our data warehouse and how we leverage Amazon EMR as a generic framework for data-processing cluster management.
AWS re:Invent 2016: Workshop: Stretching Scalability: Doing more with Amazon ... (Amazon Web Services)
Easy scalability is a powerful feature of Amazon Aurora. Scalability means being able to grow or shrink with demand, and Aurora lets you achieve this by scaling the database instance up or down and by adding or removing read replicas. Scaling across regions brings additional resilience to your architectures and can boost application performance through geographic proximity. You can perform all of these scaling operations through the Aurora console, and you can automate instance and read scaling with Lambda functions or scripts based on usage patterns you define. You can extend the automation by feeding your database usage data from Aurora Enhanced Monitoring into machine learning to drive more sophisticated, predictive automation. In this session we do a deep dive into how scalability works in Aurora and how to make the best use of it to reduce cost, increase application performance, and architect resilient applications.
You should have good database knowledge and at least some experience with Amazon RDS or Amazon Aurora and should bring your own laptop.
A use case of Apache Spark at Windward Ltd.
Video lecture on YouTube: https://www.youtube.com/watch?v=rPO6P5YIKUI
The talk presents the company's domain, gives a short introduction to Apache Spark, and walks through the toolbox used at Windward Ltd. to build a working production Spark data pipeline.
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
2. What to Expect from the Session
• Data science with Apache Spark
• Running Spark on Amazon EMR
• Customer use cases and architectures
• Best practices for running Spark
• Demo: Using Apache Zeppelin to analyze US domestic flights dataset
4. Spark is fast
[Diagram: DAG execution – map, join, filter, and groupBy operations on RDDs A–F across Stages 1–3, with cached partitions]
• Massively parallel
• Uses DAGs instead of map-reduce for execution
• Minimizes I/O by storing data in RDDs in memory
• Partitioning-aware to avoid network-intensive shuffle
8. Use DataFrames to easily interact with data
• Distributed collection of data organized in columns
• An extension of the existing RDD API
• Optimized for query execution
12. Use DataFrames for machine learning
• Spark ML libraries (replacing MLlib) use DataFrames as input/output for models
• Create ML pipelines with a variety of distributed algorithms
13. Create DataFrames on streaming data
• Access data in Spark Streaming DStream
• Create SQLContext on the SparkContext used for the Spark Streaming application for ad hoc queries
• Incorporate DataFrame in Spark Streaming application
14. Use R to interact with DataFrames
• SparkR package for using R to manipulate DataFrames
• Create SparkR applications or interactively use the SparkR shell (no Zeppelin support yet – ZEPPELIN-156)
• Comparable performance to Python and Scala DataFrames
15. Spark SQL
• Seamlessly mix SQL with Spark programs
• Uniform data access
• Hive compatibility – run Hive queries without modifications using HiveContext
• Connect through JDBC/ODBC
17. Focus on deriving insights from your data instead of manually configuring clusters
• Easy to install and configure Spark
• Secured
• Spark submit or use Zeppelin UI
• Quickly add and remove capacity
• Hourly, reserved, or EC2 Spot pricing
• Use S3 to decouple compute and storage
18. Launch the latest Spark version
July 15 – Spark 1.4.1 GA release
July 24 – Spark 1.4.1 available on Amazon EMR
September 9 – Spark 1.5.0 GA release
September 30 – Spark 1.5.0 available on Amazon EMR
< 3 week cadence with latest open source release
19. Amazon EMR runs Spark on YARN
• Dynamically share and centrally configure the same pool of cluster resources across engines
• Schedulers for categorizing, isolating, and prioritizing workloads
• Choose the number of executors to use, or allow YARN to choose (dynamic allocation)
• Kerberos authentication
[Diagram: stack – storage (S3, HDFS); YARN cluster resource management; batch (MapReduce) and in-memory (Spark) engines; applications (Pig, Hive, Cascading, Spark Streaming, Spark SQL)]
20. Create a fully configured cluster in minutes
• AWS Management Console
• AWS Command Line Interface (CLI)
• Or use an AWS SDK directly with the Amazon EMR API
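As a sketch of the CLI path – the cluster name, key name, instance type, and count below are illustrative placeholders, not values from this deck:

```shell
# Hypothetical example: launch an EMR cluster with Spark installed.
# --release-label selects the EMR release (emr-4.x and later).
aws emr create-cluster \
  --name "SparkCluster" \
  --release-label emr-4.1.0 \
  --applications Name=Spark \
  --ec2-attributes KeyName=myKey \
  --instance-type r3.4xlarge \
  --instance-count 4 \
  --use-default-roles
```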
22. Many storage layers to choose from
• Amazon S3 – via the EMR File System (EMRFS)
• Amazon DynamoDB – via the EMR-DynamoDB connector
• Amazon RDS – via the JDBC Data Source with Spark SQL
• Amazon Kinesis – via streaming data connectors
• Elasticsearch – via the Elasticsearch connector
• Amazon Redshift – via Amazon Redshift COPY from HDFS
23. Decouple compute and storage by using S3 as your data layer
• S3 is designed for 11 9’s of durability and is massively scalable
[Diagram: multiple Amazon EMR clusters (EC2 instance memory, HDFS) all reading from a shared Amazon S3 data layer]
24. Easy to run your Spark workloads
Submit a Spark application to Amazon EMR via:
• The Amazon EMR Step API
• SSH to the master node (Spark Shell)
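A hedged sketch of the Step API route – the cluster ID, class name, and S3 path are placeholders:

```shell
# Hypothetical example: submit a Spark application as an EMR step.
aws emr add-steps \
  --cluster-id j-XXXXXXXXXXXXX \
  --steps Type=Spark,Name="MySparkApp",ActionOnFailure=CONTINUE,Args=[--class,com.example.MyApp,s3://my-bucket/my-app.jar]
```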
25. Secure Spark clusters – encryption at rest
On-cluster:
• HDFS transparent encryption (AES 256) [new on release emr-4.1.0]
• Local disk encryption for temporary files using LUKS encryption via bootstrap action
Amazon S3:
• EMRFS support for Amazon S3 client-side and server-side encryption (AES 256)
26. Secure Spark clusters – encryption in flight
Internode communication on-cluster:
• Blocks are encrypted in-transit in HDFS when using transparent encryption
• Spark’s Broadcast and FileServer services can use SSL; BlockTransferService (for shuffle) can’t use SSL (SPARK-5682)
S3 to Amazon EMR cluster:
• Secure communication with SSL
• Objects encrypted over the wire if using client-side encryption
27. Secure Spark clusters – additional features
Permissions:
• Cluster level: IAM roles for the Amazon EMR service and the cluster
• Application level: Kerberos (Spark on YARN only)
• Amazon EMR service level: IAM users
Access: VPC, security groups
Auditing: AWS CloudTrail
33. • Using correct instance
• Understanding Executors
• Sizing your executors
• Dynamic allocation on YARN
• Understanding storage layers
• File formats and compression
• Boost your performance
• Data serialization
• Avoiding shuffle
• Managing partitions
• RDD Persistence
• Using Zeppelin notebook
34. What does Spark need?
• Memory – lots of it!!
• Network
• CPU
• Horizontal Scaling
Workflow-to-resource mapping:
• Machine learning → CPU
• ETL → I/O
35. Try different configurations to find your optimal architecture.
Choose your instance types:
• CPU – c1 family, c3 family, cc1.4xlarge, cc2.8xlarge
• Memory – m2 family, r3 family, cr1.8xlarge
• Disk/IO – d2 family, i2 family
• General – m1 family, m3 family
Match families to workloads: batch process, machine learning, interactive analysis, large HDFS.
41. Inside Spark Executor on YARN – selecting the number of executor cores:
• Leave 1 core for the OS and other activities
• 4-5 cores per executor gives good performance
• Each executor can run up to 4-5 tasks, i.e. 4-5 threads for read/write operations to HDFS
42. Inside Spark Executor on YARN – selecting the number of executors:
--num-executors or spark.executor.instances
• Number of executors per node = (number of cores on node − 1 for OS) ÷ (number of tasks per executor)
• (16 − 1) ÷ 5 = 3 executors per node
Example instance – r3.4xlarge: 16 vCPU, 122 GB memory, 1 × 320 GB SSD, high networking
43. Inside Spark Executor on YARN – selecting the number of executors:
--num-executors or spark.executor.instances
• (16 − 1) ÷ 5 = 3 executors per node
• With 6 instances: number of executors = 3 × 6 − 1 = 17 (one slot reserved, e.g. for the YARN ApplicationMaster)
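The sizing rule on these slides can be written down as a small helper. This is pure Python added for this write-up, not code from the deck:

```python
def executors_per_node(cores_per_node, tasks_per_executor=5):
    # Executors per node = (cores on node - 1 reserved for the OS)
    #                      / (tasks per executor)
    return (cores_per_node - 1) // tasks_per_executor

def total_executors(per_node, num_nodes):
    # Total executors across the cluster; the -1 reserves one slot
    # (e.g., for the YARN ApplicationMaster).
    return per_node * num_nodes - 1

# r3.4xlarge (16 vCPU), 5 tasks per executor -> 3 executors per node
# 6 instances -> 3 * 6 - 1 = 17 executors
print(executors_per_node(16), total_executors(3, 6))  # 3 17
```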
44. Inside Spark Executor on YARN
YARN container – controls the max sum of memory used by the container (the max container size on the node)
yarn.nodemanager.resource.memory-mb
Default: 116 GB · Config file: yarn-site.xml
45. Inside Spark Executor on YARN
Executor space – where the Spark executor runs, inside the YARN container (bounded by the max container size on the node)
46. Inside Spark Executor on YARN
Executor memory overhead – off-heap memory (VM overheads, interned strings, etc.)
spark.yarn.executor.memoryOverhead = executorMemory * 0.10
Config file: spark-defaults.conf
47. Inside Spark Executor on YARN
Spark executor memory – amount of memory to use per executor process
spark.executor.memory
Config file: spark-defaults.conf
48. Inside Spark Executor on YARN
Shuffle memory fraction – fraction of Java heap to use for aggregation and cogroups during shuffles
spark.shuffle.memoryFraction (default: 0.2)
49. Inside Spark Executor on YARN
Storage memory fraction – fraction of Java heap to use for Spark’s memory cache
spark.storage.memoryFraction (default: 0.6)
50. Inside Spark Executor on YARN
--executor-memory or spark.executor.memory
Executor memory = max container size ÷ number of executors per node
Config file: spark-defaults.conf
51. Inside Spark Executor on YARN
--executor-memory or spark.executor.memory
Executor memory = 116 GB ÷ 3 ≈ 38 GB
Config file: spark-defaults.conf
52. Inside Spark Executor on YARN
--executor-memory or spark.executor.memory
Memory overhead = 38 × 0.10 = 3.8 GB
Config file: spark-defaults.conf
53. Inside Spark Executor on YARN
--executor-memory or spark.executor.memory
Executor memory = 38 − 3.8 ≈ 34 GB
Config file: spark-defaults.conf
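Slides 50–53 walk through one calculation; captured as a helper (pure Python, illustrative only):

```python
def executor_memory_gb(max_container_gb, executors_per_node,
                       overhead_fraction=0.10):
    # Executor memory = max container size / executors per node,
    # then subtract spark.yarn.executor.memoryOverhead (10% by default).
    per_executor = max_container_gb // executors_per_node  # 116 // 3 = 38
    overhead = per_executor * overhead_fraction            # 38 * 0.10 = 3.8
    return per_executor - overhead                         # ~34 GB

print(round(executor_memory_gb(116, 3), 1))  # 34.2
```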
57. Dynamic Allocation on YARN
… allows your Spark applications to scale up based on demand and scale down when not required.
Remove idle executors, request more on demand.
58. Dynamic Allocation on YARN
Scaling up on executors:
• Request when you want the job to complete faster
• Idle resources on cluster
• Exponential increase in executors over time
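Dynamic allocation is turned on through standard Spark properties; a sketch (the min/max values and JAR name are illustrative):

```shell
# Enable dynamic allocation; the external shuffle service is required
# so shuffle files survive executor removal.
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.maxExecutors=20 \
  my-app.jar
```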
61. Compressions
• Always compress data files on Amazon S3
• Reduces storage cost
• Reduces bandwidth between Amazon S3 and Amazon EMR
• Speeds up your job
62. Compressions
Compression types:
– Some are fast BUT offer less space reduction
– Some are space efficient BUT slower
– Some are splittable and some are not

Algorithm | % Space Remaining | Encoding Speed | Decoding Speed
GZIP      | 13%               | 21 MB/s        | 118 MB/s
LZO       | 20%               | 135 MB/s       | 410 MB/s
Snappy    | 22%               | 172 MB/s       | 409 MB/s
63. Compressions
• If you are time-sensitive, a faster compression is a better choice
• If you have a large amount of data, use a space-efficient compression
• If you don’t care, pick GZIP
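The speed/space trade-off is easy to feel with Python’s stdlib codecs – here zlib at level 1 stands in for a "fast" codec, while Spark itself would use Hadoop codecs such as Snappy or LZO:

```python
import gzip
import zlib

# Highly repetitive data, similar to a CSV of flight records.
raw = b"flight,origin,dest\n" * 10_000

small = gzip.compress(raw)          # space-efficient, slower to encode
fast = zlib.compress(raw, level=1)  # faster to encode, less reduction

print(len(raw), len(small), len(fast))
```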
65. Data Serialization
• Data is serialized when cached or shuffled
• Default: Java serializer
[Diagram: Spark executors serializing data between memory and disk]
66. Data Serialization
• Data is serialized when cached or shuffled; default: Java serializer
• Kryo serialization (10x faster than Java serialization)
  – Does not support all Serializable types
  – Register the class in advance
Usage – set in SparkConf:
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
67. Spark doesn’t like to Shuffle
• Shuffling is expensive:
  – Disk I/O
  – Data serialization
  – Network I/O
  – Spill to disk
  – Increased garbage collection
• Use aggregateByKey() instead of your own aggregator
Usage:
myRDD.aggregateByKey(0)((acc, v) => acc + v.toInt, (a, b) => a + b).collect
• Apply filters earlier on the data
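Why aggregateByKey() helps: it combines values per key on each partition before the shuffle, so only one record per key per partition crosses the network. A pure-Python sketch of that idea (not Spark’s actual implementation):

```python
from collections import defaultdict

def local_aggregate(partition, seq_op, zero):
    # Map-side combine: one accumulated value per key per partition.
    acc = defaultdict(lambda: zero)
    for key, value in partition:
        acc[key] = seq_op(acc[key], value)
    return dict(acc)

def merge_partials(partials, comb_op):
    # Reduce-side merge of the small per-partition results.
    out = {}
    for partial in partials:
        for key, value in partial.items():
            out[key] = comb_op(out[key], value) if key in out else value
    return out

partitions = [[("a", 1), ("b", 2), ("a", 3)], [("a", 4), ("b", 5)]]
partials = [local_aggregate(p, lambda acc, v: acc + v, 0) for p in partitions]
print(merge_partials(partials, lambda x, y: x + y))  # {'a': 8, 'b': 7}
```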
68. Parallelism & Partitions
spark.default.parallelism (config file: spark-defaults.conf)
• getNumPartitions()
• If you have >10K tasks, then it’s good to coalesce
• If you are not using all the slots on the cluster, repartition can increase parallelism
• Aim for 2-3 tasks per CPU core in your cluster
69. RDD Persistence
• Caching or persisting a dataset in memory
• Methods: cache(), persist()
• Small RDD → MEMORY_ONLY
• Big RDD → MEMORY_ONLY_SER (CPU intensive)
• Don’t spill to disk
• Use replicated storage for faster recovery