VMworld 2013
Abhishek Kashyap, Pivotal
Kevin Leong, VMware
Learn more about VMworld and register at http://www.vmworld.com/index.jspa?src=socmed-vmworld-slideshare
The Cisco Open SDN Controller is a commercial distribution of OpenDaylight that delivers business agility through automation of standards-based network infrastructure.
Built as a highly scalable software-defined networking (SDN) platform, the Open SDN Controller abstracts away the complexity of managing heterogeneous networks to improve service delivery and reduce operating costs.
The controller exposes REST APIs that allow other applications to take advantage of the controller's capabilities and unlock the power of the underlying network infrastructure, and Java APIs that allow for the creation of new network services.
This session will present the basic constructs of the controller and the capabilities of the REST and Java APIs, demonstrating how the Open SDN Controller improves service delivery and reduces operating costs.
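As a sketch of what calling the controller's REST APIs might look like, the snippet below builds an authenticated request against a RESTCONF topology endpoint in the OpenDaylight style. The hostname, port, and credentials are illustrative assumptions, not values from this session.

```python
# Hypothetical sketch: querying an OpenDaylight-based controller's RESTCONF
# API for the operational network topology. Host, port, and credentials
# are assumptions for illustration only.
import base64
import urllib.request

def topology_request(host: str, port: int = 8181,
                     user: str = "admin",
                     password: str = "admin") -> urllib.request.Request:
    """Build an authenticated GET request for the operational topology."""
    url = (f"http://{host}:{port}"
           "/restconf/operational/network-topology:network-topology")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={
        "Authorization": f"Basic {token}",
        "Accept": "application/json",
    })

req = topology_request("controller.example.com")
print(req.full_url)
# urllib.request.urlopen(req) would return the topology as JSON
# when pointed at a live controller.
```

The request is only constructed here, not sent; against a real deployment the same pattern applies to the other REST resources the controller exposes.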
Introduction to HDInsight, the Hadoop on Windows Azure service, including using the interactive console with JavaScript and running WordCount via other methods (Streaming, Hive, etc.)
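The WordCount example mentioned above can be sketched in the Hadoop Streaming style, where a mapper emits (word, 1) pairs and a reducer sums them. In a real streaming job these would read stdin and write stdout; here they are plain functions so the logic is easy to follow.

```python
# Minimal WordCount in the Hadoop Streaming style: mapper emits (word, 1)
# pairs, reducer sums counts per word.
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    # Emit one (word, 1) pair per token, lowercased for grouping.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reducer(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    # Sum the counts for each word.
    counts: Counter = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

result = reducer(mapper(["Hello world", "hello HDInsight"]))
print(result)  # {'hello': 2, 'world': 1, 'hdinsight': 1}
```

The same mapper/reducer pair, split into two stdin-reading scripts, is what a Streaming job on an HDInsight cluster would run.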
YARN Containerized Services: Fading the Lines Between On-Prem and Cloud (DataWorks Summit)
Apache Hadoop YARN is the modern distributed operating system for big data applications. In Apache Hadoop 3.1.0, YARN added a service framework that supports long-running services. This new capability goes hand in hand with the recent improvements in YARN to support Docker containers. Together these features have made it significantly easier to bring new applications and services to YARN.
In this talk you will learn about the YARN service framework, its new containerization capabilities, and how it lays the foundation for a hybrid and uniform architecture for compute and storage across on-prem and multi-cloud environments. This will include examples highlighting how easy it is to bring applications to the YARN service framework as well as how to containerize applications.
Here's what to expect in this talk:
- Motivation for YARN service framework and containerization
- YARN service framework overview
- YARN service examples
- Containerization overview
- Containerization for Big Data and non-Big-Data workloads (wait, that's everything)
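The service framework outlined above accepts a JSON service definition (a "Yarnfile") describing components, container counts, and Docker artifacts. The sketch below builds one such definition in the Hadoop 3.1 YARN Services API shape; the image name, launch command, and resource sizes are illustrative assumptions. A real spec would be POSTed to the Resource Manager's `/app/v1/services` endpoint.

```python
# A sketch of a YARN service definition ("Yarnfile") in the shape used by
# the Hadoop 3.1 YARN Services API. Image, command, and sizes are
# illustrative assumptions.
import json

service_spec = {
    "name": "sleeper-service",
    "version": "1.0.0",
    "components": [
        {
            "name": "sleeper",
            "number_of_containers": 2,
            # Docker artifact enables the containerization path described
            # in the talk.
            "artifact": {"id": "library/ubuntu:latest", "type": "DOCKER"},
            "launch_command": "sleep 900000",
            "resource": {"cpus": 1, "memory": "256"},
        }
    ],
}

payload = json.dumps(service_spec, indent=2)
print(payload)
```

Submitting `payload` to the Resource Manager would ask YARN to keep two such containers running as a long-lived service.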
Key trends in Big Data and new reference architecture from Hewlett Packard En... (Ontico)
The rapid development of Big Data processing tools is giving rise to new approaches to improving performance. Key new technologies in Hadoop 2.0, such as YARN labeling and storage tiering, are already in use at Yahoo and eBay. These new technologies open the way to serious efficiency gains in IT infrastructure for Hadoop, achieving performance improvements of several tens of percent while simultaneously reducing memory and power consumption.
HP's reference architecture for Hadoop, the HP Big Data Reference Architecture, proposes using specialized HP Moonshot "microservers" together with high-density HP Apollo storage nodes to achieve the best hardware utilization figures available for Hadoop today.
Big-Data-as-a-Service (BDaaS) in an enterprise environment requires meeting the often contradictory goals of (1) providing your data scientists, analysts, and data engineers with a self-service consumption model; (2) delivering agile and scalable on-demand infrastructure for the rapidly evolving ecosystem of big data frameworks and application software; while (3) ensuring enterprise-grade capabilities for isolation, security, monitoring, etc.
In this presentation at our BDaaS meetup in Santa Clara, Tom Phelan (chief architect and co-founder of BlueData) reviewed these goals and how to resolve the potential contradictions. He also discussed the infrastructure, application, user experience, security, and maintainability considerations required before selecting (or designing and building) a Big-Data-as-a-Service platform for an enterprise big data deployment.
More info on this BDaaS meetup can be found at: http://www.meetup.com/Big-Data-as-a-Service/events/233999817
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud (Cloudera, Inc.)
3 Things to Learn About:
*On-premises versus the cloud
*Design & benefits of real-time operational data in the cloud
*Best practices and architectural considerations
Presentation given for the SQLPass community at SQLBits XIV in London. The presentation is an overview of the performance improvements brought to Hive by the Stinger initiative.
What the Enterprise Requires - Business Continuity and Visibility (Cloudera, Inc.)
Cloudera Enterprise BDR delivers centralized disaster recovery for data and metadata, enabling you to prepare for disaster by moving data to your secondary site automatically. Cloudera Navigator 1.0 provides data governance capabilities such as verifying access privileges and auditing access to all data stored in Hadoop, which are critical for customers that are in highly regulated industries and have stringent compliance requirements.
This presentation will teach you how to:
- Centrally configure and manage replication workflows for files (HDFS) and metadata (Hive)
- Consistently meet or exceed SLAs and RTOs through simplified management and process automation
- Track access permissions and actual accesses to all data objects in Hive, HBase, and HDFS
- Answer the questions:
- Who has access to which data object(s)?
- Which data objects were accessed by a user?
- When was a data object accessed, and by whom?
- What data assets were accessed using a service?
- Which device was used for access?
The Fundamentals Guide to HDP and HDInsight (Gert Drapers)
This session will give you an architectural overview and an introduction to the inner workings of HDP 2.0 (http://hortonworks.com/products/hdp-windows/) and HDInsight. The world has embraced the Hadoop toolkit to solve data problems ranging from ETL and data warehouses to event processing pipelines. As Hadoop consists of many components, services, and interfaces, understanding its architecture is crucial before you can successfully integrate it into your own environment.
Building a Big Data platform with the Hadoop ecosystem (Gregg Barrett)
This presentation provides a brief insight into a Big Data platform using the Hadoop ecosystem.
To this end the presentation will touch on:
-views of the Big Data ecosystem and its components
-an example of a Hadoop cluster
-considerations when selecting a Hadoop distribution
-some of the Hadoop distributions available
-a recommended Hadoop distribution
Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it... (DataWorks Summit)
DeepLearning4J (DL4J) is a powerful open-source distributed framework that brings deep learning to the JVM (it can serve as a DIY tool for Java, Scala, Clojure, and Kotlin programmers). It can be used on distributed GPUs and CPUs, and it is integrated with Hadoop and Apache Spark. ND4J is an open-source, distributed, GPU-enabled library that brings the intuitive scientific computing tools of the Python community to the JVM. Training neural network models using DL4J, ND4J, and Spark is a powerful combination, but the overall cluster configuration can present some unexpected issues that compromise performance and nullify the benefits of well-written code and good model design. In this talk I will walk through some of those problems and present some best practices to prevent them. The use cases presented will cover DL4J and ND4J on different Spark deployment modes (standalone, YARN, Kubernetes). The reference programming language for any code example will be Scala, but no prior Scala knowledge is required to follow the presented topics.
There is increased interest in using Kubernetes, the open-source container orchestration system for modern, stateful Big Data analytics workloads. The promised land is a unified platform that can handle cloud native stateless and stateful Big Data applications. However, stateful, multi-service Big Data cluster orchestration brings unique challenges. This session will delve into the technical gaps and considerations for Big Data on Kubernetes.
Containers offer significant value to businesses, including increased developer agility and the ability to move applications between on-premises servers, cloud instances, and data centers. Organizations have embarked on this journey to containerization with an emphasis on stateless workloads. Stateless applications are usually microservices or containerized applications that don’t “store” data. Web services (such as front-end UIs and simple, content-centric experiences) are often great candidates for stateless applications, since HTTP is stateless by nature. There is no dependency on local container storage for a stateless workload.
Stateful applications, on the other hand, are services that require backing storage, and keeping state is critical to running the service. Hadoop and Spark, and to a lesser extent databases such as Cassandra, MongoDB, Postgres, and MySQL, are great examples. They require some form of persistent storage that will survive service restarts...
Speakers
Anant Chintamaneni, VP Products, BlueData
Nanda Vijaydev, Director Solutions, BlueData
John Sing's Edge 2013 presentation, detailing when, where, and how external storage products and/or system software (e.g. GPFS) can be effectively used in a Hadoop storage environment. Many Hadoop situations absolutely require direct-attached storage. However, there are many situations where shared external storage may make sense in a Hadoop environment. This presentation details how, why, and where, and promotes taking an intelligent, Hadoop-aware approach to deciding between internal storage and external shared storage. Full awareness of Hadoop considerations is essential when selecting either internal or external shared storage in a Hadoop environment.
VMworld 2013
Chris Greer, FedEx
Richard McDougall, VMware
Learn more about VMworld and register at http://www.vmworld.com/index.jspa?src=socmed-vmworld-slideshare
Hadoop has traditionally been an on-premises workload, with very few notable implementations in the cloud. With organizations having either jumped on the cloud bandwagon or started planning their expansion into the ecosystem, it is imperative to explore how Hadoop conforms to the cloud paradigm. With the coming of age of some very useful cloud paradigms, and the high seasonality of Big Data workloads, this is becoming a very common ask from customers. Robust architectures, elastic scale, open platforms, OSS integrations, and addressing complex pain points will all be part of this lively talk. To implement effective solutions for Big Data in the cloud, it is imperative that you understand the core principles and grasp the design principles of how the cloud can enhance the benefits of parallelized analytics. Join this session to understand the nitty-gritty of implementing Big Data in the cloud and the various options therein. Big Data + Cloud is definitely a powerful combination.
These slides provide highlights of my book HDInsight Essentials. Book link is here: http://www.packtpub.com/establish-a-big-data-solution-using-hdinsight/book
HDInsight Essentials: Hadoop on the Microsoft Platform (nvvrajesh)
This book gives a quick introduction to Hadoop-like problems, and gives a primer on the real value of HDInsight. Next, it will show how to set up your HDInsight cluster.
Then, it will take you through the four stages: collect, process, analyze, and report.
For each of these stages you will see a practical example with the working code.
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat... (Amazon Web Services)
Amazon Elastic MapReduce (Amazon EMR) makes it easy to provision and manage Hadoop in the AWS Cloud. Hadoop is available in multiple distributions and Amazon EMR gives you the option of using the Amazon Distribution or the MapR Distribution for Hadoop.
This webinar will show you examples of how to use Amazon EMR with the MapR Distribution for Hadoop. You will learn how you can free yourself from the heavy lifting required to run Hadoop on-premises, and gain the advantages of using the cloud to increase flexibility and accelerate projects while lowering costs.
What we'll learn:
• See a live demonstration of how you can quickly and easily launch your first Hadoop cluster in a few steps.
• Hear examples of real-world applications and customer successes in production.
• Learn best practices for maximizing the benefits of using MapR with AWS.
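Launching a cluster as demonstrated above is typically done through the EMR API. The sketch below builds the parameter dictionary one might pass to boto3's `run_job_flow` call; the release label, instance types, and log bucket are assumptions, and selecting the MapR distribution would be an additional cluster setting not shown here.

```python
# A sketch of the parameters for launching a small EMR cluster via
# boto3.client("emr").run_job_flow(**params). Release label, instance
# types, and bucket name are illustrative assumptions.
def emr_cluster_params(name: str, log_bucket: str,
                       core_nodes: int = 2) -> dict:
    """Build a run_job_flow-style parameter dict for a small cluster."""
    return {
        "Name": name,
        "LogUri": f"s3://{log_bucket}/logs/",
        "ReleaseLabel": "emr-5.36.0",
        "Instances": {
            "InstanceGroups": [
                {"Name": "Master", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": core_nodes},
            ],
            # Keep the cluster alive after steps finish, for interactive use.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "Applications": [{"Name": "Hadoop"}],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }

params = emr_cluster_params("demo-cluster", "my-emr-bucket")
print(params["LogUri"])  # s3://my-emr-bucket/logs/
# boto3.client("emr").run_job_flow(**params) would launch the cluster.
```

Building the parameters as a pure function keeps the cluster shape reviewable and testable before any AWS call is made.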
Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend (OW2)
ETL is the process of extracting data from one location, transforming it, and loading it into a different location, often for the purposes of collection and analysis. As Hadoop becomes a common technology for sophisticated analysis and transformation of petabytes of structured and unstructured data, the task of moving data in and out efficiently becomes more important and writing transformation jobs becomes more complicated. Talend provides a way to build and automate complex ETL jobs for migration, synchronization, or warehousing tasks. Using Talend's Hadoop capabilities allows users to easily move data between Hadoop and a number of external data locations using over 450 connectors. Also, Talend can simplify the creation of MapReduce transformations by offering a graphical interface to Hive, Pig, and HDFS. In this talk, Cédric Carbone will discuss how to use Talend to move large amounts of data in and out of Hadoop and easily perform transformation tasks in a scalable way.
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ... (Alluxio, Inc.)
Alluxio Tech Talk
Dec 10, 2019
Chris Crosbie and Roderick Yao from the Google Dataproc team and Dipti Borkar of Alluxio will demo how to set up Google Cloud Dataproc with Alluxio so jobs can seamlessly read from and write to Cloud Storage. They’ll also show how to run Dataproc Spark against a remote HDFS cluster.
For more Alluxio events: https://www.alluxio.io/events/
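To illustrate the setup described above: once a Cloud Storage bucket is mounted into the Alluxio namespace, job paths can be rewritten from `gs://` URIs to the corresponding `alluxio://` URIs. The helper below is a hypothetical sketch; the master host, port, and mount point are assumptions, not values from the talk.

```python
# Hypothetical helper: rewrite a gs:// URI to the alluxio:// URI it would
# have once the bucket is mounted under `mount_point` in the Alluxio
# namespace. Master host/port and mount point are assumptions.
def to_alluxio_uri(gs_uri: str,
                   alluxio_master: str = "alluxio-master:19998",
                   mount_point: str = "/gcs") -> str:
    prefix = "gs://"
    if not gs_uri.startswith(prefix):
        raise ValueError(f"not a Cloud Storage URI: {gs_uri}")
    bucket_and_path = gs_uri[len(prefix):]
    return f"alluxio://{alluxio_master}{mount_point}/{bucket_and_path}"

print(to_alluxio_uri("gs://my-bucket/data/events.parquet"))
# alluxio://alluxio-master:19998/gcs/my-bucket/data/events.parquet
```

A Spark job on Dataproc would then read the `alluxio://` path instead of the `gs://` path, letting Alluxio serve cached data.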
VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study (VMworld)
VMworld 2013
Jayanth Gummaraju, VMware
Sasha Kipervarg, Identified, Inc.
Learn more about VMworld and register at http://www.vmworld.com/index.jspa?src=socmed-vmworld-slideshare
Virtualized Big Data Platform at VMware Corp IT @ VMworld 2015 (Rajit Saha)
At the VMware Corporate IT Data Solution and Delivery team, we have built an enterprise advanced data analytics platform on top of vSphere 6.0 with VMware Big Data Extensions, Isilon HDFS, Pivotal HD 3.0, Spring XD 1.2, and Alpine Data Lab.
Build Big Data Enterprise solutions faster on Azure HDInsight (DataWorks Summit)
Hadoop and Spark are big data frameworks used to extract useful insights across a variety of scenarios: ingestion, data prep, data management, processing, analysis, and visualization. Each step requires specialized toolsets to be productive. In this talk I will share examples of solutions from the Big Data ecosystem, such as Cask, StreamSets, Datameer, AtScale, and Dataiku, running on Microsoft’s Azure HDInsight that simplify your Big Data solutions. Azure HDInsight is a cloud Spark and Hadoop service for the enterprise, giving you the best of both worlds. Join this session for practical information that will enable faster time to insights for you and your business.
Fundamentals of Big Data, Hadoop project design, and a case study or use case.
General planning considerations and the essentials of the Hadoop ecosystem and Hadoop projects.
This will provide the basis for choosing the right Hadoop implementation, integrating Hadoop technologies, driving adoption, and creating an infrastructure.
Building applications using Apache Hadoop, with a use case of Wi-Fi log analysis as a real-life example.
HP Converged Systems and Hortonworks - Webinar Slides (Hortonworks)
Our experts will walk you through some key design considerations when deploying a Hadoop cluster in production. We'll also share practical best practices around HP and Hortonworks Data Platform to get you started on building your modern data architecture.
Learn how to:
- Leverage best practices for deployment
- Choose a deployment model
- Design your Hadoop cluster
- Build a Modern Data Architecture and vision for the Data Lake
Similar to VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Management, and Virtualization Extensions
VMworld 2015: Monitoring and Managing Applications with vRealize Operations 6... (VMworld)
This year, VMware vSphere 6 combined with vRealize Operations 6.1 (vR Ops 6) adds critical features to increase technical agility in the infrastructure and reduce mean time to repair. With the new automated remediation action framework in vR Ops, vSphere 6’s ability to vMotion physical Raw Device Mappings (RDMs), and a complete management pack ecosystem for monitoring from infrastructure to applications, administrators have the tools needed to maintain five-nines uptime, shorten mean time to repair (MTTR), and predict capacity requirements as and when the business requires. This session will be a deep technical explanation and live demonstration of these tools. It will give administrators a solid understanding of how they can use these tools to monitor and manage their application clusters, keep applications running during infrastructure maintenance, and get deep, holistic visibility into the entire application ecosystem, from storage to networking.
VMworld 2015: Advanced SQL Server on vSphere (VMworld)
Microsoft SQL Server is one of the most widely deployed “apps” in the market today and is used as the database layer for a myriad of applications, ranging from departmental content repositories to large enterprise OLTP systems. Typical SQL Server workloads are somewhat trivial to virtualize; however, business critical SQL Servers require careful planning to satisfy performance, high availability, and disaster recovery requirements. It is the design of these business critical databases that will be the focus of this breakout session. You will learn how to build high-performance SQL Server virtual machines through proper resource allocation, database file management, and use of all-flash storage like XtremIO. You will also learn how to protect these critical systems using a combination of SQL Server and vSphere high availability features. For example, did you know you can vMotion shared-disk Windows Failover Cluster nodes? You can in vSphere 6! Finally, you will learn techniques for rapid deployment, backup, and recovery of SQL Server virtual machines using an all-flash array.
VMworld 2015: Virtualize Active Directory, the Right Way!VMworld
Active Directory Domain Services (ADDS) allows organizations to deploy a scalable and secure directory service for managing users, resources, and applications. Virtualization of ADDS has been supported for many years now; however, it has required careful management to avoid pitfalls around replication, time management, and access. Windows Server 2012 provides greater support for virtualization by including virtualization-safe features and support for rapid domain controller deployment.
VMworld 2015: Site Recovery Manager and Policy Based DR Deep Dive with Engine... (VMworld)
Policy-based management greatly simplifies the work of IT administrators, making it easy to ensure that applications and VMs receive the resources, protection, and functionality required. Learn about the latest enhancements of Site Recovery Manager in this space, which represent a huge step towards providing policy-based DR. In this session we'll dive deep into how this approach works and how to work with it.
Not content to simply describe the Virtual Volume (VVOL) framework, this session instead examines practical use cases: How different configurations and workloads benefit from VVOLs. Learn how Storage Policy Based Management (SPBM) couples with VVOLs to provide VM configuration options not previously available. We demonstrate a handful of real-life scenarios, specifically covering how VVOLs benefits oversubscribed systems, disaster recovery preparation and multi-tenant requirements for customers. Specific configuration options and constraints are covered in detail, including how they work with underlying storage.
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
Slides by me and Rik Marselis from the DASA Connect conference on 30 May 2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We ended with a lovely workshop in which the participants tried to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
Accelerate your Kubernetes clusters with Varnish Caching (Thijs Feryn)
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview (Prayukth K V)
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Key Trends Shaping the Future of Infrastructure.pdf
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Management, and Virtualization Extensions
1. Big Data Platform Building Blocks: Serengeti, Resource Management, and Virtualization Extensions
Abhishek Kashyap, Pivotal
Kevin Leong, VMware
VAPP5762
#VAPP5762
2. Agenda
Big Data, Hadoop, and What It Means to You
The VMware Big Data Platform
• Operate Clusters Simply
• Share Infrastructure Efficiently
• Leverage Existing Investment
Pivotal and VMware: Partnering to Virtualize Hadoop
Conclusion and Q&A
4. What is Hadoop?
A framework that allows for distributed processing of large data sets across clusters of commodity servers:
• Store large amounts of data
• Process the large amounts of data stored
Inspired by Google's MapReduce and Google File System (GFS) papers.
An Apache open source project:
• Initial work done at Yahoo! starting in 2005
• Open sourced in 2009; there is now a very active open source community
5. What is Hadoop?
Storage & compute in one framework.
Open source project of the Apache Software Foundation.
Java-intensive programming required.
Two core components:
• HDFS: scalable storage in the Hadoop Distributed File System
• MapReduce: compute via the MapReduce distributed processing platform
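The map/shuffle/reduce split described above can be sketched as a toy word count in Python (an illustration of the programming model only, not Hadoop code):

```python
from collections import defaultdict

# Toy illustration of the MapReduce model: a map phase emits
# (key, value) pairs, a shuffle groups them by key, and a reduce
# phase aggregates each group. In Hadoop this runs in parallel
# across many nodes; here it runs in-process.

def map_phase(document):
    """Emit (word, 1) for every word in one input split."""
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    """Group intermediate values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum the values for each key to get the final counts."""
    return {key: sum(values) for key, values in groups.items()}

documents = ["big data on vsphere", "big data big clusters"]
intermediate = []
for doc in documents:  # each document stands in for an input split
    intermediate.extend(map_phase(doc))
counts = reduce_phase(shuffle(intermediate))
print(counts["big"])  # 3
```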
6. Why Hadoop?
• HDFS provides cheap and reliable storage on commodity hardware
• In-place data analysis, rather than moving from file systems to data warehouses
• Ability to analyze structured and unstructured data
• Enables better business decisions from more types of data at higher speeds and lower costs
7. Use Case: Data Warehouse Augmentation / Offload
Challenges
• Existing EDW used for low value and resource consuming ETL process
• Planned growth will far exceed compute capacity
• Hard to do analytics or even basic reporting on EDW system
Objectives
• Reduce EDW Total Cost of Ownership
• Enable longer data retention to enable analytics and accelerate time to market
• Migrate ETL off EDW to free up compute resources
8. Use Case: Retailer Trend Analysis
Deep historical reporting for retail trends:
• A credit card company loads 10 years of data for all retailers (100's of TBs)
• Run MapReduce jobs to develop a historical picture of retailers in a specific area
• Load results from Hadoop into the data warehouse and analyze further with standard BI/statistics packages
Why do this in Hadoop?
• Ability to store years of data cost-effectively
• Data available for immediate recall (not on tapes or flat files)
• No need to ETL/normalize the data
• Data exists in its valuable, original format
• Offload intensive computation from the DW
• Ability to combine structured and unstructured data
14. The Big Data Journey in the Enterprise
Stage 1: Hadoop Piloting
• Often starts with a line of business
• Try 1 or 2 use cases to explore the value of Hadoop
Stage 2: Hadoop Production
• Serve a few departments
• More use cases
• Growing number and size of clusters
• Core Hadoop + components
Stage 3: Cloud Analytics Platform
• Serve many departments
• Often part of mission-critical workflows
• Fully integrated with analytics/BI tools
[Diagram: scale grows from 0 nodes to 10's to 100's; the platform evolves from standalone to integrated]
15. Getting from Here to There
[Diagram: hosts running a virtualization layer; a data layer (shared file system) and a compute layer host a Hadoop test/dev cluster]
16. Getting from Here to There
[Diagram: the same virtualized hosts now run Hadoop test/dev, two Hadoop production clusters, and a Hadoop experimentation cluster on the shared data and compute layers]
17. Getting from Here to There
[Diagram: the virtualized hosts now also run HBase, SQL on Hadoop (HAWQ, Impala, Drill), NoSQL (Cassandra, Mongo), and other workloads (Spark, Shark, Solr, Platfora) alongside Hadoop test/dev and production]
18. Benefits of Virtualization at Each Stage
Stage 1: Hadoop Piloting
• Rapid deployment
• On-the-fly cluster resizing
• Flexible configuration
• Automation of cluster lifecycle
Stage 2: Hadoop Production
• High availability
• Consolidation
• Tiered SLAs
• Elastic scaling
Stage 3: Cloud Analytics Platform
• Mixed workloads
• Right tool at the right time
• Flexible and elastic infrastructure
[Diagram: scale grows from 0 nodes to 10's to 100's; the platform evolves from standalone to integrated]
20. Big Data Initiatives at VMware
• Serengeti: open source project; a tool to simplify virtualized Hadoop deployment & operations
• vSphere Resource Management: advanced resource management on vSphere; Big Data application-specific extensions to DRS
• Hadoop Virtualization Extensions: virtualization changes for core Hadoop, contributed back to Apache Hadoop
21. Clustered Workload Management: The Next Frontier
[Diagram: Serengeti provides Hadoop management on top of the virtualization layer — vCenter and ESXi]
Source: http://www.conferencebike.com/image/generated/792.png
22. Serengeti Project History
Releases: Serengeti 0.5 (June 2012) → 0.6 (August 2012) → 0.7 (October 2012) → 0.8 (April 2013) → 0.9 / BDE Beta (June 2013)
Milestones along the way: Hadoop in 10 minutes; Highly Available Hadoop; time to insight; configuring Hadoop; compute elasticity; configuring placement and topology; HBase, MapR, and CDH4 support; performance best practices
Serengeti 0.9 / BDE Beta: integrated GUI; automatic elasticity; YARN / Pivotal HD
23. Big Data Extensions: Serengeti-vCenter Integration
[Diagram: Hadoop management through Big Data Extensions integrated into vCenter, on top of ESXi virtualization]
26. What Does Nick Think About Hadoop?
• "I don't want to be the bottleneck when it comes to provisioning Hadoop clusters."
• "I need sizing flexibility, because my Hadoop users don't know how large of a cluster they need."
• "I want to establish a repeatable process for deploying Hadoop clusters."
• "I don't really know that much about Hadoop."
• "I want to better manage the jumble of LOB Hadoop clusters in my enterprise."
Source: http://www.smartdraw.com/solutions/information-technology/images/nick.png
27. Choose Your Own Adventure
Source: http://www.vintagecomputing.com/wp-content/images/retroscan/supercomputer_cyoa_large.jpg
29. Deploy Hadoop Clusters in Minutes
From a manual process — server preparation, OS installation, network configuration, Hadoop installation and configuration — to fully automated, using the GUI.
30. How It Works
• BDE is packaged as a virtual appliance, which can be easily deployed on vCenter
• BDE works as a vCenter extension and establishes an SSL connection with vCenter
• BDE clones VMs from a template and controls/configures the VMs through vCenter
[Diagram: a vCenter management server and the BDE virtual appliance clone Hadoop node VMs from a template across hosts on the virtualization platform]
31. User-specified Customizations Using a Cluster Specification File
• Storage configuration: choice of shared or local
• High availability option
• Number of nodes and resource configuration
• VM placement policies
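As a rough illustration, a cluster specification covering these options might look like the sketch below. The field names are approximations for illustration only, not the exact Serengeti schema — consult the product documentation for the real JSON format.

```python
# Illustrative sketch of the kind of cluster specification BDE consumes.
# Field names are approximations, not the exact Serengeti schema.
cluster_spec = {
    "nodeGroups": [
        {
            "name": "master",
            "roles": ["hadoop_namenode", "hadoop_jobtracker"],
            "instanceNum": 1,
            "storage": {"type": "SHARED", "sizeGB": 50},  # shared storage
            "haFlag": "on",  # protect the master with vSphere HA
        },
        {
            "name": "worker",
            "roles": ["hadoop_datanode", "hadoop_tasktracker"],
            "instanceNum": 8,
            "storage": {"type": "LOCAL", "sizeGB": 200},  # cheap local disks
            "haFlag": "off",
        },
    ]
}

# e.g. total number of VMs this spec would produce
total_nodes = sum(g["instanceNum"] for g in cluster_spec["nodeGroups"])
print(total_nodes)  # 9
```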
36. Challenges of Running Hadoop in the Enterprise
[Diagram: Dept A (recommendation engine) and Dept B (ad targeting) each run their own production, test, and experimentation clusters over transaction data, log files, social data, and historical customer behavior; NoSQL, real-time, and SQL workloads are on the horizon]
Pain points:
1. Cluster sprawl
2. Redundant common data in separate clusters
3. Inefficient use of resources: some clusters could be running at capacity while other clusters sit idle
38. What Other Things Does Nick Think About Hadoop?
• "I want to scale out when my workload requires it."
• "My Hadoop users ask for large Hadoop clusters, which end up underutilized."
• "I want to offer Hadoop-as-a-Service in my private cloud."
• "I want to get all Hadoop clusters into a centralized environment to minimize spend."
Source: http://www.smartdraw.com/solutions/information-technology/images/nick.png
39. Achieving Multi-tenancy
Resource Isolation
• Control the greedy, noisy neighbor
• Reserve resources to meet needs
Version Isolation
• Allow concurrent OS, app, and distro versions
Security Isolation
• Provide privacy between users/groups
• Runtime and data privacy required
[Diagram: hosts running VMware vSphere + Serengeti]
40. Separating Hadoop Data and Compute for Elasticity
Combined storage/compute:
• Hadoop in a VM
• VM lifecycle determined by the DataNode
• Limited elasticity
• Limited to Hadoop multi-tenancy
Separate storage:
• Separate compute from data
• Elastic compute
• Enable shared workloads
• Raise utilization
Separate compute tenants:
• Separate virtual clusters per tenant
• Stronger VM-grade security and resource isolation
• Enable deployment of multiple Hadoop runtime versions
41. Dynamic Hadoop Scaling
Deploy separate compute clusters for different tenants sharing HDFS.
Commission/decommission compute nodes according to priority and available resources.
[Diagram: experimentation and production compute clusters, each with its own JobTracker and many compute VMs, draw on a dynamic resource pool over a shared data layer on VMware vSphere + Serengeti]
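The commission/decommission idea can be sketched as a simple priority-based allocator — a hypothetical illustration of the concept, not Serengeti's actual scheduling logic:

```python
# Hypothetical sketch: hand out a limited pool of compute-VM slots to
# tenant clusters in priority order, capped by each tenant's requested
# cluster size. Lower-priority tenants get whatever is left over and
# would be decommissioned first when resources tighten.
def allocate_compute_vms(available_vms, tenants):
    """tenants: list of (name, priority, requested); lower priority number wins."""
    allocation = {}
    for name, _priority, requested in sorted(tenants, key=lambda t: t[1]):
        granted = min(requested, available_vms)
        allocation[name] = granted
        available_vms -= granted
    return allocation

tenants = [
    ("experimentation", 2, 8),   # low priority, wants 8 compute VMs
    ("production", 1, 10),       # high priority, wants 10 compute VMs
]
print(allocate_compute_vms(15, tenants))
# {'production': 10, 'experimentation': 5}
```

When the pool shrinks, re-running the allocator shows the experimentation cluster losing nodes before production does.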
46. What Is Nick Still Thinking About Hadoop?
• "I want to use my existing infrastructure, not buy new hardware."
• "I want to leverage the tools I already have."
• "Hadoop on Amazon is costing too much."
• "My data is in shared storage; do I have to move it?"
• "I want a low-risk way of trying Hadoop."
Source: http://www.smartdraw.com/solutions/information-technology/images/nick.png
47. Use Storage That Meets Your Needs
• SAN storage: $2-$10/gigabyte; $1M gets 0.5 petabytes, 200,000 IOPS, 8 GB/sec
• NAS filers: $1-$5/gigabyte; $1M gets 1 petabyte, 200,000 IOPS, 10 GB/sec
• Local storage: $0.05/gigabyte; $1M gets 10 petabytes, 400,000 IOPS, 250 GB/sec
48. Leveraging Isilon as External HDFS
• Time to results: analysis of data in place
• Lower risk using vSphere with Isilon
• Scale storage and compute independently
[Diagram: an elastic virtual compute layer over a data layer of Hadoop on Isilon]
49. Hybrid Storage Model to Get the Best of Both Worlds
Master nodes:
• NameNode and JobTracker on shared storage
• Leverage vSphere vMotion, HA, and FT
Slave nodes:
• TaskTracker and DataNode on local storage
• Lower cost, scalable bandwidth
50. Achieving HA for the Entire Hadoop Stack
• Battle-tested HA technology
• Single mechanism to achieve HA for the entire Hadoop stack
• Simple to enable HA/FT
[Diagram: the Hadoop stack — HDFS (Hadoop Distributed File System), HBase (key-value store), MapReduce (job scheduling/execution system), Pig (data flow), Hive (SQL), HCatalog, ZooKeeper (coordination), management server, plus ETL and BI/reporting tools — with the HA-protected components highlighted: NameNode, JobTracker, Hive metastore DB, HCatalog metadata DB server, and the RDBMS]
51. Leveraging Other VMware Assets
Monitoring with vCenter Operations Manager:
• Gain comprehensive visibility
• Eliminate manual processes with intelligent automation
• Proactively manage operations
Future: vCloud Automation Center, software-defined storage
52. Get Maximum Value from Existing Tools and Infrastructure
[Diagram: virtualized hosts running Hadoop test/dev and production alongside HBase, SQL on Hadoop (HAWQ, Impala, Drill), NoSQL (Cassandra, Mongo), and other workloads (Spark, Shark, Solr, Platfora) on shared data and compute layers]
54. Virtualization Benefits
• Multi-tenancy (users, business units) with strong vSphere-based isolation
• Multiple big data applications and compute engines can access common HDFS data
• Agility to scale Hadoop nodes at run time
• Provide on-demand Hadoop / Hadoop as a Service
55. Busting Myths About Virtual Hadoop
• Myth: Virtualization will add significant performance overhead. Reality: virtual Hadoop performance is comparable to bare metal.
• Myth: Hadoop cannot work with shared storage. Reality: shared storage is a valid choice, especially for smaller clusters.
• Myth: Virtualization necessitates the use of shared storage. Reality: shared storage is useful for HA, but virtual Hadoop on DAS is very common.
• Myth: Hadoop distribution vendors don't support virtual implementations. Reality: Pivotal HD is jointly tested, certified, and supported on vSphere.
Source: http://www.psychologytoday.com/files/u637/good-grief-charlie-brown.jpg, http://images2.wikia.nocookie.net/__cb20101130042247/peanuts/images/6/6d/Joe-cool-1-.jpg
58. You Need Hadoop Virtual Extensions
Topology Extensions:
• Enable Hadoop to recognize the additional virtualization layer for read/write/balancing and proper replica placement
• Enable compute/data node separation without losing locality
Elasticity Extensions:
• Ability to dynamically adjust resources allocated (CPU, memory, map/reduce slots) to compute nodes
• Enables runtime elasticity of Hadoop nodes
59. Hadoop Virtual Extensions
Topology Extensions:
• Enable Hadoop to recognize the additional virtualization layer for read/write/balancing and proper replica placement
• Enable compute/data node separation without losing locality
Elasticity Extensions:
• Ability to dynamically adjust resources allocated (CPU, memory, map/reduce slots) to compute nodes
• Enables runtime elasticity of Hadoop nodes
60. Current Hadoop Network Topology Not Virtualization Aware
[Diagram: a topology tree rooted at /, with data center D1, racks R1-R4, and hosts H1-H12]
• D = data center
• R = rack
• H = host
Multiple replicas may end up on the same physical host in virtual environments.
61. HVE Adds a New Layer in the Hadoop Network Topology
[Diagram: a topology tree rooted at /, with data centers D1-D2, racks R1-R4, node groups NG1-NG8, and nodes N1-N13]
• D = data center
• R = rack
• NG = node group
• N = node
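The effect of the extra layer shows up in the topology paths themselves. This small sketch (the path strings are made up for illustration) checks whether two nodes share a node group, i.e. sit on the same physical host:

```python
# HVE extends topology paths from /datacenter/rack/host to
# /datacenter/rack/nodegroup/node. Two VMs in the same node group
# share one physical host, so replicas must not land on both.

def same_node_group(path_a, path_b):
    """True if two nodes share a node group (the same physical host)."""
    # a path looks like "/D1/R1/NG1/N1"; stripping the last component
    # leaves the node-group prefix "/D1/R1/NG1"
    return path_a.rsplit("/", 1)[0] == path_b.rsplit("/", 1)[0]

print(same_node_group("/D1/R1/NG1/N1", "/D1/R1/NG1/N2"))  # True
print(same_node_group("/D1/R1/NG1/N1", "/D1/R1/NG2/N3"))  # False
```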
62. "Virtualization Aware" Replica Placement Policy During Write
Updated policies:
• No replicas are placed on the same node, or on nodes under the same node group
• 1st replica is on the local node, or on one of the nodes under the writer's node group
• 2nd replica is on a rack remote from the 1st replica
• 3rd replica is on the same rack as the 2nd replica
• Remaining replicas are placed randomly across racks to meet the minimum restriction
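The rules above can be captured in a short sketch. This is a simplified illustration with made-up node names; the real HDFS placement code also handles fallbacks and the minimum-restriction case for additional replicas:

```python
import random

# Nodes are (node_id, node_group, rack) tuples. The pick() helper
# enforces the first rule: never reuse a node group, so no two
# replicas share a physical host.
def place_replicas(writer, nodes):
    replicas, used_groups = [], set()

    def pick(candidates):
        choice = random.choice([n for n in candidates if n[1] not in used_groups])
        replicas.append(choice)
        used_groups.add(choice[1])
        return choice

    # 1st replica: on the writer's node or a node under its node group
    first = pick([n for n in nodes if n[1] == writer[1]])
    # 2nd replica: on a rack remote from the 1st
    second = pick([n for n in nodes if n[2] != first[2]])
    # 3rd replica: on the same rack as the 2nd (different node group)
    pick([n for n in nodes if n[2] == second[2]])
    return replicas

nodes = [("N1", "NG1", "R1"), ("N2", "NG1", "R1"), ("N3", "NG2", "R1"),
         ("N4", "NG3", "R2"), ("N5", "NG4", "R2")]
print(place_replicas(("N1", "NG1", "R1"), nodes))
```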
63. Hadoop Virtual Extensions
Topology Extensions:
• Enable Hadoop to recognize the additional virtualization layer for read/write/balancing and proper replica placement
• Enable compute/data node separation without losing locality
Elasticity Extensions:
• Ability to dynamically adjust resources allocated (CPU, memory, map/reduce slots) to compute nodes
• Enables runtime elasticity of Hadoop nodes
64. HVE Achieves Vertical Scaling of Hadoop Nodes
A VM's boundary is elastic already:
• VM resource settings: reservation (lower limit) and maximum (upper limit)
• If a resource is tight, VMs compete for it (between reservation and maximum) based on shares
• "Stealing" resources without notifying applications can cause very bad performance
• Thus, we need a way to make resource changes application-aware
Current Hadoop resource schedulers are static:
• MRv1 – slots
• YARN – resources (memory for now; YARN-2 will include CPUs)
HVE elasticity patches:
• Enable a flexible resource model for each Hadoop node
• Change resources at runtime
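The reservation/shares/maximum model can be illustrated with a simplified single-resource allocator. This is a sketch of the idea only, not the actual DRS algorithm, which is iterative and multi-resource:

```python
# Each VM is guaranteed its reservation; contended headroom above the
# reservations is split in proportion to shares, capped at each VM's
# maximum (limit). Simplified single-resource, single-pass version.
def allocate(capacity, vms):
    """vms: list of dicts with 'reservation', 'limit', 'shares'."""
    alloc = [vm["reservation"] for vm in vms]      # guarantees first
    spare = capacity - sum(alloc)                  # contended headroom
    total_shares = sum(vm["shares"] for vm in vms)
    for i, vm in enumerate(vms):
        extra = spare * vm["shares"] / total_shares
        alloc[i] = min(vm["limit"], alloc[i] + extra)  # cap at the limit
    return alloc

vms = [
    {"reservation": 2.0, "limit": 8.0, "shares": 1000},
    {"reservation": 2.0, "limit": 8.0, "shares": 3000},
]
print(allocate(12, vms))  # [4.0, 8.0]
```

With three times the shares, the second VM takes three quarters of the spare capacity until its limit caps it, which is exactly the competition between reservation and maximum the slide describes.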
65. Pivotal HD Is Best Suited for Virtualization
The only distribution that ships with VMware Hadoop Virtual Extensions (HVE):
• Fully tested
• Ensures proper HDFS replica placement on vSphere
• Improves MapReduce performance through better data locality on vSphere
• Allows dynamic scaling of Hadoop compute nodes
Certified on vSphere.
VMware Serengeti deploys and scales Pivotal HD on vSphere out of the box:
• The only YARN-based distribution supported by Serengeti
71. Big Data Platform Building Blocks: Serengeti, Resource Management, and Virtualization Extensions
Abhishek Kashyap, Pivotal
Kevin Leong, VMware
VAPP5762
#VAPP5762