This document discusses how Serengeti can be used to automate the deployment and management of Hadoop clusters on VMware vSphere. Some key points:
- Serengeti is a virtual appliance that can be deployed on vSphere and automates the provisioning of Hadoop clusters from templates in about 10 minutes.
- It allows separating storage and compute by deploying Hadoop data nodes and compute nodes as separate VMs, for better elasticity and utilization.
- Serengeti supports elastic scaling of Hadoop clusters, multi-tenancy by isolating tenant workloads, and live configuration changes with rolling operations and no downtime.
2. Get your Hadoop cluster in minutes
Manual process (takes days): server preparation, OS installation, network configuration, Hadoop installation and configuration.
Fully automated by Serengeti on vSphere with best practices: about 10 minutes to get a Hadoop/HBase cluster from scratch, roughly 1/1000 of the human effort, and minimal Hadoop operations knowledge required.
3. Serengeti deployment architecture
• Serengeti is packaged as virtual appliance, which can be easily
deployed on VC.
• Serengeti works as a VC extension and establishes SSL connection
with VC.
• Serengeti will clone VM from template and control/config VM through
VC.
4. Storage: Evolution of Hadoop on VMs – Data/Compute separation
• Current Hadoop (combined storage/compute in one VM): VM lifecycle determined by the Datanode; limited elasticity.
• Separate Storage: separate compute from data; removes the elasticity constraint imposed by the Datanode; elastic compute; raises utilization.
• Separate Compute Clusters: separate virtual compute; a compute cluster per tenant; stronger VM-grade security and resource isolation.
5. Elastic Scalability & Multi-Tenancy
Deploy separate compute clusters for different tenants sharing HDFS.
Commission/decommission compute nodes according to priority and available resources.
(Diagram: on VMware vSphere + Serengeti, a shared data layer underneath a compute layer split into an Experimentation cluster and a Production recommendation-engine cluster, each with its own JobTracker and a set of compute VMs drawn from a dynamic resource pool.)
7. Rapid Deployment of a Hadoop/HBase Cluster with Serengeti
Step 1: Deploy the Serengeti virtual appliance on vSphere.
Step 2: A few clicks to stand up the Hadoop cluster. Done.
8. Customizing your Hadoop/HBase cluster with Serengeti
Choice of distros
Storage configuration
• Choice of shared storage or local disk
Resource configuration
High availability option
# of nodes
…
"distro": "apache",
"groups": [
  { "name": "master",
    "roles": [
      "hadoop_namenode",
      "hadoop_jobtracker" ],
    "storage": {
      "type": "SHARED",
      "sizeGB": 20 },
    "instance_type": "MEDIUM",
    "instance_num": 1,
    "ha": true },
  { "name": "worker",
    "roles": [
      "hadoop_datanode",
      "hadoop_tasktracker" ],
    "instance_type": "SMALL",
    "instance_num": 5,
    "ha": false
…
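For illustration, a spec like the one above could be applied from the Serengeti CLI roughly as follows; the cluster name and file path are placeholders, and the flags are assumed to mirror the 'cluster config' example shown later in this deck:

> cluster create --name myHadoop --specFile /home/serengeti/myHadoop.json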
9. Cluster creation workflow – VM creation
(Diagram: the UI/CLI send a create-cluster request to the Serengeti Web Service, carrying a cluster spec with groups, names, roles, and placementPolicies.)
The Serengeti Web Service analyzes the spec, queries available resources from VC, calculates the VM placement of the DN/TT VMs across hosts, and then creates the VMs by cloning the template VM, adding disks, and configuring each VM through VC.
10. Workflow – Hadoop Package Deployment
(Actors: Admin, Serengeti Server, Chef Server, Package Server, Hadoop Nodes.)
1) Download the Hadoop tarballs or create a yum repo on the Package Server.
2) Configure the tarball URLs or yum repo URLs for each distro in the manifest file.
3) Run 'cluster create' to create a cluster for a Hadoop distro; the tarball URLs or yum repo URLs are saved in the Chef Server.
4) Remotely SSH to the Hadoop nodes and execute chef-client.
5) chef-client reads the tarball URLs or yum repo URLs from the Chef Server, then downloads and extracts the Hadoop tarballs to /usr/lib/hadoop/ or yum-installs the RPMs from the Package Server.
6) Generate the Hadoop configuration files on all nodes.
7) Start the Hadoop daemons on all nodes simultaneously, with synchronization between NN, DNs, JT, and TTs.
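To make step 2 concrete, here is a hypothetical manifest entry mapping a distro name to its tarball URL. The actual manifest schema is not shown in this deck, so every key and value below is an illustrative assumption:

{
  "name": "apache",
  "version": "1.0.1",
  "packages": [
    { "roles": ["hadoop_namenode", "hadoop_jobtracker", "hadoop_datanode", "hadoop_tasktracker"],
      "tarball": "http://<package server>/apache/1.0.1/hadoop-1.0.1.tar.gz" }
  ]
}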
11. Cluster creation workflow – Software installation
(Diagram: the cluster spec for Ironfan contains "cluster_data" with a "rack_topology_policy", node groups such as "ComputeMaster" with role "hadoop_jobtracker" and instances like "sample-ComputeMaster-0", and "distro_package_repos" such as "http://<server ip>mapr/2.1.3/mapr-m5.repo".)
1. The Serengeti Web Service analyzes the spec and sends a software bootstrap request to the Ironfan Thrift Service.
2. Ironfan creates the Chef nodes on the Chef Server.
3. Ironfan SSHes to each node (DN1, TT1, …) to start chef-client.
4. Each chef-client logs in to the Chef Server and downloads its cookbook (DataNode cookbook, TaskTracker cookbook) through the REST API.
5. chef-client executes the cookbook on each node.
6. The cookbooks download the bits (Hadoop binary, Pig, Hive, etc.) from the Package Server.
12. Cluster creation workflow – Software installation – continued
7. Each chef-client gets its properties from the Chef Server through the REST API.
8. Configure Hadoop and start the Hadoop daemons, with synchronization between NN, DNs, JT, and TTs.
Ironfan gets the bootstrap status and persists it, and the Serengeti Web Service answers bootstrap status queries.
Note: software installation on all nodes is executed simultaneously.
13. Configure/reconfigure Hadoop with ease by Serengeti
Modify the Hadoop cluster configuration from Serengeti:
• Use the "configuration" section of the json spec file
• Specify Hadoop attributes in core-site.xml, hdfs-site.xml, mapred-site.xml, hadoop-env.sh, log4j.properties
• Apply the new Hadoop configuration using the edited spec file
"configuration": {
  "hadoop": {
    "core-site.xml": {
      // check for all settings at http://hadoop.apache.org/common/docs/r1.0.0/core-default.html
    },
    "hdfs-site.xml": {
      // check for all settings at http://hadoop.apache.org/common/docs/r1.0.0/hdfs-default.html
    },
    "mapred-site.xml": {
      // check for all settings at http://hadoop.apache.org/common/docs/r1.0.0/mapred-default.html
      "io.sort.mb": "300"
    },
    "hadoop-env.sh": {
      // "HADOOP_HEAPSIZE": "",
      // "HADOOP_NAMENODE_OPTS": "",
      // "HADOOP_DATANODE_OPTS": "",
      …
> cluster config --name myHadoop --specFile /home/serengeti/myHadoop.json
14. Workflow – Tuning Hadoop Configuration
(Actors: Admin, Serengeti Server, Chef Server, Hadoop Nodes.)
1) Run 'cluster export' to export the cluster spec, and set the Hadoop conf params in the spec.
2) Run 'cluster config' to apply the new Hadoop configuration to the whole cluster or to a node group of the cluster.
3) Serengeti saves the new Hadoop configuration into the Chef Server.
4) Serengeti remotely SSHes to the Hadoop nodes and executes chef-client.
5) chef-client reads the Hadoop configuration from the Chef Server.
6) Generate the new Hadoop configuration files on all nodes.
7) Restart the corresponding Hadoop daemons on all nodes simultaneously to apply the new configuration.
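As a sketch of that tuning cycle, assuming a cluster named myHadoop and a spec exported to /home/serengeti/myHadoop.json (only the 'cluster config' line appears verbatim in this deck; the 'cluster export' flags are assumed to mirror it):

> cluster export --name myHadoop --specFile /home/serengeti/myHadoop.json
  (edit the "configuration" section, e.g. set "io.sort.mb": "300" under mapred-site.xml)
> cluster config --name myHadoop --specFile /home/serengeti/myHadoop.json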
15. Rolling operation
A rolling operation works on one node at a time, so it does not impact job execution on the whole cluster.
Supported functions:
• Cluster scale up/down
• Cluster fix
Workflow:
• The workflow for each node is similar to the whole-cluster operation.
• Only when one node has finished all steps does the next node start.
• Each node is restarted during the operation.
16. One click to scale out your cluster with Serengeti
17. Easily scale out using Serengeti
• Use case: the cluster capacity is not big enough and new hardware is available.
• Through Serengeti: one click in the UI to scale out the cluster.
(Diagram: NN, JT, and worker nodes on the virtualization platform across several hosts, with additional worker VMs added on the new hosts.)
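Scale-out can also be driven from the Serengeti CLI. The line below is only a sketch: the 'cluster resize' command name, its flags, and the values are assumptions, not shown in this deck:

> cluster resize --name myHadoop --nodeGroup worker --instanceNum 10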
18. VC adapter
Leverages VLSI to connect to VC.
Keeps a VC object cache to improve VC query performance.
Listens for VC events:
• VM power on, VM power off, VM creation, etc.
• If a VM's status is changed in VC outside of Serengeti, 'cluster list' immediately shows the status change.
19. VM placement – Fine control of a data/compute-separated cluster
Constrain the number of nodes on each host.
Group association:
• Put compute nodes close to data nodes.
20. VM placement – Rack-aware placement
Balance the number of nodes across multiple racks.
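These constraints are expressed in the "placementPolicies" section of the cluster spec mentioned on the VM-creation slide. The keys and values in this sketch are assumptions for illustration only:

"placementPolicies": {
  "instancePerHost": 1,
  "groupAssociations": [
    { "reference": "data", "type": "STRICT" }
  ]
}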
22. Separated system disk
(Diagram: on each host, a DN VM and a CN VM; each VM's virtual system disk is separated from its data disks and can be placed either on specified local storage or on shared storage.)
23. VHM: Example Architecture
(Diagram: three ESX hosts managed by a VirtualCenter Management Server running DRS. Hadoop HDFS/data VMs sit on local disks, Hadoop compute VMs host the TaskTrackers, and non-Hadoop VMs use SAN/NAS storage; the JT and NN run alongside them, and the VHM plugs into VirtualCenter.)
JT: JobTracker, TT: TaskTracker, NN: NameNode, VHM: Virtual Hadoop Manager
24. Virtual Hadoop Manager
(Diagram: the Virtual Hadoop Manager (VHM) sits between Serengeti, vCenter Server, and the Hadoop JobTracker. It collects VC state and stats – VM stats and configuration from the vCenter DB – and Hadoop state and stats from the JobTracker (slots used, pending work). Its algorithms combine these with the Serengeti configuration and the cluster configuration, and then issue VC actions (manual/auto power on/off of TaskTracker VMs) and Hadoop actions (decommission/recommission commands).)
Simple description of the key modules:
1. The web service, running on Tomcat, is the central controller of the cluster management workflow and leverages the Spring Batch library.
2. The VM placement algorithm and disk placement policy are processed in the VM placement module in the web service layer.
3. Serengeti talks to VC through the VC adapter layer, which maintains several VC sessions to execute different VC tasks and to listen for VC events.
4. Serengeti is distro-neutral, so the Hadoop software is installed and configured after the VMs are created. The open source projects Chef and Ironfan are leveraged to install and configure the Hadoop services; Chef is a popular distributed software configuration tool.
5. The Runtime Manager is responsible for Hadoop cluster elasticity control. Serengeti talks to the VHM through RabbitMQ.
The Chef Server and Package Server are currently deployed in the same VM as the Serengeti Server; they can be deployed on separate VMs to support large-scale clusters (200+ nodes).
Step 4: the chef-client connects to the Chef Server and downloads cookbooks through the REST API.
Chef provides a flexible software deployment and configuration mechanism, so it is easy to add more services to Serengeti.
During VM placement, several performance-improving configurations are embedded based on host and VM CPU/memory sizes.
At this stage you will constantly configure and reconfigure your cluster to tune for optimal results. With Serengeti this process is very simple: taking the json spec file shown earlier, you specify the various Hadoop attributes for the xml files and apply the new configuration to the cluster. Serengeti automatically changes the Hadoop cluster according to your specification, and the changes are propagated to the entire cluster; you do not need to reconfigure one node at a time.
Sample Hadoop Configuration:
{
  … …
  // we suggest running convert-hadoop-conf.rb to generate the "configuration" section and paste the output here
  "configuration": {
    "hadoop": {
      "core-site.xml": {
        // check for all settings at http://hadoop.apache.org/docs/stable/core-default.html
        // note: any value (int, float, boolean, string) must be enclosed in double quotes and here is a sample:
        // "io.file.buffer.size": "4096"
      },
      "hdfs-site.xml": {
        // check for all settings at http://hadoop.apache.org/docs/stable/hdfs-default.html
      },
      "mapred-site.xml": {
        // check for all settings at http://hadoop.apache.org/docs/stable/mapred-default.html
      },
      "hadoop-env.sh": {
        // "HADOOP_HEAPSIZE": "",
        // "HADOOP_NAMENODE_OPTS": "",
        // "HADOOP_DATANODE_OPTS": "",
        // "HADOOP_SECONDARYNAMENODE_OPTS": "",
        // "HADOOP_JOBTRACKER_OPTS": "",
        // "HADOOP_TASKTRACKER_OPTS": "",
        // "HADOOP_CLASSPATH": "",
        // "JAVA_HOME": "",
        // "PATH": ""
      },
      "log4j.properties": {
        // "hadoop.root.logger": "INFO,RFA",
        // "log4j.appender.RFA.MaxBackupIndex": "10",
        // "log4j.appender.RFA.MaxFileSize": "100MB",
        // "hadoop.security.logger": "DEBUG,DRFA"
      },
      "fair-scheduler.xml": {
        // check for all settings at http://hadoop.apache.org/docs/stable/fair_scheduler.html
        // "text": "the full content of fair-scheduler.xml in one line"
      },
      "capacity-scheduler.xml": {
        // check for all settings at http://hadoop.apache.org/docs/stable/capacity_scheduler.html
      },
      "mapred-queue-acls.xml": {
        // check for all settings at http://hadoop.apache.org/docs/stable/cluster_setup.html#Configuring+the+Hadoop+Daemons
        // "mapred.queue.queue-name.acl-submit-job": "",
        // "mapred.queue.queue-name.acl-administer-jobs": ""
      }
    }
  }
}
The disk placement rule is not configurable (you cannot choose which rule is applied).
The separated system disk can be configured in the cluster spec at the node group level as follows:
  dsNames4System: <datastore name(s) used to put the system disk>
  dsNames4Data: <datastore name(s) used to put the data disks>
If these attributes are not set, default values are used.
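A minimal sketch of how those attributes might sit inside a node group's "storage" section; the exact nesting and the datastore names are assumptions, not taken from this deck:

{ "name": "worker",
  "roles": [ "hadoop_datanode", "hadoop_tasktracker" ],
  "storage": {
    "type": "LOCAL",
    "dsNames4System": [ "localDatastore01" ],
    "dsNames4Data": [ "localDatastore02" ]
  },
  "instance_num": 5
}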