Large-scale data processing for Extract, Transform, and Load (ETL) jobs is a very common practice. The stackArmor DevOps team developed a Chef-based automation solution to automate AWS environment provisioning, code deployment, and data ingestion, processing over 2 TB of data.
This presentation covers the technologies used, the planning phase, AWS instance selection, and optimizing the ETL processing for cost as well as performance.
The target was to process 500 million rows within 72 hours, at a rate of 5 million transactions per hour.
The presentation also covers pitfalls encountered and the automation optimizations performed to reach the targeted processing rates.
The presentation was delivered at the DevOpsDC Meetup on May 17, 2016.
DevOps for ETL processing at scale with MongoDB, Solr, AWS and Chef
1. Proprietary and confidential information of stackArmor
PRESENTATION FOR DEVOPSDC
MAY 17, 2016
ETL processing at scale with MongoDB and Solr on AWS using Chef
2. PROPRIETARY AND CONFIDENTIAL INFORMATION OF STACKARMOR 2
Delivering innovation with Cloud, Data Analytics and Automation
• Cloud orchestration and automation
• Migration and Operations support
• Cloud hosting and SaaS Development
www.stackArmor.com
Our Partnerships
3. The Customer
• Big Data Analytics SaaS Firm
◦ 500 million records every month needed to be processed as fast as possible at the lowest possible cost
◦ The current on-premises infrastructure was taking 3 weeks to run and complete the process
◦ Processing costs were a major concern; the job had to be executed within a very tight and specific budget
◦ Due to HIPAA compliance reasons, the application components could not be altered
◦ Needed to process 5 million records per hour, complete the process in 1-2 days, and tear down the environment post-completion
PROPRIETARY AND CONFIDENTIAL INFORMATION OF STACKARMOR 3
4. ETL System Overview
PROPRIETARY AND CONFIDENTIAL INFORMATION OF STACKARMOR 4
[Diagram: ETL data flow]
1 – The parsing component reads the input record files and
2 – stores them as JSON documents in a staging collection.
3 – The ingestion component reads each JSON document,
4 – searches the index for documents that might be a match, and
5 – gets back a set of candidate document IDs.
6 – After evaluating the candidates, it merges the record with an existing document in the final collection, or saves it as a new document.
5. Summary of Process
PROPRIETARY AND CONFIDENTIAL INFORMATION OF STACKARMOR 5
• Customer receives batch files with user data from different sources
• Needs to reconcile with existing records
• Update info or create new records
• Most users already exist in the DBs
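The deck itself does not include the ingestion code, but the flow on the two slides above maps onto a small match-or-merge loop. Below is a minimal sketch assuming hypothetical database, collection, and core names, a stand-in matching rule, and that the Solr document id equals the MongoDB _id; the real system was a Java/Tomcat application, so this Python version is purely illustrative.

```python
# Illustrative sketch of the match-or-merge ingestion step described above.
# Collection names, the Solr core, query fields, and the matching/merge rules
# are assumptions, not taken from the actual stackArmor implementation.
import requests
from pymongo import MongoClient

SOLR_SELECT = "http://localhost:8983/solr/people/select"   # hypothetical core name

client = MongoClient("mongodb://localhost:27017")           # each app node talks to a local mongos
staging = client["etl"]["staging"]                           # JSON docs written by the parser
final = client["etl"]["final"]                               # reconciled records

def candidate_ids(doc):
    """Ask Solr for documents that might match this record (steps 4-5)."""
    resp = requests.get(SOLR_SELECT, params={
        "q": 'last_name:"%s" AND zip:"%s"' % (doc["last_name"], doc["zip"]),
        "fl": "id",
        "rows": 20,
        "wt": "json",
    })
    resp.raise_for_status()
    return [d["id"] for d in resp.json()["response"]["docs"]]

def ingest_one(doc):
    """Merge with an existing final-collection doc, or save as new (step 6)."""
    for cid in candidate_ids(doc):
        existing = final.find_one({"_id": cid})              # assumes Solr id == Mongo _id
        if existing and existing.get("dob") == doc.get("dob"):  # stand-in matching rule
            final.update_one({"_id": cid}, {"$set": doc})        # stand-in merge: overwrite fields
            return
    final.insert_one(doc)

for doc in staging.find():                                   # step 3: read each staged JSON doc
    ingest_one(doc)
```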
6. The Stack
PROPRIETARY AND CONFIDENTIAL INFORMATION OF STACKARMOR 6
• MongoDB 2.6
AWS environment with 10 MongoDB shards, each with one primary, one replica, and one arbiter. The primaries and replicas run on r3.2xlarge instances. Each has 4 EBS volumes attached: 1x 1 TB, 1x 60 GB, and 2x 25 GB with 3000, 180/3000, 250, and 200 IOPS, respectively.
• MongoConfig
Three mongo config servers run on 3 m3.large instances. These instances also run the ZooKeeper processes used to manage the SolrCloud cluster. There were issues with ZooKeeper stability.
• Mongos
Mongos nodes run on the same VM as the application logic. Each application server connects to a local mongos node.
• Solr 5.4.1
Running a SolrCloud cluster with 20 shards and a replication factor of 5. Each primary and secondary is deployed to a c4.4xlarge instance. Each instance has 3 EBS volumes: 2x 60 GB and 1x 200 GB with 180/3000, 1000, and 600/3000 IOPS, respectively.
• Application Nodes
Ingestion/indexing nodes run on c4.xlarge instances; 10 such nodes running Tomcat 7.
7. Design Considerations
• Wanted to build the environment by "hand" to save time and meet project goals – automate now or later?
◦ Which automation and orchestration technology to use?
• What is the optimal shard-to-replica ratio?
• To P-IOPS or not to P-IOPS? Or supersize the instance?
• RAID or not? If so, which level?
• Scale-up or Scale-out? Cost considerations
• Solr versus SolrCloud?
• Goal was to process 5 million records per hour
• Money, Money, Money?
PROPRIETARY AND CONFIDENTIAL INFORMATION OF STACKARMOR 7
8. The Math
• Needed to process a total data set of 2 TB for Staging and 2 TB for Production
◦ Estimated 50 shards with 80 GB per shard (see the arithmetic sketch after this slide)
◦ Choose an instance size with enough memory to fit 2 TB / 50 shards
• Separate mongos nodes, or co-located with the app nodes?
• P-IOPS disks vs. RAID 0
◦ Cost vs. performance
PROPRIETARY AND CONFIDENTIAL INFORMATION OF STACKARMOR 8
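The sizing bullets above reduce to simple arithmetic. The sketch below uses only the figures on this slide plus the r3.2xlarge RAM figure from the instance table on the next slide; it is one reading of the bullets, not the team's actual capacity model.

```python
# Back-of-the-envelope shard sizing from the figures on this slide.
# Only the r3.2xlarge RAM value is pulled from the instance table on the
# next slide; everything else is simple arithmetic, not the team's model.
staging_tb, production_tb, shards = 2, 2, 50

data_per_shard_gb = (staging_tb + production_tb) * 1024 / shards
ram_target_per_shard_gb = production_tb * 1024 / shards   # "memory to fit 2 TB / 50 shards"

print(f"data per shard:       ~{data_per_shard_gb:.0f} GB")        # ~80 GB, matching the slide's estimate
print(f"RAM target per shard: ~{ram_target_per_shard_gb:.0f} GB")  # ~40 GB
print("r3.2xlarge RAM:        61 GB  -> covers the per-shard RAM target")
```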
9. Instance Selection
• Scale-up versus Scale-out
• IO Calculations
PROPRIETARY AND CONFIDENTIAL INFORMATION OF STACKARMOR 9
Name | vCPU | Memory (GB) | Instance Storage (GB) | I/O | On-Demand ($/hr) | Cost per GB RAM | Cost per Core per GB RAM
r3.8xlarge | 32 | 244 | SSD 2 x 320 | 10 Gigabit | $2.6600 | $0.01090 | $0.0000447
r3.4xlarge | 16 | 122 | SSD 1 x 320 | High | $1.3300 | $0.01090 | $0.0000894
r3.2xlarge | 8 | 61 | SSD 1 x 160 | High | $0.6650 | $0.01090 | $0.0001787
m2.4xlarge | 8 | 68.4 | 2 x 840 | High | $0.9800 | $0.01433 | $0.0002095
cr1.8xlarge | 32 | 244 | SSD 2 x 120 | 10 Gigabit | $3.5000 | $0.01434 | $0.0000588
m4.xlarge | 4 | 16 | -- | High | $0.2390 | $0.01494 | $0.0009336
m4.10xlarge | 40 | 160 | -- | 10 Gigabit | $2.3940 | $0.01496 | $0.0000935
m4.4xlarge | 16 | 64 | -- | High | $0.9580 | $0.01497 | $0.0002339
m4.2xlarge | 8 | 32 | -- | High | $0.4790 | $0.01497 | $0.0004678
m3.2xlarge | 8 | 30 | SSD 2 x 80 | High | $0.5320 | $0.01773 | $0.0005911
m3.xlarge | 4 | 15 | SSD 2 x 40 | High | $0.2660 | $0.01773 | $0.0011822
d2.8xlarge | 36 | 244 | HDD 24 x 2000 | 10 Gigabit | $5.5200 | $0.02262 | $0.0000927
This helped us find the right instance for us; there were other considerations as well – P-IOPS or not? (The per-GB-RAM figures are reproduced in the sketch after this slide.)
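The cost-per-GB-of-RAM column above is just the on-demand hourly price divided by instance memory. A quick sketch reproducing that comparison for a few rows, with prices as listed on the slide (2016 on-demand rates):

```python
# Reproduce the "cost per GB RAM" comparison from the table above.
# Memory sizes and prices are copied from the slide (2016 on-demand rates).
instances = {
    "r3.8xlarge":  (244, 2.6600),
    "r3.2xlarge":  (61, 0.6650),
    "m4.10xlarge": (160, 2.3940),
    "m4.xlarge":   (16, 0.2390),
    "d2.8xlarge":  (244, 5.5200),
}

# Sort by hourly cost per GB of RAM, cheapest first.
for name, (ram_gb, usd_per_hour) in sorted(
        instances.items(), key=lambda kv: kv[1][1] / kv[1][0]):
    print(f"{name:>12}: ${usd_per_hour / ram_gb:.5f} per GB RAM per hour")
```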
10. Initial Design
• <insert initial design>
PROPRIETARY AND CONFIDENTIAL INFORMATION OF STACKARMOR 10
11. Final Design
• MongoDB nodes
◦ 10x c4.8xlarge
• Solr nodes
◦ 100x c4.4xlarge
◦ Max heap of 6 GB
◦ Replication factor of 1 (sharded, non-cloud mode)
• Separate EBS volumes for MongoDB data, journal, and logs
• 4x 1 TB EBS volumes in RAID 0 for MongoDB data
PROPRIETARY AND CONFIDENTIAL INFORMATION OF STACKARMOR 11
12. The tuning journey
• Driving optimal load from the mongos nodes to the Solr nodes
• Throttling the number of Java threads so they generate only the load the cluster can handle
• Maximizing CPU utilization on the Solr nodes without saturating network bandwidth
• Use of the 10 Gigabit network between EC2 instances
• Determining the IOPS required to maximize throughput (number of ingestion records processed)
• Gradually increasing threads instead of overloading Solr at the start (sketched after this slide)
• Running the Solr optimize function periodically to sustain steady Solr search response times
PROPRIETARY AND CONFIDENTIAL INFORMATION OF STACKARMOR 12
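The throttling and gradual ramp-up points above can be illustrated with a small sketch. It assumes a hypothetical ingest_batch worker and a fixed ramp schedule; the project's actual throttle logic is not shown in the deck.

```python
# Minimal sketch of gradually ramping up ingestion concurrency instead of
# hitting Solr with the full thread count from the start. The worker function,
# batch source, ceiling, and ramp schedule are illustrative assumptions.
import time
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 32        # ceiling the cluster proved able to absorb (assumed)
RAMP_STEP = 4           # workers added at each ramp step
RAMP_INTERVAL_S = 300   # let Solr settle between ramp steps

def run_ramped(batches, ingest_batch):
    """Submit ingestion batches, raising the in-flight limit gradually."""
    in_flight_limit = RAMP_STEP
    last_ramp = time.monotonic()
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        futures = []
        for batch in batches:
            # Block until we are under the current in-flight limit.
            while sum(1 for f in futures if not f.done()) >= in_flight_limit:
                time.sleep(1)
            futures.append(pool.submit(ingest_batch, batch))
            # Raise the limit every RAMP_INTERVAL_S seconds until the ceiling.
            if in_flight_limit < MAX_WORKERS and time.monotonic() - last_ramp > RAMP_INTERVAL_S:
                in_flight_limit += RAMP_STEP
                last_ramp = time.monotonic()
        for f in futures:
            f.result()   # surface any ingestion errors
```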
13. Automation Saved the Day!
PROPRIETARY AND CONFIDENTIAL INFORMATION OF STACKARMOR 13
[Diagram: a parameterized ETL environment – number of instances, type of instances, and other params – driving the Solr/MongoDB stack. "Tuning by Doing!"]
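The automation itself was built with Chef (the recipes are not reproduced in the deck). As a rough illustration of the parameter-driven idea, the sketch below drives instance counts and types from a small parameter map using boto3; the AMI ID is a placeholder and the counts simply echo the final-design slide.

```python
# Rough illustration of driving environment size and instance types from
# parameters, in the spirit of the Chef-based automation described here.
# The AMI ID is a placeholder; counts/types echo the final-design slide.
import boto3

params = {
    "solr":    {"count": 100, "instance_type": "c4.4xlarge"},
    "mongodb": {"count": 10,  "instance_type": "c4.8xlarge"},
    "app":     {"count": 10,  "instance_type": "c4.xlarge"},
}

ec2 = boto3.client("ec2", region_name="us-east-1")

def provision(role, ami_id="ami-00000000"):          # placeholder AMI
    spec = params[role]
    # Launch the requested number of instances for this role and tag them.
    return ec2.run_instances(
        ImageId=ami_id,
        InstanceType=spec["instance_type"],
        MinCount=spec["count"],
        MaxCount=spec["count"],
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "Role", "Value": role}],
        }],
    )
```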
14. Some charts along the way
PROPRIETARY AND CONFIDENTIAL INFORMATION OF STACKARMOR 14
Graph showing throughput reaching around 75,000 records per 15 minutes for each node. We reached nearly 3 million records per hour, but with spikes (and errors).
15. Things are getting better!
PROPRIETARY AND CONFIDENTIAL INFORMATION OF STACKARMOR 15
Graph showing throughput reaching around 75,000 records per 15 minutes for each node. We reached nearly 3 million records per hour of sustained throughput.
16. What worked to improve throughput?
Our primary goal was to maximize throughput at the optimal price.
• Gradual increase in Solr load (throttling)
• Running Solr optimize frequently to keep the Solr index spread evenly and sustain Solr search times (a minimal sketch follows this slide)
• Use of the 10 Gigabit network between EC2 nodes to improve throughput over the 3 Gigabit network bandwidth
• Use of RAID 0 with 4 disks to improve I/O, keeping the read queue smaller
• Use of a RAID read block size of 32 for MongoDB
• Disabling read-ahead for MongoDB
• Use of enhanced EBS networking (EBS-optimized instances)
PROPRIETARY AND CONFIDENTIAL INFORMATION OF STACKARMOR 16
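The periodic optimize mentioned above is exposed through Solr's update handler. A minimal sketch of triggering it on a schedule follows; the core name and interval are assumptions.

```python
# Minimal sketch of periodically triggering a Solr optimize, as described above.
# The core name and interval are assumptions for illustration.
import time
import requests

SOLR_UPDATE = "http://localhost:8983/solr/people/update"   # hypothetical core name

def optimize_periodically(interval_s=3600):
    while True:
        # Ask Solr to merge index segments so search latency stays steady.
        resp = requests.get(SOLR_UPDATE, params={"optimize": "true", "waitSearcher": "false"})
        resp.raise_for_status()
        time.sleep(interval_s)
```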
17. Results
• Client has an automated, dynamically built environment
• Automated parts of the code deployment process
◦ (Jenkins -> S3 -> Nodes)
• Capabilities delivered include:
◦ Offsite code archiving (S3)
◦ Infrastructure automation
◦ CloudTrail, VPC Flow Logs, and S3 access logs helped with auditing
◦ Created over 100 Chef recipes
PROPRIETARY AND CONFIDENTIAL INFORMATION OF STACKARMOR 17
18. Next Steps
• Upgrade the technology stack and increase instance utilization using Apache Mesos
• Enable self-service for engineers
◦ Create a new MongoDB / Solr cluster
◦ Start a cluster
◦ Stop a cluster
◦ Update the code on a cluster (deploy)
◦ Update configuration parameters
◦ Resize instances within a named cluster
• Create a Rundeck- or Jenkins-based dashboard
• Capture the state of all clusters and generate daily reports for management
• Optimize cost and performance of the ingestion cluster
PROPRIETARY AND CONFIDENTIAL INFORMATION OF STACKARMOR 18
20. Thank you
Gaurav “GP” Pal
Principal,
stackArmor.com
gpal@stackArmor.com
(571) 271 4396
www.stackArmor.com
PROPRIETARY AND CONFIDENTIAL INFORMATION OF STACKARMOR 20