Mixing Analytic Workloads with Greenplum and Apache Spark
VMware Tanzu
Apache Spark is a popular in-memory data analytics engine because of its speed, scalability, and ease of use. It also fits well with DevOps practices and cloud-native software platforms. It’s good for data exploration, interactive analytics, and streaming use cases.
However, Spark, like other data-processing platforms, is not one size fits all. Different versions of Spark support different feature sets, and Spark’s machine-learning libraries can also vary in important ways between versions, or may lack the right algorithm.
In this webinar, you’ll learn:
- How to integrate data warehouse workloads with Spark
- Which workloads are better for Greenplum and for Spark
- How to use the Greenplum-Spark connector
Presenter: Kong Yew Chan, Product Manager, Pivotal
The Pivotal Greenplum-Spark Connector provides high-speed, parallel data transfer between Greenplum Database and Apache Spark clusters to support:
- Interactive data analysis
- In-memory analytics processing
- Batch ETL
- Continuous ETL pipeline (streaming)
Apache Flink is a popular framework for real-time stream computing. Many stream compute algorithms require trailing data in order to compute the intended result. One example is computing the number of user logins in the last 7 days. This creates a dilemma: the results of the stream program are incomplete until the program has been running for more than 7 days. The alternative is to bootstrap the program using historical data to seed the state before shifting to real-time data.
This talk will discuss alternative ways to bootstrap programs in Flink. Some alternatives rely on technologies exogenous to the stream program, such as enhancements to the pub/sub layer, that are more generally applicable to other stream compute engines. Other alternatives include enhancements to Flink source implementations. Lyft is exploring another alternative using orchestration of multiple Flink programs. The talk will cover why Lyft pursued this alternative and future directions to further enhance bootstrapping support in Flink.
Speaker
Gregory Fee, Principal Engineer, Lyft
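The bootstrapping dilemma described in the Flink abstract above can be illustrated with a small sketch. This is plain Python, not Flink code, and it does not represent the approach Lyft presented; the class name `TrailingLoginCounter`, the `bootstrap` method, and the sample events are all hypothetical. It assumes events arrive in timestamp order and shows how seeding state from historical events changes the first live result.

```python
from collections import deque
from datetime import datetime, timedelta

WINDOW = timedelta(days=7)  # trailing window from the "logins in the last 7 days" example


class TrailingLoginCounter:
    """Hypothetical per-user login counter over a trailing time window."""

    def __init__(self, window=WINDOW):
        self.window = window
        self.events = {}  # user_id -> deque of login timestamps, oldest first

    def add(self, user_id, ts):
        """Record a login at ts and return the user's login count in the window ending at ts."""
        q = self.events.setdefault(user_id, deque())
        q.append(ts)
        cutoff = ts - self.window
        # Evict logins that have fallen out of the trailing window.
        while q and q[0] <= cutoff:
            q.popleft()
        return len(q)

    def bootstrap(self, historic_events):
        """Seed state by replaying (user_id, timestamp) pairs in timestamp order."""
        for user_id, ts in sorted(historic_events, key=lambda e: e[1]):
            self.add(user_id, ts)


# Illustrative data: three past logins, one of them older than the window.
now = datetime(2024, 1, 10)
historic = [("alice", now - timedelta(days=d)) for d in (9, 5, 2)]

cold = TrailingLoginCounter()
cold_count = cold.add("alice", now)  # no history: only the live event is counted

warm = TrailingLoginCounter()
warm.bootstrap(historic)
warm_count = warm.add("alice", now)  # history seeded: logins from 5 and 2 days ago count too

print(cold_count, warm_count)
```

The cold-started counter undercounts until it has run for a full window, while the bootstrapped counter is correct from the first live event; that gap is the problem the talk addresses.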
Change Data Streaming Patterns for Microservices With Debezium
confluent
(Gunnar Morling, Red Hat) Kafka Summit SF 2018
Debezium (noun | de·be·zi·um | /dɪ:ˈbɪ:ziːəm/): secret sauce for change data capture (CDC), streaming changes from your datastore, that enables you to solve multiple challenges: synchronizing data between microservices, gradually extracting microservices from existing monoliths, maintaining different read models in CQRS-style architectures, updating caches and full-text indexes, and feeding operational data to your analytics tools.
Join this session to learn what CDC is about, how it can be implemented using Debezium, an open source CDC solution based on Apache Kafka, and how it can be utilized for your microservices. Find out how Debezium captures all the changes from datastores such as MySQL, PostgreSQL and MongoDB, how to react to the change events in near real time, and how Debezium is designed to not compromise on data correctness and completeness even if things go wrong. In a live demo we’ll show how to set up a change data stream out of your application’s database without any code changes needed. You’ll see how to sink the change events into other databases and how to push data changes to your clients using WebSockets.
Real-time Analytics with Trino and Apache Pinot
Xiang Fu
Trino summit 2021:
Overview of the Trino Pinot Connector, which bridges the flexibility of Trino's full SQL support and the power of Apache Pinot's real-time analytics, giving you the best of both worlds.
Presentation at SF Kubernetes Meetup (10/30/18), Introducing TiDB/TiKV
Kevin Xu
This deck was presented at the SF Kubernetes Meetup held at Microsoft's downtown SF office, introducing the architecture of TiDB and TiKV (a CNCF project), key use cases, a user story with Mobike (one of the largest bike-sharing platforms in the world), and how TiDB is deployed across different cloud environments using TiDB Operator.
Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Facebook, Airbnb, Netflix, Uber, Twitter, Bloomberg, and FINRA, Presto has experienced unprecedented growth in popularity in both on-premises and cloud deployments in the last few years.
Inspired by the increasingly complex SQL queries run by the Presto user community, engineers at Facebook and Starburst have recently focused on cost-based query optimization. In this talk we will present the initial design and implementation of the CBO, support for connector-provided statistics, estimating selectivity, and choosing efficient query plans. Then, our detailed experimental evaluation will illustrate the performance gains for several classes of queries achieved thanks to the optimizer. Finally, we will discuss our future work enhancing the initial CBO and present the general Presto roadmap for 2018 and beyond.
Speakers
Kamil Bajda-Pawlikowski, Starburst Data, CTO & Co-Founder
Martin Traverso
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB
YugabyteDB
Slides for the webinar "Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB" by Amey Banarse, Principal Data Architect at Yugabyte, recorded on Oct 30, 2019 at 11 AM Pacific.
Playback here: https://vimeo.com/369929255
Apache AGE and the synergy effect in the combination of Postgres and NoSQL
EDB
In this session, we will introduce the concept of Apache AGE and the synergy effect in the combination of Postgres and NoSQL (Graph Database). We shall discuss the story and background of Apache AGE as an open-source project and introduce challenges that AGE can solve for its users. Furthermore, we will talk about a graph database as an extension to PostgreSQL and how it can support all the functionalities and features of PostgreSQL while offering a graph model in addition. We will also discuss how users with a relational background and data model who need a graph model on top of their existing relational model can use this extension with minimal effort, because they can use existing data without migration to enable a graph database.
RAPIDS: GPU-Accelerated ETL and Feature Engineering
Keith Kraus
The RAPIDS suite of open source software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.
Consolidate Your Technical Debt With Spark Data Sources -Tools and Techniques...
Databricks
Most enterprises have business-critical code that is well maintained and high performing. The switching costs to rewrite or port this code can often prevent adoption of new frameworks due to the level of technical debt. Adding another level of indirection through network proxies often results in an unacceptable performance hit.
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
DataWorks Summit
The Central Bank of the Republic of Turkey is primarily responsible for steering the monetary and exchange rate policies in Turkey.
One of the major core functions of the Bank is market operations. In this context, analyzing and interpreting real-time tick data related to money market instruments has become not only a requirement but also a challenge.
For this use case, an API provided by one of the financial data vendors has been used to gather real-time tick data and data routing has been orchestrated by Apache NiFi.
Gathered data is being transferred to Kafka topics and then handed off to Druid for real-time indexing tasks.
Indicators such as effective cost, bid-ask spread, price impact measures, and return reversal are calculated using Apache Storm and finally visualized by means of Apache Superset in order to provide decision-makers with a new set of tools.
Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events...
Databricks
Continuous integration (CI) pipelines generate massive amounts of messy log data. At Pure Storage engineering, we run over 65,000 tests per day, creating a large triage problem. Spark’s flexible computing platform allows us to write a single application for both streaming and batch jobs to understand the state of our CI pipeline. Spark indexes log data for real-time reporting (Streaming), uses Machine Learning for performance modeling and prediction (Batch job), and re-indexes old data for newly encoded patterns (Batch job). Previous work on a mixed streaming and batch environment describes the options for persisting data and their trade-offs:
1) short interval buckets, which hurt batch performance
2) long interval buckets, which increase micro batch time windows
3) additional software in the background to compact the short interval buckets, which adds complexity.
This talk will go over how we use the filesystem metadata of our disaggregated compute and storage layers to write over half a million files per day of varied sizes from 52 Billion events and have efficient batch jobs without compaction that allow us to process over 40TB per hour. We will go over the challenges and best practices to achieve efficiency in these mixed-environment scenarios.
Dev ops for big data cluster management tools
Ran Silberman
What tools are available today to manage a Hadoop cluster and its ecosystem?
There are two tools ready today:
Cloudera Manager and Ambari from Hortonworks.
In this presentation I explain what they do and why to use them, as well as their pros and cons.
DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...
Haggai Philip Zagury
The overwhelming growth of technologies in the Cloud Native foundation overtook our toolbox and completely changed (well, really enhanced) the Developer Experience.
In this talk, I will try to provide my personal journey from the "Operator to Developer's chair" and the practices which helped me along my journey as a Cloud-Native Dev ;)
What if …
- Traditional, labour-intensive backup and archive practices for your MySQL, MariaDB, MongoDB and PostgreSQL databases were a thing of the past?
- You could have one backup management solution for all your business data?
- You could ensure integrity of all your backups?
- You could leverage the competitive pricing and almost limitless capacity of cloud-based backup while meeting cost, manageability, and compliance requirements from the business?
Welcome to our webinar on Backup Management with ClusterControl.
ClusterControl’s centralized backup management for open source databases provides you with hot backups of large datasets, point in time recovery in a couple of clicks, at-rest and in-transit data encryption, data integrity via automatic restore verification, cloud backups (AWS, Google and Azure) for Disaster Recovery, retention policies to ensure compliance, and automated alerts and reporting.
Whether you are looking at rebuilding your existing backup infrastructure, or updating it, this webinar is for you!
AGENDA
- Backup and recovery management of local or remote databases
- Logical or physical backups
- Full or Incremental backups
- Position or time-based Point in Time Recovery (for MySQL and PostgreSQL)
- Upload to the cloud (Amazon S3, Google Cloud Storage, Azure Storage)
- Encryption of backup data
- Compression of backup data
- One centralized backup system for your open source databases (Demo)
- Schedule, manage and operate backups
- Define backup policies, retention, history
- Validation - Automatic restore verification
- Backup reporting
SPEAKER
Bartlomiej Oles, Senior Support Engineer at Severalnines, is a MySQL and Oracle DBA with over 15 years of experience in managing highly available production systems at IBM, Nordea Bank, Acxiom, Lufthansa, and other Fortune 500 companies. In the past five years, his focus has been on building and applying automation tools to manage multi-datacenter database environments.
Prometheus - Intro, CNCF, TSDB, PromQL, Grafana
Sridhar Kumar N
https://www.youtube.com/playlist?list=PLAiEy9H6ItrKC5PbH7KiELiSEIKv3tuov
- What is Prometheus?
- Differences between Nagios and Prometheus
- Architecture
- Alertmanager
- Time series DB
- PromQL (Prometheus Query Language)
- Live Demo
- Grafana
NetflixOSS Meetup S3 E1, covering latest components in Distributed Databases, Telemetry systems, Big Data tools and more. Speakers from Netflix, IBM Watson, Pivotal and Nike Digital
Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...
DataStax
During this session, Ben Lackey (DataStax) and Ravi Madasu (Google) will cover best practices for quickly setting up a cluster on Google Cloud Platform (GCP) using both Google Compute Engine (GCE) and Google Container Engine (GKE), which is based on Kubernetes and Docker.
About the Speakers
Ben Lackey, Partner Architect, DataStax
I work in the Cloud Strategy group at DataStax where I concentrate on improving the integration between DataStax Enterprise and cloud platforms including Azure, GCP and Pivotal.
Ravi Madasu
Ravi Madasu is a program manager at Google, primarily focused on Google Cloud Launcher. He works closely with ISV partners to make their products and services available on the Google Cloud Platform, providing a developer-friendly deployment experience. He has 15+ years of experience, working in a variety of roles such as software engineer, project manager and product manager. Ravi received a master's degree in Information Systems from Northeastern University and an MBA from Carnegie Mellon University.
Enter the world of cloud computing and software development with PaaS. Learn what it takes to create a production-ready application with Heroku and how to run it.
Terraforming your Infrastructure on GCP
Samuel Chow
A talk I gave at the Google Cloud Platform LA Meetup event at Google Playa Vista on Nov 6, 2019. This is a 1+ hour-long, tutorial-oriented talk on Infrastructure as Code (IaC), Terraform (as a toolset for IaC and modern DevOps), and how to leverage the practice and tools in defining, deploying, and managing your infrastructure in GCP.
Next Generation Cloud Computing With Google - RightScale Compute 2013
RightScale
Speaker: Martin Gannholm - Lead Engineer, Google
Google Cloud Platform provides everything you need to build, run, and scale social, mobile, and online applications. Already, tens of thousands of popular applications like Khan Academy, Angry Birds, Snapchat, and Pulse are benefiting from the power of running on top of Google infrastructure. Come join Google as we go deep on how to best leverage our technology with RightScale to build your next masterpiece.
Join this info-packed and hands-on workshop where we will cover:
Introduction to Kubernetes & GitOps talk:
We'll cover the most popular path that has brought success to many users already - GitOps as a natural evolution of Kubernetes. We'll give an overview of how you can benefit from Kubernetes and GitOps: greater security, reliability, velocity and more. Importantly, we cover definitions and principles standardized by the CNCF's OpenGitOps group and what it means for you.
Get Started with GitOps:
You'll have GitOps up and running in about 30 mins using our free and open source tools! We'll give a brief vision of where you want to be with those security, reliability, and velocity benefits, and then we'll support you while you go through the getting started steps. During the workshop, you'll also see demos in action for:
* an opinionated repo structure to minimize decision fatigue
* disaster recovery using GitOps
* Helm charts example
* Multi-cluster example
* all with free and open source tools mostly in the CNCF (e.g. Flux and Helm).
If you have questions before or after the workshop, talk to us at #weave-gitops http://bit.ly/WeaveGitOpsSlack (If you need to invite yourself to the Slack, visit https://slack.weave.works/)
Architecting Analytic Pipelines on GCP - Chicago Cloud Conference 2020
Mariano Gonzalez
Modernizing analytics data pipelines to get the most out of your data while optimizing costs can be challenging. However, today cloud providers offer a good set of services that can help with this endeavor. We will tour some GCP services during this hands-on session, using Dataflow (Apache Beam) as the backbone to architect a modern analytics pipeline that wires them all together.
Choosing the right cloud provider to host your mission-critical application is a multi-dimensional challenge faced by most companies. This presentation provides information on the various benefits, challenges, and insights involved in evaluating a cloud provider, and helps you quickly decide on the right cloud partner to host your application.
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
Nicolas Brousse
TubeMogul grew from a few servers to over two thousand servers handling over one trillion HTTP requests a month, each processed in less than 50ms. To keep up with the fast growth, the SRE team had to implement an efficient Continuous Delivery infrastructure that allowed over 10,000 Puppet deployments and 8,500 application deployments in 2014. In this presentation, we will cover the nuts and bolts of the TubeMogul operations engineering team and how they overcome challenges.
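To put the TubeMogul figures above in perspective, the following back-of-the-envelope sketch converts the monthly totals into average rates. It rests on assumptions not stated in the abstract: a 30-day month, traffic spread evenly over time, and the quoted "over" figures treated as lower bounds. The helper name `average_rate_per_second` is illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_per_second(total, days):
    """Average events per second, assuming an even spread over the period."""
    return total / (days * SECONDS_PER_DAY)


# "Over one trillion HTTP requests a month", assuming a 30-day month.
requests_per_second = average_rate_per_second(1_000_000_000_000, 30)

# Little's law: requests in flight = arrival rate * time in system.
# 50 ms is the quoted upper bound on processing time, so this is an upper estimate.
max_in_flight = requests_per_second * 0.050

# "Over 10,000 Puppet deployments and 8,500 application deployments in 2014".
deployments_per_day = (10_000 + 8_500) / 365

print(f"{requests_per_second:,.0f} requests/s on average")
print(f"{max_in_flight:,.0f} requests in flight at most, on average")
print(f"{deployments_per_day:.1f} deployments/day on average")
```

These are averages only; real traffic has peaks well above the mean, so capacity planning would need the peak-to-average ratio, which the abstract does not give.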
Developed for the Denver Art Museum by Ashley Blewer, this slide deck covers some of the basics of diagnosing issues with Archivematica. Ashley covers everything from the software components involved with Archivematica to monitoring logs, system monitoring, and upgrading your system. The presentation concludes with some useful links for tech-savvy preservationists and system administrators unfamiliar with Archivematica!
Multiplier Effect: Case Studies in Distributions for Publishers
Jon Peck
Join members from both Four Kitchens and Meredith Agrimedia as they discuss the experience of migration and relaunch of the digital presence of two magazines: Successful Farming at Agriculture.com and WOOD Magazine at woodmagazine.com.
We'll start by discussing the scope of the projects, delve into the commonalities and differences, explore their common advertising and analytics implementation, and analyze the unified distribution that supports both brands. By developing the infrastructure simultaneously, brand-agnostic functionality became a priority which in turn created a more modular and flexible system that facilitated open-sourcing and cross-organizational sharing. Thanks to the codebase approach and experience, the first site took about 6 months and the second took less than 6 weeks.
Is your flash system up to the challenge? Attend this webinar and learn how you can optimize your SQL Server performance. Hear how the pros pinpoint performance bottlenecks and leverage the latest advancements in storage technology to decrease access latency and IO wait times. By the end of the webinar you’ll have the tools and information you need to recommend the best approach for your SQL Server environment.
AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...
Amazon Web Services
Accelerated computing is on the rise because of massively parallel, compute-intensive workloads such as deep learning, 3D content rendering, financial computing, and engineering simulations. In this session, we provide an overview of our accelerated computing instances, including how to choose instances based on your application needs, best practices and tips to optimize performance, and specific examples of accelerated computing in real-world applications.
Summary: In this talk you’ll learn how to implement and deploy a basic serverless Python application.
Serverless is a concept that has recently risen in popularity, boosted by the drive to financially optimize usage of computing power in cloud environments while reducing maintenance efforts. The following topics will be covered in this talk:
- What is a serverless application?
- What are the benefits of the serverless execution model?
- What is AWS Lambda?
- How to implement a basic Python serverless application with AWS Lambda
- How to implement a serverless Python-based web service using Zappa
Similar to Pivotal Greenplum Cloud Marketplaces - Greenplum Summit 2019 (20)
The Tanzu Developer Connect is a hands-on workshop that dives deep into TAP. Attendees receive a hands-on experience. This is a great program to leverage accounts with current TAP opportunities.
Unleash Unlimited Potential with One-Time Purchase
BoxLang is more than just a language; it's a community. By choosing a Visionary License, you're not just investing in your success, you're actively contributing to the ongoing development and support of BoxLang.
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
How to Position Your Globus Data Portal for Success Ten Good Practices
Globus
Science gateways allow science and engineering communities to access shared data, software, computing services, and instruments. Science gateways have gained a lot of traction in the last twenty years, as evidenced by projects such as the Science Gateways Community Institute (SGCI) and the Center of Excellence on Science Gateways (SGX3) in the US, The Australian Research Data Commons (ARDC) and its platforms in Australia, and the projects around Virtual Research Environments in Europe. A few mature frameworks have evolved with their different strengths and foci and have been taken up by a larger community such as the Globus Data Portal, Hubzero, Tapis, and Galaxy. However, even when gateways are built on successful frameworks, they continue to face the challenges of ongoing maintenance costs and how to meet the ever-expanding needs of the community they serve with enhanced features. It is not uncommon that gateways with compelling use cases are nonetheless unable to get past the prototype phase and become a full production service, or if they do, they don't survive more than a couple of years. While there is no guaranteed pathway to success, it seems likely that for any gateway there is a need for a strong community and/or solid funding streams to create and sustain its success. With over twenty years of examples to draw from, this presentation goes into detail for ten factors common to successful and enduring gateways that effectively serve as best practices for any new or developing gateway.
Your Digital Assistant.
Making complex approach simple. Straightforward process saves time. No more waiting to connect with people that matter to you. Safety first is not a cliché - Securely protect information in cloud storage to prevent any third party from accessing data.
Would you rather make your visitors feel burdened by making them wait? Or choose VizMan for a stress-free experience? VizMan is an automated visitor management system that works for any industries not limited to factories, societies, government institutes, and warehouses. A new age contactless way of logging information of visitors, employees, packages, and vehicles. VizMan is a digital logbook so it deters unnecessary use of paper or space since there is no requirement of bundles of registers that is left to collect dust in a corner of a room. Visitor’s essential details, helps in scheduling meetings for visitors and employees, and assists in supervising the attendance of the employees. With VizMan, visitors don’t need to wait for hours in long queues. VizMan handles visitors with the value they deserve because we know time is important to you.
Feasible Features
One Subscription, Four Modules – Admin, Employee, Receptionist, and Gatekeeper ensures confidentiality and prevents data from being manipulated
User Friendly – can be easily used on Android, iOS, and Web Interface
Multiple Accessibility – Log in through any device from any place at any time
One app for all industries – a Visitor Management System that works for any organisation.
Stress-free Sign-up
Visitor is registered and checked-in by the Receptionist
Host gets a notification, where they opt to Approve the meeting
Host notifies the Receptionist of the end of the meeting
Visitor is checked-out by the Receptionist
Host enters notes and remarks of the meeting
Customizable Components
Scheduling Meetings – Host can invite visitors for meetings and also approve, reject and reschedule meetings
Single/Bulk invites – Invitations can be sent individually to a visitor or collectively to many visitors
VIP Visitors – Additional security of data for VIP visitors to avoid misuse of information
Courier Management – Keeps a check on deliveries like commodities being delivered in and out of establishments
Alerts & Notifications – Get notified on SMS, email, and application
Parking Management – Manage availability of parking space
Individual log-in – Every user has their own log-in id
Visitor/Meeting Analytics – Evaluate notes and remarks of the meeting stored in the system
Visitor Management System is a secure and user friendly database manager that records, filters, tracks the visitors to your organization.
"Secure Your Premises with VizMan (VMS) – Get It Now"
Check out the webinar slides to learn more about how XfilesPro transforms Salesforce document management by leveraging its world-class applications. For more details, please connect with sales@xfilespro.com
If you want to watch the on-demand webinar, please click here: https://www.xfilespro.com/webinars/salesforce-document-management-2-0-smarter-faster-better/
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...informapgpstrackings
Keep tabs on your field staff effortlessly with Informap Technology Centre LLC. Real-time tracking, task assignment, and smart features for efficient management. Request a live demo today!
For more details, visit us : https://informapuae.com/field-staff-tracking/
A Comprehensive Look at Generative AI in Retail App Testing.pdfkalichargn70th171
Traditional software testing methods are being challenged in retail, where customer expectations and technological advancements continually shape the landscape. Enter generative AI—a transformative subset of artificial intelligence technologies poised to revolutionize software testing.
Developing Distributed High-performance Computing Capabilities of an Open Sci...Globus
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
Software Engineering, Software Consulting, Tech Lead.
Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Security,
Spring Transaction, Spring MVC,
Log4j, REST/SOAP WEB-SERVICES.
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTier1 app
Even though at surface level ‘java.lang.OutOfMemoryError’ appears as one single error; underlyingly there are 9 types of OutOfMemoryError. Each type of OutOfMemoryError has different causes, diagnosis approaches and solutions. This session equips you with the knowledge, tools, and techniques needed to troubleshoot and conquer OutOfMemoryError in all its forms, ensuring smoother, more efficient Java applications.
Into the Box Keynote Day 2: Unveiling amazing updates and announcements for modern CFML developers! Get ready for exciting releases and updates on Ortus tools and products. Stay tuned for cutting-edge innovations designed to boost your productivity.
How Recreation Management Software Can Streamline Your Operations.pptxwottaspaceseo
Recreation management software streamlines operations by automating key tasks such as scheduling, registration, and payment processing, reducing manual workload and errors. It provides centralized management of facilities, classes, and events, ensuring efficient resource allocation and facility usage. The software offers user-friendly online portals for easy access to bookings and program information, enhancing customer experience. Real-time reporting and data analytics deliver insights into attendance and preferences, aiding in strategic decision-making. Additionally, effective communication tools keep participants and staff informed with timely updates. Overall, recreation management software enhances efficiency, improves service delivery, and boosts customer satisfaction.
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work the team is investigating ways to speedup the time to solution for many different parts of the DIII-D workflow including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks and we describe a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
Designing for Privacy in Amazon Web ServicesKrzysztofKkol1
Data privacy is one of the most critical issues that businesses face. This presentation shares insights on the principles and best practices for ensuring the resilience and security of your workload.
Drawing on a real-life project from the HR industry, the various challenges will be demonstrated: data protection, self-healing, business continuity, security, and transparency of data processing. This systematized approach allowed to create a secure AWS cloud infrastructure that not only met strict compliance rules but also exceeded the client's expectations.
Globus Connect Server Deep Dive - GlobusWorld 2024Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...Juraj Vysvader
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I didn't get rich from it but it did have 63K downloads (powered possible tens of thousands of websites).
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Globus
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
3. Goals for Cloud Deployments
■ Fast
■ Leverage the Cloud
■ Same Experience Across Clouds
■ Secure
4. Goal 1 - Fast
● Companies use Greenplum for SPEED
● Cloud Deployments Must be Fast too
5. Performance Tuning
What is Tuned?
● Virtual Machine
● Operating System
● Disk
● Memory
● Network
● Marketplace Template
How is it Measured?
● "gpcheckperf" (Greenplum Utility) for Network and Disk
● TPC-DS Benchmark
● Cloud Vendor Specs
6. TPC-DS Performance Test Score
● Transaction Processing Performance Council (TPC)
● Members include:
○ Pivotal, Cloudera, HP, IBM, Microsoft, MapR, Oracle, Red Hat, Teradata, Intel, VMware, Dell, and many others
● Decision Support (DS): Standard for Big Data / Data Warehousing
● Star Schema with 24 Tables and 99 Queries
● 3TB of data
● 1 and 5 Users
https://github.com/pivotalguru/tpc-ds
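The 1-user and 5-user runs above can be pictured with a small timing harness. This is a minimal sketch, not the method used by the linked tpc-ds repository: `run_query` is a hypothetical stand-in that sleeps instead of executing SQL, and the function names are assumptions made for illustration. It only shows the shape of the test, where each simulated user runs all 99 queries and the elapsed wall-clock time is recorded.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_query(query_id: int) -> int:
    """Hypothetical stand-in for executing one benchmark query."""
    time.sleep(0.001)  # placeholder for real query execution
    return query_id


def run_user(query_ids):
    """One simulated user runs every query in order; returns the count completed."""
    return sum(1 for q in query_ids if run_query(q) == q)


def run_benchmark(num_users: int, num_queries: int = 99):
    """Run num_users concurrent sessions; return (queries_completed, elapsed_seconds)."""
    query_ids = list(range(1, num_queries + 1))
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_users) as pool:
        completed = sum(pool.map(run_user, [query_ids] * num_users))
    return completed, time.perf_counter() - start


# The deck reports results for 1 and 5 concurrent users.
for users in (1, 5):
    completed, elapsed = run_benchmark(users)
    print(f"{users} user(s): {completed} queries in {elapsed:.2f}s")
```

In a real run, `run_query` would submit the query text to the database and the elapsed times would feed whatever scoring the benchmark kit applies.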
7. TPC-DS Performance Test
● Score is a Function of Duration and Hardware
● Larger Score = Faster
9. Goal 2 - Leverage The Cloud
● Take Advantage of Cloud-Only Features
○ On-Demand Provisioning
○ Node Replacement
○ Disk Snapshots
○ Upgrades
○ Optional Installations
○ Web Based
10. On-Demand Provisioning
● Deployments Take less than 1 Hour to Complete
● Removes Barriers to Evaluate and Buy
● Empowers Business Units
Azure Resource Group Deployment
AWS CloudFormation
GCP Deployment Manager
11. Node Replacement
Pivotal Greenplum Self-Healing
● ANY Node Failure gets Automatically Replaced and Recovered
● Full Recovery in as little as 5 Minutes
○ On-Premises Recovery can last for Days!
● Online Recovery for Standby and Segment Hosts
● pgBouncer pause before Rebalance
[Diagram: five VMs, one marked X]
Demos in Pivotal Booth!
12. Node Replacement
Pivotal Greenplum Self-Healing
Single Master
● Maintains High Availability
● No Performance Loss
● Fast Recovery with Self-Healing
● Save $$ on Infrastructure and Licensing Costs
[Diagram: master host mdw (Master) and segment hosts sdw1 (Standby, Seg1 to Seg4), sdw2 (Seg5 to Seg8), sdw3 (Seg9 to Seg12), ..., joined by the Interconnect]
13. Disk Snapshots
gpsnap
● Schedule, Create, List, Delete, and Restore Snapshots with "gpsnap" and "gpcronsnap"
● IaaS Snapshots Provide Fast Backup of a Volume
● Full Cluster Backup Measured in Minutes
● Automatically Configured to take a Weekly Snapshot Backup
● Snapshots are executed in Parallel so they are very FAST!
Data Volume Snapshot Restore
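The point about parallel execution can be illustrated with a short sketch. It does not show how "gpsnap" is implemented: `snapshot_volume` is a hypothetical stand-in for a cloud provider's snapshot call, and the volume names are invented. The sketch only demonstrates the idea that issuing every volume's snapshot request at once keeps a full-cluster backup close to the time of the slowest single volume.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone


def snapshot_volume(volume_id: str, label: str) -> str:
    """Hypothetical stand-in for a cloud provider's 'create snapshot' call."""
    return f"snap-{volume_id}-{label}"


def snapshot_cluster(volume_ids, label=None):
    """Snapshot every volume concurrently; return {volume_id: snapshot_id}."""
    label = label or datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
    # One worker per volume so all snapshot requests are issued together.
    with ThreadPoolExecutor(max_workers=max(1, len(volume_ids))) as pool:
        snapshot_ids = list(pool.map(lambda v: snapshot_volume(v, label), volume_ids))
    return dict(zip(volume_ids, snapshot_ids))


# Illustrative volume names only.
print(snapshot_cluster(["mdw-data", "sdw1-data", "sdw2-data"], label="weekly"))
```

Sharing one label across all volumes is a design choice in this sketch: it lets a later restore pick up a consistent set of snapshots for the whole cluster.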
14. Upgrades
gprelease
● Notification of New Version Availability with gpcronrelease (Executes Weekly)
● Installation of New Version with gprelease
● Existing Optional Packages (MADlib, PostGIS, Command Center, etc.) Re-Installed and Upgraded if Needed
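A weekly new-version check comes down to comparing the installed version with the latest published one. The following is a minimal sketch of that comparison only; it is not the logic of "gpcronrelease" or "gprelease", and the function names and version strings are assumptions chosen for illustration.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version string such as '6.10.0' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def upgrade_available(installed: str, latest: str) -> bool:
    """Return True when the latest version is newer than the installed one."""
    return parse_version(latest) > parse_version(installed)


# Illustrative version numbers only.
if upgrade_available("6.9.1", "6.10.0"):
    print("A newer version is available.")
```

Comparing integer tuples instead of raw strings avoids the mistake of ranking "6.9" above "6.10".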
15. Optional Installations
gpoptional
● Deployment Parameters to Install Components
● Or Post Deployment Tool gpoptional
● Included Packages
○ Command Center
○ Data Science R and Python
○ MADlib
○ PostGIS
○ PL/R
17. Goal 3 - Same Experience Across Clouds
● Similar Deployment
● Same Tools
● Same Software Versions
18. Parameters - Basics
Parameter | AWS | Azure | GCP
Name? | Stack Name | Deployment Name | Deployment Name
Where Deployed? | Availability Zone | Resource Group + Location | Zone
SSH Key? | Key Name | SSH Public Key | N/A
Who Can Access? | SSH Location | SSH Location | SSH Location
Subnet CIDR | ClusterSubnet | Subnet | Subnet
Instance Type? | Instance Type+Storage | Instance Type+Storage | Instance Type
Instance Storage? | N/A | N/A | Node Storage
How Many? | Instance Count | Instance Count | Node Count
● GCP SSH Key is Managed Automatically
● Azure Deployments are in a Resource Group as well as in a Location
● AWS and Azure Storage is set by Instance Type for Optimal Performance
● GCP Disk Size does not impact performance
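The parameter comparison above can also be held as a lookup table, which is handy when scripting against more than one marketplace. This is a minimal sketch: the parameter names are taken from the slide, while the dictionary keys (`name`, `location`, and so on) and the helper function are assumptions introduced for illustration. `None` marks a setting the slide lists as N/A.

```python
# Parameter names per cloud, as listed on the slide. None means N/A.
DEPLOYMENT_PARAMETERS = {
    "name": {"aws": "Stack Name", "azure": "Deployment Name", "gcp": "Deployment Name"},
    "location": {"aws": "Availability Zone", "azure": "Resource Group + Location", "gcp": "Zone"},
    "ssh_key": {"aws": "Key Name", "azure": "SSH Public Key", "gcp": None},
    "access": {"aws": "SSH Location", "azure": "SSH Location", "gcp": "SSH Location"},
    "subnet": {"aws": "ClusterSubnet", "azure": "Subnet", "gcp": "Subnet"},
    "instance_type": {"aws": "Instance Type+Storage", "azure": "Instance Type+Storage", "gcp": "Instance Type"},
    "instance_storage": {"aws": None, "azure": None, "gcp": "Node Storage"},
    "how_many": {"aws": "Instance Count", "azure": "Instance Count", "gcp": "Node Count"},
}


def parameter_name(setting: str, cloud: str):
    """Return the cloud-specific parameter name, or None when it does not apply."""
    return DEPLOYMENT_PARAMETERS[setting][cloud.lower()]


for cloud in ("aws", "azure", "gcp"):
    print(cloud, "->", parameter_name("how_many", cloud))
```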
19. Parameters - AWS
Parameter | AWS
Name? | Stack Name
SSH Key? | Key Name
Who Can Access? | SSH Location
Where Deployed? | Availability Zone
Subnet CIDR | Subnet
Instance Type? | Instance Type+Storage
How Many? | Instance Count
20. Parameters - Azure
Parameter | Azure
Name? | Deployment Name
SSH Key? | SSH Public Key
Who Can Access? | SSH Location
Subnet | Subnet
Where Deployed? | Resource Group + Location
22. Parameters - GCP
Parameter | GCP
Name? | Deployment Name
Where Deployed? | Zone
Subnet | Subnet
Instance Type? | Instance Type
How Many? | Node Count
Instance Storage? | Node Storage
Who Can Access? | SSH Location
Dynamic SSH Keys
23. Parameters - Optional Installs
Parameter | AWS | Azure | GCP
Install? | Command Center | Command Center | Command Center
Install? | MADlib | MADlib | MADlib
Install? | Data Science Python | Data Science Python | Data Science Python
Install? | Data Science R | Data Science R | Data Science R
Install? | PL/R | PL/R | PL/R
Install? | PostGIS | PostGIS | PostGIS
● Optional Installs performed by "gpoptional"
30. Documentation
● Release Notes
○ Detailed Information
○ Located on Each Marketplace Listing
● Overview
○ One Pager
○ Located on Each Marketplace Listing