This document discusses how MongoDB supports lean and agile development approaches. It describes key lean concepts like eliminating waste and continuous innovation. Agile frameworks like Scrum and Kanban are also covered. Several case studies are presented that demonstrate how MongoDB was used for real-time analytics, operational intelligence, and large-scale big data projects involving terabytes of mobile network data. The document concludes by emphasizing MongoDB's flexibility for rapidly changing business needs and collecting huge amounts of data.
Scale up - How to build adaptive data systems in the age of virality - Johannes Brandstetter
In this talk we share details about glomex's award-winning data management infrastructure and show how a serverless approach can scale automatically to the demands of a highly unpredictable industry, where video clips can go viral at any moment. What is the best architecture for real-time data processing? How does a batch-driven BI workflow fit in? What are the key benefits of moving to the cloud? Which AWS services should you use?
MongoDB World 2016: Get MEAN and Lean with MongoDB and Kubernetes - MongoDB
1) The document discusses using MongoDB and Kubernetes to reduce impedance mismatches in software stacks and deployment processes.
2) It proposes using a MEAN stack with MongoDB as the database to align the client, server, and data layers. Docker is used to package the application and Kubernetes manages deploying containers across a cluster.
3) The presentation includes demos of deploying a MEAN app to Kubernetes and running MongoDB on Kubernetes, including recovering from node failures through replication and services.
If you implement a microservice architecture correctly, you will end up with a proliferation of different microservices, with multiple instances of each for redundancy. Find out how to get microservices to automatically discover each other and share configuration with real-time updates. See how to eliminate server management altogether with "serverless" microservice frameworks.
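The node-failure recovery the demos rely on can be sketched in miniature. This is a hedged illustration of the routing idea, not the talk's actual code: the names (`pick_primary`, `mongo-0`, the `healthy` flag) are assumptions for the example.

```python
def pick_primary(nodes):
    """Return the first healthy node's host, mimicking how a replica set
    (or a Kubernetes Service) routes traffic around failed members."""
    for node in nodes:
        if node.get("healthy"):
            return node["host"]
    raise RuntimeError("no healthy replica available")

# Example: after mongo-0 fails, traffic moves to mongo-1.
replicas = [
    {"host": "mongo-0", "healthy": False},
    {"host": "mongo-1", "healthy": True},
    {"host": "mongo-2", "healthy": True},
]
```

In practice the MongoDB driver and Kubernetes Services perform this selection for you; the sketch only makes the mechanism visible.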
Google Cloud Platform - Introduction & Certification Path 2018 - Pavan Dikondkar
This document provides an overview of Google Cloud Platform career paths and services. It discusses Google Cloud Platform certification programs and tracks for associate cloud engineer, professional cloud architect, and data engineer. The document then summarizes key Google Cloud Platform services including Compute Engine, Container Engine, App Engine, Load Balancing, Cloud DNS, Cloud Storage, Cloud Datastore, and Cloud SQL. It concludes with an invitation for questions.
RedisConf18 - Transforming Vulnerability Telemetry with Redis Enterprise - Redis Labs
This document summarizes Malwarebytes' transformation to leverage big data and AI by building a data and AI team over 18 months. It discusses challenges around scaling to big data volumes and varieties, handling stateful and real-time data, and massive data caching needs. It then outlines architectural goals and solutions developed using Redis Enterprise to build advanced data visualizations and analytics around malware detections, including infection maps, AV failure rates over time, and malware velocity trends. Redis Enterprise helped scale to large data volumes and processing needs cost effectively with high performance and availability.
MongoDB World 2016: NOW TV and Linear Streaming: Scaling MongoDB for High Loa... - MongoDB
The document discusses improvements made to NOW TV's streaming platform to better handle unpredictable load from live linear streaming events. Key issues addressed include:
- Heartbeats were changed to not terminate streams on non-OK responses to be more resilient during outages.
- Concurrency tracking was improved by tracking playout slots by device ID rather than just ID, to reclaim slots after app crashes.
- Product data storage was optimized by storing entitlements rather than duplicating product documents.
- Viewing history APIs were improved by merging viewings and bookmark collections and adding indexes.
- MongoDB indexing was optimized to improve performance of queries for viewing history and other APIs.
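The entitlement and indexing changes above can be sketched concretely. This is an illustrative example only: the field names (`userId`, `productId`, `viewedAt`) are assumptions, not taken from the NOW TV schema.

```python
def entitlement_doc(user_id, product_id, expires):
    """A small entitlement record stored per user, instead of duplicating
    the full product document into every user's data."""
    return {"userId": user_id, "productId": product_id, "expires": expires}

# A pymongo-style compound index spec for viewing-history queries:
# all viewings for a user, newest first.
VIEWING_HISTORY_INDEX = [("userId", 1), ("viewedAt", -1)]
# With a live cluster: db.viewing_history.create_index(VIEWING_HISTORY_INDEX)
```

Storing a slim entitlement instead of a copied product document keeps writes small and avoids the update fan-out when product details change.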
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion... - Codemotion
Once you start working with big data systems, you discover a whole bunch of problems you won't find in monolithic systems. Monitoring all of the components becomes a big data problem itself. In the talk, we'll cover the aspects you should take into consideration when monitoring a distributed system built with tools like web services, Spark, Cassandra, MongoDB, and AWS. Beyond the tools, what should you monitor about the actual data that flows through the system? We'll cover the simplest solution built from your day-to-day open source tools; the surprising thing is that it comes not from an ops guy.
This document discusses eBay's private cloud and journey with OpenStack over the past 6 years. It outlines the challenges of developing OpenStack at scale to support eBay's needs, including network design, security, onboarding, CI/CD, configuration management, high availability, monitoring, logging, and customer support. It discusses lessons learned around the difficulty of turning OpenStack into an enterprise-grade cloud, and future directions including enabling containers/microservices, programmable application security, and software-defined networks and data centers to create an automated, efficient, and secure cloud infrastructure.
An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017 - Codemotion
Apache Ignite is a high-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real-time, orders of magnitude faster than possible with traditional disk-based or flash technologies.
Handle insane devices traffic using Google Cloud Platform - Andrea Ulisse - C... - Codemotion
In a world of connected devices it is really important to be prepared to receive and manage a huge volume of messages. In this context, what makes the real difference is a backend able to handle every request safely and in real time. In this talk we will show how its broad spectrum of highly scalable services makes Google Cloud Platform the perfect habitat for such workloads.
Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana) - Cohesive Networks
Slides from the Chicago AWS user group on May 5th, 2016. Asaf Yigal, Co-Founder and VP Product at Logz.io, presented on using Elasticsearch, Logstash, and Kibana in Amazon Web Services.
"Setting up the increasingly-popular open-source ELK Stack (Elasticsearch, Logstash, and Kibana) on AWS might seem like an easy task, but we have gone through several iterations in our architecture and have made some mistakes in our deployments that have turned out to be common in the industry. In this talk, we will go through what we did and explain what worked and what failed -- and why. We will also provide a complete blueprint of how to set up ELK for production on AWS." ~ @asafyigal
Leonard Austin (Ravelin) - DevOps in a Machine Learning World - Outlyer
As machine learning moves from niche to mainstream tech stacks, how do DevOps engineers prepare for a very different set of problems? A brief look at the new issues that arise from machine learning, an overview of cutting-edge "old school" solutions, and how to drag data science (kicking and screaming) into a world of automation.
Video: https://www.youtube.com/watch?v=KHxZCRajRiA
Join DevOps Exchange London here: http://meetup.com/DevOps-Exchange-London/
Follow DOXLON on twitter http://www.twitter.com/doxlon
Chicago AWS user group meetup - May 2014 at Cohesive - CloudCamp Chicago
All slides from the May 2014 Meetup. Talks included:
• "Mining crypto currency on AWS spot instance" - Scott VanDenPlas, Engineer at el el see @scottvdp
• "HA for healthcare" - Ryan Koop, Director of Products & Marketing, Cohesive @ryankoop
• "Using AWS for HA at BrightTag" - Matt Kemp, Engineer of Things™ at BrightTag @mattkemp
• So nice, he's talking twice. - Scott VanDenPlas, Engineer at el el see @scottvdp
Join us again June 24 at Mediafly and in July back at Cohesive!
Evolving the Engineering Culture to Manage Kafka as a Service | Kate Agnew, O... - HostedbyConfluent
Embracing open source software for critical platform operations is a tough organizational evolution for a company of any size. This is particularly daunting for technology teams accustomed to a fully supported managed service. Come learn about how we are using OSS to modernize Health Care at UnitedHealth Group as a roadmap to adopt and offer OSS in your own organization!
Over the last three years, Kafka as a Service within UnitedHealth Group has gone from non-existent to being centrally managed and utilized by over 200 internal application teams as an essential component to our ecosystem. In this session, I will share how to tactically implement a Kafka as a Service platform offering within any organization with a very lean team and how to get broad adoption from engineers and leadership.
I'll discuss the engineering cultural changes needed, both on the DevOps team as well as more broadly, to adopt OSS. Spoiler: Documentation is the key to success. I will talk about some of our "aha" moments, including the importance of internal Terms of Service and how to encourage teams to "Google first." I will include things that haven't worked as well, such as requiring manual review of all topic creation PRs (this doesn't scale!).
Attendees will learn how to both stand up their own OSS offering as well as how to be a good internal consumer of other such offerings. Come ready to learn and laugh about my journey to offering OSS to thousands of people!
Kafka Summit SF 2017 - Providing Reliability Guarantees in Kafka at One Trill... - confluent
In this presentation, I will talk about my firsthand experience dealing with the unique challenges of running Kafka at a massive scale. If you ever thought that running Kafka is difficult, this talk may change your mind and provide you with valuable insights into how to configure a Kafka cluster efficiently, how to manage Kafka for enterprise customers, and how to measure, monitor, and maintain the quality of the Kafka service. Our production Kafka cluster runs on more than 1,500 VMs and serves over 10 GBPS of data spread across hundreds of topics for multiple teams across Microsoft. We built a self-serve Kafka management service to make the process manageable and scalable across many teams. In this talk, I will also share insights about running Kafka in private vs. multi-tenant mode, supporting failover and disaster recovery requirements, and how to make Kafka compliant with regulatory certifications such as ISO, SOC, and FedRAMP.
Presented by Nitin Kumar, Microsoft
Systems Track
Building Scalable Real-Time Data Pipelines with the Couchbase Kafka Connector... - HostedbyConfluent
Many organizations use Apache Kafka to facilitate the flow of data between multiple applications or data sources. Thanks to Kafka’s distributed architecture, it is easy to set up a scalable and reliable broker, but doing the same with producers or consumers is quite often a fine art. This session provides a quick overview of Couchbase, describes the Couchbase Kafka Connector, and showcases a demo of how it can be used as both a source and a sink for building real-time data processing pipelines for mission-critical applications.
Distributed architecture in a cloud native microservices ecosystem - Zhenzhong Xu
This document summarizes key aspects of distributed architecture in a cloud native microservices ecosystem. It discusses Netflix's transition to microservices running in the cloud, key characteristics of microservices and cloud computing like scalability and availability, challenges of operating in the cloud like unpredictable failures and latency, Netflix's open source tools for discovery, circuit breaking, resilience, continuous delivery, and more. It also provides an overview of how to develop, integrate, operate, and optimize microservices in terms of embracing failures, caching, operations, and using a data-driven approach.
The document discusses Google Cloud Platform and its capabilities for building applications and storing and analyzing data in the cloud. It highlights key services including Compute Engine, App Engine, Cloud Storage, Cloud Datastore, Cloud SQL, BigQuery, and Cloud Endpoints. The platform offers scalable, reliable, and secure computing resources, with infrastructure, platform, and software services available as a utility.
Webinar: Gaining Insights into MongoDB with MongoDB Cloud Manager and New Relic - MongoDB
In this session, we’ll show how to use MongoDB Cloud Manager to monitor the performance of your cluster. Next, we’ll dive into New Relic and demonstrate how you can view the same database specific metrics from within the APM tool.
Big data at AWS Chicago User Group - 2014 - AWS Chicago
Big data at AWS Chicago User Group
Most of the slides from the Sept 23rd 2014 AWS User Group in Chicago.
Talks:
"AWS Storage Options" Ben Blair, CTO at MarkITx @stochastic_code
"APIs and Big Data in AWS" - Kin Lane, API Evangelist @kinlane
[coming soon] "Democratizing Data Analysis with Amazon Redshift" - Bill Wanjohi @billwanjohi and Michelangelo D'Agostino @MichelangeloDA, Civis Analytics
Sponsored by Cohesive and CivisAnalytics.
Project Sherpa: How RightScale Went All in on Docker - RightScale
We just finished a 7 week project at RightScale to migrate 48 services and 650+ cloud instances to Docker. As a result we’ve been able to accelerate our development processes and cut our cloud costs (a lot). Here we share lessons learned about our experience migrating to Docker and introduce our new Container Manager we added to the RightScale platform to help manage containerized environments.
SPCA2014: 7 tenets of highly scalable applications - Kapic - NCCOMMS
The document discusses 7 tenets of highly scalable applications:
1. Effective caching mechanisms to avoid roundtrips
2. Using content delivery networks and BLOB storage for the same reason
3. Employing NoSQL storage to avoid bottlenecks
4. Sharding data across multiple databases or storages
5. Using queues to avoid bottlenecks from simultaneous requests
6. Acting asynchronously to optimize server throughput
7. Ensuring redundant design to avoid single points of failure
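Tenet 4 (sharding) can be made concrete with a few lines. This is a hedged sketch of the general hash-routing technique, not code from the presentation; the function name and shard count are illustrative.

```python
import hashlib

def shard_for(key, shard_count):
    """Route a record to a shard by hashing its key, so load spreads
    evenly and no single database becomes a bottleneck."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count
```

Because the mapping is deterministic, every request for the same key lands on the same shard; a stable hash (rather than Python's built-in `hash`, which varies between runs) keeps the routing consistent across processes.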
Going Reactive in the Land of No, or How to build modern reactive systems for the modern world
Sean Walsh, co-author of “Reactive Application Development” and Field CTO at Lightbend and former CEO of Reactibility, shares lessons learned in helping large enterprises convert their monoliths into distributed microservices.
In the Melbourne edition of a 4-city Technology Radar roadshow, ThoughtWorks Australia's Head of Technology Scott Shaw and senior consultant Jen Smith cover topics from all 4 quadrants of the latest edition of the ThoughtWorks Technology Radar. This presentation covers Reactive Architectures, Hamms, Spring Boot vs. Nancy, and Impala.
Kafka Summit NYC 2017 - Cloud Native Data Streaming Microservices with Spring... - confluent
This document discusses building microservices for data streaming and processing using Spring Cloud and Kafka. It provides an overview of Spring Cloud Stream and how it can be used to build event-driven microservices that connect to Kafka. It also discusses how Spring Cloud Data Flow can be used to orchestrate and deploy streaming applications and topologies. The document includes code samples that build a basic Kafka Streams processor application using Spring Cloud Stream and deploy it as part of a streaming data flow, and concludes by proposing a demonstration of these techniques.
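The core of the Kafka Streams processor described above is "group by key and count." As a language-neutral stand-in (plain Python, not the Spring Cloud Stream API itself), the processing step reduces to:

```python
from collections import Counter

def count_by_key(events):
    """Minimal stand-in for a Kafka Streams 'group by key and count'
    processor: consume (key, value) events and emit per-key counts."""
    return dict(Counter(key for key, _ in events))
```

In the real application this logic runs continuously over a Kafka topic and emits updates to a changelog; the sketch only captures the per-batch transformation.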
What Does Big Data Mean and Who Will Win - BigDataCloud
Michael Ralph Stonebraker is a computer scientist specializing in database research. He is currently an adjunct professor at MIT, where he has been involved in the development of the Aurora, C-Store, H-Store, Morpheus, and SciDB systems. Through a series of academic prototypes and commercial startups, Stonebraker's research and products are central to many relational database systems on the market today. He is also the founder of a number of database companies, including Ingres, Illustra, Cohera, StreamBase Systems, Vertica, VoltDB, and Paradigm4. He was previously the Chief Technical Officer (CTO) of Informix and a Professor of Computer Science at the University of California, Berkeley. He is also an editor of the book "Readings in Database Systems".
Big data, Hadoop, NoSQL and graph DB - Ramazan Fırın
This document discusses big data, Hadoop, NoSQL databases, and graph databases. It provides an overview of these topics and outlines potential uses for a telecommunications company, such as using big data to prevent customer churn, offer customer-specific campaigns, and get more customers. The document includes definitions and examples of key concepts like Hadoop, MapReduce, NoSQL databases, and the graph database Neo4j. It also summarizes trends in big data and provides examples of how telecom companies can analyze call detail records, model networks, and manage master customer data using these technologies.
An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017Codemotion
Apache Ignite is a high-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real-time, orders of magnitude faster than possible with traditional disk-based or flash technologies.
Handle insane devices traffic using Google Cloud Platform - Andrea Ulisse - C...Codemotion
In a world of connected devices it is really important to be prepared receiving and managing a huge amount of messages. In this context what is making the real difference is the backend that has to be able to handle safely every request in real time. In this talk we will show how the broad spectrum of highly scalable services makes Google Cloud Platform the perfect habitat for such as workloads.
Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana)Cohesive Networks
Slides from the Chicago AWS user group on May 5th, 2016. Asaf Yigal, Co-Founder and VP Product at Logz.io, presented on using Elasticsearch, Logstash, and Kibana in Amazon Web Services.
"Setting up the increasingly-popular open-source ELK Stack (Elasticsearch, Logstash, and Kibana) on AWS might seem like an easy task, but we have gone through several iterations in our architecture and have made some mistakes in our deployments that have turned out to be common in the industry. In this talk, we will go through what we did and explain what worked and what failed -- and why. We will also provide a complete blueprint of how to set up ELK for production on AWS." ~ @asafyigal
Leonard Austin (Ravelin) - DevOps in a Machine Learning WorldOutlyer
As machine learning moves from niche to mainstream tech stacks how do DevOps engineers prepare for a very different set of problems. A brief look at the new issues that arise from machine learning, an overview of cutting-edge "old school" solutions and how to drag data science (kicking and screaming) into a world of automation.
Video: https://www.youtube.com/watch?v=KHxZCRajRiA
Join DevOps Exchange London here: http://meetup.com/DevOps-Exchange-London/
Follow DOXLON on twitter http://www.twitter.com/doxlon
Chicago AWS user group meetup - May 2014 at CohesiveCloudCamp Chicago
All slides from the May 2014 Meetup. Talks included:
• "Mining crypto currency on AWS spot instance" - Scott VanDenPlas, Engineer at el el see @scottvdp
• "HA for healthcare" - Ryan Koop, Director of Products & Marketing, Cohesive @ryankoop
• "Using AWS for HA at BrightTag" - Matt Kemp, Engineer of Things™ at BrightTag @mattkemp
• So nice, he's talking twice. - Scott VanDenPlas, Engineer at el el see @scottvdp
Join us again June 24 at Mediafly and in July back at Cohesive!
Evolving the Engineering Culture to Manage Kafka as a Service | Kate Agnew, O...HostedbyConfluent
Embracing open source software for critical platform operations is a tough organizational evolution for a company of any size. This is particularly daunting for technology teams accustomed to a fully supported managed service. Come learn about how we are using OSS to modernize Health Care at UnitedHealth Group as a roadmap to adopt and offer OSS in your own organization!
Over the last three years, Kafka as a Service within UnitedHealth Group has gone from non-existent to being centrally managed and utilized by over 200 internal application teams as an essential component to our ecosystem. In this session, I will share how to tactically implement a Kafka as a Service platform offering within any organization with a very lean team and how to get broad adoption from engineers and leadership.
I'll discuss the engineering cultural changes needed, both on the DevOps team as well as more broadly, to adopt OSS. Spoiler: Documentation is the key to success. I will talk about some of our "aha" moments, including the importance of internal Terms of Service and how to encourage teams to "Google first." I will include things that haven't worked as well, such as requiring manual review of all topic creation PRs (this doesn't scale!).
Attendees will learn how to both stand up their own OSS offering as well as how to be a good internal consumer of other such offerings. Come ready to learn and laugh about my journey to offering OSS to thousands of people!
Kafka Summit SF 2017 - Providing Reliability Guarantees in Kafka at One Trill...confluent
In this presentation, I will talk about my firsthand experience dealing with the unique challenges of running Kafka at a massive scale. If you ever thought that running Kafka is difficult, this talk may change your mind and provide you with valuable insights into how to configure a Kafka cluster efficiently, how to manage Kafka for enterprise customers and how to measure, monitor and maintain the Quality of Kafka Service. Our production Kafka cluster runs over 1500+ VMs, and serves over 10 GBPS data spread across hundreds of topics for multiple teams across Microsoft. We built a self-serve Kafka management service to make the process manageable and scalable across many teams. In this talk, I will also share insights about running Kafka in Private vs multi-tenant mode, supporting failover and disaster recovery requirements, and how to make Kafka Compliant with regulatory certifications such as ISO, SOC, FEDRAMP, etc.
Presented by Nitin Kumar, Microsoft
Systems Track
Building Scalable Real-Time Data Pipelines with the Couchbase Kafka Connector...HostedbyConfluent
Many organizations use Apache Kafka to facilitate the flow of data between multiple applications or data sources. Thanks to Kafka’s distributed architecture, it is easy to set up a scalable and reliable broker, but doing the same with producers or consumers is quite often a fine art. This session provides a quick overview of Couchbase, describes the Couchbase Kafka Connector, and showcases a demo of how it can be used as both a source and a sink for building real-time data processing pipelines for mission-critical applications.
Distributed architecture in a cloud native microservices ecosystemZhenzhong Xu
This document summarizes key aspects of distributed architecture in a cloud native microservices ecosystem. It discusses Netflix's transition to microservices running in the cloud, key characteristics of microservices and cloud computing like scalability and availability, challenges of operating in the cloud like unpredictable failures and latency, Netflix's open source tools for discovery, circuit breaking, resilience, continuous delivery, and more. It also provides an overview of how to develop, integrate, operate, and optimize microservices in terms of embracing failures, caching, operations, and using a data-driven approach.
The document discusses Google Cloud Platform and its capabilities for building, storing, and analyzing IT infrastructure in the cloud. It highlights key services including Compute Engine, App Engine, Cloud Storage, Cloud Datastore, Cloud SQL, BigQuery, and Cloud Endpoints. The platform offers scalable, reliable and secure computing resources with options for infrastructure, platform and software services as a utility.
Webinar: Gaining Insights into MongoDB with MongoDB Cloud Manager and New RelicMongoDB
In this session, we’ll show how to use MongoDB Cloud Manager to monitor the performance of your cluster. Next, we’ll dive into New Relic and demonstrate how you can view the same database specific metrics from within the APM tool.
Big data at AWS Chicago User Group - 2014AWS Chicago
Big data at AWS Chicago User Group
Most of the slides from the Sept 23rd 2014 AWS User Group in Chicago.
Talks:
"AWS Storage Options" Ben Blair, CTO at MarkITx @stochastic_code
"APIs and Big Data in AWS" - Kin Lane, API Evangelist @kinlane
[coming soon] "Democratizing Data Analysis with Amazon Redshift" - Bill Wanjohi @billwanjohi and Michelangelo D'Agostino @MichelangeloDA, Civis Analytics
Sponsored by Cohesive and CivisAnalytics.
Project Sherpa: How RightScale Went All in on DockerRightScale
We just finished a 7 week project at RightScale to migrate 48 services and 650+ cloud instances to Docker. As a result we’ve been able to accelerate our development processes and cut our cloud costs (a lot). Here we share lessons learned about our experience migrating to Docker and introduce our new Container Manager we added to the RightScale platform to help manage containerized environments.
Spca2014 7 tenets of highly scalable applications kapicNCCOMMS
The document discusses 7 tenets of highly scalable applications:
1. Effective caching mechanisms to avoid roundtrips
2. Using content delivery networks and BLOB storage for the same reason
3. Employing NoSQL storage to avoid bottlenecks
4. Sharding data across multiple databases or storages
5. Using queues to avoid bottlenecks from simultaneous requests
6. Acting asynchronously to optimize server throughput
7. Ensuring redundant design to avoid single points of failure
Going Reactive in the Land of No or How to build modern reactive systems for the modern
world
Sean Walsh, co-author of “Reactive Application Development” and Field CTO at Lightbend and former CEO of Reactibility, shares lessons learned in helping large enterprises convert their monoliths into distributed microservices.
In the Melbourne edition of a 4-city Technology Radar roadshow, ThoughtWorks Australia's Head of Technology Scott Shaw and senior consultant Jen Smith cover topics from all 4 quadrants of the latest edition of the ThoughtWorks Technology Radar. This presentation covers Reactive Architectures, Hamms, Spring Boot vs. Nancy, and Impala.
Kafka Summit NYC 2017 - Cloud Native Data Streaming Microservices with Spring...confluent
This document discusses building microservices for data streaming and processing using Spring Cloud and Kafka. It provides an overview of Spring Cloud Stream and how it can be used to build event-driven microservices that connect to Kafka. It also discusses how Spring Cloud Data Flow can be used to orchestrate and deploy streaming applications and topologies. The document includes code samples of building a basic Kafka Streams processor application using Spring Cloud Stream and deploying it as part of a streaming data flow. It concludes with proposing a demonstration of these techniques.
What Does Big Data Mean and Who Will WinBigDataCloud
Michael Ralph Stonebraker is a computer scientist specializing in database research. He is currently an adjunct professor at MIT, where he has been involved in the development of the Aurora, C-Store, H-Store, Morpheus, and SciDB systems. Through a series of academic prototypes and commercial startups, Stonebraker's research and products are central to many relational database systems on the market today. He is also the founder of a number of database companies, including Ingres, Illustra, Cohera, StreamBase Systems, Vertica, VoltDB, and Paradigm4. He was previously the Chief Technical Officer (CTO) of Informix and a Professor of Computer Science at the University of California, Berkeley. He is also an editor of the book "Readings in Database Systems".
Big data, Hadoop, NoSQL and graph databases (ramazan fırın)
This document discusses big data, Hadoop, NoSQL databases, and graph databases. It provides an overview of these topics and outlines potential uses for a telecommunications company, such as using big data to prevent customer churn, offer customer-specific campaigns, and get more customers. The document includes definitions and examples of key concepts like Hadoop, MapReduce, NoSQL databases, and the graph database Neo4j. It also summarizes trends in big data and provides examples of how telecom companies can analyze call detail records, model networks, and manage master customer data using these technologies.
This document discusses lessons learned from building and growing a software startup. It describes how the company quickly built their initial product but ran into scaling issues. It outlines the technical infrastructure changes they made to improve stability, such as moving to the cloud, adding Redis, Resque, and MongoDB. The document also provides recommendations on performance testing, libraries, tools, and localization. Overall it advocates for just starting to build the product now rather than overplanning.
Spil Games: outgrowing an internet startupart-spilgames
This presentation will explain how Spil Games has grown in a short time from an internet startup to a global online gaming company and how we currently are building a global cross datacenter storage solution with MySQL as its backend.
The first part will contain a short summary of where we started with our database engineering department (look ahead at most one week in time), to a more professionalized department (look ahead and plan three to four months in time) to currently growing out of the startup phase (look ahead and plan more than one year in time). This will be illustrated with some examples of the growing pains we encountered with scaling, replication and high availability and leading up to the conclusion that we need to acknowledge our problems and shortcomings to actually be able to overcome them.
The second part of the presentation will contain a comparison of our old architecture against the new architecture. In this new architecture we take into account that failure of a complete datacenter is certain to occur sometime and strive to give our users the best possible experience, even in worst case when data is inaccessible. We also introduce asynchronous calls which enable us to fire and forget most of our writes. The architecture is being built with MySQL, handler sockets, Erlang and Memcache as its building blocks.
This document provides an introduction to big data and NoSQL databases. It begins with an introduction of the presenter. It then discusses how the era of big data came to be due to limitations of traditional relational databases and scaling approaches. The document introduces different NoSQL data models including document, key-value, graph and column-oriented databases. It provides examples of NoSQL databases that use each data model. The document discusses how NoSQL databases are better suited than relational databases for big data problems and provides a real-world example of Twitter's use of FlockDB. It concludes by discussing approaches for working with big data using MapReduce and provides examples of using MongoDB and Azure for big data.
This document provides an overview of NoSQL databases and CouchDB. It discusses how NoSQL databases are a better fit than relational databases for large datasets and real-time applications. It then describes CouchDB, an open-source document-oriented NoSQL database, covering its features like schema-free documents, robustness, concurrency, REST API, views, replication, and deployment in the cloud. The document concludes with a discussion of Erlang and eventually demos CouchDB.
Adoption of MongoDB has accelerated tremendously among developers in the past 18 months, and many large enterprises have now deployed MongoDB in reliable and large scale production environments. However, for many developers, it remains a challenge to convince production teams and business stakeholders to adopt an open source technology that has not been certified yet by their IT teams. This session will provide you with the compelling arguments to reassure business and production teams such as:
Public customer references and real-world case studies (migration, and adoption stories)
Deployment support and practices for robustness
How MongoDB contributes to your company’s business value
Brad Anderson presented on NOSQL databases and CouchDB. He discussed how relational databases do not scale well and are rigid. NOSQL databases like CouchDB are a better fit for large, growing datasets. CouchDB is a document oriented database written in Erlang that uses a REST API and supports views and incremental replication. It can be deployed on a cloud platform to improve scalability, redundancy and query distribution.
Getting Started with MongoDB at Oracle Open World 2012MongoDB
The document provides an overview of getting started with MongoDB. It discusses the benefits of MongoDB, common use cases, and how to get stakeholder buy-in for MongoDB projects. It also addresses execution of MongoDB projects, operational aspects like replica sets and sharding, and the economic advantages in terms of developer, hardware, and software savings compared to relational databases. Finally, it discusses why 10gen is a leader in MongoDB and provides commercial support, upcoming MongoDB features, and free online training. The document concludes by advertising upcoming sessions on a MongoDB use case at Apollo Group.
Architecture to Scale. DONN ROCHETTE at Big Data Spain 2012Big Data Spain
Session presented at Big Data Spain 2012 Conference
16th Nov 2012
ETSI Telecomunicacion UPM Madrid
www.bigdataspain.org
More info: http://www.bigdataspain.org/es-2012/conference/architecture-to-scale/donn-rochette
The relational database model was designed to solve the problems of yesterday’s data storage requirements. The massively connected world of today presents different problems and new challenges. We’ll explore the NoSQL philosophy, before comparing and contrasting the strengths and weaknesses of the relational model versus the NoSQL model. While stepping through real-world scenarios, we’ll discuss the reasons for choosing one solution over the other.
To complete this session, let’s demonstrate our findings with an application written with a NoSQL storage layer and explain the advantages that accrue from that decision. By taking a look at the new challenges we face with our data storage needs, we’ll examine why the principles behind NoSQL make it a better candidate as a solution, than yesterday’s relational model.
Mapping Life Science Informatics to the CloudChris Dagdigian
This document discusses strategies for mapping informatics to the cloud. It provides 9 tips for doing so effectively. Tip 1 advises that high-performance computing and clouds require a new model where resources are dedicated to each application. Tip 2 recommends hybrid cloud approaches but cautions they are less usable than claimed and practical only sometimes. The document emphasizes the need to handle legacy codes in addition to new "big data" approaches.
The document summarizes the development of Scripted, a lightweight browser-based code editor. It discusses observations that heavy IDEs are not ideal for JavaScript development and speed is essential. Two prototypes were created - Orion and Scripted. Scripted focused on speed, code awareness through static analysis, and module system comprehension. Near term goals include improved content assistance and a plugin model. Long term goals include debugging integration and support for additional languages.
RightScale User Conference: Why RightScale?Erik Osterman
RightScale provides a framework for operations that standardizes infrastructure management and allows operations to evolve alongside engineering. It treats infrastructure like software development with reusable components, simplifying operations and reducing technical debt. This framework allows organizations to build infrastructure consistently across clouds, commoditize resources, and empower engineers to take on operational roles through a modern DevOps approach.
This talk, given at the VA Smalltalk Forum Europe 2010 in Stuttgart, gives an overview of techniques and tools to get existing Smalltalk projects back to speed and productivity.
The talk included some demos of tools we created for some of our customers to make their project life much easier.
This document discusses why MongoDB is easier to develop, operate, and scale compared to relational databases. It provides examples of how MongoDB allows for flexible data models, easy administration through packages and shell helpers, and horizontal scaling by adding servers and sharding data across multiple machines. Resources for getting help from the MongoDB community and commercial support options are also listed.
During Kylin OLAP development, we set up many engineering principles in the team. These principles are very important to delivering Kylin with high quality and on schedule.
This document summarizes a presentation about monitoring MySQL at AOL. It discusses the tools used at AOL for MySQL monitoring including a MySQL webpage, Argus for metrics and alerts, and Nagios for fault detection. It covers the goals of monitoring, challenges faced, and resources for learning about MySQL monitoring. It also announces the creation of a new MySQL meetup group for the DC/Baltimore area.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums, and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company’s observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring & observability to the purview of ops, infra, and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover test automation with generative AI and OpenAI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and OpenAI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features available on those devices, but many of the features that provide convenience and capability sacrifice security. This best practices guide outlines steps users can take to better protect personal devices and information.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The climate impact / sustainability of software testing is discussed in the talk. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability, which can then be measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of a CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications. He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
26. Schema Free
“Your data schema is a direct corollary of how you view your business’ direction and tech goals. When you pivot, especially if it’s a significant one, your data may no longer make sense in the context of that change. Give yourself room to breathe. A schema-less data model is MUCH easier to adapt to rapidly changing requirements than a highly structured, rigidly enforced schema.”
from: http://www.cleverkoala.com/2010/08/why-your-startup-should-be-using-mongodb/
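The quoted point can be shown in a few lines: in a schema-less store, documents of different shapes coexist in one collection, so a pivot adds fields instead of forcing a migration. A Python sketch with plain dicts standing in for BSON documents (the field names are made up):

```python
# Before the pivot: a simple one-off product.
before_pivot = {"_id": 1, "name": "Widget", "price": 9.99}

# After the pivot: the business now sells subscriptions. The new shape
# simply carries an extra sub-document; the old document stays valid.
after_pivot = {"_id": 2, "name": "Widget Pro", "price": 19.99,
               "subscription": {"interval": "monthly", "trial_days": 14}}

collection = [before_pivot, after_pivot]  # stands in for db.products

# Readers tolerate missing fields instead of failing a schema check.
for doc in collection:
    interval = doc.get("subscription", {}).get("interval", "one-off")
    print(doc["name"], interval)
```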
40. Pre-Aggregation
• Problem:
– You require up-to-the-minute data, or up-to-the-second if possible
– Queries over ranges of data (by time) must be as fast as possible
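The standard MongoDB answer to this problem is pre-aggregation: increment counters inside time-bucketed documents at write time, so a range query reads a handful of ready-made totals instead of scanning raw events. A Python sketch of the pattern (in MongoDB each record() call would be a single upsert with $inc; the bucket layout is illustrative):

```python
from collections import defaultdict
from datetime import datetime

# One counter document per hour, holding per-minute sub-counters.
buckets = defaultdict(lambda: {"total": 0, "minutes": defaultdict(int)})

def record(event_time: datetime) -> None:
    """Pre-aggregate one event at write time."""
    doc = buckets[event_time.strftime("%Y-%m-%dT%H")]
    doc["total"] += 1
    doc["minutes"][event_time.minute] += 1

record(datetime(2012, 10, 16, 9, 5))
record(datetime(2012, 10, 16, 9, 5))
record(datetime(2012, 10, 16, 9, 7))

# A by-time range query now reads pre-computed counters, not raw events.
print(buckets["2012-10-16T09"]["total"])
```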
59. Operational Intelligence
• Next best activity for support / call center
• Interpret the user session
• e.g. “Raspberry Pi - strong interest”
• Expected ~2,000 events per second
68. Big Data Project
• started as a prototype, in production now ;-)
• “beyond agile”
• going from
– fetch all, calculate in the service layer
– use MongoDB MapReduce on a single node
– use MongoDB MapReduce on 5 shards
– use MongoDB MapReduce on 24 shards (2 hi1.4xlarge instances)
– use EMR (around 10 m2.4xlarge instances)
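The map/reduce shape the project converged on can be simulated locally. In MongoDB the equivalent map and reduce functions would be JavaScript run server-side; the record fields here (polygon_id, signal) are invented for illustration:

```python
from collections import defaultdict

# Raw measurements; one document per sample (field names are made up).
records = [
    {"polygon_id": "p1", "signal": -70},
    {"polygon_id": "p1", "signal": -80},
    {"polygon_id": "p2", "signal": -60},
]

def map_fn(doc):
    # emit(key, value), as MongoDB's map stage would.
    yield doc["polygon_id"], {"count": 1, "sum": doc["signal"]}

def reduce_fn(values):
    # Fold all emitted values for one key into a single document.
    out = {"count": 0, "sum": 0}
    for v in values:
        out["count"] += v["count"]
        out["sum"] += v["sum"]
    return out

grouped = defaultdict(list)
for doc in records:
    for key, value in map_fn(doc):
        grouped[key].append(value)

result = {key: reduce_fn(values) for key, values in grouped.items()}
print(result)
```

Sharding parallelizes exactly this: each shard runs the map and a partial reduce over its own chunk, and the partial results are reduced again.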
77. Big Data Project
• Why not use the Aggregation Framework?
– we started on 2.0.6 (the framework only arrived in 2.2)
– we would have had to change the data model
– Map/Reduce seemed the way to go given the data size
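For comparison, the same per-polygon rollup expressed as an aggregation pipeline, which only became available with MongoDB 2.2 and so was not an option on 2.0.6. The field names are illustrative:

```python
# A pipeline is just a list of stage documents handed to
# collection.aggregate(); shown here as plain Python data.
pipeline = [
    # Restrict to one weekly increment of data.
    {"$match": {"week": "2012-W40"}},
    # Roll up measurements per polygon.
    {"$group": {"_id": "$polygon_id",
                "count": {"$sum": 1},
                "signal_sum": {"$sum": "$signal"}}},
]
print(pipeline)
```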
78. Big Data Project
• Numbers
– data arrives in weekly increments
– 2 TB of raw data
– 14 GB / week (into MongoDB)
– data grows in direct proportion to the polygon count
– currently one replica set of 3 m2.4xlarge instances
83. Big Data Project
• more polygons -> more data
– key length can become an issue
• using polygons to display cell metrics
• tried different types of visualizations
84. Big Data Project
• key size per doc: 1.8 KB
– bad: { very_descriptive_long_key : “yay” }
– good: { v : “yay” }
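The reason short keys matter is that BSON repeats every key name in every document. A simplified estimate of the per-document overhead, counting key bytes only and ignoring BSON type and length bytes:

```python
def key_bytes(doc: dict) -> int:
    """Rough bytes spent on key names in one document (key + NUL each)."""
    total = 0
    for key, value in doc.items():
        total += len(key.encode("utf-8")) + 1
        if isinstance(value, dict):
            total += key_bytes(value)
    return total

bad = {"very_descriptive_long_key": "yay"}
good = {"v": "yay"}

# Multiplied across millions of documents, the difference is real storage.
savings_per_doc = key_bytes(bad) - key_bytes(good)
print(savings_per_doc)
```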
85. Big Data Project
(chart: storage growth per year by polygon count - 100,000 polygons: 62 GB/year; 500,000 polygons: 308 GB/year)
87. Big Data Project
• 308 GB of EBS storage => $332 per year
– backups / snapshots not considered
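The slide's figure checks out as simple arithmetic if we assume the then-standard EBS price of roughly $0.09 per GB-month (an assumption; actual pricing varies by region and volume type):

```python
# Back-of-envelope EBS cost check; the per-GB price is an assumption.
gb = 308
price_per_gb_month = 0.09  # USD, assumed standard EBS rate circa 2012
annual_cost = gb * price_per_gb_month * 12
print(round(annual_cost, 2))
```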
88. Big Data Project
• Future plans
– new use case
– expecting about 1 TB of data / week
89. Conclusion
• rapidly changing business needs
• ease of collecting huge amounts of data
• infrastructure as part of code
• MongoDB provides flexibility
90. Comments?
• @comsysto
• #MongoMunich2012
• http://blog.comsysto.com
• Don’t forget the hallway track
• Mongo User Group Munich
– http://www.meetup.com/Muenchen-MongoDB-User-Group/