This presentation, from the 2014 Datastax Cassandra Tech Day in Houston, TX, is a quick overview on how iland Internet Solutions leverages Cassandra as its sole data storage platform for performance metrics as well as configuration across multiple-data centers in the US, EU and Asia.
iland exposes these metrics to its customers trough its ECS portal application which is covering a wealth of functionality including offering visibility into resource consumption, billing, performance, alerts, the impact of change and other key areas. The platform also provides predictive analytics that help companies monitor performance, achieve consistency and anticipate growth requirements.
Leveraging Cassandra for real-time multi-datacenter public cloud analyticsJulien Anguenot
iland has built a global data warehouse across multiple data centers, collecting and aggregating data from core cloud services including compute, storage and network as well as chargeback and compliance. iland's warehouse brings actionable intelligence that customers can use to manipulate resources, analyze trends, define alerts and share information.
In this session, we would like to present the lessons learned around Cassandra, both at the development and operations level, but also the technology and architecture we put in action on top of Cassandra such as Redis, syslog-ng, RabbitMQ, Java EE, etc.
Finally, we would like to share insights on how we are currently extending our platform with Spark and Kafka and what our motivations are.
Introduction to Cassandra and CQL for Java developersJulien Anguenot
This talk will provide a high-level overview of Cassandra, the Cassandra Query Language (CQL) and more specifically the DataStax CQL Java driver. This talk will aim to introduce Java developers tools, techniques and best practices for building Java application leveraging the Cassandra database using CQL3.
Apache Cassandra operations have the reputation to be simple on single datacenter deployments and / or low volume clusters but they become way more complex on high latency multi-datacenter clusters with high volume and / or high throughout: basic Apache Cassandra operations such as repairs, compactions or hints delivery can have dramatic consequences even on a healthy high latency multi-datacenter cluster.
In this presentation, Julien will go through Apache Cassandra mutli-datacenter concepts first then show multi-datacenter operations essentials in details: bootstrapping new nodes and / or datacenter, repairs strategy, Java GC tuning, OS tuning, Apache Cassandra configuration and monitoring.
Based on his 3 years experience managing a multi-datacenter cluster against Apache Cassandra 2.0, 2.1, 2.2 and 3.0, Julien will give you tips on how to anticipate and prevent / mitigate issues related to basic Apache Cassandra operations with a multi-datacenter cluster.
Apache Cassandra operations have the reputation to be quite simple against single datacenter clusters and / or low volume clusters but they become way more complex against high latency multi-datacenter clusters: basic operations such as repair, compaction or hints delivery can have dramatic consequences even on a healthy cluster.
In this presentation, Julien will go through Cassandra operations in details: bootstrapping new nodes and / or datacenter, repair strategies, compaction strategies, GC tuning, OS tuning, large batch of data removal and Apache Cassandra upgrade strategy.
Julien will give you tips and techniques on how to anticipate issues inherent to multi-datacenter cluster: how and what to monitor, hardware and network considerations as well as data model and application level bad design / anti-patterns that can affect your multi-datacenter cluster performances.
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...DataStax
Customizing JVM settings for the needs of an application can be a tricky business, especially when running externally developed software such as Cassandra. In this talk I will share our experiences and the procedure that we have used to test and validate changes with Java tuning. We'll explore with two recent experiences: changes and monitoring of G1 garbage collection, and moving buffer objects off the heap.
For the talk, I'll discuss our tuning process at Knewton. I will share some of the challenges that we faced while identifying what we expected to learn. I'll discuss how we isolated and minimized variables across tests, the importance of the duration of these tests, and how we try to separate correlation from causation. I will demonstrate how to use and interpret the results of the custom scripts that we were driven to develop to gain visibility into our G1GC processes; these scripts will be open sourced.
About the Speaker
Carlos Monroy Senior Software Engineer, Knewton
Carlos Monroy is a senior engineer on the database team at Knewton, an education company that created an adaptive learning platform. Carlos has been developing software professionally since 1998. His experience holding multiple roles on the software lifecycle provides him a wholistic approach. Having used over a half dozen relational database engines, he has recently come over to the NoSQL side, first working with HBase and for the last three years Cassandra.
Cassandra Summit 2014: Active-Active Cassandra Behind the ScenesDataStax Academy
Presenter: Roopa Tangirala, Senior Cloud Data Architect at Netflix
High availability is an important requirement for any online business and trying to architect around failures and expecting infrastructure to fail, and even then be highly available, is the key to success. One such effort here at Netflix was the Active-Active implementation where we provided region resiliency. This presentation will discuss the brief overview of the active-active implementation and how it leveraged Cassandra’s architecture in the backend to achieve its goal. It will cover our journey through A-A from Cassandra’s perspective, the data validation we did to prove the backend would work without impacting customer experience. The various problems we faced, like long repair times and gc_grace settings, plus lessons learned and what would we do differently next time around, will also be discussed.
Leveraging Cassandra for real-time multi-datacenter public cloud analyticsJulien Anguenot
iland has built a global data warehouse across multiple data centers, collecting and aggregating data from core cloud services including compute, storage and network as well as chargeback and compliance. iland's warehouse brings actionable intelligence that customers can use to manipulate resources, analyze trends, define alerts and share information.
In this session, we would like to present the lessons learned around Cassandra, both at the development and operations level, but also the technology and architecture we put in action on top of Cassandra such as Redis, syslog-ng, RabbitMQ, Java EE, etc.
Finally, we would like to share insights on how we are currently extending our platform with Spark and Kafka and what our motivations are.
Introduction to Cassandra and CQL for Java developersJulien Anguenot
This talk will provide a high-level overview of Cassandra, the Cassandra Query Language (CQL) and more specifically the DataStax CQL Java driver. This talk will aim to introduce Java developers tools, techniques and best practices for building Java application leveraging the Cassandra database using CQL3.
Apache Cassandra operations have the reputation to be simple on single datacenter deployments and / or low volume clusters but they become way more complex on high latency multi-datacenter clusters with high volume and / or high throughout: basic Apache Cassandra operations such as repairs, compactions or hints delivery can have dramatic consequences even on a healthy high latency multi-datacenter cluster.
In this presentation, Julien will go through Apache Cassandra mutli-datacenter concepts first then show multi-datacenter operations essentials in details: bootstrapping new nodes and / or datacenter, repairs strategy, Java GC tuning, OS tuning, Apache Cassandra configuration and monitoring.
Based on his 3 years experience managing a multi-datacenter cluster against Apache Cassandra 2.0, 2.1, 2.2 and 3.0, Julien will give you tips on how to anticipate and prevent / mitigate issues related to basic Apache Cassandra operations with a multi-datacenter cluster.
Apache Cassandra operations have the reputation to be quite simple against single datacenter clusters and / or low volume clusters but they become way more complex against high latency multi-datacenter clusters: basic operations such as repair, compaction or hints delivery can have dramatic consequences even on a healthy cluster.
In this presentation, Julien will go through Cassandra operations in details: bootstrapping new nodes and / or datacenter, repair strategies, compaction strategies, GC tuning, OS tuning, large batch of data removal and Apache Cassandra upgrade strategy.
Julien will give you tips and techniques on how to anticipate issues inherent to multi-datacenter cluster: how and what to monitor, hardware and network considerations as well as data model and application level bad design / anti-patterns that can affect your multi-datacenter cluster performances.
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...DataStax
Customizing JVM settings for the needs of an application can be a tricky business, especially when running externally developed software such as Cassandra. In this talk I will share our experiences and the procedure that we have used to test and validate changes with Java tuning. We'll explore with two recent experiences: changes and monitoring of G1 garbage collection, and moving buffer objects off the heap.
For the talk, I'll discuss our tuning process at Knewton. I will share some of the challenges that we faced while identifying what we expected to learn. I'll discuss how we isolated and minimized variables across tests, the importance of the duration of these tests, and how we try to separate correlation from causation. I will demonstrate how to use and interpret the results of the custom scripts that we were driven to develop to gain visibility into our G1GC processes; these scripts will be open sourced.
About the Speaker
Carlos Monroy Senior Software Engineer, Knewton
Carlos Monroy is a senior engineer on the database team at Knewton, an education company that created an adaptive learning platform. Carlos has been developing software professionally since 1998. His experience holding multiple roles on the software lifecycle provides him a wholistic approach. Having used over a half dozen relational database engines, he has recently come over to the NoSQL side, first working with HBase and for the last three years Cassandra.
Cassandra Summit 2014: Active-Active Cassandra Behind the ScenesDataStax Academy
Presenter: Roopa Tangirala, Senior Cloud Data Architect at Netflix
High availability is an important requirement for any online business and trying to architect around failures and expecting infrastructure to fail, and even then be highly available, is the key to success. One such effort here at Netflix was the Active-Active implementation where we provided region resiliency. This presentation will discuss the brief overview of the active-active implementation and how it leveraged Cassandra’s architecture in the backend to achieve its goal. It will cover our journey through A-A from Cassandra’s perspective, the data validation we did to prove the backend would work without impacting customer experience. The various problems we faced, like long repair times and gc_grace settings, plus lessons learned and what would we do differently next time around, will also be discussed.
Beginning Operations: 7 Deadly Sins for Apache Cassandra OpsDataStax Academy
The internal battle has been fought, and Cassandra is your group's NoSQL platform of choice! Hooray! But now what? This talk will introduce you to all the basic operations concepts you need to know to start your foray into the wonderful world of Cassandra off right. Or even if you have already started but are looking for a solid holistic overview... this is the talk for you!
Pythian: My First 100 days with a Cassandra ClusterDataStax Academy
With Apache Cassandra being a massively scalable open source NoSQL database and with the amount of data that we create and copy annually which is doubling in size every two years, it is expected to reach 44 zettabytes, or 44 trillion gigabytes, we can assume that sooner or later a DBA will be handling a Cassandra database in their shop. This beginner/intermediate-level session will take you through my journey of an Oracle DBA and my first 100 days of starting to administer a Cassandra Cluster, show several demos and all the roadblocks and the success I had along this path.
Apache Cassandra operations have the reputation to be simple on single datacenter deployments and / or low volume clusters but they become way more complex on high latency multi-datacenter clusters with high volume and / or high throughout: basic Apache Cassandra operations such as repairs, compactions or hints delivery can have dramatic consequences even on a healthy high latency multi-datacenter cluster.
In this presentation, Julien will go through Apache Cassandra mutli-datacenter concepts first then show multi-datacenter operations essentials in details: bootstrapping new nodes and / or datacenter, repairs strategy, Java GC tuning, OS tuning, Apache Cassandra configuration and monitoring.
Based on his 3 years experience managing a multi-datacenter cluster against Apache Cassandra 2.0, 2.1, 2.2 and 3.0, Julien will give you tips on how to anticipate and prevent / mitigate issues related to basic Apache Cassandra operations with a multi-datacenter cluster.
About the Speaker
Julien Anguenot VP Software Engineering, iland Internet Solutions, Corp
Julien currently serves as iland's Vice President of Software Engineering. Prior to joining iland, Mr. Anguenot held tech leadership positions at several open source content management vendors and tech startups in Europe and in the U.S. Julien is a long time Open Source software advocate, contributor and speaker: Zope, ZODB, Nuxeo contributor, Zope and OpenStack foundations member, his talks includes Apache Con, Cassandra summit, OpenStack summit, The WWW Conference or still EuroPython.
For this upcoming meetup, we welcome Patrick Eaton PhD, Systems Architect at Stackdriver, and Joey Imbasciano, Cloud Platform Engineer at Stackdriver.
What You'll Learn At This Meetup:
• Why Stackdriver chose Cassandra over other DB offerings
• Stackdriver's data pipeline that runs into Cassandra
• Operating Cassandra Running on AWS
• Stackdriver's approach to disaster recovery
Patrick and Joey will be presenting their use of Apache Cassandra at Stackdriver, some lesson's learned, technical tips and a Q&A to end the evening.
Mesosphere and Contentteam: A New Way to Run CassandraDataStax Academy
We, Ben Whitehead and Robert Stupp, will show you how to run Cassandra on Mesos. We will go through all the technical steps how to plan, setup and operate even large scale Cassandra clusters on Mesos. Further we illustrate how the Cassandra-on-Mesos framework helps you to setup Cassandra on Mesos, schedule regular maintenance tasks and manage hardware failures in the heart of your data center.
Presenter: Adam Zeglin, CTO of Instaclustr
In this presentation we discuss a method of provisioning and running an Apache Cassandra deployment spilt between multiple heterogeneous data centers which, rather than allocating per-node public IPv4 addresses or configuring mesh VPNs, uses Port Address Translation (PAT) for node↔internet connectivity and is self- configuring and discoverable via DNS Service Discovery (DNS-SD or wide-area Bonjour). While Cassandra has built-in support for AWS EC2 multi-region/data centre topologies (via Ec2MultiRegionSnitch, etc), the existing solution requires the wasteful allocation of public IPv4 addresses per-node. Additionally there is little support for topologies that are either a mix of or deploy completely on alternative infrastructure providers. Our solution uses a single public IP address per data center, is provider-agnostic, doesn’t introduce the configuration and management overheads of a mesh VPN between data centres, and allows nodes to automatically discover each-other.
AddThis: Scaling Cassandra up and down into containers with ZFSDataStax Academy
ZFS is an advanced file, raid, and volume management system originally developed by Sun Microsystems, 'The Last Word in File Systems' has been unavailable on Linux until recently. AddThis uses ZFS to more effectively scale up dedicated hardware, getting twice the performance at half the cost. ZFS is also fundamental to containerization, allowing nodes from multiple clusters to be co-located with safe persistent storage.
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...ScyllaDB
Outbrain is the world's largest content discovery program. Learn about their use case with Scylla where they lowered latency while doing 20X IOPS of Cassandra.
Slides from my talk at Cassandra Summit 2016 on troubleshooting Cassandra. This is a reprise of my popular talk from last summit, reorganized, expanded, and updated for Cassandra 3.0. In it I share the secrets I've learned in four years of supporting hundreds of customers using Apache Cassandra and DataStax Enterprise. Be sure to check out presenter notes for additional tips and links to further resources.
NewSQL overview:
- History of RDBMs
- The reasons why NoSQL concept appeared
- Why NoSQL was not enough, the necessity of NewSQL
- Characteristics of NewSQL
- 7 DBs that belongs to NewSQL
- Overview Table with main properties
Scylla Summit 2022: Operating at Monstrous Scales: Benchmarking Petabyte Work...ScyllaDB
ScyllaDB is a distributed database designed to scale horizontally and vertically — in theory. What about in practice? ScyllaDB’s Benny Halevy, Director, Software Engineering, will take you through the process and results of benchmarking our NoSQL database at the petabyte level, showing how you can use advanced features like workload prioritization to control priorities of transactional (read-write) and analytic (read-only) queries on the same cluster with smooth and predictable performance.
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
Cassandra Exports as a Trivially Parallelizable Problem (Emilio Del Tessandor...DataStax
Cassandra databases at Spotify hold all sorts of interesting data sets. Quite obviously, we would like to allow our data scientists tap these data sets.
Recent developments in the offerings of cloud vendors allowed us to engineer systems that answer this use case in an unprecedented way.
In this talk we'll present how we turned the process of exporting data from Cassandra clusters into a trivially parallelizible problem. Using just a few basic cloud products we've managed to dump our largest clusters containing terabytes of data in the order of minutes.
About the Speaker
Emilio Del Tessandoro Software Engineer, Spotify
Emilio Del Tessandoro is a software engineer working on tooling and automation for the Spotify storage infrastructure. He is interested in theoretical computer science with a focus on algorithms and scalable systems.
Cassandra Summit 2014: Performance Tuning Cassandra in AWSDataStax Academy
Presenters: Michael Nelson, Development Manager at FamilySearch
A recent research project at FamilySearch.org pushed Cassandra to very high scale and performance limits in AWS using a real application. Come see how we achieved 250K reads/sec with latencies under 5 milliseconds on a 400-core cluster holding 6 TB of data while maintaining transactional consistency for users. We'll cover tuning of Cassandra's caches, other server-side settings, client driver, AWS cluster placement and instance types, and the tradeoffs between regular & SSD storage.
DataStax and Esri: Geotemporal IoT Search and AnalyticsDataStax Academy
Internet of Things (IoT) data frequently has a location and time component. Getting value out of this "geotemporal" data can be tricky. We'll explore when and how to leverage Cassandra, DSE Search and DSE Analytics to surface meaningful information from your geotemporal data.
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...Data Con LA
This session will explore how to apply GeoSpatial analytics using Apache Spark on high-velocity streaming (data-in-motion) and high-volume batch (data-at-rest). Demonstrations will be performed throughout the session to cement these concepts.
Beginning Operations: 7 Deadly Sins for Apache Cassandra OpsDataStax Academy
The internal battle has been fought, and Cassandra is your group's NoSQL platform of choice! Hooray! But now what? This talk will introduce you to all the basic operations concepts you need to know to start your foray into the wonderful world of Cassandra off right. Or even if you have already started but are looking for a solid holistic overview... this is the talk for you!
Pythian: My First 100 days with a Cassandra ClusterDataStax Academy
With Apache Cassandra being a massively scalable open source NoSQL database and with the amount of data that we create and copy annually which is doubling in size every two years, it is expected to reach 44 zettabytes, or 44 trillion gigabytes, we can assume that sooner or later a DBA will be handling a Cassandra database in their shop. This beginner/intermediate-level session will take you through my journey of an Oracle DBA and my first 100 days of starting to administer a Cassandra Cluster, show several demos and all the roadblocks and the success I had along this path.
Apache Cassandra operations have the reputation to be simple on single datacenter deployments and / or low volume clusters but they become way more complex on high latency multi-datacenter clusters with high volume and / or high throughout: basic Apache Cassandra operations such as repairs, compactions or hints delivery can have dramatic consequences even on a healthy high latency multi-datacenter cluster.
In this presentation, Julien will go through Apache Cassandra mutli-datacenter concepts first then show multi-datacenter operations essentials in details: bootstrapping new nodes and / or datacenter, repairs strategy, Java GC tuning, OS tuning, Apache Cassandra configuration and monitoring.
Based on his 3 years experience managing a multi-datacenter cluster against Apache Cassandra 2.0, 2.1, 2.2 and 3.0, Julien will give you tips on how to anticipate and prevent / mitigate issues related to basic Apache Cassandra operations with a multi-datacenter cluster.
About the Speaker
Julien Anguenot VP Software Engineering, iland Internet Solutions, Corp
Julien currently serves as iland's Vice President of Software Engineering. Prior to joining iland, Mr. Anguenot held tech leadership positions at several open source content management vendors and tech startups in Europe and in the U.S. Julien is a long time Open Source software advocate, contributor and speaker: Zope, ZODB, Nuxeo contributor, Zope and OpenStack foundations member, his talks includes Apache Con, Cassandra summit, OpenStack summit, The WWW Conference or still EuroPython.
For this upcoming meetup, we welcome Patrick Eaton PhD, Systems Architect at Stackdriver, and Joey Imbasciano, Cloud Platform Engineer at Stackdriver.
What You'll Learn At This Meetup:
• Why Stackdriver chose Cassandra over other DB offerings
• Stackdriver's data pipeline that runs into Cassandra
• Operating Cassandra Running on AWS
• Stackdriver's approach to disaster recovery
Patrick and Joey will be presenting their use of Apache Cassandra at Stackdriver, some lesson's learned, technical tips and a Q&A to end the evening.
Mesosphere and Contentteam: A New Way to Run CassandraDataStax Academy
We, Ben Whitehead and Robert Stupp, will show you how to run Cassandra on Mesos. We will go through all the technical steps how to plan, setup and operate even large scale Cassandra clusters on Mesos. Further we illustrate how the Cassandra-on-Mesos framework helps you to setup Cassandra on Mesos, schedule regular maintenance tasks and manage hardware failures in the heart of your data center.
Presenter: Adam Zeglin, CTO of Instaclustr
In this presentation we discuss a method of provisioning and running an Apache Cassandra deployment spilt between multiple heterogeneous data centers which, rather than allocating per-node public IPv4 addresses or configuring mesh VPNs, uses Port Address Translation (PAT) for node↔internet connectivity and is self- configuring and discoverable via DNS Service Discovery (DNS-SD or wide-area Bonjour). While Cassandra has built-in support for AWS EC2 multi-region/data centre topologies (via Ec2MultiRegionSnitch, etc), the existing solution requires the wasteful allocation of public IPv4 addresses per-node. Additionally there is little support for topologies that are either a mix of or deploy completely on alternative infrastructure providers. Our solution uses a single public IP address per data center, is provider-agnostic, doesn’t introduce the configuration and management overheads of a mesh VPN between data centres, and allows nodes to automatically discover each-other.
AddThis: Scaling Cassandra up and down into containers with ZFSDataStax Academy
ZFS is an advanced file, raid, and volume management system originally developed by Sun Microsystems, 'The Last Word in File Systems' has been unavailable on Linux until recently. AddThis uses ZFS to more effectively scale up dedicated hardware, getting twice the performance at half the cost. ZFS is also fundamental to containerization, allowing nodes from multiple clusters to be co-located with safe persistent storage.
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...ScyllaDB
Outbrain is the world's largest content discovery program. Learn about their use case with Scylla where they lowered latency while doing 20X IOPS of Cassandra.
Slides from my talk at Cassandra Summit 2016 on troubleshooting Cassandra. This is a reprise of my popular talk from last summit, reorganized, expanded, and updated for Cassandra 3.0. In it I share the secrets I've learned in four years of supporting hundreds of customers using Apache Cassandra and DataStax Enterprise. Be sure to check out presenter notes for additional tips and links to further resources.
NewSQL overview:
- History of RDBMs
- The reasons why NoSQL concept appeared
- Why NoSQL was not enough, the necessity of NewSQL
- Characteristics of NewSQL
- 7 DBs that belongs to NewSQL
- Overview Table with main properties
Scylla Summit 2022: Operating at Monstrous Scales: Benchmarking Petabyte Work...ScyllaDB
ScyllaDB is a distributed database designed to scale horizontally and vertically — in theory. What about in practice? ScyllaDB’s Benny Halevy, Director, Software Engineering, will take you through the process and results of benchmarking our NoSQL database at the petabyte level, showing how you can use advanced features like workload prioritization to control priorities of transactional (read-write) and analytic (read-only) queries on the same cluster with smooth and predictable performance.
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
Cassandra Exports as a Trivially Parallelizable Problem (Emilio Del Tessandor...DataStax
Cassandra databases at Spotify hold all sorts of interesting data sets. Quite obviously, we would like to allow our data scientists tap these data sets.
Recent developments in the offerings of cloud vendors allowed us to engineer systems that answer this use case in an unprecedented way.
In this talk we'll present how we turned the process of exporting data from Cassandra clusters into a trivially parallelizible problem. Using just a few basic cloud products we've managed to dump our largest clusters containing terabytes of data in the order of minutes.
About the Speaker
Emilio Del Tessandoro Software Engineer, Spotify
Emilio Del Tessandoro is a software engineer working on tooling and automation for the Spotify storage infrastructure. He is interested in theoretical computer science with a focus on algorithms and scalable systems.
Cassandra Summit 2014: Performance Tuning Cassandra in AWSDataStax Academy
Presenters: Michael Nelson, Development Manager at FamilySearch
A recent research project at FamilySearch.org pushed Cassandra to very high scale and performance limits in AWS using a real application. Come see how we achieved 250K reads/sec with latencies under 5 milliseconds on a 400-core cluster holding 6 TB of data while maintaining transactional consistency for users. We'll cover tuning of Cassandra's caches, other server-side settings, client driver, AWS cluster placement and instance types, and the tradeoffs between regular & SSD storage.
DataStax and Esri: Geotemporal IoT Search and AnalyticsDataStax Academy
Internet of Things (IoT) data frequently has a location and time component. Getting value out of this "geotemporal" data can be tricky. We'll explore when and how to leverage Cassandra, DSE Search and DSE Analytics to surface meaningful information from your geotemporal data.
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...Data Con LA
This session will explore how to apply GeoSpatial analytics using Apache Spark on high-velocity streaming (data-in-motion) and high-volume batch (data-at-rest). Demonstrations will be performed throughout the session to cement these concepts.
The prpl foundation is an open-source, community-driven, collaborative, organization. It mainly targets and supports the MIPS architecture – but it is open to all –, with a focus on enabling next-generation datacenter-to-device portable software and virtualized architectures.
DevOps in the Amazon Cloud – Learn from the pioneersNetflix suroGaurav "GP" Pal
DevOps helps accelerate the delivery of software applications through automation and by removing Development & Operations silos. The Netflix Platform Engineering team has developed a robust data pipeline solution called SURO that has been open sourced. Come learn from the experiences of pioneers like Netflix how they are leveraging the data pipeline for new and innovative use cases. This is the presentation by Danny Yuan, Netflix Platform Engineering Team on operational and monitoring aspects of applications on cloud platforms.
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDogRedis Labs
Think you have big data? What about high availability
requirements? At DataDog we process billions of data points every day including metrics and events, as we help the world
monitor the their applications and infrastructure. Being the world’s monitoring system is a big responsibility, and thanks to
Redis we are up to the task. Join us as we discuss how the DataDog team monitors and scales Redis to power our SaaS based monitoring offering. We will discuss our usage and deployment patterns, as well as dive into monitoring best practices for production Redis workloads
SmartCity StreamApp Platform: Real-time Information for Smart Cities and Tran...Cubic Corporation
Stream processing use cases today are often discussed within the world of Big Data, the Internet, social media and log analytics, where data rates in real-time world scenarios rarely exceed 10,000 records per second. The emergence of the Internet of Things and new smart services require real-time responses from data arriving at rates of millions of records per second, with 24x7 operation and support for dynamic updates for business services, data feeds and enterprise integrations. This presentation discusses the requirements for operational real-time systems in an Internet of Things environment, where industrial automation, smart cities and telematics require robust and massively scalable data management platforms.
This session takes an in-depth look at:
- Trends in stream processing
- How streaming SQL has become a standard
- The advantages of Streaming SQL
- Ease of development with streaming SQL: Graphical and Streaming SQL query editors
- Business value of streaming SQL and its related tools: Domain-specific UIs
- Scalable deployment of streaming SQL: Distributed processing
Reference architecture for Internet of ThingsSujee Maniyam
What kind of a data infrastructure is needed, to support Internet of Things?
This talk presents a reference architecture.
We are actually building this architecture as open source project. See here : bit.ly / iotxyz
Splunk is a powerful platform that can harness your machine data and turn it into valuable information thereby enabling your business to make informed decisions, taking your organization from reactive to proactive. Just like any other platform, Splunk is only as powerful as the data it has access to, therefore in this session we will be conducting a walk thru of how to successfully on-board data, with samples of data ranging from simple to complex. We will also be taking a look at how to use common TA’s to bring valuable data into Splunk. This session is designed to give you a better understanding of how to onboard data into Splunk enabling you to unlock the power of your data.
Splunk is a powerful platform that can harness your machine data and turn it into valuable information thereby enabling your business to make informed decisions, taking your organization from reactive to proactive. Just like any other platform, Splunk is only as powerful as the data it has access to, therefore in this session we will be conducting a walk thru of how to successfully on-board data, with samples of data ranging from simple to complex. We will also be taking a look at how to use common TA’s to bring valuable data into Splunk. This session is designed to give you a better understanding of how to onboard data into Splunk enabling you to unlock the power of your data
We are already in the era of ‘connected things’ -- our mobile phones, computers, tablets and wearables sync and talk via ‘cloud’.
IoT presents a lot of challenges, especially data coming in at high volume and high velocity - imagine billions of devices each sending readings every few seconds or minutes. All this data need to be stored and analyzed. Plus we need to think about security, privacy and compliance.
In this talk we will present a reference data architecture for IoT. We draw on the industry’s best practices on Big Data including Lambda architecture patterns. Lambda architecture outlines generic, scalable and fault-tolerant data processing. We will illustrate a potential data pipeline and discuss how each stage might be implemented and technology choices at each stage.
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in ProductionCodemotion
What’s important about a technology is what you can use it to do. I’ve looked at what a number of groups are doing with Apache Hadoop and NoSQL in production, and I will relay what worked well for them and what did not. Drawing from real world use cases, I show how people who understand these new approaches can employ them well in conjunction with traditional approaches and existing applications. Thread Detection, Datawarehouse optimization, Marketing Efficiency, Biometric Database are some examples exposed during this presentation.
Brian Brownlow is an experienced senior analyst programmer for Mayo Clinic. He is made a workshop presentation at the 2014 BDPA Technology Conference on the topic, 'Big Data Implementation - Mayo Clinic Case Study'. This presentation will show part of the Mayo Clinic story on the embarking of an exploration of the application of `Big Data' technologies. `Big Data' is seen as one set of tools that can be used to enhance medical research, medical education and practice management. Mayo Clinic is always searching for better, faster and cheaper ways to use its data to improve patient care and sustain financial outcomes in a challenging reimbursement environment. Our approach uses several components that are open source and combines them with data from various sources to provide information to decision makers in near real time. We have created a center of `Big Data' excellence using in-house staff and vendor engagements. `Big Data' is one element of our Enterprise Data Trust framework.
Similar to Real-time data analytics with Cassandra at iland (20)
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...pchutichetpong
M Capital Group (“MCG”) expects to see demand and the changing evolution of supply, facilitated through institutional investment rotation out of offices and into work from home (“WFH”), while the ever-expanding need for data storage as global internet usage expands, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Show drafts
volume_up
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
4. Julien Anguenot @ Cassandra Tech Day Houston 2014
iland Internet Solutions
19 8
8 3
Years delivering
Years of delivering IT
Services
Years cloud
infrastructure & disaster
recovery expertise
IISO 27001 &
SSAE16 global
data centers
Cloud-based
Specializations:
Production;
Test & Dev; DR
4
5. Julien Anguenot @ Cassandra Tech Day Houston 2014
iland platform essentially
• data warehouse running across multiple data-centers!
• monitoring (resource consumption / performance)!
• billing!
• alerting!
• predictive analytics!
• cloud management!
• cloud services (backups, DR, etc.)!
• desktop and mobile applications (iland portal app)
5
11. Why did we choose Cassandra?
• MySQL and MongoDB attempts been big fails!
• write latency (constant-time writes)!
• distributed nature (multi-data centers)!
• scalability, reliability, performance, availability!
• sharding makes things complicated!
• no master - slave means no SPOF!
• simplicity!
Julien Anguenot @ Cassandra Tech Day Houston 2014
11
18. Julien Anguenot @ Cassandra Tech Day Houston 2014
C* iland’s cluster (1/2)
• one (1) C* cluster !
• 6 data centers - 25+ nodes - one (1) keyspace
18
19. Julien Anguenot @ Cassandra Tech Day Houston 2014
C* iland’s cluster (2/2)
• each DC (1 or 2 racks of 3 nodes)!
• 3 nodes per location for writes!
• replication Factor (RF) = 3 !
• 3 nodes per location for API reads (Dallas, TX, London, UK, Singapore)!
• each node!
• VM (vCenter powered) - Ubuntu 14.04 LTS w/ Cassandra base dpkg!
• 16GB of RAM / 16 vCPUs!
• currently: ~ 1TB of data per node!
• not using SSD (yet)
19
20. Asia
Data centers and replication (1/2)
Julien Anguenot @ Cassandra Tech Day Houston 2014 20
US
LA,CA Reston, VA
Dallas, TX
Singapore
EU
London,UK
Manchester,UK
21. Data centers and replication (2/2)
iland core platform iland core platform
Julien Anguenot @ Cassandra Tech Day Houston 2014
21
iland ReST API
C* W
iland ReST API
C* R C* W C* R
C* R only deployed in: Dallas, TX - London, UK - Singapore
22. US API EU API Asia API
EU portal Asia portal
Citrix Netscaler (US)
Julien Anguenot @ Cassandra Tech Day Houston 2014
Access to data (read / write)
22
US portal
https://my.ilandcloud.com/
23. Julien Anguenot @ Cassandra Tech Day Houston 2014
At the application level?
• History!
• POC: C* 1.2 + Astyanax (2013/02)!
• V1: C* 2.0 + Astyanax w/ CQL3 over thrift (2013/06) !
• Current version: C* 2.0 + DataStax CQL Java driver!
• Java / JEE cluster (Wildfly AS)!
• AMQP / RabbitMQ!
• Redis to cache up “hot” data (read latency)!
• Python & DataStax CQL driver for cluster / data upgrades
23