Slides from my talk at Cassandra Summit 2015
http://cassandrasummit-datastax.com/agenda/repeatable-scalable-reliable-observable-cassandra/
thelastpickle.com
Case Study: Troubleshooting Cassandra performance issues as a developer | Carlos Alonso Pérez
This talk will be a step by step walkthrough of a developer troubleshooting a real performance issue we had at MyDrive, from the very first steps diagnosing the symptoms, through looking at metric charts down to CQL queries, the Ruby CQL driver, and Ruby code profiling.
Service discovery and configuration provisioning | Source Ministry
Slides from our talk "Service discovery and configuration provisioning" presented by Mariusz Gil at PHP Benelux 2016
Apache ZooKeeper and Consul are almost completely unknown in the PHP world, although their use solves a lot of typical problems. In a nutshell, they are central services for provisioning configuration information, distributed synchronization and coordination of servers/processes. They simplify application configuration management, so it is possible to change an application's settings and operation in real time (e.g. feature flagging). The presentation covers typical use cases of ZooKeeper/Consul in PHP applications, both strictly web and workers running from the CLI.
Using Spark to Load Oracle Data into Cassandra | Jim Hatcher
This presentation describes how you can use Spark as an ETL tool to get data from a relational database into Cassandra. I go through the concept in general and then talk about some specific issues you might run into and how to fix them.
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin | DataStax Academy
You know you need Cassandra for its uptime and scaling, but what about that data model? Let's bridge that gap and get you building your game-changing app. We'll break down topics like storing objects and indexing for fast retrieval. You will see that by understanding a few things about Cassandra internals, you can put your data model in the spotlight. The goal of this talk is to get you comfortable working with data in Cassandra throughout the application lifecycle. What are you waiting for? The cameras are waiting!
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016 | DataStax
Most web applications start out with a Postgres database, and it serves the application very well for an extended period of time. Depending on the type of application, the data model of the app will have a table that tracks some kind of state for either objects in the system or the users of the application. Names for this table include logs, messages or events. The growth in the number of rows in this table is not linear as the traffic to the app increases; it's typically exponential.
Over time, the state table will increasingly become the bulk of the data volume in Postgres, think terabytes, and become increasingly hard to query. This use case can be characterized as the one-big-table problem. In this situation, it makes sense to move that table out of Postgres and into Cassandra. This talk will walk through the conceptual differences between the two systems, a bit of data modeling, as well as advice on making the conversion.
About the Speaker
Rimas Silkaitis Product Manager, Heroku
Rimas currently runs Product for Heroku Postgres and Heroku Redis but the common thread throughout his career is data. From data analysis, building data warehouses and ultimately building data products, he's held various positions that have allowed him to see the challenges of working with data at all levels of an organization. This experience spans the smallest of startups to the biggest enterprises.
Scylla Summit 2018: From SAP to Scylla - Tracking the Fleet at GPS Insight | ScyllaDB
Originally using SAP Adaptive Server Enterprise (ASE), the GPS Insight team soon found that relational databases simply aren’t a match for high volume machine data. To top it off, SAP ASE’s clustering technology proved cumbersome to manage and operate. In this presentation, you’ll learn about GPS Insight’s hybrid Scylla deployment that runs on-premises and in an AWS datacenter. GPS Insight relies on Scylla to capture and analyze GPS data, offloading data from the RDBMS to Scylla for a hybrid analytics approach.
Time series with Apache Cassandra - Long version | Patrick McFadin
Apache Cassandra has proven to be one of the best solutions for storing and retrieving time series data. This talk will give you an overview of the many ways you can be successful. We will discuss how the storage model of Cassandra is well suited for this pattern and go over examples of how best to build data models.
An introduction to DataStax's Brisk (a distribution of Cassandra, Hadoop and Hive). Includes a back story of my own experience with Cassandra plus a demo of Brisk built around a very simple ad-network-type application.
Raquel Guimaraes - Third party infrastructure as code | Thoughtworks
While implementing cloud Infrastructure as Code you might have come across the problem of dealing with third-party resources. This is most common in complex environments where most of the resources live in a cloud provider (GCP or AWS for example) and there are some SaaS solutions to integrate with (Datadog and Pingdom for example). In this talk we will expose the problem and explain a solution that is currently being used by one of our key clients in Spain.
What We Learned About Cassandra While Building go90 (Christopher Webster & Th... | DataStax
Go90 is a mobile entertainment platform offering access to live and on-demand videos. We built the web services platform and social features like the activity feed for go90 by making heavy use of Cassandra and Scala, and would like to share what we learned during development and while operating go90. In this presentation, we cover our data model evolution from the initial prototypes to the current production version and the significant performance gain from using a better data model. We will explain how we apply time series data modeling and the benefits of using expiring columns with DateTieredCompactionStrategy. We will also talk about interesting experiences related to table modifications, tombstones and table pagination. On the operations side, we will discuss our findings on Java driver usage, performance, monitoring, cluster maintenance, version upgrades, two-way SSL and much more. We hope you can learn from our mistakes instead of making them yourself!
About the Speakers
Christopher Webster Software Engineer, AOL
Christopher Webster works on the web services platform for the go90 AOL project. Previously he was a Computer Scientist for the Mission Control Technologies project at NASA Ames Center. Chris worked as a senior staff engineer at Sun Microsystems for Project zembly, the cloud development and deployment environment as well as technical lead in many NetBeans projects. Chris is an author of the NetBeans Field Guide and Assemble the Social Web With Zembly.
Thomas Ng Software Engineer, AOL
Thomas Ng is a software engineer at AOL, building web services for the go90 mobile entertainment platform using Cassandra, Scala and Kafka.
Functional data models are great, but how can you squeeze out more performance and make them awesome? Let's talk through some example models, go through the tuning steps and understand the tradeoffs. Many times, just a simple understanding of the underlying internals can make all the difference. I've helped some of the biggest companies in the world do this and I can help you. Do you feel the need for Cassandra 2.0 speed?
Kief Morris - Infrastructure is terrible | Thoughtworks
Why is nearly every infrastructure project I've run across a big ball of mud? We're still in the early days of infrastructure as code tooling, so we're struggling with messy glue code, configuration files, and weird custom scripts and tools. What can you do on your project to cope with the current state of tooling? And what should we, as an industry, do to level up?
Event streaming applications unlock new benefits by combining various data feeds. However, getting actionable insights in a timely fashion has remained a challenge, as the data has been siloed in disparate systems. ksqlDB solves this by providing an interactive SQL interface that can seamlessly combine and transform data from various sources.
In this webinar, we will show how streaming queries of high throughput NoSQL systems can derive insights from various push/pull queries via ksqlDB's User-Defined Functions, Aggregate Functions and Table Functions.
Watch this to learn:
Real-world examples of the benefits of using a streaming database like ksqlDB and seamlessly combining data from Kafka & Cassandra/Scylla (NoSQL).
The functionality of ksqlDB via push/pull queries and UDFs/UDAFs/UDTFs.
The ease with which data stored in a NoSQL database can be transformed using ksqlDB and then persisted back for long-term storage.
Building a Scalable Distributed Stats Infrastructure with Storm and KairosDB | Cody Ray
Many startups collect and display stats and other time-series data for their users. A supposedly-simple NoSQL option such as MongoDB is often chosen to get started... which soon becomes 50 distributed replica sets as volume increases. This talk describes how we designed a scalable distributed stats infrastructure from the ground up. KairosDB, a rewrite of OpenTSDB built on top of Cassandra, provides a solid foundation for storing time-series data. Unfortunately, though, it has some limitations: millisecond time granularity and lack of atomic upsert operations which make counting (critical to any stats infrastructure) a challenge. Additionally, running KairosDB atop Cassandra inside AWS brings its own set of challenges, such as managing Cassandra seeds and AWS security groups as you grow or shrink your Cassandra ring. In this deep-dive talk, we explore how we've used a mix of open-source and in-house tools to tackle these challenges and build a robust, scalable, distributed stats infrastructure.
Hardening cassandra for compliance or paranoia | zznate
How to secure a Cassandra cluster. Includes details on configuring SSL, setting up a certificate authority, and creating certificates and trust chains for the JVM.
Cassandra Summit 2015: Real World DTCS For Operators | Jeff Jirsa
The introduction of DateTieredCompactionStrategy in late 2014 was a significant step forward in providing a viable compaction strategy for time series data, especially time series data that will be TTL'd out. DateTieredCompactionStrategy's introduction was met with genuine excitement, and its rapid adoption is testament to developers' and operators' desire to have data compacted in a way that better matches their write patterns.
However, DateTieredCompactionStrategy's features come with significant limitations. This talk will review our real world benchmarking and use cases for DTCS as a vehicle to discuss the implications of DateTieredCompactionStrategy for operational tasks such as repair, read-repair, bootstrapping, and especially DR recovery scenarios. It will also discuss how those limitations led us to propose an operations-friendly alternative to DateTieredCompactionStrategy.
Security is often an afterthought; configured and applied at the last minute before rolling out a new system. Instaclustr has deployed Cassandra for customers with many different requirements.
From deployments in Heroku requiring total public access through to private data centres, we will walk you through securing Cassandra the right way.
Pythian: My First 100 days with a Cassandra Cluster | DataStax Academy
Apache Cassandra is a massively scalable open source NoSQL database, and the amount of data that we create and copy annually is doubling in size every two years and is expected to reach 44 zettabytes, or 44 trillion gigabytes. We can assume that sooner or later a DBA will be handling a Cassandra database in their shop. This beginner/intermediate-level session will take you through my journey as an Oracle DBA and my first 100 days of administering a Cassandra cluster, with several demos and all the roadblocks and successes I had along this path.
In the rush to release a new product or a new version, or simply to get things working, security can sometimes be an afterthought. In this talk, Ben Bromhead, CTO of Instaclustr, will explore the various ways in which you can set up and secure Cassandra appropriately for your threat environment.
The Log-structured Merge-Tree storage engine in Apache Cassandra allows for fast write performance, but has some potential downsides when it comes to deleting data. Expired columns (from TTLs) and tombstones can impact read performance until they are purged from disk by compaction. And while different compaction strategies are suited to different workloads, they must all ensure deleted data stays deleted.
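The purge behaviour described above can be illustrated with a toy model. This is only a sketch and not Cassandra's implementation: the `Cell` type, the `compact` function and the second-based clock are hypothetical names made up for this example, and real compaction applies further conditions (for example around overlapping SSTables that are not part of the compaction) that are ignored here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    """Hypothetical, simplified cell; not a Cassandra type."""
    key: str
    value: Optional[str]              # None marks a tombstone (delete marker)
    write_time: int                   # seconds; the newest write wins on merge
    expires_at: Optional[int] = None  # set when the cell was written with a TTL


def compact(sstables: List[List[Cell]], now: int, gc_grace_seconds: int) -> List[Cell]:
    """Merge tables, keep the newest cell per key, purge old tombstones."""
    newest = {}
    for table in sstables:
        for cell in table:
            current = newest.get(cell.key)
            if current is None or cell.write_time > current.write_time:
                newest[cell.key] = cell

    merged = []
    for cell in newest.values():
        # Assumption: an expired TTL cell is treated like a tombstone
        # created at its expiry time.
        if cell.value is not None and cell.expires_at is not None and cell.expires_at <= now:
            cell = Cell(cell.key, None, cell.expires_at)
        # A tombstone is only dropped once the grace period has passed;
        # until then it is kept so the delete keeps shadowing older data.
        if cell.value is None and cell.write_time + gc_grace_seconds <= now:
            continue
        merged.append(cell)
    return sorted(merged, key=lambda c: c.key)


old_table = [Cell("a", "1", 100), Cell("b", "2", 100), Cell("c", "3", 100, expires_at=150)]
new_table = [Cell("a", None, 200)]  # "a" was deleted at t=200

early = compact([old_table, new_table], now=250, gc_grace_seconds=100)
late = compact([old_table, new_table], now=300, gc_grace_seconds=100)
print([c.key for c in early])
print([c.key for c in late])
```

In the first run the tombstone for "a" is still inside the grace period, so it stays on disk and continues to cost something on reads. In the second run it has aged out and is dropped along with the data it shadowed.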
Cassandra Summit 2015 - A Change of Seasons | Eiti Kimura
A CHANGE OF SEASONS: A big move to Apache Cassandra!
This is an extended version of the material presented at Cassandra Summit 2015 - Santa Clara - California - USA.
In this presentation I will show you the three moves (use cases) that constitute our Big Move to Apache Cassandra @Movile.
It walks from a relational model to a NoSQL solution, through hybrid platforms, to a staggering cost reduction and throughput increase.
Ficstar Software: Cassandra Installation to Optimization | DataStax Academy
A general rule of thumb talk aimed at late bloomers, managers, directors and architects who have yet to adopt Cassandra.
Covers:
- what not to do.
- operational setup
- data modeling
- performance tuning
- capacity planning
- advanced use cases
DN 2017 | Reducing pain in data engineering | Martin Loetzsch | Project A | Dataconomy Media
Making the data of a company accessible to analysts, business users and data scientists can be a quite painful endeavor. In the past 5 years, Project A has supported many of its portfolio companies with building data infrastructures and we experienced many of these pains first-hand. This talk shows how some of these pains can be overcome by applying common sense and standard software engineering best practices.
[WSO2Con Asia 2018] Patterns for Building Streaming Apps | WSO2
This slide deck explains how to enable digital transformation through streaming analytics and how easily streaming applications can be implemented.
Learn more: https://wso2.com/library/conference/2018/08/wso2con-asia-2018-patterns-for-building-streaming-apps/
Data Exploration with Apache Drill: Day 2 | Charles Givre
Study after study shows that data scientists and analysts spend between 50% and 90% of their time preparing their data for analysis. Using Drill, you can dramatically reduce the time it takes to go from raw data to insight. This course will show you how.
The course material for this presentation is available at https://github.com/cgivre/data-exploration-with-apache-drill
Introduction to WSO2 Data Analytics Platform | Srinath Perera
WSO2 has had several analytics products (or Big Data products, if you prefer the term) for some time: WSO2 BAM and WSO2 CEP. We have added WSO2 Machine Learner, a product to create, evaluate, and deploy predictive models, and renamed WSO2 BAM to WSO2 DAS (Data Analytics Server).
The platform lets you publish (collect) data once and process it through batch (Spark) and realtime (CEP) engines, search the data (Lucene), and build machine learning models.
This post describes how all of those fit into a single story.
For more information, see https://iwringer.wordpress.com/2015/03/18/introducing-wso2-analytics-platform-note-for-architects/
Analyzing and processing streaming data with Amazon EMR - ADB204 - New York A... | Amazon Web Services
Customers regularly use Apache Spark running on Amazon EMR to process large amounts of data. As time to insight and the ability to act quickly based on those insights become core differentiators for customers, there is a greater need to be able to analyze data in real time. In this session, we teach you several design patterns to process and analyze real-time streaming data using Amazon EMR and Amazon Kinesis data services.
[WSO2Con EU 2017] Streaming Analytics Patterns for Your Digital Enterprise | WSO2
The WSO2 analytics platform provides a high performance, lean, enterprise-ready, streaming solution to solve data integration and analytics challenges faced by connected businesses. This platform offers real-time, interactive, machine learning and batch processing technologies that empower enterprises to build a digital business. This session explores how to enable digital transformation by building a data analytics platform.
A presentation I made for Apache Spark and Apache Cassandra Integration.
First I present some of the differences between RDBMS and NoSQL, then I proceed with the Cassandra infrastructure and common errors when creating a Cassandra data model.
Finally, I cover the main underlying Spark concepts and some settings for proper configuration.
Inspired by recent political and economic events, this presentation will provide a conceptual overview and a technical primer to data mining using the "Cash for Clunkers" program as a hypothetical example for the discussion.
Related blog post "Cash for Clunkers - a Typical Sales Campaign?" http://practicalhoshin.blogspot.com/2009/08/cash-for-clunkers-typical-sales.html
We all know not to poke at alien life forms on another planet, right? But what about metrics: do you know how to pick, measure and draw conclusions from them? In this talk we will cover various Site Reliability Engineering topics, such as SLIs and SLOs, while we explore real life examples of defining and implementing metrics in a system, using Prometheus, an open-source system monitoring and alerting platform, to demonstrate implementation. Let's get back to some real science.
Data infrastructure for the other 90% of companies | Martin Loetzsch
Abstract: Unscientific guess: 90% of the companies out there have neither the data volumes nor the real-time requirements that justify maintaining a big data streaming infrastructure. Still, these companies also need to integrate data in order to improve their products and processes. Some of them still use Spark to handle a few GB of data, but for the vast majority, running SQL scripts in simple relational databases does the trick. In this talk, I will give some recommendations and best practices for setting up data integration infrastructure with open source technologies. I will explain why PostgreSQL is a perfect fit for building data warehouses with up to a few TB of data. And I will argue that Airflow is probably not the best tool for orchestrating the execution of SQL scripts.
Presented at the Data Council Meetup Kickoff in Berlin
Application metrics with Prometheus - DPC18 | Rafael Dohms
We all know not to poke at alien life forms on another planet, right? But what about metrics: do you know how to pick, measure and draw conclusions from them? In this talk we will cover Service Level Indicators (SLIs), Service Level Objectives (SLOs), and how to use Prometheus, an open-source system monitoring and alerting platform, to measure and make sense of them. Let's get back to some real science.
Mastering MapReduce: MapReduce for Big Data Management and Analysis | Teradata Aster
Whether you’ve heard of Google’s MapReduce or not, its impact on Big Data applications, data warehousing, ETL, business intelligence, and data mining is re-shaping the market for business analytics and data processing.
Attend this session to hear from Curt Monash on the basics of the MapReduce framework, how it is used, and what implementations like SQL-MapReduce enable.
In this session you will learn:
* The basics of MapReduce, key use cases, and what SQL-MapReduce adds
* Which industries and applications are heavily using MapReduce
* Recommendations for integrating MapReduce in your own BI, Data Warehousing environment
Cassandra sf 2015 - Steady State Data Size With Compaction, Tombstones, and TTL | aaronmorton
Slides from my talk at Cassandra Summit 2015
http://cassandrasummit-datastax.com/agenda/steady-state-data-size-with-compaction-tombstones-and-ttl/
thelastpickle.com
My talk from http://wdcnz.com 2012.
I took a brief look at Cassandra and then stepped through building a twitter clone. Very rough code is at https://github.com/amorton/wdcnz-2012-site
Building a distributed Key-Value store with Cassandra aaronmorton
Slides from my talk at Kiwi Pycon in 2010.
Covers why we chose Cassandra, an overview of its features and data model, and how we implemented our application.
Cyaniclab : Software Development Agency Portfolio.pdf Cyanic lab
CyanicLab, an offshore custom software development company based in Sweden, India, and Finland, is your go-to partner for startup development and innovative web design solutions. Our expert team specializes in crafting cutting-edge software tailored to meet the unique needs of startups and established enterprises alike. From conceptualization to execution, we offer comprehensive services including web and mobile app development, UI/UX design, and ongoing software maintenance. Ready to elevate your business? Contact CyanicLab today and let us propel your vision to success with our top-notch IT solutions.
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
Code reviews are vital for ensuring good code quality. They serve as one of our last lines of defense against bugs and subpar code reaching production.
Yet, they often turn into annoying tasks riddled with frustration, hostility, unclear feedback and lack of standards. How can we improve this crucial process?
In this session we will cover:
- The Art of Effective Code Reviews
- Streamlining the Review Process
- Elevating Reviews with Automated Tools
By the end of this presentation, you'll know how to organize and improve your code review process.
SOCRadar Research Team: Latest Activities of IntelBroker SOCRadar
The European Union Agency for Law Enforcement Cooperation (Europol) has suffered an alleged data breach after a notorious threat actor claimed to have exfiltrated data from its systems. Infamous data leaker IntelBroker posted on the even more infamous BreachForums hacking forum, saying that Europol suffered a data breach this month.
The alleged breach affected Europol agencies CCSE, EC3, Europol Platform for Experts, Law Enforcement Forum, and SIRIUS. Infiltration of these entities can disrupt ongoing investigations and compromise sensitive intelligence shared among international law enforcement agencies.
However, this is neither the first nor the last activity of IntelBroker. We have compiled for you what happened in the last few days. To track such hacker activities on dark web sources like hacker forums, private Telegram channels, and other hidden platforms where cyber threats often originate, you can check SOCRadar’s Dark Web News.
Stay Informed on Threat Actors’ Activity on the Dark Web with SOCRadar!
Advanced Flow Concepts Every Developer Should Know Peter Caitens
Tim Combridge from Sensible Giraffe and Salesforce Ben presents some important tips that all developers should know when dealing with Flows in Salesforce.
Unleash Unlimited Potential with One-Time Purchase
BoxLang is more than just a language; it's a community. By choosing a Visionary License, you're not just investing in your success, you're actively contributing to the ongoing development and support of BoxLang.
First Steps with Globus Compute Multi-User Endpoints Globus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researchers' workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user Globus Compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart... Globus
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data and applying computations on a different system. As a part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined data workflows, which can be run on demand, capable of applying many data reduction and data analysis operations to the large ESGF data archives, transferring only the resultant analysis (e.g., visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
Modern design is crucial in today's digital environment, and this is especially true for SharePoint intranets. The design of these digital hubs is critical to user engagement and productivity enhancement. They are the cornerstone of internal collaboration and interaction within enterprises.
Providing Globus Services to Users of JASMIN for Environmental Data Analysis Globus
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
Designing for Privacy in Amazon Web Services KrzysztofKkol1
Data privacy is one of the most critical issues that businesses face. This presentation shares insights on the principles and best practices for ensuring the resilience and security of your workload.
Drawing on a real-life project from the HR industry, the various challenges will be demonstrated: data protection, self-healing, business continuity, security, and transparency of data processing. This systematized approach made it possible to create a secure AWS cloud infrastructure that not only met strict compliance rules but also exceeded the client's expectations.
In software engineering, the right architecture is essential for robust, scalable platforms. Wix has undergone a pivotal shift from event sourcing to a CRUD-based model for its microservices. This talk will chart the course of this pivotal journey.
Event sourcing, which records state changes as immutable events, provided robust auditing and "time travel" debugging for Wix Stores' microservices. Despite its benefits, the complexity it introduced in state management slowed development. Wix responded by adopting a simpler, unified CRUD model. This talk will explore the challenges of event sourcing and the advantages of Wix's new "CRUD on steroids" approach, which streamlines API integration and domain event management while preserving data integrity and system resilience.
Participants will gain valuable insights into Wix's strategies for ensuring atomicity in database updates and event production, as well as caching, materialization, and performance optimization techniques within a distributed system.
Join us to discover how Wix has mastered the art of balancing simplicity and extensibility, and learn how the re-adoption of the modest CRUD has turbocharged their development velocity, resilience, and scalability in a high-growth environment.
How to Position Your Globus Data Portal for Success: Ten Good Practices Globus
Science gateways allow science and engineering communities to access shared data, software, computing services, and instruments. Science gateways have gained a lot of traction in the last twenty years, as evidenced by projects such as the Science Gateways Community Institute (SGCI) and the Center of Excellence on Science Gateways (SGX3) in the US, The Australian Research Data Commons (ARDC) and its platforms in Australia, and the projects around Virtual Research Environments in Europe. A few mature frameworks have evolved with their different strengths and foci and have been taken up by a larger community such as the Globus Data Portal, Hubzero, Tapis, and Galaxy. However, even when gateways are built on successful frameworks, they continue to face the challenges of ongoing maintenance costs and how to meet the ever-expanding needs of the community they serve with enhanced features. It is not uncommon that gateways with compelling use cases are nonetheless unable to get past the prototype phase and become a full production service, or if they do, they don't survive more than a couple of years. While there is no guaranteed pathway to success, it seems likely that for any gateway there is a need for a strong community and/or solid funding streams to create and sustain its success. With over twenty years of examples to draw from, this presentation goes into detail for ten factors common to successful and enduring gateways that effectively serve as best practices for any new or developing gateway.
Check out the webinar slides to learn more about how XfilesPro transforms Salesforce document management by leveraging its world-class applications. For more details, please connect with sales@xfilespro.com
If you want to watch the on-demand webinar, please click here: https://www.xfilespro.com/webinars/salesforce-document-management-2-0-smarter-faster-better/
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|... informapgpstrackings
Keep tabs on your field staff effortlessly with Informap Technology Centre LLC. Real-time tracking, task assignment, and smart features for efficient management. Request a live demo today!
For more details, visit us : https://informapuae.com/field-staff-tracking/
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv... Shahin Sheidaei
Games are powerful teaching tools, fostering hands-on engagement and fun. But they require careful consideration to succeed. Join me to explore factors in running and selecting games, ensuring they serve as effective teaching tools. Learn to maintain focus on learning objectives while playing, and how to measure the ROI of gaming in education. Discover strategies for pitching gaming to leadership. This session offers insights, tips, and examples for coaches, team leads, and enterprise leaders seeking to teach from simple to complex concepts.
Globus Connect Server Deep Dive - GlobusWorld 2024 Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ... Juraj Vysvader
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc. I didn't get rich from it, but the extensions had 63K downloads (possibly powering tens of thousands of websites).
Cassandra SF 2015 - Repeatable, Scalable, Reliable, Observable Cassandra
1. CASSANDRA SF 2015
REPEATABLE, SCALABLE, RELIABLE,
OBSERVABLE CASSANDRA
Aaron Morton
@aaronmorton
Co-Founder & Principal Consultant
Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
2. About The Last Pickle.
Work with clients to deliver and improve Apache Cassandra
based solutions.
Apache Cassandra Committer, DataStax MVP, Apache
Usergrid Committer.
Based in New Zealand, Australia, & USA.
5. No Look Writes
CREATE TABLE user_visits (
user text,
day int, // YYYYMMDD
PRIMARY KEY (user, day)
);
6. No Look Writes
// Bad
SELECT *
FROM user_visits
WHERE user = 'aaron' AND day = 20150924;
INSERT INTO user_visits (user, day)
VALUES ('aaron', 20150924);
7. No Look Writes
// Better
INSERT INTO user_visits (user, day)
VALUES ('aaron', 20150924);
INSERT INTO user_visits (user, day)
VALUES ('aaron', 20150924);
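The idea behind the "Better" slide can be sketched in a few lines of Python. This is only an illustration, not a Cassandra client: the dictionary `user_visits` and the helper `record_visit` are hypothetical stand-ins. It relies on the fact that a CQL INSERT overwrites any existing row with the same primary key, so repeating a write is harmless and no prior SELECT is needed.

```python
# In-memory stand-in for the user_visits table; NOT a Cassandra client.
# Rows are keyed by the primary key (user, day), so a repeated write
# overwrites the existing row, much like a CQL INSERT does.
user_visits = {}


def record_visit(table, user, day):
    """Write without reading first (hypothetical helper)."""
    table[(user, day)] = {"user": user, "day": day}


record_visit(user_visits, "aaron", 20150924)
record_visit(user_visits, "aaron", 20150924)  # repeating the write is harmless
```

Because the second call replaces the row rather than adding one, the table ends up with a single entry for that user and day.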
21. Concurrent Asynchronous Requests
// request for cities concurrently
SELECT *
FROM hotel_price
WHERE checkin_day = 20150924 AND city = 'Santa Clara';
SELECT *
FROM hotel_price
WHERE checkin_day = 20150924 AND city = 'San Jose';
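A rough sketch of issuing the per-city requests concurrently, using Python's asyncio. Nothing here talks to Cassandra: `HOTEL_PRICE`, `fetch_prices`, and `fetch_all` are hypothetical names, the sleep only simulates latency, and keying the data by (checkin_day, city) is an assumption made for the example. The hotel names are taken from the smoke test slides later in the deck.

```python
import asyncio

# Hypothetical in-memory data standing in for the hotel_price table.
HOTEL_PRICE = {
    (20150924, "Santa Clara"): ["Hilton Santa Clara"],
    (20150924, "San Jose"): ["Hyatt San Jose"],
}


async def fetch_prices(checkin_day, city):
    """Stand-in for one asynchronous SELECT for a single city."""
    await asyncio.sleep(0.01)  # simulated network latency
    return HOTEL_PRICE.get((checkin_day, city), [])


async def fetch_all(checkin_day, cities):
    # Start one request per city, then wait for all of them together.
    results = await asyncio.gather(
        *(fetch_prices(checkin_day, city) for city in cities)
    )
    return dict(zip(cities, results))


prices = asyncio.run(fetch_all(20150924, ["Santa Clara", "San Jose"]))
```

With a real driver the same shape applies: start every request first, then collect the results, so the total wait is close to the slowest single request instead of the sum of all of them.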
24. Data Model Smoke Test
/*
* Get Pricing Data
*/
// Load Data
INSERT INTO city_distances (city, distance, nearby_city)
VALUES ('Santa Clara', 0, 'Santa Clara');
INSERT INTO city_distances (city, distance, nearby_city)
VALUES ('Santa Clara', 1, 'San Jose');
INSERT INTO hotel_price (checkin_day, city, hotel_name, price_data)
VALUES (20150924, 'Santa Clara', 'Hilton Santa Clara', 0xFF);
INSERT INTO hotel_price (checkin_day, city, hotel_name, price_data)
VALUES (20150924, 'San Jose', 'Hyatt San Jose', 0xFF);
25. Data Model Smoke Test
// Step 1
// Get the nearby cities for the one selected by the user
SELECT nearby_city
FROM city_distances
WHERE city = 'Santa Clara' AND distance < 2;
// Step 2
// Parallel requests for each city returned.
SELECT city, hotel_name, price_data
FROM hotel_price
WHERE checkin_day = 20150924 AND city = 'Santa Clara';
SELECT city, hotel_name, price_data
FROM hotel_price
WHERE checkin_day = 20150924 AND city = 'San Jose';
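The two steps of the smoke test can also be walked through without a cluster. The sketch below is an assumption-heavy illustration: the lists `city_distances` and `hotel_price` are in-memory stand-ins loaded with the rows from the slides, and `nearby_cities` and `prices_for` are hypothetical helpers that mimic the two SELECT statements.

```python
from concurrent.futures import ThreadPoolExecutor

# In-memory stand-ins for the two tables, loaded with the rows from the slides.
city_distances = [
    ("Santa Clara", 0, "Santa Clara"),
    ("Santa Clara", 1, "San Jose"),
]
hotel_price = [
    (20150924, "Santa Clara", "Hilton Santa Clara", b"\xff"),
    (20150924, "San Jose", "Hyatt San Jose", b"\xff"),
]


def nearby_cities(city, max_distance):
    # Step 1: cities closer than max_distance to the selected city.
    return [n for c, d, n in city_distances if c == city and d < max_distance]


def prices_for(checkin_day, city):
    # Step 2: (city, hotel_name, price_data) rows for one city and day.
    return [
        (c, h, p)
        for day, c, h, p in hotel_price
        if day == checkin_day and c == city
    ]


cities = nearby_cities("Santa Clara", 2)
with ThreadPoolExecutor() as pool:
    # One request per city, run in parallel; map preserves input order.
    per_city = list(pool.map(lambda c: prices_for(20150924, c), cities))
rows = [row for city_rows in per_city for row in city_rows]
```

Checking `cities` and `rows` against the loaded data gives a quick pass or fail signal that the model supports the query path.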
55. Disk Smoke Tests
“Disk Latency and Other Random Numbers”
Al Tobey
http://tobert.github.io/post/2014-11-13-slides-disk-latency-and-other-random-numbers.html