Designing & Optimizing a micro-batch processing system to handle multi-billion events using 100+ nodes of Cassandra, Spark and Kafka - Lessons learned from the trenches
Designing and optimizing for 20+ billion operations a day presents a set of complex challenges, especially when the SLA is near real-time. In this presentation we will walk through our experience in building a large-scale event processing pipeline using Cassandra, Spark Streaming and Kafka on 100+ nodes. We will present the design patterns, development steps and diagnostic setups, at both the technology level and the application level, that are needed to manage an application of this scale. We also aim to present some unique problems we encountered in optimizing and operationalizing these environments.
About the Speakers
Ananth Ram Senior Principal / Senior Manager, Accenture
Ananth Ram is a Solution Architect with over 17 years of experience in Oracle database architecture and designing large-scale applications. He was with Oracle Corp for nine years before joining Accenture as Senior Principal. As a part of Accenture, Ananth has been working on many large-scale Oracle and big data initiatives over the last four years.
Rich Rein Solution Architect, DataStax
Rich Rein is a Solutions Architect from DataStax on the Accenture team with 30+ years as an architect, manager, and consultant in Silicon Valley's computing industry.
Rumeel Kazi, Accenture Federal
Rumeel Kazi is a Senior Manager in the Accenture Health & Public Service (H&PS) practice. He has over 17 years of Systems Integration implementation experience involving Oracle, J2EE platforms, Enterprise Application Integration, Supply Chain, ETL and Business Rules Management Systems. Rumeel has been working on large-scale Oracle and big data application solutions for the last 5 years.
This presentation will investigate how using micro-batching for submitting writes to Cassandra can improve throughput and reduce client application CPU load.
Micro-batching combines writes for the same partition key into a single network request and ensures they hit the "fast path" for writes on a Cassandra node.
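The grouping step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the speaker's implementation: the `micro_batches` function, the `max_batch_size` parameter and the sample sensor keys are all hypothetical, and the sketch only forms the batches; sending each one to Cassandra as a single request is left to the driver in use.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

# A write is modelled as (partition_key, row_values); both names are illustrative.
Write = Tuple[Hashable, dict]


def micro_batches(writes: Iterable[Write], max_batch_size: int = 50) -> List[List[Write]]:
    """Group writes by partition key, then split each group into bounded batches.

    Every returned batch contains writes for exactly one partition key, so it
    could be submitted as one request that touches a single partition.
    """
    by_partition: Dict[Hashable, List[Write]] = defaultdict(list)
    for write in writes:
        by_partition[write[0]].append(write)

    batches: List[List[Write]] = []
    for group in by_partition.values():
        # Cap the batch size so a hot partition does not produce one huge request.
        for start in range(0, len(group), max_batch_size):
            batches.append(group[start:start + max_batch_size])
    return batches


# Hypothetical sample data: two partitions, three writes.
writes = [("sensor-1", {"t": 1}), ("sensor-2", {"t": 1}), ("sensor-1", {"t": 2})]
batches = micro_batches(writes, max_batch_size=2)
```

Here the three writes collapse into two batches, one per partition key, so the client issues two requests instead of three. The batch size cap is a design choice assumed for the sketch; a suitable value depends on row size and cluster settings.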
About the Speaker
Adam Zegelin Technical Co-founder, Instaclustr
As Instaclustr's founding software engineer, Adam provides the foundation knowledge of our capability and engineering environment. He delivers business-focused value to our code-base and overall capability architecture. Adam is also focused on providing Instaclustr's contribution to the broader open source community on which our products and services rely, including Apache Cassandra, Apache Spark and other technologies such as CoreOS and Docker.
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Element Fleet has the largest benchmark database in our industry and we needed a robust and linearly scalable platform to turn this data into actionable insights for our customers. The platform needed to support advanced analytics, streaming data sets, and traditional business intelligence use cases.
In this presentation, we will discuss how we built a single, unified platform for both Advanced Analytics and traditional Business Intelligence using Cassandra on DSE. With Cassandra as our foundation, we are able to plug in the appropriate technology to meet varied use cases. The platform we’ve built supports real-time streaming (Spark Streaming/Kafka), batch and streaming analytics (PySpark, Spark Streaming), and traditional BI/data warehousing (C*/FiloDB). In this talk, we are going to explore the entire tech stack and the challenges we faced trying to support the above use cases. We will specifically discuss how we ingest and analyze IoT data (vehicle telematics) in real time and in batch, combine data from multiple data sources into a single data model, and support standardized and ad hoc reporting requirements.
About the Speaker
Jim Peregord Vice President - Analytics, Business Intelligence, Data Management, Element Corp.
C* for Deep Learning (Andrew Jefferson, Tractable) | Cassandra Summit 2016
A deep learning startup has a requirement for a robust and scalable data architecture. Training a Deep Neural Network requires 10s-100s of millions of examples consisting of data and metadata. In addition to training it is necessary to support test/validation, data exploration and more traditional data science analytics workloads. As a startup we have minimal resources and an engineering team of 1.
Cassandra, Spark and Kafka running on Mesos in AWS is a scalable architecture that is fast and easy to set up and maintain to deliver a data architecture for Deep Learning.
About the Speaker
Andrew Jefferson VP Engineering, Tractable
A software engineer specialising in realtime data systems. I've worked at companies from Startups to Apple on applications ranging from Ticketing to Genetics. Currently building data systems for training and exploiting Deep Neural Networks.
Develop Scalable Applications with DataStax Drivers (Alex Popescu, Bulat Shak...
DataStax provides modern, feature-rich, and highly tunable client libraries for C/C++, C#, Java, Node.js, Python, PHP, and Ruby that work with any cluster size no matter if deployed across multiple on premise or cloud datacenters.
Come learn right from the source about the DataStax drivers for Apache Cassandra and DSE and how they can help you build continuously available, fault tolerant, and instantly responsive applications.
About the Speakers
Alex Popescu Senior Product Manager, DataStax
I'm a developer turned product manager building developer tools for Apache Cassandra and DSE. With an eye for simplicity, I focus on creating friendly developer solutions that enable building high-performance, scalable, and fault tolerant applications. I'm passionate about open source and over years I made numerous contributions to major projects like TestNG and Groovy.
Bulat Shakirzyanov Architect, DataStax
Bulat Shakirzyanov, a.k.a. avalance123, is a software alchemist who holds a black belt in test-fu. Open source enthusiast, author of and contributor to several popular open source projects, he also loves talking about clean code, open source, unix, distributed systems, consensus algorithms and himself in third person.
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
At Knewton we operate a total of 29 clusters across five different VPCs, each ranging from 3 nodes to 24 nodes. For a team of three, maintaining this is not herculean; however, good tools to diagnose issues and gather information in a distributed manner are vital to moving quickly and minimizing engineering time spent.
The database team at Knewton has been successfully using a combination of Ansible and custom open sourced tools to maintain and improve the Cassandra deployment at Knewton. I will be talking about several of these tools and giving examples of how we are using them. Specifically I will discuss the cassandra-tracing tool, which analyzes the contents of the system_traces keyspace, and the cassandra-stat tool, which gives real-time output of the operations of a cassandra cluster. Distributed administration with ad-hoc Ansible will also be covered and I will walk through examples of using these commands to identify and remediate clusterwide issues.
About the Speaker
Jeffrey Berger Lead Database Engineer, Knewton
Dr. Jeffrey Berger is currently the lead database engineer at Knewton, an education tech startup in NYC. He joined the tech scene in NYC in 2013 and spent two years working with MongoDB, becoming a certified MongoDB administrator and a MongoDB Master. He received his Cassandra Administrator certification at Cassandra Summit 2015. He holds a Ph.D. in Theoretical Physics from Penn State and spent several years working on high energy nuclear interactions.
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
Cassandra is a distributed database with features including, but not limited to, Secondary Indexes, UDFs and Materialized Views, and with not-so-strict hardware requirements.
It is important to use those features and select hardware correctly to make sure the use of Cassandra in your business can be as painless as possible.
I will address how these features are used in the wrong way, how hardware should be selected, and how to make Cassandra work in the best possible way.
Learning Objective #1:
Learn that Cassandra hardware requirements exist (and why) and the shortcomings of some features (Secondary Indexes, Compaction Strategies, etc.).
Learning Objective #2:
The most misused features and common hardware errors, and how they might seem harmless at first (on either a small cluster or even a single node).
Learning Objective #3:
How to correctly use Cassandra and its features and go for perfect operation.
About the Speaker
Carlos Rolo Cassandra Consultant, Pythian
Carlos Rolo is a Cassandra MVP, and has deep expertise with distributed architecture technologies. Carlos is driven by challenge, and enjoys the opportunities to discover new things. He has become known and trusted by customers and colleagues for his ability to understand complex problems, and to work well under pressure. When Carlos isn't working he can be found playing water polo or enjoying his local community.
At Instagram, our mission is to capture and share the world's moments. Our app is used by over 400M people monthly; this creates a lot of challenging data needs. We use Cassandra heavily, as a general key-value storage. In this presentation, I will talk about how we use Cassandra to serve our critical use cases; the improvements/patches we made to make sure Cassandra can meet our low latency, high scalability requirements; and some pain points we have.
About the Speaker
Dikang Gu Software Engineer, Facebook
I'm a software engineer on the Instagram core infra team, working on scaling Instagram infrastructure, especially on building a generic key-value store based on Cassandra. Prior to this, I worked on the development of HDFS at Facebook. I received my master's degree in Computer Science from Shanghai Jiao Tong University in China.
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
In this talk, we review a real-world use case that tested the Cassandra+Spark stack on DataStax Enterprise (DSE). We also cover implementation details around application high availability and fault tolerance using the new DSE File System (DSEFS). From a field and testing perspective, we discuss the strategies we can leverage to meet our requirements. Such requirements include (but are not limited to) functional coverage, system integration, usability, and performance. We will discuss best practices and lessons we learned covering everything from application development to DSE setup and tuning.
About the Speaker
Rocco Varela Software Engineer in Test, DataStax
After earning his PhD in bioinformatics from UCSF, Rocco Varela took his passion for technology to DataStax. At DataStax he works on several aspects of performance and test automation around DataStax Enterprise (DSE) integrated offerings such as Apache Spark, Hadoop, Solr, and more recently DSE Graph.
We run multiple DataStax Enterprise clusters in Azure each holding 300 TB+ data to deeply understand Office 365 users. In this talk, we will deep dive into some of the key challenges and takeaways faced in running these clusters reliably over a year. To name a few: process crashes, ephemeral SSDs contributing to data loss, slow streaming between nodes, mutation drops, compaction strategy choices, schema updates when nodes are down and backup/restore. We will briefly talk about our contributions back to Cassandra, and our path forward using network attached disks offered via Azure premium storage.
About the Speaker
Anubhav Kale Sr. Software Engineer, Microsoft
Anubhav is a senior software engineer at Microsoft. His team is responsible for building a big data platform using Cassandra, Spark and Azure to generate per-user insights for Office 365 users.
Co-Founder and CTO of Instaclustr, Ben Bromhead's presentation at the Cassandra Summit 2016, in San Jose.
This presentation will show how to create truly elastic Cassandra deployments on AWS, allowing you to scale and shrink your large Cassandra deployments multiple times a day. Leveraging a combination of EBS-backed disks, JBOD, token pinning and our previous work on bootstrapping from backups, you will be able to dramatically reduce costs per cluster by scaling to match your daily workloads.
An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...
Cassandra is moving away from Thrift to the CQL protocol. Along with this change, Thrift-based client drivers are no longer actively supported, nor are they exposed to new Cassandra features. This leaves many existing Cassandra users needing to migrate their Thrift-based applications to CQL. A complete solution to Thrift-to-CQL migration requires changes in 3 areas: application code, data model, and existing data. In this session we will focus on how to migrate existing data effectively so that it reflects data model changes, and give you the necessary tools to build on what you have learned and apply it in your environments.
Learning Objectives:
1) Gain an insight into what the Cassandra storage engine looks like for version 2.2 and earlier
2) Get a better idea of how the differences between Thrift and CQL affect table design
3) Explore an approach to effectively migrate dynamically generated (Thrift) data into a statically defined (CQL) table
About the Speaker
Yabin Meng Apache Cassandra / DataStax Enterprise Consultant, Pythian
Yabin is a DataStax certified Architect, Administrator, and Developer. He has been in the IT industry for more than 15 years, and much of his career has centered on database-related technologies. He has been working with Cassandra for about 2 years and is currently a Cassandra/DSE consultant at Pythian.
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
Lessons learned from a year spent building a Cassandra cluster over multiple regions, data centers, and providers. Will discuss our successes and learnings on replication, operations, and application development.
About the Speaker
Aaron Ploetz Lead Technical Architect, Target
Aaron is a Lead Technical Architect for Target, where he coaches development teams on modeling and building applications for Cassandra. He is active in the Cassandra tags on StackOverflow, and has also contributed patches to cqlsh. Aaron holds a B.S. in Management/Computer Systems from the University of Wisconsin-Whitewater, a M.S. in Software Engineering and Database Technologies from Regis University, and is a 2x DataStax MVP for Apache Cassandra.
Deploying, Backups, and Restore w Datastax + Azure at Albertsons/Safeway (Gur...
Albertsons/Safeway, America’s second largest supermarket chain, relies on DataStax Enterprise for its online customer-facing application known as “LOYALTY”. With over 6 million users and 1 billion coupon clips per year, Albertsons/Safeway engages its buyers with their shopping experience from the web as well as the mobile app – but how does the organization ensure backup, restore, and redundancy?
This talk will explore how Albertsons/Safeway uses DataStax Enterprise for disaster avoidance, high availability, and extremely fast reads/writes. We will discuss how to run customized scripts in OpsCenter to ensure all nodes in the cluster are backed up without incurring performance hits and how Apache Cassandra data can be backed up while running on Azure using OS utilities and the system restored seamlessly without impacting app performance.
About the Speaker
Gurpreet Singh Data Services, Albertsons/ Safeway
Gurpreet Singh is a Cassandra Architect responsible for deploying, maintaining, and tuning customer facing applications that manage data, the most valuable asset in the organization.
Primary and Clustering Keys should be one of the very first things you learn about when modeling Cassandra data. Most people coming from a relational background automatically think, "Yeah, I know what a Primary Key is", and gloss right over it. Because of this, there always seems to be a lot of confusion around the topic of Primary Keys in Cassandra. This presentation will demystify that confusion. I will cover what the different types of Keys are, how they can be used, what their purpose is, and how they affect your queries.
For this presentation, I will be using CrossFit gym locations as my subject matter. I will explain the differences between Primary Keys, Compound Keys, Clustering Keys, & Composite Keys. I will also show how the data behind each type differs as stored on disk. Lastly, I will show what queries each type of key will support.
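As a rough illustration of how a partition key and clustering keys shape both storage and queries, the toy in-memory model below may help. It is a sketch under loud assumptions: the table layout `PRIMARY KEY ((state), city, gym_name)`, the helper functions and the gym names are all hypothetical, and a Python dictionary of sorted lists only approximates the idea of "one partition per key, rows ordered by clustering columns"; it is not Cassandra's on-disk format.

```python
from bisect import insort
from collections import defaultdict

# Hypothetical table: PRIMARY KEY ((state), city, gym_name)
#   partition key   -> state              (decides which partition holds the row)
#   clustering keys -> city, gym_name     (decide the row order inside a partition)
partitions = defaultdict(list)


def insert(state, city, gym_name):
    """Store a row in its partition, keeping rows sorted by the clustering keys."""
    insort(partitions[state], (city, gym_name))


def gyms_in_state(state):
    """Query by partition key only: returns the whole partition, already sorted."""
    return list(partitions.get(state, []))


def gyms_in_city(state, city):
    """Query by partition key plus the first clustering key."""
    return [row for row in partitions.get(state, []) if row[0] == city]


# Made-up sample rows for illustration.
insert("TX", "Dallas", "CrossFit Deep Ellum")
insert("TX", "Austin", "CrossFit Central")
insert("CA", "San Jose", "CrossFit Silicon Valley")
insert("TX", "Austin", "CrossFit Austin")

texas = gyms_in_state("TX")
austin = gyms_in_city("TX", "Austin")
```

Both supported lookups start from the partition key. A question such as "all gyms in Austin, regardless of state" has no partition to start from in this model, which mirrors why such a query would need a different table design or an index.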
About the Speaker
Adam Hutson Data Architect, DataScale
Adam is Data Architect for DataScale, Inc. He is a seasoned data professional with experience designing & developing large-scale, high-volume database systems. Adam previously spent four years as Senior Data Engineer for Expedia building a distributed Hotel Search using Cassandra 1.1 in AWS. Having worked with Cassandra since version 0.8, he was early to recognize the value Cassandra adds to Enterprise data storage. Adam is also a DataStax Certified Cassandra Developer.
Instaclustr has a diverse customer base including Ad Tech, IoT and messaging applications ranging from small start ups to large enterprises. In this presentation we share our experiences, common issues, diagnosis methods, and some tips and tricks for managing your Cassandra cluster.
About the Speaker
Brooke Jensen VP Technical Operations & Customer Services, Instaclustr
Instaclustr is the only provider of fully managed Cassandra as a Service in the world. Brooke Jensen manages our team of Engineers that maintain the operational performance of our diverse fleet of clusters, as well as providing 24/7 advice and support to our customers. Brooke has over 10 years' experience as a Software Engineer, specializing in performance optimization of large systems, and has extensive experience managing and resolving major system incidents.
Join us as we talk about the current state as well as the future of DSE Search. Nick Panahi will discuss high level architecture while Ariel will dive deep into some of the integration. We'll talk about future features, improvements and enhancements as well as some of the challenges of our custom integration and what that means for scale and availability.
About the Speakers
Nick Panahi Sr. Product Manager, DSE Search, DataStax
I am the product manager for DSE Search. Prior to product management, I was a solution architect for DataStax.
Ariel Weisberg Software Engineer, DataStax
Ariel is currently a Cassandra contributor and DataStax employee, and the former lead architect for VoltDB. Ariel aspires to be or considers himself a shared-nothing database expert depending on the time of day and whether Benedict is in the room, and has a passion for things measured in nanoseconds. Ariel has presented at events like Strangeloop, PAX Dev, OpenSQL camp Boston, NYC MySQL Meetup, and Boston New Technology Group meetup.
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
Since the introduction of SASI in Cassandra 3.4, it is way easier than before to query data. Now you can create performant indices on your columns as well as benefit from full text search capabilities with the introduction of the new `LIKE '%term%'` syntax.
This talk will show the architecture at a high level and expose all the trade-offs so you can choose and use SASI wisely.
We also highlight some use-cases where SASI is not a good fit and should be avoided (there is no magic sorry)
To illustrate the talk, we'll use a sample database of 110 000 albums and artists and create indices on them
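For readers who have not seen the syntax, the sketch below shows roughly what a SASI index and a `LIKE '%term%'` query look like. The CQL is held in Python strings so it can be handed to whichever driver or shell is in use; the keyspace, table, column and index names (`music.albums`, `title`, `albums_title_idx`) are assumptions made for illustration, not the sample schema used in the talk.

```python
# CQL statements kept as plain strings; no driver call is made in this sketch.
# All object names below are hypothetical.

# CONTAINS mode is what allows matching a term anywhere in the value;
# the default PREFIX mode would only support patterns of the form 'term%'.
CREATE_INDEX = """
CREATE CUSTOM INDEX albums_title_idx ON music.albums (title)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
    'mode': 'CONTAINS',
    'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
    'case_sensitive': 'false'
};
"""

# With the index in place, a substring search can use the LIKE syntax.
SEARCH_QUERY = "SELECT title FROM music.albums WHERE title LIKE '%love%';"

statements = [CREATE_INDEX.strip(), SEARCH_QUERY]
```

The non-tokenizing analyzer with `case_sensitive` set to `false` is one plausible choice for case-insensitive title search; other analyzers and modes trade index size against the kinds of patterns supported.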
About the Speaker
DuyHai DOAN Apache Cassandra Evangelist, DataStax
DuyHai DOAN is an Apache Cassandra Evangelist at DataStax. He spends his time between technical presentations/meetups on Cassandra, coding on open source projects like Achilles or Apache Zeppelin to support the community and helping all companies using Cassandra to make their project successful. Previously he was working as a freelance Java/Cassandra consultant.
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
Most web applications start out with a Postgres database and it serves the application very well for an extended period of time. Based on type of application, the data model of the app will have a table that tracks some kind of state for either objects in the system or the users of the application. Names for this table include logs, messages or events. The growth in the number of rows in this table is not linear as the traffic to the app increases, it's typically exponential.
Over time, the state table will increasingly become the bulk of the data volume in Postgres, think terabytes, and become increasingly hard to query. This use case can be characterized as the one-big-table problem. In this situation, it makes sense to move that table out of Postgres and into Cassandra. This talk will walk through the conceptual differences between the two systems, a bit of data modeling, as well as advice on making the conversion.
About the Speaker
Rimas Silkaitis Product Manager, Heroku
Rimas currently runs Product for Heroku Postgres and Heroku Redis but the common thread throughout his career is data. From data analysis, building data warehouses and ultimately building data products, he's held various positions that have allowed him to see the challenges of working with data at all levels of an organization. This experience spans the smallest of startups to the biggest enterprises.
Scalable Data Modeling by Example (Carlos Alonso, Job and Talent) | Cassandra...
Cassandra is getting more and more buzz, and that means two things: more development and more issues. Some issues are unavoidable, but others can be avoided just by understanding how our tooling works.
In this talk I'd like to review the core concepts on which Cassandra is built and how they impose the way we should work with it using some examples that will hopefully give you both a 'Quick Reference' and a 'Checklist' to go through every time you want to build scalable data models.
About the Speaker
Carlos Alonso Software Engineer, Job and Talent
Carlos received his Master's in CS from Salamanca University, Spain. He worked there for a few years in a digital agency, gaining expertise in a very wide range of technologies, before moving to London, where he narrowed his focus to the backend and data engineering disciplines. The latest step in his professional career was to move back to Madrid to work for Job and Talent, where he currently helps build the best candidate-job opening matching technology. Aside from work he likes sharing as much as he can through public speaking, mentoring or getting involved in OSS or OpenData initiatives.
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Large partitions shall no longer be a nightmare. That is the goal of CASSANDRA-11206.
100MB and 100,000 cells per partition is the recommended limit for a single partition in Cassandra up to 3.5. Exceeding these limits can cause a lot of trouble. Repairs and compactions could fail and reads cause out-of-memory failures.
This talk provides a deep dive into the reasons for the previous limitations, why exceeding these limitations caused trouble, how the improvements in Cassandra 3.6 help with big partitions, and why you should not blindly let your partitions get huge.
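As a rough illustration of the limits quoted above, the following sketch estimates whether a planned partition stays under 100 MB and 100,000 cells. The function names, the "cells = rows x regular columns" approximation, and the reading of 100 MB as 100 x 1024 x 1024 bytes are assumptions made for this example; they are not taken from the talk or from Cassandra's own tooling.

```python
# Limits quoted in the abstract for Cassandra versions up to 3.5.
# Assumption: "100MB" is read as 100 * 1024 * 1024 bytes.
MAX_PARTITION_BYTES = 100 * 1024 * 1024
MAX_PARTITION_CELLS = 100_000


def estimate_partition(rows, regular_columns, avg_cell_bytes):
    """Return a rough (cells, bytes) estimate for one partition.

    Assumption: every row stores one cell per regular column and
    every cell has the same average size. Real storage overhead is ignored.
    """
    cells = rows * regular_columns
    return cells, cells * avg_cell_bytes


def exceeds_limits(rows, regular_columns, avg_cell_bytes):
    """True if the estimate crosses either of the quoted limits."""
    cells, size = estimate_partition(rows, regular_columns, avg_cell_bytes)
    return cells > MAX_PARTITION_CELLS or size > MAX_PARTITION_BYTES


# Hypothetical table: 5 regular columns, about 50 bytes per cell.
small = exceeds_limits(rows=10_000, regular_columns=5, avg_cell_bytes=50)
large = exceeds_limits(rows=50_000, regular_columns=5, avg_cell_bytes=50)
```

In this hypothetical case the second partition is flagged because of its cell count, even though its estimated size is still far below 100 MB.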
About the Speaker
Robert Stupp Solution Architect, DataStax
Robert works as a Solutions Architect at DataStax and is also a committer to Apache Cassandra. Before joining DataStax he worked with his customers to architect and build distributed systems using Cassandra, and he has long experience in building distributed backend systems, mostly in Java.
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum... | DataStax
Leveraging your operational data for advanced and predictive analytics enables deeper insights and greater value for cloud applications. DSE Analytics is a complete platform for Operational Analytics, including data ingestion, stream processing, batch analysis, and machine learning.
In this talk we will provide an overview of DSE Analytics as it applies to data science tools and techniques, and demonstrate these via real world use cases and examples.
Brian Hess
Rob Murphy
Rocco Varela
About the Speakers
Brian Hess Senior Product Manager, Analytics, DataStax
Brian has been in the analytics space for over 15 years ranging from government to data mining applied research to analytics in enterprise data warehousing and NoSQL engines, in roles ranging from Cryptologic Mathematician to Director of Advanced Analytics to Senior Product Manager. In all these roles he has pushed data analytics and processing to massive scales in order to solve problems that were previously unsolvable.
Cassandra is the dominant data store used at Netflix and its health is critical to many of its services. In this talk we will share details of the recent redesign of our health monitoring system and how we leveraged a reactive stream processing system to give us a real-time view of our entire fleet while dramatically improving accuracy and reducing false alarms in our alerting.
About the Speaker
Jason Cacciatore Senior Software Engineer, Netflix
Jason Cacciatore is a Senior Software Engineer at Netflix, where he's been working for the past several years. He's interested in stateful distributed systems and has a diverse background in technology. In his spare time he enjoys spending time with his wife and two sons, reading non-fiction, and watching Netflix documentaries.
A look at the CQL changes in 3.x (Benjamin Lerer, Datastax) | Cassandra Summi... | DataStax
Several CQL changes have occurred since Cassandra 2.2. In this talk, I will explain some of the most important ones.
About the Speaker
Benjamin Lerer Software engineer, Datastax
Benjamin Lerer is an Apache Cassandra committer and a software engineer at Datastax. Prior to that, he worked 7 years for a High Frequency Trading Company.
There are many aspects of tuning Cassandra for production and a lot can go wrong: network splits and latency, hardware issues and failure, data corruption, etc. Most are mitigated with Cassandra's architecture but there are use cases where we need to dig deep and tune all layers to get the result we need to achieve specific business goals.
We will explore one such case, where we had to tune Cassandra for performance but also get consistent results on 99.999% of the queries. Getting to 99 percent was relatively easy, but pushing for those extra nines involved a lot of work. There are many nuts and bolts to turn and tune in order to get consistent results.
We will cover the biggest latency-inducing factors and see how to set up metrics and tackle the inevitable issues of cloud-based deployments. We will get into one of the major "sins" of AWS deployment by demystifying EBS-based storage, and talk about how we can leverage OS properties while tuning for high read performance.
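To make the "extra nines" concrete, here is a minimal sketch of computing tail latency percentiles from raw samples using the nearest-rank method. The `percentile` helper and the synthetic latency values are assumptions for illustration; the talk does not say which percentile method or metrics library was used.

```python
import math
from fractions import Fraction


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it.

    Fraction(str(p)) keeps the rank computation exact, so a value such
    as 99.9 does not pick up binary floating-point rounding error.
    """
    ordered = sorted(samples)
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Synthetic data (assumption): 1000 request latencies of 1..1000 ms.
latencies_ms = list(range(1, 1001))

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
p999 = percentile(latencies_ms, 99.9)
p99999 = percentile(latencies_ms, 99.999)
```

With only 1000 samples, the 99.999th percentile collapses onto the single slowest request, which is one reason measuring five nines requires very large sample counts.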
About the Speaker
Matija Gobec CTO, SmartCat
Experienced software engineer interested in distributed streaming systems and real time analytics. In love with Cassandra since early versions.
We run multiple DataStax Enterprise clusters in Azure, each holding 300+ TB of data, to deeply understand Office 365 users. In this talk, we will dive deep into some of the key challenges and takeaways from running these clusters reliably for over a year. To name a few: process crashes, ephemeral SSDs contributing to data loss, slow streaming between nodes, mutation drops, compaction strategy choices, schema updates when nodes are down, and backup/restore. We will briefly talk about our contributions back to Cassandra and our path forward using network-attached disks offered via Azure premium storage.
About the Speaker
Anubhav Kale Sr. Software Engineer, Microsoft
Anubhav is a senior software engineer at Microsoft. His team is responsible for building a big data platform using Cassandra, Spark, and Azure to generate per-user insights for Office 365 users.
Co-Founder and CTO of Instaclustr, Ben Bromhead's presentation at the Cassandra Summit 2016, in San Jose.
This presentation will show how to create truly elastic Cassandra deployments on AWS, allowing you to scale and shrink your large Cassandra deployments multiple times a day. Leveraging a combination of EBS-backed disks, JBOD, token pinning, and our previous work on bootstrapping from backups, you will be able to dramatically reduce costs per cluster by scaling to match your daily workloads.
An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian... | DataStax
Cassandra is moving away from Thrift to the CQL protocol. Along with this change, Thrift-based client drivers are no longer actively supported, nor are they exposed to new Cassandra features. As a result, many existing Cassandra users need to migrate their Thrift-based applications to CQL. A complete solution to Thrift-to-CQL migration requires changes in three areas: application code, data model, and existing data. In this session we will focus on how to migrate existing data effectively to reflect data model changes, and give you the tools to apply what you have learned in your own environments.
Learning Objectives:
1) Gain insight into what the Cassandra storage engine looks like in version 2.2 and earlier
2) Get a better idea of how the differences between Thrift and CQL affect table design
3) Explore an approach to effectively migrate dynamically generated (Thrift) data into a statically defined (CQL) table
About the Speaker
Yabin Meng Apache Cassandra / DataStax Enterprise Consultant, Pythian
Yabin is a DataStax certified Architect, Administrator, and Developer. He has been in the IT industry for more than 15 years, and much of his career has centered on database-related technologies. He has been working with Cassandra for about 2 years and is currently a Cassandra/DSE consultant at Pythian.
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ... | DataStax
Lessons learned from a year spent building a Cassandra cluster over multiple regions, data centers, and providers. Will discuss our successes and learnings on replication, operations, and application development.
About the Speaker
Aaron Ploetz Lead Technical Architect, Target
Aaron is a Lead Technical Architect for Target, where he coaches development teams on modeling and building applications for Cassandra. He is active in the Cassandra tags on StackOverflow, and has also contributed patches to cqlsh. Aaron holds a B.S. in Management/Computer Systems from the University of Wisconsin-Whitewater, a M.S. in Software Engineering and Database Technologies from Regis University, and is a 2x DataStax MVP for Apache Cassandra.
Deploying, Backups, and Restore w Datastax + Azure at Albertsons/Safeway (Gur... | DataStax
Albertsons/Safeway, America’s second largest supermarket chain, relies on DataStax Enterprise for its online customer-facing application known as “LOYALTY”. With over 6 million users and 1 billion coupon clips per year, Albertsons/Safeway engages buyers in their shopping experience on the web as well as in its mobile app. But how does the organization ensure backup, restore, and redundancy?
This talk will explore how Albertsons/Safeway uses DataStax Enterprise for disaster avoidance, high availability, and extremely fast reads/writes. We will discuss how to run customized scripts in OpsCenter to ensure all nodes in the cluster are backed up without incurring performance hits and how Apache Cassandra data can be backed up while running on Azure using OS utilities and the system restored seamlessly without impacting app performance.
About the Speaker
Gurpreet Singh Data Services, Albertsons/ Safeway
Gurpreet Singh is a Cassandra Architect responsible for deploying, maintaining, and tuning customer facing applications that manage data, the most valuable asset in the organization.
Primary and Clustering Keys should be one of the very first things you learn about when modeling Cassandra data. Most people coming from a relational background automatically think, "Yeah, I know what a Primary Key is", and gloss right over it. Because of this, there always seems to be a lot of confusion around the topic of Primary Keys in Cassandra. This presentation will demystify that confusion. I will cover what the different types of Keys are, how they can be used, what their purpose is, and how they affect your queries.
For this presentation, I will be using CrossFit gym locations as my subject matter. I will explain the differences between Primary Keys, Compound Keys, Clustering Keys, & Composite Keys. I will also show how the data behind each type differs as stored on disk. Lastly, I will show what queries each type of key will support.
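The role of the two key parts can be sketched with a toy in-memory model: the partition key decides which partition a row lands in, and the clustering key decides the sort order of rows inside that partition. The `Table` class, the state/city columns, and the gym names below are hypothetical stand-ins for the talk's CrossFit example; this is not how Cassandra is implemented, only the query behavior it implies.

```python
from bisect import insort
from collections import defaultdict


class Table:
    """Toy model of PRIMARY KEY ((partition_key), clustering_key)."""

    def __init__(self):
        # partition key -> list of (clustering_key, value), kept sorted
        self._partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, value):
        rows = self._partitions[partition_key]
        # Writing an existing primary key overwrites the row (an upsert).
        rows[:] = [r for r in rows if r[0] != clustering_key]
        insort(rows, (clustering_key, value))

    def select(self, partition_key, start=None, end=None):
        """Read one partition, optionally slicing on the clustering key.

        A query without a partition key has no method here, mirroring
        the fact that such queries cannot be served from one partition.
        """
        rows = self._partitions.get(partition_key, [])
        return [
            (ck, value)
            for ck, value in rows
            if (start is None or ck >= start) and (end is None or ck <= end)
        ]


# Hypothetical data: state is the partition key, city the clustering key.
gyms = Table()
gyms.insert("TX", "Dallas", "Gym B")
gyms.insert("TX", "Austin", "Gym A")
gyms.insert("CA", "Fresno", "Gym C")

texas = gyms.select("TX")               # rows come back sorted by city
from_b = gyms.select("TX", start="B")   # range slice on the clustering key
```

Rows within the "TX" partition come back ordered by city regardless of insertion order, and a range on the clustering key is a contiguous slice of that partition.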
About the Speaker
Adam Hutson Data Architect, DataScale
Adam is Data Architect for DataScale, Inc. He is a seasoned data professional with experience designing & developing large-scale, high-volume database systems. Adam previously spent four years as Senior Data Engineer for Expedia building a distributed Hotel Search using Cassandra 1.1 in AWS. Having worked with Cassandra since version 0.8, he was early to recognize the value Cassandra adds to Enterprise data storage. Adam is also a DataStax Certified Cassandra Developer.
Instaclustr has a diverse customer base including Ad Tech, IoT and messaging applications ranging from small start ups to large enterprises. In this presentation we share our experiences, common issues, diagnosis methods, and some tips and tricks for managing your Cassandra cluster.
About the Speaker
Brooke Jensen VP Technical Operations & Customer Services, Instaclustr
Instaclustr is the only provider of fully managed Cassandra as a Service in the world. Brooke Jensen manages our team of Engineers that maintain the operational performance of our diverse fleet clusters, as well as providing 24/7 advice and support to our customers. Brooke has over 10 years' experience as a Software Engineer, specializing in performance optimization of large systems and has extensive experience managing and resolving major system incidents.
Join us as we talk about the current state as well as the future of DSE Search. Nick Panahi will discuss high level architecture while Ariel will dive deep into some of the integration. We'll talk about future features, improvements and enhancements as well as some of the challenges of our custom integration and what that means for scale and availability.
About the Speakers
Nick Panahi Sr. Product Manager, DSE Search, DataStax
I am the product manager for DSE Search. Prior to product management, I was a solution architect for DataStax.
Ariel Weisberg Software Engineer, DataStax
Ariel is currently a Cassandra contributor and Datastax employee and former lead architect for VoltDB. Ariel aspires to be or considers himself a shared-nothing database expert depending on the time of day and whether Benedict is in the room, and has a passion for things measured in nanoseconds. Ariel has presented at events like Strangeloop, PAX Dev, OpenSQL camp Boston, NYC MySQL Meetup, and Boston New Technology Group meetup.
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum... | DataStax
Since the introduction of SASI in Cassandra 3.4, it is way easier than before to query data. Now you can create performant indices on your columns as well as benefit from full text search capabilities with the introduction of the new `LIKE '%term%'` syntax.
This talk will show the architecture at a high level and expose all the trade-offs so you can choose and use SASI wisely.
We also highlight some use cases where SASI is not a good fit and should be avoided (there is no magic, sorry).
To illustrate the talk, we'll use a sample database of 110,000 albums and artists and create indices on them.
About the Speaker
DuyHai DOAN Apache Cassandra Evangelist, Datastax
DuyHai DOAN is an Apache Cassandra Evangelist at DataStax. He spends his time between technical presentations/meetups on Cassandra, coding on open source projects like Achilles or Apache Zeppelin to support the community and helping all companies using Cassandra to make their project successful. Previously he was working as a freelance Java/Cassandra consultant.
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016 | DataStax
Most web applications start out with a Postgres database, and it serves the application very well for an extended period of time. Depending on the type of application, the data model will have a table that tracks some kind of state for either objects in the system or the users of the application. Names for this table include logs, messages, or events. The number of rows in this table does not grow linearly as traffic to the app increases; growth is typically exponential.
Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum... | DataStax
Advanced Apache Cassandra operations depend on an understanding of what features are available via the JMX interface. While nodetool exposes many of these, the most useful are still waiting to be discovered. The JMX interface allows the code base to expose functions that operate directly on internal structures, making real-time changes to the way the process runs. With this skill in your toolkit there is no limit to the changes you can make.
In this talk Nate McCall, CTO at The Last Pickle, will explain how to explore, secure, and invoke the JMX interface exposed by Cassandra. He'll then move on to what you can do with it, such as compacting specific SSTables, changing compaction on a single node, managing repairs, diagnosing latency, viewing cross-node timeouts, and more. Whether you are a developer or operator, new or experienced, you will be given a thorough understanding of what is available via JMX without having to consult the code on your own.
About the Speaker
Nate McCall CTO, The Last Pickle
Nate McCall has 16 years of server-side systems and software development experience. He started his involvement in the Cassandra community in the late fall of 2009 when he became one of the original developers on the Hector Java client. He has contributed a number of patches over the years to the Apache Cassandra code base and continues to be actively involved on the mailing lists, issue system, and IRC. He has been a DataStax MVP every year since the inception of the program.
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su... | DataStax
Deleting data from Cassandra has several challenges, and existing solutions (tombstones or TTLs) have limitations that make them unusable or untenable in certain circumstances. We'll explore the cases where existing deletion options fail or are inadequate, then describe a solution we developed which deletes data from Cassandra during standard or user-defined compaction, but without resorting to tombstones or TTLs.
About the Speaker
Eric Stevens Principal Architect, ProtectWise, Inc.
Eric is the principal architect, and day one employee of ProtectWise, Inc., specializing in massive real time processing and scalability problems. The team at ProtectWise processes, analyzes, optimizes, indexes, and stores billions of network packets each second. They look for threats in real time, but also store full fidelity network data (including PCAP), and when new security intelligence is received, automatically replay existing network history through that new intelligence.
KillrVideo: Data Modeling Evolved (Patrick McFadin, Datastax) | Cassandra Sum... | DataStax
In 2012 I presented my first version of the KillrVideo video sharing site and a data model was born! Many things have happened to Cassandra since then and, as a result, the data model for KillrVideo has evolved. The transition from Thrift to CQL was the first big shift. From Cassandra 2 to 3 we have seen some major usability enhancements to CQL that have reduced the complexity for the application developer. Indexing changes. Denormalization help. Syntax changes in the select queries. Storage engine changes that have eliminated anti-patterns. A lot to talk about in a constantly evolving project like Apache Cassandra. Don't get left behind!
About the Speaker
Patrick McFadin Chief Evangelist, DataStax
Patrick McFadin is one of the leading experts on Apache Cassandra and data modeling techniques. As the Chief Evangelist for Apache Cassandra and a consultant for DataStax, he has helped build some of the largest and most exciting deployments in production. Prior to DataStax, he was Chief Architect at Hobsons and an Oracle DBA/Developer for over 15 years.
Netflix stores 98 percent of the data related to its streaming services, from bookmarks and viewing history to billing and payment information. These services and applications need a highly available and scalable persistence solution to keep running efficiently in both normal and disaster situations. How does Netflix plan capacity for its new as well as existing services?
In this talk, Arun Agrawal, Senior Software Engineer, and Ajay Upadhyay, Cloud Data Architect at Netflix, will talk about capacity planning and capacity forecasting in the Cassandra world.
We will take you through the science behind forecasting short- and long-term usage and auto-scaling adequate capacity well before C* clusters reach their limit. This ensures a highly scalable and available persistence solution that meets our SLAs at Netflix.
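One simple way to think about forecasting when a cluster will reach its limit is a straight-line trend fitted to recent usage. The sketch below uses ordinary least squares in plain Python; the function names, the daily disk-usage figures, and the 200 GB limit are made up for illustration, and nothing here is claimed to be the method Netflix uses.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_limit(days, usage, limit):
    """Days after the last observation until the fitted trend hits `limit`.

    Returns None when usage is flat or shrinking, since the trend line
    then never reaches the limit.
    """
    slope, intercept = fit_line(days, usage)
    if slope <= 0:
        return None
    return (limit - intercept) / slope - days[-1]


# Hypothetical observations: disk usage in GB on five consecutive days.
days = [0, 1, 2, 3, 4]
usage_gb = [100, 110, 120, 130, 140]

remaining = days_until_limit(days, usage_gb, limit=200)
```

A linear trend is the simplest possible model; seasonal or bursty workloads would need something richer, which is presumably part of what the talk covers.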
About the Speakers
Ajay Upadhyay Senior Database Engineer, Netflix
Responsible for persistent layer at Netflix, part of CDE [Cloud Database Engineering] team. Working with application team, suggesting and guiding them with the best practices for various persistent layers provided by CDE team.
Arun Agrawal Senior Software Engineer, Netflix
Arun Agrawal is part of Cloud Database Engineering where they provide CAAS (Cassandra as a service). Ensuring smooth operations of service and finding innovative ways to reduce the management overheads of having CAAS.
Light Weight Transactions Under Stress (Christopher Batey, The Last Pickle) ... | DataStax
The Strong Consistency provided by QUORUM reads in Cassandra can still lead to read-modify-write problems when applications want to do things such as guarantee uniqueness or sell exactly 300 cinema tickets. Fortunately, Light Weight Transactions (LWT) are designed to solve the problems Strong Consistency cannot.
In this talk Christopher Batey, Consultant at The Last Pickle, will discuss:
- Syntax and semantics: Theoretical use cases
- How they work under the covers
Then we will go through LWTs in practice:
- How do the number of nodes/replicas/data centres affect performance?
- How does contention (multiple concurrent queries using LWTs) affect availability and performance?
- What consistency guarantees do you get with other LWTs and non-LWTs?
- How does LWT timeout differ from normal write timeout?
- Use case: LWTs as a distributed lock and how it went wrong 5 times.
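The "sell exactly 300 cinema tickets" problem can be sketched as a compare-and-set loop, which is the semantics a conditional update provides. The `TicketCounter` class below is a single-process simulation in which a lock stands in for the coordination a real LWT performs across replicas; all names are hypothetical and no Cassandra driver is involved.

```python
import threading


class TicketCounter:
    """In-memory stand-in for a row holding a `sold` counter."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.sold = 0
        # Assumption: this lock plays the role of the LWT's coordination.
        self._lock = threading.Lock()

    def read(self):
        return self.sold

    def compare_and_set(self, expected, new_value):
        """Apply the write only if `sold` still equals `expected`.

        Returns True if applied, False if another writer got there first.
        """
        with self._lock:
            if self.sold != expected:
                return False
            self.sold = new_value
            return True


def sell_one(counter):
    """Read, then conditionally write; retry when the condition fails."""
    while True:
        current = counter.read()
        if current >= counter.capacity:
            return False  # sold out
        if counter.compare_and_set(current, current + 1):
            return True


def run_buyers(counter, buyers=8, attempts_each=100):
    """Run concurrent buyers and return how many purchases succeeded."""
    results = []

    def buyer():
        results.append(sum(sell_one(counter) for _ in range(attempts_each)))

    threads = [threading.Thread(target=buyer) for _ in range(buyers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)


counter = TicketCounter(capacity=300)
tickets_sold = run_buyers(counter)  # 800 attempts against 300 tickets
```

Because every write is conditional on the value that was read, the 800 attempts yield exactly 300 successful sales. A plain read followed by an unconditional write would let two buyers who both read the same value each believe they bought the same ticket.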
About the Speaker
Christopher Batey Consultant / Software Engineer, The Last Pickle
Christopher (@chbatey) is a part-time consultant at The Last Pickle, where he works with clients to help them succeed with Apache Cassandra, as well as a freelance software engineer working in London. Likes: Scala, Haskell, Java, the JVM, Akka, distributed databases, XP, TDD, Pairing. Hates: Untested software, code ownership. You can check out his blog at: http://www.batey.info
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski... | DataStax
Cassandra's support for multiple data centers can bring massive benefits to an organization; however, it can also bring painful operational lessons. While there is no recipe for trouble-free multi-DC clusters, the best approach is to understand why you are using one, what Cassandra supports, and how it does it. With this knowledge in your toolkit you will have a better chance of fixing the sort of gremlins that can trouble a globally distributed database.
In this talk Alexander Dejanovski, Consultant at The Last Pickle, will outline the motivations people typically have for running a multi-DC cluster. He will also look at how multiple DCs are supported through all areas of Cassandra, how this impacts your application and operations, and how you can always blame the network.
About the Speaker
Alexander DEJANOVSKI Consultant, The Last Pickle
Alexander has been working as a software developer for the last 18 years, mainly for the French leader in express shipments. There he led the effort to build a Cassandra-based architecture and migrate services to it from traditional RDBMS. He is involved in the Cassandra community through the development of a JDBC wrapper for the DataStax Java Driver. Recently, he joined The Last Pickle as a Cassandra consultant and now helps customers get the best out of it.
Doug Cutting discusses:
- A brief history of Spark and its rise in popularity across developers and enterprises
- Spark's advantages over MapReduce
- The One Platform Initiative and the roadmap for Spark
- The future of data processing in Hadoop
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov... | Databricks
This talk shares experience and lessons learned from setting up and running the Apache Spark service inside the database group at CERN. It covers the many aspects of this change with examples taken from use cases and projects at the CERN Hadoop, Spark, streaming, and database services. The talk is aimed at developers, DBAs, service managers, and members of the Spark community who are using and/or investigating “Big Data” solutions deployed alongside relational database processing systems. The talk highlights key aspects of Apache Spark that have fuelled its rapid adoption for CERN use cases and for the data processing community at large, including the fact that it provides easy-to-use APIs that unify, under one large umbrella, many different types of data processing workloads, from ETL to SQL reporting to ML.
Spark can also easily integrate a large variety of data sources, from file-based formats to relational databases and more. Notably, Spark can easily scale up data pipelines and workloads from laptops to large clusters of commodity hardware or on the cloud. The talk also addresses some key points about the adoption process and learning curve around Apache Spark and the related “Big Data” tools for a community of developers and DBAs at CERN with a background in relational database operations.
The Future of Hadoop: A deeper look at Apache Spark | Cloudera, Inc.
Jai Ranganathan, Senior Director of Product Management, discusses why Spark has experienced such wide adoption and provides a technical deep dive into the architecture. Additionally, he presents some use cases in production today. Finally, he shares our vision for the Hadoop ecosystem and why we believe Spark is the successor to MapReduce for Hadoop data processing.
Big data represents a real challenge that is at once technical, commercial, and societal: exploiting massive data opens up possibilities for radical transformation in businesses and in everyday usage. That holds, at least, provided we are technically capable of it, because acquiring, storing, and exploiting massive quantities of data pose real technical challenges.
A big data architecture enables the creation and administration of all the technical systems that make proper exploitation of the data possible.
There are a great many different tools for handling massive quantities of data: for storage, analysis, or distribution, for example. But how do you assemble these tools into an architecture that can scale, tolerate failures, and be easily extended, all without costs exploding?
The success of big data depends on its architecture, on sound infrastructure, and on the use made of it: "Data into Information into Value".
A big data architecture is composed of four main parts: Integration, Data Processing & Storage, Security, and Operations.
DoneDeal AWS Data Analytics Platform build using AWS products: EMR, Data Pipeline, S3, Kinesis, Redshift and Tableau. Custom built ETL was written using PySpark.
Getting real-time analytics for device, application, and business monitoring from trillions of events and petabytes of data, as companies like Netflix, Uber, Alibaba, PayPal, eBay, and Metamarkets do.
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
Whether to take data ingestion cycles off the ETL tool and the data warehouse or to facilitate competitive Data Science and building algorithms in the organization, the data lake – a place for unmodeled and vast data – will be provisioned widely in 2020.
Though it doesn’t have to be complicated, the data lake has a few key design points that are critical, and it does need to follow some principles for success. Avoid building a data swamp instead of a data lake! The tool ecosystem is building up around the data lake and soon many will have a robust lake and data warehouse. We will discuss policy to keep them straight, send data to its best platform, and keep users’ confidence up in their data platforms.
Data lakes will be built in cloud object storage. We’ll discuss the options there as well.
Get this data point for your data lake journey.
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Precisely
Tackling the challenge of designing a machine learning model and putting it into production is the key to getting value back – and the roadblock that stops many promising machine learning projects. After the data scientists have done their part, engineering robust production data pipelines has its own set of challenges. Syncsort software helps the data engineer every step of the way.
Building on the process of finding and matching duplicates to resolve entities, the next step is to set up a continuous streaming flow of data from data sources so that as the sources change, new data automatically gets pushed through the same transformation and cleansing data flow – into the arms of machine learning models.
Some of your sources may already be streaming, but the rest are sitting in transactional databases that change hundreds or thousands of times a day. The challenge is that you can’t affect performance of data sources that run key applications, so putting something like database triggers in place is not the best idea. Using Apache Kafka or similar technologies as the backbone for moving data around doesn’t solve the problem of needing to grab changes from the source, push them into Kafka, and consume the data from Kafka to be processed. If something unexpected happens, such as connectivity being lost on either the source or the target side, you don’t want to have to fix it or start over because the data is out of sync.
View this 15-minute webcast on-demand to learn how to tackle these challenges in large scale production implementations.
Large Scale Data Analytics with Spark and Cassandra on the DSE PlatformDataStax Academy
This talk will show how Large Scale Data Analytics can be done with Spark and Cassandra on the DataStax Enterprise Platform. First we will give an overview of what the Spark Cassandra Connector is and how it enables working with large data sets. Then we will use the Spark Notebook to show live examples in the browser of interacting with the data. The example will load a large Movies Database from Cassandra into Spark and then show how that data can be transformed and analyzed using Spark.
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...Cloudera, Inc.
3 Things to Learn About:
*How Apache Kudu enables users to do more than ever before with their Analytic and Operational Databases
*How Cloudera has built two versatile databases to help our customers tackle their hardest problems.
*How the addition of Apache Kudu to this mix will enable new use cases around real-time analytics, internet of things, time series data, and more.
Learn about the various approaches to sharding your data with MongoDB. This presentation will help you answer questions such as when to shard and how to choose a shard key.
First in Class: Optimizing the Data Lake for Tighter IntegrationInside Analysis
The Briefing Room with Dr. Robin Bloor and Teradata RainStor
Live Webcast October 13, 2015
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=012bb2c290097165911872b1f241531d
Hadoop data lakes are emerging as peers to corporate data warehouses. However, successful data management solutions require a fusion of all relevant data, new and old, which has proven challenging for many companies. With a data lake that’s been optimized for fast queries, solid governance and lifecycle management, users can take data management to a whole new level.
Register for this episode of The Briefing Room to learn from veteran Analyst Dr. Robin Bloor as he discusses the relevance of data lakes in today’s information landscape. He’ll be briefed by Mark Cusack of Teradata, who will explain how his company’s archiving solution has developed into a storage point for raw data. He’ll show how the proven compression, scalability and governance of Teradata RainStor combined with Hadoop can enable an optimized data lake that serves both as a reservoir for historical data and as a “system of record” for the enterprise.
Visit InsideAnalysis.com for more information.
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...ScyllaDB
Discover how to avoid common pitfalls when shifting to an event-driven architecture (EDA) in order to boost system recovery and scalability. We cover Kafka Schema Registry, in-broker transformations, event sourcing, and more.
Similar to Designing & Optimizing Micro Batching Systems Using 100+ Nodes (Ananth Ram, Rumeel Kazi, Accenture / Rich Rein, DataStax) | C* Summit 2016 (20)
Is Your Enterprise Ready to Shine This Holiday Season?DataStax
Be a holiday hero—not a sorry statistic. View this on-demand webinar to learn how to drive revenue, business growth, customer satisfaction, and loyalty during the holiday season, and achieve operational excellence (and sanity!) at the same time. You’ll also hear real-world stories of companies that have experienced Black Friday nightmares—and learn how they turned things back around.
View webinar: https://pages.datastax.com/20191003-NAM-Webinar-IsYourEnterpriseReadytoShinethisHolidaySeason_1-Registration-LP.html
Explore all DataStax webinars: www.datastax.com/webinars
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...DataStax
Data resiliency and availability are mission-critical for enterprises today—yet we live in a world where outages are an everyday occurrence. Whether the problem is a single server failure or losing connectivity to an entire data center, if your applications aren’t designed to be fault tolerant, recovery from an outage can be painful and slow. Watch this on-demand webinar to look at best practices for developing fault-tolerant applications with DataStax Drivers for Apache Cassandra and DataStax Enterprise (DSE).
View recording: https://youtu.be/NT2-i3u5wo0
Explore all DataStax webinars: https://www.datastax.com/resources/webinars
Running DataStax Enterprise in VMware Cloud and Hybrid EnvironmentsDataStax
To simplify deploying and managing modern applications, enterprises have been combining the benefits of hyperconverged infrastructure (HCI) with the performance and scale of a NoSQL database — and the results have been remarkable. With this combination, IT organizations have experienced more agility, improved reliability, and better application performance. Watch this on-demand webinar where you’ll learn specifically how VMware HCI with DataStax Enterprise (DSE) and Apache Cassandra™ are transforming the enterprise.
View recording: https://youtu.be/FCLGHMIB0L4
Explore all DataStax Webinars: https://www.datastax.com/resources/webinars
Best Practices for Getting to Production with DataStax Enterprise GraphDataStax
A distributed graph database is the most powerful means of discovering and leveraging the relationships in your data. With the right techniques combined with the right enterprise graph features, you can build modern applications at scale for real-time use-cases. But how exactly should you manage and model your data for a distributed graph database? And how can you leverage the relationships in that data? Watch this on-demand webinar as our graph expert answers those questions and shares tips and insights into creating production apps with distributed graph data.
View recording: https://youtu.be/TSs_qPnhOas
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyDataStax
Data management may be the hardest part of making the transition to the cloud, but enterprises including Intuit and Macy’s have figured out how to do it right. So what do they know that you might not? Join Robin Schumacher, Chief Product Officer at DataStax, as he explores best practices for defining and implementing data management strategies for the cloud. He outlines a four-step journey that will take you from your first deployment in the cloud through to a true intercloud implementation, and walks through a real-world use case where a major retailer has evolved through the four phases over a period of four years and is now benefiting from a highly resilient multi-cloud deployment.
View webinar: https://youtu.be/RrTxQ2BAxjg
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...DataStax
In this webinar, you will leverage free and open source tools as well as enterprise-grade utilities developed by DataStax to get a solid grasp on the performance of a masterless distributed database like Cassandra. You’ll also get the opportunity to walk through DataStax Enterprise Insights dashboards and see exactly how to identify performance bottlenecks.
View Recording: https://youtu.be/McZg_MMzVjI
Webinar | Better Together: Apache Cassandra and Apache KafkaDataStax
In this webinar, you’ll also be introduced to DataStax Apache Kafka Connector, and get a brief demonstration of this groundbreaking technology. You’ll directly experience how this tool can help you stream data from Kafka topics into DataStax Enterprise versions of Cassandra. The future of your organization won’t wait. Register now to reserve your spot in this exciting new webinar.
Youtube: https://youtu.be/HmkNb8twUNk
Top 10 Best Practices for Apache Cassandra and DataStax EnterpriseDataStax
No matter how diligent your organization is at driving toward efficiency, databases are complex and it’s easy to make mistakes on your way to production. The good news is, these mistakes are completely avoidable. In this webinar, Jeff Carpenter shares with you exactly how to get started in the right direction — and stay on the path to a successful database launch.
View recording: https://youtu.be/K9Zj3bhjdQg
Explore all DataStax webinars: https://www.datastax.com/resources/webinars
Introduction to Apache Cassandra™ + What’s New in 4.0DataStax
Apache Cassandra has been a driving force for applications that scale for over 10 years. This open-source database now powers 30% of the Fortune 100. Now is your chance to get an inside look, guided by the company that’s responsible for 85% of the code commits. You won’t want to miss this deep dive into the database that has become the power behind the moment, the force behind game-changing, scalable cloud applications. Patrick McFadin, VP Developer Relations at DataStax, is going behind the Cassandra curtain in an exclusive webinar.
View recording: https://youtu.be/z8fLn8GL5as
Explore all DataStax webinars: https://www.datastax.com/resources/webinars
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...DataStax
In this webinar, we’ll discuss how an Active Everywhere database—a masterless architecture where multiple servers (or nodes) are grouped together in a cluster—provides a consistent data fabric between on-premises data centers and public clouds, enabling enterprises to effortlessly scale their hybrid cloud deployments and easily transition to the new hybrid cloud world, without changes to existing applications.
View recording: https://youtu.be/ob6tr-9YiF4
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud RealitiesDataStax
The European Union’s General Data Protection Regulation (GDPR) has sweeping effects on how enterprises manage their data. Without the right policies and safeguards in place, a tiny data mishap could end up turning into a catastrophic mistake. Join DataStax and our partner Thales eSecurity for a live webinar to learn how GDPR affects data management and the various ways enterprises can both comply and thrive in a hybrid cloud environment.
View recording: https://youtu.be/QZ48_qkK9PU
Explore all DataStax webinars: https://www.datastax.com/resources/webinars
Designing a Distributed Cloud Database for DummiesDataStax
Join Designing a Distributed Cloud Database for Dummies, the webinar. The webinar “stars” industry vet Patrick McFadin, best known among developers for his seven years at Apache Cassandra, where he held pivotal community roles. Register for the webinar today to learn: why you need distributed cloud databases, the technology you need to create the best user experience, the benefits of data autonomy and much more.
View the recording: https://youtu.be/azC7lB0QU7E
To explore all DataStax webinars: https://www.datastax.com/resources/webinars
How to Power Innovation with Geo-Distributed Data Management in Hybrid CloudDataStax
Most enterprises understand the value of hybrid cloud. In fact, your enterprise is already working in a multi-cloud or hybrid cloud environment, whether you know it or not. View this SlideShare to gain a greater understanding of the requirements of a geo-distributed cloud database in hybrid and multi-cloud environments.
View recording: https://youtu.be/tHukS-p6lUI
Explore all DataStax webinars: https://www.datastax.com/resources/webinars
How to Evaluate Cloud Databases for eCommerceDataStax
View these slides to discover the advantages of a distributed cloud database designed for hybrid cloud along with examples of how companies are delivering innovative and personalized ecommerce experiences. We'll discuss the sources of common data challenges and the hidden impact they have on business, the database requirements for improved customer experiences and innovative application delivery, and how leading organizations such as eBay, Sony, Macy’s, and Comcast are transforming the eCommerce experience with DataStax Enterprise 6.
View recording: https://youtu.be/4UXrJ3xtmGg
Explore all DataStax webinars: https://www.datastax.com/resources/webinars
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...DataStax
Today’s customers want experiences that are contextual, always on, and above all — delightful. To be able to provide this, enterprises need a distributed, hybrid cloud-ready database that can easily crunch massive volumes of data from disparate sources while offering data autonomy and operational simplicity. Don’t miss this webinar, where you’ll learn how DataStax Enterprise 6 maintains hybrid cloud flexibility with all the benefits of a distributed cloud database, delivers all the advantages of Apache Cassandra with none of the complexities, doubles performance, and provides additional capabilities around robust transactional analytics, graph, search, and more.
View recording: https://youtu.be/tuiWAt2jwBw
Explore all DataStax webinars: https://www.datastax.com/resources/webinars
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...DataStax
Today’s Right-Now Economy means employees and customers alike expect applications to be always on, real time, and contextual. But how do you manage applications that collect data from a variety of sources, at cloud scale, and provide instant insights? And, can you embrace the public cloud while still retaining control of your data? Join us to hear from Microsoft Cloud Architect and Azure Global Black Belt Ron Abellera to learn how an enterprise-ready hybrid cloud data layer can help to accelerate time to market and scale linearly, ensure continuous availability, and achieve data autonomy with a hybrid cloud strategy.
View webinar recording: https://youtu.be/_-GqmAk5C_I
Explore all DataStax webinars: https://www.datastax.com/resources/webinars
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...DataStax
Welcome to the Right-Now Economy. To win in the Right-Now Economy, your enterprise needs to be able to provide delightful, always-on, instantaneously responsive applications via a data layer that can handle data rapidly, in real time, and at cloud scale. Don’t miss our upcoming webinar in which Forrester Principal Analyst Brendan Witcher will discuss why a singular, contextual, 360-degree view of the customer in real-time is critical to CX success and how companies are using data to deliver real-time personalization and recommendations.
View recording: https://youtu.be/e6prezfIGMY
Explore all DataStax webinars: https://www.datastax.com/resources/webinars
Datastax - The Architect's guide to customer experience (CX)DataStax
From scalability to data access to data governance, learn the specific performance and data requirements of a customer experience-ready data management platform.
An Operational Data Layer is Critical for Transformative Banking ApplicationsDataStax
Customer expectations are changing fast, while customer-related data is pouring in at an unprecedented rate and volume. Join this webinar to hear leading experts from DataStax discuss how DataStax Enterprise, the data management platform trusted by 9 out of the top 15 global banks, enables innovation and industry transformation. They’ll cover how the right data management platform can help break down data silos and modernize old systems of record as an operational data layer that scales to meet the distributed, real-time, always available demands of the enterprise. Register now to learn how the right data management platform allows you to power innovative banking applications, gain instant insight into comprehensive customer interactions, and beat fraud before it happens.
Video: https://youtu.be/319NnKEKJzI
Explore all DataStax webinars: https://www.datastax.com/resources/webinars
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design ThinkingDataStax
Customer expectations are changing fast, while customer-related data is pouring in at an unprecedented rate and volume. How can you contextualize and analyze all this customer data in real time to meet increasingly demanding customer expectations? Join Mike Rowland, Director and National Practice Leader for CX Strategy at West Monroe Partners, and Kartavya Jain, Product Marketing Manager at DataStax, for an in-depth conversation about how customer experience frameworks, driven by Design Thinking, can help enterprises: understand their customers and their needs, define their strategy for real-time CX, create value from contextual and instant insights.
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
In software engineering, the right architecture is essential for robust, scalable platforms. Wix has undergone a pivotal shift from event sourcing to a CRUD-based model for its microservices. This talk will chart the course of this pivotal journey.
Event sourcing, which records state changes as immutable events, provided robust auditing and "time travel" debugging for Wix Stores' microservices. Despite its benefits, the complexity it introduced in state management slowed development. Wix responded by adopting a simpler, unified CRUD model. This talk will explore the challenges of event sourcing and the advantages of Wix's new "CRUD on steroids" approach, which streamlines API integration and domain event management while preserving data integrity and system resilience.
Participants will gain valuable insights into Wix's strategies for ensuring atomicity in database updates and event production, as well as caching, materialization, and performance optimization techniques within a distributed system.
Join us to discover how Wix has mastered the art of balancing simplicity and extensibility, and learn how the re-adoption of the modest CRUD has turbocharged their development velocity, resilience, and scalability in a high-growth environment.
Navigating the Metaverse: A Journey into Virtual EvolutionDonna Lenk
Join us for an exploration of the Metaverse's evolution, where innovation meets imagination. Discover new dimensions of virtual events, engage with thought-provoking discussions, and witness the transformative power of digital realms.
Experience our free, in-depth three-part Tendenci Platform Corporate Membership Management workshop series! In Session 1 on May 14th, 2024, we began with an Introduction and Setup, mastering the configuration of your Corporate Membership Module settings to establish membership types, applications, and more. Then, on May 16th, 2024, in Session 2, we focused on binding individual members to a Corporate Membership and Corporate Reps, teaching you how to add individual members and assign Corporate Representatives to manage dues, renewals, and associated members. Finally, on May 28th, 2024, in Session 3, we covered questions and concerns, addressing any queries or issues you may have.
For more Tendenci AMS events, check out www.tendenci.com/events
Code reviews are vital for ensuring good code quality. They serve as one of our last lines of defense against bugs and subpar code reaching production.
Yet, they often turn into annoying tasks riddled with frustration, hostility, unclear feedback and lack of standards. How can we improve this crucial process?
In this session we will cover:
- The Art of Effective Code Reviews
- Streamlining the Review Process
- Elevating Reviews with Automated Tools
By the end of this presentation, you'll know how to organize and improve your code review process.
A Comprehensive Look at Generative AI in Retail App Testing.pdfkalichargn70th171
Traditional software testing methods are being challenged in retail, where customer expectations and technological advancements continually shape the landscape. Enter generative AI—a transformative subset of artificial intelligence technologies poised to revolutionize software testing.
An Enterprise Resource Planning system includes various modules that reduce any business's workload. Additionally, it organizes the workflows, which helps enhance productivity. Here is a detailed explanation of the ERP modules. Going through the points will help you understand how the software is changing the work dynamics.
To know more details here: https://blogs.nyggs.com/nyggs/enterprise-resource-planning-erp-system-modules/
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Mind IT Systems
Healthcare providers often struggle with the complexities of chronic conditions and remote patient monitoring, as each patient requires personalized care and ongoing monitoring. Off-the-shelf solutions may not meet these diverse needs, leading to inefficiencies and gaps in care. This is where custom healthcare software offers a tailored solution, ensuring improved care and effectiveness.
Paketo Buildpacks: the best way to build OCI images? DevopsDa...Anthony Dahanne
Buildpacks have existed for more than 10 years! At first, they were used to detect and build an application before deploying it to certain PaaS platforms. Later, their latest generation, the Cloud Native Buildpacks (a CNCF incubating project), made it possible to create Docker (OCI) images. Are they a good alternative to the Dockerfile? What are Paketo buildpacks? Which communities support them, and how?
Come find out in this ignite session.
Cyaniclab : Software Development Agency Portfolio.pdfCyanic lab
CyanicLab, an offshore custom software development company based in Sweden, India, and Finland, is your go-to partner for startup development and innovative web design solutions. Our expert team specializes in crafting cutting-edge software tailored to meet the unique needs of startups and established enterprises alike. From conceptualization to execution, we offer comprehensive services including web and mobile app development, UI/UX design, and ongoing software maintenance. Ready to elevate your business? Contact CyanicLab today and let us propel your vision to success with our top-notch IT solutions.
Developing Distributed High-performance Computing Capabilities of an Open Sci...Globus
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar
The European Union Agency for Law Enforcement Cooperation (Europol) has suffered an alleged data breach after a notorious threat actor claimed to have exfiltrated data from its systems. Infamous data leaker IntelBroker posted on the even more infamous BreachForums hacking forum, saying that Europol suffered a data breach this month.
The alleged breach affected Europol agencies CCSE, EC3, Europol Platform for Experts, Law Enforcement Forum, and SIRIUS. Infiltration of these entities can disrupt ongoing investigations and compromise sensitive intelligence shared among international law enforcement agencies.
However, this is neither the first nor the last activity of IntelBroker. We have compiled for you what happened in the last few days. To track such hacker activities on dark web sources like hacker forums, private Telegram channels, and other hidden platforms where cyber threats often originate, you can check SOCRadar’s Dark Web News.
Stay Informed on Threat Actors’ Activity on the Dark Web with SOCRadar!
First Steps with Globus Compute Multi-User EndpointsGlobus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researchers' workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user Globus Compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Shahin Sheidaei
Games are powerful teaching tools, fostering hands-on engagement and fun. But they require careful consideration to succeed. Join me to explore factors in running and selecting games, ensuring they serve as effective teaching tools. Learn to maintain focus on learning objectives while playing, and how to measure the ROI of gaming in education. Discover strategies for pitching gaming to leadership. This session offers insights, tips, and examples for coaches, team leads, and enterprise leaders seeking to teach from simple to complex concepts.
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns
Unlocking Business Potential: Tailored Technology Solutions by Prosigns
Discover how Prosigns, a leading technology solutions provider, partners with businesses to drive innovation and success. Our presentation showcases our comprehensive range of services, including custom software development, web and mobile app development, AI & ML solutions, blockchain integration, DevOps services, and Microsoft Dynamics 365 support.
Custom Software Development: Prosigns specializes in creating bespoke software solutions that cater to your unique business needs. Our team of experts works closely with you to understand your requirements and deliver tailor-made software that enhances efficiency and drives growth.
Web and Mobile App Development: From responsive websites to intuitive mobile applications, Prosigns develops cutting-edge solutions that engage users and deliver seamless experiences across devices.
AI & ML Solutions: Harnessing the power of Artificial Intelligence and Machine Learning, Prosigns provides smart solutions that automate processes, provide valuable insights, and drive informed decision-making.
Blockchain Integration: Prosigns offers comprehensive blockchain solutions, including development, integration, and consulting services, enabling businesses to leverage blockchain technology for enhanced security, transparency, and efficiency.
DevOps Services: Prosigns' DevOps services streamline development and operations processes, ensuring faster and more reliable software delivery through automation and continuous integration.
Microsoft Dynamics 365 Support: Prosigns provides comprehensive support and maintenance services for Microsoft Dynamics 365, ensuring your system is always up-to-date, secure, and running smoothly.
Learn how our collaborative approach and dedication to excellence help businesses achieve their goals and stay ahead in today's digital landscape. From concept to deployment, Prosigns is your trusted partner for transforming ideas into reality and unlocking the full potential of your business.
Join us on a journey of innovation and growth. Let's partner for success with Prosigns.
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisGlobus
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdfJay Das
With the advent of artificial intelligence (AI) tools, project management processes are undergoing a transformative shift. By using tools like ChatGPT and Bard, organizations can empower their leaders and managers to plan, execute, and monitor projects more effectively.
Based on Accenture Lab Research Paper: http://www.accenture.com/SiteCollectionDocuments/PDF/Accenture-Data-Acceleration-Architecture-Modern-Data-Supply-Chain.pdf