This document provides an introduction to OpenStack, an open source software platform for building private and public clouds. It describes the key OpenStack components for compute (Nova), storage (Cinder, Glance, Swift), networking (Neutron), and identity (Keystone). It then discusses how organizations like CERN and PayPal use OpenStack to manage large amounts of data and computing resources in a scalable, distributed manner. The document concludes by outlining various ways that individuals can get involved and contribute to the OpenStack community.
Apache Cassandra Lunch #54: Machine Learning with Spark + Cassandra Part 2Anant Corporation
In Apache Cassandra Lunch #54, we will discuss how you can use Apache Spark and Apache Cassandra to perform additional basic Machine Learning tasks.
Accompanying Blog: https://blog.anant.us/apache-cassandra-lunch-54-machine-learning-with-spark--cassandra-part-2/
Accompanying YouTube Video: https://youtu.be/3roCSBWQzRk
Sign Up For Our Newsletter: http://eepurl.com/grdMkn
Join Cassandra Lunch Weekly at 12 PM EST Every Wednesday: https://www.meetup.com/Cassandra-DataStax-DC/events/
Cassandra.Link:
https://cassandra.link/
Follow Us and Reach Us At:
Anant:
https://www.anant.us/
Awesome Cassandra:
https://github.com/Anant/awesome-cassandra
Cassandra.Lunch:
https://github.com/Anant/Cassandra.Lunch
Email:
solutions@anant.us
LinkedIn:
https://www.linkedin.com/company/anant/
Twitter:
https://twitter.com/anantcorp
Eventbrite:
https://www.eventbrite.com/o/anant-1072927283
Facebook:
https://www.facebook.com/AnantCorp/
In Apache Cassandra Lunch #50, we will discuss how you can use Apache Spark and Apache Cassandra to perform basic Machine Learning tasks.
Accompanying Blog: https://blog.anant.us/apache-cassandra-lunch-50-machine-learning-with-spark--cassandra/
Accompanying YouTube video: https://youtu.be/myIX0kkpL9U
Apache Cassandra Lunch #75: Getting Started with DataStax Enterprise on DockerAnant Corporation
In Cassandra Lunch #75, we look at getting started with DataStax Enterprise on Docker.
Accompanying Blog: https://blog.anant.us/getting-started-with-datastax-enterprise-dse-on-docker
Accompanying YouTube: https://youtu.be/o2q5m3YbuUo
Join The Anant Team:
https://www.careers.anant.us
Using Ceph for Large Hadron Collider DataRob Gardner
Talk by Lincoln Bryant (University of Chicago ATLAS team) on using Ceph for ATLAS data analysis @ Ceph Days Chicago http://ceph.com/cephdays/ceph-day-chicago/
Deploying, Backups, and Restore w Datastax + Azure at Albertsons/Safeway (Gur...DataStax
Albertsons/Safeway, America’s second largest supermarket chain, relies on DataStax Enterprise for its online customer-facing application known as “LOYALTY”. With over 6 million users and 1 billion coupon clips per year, Albertsons/Safeway engages its buyers through both its web and mobile apps, but how does the organization ensure backup, restore, and redundancy?
This talk will explore how Albertsons/Safeway uses DataStax Enterprise for disaster avoidance, high availability, and extremely fast reads/writes. We will discuss how to run customized scripts in OpsCenter to ensure all nodes in the cluster are backed up without incurring performance hits and how Apache Cassandra data can be backed up while running on Azure using OS utilities and the system restored seamlessly without impacting app performance.
About the Speaker
Gurpreet Singh Data Services, Albertsons/ Safeway
Gurpreet Singh is a Cassandra Architect responsible for deploying, maintaining, and tuning customer facing applications that manage data, the most valuable asset in the organization.
"Einstürzenden Neudaten: Building an Analytics Engine from Scratch", Tobias J...Dataconomy Media
"Einstürzenden Neudaten: Building an Analytics Engine from Scratch", Tobias Johansson, Lead Developer at Valo.io
Watch more from Data Natives Berlin 2016 here: http://bit.ly/2fE1sEo
Visit the conference website to learn more: www.datanatives.io
Follow Data Natives:
https://www.facebook.com/DataNatives
https://twitter.com/DataNativesConf
Stay Connected to Data Natives by Email: Subscribe to our newsletter to get the news first about Data Natives 2017: http://bit.ly/1WMJAqS
About the Author:
Tobias is technical lead developer for Valo.io in London. He has a background in the financial sector as a front-office developer but changed track in 2013 to be part of a team building a new real-time analytics platform from the ground up. His goal is to outlive the JVM and his tea addiction. This is his first appearance on the conference scene as a speaker.
Distributed Framework for Data Mining As a Service on Private CloudIJERA Editor
Data mining research faces two great challenges: (i) automated mining and (ii) mining of distributed data. Conventional mining techniques are centralized, and the data needs to be accumulated at a central location. A mining tool needs to be installed on the computer before performing data mining. Thus, extra time is incurred in collecting the data. Mining is done by specialized analysts who have access to mining tools. This technique is not optimal when the data is distributed over the network. To perform data mining in a distributed scenario, we need to design a different framework to improve efficiency. Also, the size of accumulated data grows exponentially with time and is difficult to mine using a single computer. Personal computers have limitations in terms of computation capability and storage capacity.
Cloud computing can be exploited for compute-intensive and data-intensive applications. Data mining algorithms are both compute and data intensive, therefore cloud-based tools can provide an infrastructure for distributed data mining. This paper is intended to use cloud computing to support distributed data mining. We propose a cloud-based data mining model which provides mass data storage along with a distributed data mining facility. This paper provides a solution for distributed data mining on the Hadoop framework using an interface to run the algorithm on a specified number of nodes without any user-level configuration. Hadoop is configured over private servers, and clients can process their data through a common framework from anywhere in the private network. Data to be mined can either be chosen from the cloud data server or be uploaded from private computers on the network. It is observed that the framework is helpful in processing large data in less time as compared to a single system.
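As a rough illustration of the distributed mining idea in the abstract above, the following minimal sketch counts frequent items in a map/reduce style. It is not the paper's framework and does not use Hadoop itself: the function names, the sample transactions, and the `min_support` threshold are all hypothetical, and the per-partition map step runs sequentially here where a cluster would run it in parallel on separate nodes.

```python
from collections import Counter
from functools import reduce


def map_partition(transactions):
    """Map step: count item occurrences within one node's local partition."""
    counts = Counter()
    for items in transactions:
        counts.update(set(items))  # count each item at most once per transaction
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def frequent_items(partitions, min_support):
    """Return items whose total count across all partitions meets min_support."""
    # On a real cluster each partition would be processed on a different node.
    partials = [map_partition(p) for p in partitions]
    total = reduce(reduce_counts, partials, Counter())
    return {item: n for item, n in total.items() if n >= min_support}


# Hypothetical data: two partitions, each holding a few transactions.
partitions = [
    [["milk", "bread"], ["milk", "eggs"]],
    [["bread", "milk"], ["eggs"]],
]
result = frequent_items(partitions, min_support=2)
print(result)
```

Only the small partial counts travel between the map and reduce steps, which is why this style of computation suits data that is already spread over several machines.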
New Ceph capabilities and Reference ArchitecturesKamesh Pemmaraju
Have you heard about Inktank Ceph and are interested to learn some tips and tricks for getting started quickly and efficiently with Ceph? Then this is the session for you!
In this two part session you learn details of:
• the very latest enhancements and capabilities delivered in Inktank Ceph Enterprise such as a new erasure coded storage back-end, support for tiering, and the introduction of user quotas.
• best practices, lessons learned and architecture considerations founded in real customer deployments of Dell and Inktank Ceph solutions that will help accelerate your Ceph deployment.
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...DataStax
At Knewton we operate a total of 29 clusters across five different VPCs, each ranging from 3 nodes to 24 nodes. Maintaining this is not herculean for a team of three; however, good tools to diagnose issues and gather information in a distributed manner are vital to moving quickly and minimizing engineering time spent.
The database team at Knewton has been successfully using a combination of Ansible and custom open sourced tools to maintain and improve the Cassandra deployment at Knewton. I will be talking about several of these tools and giving examples of how we are using them. Specifically I will discuss the cassandra-tracing tool, which analyzes the contents of the system_traces keyspace, and the cassandra-stat tool, which gives real-time output of the operations of a cassandra cluster. Distributed administration with ad-hoc Ansible will also be covered and I will walk through examples of using these commands to identify and remediate clusterwide issues.
About the Speaker
Jeffrey Berger Lead Database Engineer, Knewton
Dr. Jeffrey Berger is currently the lead database engineer at Knewton, an education tech startup in NYC. He joined the tech scene in NYC in 2013 and spent two years working with MongoDB, becoming a certified MongoDB administrator and a MongoDB Master. He received his Cassandra Administrator certification at Cassandra Summit 2015. He holds a Ph.D. in Theoretical Physics from Penn State and spent several years working on high energy nuclear interactions.
What is Mesos? How does it work? In the following slides we review this open-source software project for managing computer clusters.
Paolo Lucente, author of the software pmacct.
Traffic matrices can greatly benefit key IP service provider activities such as capacity planning and traffic engineering, and help providers better understand their traffic patterns and make meaningful peering decisions. This talk presents a way to build traffic matrices using telemetry data and BGP, drawing along the way on some technically oriented case studies. pmacct (http://www.pmacct.net) is a commonly used, free, open-source IPv4/IPv6 accounting package which integrates a NetFlow/sFlow and a multi-RIB BGP collector in a single piece of software and is authored by the presenter.
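To make the idea of a traffic matrix concrete, here is a minimal sketch that aggregates flow records into an ingress-to-egress byte matrix. It does not use pmacct or its output format: the record fields (`ingress_pop`, `bgp_next_hop`, `bytes`), the next-hop-to-PoP mapping, and the sample values are all assumptions made for illustration.

```python
from collections import defaultdict


def build_traffic_matrix(flows, next_hop_to_pop):
    """Sum bytes per (ingress PoP, egress PoP) pair.

    The egress PoP is derived from the flow's BGP next hop via a lookup
    table; next hops missing from the table are grouped under "unknown".
    """
    matrix = defaultdict(int)
    for flow in flows:
        egress = next_hop_to_pop.get(flow["bgp_next_hop"], "unknown")
        matrix[(flow["ingress_pop"], egress)] += flow["bytes"]
    return dict(matrix)


# Hypothetical flow records, standing in for NetFlow/sFlow telemetry
# that has been correlated with BGP data.
flows = [
    {"ingress_pop": "AMS", "bgp_next_hop": "10.0.0.2", "bytes": 1500},
    {"ingress_pop": "AMS", "bgp_next_hop": "10.0.0.2", "bytes": 500},
    {"ingress_pop": "LON", "bgp_next_hop": "10.0.0.3", "bytes": 700},
]
next_hop_to_pop = {"10.0.0.2": "FRA", "10.0.0.3": "AMS"}

matrix = build_traffic_matrix(flows, next_hop_to_pop)
print(matrix)
```

Each cell of the resulting matrix is the volume entering at one point of presence and leaving at another, which is the quantity capacity planning and peering decisions are usually based on.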
An Overview of Spanner: Google's Globally Distributed DatabaseBenjamin Bengfort
Spanner is a globally distributed database that provides external consistency between data centers and stores data in a schema based semi-relational data structure. Not only that, Spanner provides a versioned view of the data that allows for instantaneous snapshot isolation across any segment of the data. This versioned isolation allows Spanner to provide globally consistent reads of the database at a particular time allowing for lock-free read-only transactions (and therefore no communications overhead for consensus during these types of reads). Spanner also provides externally consistent reads and writes with a timestamp-based linear execution of transactions and two phase commits. Spanner is the first distributed database that provides global sharding and replication with strong consistency semantics.
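The versioned, lock-free snapshot read described above can be sketched with a small multi-version store. This is a single-process toy under stated assumptions, not Spanner's implementation: the `VersionedStore` class and its methods are hypothetical, timestamps are plain integers supplied by the caller, and replication, consensus, and clock uncertainty are not modeled at all.

```python
import bisect


class VersionedStore:
    """Keeps every committed version of a key, ordered by commit timestamp."""

    def __init__(self):
        # key -> (sorted list of commit timestamps, values in the same order)
        self._versions = {}

    def write(self, key, value, commit_ts):
        """Record a new version of key at commit_ts without overwriting old ones."""
        timestamps, values = self._versions.setdefault(key, ([], []))
        i = bisect.bisect_right(timestamps, commit_ts)
        timestamps.insert(i, commit_ts)
        values.insert(i, value)

    def read_at(self, key, read_ts):
        """Return the latest value committed at or before read_ts, or None."""
        timestamps, values = self._versions.get(key, ([], []))
        i = bisect.bisect_right(timestamps, read_ts)
        return values[i - 1] if i > 0 else None


store = VersionedStore()
store.write("balance", 100, commit_ts=10)
store.write("balance", 80, commit_ts=20)

# A read-only transaction pinned to timestamp 15 sees the older version,
# regardless of writes that commit later.
snapshot_value = store.read_at("balance", read_ts=15)
print(snapshot_value)
```

Because a read at a fixed timestamp never observes later writes, readers in this sketch need no locks; they only need to agree on the timestamp they read at.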
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...DataStax
Since the introduction of SASI in Cassandra 3.4, it is way easier than before to query data. Now you can create performant indices on your columns as well as benefit from full text search capabilities with the introduction of the new `LIKE '%term%'` syntax.
This talk will show the architecture at a high level and expose all the trade-offs so you can choose and use SASI wisely.
We also highlight some use cases where SASI is not a good fit and should be avoided (there is no magic, sorry).
To illustrate the talk, we'll use a sample database of 110 000 albums and artists and create indices on them.
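For readers unfamiliar with the `LIKE` forms mentioned above, the sketch below shows the matching semantics of prefix (`'term%'`), suffix (`'%term'`), and contains (`'%term%'`) patterns. It only illustrates what each pattern means; it says nothing about how SASI stores or searches its index. The `like_match` function and the album titles are hypothetical, and matching here is case-sensitive, which in a real index would depend on how the index is configured.

```python
def like_match(value, pattern):
    """Return True if value matches a simple LIKE pattern.

    Only a leading and/or trailing '%' wildcard is supported:
    'term%' is a prefix match, '%term' a suffix match,
    '%term%' a contains match, and 'term' an exact match.
    """
    leading = pattern.startswith("%")
    trailing = pattern.endswith("%")
    term = pattern.strip("%")
    if leading and trailing:
        return term in value
    if trailing:
        return value.startswith(term)
    if leading:
        return value.endswith(term)
    return value == term


# Hypothetical album titles standing in for the sample database.
albums = ["Blue Train", "Kind of Blue", "Blues for Alice", "Giant Steps"]

contains_blue = [a for a in albums if like_match(a, "%Blue%")]
prefix_blue = [a for a in albums if like_match(a, "Blue%")]
suffix_blue = [a for a in albums if like_match(a, "%Blue")]
print(contains_blue, prefix_blue, suffix_blue)
```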
About the Speaker
DuyHai DOAN Apache Cassandra Evangelist, Datastax
DuyHai DOAN is an Apache Cassandra Evangelist at DataStax. He spends his time between technical presentations/meetups on Cassandra, coding on open source projects like Achilles or Apache Zeppelin to support the community and helping all companies using Cassandra to make their project successful. Previously he was working as a freelance Java/Cassandra consultant.
DataStax and Esri: Geotemporal IoT Search and AnalyticsDataStax Academy
Internet of Things (IoT) data frequently has a location and time component. Getting value out of this "geotemporal" data can be tricky. We'll explore when and how to leverage Cassandra, DSE Search and DSE Analytics to surface meaningful information from your geotemporal data.
Primary and Clustering Keys should be one of the very first things you learn about when modeling Cassandra data. Most people coming from a relational background automatically think, "Yeah, I know what a Primary Key is", and gloss right over it. Because of this, there always seems to be a lot of confusion around the topic of Primary Keys in Cassandra. This presentation will demystify that confusion. I will cover what the different types of Keys are, how they can be used, what their purpose is, and how they affect your queries.
For this presentation, I will be using CrossFit gym locations as my subject matter. I will explain the differences between Primary Keys, Compound Keys, Clustering Keys, & Composite Keys. I will also show how the data behind each type differs as stored on disk. Lastly, I will show what queries each type of key will support.
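The role of the two key parts can be sketched in a few lines: the partition key decides which partition a row lands in, and the clustering key decides the order of rows inside that partition. This is a conceptual model only, not Cassandra's storage engine or the speaker's material; the `store_rows` function, the column names, and the gym rows are hypothetical.

```python
from collections import defaultdict


def store_rows(rows, partition_key, clustering_key):
    """Group rows by partition key and sort each group by clustering key."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)
    for partition in partitions.values():
        partition.sort(key=lambda r: r[clustering_key])
    return dict(partitions)


# Hypothetical gym locations: 'state' as partition key, 'city' as clustering key.
gyms = [
    {"state": "TX", "city": "Dallas", "gym": "Gym A"},
    {"state": "CA", "city": "San Diego", "gym": "Gym B"},
    {"state": "TX", "city": "Austin", "gym": "Gym C"},
]

table = store_rows(gyms, partition_key="state", clustering_key="city")

# A query that names the partition key reads one partition, already sorted.
texas_cities = [row["city"] for row in table["TX"]]
print(texas_cities)
```

In this model a lookup by `state` touches a single group, while a lookup by `gym` alone would have to scan every group, which mirrors why the choice of keys determines which queries are cheap.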
About the Speaker
Adam Hutson Data Architect, DataScale
Adam is Data Architect for DataScale, Inc. He is a seasoned data professional with experience designing & developing large-scale, high-volume database systems. Adam previously spent four years as Senior Data Engineer for Expedia building a distributed Hotel Search using Cassandra 1.1 in AWS. Having worked with Cassandra since version 0.8, he was early to recognize the value Cassandra adds to Enterprise data storage. Adam is also a DataStax Certified Cassandra Developer.
Hadoop is a Java software framework that supports data-intensive distributed applications and is developed under open source license. It enables applications to work with thousands of nodes and petabytes of data.
Using the Open Science Data Cloud for Data Science ResearchRobert Grossman
The Open Science Data Cloud is a petabyte scale science cloud for managing, analyzing, and sharing large datasets. We give an overview of the Open Science Data Cloud and how it can be used for data science research.
HPC and cloud distributed computing, as a journeyPeter Clapham
Introducing an internal cloud brings new paradigms, tools and infrastructure management. When it is placed alongside traditional HPC, the new opportunities are significant. But getting to the new world with micro-services, autoscaling and autodialing is a journey that cannot be achieved in a single step.
Accelerating TensorFlow with RDMA for high-performance deep learningDataWorks Summit
Google’s TensorFlow is one of the most popular deep learning (DL) frameworks. In distributed TensorFlow, gradient updates are a critical step governing the total model training time. These updates incur a massive volume of data transfer over the network.
In this talk, we first present a thorough analysis of the communication patterns in distributed TensorFlow. Then we propose a unified way of achieving high performance through enhancing the gRPC runtime with Remote Direct Memory Access (RDMA) technology on InfiniBand and RoCE. Through our proposed RDMA-gRPC design, TensorFlow only needs to run over the gRPC channel and gets the optimal performance. Our design includes advanced features such as message pipelining, message coalescing, zero-copy transmission, etc. The performance evaluations show that our proposed design can significantly speed up gRPC throughput by up to 1.5x compared to the default gRPC design. By integrating our RDMA-gRPC with TensorFlow, we are able to achieve up to 35% performance improvement for TensorFlow training with CNN models.
Speakers
Dhabaleswar K (DK) Panda, Professor and University Distinguished Scholar, The Ohio State University
Xiaoyi Lu, Research Scientist, The Ohio State University
Cisco: Cassandra adoption on Cisco UCS & OpenStackDataStax Academy
In this talk we will address how we developed our Cassandra environments utilizing the Cisco UCS OpenStack Platform with the DataStax Enterprise Edition software. In addition, we are utilizing open source Ceph storage in our infrastructure to optimize performance and reduce costs.
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...inside-BigData.com
In this deck from the Stanford HPC Conference, DK Panda from Ohio State University presents: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing.
"This talk will provide an overview of challenges in accelerating Hadoop, Spark and Memcached on modern HPC clusters. An overview of RDMA-based designs for Hadoop (HDFS, MapReduce, RPC and HBase), Spark, Memcached, Swift, and Kafka using native RDMA support for InfiniBand and RoCE will be presented. Enhanced designs for these components to exploit NVM-based in-memory technology and parallel file systems (such as Lustre) will also be presented. Benefits of these designs on various cluster configurations using the publicly available RDMA-enabled packages from the OSU HiBD project (http://hibd.cse.ohio-state.edu) will be shown."
Watch the video: https://youtu.be/iLTYkTandEA
Learn more: http://web.cse.ohio-state.edu/~panda.2/
and
http://hpcadvisorycouncil.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
What is OpenStack and the added value of IBM solutionsSasha Lazarevic
OpenStack has become de-facto standard for private cloud implementations. This is presentation of OpenStack basics, with a conclusion that can be valuable to professional services. I recommend the clients to pay attention to IBM's value-added solutions like Cloud Manager and Cloud Orchestrator.
Designing HPC & Deep Learning Middleware for Exascale Systemsinside-BigData.com
DK Panda from Ohio State University presented this deck at the 2017 HPC Advisory Council Stanford Conference.
"This talk will focus on challenges in designing runtime environments for exascale systems with millions of processors and accelerators to support various programming models. We will focus on MPI, PGAS (OpenSHMEM, CAF, UPC and UPC++) and Hybrid MPI+PGAS programming models by taking into account support for multi-core, high-performance networks, accelerators (GPGPUs and Intel MIC), virtualization technologies (KVM, Docker, and Singularity), and energy-awareness. Features and sample performance numbers from the MVAPICH2 libraries will be presented."
Watch the video: http://wp.me/p3RLHQ-glW
Learn more: http://hpcadvisorycouncil.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Building a GPU-enabled OpenStack Cloud for HPC - Blair Bethwaite, Monash Univ...OpenStack
Audience Level
Intermediate
Synopsis
M3 is the latest generation system of the MASSIVE project, an HPC facility specializing in characterization science (imaging and visualization). Using OpenStack as the compute provisioning layer, M3 is a hybrid HPC/cloud system, custom-integrated by Monash’s R@CMon Research Cloud team. Built to support Monash University’s next-gen high-throughput instrument processing requirements, M3 is half-half GPU-accelerated and CPU-only.
We’ll discuss the design and tech used to build this innovative platform as well as detailing approaches and challenges to building GPU-enabled and HPC clouds. We’ll also discuss some of the software and processing pipelines that this system supports and highlight the importance of tuning for these workloads.
Speaker Bio
Blair Bethwaite: Blair has worked in distributed computing at Monash University for 10 years, with OpenStack for half of that. Having served as team lead, architect, administrator, user, researcher, and occasional hacker, Blair’s unique perspective as a science power-user, developer, and system architect has helped guide the evolution of the research computing engine central to Monash’s 21st Century Microscope.
Lance Wilson: Lance is a mechanical engineer, who has been making tools to break things for the last 20 years. His career has moved through a number of engineering subdisciplines from manufacturing to bioengineering. Now he supports the national characterisation research community in Melbourne, Australia using OpenStack to create HPC systems solving problems too large for your laptop.
2. ● Electrical engineering student
● Software engineering intern at Red Hat
● GNOME's Outreach Program for Women participant
● K-12 robotics and engineering mentor
● Interested in free software, education, Python, and robots
3. Let's imagine...
Research organization
Large flow of data (20 TB a day)
Analyze this data
Data processing needs keep increasing
Varied computing requirements
Employees all over the world
How do you handle that much data
and make it available to your
employees/collaborators world wide?
4. What is OpenStack?
Set of software tools for building and
managing clouds.
Manages compute, storage, and
networking resources from a single
point.
Separated into projects which
communicate through public APIs
Scalable
Founded by NASA and Rackspace in
2010
6. Nova (Compute)
➔ Manages and automates pools of computer resources
➔ Used for deploying and managing large numbers of virtual machines and other instances to handle computing tasks
➔ Supports multiple hypervisors (software that creates virtual machines)
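Deploying a virtual machine through Nova comes down to an API request that names an image and a flavor. The following is a minimal sketch, not a full client: it only builds the JSON body that the Compute API expects for a "create server" call (`POST /servers`) and sends nothing. The function name and the `IMAGE_ID`, `FLAVOR_ID`, and `NETWORK_ID` values are placeholders chosen for illustration.

```python
import json


def build_server_request(name, image_id, flavor_id, network_id):
    """Return the JSON body for a Compute API 'create server' request.

    All IDs are placeholders here; in a real cloud they come from
    Glance (image), Nova (flavor), and Neutron (network).
    """
    return {
        "server": {
            "name": name,
            "imageRef": image_id,
            "flavorRef": flavor_id,
            "networks": [{"uuid": network_id}],
        }
    }


body = build_server_request(
    "analysis-worker-1", "IMAGE_ID", "FLAVOR_ID", "NETWORK_ID"
)
print(json.dumps(body, indent=2))
```

In practice this body would be sent with an authentication token obtained from Keystone, and the hypervisor that actually runs the instance is chosen by the cloud, not by the caller.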
7. Cinder (Block Storage)
● Provides block storage (files are broken into small data blocks so the disk can store them) to guest virtual machines.
● The block storage system manages the creation, attaching, and detaching of block devices to servers.
● Data access speed is the most important consideration.
● Create/delete/list volumes and snapshots.
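The create/attach/detach cycle described above can be pictured as a small state machine. The class below is a toy model written for this transcript, not the Cinder API: the `Volume` class and its methods are hypothetical, and only the status strings `available` and `in-use` mirror the names Cinder reports for volumes.

```python
class Volume:
    """Toy stand-in for a block storage volume (not the Cinder API)."""

    def __init__(self, name, size_gb):
        self.name = name
        self.size_gb = size_gb
        self.status = "available"
        self.attached_to = None

    def attach(self, server):
        # A volume can only be attached while it is free.
        if self.status != "available":
            raise RuntimeError(f"{self.name} is {self.status}; cannot attach")
        self.status = "in-use"
        self.attached_to = server

    def detach(self):
        # Detaching returns the volume to the pool without deleting data.
        if self.status != "in-use":
            raise RuntimeError(f"{self.name} is {self.status}; cannot detach")
        self.status = "available"
        self.attached_to = None

    def snapshot(self, snapshot_name):
        # A snapshot records a point-in-time copy; here it is only metadata.
        return {"name": snapshot_name, "volume": self.name, "size_gb": self.size_gb}


vol = Volume("results-disk", size_gb=100)
vol.attach("analysis-worker-1")
print(vol.status, vol.attached_to)
vol.detach()
print(vol.status, vol.snapshot("before-upgrade"))
```

The point of the model is that the volume outlives any single server: it can be detached from one instance and attached to another without losing its contents.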
8. Glance (Image service)
➔ Manages "images" (virtual copies) of hard disks and uses them as templates when deploying new virtual machine instances.
➔ Configure, create, delete, and control access to images
➔ Store and catalog an unlimited number of backups
9. Swift (Object Storage)
● Distributed storage to prevent
any single point of failure
● API-accessible
● Containers
● Store VM images, backups, and archives
as well as smaller files, such as photos
and email messages
● Objects are referred to by a unique identifier
instead of a location
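To make "identifier instead of location" concrete, here is a minimal sketch under stated assumptions. Swift addresses every object by an account/container/object path under `/v1/`, and it decides where the data lives by hashing that path. The `partition_for` function below is a simplification written for illustration: real Swift also mixes a cluster-specific secret into the hash and maps partitions to devices through its ring. The endpoint, account, and container names are made up.

```python
import hashlib
from urllib.parse import quote


def object_url(endpoint, account, container, obj):
    """Build the API URL that identifies an object, independent of any disk."""
    return f"{endpoint.rstrip('/')}/v1/{quote(account)}/{quote(container)}/{quote(obj)}"


def partition_for(account, container, obj, partition_power=8):
    """Simplified placement: hash the object path and keep the top bits.

    With partition_power=8 there are 2**8 = 256 partitions.
    """
    path = f"/{account}/{container}/{obj}".encode("utf-8")
    digest = hashlib.md5(path).digest()
    top32 = int.from_bytes(digest[:4], "big")
    return top32 >> (32 - partition_power)


url = object_url("https://swift.example.org/", "AUTH_demo", "photos", "2014/cat.jpg")
print(url)
print(partition_for("AUTH_demo", "photos", "2014/cat.jpg"))
```

Because placement is derived from the name alone, any node can work out where an object belongs without a central lookup table, which is part of how the design avoids a single point of failure.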
10. Neutron (Networking)
System for managing networks and IP addresses
Provides networking models for different applications or
user groups
Relieves stress on the network in cloud environments
Floating IP addresses allow traffic to be dynamically
rerouted to any of your compute resources
Extension framework that allows additional services such as intrusion
detection systems (IDS), load balancing, firewalls, and virtual private
networks (VPN) to be deployed
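The floating IP idea above can be sketched as a lookup table from a public address to whichever instance currently serves it. This is a toy model for illustration only, not the Neutron API; the class name and the addresses (taken from documentation and private ranges) are assumptions.

```python
class FloatingIPTable:
    """Toy model of floating IPs: public address -> current private address."""

    def __init__(self):
        self._targets = {}

    def associate(self, floating_ip, fixed_ip):
        # Re-associating an existing floating IP silently moves the traffic.
        self._targets[floating_ip] = fixed_ip

    def disassociate(self, floating_ip):
        self._targets.pop(floating_ip, None)

    def route(self, floating_ip):
        """Return the private address traffic is sent to, or None."""
        return self._targets.get(floating_ip)


table = FloatingIPTable()
table.associate("203.0.113.10", "10.0.0.5")   # first instance serves the site
print(table.route("203.0.113.10"))
table.associate("203.0.113.10", "10.0.0.6")   # reroute to a replacement instance
print(table.route("203.0.113.10"))
```

Clients keep using the same public address throughout; only the mapping behind it changes.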
11. Keystone (Identity)
➔ Central list of all of the users of the
OpenStack cloud, mapped against all of
the services provided by the cloud which
they have permission to use.
➔ Authentication & means of access
➔ Users, tenants, and roles
Admin powerz!
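As a rough illustration of how users, tenants, and roles come together, the sketch below builds the JSON body for a password login against the Identity API v3 (`POST /v3/auth/tokens`). Note that v3 calls tenants "projects". Nothing is sent over the network; the function name, user name, password, and project name are placeholders, and the domain is assumed to be `default`.

```python
import json


def build_password_auth(username, password, project, domain_id="default"):
    """Return the request body for a project-scoped password login."""
    return {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": username,
                        "domain": {"id": domain_id},
                        "password": password,
                    }
                },
            },
            # The scope selects the project (tenant) the token is valid for;
            # the user's roles on that project decide what the token may do.
            "scope": {
                "project": {"name": project, "domain": {"id": domain_id}}
            },
        }
    }


auth_body = build_password_auth("alice", "s3cret", "physics-analysis")
print(json.dumps(auth_body, indent=2))
```

On success Keystone returns the token in the `X-Subject-Token` response header, and the other services accept that token as proof of identity.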
12. Other projects
➔ Ceilometer (Telemetry)
– Provide efficient collection of metering
data, in terms of CPU and network costs
– Monitoring notifications sent from
existing services or by polling the
infrastructure
– Single Point Of Contact for billing
systems
➔ Trove (Database Service)
– Trove is a database-as-a-service provisioning relational and non-relational database engines
13. Other projects
➔ Sahara (Data processing)
– Simple means to provision a
Hadoop cluster on OpenStack
– Analyze big data
➔ Heat (Orchestration)
– Heat is a service to orchestrate
multiple composite cloud
applications using templates
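A Heat template is a declarative description of the resources an application needs. The sketch below assembles a very small template as a Python dictionary and prints it. It assumes the HOT format with the `2013-05-23` template version and a single `OS::Nova::Server` resource; the resource name `worker` and the image and flavor names are placeholders. HOT templates are normally written in YAML; JSON is used here only because it ships with Python's standard library, and JSON is itself valid YAML.

```python
import json


def single_server_template(image, flavor):
    """Return a minimal Heat template describing one server."""
    return {
        "heat_template_version": "2013-05-23",
        "description": "One compute instance (illustrative only)",
        "resources": {
            "worker": {
                "type": "OS::Nova::Server",
                "properties": {"image": image, "flavor": flavor},
            }
        },
    }


template = single_server_template("scientific-linux-6", "m1.small")
print(json.dumps(template, indent=2))
```

Heat reads a template like this and makes the necessary calls to Nova and the other services, so the same file can recreate an environment repeatedly.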
17. How is OpenStack used?
● Web
● Ecommerce
● Research
● IT
● Healthcare
● Cloud hosting
● Film / Media / Gaming
● Yahoo
● Wikimedia
● US Department of Energy
● Universities
● Rackspace
18. So..
Research organization
Large flow of data (20 TB a day)
Analyze this data
Data processing needs keep increasing
Varied computing requirements
Employees all over the world
How do you handle that much data
and make it available to your
employees world wide?
19. CERN
● Home of the Large Hadron Collider (27 km circumference, 100 meters underground)
● 10,000+ scientists
● Creates particle collisions (10 million collisions every second)
● Multistorey-high digital cameras record thousands of images per second of debris from the LHC particle collisions
● 35 PB/year
● Physicists all around the world analyze the data
20. CERN
● 10,000 servers, 75,000 disks, and 100 petabytes of data stored in mass-storage systems
● 200 Gbps network
● Has reached its power and cooling capacity
21. CERN
Data is saved to tape storage and
then distributed to the more than
150 sites worldwide that comprise
the Worldwide LHC (Large Hadron
Collider) Computing Grid (WLCG)
for analysis
22. CERN
● The CERN Worldwide LHC Computing Grid
(WLCG) is massive!
● Data is sent to its 12 Tier 1 sites around the
world
● 140+ Tier 2 sites used for creation of
simulated data and analysis. In all, CERN
mass-storage systems store more than 100
PB of data.
(1 petabyte = 1,000 terabytes)
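To put the data volumes quoted in these slides on a common scale, the following back-of-the-envelope sketch converts between them using the decimal units stated above (1 PB = 1,000 TB, and likewise 1 TB = 1,000 GB). The function names are made up for illustration, and the rates are simple averages over a 365-day year, not measured throughput.

```python
TB_PER_PB = 1_000
GB_PER_TB = 1_000
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def tb_per_day_to_pb_per_year(tb_per_day):
    """Convert a daily volume in TB to a yearly volume in PB."""
    return tb_per_day * 365 / TB_PER_PB


def pb_per_year_to_gb_per_second(pb_per_year):
    """Convert a yearly volume in PB to an average rate in GB/s."""
    gb_per_year = pb_per_year * TB_PER_PB * GB_PER_TB
    return gb_per_year / SECONDS_PER_YEAR


# 20 TB a day (the imagined research organization) is 7.3 PB a year.
print(tb_per_day_to_pb_per_year(20))
# 35 PB a year (CERN) averages out to roughly 1.11 GB every second.
print(round(pb_per_year_to_gb_per_second(35), 2))
```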
23. CERN
● Private cloud
● 65,000 cores across
multiple data
centers, supporting
multiple users
● ~1300 compute
nodes
● Configured with
Puppet
● Runs on Nova, Glance, Keystone, Neutron, Ceilometer, and Horizon
● Ceph as storage for Glance, volumes, and other physics storage (3 PB)
● Scientific Linux
24. CERN
● Dynamically allocates network addresses
and registers them with the network
management system.
● Implemented extensions for creating
DNS entries, Kerberos, and X.509
certificates
● Self-service
25. PayPal
● Processed $26,000+ in mobile payments
every minute in 2012.
● OpenStack runs virtual machines to support
self-service for developers
● OpenStack APIs have accelerated the
company’s development process by making
applications easier to implement and having
one consistent standard across their
environment.
26. Contributing!
● Different ways to contribute
– File bugs
– Translations
– Security
– Documentation
– UX
– Web maintenance
● Contributing Code
– Reviewing
– Bug fixes
– Housekeeping
● comments in code,
reducing pylint
violations, increasing
code coverage
– Development
● Blueprints
27. Contributing!
● Launchpad account
– Add an SSH key
● Join the OpenStack Foundation
– Vote in elections
– Run for elected positions
● Sign Contributor License Agreement
– Individual
– Government
– Corporate
31. Where can I learn more?
➔ OpenStack Community Newsletter
➔ opensource.com (/tags/openstack)
➔ OpenStack Summit videos
➔ Mailing lists, IRC, ask.openstack.org
➔ openstack.org/user-stories/
➔ User groups
More at cpallar.es/openstack101