The document provides an overview of NoSQL and big data technologies. It begins by defining big data and the challenges it poses that require new techniques beyond traditional databases. It then discusses the CAP theorem and how NoSQL databases sacrifice consistency or availability to achieve scalability. The document outlines several NoSQL data models, with examples of key-value, columnar, document, and graph databases. It also discusses distributed systems such as BigTable, HBase, and PNUTS. Finally, it shows how graph databases can model relationships directly, compared with the joins required in relational databases.
Cloud computing provides on-demand access to shared computing resources like networks, servers, storage, applications and services over the internet. It offers three main types of services - Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS). IaaS provides basic computing and storage resources, PaaS offers development tools and environments, and SaaS delivers finished software applications to end users. The cloud offers advantages like scalability, pay-as-you-go pricing and reduced costs of ownership.
This document provides an introduction and agenda for a two-day training on Kubernetes. Day one will cover Kubernetes concepts like pods, services, replica sets, deployments and namespaces. It will also include hands-on exercises. Day two will focus on additional concepts like config maps, secrets, auto-scaling and Helm. It will end with further hands-on experience and conclusions.
A Novel Use of Openflow and Its Applications in Connecting Docker and Dummify... - DaoliCloud Ltd
This document proposes a novel use of Openflow and its applications in connecting Docker containers and simplifying cloud deployment. Some key points:
- It aims to divide cloud infrastructure into smaller, independent parts to simplify deployment and scaling while using SDN techniques to reassemble them into a cloud with unlimited scalability.
- Near term applications include a zero-configuration plug-and-play cloud that reduces costs and expedites cloud maturity. Long term it could enable inter-cloud connectivity.
- It analyzes problems with current cloud networking practices and proposes using Openflow to map overlay entities to unique identifiers and route packets without encapsulation, eliminating the need for overlay entities to know each other.
This document provides an overview of SDN and OpenFlow. It discusses the drawbacks of traditional networks and how SDN aims to address these issues by separating the control plane and data plane. It then describes OpenFlow, the key SDN protocol, including its components, message types, secure channel, and how it enables flow-based packet matching and processing through flow tables and action sets. Example L2, L3, and load balancing uses of OpenFlow are also covered.
CERN's IT infrastructure is reaching its limits and needs to expand to support increasing computing capacity demands while maintaining a fixed staff size. CERN is addressing this by expanding its data center capacity through a new remote facility in Budapest, Hungary, and by adopting new open source configuration, monitoring and infrastructure tools to improve efficiency. Key projects include deploying OpenStack for infrastructure as a service, Puppet for configuration management, and integrating monitoring across tools. The transition will take place between 2012-2014 alongside LHC upgrades.
The document discusses OpenNebula, an open-source tool for managing virtual infrastructure in cloud computing. It describes OpenNebula's interoperability and portability features, challenges in these areas, and the community's approach of leveraging standards. Examples are given of collaborations using standards like OCCI and OVF to enable interoperability between OpenNebula and other cloud platforms.
Open nebula leading innovation in cloud computing management - Ignacio M. Llorente
The document discusses OpenNebula, an open-source toolkit for building Infrastructure as a Service (IaaS) clouds. It originated from the RESERVOIR European research project. OpenNebula allows organizations to build private, hybrid, and public clouds to manage their infrastructure resources. It has over 4,000 downloads per month and is used by many organizations and projects to build cloud computing testbeds and ecosystems. The document outlines OpenNebula's innovation model and calls for collaboration to address challenges regarding cloud adoption and key research issues in areas like cloud aggregation, interoperability, and management.
Comparing open source SDN controllers like OpenDaylight, OpenContrail, and ONOS is a challenge; here, we compare them. In a software-defined network (SDN), the SDN controller is the “brains” of the network. It is the strategic control point in the SDN network, relaying information to the switches/routers ‘below’ (via southbound APIs) and the applications and business logic ‘above’ (via northbound APIs).
Innovation in cloud computing architectures with open nebula - Ignacio M. Llorente
This presentation discusses innovation in cloud computing architectures using OpenNebula. It provides an overview of OpenNebula's positioning in the cloud ecosystem as an infrastructure as a service (IaaS) solution. It then covers challenges from different perspectives including users, infrastructure managers, business managers, and system integrators. It discusses designing a cloud infrastructure based on requirements and building a cloud using OpenNebula's features to enable private, public, and hybrid clouds.
Data-Blitz is a processing platform that provides high throughput and availability for organizations that lack the resources to build such infrastructure themselves. It uses modern techniques like those used by LinkedIn, Twitter, and others. Data-Blitz allows building, testing, deploying, and managing big data applications at scale across various infrastructures, with built-in security, monitoring, and DevOps tools.
Slide deck from my "OpenStack and MySQL" presentation at Oracle OpenWorld 2015:
"This session details exactly how MySQL fits in throughout OpenStack, takes a deeper look at the database-as-a-service (DBaaS) offering with OpenStack Trove with MySQL, and discusses how Oracle supports this thriving ecosystem."
Virtualization is one of the hottest trends occurring in the IT industry. We dive into what virtualization is and why you should be thinking about implementing it into your network plan.
ISC Cloud 2013 - Cloud Architectures for HPC – Industry Case Studies - OpenNebula Project
This presentation discusses private cloud architectures for high-performance computing (HPC). It begins by describing the use case of using a private cloud for HPC workloads. It then covers the main challenges of deploying private HPC clouds, including flexible application management, resource management at scale, and ensuring application performance. Several case studies of existing private HPC clouds are presented, including those at FermiCloud, CESGA Cloud, SARA Cloud, SZTAKI Cloud, and KTH Cloud. Finally, trends in private cloud adoption by industry are discussed, such as experimenting with ARM architectures and providing hybrid cloud deployments.
The document introduces the Cisco One Platform Kit (onePK), which provides developers with tools to programmatically access and manipulate network resources. OnePK includes an SDK that standardizes access across different Cisco platforms through a common API. It allows applications to run on network devices or external servers. The onePK architecture provides flexibility in programming languages, device access, and deployment models. Key capabilities enabled include network analytics, automation, and new customized services.
Christian Kniep presented this deck at the 2016 HPC Advisory Council Switzerland Conference.
"With Docker v1.9 a new networking system was introduced, which allows multi-host network- ing to work out-of-the-box in any Docker environment. This talk provides an introduction on what Docker networking provides, followed by a demo that spins up a full SLURM cluster across multiple machines. The demo is based on QNIBTerminal, a Consul backed set of Docker Images to spin up a broad set of software stacks."
Watch the video presentation:
http://wp.me/p3RLHQ-f7G
See more talks in the Swiss Conference Video Gallery:
http://insidehpc.com/2016-swiss-hpc-conference/
Sign up for our insideHPC Newsletter:
http://insidehpc.com/newsletter
Network virtualization logically separates network resources and allows multiple virtual networks to operate over a shared physical infrastructure. It provides benefits like efficient usage of network resources, logical isolation of traffic between users, and accommodating dynamic server virtualization. Key enablers of network virtualization are cloud computing, server virtualization, software-defined networking (SDN), and network functions virtualization (NFV). A virtual tenant network (VTN) uses an underlay physical network and an overlay virtual network to logically isolate traffic for different users or groups. Common uses of network virtualization are in data centers and telecommunication networks.
The document discusses building a private cloud with open source software for a scientific environment. It provides an overview of cloud computing concepts, benefits of private clouds and open source software. It then discusses specific challenges and considerations for implementing a private cloud solution for scientific/research use cases using open source platforms like Eucalyptus, OpenNebula and OpenStack. Key technical aspects around virtualization, storage, networking and components of sample platforms are also summarized.
The document discusses the FlexPod solution from Cisco, NetApp, and Proact. FlexPod provides a converged infrastructure that combines Cisco UCS servers and networking with NetApp storage. It offers scalability, efficiency, and unified management. Proact partners with Cisco and NetApp to provide FlexPod solutions that are validated, secure, and easily managed. FlexPods allow customers to reduce costs and complexity while improving agility over traditional siloed data center architectures.
The document discusses software defined networking (SDN) and OpenFlow, including their history, key concepts, potential uses and challenges. SDN aims to separate the network control and forwarding functions through open standards like OpenFlow. This could make networks more programmable and innovative while reducing costs. However, challenges include limitations of the current standards and ensuring scalability and interoperability across vendors.
Software-Defined Networking (SDN): Unleashing the Power of the Network - Robert Keahey
It goes without saying that cloud computing has dramatically reshaped the information technology services landscape. Virtualization is unleashing the power of commodity-based technology and open source communities are building new applications and services at an astonishing rate, but networking has lagged behind compute and storage in virtualization and automation. We’ve become accustomed to specialized networking silicon, complex operating systems and highly distributed control planes. For the most part, we’ve accepted the model along with its high costs.
All that is changing! New protocols such as OpenFlow are freeing the network control plane from proprietary operating systems and hardware platforms. We are entering a new era where customers control the features – and release schedules – of new, open networking applications that address the needs of the mega-scale world.
A lot of work is required to realize the potential of Software-Defined Networking (SDN), where we can enjoy the benefits derived from “software automating software.” This talk will examine some of the history that led us to the point where current networking architectures are no longer viable for cloud computing at mega-scale. We’ll take a look at the basics of SDN and some of its key elements – OpenFlow, network virtualization, and orchestration – along with some of the initiatives and companies that are setting the stage for the next generation of networking.
Running containers in production, the ING story - Thijs Ebbers
- ING is transforming itself into a digital bank and using containers and microservices as part of its cloud native journey.
- ING has developed its own container hosting platform called ICHP, which runs on OpenShift and provides self-service capabilities for development teams to host applications.
- ICHP aims to provide reliable hosting while minimizing handovers and enabling development teams to focus on delivering value to the business rather than managing infrastructure.
OSCON 2013 - The Hitchhiker’s Guide to Open Source Cloud Computing - Mark Hinkle
And while the Hitchhiker’s Guide to the Galaxy (HHGTTG) is a wholly remarkable book, it doesn’t cover the nuances of cloud computing. Whether you want to build a public, private or hybrid cloud, there are free and open source tools that can provide you a complete solution or help augment your existing Amazon or other hosted cloud solution. That’s why you need the Hitchhiker’s Guide to (Open Source) Cloud Computing (HHGTCC), or at least to attend this talk to understand the current state of open source cloud computing. This talk will cover infrastructure-as-a-service, platform-as-a-service, and developments in big data, and how to more effectively deploy and manage open source flavors of these technologies. Specifically, the guide will cover:
Infrastructure-as-a-Service – The Systems Cloud – Get a comparison of the open source cloud platforms including OpenStack, Apache CloudStack, Eucalyptus and OpenNebula
Platform-as-a-Service – The Developers Cloud – Learn about the tools that abstract the complexity for developers and are used to build portable auto-scaling applications on CloudFoundry, OpenShift, Stackato and more.
Data-as-a-Service – The Analytics Cloud – Want to figure out the who, what, where, when and why of big data? You’ll get an overview of open source NoSQL databases and technologies like MapReduce to help parallelize data mining tasks and crunch massive data sets in the cloud.
Network-as-a-Service – The Network Cloud – The final pillar for truly fungible network infrastructure is network virtualization. We will give an overview of software-defined networking including OpenStack Quantum, Nicira, Open vSwitch, and others.
Finally this talk will provide an overview of the tools that can help you really take advantage of the cloud. Do you want to auto-scale to serve millions of web pages and scale back down as demand fluctuates? Are you interested in automating the total lifecycle of cloud computing environments? You’ll learn how to combine these tools into tool chains to provide continuous deployment systems that will help you become agile and spend more time improving your IT rather than simply maintaining it.
[Finally, for those of you that are Douglas Adams fans please accept the deepest apologies for bad analogies to the HHGTTG.]
Midokura OpenStack Day Korea Talk: MidoNet Open Source Network Virtualization... - Dan Mihai Dumitriu
OpenStack deployments for public or private clouds require overlay networking. Due to the scale and rate of change of virtual resources, it isn't practical to rely on traditional network constructs and isolation mechanisms. Today's deployments require performance, resilience, and high availability to be considered truly production-ready. In this session, we deep dive into the MidoNet architecture and the process of sending a data packet across an OpenStack environment through a network overlay. A distributed architecture implements logical constructs that are used to build networks without a single point of failure, all while adding network functionality in a highly-scalable manner. Network functions are applied in a single virtual hop. By applying network services right at the ingress host, the network is free from unnecessary clogging and bottlenecks by avoiding additional hops. Packets reach their destination more efficiently with the single virtual hop. After this session, the audience will understand how distributed architectures allow efficient networking with routing decisions and network services applied at the edge. Also, the audience will understand how it is easier to scale clouds when the network intelligence is distributed.
This document discusses distributed data stores and NoSQL databases. It begins by explaining how relational databases do not scale well for large web applications. It then discusses various techniques for scaling relational databases like master-slave replication and data partitioning. It introduces NoSQL databases as an alternative for large, unstructured datasets. Key features of NoSQL databases discussed include flexible schemas, eventual consistency, and high availability. Common types of NoSQL databases and some advantages and limitations are also summarized.
This document discusses distributed data stores and NoSQL databases. It begins by explaining how relational databases do not scale well for large web applications. Distributed key-value data stores like BigTable address this issue by allowing massively parallel data storage and retrieval. NoSQL databases relax ACID properties and do not require fixed schemas. The CAP theorem states that distributed systems can only achieve two of three properties: consistency, availability, and partition tolerance. Most NoSQL databases favor availability over strong consistency. Eventual consistency means copies will become consistent over time without updates. NoSQL is suitable for very large datasets but regular databases remain best for typical organizational use cases.
This document discusses polyglot persistence and multi-cloud data management solutions. It begins by noting the huge amounts of data being generated and stored globally, such as the billions of pieces of content shared daily on social media platforms. It then discusses challenges in storing and accessing these massive datasets, which can range from the petabyte to exabyte scale. The document introduces the concept of polyglot persistence, where enterprises use a variety of data storage technologies suited to different types of data, rather than assuming a single relational database. It also discusses using NoSQL databases and deploying databases across multiple cloud platforms.
This video explains the problems that led to the emergence of this type of database,
the kinds of projects in which it can be used,
and gives a brief overview of its history, advantages, and disadvantages.
https://youtu.be/I9zgrdCf0fY
This document discusses NoSQL databases and compares them to relational databases. It provides information on different types of NoSQL databases, including key-value stores, document databases, wide-column stores, and graph databases. The document outlines some use cases for each type and discusses concepts like eventual consistency, CAP theorem, and polyglot persistence. It also covers database architectures like replication and sharding that provide high availability and scalability.
The document discusses big data challenges and solutions. It describes how specialized systems like Hadoop are more efficient than relational databases for large-scale data. It provides examples of open source projects that can be used for tasks like storage, search, streaming data, and batch processing. The document also summarizes the design of the Voldemort distributed key-value store and how it was inspired by Dynamo and Memcached.
This document provides an overview of NoSQL databases, including why they are used, common types, and how they work. The key points are:
1) SQL databases do not scale well for large amounts of distributed data, while NoSQL databases are designed for horizontal scaling across servers and partitions (see the sketch after this list).
2) Common types of NoSQL databases include document, key-value, graph, and wide-column stores, each with different data models and query approaches.
3) NoSQL databases sacrifice consistency guarantees and complex queries for horizontal scalability and high availability. Eventual consistency is common, with different consistency models for different use cases.
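Point 1's horizontal scaling usually starts with deciding which server owns which key. Below is a minimal sketch of consistent hashing, one common partitioning scheme, in Python; the node names, the vnode count, and the MD5-based hash are illustrative choices, not the scheme of any particular database:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps keys to servers so that adding or removing a server
    only remaps a small fraction of the keys."""
    def __init__(self, servers, vnodes=100):
        self.ring = []  # sorted list of (hash, server) points on the ring
        for server in servers:
            for i in range(vnodes):  # virtual nodes smooth the distribution
                bisect.insort(self.ring, (self._hash(f"{server}#{i}"), server))

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def server_for(self, key):
        # Walk clockwise to the first ring point at or after the key's hash.
        i = bisect.bisect(self.ring, (self._hash(key), ""))
        return self.ring[i % len(self.ring)][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
print(ring.server_for("user:42"))  # the same key always lands on the same node
```

Real stores layer replication on top of such a ring, which is where the eventual consistency of point 3 comes in: replicas of a partition may briefly disagree after a write.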
Relational databases vs Non-relational databases - James Serra
There is a lot of confusion about the place and purpose of the many recent non-relational database solutions ("NoSQL databases") compared to the relational database solutions that have been around for so many years. In this presentation I will first clarify what exactly these database solutions are, compare them, and discuss the best use cases for each. I'll discuss topics involving OLTP, scaling, data warehousing, polyglot persistence, and the CAP theorem. We will even touch on a new type of database solution called NewSQL. If you are building a new solution it is important to understand all your options so you take the right path to success.
This deck gives a basic overview of NoSQL technologies, implementation vendors/products, case studies, and some of the core implementation algorithms. The presentation also gives a quick overview of emerging trends like "Polyglot Persistence" and "NewSQL".
The deck is targeted at beginners who want an overview of NoSQL databases.
The document discusses the NoSQL movement and non-relational databases. It provides background on the limitations of relational databases that led to the development of NoSQL databases. Examples of NoSQL databases are described like Voldemort, CouchDB, and Cassandra. Benefits of NoSQL databases include horizontal scaling, high availability, and faster performance.
Data management in cloud study of existing systems and future opportunities - Editor Jacotech
This document discusses data management in cloud computing and provides an overview of existing NoSQL database systems and their advantages over traditional SQL databases. It begins by defining cloud computing and the need for scalable data storage. It then discusses key goals for cloud data management systems including availability, scalability, elasticity and performance. Several popular NoSQL databases are described, including BigTable, MongoDB and Dynamo. The advantages of NoSQL systems like elastic scaling and easier administration are contrasted with some limitations like limited transaction support. The document concludes by discussing opportunities for future research to improve scalability and queries in cloud data management systems.
In this paper we describe NoSQL, a series of non-relational database technologies and products developed to address the current problems RDBMS systems are facing: lack of true scalability, poor performance on high data volumes, and low availability. Some of these products are already in production and perform very well: Amazon’s Dynamo, Google’s Bigtable, Cassandra, etc. We also provide a view of how these systems influence application development in the social and semantic Web sphere.
CouchBase The Complete NoSql Solution for Big Data - Debajani Mohanty
Couchbase is a complete NoSQL database solution for big data. It provides a distributed database that can scale horizontally and uses a document-oriented data model. Under the CAP theorem trade-off, it sacrifices strong consistency to achieve high availability and partition tolerance. Couchbase is used by many large companies for applications that involve large, complex datasets with high user volumes and real-time requirements.
This document provides an overview of topics to be covered in a database management systems course, including parallel and distributed databases, NoSQL databases, and MapReduce. It discusses parallel databases and different architectures for distributed databases. It introduces several NoSQL databases like Amazon SimpleDB, Google BigTable, and HBase and describes their data models and implementations. It also provides details about MapReduce, including its programming model, implementation, optimizations, and statistics on its usage at Google. The next class meetings will include a mid-term exam, student presentations on assigned topics, and a proposal for each student's final project.
This document outlines Oracle's general product direction but does not constitute a commitment and should not be relied upon for purchasing decisions. The development and release of any features described remains at Oracle's sole discretion. It then provides a high-level summary of the history and evolution of Oracle database technology from its beginnings in the 1970s through recent innovations in cloud computing, engineered systems, and managing big data.
The document discusses evolving data warehousing strategies and architecture options for implementing a modern data warehousing environment. It begins by describing traditional data warehouses and their limitations, such as lack of timeliness, flexibility, quality, and findability of data. It then discusses how data warehouses are evolving to be more modern by handling all types and sources of data, providing real-time access and self-service capabilities for users, and utilizing technologies like Hadoop and the cloud. Key aspects of a modern data warehouse architecture include the integration of data lakes, machine learning, streaming data, and offering a variety of deployment options. The document also covers data lake objectives, challenges, and implementation options for storing and analyzing large amounts of diverse data sources.
The document discusses choosing between SQL and NoSQL databases. It covers the evolution of data architectures from traditional client-server models to newer distributed NoSQL solutions. It provides an overview of different data store types like SQL, NoSQL, key-value, document, column family, and graph databases. The document advises picking the right data model based on business needs, use cases, data storage requirements, and growth patterns then evaluating solutions based on pros and cons. It concludes that for large, growing data, both SQL and NoSQL solutions may be needed.
IoT (and M2M and WoT) From the Operators (CSP) perspective - Samuel Dratwa
A short introduction to IoT for telecom operators, providers, and vendors, including the value chain, working examples, and more. It also describes smart cities, smart homes, wearables, etc.
The document provides an introduction to cloud computing. It begins with an overview of the course agenda and then defines cloud computing. It discusses the three main service models of cloud computing: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). The document then provides examples of each service model and their advantages. It also discusses public and private cloud models as well as cloud architecture, including load balancing, data centers, and virtualization. The document concludes with a discussion of the future of cloud computing including Kubernetes and containerization.
The document lists numerous abbreviations and acronyms related to telecommunications. It includes abbreviations for 3rd Generation Partnership Project (3GPP), Average Revenue Per User (ARPU), Evolved Packet Core (EPC), Global System for Mobile communications (GSM), Long Term Evolution (LTE), Mobile Switching Center (MSC), Policy and Charging Rule Function (PCRF), Public Land Mobile Network (PLMN), Session Initiation Protocol (SIP), and Virtual Network Function (VNF) among many others for technologies, standards, functions and components in telecommunications networks.
This document provides an overview of Logtel's activities, including training, consulting, software development, and product training. It covers Logtel's fields of expertise, such as telecom hardware and computer technology, and its place in Israel's hi-tech industry. The document also mentions Logtel's branches, partners, and services related to outsourcing training worldwide.
This document provides an overview of artificial intelligence (AI) and its history. It discusses early definitions of AI from the 1950s and examples of AI like Siri. It also summarizes different approaches to AI like neural networks, natural language processing, and the future of customer relationship management using AI. The document outlines the evolution of AI ideas over time from games to knowledge representation and machine learning. It discusses how concepts can be represented and taught to computers through examples like the concept of a chair. Finally, it briefly touches on functional programming approaches to AI.
The document discusses next generation networks (NGN) and IP Multimedia Subsystem (IMS). NGN aims to converge different access networks onto a single all-IP infrastructure to seamlessly deliver multimedia services. IMS is an architectural framework for delivering IP-based services to users on both fixed and mobile networks. It provides session control functions and enables real-time multimedia services like voice and video over packet networks.
The document discusses various trends in telecommunications including network, service, screen, customer, payment, and vendor convergence. It describes how networks are becoming more integrated using technologies like IP and IMS. Services are converging across devices with fixed-mobile integration and convergence of phones, PCs, and TV. Customers are also converging as business and consumer lines blur. New services enabled by these trends include content delivery networks, location-based services, and community applications.
Web 2.0 contains two different but interconnected topics: User Generated Content and the Long Tail. In this short lecture we elaborate on both topics and on how they influence the internet in general, and even our "traditional life" outside the internet, using the telecom industry as an example.
Social networks and business information - or why you need to be there - Samuel Dratwa
An introductory lecture on social networks and business information.
The lecture briefly surveys (some of) the existing networks, the differences between them, and the advantages and disadvantages of each.
We discuss which network is worth "investing" in, and what that investment means.
We show examples of the information that can be found on these networks, and of additional uses for them.
End-to-end pipeline agility - Berlin Buzzwords 2024 - Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long does it take for all downstream pipelines to be adapted to an upstream change," the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You... - Aggregage
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
We are pleased to share with you the latest VCOSA statistical report on the cotton and yarn industry for the month of May 2024.
Starting from January 2024, the full weekly and monthly reports will only be available for free to VCOSA members. To access the complete weekly report with figures, charts, and detailed analysis of the cotton fiber market in the past week, interested parties are kindly requested to contact VCOSA to subscribe to the newsletter.
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai... - Kaxil Naik
Navigating today's data landscape isn't just about managing workflows; it's about strategically propelling your business forward. Apache Airflow has stood out as the benchmark in this arena, driving data orchestration forward since its early days. As we dive into the complexities of our current data-rich environment, where the sheer volume of information and its timely, accurate processing are crucial for AI and ML applications, the role of Airflow has never been more critical.
In my journey as the Senior Engineering Director and a pivotal member of Apache Airflow's Project Management Committee (PMC), I've witnessed Airflow transform data handling, making agility and insight the norm in an ever-evolving digital space. At Astronomer, our collaboration with leading AI & ML teams worldwide has not only tested but also proven Airflow's mettle in delivering data reliably and efficiently—data that now powers not just insights but core business functions.
This session is a deep dive into the essence of Airflow's success. We'll trace its evolution from a budding project to the backbone of data orchestration it is today, constantly adapting to meet the next wave of data challenges, including those brought on by Generative AI. It's this forward-thinking adaptability that keeps Airflow at the forefront of innovation, ready for whatever comes next.
The ever-growing demands of AI and ML applications have ushered in an era where sophisticated data management isn't a luxury—it's a necessity. Airflow's innate flexibility and scalability are what makes it indispensable in managing the intricate workflows of today, especially those involving Large Language Models (LLMs).
This talk isn't just a rundown of Airflow's features; it's about harnessing these capabilities to turn your data workflows into a strategic asset. Together, we'll explore how Airflow remains at the cutting edge of data orchestration, ensuring your organization is not just keeping pace but setting the pace in a data-driven future.
Session in https://budapestdata.hu/2024/04/kaxil-naik-astronomer-io/ | https://dataml24.sessionize.com/session/667627
The Ipsos - AI - Monitor 2024 Report.pdf - Social Samosa
According to Ipsos AI Monitor's 2024 report, 65% Indians said that products and services using AI have profoundly changed their daily life in the past 3-5 years.
Did you know that drowning is a leading cause of unintentional death among young children? According to recent data, children aged 1-4 years are at the highest risk. Let's raise awareness and take steps to prevent these tragic incidents. Supervision, barriers around pools, and learning CPR can make a difference. Stay safe this summer!
8. Characteristics of Big Data: 1 - Scale (Volume)
• Data volume: a 44x increase from 2009 to 2020, from 0.8 zettabytes to 35 ZB.
• Data volume is increasing exponentially.
(Chart: exponential increase in collected/generated data.)
10. Characteristics of Big Data: 2 - Complexity (Variety)
• Various formats, types, and structures: text, numerical, images, audio, video, sequences, time series, social media data, multi-dimensional arrays, etc.
• Static data vs. streaming data.
• A single application can be generating/collecting many types of data.
12. Characteristics of Big Data: 3 - Speed (Velocity)
• Data is being generated fast and needs to be processed fast: online data analytics.
• Late decisions mean missed opportunities.
• Examples:
• E-Promotions: based on your current location and your purchase history, send promotions right now for the store next to you.
• Healthcare monitoring: sensors monitor your activities and body; any abnormal measurement requires an immediate reaction.
14. Who’s Generating Big Data
• Social media and networks (all of us are generating data)
• Scientific instruments (collecting all sorts of data)
• Mobile devices (tracking all objects all the time)
• Sensor technology and networks (measuring all kinds of data)
• Progress and innovation are no longer hindered by the ability to collect data, but by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable fashion.
15. The Model Has Changed…
• The model of generating/consuming data has changed.
• Old model: a few companies generate data; all others consume it.
• New model: all of us generate data, and all of us consume data.
41. BASE in Cassandra
(Diagram: read path across a Cassandra cluster.) The client sends a query, which is answered with a full result by the closest replica (Replica A). In parallel, digest queries go to the other replicas (Replica B and Replica C), which return digest responses. If the digests differ from the result, a read repair brings the replicas back into agreement, and the result is returned to the client.
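The read path described above can be sketched in a few lines. This is a toy model, not Cassandra's actual implementation: the Replica class, the MD5 digests, and the timestamp-based last-write-wins merge are all simplifications made for illustration:

```python
import hashlib

class Replica:
    """A toy replica: an in-memory key-value store with per-key timestamps."""
    def __init__(self):
        self.data = {}  # key -> (value, timestamp)

    def read(self, key):
        return self.data.get(key)

    def digest(self, key):
        # Cheap stand-in for a real digest: hash of the stored entry.
        return hashlib.md5(repr(self.data.get(key)).encode()).hexdigest()

    def write(self, key, value, ts):
        current = self.data.get(key)
        if current is None or current[1] < ts:  # last write wins
            self.data[key] = (value, ts)

def read_with_repair(replicas, key):
    """Full read from the closest replica, digest reads from the rest;
    repair runs only when the digests disagree."""
    closest, others = replicas[0], replicas[1:]
    result = closest.read(key)
    if any(r.digest(key) != closest.digest(key) for r in others):
        # Read repair: fetch full copies, keep the newest, push it everywhere.
        candidates = [v for v in (r.read(key) for r in replicas) if v]
        if candidates:
            value, ts = max(candidates, key=lambda vt: vt[1])
            for r in replicas:
                r.write(key, value, ts)
            result = (value, ts)
    return result

a, b, c = Replica(), Replica(), Replica()
a.write("k", "new", 2); b.write("k", "new", 2); c.write("k", "old", 1)
print(read_with_repair([a, b, c], "k"))  # ('new', 2); replica C gets repaired
print(c.read("k"))                       # ('new', 2)
```

The point of the digest queries is bandwidth: in the common case where all replicas agree, only one full copy of the data crosses the network.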
90. What’s driving Big Data
• Traditional analytics: ad-hoc querying and reporting; data mining techniques; structured data from typical sources; small to mid-size datasets.
• Big data analytics: optimizations and predictive analytics; complex statistical analysis; all types of data from many sources; very large datasets; more real-time processing.
91. Value of Big Data Analytics
• Big data is more real-time in nature than traditional DW applications.
• Traditional DW architectures (e.g., Exadata, Teradata) are not well-suited for big data apps.
• Shared-nothing, massively parallel processing, scale-out architectures are well-suited for big data apps (see the sketch below).
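As a small illustration of the shared-nothing, scale-out pattern those bullets describe, the sketch below (the data and the partition count are made up for the example) runs independent workers over disjoint partitions and merges their partial results; MPP databases and MapReduce apply the same pattern across machines rather than processes:

```python
from collections import Counter
from multiprocessing import Pool

def count_partition(lines):
    """Runs in its own process and touches only its own partition:
    no shared state, which is the 'shared nothing' part."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

if __name__ == "__main__":
    data = ["big data needs scale out", "scale out means shared nothing"] * 1000
    partitions = [data[i::4] for i in range(4)]  # 4 disjoint partitions
    with Pool(4) as pool:
        partials = pool.map(count_partition, partitions)  # the "map" phase
    total = sum(partials, Counter())                      # the "reduce" phase
    print(total.most_common(3))
```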
94. What is collecting all this data?
• Web browsers: Microsoft’s Internet Explorer, Mozilla’s FireFox (non-profit foundation, used to be Netscape), Google’s Chrome, Apple’s Safari, Time-Warner’s AOL Explorer.
• Search engines: Google’s, Microsoft’s, Yahoo’s, IAC Search’s.
95. What is collecting all this data?
• Smartphones & apps: Apple’s iPhone (Apple O/S); Samsung, HTC, Nokia, Motorola (Android O/S); RIM Corp’s BlackBerry (BlackBerry O/S).
• Tablet computers & apps: Apple’s iPad, Samsung’s Galaxy, Amazon’s Kindle Fire.
96. What is collecting all this data?
• Hospitals & other medical systems: pharmacies, laboratories, imaging centers, emergency medical services (EMS), hospital information systems, doc-in-a-box clinics, electronic medical records, blood banks, birth & death records.
• Banking & phone systems: Can you hear me now? (Heh heh heh!)
97. What is collecting all this data? A real pain in the apps! What are they collecting?
• Restaurant reservations (Open Table)
• Weather in L.A. in 3 days (Weather+)
• Side effects of medications (MedWatcher)
• 3-star hotels in New Orleans (Priceline)
• Which PC should I buy and where (PriceCheck)
98. Big Brother Needs Big Data
In March 2012, the Obama Administration announced the Big Data Research and Development Initiative, $200 million in new R&D investments, which will explore how Big Data could be used to address important problems facing the government. The initiative was composed of 84 different Big Data programs spread across six departments. http://tinyurl.com/85oytkj
The U.S. Federal Government owns six of the ten most powerful supercomputers in the world.
99. How Companies Like Amazon Use Big Data To Make You Love Them
Last month, I talked to Amazon customer service about my malfunctioning Kindle, and it was great. Thirty seconds after putting in a service request on Amazon’s website, my phone rang, and the woman on the other end--let’s call her Barbara--greeted me by name and said, "I understand that you have a problem with your Kindle." We resolved my problem in under two minutes, we got to skip the part where I carefully spell out my last name and address, and she didn’t try to upsell me on anything. After nearly a decade of ordering stuff from Amazon, I never loved the company as much as I did at that moment.
The fact is, Amazon has been collecting my information for years--not just addresses and payment information but the identity of everything I’ve ever bought or even looked at. And while dozens of other companies do that, too, Amazon’s doing something remarkable with theirs. They’re using that data to build our relationship.
Article by Sean Madden, May 2012, an expert in service design and innovation strategy.
100. How Can You Avoid Big Data?
• Pay cash for everything!
• Never go online!
• Don’t use a telephone!
• Don’t use Kroger or Harris Teeter cards!
• Don’t fill any prescriptions!
• Never leave your house!
101. Key concepts of Big Data
• Store everything
• Don’t delete anything
• Schema is a bottleneck (see the sketch after this list)
• Think always in parallel
• Remember the CAP theorem
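"Store everything" plus "schema is a bottleneck" is the schema-on-read idea: append raw records without declaring a schema up front, and impose structure only when a question is asked. A minimal sketch, with a made-up JSON event log and made-up field names:

```python
import json

# Write path: append raw events as-is, with no schema enforced up front.
events = [
    '{"user": "u1", "action": "click", "target": "ad-17"}',
    '{"user": "u2", "action": "purchase", "amount": 9.99}',
]
with open("events.log", "a") as f:
    for event in events:
        f.write(event + "\n")

# Read path: the "schema" (which fields matter) is applied at query time.
def purchases(path):
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            if record.get("action") == "purchase":
                yield record["user"], record.get("amount", 0.0)

print(sum(amount for _, amount in purchases("events.log")))
```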