There is a lot of confusion about the place and purpose of the many recent non-relational database solutions ("NoSQL databases") compared to the relational database solutions that have been around for so many years. In this presentation I will first clarify what exactly these database solutions are, compare them, and discuss the best use cases for each. I'll discuss topics involving OLTP, scaling, data warehousing, polyglot persistence, and the CAP theorem. We will even touch on a new type of database solution called NewSQL. If you are building a new solution it is important to understand all your options so you take the right path to success.
This presentation is about NoSQL, which means "Not Only SQL." It covers the aspects of using NoSQL for Big Data and the differences from an RDBMS.
A comparison between Relational and Non relational Databases.
The full article is available at this link: https://towardsdatascience.com/relational-vs-non-relational-databases-f2ac792482e3
This presentation is all about the difference between SQL and NoSQL databases, because the question of on what parameters we can differentiate these two databases comes up for nearly everyone.
After viewing this presentation, your doubts and confusion about SQL versus NoSQL should be cleared up.
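To make the SQL-vs-NoSQL contrast above concrete, here is a minimal sketch (using Python's built-in sqlite3 as a stand-in relational engine, and plain dicts as stand-in documents; the table and field names are invented for illustration):

```python
import sqlite3

# Relational model: a fixed schema, rows, and a declarative SQL query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Alice', 'Berlin')")
conn.execute("INSERT INTO users VALUES (2, 'Bob', 'Paris')")
rows = conn.execute("SELECT name FROM users WHERE city = 'Berlin'").fetchall()
print(rows)  # [('Alice',)]

# Document model: schema-free JSON-like documents; nested data lives
# inside the record instead of being normalized into separate tables.
users = [
    {"_id": 1, "name": "Alice", "city": "Berlin", "orders": [{"sku": "A1", "qty": 2}]},
    {"_id": 2, "name": "Bob", "city": "Paris"},  # no 'orders' field required
]
berlin_users = [u["name"] for u in users if u.get("city") == "Berlin"]
print(berlin_users)  # ['Alice']
```

The relational side enforces a schema up front; the document side lets each record carry whatever fields it needs, which is one of the trade-offs the presentation explores.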
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ... - Simplilearn
This presentation about Apache Spark covers all the basics that a beginner needs to know to get started with Spark. It covers the history of Apache Spark, what Spark is, and the difference between Hadoop and Spark. You will learn about the different components in Spark and how Spark works, with the help of its architecture. You will understand the different cluster managers on which Spark can run. Finally, you will see the various applications of Spark and a use case on Conviva. Now, let's get started with what Apache Spark is.
Below topics are explained in this Spark presentation:
1. History of Spark
2. What is Spark
3. Hadoop vs Spark
4. Components of Apache Spark
5. Spark architecture
6. Applications of Spark
7. Spark use case
What is this Big Data Hadoop training course about?
The Big Data Hadoop and Spark developer course has been designed to impart in-depth knowledge of Big Data processing using Hadoop and Spark. The course is packed with real-life projects and case studies to be executed in the CloudLab.
What are the course objectives?
Simplilearn’s Apache Spark and Scala certification training is designed to:
1. Advance your expertise in the Big Data Hadoop Ecosystem
2. Help you master essential Apache Spark skills, such as Spark Streaming, Spark SQL, machine learning programming, GraphX programming and Spark shell scripting
3. Help you land a Hadoop developer job requiring Apache Spark expertise by giving you a real-life industry project coupled with 30 demos
What skills will you learn?
By completing this Apache Spark and Scala course you will be able to:
1. Understand the limitations of MapReduce and the role of Spark in overcoming these limitations
2. Understand the fundamentals of the Scala programming language and its features
3. Explain and master the process of installing Spark as a standalone cluster
4. Develop expertise in using Resilient Distributed Datasets (RDD) for creating applications in Spark
5. Master Structured Query Language (SQL) using SparkSQL
6. Gain a thorough understanding of Spark streaming features
7. Master and describe the features of Spark ML programming and GraphX programming
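The RDD concepts listed above (chained transformations that run lazily until an action is called) can be illustrated with a toy, plain-Python stand-in. This is not PySpark; the ToyRDD class and its data are invented purely to mimic the shape of Spark's map/filter/collect/reduce API:

```python
from functools import reduce

# A toy in-memory stand-in for a Spark RDD: transformations (map, filter)
# are recorded lazily and only executed when an action (collect, reduce)
# is called, mirroring Spark's lazy evaluation model.
class ToyRDD:
    def __init__(self, data, ops=None):
        self._data = data
        self._ops = ops or []

    def map(self, f):                      # transformation: recorded, not run
        return ToyRDD(self._data, self._ops + [("map", f)])

    def filter(self, f):                   # transformation: recorded, not run
        return ToyRDD(self._data, self._ops + [("filter", f)])

    def collect(self):                     # action: materializes the pipeline
        out = self._data
        for kind, f in self._ops:
            out = [f(x) for x in out] if kind == "map" else [x for x in out if f(x)]
        return out

    def reduce(self, f):                   # action: fold the collected results
        return reduce(f, self.collect())

lines = ToyRDD(["spark is fast", "hadoop and spark"])
words = lines.map(str.split).collect()
total = ToyRDD([len(ws) for ws in words]).reduce(lambda a, b: a + b)
print(total)  # 6
```

In real Spark the same chaining style applies, but the data is partitioned across a cluster and the lazy plan lets the engine optimize before executing.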
Who should take this Scala course?
1. Professionals aspiring for a career in the field of real-time big data analytics
2. Analytics professionals
3. Research professionals
4. IT developers and testers
5. Data scientists
6. BI and reporting professionals
7. Students who wish to gain a thorough understanding of Apache Spark
Learn more at https://www.simplilearn.com/big-data-and-analytics/apache-spark-scala-certification-training
In this presentation, Raghavendra BM of Valuebound has discussed the basics of MongoDB - an open-source document database and leading NoSQL database.
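MongoDB's signature idea is query-by-example over JSON-like documents. The following is a hedged, plain-Python sketch of that idea only; it is not the pymongo driver API, and the collection and field names are invented:

```python
# A minimal sketch of MongoDB's query-by-example style in plain Python.
# A "collection" is just a list of dicts, and find() returns documents
# whose fields equal every key/value pair in the query document.
movies = [
    {"title": "Alien", "year": 1979, "genres": ["horror", "sci-fi"]},
    {"title": "Up", "year": 2009, "genres": ["animation"]},
    {"title": "Arrival", "year": 2016, "genres": ["sci-fi"]},
]

def find(collection, query):
    """Return documents matching every key/value in `query` (equality only)."""
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in query.items())]

print([d["title"] for d in find(movies, {"year": 2016})])  # ['Arrival']
print(len(find(movies, {})))  # empty query matches everything: 3
```

Real MongoDB adds indexes, richer operators ($gt, $in, etc.), and aggregation pipelines on top of this basic matching model.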
----------------------------------------------------------
Get Socialistic
Our website: http://valuebound.com/
LinkedIn: http://bit.ly/2eKgdux
Facebook: https://www.facebook.com/valuebound/
Twitter: http://bit.ly/2gFPTi8
Designing and Building a Graph Database Application – Architectural Choices, ... - Neo4j
Ian closely looks at design and implementation strategies you can employ when building a Neo4j-based graph database solution, including architectural choices, data modelling, and testing.
This tutorial will provide you with a basic understanding of graph database technology and the ability to quickly begin development of a graph database application. You will have the capability to recognize graph-based problems and present the benefits of using graph technology for problem resolution.
The tutorial will give you an understanding of:
• Graph theory - origins and concepts
• Benefits of graph databases
• Different types of graph databases
• Typical graph database API
• Programming basics
• Use cases
Bring your laptops for a hands-on opportunity to practice with some sample code. A basic understanding of Java programming is a recommended prerequisite for this course. This session is led by the InfiniteGraph technical team and the demonstration code will be drawn from InfiniteGraph examples; however, the broader educational presentation is product-neutral and not a commercial pitch for their products.
To participate in the hands-on portion of the graph tutorial users must have:
• Java programming experience
• Java Developer Kit (JDK)
• Current InfiniteGraph installed on laptop. (To download visit www.objectivity.com/infinitegraph)
• HelloGraph test – Upon installing IG, run HelloGraph to test the install. (HelloGraph can be found online at http://wiki.infinitegraph.com/2.1/w/index.php?title=Download_Sample_Code)
Leon Guzenda was one of the founding members of Objectivity in 1988 and one of the original architects of Objectivity/DB. He currently works with Objectivity's major customers to help them effectively develop and deploy complex applications and systems that use the industry's highest-performing, most reliable DBMS technology, Objectivity/DB. He also liaises with technology partners and industry groups to help ensure that Objectivity/DB remains at the forefront of database and distributed computing technology. Leon has more than 35 years of experience in the software industry. At Automation Technology Products, he managed the development of the ODBMS for the Cimplex solid modeling and numerical control system. Before that, he was Principal Project Director for International Computers Ltd. in the United Kingdom, delivering major projects for NATO and leading multinationals. He was also design and development manager for ICL's 2900 IDMS product. He spent the first seven years of his career working in defense and government systems. Leon has a B.S. degree in Electronic Engineering from the University of Wales.
Graph Database Management Systems provide an effective and efficient solution to data storage in current scenarios where data are increasingly connected, graph models are widely used, and systems need to scale to large data sets. In this framework, converting the persistent layer of an application from a relational to a graph data store can be convenient, but it is usually a hard task for database administrators. In this paper we propose a methodology to convert a relational database to a graph database by exploiting the schema and the constraints of the source. The approach supports the translation of conjunctive SQL queries over the source into graph traversal operations over the target. We provide experimental results that show the feasibility of our solution and the efficiency of query answering over the target database.
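The core idea in the abstract above, that foreign-key relationships become edges and SQL joins become traversals, can be sketched in a few lines. This is a hedged toy illustration; the table contents and names are invented, not taken from the paper:

```python
# Relational source: a "people" table and a "follows" join table whose
# rows are (follower_id, followed_id) foreign-key pairs.
people = [(1, "Ann"), (2, "Ben")]
follows = [(1, 2), (2, 1)]

# Conversion: each row becomes a node, each foreign-key pair a directed edge.
names = dict(people)
graph = {pid: [] for pid, _ in people}
for src, dst in follows:
    graph[src].append(dst)

# A conjunctive SQL query such as
#   SELECT p2.name FROM people p1
#   JOIN follows f ON f.follower_id = p1.id
#   JOIN people p2 ON p2.id = f.followed_id
#   WHERE p1.id = 1
# becomes a one-hop traversal from the starting node:
followed_by_ann = [names[n] for n in graph[1]]
print(followed_by_ann)  # ['Ben']
```

The join disappears because the relationship is materialized as an edge; that is the performance argument graph databases make for highly connected data.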
Recommendation and personalization systems are an important part of many modern websites. Graphs provide a natural way to represent the behavioral data that is the core input to many recommendation algorithms. Thomas Pinckney and his colleagues at Hunch (recently acquired by eBay) built a large scale recommendation system, and then ported the technology to eBay. Thomas will be discussing how his team uses Cassandra to provide the high I/O storage of their fifty billion edge graphs and how they generate new recommendations in real time as users click around the site.
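In the spirit of the talk above, a graph-backed recommender can be reduced to a tiny, hedged sketch: find users who share liked items with the target user, then suggest what they like that the target does not. The real system runs over a fifty-billion-edge graph on Cassandra; the data here is invented:

```python
# Toy user->liked-items graph (adjacency sets).
likes = {
    "u1": {"hat", "scarf"},
    "u2": {"hat", "gloves"},
    "u3": {"boots"},
}

def recommend(user):
    """Suggest items liked by users who overlap with `user`, ranked by count."""
    mine = likes[user]
    scores = {}
    for other, theirs in likes.items():
        if other != user and mine & theirs:      # shares at least one like
            for item in theirs - mine:           # items the target lacks
                scores[item] = scores.get(item, 0) + 1
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("u1"))  # ['gloves']
```

Scaling this up is mostly a storage and I/O problem, which is where a wide-column store like Cassandra enters the architecture.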
Semantic Graph Databases: The Evolution of Relational Databases - Cambridge Semantics
In this webinar, Barry Zane, our Vice President of Engineering, discusses the evolution of databases from Relational to Semantic Graph and the Anzo Graph Query Engine, the key element of scale in the Anzo Smart Data Lake. Based on elastic clustered, in-memory computing, the Anzo Graph Query Engine offers interactive ad hoc query and analytics on datasets with billions of triples. With this powerful layer over their data, end users can effect powerful analytic workflows in a self-service manner.
The trend nowadays is to represent the relationships between entities in a graph structure. Neo4j is a NoSQL graph database, which allows for fast and effective queries on connected data. Implementing your own algorithms is possible, which can improve the functionality of the built-in API. We make use of the graph database to model and recommend movies and other media content.
Ready to leverage the power of a graph database to bring your application to the next level, but all the data is still stuck in a legacy relational database?
Fortunately, Neo4j offers several ways to quickly and efficiently import relational data into a suitable graph model. It's as simple as exporting the subset of the data you want to import and ingesting it, either with an initial loader in seconds or minutes, or by applying the power of Cypher to put your relational data transactionally into the right places of your graph model.
In this webinar, Michael will also demonstrate a simple tool that can load relational data directly into Neo4j, automatically transforming it into a graph representation of your normalized entity-relationship model.
Neo4j is a powerful and expressive tool for storing, querying and manipulating data. However, modeling data as graphs is quite different from modeling data in a relational database. In this talk, Michael Hunger will cover modeling business domains using graphs and show how they can be persisted and queried in Neo4j. We'll contrast this approach with the relational model, and discuss the impact on complexity, flexibility and performance.
Transform your DBMS to drive engagement innovation with Big Data - Ashnikbiz
Erik Baardse and Ajit Gadge from EDB Postgres presented on how to transform your DBMS in order to drive digital business, and how Postgres enables you to support a wider range of workloads with your relational database, which opens the Big Data doors. They also cover EnterpriseDB’s strategy around Big Data, which focuses on three areas, and finally, last but not least, how to find money in IT with Big Data and digital transformation.
Embarking on building a modern data warehouse in the cloud can be an overwhelming experience due to the sheer number of products that can be used, especially when the use cases for many products overlap others. In this talk I will cover the use cases of many of the Microsoft products that you can use when building a modern data warehouse, broken down into four areas: ingest, store, prep, and model & serve. It’s a complicated story that I will try to simplify, giving blunt opinions of when to use what products and the pros/cons of each.
Choosing technologies for a big data solution in the cloud - James Serra
Has your company been building data warehouses for years using SQL Server? And are you now tasked with creating or moving your data warehouse to the cloud and modernizing it to support “Big Data”? What technologies and tools should you use? That is what this presentation will help you answer. First we will cover what questions to ask concerning data (type, size, frequency), reporting, performance needs, on-prem vs cloud, staff technology skills, OSS requirements, cost, and MDM needs. Then we will show you common big data architecture solutions and help you to answer questions such as: Where do I store the data? Should I use a data lake? Do I still need a cube? What about Hadoop/NoSQL? Do I need the power of MPP? Should I build a "logical data warehouse"? What is this lambda architecture? Can I use Hadoop for my DW? Finally, we’ll show some architectures of real-world customer big data solutions. Come to this session to get started down the path to making the proper technology choices in moving to the cloud.
NativeX (formerly W3i) recently transitioned a large portion of their backend infrastructure from MS SQL Server to Apache Cassandra. Today, its Cassandra cluster backs its mobile advertising network supporting over 10 million daily active users producing over 10,000 transactions per second with an average database request latency of under 2 milliseconds. Going from relational to NoSQL required NativeX's engineers to re-train, re-tool and re-think the way they architect applications and infrastructure. Learn why Cassandra was selected as a replacement, what challenges were encountered along the way, and what architecture and infrastructure were involved in the implementation.
Data Lakehouse, Data Mesh, and Data Fabric (r2) - James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a modern data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. They all may sound great in theory, but I'll dig into the concerns you need to be aware of before taking the plunge. I’ll also include use cases so you can see what approach will work best for your big data needs. And I'll discuss Microsoft's version of the data mesh.
This deck gives a basic overview of NoSQL technologies, implementation vendors/products, case studies, and some of the core implementation algorithms. The presentation also gives a quick overview of emerging trends such as "Polyglot Persistence" and "NewSQL".
The deck is targeted at beginners who want to get an overview of NoSQL databases.
Data Lakehouse, Data Mesh, and Data Fabric (r1) - James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. I’ll include use cases so you can see what approach will work best for your big data needs.
Why do we need database awareness?
Document vs Relational
Row-based vs Column-based
In-memory Database vs In-memory Data grids
Graph
Time-series
Solr vs ElasticSearch
Event Store
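The row-based vs column-based topic in the list above comes down to physical layout. Here is a hedged toy illustration of the same table stored both ways (the data and field names are invented); a column layout turns a single-column aggregate into one contiguous scan:

```python
# Row-oriented layout: each record is stored together (OLTP-friendly).
rows = [
    {"id": 1, "amount": 10.0, "region": "EU"},
    {"id": 2, "amount": 25.0, "region": "US"},
    {"id": 3, "amount": 5.0,  "region": "EU"},
]

# Column-oriented layout of the same data (analytics-friendly):
# each column is stored as its own contiguous list.
cols = {
    "id": [1, 2, 3],
    "amount": [10.0, 25.0, 5.0],
    "region": ["EU", "US", "EU"],
}

row_total = sum(r["amount"] for r in rows)  # must visit every whole row
col_total = sum(cols["amount"])             # reads one column only
print(row_total, col_total)  # 40.0 40.0
```

Real column stores add compression and vectorized execution on top of this layout, which is why they dominate analytical workloads while row stores remain the default for transactional ones.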
QuerySurge Slide Deck for Big Data Testing Webinar - RTTS
This is a slide deck from QuerySurge's Big Data Testing webinar.
Learn why testing is pivotal to the success of your Big Data strategy.
Learn more at www.querysurge.com
The growing variety of new data sources is pushing organizations to look for streamlined ways to manage complexities and get the most out of their data-related investments. The companies that do this correctly are realizing the power of big data for business expansion and growth.
Learn why testing your enterprise's data is pivotal for success with big data, Hadoop and NoSQL. Learn how to increase your testing speed, boost your testing coverage (up to 100%), and improve the level of quality within your data warehouse - all with one ETL testing tool.
This information is geared towards:
- Big Data & Data Warehouse Architects,
- ETL Developers
- ETL Testers, Big Data Testers
- Data Analysts
- Operations teams
- Business Intelligence (BI) Architects
- Data Management Officers & Directors
You will learn how to:
- Improve your Data Quality
- Accelerate your data testing cycles
- Reduce your costs & risks
- Provide a huge ROI (as high as 1,300%)
The presentation begins with an overview of the growth of non-structured data and the benefits NoSQL products provide. It then provides an evaluation of the more popular NoSQL products on the market including MongoDB, Cassandra, Neo4J, and Redis. With NoSQL architectures becoming an increasingly appealing database management option for many organizations, this presentation will help you effectively evaluate the most popular NoSQL offerings and determine which one best meets your business needs.
Want to see a high-level overview of the products in the Microsoft data platform portfolio in Azure? I’ll cover products in the categories of OLTP, OLAP, data warehouse, storage, data transport, data prep, data lake, IaaS, PaaS, SMP/MPP, NoSQL, Hadoop, open source, reporting, machine learning, and AI. It’s a lot to digest but I’ll categorize the products and discuss their use cases to help you narrow down the best products for the solution you want to build.
Microsoft Fabric is the next version of Azure Data Factory, Azure Data Explorer, Azure Synapse Analytics, and Power BI. It brings all of these capabilities together into a single unified analytics platform that goes from the data lake to the business user in a SaaS-like environment. Therefore, the vision of Fabric is to be a one-stop shop for all the analytical needs of every enterprise and one platform for everyone from a citizen developer to a data engineer. Fabric will cover the complete spectrum of services including data movement, data lake, data engineering, data integration and data science, observational analytics, and business intelligence. With Fabric, there is no need to stitch together different services from multiple vendors. Instead, customers enjoy an end-to-end, highly integrated, single offering that is easy to understand, onboard, create and operate.
This is a hugely important new product from Microsoft and I will simplify your understanding of it via a presentation and demo.
Agenda:
What is Microsoft Fabric?
Workspaces and capacities
OneLake
Lakehouse
Data Warehouse
ADF
Power BI / DirectLake
Resources
Data Warehousing Trends, Best Practices, and Future Outlook - James Serra
Over the last decade, the 3Vs of data - Volume, Velocity & Variety - have grown massively. The Big Data revolution has completely changed the way companies collect, analyze & store data. Advancements in cloud-based data warehousing technologies have empowered companies to fully leverage big data without heavy investments both in terms of time and resources. But that doesn’t mean building and managing a cloud data warehouse isn’t accompanied by any challenges. From deciding on a service provider to the design architecture, deploying a data warehouse tailored to your business needs is a strenuous undertaking. Looking to deploy a data warehouse to scale your company’s data infrastructure, or still on the fence? In this presentation you will gain insights into the current Data Warehousing trends, best practices, and future outlook. Learn how to build your data warehouse with the help of real-life use cases and discussion of commonly faced challenges. In this session you will learn:
- Choosing the best solution - Data Lake vs. Data Warehouse vs. Data Mart
- Choosing the best Data Warehouse design methodologies: Data Vault vs. Kimball vs. Inmon
- Step by step approach to building an effective data warehouse architecture
- Common reasons for the failure of data warehouse implementations and how to avoid them
Azure Synapse Analytics is Azure SQL Data Warehouse evolved: a limitless analytics service, that brings together enterprise data warehousing and Big Data analytics into a single service. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs. This is a huge deck with lots of screenshots so you can see exactly how it works.
The data lake has become extremely popular, but there is still confusion on how it should be used. In this presentation I will cover common big data architectures that use the data lake, the characteristics and benefits of a data lake, and how it works in conjunction with a relational data warehouse. Then I’ll go into details on using Azure Data Lake Store Gen2 as your data lake, and various typical use cases of the data lake. As a bonus I’ll talk about how to organize a data lake and discuss the various products that can be used in a modern data warehouse.
Power BI Overview, Deployment and Governance - James Serra
Deploying Power BI in a large enterprise is a complex task, and one that requires a lot of thought and planning. The purpose of this presentation is to help you make your Power BI deployment a success. After a quick Power BI overview, I’ll discuss deployment strategies, common usage scenarios, how to store and refresh data, prototyping options, how to share externally, and then finish with how to administer and secure Power BI. I’ll outline considerations and best practices for achieving an optimal, well-performing, enterprise level Power BI deployment.
Power BI has become a product with a ton of exciting features. This presentation will give an overview of some of them, including Power BI Desktop, Power BI service, what’s new, integration with other services, Power BI premium, and administration.
The breadth and depth of Azure products that fall under the AI and ML umbrella can be difficult to follow. In this presentation I’ll first define exactly what AI, ML, and deep learning are, and then go over the various Microsoft AI and ML products and their use cases.
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag... - James Serra
Discover, manage, deploy, monitor – rinse and repeat. In this session we show how Azure Machine Learning can be used to create the right AI model for your challenge and then easily customize it using your development tools while relying on Azure ML to optimize them to run in hardware accelerated environments for the cloud and the edge using FPGAs and Neural Network accelerators. We then show you how to deploy the model to highly scalable web services and nimble edge applications that Azure can manage and monitor for you. Finally, we illustrate how you can leverage the model telemetry to retrain and improve your content.
Power BI for Big Data and the New Look of Big Data Solutions - James Serra
New features in Power BI give it enterprise tools, but that does not mean it automatically creates an enterprise solution. In this talk we will cover these new features (composite models, aggregations tables, dataflow) as well as Azure Data Lake Store Gen2, and describe the use cases and products of an individual, departmental, and enterprise big data solution. We will also talk about why a data warehouse and cubes still should be part of an enterprise solution, and how a data lake should be organized.
In three years I went from a complete unknown to a popular blogger, speaker at PASS Summit, a SQL Server MVP, and then joined Microsoft. Along the way I saw my yearly income triple. Is it because I know some secret? Is it because I am a genius? No! It is just about laying out your career path, setting goals, and doing the work.
I'll cover tips I learned over my career on everything from interviewing to building your personal brand. I'll discuss perm positions, consulting, contracting, working for Microsoft or partners, hot fields, in-demand skills, social media, networking, presenting, blogging, salary negotiating, dealing with recruiters, certifications, speaking at major conferences, resume tips, and keys to a high-paying career.
Your first step to enhancing your career will be to attend this session! Let me be your career coach!
Is the traditional data warehouse dead? - James Serra
With new technologies such as Hive LLAP or Spark SQL, do I still need a data warehouse or can I just put everything in a data lake and report off of that? No! In the presentation I’ll discuss why you still need a relational data warehouse and how to use a data lake and a RDBMS data warehouse to get the best of both worlds. I will go into detail on the characteristics of a data lake and its benefits and why you still need data governance tasks in a data lake. I’ll also discuss using Hadoop as the data lake, data virtualization, and the need for OLAP in a big data solution. And I’ll put it all together by showing common big data architectures.
Differentiate Big Data vs Data Warehouse use cases for a cloud solution - James Serra
It can be quite challenging keeping up with the frequent updates to the Microsoft products and understanding all their use cases and how all the products fit together. In this session we will differentiate the use cases for each of the Microsoft services, explaining and demonstrating what is good and what isn't, in order for you to position, design and deliver the proper adoption use cases for each with your customers. We will cover a wide range of products such as Databricks, SQL Data Warehouse, HDInsight, Azure Data Lake Analytics, Azure Data Lake Store, Blob storage, and AAS as well as high-level concepts such as when to use a data lake. We will also review the most common reference architectures (“patterns”) witnessed in customer adoption.
Databricks is a Software-as-a-Service-like experience (or Spark-as-a-service) that is a tool for curating and processing massive amounts of data and developing, training and deploying models on that data, and managing the whole workflow process throughout the project. It is for those who are comfortable with Apache Spark as it is 100% based on Spark and is extensible with support for Scala, Java, R, and Python alongside Spark SQL, GraphX, Streaming and Machine Learning Library (Mllib). It has built-in integration with many data sources, has a workflow scheduler, allows for real-time workspace collaboration, and has performance improvements over traditional Apache Spark.
Azure SQL Database Managed Instance is a new flavor of Azure SQL Database that is a game changer. It offers near-complete SQL Server compatibility and network isolation to easily lift and shift databases to Azure (you can literally back up an on-premises database and restore it into an Azure SQL Database Managed Instance). Think of it as an enhancement to Azure SQL Database that is built on the same PaaS infrastructure and maintains all its features (i.e. active geo-replication, high availability, automatic backups, database advisor, threat detection, intelligent insights, vulnerability assessment, etc.) but adds support for databases up to 35TB, VNET, SQL Agent, cross-database querying, replication, etc. So, you can migrate your databases from on-prem to Azure with very little migration effort, which is a big improvement over the current Singleton or Elastic Pool flavors, which can require substantial changes.
Microsoft Data Platform - What's included - James Serra
The pace of Microsoft product innovation is so fast that even though I spend half my days learning, I struggle to keep up. And as I work with customers I find they are often in the dark about many of the products that we have since they are focused on just keeping what they have running and putting out fires. So, let me cover what products you might have missed in the Microsoft data platform world. Be prepared to discover all the various Microsoft technologies and products for collecting data, transforming it, storing it, and visualizing it. My goal is to help you not only understand each product but understand how they all fit together and their proper use cases, allowing you to build the appropriate solution that can incorporate any data in the future no matter the size, frequency, or type. Along the way we will touch on technologies covering NoSQL, Hadoop, and open source.
Learning to present and becoming good at itJames Serra
Have you been thinking about presenting at a user group? Are you being asked to present at your work? Is learning to present one of the keys to advancing your career? Or do you just think it would be fun to present but you are too nervous to try it? Well take the first step to becoming a presenter by attending this session and I will guide you through the process of learning to present and becoming good at it. It’s easier than you think! I am an introvert and was deathly afraid to speak in public. Now I love to present and it’s actually my main function in my job at Microsoft. I’ll share with you journey that lead me to speak at major conferences and the skills I learned along the way to become a good presenter and to get rid of the fear. You can do it!
Think of big data as all data, no matter what the volume, velocity, or variety. The simple truth is a traditional on-prem data warehouse will not handle big data. So what is Microsoft’s strategy for building a big data solution? And why is it best to have this solution in the cloud? That is what this presentation will cover. Be prepared to discover all the various Microsoft technologies and products from collecting data, transforming it, storing it, to visualizing it. My goal is to help you not only understand each product but understand how they all fit together, so you can be the hero who builds your companies big data solution.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
2. About Me
Microsoft, Big Data Evangelist
In IT for 30 years, worked on many BI and DW projects
Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM architect, PDW/APS developer
Been perm employee, contractor, consultant, business owner
Presenter at PASS Business Analytics Conference, PASS Summit, Enterprise Data World conference
Certifications: MCSE: Data Platform, Business Intelligence; MS: Architecting Microsoft Azure Solutions, Design and Implement Big Data Analytics Solutions, Design and Implement Cloud Data Platform Solutions
Blog at JamesSerra.com
Former SQL Server MVP
Author of book “Reporting with Microsoft SQL Server 2012”
3. Agenda
Definition and differences
ACID vs BASE
Four categories of NoSQL
Use cases
CAP theorem
On-prem vs cloud
Product categories
Polyglot persistence
Architecture samples
4. Goal
My goal is to give you a high-level overview of all the technologies so you know where to start and put you on the right path to be a hero!
5. Relational and non-relational defined
Relational databases (RDBMS, SQL Databases)
• Examples: Microsoft SQL Server, Oracle Database, IBM DB2
• Mostly used in large enterprise scenarios
• Analytical RDBMS (OLAP, MPP) solutions are Analytics Platform System, Teradata, Netezza
Non-relational databases (NoSQL databases)
• Examples: Azure Cosmos DB, MongoDB, Cassandra
• Four categories: Key-value stores, Wide-column stores, Document stores and Graph stores
Hadoop: Made up of Hadoop Distributed File System (HDFS), YARN and MapReduce
6. Origins
Using SQL Server, I need to index a few thousand documents and search them.
No problem. I can use Full-Text Search.
I’m a healthcare company and I need to store and analyze millions of medical claims per day.
Problem. Enter Hadoop.
Using SQL Server, my internal company app needs to handle a few thousand transactions per second.
No problem. I can handle that with a nice size server.
Now I have Pokémon Go where users can enter millions of transactions per second.
Problem. Enter NoSQL.
But most enterprise data just needs an RDBMS (89% market share – Gartner).
7. Main differences (Relational)
Pros
• Works with structured data
• Supports strict ACID transactional consistency
• Supports joins
• Built-in data integrity
• Large eco-system
• Relationships via constraints
• Limitless indexing
• Strong SQL
• OLTP and OLAP
• Most off-the-shelf applications run on RDBMS
8. Main differences (Relational)
Cons
• Does not scale out horizontally (concurrency and data size) – only vertically, unless you use sharding
• Data is normalized, meaning lots of joins, affecting speed
• Difficulty in working with semi-structured data
• Schema-on-write
• Cost
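Sharding a relational store is typically the application's job, not the database's. A minimal sketch of hash-based routing by shard key (all names here are hypothetical, for illustration only):

```python
# Minimal hash-sharding sketch: route each row to one of N database
# shards by hashing its shard key. Names are illustrative.
import zlib

NUM_SHARDS = 4

def shard_for(customer_id: str) -> int:
    """Map a shard key to a shard index deterministically."""
    # zlib.crc32 is stable across runs (unlike hash(), which is salted)
    return zlib.crc32(customer_id.encode()) % NUM_SHARDS

# The application must route every query by shard key; cross-shard
# joins and transactions are what make sharding labor intensive.
shards = {i: [] for i in range(NUM_SHARDS)}
for cid in ["c1001", "c1002", "c1003", "c1004"]:
    shards[shard_for(cid)].append(cid)
```

Note that queries lacking the shard key must fan out to every shard, which is one reason horizontal scaling of an RDBMS is hard to retrofit.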
9. Main differences (Non-relational/NoSQL)
Pros
• Works with semi-structured data (JSON, XML)
• Scales out (horizontal scaling – parallel query performance, replication)
• High concurrency, high volume random reads and writes
• Massive data stores
• Schema-free, schema-on-read
• Supports documents with different fields
• High availability
• Cost
• Simplicity of design: no “impedance mismatch”
• Finer control over availability
• Speed, due to not having to join tables
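The schema-free / schema-on-read point above can be shown with a small sketch (hypothetical documents): records in one collection may carry different fields, and the reader decides how to interpret them.

```python
# Schema-on-read sketch: documents in one collection carry different
# fields; no ALTER TABLE is needed to add a new one like "vip".
collection = [
    {"_id": 1, "name": "Alice", "email": "alice@example.com"},
    {"_id": 2, "name": "Bob", "phones": ["555-0100", "555-0199"]},  # no email
    {"_id": 3, "name": "Carol", "email": None, "vip": True},
]

# Queries simply tolerate absent fields via .get()
vips = [d["name"] for d in collection if d.get("vip")]
with_email = [d["name"] for d in collection if d.get("email")]
```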
10. Main differences (Non-relational/NoSQL)
Cons
• Weaker or eventual consistency (BASE) instead of ACID
• Limited support for joins, does not support star schema
• Data is denormalized, so a change to a shared value (i.e. a product name) requires mass updates
• Does not have built-in data integrity (must do in code)
• No relationship enforcement
• Limited indexing
• Weak SQL
• Limited transaction support
• Slow mass updates
• Uses 10-50x more space (replication, denormalized, documents)
• Difficulty tracking schema changes over time
• Most NoSQL databases are still too immature for reliable enterprise operational applications
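The mass-update con above can be sketched: when a product name is copied into every order document, a rename touches N documents instead of one row (data and helper names below are hypothetical).

```python
# Denormalization trade-off sketch: a product name embedded in every
# order document means a rename must touch N documents, whereas a
# normalized RDBMS would update one products row and rely on a join.
orders = [
    {"order_id": i, "product": {"sku": "A1", "name": "Widget"}, "qty": 1}
    for i in range(5)
]

def rename_product(docs, sku, new_name):
    """Mass-update every embedded copy of the product name."""
    changed = 0
    for doc in docs:
        if doc["product"]["sku"] == sku:
            doc["product"]["name"] = new_name
            changed += 1
    return changed
```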
11. Main differences (Hadoop)
Pros
• Not a type of database, but rather an open-source software ecosystem that allows for massively parallel computing
• No inherent structure (no conversion to relational or JSON needed)
• Good for batch processing, large files, volume writes, parallel scans, sequential access
• Great for large, distributed data processing tasks where time isn’t a constraint (i.e. end-of-day reports, scanning months of historical data)
• Tradeoff: in order to make deep connections between many data points, the technology sacrifices speed
• Some NoSQL databases such as HBase are built on top of HDFS
12. Main differences (Hadoop)
Cons
• File system, not a database
• Not good for millions of users, random access, fast individual record lookups or updates (OLTP)
• Not so great for real-time analytics
• Lacks: indexing, metadata layer, query optimizer, memory management
• Same cons as non-relational: no ACID support, no built-in data integrity, limited indexing, weak SQL, etc
• Security limitations
• More complex debugging
Hadoop adoption has slowed
• Too much hype
• Companies adopted it without understanding the use cases (i.e. real big data)
• Difficulty in finding skillset
• Pace of change too fast
• Too many products involved in a solution
• Other technologies (RDBMS, NoSQL) improving and expanding use cases
• Higher learning curve
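Hadoop's batch model is easiest to see through the classic MapReduce word count. A single-process sketch (illustrative only; real Hadoop distributes the map, shuffle, and reduce phases across nodes and reads from HDFS):

```python
# Single-process sketch of the MapReduce word-count pattern:
# map emits (word, 1) pairs, shuffle groups pairs by key, reduce sums.
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]

def reduce_phase(grouped):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

def word_count(lines):
    grouped = defaultdict(list)
    for line in lines:                 # in Hadoop, lines are split across nodes
        for word, one in map_phase(line):
            grouped[word].append(one)  # the "shuffle" step
    return reduce_phase(grouped)
```

The same diagonal applies at scale: throughput comes from adding nodes, not from fast individual record lookups, which is why Hadoop suits batch analytics rather than OLTP.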
13. ACID (RDBMS) vs BASE (NoSQL)
ACID:
• Atomicity: All data and commands in a transaction succeed, or all fail and roll back
• Consistency: All committed data must be consistent with all data rules, including constraints, triggers, and cascades
• Isolation: Other operations cannot access data that has been modified during a transaction that has not yet completed
• Durability: Once a transaction is committed, the data will survive system failures and can be reliably recovered after an unwanted deletion
Needed for bank transactions
BASE:
• Basically Available: Guaranteed availability
• Soft-state: The state of the system may change, even without a query (because of node updates)
• Eventually Consistent: The system will become consistent over time
OK for web page visits
ACID vs BASE, side by side:
• Strong consistency vs weak consistency (stale data OK)
• Isolation vs last write wins
• Transactions vs programmer-managed conflicts
• Available/consistent vs available/partition-tolerant
• Robust database, simpler code vs simpler database, harder code
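The atomicity guarantee can be demonstrated with Python's built-in sqlite3 module (an ACID-compliant RDBMS): in this sketch a transfer that violates a constraint rolls back entirely, so neither account changes. The table and account names are hypothetical.

```python
# Atomicity demo with sqlite3: either both account updates commit,
# or the failed transaction rolls back entirely.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 0)])
conn.commit()

try:
    with conn:  # transaction: commits on success, rolls back on exception
        conn.execute("UPDATE accounts SET balance = balance - 500 "
                     "WHERE id = 'alice'")  # would go negative
        conn.execute("UPDATE accounts SET balance = balance + 500 "
                     "WHERE id = 'bob'")
except sqlite3.IntegrityError:
    pass  # CHECK constraint fired; neither update persists

balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

After the failed transfer, `balances` is unchanged: alice still has 100 and bob 0, which is exactly what BASE systems do not promise.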
14. Data stored in tables.
Tables contain some number of columns, each of a type.
A schema describes the columns each table can have.
Every table’s data is stored in one or more rows.
Each row contains a value for every column in that table.
Rows aren’t kept in any particular order.
15. Relational stores (thanks to Harri Kauhanen, http://www.slideshare.net/harrikauhanen/nosql-3376398)
16. Key-value stores offer very high speed via the least complicated data model: anything can be stored as a value, as long as each value is associated with a key or name.
18. Wide-column stores are fast and can be nearly as simple as key-value stores. They include a primary key, an optional secondary key, and anything stored as a value; keys and values can be sparse or numerous. Also called column stores.
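A minimal in-memory sketch of the key-value model described above (hypothetical class, loosely shaped like a cache such as Redis: opaque values under a key, constant-time get/put, optional expiry):

```python
# Minimal key-value store sketch: opaque values stored under a key,
# with O(1) get/put and an optional TTL, as a cache would offer.
import time

class KVStore:
    def __init__(self):
        self._data = {}  # key -> (value, expires_at or None)

    def put(self, key, value, ttl_seconds=None):
        expires = time.monotonic() + ttl_seconds if ttl_seconds else None
        self._data[key] = (value, expires)

    def get(self, key, default=None):
        value, expires = self._data.get(key, (default, None))
        if expires is not None and time.monotonic() > expires:
            del self._data[key]  # lazily evict the expired entry
            return default
        return value

cache = KVStore()
cache.put("session:42", {"user": "alice", "cart": ["sku1"]})
```

The store never inspects the value, which is why the model is so fast and so limited: there is nothing to index or join on except the key.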
24. Use cases for NoSQL categories
• Key-value stores: [Redis] For cache, queues, data that fits in memory, rapidly changing data, blob storage. Examples: shopping cart, session data, leaderboards, stock prices. Fastest performance
• Wide-column stores: [Cassandra] Real-time querying of random (non-sequential) data, huge number of writes, sensors. Examples: web analytics, time series analytics, real-time data analysis, banking industry. Internet scale
• Document stores: [MongoDB] Flexible schemas, dynamic queries, defined indexes, good performance on big databases. Examples: order data, customer data, log data, product catalogs, user-generated content (chat sessions, tweets, blog posts, ratings, comments). Fastest development
• Graph databases: [Neo4j] Graph-style data, social networks, master data management, network and IT operations. Examples: social relations, real-time recommendations, fraud detection, identity and access management, graph-based search, web browsing, portfolio analytics, gene sequencing, class curriculum
Note: Many NoSQL solutions are now multi-model
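As a sketch of why graph stores fit recommendation use cases, here is a friend-of-friend traversal over plain adjacency sets (hypothetical data; a graph database makes such hops first-class instead of requiring a relational self-join):

```python
# Graph-store sketch: relationships are first-class, so a
# friend-of-friend recommendation is a two-hop traversal.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice"},
    "dave": {"bob"},
}

def recommend(person):
    """People two hops away who aren't already direct friends."""
    direct = friends[person]
    two_hops = set().union(*(friends[f] for f in direct))
    return sorted(two_hops - direct - {person})
```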
25. Velocity

| Volume Per Day | Real-world Transactions Per Day | Real-world Transactions Per Second | Relational DB | Document DB | Key-Value or Wide-Column |
|---|---|---|---|---|---|
| 8 GB | 8.64B | 100,000 | As Is | | |
| 86 GB | 86.4B | 1M | Tuned* | As Is | |
| 432 GB | 432B | 5M | Appliance | Tuned* | As Is |
| 864 GB | 864B | 10M | Clustered Appliance | Clustered Servers | Tuned* |
| 8,640 GB | 8.64T | 100M | | Many Clustered Servers | Clustered Servers |
| 43,200 GB | 43.2T | 500M | | | Many Clustered Servers |

* Tuned means tuning the model, queries, and/or hardware (more CPU, RAM, and Flash)
26. Focus of different data models
…you may not have the data volume for NoSQL (yet), but there are other reasons to use NoSQL (semi-structured data, schemaless, high availability, etc)
27. Relational NewSQL stores are designed for web-scale applications, but still require up-front schemas, joins, and table management that can be labor intensive.
They blend RDBMS with NoSQL: providing the same scalable performance as NoSQL systems for OLTP read-write workloads while still maintaining the ACID guarantees of a traditional relational database system.
28. Use case for different database technologies
• Traditional OLTP business systems (i.e. ERP, CRM, In-house app): relational database (RDBMS)
• Data warehouses (OLAP): relational database (SMP or MPP)
• Web and mobile global OLTP applications: non-relational database (NoSQL)
• Data lake: Hadoop
• Relational and scalable OLTP: NewSQL
29. CAP Theorem
It is impossible for any shared data system to guarantee all three of the following properties simultaneously:
Consistency: Once data is written, all future requests will contain that data. “Is the data I’m looking at now the same if I look at it somewhere else?”
Availability: The database is always available and responsive. “What happens if my database goes down?”
Partition tolerance: If part of the database is unavailable, other parts are unaffected. “What if my data is on a different node?”
Relational: CA (i.e. SQL Server with no replication)
Non-relational: AP (Cassandra, CouchDB, Riak); CP (HBase, Cosmos DB, MongoDB, Redis)
NoSQL can’t be both consistent and available during a partition. With two nodes (A and B), if B goes down and node A keeps taking requests, it is available but not consistent with node B. If node A stops taking requests, it remains consistent with node B but is not available. An RDBMS can be consistent and available because it has only one node/partition (so no partition tolerance is needed).
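The two-node scenario above can be reduced to a toy simulation (hypothetical classes, not any real database's behavior): during a partition a replica must either keep answering, possibly with stale data (AP), or refuse the read to stay consistent (CP).

```python
# CAP trade-off sketch: during a network partition a replica either
# keeps answering (AP, possibly stale) or refuses (CP, unavailable).
class Replica:
    def __init__(self, mode):
        self.mode = mode          # "AP" or "CP"
        self.value = "v1"         # last value replicated before the partition
        self.partitioned = False  # can this node reach the primary?

    def read(self):
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable: cannot confirm latest write")
        return self.value         # AP: serve what we have, maybe stale

ap, cp = Replica("AP"), Replica("CP")
ap.partitioned = cp.partitioned = True  # network partition begins
stale_read = ap.read()                  # available, but may be out of date
try:
    cp.read()
    cp_available = True
except RuntimeError:
    cp_available = False                # consistent, but not available
```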
30. Microsoft data platform solutions
• SQL Server 2016 (RDBMS): Earned the top spot in Gartner’s Operational Database Magic Quadrant. JSON support. https://www.microsoft.com/en-us/server-cloud/products/sql-server-2016/
• SQL Database (RDBMS/DBaaS): Cloud-based service that is provisioned and scaled quickly. Has built-in high availability and disaster recovery. JSON support. https://azure.microsoft.com/en-us/services/sql-database/
• SQL Data Warehouse (MPP RDBMS/DBaaS): Cloud-based service that handles relational big data. Provision and scale quickly. Can pause the service to reduce cost. https://azure.microsoft.com/en-us/services/sql-data-warehouse/
• Analytics Platform System (APS) (MPP RDBMS): Big data analytics appliance for high performance and seamless integration of all your data. https://www.microsoft.com/en-us/server-cloud/products/analytics-platform-system/
• Azure Data Lake Store (Hadoop storage): Removes the complexities of ingesting and storing all of your data while making it faster to get up and running with batch, streaming, and interactive analytics. https://azure.microsoft.com/en-us/services/data-lake-store/
• Azure Data Lake Analytics (on-demand analytics job service/Big Data-as-a-service): Cloud-based service that dynamically provisions resources so you can run queries on exabytes of data. Includes U-SQL, a new big data query language. https://azure.microsoft.com/en-us/services/data-lake-analytics/
• HDInsight (PaaS Hadoop compute): A managed Apache Hadoop, Spark, R, HBase, and Storm cloud service made easy. https://azure.microsoft.com/en-us/services/hdinsight/
• Azure Cosmos DB (PaaS NoSQL: document store): Get your apps up and running in hours with a fully managed NoSQL database service that indexes, stores, and queries data using familiar SQL syntax. https://azure.microsoft.com/en-us/services/cosmos-db/
• Azure Table Storage (PaaS NoSQL: key-value store): Store large amounts of semi-structured data in the cloud. https://azure.microsoft.com/en-us/services/storage/tables/
32. Azure Cosmos DB consistency options
• Strong, which is the slowest, but is guaranteed to always return correct data
• Bounded staleness, which ensures that an application will see changes in the order in which they were made. This option does allow an application to see out-of-date data, but only within a specified window, e.g., 500 milliseconds
• Session, which ensures that an application always sees its own writes correctly, but allows access to potentially out-of-date or out-of-order data written by other applications
• Consistent prefix (new): updates returned are some prefix of all the updates, with no gaps
• Eventual, which provides the fastest access, but also has the highest chance of returning out-of-date data
33. On-prem vs Cloud
• On-prem: SQL Server, APS, MongoDB, Oracle, Cassandra, Neo4J
• IaaS Cloud: SQL Server in Azure VM, Oracle in Azure VM
• DBaaS/PaaS Cloud: SQL Database, SQL Data Warehouse, Azure Cosmos DB, Redshift, RDS, MongoLab
38. db-engines.com/en/ranking
Method of calculation:
• Number of mentions of the system on websites
• General interest in the system
• Frequency of technical discussions about the system
• Number of job offers in which the system is mentioned
• Number of profiles in professional networks in which the system is mentioned
• Relevance in social networks
db-engines.com/en/ranking_definition
40. Polyglot Persistence
• Sometimes a relational store is the right choice, sometimes a NoSQL store is the right choice
• Sometimes you need more than one store: Using the right tool for the right job
43. Summary
Choose NoSQL when…
• You are bringing in new data with a lot of volume and/or variety
• Your data is non-relational/semi-structured
• Your team will be trained in these new technologies (NoSQL)
• You have enough information to correctly select the type and product of NoSQL for your situation
• You can relax transactional consistency when scalability or performance is more important
• You can service a large number of user requests vs rigorously enforcing business rules
Relational databases are created for strong consistency, but at the cost of speed and scale. NoSQL slightly sacrifices consistency across nodes for both speed and scalability.
NoSQL and Hadoop are viable technologies for a subset of specialized needs and use cases.
Lines are getting blurred – do your homework!
44. Bottom line!
• RDBMS for enterprise OLTP and ACID compliance, or db’s under 5TB
• NoSQL for scaled OLTP and JSON documents
• Hadoop for big data analytics (OLAP)
45. Resources
Relational database vs Non-relational databases: http://bit.ly/1HXn2Rt
Types of NoSQL databases: http://bit.ly/1HXn8Zl
What is Polyglot Persistence? http://bit.ly/1HXnhMm
Hadoop and Data Warehouses: http://bit.ly/1xuXfu9
Hadoop and Microsoft: http://bit.ly/20Cg2hA
46. Q & A ?
James Serra, Big Data Evangelist
Email me at: JamesSerra3@gmail.com
Follow me at: @JamesSerra
Link to me at: www.linkedin.com/in/JamesSerra
Visit my blog at: JamesSerra.com (where this slide deck is posted via the “Presentations” link on the top menu)
Editor's Notes
There is a lot of confusion about the place and purpose of the many recent non-relational database solutions (“NoSQL databases”) compared to the relational database solutions that have been around for so many years. In this presentation I will first clarify what exactly these database solutions are, how they compare to Hadoop, and discuss the best use cases for each. I’ll discuss topics involving OLTP, scaling, data warehousing, polyglot persistence, and the CAP theorem. We will even touch on a new type of database solution called NewSQL. If you are building a new solution it is important to understand all your options so you take the right path to success.
Fluff, but point is I bring real work experience to the session
My goal is to give you a high-level overview of all the technologies so you know where to start, and to make you a hero
Hadoop started 2006. NoSQL started 2009
DocumentDB has done 5m/tps per region for 4 regions, so 20m/tps. DocumentDB uses local storage
Kevin Cox: What is the highest performance (transactions per second) you have seen out of SQL Server? Over 500k/sec. Very dependent on using flash-type storage for tran log; i.e. FusionIO or similar. Also short transactions (stock trades).
Matt Goswell: Please see attached. SDX offers 171,800 TPS however this is using SQL 2014. We are waiting on updated numbers for SQL 2016.
Arvind Shyamsundar: The question is fairly open-ended and the answer is dependent on the workload pattern. On the in-memory OLTP front, we achieved 1.2 million batch requests / second on a Fujitsu Primergy server (4 sockets, 72 cores, 144 logical procs) last October. The Superdome X can go up to 16-sockets and hundreds of cores, but with the form factor beyond 4 sockets comes increased NUMA memory latency. So more sockets does not necessarily translate to more throughput. The recent 10TB TPC-H numbers we released were all on 8-socket Lenovo boxes, and the workload involved is predominantly read-workload
https://blogs.msdn.microsoft.com/sqlcat/2016/10/26/how-bwin-is-using-sql-server-2016-in-memory-oltp-to-achieve-unprecedented-performance-and-scale/
sql server: 1.2m batch requests/sec (30-40 sql statements each batch)
Batch requests / second is the nearest equivalent to compare transactions / second. Statements is not an accurate comparison. Transactions / second is too overloaded / ambiguous because it could mean any of:
Business transactions / second (one business transactions could mean multiple SQL batches)
Batch requests / second (assuming one business transaction == one SQL batches)
Some other number involving interplay between SQL commands and external web services etc.
So from a pure OLTP perspective we prefer to quote batch requests / second in this ‘benchmark’. Proper benchmarks like TPC benchmarks have their own clearly defined unit of measurement (http://www.tpc.org/tpcc/detail.asp)
Arvind Shyamsundar
OLTP DBMS now called Operational DBMS: http://www.gartner.com/technology/reprints.do?id=1-2RIVJYE&ct=151104&st=sb
Hadoop is a kind of file system on which several ecosystem components can run; it is not a database. NoSQL is a kind of database with specific properties.
The difference between a file system and a database is subtle. Databases ultimately store all data in files or in RAM. We also have "object storages" (like S3), "key-value data stores" (like Riak), and "data structure stores" (like Redis), and we can treat them as databases. Hadoop is a file system plus a technology stack that includes NoSQL solutions (HBase, for example). NoSQL is a set of methods or ways of handling data.
Hadoop HDFS + YARN is a file system on steroids... i.e. it is neither a relational DBMS's nor non-relational (NoSql) DBMS's... it is optimized for string processing (large strings in large amounts of data)... Hadoop allows users to interact with the data via SQL (multiple options of SQL dialects) and NoSql (multiple options of procedural languages)... unfortunately, in a sub-optimal performance and functionally restrictive for all non string related processing... that's the reason for all vendors and gurus to be so emphatic about Hadoop costs...
For any real-time processing or analytics, NoSQL would be a better use case, rather than Hadoop. However, there are several factors to keep in mind. NoSQL is better suited for simple data structure (key-value, doc etc), but Hadoop has no inherent structure. Hadoop is better for volume writes and parallel scans, but NoSQL is better for high volume random reads (indexed access) and writes. Finally, it would be important to look at what type of analytics you want to do: statistical (with R), Visualization etc to pick the right store. Sometimes it would mean to have both hadoop and NoSQL
In NoSQL, you may not need to define a schema, but you still need to convert data to key/value or JSON before you can store it. Hadoop is good for batch processing, and you don't want to expose it to millions of users.
Historically Hadoop ecosystem(hdfs,map reduce,yarn etc) targeted OLAP use cases and No Sql (Cassandra, Couchbase etc) were more towards OLTP work loads. However lines are getting blurred. You gave a good example of Map Reduce on Couchbase. Or Hbase on Hadoop ecosystem targeting real time use cases.
HDFS (Hadoop File System) has been built for large files and is very efficient in batch processing, but it supports sequential access of data only, hence no support for random access or fast individual record lookups, and data updates are not efficient either, while NoSQL databases address all these challenges.
To reiterate in short, Hadoop is a computation platform, while NoSQL is an unstructured database.
Hadoop, at its most basic, is a distributed file system (HDFS) built to store large volumes of data in parallel with redundancy. But the filesystem by itself is of little use without the rest of the ecosystem, like YARN, HBase, Hive, etc. (and now Spark for more real-time usage), which provides more user-friendly access. HBase also falls under the NoSQL category. NoSQL databases come in different flavors based on their inherent architecture and the use cases they support.
OLTP DBMS now called Operational DBMS: http://www.gartner.com/technology/reprints.do?id=1-2RIVJYE&ct=151104&st=sb
Hadoop is a kind of file system on which several ecosystem components can work; it is not a database. NoSQL is a kind of database with specific properties.
NoSQL: analogy of building a race car from a regular car by stripping off the parts you don't need
Scalable because all data lives within one document, so there is no need to move data around to join tables
Joins are not a problem for OLTP, but they are a problem for OLAP
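The "all data within one doc" point above can be sketched with invented sample data: the relational form needs a join across tables (potentially across nodes), while the document form keeps an order and its lines together in one read:

```python
# Invented sample data: relational rows vs. a single embedded document.

# Relational: two tables, joined on order_id (a cross-node join at scale).
orders = [{"order_id": 1, "customer": "Ann"}]
order_lines = [
    {"order_id": 1, "sku": "A100", "qty": 2},
    {"order_id": 1, "sku": "B200", "qty": 1},
]
joined = [
    {**o, **l} for o in orders for l in order_lines if o["order_id"] == l["order_id"]
]

# Document store: all data for the order lives in one document on one node.
order_doc = {
    "order_id": 1,
    "customer": "Ann",
    "lines": [{"sku": "A100", "qty": 2}, {"sku": "B200", "qty": 1}],
}

assert len(joined) == 2              # the join had to combine two tables
assert len(order_doc["lines"]) == 2  # the document needs one read, no join
```

This is why sharding a document store is straightforward: each document is self-contained, so no data has to move between nodes to satisfy the query.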
http://www.jamesserra.com/archive/2014/05/hadoop-and-data-warehouses/
Hadoop Common – Contains libraries and utilities needed by other Hadoop modules
Hadoop Distributed File System (HDFS) – A distributed file-system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster
Hadoop MapReduce – A programming model for large-scale data processing, designed for batch processing. Although the Hadoop framework is implemented in Java, MapReduce applications can be written in other programming languages (R, Python, C#, etc.), but Java is the most popular.
Hadoop YARN – YARN is a resource manager introduced in Hadoop 2 that was created by separating the processing engine and resource management capabilities of MapReduce as it was implemented in Hadoop 1 (see Hadoop 1.0 vs Hadoop 2.0). YARN is often called the operating system of Hadoop because it is responsible for managing and monitoring workloads, maintaining a multi-tenant environment, implementing security controls, and managing high availability features of Hadoop
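The MapReduce programming model mentioned above can be sketched in a few lines of Python (a local simulation of the model, not actual Hadoop code): a map phase emits key/value pairs, a shuffle groups them by key, and a reduce phase aggregates each group:

```python
from collections import defaultdict

# Local sketch of the MapReduce model (word count); not real Hadoop code.

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in an input line."""
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: aggregate the values for one key."""
    return key, sum(values)

lines = ["big data big files", "data files"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
# counts == {"big": 2, "data": 2, "files": 2}
```

In real Hadoop, the map and reduce functions run on many nodes in parallel and the framework handles the shuffle, scheduling, and fault tolerance; the programming model itself is this simple.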
https://www.linkedin.com/pulse/hadoop-falling-george-hill
Am I doing bank transactions or counting web page visits?
In NoSQL, to maintain high availability or for performance reasons, data has multiple copies. These copies will not all be updated instantaneously when there is a data change, but will all eventually be updated (“eventually consistent”)
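A toy simulation of that "eventually consistent" behavior (all names invented): a write lands on one replica first, so a read from another replica can briefly return stale data until propagation catches up:

```python
# Toy simulation of eventual consistency across replicas (invented example).

replicas = [{"cart": []}, {"cart": []}, {"cart": []}]
pending = []  # updates not yet propagated to all replicas

def write(key, value):
    """Apply the write to one replica immediately; queue it for the rest."""
    replicas[0][key] = value
    pending.append((key, value))

def read(replica_index, key):
    """A read may hit a replica that has not yet seen the latest write."""
    return replicas[replica_index][key]

def propagate():
    """Eventually every replica converges to the same state."""
    for key, value in pending:
        for replica in replicas[1:]:
            replica[key] = value
    pending.clear()

write("cart", ["book"])
stale = read(2, "cart")   # still [] -- stale read before propagation
propagate()
fresh = read(2, "cart")   # ["book"] -- eventually consistent
```

The trade-off is the point of the slide: accepting that `stale` read is what buys the availability and write performance.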
Use cases: scale, cache, store blob data, shopping cart, session data, leaderboards, queues
See http://www.slideshare.net/harrikauhanen/nosql-3376398
Also called Columnar stores or column stores
Use cases: scale, real-time querying of random (non-sequential) data, web analytics, time-series analytics, huge numbers of writes, big data storage. Like document stores, except data is partitioned across nodes
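A rough sketch of why column stores suit these workloads (invented data): values are kept per column rather than per row, so high-volume appends touch each column array once and an analytic query over one column reads only that column:

```python
# Invented example: row-oriented vs. column-oriented layout for time series.

# Row store: one record per event.
rows = [
    {"ts": 1, "page": "/home", "ms": 120},
    {"ts": 2, "page": "/buy",  "ms": 340},
]

# Column store: one array per column; scanning a single column reads less data.
columns = {"ts": [1, 2], "page": ["/home", "/buy"], "ms": [120, 340]}

def append(event):
    """High write volume: appending adds one value to each column array."""
    for name, value in event.items():
        columns[name].append(value)

append({"ts": 3, "page": "/home", "ms": 95})
avg_latency = sum(columns["ms"]) / len(columns["ms"])  # scans only one column
```

Real column-family stores (Cassandra, HBase) add partitioning, compression, and persistence on top, but the layout idea is the same.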
Use cases: social network, master data management, network and IT operations, real-time recommendations, fraud detection, identity and access management, graph-based search, web browsing, portfolio analytics, gene sequencing, class curriculum
MongoDB vs Cassandra: http://theprofessionalspoint.blogspot.com/2014/01/mongodb-vs-cassandra-difference-and.html:
Cassandra is much better suited for highly distributed applications due to its tunable replication engine. It was built from the ground up to be a shared-nothing data engine. MongoDB, by contrast, is better suited for applications that need a dynamic schema-less approach.
https://www.youtube.com/watch?v=PENcqjVKqr4c
https://www.youtube.com/watch?v=gJFG04Sy6NY
http://maxivak.com/differences-between-nosql-databases-cassandra-vs-mongodb-vs-couchdb-vs-redis-vs-riak-vs-hbase-vs-membase-vs-neo4j/
http://www.infoworld.com/article/2848722/nosql/mongodb-cassandra-hbase-three-nosql-databases-to-watch.html
http://db.cs.cmu.edu/papers/2016/pavlo-newsql-sigmodrec2016.pdf
Use cases: scale,
A class of modern RDBMSs that seek to provide the scalable performance of NoSQL systems for OLTP read-write workloads while still maintaining the ACID guarantees of a traditional relational database system. The disadvantages are that they are not suited for OLAP-style queries and that they are inappropriate for databases over a few terabytes. NewSQL aims to blend NoSQL and relational/SQL. Examples: VoltDB, NuoDB, MemSQL, SAP HANA, Splice Machine, Clustrix, Altibase.
If you would rather go the route of using Hadoop software, many of the above technologies have Hadoop or open source equivalents: AtScale and Apache Kylin create SSAS-like OLAP cubes on Hadoop, Jethro Data creates indexes on Hadoop data, Apache Atlas for metadata and lineage tools, Apache Drill to query Hadoop files via SQL, Apache Mahout or Spark MLib for machine learning, Apache Flink for distributed stream and batch data processing, Apache HBase for storing non-relational streaming data and supporting fast query response times, SQLite/MySQL/PostgreSQL for storing relational data, Apache Kafka for event queuing, Apache Falcon for data and pipeline management (ETL), and Apache Knox for authentication and authorization.
https://codahale.com/you-cant-sacrifice-partition-tolerance/
Emails don’t need to be consistent; stock prices do
http://www.3pillarglobal.com/insights/short-history-databases-rdbms-nosql-beyond
https://www.infoq.com/news/2014/04/bitcoin-banking-mongodb
https://azure.microsoft.com/en-us/blog/json-functionalities-in-azure-sql-database-public-preview/ “If you need a specialized JSON database in order to take advantage of automatic indexing of JSON fields, tunable consistency levels for globally distributed data, and JavaScript integration, you may want to choose Azure DocumentDB as a storage engine.”
https://blogs.msdn.microsoft.com/jocapc/2015/05/16/json-support-in-sql-server-2016/
https://msdn.microsoft.com/en-us/library/dn921897.aspx “If you have pure JSON workloads where you want to use some query language that is customized and dedicated for processing of JSON documents, you might consider Microsoft Azure DocumentDB.”
http://demo.sqlmag.com/scaling-success-sql-server-2016/integrating-big-data-and-sql-server-2016
https://www.simple-talk.com/sql/learn-sql-server/json-support-in-sql-server-2016/
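The hybrid approach these links describe, JSON documents stored inside a relational engine, can be sketched with SQLite from Python's standard library (the schema and data here are invented; SQL Server 2016 and Azure SQL expose similar ideas through their own JSON functions):

```python
import json
import sqlite3

# Sketch of storing flexible JSON documents inside a relational table
# (invented schema for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, doc TEXT)")

# Schema-less payloads: each row's JSON can carry different attributes.
conn.execute("INSERT INTO products (doc) VALUES (?)",
             (json.dumps({"name": "bike", "gears": 21}),))
conn.execute("INSERT INTO products (doc) VALUES (?)",
             (json.dumps({"name": "helmet", "size": "M"}),))

# Read the documents back and filter on a JSON field in application code.
docs = [json.loads(row[0])
        for row in conn.execute("SELECT doc FROM products ORDER BY id")]
bikes = [d for d in docs if d.get("gears")]
```

The engines linked above go further by indexing and querying inside the JSON on the server (e.g. via dedicated JSON functions), which is the selling point over doing the filtering in application code as this sketch does.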
So now that you’re convinced of the benefits of PaaS, let’s take a look at the menu of available PaaS data services on Azure. It’s important to remember that with any application, you can use multiple data stores.
Cache and Search are specialized data stores that you wouldn’t use as a primary data store, but they are worth mentioning here.
Note: speaker should do a brief verbal overview of the information contained in this chart.
Presenter guidance:
Introduce the family portrait.
Slide talk track:
This is how we think about the core differences across the data services for capturing and managing data
On the left, you have more database-imposed structure, and this loosens as you move to the right, ending in blobs, which are just large containers of binary data.
Presenter guidance:
At this point, let’s take a slight detour to mention SQL Server in a VM and how it fits into the mix. It’s important in the context of dev/test and lift and shift (or migrating existing apps).
Establish app dev scenarios as common ground.
Slide talk track:
Let’s first orient ourselves in what we see as common application scenarios.
Are you seeing these?
Are you interested in these scenarios?
Do these represent scenarios you would be willing to move to the cloud?
The services listed are generally those that we would see you using for these scenarios, but this is just what we see. There are infinite ways to do things and at the end of the day, it’s your decision. Azure is there to make sure you have all of the options and choices available that you need.
https://msdn.microsoft.com/en-us/library/mt143171.aspx
When it comes to key BI investments, we are making it much easier to manage relational and non-relational data with PolyBase technology, which allows you to query Hadoop data and SQL Server relational data through a single T-SQL query. One of the challenges we see with Hadoop is that there are not enough people with Hadoop and MapReduce skillsets, and this technology simplifies the skillset needed to manage Hadoop data. This can also work across your on-premises environment or SQL Server running in Azure.
Clock- 47 Minutes
In this scenario-based HOL, you will learn how to build a 'polyglot persistence' data pattern that is common in modern cloud-hosted applications. Requirements of modern applications, such as greater scale and availability, have driven the industry to begin using a much broader range of technologies for storing data within an application. Microsoft Azure provides a range of storage technologies that support these architectures, and this HOL provides an example of their use in the well-understood scenario of e-commerce. With data services in Microsoft Azure, you can quickly design, deploy, and manage highly available apps that scale without downtime and that enable you to rapidly respond to security threats. Features built into services like Azure SQL Database, Azure Search, and Azure DocumentDB help your apps scale smartly, run faster, and stay available and secure.
In this HOL you will see a browser-based e-commerce application running under the LCA-approved sample company name 'AdventureWorks'. It has been created to demonstrate functionality provided by the following data storage technologies: SQL Database, DocumentDB, Search, and Table Storage. In a real application, decisions will need to be made as to where data is stored. In this HOL we wish to highlight how using multiple Azure data service technologies allows you to take a modern approach to data in your applications.
Note: the website (which gets built out of this HOL) is not intended to be a fully functioning site. It is not designed to be a reference e-commerce implementation, nor a starting point for a customer's implementation of an e-commerce site on Azure; rather, it provides the following functionality in order to demonstrate the selected storage technologies.
In the course of this lab, you will gain greater familiarity with Azure SQL Database, DocumentDB, Azure Search and Table Storage through performing the following tasks:
Familiarize yourself with one of the tenant-company’s websites and its Azure SQL Database backend.
Create a new database using the Azure portal.
Configure and implement vertical scaling by increasing the capacity of a database.
Use Azure SQL Database auditing features to track down an erroneous deletion from a database.
Use Azure SQL Database point-in-time restore to correct the deletion (Optional)
Configure and implement Azure SQL Database geographic disaster recovery to prevent large-scale data loss.
Locate data using Azure Search.
Modernize and create an iterative experience using DocumentDB.
http://INMMDDYYYY.azurewebsites.net/
Show a couple of examples of using multiple data services.
1) Copy source data into the Azure Data Lake Store (Twitter data example)
2) Massage/filter the data using Hadoop (or skip Hadoop and use stored procedures in SQL DW/DB to massage the data after step #5)
3) Pass data into Azure ML to build models using a Hive query (or pass it in directly from Blob Storage). You can use a Python package to pull data directly from the Azure Data Lake Store
4) Azure ML feeds prediction results into the data warehouse (you can also pull in data from SQL Database or SQL Data Warehouse)
5) Non-relational data in the Azure Data Lake Store is copied to the data warehouse in relational format (optionally use PolyBase with external tables to avoid copying data)
6) Power BI pulls data from the data warehouse to build dashboards and reports
7) Azure Data Catalog captures metadata from the Azure Data Lake Store, SQL DW/DB, and SSAS cubes
8) Power BI can pull data from the Azure Data Lake Store via HDInsight/Spark (beta) or directly. Excel can pull data from the Azure Data Lake Store via Hive ODBC or Power Query/HDInsight
9) To support high concurrency if using SQL DW, or for an easier end-user data layer, create an SSAS cube