I often hear from clients: “We don’t know much about Big Data – can you tell us what it is and how it can help our business?” Yes! The first step is this vendor-free presentation, where I start with a business-level discussion, not a technical one. Big Data is an opportunity to re-imagine our world, to track new signals that were once impossible, to change the way we experience our communities, our places of work and our personal lives. I will help you to identify the business value opportunity from Big Data and how to operationalize it. Yes, we will cover the buzzwords: modern data warehouse, Hadoop, cloud, MPP, Internet of Things, and Data Lake, but I will show use cases to better understand them. In the end, I will give you the ammo to go to your manager and say “We need Big Data and here is why!” Because if you are not utilizing Big Data to help you make better business decisions, you can bet your competitors are.
Disclaimer:
The images and the company, product, and service names used in this presentation are for illustration purposes only. All trademarks and registered trademarks are the property of their respective owners.
Data and images were collected from various sources on the Internet.
The intention is to present the big picture of Big Data & Hadoop.
Big data is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis. But it’s not the amount of data that’s important. It’s what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves.
Data Lakehouse, Data Mesh, and Data Fabric (r1) - James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. I’ll include use cases so you can see what approach will work best for your big data needs.
Big data architectures and the data lake - James Serra
With so many new technologies, it can be confusing to choose the best approach to building a big data architecture. The data lake is a great new concept, usually built in Hadoop, but what exactly is it and how does it fit in? In this presentation I'll discuss the four most common patterns in big data production implementations, the top-down vs bottom-up approach to analytics, and how you can use a data lake and an RDBMS data warehouse together. We will go into detail on the characteristics of a data lake and its benefits, and how you still need to perform the same data governance tasks in a data lake as you do in a data warehouse. Come to this presentation to make sure your data lake does not turn into a data swamp!
Introduction to the Snowflake data warehouse and its architecture for a big data company. Centralized data management. Snowpipe and the COPY INTO command for data loading. Stream loading and batch processing.
Big Data & Analytics (Conceptual and Practical Introduction) - Yaman Hajja, Ph.D.
A 3-day interactive workshop for startups involved in Big Data & Analytics in Asia. Introduction to Big Data & Analytics concepts, and case studies in R Programming, Excel, Web APIs, and many more.
DOI: 10.13140/RG.2.2.10638.36162
The recent focus on Big Data in the data management community brings with it a paradigm shift—from the more traditional top-down, “design then build” approach to data warehousing and business intelligence, to the more bottom up, “discover and analyze” approach to analytics with Big Data. Where does data modeling fit in this new world of Big Data? Does it go away, or can it evolve to meet the emerging needs of these exciting new technologies? Join this webinar to discuss:
Big Data – A Technical & Cultural Paradigm Shift
Big Data in the Larger Information Management Landscape
Modeling & Technology Considerations
Organizational Considerations
The Role of the Data Architect in the World of Big Data
Tips on the importance of data warehousing, data cleansing, data extraction, and more. For more details visit: http://www.skylinecollege.com/business-analytics-course
Data Lakes are meant to support many of the same analytics capabilities of Data Warehouses while overcoming some of the core problems. Yet Data Lakes have a distinctly different technology base. This webinar will provide an overview of the standard architecture components of Data Lakes.
This will include:
The Lab and the factory
The base environment for batch analytics
Critical governance components
Additional components necessary for real-time analytics and ingesting streaming data
The data lake has become extremely popular, but there is still confusion on how it should be used. In this presentation I will cover common big data architectures that use the data lake, the characteristics and benefits of a data lake, and how it works in conjunction with a relational data warehouse. Then I’ll go into details on using Azure Data Lake Store Gen2 as your data lake, and various typical use cases of the data lake. As a bonus I’ll talk about how to organize a data lake and discuss the various products that can be used in a modern data warehouse.
Want to see a high-level overview of the products in the Microsoft data platform portfolio in Azure? I’ll cover products in the categories of OLTP, OLAP, data warehouse, storage, data transport, data prep, data lake, IaaS, PaaS, SMP/MPP, NoSQL, Hadoop, open source, reporting, machine learning, and AI. It’s a lot to digest but I’ll categorize the products and discuss their use cases to help you narrow down the best products for the solution you want to build.
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners... - Simplilearn
This presentation about Big Data will help you understand how Big Data evolved over the years, what Big Data is, applications of Big Data, a case study on Big Data, 3 important challenges of Big Data, and how Hadoop solved those challenges. The case study talks about Google File System (GFS), where you’ll learn how Google solved its problem of storing increasing user data in the early 2000s. We’ll also look at the history of Hadoop, its ecosystem, and a brief introduction to HDFS, which is a distributed file system designed to store large volumes of data, and MapReduce, which allows parallel processing of data. In the end, we’ll run through some basic HDFS commands and see how to perform wordcount using MapReduce. Now, let us get started and understand Big Data in detail.
The following topics are explained in this Big Data presentation for beginners:
1. Evolution of Big Data
2. Why Big Data?
3. What is Big Data?
4. Challenges of Big Data
5. Hadoop as a solution
6. MapReduce algorithm
7. Demo on HDFS and MapReduce
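The wordcount demo mentioned above can be illustrated with a minimal single-process sketch. This is not Hadoop code and not the presenter's demo: the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical, chosen only to mirror the map, shuffle, and reduce stages that a MapReduce job distributes across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key (the word)."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines):
    """Run the three stages in sequence on an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    # Hypothetical sample input; a real job would read files from HDFS.
    sample = ["big data is big", "hadoop stores big data"]
    print(word_count(sample))
```

In a real cluster the map and reduce functions run on many machines in parallel and the framework performs the shuffle over the network; the sketch keeps only the data flow.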
What is this Big Data Hadoop training course about?
The Big Data Hadoop and Spark developer course has been designed to impart in-depth knowledge of Big Data processing using Hadoop and Spark. The course is packed with real-life projects and case studies to be executed in the CloudLab.
What are the course objectives?
This course will enable you to:
1. Understand the different components of the Hadoop ecosystem such as Hadoop 2.7, Yarn, MapReduce, Pig, Hive, Impala, HBase, Sqoop, Flume, and Apache Spark
2. Understand Hadoop Distributed File System (HDFS) and YARN as well as their architecture, and learn how to work with them for storage and resource management
3. Understand MapReduce and its characteristics, and assimilate some advanced MapReduce concepts
4. Get an overview of Sqoop and Flume and describe how to ingest data using them
5. Create databases and tables in Hive and Impala, understand HBase, and use Hive and Impala for partitioning
6. Understand different types of file formats, Avro Schema, using Avro with Hive and Sqoop, and schema evolution
7. Understand Flume, Flume architecture, sources, flume sinks, channels, and flume configurations
8. Understand HBase, its architecture, data storage, and working with HBase. You will also understand the difference between HBase and RDBMS
9. Gain a working knowledge of Pig and its components
10. Do functional programming in Spark
11. Understand resilient distributed datasets (RDD) in detail
12. Implement and build Spark applications
13. Gain an in-depth understanding of parallel processing in Spark and Spark RDD optimization techniques
14. Understand the common use-cases of Spark and the various interactive algorithms
15. Learn Spark SQL, including creating, transforming, and querying DataFrames
Learn more at https://www.simplilearn.com/big-data-and-analytics/big-data-and-hadoop-training
Big Data in Financial Services: How to Improve Performance with Data-Driven D... - Perficient, Inc.
Most banking and financial services organizations have only scratched the surface of leveraging customer data to transform their business, realize new revenue opportunities, manage risk and address customer loyalty. Yet a business’s digital footprint continues to evolve as automated payments, location-based purchases, and unstructured customer communications continue to influence the technology landscape for financial services.
Differentiate Big Data vs Data Warehouse use cases for a cloud solution - James Serra
It can be quite challenging keeping up with the frequent updates to the Microsoft products and understanding all their use cases and how all the products fit together. In this session we will differentiate the use cases for each of the Microsoft services, explaining and demonstrating what is good and what isn't, in order for you to position, design and deliver the proper adoption use cases for each with your customers. We will cover a wide range of products such as Databricks, SQL Data Warehouse, HDInsight, Azure Data Lake Analytics, Azure Data Lake Store, Blob storage, and AAS as well as high-level concepts such as when to use a data lake. We will also review the most common reference architectures (“patterns”) witnessed in customer adoption.
So you have a handle on what Big Data is and how you can use it to find business value in your data. Now you need an understanding of the Microsoft products that can be used to create a Big Data solution. Microsoft has many pieces of the puzzle and in this presentation I will show how they fit together. How does Microsoft enhance and add value to Big Data? From collecting data, transforming it, storing it, to visualizing it, I will show you Microsoft’s solutions for every step of the way.
Think of big data as all data, no matter what the volume, velocity, or variety. The simple truth is a traditional on-prem data warehouse will not handle big data. So what is Microsoft’s strategy for building a big data solution? And why is it best to have this solution in the cloud? That is what this presentation will cover. Be prepared to discover all the various Microsoft technologies and products from collecting data, transforming it, storing it, to visualizing it. My goal is to help you not only understand each product but understand how they all fit together, so you can be the hero who builds your company’s big data solution.
As a follow-on to the presentation "Building an Effective Data Warehouse Architecture", this presentation will explain exactly what Big Data is and its benefits, including use cases. We will discuss how Hadoop, the cloud and massively parallel processing (MPP) are changing the way data warehouses are being built. We will talk about hybrid architectures that combine on-premises data with data in the cloud as well as relational data and non-relational (unstructured) data. We will look at the benefits of MPP over SMP and how to integrate data from Internet of Things (IoT) devices. You will learn what a modern data warehouse should look like and how a Data Lake and Hadoop fit in. In the end you will have guidance on the best solution for your data warehouse going forward.
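To make the MPP idea concrete, here is a rough sketch, assuming a simple hash-distribution scheme: rows are spread across several "nodes" by a hash of their key, each node computes a partial aggregate on its own shard, and a final step combines the partials. The names `distribute`, `local_sum`, and `mpp_total` are hypothetical, threads stand in for separate machines, and no particular MPP product is claimed to work exactly this way.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def distribute(rows, node_count):
    """Assign each (key, amount) row to a shard by hashing its key."""
    shards = [[] for _ in range(node_count)]
    for key, amount in rows:
        # crc32 is stable across runs, unlike the built-in hash() for strings.
        shard_index = zlib.crc32(key.encode("utf-8")) % node_count
        shards[shard_index].append((key, amount))
    return shards


def local_sum(shard):
    """Work done independently on one node: aggregate only the local rows."""
    return sum(amount for _, amount in shard)


def mpp_total(rows, node_count):
    """Fan the aggregate out to every shard, then combine the partial sums."""
    shards = distribute(rows, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(local_sum, shards))
    return sum(partials)


if __name__ == "__main__":
    # Hypothetical sales rows: (customer key, amount).
    rows = [("c1", 10), ("c2", 5), ("c3", 7), ("c1", 3)]
    print(mpp_total(rows, node_count=3))
```

An SMP system would run the same aggregate on one machine over all rows; the sketch shows why adding nodes lets each one scan a smaller slice of the data.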
Choosing technologies for a big data solution in the cloud - James Serra
Has your company been building data warehouses for years using SQL Server? And are you now tasked with creating or moving your data warehouse to the cloud and modernizing it to support “Big Data”? What technologies and tools should you use? That is what this presentation will help you answer. First we will cover what questions to ask concerning data (type, size, frequency), reporting, performance needs, on-prem vs cloud, staff technology skills, OSS requirements, cost, and MDM needs. Then we will show you common big data architecture solutions and help you to answer questions such as: Where do I store the data? Should I use a data lake? Do I still need a cube? What about Hadoop/NoSQL? Do I need the power of MPP? Should I build a "logical data warehouse"? What is this lambda architecture? Can I use Hadoop for my DW? Finally, we’ll show some architectures of real-world customer big data solutions. Come to this session to get started down the path to making the proper technology choices in moving to the cloud.
Building an Effective Data Warehouse Architecture - James Serra
Why use a data warehouse? What is the best methodology to use when creating a data warehouse? Should I use a normalized or dimensional approach? What is the difference between the Kimball and Inmon methodologies? Does the new Tabular model in SQL Server 2012 change things? What is the difference between a data warehouse and a data mart? Is there hardware that is optimized for a data warehouse? What if I have a ton of data? During this session James will help you to answer these questions.
First introduced with the Analytics Platform System (APS), PolyBase simplifies management and querying of both relational and non-relational data using T-SQL. It is now available in both Azure SQL Data Warehouse and SQL Server 2016. The major features of PolyBase include the ability to do ad-hoc queries on Hadoop data and the ability to import data from Hadoop and Azure blob storage to SQL Server for persistent storage. A major part of the presentation will be a demo on querying and creating data on HDFS (using Azure Blobs). Come see why PolyBase is the “glue” to creating federated data warehouse solutions where you can query data as it sits instead of having to move it all to one data platform.
The cloud is all the rage. Does it live up to its hype? What are the benefits of the cloud? Join me as I discuss the reasons so many companies are moving to the cloud and demo how to get up and running with a VM (IaaS) and a database (PaaS) in Azure. See why the ability to scale easily, the speed with which you can create a VM, and the built-in redundancy are just some of the reasons that moving to the cloud is a “no brainer”. And if you have an on-prem datacenter, learn how to get out of the air-conditioning business!
Azure Stream Analytics (ASA) is an Azure service that enables real-time insights over streaming data from devices, sensors, infrastructure, and applications. In this presentation, we provide an introduction to the service, common use cases, example customer scenarios, business benefits, and a demo of how to get started. We will quickly build a simple real-time analytics application that uses an IoT device to ingest data (Event Hubs), process and analyze data (Stream Analytics), and visualize data (Power BI).
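One common way to analyze streaming device data is a tumbling-window aggregate: events are grouped into fixed, non-overlapping time windows and summarized per window. The sketch below shows that idea in plain Python as an illustration only; it is not the Stream Analytics query language, and the event layout `(timestamp_seconds, device_id, value)` and the function name `tumbling_window_avg` are assumptions made for the example.

```python
from collections import defaultdict


def tumbling_window_avg(events, window_seconds):
    """Average readings per (window start, device) over fixed, non-overlapping windows."""
    # Each bucket holds a running [total, count] for one window and device.
    buckets = defaultdict(lambda: [0.0, 0])
    for timestamp, device, value in events:
        # Integer division snaps the timestamp to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        bucket = buckets[(window_start, device)]
        bucket[0] += value
        bucket[1] += 1
    return {key: total / count for key, (total, count) in buckets.items()}


if __name__ == "__main__":
    # Hypothetical temperature readings: (seconds since start, device id, value).
    events = [(0, "a", 20.0), (5, "a", 22.0), (12, "a", 30.0), (3, "b", 10.0)]
    for key, average in sorted(tumbling_window_avg(events, 10).items()):
        print(key, average)
```

A streaming service evaluates such windows continuously as events arrive, whereas this sketch processes a finished list; the grouping logic is the same.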
Modern Data Warehousing with the Microsoft Analytics Platform System - James Serra
The traditional data warehouse has served us well for many years, but new trends are causing it to break in four different ways: data growth, fast query expectations from users, non-relational/unstructured data, and cloud-born data. How can you prevent this from happening? Enter the modern data warehouse, which is able to handle and excel with these new trends. It handles all types of data (Hadoop), provides a way to easily interface with all these types of data (PolyBase), and can handle “big data” and provide fast queries. Is there one appliance that can support this modern data warehouse? Yes! It is the Analytics Platform System (APS) from Microsoft (formerly called Parallel Data Warehouse or PDW), which is a Massively Parallel Processing (MPP) appliance that has been recently updated (v2 AU1). In this session I will dig into the details of the modern data warehouse and APS. I will give an overview of the APS hardware and software architecture, identify what makes APS different, and demonstrate the increased performance. In addition I will discuss how Hadoop, HDInsight, and PolyBase fit into this new modern data warehouse.
Relational databases vs Non-relational databases - James Serra
There is a lot of confusion about the place and purpose of the many recent non-relational database solutions ("NoSQL databases") compared to the relational database solutions that have been around for so many years. In this presentation I will first clarify what exactly these database solutions are, compare them, and discuss the best use cases for each. I'll discuss topics involving OLTP, scaling, data warehousing, polyglot persistence, and the CAP theorem. We will even touch on a new type of database solution called NewSQL. If you are building a new solution it is important to understand all your options so you take the right path to success.
Should I move my database to the cloud? - James Serra
So you have been running on-prem SQL Server for a while now. Maybe you have taken the step to move it from bare metal to a VM, and have seen some nice benefits. Ready to see a TON more benefits? If you said “YES!”, then this is the session for you as I will go over the many benefits gained by moving your on-prem SQL Server to an Azure VM (IaaS). Then I will really blow your mind by showing you even more benefits by moving to Azure SQL Database (PaaS/DBaaS). And for those of you with a large data warehouse, I also have you covered with Azure SQL Data Warehouse. Along the way I will talk about the many hybrid approaches so you can take a gradual approach to moving to the cloud. If you are interested in cost savings, additional features, ease of use, quick scaling, improved reliability and ending the days of upgrading hardware, this is the session for you!
Cortana Analytics Suite is a fully managed big data and advanced analytics suite that transforms your data into intelligent action. It is comprised of data storage, information management, machine learning, and business intelligence software in a single convenient monthly subscription. This presentation will cover all the products involved, how they work together, and use cases.
BI is the “Gathering of data from multiple sources to present it in a way that allows executives to make better business decisions”. I will describe in more detail exactly what BI is, what encompasses the Microsoft BI stack, why it is so popular, and why a BI career pays so much. I will review specific examples from previous projects of mine that show the benefits of BI and its huge return-on-investment. I'll go into detail on the components of a BI solution, and I will discuss key concepts for successfully implementing BI in your organization.
DocumentDB is a powerful NoSQL solution. It provides elastic scale, high performance, global distribution, a flexible data model, and is fully managed. If you are looking for a scaled OLTP solution that is too much for SQL Server to handle (i.e. millions of transactions per second) and/or will be using JSON documents, DocumentDB is the answer.
The new Microsoft Azure SQL Data Warehouse (SQL DW) is an elastic data warehouse-as-a-service and is a Massively Parallel Processing (MPP) solution for "big data" with true enterprise class features. The SQL DW service is built for data warehouse workloads from a few hundred gigabytes to petabytes of data with truly unique features like disaggregated compute and storage allowing for customers to be able to utilize the service to match their needs. In this presentation, we take an in-depth look at implementing a SQL DW, elastic scale (grow, shrink, and pause), and hybrid data clouds with Hadoop integration via Polybase allowing for a true SQL experience across structured and unstructured data.
Introduction to Microsoft’s Hadoop solution (HDInsight) - James Serra
Did you know Microsoft provides a Hadoop Platform-as-a-Service (PaaS)? It’s called Azure HDInsight and it deploys and provisions managed Apache Hadoop clusters in the cloud, providing a software framework designed to process, analyze, and report on big data with high reliability and availability. HDInsight uses the Hortonworks Data Platform (HDP) Hadoop distribution that includes many Hadoop components such as HBase, Spark, Storm, Pig, Hive, and Mahout. Join me in this presentation as I talk about what Hadoop is, why deploy to the cloud, and Microsoft’s solution.
Big Data, IoT, data lake, unstructured data, Hadoop, cloud, and massively parallel processing (MPP) are all just fancy words unless you can find use cases for all this technology. Join me as I talk about the many use cases I have seen, from streaming data to advanced analytics, broken down by industry. I’ll show you how all this technology fits together by discussing various architectures and the most common approaches to solving data problems and hopefully set off light bulbs in your head on how big data can help your organization make better business decisions.
Machine learning allows us to build predictive analytics solutions of tomorrow - these solutions allow us to better diagnose and treat patients, correctly recommend interesting books or movies, and even make the self-driving car a reality. Microsoft Azure Machine Learning (Azure ML) is a fully-managed Platform-as-a-Service (PaaS) for building these predictive analytics solutions. It is very easy to build solutions with it, helping to overcome the challenges most businesses have in deploying and using machine learning. In this presentation, we will take a look at how to create ML models with Azure ML Studio and deploy those models to production in minutes.
HA/DR options with SQL Server in Azure and hybrid - James Serra
What are all the high availability (HA) and disaster recovery (DR) options for SQL Server in an Azure VM (IaaS)? Which of these options can be used in a hybrid combination (Azure VM and on-prem)? I will cover features such as AlwaysOn AG, Failover cluster, Azure SQL Data Sync, Log Shipping, SQL Server data files in Azure, Mirroring, Azure Site Recovery, and Azure Backup.
Apache Hadoop and Spark are best-of-breed technologies for distributed processing and storage of very large data sets: Big Data. Join us as we explain how to integrate Salesforce with off-the-shelf big data tools to build flexible applications. You'll also learn how Force.com is evolving in this area and how Big Objects and Data Pipelines will provide Big Data capability within the platform.
Enable Better Decision Making with Power BI Visualizations & Modern Data Estate - CCG
Self-service BI empowers users to reach analytic outputs through data visualizations and reporting tools. Solution Architect and Cloud Solution Specialist, James McAuliffe, will be taking you through a journey of Azure's Modern Data Estate.
Logical Data Warehouses and Data Lakes can play a role in many different types of projects and, in this presentation, we will look at some of the most common patterns and use cases. Learn about analytical and big data patterns as well as performance considerations. Example implementations will be discussed for each pattern.
- Architectural patterns for logical data warehouse and data lakes.
- Performance considerations.
- Customer use cases and demo.
This presentation is part of the Denodo Educational Seminar, and you can watch the video here goo.gl/vycYmZ.
Architecting for Big Data: Trends, Tips, and Deployment Options - Caserta
Joe Caserta, President at Caserta Concepts, addressed the challenges of Business Intelligence in the Big Data world at the Third Annual Great Lakes BI Summit in Detroit, MI on Thursday, March 26. His talk, "Architecting for Big Data: Trends, Tips and Deployment Options," focused on how to supplement your data warehousing and business intelligence environments with big data technologies.
For more information on this presentation or the services offered by Caserta Concepts, visit our website: http://casertaconcepts.com/.
DAMA Webinar: Turn Grand Designs into a Reality with Data Virtualization - Denodo
Watch full webinar here: https://buff.ly/2HMdbUp
What started to evolve as the most agile and real-time enterprise data fabric, data virtualization is proving to go beyond its initial promise and is becoming one of the most important enterprise big data fabrics.
Attend this session to learn:
• What data virtualization really is,
• How it differs from other enterprise data integration technologies
• Real-world examples of data virtualization in action from companies such as Logitech, Autodesk and Festo.
What is OLAP - Data Warehouse Concepts - IT Online Training @ Newyorksys - NEWYORKSYS-IT SOLUTIONS
NEWYORKSYSTRAINING are destined to offer quality IT online training and comprehensive IT consulting services with complete business service delivery orientation.
Human beings have an ability for exploring the world around them and finding what they need; Big Data Analytics & Machine Learning can help us take this ability up a notch.
Sponsored by Data Transformed, the KNIME Meetup was a big success. Please find the slides for Dan's, Tom's, Anand's and Chhitesh's presentations.
Agenda:
Registration & Networking
Keynote – Dan Cox, CEO of Data Transformed
KNIME & Harvest Analytics – Tom Park
Office of State Revenue Case Study – Anand Antony
Using Spark with KNIME – Chhitesh Shrestha
Networking & Drinks
Microsoft Fabric is the next version of Azure Data Factory, Azure Data Explorer, Azure Synapse Analytics, and Power BI. It brings all of these capabilities together into a single unified analytics platform that goes from the data lake to the business user in a SaaS-like environment. The vision of Fabric is therefore to be a one-stop shop for all the analytical needs of every enterprise and one platform for everyone from a citizen developer to a data engineer. Fabric will cover the complete spectrum of services including data movement, data lake, data engineering, data integration and data science, observational analytics, and business intelligence. With Fabric, there is no need to stitch together different services from multiple vendors. Instead, the customer enjoys an end-to-end, highly integrated, single offering that is easy to understand, onboard, create and operate.
This is a hugely important new product from Microsoft and I will simplify your understanding of it via a presentation and demo.
Agenda:
What is Microsoft Fabric?
Workspaces and capacities
OneLake
Lakehouse
Data Warehouse
ADF
Power BI / DirectLake
Resources
Data Lakehouse, Data Mesh, and Data Fabric (r2) - James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a modern data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. They all may sound great in theory, but I'll dig into the concerns you need to be aware of before taking the plunge. I’ll also include use cases so you can see what approach will work best for your big data needs. And I'll discuss Microsoft's version of the data mesh.
Data Warehousing Trends, Best Practices, and Future Outlook - James Serra
Over the last decade, the 3Vs of data (Volume, Velocity & Variety) have grown massively. The Big Data revolution has completely changed the way companies collect, analyze & store data. Advancements in cloud-based data warehousing technologies have empowered companies to fully leverage big data without heavy investments in either time or resources. But that doesn’t mean building and managing a cloud data warehouse comes without challenges. From deciding on a service provider to the design architecture, deploying a data warehouse tailored to your business needs is a strenuous undertaking. Looking to deploy a data warehouse to scale your company’s data infrastructure, or still on the fence? In this presentation you will gain insights into the current Data Warehousing trends, best practices, and future outlook. Learn how to build your data warehouse with the help of real-life use cases and discussion of commonly faced challenges. In this session you will learn:
- Choosing the best solution - Data Lake vs. Data Warehouse vs. Data Mart
- Choosing the best Data Warehouse design methodologies: Data Vault vs. Kimball vs. Inmon
- Step by step approach to building an effective data warehouse architecture
- Common reasons for the failure of data warehouse implementations and how to avoid them
Azure Synapse Analytics is Azure SQL Data Warehouse evolved: a limitless analytics service, that brings together enterprise data warehousing and Big Data analytics into a single service. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs. This is a huge deck with lots of screenshots so you can see exactly how it works.
Power BI Overview, Deployment and Governance - James Serra
Deploying Power BI in a large enterprise is a complex task, and one that requires a lot of thought and planning. The purpose of this presentation is to help you make your Power BI deployment a success. After a quick Power BI overview, I’ll discuss deployment strategies, common usage scenarios, how to store and refresh data, prototyping options, how to share externally, and then finish with how to administer and secure Power BI. I’ll outline considerations and best practices for achieving an optimal, well-performing, enterprise level Power BI deployment.
Power BI has become a product with a ton of exciting features. This presentation will give an overview of some of them, including Power BI Desktop, Power BI service, what’s new, integration with other services, Power BI premium, and administration.
The breadth and depth of Azure products that fall under the AI and ML umbrella can be difficult to follow. In this presentation I’ll first define exactly what AI, ML, and deep learning are, and then go over the various Microsoft AI and ML products and their use cases.
Embarking on building a modern data warehouse in the cloud can be an overwhelming experience due to the sheer number of products that can be used, especially when the use cases for many products overlap others. In this talk I will cover the use cases of many of the Microsoft products that you can use when building a modern data warehouse, broken down into four areas: ingest, store, prep, and model & serve. It’s a complicated story that I will try to simplify, giving blunt opinions of when to use what products and the pros/cons of each.
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag... - James Serra
Discover, manage, deploy, monitor – rinse and repeat. In this session we show how Azure Machine Learning can be used to create the right AI model for your challenge and then easily customize it using your development tools while relying on Azure ML to optimize them to run in hardware accelerated environments for the cloud and the edge using FPGAs and Neural Network accelerators. We then show you how to deploy the model to highly scalable web services and nimble edge applications that Azure can manage and monitor for you. Finally, we illustrate how you can leverage the model telemetry to retrain and improve your content.
Power BI for Big Data and the New Look of Big Data Solutions - James Serra
New features in Power BI give it enterprise tools, but that does not mean it automatically creates an enterprise solution. In this talk we will cover these new features (composite models, aggregations tables, dataflow) as well as Azure Data Lake Store Gen2, and describe the use cases and products of an individual, departmental, and enterprise big data solution. We will also talk about why a data warehouse and cubes still should be part of an enterprise solution, and how a data lake should be organized.
In three years I went from a complete unknown to a popular blogger, speaker at PASS Summit, a SQL Server MVP, and then joined Microsoft. Along the way I saw my yearly income triple. Is it because I know some secret? Is it because I am a genius? No! It is just about laying out your career path, setting goals, and doing the work.
I'll cover tips I learned over my career on everything from interviewing to building your personal brand. I'll discuss perm positions, consulting, contracting, working for Microsoft or partners, hot fields, in-demand skills, social media, networking, presenting, blogging, salary negotiating, dealing with recruiters, certifications, speaking at major conferences, resume tips, and keys to a high-paying career.
Your first step to enhancing your career will be to attend this session! Let me be your career coach!
Is the traditional data warehouse dead? - James Serra
With new technologies such as Hive LLAP or Spark SQL, do I still need a data warehouse or can I just put everything in a data lake and report off of that? No! In the presentation I’ll discuss why you still need a relational data warehouse and how to use a data lake and an RDBMS data warehouse to get the best of both worlds. I will go into detail on the characteristics of a data lake and its benefits and why you still need data governance tasks in a data lake. I’ll also discuss using Hadoop as the data lake, data virtualization, and the need for OLAP in a big data solution. And I’ll put it all together by showing common big data architectures.
Databricks is a Software-as-a-Service-like experience (or Spark-as-a-service): a tool for curating and processing massive amounts of data, for developing, training and deploying models on that data, and for managing the whole workflow process throughout the project. It is for those who are comfortable with Apache Spark, as it is 100% based on Spark and is extensible with support for Scala, Java, R, and Python alongside Spark SQL, GraphX, Streaming and the Machine Learning Library (MLlib). It has built-in integration with many data sources, has a workflow scheduler, allows for real-time workspace collaboration, and has performance improvements over traditional Apache Spark.
Azure SQL Database Managed Instance is a new flavor of Azure SQL Database that is a game changer. It offers near-complete SQL Server compatibility and network isolation to easily lift and shift databases to Azure (you can literally back up an on-premises database and restore it into an Azure SQL Database Managed Instance). Think of it as an enhancement to Azure SQL Database that is built on the same PaaS infrastructure and maintains all its features (e.g. active geo-replication, high availability, automatic backups, database advisor, threat detection, intelligent insights, vulnerability assessment) but adds support for databases up to 35TB, VNET, SQL Agent, cross-database querying, replication, etc. So, you can migrate your databases from on-prem to Azure with very little migration effort, which is a big improvement over the current Singleton or Elastic Pool flavors, which can require substantial changes.
Microsoft Data Platform - What's included - James Serra
The pace of Microsoft product innovation is so fast that even though I spend half my days learning, I struggle to keep up. And as I work with customers I find they are often in the dark about many of the products that we have, since they are focused on just keeping what they have running and putting out fires. So, let me cover what products you might have missed in the Microsoft data platform world. Be prepared to discover all the various Microsoft technologies and products for collecting data, transforming it, storing it, and visualizing it. My goal is to help you not only understand each product but also understand how they all fit together and their proper use cases, allowing you to build the appropriate solution that can incorporate any data in the future no matter the size, frequency, or type. Along the way we will touch on technologies covering NoSQL, Hadoop, and open source.
Learning to present and becoming good at it - James Serra
Have you been thinking about presenting at a user group? Are you being asked to present at your work? Is learning to present one of the keys to advancing your career? Or do you just think it would be fun to present but you are too nervous to try it? Well, take the first step to becoming a presenter by attending this session and I will guide you through the process of learning to present and becoming good at it. It’s easier than you think! I am an introvert and was deathly afraid to speak in public. Now I love to present and it’s actually my main function in my job at Microsoft. I’ll share with you the journey that led me to speak at major conferences and the skills I learned along the way to become a good presenter and to get rid of the fear. You can do it!
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... - DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
Essentials of Automations: Optimizing FME Workflows with Parameters - Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
The Art of the Pitch: WordPress Relationships and Sales - Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
JMeter webinar - integration with InfluxDB and Grafana - RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Search and Society: Reimagining Information Access for Radical Futures - Bhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build, inspired by diverse, explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies need to be explicitly articulated, and we need to develop theories of change in the context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Neuro-symbolic is not enough, we need neuro-*semantic* - Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this is illustrated with link prediction over knowledge graphs, but the argument is general.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti... - Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf - 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
1. Finding business value in Big Data
“What exactly is Big Data and why should I care?”
James Serra
Big Data Evangelist
Microsoft
JamesSerra3@gmail.com
2. Other Presentations
Building an Effective Data Warehouse Architecture
Reasons for building a DW and the various approaches and DW concepts (Kimball vs Inmon)
Building a Big Data Solution (Building an Effective Data Warehouse Architecture with Hadoop, the cloud and MPP)
Explains what Big Data is, its benefits including use cases, and how Hadoop, the cloud, and MPP fit in
Finding business value in Big Data (What exactly is Big Data and why should I care?)
Very similar to “Building a Big Data Solution” but target audience is business users/CxO instead of architects
How does Microsoft solve Big Data?
Covers the Microsoft products that can be used to create a Big Data solution
Modern Data Warehousing with the Microsoft Analytics Platform System
The next step in data warehouse performance is APS, an MPP appliance
Power BI, Azure ML, Azure HDInsight, Azure Data Factory, etc.
Deep dives into the various Microsoft Big Data related products
3. About Me
Business Intelligence Consultant, in IT for 28 years
Microsoft, Big Data Evangelist
Owner of Serra Consulting Services, specializing in end-to-end Business Intelligence and Data Warehouse solutions using the Microsoft BI stack
Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM architect, PDW developer
Been perm, contractor, consultant, business owner
Presenter at PASS Business Analytics Conference and PASS Summit
MCSE for SQL Server 2012: Data Platform and BI
SME for SQL Server 2012 certs
Contributing writer for SQL Server Pro magazine
Blog at JamesSerra.com
SQL Server MVP
Author of book “Reporting with Microsoft SQL Server 2012”
4. I tried understanding Big Data…
And ended up passed-out drunk in a Denny’s parking lot
Let’s prevent that from happening…
5. Agenda
Overview of Big Data and Analytics
Use cases
Data Lake
Hadoop and its role
IoT and real-time data
Modern data warehouse
Federated querying
Data warehouse and the cloud
Symmetric Multiprocessing (SMP) vs. Massively Parallel Processing (MPP)
8. What is Big Data, really?
Data in all forms & sizes is being generated faster than ever before
Capture & combine it for new insights & better, faster decisions
9. Harness the growing and changing nature of data
Collect any data: structured, unstructured, streaming
Challenge is combining transactional data stored in relational databases with less structured data
Big Data = All Data
“Get the right information to the right people at the right time in the right format”
10. An illustration of the velocity of data created
Kalakota, R. (2012, October 22). Sizing “Mobile + Social” Big Data Stats. Retrieved from http://practicalanalytics.wordpress.com/
12. Complex implementations
Enterprise data warehouse
Spreadmarts
Siloed data
Hadoop
Dashboards
Ad hoc analysis
Machine learning
OLAP
Any data
In-memory
Internet of Things
Innovation
Transactional systems
ETL
Operational reporting
Value
Technology innovation accelerates value
14. Put data to work for everyone in your organization
Inspire innovation
Accelerate decision-making
Learn from & share insights
15. Embrace Big Data across your business
Sales: Improve revenue performance
HR: Maximize employee engagement
Marketing: Build deeper customer relationships
Finance: Impact your company’s bottom line
[Figure: sample dashboard with panels “Units Sold, Discounts, and Profit before Tax”, “Revenue and Target by Region”, “Departments Headcount”, and “XT2000 Status List”]
16. The Data Divide
80% of data stored
70% of data generated by customers
<0.5% being operationalized
0.5% being analyzed
3% prepared for analysis
IDC says that right now, about 22% of data is useful. By 2020 that number will climb to 37%.
17. Major Fail
Gartner: “Through 2017, 60% of big-data projects will fail to go beyond piloting and experimentation”
Paradigm4: 76% of those who have used Hadoop or Apache Spark complained of significant limitations
18. Analytics Solution
Capture and integrate data from multiple internal and external sources
Derive insight from data with rich, interactive dashboards and reports using the tools you know
Put insight into action to increase efficiency and constituent satisfaction
24. Data Analytics is needed everywhere
Recommendation engines
Smart meter monitoring
Equipment monitoring
Advertising analysis
Life sciences research
Fraud detection
Healthcare outcomes
Weather forecasting for business planning
Oil & Gas exploration
Social network analysis
Churn analysis
Traffic flow optimization
IT infrastructure & Web App optimization
Legal discovery and document archiving
Intelligence Gathering
Location-based tracking & services
Pricing Analysis
Personalized Insurance
25. Personalized Insurance
Personalized policies can reduce costs & better meet customer needs
Insurance companies can help (and some have already started helping) their customers with truly personalized insurance plans tailored to their needs and risks
Insurance companies can collect real-time data from in-car sensors and combine it with geolocation and in-house systems. With information such as distance and speed, they can provide personalized insurance offers based on driving amount, risk, and other factors, for a truly personalized plan that may often save drivers money
$1,600/yr.: US national avg. car insurance premium
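As a rough illustration of how distance and speed data might feed a personalized offer, here is a minimal Python sketch. The $1,600 base figure comes from the slide; the 12,000-mile baseline, the clamping bounds, and the speeding weight are purely hypothetical assumptions chosen for illustration, not how any insurer actually prices a policy.

```python
from dataclasses import dataclass

BASE_PREMIUM = 1600.0  # US national average from the slide, dollars per year


@dataclass
class DrivingProfile:
    miles_per_year: float           # aggregated from in-car sensor data
    share_of_miles_speeding: float  # fraction of miles driven over the limit, 0.0 to 1.0


def personalized_premium(profile: DrivingProfile, base: float = BASE_PREMIUM) -> float:
    """Scale the base premium by driving amount and a simple risk signal.

    All weights below are hypothetical and exist only to show the idea.
    """
    # Driving amount: compare to an assumed 12,000-mile year, clamped to 0.5x..1.5x.
    mileage_factor = min(max(profile.miles_per_year / 12000.0, 0.5), 1.5)
    # Risk: an assumed surcharge of up to 50% for drivers who always speed.
    risk_factor = 1.0 + 0.5 * profile.share_of_miles_speeding
    return round(base * mileage_factor * risk_factor, 2)


low_mileage_driver = DrivingProfile(miles_per_year=6000, share_of_miles_speeding=0.0)
typical_driver = DrivingProfile(miles_per_year=12000, share_of_miles_speeding=0.2)
print(personalized_premium(low_mileage_driver))
print(personalized_premium(typical_driver))
```

The careful, low-mileage driver ends up below the average premium, which is the "may often save drivers money" point from the slide.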
26. Recommendation Engines
Significantly improve up-sell and cross-sell opportunities
Retailers can use customer purchase & rating information to serve recommendations to current customers, based on similarities across many dimensions
The vast amount of current and ever-growing customer purchase, rating and click data can all be collected and managed with a Hadoop-based solution to pinpoint preferences based on purchase history and demographics, and to serve useful and compelling cross-sell and up-sell recommendations.
158 items sold/second by Amazon.com on 11/29/2010 (Cyber Monday)
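To make the idea of recommending "based on similarities" concrete, here is a minimal Python sketch of one simple technique, item co-occurrence counting ("customers who bought X also bought Y"). It is a toy under stated assumptions: the purchase data and function names are hypothetical, everything runs in memory, and a production system on Hadoop would use far more data and signals (ratings, clicks, demographics).

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase history: customer -> set of items bought.
PURCHASES = {
    "ann": {"tent", "stove", "lantern"},
    "bob": {"tent", "stove"},
    "cat": {"tent", "lantern"},
    "dan": {"stove", "kettle"},
}


def build_cooccurrence(purchases):
    """Count how often each ordered pair of items appears in the same basket."""
    counts = Counter()
    for basket in purchases.values():
        for a, b in combinations(sorted(basket), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def recommend(customer, purchases, counts, top_n=2):
    """Suggest items the customer does not own, ranked by co-occurrence with owned items."""
    owned = purchases[customer]
    scores = Counter()
    for (a, b), n in counts.items():
        if a in owned and b not in owned:
            scores[b] += n
    return [item for item, _ in scores.most_common(top_n)]


counts = build_cooccurrence(PURCHASES)
print(recommend("bob", PURCHASES, counts))
```

Bob owns a tent and a stove, so the lantern (bought alongside both by other customers) ranks ahead of the kettle as a cross-sell suggestion.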
27. Pricing Analysis
Significantly improve sales and customer satisfaction
Retailers can use customer past purchase, preference, and demographic information to serve real-time custom pricing and instant discounts when customers are near the store.
Retailers, whether large, small, online or in-store, can improve margins with more detailed pricing analysis. When a customer is in range of a transaction (either in the store, online or perhaps passing by), offer personalized offers, real-time price quotes, or other frequent-buyer perks to help bring more customers to the store and improve repeat business.
Up to 30%: additional price Mac users accepted for travel from Orbitz
28. Using data from the Weather Channel, Walmart can create targeted ads based on local weather, products in their nearby stores, and seasonal consumer desires. Walmart increased berry and steak sales as much as threefold when weather-targeted ads were run
31. What is a data lake?
A storage repository, usually Hadoop, that holds a vast amount of raw data in its native
format until it is needed.
• A place to store unlimited amounts of data in any format inexpensively
• Allows collection of data that you may or may not use later: “just in case”
• A way to describe any large data pool in which the schema and data requirements are not
defined until the data is queried: “just in time” or “schema on read”
• Complements EDW and can be seen as a data source for the EDW – capturing all data but
only passing relevant data to the EDW
• Frees up expensive EDW resources (storage and processing), especially for data refinement
• Allows for data exploration to be performed without waiting for the EDW team to model
and load the data
• Some processing is better done on Hadoop than with ETL tools like SSIS
• Also called bit bucket, staging area, landing zone or enterprise data hub (Cloudera)
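The "schema on read" idea above can be sketched in a few lines: raw records are stored untouched, and each consumer applies its own schema only when it queries. The sample records, field names, and the `read_with_schema` helper are hypothetical and stand in for files in a lake; this is an illustration, not how any particular Hadoop tool implements it.

```python
import json

# Raw events land in the lake exactly as they arrived (hypothetical sample data).
raw_lines = [
    '{"device": "car-17", "speed_kmh": 92, "ts": "2015-03-01T08:00:00"}',
    '{"device": "car-17", "speed_kmh": 130, "ts": "2015-03-01T08:05:00", "gps": [40.7, -74.0]}',
    '{"user": "u42", "clicked": "sku-9"}',
]

def read_with_schema(lines, schema):
    """Apply a schema at query time: keep records that have every field, cast each value."""
    rows = []
    for line in lines:
        record = json.loads(line)
        if all(field in record for field in schema):
            rows.append({field: cast(record[field]) for field, cast in schema.items()})
    return rows

# Two consumers read the same raw data with different schemas.
speed_schema = {"device": str, "speed_kmh": float}
click_schema = {"user": str, "clicked": str}

speeds = read_with_schema(raw_lines, speed_schema)
clicks = read_with_schema(raw_lines, click_schema)
print(speeds)
print(clicks)
```

Nothing had to be modeled before the data was stored; the extra `gps` field and the unrelated click record simply sit in the lake until some schema asks for them.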
32. Current state of a data warehouse
Traditional Approaches
DATA SOURCES (CRM, ERP, OLTP, LOB) → ETL → DATA WAREHOUSE → BI AND ANALYTICS
DATA WAREHOUSE: Star schemas, views, other read-optimized structures
BI AND ANALYTICS: Emailed, centrally stored Excel reports and dashboards
• Well manicured, often relational sources
• Known and expected data volume and formats
• Little to no change
• Complex, rigid transformations
• Required extensive monitoring
• Transformed historical into read structures
• Flat, canned or multi-dimensional access to historical data
• Many reports, multiple versions of the truth
• 24 to 48h delay
MONITORING AND TELEMETRY
33. Current state of a data warehouse
Traditional Approaches
DATA SOURCES (CRM, ERP, OLTP, LOB) → ETL → DATA WAREHOUSE → BI AND ANALYTICS
DATA WAREHOUSE: Star schemas, views, other read-optimized structures
BI AND ANALYTICS: Emailed, centrally stored Excel reports and dashboards
• Increase in variety of data sources
• Increase in data volume
• Increase in types of data
• Pressure on the ingestion engine
• Complex, rigid transformations can no longer keep pace
• Monitoring is abandoned
• Delay in data, inability to transform volumes, or react to new sources
• Repair, adjust and redesign ETL
• Reports become invalid or unusable
• Delay in preserved reports increases
• Users begin to “innovate” to relieve starvation
MONITORING AND TELEMETRY
INCREASING DATA VOLUME, NON-RELATIONAL DATA
INCREASE IN TIME
STALE REPORTING
34. Data Lake Transformation (ELT not ETL)
New Approaches
• All data sources are considered
• Leverages the power of on-prem technologies and the cloud for storage and capture
• Native formats, streaming data, big data
• Extract and load, no/minimal transform
• Storage of data in near-native format
• Orchestration becomes possible
• Streaming data accommodation becomes possible
• Refineries transform data on read
• Produce curated data sets to integrate with traditional warehouses
• Users discover published data sets/services using familiar tools
DATA SOURCES: CRM, ERP, OLTP, LOB, FUTURE DATA SOURCES, NON-RELATIONAL DATA
EXTRACT AND LOAD
DATA LAKE
DATA REFINERY PROCESS (TRANSFORM ON READ): Transform relevant data into data sets
OTHER REFINERY PROCESSES
DATA WAREHOUSE: Star schemas, views, other read-optimized structures
BI AND ANALYTICS: Discover and consume predictive analytics, data sets and other reports
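The ELT flow on this slide (extract and load first, transform later in a "refinery") might be sketched as below. The in-memory `data_lake` dictionary, the source name `pos_export`, and both function names are assumptions made for illustration; a real lake would be HDFS or cloud storage, and the refinery would be a Hadoop or similar job.

```python
import csv
import io
from collections import defaultdict

# The "lake": raw payloads kept as-is, keyed by a hypothetical source name.
data_lake = {}

def extract_and_load(source_name, raw_text):
    """E and L: store the payload untouched; no cleansing or modeling yet."""
    data_lake[source_name] = raw_text

def refine_sales_by_region(source_name):
    """T, on read: parse the raw CSV and build a curated, aggregated data set."""
    totals = defaultdict(float)
    reader = csv.DictReader(io.StringIO(data_lake[source_name]))
    for row in reader:
        region = row["region"].strip().upper()  # normalize at read time
        if not row["amount"]:                   # skip rows with a missing amount
            continue
        totals[region] += float(row["amount"])
    return dict(totals)

extract_and_load("pos_export", "region,amount\neast,10.5\nEast ,4.5\nwest,\nwest,7\n")
curated = refine_sales_by_region("pos_export")
print(curated)
```

Because the raw text is never altered on load, a different refinery could later reprocess the same payload with different rules, and the curated result is what would be handed on to the data warehouse.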
36. What is Hadoop?
Distributed, scalable system on commodity HW
Composed of a few parts:
HDFS – Distributed file system
MapReduce – Programming model
Other tools: Hive, Pig, Sqoop, HCatalog, HBase,
Flume, Mahout, YARN, Tez, Spark, Stinger, Oozie,
ZooKeeper, Storm
Main players are Hortonworks, Cloudera, MapR
WARNING: Hadoop, while ideal for processing huge
volumes of data, is inadequate for analyzing that
data in real time (companies do batch analytics
instead)
[Diagram: Hadoop stack, grouped into Core Services, Operational Services, and Data Services. Components shown: HDFS, YARN, MapReduce, Hive & HCatalog, Pig, HBase, Falcon, Oozie, Ambari. Load & Extract: Sqoop, Flume, NFS, WebHDFS. Hadoop Cluster: multiple compute & storage nodes.]
Hadoop clusters provide
scale-out storage and
distributed data processing
on commodity hardware
37. Hortonworks Data Platform 2.2
Simply put, Hortonworks ties all the open source products together (20)
38. The real cost of Hadoop
http://www.wintercorp.com/tcod-report/
39. Use cases using Hadoop and a DW in combination
Bringing islands of Hadoop data together
Archiving data warehouse data to Hadoop (move)
(Hadoop as cold storage)
Exporting relational data to Hadoop (copy)
(Hadoop as backup/DR, analysis, cloud use)
Importing Hadoop data into data warehouse (copy)
(Hadoop as staging area, sandbox, Data Lake)
41. What is the Internet of Things?
Connectivity, Data, Analytics, Things
IoT = sensor-acquired data
42. What is the Internet of Things (IoT)?
Internet-connected devices that can perceive the environment in some way, share their data, and communicate with
you. IoT is just a catch-all term for ways of using machine-generated data to create something useful.
- Has its own processor and sensor to collect information
- Examples: heart monitoring implants, biochip transponders on farm animals, automobiles with built-in sensors,
field operation devices that assist firefighters in search and rescue
- Excludes computers, tablets, and smart phones
- But it’s in the sphere of business intelligence that IoT will really make a difference.
Cool possibilities
- When a milk carton is almost empty it will ping you when you are near a store
- An alarm clock that signals your coffee maker to start brewing when you wake up
- An embedded chip that monitors your vital signs and notifies a medical provider if they exceed a limit
Gartner: 10 billion devices connected to the internet today, 26B by 2020
At some point in the future, nearly every manmade object will contain a device that transmits data!
44. Modern Data Warehouse
Think about future needs:
• Increasing data volumes
• Real-time performance
• New data sources and types
• Cloud-born data
• Multi-platform solution
• Hybrid architecture
49. Federated Querying
Other names: Data virtualization, logical data warehouse, data
federation, virtual database, and decentralized data warehouse.
A model that allows a single query to retrieve and combine data where it sits across
multiple data sources, so there is no need to use ETL or learn more
than one retrieval technology
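A toy version of the idea: one function answers a single logical question by reading a relational source and a non-relational source in place, without first copying either into the other. The in-memory SQLite database stands in for an EDW, the list of documents stands in for a Hadoop or NoSQL store, and `federated_page_views` is a hypothetical name; real federation products push this work into the query engine itself.

```python
import sqlite3

# Source 1: a relational store (in-memory SQLite stands in for an EDW).
edw = sqlite3.connect(":memory:")
edw.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
edw.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

# Source 2: a non-relational store (a list of documents stands in for Hadoop/NoSQL).
clickstream = [
    {"customer_id": 1, "page": "/pricing"},
    {"customer_id": 1, "page": "/checkout"},
    {"customer_id": 2, "page": "/home"},
]

def federated_page_views(relational_conn, documents):
    """Answer one logical question by reading each source where it sits."""
    views = {}
    for doc in documents:
        views[doc["customer_id"]] = views.get(doc["customer_id"], 0) + 1
    rows = relational_conn.execute("SELECT id, name FROM customers ORDER BY id")
    return [(name, views.get(cust_id, 0)) for cust_id, name in rows]

result = federated_page_views(edw, clickstream)
print(result)
```

The caller sees one result set and never needs to know that two different retrieval technologies were involved, which is the point of the federated model.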
50. Select… Result set
Federated Querying
Relational
Data
DB2
Oracle
MongoDB
SQL Server
Query Model
Non-
Relational
Data
Cloudera CDH Linux
Hortonworks HDP
Windows Azure
HDInsight
EDW
52. Can I use the cloud with my DW?
• Public and private cloud
• Cloud-born data vs on-prem born data
• Transfer cost from/to cloud and on-prem
• Sensitive data on-prem, non-sensitive in cloud
• Look at hybrid solutions
55. SMP vs MPP
MPP - Massively Parallel Processing
• Uses many separate CPUs running in parallel to execute a single program
• Shared Nothing: Each CPU has its own memory and disk (scale-out)
• Segments communicate using high-speed network between nodes
SMP - Symmetric Multiprocessing
• Multiple CPUs used to complete individual processes simultaneously
• All CPUs share the same memory, disks, and network controllers (scale-up)
• All SQL Server implementations up until now have been SMP
• Mostly, the solution is housed on a shared SAN
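The shared-nothing pattern can be illustrated with a small simulation: rows are hash-distributed across nodes, each node aggregates only its own slice, and a control step merges the partial results. `NODE_COUNT`, the sample sales rows, and the function names are hypothetical, and the nodes run one after another here rather than in parallel, so this shows the data flow only, not the speed-up.

```python
from collections import defaultdict

NODE_COUNT = 4  # hypothetical number of compute nodes

def distribute(rows, key, node_count=NODE_COUNT):
    """Hash-distribute rows so each node owns a disjoint slice (shared nothing)."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        nodes[hash(row[key]) % node_count].append(row)
    return nodes

def local_aggregate(node_rows):
    """Each node sums only its own rows; no node reads another node's data."""
    partial = defaultdict(float)
    for row in node_rows:
        partial[row["store_id"]] += row["amount"]
    return partial

def gather(partials):
    """The control node merges the small partial results into the final answer."""
    total = defaultdict(float)
    for partial in partials:
        for store_id, amount in partial.items():
            total[store_id] += amount
    return dict(total)

sales = [{"store_id": i % 5, "amount": float(i)} for i in range(20)]
nodes = distribute(sales, "store_id")
result = gather(local_aggregate(n) for n in nodes)
print(result)
```

Adding capacity in this model means adding nodes (scale-out); in an SMP system the same query would run against one shared pool of memory and disk, so growth means a bigger server (scale-up).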
56. DW SCALABILITY SPIDER CHART
[Spider chart. Axes and scale labels:]
• “Data Volume”: 10 TB, 50 TB, 100 TB, 500 TB, 5 PB
• “Query Concurrency”: 100, 1,000, 10,000
• “Query Complexity”: 3-5 way joins; 5-10 way joins; joins + OLAP operations + aggregation + complex “Where” constraints + views + parallelism
• “Schema Sophistication”: simple star; multiple, integrated stars; multiple, integrated stars and normalized; normalized
• “Query Data Volume”: MB’s, GB’s, TB’s
• “Query Freedom”: batch reporting, repetitive queries; ad hoc queries; data analysis/mining
• “Data Freshness”: weekly load; daily load; near real time data feeds
• “Mixed Workload”: strategic; strategic, tactical; strategic, tactical, loads; strategic, tactical, loads, SLA
MPP – Multidimensional scalability
SMP – Tunable in one dimension at the cost of other dimensions
The spiderweb depicts important attributes to consider when evaluating data warehousing options. Big Data support is the newest dimension.
57. When do you need an MPP solution?
• We need at least 3x query performance improvement
• We are near disk capacity and see a lot of growth in the upcoming years
• We need to support queries during our maintenance window
• We need to load data outside of our maintenance window
• We will spend a lot of money on FusionIO cards, SSDs, more SAN space, more
memory, faster CPUs, clustering
59. Summary
• We live in an increasingly data-intensive world
• Much of the data stored online and analyzed today is more varied than the data
stored in recent years
• More of our data arrives in near-real time
This presents a large business opportunity. Are you ready for it?
60. Resources
The Modern Data Warehouse: http://bit.ly/1xuX4Py
Fast Track Data Warehouse Reference Architecture for SQL Server 2014: http://bit.ly/1xuX9m6
Should you move your data to the cloud? http://bit.ly/1xuXbKU
Presentation slides for Modern Data Warehousing: http://bit.ly/1xuXcP5
Presentation slides for Building an Effective Data Warehouse Architecture: http://bit.ly/1xuXeX4
Hadoop and Data Warehouses: http://bit.ly/1xuXfu9
What is the Microsoft Analytics Platform System (APS)? http://bit.ly/1xuXipO
Parallel Data Warehouse (PDW) benefits made simple: http://bit.ly/1xuXlSy
What is Advanced Analytics? http://bit.ly/1LDklkB
Azure Data Lake http://bit.ly/1LDkqEN
61. Q & A ?
James Serra, Big Data Evangelist
Email me at: JamesSerra3@gmail.com
Follow me at: @JamesSerra
Link to me at: www.linkedin.com/in/JamesSerra
Visit my blog at: JamesSerra.com (where this slide deck is posted under “Presentations”)
Editor's Notes
Many sources and many data marts (spaghetti code), different update frequencies, different variations of dimensions
One version of truth story: different departments using different financial formulas to help bonus
This leads to reasons to use BI. This is used to convince your boss of need for DW
Note that you still want to do some reporting off of source system (i.e. current inventory counts).
It’s important to know upfront if data warehouse needs to be updated in real-time or very frequently as that is a major architectural decision
JD Edwards has table names like T117
Key goal of slide: To convey what every IT person knows: the data warehouse and what it’s for. Then we set up the Gartner quote to say that there is a tipping point. End the slide with a question: Why is it at a tipping point?
Slide talk track:
What is the “traditional” data warehouse?
IT professionals know this well. A data warehouse, or enterprise data warehouse, is a database designed specifically for data analysis. It is the single source of truth, the central repository for all data in the company. This means disparate data from your transactional systems, ERP, CRM, or line-of-business applications is extracted, transformed, cleansed, and loaded into the warehouse. It was built so that people accessing the warehouse with BI tools see data that has been provisioned by IT and represents accurate data sanctioned by the company.
However, this traditional data warehouse is reaching an inflection point. Gartner, in its analysis of the state of data warehousing, noted that it is reaching the most significant tipping point since its inception. The question is why? What is going on?
Key goal of slide: Communicate what Hadoop is
Slide talk track:
Everyone has heard of Hadoop. But what is it? And do I need it? Apache Hadoop is an open-source solution framework that supports data-intensive distributed applications on large clusters of commodity hardware.
Hadoop is composed of a few parts:
HDFS – Hadoop Distributed File System is Hadoop’s file-system which stores large files (from gigabytes to terabytes) across multiple machines
MapReduce – a programming model that performs filtering, sorting, and other data retrieval operations using a parallel, distributed algorithm.
Other parts of the Hadoop ecosystem include HBase, R, Pig, Hive, Flume, Mahout, Avro, and ZooKeeper, which perform supplementary functions.
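The MapReduce model described above can be sketched with the classic word count. This is plain Python that mimics the three phases on a single machine; it does not use the Hadoop API, and the function names and sample text are made up for illustration.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(mapped_pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)

splits = ["big data is big", "data lake holds raw data"]
mapped = [pair for split in splits for pair in map_phase(split)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

On a real cluster each split would be mapped on the node that stores it in HDFS, and the shuffle would move data across the network; the structure of the program stays the same.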
PolyBase of APS v2 AU1 can already support HDP 2.x with the hotfix KB2973037! (HDP 2.x includes HDP 2.0 and HDP 2.1)
Azure HDInsight supports both HDFS and Azure Blob storage for storing data.
With this hotfix, the following sp_configure values are available for the option "hadoop connectivity":
0 - no HDP support
1 - Hortonworks for Windows Server (HDP 1.3); HDInsight on Analytics Platform System; HDInsight’s Windows Azure blob storage (WASB[S])
2 - Hortonworks for Linux (HDP 1.3)
3 - Cloudera CDH 4.3 for Linux (also works with 4.5 and 4.6)
4 - Hortonworks Data Platform for Windows Server (HDP 2.x)
5 - Hortonworks Data Platform (HDP 2.x) for Linux
Key goal of slide: PolyBase is available only within the Microsoft Analytics Platform System.
Slide talk track:
PolyBase simplifies this by allowing Hadoop data to be queried with standard Transact-SQL (T-SQL) query language without the need to learn MapReduce and without the need to move the data into the data warehouse. PolyBase unifies relational and non-relational data at the query level.
Integrated query: PolyBase accepts a standard T-SQL query that joins tables containing a relational source with tables in a Hadoop cluster referencing a non-relational source. It then seamlessly returns the results to the user.
PolyBase can query Hadoop data in other Hadoop distributions such as Hortonworks or Cloudera.
No difficult learning curve: Standard T-SQL can be used to query Hadoop data. Users are not required to learn MapReduce to execute the query.
Cloud-Hybrid Scenario Options
PolyBase can also query across Windows Azure HDInsight, providing a Hybrid Cloud solution to the data warehouse
The ability to query all of your company’s data with good performance, independent of where it resides and what format it is stored in, is crucial in today’s data-centric world with massive, increasing data volume. Today, with AU1, one can query various Hadoop distributions plus data stored in Azure. For example, with one single T-SQL statement a user can query over data stored in multiple HDP 2.0 clusters, combine it with data in PDW, and combine it with data stored in Azure. No one in the industry (as far as I’m aware) can do this in this simple fashion. Bringing all Microsoft assets together, on-prem and specifically through our Azure play, including various services that will be brought online in future, we can clearly distinguish ourselves through our unique and complete end-to-end data management story. No doubt there are several pieces missing in our ‘Poly’ vision, including supporting other data stores, enabling push-down computation for our cloud story, more user-definable options language-wise, better automation/policies, and many more ideas we’d like to go after in the weeks and months ahead.
HDInsight benefits: cheap, quick to procure
Key goal of slide: Highlight the four main use cases for PolyBase.
Slide talk track:
There are four key scenarios for using PolyBase with data that is normally locked up in Hadoop.
PolyBase leverages the APS MPP architecture along with optimizations like push-down computing to query data using Transact-SQL faster than using other Hadoop technologies like Hive. More importantly, you can use the Transact-SQL join syntax between Hadoop data and PDW data without having to import the data into PDW first.
PolyBase is a great tool for archiving older or unused data in APS to less expensive storage on a Hadoop cluster. When you do need to access the data for historical purposes, you can easily join it back up with your PDW data using Transact-SQL.
There are times when you need to share your PDW with Hadoop users and PolyBase makes it easy to copy data to a Hadoop cluster.
Using a simple SELECT INTO statement, PolyBase makes it easy to import valuable Hadoop data into PDW without having to use external ETL processes.
Key goal of slide: To convey that the traditional data warehouse is going to break in one of four different ways. These ways should also not be a surprise to the IT professionals. At the end of the slide, IT should be asking, what can I do to prevent my warehouse from breaking?
Slide talk track:
There are many reasons why the data warehouse is at its tipping point, where something needs to change.
The first trend that will break my traditional data warehouse is data growth. Data volumes are expected to grow 10X over the next five years and traditional data warehouses cannot keep up with this explosion of data.
In addition to growing data, end users expect to get query results back faster, in near real time. They are no longer willing to wait minutes to hours for results, and traditional data warehouses cannot keep up. They also want real-time data, not dated data pulled in during a nightly maintenance window.
The third trend is new types of data captured that are “non-relational.” 85% of data growth is coming from “non-relational” data in the form of things like web logs, sensor data, social sentiment and devices. You’ve probably heard the terms “Big Data” and “Hadoop” quite a bit. This is where these technologies come into play. More on that later.
The final trend is cloud-born data. This is data that might come from parts of IT’s infrastructure that are starting to be hosted in the cloud (e.g., CRM, ERP) or that is not stored by any corporate-owned system. How do you incorporate both on-premises and cloud data as part of your data warehouse? This is the last trend that is breaking the traditional data warehouse.
Key goal of slide: To convey that the modern data warehouse is something that the traditional data warehouse must evolve to. To have IT agree that their warehouses need to take advantage of these new technologies (specifically focusing on the middle and bottom layer).
Slide talk track:
To encompass these four trends, we need to evolve our traditional data warehouse to ensure that it does not break. It needs to become the “modern data warehouse.” What is the “modern data warehouse?” This is the new warehouse that is able to excel with these new trends and can be your warehouse now and into the future.
The modern data warehouse has the ability to:
Handle all types of data. Whether it be your structured, relational data sources or your non-relational data sources, the Modern data warehouse will incorporate Hadoop. It can handle real-time data by using complex event processor technologies.
Provide a way to enrich your data with Extract, Transform, Load (ETL) capabilities as well as Master Data Management (MDM) and data quality
Provide a way for any BI tool or query mechanism to interface with all these different types of data with a single query model that leverages a single query language that users already know (example: SQL).
Questions drive BI, Analytics drive questions
Key goal of slide: To convey the major pillars of the Analytics Platform System with key points.
To help organizations make a simple, smooth transition to this new world of data, Microsoft introduces the Microsoft Analytics Platform System (APS) – the only no-compromise modern data warehouse solution that brings both Hadoop and RDBMS together in a single, pre-built appliance with tier-one performance, the lowest TCO in the industry, and accessibility to all their users through some of the most widely used BI tools in the industry.
Enterprise-ready Big Data: Microsoft APS combines Microsoft’s industry leading RDBMS platform, the Parallel Data Warehouse Appliance (PDW), with Microsoft’s Hadoop Distribution, HDInsight, for non-relational data to offer an all-in Big Data Analytics appliance.
Tying together and integrating the worlds of relational and Hadoop data is PolyBase, Microsoft’s integrated query tool available only in APS.
Your Modern Data Warehouse in One Turnkey Appliance
APS integrates PDW and HDInsight to operate seamlessly together in a single appliance
Integrated Querying across All Data Types Using T-SQL
PolyBase allows Hadoop data to be queried using full-featured T-SQL, while taking advantage of Hadoop processing, without additional Hadoop-based skills or training.
Enterprise-Ready Hadoop
HDInsight is Microsoft’s Hadoop-based distribution with end-user authentication via Active Directory and managed by IT using System Center
Big Data Insights to Any User
Native Microsoft BI integration within PolyBase allows everyone access to insights through familiar tools such as SSAS and Excel
Next-generation performance at scale: APS was built to scale into multi-petabytes, handling both RDBMS and the data stored in Hadoop, to deliver the performance that meets today’s near real-time and rapid insights requirements.
Scale-Out to accommodate your Growing Data
APS contains PDW and HDInsight that both have linear scale-out architecture. Start small with a few terabytes and dynamically add capacity for seamless, linear scale-out
Remove DW bottlenecks with MPP SQL Server
Get the dynamic performance and scale that your modern data warehouse requires while retaining your skills and investment in SQL Server.
Real-Time Performance with In-Memory
Provides up to 100x improvement in query performance and 15x compression via updateable in-memory columnstore
Concurrency that Supports High Adoption
Scales in simultaneous user accessibility. APS has high concurrency, allowing for multiple workloads.
Optimal architecture: More than just a converged system, APS has reshaped the very hardware specifications required through software innovations to deliver optimal value. Through features delivered in Windows Server 2012, customers get exceptional value:
APS Provides the Industry’s Lowest DW Price/TB
Lower cost while maintaining performance by replacing SAN with economical Windows Server 2012 Storage Spaces
Save up to 70% of APS storage with up to 15x compression via updateable in-memory columnstore
Value through Single Appliance Solution
Reduce hardware footprint by having PDW and HDInsight within a single appliance
Remove the need for costly integration efforts
Value through Flexible Hardware Options
Avoid hardware lock-in through flexible hardware options from HP, Dell, and Quanta