Database is the new black. Ever the backbone of information architectures, database technology continually evolves to meet growing and changing business needs. New types of data and applications make the database more important than ever, and understanding which technology best serves your use case is paramount to building durable systems. These days, the choices are many, so users should be careful when deciding which direction to go. Register for this Exploratory Webcast to hear veteran database analyst Dr. Robin Bloor explain why the database market has exploded in recent years. He'll outline the current database landscape, and provide insights about which kinds of technologies are suitable for the growing variety of business needs today. He'll also focus on key auxiliary technologies that enable modern databases to perform efficiently.
Moving to a data-centric architecture: Toronto Data Unconference 2015 - Adam Muise
Why use a datalake? Why use lambda? A conversation starter for Toronto Data Unconference 2015. We will discuss technologies such as Hadoop, Kafka, Spark Streaming, and Cassandra.
Integrating Relational Databases with the Semantic Web: A Reflection - Juan Sequeda
This is a lecture given at the 2017 Reasoning Web Summer School
It has been clear from the beginning that the success of the Semantic Web hinges on integrating the vast amount of data stored in Relational Databases. In 2007, the W3C organized a workshop on RDF Access to Relational Databases. In 2012, two standards were ratified that map relational data to RDF: Direct Mapping and R2RML.
In this lecture, I will reflect on the last 10 years of research results and systems to integrate Relational Databases with the Semantic web. I will provide an answer to the following question: how and to what extent can Relational Databases be integrated with the Semantic Web? I will review how these standards and systems are being used in practice for data integration and discuss open challenges.
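The Direct Mapping standard mentioned above turns each relational row into an RDF subject and each column into a predicate. As a rough illustration of that idea (this is a simplified sketch, not the W3C algorithm; the table name, columns, and base IRI are hypothetical):

```python
# Illustrative sketch of the W3C Direct Mapping idea: each row of a
# relational table becomes an RDF subject, each column a predicate.
# Table, columns, and base IRI are made up for this example.

def direct_mapping(base, table, pk, rows):
    """Emit N-Triples-style strings for each row of a table."""
    triples = []
    for row in rows:
        subject = f"<{base}{table}/{pk}={row[pk]}>"
        for col, val in row.items():
            predicate = f"<{base}{table}#{col}>"
            triples.append(f'{subject} {predicate} "{val}" .')
    return triples

people = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
for t in direct_mapping("http://example.com/", "person", "id", people):
    print(t)
```

R2RML generalizes this by letting the mapping author choose the IRIs and vocabulary rather than deriving them mechanically from table and column names.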
Agile Data Engineering: Introduction to Data Vault 2.0 (2018) - Kent Graziano
(updated slides used for North Texas DAMA meetup Oct 2018) As we move more and more towards the need for everyone to do Agile Data Warehousing, we need a data modeling method that can be agile with us. Data Vault Data Modeling is an agile data modeling technique for designing highly flexible, scalable, and adaptable data structures for enterprise data warehouse repositories. It is a hybrid approach using the best of 3NF and dimensional modeling. It is not a replacement for star schema data marts (and should not be used as such). This approach has been used in projects around the world (Europe, Australia, USA) for over 15 years and is now growing in popularity. The purpose of this presentation is to provide attendees with an introduction to the components of the Data Vault Data Model, what they are for and how to build them. The examples will give attendees the basics:
• What the basic components of a DV model are
• How to build and design structures incrementally, without constant refactoring
Presentation at Data/Graph Day Texas Conference.
Austin, Texas
January 14, 2017
This talk grew out of Juan Sequeda's office hours following the Seattle Graph Meetup. Some of the questions posed were: How do I recognize a problem best solved with a graph solution? How do I determine the best type of graph to solve the problem? How do I manage the data where both graph and relational operations will be performed? Juan did such a great job of explaining the options that we asked him to develop his responses into a formal talk.
The core idea behind Hadoop is to distribute both the data and user software on individual shards within the cluster. The Bigdata Replay method is drastically different in that it packs user software into batches on a single multicore machine and uses circuit emulation to maximize throughput when bringing data shards for replay. The effect from hotspots, defined as drastically higher access frequency to a small portion of (popular) data, is different in the two platforms. This paper models the difference numerically but in a relative form, which makes it possible to compare the two platforms.
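To make the hotspot effect concrete, here is a minimal simulation (a hypothetical sketch, not the paper's actual model) in which access frequency follows a Zipf-like popularity distribution, so a small number of shards absorb most of the traffic:

```python
import random

def zipf_weights(n, s=1.0):
    """Zipf-like popularity: item k gets weight 1 / k^s."""
    return [1.0 / (k ** s) for k in range(1, n + 1)]

def simulate_accesses(n_shards, n_requests, s=1.0, seed=42):
    """Count how many requests land on each shard under skewed access."""
    random.seed(seed)
    weights = zipf_weights(n_shards, s)
    counts = [0] * n_shards
    for shard in random.choices(range(n_shards), weights=weights, k=n_requests):
        counts[shard] += 1
    return counts

counts = simulate_accesses(n_shards=10, n_requests=10_000)
hot_share = counts[0] / sum(counts)  # fraction of traffic on the hottest shard
print(f"hottest shard serves {hot_share:.0%} of requests")
```

Under this skew, how a platform places popular shards (spread across a cluster versus streamed through one machine) determines where the bottleneck appears.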
(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling - Kent Graziano
This is the presentation I gave at OakTable World 2013 in San Francisco. #OTW13 was held at the Children's Creativity Museum next to the Moscone Convention Center and was in parallel with Oracle OpenWorld 2013.
The session discussed our attempts to be more agile in designing enterprise data warehouses and how the Data Vault Data Modeling technique helps in that approach.
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop - Caserta
In our most recent Big Data Warehousing Meetup, we learned about transitioning from Big Data 1.0, with Hadoop 1.x and its nascent technologies, to the advent of Hadoop 2.x with YARN, which enables distributed ETL, SQL, and analytics solutions. Caserta Concepts Chief Architect Elliott Cordo and an Actian engineer covered the complete data value chain of an enterprise-ready platform, including data connectivity, collection, preparation, optimization, and analytics with end-user access.
For more information on our services or upcoming events, please visit our website at http://www.casertaconcepts.com/.
Relational databases power most applications, but new use-cases have requirements that they are not well suited for.
That's why new approaches like graph databases are used to handle join-heavy, highly-connected and realtime aspects of your applications.
This talk compares relational and graph databases, showing similarities and important differences.
We do a hands-on, deep-dive into ease of data modeling and structural evolution, massive data import and high performance querying with Neo4j, the most popular graph database.
I demonstrate a useful tool that makes data import from existing relational databases with a normalized ER model a "one-click" experience.
The biggest remaining challenge for people coming from a relational background is adapting some of their existing database experience to new ways of thinking.
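The "join-heavy" point above is easiest to see with a concrete query. In SQL, every traversal hop costs another self-join on the edge table, whereas a graph database treats traversal as a native operation. A small sketch using SQLite and a hypothetical friendship schema:

```python
import sqlite3

# A tiny, hypothetical social graph stored relationally. Each
# traversal "hop" in SQL requires another self-join on the edge table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE friend (a INTEGER, b INTEGER);
    INSERT INTO person VALUES (1,'Ann'),(2,'Ben'),(3,'Cal'),(4,'Dee');
    INSERT INTO friend VALUES (1,2),(2,3),(3,4);
""")

# Friends-of-friends of Ann: two hops means two joins on the edge table.
rows = conn.execute("""
    SELECT DISTINCT p.name
    FROM friend f1
    JOIN friend f2 ON f2.a = f1.b
    JOIN person p  ON p.id = f2.b
    WHERE f1.a = 1
""").fetchall()
print(rows)  # [('Cal',)]
```

A three-hop query would need a third join, and variable-length paths need recursive SQL; in a graph query language the same traversal is a single path pattern.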
Robin Bloor and Mark Madsen offer their theories on where the rapidly-changing database market stands today: What’s new? What’s standard? What is the trajectory of this evolving market? Each Analyst will present for 10-15 minutes, then will engage in a dialogue with the moderator and attendees.
The webcast audio and video archive can be found at https://bloorgroup.webex.com/bloorgroup/lsr.php?AT=pb&SP=EC&rID=4695777&rKey=4b284990a1db4ec0
In this talk I discussed some ideas for Big Data distribution using CDNs (Content Delivery Networks). These ideas cover not only static content but focus primarily on content pre-computation. I also discussed some basic technical tricks of global content distribution.
Virtualizing Relational Databases as Graphs: a multi-model approach - Juan Sequeda
Talk given at Smart Data 2017
Relational Databases are inflexible due to the rigid constraints of the relational data model. If you have new data that doesn't fit your schema, you will need to alter your schema (add a column or a new table). That is not always possible: IT departments don't have time, or they won't allow it, and new columns often just mean more nulls, which can lead to query performance degradation.
A goal of graph databases is to address this problem with their schema-less graph data model. However, many businesses have large investments in commercial RDBMSs and their associated applications and can't expect to move all of their data to a graph database.
In this talk, I will present a multi-model graph/relational architecture solution. Keep your relational data where it is, virtualize it as a graph, and then connect it with additional data stored in a graph database. This way, both graph and relational technologies can seamlessly interact together.
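The virtualization idea can be sketched in a few lines: the data stays in the relational store, and a graph-shaped view is built on top of it for traversal. This is a minimal, hypothetical illustration (the table, data, and helper names are made up), not the architecture presented in the talk:

```python
import sqlite3

# Minimal sketch of "virtualizing" a relational edge table as a graph:
# the data stays in the RDBMS; we only build a graph-shaped view on top.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE works_for (employee TEXT, manager TEXT);
    INSERT INTO works_for VALUES ('ann','ben'),('ben','cal'),('dee','cal');
""")

def graph_view(conn):
    """Adjacency-list view over the relational table (employee -> manager)."""
    g = {}
    for emp, mgr in conn.execute("SELECT employee, manager FROM works_for"):
        g.setdefault(emp, []).append(mgr)
    return g

def reachable(g, start):
    """Graph traversal (full management chain) over the virtualized view."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in g.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

g = graph_view(conn)
print(reachable(g, "ann"))  # ann's whole management chain
```

Real virtualization layers push the traversal down as SQL rather than materializing the view, but the principle is the same: graph operations without moving the data out of the RDBMS.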
Learn about data lifecycle best practices in the AWS Cloud, so you can optimize performance and lower the costs of data ingestion, staging, storage, cleansing, analytics and visualization, and archiving.
Graph Query Languages: update from LDBC - Juan Sequeda
The Linked Data Benchmark Council (LDBC) is a non-profit organization dedicated to establishing benchmarks, benchmark practices and benchmark results for graph data management software. The Graph Query Language task force of LDBC is studying query languages for graph data management systems, and specifically those systems storing so-called Property Graph data. The goals of the GraphQL task force are to:
Devise a list of desired features and functionalities of a graph query language.
Evaluate a number of existing languages (i.e. Cypher, Gremlin, PGQL, SPARQL, SQL), and identify possible issues.
Provide a better understanding of the design space and state-of-the-art.
Develop proposals for changes to existing query languages or even a new graph query language.
This query language should cover the needs of the most important use-cases for such systems, such as social network and Business Intelligence workloads.
This talk will present an update on the work accomplished by the LDBC GraphQL task force. We are also looking for input from the graph community.
Integrating Semantic Web with the Real World - A Journey between Two Cities ... - Juan Sequeda
(The original version of this talk was a Keynote at KCAP2017. This is the final version of the slides after giving this talk 14 times in 2018)
An early vision in Computer Science has been to create intelligent systems capable of reasoning on large amounts of data. Today, this vision can be delivered by integrating Relational Databases with the Semantic Web using the W3C standards: a graph data model (RDF), ontology language (OWL), mapping language (R2RML) and query language (SPARQL). The research community has successfully been showing how intelligent systems can be created with Semantic Web technologies, dubbed now as Knowledge Graphs.
However, where is the mainstream industry adoption? What are the barriers to adoption? Are these engineering and social barriers or are they open scientific problems that need to be addressed?
This talk will chronicle our journey of deploying Semantic Web technologies with real world users to address Business Intelligence and Data Integration needs, describe technical and social obstacles that are present in large organizations, and scientific and engineering challenges that require attention.
Creating a Data Science Team from an Architect's perspective. This is about team building: how to support a data science team with the right staff, including data engineers and DevOps.
Integrating Semantic Web in the Real World: A Journey between Two Cities - Juan Sequeda
Keynote at The 9th International Conference on Knowledge Capture (KCAP2017), Austin, Texas, Dec 2017
An early vision in Computer Science has been to create intelligent systems capable of reasoning on large amounts of data. Today, this vision can be delivered by integrating Relational Databases with the Semantic Web using the W3C standards: a graph data model (RDF), ontology language (OWL), mapping language (R2RML) and query language (SPARQL). The research community has successfully been showing how intelligent systems can be created with Semantic Web technologies, dubbed now as Knowledge Graphs.
However, where is the mainstream industry adoption? What are the barriers to adoption? Are these engineering and social barriers or are they open scientific problems that need to be addressed?
This talk will chronicle our journey of deploying Semantic Web technologies with real world users to address Business Intelligence and Data Integration needs, describe technical and social obstacles that are present in large organizations, and scientific challenges that require attention.
Everyone is awash in the new buzzword, Big Data, and it seems as if you can’t escape it wherever you go. But there are real companies with real use cases creating real value for their businesses by using big data. This talk will discuss some of the more compelling current or recent projects, their architecture & systems used, and successful outcomes.
Recently, there's been discussion, even some confusion, around the relationship between Hadoop and Spark. Although they're both big data frameworks with many similarities, they are not one and the same - and are in fact complementary in an enterprise environment.
View the webinar replay here: http://info.zaloni.com/spark-hadoops-friend-or-foe
Incorporating the Data Lake into Your Analytic Architecture - Caserta
Joe Caserta, President at Caserta Concepts presented at the 3rd Annual Enterprise DATAVERSITY conference. The emphasis of this year's agenda is on the key strategies and architecture necessary to create a successful, modern data analytics organization.
Joe Caserta presented Incorporating the Data Lake into Your Analytics Architecture.
For more information on the services offered by Caserta Concepts, visit our website at http://casertaconcepts.com/.
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes - MongoDB
With so much talk of how Big Data is revolutionizing the world and how a data lake with Hadoop and/or Spark will solve all your data problems, it is hard to tell what is hype, reality, or somewhere in-between.
In working with dozens of enterprises in varying stages of their enterprise data management (EDM) strategy, MongoDB enterprise architect, Matt Kalan, sees the same challenges and misunderstandings arise again and again.
In this session, he will explain common challenges in data management, what capabilities are necessary, and what the future state of architecture looks like. MongoDB is uniquely capable of filling common gaps in the data lake strategy.
This session also includes a live Q&A portion during which you are encouraged to ask questions of our team.
The Central Hub: Defining the Data Lake - Eric Kavanagh
Exploratory Webcast with Dr. Robin Bloor and Dez Blanchfield
It has many aliases – pond, reservoir, swamp – but the concept of the Data Lake has gained a strong foothold in today’s data ecosystem. Its early days saw it used primarily as a landing zone for raw data, but a range of new application areas are emerging, from self-service analytics and BI to a wholly governed and secure data store. As the Data Lake matures, the key is to tie its broad functionality to business value.
Register for this Exploratory Webcast to hear Dr. Robin Bloor offer his perspective on why the information landscape is changing and what the various roles of the Data Lake are thus far. He’ll be joined by Data Scientist Dez Blanchfield, who will discuss his hypothesis of the future of data management and suggest ideas for surviving the Data Lake hype.
Horses for Courses: Database Roundtable - Eric Kavanagh
The blessing and curse of today's database market? So many choices! While relational databases still dominate the day-to-day business, a host of alternatives has evolved around very specific use cases: graph, document, NoSQL, hybrid (HTAP), column store, the list goes on. And the database tools market is teeming with activity as well. Register for this special Research Webcast to hear Dr. Robin Bloor share his early findings about the evolving database market. He'll be joined by Steve Sarsfield of HPE Vertica, and Robert Reeves of Datical in a roundtable discussion with Bloor Group CEO Eric Kavanagh. Send any questions to info@insideanalysis.com, or tweet with #DBSurvival.
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio... - Denodo
Watch full webinar here: https://bit.ly/32TT2Uu
Data virtualization is not just for self-service; it’s also a first-class citizen when it comes to modern data platform architectures. Technology has forced many businesses to rethink their delivery models. Startups such as Amazon and Lyft emerged, leveraging the internet and mobile technology to better meet customer needs, disrupting entire categories of business and growing to dominate them.
Schedule a complimentary Data Virtualization Discovery Session with g2o.
Traditional companies are still struggling to meet rising customer expectations. During this webinar with the experts from g2o and Denodo we covered the following:
- How modern data platforms enable businesses to address these new customer expectations
- How you can drive value from your investment in a data platform now
- How you can use data virtualization to enable multi-cloud strategies
Leveraging the strategy insights of g2o and the power of the Denodo platform, companies do not need to undergo the costly removal and replacement of legacy systems to modernize their systems. g2o and Denodo can provide a strategy to create a modern data architecture within a company’s existing infrastructure.
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios - kcmallu
What's the origin of Big Data? What are the real life usage scenarios where Hadoop has been successfully adopted? How do you get started within your organizations?
At the Data-centric Architecture Forum 2020, Thomas Cook, our Sales Director of AnzoGraph DB, gave his presentation "Knowledge Graph for Machine Learning and Data Science". These are his slides.
Transform your DBMS to drive engagement innovation with Big Data - Ashnikbiz
Erik Baardse and Ajit Gadge from EDB Postgres presented on how to transform your DBMS to drive digital business: how Postgres enables you to support a wider range of workloads with your relational database, opening the door to Big Data. They also cover EnterpriseDB’s strategy around Big Data, which focuses on three areas, and finally, last but not least, how to find money in IT with Big Data and digital transformation.
INTRODUCTION TO BIG DATA AND HADOOP
Introduction to Big Data, Types of Digital Data, Challenges of conventional systems - Web data, Evolution of analytic processes and tools, Analysis vs. Reporting - Big Data Analytics, Introduction to Hadoop - Distributed Computing Challenges - History of Hadoop, Hadoop Ecosystem - Use cases of Hadoop - Hadoop Distributors - HDFS - Processing Data with Hadoop - MapReduce.
Hadoop was born out of the need to process Big Data. Today, data is being generated like never before, and it is becoming difficult to store and process this enormous volume and large variety of data; Big Data technology exists to cope with exactly that. The Hadoop software stack is now the go-to framework for large-scale, data-intensive storage and compute in Big Data analytics applications. The beauty of Hadoop is that it is designed to process large volumes of data on clusters of commodity computers working in parallel. Distributing a data set that is too large for any single machine across the nodes of a cluster solves the problem of processing it.
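The split-and-parallelize idea above can be sketched in miniature with the canonical MapReduce example, word count. This is a local, single-process simulation, with the shuffle phase modeled by `sorted` plus `groupby`, not an actual Hadoop deployment; the function names are illustrative.

```python
import itertools
import operator

def mapper(lines):
    # Map phase: emit a (word, 1) pair for every word in the input split.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reducer(pairs):
    # Shuffle/sort groups pairs by key; the reducer then sums counts per word.
    for word, group in itertools.groupby(sorted(pairs), key=operator.itemgetter(0)):
        yield (word, sum(count for _, count in group))

if __name__ == "__main__":
    split = ["Hadoop stores data", "Hadoop processes data in parallel"]
    print(dict(reducer(mapper(split))))
```

On a real cluster, each mapper runs against its own block of the distributed file, and the framework routes all pairs with the same key to one reducer; the logic per node is no more complicated than this.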
The Value of the Modern Data Architecture with Apache Hadoop and Teradata Hortonworks
This webinar discusses why Apache Hadoop is most typically the technology underpinning "Big Data," how it fits in a modern data architecture, and the current landscape of databases and data warehouses already in use.
This is part 3 of the series on Data Mesh, looking at the intersection of microservices architecture concepts, data integration/replication technologies, and log-based stream integration techniques. This webinar was mostly a demonstration, but several slides used to set up the demo are included here as a PDF for viewers.
The Future of Data Warehousing and Data IntegrationEric Kavanagh
The rise of big data, data lakes and the cloud, coupled with increasingly stringent enterprise requirements, is reinventing the role of data warehousing in modern analytics ecosystems. The emerging generation of data warehouses is more flexible, agile and cloud-based than its predecessors, with a strong need for automation and real-time data integration.
Join this live webinar to learn:
-Typical requirements for data integration
-Common use cases and architectural patterns
-Guidelines and best practices to address data requirements
-Guidelines and best practices to apply architectural patterns
Best Practices in DataOps: How to Create Agile, Automated Data PipelinesEric Kavanagh
Synthesis Webcast with Eric Kavanagh and Tamr
DataOps is an emerging set of practices, processes, and technologies for building and automating data pipelines to meet business needs quickly. As these pipelines become more complex and development teams grow in size, organizations need better collaboration and development processes to govern the flow of data and code from one step of the data lifecycle to the next – from data ingestion and transformation to analysis and reporting.
DataOps is not something that can be implemented all at once or in a short period of time. DataOps is a journey that requires a cultural shift. DataOps teams continuously search for new ways to cut waste, streamline steps, automate processes, increase output, and get it right the first time. The goal is to increase agility and shorten cycle times while reducing data defects, giving developers and business users greater confidence in data analytic output.
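One practice named above, automation that helps teams "get it right the first time", can be sketched as a quality gate that fails a pipeline run before bad records flow downstream to analysis and reporting. The field names and rules here are invented for illustration.

```python
def check_quality(rows):
    """A minimal automated data test: collect problems instead of passing them on."""
    errors = []
    for i, row in enumerate(rows):
        if row.get("amount") is None or row["amount"] < 0:
            errors.append(f"row {i}: bad amount {row.get('amount')}")
        if not row.get("customer_id"):
            errors.append(f"row {i}: missing customer_id")
    return errors

def run_pipeline(rows):
    errors = check_quality(rows)
    if errors:
        # Stop the flow early: defects caught here never reach a dashboard.
        raise ValueError("; ".join(errors))
    # Transformation step (trivial here): keep only the governed fields.
    return [{"customer_id": r["customer_id"], "amount": r["amount"]} for r in rows]
```

In practice the same idea runs inside an orchestrator on every scheduled execution, which is how test automation shortens cycle times rather than adding a manual review step.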
This webcast examines how organizations adopt DataOps practices in the field. It will review results of an Eckerson Group survey that sheds light on the rate and scope of DataOps adoption. It will also describe case studies of organizations that have successfully implemented DataOps practices, the challenges they have encountered and benefits they’ve received.
Tune into our webcast to learn:
- User perceptions of DataOps
- The rate of DataOps adoption by industry and other demographic variables
- DataOps adoption by technique and component (i.e., agile, test automation, orchestration, continuous development/continuous integration)
- Key challenges organizations face with DataOps
- Key benefits organizations experience with DataOps
- Best practices in doing DataOps
- Case studies and anecdotes of DataOps at companies
Expediting the Path to Discovery with Multi-Source AnalysisEric Kavanagh
The Briefing Room with Eric Kavanagh and Zoomdata
In the realm of complex analysis, rarely does one source of data provide everything the analyst needs. Data Warehouses were designed to pull data from multiple sources, to enable that kind of cross-system discovery. But that traditional model typically required stripping the data of significant context, essentially watering down the end result, and at times obfuscating the most meaningful facets.
Thanks to several advances in real-time data exploration, companies can now access raw data where it lives, and begin the analysis process often within seconds of connecting to a source. And new innovations allow for multi-source analytics, where disparate systems can be accessed simultaneously, allowing real-time discovery across multiple sources, creating a kind of analytical depth perception. Register for this special episode of The Briefing Room to hear Bloor Group CEO Eric Kavanagh, and Zoomdata speakers explain this remarkable new capability.
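As a toy sketch of the multi-source idea (both sources and their fields are invented): rows are taken from each system as-is and correlated on a shared key, preserving each source's full context rather than stripping it out in a warehouse load.

```python
# Two hypothetical live sources, accessed in place with no ETL staging step.
crm = [
    {"customer_id": 1, "name": "Acme Corp", "segment": "enterprise"},
    {"customer_id": 2, "name": "Globex", "segment": "mid-market"},
]
weblog = [
    {"customer_id": 1, "page_views": 120},
    {"customer_id": 2, "page_views": 45},
]

def multi_source_join(left, right, key):
    """Correlate rows from two sources on a shared key, keeping full context."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]

combined = multi_source_join(crm, weblog, "customer_id")
```

A virtualization or multi-source analytics engine does the same correlation at query time, at scale, with pushdown to each source, which is what makes the "within seconds of connecting" experience possible.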
Metadata Mastery: A Big Step for BI ModernizationEric Kavanagh
Modernizing data management is on everyone’s mind today. Making the shift from data management practices of the BI era to modern data management is essential but it is also challenging. Whether you’re updating the back end by migrating your data warehouses to the cloud or advancing the front end with a shift from legacy BI tools to self-service analysis and visualization, it is critical to know the data that you have and to understand data lineage. Data inventory, data glossary, and data lineage are all metadata dependent. But legacy BI metadata is typically proprietary, non-integrated, and collected inconsistently by a variety of disparate tools. The metadata muddle is a serious inhibitor to modernization efforts. Metadata consolidation and centralization are the keys to overcoming this barrier. What if all this were automated?
Join us to learn:
- How a smart and innovative new technology resolves metadata disparity
- How metadata management automation accelerates modernization efforts
- How metadata management automation reduces errors and improves quality of results from data management modernization projects
- How metadata management automation and data cataloging work together to help you move rapidly to the next generation of BI and analytics
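The consolidation idea behind these points can be sketched as follows: hypothetical metadata exports from two tools, in different shapes, are normalized into one central catalog keyed by dataset name. All names and fields below are invented for illustration.

```python
# Hypothetical metadata exports from two legacy BI tools, in different shapes.
tool_a = [{"dataset": "sales", "owner": "finance", "source": "warehouse.sales"}]
tool_b = [{"name": "sales", "refreshed": "2024-05-01"},
          {"name": "churn", "refreshed": "2024-04-28"}]

def consolidate(a_records, b_records):
    """Merge per-tool metadata into a single catalog keyed by dataset name."""
    catalog = {}
    for rec in a_records:
        catalog.setdefault(rec["dataset"], {}).update(
            owner=rec["owner"], source=rec["source"])
    for rec in b_records:
        catalog.setdefault(rec["name"], {}).update(refreshed=rec["refreshed"])
    return catalog

catalog = consolidate(tool_a, tool_b)
```

The hard part that automation addresses is doing this continuously, across many proprietary formats, and inferring lineage; but the end state is the same: one consistent record per dataset instead of fragments scattered across tools.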
Better to Ask Permission? Best Practices for Privacy and SecurityEric Kavanagh
Hot Technologies with The Bloor Group and IDERA
If security was once a nice-to-have, those days have long gone. Between data breaches and privacy regulations, organizations today face immense pressure to protect their systems and their sensitive data. When giants like Yahoo! and Target can get hacked, so can any other company. What can you do about it? How can you protect your company and clients?
Register for this episode of Hot Technologies to hear Analysts Eric Kavanagh and Dr. Robin Bloor provide insights about the many ways that companies can buttress their defenses and stay ahead of the bad guys. They'll be briefed by Vicky Harp of IDERA who will demonstrate how to identify vulnerabilities, track sensitive data, successfully pass audits, and protect your SQL Server databases.
The Model Enterprise: A Blueprint for Enterprise Data GovernanceEric Kavanagh
What gets measured, gets managed; but what gets governed, generates real value. That's one major reason why data governance has risen to a top priority for most organizations. Another reason is the rapid onboarding of big data, which often comes from beyond the traditional firewall. And then there are the authorities: issues like privacy, security and fiduciary responsibility are combining to make data governance a must-have. Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor explain why governance should be viewed as a positive change agent for the modern enterprise. He'll be briefed by Ron Huizenga of IDERA, who will discuss a practical, model-based approach to enterprise data governance, with a focus on Master Data Management.
Best Laid Plans: Saving Time, Money and Trouble with Optimal ForecastingEric Kavanagh
Expectations have changed. That's true for users, executives and customers alike. There's no time for systems running slowly, or cost overruns. That's why fundamentals like capacity planning have become mission-critical. By paying attention to the details, and doing effective forecasts, companies can optimize their information architecture, keeping everyone happy. Register for this episode of Hot Technologies to learn from veteran Analysts Dr. Robin Bloor and Rick Sherman who will offer insights about how and why to do capacity planning. They'll be briefed by Bullett Manale of IDERA, who will explain how his company's SQL Diagnostic Manager can track a wide range of usages metrics which can be used for accurate forecasting.
A Winning Strategy for the Digital EconomyEric Kavanagh
The speed of innovation today creates tremendous opportunities for some, existential threats for others. Companies that win create their own success by leveraging modern data platforms. While architectures vary, the foundation is often in-memory, and the latency is real-time. Register for this Special Edition of The Briefing Room to hear veteran Analyst Dr. Robin Bloor explain how today's data platforms enable the modern enterprise in groundbreaking ways. He'll be briefed by Chris Hallenbeck of SAP who will demonstrate how forward-looking companies are leveraging real-time data platforms to achieve operational excellence, make decisions faster, and find new ways to innovate.
Discovering Big Data in the Fog: Why Catalogs MatterEric Kavanagh
The Briefing Room with Dr. Robin Bloor and Waterline Data
Good enterprise data can drive positive business outcomes. But if that data isn’t organized and accessible, information workers are left with an incomplete picture. Knowing the location, lineage and permissions of data across the enterprise can lead to more accurate and insightful searches, and ultimately, knowledge discovery.
Register for this episode of The Briefing Room to learn from veteran Analyst Dr. Robin Bloor as he discusses how the success of big data projects relies on understanding your data. He’ll be briefed by Todd Goldman and Mohan Sadashiva of Waterline Data, who will explain how their solution can facilitate discovery via automation and crowd sourcing. They’ll demonstrate how combining the value of tribal knowledge with rationalized data can enable self-service analytics, improve data governance, and reduce data redundancy.
Health Check: Maintaining Enterprise BIEric Kavanagh
Hot Technologies with The Bloor Group and IDERA
Most companies realize the value of business intelligence. Advanced analytics, data mining, dashboards – all surface useful insights. With so many moving parts in play, it’s crucial to provide visibility across the entire BI environment, thus delivering solid system and service performance.
Register for this episode of Hot Technologies to learn from Analyst Dr. Robin Bloor and Eric Kavanagh as they discuss why operational and strategic business intelligence are the cornerstones of any organization. They’ll be briefed by Stan Geiger of IDERA, who will showcase his company’s SQL BI Manager, an end-to-end solution designed to provide a single view into numerous running processes. He will explain that by optimizing system health and availability, users can eliminate downtime and improve efficiency.
Rapid Response: Debugging and Profiling to the RescueEric Kavanagh
Bad code happens. And when it does, developers often spend far too much time trying to find and fix the error. Debugging is a common solution, but in a complex environment, running multiple applications on multiple platforms, it can be easier said than done. Developers need instant visibility across all machines, ultimately leading to faster and higher quality insights. Register for this episode of Hot Technologies to learn from Analyst Dr. Robin Bloor and Data Scientist Dez Blanchfield as they discuss how errant code can inevitably disrupt systems and performance. They’ll be briefed by Bert Scalzo of IDERA, who will explain how his company’s Rapid SQL can facilitate the debugging and profiling of stored procedures and functions.
Solving the Really Big Tech Problems with IoTEric Kavanagh
The Briefing Room with Dr. Robin Bloor and HPE Security
The Internet of Things brings new technological problems: sensor communications are bi-directional, the scale of data generation points has no precedent and, in this new world, security, privacy and data protection need to go out to the edge. Likely, most of that data lands in Hadoop and Big Data platforms. With the need for rapid analytics never greater, companies try to seize opportunities in tighter time windows. Yet, cyber-threats are at an all-time high, targeting the most valuable of assets—the data.
Register for this episode of The Briefing Room to hear Analyst Dr. Robin Bloor explain the implications of today's divergent data forces. He’ll be briefed by Reiner Kappenberger of HPE, who will discuss how a recent innovation -- NiFi -- is revolutionizing the big data ecosystem. He’ll explain how this technology dramatically simplifies data flow design, enabling a new era of business-driven analysis, while also protecting sensitive data.
Beyond the Platform: Enabling Fluid AnalysisEric Kavanagh
When the analysts aren’t happy, no one is happy. That’s because these days, practically every aspect of the business is driven by insights. And because information architectures are increasingly complex, any number of issues can cause a slowdown in queries, or even basic reporting. How can your organization ensure that all systems are go?
Register for this episode of The Briefing Room to learn from veteran Analyst Dr. Robin Bloor as he explains the common roadblocks to successful BI and analytics. He'll be briefed by Stan Geiger of IDERA, who previously demonstrated how his company’s SQL BI Manager can optimize platform health and performance. In this episode, he will dive deeper into how IDERA’s solution resolves resource constraints, user activity and capacity issues, making tiresome troubleshooting a thing of the past.
Protect Your Database: High Availability for High Demand DataEric Kavanagh
Hot Technologies with Dr. Robin Bloor, Dez Blanchfield and IDERA
Your company’s data is mission-critical. While protecting it from outside attack or catastrophe has become a standard business requirement, it’s not enough these days to rely solely on simple backup and recovery techniques. Today’s enterprise requires high availability and uninterrupted operational performance, meaning the DBA toolbox must provide more than traditional solutions.
Register for this episode of Hot Technologies to hear from Analyst Dr. Robin Bloor and Data Scientist Dez Blanchfield as they discuss the necessary components of a modern solution architecture. They’ll be briefed by IDERA’s Oracle ACE Bert Scalzo, who will explain some innovative options for ensuring high availability in a demanding database environment.
A Better Understanding: Solving Business Challenges with DataEric Kavanagh
Good decisions make great companies. That's why the data-driven mantra keeps gaining momentum. Increasingly, smart business people are taking a data-first approach for both strategic planning and tactical decision-making. They spend ample time exploring their data to better understand their options. In doing so, they capitalize on real opportunities, while avoiding low-value projects.
The Briefing Room with Dr. Robin Bloor and Experian
Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor explain why a data-first mindset can help companies optimize their resources and thus make better decisions. He'll be briefed by Rishi Patel and Erin Haselkorn of Experian, who will showcase Experian Pandora, which enables the kind of discovery that businesses need to better understand their data. They'll explain how Pandora can help professionals build a business case for their ideas and plans.
The Key to Effective Analytics: Fast-Returning QueriesEric Kavanagh
The best business analysts understand the value of having a "conversation" with their data. The idea is that they can pose queries, examine results, then quickly modify their questions to home in on a desired answer. This kind of iterative process creates a fluid environment that is highly conducive for identifying meaningful patterns in data. Register for this episode of Hot Technologies to hear Bloor Group Chief Analyst Dr. Robin Bloor and Data Scientist Dez Blanchfield as they outline why fluid analytics should be the norm and which hurdles still stand in the way. They'll be briefed by Bullett Manale of IDERA who will demonstrate his company's diagnostic platform for analytics. He'll provide context, and also deliver a demo that shows real-world solutions that enable iterative analytics.
A Tight Ship: How Containers and SDS Optimize the EnterpriseEric Kavanagh
The Briefing Room with Dez Blanchfield and Red Hat
Think of containers as the drones of modern computing. They're small, agile, and can carry a significant payload. In many ways, they represent the fruition of the last two major paradigm shifts in enterprise software: SOA and virtualization. However, for companies to fully leverage this innovative approach, a persistent storage platform is needed that is as flexible and scalable as containers themselves.
Register for this episode of The Briefing Room to hear Bloor Group Data Scientist Dez Blanchfield, who will explain the significance of container technology, and the relevance of software-defined storage (SDS) in a constantly evolving IT world. He'll be briefed by Steve Watt and Sayan Saha of Red Hat, who will demonstrate how open-source technology can help organizations take advantage of this brave new world of enterprise computing. They will explain how containers are the next step in the evolution of the operating system, and why SDS is now the optimal solution.
Application Acceleration: Faster Performance for End Users Eric Kavanagh
Hot Technologies with Dr. Robin Bloor, Dez Blanchfield and IDERA
Application performance issues impact end users the hardest, and too often, IT doesn’t know about it until after the fact. With many applications served by a variety of disparate technologies, troubleshooting bottlenecks can be onerous and time consuming, ultimately causing frustration and missed SLAs. How can IT quickly discover what process affected SQL execution time and keep end users focused on the bottom line?
Register for this episode of Hot Technologies to learn from Analyst Dr. Robin Bloor and Data Scientist Dez Blanchfield as they discuss the complexities of the data pipeline. They’ll be briefed by Bill Ellis of IDERA, who will explain the importance of identifying and resolving the root cause of performance problems. He’ll show how IDERA’s Precise Application Performance Platform can isolate transactions and usage patterns, thus giving IT the necessary tools to provide a consistent end user experience.
Time's Up! Getting Value from Big Data NowEric Kavanagh
The Briefing Room with Dr. Robin Bloor and CASK
We all know the promise of big data, but who gets the value? There are plenty of success stories already, and most of them involve one key ingredient: facilitated access to important data sets. Most research studies suggest that the Pareto principle applies: 80 percent of the effort goes to data integration, and only 20 percent to analysis. Inverting that balance is the Holy Grail.
Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor explain why the time has finally come for turning the tables on the status quo in analytics. He'll be briefed by CASK CEO Jonathan Gray, who will showcase his company's big data integration platform, CDAP, which was specifically designed to expedite time-to-value for big data.
First Steps with Globus Compute Multi-User EndpointsGlobus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
Experience our free, in-depth three-part Tendenci Platform Corporate Membership Management workshop series! In Session 1 on May 14th, 2024, we began with an Introduction and Setup, mastering the configuration of your Corporate Membership Module settings to establish membership types, applications, and more. Then, on May 16th, 2024, in Session 2, we focused on binding individual members to a Corporate Membership and Corporate Reps, teaching you how to add individual members and assign Corporate Representatives to manage dues, renewals, and associated members. Finally, on May 28th, 2024, in Session 3, we covered questions and concerns, addressing any queries or issues you may have.
For more Tendenci AMS events, check out www.tendenci.com/events
An Enterprise Resource Planning system includes various modules that reduce any business's workload. Additionally, it organizes workflows, which enhances productivity. Here is a detailed explanation of the ERP modules. Going through the points will help you understand how the software is changing work dynamics.
To know more details here: https://blogs.nyggs.com/nyggs/enterprise-resource-planning-erp-system-modules/
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Globus
The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its management, curation, sharing, delivering, and preservation approaches for large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows leveraging its current investment in Globus. The primary outcome of this partnership includes the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of the Globus platform offerings, including Globus Flow, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed and provide relevant project progress.
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Mind IT Systems
Healthcare providers often struggle with the complexities of chronic conditions and remote patient monitoring, as each patient requires personalized care and ongoing monitoring. Off-the-shelf solutions may not meet these diverse needs, leading to inefficiencies and gaps in care. It’s here, custom healthcare software offers a tailored solution, ensuring improved care and effectiveness.
Developing Distributed High-performance Computing Capabilities of an Open Sci...Globus
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdfJay Das
With the advent of artificial intelligence (AI) tools, project management processes are undergoing a transformative shift. By using tools like ChatGPT and Bard, organizations can empower their leaders and managers to plan, execute, and monitor projects more effectively.
May Marketo Masterclass, London MUG May 22 2024.pdfAdele Miller
Can't make Adobe Summit in Vegas? No sweat because the EMEA Marketo Engage Champions are coming to London to share their Summit sessions, insights and more!
This is a MUG with a twist you don't want to miss.
A Comprehensive Look at Generative AI in Retail App Testing.pdfkalichargn70th171
Traditional software testing methods are being challenged in retail, where customer expectations and technological advancements continually shape the landscape. Enter generative AI—a transformative subset of artificial intelligence technologies poised to revolutionize software testing.
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...informapgpstrackings
Keep tabs on your field staff effortlessly with Informap Technology Centre LLC. Real-time tracking, task assignment, and smart features for efficient management. Request a live demo today!
For more details, visit us : https://informapuae.com/field-staff-tracking/
Into the Box Keynote Day 2: Unveiling amazing updates and announcements for modern CFML developers! Get ready for exciting releases and updates on Ortus tools and products. Stay tuned for cutting-edge innovations designed to boost your productivity.
top nidhi software solution freedownloadvrstrong314
This presentation emphasizes the importance of data security and legal compliance for Nidhi companies in India. It highlights how online Nidhi software solutions, like Vector Nidhi Software, offer advanced features tailored to these needs. Key aspects include encryption, access controls, and audit trails to ensure data security. The software complies with regulatory guidelines from the MCA and RBI and adheres to Nidhi Rules, 2014. With customizable, user-friendly interfaces and real-time features, these Nidhi software solutions enhance efficiency, support growth, and provide exceptional member services. The presentation concludes with contact information for further inquiries.
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Globus
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
How Recreation Management Software Can Streamline Your Operations.pptxwottaspaceseo
Recreation management software streamlines operations by automating key tasks such as scheduling, registration, and payment processing, reducing manual workload and errors. It provides centralized management of facilities, classes, and events, ensuring efficient resource allocation and facility usage. The software offers user-friendly online portals for easy access to bookings and program information, enhancing customer experience. Real-time reporting and data analytics deliver insights into attendance and preferences, aiding in strategic decision-making. Additionally, effective communication tools keep participants and staff informed with timely updates. Overall, recreation management software enhances efficiency, improves service delivery, and boosts customer satisfaction.
Quarkus Hidden and Forbidden ExtensionsMax Andersen
Quarkus has a vast extension ecosystem and is known for its supersonic and subatomic feature set. Some of these features are not as well known, and some extensions are less talked about, but that does not make them less interesting - quite the opposite.
Come join this talk to see some tips and tricks for using Quarkus and some of the lesser known features, extensions and development techniques.
Code reviews are vital for ensuring good code quality. They serve as one of our last lines of defense against bugs and subpar code reaching production.
Yet, they often turn into annoying tasks riddled with frustration, hostility, unclear feedback and lack of standards. How can we improve this crucial process?
In this session we will cover:
- The Art of Effective Code Reviews
- Streamlining the Review Process
- Elevating Reviews with Automated Tools
By the end of this presentation, you'll have the knowledge to organize and improve your code review process
3. Database Disruption
The forces of nature often converge to transform the very foundations of our infrastructure. In the database landscape, recent developments have resulted in a massive transformation of the DBMS market. Understanding your requirements is key to success these days.
6. Database Fundamentals
- Built for a collection of resources – which could be engineered for the application
- Shares data among multiple concurrent users
- Optimizes performance
- Handles resilience
- Provides ACID properties to some degree
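The last fundamental, ACID "to some degree", can be seen in miniature with Python's stdlib `sqlite3`: a failure mid-transaction rolls back every statement in that transaction (atomicity), leaving the data consistent. The account table here is a toy example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # the context manager wraps one atomic transaction
        conn.execute(
            "UPDATE accounts SET balance = balance - 50 WHERE name = 'alice'")
        # Simulated crash before the matching credit lands:
        raise RuntimeError("crash mid-transfer")
except RuntimeError:
    pass

# Atomicity: the debit was rolled back along with the failed transfer,
# so no money has vanished from the system.
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

Databases differ in how far they take the remaining letters, particularly isolation levels and durability guarantees, which is exactly why the slide hedges with "to some degree".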
8. Hardware Factors
- CPUs, GPUs & FPGAs
- Cross breeding
- 3D XPoint and PCM (and Memristor?)
- SSDs & parallel access
- Parallel hardware architectures
Performance is accelerating and costs continue to fall.
9. The Cloud
- A cloud database is no different from an on-prem one, in theory
- Most databases are now available in the cloud
- Some databases are cloud-focused (Snowflake, Redshift)
- Some are hybrid (NuoDB is a good example)
10. Data Growth
Corporate Databases
+ Unstructured Data
+ Partner & Customer Data
+ Web Data
+ Social Network Data
+ Streaming Data
+ IoT Data
+ Personal Data
+ Log File Data
Data growth is roughly 55% per annum. It always has been.
11. The Global Map and Data Options
- Move the data to the processing
- Move the processing to the data
- Move the processing and the data
- Shard
There will not be a single physical database (or data lake), for a multitude of reasons.
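The "shard" option above can be sketched minimally: route each record to one of several databases by hashing its key. This is an illustrative assumption, not a scheme from the deck; the shard names and the modulo routing are hypothetical.

```python
# Minimal sketch of hash-based sharding: each key is routed
# deterministically to one of a fixed set of shards.
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]  # hypothetical names

def shard_for(key: str) -> str:
    """Pick a shard deterministically from the key's hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(SHARDS)
    return SHARDS[index]
```

Because the routing is a pure function of the key, every node agrees on where a record lives without coordination; real systems typically use consistent hashing so that adding a shard does not remap every key.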
13. Everything in Flux
- Hardware (network, storage, servers)
- Data Sources
- Data Staging
- Data Volumes
- Data Flow
- Data Governance
- Query Languages
- Data Usage
- Data Structures
- Schema Definition
- Ingest Speeds
- Data Workloads
- Applications
14. NoSQL Confusion
As the graph indicates, there is some overlap between SQL databases and other databases. What to choose is a use-case-driven decision. There never was a "universal database", and probably there never will be.
15. NoSQL World
- Some NDBMS do not attempt to provide all ACID properties.
- Some NDBMS use a distributed scale-out architecture with data redundancy.
- XML DBMS using XQuery are NDBMS.
- Some document stores are NDBMS.
- Object databases are NDBMS (GemStone, Objectivity, ObjectStore, etc.).
- Key-value stores are NDBMS.
- Graph DBMS are NDBMS.
- Large data pools (BigTable, HBase, Mnesia, etc.) are NDBMS.
17. SQL Merits and Demerits
- SQL: very good for set manipulation.
- Works for OLTP and many query environments.
- Not good for nested data structures (documents, web pages, etc.).
- Not good for ordered data sets.
- Not good for data graphs (networks of values).
Not a Swiss Army Knife!
18. The Impedance Mismatch
- The RDBMS stores data organized according to table structures.
- The OO programmer manipulates data organized according to complex object structures, which may have specific methods associated with them.
- The data does not simply map to the structure it has within the database.
- Consequently, a mapping activity is necessary to get and put data.
- Basically: hierarchies, types, result sets, crappy APIs, language bindings, tools.
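The mapping activity described above can be sketched in a few lines: flat, table-shaped rows on one side, a nested object with behavior on the other, and hand-written glue between them. The table and field names here are illustrative assumptions.

```python
# Sketch of the object-relational mapping activity: flat rows in,
# a nested object graph (with methods) out.
from dataclasses import dataclass, field

@dataclass
class OrderLine:
    sku: str
    qty: int

@dataclass
class Order:
    order_id: int
    lines: list = field(default_factory=list)

    def total_items(self) -> int:
        # Behavior lives on the object, not in the table.
        return sum(line.qty for line in self.lines)

def rows_to_order(order_row: dict, line_rows: list) -> Order:
    """Hand-written mapping from table-shaped rows to an object."""
    order = Order(order_id=order_row["id"])
    for r in line_rows:
        order.lines.append(OrderLine(sku=r["sku"], qty=r["qty"]))
    return order
```

ORM frameworks automate exactly this glue, which is why they exist and why, per the slide, hierarchies, types and language bindings remain the friction points.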
19. The SQL Barrier
- SQL has:
  - DDL (for data definition)
  - DML (for Select, Project and Join)
  - But it has little MML (math) or TML (time)
- Usually result sets are brought to the client for further analytical manipulation, but this creates problems.
- Alternatively, doing all analytical manipulation in the database creates problems.
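The first pattern above, fetching the result set and doing the math on the client, can be sketched as follows. The fetch function is a stand-in assumption, not a real database call.

```python
# Sketch of client-side analytics over a SQL result set: the query
# engine returns rows; the mathematical manipulation (which SQL has
# little vocabulary for) happens in the client program.
import statistics

def fetch_result_set() -> list:
    # Stand-in for a SELECT issued through a database driver.
    return [10.0, 12.5, 11.0, 13.5]

def client_side_stats(rows: list) -> dict:
    """Analytical manipulation performed outside the database."""
    return {"mean": statistics.mean(rows),
            "stdev": statistics.stdev(rows)}
```

The problem the slide alludes to is that every row crosses the network before any math happens, which is exactly what breaks down at analytic scale.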
21. Database Mismatch
A key problem is that we talk mostly about computation over data when we talk about "big data" and analytics, a potential mismatch for both relational and NoSQL.
22. Database Workload Parameters
- Read-intensive vs. write-intensive
- Mutable vs. immutable data
- Immediate vs. eventual consistency
- Short vs. long data latency
- Predictable vs. unpredictable data access patterns
- Simple vs. complex data types
23. Horses for Courses
- Relational row-store databases for conventionally tooled low- to mid-scale OLTP
- Relational databases for ACID requirements
- Parallel databases (row or column) for unpredictable or variable query workloads
- Specialized databases for complex data query workloads
- NoSQL (KVS, DHT) for high-scale OLTP
- NoSQL (KVS, DHT) for low-latency, read-mostly data access
- Parallel databases (row or column) for analytic workloads over tabular data
- NoSQL / Hadoop for batch analytic workloads over large data volumes
24. Database Tools: A Call-Out
- Have you noticed how databases are not self-running?
- DBAs are in short supply, and the need for them is increasing.
- Database diversity doesn't help in this area.
- DBA tools:
  - SQL analysis
  - Performance analysis
  - Security management
  - Capacity planning
  - Database deployment
- We meet the same problem with data lakes, except that there are very few tools.
25. The Impact of Parallelism
We used to see a 10x performance improvement every 6 years; now we regularly see 1000x (and that's just an approximation).
27. The Perfect Storm: The Data Lake
- The triumph of Open Source as a business model
- The dominance of Apache:
  - Hadoop, the platform for data
  - Spark, for speed
  - Kafka & NiFi for data flow
- The triumph of the cloud and its dominance
- Cost collapse
28. The Primary Role of the Data Lake
- System of record
- Data governance
- Application platform
29. The Evolved Conception
[Diagram: static data sources and data streams are ingested into the data lake, which sits under data governance and data lake management, feeding analytics/BI apps and ETL extracts out to databases, data marts and other apps.]
- Static data and data streams
- Real-time data ingest
- Data Governance
- Data Lake Mgt
- Analytics & BI
- Extracts
The data lake becomes the system of record.
31. The Full Picture
[Diagram: the data lake at the center, surrounded by ingest, data cleansing, data security, metadata mgt, transform & aggregate, search & query, real-time apps, BI/visualization & analytics, and other apps; data governance and data lake mgt span the whole; archive and life-cycle mgt handle extracts out to databases, data marts and other apps.]
Sources: servers, desktops, mobile, network devices, embedded chips, RFID, IoT, the cloud, OSes, VMs, log files, sys mgt apps, ESBs, web services, SaaS, business apps, office apps, BI apps, workflow, data streams, social...
32. Data Governance
If data governance was important before Big Data (and it was), it is far more important in the era of data lakes.
33. Data Governance
- System of record
- Data provenance & lineage
- Data cleansing
- Data security
- Data compliance
- Data integrity
- Data audit record
- Data life-cycle mgt
- Data meaning
Data governance is a perpetual process.
35. Events: Atoms and Molecules
A TRANSACTION is a MOLECULE of ATOMIC EVENTS. The ATOM of data has become the EVENT.
36. Events
Think of events as drops of water. They can live in streams, and they can also live in data pools, data lakes and databases.
37. Event Types
- Instantiation Event
- A State Report
- A Trigger Event
- A Correction Event
We also need to consider:
- Data refinement
- Aggregations
- Homogeneous collections
- Derived data
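The four event types above could be represented as a tagged record; a minimal sketch, in which the enum values and field names are assumptions rather than anything from the deck:

```python
# Sketch of the deck's four event types as a tagged, immutable record.
from dataclasses import dataclass
from enum import Enum

class EventType(Enum):
    INSTANTIATION = "instantiation"  # something came into existence
    STATE_REPORT = "state_report"    # a periodic report of state
    TRIGGER = "trigger"              # something that demands action
    CORRECTION = "correction"        # amends an earlier event

@dataclass(frozen=True)
class Event:
    event_type: EventType
    source: str
    payload: dict
```

Making the record immutable (`frozen=True`) matches the molecule/atom framing: an event, once emitted, is never edited; a Correction event is appended instead.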
38. Event-Based IoT Architecture
- The pulse and the threshold alert
- Some of this involves distributed processing
- There are known apps and unknown apps, so analytical exploration needs to be enabled
- Only aggregations will migrate
[Diagram: sensors, controllers and CPUs feed data to depots for source and depot processing, which in turn feed a central hub for central processing.]
39. Self-Defining Data
- Time
- Geographic location
- Virtual/logical location
- Source device & SW
- Device ID
- Derivation (if derived)
- Creator
- Owner
- Permissions
- Status (for replication)
- Metadata
- Audit trail
- Archive flag
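The self-defining data idea above can be sketched as an envelope type that carries a subset of these descriptive fields alongside the payload. The field names and the completeness check are illustrative assumptions.

```python
# Sketch of a "self-defining" event envelope: the record carries its
# own descriptive context (who, where, which device) with the payload.
from dataclasses import dataclass, asdict

@dataclass
class EventEnvelope:
    timestamp: str      # Time
    geo_location: str   # Geographic location
    device_id: str      # Device ID
    creator: str
    owner: str
    permissions: str
    payload: dict

    def describes_itself(self) -> bool:
        """True only if every descriptive field is populated."""
        return all(v not in ("", None) for v in asdict(self).values())
```

A consumer that receives such an envelope needs no out-of-band lookup to know where the data came from or who may read it, which is the point of self-defining data.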