Datavail and SlamData present on how to use NoSQL technologies (MongoDB and SlamData) to build a Data Hub -- the fast and easy way to real-time business insight.
Overview of the SlamData open source project for modern data analytics. SlamData allows users to run ordinary SQL queries on modern NoSQL data like JSON. Currently we support MongoDB, with plans to support other NoSQL datastores including Cassandra and Hadoop. Our project opens up modern NoSQL data to anyone with basic SQL skills.
Webinar | Real-time Analytics for Healthcare: How Amara Turned Big Data into ... (DataStax)
Increasing regulations on patient data, expanding and ever-changing data volumes and formats, and the need for real-time analytics are adding new levels of complexity to database platforms, forcing Healthcare IT management to rethink legacy database environments.
Join Christopher Rosin, Ph.D., Chief Scientist at Amara Health Analytics, as he shares his knowledge about implementing a real-time predictive analytics platform to support clinicians in the early detection of critical disease states. Based on years of research and hands-on experience, Chris provides practical steps for guiding DataStax Enterprise initiatives from evaluation to successful implementation.
Watch to learn:
- The challenges in selecting the right database technology for dynamic, real-time data without the rigidity of relational systems
- How Amara’s SaaS model delivers real-time decision support, leveraging large amounts of unstructured and structured clinical data
- How DataStax Enterprise helps meet strict requirements on patient data privacy, data integrity, and system performance
Rethink Analytics with an Enterprise Data Hub (Cloudera, Inc.)
Have you run into one or more of the following barriers or limitations with your existing data warehousing architecture:
> Increasingly high data storage and/or processing costs?
> Silos of data sources?
> Complexity of management and security?
> Lack of analytics agility?
Enterprise Data Hub: The Next Big Thing in Big Data (Cloudera, Inc.)
If you missed Strata + Hadoop World, you missed quite a bit. This year's event was packed with Big Data practitioners across industries who shared their experiences and how they are driving new innovations like never before. Just because you weren't there, doesn't mean you missed out.
In this session, we'll touch on a few of the key highlights from the show, including:
Key trends in Big Data adoption
The enterprise data hub
How the enterprise data hub is used in practice
How to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost (AtScale)
Being able to analyze sales at the most granular level with up-to-date data provides a competitive advantage for unlocking additional revenue -- especially for e-commerce and retail companies heading into the holiday season.
Webinar - Bringing connected graph data to Cassandra with DSE Graph (DataStax)
For today’s always-connected customer, modern digital cloud applications need to manage highly connected data with seemingly endless data relationships. DataStax Enterprise with DSE Graph is the only distributed data platform able to support the transactional and analytical complex data relationships contained in such systems. Learn how DSE Graph can support your highly connected systems and answer questions such as: How do my customers interact with my business? Where is the bottleneck in my supply chain? What recommendation makes the most sense for my customer in a particular moment?
View recording: https://youtu.be/7R_axClTWnc
Explore all DataStax webinars: http://www.datastax.com/resources/webinars
Designing a Distributed Cloud Database for Dummies (DataStax)
Join Designing a Distributed Cloud Database for Dummies - the webinar. The webinar “stars” industry vet Patrick McFadin, best known among developers for his seven years with Apache Cassandra, where he held pivotal community roles. Register for the webinar today to learn why you need distributed cloud databases, the technology you need to create the best user experience, the benefits of data autonomy, and much more.
View the recording: https://youtu.be/azC7lB0QU7E
To explore all DataStax webinars: https://www.datastax.com/resources/webinars
Webinar: DataStax Managed Cloud: focus on innovation, not administration (DataStax)
Apache Cassandra was built for the cloud, ready both for its seemingly endless elasticity and to avoid disruption from the outages that have become all too common. DataStax Managed Cloud helps you take full advantage of what the cloud has to offer while removing the overhead and complexity of managing operations.
View recording: https://youtu.be/JI7R3CwIw54
Explore all DataStax webinars: http://www.datastax.com/resources/webinars
Better Together: The New Data Management Orchestra (Cloudera, Inc.)
Ingesting, storing, processing, and leveraging big data for maximum business impact requires integrating systems, processing frameworks, and analytic deployment options. Learn how Cloudera’s enterprise data hub framework, MongoDB, and the Teradata Data Warehouse working in concert can enable companies to explore data in new ways and solve problems that not long ago might have seemed impossible.
Gone are the days of NoSQL and SQL competing for center stage. Visionary companies are driving data subsystems to operate in harmony. So what’s changed?
In this webinar, you will hear from executives at Cloudera, Teradata and MongoDB about the following:
How to deploy the right mix of tools and technology to become a data-driven organization
Examples of three major data management systems working together
Real world examples of how business and IT are benefiting from the sum of the parts
Join industry leaders Charles Zedlewski, Chris Twogood and Kelly Stirman for this unique panel discussion, moderated by BI Research analyst, Colin White.
Data volumes have experienced explosive growth in recent years, and that data is being generated from sources that are increasingly complex and varied. Harnessing and refining value from this data requires a new approach, as data extraction, transformation, and loading (ETL) becomes increasingly costly and difficult to scale.
Organizations are looking to leverage Hadoop as an enterprise data hub—also called a “data lake” or “data reservoir”—as a key component of their data architecture to augment their data warehouse, ETL and analytical systems in order to maximize their existing investments, reduce costs, and unlock new business value from their data.
In this webinar, you will learn:
Real-world examples that illustrate why Hadoop is the best low-cost data hub, data lake, or data landing zone (staging area) option for ETL processing
Proof points that demonstrate advantages of Hadoop and its ability to scale to manage increasing data volumes and support exploratory big data analytics
Proven best practices for a cost-effective, reliable way to implement a data management platform for your entire big data analytical ecosystem
Hidden issues to be aware of in deploying your data hub/data lake
A few months back I spoke with some graduate students about "what is data warehousing". In this talk I covered the past, present, and probably future of what data warehousing is and how it can add value to a company.
Webinar: Transforming Customer Experience Through an Always-On Data Platform (DataStax)
According to Forrester Research, leaders in customer experience drive 5.1X revenue growth over laggards. And although 84% of companies aspire to be a leader in this space, only 1 in 5 successfully delivers good or great customer experience. Join us for our next webinar where Mike Gualtieri, VP and Principal Analyst at Forrester Research and Rajay Rai, Head of Digital Engineering at Macquarie Bank will share how Customer Experience can drive business results such as faster revenue growth, longer customer retention, greater employee engagement and improved profit margins.
View webinar recording: https://youtu.be/eEc5tx-nHvI
Explore past DataStax webinars: http://www.datastax.com/resources/webinars
Webinar - The Agility Challenge - Powering Cloud Apps with Multi-Model & Mixe... (DataStax)
Building and managing cloud applications is not easy. Teams come face to face with these challenges: agility, manageability, performance, scalability, continuous availability and of course, security. Join us for “The Agility Challenge: Powering Cloud Applications with Multi-Model & Mixed Workloads” webinar where we will deep dive into challenges customers face with multiple data models such as graph, mixed workloads and how DataStax Enterprise can help.
Video: https://youtu.be/1tKDxkexzFE
Webinar - Delivering Enhanced Message Processing at Scale With an Always-on D... (DataStax)
Managing 3.8 million e-prescriptions daily for more than 1 million healthcare professionals is no small feat. And, with rapid growth in the number of digital transactions and expansion of its network, Surescripts needed to replace its legacy relational database system to address a new set of data management challenges while meeting their customers’ demanding SLAs. Join us for this on-demand webinar to hear from Keith Willard, Chief Architect at Surescripts, to learn how and why Surescripts leverages DataStax Enterprise to deliver enhanced message processing at scale.
View recording: https://youtu.be/1T6V1XAoaJQ
Explore all DataStax webinars: https://www.datastax.com/resources/webinars
Webinar: Proofpoint, a pioneer in security-as-a-service protects people, info... (DataStax)
Proofpoint is a visionary leader in social media, email and mobile security and compliance. Their technology consumes and correlates billions of events every day across all of these communication channels to detect bad actors in real-time and deliver deep threat protection and intelligence. Proofpoint uses DataStax Enterprise (DSE) as a key piece of their platform to deliver industry leading security and compliance solutions. In this webinar, VP of Engineering at Proofpoint, Rich Sutton will share the use cases they’ve deployed on DataStax Enterprise, highlight the problems solved and outcomes achieved and share their journey into the world of NoSQL.
Video recording: https://youtu.be/ro-Kc1VUjrQ
Evolution from Apache Hadoop to the Enterprise Data Hub by Cloudera - ArabNet... (ArabNet ME)
A new foundation for the Modern Information Architecture.
Speaker: Amr Awadallah, CTO & Cofounder, Cloudera
Our legacy information architecture is not able to cope with the realities of today's business: it cannot scale to meet our SLAs (due to the separation of storage and compute), economically store the volumes and types of data we currently confront, provide the agility necessary for innovation, or, most importantly, provide a full 360-degree view of our customers, products, and business. In this talk Dr. Amr Awadallah will present the Enterprise Data Hub (EDH) as the new foundation for the modern information architecture. Built with Apache Hadoop at the core, the EDH is an extremely scalable, flexible, and fault-tolerant data processing system designed to put data at the center of your business.
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl... (Cloudera, Inc.)
Across all industries, organizations are embracing the promise of Apache Hadoop to store and analyze data of all types, at larger volumes than ever before possible. But to tap into the true value of this data, organizations need to manage this data and its subsequent metadata to understand its context, see how it’s changing, and take actions on it.
Cloudera Navigator is the only integrated data management and governance solution for Hadoop and is designed to do exactly this. With Cloudera 5.7, we have further expanded the capabilities in Cloudera Navigator to make it even easier to understand your data and maintain metadata consistency as it moves through Hadoop.
Building a Modern Analytic Database with Cloudera 5.8 (Cloudera, Inc.)
Analytic workloads and the ability to determine “what happened” are some of the most common use cases across enterprises today, helping you understand and adapt to changing trends. However, most businesses today can see only a piece of the story. Analytics are limited by the amount of data that can be stored and ultimately accessed, it is time-intensive to bring in new datasets or fit unstructured data into rigid schemas, and user access is constrained to a select few who must already know the questions they’re trying to answer.
It’s no surprise that big data is disrupting this modus operandi for analytics. A modern, Hadoop-based platform is designed to help businesses break free of these analytic limitations, providing a new kind of adaptive, high-performance analytic database. The recent release of Cloudera 5.8 continues to advance Cloudera Enterprise as the foundation for these analytic workloads.
Join Justin Erickson, Senior Director of Product Management at Cloudera, and Andy Frey, Chief Technology Officer at Marketing Associates, as they discuss:
-What technology is needed to build a modern analytic database with Hadoop
-What’s new with Cloudera 5.8
-How to align your teams around agile analytics
-Real world success from Marketing Associates
-What’s next for Cloudera Enterprise’s Analytic Database
Emergence of MongoDB as an Enterprise Data Hub (MongoDB)
Emergence of MongoDB as an Enterprise Data Hub, presented by Dylan Tong, Sr. Solutions Architect, MongoDB at MongoDB Evenings Seattle at the Seattle Public Library on October 6, 2015.
In this document, we will present a very brief introduction to Big Data (what is Big Data?), Hadoop (how does Hadoop fit into the picture?) and Cloudera Hadoop (what is the difference between Cloudera Hadoop and regular Hadoop?).
Please note that this document is for Hadoop beginners looking for a place to start.
Webinar: Comparing DataStax Enterprise with Open Source Apache Cassandra (DataStax)
Apache Cassandra is the open source database technology that pioneered distributed data at scale. DataStax Enterprise, powered by the best distribution of Apache Cassandra, gives you up to 2x better compaction throughput, 3x better operational analytics performance, ease-of-use, and a secure, comprehensive multi-model data platform including search and operational analytics integrated with Cassandra to help you take on whatever challenges you might face along the way.
View recording: https://youtu.be/qLJyFydE-uY
Explore all DataStax webinars: http://www.datastax.com/resources/webinars
5 Ways to Use Spark to Enrich your Cassandra Environment (Jim Hatcher)
Apache Cassandra is a powerful system for supporting large-scale, low-latency data systems, but it has some tradeoffs. Apache Spark can help fill those gaps, and this presentation will show you how.
How you can gain rapid insights and create more flexibility by capturing and storing data from a variety of sources and structures into a NoSQL database.
C* for Deep Learning (Andrew Jefferson, Tractable) | Cassandra Summit 2016 (DataStax)
A deep learning startup has a requirement for a robust and scalable data architecture. Training a Deep Neural Network requires 10s-100s of millions of examples consisting of data and metadata. In addition to training it is necessary to support test/validation, data exploration and more traditional data science analytics workloads. As a startup we have minimal resources and an engineering team of 1.
Cassandra, Spark and Kafka running on Mesos in AWS is a scalable architecture that is fast and easy to set up and maintain to deliver a data architecture for Deep Learning.
About the Speaker
Andrew Jefferson VP Engineering, Tractable
A software engineer specialising in realtime data systems. I've worked at companies from Startups to Apple on applications ranging from Ticketing to Genetics. Currently building data systems for training and exploiting Deep Neural Networks.
The current Hadoop ecosystem is challenged and slowed by fragmented and duplicated efforts.
An industry standard is required that translates to immediate benefits: increased stability, capabilities, and compatibility among Hadoop distributions. It's also important to include an open data management core with an emphasis on making it enterprise focused.
The ODPi is a shared industry effort focused on building such standards and on promoting and advancing the state of Big Data technologies. Linaro is actively involved in this effort, including making sure ODPi is ARM compatible.
This talk will go over some of the specifications defined, Linaro's contributions, the roadmap, and a quick demo.
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016 (DataStax)
Most web applications start out with a Postgres database, and it serves the application very well for an extended period of time. Depending on the type of application, the app's data model will have a table that tracks some kind of state for either objects in the system or the users of the application. Names for this table include logs, messages, or events. The growth in the number of rows in this table is not linear as traffic to the app increases; it's typically exponential.
Over time, the state table will increasingly become the bulk of the data volume in Postgres, think terabytes, and become increasingly hard to query. This use case can be characterized as the one-big-table problem. In this situation, it makes sense to move that table out of Postgres and into Cassandra. This talk will walk through the conceptual differences between the two systems, a bit of data modeling, as well as advice on making the conversion.
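The talk itself is not reproduced in this listing, but to give the "bit of data modeling" a concrete shape: a common target for the one big table of logs/messages/events is a Cassandra table partitioned by the owning entity plus a time bucket, so partitions stay bounded and queries stay cheap. Below is a hedged sketch using the DataStax Python driver; the keyspace, table, and column names are invented for illustration and assume a local single-node Cassandra.

# Sketch: a time-bucketed events table in Cassandra, a typical landing place for
# the "one big table" of events/logs/messages described above. Names are invented;
# assumes a local Cassandra node and the cassandra-driver package.
import uuid
from datetime import datetime, timezone

from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect()
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS app
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS app.events (
        user_id uuid,
        day     date,        -- time bucket keeps partitions a manageable size
        ts      timestamp,
        kind    text,
        payload text,
        PRIMARY KEY ((user_id, day), ts)
    ) WITH CLUSTERING ORDER BY (ts DESC)
""")

now = datetime.now(timezone.utc)
session.execute(
    "INSERT INTO app.events (user_id, day, ts, kind, payload) VALUES (%s, %s, %s, %s, %s)",
    (uuid.uuid4(), now.date(), now, "page_view", '{"path": "/cart"}'),
)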
About the Speaker
Rimas Silkaitis Product Manager, Heroku
Rimas currently runs Product for Heroku Postgres and Heroku Redis but the common thread throughout his career is data. From data analysis, building data warehouses and ultimately building data products, he's held various positions that have allowed him to see the challenges of working with data at all levels of an organization. This experience spans the smallest of startups to the biggest enterprises.
Sudarshan Kadambi presented this talk at the Bay Area Spark Meetup @ Bloomberg. He covered the Bloomberg Apache Spark Server and contributions to Apache Spark. The talk also addressed the challenges of doing high-volume online analytics while still meeting strict SLAs.
A short presentation on the architecture of the Masdar Institute of Technology in Abu Dhabi. The presentation was done as a case study for a college project of designing a residential block for the students. The focus here is therefore on the Institute's residential block rather than its other numerous features.
Use of Architectural Elements in Evolution of Traditional Style (SHUBHAM SHARMA)
The research paper I prepared gives a brief idea of the traditional architectural components of Rajasthan. These features can be used as basic components in modern and contemporary architecture to achieve a degree of sustainability.
How to Manage Projects in SharePoint Using Out of the Box Features (Gregory Zelfond)
Learn how you can utilize SharePoint out of the box functionality to manage projects. 3 options are discussed: Office 365 Groups, Document sets and project sites. Also, what's available in terms of PMO-style dashboards and reporting capability.
A Brief History of Information Technology
Databases for Decision Support
OLTP vs. OLAP
Why OLAP & OLTP don’t mix (1)
Organizational Data Flow and Data Storage Components
Loading the Data Warehouse
Characteristics of a Data Warehouse
A Data Warehouse is Subject Oriented
For more visit : http://jsbi.blogspot.com
SharePoint has been on the market since 2001 and has since matured into a very stable and popular business collaboration platform. The beauty of SharePoint is that it is relatively easy to customize and it provides an experience already familiar to users via the Office suite. The most frequent uses of the platform by corporations have been in the areas of web content management, information sharing, and document management.
However, adoption of SharePoint as a true Project Management Information System (PMIS) has been slow. Out-of-the-box SharePoint is unappealing, customization takes time and acceptance at PMO level is often very bureaucratic.
In this presentation I will demonstrate how you can customize SharePoint to help you with your next project. You will walk away learning tips and tricks that you can implement literally in hours. Among other things, you will learn how SharePoint can help you facilitate project team collaboration, integrate existing methodologies and empower your project team.
Cassandra Data Modeling - Practical Considerations @ Netflix (nkorla1share)
The Cassandra community has consistently requested that we cover C* schema design concepts. This presentation goes in depth on the following topics:
- Schema design
- Best Practices
- Capacity Planning
- Real World Examples
Learn about the three advances in database technologies that eliminate the need for star schemas and the resulting maintenance nightmare.
Relational databases in the 1980s were typically designed using the Codd-Date rules for data normalization. It was the most efficient way to store data used in operations. As BI and multi-dimensional analysis became popular, the relational databases began to have performance issues when multiple joins were requested. The development of the star schema was a clever way to get around performance issues and ensure that multi-dimensional queries could be resolved quickly. But this design came with its own set of problems.
Unfortunately, the analytic process is never simple. Business users always think up unimaginable ways to query the data, and the data itself often changes in unpredictable ways. The result is a need for new dimensions, new and mostly redundant star schemas and their indexes, and maintenance difficulties in handling slowly changing dimensions, among other problems. The analytical environment becomes overly complex and very difficult to maintain, with long delays in delivering new capabilities, leaving both the users and those maintaining it unsatisfied.
There must be a better way!
Watch this webinar to learn:
- The three technological advances in data storage that eliminate star schemas
- How these innovations benefit analytical environments
- The steps you will need to take to reap the benefits of being star schema-free
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha... (DATAVERSITY)
Thirty years is a long time for a technology foundation to be as active as relational databases. Are their replacements here? In this webinar, we say no.
Databases have not sat around while Hadoop emerged. The Hadoop era generated a ton of interest and confusion, but is it still relevant as organizations are deploying cloud storage like a kid in a candy store? We’ll discuss what platforms to use for what data. This is a critical decision that can dictate two to five times additional work effort if it’s a bad fit.
Drop the herd mentality. In reality, there is no “one size fits all” right now. We need to make our platform decisions amidst this backdrop.
This webinar will distinguish these analytic deployment options and help you platform 2020 and beyond for success.
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture (DATAVERSITY)
Whether to take data ingestion cycles off the ETL tool and the data warehouse or to facilitate competitive Data Science and building algorithms in the organization, the data lake – a place for unmodeled and vast data – will be provisioned widely in 2020.
Though it doesn’t have to be complicated, the data lake has a few key design points that are critical, and it does need to follow some principles for success. Avoid building the data swamp, but not the data lake! The tool ecosystem is building up around the data lake and soon many will have a robust lake and data warehouse. We will discuss policy to keep them straight, send data to its best platform, and keep users’ confidence up in their data platforms.
Data lakes will be built in cloud object storage. We’ll discuss the options there as well.
Get this data point for your data lake journey.
Original: Lean Data Model Storming for the Agile Enterprise (Daniel Upton)
This original publication, aimed at data project leaders, describes a set of methods for agile modeling and delivery of an enterprise data warehouse, which together make it quicker to deliver, faster to load, and more easily adaptable to unexpected changes in source data, business rules or reporting/analytic requirements.
With this set of methods, the parts of data warehouse development that used to be the most resistant to sprint-sized / agile work breakdown -- data modeling and ETL -- are now completely agile, so that this tasking, too, can now be sized purely based on customer requirements, rather than the dictates of a traditional data warehouse architecture.
Logical Data Fabric and Data Mesh – Driving Business Outcomes (Denodo)
Watch full webinar here: https://buff.ly/3qgGjtA
Presented at TDWI VIRTUAL SUMMIT - Modernizing Data Management
While the technological advances of the past decade have addressed the scale of data processing and data storage, they have failed to address scale in other dimensions: proliferation of sources of data, diversity of data types and user persona, and speed of response to change. The essence of the data mesh and data fabric approaches is that it puts the customer first and focuses on outcomes instead of outputs.
In this session, Saptarshi Sengupta, Senior Director of Product Marketing at Denodo, will address key considerations and provide his insights on why some companies are succeeding with these approaches while others are not.
Watch On-Demand and Learn:
- Why a logical approach is necessary and how it aligns with data fabric and data mesh
- How some of the large enterprises are using logical data fabric and data mesh for their data and analytics needs
- Tips to create a good data management modernization roadmap for your organization
Data Lakehouse, Data Mesh, and Data Fabric (r1) (James Serra)
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. I’ll include use cases so you can see what approach will work best for your big data needs.
1. Introduction to the Course "Designing Data Bases with Advanced Data Models... (Fabio Fumarola)
Information technology has led us into an era where the production, sharing, and use of information are part of everyday life, and in which we are often almost unaware actors: it is now nearly inevitable that many of the actions we take every day leave a digital trail, for example through digital content such as photos, videos, and blog posts, and everything that revolves around social networks (Facebook and Twitter in particular). Added to this, with the "internet of things" we see an increase in devices such as watches, bracelets, thermostats, and many other items that are able to connect to the network and therefore generate large data streams. This explosion of data justifies the birth of the term Big Data: data produced in large quantities, with remarkable speed, and in different formats, which requires processing technologies and resources that go far beyond conventional systems for managing and storing data. It is immediately clear that 1) models of data storage based on the relational model, and 2) processing systems based on stored procedures and computations on grids, are not applicable in these contexts.
As regards point 1, RDBMSs, widely used for a great variety of applications, have problems when the amount of data grows beyond certain limits. Scalability and cost of implementation are only part of the disadvantages: very often, when faced with big data, the variability of the data, or the lack of a fixed structure, is also a significant problem. This has given a boost to the development of NoSQL databases. The website NoSQL Databases defines NoSQL databases as "Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open source and horizontally scalable." These databases are distributed, open source, horizontally scalable, without a predetermined schema (key-value, column-oriented, document-based and graph-based), easily replicable, without ACID guarantees, and able to handle large amounts of data. They are often integrated with processing tools based on the MapReduce paradigm proposed by Google in 2004. MapReduce, together with the open source Hadoop framework, represents the new model for distributed processing of large amounts of data, supplanting techniques based on stored procedures and computational grids (point 2).
The relational model taught in basic database design courses has many limitations compared to the demands posed by new applications based on Big Data, which use NoSQL databases to store data and MapReduce to process large amounts of data.
Course Website http://pbdmng.datatoknowledge.it/
Architecting Agile Data Applications for Scale (Databricks)
Data analytics and reporting platforms have historically been rigid, monolithic, hard to change, and limited in their ability to scale up or scale down. I can’t tell you how many times I have heard a business user ask for something as simple as an additional column in a report, and IT says it will take 6 months to add that column because it doesn’t exist in the data warehouse. As a former DBA, I can tell you about the countless hours I have spent “tuning” SQL queries to hit pre-established SLAs. This talk will cover how to architect modern data and analytics platforms in the cloud to support agility and scalability. We will include topics like end-to-end data pipeline flow, data mesh and data catalogs, live data and streaming, performing advanced analytics, applying agile software development practices like CI/CD and testability to data applications, and finally taking advantage of the cloud for infinite scalability both up and down.
Data Lakehouse, Data Mesh, and Data Fabric (r2) (James Serra)
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a modern data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. They all may sound great in theory, but I'll dig into the concerns you need to be aware of before taking the plunge. I’ll also include use cases so you can see what approach will work best for your big data needs. And I'll discuss Microsoft's version of the data mesh.
Agile Data Warehouse Modeling: Introduction to Data Vault Data Modeling (Kent Graziano)
This is a presentation I gave at OUGF14 in Helsinki, Finland.
Data Vault Data Modeling is an agile data modeling technique for designing highly flexible, scalable, and adaptable data structures for enterprise data warehouse repositories. It is a hybrid approach using the best of 3NF and dimensional modeling. It is not a replacement for star schema data marts (and should not be used as such). This approach has been used in projects around the world (Europe, Australia, USA) for the last 10 years but is still not widely known or understood. The purpose of this presentation is to provide attendees with a detailed introduction to the components of the Data Vault Data Model, what they are for, and how to build them. The examples will give attendees the basics of how to build and design structures incrementally, without constant refactoring, when using the Data Vault modeling technique. This technique works well for:
• Building the Enterprise Data Warehouse repository in a CIF architecture
• Building a Persistent Staging Area (PSA) in a Kimball Bus Architecture
• Building your data model incrementally, one sprint at a time using a repeatable technique
• Providing a model that is easily extensible without need to re-engineer existing structure or load processes
These are the slides from my talk at Data Day Texas 2016 (#ddtx16).
The world of data warehousing has changed! With the advent of Big Data, Streaming Data, IoT, and The Cloud, what is a modern data management professional to do? It may seem to be a very different world with different concepts, terms, and techniques. Or is it? Lots of people still talk about having a data warehouse or several data marts across their organization. But what does that really mean today in 2016? How about the Corporate Information Factory (CIF), the Data Vault, an Operational Data Store (ODS), or just star schemas? Where do they fit now (or do they)? And now we have the Extended Data Warehouse (XDW) as well. How do all these things help us bring value and data-based decisions to our organizations? Where do Big Data and the Cloud fit? Is there a coherent architecture we can define? This talk will endeavor to cut through the hype and the buzzword bingo to help you figure out what part of this is helpful. I will discuss what I have seen in the real world (working and not working!) and a bit of where I think we are going and need to go in 2016 and beyond.
Building an Effective Data Warehouse Architecture (James Serra)
Why use a data warehouse? What is the best methodology to use when creating a data warehouse? Should I use a normalized or dimensional approach? What is the difference between the Kimball and Inmon methodologies? Does the new Tabular model in SQL Server 2012 change things? What is the difference between a data warehouse and a data mart? Is there hardware that is optimized for a data warehouse? What if I have a ton of data? During this session James will help you to answer these questions.
Data Lake Acceleration vs. Data Virtualization - What’s the difference? (Denodo)
Watch full webinar here: https://bit.ly/3hgOSwm
Data Lake technologies have been in constant evolution in recent years, with each iteration promising to fix what previous ones failed to accomplish. Several data lake engines are hitting the market with better ingestion, governance, and acceleration capabilities that aim to create the ultimate data repository. But isn't that the promise of a logical architecture with data virtualization too? So, what’s the difference between the two technologies? Are they friends or foes? This session will explore the details.
Types of database processing; OLTP vs. data warehouses (OLAP); data warehouse characteristics: subject-oriented, integrated, time-variant, non-volatile; functionalities of a data warehouse: roll-up (consolidation), drill-down, slicing, dicing, pivot; the KDD process; applications of data mining.
Demystifying Data Warehouse as a Service (DWaaS) (Kent Graziano)
This is from the talk I gave at the 30th Anniversary NoCOUG meeting in San Jose, CA.
We all know that data warehouses and best practices for them are changing dramatically today. As organizations build new data warehouses and modernize established ones, they are turning to Data Warehousing as a Service (DWaaS) in hopes of taking advantage of the performance, concurrency, simplicity, and lower cost of a SaaS solution or simply to reduce their data center footprint (and the maintenance that goes with that).
But what is a DWaaS really? How is it different from traditional on-premises data warehousing?
In this talk I will:
• Demystify DWaaS by defining it and its goals
• Discuss the real-world benefits of DWaaS
• Discuss some of the coolest features in a DWaaS solution as exemplified by the Snowflake Elastic Data Warehouse.
Elliott Cordo, Principal Consultant at Caserta Concepts, delivered a talk on NoSQL data storage architectures at our most recent Big Data Warehousing Meetup: what they are, how they're used and why you can't ignore them in the context of existing enterprise data ecosystems.
For more information, check out our website at http://www.casertaconcepts.com/.
Building the Modern Data Hub: Beyond the Traditional Enterprise Data Warehouse
1. Building the Modern Data Hub: Beyond the Traditional Enterprise Data Warehouse
2. The New World of Data
90% of the world’s information was created in the last two years. 80% of all enterprise data is unstructured, which means it’s not the neat and tidy data that for decades has been held in relational databases, which in turn plug nicely into “business intelligence” tools, enterprise data warehouses and other traditional data analytics systems.
Today’s data needs different tools. And it requires a different sort of data scientist.
3. The EDW Analytic Conundrum
Modern Data Hub
● Flexible - add new data easily
● Fresh - up-to-date data, near real time
● Any query, no matter how complex
● Rapid deployment - days to weeks
Traditional EDW
● ETL based - brittle, hard to add new data sources
● Stale - data can be out of date
● Limited - queries limited by what data is available
● Slow - months to deploy or update
4. The Traditional Data Warehouse
[Diagram: extract, load & transform processes feed a star-schema data warehouse (EDW), which feeds data visualization. Callout: significant investment in planning, development, monitoring & maintenance.]
5. The Traditional Data Warehouse
[Same diagram as slide 4, annotated with stakeholder questions: "What's the ROI?" "How long is this going to take?" "Are we sure these are the right reports?" "How quickly can we make changes?"]
6. Today’s Traditional EDW Problems
Extraction, Transformation & Data Loading
• Highly transformative, structured ETLs are a costly investment on many levels, from development, monitoring and tuning to operational maintenance & remediation
• Target schema structures require planning based on end goals, but often those goals are not well defined
• Often the data we have today is both structured and unstructured
• Traditional EDWs are a long-term investment, and the ROI is often hard to measure
• Perishable insights that require fast turnaround (Super Bowl, Mother's Day, Thanksgiving, etc.) are difficult to capture in traditional EDWs
7. Today’s Traditional EDW Problems
Visualization & Reporting
• Traditional analytic reporting is predicated on structured schemas (star, snowflake, relational, etc.)
  • If these are not planned well, they can create performance problems
  • Hard structures can lead to missed metrics and reporting opportunities
• Any reworking of the final analytics that requires new metrics or data elements often means going back to the ETL to remediate the missing elements
• Producing insights and reporting for new trends can be time-consuming when predicated on pre-planned data structures
• Missed opportunities on perishable insights (Super Bowl, Mother's Day, Thanksgiving, etc.)
8. A Proposed Modern Approach
[Diagram: OLTP systems, unstructured data, and other data sources are staged via ETL/ELT into a MongoDB JSON data warehouse with no predetermined schema, which in turn feeds cubes, a star-schema EDW, data marts, and reporting. Callout: immediate access to data for analytic insights, fast ROI & planning.]
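A minimal sketch of the landing pattern this diagram implies, assuming a local MongoDB instance and the pymongo driver; the database, collection, and field names are made up for illustration:

# Sketch: land heterogeneous source records in a MongoDB "data hub" collection
# with no predetermined schema. Assumes a local mongod; names are illustrative.
from datetime import datetime, timezone

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
hub = client["datahub"]["events"]          # one collection, any document shape

# Documents from different sources keep their native shape; only a little
# lineage metadata is added so analysts can tell where each record came from.
oltp_row = {"source": "oltp", "order_id": 1001, "amount": 42.50, "status": "shipped"}
log_line = {"source": "app_logs", "level": "ERROR", "msg": "timeout", "host": "web-03"}
clickstream = {"source": "web", "session": "abc123", "path": "/cart", "referrer": None}

for doc in (oltp_row, log_line, clickstream):
    doc["loaded_at"] = datetime.now(timezone.utc)
    hub.insert_one(doc)

print(hub.count_documents({"source": "app_logs"}))

Because there is no target schema, adding a new source is just another insert; structure is imposed at query time rather than at load time, which is what the "immediate access" callout is getting at.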
9. NoSQL as Source for Visualization
[Diagram: structured data (RDBMS, cloud sources such as AWS and Azure) alongside NoSQL and Hadoop sources (MongoDB, Spark, and Hadoop HDFS holding JSON, CSV and XML as a data lake with no predetermined schema) feed BI tools (Tableau, Power BI, Spotfire) and reporting through a BI connector.]
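The BI-tool path in this diagram runs through a SQL-speaking connector. As one hedged illustration (not shown in the deck): MongoDB's Connector for BI exposes a MySQL-compatible endpoint, by default on port 3307, so a stock MySQL client such as Python's pymysql can run the same kind of SQL that Tableau, Power BI, or Spotfire would issue; the host, credentials, and table name below are placeholders.

# Sketch: query MongoDB through a MySQL-compatible BI connector endpoint.
# Assumes the MongoDB Connector for BI (mongosqld) is running locally on its
# default port; database, user, and table names are placeholders.
import pymysql

conn = pymysql.connect(host="127.0.0.1", port=3307,
                       user="bi_user", password="secret",
                       database="datahub")
try:
    with conn.cursor() as cur:
        # The connector flattens JSON documents into table-like schemas,
        # so ordinary SQL (and BI tools) can read them.
        cur.execute("SELECT source, COUNT(*) FROM events GROUP BY source")
        for source, n in cur.fetchall():
            print(source, n)
finally:
    conn.close()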
10. Hadoop Data Lakes & Data Hubs
• Hadoop is NOT a database, it's a filesystem
• Impala, Cassandra, or just JSON, XML and CSV files
• SlamData connects to Hadoop using Spark (both are written in Scala); see the sketch after this slide
• Much simpler to implement than 1st-generation data hubs/lakes
[Diagram: historical data sets land in Hadoop HDFS as a JSON, CSV and XML data lake with no predetermined schema.]
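The deck does not show the Spark path in code, so here is a plain PySpark sketch of the same idea under stated assumptions (a reachable HDFS namenode and JSON files at a hypothetical path): read raw JSON out of the data lake with the schema inferred on read, then query it with SQL.

# Sketch: SQL over schema-on-read JSON sitting in an HDFS data lake.
# Paths and field names are hypothetical; assumes a working Spark + HDFS setup.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("datalake-sql").getOrCreate()

# No predetermined schema: Spark infers one from the JSON files it finds.
events = spark.read.json("hdfs:///datalake/events/*.json")
events.createOrReplaceTempView("events")

spark.sql("""
    SELECT source, COUNT(*) AS n
    FROM events
    GROUP BY source
    ORDER BY n DESC
""").show()

spark.stop()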
11. What is SlamData?
NOT:
• SlamData is not a database
• SlamData is not a monitoring tool
• SlamData is not an ETL tool
• SlamData is not NoSQL
• SlamData is not a replacement for SQL Server, Oracle, DB2, MySQL, Informix, etc.
• SlamData is not expensive
IS:
• SlamData is an analytics engine
• SlamData uses SQL² for queries (see the example after this slide)
• SlamData will natively connect to MongoDB and Hadoop (eventually SQL, Oracle, MySQL, flat files, and more)
• SlamData solves the problem of directly querying JSON, CSV, etc.
• SlamData spans a huge gap in traditional data warehouse needs
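The slides name SQL² but do not show a query, so here is a hedged illustration: a SQL²-style statement over JSON log documents appears as a comment, followed by a hand-written MongoDB aggregation pipeline (via pymongo) that computes the same result. SlamData itself compiles its queries into MongoDB's native query machinery; this pipeline is only an approximation of that idea, and the collection and field names are invented.

# Sketch: what "ordinary SQL over JSON" boils down to on MongoDB.
# A SQL²-style query such as
#   SELECT host, COUNT(*) AS errors
#   FROM logs
#   WHERE level = "ERROR"
#   GROUP BY host
# can be answered on MongoDB with an aggregation pipeline roughly like this:
from pymongo import MongoClient

logs = MongoClient("mongodb://localhost:27017")["datahub"]["logs"]

pipeline = [
    {"$match": {"level": "ERROR"}},                       # WHERE level = "ERROR"
    {"$group": {"_id": "$host", "errors": {"$sum": 1}}},  # GROUP BY host, COUNT(*)
    {"$sort": {"errors": -1}},
]
for row in logs.aggregate(pipeline):
    print(row["_id"], row["errors"])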
15. Chart Out Machine Data
• Machine data visualizations are quick and easy. Embed them as real-time visuals in your own analytics dashboard or share them as quick insights.
17. When Could This Solution Make Sense?
1. You are using MongoDB and getting reporting out is a struggle
2. You're planning a traditional data warehouse project, the 6-12 month time frame is daunting, and you need better report planning to determine ROI
3. You are using a product like Splunk to capture machine data and it's become too expensive
4. You have Hadoop, or are planning to implement Hadoop, as a data lake or data hub
18. Why this approach? Simple: save time and money
• Scoping the EDW is simpler
  • Imagine being able to eliminate the overhead of planning the data structure before you know the end analytic needs
• ETL development is less complex
  • If the task is just defined as capturing and storing the data, it becomes much simpler
• Implement solutions in days to weeks, not weeks to months
• SAVE $$$$: less costly storage options, no ETL software, less maintenance, lower cost to implement
19. Case Studies
Global technology company
Needs:
• Consolidated security and log analytics
• Ability to do complex ad-hoc queries without limitations
• Share and publish results easily
Solution:
• MongoDB to live-capture logs
• SlamData for ad-hoc queries and visualizations
Large government agency
Needs:
• Consolidate data from 5+ data sources in various formats
• Answer ad-hoc questions in minutes to hours, not days to weeks
• Data is perishable; slow, brittle ETL or data mapping was not a good option
Solution:
• Consolidate data into a MongoDB data hub
• Use SlamData to build rapid reports that can be shared and published
20. So What's the Next Step?
• Let us show you - give us your toughest data analytics problem
• Deliver a POC in two weeks or less
• SlamData is the missing piece of the data lake/data hub
  • Fast time to value, less cost
• Leverage current SQL skills, lower the learning curve
• Build powerful reports and dashboards in minutes, on live data