Cloud Data Warehousing juggernaut Snowflake has raced out ahead of the pack to deliver a data management platform from which a wealth of new analytics can be run. Using Snowflake as a traditional data warehouse has some obvious cost advantages over a hardware solution. But the real value of Snowflake as a data platform lies in its ability to support a high-concurrency analytics platform using Kyligence Cloud, powered by Apache Kylin.
In this presentation, Senior Solutions Architect Robert Hardaway will describe a modern data service architecture using precomputation and distributed indexes to provide interactive analytics to hundreds or even thousands of users running against very large Snowflake datasets (TBs to PBs).
Snowflake concepts and hands-on expertise to help you get started implementing data warehouses using Snowflake, along with the information and skills you need to master Snowflake essentials.
Delta Lake OSS: Create reliable and performant Data Lake, by Quentin Ambard (Paris Data Engineers!)
Delta Lake is an open-source framework that lives on top of Parquet in your data lake to provide reliability and performance. It was open-sourced by Databricks this year and is gaining traction to become the de facto data lake table format.
We’ll see everything Delta Lake can do for your data: ACID transactions, DDL operations, schema enforcement, batch and stream support, and more!
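As a taste of those features, here is a minimal sketch of an ACID write and schema enforcement in action, assuming the delta-spark PyPI package is installed; the path and data are illustrative.

```python
# Minimal Delta Lake sketch: ACID write plus schema enforcement.
from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession

builder = (
    SparkSession.builder.appName("delta-sketch")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# ACID write: the commit either fully succeeds or is never visible.
df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])
df.write.format("delta").mode("overwrite").save("/tmp/users_delta")

# Schema enforcement: appending a mismatched schema raises an error
# instead of silently corrupting the table.
bad = spark.createDataFrame([(3, "carol", 9.99)], ["id", "name", "score"])
try:
    bad.write.format("delta").mode("append").save("/tmp/users_delta")
except Exception as e:
    print("rejected by schema enforcement:", type(e).__name__)
```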
Modernizing to a Cloud Data Architecture (Databricks)
Organizations with on-premises Hadoop infrastructure are bogged down by system complexity, unscalable infrastructure, and the increasing burden on DevOps to manage legacy architectures. Costs and resource utilization continue to go up while innovation has flatlined. In this session, you will learn why, now more than ever, enterprises are looking for cloud alternatives to Hadoop and are migrating off of the architecture in large numbers. You will also learn how the benefits of elastic compute models helped one customer scale their analytics and AI workloads, along with best practices from their experience of a successful migration of their data and workloads to the cloud.
Architect’s Open-Source Guide for a Data Mesh Architecture (Databricks)
Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?
In this session, we will review the importance of core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges with implementation of Data Mesh systems and focus on the role of open-source projects for it. Projects like Apache Spark can play a key part in standardized infrastructure platform implementation of Data Mesh. We will examine the landscape of useful data engineering open-source projects to utilize in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on what work (culture, tools, mindset) needs to be done to ensure Data Mesh is more accessible for engineers in the industry.
The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems.
This session is targeted for architects, decision-makers, data-engineers, and system designers.
Snowflake: The Good, the Bad, and the Ugly (Tyler Wishnoff)
Learn how to solve the top 3 challenges Snowflake customers face, and what you can do to ensure high-performance, intelligent analytics at any scale. Ideal for those currently using Snowflake and those considering it. Learn more at: https://kyligence.io/
A Thorough Comparison of Delta Lake, Iceberg and Hudi (Databricks)
Recently, a set of modern table formats such as Delta Lake, Hudi, and Iceberg has sprung up. Along with the Hive Metastore, these table formats are trying to solve problems that have stood in traditional data lakes for a long time, with declared features like ACID transactions, schema evolution, upserts, time travel, and incremental consumption.
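To make the shared upsert feature concrete, here is a minimal sketch using Delta Lake's MERGE INTO syntax as one example (Iceberg and Hudi offer equivalents); it reuses the SparkSession and table path from the Delta sketch above, and the data is illustrative.

```python
# Upsert via MERGE INTO: update matching rows, insert new ones.
updates = spark.createDataFrame([(2, "bobby"), (4, "dave")], ["id", "name"])
updates.createOrReplaceTempView("updates")

spark.sql("""
    MERGE INTO delta.`/tmp/users_delta` AS target
    USING updates AS source
    ON target.id = source.id
    WHEN MATCHED THEN UPDATE SET target.name = source.name
    WHEN NOT MATCHED THEN INSERT (id, name) VALUES (source.id, source.name)
""")
```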
Making Apache Spark Better with Delta Lake (Databricks)
Delta Lake is an open-source storage layer that brings reliability to data lakes. Delta Lake offers ACID transactions and scalable metadata handling, and unifies streaming and batch data processing. It runs on top of your existing data lake and is fully compatible with Apache Spark APIs.
In this talk, we will cover:
* What data quality problems Delta helps address
* How to convert your existing application to Delta Lake (a sketch follows this list)
* How the Delta Lake transaction protocol works internally
* The Delta Lake roadmap for the next few releases
* How to get involved!
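A minimal sketch of the conversion bullet above, reusing the SparkSession from the earlier Delta sketch; the Parquet path is an illustrative assumption.

```python
# In-place conversion of an existing Parquet dataset to Delta Lake.
from delta.tables import DeltaTable

# Writes a Delta transaction log alongside the existing Parquet files.
DeltaTable.convertToDelta(spark, "parquet.`/data/events_parquet`")

# Existing readers switch over by changing the format string.
df = spark.read.format("delta").load("/data/events_parquet")
```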
As cloud computing continues to gather speed, organizations with years’ worth of data stored on legacy on-premise technologies are facing issues with scale, speed, and complexity. Your customers and business partners are likely eager to get data from you, especially if you can make the process easy and secure.
Challenges with performance are not uncommon and ongoing interventions are required just to “keep the lights on”.
Discover how Snowflake empowers you to meet your analytics needs by unlocking the potential of your data.
Webinar agenda:
~Understand Snowflake and its Architecture
~Quickly load data into Snowflake (a sketch follows this list)
~Leverage the latest in Snowflake’s unlimited performance and scale to make the data ready for analytics
~Deliver secure and governed access to all data – no more silos
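As one concrete reading of the "quickly load data" item, here is a sketch using the snowflake-connector-python package; the account, credentials, file, and table names are illustrative placeholders.

```python
# Bulk-loading a local file into Snowflake via a table stage.
import snowflake.connector

# Connect (values below are placeholders, not real credentials).
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="***",
    warehouse="LOAD_WH", database="ANALYTICS", schema="PUBLIC",
)
cur = conn.cursor()

# Stage the file into the table's stage, then COPY it in.
cur.execute("CREATE TABLE IF NOT EXISTS events (id INT, name STRING)")
cur.execute("PUT file:///tmp/events.csv @%events")
cur.execute("COPY INTO events FROM @%events FILE_FORMAT = (TYPE = CSV)")
```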
What is elastic data warehousing, and how does Snowflake uniquely enable it? Learn about the requirements needed to support flexible, elastic data warehousing using cloud infrastructure.
Definitive Guide to Select Right Data Warehouse (2020) (Sprinkle Data Inc)
Choosing the right data warehouse is a big challenge for organisations. In this doc, we have made an end-to-end comparison of leading data warehouses: Snowflake vs Redshift vs BigQuery vs Hive vs Athena.
Sprinkledata.com
Apache Iceberg - A Table Format for Huge Analytic Datasets (Alluxio, Inc.)
Data Orchestration Summit
www.alluxio.io/data-orchestration-summit-2019
November 7, 2019
Apache Iceberg - A Table Format for Huge Analytic Datasets
Speaker:
Ryan Blue, Netflix
For more Alluxio events: https://www.alluxio.io/events/
Delta Lake is an open-source innovation which brings new capabilities for transactions, version control, and indexing to your data lakes. We uncover how Delta Lake benefits you and why it matters. Through this session, we showcase some of its benefits and how they can improve your modern data engineering pipelines. Delta Lake provides snapshot isolation, which helps with concurrent read/write operations and enables efficient inserts, updates, deletes, and rollbacks. It allows background file optimization through compaction and Z-order partitioning, achieving better performance. In this presentation, we will learn about the Delta Lake benefits, how it solves common data lake challenges, and, most importantly, the new Delta Time Travel capability.
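A minimal sketch of the Time Travel and file-optimization features mentioned above, reusing the table from the earlier Delta sketch; the version, timestamp, and column are illustrative, and OPTIMIZE ... ZORDER BY assumes a recent OSS Delta release.

```python
# Time travel: read the table as of an earlier commit.
v0 = spark.read.format("delta").option("versionAsOf", 0).load("/tmp/users_delta")

# Or as of a timestamp.
snap = (spark.read.format("delta")
        .option("timestampAsOf", "2021-01-01")
        .load("/tmp/users_delta"))

# Compact small files and co-locate rows by a column (Z-ordering).
spark.sql("OPTIMIZE delta.`/tmp/users_delta` ZORDER BY (id)")
```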
Master the Multi-Clustered Data Warehouse - Snowflake (Matillion)
Snowflake is one of the most powerful, efficient data warehouses on the market today—and we joined forces with the Snowflake team to show you how it works!
In this webinar:
- Learn how to optimize Snowflake
- Hear insider tips and tricks on how to improve performance
- Get expert insights from Craig Collier, Technical Architect from Snowflake, and Kalyan Arangam, Solution Architect from Matillion
- Find out how leading brands like Converse, Duo Security, and Pets at Home use Snowflake and Matillion ETL to make data-driven decisions
- Discover how Matillion ETL and Snowflake work together to modernize your data world
- Learn how to utilize the impressive scalability of Snowflake and Matillion
Building a data lake is a daunting task. The promise of a virtual data lake is to provide the advantages of a data lake without consolidating all data into a single repository. With Apache Arrow and Dremio, companies can, for the first time, build virtual data lakes that provide full access to data no matter where it is stored and no matter what size it is.
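One building block of that idea can be sketched with Apache Arrow's Python dataset API, which queries files where they already live; the path and columns are illustrative, and Dremio itself is not required for this snippet.

```python
# Querying Parquet files in place with Arrow, no consolidation step.
import pyarrow.dataset as ds

# Treat files where they already live as one logical table.
lake = ds.dataset("/mnt/warehouse/events", format="parquet")

# Push down column pruning and row filtering; no up-front copy or load.
table = lake.to_table(
    columns=["user_id", "amount"],
    filter=ds.field("amount") > 100,
)
print(table.num_rows)
```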
Building Lakehouses on Delta Lake with SQL Analytics Primer (Databricks)
You’ve heard the marketing buzz, and maybe you have been to a workshop and worked with some Spark, Delta, SQL, Python, or R, but you still need some help putting all the pieces together. Join us as we review some common techniques to build a lakehouse using Delta Lake, use SQL Analytics to perform exploratory analysis, and build connectivity for BI applications.
Embarking on building a modern data warehouse in the cloud can be an overwhelming experience due to the sheer number of products that can be used, especially when the use cases for many products overlap others. In this talk I will cover the use cases of many of the Microsoft products that you can use when building a modern data warehouse, broken down into four areas: ingest, store, prep, and model & serve. It’s a complicated story that I will try to simplify, giving blunt opinions of when to use what products and the pros/cons of each.
SF Big Analytics 2020-07-28
An anecdotal history of the data lake and various popular implementation frameworks: why certain tradeoffs were made to solve problems such as cloud storage, incremental processing, streaming and batch unification, mutable tables, ...
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit... (Amazon Web Services)
Struggling to keep up with an ever-increasing demand for data at your organisation? Do you spend hours tinkering with your streaming data pipelines? Does that one data scientist with direct EDW access keep you up at night? Introducing Snowflake, a brand new SQL data warehouse built for the cloud. We’ve designed and implemented a unique cloud-based architecture that addresses the most common shortcomings of existing data solutions. With Snowflake, you can unlock unlimited concurrency, enable instant scalability, and take advantage of built-in tuning and optimisation. Join us and find out what Netflix, Adobe, and Nike all have in common.
OSA Con 2022 - Apache Iceberg: An Architectural Look Under the Covers - Alex ... (Altinity Ltd)
OSA Con 2022: Apache Iceberg: An Architectural Look Under the Covers
Alex Merced - Dremio
The data lakehouse is one of the most exciting trends in the data space, promising to merge the best aspects of data lakes and data warehouses without either of their problems. Open source tech is making this promise a reality, and in this talk Dremio Developer Advocate Alex Merced explores these technologies.
In this talk Alex Merced will cover:
- What is a Data Lakehouse?
- Why open matters in preserving the promise of lakehouses (better costs, vendor freedom, data freedom)
- What are the technologies that enable lakehouses, like Apache Iceberg, Apache Parquet, Apache Arrow and Project Nessie (a sketch follows this list)
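As a taste of one enabling technology named above, here is a sketch of creating and querying an Apache Iceberg table from PySpark; the catalog name, warehouse path, and iceberg-spark runtime version are illustrative assumptions.

```python
# Creating an Iceberg table and inspecting its snapshot history.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-sketch")
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0")
    .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.local.type", "hadoop")
    .config("spark.sql.catalog.local.warehouse", "/tmp/iceberg_warehouse")
    .getOrCreate()
)

spark.sql("CREATE TABLE IF NOT EXISTS local.db.events (id BIGINT, ts TIMESTAMP) USING iceberg")
spark.sql("INSERT INTO local.db.events VALUES (1, current_timestamp())")

# Iceberg exposes table history as metadata tables, enabling time travel.
spark.sql("SELECT snapshot_id, committed_at FROM local.db.events.snapshots").show()
```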
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga... (DataScienceConferenc1)
Dragan Berić will take a deep dive into Lakehouse architecture, a game-changing concept bridging the best elements of data lake and data warehouse. The presentation will focus on the Delta Lake format as the foundation of the Lakehouse philosophy, and Databricks as the primary platform for its implementation.
Lightning-Fast, Interactive Business Intelligence Performance with MicroStrat... (Tyler Wishnoff)
See how extreme query speeds and ultra-high concurrency on MicroStrategy, and any other business intelligence (BI) tool, on Big Data is possible through the Kyligence platform. Learn more here: https://kyligence.io/
Addressing the systemic shortcomings of cloud analytics (SamanthaBerlant)
Learn how existing open source technologies like Apache Kylin, Spark, and Mondrian can be used to increase the value of your analytics investment.
As we enter what some have called The Golden Age of Analytics, there are still some fundamental challenges that plague even the largest and most sophisticated cloud analytics adopters. Chief among these is the challenge of scale, often reflected in limitations of concurrency, multi-tenancy, distributed query performance, and all manner of latencies.
Other less obvious, but equally crucial, challenges of scale and performance have to do with IT and end-user productivity. In other words, there have been few technological advances that enable the quick deployment of big data analytics and the rapid creation of business value from the data being analyzed.
This presentation will consider a few of these systemic challenges and suggest some ways that they can be addressed with available open source technology such as Apache Kylin, Apache Spark, and Apache Mondrian.
Presenter:
Kaige Liu is a Senior Solutions Architect at Kyligence, where he works on building the next-generation big data analytics platform. Previously, he worked on the OpenStack and Bluemix team at IBM, focusing on cloud computing and virtualization technology. Kaige loves the open source community and is an active Apache Kylin committer.
In January of this year, Kyligence announced the immediate availability of Kyligence Cloud 4, the first fully cloud-native, distributed OLAP platform. During our announcement, EMA analyst John Santaferraro said:
“As the race for unified analytics heats up, Kyligence offers a solution that overcomes the challenges of querying data in both data lakes and data warehouses located both in the cloud and on premises.”
Join Li Kang - VP of North America at Kyligence - as he provides an overview of the Kyligence Cloud 4 release that will show:
--The new cloud native architecture that employs Apache Kylin, Apache Spark, and Apache Parquet to ensure optimal performance.
--How KC4 delivers sub-second query responses on very large datasets using precomputed aggregate indexes (hyper-cubes) and table indexes.
--The AI-Augmented engine that intelligently organizes your data and reduces data modeling time from days/weeks to minutes.
In this presentation, we will present the Kyligence Cloud 4 story - high-speed analytics with unprecedented sub-second query response times against petabyte datasets.
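To make the precomputation idea concrete, here is a conceptual sketch in plain Python (not Kyligence's actual engine, which builds these indexes with Spark at scale): aggregates for every dimension combination are computed once, so a query becomes a lookup instead of a scan.

```python
# Conceptual aggregate-index ("cube") precomputation.
import itertools
from collections import defaultdict

rows = [
    {"region": "US", "product": "A", "sales": 100},
    {"region": "US", "product": "B", "sales": 50},
    {"region": "EU", "product": "A", "sales": 70},
]
dims = ["region", "product"]

# Build one aggregate per dimension combination (a tiny cube).
cube = {}
for r in range(len(dims) + 1):
    for combo in itertools.combinations(dims, r):
        agg = defaultdict(int)
        for row in rows:
            key = tuple(row[d] for d in combo)
            agg[key] += row["sales"]
        cube[combo] = dict(agg)

# Query time: "total sales by region" is now a dictionary lookup.
print(cube[("region",)])   # {('US',): 150, ('EU',): 70}
```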
How Analytics Teams Using SSAS Can Embrace Big Data and the Cloud (Tyler Wishnoff)
You’ve been using SQL Server Analytics Services and you love it.
Its multidimensional analysis enables your team to slice and dice your data any way they want and get the results back easily.
The only problem is:
• it doesn’t work very well with the latest technologies
• It wasn’t built to handle Big Data or to serve large analytics teams
• and most frustrating of all, it still isn’t available in the Cloud
The good news is that there’s a way to unburden yourself from the limitations of SSAS, without losing the capabilities you rely on.
If you’re ready to modernize the way your team does analytics, this presentation will provide you the tools and ideas you need to do so.
This presentation will show you how to:
• Efficiently migrate your SSAS-based workload to the Cloud
• Super-charge your SSAS applications, seamlessly, easily and cost-effectively
• Provide unlimited scale of data, concurrency and deliver sub-second response
• Make your SQL/MDX queries scale and perform beyond your imagination
• Ensure Enterprise-grade Security
• Plus, get actionable examples and stories, from organizations who have successfully overcome these challenges
For more information, visit www.Kyligence.io
If you have big data, more and more of your analytics stack needs to be intelligent. Your tools need to be able to anticipate the needs of your analysts, customers, and your business. With the AI-Augmented Engine, this learning process is automated and predictive. It intelligently adapts to user behavior and query patterns and learns to anticipate each user’s needs. Join us for the third installment of this series diving into the core features of Kyligence Cloud 4.
In this presentation you will learn:
-How the Kyligence Cloud 4 AI-Augmented Engine works
-How the AI-Augmented Engine gives optimal efficiency for cube building
-How the AI-Augmented Engine greatly simplifies data modeling
Watch the webinar here: https://www.brighttalk.com/webcast/18317/480320
Smashing Through Big Data Barriers with Tableau and Snowflake (SamanthaBerlant)
Your analysts are working with more data than ever before in Tableau. Chances are, as the data volumes grow, your teams are experiencing some slowdowns. While it may be tempting to blame Tableau, the most likely explanation for performance and scalability pains lies in your data service layer. What if you could transform the way you do analytics without having to retrain your Tableau users? What if you could get more critical business value out of Tableau, and your data, without disrupting the way your business operates?
Join us for this session to learn how Tableau could be the ultimate window into ALL of your valuable data, no matter how large. Learn how precomputation technology and AI-augmented query optimization can help you break free of the downward performance spiral of legacy analytics approaches.
In this presentation, you will learn:
-How to get the fastest big data analytics experience on Tableau
-How a unified semantic layer can ensure that your current Tableau users are not disrupted by big data
-How to improve your analytics operations with automation and machine intelligence
Watch the webinar to see this technology in action during the live Snowflake demo. Enter the onramp to unmatched performance with big data analytics on Tableau.
Take the Bias out of Big Data Insights With Augmented Analytics (Tyler Wishnoff)
Is bias impacting your Big Data insights? Learn how augmented analytics and the latest advancements in OLAP technology are making analytics (including on cloud) from business intelligence, data science, and machine learning more accurate and impactful. Learn more at https://kyligence.io
It is a fascinating, explosive time for enterprise analytics.
It is from the position of analytics leadership that the mission will be executed and company leadership will emerge. The data professional sits squarely on the performance of the company in this information economy and has an obligation to demonstrate the possibilities and originate the architecture, data, and projects that will deliver analytics. After all, no matter what business you’re in, you’re in the business of analytics.
The coming years will be full of big changes in enterprise analytics and Data Architecture. William will kick off the fourth year of the Advanced Analytics series with a discussion of the trends winning organizations should build into their plans, expectations, vision, and awareness now.
AI-Powered Analytics: What It Is and How It’s Powering the Next Generation of... (Tyler Wishnoff)
Learn how to empower your analysts with easier access to all the data they need, exactly when they need it - all while reducing workloads for IT and data engineering.
This presentation will walk you through those challenges, what modern options are available for solving them, and how taking an AI-powered approach to self-service analytics may yield the greatest level of data access along with the best possible performance. Learn more here: https://kyligence.io/
MongoDB World 2019: re:Innovate from Siloed to Deep Insights on Your Data (MongoDB)
Are you tired of a tedious, drawn-out data-to-insights journey, siloed data, and unleveraged data? Would you like existing demographic data to help you drive business outcomes? Would you like to skip building a data lake and get direct insights on your data with a pre-fabricated data structure, without extra effort?
Top Trends in Building Data Lakes for Machine Learning and AI (Holden Ackerman)
Presentation by Ashish Thusoo, Co-Founder & CEO at Qubole, exploring big data industry trends in moving from data warehouses to cloud-based data lakes. This presentation will cover how companies today are seeing a significant rise in the success of their big data projects by moving to the cloud to iteratively build more cost-effective data pipelines and new products with ML and AI.
It will also uncover how services like AWS, Google, Oracle, and Microsoft Azure provide the storage and compute infrastructure to build self-service data platforms that enable all teams and new products to scale iteratively.
Drug discovery at 2x speed. Faster, more comprehensive testing approval processes. Identifying gene targets in massive sequencing data sets. These goals are ambitious yet attainable, but not without increasing the computational capabilities of today's researchers. While everyone agrees that simply deploying more infrastructure is not the answer, running that work in the cloud is not without challenges. In this talk we will discuss and illustrate elements of those workloads that Cycle Computing's customers have run on AWS, generating vastly better results than would have been attained on traditional infrastructure. We will cover some common problems they encountered, and how they resolved them using Amazon EC2, S3, Glacier, and Cycle's software.
Presenters: Dougal Ballantyne, Business Development, AWS; Rob Futrick, CTO, Cycle Computing
Precomputation or Data Virtualization, which one is right for you? (SamanthaBerlant)
In the world of cloud analytics, what role do precomputation and distributed OLAP play compared with a data virtualization approach? Which should you choose? Do they compete or complement each other? This webinar will address these questions and provide some guidance for how to choose the right approach for your circumstances.
Both technologies are trying to address a similar challenge: make analytics easily accessible to a wider audience in a modern big data environment. Precomputation focuses on performance, response time, and concurrency in the production environment. Data Virtualization technologies focus on making analysis easily available to users by reducing or eliminating ETL and data warehouses.
In this presentation we will cover:
-The key differences between precomputation and data virtualization
-How your choice between the two affects data quality, security, governance, and TCO
-The financial impact each of these technologies have on your analytics program
Extreme Excel: How a 35-Year-Old Desktop App Smashed Through the Big Data Bar... (SamanthaBerlant)
People have been using Excel for 35 years. There are over 750 million Excel users. People are making magic with Excel every day. With the surging interest in big data, advanced analytics, and the cloud, how does Excel stay relevant and how extreme can Excel get? In this presentation, we will examine:
o Traditional limits of Excel performance, scale, dataset sizes
o Cloud technologies that make Excel better
o Defining the new extremes for Excel power users
Speaker Bio:
Rachel Beddor is a Solutions Engineer for Kyligence where she creates technical content to enhance the learning experience for new Apache Kylin and Kyligence users. She has dedicated her career to making technology more accessible, fun, and inviting to people of all backgrounds.
Cloud-native Semantic Layer on Data Lake (Databricks)
With larger volumes and more real-time data stored in the data lake, it becomes more complex to manage this data and serve analytics and applications. With differing service interfaces, data calibers, and performance biases across scenarios, business users begin to lose confidence in the quality and efficiency of getting insight from their data.
Hassle-Free Data Lake Governance: Automating Your Analytics with a Semantic L... (Tyler Wishnoff)
Simplify data lake governance, no matter how much data you work with and how many data sources and BI tools you manage. This presentation offers all you need to develop your own strategy for smarter data lake governance. Learn more at: https://kyligence.io/
Enhance Data Governance with Kyligence Unified Semantic Layer (SamanthaBerlant)
Simplify data lake governance, no matter how much data you work with and how many data sources and BI tools you manage. This presentation offers all you need to develop your own strategy for smarter data lake governance.
https://www.brighttalk.com/webcast/18317/414017
Similar to Architecting Snowflake for High Concurrency and High Performance (20)
Kyligence Cloud 4 - Feature Focus: Spark-Powered Cubing and Indexing (SamanthaBerlant)
You’ve moved your data to the cloud, awesome. Now you’re running into issues of concurrency, scale, and cost overruns. But there’s a better way to run your cloud analytics if you think of cloud resources as commodities to conserve and maximize. Sure, you could run the same query from start to finish every time, or you could speed up this process, and save some cash in the process, by precomputing those queries and storing the response for fast retrieval any time, by any number of analysts.
Kyligence Cloud 4’s Spark-Powered Cubing and Indexing feature provides just that - intelligent precomputation, which fundamentally boils down to low-cost, high-performance analytics. Join us for the fourth part of this series exploring the key features of Kyligence Cloud 4.
In this webinar you will learn:
-About modern, cloud era OLAP and cubing theory
-Performance gains you’ll get from intelligent precomputation
-How to apply cloud computing and distributed processing
-Precomputation strategies and tactics (see the sketch after this list)
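In the generic Spark sense (a sketch, not Kyligence's proprietary engine), cubing looks like the following: PySpark's cube() computes aggregates for every dimension combination in one pass, and the result can be persisted for fast retrieval; the column names and output path are illustrative.

```python
# Spark-powered cubing: precompute every grouping-set aggregate once.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("cube-sketch").getOrCreate()
df = spark.createDataFrame(
    [("US", "A", 100), ("US", "B", 50), ("EU", "A", 70)],
    ["region", "product", "sales"],
)

# One pass computes all 2^2 grouping sets: (), (region), (product), (region, product).
cube = df.cube("region", "product").agg(F.sum("sales").alias("total"))

# Persist the precomputed result; queries now read this small table
# instead of rescanning the raw data.
cube.write.mode("overwrite").parquet("/tmp/sales_cube")
```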
Open Source Technologies in the Analytics Revolution (SamanthaBerlant)
One of the hallmarks of modern analytics is that data pipelines are largely built upon open source software (OSS). It is entirely possible to create cutting edge data science, machine learning, data engineering, ETL processing, and predictive analytics pipelines without using any commercial software. Of course, OSS does not necessarily mean “free,” but as a thought experiment, the first part of this session will explore the role of OSS in your data analytics stacks and data pipelines.
For the second half of this presentation, we will examine how OSS tools and platforms can be used to learn and create your own Machine Learning and Data Analytics projects without breaking the bank.
View the presentation: https://youtu.be/JbNuikWKC1Q
SF Big Analytics Meetup - Exact Count Distinct with Apache Kylin (SamanthaBerlant)
With over 450 million customers, Didi (world’s largest rideshare company) conducts complex user behavior analysis on huge datasets daily. Exact Count Distinct is one of Didi’s most critical metrics, but it is known for being computationally heavy and notoriously slow. The difference between exact Count Distinct and approximate Count Distinct can cost Didi millions of dollars. In this talk, Kaige Liu of the Apache Kylin project will explain how Didi uses Apache Kylin to return exact Distinct Count on billions of rows of data with sub-second latency to generate the most accurate picture of its business.
You will also learn about the latest development in modern OLAP technologies. Kaige will share how Didi and Truck Alliance (a truck-hailing company that processes $100 billion worth of goods yearly) use Apache Kylin to power their analytics platforms that allow 100s of analysts to achieve sub-second latency on petabyte-scale data.
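A conceptual sketch of why this works (Kylin itself uses compressed bitmaps over integer-encoded IDs rather than Python sets): per-segment sets of user IDs can be merged losslessly at query time, whereas precomputed counts cannot.

```python
# Why exact count distinct needs mergeable structures, not plain counts.
daily_users = {
    "2020-07-01": {101, 102, 103},
    "2020-07-02": {102, 104},
    "2020-07-03": {101, 105},
}

# Counts alone cannot be combined: 3 + 2 + 2 != distinct users that week.
naive = sum(len(s) for s in daily_users.values())   # 7 (wrong)

# Merging the precomputed sets gives the exact answer at query time.
exact = len(set().union(*daily_users.values()))     # 5 (correct)
print(naive, exact)
```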
Learn how to solve the top 3 challenges Snowflake customers face, and what you can do to ensure high-performance, intelligent analytics at any scale. Ideal for those currently using Snowflake and those considering it.
https://www.brighttalk.com/webcast/18317/422499
See how the world’s leading open source solution for query acceleration on massive datasets is revolutionizing analytics for enterprises across every industry, and how you can get started using it in your organization.
https://www.brighttalk.com/webcast/18317/413952
How to Guarantee Exact Count Distinct Queries with Sub-Second Latency on Mass... (SamanthaBerlant)
See how to consistently deliver accurate COUNT DISTINCT queries in under a second, even on petabyte-scale datasets. This presentation will share Apache Kylin’s approach to COUNT DISTINCT queries for user behavior analysis.
https://www.brighttalk.com/webcast/18317/414006
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT ... (Subhajit Sahu)
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes them in topological order, one level at a time. This enables ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
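A sketch of the levelwise idea on a single machine, using networkx for the SCC decomposition; the damping factor and tolerance are conventional defaults, and the input graph is assumed to have no dead ends, per the precondition above.

```python
import networkx as nx

def levelwise_pagerank(G, d=0.85, tol=1e-10):
    """PageRank computed one strongly connected component at a time.

    Assumes no dead ends: every vertex has at least one out-edge.
    """
    N = G.number_of_nodes()
    rank = {}                      # finalized ranks, component by component
    cond = nx.condensation(G)      # DAG whose nodes are SCCs of G
    for scc in nx.topological_sort(cond):
        comp = cond.nodes[scc]["members"]
        r = {v: 1.0 / N for v in comp}
        while True:                # iterate only within this component
            delta = 0.0
            for v in comp:
                s = 0.0
                for u in G.predecessors(v):
                    # Earlier components contribute final constants;
                    # vertices in this component are still iterating.
                    ru = rank[u] if u in rank else r[u]
                    s += ru / G.out_degree(u)
                new = (1 - d) / N + d * s
                delta, r[v] = delta + abs(new - r[v]), new
            if delta < tol:
                break
        rank.update(r)             # ranks for this level are now final
    return rank

G = nx.DiGraph([(1, 2), (2, 3), (3, 1), (3, 4), (4, 4)])  # no dead ends
print(levelwise_pagerank(G))
```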
Opendatabay - Open Data Marketplace.pptx (Opendatabay)
Opendatabay.com unlocks the power of data for everyone. The Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
It is the first open hub for data enthusiasts to collaborate and innovate: a platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, Opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. It leverages cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools, letting you effortlessly explore, discover, and access the data you need so you can focus on extracting valuable insights. Opendatabay also breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay: the marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Techniques to optimize the PageRank algorithm usually fall into two categories. One tries to reduce the work per iteration, and the other tries to reduce the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, which share the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance, since the final ranks of chain nodes can be calculated directly: for example, a chain node v whose only in-neighbor u has out-degree 1 gets r(v) = (1-d)/N + d*r(u) once r(u) is known. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues (a minimal sketch follows this list).
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
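A minimal sketch of the automated-validation item above, using plain pandas; the column rules are illustrative assumptions, not a specific product's API.

```python
# Rule-based data quality check: flag rows that violate any rule.
import pandas as pd

RULES = {
    "user_id": lambda s: s.notna() & (s > 0),
    "email":   lambda s: s.str.contains("@", na=False),
}

def validate(df: pd.DataFrame) -> pd.DataFrame:
    """Return the rows that violate any rule, for rectification at the source."""
    bad = pd.Series(False, index=df.index)
    for col, rule in RULES.items():
        bad |= ~rule(df[col])
    return df[bad]

df = pd.DataFrame({"user_id": [1, None, 3], "email": ["a@x.io", "b@x.io", "oops"]})
print(validate(df))  # flags the null user_id and the malformed email
```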
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.