This document provides an overview of data warehousing. It defines a data warehouse as a subject-oriented collection of diverse data, aimed at executives and decision makers, that is integrated, time-varying, and non-volatile. It discusses why warehouses are used, warehouse architectures and models, and operations for implementing and querying a warehouse, such as aggregation, pivoting, and materialized views.
2. Warehousing
Growing industry: $8 billion way back in 1998
Range from desktop to huge:
  Walmart: 900-CPU, 2,700-disk, 23 TB Teradata system
Lots of buzzwords, hype:
  slice & dice, rollup, MOLAP, pivot, ...
3. Outline
What is a data warehouse?
Why a warehouse?
Models & operations
Implementing a warehouse
Future directions
4. What is a Warehouse?
Collection of diverse data
  subject oriented
  aimed at executives and decision makers
  often a copy of operational data
    with value-added data (e.g., summaries, history)
  integrated
  time-varying
  non-volatile
  more ...
5. What is a Warehouse?
Collection of tools
  gathering data
  cleansing, integrating, ...
  querying, reporting, analysis
  data mining
  monitoring, administering warehouse
10. Advantages of Warehousing
High query performance
Queries not visible outside warehouse
Local processing at sources unaffected
Can operate when sources unavailable
Can query data not stored in a DBMS
Extra information at warehouse
  Modify, summarize (store aggregates)
  Add historical information
11. Advantages of Query-Driven
No need to copy data
  less storage
  no need to purchase data
More up-to-date data
Query needs can be unknown
Only query interface needed at sources
May be less draining on sources
12. OLTP vs. OLAP
OLTP: On-Line Transaction Processing
  describes processing at operational sites
OLAP: On-Line Analytical Processing
  describes processing at warehouse
13. OLTP vs. OLAP

  OLTP                                    OLAP
  Mostly updates                          Mostly reads
  Many small transactions                 Long, complex queries
  MB-GB of data                           GB-TB of data
  Raw data                                Summarized, consolidated data
  Clerical users                          Decision-makers, analysts as users
  Up-to-date data
  Consistency, recoverability critical
14. Data Marts
Smaller warehouses
Span part of the organization
  e.g., marketing (customers, products, sales)
Do not require enterprise-wide consensus
  but long-term integration problems?
15. Warehouse Models & Operators
Data models
  relations
  stars & snowflakes
  cubes
Operators
  slice & dice
  roll-up, drill-down
  pivoting
  other
16. Star

customer:
  custId  name   address    city
  53      joe    10 main    sfo
  81      fred   12 main    sfo
  111     sally  80 willow  la

product:
  prodId  name  price
  p1      bolt  10
  p2      nut   5

store:
  storeId  city
  c1       nyc
  c2       sfo
  c3       la

sale:
  orderId  date    custId  prodId  storeId  qty  amt
  o100     1/7/97  53      p1      c1       1    12
  o102     2/7/97  53      p2      c1       2    11
  o105     3/8/97  111     p1      c3       5    50
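A star schema like this is usually declared with the fact table holding foreign keys into each dimension table. A minimal DDL sketch; types and sizes are assumptions, not from the slides:

    -- Dimension tables
    CREATE TABLE customer (custId INT PRIMARY KEY, name VARCHAR(30),
                           address VARCHAR(50), city VARCHAR(20));
    CREATE TABLE product  (prodId VARCHAR(10) PRIMARY KEY,
                           name VARCHAR(30), price DECIMAL(10,2));
    CREATE TABLE store    (storeId VARCHAR(10) PRIMARY KEY,
                           city VARCHAR(20));

    -- Fact table: one row per sale, keyed into the dimensions
    CREATE TABLE sale (
      orderId  VARCHAR(10) PRIMARY KEY,
      date     DATE,        -- "date" is reserved in some DBMSs; quote or rename there
      custId   INT         REFERENCES customer(custId),
      prodId   VARCHAR(10) REFERENCES product(prodId),
      storeId  VARCHAR(10) REFERENCES store(storeId),
      qty      INT,
      amt      DECIMAL(10,2)
    );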
19. Dimension Hierarchies

store:
  storeId  cityId  tId  mgr
  s5       sfo     t1   joe
  s7       sfo     t2   fred
  s9       la      t1   nancy

city:
  cityId  pop  regId
  sfo     1M   north
  la      5M   south

region:
  regId  name
  north  cold region
  south  warm region

sType:
  tId  size   location
  t1   small  downtown
  t2   large  suburbs

(Diagram: store references sType and city, and city references region: a snowflake schema. Multiple fact tables sharing dimension tables form constellations.)
23. Aggregates

sale:
  prodId  storeId  date  amt
  p1      c1       1     12
  p2      c1       1     11
  p1      c3       1     50
  p2      c2       1     8
  p1      c1       2     44
  p1      c2       2     4

• Add up amounts for day 1
• In SQL:
    SELECT sum(amt)
    FROM sale
    WHERE date = 1

Result: 81
24. Aggregates

sale:
  prodId  storeId  date  amt
  p1      c1       1     12
  p2      c1       1     11
  p1      c3       1     50
  p2      c2       1     8
  p1      c1       2     44
  p1      c2       2     4

• Add up amounts by day
• In SQL:
    SELECT date, sum(amt)
    FROM sale
    GROUP BY date

ans:
  date  sum
  1     81
  2     48
25. Another Example

sale:
  prodId  storeId  date  amt
  p1      c1       1     12
  p2      c1       1     11
  p1      c3       1     50
  p2      c2       1     8
  p1      c1       2     44
  p1      c2       2     4

• Add up amounts by day, product
• In SQL:
    SELECT prodId, date, sum(amt)
    FROM sale
    GROUP BY prodId, date

result:
  prodId  date  amt
  p1      1     62
  p2      1     19
  p1      2     48

Adding prodId to the grouping is a drill-down; removing it (back to totals by day) is a rollup.
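Standard SQL (since SQL:1999) can compute both grouping levels, plus the grand total, in one statement with ROLLUP. A minimal sketch, supported in e.g. PostgreSQL, Oracle, and SQL Server:

    -- (prodId, date) totals, per-prodId subtotals, and the grand total;
    -- NULL marks a rolled-up column in subtotal rows.
    SELECT prodId, date, SUM(amt) AS amt
    FROM sale
    GROUP BY ROLLUP (prodId, date);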
26. Aggregates
Operators: sum, count, max, min, median, ave
HAVING clause
Using the dimension hierarchy:
  average by region (within store)
  maximum by month (within date)
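HAVING filters groups after aggregation. A small sketch against the sale table above:

    -- Keep only products whose total sales exceed 100 (here: p1 = 110).
    SELECT prodId, SUM(amt) AS total
    FROM sale
    GROUP BY prodId
    HAVING SUM(amt) > 100;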
30. Aggregation Using Hierarchies

Hierarchy: customer → region → country
(Customer c1 is in region A; customers c2 and c3 are in region B.)

day 1:                  day 2:
      c1  c2  c3              c1  c2  c3
  p1  12      50          p1  44   4
  p2  11   8              p2

Rolled up to region:
      region A  region B
  p1  56        54
  p2  11         8
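A sketch of the region rollup in SQL, assuming the fact table carries a custId (as on the Star slide) and a hypothetical customer(custId, regId) dimension table mapping customers to regions:

    -- Roll up along the customer -> region hierarchy.
    SELECT c.regId, s.prodId, SUM(s.amt) AS amt
    FROM sale s
    JOIN customer c ON s.custId = c.custId
    GROUP BY c.regId, s.prodId;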
31. Pivoting

Fact table view:

sale:
  prodId  storeId  date  amt
  p1      c1       1     12
  p2      c1       1     11
  p1      c3       1     50
  p2      c2       1     8
  p1      c1       2     44
  p1      c2       2     4

Multi-dimensional cube:

day 1:                  day 2:
      c1  c2  c3              c1  c2  c3
  p1  12      50          p1  44   4
  p2  11   8              p2

Pivoted on storeId (summed over days):
      c1  c2  c3
  p1  56   4  50
  p2  11   8

Pivot turns unique values from one column into unique columns in the output.
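In plain SQL, a pivot like this can be written with conditional aggregation. A minimal sketch:

    -- One output column per store; each SUM adds up only that store's amounts.
    SELECT prodId,
           SUM(CASE WHEN storeId = 'c1' THEN amt END) AS c1,
           SUM(CASE WHEN storeId = 'c2' THEN amt END) AS c2,
           SUM(CASE WHEN storeId = 'c3' THEN amt END) AS c3
    FROM sale
    GROUP BY prodId;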
32. Derived Data
Derived warehouse data:
  indexes
  aggregates
  materialized views (next slide)
When to update derived data?
  Incremental vs. refresh
33. Materialized Views

Define new warehouse relations using SQL expressions.

sale:
  prodId  storeId  date  amt
  p1      c1       1     12
  p2      c1       1     11
  p1      c3       1     50
  p2      c2       1     8
  p1      c1       2     44
  p1      c2       2     4

product:
  id  name  price
  p1  bolt  10
  p2  nut   5

joinTb (does not exist at any source):
  prodId  name  price  storeId  date  amt
  p1      bolt  10     c1       1     12
  p2      nut   5      c1       1     11
  p1      bolt  10     c3       1     50
  p2      nut   5      c2       1     8
  p1      bolt  10     c1       2     44
  p1      bolt  10     c2       2     4
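A sketch of how joinTb could be defined as a materialized view, using PostgreSQL/Oracle-style CREATE syntax (the refresh statement shown is PostgreSQL's):

    -- Materialize the join once; queries can then read joinTb directly.
    CREATE MATERIALIZED VIEW joinTb AS
    SELECT s.prodId, p.name, p.price, s.storeId, s.date, s.amt
    FROM sale s
    JOIN product p ON s.prodId = p.id;

    -- Recompute from the sources on demand (full refresh):
    REFRESH MATERIALIZED VIEW joinTb;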
34. Processing

ROLAP servers vs. MOLAP servers
Index structures
What to materialize?
Algorithms

(Architecture diagram: clients send query & analysis requests to the warehouse; an integration layer, driven by metadata, loads the warehouse from multiple sources.)
35. ROLAP Server

Relational OLAP server: tools and utilities run against a ROLAP server, which sits on a relational DBMS.

sale:
  prodId  date  sum
  p1      1     62
  p2      1     19
  p1      2     48

Special indices, tuning; schema is "denormalized".
36. MOLAP Server

Multi-dimensional OLAP server: M.D. tools and utilities run against a multi-dimensional server, which could also sit on a relational DBMS.

(Diagram: a Sales cube with dimensions Product (milk, soda, eggs, soap), City (A, B), and Date (1, 2, 3, 4).)
37. Index Structures

Traditional access methods:
  B-trees, hash tables, R-trees, grids, ...
Popular in warehouses:
  inverted lists
  bit map indexes
  join indexes
  text indexes
39. Using Inverted Lists

Query: get people with age = 20 and name = "fred"
List for age = 20: r4, r18, r34, r35
List for name = "fred": r18, r52
Answer is the intersection: r18
40. Bit Maps

data records:
  id  name   age
  1   joe    20
  2   fred   20
  3   sally  21
  4   nancy  20
  5   tom    20
  6   pat    25
  7   dave   21
  8   jeff   26
  ...

age index: one bit map per age value (18, 19, 20, 21, 22, 23, 25, 26, ...); bit i is set when record i has that age. For example, the bit map for age = 20 begins 11011000... (records 1, 2, 4, 5 have age 20).
41. Using Bit Maps

Query: get people with age = 20 and name = "fred"
Bit map for age = 20:      1101100000
Bit map for name = "fred": 0100000001
Answer is the intersection (bitwise AND): 0100000000
Good if domain cardinality is small
Bit vectors can be compressed
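In DBMSs that expose bit map indexes directly (Oracle, for example), the structures above could be declared like this; people is the hypothetical table from the example:

    -- Oracle-style bit map indexes on low-cardinality columns; the
    -- optimizer can AND the bit maps to answer the conjunctive query.
    CREATE BITMAP INDEX people_age_bix  ON people (age);
    CREATE BITMAP INDEX people_name_bix ON people (name);

    SELECT * FROM people WHERE age = 20 AND name = 'fred';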
42. Join

• "Combine" the SALE and PRODUCT relations
• In SQL: SELECT * FROM SALE, PRODUCT WHERE ...

sale:
  prodId  storeId  date  amt
  p1      c1       1     12
  p2      c1       1     11
  p1      c3       1     50
  p2      c2       1     8
  p1      c1       2     44
  p1      c2       2     4

product:
  id  name  price
  p1  bolt  10
  p2  nut   5

joinTb:
  prodId  name  price  storeId  date  amt
  p1      bolt  10     c1       1     12
  p2      nut   5      c1       1     11
  p1      bolt  10     c3       1     50
  p2      nut   5      c2       1     8
  p1      bolt  10     c1       2     44
  p1      bolt  10     c2       2     4
43. Join Indexes

product:
  id  name  price  jIndex
  p1  bolt  10     r1, r3, r5, r6
  p2  nut   5      r2, r4

sale:
  rId  prodId  storeId  date  amt
  r1   p1      c1       1     12
  r2   p2      c1       1     11
  r3   p1      c3       1     50
  r4   p2      c2       1     8
  r5   p1      c1       2     44
  r6   p1      c2       2     4

The jIndex column is the join index: for each product row, the rIds of the matching sale rows.
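Some systems let you declare such precomputed join information. For instance, Oracle's bitmap join index; a sketch in Oracle-specific syntax:

    -- Indexes sale rows by the name of the product they reference, so
    -- the join need not be recomputed at query time.
    CREATE BITMAP INDEX sale_prod_bjix
    ON sale (product.name)
    FROM sale, product
    WHERE sale.prodId = product.id;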
44. What to Materialize?

Store in the warehouse results that are useful for common queries.

Example: total sales, materialized at several granularities.

Base data, by (product, store, day):

day 1:                  day 2:
      c1  c2  c3              c1  c2  c3
  p1  12      50          p1  44   4
  p2  11   8              p2

By (product, store):
      c1  c2  c3
  p1  56   4  50
  p2  11   8

By store:
  c1  c2  c3
  67  12  50

By product:
  p1  110
  p2   19

Grand total: 129
...
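One simple way to materialize such a summary is CREATE TABLE ... AS SELECT, which is widely supported (SQL Server uses SELECT ... INTO instead). A sketch for the by-(product, store) totals:

    -- Precompute the (product, store) totals for common queries.
    CREATE TABLE sales_by_prod_store AS
    SELECT prodId, storeId, SUM(amt) AS amt
    FROM sale
    GROUP BY prodId, storeId;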
46. Cube Aggregates Lattice

The possible group-bys form a lattice:

  (city, product, date)
  (city, product)   (city, date)   (product, date)
  (city)   (product)   (date)
  (all)

Example nodes:

(city, product, date):
day 1:                  day 2:
      c1  c2  c3              c1  c2  c3
  p1  12      50          p1  44   4
  p2  11   8              p2

(city, product):
      c1  c2  c3
  p1  56   4  50
  p2  11   8

(city):
  c1  c2  c3
  67  12  50

(all): 129

Use a greedy algorithm to decide what to materialize.
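SQL's CUBE grouping computes every node of such a lattice in one statement. A sketch over the sale table, with storeId standing in for city:

    -- All eight group-bys of (prodId, storeId, date) at once; NULLs in
    -- the output mark dimensions that were aggregated away.
    SELECT prodId, storeId, date, SUM(amt) AS amt
    FROM sale
    GROUP BY CUBE (prodId, storeId, date);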
48. Dimension Hierarchies

Dimension hierarchies add nodes to the lattice; here city rolls up to state:

  (city, product, date)
  (state, product, date)
  (city, product)   (city, date)   (product, date)
  (state, product)   (state, date)
  (city)   (product)   (date)   (state)
  (all)

not all arcs shown ...