The document discusses data warehouse implementation and online analytical processing (OLAP). It describes the compute cube operator, which computes aggregates for all subsets of specified dimensions. It also covers efficient cube computation techniques like chunking and materialized views. Better access methods for OLAP like bitmap indexing and join indexing are also summarized. The document emphasizes that efficient query processing requires determining which operations to perform on available cuboids and selecting the optimal cuboid based on factors like storage size and indexing.
2. “ What is the Challenge ? “
– Faster processing of OLAP queries
Requirements of a Data Warehouse system
Efficient cube computation
Better access methods
Efficient query processing
3. Cube computation
COMPUTE CUBE OPERATOR
Definition :
“ It computes the aggregates over all subsets of the
dimensions specified in the operation “
Syntax :
Compute cube cubename
Example
Consider we define the data cube for an electronic store “Best Electronics”
Dimensions are :
City
Item
Year
Measure :
Sales_in_dollars
4. Cube Operation
• Cube definition and computation in DMQL
define cube sales[item, city, year]: sum(sales_in_dollars)
compute cube sales
• Transform it into a SQL-like language (with a new operator cube by,
introduced by Gray et al.’96)
SELECT item, city, year, SUM (amount)
FROM SALES
CUBE BY item, city, year
• Need to compute the following Group-Bys
(item, city, year),
(item, city), (item, year), (city, year),
(item), (city), (year)
()
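The group-bys listed above can be enumerated mechanically. The following is a minimal Python sketch, not the DMQL or SQL implementation: the fact rows, the `SALES` list, and the `compute_cube` helper are hypothetical and only illustrate that a cube over three dimensions yields 2^3 = 8 group-bys.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy fact rows: (item, city, year, sales_in_dollars)
SALES = [
    ("TV", "Vancouver", 2000, 400),
    ("TV", "Toronto", 2000, 300),
    ("Phone", "Vancouver", 2001, 150),
    ("Phone", "Vancouver", 2000, 50),
]
DIMENSIONS = ("item", "city", "year")


def compute_cube(rows, dimensions):
    """Return {group_by_names: {cell_key: summed_measure}} for every subset."""
    cube = {}
    for k in range(len(dimensions) + 1):
        for group_by in combinations(range(len(dimensions)), k):
            cells = defaultdict(int)
            for row in rows:
                key = tuple(row[i] for i in group_by)
                cells[key] += row[-1]  # the measure is the last field
            names = tuple(dimensions[i] for i in group_by)
            cube[names] = dict(cells)
    return cube


cube = compute_cube(SALES, DIMENSIONS)
for names in cube:
    print(names, len(cube[names]), "cells")
```

The empty tuple `()` is the apex group-by with a single cell, and `("item", "city", "year")` is the base cuboid.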
5. Efficient Data Cube Computation
• Data cube can be viewed as a lattice of cuboids
– The bottom-most cuboid is the base cuboid
– The top-most cuboid (apex) contains only one cell
– How many cuboids in an n-dimensional cube with L levels?
T = \prod_{i=1}^{n} (L_i + 1)
• Materialization of data cube
– Materialize every (cuboid) (full materialization), none (no
materialization), or some (partial materialization)
– Selection of which cuboids to materialize
• Based on size, sharing, access frequency, etc.
6. Iceberg Cube
• Compute only the cuboid cells whose count
or other aggregate satisfies a condition such as
HAVING COUNT(*) >= minsup
Motivation
Only a small portion of cube cells may be “above the water”
in a sparse cube
Only calculate “interesting” cells: data above a certain
threshold
Avoid explosive growth of the cube
Suppose 100 dimensions, only 1 base cell. How many aggregate cells if
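The HAVING COUNT(*) >= minsup condition can be illustrated with a small Python sketch. The rows and the `iceberg_cube` helper are hypothetical. This naive version still counts every group-by and then filters, so it shows only which cells survive the threshold, not the pruning that a dedicated iceberg algorithm would perform.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy rows: (item, city, year)
ROWS = [
    ("TV", "Vancouver", 2000),
    ("TV", "Toronto", 2000),
    ("Phone", "Vancouver", 2001),
    ("Phone", "Vancouver", 2000),
]


def iceberg_cube(rows, n_dims, minsup):
    """Keep only cells whose row count is at least minsup."""
    result = {}
    for k in range(n_dims + 1):
        for group_by in combinations(range(n_dims), k):
            counts = Counter(tuple(row[i] for i in group_by) for row in rows)
            kept = {cell: c for cell, c in counts.items() if c >= minsup}
            if kept:  # drop group-bys with no cell above the threshold
                result[group_by] = kept
    return result


iceberg = iceberg_cube(ROWS, 3, minsup=3)
print(iceberg)
```

With `minsup=3`, only the apex cell, the Vancouver cell, and the year-2000 cell stay "above the water"; every finer cell is discarded.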
7. Compute cube operator
• The statement “ compute cube sales “
• It explicitly instructs the system to compute the sales aggregate cuboids for all the subsets of the set { item,
city, year}
• Generates a lattice of cuboids making up a 3-D data cube ‘sales’
• Each cuboid in the lattice corresponds to a subset
Figure from Data Mining Concepts & Techniques
By Jiawei Han & Micheline Kamber
Page # 72
8. Compute cube operator
Advantages
– Computes all the cuboids for the cube in advance
– Online analytical processing needs to access different cuboids for different queries.
– Precomputation leads to fast response time
Disadvantages
– Required storage space may explode if all of the cuboids in the data cube are
precomputed
• Consider the following 2 cases for n-dimensional cube
– Case 1 : Dimensions have no hierarchies
• Then the total number of cuboids computed for an n-dimensional cube = 2^n
– Case 2: Dimensions have hierarchies
• Then the total number of cuboids computed for an n-dimensional cube is
T = \prod_{i=1}^{n} (L_i + 1)
» Where L_i is the number of levels associated with dimension i
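As a quick check of both cases, the product of (L_i + 1) over all dimensions can be evaluated directly. This is a minimal sketch: `total_cuboids` is a hypothetical helper name, and the level counts 4, 3, and 4 are illustrative.

```python
from math import prod


def total_cuboids(levels_per_dimension):
    """Product of (L_i + 1); the +1 accounts for the virtual top level 'all'."""
    return prod(levels + 1 for levels in levels_per_dimension)


# Case 1: no hierarchies, so each of the 3 dimensions has exactly one level.
no_hierarchy = total_cuboids([1, 1, 1])

# Case 2: three dimensions with 4, 3 and 4 levels (illustrative values).
with_hierarchy = total_cuboids([4, 3, 4])

print(no_hierarchy, with_hierarchy)
```

With one level per dimension the product reduces to 2^n, which is why Case 1 is a special case of Case 2.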
9. Multiway Array Aggregation
“ What is chunking ?”
• MOLAP uses multidimensional array for data storage
• Chunk is obtained by partitioning the multidimensional array such that it
is small enough to fit in the memory available for cube computation
So from the above 2 points we get :
“ Chunking is a method for dividing the n-dimensional array into small n-
dimensional chunks “
10. Multiway Array Aggregation
• It is a technique used for the computation of data cube
• It is used for MOLAP cube construction
Example
• Consider 3-D data array
• Dimensions are A,B,C
• Each dimension is partitioned into 4
equal-sized partitions
• A : a0,a1,a2,a3
• B : b0,b1,b2,b3
• C : c0,c1,c2,c3
• 3-D array is partitioned into 64 chunks as
shown in the figure
Figure from Data Mining Concepts & Techniques
By Jiawei Han & Micheline Kamber
Page # 76
11. Multiway Array Aggregation (contd )
• The cuboids that make up the cube are
– Base cuboid ABC
• From which all other cuboids are
generated
• It is already computed and corresponds
to given 3-D array
– 2-D cuboids AB,AC,BC
– 1-D cuboids A,B,C
– 0-D cuboid (apex cuboid)
Figure from Data Mining Concepts &
Techniques
By Jiawei Han & Micheline Kamber
Page # 76
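The idea of aggregating several cuboids in one pass over the chunks can be sketched in Python. Everything here is an assumption made for illustration: the array size of 8 cells per dimension, the `cell` function that stands in for stored measures, and the chunk scan order. The sketch does not model the memory-minimizing chunk ordering of the full technique.

```python
from itertools import product

DIM = 8              # cells along each of A, B, C (illustrative size)
PARTS = 4            # partitions per dimension, as in the example
SIDE = DIM // PARTS  # cells per chunk side


def cell(a, b, c):
    """Hypothetical measure stored at array position (a, b, c)."""
    return a + 2 * b + 3 * c


def chunks():
    """Yield the 64 chunks as (a_range, b_range, c_range), A varying fastest."""
    for ci, bi, ai in product(range(PARTS), repeat=3):
        yield (range(ai * SIDE, (ai + 1) * SIDE),
               range(bi * SIDE, (bi + 1) * SIDE),
               range(ci * SIDE, (ci + 1) * SIDE))


def multiway_aggregate():
    """Scan each chunk once and update the AB, AC and BC cuboids together."""
    ab = [[0] * DIM for _ in range(DIM)]
    ac = [[0] * DIM for _ in range(DIM)]
    bc = [[0] * DIM for _ in range(DIM)]
    n_chunks = 0
    for a_rng, b_rng, c_rng in chunks():
        n_chunks += 1
        for a, b, c in product(a_rng, b_rng, c_rng):
            v = cell(a, b, c)
            ab[a][b] += v
            ac[a][c] += v
            bc[b][c] += v
    return ab, ac, bc, n_chunks


ab, ac, bc, n_chunks = multiway_aggregate()
a_cuboid = [sum(row) for row in ab]  # 1-D cuboid A derived from AB
apex = sum(a_cuboid)                 # 0-D apex cuboid derived from A
print(n_chunks, apex)
```

Each base cell is read once and contributes to all three 2-D cuboids, and the lower-dimensional cuboids are then derived from a 2-D cuboid rather than from the base array.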
12. Better access methods
For efficient data accessing :
• Materialized View
• Index structures
• Bitmap Indexing – allows quick searching on data
cubes; it is an alternative representation of record_ID lists.
• Join Indexing – registers the joinable rows of two
relations from a relational database.
13. Materialized View
“ Materialized views contain aggregate data (cuboids)
derived from a fact table in order to minimize the
query response time “
There are 3 kinds of materialization
(Given a base cuboid )
1. No Materialization
– Precompute only the base cuboid
• “ Slow response time ”
2. Full Materialization
– Precompute all of the cuboids
• “ Large storage space “
3. Partial Materialization
– Selectively compute a subset of the cuboids
• “ Mix of the above “
14. Bitmap Indexing
• Used for quick searching in data cubes
• Features
– A distinct bit vector Bv for each value v in the domain of the attribute
– If the domain has n values then the bitmap index has n bit vectors
Example
Dimensions
• Item
• city
Where:
H=Home entertainment, C=Computer
P=Phone, S=Security
V=Vancouver, T=Toronto
15. Join Indexing
• It is useful in maintaining the relationship between the foreign key
and its matching primary key
Consider the sales fact table and the dimension tables for location and item
17. Efficient query processing
• Query processing proceeds as follows given materialized
views :
– Determine which operations should be performed on the available
cuboids
• Transforming operations (selection, roll-up, drill-down,…) specified in the query into
corresponding SQL and/or OLAP operations.
– Determine to which materialized cuboid(s) the relevant operations
should be applied
• Identifying the cuboids for answering the query
• Select the cuboid with the least cost
18. Consider a data cube for “Best Electronics” of the form
• “sales [time, item, location]: sum(sales_in_dollars)”
• Dimension hierarchies used are :
– “ day<month<quarter<year ” for time
– “ item_name<brand<type” for item
– “ street<city<province_or_state<country “ for location
• Query :{ brand,province_or_state} with year = 2000
• Materialized cuboids available are
• Cuboid 1: { item_name,city,year}
• Cuboid 2: {brand,country,year}
• Cuboid 3: {brand,province_or_state,year}
• Cuboid 4: {item_name,province_or_state} where year=2000
19. “ Which of the above four cuboids should be selected to process
the query ? “
• Cuboid 2
– It cannot be used
» Since finer granularity data cannot be generated from coarser granularity data
» Here country is more general concept than province_or_state
• Cuboid 1,3,4
• Can be used
• They have the same set or a superset of the dimensions in the query
• The selection clause in the query can imply the selection in the cuboid
• The abstraction levels for the item and location dimensions are at a finer level
than brand and province_or_state respectively
20. “How would the cost of each cuboid compare if used to process the query?”
• Cuboid 1
– Will cost the most
• Since both item_name and city are at a lower level than brand and
province_or_state specified in the query
• Cuboid 3 :
• Will cost least
• If there are not many year values associated with items in the cube but there are
several item_names for each brand
• In that case cuboid 3 will be smaller than cuboid 4
• Cuboid 4 :
• May cost least instead
• If efficient indices are available for it
“Hence some cost based estimation is required in order to decide which set of
cuboids must be selected for query processing “
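The usability test applied to the four cuboids can be written down as a small Python sketch. The function names and the dictionary layout are hypothetical. The sketch only decides which cuboids can answer the query; it does not estimate cost, which would need cuboid sizes and index information that the slides do not give.

```python
# Hierarchies from the example, listed from finest to coarsest level.
HIERARCHIES = {
    "time": ["day", "month", "quarter", "year"],
    "item": ["item_name", "brand", "type"],
    "location": ["street", "city", "province_or_state", "country"],
}


def locate(level):
    """Return (dimension, depth) for a level; depth 0 is the finest."""
    for dim, levels in HIERARCHIES.items():
        if level in levels:
            return dim, levels.index(level)
    raise ValueError(f"unknown level: {level}")


def finest_depth(cuboid_levels, dim):
    depths = [HIERARCHIES[dim].index(l) for l in cuboid_levels
              if l in HIERARCHIES[dim]]
    return min(depths) if depths else None


def can_answer(cuboid_levels, cuboid_filter, query_levels, query_filter):
    # A pre-filtered cuboid is usable only if the query asks for the same slice.
    for level, value in cuboid_filter.items():
        if query_filter.get(level) != value:
            return False
    # Each queried level needs an equal or finer level in the cuboid.
    for level in query_levels:
        dim, needed = locate(level)
        have = finest_depth(cuboid_levels, dim)
        if have is None or have > needed:
            return False
    # Each selection must be pre-applied or applicable on the cuboid.
    for level, value in query_filter.items():
        if cuboid_filter.get(level) == value:
            continue
        dim, needed = locate(level)
        have = finest_depth(cuboid_levels, dim)
        if have is None or have > needed:
            return False
    return True


CUBOIDS = {
    1: ({"item_name", "city", "year"}, {}),
    2: ({"brand", "country", "year"}, {}),
    3: ({"brand", "province_or_state", "year"}, {}),
    4: ({"item_name", "province_or_state"}, {"year": 2000}),
}
QUERY_LEVELS = {"brand", "province_or_state"}
QUERY_FILTER = {"year": 2000}

usable = [n for n, (levels, flt) in CUBOIDS.items()
          if can_answer(levels, flt, QUERY_LEVELS, QUERY_FILTER)]
print(usable)
```

Cuboid 2 is rejected because country is coarser than province_or_state, which matches the reasoning on the slide.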
21. Indexing OLAP Data: Bitmap Index
• Index on a particular column
• Each value in the column has a bit vector: bit-op is fast
• The length of the bit vector: # of records in the base table
• The i-th bit is set if the i-th row of the base table has the value for the
indexed column
• not suitable for high cardinality domains
Base table
Cust  Region   Type
C1    Asia     Retail
C2    Europe   Dealer
C3    Asia     Dealer
C4    America  Retail
C5    Europe   Dealer
Index on Region
RecID  Asia  Europe  America
1      1     0       0
2      0     1       0
3      1     0       0
4      0     0       1
5      0     1       0
Index on Type
RecID  Retail  Dealer
1      1       0
2      0       1
3      0       1
4      1       0
5      0       1
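The base table and its two bitmap indices can be reproduced with a short Python sketch. Storing each bit vector as a Python integer is an assumption made for brevity, and `build_bitmap_index` and `matching_rows` are hypothetical helper names.

```python
# Base table from the slide: (Cust, Region, Type)
BASE_TABLE = [
    ("C1", "Asia", "Retail"),
    ("C2", "Europe", "Dealer"),
    ("C3", "Asia", "Dealer"),
    ("C4", "America", "Retail"),
    ("C5", "Europe", "Dealer"),
]


def build_bitmap_index(rows, column):
    """Map each distinct value to an int whose bit i is set if row i has it."""
    index = {}
    for i, row in enumerate(rows):
        index[row[column]] = index.get(row[column], 0) | (1 << i)
    return index


def matching_rows(bits, rows):
    """Return the customer ids of the rows whose bit is set."""
    return [row[0] for i, row in enumerate(rows) if (bits >> i) & 1]


region = build_bitmap_index(BASE_TABLE, 1)
kind = build_bitmap_index(BASE_TABLE, 2)

# A conjunctive selection becomes a single bitwise AND of two bit vectors.
europe_dealers = matching_rows(region["Europe"] & kind["Dealer"], BASE_TABLE)
print(europe_dealers)
```

The index holds one bit vector per distinct value, which is why the approach becomes expensive for high-cardinality domains.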
22. Indexing OLAP Data: Join Indices
• Join index: JI(R-id, S-id) where R (R-id, …) ⋈ S (S-id, …)
• Traditional indices map the values to a list of record
ids
– It materializes relational join in JI file and speeds
up relational join
• In data warehouses, a join index relates the values of
the dimensions of a star schema to rows in the fact
table.
– E.g. fact table: Sales and two dimensions city and
product
• A join index on city maintains for each distinct
city a list of R-IDs of the tuples recording the
Sales in the city
– Join indices can span multiple dimensions
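A join index on a dimension can be sketched in Python as a mapping from each dimension value to the R-IDs of the matching fact rows. The fact rows and the `build_join_index` helper are hypothetical, and intersecting two single-dimension indices is only one simple way to illustrate an index that spans several dimensions.

```python
from collections import defaultdict

# Hypothetical Sales fact rows: (R-ID, city, product, sales)
SALES = [
    ("R1", "Vancouver", "TV", 400),
    ("R2", "Toronto", "TV", 300),
    ("R3", "Vancouver", "Phone", 150),
]


def build_join_index(fact_rows, column):
    """Map each distinct dimension value to the list of fact-row R-IDs."""
    index = defaultdict(list)
    for row in fact_rows:
        index[row[column]].append(row[0])
    return dict(index)


city_index = build_join_index(SALES, 1)
product_index = build_join_index(SALES, 2)

# Combining two dimensions: intersect the R-ID lists.
vancouver_tv = sorted(set(city_index["Vancouver"]) & set(product_index["TV"]))
print(city_index, vancouver_tv)
```

Because the R-ID lists are precomputed, a query on a city reads only the matching fact rows instead of joining the dimension table to the whole fact table.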
23. Efficient Processing OLAP Queries
• Determine which operations should be performed on the available cuboids
– Transform drill, roll, etc. into corresponding SQL and/or OLAP operations, e.g., dice
= selection + projection
• Determine which materialized cuboid(s) should be selected for OLAP op.
– Let the query to be processed be on {brand, province_or_state} with the condition
“year = 2004”, and there are 4 materialized cuboids available:
1) {year, item_name, city}
2) {year, brand, country}
3) {year, brand, province_or_state}
4) {item_name, province_or_state} where year = 2004
Which should be selected to process the query?
• Explore indexing structures and compressed vs. dense array structs in MOLAP
25. Data Warehouse Usage
• Three kinds of data warehouse applications
– Information processing
• supports querying, basic statistical analysis, and reporting using
crosstabs, tables, charts and graphs
– Analytical processing
• multidimensional analysis of data warehouse data
• supports basic OLAP operations, slice-dice, drilling, pivoting
– Data mining
• knowledge discovery from hidden patterns
• supports associations, constructing analytical models, performing
classification and prediction, and presenting the mining results
using visualization tools
26. From On-Line Analytical Processing (OLAP)
to On-Line Analytical Mining (OLAM)
• Why online analytical mining?
– High quality of data in data warehouses
• DW contains integrated, consistent, cleaned data
– Available information processing structure surrounding data
warehouses
• ODBC, OLEDB, Web accessing, service facilities, reporting
and OLAP tools
– OLAP-based exploratory data analysis
• Mining with drilling, dicing, pivoting, etc.
– On-line selection of data mining functions
• Integration and swapping of multiple mining functions,
algorithms, and tasks
27. An OLAM System Architecture
[Figure: four-layer OLAM architecture]
Layer 4, User Interface: User GUI API; accepts mining queries and returns mining results
Layer 3, OLAP/OLAM: OLAM Engine and OLAP Engine, over the Data Cube API
Layer 2, MDDB: multidimensional database and metadata, over the Database API (filtering and integration)
Layer 1, Data Repository: databases and data warehouse (data cleaning, data integration)
28. OLAP APPLICATIONS
• Financial Applications
• Activity-based costing (resource allocation)
• Budgeting
• Marketing/Sales Applications
• Market Research Analysis
• Sales Forecasting
• Promotions Analysis
• Customer Analyses
• Market/Customer Segmentation
• Business modeling
• Simulating business behaviour
• Extensive, real-time decision support system for managers
29. BENEFITS OF USING OLAP
• OLAP helps managers in decision-making through the multidimensional data
views that it is capable of providing, thus increasing their productivity.
• OLAP applications are self-sufficient owing to the inherent flexibility provided to
the organized databases.
• It enables simulation of business models and problems, through extensive usage
of analysis-capabilities.
• In conjunction with data warehousing, OLAP can be used to reduce
the application backlog, speed up information retrieval and reduce query drag.