SAMSTAR: A Semi-Automated Lexical Method to generate Star Schemas from an ERD
Ritu Khare, Il Yeol Song, Yuan An
PROBLEM
To develop a star schema, existing approaches analyze the
attributes of the business entities of interest.
•Entities with numerical measure attributes are assumed to be
candidate facts.
•Entities with non-numerical, descriptive attributes are assumed
to be candidate dimensions.

2. Annotated Dimensional Design Patterns (A_DDP)
We referred to four sources and instantiated the six classes of
DDP to produce a list of 131 commonly used dimension entities, which
we call Annotated DDPs (A_DDPs). These entities occur frequently in
business processes. Examples include account, activity, agent,
aircraft, and airport.
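Matching an entity name against the A_DDP list can be sketched as a simple lookup. The snippet below is a minimal illustration, not the authors' implementation: `A_DDP_EXAMPLES` holds only the five entries named above (the full list has 131), and `SYNONYMS` is a hand-written, hypothetical stand-in for a WordNet synonym lookup.

```python
# The five example entries named on the poster; the full list has 131 entries.
A_DDP_EXAMPLES = {"account", "activity", "agent", "aircraft", "airport"}

# Hand-written stand-in for a WordNet synonym lookup (hypothetical entries).
SYNONYMS = {
    "aerodrome": {"airport"},
    "broker": {"agent"},
}

def match_a_ddp(entity_name, patterns=A_DDP_EXAMPLES, synonyms=SYNONYMS):
    """Return the pattern entry matched by an entity name or one of its synonyms, else None."""
    name = entity_name.strip().lower()
    candidates = {name} | synonyms.get(name, set())
    hits = sorted(candidates & patterns)
    return hits[0] if hits else None

for entity in ["Account", "Aerodrome", "Broker", "Warehouse"]:
    print(entity, "->", match_a_ddp(entity))
```

An entity whose name or synonym appears in the list is kept as a dimension candidate; anything else, such as `Warehouse` in this reduced list, returns `None`.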

RESULTS

We focus on the structure of the ERD. The novel features of
SAMSTAR are:
(1) the use of the Connection Topology Value (CTV) to identify
candidate facts and dimensions, and
(2) the use of Annotated Dimensional Design Patterns (A_DDP) and
WordNet to extend the list of dimensions.


ALGORITHM

Fig. 2: SAMSTAR Overview. Panels: Quantitative Method (find facts and direct dimensions); Quantitative Method (find indirect dimensions). Resources shown: DDP, A_DDP, WordNet.

1. Connection Topology Value (CTV)

$$\mathrm{CTV}(e) = 1 \cdot n + 0.8 \cdot \sum_{i=1}^{n} \mathrm{CTV}(\mathrm{Node}(i))$$

where Node(i) represents an entity having a direct M:1 relationship with e,
and n is the number of such entities.
For Fig. 1, CTV is calculated in the following manner:

Fig. 1: Calculation of CTV (example ERD with entities A to H; weight_d = 100%, weight_i = 80%)

The CTV for each entity is:
CTV(H) = 1*0 + 0.8*0 = 0
CTV(F) = 0
CTV(G) = 0
CTV(E) = 1*1 + 0.8 * CTV(H) = 1
CTV(B) = 1*1 + 0.8 * CTV(E) = 1.8
CTV(C) = 1*2 + 0.8 * (CTV(G) + CTV(F)) = 2
CTV(D) = 1*1 + 0.8 * CTV(C) = 1 + 0.8 * 2 = 2.6
CTV(A) = 1*2 + 0.8 * (CTV(B) + CTV(C)) = 2 + 0.8 * (1.8 + 2) = 5.04
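The calculation above can be reproduced with a short recursive function. This is a sketch under stated assumptions: the M:1 links in `M_TO_ONE` are read off the worked calculation (the figure itself is not reproduced here), the function name `compute_ctv` is illustrative, and the links are assumed to contain no cycles.

```python
from functools import lru_cache

# Direct M:1 links inferred from the worked calculation for Fig. 1
# (an assumption: entity -> entities it has a direct M:1 relationship with).
M_TO_ONE = {
    "A": ["B", "C"],
    "B": ["E"],
    "C": ["G", "F"],
    "D": ["C"],
    "E": ["H"],
    "F": [],
    "G": [],
    "H": [],
}

def compute_ctv(links, weight_d=1.0, weight_i=0.8):
    """Return {entity: CTV}, assuming the M:1 links contain no cycles."""
    @lru_cache(maxsize=None)
    def ctv(entity):
        children = links.get(entity, [])
        # Direct part: weight_d per direct M:1 link.
        # Indirect part: weight_i times the CTV of each linked entity.
        return weight_d * len(children) + weight_i * sum(ctv(c) for c in children)

    return {entity: ctv(entity) for entity in links}

ctv_values = compute_ctv(M_TO_ONE)
for entity in sorted(ctv_values):
    print(entity, round(ctv_values[entity], 2))
```

With the default weights this yields the same values as the hand calculation, for example 5.04 for A and 2.6 for D. Passing other values for `weight_d` and `weight_i` corresponds to letting the user choose the weighting factors.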

www.ischool.drexel.edu

Figure: an input ER Diagram is processed by SAMSTAR into a Star Schema. Entity labels appearing in the figure: Order, Time, Logistics, Shipment, Customer, Area, Supplier, Product-Supplier, Order-Product, Promotion, Store, Invoice, Promotion Type, Product, Warehouse, Return.

OUR SOLUTION


Hence, these approaches are qualitative in nature, and focus only
on the semantics of ERD.


1. Pre-process the input ERD.
2. Store the entities and relationships.
3. Let the user choose weighting factors for direct and indirect relationships.
4. Calculate the CTV for all entities.
5. Calculate the threshold value, Th, for CTV.
6. Identify the entities having a CTV higher than the threshold Th.
These are the candidates for fact tables.
7. Decide and shortlist the fact entities.
8. For each fact entity, perform the following steps:
(i) Identify the entities having a direct M:1 link with the fact entity.
(ii) Identify the entities having an indirect M:1 link with the fact entity.
For these entities, look up synonyms of the entity names in WordNet, and
extract the terms that match the DDP entity list or the A_DDP list.
(iii) Combine the results of Steps 8(i) and 8(ii) to prepare a list of candidate
dimensions for the given fact.
(iv) Add the time dimension to this list.
9. Decide the dimension entities.
10. Let the user post-process the star schemas.
11. Generate the star schema(s).
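Steps 5, 6, and 8 above can be sketched as follows. Several parts are assumptions made for illustration: the poster does not state how Th is computed, so the mean CTV is used as a placeholder; the ERD in `LINKS`, the CTV values, and the `KNOWN_DIMENSIONS` set (standing in for the DDP and A_DDP lists after synonym lookup) are all hypothetical.

```python
from statistics import mean

# Hypothetical ERD: entity -> entities it has a direct M:1 link to.
LINKS = {
    "Order-Product": ["Order", "Product"],
    "Order": ["Customer", "Store"],
    "Customer": ["Area"],
    "Product": [], "Store": [], "Area": [],
}
# CTVs worked out by hand with weights 1 and 0.8,
# e.g. CTV(Order) = 1*2 + 0.8*(1 + 0) = 2.8.
CTV = {"Order-Product": 4.24, "Order": 2.8, "Customer": 1.0,
       "Product": 0.0, "Store": 0.0, "Area": 0.0}
# Hypothetical stand-in for the DDP / A_DDP lists (lower-cased names).
KNOWN_DIMENSIONS = {"customer", "store", "product"}

def fact_candidates(ctv, threshold=None):
    """Steps 5-6: entities whose CTV is higher than the threshold Th."""
    # Assumption: Th defaults to the mean CTV; the poster gives no formula.
    th = mean(ctv.values()) if threshold is None else threshold
    return sorted(e for e, value in ctv.items() if value > th)

def indirect_entities(links, fact):
    """Entities reached from the fact through two or more M:1 links."""
    seen = set()
    frontier = list(links.get(fact, []))
    while frontier:
        node = frontier.pop()
        for nxt in links.get(node, []):
            if nxt != fact and nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return sorted(seen)

def candidate_dimensions(links, fact, known=KNOWN_DIMENSIONS):
    """Step 8: direct dimensions, pattern-matched indirect ones, plus Time."""
    direct = list(links.get(fact, []))                      # 8(i)
    indirect = [e for e in indirect_entities(links, fact)   # 8(ii)
                if e not in direct and e.lower() in known]
    return direct + indirect + ["Time"]                     # 8(iii), 8(iv)

facts = fact_candidates(CTV)
print(facts)
print(candidate_dimensions(LINKS, "Order-Product"))
```

The output is only a candidate list. Steps 7, 9, and 10 leave the final choice of facts and dimensions to the designer, which is why the method is semi-automated.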


Fig. 3: Result of SAMSTAR
The schemas generated by SAMSTAR are similar to those produced
by the manual steps in the case study papers.
The SAMSTAR-generated star schemas are a superset of the ones
generated manually in those papers from the same ER diagrams,
user needs, and business goals.
This shows that our schemas include all possible facts and
dimensions, which gives the designer a helpful starting point to prune
the schema according to business and user requirements.

CONTRIBUTION
•A universal method to generate star schema(s), in that we use
generalized DDPs and WordNet to identify the dimensions of a
fact table.
•Quantitative in nature, in that we analyze the structure of the input
ERD.
•Identifies a set of fact candidates from a large and complex ERD.
•Automatable to a large extent; simplifies the work of
experienced designers and gives novices a smooth head start.

The paper "SAMSTAR: A Semi-Automated Lexical Method for
generating Star Schemas using an Entity Relationship
Diagram" was presented at the Data Warehousing and On-Line
Analytical Processing (DOLAP 2007) workshop held in Lisbon,
Portugal, in November 2007.

SAMSTAR: A Semi-automated Lexical Method to generate Star Schemas from an ERD

  • 1. SAMSTAR: A Semi-Automated Lexical Method to generate Star Schemas from an ERD
Authors: Ritu Khare, Il Yeol Song, Yuan An
www.ischool.drexel.edu

PROBLEM
To develop a star schema, existing approaches analyze the attributes of interesting business entities.
• Entities with numerical measure attributes are assumed to be the candidates of facts.
• Entities with non-numerical and descriptive attributes are assumed to be the candidates of dimensions.
Hence, these approaches are qualitative in nature, and focus only on the semantics of the ERD.

OUR SOLUTION
We focus on the structure of the ERD. The novel features of SAMSTAR are:
(1) the use of the notion of Connection Topology Value (CTV) in identifying the candidates of facts and dimensions, and
(2) the use of Annotated Dimensional Design Patterns (A_DDP) as well as WordNet to extend the list of dimensions.

[Fig.2: SAMSTAR Overview. Labels: Quantitative Method (find facts and direct dimensions); Quantitative Method (find indirect dimensions); DDP; A_DDP; WordNet]

1. Connection Topology Value (CTV)
$$\mathrm{CTV}(e) = 1 \cdot n + 0.8 \cdot \sum_{i=1}^{n} \mathrm{CTV}(\mathrm{Node}(i))$$
where Node(i) represents an entity having a direct M:1 relationship with e, and n is the number of such entities (as the worked values below indicate).
For Fig. 1, CTV is calculated in the following manner, with weight_d = 100% and weight_i = 80%.
The CTV for each entity is:
CTV(H) = 1*0 + 0.8*0 = 0
CTV(F) = 0
CTV(G) = 0
CTV(E) = 1*1 + 0.8 * CTV(H) = 1
CTV(B) = 1*1 + 0.8 * CTV(E) = 1.8
CTV(C) = 1*2 + 0.8 * (CTV(G) + CTV(F)) = 2
CTV(D) = 1*1 + 0.8 * CTV(C) = 1 + 0.8 * 2 = 2.6
CTV(A) = 1*2 + 0.8 * (CTV(B) + CTV(C)) = 2 + 0.8 * (1.8 + 2) = 5.04
[Fig.1: Calculation of CTV. Example ERD with entities A, B, C, D, E, F, G, H]

2. Annotated Dimension Design Patterns (A_DDP)
We have referred to four sources and have instantiated the six classes of DDP to produce a list of 131 commonly used dimension entities. We refer to these entities as Annotated DDP (A_DDPs). These entities are frequently used in business processes. Examples include account, activity, agent, aircraft, airport, etc.

ALGORITHM
1. Pre-process the input ERD.
2. Store entities and relationships.
3. Let the user choose weighting factors for direct and indirect relationships.
4. Calculate the CTV for all entities.
5. Calculate the threshold value, Th, for CTV.
6. Identify entities having CTV higher than the threshold Th. These are the candidates for fact tables.
7. Decide and shortlist the fact entities.
8. For each fact entity, perform the following steps:
(i) Identify the entities having a direct M:1 link with the fact entity.
(ii) Identify the entities having an indirect M:1 link with the fact entity. Out of these entities, identify synonyms of entity names from WordNet. Extract the terms which match the DDP entity list or the A_DDP list.
(iii) Combine the results of Steps 8(i) and 8(ii) to prepare a list of candidate dimensions for the given fact.
(iv) Add the time dimension to this list.
9. Decide the dimension entities.
10. Let the user post-process the star schemas.
11. Generate the star schema(s).

RESULTS
[Fig.3: Result of SAMSTAR. Labels: ER Diagram, SAMSTAR, Star Schema. Entities shown include Order, Order-Product, Product, Product-Supplier, Supplier, Customer, Area, Shipment, Logistics, Invoice, Return, Store, Warehouse, Promotion, Promotion Type and Time]
Schemas generated by SAMSTAR are similar to those generated by the manual steps in case study papers.
SAMSTAR-generated star schemas are supersets of the ones generated manually in the paper using the same ER diagrams, user needs and business goals. This shows that our schemas are inclusive of all possible facts and dimensions, which gives the designer a helpful aid to prune the schema as per the business and user requirements.

CONTRIBUTION
• A universal method to generate star schema(s), in that we have used generalized DDPs and WordNet to identify the dimensions of a fact table.
• Quantitative in nature, in that we analyze the structure of an input ERD.
• Identifies a set of fact candidates from a large and complex ERD.
• Automatable to a large extent; simplifies the work of experienced designers and gives a smooth head start to novices.

The paper "SAMSTAR: A Semi-Automated Lexical Method for generating Star Schemas using an Entity Relationship Diagram" was presented at the Data Warehousing and On-line Analytical Processing (DOLAP 2007) workshop held in Lisbon, Portugal, November 2007.
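
As an illustration of the CTV calculation and of steps 4 to 8(ii) of the algorithm, the following is a minimal Python sketch, not the authors' implementation. Several things in it are assumptions: the adjacency list `ERD` is inferred from the worked CTV values for Fig. 1 (the figure itself is not reproduced in this transcript), the function names are hypothetical, the ERD is assumed to have no cycles of M:1 links, and the threshold is passed in by the caller because the poster does not state how Th is computed. The WordNet and A_DDP matching of step 8(ii) is left out.

```python
# Hypothetical adjacency list inferred from the worked values for Fig. 1:
# each key has a direct M:1 relationship with every entity in its list.
ERD = {
    "A": ["B", "C"],
    "B": ["E"],
    "C": ["F", "G"],
    "D": ["C"],
    "E": ["H"],
    "F": [],
    "G": [],
    "H": [],
}


def compute_ctv(erd, weight_d=1.0, weight_i=0.8):
    """Return {entity: CTV}, where
    CTV(e) = weight_d * n + weight_i * sum of CTV over the n direct M:1 targets.
    Assumes the M:1 links form no cycles."""
    memo = {}

    def ctv(entity):
        if entity not in memo:
            targets = erd[entity]
            memo[entity] = (weight_d * len(targets)
                            + weight_i * sum(ctv(t) for t in targets))
        return memo[entity]

    return {entity: ctv(entity) for entity in erd}


def fact_candidates(ctv_values, threshold):
    """Step 6: entities whose CTV is strictly higher than the threshold.
    The threshold is supplied by the caller (an assumption of this sketch)."""
    return sorted(e for e, v in ctv_values.items() if v > threshold)


def direct_dimensions(erd, fact):
    """Step 8(i): entities with a direct M:1 link from the fact entity."""
    return sorted(erd[fact])


def indirect_dimensions(erd, fact):
    """Step 8(ii), structural part only: entities reachable from the fact
    through a chain of two or more M:1 links, excluding direct ones."""
    direct = set(erd[fact])
    seen = set()
    stack = list(direct)
    while stack:
        node = stack.pop()
        for target in erd[node]:
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return sorted(seen - direct - {fact})


if __name__ == "__main__":
    values = compute_ctv(ERD)
    for entity in sorted(values):
        print(entity, round(values[entity], 2))
    print(fact_candidates(values, threshold=2.0))
    print(direct_dimensions(ERD, "A"), indirect_dimensions(ERD, "A"))
```

With the default weights (100% direct, 80% indirect), this sketch is intended to reproduce the poster's worked values, for example 5.04 for A and 2.6 for D. Memoizing each entity's CTV keeps the recursion linear in the size of the ERD even when an entity such as C is reached from more than one parent.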