SlideShare a Scribd company logo
1 of 15
EDDI 2013
5th Annual European DDI User Conference
December 3, 2013 – December 4, 2013
Paris, France
Metadata Technology North America
http://www.mtna.us
http://www.openmetadata.org
mtna@mtna.us
The <Meta>Data Dance
The <Meta/>Data Dance
The <Meta>Data Dance
 Tools to facilitate statistical/scientific data management
 Complement existing tools (not replace)
 Facilitate adoption of standards and promote best practices
 Alleviate the need to master complexities of metadata standards,
technologies

 Vision: web based services (cloud users), stand alone desktop /server
products (agencies/enclaves), libraries (developers)
 Broad Target audience: Data Producers / Manager / Researchers /
Users
 DataForge Today (lean model):
 Simple desktop based tools
 SledgeHammer - Data/metadata transformation


LavaCore - Java library

 Caelum - Reporting / Publication
 Asmurex: ASCII Multi-Record Exctraction tool

 DataForge Tomorrow:
 DataForge Online, Web Services, More Desktop tools, Java Library
__ _
_
/ _| | ___ __| | __ _ ___ / /__ _ _ __ ___ _ __ ___
___ _ __
  | |/ _ / _` |/ _` |/ _ / /_/ / _` | '_ ` _ | '_ ` _  / _  '__|
_ | | __/ (_| | (_| | __/ __ / (_| | | | | | | | | | | | __/ |
__/|_|___|__,_|__, |___/ /_/ __,_|_| |_| |_|_| |_| |_|___|_|
|___/

 Data Liberation
 Unlock from proprietary formats, turn into open data
 Produce metadata + convert data into standard ASCII formats

 Open <Meta>Data Integration
 Generating scripts for loading into stats packages, databases, big
data engines, cloud
 Use with open systems / environments
script

triple-s
 Challenges
 Read proprietary formats
 Data integrity
 ASCII optimization
 Cross package formatting issues
 SQL Database limits (columns, indexes, views)
 Performance (Timings, size, memory usage)
 …
 Key Features
 Input: SPSS, Stata, SAS/Syntax+ASCII, ASCII+DDI-C / DDI-L,
ASCII+SSS 1.1/2.0, ASCII+StatTransfer
 Data Out: Fixed w/optimization, CSV, Delimited
 Metadata Out: DDI-C 1.2.2/Nesstar/2.1/2.5, DDI-L 3.1-RP/3.1SU/Colectica, DDI-L 3.2-RP, DDI-L 3.2 SU, Triple-S 1.1/2.0
 Summary Stats: Min / Max /Valid/Invalid /Mean, StdDev
/Variance / Frequencies, Weighted *, Save in CSV*.
 Scripts Generators: SAS, SPSS, Stata, R, Mysql, Oracle*, MSSQL*, HSQL*, Google BigQuery*, HP/Vertica* +
dimension/lookup tables, indexes/foreign keys*
 Command line interface
 Key changes since IASSIST 2013
 Desktop User Interface
 SAS (Win+DSRead), handling of missing values, DDI 3.2, lots of
tweaks

 Editions
 Freeware
 Community: 100 variables / 1,000 obs.
 Registered (free): 500 variables / 5,000 obs. (or contact us)
 Available today (Candidate Release)

 Pro/Enterprise (1Q2014)
 No data size limits
 Weighted stats, save stats, classification auto-harmonizer, batch mode,
add database formats & star schema
 Subscription: $29/3-month or $99/year , Perpetual: $279

 LavaCore:
 Currently internal only, planning for OEM
 Future Features (demand driven)
 Support additional input/output formats
 Additional ID generators

 DataWeaver (data transformation)
 Produce data subset
 Synthetic data generators
 Disclosure control

 DataForge Online (2014)
 Free and pay as you go services

 DataForge Services (available today)
 Community support in Google groups
 MTNA: pay as you go / annual support, custom developments
 IDMS: data/metadata management / processing
Products

SledgeHammer

Caelum

DataWeaver

Asmurex

Data/metadata
Management
(free/paid)

Documentation
XML Transform
(free)

Data subsets,
derivation, transforms
(internal/project)

ASCII multi record
Extraction
(free)

DMP Editor

CMS

Simple editor for
Data Management
Plan
(OSS)

Classification
Management System
(2014Q3 / paid )

SledgeHammer--BI Survey Manager
Integration with
Business Intelligence
(internal/projects)

DDI 3.1 Editor for
simple projects or
training
(OSS)
Services
Expertise
Expert Advisory and Technical Assistance
• Establishment of corporate data
management strategies
• Adoption of metadata standards and
related best practices
• Development of technical specifications
for data management infrastructure
• Statistical and scientific data management
technologies

Training
Training in and Support for:
• Data Documentation Initiative (DDI)
• Statistical Data and Metadata Exchange
(SDMX)
• Generic Statistical Business Process Model
(GSBPM)
• Generic Statistical Information Model (GSIM)
• General data and metadata management
support

Development
Application Development
• Statistical and Scientific data
management solutions
• Solutions include production, archiving,
dissemination, analysis
• Web-based data delivery solutions (web
services, portals, secure environments)
• Standards-based data management
tools/utilities

Technologies
Leverage New Technologies
• Help introduce new state-of-the-art
technologies to the community and
customers
• Deployment, hosting, and management
of cloud-based infrastructures
http://www.openmetadata.org/dataforge/sledgehammer

DEMO

More Related Content

What's hot

2016 SDMX Experts meeting, Implementation of SDMX RI at INS, Kamel Abdellaoui
2016 SDMX Experts meeting, Implementation of SDMX RI at INS, Kamel Abdellaoui2016 SDMX Experts meeting, Implementation of SDMX RI at INS, Kamel Abdellaoui
2016 SDMX Experts meeting, Implementation of SDMX RI at INS, Kamel AbdellaouiStatsCommunications
 
WEB PROGRAMMING USING ASP.NET
WEB PROGRAMMING USING ASP.NETWEB PROGRAMMING USING ASP.NET
WEB PROGRAMMING USING ASP.NETDhruvVekariya3
 
Arches Getty Brownbag Talk
Arches Getty Brownbag TalkArches Getty Brownbag Talk
Arches Getty Brownbag Talkbenosteen
 
DW Appliance
DW ApplianceDW Appliance
DW ApplianceShankar R
 
Open Source Business Intelligence Overview
Open Source Business Intelligence OverviewOpen Source Business Intelligence Overview
Open Source Business Intelligence OverviewAlex Meadows
 
7 data warehouse & marts
7 data warehouse & marts7 data warehouse & marts
7 data warehouse & martsNymphea Saraf
 
HDL - Towards A Harmonized Dataset Model for Open Data Portals
HDL - Towards A Harmonized Dataset Model for Open Data PortalsHDL - Towards A Harmonized Dataset Model for Open Data Portals
HDL - Towards A Harmonized Dataset Model for Open Data PortalsAhmad Assaf
 
MS SQL SERVER: Using the data mining tools
MS SQL SERVER: Using the data mining toolsMS SQL SERVER: Using the data mining tools
MS SQL SERVER: Using the data mining toolsDataminingTools Inc
 
Hadoop - An Introduction
Hadoop - An IntroductionHadoop - An Introduction
Hadoop - An IntroductionShankar R
 
2016 SDMX Experts meeting, How to collect data using SDMX? Hubertus Cloodt, A...
2016 SDMX Experts meeting, How to collect data using SDMX? Hubertus Cloodt, A...2016 SDMX Experts meeting, How to collect data using SDMX? Hubertus Cloodt, A...
2016 SDMX Experts meeting, How to collect data using SDMX? Hubertus Cloodt, A...StatsCommunications
 
2016 SDMX Experts meeting, Using SDMX to enable data-sharing for analytical a...
2016 SDMX Experts meeting, Using SDMX to enable data-sharing for analytical a...2016 SDMX Experts meeting, Using SDMX to enable data-sharing for analytical a...
2016 SDMX Experts meeting, Using SDMX to enable data-sharing for analytical a...StatsCommunications
 
Future of Data - Big Data
Future of Data - Big DataFuture of Data - Big Data
Future of Data - Big DataShankar R
 
Mondrian and OLAP Overview
Mondrian and OLAP OverviewMondrian and OLAP Overview
Mondrian and OLAP OverviewAlex Meadows
 

What's hot (20)

2016 SDMX Experts meeting, Implementation of SDMX RI at INS, Kamel Abdellaoui
2016 SDMX Experts meeting, Implementation of SDMX RI at INS, Kamel Abdellaoui2016 SDMX Experts meeting, Implementation of SDMX RI at INS, Kamel Abdellaoui
2016 SDMX Experts meeting, Implementation of SDMX RI at INS, Kamel Abdellaoui
 
WEB PROGRAMMING USING ASP.NET
WEB PROGRAMMING USING ASP.NETWEB PROGRAMMING USING ASP.NET
WEB PROGRAMMING USING ASP.NET
 
Data services
Data servicesData services
Data services
 
Database and types of databases
Database and types of databasesDatabase and types of databases
Database and types of databases
 
Arches Getty Brownbag Talk
Arches Getty Brownbag TalkArches Getty Brownbag Talk
Arches Getty Brownbag Talk
 
DW Appliance
DW ApplianceDW Appliance
DW Appliance
 
Open Source Business Intelligence Overview
Open Source Business Intelligence OverviewOpen Source Business Intelligence Overview
Open Source Business Intelligence Overview
 
MS Access
MS AccessMS Access
MS Access
 
7 data warehouse & marts
7 data warehouse & marts7 data warehouse & marts
7 data warehouse & marts
 
HDL - Towards A Harmonized Dataset Model for Open Data Portals
HDL - Towards A Harmonized Dataset Model for Open Data PortalsHDL - Towards A Harmonized Dataset Model for Open Data Portals
HDL - Towards A Harmonized Dataset Model for Open Data Portals
 
MS SQL SERVER: Using the data mining tools
MS SQL SERVER: Using the data mining toolsMS SQL SERVER: Using the data mining tools
MS SQL SERVER: Using the data mining tools
 
Hadoop - An Introduction
Hadoop - An IntroductionHadoop - An Introduction
Hadoop - An Introduction
 
2016 SDMX Experts meeting, How to collect data using SDMX? Hubertus Cloodt, A...
2016 SDMX Experts meeting, How to collect data using SDMX? Hubertus Cloodt, A...2016 SDMX Experts meeting, How to collect data using SDMX? Hubertus Cloodt, A...
2016 SDMX Experts meeting, How to collect data using SDMX? Hubertus Cloodt, A...
 
Tableau best practises
Tableau best practisesTableau best practises
Tableau best practises
 
Bigdata
BigdataBigdata
Bigdata
 
Database
DatabaseDatabase
Database
 
2016 SDMX Experts meeting, Using SDMX to enable data-sharing for analytical a...
2016 SDMX Experts meeting, Using SDMX to enable data-sharing for analytical a...2016 SDMX Experts meeting, Using SDMX to enable data-sharing for analytical a...
2016 SDMX Experts meeting, Using SDMX to enable data-sharing for analytical a...
 
Future of Data - Big Data
Future of Data - Big DataFuture of Data - Big Data
Future of Data - Big Data
 
Mondrian and OLAP Overview
Mondrian and OLAP OverviewMondrian and OLAP Overview
Mondrian and OLAP Overview
 
Tableau file types
Tableau   file typesTableau   file types
Tableau file types
 

Viewers also liked

NIH BD2K DataMed data index - DATS model
NIH BD2K DataMed data index - DATS modelNIH BD2K DataMed data index - DATS model
NIH BD2K DataMed data index - DATS modelSusanna-Assunta Sansone
 
Beyond regulatory submission - standards metadata management
Beyond regulatory submission  - standards metadata managementBeyond regulatory submission  - standards metadata management
Beyond regulatory submission - standards metadata managementKevin Lee
 
BioCADDIE: Descriptive Metadata for Datasets WG3 - ELIXIR All Hands
BioCADDIE: Descriptive Metadata for Datasets WG3 - ELIXIR All HandsBioCADDIE: Descriptive Metadata for Datasets WG3 - ELIXIR All Hands
BioCADDIE: Descriptive Metadata for Datasets WG3 - ELIXIR All HandsSusanna-Assunta Sansone
 
A brief overview of metadata for datasets
A brief overview of metadata for datasetsA brief overview of metadata for datasets
A brief overview of metadata for datasetssesrdm
 
VoID: Metadata for RDF Datasets
VoID: Metadata for RDF DatasetsVoID: Metadata for RDF Datasets
VoID: Metadata for RDF DatasetsRichard Cyganiak
 
NIH BD2K bioCADDIE DataMed: Data Discovery Index
NIH BD2K bioCADDIE DataMed: Data Discovery IndexNIH BD2K bioCADDIE DataMed: Data Discovery Index
NIH BD2K bioCADDIE DataMed: Data Discovery IndexSusanna-Assunta Sansone
 

Viewers also liked (7)

What's New in RDF 1.1?
What's New in RDF 1.1?What's New in RDF 1.1?
What's New in RDF 1.1?
 
NIH BD2K DataMed data index - DATS model
NIH BD2K DataMed data index - DATS modelNIH BD2K DataMed data index - DATS model
NIH BD2K DataMed data index - DATS model
 
Beyond regulatory submission - standards metadata management
Beyond regulatory submission  - standards metadata managementBeyond regulatory submission  - standards metadata management
Beyond regulatory submission - standards metadata management
 
BioCADDIE: Descriptive Metadata for Datasets WG3 - ELIXIR All Hands
BioCADDIE: Descriptive Metadata for Datasets WG3 - ELIXIR All HandsBioCADDIE: Descriptive Metadata for Datasets WG3 - ELIXIR All Hands
BioCADDIE: Descriptive Metadata for Datasets WG3 - ELIXIR All Hands
 
A brief overview of metadata for datasets
A brief overview of metadata for datasetsA brief overview of metadata for datasets
A brief overview of metadata for datasets
 
VoID: Metadata for RDF Datasets
VoID: Metadata for RDF DatasetsVoID: Metadata for RDF Datasets
VoID: Metadata for RDF Datasets
 
NIH BD2K bioCADDIE DataMed: Data Discovery Index
NIH BD2K bioCADDIE DataMed: Data Discovery IndexNIH BD2K bioCADDIE DataMed: Data Discovery Index
NIH BD2K bioCADDIE DataMed: Data Discovery Index
 

Similar to OpenDataForge - SledgeHammer EDDI 2013 presentation

Virtualisation de données : Enjeux, Usages & Bénéfices
Virtualisation de données : Enjeux, Usages & BénéficesVirtualisation de données : Enjeux, Usages & Bénéfices
Virtualisation de données : Enjeux, Usages & BénéficesDenodo
 
[DSC Europe 22] Smart approach in development and deployment process for vari...
[DSC Europe 22] Smart approach in development and deployment process for vari...[DSC Europe 22] Smart approach in development and deployment process for vari...
[DSC Europe 22] Smart approach in development and deployment process for vari...DataScienceConferenc1
 
Getting Started with Data Virtualization – What problems DV solves
Getting Started with Data Virtualization – What problems DV solvesGetting Started with Data Virtualization – What problems DV solves
Getting Started with Data Virtualization – What problems DV solvesDenodo
 
Session 2 - A Project Perspective on Big Data Architectural Pipelines and Ben...
Session 2 - A Project Perspective on Big Data Architectural Pipelines and Ben...Session 2 - A Project Perspective on Big Data Architectural Pipelines and Ben...
Session 2 - A Project Perspective on Big Data Architectural Pipelines and Ben...DataBench
 
Meeting today’s dissemination challenges – Implementing International Standar...
Meeting today’s dissemination challenges – Implementing International Standar...Meeting today’s dissemination challenges – Implementing International Standar...
Meeting today’s dissemination challenges – Implementing International Standar...Jonathan Challener
 
WaterlooHiveTalk
WaterlooHiveTalkWaterlooHiveTalk
WaterlooHiveTalknzhang
 
5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data LakeMetroStar
 
The Six pillars for Building big data analytics ecosystems
The Six pillars for Building big data analytics ecosystemsThe Six pillars for Building big data analytics ecosystems
The Six pillars for Building big data analytics ecosystemstaimur hafeez
 
ACdP Fiware.pdf
ACdP Fiware.pdfACdP Fiware.pdf
ACdP Fiware.pdfMASSAL3
 
25 Best Data Mining Tools in 2022
25 Best Data Mining Tools in 202225 Best Data Mining Tools in 2022
25 Best Data Mining Tools in 2022Kavika Roy
 
Analyti x mapping manager product overview presentation
Analyti x mapping manager product overview presentationAnalyti x mapping manager product overview presentation
Analyti x mapping manager product overview presentationAnalytixDataServices
 
Modern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationModern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationDenodo
 
Python + MPP Database = Large Scale AI/ML Projects in Production Faster
Python + MPP Database = Large Scale AI/ML Projects in Production FasterPython + MPP Database = Large Scale AI/ML Projects in Production Faster
Python + MPP Database = Large Scale AI/ML Projects in Production FasterPaige_Roberts
 
Data Mining for Developers
Data Mining for DevelopersData Mining for Developers
Data Mining for Developersllangit
 
Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...
Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...
Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...Denodo
 
Saying goodbye to SQL Server 2000
Saying goodbye to SQL Server 2000Saying goodbye to SQL Server 2000
Saying goodbye to SQL Server 2000ukdpe
 

Similar to OpenDataForge - SledgeHammer EDDI 2013 presentation (20)

jagadeesh updated
jagadeesh updatedjagadeesh updated
jagadeesh updated
 
Virtualisation de données : Enjeux, Usages & Bénéfices
Virtualisation de données : Enjeux, Usages & BénéficesVirtualisation de données : Enjeux, Usages & Bénéfices
Virtualisation de données : Enjeux, Usages & Bénéfices
 
[DSC Europe 22] Smart approach in development and deployment process for vari...
[DSC Europe 22] Smart approach in development and deployment process for vari...[DSC Europe 22] Smart approach in development and deployment process for vari...
[DSC Europe 22] Smart approach in development and deployment process for vari...
 
Getting Started with Data Virtualization – What problems DV solves
Getting Started with Data Virtualization – What problems DV solvesGetting Started with Data Virtualization – What problems DV solves
Getting Started with Data Virtualization – What problems DV solves
 
Session 2 - A Project Perspective on Big Data Architectural Pipelines and Ben...
Session 2 - A Project Perspective on Big Data Architectural Pipelines and Ben...Session 2 - A Project Perspective on Big Data Architectural Pipelines and Ben...
Session 2 - A Project Perspective on Big Data Architectural Pipelines and Ben...
 
Data mining
Data miningData mining
Data mining
 
Meeting today’s dissemination challenges – Implementing International Standar...
Meeting today’s dissemination challenges – Implementing International Standar...Meeting today’s dissemination challenges – Implementing International Standar...
Meeting today’s dissemination challenges – Implementing International Standar...
 
WaterlooHiveTalk
WaterlooHiveTalkWaterlooHiveTalk
WaterlooHiveTalk
 
5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake
 
The Six pillars for Building big data analytics ecosystems
The Six pillars for Building big data analytics ecosystemsThe Six pillars for Building big data analytics ecosystems
The Six pillars for Building big data analytics ecosystems
 
ACdP Fiware.pdf
ACdP Fiware.pdfACdP Fiware.pdf
ACdP Fiware.pdf
 
25 Best Data Mining Tools in 2022
25 Best Data Mining Tools in 202225 Best Data Mining Tools in 2022
25 Best Data Mining Tools in 2022
 
Analyti x mapping manager product overview presentation
Analyti x mapping manager product overview presentationAnalyti x mapping manager product overview presentation
Analyti x mapping manager product overview presentation
 
Modern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationModern Data Management for Federal Modernization
Modern Data Management for Federal Modernization
 
Mihai_Nuta
Mihai_NutaMihai_Nuta
Mihai_Nuta
 
Python + MPP Database = Large Scale AI/ML Projects in Production Faster
Python + MPP Database = Large Scale AI/ML Projects in Production FasterPython + MPP Database = Large Scale AI/ML Projects in Production Faster
Python + MPP Database = Large Scale AI/ML Projects in Production Faster
 
Data Mining for Developers
Data Mining for DevelopersData Mining for Developers
Data Mining for Developers
 
BR_Summation_08052014
BR_Summation_08052014BR_Summation_08052014
BR_Summation_08052014
 
Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...
Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...
Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...
 
Saying goodbye to SQL Server 2000
Saying goodbye to SQL Server 2000Saying goodbye to SQL Server 2000
Saying goodbye to SQL Server 2000
 

Recently uploaded

Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 

Recently uploaded (20)

Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 

OpenDataForge - SledgeHammer EDDI 2013 presentation

  • 1. EDDI 2013 5th Annual European DDI User Conference December 3, 2013 – December 4, 2013 Paris, France Metadata Technology North America http://www.mtna.us http://www.openmetadata.org mtna@mtna.us
  • 5.  Tools to facilitate statistical/scientific data management  Complement existing tools (not replace)  Facilitate adoption of standards and promote best practices  Alleviate the need to master complexities of metadata standards, technologies  Vision: web based services (cloud users), stand alone desktop /server products (agencies/enclaves), libraries (developers)  Broad Target audience: Data Producers / Manager / Researchers / Users  DataForge Today (lean model):  Simple desktop based tools  SledgeHammer - Data/metadata transformation  LavaCore - Java library  Caelum - Reporting / Publication  Asmurex: ASCII Multi-Record Exctraction tool  DataForge Tomorrow:  DataForge Online, Web Services, More Desktop tools, Java Library
  • 6. __ _ _ / _| | ___ __| | __ _ ___ / /__ _ _ __ ___ _ __ ___ ___ _ __ | |/ _ / _` |/ _` |/ _ / /_/ / _` | '_ ` _ | '_ ` _ / _ '__| _ | | __/ (_| | (_| | __/ __ / (_| | | | | | | | | | | | __/ | __/|_|___|__,_|__, |___/ /_/ __,_|_| |_| |_|_| |_| |_|___|_| |___/  Data Liberation  Unlock from proprietary formats, turn into open data  Produce metadata + convert data into standard ASCII formats  Open <Meta>Data Integration  Generating scripts for loading into stats packages, databases, big data engines, cloud  Use with open systems / environments
  • 8.  Challenges  Read proprietary formats  Data integrity  ASCII optimization  Cross package formatting issues  SQL Database limits (columns, indexes, views)  Performance (Timings, size, memory usage)  …
  • 9.  Key Features  Input: SPSS, Stata, SAS/Syntax+ASCII, ASCII+DDI-C / DDI-L, ASCII+SSS 1.1/2.0, ASCII+StatTransfer  Data Out: Fixed w/optimization, CSV, Delimited  Metadata Out: DDI-C 1.2.2/Nesstar/2.1/2.5, DDI-L 3.1-RP/3.1SU/Colectica, DDI-L 3.2-RP, DDI-L 3.2 SU, Triple-S 1.1/2.0  Summary Stats: Min / Max /Valid/Invalid /Mean, StdDev /Variance / Frequencies, Weighted *, Save in CSV*.  Scripts Generators: SAS, SPSS, Stata, R, Mysql, Oracle*, MSSQL*, HSQL*, Google BigQuery*, HP/Vertica* + dimension/lookup tables, indexes/foreign keys*  Command line interface
  • 10.  Key changes since IASSIST 2013  Desktop User Interface  SAS (Win+DSRead), handling of missing values, DDI 3.2, lots of tweaks  Editions  Freeware  Community: 100 variables / 1,000 obs.  Registered (free): 500 variables / 5,000 obs. (or contact us)  Available today (Candidate Release)  Pro/Enterprise (1Q2014)  No data size limits  Weighted stats, save stats, classification auto-harmonizer, batch mode, add database formats & star schema  Subscription: $29/3-month or $99/year , Perpetual: $279  LavaCore:  Currently internal only, planning for OEM
  • 11.  Future Features (demand driven)  Support additional input/output formats  Additional ID generators  DataWeaver (data transformation)  Produce data subset  Synthetic data generators  Disclosure control  DataForge Online (2014)  Free and pay as you go services  DataForge Services (available today)  Community support in Google groups  MTNA: pay as you go / annual support, custom developments  IDMS: data/metadata management / processing
  • 12.
  • 13. Products SledgeHammer Caelum DataWeaver Asmurex Data/metadata Management (free/paid) Documentation XML Transform (free) Data subsets, derivation, transforms (internal/project) ASCII multi record Extraction (free) DMP Editor CMS Simple editor for Data Management Plan (OSS) Classification Management System (2014Q3 / paid ) SledgeHammer--BI Survey Manager Integration with Business Intelligence (internal/projects) DDI 3.1 Editor for simple projects or training (OSS)
  • 14. Services Expertise Expert Advisory and Technical Assistance • Establishment of corporate data management strategies • Adoption of metadata standards and related best practices • Development of technical specifications for data management infrastructure • Statistical and scientific data management technologies Training Training in and Support for: • Data Documentation Initiative (DDI) • Statistical Data and Metadata Exchange (SDMX) • Generic Statistical Business Process Model (GSBPM) • Generic Statistical Information Model (GSIM) • General data and metadata management support Development Application Development • Statistical and Scientific data management solutions • Solutions include production, archiving, dissemination, analysis • Web-based data delivery solutions (web services, portals, secure environments) • Standards-based data management tools/utilities Technologies Leverage New Technologies • Help introduce new state-of-the-art technologies to the community and customers • Deployment, hosting, and management of cloud-based infrastructures