MTNA DataForge

IASSIST 2013 presentation of DataForge.

Transcript

  • 1. IASSIST 2013
  • 2. The <Meta>Data Dance
  • 3. The <Meta/>Data Dance
  • 4. The <Meta>Data Dance
  • 5. Tools to facilitate statistical/scientific data management
      • Complement existing tools (not replace them)
      • Facilitate adoption of standards and promote best practices
      • Alleviate the need to master the complexities of metadata standards and technologies
      • Vision: web-based services (cloud users), standalone desktop/server products (agencies/enclaves), libraries (developers)
      • Broad target audience: data producers, managers, researchers, users
      • DataForge today (lean model): simple desktop-based command-line tools
          • SledgeHammer: data/metadata transformation
          • LavaCore: Java library
          • Caelum: reporting/publication
      • DataForge tomorrow: DataForge Online, web services, desktop with UI, Java library
  • 6. Data Liberation
      • Unlock from proprietary formats, turn into open data
      • Produce metadata + convert data into standard ASCII formats
    Open <Meta>Data Integration
      • Generate scripts for loading into stats packages, databases, big data engines, cloud
      • Use with open systems/environments
    [ASCII-art banner]
  • 7. triple-s, syntax, script
  • 8. Challenges
      • Read proprietary formats
      • Data integrity
      • ASCII optimization
      • Cross-package formatting issues
      • SQL database limits (columns, indexes, views)
      • Performance (timings, size, memory usage)
      • …
  • 9. Key Features
      • Input: SPSS, Stata, ASCII + SAS syntax, ASCII + DDI-C / DDI-L, ASCII + SSS 1.1/2.0, ASCII + StatTransfer
      • Data out: fixed-width with optimization, CSV, delimited
      • Metadata out: DDI-C 1.2.2/Nesstar/2.1/2.5, DDI-L 3.1-RP/3.1-SU/Colectica, Triple-S 1.1/2.0
      • Summary stats: min, max, valid, invalid, mean, StdDev, variance, frequencies, weighted*, save in CSV*
      • Script generators: SAS, SPSS, Stata, R, MySQL, Oracle*, MS-SQL*, HSQL*, Google BigQuery*, HP/Vertica*, plus dimension/lookup tables, indexes/foreign keys*
      • See User's Guide at http://goo.gl/9xukB
  • 10. Editions
      • Freeware
          • Anonymous: 100 variables / 1,000 obs.
          • Registered: 500 variables / 5,000 obs. (or contact us)
      • Pro/Enterprise: 4Q2013
          • No limits, all features, disable usage report
          • Pricing/licensing t.b.d.
          • In private beta (contact us for info)
      • LavaCore: currently internal only, planning for OEM
  • 11. Future Features (demand driven)
      • Desktop UI
      • Support additional input/output formats (SAS, cloud)
      • Support missing values
      • Additional ID generators
      • Compute subsets
      • Synthetic data generators
      • Disclosure control ???
      • DataForge Online (2014): free and pay-as-you-go services
      • DataForge Services (available today)
          • Community support in Google Groups
          • MTNA: tech support, custom developments
          • IDMS: data/metadata management and processing
  • 12. MTNA Project Stories
      • Generated DDI for American National Election Studies (ANES)
      • Loaded US General Social Survey (1972-2010) surveys into BigQuery (5K+ variables, 55K records, 2.2Gb, query in seconds)
      • Generated MS-SQL and loaded data for KUSP Translating Research in Elder Care (TREC) surveys (U. Alberta, Canada)
      • Integration in NSF / NORC SED Manager (LavaCore)
      • Integration in Canada RDC Dataset Builder (LavaCore)
      • …
  • 13. Caelum: a constellation in the southern sky; "chisel" in Latin. Formerly known as Caelum Scalptorium, "the engraver's chisel".
      • Rationale: I have DDI, then what? Not everyone is an XML/technology expert.
      • Use cases: produce HTML/PDF reports, quality assurance, XML conversion, etc.
      • How it works: a command-line tool that wraps standard XML and other transformation technologies
      • Can be used with any XML
      • Comes with demo transforms, with the objective of fostering community contributions
      • Download at http://www.openmetadata.org/dataforge/caelum
  • 14. DEMO: http://www.openmetadata.org/dataforge
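
Slides 6 and 9 mention generating scripts that load data into databases such as MySQL, together with dimension/lookup tables. The slides do not show what those scripts look like, so the following is only a minimal sketch of the general idea: deriving `CREATE TABLE` statements from variable-level metadata. The `Variable` class, the function names, and the table naming scheme are all hypothetical and are not taken from SledgeHammer.

```python
# Hypothetical sketch of metadata-driven script generation.
# Nothing here reflects SledgeHammer's actual code or output format.
from dataclasses import dataclass, field


@dataclass
class Variable:
    """Minimal variable-level metadata (field names are assumptions)."""
    name: str
    kind: str                      # "int", "float", or "str"
    width: int = 0                 # maximum character width, used for strings
    categories: dict = field(default_factory=dict)  # integer code -> label


SQL_TYPES = {"int": "INT", "float": "DOUBLE"}


def column_type(var):
    """Map a variable to a MySQL column type."""
    if var.kind == "str":
        return f"VARCHAR({max(var.width, 1)})"
    return SQL_TYPES[var.kind]


def create_table(table, variables):
    """Build one CREATE TABLE statement for the data file."""
    cols = ",\n".join(f"  `{v.name}` {column_type(v)}" for v in variables)
    return f"CREATE TABLE `{table}` (\n{cols}\n);"


def lookup_tables(table, variables):
    """Build one lookup table per coded variable (integer codes assumed)."""
    statements = []
    for v in variables:
        if not v.categories:
            continue
        name = f"{table}_{v.name}_lookup"
        statements.append(
            f"CREATE TABLE `{name}` (\n"
            f"  `code` {column_type(v)} PRIMARY KEY,\n"
            f"  `label` VARCHAR(255)\n);"
        )
        for code, label in sorted(v.categories.items()):
            safe = label.replace("'", "''")  # escape single quotes for SQL
            statements.append(f"INSERT INTO `{name}` VALUES ({code}, '{safe}');")
    return statements
```

For example, `create_table("survey", [Variable("age", "int"), Variable("name", "str", width=20)])` returns a statement with an `INT` column and a `VARCHAR(20)` column. Keeping the metadata separate from the SQL dialect is what would let a tool target several databases from the same description.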

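
Slide 9 lists the summary statistics produced per variable: min, max, valid, invalid, mean, StdDev, variance, frequencies, and weighted variants. As a rough illustration of what such a per-variable summary involves, here is a small sketch. It assumes, without support from the slides, that missing values are represented as `None`, that the variance is the weighted population variance, and that the weights do not sum to zero; the function name `summarize` is hypothetical.

```python
import math
from collections import Counter


def summarize(values, weights=None, missing=(None,)):
    """Summary statistics for one numeric variable.

    Assumptions (not from the slides): entries found in `missing` count as
    invalid, and the variance is the weighted population variance.
    """
    if weights is None:
        weights = [1.0] * len(values)
    # Keep only valid observations, paired with their weights.
    pairs = [(v, w) for v, w in zip(values, weights) if v not in missing]
    invalid = len(values) - len(pairs)
    if not pairs:
        return {"valid": 0, "invalid": invalid}

    total_w = sum(w for _, w in pairs)
    mean = sum(v * w for v, w in pairs) / total_w
    variance = sum(w * (v - mean) ** 2 for v, w in pairs) / total_w

    # Weighted frequency of each distinct valid value.
    freq = Counter()
    for v, w in pairs:
        freq[v] += w

    return {
        "valid": len(pairs),
        "invalid": invalid,
        "min": min(v for v, _ in pairs),
        "max": max(v for v, _ in pairs),
        "mean": mean,
        "variance": variance,
        "stddev": math.sqrt(variance),
        "frequencies": dict(freq),
    }
```

Calling `summarize([1, 2, 2, None, 5])` reports 4 valid and 1 invalid observations with a mean of 2.5. Passing `weights` switches every statistic except min, max, and the counts to its weighted form.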