MTNA DataForge

IASSIST 2013 presentation of DataForge.


  1. IASSIST 2013
  2. The <Meta>Data Dance
  3. The <Meta/>Data Dance
  4. The <Meta>Data Dance
  5. Tools to facilitate statistical/scientific data management
     • Complement existing tools (not replace)
     • Facilitate adoption of standards and promote best practices
     • Alleviate the need to master complexities of metadata standards, technologies
     • Vision: web-based services (cloud users), standalone desktop/server products (agencies/enclaves), libraries (developers)
     • Broad target audience: Data Producers / Managers / Researchers / Users
     • DataForge Today (lean model): simple desktop-based command-line tools
       • SledgeHammer - Data/metadata transformation
       • LavaCore - Java library
       • Caelum - Reporting / Publication
     • DataForge Tomorrow: DataForge Online, Web Services, Desktop w/ UI, Java Library
  6. Data Liberation
     • Unlock from proprietary formats, turn into open data
     • Produce metadata + convert data into standard ASCII formats
     Open <Meta>Data
     Integration
     • Generating scripts for loading into stats packages, databases, big data engines, cloud
     • Use with open systems / environments
  7. triple-s, syntax, script
  8. Challenges
     • Read proprietary formats
     • Data integrity
     • ASCII optimization
     • Cross-package formatting issues
     • SQL database limits (columns, indexes, views)
     • Performance (timings, size, memory usage)
     • …
  9. Key Features
     • Input: SPSS, Stata, ASCII + SAS syntax, ASCII + DDI-C / DDI-L, ASCII + SSS 1.1/2.0, ASCII + StatTransfer
     • Data Out: Fixed w/ optimization, CSV, Delimited
     • Metadata Out: DDI-C 1.2.2/Nesstar/2.1/2.5, DDI-L 3.1-RP/3.1-SU/Colectica, Triple-S 1.1/2.0
     • Summary Stats: Min / Max / Valid / Invalid / Mean, StdDev / Variance / Frequencies, Weighted*, Save in CSV*
     • Script Generators: SAS, SPSS, Stata, R, MySQL, Oracle*, MS-SQL*, HSQL*, Google BigQuery*, HP/Vertica*; + dimension/lookup tables, indexes/foreign keys*
     • See User's Guide at http://goo.gl/9xukB
  10. Editions
     • Freeware
       • Anonymous: 100 variables / 1,000 obs.
       • Registered: 500 variables / 5,000 obs. (or contact us)
     • Pro/Enterprise: 4Q2013
       • No limits, all features, disable usage report
       • Pricing/licensing t.b.d.
       • In private beta (contact us for info)
     • LavaCore: currently internal only, planning for OEM
  11. Future Features (demand driven)
     • Desktop UI
     • Support additional input/output formats (SAS, cloud)
     • Support missing values
     • Additional ID generators
     • Compute subsets
     • Synthetic data generators
     • Disclosure control
     • ???
     DataForge Online (2014)
     • Free and pay-as-you-go services
     DataForge Services (available today)
     • Community support in Google Groups
     • MTNA: tech support, custom developments
     • IDMS: data/metadata management / processing
  12. MTNA Project Stories
     • Generated DDI for American National Election Studies (ANES)
     • Loaded US General Social Survey (1972-2010) surveys into BigQuery (5K+ variables, 55K records, 2.2 GB, query in seconds)
     • Generated MS-SQL and loaded data for KUSP Translating Research in Elder Care (TREC) surveys (U. Alberta, Canada)
     • Integration in NSF / NORC SED Manager (LavaCore)
     • Integration in Canada RDC Dataset Builder (LavaCore)
     • …
  13. Caelum: a constellation in the southern sky, "chisel" in Latin. Formerly known as Caelum Scalptorium, "the engraver's chisel".
     • Rationale: I have DDI, then what? Not everyone is an XML/technology expert.
     • Use cases: produce HTML/PDF reports, quality assurance, XML conversion, etc.
     • How does it work: a command-line tool that wraps standard XML and other transformation technologies
     • Can be used with any XML
     • Comes with demo transforms, with the objective of fostering community contributions
     • Download at http://www.openmetadata.org/dataforge/caelum
  14. DEMO: http://www.openmetadata.org/dataforge
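
The summary statistics listed under Key Features (Min, Max, Valid, Invalid, Mean, StdDev, Variance, Frequencies) can be illustrated with a minimal Python sketch. This is not SledgeHammer's implementation: the function names `to_number` and `summarize` are hypothetical, any value that does not parse as a number is assumed to count as invalid, and the sample (n - 1) variance is assumed because the slides do not say which estimator the tool uses.

```python
import statistics
from collections import Counter


def to_number(raw):
    """Return raw as a float, or None if it cannot be parsed (assumed 'invalid')."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None


def summarize(raw_values):
    """Compute per-variable summary statistics for one column of raw values."""
    numbers = [to_number(v) for v in raw_values]
    valid = [n for n in numbers if n is not None]
    stats = {
        "valid": len(valid),
        "invalid": len(numbers) - len(valid),
        "frequencies": dict(Counter(valid)),
    }
    if valid:
        stats["min"] = min(valid)
        stats["max"] = max(valid)
        stats["mean"] = statistics.mean(valid)
    if len(valid) > 1:
        # Assumption: sample variance and standard deviation (n - 1 denominator).
        stats["variance"] = statistics.variance(valid)
        stats["stddev"] = statistics.stdev(valid)
    return stats


if __name__ == "__main__":
    # One column as it might be read from an ASCII/CSV export; "" and "x" are invalid.
    column = ["1", "2", "2", "", "3", "x"]
    for name, value in summarize(column).items():
        print(f"{name}: {value}")
```

Weighted statistics and CSV export, which the slide marks with an asterisk, are left out of this sketch.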