Towards a Big Data Taxonomy
Bill Mandrick, PhD
Data Tactics
Version 26_August_2013
Scientific Taxonomies Represent
• Types of Processes
• Types of Objects
– Physical Objects
– Information Artifacts
• Types...
Big Data Taxonomy
• Big Data Related Processes
• Big Data Characteristics
• Big Data Information Artifacts
• Big Data Info...
Relations Between Processes
• Processes A <relation> Processes B
– Complex Process <has part> Sub-Process
– Sub-Process <p...
Information Artifact Lifecycle Processes
• Collecting
• Curating
• Representing
• Storing
– Cluster Storing
• Managing
– P...
Big Data Processes
6
Big Data Processes can be
decomposed and related to
other (sub)processes
…as well as to their outputs...
Relating Processes to Products
7
Big Data Information Artifacts
8
9
10
Information Content Entities
11
Use Case
Data Characteristics
12
Information Bearers
13
Partial Taxonomy
14
Human Genome Data
15
Terms from Human Genome Data Use Case
Use Case Term:
Genomic Measurements
Reference Materials
Reference Data
Reference Met...
Information Artifacts:
Human Genome Data Measurement Result
Characterization (Data Characterization, IA or ICE)
Whole Huma...
Genomic Research Organizations
18
Instances
DNA Data Sets
19
Instances
DNA Organizational Roles
20Instances
Agent Roles
21
DNA Visualization
22
Instances
Conclusion
• This method can be done for any part of the
Big Data Taxonomy
• Need SME input for various areas/domains
• Ne...
Upcoming SlideShare
Loading in...5
×

Big Data Taxonomy 8/26/2013

259

Published on

Published in: Technology, Education
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
259
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
7
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Transcript of "Big Data Taxonomy 8/26/2013"

  1. 1. Towards a Big Data Taxonomy Bill Mandrick, PhD Data Tactics Version 26_August_2013
  2. 2. Scientific Taxonomies Represent • Types of Processes • Types of Objects – Physical Objects – Information Artifacts • Types of Characteristics – Qualities – Roles • Relationships – Between Processes – Between Objects – Between Characteristics 2
  3. 3. Big Data Taxonomy • Big Data Related Processes • Big Data Characteristics • Big Data Information Artifacts • Big Data Information Bearers • Relationships between Big Data Elements • Mapping Instances to the Taxonomy • Creating Situational Awareness 3
  4. 4. Relations Between Processes • Processes A <relation> Processes B – Complex Process <has part> Sub-Process – Sub-Process <part of> Complex Process – Process A <precedes> Process B – Process A <follows> Process B Examples: Data Curation Process <has part> Data Selection Process Data Curation Process <has part> Data Collection Process Data Curation Process <has part> Data Archiving Process 4
  5. 5. Information Artifact Lifecycle Processes • Collecting • Curating • Representing • Storing – Cluster Storing • Managing – Processing • Distributed Processing – Map Reduce • Analyzing – Data Mining – Causal Analysis – Probabilistic Analysis – Correlation Analysis • Data Collection Process • Data Curation Process • Data Representation Process • Data Storing Process – Cluster Storing Process • Data Management Process – Processing • Distributed Data Process – Map Reduce Process • Data Analytics Process – Data Mining Process – Causal Analysis Process – Probabilistic Analysis Process – Correlation Analysis Process Common Labels Taxonomy Labels 5
  6. 6. Big Data Processes 6 Big Data Processes can be decomposed and related to other (sub)processes …as well as to their outputs (Information Artifacts).
  7. 7. Relating Processes to Products 7
  8. 8. Big Data Information Artifacts 8
  9. 9. 9
  10. 10. 10
  11. 11. Information Content Entities 11 Use Case
  12. 12. Data Characteristics 12
  13. 13. Information Bearers 13
  14. 14. Partial Taxonomy 14
  15. 15. Human Genome Data 15
  16. 16. Terms from Human Genome Data Use Case Use Case Term: Genomic Measurements Reference Materials Reference Data Reference Methods Assess Performance Genome Sequencing Integrate Data Sequencing Technologies Sequencing Methods Characterization Whole Human Genomes Assess Performance Genome Sequencing Run Computer System Storage Networking Processing Software Open Source Sequencing Bioinformatics Software Data Source Sequencer Volume Variety Variability Veracity Visualization Data Quality Data Types Data Analytics Taxonomical Term: Genomic Measurement Result (Measurement Result) Reference Material Role Reference Data Role Reference Method Performance Assessment Process Genome Sequencing Process Data Integration Process Data Sequencing Technology (Tool) Sequencing Method (Process) Characterization (Data Characterization, IA or ICE) Whole Human Genome Characterization (IA or ICE?) Performance Assessment Process Genome Sequencing Run Computer System Data Storage Process Computer Networking Process Data Processing Process Software (IAO placement?) Bioinformatics Sequencing Software Data Source Role Sequencer Data Volume (Characteristic) Data Variety (Characteristic) Data Variability (Characteristic) Data Veracity (Characteristic) Data Visualization Process Data Quality (Characteristic) Data Type Data Analytics Process 16
  17. 17. Information Artifacts: Human Genome Data Measurement Result Characterization (Data Characterization, IA or ICE) Whole Human Genome Characterization (IA or ICE?) Performance Assessment Genome Sequence Software (IAO placement?) Data Visualization Processes: Human Genome Data Measurement Process Reference Method Performance Assessment Process Genome Sequencing Process Data Integration Process Sequencing Method (Process) Data Characterization Process Performance Assessment Process Genome Sequencing Run Data Storage Process Computer Networking Process Data Processing Process Data Visualization Process Data Analytics Process Roles and Characteristics: Reference Material Role Reference Data Role Data Source Role Data Volume (Characteristic) Data Variety (Characteristic) Data Variability (Characteristic) Data Veracity (Characteristic) Data Visualization Process Data Quality (Characteristic) Artifacts/Tools: Data Sequencing Technology (Tool) Computer System Computer Network Software (IAO placement?) Bioinformatics Sequencing Software Sequencer 17 Terms from Human Genome Data Use Case
  18. 18. Genomic Research Organizations 18 Instances
  19. 19. DNA Data Sets 19 Instances
  20. 20. DNA Organizational Roles 20Instances
  21. 21. Agent Roles 21
  22. 22. DNA Visualization 22 Instances
  23. 23. Conclusion • This method can be done for any part of the Big Data Taxonomy • Need SME input for various areas/domains • Need to add definitions in owl • Need to expand set of standardized relations • Link instances to the taxonomy (e.g. actual data sets, organizations, etc.) 23
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×