What is Data Science

Usage Rights

© All Rights Reserved
  • Facebook friend connections worldwide, a network diagram of the Enron email set, a comparison of similar gene sequences between humans, chimps, and macaques
  • HW, FW, MW, SW: Hardware, Firmware, Middleware, Software

Presentation Transcript

  • Jeffrey Stanton
    School of Information Studies
    Syracuse University
    What is Data Science?
  • BIG Data
  • Kilo, Mega, Giga, Tera, Peta, Exa, Zetta = 10^21 bytes
    Over 95% of the digital universe is "unstructured data", meaning its content can't be truly represented by its field in a record, such as name, address, or date of last transaction. In organizations, unstructured data accounts for more than 80% of all information.
    Source: IDC
    …An organization employing 1,000 knowledge workers loses $5.7 million annually just in time wasted having to reformat information as they move among applications. Not finding information costs that same organization an additional $5.3 million a year.
    Source: IDC
  • Available data on a scale millions of times larger than 20 years ago: customer transactions; environmental sensor outputs; genetic and epigenetic sequences; web documents; digital images and audio
    Heterogeneous data sets, with different representations and formats; mixtures of structured and unstructured data; some, little, or no metadata; distributed across systems
    Chaotic information life cycle, where little time and effort is spent deciding what should be kept and what can be discarded
    Diverse and/or legacy infrastructure: mainframes running COBOL connected over high-speed networks to sensor arrays running Linux
    Why Data Science?
  • How will global climate change affect sea levels in major coastal metropolitan areas worldwide?
    Does genetic screening reduce cancer mortality for adults between the ages of 50 and 59?
    What gene sequences in cereal grains are associated with greater crop yields in arid environments?
    How can we reduce false positives in automated airline baggage scans without reducing accuracy?
    What Internet data can be mined as predictive of firm creation among startups that provide new jobs?
    Critical Questions
  • Water sustainability
    Climate analysis and prediction
    Energy through fusion
    Hazard analysis and management
    Cancer detection and therapy
    Drug design and development
    Advanced materials analysis
    New combustion systems
    Virtual product design
    In silico semiconductor design
    “Big Data” Provides Answers
    NSF Advisory Committee for Cyberinfrastructure, Taskforce for Grand Challenges, Final Report, March 2011. http://www.nsf.gov/od/oci/taskforces/TaskForceReport_GrandChallenges.pdf
  • NSF Advisory Committee for Cyberinfrastructure, Taskforce for Grand Challenges, Final Report, March 2011. http://www.nsf.gov/od/oci/taskforces/TaskForceReport_GrandChallenges.pdf
    “All grand challenges face barriers due to challenges in software, in data management and visualization, and in coordinating the work of diverse communities that must work together to develop new models and algorithms, and to evaluate outputs as a basis for critical decisions.”
  • Knowledge Development
    for Industry, Education, Government, Research
    Domain Experts
    Infrastructure Professionals
    Information Organization & Visualization
    Expertise in specific subject areas
    Rapid pace of IT development
    Limited opportunity to master technology skills
    Limited expertise in domain areas
    Data Scientists
    Information Analysis
    Proliferation of big data & new technology
    Specialized knowledge of HW, FW, MW, SW
    Digital Curation
    Need for knowledge and information managers
    Communication challenges
    Data Scientists: Transforming Data Into Decisions
  • A Definition of a Data Scientist
    A data scientist uses deep expertise in the management, transformation, and analysis of large, heterogeneous data sets to:
    Help infrastructure experts with the architecture of hardware and software to manage big data challenges
    Help domain experts and decision makers reduce the data deluge into usable knowledge, visualizations, and presentations
    Help institutions and organizations control and curate data throughout the information lifecycle
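The "BIG Data" slide's ladder of prefixes, running from Kilo up to Zetta = 10^21 bytes, is easier to grasp as a short calculation. The sketch below assumes decimal (SI) prefixes, where each step is a factor of 1,000. That is the convention under which a zettabyte equals 10^21 bytes. The names `PREFIXES`, `prefix_table`, and `ratio` are illustrative only and do not come from the presentation.

```python
# Minimal sketch: the decimal (SI) byte prefixes listed on the "BIG Data" slide.
# Assumption: each step is a factor of 1,000 (10^3), the convention under which
# 1 zettabyte = 10^21 bytes. Binary (1,024-step) prefixes are not modeled here.

PREFIXES = ["Kilo", "Mega", "Giga", "Tera", "Peta", "Exa", "Zetta"]


def prefix_table():
    """Map each prefix name to (power of ten, size in bytes)."""
    table = {}
    for step, name in enumerate(PREFIXES, start=1):
        exponent = 3 * step  # Kilo = 10^3, Mega = 10^6, ..., Zetta = 10^21
        table[name] = (exponent, 10 ** exponent)
    return table


def ratio(larger, smaller):
    """Return how many units of `smaller` fit into one unit of `larger`."""
    table = prefix_table()
    return table[larger][1] // table[smaller][1]


if __name__ == "__main__":
    for name, (exponent, size) in prefix_table().items():
        print(f"1 {name}byte = 10^{exponent} bytes = {size:,} bytes")
    # A zettabyte holds a billion (10^9) terabytes.
    print(ratio("Zetta", "Tera"))
```

Under this assumption, each prefix is 1,000 times the one before it. Seven steps from Kilo therefore reach 10^21 bytes. Storage vendors and operating systems sometimes use 1,024-byte steps instead, so figures quoted in bytes can differ slightly depending on the convention.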