Data science

1,927 views
1,867 views

Published on

Published in: Technology, Education
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
1,927
On SlideShare
0
From Embeds
0
Number of Embeds
1,219
Actions
Shares
0
Downloads
4
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Data science

  1. 1. Data Science Shankar Radhakrishnan Cognizant
  2. 2. History… • Questions first, data later • Data model first, data processing later • Size first, project second, react overtime • Focus on accuracy, assume little • Importance to completeness and comprehensiveness • Expose raw data to decision makers • Provide insights but those that are not actionable • Bound by constraints (Procurement, Process, Build Insights, Interaction)
  3. 3. What’s Changed ? • Medium to participate is vast • Mode to reach expanded • Data types are vast and voluminous • Noise is huge, yet accepted • Urgency precedes accuracy • Guidance is better than completeness • Cost to store and process has fallen (and still falling) • More ways and means to process data at scale
  4. 4. Speaking of Data • Volume - Data at rest • Variety - Data in many forms • Velocity - Data in motion • Veracity - Data in doubt
  5. 5. Data Science “ Data Science is the art of turning data into actions ” This is accomplished through creation of data products, that provide actionable information
 without exposing underlying data or analytics “ Scientific study of the creation, validation and transformation of data to create meaning ” http://www.datascienceassn.org/code-of-conduct.html
  6. 6. While we are on definitions… Data Mining “ Non-trivial process of identifying valid, novel, potentially useful and understandable structures or patterns or models or relationships in data to enable data driven decision making ” Statistics “ Science of learning from data or of 
 making sense out of data ”
  7. 7. Science of Data Science • Analyze and understand data that’s available • Find and acquire what more is needed • Discover what’s not known from data • Predict and build “actionable insights” from data • Build data products that has “immediate” business impact • Make it easy for business to “use” • Help decision making to drive “business value”
  8. 8. Data Science Toolkit Python R Java Textwrangler SQL C, C++ Mahout NLTK OpenNLP GPText SciPy Pandas scikit-leam Hadoop Hive HAWQ PL/Python PL/R PL/Java Proprietary D3.js Gephi Graphviz R Tableau Proprietary Languages Libraries Database Visualization
  9. 9. Approach, Techniques • Classification • Filtering • Structure • Clustering • Disambiguation • De-duplication • Normalization • Correlation • Prediction • Discover • Reason • Model • Deploy • Visualize • Recommend • Predict • Explore • Machine Learning • Decision Trees • Bayesian Networks • Logistic Regression • Monte Carlo Methods • Component Analysis • Fuzzy Modeling • Neural Networks • Genetic Algorithms Step Process Technology
  10. 10. Data Science In Action • Improving User Experience • Multi-device event stream analysis • Intrusion detection, avoidance • Collocation analysis from 
 cell-phone towers • Text Mining, Bandwidth Throttling • Network Performance & Optimization • Mobile User Location Analytics • Customer Churn Prevention • Social Media and Sentiment Analysis • Location Based Initiatives
  11. 11. Thanks !

×