  • 1. Data Science: Shankar Radhakrishnan, Cognizant
  • 2. History… • Questions first, data later • Data model first, data processing later • Size first, project second, react over time • Focus on accuracy, assume little • Importance given to completeness and comprehensiveness • Expose raw data to decision makers • Provide insights, but not actionable ones • Bound by constraints (Procurement, Process, Build Insights, Interaction)
  • 3. What’s Changed? • Media for participation are vast • Modes of reach have expanded • Data types are varied and voluminous • Noise is huge, yet accepted • Urgency precedes accuracy • Guidance is better than completeness • Cost to store and process data has fallen (and is still falling) • More ways and means to process data at scale
  • 4. Speaking of Data • Volume - Data at rest • Variety - Data in many forms • Velocity - Data in motion • Veracity - Data in doubt
  • 5. Data Science • “Data Science is the art of turning data into actions.” This is accomplished through the creation of data products that provide actionable information without exposing the underlying data or analytics. • “Scientific study of the creation, validation and transformation of data to create meaning” (http://www.datascienceassn.org/code-of-conduct.html)
  • 6. While we are on definitions… • Data Mining: “Non-trivial process of identifying valid, novel, potentially useful and understandable structures or patterns or models or relationships in data to enable data driven decision making” • Statistics: “Science of learning from data or of making sense out of data”
  • 7. Science of Data Science • Analyze and understand the data that’s available • Find and acquire what more is needed • Discover what’s not known from data • Predict and build “actionable insights” from data • Build data products that have “immediate” business impact • Make it easy for the business to “use” • Help decision making to drive “business value”
  • 8. Data Science Toolkit • Languages: Python, R, Java, TextWrangler, SQL, C, C++ • Libraries: Mahout, NLTK, OpenNLP, GPText, SciPy, Pandas, scikit-learn • Database: Hadoop, Hive, HAWQ, PL/Python, PL/R, PL/Java, Proprietary • Visualization: D3.js, Gephi, Graphviz, R, Tableau, Proprietary
  • 9. Approach, Techniques • Step: Classification, Filtering, Structure, Clustering, Disambiguation, De-duplication, Normalization, Correlation, Prediction • Process: Discover, Reason, Model, Deploy, Visualize, Recommend, Predict, Explore • Technology: Machine Learning, Decision Trees, Bayesian Networks, Logistic Regression, Monte Carlo Methods, Component Analysis, Fuzzy Modeling, Neural Networks, Genetic Algorithms
  • 10. Data Science in Action • Improving User Experience • Multi-device event stream analysis • Intrusion detection and avoidance • Collocation analysis from cell-phone towers • Text Mining, Bandwidth Throttling • Network Performance & Optimization • Mobile User Location Analytics • Customer Churn Prevention • Social Media and Sentiment Analysis • Location-Based Initiatives
  • 11. Thanks!
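Slide 4 describes velocity as "data in motion." As a minimal sketch, assuming values arrive one at a time from some stream, the Python generator below keeps only a fixed-size window of recent values and a running total, so each new reading is handled in constant time instead of rescanning everything seen so far. The name `sliding_window_mean` and the window size are illustrative choices, not something the slides specify.

```python
from collections import deque
from typing import Deque, Iterable, Iterator


def sliding_window_mean(stream: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the most recent `window` values each time a value arrives."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent: Deque[float] = deque()
    total = 0.0
    for value in stream:
        recent.append(value)
        total += value
        if len(recent) > window:
            # Evict the oldest value so memory stays bounded by `window`.
            total -= recent.popleft()
        yield total / len(recent)


# Usage: readings could come from a socket, queue, or log tail; a list stands in here.
print(list(sliding_window_mean([1, 2, 3, 4, 5], window=3)))
# prints [1.0, 1.5, 2.0, 3.0, 4.0]
```

A running float total can drift slightly over very long streams; periodically recomputing `sum(recent)` is one simple way to bound that error.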
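Slide 6 defines data mining as identifying patterns or relationships in data. One concrete instance, chosen here for illustration since the slides name no specific algorithm, is counting which items occur together across transactions, in the spirit of the support-counting step of association-rule mining (e.g. Apriori). The sketch assumes each transaction is a list of item names; `frequent_pairs` and `min_support` are hypothetical names.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


def frequent_pairs(
    transactions: Iterable[List[str]], min_support: int
) -> Dict[Tuple[str, str], int]:
    """Return item pairs that occur together in at least `min_support` transactions."""
    counts: Counter = Counter()
    for basket in transactions:
        # Deduplicate and sort so each pair is counted once per basket, in one canonical order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["milk", "butter"],
    ["bread", "milk", "eggs"],
]
print(frequent_pairs(baskets, min_support=2))
# prints {('bread', 'milk'): 3, ('butter', 'milk'): 2}
```

Enumerating every pair grows quadratically with basket size; Apriori-style pruning (only pairing items that are individually frequent) is the usual refinement for larger data.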
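Slide 9 lists logistic regression among the techniques. As a minimal, dependency-free sketch (in practice a library such as scikit-learn would normally be used), the code below fits a logistic regression by batch gradient descent on the log-loss. The learning rate, epoch count, function names, and toy data are all assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(
    xs: Sequence[Sequence[float]], ys: Sequence[int], lr: float = 0.1, epochs: int = 1000
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the average log-loss."""
    n_features = len(xs[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # derivative of log-loss with respect to the linear score
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * gw / n for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probability of the positive class for one feature vector."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Usage with hypothetical one-feature data: larger x means class 1.
xs = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic_regression(xs, ys)
print(predict_proba(w, b, [0.0]), predict_proba(w, b, [5.0]))
```

Scaling features to similar ranges before training usually lets gradient descent converge in far fewer epochs.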
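Slide 10 mentions social media and sentiment analysis. The simplest form is lexicon-based scoring: count positive words minus negative words. The sketch below assumes a tiny, hand-picked lexicon (`POSITIVE` and `NEGATIVE` are hypothetical and far too small for real use) and a plain regex tokenizer; production systems typically use larger curated lexicons or trained classifiers.

```python
import re

# Hypothetical, hand-picked lexicon for illustration only.
POSITIVE = {"good", "great", "love", "fast", "excellent"}
NEGATIVE = {"bad", "slow", "hate", "dropped", "terrible"}


def sentiment_score(text: str) -> int:
    """Return (#positive words - #negative words) found in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Booleans subtract as 1/0, so each token contributes +1, -1, or 0.
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)


def sentiment_label(text: str) -> str:
    """Map the score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment_label("Great coverage, love the fast data plan"))  # prints positive
print(sentiment_label("Calls dropped again, terrible service"))    # prints negative
print(sentiment_label("I switched plans yesterday"))               # prints neutral
```

Word counting ignores negation ("not good") and sarcasm, which is one reason the slides' "noise is huge, yet accepted" trade-off applies here.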