- 2. What is Data Science? Data Science is, in general terms, the extraction of knowledge from data.
- 3. What is Data Science? Data is increasingly cheap and ubiquitous. We are collecting and analyzing data of unprecedented variety, complexity and scale. At the same time, new technologies are emerging to organize and make sense of this avalanche of data.
- 4. What is Data Science? Data Science is an interdisciplinary subject employing concepts and techniques from mathematics, statistics, computer science and economics. It is used to identify patterns and regularities in data, affecting all aspects of work and society from medicine to marketing to scientific research.
- 5. Who is a Data Scientist? A data scientist is someone who is better at statistics than most software engineers and better at software engineering than most statisticians.
- 6. Who is a Data Scientist? A Data Scientist is a professional with the training and curiosity to make discoveries while swimming in an ocean of data, communicating what they learn and suggesting its implications for new decisions.
- 7. Who is a Data Scientist? They identify and combine rich and potentially incomplete data sources, and bring structure to large quantities of formless data, making analysis possible. They engage decision makers in an ongoing conversation based on the implications of the data for products, processes, and decisions.
- 8. Who is a Data Scientist? ★ A Data Scientist should have solid quantitative and analytic skills: Statistical Modelling, Experimental Design, Bayesian Inference, Machine Learning, Information Theory, Complex Systems.
- 9. Who is a Data Scientist? ★ A Data Scientist should be a good programmer: Scripting (e.g. Python); Statistical Packages (e.g. R); Databases (SQL and NoSQL); MapReduce concepts; Hadoop and Hive/Pig; Computer Science.
- 10. Who is a Data Scientist? In addition, a Data Scientist should: ★ excel at communication and visualization; ★ understand economics and business concepts; ★ be curious and creative.
- 11. Demand for Data Scientists
- 12. Demand for Data Scientists There is a growing demand for data-savvy professionals in businesses, public agencies, and nonprofits. There is a limited supply of professionals who can efficiently work with data at scale. Thus, the salaries for data engineers, data scientists, statisticians, and data analysts have increased rapidly.
- 13. A recent study by the McKinsey Global Institute estimates that there will be four to five million jobs in the U.S. requiring data analysis skills by 2018, and that large numbers of positions will only be filled through training or retraining.
- 14. In a survey of 816 data professionals in 53 countries, O’Reilly Media reports a median annual salary of $98,000 for Data Science professionals. SQL, R, Python and Excel are the top earning skills.
- 15. Data Science in India According to a survey by Gartner: ★ In 2013, the Data Analytics market in India was $1.6 billion, with a growth rate of 8%. ★ By 2018, the market is projected to be $3.7 billion. "For the fourth year in a row, analytics ranks as the No. 1 priority in Gartner's CIO [India] Survey," explains Bhavish Sood, research director at Gartner.
- 16. India is one of the strongest countries in the Data Science marketplace, with clients including Facebook, GE, NASA, Tesco and Merck. It can potentially build a talent pipeline for data scientists, who are virtually non-existent today. India will need 200,000 data scientists in the next few years. A single company, Wipro, already has as many as 8,000 people in analytics functions.
- 17. Data Science in India The median annual salary for a Data Scientist in India is Rs 670,665. The highest paying skills are Python, Machine Learning, Statistical Analysis, Big Data Analytics, and R.
- 18. Bengal Chamber proposes smart and green city for business analytics firms The Bengal Chamber of Commerce and Industry has taken an initiative to set up a smart city for business analytics in West Bengal. The project would involve service providers such as KPMG Advisory Services and PricewaterhouseCoopers, corporate consumers, and educational institutions such as the Indian Institute of Technology Kharagpur, the Indian Statistical Institute, and the Indian Institute of Management Calcutta.
- 20. How can you be a Data Scientist? A Master’s degree is a natural route to becoming a Data Scientist. Massive Open Online Courses (MOOCs) give access to self-learning at a low cost (often free), but leave it to the student to identify a suitable set of courses and tools to round out a coherent skill set. Bootcamps offer students a practical and structured learning environment at a far lower cost than a Master’s degree.
- 21. Master’s Degree
  - Duration: 9 - 20 months
  - Faculty: University Professors
  - Learning: Theory and Assignments
  - Outcome: Degree
  - Projects: Practicum and Internship
  - Placement: University Recruiting
  - Examples: UC Berkeley, NYU, NCSU, IIT+IIM+ISI
  - Tuition: $20,000 - $70,000 (US), ₹20,000,000 (India)
- 22. Self-Learning (MOOCs)
  - Duration: 6 - 18 months (part time)
  - Faculty: University Professors (recorded lectures)
  - Learning: Self-guided
  - Outcome: Certificate
  - Projects: Projects on own time
  - Placement: Self-driven job search
  - Examples: Coursera, Udacity
  - Tuition: Free - $500 (US)
- 23. Bootcamps
  - Duration: 2 - 3 months
  - Faculty: Professors & Data Scientists
  - Learning: Experiential Learning
  - Outcome: Certificate and Portfolio
  - Projects: Built-In Projects
  - Placement: Hiring Day and Placement Assistance
  - Examples: Zipfian, Metis, Data Incubator
  - Tuition: Free - $16,000 (US)
- 24. The Course Data+Science: A First Course is an intensive eight-week program based on the bootcamp model, organized by The Data+Science Initiative. It is designed to teach and train graduates in quantitative fields to take an entry-level position as a data scientist.
- 25. Objectives of the Course Upon graduating, a student will: 1. Have a clear understanding of, and practical experience with, the process of designing, implementing, and communicating the results of a data science project. 2. Understand the landscape of data science tools and their applications, and be prepared to identify and dig into new technologies and algorithms needed for the job at hand.
- 26. Overview Data science gives valuable meaning to large sets of complex and unstructured data. The focus is on concepts and techniques to mine, store, analyse and visualize data. Data science is highly interdisciplinary, drawing from fields such as computer science (algorithms and databases), statistics (hypothesis testing and inference), and artificial intelligence (pattern recognition and machine learning).
- 27. Course Content
  - Data Mining (⅛): identifying data sources; extracting, cleaning and verifying structured and unstructured data
  - Data Storage (¼): structuring, storage and retrieval of data, including big data and NoSQL
  - Data Analysis (½): descriptive and inferential analysis; predictive modelling, risk analysis and decision making
  - Data Visualization (⅛)
- 28. Course Content Graduating students will: 1. Be proficient in statistical concepts and mathematical techniques including correlation functions, inference and hypothesis testing. 2. Be able to make predictive analyses by modelling stochastic processes based on available data. 3. Learn and apply Machine Learning concepts to solve data science problems.
- 29. Course Content 4. Be capable coders in Python and R, including the related packages and toolsets most commonly used in data science. 5. Know the fundamentals of data visualization and have experience creating static and dynamic data visuals using JavaScript and D3.js. 6. Have introductory exposure to big data tools and architecture such as the Hadoop stack, know when these tools are necessary, and be poised to quickly train up and utilize them in a big data project.
- 30. Prerequisites
  - Basic Statistics and Probability: descriptive statistics and distributions
  - Linear Algebra: vectors and matrices
  - Calculus and Differential Equations: basic calculus and finding extrema, ordinary differential equations
  - Programming: basic proficiency in any programming language
- 31. Preferred Subjects
  - Computer Science: algorithms, data structures and databases
  - Advanced Statistics: Bayesian inference and stochastic processes
  - Statistical Mechanics/Information Theory: entropy, information, complexity
  - Economics: supply/demand, game theory
  - Web Development: HTML, CSS and JavaScript
- 32. Eligibility Anyone who meets the prerequisite criteria, as determined by a qualifying exam, is eligible, with preference given to those with knowledge of the preferred subjects. However, we would prefer applicants to have a bachelor’s degree in a quantitative field, such as Engineering, Physics, Mathematics, Statistics, Economics or Computer Applications.
- 33. Course Details The course consists of 24 classes over 8 weeks. Each class (Mondays, Wednesdays, Fridays) is 6 hours in duration (10AM-4PM), including a lunch hour. Morning sessions consist of lectures and discussions, while afternoons are guided programming sessions. In addition, instructors will be available for office hours at scheduled times.
- 34. Course Projects The course is divided into three parts.
  - Part A (Weeks 1-4): daily programming projects executed individually or in groups
  - Part B (Weeks 5-8): weekly projects in groups drawn from the industry
  - Part C (Weeks 9-11, optional): course project in groups with biweekly meetings with instructors
- 35. Benefits Employment: Students will have the skill set and portfolio to find employment as an entry-level data scientist. Such a skill set is in great demand, both domestically and in developed countries. Research: Since Data Science is at the core of academic research, our students, armed with the knowledge, portfolio and recommendations, will find it easier to gain admission to universities, especially abroad.
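
The shares on the Course Content slide and the schedule on the Course Details slide can be combined into a rough per-module time budget. The sketch below rests on two assumptions that the slides do not state explicitly: that the fractions (⅛, ¼, ½, ⅛) apply to the 24 classes, and that each 6-hour class leaves 5 contact hours once the lunch hour is removed. The function name `allocate_classes` is purely illustrative.

```python
from fractions import Fraction

# Figures taken from the Course Details slide.
TOTAL_CLASSES = 24
HOURS_PER_CLASS = 6   # 10AM-4PM
LUNCH_HOURS = 1       # included in the 6 hours

# Shares taken from the Course Content slide.
MODULE_SHARES = {
    "Data Mining": Fraction(1, 8),
    "Data Storage": Fraction(1, 4),
    "Data Analysis": Fraction(1, 2),
    "Data Visualization": Fraction(1, 8),
}


def allocate_classes(shares, total_classes, contact_hours_per_class):
    """Return {module: (classes, contact_hours)} for the given shares."""
    if sum(shares.values()) != 1:
        raise ValueError("module shares must sum to 1")
    return {
        module: (share * total_classes,
                 share * total_classes * contact_hours_per_class)
        for module, share in shares.items()
    }


# Assumption: the lunch hour is not instruction time.
CONTACT_HOURS = HOURS_PER_CLASS - LUNCH_HOURS

schedule = allocate_classes(MODULE_SHARES, TOTAL_CLASSES, CONTACT_HOURS)

for module, (classes, hours) in schedule.items():
    print(f"{module}: {classes} classes, {hours} contact hours")
```

Under these assumptions the split works out to 3, 6, 12 and 3 classes, for 120 contact hours in total. Exact fractions are used instead of floats so that the shares can be checked to sum to exactly 1.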