Agile Data Science


Our experience building a new Data Science graduate program at Istanbul Sehir University, with the key design principles behind it laid out.

Published in: Education, Technology

  1. Agile Data Science. Dr. Ahmet Bulut, Istanbul Sehir University, Istanbul, Turkey
  2. Web ...
     • In the nineties, the Web served many static HTML pages created by a small set of people at select institutions and news agencies.
     • 21st century: the number of contributors and the amount of information have skyrocketed with the rise of platforms that enable rapid collaboration and personal contribution.
     • Web 3.0: MACHINES understanding, generating, and consuming information.
  3. Skills Required!
     • The current environment is awash with data.
     • Skills needed from undergraduates: (i) data analysis, (ii) idea generation, and (iii) hypothesis testing.
     • Raise awareness at the K-12 level of the kinds of skills being forged in undergraduate programs.
     • Demand for these skills will percolate down into K-12.
  4. Skills Gap
     • A recent IBM study highlights that roughly 1/4 of enterprises report having major skill gaps in four pivotal emerging technologies: (i) Mobile Computing, (ii) Cloud Computing, (iii) Social Business, and (iv) Business Analytics.
     • Source: IBM developerWorks and IBM Center for Applied Insights, Tech Trends Study, November 2012.
  5. Skills Mismatch. Academia: Skills Emphasis ? Industry: Skills Need
  6. Our “bridging” solution
     • To connect academia and industry: a core set of classes designed to teach the skills that faculty members, drawing on the years they personally spent in industry, identified as the most important.
  7. Curriculum flavor
     • NO (-) to a Programming Languages class.
     • YES (+) to a broader Systems class.
     • The idea is to teach students how to run a web application on top of a database that may be distributed, either to handle increasing load or to enable rapid data warehousing.
     • The goal is to drop students in the ocean, not in a sandbox environment.
  8. Key design principles
     • (1): Leave little room for bloating the curriculum with unnecessary classes.
     • (2): Bridge the gap between undergraduate and graduate programs.
     • (3): Keep students engaged at all times. Pick a programming language for instruction that is versatile and agile.
  9. Realization
     • (1): Leave little room for bloating the curriculum with unnecessary classes.
     • [Figure labels: ALGORITHMS SYSTEMS ARCHITECTURE MACHINE INTELLIGENCE SOFTWARE]
  10. Realization
      • (2): Bridge the gap between undergraduate and graduate programs.
      • [Figure: Graduate Program (Data Engineering, ...); Undergraduate Program (Programming Practice, ...); label: Dilute]
  11. Realization
      • (3): Keep students engaged at all times. Pick a programming language for instruction that is versatile and agile.
      • Language: Python
  12. Fruits
      • Spring ’13 - Programming Practice Class Projects (project: description):
      • Movie Recommendation System: Apply collaborative filtering learned in class on the Netflix dataset.
      • News Filter: Provide news from multiple news sites in a form that is easy to digest. Use classification and textual properties to categorize data.
      • Tweetpy: Capture the relationship between social media and stock prices. Use the statistics gathered to see whether they can be used to predict the stock price. Use SQLite or Pickle to store data.
  13. Future: Data Science Grad Program
  14. Future: Data Science Grad Program
      • (1) Data Engineering: Information retrieval and data engineering on practical applications.
      • (2) Networks: Graph & Game theoretic analysis of Web, Social Networks, and Sponsored Search Markets.
      • (3) Data Visualization: Techniques to visualize high-dimensional data for insight discovery.
      • (4) Scalable Systems: How to build consumer-facing Web systems that can scale.
      • (5) Big Data Analysis: Tools used for analyzing Big Data.
      • (6) Probabilistic Graphical Networks: Establish relationships between entities and objects for probabilistic inference.
      • (7) Machine Learning: Theory behind well-established classification, regression, and clustering methodologies.
      • (8) Linear Dynamical Systems: Representation of dynamic systems in state space to understand their evolution over time.
      • (9) Optimization: Techniques used to optimize real-world problems with real constraints.
  15. Thank you!
      • Dr. Ahmet Bulut, Department of Computer Science, Istanbul Sehir University, 34660 Istanbul, Turkey
      • Phone: +90 216 559 9089
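
As an illustration of the Movie Recommendation System project on slide 12, here is a minimal sketch of user-based collaborative filtering in Python, the language of instruction named on slide 11. It is not the students' code. The toy ratings, the function names (`cosine_similarity`, `predict`, `recommend`), and the choice of cosine similarity over co-rated movies are all assumptions made for the example; the slides do not say which variant of collaborative filtering was taught, and the Netflix dataset itself is not used here.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {movie: rating}); NOT the Netflix dataset.
RATINGS = {
    "ann": {"Alien": 5, "Brazil": 4, "Casablanca": 1},
    "bob": {"Alien": 4, "Brazil": 5, "Dune": 4},
    "cem": {"Brazil": 1, "Casablanca": 5, "Dune": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity over the movies both users rated; 0.0 if none are shared."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = sqrt(sum(a[m] ** 2 for m in shared))
    norm_b = sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)


def predict(ratings, user, movie):
    """Similarity-weighted average of other users' ratings for `movie`."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, theirs in ratings.items():
        if other == user or movie not in theirs:
            continue
        sim = cosine_similarity(ratings[user], theirs)
        weighted_sum += sim * theirs[movie]
        weight_total += sim
    # None signals that no similar user has rated the movie.
    return weighted_sum / weight_total if weight_total else None


def recommend(ratings, user, top_n=3):
    """Return up to `top_n` unseen movies, highest predicted rating first."""
    seen = set(ratings[user])
    candidates = {m for r in ratings.values() for m in r} - seen
    scored = []
    for movie in candidates:
        score = predict(ratings, user, movie)
        if score is not None:
            scored.append((score, movie))
    return [movie for _, movie in sorted(scored, reverse=True)[:top_n]]


if __name__ == "__main__":
    print(recommend(RATINGS, "ann"))
```

On real data the ratings dictionary would be loaded from storage (slide 12 mentions SQLite or Pickle for the Tweetpy project), and ratings would usually be mean-centered per user before computing similarities; both steps are left out to keep the sketch short.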