How to build a data science project in a corporate setting, by Soraya Christi... (WiMLDSMontreal)
"Applied data science in the industry: How to build a data science project in a corporate setting - best practices and a real-world example"
By Soraya Christina, Senior Data Scientist at Morgan Stanley
Abstract:
- Which platforms/technology to use for your analytics project and why? (Spark, Hadoop, vendor products, open source, Python, Scala, etc.)
- How to build your data science flow and what to avoid? (Occam's razor, testable and structured flow)
- How to present results in a way business stakeholders understand? (Making complex concepts easy for business lines to understand)
- A real-world example of real-time failure prediction using Spark streaming and ML components.
The purpose of this talk is to present the challenges and solutions when building data science projects in a corporate environment. Generating insights for better business decision making is what drives data science projects. But working side by side with the business, building a reliable flow, and properly communicating results and key elements are more than crucial: they are what will guarantee the success of your data science projects.
10 Tips for women to build a career in data science (Carol Hargreaves)
This presentation highlights the 10 things women should focus on when building a career in Data Science. Starting with the business question is key. Talking to business users, business managers, and stakeholders to understand the business question and how the results will affect the different employee roles is most important. Next is using only the data relevant to the business problem. After that, we should have good evaluation methods to ensure the analytical solution is sound. Last but not least, show how the analytical results and models affect the business in terms of revenue, profitability, and operational efficiency.
Is Agile Data Science just two buzzwords put together? I argue that agile is a very practical and applicable methodology that works well in the real world for all sorts of Analytics and Data Science workflows.
http://theinnovationenterprise.com/summits/digital-web-analytics-summit-london-2015/schedule
Analytical Skills, Tools and Attitudes 2013 Survey, lavastorm analytics (jjoseph100)
Survey of 425 analytic professionals (those who are making big data and analytics work within organizations) to see whether they have the skills needed to push analytics further and/or to identify the skills most needed and how people are developing them.
Data science is not Software Development and how Experiment Management can ma... (Jakub Czakon)
Working on data science projects that are run as if they were software development can sometimes feel like trying to fit a square peg in a round hole. In this talk, I will explain why that happens and what people do to try to fix it. Lately, in the context of machine learning, the concept of experiment management, which treats ML experiments as first-class citizens, has been gaining a lot of traction. I will discuss what it is, what its benefits are, and how you can apply it in your work to run your projects more efficiently.
Part 1 of the Cause and Effect workshop. This class will introduce the use of C&E in business problem-solving and the use of tools like the Fishbone (or Ishikawa) diagram.
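The following is a rough illustration of treating experiments as first-class records. It is not the approach from the talk. Each run's parameters and metrics are stored in an append-only JSON Lines file, and the best run can be retrieved later. The file name, the `log_run`/`load_runs`/`best_run` helpers, and the record fields are all hypothetical. Dedicated experiment-management tools add much more, such as artifact storage and comparison UIs.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path


def log_run(log_path: Path, params: dict, metrics: dict) -> str:
    """Append one experiment (parameters + metrics) as a JSON line and return its id."""
    run_id = uuid.uuid4().hex[:8]
    record = {"run_id": run_id, "timestamp": time.time(),
              "params": params, "metrics": metrics}
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return run_id


def load_runs(log_path: Path) -> list:
    """Read every logged run back so experiments can be compared later."""
    with log_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def best_run(runs: list, metric: str, higher_is_better: bool = True) -> dict:
    """Return the run with the best value of `metric`."""
    choose = max if higher_is_better else min
    return choose(runs, key=lambda run: run["metrics"][metric])


if __name__ == "__main__":
    log = Path(tempfile.mkdtemp()) / "experiments.jsonl"
    # Hypothetical runs; the metric values are placeholders, not real results.
    log_run(log, {"model": "logreg", "C": 1.0}, {"val_auc": 0.71})
    log_run(log, {"model": "logreg", "C": 0.1}, {"val_auc": 0.74})
    log_run(log, {"model": "tree", "max_depth": 5}, {"val_auc": 0.69})
    print(best_run(load_runs(log), "val_auc")["params"])
```

Every run is kept as its own record rather than overwritten. This means results can later be compared, reproduced, or audited, which is roughly what treating experiments as first-class citizens amounts to in practice.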
Slides for the presentation given at the Data Science Scotland Meetup (https://www.meetup.com/Scotland-Data-Science-Technology-Meetup/events/256269263/).
This talk aimed to give some general advice, tips, and tricks about how to run a successful data science project.
Hosted by:
Incremental Group - https://www.linkedin.com/company/incremental-group/
MBN Solutions - https://www.linkedin.com/company/mbn-recruitment-solutions/
The Datalab - https://www.linkedin.com/company/the-data-lab-innovation-centre/
Claudia Gold: Learning Data Science Online (sfdatascience)
Claudia Gold, author of the Data Analysis Learning path on SlideRule, talks about why she wrote it and how to approach learning data science on your own. https://www.mysliderule.com/learning-paths/data-analysis/
Supporting innovation in insurance with randomized experimentation (Domino Data Lab)
Recent technological advances, a dynamic competitive landscape, and an evolving regulatory environment have led to a period of rapid innovation for many insurance providers. Here, we’ll explore how data scientists may use randomized experiments to rigorously assess the causal impact of innovations on business outcomes. Particular emphasis will be placed on experimentation in “offline” channels, with some of the challenges and mitigation strategies highlighted.
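The sketch below shows the core analysis in a minimal form. It assumes a simple two-arm test with a numeric outcome, which the abstract does not specify. Units are randomly assigned to control or treatment, and the causal effect is estimated as the difference in group means, with a normal-approximation 95% confidence interval. The function names and the simulated outcomes are hypothetical. Offline channels often need extra care, such as cluster randomization or handling non-compliance.

```python
import math
import random
import statistics


def assign(units, seed=0):
    """Randomly split units into control and treatment groups (about 50/50)."""
    rng = random.Random(seed)
    groups = {"control": [], "treatment": []}
    for unit in units:
        groups["treatment" if rng.random() < 0.5 else "control"].append(unit)
    return groups


def difference_in_means(control, treatment, z=1.96):
    """Estimate the average treatment effect with a normal-approximation CI."""
    effect = statistics.mean(treatment) - statistics.mean(control)
    se = math.sqrt(statistics.variance(treatment) / len(treatment)
                   + statistics.variance(control) / len(control))
    return effect, (effect - z * se, effect + z * se)


if __name__ == "__main__":
    rng = random.Random(42)
    groups = assign(range(2000), seed=1)
    # Simulated outcomes (purely illustrative): treatment shifts the mean by +0.5.
    control = [rng.gauss(10.0, 2.0) for _ in groups["control"]]
    treatment = [rng.gauss(10.5, 2.0) for _ in groups["treatment"]]
    effect, (lo, hi) = difference_in_means(control, treatment)
    print(f"estimated effect = {effect:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

Because assignment is random, the groups differ only by chance and by the treatment. That is what lets a simple difference in means be read as a causal estimate.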
Discussion on Transparency in data science: “Data science that provides transparency – How to clarify answers such that they become indisputable?” (session leader prof.dr.ir. Jack van Wijk)
Presentation to FourthLion (my former employer) staff on some lessons learnt while doing analytics across three domains, and the motivation for automation. In my opinion, data science will (a) have significant growing pains and (b) see an evolution similar to the one we saw in software engineering.
What you will learn:
GOALS - What is the bar for data science teams
PITFALLS - What are common data science struggles
DIAGNOSES - Why so many of our efforts fail to deliver value
RECOMMENDATIONS - How to address these struggles with best practices
Presented by Mac Steele
Director of Product at Domino Data Lab
Data Scientist Roles and Responsibilities | Data Scientist Career | Data Scie... (Edureka!)
** Data Science Master Program: https://www.edureka.co/masters-program/data-scientist-certification **
This Edureka "Data Scientist Roles and Responsibilities" PPT talks about the various job descriptions and specific skill sets for the different kinds of Data Scientists. It explains why Data Science is the best career move right now. Learn about various job roles, what they actually mean, and the learning path to a career in Data Science. Below are the topics covered in this module:
What is Data Science?
Who is a Data Scientist?
Types of Data Scientists
Skills Required to Become a Data Scientist
Data Science Masters Program @Edureka
Check out our Data Science Tutorial blog series: http://bit.ly/data-science-blogs
Check out our complete Youtube playlist here: http://bit.ly/data-science-playlist
Instagram: https://www.instagram.com/edureka_lea...
Facebook: https://www.facebook.com/edurekaIN/
Twitter: https://twitter.com/edurekain
LinkedIn: https://www.linkedin.com/company/edureka
Discussion on Accuracy in data science: “Data science without guesswork – How to answer questions with a guaranteed level of accuracy?” (session leader prof.dr. Mykola Pechenizkiy).
To future-proof responsible data science methods, foundational research is needed, and, given the complementarity of TU/e and TiU in JADS, there are great opportunities to collaborate on this theme. This was reflected by the JADS Workshop on Responsible Data Science.
COEPD (Center of Excellence for Professional Development) is primarily a Business Analyst Training Institute in the IT industry of India, headquartered in Hyderabad. COEPD is an expert in Business Analyst Training in Hyderabad, Chennai, Pune, Mumbai & Vizag. We offer Business Analyst Training at affordable prices that fit your needs.
COEPD conducts 4-day workshops throughout the year for all participants in various locations, such as Hyderabad and Pune. The workshops are also conducted on Saturdays and Sundays for the convenience of working professionals.
For more details, please contact us:
Visit http://www.coepd.com or http://www.facebook.com/BusinessAnalystTraining
Center of Excellence for Professional Development
3rd Floor, Sahithi Arcade, S R Nagar,
Hyderabad 500 038, India.
Ph# +91 9000155700,
helpdesk@coepd.com
Data Quality Analytics: Understanding what is in your data, before using it (Domino Data Lab)
Analytics and data science are ever-growing fields, as business decision makers continue to use data to drive decisions. The pinnacle of these fields is the models and their accuracy/fit; but what about the data? Is your data clean, and how do you know? Our discussion will focus on best practices for preprocessing data for analytic uses, from essential distributional checks of a dataset to a proposed method for automated data validation during ETL for transactional data.
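Below is a rough sketch of automated validation during ETL. It is not the method proposed in the talk. The function runs a few basic checks on a batch of transactional records: required fields are present, transaction ids are not duplicated, amounts are non-negative, and the batch mean has not drifted too far from a reference mean. The field names (`txn_id`, `amount`), the `validate_batch` helper, and the drift threshold are hypothetical assumptions.

```python
import statistics


def validate_batch(rows, reference_mean, max_rel_drift=0.5):
    """Return a list of human-readable problems found in a batch of transactions."""
    problems = []
    required = ("txn_id", "amount")

    # Completeness: every row must have the required fields, non-null.
    for i, row in enumerate(rows):
        missing = [f for f in required if row.get(f) is None]
        if missing:
            problems.append(f"row {i}: missing {', '.join(missing)}")

    # Uniqueness: transaction ids should not repeat within the batch.
    ids = [row["txn_id"] for row in rows if row.get("txn_id") is not None]
    if len(ids) != len(set(ids)):
        problems.append("duplicate txn_id values")

    # Validity: amounts should be non-negative.
    amounts = [row["amount"] for row in rows if row.get("amount") is not None]
    negatives = sum(1 for a in amounts if a < 0)
    if negatives:
        problems.append(f"{negatives} negative amount(s)")

    # Distributional check: flag the batch if its mean drifts too far from the reference.
    if amounts and reference_mean:
        drift = abs(statistics.mean(amounts) - reference_mean) / abs(reference_mean)
        if drift > max_rel_drift:
            problems.append(f"mean amount drifted {drift:.0%} from reference")
    return problems


if __name__ == "__main__":
    batch = [
        {"txn_id": 1, "amount": 20.0},
        {"txn_id": 2, "amount": -5.0},
        {"txn_id": 2, "amount": 30.0},
        {"txn_id": 3, "amount": None},
    ]
    for problem in validate_batch(batch, reference_mean=25.0):
        print(problem)
```

In a pipeline, a non-empty result would typically stop the load or quarantine the batch, rather than letting unchecked data reach the models.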
Business Analytics vs. Data Science:
Business Analytics: the statistical study of business data to gain insights. Uses mostly structured data.
Data Science: the study of data using statistics, algorithms and technology. Uses both structured and unstructured data.
Whether you are a beginner, a transient, or a data scientist, this plan addresses each individual's needs. You can learn data science in a year if you follow this process.
6 steps to start your artificial intelligence project (Tropos.io)
Working in data analytics for Fortune 500 companies, we've distilled a practical framework for discovering opportunities in data analytics projects into 6 high-level steps.
Business leaders everywhere are looking to data to inform their decision making. Accompanying this demand are misunderstandings of what it takes to transform data into something that can inform a decision. What data infrastructure is required? In this talk, I'll dispel some of these misunderstandings and discuss what it takes to build good data infrastructure. I'll discuss the components of a good data infrastructure: the best practices and available tools for gathering data, processing it, storing it, analyzing it, and communicating the results. The goal is for these components to create a data infrastructure that can evolve from simple reporting to sophisticated insights for decision making.
Presented at OpenWest 2018
Product Management in the Era of Data Science (Mandar Parikh)
My slide-deck from a webinar on the same topic for the Institute of Product Leadership, April 4th, 2017
What does it take to build killer products in the “AI-first” era? What makes for a great Data Science-driven product and how do great Product Managers leverage Data Science to drive value for customers? Find out how to avoid the pitfalls of hype-chasing Data Science tactics. Learn how to work with Data Science and Engineering to build a compelling product and solve real problems.
Mandar takes a practitioner’s approach to present his recipe for success for building Data Science-driven products that drive enduring value for customers.
If you’re learning data science, you’re probably on the lookout for cool data science projects. Look no further! We have a wide variety of guided projects that’ll get you working with real data in real-world scenarios while also helping you learn and apply new data science skills.
The projects in the list below are also designed to help you get a job! Each project was designed by a data scientist on our content team, and they’re representative examples of the real projects working data analysts and data scientists do every day. They’re designed to guide you through the process while also challenging your skills, and they’re open-ended so that you can put your own twist on each project and use it for your data science portfolio.
You can complete each project right in your browser, or you can download the data set to your computer and work locally! If you work on our site, you’ll also be able to download your code at any time so that you can continue locally, or upload your project to GitHub.
The sky is the limit here and what you decide to look into further is completely up to you and your imagination!
1. Learning by Doing
Learning by doing refers to a theory of education expounded by the American philosopher John Dewey. It is a hands-on approach to learning, meaning students must interact with their environment in order to adapt and learn. This way of learning sharpens your current skills and knowledge and also helps you gain new skills that can only be acquired by doing.
Driving a car is a perfect example of this. You can read as much as you like about the theory of driving and the rules of the road, and this is very important: the better you understand the theory, the better you get at the practical part. But you will only become a better driver by applying this knowledge on a real road. In addition, some skills and knowledge can only be gained by actually driving.
Data science is the same as driving. It is very important to have solid theoretical knowledge and to expand it regularly so that you get better while working on a project. However, you should always apply this theoretical knowledge to projects. By doing so, you will deepen your understanding of these concepts, get a better view of how they work in real life, and show others that you have strong theoretical knowledge and are able to put it into practice.
There are different types of guided projects. One of them is a guided project for
There are many benefits to it:
It removes the barriers between you and doing projects.
It saves you a lot of time thinking about the project and preparing the data.
It allows you to apply the theoretical knowledge without getting distracted by obstacles.
It offers practical tips that can save you effort and time in the future.
#datasciencefree
#rohitdubey
#teachtechtoe
#linkedin.com/in/therohitdubey
The pioneers in the big data space have battle scars and have learnt many of the lessons in this report the hard way. But if you are a general manager just embarking on the big data journey, you should now have what is called the 'second mover advantage'. My hope is that this report helps you better leverage your second mover advantage. The goal here is to shed some light on the people and process issues in building a central big data analytics function.
Data Science has become one of the most in-demand jobs of the 21st century. It has become a buzzword that almost everyone talks about these days. But what is Data Science? In this article, we will demystify Data Science and the role of a Data Scientist, and look at the tools required to master Data Science.
Data science is a field of study in which data is analyzed using specific parameters, and decisions are made based on the patterns and results generated by the analysis. It is an interdisciplinary science that uses scientific methods, algorithms and processes to study the available data and gain knowledge. Crampete Data Science Course shows how to become a Data Scientist from scratch.
A data scientist is a person who uses a mixture of different concepts from mathematics, statistics, information science and business intelligence to write algorithms for analyzing data. The results of the analysis are used by organizations to make smarter business decisions. In general, a data scientist needs to know how to code so that they can write scripts used to process the data.
http://www.crampete.com/
According to a recent report by the Wall Street Journal, AI project failure rates are near 50%, and more than 53% of projects terminate at the proof-of-concept level and do not make it to production. A Gartner report says that nearly 80% of analytics projects do not deliver any business value. That means that for every 10 projects, only 2 are useful to the organization. Let us pause here for a moment: rather than looking at what makes AI projects fail, let's look at the challenges involved in AI projects and find solutions to overcome them.
AI projects are different from traditional software projects. Typical software projects, as shown in Figure 1, consist of well-defined software requirements, high-level design, coding, unit testing, system testing, and deployment, along with beta or field testing. Many organizations are now adopting an Agile process instead of the traditional V or waterfall model, but the steps mentioned above are still valid.
However, the methodology of AI and machine learning projects is different from the above. Our experience working on many AI/ML projects has given us insights into some of the challenges of executing AI projects. We are also in regular touch with senior executives and thought leaders from different industries who understand the success formula. The following discussion is based on our practical experience and knowledge gained in the field.
Successful execution of AI projects depends on the following factors:
1. Clearly aligned Business Expectations
2. Clarity on Terminologies
3. Meeting Data Requirements
4. Tools and Technology
5. Right Resources
6. Understanding Output Results
7. Project Planning and the Process
Want to learn data analytics, or just get information about data analytics and its future? https://coursedekho.com/data-analytics-courses-in-surat/
The significance of Data Science has increased impressively in recent years. The current period is marked by the intersection of data analytics with emerging technologies such as artificial intelligence (AI), machine learning (ML), and automation, and these three areas offer an ocean of career opportunities. In this post, I am sharing with you some of the best Data Analytics Courses in Surat, with a detailed course curriculum and placement guarantees.
#education
#data
#DataAnalytics
#DataScience
#DataCourse
#AnalyticsCourses
#AnalyticsCourse
#DataScienceCourse
#DataScienceCourses
#CoursesInIndia
#DataJob
Opendatabay - Open Data Marketplace.pptx (Opendatabay)
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
The first-ever open hub for data enthusiasts to collaborate and innovate: a platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, Opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. The marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Techniques to optimize the pagerank algorithm usually fall into two categories. One is to reduce the work per iteration, and the other is to reduce the number of iterations. These goals are often at odds with one another. Skipping computation on vertices that have already converged has the potential to save iteration time. Skipping in-identical vertices (those with the same in-links) helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains, which can be short-circuited before pagerank computation to improve performance; the final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the pagerank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in pagerank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. For more details, visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
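To make the first idea (skipping work on vertices that have already converged) concrete, here is a minimal power-iteration pagerank sketch. It assumes the graph is an adjacency dict in which every vertex appears as a key. A vertex is recomputed only if one of its in-neighbours, or the total rank held by dangling nodes, moved by at least `tol` in the previous iteration; otherwise its rank could barely change. This is an illustrative variant, not the STICD algorithm, and the function name and parameters are hypothetical.

```python
def pagerank_skip_converged(graph, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration pagerank that skips vertices whose inputs did not change.

    graph: dict mapping each vertex to a list of its out-neighbours;
    every vertex must appear as a key.
    """
    vertices = list(graph)
    n = len(vertices)
    out_degree = {v: len(graph[v]) for v in vertices}
    in_links = {v: [] for v in vertices}
    for u, targets in graph.items():
        for v in targets:
            in_links[v].append(u)

    rank = {v: 1.0 / n for v in vertices}
    changed = set(vertices)  # vertices whose rank moved by >= tol last iteration
    prev_dangling = None
    for _ in range(max_iter):
        # Rank held by dangling vertices (no out-links) is spread uniformly.
        dangling = sum(rank[v] for v in vertices if out_degree[v] == 0)
        dangling_moved = prev_dangling is None or abs(dangling - prev_dangling) >= tol
        new_rank = dict(rank)
        new_changed = set()
        for v in vertices:
            # Skip v when none of its inputs moved: its rank would stay the same.
            if not dangling_moved and not any(u in changed for u in in_links[v]):
                continue
            incoming = sum(rank[u] / out_degree[u] for u in in_links[v])
            new_rank[v] = (1 - damping) / n + damping * (incoming + dangling / n)
            if abs(new_rank[v] - rank[v]) >= tol:
                new_changed.add(v)
        rank, changed, prev_dangling = new_rank, new_changed, dangling
        if not changed:
            break
    return rank


if __name__ == "__main__":
    print(pagerank_skip_converged({"a": ["b", "c"], "b": ["c"], "c": ["a"]}))
```

Tracking which inputs changed, rather than freezing a vertex the first time its own update is small, avoids a subtle trap. A vertex's update can be tiny early on purely by coincidence (for example, in the first iteration from a uniform start) even though its final rank is still far away.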
10 tips from a young data scientist
Nuno Carneiro (nc@nunocarneiro.com)
www.linkedin.com/in/nunocarneiro
17/04/2018
Agenda
1. Data is here to stay
2. There is abundant learning material online
3. There are many types of data scientists
4. It usually looks like: Understand - Data prep - Analyze - Deploy/Recommend
5. Data prep is 80% of the work
6. Start with the end
7. Projects with quick feedback time are easier
8. Be thorough with your analysis
9. Build insights and provide recommendations
10. Do data science for good
+ Extra: Description of a case study
1. Data is here to stay
The amount of available data is growing exponentially. Making use of it will be crucial for any successful company.
2. There is abundant learning material online
Learn Python:
● Codecademy
Learn Machine Learning:
● Coursera ML Course
● Coursera Deep Learning Course
● Caltech ML course
Learn Python + data science:
● Dataquest
Work on real projects/compete with others:
● Kaggle
● Numerai
● DrivenData
For anything else: Google + Stack Overflow
3. There are many types of data scientists
Data Science fields:
● Classification
● Recommender Systems
● Time-series analysis
● Regression
● Forecasting
● NLP
● ...
Data Science profiles:
● Business analyst
● Data engineer
● Quant
● Consultant
● ...
Backgrounds:
● Statistics
● Business
● Software Engineering
● ...
Data scientists perform very different tasks depending on the problem they are trying
to solve. Being a data scientist can mean travelling the world as a consultant or spending
the whole day in a research office.
4. It usually looks like this:
Understand → Data prep → Analyze → Recommend or Deploy
When you are presenting recommendations or deploying a model, decision makers will
be looking for an intuitive explanation of your conclusions. It is very important to help
them make sense of these conclusions by explaining your methodology.
5. Data preparation is 80% of the work
Data preparation is a non-linear process that takes the most time in any
data science project. Don’t underestimate the effort it takes.
● Extract data
● Transfer files
● Combine files
● Understand variables
● Create new variables
● Treat errors
● Define target
● Document data treatment
● Select usable data
● Generate datasets for analysis
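To make several of these steps concrete, here is a minimal sketch using only the Python standard library. The two CSV extracts, their columns (`customer_id`, `signup_date`, `age`, `orders`), the function names and the cleaning rules are all hypothetical assumptions for illustration, not part of the original slides.

```python
import csv
import io
from datetime import date

# Hypothetical extracts standing in for two files pulled from different systems.
CUSTOMERS_CSV = """customer_id,signup_date,age
1,2017-01-15,34
2,2017-03-02,-5
3,2017-06-20,
"""

ORDERS_CSV = """customer_id,orders
1,12
2,3
4,7
"""


def read_rows(text):
    """Extract: parse CSV text into a list of dicts (one per row)."""
    return list(csv.DictReader(io.StringIO(text)))


def prepare(customers_text, orders_text, as_of):
    """Combine, clean and enrich the extracts; return (dataset, treatment_log)."""
    treatment_log = []  # Document data treatment: record every change made.
    orders = {row["customer_id"]: int(row["orders"]) for row in read_rows(orders_text)}

    dataset = []
    for row in read_rows(customers_text):
        cid = row["customer_id"]
        # Combine files / select usable data: keep customers present in both extracts.
        if cid not in orders:
            treatment_log.append(f"customer {cid}: no orders record, dropped")
            continue
        # Treat errors: a blank or negative age is treated as missing.
        age = int(row["age"]) if row["age"] else None
        if age is not None and age < 0:
            treatment_log.append(f"customer {cid}: invalid age {age}, set to missing")
            age = None
        # Create new variables: customer tenure in days at the analysis date.
        tenure_days = (as_of - date.fromisoformat(row["signup_date"])).days
        dataset.append({"customer_id": cid, "age": age,
                        "orders": orders[cid], "tenure_days": tenure_days})
    return dataset, treatment_log


if __name__ == "__main__":
    rows, log = prepare(CUSTOMERS_CSV, ORDERS_CSV, as_of=date(2018, 4, 17))
    for r in rows:
        print(r)
    for entry in log:
        print("treatment:", entry)
```

Keeping a treatment log next to the dataset is one simple way to "document data treatment": every dropped row or corrected value can be traced back when someone questions the numbers.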
6. Start with the end
The first step in any data science project is to understand how the outcome will be used.
● What are you trying to learn or predict?
● How is the outcome going to be used?
● Which performance measures will be used as success criteria?
● How will the outcome impact the business or people’s lives? For example,
you can optimize the prevention of churning customers, but this can drain the business
of all its customers.
For example, in classification exercises, target definition is one of the first and most important steps. It
requires a good understanding of all the questions mentioned above.
When you start any data science project, the first task should be to
understand the business and how the project outcome will be used.
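As an illustration of target definition, the sketch below labels customers for a churn classification problem. The rule used here (a customer counts as churned if they make no purchase in the 90 days after a cutoff date), the 90-day window and the function names are hypothetical assumptions; choosing that rule is exactly the kind of business decision the questions above should settle.

```python
from datetime import date, timedelta


def churn_label(purchase_dates, cutoff, window_days=90):
    """Hypothetical target: 1 (churned) if no purchase in (cutoff, cutoff + window_days], else 0."""
    window_end = cutoff + timedelta(days=window_days)
    bought_in_window = any(cutoff < d <= window_end for d in purchase_dates)
    return 0 if bought_in_window else 1


def build_targets(purchases_by_customer, cutoff, window_days=90):
    """Label every customer with at least one purchase on or before the cutoff."""
    targets = {}
    for customer_id, dates in purchases_by_customer.items():
        # Customers with no history before the cutoff were not customers at
        # prediction time, so they are excluded rather than labelled.
        if not any(d <= cutoff for d in dates):
            continue
        targets[customer_id] = churn_label(dates, cutoff, window_days)
    return targets


# Hypothetical purchase history.
purchases = {
    "A": [date(2017, 11, 3), date(2018, 2, 10)],  # buys again inside the window
    "B": [date(2017, 10, 1)],                     # no purchase after the cutoff
    "C": [date(2018, 1, 20)],                     # first purchase after the cutoff
}
print(build_targets(purchases, cutoff=date(2018, 1, 1)))  # {'A': 0, 'B': 1}
```

Small choices here (the window length, whether the boundary day counts, who is excluded) change what the model learns, which is why the target should be agreed with the people who will use the outcome.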
7. Projects with quick feedback time are easier
Target definition → Prediction → Action → Feedback
If your project has fast feedback cycles, it will be easier to get an
advantage out of machine learning.
● Example of a long feedback cycle: credit default prediction
(30-year mortgages…);
● Example of a quick feedback cycle: daily sales forecasting.
In addition, while your project is under development, it should get
feedback from external stakeholders as fast as possible:
● Iterate fast;
● Interact with your client (external or internal) at every
step of the way.
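One way to see why feedback speed matters: a prediction can only be scored once its outcome window has closed. The minimal sketch below assumes a fixed prediction horizon in days (a simplification) and a hypothetical helper name; it counts how many past predictions can already be evaluated on a given date.

```python
from datetime import date, timedelta


def evaluable_predictions(prediction_dates, horizon_days, today):
    """Return the predictions whose outcome window has closed by `today`.

    A prediction made on day d with a horizon of h days can only be compared
    with the real outcome on or after d + h.
    """
    horizon = timedelta(days=horizon_days)
    return [d for d in prediction_dates if d + horizon <= today]


# Hypothetical example: one prediction per day during March 2018.
march = [date(2018, 3, 1) + timedelta(days=i) for i in range(31)]
today = date(2018, 4, 1)

daily_sales = evaluable_predictions(march, horizon_days=1, today=today)
mortgage_default = evaluable_predictions(march, horizon_days=30 * 365, today=today)
print(len(daily_sales), len(mortgage_default))  # 31 0
```

With a one-day horizon every March forecast can already be checked against reality; with a 30-year horizon none can, so there is nothing yet to learn from.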
8. Be thorough with your analysis
A small mistake in the code can easily lead to wrong
conclusions.
Most often, though, the main source of errors is a small mistake in thinking, such as a wrongly
defined target or the inclusion of future information (data leakage).
If you are doing something that few people understand, your most important
currency is trust. Be very thorough with your analysis to prevent mistakes which
could make you lose that trust.
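A common safeguard against including future information is to split data by time rather than at random, and to check that no training record is dated on or after the first test record. The sketch below assumes each record carries a `date` field (a hypothetical schema) and uses hypothetical function names; it is one simple check, not a complete leakage audit.

```python
from datetime import date, timedelta


def time_split(records, split_date):
    """Split chronologically: train is strictly before split_date, test is on or after it."""
    train = [r for r in records if r["date"] < split_date]
    test = [r for r in records if r["date"] >= split_date]
    return train, test


def check_no_future_leak(train, test):
    """Raise ValueError if any training record is not earlier than every test record."""
    if train and test:
        latest_train = max(r["date"] for r in train)
        earliest_test = min(r["date"] for r in test)
        if latest_train >= earliest_test:
            raise ValueError(
                f"training data reaches {latest_train} but test data starts {earliest_test}"
            )


# Hypothetical daily records for ten days in January 2018.
records = [{"date": date(2018, 1, 1) + timedelta(days=i), "sales": 100 + i} for i in range(10)]

train, test = time_split(records, split_date=date(2018, 1, 8))
check_no_future_leak(train, test)  # passes: train ends 2018-01-07, test starts 2018-01-08
print(len(train), len(test))  # 7 3
```

Calling `check_no_future_leak` on a split that puts later days in training (for example `records[5:]` as train and `records[:5]` as test) raises an error, which is the kind of cheap, automatic check that protects trust.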
9. Build insights and provide recommendations
Many data scientists get lost in analysis and fail to draw conclusions from their work.
It is very rare to find a data scientist who combines business understanding with analytical
skills and mastery of data science tools (coding, ML, etc.).
“So what?” Always ask yourself what the conclusion of your work is. Whether you run statistical
tests, plot data in charts, or analyze prediction backtest results, you should always aim for
two things: 1) building insights; 2) providing recommendations.
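As an illustration of moving from backtest numbers to a "so what", the sketch below compares a model's forecasts with a naive baseline (yesterday's value) and turns the difference into a sentence a stakeholder can act on. The data, the choice of mean absolute error (MAE) and the naive baseline are assumptions made for this example.

```python
def mean_absolute_error(actuals, predictions):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actuals, predictions)) / len(actuals)


def backtest_summary(actuals, model_forecasts):
    """Compare model forecasts with a naive 'same as yesterday' baseline.

    The first day has no previous value, so both are scored from the second day on.
    Returns (model_mae, baseline_mae, relative_improvement).
    """
    targets = actuals[1:]
    baseline = actuals[:-1]  # forecast for day t = actual value of day t-1
    model = model_forecasts[1:]
    model_mae = mean_absolute_error(targets, model)
    baseline_mae = mean_absolute_error(targets, baseline)
    if baseline_mae == 0:
        raise ValueError("baseline is perfect; relative improvement is undefined")
    return model_mae, baseline_mae, (baseline_mae - model_mae) / baseline_mae


# Hypothetical backtest data: six days of actual sales and model forecasts.
actual_sales = [100, 120, 90, 110, 130, 95]
forecasts = [98, 115, 97, 108, 125, 100]

model_mae, baseline_mae, improvement = backtest_summary(actual_sales, forecasts)
# The "so what": a sentence a stakeholder can act on, not just a metric.
print(f"The model's error is {improvement:.0%} lower than a naive same-as-yesterday "
      f"forecast (MAE {model_mae:.1f} vs {baseline_mae:.1f}).")
```

Whether an improvement of this size justifies deploying the model is still a business question; the comparison only gives the recommendation something concrete to rest on.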
10. Do data science for good
Use your skills for good. Even if you have good intentions, watch out for
unintended bad side effects of your work.
Be careful when creating tools which will not only
predict but also influence future events.
Sometimes, small decisions we take when coding
can have a big impact on other people’s lives.
PS: Check out the platform mentioned in Tip 2: DrivenData. Their slogan speaks for itself: “Data science competitions to save the world.”