- 1. Dr. Bill Howe - Director of Research, Scalable Data Analytics
- 2. What is data science?
    ◦ A set of theories and principles for performing data-related tasks, such as:
        ◦ Data collection
        ◦ Data cleaning
        ◦ Data integration
        ◦ Data modeling
        ◦ Data visualization
- 3. Data science is different from:
    ◦ Business intelligence
    ◦ Statistics
    ◦ Database management
    ◦ Visualization
    ◦ Machine learning
- 4. What each role would need to learn to do data science:
    ◦ DBA: unstructured data
    ◦ Statistician: data that doesn't fit into memory
    ◦ Software engineer: statistical models and how to communicate results
    ◦ Business analyst: algorithms and tradeoffs at scale
- 5. Three common skills of a data scientist:
    ◦ Statistics: traditional analysis
    ◦ Data munging: parsing, scraping, and formatting data
    ◦ Visualization: graphs, tools, etc.
- 6. Three types of tasks:
    ◦ Preparing to run a model
    ◦ Running the model
    ◦ Communicating the results
- 7. Preparing to run a model: gathering, cleaning, integrating, restructuring, transforming, loading, filtering
- 8. Running the model:
    ◦ Choosing appropriate machine learning algorithms for regression, classification, clustering, and recommendations
    ◦ Validating the model
    ◦ Improving the model
  Communicating the results
- 9. Breadth: MapReduce / relational algebra / logistic regression / visualization
    Depth: structures (relational algebra) / statistics (linear algebra)
    Scale: desktop (R) / cloud (Hadoop)
    Target: hackers (R, Java, Python) / analysts (little or no programming)
- 10. Scale: the cloud for big data. Big data can be characterized by the three V's:
    ◦ Volume: number of rows (size)
    ◦ Variety: number of columns or sources (text, images, audio, video)
    ◦ Velocity: number of rows or bytes per unit time (processing time)
- 11. What drives big data:
    ◦ “Data exhaust” from customers
    ◦ New and pervasive sensors
    ◦ The ability to “keep everything”
- 12. Prerequisites:
    ◦ Prior programming experience (SQL, Python)
    ◦ Basic statistics
    ◦ Basic database concepts
- 13. Twitter sentiment analysis:
    ◦ Extract tweets from the Twitter API
    ◦ Calculate the sentiment score of each tweet
    ◦ Calculate the sentiment score of terms in tweets
    ◦ Calculate the frequency of terms in tweets
    ◦ Identify the happiest state
    ◦ Identify the top ten hashtags
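The data-preparation steps listed on slide 7 can be sketched in plain Python. This is a minimal illustration, not material from the slides: the two CSV sources, their column names, and the helper functions (`gather`, `clean_customers`, `clean_orders`, `integrate`, `filter_rows`) are all hypothetical, and loading into a database is left out. Each function's docstring names the preparation step it stands in for.

```python
import csv
import io

# Hypothetical raw inputs standing in for two gathered sources.
CUSTOMERS_CSV = """id,name,state
1, Alice ,WA
2,Bob,
3,Carol,OR
"""

ORDERS_CSV = """customer_id,amount
1,19.99
1,5.00
3,42.50
2,not_a_number
"""


def gather(text):
    """Gathering: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_customers(rows):
    """Cleaning: strip whitespace and drop rows with a missing state."""
    cleaned = []
    for row in rows:
        row = {k: v.strip() for k, v in row.items()}
        if row["state"]:
            cleaned.append(row)
    return cleaned


def clean_orders(rows):
    """Cleaning/transforming: convert amounts to float, skipping unparsable values."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append({"customer_id": row["customer_id"].strip(),
                            "amount": float(row["amount"])})
        except ValueError:
            continue  # drop rows whose amount is not a number
    return cleaned


def integrate(customers, orders):
    """Integrating/restructuring: join orders to customers and total them per customer."""
    totals = {}
    for order in orders:
        cid = order["customer_id"]
        totals[cid] = totals.get(cid, 0.0) + order["amount"]
    return [{"name": c["name"], "state": c["state"],
             "total": round(totals.get(c["id"], 0.0), 2)}
            for c in customers]


def filter_rows(rows, min_total):
    """Filtering: keep only customers whose total meets a threshold."""
    return [r for r in rows if r["total"] >= min_total]


customers = clean_customers(gather(CUSTOMERS_CSV))
orders = clean_orders(gather(ORDERS_CSV))
prepared = filter_rows(integrate(customers, orders), min_total=30.0)
print(prepared)  # [{'name': 'Carol', 'state': 'OR', 'total': 42.5}]
```

In practice these steps are often written in SQL or with a dataframe library; the plain-Python version just makes each step visible on its own.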
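Slide 8 mentions choosing an algorithm, validating the model, and improving it. As a hedged sketch of the first two steps, the code below assumes simple linear regression as the chosen algorithm (one possible choice for a regression task), fits it by ordinary least squares, and validates it on a holdout split using mean squared error. The toy data and function names are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def mean_squared_error(model, xs, ys):
    """Average squared difference between observed and predicted y."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data: y is roughly 2x + 1 with small deterministic noise.
xs = [float(i) for i in range(10)]
ys = [2 * x + 1 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]

# Holdout validation: train on the first 8 points, validate on the last 2.
train_x, test_x = xs[:8], xs[8:]
train_y, test_y = ys[:8], ys[8:]

model = fit_simple_linear_regression(train_x, train_y)
train_mse = mean_squared_error(model, train_x, train_y)
test_mse = mean_squared_error(model, test_x, test_y)
print("intercept, slope:", model)
print("train MSE:", train_mse)
print("test MSE:", test_mse)
```

Comparing the holdout error of several candidate models is one way to approach the "improvement" step; with this little data, cross-validation would give a steadier estimate than a single split.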
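Slide 9 lists MapReduce among the breadth topics. The sketch below simulates its three phases (map, shuffle, reduce) in a single Python process, using word count as an assumed example; a framework such as Hadoop would run mappers and reducers on many machines and perform the shuffle across the network. The function names are illustrative, not a Hadoop API.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Mapper: emit (word, 1) for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group intermediate values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reducer: sum all counts emitted for one word."""
    return word, sum(counts)


def map_reduce(documents):
    intermediate = [pair for doc_id, text in documents.items()
                    for pair in map_phase(doc_id, text)]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


documents = {
    "d1": "data science is data",
    "d2": "science at scale",
}
print(map_reduce(documents))
# {'data': 2, 'science': 2, 'is': 1, 'at': 1, 'scale': 1}
```

Because mappers only see one document and reducers only see one key, each phase can be parallelized independently; that separation is what lets the same logic move from a desktop to a cluster.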
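One way to make the three V's from slide 10 concrete is to compute them for a batch of records. In this sketch the `Record` type, its source labels, and the choice to measure velocity as rows per second over the observed time span are assumptions for illustration. Variety is reported both as the number of distinct columns and as the number of distinct sources, following the slide's "columns or sources".

```python
from dataclasses import dataclass


@dataclass
class Record:
    source: str        # hypothetical source label, e.g. "text" or "image"
    timestamp: float   # arrival time in seconds
    fields: dict       # column name -> value


def three_vs(records):
    """Return (volume, (distinct columns, distinct sources), velocity).

    volume   = number of rows
    variety  = distinct column names and distinct sources
    velocity = rows per second over the observed time window
    """
    if not records:
        return 0, (0, 0), 0.0
    volume = len(records)
    columns = {name for r in records for name in r.fields}
    sources = {r.source for r in records}
    window = max(r.timestamp for r in records) - min(r.timestamp for r in records)
    velocity = volume / window if window > 0 else float("inf")
    return volume, (len(columns), len(sources)), velocity


records = [
    Record("text", 0.0, {"user": "a", "msg": "hi"}),
    Record("text", 1.0, {"user": "b", "msg": "yo"}),
    Record("image", 2.0, {"user": "a", "pixels": 1024}),
    Record("sensor", 4.0, {"temp": 21.5}),
]
print(three_vs(records))  # (4, (4, 3), 1.0)
```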
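The Twitter sentiment tasks on slide 13 can be sketched as follows, under several assumptions: tweets have already been extracted from the Twitter API and saved one JSON object per line (the extraction call itself is not shown), the lexicon is a tiny hypothetical stand-in for an AFINN-style word list, and each tweet carries a hypothetical `state` field. Scoring unknown terms by the average score of the tweets they appear in is one simple heuristic, not necessarily the method the course intended.

```python
import json
from collections import Counter, defaultdict

# Hypothetical tiny lexicon in the spirit of AFINN: term -> integer score.
LEXICON = {"love": 3, "great": 3, "happy": 3, "bad": -3, "hate": -3, "sad": -2}

# Hypothetical saved tweets, one JSON object per line. The "state" field is
# an assumption; real tweets carry location in other, often missing, fields.
RAW_TWEETS = """\
{"text": "I love sunny days #happy", "state": "WA"}
{"text": "bad traffic again #commute #sad", "state": "CA"}
{"text": "great coffee and great friends #happy", "state": "WA"}
{"text": "rainy days are sad", "state": "OR"}
"""


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simple)."""
    return text.lower().split()


def tweet_score(tokens):
    """Sentiment of a tweet = sum of the lexicon scores of its terms."""
    return sum(LEXICON.get(t, 0) for t in tokens)


def derived_term_scores(tweets):
    """Score non-lexicon terms by averaging the scores of tweets containing them."""
    totals, counts = defaultdict(float), defaultdict(int)
    for tokens, score in tweets:
        for term in set(tokens):
            if term not in LEXICON and not term.startswith("#"):
                totals[term] += score
                counts[term] += 1
    return {term: totals[term] / counts[term] for term in totals}


def term_frequencies(tweets):
    """Frequency of a term = its occurrences / total terms in all tweets."""
    counter = Counter(t for tokens, _ in tweets for t in tokens)
    total = sum(counter.values())
    return {term: n / total for term, n in counter.items()}


def top_hashtags(tweets, k=10):
    """The k most common hashtags with their counts."""
    counter = Counter(t for tokens, _ in tweets for t in tokens if t.startswith("#"))
    return counter.most_common(k)


def happiest_state(pairs):
    """State with the highest average tweet score, from (state, score) pairs."""
    totals, counts = defaultdict(float), defaultdict(int)
    for state, score in pairs:
        totals[state] += score
        counts[state] += 1
    return max(totals, key=lambda s: totals[s] / counts[s])


parsed = [json.loads(line) for line in RAW_TWEETS.splitlines() if line.strip()]
tweets = [(tokenize(p["text"]), tweet_score(tokenize(p["text"]))) for p in parsed]

print([score for _, score in tweets])  # [3, -3, 6, -2]
print(top_hashtags(tweets))
print(happiest_state((p["state"], s) for p, (_, s) in zip(parsed, tweets)))  # WA
```

Whitespace tokenization keeps punctuation attached to words, so a fuller version would normalize text before looking terms up in the lexicon.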
