Overview of what big data is, how it differs from other data, what data scientists are and do, and what the benefits of data science can be.

- 3. Oxford English Dictionary:
    ◦ “An all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process using on-hand data management tools or traditional data processing applications”
  Defined by volume, variety, velocity
  2008 computer scientist predictions:
    ◦ Big Data will “transform the activities of companies, scientific researchers, medical practitioners, and our nation’s defense and intelligence operations”
  According to the New York Times:
    ◦ Big data science “typically means applying the tools of artificial intelligence, like machine learning, to vast new troves of data beyond that captured in standard databases”
- 4. Wider (more variables per observation)
  Longer (more observations)
  Wider and longer
  Complex subgroupings within wider or longer sets
  Many correlations
  Noisy
  Missing data
- 5. Computational challenges of storage and statistical program memory
    ◦ R's workspace on a laptop is limited to about 2 GB unless more RAM is added
    ◦ Algorithm computing time grows according to scaling rules, many of which are polynomial or exponential. Under quadratic scaling, for example, if 2 GB takes 4 minutes, then 4 GB takes 16 minutes…
  Statistical challenges from data structure
    ◦ Wide data violates many statistical assumptions.
    ◦ Correlations among predictors also violate statistical assumptions and create problems with the underlying linear algebra calculation methods.
    ◦ Potential for lots of informative missing data that can’t be imputed using existing statistical methods.
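The figures on this slide (4 minutes at 2 GB, 16 minutes at 4 GB) match a quadratic scaling rule, where doubling the data quadruples the compute time. The sketch below works through that arithmetic. It assumes runtime is proportional to data size raised to a fixed exponent; the function name `projected_minutes` and the exponents tried are illustrative and do not come from the slides.

```python
def projected_minutes(base_gb, base_minutes, new_gb, exponent):
    """Estimate runtime at new_gb, assuming time is proportional to size ** exponent."""
    return base_minutes * (new_gb / base_gb) ** exponent


# Baseline from the slide: 2 GB takes 4 minutes. Project the time for 4 GB.
linear = projected_minutes(2, 4, 4, exponent=1)     # doubling data doubles time
quadratic = projected_minutes(2, 4, 4, exponent=2)  # doubling data quadruples time
cubic = projected_minutes(2, 4, 4, exponent=3)      # doubling data multiplies time by 8

print(linear, quadratic, cubic)  # prints 8.0 16.0 32.0
```

Only the quadratic case reproduces the 16-minute figure. A truly exponential rule would grow faster still as the data size increases.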
- 6. More computing resources
    ◦ Expensive
    ◦ Cloud computing
    ◦ Does not solve statistical issues posed by big data
  New statistical methods
    ◦ Rely on a new set of tools from computer science
    ◦ Work around limitations of existing multivariate data analysis methods
    ◦ Don’t always scale as big data grows
        Still have computational issues
        Need for larger and larger training sets for good performance
- 7. Hadoop
    ◦ Open-source software for storage and processing of big data across computer cores/clusters
    ◦ Compatible with existing statistical software
  MapReduce
    ◦ Distributed computing strategy for big data processing and analyses
    ◦ Compute problem in parallel and combine final answers for shorter compute times
  SQL/NoSQL
    ◦ Database languages and tools (SQL for relational databases, NoSQL for non-relational stores) for:
        Database construction/modifications
        Pulling pieces of data for further analyses/reporting
  R
    ◦ Free open-source software with existing machine learning algorithms and coding environment to create and test new machine learning algorithms
  Simulations
    ◦ Use data structure and relationship rules to create a dataset with a pre-specified structure
    ◦ Allows for testing and validation of new algorithms against datasets with known answers
    ◦ Useful for comparing existing algorithms with new algorithms
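The MapReduce idea on this slide (compute pieces in parallel, then combine the partial answers) can be sketched on a single machine with Python's standard library. This is a toy illustration, not Hadoop's API. The word-count task, the function names `map_chunk`, `merge_counts`, and `word_count`, and the sample text are all assumptions made for the example, and threads stand in for the separate cluster nodes a real deployment would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines):
    """Map step: count the words in one chunk of the data."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial results into one."""
    return left + right


def word_count(chunks, workers=2):
    """Run the map step on each chunk concurrently, then reduce the partial counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge_counts, partials, Counter())


# Hypothetical data, already split into two chunks (one per worker).
chunks = [
    ["big data is big", "data science"],
    ["science of data"],
]
totals = word_count(chunks)
print(totals["data"], totals["big"], totals["science"])  # prints 3 2 2
```

The design point is that `map_chunk` never needs to see the whole dataset, so each chunk can live on a different machine and only the small partial counts have to be moved and merged.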
- 8. Statistics
    ◦ Hypothesis testing (parametric and nonparametric) and experimental design
    ◦ Generalized linear models
    ◦ Longitudinal, time series, and survival models
    ◦ Bayesian methods
  Mathematics
    ◦ Multivariable calculus
    ◦ Linear algebra
    ◦ Probability theory
    ◦ Optimization
    ◦ Graph theory/discrete math
    ◦ Real analysis/topology
  Machine learning
    ◦ Often considered a branch of statistics
    ◦ Supervised, unsupervised, and semi-supervised models
    ◦ Serve to extend statistical models and relax assumptions on data
    ◦ Includes algorithms from topological data analysis and network analysis
- 10. A professional who blends several different areas of expertise to draw insights from disparate data sources (particularly big data) such that inference can be made about specific problems/decisions within the field of application.
  Data science is a blend of statistical, machine learning, computer science, mathematical, and domain knowledge to leverage data for decision-making in that domain (business, medical, social media…).
- 11. Discuss problem with leadership to understand the problem and how results might be used.
    ◦ Providing a predictive algorithm that performs well but doesn’t provide insight into the problem might not be useful.
    ◦ There may be related items that leadership hasn’t considered, items that can enrich the project.
  Define data that needs to be pulled.
    ◦ May exist in database.
    ◦ May need to find elsewhere.
  Pull and clean data.
    ◦ Examine for errors or bias.
    ◦ Deal with missing data.
  Perform analyses and interpret output.
    ◦ Can be supervised (fit to outcome) or unsupervised (exploratory).
    ◦ Typically involves visualization of important results.
  Compile summary of actionable insights for leadership.
    ◦ Simplification
    ◦ Business value (no point in doing analysis if it can’t be implemented!)
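The "pull and clean data" step above can be sketched in a few lines. The example assumes a hypothetical `age` field, treats values outside a plausible range as data-entry errors, and fills missing values with the median of the remaining observations. The function name `clean_records` and the sample records are made up for illustration. Median imputation is only one simple choice, and it is reasonable only when values are missing roughly at random, not when the missingness is itself informative.

```python
from statistics import median


def clean_records(records, field, low, high):
    """Drop out-of-range values as errors, then fill missing values with the median."""
    valid = []
    for rec in records:
        value = rec.get(field)
        if value is not None and not (low <= value <= high):
            continue  # treat out-of-range values as data-entry errors
        valid.append(dict(rec))  # copy so the input records are left unchanged

    # Median of the observed (non-missing) values among the kept records.
    observed = [r[field] for r in valid if r.get(field) is not None]
    fill = median(observed)

    for rec in valid:
        if rec.get(field) is None:
            rec[field] = fill
    return valid


# Hypothetical records: one missing age and one implausible age.
records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": 240},
    {"id": 4, "age": 52},
]
cleaned = clean_records(records, "age", 0, 120)
print([(r["id"], r["age"]) for r in cleaned])
```

Here record 3 is dropped as an error and record 2 receives the median of the two remaining ages.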
- 12. Mathematical/Statistical Background
    ◦ Graduate degree, typically in mathematics/statistics, computer science, or engineering
    ◦ Training in machine learning and algorithm design
    ◦ Experience with R and SAS statistical languages/programs
  Computer Science Background
    ◦ Python/MATLAB/other high-level computing languages
    ◦ Hadoop/MapReduce concepts
    ◦ SQL or NoSQL coding for database extraction/management
    ◦ Experience with structured or unstructured data
    ◦ Data mining/algorithm design
  Field of Application Expertise
    ◦ Intellectual curiosity
    ◦ Understanding of the industry of application (marketing, medical, finance…)
    ◦ Communication skills to relate findings to non-technical leaders
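As a small illustration of the SQL extraction skill listed above, the sketch below builds a throwaway in-memory database with Python's built-in `sqlite3` module and pulls a summary for reporting. The `visits` table, its columns, and the rows are hypothetical and exist only for this example.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")

# Database construction: create a table and load a few hypothetical rows.
conn.execute("CREATE TABLE visits (patient_id INTEGER, clinic TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [(1, "north", 120.0), (2, "north", 80.0), (3, "south", 200.0)],
)

# Data pull: visit count and average cost per clinic, for further analysis.
rows = conn.execute(
    "SELECT clinic, COUNT(*), AVG(cost) FROM visits GROUP BY clinic ORDER BY clinic"
).fetchall()
conn.close()

print(rows)  # prints [('north', 2, 100.0), ('south', 1, 200.0)]
```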
- 13. From a quick Indeed.com search:
    ◦ Allstate Insurance
    ◦ Sprint
    ◦ Twitter
    ◦ APS Healthcare
    ◦ XOR Security
    ◦ LinkedIn
    ◦ IBM
    ◦ Intel
  Indeed.com search continued:
    ◦ Roche Pharmaceuticals
    ◦ Amazon
    ◦ Capital One
- 14. According to NewVantage and others:
    ◦ 2016 revenue gained from data science is estimated at $130.1 billion.
    ◦ This is expected to grow to $203 billion by 2020.
  Individual company results vary according to:
    ◦ Team talent and expertise
    ◦ Data collected (and quality of data)
    ◦ Competitor strengths in data science
  Current and projected shortages of those with analytics talent will impact the market.
    ◦ Hubs of data science are emerging outside California: Boston, New York, Austin, Chicago, Jacksonville, Tampa, Charlotte, Atlanta…
    ◦ Across industries: healthcare, tech, finance, energy…

- http://www.forbes.com/sites/gilpress/2014/09/03/12-big-data-definitions-whats-yours/
- Bryant, R., Katz, R. H., & Lazowska, E. D. (2008). Big-data computing: creating revolutionary breakthroughs in commerce, science and society.
- Lohr, S. (2012). How big data became so big. New York Times, 11.
- Cuzzocrea, A., Song, I. Y., & Davis, K. C. (2011, October). Analytics over large-scale multidimensional data: the big data revolution! In Proceedings of the ACM 14th international workshop on Data Warehousing and OLAP (pp. 101-104). ACM.
- Mayer-Schönberger, V., & Cukier, K. (2013). Big data: A revolution that will transform how we live, work, and think. Houghton Mifflin Harcourt.
- Brown, B., Chui, M., & Manyika, J. (2011). Are you ready for the era of ‘big data’. McKinsey Quarterly, 4, 24-35.
- Heidema, A. G., Boer, J. M., Nagelkerke, N., Mariman, E. C., & Feskens, E. J. (2006). The challenge for genetic epidemiologists: how to analyze large numbers of SNPs in relation to complex diseases. BMC genetics, 7(1), 23.
- Draper, N. R., Smith, H., & Pownell, E. (1966). Applied regression analysis (Vol. 3). New York: Wiley.
- Gopalkrishnan, V., Steier, D., Lewis, H., & Guszcza, J. (2012, August). Big data, big business: bridging the gap. In Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications (pp. 7-11). ACM.
- Bekkerman, R., Bilenko, M., & Langford, J. (Eds.). (2011). Scaling up machine learning: Parallel and distributed approaches. Cambridge University Press.
- Riesbeck, C. K. (1986). From conceptual analyzer to Direct Memory Access Parsing: an overview (chapter 8). Ellis Horwood Limited.
- Berry, M. W. (1992). Large-scale sparse singular value computations. The International Journal of Supercomputer Applications, 6(1), 13-49.
- Caporaso, J. G., Baumgartner Jr, W. A., Kim, H., Lu, Z., Johnson, H. L., Medvedeva, O., ... & Hunter, L. (2006). Concept Recognition, Information Retrieval, and Machine Learning in Genomics Question-Answering. In TREC.
- Madden, S. (2012). From databases to big data. IEEE Internet Computing, 16(3), 4-6.
- Agrawal, D., Das, S., & El Abbadi, A. (2011, March). Big data and cloud computing: current state and future opportunities. In Proceedings of the 14th International Conference on Extending Database Technology (pp. 530-533). ACM.
- http://www.kdnuggets.com/2014/11/9-must-have-skills-data-scientist.html