Data scientist


Published in: Education

DATA SCIENCE: MORE THAN MINING

“The sexiest job in the next 10 years will be statisticians.”
Hal Varian, Chief Economist, Google

While the concept of data science has been around for decades, the data scientist has only recently become a sought-after, in-demand career, giving rise to a new generation of data scientists.
This shift in technology development reflects the staggering growth of “big data.” Technology innovation and the World Wide Web provide for the growth of new types of data, such as user-generated content, and of tools that can be used to interpret it.
Social media platforms such as Facebook (the largest social network, valued at $52 billion) depend on data science to create innovative, interactive features that encourage users to get interested and stay that way. That dependence is why we know data science is important.
But what does the term “data science” really mean?

What is data science?
Data science can be broken down into four essential parts.
Mining data: Collecting and formatting the information.
Statistics: Information analysis.
Interpret: Representation or visualization in the form of presentations, infographics, graphs or charts.
Leverage: Implications of the data, application of the data, interaction using the data and predictions formed from studying it.

Defining a data scientist
A good data scientist understands the importance of:
Scouring: Their eyes search for information on the web.
Vectorized operations, algorithmic strategizing, APIs.
Organization: Their voice asks questions about what they hope to accomplish at the end of the project, setting information goals.
Expansion & Application: The appropriate data flows out of the person in the form of keywords, Facebook “Likes” and other statistics.
Extraction: They take the information they want and organize it using formulas. They organize the information in order to form educated, insightful conclusions using these statistical and mathematical methods:
Factor Analysis, Regression Analysis, Correlation, Time Series Analysis

Creating new theories and predictions based upon the data:
Ask questions to further expound upon the data beyond the reaches of hard numbers or facts.
Apply the information in a useful, innovative manner to applications whose success depends on data science.
Immediately process the terabytes of data that flow in to prevent pile-up and missed opportunities. For example, statistics regarding holiday shopping trends are imperative around the holiday season. If the statistics are processed and the conclusions are drawn too late, the season has passed and the information can no longer be utilized to its full potential.

Required skills for a data scientist
A successful data scientist must have a combination of skills that opens up possibilities both for that individual and their team. Visualization processes are often disjointed since each person is typically assigned to a specific part of the project. The designer depends on the information architect. The information architect depends on stats from the statistician, and so on. A true data scientist should be skilled in multiple areas.
Expertise in:
Hacking and Computer Science: Knowing how to take advantage of computers and the internet to create data-mining formulas.
Mathematics, Statistics, Data Mining: Pulling important statistics and coherently organizing them using mathematical prowess and computer formulas.
Creativity & Insight: Knowing what statistics are important and how to leverage them.

Dangers of data science
Statistics can be displayed in a misleading manner.
Leading the pollee: What type of question are you more likely to answer “yes” to?
“Should Americans be taxed so others can take advantage of welfare and avoid working?” (85% No)
“Should taxes support the government’s aid to those who are unable to find work?” (70% Yes)
Facts that are left out: Including only the starting and ending points of a data series can make the change seem more drastic than it is.
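The effect of quoting only the first and last data points can be sketched in a few lines of Python. The monthly figures below are invented purely for illustration, and the function names are hypothetical. The sketch compares the change implied by the two endpoints with the change between the averages of the first and second halves of the same series.

```python
# Hypothetical monthly sales figures, invented for illustration only.
monthly_sales = [100, 140, 95, 150, 90, 145, 105, 155, 98, 150, 110, 160]


def percent_change(start, end):
    """Percentage change from start to end."""
    return (end - start) / start * 100.0


def endpoint_change(series):
    """Change computed from the first and last points only."""
    return percent_change(series[0], series[-1])


def half_average_change(series):
    """Change between the mean of the first half and the mean of the second half."""
    mid = len(series) // 2
    first_mean = sum(series[:mid]) / mid
    second_mean = sum(series[mid:]) / (len(series) - mid)
    return percent_change(first_mean, second_mean)


if __name__ == "__main__":
    print(f"endpoints only: {endpoint_change(monthly_sales):.1f}%")
    print(f"half averages:  {half_average_change(monthly_sales):.1f}%")
```

With these made-up numbers the endpoints suggest growth of 60%, while the half-year averages differ by only about 8%, because the series happens to start on a low month and end on a high one.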
A collage of carefully selected information, such as a “9 of 10” claim, can be combined to induce a certain opinion.
Selection bias occurs when an unrepresentative population is used for a survey or study and the results are then advertised to consumers as if they represented the total population. A toothpaste brand's advertising is an example of how “studies” can often be weighted in a company's favor.
Ironically, facts and stats can be used to paint a very inaccurate, and damaging, picture of a business, organization or general topic.

Facts about data science
1790: The first big data collection project in history was the U.S. Census, which started in 1790.
5MB to 32GB: When hard drives were first invented, a 5 megabyte server took up roughly the space of a luxury refrigerator. Today, a 32 gigabyte micro-SD card measures around 5/8 x 3/8 inch and weighs about 0.5 grams.
When collecting mass quantities of data, some human remedial input is needed. This gave birth to crowdsourcing; the best example is Amazon's Mechanical Turk.
Modern collection of big data is possible with cloud computing, or the spreading of the data across several physical resources that can be accessed remotely, rather than concentrated at one location.

“The computing and processing of data is literally 100 to 1,000 times faster and cheaper than before.”
Scott Yara, Greenplum
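The selection bias described under “Dangers of data science” can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the population size, the share of existing customers and the recommendation rates are invented, and the function names are hypothetical. It shows how surveying only people who already buy a brand can produce a “9 of 10” figure that the whole population does not support.

```python
def build_population():
    """Return a hypothetical population as (buys_brand, recommends) pairs.

    All counts are invented: 300 existing customers, of whom 270 recommend
    the brand, and 700 non-customers, of whom 140 recommend it.
    """
    buyers = [(True, True)] * 270 + [(True, False)] * 30
    others = [(False, True)] * 140 + [(False, False)] * 560
    return buyers + others


def recommend_rate(people):
    """Share of the given people who recommend the brand."""
    return sum(1 for _, recommends in people if recommends) / len(people)


population = build_population()

# Unrepresentative sample: only people who already buy the brand.
biased_sample = [person for person in population if person[0]]

if __name__ == "__main__":
    print(f"whole population:        {recommend_rate(population):.0%}")
    print(f"existing customers only: {recommend_rate(biased_sample):.0%}")
```

In this made-up population, 90% of existing customers recommend the brand but only 41% of everyone does, so advertising the customer-only result as if it described the total population would be misleading.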