What is a Data Scientist?
Authored By Russell Tibballs MACS CP MSR
What is a data scientist - a presentation I made to the Canberra IAPA

A presentation I made to the Canberra IAPA meeting a couple of months back

  • The term "data scientist" started to rise rapidly around 2011 and almost caught up with "statistician". Searches for "data scientist" surpassed the searches for "data miner" in 2012. The chart below shows Google Trends for "Statistician", "Data Scientist", and "Data Miner" from Jan 2008 to Dec 2013.
  • For Graph go to http://www.google.com/trends/explore#q=Statistician%2C%20%22Data%20Scientist%22%2C%20%22Data%20Miner%22&date=1%2F2007%2084m&cmpt=q

  • http://www.datascientists.net/what-is-data-science

    If this had stopped at the first couple of sentences I would have been happier.
  • I think the picture can do without the ‘go away if’ arm.

    All these things are good; however this expresses an ideal. Not a reality.

    For the quote. This has been written by a communications specialist who is telling someone ‘if you get the right person they will solve all your problems’. I believe in the Easter Bunny and Santa Claus too.

    On the communications side. A friend of the family works as a ‘Science Communicator’ for a large pharmaceutical company in London. Maybe if data science is that important, it will lead to ‘data communicators’ – possibly Nate Silver already falls in that camp.

    However, if you go to almost any profession there are those who can do; those who understand what they can do and can do it; and those who can do and communicate what it is they are doing. The communicator does not always rise to the top of the heap, as technicians tend to respect technical ability above communications ability.
  • It is impossible to have all these skills, however some of them would be useful. Many of these tools will quickly become redundant as newer tools and methods evolve. The data access components will be merged into simpler interfaces and existing tools.

    http://nirvacana.com/thoughts/becoming-a-data-scientist/

    To be fair, Swami indicates this is a roadmap to follow and is also getting people to think about what a data scientist is. He also indicates this is far from complete. I may be misinterpreting him; however it appears to me that you need to be an expert at each stop, which seems a tall ask. By the time you have learnt many of these skills, a fair percentage of what you have learnt will be redundant as new tools and techniques replace them. Which is one of the joys of working in this field; you will never have time to get bored as you need to maintain continual learning to stay relevant.

  • http://www.kdnuggets.com/2014/01/biernbaum-data-science-99-percent-too-fast.html

    From what I have seen, I tend to agree.

    To quote Stephen Brobst (Teradata CTO) 20140402 – Teradata Summit Series, ‘IT people love to chase shiny objects’.
  • I am talking about vendor presentations and a few from Industry Special Interest Groups.

    I believe in most organisations there are staff who can be moulded to fill the required Data Science capability.
  • http://www.abc.net.au/news/2011-12-21/albert-einstein-sticks-out-his-tongue-at-photographers/3742064

    I picked this photo because everyone thinks of Einstein when they think of science.

    Bottom right is Ed Diener, who has been studying well-being for decades.

    These guys graduated as Geologists from the University of Wisconsin – They are science graduates and recognised specialists.
  • From Wikipedia. Note the problem solving and decision making component. Using that definition, anyone with a substantial statistics and research component in their degree (such as maths, economics, science, and social science graduates) who works in an analytics capacity is a data scientist. Therefore if they are qualified and practicing in an information analytics capacity they are data scientists.

    The Wikipedia passage continues: ‘… data science and statisticians data scientists.[14] Later, he presented his lecture entitled “Statistics = Data Science?” as the first of his 1998 P.C. Mahalanobis Memorial Lectures.[15]’

    C.F. Jeff Wu is the Coca-Cola Chair in Engineering Statistics and Professor in the H. Milton Stewart School of Industrial and Systems Engineering at the Georgia Institute of Technology.

  • Peer review does not make so much sense outside the academic realm. However it does help get rid of silly mistakes and helps ensure that outcomes are repeatable. Good method will ensure that if you repeat a process you will get the same result. A surprisingly rare feat in the wilds of data analysis.

    Good analysis has purpose, context, and strong methodology. So outside academia it equates to the “action research cycle” with a ‘business’ focus.

    Action research cycle from http://creativeeducator.tech4learning.com/v07/articles/Embracing_Action_Research

  • Why am I bothering with this anecdote? It is designed to show the creep in requirements over time for a skill.

    PS: Thomas also noted that the so-called scientists no longer practise their craft; they spend their lives applying for grants and networking. The postgraduate students do the work. He spent his last years pre-vetting submissions for publication by doctoral students.
  • The most important part is specialties or streams, each with recognised levels of achievement and expertise. In terms of ‘Data Science’, not having a certification should not mean you cannot do certain work; it would mean only that the professional body is not endorsing your ability to do that work.
  • In Australia we have the Australian Qualifications Framework. ‘The AQF is the national policy for regulated qualifications in Australian education and training. It incorporates the qualifications from each education and training sector into a single comprehensive national qualifications framework. The AQF was first introduced in 1995 to underpin the national system of qualifications in Australia encompassing higher education, vocational education and training and schools.’

    Where existing frameworks are working, make use of them.

    http://www.aqf.edu.au/aqf/in-detail/aqf-levels/
  • This is where I see most vendor courses are sitting. They train to use a tool. In regard to ‘Data Science’, a course on Legal Privacy requirements at this level could and possibly should be compulsory.
  • Obviously people at this level in the hard and soft sciences have a demonstrated capacity to apply qualitative analysis, quantitative analysis, or both through the lens of the research cycle to provide significant insight. These people should be able to communicate exceptionally well. The argument often put forward is that their focus is narrow and that they should not be used outside that sphere.

    I once heard an Oxford professor state that ‘Oxford PhD graduates can learn any new subject and be an expert within 2 weeks’. Probably an exaggeration; however it does highlight that this level of achievement is generally an attribute of the graduate, showing a general ability to learn and communicate ideas at a high level.

    When I quizzed a number of speakers after presentations that bemoaned the lack of ‘data science candidates’, I would ask about the tens of thousands of higher-degree and research graduates. They would then agree that the problem is not so much a lack of ‘Data Scientists’ as a lack of managers who can comprehend what data scientists are talking about.
  • A graduate of the hard and soft sciences should be able to apply analytic tools to evaluate information and transmit solutions to complex problems. I believe this is the starting level for a Data Scientist.
  • Accreditation to use a tool is just that. It is not really a recognisable qualification. Often it is really telling you how to use a tool and little more. There are some vendors whose courses are embedded in university curricula. However that is not the norm.
  • This is the end of the equation where people should qualify as a Professional. Below that we are really applying a tool.

    There are many other academic streams that would fit into this model. Basically anything where you have to use data analysis to apply scientific method, for example psychology, engineering and others.
  • Transcript of "What is a data scientist - a presentation I made to the Canberra IAPA"

    1. What is a Data Scientist? Authored By Russell Tibballs MACS CP MSR
    2. Caveat
       This slideshow does not represent the views of the company I work for. It represents my evolving views at this point in time and is mainly intended to provoke thought and discussion.
    3. Google searches for “data scientist”
       2011 saw the rise of “Big Data” and the term Data Scientist. 2011 saw the release of Moneyball, starring Brad Pitt as a geek. In 2012 Nate Silver correctly predicted the winner of all 50 states and the District of Columbia when the pundits were claiming Obama had lost.
    4. So how is a Data Scientist portrayed - Super Human
       ‘Data Scientists perform data science. They use technology and skills to increase awareness, clarity and direction for those working with data. The data scientist role is here to accommodate the rapid changes that occur in our modern day environment and are bestowed the task of minimising the disruption that technology and data is having on the way we work, play and learn. Data Scientists don’t just present data, data scientists present data with an intelligence awareness of the consequences of presenting that data.’
    5. Super Human Continued
       A large IT company – ‘What sets the data scientist apart is strong business acumen, coupled with the ability to communicate findings to both business and IT leaders in a way that can influence how an organization approaches a business challenge. Good data scientists will not just address business problems, they will pick the right problems that have the most value to the organization.’
    6. The Super Technologist
    7. DATA SCIENCE
       Mark Biernbaum suggests ‘Data Science is going 99% too fast’. His complaint is that the ‘science’ is not peer-reviewed and the techniques are often questionable. He believes Data Scientists should slow down, specialize, and above all - have the methodologies peer-reviewed.
    8. My Problem with Current Definitions
       I have been to a number of industry briefings where supplied definitions are often very ‘pie in the sky’ and elitist. The definitions are designed to indicate ‘you can’t possibly do this yourself and there is no way any of your existing staff will qualify for the role’. This may not be intended; however it is the result.
    9. So What can we do about that?
       • Recognise that Data Science is a science that has a broad brush stroke across all industry sectors.
       • Recognise that there are many specialty areas.
       • Recognise that it is not a technological implementation.
       • Recognise that there will be many levels of expertise.
    10. Recognise that there are many specialty areas.
        There is not one version of data science.
        • There is data science applicable to the research sectors of Maths, Physics, Meteorology, and Medicine that will rarely be applied elsewhere.
        • There is data science our friends in the NSA and local equivalents will specialise in.
        • There is the data science that economists and the financial sector will specialise in.
        • Etc, etc … ad nauseam.
    11. Recognise that it is not a technological implementation.
        • Being able to query unstructured data in HDFS does not make you a data scientist.
        • Being able to analyse Splunk data does not make you a data scientist.
        • Being able to filter petabytes of data on an MPP RDBMS does not make you a data scientist.
    12. So What is a Scientist?
    13. The important aspects of any definition of a Job Title.
        • The most important thing to remember here is that we are talking about a Job Title, and a Job Title should be meaningful.
        • Secondly, what should qualify someone for that title?
    14. So what is important about the title ‘Data Scientist’?
    15. Where did this title originate?
        ‘On November 10, 1998, he (Jeff Wu) gave his inaugural lecture entitled “Statistics = Data Science?” in honor of his appointment to the H. C. Carver Collegiate Professorship in Statistics at the University of Michigan.[14] In this lecture, he first focused on the identity of statistics in science. He then characterized statistical work as data collection, data modeling and analysis, and problem solving and decision making. In conclusion, he proposed that statistics be renamed to Data Science.’
    16. So What is a Scientist?
        From the Oxford Dictionary: ‘A person who is studying or has expert knowledge of one or more of the natural or physical sciences: a research scientist’.
        Note. A scientist is not necessarily a research scientist; they can be a practicing expert in a field. However all scientists share one feature: they are trained in a science, they apply scientific method to obtain understanding of a focus of interest, and their methods and conclusions are subject to peer review.
    17. A comment from a recently retired Scientist
        My neighbor has recently retired after a long career as a scientist and academic. We were discussing the growing exclusivity of the term scientist a few weekends ago. In his words, ‘In the 1970s a scientist had a degree, by the mid 80s they needed honors, in the 90s they needed a masters or PhD, now they need several Post-Doctoral projects under their belt to be considered a ‘real’ scientist.’ However, he believes someone who is qualified (has a science degree) and who is practicing their studied discipline is a scientist.
    18. A Slight Detour. What qualifies a professional?
        • I see the Data Scientist as a specialty of the Computer Science profession.
        • We have lawyers who specialise in corporate, family, criminal, and other aspects of the law.
        • The accounting, architecture, engineering, teaching and medical professions have several specialties and recognised levels of expertise in each field.
        • These professionals have academic training, and in many cases acceptance by a professional body is what makes them acceptable as professionals in the public eye. That is a model I strongly believe the ICT industry needs to adopt or at least move towards.
        • I believe the academic achievement makes the qualification. The acceptance by a professional body should give standing within the profession and wider community.
    19. The Australian Qualifications Framework - AQF
        The AQF has 10 levels:
        • Level 1 – Certificate I
        • Level 2 – Certificate II
        • Level 3 – Certificate III
        • Level 4 – Certificate IV
        • Level 5 – Diploma
        • Level 6 – Advanced Diploma, Associate Degree
        • Level 7 – Bachelor Degree
        • Level 8 – Bachelor Honors Degree, Graduate Certificate, Graduate Diploma
        • Level 9 – Masters Degree
        • Level 10 – Doctoral Degree
    20. AT THE BOTTOM LEVEL OF THIS SPECTRUM OF QUALIFICATIONS.
        Summary: Graduates at this level will have knowledge and skills for initial work, community involvement and/or further learning.
        Knowledge: Graduates at this level will have foundational knowledge for everyday life, further learning and preparation for initial work.
        Skills: Graduates at this level will have foundational cognitive, technical and communication skills to:
        • undertake defined routine activities
        • identify and report simple issues and problems
        Application of knowledge and skills: Graduates at this level will apply knowledge and skills to demonstrate autonomy in highly structured and stable contexts and within narrow parameters.
    21. At the highest level of the spectrum of the AQF 10 – The Doctorate
        Summary: Graduates at this level will have systematic and critical understanding of a complex field of learning and specialised research skills for the advancement of learning and/or for professional practice.
        Knowledge: Graduates at this level will have systemic and critical understanding of a substantial and complex body of knowledge at the frontier of a discipline or area of professional practice.
        Skills: Graduates at this level will have expert, specialised cognitive, technical and research skills in a discipline area to independently and systematically:
        • engage in critical reflection, synthesis and evaluation
        • develop, adapt and implement research methodologies to extend and redefine existing knowledge or professional practice
        • disseminate and promote new insights to peers and the community
        • generate original knowledge and understanding to make a substantial contribution to a discipline or area of professional practice
        Application of knowledge and skills: Graduates at this level will apply knowledge and skills to demonstrate autonomy, authoritative judgment, adaptability and responsibility as an expert and leading practitioner or scholar.
    22. The Degree
        Summary: Graduates at this level will have broad and coherent knowledge and skills for professional work and/or further learning.
        Knowledge: Graduates at this level will have broad and coherent theoretical and technical knowledge with depth in one or more disciplines or areas of practice.
        Skills: Graduates at this level will have well-developed cognitive, technical and communication skills to select and apply methods and technologies to:
        • analyse and evaluate information to complete a range of activities
        • analyse, generate and transmit solutions to unpredictable and sometimes complex problems
        • transmit knowledge, skills and ideas to others
        Application of knowledge and skills: Graduates at this level will apply knowledge and skills to demonstrate autonomy, well-developed judgement and responsibility:
        • in contexts that require self-directed work and learning
        • within broad parameters to provide specialist advice and functions
    23. The Vendor’s Course
        • The vendor’s course will usually be about how to apply a tool to a problem.
        • It is not generally designed to provide you with knowledge that can be applied outside the scope of their tool’s environment.
        • It would generally not qualify within the AQF guidelines.
    24. So how does the AQF apply to the question of Data Science?
        If the person working in the field of applying ‘Data Science’ has a degree (AQF level 6 or above) in a related subject, i.e. Maths, Statistics, or Economics; or a higher degree including Grad Cert and Diplomas, they can be expected to apply knowledge and skills to demonstrate autonomy, well-developed judgment and responsibility:
        • in contexts that require self-directed work and learning
        • within broad parameters to provide specialist advice and functions
    25. Cui Bono? Who benefits from this approach?
        • The public – they will have greater confidence in the profession.
        • The employer – they get the assurance that the employee has the skills at the right levels to do the work.
        • The employee – because they will know what is expected of them and know they will be able to deliver.
        • The professional body and industry, through greater faith and confidence by the public in the profession in general.
    26. But!!!
        • There needs to be demand from within the industry for this to happen.
        • Some group like the IAPA needs to take on the responsibility of working out the professional specialisations and the required frameworks for acceptance of professionals into those specialisations.