A Topic Model of Analytics Job Adverts (Operational Research Society Annual Conference, Sept 2013)
This presentation reports recent research into definitions of analytics through analysis of related job adverts. The results help us identify a new categorisation of analytics methodologies, and we discuss the implications for the operational research community.

Transcript

  • 1. Agenda
      Problem Summary
          • Confusion about precise definition of analytics
          • Benefit of ‘practical’ definitions
          • Issues with the conventional ‘practical’ model of analytics
      Model Details
          • Data source: ‘analytics’ job adverts
          • Topic modeling & Latent Dirichlet Allocation
          • Model build & data pre-processing
      Implications
          • Model analysis
          • An alternative definition of analytics
          • Implications for OR/MS
  • 2. Analytics is …
      • … delivering the right decision support to the right people at the right time. (Laursen & Thorlund, 2010, p XII)
      • … the scientific process of transforming data into insight for making better decisions (INFORMS)
      • … [the] technologies, systems, practices, & applications to analyze critical business data so as to gain new insights (Lim et al, 2012)
      • … the extensive use of data, statistical & quantitative analysis, explanatory & predictive models, & fact-based management to drive decisions & actions. (Davenport & Harris, 2007, p 7)
      • … an outgrowth of what is known as business intelligence […] Today’s expansive, global enterprises generate a deluge of data that is impossible for a human to make sense of. (Varshney & Mojsilovic, 2011)
      • Analytics with a capital "A" is an umbrella term that represents our industry at a macro level, and analytics with a small "a" refers to technology used to analyze data. (Eckerson, 2011)
      • … information-intensive concepts and methods to improve business decision making. (Chiang et al, 2012)
      • … is the process of obtaining an optimal and realistic decision based on existing data (Hamel, 2011)
      • … data analysis that changes the behavior of the organization (Hackathorn, 2010)
      • … the science of analysis (Wikipedia)
      • … the method of logical analysis (Merriam-Webster)
      • … the brains to cloud computing’s brawn (Croll, 2011)
      • … the process of transforming data, from a variety of sources and of a variety of types, into insights that support, improve and/or automate business decisions, using technological, quantitative and presentation techniques (Mortenson et al, 2013)
      • … a group of approaches, organizational procedures and tools used in combination with one another to gain information, analyze that information, and predict outcomes of problem solutions (Trkman et al, 2010)
      • … the use of data, information technology, statistical analysis, quantitative methods, and mathematical or computer-based models to help managers gain improved insight about their business operations and make better, fact-based decisions (Evans, 2012)
      What is Analytics?
      • Many contrasting and often contradictory definitions
      • Particularly difficult to distinguish analytics from business intelligence or similar fields
      • Does it matter?
          • Potential confusion
          • As analytics is multi-disciplinary it is important that a common language can be established
          • Important so that the growing job market can be met with the appropriate training
  • 3. Analytics: Practical Definition
      Source: Blackett, 2012
      Advantages
          • Focuses on application & generation of value
          • Demonstrates the disciplines informing analytics
      Issues
          • Some methods suggest different purposes
          • Suggesting progression to prescriptive as advanced may not always hold
  • 4. Job Adverts
      • Analyse “analytics” job adverts, following the tradition of ‘ASP’ studies (e.g. Liberatore and Luo, 2012)
      • Instead of studying a smaller pool of jobs, we access adverts through the LinkedIn API
          • Over 250k jobs online
          • 77% of all jobs are posted on LinkedIn (Dougherty, 2012)
      • Scripted using Python & stored in MongoDB
          • OAuth, SimpleJSON, & PyMongo
      • Need to reduce and generalise results from >6,800 adverts with >50,000 unique words.
  • 5. Topic Models
      • Topic models assume documents to be a collection of latent topics. The topics determine which words are used
      • Probabilistic models that determine the topics by analysis of the co-occurrence of the words used
      • The most common are Probabilistic Latent Semantic Indexing (pLSI) and Latent Dirichlet Allocation (LDA)
  • 6. Latent Dirichlet Allocation (LDA)
      • Basic conception is that a collection of documents has three layers
      [Figure: plate diagram adapted from Blei et al, 2003. Labels: Documents (plate M), Words (plate N), Words W, Topics Z, Topic Distribution θ, Alpha Parameter α, Beta Parameter β]
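The layers named on this slide can be read as a recipe for generating documents. The following is a minimal sketch of that recipe in plain Python, not the authors' code. It assumes β acts as a symmetric Dirichlet prior over each topic's word distribution (the slide only calls it a parameter), and the function names and the fixed document length are hypothetical choices made for illustration.

```python
import random

def sample_dirichlet(concentration, rng):
    """Draw one Dirichlet sample by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in concentration]
    total = sum(draws)
    return [d / total for d in draws]

def sample_index(probs, rng):
    """Draw an index according to a discrete probability vector."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_corpus(vocab, n_topics, n_docs, doc_len, alpha, beta, seed=0):
    """Generate documents from the LDA generative story (illustrative only)."""
    rng = random.Random(seed)
    # Assumption: each topic's word distribution is drawn from Dirichlet(beta).
    topics = [sample_dirichlet([beta] * len(vocab), rng) for _ in range(n_topics)]
    corpus, thetas = [], []
    for _ in range(n_docs):                                  # M documents
        theta = sample_dirichlet([alpha] * n_topics, rng)    # topic distribution θ
        words = []
        for _ in range(doc_len):                             # N words per document
            z = sample_index(theta, rng)                     # latent topic Z
            w = sample_index(topics[z], rng)                 # observed word W
            words.append(vocab[w])
        corpus.append(words)
        thetas.append(theta)
    return corpus, thetas
```

Fitting LDA runs this story in reverse: only the words are observed, and the topics and topic distributions have to be inferred from them.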
  • 7. Latent Dirichlet Allocation - Process
      • Model is built by:
          1. Estimating topics as a product of observed words
          2. Using these to estimate document topic proportions
          3. Evaluating the corpus based on the distributions suggested in (1) & (2)
          4. Using (3) to improve the topic estimations in (1)
          5. Reiterating until the best fit is found
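The estimate, evaluate and refine loop above can be illustrated with a small sampler. The slides do not say which inference scheme was used, so the sketch below swaps in collapsed Gibbs sampling, a commonly used alternative, purely to show the iterative idea. It runs a fixed number of sweeps rather than testing for best fit, and the function name, hyperparameter defaults and return values are all assumptions.

```python
import random

def fit_lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA on tokenised documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]              # tokens per (doc, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]   # tokens per (topic, word)
    topic_total = [0] * n_topics                            # tokens per topic
    assignments = []
    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Weight of each topic given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights, k=1)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # Smoothed document-topic proportions from the final counts.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    return theta, topic_word, vocab
```

In practice a library implementation would be used on a corpus of this size. The pure-Python loop is only meant to make the update visible.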
  • 8. Latent Dirichlet Allocation - Assumptions
      • Bag-of-words / exchangeability
      • The number of topics (K) is known and pre-determined
          • Cross-validation to identify K with the lowest perplexity
      • Topic independence
          • As α is a parameter of a Dirichlet prior, each topic is assumed to be independent and not correlated
          • In this research correlation between topics has to be assumed.
          • Alternative is the correlated topic model (Blei & Lafferty, 2007), which uses a logistic normal rather than a Dirichlet distribution
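Perplexity, used above to choose K, is the exponential of the negative average log-likelihood per token, so lower values mean a better fit on held-out text. The sketch below shows the calculation under simplifying assumptions: documents are lists of word ids, topic proportions for the held-out documents have already been inferred, and every word has non-zero probability. The function names and the `scores` mapping are hypothetical.

```python
import math

def perplexity(docs, thetas, topic_word_probs):
    """Per-token perplexity of documents under a fitted topic model.

    docs: list of documents, each a list of integer word ids
    thetas: topic proportions for each document
    topic_word_probs[k][w]: probability of word w under topic k (assumed > 0)
    """
    log_likelihood = 0.0
    n_tokens = 0
    for doc, theta in zip(docs, thetas):
        for w in doc:
            # Mixture probability of the word across all topics.
            p_w = sum(theta[k] * topic_word_probs[k][w] for k in range(len(theta)))
            log_likelihood += math.log(p_w)
            n_tokens += 1
    return math.exp(-log_likelihood / n_tokens)

def pick_k(scores):
    """Return the K with the lowest held-out perplexity.

    scores: mapping from candidate K to its cross-validated perplexity.
    """
    return min(scores, key=scores.get)
```

A model that spreads probability uniformly over V words has perplexity V, which gives a useful sanity check when comparing candidate values of K.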
  • 9. Data Pre-Processing & Model Build
      • Strip HTML / XML
      • Remove stop words, numbers and punctuation
      • Remove words < 3 characters
      • Remove most and least frequent words
          • Python: HTMLParser, GenSim and String
          • R: TM and TopicModels
      • To stem or not to stem?
          • "the job involves managing analytics projects"
          • "the job involves the management of analytical projects"
          • "has experience running projects using management science and analytics"
          • "managing a team of scientists analysing the experience of runners"
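The cleaning steps listed on this slide could be sketched with the Python standard library alone, as below. This is an illustration rather than the pipeline used in the study: the stop-word set is a tiny placeholder, the frequency thresholds are arbitrary, the helper names are hypothetical, and no stemming is applied.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Placeholder stop-word list; a real run would use a much fuller one.
STOP_WORDS = {"the", "and", "for", "with", "you", "are", "will", "our"}

class _TextExtractor(HTMLParser):
    """Collects the text content of a document and ignores the tags."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_markup(raw):
    """Strip HTML / XML tags, keeping only the text."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    return " ".join(parser.parts)

def tokenise(raw):
    """Lower-case, keep alphabetic runs only, drop stop words and short words."""
    text = strip_markup(raw).lower()
    # Matching only letters discards numbers and punctuation in one step.
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if len(t) >= 3 and t not in STOP_WORDS]

def prune_by_document_frequency(docs, min_docs=2, max_share=0.9):
    """Drop words that appear in too few or too many documents."""
    doc_freq = Counter(t for doc in docs for t in set(doc))
    limit = max_share * len(docs)
    keep = {t for t, n in doc_freq.items() if min_docs <= n <= limit}
    return [[t for t in doc if t in keep] for doc in docs]
```

Leaving stemming out keeps "managing", "management" and "analytical" as separate terms, which is the trade-off the example sentences on the slide point at.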
  • 10. Topic Results
      • 30 topics identified
      • All topics are created equally but some are more topical than others
      [Figure: Most Likely Topic per Document as % of Corpus]
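The quantity charted on this slide can be derived from a fitted model's document-topic proportions: assign each document to its highest-probability topic, then count. A minimal sketch follows, assuming `thetas` is a list of per-document topic proportion lists. The function name is hypothetical.

```python
from collections import Counter

def dominant_topic_shares(thetas):
    """Share of documents whose most likely topic is each topic index."""
    # Index of the largest proportion in each document's topic distribution.
    dominant = [max(range(len(theta)), key=theta.__getitem__) for theta in thetas]
    counts = Counter(dominant)
    return {k: counts[k] / len(thetas) for k in sorted(counts)}
```

Topics that are never the most likely topic for any document do not appear in the result.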
  • 11. Most Likely Terms in Topics
      • Analysis of the 3rd, 4th & 5th most likely topics
      • Topic 3 (4th), Digital & Web (8%): other, media, across, working, understanding, analysis, social, projects, responsible, required, ensure, within, design, key, performance, digital, company, manager, products, their, lead, tools, role, services
      • Topic 13 (3rd), Consultancy (17%): working, market, develop, project, software, process, media, reporting, key, through, requirements, solutions, manager, excellent, your, strategy, multiple, more, service, opportunity, manage, well, opportunities, clients
      • Topic 9 (5th), Technical (7%): risk, systems, design, solutions, services, other, tools, technical, teams, related, provide, required, position, degree, such, operations, global, skills, project, opportunity, clients, service, excellent, products
  • 12. Most Likely Terms in Topics (cont.)
      • Analysis of the top two most likely topics
      • Topic 20 (1st), Strategic (41%): reporting, analysis, media, required, strategy, related, strategic, manager, company, degree, risk, online, products, across, drive, must, manage, responsible, well, financial, planning, industry, lead, software
      • Topic 21 (2nd), Computing (20%): services, solutions, technology, clients, digital, consulting, your, more, implementation, management, oracle, technical, capabilities, design, provide, advisory, strategy, integration, technologies, sap, career, enterprise, solution, architecture
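Term lists like the ones above are usually read off a fitted model by ranking each topic's word probabilities. A minimal sketch follows, assuming `topic_word_probs[k][i]` holds the probability of `vocab[i]` under topic k. Both names, and the function itself, are hypothetical.

```python
def top_terms(topic_word_probs, vocab, n=5):
    """Return the n most probable terms for every topic, most likely first."""
    result = []
    for probs in topic_word_probs:
        # Rank vocabulary positions by descending probability under this topic.
        ranked = sorted(range(len(vocab)), key=lambda i: probs[i], reverse=True)
        result.append([vocab[i] for i in ranked[:n]])
    return result
```

The topic labels themselves (Strategic, Computing and so on) are a human interpretation of these lists and are not produced by the model.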
  • 13. Model Analysis
      • Main five topics:
          • Technical
          • Digital/Web
          • Consultancy
          • Computing
          • Strategic
      • ‘Digital/Web’ is a specialism within analytics (also ‘Financial’)
      • ‘Technical’ & ‘Consultancy’ are specific job types or environments
          • However, some technical (‘hard’) skills & some consulting-type (‘soft’) skills are likely to be required in all analytics jobs
      • ‘Computing’ & ‘Strategic’?
  • 14. The Analytics of Computing?
      [Figure: Discovery Analytics. Axes: Basic Analytics Capability to Advanced Analytics Capability; Soft / Hard]
      Items shown: Data Warehouses; Big Data Architecture; Stock Market Analysis; Algorithmic Trading; Fraud Investigation; Automatic Fraud Detection; Customer Segmentation; Propensity Modeling; Clickstream Analysis; Behavioural Targeting; Qualitative Text Analysis; Natural Language Processing; Reports & Dashboards; Advanced Visualisation
  • 15. The Analytics of Strategy?
      [Figure: Decision Analytics. Axes: Basic Analytics Capability to Advanced Analytics Capability; Soft / Hard]
      Items shown: Trial & Error; Experimentation; Optimisation; Simulation; Basic Forecasting; ARIMA Time Series; Performance Metrics; Data Envelopment Analysis; A/B Testing; Multivariate Testing; Business Analysis; Business Process Optimisation; Requirements Gathering; Problem Structuring
  • 16. An Alternative Definition of Analytics
      • Descriptive Analytics: Statistical and data modeling techniques designed to describe past events and answer “what happened?”
      • Predictive Analytics: Data mining and machine learning techniques used to predict future events and answer “what will happen next?”
      • Prescriptive Analytics: OR/MS, advanced statistical and mathematical models used to prescribe future actions and answer “what should we do next?”
  • 17. An Alternative Definition of Analytics
      [Figure: quadrant diagram. Axes: Technological / Strategic; Lower Risk Decisions / Higher Risk Decisions. Quadrants: Discovery Analytics, Decision Analytics, Advanced Discovery Analytics, Advanced Decision Analytics]
      • Discovery Analytics and Decision Analytics: Reporting & alerts; Market research; Information systems; Basic historical analysis; Performance metrics; Stakeholder consultation
      • Advanced Discovery Analytics: Advanced visualisation; Real time insights; Automated decisions
      • Advanced Decision Analytics: Advanced modelling; Problem structuring; Decision analysis
  • 18. Summary & Implications for OR/MS
      • Implemented a correlated topic model on 6,873 job adverts
      • An alternative practical definition of analytics has been suggested: discovery and decision analytics
          • Maintains the focus on business value, application & the disciplines that inform analytics
          • However, removes the contradictions in the previous model
      • OR/MS has an obvious role in advanced decision analytics, both in hard and soft applications
      • Further exploration (and/or promotion) of the role of OR/MS in advanced discovery analytics
  • 19. Contact Details and Questions
      Email: m.j.mortenson@lboro.ac.uk
      Website: www.whatisanalytics.co.uk
      Mobile: 07833 XXXXXX
      LinkedIn: http://www.linkedin.com/profile/view?id=114000243&trk=tab_pro (or search Michael Mortenson)