6. Big Data
[Figure: search term popularity of "Machine Learning", "Hadoop", and "Big Data" (fetched 12.9.14)]
7. Big Data – why?
Unlock the hidden information in data with advanced analytical methods.
New insights lead to competitive advantages.
8. Big Data - Industries
Healthcare, Academia, Finance, Manufacturing, HR
…you name it
9. Big Data – future driven
[Figure: business value plotted against costs/complexity. Labels: raw data, standard reports, ad hoc reports, standard stats (past driven); "whatever"; predictive analytics, Big Data (future driven)]
11. Big Data - Providers
…there are a lot of players…
12. Big Data – Definition
Volume
• Petabyte and more
Velocity
• Speed of generation of data
Variety
• Diverse categories
Definition: Gartner (2012), 3 V’s
13. Big Data
Volume
• Petabyte and more
Velocity
• Speed of generation of data
Variety
• Diverse categories
Current definition (3 V’s) + high expectations = misleading associations
14. Big Data – misleading associations
Big data = Data analysis
(extracting useful information needs
a vast amount of data)
15. Big Data – misleading associations
Big Data = Big company and big infrastructure
(Big Data is only an option for big companies)
16. Big Data
The common thinking about Big Data
leads to a digital “two-tier society”.
Big Data rich and Big Data poor institutions/companies
17. Big Data - Volume
Misconception #1: More data carry more insights.
1. Signal-to-Noise ratio can be worse
2. Strong but spurious correlations
3. Fooled by the curse of dimensionality
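The second point can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a result from the talk: all data is pure random noise, and the function names `pearson` and `max_abs_correlation`, the sample size, and the seed are hypothetical choices made for this illustration. The more unrelated features are screened against a target, the stronger the best correlation found purely by chance tends to be.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def max_abs_correlation(n_samples, n_features, seed=0):
    """Largest |correlation| between a random target and n_features
    random features. Everything is noise (assumption of this sketch),
    so any strong correlation found here is spurious by construction."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(feature, target)))
    return best


if __name__ == "__main__":
    # Same seed: the smaller feature sets are prefixes of the larger ones,
    # so the maximum can only stay equal or grow as features are added.
    for k in (10, 100, 1000):
        print(k, round(max_abs_correlation(n_samples=20, n_features=k), 2))
```

With few samples and many candidate features, the best chance correlation can look convincing even though no feature carries any signal, which is why adding more data columns does not by itself add insight.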
18. Big Data – Technology matters
Misconception #2: Technology matters most.
1. Algorithms do not generate knowledge
2. Technology for technology’s sake
3. Technology beats business
19. Big Data – Data Science
Misconception #3: Big Data projects generate facts.
1. Big Data is not a science
2. Whatever you do, you can’t predict the future
20. Data
The most relevant ingredients for a
successful “Big” Data project:
• Curiosity and creativity
• Carefully selected data (not necessarily big)
• A useful and strategically relevant business question
21. Data Scientist
From raw data to business insights!
Who can do this?
26. Data Science Team
[Figure: Data Science Team. Labels: Business Question, Data Acquisition, Data Normalization, Modeling, Model Assessment, Validation, Communication, Visualization; the range runs from number crunching to human interpretation]
27. Summary Big Data and Data Science
Takeaway message #1:
Methods and Algorithms developed within the
Big Data Hype are useful and work on smaller
data sets as well (sometimes even better).
28. Summary Big Data and Data Science
Takeaway message #2:
To successfully extract strategically relevant information
from your data you need a good mix of skills (team).
Develop in an explorative way, move fast, and fail early.
29. Summary Big Data and Data Science
Takeaway message #3:
Business domain knowledge is key.
30. Relevance for HR?
• Candidate does not see your job offer (time and location)
• Organization does not reach candidate (time and location)
31. Relevance for HR - Case
Possible business question:
What time is the right time to
proactively approach
a potential candidate?
32. too early… too late…
Job seeking activity patterns?
[Figure: candidate job seeking activities over time, ranging from passive to active]
33. Job seeking activity patterns?
[Figure: candidate job seeking activities over time. Labels: learn pattern, application, active phase, ‘sweet spot’]
38. Skill mixing (the nerd slide)
Example: team-up heterogeneous skill landscape
Candidate
Skills (measured)
[Figure: toy net to illustrate B-Rank. Circles represent hyperedges (users), squares are hypervertices, i.e. objects. Votes are illustrated as links between objects and users. A toy network is introduced to highlight various aspects of B-Rank. For simplicity all links between objects and users are equally weighted, w_i = 1 for all i.]
Blattner, M. (2009), 'B-Rank: A top N Recommendation Algorithm', CoRR abs/0908.2741.
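The toy network described in the caption can be written down as a small data structure. This is a minimal sketch of the representation only, not the B-Rank algorithm from the cited paper: the users `u1` to `u3`, the objects `A` to `C`, and the helper names `incidence_matrix` and `co_votes` are hypothetical and chosen for illustration. Every vote is a link with weight 1, as stated in the caption.

```python
# Hypothetical toy data: each user (hyperedge) votes for a set of
# objects (hypervertices). Every link has weight 1.
votes = {
    "u1": {"A", "B"},
    "u2": {"B", "C"},
    "u3": {"A", "B", "C"},
}


def incidence_matrix(votes):
    """Return (objects, users, matrix) with matrix[i][j] = 1 if
    user j voted for object i, else 0."""
    users = sorted(votes)
    objects = sorted(set().union(*votes.values()))
    matrix = [
        [1 if obj in votes[user] else 0 for user in users]
        for obj in objects
    ]
    return objects, users, matrix


def co_votes(matrix):
    """Entry [i][j] counts the users who voted for both object i and
    object j; the diagonal holds the number of votes per object."""
    n = len(matrix)
    return [
        [sum(a * b for a, b in zip(matrix[i], matrix[j])) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    objects, users, matrix = incidence_matrix(votes)
    for obj, row in zip(objects, co_votes(matrix)):
        print(obj, row)
```

The co-vote counts are just one simple quantity that can be read off such a network; the ranking itself is defined in the cited paper and is not reproduced here.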
So Big Data is about generating added value from data, in whatever way. Or, put differently, it is the hope of gaining a competitive advantage through the analysis of data.