Understanding Big Data Analytics - solutions for growing businesses - Rafał Małanij, GetInData
2. © Copyright. All rights reserved. Not to be reproduced without prior written consent.
■ 13+ yrs in IT
■ IT Service Management, Project Management,
Business development
■ Cloud Native, DevOps, Data Science, Big Data,
Genomics
■ Involved in:
● PyData Warsaw
● Data Science Summit
● DevOps Days Warsaw
● Cloud Native Warsaw
Rafał Małanij
rafal.malanij@getindata.com
3.
Founded in 2014 by
ex-Spotify engineers.
Focus only on Big Data and
Cloud (from day 1)
Community builders (Big Data
Tech Warsaw organizers)
60+ Big Data engineers
8.
● Volume
● Variety
● Velocity
● Veracity
● Value
Big Data
Source: Wikipedia
9.
60% - 85%
of Big Data projects fail
(Gartner 2016/2017)
10.
“Big data isn't a one-off project: It's a culture
of collecting, analyzing, and using data.”
Matt Asay, Infoworld.com
11.
“Technology is the engine of digital
transformation, data is the fuel, process is the
guidance system, and organizational change
capability is the landing gear.”
https://hbr.org/2020/05/digital-transformation-comes-down-to-talent-in-4-key-areas
12.
Data literacy
Data literacy is the ability to read, understand, create, and
communicate data as information.
13.
Data Collection → Data Storage → Processing → Delivery
Data sources:
● Clickstream
● Mobile apps
● Product systems
● Transaction system
● CRM
● Call center
● Workforce mgmt
14.
Data Lake
● Repository for raw data
● Various types of data
○ Structured
○ Semi-structured
○ Unstructured
○ Binary
● Historical data
vs.
15.
[Diagram: data platform components]
● Continuous Data Collection
● Data Lake
● Big Data Processing
● Data Governance
● Event Processing
● Feature engineering
● Interactive BI & Analytics
● Data Discovery
● Data Science
● Machine Learning
Across all components: Automation, Security, Monitoring, Orchestration
16.
Data lineage
● Where data comes from
● What happened / How it was transformed
● Where data is used
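The three lineage questions above can be sketched as a small data structure. This is only an illustration: the `LineageRecord` class, the `upstream` helper, and all dataset names are hypothetical and do not come from the talk or from any specific lineage tool.

```python
from dataclasses import dataclass, field


@dataclass
class LineageRecord:
    """Hypothetical lineage entry for one dataset."""
    dataset: str
    sources: list[str] = field(default_factory=list)          # where data comes from
    transformations: list[str] = field(default_factory=list)  # how it was transformed
    consumers: list[str] = field(default_factory=list)        # where data is used


def upstream(name: str, catalog: dict[str, LineageRecord]) -> set[str]:
    """Return every dataset that `name` depends on, directly or indirectly."""
    seen: set[str] = set()
    stack = [name]
    while stack:
        record = catalog.get(stack.pop())
        if record is None:
            continue  # raw source with no lineage entry of its own
        for src in record.sources:
            if src not in seen:
                seen.add(src)
                stack.append(src)
    return seen


# Illustrative catalog (made-up datasets).
catalog = {
    "daily_revenue": LineageRecord(
        dataset="daily_revenue",
        sources=["clean_transactions"],
        transformations=["sum(amount) grouped by day"],
        consumers=["revenue_dashboard"],
    ),
    "clean_transactions": LineageRecord(
        dataset="clean_transactions",
        sources=["raw_transactions", "crm_customers"],
        transformations=["drop duplicates", "join on customer_id"],
        consumers=["daily_revenue"],
    ),
}

print(sorted(upstream("daily_revenue", catalog)))
```

Walking the `sources` links answers "where does this come from?" for any dataset; the same structure read through `consumers` answers "where is it used?".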
17.
Degrees of intelligence
Competing on Analytics: The New Science of Winning
by Thomas H. Davenport, Jeanne G. Harris
Competitive advantage (highest first)
Analytics
🔴 Optimization: What’s the best that can happen?
🔴 Predictive modeling: What will happen next?
🔴 Forecasting/extrapolation: What if these trends continue?
🔴 Statistical analysis: Why is this happening?
Reporting
🔴 Alerts: What actions are needed?
🔴 Query/drill-down: Where exactly is the problem?
🔴 Ad-hoc reports: How many, how often, where?
🔴 Standard reports: What happened?
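To make the difference between a few of these levels concrete, here is a minimal sketch of a standard report, an alert, and a forecast/extrapolation over the same numbers. The order counts, the function names, and the 20% drop threshold are illustrative assumptions, not figures from the talk or the book.

```python
from statistics import mean

# Hypothetical monthly order counts (illustrative data only).
orders = [100, 110, 120, 130, 90]


def standard_report(values):
    """Standard report: what happened?"""
    return {"total": sum(values), "average": mean(values)}


def alert(values, drop_threshold=0.2):
    """Alert: is the latest value down by more than the threshold versus the previous one?"""
    previous, latest = values[-2], values[-1]
    return (previous - latest) / previous > drop_threshold


def forecast_next(values):
    """Forecasting/extrapolation: fit a least-squares line and evaluate it one step ahead."""
    n = len(values)
    xs = range(n)
    x_mean, y_mean = mean(xs), mean(values)
    slope = (
        sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
        / sum((x - x_mean) ** 2 for x in xs)
    )
    return y_mean + slope * (n - x_mean)


print(standard_report(orders))
print(alert(orders))
print(forecast_next(orders[:4]))
```

Each function needs more modeling than the one before it, which is the progression the list describes.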
18.
Data Science vs Machine Learning
19.
Machine Learning
20.
ML Lifecycle
21.
Machine Learning vs. A.I.
“Artificial intelligence is
the science and engineering
of making computers behave
in ways that, until recently,
we thought required human
intelligence.”
Andrew Moore,
Carnegie Mellon University
23.
DevOps vs DataOps
● Culture
● Automation
● Lean
● Measurement
● Sharing
+ Data quality
+ Manufacturing process
https://www.dataopsmanifesto.org/
26.
Technical
competences
Possibilities
Degrees of intelligence
Competing on Analytics: The New Science of Winning
by Thomas H. Davenport, Jeanne G. Harris
Competitive advantage (highest first)
Analytics
🔴 Optimization
🔴 Predictive modeling
🔴 Forecasting/extrapolation
🔴 Statistical analysis
Reporting
🔴 Alerts
🔴 Query/drill-down
🔴 Ad-hoc reports
🔴 Standard reports
27.
Interactive BI
● Reports
● Dashboards
● Drill-down reports
● SQL queries
● Tools: Excel, Power BI, QlikView, Tableau, Superset, Hive, Presto
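A report and its drill-down are often just two SQL queries at different levels of grouping. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for a real query engine; the `sales` table, its columns, and the values are made up for illustration.

```python
import sqlite3

# In-memory database as a stand-in for a real SQL engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, city TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("North", "Gdansk", 120.0),
        ("North", "Olsztyn", 80.0),
        ("South", "Krakow", 200.0),
        ("South", "Katowice", 50.0),
    ],
)

# Report level: totals per region.
by_region = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()


def drill_down(region):
    """Drill-down level: totals per city inside one region."""
    return conn.execute(
        "SELECT city, SUM(amount) FROM sales WHERE region = ? "
        "GROUP BY city ORDER BY city",
        (region,),
    ).fetchall()


print(by_region)
print(drill_down("North"))
```

A dashboard tool would typically generate queries of this shape when a user clicks from a regional total into its cities.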
28.
Data Science
● Transformed and Raw data
● Machine Learning
● Tools: Jupyter, Spark, Scala/Java, R, Python, TensorFlow, etc.
30.
Data Discovery
● Search tool for data
● What, where, who?
● Metadata
● Popularity score
● Quality and profiling
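As a rough illustration of the points above, a data discovery tool can be thought of as a search over dataset metadata, ranked by a popularity score. Everything in this sketch is hypothetical: the `DatasetEntry` fields, the use of query counts as the popularity score, and the sample entries.

```python
from dataclasses import dataclass


@dataclass
class DatasetEntry:
    """Hypothetical metadata record: what, where, who."""
    name: str
    owner: str
    description: str
    query_count: int  # assumed popularity signal: how often the dataset is queried


def search(catalog, term):
    """Return entries whose name or description mentions the term, most popular first."""
    term = term.lower()
    hits = [
        entry for entry in catalog
        if term in entry.name.lower() or term in entry.description.lower()
    ]
    return sorted(hits, key=lambda entry: entry.query_count, reverse=True)


# Illustrative catalog (made-up datasets and teams).
catalog = [
    DatasetEntry("user_sessions", "analytics-team", "Clickstream sessions per user", 420),
    DatasetEntry("raw_clickstream", "platform-team", "Raw clickstream events", 35),
    DatasetEntry("crm_customers", "sales-team", "Customer master data from CRM", 150),
]

results = search(catalog, "clickstream")
print([entry.name for entry in results])
```

Ranking by usage means the dataset most people already rely on shows up first, which is one simple way to read "popularity score".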
31.
Lexikon @ Spotify
● Library for data and insights
● Knowledge Mgmt tool
○ People
○ Description, stats
○ Tables, Queries
https://engineering.atspotify.com/2020/02/27/how-we-improved-data-discovery-for-data-scientists-at-spotify/
33.
Source: “Continuous Analytics: Stream Query Processing in Practice”, Michael J Franklin, Professor, UC Berkeley, Dec 2009, in
https://www.slideshare.net/JoshBaer/shortening-the-feedback-loop-big-data-spain-external
36.
Hidden Technical Debt in Machine Learning Systems -
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
40.
Dataism
“Dataism declares that the
universe consists of data flows,
and the value of any
phenomenon or entity is
determined by its contribution
to data processing,”
Yuval Noah Harari
“Homo Deus”.