Data Pipeline Architect
Data Pipelines
For small, messy and tedious data.
Vladislav Supalov, 27th October 2016
How to tell if this talk is for you?
● Big Data
○ Pretty fascinating
○ “Good problem to have”
● Most companies
○ Not quite there
○ Should not start at this level
● This is for you if you are close to the data at a:
○ Startup
○ Growing company
○ Established company which is about to start an initiative
● Working with a new CDO, CAO, Head of BI
I want to help you achieve better results!
● What will help you to deal with …?
○ small data (not much is needed to be valuable)
○ messy data (multiple data sources, no overview)
○ tedious-to-handle data (multiple data sources, lots of manual work)
● “Use <tech X> in <way Y> and you will be fine”. Nope.
○ Just dealing with data is not a magic bullet
○ This will not guarantee good results for your company
○ You might get lucky of course. That’s not a safe bet.
● How can we improve your chances? Reduce risk.
○ Focus on what matters
If we jump straight to tech, we dive too deep, too early.
● What people tend to think about first:
○ Dashboards
○ Tools
○ Technical solutions, best practices & tricks
● That’s tactics
● We should not jump into implementation details right away.
● Let’s not.
The Craft of Designing & Building Data Pipelines
Should start with understanding the business.
Hi, I’m Vladislav!
● Data background
○ Machine learning, computer vision, data mining
● Fascination with DevOps
○ Efficient, reliable infrastructure setups
○ Monitoring, automation, processes
● Currently: Co-founding a startup - Pivii Technologies
○ Startup, accelerated by Axel Springer Plug and Play
○ Artificial intelligence for content marketing
○ AI, ML, CV, data!
○ pivii.co
● Previously: Building a data engineering consulting business
○ datapipelinearchitect.com
vsupalov
Preferred consulting situation:
● Mobile application marketing agency
○ Not necessarily huge data
○ Very valuable and worthwhile (from a certain point)
● “We built prototype analytics tools in-house and they are mostly functional”
○ “We have seen the value!”
○ But they are painful to work with & broken
○ “Time and money are still being wasted.”
● Tools were created out of an actual need
○ Organically, little planning
○ “How can we do better?”
○ “Where do we go from here?”
Common Success Pattern: Business Value was Created.
Already achieved visible and measurable impact for the company.
Or they have gotten VERY close to doing so, and are thinking about ROI.
Business first. Tech follows.
● Key to successful data projects
○ Especially with limited resources
○ And small data
● Technical decisions should be informed by business needs and goals
● Handling data is a very small part of the whole
○ Straightforward once business needs are clear
● It starts with the mindset
○ Don't consider data plumbing in isolation
Key: being conscious and deliberate about the intention of
creating business value.
Let’s take a brief detour.
Consider sword fighting.
● A great samurai sword master
● 1584 - 1645
● Miyamoto Musashi
○ Martial artist
○ Tactician
○ Strategist
○ Artist
○ Sculptor
○ Calligrapher
○ Writer
○ Philosopher
○ ...
Images: Miyamoto Musashi, self-portrait, http://sv-musashi1.com/about_Musashi.htm,
Musashi Miyamoto with two Bokken, http://www.akinokai.org/images/Images.htm?Musashi.jpg
“The primary thing when you take a sword in your hands is
your intention to cut the enemy, whatever the means.”
- Miyamoto Musashi, The Book of Five Rings
“Whenever you parry, hit, spring, strike or touch the
enemy’s cutting sword, you must cut the enemy
in the same movement.”
- Miyamoto Musashi, The Book of Five Rings
“It is essential to attain this.
If you think only of hitting, springing, striking or touching
the enemy, you will not be able actually to cut him.”
- Miyamoto Musashi, The Book of Five Rings
“More than anything, you must be thinking
of carrying your movement through to cutting him.
You must thoroughly research this.”
- Miyamoto Musashi, The Book of Five Rings
The goal of sword fighting is to cut the opponent.
● Stating this makes it seem very obvious.
○ Why the effort and emphasis?
● It’s not obvious, even for aspiring practitioners.
○ Results suffer.
● Mindset is essential for mastery
● The core advice (to my understanding):
○ Attain, cultivate and apply a goal-oriented mindset
○ Aim every step you take towards the goal
Back to the world of data-handling businesses!
● When working with company data
○ Before starting out on a project
○ Understand what you want and can achieve
○ Aim to create a positive impact on the business
○ Make it a constant, conscious goal
● The main tasks to do so are:
○ Understand the business
○ Understand the people
■ It’s about communication
○ Understand current processes
○ Be prepared to learn and revise
Use this process when approaching a new project:
● Qualify client/project
○ Does it make sense to get involved?
○ Is it evident that we can create value?
● Perform conversations/interviews
○ Find out more about the context
■ company, status, goals, limitations...
○ Learn from first-hand experience
● Summarize information, learnings and plans in writing
○ Roadmap document
○ Depicting the situation and ways forward
Is there potential
for a good fit?
Do budget, topic and goals seem in order?
Qualifying considerations. Learning about the client and project.
● What are you working on?
● What part of the project would you like help with?
● What needs to happen to make this a success for you?
● Why was this project started? What are the business goals?
● Is there an event that triggered it?
● Why especially now?
● What’s the budget? (ballpark estimate)
● When are you looking to get started?
Still good? Let’s start a
business relationship.
Initial research and planning. Roadmapping consulting package.
Four people to talk to:
● Project owner
○ We want this guy to be successful
● Business owner or C-level perspective
○ Knows what’s best for the business
○ "What could the CEO ask you in the hallway?"
● Data wrangler - tales from the trenches
○ Insights into day-to-day business and data details
● Engineering Side
○ Current tech stack
○ Information on constraints and preferences
○ Final touches
● Conversation focus, questions and duration vary from person to person.
Interviews completed, situation understood and put into writing.
● After a bit of focused communication, we have a great foundation:
○ Project motivation
○ Business goals
○ Who should benefit
○ How to make it happen
● Different perspectives on the project and business.
● Time for tech!
○ Context clear (goals, constraints)
● Best case:
○ Very few choices left to make
Here’s what I would have told myself when starting out:
● Learn about the company
○ Easier with fresh eyes
● Understand the business
○ Multiple perspectives
● Keep the goal in mind
○ Helps you learn the right things
○ Cultivate a business mindset (help earn more/lose less)
○ Aim for results
■ I will not stop saying this anytime soon :)
● Have a process laid out
Finally: Tactical Advice Which Fits the Remaining Time.
That’s the right proportion :)
Don’t roll your own home-baked scripts.
● "Quick and easy" isn't
● Uniqueness is bad, boring is good
○ Learning curve for others
○ Original author leaving
○ Maintenance time, tricky bugs, code duplication
○ Unexpected failure modes
● Extensibility?
● Growth?
● Metadata?
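To make the "unexpected failure modes" point concrete, here is a minimal Python sketch of one classic trap in home-baked scripts. A script that writes straight to its final output path leaves a half-written file behind when it crashes, and the next step may treat that file as finished. Writing to a temporary file and renaming it into place afterwards avoids this. On POSIX systems `os.replace` is atomic when both paths are on the same filesystem. All function names and the sample rows are hypothetical; workflow engines tend to take care of details like this for you.

```python
import os
import tempfile


def naive_export(path, lines):
    # Writes straight to the final path: a crash mid-loop leaves a
    # partial file that later steps may mistake for a finished export.
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")


def atomic_export(path, lines):
    # Write into a temporary file in the same directory, then rename it
    # into place only after every line was written successfully.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            for line in lines:
                f.write(line + "\n")
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)  # leave no half-written debris behind
        raise


def rows_then_crash():
    # Hypothetical flaky source: yields two rows, then the connection drops.
    yield "2016-10-27,120"
    yield "2016-10-28,95"
    raise ConnectionError("source connection dropped")
```

Feeding `rows_then_crash()` to both functions leaves a truncated file from `naive_export`, while `atomic_export` leaves the target path untouched, so the failure is visible instead of silent.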
You should know about workflow engines.
● Workflow = “[..] orchestrated and repeatable pattern of business activity [..]” [1]
● Data flow = “bunch of data processing tasks with inter-dependencies” [2]
● Pipelines of batch jobs
○ complex, long-running
● Dependency management
● Reusability of intermediate steps
● Logging and alerting
● Failure handling
● Monitoring
● Lots of effort went into them (Broken data? Crashes? Partial failures?)
[1] https://en.wikipedia.org/wiki/Workflow
[2] Elias Freider, 2013, “Luigi - Batch Data Processing in Python”
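A few lines of Python can illustrate what a workflow engine does at its heart: dependency management, reuse of finished steps, failure handling and logging. This is a toy under stated assumptions, not how Luigi or any other engine is implemented. `run_workflow` and the task names are hypothetical, "already complete" is tracked in an in-memory set, and retries happen immediately. Real engines persist that state and add scheduling, monitoring and alerting on top.

```python
import logging
from graphlib import TopologicalSorter  # standard library since Python 3.9

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("toy_workflow")


def run_workflow(tasks, deps, done, max_retries=2):
    """Run tasks in dependency order, reusing finished steps and retrying failures.

    tasks: task name -> zero-argument callable that does the work
    deps:  task name -> set of task names that must finish first
    done:  set of task names already completed; updated in place
    """
    # static_order() yields every task after all of its dependencies
    # and raises graphlib.CycleError if the dependencies form a loop.
    for name in TopologicalSorter(deps).static_order():
        if name in done:
            log.info("skip %s (already complete)", name)
            continue
        for attempt in range(1, max_retries + 2):
            try:
                tasks[name]()
                break
            except Exception:
                log.exception("%s failed on attempt %d", name, attempt)
                if attempt > max_retries:
                    raise  # give up; downstream tasks never start
        done.add(name)
        log.info("done %s", name)
    return done


if __name__ == "__main__":
    # Hypothetical three-step pipeline: extract -> clean -> report.
    steps = {
        "extract": lambda: print("pulling raw exports"),
        "clean": lambda: print("normalizing columns"),
        "report": lambda: print("writing summary"),
    }
    graph = {"extract": set(), "clean": {"extract"}, "report": {"clean"}}
    finished = run_workflow(steps, graph, done=set())
    run_workflow(steps, graph, done=finished)  # second run skips everything
```

Even this toy shows why the features belong together: retries are only safe when completed steps are recorded, and recorded steps are what make intermediate results reusable.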
If in doubt, try Luigi.
● Spotify
○ Lots of data!
○ 10k+ Hadoop jobs every day [1]
● Battle hardened
○ Published 2009
○ Has been used in production by large companies for a while
● Python
● Modular & extensible
● Dependency graph
● Not just for data tasks
[1] Erik Bernhardsson, 2013, “Building Data Pipelines with Python and Luigi”
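Luigi structures work as task classes with `requires()`, `output()` and `run()` methods, and by default a task counts as complete once its outputs exist. The stdlib-only toy below mimics that shape so the idea can run without installing anything. It is not Luigi's API: `Task`, `build`, `FetchSignups` and `CountSignups` are hypothetical names, and the sample data is made up. The real library adds task parameters, a scheduler and much more.

```python
import os
import tempfile


class Task:
    """Toy task shaped like a Luigi task; NOT Luigi's actual API."""

    def requires(self):
        return []  # upstream Task instances this task depends on

    def output(self):
        raise NotImplementedError  # path of the file this task produces

    def run(self):
        raise NotImplementedError  # does the actual work

    def complete(self):
        # A task counts as done once its output file exists.
        return os.path.exists(self.output())


def build(task):
    """Walk the dependency graph depth-first, running only incomplete tasks."""
    for upstream in task.requires():
        build(upstream)
    if not task.complete():
        task.run()


WORKDIR = tempfile.mkdtemp()  # hypothetical scratch directory for outputs


class FetchSignups(Task):
    # Hypothetical source step: stands in for an export or API pull.
    def output(self):
        return os.path.join(WORKDIR, "signups.csv")

    def run(self):
        with open(self.output(), "w", encoding="utf-8") as f:
            f.write("2016-10-27,12\n2016-10-28,7\n")


class CountSignups(Task):
    def requires(self):
        return [FetchSignups()]

    def output(self):
        return os.path.join(WORKDIR, "signup_total.txt")

    def run(self):
        with open(self.requires()[0].output(), encoding="utf-8") as f:
            total = sum(int(line.split(",")[1]) for line in f)
        with open(self.output(), "w", encoding="utf-8") as f:
            f.write(str(total))
```

Calling `build(CountSignups())` runs both steps the first time. Because completeness is judged by whether an output exists, deleting or keeping an intermediate file decides what gets recomputed on the next build.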
Usually worthwhile pipeline properties:
● Keep it small and lean
● Make learning and iterating easy
○ Changes should be cheap to accommodate (in both time and money)
● Build something to start learning
● Get data into one place
● Don’t reinvent the wheel
○ The tools are out there
○ ETL and workflow engines
● Create quick positive results, be efficient (lazy)
○ Many small improvements everywhere
○ Instead of solving everything for one group
○ More bang-for-the-buck
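For "get data into one place", a single SQLite file can be a reasonable first step when data is small. The sketch below assumes two hypothetical sources of app install counts: a CSV export and a JSON API response, each with different field names. It maps both into one table. The table, column and field names are illustrative, and SQLite stands in for whatever database you eventually choose.

```python
import csv
import io
import json
import sqlite3


def load_into_one_place(conn, csv_text, json_text):
    """Map two differently shaped install reports into one `installs` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS installs"
        " (day TEXT, source TEXT, n_installs INTEGER)"
    )
    # Hypothetical source A: CSV export with "Date" and "Installs" columns.
    for row in csv.DictReader(io.StringIO(csv_text)):
        conn.execute(
            "INSERT INTO installs VALUES (?, ?, ?)",
            (row["Date"], "store_export", int(row["Installs"])),
        )
    # Hypothetical source B: JSON list with "day" and "installs" fields.
    for record in json.loads(json_text):
        conn.execute(
            "INSERT INTO installs VALUES (?, ?, ?)",
            (record["day"], "ad_network", int(record["installs"])),
        )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_into_one_place(
        conn,
        "Date,Installs\n2016-10-27,120\n",
        '[{"day": "2016-10-27", "installs": 30}]',
    )
    print(conn.execute(
        "SELECT day, SUM(n_installs) FROM installs GROUP BY day"
    ).fetchall())  # [('2016-10-27', 150)]
```

Once both sources sit in one table, questions that used to require manual spreadsheet merging become single queries, which is where the quick positive results tend to come from.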
In conclusion:
● Don’t dive into tactics right away
● Aim to create business value
○ Make it a conscious goal
● Understand the business, people and processes
○ This will take some time. It’s a good investment.
○ Have a process yourself
○ Tech choices will follow
● Try to make it easy to learn and iterate
● Get data in one place
● Don’t go with home-baked scripts
● Consider workflow engines
○ Luigi in particular
Thanks! Want to learn more?
“What questions to ask? Am I missing something?”
For your future interviews and planning:
I want to share my seed-question lists with you!
Just drop me your email address at:
http://datapipelinearchitect.com/datanatives/

Data Natives Frankfurt v 11.0 | "Competitive advantages with knowledge graphs...
 
Data Natives Frankfurt v 11.0 | "Can we be responsible for misuse of data & a...
Data Natives Frankfurt v 11.0 | "Can we be responsible for misuse of data & a...Data Natives Frankfurt v 11.0 | "Can we be responsible for misuse of data & a...
Data Natives Frankfurt v 11.0 | "Can we be responsible for misuse of data & a...
 
Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data ...
Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data ...Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data ...
Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data ...
 
Data Natives meets DataRobot | "Build and deploy an anti-money laundering mo...
Data Natives meets DataRobot |  "Build and deploy an anti-money laundering mo...Data Natives meets DataRobot |  "Build and deploy an anti-money laundering mo...
Data Natives meets DataRobot | "Build and deploy an anti-money laundering mo...
 
Data Natives Munich v 12.0 | "Political Data Science: A tale of Fake News, So...
Data Natives Munich v 12.0 | "Political Data Science: A tale of Fake News, So...Data Natives Munich v 12.0 | "Political Data Science: A tale of Fake News, So...
Data Natives Munich v 12.0 | "Political Data Science: A tale of Fake News, So...
 
Data Natives Vienna v 7.0 | "Building Kubernetes Operators with KUDO for Dat...
Data Natives Vienna v 7.0  | "Building Kubernetes Operators with KUDO for Dat...Data Natives Vienna v 7.0  | "Building Kubernetes Operators with KUDO for Dat...
Data Natives Vienna v 7.0 | "Building Kubernetes Operators with KUDO for Dat...
 
Data Natives Vienna v 7.0 | "The Ingredients of Data Innovation" - Robbert de...
Data Natives Vienna v 7.0 | "The Ingredients of Data Innovation" - Robbert de...Data Natives Vienna v 7.0 | "The Ingredients of Data Innovation" - Robbert de...
Data Natives Vienna v 7.0 | "The Ingredients of Data Innovation" - Robbert de...
 
Data Natives Cologne v 4.0 | "The Data Lorax: Planting the Seeds of Fairness...
Data Natives Cologne v 4.0  | "The Data Lorax: Planting the Seeds of Fairness...Data Natives Cologne v 4.0  | "The Data Lorax: Planting the Seeds of Fairness...
Data Natives Cologne v 4.0 | "The Data Lorax: Planting the Seeds of Fairness...
 
Data Natives Cologne v 4.0 | "How People Analytics Can Reveal the Hidden Aspe...
Data Natives Cologne v 4.0 | "How People Analytics Can Reveal the Hidden Aspe...Data Natives Cologne v 4.0 | "How People Analytics Can Reveal the Hidden Aspe...
Data Natives Cologne v 4.0 | "How People Analytics Can Reveal the Hidden Aspe...
 
Data Natives Amsterdam v 9.0 | "Ten Little Servers: A Story of no Downtime" -...
Data Natives Amsterdam v 9.0 | "Ten Little Servers: A Story of no Downtime" -...Data Natives Amsterdam v 9.0 | "Ten Little Servers: A Story of no Downtime" -...
Data Natives Amsterdam v 9.0 | "Ten Little Servers: A Story of no Downtime" -...
 
Data Natives Amsterdam v 9.0 | "Point in Time Labeling at Scale" - Timothy Th...
Data Natives Amsterdam v 9.0 | "Point in Time Labeling at Scale" - Timothy Th...Data Natives Amsterdam v 9.0 | "Point in Time Labeling at Scale" - Timothy Th...
Data Natives Amsterdam v 9.0 | "Point in Time Labeling at Scale" - Timothy Th...
 
Data Natives Hamburg v 6.0 | "Interpersonal behavior: observing Alex to under...
Data Natives Hamburg v 6.0 | "Interpersonal behavior: observing Alex to under...Data Natives Hamburg v 6.0 | "Interpersonal behavior: observing Alex to under...
Data Natives Hamburg v 6.0 | "Interpersonal behavior: observing Alex to under...
 
Data Natives Hamburg v 6.0 | "About Surfing, Failing & Scaling" - Florian Sch...
Data Natives Hamburg v 6.0 | "About Surfing, Failing & Scaling" - Florian Sch...Data Natives Hamburg v 6.0 | "About Surfing, Failing & Scaling" - Florian Sch...
Data Natives Hamburg v 6.0 | "About Surfing, Failing & Scaling" - Florian Sch...
 
Data NativesBerlin v 20.0 | "Serving A/B experimentation platform end-to-end"...
Data NativesBerlin v 20.0 | "Serving A/B experimentation platform end-to-end"...Data NativesBerlin v 20.0 | "Serving A/B experimentation platform end-to-end"...
Data NativesBerlin v 20.0 | "Serving A/B experimentation platform end-to-end"...
 
Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - A...
Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - A...Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - A...
Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - A...
 
Big Data Frankfurt meets Thinkport | "The Cloud as a Driver of Innovation" - ...
Big Data Frankfurt meets Thinkport | "The Cloud as a Driver of Innovation" - ...Big Data Frankfurt meets Thinkport | "The Cloud as a Driver of Innovation" - ...
Big Data Frankfurt meets Thinkport | "The Cloud as a Driver of Innovation" - ...
 
Thinkport meets Frankfurt | "Financial Time Series Analysis using Wavelets" -...
Thinkport meets Frankfurt | "Financial Time Series Analysis using Wavelets" -...Thinkport meets Frankfurt | "Financial Time Series Analysis using Wavelets" -...
Thinkport meets Frankfurt | "Financial Time Series Analysis using Wavelets" -...
 
Big Data Helsinki v 3 | "Distributed Machine and Deep Learning at Scale with ...
Big Data Helsinki v 3 | "Distributed Machine and Deep Learning at Scale with ...Big Data Helsinki v 3 | "Distributed Machine and Deep Learning at Scale with ...
Big Data Helsinki v 3 | "Distributed Machine and Deep Learning at Scale with ...
 
Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...
Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...
Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...
 

Recently uploaded

Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 

Recently uploaded (20)

Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 

"Data Pipelines for Small, Messy and Tedious Data", Vladislav Supalov, CAO & Co-Founder of Pivii Technologies

  • 1. Data Pipeline Architect
    Data Pipelines
    For small, messy and tedious data.
    Vladislav Supalov, 27th October 2016
  • 2. How to tell if this talk is for you?
    ● Big Data
      ○ Pretty fascinating
      ○ “Good problem to have”
    ● Most companies
      ○ Not quite there
      ○ Should not start at this level
    ● This is for you if you are close to the data at a
      ○ Startup
      ○ Growing company
      ○ Established company which is about to start an initiative
    ● Working with a new CDO, CAO, Head of BI
  • 3. I want to help you achieve better results!
    ● What will help you deal with …?
      ○ small data (not much is needed to be valuable)
      ○ messy data (multiple data sources, no overview)
      ○ tedious-to-handle data (multiple data sources, lots of manual work)
    ● “Use <tech X> in <way Y> and you will be fine.” Nope.
      ○ Just dealing with data is not a magic bullet
      ○ It will not guarantee good results for your company
      ○ You might get lucky, of course. That’s not a safe bet.
    ● How can we improve your chances? Reduce risk.
      ○ Focus on what matters
  • 4. Jumping straight to tech, we would dive too deep, too early.
    ● What people tend to think about first:
      ○ Dashboards
      ○ Tools
      ○ Technical solutions, best practices & tricks
    ● That’s tactics
    ● We should not jump into implementation details right away.
    ● Let’s not.
  • 5. The Craft of Designing & Building Data Pipelines
    Should start with understanding the business.
  • 6. Hi, I’m Vladislav!
    ● Data background
      ○ Machine learning, computer vision, data mining
    ● Fascination with DevOps
      ○ Efficient, reliable infrastructure setups
      ○ Monitoring, automation, processes
    ● Currently: Co-founding a startup - Pivii Technologies
      ○ Startup, accelerated by Axel Springer Plug and Play
      ○ Artificial intelligence for content marketing
      ○ AI, ML, CV, data!
      ○ pivii.co
    ● Previously: Building a data engineering consulting business
      ○ datapipelinearchitect.com
    vsupalov
  • 7. Preferred consulting situation:
    ● Mobile application marketing agency
      ○ Not necessarily huge data
      ○ Very valuable and worthwhile (from a certain point)
    ● “We built prototype analytics tools in-house and they are mostly functional”
      ○ “We have seen the value!”
      ○ But they are painful to work with & broken
      ○ “Time and money are still being wasted.”
    ● Tools were created out of an actual need
      ○ Organically, little planning
      ○ “How can we do better?”
      ○ “Where do we go from here?”
  • 8. Common Success Pattern: Business Value was Created.
    Already achieved visible and measurable impact for the company.
    Or have gotten VERY close to doing so.
    Are thinking about ROI.
  • 9. Business first. Tech follows.
    ● Key to successful data projects
      ○ Especially with limited resources
      ○ And small data
    ● Technical decisions should be informed by business needs and goals
    ● Handling data is a very small part of the whole
      ○ Straightforward once business needs are clear
    ● It starts with the mindset
      ○ Don't consider data plumbing in isolation
  • 10. Key: being conscious and deliberate about the intention of creating business value.
    Let’s take a brief detour.
  • 11. Consider sword fighting.
    ● A great samurai sword master
    ● 1584 - 1645
    ● Miyamoto Musashi
      ○ Martial artist
      ○ Tactician
      ○ Strategist
      ○ Artist
      ○ Sculptor
      ○ Calligrapher
      ○ Writer
      ○ Philosopher
      ○ ...
    Images: Miyamoto Musashi, self-portrait, http://sv-musashi1.com/about_Musashi.htm; Musashi Miyamoto with two Bokken, http://www.akinokai.org/images/Images.htm?Musashi.jpg
  • 12. “The primary thing when you take a sword in your hands is your intention to cut the enemy, whatever the means.” - Miyamoto Musashi, The Book of Five Rings
  • 13. “Whenever you parry, hit, spring, strike or touch the enemy’s cutting sword, you must cut the enemy in the same movement.” - Miyamoto Musashi, The Book of Five Rings
  • 14. “It is essential to attain this. If you think only of hitting, springing, striking or touching the enemy, you will not be able actually to cut him.” - Miyamoto Musashi, The Book of Five Rings
  • 15. “More than anything, you must be thinking of carrying your movement through to cutting him. You must thoroughly research this.” - Miyamoto Musashi, The Book of Five Rings
  • 16. The goal of sword fighting is to cut the opponent.
    ● Stating this makes it seem very obvious.
      ○ Why the effort and emphasis?
    ● It’s not obvious. Not even for aspiring practitioners.
      ○ Results suffer.
    ● Mindset is essential for mastery
    ● The core advice (to my understanding):
      ○ Attain, cultivate and apply a goal-oriented mindset
      ○ Aim every step you take toward the goal
  • 17. Back to the world of data-handling businesses!
    ● When working with company data
      ○ Before starting out on a project
      ○ Understand what you want and can achieve
      ○ Aim to create a positive impact on the business
      ○ Make it a constant, conscious goal
    ● The main tasks to do so are:
      ○ Understand the business
      ○ Understand the people
        ■ It’s about communication
      ○ Understand current processes
      ○ Be prepared to learn and revise
  • 18. Use this process when approaching a new project:
    ● Qualify client/project
      ○ Does it make sense to get involved?
      ○ Is it evident that we can create value?
    ● Conduct conversations/interviews
      ○ Find out more about the context
        ■ company, status, goals, limitations...
      ○ Learn from first-hand experience
    ● Summarize information, learnings and plans in writing
      ○ Roadmap document
      ○ Depicting the situation and ways forward
  • 19. Is there potential for a good fit?
    Do budget, topic and goals seem in order?
  • 20. Qualifying considerations. Learning about the client and project.
    ● What are you working on?
    ● What part of the project would you like help with?
    ● What needs to happen to make this a success for you?
    ● Why was this project started? What are the business goals?
    ● Is there an event that triggered it?
    ● Why especially now?
    ● What’s the budget? (ballpark estimate)
    ● When are you looking to get started?
  • 21. Still good? Let’s start a business relationship.
    Initial research and planning.
    Roadmapping consulting package.
  • 22. Four people to talk to:
    ● Project owner
      ○ We want this person to be successful
    ● Business owner or C-level perspective
      ○ Knows what’s best for the business
      ○ "What could the CEO ask you in the hallway?"
    ● Data wrangler: tales from the trenches
      ○ Insights into day-to-day business and data details
    ● Engineering side
      ○ Current tech stack
      ○ Information on constraints and preferences
      ○ Last touches
    ● Conversation focus, questions and duration vary from person to person.
  • 23. Interviews completed, situation understood and put into writing.
    ● With a bit of focused communication, we have a great foundation!
      ○ Project motivation
      ○ Business goals
      ○ Who should benefit
      ○ How to make it happen
    ● Different perspectives on the project and business.
    ● Time for tech!
      ○ Context is clear (goals, constraints)
    ● Best case:
      ○ Very few choices left to make
  • 24. Here’s what I would have told myself when starting out:
    ● Learn about the company
      ○ Easier with fresh eyes
    ● Understand the business
      ○ Multiple perspectives
    ● Keep the goal in mind
      ○ Helps you learn the right things
      ○ Cultivate a business mindset (help earn more/lose less)
      ○ Aim for results
        ■ I will not stop saying this anytime soon :)
    ● Have a process laid out
  • 25. Finally: Tactical Advice Which Fits the Remaining Time.
    That’s the right proportion :)
  • 26. Don’t roll your own home-baked scripts.
    ● "Quick and easy" isn't
    ● Uniqueness is bad, boring is good
      ○ Learning curve for others
      ○ Original author leaving
      ○ Maintenance time, tricky bugs, code duplication
      ○ Unexpected failure modes
    ● Extensibility?
    ● Growth?
    ● Metadata?
  • 27. You should know about workflow engines.
    ● Workflow = “[..] orchestrated and repeatable pattern of business activity [..]” [1]
    ● Data flow = “bunch of data processing tasks with inter-dependencies” [2]
    ● Pipelines of batch jobs
      ○ complex, long-running
    ● Dependency management
    ● Reusability of intermediate steps
    ● Logging and alerting
    ● Failure handling
    ● Monitoring
    ● Lots of effort went into them (Broken data? Crashes? Partial failures?)
    [1] https://en.wikipedia.org/wiki/Workflow
    [2] Elias Freider, 2013, “Luigi - Batch Data Processing in Python”
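To make the engine features above a bit more concrete, here is a minimal sketch in Python (the language Luigi uses) of the core loop a workflow engine runs for you: execute tasks in dependency order, skip steps that already completed, retry transient failures and log what happens. It is not the API of Luigi or any other engine; `run_workflow`, the task names and the in-memory `done` set are hypothetical stand-ins. A real engine also persists completion state and adds scheduling, alerting and monitoring.

```python
import logging
from graphlib import TopologicalSorter
from typing import Callable

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("mini_workflow")


def run_workflow(
    tasks: dict[str, Callable[[], None]],
    deps: dict[str, set[str]],
    done: set[str],
    max_retries: int = 2,
) -> list[str]:
    """Run tasks in dependency order, skipping completed ones and retrying failures.

    tasks: task name -> callable that does the work (hypothetical job functions)
    deps:  task name -> names of the tasks it depends on; every task must appear
           here as a key or a dependency (use an empty set for independent tasks)
    done:  names of tasks completed in an earlier run (updated in place)
    Returns the names of the tasks executed in this run, in order.
    """
    executed: list[str] = []
    # static_order() yields every task after its dependencies and raises
    # graphlib.CycleError if the dependencies form a cycle.
    for name in TopologicalSorter(deps).static_order():
        if name in done:
            log.info("skip %s (already complete)", name)
            continue
        for attempt in range(1, max_retries + 2):  # first try + max_retries retries
            try:
                log.info("run %s (attempt %d)", name, attempt)
                tasks[name]()
                break
            except Exception:
                log.exception("%s failed on attempt %d", name, attempt)
                if attempt > max_retries:
                    raise  # stop the run; downstream tasks never start
        done.add(name)
        executed.append(name)
    return executed
```

For example, with `deps = {"merge": {"extract_ads", "extract_store"}, "report": {"merge"}}` and `done = {"extract_ads"}`, only `extract_store`, `merge` and `report` run, and a second call runs nothing. Reusing finished intermediate steps like this is one of the things hand-rolled cron scripts tend to lack.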
  • 28. If in doubt, try Luigi.
    ● Spotify
      ○ Lots of data!
      ○ 10k+ Hadoop jobs every day [1]
    ● Battle-hardened
      ○ Open-sourced by Spotify in 2012
      ○ Has been used in production by large companies for a while
    ● Python
    ● Modular & extensible
    ● Dependency graph
    ● Not just for data tasks
    [1] Erik Bernhardsson, 2013, “Building Data Pipelines with Python and Luigi”
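Luigi's central idea is that each step is a task class declaring what it `requires()`, what it produces as `output()`, and how to `run()`; by default a task counts as complete when its outputs exist. The plain-Python sketch below imitates that pattern so it runs without installing anything. It is an illustration only, not Luigi itself: with Luigi you would subclass `luigi.Task`, return a `luigi.LocalTarget` from `output()` and start the pipeline with `luigi.build(...)`. The task names, file paths and the dummy install count are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path

# Hypothetical working directory for this demo; a real pipeline would use a fixed path.
DATA = Path(tempfile.mkdtemp())


class Task:
    """Luigi-inspired base class (not Luigi's API): override requires/output/run."""

    def requires(self) -> list["Task"]:
        return []

    def output(self) -> Path:
        raise NotImplementedError

    def run(self) -> None:
        raise NotImplementedError

    def complete(self) -> bool:
        # As in Luigi's default behaviour: done when the output exists.
        return self.output().exists()


def write_atomically(path: Path, text: str) -> None:
    """Write to a temp file in the same directory, then rename it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent)
    with os.fdopen(fd, "w") as f:
        f.write(text)
    os.replace(tmp, path)  # atomic on the same filesystem: no half-written outputs


def build(task: Task) -> None:
    """Build dependencies first (depth-first), then the task itself if incomplete."""
    if task.complete():
        return
    for dep in task.requires():
        build(dep)
    task.run()


class ExtractInstalls(Task):
    def __init__(self, day: str):
        self.day = day

    def output(self) -> Path:
        return DATA / f"installs_{self.day}.json"

    def run(self) -> None:
        # Placeholder for a real API call or export download; 120 is a dummy value.
        write_atomically(self.output(), json.dumps({"day": self.day, "installs": 120}))


class DailyReport(Task):
    def __init__(self, day: str):
        self.day = day

    def requires(self) -> list[Task]:
        return [ExtractInstalls(self.day)]

    def output(self) -> Path:
        return DATA / f"report_{self.day}.txt"

    def run(self) -> None:
        data = json.loads(self.requires()[0].output().read_text())
        write_atomically(self.output(), f"{data['day']}: {data['installs']} installs\n")
```

Calling `build(DailyReport("2016-10-27"))` runs the extract step first, then the report; calling it again does nothing because both outputs exist. Writing through a temporary file and `os.replace` keeps a crash from leaving a half-written file that would wrongly look complete, which addresses the "Partial failures?" question from the previous slide.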
  • 29. Usually worthwhile pipeline properties:
    ● Keep it small and lean
    ● Make learning and iterating easy
      ○ Changes should be cheap to accommodate (in both time and money)
    ● Build something to start learning
    ● Get data into one place
    ● Don’t reinvent the wheel
      ○ The tools are out there
      ○ ETL and workflow engines
    ● Create quick positive results, be efficient (lazy)
      ○ Many small improvements everywhere
      ○ Instead of solving everything for one group
      ○ More bang for the buck
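"Get data into one place" can start very small. A minimal sketch, assuming two hypothetical CSV exports (an ad network report and an app store report) whose column names differ: each is mapped onto one shared schema and loaded into a single SQLite table. SQLite, the column mapping and the table layout are illustrative assumptions; the same shape works with PostgreSQL or a hosted warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical mapping: shared column -> column name in each source's CSV export.
SOURCES = {
    "ad_network": {"day": "Day", "campaign": "Campaign Name", "installs": "Installs"},
    "app_store": {"day": "date", "campaign": "source", "installs": "downloads"},
}


def connect(path: str = ":memory:") -> sqlite3.Connection:
    """Open the central store and make sure the shared table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS installs (
               source   TEXT NOT NULL,
               day      TEXT NOT NULL,
               campaign TEXT NOT NULL,
               installs INTEGER NOT NULL,
               PRIMARY KEY (source, day, campaign))"""
    )
    return conn


def load_source(conn: sqlite3.Connection, source: str, csv_text: str) -> int:
    """Normalize one CSV export into the shared table; return the number of rows loaded."""
    cols = SOURCES[source]
    rows = [
        (
            source,
            r[cols["day"]].strip(),
            r[cols["campaign"]].strip(),
            int(r[cols["installs"]] or 0),  # empty or missing cell -> 0
        )
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    with conn:  # commits on success, rolls back if any insert fails
        # A row with the same (source, day, campaign) key replaces the old one,
        # so re-running a load does not create duplicates.
        conn.executemany("INSERT OR REPLACE INTO installs VALUES (?, ?, ?, ?)", rows)
    return len(rows)
```

Once both sources land in `installs`, a cross-source question becomes a single query such as `SELECT day, SUM(installs) FROM installs GROUP BY day`. The primary key plus `INSERT OR REPLACE` makes re-running a load safe, which keeps changes and iterations cheap.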
  • 30. In conclusion:
    ● Don’t dive into tactics right away
    ● Aim to create business value
      ○ Make it a conscious goal
    ● Understand the business, people and processes
      ○ This will take some time. It’s a good investment.
      ○ Have a process yourself
      ○ Tech choices will follow
    ● Try to make it easy to learn and iterate
    ● Get data into one place
    ● Don’t go with home-baked scripts
    ● Consider workflow engines
      ○ Luigi in particular
  • 31. Thanks! Want to learn more?
    “What questions to ask? Am I missing something?”
    For your future interviews and planning: I want to share my seed-question lists with you!
    Just drop me your email address at: http://datapipelinearchitect.com/datanatives/