Tales from Dataland, or do I really know what I am doing?
Evention
Przemysław Maciołek, Toptal LLC
1.
Tales from Dataland
Or: do I know what I am doing? Told by Przemek Maciołek from
2.
3.
A story…
4.
5.
6.
7.
8.
Suspense!
9.
10.
> ?
11.
What would the dwarves' Chief Data Scientist do?
12.
13.
New dwarves get the new type of hammer
Old type of hammer
14.
15.
install.packages('ggplot2')  # one-time setup
library(ggplot2)
setwd("/Users/pmm/Desktop/hammer")
all <- read.csv(file="all.csv")

# Number of dwarves over time
qplot(all$month_sequence, all$dwarfs) + geom_smooth()
# Production over time
qplot(all$month_sequence, all$production) + geom_smooth()
# Production per dwarf over time
all$prod_per_dwarf <- all$production / all$dwarfs
qplot(all$month_sequence, all$prod_per_dwarf) + geom_smooth()
16.
New hammers for the dwarves starting this month…
17.
Average gold production per dwarf
18.
Is there a problem here?
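The file all.csv is not part of the slides, so the following is only an invented illustration of one way an average-per-dwarf chart can mislead. It assumes a made-up scenario (all numbers and the helper `new_hammer_output` are hypothetical): veterans with old hammers produce a steady amount, while each newcomer's new hammer starts strong and then loses output every month. The data frame reuses the column names from the slide code.

```r
# Invented scenario, not the data behind the slides:
# 10 veteran dwarves with old hammers produce 10 units each, every month.
# From month 7 on, one new dwarf joins per month with a new hammer that
# yields 16 units in its first month and 3 units less in each later month.
months <- 1:10
veterans <- 10
new_hammer_output <- function(k) pmax(16 - 3 * (k - 1), 0)  # k = months in use

all <- data.frame(month_sequence = months)
all$dwarfs <- veterans + pmax(months - 6, 0)
all$production <- sapply(months, function(m) {
  n_new <- max(m - 6, 0)  # newcomers present in month m, with tenures 1..n_new
  veterans * 10 + sum(new_hammer_output(seq_len(n_new)))
})
all$prod_per_dwarf <- all$production / all$dwarfs
print(all)
```

In this toy setup the average per dwarf climbs in the first months after the new hammers arrive, even though every single new hammer is already losing output. The aggregate mixes dwarves at different stages, which is why a per-dwarf view is worth checking.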
19.
Production of a given dwarf in consecutive months since joining.
Using old hammers
Using new hammers
20.
Production of a given dwarf in consecutive months since joining.
Using old hammers
Using new hammers
February, March, …
21.
Production of a given dwarf in consecutive months since joining.
Using old hammers
Using new hammers
July, August, …
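Re-indexing production by months since joining, rather than by calendar month, could be sketched as follows. This is a minimal example under assumptions: the slides do not show how the relative-month files were built, and the column names `dwarf`, `joined` and `month` as well as all values are hypothetical. Only `relative_month` and `production` appear in the slide code.

```r
# Hypothetical long-format data: one row per dwarf per calendar month.
records <- data.frame(
  dwarf      = c("A", "A", "A", "B", "B"),
  joined     = c(2, 2, 2, 7, 7),  # calendar month of joining (2 = February, 7 = July)
  month      = c(2, 3, 4, 7, 8),  # calendar month of the observation
  production = c(10, 9, 9, 14, 11)
)

# Month 1 is each dwarf's first month, whatever the calendar says.
records$relative_month <- records$month - records$joined + 1
print(records)
```

After this step, February for dwarf A and July for dwarf B both count as month 1, so their production curves can be laid on top of each other.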
22.
23.
new <- read.csv(file="new_relative.csv")
old <- read.csv(file="old_relative.csv")

qplot(new$relative_month, new$production)
# Jitter and transparency make overlapping points visible
ggplot(new, aes(x=relative_month, y=production)) +
  geom_point(shape=19, position=position_jitter(width=.5, height=0), alpha=.2)

# This will look nicer:
old$type <- 'old'
new$type <- 'new'
old_and_new <- rbind(old, new)
ggplot(old_and_new, aes(x=relative_month, y=production, color=type)) +
  geom_point(shape=19, position=position_jitter(width=.5, height=0), alpha=.2)
24.
25.
ggplot(old_and_new, aes(x=relative_month, y=production, color=type)) +
  geom_point(shape=19, position=position_jitter(width=.5, height=0), alpha=.1) +
  geom_smooth(method=lm)
26.
The new hammers wear out much faster!
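The same comparison can be put into numbers by fitting one straight line per hammer type and comparing the slopes. The sketch below assumes synthetic, noise-free data with invented values. It reuses the slide's column names (`relative_month`, `production`, `type`) but is not the data from the talk.

```r
# Synthetic data for illustration only; every number here is invented.
months <- 1:6
old <- data.frame(relative_month = months,
                  production = 10 - 0.2 * months, type = "old")
new <- data.frame(relative_month = months,
                  production = 14 - 1.5 * months, type = "new")
old_and_new <- rbind(old, new)

# Fit production ~ relative_month separately for each hammer type
# and keep the slope, i.e. the change in production per month of use.
slopes <- sapply(split(old_and_new, old_and_new$type), function(d) {
  unname(coef(lm(production ~ relative_month, data = d))["relative_month"])
})
print(slopes)
```

A more negative slope means output drops faster with each month of use. On real, noisy data the slopes would come with uncertainty, so the fitted lines should be read together with the scatter of points.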
27.
Lessons?
• Worth doing:
  • knowing what you are doing
  • asking questions
  • being a little suspicious (at least about the results)
  • using R and ggplot2
• Not worth doing:
  • trusting charts blindly