© 2015 IBM Corporation
Are there really no taxis when it rains?
Open Data in action: anyone can analyze!
data2day conference 2015, Karlsruhe
Wilfried Hoge – IT Architect Big Data – hoge@de.ibm.com @wilfriedhoge
Stephan Reimann – IT Specialist Big Data – stephan.reimann@de.ibm.com @stereimann
Motivation: A personal experience – especially when it is raining it
seems difficult to get a taxi
§  Is that true?
§  Can analytics provide the answer?
§  Is there any correlation between rain and taxi availability?
First we needed data ... Open Data was the key
§  "Open data is the idea that some data should be freely available to everyone to use and
republish as they wish, without restrictions from copyright, patents or other mechanisms of
control." [Wikipedia, https://en.wikipedia.org/wiki/Open_data]
§  Open Data is available in different fields, e.g. Science, Government
§  Open Government data is available at almost any level:
– EU http://open-data.europa.eu/en/data/
– US https://www.data.gov/
– Germany (GovData, the data portal for Germany) https://www.govdata.de/
– Bavaria https://opendata.bayern.de/
– Munich https://www.opengov-muenchen.de/
– Berlin http://daten.berlin.de/
– New York https://nycopendata.socrata.com/
– ...
§  Open Data is available in several categories: census data, traffic, education, environment,
economy, health, ...
There is plenty of Open Data, but it isn’t always easy to find
the data set you are looking for
§  We needed taxi and weather data
§  Since we couldn’t find an appropriate taxi data set for
Munich, we chose New York
§  The taxi data set is available at
http://www.andresmh.com/nyctaxitrips/ and contains two
parts: trip data and trip fares
§  The taxi data set contains all taxi trips in Manhattan for
2013, approx. 4 GB/month, which is too big overall to analyze
on a laptop
§  We found plenty of open weather data, but none detailed
enough for our analysis: it was only available on a daily
basis, whereas the taxi data has exact timestamps
§  We decided to buy an appropriate data set with hourly
weather information for NYC at
https://weatherspark.com/ (approx. 10 €)
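Matching exact trip timestamps to hourly weather can be sketched as follows. This is a minimal illustration, not the code used for the talk: the column names (`pickup_datetime`, `rain_mm`), the timestamp format, and the sample rows are assumptions.

```python
from datetime import datetime


def hour_key(timestamp: str) -> str:
    """Truncate a 'YYYY-MM-DD HH:MM:SS' timestamp to the start of its hour."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return parsed.strftime("%Y-%m-%d %H:00")


# Hypothetical sample rows; the real files have many more columns.
trips = [
    {"pickup_datetime": "2013-01-01 08:15:42", "trip_distance": 1.2},
    {"pickup_datetime": "2013-01-01 08:47:10", "trip_distance": 3.4},
    {"pickup_datetime": "2013-01-01 09:05:00", "trip_distance": 0.8},
]
hourly_weather = {
    "2013-01-01 08:00": {"rain_mm": 2.5},
    "2013-01-01 09:00": {"rain_mm": 0.0},
}


def attach_weather(trips, hourly_weather):
    """Return copies of the trips with the weather of the pickup hour attached.

    Trips whose hour has no weather record get weather=None.
    """
    joined = []
    for trip in trips:
        weather = hourly_weather.get(hour_key(trip["pickup_datetime"]))
        joined.append({**trip, "weather": weather})
    return joined


joined_trips = attach_weather(trips, hourly_weather)
```

With daily weather only, every trip on a given day would get the same weather record, which is why hourly data was needed.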
Then we needed tools to analyze the data. We chose cloud
services for their simplicity and agility
1. IDEAS
2. PROTOTYPE
3. FAIL FAST
4. PRODUCTION
•  Through cloud services, ideas can
be realized quickly and simply:
•  Prototype ideas
•  Fail fast
•  Bring successful ideas into
production
[Figure: IBM Bluemix platform overview. Compute options: instant runtimes, containers, virtual machines. Deployment options: Bluemix Public, Bluemix Dedicated, Bluemix Local*. Catalog of services: web, data, mobile, analytics, cognitive, IoT, security, your own. Runs on IBM SoftLayer; integrates with your own hosted apps and services in your datacenter.]
We have used IBM Bluemix for our “investigation”
To provide the data for analytics, we used the SoftLayer object store
because of its automatic compression and attractive price
Automatic partitioning
Automatic compression
~4 ct / GB per month
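A rough storage cost estimate from the two figures quoted in these slides (approx. 4 GB of taxi data per month, ~4 ct per GB per month) might look like the sketch below. It assumes twelve months of uncompressed data; the currency of "ct" is not stated in the slides, and compression would lower the result.

```python
GB_PER_MONTH_OF_DATA = 4    # from the slides: approx. 4 GB of taxi data per month
MONTHS_OF_DATA = 12         # assumption: the full year 2013
PRICE_CT_PER_GB_MONTH = 4   # from the slides: ~4 ct / GB per month

# Total uncompressed volume and the resulting monthly storage bill.
total_gb = GB_PER_MONTH_OF_DATA * MONTHS_OF_DATA
monthly_cost_ct = total_gb * PRICE_CT_PER_GB_MONTH

print(f"{total_gb} GB uncompressed, about {monthly_cost_ct / 100:.2f} per month")
```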
We decided to use dashDB, an in-memory analytical cloud database,
to analyze the data, since it exceeded the capacity of a laptop
§  Why?
– Easy to use
– No infrastructure required
– No tuning required, focus on analytics
www.ibm.com/software/data/dashdb/
dashDB made it simple to create the table structures and load the
data
1. Create the tables
2. Load the data
3. Start analyzing
Now we can use SQL to obtain first insights
SQL can also be used for sampling and data preparation ...
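The kind of SQL meant here can be sketched with a small stand-in. The example uses Python's built-in sqlite3 module instead of dashDB, and the table layout (`trips` with `pickup_date`, `passenger_count`, `fare`) and the rows are hypothetical; the actual queries shown in the talk are not reproduced in this transcript.

```python
import sqlite3

# In-memory stand-in for the cloud database; table and columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips (pickup_date TEXT, passenger_count INTEGER, fare REAL)"
)
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [
        ("2013-01-01", 1, 8.5),
        ("2013-01-01", 2, 12.0),
        ("2013-01-02", 1, 6.5),
    ],
)

# First insight: number of trips and average fare per day.
daily = conn.execute(
    "SELECT pickup_date, COUNT(*) AS trips, AVG(fare) AS avg_fare "
    "FROM trips GROUP BY pickup_date ORDER BY pickup_date"
).fetchall()

# Sampling: keep every second row by rowid, a deterministic stand-in
# for whatever sampling method the database offers.
sample = conn.execute("SELECT * FROM trips WHERE rowid % 2 = 1").fetchall()
```

Aggregating to one row per day in SQL shrinks the data enough to continue on a laptop or in R.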
With the integrated RStudio, we can now start with advanced
analytics, e.g. to find correlations
The data can be easily accessed from R via SQL
Start the integrated RStudio
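Looking for a correlation between rain and trip counts could be sketched as below. The talk used R; this Python version is only an illustration, and the daily values are made up, so the resulting coefficient says nothing about the real data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical daily values: rainfall in mm and number of taxi trips.
rain_mm = [0.0, 0.0, 1.0, 5.0, 12.0]
trips_per_day = [480_000, 510_000, 495_000, 470_000, 505_000]

r = pearson(rain_mm, trips_per_day)
```

A value of r close to 0 would indicate no linear relationship; values near 1 or -1 would indicate a strong one.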
Some observations made with R
Day of week seems to
heavily influence the
number of trips
New York has very few
days with heavy rain,
maybe not the best place
for our investigation
Season and holidays seem
to influence the number of
passengers per month
So no strong correlations so far, let’s try a t-test
http://matheguru.com/stochastik/t-test.html#rechner
The t-test indicates that
the difference in the
number of taxi trips
between rainy and dry days
is not statistically significant
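A two-sample t-test on daily trip counts can be sketched as follows. The slides do not say which variant was used; this example assumes Welch's t-test (unequal variances) and uses made-up numbers, so it only shows the mechanics.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical daily trip counts (in thousands) on rainy and dry days.
rainy_days = [470, 495, 505, 480]
dry_days = [480, 510, 500, 490, 485]

t_stat, dof = welch_t(rainy_days, dry_days)
```

The t statistic is then compared with the critical value of the t distribution for the computed degrees of freedom; as a rule of thumb, an absolute value well below about 2 is usually not significant at the 5% level.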
Shiny Apps provide a simple way to create nice and interactive
visualizations in R
Select an area and access
information on individual trips
Get information about
trip destinations visually
From where are people
going to the airport?
Shiny Apps are easy to create
Create a marker for each list element
Shiny app for selecting and passing data to Google maps
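The "one marker per list element" idea can be sketched independently of Shiny. The example below is a Python illustration, not the R code from the slide: the field names (`dropoff_latitude`, `dropoff_longitude`, `fare`) and the marker layout (`lat`, `lng`, `title`) are assumptions chosen to resemble what a map widget typically expects.

```python
import json


def build_markers(selected_trips):
    """Create one marker dictionary for each trip in the list."""
    return [
        {
            "lat": trip["dropoff_latitude"],
            "lng": trip["dropoff_longitude"],
            "title": f"Fare: {trip['fare']:.2f}",
        }
        for trip in selected_trips
    ]


# Hypothetical trips selected by the user.
selected_trips = [
    {"dropoff_latitude": 40.6413, "dropoff_longitude": -73.7781, "fare": 52.0},
    {"dropoff_latitude": 40.7580, "dropoff_longitude": -73.9855, "fare": 9.5},
]

markers = build_markers(selected_trips)
# Serialized form that could be handed to a JavaScript map component.
markers_json = json.dumps(markers)
```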
So is it more difficult to get a taxi when it is raining?
§ Taxi trips are shorter when it is raining, but the average trip fare is
higher
→ More traffic? Fewer people cycling or walking? Traffic jams?
§ The t-test indicates the difference isn’t significant
§ We analyzed at day level; an analysis on an hourly basis
might show different results
§ So it seems to be a personal impression rather than a correlation ... but
maybe New York just doesn’t have enough rain ;-)
§ Find your own answers -> https://github.com/WilHoge/NYC-Taxi-Demo
Try it on http://bluemix.net

DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
 
writing report business partner b1+ .pdf
writing report business partner b1+ .pdfwriting report business partner b1+ .pdf
writing report business partner b1+ .pdf
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 

Is it harder to find a taxi when it is raining?

  • 1. © 2015 IBM Corporation. Gibt es bei Regen wirklich keine Taxis? Open Data in Aktion: Jeder kann analysieren! (Are there really no taxis when it rains? Open Data in action: anyone can analyze!)
      data2day conference 2015, Karlsruhe
      Wilfried Hoge – IT Architect Big Data – hoge@de.ibm.com @wilfriedhoge
      Stephan Reimann – IT Specialist Big Data – stephan.reimann@de.ibm.com @stereimann
  • 2. Motivation: a personal experience. Getting a taxi seems especially difficult when it is raining.
      – Is that true?
      – Can analytics provide the answer?
      – Is there any correlation between rain and taxi availability?
  • 3. First we needed data ... Open Data was the key
      – "Open data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control." [Wikipedia, https://en.wikipedia.org/wiki/Open_data]
      – Open Data is available in different fields, e.g. science and government
      – Open Government data is available at almost any level:
          – EU: http://open-data.europa.eu/en/data/
          – US: https://www.data.gov/
          – GovData – Das Datenportal für Deutschland: https://www.govdata.de/
          – Bavaria: https://opendata.bayern.de/
          – Munich: https://www.opengov-muenchen.de/
          – Berlin: http://daten.berlin.de/
          – New York: https://nycopendata.socrata.com/
          – ...
      – Open Data is available in several categories: census data, traffic, education, environment, economy, health, ...
  • 4. There is plenty of Open Data, but sometimes it isn't that easy to find the data set you are looking for
      – We needed taxi and weather data
      – Since we couldn't find an appropriate taxi data set for Munich, we chose New York
      – The taxi data set is available at http://www.andresmh.com/nyctaxitrips/ and contains two parts: trip data and trip fares
      – The taxi data set contains all taxi trips in Manhattan for 2013, approx. 4 GB/month, which is too big overall to analyze on a laptop
      – We could find plenty of weather data, but not detailed enough for our analysis: open weather data was only available on a daily basis, while the taxi data is recorded with exact timestamps
      – We decided to buy an appropriate data set with hourly weather information for NYC at https://weatherspark.com/ (approx. 10 €)
  • 5. Then we needed tools to analyze the data. We chose cloud services for their simplicity and agility.
      – Stages: 1. Ideas, 2. Prototype, 3. Fail fast, 4. Production
      – Through cloud services, ideas can be realized quickly and simply:
          – Prototype ideas
          – Fail fast
          – Bring successful ideas into production
  • 6. We used IBM Bluemix for our "investigation"
      [Diagram: IBM Bluemix platform overview. Recoverable labels follow.]
      – Flexible compute options to run apps / services: Instant Runtimes, Containers, Virtual Machines
      – Platform deployment options that meet your workload requirements: Bluemix Public, Bluemix Dedicated, Bluemix Local*
      – Catalog of services that extend apps' functionality: Web, Data, Mobile, Analytics, Cognitive, IoT, Security, Yours
      – Your own hosted apps / services with support for many languages and runtimes
      – DevOps Tooling; Integration and API Mgmt; Data Integration; Operations
      – Cloud Services Fabric: Delivery, Storage, Network, Security
      – Operational Excellence, Visibility, Hybrid Portability
      – IBM SoftLayer; Your Datacenter
  • 7. To provide the data for analytics, we used the SoftLayer Object Store because of its automatic compression and attractive price
      – Automatic partitioning
      – Automatic compression
      – ~4 ct / GB per month
  • 8. We decided to use dashDB, an in-memory analytical cloud database, to analyze the data, since the data exceeded the laptop's capacity
      – Why?
          – Easy to use
          – No infrastructure required
          – No tuning required, focus on analytics
      – www.ibm.com/software/data/dashdb/
  • 9. dashDB made it simple to create the table structures and load the data
      1. Create the tables
      2. Load the data
      3. Start analyzing
  • 10. Now we can use SQL to obtain first insights
      – SQL can also be used for sampling and data preparation ...
  • 11. With the integrated RStudio, we can now start with advanced analytics, e.g. to find correlations
      – Start the integrated RStudio
      – The data can be easily accessed from R via SQL
  • 12. Some observations made with R
      – Day of week seems to heavily influence the number of trips
      – New York has very few days with heavy rain, so it may not be the best place for our investigation
      – Season and holidays seem to influence the number of passengers per month
  • 13. So no strong correlations so far, let's try a t-test: http://matheguru.com/stochastik/t-test.html#rechner
      – The t-test indicates that the number of taxi trips does not differ significantly with rain
  • 14. Shiny Apps provide a simple way to create nice and interactive visualizations in R
      – Select an area and access information on individual trips
      – Get information about trip destinations visually
      – From where are people going to the airport?
  • 15. Shiny Apps are easy to create
      – Shiny app for selecting and passing data to Google Maps
      – Create a marker for each list element
  • 16. So is it more difficult to get a taxi when it is raining?
      – Taxi trips are shorter when it is raining, but the average trip fare is higher -> More traffic? Fewer people cycling or walking? Traffic jams?
      – The t-test indicates the difference isn't significant
      – We analyzed at day level; an analysis on an hourly basis might show different results
      – So it seems to be a personal impression, not a correlation ... But maybe New York just doesn't have enough rain ;-)
      – Find your own answers -> https://github.com/WilHoge/NYC-Taxi-Demo
  • 17. Try it on http://bluemix.net
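
Slide 4 points out that the taxi data carries exact timestamps while the purchased weather data is hourly. One way to line the two up, sketched here in Python rather than the SQL and R used in the talk, is to truncate each pickup time to the start of its hour and look up the weather reading for that hour. The sample rows, the field layout, the millimetre unit, and the names `hour_key` and `attach_weather` are illustrative assumptions, not the schema of the actual data sets.

```python
from datetime import datetime

# Made-up sample rows: (pickup timestamp, trip distance). Assumed layout.
trips = [
    ("2013-01-01 00:12:45", 2.1),
    ("2013-01-01 00:48:03", 0.9),
    ("2013-01-01 01:05:30", 5.4),
]

# Made-up hourly weather, keyed by the start of the hour (rain in mm, assumed unit).
hourly_rain_mm = {
    "2013-01-01 00:00:00": 0.0,
    "2013-01-01 01:00:00": 1.3,
}

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def hour_key(timestamp: str) -> str:
    """Truncate a 'YYYY-MM-DD HH:MM:SS' timestamp to the start of its hour."""
    parsed = datetime.strptime(timestamp, TIME_FORMAT)
    return parsed.replace(minute=0, second=0).strftime(TIME_FORMAT)


def attach_weather(trip_rows, rain_by_hour):
    """Return (pickup, distance, rain_mm) tuples; rain_mm is None if no reading exists."""
    return [
        (pickup, distance, rain_by_hour.get(hour_key(pickup)))
        for pickup, distance in trip_rows
    ]


enriched = attach_weather(trips, hourly_rain_mm)
for row in enriched:
    print(row)
```

Trips without a matching weather reading keep `None` rather than being dropped, so gaps in the weather data stay visible in later steps.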
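
Slide 10 mentions using SQL for first insights, sampling, and data preparation, but the transcript does not preserve the queries themselves. The following is a minimal sketch of what such queries could look like, run against an in-memory SQLite database from Python so it is self-contained. The table name `trips`, its columns, and the sample rows are assumptions; dashDB's SQL dialect and the real schema will differ in detail (for example, `rowid` and `date()` are used here as SQLite features).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips ("
    "pickup_datetime TEXT, passenger_count INTEGER, trip_distance REAL)"
)
# Made-up rows standing in for the real trip data.
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [
        ("2013-01-01 08:15:00", 1, 2.0),
        ("2013-01-01 18:40:00", 2, 4.0),
        ("2013-01-02 09:05:00", 1, 3.0),
    ],
)

# First insight: number of trips and average distance per day.
daily = conn.execute(
    """
    SELECT date(pickup_datetime) AS trip_day,
           COUNT(*)              AS trip_count,
           AVG(trip_distance)    AS avg_distance
    FROM trips
    GROUP BY trip_day
    ORDER BY trip_day
    """
).fetchall()

# Simple systematic sample: keep every second row (SQLite-specific rowid).
sample = conn.execute(
    "SELECT pickup_datetime FROM trips WHERE rowid % 2 = 1"
).fetchall()

for row in daily:
    print(row)
print(len(sample), "sampled rows")
```

Aggregating to one row per day in the database keeps the result small enough to pull into an analysis tool, which matches the day-level analysis described on slide 16.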
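
Slide 11 refers to looking for correlations, without stating which measure was used. As an illustration only, the sketch below computes the Pearson correlation coefficient between daily rainfall and daily trip counts in plain Python. The choice of Pearson, the function name `pearson`, and all numbers are assumptions made for this example; they are not results from the talk.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(spread_x * spread_y)


# Made-up daily values, one entry per day.
daily_rain_mm = [0.0, 0.0, 2.5, 12.0, 0.3]
daily_trips = [480, 510, 495, 470, 505]

r = pearson(daily_rain_mm, daily_trips)
print(f"Pearson r between rain and trips: {r:.3f}")
```

A value near 0 suggests no linear relationship, while values near -1 or 1 suggest a strong one. With day-of-week effects as strong as slide 12 reports, comparing like with like (for example, only weekdays) would be a sensible refinement.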
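
Slides 13 and 16 rely on a t-test comparing rainy and dry days, but the transcript does not say which variant was used. The sketch below shows Welch's unequal-variance t-test as one reasonable choice, computing the test statistic and the degrees of freedom with the Python standard library. The function name `welch_t` and the trip counts are made up for illustration. The standard library has no t-distribution, so the p-value is left out; the statistic can be compared against a critical value from a t table, or the whole test can be run with R's `t.test`.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for Welch's unequal-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")
    # Variance of each sample mean (sample variance divided by sample size).
    var_mean_a = variance(sample_a) / n_a
    var_mean_b = variance(sample_b) / n_b
    if var_mean_a + var_mean_b == 0:
        raise ValueError("both samples are constant; t is undefined")
    t = (mean(sample_a) - mean(sample_b)) / sqrt(var_mean_a + var_mean_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (var_mean_a + var_mean_b) ** 2 / (
        var_mean_a ** 2 / (n_a - 1) + var_mean_b ** 2 / (n_b - 1)
    )
    return t, df


# Made-up daily trip counts (in thousands) for dry and rainy days.
dry_days = [482, 495, 510, 478, 501, 490]
rainy_days = [476, 498, 505, 470]

t_stat, dof = welch_t(dry_days, rainy_days)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof:.1f}")
```

A small absolute value of t relative to the critical value for the given degrees of freedom means the difference between the two groups could plausibly be due to chance, which is the kind of outcome the slides report.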