SlideShare a Scribd company logo
Augmenting
MongoDB with Treasure Data
About Me
• A recovering software engineer turned digital artist
once interested in fractals;
• now into data visualization based on large datasets
rendered directly to GPU (RGL, various Python GL
libraries, etc.)
• it’s easier these days to manipulate large dataset
with limited effort
Images courtesy of Edureka, 10gen, MongoDB,
clipart panda and aperfectworld.org
Courtesy of Edureka
Position
relatively speaking
Images courtesy of Edureka and MongoDB
Strengths: Format,
Convenience
Schema courtesy of Emily Stolfo, MongoDB
Discuss
“not so strengths” of
MongoDB
• Horowitz was also very honest about where and how
MongoDB is lacking in its current offering – most notably in
terms of integration capabilities and some areas of high
performance.
• “In the relational world you’ve got a few big boxes, in the
MongoDB world you could have 2,000 commodity servers,
so you need really great management tools for that.
That’s a huge problem for us.”
• “The other big thing is automation, where you can have
automation tools that let you manage very large clusters all
from a very simple pane of glass.”
http://diginomica.com/2014/11/10/mongodb-cto-mongo-works-doesnt/
From an interview with MongoDB CTO Eliot Horowitz
testimony of John R Jensen, Cengage
hmmm…moar “not so strengths” ;)
of MongoDB
• The dreaded “Write Lock”
• https://news.ycombinator.com/item?id=1691748
• http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
- is the data actually relational or not?
• Slideshare “where not to use MongoDB”
• Slideshare “Hive vs. Cassandra vs. MongoDB”
• http://www.slideshare.net/johnrjenson/mongodb-pros-and-cons
• “choosing the right NoSQL database” video:
https://www.youtube.com/watch?t=34&v=gJFG04Sy6NY
• limits of MongoDB: http://docs.mongodb.org/manual/reference/limits/
More improvements?
What are the steps to scale?
The Hadoop Story on MongoDB
Share some use-cases
with us, won’t you?
Complementing MongoDB
• Featurewise?
• Elastic Search
• Use-Case: Stripe
• Opslog feature as queryable log of diffs
• Hadoop (good for large scale processing), Hive
(Treasure Data)
• Use-Case: Wish
What’s wish.com?
• Mobile eCommerce -
world’s largest mobile
shopping mall
• Top 10 app on iOS &
Android
• Product
discovery/personalization
Your Fun, Personalized Shopping Mall
Wish & MongoDB
• Primary database
• 64x mongod
• AWS -> bare metal (SSDs ftw!)
Architecture
App server
Wish App fluentd
fluentd aggregation
server
fluentd
fluentd aggregation
server
fluentd
Hadoop/Hive
Before fluentd
fluentd!
Complementing MongoDB
• Operationally?
• Managing MongoDB is hard (spin up instances
with Mongolab)
• MongoDB (monitoring products: Ops Manager
and MMS)
• What’s your pain? What’s are you missing?
You know what?
Managing ANY data is
hard.
Treasure Data as a Cloud
Way of complementing Mongo
DB
6
Batch
Imports
Streaming
Imports
Ingestion
Batch Processing
(Hive, Presto)
Storage
(Schema-on-read)
Console
ODBC JDBC
AWS S3
PrestogresCLI Luigi-TD
FTP
GDocs
HTTP Put
Python
PostgreSQL
Redshift
SFDC
Yahoo BDI
Tableau Server
MySQL
…
Ingest Analyze Distribute
Gree (gaming)
DATA ACCESS > ADVANCED ANALYTICS
• Product analytics: data
access is a major issue.
• “Machine learning” is still
simple and “small” in scale
(can be done inside Python)
• Future work:
productized/operationalized
machine learning
bluetooth
iOS/Android SDK
Fluentd
Python/Pandas SDK
Data Science Team (5 people)
SCHEMA(LESS) COUNTS
• Redshift: lots of co-use
cases
• Event data is semi-
structured → Can be
modeled as JSON but
schemas change
• Treasure Data provides a
SQL-accessible, semi-
structured data lake.
email
Source of truth/JSON
More intensive data processing
Hourly/Daily load
Big data mart
More interactive data processing
Ad hoc queries
for new data
Ad hoc queries
for cached data
DATA COLLECTION IS HARD
• Want to assume all data is
on S3 or HDFS, but reality is
murkier.
• Sensor readings available as
email attachments
• Provide data collection tools
for 90% of the use cases.
Have APIs ready for 10%.
GH SCADA
email
Parse & transform
Import via REST
Import data as JSON
Analyze via SQL
Query
Results
Data-informed
maintenance
The Sunk Cost Fallacy
Some revised scenarios
• Revised scenario 1: Using Treasure Data for
Ingestion and analytics; exporting results to
MongoDB for reporting
Some revised scenarios
• Revised scenario 2: Ingestion data into MongoDB
and exporting to Treasure Data
Treasure Data is good for a
some of the same things…
• less overhead in setup
• less - make that practically no - effort to scale
• less overhead/effort to use
• but -> less fine-tuned control over outcome
The Tradeoff
Control (sharding and replication) vs. Ease
(scaling automatically in the cloud)
?
MongoDB is already excellent for a lot of things
+
Demo!
jhammink@treasure-data.com
@Rijksband
www.treasure-data.com
Questions?

More Related Content

What's hot

GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
Yann Cluchey
 
Fluentd - Unified logging layer
Fluentd -  Unified logging layerFluentd -  Unified logging layer
Fluentd - Unified logging layer
Treasure Data, Inc.
 
Real Time Data Analytics with MongoDB and Fluentd at Wish
Real Time Data Analytics with MongoDB and Fluentd at WishReal Time Data Analytics with MongoDB and Fluentd at Wish
Real Time Data Analytics with MongoDB and Fluentd at Wish
MongoDB
 
Exploring BigData with Google BigQuery
Exploring BigData with Google BigQueryExploring BigData with Google BigQuery
Exploring BigData with Google BigQuery
Dharmesh Vaya
 
Building Scalable Big Data Pipelines
Building Scalable Big Data PipelinesBuilding Scalable Big Data Pipelines
Building Scalable Big Data Pipelines
Christian Gügi
 
An Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and KibanaAn Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and Kibana
ObjectRocket
 
Azure Big Data Story
Azure Big Data StoryAzure Big Data Story
Azure Big Data Story
Lynn Langit
 
Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery
Intro to new Google cloud technologies: Google Storage, Prediction API, BigQueryIntro to new Google cloud technologies: Google Storage, Prediction API, BigQuery
Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery
Chris Schalk
 
Fluentd and Docker - running fluentd within a docker container
Fluentd and Docker - running fluentd within a docker containerFluentd and Docker - running fluentd within a docker container
Fluentd and Docker - running fluentd within a docker container
Treasure Data, Inc.
 
A Day in the Life of a Druid Implementor and Druid's Roadmap
A Day in the Life of a Druid Implementor and Druid's RoadmapA Day in the Life of a Druid Implementor and Druid's Roadmap
A Day in the Life of a Druid Implementor and Druid's Roadmap
Itai Yaffe
 
SQL To NoSQL - Top 6 Questions Before Making The Move
SQL To NoSQL - Top 6 Questions Before Making The MoveSQL To NoSQL - Top 6 Questions Before Making The Move
SQL To NoSQL - Top 6 Questions Before Making The Move
IBM Cloud Data Services
 
An indepth look at Google BigQuery Architecture by Felipe Hoffa of Google
An indepth look at Google BigQuery Architecture by Felipe Hoffa of GoogleAn indepth look at Google BigQuery Architecture by Felipe Hoffa of Google
An indepth look at Google BigQuery Architecture by Felipe Hoffa of Google
Data Con LA
 
Big Data Analytics with Google BigQuery. By Javier Ramirez. All your base Co...
Big Data Analytics with Google BigQuery.  By Javier Ramirez. All your base Co...Big Data Analytics with Google BigQuery.  By Javier Ramirez. All your base Co...
Big Data Analytics with Google BigQuery. By Javier Ramirez. All your base Co...
javier ramirez
 
Open source log analytics
Open source log analyticsOpen source log analytics
Open source log analytics
Vinod Nayal
 
Data-Driven Development Era and Its Technologies
Data-Driven Development Era and Its TechnologiesData-Driven Development Era and Its Technologies
Data-Driven Development Era and Its Technologies
SATOSHI TAGOMORI
 
How BigQuery broke my heart
How BigQuery broke my heartHow BigQuery broke my heart
How BigQuery broke my heart
Gabriel Hamilton
 
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
Modern Data Stack France
 
Options for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current MarketOptions for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current Market
Dremio Corporation
 
Superset druid realtime
Superset druid realtimeSuperset druid realtime
Superset druid realtime
arupmalakar
 
Presto: SQL-on-anything
Presto: SQL-on-anythingPresto: SQL-on-anything
Presto: SQL-on-anything
DataWorks Summit
 

What's hot (20)

GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
 
Fluentd - Unified logging layer
Fluentd -  Unified logging layerFluentd -  Unified logging layer
Fluentd - Unified logging layer
 
Real Time Data Analytics with MongoDB and Fluentd at Wish
Real Time Data Analytics with MongoDB and Fluentd at WishReal Time Data Analytics with MongoDB and Fluentd at Wish
Real Time Data Analytics with MongoDB and Fluentd at Wish
 
Exploring BigData with Google BigQuery
Exploring BigData with Google BigQueryExploring BigData with Google BigQuery
Exploring BigData with Google BigQuery
 
Building Scalable Big Data Pipelines
Building Scalable Big Data PipelinesBuilding Scalable Big Data Pipelines
Building Scalable Big Data Pipelines
 
An Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and KibanaAn Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and Kibana
 
Azure Big Data Story
Azure Big Data StoryAzure Big Data Story
Azure Big Data Story
 
Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery
Intro to new Google cloud technologies: Google Storage, Prediction API, BigQueryIntro to new Google cloud technologies: Google Storage, Prediction API, BigQuery
Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery
 
Fluentd and Docker - running fluentd within a docker container
Fluentd and Docker - running fluentd within a docker containerFluentd and Docker - running fluentd within a docker container
Fluentd and Docker - running fluentd within a docker container
 
A Day in the Life of a Druid Implementor and Druid's Roadmap
A Day in the Life of a Druid Implementor and Druid's RoadmapA Day in the Life of a Druid Implementor and Druid's Roadmap
A Day in the Life of a Druid Implementor and Druid's Roadmap
 
SQL To NoSQL - Top 6 Questions Before Making The Move
SQL To NoSQL - Top 6 Questions Before Making The MoveSQL To NoSQL - Top 6 Questions Before Making The Move
SQL To NoSQL - Top 6 Questions Before Making The Move
 
An indepth look at Google BigQuery Architecture by Felipe Hoffa of Google
An indepth look at Google BigQuery Architecture by Felipe Hoffa of GoogleAn indepth look at Google BigQuery Architecture by Felipe Hoffa of Google
An indepth look at Google BigQuery Architecture by Felipe Hoffa of Google
 
Big Data Analytics with Google BigQuery. By Javier Ramirez. All your base Co...
Big Data Analytics with Google BigQuery.  By Javier Ramirez. All your base Co...Big Data Analytics with Google BigQuery.  By Javier Ramirez. All your base Co...
Big Data Analytics with Google BigQuery. By Javier Ramirez. All your base Co...
 
Open source log analytics
Open source log analyticsOpen source log analytics
Open source log analytics
 
Data-Driven Development Era and Its Technologies
Data-Driven Development Era and Its TechnologiesData-Driven Development Era and Its Technologies
Data-Driven Development Era and Its Technologies
 
How BigQuery broke my heart
How BigQuery broke my heartHow BigQuery broke my heart
How BigQuery broke my heart
 
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
 
Options for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current MarketOptions for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current Market
 
Superset druid realtime
Superset druid realtimeSuperset druid realtime
Superset druid realtime
 
Presto: SQL-on-anything
Presto: SQL-on-anythingPresto: SQL-on-anything
Presto: SQL-on-anything
 

Viewers also liked

Building a system for machine and event-oriented data with Rocana
Building a system for machine and event-oriented data with RocanaBuilding a system for machine and event-oriented data with Rocana
Building a system for machine and event-oriented data with Rocana
Treasure Data, Inc.
 
What is support_engineer_in_treasuredata
What is support_engineer_in_treasuredataWhat is support_engineer_in_treasuredata
What is support_engineer_in_treasuredata
Treasure Data, Inc.
 
Packaging Ecosystems -Monki Gras 2017
Packaging Ecosystems -Monki Gras 2017Packaging Ecosystems -Monki Gras 2017
Packaging Ecosystems -Monki Gras 2017
Treasure Data, Inc.
 
Introduction to New features and Use cases of Hivemall
Introduction to New features and Use cases of HivemallIntroduction to New features and Use cases of Hivemall
Introduction to New features and Use cases of Hivemall
Treasure Data, Inc.
 
Insight Data Engineering: Open source data ingestion
Insight Data Engineering: Open source data ingestionInsight Data Engineering: Open source data ingestion
Insight Data Engineering: Open source data ingestion
Treasure Data, Inc.
 
Unifying Events and Logs into the Cloud
Unifying Events and Logs into the CloudUnifying Events and Logs into the Cloud
Unifying Events and Logs into the Cloud
Treasure Data, Inc.
 
글로벌 사례로 보는 데이터로 돈 버는 법 - 트레저데이터 (Treasure Data)
글로벌 사례로 보는 데이터로 돈 버는 법 - 트레저데이터 (Treasure Data)글로벌 사례로 보는 데이터로 돈 버는 법 - 트레저데이터 (Treasure Data)
글로벌 사례로 보는 데이터로 돈 버는 법 - 트레저데이터 (Treasure Data)
Treasure Data, Inc.
 
Keynote - Fluentd meetup v14
Keynote - Fluentd meetup v14Keynote - Fluentd meetup v14
Keynote - Fluentd meetup v14
Treasure Data, Inc.
 

Viewers also liked (8)

Building a system for machine and event-oriented data with Rocana
Building a system for machine and event-oriented data with RocanaBuilding a system for machine and event-oriented data with Rocana
Building a system for machine and event-oriented data with Rocana
 
What is support_engineer_in_treasuredata
What is support_engineer_in_treasuredataWhat is support_engineer_in_treasuredata
What is support_engineer_in_treasuredata
 
Packaging Ecosystems -Monki Gras 2017
Packaging Ecosystems -Monki Gras 2017Packaging Ecosystems -Monki Gras 2017
Packaging Ecosystems -Monki Gras 2017
 
Introduction to New features and Use cases of Hivemall
Introduction to New features and Use cases of HivemallIntroduction to New features and Use cases of Hivemall
Introduction to New features and Use cases of Hivemall
 
Insight Data Engineering: Open source data ingestion
Insight Data Engineering: Open source data ingestionInsight Data Engineering: Open source data ingestion
Insight Data Engineering: Open source data ingestion
 
Unifying Events and Logs into the Cloud
Unifying Events and Logs into the CloudUnifying Events and Logs into the Cloud
Unifying Events and Logs into the Cloud
 
글로벌 사례로 보는 데이터로 돈 버는 법 - 트레저데이터 (Treasure Data)
글로벌 사례로 보는 데이터로 돈 버는 법 - 트레저데이터 (Treasure Data)글로벌 사례로 보는 데이터로 돈 버는 법 - 트레저데이터 (Treasure Data)
글로벌 사례로 보는 데이터로 돈 버는 법 - 트레저데이터 (Treasure Data)
 
Keynote - Fluentd meetup v14
Keynote - Fluentd meetup v14Keynote - Fluentd meetup v14
Keynote - Fluentd meetup v14
 

Similar to Augmenting Mongo DB with Treasure Data

When to Use MongoDB
When to Use MongoDBWhen to Use MongoDB
When to Use MongoDB
MongoDB
 
Mongodb
MongodbMongodb
Mongodb
Apurva Vyas
 
Nosql Now 2012: MongoDB Use Cases
Nosql Now 2012: MongoDB Use CasesNosql Now 2012: MongoDB Use Cases
Nosql Now 2012: MongoDB Use Cases
MongoDB
 
Webinar: When to Use MongoDB
Webinar: When to Use MongoDBWebinar: When to Use MongoDB
Webinar: When to Use MongoDB
MongoDB
 
Why Organizations are Looking at Alternative Database Technologies – Introduc...
Why Organizations are Looking at Alternative Database Technologies – Introduc...Why Organizations are Looking at Alternative Database Technologies – Introduc...
Why Organizations are Looking at Alternative Database Technologies – Introduc...
DATAVERSITY
 
MongoDB Tick Data Presentation
MongoDB Tick Data PresentationMongoDB Tick Data Presentation
MongoDB Tick Data Presentation
MongoDB
 
Which database should I use for my app?
Which database should I use for my app?Which database should I use for my app?
Which database should I use for my app?
Nawaz Dhandala
 
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
Rittman Analytics
 
Webinar: Managing Real Time Risk Analytics with MongoDB
Webinar: Managing Real Time Risk Analytics with MongoDB Webinar: Managing Real Time Risk Analytics with MongoDB
Webinar: Managing Real Time Risk Analytics with MongoDB
MongoDB
 
Getting Started with Big Data in the Cloud
Getting Started with Big Data in the CloudGetting Started with Big Data in the Cloud
Getting Started with Big Data in the Cloud
RightScale
 
When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...
MongoDB
 
Introducing MongoDB into your Organization
Introducing MongoDB into your OrganizationIntroducing MongoDB into your Organization
Introducing MongoDB into your Organization
MongoDB
 
MongoDB Versatility: Scaling the MapMyFitness Platform
MongoDB Versatility: Scaling the MapMyFitness PlatformMongoDB Versatility: Scaling the MapMyFitness Platform
MongoDB Versatility: Scaling the MapMyFitness Platform
MongoDB
 
Why Your MongoDB Needs Redis
Why Your MongoDB Needs RedisWhy Your MongoDB Needs Redis
Why Your MongoDB Needs Redis
Itamar Haber
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
Adaryl "Bob" Wakefield, MBA
 
MongoDB Breakfast Milan - Mainframe Offloading Strategies
MongoDB Breakfast Milan -  Mainframe Offloading StrategiesMongoDB Breakfast Milan -  Mainframe Offloading Strategies
MongoDB Breakfast Milan - Mainframe Offloading Strategies
MongoDB
 
NoSQL in the context of Social Web
NoSQL in the context of Social WebNoSQL in the context of Social Web
NoSQL in the context of Social Web
Bogdan Gaza
 
NoSQL
NoSQLNoSQL
MongoDB in FS
MongoDB in FSMongoDB in FS
MongoDB in FS
MongoDB
 
Webinar: How Banks Manage Reference Data with MongoDB
 Webinar: How Banks Manage Reference Data with MongoDB Webinar: How Banks Manage Reference Data with MongoDB
Webinar: How Banks Manage Reference Data with MongoDB
MongoDB
 

Similar to Augmenting Mongo DB with Treasure Data (20)

When to Use MongoDB
When to Use MongoDBWhen to Use MongoDB
When to Use MongoDB
 
Mongodb
MongodbMongodb
Mongodb
 
Nosql Now 2012: MongoDB Use Cases
Nosql Now 2012: MongoDB Use CasesNosql Now 2012: MongoDB Use Cases
Nosql Now 2012: MongoDB Use Cases
 
Webinar: When to Use MongoDB
Webinar: When to Use MongoDBWebinar: When to Use MongoDB
Webinar: When to Use MongoDB
 
Why Organizations are Looking at Alternative Database Technologies – Introduc...
Why Organizations are Looking at Alternative Database Technologies – Introduc...Why Organizations are Looking at Alternative Database Technologies – Introduc...
Why Organizations are Looking at Alternative Database Technologies – Introduc...
 
MongoDB Tick Data Presentation
MongoDB Tick Data PresentationMongoDB Tick Data Presentation
MongoDB Tick Data Presentation
 
Which database should I use for my app?
Which database should I use for my app?Which database should I use for my app?
Which database should I use for my app?
 
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
 
Webinar: Managing Real Time Risk Analytics with MongoDB
Webinar: Managing Real Time Risk Analytics with MongoDB Webinar: Managing Real Time Risk Analytics with MongoDB
Webinar: Managing Real Time Risk Analytics with MongoDB
 
Getting Started with Big Data in the Cloud
Getting Started with Big Data in the CloudGetting Started with Big Data in the Cloud
Getting Started with Big Data in the Cloud
 
When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...
 
Introducing MongoDB into your Organization
Introducing MongoDB into your OrganizationIntroducing MongoDB into your Organization
Introducing MongoDB into your Organization
 
MongoDB Versatility: Scaling the MapMyFitness Platform
MongoDB Versatility: Scaling the MapMyFitness PlatformMongoDB Versatility: Scaling the MapMyFitness Platform
MongoDB Versatility: Scaling the MapMyFitness Platform
 
Why Your MongoDB Needs Redis
Why Your MongoDB Needs RedisWhy Your MongoDB Needs Redis
Why Your MongoDB Needs Redis
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
 
MongoDB Breakfast Milan - Mainframe Offloading Strategies
MongoDB Breakfast Milan -  Mainframe Offloading StrategiesMongoDB Breakfast Milan -  Mainframe Offloading Strategies
MongoDB Breakfast Milan - Mainframe Offloading Strategies
 
NoSQL in the context of Social Web
NoSQL in the context of Social WebNoSQL in the context of Social Web
NoSQL in the context of Social Web
 
NoSQL
NoSQLNoSQL
NoSQL
 
MongoDB in FS
MongoDB in FSMongoDB in FS
MongoDB in FS
 
Webinar: How Banks Manage Reference Data with MongoDB
 Webinar: How Banks Manage Reference Data with MongoDB Webinar: How Banks Manage Reference Data with MongoDB
Webinar: How Banks Manage Reference Data with MongoDB
 

More from Treasure Data, Inc.

GDPR: A Practical Guide for Marketers
GDPR: A Practical Guide for MarketersGDPR: A Practical Guide for Marketers
GDPR: A Practical Guide for Marketers
Treasure Data, Inc.
 
AR and VR by the Numbers: A Data First Approach to the Technology and Market
AR and VR by the Numbers: A Data First Approach to the Technology and MarketAR and VR by the Numbers: A Data First Approach to the Technology and Market
AR and VR by the Numbers: A Data First Approach to the Technology and Market
Treasure Data, Inc.
 
Introduction to Customer Data Platforms
Introduction to Customer Data PlatformsIntroduction to Customer Data Platforms
Introduction to Customer Data Platforms
Treasure Data, Inc.
 
Hands On: Javascript SDK
Hands On: Javascript SDKHands On: Javascript SDK
Hands On: Javascript SDK
Treasure Data, Inc.
 
Hands-On: Managing Slowly Changing Dimensions Using TD Workflow
Hands-On: Managing Slowly Changing Dimensions Using TD WorkflowHands-On: Managing Slowly Changing Dimensions Using TD Workflow
Hands-On: Managing Slowly Changing Dimensions Using TD Workflow
Treasure Data, Inc.
 
Brand Analytics Management: Measuring CLV Across Platforms, Devices and Apps
Brand Analytics Management: Measuring CLV Across Platforms, Devices and AppsBrand Analytics Management: Measuring CLV Across Platforms, Devices and Apps
Brand Analytics Management: Measuring CLV Across Platforms, Devices and Apps
Treasure Data, Inc.
 
How to Power Your Customer Experience with Data
How to Power Your Customer Experience with DataHow to Power Your Customer Experience with Data
How to Power Your Customer Experience with Data
Treasure Data, Inc.
 
Why Your VR Game is Virtually Useless Without Data
Why Your VR Game is Virtually Useless Without DataWhy Your VR Game is Virtually Useless Without Data
Why Your VR Game is Virtually Useless Without Data
Treasure Data, Inc.
 
Connecting the Customer Data Dots
Connecting the Customer Data DotsConnecting the Customer Data Dots
Connecting the Customer Data Dots
Treasure Data, Inc.
 
Harnessing Data for Better Customer Experience and Company Success
Harnessing Data for Better Customer Experience and Company SuccessHarnessing Data for Better Customer Experience and Company Success
Harnessing Data for Better Customer Experience and Company Success
Treasure Data, Inc.
 
Scalable Hadoop in the cloud
Scalable Hadoop in the cloudScalable Hadoop in the cloud
Scalable Hadoop in the cloud
Treasure Data, Inc.
 
Treasure Data: Move your data from MySQL to Redshift with (not much more tha...
Treasure Data:  Move your data from MySQL to Redshift with (not much more tha...Treasure Data:  Move your data from MySQL to Redshift with (not much more tha...
Treasure Data: Move your data from MySQL to Redshift with (not much more tha...
Treasure Data, Inc.
 
Partner webinar presentation aws pebble_treasure_data
Partner webinar presentation aws pebble_treasure_dataPartner webinar presentation aws pebble_treasure_data
Partner webinar presentation aws pebble_treasure_data
Treasure Data, Inc.
 
Introduction to Hivemall
Introduction to HivemallIntroduction to Hivemall
Introduction to Hivemall
Treasure Data, Inc.
 

More from Treasure Data, Inc. (14)

GDPR: A Practical Guide for Marketers
GDPR: A Practical Guide for MarketersGDPR: A Practical Guide for Marketers
GDPR: A Practical Guide for Marketers
 
AR and VR by the Numbers: A Data First Approach to the Technology and Market
AR and VR by the Numbers: A Data First Approach to the Technology and MarketAR and VR by the Numbers: A Data First Approach to the Technology and Market
AR and VR by the Numbers: A Data First Approach to the Technology and Market
 
Introduction to Customer Data Platforms
Introduction to Customer Data PlatformsIntroduction to Customer Data Platforms
Introduction to Customer Data Platforms
 
Hands On: Javascript SDK
Hands On: Javascript SDKHands On: Javascript SDK
Hands On: Javascript SDK
 
Hands-On: Managing Slowly Changing Dimensions Using TD Workflow
Hands-On: Managing Slowly Changing Dimensions Using TD WorkflowHands-On: Managing Slowly Changing Dimensions Using TD Workflow
Hands-On: Managing Slowly Changing Dimensions Using TD Workflow
 
Brand Analytics Management: Measuring CLV Across Platforms, Devices and Apps
Brand Analytics Management: Measuring CLV Across Platforms, Devices and AppsBrand Analytics Management: Measuring CLV Across Platforms, Devices and Apps
Brand Analytics Management: Measuring CLV Across Platforms, Devices and Apps
 
How to Power Your Customer Experience with Data
How to Power Your Customer Experience with DataHow to Power Your Customer Experience with Data
How to Power Your Customer Experience with Data
 
Why Your VR Game is Virtually Useless Without Data
Why Your VR Game is Virtually Useless Without DataWhy Your VR Game is Virtually Useless Without Data
Why Your VR Game is Virtually Useless Without Data
 
Connecting the Customer Data Dots
Connecting the Customer Data DotsConnecting the Customer Data Dots
Connecting the Customer Data Dots
 
Harnessing Data for Better Customer Experience and Company Success
Harnessing Data for Better Customer Experience and Company SuccessHarnessing Data for Better Customer Experience and Company Success
Harnessing Data for Better Customer Experience and Company Success
 
Scalable Hadoop in the cloud
Scalable Hadoop in the cloudScalable Hadoop in the cloud
Scalable Hadoop in the cloud
 
Treasure Data: Move your data from MySQL to Redshift with (not much more tha...
Treasure Data:  Move your data from MySQL to Redshift with (not much more tha...Treasure Data:  Move your data from MySQL to Redshift with (not much more tha...
Treasure Data: Move your data from MySQL to Redshift with (not much more tha...
 
Partner webinar presentation aws pebble_treasure_data
Partner webinar presentation aws pebble_treasure_dataPartner webinar presentation aws pebble_treasure_data
Partner webinar presentation aws pebble_treasure_data
 
Introduction to Hivemall
Introduction to HivemallIntroduction to Hivemall
Introduction to Hivemall
 

Recently uploaded

Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
Safe Software
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
BibashShahi
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
Fwdays
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
Neo4j
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
Jason Yip
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
Miro Wengner
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Pitangent Analytics & Technology Solutions Pvt. Ltd
 
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
Neo4j
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
AstuteBusiness
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid ResearchHarnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
Neo4j
 

Recently uploaded (20)

Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
 
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid ResearchHarnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
 

Augmenting Mongo DB with Treasure Data

  • 1.
  • 3. About Me • A recovering software engineer turned digital artist once interested in fractals; • now into data visualization based on large datasets rendered directly to GPU (RGL, various Python GL libraries, etc.) • it’s easier these days to manipulate large dataset with limited effort
  • 4. Images courtesy of Edureka, 10gen, MongoDB, clipart panda and aperfectworld.org
  • 6. relatively speaking Images courtesy of Edureka and MongoDB
  • 9. “not so strengths” of MongoDB • Horowitz was also very honest about where and how MongoDB is lacking in its current offering – most notably in terms of integration capabilities and some areas of high performance. • “In the relational world you’ve got a few big boxes, in the MongoDB world you could have 2,000 commodity servers, so you need really great management tools for that. That’s a huge problem for us.” • “The other big thing is automation, where you can have automation tools that let you manage very large clusters all from a very simple pane of glass.” http://diginomica.com/2014/11/10/mongodb-cto-mongo-works-doesnt/ From an interview with MongoDB CTO Eliot Horowitz
  • 10. testimony of John R Jensen, Cengage
  • 11. hmmm…moar “not so strengths” ;) of MongoDB • The dreaded “Write Lock” • https://news.ycombinator.com/item?id=1691748 • http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/ - is the data actually relational or not? • Slideshare “where not to use MongoDB” • Slideshare “Hive vs. Cassandra vs. MongoDB” • http://www.slideshare.net/johnrjenson/mongodb-pros-and-cons • “choosing the right NoSQL database” video: https://www.youtube.com/watch?t=34&v=gJFG04Sy6NY • limits of MongoDB: http://docs.mongodb.org/manual/reference/limits/
  • 12.
  • 14. What are the steps to scale?
  • 15. The Hadoop Story on MongoDB
  • 16. Share some use-cases with us, won’t you?
  • 17. Complementing MongoDB • Featurewise? • Elastic Search • Use-Case: Stripe • Opslog feature as queryable log of diffs • Hadoop (good for large scale processing), Hive (Treasure Data) • Use-Case: Wish
  • 18. What’s wish.com? • Mobile eCommerce - world’s largest mobile shopping mall • Top 10 app on iOS & Android • Product discovery/personalization
  • 19. Your Fun, Personalized Shopping Mall
  • 20. Wish & MongoDB • Primary database • 64x mongod • AWS -> bare metal (SSDs ftw!)
  • 21. Architecture App server Wish App fluentd fluentd aggregation server fluentd fluentd aggregation server fluentd Hadoop/Hive
  • 24. Complementing MongoDB • Operationally? • Managing MongoDB is hard (spin up instances with Mongolab) • MongoDB (monitoring products: Ops Manager and MMS) • What’s your pain? What’s are you missing?
  • 25. You know what? Managing ANY data is hard.
  • 26. Treasure Data as a Cloud Way of complementing Mongo DB
  • 27. 6 Batch Imports Streaming Imports Ingestion Batch Processing (Hive, Presto) Storage (Schema-on-read) Console ODBC JDBC AWS S3 PrestogresCLI Luigi-TD FTP GDocs HTTP Put Python PostgreSQL Redshift SFDC Yahoo BDI Tableau Server MySQL … Ingest Analyze Distribute
  • 29. DATA ACCESS > ADVANCED ANALYTICS • Product analytics: data access is a major issue. • “Machine learning” is still simple and “small” in scale (can be done inside Python) • Future work: productized/operationalized machine learning bluetooth iOS/Android SDK Fluentd Python/Pandas SDK Data Science Team (5 people)
  • 30. SCHEMA(LESS) COUNTS • Redshift: lots of co-use cases • Event data is semi- structured → Can be modeled as JSON but schemas change • Treasure Data provides a SQL-accessible, semi- structured data lake. email Source of truth/JSON More intensive data processing Hourly/Daily load Big data mart More interactive data processing Ad hoc queries for new data Ad hoc queries for cached data
  • 31. DATA COLLECTION IS HARD • Want to assume all data is on S3 or HDFS, but reality is murkier. • Sensor readings available as email attachments • Provide data collection tools for 90% of the use cases. Have APIs ready for 10%. GH SCADA email Parse & transform Import via REST Import data as JSON Analyze via SQL Query Results Data-informed maintenance
  • 32. The Sunk Cost Fallacy
  • 33. Some revised scenarios • Revised scenario 1: Using Treasure Data for Ingestion and analytics; exporting results to MongoDB for reporting
  • 34. Some revised scenarios • Revised scenario 2: Ingestion data into MongoDB and exporting to Treasure Data
  • 35. Treasure Data is good for a some of the same things… • less overhead in setup • less - make that practically no - effort to scale • less overhead/effort to use • but -> less fine-tuned control over outcome
  • 36. The Tradeoff Control (sharding and replication) vs. Ease (scaling automatically in the cloud) ?
  • 37. MongoDB is already excellent for a lot of things +
  • 38. Demo!

Editor's Notes

  1. Just a quick review… Sharding strategies: Range sharding - (shard key divided by e.g. device id by range) Hash Sharding: MongoDB applies a MD5 hash on the key when the subkey is used Tag-Aware Sharding: allows a subset of shards to be tagged, and assigned to a sub-range of the shard key
  2. The folks at Edureka did a comparative study of different database types. Breakoff discussion: Let’s talk about what kind of databases we’re using, and for what purposes.
  3. Question to audience: How do mongo and HBase (Plazma?) fall on the boundary between partition tolerance and consistency? (Might consider leaving this slide out
  4. BSON looks like JSON and translate nicely to things like python dictionaries. Working the Mongo prompt is easy but requires mastering another API/paradigm.
  5. What are some other strengths of MongoDB
  6. Managing MongoDB is hard (spin up instances with Mongolab) MongoDB (monitoring products: Ops Manager and MMS)
  7. If you can lose 5sec. worth of updates, a MongoDB replication pair is just fine. If you can lose a day's worth of updates (or can easily reconstruct the database contents from other sources), you can try out pretty much anything without bad repercussions. If you can't lose anything, you're pretty much limited to the most conservative databases (the SQL bunch).
  8. Any places where this process could be problematic? One is a failure before cache is written, during finalize. Another could be a failure during any step of the M-R process.
  9. NOTE: Need transitions on this slide to control how things appear with my story We start in the app which generates the logs. The app synchronously logs to a fluentd running on the same host. There’s no network latency and the load on each local fluentd is trivial, so we’ve never had problems with these getting slow The local fluentd accepts the logs and buffers them on disk for reliability. Periodically, it flushes those buffers to one of the hosts in our fluentd aggregation tier with at-least-once semantics. These run active/active and can be scaled out linearly. They also buffer on disk and periodically flush into Hadoop. As an added bonus, they also flush into S3 for backup. This tier gives us an easy to monitor & manage conduit for our logs to flow through without imposing extra costs on the app. To recap, the logs from our app are buffered by a fluentd on the same host. That reliably forwards to a tier of aggregation fluentds, which forward to Hadoop and S3.
  10. The sunk cost fallacy is the idea that your sunk costs (unrecoverable) create barriers to adjusting your future spending. For example, “I’m hungry. Therefore I should eat that egg salad in the fridge (even if it’s gone bad) because I’ve already spent the money on it (rather than going for fresh food.”