Building an IoT Streaming Analytics Platform
Corva @ MongoDB.local Houston 2019 – www.corva.ai
Oil & Gas Industry Challenge
Got Data?
How valuable is this?
Oil & Gas Industry Challenge
Mo Data?
Mo Problems?
Where it actually goes
Oil & Gas Industry Challenge
Data Consistency?
Which data source contains more truth?
How to join them for better truth?
Oil & Gas Industry Challenge
Data Quality?
Bad sensor calibration?
Human Error?
Corva Platform: We Are The Real-time Experts
1 Real-time Engineering
Processing large amounts of data in real time is really hard. Corva is at the forefront of real-time
processing of engineering and data models.
2 App Platform
Architected for the future: an app platform ready for your big ideas and the challenges of real-time
machine learning
3 Automated Alerts & 24/7 Operational Support
Point-in-time and data-trend-based alerting system, plus fully staffed 24/7 support teams that handle data quality
checks and operational validation for our clients
Corva powers real-time insight and intelligence
to optimize drilling
Multi-platform Capability
Engineering Apps
Real-time engineering of physics &
data modeling to see downhole
Analytics Apps
Leverage large data sets for powerful
interpretation & decisions
Corva Platform Demo
High Level System Architecture
Drilling Data Pipeline
Drilling Data Pipeline Explained
Drilling Cloud
Real-time engineering, alerts, and
analytics to optimize drilling
50+ apps for
• Monitoring
• Engineering
• Analytics
• Optimization
Why MongoDB
Flexible Schema
• No need to define Schema at collection creation
• Customized Schema per Stream
• Schema can be enriched in the middle of the Stream
{
"_id" : 1,
"timestamp" : 1521234568,
…
"data" : {
"hole_depth" : 10516.0,
"bit_depth" : 10513.8,
"rop" : 230.3,
"ml_predicted_rop" : 231.1,
"ml_optimized_rop" : 270.4
...
},
}
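A minimal sketch of what "enriched in the middle of the Stream" could look like, assuming each record is a plain dictionary shaped like the document above. The function name `enrich_record` is hypothetical; the slides do not show how Corva's apps actually write the extra fields.

```python
import copy


def enrich_record(record, extra_fields):
    """Return a copy of a stream record with extra keys merged into its "data" sub-document."""
    enriched = copy.deepcopy(record)  # leave the upstream record untouched
    enriched.setdefault("data", {}).update(extra_fields)
    return enriched


# Record as it arrives from the sensor stream (values taken from the slide).
raw = {
    "_id": 1,
    "timestamp": 1521234568,
    "data": {"hole_depth": 10516.0, "bit_depth": 10513.8, "rop": 230.3},
}

# A downstream app adds a prediction; no schema change is declared anywhere.
enriched = enrich_record(raw, {"ml_predicted_rop": 231.1})
```

Because the collection has no fixed schema, the enriched document can be stored next to older documents that lack the new field.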
Why MongoDB
Enable Fast Data Growth
• Data growth is at 100GB/day
• Data in/out of our API is around 4TB / day
• One collection with 4.5TB of data
• Ability to add shards and increase storage to keep up with exponential growth
• With a properly tuned index, response time stays near constant as data grows
• Price per unit of storage compared to other solutions
Scale MongoDB
Index
• Design an index for every type of query
• Primary ID + timestamp
• Partial index
• Group query behavior around existing indexes
• Create a new index for a new feature, and consider what can be dropped
• Save on index size, save the world
• Building an index on a 5TB collection takes a while
Shard
• The shard key needs to be indexed. Duh, or is it? What if it doesn't fit?
• Try to reuse the most frequently used index.
• Use a hashed key as the shard key.
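The index and shard bullets above can be sketched as MongoDB command documents. This is an illustration under stated assumptions, not Corva's configuration: the field `asset_id` stands in for the "Primary ID", and the collection and namespace names are hypothetical. `createIndexes` and `shardCollection` are real MongoDB database commands, and `partialFilterExpression` is the real option behind partial indexes.

```python
def compound_index_command(collection, partial_filter=None):
    """Build a createIndexes command for a (primary ID, timestamp) compound index.

    "asset_id" is a hypothetical stand-in for the slide's "Primary ID".
    """
    index = {
        "key": {"asset_id": 1, "timestamp": -1},
        "name": "asset_id_1_timestamp_-1",
    }
    if partial_filter is not None:
        # A partial index only covers documents matching the filter,
        # which keeps the index smaller.
        index["partialFilterExpression"] = partial_filter
        index["name"] += "_partial"
    return {"createIndexes": collection, "indexes": [index]}


def hashed_shard_command(namespace, field):
    """Build a shardCollection command that shards on a hashed key."""
    return {"shardCollection": namespace, "key": {field: "hashed"}}


index_cmd = compound_index_command(
    "drilling_data",  # hypothetical collection name
    partial_filter={"timestamp": {"$gte": 1546300800}},  # only index 2019 onward
)
shard_cmd = hashed_shard_command("exampledb.drilling_data", "asset_id")
```

With a driver such as PyMongo, dictionaries like these could be passed to `Database.command`, with `shardCollection` run against the `admin` database. On a non-empty collection, an index supporting the shard key must exist before sharding, which is the constraint the first Shard bullet refers to.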
Data-Driven Platform Features
Flexible Data Stream
• Data stream architecture that allows custom pipeline configuration
• Able to add custom apps to different streams depending on the customer
• Able to change configurations for each app
• Able to pilot custom apps with very limited system impact
Flexible Schema
• Flexible data-driven API that allows one endpoint to handle
requests to different collections with different data schemas
• /data/{provider}/{collection}/
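A minimal sketch of how a single route of the form `/data/{provider}/{collection}/` could be resolved to a storage target. Only the route template comes from the slide; the allowed characters, the function names, and the `provider.collection` naming scheme are assumptions made for illustration.

```python
import re

# Route template from the slide: /data/{provider}/{collection}/
# The allowed character sets are an assumption.
DATA_ROUTE = re.compile(
    r"^/data/(?P<provider>[A-Za-z0-9_-]+)/(?P<collection>[A-Za-z0-9_-]+)/?$"
)


def resolve_route(path):
    """Extract (provider, collection) from a request path, or raise ValueError."""
    match = DATA_ROUTE.match(path)
    if match is None:
        raise ValueError(f"not a data route: {path!r}")
    return match["provider"], match["collection"]


def storage_collection(provider, collection):
    """Map the pair to a backing collection name (hypothetical naming scheme)."""
    return f"{provider}.{collection}"


provider, collection = resolve_route("/data/acme/drilling/")
target = storage_collection(provider, collection)
```

One handler can then serve every provider and collection pair, since the documents in each collection carry their own schema.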
Potholes along the way
Tuning MongoDB
• Index / Sharding / Aggregation collections
• When to shard, how to shard
Lambda (Smoke and Mirrors of Serverless)
• 500+ million invokes per month
• Cold / Warm invoke & invoke exception behavior
• Logging; the 7 levels of CloudWatch hell
Scale and tune Kafka
• Topics / Partitions / Consumers configuration
Scale API
• Capability comes with responsibility: how to limit what callers can do through the
API (limits, index-only queries)
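One way to read "limits, index-only queries" is as a guard in front of the database: cap the page size and reject filters on fields that have no index. The sketch below follows that interpretation; the cap of 1000, the set of indexed fields, and the name `guard_query` are all hypothetical.

```python
MAX_LIMIT = 1000  # hypothetical cap on documents returned per request
INDEXED_FIELDS = {"asset_id", "timestamp"}  # hypothetical list of indexed fields


def guard_query(filter_doc, limit=None):
    """Validate a caller-supplied filter and clamp the requested limit.

    Only top-level keys are checked, so operator keys such as "$and"
    are rejected as well in this simplified version.
    """
    unindexed = sorted(key for key in filter_doc if key not in INDEXED_FIELDS)
    if unindexed:
        raise ValueError(f"filter uses non-indexed fields: {unindexed}")
    if limit is None or limit > MAX_LIMIT:
        limit = MAX_LIMIT
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return filter_doc, limit


# An oversized request is clamped rather than rejected.
safe_filter, safe_limit = guard_query({"asset_id": 7}, limit=50000)
```

Clamping instead of failing keeps existing clients working while still bounding the cost of any single request.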
Vision for Connected Wells Platform
Empowering users to optimize and improve operations while they happen
Reservoir
Drilling
Completion
Production
Future Plans for MongoDB
IoT Time-series schema design
• Store data in aggregated format in raw collections
• Save on data storage
• Save on index size <- huge impact
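A minimal sketch of storing readings in aggregated form, assuming the slide means grouping many per-second readings into one document per asset per minute (a bucketing approach). The field names `asset_id`, `samples`, and `offset`, and the one-minute bucket size, are assumptions for illustration.

```python
from collections import defaultdict


def bucket_by_minute(readings):
    """Group per-second readings into one document per (asset_id, minute)."""
    buckets = defaultdict(list)
    for reading in readings:
        minute_start = reading["timestamp"] - reading["timestamp"] % 60
        buckets[(reading["asset_id"], minute_start)].append(reading)

    docs = []
    for (asset_id, minute_start), items in sorted(buckets.items()):
        items.sort(key=lambda r: r["timestamp"])
        docs.append({
            "asset_id": asset_id,
            "timestamp": minute_start,  # one indexed timestamp per bucket
            "count": len(items),
            "samples": [
                {"offset": r["timestamp"] - minute_start, "data": r["data"]}
                for r in items
            ],
        })
    return docs


readings = [
    {"asset_id": 1, "timestamp": 1521234568, "data": {"rop": 230.3}},
    {"asset_id": 1, "timestamp": 1521234569, "data": {"rop": 230.9}},
    {"asset_id": 1, "timestamp": 1521234600, "data": {"rop": 231.4}},
]
docs = bucket_by_minute(readings)
```

Here three readings collapse into two documents, so an index on `asset_id` and `timestamp` holds two entries instead of three. At one reading per second, a full minute would shrink from 60 index entries to one, which is where the index-size saving comes from.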
Atlas Stitch
• Serverless functions expose a REST API directly to the Internet
• We utilize Google SSO & Key Auth
• VM? Docker? That’s so 2018…
• Automation of manual data input & workflows
• Automation for parsing PDF/Excel daily reports
Jim Wang (jim.wang@corva.ai)
Learn more at https://www.corva.ai
We are Hiring!!!
The Industry Cloud for Oil & Gas

MongoDB.local Houston 2019: Building an IoT Streaming Analytics Platform to Handle 100,000+ Requests/Min
