May 21, 2016
Using the Cloud to Process
Unstructured Big Data
J on the Beach, Malaga, Spain
RavenPack: Mapping the World’s
Big Data for Financial Applications
Jason Cornez ‒ CTO
jcornez@ravenpack.com
ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90
• RavenPack delivers big data analytics to financial professionals
• Top hedge funds and investment banks use RavenPack
for trading and risk management
• Patented, proprietary technology and award-winning research
• Archive of more than 300 million documents, spanning the past 20 years
RavenPack processes hundreds of thousands of documents each day.
We produce machine readable analytics for each document in real time.
Expected processing time for a typical document is less than 250ms.
RavenPack at a Glance
• Classification Overview
• Realtime Classification: Classic vs Cloud
• Historical Classification: Classic vs Cloud
• New Challenges: Spot Instances and The Weather
• New Opportunities
Contents
Extract meaning from Unstructured Text
• Tokenization
• Entity Detection
• Attribute Tagging
• Event Detection
• Consolidation
A stream-based Classification Framework allows us to add new classifiers to a stream of
documents. Wherever possible, classifiers run in parallel on separate threads.
Classification Overview
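The idea of running independent classifiers in parallel over a document stream can be sketched as follows. This is a minimal illustration, not RavenPack's framework: the names `classify_stream`, `tokenize`, and `count_capitalized` are hypothetical, and the two toy classifiers only stand in for real stages such as tokenization and entity detection.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Iterator

# Assumption: a classifier is any function from document text to annotations.
Classifier = Callable[[str], dict]


def tokenize(text: str) -> dict:
    """Toy tokenizer: splits on whitespace."""
    return {"tokens": text.split()}


def count_capitalized(text: str) -> dict:
    """Toy stand-in for entity detection: counts capitalized tokens."""
    return {"capitalized": sum(1 for t in text.split() if t[:1].isupper())}


def classify_stream(
    docs: Iterable[str],
    classifiers: Dict[str, Classifier],
    max_workers: int = 4,
) -> Iterator[dict]:
    """Run every classifier on each document, one thread-pool task per classifier."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for doc in docs:
            # Submit all classifiers for this document so they run concurrently.
            futures = {name: pool.submit(fn, doc) for name, fn in classifiers.items()}
            # Collect the results for this document before moving to the next one.
            yield {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    stream = ["Apple CEO Tim Cook attends launch"]
    for annotations in classify_stream(
        stream, {"tokenize": tokenize, "capitalized": count_capitalized}
    ):
        print(annotations)
```

Adding a classifier here means adding one more entry to the dictionary; classifiers that depend on another's output would need to be chained rather than submitted side by side.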
• Dictionary of nearly 400,000 entities
• Point-in-time aware
• Rules per entity type
• Extensive entity relationship modeling
• Supports metadata and other hints
• Equivalent terms and stop words
We support: company (Oracle Corp.), organization (European Union), geo-political place
(Spain), currency (US Dollar), nationality (Spanish), people (Barack Obama), commodity
(Crude Oil), position (CEO, President), team (Real Madrid), product (iPhone 6S), and more.
Entity Detection
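A "point-in-time aware" dictionary can be pictured as aliases that carry validity dates, so a term resolves to the entity that owned it on the document's date. The sketch below is an assumption about how such a lookup might look; the `Alias` structure, the `lookup` function, and the "Acme" entries are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Alias:
    term: str
    entity_id: str
    entity_type: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the alias is still valid

    def active_on(self, day: date) -> bool:
        """True if this alias applied on the given day (valid_to is exclusive)."""
        return self.valid_from <= day and (self.valid_to is None or day < self.valid_to)


def lookup(aliases: List[Alias], term: str, day: date) -> List[Alias]:
    """Return the aliases matching `term` that were valid on `day`."""
    key = term.casefold()
    return [a for a in aliases if a.term.casefold() == key and a.active_on(day)]


# Hypothetical dictionary entries: the same name belongs to two entities over time.
ALIASES = [
    Alias("Acme", "E1", "company", date(2000, 1, 1), date(2010, 1, 1)),
    Alias("Acme", "E2", "company", date(2010, 1, 1)),
]

if __name__ == "__main__":
    print([a.entity_id for a in lookup(ALIASES, "acme", date(2005, 6, 1))])
    print([a.entity_id for a in lookup(ALIASES, "acme", date(2012, 6, 1))])
```

A linear scan keeps the example short; a dictionary of several hundred thousand entities would need an index keyed by the normalized term.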
Example: People Detection
• Many people share the same or similar names
• Many people hold various positions at employers across time
• People have one or more nationalities
• People are related to other people
Melanie Griffith files for divorce from Banderas
Mai And Banderas Star In The New The King Of Fighters XIV Trailer
After year out, Tim Cook joins competitive Oregon State running back battle
Apple CEO Tim Cook Attends iPad Pro 9.7 inch Launch at Palo Alto Store
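One simple way to tell the two Tim Cooks in these headlines apart is to score each candidate by how many of its known context words appear in the headline. This is only a sketch of the general idea, not the detection rules described in the talk; the candidate identifiers and context word sets below are hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical candidate profiles: context words associated with each person.
CANDIDATES: Dict[str, Dict[str, Set[str]]] = {
    "tim cook": {
        "tim-cook-executive": {"apple", "ceo", "ipad", "iphone"},
        "tim-cook-athlete": {"oregon", "state", "running", "back"},
    }
}


def disambiguate(name: str, headline: str) -> Optional[str]:
    """Pick the candidate whose context words overlap the headline most.

    Returns None when the name is unknown or no candidate has any overlap.
    """
    words = {w.strip(".,!?").casefold() for w in headline.split()}
    candidates = CANDIDATES.get(name.casefold(), {})
    scored = [(len(context & words), person_id) for person_id, context in candidates.items()]
    best_score, best_id = max(scored, default=(0, None))
    return best_id if best_score > 0 else None


if __name__ == "__main__":
    print(disambiguate(
        "Tim Cook",
        "Apple CEO Tim Cook Attends iPad Pro 9.7 inch Launch at Palo Alto Store",
    ))
    print(disambiguate(
        "Tim Cook",
        "After year out, Tim Cook joins competitive Oregon State running back battle",
    ))
```

A production system would also draw on positions held over time, nationality, and relationships between people, as the bullets above suggest.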
Classic Model
• 6 Servers, 19 KVM virtual machines
• Limited Storage - Expensive to Upgrade
• Multiple Points of Failure
Use Case: Realtime Classification
[Diagram: RDBMS, Collectors, RT Feed, Snapshots, Classifier, Files]
Cloud Model using AWS
• CloudFormation to model the Stack
• Unlimited, Distributed Storage
• Easy redundancy, failover and backup
Use Case: Realtime Classification
[Diagram: Amazon EC2, AWS CloudFormation, Amazon DynamoDB, Amazon S3, Amazon RDS, Amazon CloudSearch, Amazon Redshift, Amazon Kinesis, RT Feed, Snapshots, Classifiers, Collectors]
• Lose central RDBMS → Lose transactions
• S3 great for documents, but no index
• DynamoDB great for index, but...
    - Must manage throughput
    - No foreign keys or integrity constraints
    - Eventual consistency
• Redshift amazing for OLAP, but not OLTP
    - So use Kinesis to stream and then batch
• Schema-free is a myth
Applications are more flexible and scalable, but also more complex.
Cloud Migration Challenges
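The "stream and then batch" pattern can be sketched without any AWS calls: records arriving from a stream are buffered and handed to a bulk loader once enough have accumulated. The `MicroBatcher` class and its `flush` callback are hypothetical; in a real pipeline the callback would typically write the batch to S3 and bulk-load it into the warehouse, which is an assumption about the setup rather than something the slides spell out.

```python
from typing import Callable, List


class MicroBatcher:
    """Buffer individual records and flush them in bulk."""

    def __init__(self, flush: Callable[[List[dict]], None], max_records: int = 500):
        self._flush = flush              # assumption: performs the bulk load
        self._max_records = max_records
        self._buffer: List[dict] = []

    def add(self, record: dict) -> None:
        """Add one streamed record, flushing when the buffer is full."""
        self._buffer.append(record)
        if len(self._buffer) >= self._max_records:
            self.flush()

    def flush(self) -> None:
        """Hand the buffered records to the bulk loader, if there are any."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._flush(batch)


if __name__ == "__main__":
    loaded: List[List[dict]] = []
    batcher = MicroBatcher(loaded.append, max_records=3)
    for i in range(7):
        batcher.add({"doc_id": i})
    batcher.flush()  # flush the final partial batch
    print([len(batch) for batch in loaded])
```

A time-based flush is usually added alongside the size threshold so that quiet periods do not leave records waiting in the buffer.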
Classic Model
• Same Limited Set of Servers, Same RDBMS
• Can affect Realtime System, Backups
• Full archive, 4-6 Classifiers → 6 weeks!
Use Case: History Classification
[Diagram: RDBMS, Files, Classifiers]
Cloud Model using AWS
• Servers on Demand, Distributed Storage
• Independent of Realtime System
• Full archive, 100 Classifiers → 3 days!
Use Case: History Classification
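As a rough back-of-the-envelope check, the two run times can be turned into aggregate throughput figures. The calculation assumes the archive is exactly 300 million documents (the deck says "more than 300 million") and that each run processes every document once; both are simplifying assumptions.

```python
ARCHIVE_DOCS = 300_000_000   # assumption: lower bound quoted earlier in the deck
SECONDS_PER_DAY = 86_400


def docs_per_second(total_docs: int, days: float) -> float:
    """Aggregate throughput needed to process total_docs in the given number of days."""
    return total_docs / (days * SECONDS_PER_DAY)


classic = docs_per_second(ARCHIVE_DOCS, 6 * 7)  # classic model: 6 weeks
cloud = docs_per_second(ARCHIVE_DOCS, 3)        # cloud model: 3 days
speedup = cloud / classic

if __name__ == "__main__":
    print(f"classic: {classic:.0f} docs/s")
    print(f"cloud:   {cloud:.0f} docs/s")
    print(f"speedup: {speedup:.0f}x")
```

Under these assumptions the classic run averages roughly 83 documents per second, the cloud run roughly 1,157, and the wall-clock speedup is 42 days / 3 days = 14x.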
[Diagram: Amazon EC2, AWS CloudFormation, Amazon DynamoDB, Amazon S3, Amazon RDS, Amazon Redshift, multiple Availability Zones, Classifiers, Coordinator]
Classic Model - Clear skies!
• Well-known resources
• Predictable workload
• Predictable behavior
• Stable Behavior
We have full control over the resources.
We expect a service to be started seldom
and to run for a long time without interruption.
The Weather
Cloud Model - Spot Instances
• Bid for unused capacity
• Save money, control costs
• Great for jobs with no specific deadline
• Possible to bid above on-demand rates
Typically pay 1/2 to 1/10 the “on-demand” rates.
We use spot instances for our historical
classification runs.
The Weather
Cloud Model - Warning! Uncertain Conditions
• Someone else’s resources
• Unpredictable behavior
• Easy to move the spot market
We have no control over the resources or who
else might be using them. We expect that a server
can be killed with little notice.
The Weather
Cloud Model - Warning! Uncertain Conditions
• Do work in multiple zones
• Optimize image startup
• Group work into well-defined chunks
• Use on-demand instances for co-ordination
Expect inclement weather and be prepared for it!
Dealing with Bad Weather
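Grouping work into well-defined chunks lets a coordinator simply re-queue a chunk when the spot instance working on it disappears. The simulation below is a sketch under stated assumptions: `make_chunks`, `run_coordinator`, and `flaky_worker_factory` are hypothetical names, the coordinator is imagined to run on an on-demand instance, and `InterruptedError` stands in for a worker being reclaimed.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Chunk = Tuple[int, int]  # half-open range of document indexes: [start, end)


def make_chunks(total_docs: int, chunk_size: int) -> List[Chunk]:
    """Split the archive into fixed-size, independently processable chunks."""
    return [
        (start, min(start + chunk_size, total_docs))
        for start in range(0, total_docs, chunk_size)
    ]


def run_coordinator(
    chunks: List[Chunk],
    process: Callable[[Chunk], int],
    max_attempts: int = 5,
) -> Dict[Chunk, int]:
    """Process every chunk, re-queueing any chunk whose worker was interrupted."""
    pending: Deque[Tuple[Chunk, int]] = deque((chunk, 1) for chunk in chunks)
    done: Dict[Chunk, int] = {}
    while pending:
        chunk, attempt = pending.popleft()
        try:
            done[chunk] = process(chunk)
        except InterruptedError:
            if attempt >= max_attempts:
                raise  # give up on a chunk that keeps failing
            pending.append((chunk, attempt + 1))  # put it back for another worker
    return done


def flaky_worker_factory() -> Callable[[Chunk], int]:
    """Simulated worker that is 'reclaimed' the first time it sees each chunk."""
    seen = set()

    def process(chunk: Chunk) -> int:
        if chunk not in seen:
            seen.add(chunk)
            raise InterruptedError(f"spot instance reclaimed while processing {chunk}")
        return chunk[1] - chunk[0]  # number of documents classified

    return process


if __name__ == "__main__":
    results = run_coordinator(make_chunks(10, 4), flaky_worker_factory())
    print(sum(results.values()), "documents classified in", len(results), "chunks")
```

Re-queueing is only safe if processing a chunk twice is harmless, so chunk output should be written idempotently, for example keyed by the chunk's range.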
Download a Custom “Slice” of Analytics Data
• Provide a Web-API and Web Service
• Let client specify parameters
    - Data Set and Time Range
    - Entities and Events
    - Filters
• Leverage Amazon Redshift and S3
• Compression and Multiple Output Formats
Opportunity: Self-Service Data
[Diagram: Amazon S3, Amazon Redshift, Amazon EC2, Amazon API Gateway]
• Let Clients upload Proprietary Content
to a Private and Secure VPC
• Provision Computing and Storage Resources
on a Per Project Basis
• View Private Analytics in Isolation or Alongside
Standard RavenPack Analytic DataSets
• Everything Goes Away when Project Completes
Opportunity: The RavenPack Cloud
[Diagram: Amazon DynamoDB, Amazon RDS, Amazon S3, Amazon Redshift, Amazon EC2, AWS CloudFormation, Amazon CloudSearch]
Thank you! Gracias!

How do we deploy? From Punched cards to Immutable server patternHow do we deploy? From Punched cards to Immutable server pattern
How do we deploy? From Punched cards to Immutable server pattern
 
Java, Turbocharged
Java, TurbochargedJava, Turbocharged
Java, Turbocharged
 
When Cloud Native meets the Financial Sector
When Cloud Native meets the Financial SectorWhen Cloud Native meets the Financial Sector
When Cloud Native meets the Financial Sector
 
The big data Universe. Literally.
The big data Universe. Literally.The big data Universe. Literally.
The big data Universe. Literally.
 
Streaming to a New Jakarta EE
Streaming to a New Jakarta EEStreaming to a New Jakarta EE
Streaming to a New Jakarta EE
 
The TIPPSS Imperative for IoT - Ensuring Trust, Identity, Privacy, Protection...
The TIPPSS Imperative for IoT - Ensuring Trust, Identity, Privacy, Protection...The TIPPSS Imperative for IoT - Ensuring Trust, Identity, Privacy, Protection...
The TIPPSS Imperative for IoT - Ensuring Trust, Identity, Privacy, Protection...
 
Pushing AI to the Client with WebAssembly and Blazor
Pushing AI to the Client with WebAssembly and BlazorPushing AI to the Client with WebAssembly and Blazor
Pushing AI to the Client with WebAssembly and Blazor
 
Axon Server went RAFTing
Axon Server went RAFTingAxon Server went RAFTing
Axon Server went RAFTing
 
The Six Pitfalls of building a Microservices Architecture (and how to avoid t...
The Six Pitfalls of building a Microservices Architecture (and how to avoid t...The Six Pitfalls of building a Microservices Architecture (and how to avoid t...
The Six Pitfalls of building a Microservices Architecture (and how to avoid t...
 
Madaari : Ordering For The Monkeys
Madaari : Ordering For The MonkeysMadaari : Ordering For The Monkeys
Madaari : Ordering For The Monkeys
 
Servers are doomed to fail
Servers are doomed to failServers are doomed to fail
Servers are doomed to fail
 
Interaction Protocols: It's all about good manners
Interaction Protocols: It's all about good mannersInteraction Protocols: It's all about good manners
Interaction Protocols: It's all about good manners
 
A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...
A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...
A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...
 
Leadership at every level
Leadership at every levelLeadership at every level
Leadership at every level
 
Machine Learning: The Bare Math Behind Libraries
Machine Learning: The Bare Math Behind LibrariesMachine Learning: The Bare Math Behind Libraries
Machine Learning: The Bare Math Behind Libraries
 

Recently uploaded

why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningVitsRangannavar
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsMehedi Hasan Shohan
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfPower Karaoke
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 

Recently uploaded (20)

why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software Solutions
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 

Using the cloud to process unstructured big data by Jason Cornez.

  • 1. Using the Cloud to Process Unstructured Big Data. J on the Beach, Malaga, Spain, May 21, 2016. RavenPack: Mapping the World’s Big Data for Financial Applications. Jason Cornez ‒ CTO, jcornez@ravenpack.com
  • 2. RavenPack at a Glance
      • RavenPack delivers big data analytics to financial professionals
      • Top hedge funds and investment banks use RavenPack for trading and risk management
      • Patented, proprietary technology and award-winning research
      • Archive of more than 300 million documents, spanning the past 20 years
      RavenPack processes hundreds of thousands of documents each day. We produce machine-readable analytics for each document in real time. Expected processing time for a typical document is less than 250 ms.
      ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90
  • 3. Contents
      • Classification Overview
      • Realtime Classification: Classic vs Cloud
      • Historical Classification: Classic vs Cloud
      • New Challenges: Spot Instances and The Weather
      • New Opportunities
  • 4. Classification Overview: Extract meaning from Unstructured Text
      • Tokenization
      • Entity Detection
      • Attribute Tagging
      • Event Detection
      • Consolidation
      A stream-based Classification Framework allows us to add new classifiers into a stream of documents. As much as possible, classifiers use separate threads to run in parallel.
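The stream-based framework on slide 4 can be pictured with a minimal Python sketch. This is not RavenPack's implementation: the `classify_stream` function, the two toy classifiers, and the dictionary-merge used for consolidation are all assumptions made for illustration. It only shows the shape of the idea, where each document in a stream is handed to every registered classifier on its own thread and the results are consolidated into one record.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Iterator, List

# A classifier takes the document text and returns its annotations.
Classifier = Callable[[str], Dict[str, object]]


def tokenize(text: str) -> Dict[str, object]:
    # Toy tokenizer: whitespace split only.
    return {"tokens": text.split()}


def detect_capitalized(text: str) -> Dict[str, object]:
    # Toy stand-in for entity detection: keep capitalized words.
    return {"candidates": [w for w in text.split() if w[:1].isupper()]}


def classify_stream(
    docs: Iterable[str], classifiers: List[Classifier]
) -> Iterator[Dict[str, object]]:
    """Run every classifier on each document in parallel, then consolidate."""
    with ThreadPoolExecutor(max_workers=max(1, len(classifiers))) as pool:
        for doc in docs:
            futures = [pool.submit(c, doc) for c in classifiers]
            merged: Dict[str, object] = {"text": doc}
            for future in futures:
                merged.update(future.result())  # consolidation step
            yield merged


if __name__ == "__main__":
    for record in classify_stream(
        ["Apple hires Tim Cook"], [tokenize, detect_capitalized]
    ):
        print(record)
```

Adding a new classifier in this sketch means appending one more function to the list, which mirrors the claim that classifiers can be added to the stream without changing the others.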
  • 5. Entity Detection
      • Dictionary of nearly 400,000 entities
      • Point-in-time aware
      • Rules per entity type
      • Extensive entity relationship modeling
      • Supports metadata and other hints
      • Equivalent terms and stop words
      We support: company (Oracle Corp.), organization (European Union), geo-political place (Spain), currency (US Dollar), nationality (Spanish), people (Barack Obama), commodity (Crude Oil), position (CEO, President), team (Real Madrid), product (iPhone 6S), and more.
  • 6. Example: People Detection
      • Many people share the same or similar names
      • Many people hold various positions at employers across time
      • People have one or more nationalities
      • People are related to other people
      Example headlines:
      • Melanie Griffith files for divorce from Banderas
      • Mai And Banderas Star In The New The King Of Fighters XIV Trailer
      • After year out, Tim Cook joins competitive Oregon State running back battle
      • Apple CEO Tim Cook Attends iPad Pro 9.7 inch Launch at Palo Alto Store
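The two "Tim Cook" headlines on slide 6 show why a point-in-time dictionary with context hints matters. The sketch below is a simplified illustration, not the patented method: the `PersonRecord` fields, the entity ids, the scoring rule, and the validity date of the second record are all hypothetical. It keeps only records valid on the document date and then prefers the record whose position or employer appears in the headline.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class PersonRecord:
    entity_id: str              # hypothetical identifier
    name: str
    position: str
    employer: str
    valid_from: date
    valid_to: Optional[date]    # None means the role is still current


# Illustrative dictionary entries only.
DICTIONARY: List[PersonRecord] = [
    PersonRecord("P1", "Tim Cook", "CEO", "Apple", date(2011, 8, 24), None),
    # Placeholder start date; the real date is not given in the slides.
    PersonRecord("P2", "Tim Cook", "running back", "Oregon State",
                 date(2015, 1, 1), None),
]


def resolve(headline: str, as_of: date) -> Optional[PersonRecord]:
    """Pick the record valid at `as_of` with the most context hints."""
    text = headline.lower()
    best: Optional[PersonRecord] = None
    best_score = 0
    for rec in DICTIONARY:
        if rec.name.lower() not in text:
            continue
        if as_of < rec.valid_from:
            continue
        if rec.valid_to is not None and as_of > rec.valid_to:
            continue
        # Count how many hints (position, employer) occur in the headline.
        score = sum(h.lower() in text for h in (rec.position, rec.employer))
        if score > best_score:
            best, best_score = rec, score
    return best  # None when no record is both valid and supported by a hint


if __name__ == "__main__":
    when = date(2016, 4, 1)
    print(resolve("Apple CEO Tim Cook Attends iPad Pro 9.7 inch Launch "
                  "at Palo Alto Store", when))
    print(resolve("After year out, Tim Cook joins competitive Oregon State "
                  "running back battle", when))
```

Plain substring matching is used here for brevity; a real detector would work on tokens so that short hints such as "CEO" cannot match inside longer words.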
  • 7. Use Case: Realtime Classification (Classic Model)
      • 6 Servers, 19 KVM virtual machines
      • Limited Storage - Expensive to Upgrade
      • Multiple Points of Failure
      Diagram components: RDBMS, Collectors, RT Feed, Snapshots, Classifier, Files
  • 8. Use Case: Realtime Classification (Cloud Model using AWS)
      • CloudFormation to model the Stack
      • Unlimited, Distributed Storage
      • Easy redundancy, failover and backup
      Diagram components: Amazon EC2, AWS CloudFormation, Amazon DynamoDB, Amazon S3, Amazon RDS, Amazon CloudSearch, Amazon Redshift, Amazon Kinesis, RT Feed, Snapshots, Classifiers, Collectors
  • 9. Cloud Migration Challenges
      • Lose central RDBMS → Lose transactions
      • S3 great for documents, but no index
      • DynamoDB great for index, but...
          • Must manage throughput
          • No foreign keys or integrity constraints
          • Eventual consistency
      • Redshift amazing for OLAP, but not OLTP
          • So use Kinesis to stream and then batch
      • Schema-free is a myth
      Applications are more flexible and scalable, but also more complex.
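The "stream and then batch" point on slide 9 reflects that Redshift favors bulk loads over row-at-a-time writes. A minimal sketch of the batching side is shown below. It deliberately makes no AWS calls: the `MicroBatcher` class, its thresholds, and the `flush` callback are hypothetical stand-ins, and in a real pipeline the callback might write the batch to S3 and trigger a bulk load.

```python
import time
from typing import Callable, List


class MicroBatcher:
    """Buffer streamed records and hand them off in batches."""

    def __init__(
        self,
        flush: Callable[[List[dict]], None],
        max_records: int = 500,          # hypothetical threshold
        max_age_seconds: float = 60.0,   # hypothetical threshold
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._flush = flush
        self._max_records = max_records
        self._max_age = max_age_seconds
        self._clock = clock
        self._buffer: List[dict] = []
        self._oldest = 0.0

    def add(self, record: dict) -> None:
        if not self._buffer:
            self._oldest = self._clock()  # age is measured from first record
        self._buffer.append(record)
        too_many = len(self._buffer) >= self._max_records
        too_old = self._clock() - self._oldest >= self._max_age
        if too_many or too_old:
            self.flush_now()

    def flush_now(self) -> None:
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._flush(batch)


if __name__ == "__main__":
    batcher = MicroBatcher(lambda b: print(f"loading {len(b)} rows"),
                           max_records=3)
    for i in range(7):
        batcher.add({"doc_id": i})
    batcher.flush_now()  # drain the remainder at shutdown
```

The age threshold bounds how stale the analytical store can get when traffic is light, while the size threshold keeps individual loads from growing without limit.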
  • 10. Use Case: History Classification (Classic Model)
      • Same Limited Set of Servers, Same RDBMS
      • Can affect Realtime System, Backups
      • Full archive, 4-6 Classifiers → 6 weeks!
      Diagram components: RDBMS, Files, Classifiers, Classifiers
  • 11. Use Case: History Classification (Cloud Model using AWS)
      • Servers on Demand, Distributed Storage
      • Independent of Realtime System
      • Full archive, 100 Classifiers → 3 days!
      Diagram components: Amazon EC2, AWS CloudFormation, Amazon DynamoDB, Amazon S3, Amazon RDS, Amazon Redshift, Availability Zone, Availability Zone, ..., Classifiers, Coordinator
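The figures on slides 10 and 11 (4-6 classifiers for 6 weeks versus 100 classifiers for 3 days) can be compared with a little arithmetic. The sketch below assumes, without confirmation from the slides, that a classifier is a comparable unit of work in both setups. Under that assumption the cloud run is 14 times faster in wall-clock time while consuming somewhat more total classifier-days.

```python
def wall_clock_speedup(classic_days: float, cloud_days: float) -> float:
    """How many times shorter the cloud run is in elapsed time."""
    return classic_days / cloud_days


def classifier_days(classifiers: int, days: float) -> float:
    """Total work, counted as classifiers multiplied by elapsed days."""
    return classifiers * days


CLASSIC_DAYS = 6 * 7   # "6 weeks" from slide 10
CLOUD_DAYS = 3         # "3 days" from slide 11

if __name__ == "__main__":
    print("speedup:", wall_clock_speedup(CLASSIC_DAYS, CLOUD_DAYS))
    print("classic, 4 classifiers:", classifier_days(4, CLASSIC_DAYS))
    print("classic, 6 classifiers:", classifier_days(6, CLASSIC_DAYS))
    print("cloud, 100 classifiers:", classifier_days(100, CLOUD_DAYS))
```

The classic run works out to 168 to 252 classifier-days and the cloud run to 300, so the gain comes from parallelism rather than from doing less work.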
  • 12. The Weather: Classic Model - Clear skies!
      • Well-known resources
      • Predictable workload
      • Predictable behavior
      • Stable behavior
      We have full control over the resources. We expect a service to be started seldom and to run for a long time without interruption.
  • 13. The Weather: Cloud Model - Spot Instances
      • Bid for unused capacity
      • Save money, control costs
      • Great for jobs with no specific deadline
      • Possible to bid above on-demand rates
      Typically pay 1/2 to 1/10 of the “on-demand” rates. We use spot instances for our historical classification runs.
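To see what "1/2 to 1/10 of the on-demand rates" means for a historical run, here is a small worked example. The hourly rate is a made-up placeholder, and treating the run as 100 instances for 72 hours assumes one classifier per instance, which the slides do not state. Only the 1/2 and 1/10 ratios come from the talk.

```python
def run_cost(instances: int, hours: float, hourly_rate: float) -> float:
    """Cost of keeping `instances` machines running for `hours`."""
    return instances * hours * hourly_rate


ON_DEMAND_RATE = 0.50   # hypothetical price per instance-hour
INSTANCES = 100         # assumption: one classifier per instance
HOURS = 3 * 24          # a 3-day run

on_demand_cost = run_cost(INSTANCES, HOURS, ON_DEMAND_RATE)
spot_cost_high = on_demand_cost / 2    # paying 1/2 of the on-demand rate
spot_cost_low = on_demand_cost / 10    # paying 1/10 of the on-demand rate

if __name__ == "__main__":
    print(f"on-demand: {on_demand_cost:.2f}")
    print(f"spot:      {spot_cost_low:.2f} to {spot_cost_high:.2f}")
```

Whatever the real rate is, the ratio holds: the same 7,200 instance-hours cost between one tenth and one half as much on the spot market, provided the job can tolerate interruptions.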
  • 14. The Weather: Cloud Model - Warning! Uncertain Conditions
      • Someone else’s resources
      • Unpredictable behavior
      • Easy to move the spot market
      We have no control over the resources or who else might be using them. We expect a server can be killed with little notice.
  • 15. Dealing with Bad Weather: Cloud Model - Warning! Uncertain Conditions
      • Do work in multiple zones
      • Optimize image startup
      • Group work into well-defined chunks
      • Use on-demand instances for co-ordination
      Expect inclement weather and be prepared for it!
      Diagram components: Availability Zone
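The advice on slide 15 (well-defined chunks, coordination on on-demand instances) suggests a coordinator that hands out chunks and takes them back when a spot worker disappears. The following is a minimal in-memory sketch under that reading. The `ChunkCoordinator` class, its method names, and the month-based chunk ids are hypothetical, and a real coordinator would persist its state and detect lost workers through heartbeats or termination notices.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List, Optional, Set


class ChunkCoordinator:
    """Track which chunk of the archive each worker is processing."""

    def __init__(self, chunk_ids: Iterable[str]) -> None:
        self._pending: Deque[str] = deque(chunk_ids)
        self._leased: Dict[str, str] = {}   # chunk id -> worker id
        self._done: Set[str] = set()

    def lease(self, worker_id: str) -> Optional[str]:
        """Give the next pending chunk to a worker, or None if none is left."""
        if not self._pending:
            return None
        chunk = self._pending.popleft()
        self._leased[chunk] = worker_id
        return chunk

    def complete(self, worker_id: str, chunk: str) -> None:
        """Mark a chunk done, but only for the worker that holds the lease."""
        if self._leased.get(chunk) == worker_id:
            del self._leased[chunk]
            self._done.add(chunk)

    def worker_lost(self, worker_id: str) -> List[str]:
        """Re-queue every chunk held by a worker that was killed."""
        lost = [c for c, w in self._leased.items() if w == worker_id]
        for chunk in lost:
            del self._leased[chunk]
            self._pending.append(chunk)
        return lost

    def finished(self) -> bool:
        return not self._pending and not self._leased


if __name__ == "__main__":
    coord = ChunkCoordinator(["2015-01", "2015-02"])
    first = coord.lease("spot-a")
    second = coord.lease("spot-b")
    print("requeued:", coord.worker_lost("spot-a"))  # spot-a was reclaimed
    coord.complete("spot-b", second)
    retry = coord.lease("spot-c")
    coord.complete("spot-c", retry)
    print("finished:", coord.finished())
```

Keeping chunks small limits how much work is repeated when an instance is reclaimed, and running the coordinator on an on-demand instance keeps this bookkeeping out of reach of the spot market.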
  • 16. Opportunity: Self-Service Data. Download a Custom “Slice” of Analytics Data
      • Provide a Web-API and Web Service
      • Let client specify parameters
          • Data Set and Time Range
          • Entities and Events
          • Filters
      • Leverage Amazon Redshift and S3
      • Compression and Multiple Output Formats
      Diagram components: Amazon S3, Amazon Redshift, Amazon EC2, Amazon API Gateway
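The parameters listed on slide 16 could be captured in a request object that is validated before any query is run. The sketch below is purely illustrative: the `SliceRequest` fields, the allowed formats, and the compression choices are assumptions, not the actual RavenPack Web-API.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

# Hypothetical option sets; the talk only says "multiple output formats".
ALLOWED_FORMATS = {"csv", "json"}
ALLOWED_COMPRESSION = {"none", "gzip"}


@dataclass
class SliceRequest:
    data_set: str
    start: date
    end: date
    entity_ids: List[str] = field(default_factory=list)
    event_types: List[str] = field(default_factory=list)
    output_format: str = "csv"
    compression: str = "gzip"

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the request is OK."""
        errors: List[str] = []
        if not self.data_set:
            errors.append("data_set is required")
        if self.start > self.end:
            errors.append("start must not be after end")
        if self.output_format not in ALLOWED_FORMATS:
            errors.append(f"unsupported output_format: {self.output_format}")
        if self.compression not in ALLOWED_COMPRESSION:
            errors.append(f"unsupported compression: {self.compression}")
        return errors


if __name__ == "__main__":
    request = SliceRequest(
        data_set="example-dataset",          # placeholder name
        start=date(2015, 1, 1),
        end=date(2015, 12, 31),
        entity_ids=["P1"],                   # placeholder entity id
    )
    print(request.validate())
```

Validating up front keeps malformed requests from reaching the warehouse, where a mistaken time range could mean an expensive scan.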
  • 17. Opportunity: The RavenPack Cloud
      • Let Clients upload Proprietary Content to a Private and Secure VPC
      • Provision Computing and Storage Resources on a Per Project Basis
      • View Private Analytics in Isolation or Alongside Standard RavenPack Analytic DataSets
      • Everything Goes Away when Project Completes
      Diagram components: Amazon DynamoDB, Amazon RDS, Amazon S3, Amazon Redshift, Amazon EC2, AWS CloudFormation, Amazon CloudSearch
  • 18. Thank you! Gracias! Using the Cloud to Process Unstructured Big Data. J on the Beach, Malaga, Spain, May 21, 2016. Jason Cornez ‒ CTO, jcornez@ravenpack.com