SlideShare a Scribd company logo
May 21, 2016
Using the Cloud to Process
Unstructured Big Data
J on the Beach, Malaga, Spain
RavenPack: Mapping the World’s
Big Data for Financial Applications
Jason Cornez ‒ CTO
jcornez@ravenpack.com
2ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90
• RavenPack delivers big data analytics to financial professionals
• Top hedge funds and investment banks use RavenPack
for trading and risk management
• Patented, proprietary technology and award-winning research
• Archive of more than 300 million documents, spanning past 20 years
RavenPack processes hundreds of thousands of documents each day.
We produce machine readable analytics for each document in real time.
Expected processing time for a typical document is less than 250ms.
RavenPack at a Glance
3ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90
• Classification Overview
• Realtime Classification: Classic vs Cloud
• Historical Classification: Classic vs Cloud
• New Challenges: Spot Instances and The Weather
• New Opportunities
Contents
4ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90
Extract meaning from Unstructured Text
• Tokenization
• Entity Detection
• Attribute Tagging
• Event Detection
• Consolidation
A stream-based Classification Framework allow us to add new classifiers into a stream of
documents. As much as possible, classifiers use separate threads to run in parallel.
Classification Overview
5ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90
• Dictionary of nearly 400,000 entities
• Point-in-time aware
• Rules per entity type
• Extensive entity relationship modeling
• Supports metadata and other hints
• Equivalent terms and stop words
We support: company (Oracle Corp.), organization (European Union), geo-political place
(Spain), currency (US Dollar), nationality (Spanish), people (Barack Obama), commodity
(Crude Oil), position (CEO, President), team (Real Madrid), product (iPhone 6S), and more.
Entity Detection
6ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90
Example: People Detection
• Many people share the same or similar names
• Many people hold various positions at employers across time
• People have one or more nationalities
• People are related to other people
Melanie Griffith files for divorce from Banderas
Mai And Banderas Star In The New The King Of Fighters XIV Trailer
After year out, Tim Cook joins competitive Oregon State running back battle
Apple CEO Tim Cook Attends iPad Pro 9.7 inch Launch at Palo Alto Store
7ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90
Classic Model
• 6 Servers, 19 KVM virtual machines
• Limited Storage - Expensive to Upgrade
• Multiple Points of Failure
Use Case: Realtime Classification
RDBMS
Collectors
RT Feed
Snapshots
Classifier
Files
8ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90
Cloud Model using AWS
• CloudFormation to model the Stack
• Unlimited, Distributed Storage
• Easy redundancy, failover and backup
Use Case: Realtime Classification
Amazon
EC2
AWS
CloudFormation
Amazon
DynamoDB
Amazon
S3
Amazon
RDS
Amazon
CloudSearch
Amazon
Redshift
Amazon
Kinesis
RT Feed
Snapshots
ClassifiersCollectors
9ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90
• Lose central RDBMS → Lose transactions
• S3 great for documents, but no index
• DynamoDB great for index, but...
Must manage throughput
No foreign keys or integrity constraints
Eventual consistency
• RedShift amazing for OLAP, but not OLTP
So use Kinesis to stream and then batch
• Schema-free is a myth
Applications are more flexible and scalable, but also more complex.
Cloud Migration Challenges
10ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90
Classic Model
• Same Limited Set of Servers, Same RDBMS
• Can affect Realtime System, Backups
• Full archive, 4-6 Classifiers → 6 weeks!
Use Case: History Classification
RDBMS Files
Classifiers
Classifiers
11ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90
Cloud Model using AWS
• Servers on Demand, Distributed Storage
• Independent of Realtime System
• Full archive, 100 Classifiers → 3 days!
Use Case: History Classification
Amazon
EC2
AWS
CloudFormation
Amazon
DynamoDB
Amazon
S3
Amazon
RDS
Amazon
Redshift
Availability Zone
Availability Zone
...
Classifiers
Coordinator
12ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90
Classic Model - Clear skies!
• Well-known resources
• Predictable workload
• Predictable behavior
• Stable Behavior
We have full control over the resources.
We expect a service to be started seldom
and to run for a long time without interruption.
The Weather
13ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90
Cloud Model - Spot Instances
• Bid for unused capacity
• Save money, control costs
• Great for jobs with no specific deadline
• Possible to bid above on-demand rates
Typically pay 1/2 to 1/10 the “on-demand” rates.
We use spot instances for our historical
classification runs.
The Weather
14ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90
Cloud Model - Warning! Uncertain Conditions
• Someone else’s resources
• Unpredictable behavior
• Easy to move the spot market
We have no control over the resources or who
else might be using them. We expect a server
can be killed with little notice.
The Weather
15ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90
Cloud Model - Warning! Uncertain Conditions
• Do work in multiple zones
• Optimize image startup
• Group work into well-defined chunks
• Use on-demand instances for co-ordination
Expect inclement weather and be prepared for it!
Dealing with Bad Weather
Availability Zone
16ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90
Download a Custom “Slice” of Analytics Data
• Provide a Web-API and Web Service
• Let client specify parameters
Data Set and Time Range
Entities and Events
Filters
• Leverage Amazon RedShift and S3
• Compression and Multiple Output Formats
Opportunity: Self-Service Data
Amazon
S3
Amazon
Redshift
Amazon
EC2
Amazon API
Gateway
17ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90
• Let Clients upload Proprietary Content
to a Private and Secure VPC
• Provision Computing and Storage Resources
on a Per Project Basis
• View Private Analytics in Isolation or Alongside
Standard RavenPack Analytic DataSets
• Everything Goes Away when Project Completes
Opportunity: The RavenPack Cloud
Amazon
DynamoDB
Amazon
RDS
Amazon
S3
Amazon
Redshift
Amazon
EC2
AWS
CloudFormation
Amazon
CloudSearch
May 21, 2016
Using the Cloud to Process
Unstructured Big Data
J on the Beach, Malaga, Spain
Thank you! Gracias!
Jason Cornez ‒ CTO
jcornez@ravenpack.com

More Related Content

What's hot

20131011 - Los Gatos - Netflix - Big Data Design Patterns
20131011 - Los Gatos - Netflix - Big Data Design Patterns20131011 - Los Gatos - Netflix - Big Data Design Patterns
20131011 - Los Gatos - Netflix - Big Data Design Patterns
Allen Day, PhD
 
MapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data PlatformMapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data Platform
MapR Technologies
 
Non-Relational Revolution
Non-Relational RevolutionNon-Relational Revolution
Non-Relational Revolution
Amazon Web Services
 
Insight Platforms Accelerate Digital Transformation
Insight Platforms Accelerate Digital TransformationInsight Platforms Accelerate Digital Transformation
Insight Platforms Accelerate Digital Transformation
MapR Technologies
 
Powering the "As it Happens" Business
Powering the "As it Happens" BusinessPowering the "As it Happens" Business
Powering the "As it Happens" Business
MapR Technologies
 
Deep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningDeep Learning vs. Cheap Learning
Deep Learning vs. Cheap Learning
MapR Technologies
 
Shaping the Future of Travel with MongoDB
Shaping the Future of Travel with MongoDBShaping the Future of Travel with MongoDB
Shaping the Future of Travel with MongoDB
MongoDB
 
Advanced Visual Analytics and Real-time Analytics at Platform scale by Brian ...
Advanced Visual Analytics and Real-time Analytics at Platform scale by Brian ...Advanced Visual Analytics and Real-time Analytics at Platform scale by Brian ...
Advanced Visual Analytics and Real-time Analytics at Platform scale by Brian ...
The Hive
 
When Streaming Becomes Strategic
When Streaming Becomes StrategicWhen Streaming Becomes Strategic
When Streaming Becomes Strategic
MapR Technologies
 
Spark and MapR Streams: A Motivating Example
Spark and MapR Streams: A Motivating ExampleSpark and MapR Streams: A Motivating Example
Spark and MapR Streams: A Motivating Example
Ian Downard
 
TechCrunch Disrupt Hackathon 2012 - CrunchFunnel
TechCrunch Disrupt Hackathon 2012 - CrunchFunnelTechCrunch Disrupt Hackathon 2012 - CrunchFunnel
TechCrunch Disrupt Hackathon 2012 - CrunchFunnel
Elmer Thomas
 
Výběr Big Data platformy - Jan Sovka - IBM
Výběr Big Data platformy - Jan Sovka - IBMVýběr Big Data platformy - Jan Sovka - IBM
Výběr Big Data platformy - Jan Sovka - IBM
Profinit
 
Digital Transformation in a Connected World
Digital Transformation in a Connected WorldDigital Transformation in a Connected World
Digital Transformation in a Connected World
Neo4j
 
End to-end hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl...
End to-end hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl...End to-end hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl...
End to-end hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl...
Mark Rittman
 
Big Data Taiwan 2014 Track2-2: Informatica Big Data Solution
Big Data Taiwan 2014 Track2-2: Informatica Big Data SolutionBig Data Taiwan 2014 Track2-2: Informatica Big Data Solution
Big Data Taiwan 2014 Track2-2: Informatica Big Data Solution
Etu Solution
 
Meruvian - Introduction to MapR
Meruvian - Introduction to MapRMeruvian - Introduction to MapR
Meruvian - Introduction to MapR
The World Bank
 

What's hot (16)

20131011 - Los Gatos - Netflix - Big Data Design Patterns
20131011 - Los Gatos - Netflix - Big Data Design Patterns20131011 - Los Gatos - Netflix - Big Data Design Patterns
20131011 - Los Gatos - Netflix - Big Data Design Patterns
 
MapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data PlatformMapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data Platform
 
Non-Relational Revolution
Non-Relational RevolutionNon-Relational Revolution
Non-Relational Revolution
 
Insight Platforms Accelerate Digital Transformation
Insight Platforms Accelerate Digital TransformationInsight Platforms Accelerate Digital Transformation
Insight Platforms Accelerate Digital Transformation
 
Powering the "As it Happens" Business
Powering the "As it Happens" BusinessPowering the "As it Happens" Business
Powering the "As it Happens" Business
 
Deep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningDeep Learning vs. Cheap Learning
Deep Learning vs. Cheap Learning
 
Shaping the Future of Travel with MongoDB
Shaping the Future of Travel with MongoDBShaping the Future of Travel with MongoDB
Shaping the Future of Travel with MongoDB
 
Advanced Visual Analytics and Real-time Analytics at Platform scale by Brian ...
Advanced Visual Analytics and Real-time Analytics at Platform scale by Brian ...Advanced Visual Analytics and Real-time Analytics at Platform scale by Brian ...
Advanced Visual Analytics and Real-time Analytics at Platform scale by Brian ...
 
When Streaming Becomes Strategic
When Streaming Becomes StrategicWhen Streaming Becomes Strategic
When Streaming Becomes Strategic
 
Spark and MapR Streams: A Motivating Example
Spark and MapR Streams: A Motivating ExampleSpark and MapR Streams: A Motivating Example
Spark and MapR Streams: A Motivating Example
 
TechCrunch Disrupt Hackathon 2012 - CrunchFunnel
TechCrunch Disrupt Hackathon 2012 - CrunchFunnelTechCrunch Disrupt Hackathon 2012 - CrunchFunnel
TechCrunch Disrupt Hackathon 2012 - CrunchFunnel
 
Výběr Big Data platformy - Jan Sovka - IBM
Výběr Big Data platformy - Jan Sovka - IBMVýběr Big Data platformy - Jan Sovka - IBM
Výběr Big Data platformy - Jan Sovka - IBM
 
Digital Transformation in a Connected World
Digital Transformation in a Connected WorldDigital Transformation in a Connected World
Digital Transformation in a Connected World
 
End to-end hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl...
End to-end hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl...End to-end hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl...
End to-end hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl...
 
Big Data Taiwan 2014 Track2-2: Informatica Big Data Solution
Big Data Taiwan 2014 Track2-2: Informatica Big Data SolutionBig Data Taiwan 2014 Track2-2: Informatica Big Data Solution
Big Data Taiwan 2014 Track2-2: Informatica Big Data Solution
 
Meruvian - Introduction to MapR
Meruvian - Introduction to MapRMeruvian - Introduction to MapR
Meruvian - Introduction to MapR
 

Viewers also liked

Exploiting Entity Co-referencing in Unstructured News
Exploiting Entity Co-referencing in Unstructured NewsExploiting Entity Co-referencing in Unstructured News
Exploiting Entity Co-referencing in Unstructured News
ravenpack
 
VOC & Unstructured Data
VOC & Unstructured DataVOC & Unstructured Data
VOC & Unstructured Data
Genex_Insights
 
Big Data: Mejores prácticas en AWS
Big Data: Mejores prácticas en AWSBig Data: Mejores prácticas en AWS
Big Data: Mejores prácticas en AWS
Amazon Web Services
 
Leadership By The Numbers
Leadership By The NumbersLeadership By The Numbers
Leadership By The Numbers
Gordon M. Groat
 
John ryde
John rydeJohn ryde
John ryde
Duzane Oliveira
 
Generating searchable public key ciphertexts with hidden structures for fast ...
Generating searchable public key ciphertexts with hidden structures for fast ...Generating searchable public key ciphertexts with hidden structures for fast ...
Generating searchable public key ciphertexts with hidden structures for fast ...
Pvrtechnologies Nellore
 
Comprar En China, Pagar Seguro Y Crear Tu Propio Negocio ¿Por qué Hacerlo? De...
Comprar En China, Pagar Seguro Y Crear Tu Propio Negocio ¿Por qué Hacerlo? De...Comprar En China, Pagar Seguro Y Crear Tu Propio Negocio ¿Por qué Hacerlo? De...
Comprar En China, Pagar Seguro Y Crear Tu Propio Negocio ¿Por qué Hacerlo? De...
Luis Xavier Torres
 
Competence based education
Competence based educationCompetence based education
Competence based education
Agueda Castillo Sanchez
 
Voronin_CV
Voronin_CVVoronin_CV
Voronin_CV
Konstantin Voronin
 
Escala - Colcci - Party All The Time
Escala - Colcci - Party All The TimeEscala - Colcci - Party All The Time
Escala - Colcci - Party All The Time
slideescala
 
Getting Started with Unstructured Data
Getting Started with Unstructured DataGetting Started with Unstructured Data
Getting Started with Unstructured Data
Christine Connors
 
Queja administrativa
Queja administrativaQueja administrativa
Queja administrativa
Susy Sosa
 
Analysis of ‘Unstructured’ Data
Analysis of ‘Unstructured’ DataAnalysis of ‘Unstructured’ Data
Analysis of ‘Unstructured’ Data
Seth Grimes
 
Drive Insight From Unstructured Data With Endeca
Drive Insight From Unstructured Data With EndecaDrive Insight From Unstructured Data With Endeca
Drive Insight From Unstructured Data With Endeca
KPI Partners
 
MARK 671 Consumer Behaviour Case Study
MARK 671 Consumer Behaviour Case StudyMARK 671 Consumer Behaviour Case Study
MARK 671 Consumer Behaviour Case Study
marhenbun
 
Energy aware load balancing and application scaling for the cloud ecosystem
Energy aware load balancing and application scaling for the cloud ecosystemEnergy aware load balancing and application scaling for the cloud ecosystem
Energy aware load balancing and application scaling for the cloud ecosystem
LeMeniz Infotech
 

Viewers also liked (16)

Exploiting Entity Co-referencing in Unstructured News
Exploiting Entity Co-referencing in Unstructured NewsExploiting Entity Co-referencing in Unstructured News
Exploiting Entity Co-referencing in Unstructured News
 
VOC & Unstructured Data
VOC & Unstructured DataVOC & Unstructured Data
VOC & Unstructured Data
 
Big Data: Mejores prácticas en AWS
Big Data: Mejores prácticas en AWSBig Data: Mejores prácticas en AWS
Big Data: Mejores prácticas en AWS
 
Leadership By The Numbers
Leadership By The NumbersLeadership By The Numbers
Leadership By The Numbers
 
John ryde
John rydeJohn ryde
John ryde
 
Generating searchable public key ciphertexts with hidden structures for fast ...
Generating searchable public key ciphertexts with hidden structures for fast ...Generating searchable public key ciphertexts with hidden structures for fast ...
Generating searchable public key ciphertexts with hidden structures for fast ...
 
Comprar En China, Pagar Seguro Y Crear Tu Propio Negocio ¿Por qué Hacerlo? De...
Comprar En China, Pagar Seguro Y Crear Tu Propio Negocio ¿Por qué Hacerlo? De...Comprar En China, Pagar Seguro Y Crear Tu Propio Negocio ¿Por qué Hacerlo? De...
Comprar En China, Pagar Seguro Y Crear Tu Propio Negocio ¿Por qué Hacerlo? De...
 
Competence based education
Competence based educationCompetence based education
Competence based education
 
Voronin_CV
Voronin_CVVoronin_CV
Voronin_CV
 
Escala - Colcci - Party All The Time
Escala - Colcci - Party All The TimeEscala - Colcci - Party All The Time
Escala - Colcci - Party All The Time
 
Getting Started with Unstructured Data
Getting Started with Unstructured DataGetting Started with Unstructured Data
Getting Started with Unstructured Data
 
Queja administrativa
Queja administrativaQueja administrativa
Queja administrativa
 
Analysis of ‘Unstructured’ Data
Analysis of ‘Unstructured’ DataAnalysis of ‘Unstructured’ Data
Analysis of ‘Unstructured’ Data
 
Drive Insight From Unstructured Data With Endeca
Drive Insight From Unstructured Data With EndecaDrive Insight From Unstructured Data With Endeca
Drive Insight From Unstructured Data With Endeca
 
MARK 671 Consumer Behaviour Case Study
MARK 671 Consumer Behaviour Case StudyMARK 671 Consumer Behaviour Case Study
MARK 671 Consumer Behaviour Case Study
 
Energy aware load balancing and application scaling for the cloud ecosystem
Energy aware load balancing and application scaling for the cloud ecosystemEnergy aware load balancing and application scaling for the cloud ecosystem
Energy aware load balancing and application scaling for the cloud ecosystem
 

Similar to Using the cloud to process unstructured big data by Jason Cornez.

AWS Storage State of the Union & APN Storage Ecosystem
AWS Storage State of the Union & APN Storage EcosystemAWS Storage State of the Union & APN Storage Ecosystem
AWS Storage State of the Union & APN Storage Ecosystem
Amazon Web Services
 
How to Effectively Plan for Disaster Recovery on AWS (CMP204-S) - AWS re:Inve...
How to Effectively Plan for Disaster Recovery on AWS (CMP204-S) - AWS re:Inve...How to Effectively Plan for Disaster Recovery on AWS (CMP204-S) - AWS re:Inve...
How to Effectively Plan for Disaster Recovery on AWS (CMP204-S) - AWS re:Inve...
Amazon Web Services
 
Simplification of storage - The Hot and the Cold of It
Simplification of storage - The Hot and the Cold of ItSimplification of storage - The Hot and the Cold of It
Simplification of storage - The Hot and the Cold of It
Cloudian
 
Leveraging the Power of the Cloud for Your Business to Grow: Nate Taylor at S...
Leveraging the Power of the Cloud for Your Business to Grow: Nate Taylor at S...Leveraging the Power of the Cloud for Your Business to Grow: Nate Taylor at S...
Leveraging the Power of the Cloud for Your Business to Grow: Nate Taylor at S...
smecchk
 
Using Hadoop to Drive Down Fraud for Telcos
Using Hadoop to Drive Down Fraud for TelcosUsing Hadoop to Drive Down Fraud for Telcos
Using Hadoop to Drive Down Fraud for Telcos
Cloudera, Inc.
 
EUT302_Data Ingestion at Seismic Scale Best Practices for Processing Petabyte...
EUT302_Data Ingestion at Seismic Scale Best Practices for Processing Petabyte...EUT302_Data Ingestion at Seismic Scale Best Practices for Processing Petabyte...
EUT302_Data Ingestion at Seismic Scale Best Practices for Processing Petabyte...
Amazon Web Services
 
Don't Let Your Shoppers Drop; 5 Rules for Today’s eCommerce
Don't Let Your Shoppers Drop; 5 Rules for Today’s eCommerceDon't Let Your Shoppers Drop; 5 Rules for Today’s eCommerce
Don't Let Your Shoppers Drop; 5 Rules for Today’s eCommerce
DataStax
 
How Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsHow Data Drives Business at Choice Hotels
How Data Drives Business at Choice Hotels
Cloudera, Inc.
 
Webinar: Overcoming the Top 3 Challenges of the Storage Status Quo
Webinar:  Overcoming the Top 3 Challenges of the Storage Status QuoWebinar:  Overcoming the Top 3 Challenges of the Storage Status Quo
Webinar: Overcoming the Top 3 Challenges of the Storage Status Quo
Storage Switzerland
 
AWS Storage State of the Union
AWS Storage State of the UnionAWS Storage State of the Union
AWS Storage State of the Union
Amazon Web Services
 
AWS Sydney Summit 2013 - Building Web Scale Applications with AWS
AWS Sydney Summit 2013 - Building Web Scale Applications with AWSAWS Sydney Summit 2013 - Building Web Scale Applications with AWS
AWS Sydney Summit 2013 - Building Web Scale Applications with AWS
Amazon Web Services
 
AWS Summit London 2014 | Improving Availability and Lowering Costs (300)
AWS Summit London 2014 | Improving Availability and Lowering Costs (300)AWS Summit London 2014 | Improving Availability and Lowering Costs (300)
AWS Summit London 2014 | Improving Availability and Lowering Costs (300)
Amazon Web Services
 
在 Amazon Web Services 實現大數據應用-電子商務的案例分享
在 Amazon Web Services 實現大數據應用-電子商務的案例分享在 Amazon Web Services 實現大數據應用-電子商務的案例分享
在 Amazon Web Services 實現大數據應用-電子商務的案例分享
Amazon Web Services
 
Datacomm VMWare Hybrid Cloud
Datacomm VMWare Hybrid CloudDatacomm VMWare Hybrid Cloud
Datacomm VMWare Hybrid Cloud
PT Datacomm Diangraha
 
Webinar: End NAS Sprawl - Gain Control Over Unstructured Data
Webinar: End NAS Sprawl - Gain Control Over Unstructured DataWebinar: End NAS Sprawl - Gain Control Over Unstructured Data
Webinar: End NAS Sprawl - Gain Control Over Unstructured Data
Storage Switzerland
 
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
DataStax
 
Immersion Day - Como simplificar o acesso ao seu ambiente analítico
Immersion Day - Como simplificar o acesso ao seu ambiente analíticoImmersion Day - Como simplificar o acesso ao seu ambiente analítico
Immersion Day - Como simplificar o acesso ao seu ambiente analítico
Amazon Web Services LATAM
 
Geocloud blue raster web mapping cloud deployment lessons from the field 201...
Geocloud blue raster web mapping cloud deployment  lessons from the field 201...Geocloud blue raster web mapping cloud deployment  lessons from the field 201...
Geocloud blue raster web mapping cloud deployment lessons from the field 201...
Amazon Web Services
 
NoSQL Application Development with JSON and MapR-DB
NoSQL Application Development with JSON and MapR-DBNoSQL Application Development with JSON and MapR-DB
NoSQL Application Development with JSON and MapR-DB
MapR Technologies
 
Building a modern data platform in the cloud. AWS DevDay Nordics
Building a modern data platform in the cloud. AWS DevDay NordicsBuilding a modern data platform in the cloud. AWS DevDay Nordics
Building a modern data platform in the cloud. AWS DevDay Nordics
javier ramirez
 

Similar to Using the cloud to process unstructured big data by Jason Cornez. (20)

AWS Storage State of the Union & APN Storage Ecosystem
AWS Storage State of the Union & APN Storage EcosystemAWS Storage State of the Union & APN Storage Ecosystem
AWS Storage State of the Union & APN Storage Ecosystem
 
How to Effectively Plan for Disaster Recovery on AWS (CMP204-S) - AWS re:Inve...
How to Effectively Plan for Disaster Recovery on AWS (CMP204-S) - AWS re:Inve...How to Effectively Plan for Disaster Recovery on AWS (CMP204-S) - AWS re:Inve...
How to Effectively Plan for Disaster Recovery on AWS (CMP204-S) - AWS re:Inve...
 
Simplification of storage - The Hot and the Cold of It
Simplification of storage - The Hot and the Cold of ItSimplification of storage - The Hot and the Cold of It
Simplification of storage - The Hot and the Cold of It
 
Leveraging the Power of the Cloud for Your Business to Grow: Nate Taylor at S...
Leveraging the Power of the Cloud for Your Business to Grow: Nate Taylor at S...Leveraging the Power of the Cloud for Your Business to Grow: Nate Taylor at S...
Leveraging the Power of the Cloud for Your Business to Grow: Nate Taylor at S...
 
Using Hadoop to Drive Down Fraud for Telcos
Using Hadoop to Drive Down Fraud for TelcosUsing Hadoop to Drive Down Fraud for Telcos
Using Hadoop to Drive Down Fraud for Telcos
 
EUT302_Data Ingestion at Seismic Scale Best Practices for Processing Petabyte...
EUT302_Data Ingestion at Seismic Scale Best Practices for Processing Petabyte...EUT302_Data Ingestion at Seismic Scale Best Practices for Processing Petabyte...
EUT302_Data Ingestion at Seismic Scale Best Practices for Processing Petabyte...
 
Don't Let Your Shoppers Drop; 5 Rules for Today’s eCommerce
Don't Let Your Shoppers Drop; 5 Rules for Today’s eCommerceDon't Let Your Shoppers Drop; 5 Rules for Today’s eCommerce
Don't Let Your Shoppers Drop; 5 Rules for Today’s eCommerce
 
How Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsHow Data Drives Business at Choice Hotels
How Data Drives Business at Choice Hotels
 
Webinar: Overcoming the Top 3 Challenges of the Storage Status Quo
Webinar:  Overcoming the Top 3 Challenges of the Storage Status QuoWebinar:  Overcoming the Top 3 Challenges of the Storage Status Quo
Webinar: Overcoming the Top 3 Challenges of the Storage Status Quo
 
AWS Storage State of the Union
AWS Storage State of the UnionAWS Storage State of the Union
AWS Storage State of the Union
 
AWS Sydney Summit 2013 - Building Web Scale Applications with AWS
AWS Sydney Summit 2013 - Building Web Scale Applications with AWSAWS Sydney Summit 2013 - Building Web Scale Applications with AWS
AWS Sydney Summit 2013 - Building Web Scale Applications with AWS
 
AWS Summit London 2014 | Improving Availability and Lowering Costs (300)
AWS Summit London 2014 | Improving Availability and Lowering Costs (300)AWS Summit London 2014 | Improving Availability and Lowering Costs (300)
AWS Summit London 2014 | Improving Availability and Lowering Costs (300)
 
在 Amazon Web Services 實現大數據應用-電子商務的案例分享
在 Amazon Web Services 實現大數據應用-電子商務的案例分享在 Amazon Web Services 實現大數據應用-電子商務的案例分享
在 Amazon Web Services 實現大數據應用-電子商務的案例分享
 
Datacomm VMWare Hybrid Cloud
Datacomm VMWare Hybrid CloudDatacomm VMWare Hybrid Cloud
Datacomm VMWare Hybrid Cloud
 
Webinar: End NAS Sprawl - Gain Control Over Unstructured Data
Webinar: End NAS Sprawl - Gain Control Over Unstructured DataWebinar: End NAS Sprawl - Gain Control Over Unstructured Data
Webinar: End NAS Sprawl - Gain Control Over Unstructured Data
 
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
 
Immersion Day - Como simplificar o acesso ao seu ambiente analítico
Immersion Day - Como simplificar o acesso ao seu ambiente analíticoImmersion Day - Como simplificar o acesso ao seu ambiente analítico
Immersion Day - Como simplificar o acesso ao seu ambiente analítico
 
Geocloud blue raster web mapping cloud deployment lessons from the field 201...
Geocloud blue raster web mapping cloud deployment  lessons from the field 201...Geocloud blue raster web mapping cloud deployment  lessons from the field 201...
Geocloud blue raster web mapping cloud deployment lessons from the field 201...
 
NoSQL Application Development with JSON and MapR-DB
NoSQL Application Development with JSON and MapR-DBNoSQL Application Development with JSON and MapR-DB
NoSQL Application Development with JSON and MapR-DB
 
Building a modern data platform in the cloud. AWS DevDay Nordics
Building a modern data platform in the cloud. AWS DevDay NordicsBuilding a modern data platform in the cloud. AWS DevDay Nordics
Building a modern data platform in the cloud. AWS DevDay Nordics
 

More from J On The Beach

Massively scalable ETL in real world applications: the hard way
Massively scalable ETL in real world applications: the hard wayMassively scalable ETL in real world applications: the hard way
Massively scalable ETL in real world applications: the hard way
J On The Beach
 
Big Data On Data You Don’t Have
Big Data On Data You Don’t HaveBig Data On Data You Don’t Have
Big Data On Data You Don’t Have
J On The Beach
 
Acoustic Time Series in Industry 4.0: Improved Reliability and Cyber-Security...
Acoustic Time Series in Industry 4.0: Improved Reliability and Cyber-Security...Acoustic Time Series in Industry 4.0: Improved Reliability and Cyber-Security...
Acoustic Time Series in Industry 4.0: Improved Reliability and Cyber-Security...
J On The Beach
 
Pushing it to the edge in IoT
Pushing it to the edge in IoTPushing it to the edge in IoT
Pushing it to the edge in IoT
J On The Beach
 
Drinking from the firehose, with virtual streams and virtual actors
Drinking from the firehose, with virtual streams and virtual actorsDrinking from the firehose, with virtual streams and virtual actors
Drinking from the firehose, with virtual streams and virtual actors
J On The Beach
 
How do we deploy? From Punched cards to Immutable server pattern
How do we deploy? From Punched cards to Immutable server patternHow do we deploy? From Punched cards to Immutable server pattern
How do we deploy? From Punched cards to Immutable server pattern
J On The Beach
 
Java, Turbocharged
Java, TurbochargedJava, Turbocharged
Java, Turbocharged
J On The Beach
 
When Cloud Native meets the Financial Sector
When Cloud Native meets the Financial SectorWhen Cloud Native meets the Financial Sector
When Cloud Native meets the Financial Sector
J On The Beach
 
The big data Universe. Literally.
The big data Universe. Literally.The big data Universe. Literally.
The big data Universe. Literally.
J On The Beach
 
Streaming to a New Jakarta EE
Streaming to a New Jakarta EEStreaming to a New Jakarta EE
Streaming to a New Jakarta EE
J On The Beach
 
The TIPPSS Imperative for IoT - Ensuring Trust, Identity, Privacy, Protection...
The TIPPSS Imperative for IoT - Ensuring Trust, Identity, Privacy, Protection...The TIPPSS Imperative for IoT - Ensuring Trust, Identity, Privacy, Protection...
The TIPPSS Imperative for IoT - Ensuring Trust, Identity, Privacy, Protection...
J On The Beach
 
Pushing AI to the Client with WebAssembly and Blazor
Pushing AI to the Client with WebAssembly and BlazorPushing AI to the Client with WebAssembly and Blazor
Pushing AI to the Client with WebAssembly and Blazor
J On The Beach
 
Axon Server went RAFTing
Axon Server went RAFTingAxon Server went RAFTing
Axon Server went RAFTing
J On The Beach
 
The Six Pitfalls of building a Microservices Architecture (and how to avoid t...
The Six Pitfalls of building a Microservices Architecture (and how to avoid t...The Six Pitfalls of building a Microservices Architecture (and how to avoid t...
The Six Pitfalls of building a Microservices Architecture (and how to avoid t...
J On The Beach
 
Madaari : Ordering For The Monkeys
Madaari : Ordering For The MonkeysMadaari : Ordering For The Monkeys
Madaari : Ordering For The Monkeys
J On The Beach
 
Servers are doomed to fail
Servers are doomed to failServers are doomed to fail
Servers are doomed to fail
J On The Beach
 
Interaction Protocols: It's all about good manners
Interaction Protocols: It's all about good mannersInteraction Protocols: It's all about good manners
Interaction Protocols: It's all about good manners
J On The Beach
 
A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...
A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...
A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...
J On The Beach
 
Leadership at every level
Leadership at every levelLeadership at every level
Leadership at every level
J On The Beach
 
Machine Learning: The Bare Math Behind Libraries
Machine Learning: The Bare Math Behind LibrariesMachine Learning: The Bare Math Behind Libraries
Machine Learning: The Bare Math Behind Libraries
J On The Beach
 

More from J On The Beach (20)

Massively scalable ETL in real world applications: the hard way
Massively scalable ETL in real world applications: the hard wayMassively scalable ETL in real world applications: the hard way
Massively scalable ETL in real world applications: the hard way
 
Big Data On Data You Don’t Have
Big Data On Data You Don’t HaveBig Data On Data You Don’t Have
Big Data On Data You Don’t Have
 
Acoustic Time Series in Industry 4.0: Improved Reliability and Cyber-Security...
Acoustic Time Series in Industry 4.0: Improved Reliability and Cyber-Security...Acoustic Time Series in Industry 4.0: Improved Reliability and Cyber-Security...
Acoustic Time Series in Industry 4.0: Improved Reliability and Cyber-Security...
 
Pushing it to the edge in IoT
Pushing it to the edge in IoTPushing it to the edge in IoT
Pushing it to the edge in IoT
 
Drinking from the firehose, with virtual streams and virtual actors
Drinking from the firehose, with virtual streams and virtual actorsDrinking from the firehose, with virtual streams and virtual actors
Drinking from the firehose, with virtual streams and virtual actors
 
How do we deploy? From Punched cards to Immutable server pattern
How do we deploy? From Punched cards to Immutable server patternHow do we deploy? From Punched cards to Immutable server pattern
How do we deploy? From Punched cards to Immutable server pattern
 
Java, Turbocharged
Java, TurbochargedJava, Turbocharged
Java, Turbocharged
 
When Cloud Native meets the Financial Sector
When Cloud Native meets the Financial SectorWhen Cloud Native meets the Financial Sector
When Cloud Native meets the Financial Sector
 
The big data Universe. Literally.
The big data Universe. Literally.The big data Universe. Literally.
The big data Universe. Literally.
 
Streaming to a New Jakarta EE
Streaming to a New Jakarta EEStreaming to a New Jakarta EE
Streaming to a New Jakarta EE
 
The TIPPSS Imperative for IoT - Ensuring Trust, Identity, Privacy, Protection...
The TIPPSS Imperative for IoT - Ensuring Trust, Identity, Privacy, Protection...The TIPPSS Imperative for IoT - Ensuring Trust, Identity, Privacy, Protection...
The TIPPSS Imperative for IoT - Ensuring Trust, Identity, Privacy, Protection...
 
Pushing AI to the Client with WebAssembly and Blazor
Pushing AI to the Client with WebAssembly and BlazorPushing AI to the Client with WebAssembly and Blazor
Pushing AI to the Client with WebAssembly and Blazor
 
Axon Server went RAFTing
Axon Server went RAFTingAxon Server went RAFTing
Axon Server went RAFTing
 
The Six Pitfalls of building a Microservices Architecture (and how to avoid t...
The Six Pitfalls of building a Microservices Architecture (and how to avoid t...The Six Pitfalls of building a Microservices Architecture (and how to avoid t...
The Six Pitfalls of building a Microservices Architecture (and how to avoid t...
 
Madaari : Ordering For The Monkeys
Madaari : Ordering For The MonkeysMadaari : Ordering For The Monkeys
Madaari : Ordering For The Monkeys
 
Servers are doomed to fail
Servers are doomed to failServers are doomed to fail
Servers are doomed to fail
 
Interaction Protocols: It's all about good manners
Interaction Protocols: It's all about good mannersInteraction Protocols: It's all about good manners
Interaction Protocols: It's all about good manners
 
A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...
A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...
A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...
 
Leadership at every level
Leadership at every levelLeadership at every level
Leadership at every level
 
Machine Learning: The Bare Math Behind Libraries
Machine Learning: The Bare Math Behind LibrariesMachine Learning: The Bare Math Behind Libraries
Machine Learning: The Bare Math Behind Libraries
 

Recently uploaded

DDS-Security 1.2 - What's New? Stronger security for long-running systems
DDS-Security 1.2 - What's New? Stronger security for long-running systemsDDS-Security 1.2 - What's New? Stronger security for long-running systems
DDS-Security 1.2 - What's New? Stronger security for long-running systems
Gerardo Pardo-Castellote
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
NYGGS Automation Suite
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j
 
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket ManagementUtilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate
 
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
Peter Muessig
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
Boni García
 
socradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdfsocradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdf
SOCRadar
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
Rakesh Kumar R
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
Ayan Halder
 
SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024
Hironori Washizaki
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
Neo4j
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
Łukasz Chruściel
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Adele Miller
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
Shane Coughlan
 
Why Mobile App Regression Testing is Critical for Sustained Success_ A Detail...
Why Mobile App Regression Testing is Critical for Sustained Success_ A Detail...Why Mobile App Regression Testing is Critical for Sustained Success_ A Detail...
Why Mobile App Regression Testing is Critical for Sustained Success_ A Detail...
kalichargn70th171
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
Aftab Hussain
 
Revolutionizing Visual Effects Mastering AI Face Swaps.pdf
Revolutionizing Visual Effects Mastering AI Face Swaps.pdfRevolutionizing Visual Effects Mastering AI Face Swaps.pdf
Revolutionizing Visual Effects Mastering AI Face Swaps.pdf
Undress Baby
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Crescat
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
Green Software Development
 

Recently uploaded (20)

DDS-Security 1.2 - What's New? Stronger security for long-running systems
DDS-Security 1.2 - What's New? Stronger security for long-running systemsDDS-Security 1.2 - What's New? Stronger security for long-running systems
DDS-Security 1.2 - What's New? Stronger security for long-running systems
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
 
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket ManagementUtilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
 
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
 
socradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdfsocradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdf
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
 
SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
 
Why Mobile App Regression Testing is Critical for Sustained Success_ A Detail...
Why Mobile App Regression Testing is Critical for Sustained Success_ A Detail...Why Mobile App Regression Testing is Critical for Sustained Success_ A Detail...
Why Mobile App Regression Testing is Critical for Sustained Success_ A Detail...
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
 
Revolutionizing Visual Effects Mastering AI Face Swaps.pdf
Revolutionizing Visual Effects Mastering AI Face Swaps.pdfRevolutionizing Visual Effects Mastering AI Face Swaps.pdf
Revolutionizing Visual Effects Mastering AI Face Swaps.pdf
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
 

Using the cloud to process unstructured big data by Jason Cornez.

  • 1. May 21, 2016 Using the Cloud to Process Unstructured Big Data J on the Beach, Malaga, Spain RavenPack: Mapping the World’s Big Data for Financial Applications Jason Cornez ‒ CTO jcornez@ravenpack.com
  • 2. 2ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90 • RavenPack delivers big data analytics to financial professionals • Top hedge funds and investment banks use RavenPack for trading and risk management • Patented, proprietary technology and award-winning research • Archive of more than 300 million documents, spanning past 20 years RavenPack processes hundreds of thousands of documents each day. We produce machine readable analytics for each document in real time. Expected processing time for a typical document is less than 250ms. RavenPack at a Glance
  • 3. 3ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90 • Classification Overview • Realtime Classification: Classic vs Cloud • Historical Classification: Classic vs Cloud • New Challenges: Spot Instances and The Weather • New Opportunities Contents
  • 4. 4ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90 Extract meaning from Unstructured Text • Tokenization • Entity Detection • Attribute Tagging • Event Detection • Consolidation A stream-based Classification Framework allow us to add new classifiers into a stream of documents. As much as possible, classifiers use separate threads to run in parallel. Classification Overview
  • 5. 5ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90 • Dictionary of nearly 400,000 entities • Point-in-time aware • Rules per entity type • Extensive entity relationship modeling • Supports metadata and other hints • Equivalent terms and stop words We support: company (Oracle Corp.), organization (European Union), geo-political place (Spain), currency (US Dollar), nationality (Spanish), people (Barack Obama), commodity (Crude Oil), position (CEO, President), team (Real Madrid), product (iPhone 6S), and more. Entity Detection
  • 6. 6ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90 Example: People Detection • Many people share the same or similar names • Many people hold various positions at employers across time • People have one or more nationalities • People are related to other people Melanie Griffith files for divorce from Banderas Mai And Banderas Star In The New The King Of Fighters XIV Trailer After year out, Tim Cook joins competitive Oregon State running back battle Apple CEO Tim Cook Attends iPad Pro 9.7 inch Launch at Palo Alto Store
  • 7. 7ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90 Classic Model • 6 Servers, 19 KVM virtual machines • Limited Storage - Expensive to Upgrade • Multiple Points of Failure Use Case: Realtime Classification RDBMS Collectors RT Feed Snapshots Classifier Files
  • 8. 8ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90 Cloud Model using AWS • CloudFormation to model the Stack • Unlimited, Distributed Storage • Easy redundancy, failover and backup Use Case: Realtime Classification Amazon EC2 AWS CloudFormation Amazon DynamoDB Amazon S3 Amazon RDS Amazon CloudSearch Amazon Redshift Amazon Kinesis RT Feed Snapshots ClassifiersCollectors
  • 9. 9ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90 • Lose central RDBMS → Lose transactions • S3 great for documents, but no index • DynamoDB great for index, but... Must manage throughput No foreign keys or integrity constraints Eventual consistency • RedShift amazing for OLAP, but not OLTP So use Kinesis to stream and then batch • Schema-free is a myth Applications are more flexible and scalable, but also more complex. Cloud Migration Challenges
  • 10. 10ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90 Classic Model • Same Limited Set of Servers, Same RDBMS • Can affect Realtime System, Backups • Full archive, 4-6 Classifiers → 6 weeks! Use Case: History Classification RDBMS Files Classifiers Classifiers
  • 11. 11ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90 Cloud Model using AWS • Servers on Demand, Distributed Storage • Independent of Realtime System • Full archive, 100 Classifiers → 3 days! Use Case: History Classification Amazon EC2 AWS CloudFormation Amazon DynamoDB Amazon S3 Amazon RDS Amazon Redshift Availability Zone Availability Zone ... Classifiers Coordinator
  • 12. 12ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90 Classic Model - Clear skies! • Well-known resources • Predictable workload • Predictable behavior • Stable Behavior We have full control over the resources. We expect a service to be started seldom and to run for a long time without interruption. The Weather
  • 13. 13ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90 Cloud Model - Spot Instances • Bid for unused capacity • Save money, control costs • Great for jobs with no specific deadline • Possible to bid above on-demand rates Typically pay 1/2 to 1/10 the “on-demand” rates. We use spot instances for our historical classification runs. The Weather
  • 14. 14ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90 Cloud Model - Warning! Uncertain Conditions • Someone else’s resources • Unpredictable behavior • Easy to move the spot market We have no control over the resources or who else might be using them. We expect a server can be killed with little notice. The Weather
  • 15. 15ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90 Cloud Model - Warning! Uncertain Conditions • Do work in multiple zones • Optimize image startup • Group work into well-defined chunks • Use on-demand instances for co-ordination Expect inclement weather and be prepared for it! Dealing with Bad Weather Availability Zone
  • 16. 16ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90 Download a Custom “Slice” of Analytics Data • Provide a Web-API and Web Service • Let client specify parameters Data Set and Time Range Entities and Events Filters • Leverage Amazon RedShift and S3 • Compression and Multiple Output Formats Opportunity: Self-Service Data Amazon S3 Amazon Redshift Amazon EC2 Amazon API Gateway
  • 17. 17ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90 • Let Clients upload Proprietary Content to a Private and Secure VPC • Provision Computing and Storage Resources on a Per Project Basis • View Private Analytics in Isolation or Alongside Standard RavenPack Analytic DataSets • Everything Goes Away when Project Completes Opportunity: The RavenPack Cloud Amazon DynamoDB Amazon RDS Amazon S3 Amazon Redshift Amazon EC2 AWS CloudFormation Amazon CloudSearch
  • 18. May 21, 2016 Using the Cloud to Process Unstructured Big Data J on the Beach, Malaga, Spain Thank you! Gracias! Jason Cornez ‒ CTO jcornez@ravenpack.com