SlideShare a Scribd company logo
Seravia in the Cloud Danny Yang May 7, 2011
Outline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Sample of Data 2.0 Companies
Our data sets
Mongo ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Sphinx ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],sphinxsearch.com
Combining Mongo and Sphinx ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Why cloud? ,[object Object],[object Object],[object Object],[object Object]
Amazon Cloud Advantages ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Amazon Cloud users ,[object Object],[object Object],[object Object]
Amazon Web Services (AWS) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Seravia on AWS WWW Data Crawlware ELB EC2  rails, mongo,  mysql, sphinx S3 EC2 parsing, pentaho, ETL S3 EMR hadoop, hive, BI EC2 S3 EC2  rails, mongo,  mysql, sphinx
WWW architecture ELB EC2  webserver S3 EC2  webserver EC2  mongo EC2  sphinx EC2  mysql EC2  webserver, rails EC2  mongo EC2  sphinx
Data Architecture EC2 post-processing EC2 Parsing, ETL S3 EC2 Parsing, ETL EMR Hadoop, hive, BI EC2 post-processing 1. Raw data – html, xml, text files 2. Pre-processed – unrelated tsv files 3. Analyzed – related tsv files and reports 4. Post-processed  –  json documents EC2 post-processing
Crawlware Architecture EC2 Crawler EC2 Crawler EC2 Controller S3 EC2 Crawler EC2 Crawler EC2 Controller
Summary ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Q & A

More Related Content

What's hot

Deploying On EC2
Deploying On EC2Deploying On EC2
Deploying On EC2
Steve Loughran
 
Open source data ingestion
Open source data ingestionOpen source data ingestion
Open source data ingestion
Treasure Data, Inc.
 
213 event processingtalk-deviewkorea.key
213 event processingtalk-deviewkorea.key213 event processingtalk-deviewkorea.key
213 event processingtalk-deviewkorea.key
NAVER D2
 
Data Science Stack with MongoDB and RStudio
Data Science Stack with MongoDB and RStudioData Science Stack with MongoDB and RStudio
Data Science Stack with MongoDB and RStudio
Winston Chen
 
AWS September Webinar Series - Running Microservices with Amazon EC2 Contain...
AWS September Webinar Series -  Running Microservices with Amazon EC2 Contain...AWS September Webinar Series -  Running Microservices with Amazon EC2 Contain...
AWS September Webinar Series - Running Microservices with Amazon EC2 Contain...
Amazon Web Services
 
Amazon Web Services: Lessons for Architecting Data in the Cloud
Amazon Web Services: Lessons for Architecting Data in the CloudAmazon Web Services: Lessons for Architecting Data in the Cloud
Amazon Web Services: Lessons for Architecting Data in the Cloud
Safe Software
 
Scaling Analytics with elasticsearch
Scaling Analytics with elasticsearchScaling Analytics with elasticsearch
Scaling Analytics with elasticsearch
dnoble00
 
Cascalog at May Bay Area Hadoop User Group
Cascalog at May Bay Area Hadoop User GroupCascalog at May Bay Area Hadoop User Group
Cascalog at May Bay Area Hadoop User Group
nathanmarz
 
What's new in pandas and the SciPy stack for financial users
What's new in pandas and the SciPy stack for financial usersWhat's new in pandas and the SciPy stack for financial users
What's new in pandas and the SciPy stack for financial users
Wes McKinney
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
Ricardo Peres
 
Backbone Conf 2014 - Realtime & Firebase
Backbone Conf 2014 - Realtime & FirebaseBackbone Conf 2014 - Realtime & Firebase
Backbone Conf 2014 - Realtime & Firebase
Clément Wehrung
 
Microservices, Continuous Delivery, and Elasticsearch at Capital One
Microservices, Continuous Delivery, and Elasticsearch at Capital OneMicroservices, Continuous Delivery, and Elasticsearch at Capital One
Microservices, Continuous Delivery, and Elasticsearch at Capital One
Noriaki Tatsumi
 
Elasticsearch in 15 minutes
Elasticsearch in 15 minutesElasticsearch in 15 minutes
Elasticsearch in 15 minutes
David Pilato
 
Hands on experience in real-time data process with AWS Kinesis, Firehose, S3 ...
Hands on experience in real-time data process with AWS Kinesis, Firehose, S3 ...Hands on experience in real-time data process with AWS Kinesis, Firehose, S3 ...
Hands on experience in real-time data process with AWS Kinesis, Firehose, S3 ...
Chuan-Yen Chiang
 
Practical Machine Learning for Smarter Search with Solr and Spark
Practical Machine Learning for Smarter Search with Solr and SparkPractical Machine Learning for Smarter Search with Solr and Spark
Practical Machine Learning for Smarter Search with Solr and Spark
Jake Mannix
 
RubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache Arrow
RubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache ArrowRubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache Arrow
RubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache Arrow
Kouhei Sutou
 
Graph and Neptune
Graph and NeptuneGraph and Neptune
Graph and Neptune
Amazon Web Services
 
Distributed search solutions and comparison
Distributed search   solutions and comparison Distributed search   solutions and comparison
Distributed search solutions and comparison
zingopen
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
Ricardo Peres
 
SAS integration with NoSQL data
SAS integration with NoSQL dataSAS integration with NoSQL data
SAS integration with NoSQL data
Kevin Lee
 

What's hot (20)

Deploying On EC2
Deploying On EC2Deploying On EC2
Deploying On EC2
 
Open source data ingestion
Open source data ingestionOpen source data ingestion
Open source data ingestion
 
213 event processingtalk-deviewkorea.key
213 event processingtalk-deviewkorea.key213 event processingtalk-deviewkorea.key
213 event processingtalk-deviewkorea.key
 
Data Science Stack with MongoDB and RStudio
Data Science Stack with MongoDB and RStudioData Science Stack with MongoDB and RStudio
Data Science Stack with MongoDB and RStudio
 
AWS September Webinar Series - Running Microservices with Amazon EC2 Contain...
AWS September Webinar Series -  Running Microservices with Amazon EC2 Contain...AWS September Webinar Series -  Running Microservices with Amazon EC2 Contain...
AWS September Webinar Series - Running Microservices with Amazon EC2 Contain...
 
Amazon Web Services: Lessons for Architecting Data in the Cloud
Amazon Web Services: Lessons for Architecting Data in the CloudAmazon Web Services: Lessons for Architecting Data in the Cloud
Amazon Web Services: Lessons for Architecting Data in the Cloud
 
Scaling Analytics with elasticsearch
Scaling Analytics with elasticsearchScaling Analytics with elasticsearch
Scaling Analytics with elasticsearch
 
Cascalog at May Bay Area Hadoop User Group
Cascalog at May Bay Area Hadoop User GroupCascalog at May Bay Area Hadoop User Group
Cascalog at May Bay Area Hadoop User Group
 
What's new in pandas and the SciPy stack for financial users
What's new in pandas and the SciPy stack for financial usersWhat's new in pandas and the SciPy stack for financial users
What's new in pandas and the SciPy stack for financial users
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
 
Backbone Conf 2014 - Realtime & Firebase
Backbone Conf 2014 - Realtime & FirebaseBackbone Conf 2014 - Realtime & Firebase
Backbone Conf 2014 - Realtime & Firebase
 
Microservices, Continuous Delivery, and Elasticsearch at Capital One
Microservices, Continuous Delivery, and Elasticsearch at Capital OneMicroservices, Continuous Delivery, and Elasticsearch at Capital One
Microservices, Continuous Delivery, and Elasticsearch at Capital One
 
Elasticsearch in 15 minutes
Elasticsearch in 15 minutesElasticsearch in 15 minutes
Elasticsearch in 15 minutes
 
Hands on experience in real-time data process with AWS Kinesis, Firehose, S3 ...
Hands on experience in real-time data process with AWS Kinesis, Firehose, S3 ...Hands on experience in real-time data process with AWS Kinesis, Firehose, S3 ...
Hands on experience in real-time data process with AWS Kinesis, Firehose, S3 ...
 
Practical Machine Learning for Smarter Search with Solr and Spark
Practical Machine Learning for Smarter Search with Solr and SparkPractical Machine Learning for Smarter Search with Solr and Spark
Practical Machine Learning for Smarter Search with Solr and Spark
 
RubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache Arrow
RubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache ArrowRubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache Arrow
RubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache Arrow
 
Graph and Neptune
Graph and NeptuneGraph and Neptune
Graph and Neptune
 
Distributed search solutions and comparison
Distributed search   solutions and comparison Distributed search   solutions and comparison
Distributed search solutions and comparison
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
 
SAS integration with NoSQL data
SAS integration with NoSQL dataSAS integration with NoSQL data
SAS integration with NoSQL data
 

Similar to Seravia in the Cloud

Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
Amazon Web Services
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
Amazon Web Services
 
Amazon Athena Capabilities and Use Cases Overview
Amazon Athena Capabilities and Use Cases Overview Amazon Athena Capabilities and Use Cases Overview
Amazon Athena Capabilities and Use Cases Overview
Amazon Web Services
 
Database and Analytics on the AWS Cloud - AWS Innovate Toronto
Database and Analytics on the AWS Cloud - AWS Innovate TorontoDatabase and Analytics on the AWS Cloud - AWS Innovate Toronto
Database and Analytics on the AWS Cloud - AWS Innovate Toronto
Amazon Web Services
 
Big Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWSBig Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWS
javier ramirez
 
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Amazon Web Services
 
BDA305 Building Data Lakes and Analytics on AWS
BDA305 Building Data Lakes and Analytics on AWSBDA305 Building Data Lakes and Analytics on AWS
BDA305 Building Data Lakes and Analytics on AWS
Amazon Web Services
 
Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3
Amazon Web Services
 
Why Scale Matters and How the Cloud is Really Different (at scale)
Why Scale Matters and How the Cloud is Really Different (at scale)Why Scale Matters and How the Cloud is Really Different (at scale)
Why Scale Matters and How the Cloud is Really Different (at scale)
Amazon Web Services
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SF
Amazon Web Services
 
ElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learnedElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learned
BeyondTrees
 
(BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS...
(BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS...(BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS...
(BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS...
Amazon Web Services
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
Amazon Web Services
 
Day of Cloud: Amazon EC2
Day of Cloud: Amazon EC2Day of Cloud: Amazon EC2
Day of Cloud: Amazon EC2
cmcavoy
 
Using Data Lakes
Using Data Lakes Using Data Lakes
Using Data Lakes
Amazon Web Services
 
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Amazon Web Services
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...
Amazon Web Services
 
Full Stack Analytics on AWS - AWS Summit Cape Town 2017
Full Stack Analytics on AWS - AWS Summit Cape Town 2017 Full Stack Analytics on AWS - AWS Summit Cape Town 2017
Full Stack Analytics on AWS - AWS Summit Cape Town 2017
Amazon Web Services
 
Builders' Day - Building Data Lakes for Analytics On AWS LC
Builders' Day - Building Data Lakes for Analytics On AWS LCBuilders' Day - Building Data Lakes for Analytics On AWS LC
Builders' Day - Building Data Lakes for Analytics On AWS LC
Amazon Web Services LATAM
 
Semplificare l'analisi dei dati con architetture "Serverless": architetture e...
Semplificare l'analisi dei dati con architetture "Serverless": architetture e...Semplificare l'analisi dei dati con architetture "Serverless": architetture e...
Semplificare l'analisi dei dati con architetture "Serverless": architetture e...
Amazon Web Services
 

Similar to Seravia in the Cloud (20)

Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
Amazon Athena Capabilities and Use Cases Overview
Amazon Athena Capabilities and Use Cases Overview Amazon Athena Capabilities and Use Cases Overview
Amazon Athena Capabilities and Use Cases Overview
 
Database and Analytics on the AWS Cloud - AWS Innovate Toronto
Database and Analytics on the AWS Cloud - AWS Innovate TorontoDatabase and Analytics on the AWS Cloud - AWS Innovate Toronto
Database and Analytics on the AWS Cloud - AWS Innovate Toronto
 
Big Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWSBig Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWS
 
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
 
BDA305 Building Data Lakes and Analytics on AWS
BDA305 Building Data Lakes and Analytics on AWSBDA305 Building Data Lakes and Analytics on AWS
BDA305 Building Data Lakes and Analytics on AWS
 
Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3
 
Why Scale Matters and How the Cloud is Really Different (at scale)
Why Scale Matters and How the Cloud is Really Different (at scale)Why Scale Matters and How the Cloud is Really Different (at scale)
Why Scale Matters and How the Cloud is Really Different (at scale)
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SF
 
ElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learnedElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learned
 
(BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS...
(BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS...(BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS...
(BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS...
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
Day of Cloud: Amazon EC2
Day of Cloud: Amazon EC2Day of Cloud: Amazon EC2
Day of Cloud: Amazon EC2
 
Using Data Lakes
Using Data Lakes Using Data Lakes
Using Data Lakes
 
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...
 
Full Stack Analytics on AWS - AWS Summit Cape Town 2017
Full Stack Analytics on AWS - AWS Summit Cape Town 2017 Full Stack Analytics on AWS - AWS Summit Cape Town 2017
Full Stack Analytics on AWS - AWS Summit Cape Town 2017
 
Builders' Day - Building Data Lakes for Analytics On AWS LC
Builders' Day - Building Data Lakes for Analytics On AWS LCBuilders' Day - Building Data Lakes for Analytics On AWS LC
Builders' Day - Building Data Lakes for Analytics On AWS LC
 
Semplificare l'analisi dei dati con architetture "Serverless": architetture e...
Semplificare l'analisi dei dati con architetture "Serverless": architetture e...Semplificare l'analisi dei dati con architetture "Serverless": architetture e...
Semplificare l'analisi dei dati con architetture "Serverless": architetture e...
 

Recently uploaded

Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
David Brossard
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Things to Consider When Choosing a Website Developer for your Website | FODUU
Things to Consider When Choosing a Website Developer for your Website | FODUUThings to Consider When Choosing a Website Developer for your Website | FODUU
Things to Consider When Choosing a Website Developer for your Website | FODUU
FODUU
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
Wouter Lemaire
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 

Recently uploaded (20)

Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Things to Consider When Choosing a Website Developer for your Website | FODUU
Things to Consider When Choosing a Website Developer for your Website | FODUUThings to Consider When Choosing a Website Developer for your Website | FODUU
Things to Consider When Choosing a Website Developer for your Website | FODUU
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 

Seravia in the Cloud

  • 1. Seravia in the Cloud Danny Yang May 7, 2011
  • 2.
  • 3. Sample of Data 2.0 Companies
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12. Seravia on AWS WWW Data Crawlware ELB EC2 rails, mongo, mysql, sphinx S3 EC2 parsing, pentaho, ETL S3 EMR hadoop, hive, BI EC2 S3 EC2 rails, mongo, mysql, sphinx
  • 13. WWW architecture ELB EC2 webserver S3 EC2 webserver EC2 mongo EC2 sphinx EC2 mysql EC2 webserver, rails EC2 mongo EC2 sphinx
  • 14. Data Architecture EC2 post-processing EC2 Parsing, ETL S3 EC2 Parsing, ETL EMR Hadoop, hive, BI EC2 post-processing 1. Raw data – html, xml, text files 2. Pre-processed – unrelated tsv files 3. Analyzed – related tsv files and reports 4. Post-processed – json documents EC2 post-processing
  • 15. Crawlware Architecture EC2 Crawler EC2 Crawler EC2 Controller S3 EC2 Crawler EC2 Crawler EC2 Controller
  • 16.
  • 17. Q & A