Transcript of "Architecture"

  1. Architecture & Scalability
     An overview of the Palantir Server Architecture
     Akash Jain, Director of Engineering
     © 2008 Palantir Technologies Inc. All rights reserved.
  2. Overview
     • Palantir Server Architecture
       – A fully-featured, enterprise-grade analytic platform
       – Robust, scalable, open and maintainable
     • In this talk
       – Dispatch Server
       – Oracle DB
       – Search Server
       – Job Server
       – Raptor Server
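  3. Server Architecture
     [Diagram: clients connect over HTTPS to the Dispatch Server, which connects to the Revisioning DB (JDBC 3.0 w/ SSL), the Search Server, and the Job Server (HTTPS). Storage: Oracle Database Storage behind the Revisioning DB, Lucene Index Storage behind the Search Server, and Shared Storage holding Job Data and Specs and Job Logs and Results behind the Job Server.]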
  4. Dispatch Server
     • Clients connect here
       – “Gateway to Palantir”
       – Clients can only connect here
     • Connects to database
       – Access control
       – Revisioning database
     • Connects to search and federated search
     • Responsible for job creation and scheduling
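To make the "single gateway" idea concrete, here is a minimal Python sketch of a front door that performs access control before routing a request to a backend service. The `DispatchGateway` class, the `Request` fields and the service names are hypothetical illustrations, not Palantir's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set

@dataclass
class Request:
    user: str
    service: str  # hypothetical routing key, e.g. "search" or "jobs"
    payload: str

@dataclass
class DispatchGateway:
    """Hypothetical single entry point: check access, then route to a backend."""
    acl: Dict[str, Set[str]]  # user -> services that user may reach
    backends: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def register(self, service: str, handler: Callable[[str], str]) -> None:
        self.backends[service] = handler

    def handle(self, req: Request) -> str:
        # Access control runs here, before any backend sees the request.
        if req.service not in self.acl.get(req.user, set()):
            raise PermissionError(f"{req.user} may not use {req.service}")
        handler = self.backends.get(req.service)
        if handler is None:
            raise KeyError(f"no backend registered for {req.service}")
        return handler(req.payload)


gw = DispatchGateway(acl={"alice": {"search"}})
gw.register("search", lambda q: f"results for {q!r}")
gw.register("jobs", lambda spec: f"scheduled {spec!r}")
print(gw.handle(Request("alice", "search", "acme")))  # results for 'acme'
```

Putting the access check in the gateway means backends never see unauthorized requests, which is one plausible reason to make the Dispatch Server the only client-facing entry point.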
  5. Roadmap: Revisioning DB
     [Architecture diagram from slide 3, repeated as a roadmap marker]
  6. Revisioning DB
     • Persistence store
     • Oracle 10g RDBMS
     • Enterprise-grade
       – Scalability
       – Backup and Maintenance
       – Industry Standard
       – Large DBA community
     • JDBC 3.0 with SSL
     [Diagram: Dispatch Server -> JDBC 3.0 w/ SSL -> Revisioning DB -> Oracle Database Storage]
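The slides do not describe how revisioning is implemented. One common pattern, sketched here under that assumption, is an append-only table where every write adds a row tagged with a new revision number, and reads can ask for the state as of any earlier revision. SQLite stands in for Oracle 10g, and the `object_revisions` table and the `write`/`read` helpers are hypothetical names.

```python
import sqlite3
from typing import Optional

# In-memory SQLite stands in for the Oracle RDBMS; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE object_revisions (
           object_id TEXT    NOT NULL,
           revision  INTEGER NOT NULL,
           value     TEXT    NOT NULL,
           PRIMARY KEY (object_id, revision))"""
)

def write(object_id: str, value: str) -> int:
    """Append a row tagged with the next global revision; old rows are never updated."""
    (latest,) = conn.execute(
        "SELECT COALESCE(MAX(revision), 0) FROM object_revisions"
    ).fetchone()
    # Not safe under concurrent writers; a real database would use a sequence.
    conn.execute(
        "INSERT INTO object_revisions (object_id, revision, value) VALUES (?, ?, ?)",
        (object_id, latest + 1, value),
    )
    return latest + 1

def read(object_id: str, as_of: Optional[int] = None) -> Optional[str]:
    """Return the object's value as of a revision (default: the latest revision)."""
    row = conn.execute(
        "SELECT value FROM object_revisions "
        "WHERE object_id = ? AND (? IS NULL OR revision <= ?) "
        "ORDER BY revision DESC LIMIT 1",
        (object_id, as_of, as_of),
    ).fetchone()
    return row[0] if row else None


write("person:1", "Alice")        # revision 1
write("person:2", "Bob")          # revision 2
write("person:1", "Alice Smith")  # revision 3
print(read("person:1"))           # Alice Smith
print(read("person:1", as_of=2))  # Alice
```

Because history is never overwritten, any past revision can be reconstructed, at the cost of a table that only grows.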
  7. Roadmap: Search Server
     [Architecture diagram from slide 3, repeated as a roadmap marker]
  8. Search Server
     • Built on Apache Lucene
       – Leverage text processing capability
       – IR Library -> Enterprise Server
       – Full-text search capability
       – Custom fuzzy search using approxes
     • Why build our own?
       – Flexibility – database agnostic
       – Security – built into indexes
       – Scalability
     [Diagram: Search Server backed by Lucene Index Storage]
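The "security built into indexes" point can be illustrated with a toy inverted index in which each document carries the groups allowed to see it, so permission filtering happens inside the lookup rather than after results leave the server. This is a hypothetical Python sketch of the idea, not Lucene's API or Palantir's implementation; `SecureIndex` and the group names are invented.

```python
from collections import defaultdict
from typing import Dict, List, Set

class SecureIndex:
    """Toy inverted index whose documents carry the groups allowed to see them."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)  # term -> doc ids
        self.acl: Dict[int, Set[str]] = {}                      # doc id -> allowed groups

    def add(self, doc_id: int, text: str, groups: Set[str]) -> None:
        self.acl[doc_id] = set(groups)
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query: str, user_groups: Set[str]) -> List[int]:
        terms = query.lower().split()
        if not terms:
            return []
        # AND semantics: a document must contain every query term.
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        # Security check inside the index: keep documents sharing a group with the user.
        return sorted(d for d in hits if self.acl[d] & user_groups)


idx = SecureIndex()
idx.add(1, "Shipment from Acme", {"analysts"})
idx.add(2, "Acme board meeting", {"executives"})
print(idx.search("acme", {"analysts"}))  # [1]
```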
  9. Clustered Search Scale Parameters
     • Palantir Search Server scales horizontally
     • User scale
       – Number of concurrent requests
     • Data scale
       – Additional corpora/data sources
       – Also includes manually entered data
     [Diagram: Search Server backed by Lucene Index Storage]
  10. Clustered Search Mirroring
      • Mirroring for User Scalability
        – Redundancy across machines
        – Index write requests go to all mirrors
        – Search requests go to one mirror
        – More mirrors -> more concurrent queries
      [Diagram: "Increased Throughput". Search Requests 1 to 6 are each handled by a single Search Mirror, while Index Request A is sent to every mirror; each mirror has its own Lucene Index Storage.]
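A minimal sketch of the mirroring rules above: index writes are applied to every mirror, and each search is served by a single mirror. Round-robin selection is an assumed policy (the slide does not say how a mirror is chosen), and `Mirror` and `MirroredCluster` are hypothetical toy classes holding a term-to-document map.

```python
import itertools
from typing import Dict, List, Set

class Mirror:
    """One search server holding a full copy of the (toy) index: term -> doc ids."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.index: Dict[str, Set[int]] = {}

    def add(self, doc_id: int, term: str) -> None:
        self.index.setdefault(term, set()).add(doc_id)

    def search(self, term: str) -> Set[int]:
        return set(self.index.get(term, set()))

class MirroredCluster:
    def __init__(self, mirrors: List[Mirror]) -> None:
        self.mirrors = mirrors
        self._next = itertools.cycle(mirrors)  # assumed round-robin read policy
        self.served: List[str] = []            # which mirror answered each search

    def index(self, doc_id: int, term: str) -> None:
        # Index writes go to ALL mirrors so every copy stays complete.
        for m in self.mirrors:
            m.add(doc_id, term)

    def search(self, term: str) -> Set[int]:
        # Each search goes to ONE mirror, so more mirrors absorb more concurrent queries.
        mirror = next(self._next)
        self.served.append(mirror.name)
        return mirror.search(term)


cluster = MirroredCluster([Mirror("m1"), Mirror("m2"), Mirror("m3")])
cluster.index(1, "acme")
results = [cluster.search("acme") for _ in range(4)]
print(results, cluster.served)  # [{1}, {1}, {1}, {1}] ['m1', 'm2', 'm3', 'm1']
```

The trade-off is visible in the code: reads scale with the number of mirrors, but every write costs one update per mirror.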
  11. Clustered Search Partitioning
      • Partitioning for Data Scale
        – Split data across many machines
        – Search requests go to all partitions
        – Index write requests go to one partition
        – More partitions -> more data at a constant per-partition index size
      [Diagram: "Increased Index Capacity". Index Requests 1 to 6 each go to a single Search Partition, while Search Request A is sent to every partition (A1, A2, A3, ...); each partition has its own Lucene Index Storage.]
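The partitioning rules invert the mirroring ones: each document is indexed on exactly one partition and every search fans out to all partitions, with the partial results merged. In this hypothetical sketch a CRC32 hash of the document id picks the owning partition; that placement rule is an assumption, not something the slide specifies.

```python
import zlib
from typing import Dict, List, Set

class Partition:
    """One search server holding a slice of the (toy) index: term -> doc ids."""

    def __init__(self) -> None:
        self.index: Dict[str, Set[int]] = {}

    def add(self, doc_id: int, term: str) -> None:
        self.index.setdefault(term, set()).add(doc_id)

    def search(self, term: str) -> Set[int]:
        return set(self.index.get(term, set()))

class PartitionedCluster:
    def __init__(self, n: int) -> None:
        self.partitions: List[Partition] = [Partition() for _ in range(n)]

    def _owner(self, doc_id: int) -> Partition:
        # Assumed placement rule: a stable hash of the id picks one partition.
        return self.partitions[zlib.crc32(str(doc_id).encode()) % len(self.partitions)]

    def index(self, doc_id: int, term: str) -> None:
        # Index writes go to exactly ONE partition.
        self._owner(doc_id).add(doc_id, term)

    def search(self, term: str) -> Set[int]:
        # Searches fan out to ALL partitions; partial results are merged.
        results: Set[int] = set()
        for p in self.partitions:
            results |= p.search(term)
        return results


pcluster = PartitionedCluster(3)
for doc_id in range(9):
    pcluster.index(doc_id, "acme")
print(sorted(pcluster.search("acme")))  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

With the hash spreading documents roughly evenly, each partition holds about 1/n of the data; the trade-off is that every search must touch every partition.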
  12. Roadmap: Job Server
      [Architecture diagram from slide 3, repeated as a roadmap marker]
  13. Job Server
      • The job server runs asynchronous jobs on behalf of clients
        – Bulk data imports
        – Persistent searches
        – LDAP auth syncs
      • Many job servers
      [Diagram: Dispatch Server -> HTTPS -> Job Server -> Shared Storage (Job Data and Specs, Job Logs and Results)]
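The "many job servers" point can be sketched as several workers draining a shared queue of submitted jobs and writing results to shared storage. In this Python sketch, threads stand in for separate job server processes and a dictionary stands in for the shared storage; `run_job_servers` and the job bodies are hypothetical.

```python
import queue
import threading
from typing import Callable, Dict, List

def run_job_servers(jobs: List[Callable[[], str]], n_servers: int) -> Dict[int, str]:
    """Hypothetical sketch: n worker 'job servers' drain one shared job queue."""
    pending: queue.Queue = queue.Queue()
    for job_id, job in enumerate(jobs):
        pending.put((job_id, job))

    shared_storage: Dict[int, str] = {}  # stands in for the shared results store
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                job_id, job = pending.get_nowait()
            except queue.Empty:
                return  # no work left for this server
            result = job()  # runs on a worker thread, not the submitter's thread
            with lock:
                shared_storage[job_id] = result

    servers = [threading.Thread(target=worker) for _ in range(n_servers)]
    for s in servers:
        s.start()
    for s in servers:
        s.join()
    return shared_storage


jobs = [lambda i=i: f"imported batch {i}" for i in range(5)]
storage = run_job_servers(jobs, n_servers=3)
print(storage[4])  # imported batch 4
```

Because every worker pulls from the same queue and writes to the same store, adding job servers increases throughput without any worker needing to know about the others.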
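  14. Systems Diagram
      [Diagram: the Client, on the External Network, connects over HTTPS to the Dispatch Server in the DMZ. The Dispatch Server connects to the Rev DB (JDBC 3.0 w/ SSL), the Search Server, and the Job Server (HTTPS), backed respectively by Oracle Database Storage, Lucene Index Storage, and Shared Storage for Job Data and Specs and Job Logs and Results.]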
  15. Raptor Overview
      • Raptor sits in front of data sources
      • Raptor indexes the data source and answers search queries
      • Raptor monitors changes in your data source and sends them to Palantir
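How Raptor detects changes is not described. A simple approach, shown here purely as an assumed illustration, is to compare two snapshots of the source keyed by primary key and emit insert, delete and update events; production systems often rely on change logs or triggers instead. `diff_snapshots` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]

def diff_snapshots(old: Dict[str, Row], new: Dict[str, Row]) -> List[Tuple[str, str]]:
    """Compare two snapshots keyed by primary key; return (change_type, key) events."""
    changes: List[Tuple[str, str]] = []
    for key in sorted(new.keys() - old.keys()):
        changes.append(("insert", key))
    for key in sorted(old.keys() - new.keys()):
        changes.append(("delete", key))
    for key in sorted(old.keys() & new.keys()):
        if old[key] != new[key]:
            changes.append(("update", key))
    return changes


old = {"1": {"name": "Acme"}, "2": {"name": "Beta"}}
new = {"1": {"name": "Acme Corp"}, "3": {"name": "Gamma"}}
print(diff_snapshots(old, new))  # [('insert', '3'), ('delete', '2'), ('update', '1')]
```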
  16. Federated Search
      • Raptor is Palantir’s federated search server
        – Rich data modeling
        – Extensible searching
        – Highly scalable indexing and search capabilities
      • Leverages
        – Palantir Data Import Pipeline
        – Palantir Clustered Search Server
      • With Raptor:
        – Data owners control data
        – You control performance characteristics
  17. Raptor Query Process
      • Search Query
        – Hits Palantir Search Server
        – Federated to Raptor instances if applicable
        – Supports both keyword search and structured queries
      • Results Collection
        – Results are sorted using relevance from each search
      • Import to Palantir
        – On-The-Fly (OTF) Import
        – Sourcing information retained for each attribute imported
        – Enables full Palantir functionality
      [Diagram: Palantir Query -> Raptor A, Raptor B and Raptor C (each "Searching") -> Result]
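The query flow above (send a query to the Palantir Search Server and any Raptor instances, then sort the combined results by relevance while keeping track of where each hit came from) can be sketched as a k-way merge. This assumes relevance scores from different instances are directly comparable, which real federated systems often have to normalize first; `federated_search`, the instance names and the scores are made up for illustration.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class Hit:
    doc_id: str
    score: float  # relevance as reported by the instance that found the hit
    source: str   # sourcing information: which instance produced the hit

SearchFn = Callable[[str], Dict[str, float]]  # query -> {doc_id: relevance}

def federated_search(query: str, instances: Dict[str, SearchFn], k: int) -> List[Hit]:
    """Send the query to every instance, then merge their hits by relevance."""
    per_instance: List[List[Hit]] = []
    for name, search in instances.items():
        hits = [Hit(doc_id, score, name) for doc_id, score in search(query).items()]
        hits.sort(key=lambda h: h.score, reverse=True)  # each list best-first
        per_instance.append(hits)
    # k-way merge of the best-first lists; assumes scores are comparable across instances.
    merged = heapq.merge(*per_instance, key=lambda h: h.score, reverse=True)
    return list(itertools.islice(merged, k))


# Hypothetical instances standing in for the Palantir Search Server and two Raptors.
instances: Dict[str, SearchFn] = {
    "palantir": lambda q: {"p1": 0.9, "p2": 0.4},
    "raptor-a": lambda q: {"a1": 0.7},
    "raptor-b": lambda q: {"b1": 0.95, "b2": 0.1},
}
for hit in federated_search("acme", instances, k=3):
    print(hit.doc_id, hit.score, hit.source)
```

Keeping `source` on every hit mirrors the slide's point that sourcing information is retained through import.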
  18. Raptor Scale Characteristics
      • Data Scale
        – 100 million row Netflix dataset
        – 10 million document Usenet corpus
        – 1.5 million entity-extracted Wikipedia corpus
      • Indexing Performance
        – 1 million rows/hour structured indexing
        – 500k docs/hour unstructured document indexing
        – 100k docs/hour entity-extracted document indexing
      • Searching Performance
        – Sub-second search processing
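As a rough back-of-the-envelope check, the quoted rates imply the following full-corpus indexing times. This assumes each rate stays constant across the whole dataset, that the Netflix, Usenet and Wikipedia corpora map to the structured, unstructured and entity-extracted rates respectively, and that the 1.5 million Wikipedia figure counts documents; the slides state none of these.

```latex
\begin{aligned}
\text{Netflix: }   & \frac{100{,}000{,}000\ \text{rows}}{1{,}000{,}000\ \text{rows/hour}} = 100\ \text{hours} \\
\text{Usenet: }    & \frac{10{,}000{,}000\ \text{docs}}{500{,}000\ \text{docs/hour}} = 20\ \text{hours} \\
\text{Wikipedia: } & \frac{1{,}500{,}000\ \text{docs}}{100{,}000\ \text{docs/hour}} = 15\ \text{hours} \\
\text{Structured rate: } & \frac{1{,}000{,}000\ \text{rows}}{3600\ \text{s}} \approx 278\ \text{rows/s}
\end{aligned}
```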
  19. Summary
      • Palantir server components support a robust, scalable platform for analysis
      • Leverage enterprise-grade infrastructure
      • Raptor provides further scalability