Architecture

Published in: Technology

  1. Architecture & Scalability
     An overview of the Palantir Server Architecture
     Akash Jain, Director of Engineering
     © 2008 Palantir Technologies Inc. All rights reserved.
  2. Overview
     • Palantir Server Architecture
       - A fully-featured, enterprise-grade analytic platform
       - Robust, scalable, open and maintainable
     • In this talk
       - Dispatch Server
       - Oracle DB
       - Search Server
       - Job Server
       - Raptor Server
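  3. Server Architecture
     [Diagram: clients reach the Dispatch Server over HTTPS. The Dispatch Server connects to the Revisioning DB (Oracle Database Storage) over JDBC 3.0 w/ SSL, and over HTTPS to the Search Server (Lucene Index Storage) and the Job Server (Shared Storage: Job Data and Specs, Job Logs and Results).]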
  4. Dispatch Server
     • Clients connect here
       - "Gateway to Palantir"
       - Clients can only connect here
     • Connects to database
       - Access control
       - Revisioning database
     • Connects to search and federated search
     • Responsible for job creation and scheduling
  5. Roadmap: Revisioning DB
     [Server architecture diagram repeated.]
  6. Revisioning DB
     • Persistence store
     • Oracle 10g RDBMS
     • Enterprise-grade
       - Scalability
       - Backup and maintenance
       - Industry standard
       - Large DBA community
     • JDBC 3.0 with SSL
     [Diagram: Dispatch Server, connected over JDBC 3.0 w/ SSL to the Revisioning DB, backed by Oracle Database Storage.]
  7. Roadmap: Search Server
     [Server architecture diagram repeated.]
  8. Search Server
     • Built on Apache Lucene
       - Leverage text processing capability
       - IR Library -> Enterprise Server
       - Full-text search capability
       - Custom fuzzy search using approxes
     • Why build our own?
       - Flexibility: database agnostic
       - Security: built into indexes
       - Scalability
     [Diagram: Search Server backed by Lucene Index Storage.]
  9. Clustered Search Scale Parameters
     • Palantir Search Server scales horizontally
     • User scale
       - Number of concurrent requests
     • Data scale
       - Additional corpora/data sources
       - Also includes manually entered data
     [Diagram: Search Server backed by Lucene Index Storage.]
  10. Clustered Search Mirroring
      • Mirroring for User Scalability
        - Redundancy across machines
        - Index write requests go to all mirrors
        - Search requests go to one mirror
        - More mirrors -> more concurrent queries
      [Diagram: Increased Throughput. Search requests are spread across Search Mirrors, each with its own Lucene Index Storage; an index request goes to every mirror.]
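
The mirroring rules on this slide (writes to all mirrors, each search to one mirror) can be illustrated with a minimal in-memory sketch. This is not Palantir's implementation: the `Mirror` and `MirroredSearch` classes, the substring matching, and the round-robin choice of mirror are all assumptions made for illustration.

```python
from itertools import cycle


class Mirror:
    """One search mirror holding a full copy of the index (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.docs = {}  # doc_id -> text; stands in for a Lucene index

    def index(self, doc_id, text):
        self.docs[doc_id] = text

    def search(self, term):
        # Naive substring match, standing in for a real full-text query.
        return sorted(d for d, t in self.docs.items() if term in t.lower())


class MirroredSearch:
    """Index writes fan out to every mirror; each search hits one mirror."""

    def __init__(self, mirrors):
        self.mirrors = list(mirrors)
        # Round-robin selection is an assumption; the slide only says
        # that a search request goes to one mirror.
        self._next = cycle(self.mirrors)

    def index(self, doc_id, text):
        for mirror in self.mirrors:
            mirror.index(doc_id, text)

    def search(self, term):
        mirror = next(self._next)
        return mirror.name, mirror.search(term.lower())


cluster = MirroredSearch(Mirror(f"mirror-{i}") for i in range(3))
cluster.index("a", "Lucene index storage")
cluster.index("b", "Job server logs")

# Four consecutive searches are served by different mirrors in turn.
served_by = [cluster.search("lucene")[0] for _ in range(4)]
print(served_by)
```

Because every mirror holds the full index, adding a mirror adds query capacity but not index capacity.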
  11. Clustered Search Partitioning
      • Partitioning for Data Scale
        - Split data across many machines
        - Search requests go to all partitions
        - Index write requests go to one partition
        - More partitions -> more data with constant index size
      [Diagram: Increased Index Capacity. Index requests are spread across Search Partitions, each with its own Lucene Index Storage; a search request goes to every partition.]
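
Partitioning reverses the mirroring rules: each write lands on one partition, and each search fans out to all of them. The sketch below is a rough illustration under stated assumptions; the `Partition` and `PartitionedSearch` classes and the CRC32-based placement are hypothetical, since the slide does not say how a document is assigned to a partition.

```python
import zlib


class Partition:
    """One search partition holding a slice of the data (hypothetical)."""

    def __init__(self):
        self.docs = {}  # doc_id -> text; stands in for a Lucene index

    def index(self, doc_id, text):
        self.docs[doc_id] = text

    def search(self, term):
        return [d for d, t in self.docs.items() if term in t.lower()]


class PartitionedSearch:
    """Each write goes to one partition; each search goes to all of them."""

    def __init__(self, n):
        self.partitions = [Partition() for _ in range(n)]

    def _owner(self, doc_id):
        # Stable hash placement is an assumption made for this sketch.
        slot = zlib.crc32(doc_id.encode("utf-8")) % len(self.partitions)
        return self.partitions[slot]

    def index(self, doc_id, text):
        self._owner(doc_id).index(doc_id, text)

    def search(self, term):
        hits = []
        for partition in self.partitions:
            hits.extend(partition.search(term.lower()))
        return sorted(hits)


cluster = PartitionedSearch(3)
for i in range(9):
    cluster.index(f"doc-{i}", f"report {i} about lucene")

sizes = [len(p.docs) for p in cluster.partitions]  # documents per partition
all_hits = cluster.search("Lucene")                # gathered from every partition
print(sizes, len(all_hits))
```

Each partition indexes only its own slice, which is how the per-machine index size can stay constant as partitions are added.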
  12. Roadmap: Job Server
      [Server architecture diagram repeated.]
  13. Job Server
      • The job server runs asynchronous jobs on behalf of clients
        - Bulk data imports
        - Persistent searches
        - LDAP auth syncs
      • Many job servers
      [Diagram: Dispatch Server, connected over HTTPS to the Job Server, backed by Shared Storage holding Job Data and Specs and Job Logs and Results.]
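
The idea of running jobs asynchronously on behalf of clients can be sketched with Python's standard thread pool. Everything here is hypothetical: the `JobServer` class, the `bulk_import` job, and the two dictionaries standing in for the shared storage areas named in the diagram. The slides do not describe the real job API.

```python
from concurrent.futures import ThreadPoolExecutor
import itertools


class JobServer:
    """Accepts a job, returns an id at once, and runs the job in the background."""

    def __init__(self, workers=2):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._ids = itertools.count(1)
        self.specs = {}    # stands in for "Job Data and Specs"
        self.results = {}  # stands in for "Job Logs and Results" (holds futures)

    def submit(self, kind, func, *args):
        job_id = next(self._ids)
        self.specs[job_id] = {"kind": kind, "args": args}
        self.results[job_id] = self._pool.submit(func, *args)
        return job_id  # the caller does not wait for the job to finish

    def result(self, job_id, timeout=5):
        # Blocks until the job completes or the timeout expires.
        return self.results[job_id].result(timeout=timeout)

    def shutdown(self):
        self._pool.shutdown(wait=True)


def bulk_import(rows):
    """Hypothetical job body: pretend to import rows, return the count."""
    return len(rows)


server = JobServer()
job_id = server.submit("bulk-import", bulk_import, [("a", 1), ("b", 2), ("c", 3)])
imported = server.result(job_id)
server.shutdown()
print(job_id, imported)
```

Keeping specs and results outside the worker mirrors the diagram's use of shared storage, which is what would let several job servers work from the same queue.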
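  14. Systems Diagram
      [Diagram: a Client on the External Network connects over HTTPS to the Dispatch Server, labelled DMZ. The Dispatch Server connects to the Rev DB (Oracle Database Storage) over JDBC 3.0 w/ SSL, and over HTTPS to the Search Server (Lucene Index Storage) and the Job Server (Shared Storage: Job Data and Specs, Job Logs and Results).]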
  15. Raptor Overview
      • Raptor sits in front of data sources
      • Raptor indexes the data source and answers search queries
      • Raptor monitors changes in your data source and sends them to Palantir
  16. Federated Search
      • Raptor is Palantir's federated search server
        - Rich data modeling
        - Extensible searching
        - Highly scalable indexing and search capabilities
      • Leverages
        - Palantir Data Import Pipeline
        - Palantir Clustered Search Server
      • With Raptor:
        - Data owners control data
        - You control performance characteristics
  17. Raptor Query Process
      • Search Query
        - Hits Palantir Search Server
        - Federated to Raptor instances if applicable
        - Supports both keyword search and structured queries
      • Results Collection
        - Results are sorted using relevance from each search
      • Import to Palantir
        - On-The-Fly (OTF) import
        - Sourcing information retained for each attribute imported
        - Enables full Palantir functionality
      [Diagram: a Query is searched in Palantir and in Raptor A, Raptor B and Raptor C, producing a combined Result.]
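
The query and results-collection stages can be sketched as a fan-out followed by a relevance sort. This is only an illustration: the `Hit` record, the `federated_search` function and the stub backends are hypothetical, and the sketch assumes relevance scores from different backends are directly comparable, which the slide does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    source: str   # which backend produced the hit (sourcing information)
    doc_id: str
    score: float  # relevance as reported by that backend


def federated_search(query, backends):
    """Send the query to every backend and sort all hits by relevance."""
    hits = []
    for source, search in backends.items():
        for doc_id, score in search(query):
            hits.append(Hit(source, doc_id, score))
    # Assumes scores are comparable across backends (not stated in the slides).
    return sorted(hits, key=lambda h: h.score, reverse=True)


# Stub backends returning (doc_id, score) pairs; purely illustrative data.
def palantir_search(query):
    return [("p-1", 0.9), ("p-2", 0.4)]


def raptor_a(query):
    return [("a-7", 0.7)]


def raptor_b(query):
    return []


results = federated_search(
    "netflix",
    {"palantir": palantir_search, "raptor-a": raptor_a, "raptor-b": raptor_b},
)
for hit in results:
    print(hit.source, hit.doc_id, hit.score)
```

Carrying `source` on every hit is one simple way to keep sourcing information available when a result is later imported.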
  18. Raptor Scale Characteristics
      • Data Scale
        - 100 million row Netflix dataset
        - 10 million document usenet corpus
        - 1.5 million entity extracted Wikipedia corpus
      • Indexing Performance
        - 1 million rows/hour structured indexing
        - 500k docs/hour unstructured document indexing
        - 100k docs/hour entity-extracted document indexing
      • Searching Performance
        - Sub-second search processing
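
As a back-of-the-envelope reading of these figures, the arithmetic below divides each dataset size by an indexing rate. The pairing of datasets with rates is an assumption (the slide does not say which rate was measured on which dataset, or on how many machines), so the results are illustrative estimates for a single indexer, not reported numbers.

```python
# (dataset size, assumed indexing rate per hour); the pairing is an assumption.
workloads = {
    "Netflix rows, structured": (100_000_000, 1_000_000),
    "usenet docs, unstructured": (10_000_000, 500_000),
    "Wikipedia docs, entity-extracted": (1_500_000, 100_000),
}

hours = {name: size / rate for name, (size, rate) in workloads.items()}

for name, h in hours.items():
    print(f"{name}: about {h:.0f} hours ({h / 24:.1f} days) at the quoted rate")
```

Under these assumptions the estimates come to roughly 100, 20 and 15 hours respectively.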
  19. Summary
      • Palantir server components support a robust, scalable platform for analysis
      • Leverage enterprise-grade infrastructure
      • Raptor provides further scalability
