Transcript

  • 1. Architecture & Scalability: An overview of the Palantir Server Architecture. Akash Jain, Director of Engineering. © 2008 Palantir Technologies Inc. All rights reserved.
  • 2. Overview
    ◦ Palantir Server Architecture
      – A fully-featured, enterprise-grade analytic platform
      – Robust, scalable, open and maintainable
    ◦ In this talk
      – Dispatch Server
      – Oracle DB
      – Search Server
      – Job Server
      – Raptor Server
  • 3. Server Architecture
    [Diagram. Components: Dispatch Server; Revisioning DB on Oracle Database Storage; Search Server on Lucene Index Storage; Job Server with Job Data and Specs and Job Logs and Results on Shared Storage. Link labels: HTTPS, JDBC 3.0 w/ SSL, HTTPS.]
  • 4. Dispatch Server
    ◦ Clients connect here
      – “Gateway to Palantir”
      – Clients can only connect here
    ◦ Connects to database
      – Access control
      – Revisioning database
    ◦ Connects to search and federated search
    ◦ Responsible for job creation and scheduling
  • 5. Roadmap: Revisioning DB
    [Server architecture diagram repeated from slide 3.]
  • 6. Revisioning DB
    ◦ Persistence store
    ◦ Oracle 10g RDBMS
    ◦ Enterprise-grade
      – Scalability
      – Backup and Maintenance
      – Industry Standard
      – Large DBA community
    ◦ JDBC 3.0 with SSL
    [Diagram: Dispatch Server, linked by JDBC 3.0 w/ SSL to the Revisioning DB, which sits on Oracle Database Storage.]
  • 7. Roadmap: Search Server
    [Server architecture diagram repeated from slide 3.]
  • 8. Search Server
    ◦ Built on Apache Lucene
      – Leverage text processing capability
      – IR Library -> Enterprise Server
      – Full-text search capability
      – Custom fuzzy search using approxes
    ◦ Why build our own?
      – Flexibility: database agnostic
      – Security: built into indexes
      – Scalability
    [Diagram: Search Server on Lucene Index Storage.]
  • 9. Clustered Search: Scale Parameters
    ◦ Palantir Search Server scales horizontally
    ◦ User scale
      – Number of concurrent requests
    ◦ Data scale
      – Additional corpora/data sources
      – Also includes manually entered data
    [Diagram: Search Server on Lucene Index Storage.]
  • 10. Clustered Search: Mirroring
    ◦ Mirroring for User Scalability
      – Redundancy across machines
      – Index write requests go to all mirrors
      – Search requests go to one mirror
      – More mirrors -> more concurrent queries
    [Diagram: Increased Throughput. Search requests and index requests arriving at a row of Search Mirrors, each with its own Lucene Index Storage.]
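The mirroring rules on slide 10 can be sketched in a few lines. This is only an illustration of the routing idea, not Palantir's implementation: the `Mirror` and `MirroredSearch` classes, the in-memory dictionary standing in for a Lucene index, and the round-robin choice of mirror are all assumptions made for the example.

```python
from itertools import cycle


class Mirror:
    """Hypothetical search mirror holding a full copy of the index."""

    def __init__(self, name):
        self.name = name
        self.docs = {}            # doc_id -> text (stand-in for a Lucene index)
        self.searches_served = 0

    def index(self, doc_id, text):
        self.docs[doc_id] = text

    def search(self, term):
        self.searches_served += 1
        return sorted(d for d, t in self.docs.items() if term.lower() in t.lower())


class MirroredSearch:
    """Routes writes to every mirror and each search to a single mirror."""

    def __init__(self, mirrors):
        self.mirrors = list(mirrors)
        self._next_mirror = cycle(self.mirrors)  # round-robin is an assumption

    def index(self, doc_id, text):
        # Index write requests go to all mirrors, so every copy stays complete.
        for mirror in self.mirrors:
            mirror.index(doc_id, text)

    def search(self, term):
        # Search requests go to one mirror, so load spreads across machines.
        return next(self._next_mirror).search(term)


cluster = MirroredSearch([Mirror("m1"), Mirror("m2"), Mirror("m3")])
cluster.index(1, "Dispatch Server")
cluster.index(2, "Search Server")
results = [cluster.search("server") for _ in range(6)]
served = [m.searches_served for m in cluster.mirrors]
```

Because every mirror holds the same documents, any mirror can answer any query, which is why adding mirrors raises the number of concurrent queries the cluster can serve.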
  • 11. Clustered Search: Partitioning
    ◦ Partitioning for Data Scale
      – Split data across many machines
      – Search requests go to all partitions
      – Index write requests go to one partition
      – More partitions -> more data with constant index size
    [Diagram: Increased Index Capacity. Index requests and search requests arriving at a row of Search Partitions, each with its own Lucene Index Storage.]
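Partitioning inverts the mirroring rules: a write lands on one partition, and a search fans out to all of them. The sketch below assumes integer document ids and a simple modulo placement rule. Both are illustrative choices, as are the `Partition` and `PartitionedSearch` names; the slides do not say how documents are assigned to partitions.

```python
class Partition:
    """Hypothetical search partition holding one slice of the data."""

    def __init__(self):
        self.docs = {}  # doc_id -> text (stand-in for a Lucene index)

    def index(self, doc_id, text):
        self.docs[doc_id] = text

    def search(self, term):
        return [d for d, t in self.docs.items() if term.lower() in t.lower()]


class PartitionedSearch:
    """Routes each write to one partition and each search to all partitions."""

    def __init__(self, partition_count):
        self.partitions = [Partition() for _ in range(partition_count)]

    def index(self, doc_id, text):
        # Index write requests go to exactly one partition.
        # Modulo placement is an assumption made for this sketch.
        self.partitions[doc_id % len(self.partitions)].index(doc_id, text)

    def search(self, term):
        # Search requests go to all partitions; the hits are then combined.
        hits = []
        for partition in self.partitions:
            hits.extend(partition.search(term))
        return sorted(hits)


cluster = PartitionedSearch(3)
corpus = {
    0: "netflix ratings",
    1: "usenet post",
    2: "wikipedia article",
    3: "usenet thread",
    4: "netflix titles",
    5: "wikipedia entity",
}
for doc_id, text in corpus.items():
    cluster.index(doc_id, text)

usenet_hits = cluster.search("usenet")
sizes = [len(p.docs) for p in cluster.partitions]
```

Each partition here ends up with a third of the corpus, so adding partitions lets the total data grow while the index on any one machine stays the same size.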
  • 12. Roadmap: Job Server
    [Server architecture diagram repeated from slide 3.]
  • 13. Job Server
    ◦ The job server runs asynchronous jobs on behalf of clients
      – Bulk data imports
      – Persistent searches
      – LDAP auth syncs
    ◦ Many job servers
    [Diagram: Dispatch Server, linked by HTTPS to the Job Server, which keeps Job Data and Specs and Job Logs and Results on Shared Storage.]
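"Asynchronous jobs on behalf of clients" means the caller gets a handle back right away and collects the result later. A minimal sketch of that pattern, using Python's standard `concurrent.futures` thread pool, follows. The `JobServer` class, its methods, and the `bulk_import` job are hypothetical names invented for the example; the real job server's interface is not described in the slides.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


class JobServer:
    """Hypothetical job server: accepts work, returns a job id immediately."""

    def __init__(self, workers=2):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._ids = itertools.count(1)
        self._futures = {}  # job_id -> Future

    def submit(self, func, *args):
        # The job is queued and runs in the background; the caller is not blocked.
        job_id = next(self._ids)
        self._futures[job_id] = self._pool.submit(func, *args)
        return job_id

    def result(self, job_id, timeout=None):
        # Blocks until the job finishes (or the timeout expires).
        return self._futures[job_id].result(timeout=timeout)

    def shutdown(self):
        self._pool.shutdown(wait=True)


def bulk_import(rows):
    """Stand-in for a bulk data import; returns the number of rows imported."""
    return len(rows)


server = JobServer()
job_id = server.submit(bulk_import, [{"name": "a"}, {"name": "b"}, {"name": "c"}])
imported = server.result(job_id, timeout=5)
server.shutdown()
```

In the architecture described in the talk, job specs, logs, and results live on shared storage rather than in process memory, which is what would let many job servers work from the same queue.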
  • 14. Systems Diagram
    [Diagram: a Client on the External Network connects over HTTPS to the Dispatch Server in the DMZ. Behind it sit the Rev DB on Oracle Database Storage, the Search Server on Lucene Index Storage, and the Job Server with Job Data and Specs and Job Logs and Results on Shared Storage. Link labels: JDBC 3.0 w/ SSL, HTTPS.]
  • 15. Raptor Overview
    ◦ Raptor sits in front of data sources
    ◦ Raptor indexes the data source and answers search queries
    ◦ Raptor monitors changes in your data source and sends them to Palantir
  • 16. Federated Search
    ◦ Raptor is Palantir’s federated search server
      – Rich data modeling
      – Extensible searching
      – Highly scalable indexing and search capabilities
    ◦ Leverages
      – Palantir Data Import Pipeline
      – Palantir Clustered Search Server
    ◦ With Raptor:
      – Data owners control data
      – You control performance characteristics
  • 17. Raptor Query Process
    ◦ Search Query
      – Hits Palantir Search Server
      – Federated to Raptor Instances if applicable
      – Supports both keyword search and structured queries
    ◦ Results Collection
      – Results are sorted using relevance from each search
    ◦ Import to Palantir
      – On-The-Fly (OTF) Import
      – Sourcing information retained for each attribute imported
      – Enables full Palantir functionality
    [Diagram: a Query fans out to Searching Palantir, Searching Raptor A, Searching Raptor B, and Searching Raptor C, which feed a single Result.]
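The query and collection steps on slide 17 amount to a fan-out followed by a merge. The sketch below assumes each backend returns `(doc_id, score)` pairs and that scores from different backends are directly comparable. Both are simplifying assumptions, and `Hit`, `federated_search`, and the backend names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    source: str    # which backend produced the hit (sourcing is retained)
    doc_id: str
    score: float   # relevance as reported by that backend


def federated_search(query, backends):
    """Send the query to every backend and merge hits by relevance.

    `backends` maps a backend name to a callable returning (doc_id, score)
    pairs. Treating scores as comparable across backends is an assumption.
    """
    hits = []
    for name, search in backends.items():
        for doc_id, score in search(query):
            hits.append(Hit(source=name, doc_id=doc_id, score=score))
    return sorted(hits, key=lambda h: h.score, reverse=True)


# Canned backends standing in for the Palantir Search Server and two Raptors.
backends = {
    "palantir": lambda q: [("p1", 0.90), ("p2", 0.40)],
    "raptor_a": lambda q: [("a1", 0.70)],
    "raptor_b": lambda q: [("b1", 0.95)],
}

merged = federated_search("example query", backends)
ranked_ids = [h.doc_id for h in merged]
ranked_sources = [h.source for h in merged]
```

Keeping the `source` on every hit is what makes a later on-the-fly import possible without losing track of where each result came from.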
  • 18. Raptor Scale Characteristics
    ◦ Data Scale
      – 100 million row Netflix dataset
      – 10 million document usenet corpus
      – 1.5 million entity extracted Wikipedia corpus
    ◦ Indexing Performance
      – 1m rows/hour structured indexing
      – 500k docs/hour unstructured document indexing
      – 100k docs/hour entity-extracted document indexing
    ◦ Searching Performance
      – Sub-second search processing
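To put the slide 18 figures in perspective, the quoted rates can be turned into rough indexing times. The pairing of each dataset with a rate (Netflix as structured rows, usenet as unstructured documents, Wikipedia as entity-extracted documents) is an assumption, as is a single indexer running at the quoted rate throughout; the slide does not state either.

```python
# (dataset size, quoted indexing rate per hour); the pairing is an assumption.
workloads = {
    "netflix (structured rows)": (100_000_000, 1_000_000),
    "usenet (unstructured docs)": (10_000_000, 500_000),
    "wikipedia (entity-extracted docs)": (1_500_000, 100_000),
}

# Hours to index each dataset with one indexer at the quoted rate.
hours = {name: size / rate for name, (size, rate) in workloads.items()}

for name, h in hours.items():
    print(f"{name}: {h:.0f} hours")
```

Under those assumptions the three corpora would take about 100, 20, and 15 hours respectively on one indexer, which is the kind of load that partitioning across machines is meant to spread out.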
  • 19. Summary
    ◦ Palantir server components support a robust, scalable platform for analysis
    ◦ Leverage enterprise-grade infrastructure
    ◦ Raptor provides further scalability