Salesforce’s Trusted Enterprise Platform and Apache Phoenix
Jan Fernando
Principal Member of Technical Staff
jfernando@salesforce.com
@janfern25
Forward-Looking Statements
Statement under the Private Securities Litigation Reform Act of 1995:
This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any
of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking
statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of product or
service availability, subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for
future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments and customer contracts
or use of our services.
The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our
service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth,
interruptions or delays in our Web hosting, breach of our security measures, the outcome of any litigation, risks associated with completed and any possible
mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and motivate our
employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling non-salesforce.com
products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of
salesforce.com, inc. is included in our annual report on Form 10-K for the most recent fiscal year and in our quarterly report on Form 10-Q for the most
recent fiscal quarter. These documents and others containing important disclosures are available on the SEC Filings section of the Investor Information
section of our Web site.
Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and may not
be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available.
Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.
Background: Salesforce App Cloud
Fast Development for Everyone
Build apps with clicks AND code with modern tools
Connected Experience Across Apps
Unified user experience across every device
Trusted Enterprise Cloud
Get unparalleled security in the cloud
Proven Success and Scale
5.5M apps, 2.5M developers, 4B daily transactions
AppExchange
The fastest way to build apps for customers, employees, and partners
AppExchange Trailhead
Shared Data, Networks, and Identity Across Services
FORCE
HEROKU
ENTERPRISE
THUNDER
LIGHTNING
Background on the Multitenant Force.com database
- SObjects make up each tenant’s model
• Metadata-defined fields and relationships
• Fully custom models supported
- Various access patterns
• Query support via SOQL
• Programmatic access via Apex
• SOAP/REST API automatically exposed
- Many features auto-enabled
• UI, Triggers, Search, Workflows, Reports,
Dashboards, etc.
A 100% metadata-defined data model
Introducing BigObjects
High Volume Storage for Force.com
BigObjects
● Built on Apache Phoenix and Apache HBase
● Familiar, object-based development model
● Familiar platform semantics and features
Scale to 100s of billions of records on Force.com
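One way metadata-defined BigObject fields could map onto Phoenix is shown in the minimal sketch below. It renders them as Phoenix DDL under two hypothetical assumptions: the org ID is stored in a leading `CHAR(15)` column, and fields flagged `in_index` make up the rest of the row key. `MULTI_TENANT=true` is Phoenix's table option for tables whose first primary-key column identifies the tenant. `FieldDef` and `phoenix_ddl` are illustrative names, not Salesforce code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDef:
    """Hypothetical metadata entry describing one BigObject field."""
    api_name: str
    sql_type: str
    in_index: bool = False  # True if the field is part of the row key


def phoenix_ddl(table: str, fields: list[FieldDef]) -> str:
    """Render CREATE TABLE DDL for a multi-tenant Phoenix table.

    The tenant ID must be the first primary-key column of a MULTI_TENANT
    table; indexed fields follow it in metadata order.
    """
    key_cols = ["TENANT_ID"]
    col_defs = ["TENANT_ID CHAR(15) NOT NULL"]  # assumed 15-character org ID
    for f in fields:
        name = f.api_name.upper()
        if f.in_index:
            key_cols.append(name)
            col_defs.append(f"{name} {f.sql_type} NOT NULL")
        else:
            col_defs.append(f"{name} {f.sql_type}")
    col_defs.append(f"CONSTRAINT PK PRIMARY KEY ({', '.join(key_cols)})")
    body = ",\n    ".join(col_defs)
    return f"CREATE TABLE IF NOT EXISTS {table} (\n    {body}\n) MULTI_TENANT=true"
```

For example, `phoenix_ddl("EVENT_B", [FieldDef("Account__c", "VARCHAR", True), FieldDef("Detail__c", "VARCHAR")])` yields a table keyed on `(TENANT_ID, ACCOUNT__C)`. The statement would be run once over a global (non-tenant) connection to create the shared physical table.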
Why Apache Phoenix?
Apache Phoenix is a great fit for Force.com's relational-like semantics
SOQL ↔ SQL
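The mapping is natural because the simplest SOQL shape, `SELECT fields FROM Object WHERE ... LIMIT n`, is already valid SQL apart from the object name. The deliberately tiny translator below illustrates this. It swaps the object API name for a Phoenix table name taken from a caller-supplied (hypothetical) mapping. It also refuses relationship paths such as `Account__r.Name`, which would need joins. A production translator would need a real SOQL parser rather than a regex.

```python
import re

# Matches only the simplest SOQL shape; everything after the object name
# (WHERE, ORDER BY, LIMIT) is passed through unchanged.
_SOQL = re.compile(
    r"^\s*SELECT\s+(?P<fields>.+?)\s+FROM\s+(?P<obj>\w+)(?P<rest>.*)$",
    re.IGNORECASE | re.DOTALL,
)


def soql_to_phoenix(soql: str, table_for: dict[str, str]) -> str:
    """Translate a flat SOQL SELECT into Phoenix SQL (illustrative only)."""
    m = _SOQL.match(soql)
    if m is None:
        raise ValueError("not a simple SELECT ... FROM ... query")
    fields = [f.strip() for f in m.group("fields").split(",")]
    if any("." in f for f in fields):
        # Relationship paths need joins, which this sketch does not attempt.
        raise ValueError("relationship fields are not supported")
    table = table_for.get(m.group("obj"))
    if table is None:
        raise ValueError(f"no Phoenix table mapped for {m.group('obj')}")
    return f"SELECT {', '.join(fields)} FROM {table}{m.group('rest')}"
```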
Key Apache Phoenix features used
● SQL and JDBC semantics
● Native Multi-tenancy
● Views
● Secondary Indexes
● Integrations with Pig, Spark, etc.
● Operational Features:
• Integrated Metrics
• Pherf
Phoenix features that make BigObjects possible on Force.com
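Native multi-tenancy rests on two facts. Phoenix leads the row key of a `MULTI_TENANT` table with the tenant ID, and a tenant-specific connection (opened with the `TenantId` connection property) only sees that tenant's rows. Because HBase keeps rows sorted by key, one tenant's data is contiguous, so a tenant-scoped query becomes a range scan. The toy store below models this. Python tuple ordering stands in for HBase byte ordering, which is a simplification. `TenantKeyedStore` is a hypothetical name.

```python
import bisect


class TenantKeyedStore:
    """Toy table whose row keys start with a tenant ID, kept in key order."""

    def __init__(self) -> None:
        self._keys: list[tuple[str, ...]] = []  # sorted, like HBase row keys
        self._rows: dict[tuple[str, ...], dict] = {}

    def upsert(self, tenant_id: str, pk: tuple[str, ...], row: dict) -> None:
        key = (tenant_id, *pk)
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = row

    def scan(self, tenant_id: str, prefix: tuple[str, ...] = ()) -> list[dict]:
        """Range scan over one tenant (optionally narrowed by key prefix)."""
        start = (tenant_id, *prefix)
        lo = bisect.bisect_left(self._keys, start)  # seek, don't scan from 0
        out = []
        for key in self._keys[lo:]:
            if key[: len(start)] != start:
                break  # left the tenant's contiguous key range
            out.append(self._rows[key])
        return out
```

The same property explains why queries along the row key stay cheap as data grows. The cost depends on the size of the matching key range, not on the total table size.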
Force.com Developer Expectations:
● Easy to use even for non-professional developers
● Predictable behavior and performance
● Reliable and Resilient
Meeting Developer Expectations
Requirement 1:
Stable SOQL query times
independent of data size
Solution: Synchronous SOQL Constraints
● Support a subset of SOQL grammar
● Prevent table scans and unbounded query runtimes
● Query along rowkey axis only
● All other queries blocked, e.g. no aggregates
● Use the Apache Phoenix query plan to evaluate queries at runtime
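The plan-based gate could look something like the sketch below. It assumes the plan text comes from running `EXPLAIN <query>` over a Phoenix JDBC connection and is passed in as a list of lines. The substrings it checks (`FULL SCAN`, `SERVER FILTER BY`, `AGGREGATE`, `RANGE SCAN`, `SKIP SCAN`, `POINT LOOKUP`) reflect typical Phoenix plan output, but the exact wording can vary between versions. Treat these rules as assumptions, not the production logic.

```python
from typing import Optional

# Plan shapes that stay on the row-key axis.
ROW_KEY_SCANS = ("RANGE SCAN", "SKIP SCAN", "POINT LOOKUP")


def sync_rejection_reason(plan_lines: list[str]) -> Optional[str]:
    """Return why a plan is unsafe for synchronous SOQL, or None if allowed."""
    plan = "\n".join(plan_lines).upper()
    if "FULL SCAN" in plan:
        return "query does not constrain the leading row-key columns"
    if "SERVER FILTER BY" in plan:
        return "query filters on columns outside the row key"
    if "AGGREGATE" in plan:
        return "aggregates are only supported asynchronously"
    if not any(scan in plan for scan in ROW_KEY_SCANS):
        # Fail closed: anything unfamiliar is treated as unpredictable.
        return "unrecognized plan shape"
    return None
```

The function fails closed. An unrecognized plan is rejected rather than allowed, which keeps synchronous query times predictable at the cost of occasionally refusing a query that would have been safe.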
Requirement 2:
Resiliently Query BigObjects
with full power of SOQL
Solution: AsyncSOQL
● Asynchronous processing directly on Apache Hadoop/HBase clusters
● Built on the Apache Phoenix/Apache Pig integration
● Query, filter, and aggregate data using SOQL without restrictions
● Fault-tolerant query engine for BigObject data
● New API for BigObject analytics and processing
Batch Analytics for BigObjects
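AsyncSOQL itself runs on the Apache Phoenix/Apache Pig integration. The plain-Python sketch below does not model Pig. It only illustrates the ideas in the speaker notes: push filters down to each partition, do the final aggregation afterwards, and retry failed partitions individually for fault tolerance. `count_by`, `TransientError`, and the reader-callable shape are all hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable

Row = dict
Reader = Callable[[], Iterable[Row]]  # re-reads one partition on each call


class TransientError(Exception):
    """Stand-in for a retryable failure, e.g. a region server timeout."""


def count_by(
    readers: list[Reader],
    predicate: Callable[[Row], bool],
    key: str,
    max_attempts: int = 3,
) -> Counter:
    """Count matching rows per `key` value across all partitions."""
    total: Counter = Counter()
    for read in readers:
        for attempt in range(1, max_attempts + 1):
            try:
                # Filter and pre-aggregate inside the partition (push-down),
                # so a failure only costs re-reading this one partition.
                partial = Counter(r[key] for r in read() if predicate(r))
                break
            except TransientError:
                if attempt == max_attempts:
                    raise
        total.update(partial)  # final aggregation across partitions
    return total
```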
Requirement 3:
Load data at scale into
BigObjects
Solution: Integrated Map/Reduce-Based Data Ingest
● M/R behind Force.com Bulk API semantics
● Use the Apache Phoenix/Apache Pig integration
● Support for metadata-defined data validation rules and the security model
● Robust error handling
● Idempotent writes to allow retries at any level in the stack
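The idempotency point relies on Phoenix's `UPSERT` semantics. Writing a row whose primary key already exists overwrites the columns being written, so replaying the same batch cannot create duplicates. The sketch below models the table as a dict keyed by primary key. Hypothetical validator callables stand in for the metadata validation rules. The access-control checks are left out.

```python
from typing import Callable, Optional

Row = dict
Validator = Callable[[Row], Optional[str]]  # returns an error message or None


def ingest(table: dict, rows: list[Row], pk: tuple[str, ...], validators: list[Validator]) -> list[tuple[int, str]]:
    """Upsert each valid row by primary key; return (row index, error) pairs.

    Writing a row replaces any existing row with the same key, so replaying
    the batch (or any slice of it) after a failure leaves the table as-is.
    """
    errors = []
    for i, row in enumerate(rows):
        problem = None
        for check in validators:
            problem = check(row)
            if problem is not None:
                break
        if problem is not None:
            errors.append((i, problem))  # report the row, don't abort the batch
            continue
        table[tuple(row[c] for c in pk)] = dict(row)
    return errors
```

Returning per-row errors, rather than failing the whole batch, matches the "robust error handling" point. The caller can fix and resubmit only the rejected rows.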
Operational Requirements
Ensuring the Apache Phoenix client is a good citizen in a multitenant environment
Requirement 1:
Fast Recovery
Solution: Short timeouts and connection management
● Connection-pool-like semantics for managing resource consumption
• Counting checked-out connections
● Apache HBase and Apache Phoenix timeout configurations for fast recovery
• Avoid thread backups when parts of the stack see higher latencies
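The sketch below illustrates connection counting with fail-fast behaviour, as described in the speaker notes: reject new connections once the maximum is reached. The timeout profiles use real HBase/Phoenix client property names, but the values are made up. They encode the idea that asynchronous jobs can tolerate longer timeouts than HTTP requests. `ConnectionLimiter`, `ConnectionLimitError`, and `TIMEOUTS` are hypothetical names.

```python
import threading
from contextlib import contextmanager
from typing import Callable, Iterator, Protocol


class Closeable(Protocol):
    def close(self) -> None: ...


class ConnectionLimitError(RuntimeError):
    """Raised instead of queueing when every connection slot is taken."""


class ConnectionLimiter:
    """Counts checked-out connections and refuses new ones at the cap."""

    def __init__(self, max_checked_out: int) -> None:
        self._slots = threading.BoundedSemaphore(max_checked_out)
        self._lock = threading.Lock()
        self.checked_out = 0

    @contextmanager
    def checkout(self, open_connection: Callable[[], Closeable]) -> Iterator[Closeable]:
        # Fail fast: never block a request thread waiting for a slot.
        if not self._slots.acquire(blocking=False):
            raise ConnectionLimitError("connection limit reached")
        with self._lock:
            self.checked_out += 1
        try:
            conn = open_connection()
            try:
                yield conn
            finally:
                conn.close()
        finally:
            with self._lock:
                self.checked_out -= 1
            self._slots.release()


# Property names are standard client settings; the values are illustrative.
TIMEOUTS = {
    "sync_request": {
        "phoenix.query.timeoutMs": "5000",
        "hbase.rpc.timeout": "2000",
        "hbase.client.retries.number": "2",
    },
    "async_job": {
        "phoenix.query.timeoutMs": "600000",
        "hbase.rpc.timeout": "60000",
        "hbase.client.retries.number": "10",
    },
}
```

In a JDBC client, the chosen profile would be merged into the properties used to open the connection. Failing fast lets a request return an error quickly instead of tying up an app-server thread while the cluster recovers.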
Requirement 2:
Visibility
Solution: Apache Phoenix Metrics, Logging and Graphing
● Visibility into production Phoenix query performance and resource utilization
● PHOENIX-1452 and PHOENIX-1819 were mission-critical contributions for us
● Metrics Examples:
• TASK_QUEUE_WAIT_TIME
• SCAN_BYTES
https://phoenix.apache.org/metrics.html
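The hypothetical `diagnose` function below shows one way to read the two metrics named above. It flags a query whose time was mostly spent waiting in the client thread-pool queue, and a query that scanned an unusually large amount of data. Three assumptions apply. The per-query metrics have already been collected into a dict (Phoenix exposes them through its client-side metrics API; see the metrics page above). `TASK_QUEUE_WAIT_TIME` is in milliseconds. The thresholds are placeholders to tune.

```python
def diagnose(
    metrics: dict[str, int],
    elapsed_ms: int,
    queue_share: float = 0.5,
    scan_bytes_limit: int = 64 * 1024 * 1024,
) -> list[str]:
    """Turn raw per-query metrics into human-readable findings."""
    findings = []
    wait_ms = metrics.get("TASK_QUEUE_WAIT_TIME", 0)
    # Mostly waiting for a thread suggests contention, not a slow query.
    if elapsed_ms > 0 and wait_ms / elapsed_ms >= queue_share:
        findings.append("latency dominated by waiting for client threads")
    # Large scans point at queries that read far more data than expected.
    if metrics.get("SCAN_BYTES", 0) > scan_bytes_limit:
        findings.append("query scanned an unusually large volume of data")
    return findings
```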
Requirement 3:
Manage Apache Phoenix
client resource utilization
Solution: Set Apache Phoenix Client Configs Appropriately
● Configure a small Apache Phoenix thread pool size
• phoenix.query.threadPoolSize
• phoenix.query.queueSize
● Configure Apache Phoenix memory consumption conservatively
• phoenix.query.maxGlobalMemoryPercentage
● Avoid queries with heavy client-side processing
https://phoenix.apache.org/tuning.html
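The sketch below assembles conservative client settings. The three property names come from the tuning page above. The recorded defaults (128 threads, a 5000-task queue, 15% of heap) are those documented on that page and should be checked against your Phoenix version. The smaller default arguments are hypothetical choices for a shared app server, not recommendations.

```python
# Documented Phoenix defaults (verify against your version's tuning page).
PHOENIX_DEFAULTS = {
    "phoenix.query.threadPoolSize": 128,
    "phoenix.query.queueSize": 5000,
    "phoenix.query.maxGlobalMemoryPercentage": 15,
}


def conservative_client_props(threads: int = 16, queue: int = 500, memory_pct: int = 5) -> dict[str, str]:
    """Build client properties that stay at or below the Phoenix defaults.

    The default arguments are illustrative values for a shared app server.
    """
    chosen = {
        "phoenix.query.threadPoolSize": threads,
        "phoenix.query.queueSize": queue,
        "phoenix.query.maxGlobalMemoryPercentage": memory_pct,
    }
    for key, value in chosen.items():
        if not 0 < value <= PHOENIX_DEFAULTS[key]:
            raise ValueError(f"{key}={value} is not a conservative setting")
    # Connection properties are passed as strings.
    return {key: str(value) for key, value in chosen.items()}
```

These settings would typically be supplied through the client's configuration, for example the client-side `hbase-site.xml`.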
Requirement 4:
Able to identify inefficiencies
during development
Solution: Pherf and automated Performance Testing
● Performance testing is critical to identify resource-expensive queries
● Pherf is an integral part of our performance strategy
• Pherf is a standalone tool that can perform performance and functional testing.
• Great for automated regression testing with daily runs.
• Gives insight into the performance of query patterns early on, during design.
https://phoenix.apache.org/pherf.html
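How Pherf's own result files are parsed is out of scope here. Assume each query's runtimes have been extracted into lists of milliseconds (a hypothetical shape). A daily regression check can then compare medians against a baseline and flag any query that slowed down by more than a tolerance. `regressions` is an illustrative name.

```python
from statistics import median


def regressions(
    baseline_ms: dict[str, list[float]],
    today_ms: dict[str, list[float]],
    tolerance: float = 0.2,
) -> dict[str, float]:
    """Map each regressed query to its relative slowdown in median runtime."""
    flagged = {}
    for query, runs in today_ms.items():
        before_runs = baseline_ms.get(query)
        if not runs or not before_runs:
            continue  # new query or missing data: nothing to compare against
        before, after = median(before_runs), median(runs)
        slowdown = (after - before) / before if before > 0 else 0.0
        if slowdown > tolerance:
            flagged[query] = round(slowdown, 3)
    return flagged
```

Comparing medians rather than means keeps a single slow outlier run from raising a false alarm in the daily job.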
Round-up
● Apache Phoenix underpins BigObjects on the Force.com Platform
● Unique challenges in extending our platform to Big Data
● How we addressed those challenges and used Phoenix to provide a scalable and predictable way for developers to work with Big Data on Force.com
Questions?
Thank you

Editor's Notes

  • #2 I work on a team bringing Big Data-scale data to the platform. Two main themes I’d like to share with you: how Apache Phoenix has been critical to allowing us to bring Big Data to the platform, and the unique challenges of building a general-purpose platform and how they have influenced the way we use Apache Phoenix.
  • #3 Key Takeaway: We are a publicly traded company. Please make your buying decisions only on the products commercially available from Salesforce. Talk Track: Before I begin, just a quick note that when considering future developments, whether by us or with any other solution provider, you should always base your purchasing decisions on what is currently available.
  • #4 Force.com is part of Salesforce App Cloud; it allows developers to build applications with clicks and code on a trusted enterprise platform.
  • #5 Talk specifically about features on the Force.com platform. To date, the platform has been inherently restricted to relational-scale data. http://www.salesforce.com/platform/products/force/ - An sObject is any object that can be stored in the Force.com platform database. - Use the Salesforce Object Query Language (SOQL) to search your organization’s Salesforce data for specific information. SOQL is similar to the SELECT statement in the widely used Structured Query Language (SQL) but is designed specifically for Salesforce data.
  • #6 Preserve platform semantics and ease of use, but allow them to scale to Big Data.
  • #7 Built on Apache Phoenix and HBase. Semantics familiar to developers used to the Force.com platform. New contracts, e.g. for synchronous and asynchronous query patterns.
  • #9 Plugging in another SQL-based data source was something we could do very easily: SQL is the dominant paradigm, and we have support for multiple SQL stores.
  • #10 SOQL as the dominant paradigm on the force.com platform, makes it
  • #11 Focus on implementing business logic and building apps, not plumbing. Unique challenges to meeting developer expectations: how do we design features with predictable performance and behavior, irrespective of data size, in a multi-tenant environment?
  • #12 How to provide query and data load patterns that make sense for Big Data and meet developer expectations?
  • #14 The problem of query timeouts is more acute for Big Data. Supported query patterns are discoverable in metadata. Immutable secondary indexes support alternate query patterns.
  • #16 Technology-agnostic architecture to support different engines in the future (e.g. Spark). Push down filters to the different data stores, with final joining and aggregation in Pig. ETL, federated.
  • #18 Why can’t we just use something like a bulk CSV tool or write a standalone data loader? Our ingest path enforces BigObject schema data validation rules and BigObject user access control at runtime.
  • #21 Reject connections when we hit the max. Longer timeouts for asynchronous processes than for HTTP requests.
  • #22 Visibility into production Phoenix query performance and resource utilization
  • #23 Metrics are awesome for trending and priceless when troubleshooting, e.g.: TASK_QUEUE_WAIT_TIME helps us understand whether latencies are due to the query itself or likely due to competition for threads; SCAN_BYTES allows us to identify and monitor queries to see if they are scanning large volumes of data.
  • #25 E.g. in a shared environment where threads are scarce. Discuss how, in our case, threads and memory are scarce resources in a shared app server. Heavy client-side queries can be resource intensive; in a shared app server like ours this can be an issue, so perf testing is critical to identify them.
  • #26 Inefficient queries and regressions during dev
  • #27 We found this kind of automated end-to-end testing and testing early on in the development cycle critical to achieve our goals of having consistent and predictable performance for our users