Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Spark Summit San Francisco 2016 - Ali Ghodsi Keynote

1,677 views

Published on

Disrupting Big Data with Apache Spark in the Cloud

Published in: Software

Spark Summit San Francisco 2016 - Ali Ghodsi Keynote

  1. 1. Disrupting Big Data with Apache Spark in the Cloud Ali Ghodsi Co-Founder and CEO
  2. 2. The Dawn of Advanced Analytics 2 WatsonSIRI/assistantsSelf-driving cars Not just sci-fi, important applications for businesses
  3. 3. Analytics Transforming Industries 3 Predictive analytics Anomaly Detection Predict Product Revenue Customer Assessment Targeted Advertising Fraud Detection Risk Assessment Equipment Failure Data-Driven Real-time Analytics Applications
  4. 4. Today’s Data Reality 4 HADOOP DATA LAKES DATA HUBS CLOUD STORAGE DATA WAREHOUSES Siloed, Fast-Growing Size, Cost
  5. 5. The Analytics Gap 5 IndustrialMediaPharma HADOOP DATA LAKES DATA HUBS CLOUD STORAGE DATA WAREHOUSES Siloed, Fast-Growing Size, Cost Real-time Data-Driven Analytics Applications
  6. 6. Why is there a gap? 6 Real-time Data-Driven Analytics Applications ManageData infrastructure • Create, tune, monitor compute clusters. • Securely access silos of disparate data sources. • Enforce proper data governance. •1 Empower teams to be productive • Securely share big data clusters among analysts. • Interactively explore data and prototypeideas. • Debug, troubleshoot, version-control big data applications.• • • 2 Establish Production- Ready Applications • Setup robust data pipelines for ETL/ELT. • Productionize real-time applications with HA,FT. • Build, serve, maintain advanced machine learning models. • 3 Siloed, Fast-Growing Size, Cost
  7. 7. Databricks Cloud-Hosted Platform 7 • Separate compute & storage • Integrate existing data stores • Efficient cache on first access Just-in-Time Data Platform 1 Agile • Workflow scheduler for ML, streaming, SQL, ETL • Highavailability,fault-tolerant, performance-optimized Automated Apache Spark Management 3 Production-Ready • Interactive notebooks, dashboards, reports • Real-time exploration, machine learning, graph use cases Integrated Workspace 2 Democratize Big Data
  8. 8. HADOOP / DATA LAKES DATA WAREHOUSESYOUR STORAGE CLOUD STORAGE 8 Databricks Just-in-Time Data Platform INTEGRATEDWORKSPACE DASHBOARDS Reports NOTEBOOKS github, viz, collaboration BI TOOLS JUST-IN-TIME PROCESSING POWEREDBY APACHE CLUSTERS: Auto-scaled, resilient, multi-tenant DATA INTEGRATION: secure and fast data source integrations INTERFACES: RESTAPIs & BI tools DATABRICKSSERVICES + YOUR CUSTOM SPARK APPS PRODUCTION JOBS DATA LAKE DATA HUB
  9. 9. The Challenge of Securing Analytics 9 End-to-end security a challenge for enterprises Securing file management Secure table management Secure cluster management Secure job workflows Secure dashboards, report, notebook management Today there are piecemeal solutions, but no comprehensive solution
  10. 10. Databricks Enterprise Security (DBES) 10 Holistic end-to-end security for Data Analytics Tables Clusters Workflows Notebooks, Dashboards, Reports Files • Role-based access control • Auditing and governance • Integrated identity-management • Encryption on-diskand on-the-wire DBES provides The First End-to-End Security Solution for Apache Spark
  11. 11. Enterprise use-cases 11 Preventing creditcard fraud Predictenergy demand based on massiveweather data Predictplayer churn, predicting network outages Natural language processing to extract author graph Generating tailored programs based on big data
  12. 12. Thank you.
  13. 13. Try Apache Spark with Databricks 13 http://databricks.com/try Try latestversion of ApacheSpark and preview of Spark 2.0

×