Disrupting Big Data with Apache Spark in the Cloud

Ali Ghodsi's presentation at Spark Summit 2016

Published in: Data & Analytics

  1. Disrupting Big Data with Apache Spark in the Cloud. Ali Ghodsi, Co-Founder and CEO
  2. The Dawn of Advanced Analytics: Watson, Siri/assistants, self-driving cars. Not just sci-fi, important applications for businesses
  3. Analytics Transforming Industries: Predictive analytics, Anomaly Detection, Predict Product Revenue, Customer Assessment, Targeted Advertising, Fraud Detection, Risk Assessment, Equipment Failure. Data-Driven Real-time Analytics Applications
  4. Today’s Data Reality: HADOOP DATA LAKES, DATA HUBS, CLOUD STORAGE, DATA WAREHOUSES. Siloed, Fast-Growing Size, Cost
  5. The Analytics Gap: Industrial, Media, Pharma. HADOOP DATA LAKES, DATA HUBS, CLOUD STORAGE, DATA WAREHOUSES: Siloed, Fast-Growing Size, Cost. Real-time Data-Driven Analytics Applications
  6. Why is there a gap? Real-time Data-Driven Analytics Applications
     (1) Manage Data Infrastructure
         • Create, tune, monitor compute clusters.
         • Securely access silos of disparate data sources.
         • Enforce proper data governance.
     (2) Empower Teams to Be Productive
         • Securely share big data clusters among analysts.
         • Interactively explore data and prototype ideas.
         • Debug, troubleshoot, version-control big data applications.
     (3) Establish Production-Ready Applications
         • Set up robust data pipelines for ETL/ELT.
         • Productionize real-time applications with HA, FT.
         • Build, serve, maintain advanced machine learning models.
     Siloed, Fast-Growing Size, Cost
  7. Databricks Cloud-Hosted Platform
     (1) Agile: Just-in-Time Data Platform
         • Separate compute & storage
         • Integrate existing data stores
         • Efficient cache on first access
     (2) Democratize Big Data: Integrated Workspace
         • Interactive notebooks, dashboards, reports
         • Real-time exploration, machine learning, graph use cases
     (3) Production-Ready: Automated Apache Spark Management
         • Workflow scheduler for ML, streaming, SQL, ETL
         • High availability, fault-tolerant, performance-optimized
  8. Databricks Just-in-Time Data Platform
     • YOUR STORAGE: HADOOP / DATA LAKES, DATA WAREHOUSES, CLOUD STORAGE, DATA LAKE, DATA HUB
     • INTEGRATED WORKSPACE: DASHBOARDS (Reports), NOTEBOOKS (github, viz, collaboration), BI TOOLS
     • JUST-IN-TIME PROCESSING POWERED BY APACHE
         • CLUSTERS: Auto-scaled, resilient, multi-tenant
         • DATA INTEGRATION: secure and fast data source integrations
         • INTERFACES: REST APIs & BI tools
     • DATABRICKS SERVICES + YOUR CUSTOM SPARK APPS
     • PRODUCTION JOBS
  9. The Challenge of Securing Analytics: End-to-end security is a challenge for enterprises
     • Secure file management
     • Secure table management
     • Secure cluster management
     • Secure job workflows
     • Secure dashboard, report, notebook management
     Today there are piecemeal solutions, but no comprehensive solution
  10. Databricks Enterprise Security (DBES): Holistic end-to-end security for Data Analytics
     Covers: Files, Tables, Clusters, Workflows, Notebooks, Dashboards, Reports
     • Role-based access control
     • Auditing and governance
     • Integrated identity-management
     • Encryption on-disk and on-the-wire
     DBES provides The First End-to-End Security Solution for Apache Spark
  11. Enterprise use-cases
     • Preventing credit card fraud
     • Predict energy demand based on massive weather data
     • Predict player churn, predicting network outages
     • Natural language processing to extract author graph
     • Generating tailored programs based on big data
  12. Thank you.
  13. Try Apache Spark with Databricks: http://databricks.com/try Try the latest version of Apache Spark and a preview of Spark 2.0