Accelerating Insight with High Octane Graph Fueled Data

At the Gartner Data & Analytics Summit 2017, Barry Zane, Vice President of Engineering, and Ben Szekely, Vice President of Solutions, discussed how Cambridge Semantics' Anzo Smart Data Lake® empowers business users with on-demand analytics on rich data through the use of graph database technology. These are the slides from their presentation.

Published in: Data & Analytics

  1. Anzo Smart Data Lake™ - Accelerating Insight: Disrupting the Analytics Time-to-Value Function
     Barry Zane, Vice President, Engineering, barry@cambridgesemantics.com
     Ben Szekely, Vice President, Solution Engineering, ben@cambridgesemantics.com
  2. ©2017 Cambridge Semantics Inc. All rights reserved. Big Data and Analytics Industry Trends
     • We are graduating from pieced-together ETL, Hadoop and BI solutions and consolidating around complete end-to-end solutions
       – Forward-thinking customers are looking to product vendors for innovation, delivery and accountability for value.
       – Consolidation of partnerships and acquisitions
  3. Cloud Computing Trends
     • Cloud computing is a transformative cost saver for analytics as demand for access to all data grows
       – Think beyond infrastructure balance sheet savings
       – Pay only for the analytics compute you use, as business needs demand and peak.
  4. The Importance of “Time-to-Value”
     • Time-to-value from data is becoming the key driver for analytics strategy, with an assumption of self-service
     [Chart axes: Effort; Time to Value]
  5. Key Risks
     • Costs rising from vendor lock-in of data format/storage, analytics tools and cloud infrastructure.
  6. Anzo Smart Data Lake: Accelerating Insight
     [Diagram labels: Disparate Sources; Insight; Exploratory Analytics; Knowledge Discovery; Data on Demand; Automated Ingestion; Enterprise Knowledge Graph; Governance]
  7. Disrupting the Time-to-Value Function
     [Chart labels: Time and Resource Investments; Insights and Value; IT Build and Deployment; Anzo Smart Data Lake; Add New Data]
  8. Anzo Smart Data Lake: A Graph-based Platform to Disrupt the Analytics Time-to-Value Function
     [Diagram labels: Connectors; Models; Rules; Analytics & Tools; ASDL Customer Fingerprint - Intellectual Property; Data Ingestion & Mapping; Automated ETL Generation; Collaborative Mapping; Text Processing; Data Cataloging; Data & Model Governance; Active Metadata Management; Role-Based Security; Discovery & Analytics; Automated Query Generation; User Dashboards and Custom UI/UX; Self-Serve Live Extracts; In-Memory MPP Query; Graphmarts on Demand; ELT, Model Based Data Integration; Document Search; Actionable Insights; Enterprise Data Sources; Enterprise Data Lakes; “Last Mile” Analytics]
  9. Anzo Smart Data Lake – Cloud Deployment
     • ASDL cloud deployment in Amazon Web Services or Google Cloud Platform
     • Cloud automation is a significant and strategic component of the Cambridge Semantics roadmap, including deployment, elastic scale and high availability.
     • Our cloud mission is to offer customers lower costs in development, maintenance and operations, using cloud resources efficiently as business needs determine.
     • Cloud-delivered ASDL offers faster deployment and on-demand scale
     [Diagram labels: Data Ingestion & Mapping; Automated ETL Generation; Collaborative Mapping; Text Processing; Data Cataloging; Data & Model Governance; Active Metadata Management; Role-Based Security; Discovery & Analytics; Automated Query Generation; Custom User Dashboards; Self-Serve Live Extracts; In-Memory MPP Query; Graphmarts on Demand; ELT, Model Based Data Integration; Document Search; Actionable Insights; “Last Mile” Analytics; Elastically Scaled Analytics; Scalable Encrypted Storage; Enterprise Data Lakes; Enterprise Data Sources; Elastically Scaled Ingestion]
  10. Large Scale Graph Analytics
      PROBLEM
      • Graph is a simple, clean model for standard analytic queries and allows you to do more.
      • But Graph has had terrible performance for standard analytics queries against large-scale data.
      • If you can’t do the standard “data warehouse” queries at scale, you won’t get to the algorithms that only Graph can perform!
      SOLUTION
      • Build a Graph engine designed for large-scale analytics.
      • Leverage parallel computing: lots of hardware. Scale to hundreds of servers.
      • Extend the SPARQL language to backfill functionality present in SQL.
      • Deploy through a user interface that automatically writes the SPARQL and visualizes the results.
  11. Analytic Landscape
      ROLAP - Relational online analytics
      • Broad adoption, 45 years of technology evolution
      • Based on declarative SQL for business analysts
      • Formal ANSI/ISO standard since 1986
      GOLAP - Graph-based online analytics
      • Narrow adoption, accelerating over past 15 years
      • Based on declarative SPARQL for business analysts
      • Formal W3C standard since 2008
      Hadoop (Spark) - Offline batch analytics
      • Growing adoption since created in 2005 (2012)
      • All queries programmed in Java/Scala/Python…
      • Apache and community standards
      • Limited only by programmer’s talents and available APIs
  12. GOLAP is Real Relational Data Warehouse, Really
      Relational databases are predefined “rectangular” tables of rows with columns.
      – Very natural for subjects (aka rows) with a number of known attributes common to all/most of the subjects.
      – Allows columns to be links (aka keys) to other tables’ subjects.
      Challenged by:
      – Sparsity
      – One-to-many needs a separate “join table”
      – You need to understand the data in advance
      Graphs are real relational, really. Just a little different from the points above!
  13. RDF/SPARQL… like RDB/SQL, but...
      Standard SQL aggregates, joins, etc., but with simple and powerful relationship capabilities.
      “How is Joe related to Mary?”
      – In SQL Relational
        • Are they spouses?
        • Are they siblings?
        • Are they friends?
        • Do they have the same hobby?
        • … enumerate the choices, EXPLODES with degrees of separation
      – In SPARQL Graph
        • How is Joe related to Mary?
        • … you can directly specify degrees of separation
      Pretty exciting: essentially all the power of SQL, but you can do more, with more diverse data, where the data tells you about itself, rather than you knowing in advance.
  14. The Smart Data Lake is the “database”
      • Data cached in HDFS, AWS/GCP buckets
      • Multiple Graph Query Engine instances, usually on subsets
      • Ephemeral in-memory operation
      • Short-term instances: load, query, toss
  15. Thank You
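
To make the contrast on slides 12 and 13 concrete, here is a minimal Python sketch. It is not SPARQL and not how Anzo's engine works; it only illustrates the idea under stated assumptions. The facts, the predicate names, and the names `TRIPLES`, `build_index`, and `find_relationships` are all hypothetical. Facts are stored as (subject, predicate, object) triples, so a missing attribute is simply an absent triple and a one-to-many relationship needs no join table. The question "How is Joe related to Mary?" is then answered by a breadth-first search bounded by a number of degrees of separation, without naming any relationship type in advance.

```python
from collections import deque

# Hypothetical facts as (subject, predicate, object) triples.
# Sparse attributes just have no triple; one-to-many is several
# triples sharing the same subject and predicate (Joe has two hobbies).
TRIPLES = [
    ("Joe", "hasHobby", "Chess"),
    ("Joe", "hasHobby", "Sailing"),
    ("Mary", "hasHobby", "Chess"),
    ("Joe", "siblingOf", "Ann"),
    ("Ann", "spouseOf", "Tom"),
    ("Tom", "friendOf", "Mary"),
    ("Mary", "worksAt", "Acme"),
]


def build_index(triples):
    """Index every triple in both directions so edges can be walked either way."""
    index = {}
    for s, p, o in triples:
        index.setdefault(s, []).append((p, o))
        index.setdefault(o, []).append(("^" + p, s))  # "^" marks an inverse edge
    return index


def find_relationships(triples, start, goal, max_degrees):
    """Return all simple paths from start to goal with at most max_degrees edges.

    Breadth-first search, so shorter connections are returned first.
    """
    index = build_index(triples)
    paths = []
    queue = deque([(start, [], {start})])
    while queue:
        node, path, seen = queue.popleft()
        if node == goal and path:
            paths.append(path)
            continue
        if len(path) == max_degrees:
            continue  # degree-of-separation limit reached on this branch
        for predicate, neighbor in index.get(node, []):
            if neighbor not in seen:
                step = (node, predicate, neighbor)
                queue.append((neighbor, path + [step], seen | {neighbor}))
    return paths


if __name__ == "__main__":
    for path in find_relationships(TRIPLES, "Joe", "Mary", 3):
        print(" -> ".join(f"{s} {p} {o}" for s, p, o in path))
```

With a limit of 2 degrees, the only connection found is the shared Chess hobby; raising the limit to 3 also surfaces the sibling, spouse, friend chain through Ann and Tom. No relationship type had to be enumerated in the query, which is the point slide 13 makes.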
