As You Seek – How Search Enables Big Data Analytics
 

The Briefing Room with Robin Bloor and MarkLogic
Live Webcast on June 18, 2013
http://www.insideanalysis.com

The heart and soul of Big Data Analytics is search. That's why we keep hearing about NoSQL database vendors aligning themselves with third-party search engines: because these purpose-built database engines do not use the Structured Query Language (SQL), search is the primary means of gleaning valuable insights from them. But bolted-on search engines typically don't offer the deep functionality that built-in engines can.

Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor explain how search functionality opens a window onto the possibilities of Big Data Analytics. He'll be briefed by David Gorbet of MarkLogic, who will tout his company's document database offering, which has more than 10 years of use in production. Gorbet will discuss how search can expose relationships in Big Data and thus help generate insights. He'll also describe MarkLogic's enterprise-caliber capabilities, such as ACID compliance and its SQL interface, and where semantics fits in the roadmap.


    As You Seek – How Search Enables Big Data Analytics: Presentation Transcript

    • The Briefing Room: As You Seek – How Search Enables Big Data Analytics
    • The Briefing Room (Twitter Tag: #briefr): Welcome. Host: Eric Kavanagh, eric.kavanagh@bloorgroup.com
    • Mission
        • Reveal the essential characteristics of enterprise software, good and bad
        • Provide a forum for detailed analysis of today's innovative technologies
        • Give vendors a chance to explain their product to savvy analysts
        • Allow audience members to pose serious questions... and get answers
    • June: Database; July: Cloud; August: High Performance Analytics; September: Analytics
    • Database: Better Search, Faster Insight
    • Analyst: Robin Bloor. Robin Bloor is Chief Analyst at The Bloor Group (robin.bloor@bloorgroup.com).
    • MarkLogic
        • MarkLogic is an enterprise-class NoSQL database company
        • Key features of its database include ACID transactions, horizontal scaling, real-time indexing, high availability, disaster recovery, and government-grade security
        • Its platform provides full-text query and search capabilities, application services, and big data analytics
    • David Gorbet: David Gorbet is Vice President of Engineering for MarkLogic, where he also runs the Support organization. Gorbet brings two decades of experience delivering some of the highest-volume applications and enterprise software in the world. Prior to MarkLogic, Gorbet helped pioneer Microsoft’s business online services strategy by founding and leading the SharePoint Online team. Gorbet holds a Bachelor of Applied Science degree in Systems Design Engineering with an additional major in Psychology from the University of Waterloo, and an MBA from the University of Washington Foster School of Business.
    • MarkLogic: What It Is, How It Works (David Gorbet, VP Engineering)
    • Slide 2: We Are the New Generation Database (Copyright © 2013 MarkLogic® Corporation. All rights reserved.)
        • Any Structure Era, "For all your data!": schema-agnostic; massive scale; query and search; analytics; application services; faster time-to-results
        • Relational Era, "For all your structured data!": normalized, tabular model; application-independent query; user control
        • Hierarchical Era, "For your application data!": application- and hardware-specific
    • Slide 3: Real Value From Big Data
        • Make the world more secure
        • Provide access to valuable information
        • Create new revenue streams
        • Gain insights to increase market share
        • Reduce bottom-line expense
    • Slide 4: The MarkLogic Advantage. Only Enterprise NoSQL Database:
        • ACID compliant
        • Big data search
        • High availability
        • Replication
        • Point-in-time recovery
        • Government-grade security
        • Real-time your Hadoop
        • Proven customer success
    • Slide 5: How Does It Work?
        • Schema-agnostic design
        • Real-time indexing and query
        • Event processing and alerting
        • Scale-out, shared-nothing cluster topology
        • Analytics and visualization
        • High availability and disaster recovery
    • Slide 6: Hierarchical Data Model
        • MarkLogic Server is a document-centric database
        • Supports any-structured data via a hierarchical data model
        [Diagram: two example document trees. One has nodes Document, Title, Author (First, Last), Section (with nested sections), and Metadata; the other has nodes Trade, tradeID, PartyIdentifier, Cashflows, NetPayment, PaymentDate, PaymentAmount, PartyReference, Payer Party, and Receiver Party.]
    • Slide 7: MarkLogic Is Schema Agnostic. JSON and XML are self-describing:
        <article>
          <title>MarkLogic Server:… </title>
          <author>
            <first-name>John</first-name>
            <last-name>Doe</last-name>
          </author>
          <abstract>
            . . . .
            <company>MarkLogic</company>
            . . . .
          </abstract>
          <body>
            <section>
              <section>. . . .</section>
            </section>
            <section>…index…</section>
          </body>
          <copyright>Copyright © … </copyright>
        </article>
    • Slide 8: MarkLogic Is Schema Agnostic. JSON and XML are self-describing. (Same document as Slide 7, shown with only the opening tags: <article>, <title>, <author>, <first-name>, <last-name>, <abstract>, <company>, <body>, <section>, <copyright>.)
    • Slide 9: Universal Index
        MarkLogic indexes words, phrases, stemming, structure, values, collections, and security permissions.
        Term                          Term list (document references)
        "brown"                       123, 125, 129, 152, 344, 491, …
        "mice"                        123, 125, 126, 129, 130, 152, …
        "brown mice"                  125, 152, 516, 522, 765, 890, …
        STEM "mouse"                  123, 125, 126, 129, 130, 152, …
        STEM "brown mouse"            125, 152, 516, 522, 765, 890, …
        <article>                     …
        <article>/<abstract>          …
        <section>/<paragraph>         …
        <animal>mouse</animal>        …
        <year>1950</year>             …
        Collection:Draft              …
        Role:Editor + Action:Read     …
        Query: Which draft articles contain the phrase "brown mice"?
        Matching document references: 125, 516, 890, …
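The universal index on Slide 9 can be illustrated with a small inverted index: each key maps to a sorted list of document IDs, and an AND query such as "draft articles containing the phrase brown mice" intersects those lists. The C++17 sketch below is a minimal illustration under stated assumptions, not MarkLogic's implementation; the class `UniversalIndex` and key strings such as `"phrase:brown mice"` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Sorted, de-duplicated document IDs for one index key.
using TermList = std::vector<int>;

// Hypothetical "universal index": every key (a word, a phrase, a structural
// path, a collection, ...) maps to a term list. Not MarkLogic's real format.
class UniversalIndex {
public:
    // Record that document `doc_id` contains `key`, keeping the list sorted.
    void add(const std::string& key, int doc_id) {
        assert(doc_id >= 0);  // assumption: document IDs are non-negative
        TermList& list = index_[key];
        auto it = std::lower_bound(list.begin(), list.end(), doc_id);
        if (it == list.end() || *it != doc_id) list.insert(it, doc_id);
    }

    // Term list for one key; empty if the key was never indexed.
    TermList lookup(const std::string& key) const {
        auto it = index_.find(key);
        return it == index_.end() ? TermList{} : it->second;
    }

    // AND query: intersect the term lists of all keys, stopping early
    // once the running result is empty.
    TermList match_all(const std::vector<std::string>& keys) const {
        if (keys.empty()) return {};
        TermList result = lookup(keys[0]);
        for (std::size_t i = 1; i < keys.size() && !result.empty(); ++i) {
            const TermList next = lookup(keys[i]);
            TermList merged;
            // Linear merge of two sorted lists.
            std::set_intersection(result.begin(), result.end(),
                                  next.begin(), next.end(),
                                  std::back_inserter(merged));
            result = std::move(merged);
        }
        return result;
    }

private:
    std::map<std::string, TermList> index_;
};
```

For example, with the slide's "brown mice" term list (125, 152, 516, 522, 765, 890) and a made-up Draft collection list of 101, 125, 516, 700, 890, `match_all({"phrase:brown mice", "collection:Draft"})` returns 125, 516, 890. Keeping term lists sorted is what makes each intersection a single linear pass.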
    • Slide 10: Scalar Queries
        (Same term/term-list table as Slide 9.)
        Query: Which draft articles that contain the phrase "brown mice" were written before 2010?
        Matching document references: 125, 516, 890, …
    • Slide 11: Range Indexes
        Map document IDs to values, and vice versa, in a compact in-memory representation.
        Value  ID      ID  Value
        2002    3       1  2009
        2003   10       3  2002
        2004    5       4  2007
        2004   11       5  2004
        2007    4       8  2011
        2007   17      10  2003
        2009    1      11  2004
        2011    8      17  2007
        …      …       …   …
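A range index as described on Slide 11 can be sketched as two sorted arrays, one ordered by value (value to ID) and one ordered by document ID (ID to value). The scalar query on Slide 10 ("written before 2010") then becomes a prefix scan of the value-ordered array, and the resulting IDs can be intersected with a term-list result. This is a hypothetical C++17 sketch; `RangeIndex`, `ids_with_value_below`, and `value_of` are illustrative names, not MarkLogic APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical range index over one integer value (e.g. a <year> element).
struct RangeIndex {
    std::vector<std::pair<int, int>> by_value;  // (value, doc_id), sorted by value
    std::vector<std::pair<int, int>> by_id;     // (doc_id, value), sorted by doc_id

    explicit RangeIndex(const std::vector<std::pair<int, int>>& id_value_pairs)
        : by_id(id_value_pairs) {
        std::sort(by_id.begin(), by_id.end());
        for (const auto& [id, value] : by_id) by_value.emplace_back(value, id);
        std::sort(by_value.begin(), by_value.end());
    }

    // value -> IDs: documents whose value is strictly below `bound`,
    // returned in ID order so they can be intersected with term lists.
    std::vector<int> ids_with_value_below(int bound) const {
        auto end = std::partition_point(by_value.begin(), by_value.end(),
            [bound](const std::pair<int, int>& e) { return e.first < bound; });
        std::vector<int> ids;
        for (auto it = by_value.begin(); it != end; ++it) ids.push_back(it->second);
        std::sort(ids.begin(), ids.end());
        return ids;
    }

    // ID -> value: binary search in the ID-ordered array.
    std::optional<int> value_of(int doc_id) const {
        auto it = std::partition_point(by_id.begin(), by_id.end(),
            [doc_id](const std::pair<int, int>& e) { return e.first < doc_id; });
        if (it == by_id.end() || it->first != doc_id) return std::nullopt;
        return it->second;
    }
};
```

Loaded with the slide's ID/value pairs, `ids_with_value_below(2010)` returns 1, 3, 4, 5, 10, 11, 17 (every document except ID 8, whose value is 2011), and `value_of(8)` returns 2011.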
    • Slide 12: Geospatial Index: A 2-Dimensional Range Index
        • Fully composable with all other indexes!
        • Built-in support for: point, box, circle, polygon, complex polygon, polygon intersection, polygon containment
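Slide 12 lists point, box, circle, and polygon support. The containment tests behind such queries can be sketched as below. This assumes flat (x, y) coordinates, which is a simplification: a real geospatial index works with latitude/longitude and must account for the Earth's curvature. The polygon test uses the standard ray-casting (even-odd) rule; it is an illustration, not MarkLogic's algorithm, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Planar simplification: coordinates are treated as flat (x, y).
struct Point {
    double x, y;
};

// Box given by its lower-left and upper-right corners (inclusive).
bool in_box(Point p, Point lo, Point hi) {
    return p.x >= lo.x && p.x <= hi.x && p.y >= lo.y && p.y <= hi.y;
}

// Circle test compares squared distances to avoid a square root.
bool in_circle(Point p, Point center, double radius) {
    const double dx = p.x - center.x;
    const double dy = p.y - center.y;
    return dx * dx + dy * dy <= radius * radius;
}

// Ray casting (even-odd rule): count the edges crossed by a horizontal ray
// from p towards +x; an odd count means p is inside. Points exactly on the
// boundary may be classified either way.
bool in_polygon(Point p, const std::vector<Point>& poly) {
    bool inside = false;
    for (std::size_t i = 0, j = poly.size() - 1; i < poly.size(); j = i++) {
        const Point& a = poly[i];
        const Point& b = poly[j];
        // Only edges that straddle the ray's y coordinate can be crossed.
        if ((a.y > p.y) != (b.y > p.y)) {
            const double x_cross = a.x + (p.y - a.y) * (b.x - a.x) / (b.y - a.y);
            if (p.x < x_cross) inside = !inside;
        }
    }
    return inside;
}
```

For the triangle (0,0), (4,0), (0,4), `in_polygon` reports (1,1) as inside and (3,3) as outside.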
    • Slide 13: Reverse Indexes (Alerting)
        1. Load serialized queries as query documents
        2. For a given data document, find all queries that match
        • Can provide real-time alerts during loads, with no significant performance impact!
        • Can let documents store values as "ranges"
            • Documents about cities self-defining their geo boundaries
            • Person documents defining birthdays as ranges, sequences
        • Can power classifiers and "matchmaker" queries
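Slide 13's reverse index inverts the usual lookup: the stored queries are indexed, and each incoming document is used to find the queries it satisfies. A minimal C++17 sketch, assuming each stored query is simply a set of terms that must all be present (real serialized queries are far richer), counts how many of each query's required terms a document contains. `ReverseIndex` and its methods are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical reverse index for alerting. Each stored query is modeled as a
// set of terms that must ALL appear in a document (an AND query).
class ReverseIndex {
public:
    // Step 1 on the slide: load a (simplified) query into the index.
    void add_query(int query_id, const std::set<std::string>& required_terms) {
        assert(!required_terms.empty());  // an empty query would never be reported
        required_count_[query_id] = required_terms.size();
        for (const auto& term : required_terms)
            term_to_queries_[term].push_back(query_id);
    }

    // Step 2: for one incoming document, find every stored query it satisfies.
    std::vector<int> matching_queries(const std::set<std::string>& doc_terms) const {
        std::map<int, std::size_t> hits;  // query_id -> required terms found
        for (const auto& term : doc_terms) {
            auto it = term_to_queries_.find(term);
            if (it == term_to_queries_.end()) continue;
            for (int q : it->second) ++hits[q];
        }
        std::vector<int> matches;
        for (const auto& [q, found] : hits)
            if (found == required_count_.at(q)) matches.push_back(q);
        return matches;  // ascending query IDs, since std::map is ordered
    }

private:
    std::map<std::string, std::vector<int>> term_to_queries_;
    std::map<int, std::size_t> required_count_;
};
```

A document containing "brown", "mice", and "cheese" would trigger both a query for {brown, mice} and one for {mice, cheese}, without running either query against the database.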
    • Slide 14: Range Indexes
        (Same value/ID tables as Slide 11.) Map document IDs to values, and vice versa, in a compact in-memory representation. Range indexes work like a built-in, in-memory column store.
    • Slide 15: Facets and Aggregation
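Slide 15 gives only its title. One plausible way to compute facet counts, assumed here rather than taken from the presentation, combines the IDs returned by a search with the ID-to-value side of a range index such as the one on Slide 11, so counts come from in-memory index data rather than from reading documents. `facet_counts` is a hypothetical name, and a `std::map` stands in for the ID-ordered array.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical facet computation: count how many matching documents carry
// each value, using only an ID -> value lookup (the "ID to value" half of a
// range index). Documents without a value are skipped.
std::map<int, int> facet_counts(const std::vector<int>& matching_ids,
                                const std::map<int, int>& id_to_value) {
    std::map<int, int> counts;  // value -> number of matching documents
    for (int id : matching_ids) {
        auto it = id_to_value.find(id);
        if (it != id_to_value.end()) ++counts[it->second];
    }
    return counts;
}
```

With the slide's ID/value table and matching IDs 1, 4, 5, 11, and 17, the year facet would be 2004: 2, 2007: 2, 2009: 1.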
    • Slide 16: Interactive Visualization
    • Slide 17: In-database Analytic Functions
        Leverage ready-made analytic built-ins for commonly used numeric applications:
        • Variance, covariance, correlation, standard deviation, linear model, median, mode, percentile, rank, percent-rank
        Benefits:
        • Faster analytics-based application development
        • Supports more users and more data
        • Eliminates costs associated with writing custom code
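To make a few of the built-ins on Slide 17 concrete, the sketch below gives plain C++ definitions of mean, covariance, variance, standard deviation, correlation (Pearson), and median. These are textbook formulas, not MarkLogic's built-ins; in particular, the choice of sample (n − 1) rather than population denominators is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Textbook statistics for illustration only (not MarkLogic's built-ins).
double mean(const std::vector<double>& xs) {
    assert(!xs.empty());
    return std::accumulate(xs.begin(), xs.end(), 0.0) / xs.size();
}

// Sample covariance: divides by n - 1 (an assumption; population versions divide by n).
double covariance(const std::vector<double>& xs, const std::vector<double>& ys) {
    assert(xs.size() == ys.size() && xs.size() > 1);
    const double mx = mean(xs);
    const double my = mean(ys);
    double sum = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) sum += (xs[i] - mx) * (ys[i] - my);
    return sum / (xs.size() - 1);
}

// Sample variance is the covariance of a series with itself.
double variance(const std::vector<double>& xs) { return covariance(xs, xs); }

double stddev(const std::vector<double>& xs) { return std::sqrt(variance(xs)); }

// Pearson correlation; the n - 1 factors cancel, so it is the same either way.
double correlation(const std::vector<double>& xs, const std::vector<double>& ys) {
    return covariance(xs, ys) / (stddev(xs) * stddev(ys));
}

// Median of a copy, so the caller's data is not reordered.
double median(std::vector<double> xs) {
    assert(!xs.empty());
    std::sort(xs.begin(), xs.end());
    const std::size_t n = xs.size();
    return n % 2 ? xs[n / 2] : (xs[n / 2 - 1] + xs[n / 2]) / 2.0;
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9, `mean` returns 5, `variance` returns 32/7 (about 4.57), and `median` returns 4.5.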
    • Slide 18: User-defined Functions
        class InfluenceRank : public AggregateUDF
        {
        public:
          // Running state accumulated by the aggregate.
          struct Value {
            double sum, sum_sq, count;
            Value() : sum(0), sum_sq(0), count(0) {}
          } value;

        public:
          // Create and destroy copies of the aggregate.
          AggregateUDF* clone() const { return new InfluenceRank(*this); }
          void close() { delete this; }
          // Lifecycle calls (see Slide 19).
          void start(Sequence&, Reporter&) {}
          void finish(OutputSequence& os, Reporter& reporter);
          void map(TupleIterator& values, Reporter& reporter);
          void reduce(const AggregateUDF* _o, Reporter& reporter);
          // Serialize and deserialize the aggregate's state.
          void encode(Encoder& e, Reporter& reporter);
          void decode(Decoder& d, Reporter& reporter);
        };
    • Slide 19: In-database MapReduce
        [Diagram: an aggregate executed across multiple nodes through its start, map, encode, decode, reduce, and finish calls.]
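Slide 19 shows an aggregate running through start, map, encode/decode, reduce, and finish calls across nodes. The sketch below mimics that lifecycle in plain C++17 using the same running state as the `Value` struct on Slide 18 (`sum`, `sum_sq`, `count`). It does not use MarkLogic's `AggregateUDF` interface; `Partial` and its methods are hypothetical, and computing a population standard deviation in `finish` is an assumed use of that state.

```cpp
#include <cassert>
#include <cmath>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the slide's aggregate lifecycle. It mirrors the
// Value{sum, sum_sq, count} state from Slide 18 but is NOT MarkLogic's
// AggregateUDF API.
struct Partial {
    double sum = 0, sum_sq = 0, count = 0;

    // map: fold the values seen by one partition into the state.
    void map(const std::vector<double>& values) {
        for (double v : values) {
            sum += v;
            sum_sq += v * v;
            count += 1;
        }
    }

    // reduce: merge another partition's partial state into this one.
    void reduce(const Partial& other) {
        sum += other.sum;
        sum_sq += other.sum_sq;
        count += other.count;
    }

    // encode/decode: serialize the state so a partial can move between nodes.
    std::string encode() const {
        std::ostringstream out;
        out.precision(17);  // enough significant digits to round-trip a double
        out << sum << ' ' << sum_sq << ' ' << count;
        return out.str();
    }

    static Partial decode(const std::string& text) {
        Partial p;
        std::istringstream in(text);
        in >> p.sum >> p.sum_sq >> p.count;
        return p;
    }

    // finish: turn the merged state into a result (assumed: population std dev).
    double finish() const {
        assert(count > 0);
        const double m = sum / count;
        return std::sqrt(sum_sq / count - m * m);  // one-pass formula
    }
};
```

Mapping 2, 4, 4, 4 on one partition and 5, 5, 7, 9 on another, passing both partials through encode/decode, and reducing yields a count of 8 and a standard deviation of 2. Because each partial is only three numbers, merging is cheap however many values a partition scanned; the one-pass sum-of-squares formula is simple but can lose precision for large values.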
    • Slide 20: SQL and BI Tools
        [Diagram: ODBC, SQL, and Range Indexes layers.]
    • Slide 21: SQL and BI Tools
    • Slide 22: HA/DR Features of MarkLogic
    • Slide 23: MarkLogic 6
        • Powerful (everything you need to deliver business value): Flexible Indexes, Full Text Search, Schema-Agnostic, Scalable, Analytic Functions, Hadoop Distribution, Alerting & Event Processing, Geospatial Query, In-database MapReduce, Visualization Widgets
        • Trusted (enterprise-ready for mission-critical apps): Transactions, Role-based Security, Automated Failover, Replication, Journal Archiving, Point-in-time Recovery, Database Rollback, Backup/Restore, Distributed Transactions, Super-clusters
        • Accessible (leverage existing tools, knowledge, skills): REST & Java APIs, JSON Storage, Application Builder, Information Studio, Hadoop Connector, ContentPump, BI Integration, SQL Support, Monitoring & Management, OS Support
    • Slide 24: Any Questions?
    • Slide 25: What Is Semantics Technology?
    • Slide 26: Elasticity (aligning infrastructure and demand, continually)
        • New tools to characterize and monitor the resource requirements of your applications and loads
        • A dynamic provisioning system that can add or subtract resources on the fly to match the loads
        • Distributed and virtualized environments, including VMware, Amazon AWS, and Hadoop, are supported for scale-out
        • Make the cloud a first-class citizen: use Hadoop HDFS or Amazon S3 for backup
    • Slide 27: Tiered Storage
        [Diagram: MarkLogic (ML) over storage tiers: SSD, local, HDFS, amzn s3.]
        Choose infrastructure based on the value of the data stored.
        Benefits:
        • Keep data on tiers appropriate to access needs = lower costs
        • Detach and reattach storage when needed; fewer compute nodes required = lower costs
        • Leverage Hadoop HDFS investment
        • 100% online, with different tiers at different SLAs/topologies
        • Online/near-line mix utilizing mount on demand and dynamic node spin-up
        New tiered storage constructs:
        • Range partitions by date/scalar: manage groups of forests by range ("Q1" or "1990-1995")
        • Super Databases: federate queries across multiple databases
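Slide 27's range partitions manage groups of forests by a date or scalar range. The routing logic can be sketched as follows: each partition owns a half-open range, a new document goes to the partition covering its key, and a query over a key range only needs the partitions that overlap it. This is a hypothetical C++17 illustration; `Partition`, `partition_for`, and `partitions_for_query` are not MarkLogic APIs, and the year ranges other than "1990-1995" are made up.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical range partition (a group of forests in the slide's terms)
// owning the half-open key range [lo, hi), here a year.
struct Partition {
    std::string name;
    int lo;  // inclusive
    int hi;  // exclusive
};

// Routing on insert: which partition should a document with this key go to?
std::optional<std::string> partition_for(const std::vector<Partition>& parts, int key) {
    for (const auto& p : parts)
        if (key >= p.lo && key < p.hi) return p.name;
    return std::nullopt;  // no partition covers this key
}

// Pruning on query: which partitions can hold keys in [q_lo, q_hi)?
// Partitions that do not overlap the query range can be skipped entirely.
std::vector<std::string> partitions_for_query(const std::vector<Partition>& parts,
                                              int q_lo, int q_hi) {
    std::vector<std::string> names;
    for (const auto& p : parts)
        if (p.lo < q_hi && q_lo < p.hi) names.push_back(p.name);
    return names;
}
```

With partitions 1990-1995, 1996-2005, and 2006-2013, a query for years 2000 through 2009 touches only the last two, so the 1990-1995 partition could sit on a cheaper tier without slowing that query.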
    • Slide 28: Tiered Storage
        [Chart: total size (TB) and total cost ($000) for Operational, Compliance, and Analytic tiers. Values shown: 96, 504, 1,044 (TB); 592, 2,066, 2,080 ($000). Effective unit cost ($/GB): Operational $25, Compliance $4, Analytic $1.50.]
    • Perceptions & Questions (Analyst: Robin Bloor)
    • The Bloor Group
    • Database Innovation
        • Database used to be a "zero-innovation market." Now it is the opposite.
        • Traditional (relational) database is now seen (rightly) as inadequate in many respects
        • Big Data is, mainly, new data posing new problems
        • New products are emerging, and some older products are being given a make-over (and gaining popularity)
        • Hadoop has changed perceptions and thinking about database
    • Multiple Database Roles: Have Increased Significantly…
    • The Analytics Issue
    • The Origin of Big Data
    • NoSQL Confusion
        • As the graph indicates, NoSQL is a very confusing descriptor.
        • The important question is: what can a given database actually do?
    • The Joys and Sorrows of SQL
        • Very good for set manipulation
        • Works for OLTP and many query environments
        • Not good for nested data structures (documents, web pages, etc.)
        • Not good for ordered data sets
        • Not good for data graphs (networks of values)
    • Questions
        • In my view, we have reached a situation where there will be multiple "data engines." Is that MarkLogic's view?
        • Specifically, are there data structures or database contexts for which MarkLogic is inappropriate?
        • What new features or capabilities are on the MarkLogic roadmap?
        • In your view, is the "age of the data warehouse" over?
    • Questions (continued)
        • Which sectors/businesses are currently in MarkLogic's "sweet spot"?
        • Data analytics involves much more than having analytical functions in the database. It is more than 50% data prep (merging, cleansing, joining, transformation, etc.). How does MarkLogic accommodate that?
        • What is MarkLogic's attitude to the cloud? Specifically, where would it recommend cloud deployment?
    • Upcoming Topics: July: Cloud; August: High Performance Analytics; September: Analytics (www.insideanalysis.com)
    • Thank You for Your Attention