Big Data and the BI Wild West
Hadoop’s “crossing the chasm” will require widespread and ubiquitous adoption by organizations, but the keystone to all of this isn’t the widely talked-about social media like Facebook, Twitter and LinkedIn. The seemingly mundane “dark data” in business, which is captured but left unutilized or under-utilized, will start the transformation away from the standard architectures of old and into the brave new world generally associated with “Big Data”.
As members of the Hadoop community, it is our challenge to bring about that change rapidly and responsibly, bringing order to the “wild west” of today’s disruptive business intelligence landscape. BI is the foothold on which to bring Hadoop into the mainstream. Success requires linking new technologies with the mature ones in use today to enable the search for value.
Beyond the racks and clusters, we need to bring the science and understanding that enable organizations to leave the past behind and move to the brave new world. This requires bringing along applications, processes, and groups of users, intelligently combining noSQL, relational, predictive, and advanced analytics technologies to make them easily consumable, even by the business user.

Published in: Technology, Business
  • A brain: we all depend on it. We spend the early parts of our lives developing it, then a few years pickling it with alcohol (not sure it helps preserve it), and then actually using it. Corporations have to build and develop the corporate brain: learn, adapt, develop or die! Business Intelligence is a key part of that learning process.
  • BI is the digital brain of business, the corporate brain. It’s a collection of tools, processes and objectives. Ideally an ethos! Like humans, it needs learning, information and experimentation. In all the sea of technology, the values and reasoning get lost.
  • 7-click build: step through text, then arrow, then spin. Same as human learning: it occurs within a group and in the context of a community. Requires acquisition of facts: get the data. Ability to view and manipulate: get to see and interact with the data. Ability to discuss, absorb and review. Then take action: in business, pull levers to change. And of course action changes things, which requires iteration and feedback.
  • Very crudely
  • It’s rarely about more charts, more colours, more report styles. Lower latency: speed of access to new data, real-time access. More timely also means ‘faster’. Where’s the value? In the data and in the access. Build it and they will come: it’s more about interactions per user than raw users (the concurrency debate).
  • Note: no click, progressive build from start! Mobile access is coming along. The application space is broadening with BYOD. It can supply access to BI, but it also furiously generates data for BI. Access to dynamic information, but every access generates data and possible inferences. Self-service access.
  • Note: 1-click progressive build. Purely as an aside: if anyone doubts the rise of mobile…
  • In the meantime, data does not stop flowing. Reality check! The big data fire hose is now full on!
  • Note: no build. Visokio Omniscope. Also MicroStrategy Insight, SAP Analytics workbench. New players like Domo; changing players like Alteryx.
  • 2-click build: extend title, then progressive text. Plateaued: what a great word for a run of vowels! Losing momentum; could almost say flat-lining. The enterprise tools: only so many variations of charts, tables, colours, layouts etc. Standard fabric.
  • 2-click build: ‘more’, then R logo. The progression every time: from simple fetch and calc, to complex calculation, to mining-aided discovery. The new world is about dynamic, real-time analytics. R is the torch bearer: cost effective! Tool of choice for millennials coming out of university.
  • No build. Bottlenecks are caused by platforms and tools unable to cope with the demands of complexity, disparity and volume. Complex analytics. Machine learning: fraud detection/gaming. Web analytics: dynamic content/bid management. Modelling: traditional clustering/behavioural for marketing/product development/resource optimisation. Investigative reporting (dashboards and reports with granular data access). Data model.
  • Note: 1-click build. BI mostly focuses (sells) on presentation: graphics, pictures, visualisation. BUT behind the scenes a lot of heavy lifting has to be done. This workload has changed over time from the simple to the complex.
  • 2-click build: text added, then diagram added. What the business cares about is getting work done. The DW is now a bottleneck: its rigour and model get in the way! They really don’t care about how it is stored or where it is stored! Some tasks are just plain too big to run! It’s not about raw individual speed, it’s about throughput. Address the bottlenecks. Too many vendors play games that just shift the bottleneck.
  • Tension: nearly high noon! Two interpretations: the time ‘needed’ to influence (reaction, what?) and the time ‘now’ to influence (action, opportunity). Two contexts: time to influence peers and managers, and time to influence customers. Fastest draw now counts for a lot!
  • Lots more debate and arguments; like everything today, they need to be settled quickly. Dangerous but exciting times. However: loss of control and governance, too much going on around the EDW. Business and IT in a gun fight: Wild West.
  • 1-click build. So a quick checkpoint: where are we? More timely? No: too much effort to work out what to do. Batch processing gets in the way of interactive access. Self-serve, if you are knowledgeable enough. Winning in some areas but not in all.
  • No build into swipe transition. OK, let’s not forget the data warehouse! Who could? A previous presentation drew an analogy with castles.
  • (Bodiam Castle, from Eric Star Picture) Consolidate power, protect, stand the test of time, somewhere safe in difficult times. The DW was built to protect the corporate knowledge. Law and discipline: structure, trust, safe haven. Control.
  • 1-click build. Lots of investment and permanence. Controlled access: tour access, not full open access. The DW starts to overload, starts to be selective. The DW is inflexible: its controls get in the way of new data and big data, which kills the three ‘V’s. Who’s allowed in, and what are they allowed to do and access? Like visitors to a modern castle, but not necessarily with a nice guidebook. Ultimately it cannot cope with the queues and delays: users are initially patient, later impatient. Business wants more and faster. IT sees the pressure from a different perspective: trouble and pain. The main inhibitor is complexity and cost.
  • A quick USA Wild West perspective on castles. More like marts: less edifice, more practical function. Wild West castle: rapidly constructed from local materials, few long-term examples. Time to build (effort expended and time spent): more agile. Rapidly moving new frontier, just like modern BI: keep moving. Disney recreation: Fanghoot.
  • 1-click build: extend with ‘boring’. The DW is policed; it controls what you can have and, in some cases, when you can have it. How many people get excited about their DW or access to a DW? Yes, it gets the job done.
  • Well, this little guy certainly woke a few people up! As if a yellow elephant could creep up on you! Hadoop will solve all my BI problems… RIGHT? Many business users are still not fully aware of what Hadoop is.
  • 1-click build. Hadoop is not a “universal solution”! Way too much hype and hyperbole: great for innovators and start-ups, not so good for plain old business.
  • No click: progressive build from start. Can debate ‘free’, but substantially reduced $$$.
  • 3-click build: text, then two Post-its. The DW demanded ETL to map data into the model and ensure logical consistency: an upfront prerequisite. Structure is strangling the DW: it was its primary strength, now its weakness. Hadoop is making people lazy: it cuts out thought but leaves future decisions wide open; no lock-in, and it cuts the risk of bad decisions. Simplified decisions about what to keep: keep it all. BUT hey, BI needs structure and discipline!
  • 2-click build: Sqoop, then elephant photo. Integration between business infrastructure and systems and Hadoop is still limited. ETL vendors are not sure whether to love or hate Hadoop: it will eat their lunch. Sqoop is great for moving models. Not so great for moving big data (or big elephants). Not exactly easy to move elephants on creaky railroads!
  • 3-click build: wanted, scribble, new players. Ah yes, plugging into Hadoop. So much for the noSQL revolution. Universal integration is needed: protect the BI investment. It lost the gun fight: as in all revolutions, the upstarts died down and got absorbed (subsumed). Business and BI investment demands SQL! After Hive, we now have Drill, Impala, Pivotal. Tough game: yes, it’s SQL access, but not low latency.
  • 1-click build: insert ‘still’, pause, then loss… Remember the rise of data discovery. Fine for big trawls. Not good for low-latency iterations or high-frequency access. There, I have dared to say it! Does not accelerate BI quite in the way business was sold by the EDW. Loss of “interactivity”. A decade of being sold train-of-thought. Hadoop: not hands-on, not desktop, not agile.
  • 1-click build: RAM. Balance: full-spectrum power available. Excellent computing power. Unlimited storage. Fast networks. No need for single platforms like the traditional DW, which both stores and analyses. This is why data science rises. We did not get this in the rise of data mining in the 90s. We’ll come onto RAM shortly.
  • 2-click build. Hadoop is disk centric (storage), just like the EDW: more parallelism, yes, lots more, but still batch and disk I/O centric. Schedulers are not designed for rapid response. Essentially a batch queue, while BI applications and business users have significantly evolved from batch reporting. Hadoop infrastructure evolution will drive more CPUs, as they get work done!
  • 1-click build. Flash is not in-memory. Vendors are flash-washing products: it boosts I/O. Limitations: cost high, capacity low. Big vendors of EDW systems just offer switching spinning drives for flash drives! EDW appliance vendors offer this at a premium cost; it only makes sense if the majority is flash. Really it’s about nanoseconds, not milliseconds. Traditional EDW software is architected for lots of disk and relatively small amounts of CPU. Flash helps, but it is a band-aid on the problem: it buys a little time for the EDW if you can afford it, a digital jolt.
  • 2-click build. To be quick on the draw. Lots of access to data: iterations. Analytics is about work done, and more work needs to be done. So don’t hold CPUs back! Highlight the cores: many more to come. Cores help open up the bottleneck we saw earlier. In-memory is not cache! Memory is underplayed in Hadoop: it’s cheap, use it! Processors and RAM are the true measure of work that can be done; disks just fetch. Keep data in memory! Don’t swap, don’t wait on disk, don’t pick through indexes then data; just access what is needed. The economics of RAM have changed: much lower cost, large volumes readily available.
  • No build. Real-world view. With better performance than the DW. And considerably better standards support for SQL, like the 2011 standard! And full OLAP support, both ODBO and XMLA. Kognitio runs on the same technology as Hadoop; they work in the same farm.
  • Kognitio Hadoop connector. Non-invasive; uses standard HDFS/Map-Reduce access methods. Fast to deploy: no coding needed. Active selection is Kognitio machine code. Multi-threaded delivery back. Kognitio can retrieve terabytes, a terabyte in 10 minutes: that’s a lot of M&Ms.
  • No SQL revolution dissipated/absorbed: business won. Hadoop will be the disk drive of the future. Hadoop will be the data OS of the future: a data processing ecosystem. Platform for data science. SQL will be the primary access method. Parallel execution and low latency will be demanded. Support for running any math or complex process.
  • 2-click build. Graduate analysis to production. The key future ability is to move rapidly from discovery to production: taking findings from data scientists and, within hours or days, productionizing them! Discovery has a shelf life: the time to influence is now. Cloud computing flexibility, PaaS, SaaS and rapid deployment make this possible (enabler). Hadoop provides the consistent central store. Can either scale up and dedicate, or spawn a new logical-model-based system, populate at scale and start production. Adaptable.
  • 1-click build. Logical Data Warehouse components just need processes and SLA.
  • Followed the California gold-rush of 1848/49
  • Marking the completion of the Transcontinental Railroad. The Wild West was tamed by infrastructure, by the engineers and navvies, so that the shopkeepers, bankers and workers could easily follow. Business infrastructure will only move on when BI, Hadoop and the supporting ecosystem come together to create an information network.
  • Kognitio
  • Transcript

    • 1. Big Data and the BI Wild West Don’t Bring an Elephant to a Gun Fight! Paul Groom
    • 2. Tools Processes Objectives
    • 3. Why Business Intelligence? View Learn Action Community Acquire
    • 4. What is Business Intelligence? Numbers Tables Charts Indicators Time - History - Lag Access - to view (portal) - to data - to depth - Control/Secure Consumption - digestion …with ease and simplicity
    • 5. Business [Intelligence] Desires More timely Lower latency More granularity More users interactions Richer data model Self service
    • 6. View and generate
    • 7. Got mobile? 200 million employees bring their own device to work. Nearly half of the workforce will be made up of millennials by 2020. 50% of companies (BYOD orgs) have had a security breach. 1/3 have broken or would break corporate policy on BYOD.
    • 8. Data flow
    • 9. Dynamic access Drill unlimited Disruption: Data Discovery tools
    • 10. BI tools have plateaued…again Decision Support (Reporting) in late 90’s Business Intelligence of 00’s …led to data mining …leading to analytics and data science
    • 11. More math …a lot more math
    • 12. Machine learning algorithms Dynamic Simulation Statistical Analysis Clustering Behaviour modelling The drive for deeper understanding Reporting & BPM Fraud detection Dynamic Interaction Technology/Automation Analytical Complexity Campaign Management
    • 13. create external script LM_PRODUCT_FORECAST environment rsint receives ( SALEDATE DATE, DOW INTEGER, ROW_ID INTEGER, PRODNO INTEGER, DAILYSALES partition by PRODNO order by PRODNO, ROW_ID sends ( R_OUTPUT varchar ) isolate partitions script S'endofr( # Simple R script to run a linear fit on daily sales prod1<-read.csv(file=file("stdin"), header=FALSE,row.names colnames(prod1)<-c("DOW","ID","PRODNO","DAILYSALES") dim1<-dim(prod1) daily1<-aggregate(prod1$DAILYSALES, list(DOW = prod1$DOW), daily1[,2]<-daily1[,2]/sum(daily1[,2]) basesales<-array(0,c(dim1[1],2)) basesales[,1]<-prod1$ID basesales[,2]<-(prod1$DAILYSALES/daily1[prod1$DOW+1,2]) colnames(basesales)<-c("ID","BASESALES") fit1=lm(BASESALES ~ ID,as.data.frame(basesales)) select Trans_Year, Num_Trans, count(distinct Account_ID) Num_Accts, sum(count( distinct Account_ID)) over (partition by Trans_Year cast(sum(total_spend)/1000 as int) Total_Spend, cast(sum(total_spend)/1000 as int) / count(distinct Account_ID rank() over (partition by Trans_Year order by count(distinct A rank() over (partition by Trans_Year order by sum(total_spend) from( select Account_ID, Extract(Year from Effective_Date) Trans_Year, count(Transaction_ID) Num_Trans, select dept, sum(sales) from sales_fact Where period between date ‘01-05-2006’ and date ‘31-05-2006’ group by dept having sum(sales) > 50000; select sum(sales) from sales_history where year = 2006 and month = 5 and region=1; select total_sales from summary where year = 2006 and month = 5 and region=1; Behind the numbers
    • 14. It’s all about getting work done Used to be simple fetch of value Tasks evolving: Then was compute dynamic aggregate Now complex algorithms!
    • 15. Time to influence Reaction – what? – potential value Action – opportunity - interaction BI is becoming democratized
    • 16. BI Wild West
    • 17. Business [Intelligence] Desires in relation to Big Data More timely Lower latency More granularity More users interactions Richer data model Self service
    • 18. The Data Warehouse?
    • 19. Realities
    • 20. Reports against the DW are just plain dull, boring even!
    • 21. And then came…
    • 22. Hadoop ticks many but not all the boxes
    • 23. Stomped on costs Made economics of scale practical
    • 24. No need to pre-process before storage i.e. no need to align to storage No need to triage before storage
    • 25. Early bridge Building Early Hadoop integration tools
    • 26. The new bounty hunters: Drill Impala Pivotal Stinger The No SQL Posse Wanted Dead or Alive SQL
    • 27. …but Hadoop too slow for interactive BI …loss of train-of-thought still
    • 28. For once technology is on our side …oh and BTW RAM is cheap! CPU Network Storage
    • 29. Lots of these Not so many of these Hadoop is… Hadoop inherently disk oriented Typically low ratio of CPU to Disk
    • 30. ‘Flash’ washing is not the solution
    • 31. Analytics needs low latency, no I/O wait
    • 32. Analytical Platform Reference Architecture Analytical Platform Layer Near-line Storage (optional) Application & Client Layer All BI Tools All OLAP Clients Excel Persistence Layer Hadoop Clusters Enterprise Data Warehouses Legacy Systems Kognitio Storage Reporting Cloud Storage
    • 33. SQL MDX Cognos
    • 34. Reach out, actively select and pull back to consume
    • 35. MPP everything – get more work done “No SQL” graduates to “not-only-SQL” SQL remains preferred data access language … for business community SQL can encapsulate other processing - in-line Python, R, Java etc.
    • 36. Discovery Production
    • 37. Big Data + Hadoop + in-memory for BI
    • 38. Wild West 1865 to 1890 "The Significance of the Frontier in American History" (1893), a thesis by Frederick Jackson Turner. The West not as a particular geographic place, but a frontier process - as a series of Wests on a receding frontier line - the point where savagery meets civilization. For Turner, American history was largely a tale of people leaving settled areas for the frontier, and their struggle to survive in new lands.
    • 39. Driving the golden spike for Hadoop and BI
    • 40. connect kognitio.com kognitio.tel kognitio.com/blog twitter.com/kognitio linkedin.com/companies/kognitio tinyurl.com/kognitio youtube.com/kognitio contact Michael Hiskey VP, Marketing & Business Development michael.hiskey@kognitio.com Paul Groom Chief Innovation Officer paul.groom@kognitio.com Steve Friedberg - press contact MMI Communications steve@mmicomm.com Kognitio is a Platinum Sponsor of the Hadoop Summit – see us at booth #31 – center!
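
Slides 13 and 14 describe BI workloads moving from a simple fetch of a value, to a dynamically computed aggregate, to complex algorithms such as the R linear fit on daily sales. A minimal Python sketch of that progression is below. The sample rows, the `SUMMARY` lookup and every function name are assumptions invented for illustration; the day-of-week normalisation only loosely mirrors the truncated R script on slide 13 and is not the presenter's code.

```python
from collections import defaultdict

# Hypothetical sample rows: (row_id, day_of_week, daily_sales).
# Invented for illustration; not taken from the slides.
ROWS = [
    (0, 0, 10.0), (1, 1, 20.0), (2, 0, 12.0),
    (3, 1, 24.0), (4, 0, 14.0), (5, 1, 28.0),
]

# Hypothetical pre-summarised table: (period, region) -> total sales.
SUMMARY = {("2006-05", 1): 108.0}


def fetch_total(period, region):
    """Stage 1, simple fetch: read a value that was computed ahead of time."""
    return SUMMARY[(period, region)]


def aggregate_total(rows):
    """Stage 2, dynamic aggregate: sum the detail rows at query time."""
    return sum(sales for _, _, sales in rows)


def dow_shares(rows):
    """Share of total sales that falls on each day of the week."""
    totals = defaultdict(float)
    for _, dow, sales in rows:
        totals[dow] += sales
    grand_total = sum(totals.values())
    return {dow: total / grand_total for dow, total in totals.items()}


def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept.

    Assumes at least two distinct x values.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def base_sales_trend(rows):
    """Stage 3, complex algorithm: divide out the day-of-week effect,
    then fit a straight line through the adjusted sales by row id."""
    shares = dow_shares(rows)
    ids = [row_id for row_id, _, _ in rows]
    base = [sales / shares[dow] for _, dow, sales in rows]
    return linear_fit(ids, base)
```

Calling `base_sales_trend(ROWS)` returns a `(slope, intercept)` pair for the adjusted series. Each stage touches more data and does more computation per request than the one before it, which is the shift in workload the notes attribute to modern BI.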
