Big Data and the BI Wild West
Don’t Bring an Elephant to a Gun Fight!
Paul Groom

Hadoop’s “crossing the chasm” will require widespread, ubiquitous adoption by organizations, but the keystone to all of this isn’t the widely talked-about social media such as Facebook, Twitter and LinkedIn. The seemingly mundane “dark data” in business, which is captured but left unutilized or under-utilized, will start the transformation away from the standard architectures of old and into the brave new world generally associated with “Big Data”.
As members of the Hadoop community, we are challenged to bring about that change rapidly and responsibly, bringing order to the “wild west” of today’s disruptive business intelligence landscape. BI is the foothold from which to bring Hadoop into the mainstream. Success requires linking new technologies with the mature ones in use today to enable the search for value.
Beyond the racks and clusters, we need to bring the science and understanding that enable organizations to leave the past behind and move to the brave new world. This requires bringing along applications, processes and groups of users, intelligently combining NoSQL, relational, predictive and advanced analytics technologies to make them easily consumable, even by the business user.

  • A brain: we all depend on it. We spend the early part of our lives developing it, then a few years pickling it with alcohol (not sure that helps preserve it), and then actually using it. Corporations have to build and develop the corporate brain: learn, adapt, develop or die! Business Intelligence is a key part of that learning process.
  • BI is the digital brain of business, the corporate brain: a collection of tools, processes and objectives. Ideally an ethos! Like humans, it needs learning, information and experimentation. In all the sea of technology, the values and reasoning get lost.
  • 7-click build: step through text, then arrow, then spin. As with human learning, it occurs within a group and the context of a community. It requires acquisition of facts (get the data); the ability to view and manipulate (see and interact with the data); the ability to discuss, absorb and review; and then action, which in business means pulling the levers to change. And of course action changes things, which requires iteration and feedback.
  • Very crudely
  • It’s rarely about more charts, more colours, more report styles. Lower latency: speed of access to new data, real-time access. More timely: also ‘faster’. Where’s the value? In the data and in the access. Build it and they will come: it’s more about interactions per user than raw user numbers (the concurrency debate).
  • Note: no click; progressive build from start! Mobile access is coming along, and the application space is broadening with BYOD. Devices can supply access to BI, but they also furiously generate data for BI. Access to dynamic information, but every access generates data and possible inferences. Self-service access.
  • Note: 1-click progressive build. Purely as an aside, if anyone doubts the rise of mobile…
  • In the meantime, data does not stop flowing. Reality check! The big data fire hose is now full on!
  • Note: no build. Visokio Omniscope; also MicroStrategy Insight and SAP Analytics Workbench; new players like Domo; changing players like Alteryx.
  • 2-click build: extend title, then progressive text. Plateaued: what a great word for a run of vowels! Losing momentum; you could almost say flat-lining. The enterprise tools offer only so many variations of charts, tables, colours, layouts, etc. Standard fabric.
  • 2-click build: ‘more’, then R logo. Every time, the progression is from simple fetch and calc to complex calculation. Mining-aided discovery. The new world is about dynamic, real-time analytics. R is the torch bearer: cost-effective, and the tool of choice for millennials coming out of university.
  • No build. Bottlenecks are caused by platforms and tools unable to cope with the demands of complexity, disparity and volume. Complex analytics; machine learning (fraud detection/gaming); web analytics (dynamic content/bid management); modelling (traditional clustering/behavioural, for marketing/product development/resource optimisation); investigative reporting (dashboards and reports with granular data access); data model.
  • Note: 1-click build. BI mostly focuses (sells) on presentation: graphics, pictures, visualisation. BUT behind the scenes a lot of heavy lifting has to be done, and this workload has changed over time from the simple to the complex.
  • 2-click build: text added, then diagram added. What the business cares about is getting work done. The DW is now a bottleneck: its rigour and model get in the way! They really don’t care how or where it is stored! Some tasks are just plain too big to run! It’s not about raw individual speed, it’s about throughput. Address the bottlenecks; too many vendors play games that just shift the bottleneck.
  • Tension: nearly high noon! Two interpretations: the time ‘needed’ to influence (reaction: what?) and the time ‘now’ to influence (action: opportunity). Two contexts: time to influence peers and managers, and time to influence customers. The fastest draw now counts for a lot!
  • Lots more debates and arguments, like everything today, need to be settled quickly. Dangerous but exciting times. However: loss of control and governance, with too much going on around the EDW. Business and IT in a gun fight: the Wild West.
  • 1-click build. So, a quick checkpoint: where are we? More timely? No: too much effort to work out what to do. Batch processing gets in the way of interactive access. Self-serve, if you are knowledgeable enough. Winning in some areas but not in all.
  • No build; into swipe transition. OK, let’s not forget the data warehouse! Who could? In a previous presentation I drew an analogy with castles.
  • (Bodiam Castle, from an Eric Star picture.) Consolidate power, protect, stand the test of time: somewhere safe in difficult times. The DW was built to protect corporate knowledge. Law and discipline: structure, trust, safe haven, control.
  • 1-click build. Lots of investment and permanence. Controlled access: tour access, not full open access. The DW starts to overload and starts to be selective. The DW is inflexible: its controls get in the way of new data and big data, and kill the three ‘V’s. Who’s allowed in, and what are they allowed to do and access? Like visitors to a modern castle, but not necessarily with a nice guidebook. Ultimately its queues and delays cannot cope: users are patient at first, impatient later, and the business wants more, faster. IT sees the pressure from a different perspective (trouble and pain); the main inhibitor is complexity and cost.
  • A quick USA Wild West perspective on castles: more like marts, less edifice, more practical function. The Wild West castle was rapidly constructed from local materials; few long-term examples survive. Time to build (effort expended and time spent): more agile. A rapidly moving new frontier, just like modern BI: keep moving. Disney recreation: Fanghoot.
  • 1-click build: extend with ‘boring’. The DW is policed: it controls what you can have and, in some cases, when you can have it. How many people get excited about their DW, or about access to a DW? Yes, it gets the job done.
  • Well, this little guy certainly woke a few people up! As if a yellow elephant could creep up on you! “Hadoop will solve all my BI problems… RIGHT?” Many business users are still not fully aware of what Hadoop is.
  • 1-click build. Hadoop is not a “universal solution”! Way too much hype and hyperbole: great for innovators and start-ups, not so good for plain old business.
  • No click; progressive build from start. You can debate ‘free’, but it substantially reduces the $$$.
  • 3-click build: text, then two Post-its. The DW demanded ETL to map data into the model and ensure logical consistency, an upfront prerequisite. Structure is strangling the DW; it was its primary strength and is now its weakness. Hadoop is making people lazy: it cuts out thought but leaves future decisions wide open (no lock-in, and less risk of bad decisions). The decision of what to keep is simplified: keep it all. BUT hey, BI needs structure and discipline!
  • 2-click build: Sqoop, then elephant photo. Integration between business infrastructure and systems and Hadoop is still limited. ETL vendors are not sure whether to love or hate Hadoop, which will eat their lunch. Sqoop is great for moving models, not so great for moving big data (or big elephants). It’s not exactly easy to move elephants on creaky railroads!
  • 3-click build: ‘Wanted’, scribble, new players. Ah yes, plugging into Hadoop. So much for the NoSQL revolution. Universal integration is needed to protect the BI investment. The revolution lost the gun fight: like all revolutions, the upstarts died down and got absorbed (subsumed). Business and BI investment demand SQL! First Hive; now we have Drill, Impala and Pivotal. A tough game: yes, it’s SQL access, but not low latency.
  • 1-click build: insert ‘still’, pause, then ‘loss…’. Remember the rise of data discovery. Hadoop is fine for big trawls, but not good for low-latency iterations and high-frequency access. There, I have dared to say it! It does not accelerate BI in quite the way the business was sold by the EDW. Loss of “interactivity” after a decade of being sold train-of-thought. Hadoop: not hands-on, not desktop, not agile.
  • 1-click build: RAM. Balance: full-spectrum power available. Excellent computing power, unlimited storage, fast networks. No need for single platforms like the traditional DW that both store and analyse. This is why data science is rising; we did not get this in the rise of data mining in the ’90s. We’ll come on to RAM shortly.
  • 2-click build. Hadoop is disk-centric storage, just like the EDW: more parallelism, yes, lots more, but still batch and disk-I/O-centric. Its schedulers are not designed for rapid response; they are essentially a batch queue, while BI applications and business users have evolved significantly beyond batch reporting. Hadoop infrastructure evolution will drive more CPUs, because CPUs get work done!
  • 1-click build. Flash is not in-memory. Vendors are flash-washing products; flash boosts I/O, but its limitations are high cost and low capacity. Big EDW vendors simply offer to swap spinning drives for flash drives! EDW appliance vendors offer this at a premium cost, and it only makes sense if the majority of storage is flash. Really, it’s about nanoseconds, not milliseconds. Traditional EDW software is architected for lots of disk and relatively small amounts of CPU. Flash helps, but it is a band-aid on the problem: it buys the EDW a little time if you can afford it. A digital jolt.
  • 2-click build. To be quick on the draw you need lots of access to data: iterations. Analytics is about work done, and more work needs to be done. So don’t hold CPUs back! Highlight the cores; many more are to come. Cores help open up the bottleneck we saw earlier. In-memory is not cache! Memory is underplayed in Hadoop: it’s cheap, so use it! Processors and RAM are the true measure of the work that can be done; disks just fetch. Keep data in memory! Don’t swap, don’t wait on disk, don’t pick through indexes and then data; just access what is needed. The economics of RAM have changed: much lower cost, with large volumes readily available.
  • No build. Real-world view: better performance than the DW, considerably better standards support for SQL (the 2011 standard!), and full OLAP support for both ODBO and XMLA. Kognitio runs on the same technology as Hadoop and works in the same farm.
  • Kognitio Hadoop connector: non-invasive, using standard HDFS/MapReduce access methods. Fast to deploy, with no coding needed. Active selection is Kognitio machine code, with multi-threaded delivery back. Kognitio can retrieve terabytes: a terabyte in 10 minutes. That’s a lot of M&Ms.
  • The NoSQL revolution dissipated and was absorbed; the business won. Hadoop will be the disk drive of the future. Hadoop will be the data OS of the future: a data processing ecosystem and a platform for data science. SQL will be the primary access method. Parallel execution and low latency will be demanded, along with support for running any math or complex process.
  • 2-click build. Graduate analysis to production. A key future ability is to move rapidly from discovery to production: take findings from data scientists and productionize them within hours or days! Discovery has a shelf-life; the time to influence is now. Cloud computing flexibility, PaaS, SaaS and rapid deployment make this possible (enablers). Hadoop provides the consistent central store. You can either scale up and dedicate, or spawn a new logical-model-based system, populate it at scale and start production. Adaptable.
  • 1-click build. Logical Data Warehouse components just need processes and SLAs.
  • Followed the California Gold Rush of 1848/49.
  • The golden spike marked the completion of the Transcontinental Railroad. The Wild West was tamed by infrastructure, by the engineers and navvies, so that the shopkeepers, bankers and workers could easily follow. Business infrastructure will only move on when BI, Hadoop and the supporting ecosystem come together to create an information network.
  • Kognitio
  • Big Data and the BI Wild West
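Two of the notes rely on speed arithmetic. One is “nanoseconds, not milliseconds” for memory versus disk and flash. The other is “a terabyte in 10 minutes” for the Kognitio Hadoop connector. The Python sketch below works through both. The latency constants are assumed, commonly quoted order-of-magnitude figures, not numbers from the talk or from any particular hardware. A terabyte is taken to be 10^12 bytes.

```python
# Back-of-envelope arithmetic for two speed claims in the speaker notes.
# ASSUMPTION: the latency constants are commonly quoted order-of-magnitude
# values, not measurements from the talk or from any specific hardware.

RAM_ACCESS_NS = 100            # assumed: main-memory reference, ~100 ns
FLASH_READ_NS = 150_000        # assumed: flash random read, ~150 us
DISK_SEEK_NS = 10_000_000      # assumed: spinning-disk seek, ~10 ms


def ratio(slow_ns: float, fast_ns: float) -> float:
    """How many times longer the slow access takes than the fast one."""
    return slow_ns / fast_ns


def throughput_gb_per_s(total_bytes: float, seconds: float) -> float:
    """Average transfer rate in decimal gigabytes (1e9 bytes) per second."""
    return total_bytes / seconds / 1e9


disk_vs_ram = ratio(DISK_SEEK_NS, RAM_ACCESS_NS)
flash_vs_ram = ratio(FLASH_READ_NS, RAM_ACCESS_NS)
tb_in_10_min = throughput_gb_per_s(1e12, 10 * 60)  # "a terabyte in 10 minutes"

print(f"disk seek vs RAM:  {disk_vs_ram:,.0f}x")   # disk seek vs RAM:  100,000x
print(f"flash read vs RAM: {flash_vs_ram:,.0f}x")  # flash read vs RAM: 1,500x
print(f"1 TB in 10 min:    {tb_in_10_min:.2f} GB/s")  # 1 TB in 10 min:    1.67 GB/s
```

Under these assumptions, flash closes only part of the gap. A flash read still costs about three orders of magnitude more than a memory reference, which is the point behind calling flash a band-aid. The quoted connector rate averages about 1.67 GB/s.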

    1. Big Data and the BI Wild West: Don’t Bring an Elephant to a Gun Fight! Paul Groom
    2. Tools, Processes, Objectives
    3. Why Business Intelligence? View, Learn, Action, Community, Acquire
    4. What is Business Intelligence? Numbers, Tables, Charts, Indicators. Time: history, lag. Access: to view (portal), to data, to depth, control/secure. Consumption: digestion …with ease and simplicity
    5. Business [Intelligence] Desires: More timely, Lower latency, More granularity, More user interactions, Richer data model, Self service
    6. View and generate
    7. Got mobile? 200 million employees bring their own device to work. Nearly half of the workforce will be made up of millennials by 2020. 50% of BYOD companies have had a security breach. 1/3 have broken, or would break, corporate policy on BYOD.
    8. Data flow
    9. Dynamic access. Drill unlimited. Disruption: Data Discovery tools
    10. BI tools have plateaued…again. Decision Support (Reporting) in the late ’90s; Business Intelligence of the ’00s …led to data mining …leading to analytics and data science
    11. More math …a lot more math
    12. The drive for deeper understanding (chart axes: Technology/Automation, Analytical Complexity): Reporting & BPM, Campaign Management, Fraud detection, Dynamic Interaction; Machine learning algorithms, Dynamic Simulation, Statistical Analysis, Clustering, Behaviour modelling
    13. Behind the numbers (code on the slide is cut off at the slide edge; “…” marks truncated text):
        create external script LM_PRODUCT_FORECAST environment rsint
        receives ( SALEDATE DATE, DOW INTEGER, ROW_ID INTEGER, PRODNO INTEGER, DAILYSALES …
        partition by PRODNO order by PRODNO, ROW_ID
        sends ( R_OUTPUT varchar )
        isolate partitions
        script S'endofr(
        # Simple R script to run a linear fit on daily sales
        prod1<-read.csv(file=file("stdin"), header=FALSE,row.names …
        colnames(prod1)<-c("DOW","ID","PRODNO","DAILYSALES")
        dim1<-dim(prod1)
        daily1<-aggregate(prod1$DAILYSALES, list(DOW = prod1$DOW), …
        daily1[,2]<-daily1[,2]/sum(daily1[,2])
        basesales<-array(0,c(dim1[1],2))
        basesales[,1]<-prod1$ID
        basesales[,2]<-(prod1$DAILYSALES/daily1[prod1$DOW+1,2])
        colnames(basesales)<-c("ID","BASESALES")
        fit1=lm(BASESALES ~ ID,as.data.frame(basesales))
        …

        select Trans_Year, Num_Trans,
          count(distinct Account_ID) Num_Accts,
          sum(count( distinct Account_ID)) over (partition by Trans_Year …
          cast(sum(total_spend)/1000 as int) Total_Spend,
          cast(sum(total_spend)/1000 as int) / count(distinct Account_ID …
          rank() over (partition by Trans_Year order by count(distinct A…
          rank() over (partition by Trans_Year order by sum(total_spend) …
        from( select Account_ID,
          Extract(Year from Effective_Date) Trans_Year,
          count(Transaction_ID) Num_Trans, …

        select dept, sum(sales) from sales_fact
        where period between date '2006-05-01' and date '2006-05-31'
        group by dept having sum(sales) > 50000;

        select sum(sales) from sales_history where year = 2006 and month = 5 and region = 1;

        select total_sales from summary where year = 2006 and month = 5 and region = 1;
    14. It’s all about getting work done. Used to be a simple fetch of a value. Tasks evolving: then it was computing a dynamic aggregate; now complex algorithms!
    15. Time to influence. Reaction – what? – potential value. Action – opportunity – interaction. BI is becoming democratized
    16. BI Wild West
    17. Business [Intelligence] Desires in relation to Big Data: More timely, Lower latency, More granularity, More user interactions, Richer data model, Self service
    18. The Data Warehouse?
    19. Realities
    20. Reports against the DW are just plain dull, boring even!
    21. And then came…
    22. Hadoop ticks many but not all the boxes
    23. Stomped on costs. Made economies of scale practical
    24. No need to pre-process before storage, i.e. no need to align to storage. No need to triage before storage
    25. Early bridge building: early Hadoop integration tools
    26. The new bounty hunters: Drill, Impala, Pivotal, Stinger. The No SQL Posse. Wanted Dead or Alive: SQL
    27. …but Hadoop still too slow for interactive BI …loss of train-of-thought
    28. For once technology is on our side …oh and BTW RAM is cheap! CPU, Network, Storage
    29. Hadoop is… Lots of these, not so many of these: Hadoop is inherently disk oriented, with a typically low ratio of CPU to disk
    30. ‘Flash’ washing is not the solution
    31. Analytics needs low latency, no I/O wait
    32. Analytical Platform Reference Architecture: Analytical Platform Layer; Near-line Storage (optional); Application & Client Layer (All BI Tools, All OLAP Clients, Excel); Persistence Layer (Hadoop Clusters, Enterprise Data Warehouses, Legacy Systems); Kognitio Storage; Reporting; Cloud Storage
    33. SQL, MDX, Cognos
    34. Reach out, actively select and pull back to consume
    35. MPP everything – get more work done. “No SQL” graduates to “not-only-SQL”. SQL remains the preferred data access language … for the business community. SQL can encapsulate other processing: in-line Python, R, Java, etc.
    36. Discovery, Production
    37. Big Data + Hadoop + in-memory for BI
    38. Wild West, 1865 to 1890. “The Significance of the Frontier in American History” (1893), a thesis by Frederick Jackson Turner. The West not as a particular geographic place, but as a frontier process: a series of Wests on a receding frontier line, the point where savagery meets civilization. For Turner, American history was largely a tale of people leaving settled areas for the frontier, and of their struggle to survive in new lands.
    39. Driving the golden spike for Hadoop and BI
    40. Connect: kognitio.com, kognitio.tel, kognitio.com/blog, twitter.com/kognitio, linkedin.com/companies/kognitio, tinyurl.com/kognitio, youtube.com/kognitio. Contact: Michael Hiskey, VP, Marketing & Business Development, michael.hiskey@kognitio.com; Paul Groom, Chief Innovation Officer, paul.groom@kognitio.com; Steve Friedberg, press contact, MMI Communications, steve@mmicomm.com. Kognitio is a Platinum Sponsor of the Hadoop Summit – see us at booth #31 – center!
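Slide 13’s R script handles one product at a time. It divides each day’s sales by that day of week’s share of total sales, then fits a straight line to the resulting “base” sales with lm(). The sketch below restates that logic in Python to make the steps explicit. It is an illustration, not the author’s code. It assumes the truncated aggregate() call summed sales per day of week. The names forecast_trend, dow_shares and linear_fit are hypothetical.

```python
# Sketch of the per-product logic in slide 13's R script, in plain Python.
# ASSUMPTIONS: the truncated aggregate() call used FUN = sum, and the script's
# (truncated) output is the fitted intercept and slope. Names are hypothetical.
from collections import defaultdict


def dow_shares(rows):
    """Fraction of total sales that falls on each day of week (DOW)."""
    totals = defaultdict(float)
    for dow, _row_id, sales in rows:
        totals[dow] += sales
    grand_total = sum(totals.values())
    return {dow: total / grand_total for dow, total in totals.items()}


def linear_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, as lm(y ~ x) fits."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def forecast_trend(rows):
    """rows: (DOW, ROW_ID, DAILYSALES) tuples for one PRODNO partition."""
    shares = dow_shares(rows)
    ids = [row_id for _dow, row_id, _sales in rows]
    # Divide out the day-of-week effect to get "base" sales, as the R script does.
    base = [sales / shares[dow] for dow, _row_id, sales in rows]
    return linear_fit(ids, base)


# Hypothetical usage: two weeks of identical daily sales give a flat trend.
rows = [(day % 7, day, 10.0) for day in range(14)]
intercept, slope = forecast_trend(rows)
print(intercept, slope)
```

The R script looks up each day’s share by row position, with daily1[prod1$DOW+1,2]. This sketch keys the shares by day of week in a dictionary instead. The lookup therefore stays correct even when a partition is missing some days of the week.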

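Slide 35’s “MPP everything” can be illustrated with the simplest query on slide 13: sum of sales per dept, keeping only departments where sum(sales) > 50000. The sketch below follows a partition-and-merge plan. Rows are hash-partitioned on the grouping key, and each worker filters and aggregates its own partition in memory. The partial results are then merged before the HAVING test. This is a hypothetical Python illustration of the general pattern, not Kognitio’s execution engine. The row layout, function names and worker count are assumptions. Threads stand in for MPP nodes, so the sketch shows the shape of the plan rather than a real speed-up.

```python
# Sketch of an MPP-style plan for:
#   select dept, sum(sales) from sales_fact
#   where period between <start> and <end>
#   group by dept having sum(sales) > <threshold>;
# ASSUMPTIONS: rows are hypothetical (dept, period, sales) tuples; threads stand
# in for MPP nodes, so this shows the plan's structure, not a CPython speed-up.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from datetime import date


def partition(rows, n_workers):
    """Distribute rows across workers by hashing the grouping key (dept)."""
    parts = [[] for _ in range(n_workers)]
    for row in rows:
        parts[hash(row[0]) % n_workers].append(row)
    return parts


def local_aggregate(rows, start, end):
    """One worker: WHERE period BETWEEN start AND end, then SUM(sales) per dept."""
    totals = Counter()
    for dept, period, sales in rows:
        if start <= period <= end:
            totals[dept] += sales
    return totals


def dept_sales_over(rows, start, end, threshold, n_workers=4):
    parts = partition(rows, n_workers)
    merged = Counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(lambda p: local_aggregate(p, start, end), parts):
            merged.update(partial)  # Counter.update adds totals together
    # HAVING SUM(sales) > threshold
    return {dept: total for dept, total in merged.items() if total > threshold}


rows = [
    ("toys", date(2006, 5, 3), 30000),
    ("toys", date(2006, 5, 20), 25000),
    ("books", date(2006, 5, 9), 40000),
    ("toys", date(2006, 6, 1), 90000),  # outside May 2006: filtered out
]
result = dept_sales_over(rows, date(2006, 5, 1), date(2006, 5, 31), 50000)
print(result)  # {'toys': 55000}
```

Rows are partitioned on dept, so each department’s total is complete on a single worker. The general merge step is kept because it would be needed if the data were partitioned on some other column.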