8. #7 Big data is an IT problem
“Building out Big Data capabilities too often becomes the end goal itself.”
— Matt Ariker, “What You Need to Make Big Data Work: The Pencil”, Forbes CMO Network
Image sourced from https://www.flickr.com/photos/rosauraochoa/3256859352/
Forrester Research finds that most organisations analyse only 12% of their data.
Average EDW: about 15 TB
Average Hadoop installation: 150 to 200 TB
In-memory is very good for high-speed decision-making on data subsets; big data identifies the relevant subsets.
Market for Hadoop estimated at US$50bn within 5 years.
All major vendors support Hadoop.
62 percent of respondents expect to optimize enterprise data warehouses by offloading data and batch workloads (ELT) to Hadoop
69 percent of respondents expect to make enterprise-wide data available for analytics in Hadoop
53 percent of respondents are dedicating 5-10% of their planned budgets on Big Data projects
70+ percent of respondents are from companies with $50 million+ in revenue (source: Syncsort Hadoop market adoption survey, http://blog.syncsort.com/2014/09/hadoop-market-adoption-survey-asks-big-data-analytics-ready-prime-time/)
CHALLENGE #1:
You need to integrate and analyse all of your siloed data together to generate the best insights.
CHALLENGE #2:
You need to generate immediate insights from big data to drive better business decisions.
CHALLENGE #3:
You need to make analysing big data so easy that anyone can do it.
Infrastructure costs are low, but development costs can be high
Big data must have clear business goals
Most common use cases:
Customer Analytics
Fraud & Compliance
Operational Analytics
Data Driven products and Services
EDW Optimisation
Commercial Hadoop distributions – MapR, Cloudera, Hortonworks
Aim for roughly 1.5 times the capacity your use case requires.
Suggested starting environment – 1 master + 3 slaves (8 cores, 8 GB RAM, 4 × 4 TB HDDs each)
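As a back-of-the-envelope illustration of the 1.5× rule against the suggested starting cluster (the node counts and disk sizes come from the slide; the replication factor of 3 is HDFS's default, assumed here):

```python
# Rough sizing for the suggested starting cluster: 3 slaves, 4 x 4 TB disks each.
# Assumes HDFS's default replication factor of 3 and the 1.5x headroom rule above.
REPLICATION = 3
HEADROOM = 1.5

slaves = 3
disks_per_slave = 4
tb_per_disk = 4

raw_tb = slaves * disks_per_slave * tb_per_disk   # total raw disk across slaves
usable_tb = raw_tb / REPLICATION                  # capacity after 3-way replication
max_use_case_tb = usable_tb / HEADROOM            # largest use case this fits

print(f"raw: {raw_tb} TB, usable: {usable_tb:.0f} TB, "
      f"use case should stay under ~{max_use_case_tb:.1f} TB")
```

In other words, a 48 TB raw starting cluster comfortably serves a use case of roughly 10 TB of data once replication and headroom are accounted for.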
Sources:
The 18-month figure comes from TDWI.
The less-than-4-weeks figure comes from Sears.
Summary:
Traditional solutions:
- take too long
- are inflexible and not business friendly (because they require a predefined, agreed-upon data model)
- do not answer the questions that will get you ahead
Detail:
Traditional solutions:
1. take longer
2. are inflexible
3. do not answer the unknown questions
Before Hadoop, we had limited storage and compute which forced a very slow 3-tier architecture for business intelligence:
First, IT runs a process called ETL to get every new data source ready to be stored: they Extract, Transform and Load data from a source and massage it into a database or data warehouse - basically into a static data model.
The problem with a data model is that IT designs it today with the knowledge of yesterday, and you have to hope that it is good enough for tomorrow. But nobody can predict the perfect schema.
Then on top of that you put the business intelligence tool, which because of the static schemas underneath, is optimized to answer KNOWN QUESTIONS.
Now every time we have new data (point at the ETL side) IT has to update the schema, which means we have to update BI.
Every time we have a different question (point at BI) IT has to update the schema and that means we have to update all ETL.
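A toy sketch of why a predefined schema is so brittle (the table and field names here are hypothetical, purely for illustration):

```python
# A predefined warehouse schema only accepts the columns IT agreed on up front.
# Hypothetical example: a new 'device' field arrives and the load step rejects it.
SCHEMA = {"customer_id": int, "amount": float}

def load_row(row: dict) -> dict:
    unknown = set(row) - set(SCHEMA)
    if unknown:
        # In a real warehouse, this is the point where IT must change the
        # schema - and every downstream ETL job and BI report with it.
        raise ValueError(f"schema change required for: {sorted(unknown)}")
    return {col: SCHEMA[col](row[col]) for col in SCHEMA}

load_row({"customer_id": "42", "amount": "9.99"})  # fits the agreed model

try:
    load_row({"customer_id": "42", "amount": "9.99", "device": "mobile"})
except ValueError as e:
    print(e)  # new data (or a new question) forces a schema change
```

Every new column or new question hits that `ValueError` path, which in the 3-tier world translates to months of schema, ETL and BI rework.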
We have three tools, three teams and three pieces of hardware. That works great once it is all set up, but it causes high friction as soon as something has to change.
The ETL developer talks about strings and integers, the business analyst talks about customer churn and user behaviour, and the database administrator only screams “foreign key, foreign key”.
TDWI says this 3-tier process takes 18 months to implement or change. On average it takes 3 months to integrate a new data source.
Business is telling us we cannot operate at this speed anymore.
Now here is how Datameer approaches the problem.
First, we skip the time-consuming ETL process and load all data raw, which makes it extremely fast. That means with all its metadata, but in the form it was generated: as a log file, a database table, a mainframe copybook or a social media stream.
We take all this data, for the broad view you need, and store it in Hadoop, our unlimited low-cost storage and compute platform. Since we don’t force the data into a static data model, we can simply create different templates to view the data - technically speaking, late-binding schemas.
Now we don’t need one data warehouse for IT, one for Sales and one for Marketing; all the data sits in Hadoop, and on the front end we just create a Marketing view, a Sales view and an IT view of the data. This means you can actually explore all of your data to discover unknown patterns or relationships, which you never could with traditional BI tools.
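The late-binding idea can be sketched in a few lines (hypothetical fields and views for illustration, not Datameer's actual API):

```python
import json

# "Schema on read": store every record raw, with no up-front data model.
raw_store = [
    json.dumps({"user": "ann", "event": "click", "campaign": "spring"}),
    json.dumps({"user": "bob", "event": "purchase", "amount": 19.99}),
]

# Views are just templates bound to the raw data at read time -
# one per team, all over the same stored records.
def marketing_view(record: dict) -> dict:
    return {"user": record["user"], "campaign": record.get("campaign")}

def sales_view(record: dict) -> dict:
    return {"user": record["user"], "amount": record.get("amount", 0.0)}

def apply_view(view):
    return [view(json.loads(r)) for r in raw_store]

print(apply_view(marketing_view))
print(apply_view(sales_view))
```

Adding a new question here means writing one more view function over the same raw store, not rebuilding a warehouse schema.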
We make these views extremely easy to create by using a very familiar spreadsheet UI. (JOKE: Has anybody here used Microsoft Excel before? PAUSE Well, then you can use our product.)
The Lean Analytics Process is based on the process best practices gathered from hundreds of Datameer customers. This is a process that companies have deployed to achieve maximum value from their big data analytics projects.
Identify use case: Without a business case, a big data project will fail. Datameer offers use cases and sample applications (based on customers) to provide examples of ROI and accelerate time-to-market of deployment.
The next part of the cycle is iterative and differentiates Datameer from traditional BI solutions. The ability for a subject-matter expert to integrate, prepare, analyze and visualize data in a dynamic, iterative way is essential to answering known and unknown questions.
Integrate: The process of connecting to and ingesting the data and getting transformed data back into source systems.
Prepare and Analyze: The process of performing transformations on the data. Traditional systems make it very difficult to bring in and massage unstructured data.
Visualize: The process of visually representing the transformed data in a way that business users can consume as insights to make critical business decisions on.
Deploy: This covers how you deploy and productionize Datameer for the problems being solved. Datameer provides built-in capabilities to make it a production-ready enterprise solution.
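The iterative integrate → prepare → analyze → visualize loop above could be sketched, very loosely, like this (hypothetical records and steps; a real deployment would read from source systems):

```python
# Minimal sketch of the integrate -> prepare -> analyze -> visualize cycle.
def integrate():
    # Ingest raw records from a (hypothetical) source system.
    return [{"customer": "ann", "spend": "120"},
            {"customer": "bob", "spend": "80"},
            {"customer": "ann", "spend": "40"}]

def prepare(rows):
    # Transform raw strings into typed values.
    return [{"customer": r["customer"], "spend": float(r["spend"])} for r in rows]

def analyze(rows):
    # Aggregate spend per customer.
    totals = {}
    for r in rows:
        totals[r["customer"]] = totals.get(r["customer"], 0.0) + r["spend"]
    return totals

def visualize(totals):
    # Render a crude text bar chart a business user could read.
    for customer, spend in sorted(totals.items()):
        print(f"{customer:8s} {'#' * int(spend // 20)} {spend:.0f}")

visualize(analyze(prepare(integrate())))
```

The point of the loop is that each stage is cheap to re-run, so a subject-matter expert can iterate on all four steps instead of handing each one to a different team.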