Einat Shimoni and Pini Cohen’s work Copyright@2015
Do not remove source or attribution from any slide, graph or portion of graph
Big data analytics game – it’s all up to you
Analytics can get you anywhere,
But where would you like to go?
Introducing the new enterprise: Internet of Corporate Things
1. IP Devices
a. Mobile APPS
b. Embedded systems
c. PCs (workstations)
d. ATMs
e. Other “connected products”
2. Secure communications
3. Inside the firewall:
a. Enterprise software
b. Analytics
c. IT services and operations
d. Development platforms
e. Security
4. Outside the firewall communications
a. Cloud operations
b. Cloud platform
c. Cloud development
d. Cloud applications
e. Security
Classic BI
boundaries
Does it make sense
to keep analyzing
only corporate data?
Your analytics consumers should be EVERYONE
• Do your analysis tools support the
digital transformation?
• Is your main user still the CFO?
• What about the CMO? employees?
applications? customers?
Technology is no longer a limit - We’re in (good) trouble!
Amazing technological advancements have brought us to a point where
we can now analyze ANYTHING, at ANY TIME, EVERYWHERE
So…
Now that we can do ANYTHING, what will we do?
“Oh no! Everything is possible!”
Passover preparation: we all must know
Rabban Gamliel's dictum:
"Whoever does not speak of three things on Pesach has not done their duty":
•Hadoop
•Spark
•Elasticsearch
But what got you HERE won’t get you THERE
You can’t win the big data analytics game
using the same old cards and play methods!
The road to big data analytics
•Choose a first (pilot) use case
•Data lake for everyone to experiment
•Train your BI team and analysts (“grow” your data scientists)
•Build a CoE (Center of Excellence) team
•Plan architecture
•Transform DW to the new architecture
Goal: big data analytics
It’s not only about the cards,
it’s HOW you play the game
First step: Choose a use case
Guidelines for choosing the first use case:
•Which data is considered an asset? (call data in telecom; medical data in
healthcare; customer purchase data in credit card companies, etc.)
•Which “main strategic goal” can be supported? Reducing risk costs; improving
customer experience; innovating new business models (e.g. telematics);
cost savings (e.g. preventive maintenance), etc.
•Do you have the right business partner in the line of business (LoB)?
STKI: “But don’t think too
much. The first use case is
usually just for learning and
experimenting, not to
transform your organization”.
2nd card: Designing a new data architecture
Architecture
•Model 1 (preferred): side-by-side, sandbox is a separate data mart
•Model 2: Sandbox is a virtual data mart, hosted inside the DW (eBay)
Architecture
•Data lake
•Unsiloed data, stored in its native structure (no transformation)
•Schema development and cleanup are done later, when a specific need arises
•Suitable for unstructured data
Breaking free from one centric data model
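A minimal sketch of the "schema later" idea, in plain Python rather than any particular big data tool. The sample events, field names, and the `read_purchases` helper are hypothetical; the point is only that records are kept as they arrived and a schema is applied when one analysis needs it.

```python
import json

# Raw events are stored exactly as they arrived (hypothetical sample data).
RAW_EVENTS = [
    '{"user": "u1", "amount": "19.90", "channel": "web"}',
    '{"user": "u2", "amount": 5, "device": {"os": "android"}}',
    '{"user": "u3"}',
]


def read_purchases(raw_lines):
    """Apply a schema at read time, for one specific analysis."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        if "amount" not in record:
            continue  # cleanup happens here, not when the data was loaded
        rows.append({"user": record["user"], "amount": float(record["amount"])})
    return rows


purchases = read_purchases(RAW_EVENTS)
total = sum(row["amount"] for row in purchases)
```

A different analysis could read the same raw lines with a different schema (for example, only the `device` field) without reloading or transforming the stored data.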
Beware of polluted data lakes!
“We see customers creating big data graveyards, dumping everything into HDFS
and hoping to do something with it down the road. But then they just lose track of
what’s there” Sean Martin, CTO of Cambridge Semantics
IT needs to focus on the “less sexy things”:
•Data governance and metadata: a MUST
•Descriptive metadata and its maintenance
•Risk: security and access control
•Training: most users are not ready to work on data lakes
•Establish a central analysts’ CoE (doesn’t have to be under IT)
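One way to picture the metadata and access-control points is a small catalog that refuses undocumented datasets and filters search results by role. This is an illustrative sketch only: `DatasetEntry`, `Catalog`, the paths, and the role names are all hypothetical, not part of any product named in these slides.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    path: str                 # where the data sits in the lake
    description: str          # descriptive metadata
    owner: str                # who maintains the entry
    allowed_roles: set = field(default_factory=set)


class Catalog:
    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Governance rule: nothing enters the lake without a description.
        if not entry.description.strip():
            raise ValueError("refusing to register a dataset without a description")
        self._entries[entry.path] = entry

    def find(self, keyword, role):
        # Access control: only return datasets the role may see.
        keyword = keyword.lower()
        return [
            e.path
            for e in self._entries.values()
            if keyword in e.description.lower() and role in e.allowed_roles
        ]


catalog = Catalog()
catalog.register(
    DatasetEntry(
        path="/lake/raw/cdr",
        description="Call detail records, one row per call",
        owner="network-team",
        allowed_roles={"analyst"},
    )
)
visible = catalog.find("call", role="analyst")
```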
The road to data governance
•Metadata and glossary project at DW level (to support self-service BI)
•Establish data governance team, roles and responsibilities
•Monitoring and ongoing maintenance
•Data governance and privacy policies
•(Ongoing) Data quality
Goal: data governance
3rd Card: Building an analytical Center of Excellence (CoE)
Source: Bain
CoE main functions:
• Setting data strategy
• Responsible for implementing
• Determining privacy policy
• Sometimes – insights generation
4th Card: training the next generation
It takes about 1-2 years to “express-train” data scientists, mainly through online courses
The Open Source Data Science Masters: curriculum for data science
http://datasciencemasters.org/
Or find other sourcing options:
On-demand data scientists (271,000 registered data scientists)
Most important data scientist skill: T-person
•T-shaped skills
•But also “anthropological” understanding
Source: Capgemini
Data Scientist vs. Data Engineer
Data engineers are the designers, builders and managers of the "big data" infrastructure.
They develop the architecture that helps analyze and process data in the way the organization needs it.
And they make sure those systems are performing smoothly.
The road to big data analytics tools
•Choose a first pilot use case
•Hadoop (Spark) store with traditional BI tools
•Build knowledge and check open source tools
•For faster projects, consider vendors’ solution offerings
•Build team
•Migrate to cloud big data services
Goal: big data analytics tools
1st card: Big data analytics tools
•Big Data – parallel, fault-tolerant, scalable, on commodity HW,
(many) with open source offerings
•Highest level of abstraction:
• Where you store the data – repository or storage
• How you deliver data to and from the repository
• How you analyze the data at the repository
•Many tools have integrated solutions
•Management and extra layers (quality, governance, policy etc.)
•Cloud offering of the above
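The three-part abstraction above (store, deliver, analyze) can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions: the `Repository` class, the `deliver` and `analyze` functions, and the click data are hypothetical stand-ins for real storage, ingestion, and analysis tools.

```python
from collections import Counter


class Repository:
    """Store: where the data lives (here, just a list in memory)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)

    def scan(self):
        return iter(self._records)


def deliver(source_lines, repo):
    """Deliver: move data from a source into the repository."""
    for line in source_lines:
        user, page = line.strip().split(",")
        repo.append({"user": user, "page": page})


def analyze(repo):
    """Analyze: compute page-view counts over what is stored."""
    return Counter(record["page"] for record in repo.scan())


repo = Repository()
deliver(["u1,home", "u2,home", "u1,cart"], repo)
page_views = analyze(repo)
```

Keeping the three roles behind separate interfaces is what lets a team swap one tool (say, the storage layer) without rewriting the other two.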
Big data analytics tools: Storage
•Hadoop (HDFS) – the basics – traditionally on servers with local hard drives
•NoSQL DBMS: MongoDB, Cassandra (can run MapReduce), HBase (runs on HDFS)
•NoSQL DBMS in RAM: Redis, Aerospike
•Other storage with HDFS capabilities: EMC Isilon (emulating HDFS on OneFS)
•Other file systems (and cloud services): Ceph, IBM’s GPFS, AWS S3
Big Data analytics tools: Delivering data to and from the repository
•Hadoop-based tools: Flume, Sqoop, BigSQL
•Traditional ETL tools that can integrate with Hadoop and other big data tools:
Informatica, IBM’s DataStage, SAS, Oracle’s ODI, Talend, etc.
•Analytics tools with ETL capabilities: Pentaho
•Real-time integration (some with reasoning for streaming): RabbitMQ, Apache Kafka
Big Data analytics tools: Analyze
•MapReduce
•SQL on Hadoop: Hive, Pig Latin, Cloudera’s Impala, Drill, Pivotal’s HAWQ, Apache
Phoenix (for HBase)
•General analytical tools: Platfora, Pentaho, RapidMiner, SAS, SAP, IBM, Oracle
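For readers new to MapReduce, the first item in the list above, here is a single-process sketch of its three phases (map, shuffle, reduce) as a word count. It is plain Python for illustration, not the Hadoop API; the function names and sample documents are made up for this example.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


documents = ["big data analytics", "big data tools"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

In a real cluster the map and reduce calls run in parallel on many machines; the SQL-on-Hadoop tools listed above exist largely so that analysts do not have to write these phases by hand.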
Analytic tools: a mix
Choose a mix of 2-3 tools
Try to include at least one open source tool
•R (many statistical libraries)
•RapidMiner (UI)
•KNIME (UI)
•Weka (UI)
•Python (programming)
•SAS
•SPSS-IBM
•SAP (Kxen)
•…
Big Data analytics tools: Apache Spark
•Hadoop is the basis of big data (and Google ‘basics’)
•However, Hadoop MapReduce writes to hard drive after each operation – less
efficient for algorithms that use the same data over and over
Source: Cloudera blog
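The cost of re-reading the same data on every pass can be illustrated with a plain-Python analogy. This is not Spark or Hadoop code: `CountingDisk` is a hypothetical stand-in for slow storage, and the two functions contrast reloading the data in every iteration with loading it once and keeping it in memory, which is the behavior Spark's in-memory caching is designed to provide.

```python
class CountingDisk:
    """Hypothetical slow storage that counts how often it is read."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    def load(self):
        self.reads += 1
        return list(self._values)


def iterate_from_disk(disk, steps):
    """Reload the data on every iteration (one read per step)."""
    estimate = 0.0
    for _ in range(steps):
        data = disk.load()
        estimate += 0.5 * (sum(data) / len(data) - estimate)
    return estimate


def iterate_cached(disk, steps):
    """Load the data once, then iterate over the in-memory copy."""
    data = disk.load()
    estimate = 0.0
    for _ in range(steps):
        estimate += 0.5 * (sum(data) / len(data) - estimate)
    return estimate


disk_a = CountingDisk([1.0, 2.0, 3.0])
disk_b = CountingDisk([1.0, 2.0, 3.0])
result_disk = iterate_from_disk(disk_a, steps=10)
result_cached = iterate_cached(disk_b, steps=10)
```

Both variants compute the same estimate; the difference is ten storage reads versus one, and that gap grows with the number of iterations.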
Big Data analytics – combined tools (storage and analytics)
•Search environments (based on Lucene): Elasticsearch, Solr
•Streaming environments (some act like messaging with analytics):
Apache Spark, IBM Streams, Samza, Apache Kafka
•Special-purpose integrated tools (some are HW appliances): IBM’s
Netezza, Teradata and Teradata’s RainStor, Vertica, Oracle’s Exadata
Elasticsearch
•Founded in 2012 by Israeli Shay Banon
•Elasticsearch is an open source RESTful search solution built on top of
Apache Lucene
•Near-real-time search and analytics on any type of document, in diverse formats
•Schema-free
•Kibana: Exploration and visualization layer on top of Elasticsearch
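To make "RESTful" concrete, the sketch below builds (but does not send) an HTTP search request using only the Python standard library. The `_search` endpoint and the `match` query belong to Elasticsearch's REST API; the host, the `tickets` index, the `description` field, and the `build_search_request` helper are assumptions made up for this example.

```python
import json
from urllib.request import Request


def build_search_request(base_url, index, field, text):
    """Build a full-text search request; nothing is sent over the network."""
    body = {"query": {"match": {field: text}}}
    return Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical local node, index, and field names.
request = build_search_request(
    "http://localhost:9200", "tickets", "description", "login failure"
)
```

Passing `request` to `urllib.request.urlopen` would execute it against a running node; because the index is schema-free, the documents being searched need no table definition up front.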
Are there any shortcuts?
Yes! You can use always-on, cloud-based Hadoop environments that let you focus on analytic tasks
•Altiscale: Hadoop as a service (Hadoop “dial tone”)
 https://www.altiscale.com/
•Qubole: Big data analytics as a service, Hadoop + UI: “Enter your query” (dial tone + a nice phone)
 http://www.qubole.com/
Nice way to show quick value!
But STKI believes it’s important to get your hands dirty and learn
Cloud offering of the above
•Cloud offerings of on-premise solutions (almost every tool is available in the cloud as
IaaS).
•Special cloud offerings – PaaS services for “on-premise” tools, and special-purpose cloud
big data tools
General DBMS trends
•Oracle and MSSQL are the definite leaders
•But they are not “hot” anymore
• Open source RDBMS
• Not as easy as “next next” migration but a viable option
• NoSQL (different way of thinking by application designers!)
•PaaS for DBMS execution is an alternative
Big data analytics adoption in Israel:
2015 will be the year of sandbox experiments (we are 2 years behind). 2016 – “real” projects
Source: Forbes
•Early adopters (already in use, mainly open source): Internet & Hi-Tech, Defense
•Advanced (starting to experiment in 2015): Finance, Healthcare
•Still thinking (trying to find a business case): Telecom & media, Government
Worldwide use by sectors (Source: Forbes)
Big data analytics use case - reference architecture
(a.k.a. a Groupon-like application)
Source: http://www.slideshare.net/SessionsEvents/ml-conf-axp2013finalversion8am
Which analytics type are you? Your winning hand is…
Conservative CIO
•IT-controlled BI
•Reliable and consistent data is most important
•Using traditional DW tools – Oracle, Teradata, DB2, MSSQL
•Classic BI tools
•Invest in UI
•Data quality & data governance
Modern CIO
•User-controlled BI
•Self-service BI
•Start checking analytics
•Build a Big Data team that will focus on one or two areas
•Utilize on-premise commercial Hadoop, analyzed with traditional BI tools
•Metadata glossary
Early adopter CIO
•Predictive analytics
•IT provides data, analysis done by LoBs
•Complete strategy around Big Data and Data Lakes
•Utilize mainly open source solutions
•Data and analytics are in the public cloud
•Data scientist resources
•Self-service analytics
•Data scientist analytical tools
•Data governance
•Analytical CoE
Diagram labels: Systems of Immersion, Systems of Intelligence, Systems of Engagement