BIG DATA IN BUSINESS
Big Data

Is it a real need or just trendy?
Does it apply to my case, and why?
Petabytes: Google 300 PB, Facebook 45 PB, Yahoo! 180 PB
Exabytes: U.S. healthcare
Zettabytes: 1.8 ZB created in 2011; world information 9.57 ZB
Yottabytes, brontobytes, geopbytes: yet to be reached

I do not have such a big volume of data
A big European company = Terabytes
But you could, or will, have it:
An ever-increasing amount of data, and more heterogeneous:
Ubiquity, mobility, geolocation, social networks, internet, sensors, M2M

CRMs, Call Centers, Emails, Documents, logs,
voice…
"There were 5 exabytes of information created by the entire world between the
dawn of civilization and 2003. Now that same amount is created every two days."
Google CEO Eric Schmidt
Unstructured or semi-structured data, which accounts for 85% of available data,
is not used by companies
This represents the new fuel for companies
83% of the surveyed companies were
able to do things with Big Data that
seemed impossible to achieve before

“The art of the possible”
“Impossible is not a fact, it’s an opinion”
Value and real ROI are the best KPIs
• Increase in client acquisitions
• Increase in sales
• Resource optimization
• Customer loyalty
You can’t stay stuck in old
paradigms
When to use it?
Extract value from data at any point in its life cycle
• Past: stored data, batch mode
• Present: current data flows, real time
• Future: data and future actions, predictive
Big volume of data
Get value from Unstructured data
Get value from external data
Need to reduce processing time or cost
Need for real-time analysis of data streams
Algorithms, prediction or interactive analysis
Transform data into insights and value
Transformation into a data-driven company
Customer Pain
“I know I have to change
to Big Data but…”
How do I start using it? When?
Which technology?

How do I acquire the knowledge?
How to use it?
Iterative and Cyclical
Choose a particular use case with a clear ROI
and time and budget limits

vs Big Bang
Avoid building a generic Big Data system and
then implementing projects on top of it
Which Technology?
A Technological Change
From Big Data 1.0 (Bigtable) to Big Data 2.0 (BigQuery, F1): a 12-year gap
CUSTOMER SOLUTION
Big Data 2.0
• Up to 100x faster than Big Data 1.0
• Interactive analysis
• NoSQL with SQL interface
• No need to change the previous way of working
Which technology?

Technology landscape (diagram). Groups: Big Data 2.0, Big Data 1.0, NoSQL, stream processing, graph databases, storage. Licensing: open source, closed based on open source, closed.

Technologies shown: Stratio, Cloudera Impala, Cloudera CDH4*, Hortonworks HDP*, EMC Pivotal HD, VoltDB, Storm, Microsoft HDInsight, C-Store, Apache HBase, MapR Apache Drill, Espresso, Apache CouchDB, Scribe, Aurora, SQLStream Platform, Cassandra FS, Apache HDFS, Google BigQuery, IBM InfoSphere BigInsights, DataStax Platform, Hadapt platform, Basho Riak, VMware Redis, HP Vertica, HStreaming Platform, Apache Giraph, Amazon EMR & Redshift, MapR M3-M5-M7, EMC Greenplum, Voldemort, Apache S4, Apache Flume, Kafka, Neo Technology Neo4j*, Intel Hadoop, Memcached, EsperTech Esper, Hortonworks Stinger, StreamBase Platform, IBM InfoSphere Streams, FlockDB, EMC Isilon OneFS, Apache Cassandra
From Big Data 1.0

A set of new technologies that allow us to extract value from a dataset which, due
to its volume, variety or velocity, was not previously exploited

To Big Data 2.0

“Set of new technologies that extract value from all the available data of a
company”
Use Cases
The filter bubble
You must get inside the user's bubble
Antena 3, Nubeox: Big Data recommendation engine
Monitoring of Streaming Videos
Description:
Recommendation engine based not only on the
customer's purchase history but also on their
browsing

Advantages:
Increase in clickthrough
Increase in conversions
Increase in sales
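A recommender that blends both signals can be sketched as follows. This is a minimal illustration, not the engine used for Antena 3/Nubeox: it assumes each catalog item carries a set of tags (for example genres), and the function name, tag model and weights (a purchase counting three times as much as a page view) are hypothetical.

```python
from collections import Counter

# Hypothetical weights: a purchase is a stronger signal than a page view.
PURCHASE_WEIGHT = 3.0
VIEW_WEIGHT = 1.0


def recommend(purchases, views, catalog_tags, k=3):
    """Rank items the user has not seen by overlap with their weighted history.

    purchases, views: lists of item ids for one user.
    catalog_tags: dict mapping item id -> set of tags (hypothetical model).
    """
    # Build a weighted interest profile over tags from both signals.
    profile = Counter()
    for item in purchases:
        for tag in catalog_tags.get(item, ()):
            profile[tag] += PURCHASE_WEIGHT
    for item in views:
        for tag in catalog_tags.get(item, ()):
            profile[tag] += VIEW_WEIGHT

    seen = set(purchases) | set(views)
    scores = {
        item: sum(profile[t] for t in tags)
        for item, tags in catalog_tags.items()
        if item not in seen
    }
    # Highest score first; ties broken by item id for a stable order.
    ranked = sorted(scores, key=lambda i: (-scores[i], i))
    return ranked[:k]
```

Because browsing contributes to the same profile as purchases, a user with no purchase history still gets personalised results as soon as they navigate.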
Customizing Web Sites: Behavioural Customization
Description:
Customizing homepages based on user navigation
Analysis and customization of the homepage and site in
real time for each user based on their browsing
Real-time modification of content, highlights and ads
based on user history
Advantages:
Over 300% increase in clickthrough
Millions of web pages created in real time
Increase in conversions
Increase in sales
Cost ten times lower than other solutions
Recommended links: +79% clicks vs. randomly selected
News interests: +160% clicks vs. one size fits all
Top searches: +43% clicks vs. editor selected
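Real-time behavioural customization needs a per-user profile that reacts to every page view and lets old interests fade. The sketch below assumes topic-tagged pages, an exponential decay with a hypothetical one-hour half-life, and an invented InterestProfile class; it only illustrates ranking homepage highlights by the user's current interest.

```python
# Hypothetical half-life: an interest loses half its weight per idle hour.
HALF_LIFE_S = 3600.0


class InterestProfile:
    """Per-user topic scores that decay over time and grow on each page view."""

    def __init__(self):
        self.scores = {}    # topic -> score as of self.updated
        self.updated = 0.0  # timestamp (seconds) of the last decay step

    def _decay(self, now):
        factor = 0.5 ** ((now - self.updated) / HALF_LIFE_S)
        for topic in self.scores:
            self.scores[topic] *= factor
        self.updated = now

    def record_view(self, topic, now):
        """Call on every page view of a page tagged with `topic`."""
        self._decay(now)
        self.scores[topic] = self.scores.get(topic, 0.0) + 1.0

    def pick_highlights(self, candidates, now, k=2):
        """Order (title, topic) candidates by the user's current interest."""
        self._decay(now)
        ranked = sorted(candidates, key=lambda c: -self.scores.get(c[1], 0.0))
        return [title for title, _topic in ranked[:k]]
```

Decaying lazily, only when the profile is touched, keeps the per-request cost small, which matters when pages are generated for millions of users.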
Personalized Marketing with DataShake integration
Description:
Newsletters, email marketing or any
other material sent, segmented by individual
preferences
Analyzes and takes into account:
• Financial information and user data
• Navigation and usage information from previous
marketing mailings
• Mobile app data (GPS, payments, browsing of
offers…)
• Users’ information from social networks
Advantages:
Increased clickthrough
Increase in conversions and sales
Natural language processing – semantics and
sentiments
Combines private and public data
Complement private structured data with unstructured and
public data
Description:
Complementing the internal data of a company by
combining the structured and the unstructured
data, with the data generated by the web and
social networks, allows us to determine the validity
of the data of our brand, product or company.
The comparison and analysis of internal and
external data (web) increases the value of our data
and gives us a competitive advantage.
Advantages:

Improves sales.
Improves loyalty.
Increases conversions.
Detects errors or data manipulation.
Improves SEO with regard to users and
public data.
Improves marketing and product promotion in line
with trends.
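One simple way to combine the two sources is to join internal scores with public sentiment per product and flag large disagreements, which may point to data errors or manipulation. This is a sketch under assumptions: both scores are taken to be normalized to [-1, 1], sentiment is computed upstream, and the function name and threshold are hypothetical.

```python
from collections import defaultdict


def compare_internal_external(internal, mentions, threshold=0.5):
    """Join internal scores with public sentiment and flag big disagreements.

    internal: dict product -> internal satisfaction score in [-1, 1]
              (e.g. from surveys or the CRM; hypothetical scale).
    mentions: iterable of (product, sentiment) pairs from web/social data,
              sentiment in [-1, 1].
    Returns (report, flagged): report maps product -> (internal, public, n).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for product, sentiment in mentions:
        totals[product] += sentiment
        counts[product] += 1

    report, flagged = {}, []
    for product, score in internal.items():
        n = counts.get(product, 0)
        public = totals[product] / n if n else None
        report[product] = (score, public, n)
        # A large gap may reveal data errors, manipulation or a blind spot.
        if public is not None and abs(score - public) > threshold:
            flagged.append(product)
    return report, sorted(flagged)
```

Products with no public mentions are reported with `None` rather than treated as neutral, so missing external coverage is visible instead of hidden.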
BI and data analytics
Description:
Creation of new BI and data analytics systems, or
complementing existing ones
ETL tools and data loading at much higher
volumes than traditional ones
Capacity for analysis and visualization of all types of
data, including graphs and new data types
Advantages:
Ability to work with larger datasets without the need
to add or delete
Much faster and more reliable systems
Massive reduction in cost (millions of euros versus thousands)
Natural language processing – semantics and
sentiments
Ability to combine internal data with external
data (private and public data)
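High-volume ETL relies on streaming records through the pipeline rather than loading everything into memory. The single-machine sketch below shows that pattern with Python generators; the column names (region, amount) are hypothetical, and distributed ETL tools apply the same extract, transform, load split across many nodes.

```python
import csv
from collections import defaultdict


def extract(lines):
    """Extract: stream rows from any iterable of CSV lines (file, socket...)."""
    yield from csv.DictReader(lines)


def transform(rows):
    """Transform: parse types and drop malformed rows instead of failing."""
    for row in rows:
        try:
            yield row["region"].strip(), float(row["amount"])
        except (KeyError, ValueError, TypeError, AttributeError):
            continue  # a real pipeline would route these to a reject file


def load(records):
    """Load: aggregate into a summary table (here, a dict in memory)."""
    totals = defaultdict(float)
    for region, amount in records:
        totals[region] += amount
    return dict(totals)
```

Only one row is in flight at a time, so memory use stays flat whatever the input size; the summary table is the only thing that grows.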
Telefónica Dynamic Insights (Smart Steps)
Description:
Collect mobile data, anonymised and
aggregated, to understand how segments of
the population collectively behave. Trace
trends and the behaviours of crowds, not
individuals. Use this insight to shed light on the
space between organisations and their
users, enabling them to improve their
propositions and businesses.

Focus:
By being able to measure real behaviour, in
near real-time, 24/7, 365 days a year, we
can show the actual impact on society,
therefore enabling businesses and local
government to make better decisions.
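Working with crowds rather than individuals can be illustrated with aggregation plus small-cell suppression. This is only a sketch: it assumes events of the form (user id, zone, hour) and a hypothetical minimum crowd size of 3; a threshold rule like this is not by itself a complete anonymisation scheme, nor is it Telefónica's method.

```python
from collections import defaultdict

# Hypothetical minimum crowd size below which a cell is never published.
MIN_CROWD = 3


def crowd_counts(events, min_crowd=MIN_CROWD):
    """Aggregate (user_id, zone, hour) events into distinct-user counts per cell.

    User ids are only used to count distinct people and are not returned;
    cells with fewer than min_crowd people are suppressed so that small
    groups cannot be singled out.
    """
    users_per_cell = defaultdict(set)
    for user_id, zone, hour in events:
        users_per_cell[(zone, hour)].add(user_id)
    return {
        cell: len(users)
        for cell, users in users_per_cell.items()
        if len(users) >= min_crowd
    }
```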
Security and fraud detection
Description:
Analysis of large volumes of data, logs, security
systems, transactional systems
Faster correlation mechanisms and machine learning
algorithms allow early detection of attacks and
security risks, taking extra care to avoid false positives
Internal fraud detection analyzing data and events
from applications and risk operations
Advantages:
Combines data from transactional systems with the
SIEM to help fight fraud
Tracks and identifies new fraud methods and trends
via user reviews
Fraud detection techniques specified through the use
of built-in patterns
Much larger data volumes and much higher velocity
Combines private and public data
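The correlation idea can be shown with a single rule evaluated over a sliding time window: a large transfer preceded by several failed logins from the same user. The event format, thresholds and function name are hypothetical; real systems combine many such rules with machine-learning scores.

```python
from collections import defaultdict, deque

# Hypothetical rule parameters.
WINDOW_S = 300           # look-back window in seconds
MIN_FAILED_LOGINS = 3    # failures in the window that make a transfer suspicious
LARGE_AMOUNT = 10_000    # transfer amount considered risky


def correlate(events):
    """Flag large transfers preceded by repeated failed logins.

    events: iterable of (timestamp, user, kind, amount), sorted by timestamp;
    kind is "login_failed" or "transfer" (amount is ignored for logins).
    Returns a list of (timestamp, user) alerts.
    """
    failures = defaultdict(deque)  # user -> timestamps of recent failures
    alerts = []
    for ts, user, kind, amount in events:
        recent = failures[user]
        # Slide the window: forget failures older than WINDOW_S.
        while recent and ts - recent[0] > WINDOW_S:
            recent.popleft()
        if kind == "login_failed":
            recent.append(ts)
        elif kind == "transfer" and amount >= LARGE_AMOUNT:
            if len(recent) >= MIN_FAILED_LOGINS:
                alerts.append((ts, user))
    return alerts
```

Keeping only a small deque per user makes each event O(1) amortised, which is what lets this kind of rule run on a live event stream.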
M2M IoT: PARK AIR SYSTEMS NORWAY (RMMS)

Description:
The Remote Maintenance & Monitoring System
(RMMS) provides a powerful, scalable and flexible
SCADA system to perform a wide range of tasks
required by CNS agents, such as maintenance,
supervision, configuration and operation.
Integration of different systems and equipment shall be
possible and straightforward using open standard
protocols, real time monitoring, data storage, testing,
reporting, events notification,…

Focus:

The main task of the RMMS is to provide complete
access to the equipment supervised in order to monitor
every single available parameter as a means of avoiding
personnel mobilization to the remote location.
Different levels of control over the system are also
provided to cover the requirements of supervision,
maintenance and control.
Five main elements compose the RMM system:
• RCSU: Remote Control and Status Unit.
• TP: Tower Panel.
• RMM: Remote Management & Monitoring.
• LMT/RMT: Local / Remote Management Terminal.
• CMMS: Central Management & Monitoring System.
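The supervision loop of such a system can be sketched as threshold checks that raise an event only on state changes (normal to alarm and back), which keeps notifications manageable. Parameter names, limits and the message format below are hypothetical and not taken from the RMMS.

```python
from dataclasses import dataclass


@dataclass
class Limit:
    low: float
    high: float


class ParameterMonitor:
    """Checks equipment readings against configured limits and emits an event
    only when a parameter changes state, to avoid flooding operators."""

    def __init__(self, limits):
        self.limits = limits   # parameter name -> Limit (hypothetical names)
        self.in_alarm = set()  # (station, parameter) pairs currently in alarm

    def process(self, station, readings):
        """readings: dict parameter name -> latest value from one station."""
        events = []
        for name, value in readings.items():
            limit = self.limits.get(name)
            if limit is None:
                continue  # no limits configured for this parameter
            key = (station, name)
            out_of_range = not (limit.low <= value <= limit.high)
            if out_of_range and key not in self.in_alarm:
                self.in_alarm.add(key)
                events.append(f"ALARM {station}/{name}={value}")
            elif not out_of_range and key in self.in_alarm:
                self.in_alarm.discard(key)
                events.append(f"CLEAR {station}/{name}={value}")
        return events
```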
Search Engines
Description:
Big Data Search Assist: Search engines optimized for Big
Data with self-learning improvements based on use
Search engines for websites, intranets, apps
With instant real-time search, single box with natural
language processing, suggestions, highlighting,
automatic corrections, “did you mean” tips, etc.
Advantages:
Easy management for business users: order of results,
filters, etc.

Advanced features of the search engines with a cost ten
times lower than other solutions
Improved performance and scalability compared to
other solutions
Easy to integrate and use
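The “did you mean” feature is commonly built on edit distance against a vocabulary of known terms. The sketch below uses Levenshtein distance and prefers more frequent terms on ties; the vocabulary-with-frequencies input (for example, counted from search logs) and the distance cutoff are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def did_you_mean(query, vocabulary, max_distance=2):
    """Suggest the closest known term, preferring more frequent ones on ties.

    vocabulary: dict term -> frequency (hypothetical, e.g. from search logs).
    Returns None if the query is already known or nothing is close enough.
    """
    if query in vocabulary:
        return None
    best = None
    for term, freq in vocabulary.items():
        d = edit_distance(query, term)
        if d <= max_distance:
            key = (d, -freq, term)
            if best is None or key < best[0]:
                best = (key, term)
    return best[1] if best else None
```

Scanning the whole vocabulary is fine for a sketch; large search engines index terms (for example by n-grams) so that only plausible candidates are compared.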
ORM and social dialogue

Description:
It gives a full 360° view of a company or brand online,
through a tool that integrates the three aspects that
define your actual online image:
How am I doing on social networks?
Do I know how to use Facebook, Twitter, Google+,
YouTube, LinkedIn? How many followers do I have,
am I an influencer, do I generate content that
spreads?
What is my presence and reputation on the Internet:
When it comes to me, how do people talk about me,
what is said, how does it evolve over time, what is my
position on the Internet regarding my competitors in
the different aspects that interest me.
SEO:
Simple and practical analysis of both internal SEO and
external SEO to complement and give an integrated
view of the above aspects of reputation and social
dialogue.
Advantages:
Real improvement of the company or the product by
analysing the evolution over time of the three major
aspects that define your online reputation.
It helps improve the negative aspects and reinforce the
positive ones.
Increase in sales: Helps optimize and follow
marketing campaigns and improve sales.
Improving conversions and attracting new customers.
Social Mining
Description:
Analyzing various social networks and
movements, looking for brand penetration,
identifying influencers in conversations and
a static map of associated terms.
Advantages:

Entering the social dialogue and hot topics at
the right time multiplies virality by 100
Shows how a social network evolves over
time
Shows what users are talking about
when referring to my products or my
brand.
Detection of influencers and detractors

Optimal visualization of the information.
Identification of the tags used most frequently
by the network to improve your SEO.
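A map of associated terms starts from co-occurrence counts. This sketch counts hashtags and the words that share a post with the brand; the tokenisation, stop-word list and function name are hypothetical simplifications of what a real NLP pipeline would do.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real deployment would use an NLP toolkit.
STOPWORDS = {"a", "an", "and", "i", "is", "my", "of", "the", "to"}
HASHTAG = re.compile(r"#(\w+)")
WORD = re.compile(r"\w+")


def mine(posts, brand, top=3):
    """Most frequent hashtags, plus the terms that most often appear in the
    same post as the brand (raw material for a map of associated terms)."""
    brand = brand.lower()
    hashtags = Counter()
    associated = Counter()
    for post in posts:
        text = post.lower()
        hashtags.update(HASHTAG.findall(text))
        terms = set(WORD.findall(text))
        if brand in terms:
            # Each co-occurring term counts at most once per post.
            associated.update(terms - {brand} - STOPWORDS)
    return hashtags.most_common(top), associated.most_common(top)
```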
Social Network Tracking
Description:
Searches social network comments and
mentions related to a particular issue or event
for further evaluation, influencer detection and
graphical display of the conversation to facilitate
analysis.
Advantages:

Show a real-time event (symposium, forum,
seminar, etc.) with visual information.
Get opinions and feelings about a topic in social
networks in real time

Identify the influencers of a hot topic
Risk detection and prevention
Emotional mining: find the most popular terms
associated with a person, brand, event, etc., and
from them learn the feelings those terms generate.
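Influencer detection can be approximated by ranking users by how many distinct people mention them in the tracked conversation, that is, their in-degree in the mention graph. The (author, text) input format and the @mention convention are assumptions; graph algorithms such as PageRank are a common refinement.

```python
import re
from collections import defaultdict

MENTION = re.compile(r"@(\w+)")


def top_influencers(posts, k=3):
    """Rank users by how many distinct other users mention them.

    posts: iterable of (author, text) pairs collected for the tracked topic.
    Counting distinct mentioners rather than raw mentions means one very
    active account counts only once.
    """
    mentioned_by = defaultdict(set)
    for author, text in posts:
        author = author.lower()
        for target in MENTION.findall(text):
            target = target.lower()
            if target != author:  # ignore self-mentions
                mentioned_by[target].add(author)
    ranked = sorted(mentioned_by, key=lambda u: (-len(mentioned_by[u]), u))
    return [(u, len(mentioned_by[u])) for u in ranked[:k]]
```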
Web Content Scraping
Description:
Search network content and publications on
specific subjects of interest to detect, filter,
collect and process relevant information in near-real
time or in batch.
Combined with semantic analysis, this allows
effective detection and classification of the
content.
Advantages:
Allows sites to be generated dynamically,
without manual intervention or exhaustive searches,
from the collected and categorized content.
Unifies in a single website all the tasks that users
would otherwise do manually, saving them money and
generating loyalty.
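The collection and classification steps can be sketched with the Python standard-library HTML parser, using keyword overlap as a stand-in for semantic analysis. The category keywords are hypothetical and fetching pages is left out; the function works on an HTML string already downloaded.

```python
import re
from html.parser import HTMLParser

# Hypothetical categories and keywords; keyword overlap is a simple
# stand-in for the semantic analysis described above.
CATEGORIES = {
    "finance": {"bank", "stocks", "interest"},
    "sports": {"match", "league", "goal"},
}


class PageExtractor(HTMLParser):
    """Collects visible text and link targets, ignoring <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.links = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text.append(data)


def scrape(html):
    """Return (category or None, links) for one HTML page."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    words = set(re.findall(r"\w+", " ".join(parser.text).lower()))
    # Choose the category sharing the most keywords with the page text.
    best = max(CATEGORIES, key=lambda c: len(CATEGORIES[c] & words))
    return (best if CATEGORIES[best] & words else None), parser.links
```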
Tele5: Monitoring of logs for Streaming Videos
Description:

Monitoring the download and
streaming of videos.
Analysis of streaming

Quality of streaming
Service peaks and bottlenecks

Advantages:
Problem detection and alerts

Optimization of service
Campaign tracking
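Log-based monitoring of this kind can be sketched as per-minute request counts, a peak rule relative to the median minute, and an alert on the share of server errors. The log line format, thresholds and function name are hypothetical.

```python
from collections import Counter
from statistics import median

# Hypothetical access-log format, one request per line:
#   "<unix_timestamp> <client_ip> <http_status> <bytes_sent>"


def analyse(lines, peak_factor=3.0, max_error_ratio=0.05):
    """Return (peak_minutes, error_alert) for a batch of streaming-server logs.

    A minute is a peak when its request count exceeds peak_factor times the
    median per-minute count; the alert fires when the share of 5xx
    responses exceeds max_error_ratio.
    """
    per_minute = Counter()
    total = errors = 0
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip malformed lines
        ts, _client, status, _bytes = parts
        try:
            minute = int(ts) // 60
        except ValueError:
            continue  # skip lines with a non-numeric timestamp
        per_minute[minute] += 1
        total += 1
        if status.startswith("5"):
            errors += 1
    if total == 0:
        return [], False
    baseline = median(per_minute.values())
    peaks = sorted(m for m, n in per_minute.items() if n > peak_factor * baseline)
    return peaks, errors / total > max_error_ratio
```

Using the median as the baseline keeps a single traffic spike from inflating the threshold that is supposed to detect it.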
Massive information tagging

Description:
Allows you to label and categorize automatically and
massively, any type of content or information.
Advantages:
Enables searching, categorization and clustering,
extracting value from information that would otherwise
be hard to find and use.

Uses state-of-the-art entity identification tools (NED
systems, NERD), combined with entity disambiguation
using a Big Data system containing Wikipedia and other
information sources.
Processing speed and data volume superior to those of
other systems.
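Entity tagging with disambiguation can be illustrated with a dictionary lookup followed by a simplified Lesk-style choice: pick the candidate whose description words overlap most with the document. The tiny knowledge base below is a hypothetical stand-in for one built from Wikipedia, and this is not the NED/NERD tooling the slide refers to.

```python
import re

# Hypothetical mini knowledge base: surface form -> candidate entities, each
# with context words (a stand-in for Wikipedia-derived descriptions).
KB = {
    "jaguar": {
        "Jaguar (animal)": {"cat", "jungle", "species", "prey"},
        "Jaguar Cars": {"car", "engine", "model", "drive"},
    },
    "python": {
        "Python (language)": {"code", "programming", "library"},
        "Python (snake)": {"snake", "species", "prey"},
    },
}


def tag(text):
    """Return (mention, entity) pairs, choosing for each ambiguous mention the
    candidate whose context words overlap most with the document's words."""
    words = re.findall(r"\w+", text.lower())
    context = set(words)
    tags = []
    for word in dict.fromkeys(words):  # unique words, in order of appearance
        candidates = KB.get(word)
        if not candidates:
            continue
        entity = max(candidates, key=lambda e: len(candidates[e] & context))
        tags.append((word, entity))
    return tags
```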
SUMMARY
It is not about Big Data; it is about getting maximum value from data:

Get all the value data can give
Process and analyze new types of data: unstructured, semi-structured, data streams
Convert data into big insights
Become a data-driven company
“The best way to predict the future is to create it”
Ride the “Big Data” wave
Q&A

Impact of Big Data on Spanish businesses


Editor's Notes

  • #2 Storyline of the presentation: THESIS: Emergence of Big Data 2.0 (paradigm shift: BigQuery). Requirements: 100X. A non-Hadoop architecture is needed to meet these requirements. OPPORTUNITY: Since it is the only open-source non-Hadoop platform, if the thesis is correct it will be: The Open Source Big Data 2.0 Platform
  • #23 A technological change from Big Data 1.0 to Big Data 2.0: from batch analysis, a 12-year-old technology, to interactive analysis, the state of the art. This project is based on the thesis that a technological change is taking place in the Big Data world, one that requires higher performance with interactive analysis and real-time query capabilities. Performance 100X higher is required to turn the hours needed with previous technologies into a few minutes. Achieving these capabilities requires abandoning Hadoop, whose architecture is limited by 12-year-old concepts, such as its need for and dependence on disk persistence and its non-optimized writes, which will not allow it to reach the 100X performance requirement. Instead of following, belatedly, in the footsteps of others, Stratio develops and provides the only open-source Big Data platform not based on Hadoop, creating and defining new paradigms and possibilities that have made it possible to build a unique integrated architecture, designed from the ground up for the maximum 100X performance required today, adaptable, and without vendor lock-in.