www.spagobi.org - Copyright © 2013 Engineering Group, SpagoBI Competency Center. All rights reserved.
SpagoBI and Talend jointly support Big Data scenarios
Monica Franceschini - SpagoBI Architect
SpagoBI Competency Center - Engineering Group
Big-data
• Agenda
– Intro & definitions
– Layers
– Talend & SpagoBI
– SpagoBI big-data roadmap
Big Data - 3Vs
"Big data" is high-volume, high-velocity and high-variety information assets that
demand cost-effective, innovative forms of information processing for enhanced
insight and decision making.
Source: The Importance of 'Big Data': A Definition, Mark Beyer and Douglas Laney. Gartner, 21 June 2012.
VOLUME The increase in data volumes within enterprise systems is
caused by transaction volumes and other traditional data types, as
well as by new types of data. Too much volume is a storage issue, but
too much data is also a massive analysis issue.
VARIETY IT leaders have always had an issue translating large volumes
of transactional information into decisions — now there are more
types of information to analyze — mainly coming from social media
and mobile (context-aware). Variety includes tabular data (databases),
hierarchical data, documents, e-mail, metering data, video, still
images, audio, stock ticker data, financial transactions and more.
VELOCITY This involves streams of data, structured record creation, and
availability for access and delivery. Velocity means both how fast data
is being produced and how fast the data must be processed to meet
demand.
Gartner Press Release, “Gartner Says Solving ‘Big Data’ Challenge Involves More Than Just Managing Volumes of Data”, June
27, 2011
Big Data - 3Vs & more
VARIABILITY
variance in meaning, in lexicon
VERACITY
1 in 3 business leaders don’t trust the information they use to make
decisions. How can you act upon information if you don’t trust it?
Establishing trust in big data presents a huge challenge as the
variety and number of sources grow.
VALUE
The economic value of different data varies significantly. Typically
there is good information hidden amongst a larger body of non-
traditional data; the challenge is identifying what is valuable and
then transforming and extracting that data for analysis.
Big data - Layers
• Infrastructure
– On-site
– IaaS
• Data management:
– capture
– cleaning
– loading
– store
• View and Analyse
– Text analysis
– Text mining
– exploration, navigation, presentation
• Application
– Cloud
– SaaS
ETL
Business Intelligence
Services
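The data management steps listed above (capture, cleaning, loading, store) and a simple "view and analyse" step can be sketched in a few lines. This is only an illustration with the Python standard library: the sample data, the `sales` table and all function names are assumptions made for the example, and none of it is Talend or SpagoBI code.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: one padded value, one missing city, one bad amount.
RAW = """id,city,amount
1, Paris ,10.5
2,,7.0
3,Rome,not-a-number
4,Rome,3.25
"""


def capture(text):
    """Capture: read raw records from a source (here an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Cleaning: trim whitespace, drop rows with a missing city or a bad amount."""
    cleaned = []
    for row in rows:
        city = row["city"].strip()
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not numeric
        if city:
            cleaned.append((int(row["id"]), city, amount))
    return cleaned


def load_and_store(records):
    """Loading and store: write the cleaned records into a SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER, city TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()
    return conn


def totals_by_city(conn):
    """View and analyse: an aggregate that a BI front-end could display."""
    cur = conn.execute(
        "SELECT city, SUM(amount) FROM sales GROUP BY city ORDER BY city"
    )
    return cur.fetchall()
```

Calling `totals_by_city(load_and_store(clean(capture(RAW))))` chains the four steps; in the scenario of this talk the first three would be handled by the ETL tool and the last by the BI suite.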
Big data & Business Intelligence
• Tasks:
– Manage big-data (ETL) → Talend
– Read, interpret and show big-data (BI) → SpagoBI
– Big-data and real-time (BI) → SpagoBI
Talend - Big Data Management
Turn Big Data into actionable information
[Diagram: Big Data Production → Big Data Management → Big Data Consumption]
• Sources: RDBMS, Analytical DB, NoSQL DB, ERP/CRM, SaaS, Social Media, Web Analytics, Log Files, RFID, Call Data Records, Sensors, Machine-Generated
• Talend products: Big Data Integration, Big Data Quality
• Operations: Parsing, Checking, Storage, Processing, Filtering, Mining, Analytics, Search, Enrichment
Talend Goal: democratize Big Data
…an open source
ecosystem
Talend Open Studio for Big Data
“Big Data for the Masses”
• Improves efficiency of big data job design with a graphical interface
• Abstracts and generates code
• Runs transforms inside Hadoop
• Native support for HDFS, Sqoop, HBase, Mahout, Pig, Hive & MapReduce code generation
• Apache License 2.0
• Embedded in Hortonworks Data Platform
• Certified with Cloudera, MapR and Greenplum
HCatalog
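To give an idea of the programming model behind the "MapReduce code generation" point, here is a minimal word-count sketch in plain Python. It only mimics the map, shuffle and reduce phases in a single process; it is not the code Talend generates and it does not use Hadoop. All function names are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def run_job(lines):
    """Run the three phases in sequence on an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

On a real cluster the map and reduce functions run in parallel on different nodes and the framework performs the shuffle; a graphical job designer spares the user from writing these functions by hand.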
ETL: Analytical databases & appliances
Connectors from/to:
• Greenplum
• Netezza
• Sybase
• Teradata
• VectorWise
• Vertica
• HDFS
• HBase
• Hive
• Cassandra
• MongoDB
SpagoBI - load
Certified appliances:
• Teradata
• VectorWise
Connectors from:
• Cassandra
• HBase
• Hive
• Impala
• Hadoop
RT with:
• Storm
• WSO2
More:
• Scheduled data-set
• In-memory data set
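The idea of a scheduled, in-memory data set can be sketched as a cache that is reloaded from its source once a refresh interval has elapsed. This is a hedged illustration only: the `CachedDataSet` class, its parameters and the injected clock are assumptions made for the example and do not reflect SpagoBI's actual implementation or API.

```python
import time


class CachedDataSet:
    """Keep query results in memory and reload them after a fixed interval."""

    def __init__(self, loader, refresh_seconds, clock=time.monotonic):
        self._loader = loader                  # callable returning an iterable of rows
        self._refresh_seconds = refresh_seconds
        self._clock = clock                    # injectable so the behaviour can be tested
        self._rows = None
        self._loaded_at = None

    def rows(self):
        """Return the cached rows, reloading them first if they are stale."""
        now = self._clock()
        stale = (
            self._loaded_at is None
            or now - self._loaded_at >= self._refresh_seconds
        )
        if stale:
            self._rows = list(self._loader())
            self._loaded_at = now
        return self._rows
```

A report would call `rows()` on every request; the slow source (for example a Hive query) is only hit when the interval has expired, which is the trade-off between freshness and response time that such data sets make.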
SpagoBI - meaning
Support for open standards:
• RDF (Resource Description Framework) http://www.w3.org/RDF/
• OWL (Web Ontology Language) http://www.w3.org/OWL/
• R
• Mahout
• Text mining
Connectors from:
• Neo4J
• Freebase
• OrientDB
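RDF describes data as subject-predicate-object triples, which is what makes it suitable for attaching meaning to heterogeneous sources. The sketch below models triples as plain tuples and matches them against a pattern in which `None` acts as a wildcard. The `ex:` identifiers and the `match` helper are hypothetical, chosen only for this example; a real deployment would use an RDF library and a triple store.

```python
# Hypothetical triples; the "ex:" names are invented for illustration.
TRIPLES = [
    ("ex:SpagoBI", "rdf:type", "ex:BISuite"),
    ("ex:Talend", "rdf:type", "ex:ETLTool"),
    ("ex:SpagoBI", "ex:readsFrom", "ex:Hive"),
    ("ex:Talend", "ex:writesTo", "ex:Hive"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every non-None part of the pattern."""
    pattern = (subject, predicate, obj)
    return [
        triple
        for triple in triples
        if all(p is None or p == value for p, value in zip(pattern, triple))
    ]
```

For instance, `match(TRIPLES, obj="ex:Hive")` returns every statement whose object is `ex:Hive`, regardless of subject and predicate.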
SpagoBI - show
Explorative front-end
• Network analysis
• Exploration
• In-memory
• Data visualization
SpagoBI - roadmap
• Capture / Store
– Talend, connector to/from:
• Greenplum
• Netezza
• Sybase
• Teradata
• VectorWise
• Vertica
• HDFS
• HBase
• Hive
• Cassandra
• MongoDB
• …
• LOAD
– Certified appliances:
• Teradata
• VectorWise
– Connectors from:
• Cassandra
• HBase
• Hive
• Impala
• Hadoop
• MongoDB
– RT with:
• Storm
• WSO2
– More:
• Scheduled data-set
• In-memory data set
SpagoBI - roadmap
• Meaning
– Connectors from:
• Neo4J
• Freebase
• OrientDB
– Support for open standards:
• RDF
• OWL
– Mining
• R
• MashR
• Text mining
• Show
– Explorative front-end
– Network analysis
– Data visualization
• Services
– Big data as a service
• Multitenant
• Cloud
• BI as a service (ad-hoc+self-service)
Data scientist
Bundle Talend - SpagoBI
The bundle will provide:
• a distribution of both tools, interacting with each other
• a use case that can be run to explore their functionalities
SpagoBI and Talend announce their bundle!
@twittmonique
Monica.franceschini@eng.it

Solutions Linux 2013: SpagoBI and Talend jointly support Big Data scenarios
