Talend: Solutions Overview
Presenter: Rajan Kanitkar

Talend Big Data Capabilities Overview

  • Finally, the entire big data world has been built as an open source ecosystem. This all makes sense: Talend is the open source leader. To this end, we will introduce the first complete set of tools that will democratize big data: Talend Open Studio for Big Data.
  • Purpose of the slide: Present the second big change in the integration market: Big Data. Remember, this is not (yet) about Talend solutions; this is about changes in the market that will impact the prospect's company, no matter what.

    Key themes: The Big Data Revolution

    The volume, variety and complexity of data are growing explosively. To serve the needs of customers for the next decade, next-generation integration platforms must be designed and architected with these requirements in mind.

    Processing of data needs to be flexible based on the customer's use cases: either centralized for simple batch jobs, or distributed and parallelized for high-volume complex transformations. Processing of data also needs to be cost effective, by leveraging and optimizing an organization's full IT infrastructure (ELT, for example) and/or external cloud platforms to take advantage of elasticity.

    New technologies for Big Data, like NoSQL or Hadoop, are not compatible with existing databases. In the last 10-15 years companies have made massive investments in DW and relational databases that are not compatible with the Big Data technologies. Data integration technologies today need to address all the different database and processing technologies that will co-exist, as customers will need to leverage their past investments alongside new, emerging Big Data technologies.

    Big Data is inevitable but relatively new and apt to change, so any technology must also be adaptable to new technologies, architectures and topologies. An important attribute of this market trend is the need for scalability that Big Data necessitates, whether scaling up or scaling down based on business needs.

    Note: Don't forget that "garbage in, garbage out" applies exponentially with Big Data. Therefore data quality is more important than ever with Big Data.
  • Transcript of "Talend Big Data Capabilities Overview"

    1. Talend: Solutions Overview. Presenter: Rajan Kanitkar
    2. Talend Big Data Overview
    3. © Talend 2012. The Drivers for Big Data: Volume, Velocity, Variety
    4. How to process big data? The de facto standard for big data processing
    5. What is Hadoop? Apache Hadoop, an open-source software library, is a framework that allows for the distributed processing of large data sets across clusters of commodity hardware using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
    6. The Big Data Ecosystem
       • Hadoop: the core project
       • HDFS: the Hadoop Distributed File System
       • MapReduce: the software framework for distributed processing of large data sets
       • Hive: a data warehouse infrastructure that provides data summarization and a querying language
       • Pig: a high-level data-flow language and execution framework for parallel computation
       • HBase: the Hadoop database. Use it when you need random, real-time read/write access to your Big Data
       • And many, many more: Sqoop, HCatalog, ZooKeeper, Oozie, Cassandra, MongoDB, Flume, Impala, Stinger, Neo4j, etc.
       Thanks to you all! Google, Amazon, Facebook, Twitter, Yahoo, 10gen, Cloudera, Hortonworks, MapR, etc.
    7. Talend Big Data Overview
    8. Key Differentiator of Our Next Gen Architecture…
       • Java: ETL, day-to-day integration, run everywhere
       • SQL: ELT, DW appliance (Teradata, Netezza…)
       • MapReduce: Hadoop, highly scalable Hadoop grid
       • Camel: message transformation, high frequency
       Standards-Based Code Generator
       • No black-box engine
       • Enables lightweight, distributed, customizable and parallelizable runtime
    9. Talend Unique Integration Solution
       • (1) Studio: comprehensive Eclipse-based user interface
       • (2) Repository: consolidated metadata & project information
       • (3) Deployment: web-based deployment & scheduling
       • (4) Execution: same container for batch processing, message routing & services
       • (5) Monitoring: single web-based monitoring console
       Data Quality, Data Integration, MDM, ESB, BPM
       Best-of-Breed Solutions + Talend Unified Platform = Unique Integration Solution
    10. Talend Big Data Product Strategy (strategic pillars)
       Big Data Integration
       ▶ Land data in a Big Data cluster without coding
       ▶ Code generation for MapReduce, HDFS, HBase, Pig, Hive, HCatalog, etc.
       Big Data Manipulation
       ▶ Simplify manipulation, such as sort and filter
       ▶ Computationally expensive functions using Hadoop
       Big Data Quality & Governance
       ▶ Identify linkages & duplicates, validate big data
       ▶ Match component, execute basic quality features
       Big Data Project Management
       ▶ Place frameworks around big data projects
       ▶ Common Repository, scheduling, monitoring
    11. …an open source ecosystem: Talend Open Studio for Big Data
       • Improves efficiency of big data job design with graphic interface
       • Generates Hadoop code and runs transforms inside Hadoop
       • Native support for HDFS, Pig, HBase, HCatalog, Sqoop and Hive
       • 100% open source under an Apache License
       • Standards based
       Pig Vision: Democratize big data
    12. …an open source ecosystem: Talend Platform for Big Data
       • Builds on Talend Open Studio for Big Data
       • Adds data quality, advanced scalability and management functions
       • MapReduce massively parallel data processing
       • Shared Repository and remote deployment
       • Data quality and profiling
       • Data cleansing
       • Reporting and dashboards
       • Commercial support, warranty/IP indemnity under a subscription license
       Pig Vision: Democratize big data
    13. Talend Big Data Partnerships: Hadoop Distributions, Talend Big Data Partners
    14. Demonstration: ETL for Big Data with Talend (Extract, Transform, Load)
    15. Talend Demo 2013
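
Slide 6 describes MapReduce as the software framework for distributed processing of large data sets. As a rough illustration of that programming model (this is not Talend-generated code and does not use the Hadoop API), the following single-process Python sketch counts words in three phases: map, shuffle, and reduce. The function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, chosen for this example only. A real Hadoop job would run the map and reduce steps in parallel across cluster nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every whitespace-separated word in one line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all values by key, mimicking the step between map and reduce."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Collapse the values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce sequentially in a single process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["Volume Velocity Variety", "volume of data"]
    print(word_count(sample))
```

Because each call to `map_phase` depends only on its own line, and each call to `reduce_phase` only on its own key, both phases can be split across machines; that independence is what lets the model scale from a single server to many.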