About Talend and its journey:
•Talend was founded in 2005; its first product, Talend Open Studio for Data Integration,
was launched in October 2006.
•Talend sponsors many open source technology foundations, such as Apache, Eclipse ...
•Talend employs engineers who work on Apache projects such as Apache Karaf, ActiveMQ,
and Hadoop.
•Talend has 600+ employees and 1,300 enterprise customers across a range of
industries.
What is Talend?
• Talend is an open source data integration studio based on the Eclipse IDE.
• Talend Studio is a code generator: it generates Java, Perl, or MapReduce code from the
job design.
• Jobs created in Talend Studio can be executed from within the studio or as a standalone
JAR file from external programs.
• Talend jobs can be embedded in custom applications, and custom components can be
created in Talend from an external application's JAR files.
Products under Talend Platform:
Open source edition – Data Integration, Data Quality, ESB, MDM, and Big Data
Enterprise edition – all open source products with additional features, plus Real-Time Big
Data, Cloud Integration, Metadata Manager, and Data Fabric
Advantages of using Talend over competing integration tools:
1. Competing integration tools are expensive and not yet mature in the big data
space.
2. Talend is an open source DI tool, so a project can start without any budget for an ETL tool.
3. Talend has 900+ connectors, including technologies such as Bonita BPM, the EXASOL in-memory database….
4. Leverage HDFS, Pig, Sqoop, HBase, MapReduce, and Hive for ETL without deep
programming expertise.
5. Data quality and master data management capabilities can be extended to the big data platform as well.
Why is Talend gaining popularity?
•Horizontal resource scalability at runtime
•Layer of abstraction
•Breadth of functionality
•Ease of deployment and management
1) Horizontal resource scalability with runtime servers:
Talend jobs can be deployed to AWS EC2 servers and executed using the Talend Container
Service; as soon as the job completes, the EC2 instance can be terminated.
2) Layer of abstraction:
The execution engine can be changed from MapReduce to Spark with one click. It is just a configuration option change.
3) Breadth of functionality: Talend uses the same Job Designer for the Data Integration and Big Data
editions, whereas other tools require a different tool set for designing big data jobs.
4) Ease of deployment and management: Talend creates the Hadoop job and passes the job ID to
the YARN ResourceManager, which then takes care of the job execution. No Talend-specific
libraries need to be installed on the Hadoop cluster, whereas other tools require their own big data libraries on the nodes that are part of the
cluster.
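The "layer of abstraction" point can be illustrated with a small sketch. This is not how Talend implements engine switching; it only shows the idea that the job definition stays the same while a configuration value selects the engine. The function names, the `ENGINES` registry, and the `config` dictionary are all hypothetical, and neither function uses Hadoop or Spark.

```python
from collections import defaultdict

# Hypothetical job: sum the amounts per key. The job itself names no engine.

def run_on_mapreduce(records):
    """Toy MapReduce-style engine: group values by key, then reduce each group."""
    shuffled = defaultdict(list)
    for key, value in records:            # map + shuffle
        shuffled[key].append(value)
    return {key: sum(values) for key, values in shuffled.items()}  # reduce

def run_on_spark(records):
    """Toy Spark-style engine: fold values into running totals in memory."""
    totals = {}
    for key, value in records:
        totals[key] = totals.get(key, 0) + value
    return totals

# Hypothetical registry: the only thing that changes between runs is the config.
ENGINES = {"mapreduce": run_on_mapreduce, "spark": run_on_spark}

def run_job(records, config):
    """Run the same job on whichever engine the configuration selects."""
    return ENGINES[config["engine"]](records)

sales = [("east", 10), ("west", 5), ("east", 7)]
result_mr = run_job(sales, {"engine": "mapreduce"})
result_spark = run_job(sales, {"engine": "spark"})
```

Both runs return the same totals; only the `engine` value differs.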
Talend for Big Data installation system requirements:
1. Memory: 4 GB RAM
2. Disk space: 3 GB
3. Recommended OS: Microsoft Windows 7 Professional, Ubuntu Linux
4. Supported OS: Apple OS X
5. It also works without problems on my personal laptop with Windows 10 Home edition and
on CentOS 7
6. Software: Oracle Java 8 JRE
7. Network connectivity with a properly installed and configured Hadoop cluster.
Prerequisites for learning Talend for Big Data:
Fundamentals of computers, SQL, Linux commands, and conditional statements (if .. then .. else)
Is Java programming mandatory for Talend DI job design?
No, it is not mandatory. Occasionally, when a business requirement goes beyond the tool's
existing functionality, we may have to write code routines to fulfill it.
Talend tool GUI basics:
What is a workspace?
What is a project? How to create, delete, or import a demo or existing project?
Types of repository connections for Talend Studio
GUI tools and features:
Main window, Tool bar, Repository Tree view
Designer workspace, Palette, Configuration Tabs
Outline, Code Viewer
Window > Show view to bring up other configuration tabs in the main window
Metadata:
Centralize connections – File, Database, Hadoop
What is a connection? Types of connections and the need for connection links
Row Connections, Trigger Connections
Sample job design and execution in Talend Studio
• Agenda: Topics covered in this demo
• What is big data, and the characteristics of a big data platform
• Why many customers are moving to the Hadoop stack
• Physical components of a Hadoop cluster and its architecture
• Hadoop ecosystem components and the use of each component
• Challenges in implementing big data projects using conventional Hadoop
• Positives and negatives of using Talend DI for big data compared to conventional
Hadoop ecosystem components
• Talend client-server architecture
• MapReduce job use case: hand coding vs. a Talend DI job
What is Big Data?
A big data platform hosts data sets that are so large or complex that traditional data
processing applications are insufficient to deal with them.
•Big Data = Distributed computing + Fault tolerance
•Distributed computing – shared-nothing storage + parallel processing
•Fault tolerance – the ability of a system to continue functioning properly in the event of partial failure.
Brief explanation of distributed computing and fault tolerance, using how a gigantic file's blocks are stored and accessed in a
client-server architecture vs. a distributed computing architecture.
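A minimal sketch of the two ideas above, assuming a toy cluster: a file is split into fixed-size blocks, each block is copied to several nodes, and the file stays readable as long as every block has at least one surviving copy. The node names, the block size, and the round-robin placement are illustrative assumptions; real HDFS block placement is rack-aware and more involved.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }

def readable_after_failure(placement, failed_nodes):
    """True if every block still has at least one replica on a live node."""
    failed = set(failed_nodes)
    return all(
        any(node not in failed for node in replicas)
        for replicas in placement.values()
    )

nodes = ["node1", "node2", "node3", "node4"]       # hypothetical cluster
blocks = split_into_blocks(b"x" * 1000, block_size=128)
placement = place_blocks(len(blocks), nodes, replication=3)

survives_two = readable_after_failure(placement, ["node1", "node2"])
survives_three = readable_after_failure(placement, ["node1", "node2", "node3"])
```

With three replicas on four nodes, losing any two nodes leaves the file readable; losing three nodes can make a block unavailable.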
Some example big data platforms:
• Hadoop: Apache, Hortonworks, Cloudera, MapR
• Teradata Aster nCluster
• Pivotal Big Data Suite
• Amazon Redshift
• Azure SQL Data Warehouse
• Microsoft Azure HDInsight
Hadoop history and the role of Google in the development of the Apache Hadoop project
• Google's problem statement and how they overcame it using GFS + MapReduce.
• Later, Doug Cutting created the Hadoop framework with reference to the white paper Google published.
How is Hadoop different from traditional technologies?
• Hadoop has the big data characteristics: distributed computing and fault tolerance
• Because storage is inexpensive, one can build a data lake on the HDFS layer
What are the advantages of using Hadoop, from a cost and architectural feasibility perspective?
• Horizontal resource scalability
• Processing of large and/or rapidly growing data sets, either structured or unstructured
• Affordable commodity hardware
• Open source
• Move computation towards data rather than transfer data towards computation
High-level Hadoop cluster architecture and physical core components:
Multiple nodes in a rack -> multiple racks in a data center -> nodes and racks from different
data centers in various geographic locations can be part of a Hadoop cluster
Hadoop ecosystem components:
The two main components of Apache Hadoop are HDFS for storage and MapReduce for
data processing.
• Flume
• Sqoop
• ZooKeeper
• Oozie
• Pig
• Hive
• HBase
• Solr
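The MapReduce processing model named above can be sketched in plain Python as a word count. This is only an in-memory illustration of the map, shuffle, and reduce phases; it does not use Hadoop, and the function names are chosen for this example.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse one key's values into a single count."""
    return key, sum(values)

def word_count(lines):
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

counts = word_count(["big data on hadoop", "talend on hadoop"])
```

On a real cluster the map and reduce calls run in parallel on the nodes that hold the data blocks, and the framework performs the shuffle between them.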
What are the challenges in implementing a big data project with the conventional Hadoop
framework?
• Applying Agile methodology in big data projects is a nightmare
• Finding people with MapReduce and Scala/Spark skills in the market is a big challenge.
• Addressing/justifying the business case: rather than trying to convert existing reports that have been in use for the
past decade so that they work on big data, you may have to make the business understand the need to add
big data for predictive analytics features in order to compete.
Pros and cons of using Talend BD DI compared to conventional Hadoop ecosystem
components
Pros:
• Graphical development of big data and Hadoop jobs
• Leverage existing technical resources with bare minimum training investment
• Speed up big data projects with Agile methodology
• Seamless, tight integration is possible with related subject areas like data quality, event-based
job scheduling, and master data management
• Runtime server execution
Constraints:
• Version compatibility dependencies between Talend and the Hadoop distribution.
Talend Architecture and its components:
Nexus Repository
Metadata repository
Talend Administration Center
Admin/Audit/Monitoring
Execution Servers
Lab practical: joining two HDFS files and aggregating the data using a Talend job with the
MapReduce engine
Lab practical: joining two HDFS files and aggregating the data using a conventional
hand-coded Hadoop MapReduce job
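As a rough idea of what the hand-coded lab involves, here is an in-memory sketch of a reduce-side join followed by an aggregation. The two inputs, their column layouts, and all function names are hypothetical; the deck does not describe the lab's actual files. No Hadoop API is used, and the join and the aggregation are collapsed into one pass, whereas a real hand-coded solution would typically be written in Java and may need two chained jobs.

```python
from collections import defaultdict

# Hypothetical file contents: "customer_id,region" and "order_id,customer_id,amount".
CUSTOMERS = ["1,east", "2,west", "3,east"]
ORDERS = ["100,1,20.0", "101,2,5.0", "102,1,7.5", "103,3,10.0"]

def map_customer(line):
    """Map a customer line to (join key, tagged value)."""
    customer_id, region = line.split(",")
    return customer_id, ("C", region)

def map_order(line):
    """Map an order line to (join key, tagged value)."""
    _order_id, customer_id, amount = line.split(",")
    return customer_id, ("O", float(amount))

def shuffle(pairs):
    """Group tagged values by join key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_join(values):
    """Join one key's records: emit (region, amount) for each order of that customer."""
    regions = [value for tag, value in values if tag == "C"]
    amounts = [value for tag, value in values if tag == "O"]
    return [(region, amount) for region in regions for amount in amounts]

def join_and_aggregate(customers, orders):
    tagged = [map_customer(l) for l in customers] + [map_order(l) for l in orders]
    totals = defaultdict(float)
    for values in shuffle(tagged).values():
        for region, amount in reduce_join(values):
            totals[region] += amount      # aggregate: total order amount per region
    return dict(totals)

totals = join_and_aggregate(CUSTOMERS, ORDERS)
```

Tagging each record with its source ("C" or "O") is what lets the reducer tell the two files apart after the shuffle.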

More Related Content

What's hot

Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...
Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...
Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...Edureka!
 
ETL using Big Data Talend
ETL using Big Data Talend  ETL using Big Data Talend
ETL using Big Data Talend Edureka!
 
Manipulating Data with Talend.
Manipulating Data with Talend.Manipulating Data with Talend.
Manipulating Data with Talend.Edureka!
 
Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.
Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.
Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.OW2
 
Talend Introduction by TSI
Talend Introduction by TSITalend Introduction by TSI
Talend Introduction by TSIRemain Software
 
Open Source ETL using Talend Open Studio
Open Source ETL using Talend Open StudioOpen Source ETL using Talend Open Studio
Open Source ETL using Talend Open Studiosantosluis87
 
Talend Big Data Tutorial | Talend DI and Big Data Certification | Talend Onli...
Talend Big Data Tutorial | Talend DI and Big Data Certification | Talend Onli...Talend Big Data Tutorial | Talend DI and Big Data Certification | Talend Onli...
Talend Big Data Tutorial | Talend DI and Big Data Certification | Talend Onli...Edureka!
 
Talend online training and jobsupport
Talend online training and jobsupportTalend online training and jobsupport
Talend online training and jobsupportkraja2035
 
Talend Big Data Capabilities - 2014
Talend Big Data Capabilities - 2014Talend Big Data Capabilities - 2014
Talend Big Data Capabilities - 2014Rajan Kanitkar
 
ETL big data with apache hadoop
ETL big data with apache hadoopETL big data with apache hadoop
ETL big data with apache hadoopMaulik Thaker
 
Talend For Big Data : Secret Key to Hadoop
Talend For Big Data  : Secret Key to HadoopTalend For Big Data  : Secret Key to Hadoop
Talend For Big Data : Secret Key to HadoopEdureka!
 
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)Rittman Analytics
 
How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and FacebookHow Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and FacebookAmr Awadallah
 
Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014Hortonworks
 
The convergence of reporting and interactive BI on Hadoop
The convergence of reporting and interactive BI on HadoopThe convergence of reporting and interactive BI on Hadoop
The convergence of reporting and interactive BI on HadoopDataWorks Summit
 
Oracle PL/SQL 12c and 18c New Features + RADstack + Community Sites
Oracle PL/SQL 12c and 18c New Features + RADstack + Community SitesOracle PL/SQL 12c and 18c New Features + RADstack + Community Sites
Oracle PL/SQL 12c and 18c New Features + RADstack + Community SitesSteven Feuerstein
 
2010.03.16 Pollock.Edw2010.Modern D Ifor Warehousing
2010.03.16 Pollock.Edw2010.Modern D Ifor Warehousing2010.03.16 Pollock.Edw2010.Modern D Ifor Warehousing
2010.03.16 Pollock.Edw2010.Modern D Ifor WarehousingJeffrey T. Pollock
 
Tableau and hadoop
Tableau and hadoopTableau and hadoop
Tableau and hadoopCraig Jordan
 
Luo june27 1150am_room230_a_v2
Luo june27 1150am_room230_a_v2Luo june27 1150am_room230_a_v2
Luo june27 1150am_room230_a_v2DataWorks Summit
 

What's hot (20)

Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...
Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...
Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...
 
ETL using Big Data Talend
ETL using Big Data Talend  ETL using Big Data Talend
ETL using Big Data Talend
 
Manipulating Data with Talend.
Manipulating Data with Talend.Manipulating Data with Talend.
Manipulating Data with Talend.
 
Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.
Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.
Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.
 
Talend Introduction by TSI
Talend Introduction by TSITalend Introduction by TSI
Talend Introduction by TSI
 
Open Source ETL using Talend Open Studio
Open Source ETL using Talend Open StudioOpen Source ETL using Talend Open Studio
Open Source ETL using Talend Open Studio
 
Talend
TalendTalend
Talend
 
Talend Big Data Tutorial | Talend DI and Big Data Certification | Talend Onli...
Talend Big Data Tutorial | Talend DI and Big Data Certification | Talend Onli...Talend Big Data Tutorial | Talend DI and Big Data Certification | Talend Onli...
Talend Big Data Tutorial | Talend DI and Big Data Certification | Talend Onli...
 
Talend online training and jobsupport
Talend online training and jobsupportTalend online training and jobsupport
Talend online training and jobsupport
 
Talend Big Data Capabilities - 2014
Talend Big Data Capabilities - 2014Talend Big Data Capabilities - 2014
Talend Big Data Capabilities - 2014
 
ETL big data with apache hadoop
ETL big data with apache hadoopETL big data with apache hadoop
ETL big data with apache hadoop
 
Talend For Big Data : Secret Key to Hadoop
Talend For Big Data  : Secret Key to HadoopTalend For Big Data  : Secret Key to Hadoop
Talend For Big Data : Secret Key to Hadoop
 
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
 
How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and FacebookHow Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
 
Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014
 
The convergence of reporting and interactive BI on Hadoop
The convergence of reporting and interactive BI on HadoopThe convergence of reporting and interactive BI on Hadoop
The convergence of reporting and interactive BI on Hadoop
 
Oracle PL/SQL 12c and 18c New Features + RADstack + Community Sites
Oracle PL/SQL 12c and 18c New Features + RADstack + Community SitesOracle PL/SQL 12c and 18c New Features + RADstack + Community Sites
Oracle PL/SQL 12c and 18c New Features + RADstack + Community Sites
 
2010.03.16 Pollock.Edw2010.Modern D Ifor Warehousing
2010.03.16 Pollock.Edw2010.Modern D Ifor Warehousing2010.03.16 Pollock.Edw2010.Modern D Ifor Warehousing
2010.03.16 Pollock.Edw2010.Modern D Ifor Warehousing
 
Tableau and hadoop
Tableau and hadoopTableau and hadoop
Tableau and hadoop
 
Luo june27 1150am_room230_a_v2
Luo june27 1150am_room230_a_v2Luo june27 1150am_room230_a_v2
Luo june27 1150am_room230_a_v2
 

Viewers also liked

Talend Big Data Capabilities Overview
Talend Big Data Capabilities OverviewTalend Big Data Capabilities Overview
Talend Big Data Capabilities OverviewRajan Kanitkar
 
Unleashing the value of metadata with Talend
Unleashing the value of metadata with Talend Unleashing the value of metadata with Talend
Unleashing the value of metadata with Talend Jean-Michel Franco
 
Meet David - ETL / Informatica Consultant
Meet David - ETL / Informatica ConsultantMeet David - ETL / Informatica Consultant
Meet David - ETL / Informatica ConsultantDavid Hubbard
 
Etl with talend (data integeration)
Etl with talend (data integeration)Etl with talend (data integeration)
Etl with talend (data integeration)pomishra
 
Talend Community Use Group Bristol: Preparing your business for mastering dat...
Talend Community Use Group Bristol: Preparing your business for mastering dat...Talend Community Use Group Bristol: Preparing your business for mastering dat...
Talend Community Use Group Bristol: Preparing your business for mastering dat...KETL Limited
 
Big Linked Data ETL Benchmark on Cloud Commodity Hardware
Big Linked Data ETL Benchmark on Cloud Commodity HardwareBig Linked Data ETL Benchmark on Cloud Commodity Hardware
Big Linked Data ETL Benchmark on Cloud Commodity HardwareLaurens De Vocht
 
Talend winter 2017 overview webinar
Talend winter 2017 overview webinarTalend winter 2017 overview webinar
Talend winter 2017 overview webinarJean-Michel Franco
 
Présentation de Talend Winter 2017
Présentation de Talend Winter 2017 Présentation de Talend Winter 2017
Présentation de Talend Winter 2017 Jean-Michel Franco
 
Simplifying Big Data ETL with Talend
Simplifying Big Data ETL with TalendSimplifying Big Data ETL with Talend
Simplifying Big Data ETL with TalendEdureka!
 
Talend Data Quality
Talend Data QualityTalend Data Quality
Talend Data QualityTalend
 
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine LearningData Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine LearningKai Wähner
 

Viewers also liked (11)

Talend Big Data Capabilities Overview
Talend Big Data Capabilities OverviewTalend Big Data Capabilities Overview
Talend Big Data Capabilities Overview
 
Unleashing the value of metadata with Talend
Unleashing the value of metadata with Talend Unleashing the value of metadata with Talend
Unleashing the value of metadata with Talend
 
Meet David - ETL / Informatica Consultant
Meet David - ETL / Informatica ConsultantMeet David - ETL / Informatica Consultant
Meet David - ETL / Informatica Consultant
 
Etl with talend (data integeration)
Etl with talend (data integeration)Etl with talend (data integeration)
Etl with talend (data integeration)
 
Talend Community Use Group Bristol: Preparing your business for mastering dat...
Talend Community Use Group Bristol: Preparing your business for mastering dat...Talend Community Use Group Bristol: Preparing your business for mastering dat...
Talend Community Use Group Bristol: Preparing your business for mastering dat...
 
Big Linked Data ETL Benchmark on Cloud Commodity Hardware
Big Linked Data ETL Benchmark on Cloud Commodity HardwareBig Linked Data ETL Benchmark on Cloud Commodity Hardware
Big Linked Data ETL Benchmark on Cloud Commodity Hardware
 
Talend winter 2017 overview webinar
Talend winter 2017 overview webinarTalend winter 2017 overview webinar
Talend winter 2017 overview webinar
 
Présentation de Talend Winter 2017
Présentation de Talend Winter 2017 Présentation de Talend Winter 2017
Présentation de Talend Winter 2017
 
Simplifying Big Data ETL with Talend
Simplifying Big Data ETL with TalendSimplifying Big Data ETL with Talend
Simplifying Big Data ETL with Talend
 
Talend Data Quality
Talend Data QualityTalend Data Quality
Talend Data Quality
 
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine LearningData Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
 

Similar to Talend for big_data_intorduction

Manipulating data with Talend. Learn how?
Manipulating data with Talend. Learn how?Manipulating data with Talend. Learn how?
Manipulating data with Talend. Learn how?Edureka!
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the OrganizationSeeling Cheung
 
Webinar : Talend : The Non-Programmer's Swiss Knife for Big Data
Webinar  : Talend : The Non-Programmer's Swiss Knife for Big DataWebinar  : Talend : The Non-Programmer's Swiss Knife for Big Data
Webinar : Talend : The Non-Programmer's Swiss Knife for Big DataEdureka!
 
Big Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKBig Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKRajesh Jayarman
 
Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointInside Analysis
 
How pig and hadoop fit in data processing architecture
How pig and hadoop fit in data processing architectureHow pig and hadoop fit in data processing architecture
How pig and hadoop fit in data processing architectureKovid Academy
 
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016MLconf
 
Presentation big dataappliance-overview_oow_v3
Presentation   big dataappliance-overview_oow_v3Presentation   big dataappliance-overview_oow_v3
Presentation big dataappliance-overview_oow_v3xKinAnx
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Cécile Poyet
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Hortonworks
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Cécile Poyet
 
Enterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution StrategyEnterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution StrategyInside Analysis
 
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with HadoopBig Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with HadoopCaserta
 
Oct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopOct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopJosh Patterson
 
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformHortonworks
 

Similar to Talend for big_data_intorduction (20)

Manipulating data with Talend. Learn how?
Manipulating data with Talend. Learn how?Manipulating data with Talend. Learn how?
Manipulating data with Talend. Learn how?
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the Organization
 
Webinar : Talend : The Non-Programmer's Swiss Knife for Big Data
Webinar  : Talend : The Non-Programmer's Swiss Knife for Big DataWebinar  : Talend : The Non-Programmer's Swiss Knife for Big Data
Webinar : Talend : The Non-Programmer's Swiss Knife for Big Data
 
Hadoop
HadoopHadoop
Hadoop
 
Big Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKBig Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RK
 
Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter Point
 
How pig and hadoop fit in data processing architecture
How pig and hadoop fit in data processing architectureHow pig and hadoop fit in data processing architecture
How pig and hadoop fit in data processing architecture
 
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
 
Presentation big dataappliance-overview_oow_v3
Presentation   big dataappliance-overview_oow_v3Presentation   big dataappliance-overview_oow_v3
Presentation big dataappliance-overview_oow_v3
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Enterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution StrategyEnterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
 
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with HadoopBig Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
 
hadoop overview.pptx
hadoop overview.pptxhadoop overview.pptx
hadoop overview.pptx
 
Resume_VipinKP
Resume_VipinKPResume_VipinKP
Resume_VipinKP
 
Oct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopOct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on Hadoop
 
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
 
Robin_Hadoop
Robin_HadoopRobin_Hadoop
Robin_Hadoop
 

Recently uploaded

Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxUnboundStockton
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentInMediaRes1
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxDr.Ibrahim Hassaan
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitolTechU
 
MICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptxMICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptxabhijeetpadhi001
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 


Talend for Big Data: Introduction

  • 1. About Talend Corporation and Their Journey:
    • Talend was founded in 2005, and its first product, Talend Open Studio for Data Integration, was launched in October 2006.
    • Talend sponsors many open source technology foundations: Apache, Eclipse ...
    • Talend currently employs engineers to work on Apache projects such as Apache Karaf, ActiveMQ, Hadoop ...
    • Talend has 600+ employees and 1,300 enterprise customers across a range of industries.
    What is Talend?
    • Talend is an open source data integration studio based on the Eclipse IDE.
    • Talend Studio is a code generator: it produces Java, Perl, or MapReduce code for the respective job design.
    • Jobs created in Talend Studio can be executed from within the studio or as a standalone JAR file from external programs.
    • Talend jobs can be easily embedded in custom applications, and custom components can be created in Talend based on JAR files from external applications.
    Products under the Talend Platform:
    • Open Source Edition: Data Integration, Data Quality, ESB, MDM, and Big Data
    • Enterprise Edition: all open source products with additional features, plus Real-Time Big Data, Cloud Integration, Metadata Manager, and Data Fabric
  • 2. Advantages of using Talend over competing integration tools:
    1. Competing integration tools are very expensive and not yet mature in the big data space.
    2. Talend is an open source DI tool, so a project can be started without any budget for an ETL tool.
    3. Talend has 900+ connectors, including technologies such as Bonita BPM, EXASOL in-memory database ...
    4. HDFS, Pig, Sqoop, HBase, MapReduce, and Hive can be leveraged for ETL without deep programming expertise.
    5. Data quality and master data management capabilities can also be extended to the big data platform.
    Why is Talend gaining popularity?
    • Horizontal resource scalability at runtime
    • Layer of abstraction
    • Breadth of functionality
    • Ease of deployment and management
  • 3. 1) Horizontal resource scalability with runtime servers: We can deploy Talend jobs to AWS EC2 servers and execute them using the Talend Container Service. Right after job execution completes, we can terminate the EC2 instance.
  • 4.
  • 5. 2) Layer of abstraction: With one click we can change the execution engine from MapReduce to Spark. It is just a configuration option change.
    3) Breadth of functionality: Talend uses the same type of job designer for both the Data Integration and Big Data editions, whereas other tools require a different tool set for designing big data jobs.
    4) Ease of deployment and management: Talend creates the Hadoop job and passes the job ID information to the YARN ResourceManager, which then takes care of job execution. No Talend-related libraries need to be installed on the Hadoop cluster, whereas other tools require their own big data libraries to be installed on the nodes that are part of the cluster.
  • 6. Talend for Big Data installation system requirements:
    1. Memory: 4 GB RAM
    2. Disk space: 3 GB
    3. Recommended OS: Microsoft Windows 7 Professional, Linux Ubuntu
    4. Supported OS: Apple OS X
    5. It also works without problems on my personal laptop with Windows 10 Home edition, and on CentOS 7.
    6. Software: Oracle Java 8 JRE
    7. Network connectivity with a properly installed and configured Hadoop cluster
    Prerequisites to learn Talend for Big Data:
    • Fundamentals of computers, SQL, Linux commands, and conditional statements (if .. then .. else)
    Is Java programming mandatory for Talend DI job design?
    • No, it is not mandatory. Once in a while, when a business requirement goes beyond the existing tool functionality, we may have to write code routines to fulfill custom requirements.
    Talend tool GUI basics:
    • What is a workspace?
    • What is a project?
    • How to create, delete, or import a demo or existing project?
    • Types of repository connections used to connect Talend Studio
  • 7. GUI tools and features:
    • Main window, toolbar, Repository tree view
    • Designer workspace, Palette, configuration tabs
    • Outline, Code Viewer
    • Window > Show View to bring up other configuration tabs in the main window
    Metadata: centralize connections (file, database, Hadoop)
    What is a connection, what are the types of connections, and why are connection links needed?
    • Row connections, trigger connections
    Sample job design and execution in Talend Studio
  • 8. Agenda: topics covered in this demo
    • What is big data, and the characteristics of a big data platform
    • Why many customers are running after the Hadoop stack
    • Physical components of a Hadoop cluster and its architecture
    • Hadoop ecosystem components and the use of each component
    • Challenges in implementing big data projects using conventional Hadoop
    • Positives and negatives of using Talend DI for big data compared to conventional Hadoop ecosystem components
    • Talend client-server architecture
    • MapReduce job use case with hand coding and with a Talend DI job
  • 9. What is Big Data? A big data platform hosts data sets that are so large or complex that traditional data processing applications are insufficient to deal with them.
    • Big Data = distributed computing + fault tolerance
    • Distributed computing: the concept of shared-nothing storage plus parallel processing
    • Fault tolerance: the ability of a system to continue functioning properly in the event of a partial failure
    Brief explanation of distributed computing and fault tolerance, comparing how the blocks of a gigantic file are stored and accessed in a client-server architecture versus a distributed computing architecture.
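The block storage idea on this slide can be illustrated with a minimal single-process sketch. It is not how HDFS is implemented: the block size, replication factor, node names, and round-robin placement are all assumptions chosen to keep the example small. It only shows why a file whose blocks are replicated across independent nodes stays readable when some nodes fail.

```python
BLOCK_SIZE = 4    # bytes per block; tiny on purpose (assumed value, real systems use far larger blocks)
REPLICATION = 3   # copies kept of every block (assumed value)


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(blocks, nodes, replication=REPLICATION):
    """Return {node: {block_id: block}} with each block stored on
    `replication` distinct nodes, chosen round-robin (a simplification)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    storage = {node: {} for node in nodes}
    for block_id, block in enumerate(blocks):
        for r in range(replication):
            node = nodes[(block_id + r) % len(nodes)]
            storage[node][block_id] = block
    return storage


def read_file(storage, n_blocks, failed=frozenset()):
    """Reassemble the file, reading each block from any node that is still alive."""
    parts = []
    for block_id in range(n_blocks):
        for node, held in storage.items():
            if node not in failed and block_id in held:
                parts.append(held[block_id])
                break
        else:
            # No surviving node holds this block: the failure is no longer "partial".
            raise IOError(f"block {block_id} is lost")
    return b"".join(parts)


# Hypothetical four-node cluster and a small stand-in for the "gigantic file".
nodes = ["node1", "node2", "node3", "node4"]
data = b"a gigantic file, in miniature"
blocks = split_into_blocks(data)
storage = place_blocks(blocks, nodes)

# Two of four nodes fail, yet every block still has a surviving replica.
recovered = read_file(storage, len(blocks), failed={"node1", "node2"})
print(recovered == data)
```

With three replicas, any two node failures are tolerated. In a single-server layout the same file has exactly one copy, so one failure makes it unreadable.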
  • 10. List of some example big data platforms:
    • Hadoop: Apache, Hortonworks, Cloudera, MapR
    • Teradata Aster nCluster
    • Pivotal Big Data Suite
    • Amazon Redshift
    • Azure SQL Data Warehouse
    • Microsoft Azure HDInsight
    Hadoop history and the role of Google in the development of the Apache Hadoop project:
    • Google's problem statement and how they overcame it using GFS + MapReduce
    • Later, Doug Cutting created the Hadoop framework with reference to the white paper published by Google
    How is Hadoop different from traditional technologies?
    • Hadoop has big data characteristics: distributed computing and fault tolerance
    • Because storage is inexpensive, one can build a data lake on the HDFS layer
    What are the advantages of using Hadoop, from a cost and architectural feasibility perspective?
    • Horizontal resource scalability
    • Processing of large and/or rapidly growing data sets, whether structured or unstructured
    • Affordable commodity hardware
    • Open source
    • Moves computation to the data rather than transferring data to the computation
  • 11. High-level Hadoop cluster architecture and physical core components:
    • Multiple nodes in a rack
    • Multiple racks in a data center
    • Nodes and racks from different data centers in various geographic locations can be part of one Hadoop cluster
  • 12. Hadoop ecosystem components: The two main components of Apache Hadoop are HDFS for storage and MapReduce for data processing.
    • Flume
    • Sqoop
    • ZooKeeper
    • Oozie
    • Pig
    • Hive
    • HBase
    • Solr
  • 13. What are the challenges in implementing a big data project with the conventional Hadoop framework?
    • Applying agile methodology in big data projects is a nightmare.
    • Finding the right resources with MapReduce and Scala/Spark skills in the market is a big challenge.
    • Addressing/justifying the business case: rather than trying to convert existing reports that have been in use for the past decade so that they work on big data, you may have to help the business understand the need for big data in order to add predictive analytics features and compete with business competitors.
    Pros and cons of using Talend Big Data DI compared to conventional Hadoop ecosystem components
    Pros:
    • Graphical development of big data and Hadoop jobs
    • Leverages existing technical resources with a bare minimum training investment
    • Speeds up big data projects through agile methodology implementations
    • Seamless, tight integration is possible with related subject areas such as data quality, event-based job scheduling, and master data management
    • Runtime server execution
    Constraints:
    • Version compatibility dependencies between Talend and the Hadoop distribution
  • 14. Talend architecture and its components:
    • Nexus repository
    • Metadata repository
    • Talend Administration Center
    • Admin/Audit/Monitoring
    • Execution servers
  • 15. Lab practicals:
    • Joining two HDFS files and aggregating the data using a Talend job with the MapReduce engine
    • Joining two HDFS files and aggregating the data using a conventional hand-coded Hadoop MapReduce job
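The slides do not include the lab material itself, so the following is only a sketch of the general technique behind the hand-coded variant: a reduce-side join followed by an aggregation, simulated in a single Python process rather than written against the Hadoop API. The two input "files" (customers and orders), their column layouts, and every function name are hypothetical.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Simulate one MapReduce job: map every record, group the emitted
    values by key (the shuffle), then reduce each key in sorted order."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Hypothetical contents of the two input files.
customers = ["c1,EAST", "c2,WEST", "c3,EAST"]                    # customer_id,region
orders = ["o1,c1,10.0", "o2,c2,5.5", "o3,c1,2.5", "o4,c3,4.0"]   # order_id,customer_id,amount


def join_mapper(tagged_line):
    """Emit (customer_id, tagged value) so both files meet on the same key."""
    source, line = tagged_line
    fields = line.split(",")
    if source == "customers":
        yield fields[0], ("C", fields[1])
    else:
        yield fields[1], ("O", float(fields[2]))


def join_reducer(customer_id, values):
    """Pair each order amount with the region of the customer who placed it."""
    regions = [v for tag, v in values if tag == "C"]
    amounts = [v for tag, v in values if tag == "O"]
    for region in regions:
        for amount in amounts:
            yield region, amount


def sum_mapper(pair):
    """Pass (region, amount) through unchanged; region becomes the key."""
    yield pair


def sum_reducer(region, amounts):
    """Aggregate: total order amount per region."""
    yield region, sum(amounts)


# Job 1 joins the two files; job 2 aggregates the joined rows.
tagged = [("customers", line) for line in customers] + [("orders", line) for line in orders]
joined = run_mapreduce(tagged, join_mapper, join_reducer)
totals = run_mapreduce(joined, sum_mapper, sum_reducer)
print(totals)
```

Even in this toy form, the hand-coded route needs two chained jobs plus tagging logic to tell the input files apart. The slides contrast this effort with designing the same flow graphically in Talend.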