CS157B - Big Data Management
Flume with
Twitter Integration
Date: 03/3/2014
Professor: Thanh Tran
by Swathi Kotturu
ETL Using Flume
What is Flume?
Apache Flume is a distributed service for efficiently
collecting, aggregating, and moving large amounts of
log data.
Flume and its integration with Hadoop can be used
to capture streaming Twitter data, which can be filtered
based on keywords and locations.
More About Flume
It has a very simple architecture based on streaming data flows.
Flume reads events from a source and passes them through a channel
(here, a memory channel) to a sink, which writes the filtered data into HDFS.
Flume Agents
Flume can deploy any number of agents. An Agent is a
container for Flume data flow. It can run any number of
sources, sinks, and channels.
Every agent must have at least one source, one channel, and one sink.
Flume Sources
Sources are not necessarily restricted to log data.
It is possible to use Flume to transport event data such
as network traffic data, social-media-generated data,
e-mail messages, etc.
The events can be HTTP POSTs, RPC calls, strings in
stdout, etc.
After an event occurs, Flume sources write the event to a
channel as a transaction.
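As a rough illustration of what writing to a channel "as a transaction" means, the Python sketch below models a bounded channel in which a batch of events is either committed in full or rolled back. ToyChannel and its method names are hypothetical and chosen for illustration only; they are not Flume's Java API.

```python
from collections import deque


class ToyChannel:
    """Hypothetical bounded channel for illustration; not Flume's real API."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()

    def put_batch(self, batch):
        """Commit the whole batch, or nothing at all if it does not fit."""
        if len(self.events) + len(batch) > self.capacity:
            return False  # roll back: the channel is left untouched
        self.events.extend(batch)  # commit: every event becomes visible
        return True

    def take_batch(self, max_events):
        """Remove and return up to max_events events, as a sink would."""
        taken = []
        while self.events and len(taken) < max_events:
            taken.append(self.events.popleft())
        return taken


channel = ToyChannel(capacity=3)
first = channel.put_batch(["event-1", "event-2"])
second = channel.put_batch(["event-3", "event-4"])  # would exceed capacity
drained = channel.take_batch(10)
print(first, second, drained)
```

The second batch is rejected as a unit, so a sink never sees half of it.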
Flume Channels
Channels are internal passive stores with specific
characteristics. This allows a source and a sink to run
asynchronously.
Two Main Types of Channels
Memory Channels
- Volatile channel that buffers events in memory
only. If the JVM crashes, all buffered data is lost.
File Channels
- Persistent channel that is stored to disk.
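The difference between the two channel types can be sketched in a few lines of Python. This is a toy model under stated assumptions: ToyMemoryChannel and ToyFileChannel are hypothetical names, the "crash" is simulated by creating fresh objects, and Flume's actual file channel is considerably more involved than appending lines to one file.

```python
import json
import os
import tempfile


class ToyMemoryChannel:
    """Events live only in this object; a restarted process starts empty."""

    def __init__(self):
        self.events = []

    def put(self, event):
        self.events.append(event)


class ToyFileChannel:
    """Events are appended to a file and reloaded on start-up."""

    def __init__(self, path):
        self.path = path
        self.events = []
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.events = [json.loads(line) for line in f if line.strip()]

    def put(self, event):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")
        self.events.append(event)


workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "channel.log")

mem = ToyMemoryChannel()
mem.put({"body": "tweet A"})
disk = ToyFileChannel(log_path)
disk.put({"body": "tweet A"})

# Simulate a JVM crash and restart by building fresh channel objects.
mem_after_restart = ToyMemoryChannel()
disk_after_restart = ToyFileChannel(log_path)
print(len(mem_after_restart.events), len(disk_after_restart.events))
```

The memory-backed channel comes back empty, while the file-backed one recovers the event it had written before the simulated crash.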
You can run multiple agents and servers to collect data in
parallel.
Get Twitter Access
Flume in Cloudera
Download flume-sources-1.0-SNAPSHOT.jar and add it
to the Flume classpath.
http://files.cloudera.com/samples/flume-sources-1.0-SNAPSHOT.jar
In Cloudera Manager, you can add it to the classpath under
“Services” -> “flume1” -> “Configuration” ->
“Agent(Default)” -> “Advanced” -> “Java Configuration
Options for Flume Agent”. Add:
--classpath /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/flume-ng/lib/flume-sources-1.0-SNAPSHOT.jar
Flume in Cloudera (cont.)
You also have to exclude the original file that came pre-installed
with Flume by renaming it with a .org suffix. The file is
search-contrib-1.0.0-jar-with-dependencies.jar and is in
the /usr/lib/flume-ng/lib/ path.
mv search-contrib-1.0.0-jar-with-dependencies.jar search-contrib-1.0.0-jar-with-dependencies.jar.org
Using Hue, create a Flume user and give it read and
write access in HDFS.
Flume in Cloudera (cont.)
From the Cloudera Manager, go to
“Services” -> “flume1” -> “Configuration” ->
“Agent(Default)” -> “Agent Name”.
Set the Agent Name to TwitterAgent, matching the prefix used in the configuration file.
Flume in Cloudera (cont.)
Also set the Configuration File to the following, making sure to replace
the consumerKey, consumerSecret, accessToken, and accessTokenSecret
placeholders with your own values:
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <consumer key>
TwitterAgent.sources.Twitter.consumerSecret = <consumer secret>
TwitterAgent.sources.Twitter.accessToken = <access token>
TwitterAgent.sources.Twitter.accessTokenSecret = <access token secret>
TwitterAgent.sources.Twitter.keywords = flu, runny nose, tissue, sick, ill, cough
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:8020/user/flume/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
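Before restarting the agent, it can help to sanity-check how the configuration wires sources and sinks to channels. The Python sketch below is one possible way to do that, not a Flume tool: parse_properties and check_wiring are hypothetical helpers, and the embedded CONFIG is a shortened copy of the wiring lines from the configuration above.

```python
CONFIG = """
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
"""


def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_wiring(props, agent):
    """Return a list of problems found in the agent's channel wiring."""
    problems = []
    declared = set(props.get(f"{agent}.channels", "").split())
    sources = props.get(f"{agent}.sources", "").split()
    sinks = props.get(f"{agent}.sinks", "").split()
    if not (declared and sources and sinks):
        problems.append(f"agent {agent!r} needs a source, a channel, and a sink")
    for source in sources:
        for ch in props.get(f"{agent}.sources.{source}.channels", "").split():
            if ch not in declared:
                problems.append(f"source {source} writes to undeclared channel {ch!r}")
    for sink in sinks:
        ch = props.get(f"{agent}.sinks.{sink}.channel", "")
        if ch not in declared:
            problems.append(f"sink {sink} reads from undeclared channel {ch!r}")
    return problems


props = parse_properties(CONFIG)
ok = check_wiring(props, "TwitterAgent")
bad = check_wiring(props, "Twitter Agent")  # name does not match the prefix
print(ok)
print(bad)
```

The second call shows why the agent name has to match the prefix used in the file: with a different name, no sources, channels, or sinks are found at all.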
Flume in Cloudera (cont.)
Restart Flume Agent
Example Tweet
We loaded raw tweets into HDFS, where they are represented
as chunks of JSON.
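To give a feel for working with such data, here is a minimal Python sketch that parses tweets stored one JSON object per line. The two sample records are invented for illustration and carry only a few of the fields a real tweet has (created_at, text, user.screen_name); the one-object-per-line layout is an assumption about the output files.

```python
import json

# Hypothetical sample records; real tweets contain many more fields.
SAMPLE = """\
{"created_at": "Mon Mar 03 18:00:00 +0000 2014", "text": "Home sick with the flu", "user": {"screen_name": "example_user"}}
{"created_at": "Mon Mar 03 18:00:05 +0000 2014", "text": "Cannot stop this cough", "user": {"screen_name": "another_user"}}
"""


def parse_tweets(text):
    """Decode one JSON object per non-empty line."""
    tweets = []
    for line in text.splitlines():
        if line.strip():
            tweets.append(json.loads(line))
    return tweets


tweets = parse_tweets(SAMPLE)
summary = [(t["user"]["screen_name"], t["text"]) for t in tweets]
for name, body in summary:
    print(f"@{name}: {body}")
```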
Next Steps
Tell Hive how to read the data.
You will need hive-serdes-1.0-SNAPSHOT.jar:
http://files.cloudera.com/samples/hive-serdes-1.0-SNAPSHOT.jar
Hive is set up to read a delimited row format by default,
but in this case it needs to read JSON.
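The need for a JSON-aware reader can be seen with a small Python sketch. It is not the SerDe itself, only an illustration: the sample line is invented, and the split calls stand in for a delimited-row reader (Hive's default field delimiter is the Ctrl-A character, \x01).

```python
import json

# Hypothetical tweet line, reduced to two fields for illustration.
line = '{"text": "sick, staying in", "user": {"screen_name": "example_user"}}'

# Delimited reading with Ctrl-A: the whole line ends up in one column.
default_columns = line.split("\x01")

# Splitting on commas is no better: it cuts through the tweet text itself.
comma_columns = line.split(",")

# A JSON parser recovers the nested structure.
record = json.loads(line)

print(len(default_columns), len(comma_columns))
print(record["user"]["screen_name"], "|", record["text"])
```

Only the JSON parser returns the tweet text and the nested user name as separate, intact values.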
Flume Resources
Learn More
https://dev.twitter.com/docs/streaming-apis/parameters
https://cwiki.apache.org/confluence/display/FLUME/Home
http://blog.cloudera.com/blog/2012/09/analyzing-twitter-data-with-hadoop/
Thank you!
Q/A
