Hadoop for carrier

Harnessing Hadoop for Big Data, Series II

Published in: Technology
  1. 1. Leveraging Hadoop Cluster for Carrier grade application Copyright © 2011 Flytxt B.V. All rights reserved. 1/17/2012
  2. 2. No Personalization; Service discovery
  3. 3. Mammoth Data, Data Analysis
       • 600-800 GB of CDR (call detail records) per day
           ◦ GPRS signaling: 50 GB/day
           ◦ 3G signaling: 300 GB/day
           ◦ Voice: 100 GB/day
           ◦ SMS: 200 GB/day
       • 100-200 GB/day of web data
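As a quick sanity check on slide 3, the four itemized CDR streams can be summed and compared with the quoted 600-800 GB range. This is only arithmetic on the slide's own figures; the variable names are illustrative and not from the deck.

```python
# Daily CDR volumes in GB, as itemized on slide 3.
cdr_gb_per_day = {
    "GPRS signaling": 50,
    "3G signaling": 300,
    "Voice": 100,
    "SMS": 200,
}
# Quoted range for web data, in GB/day.
web_gb_per_day = (100, 200)

# Sum of the itemized CDR streams.
cdr_total = sum(cdr_gb_per_day.values())

# Combined daily volume once web data is added.
total_low = cdr_total + web_gb_per_day[0]
total_high = cdr_total + web_gb_per_day[1]

print(f"Itemized CDR total: {cdr_total} GB/day")
print(f"CDR plus web data: {total_low}-{total_high} GB/day")
```

The itemized streams fall inside the quoted CDR range, and the combined volume approaches a terabyte per day.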
  6. 6. Framework for distributed processing of large data sets across clusters
       • Consists of
           ◦ Hadoop Distributed File System, aka HDFS (file system)
           ◦ Hadoop MapReduce (programming model)
       • Characteristics
           ◦ Performance should scale linearly
           ◦ Compute should move to the data
           ◦ Simple core, modular and extensible
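The MapReduce programming model named on slide 6 can be illustrated with a minimal in-process sketch. It is not Hadoop code: it only mimics the map, shuffle, and reduce steps in plain Python, and the record format and function names are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (key, value) pairs for one input record: here (record_type, 1)."""
    record_type, _payload = record.split(",", 1)
    yield record_type, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values seen for one key: here a simple sum."""
    return key, sum(values)


def run_job(records):
    """Run map over every record, shuffle by key, then reduce each group."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical CDR-like records in the form "<type>,<payload>".
records = ["sms,a->b", "voice,a->c", "sms,b->c", "gprs,a", "sms,c->a"]
counts = run_job(records)
print(counts)
```

On a real cluster the map and reduce functions run in parallel on the nodes that hold the data, which is what "compute should move to the data" refers to.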
  7. 7. Current Bottleneck
       • Data resides on multiple nodes, zones, and VM instances, with no elegant, reliable, and efficient way of extracting it
       • Loading terabytes of data into a database is slow
       • Parallel computing is not possible in conventional BI ETL
       • User profile and application data reside in a DB that can scale only vertically
  8. 8. Structured Data
           sqoop import --connect jdbc:mysql://db.example.com/website --table USERS --as-sequencefile
       • Unstructured Data
  9. 9. A distributed data collection server
           ◦ Scalable
           ◦ Configurable
           ◦ Extensible
           ◦ Manageable
       • Built around the concept of flows
           ◦ A single flow corresponds to a type of data source
           ◦ Supports compression, batching, and reliability setups per flow
       • Data comes in through a source
           ◦ Optionally processed by one or more decorators
           ◦ And transmitted out via a sink
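The source, decorator, and sink chain described on slide 9 can be sketched as a small generator pipeline. This is a toy model under stated assumptions, not the collection server's actual API: `list_source`, `tag_decorator`, `batch_decorator`, `CompressingSink`, and `run_flow` are all hypothetical names, and the in-memory sink stands in for a real destination such as HDFS.

```python
import gzip
import json


def list_source(lines):
    """Source: yields one event per input line (a list stands in for a log tail)."""
    for line in lines:
        yield {"body": line}


def tag_decorator(events, flow_name):
    """Decorator: annotates each event with the flow it belongs to."""
    for event in events:
        yield {**event, "flow": flow_name}


def batch_decorator(events, size):
    """Decorator: groups events into batches of at most `size`."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


class CompressingSink:
    """Sink: stores each batch as a gzip-compressed JSON blob in memory."""

    def __init__(self):
        self.blobs = []

    def write(self, batch):
        payload = json.dumps(batch).encode("utf-8")
        self.blobs.append(gzip.compress(payload))


def run_flow(lines, flow_name, batch_size, sink):
    """One flow: source, then decorators, then sink."""
    events = tag_decorator(list_source(lines), flow_name)
    for batch in batch_decorator(events, batch_size):
        sink.write(batch)


sink = CompressingSink()
run_flow(["cdr-1", "cdr-2", "cdr-3"], flow_name="sms", batch_size=2, sink=sink)
first_batch = json.loads(gzip.decompress(sink.blobs[0]))
print(len(sink.blobs), first_batch)
```

Batch size and compression are parameters of the individual flow here, mirroring the per-flow setup the slide describes.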
  12. 12. MapReduce is very powerful, but:
           ◦ It requires a Java programmer
           ◦ The user has to reinvent common functionality (join, filter, etc.)
       • Execution engine atop Hadoop
       • Pig provides a higher-level language, Pig Latin
       • Opens the system to non-Java programmers
       • Provides common operations like join, group, filter, sort
  13. 13. Web log processing
       • Data processing for web search platforms
       • Ad hoc queries across large data sets
       • Rapid prototyping of algorithms for processing large data sets
       • Pig runs on the client machine and the job is executed on the Hadoop cluster; with -x local, as below, the script runs entirely on the local machine
           $ cd /usr/share/cloudera/pig/
           $ bin/pig -x local
           grunt> log = LOAD 'excite-small.log' AS (user, timestamp, query);
           grunt> grpd = GROUP log BY user;
           grunt> cntd = FOREACH grpd GENERATE group, COUNT(log);
           grunt> STORE cntd INTO 'output';
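To make the Pig script on slide 13 concrete, the same load, group, and count steps can be written as a few lines of plain Python. This is a sketch for comparison only: it assumes tab-separated (user, timestamp, query) rows, which is Pig's default field delimiter, and the sample rows and the function name are made up for illustration.

```python
from collections import Counter


def count_queries_per_user(lines):
    """Count rows per user, mirroring LOAD, GROUP BY user, and COUNT."""
    counts = Counter()
    for line in lines:
        # Each row has three tab-separated fields: user, timestamp, query.
        user, _timestamp, _query = line.rstrip("\n").split("\t", 2)
        counts[user] += 1
    return dict(counts)


# Hypothetical rows shaped like the (user, timestamp, query) schema.
sample = [
    "u1\t970916105432\tyahoo chat",
    "u2\t970916001949\tweather report",
    "u1\t970916105455\tyahoo mail",
]
result = count_queries_per_user(sample)
print(result)
```

The Pig version expresses the same logic declaratively and lets the engine turn it into MapReduce jobs when run against a cluster.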
  14. 14. System for querying and managing structured data
       • Built on top of Hadoop
       • Uses MapReduce for execution
       • SQL-like syntax; supports
           ◦ FROM-clause subqueries
           ◦ ANSI join (equi-join)
           ◦ Multi-table insert
           ◦ Multi group-by
           ◦ Sampling
           ◦ Object traversal
       • Engagement
           ◦ Summarization
           ◦ Ad hoc analysis
           ◦ Spam detection
  16. 16. Hive and Pig feature comparison

       Feature                          Hive                Pig
       Language                         SQL-like            Pig Latin
       Schemas/Types                    Yes (explicit)      Yes (implicit)
       Partitions                       Yes                 No
       Server                           Optional (Thrift)   No
       User Defined Functions           Yes                 Yes
       Custom Serializer/Deserializer   Yes                 Yes
       DFS Direct Access                Yes (implicit)      Yes (explicit)
       Join/Order/Sort                  Yes                 Yes
       Shell                            Yes                 Yes
       Streaming                        Yes                 No
       Web Interface                    Yes                 No
       JDBC/ODBC                        Yes (limited)       No