
Hadoop World 2011: Radoop: a Graphical Analytics Tool for Big Data - Gabor Makrai, Radoop


Hadoop is an excellent environment for analyzing large data sets, but it lacks an easy-to-use graphical interface for building data pipelines and performing advanced analytics. RapidMiner is an excellent open-source tool for data analytics, but it is limited to running on a single machine. In this presentation, we will introduce Radoop, an extension to RapidMiner that lets users interact with a Hadoop cluster. Radoop combines the strengths of both projects and provides a user-friendly interface for editing and running ETL, analytics, and machine learning processes on Hadoop. We will also discuss lessons learned while integrating HDFS, Hive, and Mahout with RapidMiner.

Published in: Technology

  1. Radoop: A Graphical Analytics Tool for Big Data. Gabor Makrai, CTO, Radoop
  2. Who we are
       • Active members of a Data Mining Research Group in Europe
       • We started using Hadoop two years ago
       • We are using basic Hadoop, Hive, and Mahout
     Slide footer: 11/9/2011, HadoopWorld 2011
  3. Data mining tools
       • Closed software
            – SAS Enterprise Miner
            – IBM SPSS Modeler
       • Open-source software
            – Rapid-I RapidMiner
            – R
       • Graphical user interface
       • Data-flow structure
       • Adaptability is important
  4. Hadoop vs. data mining tools (figure panels: Hadoop, Data mining tools)
  5. Why is it important?
       • Barrier to entry for Hadoop
            – Using Hadoop without expert Hadoop knowledge
       • Development time vs. running time
       • User-friendly graphical interface
            – Program readability
  6. RapidMiner
       • The most used data mining tool in 2010*
       • Open-source software
       • Supports extensions
       • Data-flow structure
       • Marketplace
     * http://www.kdnuggets.com/
  7. Radoop architecture
  8. Implementation difficulties: RapidMiner and Hive data types
       RapidMiner
       • Nominal
            – Text
            – Polynominal
            – Binominal
       • Numeric
            – Integer
            – Real
       • Date and time
            – Date
            – Time
       Hive
       • TINYINT
       • SMALLINT
       • INT
       • BIGINT
       • BOOLEAN
       • FLOAT
       • DOUBLE
       • STRING
  9. Implementation difficulties
       • Input data restrictions for Mahout
            – Conversion between Hive and Mahout
       • Mahout needs data in a special format
            – Data must be stored in the VectorWritable class
       • Hive can export data
            – Plain text or Sequence file format
       • Solution: simple MapReduce jobs
            – Convert an exported plain-text Hive table to VectorWritable format and vice versa
  10. Implementation difficulties
       • Running Mahout jobs remotely
       • Hadoop Commons and Hive handle remote connections well
       • At the same time, Mahout does not support remote running
       • Solution: modifications to Mahout's base source code
  11. Implementation status
       • Data imports and exports
            – CSV, Excel, and Database import/export
       • Data transformations
            – Most used data manipulation functions
       • Scalable machine learning and data mining
            – Clustering algorithms
            – Classifications
  12. Radoop base elements
       • Operator
       • Process
  13. Radoop case study
  14. Radoop case study
  15. Radoop case study
  16. Radoop case study: gets the Hive table
  17. Radoop case study: creates a new view with a WHERE clause
  18. Radoop case study: creates a new view with a GROUP BY function
  19. Radoop case study: creates a new view with a SORT BY function
  20. Radoop case study: creates a new view with a LIMIT
  21. Radoop case study: creates a new table from the last view
  22. Future
       • "We believe that more than half of the world's data will be stored in Apache Hadoop within five years." Hortonworks
       • Radoop is opening the doors for people who are less comfortable with Hadoop but want to use Hadoop for Big Data analytics
  23. Contacts
       • Gabor Makrai
            – makrai@radoop.eu
       • Webpage
            – http://www.radoop.eu/
       • E-mail
            – radoop@radoop.eu
       • Twitter
            – @radoopeu
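
Slide 9 describes converting an exported plain-text Hive table into Mahout's vector format and back. The slides do not show the conversion itself, so the following is only a rough Python sketch of the per-row logic such a map step might perform. It assumes every column is numeric and non-null, and that the export uses Hive's default Ctrl-A field delimiter. The function names are hypothetical, and a plain Python list stands in for Mahout's VectorWritable.

```python
from typing import List

# Hive's default field delimiter for plain-text tables is Ctrl-A (\x01).
# Assumption: the exported table uses this default.
HIVE_FIELD_DELIMITER = "\x01"


def hive_row_to_vector(line: str, delimiter: str = HIVE_FIELD_DELIMITER) -> List[float]:
    """Parse one exported Hive text row into a dense numeric vector.

    Hypothetical helper: assumes every field is numeric and non-null.
    """
    fields = line.rstrip("\n").split(delimiter)
    return [float(field) for field in fields]


def vector_to_hive_row(vector: List[float], delimiter: str = HIVE_FIELD_DELIMITER) -> str:
    """Serialize a dense vector back into one Hive text row (the reverse direction)."""
    return delimiter.join(repr(value) for value in vector)
```

In a real job this logic would sit inside a mapper and the output would be written as VectorWritable values in a sequence file; the sketch only illustrates the row-level parsing in both directions.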

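
Slides 16 to 21 walk through a chain of operators: read a Hive table, then build views with WHERE, GROUP BY, SORT BY, and LIMIT, and finally materialize the last view as a table. One way to picture this chain is as a sequence of HiveQL statements, sketched below in Python. This is an illustration under stated assumptions, not Radoop's actual generated code: the function name, the view names (`step1_filter` and so on), the `COUNT(*)` aggregate, and the example table and column names are all hypothetical.

```python
from typing import List


def build_case_study_queries(
    source_table: str,
    where_clause: str,
    group_column: str,
    limit: int,
    result_table: str,
) -> List[str]:
    """Return one HiveQL statement per step of the operator chain (hypothetical sketch)."""
    return [
        # Step 1: view over the source table with a WHERE clause.
        f"CREATE VIEW step1_filter AS SELECT * FROM {source_table} WHERE {where_clause}",
        # Step 2: view that groups the filtered rows (COUNT(*) is an assumed aggregate).
        f"CREATE VIEW step2_group AS SELECT {group_column}, COUNT(*) AS cnt "
        f"FROM step1_filter GROUP BY {group_column}",
        # Step 3: view that sorts the grouped rows.
        "CREATE VIEW step3_sort AS SELECT * FROM step2_group SORT BY cnt DESC",
        # Step 4: view that keeps only the first rows.
        f"CREATE VIEW step4_limit AS SELECT * FROM step3_sort LIMIT {int(limit)}",
        # Step 5: materialize the last view as a new table.
        f"CREATE TABLE {result_table} AS SELECT * FROM step4_limit",
    ]


if __name__ == "__main__":
    # Hypothetical table and column names, for illustration only.
    for statement in build_case_study_queries("web_logs", "status = 200", "url", 10, "top_urls"):
        print(statement + ";")
```

Because views are not materialized, nothing is computed until the final `CREATE TABLE ... AS SELECT` runs. Note that in Hive, `SORT BY` orders rows within each reducer only, while `ORDER BY` produces a total order.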