An introduction to Apache Gora

A short introduction to Apache Gora: what is it and how does it work?
How can it provide data store abstraction and persistence for big data?

Published in: Technology

Transcript

  • 1. Apache Gora
      ● What is it?
      ● Gora – Nutch
      ● Supports
      ● Data Access
      ● APIs
      www.semtech-solutions.co.nz info@semtech-solutions.co.nz
  • 2. Apache Gora – What is it?
      ● Provides for Big Data
          – In memory data model
          – Persistence
          – Data store abstraction
      ● Supports persisting to
          – Column stores
          – Key/value stores
          – Document stores
          – RDBMS's
      ● Supports use of Hadoop
  • 3. Apache Gora – What is it?
      ● Released under the Apache License 2.0
      ● Written in Java
      ● Offers a persistence framework
      ● Designed for big data applications
      ● Used by Nutch 2.x for web crawl data storage
      ● Used for
          – Persistence
          – Indexing
          – Analytics
  • 4. Apache Gora – Nutch
      ● Nutch 2.x now uses Gora
          – Abstracted storage
          – Data store independence
          – Handles object to persistent mappings
          – Can use various NoSQL solutions
  • 5. Apache Gora – Supports
      ● Gora supports the following
          – Apache Accumulo
          – Apache Cassandra
          – Apache HBase
          – Amazon DynamoDB
          – Pig
          – Hive
          – Cascading
          – MapReduce
  • 6. Apache Gora – Data Access
      ● Java API for data access
          – Independent of location
      ● Core Gora APIs
          – Store
          – Persistency
          – Query
          – MapReduce
  • 7. Apache Gora – Store API
      ● Java API
          – org.apache.gora.store.*
          – DataStore handles object persistence
          – DataStore methods process objects
              ● Persist
              ● Fetch
              ● Query
              ● Delete
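
The four operations listed on slide 7 (persist, fetch, query, delete) can be illustrated with a small in-memory stand-in. This is only a sketch of the idea of a data store abstraction: `SimpleStore`, `InMemoryStore` and their method signatures are hypothetical names chosen for illustration and are not Gora's `DataStore` interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical store abstraction (an assumption, not Gora's DataStore).
interface SimpleStore<K, T> {
    void put(K key, T value);             // persist an object under a key
    T get(K key);                         // fetch one object, or null if absent
    List<T> query(K startKey, K endKey);  // fetch all objects in an inclusive key range
    boolean delete(K key);                // delete; true if something was removed
}

// One possible backend. Callers depend only on SimpleStore, so a different
// backend could be swapped in without changing them.
class InMemoryStore<K extends Comparable<K>, T> implements SimpleStore<K, T> {
    // A sorted map keeps keys ordered, which makes range queries simple.
    private final TreeMap<K, T> rows = new TreeMap<>();

    public void put(K key, T value) { rows.put(key, value); }

    public T get(K key) { return rows.get(key); }

    public List<T> query(K startKey, K endKey) {
        return new ArrayList<>(rows.subMap(startKey, true, endKey, true).values());
    }

    public boolean delete(K key) { return rows.remove(key) != null; }
}
```

Code written against `SimpleStore` never names the backend, which is the property the slides call data store independence.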
  • 8. Apache Gora – Persistency API
      ● Java API
          – org.apache.gora.persistency.*
          – Core classes
              ● BeanFactory
                  – Construct keys
              ● Persistent
                  – Persist objects
              ● State
                  – State managed through StateManager
                  – NEW, CLEAN (UNMODIFIED)
                  – DIRTY (MODIFIED), DELETED
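
The four object states named on slide 8 can be sketched as a small state machine. The state names come from the slide; the class `TrackedRecord`, its methods, and the exact transition rules below are assumptions made for illustration and are not taken from Gora's `StateManager`.

```java
// State names as listed on the slide.
enum RecordState { NEW, CLEAN, DIRTY, DELETED }

// Hypothetical persistent object that tracks its own state.
class TrackedRecord {
    private String title;
    private RecordState state = RecordState.NEW; // never written to the store yet

    void setTitle(String title) {
        if (state == RecordState.DELETED) {
            throw new IllegalStateException("record is deleted");
        }
        this.title = title;
        // Assumed rule: a CLEAN record becomes DIRTY on modification;
        // a NEW record stays NEW until its first save.
        if (state == RecordState.CLEAN) {
            state = RecordState.DIRTY;
        }
    }

    String getTitle() { return title; }

    RecordState getState() { return state; }

    // Called by the (hypothetical) store after a successful write.
    void markSaved() {
        if (state != RecordState.DELETED) {
            state = RecordState.CLEAN;
        }
    }

    void markDeleted() { state = RecordState.DELETED; }

    // A store can skip CLEAN records and write only NEW or DIRTY ones.
    boolean needsWrite() {
        return state == RecordState.NEW || state == RecordState.DIRTY;
    }
}
```

Tracking state this way lets a store avoid rewriting objects that have not changed since they were last read or saved.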
  • 9. Apache Gora – Query API
      ● Java API
          – org.apache.gora.query.*
          – Core classes
              ● Query
                  – Constructed via DataStore
              ● PartitionQuery
                  – Divide results of Query into partitions.
                  – Run queries on data nodes.
                  – Generate Hadoop InputSplits
              ● Result
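
Slide 9 describes dividing the results of a query into partitions so that each partition can be processed separately. A minimal sketch of that idea, assuming the keys are already sorted, is shown below. `KeyRangePartitioner` is a hypothetical helper; it does not reproduce how Gora's `PartitionQuery` or Hadoop's `InputSplit` are actually built.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a sorted key list into contiguous slices.
class KeyRangePartitioner {

    // Returns at most `parts` slices, each covering a contiguous run of keys.
    static <K> List<List<K>> partition(List<K> sortedKeys, int parts) {
        if (parts <= 0) {
            throw new IllegalArgumentException("parts must be positive");
        }
        List<List<K>> slices = new ArrayList<>();
        int n = sortedKeys.size();
        int chunk = (n + parts - 1) / parts; // ceiling of n / parts
        for (int start = 0; start < n; start += chunk) {
            int end = Math.min(n, start + chunk);
            slices.add(new ArrayList<>(sortedKeys.subList(start, end)));
        }
        return slices;
    }
}
```

Each slice could then be handed to a separate worker, which is roughly the role an input split plays in a MapReduce job.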
  • 10. Apache Gora – MapReduce API
      ● Java API
          – org.apache.gora.mapreduce.*
          – GoraMapper
          – GoraReducer
          – ALL Record Counter
          – Reader
          – Writer
          – Hadoop / Avro
              ● Serialise
              ● De-serialise
              ● Persistent
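
The mapper and reducer roles listed on slide 10 follow the usual MapReduce pattern: a map step turns each stored record into a key, and a reduce step aggregates per key. The single-process sketch below shows only that pattern. `MiniMapReduce` and `countBy` are hypothetical names, and nothing here uses Hadoop or Gora's `GoraMapper` and `GoraReducer` classes.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical, in-process illustration of a map step followed by a reduce step.
class MiniMapReduce {

    // Map: apply `mapper` to each record to obtain its grouping key.
    // Reduce: sum a count of 1 per record for each key.
    static <R> Map<String, Integer> countBy(Iterable<R> records, Function<R, String> mapper) {
        Map<String, Integer> counts = new TreeMap<>();
        for (R record : records) {
            counts.merge(mapper.apply(record), 1, Integer::sum);
        }
        return counts;
    }
}
```

For example, counting crawled pages per host would pass a list of URLs as `records` and a function that extracts the host name as `mapper`.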
  • 11. Contact Us
      ● Feel free to contact us at
          – www.semtech-solutions.co.nz
          – info@semtech-solutions.co.nz
      ● We offer IT project consultancy
      ● We are happy to hear about your problems
      ● You pay only for the hours you need to solve your problems