An introduction to Apache Gora
 

A short introduction to Apache Gora: what is it and how does it work?
How can it provide data store abstraction and persistence for big data?


    An introduction to Apache Gora: Presentation Transcript

    • Apache Gora
        ● What is it?
        ● Gora – Nutch
        ● Supports
        ● Data Access
        ● APIs
      www.semtech-solutions.co.nz info@semtech-solutions.co.nz
    • Apache Gora – What is it?
        ● Provides for Big Data
            – In memory data model
            – Persistence
            – Data store abstraction
        ● Supports persisting to
            – Key/value stores
            – Document stores
            – Column stores
            – RDBMSs
        ● Supports use of Hadoop
    • Apache Gora – What is it?
        ● Released via Apache 2 license
        ● Written in Java
        ● Offers a persistence framework
        ● Designed for big data applications
        ● Used by Nutch 2.x for web crawl data storage
        ● Used for
            – Persistence
            – Indexing
            – Analytics
    • Apache Gora – Nutch
        ● Nutch 2.x now uses Gora
            – Abstracted storage
            – Data store independence
            – Handles object to persistent mappings
            – Can use various NoSQL solutions
    • Apache Gora – Supports
        ● Gora supports the following
            – Apache Accumulo
            – Apache Cassandra
            – Apache HBase
            – Amazon DynamoDB
            – Pig
            – Hive
            – Cascading
            – MapReduce
    • Apache Gora – Data Access
        ● Java API for data access
            – Independent of location
        ● Core Gora APIs
            – Store
            – Persistency
            – Query
            – MapReduce
    • Apache Gora – Store API
        ● Java API
            – org.apache.gora.store.*
            – DataStore handles object persistence
            – DataStore methods process objects
                ● Persist
                ● Fetch
                ● Query
                ● Delete
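
The four operations listed on the Store API slide (persist, fetch, query, delete) can be illustrated with a small stand-alone sketch. This is a toy model using only the Java standard library, not Gora's actual `DataStore` class: the names `ToyDataStore`, `InMemoryToyStore` and `ToyStoreDemo`, and the method signatures, are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical interface for illustration only; not Gora's real DataStore signature.
interface ToyDataStore<K, T> {
    void put(K key, T value);             // persist
    T get(K key);                         // fetch
    List<T> query(K startKey, K endKey);  // query an inclusive key range
    boolean delete(K key);                // delete
}

// In-memory backend. Another backend could implement the same interface,
// which is the idea behind data store abstraction.
class InMemoryToyStore<K extends Comparable<K>, T> implements ToyDataStore<K, T> {
    private final TreeMap<K, T> rows = new TreeMap<>();

    @Override public void put(K key, T value) { rows.put(key, value); }

    @Override public T get(K key) { return rows.get(key); }

    @Override public List<T> query(K startKey, K endKey) {
        // Values come back in key order because TreeMap keeps keys sorted.
        return new ArrayList<>(rows.subMap(startKey, true, endKey, true).values());
    }

    @Override public boolean delete(K key) { return rows.remove(key) != null; }
}

class ToyStoreDemo {
    public static void main(String[] args) {
        ToyDataStore<String, String> store = new InMemoryToyStore<>();
        store.put("page:1", "http://example.org/a");
        store.put("page:2", "http://example.org/b");
        System.out.println(store.get("page:1"));
        System.out.println(store.query("page:1", "page:2"));
        System.out.println(store.delete("page:1"));
    }
}
```

Calling code depends only on the interface, so swapping the in-memory backend for another one would not change the caller.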
    • Apache Gora – Persistency API
        ● Java API
            – org.apache.gora.persistency.*
            – Core classes
                ● BeanFactory – Construct keys
                ● Persistent – Persist objects
                ● State – State managed through StateManager
                    – NEW, CLEAN (UNMODIFIED)
                    – DIRTY (MODIFIED), DELETED
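
The four states named on the Persistency API slide (NEW, CLEAN, DIRTY, DELETED) suggest a simple state machine. The sketch below is one plausible reading, not Gora's `StateManager`: the class names `ToyState`, `ToyPersistent` and `ToyStateDemo`, and the transition rules themselves, are assumptions for illustration.

```java
// The four states from the slide, modelled as a plain enum.
enum ToyState { NEW, CLEAN, DIRTY, DELETED }

// Hypothetical persistent object that tracks its own state.
class ToyPersistent {
    private ToyState state = ToyState.NEW;  // assumed: objects start as NEW
    private String value;

    ToyPersistent(String value) { this.value = value; }

    ToyState getState() { return state; }
    String getValue() { return value; }

    void setValue(String newValue) {
        if (state == ToyState.DELETED) {
            throw new IllegalStateException("object is deleted");
        }
        value = newValue;
        // Assumed rule: a CLEAN object becomes DIRTY; a NEW object stays NEW.
        if (state == ToyState.CLEAN) {
            state = ToyState.DIRTY;
        }
    }

    // Called after a successful write to the store (assumed rule).
    void markSaved() {
        if (state != ToyState.DELETED) {
            state = ToyState.CLEAN;
        }
    }

    void markDeleted() { state = ToyState.DELETED; }

    // Only NEW and DIRTY objects need to be written back.
    boolean needsWrite() {
        return state == ToyState.NEW || state == ToyState.DIRTY;
    }
}

class ToyStateDemo {
    public static void main(String[] args) {
        ToyPersistent page = new ToyPersistent("v1");
        System.out.println(page.getState());  // NEW
        page.markSaved();
        System.out.println(page.getState());  // CLEAN
        page.setValue("v2");
        System.out.println(page.getState());  // DIRTY
        page.markDeleted();
        System.out.println(page.getState());  // DELETED
    }
}
```

Tracking state this way lets a store skip writes for unmodified objects.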
    • Apache Gora – Query API
        ● Java API
            – org.apache.gora.query.*
            – Core classes
                ● Query
                    – Constructed via DataStore
                ● PartitionQuery
                    – Divide results of Query into partitions.
                    – Run queries on data nodes.
                    – Generate Hadoop InputSplits
                ● Result
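
The idea of dividing a query's results into partitions, so that each partition can be processed separately, can be sketched as follows. This is a toy illustration with the standard library only, not Gora's `PartitionQuery`: the class names `ToyPartitioner` and `ToyPartitionDemo`, and the even-split strategy, are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a sorted key list into contiguous slices.
class ToyPartitioner {
    // Returns at most `parts` non-overlapping slices that together cover all keys.
    static <K> List<List<K>> partition(List<K> sortedKeys, int parts) {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be >= 1");
        }
        List<List<K>> out = new ArrayList<>();
        int n = sortedKeys.size();
        int size = (n + parts - 1) / parts;  // ceiling division; 0 only when n == 0
        for (int start = 0; start < n; start += size) {
            int end = Math.min(start + size, n);
            out.add(new ArrayList<>(sortedKeys.subList(start, end)));
        }
        return out;
    }
}

class ToyPartitionDemo {
    public static void main(String[] args) {
        List<String> keys = List.of("a", "b", "c", "d", "e");
        // Each slice could be handed to a separate worker.
        System.out.println(ToyPartitioner.partition(keys, 2));  // [[a, b, c], [d, e]]
    }
}
```

In a real cluster the split would more likely follow where the data lives than an even count, which is what running queries on data nodes implies.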
    • Apache Gora – MapReduce API
        ● Java API
            – org.apache.gora.mapreduce.*
            – GoraMapper
            – GoraReducer
            – ALL Record Counter
            – Reader
            – Writer
            – Hadoop / Avro
                ● Serialise
                ● De-serialise
                ● Persistent
    • Contact Us
        ● Feel free to contact us at
            – www.semtech-solutions.co.nz
            – info@semtech-solutions.co.nz
        ● We offer IT project consultancy
        ● We are happy to hear about your problems
        ● You can just pay for those hours that you need to solve your problems