Clique square storage

INRIA-OAK

rdf distributed storage

MapReduce-based loader
OWNER: Benjamin Djahandideh
PRESENTER: Alexandra Roatiș

CliqueSquare
RDF data management platform on top of Hadoop

Contributors
Working on the code / using the code:
Jorge Quiane
Zoi Kaoudi
Ioana Manolescu
François Goasdoué
Stamatis Zampetakis
Benjamin Djahandideh

Some details
● https://gforge.inria.fr/scm/viewvc.php/hadoop/cliquesquare/?root=xmlinthecloud
● 129 classes, 10k lines of code
Packages related to storing RDF data:
– fr.inria.oak.cliquesquare.partitioner.*
– two main packages:
● cliquesquare.partitioner.simple → useful for small experiments
● cliquesquare.partitioner.skewed → better performance for large datasets (>100 GB)

Storing RDF data
● Input: RDF data in N-Triples format
● Output: files on HDFS, partitioned by triple attribute (replication factor 3)
● Currently runs from the command line; a GUI is in development
● Functionality:
– take an RDF dataset (N-Triples format)
– filter out duplicate triples
– spread the data over the HDFS nodes (custom partitioning)
– at the end, the RDF data is replicated by a factor of 3
● Aim of custom partitioning: reduce the amount of data shuffled across the network
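The first two functionality steps (reading N-Triples and filtering duplicates) can be sketched in a few lines. This is a single-machine illustration only, not the MapReduce loader itself: the function names `parse_ntriple` and `load_triples` are hypothetical, and the parser assumes a simplified N-Triples line of the form `subject predicate object .` with whitespace-separated subject and predicate.

```python
def parse_ntriple(line):
    """Split one simplified N-Triples line into (subject, predicate, object).

    Assumption: subject and predicate contain no whitespace; the object may
    (for example a literal such as "hello world"). Returns None for blank
    lines and comment lines.
    """
    body = line.strip()
    if not body or body.startswith("#"):
        return None
    if body.endswith("."):
        body = body[:-1].rstrip()  # drop the terminating dot
    subject, predicate, obj = body.split(None, 2)
    return subject, predicate, obj


def load_triples(lines):
    """Parse lines and drop exact duplicate triples, keeping first-seen order."""
    seen = set()
    triples = []
    for line in lines:
        triple = parse_ntriple(line)
        if triple is None or triple in seen:
            continue
        seen.add(triple)
        triples.append(triple)
    return triples
```

In a Hadoop job the same effect is usually obtained by letting identical triples meet at one reducer; the in-memory set above is only practical for small inputs.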

CliqueSquare partitioning strategy
● input triples → hashed into key-value pairs
– Key: subject / predicate / object
– Value: the triple
● same key → same node
● the filename specifies the type of hash key used (-S, -P or -O)
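The strategy above can be illustrated with a small sketch. It assumes that each triple is placed once per attribute (subject, predicate, object), which is one reading of the replication factor of 3 mentioned earlier; the slides do not spell this out. The names `node_for` and `partition`, the use of CRC32 as the hash, and the `(node, suffix)` dictionary standing in for HDFS files are all hypothetical choices for illustration.

```python
import zlib

# Position of each attribute inside a (subject, predicate, object) tuple.
POSITIONS = {"S": 0, "P": 1, "O": 2}


def node_for(key, num_nodes):
    """Map a key to a node index. CRC32 is used because it is stable across
    runs, unlike Python's built-in hash() for strings."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def partition(triples, num_nodes):
    """Return {(node, suffix): [triples]}.

    Each triple is emitted three times, keyed in turn by its subject,
    predicate and object, so triples sharing a key land on the same node.
    The suffix (-S, -P or -O) records which attribute was hashed.
    """
    files = {}
    for triple in triples:
        for suffix, pos in POSITIONS.items():
            node = node_for(triple[pos], num_nodes)
            files.setdefault((node, "-" + suffix), []).append(triple)
    return files
```

For example, two triples with the same subject end up in the same `-S` bucket on the same node, so a query joining on that subject can be evaluated locally, without shuffling data across the network.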

● Dependency: Hadoop
● Project development: Maven, JUnit
● Known bugs:
– no data cleaning
● To do:
– branched version with value indexing for files
– requires redoing the partitioning code
