ROME 27-28 march 2015 - Speaker’s name
Dive into Sahara
Davide Del Vecchio
Francesco Vollero
Matteo Bernacchi
March 27, 2015
Who are we
Davide Del Vecchio
•Principal Domain Architect Middleware
•Previous experience with analytics and Big Data
•Background in Science
•Passionate about technology
Francesco Vollero
● OpenStack Technical Specialist in EMEA
● Developer background - in OpenStack since Grizzly
● Core contributor in packstack, openstack-puppet
● Snooping other OpenStack components like Sahara
● Functional programming brain oriented :)
Matteo Bernacchi
•Senior Infrastructure Consultant
•Experienced in cloud solutions deployment
•Supporter of FOSS technologies since 2003
•An introduction to Big Data
•An overview of the OpenStack components
•A (Moderately) Brief Introduction to Sahara
•Sahara in action
Agenda
Everything You Ever Wanted to Know
About Big Data But Only Had About 20
Minutes to Learn
Insert some very Big Data here …
What is it
•Something you cannot drag and drop
•Something you cannot expect to process in a reasonable amount of time on your machines
•Something that needs purpose-built algorithms to work with
It is not just a matter of volume ...
There are many other key aspects
•Data must be processed in a short time frame
•Data sets differ from traditional relational and non-relational ones, and include machine and social data
•The wide availability of open source computational and mathematical tools now reaches beyond academia
•It's the second iteration of the feedback process of open source tools that are now available as a commodity
•Data visualization tools are an accelerator for the movement
How do I commoditize Big Data
-2004: MapReduce Whitepaper (Google)
- Described the MapReduce algorithm
- Kind of a big deal
-Many were already doing this; it's a very basic prescription
-Specification for easy extensibility
-THIS was the big deal
-Google's vision for clean extension points and design drove
the Big Data movement
A Bit of History: MapReduce
-2007: Apache Hadoop
-First and still most significant OSS Big Data engine
-Originally built by Yahoo!
-“Hadoop” now used to refer both to Hadoop itself and the
large ecosystem of supporting technologies
-Dominant in the market now, but there are new contenders
-Named after a developer's son's stuffed elephant
A Bit of History: Hadoop
MapReduce: What Does It Do
•MAP
•Iterate over records
•Emit (0, 1, or n) key-value pairs (KVPs) for each
•Word Count:
•Input: “Let's reduce map reduce”
•Output: (“Let's”: 1), (“reduce”: 1), (“map”: 1), (“reduce”: 1)
•REDUCE
•Gather all the KVPs for each key together
•Apply some function to all of each key's values and emit something for each key
•Word Count:
•Input: {“Let's”: [1], “map”: [1], “reduce”: [1, 1]}
•Output: {“Let's”: 1, “map”: 1, “reduce”: 2}
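The word count above can be sketched in a few lines of plain Python. This is only an illustration of the map, gather, and reduce steps on a single machine, not Hadoop code; the function names map_words, shuffle, and reduce_counts are made up for this example.

```python
from collections import defaultdict

def map_words(record):
    """MAP: emit one (word, 1) pair for every word in the record."""
    return [(word, 1) for word in record.split()]

def shuffle(pairs):
    """Gather all the values emitted for the same key together."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)

def reduce_counts(key, values):
    """REDUCE: apply a function (here, sum) to all values of one key."""
    return key, sum(values)

pairs = map_words("Let's reduce map reduce")
grouped = shuffle(pairs)
counts = dict(reduce_counts(key, values) for key, values in grouped.items())
print(counts)
```

The intermediate values match the slide: pairs holds four (word, 1) tuples, grouped maps "reduce" to [1, 1], and counts maps "reduce" to 2.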
So... It's... GROUP BY.
•Yes, it is kinda GROUP BY.
•You are now authorized to
laugh at Big Data engineers.
•It is, however, VERY easy to
parallelize.
•M Mappers can be run against
any amount of data on any
number of nodes, in small
chunks
•N Reducers only have to deal
with the data for any one key at
a time
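To make the parallelism point concrete, here is a rough sketch in which each "mapper" handles its own small chunk with no knowledge of the others, and the partial results are merged per key afterwards. It uses a local thread pool purely as a stand-in for cluster nodes; chunk_size, workers, and the function names are assumptions for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def map_chunk(lines):
    """One mapper: count the words in its own chunk, independently of other chunks."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def word_count(lines, chunk_size=2, workers=4):
    # Split the input into small chunks, one per mapper invocation.
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    # Reduce: each key's partial counts are merged without looking at other keys.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)

result = word_count(["a b", "b c", "c c", "a"])
print(result)
```

Because no mapper depends on another mapper's chunk, adding more chunks or more workers does not change the logic, only the degree of parallelism.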
MapReduce Extension Points (per Hadoop MapReduce Interface)
•An Input Reader
•Divides data into “splits” (1 per
mapper)
•Usually 16-128MB
•A Map Function
•A Combiner Function
•Just a reduce function within a mapper
process
•With a combiner, mappers only emit
one KVP per key
•A Partition Function
•Determines which key goes to which
reducer
•Default is hash(key) % len(reducers)
•(Optional) A Compare Function
•Orders final output
•A Reduce Function
•An Output Writer
•By default, writes one file per reducer and
just dumps text
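Two of the extension points above, the combiner and the partition function, can be sketched as follows. This is an illustration in plain Python rather than Hadoop's Java interfaces; zlib.crc32 is used here instead of Python's built-in hash() only so that the result is stable across runs, and the number of reducers (3) is an arbitrary choice.

```python
import zlib
from collections import Counter

def partition(key, num_reducers):
    """Partition function: hash(key) % number of reducers."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def combine(pairs):
    """Combiner: a reduce step inside the mapper, leaving one pair per key."""
    combined = Counter()
    for key, value in pairs:
        combined[key] += value
    return sorted(combined.items())

mapper_output = [("reduce", 1), ("map", 1), ("reduce", 1)]
combined = combine(mapper_output)

# Route each combined pair to the reducer chosen by the partition function.
buckets = {}
for key, value in combined:
    buckets.setdefault(partition(key, 3), []).append((key, value))
print(combined, buckets)
```

Every occurrence of a given key lands on the same reducer, which is what lets a reducer deal with one key at a time.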
MapReduce Abstraction Layers
•Hive (SQL-like)
•DROP TABLE IF EXISTS words;
•CREATE TABLE words (text string) row format delimited fields terminated by '\n' stored as textfile;
•LOAD DATA LOCAL INPATH 'data_path' OVERWRITE INTO TABLE words;
•SELECT word, COUNT(*) FROM words LATERAL VIEW explode(split(text, ' ')) lTable AS word GROUP BY word;
•Pig (relational flow)
•raw_input = LOAD './input.txt';
•words = FOREACH raw_input GENERATE FLATTEN(TOKENIZE((chararray)$0)) AS word;
•grouped = GROUP words BY word;
•counted = FOREACH grouped GENERATE group, COUNT(words);
•STORE counted INTO './wordcount';
Hadoop: HDFS
Hadoop Distributed File System
•Large block size
•128MB default
•Replication
•3 default, 512 max
•Strictly separate from logic – can be used with any algo
•Giraph: Graph Processing
•Mahout: Machine Learning
•The name node tracks data blocks and replication
•Data nodes hold data
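As a quick worked example of what the defaults above imply, the sketch below computes how many blocks a file is split into and how much raw storage its replicas use. The 1000 MB file size is an arbitrary assumption, and the helper name hdfs_footprint is made up for this illustration.

```python
import math

BLOCK_SIZE_MB = 128   # default block size quoted on the slide
REPLICATION = 3       # default replication factor quoted on the slide

def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (blocks, block replicas, raw MB stored) for one file."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    replicas = blocks * replication
    # The last block only occupies the data it holds, so raw usage is size x replication.
    raw_mb = file_size_mb * replication
    return blocks, replicas, raw_mb

footprint = hdfs_footprint(1000)
print(footprint)  # (8, 24, 3000)
```

So a 1000 MB file becomes 8 blocks, the name node tracks 24 block replicas, and the data nodes together hold about 3000 MB.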
Hadoop: Data Processing
•JobTracker tasks (master node)
•Breaks jobs (whole dataset) into tasks (one mapper or reducer)
•Assigns tasks to data nodes
•Tracks progress to completion
•Retries failed tasks a configurable number of times
•Allows Hadoop clusters to be run on error-prone commodity hardware
•TaskTracker tasks (on each data node)
•Tracks its own map and reduce tasks
•Transfers data to other nodes as needed
•Each data node has slots for map and reduce tasks (to be run in JVMs)
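The "retry failed tasks a configurable number of times" behaviour is what makes error-prone hardware tolerable. A minimal sketch of the idea follows; it is not Hadoop's scheduler, and max_attempts=4 as well as the simulated failure are assumptions made for the example.

```python
def run_with_retries(task, max_attempts=4):
    """Run task(); on failure, try again until max_attempts is reached."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task(), attempt
        except Exception as error:  # a real scheduler would be more selective
            last_error = error
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error

def make_flaky_task(failures_before_success):
    """Build a task that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def task():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise IOError("simulated node failure")
        return "done"

    return task

outcome = run_with_retries(make_flaky_task(2))
print(outcome)  # ('done', 3)
```

A task that keeps failing past the limit surfaces as an error for the whole job instead of being retried forever.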
Hadoop: The Ecosystem
•Oozie: Workflow manager
(chained jobs)
•Data pipelining: Flume, Scribe,
Kafka
•RDBMS integration: Sqoop
•Tabular interface for unstructured data: HCatalog
•M/R Abstraction: Pig, Hive
•SO MANY OTHERS
OpenStack: take a look at the best place
to host your Big Data platform
Why does the world need OpenStack?
● Cloud is widely seen as the next-generation IT delivery model
o Agile & Flexible
o Utility-based on-demand consumption
o Self-service driving down administrative overhead and
maintenance
● Public clouds are setting the benchmark of how IT could be delivered to
users
o Not all organisations are ready for public cloud
● Applications are being written differently today-
o More tolerant of failure
o Making use of scale-out architecture
● Our data is too large
o Volumes of data are being generated at unprecedented levels
o Most of this data is unstructured
● Service requests are too large
o More and more devices are coming online
o Tablets, phones, laptops, BYOD generation…
● Crucially, applications weren’t written to cope with the demand!
o Traditional infrastructure capabilities are being exhausted
o Service uptime, QoS, KPIs and SLAs are slipping
Major issues with traditional infrastructure…
Workloads are evolving…
Traditional workloads
● Typically each tier resides on a single machine
● Doesn’t tolerate any downtime
● Relies on underlying infrastructure for availability
● Applications scale-up, not out
Cloud-enabled workloads
● Workload resides across multiple machines
● Applications built to tolerate failure
● Does not rely on underlying infrastructure
● Applications scale-out, not up
Or an easier analogy...
PETS = TRADITIONAL WORKLOADS
● Pets are given names like lasy.internal.redhat.com
● They are unique, lovingly hand raised and cared for
● When they get ill you nurse them back to health
FARM ANIMALS = CLOUD WORKLOADS
● Farm animals have tag numbers like piggie242.redhat.com
● They are almost identical to each other
● When they get ill you get another one
OpenStack is typically suitable for the following use cases —
● A public cloud-like Infrastructure-as-a-Service cloud platform
o Internal “Infrastructure on Demand” - private cloud
o Test and Development environments - e.g. sandbox
o Cloud service provider platform - reselling compute, network &
storage
● Building a scale-out platform for cloud-enabled workloads
o Web-scale applications, e.g. NetFlix-like, photo/video-streaming
o Academic or pharma workloads, e.g. genetic sequencing
So, how does OpenStack fit in?
•OpenStack is made up of individual autonomous components
•All of which are designed to scale-out to accommodate throughput and
availability
•OpenStack is considered more of a framework, that relies on drivers and
plugins
•Largely written in Python and is heavily dependent on Linux
OpenStack Architecture
•Keystone provides a common authentication and authorisation store for OpenStack
•Responsible for users, their roles, and the project(s) they belong to
•Provides a catalogue of all other OpenStack services
•All OpenStack services typically rely on Keystone to verify a user’s request
OpenStack Identity Service (Keystone)
•Nova is responsible for the lifecycle of running instances within OpenStack
•Manages multiple different hypervisor types via drivers, e.g.:
•Red Hat Enterprise Linux (+KVM)
•VMware vSphere
OpenStack Compute (Nova)
•Glance provides a mechanism for the storage and retrieval of disk
images/templates
•Supports a wide variety of image formats, including qcow2, vmdk, ami, vhd
and ova
•Many different backend storage options for images, including Swift…
OpenStack Image Service (Glance)
•Swift provides a mechanism for storing and retrieving arbitrary unstructured data
•Provides an object based interface via a RESTful/HTTP-based API
•Highly fault-tolerant with replication, self-healing, and load-balancing
•Architected to be implemented using commodity compute and storage
OpenStack Object Store (Swift)
•Neutron is responsible for providing networking to running instances within
OpenStack
•Provides an API for defining, configuring, and using networks
•Relies on a plugin architecture for implementation of networks; examples include:
•Open vSwitch (default in Red Hat’s distribution)
•Cisco, PLUMgrid, VMware NSX, Arista, Mellanox, Brocade, etc.
OpenStack Networking (Neutron)
•Cinder provides block storage to instances running within OpenStack
•Used for providing persistent and/or additional storage
•Relies on a plugin/driver architecture for implementation; examples include:
• Red Hat Storage (GlusterFS), IBM XIV, HP LeftHand, 3PAR, etc.
OpenStack Volume Service (Cinder)
•Heat facilitates the creation of ‘application stacks’ made from multiple resources
•Stacks are defined in a declarative template language and imported into Heat
•Heat manages the automated orchestration of resources and their dependencies
•Allows for dynamic scaling of applications based on configurable metrics
OpenStack Orchestration (Heat)
•Ceilometer is a central collection point for metering and monitoring data
•Primarily used for chargeback of resource usage
•Ceilometer consumes data from the other components - e.g. via agents
•Architecture is completely extensible - meter what you want to - expose via API
OpenStack Telemetry (Ceilometer)
•Horizon is OpenStack’s web-based self-service portal
•Sits on-top of all of the other OpenStack components via API interaction
•Provides a subset of underlying functionality
•Examples include: instance creation, network configuration, block storage attachment
•Exposes an administrative extension for basic tasks, e.g. user creation
OpenStack Dashboard (Horizon)
•All OpenStack components expose a RESTful API for communication
•A stateless, shared-nothing API service provides scalability and fault-tolerance
•Keystone manages a list of these API endpoints in its catalog
Common OpenStack Architecture
Common OpenStack Architecture
[Diagram: "Where’s Nova?" is answered with http://server0:8773; a load balancer (LB) sits in front of server0:8773, server1:8773, server2:8773 and server3:8773]
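The catalog lookup and the load-balanced, stateless API servers from this slide can be sketched as below. This is a toy in-memory model, not Keystone's catalog format or a real load balancer; the ServiceCatalog class and the round-robin policy are assumptions, while the host names and port are the ones shown in the diagram.

```python
import itertools

class ServiceCatalog:
    """Toy stand-in for the list of API endpoints that Keystone manages."""

    def __init__(self):
        self._endpoints = {}

    def register(self, service, url):
        self._endpoints.setdefault(service, []).append(url)

    def lookup(self, service):
        return list(self._endpoints[service])

def load_balancer(backends):
    """Round-robin over stateless API backends; any of them can serve any request."""
    return itertools.cycle(backends)

catalog = ServiceCatalog()
catalog.register("nova", "http://server0:8773")

backends = ["server0:8773", "server1:8773", "server2:8773", "server3:8773"]
lb = load_balancer(backends)
first_five = [next(lb) for _ in range(5)]
print(catalog.lookup("nova"), first_five)
```

Because the API services share nothing, a fifth request can simply wrap around to the first backend again.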
•In addition to providing API services, each component has a set of workers
•These workers actually do the heavy lifting behind the scenes
•Workers (and API services) scale-out and communicate using a message bus
(RabbitMQ)
•Example with Nova:
Common OpenStack Architecture
[Diagram: Nova API and three Nova Compute workers connected through RabbitMQ (AMQP)]
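The API-plus-workers pattern can be imitated locally with a queue and a few threads. In this sketch queue.Queue stands in for RabbitMQ and the worker threads stand in for Nova Compute processes; none of it is Nova code, and the names compute_worker, handle_requests, and the "vm-*" requests are invented for the example.

```python
import queue
import threading

def compute_worker(name, bus, results):
    """Worker: take messages off the bus until a None sentinel arrives."""
    while True:
        message = bus.get()
        if message is None:
            break
        results.put((name, f"booted {message}"))

def handle_requests(requests, num_workers=3):
    bus = queue.Queue()       # stands in for the message bus
    results = queue.Queue()
    workers = [
        threading.Thread(target=compute_worker, args=(f"compute-{i}", bus, results))
        for i in range(num_workers)
    ]
    for thread in workers:
        thread.start()
    for request in requests:  # the API service only enqueues work
        bus.put(request)
    for _ in workers:         # one sentinel per worker so each one stops
        bus.put(None)
    for thread in workers:
        thread.join()
    return sorted(results.get()[1] for _ in range(len(requests)))

handled = handle_requests(["vm-a", "vm-b", "vm-c", "vm-d"])
print(handled)
```

Which worker picks up which request is not deterministic, and that is the point: adding workers increases throughput without the API service knowing about them.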
• OpenStack services store state information in a SQL-based database, default is MySQL
• Each service can use its own database infrastructure or share a common platform
• For resilience and throughput, replicated multi-master databases can be implemented
• Example with Keystone:
Common OpenStack Architecture
[Diagram: Keystone Server behind a load balancer (LB), backed by multi-master replication using Galera]
• OpenStack services check a user’s request with Keystone for both authentication and authorisation
• Example with Nova:
Common OpenStack Architecture
[Diagram: a "Launch an Instance" request reaches Nova API; Nova API asks Keystone Server 1) Are they authenticated? 2) Are they allowed to launch an instance?; the answer is Success/Fail]
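The two questions in this flow, authentication first and authorisation second, can be sketched as follows. The tokens, users, roles, and function names are all hypothetical and the lookup tables are plain dictionaries; this is not how Keystone stores or validates anything, only the order of the checks.

```python
# Hypothetical data standing in for Keystone's user, token, and role store.
TOKENS = {"token-alice": "alice", "token-bob": "bob"}
ROLES = {"alice": {"member"}, "bob": {"reader"}}
REQUIRED_ROLE = {"launch_instance": "member"}

def keystone_check(token, action):
    """Answer both questions: is the caller authenticated, and are they allowed?"""
    user = TOKENS.get(token)
    if user is None:
        return False, "not authenticated"
    if REQUIRED_ROLE[action] not in ROLES.get(user, set()):
        return False, "not authorised"
    return True, user

def nova_launch_instance(token):
    """The API service defers the decision before doing any work."""
    allowed, detail = keystone_check(token, "launch_instance")
    return "Success" if allowed else f"Fail: {detail}"

print(nova_launch_instance("token-alice"))
print(nova_launch_instance("token-bob"))
print(nova_launch_instance("token-unknown"))
```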
OpenStack Architecture
OpenStack Sahara, or what we were supposed to talk about today
Hadoop without Sahara: the challenges
•Hadoop clusters are difficult to configure, and few people have the expert knowledge to do it well
•Commodity hardware is cheap but requires frequent (costly, expert) maintenance
•Demand for data processing varies over time, even with sophisticated scheduling
•Bare-metal Hadoop cluster nodes can fail, leading to a loss of service
•Many public Big Data services don't give you flexibility
Hadoop with Sahara: beat the challenges
•OpenStack Sahara lets you:
•Deploy Hadoop clusters (predictably and repeatably)
•Scale the deployed clusters
•Define and run jobs
•Use a programmatic API and a web console
•Furthermore:
•It supports many Hadoop distributions
•It is well integrated with other OpenStack services
•It enables you to use Hadoop even with little knowledge about it
Sahara: the project
History:
•Started at Portland Summit
•Incubated in Icehouse
•Integrated in Juno
Main components:
•Sahara REST API
•Python REST Client and Sahara Pages (Integrated with Horizon)
•Elastic Data Processing
•Provisioning Engine
•Vendor Plugins (Vanilla, Intel, Hortonworks, Cloudera, MapR)
Sahara: Architecture
Sahara: Use cases
•Cluster Management (API V1.0)
•On-demand, scalable, persistent clusters
•Supports multiple plugins
•Integrates with Heat, Glance, Nova, Neutron, and Cinder
•EDP (Elastic Data Processing) (API V1.1)
•Supports multiple job types (Java, MR, Hive, Pig, Spark...)
•Supports transient clusters (spin up, process, shut down) or persistent clusters
•Integrates with Swift (optionally) and services on VMs
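The two use cases can be pictured with a small data model: a cluster described by node groups that can be scaled, and a job run that shuts the cluster down afterwards only if it is transient. Everything here (the NodeGroup and Cluster classes, scale, run_job, and the example names) is a hypothetical model written for this illustration; it does not call or mirror the Sahara REST API or its Python client.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class NodeGroup:
    name: str
    count: int
    processes: tuple

@dataclass(frozen=True)
class Cluster:
    name: str
    plugin: str
    node_groups: tuple
    transient: bool = False

    def total_instances(self):
        return sum(group.count for group in self.node_groups)

def scale(cluster, group_name, new_count):
    """Cluster management: return a copy with one node group resized."""
    groups = tuple(
        replace(group, count=new_count) if group.name == group_name else group
        for group in cluster.node_groups
    )
    return replace(cluster, node_groups=groups)

def run_job(cluster, job):
    """EDP: a transient cluster is shut down after the job, a persistent one stays up."""
    steps = [f"provision {cluster.name}", f"run {job}"]
    if cluster.transient:
        steps.append(f"shut down {cluster.name}")
    return steps

demo = Cluster(
    name="demo",
    plugin="vanilla",
    node_groups=(
        NodeGroup("master", 1, ("namenode",)),
        NodeGroup("worker", 3, ("datanode",)),
    ),
    transient=True,
)
bigger = scale(demo, "worker", 5)
print(demo.total_instances(), bigger.total_instances(), run_job(bigger, "wordcount"))
```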
Sahara: end-user workflow
Questions ?

More Related Content

What's hot

What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?DataWorks Summit
 
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data avanttic Consultoría Tecnológica
 
Hadoop from Hive with Stinger to Tez
Hadoop from Hive with Stinger to TezHadoop from Hive with Stinger to Tez
Hadoop from Hive with Stinger to TezJan Pieter Posthuma
 
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...DataWorks Summit
 
HDInsight Hadoop on Windows Azure
HDInsight Hadoop on Windows AzureHDInsight Hadoop on Windows Azure
HDInsight Hadoop on Windows AzureLynn Langit
 
20150716 introduction to apache spark v3
20150716 introduction to apache spark v3 20150716 introduction to apache spark v3
20150716 introduction to apache spark v3 Andrey Vykhodtsev
 
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...DataStax
 
BlueData and Hortonworks Data Platform (HDP)
BlueData and Hortonworks Data Platform (HDP)BlueData and Hortonworks Data Platform (HDP)
BlueData and Hortonworks Data Platform (HDP)BlueData, Inc.
 
Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...Ontico
 
Apache Flink & Kudu: a connector to develop Kappa architectures
Apache Flink & Kudu: a connector to develop Kappa architecturesApache Flink & Kudu: a connector to develop Kappa architectures
Apache Flink & Kudu: a connector to develop Kappa architecturesNacho García Fernández
 
Enabling the Active Data Warehouse with Apache Kudu
Enabling the Active Data Warehouse with Apache KuduEnabling the Active Data Warehouse with Apache Kudu
Enabling the Active Data Warehouse with Apache KuduGrant Henke
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksDataWorks Summit
 
Leveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioningLeveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioningEvans Ye
 
The Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightThe Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightGert Drapers
 
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on KubernetesApache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on KubernetesDataWorks Summit
 
Ten tools for ten big data areas 02_Tableau
Ten tools for ten big data areas 02_TableauTen tools for ten big data areas 02_Tableau
Ten tools for ten big data areas 02_TableauWill Du
 
Hadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakHadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakSean Roberts
 
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...DataWorks Summit/Hadoop Summit
 

What's hot (20)

What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?
 
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
 
Hadoop from Hive with Stinger to Tez
Hadoop from Hive with Stinger to TezHadoop from Hive with Stinger to Tez
Hadoop from Hive with Stinger to Tez
 
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
 
HDInsight Hadoop on Windows Azure
HDInsight Hadoop on Windows AzureHDInsight Hadoop on Windows Azure
HDInsight Hadoop on Windows Azure
 
20150716 introduction to apache spark v3
20150716 introduction to apache spark v3 20150716 introduction to apache spark v3
20150716 introduction to apache spark v3
 
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
 
BlueData and Hortonworks Data Platform (HDP)
BlueData and Hortonworks Data Platform (HDP)BlueData and Hortonworks Data Platform (HDP)
BlueData and Hortonworks Data Platform (HDP)
 
Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...
 
Admiral Group
Admiral GroupAdmiral Group
Admiral Group
 
Apache Flink & Kudu: a connector to develop Kappa architectures
Apache Flink & Kudu: a connector to develop Kappa architecturesApache Flink & Kudu: a connector to develop Kappa architectures
Apache Flink & Kudu: a connector to develop Kappa architectures
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
Enabling the Active Data Warehouse with Apache Kudu
Enabling the Active Data Warehouse with Apache KuduEnabling the Active Data Warehouse with Apache Kudu
Enabling the Active Data Warehouse with Apache Kudu
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
 
Leveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioningLeveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioning
 
The Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightThe Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsight
 
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on KubernetesApache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
 
Ten tools for ten big data areas 02_Tableau
Ten tools for ten big data areas 02_TableauTen tools for ten big data areas 02_Tableau
Ten tools for ten big data areas 02_Tableau
 
Hadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakHadoop Everywhere & Cloudbreak
Hadoop Everywhere & Cloudbreak
 
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
 

Viewers also liked

از نماینده ایران در WSIS Prizes 2016 حمایت کنید ... متشکریم ...
از نماینده ایران در WSIS Prizes 2016 حمایت کنید ... متشکریم ...از نماینده ایران در WSIS Prizes 2016 حمایت کنید ... متشکریم ...
از نماینده ایران در WSIS Prizes 2016 حمایت کنید ... متشکریم ...Leila Esmaeili
 
20151027 sahara + manila final
20151027 sahara + manila final20151027 sahara + manila final
20151027 sahara + manila finalWei Ting Chen
 
OpenStack Data Processing ("Sahara") project update - December 2014
OpenStack Data Processing ("Sahara") project update - December 2014OpenStack Data Processing ("Sahara") project update - December 2014
OpenStack Data Processing ("Sahara") project update - December 2014Sergey Lukjanov
 
Benchmarking sahara based big data as a service solutions
Benchmarking sahara based big data as a service solutionsBenchmarking sahara based big data as a service solutions
Benchmarking sahara based big data as a service solutionsZhidong Yu
 
20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weitingWei Ting Chen
 
OpenStack Trove Day (19 Aug 2014, Cambridge MA) - Sahara
OpenStack Trove Day (19 Aug 2014, Cambridge MA)  - SaharaOpenStack Trove Day (19 Aug 2014, Cambridge MA)  - Sahara
OpenStack Trove Day (19 Aug 2014, Cambridge MA) - Saharaspinningmatt
 
Hello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopHello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopDataWorks Summit
 
Hadoop on OpenStack - Sahara @DevNation 2014
Hadoop on OpenStack - Sahara @DevNation 2014Hadoop on OpenStack - Sahara @DevNation 2014
Hadoop on OpenStack - Sahara @DevNation 2014spinningmatt
 
آشنایی با جرم‌یابی قانونی رایانه‌ای
آشنایی با جرم‌یابی قانونی رایانه‌ایآشنایی با جرم‌یابی قانونی رایانه‌ای
آشنایی با جرم‌یابی قانونی رایانه‌ایRamin Najjarbashi
 
Cloud Security and Risk Management
Cloud Security and Risk ManagementCloud Security and Risk Management
Cloud Security and Risk ManagementMorteza Javan
 
The Evolution of OpenStack – From Infancy to Enterprise
The Evolution of OpenStack – From Infancy to EnterpriseThe Evolution of OpenStack – From Infancy to Enterprise
The Evolution of OpenStack – From Infancy to EnterpriseRackspace
 
Big Data on OpenStack
Big Data on OpenStackBig Data on OpenStack
Big Data on OpenStackNati Shalom
 

Viewers also liked (13)

از نماینده ایران در WSIS Prizes 2016 حمایت کنید ... متشکریم ...
از نماینده ایران در WSIS Prizes 2016 حمایت کنید ... متشکریم ...از نماینده ایران در WSIS Prizes 2016 حمایت کنید ... متشکریم ...
از نماینده ایران در WSIS Prizes 2016 حمایت کنید ... متشکریم ...
 
20151027 sahara + manila final
20151027 sahara + manila final20151027 sahara + manila final
20151027 sahara + manila final
 
OpenStack Data Processing ("Sahara") project update - December 2014
OpenStack Data Processing ("Sahara") project update - December 2014OpenStack Data Processing ("Sahara") project update - December 2014
OpenStack Data Processing ("Sahara") project update - December 2014
 
Benchmarking sahara based big data as a service solutions
Benchmarking sahara based big data as a service solutionsBenchmarking sahara based big data as a service solutions
Benchmarking sahara based big data as a service solutions
 
20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting
 
OpenStack Trove Day (19 Aug 2014, Cambridge MA) - Sahara
OpenStack Trove Day (19 Aug 2014, Cambridge MA)  - SaharaOpenStack Trove Day (19 Aug 2014, Cambridge MA)  - Sahara
OpenStack Trove Day (19 Aug 2014, Cambridge MA) - Sahara
 
Hello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopHello OpenStack, Meet Hadoop
Hello OpenStack, Meet Hadoop
 
Hadoop on OpenStack - Sahara @DevNation 2014
Hadoop on OpenStack - Sahara @DevNation 2014Hadoop on OpenStack - Sahara @DevNation 2014
Hadoop on OpenStack - Sahara @DevNation 2014
 
Sahara Updates - Kilo Edition
Sahara Updates - Kilo EditionSahara Updates - Kilo Edition
Sahara Updates - Kilo Edition
 
آشنایی با جرم‌یابی قانونی رایانه‌ای
آشنایی با جرم‌یابی قانونی رایانه‌ایآشنایی با جرم‌یابی قانونی رایانه‌ای
آشنایی با جرم‌یابی قانونی رایانه‌ای
 
Cloud Security and Risk Management
Cloud Security and Risk ManagementCloud Security and Risk Management
Cloud Security and Risk Management
 
The Evolution of OpenStack – From Infancy to Enterprise
The Evolution of OpenStack – From Infancy to EnterpriseThe Evolution of OpenStack – From Infancy to Enterprise
The Evolution of OpenStack – From Infancy to Enterprise
 
Big Data on OpenStack
Big Data on OpenStackBig Data on OpenStack
Big Data on OpenStack
 

Sahara presentation latest - Codemotion Rome 2015

  • 1. Dive into Sahara. Davide Del Vecchio, Francesco Vollero, Matteo Bernacchi. Codemotion Rome, 27-28 March 2015; talk dated March 27, 2015.
  • 2. Who are we
      • Davide Del Vecchio: Principal Domain Architect, Middleware; previous experience with analytics and Big Data; background in science; passionate about technology
      • Francesco Vollero: OpenStack Technical Specialist in EMEA; developer background, in OpenStack since Grizzly; core contributor to packstack and openstack-puppet; snooping around other OpenStack components like Sahara; functional-programming-oriented brain :)
      • Matteo Bernacchi: Senior Infrastructure Consultant; experienced in cloud solutions deployment; supporter of FOSS technologies since 2003
  • 3. Agenda
      • An introduction to Big Data
      • An overview of the OpenStack components
      • A (Moderately) Brief Introduction to Sahara
      • Sahara in action
  • 4. Everything You Ever Wanted to Know About Big Data But Only Had About 20 Minutes to Learn
  • 5. What is it (Insert some very Big Data here …)
      • Something you cannot drag and drop
      • Something you cannot expect to process in a reasonable amount of time on your own machines
      • Something that needs purpose-built algorithms to work with
  • 6. It is not just a matter of volume ... There are many other key aspects
      • Data must be processed in a short time frame
      • Data sets differ from traditional relational/non-relational ones and include machine and social data
      • The wide availability of open source computational and mathematical tools goes beyond academia
      • It's the second iteration of the feedback process of open source tools that are now available as a commodity
      • Data visualization tools are an accelerator for the movement
  • 7. How do I commoditize Big Data
  • 8. A Bit of History: MapReduce
      • 2004: MapReduce whitepaper (Google)
      • Described the MapReduce algorithm
      • Kind of a big deal
      • Many were already doing this; it's a very basic prescription
      • Specification for easy extensibility
      • THIS was the big deal
      • Google's vision for clean extension points and design drove the Big Data movement
  • 9. A Bit of History: Hadoop
      • 2007: Apache Hadoop
      • First and still most significant OSS Big Data engine
      • Originally built by Yahoo!
      • “Hadoop” is now used to refer both to Hadoop itself and to the large ecosystem of supporting technologies
      • Dominant in the market now, but there are new contenders
      • Named after a developer's son's stuffed elephant
  • 10. MapReduce: What Does It Do
      • MAP
          • Iterate over records
          • Emit (0, 1, or n) key-value pairs (KVPs) for each
          • Word Count:
              • Input: “Let's reduce map reduce”
              • Output: (“Let's”: 1), (“reduce”: 1), (“map”: 1), (“reduce”: 1)
      • REDUCE
          • Gather all the KVPs for each key together
          • Apply some function to all of each key's values and emit something for each key
          • Word Count:
              • Input: {“Let's”: [1], “map”: [1], “reduce”: [1, 1]}
              • Output: {“Let's”: 1, “map”: 1, “reduce”: 2}
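The word-count walk-through on this slide can be sketched in plain Python. This is a single-process illustration only, not Hadoop code; the function names `map_words`, `shuffle`, and `reduce_counts` are made up for the example.

```python
from collections import defaultdict


def map_words(record):
    """MAP: emit one (word, 1) key-value pair per word in the record."""
    for word in record.split():
        yield word, 1


def shuffle(pairs):
    """Gather all the values emitted for each key into one list."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_counts(grouped):
    """REDUCE: apply sum() to each key's values and emit one result per key."""
    return {key: sum(values) for key, values in grouped.items()}


pairs = list(map_words("Let's reduce map reduce"))
grouped = shuffle(pairs)
counts = reduce_counts(grouped)
```

Here `pairs` corresponds to the MAP output on the slide, `grouped` to the REDUCE input, and `counts` to the final REDUCE output.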
  • 11. So... It's... GROUP BY.
      • Yes, it is kinda GROUP BY.
      • You are now authorized to laugh at Big Data engineers.
      • It is, however, VERY easy to parallelize.
      • M Mappers can be run against any amount of data on any number of nodes, in small chunks
      • N Reducers only have to deal with the data for any one key at a time
  • 12. MapReduce Extension Points (Per Hadoop MapReduce Interface)
      • An Input Reader
          • Divides data into “splits” (1 per mapper)
          • Usually 16-128MB
      • A Map Function
      • A Combiner Function
          • Just a reduce function within a mapper process
          • With a combiner, mappers only emit one KVP per key
      • A Partition Function
          • Determines which key goes to which reducer
          • Default is hash(key) % len(reducers)
      • (Optional) A Compare Function
          • Orders final output
      • A Reduce Function
      • An Output Writer
          • By default, writes one file per reducer and just dumps text
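Two of these extension points, the combiner and the partition function, are easy to illustrate outside Hadoop. The sketch below is a plain-Python approximation: `partition`, `map_with_combiner`, and `route` are hypothetical names, and `zlib.crc32` stands in for the hash only because Python's built-in `hash()` of strings changes between runs.

```python
import zlib
from collections import Counter


def partition(key, num_reducers):
    """Pick a reducer index as hash(key) % number of reducers."""
    # crc32 is an assumption made for reproducibility, not Hadoop's hash.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_with_combiner(record):
    """Map a record, then pre-reduce locally so each key is emitted once."""
    return Counter(record.split())


def route(combined, num_reducers):
    """Send each combined key-value pair to the reducer chosen by partition()."""
    buckets = [dict() for _ in range(num_reducers)]
    for key, count in combined.items():
        buckets[partition(key, num_reducers)][key] = count
    return buckets


combined = map_with_combiner("Let's reduce map reduce")
buckets = route(combined, 2)
```

Without the combiner the mapper would emit “reduce” twice; with it, the mapper emits a single pair for “reduce” with a count of 2, and every occurrence of a key lands on the same reducer.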
  • 13. MapReduce Abstraction Layers
      • Hive (SQL-like)
          DROP TABLE IF EXISTS words;
          CREATE TABLE words (text string) row format delimited fields terminated by '\n' stored as textfile;
          LOAD DATA LOCAL INPATH 'data_path' OVERWRITE INTO TABLE words;
          SELECT word, COUNT(*) FROM words LATERAL VIEW explode(split(text, ' ')) lTable AS word GROUP BY word;
      • Pig (relational flow)
          raw_input = LOAD './input.txt';
          words = FOREACH raw_input GENERATE FLATTEN(TOKENIZE((chararray)$0)) AS word;
          grouped = GROUP words BY word;
          counted = FOREACH grouped GENERATE group, COUNT(words);
          STORE counted INTO './wordcount';
  • 14. Hadoop: HDFS
      • Hadoop Distributed File System
      • Large block size
          • 128MB default
      • Replication
          • 3 default, 512 max
      • Strictly separate from logic: can be used with any algorithm
          • Giraph: Graph Processing
          • Mahout: Machine Learning
      • The name node tracks data blocks and replication
      • Data nodes hold data
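As a rough worked example of what these defaults imply, the sketch below estimates how many blocks and how much raw disk a file needs. The helper name `hdfs_footprint` is made up, and the arithmetic assumes the last block only occupies the bytes actually written, so raw usage is simply file size times the replication factor.

```python
import math

BLOCK_SIZE_MB = 128  # default block size quoted on the slide
REPLICATION = 3      # default replication factor quoted on the slide


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (blocks, block replicas, raw MB stored) for a single file."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    replicas = blocks * replication
    raw_mb = file_size_mb * replication
    return blocks, replicas, raw_mb


blocks, replicas, raw_mb = hdfs_footprint(1024)
```

For a 1 GB (1024 MB) file this gives 8 blocks, 24 block replicas for the name node to track, and 3072 MB of raw storage spread across the data nodes.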
  • 15. Hadoop: Data Processing
      • Namenode tasks
          • Breaks jobs (whole dataset) into tasks (one mapper or reducer)
          • Assigns tasks to data nodes
          • Tracks progress to completion
          • Retries failed tasks a configurable number of times
          • Allows Hadoop clusters to be run on error-prone commodity hardware
      • Datanode tasks
          • Tracks its own map and reduce tasks
          • Transfers data to other nodes as needed
          • Each data node has slots for map and reduce tasks (to be run in JVMs)
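The "retry failed tasks a configurable number of times" idea is what lets a job survive flaky hardware. A minimal sketch of that policy follows, in plain Python rather than Hadoop's scheduler; `run_with_retries`, `make_flaky_task`, and the default of 4 attempts are illustrative assumptions.

```python
def run_with_retries(task, max_attempts=4):
    """Run task() until it succeeds or max_attempts is reached.

    Returns (result, attempts_used); raises RuntimeError when every attempt fails.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task(), attempt
        except Exception as error:  # a real scheduler would be more selective
            last_error = error
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error


def make_flaky_task(failures_before_success):
    """Build a hypothetical task that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def task():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise IOError("simulated node failure")
        return "done"

    return task


result, attempts = run_with_retries(make_flaky_task(2))
```

In this run the task fails twice and succeeds on the third attempt; a task that keeps failing past the limit would surface as a job failure instead.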
  • 16. Hadoop: The Ecosystem
      • Oozie: Workflow manager (chained jobs)
      • Data pipelining: Flume, Scribe, Kafka
      • RDBMS integration: Sqoop
      • Tabular interface for unstructured data: HCatalog
      • M/R Abstraction: Pig, Hive
      • SO MANY OTHERS
  • 17. OpenStack: take a look at the best place to host your Big Data platform
  • 19. Why does the world need OpenStack?
      • Cloud is widely seen as the next-generation IT delivery model
          • Agile & flexible
          • Utility-based on-demand consumption
          • Self-service driving down administrative overhead and maintenance
      • Public clouds are setting the benchmark of how IT could be delivered to users
          • Not all organisations are ready for public cloud
      • Applications are being written differently today:
          • More tolerant of failure
          • Making use of scale-out architecture
  • 20. Major issues with traditional infrastructure…
      • Our data is too large
          • Volumes of data are being generated at unprecedented levels
          • Most of this data is unstructured
      • Service requests are too large
          • More and more devices are coming online
          • Tablets, phones, laptops, BYOD generation…
      • Crucially, applications weren’t written to cope with the demand!
          • Traditional infrastructure capabilities are being exhausted
          • Service uptime, QoS, KPIs and SLAs are slipping
  • 21. Workloads are evolving…
      • Traditional workloads
          • Typically each tier resides on a single machine
          • Doesn’t tolerate any downtime
          • Relies on underlying infrastructure for availability
          • Applications scale up, not out
      • Cloud-enabled workloads
          • Workload resides across multiple machines
          • Applications built to tolerate failure
          • Does not rely on underlying infrastructure
          • Applications scale out, not up
  • 22. Or an easier analogy...
      • PETS = TRADITIONAL WORKLOADS
          • Pets are given names like lasy.internal.redhat.com
          • They are unique, lovingly hand raised and cared for
          • When they get ill you nurse them back to health
      • FARM ANIMALS = CLOUD WORKLOADS
          • Farm animals have tag numbers like piggie242.redhat.com
          • They are almost identical to each other
          • When they get ill you get another one
  • 23. So, how does OpenStack fit in? OpenStack is typically suitable for the following use cases:
      • A public cloud-like Infrastructure-as-a-Service cloud platform
          • Internal “Infrastructure on Demand” - private cloud
          • Test and development environments - e.g. sandbox
          • Cloud service provider platform - reselling compute, network & storage
      • Building a scale-out platform for cloud-enabled workloads
          • Web-scale applications, e.g. Netflix-like, photo/video-streaming
          • Academic or pharma workloads, e.g. genetic sequencing
  • 24. OpenStack Architecture
      • OpenStack is made up of individual autonomous components
      • All of which are designed to scale out to accommodate throughput and availability
      • OpenStack is considered more of a framework that relies on drivers and plugins
      • Largely written in Python and heavily dependent on Linux
  • 25. OpenStack Identity Service (Keystone)
      • Keystone provides a common authentication and authorisation store for OpenStack
      • Responsible for users, their roles, and the project(s) to which they belong
      • Provides a catalogue of all other OpenStack services
      • All OpenStack services typically rely on Keystone to verify a user’s request
  • 26. OpenStack Compute (Nova)
      • Nova is responsible for the lifecycle of running instances within OpenStack
      • Manages multiple different hypervisor types via drivers, e.g.:
          • Red Hat Enterprise Linux (+KVM)
          • VMware vSphere
  • 27. OpenStack Image Service (Glance)
      • Glance provides a mechanism for the storage and retrieval of disk images/templates
      • Supports a wide variety of image formats, including qcow2, vmdk, ami, vhd and ova
      • Many different backend storage options for images, including Swift…
  • 28. OpenStack Object Store (Swift)
      • Swift provides a mechanism for storing and retrieving arbitrary unstructured data
      • Provides an object-based interface via a RESTful/HTTP-based API
      • Highly fault-tolerant with replication, self-healing, and load-balancing
      • Architected to be implemented using commodity compute and storage
  • 29. OpenStack Networking (Neutron)
      • Neutron is responsible for providing networking to running instances within OpenStack
      • Provides an API for defining, configuring, and using networks
      • Relies on a plugin architecture for implementation of networks, examples include:
          • Open vSwitch (default in Red Hat’s distribution)
          • Cisco, PLUMgrid, VMware NSX, Arista, Mellanox, Brocade, etc.
  • 30. OpenStack Volume Service (Cinder)
      • Cinder provides block storage to instances running within OpenStack
      • Used for providing persistent and/or additional storage
      • Relies on a plugin/driver architecture for implementation, examples include:
          • Red Hat Storage (GlusterFS), IBM XIV, HP LeftHand, 3PAR, etc.
  • 31. OpenStack Orchestration (Heat)
      • Heat facilitates the creation of ‘application stacks’ made from multiple resources
      • Stacks are imported as templates written in a descriptive template language
      • Heat manages the automated orchestration of resources and their dependencies
      • Allows for dynamic scaling of applications based on configurable metrics
  • 32. OpenStack Telemetry (Ceilometer)
      • Ceilometer is a central collection point for metering and monitoring data
      • Primarily used for chargeback of resource usage
      • Ceilometer consumes data from the other components - e.g. via agents
      • Architecture is completely extensible - meter what you want to - expose via API
  • 33. OpenStack Dashboard (Horizon)
      • Horizon is OpenStack’s web-based self-service portal
      • Sits on top of all of the other OpenStack components via API interaction
      • Provides a subset of underlying functionality
      • Examples include: instance creation, network configuration, block storage attachment
      • Exposes an administrative extension for basic tasks, e.g. user creation
  • 34. Common OpenStack Architecture
      • All OpenStack components expose a RESTful API for communication
      • A stateless, shared-nothing API service provides scalability and fault-tolerance
      • Keystone manages a list of these API endpoints in its catalog
  • 35. Common OpenStack Architecture (diagram labels: “Where’s Nova?”, http://server0:8773, LB, server0:8773, server1:8773, server2:8773, server3:8773)
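One way to read this diagram: a client looks up the compute endpoint in the Keystone catalog, gets a single address, and a load balancer behind that address spreads requests over stateless API servers. The sketch below is a toy model of that reading, not Keystone or Nova code; the assumption that server0 is the load balancer and server1 to server3 are the backends is ours, as are the names `CATALOG`, `find_endpoint`, and `make_round_robin`.

```python
from itertools import cycle

# Hypothetical catalog: service type -> public endpoint (address taken from the slide).
CATALOG = {"compute": "http://server0:8773"}

# Assumed API servers sitting behind the load balancer.
BACKENDS = ["server1:8773", "server2:8773", "server3:8773"]


def find_endpoint(service_type):
    """Answer "Where's Nova?" by looking the service up in the catalog."""
    return CATALOG[service_type]


def make_round_robin(backends):
    """Return a picker that hands out backends in rotation."""
    pool = cycle(backends)
    return lambda: next(pool)


pick_backend = make_round_robin(BACKENDS)
endpoint = find_endpoint("compute")
first_four = [pick_backend() for _ in range(4)]
```

Because the API services are stateless and shared-nothing, any backend can serve any request, which is why a simple rotation like this is enough for the illustration.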
  • 36. Common OpenStack Architecture
      • In addition to providing API services, each component has a set of workers
      • These workers actually do the heavy lifting behind the scenes
      • Workers (and API services) scale out and communicate using a message bus (RabbitMQ)
      • Example with Nova (diagram): Nova API and three Nova Compute workers connected through RabbitMQ (AMQP)
  • 39. Common OpenStack Architecture
      • OpenStack services store state information in a SQL-based database; the default is MySQL
      • Each service can use its own database infrastructure or share a common platform
      • For resilience and throughput, replicated multi-master databases can be implemented
      • Example with Keystone (diagram): Keystone Server, LB, Multi-Master Replication Using Galera
  • 40. Common OpenStack Architecture
      • OpenStack services check a user's request with Keystone for both authentication and authorisation
      • Example with Nova (diagram): Nova API receives “Launch an Instance” and asks Keystone Server: 1) Are they authenticated? 2) Are they allowed to launch an instance? Keystone answers Success/Fail
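The two questions in this diagram can be sketched as a two-step check. This is a toy model with in-memory dictionaries, not the Keystone API; the token strings, role names, and the `check_request` helper are all hypothetical.

```python
# Hypothetical identity data: token -> user, user -> roles, action -> roles allowed.
TOKENS = {"token-alice": "alice", "token-bob": "bob"}
ROLES = {"alice": {"member"}, "bob": {"reader"}}
POLICY = {"launch_instance": {"member", "admin"}}


def check_request(token, action):
    """Return (allowed, reason) after authentication and authorisation checks."""
    user = TOKENS.get(token)
    if user is None:
        return False, "not authenticated"  # 1) Are they authenticated?
    if not ROLES.get(user, set()) & POLICY.get(action, set()):
        return False, "not authorized"     # 2) Are they allowed to do this?
    return True, "success"


alice = check_request("token-alice", "launch_instance")
bob = check_request("token-bob", "launch_instance")
stranger = check_request("token-unknown", "launch_instance")
```

Only the first request would reach the point where an instance is actually launched; the other two stop at the authorisation and authentication steps respectively.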
  • 41. OpenStack Architecture
  • 43. ROME 27-28 march 2015 - Speaker’s name OpenStack Sahara, or what we supposed to talk about today
  • 44. ROME 27-28 march 2015 - Speaker’s name Hadoop without Sahara: the challenges •Hadoop clusters are difficult to configure and few have the expert knowledge to do fine •Commodity hardware is cheap but requires frequent (costly, expert) maintenance •Demand for data processing varies over time, even with sophisticated scheduling •Baremetal Hadoop cluster nodes can fail, leading to a loss of service •Many public BigData services don't give you flexibility
  • 45. Hadoop with Sahara: beat the challenges • OpenStack Sahara lets you: • Deploy Hadoop clusters (predictably and repeatably) • Scale the deployed clusters • Define and run jobs • Use a programmatic API and a web console • Furthermore: • It supports many Hadoop distributions • It is well integrated with other OpenStack services • It lets you use Hadoop even with little knowledge of it
  • 46. Sahara: the project • History: • Started at the Portland Summit • Incubated in Icehouse • Integrated in Juno • Main components: • Sahara REST API • Python REST client and Sahara pages (integrated with Horizon) • Elastic Data Processing • Provisioning engine • Vendor plugins (Vanilla, Intel, Hortonworks, Cloudera, MapR)
  • 47. Sahara: Architecture
  • 48. Sahara: use cases • Cluster management (API v1.0) • On-demand, scalable, persistent clusters • Supports multiple plugins • Integrates with Heat, Glance, Nova, Neutron, and Cinder • EDP (Elastic Data Processing) (API v1.1) • Supports multiple job types (Java, MR, Hive, Pig, Spark...) • Supports transient clusters (spin up, process, shut down) or persistent clusters • Integrates with Swift (optionally) and services on VMs
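The transient-cluster lifecycle (spin up, process, shut down) can be sketched as a context manager that always tears the cluster down, even when the job fails. Everything here is an assumption made for illustration: `FakeSahara` is an in-memory stand-in, not the real Sahara client, and its method names are hypothetical.

```python
from contextlib import contextmanager


class FakeSahara:
    """Hypothetical in-memory stand-in; NOT the real Sahara client."""

    def __init__(self):
        self.active = set()  # names of clusters currently running
        self.log = []        # human-readable record of each step

    def create_cluster(self, name, workers):
        self.active.add(name)
        self.log.append(f"create {name} ({workers} workers)")
        return name

    def run_job(self, cluster, job):
        if cluster not in self.active:
            raise RuntimeError(f"cluster {cluster} is not running")
        self.log.append(f"run {job} on {cluster}")
        return f"{job}: done"

    def delete_cluster(self, name):
        self.active.discard(name)
        self.log.append(f"delete {name}")


@contextmanager
def transient_cluster(client, name, workers):
    """Spin up a cluster, hand it to the caller, and always shut it down."""
    cluster = client.create_cluster(name, workers)
    try:
        yield cluster
    finally:
        client.delete_cluster(cluster)


def run_transient_job(client, job):
    with transient_cluster(client, "tmp-cluster", workers=3) as cluster:
        return client.run_job(cluster, job)
```

A persistent cluster would skip the `finally` teardown and be reused across jobs; the transient form trades start-up time for not holding resources between runs.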
  • 49. Sahara: end-user workflow
  • 51. Questions?

Editor's Notes

  1. OVF was created by the DMTF, which was originally a group of organizations including VMware, HP, IBM, Dell, Microsoft, and XenSource. VMware began using it in 2007, although the final specification was completed in 2008.
  2. A really quick plug of Red Hat's OpenStack distribution - an enterprise-class, fully supported release. Built specifically for and tightly integrated with RHEL - the #1 enterprise Linux distribution. We follow the upstream six-month release cadence but take two months - enterprise-class support from the #1 corporate contributor, and this isn't just OpenStack - it's Linux too.