Sergio Bossa is a software architect and engineer who has worked on online gambling and casino software. He is an open source enthusiast who has contributed to projects like Spring, Terracotta, and Terrastore. Terrastore is a consistent, distributed, scalable document database for developers, written in Java on top of Terracotta. It offers easy installation, no complex configuration, and simple basic operations such as putting documents into buckets and getting them back. It also supports range queries, predicate queries, server-side updates, and easy scale-out. Terrastore is best suited for data hot spots, computational data, complex or variable data, and throw-away data.
Terrastore - A document database for developers
1. Sergio Bossa @sbtourist
Terrastore
A document database for developers
2. About Me
Software architect and engineer
Gioco Digitale (online gambling and casinos).
Long time open source enthusiast and contributor
Spring.
Taconite.
Terracotta.
Actorom, Terrastore ...
(Micro)-Blogger
http://twitter.com/sbtourist
http://sbtourist.blogspot.com
3. NOSQL ... what?
1998
Just a tiny non-relational database.
2009
Non-relational databases as a replacement for
relational ones?
Nowadays ...
Not Only SQL.
Use the right tool for your job.
4. NOSQL ... why?
When you're in trouble with ...
Data Model.
Relational mismatch.
Variable schema.
Data Access Patterns.
Expensive joins.
Denormalized data.
Scalability.
More data.
More processing.
19. No impedance mismatch
Java:
public class Character {
private String name;
private List<Character> friends;
private List<Character> foes;
// ...
}
JSON:
{"name" : "Spider-man",
"friends" : [{"name" : "Iceman"}],
"foes" : [{"name" : "Kingpin"}]}
20. Simple basic operations
Put documents in buckets ...
PUT /bucket/key
Content-Type: application/json
{...}
Get documents from buckets ...
GET /bucket/key
Content-Type: application/json
{...}
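The two operations above can be sketched with the JDK's own HTTP client types. This is a minimal illustration, not Terrastore's client library: the base address http://localhost:8080 is borrowed from the client example later in the deck, and the bucket name, key, and document are made up. The requests are only built and printed, never sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class BasicOperations {
    // Assumed server address (taken from the deck's Java client example).
    static final String BASE = "http://localhost:8080";

    // Build a PUT request that stores a JSON document under /bucket/key.
    static HttpRequest put(String bucket, String key, String json) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + bucket + "/" + key))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // Build a GET request that reads the document stored under /bucket/key.
    static HttpRequest get(String bucket, String key) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + bucket + "/" + key))
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical bucket, key, and document, for illustration only.
        HttpRequest put = put("people", "1", "{\"name\" : \"Sergio\"}");
        HttpRequest get = get("people", "1");
        System.out.println(put.method() + " " + put.uri());
        System.out.println(get.method() + " " + get.uri());
    }
}
```

Passing either request to java.net.http.HttpClient.send would execute it against a running server.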
21. Range queries
Find documents in bucket with keys in a given range
...
GET /bucket/range?comparator=comparator_name&
startKey=start_key&
endKey=end_key&
timeToLive=max_age
Content-Type: application/json
{...}
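A range query URL could be assembled as in the sketch below. The parameter names come from the slide; the base address, bucket, key values, and the comparator name "my-comparator" are placeholders. The unit of timeToLive is not stated in the deck, so the value is passed through as a plain number.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class RangeQuery {
    // Percent-encode a single query parameter value.
    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Build /bucket/range?comparator=...&startKey=...&endKey=...&timeToLive=...
    static URI rangeUri(String base, String bucket, String comparator,
                        String startKey, String endKey, long timeToLive) {
        String query = "comparator=" + enc(comparator)
                + "&startKey=" + enc(startKey)
                + "&endKey=" + enc(endKey)
                + "&timeToLive=" + timeToLive;
        return URI.create(base + "/" + bucket + "/range?" + query);
    }

    public static void main(String[] args) {
        // All argument values here are hypothetical.
        System.out.println(rangeUri("http://localhost:8080", "people",
                "my-comparator", "a", "m", 1000));
    }
}
```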
23. What if ... custom comparators?
@AutoDetect(name="my-comparator")
public class MyComparator implements
terrastore.store.operators.Comparator {
public int compare(String key1, String key2) {
// ...
}
}
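To show what the body of such a comparator might look like, here is a sketch that orders keys by numeric value instead of lexically. So that it compiles without Terrastore on the classpath, it uses a local stand-in interface (KeyComparator, a hypothetical name) with the same compare(String, String) method shown on the slide; the @AutoDetect annotation is left out for the same reason.

```java
import java.util.ArrayList;
import java.util.List;

class NumericKeyComparatorDemo {
    // Stand-in for terrastore.store.operators.Comparator, reduced to the slide's one method.
    interface KeyComparator {
        int compare(String key1, String key2);
    }

    // Orders keys by numeric value, so "9" sorts before "10".
    static class NumericKeyComparator implements KeyComparator {
        public int compare(String key1, String key2) {
            return Long.compare(Long.parseLong(key1), Long.parseLong(key2));
        }
    }

    // Return a sorted copy of the keys using the comparator above.
    static List<String> sorted(List<String> keys) {
        List<String> copy = new ArrayList<>(keys);
        KeyComparator comparator = new NumericKeyComparator();
        copy.sort(comparator::compare);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sorted(List.of("10", "9", "100"))); // prints [9, 10, 100]
    }
}
```

A lexical ordering would place "10" and "100" before "9", which is why a range query over numeric keys needs a comparator like this.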
24. Predicate queries
Find documents in bucket satisfying a given predicate
condition ...
GET /bucket/predicate?
predicate=type:expression
Content-Type: application/json
{...}
25. Conditional put/get
Conditionally put documents in buckets ...
PUT /bucket/key?
predicate=type:expression
Content-Type: application/json
{...}
Conditionally get documents from buckets ...
GET /bucket/key?
predicate=type:expression
Content-Type: application/json
{...}
26. Built-in predicate conditions
JXPath
Based on XPath.
Find people whose name is 'Sergio':
jxpath:/name[.='Sergio']
JavaScript
Applies a JavaScript-like condition.
Find people whose name is 'Sergio':
js:value.name=='Sergio'
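Predicate expressions contain characters such as ':', '=', '[' and quotes that are not safe in a query string, so a client would normally percent-encode them. The sketch below builds the predicate query URL for the two expressions above. The base address and bucket name are placeholders, and it is an assumption that the server accepts standard form encoding for this parameter.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class PredicateQuery {
    // Build /bucket/predicate?predicate=type:expression with the value percent-encoded.
    static URI predicateUri(String base, String bucket, String type, String expression) {
        String predicate = URLEncoder.encode(type + ":" + expression, StandardCharsets.UTF_8);
        return URI.create(base + "/" + bucket + "/predicate?predicate=" + predicate);
    }

    public static void main(String[] args) {
        // Hypothetical base address and bucket; expressions are the ones from the slide.
        String base = "http://localhost:8080";
        System.out.println(predicateUri(base, "people", "jxpath", "/name[.='Sergio']"));
        System.out.println(predicateUri(base, "people", "js", "value.name=='Sergio'"));
    }
}
```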
27. What if ... custom conditions?
@AutoDetect(name="my-condition")
public class MyCondition implements
terrastore.store.operators.Condition {
public boolean isSatisfied(String key,
Map<String, Object> value, String expression) {
// ...
}
}
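As an illustration of a condition body, the sketch below treats the expression as "field=expected" and checks one top-level field of the document. The expression syntax is invented for this example, and Condition here is a local stand-in with the isSatisfied signature from the slide, so the code compiles without Terrastore.

```java
import java.util.Map;

class FieldEqualsConditionDemo {
    // Stand-in for terrastore.store.operators.Condition, same method as on the slide.
    interface Condition {
        boolean isSatisfied(String key, Map<String, Object> value, String expression);
    }

    // Satisfied when the named top-level field equals the expected text.
    // The "field=expected" expression format is an assumption of this sketch.
    static class FieldEquals implements Condition {
        public boolean isSatisfied(String key, Map<String, Object> value, String expression) {
            int eq = expression.indexOf('=');
            if (eq < 0) {
                return false; // malformed expression: never satisfied
            }
            String field = expression.substring(0, eq);
            String expected = expression.substring(eq + 1);
            Object actual = value.get(field);
            return actual != null && expected.equals(actual.toString());
        }
    }

    public static void main(String[] args) {
        Condition condition = new FieldEquals();
        Map<String, Object> document = Map.of("name", "Sergio");
        System.out.println(condition.isSatisfied("1", document, "name=Sergio")); // prints true
        System.out.println(condition.isSatisfied("1", document, "name=Peter"));  // prints false
    }
}
```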
28. Server-side updates
Atomically execute complex updates to a document ...
PUT /bucket/key/update?function=function_name&
timeout=timeout_value
Content-Type: application/json
{...}
29. Built-in update functions
Atomic Counters
Atomically increment/decrement/set-up one or
more counters.
Merge
Merge the stored document with provided values.
JavaScript custom update
Update the stored document by executing a user-provided JavaScript function.
30. What if ... custom functions?
@AutoDetect(name="my-function")
public class MyFunction implements
terrastore.store.operators.Function {
public Map<String, Object> apply(String key,
Map<String, Object> value,
Map<String, Object> parameters) {
// ...
}
}
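A function body in the spirit of the atomic counters could look like the sketch below: each parameter names a counter and gives the amount to add. UpdateFunction is a local stand-in with the apply signature from the slide, and the parameter convention is an assumption of this example. Atomicity would be the server's responsibility and is not modeled here.

```java
import java.util.HashMap;
import java.util.Map;

class IncrementFunctionDemo {
    // Stand-in for terrastore.store.operators.Function, same method as on the slide.
    interface UpdateFunction {
        Map<String, Object> apply(String key, Map<String, Object> value,
                                  Map<String, Object> parameters);
    }

    // Adds each numeric parameter to the counter field of the same name.
    // Missing or non-numeric fields are treated as starting from zero.
    static class Increment implements UpdateFunction {
        public Map<String, Object> apply(String key, Map<String, Object> value,
                                         Map<String, Object> parameters) {
            Map<String, Object> updated = new HashMap<>(value);
            for (Map.Entry<String, Object> entry : parameters.entrySet()) {
                long delta = ((Number) entry.getValue()).longValue();
                Object current = updated.get(entry.getKey());
                long base = current instanceof Number ? ((Number) current).longValue() : 0L;
                updated.put(entry.getKey(), base + delta);
            }
            return updated;
        }
    }

    public static void main(String[] args) {
        // Hypothetical document and parameters.
        Map<String, Object> document = Map.of("name", "Sergio", "visits", 2);
        Map<String, Object> result = new Increment().apply("1", document, Map.of("visits", 1));
        System.out.println(result.get("visits")); // prints 3
    }
}
```

The function returns a new map rather than mutating its input, which keeps the stored document unchanged if the update fails partway.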
31. Easy scale-out
A command line parameter:
$server>./start.sh
--master
--ensemble
And a json configuration file:
{
"localCluster" : "apple",
"discoveryInterval" : 5000,
"clusters" : ["apple", "orange"],
"seeds" : {"orange" : "192.168.1.2:6001"}
}
32. DSL-like Java APIs
Fluent APIs.
Http-based.
Transparent (yet configurable) object conversion.
// Create client:
TerrastoreClient client =
new TerrastoreClient("http://localhost:8080",
new HTTPConnectionFactory());
// Create object:
Person sbtourist = new Person("Sergio Bossa", "sbtourist");
// Put:
client.bucket("people").key("1").put(sbtourist);
// Get:
sbtourist = client.bucket("people").key("1").get(Person.class);
33. Support for other languages
Clojure
http://github.com/sbtourist/terrastore-cloj
Python
http://dnene.bitbucket.org/docs/pyterrastore
Scala
http://github.com/ssuravarapu/Terrastore-Scala-Client
http://github.com/teigen/terrastore-scala
Easy to add more!
35. More!
Backup
Import/export documents to/from buckets.
Events management
Get notified of document updates.
Third-party products integration.
Write-behind.
ActiveMQ integration for higher reliability.
Custom data partitioning
Retain control of where your data is placed.
Indexing and searching
Terrastore-Search.
ElasticSearch integration.
Cross-origin resource sharing support
Integrate with browser-based clients.
36. Now that I have a hammer ...
Don't go blind.
Use the right tool for the job!
Terrastore is best suited for:
Data hot spots.
Computational data.
Complex, rich or variable data.
Throw-away data.
37. Final words ... engaging.
Explore
http://code.google.com/p/terrastore
Download
http://code.google.com/p/terrastore/downloads/list
Hack
http://code.google.com/p/terrastore/source/checkout
Participate
http://groups.google.com/group/terrastore-discussions
Enjoy!
38. Q&A Contact me on:
http://twitter.com/sbtourist