Learn how to integrate MongoDB with Hadoop for large-scale distributed data processing. Using tools like MapReduce, Pig and Streaming you will learn how to do analytics and ETL on large datasets with the ability to load and save data against MongoDB. With Hadoop MapReduce, Java and Scala programmers will find a native solution for using MapReduce to process their data with MongoDB. Programmers of all kinds will find a new way to work with ETL using Pig to extract and analyze large datasets and persist the results to MongoDB. Python and Ruby Programmers can rejoice as well in a new way to write native Mongo MapReduce using the Hadoop Streaming interfaces.
Relational databases are central to web applications, but they have also been the primary source of pain when it comes to scale and performance. Recently, non-relational databases (also referred to as NoSQL) have arrived on the scene. This session explains not only what MongoDB is and how it works, but when and how to gain the most benefit.
This tutorial will introduce the features of MongoDB by building a simple location-based application using MongoDB. The tutorial will cover the basics of MongoDB’s document model, query language, map-reduce framework and deployment architecture.
The tutorial will be divided into 5 sections:
Data modeling with MongoDB: documents, collections and databases
Querying your data: simple queries, geospatial queries, and text-searching
Writes and updates: using MongoDB’s atomic update modifiers
Trending and analytics: Using mapreduce and MongoDB’s aggregation framework
Deploying the sample application
Besides the knowledge to start building their own applications with MongoDB, attendees will finish the session with a working application they use to check into locations around Portland from any HTML5 enabled phone!
TUTORIAL PREREQUISITES
Each attendee should have a running version of MongoDB. Preferably the latest unstable release 2.1.x, but any install after 2.0 should be fine. You can dowload MongoDB at http://www.mongodb.org/downloads.
Instructions for installing MongoDB are at http://docs.mongodb.org/manual/installation/.
Additionally we will be building an app in Ruby. Ruby 1.9.3+ is required for this. The current latest version of ruby is 1.9.3-p194.
For windows download the http://rubyinstaller.org/
For OSX download http://unfiniti.com/software/mac/jewelrybox/
For linux most users should know how to for their own distributions.
We will be using the following GEMs and they MUST BE installed ahead of time so you can be ahead of the game and safe in the event that the Internet isn’t accommodating.
bson (1.6.4)
bson_ext (1.6.4)
haml (3.1.4)
mongo (1.6.4)
rack (1.4.1)
rack-protection (1.2.0)
rack shotgun (0.9)
sinatra (1.3.2)
tilt (1.3.3)
Prior ruby experience isn’t required for this. We will NOT be using rails for this app.
This presentation was given at the LDS Tech SORT Conference 2011 in Salt Lake City. The slides are quite comprehensive covering many topics on MongoDB. Rather than a traditional presentation, this was presented as more of a Q & A session. Topics covered include. Introduction to MongoDB, Use Cases, Schema design, High availability (replication) and Horizontal Scaling (sharding).
Relational databases are central to web applications, but they have also been the primary source of pain when it comes to scale and performance. Recently, non-relational databases (also referred to as NoSQL) have arrived on the scene. This session explains not only what MongoDB is and how it works, but when and how to gain the most benefit.
This tutorial will introduce the features of MongoDB by building a simple location-based application using MongoDB. The tutorial will cover the basics of MongoDB’s document model, query language, map-reduce framework and deployment architecture.
The tutorial will be divided into 5 sections:
Data modeling with MongoDB: documents, collections and databases
Querying your data: simple queries, geospatial queries, and text-searching
Writes and updates: using MongoDB’s atomic update modifiers
Trending and analytics: Using mapreduce and MongoDB’s aggregation framework
Deploying the sample application
Besides the knowledge to start building their own applications with MongoDB, attendees will finish the session with a working application they use to check into locations around Portland from any HTML5 enabled phone!
TUTORIAL PREREQUISITES
Each attendee should have a running version of MongoDB. Preferably the latest unstable release 2.1.x, but any install after 2.0 should be fine. You can dowload MongoDB at http://www.mongodb.org/downloads.
Instructions for installing MongoDB are at http://docs.mongodb.org/manual/installation/.
Additionally we will be building an app in Ruby. Ruby 1.9.3+ is required for this. The current latest version of ruby is 1.9.3-p194.
For windows download the http://rubyinstaller.org/
For OSX download http://unfiniti.com/software/mac/jewelrybox/
For linux most users should know how to for their own distributions.
We will be using the following GEMs and they MUST BE installed ahead of time so you can be ahead of the game and safe in the event that the Internet isn’t accommodating.
bson (1.6.4)
bson_ext (1.6.4)
haml (3.1.4)
mongo (1.6.4)
rack (1.4.1)
rack-protection (1.2.0)
rack shotgun (0.9)
sinatra (1.3.2)
tilt (1.3.3)
Prior ruby experience isn’t required for this. We will NOT be using rails for this app.
This presentation was given at the LDS Tech SORT Conference 2011 in Salt Lake City. The slides are quite comprehensive covering many topics on MongoDB. Rather than a traditional presentation, this was presented as more of a Q & A session. Topics covered include. Introduction to MongoDB, Use Cases, Schema design, High availability (replication) and Horizontal Scaling (sharding).
Conceptos básicos. Seminario web 4: Indexación avanzada, índices de texto y g...MongoDB
Este es el cuarto seminario web de la serie Conceptos básicos, en la que se realiza una introducción a la base de datos MongoDB. Este seminario se ve en la compatibilidad con índices de texto libre y geoespaciales.
Conceptos básicos. seminario web 3 : Diseño de esquema pensado para documentosMongoDB
Este es el tercer seminario web de la serie Conceptos básicos, en la que se realiza una introducción a la base de datos MongoDB. En este seminario web se explica la arquitectura de las bases de datos de documentos.
Conceptos básicos. Seminario web 2: Su primera aplicación MongoDBMongoDB
Este es el segundo seminario web de la serie Conceptos básicos, en la que se realiza una introducción a la base de datos MongoDB. En este seminario web mostraremos cómo construir una aplicación de creación de blogs en MongoDB.
MongoDB Days Silicon Valley: Winning the Dreamforce Hackathon with MongoDBMongoDB
Presented by Greg Deeds, CEO, Technology Exploration Group
Experience level: Introductory
A two person team using MongoDB and Salesforce.com created a geospatial machine learning tool from various datasets, parsing, indexing, and mapreduce in 24 hours. The amazing hack that beat 350 teams from around the world designer Greg Deeds will speak on getting to the winners circle with MongoDB power. It was MongoDB that proved to be the teams secret weapon to level the playing field for the win!
Conceptos básicos. Seminario web 5: Introducción a Aggregation FrameworkMongoDB
Este es el quinto seminario web de la serie Conceptos básicos, en la que se realiza una introducción a la base de datos MongoDB. En este seminario web, se analizan los aspectos básicos de Aggregation Framework.
Webinar: Getting Started with MongoDB - Back to BasicsMongoDB
Part one an Introduction to MongoDB. Learn how easy it is to start building applications with MongoDB. This session covers key features and functionality of MongoDB and sets out the course of building an application.
CosmosDB service is a NoSQL is a globally distributed, multi-model database database service designed for scalable and high performance modern applications. CosmosDB is delivered as a fully managed service with an enterprise grade SLA. It supports querying of documents using a familiar SQL over hierarchical JSON documents. Azure Cosmos DB is a superset of the DocumentDB service. It allows you to store and query noSQL data, regardless of schema. In this presentation, you will learn: • How to get started with DocumentDB you provision a new database account. • How to index documents • How to create applications using CosmosDb (using REST API or programming libraries for several popular language) • Best practices designing applications with CosmosDB • Best practices creating queries.
Learn Learn how to build your mobile back-end with MongoDBMarakana Inc.
Will Shulman, from MongoLab, shows us how to to persist our mobile app data in the cloud using a super-scalable and amazingly developer-friendly MongoDB back-end.
Back to Basics Webinar 1: Introduction to NoSQLMongoDB
This is the first webinar of a Back to Basics series that will introduce you to the MongoDB database, what it is, why you would use it, and what you would use it for.
Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...MongoDB
Il s'agit du deuxième webinaire de la série « Retour aux fondamentaux » qui a pour but de vous présenter la base de données MongoDB. Dans ce webinaire, nous vous expliquerons comment créer une application de création de blogs dans MongoDB.
Media owners are turning to MongoDB to drive social interaction with their published content. The way customers consume information has changed and passive communication is no longer enough. They want to comment, share and engage with publishers and their community through a range of media types and via multiple channels whenever and wherever they are. There are serious challenges with taking this semi-structured and unstructured data and making it work in a traditional relational database. This webinar looks at how MongoDB’s schemaless design and document orientation gives organisation’s like the Guardian the flexibility to aggregate social content and scale out.
While Hadoop is the most well-known technology in big data, it’s not always the most approachable or appropriate solution for data storage and processing. In this session you’ll learn about enterprise NoSQL architectures, with examples drawn from real-world deployments, as well as how to apply big data regardless of the size of your own enterprise.
What every successful open source project needsSteven Francia
In the last few years open source has transformed the software industry. From Android to Wikipedia, open source is everywhere, but how does one succeed in it? While open source projects come in all shapes and sizes and all forms of governance, no matter what kind of project you’re a part of, there are a set of fundamentals that lead to success. I’d like to share some of the lessons I’ve learned from running two of the largest commercial open source projects, Docker and MongoDB, as well as some very successful community projects.
This presentation was delievered at sinfo.org in Feb 2015.
Conceptos básicos. Seminario web 4: Indexación avanzada, índices de texto y g...MongoDB
Este es el cuarto seminario web de la serie Conceptos básicos, en la que se realiza una introducción a la base de datos MongoDB. Este seminario se ve en la compatibilidad con índices de texto libre y geoespaciales.
Conceptos básicos. seminario web 3 : Diseño de esquema pensado para documentosMongoDB
Este es el tercer seminario web de la serie Conceptos básicos, en la que se realiza una introducción a la base de datos MongoDB. En este seminario web se explica la arquitectura de las bases de datos de documentos.
Conceptos básicos. Seminario web 2: Su primera aplicación MongoDBMongoDB
Este es el segundo seminario web de la serie Conceptos básicos, en la que se realiza una introducción a la base de datos MongoDB. En este seminario web mostraremos cómo construir una aplicación de creación de blogs en MongoDB.
MongoDB Days Silicon Valley: Winning the Dreamforce Hackathon with MongoDBMongoDB
Presented by Greg Deeds, CEO, Technology Exploration Group
Experience level: Introductory
A two person team using MongoDB and Salesforce.com created a geospatial machine learning tool from various datasets, parsing, indexing, and mapreduce in 24 hours. The amazing hack that beat 350 teams from around the world designer Greg Deeds will speak on getting to the winners circle with MongoDB power. It was MongoDB that proved to be the teams secret weapon to level the playing field for the win!
Conceptos básicos. Seminario web 5: Introducción a Aggregation FrameworkMongoDB
Este es el quinto seminario web de la serie Conceptos básicos, en la que se realiza una introducción a la base de datos MongoDB. En este seminario web, se analizan los aspectos básicos de Aggregation Framework.
Webinar: Getting Started with MongoDB - Back to BasicsMongoDB
Part one an Introduction to MongoDB. Learn how easy it is to start building applications with MongoDB. This session covers key features and functionality of MongoDB and sets out the course of building an application.
CosmosDB service is a NoSQL is a globally distributed, multi-model database database service designed for scalable and high performance modern applications. CosmosDB is delivered as a fully managed service with an enterprise grade SLA. It supports querying of documents using a familiar SQL over hierarchical JSON documents. Azure Cosmos DB is a superset of the DocumentDB service. It allows you to store and query noSQL data, regardless of schema. In this presentation, you will learn: • How to get started with DocumentDB you provision a new database account. • How to index documents • How to create applications using CosmosDb (using REST API or programming libraries for several popular language) • Best practices designing applications with CosmosDB • Best practices creating queries.
Learn Learn how to build your mobile back-end with MongoDBMarakana Inc.
Will Shulman, from MongoLab, shows us how to to persist our mobile app data in the cloud using a super-scalable and amazingly developer-friendly MongoDB back-end.
Back to Basics Webinar 1: Introduction to NoSQLMongoDB
This is the first webinar of a Back to Basics series that will introduce you to the MongoDB database, what it is, why you would use it, and what you would use it for.
Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...MongoDB
Il s'agit du deuxième webinaire de la série « Retour aux fondamentaux » qui a pour but de vous présenter la base de données MongoDB. Dans ce webinaire, nous vous expliquerons comment créer une application de création de blogs dans MongoDB.
Media owners are turning to MongoDB to drive social interaction with their published content. The way customers consume information has changed and passive communication is no longer enough. They want to comment, share and engage with publishers and their community through a range of media types and via multiple channels whenever and wherever they are. There are serious challenges with taking this semi-structured and unstructured data and making it work in a traditional relational database. This webinar looks at how MongoDB’s schemaless design and document orientation gives organisation’s like the Guardian the flexibility to aggregate social content and scale out.
While Hadoop is the most well-known technology in big data, it’s not always the most approachable or appropriate solution for data storage and processing. In this session you’ll learn about enterprise NoSQL architectures, with examples drawn from real-world deployments, as well as how to apply big data regardless of the size of your own enterprise.
What every successful open source project needsSteven Francia
In the last few years open source has transformed the software industry. From Android to Wikipedia, open source is everywhere, but how does one succeed in it? While open source projects come in all shapes and sizes and all forms of governance, no matter what kind of project you’re a part of, there are a set of fundamentals that lead to success. I’d like to share some of the lessons I’ve learned from running two of the largest commercial open source projects, Docker and MongoDB, as well as some very successful community projects.
This presentation was delievered at sinfo.org in Feb 2015.
Given at GopherFest 2015. This is an updated version of the talk I gave in NYC Nov 14 at GothamGo.
“We need to think about failure differently. Most people think mistakes are a necessary evil. Mistakes aren't a necessary evil, they aren't evil at all. They are an inevitable consequence of doing something new and as such should be seen as valuable. “ - Ed Catmull
As Go is a "new" programming language we are all experimenting and learning how to write better Go. While most presentations focus on the destination, this presentation focuses on the journey of learning Go and the mistakes I personally made while developing Hugo, Cobra, Viper, Afero & Docker.
The Future of the Operating System - Keynote LinuxCon 2015Steven Francia
Linux has become the foundation for infrastructure everywhere as it defined application portability from the desktop to the phone and from to the data center to the cloud. As applications become increasingly distributed in nature, the Docker platform serves as the cornerstone of Linux’s evolution solidifying the dominance of Linux today and into tomorrow.
Given as a Keynote at LinuxCon 2015 in Tokyo
This presentation will give developers an introduction and practical experience
of using MongoDB with the Go language. MongoDB Chief Developer Advocate &
Gopher Steve Francia presents plainly what you need to know about using MongoDB
with Go.
As an emerging language Go is able to start fresh without years of relational database dependencies. Application and library developers are able to build applications using the excellent Mgo MongoDB driver and the reliable go sql package for relational database. Find out why some people claim Go and MongoDB are a “pair made in heaven” and “the best database driver they’ve ever used” in this talk by Gustavo Niemeyer, the author of the mgo driver, and Steve Francia, the drivers team lead at MongoDB Inc.
We will cover:
Connecting to MongoDB in various configurations
Performing basic operations in Mgo
Marshaling data to and from MongoDB
Asynchronous & Concurrent operations
Pre-fetching batches for seamless performance
Using GridFS
How MongoDB uses Mgo internally
This presentation was given as a Workshop at OSCON 2014.
New to Go? This tutorial will give developers an introduction and practical
experience in building applications with the Go language. Gopher Steve Francia,
Author of [Hugo](http://hugo.spf13.com),
[Cobra](http://github.com/spf13/cobra), and many other popular Go packages
breaks it down step by step as you build your own full featured Go application.
Starting with an introduction to the Go language. He then reviews the fantastic
go tools available. With our environment ready we will learn by doing. The
remainder of the time will be dedicated to building a working go web and cli
application. Through our application development experience we will introduce
key features, libraries and best practices of using Go.
This tutorial is designed with developers in mind. Prior experience with any of the
following languages: ruby, perl, java, c#, javascript, php, node.js, or python
is preferred. We will be using the MongoDB database as a backend for our
application.
We will be using/learning a variety of libraries including:
* bytes and strings
* templates
* net/http
* io, fmt, errors
* cobra
* mgo
* Gin
* Go.Rice
* Cobra
* Viper
Go for Object Oriented Programmers or Object Oriented Programming without Obj...Steven Francia
Object Oriented (OO) programming has dominated software engineering for the last two decades. The paradigm built on powerful concepts such as Encapsulation, Inheritance, and Polymoprhism has been internalized by the majority of software engineers. Although Go is not OO in the strict sense, we can continue to leverage the skills we’ve honed as OO engineers to come up with simple and solid designs.
Gopher Steve Francia, Author of
[Hugo](http://hugo.spf13.com), [Cobra](http://github.com/spf13/cobra), and many
other popular Go packages makes these difficult concepts accessible for everyone.
If you’re a OO programmer, especially one with a background with dynamic languages and are curious about go then this talk is for you. We will cover everything you need to know to leverage your existing skills and quickly start coding in go including:
How to use our Object Oriented programming fundamentals in go
Static and pseudo dynamic typing in go
Building fluent interfaces in go
Using go interfaces and duck typing to simplify architecture
Common mistakes made by those coming to go from other OO languages (Ruby, Python, Javascript, etc.),
Principles of good design in go.
7 Common mistakes in Go and when to avoid themSteven Francia
I've spent the past two years developing some of the most popular libraries and applications written in Go. I've also made a lot of mistakes along the way. Recognizing that "The only real mistake is the one from which we learn nothing. -John Powell", I would like to share with you the mistakes that I have made over my journey with Go and how you can avoid them.
This ingite length deck talks about why we have seen so much database innovation and the genesis of the NoSQL movement over the last 5 year. While there are many great NoSQL products it speaks to why MongoDB is dominating the space and is the heir apparent to the RDBMS for modern operational data.
MongoDB is a popular NoSQL database. This presentation was delivered during a workshop.
First it talks about NoSQL databases, shift in their design paradigm, focuses a little more on document based NoSQL databases and tries drawing some parallel from SQL databases.
Second part, is for hands-on session of MongoDB using mongo shell. But the slides help very less.
At last it touches advance topics like data replication for disaster recovery and handling big data using map-reduce as well as Sharding.
Lecture 2: Data-Intensive Computing for Text Analysis (Fall 2011)Matthew Lease
Data-Intensive Computing for Text Analysis CS395T / INF385T / LIN386M
University of Texas at Austin, Fall 2011
Lecture 2 September 1, 2011
Jason Baldridge and Matt Lease
https://sites.google.com/a/utcompling.com/dicta-f11/
Slides for a talk given by Fede Fernández & Fran Pérez. It describes some common tips to improve a Kafka and Spark application, going through improving table joins, operational parameters as blockIntervalTime or number of partitions, serializations or how byKey operations work under the scenes.
Spark Streaming Tips for Devs and Ops by Fran perez y federico fernándezJ On The Beach
During this talk we will see a regular Kafka/Spark Streaming application, going through some of the most common issues and how we fix them. We'll see how to improve our Spark App in two different point of views: Code quality and Spark Tuning. The final goal is to have a robust and resilient Spark Application deployable in a production-like environment.
Sorry - How Bieber broke Google Cloud at SpotifyNeville Li
Talk at Scala Up North Jul 21 2017
We will talk about Spotify's story with Scala big data and our journey to migrate our entire data infrastructure to Google Cloud and how Justin Bieber contributed to breaking it. We'll talk about Scio, a Scala API for Apache Beam and Google Cloud Dataflow, and the technology behind it, including macros, algebird, chill and shapeless. There'll also be a live coding demo.
Building a Scalable Distributed Stats Infrastructure with Storm and KairosDBCody Ray
Building a Scalable Distributed Stats Infrastructure with Storm and KairosDB
Many startups collect and display stats and other time-series data for their users. A supposedly-simple NoSQL option such as MongoDB is often chosen to get started... which soon becomes 50 distributed replica sets as volume increases. This talk describes how we designed a scalable distributed stats infrastructure from the ground up. KairosDB, a rewrite of OpenTSDB built on top of Cassandra, provides a solid foundation for storing time-series data. Unfortunately, though, it has some limitations: millisecond time granularity and lack of atomic upsert operations which make counting (critical to any stats infrastructure) a challenge. Additionally, running KairosDB atop Cassandra inside AWS brings its own set of challenges, such as managing Cassandra seeds and AWS security groups as you grow or shrink your Cassandra ring. In this deep-dive talk, we explore how we've used a mix of open-source and in-house tools to tackle these challenges and build a robust, scalable, distributed stats infrastructure.
Webinar: Data Processing and Aggregation OptionsMongoDB
MongoDB scales easily to store mass volumes of data. However, when it comes to making sense of it all what options do you have? In this talk, we'll take a look at 3 different ways of aggregating your data with MongoDB, and determine the reasons why you might choose one way over another. No matter what your big data needs are, you will find out how MongoDB the big data store is evolving to help make sense of your data.
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsYahoo Developer Network
In the analysis of big data there are problematic queries that don’t scale because they require huge compute resources and time to generate exact results. Examples include count distinct, quantiles, most frequent items, joins, matrix computations, and graph analysis. If approximate results are acceptable, there is a class of sub-linear, stochastic streaming algorithms, called "sketches", that can produce results orders-of magnitude faster and with mathematically proven error bounds. For interactive queries there may not be other viable alternatives, and in the case of extracting results for these problem queries in real-time, sketches are the only known solution. For any analysis system that requires these problematic queries from big data, sketches are a required toolkit that should be tightly integrated into the system's analysis capabilities. This technology has helped Yahoo successfully reduce data processing times from days to hours, or minutes to seconds on a number of its internal platforms. This talk covers the current state of our Open Source DataSketches.github.io library, which includes adaptations and example code for Pig, Hive, Spark and Druid and gives architectural examples of use and a case study.
Speakers:
Jon Malkin is a scientist at Yahoo working to extend the DataSketches library. His previous roles have involved large scale data processing for sponsored search, display advertising, user counting, ad targeting, and cross-device user identity modeling.
Alexander Saydakov is a senior software engineer at Yahoo working on the open source Data Sketches project. In his previous roles he has been involved in building large-scale back-end data processing systems and frameworks for data analytics and experimentation based on Torque, Hadoop, Pig, Hive and Druid. Alexander’s education background is in the field of applied mathematics.
MongoDB Solution for Internet of Things and Big DataStefano Dindo
Internet of Things è uno degli scenari di mercato più importanti su cui investire entro il 2020.
L'Internet of Things permette di trasferire sul Web la vita reale delle persone grazie all'interazione con oggetti e spazi fisici scambiando un grande volume di dati.
Durante il Lab è stata fornita una descrizione di architettura necessaria a supportare progetti di Internet of Things con un focus sull'organizzazione dei dati all'interno di MongoDB, database NoSQL Leader di mercato, per raccogliere ed analizzare grandi volumi di dati in tempo reale ed in modo efficiente.
Lab pratico per la progettazione di soluzioni MongoDB in ambito Internet of T...festival ICT 2016
Le aziende sono chiamate a rispondere velocemente ai cambiamenti di mercato a seguito dell’affermarsi di nuovi scenari (Internet of Things, Social Analysis, manifattura 4.0 ecc.) che richiedono sempre di più l’integrazione con nuove tecnologie. Trattandosi di progetti innovativi, che impattano su processi e contesti aziendali, è fondamentale disporre di soluzioni flessibili per la raccolta e l’analisi dei dati.
MongoDB è un database NoSQL in grado di offrire flessibilità, scalabilità e semplificazione delle attività di sviluppo. Il lab avrà lo scopo di illustrare come creare architetture MongoDB e svolgere attività di Schema Design per la gestione dei dati in ambito IoT e Big Data facendo inoltre riferimento a casi pratici reali che si basano su tecnologie Cloud, necessarie a far fronte ad un mercato sempre più globale.
MongoDB Days Silicon Valley: MongoDB and the Hadoop ConnectorMongoDB
Presented by Luke Lovett, Software Engineer, MongoDB
Experience level: Introductory
MongoDB and Hadoop work powerfully together as complementary technologies. Learn how the Hadoop connector allows you to leverage the power of MapReduce to process data sourced from your MongoDB cluster.
Performance Analysis and Optimizations for Kafka Streams Applications (Guozha...confluent
High-speed and low footprint data stream processing is high in demand for Kafka Streams applications. However, how to write an efficient streaming application using the Streams DSL has been asked by many users in the past since it requires some deep knowledge about Kafka Streams internals. In this talk, I will talk about how to analyze your Kafka Streams applications, target performance bottlenecks and unnecessary storage costs, and optimize your application code accordingly using the Streams DSL.
In addition, I will talk about the new optimization framework that we have been developed inside Kafka Streams since the 2.1 release which replaced the in-place translation of the Streams DSL into a comprehensive process composed of streams topology compilation and rewriting phases, with a focus on reducing various storage footprints of Streams applications, such as state stores, internal topics etc.
This talk is aimed to give developers who are interested to write their first or second Streams applications using the Streams DSL, or those who have already launched several services written in Kafka Streams in production but wants to further optimize these applications a better understanding on how different Streams operators in the DSL are being translated into the streams topology during runtime. And by having a deep-dive into the newly added optimization framework for Streams DSL, audience will have more insight into the kinds of optimization opportunities that are possible for Kafka Streams, that the library is trying to tackle right now and in the near future.
Similar to MongoDB, Hadoop and humongous data - MongoSV 2012 (20)
State of the Gopher Nation - Golang - August 2017Steven Francia
This talk is an overview of the Go project. It covers “what we’ve done”, “why we did it” and “where we are going” as a project.
It highlights our accomplishments, challenges and how the Go Project is working on our challenges.
Discover & identify ideal storage solution for our needs by examining the history of data storage & the modern database systems including Key Value, Relational, Graph and Document databases.
This presentation was given at RootsTech 2013 in March
Replication, Durability, and Disaster RecoverySteven Francia
This session introduces the basic components of high availability before going into a deep dive on MongoDB replication. We'll explore some of the advanced capabilities with MongoDB replication and best practices to ensure data durability and redundancy. We'll also look at various deployment scenarios and disaster recovery configurations.
Strategies for multi-data center deployment. Diving into the details of deploying of MongoDB across multiple data centers.
Covers the advantages of a multi data center deployment for read/write locality, the various deployment strategies, and disaster preparedness and recovery.
In addition, we’ll look at the MongoDB roadmap and planned enhancements around data center awareness.
This presentation was given at MongoNYC 2012. The animations didn’t survive the transformation to the web, so not all the meaning carries over perfectly.
An unprecedented amount of data is being created and is accessible. This presentation will instruct on using the new NoSQL technologies to make sense of all this data.
Building your first application w/mongoDB MongoSV2011Steven Francia
This talk will introduce the features of MongoDB by walking through how one can building a simple location-based application using MongoDB. The talk will cover the basics of MongoDB's document model, query language, map-reduce framework and deployment architecture.
MongoDB, PHP and the cloud - php cloud summit 2011Steven Francia
An introduction to using MongoDB with PHP.
Walking through the basics of schema design, connecting to a DB, performing CRUD operations and queries in PHP.
MongoDB runs great in the cloud, but there are some things you should know. In this session we'll explore scaling and performance characteristics of running Mongo in the cloud as well as best practices for running on platforms like Amazon EC2.
Augmenting RDBMS with MongoDB for ecommerceSteven Francia
Steve Francia, VP of Engineering at OpenSky a NYC based social commerce company, on how OpenSky augments using RDBMS with MongoDB to develop the next ecommerce platform.
OpenSky utilizes both traditional SQL solutions and combines them with NoSQL to overcome the limitations of each, increase development speed and scale quickly.
MongoDB and Ecommerce : A perfect combinationSteven Francia
Presentation given at the MongoDB NYC Meetup by Steve Francia, VP of Engineering at OpenSky. OpenSky uses MongoDB to develop the next ecommerce platform. OpenSky also uses Symfony 2, Doctrine 2, PHP 5.3, PHPUnit 3.5, jQuery, node.js, Git (with gitflow) and a touch of Java and Python. The OpenSky team contributes back to many of these technologies and employs core members of the Symfony 2 and Doctrine 2 teams.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
2. Talking about
What is Humongous Data
Humongous Data & You
MongoDB & Data processing
Future of Humongous Data
3. @spf13
AKA
Steve Francia
15+ years building
the internet
Father, husband,
skateboarder
Chief Solutions Architect @
responsible for drivers,
integrations, web & docs
5. 2000
Google Inc
Today announced it has released
the largest search engine on the
Internet.
Google’s new index, comprising
more than 1 billion URLs
6. 2008
Our indexing system for processing
links indicates that
we now count 1 trillion unique URLs
(and the number of individual web
pages out there is growing by
several billion pages per day).
8. Data Growth 1,000
1000
750
500
500
250
250
120
55
4 10 24
1
0
2000 2001 2002 2003 2004 2005 2006 2007 2008
Millions of URLs
9. Truly Exponential
Growth
Is hard for people to grasp
A BBC reporter recently: "Your current PC
is more powerful than the computer they
had on board the first flight to the moon".
10. Moore’s Law
Applies to more than just CPUs
Boiled down it is that things double at
regular intervals
It’s exponential growth.. and applies to
big data
25. Applications have
complex needs
MongoDB ideal operational
database
MongoDB ideal for BIG data
Not a data processing engine, but
provides processing functionality
26. Many options for
Processing Data
•Process in MongoDB using
Map Reduce
•Process in MongoDB using
Aggregation Framework
•Process outside MongoDB (using Hadoop)
27. MongoDB Map Reduce
Map()
MongoDB Data
Group(k)
emit(k,v)
map iterates on
documents
Document is $this
Sort(k)
1 at time per shard
Reduce(k,values)
k,v
Finalize(k,v)
Input matches output
k,v Can run multiple times
28. MongoDB Map Reduce
MongoDB map reduce quite capable... but with
limits
- Javascript not best language for processing map
reduce
- Javascript limited in external data processing
libraries
- Adds load to data store
29. MongoDB
Aggregation
Most uses of MongoDB Map Reduce were for
aggregation
Aggregation Framework optimized for aggregate
queries
Realtime aggregation similar to SQL GroupBy
30. MongoDB & Hadoop
same as Mongo's Many map operations
MongoDB shard chunks (64mb) 1 at time per input split
Creates a list each split Map (k1,1v1,1ctx) Runs on same
of Input Splits Map (k ,1v ,1ctx) thread as map
each split Map (k , v , ctx)
single server or
sharded cluster (InputFormat) each split ctx.write(k2,v2)2
ctx.write(k2,v )2 Combiner(k2,values2)2
RecordReader ctx.write(k2,v ) Combiner(k2,values )2
Combiner(k2,values )
k2, 2v3 3
k , 2v 3
k ,v
Partitioner(k2)2
Partitioner(k )2
Partitioner(k )
Sort(keys2)
Sort(k2)2
Sort(k )
MongoDB
Reducer threads
Reduce(k2,values3)
Output Format Runs once per key
kf,vf
32. DEMO
Install Hadoop MongoDB Plugin
Import tweets from twitter
Write mapper in Python using Hadoop
streaming
Write reducer in Python using Hadoop
streaming
Call myself a data scientist
33. Installing Mongo-hadoop
https://gist.github.com/1887726
hadoop_version '0.23'
hadoop_path="/usr/local/Cellar/hadoop/
$hadoop_version.0/libexec/lib"
git clone git://github.com/mongodb/mongo-
hadoop.git
cd mongo-hadoop
sed -i '' "s/default/$hadoop_version/g" build.sbt
cd streaming
./build.sh