T-Sciences offers iSpatial, a web-based Spatial Data Infrastructure (SDI) that enables integration of third-party applications with geo-visualization tools. The iHarvest tool further enables mining and analysis of the data aggregated in the iSpatial platform for spatio-temporal behavior modeling. At the back end of both products is MongoDB, which provides the foundational capabilities for their spatial indexing and data analysis techniques. Come see how Thermopylae Sciences and Technology leveraged the aggregation framework and extended the spatial capabilities of MongoDB to tackle dynamic spatio-behavioral data at scale.
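The aggregation-plus-geospatial combination the abstract mentions can be sketched in miniature. The pipeline below is an illustrative guess at the kind of query such a platform runs, built as the plain Python dicts a driver like PyMongo would send; the field names (`loc`, `ts`, `kind`) and collection are invented, not taken from iSpatial.

```python
# Sketch: proximity search via a 2dsphere index, then hourly roll-up.
# Field names are hypothetical; $dateTrunc assumes MongoDB 5.0+.

def near_and_bucket(lon, lat, max_meters, kind):
    """Build a pipeline: geo proximity filter, then counts per hour."""
    return [
        {"$geoNear": {
            "near": {"type": "Point", "coordinates": [lon, lat]},
            "key": "loc",                 # the 2dsphere-indexed field
            "distanceField": "dist",
            "maxDistance": max_meters,
            "query": {"kind": kind},
            "spherical": True,
        }},
        {"$group": {
            "_id": {"$dateTrunc": {"date": "$ts", "unit": "hour"}},
            "count": {"$sum": 1},
        }},
        {"$sort": {"_id": 1}},
    ]

pipeline = near_and_bucket(-77.03, 38.89, 5000, "vehicle")
print(pipeline[0]["$geoNear"]["near"]["coordinates"])  # [-77.03, 38.89]
```

Against a real collection this would run as `db.sightings.aggregate(pipeline)`, with `sightings` standing in for whatever the platform actually stores.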
Iceberg: a modern table format for big data (Ryan Blue & Parth Brahmbhatt, Netflix)
Presto Summit 2018 (https://www.starburstdata.com/technical-blog/presto-summit-2018-recap/)
Presto talk @ Global AI Conference 2018 Boston (Kamil Bajda-Pawlikowski)
Presented at Global AI Conference in Boston 2018:
http://www.globalbigdataconference.com/boston/global-artificial-intelligence-conference-106/speaker-details/kamil-bajda-pawlikowski-62952.html
Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Facebook, Airbnb, Netflix, Uber, Twitter, LinkedIn, Bloomberg, and FINRA, Presto has experienced unprecedented growth in popularity in both on-premises and cloud deployments over the last few years. Presto is truly a SQL-on-Anything engine: a single query can access data from Hadoop, S3-compatible object stores, RDBMSs, NoSQL stores, and custom data stores. This talk will cover some of the best use cases for Presto and recent advancements in the project, such as the Cost-Based Optimizer and geospatial functions, and will discuss the roadmap going forward.
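To make "SQL-on-Anything" concrete, here is a sketch of a single federated query spanning two connectors. The catalog, schema, and table names (`hive.web.orders`, `mysql.crm.customers`) are invented for illustration.

```python
# One Presto query joining a Hive table against a MySQL table.
# All identifiers below are made up; only the catalog.schema.table
# addressing scheme is Presto's.

query = """
SELECT o.order_id, o.total, c.segment
FROM hive.web.orders AS o            -- data in HDFS/S3 via the Hive connector
JOIN mysql.crm.customers AS c        -- data in an operational RDBMS
  ON o.customer_id = c.id
WHERE o.order_date >= DATE '2018-01-01'
"""

# With a Presto client this would run as-is, e.g.:
#   cursor.execute(query); rows = cursor.fetchall()
print("hive." in query and "mysql." in query)  # True
```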
Presto @ Treasure Data - Presto Meetup Boston 2015 (Taro L. Saito)
Treasure Data simplifies event analytics for the complex digital world. Our customers send us 1,000,000 events per second and issue 30,000+ Presto queries every day to understand their customers better. One of the challenges is designing a cloud database with zero downtime that supports a global customer base. We have achieved this goal by developing several open-source technologies: Fluentd and Embulk enable seamless log collection from stream and batch sources, and with MessagePack we can provide an extensible columnar store that accommodates future schema changes. Finally, Presto allows us to serve the wide variety of data processing our customers perform on our service. In this talk, I will present an overview of our system and show how our customers keep using Presto while collecting and extending their data sets.
Find out how NoSQL can help your application with practical examples and use-cases from our Cloud Data Services Developer Advocate Glynn Bird. This webinar won't dwell on the science behind the database, but will walk you through real-life use-cases for NoSQL technologies that you can start using today.
Webinar: https://youtu.be/M_Jqw
SQL-based databases have been around for decades and they power a wide range of applications. So what exactly do NoSQL databases bring to the table? In this webcast, you'll find out how NoSQL can liberate your development cycle, allow your application to scale and improve your system's uptime.
Hype, buzzword, threat; however you want to characterize it, the Internet of Things (IoT) is here.
IoT scenarios that were hypothetical only a few years ago are real today. Still thinking along the lines of fleet management and temperature measurements? You're behind. Endless possibilities for IoT applications are surfacing every day, from the connected cow (huh?) to things that monitor and analyze your daily life (really?).
In this webinar, we will discuss the architecture of IoT data management solutions and the challenges that arise. We will explore how MongoDB features provide solutions to those problems. Time permitting, we will demonstrate an IoT cloud service built on top of MongoDB.
MongoDB Versatility: Scaling the MapMyFitness Platform (MongoDB)
Chris Merz, Manager of Operations, MapMyFitness
The MMF user base more than doubled in 2011, beginning an era of rapid data growth. With Big Data come Big Data Headaches. The traditional MySQL solution for our suite of web applications had hit its ceiling. MongoDB was chosen as the candidate for exploration into NoSQL implementations, and now serves as our go-to data store for rapid application deployment. This talk will detail several of the MongoDB use cases at MMF, from serving 2TB+ of geolocation data, to time-series data for live tracking, to user sessions, app logging, and beyond. Topics will include migration patterns, indexing practices, backend storage choices, and application access patterns, monitoring, and more.
Presto: Optimizing Performance of SQL-on-Anything Engine (DataWorks Summit)
Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Airbnb, Bloomberg, Comcast, Facebook, FINRA, LinkedIn, Lyft, Netflix, Twitter, and Uber, Presto has experienced unprecedented growth in popularity over the last few years, in both on-premises and cloud deployments over object stores, HDFS, NoSQL, and RDBMS data stores.
With the ever-growing list of connectors to new data sources such as Azure Blob Storage, Elasticsearch, Netflix Iceberg, Apache Kudu, and Apache Pulsar, the recently introduced Cost-Based Optimizer in Presto must account for heterogeneous inputs with differing and often incomplete data statistics. This talk will explore this topic in detail and discuss the best use cases for Presto across several industries. In addition, we will present recent Presto advancements, such as geospatial analytics at scale, and the project roadmap going forward.
RubiX: a caching framework for big data engines in the cloud. It transparently provides data caching capabilities to engines like Presto, Spark, and Hadoop, without user intervention.
Apache Iceberg - A Table Format for Huge Analytic Datasets (Alluxio, Inc.)
Data Orchestration Summit
www.alluxio.io/data-orchestration-summit-2019
November 7, 2019
Apache Iceberg - A Table Format for Huge Analytic Datasets
Speaker:
Ryan Blue, Netflix
For more Alluxio events: https://www.alluxio.io/events/
How to get the best of both: MongoDB is great for low-latency access to recent data; Treasure Data is great as an infinitely growing store of historical data, where one need not worry about scaling.
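The hot/cold split described above amounts to a routing rule. A minimal sketch, assuming a 90-day cutoff; the cutoff and backend labels are illustrative, not from the talk.

```python
# Route time-range queries: recent data to the hot store (MongoDB),
# historical ranges to the cold, infinitely growing store (Treasure Data).
from datetime import datetime, timedelta

CUTOFF = timedelta(days=90)  # assumed retention window for the hot tier

def pick_backend(query_start: datetime, now: datetime) -> str:
    """Choose a backend based on how far back the query reaches."""
    return "mongodb" if now - query_start <= CUTOFF else "treasure_data"

now = datetime(2015, 6, 1)
print(pick_backend(datetime(2015, 5, 20), now))  # mongodb
print(pick_backend(datetime(2014, 1, 1), now))   # treasure_data
```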
In the world of NoSQL, each database has its own strengths and weaknesses. Understanding which open source database is "the right tool for the job" is half the battle if you want to start building better applications quickly. IBM developer advocate Glynn Bird explores practical examples of how two popular NoSQL databases - the Cloudant JSON document store and the Redis in-memory key-value store - can be used together to create performant and scalable Web applications. It also includes real world use cases you can try today, for free, using the IBM Cloud Data Services suite of fully managed NoSQL databases-as-a-service.
The Practice of Presto & Alluxio in E-Commerce Big Data Platform (Alluxio, Inc.)
Data Orchestration Summit 2020 organized by Alluxio
https://www.alluxio.io/data-orchestration-summit-2020/
The Practice of Presto & Alluxio in E-Commerce Big Data Platform
Wenjun Tao, Sr. Software Engineer, JD.com
About Alluxio: alluxio.io
Engage with the open source community on slack: alluxio.io/slack
In order to provide prompt results and deal efficiently with data-intensive workloads, Big Data applications execute their jobs on compute slots across large clusters. For optimal performance, these applications should also run as close as possible to the data they use. Data-aware scheduling is the way to achieve that optimization, and it can conveniently be set up using Kubernetes. We'll present two different use cases: first, we'll show how Big Data applications like Hadoop and Spark can use their native HDFS protocol for data-aware scheduling; second, we'll demonstrate an efficient way to write a data-aware scheduler for Kubernetes that satisfies not just your application's requirements but also keeps your admins happy. As a bonus, it also allows us to run data-aware scheduling on applications other than Big Data.
Event: Kubernetes Meetup Rhein-Neckar, 18.10.2017
Speaker: Johannes M. Scheuermann
More tech talks: https://www.inovex.de/de/content-pool/vortraege/
Tech articles on our blog: https://www.inovex.de/blog/
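The data-aware scheduling idea above can be approximated even without a custom scheduler by expressing data locality as node affinity. The fragment below is a sketch built as a plain Python object; the node label `hdfs/block-group` is a made-up convention, not a Kubernetes or HDFS standard.

```python
# Build a pod spec that *prefers* nodes labeled as holding the relevant
# HDFS blocks, while staying schedulable elsewhere if none are free.

def data_aware_pod(name, image, block_group):
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{"name": name, "image": image}],
            "affinity": {"nodeAffinity": {
                # "preferred" (soft) rather than "required" (hard), so a
                # busy data node does not block the job entirely.
                "preferredDuringSchedulingIgnoredDuringExecution": [{
                    "weight": 100,
                    "preference": {"matchExpressions": [{
                        "key": "hdfs/block-group",   # hypothetical label
                        "operator": "In",
                        "values": [block_group],
                    }]},
                }],
            }},
        },
    }

pod = data_aware_pod("etl-task", "spark:2.2", "bg-17")
```

Serialized to YAML or JSON, this is exactly what a scheduler-free locality hint looks like; a real data-aware scheduler would go further and compute the label values from HDFS block reports.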
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service (Databricks)
Zeus is an efficient, highly scalable, distributed shuffle-as-a-service that powers all data processing (Spark and Hive) at Uber. Uber runs one of the largest Spark and Hive clusters on top of YARN in the industry, which leads to many issues such as hardware failures (burned-out disks) and reliability and scalability challenges.
If you work in the refining or petrochemical industry, you need to learn about fired heaters. This paper discusses the basic specifications of fired heaters; using good specifications will help you purchase fired heaters for your next project.
This is a summary about smart buildings, compiled from many sources in the literature. In this summary you will learn what a smart building is: the definition, the characteristics, the point of smart buildings, and much more.
Get Started Today with Cloud-Ready Contracts | AWS Public Sector Summit 2016 (Amazon Web Services)
In this session, we will provide an overview of existing cloud-ready contracts, such as cooperative contracts and GWACs, and walk through steps on how to choose the right one for your procurement. We will explore the strengths and weaknesses and provide a comparison of various cloud-ready contracts to help you make the right choice for your mission needs.
The RPFP presents the vision for the physical and socio-economic development of the region for the next twenty-six (26) years, as well as the policy guidelines and directions for the major components of the plan, namely Protection Land Use, Production Land Use, Settlements, and Infrastructure Support.
Master Source-to-Pay with Cloud and Business Networks [Stockholm] (SAP Ariba)
In their initial phase, business networks were all about connecting companies more efficiently to perform a discrete process: buying, selling, invoicing, etc. Today, Ariba is so much more: a platform for innovation for companies of all sizes, harnessing insights and intelligence to break down the barriers to collaboration and enable competitive advantage. But this is a new Ariba: smarter, faster, more accessible, and more global than ever. And we can help you transform your Procurement and Finance processes in ways never thought possible.
IT-AAC Defense IT Reform Report to the Sec 809 PanelJohn Weiler
Today, 1/12/17, the IT-AAC briefed the Panel on Streamlining and Codifying Acquisition Regulations (NDAA Sec 809). These recommendations are the results of an 8-year study that included the review of over 40 major studies, over 40 leadership workshops, and root-cause analysis of over 40 major IT program failures.
For the March edition of ODROID Magazine I contributed an article, "Peeking Under the Hood," about Android. In this article I walk through the different directories inside Android's massive source base, explaining as much as I can about what code each directory contains and how it is used.
For complete information about the Android source code, check out my page: http://elinux.org/Android_Source_Code_Description
Hadoop and the Data Warehouse: When to Use Which (DataWorks Summit)
In recent years, Apache™ Hadoop® has emerged from humble beginnings to disrupt the traditional disciplines of information management. As with all technology innovation, hype is rampant, and data professionals are easily overwhelmed by diverse opinions and confusing messages.
Even seasoned practitioners sometimes miss the point, claiming for example that Hadoop replaces relational databases and is becoming the new data warehouse. It is easy to see where these claims originate, since both Hadoop and Teradata® systems run in parallel, scale up to enormous data volumes, and have shared-nothing architectures. At a conceptual level, it is easy to think they are interchangeable, but the differences overwhelm the similarities. This session will shed light on the differences and help architects, engineering executives, and data scientists identify when to deploy Hadoop and when an MPP relational database is best for a data warehouse, discovery platform, or other workload-specific applications.
Two of the most trusted experts in their fields, Steve Wooledge, VP of Product Marketing at Teradata, and Jim Walker of Hortonworks, will examine how big data technologies are being used today by practical big data practitioners.
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence... (Perficient, Inc.)
Most organizations still rely on batch and offline processing of data streams to gain meaningful analysis and insight into their business. However, in our instant gratification world, real-time computation and analysis of streaming data is crucial in gaining insight into patterns and threats. A trend is emerging for real-time and instant analysis from live data streams, promoting the value of logs and a move toward functional programming.
This shift in technology is not about what and how to store the data, but what we can do with it to see emerging patterns and trends across multiple resources, applications, services and environments. Log data represents a wealth of information, yet is often sporadic, unstructured, scattered across the enterprise and difficult to track.
These slides provide insights into some of the most helpful Big Data tools used by the largest social media and data-centric organizations for competitive trends, instant analysis, and feedback from large-volume data streams. We show how using the Big Data tools Storm and Elasticsearch with an elastic UI can turn application logs into real-time analytical views.
You will also learn how Big Data:
Contains data that is elastic, minimally structured, flexible and scalable
Helps process live streams into meaningful data
Promotes a move toward functional programming
Affects the enterprise data architecture
Works with real-time CEP tools like Storm for functional programming
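As a toy illustration of the streaming log analysis described above (what Storm and Elasticsearch do at cluster scale), a sliding-window error detector over a log stream might look like this; the window size and threshold are arbitrary choices, not from the slides.

```python
# Flag bursts of errors in a live log stream: whenever at least
# `threshold` of the last `window` lines contain "ERROR", emit an alert.
from collections import deque

def alert_on_errors(lines, window=5, threshold=3):
    """Yield an alert string for each line that pushes the window over threshold."""
    recent = deque(maxlen=window)          # sliding window of booleans
    for line in lines:
        recent.append("ERROR" in line)
        if sum(recent) >= threshold:
            yield f"alert: {sum(recent)}/{len(recent)} recent lines are errors"

stream = ["INFO ok", "ERROR db", "ERROR db", "INFO ok", "ERROR net"]
alerts = list(alert_on_errors(stream))
print(len(alerts))  # 1
```

A real pipeline would replace the list with a Storm spout or a tailed log file, and send alerts to an index or dashboard instead of yielding strings.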
Hi all, this is a presentation about big data analysis done using Hadoop, a data mining tool based on a distributed file system that uses parallel computing.
OPEN'17_4_Postgres: The Centerpiece for Modernising IT Infrastructures (Kangaroot)
Postgres is the leading open source database management system that is being developed by a very active community for more than 15 years. Gaby Schilders is Sales Engineer at EnterpriseDB, supplier of the EDB Postgres data platform.
Gaby Schilders, Sales Engineer at EnterpriseDB, will explain why companies take open source as the centerpiece for modernising their IT infrastructure, increasing their scalability and taking full advantage of what today's technologies offer them.
5 Things that Make Hadoop a Game Changer
Webinar by Elliott Cordo, Caserta Concepts
There is much hype and mystery surrounding Hadoop's role in analytic architecture. In this webinar, Elliott presented, in detail, the services and concepts that make Hadoop a truly unique solution, a game changer for the enterprise. He talked about the real benefits of a distributed file system, the multi-workload processing capabilities enabled by YARN, and the three other important things you need to know about Hadoop.
To access the recorded webinar, visit the event site: https://www.brighttalk.com/webcast/9061/131029
For more information the services and solutions that Caserta Concepts offers, please visit http://casertaconcepts.com/
Agile Big Data Analytics Development: An Architecture-Centric Approach (SoftServe)
Presented at The Hawaii International Conference on System Sciences by Hong-Mei Chen and Rick Kazman (University of Hawaii), Serge Haziyev (SoftServe).
Slides for the talk at AI in Production meetup:
https://www.meetup.com/LearnDataScience/events/255723555/
Abstract: Demystifying Data Engineering
With recent progress in the fields of big data analytics and machine learning, Data Engineering is an emerging discipline which is not well-defined and often poorly understood.
In this talk, we aim to explain Data Engineering, its role in Data Science, the difference between a Data Scientist and a Data Engineer, the role of a Data Engineer and common concepts as well as commonly misunderstood ones found in Data Engineering. Toward the end of the talk, we will examine a typical Data Analytics system architecture.
Performance Acceleration: Summaries, Recommendation, MPP and more (Denodo)
Watch full webinar here: https://bit.ly/3nLHayP
Performance is critical for an organization across the board. Developers can optimize execution with Summaries, MPP, Data Movement, and more. Business users rely on the Recommendation engine to guide them to the right data. Let’s discover and learn about various performance acceleration techniques in this session.
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas (MongoDB)
During this talk we'll navigate through a customer's journey as they migrate an existing MongoDB deployment to MongoDB Atlas. While the migration itself can be as simple as a few clicks, the prep/post effort requires due diligence to ensure a smooth transfer. We'll cover these steps in detail and provide best practices. In addition, we’ll provide an overview of what to consider when migrating other cloud data stores, traditional databases and MongoDB imitations to MongoDB Atlas.
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts! (MongoDB)
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel... (MongoDB)
MongoDB Kubernetes operator and MongoDB Open Service Broker are ready for production operations. Learn about how MongoDB can be used with the most popular container orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications. A demo will show you how easy it is to enable MongoDB clusters as an External Service using the Open Service Broker API for MongoDB.
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB (MongoDB)
Are you new to schema design for MongoDB, or are you looking for a more complete or agile process than what you are following currently? In this talk, we will guide you through the phases of a flexible methodology that you can apply to projects ranging from small to large with very demanding requirements.
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T... (MongoDB)
Humana, like many companies, is tackling the challenge of creating real-time insights from data that is diverse and rapidly changing. This is our journey of how we used MongoDB to combine traditional batch approaches with streaming technologies to provide continuous alerting capabilities from real-time data streams.
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data (MongoDB)
Time series data is increasingly at the heart of modern applications - think IoT, stock trading, clickstreams, social media, and more. With the move from batch to real time systems, the efficient capture and analysis of time series data can enable organizations to better detect and respond to events ahead of their competitors or to improve operational efficiency to reduce cost and risk. Working with time series data is often different from regular application data, and there are best practices you should observe.
This talk covers:
Common components of an IoT solution
The challenges involved with managing time-series data in IoT applications
Different schema designs, and how these affect memory and disk utilization – two critical factors in application performance.
How to query, analyze and present IoT time-series data using MongoDB Compass and MongoDB Charts
At the end of the session, you will have a better understanding of key best practices in managing IoT time-series data with MongoDB.
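One of the schema designs such talks typically cover is the bucket pattern: grouping many readings per sensor per hour into one document to cut per-document and index overhead. Below is a minimal in-memory sketch of the idea; the field names are illustrative, and the two commented operations would be `$push` and `$inc` in a single MongoDB upsert.

```python
# Bucket pattern for IoT time series: one document per sensor per hour,
# instead of one document per reading.
from datetime import datetime

def bucket_hour(ts: datetime) -> datetime:
    """Truncate a timestamp to the hour that names its bucket."""
    return ts.replace(minute=0, second=0, microsecond=0)

def add_reading(buckets, sensor_id, ts, value):
    key = (sensor_id, bucket_hour(ts))
    doc = buckets.setdefault(key, {"_id": key, "readings": [], "count": 0})
    doc["readings"].append({"ts": ts, "v": value})  # in MongoDB: $push
    doc["count"] += 1                               # in MongoDB: $inc
    return doc

buckets = {}
for minute, v in [(0, 20.1), (15, 20.4), (59, 21.0)]:
    add_reading(buckets, "s1", datetime(2020, 1, 1, 9, minute), v)
print(len(buckets))  # 1  -- three readings, one hourly bucket document
```

The payoff is fewer index entries and better locality: a range query for an hour of data touches one document instead of thousands.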
MongoDB SoCal 2020: MongoDB Atlas Jumpstart (MongoDB)
Join this talk and test session with a MongoDB Developer Advocate where you'll go over the setup, configuration, and deployment of an Atlas environment. Create a service that you can take back in a production-ready state and prepare to unleash your inner genius.
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys] (MongoDB)
Our clients have unique use cases and data patterns that mandate the choice of a particular strategy. To implement these strategies, it is mandatory that we unlearn a lot of relational concepts while designing and rapidly developing efficient applications on NoSQL. In this session, we will talk about some of our client use cases, the strategies we have adopted, and the features of MongoDB that assisted in implementing these strategies.
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2 (MongoDB)
Encryption is not a new concept to MongoDB. Encryption may occur in-transit (with TLS) and at-rest (with the encrypted storage engine). But MongoDB 4.2 introduces support for Client Side Encryption, ensuring the most sensitive data is encrypted before ever leaving the client application. Even full access to your MongoDB servers is not enough to decrypt this data. And better yet, Client Side Encryption can be enabled at the "flick of a switch".
This session covers using Client Side Encryption in your applications. This includes the necessary setup, how to encrypt data without sacrificing queryability, and what trade-offs to expect.
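The reason client-side encryption can avoid "sacrificing queryability" is that deterministic encryption maps equal plaintexts to equal ciphertexts, so the server can match values without ever decrypting them. The HMAC token below is a toy illustration of that property only; it is not MongoDB's actual algorithm, and the key handling is deliberately simplified.

```python
# Toy demonstration of why deterministic schemes stay equality-queryable:
# the same plaintext always yields the same token, so the server can
# compare tokens blindly. NOT MongoDB's real Client Side Encryption.
import hashlib
import hmac

KEY = b"data-encryption-key"  # in practice, fetched from a KMS, never hardcoded

def det_token(plaintext: str) -> str:
    return hmac.new(KEY, plaintext.encode(), hashlib.sha256).hexdigest()

# Client encrypts before insert...
stored = {"ssn_enc": det_token("123-45-6789")}
# ...and later queries by encrypting the search value the same way:
print(stored["ssn_enc"] == det_token("123-45-6789"))  # True
print(stored["ssn_enc"] == det_token("999-99-9999"))  # False
```

The trade-off the session alludes to follows directly: deterministic encryption leaks equality of values, so randomized encryption is preferred for fields that never need to be queried.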
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ... (MongoDB)
MongoDB Kubernetes operator is ready for prime time. Learn how MongoDB can be used with the most popular orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications.
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts! (MongoDB)
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset (MongoDB)
When you need to model data, is your first instinct to start breaking it down into rows and columns? Mine used to be too. When you want to develop apps in a modern, agile way, NoSQL databases can be the best option. Come to this talk to learn how to take advantage of all that NoSQL databases have to offer and discover the benefits of changing your mindset from the legacy, tabular way of modeling data. We’ll compare and contrast the terms and concepts in SQL databases and MongoDB, explain the benefits of using MongoDB compared to SQL databases, and walk through data modeling basics so you feel confident as you begin using MongoDB.
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB
Join this talk and test session with a MongoDB Developer Advocate where you'll go over the setup, configuration, and deployment of an Atlas environment. Create a service that you can take back in a production-ready state and prepare to unleash your inner genius.
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB
Query performance should be the unsung hero of an application, but without proper configuration, can become a constant headache. When used properly, MongoDB provides extremely powerful querying capabilities. In this session, we'll discuss concepts like equality, sort, range, managing query predicates versus sequential predicates, and best practices to building multikey indexes.
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB
The aggregation pipeline has powered your data analysis since version 2.2. Version 4.2 adds even more capability: you can now use it for richer queries, updates, and output to existing collections. Come hear how you can do everything with the pipeline, including single-view, ETL, data roll-ups, and materialized views.
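The "outputting your data to existing collections" the abstract mentions is the `$merge` stage introduced in MongoDB 4.2, which enables on-demand materialized views. A minimal sketch of a roll-up pipeline that uses it; the collection and field names (`orders`, `customer_totals`, `customerId`, `amount`) are hypothetical:

```python
# A hypothetical roll-up: total order amounts per customer, written back
# into an existing "customer_totals" collection as a materialized view.
rollup_pipeline = [
    {"$group": {"_id": "$customerId", "total": {"$sum": "$amount"}}},
    {"$merge": {
        "into": "customer_totals",      # target collection ($merge is 4.2+)
        "whenMatched": "replace",       # overwrite existing roll-up docs
        "whenNotMatched": "insert",     # add roll-ups for new customers
    }},
]
# With a live connection this would run as:
#   db.orders.aggregate(rollup_pipeline)
```

Because the pipeline is plain data, it can be built and inspected without a server connection.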
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB
Are you new to schema design for MongoDB, or are you looking for a more complete or agile process than what you are following currently? In this talk, we will guide you through the phases of a flexible methodology that you can apply to projects ranging from small to large with very demanding requirements.
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB
MongoDB Atlas Data Lake is a new service offered by MongoDB Atlas. Many organizations store long term, archival data in cost-effective storage like S3, GCP, and Azure Blobs. However, many of them do not have robust systems or tools to effectively utilize large amounts of data to inform decision making. MongoDB Atlas Data Lake is a service allowing organizations to analyze their long-term data to discover a wealth of information about their business.
This session will take a deep dive into the features that are currently available in MongoDB Atlas Data Lake and how they are implemented. In addition, we'll discuss future plans and opportunities and offer ample Q&A time with the engineers on the project.
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB
Virtual assistants are becoming the new norm when it comes to daily life, with Amazon’s Alexa being the leader in the space. As a developer, not only do you need to make web and mobile compliant applications, but you need to be able to support virtual assistants like Alexa. However, the process isn’t quite the same between the platforms.
How do you handle requests? Where do you store your data and work with it to create meaningful responses with little delay? How much of your code needs to change between platforms?
In this session we’ll see how to design and develop applications known as Skills for Amazon Alexa powered devices using the Go programming language and MongoDB.
MongoDB .local Paris 2020: Realm: the secret ingredient for better app...MongoDB
to Core Data, appreciated by hundreds of thousands of developers. Learn what makes Realm special and how it can be used to build better applications faster.
MongoDB .local Paris 2020: Upply @MongoDB: Upply: When Machine Learning...MongoDB
It has never been easier to order online and get delivery in under 48 hours, very often for free. This ease of use hides a complex market worth more than $8 trillion.
Data is well known in the Supply Chain world (routes, information on goods, customs, ...), but the value of this operational data remains largely untapped. By combining business expertise with Data Science, Upply is redefining the fundamentals of the Supply Chain, enabling every player to overcome market volatility and inefficiency.
2. Overview
• About Thermopylae Sciences + Technology
• What is iHarvest?
• The Problem
• Other Solutions
• Why we chose MongoDB
• Lessons Learned
• What next?
3. What is iHarvest?
iHarvest (Interest Harvest) is a system that builds profiles of activities by discrete node, based on a number of variables, and then analyzes those models for similarity. It is an automated and intelligent system that continually monitors changes in activities and models using advanced, proprietary algorithms. iHarvest is designed to:
• Identify - Collect and store event activities and data feeds
• Model - Build and identify related interests to store as a profile model
• Analyze - Identify similarities and comparisons between common activities
• Report - Aggregate and provide recommendations and analytics on findings
Features
• Operates unobtrusively on any closed-network system
• Adapts to system and usage activity changes as it is used
• Homes in on user-specific needs, becoming more accurate, efficient, and easier to use
• Delivers customized solutions such as collaboration, monitoring, and even insider-threat analysis
• Alerts on "non-observable" data and relationships
5. Our Problems
• Data storage was difficult to scale
o The 2012 iHarvest roadmap releases required adding significantly more analytic processing and storage of results.
• Document-based data store (JSON)
o We needed to rapidly enrich our data models dynamically, so we would not have to redesign our data access layer and schema with each update or change.
• Geospatial index – event data is not purely textual, so we needed a solution with support for spatial qualities
• Increased analytics requiring more processing power – as the data grew, so did the analytic processing requirement
• A requirement to provide statistical and aggregate results of our data
6. Other Solutions We Tried/Looked At
• PostgreSQL. We used a "NoSQL"-like key-value pair store, but performance failed badly when trying to access sub-field data.
• Accumulo. Very difficult to set up, configure, and develop against. Required HDFS, Hadoop, and Zookeeper, along with expert admins who just didn't exist yet. On the plus side, it provided MapReduce capability.
7. Why we chose MongoDB
• Built-in MapReduce
– We are predominantly doing massive amounts of analytics on our data
• Aggregation Framework
– Connected directly to REST endpoints for developer/prototyping use – a substantial decrease in development time
• No need for a separate Hadoop cluster
– Faster development and reduced installation/integration/maintenance for customers
• Developer friendly
– Instead of using complex JDBC and SQL, we can simply instantiate objects and call methods
• Great documentation
• Easy to scale
8. How we're using MongoDB in iHarvest
Scalable dynamic storage for:
• Events, feeds, profile models
• Processed analytic results
Aggregation Framework
• Statistics and data aggregation
MapReduce
• Primarily to run the k-means clustering algorithm on geo data
9. Dynamic Storage
• We leverage a JSON-based document model
• Allows us to add new fields/attributes without having to update a schema
• Adding shards allows us to scale easily with our data
• Events
– High volume of incoming data – data can grow very large very quickly
• Profile Engine processes events
– 16 x 16 profile tables – a key based on profile ID gives an even distribution, letting us dedicate profile engine processing to the specific profiles that require updating – no one processing engine has to do all the work
– MongoDB allows us to grow our dimensions dynamically by adding new tables to the grid programmatically
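The deck does not say how profile IDs are mapped onto the 16 x 16 grid; one plausible sketch is to hash the ID into a cell so profiles distribute evenly across the grid collections. The hashing scheme and all names below (`profile_cell`, `profiles_<row>_<col>`) are assumptions, not the actual iHarvest implementation:

```python
import hashlib

GRID = 16  # the slide's 16 x 16 grid of profile tables

def profile_cell(profile_id: str):
    """Hash a profile ID to a (row, col) grid cell for even distribution."""
    h = int(hashlib.md5(profile_id.encode("utf-8")).hexdigest(), 16)
    return (h % GRID, (h // GRID) % GRID)

def profile_collection(profile_id: str) -> str:
    """Name of the hypothetical grid collection holding this profile."""
    row, col = profile_cell(profile_id)
    return f"profiles_{row}_{col}"
```

A deterministic mapping like this is what lets each processing engine claim a fixed set of cells, so no single engine has to do all the work.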
10. Aggregation Framework
• Create statistical endpoints by making calls to the Aggregation Framework
– We created a REST API that allows JS queries to the aggregation framework against any of our tables/indexes
– This gives us a very powerful way to prototype new statistics and data aggregations quickly
• Temporal aggregation is very valuable to us, and we basically get it for "free" with MongoDB's built-in functions
• We leverage the Aggregation Framework on the following components
– Activity
• Raw incoming data related to an event
– Events
• Summarization of Activities
– Node
• Discrete item the Activity/Event is related to
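The temporal aggregation that comes "for free" can be expressed with the date operators the aggregation framework has shipped since 2.2 (`$year`, `$month`, `$dayOfMonth`). A sketch of a daily Activity roll-up per node, of the kind a statistical REST endpoint could expose; the field and collection names are illustrative, not iHarvest's actual schema:

```python
from datetime import datetime

# Count Activity documents per node per day.
daily_activity_pipeline = [
    {"$match": {"timestamp": {"$gte": datetime(2013, 1, 1)}}},
    {"$group": {
        "_id": {
            "node": "$nodeId",
            "year": {"$year": "$timestamp"},
            "month": {"$month": "$timestamp"},
            "day": {"$dayOfMonth": "$timestamp"},
        },
        "count": {"$sum": 1},
    }},
    {"$sort": {"count": -1}},
]
# With a live connection this would run as:
#   db.activity.aggregate(daily_activity_pipeline)
```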
11. MapReduce
• Geo clustering is the predominant use for built-in MapReduce at this point (outside of what the Aggregation Framework is already doing)
• K-means is the cluster analysis method we use to look at geo similarity/overlap
• To take advantage of this method, we use MongoDB's inherent geo-indexing mechanism to quickly access data spatially by geographic region
• Segregating this data lets us quickly perform k-means clustering on it alone, without having to jam it alongside other event data
• We developed our own k-means MapReduce queries to support the spatial-clustering model development process
• Processing automatically scales across the number of shards, which is very helpful since k-means is very computationally intensive
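The slides do not show the actual MapReduce queries, but the two phases of a k-means iteration map naturally onto the model: the map step assigns each point to its nearest centroid, and the reduce step averages each cluster. A plain-Python sketch of one iteration, using planar squared distance rather than great-circle distance for brevity:

```python
def assign_clusters(points, centroids):
    """Map phase: index of the nearest centroid for each (x, y) point."""
    def dist2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return [min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))
            for p in points]

def update_centroids(points, labels, k):
    """Reduce phase: mean of the points assigned to each cluster."""
    sums = [[0.0, 0.0, 0] for _ in range(k)]
    for (x, y), label in zip(points, labels):
        sums[label][0] += x
        sums[label][1] += y
        sums[label][2] += 1
    return [(sx / n, sy / n) if n else None for sx, sy, n in sums]
```

Iterating the two functions until the centroids stop moving is the whole algorithm; sharding parallelizes the assignment work because each point's label depends only on the current centroids.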
12. Lessons Learned
• Moving to NoSQL from a relational database requires switching your mindset on data storage and processing, i.e., more back-end processing and immediate access to results once they are ready (done being processed).
• The Aggregation Framework is powerful, but could use more tutorials and examples of usage.
• Built-in MapReduce allowed us to offload much of our processing and take advantage of MongoDB's auto-sharding and processing.
• When storing dates in MongoDB, be sure to use ISODate to take advantage of the date/time-related functions.
• Understanding the data types provided by MongoDB is important to fully take advantage of the inherent Aggregation Framework capabilities.
• Go to a schema workshop!
• You can of course write your own MapReduce queries, but you can do a lot out of the box by being mindful/knowledgeable of what is already provided for you.
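The ISODate lesson above corresponds, in a driver like PyMongo, to storing native datetime values rather than strings: only the former are saved as BSON Dates (ISODate in the shell), which the built-in date/time operators can act on. A sketch; the document shape is illustrative:

```python
from datetime import datetime, timezone

# Stored as a string, operators like $year/$dayOfMonth cannot be applied:
event_as_string = {"nodeId": "n1", "timestamp": "2013-06-04T12:00:00Z"}

# Stored as a native datetime, a driver writes a BSON Date, so the
# built-in date/time functions and range queries work on it:
event_as_date = {"nodeId": "n1",
                 "timestamp": datetime(2013, 6, 4, 12, 0, tzinfo=timezone.utc)}
```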
13. What's next?
• Increased use of MapReduce to:
o Enhance Similarity Analytics processing for greater efficiency
o Add interest-building algorithms for profile generation
• Integration of Mahout and MongoDB for additional clustering algorithms
• Integration of Spring/MongoDB to better abstract the data model