This presentation covers the differences between Elasticsearch and relational databases. It also includes a glossary of Elasticsearch terms and an overview of its basic operations.
In this talk, we go over the history and future of Apache Flink adoption at Shopify. We'll talk about how and why we went from choosing Apache Flink as the replacement for our existing streaming technologies in 2021, to a year later with a flourishing streaming community. Today, we have tens of prototypes and several large use cases running in production. Along the way, we'll give an overview of the Flink ecosystem at Shopify, the tools and libraries Shopify built, the decision to fork Flink, how we drove adoption of streaming at the company, and what's next for the platform.
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Yael Garten
2017 StrataHadoop SJC conference talk. https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/56047
Description:
So, you finally have a data ecosystem with Kafka and Hadoop both deployed and operating correctly at scale. Congratulations. Are you done? Far from it.
As the birthplace of Kafka and an early adopter of Hadoop, LinkedIn has 13 years of combined experience using Kafka and Hadoop at scale to run a data-driven company. Both Kafka and Hadoop are flexible, scalable infrastructure pieces, but using these technologies without a clear idea of what the higher-level data ecosystem should be is perilous. Shirshanka Das and Yael Garten share best practices around data models and formats, choosing the right level of granularity of Kafka topics and Hadoop tables, and moving data efficiently and correctly between Kafka and Hadoop and explore a data abstraction layer, Dali, that can help you to process data seamlessly across Kafka and Hadoop.
Beyond pure technology, Shirshanka and Yael outline the three components of a great data culture and ecosystem and explain how to create maintainable data contracts between data producers and data consumers (like data scientists and data analysts) and how to standardize data effectively in a growing organization to enable (and not slow down) innovation and agility. They then look to the future, envisioning a world where you can successfully deploy a data abstraction of views on Hadoop data, like a data API as a protective and enabling shield. Along the way, Shirshanka and Yael discuss observations on how to enable teams to be good data citizens in producing, consuming, and owning datasets and offer an overview of LinkedIn’s governance model: the tools, process and teams that ensure that its data ecosystem can handle change and sustain #DataScienceHappiness.
Introduction to Elastic Search
Elastic Search Terminology
Index, Type, Document, Field
Comparison with Relational Database
Understanding of Elastic architecture
Clusters, Nodes, Shards & Replicas
Search
How it works?
Inverted Index
Installation & Configuration
Setup & Run Elastic Server
Elastic in Action
Indexing, Querying & Deleting
Configuration of Spring Boot applications using Spring Cloud Config and Spring Cloud Vault.
Presentation given at the meeting of the Java User Group Freiburg on October 24, 2017
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DBYugabyteDB
Slides for the webinar "Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB" by Amey Banarse, Principal Data Architect at Yugabyte, recorded on Oct 30, 2019 at 11 AM Pacific.
Playback here: https://vimeo.com/369929255
The Apache Solr Semantic Knowledge GraphTrey Grainger
What if, instead of a query returning documents, you could return the other keywords most related to the query: e.g. given a search for "data science", get back results like "machine learning", "predictive modeling", "artificial neural networks", etc.? Solr’s Semantic Knowledge Graph does just that. It leverages the inverted index to automatically model the significance of relationships between every term in the inverted index (even across multiple fields), allowing real-time traversal and ranking of any relationship within your documents. Use cases for the Semantic Knowledge Graph include disambiguation of multiple meanings of terms (does "driver" mean truck driver, printer driver, a type of golf club, etc.), searching on vectors of related keywords to form a conceptual search (versus just a text match), powering recommendation algorithms, ranking lists of keywords based upon conceptual cohesion to reduce noise, summarizing documents by extracting their most significant terms, and numerous other applications involving anomaly detection, significance/relationship discovery, and semantic search. In this talk, we'll do a deep dive into the internals of how the Semantic Knowledge Graph works and will walk you through how to get up and running with an example dataset to explore the meaningful relationships hidden within your data.
Word2Vec model to generate synonyms on the fly in Apache Lucene.pdfSease
If you want to expand your query/documents with synonyms in Apache Lucene, you need to have a predefined file containing the list of terms that share the same meaning. It’s not always easy to find a list of basic synonyms for a language and, even if you find one, it doesn’t necessarily match your contextual domain.
The term “daemon” in the domain of operating system articles is not a synonym of “devil” but it’s closer to the term “process”.
Word2Vec is a two-layer neural network that takes as input a text and outputs a vector representation for each word in the dictionary. Two words with similar meanings are identified with two vectors close to each other.
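The idea that similar words end up with nearby vectors can be sketched with cosine similarity. The snippet below is only an illustration: the three-dimensional vectors are hand-picked, hypothetical values (not the output of a trained Word2Vec model), and `nearest_terms` is a helper name introduced here, not part of Lucene or any Word2Vec library.

```python
import math

# Hypothetical toy embeddings, hand-picked for illustration only.
# A real Word2Vec model would learn vectors with many more dimensions.
TOY_VECTORS = {
    "daemon": [0.9, 0.1, 0.3],
    "process": [0.8, 0.2, 0.4],
    "devil": [0.1, 0.9, 0.2],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_terms(term, vectors, top_n=1):
    """Rank every other term by cosine similarity to `term`, best first."""
    query = vectors[term]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != term
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:top_n]]


if __name__ == "__main__":
    # With these toy vectors, "process" ranks above "devil" for "daemon".
    print(nearest_terms("daemon", TOY_VECTORS))
```

In a synonym-expansion setting, the top-ranked terms would be candidate synonyms for the query term, usually filtered by a minimum similarity threshold.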
Talk given for the #phpbenelux user group, March 27th in Gent (BE), with the goal of convincing developers who are used to building PHP/MySQL apps to broaden their horizons when adding search to their site. Be sure to also have a look at the notes for the slides; they explain some of the screenshots, etc.
An accompanying blog post about this subject can be found at http://www.jurriaanpersyn.com/archives/2013/11/18/introduction-to-elasticsearch/
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...HostedbyConfluent
At Stripe, we operate a general ledger modeled as double-entry bookkeeping for all financial transactions. Warehousing such data is challenging due to its high volume and high cardinality of unique accounts.
Furthermore, it is financially critical to get up-to-date, accurate analytics over all records. Due to the changing nature of real-time transactions, it is impossible to pre-compute the analytics as a fixed time series. We have overcome the challenge by creating a real-time key-value store inside Pinot that can sustain half a million QPS with all the financial transactions.
We will talk about the details of our solution and the interesting technical challenges faced.
This presentation will start by introducing how Apache Lucene can be used to classify documents using data structures that already exist in your index instead of having to generate and supply external training sets. The focus will be on extensions of the Lucene Classification module that come in Lucene 6.0 and the Lucene Classification module's incorporation into Solr 6.1. These extensions will allow you to classify at a document level with individual field weighting, numeric field support, lat/lon fields etc. The Solr ClassificationUpdateProcessor will be explored and how to use it including basic and advanced features like multi class support and classification context filtering. The presentation will include practical examples and real world use cases.
Deep Dive on ElasticSearch Meetup event on 23rd May '15 at www.meetup.com/abctalks
Agenda:
1) Introduction to NOSQL
2) What is ElasticSearch and why is it required
3) ElasticSearch architecture
4) Installation of ElasticSearch
5) Hands on session on ElasticSearch
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital KediaDatabricks
Apache Spark is a fast and flexible compute engine for a variety of diverse workloads. Optimizing performance for different applications often requires an understanding of Spark internals and can be challenging for Spark application developers. In this session, learn how Facebook tunes Spark to run large-scale workloads reliably and efficiently. The speakers will begin by explaining the various tools and techniques they use to discover performance bottlenecks in Spark jobs. Next, you’ll hear about important configuration parameters and their experiments tuning these parameters on large-scale production workloads. You’ll also learn about Facebook’s new efforts towards automatically tuning several important configurations based on the nature of the workload. The speakers will conclude by sharing their results with automatic tuning and future directions for the project.
This presentation was used to start the pilot phase of the implementation project in DSpace-CRIS funded by OpenAIRE Advance.
DSpace-CRIS now provides support for the OpenAIRE guidelines for CRIS managers, in addition to the previously supported guidelines for Literature Repository and DataArchive.
This talk tells the story of a database for an analytics use case, from the perspective of a non-OLAP, ACID-compliant RDBMS (MySQL).
I will cover the basics of the Clickhouse database and a sample Clickhouse installation in a lab environment.
We will configure Clickhouse for essential operations.
We will load the sample data set and monitor it.
We will query and visualize the results.
This talk will also touch on how Kubernetes can help a Clickhouse implementation via an operator.
Conclusions will include the dos and don'ts of this emerging technology, along with best practices and some advice on ingesting and analyzing terabytes of data efficiently.
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...confluent
RocksDB is the default state store for Kafka Streams. In this talk, we will discuss how to improve single node performance of the state store by tuning RocksDB and how to efficiently identify issues in the setup. We start with a short description of the RocksDB architecture. We discuss how Kafka Streams restores the state stores from Kafka by leveraging RocksDB features for bulk loading of data. We give examples of hand-tuning the RocksDB state stores based on Kafka Streams metrics and RocksDB’s metrics. At the end, we dive into a few RocksDB command line utilities that allow you to debug your setup and dump data from a state store. We illustrate the usage of the utilities with a few real-life use cases. The key takeaway from the session is the ability to understand the internal details of the default state store in Kafka Streams so that engineers can fine-tune their performance for different varieties of workloads and operate the state stores in a more robust manner.
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin KnaufVerverica
The need to enrich a fast, high-volume data stream with slow-changing reference data is probably one of the most widespread requirements in stream processing applications. Apache Flink's built-in join functionalities and its flexible lower-level APIs support stream enrichment in various ways depending on the specific requirements of the use case at hand. In this webinar, I would like to provide an overview of the basic methods to enrich a data stream with Apache Flink and highlight use cases, limitations, advantages and disadvantages of each.
Iceberg: a modern table format for big data (Ryan Blue & Parth Brahmbhatt, Netflix)
Presto Summit 2018 (https://www.starburstdata.com/technical-blog/presto-summit-2018-recap/)
Presentation I gave on GaianDB, a dynamic federated distributed database available on IBM alphaWorks.
The presentation won't make a lot of sense without speaker notes... which I've not written yet. Sorry about that.
Apache Doris (incubating) is an MPP-based interactive SQL data warehouse for reporting and analysis. It was open-sourced by Baidu. Doris mainly integrates the technology of Google Mesa and Apache Impala. Unlike other popular SQL-on-Hadoop systems, Doris is designed to be a simple, single, tightly coupled system that does not depend on other systems. Doris provides not only highly concurrent, low-latency point queries but also high-throughput ad-hoc analytical queries. It supports both batch data loading and near-real-time mini-batch data loading. Doris also provides high availability, reliability, fault tolerance, and scalability. Its main features are simplicity (of developing, deploying and using) and the ability to meet many data-serving requirements in a single system.
Graph Database Use Cases - StampedeCon 2015StampedeCon
Presented by Max De Marzi at StampedeCon 2015: Graphs are eating the world – but in what form? Starting off with a primer on Graph Databases, this talk will focus on practical examples of graph applications.
We’ll look at multiple use cases like job boards, dating sites, recommendation engines of all kinds, network management, scheduling engines, etc. We'll also see some examples of graph search in action.
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016DataStax
A deep learning startup has a requirement for a robust and scalable data architecture. Training a Deep Neural Network requires 10s-100s of millions of examples consisting of data and metadata. In addition to training it is necessary to support test/validation, data exploration and more traditional data science analytics workloads. As a startup we have minimal resources and an engineering team of 1.
Cassandra, Spark and Kafka running on Mesos in AWS is a scalable architecture that is fast and easy to set up and maintain to deliver a data architecture for Deep Learning.
About the Speaker
Andrew Jefferson VP Engineering, Tractable
A software engineer specialising in realtime data systems. I've worked at companies from Startups to Apple on applications ranging from Ticketing to Genetics. Currently building data systems for training and exploiting Deep Neural Networks.
From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...Connected Data World
Do you want to learn how to use the low-hanging fruit of knowledge graphs — schema.org and JSON-LD — to annotate content and improve your SEO with semantics and entities? This hands-on workshop with one of the leading Semantic SEO practitioners will help you get started.
Relational databases were conceived to digitize paper forms and automate well-structured business processes, and still have their uses. But RDBMS cannot model or store data and its relationships without complexity, which means performance degrades with the increasing number and levels of data relationships and data size. Additionally, new types of data and data relationships require schema redesign that increases time to market.
A graph database like Neo4j naturally stores, manages, analyzes, and uses data within the context of connections meaning Neo4j provides faster query performance and vastly improved flexibility in handling complex hierarchies than SQL. Join this webinar to learn why companies are shifting away from RDBMS towards graphs to unlock the business value in their data relationships.
Ryan Boyd, Developer Relations at Neo4j
Ryan is a SF-based software engineer focused on helping developers understand the power of graph databases. Previously he was a product manager for architectural software, built applications and web hosting environments for higher education, and worked in developer relations for twenty products during his 8 years at Google. He enjoys cycling, sailing, skydiving, and many other adventures when not in front of his computer.
Elasticsearch: breakfast event of March 13, 2014ALTER WAY
Elasticsearch is a very powerful open source search engine based on Apache Lucene. It can index millions of records and search and analyze them in real time. The Elasticsearch tools are already used by leading players such as FourSquare, GitHub, OpenDataSoft, and Dailymotion.
Alter Way and Elasticsearch invite you to come and discover the Elasticsearch suite, finally available in version 1.0 and ready for production!
Our data is getting not just more complex but also more connected. In order not to lose sight of the web of information, but to use it as a source of new insights and opportunities, technologies such as graph databases can help.
For both analytical and transactional use cases, they allow efficient storage, retrieval, and processing of networked data without loss of detail. In this talk, we want to get to know existing tools and techniques for graph data processing.
[Given at DAMA WI, Nov 2018] With the increasing prevalence of semi-structured data from IoT devices, web logs, and other sources, data architects and modelers have to learn how to interpret and project data from things like JSON. While the concept of loading data without upfront modeling is appealing to many, ultimately, in order to make sense of the data and use it to drive business value, we have to turn that schema-on-read data into a real schema! That means data modeling! In this session I will walk through both simple and complex JSON documents, decompose them, then turn them into a representative data model using Oracle SQL Developer Data Modeler. I will show you how they might look using both traditional 3NF and data vault styles of modeling. In this session you will:
1. See what a JSON document looks like
2. Understand how to read it
3. Learn how to convert it to a standard data model
Beyond php - it's not (just) about the codeWim Godden
Most PHP developers focus on writing code. But creating Web applications is about much more than just writing PHP. Take a step outside the PHP cocoon and into the big PHP ecosphere to find out how small code changes can make a world of difference on servers and network. This talk is an eye-opener for developers who spend over 80% of their time coding, debugging and testing.
Meetic experience report: how does quality reflect our organizations?meeticTech
In an environment of digital transformation and Agile at Scale, it is essential to measure our ability to deliver value to our users.
This also amounts to measuring the performance of our organizations; in the words of Melvin Conway, "Organizations that design systems are constrained to produce designs that are copies of their own communication structure."
In this context, how can quality be a key indicator for measuring our performance? How do our testing practices reflect our organizations?
Drawing on my experience at Meetic over the past 5 years, I offer an experience report on the major strategic, technical, and product changes that led me to this reflection on the impact of our organizations on quality.
Have you ever shipped your debug menu to production? We have, and we found an original solution based on App Groups so that it never happens again.
Feedback on Meetic's journey migrating from a monolithic PHP application to a microservices architecture using PHP & Symfony. First presented at Symfony Live Paris 2017 in March 2017 by Etienne Broutin, Software Architect @MeeticTech
Meetup scala paris user group - conflation like @ meetic (meeticTech)
In a real-time context, conflation is a way to limit the processing done on a high-volume data stream. It is sometimes a better fit than back pressure.
On the agenda: an introduction to the concepts of conflation, then live coding of a few use cases.
Finally, a presentation of our monitoring choices, which are essential for measuring how effective this solution is.
Scrum, Kanban, XP, Continuous Delivery, DevOps… one or more practices you would like to introduce in your company? But where to start? What to expect? Come to this session for an experience report on an agile transition, or how Meetic's IT teams introduced and gradually spread the methods of the web giants. On the agenda: achievements, successes, but also obstacles and areas for improvement… In short, straight talk from real life (PHP Tour 2014)
How Meetic is carrying out the technological change of its information system. From creating APIs to setting up a quality process, by way of adopting Behavior Driven Development, you will learn all about our journey, the problems we ran into, the solutions we put in place, and the road still ahead of us so that we can face the future with peace of mind. Topics covered: - How do we approach major changes to our information system without affecting overall performance? - Migrating monolithic code to REST APIs in Sf2. - Examples of microservices: AB Test, GEO, Permission, Configuration. - Deployment with Composer, Satis, Sf2 and Capistrano on hundreds of servers. - Quality process (Back, Front, App): our metrics, off-the-shelf tools, in-house tools, change management. - Methodology: Agile, DevOps, TDD, BDD. - Next steps: Kafka, Continuous Delivery.
7. Once upon a time in a far, far away country…
John wants to meet someone
Ygritte wants to meet someone
8. Ygritte knows what kind of John she wants
Ygritte knows what she wants for herself.
She is looking for a John who matches her criteria.
$ curl -XGET "https://api.meetic.fr/search?height=188&eyes=brown&region=north&country=westeros"
9. John knows nothing
John is looking for love, wherever it comes from.
(Except from Casterly Rock)
$ curl -XGET https://api.meetic.fr/search/1234/shuffle
(1234 is John's ID)
10. Summary
• Let’s introduce the search microservice
• An overview of the code architecture in a microservice
• Indexing data
• What happens when Ygritte updates her profile?
• John signs up on Meetic. How is his profile indexed?
• Searching people
• Overview of the advanced search feature used by Ygritte
• How does Meetic suggest profiles that may interest John?
12. The search microservice
The search microservice has one and only one responsibility:
Searching Meetic users
Some Meetic features:
• Advanced Search
• Shuffle (Tinder-like)
• Online profiles
• Similar Profiles
• Etc.
13. The search microservice
To do so, the search microservice should:
• Be responsible for the way data is stored
• Be aware of any data updates that fall within its scope
• Return a list of profile IDs when called
16. About code structure in a microservice @ Meetic
Don’t blame your messy code, blame the design pattern.
(or the architect, or the tech lead, or the product owner)
17. The hexagonal workflow
[Diagram. Layers: Application, Domain, Infrastructure. Boxes: Request (GET), Handler, Repository, Domain Object, DAO. Arrow labels: Call, Implements, Populate.]
19. The hexagonal design pattern
Define the "profile" domain object
Implement the repository using Guzzle
Handle the "search" request
Define how profiles should be fetched
Implement the repository using Doctrine
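The slide captions above refer to PHP code that did not survive in this transcript. As a rough illustration of the same layering, and not Meetic's actual code, here is a minimal Python sketch. All class, method, and field names are hypothetical, and the infrastructure layer is an in-memory stand-in rather than a Guzzle or Doctrine implementation.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


# Domain: the "profile" domain object (fields are illustrative assumptions).
@dataclass(frozen=True)
class Profile:
    member_id: int
    eyes: str


# Domain: the repository interface defines how profiles should be fetched.
class ProfileRepository(ABC):
    @abstractmethod
    def find_by_eyes(self, eyes: str) -> list[Profile]:
        ...


# Infrastructure: one implementation of the interface. A real one would call
# an HTTP API or a database; this stand-in filters an in-memory list.
class InMemoryProfileRepository(ProfileRepository):
    def __init__(self, rows: list[dict]):
        self._rows = rows

    def find_by_eyes(self, eyes: str) -> list[Profile]:
        return [
            Profile(member_id=row["id"], eyes=row["eyes"])
            for row in self._rows
            if row["eyes"] == eyes
        ]


# Application: the handler depends only on the domain interface.
class SearchHandler:
    def __init__(self, repository: ProfileRepository):
        self._repository = repository

    def handle(self, eyes: str) -> list[int]:
        return [p.member_id for p in self._repository.find_by_eyes(eyes)]


repository = InMemoryProfileRepository(
    [{"id": 1234, "eyes": "brown"}, {"id": 5678, "eyes": "blue"}]
)
handler = SearchHandler(repository)
matching_ids = handler.handle("brown")
print(matching_ids)
```

Because the handler only knows the interface, swapping the in-memory repository for an HTTP-backed or database-backed one would not change the application layer.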
23. John signs up on Meetic
(What should be done)
Event bus
Consumer
Profile
Exposition Layer
POST
{
  "id": 1234,
  "birthday": "1989-01-12",
  "picture": "me.jpg"
}
POST
{
  "id": 1234,
  "birthday": "1989-01-12"
}
POST
{
  "id": 6789,
  "birthday": "1989-01-12",
  "has_photo": true,
  "paymentStatus": "free",
  ….
}
GET /1234/pictures
GET /payment-status/1234
GET …
Picture
POST
{
  "id": 1234,
  "picture": "me.jpg"
}
24. Theory vs reality
Calling microservices
+ If any database changes, this workflow stays unchanged
+ Avoids duplicating business logic
- Does not scale very well because of the number of HTTP calls needed
- Takes a lot of time to implement
25. John signs up on Meetic
(What is really done)
Event bus
Consumer
Profile
Exposition Layer
POST
{
  "id": 1234,
  "birthday": "1989-01-12"
}
POST /reload/6789
Search
SELECT * FROM PROFILE
LEFT JOIN PAYMENT
WHERE ID = 1234
POST
{
  "id": 1234,
  "birthday": "1989-01-12",
  "eye": "brown",
  "paymentStatus": "free",
  ….
}
Picture
POST
{
  "id": 1234,
  "birthday": "1989-01-12",
  "picture": "me.jpg"
}
POST
{
  "id": 1234,
  "picture": "me.jpg"
}
26. Theory vs reality
Querying the database
+ The search microservice stays responsible for its data
+ Allows batch processing
- Works only because the databases are not yet split
- Changes to the database have to be replicated in the search microservice
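To make the "querying the database" variant concrete, here is a small Python sketch of a reload step that rebuilds one search document with a join. It is only an illustration under assumptions: the table layout, column names, and the dictionary standing in for the search index are all hypothetical, and SQLite replaces the real database.

```python
import sqlite3

# Hypothetical schema standing in for the shared database.
db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row
db.executescript(
    """
    CREATE TABLE profile (id INTEGER PRIMARY KEY, birthday TEXT, eye TEXT);
    CREATE TABLE payment (member_id INTEGER, status TEXT);
    INSERT INTO profile VALUES (1234, '1989-01-12', 'brown');
    INSERT INTO payment VALUES (1234, 'free');
    """
)

# Stand-in for the search index: document ID -> document.
search_index: dict[int, dict] = {}


def reload(member_id: int) -> None:
    """Rebuild one search document straight from the database."""
    row = db.execute(
        """
        SELECT p.id, p.birthday, p.eye, pay.status AS payment_status
        FROM profile AS p
        LEFT JOIN payment AS pay ON pay.member_id = p.id
        WHERE p.id = ?
        """,
        (member_id,),
    ).fetchone()
    if row is None:
        # The member does not exist: drop any stale document.
        search_index.pop(member_id, None)
        return
    search_index[member_id] = {
        "id": row["id"],
        "birthday": row["birthday"],
        "eye": row["eye"],
        "paymentStatus": row["payment_status"],
    }


# What a consumer might do when it sees a profile event on the bus.
reload(1234)
print(search_index[1234])
```

One query replaces several HTTP calls, which is the scaling benefit listed above. The cost is that the search service now has to know the schema of tables it does not own.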
28. What does the reload query look like?
SELECT
  ID,
  BIRTHDAY,
  …,
  (SELECT * FROM …)
FROM MEETIC.MEMBER
INNER JOIN …
LEFT JOIN …
INNER JOIN …
LEFT JOIN …
LEFT JOIN …
INNER JOIN …
WHERE (
  SELECT T FROM MEETIC.T WHERE …
)
AND …
OR …
29. Keeping the reload query maintainable
Since we chose to read data directly from the database when creating a new document in the index, the SQL query is huge and complex.
• We need it to be easily shared, reviewed and updated by DBAs
• We need to keep it isolated so that changes in the DB schema can be reflected in a single file
• We want to be able to just copy-paste it and check that it works.
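One way to meet these three goals is to keep the query in its own .sql file and have the code load it by name. The Python sketch below illustrates that idea only. The file name, table, and `load_query` helper are hypothetical, and SQLite stands in for the real database.

```python
import sqlite3
import tempfile
from pathlib import Path

# The query lives in its own .sql file so it can be reviewed and edited
# without touching application code. Path and contents are hypothetical.
sql_dir = Path(tempfile.mkdtemp())
query_file = sql_dir / "reload_member.sql"
query_file.write_text(
    """
    SELECT m.id, m.birthday
    FROM member AS m
    WHERE m.id = :member_id
    """
)


def load_query(name: str) -> str:
    """Read a named query from the SQL directory."""
    return (sql_dir / f"{name}.sql").read_text()


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE member (id INTEGER PRIMARY KEY, birthday TEXT)")
db.execute("INSERT INTO member VALUES (1234, '1989-01-12')")

# The only thing the code supplies is the named parameter.
row = db.execute(load_query("reload_member"), {"member_id": 1234}).fetchone()
print(row)
```

A schema change then touches a single file, and a DBA can paste the file into a SQL console after replacing the one named parameter.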
30. Managing big (SQL) queries with Symfony
Step #1: The application handles the request by calling the domain interface
31. Managing big (SQL) queries with Symfony
Step #2: The domain describes how objects should be manipulated
32. Managing big (SQL) queries with Symfony
Step #3: The infrastructure actually manipulates the data.
37. Ygritte uses the search engine
Exposition Layer
Search
POST
{
  "query": {
    "term": {
      "eyes": "brown"
    }
  }
}
GET /search?eye=brown&hair=brown
GET /search?eye=brown&hair=brown
{
  "memberId": [
    1234,
    456786
  ]
}
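The slide shows a GET request with two criteria being turned into an Elasticsearch query. As a hedged sketch of that translation, the Python below builds a `bool` query with one `term` filter per criterion. The `build_search_query` function is hypothetical, and it assumes the index field names match the URL parameter names, which a real service would likely map explicitly.

```python
import json
from urllib.parse import parse_qs, urlparse


def build_search_query(url: str) -> dict:
    """Turn the query string of a search URL into an Elasticsearch body."""
    params = parse_qs(urlparse(url).query)
    # One exact-match "term" clause per criterion; every "filter" clause
    # must match for a document to be returned.
    filters = [
        {"term": {field: values[0]}} for field, values in sorted(params.items())
    ]
    return {"query": {"bool": {"filter": filters}}}


body = build_search_query("/search?eye=brown&hair=brown")
print(json.dumps(body, indent=2))
```

Filter clauses do not contribute to relevance scoring, which suits yes/no criteria such as eye or hair color.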
39. Keeping the search logic clear
Templating Elasticsearch queries with Twig
40. What does an Elasticsearch query look like?
Most of the time, the Elasticsearch query contains most of the business logic.
• Very long
• Lots of strange JSON keys
• Lots of parameters
And yet…
41. Why not generate queries in PHP?
• Managing big PHP arrays through if…else branches is quite hard
• It becomes harder to understand what the query actually does.
42. Keeping the business logic clear with Twig
Templating the JSON query with Twig makes it easy to see what the query actually does
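The Twig templates themselves are not in this transcript. To show the general idea of keeping the query readable as JSON and substituting only the values, here is a Python sketch that uses the standard library's `string.Template` as a stand-in. It is far less capable than Twig (no conditionals or loops), and the `range` filter on height and the `render_query` helper are assumptions made for illustration.

```python
import json
from string import Template

# The query shape lives in a template that reads like the final JSON.
# Placeholders are filled with JSON-encoded values, so strings arrive quoted.
QUERY_TEMPLATE = Template(
    """
{
  "query": {
    "bool": {
      "filter": [
        {"term": {"eyes": $eyes}},
        {"range": {"height": {"gte": $min_height}}}
      ]
    }
  }
}
"""
)


def render_query(eyes: str, min_height: int) -> dict:
    rendered = QUERY_TEMPLATE.substitute(
        eyes=json.dumps(eyes),
        min_height=json.dumps(min_height),
    )
    # Parsing the result catches a malformed template early.
    return json.loads(rendered)


query = render_query("brown", 188)
print(query["query"]["bool"]["filter"])
```

The template can be read top to bottom as the query that will be sent, which is the readability argument made on the slide.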
48. Building a query from multiple sources
[Diagram. Layers: Application, Domain, Infrastructure. Boxes: Request (GET), Handler, Enricher, Repository, Domain Object, and three DAOs. Arrow labels: Call, Implements, Populate.]
49. Optimizing response time with Guzzle promises
HTTP calls take time.
Guzzle promises let us run them in parallel to save precious milliseconds.
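Guzzle is a PHP library, so the following is only an analogous Python sketch of the same idea: start several slow calls together and wait for all of them, so the total time is close to the slowest call instead of the sum. The `fake_http_call` function is a hypothetical stand-in that sleeps instead of using the network, and the service names are illustrative.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_http_call(name: str) -> str:
    """Stand-in for an HTTP call; sleeps instead of hitting the network."""
    time.sleep(0.1)
    return f"{name}: ok"


names = ["pictures", "payment-status", "profile"]

start = time.perf_counter()
# Each call runs in its own thread; map() returns results in input order.
with ThreadPoolExecutor(max_workers=len(names)) as pool:
    results = list(pool.map(fake_http_call, names))
elapsed = time.perf_counter() - start

print(results)
print(f"elapsed: {elapsed:.2f}s (three sequential calls would take about 0.30s)")
```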
52. Search microservice in production
• 19 million hits per day
• ~10 servers across 2 data centers, needed to be "Disaster Recovery Plan" friendly
• Search route average response time: ~163 ms
• Shuffle route average response time: ~336 ms