TopNotch: Systematically Quality Controlling Big Data by David Durst (Spark Summit)
David Durst of BlackRock presents TopNotch, a system for systematically quality controlling big data. TopNotch uses assertions to define and measure data quality, reuses commands across data sets to maximize efficiency, and institutionalizes knowledge of data sets through plans and commands. It provides a unit testing framework for data with assertions to verify facts, diffs to compare data sets, and views to transform data. This solves the problems of defining data quality, efficiently quality controlling many data sets, and institutionalizing knowledge of data sets.
Alexander Pavlenko, Java Software Engineer, DataArt (Alina Vilk)
This document discusses building a Spark connector for Ryft, a hardware appliance that performs high-speed compute on big data. It provides examples of how to query data stored on Ryft using its query language. The Spark connector allows querying and processing Ryft data using Spark and RDDs in a distributed manner. It maps structured Ryft data to DataFrames for SQL queries and shows performance benefits of using Ryft compared to running Spark on EC2 servers.
Introduction to Data Science: A Practical Approach to Big Data Analytics (Ivan Khvostishkov)
On 3 March 2016, the Meetup Moscow Big Systems/Big Data group invited Ivan Khvostishkov, an engineer from EMC Corporation, to speak on the key technologies and tools used in Big Data analytics, explain the differences between Data Science and Business Intelligence, and take a closer look at a real use case from the industry. The materials are useful for engineers and analysts who want to contribute to Big Data projects, database professionals, college graduates, and anyone who wants to learn about Data Science as a career field.
The Protein Regulatory Networks of COVID-19 - A Knowledge Graph Created by El... (Neo4j)
Shobhna Srivastava discusses Elsevier's research citation network. She talks about how the journey of simplifying the existing data processing pipeline, optimising costs, and choosing the right solution to the problem opened the door to other potential use cases and innovation. Graph technology has been applied to the scientific research domain to enhance content discovery.
Unlocking Value in Device Data Using Spark: Spark Summit East talk by John La... (Spark Summit)
HP ships millions of PCs, printers, and other devices every year to customers in all market segments. More customers are seeking services provided with our products, enabling new opportunities for HP to create services from the data we can collect from our devices. Every device we ship is an IoT endpoint with a powerful CPU able to capture rich data. Insights from this data are used internally to improve our products and focus on customer needs.
In this presentation, John will focus on HP’s journey to enabling Big Data analytics from within a large enterprise environment. He will review the challenges and how HP decided on AWS, Apache Spark and Databricks as the foundation for their entry into Big Data Analytics. John will also review how HP uses Spark to build analytic services from the data they generate from their devices.
The document summarizes a project done by Team 10 on text analysis. It involved programmatic data download, cleaning and preprocessing text data, exploratory analysis using TF-IDF, classification using models like logistic regression and random forest, and clustering using LDA and K-means. The aims were to help advertisers identify trending topics for their target audiences and find abusive content. Nested JSON files with blog data were analyzed to find preferred languages and country-wise contributions over time. Clustering identified 4 blog types. The models were implemented in Azure ML Studio with 85% accuracy.
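As a rough illustration of the TF-IDF-plus-clustering step that summary describes (this is not the team's actual code; the posts, parameters, and the choice of k=4, mirroring the "4 blog types" finding, are invented for the example), a scikit-learn sketch looks like this:

```python
# Sketch: TF-IDF features followed by K-means clustering.
# Posts and parameters are invented for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

posts = [
    "spark streaming tutorial for data engineers",
    "best travel destinations this summer",
    "deep learning approaches to text classification",
    "budget travel tips and packing tricks",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(posts)      # sparse TF-IDF term matrix

kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)           # one cluster id per post
print(labels)
```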
Neo4j GraphDay Munich - How to make your GraphDB project successful (Neo4j)
This document discusses Neo4j services that can help customers successfully adopt Neo4j for their graph database projects. It outlines various services Neo4j offers including training, proofs of concept, bootcamps, deployment assistance, expert consulting, and prime project leadership. The document emphasizes that Neo4j services provide senior graph database expertise to help customers avoid common mistakes, accelerate their projects, and ensure successful outcomes.
Ten things to consider for interactive analytics on write-once workloads (Abinasha Karana)
Context: write-once data loads, e.g. time-series data. Which database?
SSD is Good
MPP is Good
Columnar is Good
Logical Partition is Good
Data Skew Partition is Good
Search Engine Index could lead to Index Explosion
Concurrent Users First, Single Query Performance Next
High Throughput File level Snapshot Loading
Calculate cost upfront
Data Structure makes a Big Difference
Introduction to basic data analytics tools (Nascenia IT)
This document introduces basic data analytics tools. It discusses the data analytics pipeline of collecting, refining, storing, analyzing, and presenting data. It describes tools for each step including Requests and BeautifulSoup for data acquisition, Pandas and SQLAlchemy for data processing and storage, R and RStudio for data analysis, and Plotly and Matplotlib for data visualization. Apache Superset is highlighted as a tool for data visualization and exploration. Challenges of data analytics like data quality, privacy, and scaling are also outlined.
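To make the pipeline concrete, here is a minimal sketch of the collect and refine steps with the tools named above; the URL, the CSS selector, and the column name are placeholders, not taken from the original slides:

```python
# Sketch: fetch a page with Requests, parse it with BeautifulSoup,
# load the extracted rows into a Pandas DataFrame.
import requests
import pandas as pd
from bs4 import BeautifulSoup

resp = requests.get("https://example.com/stats")      # placeholder URL
soup = BeautifulSoup(resp.text, "html.parser")

rows = [{"value": cell.get_text(strip=True)}          # refine step
        for cell in soup.select("td")]                # placeholder selector

df = pd.DataFrame(rows)                               # analyze in Pandas
print(df.head())
```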
The document discusses an S3 VFD that allows HDF5 files to be served from object storage. It uses the existing HDF5 library plus new VFD drivers to access HDF5 files stored in S3. The S3 VFD uses range GET requests to read the desired data, and optimizations are performed to avoid small metadata accesses. A new API and data structure are introduced to set credentials for the S3 VFD when opening HDF5 files.
MongoDB as a Data Warehouse: Time Series and Device History Data (Medtronic) - MongoDB
This document discusses MongoDB implementation at Medtronic Energy and Component Center (MECC) to address data management challenges. MECC manufactures components for medical devices and generates a large volume of operational data from various sources. Previously, this data was stored in spreadsheets and relational databases, making it difficult to analyze. MongoDB was implemented to provide a flexible schema for storing component manufacturing and test data as documents. This has allowed for faster querying of complete historical data and improved reporting and analytics. While some gaps remain around enterprise acceptance and tool integration, MongoDB has provided benefits over the previous data management approaches.
Complex analytics should work as nimbly on extremely large data sets as on small ones. You don’t want to think about whether your data fits in memory, about parallelism, or about formatting data for math packages. You’d like to use your favorite analytical language and have it transparently scale up to Big Data volumes.
Paradigm4 presents a webinar about SciDB—the massively scalable, open source, array database with native complex analytics, integrated with R and Python.
Details:
Presenter: Bryan Lewis, Chief Data Scientist, Paradigm4
Day/Time: Tuesday November 12th, 2013 at 1pm EST
Learn how SciDB enables you to:
-Explore rich data sets interactively
-Do complex math in-database without being constrained by memory limitations
-Perform multi-dimensional windowing, filtering, and aggregation
-Offload large computations to a commodity hardware cluster—on-premise or in a cloud
-Use R and Python to analyze SciDB arrays as if they were R or Python objects.
-Share data among users, with multi-user data integrity guarantees and version control
Webinar Agenda:
-Introduction to SciDB
-Demo
-Live Q&A
Massively Scalable Computational Finance with SciDB (Paradigm4 Inc)
Hedge funds, investment managers and prop shops need to keep pace with rapidly growing data volumes from many sources.
SciDB—an advanced computational database programmable from R and Python—scales out to petabyte volumes and facilitates rapid integration of diverse data sources. Open source and running on commodity hardware, SciDB is extensible and scales cost effectively.
Attend this webinar to learn how quants and system developers harness SciDB’s massively scalable complex analytics to solve hard problems faster. SciDB’s native array storage is optimized for time-series data, delivering fast windowed aggregates and complex analytics, without time-consuming data extraction.
Webinar presenters will demonstrate real world use cases, including the ability to quickly:
1. Generate aggregated order books across multiple exchanges
2. Create adjusted continuous futures contracts
3. Analyze complex financial networks to detect anomalous behavior
Converging Database Transactions and Analytics (SingleStore)
Delivered at the Gartner Data and Analytics 2018 show in Texas, this presentation discusses real-time applications and their impact on existing data infrastructures.
Applying graph analytics on data stored in relational databases can provide tremendous value in many application domains. We discuss the importance of leveraging these analyses, and the challenges in enabling them. We present a tool, called GraphGen, that allows users to visually explore, and rapidly analyze (using NetworkX) different graph structures present in their databases.
Presentation given at OR2019 Hamburg
DSpace 7 is a major update of the DSpace platform with a substantially different architecture from previous DSpace versions. The need to provide a fully-fledged REST API layer, on top of which the new Angular (JavaScript) UI has been built, was the opportunity to move to updated technologies, standards and best practices for the REST layer, which had previously been introduced quickly in the DSpace landscape to meet urgent interoperability needs.
Indeed, the goal of the new REST API is to meet level 3 of the well-known Richardson Maturity Model [1], allowing a client application or its developer to learn how to use the API without prior knowledge or the need to consult external information. The adoption of uniform, consistent and self-documented behaviour will drastically reduce the effort required for developers to interact with DSpace.
Moreover, each operation that can be performed from the UI, whether as an anonymous user or as a user with any privilege, including repository managers and administrators, can now be done via a standard REST API. To offer a strong interoperability layer, it is crucial to provide a stable, well-documented and fully tested solution so that integrations will not easily break from one version to another; for this reason the REST development adopted a Test-Driven Development, contract-first [2] approach.
The presentation will illustrate the adopted standards HATEOAS [3], HAL [4], JWT [5] and ALPS [6], showing how to interact with the new REST API to retrieve and manipulate information.
Different integration scenarios will be presented, explaining how the new REST API can be used to implement them. These include reuse of the information available in DSpace in other contexts such as personal, departmental or institutional websites; integration of DSpace in research workflows for data acquisition; embedding of the repository in wider institutional processes, such as ETD preparation, triggering workflow automation in response to external events; and quickly prototyping end-user functionality such as notification services, reporting tools and batch processing in developers' language of preference.
The development processes and technologies [7] will also be introduced briefly, as a reference to provide direction to those interested in customizing the new REST API or participating in its future development.
Implementing BigPetStore with Apache Flink (Márton Balassi)
The document outlines how to implement BigPetStore, a blueprint for Flink users, in under 500 lines of code. It describes generating sample data, performing ETL with both the DataSet and Table APIs, training a matrix factorization model with FlinkML for recommendations, and serving recommendations with the DataStream API. The goal is to demonstrate end-to-end workflows in Flink that go beyond WordCount by mixing APIs for data generation, cleaning, machine learning, and streaming predictions.
Over the past two decades, the Big Data stack has reshaped itself and evolved quickly, with numerous innovations driven by the rise of many different open source projects and communities. In this meetup, speakers from Uber, Alibaba, and Alluxio will share best practices for addressing the challenges and opportunities in developing data architectures using new and emerging open source building blocks. Topics include data format (ORC) optimization, storage security (HDFS), data format (Parquet) layers, and unified data access (Alluxio) layers.
Visualize some of Austin's open source data using Elasticsearch with Kibana. ObjectRocket's Steve Croce presented this talk on 10/13/17 at the DBaaS event in Austin, TX.
MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ... (MongoDB)
Corva's analytics platform enables real-time engineering and machine learning predictions and powers faster and safer drilling. The platform utilizes serverless AWS Lambda and an extensible, data-driven API with MongoDB to handle 100,000+ requests per minute of streaming sensor data.
Getting started with Cosmos DB + Linkurious Enterprise (Linkurious)
Nowadays, many real-world applications generate data that is naturally connected, but traditional systems fail to capture the value it represents. Thanks to its graph API, the multi-model database Cosmos DB lets you model and store graph-like data. On top of Cosmos DB, Linkurious Enterprise is a turnkey solution to detect and investigate insights through an interface for graph data visualization and analysis.
In this presentation, we will explain the value of graphs and show how to get started with Cosmos DB and Linkurious Enterprise to accelerate the discovery of new insights in your connected data.
Apache Arrow: Present and Future @ ScaledML 2020 (Wes McKinney)
This document discusses Apache Arrow, an open source project that provides cross-language data structures and algorithms for efficient data analytics. It summarizes the history and goals of Arrow, provides examples of how it has been adopted, and outlines ongoing development initiatives. Key points include that Arrow aims to accelerate data processing by standardizing columnar data formats and protocols, it has seen widespread adoption with over 50M installs in 2019, and active areas of work include the C++ development platform and Arrow Flight RPC framework.
The document discusses ElasticSearch, an open source search engine and database. It describes how ElasticSearch allows data to flow from various sources into an index using Rivers. It also explains key ElasticSearch concepts like shards, replicas, and index aliases that improve scalability and performance. The document provides examples of ElasticSearch REST API calls for indexing, searching, and retrieving documents.
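For flavor, the kind of REST calls that summary refers to look roughly like the sketch below against a local node; the index and field names are invented, and the exact endpoint shape and response layout vary across Elasticsearch versions:

```python
# Sketch: index a document, then search for it, over Elasticsearch's
# HTTP REST API. Names are placeholders; endpoint shape varies by version.
import requests

base = "http://localhost:9200"

# Index a document with id 1.
requests.put(
    f"{base}/articles/_doc/1",
    json={"title": "Scaling search", "body": "shards and replicas"},
)

# Simple query-string search across the index.
hits = requests.get(f"{base}/articles/_search", params={"q": "shards"}).json()
print(hits.get("hits", {}).get("total"))
```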
This document discusses building distributed search applications using Apache Solr. It provides an agenda that covers topics such as Solr architecture, schema configuration, indexing data, querying, SolrCloud, and performance factors. It also references a demo app that will be used for hands-on examples during the presentation.
Apache Solr is an open-source enterprise search platform that provides fast, scalable, and reliable full-text search functionality. It powers the search capabilities of many large websites and applications. Some key features of Solr include fast indexing and search, faceted search, autocomplete, geospatial search, and integration with various databases and applications.
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli... (lucenerevolution)
Presented by M.C. Srivas | MapR. See conference video - http://www.lucidimagination.com/devzone/events/conferences/lucene-revolution-2012
This session addresses the biggest issue facing Big Data: search, discovery and analytics need to be integrated. While creating and maintaining separate SOLR and Hadoop clusters is time consuming, error prone and difficult to keep in sync, most Hadoop installations do not integrate SOLR within the same cluster. Find out how to easily integrate these capabilities into a single cluster. The session will also touch on some of the technical aspects of Big Data search, including how to protect against the silent index corruption that permeates large distributed clusters, overcome the shard distribution problem by leveraging Hadoop to ensure accurate distributed search results, and provide real-time indexing for distributed search, including support for streaming data capture. Srivas will also share relevant experiences from his days at Google, where he ran one of the major search infrastructure teams and GFS, BigTable and MapReduce were used extensively.
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more... (Oleksiy Panchenko)
In the age of information and big data, ability to quickly and easily find a needle in a haystack is extremely important. Elasticsearch is a distributed and scalable search engine which provides rich and flexible search capabilities. Social networks (Facebook, LinkedIn), media services (Netflix, SoundCloud), Q&A sites (StackOverflow, Quora, StackExchange) and even GitHub - they all find data for you using Elasticsearch. In conjunction with Logstash and Kibana, Elasticsearch becomes a powerful log engine which allows to process, store, analyze, search through and visualize your logs.
Video: https://www.youtube.com/watch?v=GL7xC5kpb-c
Scripts for the Demo: https://github.com/opanchenko/morning-at-lohika-ELK
Scikit-Learn is a powerful machine learning library implemented in Python on top of the numeric and scientific computing powerhouses Numpy, Scipy, and matplotlib, enabling extremely fast analysis of small to medium sized data sets. It is open source, commercially usable and contains many modern machine learning algorithms for classification, regression, clustering, feature extraction, and optimization. For this reason Scikit-Learn is often the first tool in a Data Scientist's toolkit for machine learning on incoming data sets.
The purpose of this one-day course is to serve as an introduction to machine learning with Scikit-Learn. We will explore several clustering, classification, and regression algorithms for a variety of machine learning tasks and learn how to implement these tasks with our data using Scikit-Learn and Python. In particular, we will structure our machine learning models as though we were producing a data product: an actionable model that can be used in larger programs or algorithms, rather than simply as a research or investigation methodology.
Building a real-time big data analytics platform with Solr (Trey Grainger)
Having “big data” is great, but turning that data into actionable intelligence is where the real value lies. This talk will demonstrate how you can use Solr to build a highly scalable data analytics engine to enable customers to engage in lightning fast, real-time knowledge discovery.
At CareerBuilder, we utilize these techniques to report on the supply and demand of the labor force, compensation trends, customer performance metrics, and many live internal platform analytics. You will walk away from this talk with an advanced understanding of faceting, including pivot faceting, geo/radius faceting, time-series faceting, function faceting, and multi-select faceting. You'll also get a sneak peek at some new faceting capabilities just wrapping up development, including distributed pivot facets and percentile/stats faceting, which will be open-sourced.
The presentation will be a technical tutorial, along with real-world use-cases and data visualizations. After this talk, you'll never see Solr as just a text search engine again.
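As a hedged sketch of what such a faceted analytics query can look like in Solr (the collection and field names below are hypothetical, not CareerBuilder's), field and range facets are requested with plain query parameters:

```python
# Sketch: one Solr select request asking for a field facet and a
# 30-day range facet. Collection and field names are hypothetical.
import requests

params = {
    "q": "*:*",
    "rows": 0,                      # only the facet counts are wanted
    "facet": "true",
    "facet.field": "job_category",  # hypothetical string field
    "facet.range": "posted_date",   # hypothetical date field
    "facet.range.start": "NOW/DAY-30DAYS",
    "facet.range.end": "NOW/DAY",
    "facet.range.gap": "+1DAY",
    "wt": "json",
}
resp = requests.get("http://localhost:8983/solr/jobs/select", params=params)
print(resp.json()["facet_counts"]["facet_fields"]["job_category"])
```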
This document discusses a presentation about big data and analytics on AWS. It describes what big data is, provides examples of AWS services for ingesting, storing, processing, analyzing and visualizing big data. It also provides examples of industries using AWS for data analysis and discusses Amazon Kinesis for real-time processing of streaming data. Finally, it discusses putting the various AWS services together in an end-to-end big data workflow.
Python in the Hadoop Ecosystem (Rock Health presentation) - Uri Laserson
A presentation covering the use of Python frameworks on the Hadoop ecosystem. Covers, in particular, Hadoop Streaming, mrjob, luigi, PySpark, and using Numba with Impala.
Scaling Through Partitioning and Shard Splitting in Solr 4 (thelabdude)
Over the past several months, Solr has reached a critical milestone of being able to elastically scale-out to handle indexes reaching into the hundreds of millions of documents. At Dachis Group, we've scaled our largest Solr 4 index to nearly 900M documents and growing. As our index grows, so does our need to manage this growth.
In practice, it's common for indexes to continue to grow as organizations acquire new data. Over time, even the best designed Solr cluster will reach a point where individual shards are too large to maintain query performance. In this Webinar, you'll learn about new features in Solr to help manage large-scale clusters. Specifically, we'll cover data partitioning and shard splitting.
Partitioning helps you organize subsets of data based on data contained in your documents, such as a date or customer ID. We'll see how to use custom hashing to route documents to specific shards during indexing. Shard splitting allows you to split a large shard into 2 smaller shards to increase parallelism during query execution.
Attendees will come away from this presentation with a real-world use case that proves Solr 4 is elastically scalable, stable, and is production ready.
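Two of the mechanisms described above can be sketched concretely, with placeholder collection and shard names: composite-ID routing, where a "customerId!docId" id prefix controls which shard a document hashes to, and the Collections API's SPLITSHARD action:

```python
# Sketch: composite-ID routing and shard splitting in SolrCloud.
# Collection, shard, and field names are placeholders.
import requests

base = "http://localhost:8983/solr"

# The "customer42!" prefix makes all of that customer's documents hash
# to the same shard (Solr's composite-ID router).
doc = {"id": "customer42!doc-1", "customer_s": "customer42"}
requests.post(f"{base}/mycollection/update?commit=true", json=[doc])

# Split shard1 into two smaller shards via the Collections API.
requests.get(f"{base}/admin/collections", params={
    "action": "SPLITSHARD", "collection": "mycollection", "shard": "shard1",
})
```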
This document provides an overview of the Solr search platform, including its main features like full text search, faceting, scalability and APIs. It discusses how Solr indexes and ranks documents, handles queries and facets, and can scale to large datasets through techniques like replication and sharding. The presentation concludes with demonstrating useful Solr commands and its main administrative interface.
Azure Cosmos DB - The Swiss Army NoSQL Cloud Database (BizTalk360)
Microsoft Cosmos DB is the Swiss army NoSQL database in the cloud. It is a multi-model, multi-API, globally distributed, highly available, and secure NoSQL database in Azure. In this session, we will explore its capabilities and features through several demos.
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre (HPCC Systems)
Data-centric approach: our platform is built on the premise of absorbing data from multiple data sources and transforming it into highly intelligent social network graphs that can be processed to reveal non-obvious relationships.
The document discusses modernizing a data warehouse using the Microsoft Analytics Platform System (APS). APS is described as a turnkey appliance that allows organizations to integrate relational and non-relational data in a single system for enterprise-ready querying and business intelligence. It provides a scalable solution for growing data volumes and types that removes limitations of traditional data warehousing approaches.
This document discusses the 5 year evolution of Dataverse, an open source data repository platform. It began as a tool for collaborative data curation and sharing within research teams. Over time, features were added like dataset version control, APIs, and integration with other systems. The document outlines challenges around maintenance and sustainability. It also covers efforts to improve Dataverse's interoperability, such as integrating metadata standards and controlled vocabularies, and making datasets FAIR compliant. The goal is to establish Dataverse as a core component of the European Open Science Cloud by improving areas like software quality, integration with tools, and standardization.
Big Linked Data ETL Benchmark on Cloud Commodity Hardware (Laurens De Vocht)
Linked Data storage solutions often optimize for low latency querying and quick responsiveness. Meanwhile, in the back-end, offline ETL processes take care of integrating and preparing the data. In this paper we explain a workflow and the results of a benchmark that examines which Linked Data storage solution and setup should be chosen for different dataset sizes to optimize the cost-effectiveness of the entire ETL process. The benchmark executes diversified stress tests on the storage solutions. The results include an in-depth analysis of four mature Linked Data solutions with commercial support and full SPARQL 1.1 compliance. Whereas traditional benchmark studies generally deploy the triple stores on premises using high-end hardware, this benchmark uses publicly available cloud machine images for reproducibility and runs on commodity hardware. All stores are tested using their default configuration. In this setting Virtuoso shows the best performance in general. The other three stores show competitive results and have disjoint areas of excellence. Finally, it is shown that each store’s performance heavily depends on the structural properties of the queries, giving an indication of where vendors can focus their optimization efforts.
InfluxEnterprise Architectural Patterns by Dean Sheehan, Senior Director, Pre... (InfluxData)
Dean discusses architecture patterns with InfluxDB Enterprise, covering an overview of InfluxDB Enterprise, features, ingestion and query rates, deployment examples, replication patterns, and general advice.
This presentation was provided by David Kuliman of Elsevier, during the NISO event "Content Presentation: Diversity of Formats." The webinar was held on February 10, 2021.
Getting Started with Splunk Breakout Session (Splunk)
This document provides an overview and agenda for a presentation on getting started with Splunk Enterprise. The presentation covers an overview of Splunk Inc. and the Splunk platform, a live demonstration of using Splunk to install, index, search, create reports and dashboards, and set alerts. It also discusses deploying Splunk in distributed architectures, the Splunk community resources, and support options. The goal is to help attendees understand how to use the key capabilities of Splunk Enterprise.
Overview of Apache Flink: Next-Gen Big Data Analytics Framework (Slim Baltagi)
These are the slides of my talk on June 30, 2015 at the first event of the Chicago Apache Flink meetup. Although most of the current buzz is about Apache Spark, the talk shows how Apache Flink offers the only hybrid open source (Real-Time Streaming + Batch) distributed data processing engine supporting many use cases: Real-Time stream processing, machine learning at scale, graph analytics and batch processing.
In these slides, you will find answers to the following questions: What is Apache Flink stack and how it fits into the Big Data ecosystem? How Apache Flink integrates with Apache Hadoop and other open source tools for data input and output as well as deployment? What is the architecture of Apache Flink? What are the different execution modes of Apache Flink? Why Apache Flink is an alternative to Apache Hadoop MapReduce, Apache Storm and Apache Spark? Who is using Apache Flink? Where to learn more about Apache Flink?
CXAIR is a new business intelligence tool that uses search engine technology to allow fast and easy analysis of large datasets. It can search across multiple data sources and provide sub-second responses to natural language queries. Unlike traditional BI solutions, CXAIR does not require data aggregation or knowledge of SQL/MDX. It provides a more user-friendly interface that is optimized for speed without sacrificing the ability to drill down into detail. The document argues that search technology is the future of business analytics as it can better handle the increasing volumes of data available.
This was presented by the MongoDB team at the Singapore VIP event on 24th Jan 2019.
The presentation covers:
What is MongoDB
Why MongoDB
MongoDB As a Service, Serverless Platform and Mobile
MongoDB Atlas: Database as a Service (Available on AWS, Azure and Google Cloud)
Use cases
This document summarizes the key features of HPE Universal Discovery software. It discovers assets, applications, and infrastructure dependencies across physical, virtual, and cloud environments and maps them automatically. It integrates with the HPE Universal CMDB to provide a comprehensive and continuously updated view of the IT environment that supports incident and change management. Discovery is automated, agent-based, agentless, and passive to provide complete visibility.
The document discusses Microsoft's ALM Search service architecture and design. It describes plans for the search indexing and query pipelines, including using Elastic Search for indexing and querying across artifacts. It addresses security, performance, deployment topology, and futures like semantic search and integration with on-premise systems. Key points include indexing millions of files in hours, scaling out the indexing pipeline, and supporting cross-account and public repository search.
This document discusses FAIR computational workflows and why they are important. It defines computational workflows as multi-step processes for data analysis and simulation that link computational steps and handle data and processing dependencies. Workflows improve reproducibility, enable automation, and allow for increased sharing and reuse of research. The document outlines how applying FAIR principles to workflows makes them findable, accessible, interoperable, and reusable. This includes using standardized metadata, identifiers, licensing, and formats to describe workflows and ensure their components and data are also FAIR. Adopting FAIR workflows requires support from workflow systems, tools, communities and services.
The document discusses various techniques for managing performance and concurrency in SQL Server databases. It covers new features in SQL Server 2008/R2 such as read committed snapshot isolation, partition-level lock escalation, filtered indexes, and bulk loading. It also discusses tools for monitoring performance like the Utility Control Point and Performance Monitor. The document uses case studies to demonstrate how these techniques can be applied.
This document discusses building robust data pipelines that stream events between applications and services in real time using MongoDB and Confluent. It outlines how event-driven architectures with Apache Kafka and MongoDB can help customers address challenges like reacting to new data sources in real time, modernizing applications, and gaining insights from data. Specific use cases are discussed like application modernization, microservices, analytics, and IoT. Customer examples are provided from healthcare, financial services, and other industries. The benefits of MongoDB's document data model and transactions are highlighted. Finally, the document demonstrates MongoDB Atlas and Confluent Platform capabilities.
This presentation was created for the Entrepreneurship and Business Planning lecture of the Computer Systems Engineering master's programme at Tallinn University of Technology.
This document discusses using Ansible to manage PostgreSQL databases. It begins with an introduction to Ansible, explaining that it is an agentless automation tool used for configuration management, deployment, and orchestration. It then provides an overview of installing and using Ansible to provision infrastructure on Amazon Web Services and install PostgreSQL with streaming replication across multiple servers. Key components of Ansible like templates, variables, tasks, and playbooks are demonstrated in an example repository for automating PostgreSQL configuration management.
This presentation was created for the Applied Data Communication lecture of the Computer Systems Engineering master's programme at Tallinn University of Technology.
This document discusses managing PostgreSQL databases using Ansible. It begins with an introduction to Ansible and its key components like inventory, modules, tasks, variables, templates and playbooks. It then demonstrates how to install and use Ansible to provision Amazon EC2 instances, install PostgreSQL packages, setup streaming replication with a master and two standby servers, and add a new standby server. Various PostgreSQL and AWS-specific Ansible modules are also described.
I kept a learning diary for my entrepreneurship class studies at Tallinn University of Technology. Here are my reflections on entrepreneurship. Enjoy reading!
HCL Notes and Domino License Cost Reduction in the World of DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and licensing under the CCB and CCX models have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new kind of licensing works and what benefits it offers you. Above all, you surely want to stay within your budget and save costs wherever possible. We understand that, and we want to help!
We explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also practices that can lead to unnecessary spending, for example using a person document instead of a mail-in for shared mailboxes. We will show you such cases and their solutions. And of course we will explain the new license model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It will give you the tools and the know-how to keep track of everything. You will be able to reduce your costs through an optimized Domino configuration and keep them low going forward.
These topics are covered
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Real-world examples and best practices to apply immediately
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence (IndexBug)
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack (shyamraj55)
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Fueling AI with Great Data with Airbyte Webinar (Zilliz)
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
OpenID AuthZEN Interop Read Out - Authorization (David Brossard)
During Identiverse 2024 and EIC 2024, members of the OpenID AuthZEN WG got together and demoed their authorization endpoints conforming to the AuthZEN API
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features provide convenience and capability while sacrificing security. This best practices guide outlines steps users can take to better protect personal devices and information.
Digital Marketing Trends in 2024 | Guide for Staying Ahead (Wask)
https://www.wask.co/ebooks/digital-marketing-trends-in-2024
Feeling lost in the digital marketing whirlwind of 2024? Technology is changing, consumer habits are evolving, and staying ahead of the curve feels like a never-ending pursuit. This e-book is your compass. Dive into actionable insights to handle the complexities of modern marketing. From hyper-personalization to the power of user-generated content, learn how to build long-term relationships with your audience and unlock the secrets to success in the ever-shifting digital landscape.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
5th LF Energy Power Grid Model Meet-up Slides (DanBrown980551)
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Microsoft Teams session or in person at TU/e, located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
Taking AI to the Next Level in Manufacturing (ssuserfac0301)
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
5. Ideas and approaches to help build your organization's AI strategy.
Driving Business Innovation: Latest Generative AI Advancements & Success Story (Safe Software)
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Unlock the Future of Search with MongoDB Atlas: Vector Search Unleashed (Malak Abu Hammad)
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is a widely used ETL tool for processing, indexing and ingesting data into the serving stack for search. Milvus is a production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to the Milvus vector database for search serving.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers (akankshawande)
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Solr on Cloud
1.
Solr on Cloud
Tallinn University of Technology
Introduction to Development in Cloud by Anton Vedešin
Road Management Team
2.
What is Solr?
“Solr is the popular, blazing-fast open source enterprise search platform built on Apache Lucene™. Solr powers the search and navigation features of many of the world's largest internet sites.”
5.
Why do we need Solr?
optimised for search
large volumes of documents
text-centric
results sorted by relevance
read-dominant
document-oriented
flexible schema
7.
Inverted index
All terms in the index map to one or more documents.
Terms in the inverted index are sorted in ascending lexicographical order.
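A toy Python sketch (nothing like Lucene's real implementation) makes this structure concrete: each term maps to the documents containing it, and the terms are kept in ascending lexicographical order:

```python
# Toy inverted index: each term maps to the ids of the documents that
# contain it; terms print in ascending lexicographical order.
from collections import defaultdict

docs = {
    1: "solr is a search platform",
    2: "lucene powers solr search",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

for term in sorted(index):              # lexicographically sorted terms
    print(term, sorted(index[term]))
```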
16.
Keyword search
relevant results must be returned quickly
spelling correction is needed
autosuggestions save keystrokes
synonyms of query terms must be recognised
phrase handling is needed
queries with common words must be handled
show more results if the top results aren’t satisfactory
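Several of these features map directly onto Solr query parameters. A hedged sketch, assuming the spellcheck component is configured in solrconfig.xml, with an invented core name and query:

```python
# Sketch: a query with spelling correction and collation turned on.
# Core name, fields, and the misspelled query are invented.
import requests

params = {
    "q": "iphne",                  # deliberately misspelled
    "defType": "edismax",          # phrase and common-word handling
    "spellcheck": "true",
    "spellcheck.collate": "true",  # suggest a corrected whole query
    "wt": "json",
}
resp = requests.get("http://localhost:8983/solr/products/select",
                    params=params)
print(resp.json().get("spellcheck", {}).get("collations"))
```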
24.
Data modeling features
Result grouping/field collapsing
Flexible query support
Joins
Document clustering
Importing rich document formats such as PDF, Word
Importing data from relational databases
flat denormalised document
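The "flat denormalised document" note means a Solr document carries all of its fields inline instead of referencing normalized tables. A hypothetical example, combined with the result-grouping feature from the list above:

```python
# Sketch: a flat, denormalised document (all fields inline, no joins at
# query time) plus result grouping on the duplicated brand field.
# Collection and field names are hypothetical.
import requests

base = "http://localhost:8983/solr/products"

doc = {
    "id": "sku-1",
    "name": "Espresso machine",
    "brand_s": "AcmeCoffee",       # brand data copied into every document
    "category_s": "kitchen",
    "price_f": 199.0,
}
requests.post(f"{base}/update?commit=true", json=[doc])

# Collapse results so each brand appears once.
resp = requests.get(f"{base}/select", params={
    "q": "*:*", "group": "true", "group.field": "brand_s", "wt": "json",
})
print(resp.json()["grouped"]["brand_s"]["matches"])
```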
25.
Other important features
Atomic updates with optimistic concurrency
Real-time get
Write-durability using a transaction log
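All three features surface in Solr's update and get APIs. A hedged sketch with invented ids and fields: the _version_ value drives the optimistic concurrency check, and /get serves fresh documents from the transaction log:

```python
# Sketch: real-time get plus an atomic update guarded by optimistic
# concurrency. Ids and fields are invented.
import requests

base = "http://localhost:8983/solr/products"

# /get returns the latest version of a document, even before commit,
# by consulting the transaction log.
current = requests.get(f"{base}/get", params={"id": "sku-1"}).json()["doc"]

# Atomically "set" one field; Solr rejects the update with a conflict
# if the document changed since this _version_ was read.
update = {
    "id": "sku-1",
    "price_f": {"set": 179.0},
    "_version_": current["_version_"],
}
requests.post(f"{base}/update?commit=true", json=[update])
```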
26.
SolrCloud
centralised configuration
distributed indexing with no SPoF
automated failover to a new shard leader
queries can be sent to any node in a cluster to trigger a full, distributed search across all shards, with failover and load-balancing support built in
fault-tolerance & high availability
ZooKeeper
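In practice these properties show up when a collection is created through the Collections API and then queried from any node. A minimal sketch; the host names, collection name, and shard counts are placeholders:

```python
# Sketch: create a sharded, replicated collection, then send a query to
# any node; that node fans the search out across all shards. Host and
# collection names are placeholders.
import requests

requests.get("http://node1:8983/solr/admin/collections", params={
    "action": "CREATE", "name": "logs",
    "numShards": 2, "replicationFactor": 2,
})

# Any node in the cluster can serve the distributed query.
resp = requests.get("http://node2:8983/solr/logs/select",
                    params={"q": "*:*", "wt": "json"})
print(resp.json()["response"]["numFound"])
```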
27.
When not to use Solr!
request a large result set
do deep analytic tasks
querying across relationships
document-level security
30.
Who?
Postgres DBA @ 2ndQuadrant
Studying MSc Comp. & Systems Eng. @ Tallinn University of Technology
Studied BSc Maths Eng. @ Yildiz Technical University
Writes blog posts on the 2ndQuadrant blog
Does some childish paintings
Loves independent films
@apatheticmagpie
Skype: gulcin2ndq
Github: gulcin