Big Data keeps climbing its hype-cycle hill, now above semantics and many other terms. But what do these terms in fact mean? In the leading books about Big Data, the word Semantics does not occur.
Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes. Presto was designed and written from the ground up for interactive analytics and approaches the speed of commercial data warehouses while scaling to the size of organizations like Facebook. One key feature in Presto is the ability to query data where it lives via a uniform ANSI SQL interface. Presto’s connector architecture creates an abstraction layer for anything that can be expressed in a row-like format, such as HDFS, Amazon S3, Azure Storage, NoSQL stores, relational databases, Kafka streams and even proprietary data stores. Furthermore, a single Presto query can combine data from multiple sources, allowing for analytics across your entire organization.
This talk will be co-presented by Facebook and Teradata, the two largest contributors to Presto. The talk will focus on Presto’s ability to query virtually any data source via its connector interface. Facebook and Teradata will present some of their use cases of Presto querying various data sources, discuss the existing connectors in Presto, and describe the anatomy of a connector.
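The connector idea described above can be illustrated with a minimal Python sketch. It is not Presto's actual connector SPI (which is a Java interface); the names `Connector`, `InMemoryConnector`, `federated_join`, and the catalogs `warehouse` and `crm` are hypothetical, chosen only to show how a row-like abstraction lets one query combine two unrelated sources.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Protocol, Tuple

Row = Dict[str, object]


class Connector(Protocol):
    """Hypothetical connector: anything that can yield rows for a named table."""

    def scan(self, table: str) -> Iterator[Row]: ...


@dataclass
class InMemoryConnector:
    """Stand-in for a real source (HDFS, a relational database, Kafka, ...)."""

    tables: Dict[str, List[Row]]

    def scan(self, table: str) -> Iterator[Row]:
        return iter(self.tables[table])


def federated_join(
    catalogs: Dict[str, Connector],
    left: Tuple[str, str],
    right: Tuple[str, str],
    key: str,
) -> Iterator[Row]:
    """Hash-join two tables that live in different catalogs on a shared key."""
    left_catalog, left_table = left
    right_catalog, right_table = right
    # Build a lookup over the right-hand table, then stream the left-hand one.
    index: Dict[object, List[Row]] = {}
    for row in catalogs[right_catalog].scan(right_table):
        index.setdefault(row[key], []).append(row)
    for row in catalogs[left_catalog].scan(left_table):
        for match in index.get(row[key], []):
            yield {**match, **row}


catalogs = {
    "warehouse": InMemoryConnector(
        {"orders": [
            {"user_id": 1, "total": 30},
            {"user_id": 2, "total": 12},
            {"user_id": 1, "total": 5},
        ]}
    ),
    "crm": InMemoryConnector(
        {"users": [
            {"user_id": 1, "name": "ada"},
            {"user_id": 2, "name": "lin"},
        ]}
    ),
}

rows = list(federated_join(catalogs, ("warehouse", "orders"), ("crm", "users"), "user_id"))
```

The join logic never learns where the rows come from; it only relies on `scan`. That is the property the abstract attributes to the connector layer, shown here in a deliberately simplified form.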
Using SparkR to Scale Data Science Applications in Production. Lessons from t...Spark Summit
R is a hugely popular platform for Data Scientists to create analytic models in many different domains. But when these applications move from the science lab to the production environment of large enterprises, a new set of challenges arises. Independently of R, Spark has been very successful as a powerful general-purpose computing platform. With the introduction of SparkR, an exciting new option to productionize Data Science applications has been made available. This talk will give insight into two real-life projects at major enterprises where Data Science applications in R have been migrated to SparkR.
• Dealing with platform challenges: R was not installed on the cluster. We show how to execute SparkR on a YARN cluster with a dynamic deployment of R.
• Integrating Data Engineering and Data Science: we highlight the technical and cultural challenges that arise from closely integrating these two different areas.
• Separation of concerns: we describe how to disentangle ETL and data preparation from analytic computing and statistical methods.
• Scaling R with SparkR: we present what options SparkR offers to scale R applications and how we applied them to different areas such as time series forecasting and web analytics.
• Performance Improvements: we will show benchmarks for an R application that took over 20 hours on a single server/single-threaded setup. With moderate effort we have been able to reduce that number to 15 minutes with SparkR. And we will show how we plan to further reduce this to less than a minute in the future.
• Mixing SparkR, SparkSQL and MLlib: we show how we combined the three different libraries to maximize efficiency.
• Summary and Outlook: we describe what we have learnt so far, what the biggest gaps currently are and what challenges we expect to solve in the short- to mid-term.
PostgreSQL Extension APIs are Changing the Face of Relational Databases | PGC...Teresa Giacomini
PostgreSQL is becoming the relational database of choice. An important factor in the rising popularity of Postgres is the extension APIs that allow developers to improve any database module’s behavior. As a result, Postgres users have access to hundreds of extensions today.
In this talk, we're going to first describe extension APIs. Then, we’re going to present four popular Postgres extensions, and demo their use.
* PostGIS turns Postgres into a spatial database through adding support for geographic objects.
* HLL & TopN add approximation algorithms to Postgres. These algorithms are used when real-time responses matter more than exact results.
* pg_partman makes managing partitions in Postgres easy. Through partitions, Postgres provides 5-10x higher performance for time-series data.
* Citus transforms Postgres into a distributed database. To do this, Citus shards data, performs distributed deadlock detection, and parallelizes queries.
Finally, we’ll conclude with why we think Postgres sets the way forward for relational databases.
PostgreSQL is becoming the relational database of choice. One important factor in the rising popularity of Postgres is the extension APIs. These APIs allow developers to extend any database sub-module’s behavior for higher performance, security, or new functionality. As a result, Postgres users have access to over a hundred extensions today, with more to come in the future.
In this talk, I’m going to first describe PostgreSQL’s extension APIs. These APIs are unique to Postgres, and have the potential to change the database landscape. Then, we’re going to present the four most popular Postgres extensions, show the use cases where they are applicable, and demo their usage.
PostGIS turns Postgres into a spatial database. It adds support for geographic objects, allowing location queries to be run in SQL.
HyperLogLog (HLL) & TopN add approximation algorithms to Postgres. These sketch algorithms are used in distributed systems when real-time responses to queries matter more than exact results.
pg_partman makes creating and managing partitions in Postgres easy. Through careful partition management with pg_partman, Postgres offers 5-10x higher write and query performance for time-series data.
Citus transforms Postgres into a distributed database. Citus transparently shards and replicates data, performs distributed deadlock detection, and parallelizes queries.
After demoing these popular extensions, we’ll conclude with why we think the monolithic relational database is dying and how Postgres sets a path for the future. We’ll end the talk with a Q&A.
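To make the "approximation over exact results" trade-off concrete, here is a minimal HyperLogLog sketch in Python. It is an illustration of the general algorithm, not the implementation used by the Postgres HLL extension; the class name, the choice of SHA-1 as the hash, and the precision `p=12` are assumptions made for this example.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog cardinality estimator (illustrative sketch only)."""

    def __init__(self, p: int = 12):
        self.p = p                    # number of hash bits used to pick a register
        self.m = 1 << p               # number of registers
        self.registers = [0] * self.m

    def add(self, item) -> None:
        # Take 64 bits of a SHA-1 digest as the hash value.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                  # first p bits choose the register
        rest = h & ((1 << (64 - self.p)) - 1)     # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits (1-based).
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)     # bias constant for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


hll = HyperLogLog(p=12)
for i in range(50_000):
    hll.add(f"user-{i}")
    hll.add(f"user-{i}")   # duplicates do not change the registers

estimate = hll.estimate()
```

With 4,096 registers the expected relative error is roughly 1.04 / sqrt(4096), about 1.6%, while memory stays fixed regardless of how many items are added. That fixed footprint is what makes such sketches attractive when a fast answer matters more than an exact count.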
In this era of ever-growing data, the need to analyze it for meaningful business insights becomes more and more significant. There are different Big Data processing alternatives such as Hadoop, Spark, and Storm. Spark, however, is unique in providing batch as well as streaming capabilities, making it a preferred choice for lightning-fast Big Data analysis platforms.
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...Spark Summit
So you know you want to write a streaming app but any non-trivial streaming app developer would have to think about these questions:
How do I manage offsets?
How do I manage state?
How do I make my Spark Streaming job resilient to failures? Can I avoid some failures?
How do I gracefully shut down my streaming job?
How do I monitor and manage (e.g. re-try logic) a streaming job?
How can I better manage the DAG in my streaming job?
When to use checkpointing and for what? When not to use checkpointing?
Do I need a WAL when using a streaming data source? Why? When don’t I need one?
In this talk, we’ll share practices that no one talks about when you start writing your streaming app, but you’ll inevitably need to learn along the way.
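One common answer to the offset and failure questions above is to commit an offset only after its batch has been written, and to make the write idempotent so that a replayed batch is harmless. The following Python sketch illustrates that pattern with in-memory stand-ins; it does not use the Spark Streaming or Kafka APIs, and `InMemoryLog`, `OffsetStore`, `run_batches`, and the `fail_on_batch` switch are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


class InMemoryLog:
    """Hypothetical stand-in for one partition of a replayable log."""

    def __init__(self, records: List[str]):
        self.records = list(records)

    def read(self, offset: int, max_records: int) -> List[str]:
        return self.records[offset:offset + max_records]


@dataclass
class OffsetStore:
    """Hypothetical durable store holding the next offset to process."""

    committed: int = 0


def run_batches(
    log: InMemoryLog,
    store: OffsetStore,
    sink: Dict[int, str],
    batch_size: int = 2,
    fail_on_batch: Optional[int] = None,
) -> None:
    """Process the log in batches with at-least-once delivery."""
    batch_no = 0
    while True:
        start = store.committed
        batch = log.read(start, batch_size)
        if not batch:
            return
        for i, record in enumerate(batch):
            # Idempotent write: keyed by offset, so a replay overwrites itself.
            sink[start + i] = record
        if batch_no == fail_on_batch:
            raise RuntimeError("simulated crash after write, before commit")
        # Commit only once the whole batch is safely in the sink.
        store.committed = start + len(batch)
        batch_no += 1


log = InMemoryLog(["a", "b", "c", "d", "e"])
store = OffsetStore()
sink: Dict[int, str] = {}

try:
    run_batches(log, store, sink, fail_on_batch=1)   # crashes during the second batch
except RuntimeError:
    pass

committed_after_crash = store.committed

run_batches(log, store, sink)                        # restart resumes from the last commit
```

After the simulated crash the second batch is in the sink but not committed, so the restart replays it. Because writes are keyed by offset, the replay produces no duplicates.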
Apache Spark has quickly become a major tool in the problem space of crunching big data. This presentation tells the history of Spark, when and why to use it, and ends with an example of how easy it is to get started!
RISELab: Enabling Intelligent Real-Time Decisions - Jen Aman
Spark Summit East Keynote by Ion Stoica
A long-standing grand challenge in computing is to enable machines to act autonomously and intelligently: to rapidly and repeatedly take appropriate actions based on information in the world around them. To address this challenge, at UC Berkeley we are starting a new five-year effort that focuses on the development of data-intensive systems that provide Real-Time Intelligence with Secure Execution (RISE). Following in the footsteps of AMPLab, RISELab is an interdisciplinary effort bringing together researchers across AI, robotics, security, and data systems. In this talk I’ll present our research vision and then discuss some of the applications that will be enabled by RISE technologies.
This presentation introduces OpenLink Virtuoso -- The Prometheus of RDF -- including Linked Data Verticals and Patterns, involving Web and Big Data, SPARQL and RDF, RDF Tax and many others.
8th TUC Meeting - Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...LDBC council
During the 8th TUC Meeting held at Oracle’s facilities in Redwood City, California, Zhe Wu, Software Architect at Oracle Spatial and Graph, explained how his team is trying to bridge the RDF Graph and Property Graph data models.
GraphTech Ecosystem - part 1: Graph Databases - Linkurious
The graph ecosystem presentation lists and introduces a vast majority of storage systems for graph-like data: native graph databases, RDF databases, multi-model systems or systems with a graph API.
Solution Use Case Demo: The Power of Relationships in Your Big Data - InfiniteGraph
In this security solution demo, we have integrated Oracle NoSQL DB with InfiniteGraph to demonstrate the power of using the right tools for the solution. By integrating the key value technology of Oracle with the InfiniteGraph distributed graph database, we are able to create new views of existing Call Detail Record (CDR) details to enable discovery of connections, paths and behaviors that may otherwise be missed.
Discover how to add value to your existing Big Data to increase revenues and performance!
Demi Ben Ari - Apache Spark 101 - First Steps into distributed computing:
The world has changed: one huge server won’t do the job, and the ability to scale out will be your savior. Apache Spark is a fast and general engine for big data processing, with streaming, SQL, machine learning and graph processing. This talk shows the basics of Apache Spark and distributed computing.
Demi is a software engineer, entrepreneur and international tech speaker.
Demi has over 10 years of experience building systems for both near-real-time applications and Big Data distributed systems.
Co-Founder of the “Big Things” Big Data community and Google Developer Group Cloud.
Big Data Expert, but interested in all kinds of technologies, from front-end to backend, whatever moves data around.
A gentle introduction to Apache Spark, from the theory of Resilient Distributed Datasets to deploying software to the core platform, Spark Streaming, and Spark SQL.
OUG Scotland 2014 - NoSQL and MySQL - The best of both worlds - Andrew Morgan
Understand how you can get the benefits you're looking for from NoSQL data stores without sacrificing the power and flexibility of the world's most popular open source database - MySQL.
In-depth overview of Oracle Real Application Clusters (RAC) 12c Release 2, which was first presented during UKOUG Tech16 under the title "Under the Hood of Oracle Real Application Clusters (RAC) 12c Release 2" and before Oracle Database 12c Release 2 became generally available (GA) in March 2017.
Introduction to Property Graph Features (AskTOM Office Hours part 1) - Jean Ihm
1st in the AskTOM Office Hours series on graph database technologies. https://devgym.oracle.com/pls/apex/dg/office_hours/3084
Xavier Lopez (PM Senior Director) and Zhe Wu (Graph Architect) will share a brief intro to what property graphs can do for you, and take your questions - on property graphs or any other aspect of Oracle Database Spatial and Graph features. With property graphs, you can analyze relationships in Big Data like social networks, financial transactions, or IoT sensor networks; identify influencers; discover patterns of fraudulent behavior; recommend products, and much more -- right inside Oracle Database.
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...Databricks
There’s a growing number of data scientists that use R as their primary language. While the SparkR API has made tremendous progress since release 1.6, with major advancements in Apache Spark 2.0 and 2.1, it can be difficult for traditional R programmers to embrace the Spark ecosystem.
In this session, Zaidi will discuss the sparklyr package, which is a feature-rich and tidy interface for data science with Spark, and will show how it can be coupled with Microsoft R Server and extended with its lower-level API to become a full, first-class citizen of Spark. Learn how easy it is to go from single-threaded, memory-bound R functions to multi-threaded, multi-node, out-of-memory applications that can be deployed in a distributed cluster environment with minimal code changes. You’ll also get best practices for reproducibility and performance by looking at a real-world case study of default risk classification and prediction entirely through R and Spark.
Smart TV Buyer Insights Survey 2024 by 91mobiles - 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Neuro-symbolic is not enough, we need neuro-*semantic* - Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this is illustrated with link prediction over knowledge graphs, but the argument is general.
Accelerate your Kubernetes clusters with Varnish Caching - Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Connector Corner: Automate dynamic content and events by pushing a button - DianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
The Art of the Pitch: WordPress Relationships and Sales - Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Expect practical tips and strategies for successful relationship building that leads to closing the deal.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses.
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.