Session I presented at BGOUG in June 2016.
Big data is one of the biggest buzzwords in today's market. Terms like Hadoop, HDFS, YARN, Sqoop, and non-structured data have been scaring DBAs since 2010 - but where does the DBA team really fit in?
In this session, we will discuss everything database administrators and database developers need to know about big data. We will demystify the Hadoop ecosystem and explore the different components. We will learn how HDFS and MapReduce are changing the data world, and where traditional databases fit into the grand scheme of things. We will also talk about why DBAs are the perfect candidates to transition into Big Data and Hadoop professionals and experts.
Things Every Oracle DBA Needs To Know About The Hadoop Ecosystem – Zohar Elkayam
This presentation was given, in one form or another, in multiple forums. It is a short introduction to Big Data and the Hadoop ecosystem for Oracle personnel (DBAs and DB developers).
In the agenda:
• What is the Big Data challenge?
• A Big Data Solution: Apache Hadoop
• HDFS
• MapReduce and YARN
• Hadoop Ecosystem: HBase, Sqoop, Hive, Pig and other tools
• Another Big Data Solution: Apache Spark
• Where does the DBA fit in?
This presentation was presented in DOAG 2016, HROUG 2016, BGOUG 2016, ILOUG Tech Days 2016 and other small private sessions (Israel Technology Police leaders, CIO forum, Amdocs, and others).
Exploring Oracle Multitenant in Oracle Database 12c – Zohar Elkayam
The Oracle Multitenant architecture is one of the biggest changes in Oracle 12c. In this presentation, we will review this major change and see how it can be effective for daily use.
The agenda:
- The Multitenant Container Database Architecture
- Multitenant Benefits and Impacts
- CDB and PDB Deployments and Provisioning
- Tools and Self-service tools
This presentation is based on the work of Ami Aharonovich and was adapted with his permission.
Adding Real-Time Reporting to Your Database: Oracle DB In-Memory – Zohar Elkayam
This is a presentation I gave at the UKOUG Scotland user conference in June 2015. It describes a proof of concept we did for Clarizen on the Oracle Database 12c In-Memory option.
Introduction to Oracle Data Guard Broker – Zohar Elkayam
This is an old deck I recently refreshed for a customer session. It is an introduction to the Oracle Data Guard Broker feature: how to deploy it, how to use it, and what its benefits are.
This presentation is based on version 11g, but most of it is also compatible with Oracle 12c.
Agenda:
- Oracle Data Guard overview
- Data Guard Broker introduction
- Configuring and using Data Guard Broker
- Live Demos
Introduction to Big Data and NoSQL.
This presentation was given to the Master DBA course at John Bryce Education in Israel.
Work is based on presentations by Michael Naumov, Baruch Osoveskiy, Bill Graham and Ronen Fidel.
The Hadoop Ecosystem for Developers – a session at DevGeekWeek in Israel.
This was a day-long session about big data problems and the Hadoop solution. We also talked about Spark and NoSQL.
Things Every Oracle DBA Needs to Know About the Hadoop Ecosystem 20170527 – Zohar Elkayam
Big data is one of the biggest buzzwords in today's market. Terms such as Hadoop, HDFS, YARN, Sqoop, and non-structured data have been scaring DBAs since 2010, but where does the DBA team really fit in?
In this session, we will discuss everything database administrators and database developers need to know about big data. We will demystify the Hadoop ecosystem and explore the different components. We will learn how HDFS and MapReduce are changing the data world and where traditional databases fit into the grand scheme of things. We will also talk about why DBAs are the perfect candidates to transition into big data and Hadoop professionals and experts.
This is the presentation I gave in Kscope17, on June 27, 2017.
MySQL 5.7 New Features for Developers – a session for DOAG (Oracle user group conference) in 2016. A similar version was also presented at the Israel MySQL User Group in November 2016.
This presentation reviews new features in MySQL 5.7: the optimizer, the InnoDB engine, the native JSON data type, and the performance and sys schemas.
Docker Concepts for Oracle/MySQL DBAs and DevOps – Zohar Elkayam
Oracle Week 2017 Slides
Agenda:
Docker overview – why do we even need containers?
Installing Docker and getting started
Images and Containers
Docker Networks
Docker Storage and Volumes
Oracle and Docker
Docker tools, GUI and Swarm
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAs – Zohar Elkayam
Oracle Week 2017 slides.
Agenda:
Basics: How and What To Tune?
Using the Automatic Workload Repository (AWR)
Using AWR-Based Tools: ASH, ADDM
Real-Time Database Operation Monitoring (12c)
Identifying Problem SQL Statements
Using SQL Performance Analyzer
Tuning Memory (SGA and PGA)
Parallel Execution and Compression
Oracle Database 12c Performance New Features
This is a presentation I gave at the UKOUG user conference in Scotland. SQLcl is a new command line tool from the developers of SQL Developer in Oracle. This presentation is accompanied by a live demo that can be downloaded from my blog.
Oracle Database In-Memory Option for ILOUG – Zohar Elkayam
Oracle 12.1.0.2 introduced a new feature: the Oracle In-Memory option (Database In-Memory, DBIM).
This is the presentation I gave to the ILOUG DBA SIG, where I introduced the technology and how to use it.
Oracle 12c New Features For Better Performance – Zohar Elkayam
Oracle 12cR1 and 12cR2 came with some great features for better performance and scaling. In this session we will talk about some of the new features that might improve performance greatly: optimizer changes, adaptive plan improvements, changes to statistics gathering, and Oracle 12cR2's new sharding option.
On the agenda:
- Oracle Database In Memory (Column Store)
- Oracle Sharding (12.2.0.1)
- Optimizer changes in 12c
- Statistics changes in 12c.
Presented first at ilOUG - Israel Oracle User Group meetup in February 2017.
[including the promised hidden slide :)]
Fast, Flexible Application Development with Oracle Database Cloud Service – Gustavo Rene Antunez
Developing applications to run on the most important database manager in the world? Why not do it in the cloud? With Oracle Database Cloud Service, developers can quickly and easily access the power and flexibility of the Oracle database in the cloud. With a choice between an instance or a dedicated database with full administrative control, or a schema dedicated to a development platform with full deployment managed by Oracle, developers can decide how much control they have over their development environments. Attend this session to learn more about the features and benefits of Oracle Database Cloud.
At the 2014 edition of its OpenWorld, Oracle rolled out a new database public cloud service with its DBaaS offerings, but this is just one piece of each company's technological architecture. Businesses still need to create a private cloud and discover the driver for creating it; whether it is a measured service, consolidation, or rapid provisioning, finding this driver will be the initial building block. This presentation will give you insight into how a private cloud is architected, how the service catalog is the most important brick, and how to benefit from this upcoming era of databases.
An AMIS Overview of Oracle database 12c (12.1) – Marco Gralike
Presentation used by Lucas Jellema and Marco Gralike during the AMIS Oracle Database 12c Launch event on Monday the 15th of July 2013 (much thanks to Tom Kyte, Oracle, for being allowed to use some of his material)
Providing truly interactive and scalable BI on Hadoop has proven to be one of the biggest challenges preventing legacy EDW OLAP systems from completing their transition to Hadoop. While we have all seen many benchmarks running consecutive queries and claiming success, having thousands of concurrent business users sending complicated generated queries from their dashboards over billions of records, while delivering interactive speed, is yet to be seen.
In this session we will discuss how an architecture that replaces the full-scan, brute-force approach with adaptive indexing and auto-generated cubes can dramatically reduce the resources and effort per query, resulting in interactive performance for high-concurrency workloads, and we will explain how this is achieved with minimal data engineering effort. We will also discuss how this architecture can be seamlessly integrated with Hive to provide a complete OLAP-on-Hadoop solution.
The session will include a live demo of complex business dashboards connected to Hive and accessing billions of rows at interactive speed.
Speaker
Boaz Raufman, CTO and Co-Founder, JethroData
Rapid Cluster Computing with Apache Spark 2016 – Zohar Elkayam
This is the presentation I used for the Oracle Week 2016 session about Apache Spark.
In the agenda:
- The Big Data problem and possible solutions
- Basic Spark Core
- Working with RDDs
- Working with Spark Cluster and Parallel programming
- Spark modules: Spark SQL and Spark Streaming
- Performance and Troubleshooting
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle... – Rittman Analytics
Most DBAs are aware something interesting is going on with big data and the Hadoop product ecosystem that underpins it, but aren't so clear about what each component in the stack does, what problem each part solves, and why those problems couldn't be solved using the old approach. We'll look at where it's all going with the advent of Spark and machine learning, what's happening with ETL, metadata, and analytics on this platform, and why IaaS and data-warehousing-as-a-service will have such a big impact, sooner than you think.
A summarized version of a presentation on Big Data architecture, covering everything from the Big Data concept to Hadoop and tools like Hive, Pig, and Cassandra.
Hi all, this is a presentation about big data analysis done using a data mining tool known as Hadoop, which is based on a distributed file system and uses parallel computing.
The art of querying – newest and advanced SQL techniques – Zohar Elkayam
Presentation from Oracle Week 2017.
Agenda:
Aggregative and advanced grouping options
Analytic functions, ranking and pagination
Hierarchical and recursive queries
Regular Expressions
Oracle 12c new rows pattern matching
XML and JSON handling with SQL
Oracle 12c (12.1 + 12.2) new features
SQL Developer Command Line tool (if time allows)
Oracle 18c
Oracle Advanced SQL and Analytic Functions – Zohar Elkayam
Even though DBAs and developers are writing SQL queries every day, it seems that advanced SQL techniques such as multidimension aggregation and analytic functions still remain relatively unknown. In this session, we will explore some of the common real-world usages for analytic function and understand how to take advantage of this great and useful tool. We will deep dive into ranking based on values and groups, understand aggregation of multiple dimensions without a group by, see how to do inter-row calculations, and much more.
These are the presentation slides from Kscope17, presented on June 28, 2017.
Things Every Oracle DBA Needs to Know About the Hadoop Ecosystem (c17lv version) – Zohar Elkayam
Big data is one of the biggest buzzwords in today's market. Terms like Hadoop, HDFS, YARN, Sqoop, and non-structured data have been scaring DBAs since 2010 - but where does the DBA team really fit in?
In this session, we will discuss everything database administrators and database developers need to know about big data. We will demystify the Hadoop ecosystem and explore the different components. We will learn how HDFS and MapReduce are changing the data world, and where traditional databases fit into the grand scheme of things. We will also talk about why DBAs are the perfect candidates to transition into Big Data and Hadoop professionals and experts.
Learning Objective #1: What is the Big Data challenge
Learning Objective #2: Learn about Hadoop - HDFS, MapReduce and Yarn
Learning Objective #3: Understand where a DBA fits in this world
Advanced PL/SQL Optimizing for Better Performance 2016 – Zohar Elkayam
This is the presentation I used for the Oracle Week 2016 session. It includes new features from both 12cR1 and 12cR2.
Agenda:
Developing PL/SQL:
- Composite datatypes, advanced cursors, dynamic SQL, tracing, and more…
Compiling PL/SQL:
- dependencies, optimization levels, and DBMS_WARNING
Tuning PL/SQL:
- GTT, Result cache and Memory handling
- Oracle 11g, 12cR1 and 12cR2 new useful features
- SQLcl – New replacement tool for SQL*Plus (if we have time)
This is a presentation from Oracle Week 2016 (Israel). It is a newer version of last year's, with new 12cR2 features and a demo.
In the agenda:
Aggregative and advanced grouping options
Analytic functions, ranking and pagination
Hierarchical and recursive queries
Regular Expressions
Oracle 12c new rows pattern matching
XML and JSON handling with SQL
Oracle 12c (12.1 + 12.2) new features
SQL Developer Command Line tool
OOW2016: Exploring Advanced SQL Techniques Using Analytic Functions – Zohar Elkayam
This is the presentation I gave at Oracle OpenWorld 2016; the topic was group functions and analytic functions.
We talked about reporting analytic functions, ranking, and a couple of Oracle 12c new features like the top-n query syntax and pattern matching.
This version has the bonus slides which were not presented at the event itself, as promised.
Is SQLcl the Next Generation of SQL*Plus? – Zohar Elkayam
Session I presented at ILOUG in May 2016.
Introducing the new tool from the developers of SQL Developer: SQLcl – a new command line tool from the SQL Developer team that might replace SQL*Plus, which has been around for over 30 years, and all of its functions!
In this session, we will explore the new functionality of SQLcl and use a live demonstration to show what SQLcl has to offer over the old SQL*Plus. We will use real-life examples to see what makes this tool such a time saver in day-to-day tasks for DBAs and developers who prefer using the command line interface.
Exploring Advanced SQL Techniques Using Analytic Functions – Zohar Elkayam
Session I presented at ILOUG in May 2016.
Even though DBAs and developers are writing SQL queries every day, it seems that advanced SQL techniques such as multi-dimension aggregation and analytic functions still remain relatively unknown. In this session, we will explore some of the common real-world usages for analytic functions and understand how to take advantage of this great and useful tool. We will deep dive into ranking based on values and groups; understand aggregation of multiple dimensions without a group by; see how to do inter-row calculations, and much, much more…
Together we will see how we can unleash the power of analytics using Oracle 11g best practices and Oracle 12c new features.
Exploring Advanced SQL Techniques Using Analytic Functions – Zohar Elkayam
Session I presented at BGOUG in June 2016.
Even though DBAs and developers are writing SQL queries every day, it seems that advanced SQL techniques such as multi-dimension aggregation and analytic functions still remain relatively unknown. In this session, we will explore some of the common real-world usages for analytic functions and understand how to take advantage of this great and useful tool. We will deep dive into ranking based on values and groups; understand aggregation of multiple dimensions without a group by; see how to do inter-row calculations, and much, much more…
Together we will see how we can unleash the power of analytics using Oracle 11g best practices and Oracle 12c new features.
Advanced PL/SQL Optimizing for Better Performance – Zohar Elkayam
A Presentation from Oracle Week 2015 in Israel
Agenda:
• Developing PL/SQL:
o Composite Data Types: Records, Collections and Table type
o Advanced Cursors: Ref cursor, Cursor function, Cursor subquery in PL/SQL
o Bulk Binding
o Dynamic SQL – SQL Injection
o Tracing PL/SQL Execution
o Design patterns for PL/SQL: Autonomous Transactions, Invoker and Definer rights, serially_reusable code
o Triggers Improvements
• Compiling PL/SQL:
o PL/SQL Fine-Grain Dependency Management
o PLSQL_OPTIMIZE_LEVEL parameter
o PL/SQL Compile-Time Warnings and Using DBMS_WARNING package
• Tuning PL/SQL:
o Handling Packages in Memory
o Global Temporary Tables
o PL/SQL Function Result Cache and pitfalls
• Oracle Database 12c PL/SQL new features: What is new in Oracle 12c
o Language Usability Enhancements
o New Limitations
• Additional useful features, Tips and Tricks for better performance
Oracle Week 2015 presentation (Presented on November 15, 2015)
Agenda:
Aggregative and advanced grouping options
Analytic functions, ranking and pagination
Hierarchical and recursive queries
Oracle 12c new rows pattern matching feature
XML and JSON handling with SQL
Regular Expressions
SQLcl – a new replacement tool for SQL*Plus from Oracle
DevOps and Testing slides at DASA Connect – Kari Kakkonen
Slides by me and Rik Marselis from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We closed with a lovely workshop in which the participants tried to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... – UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
GraphRAG is All You need? LLM & Knowledge Graph – Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Transcript: Selling digital books in 2024: Insights from industry leaders - T... – BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Connector Corner: Automate dynamic content and events by pushing a button – DianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
State of ICS and IoT Cyber Threat Landscape Report 2024 preview – Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities, spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Securing your Kubernetes cluster: a step-by-step guide to success! – KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
UiPath Test Automation using UiPath Test Suite series, part 3 – DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation introduction
UI automation sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Neuro-symbolic is not enough, we need neuro-*semantic* – Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
JMeter webinar - integration with InfluxDB and Grafana – RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
2. Who am I?
• Zohar Elkayam, CTO at Brillix
• DBA, team leader, database trainer, public speaker, and a senior consultant for over 18 years
• Oracle ACE Associate
• Involved with Big Data projects since 2011
• Blogger – www.realdbamagic.com and www.ilDBA.co.il
3. About Brillix
• Brillix is a leading company that specializes in Data Management
• We provide professional services, training and consulting for Databases, Security, NoSQL, and Big Data solutions
• Providing the Brillix Big Data Experience Center
4. Agenda
• What is the Big Data challenge?
• A Big Data Solution: Hadoop
• HDFS
• MapReduce and YARN
• Hadoop Ecosystem: HBase, Sqoop, Hive, Pig and other tools
• Where does the DBA fit in?
7. Volume
• Big data comes in one size: big.
• Size is measured in Terabytes (10^12), Petabytes (10^15), Exabytes (10^18), Zettabytes (10^21)
• The storing and handling of the data becomes an issue
• Producing value out of the data in a reasonable time is an issue
8. Variety
• Big Data extends beyond structured data, including semi-structured and unstructured information: logs, text, audio and videos
• Wide variety of rapidly evolving data types requires highly flexible stores and handling
Un-Structured       | Structured
Objects             | Tables
Flexible            | Columns and Rows
Structure Unknown   | Predefined Structure
Textual and Binary  | Mostly Textual
9. Velocity
• The speed at which the data is being generated and collected
• Streaming data and large-volume data movement
• High velocity of data capture – requires rapid ingestion
• Might cause a backlog problem
10. Okay, So What Defines Big Data?
• When the data is too big or moves too fast to handle in a sensible amount of time
• When the data doesn't fit any conventional database structure
• When the solution to the business need becomes part of the problem
• When we think that we can still produce value from that data and want to handle it
11. Value
Big data is not about the size of the data; it's about the value within the data
14. Big Data in Practice
• Big data is big: technological infrastructure solutions are needed
• Big data is complicated:
• We need developers to manage handling of the data
• We need DevOps to manage the clusters
• We need data analysts and data scientists to produce value
15. Infrastructure Challenges
• Infrastructure that is built for:
• Large-scale
• Distributed / scaled out
• Data-intensive jobs that spread the problem across clusters of server nodes
16. Infrastructure Challenges (cont.)
• Storage: efficient and cost-effective enough to capture and store terabytes, if not petabytes, of data
• Network infrastructure that can quickly import large data sets and then replicate them to various nodes for processing
• Security capabilities that protect highly distributed infrastructure and data
17. A Big Data Solution: Apache Hadoop
18. Apache Hadoop
• Open source project run by the Apache Foundation (2006)
• Hadoop brings the ability to cheaply process large amounts of data, regardless of its structure
• It has been the driving force behind the growth of the big data industry
• Get the public release from:
• http://hadoop.apache.org/core/
19. Original Hadoop Components
• HDFS: Hadoop Distributed File System – a distributed file system that runs in a clustered environment
• MapReduce – a programming paradigm for running processes over clustered environments.
Main idea: bring the program to the data
20. Hadoop Benefits
• Reliable solution based on unreliable hardware
• Designed for large files
• Load data first, structure later
• Designed to maximize throughput of large scans
• Designed to leverage parallelism
• Designed for scale out
• Flexible development platform
• Solution Ecosystem
21. What Hadoop Is Not?
• Hadoop is not a database – it is not a replacement for a DW, or for other relational databases
• Hadoop is not for OLTP/real-time systems
• Very good for large amounts of data, not so much for smaller sets
• Designed for clusters – there is no Hadoop monster server (single server)
22. Hadoop Limitations
• Hadoop is scalable but it’s not fast
• Some assembly is required
• Batteries are not included
• DIY mindset
• Open source limitations apply
• Technology is changing very rapidly
24. Original Hadoop 1.0 Components
• HDFS: Hadoop Distributed File System – a distributed file system that runs in a clustered environment
• MapReduce – a programming paradigm for running processes over clustered environments
25. Hadoop 2.0
• Hadoop 2.0 changed the Hadoop concept and introduced better resource management and speed:
• Hadoop Common
• HDFS
• YARN
• Multiple data processing frameworks including MapReduce, Spark and others
26. HDFS is...
• A distributed file system
• Designed to reliably store data using commodity hardware
• Designed to expect hardware failures and stay resilient
• Intended for large files
• Designed for batch inserts
27. Files and Blocks
• Files are split into blocks (single unit of storage)
• Managed by Namenode and stored on Datanodes
• Transparent to users
• Replicated across machines at load time
• Same block is stored on multiple machines
• Good for fault-tolerance and access
• Default replication factor is 3
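As a rough worked example (the numbers are assumptions for illustration: a 1 GB file, the 128 MB default block size, and the default replication factor of 3), the block math in Python looks like this:
import math
file_size_mb = 1024      # a hypothetical 1 GB file
block_size_mb = 128      # HDFS default block size (configurable)
replication_factor = 3   # HDFS default replication factor
blocks = math.ceil(file_size_mb / block_size_mb)
replicas = blocks * replication_factor
print(blocks, replicas)  # 8 blocks, 24 physical block copies on the cluster
So one logical file becomes 8 blocks, and the cluster spreads 24 physical block copies over the datanodes.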
28. HDFS Node Types
HDFS has three types of nodes:
• Datanodes
• Responsible for actual file storage
• Serving data from files to clients
• Namenode (MasterNode)
• Distributes files in the cluster
• Responsible for the replication between the datanodes and for file block locations
• BackupNode
• Backup node for the NameNode
29. HDFS is Good for...
• Storing large files
• Terabytes, Petabytes, etc...
• Millions rather than billions of files
• 128MB or more per file
• Streaming data
• Write once and read-many times patterns
• Optimized for streaming reads rather than random reads
30. HDFS is Not So Good For...
• Low-latency reads / real-time applications
• High throughput rather than low latency for small chunks of data
• HBase addresses this issue
• Large amounts of small files
• Better for millions of large files instead of billions of small files
• Multiple writers
• Single writer per file
• Writes at the end of files; no support for arbitrary offsets
31. MapReduce is...
• A programming model for expressing distributed computations at a massive scale
• An execution framework for organizing and performing such computations
• Bring the code to the data, not the data to the code
32. The MapReduce Paradigm
• Imposes key-value input/output
• We implement two functions:
• MAP – takes a large problem, divides it into sub-problems, and performs the same function on all sub-problems
Map(k1, v1) -> list(k2, v2)
• REDUCE – combines the output from all sub-problems (each key goes to the same reducer)
Reduce(k2, list(v2)) -> list(v3)
• Framework handles everything else (almost)
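To make the paradigm concrete, here is a minimal word-count sketch in Python. It is not Hadoop code – it simulates the shuffle-and-sort step locally with sorted() and groupby() – but the two functions follow the Map(k1, v1) -> list(k2, v2) and Reduce(k2, list(v2)) -> list(v3) signatures above:
from itertools import groupby
from operator import itemgetter
def map_fn(_, line):
    # Map(k1, v1) -> list(k2, v2): emit (word, 1) for every word in a line
    return [(word, 1) for word in line.split()]
def reduce_fn(word, counts):
    # Reduce(k2, list(v2)) -> list(v3): sum the counts for each word
    return (word, sum(counts))
lines = ["big data is big", "hadoop handles big data"]
mapped = [pair for line in lines for pair in map_fn(None, line)]
# The framework normally handles this shuffle-and-sort barrier for us
shuffled = groupby(sorted(mapped), key=itemgetter(0))
print([reduce_fn(word, (c for _, c in pairs)) for word, pairs in shuffled])
# [('big', 3), ('data', 2), ('hadoop', 1), ('handles', 1), ('is', 1)]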
34. MapReduce is Good For...
• Embarrassingly parallel algorithms
• Summing, grouping, filtering, joining
• Off-line batch jobs on massive data sets
• Analyzing an entire large dataset
35. MapReduce is NOT Good For...
• Jobs that need shared state or coordination
• Tasks are shared-nothing
• Shared state requires a scalable state store
• Low-latency jobs
• Jobs on smaller datasets
• Finding individual records
36. YARN
• Takes care of distributed processing and coordination
• Scheduling
• Jobs are broken down into smaller chunks called tasks
• These tasks are scheduled to run on data nodes
• Task localization with data
• Framework strives to place tasks on the nodes that host the segment of data to be processed by that specific task
• Code is moved to where the data is
37. YARN
• Error handling
• Failures are an expected behavior, so tasks are automatically retried on other machines
• Data synchronization
• Shuffle-and-sort barrier re-arranges and moves data between machines
• Input and output are coordinated by the framework
39. Improving Hadoop
• Core Hadoop is complicated, so some tools and solution frameworks were added to make things easier
• There are over 80 different Apache projects for big data solutions which use Hadoop
• Hadoop distributions collect some of these tools and release them as a complete package
• Cloudera
• HortonWorks
• MapR
• Amazon EMR
41. Databases and DB Connectivity
• HBase: a NoSQL key/value, wide-column oriented datastore that is native to HDFS
• Sqoop: a tool designed to import data from relational databases into Hadoop (HDFS, HBase, or Hive) and export it back
42. HBase
• HBase is the closest thing we had to a database in the early Hadoop days
• A distributed key/value, wide-column oriented database built on top of HDFS, providing Bigtable-like capabilities
• Does not have a query language: only get, put, and scan commands
• Usually compared with Cassandra (a non-Hadoop-native Apache project)
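As a hedged illustration of that get/put/scan access pattern, here is a sketch using the third-party happybase Python client (the client choice, host name, table, and column family are all assumptions for the example; HBase itself ships with a shell and a Java API):
import happybase
connection = happybase.Connection('hbase-host')  # assumed Thrift server host
table = connection.table('users')                # hypothetical table
# put: write cells for a row key
table.put(b'user:1001', {b'info:name': b'Zohar', b'info:role': b'DBA'})
# get: fetch a single row by key
print(table.row(b'user:1001'))
# scan: iterate over a range of row keys
for key, data in table.scan(row_start=b'user:1000', row_stop=b'user:2000'):
    print(key, data)
Note there is no WHERE clause and no join anywhere – everything is driven by the row key, which is why key design is the central modeling decision in HBase.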
43. When do we use HBase?
• Huge volumes of randomly accessed data
• HBase is at its best when it's accessed in a distributed fashion by many clients (high consistency)
• Consider HBase when we are loading data by key, searching data by key (or range), serving data by key, querying data by key, or when storing data by row that doesn't conform well to a schema.
44. When NOT to use HBase
• HBase doesn’t use SQL, don’t have an optimizer,
doesn’t support in transactions or joins
• HBase doesn’t have data types
• See project Apache Phoenix for better data structure
and query language when using HBase
44
45. Sqoop2
• Sqoop is a command line tool for moving data from RDBMS to Hadoop
• Uses a MapReduce program or Hive to load the data
• Can also export data from HBase to RDBMS
• Comes with connectors to MySQL, PostgreSQL, Oracle, SQL Server and DB2.
• Example:
$ bin/sqoop import --connect 'jdbc:sqlserver://10.80.181.127;username=dbuser;password=dbpasswd;database=tpch' --table lineitem --hive-import
$ bin/sqoop export --connect 'jdbc:sqlserver://10.80.181.127;username=dbuser;password=dbpasswd;database=tpch' --table lineitem --export-dir /data/lineitemData
46. Improving MapReduce Programmability
• Pig: a programming language that simplifies Hadoop actions: loading, transforming and sorting of data
• Hive: enables Hadoop to operate as a data warehouse using SQL-like syntax
47. Pig
• Pig is an abstraction on top of Hadoop
• Provides a high level programming language designed for data processing
• Scripts are converted into MapReduce code and executed on the Hadoop clusters
• Makes ETL processing and other simple MapReduce jobs easier without writing MapReduce code
• Often replaced by more up-to-date tools like Apache Spark
48. Hive
• Data warehousing solution built on top of Hadoop
• Provides an SQL-like query language named HiveQL
• Minimal learning curve for people with SQL expertise
• Data analysts are the target audience
• Early Hive development work started at Facebook in 2007
49. Hive Provides
• Ability to bring structure to various data formats
• Simple interface for ad hoc querying, analyzing and summarizing large amounts of data
• Access to files on various data stores such as HDFS and HBase
• Also see: Apache Impala (mainly in Cloudera)
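As a hedged sketch of how an analyst might use HiveQL from Python (the third-party PyHive client, the host, port, and the web_logs table are assumptions for the example; the hive CLI or any JDBC/ODBC client works just as well):
from pyhive import hive
conn = hive.connect(host='hive-server-host', port=10000)  # assumed host/port
cursor = conn.cursor()
# HiveQL reads like SQL; Hive compiles it into distributed jobs
# that scan files sitting on HDFS
cursor.execute("""
    SELECT level, COUNT(*) AS events
    FROM web_logs
    GROUP BY level
""")
for level, events in cursor.fetchall():
    print(level, events)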
50. Improving Hadoop – More useful tools
• For improving coordination: ZooKeeper
• For improving log collection: Flume
• For improving scheduling/orchestration: Oozie
• For improving UI: Hue/Ambari
51. ZooKeeper
• ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services
• It allows distributed processes to coordinate with each other through a shared hierarchical namespace which is organized similarly to a standard file system
• ZooKeeper stamps each update with a number that reflects the order of all ZooKeeper transactions
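A hedged sketch of that shared namespace, using the third-party kazoo Python client (the client choice and the ensemble address are assumptions; ZooKeeper's native APIs are Java and C):
from kazoo.client import KazooClient
zk = KazooClient(hosts='zk-host:2181')  # assumed ensemble address
zk.start()
# The namespace looks like a file system: each path (znode) holds a small data blob
zk.ensure_path('/app/config')
zk.set('/app/config', b'replication=3')
data, stat = zk.get('/app/config')
print(data, stat.mzxid)  # mzxid: the transaction id stamping the last update
zk.stop()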
52. Flume
• Flume is a distributed system for collecting log data from many sources, aggregating it, and writing it to HDFS
• Flume does for files what Sqoop does for RDBMS
• Flume maintains a central list of ongoing data flows, stored redundantly in ZooKeeper.
54. Is Hadoop the Only Big Data Solution?
• No – there are other solutions:
• Apache Spark and Apache Mesos frameworks
• NoSQL systems (Apache Cassandra, CouchBase, MongoDB and many others)
• Stream analysis (Apache Kafka, Apache Flink)
• Machine learning (Apache Mahout, Spark MLlib)
• Some can be integrated with Hadoop, but some are independent
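For comparison with the MapReduce example earlier, here is a minimal PySpark sketch of the same word count against Spark's RDD API (assumes a local pyspark installation; cluster details are omitted):
from pyspark import SparkContext
sc = SparkContext('local[*]', 'wordcount-sketch')
lines = sc.parallelize(["big data is big", "hadoop handles big data"])
counts = (lines.flatMap(lambda line: line.split())  # the "map" side
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))    # the "reduce" side
print(counts.collect())
sc.stop()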
55. Where Does the DBA Fit In?
• Big Data solutions are not databases. Databases are probably not going to disappear, but we feel the change even today: DBAs must be ready for the change
• DBAs are the perfect candidates to transition into Big Data experts:
• Have system (OS, disk, memory, hardware) experience
• Can understand data easily
• DBAs are used to working with developers and other data users
56. What Do DBAs Need Now?
• DBAs will need to know more programming: Java, Scala, Python, R or any other popular language in the Big Data world will do
• DBAs need to understand the position shifts and the introduction of DevOps, data scientists, CDOs, etc.
• Big Data is changing daily: we need to learn, read, and be involved before we are left behind…
58. Summary
• Big Data is here – it's complicated, and RDBMS does not fit anymore
• Big Data solutions are evolving; Hadoop is an example of such a solution
• Spark is a very popular Big Data solution
• DBAs need to be ready for the change: Big Data solutions are not databases, and we must make ourselves ready
One very common use of Hadoop is taking web server or other logs from a large number of machines, and periodically processing them to pull out analytics information. The Flume project is designed to make the data gathering process easy and scalable, by running agents on the source machines that pass the data updates to collectors, which then aggregate them into large chunks that can be efficiently written as HDFS files. It’s usually set up using a command-line tool that supports common operations, like tailing a file or listening on a network socket, and has tunable reliability guarantees that let you trade off performance and the potential for data loss.