This document discusses small, medium, and big data. It provides an overview of semantic web technologies such as RDF, OWL, and SPARQL, then covers the characteristics of big data, known as the 3Vs: volume, variety, and velocity. It discusses relational databases and NoSQL databases, including key-value stores, document databases, and graph databases, and covers big data processing techniques such as MapReduce and Hadoop.
NoSQL & Big Data Analytics: History, Hype, Opportunities (Vishy Poosala)
Looking at NoSQL and Big Data Analytics as an evolution starting from relational databases, and going behind the hype. You can find more on this topic in my blog at: http://innovation-edge.blogspot.com/
Thanks to Gregory Piatetsky-Shapiro for the 2nd half of the slides.
This presentation focuses on the “Data - Big Data - Bigger Data” and the Challenges, Opportunities and Solutions from these trends.
What challenges does this massive data bring to the table?
What opportunities does this data provide?
Some solutions for handling this data.
Big Data Modeling Challenges and Machine Learning with No Code (Liana Ye)
Presented at SF Bay ACM (202001015) by Karthik Chinnusamy.
What are the big data modeling challenges in today's field? With a few best-practice recommendations and machine learning approaches, I will use Knime to show the modeling advantages for Big Data along the following themes:
- Performance: good data models help us quickly query the required data and reduce I/O.
- Cost: good data models significantly reduce unnecessary data redundancy, reuse computing results, and lower the storage and computing costs of the big data system.
- Efficiency: good data models greatly improve the user experience and increase the efficiency of data utilization.
- Quality: good data models make data statistics more consistent and reduce the possibility of computing errors.
I will also describe tools for sources, ingestion, exploration, modeling, and machine learning.
Gail Zhou on "Big Data Technology, Strategy, and Applications" (Gail Zhou, MBA, PhD)
Dr. Gail Zhou presented this topic at DevNexus on Feb 25, 2014. Big Data history, opportunities, and applications. Big Data key concepts, reference architecture with open source technology stacks. Hadoop architecture explained (HDFS, Map Reduce, and YARN). Big Data start-up challenges and strategies to overcome them. Technology update: Hadoop and Cassandra based technology offerings.
This slide deck was presented by Jony Sugianto at the Seminar & Workshop on the Introduction and Potential of Big Data & Machine Learning, organized by KUDO on 14 May 2016.
.NET User Group Oldenburg, 28 May 2015 - by Dr. Yvette Teiken
Big Data is on everyone's lips, and Microsoft has jumped on the bandwagon with HDInsight. But how do open source, Hadoop, and Microsoft fit together? Where are the connection points to classic BI? How is data stored and analyzed? What changes with Big Data, and what does not? Topics include:
- Creating, querying, and exporting Hive tables
- Implementing ETL processes with Pig
- Developing native MapReduce jobs in C#
- Interacting with traditional RDBMSs and streaming technologies
- Data storage with DocumentDB
- Scaling analyses
Big Data.
Big data is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them. Challenges include capture, storage, analysis, data curation, search, sharing, transfer, visualization, querying, updating and information privacy. The term "big data" often refers simply to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from data, and seldom to a particular size of data set. "There is little doubt that the quantities of data now available are indeed large, but that’s not the most relevant characteristic of this new data ecosystem."[2] Analysis of data sets can find new correlations to "spot business trends, prevent diseases, combat crime and so on."[3] Scientists, business executives, practitioners of medicine, advertising and governments alike regularly meet difficulties with large data-sets in areas including Internet search, fintech, urban informatics, and business informatics. Scientists encounter limitations in e-Science work, including meteorology, genomics,[4] connectomics, complex physics simulations, biology and environmental research.[5]
Data sets grow rapidly - in part because they are increasingly gathered by cheap and numerous information-sensing Internet of things devices such as mobile devices, aerial (remote sensing), software logs, cameras, microphones, radio-frequency identification (RFID) readers and wireless sensor networks.[6][7] The world's technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s;[8] as of 2012, every day 2.5 exabytes (2.5×10^18 bytes) of data are generated.[9] One question for large enterprises is determining who should own big-data initiatives that affect the entire organization.[10]
Relational database management systems and desktop statistics- and visualization-packages often have difficulty handling big data. The work may require "massively parallel software running on tens, hundreds, or even thousands of servers".[11] What counts as "big data" varies depending on the capabilities of the users and their tools, and expanding capabilities make big data a moving target. "For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration."
The analysis of movement is an important research topic in, for example, geography, ecology, visual analytics, and GIScience, as well as in application domains such as urban, maritime, and aviation research. Movement data analysis requires tools for the manipulation and visualization of movement or trajectory data. This talk presents the new Python library MovingPandas (MovingPandas.org).
Storing and Querying Semantic Data in the Cloud (Steffen Staab)
Daniel Janke and Steffen Staab. Tutorial at Reasoning Web
With proliferation of semantic data, there is a need to cope with trillions of triples by horizontally scaling data management in the cloud. To this end one needs to advance (i) strategies for data placement over compute and storage nodes, (ii) strategies for distributed query processing, and (iii) strategies for handling failure of compute and storage nodes. In this tutorial, we want to review challenges and how they have been addressed by research and development in the last 15 years.
A very high-level introduction to scaling out with Hadoop and NoSQL, combined with some experiences from my current project. I gave this presentation at the JFall 2009 conference in the Netherlands.
This deck gives a basic overview of NoSQL technologies, implementation vendors/products, case studies, and some of the core implementation algorithms. It also takes a quick look at emerging trends such as "Polyglot Persistence" and "NewSQL".
The deck is targeted at beginners who want an overview of NoSQL databases.
On Friday, September 25th, Devin Hopps led us through a presentation introducing Big Data and how technology has evolved to harness its power.
11. V for ...
Volume: Scale, Sources
Variety: Relational, NoSQL
Velocity: Operational, Analytical
13. How Big is our Data?
M  mega   million      10^6
G  giga   billion      10^9
T  tera   trillion     10^12
P  peta   quadrillion  10^15
E  exa    quintillion  10^18
Z  zetta  sextillion   10^21
Y  yotta  septillion   10^24
Check The Powers of Ten (1977) on YouTube
14. Big Data Sources
Millions of servers (logs)
Billions of users (social networks)
Billions of devices (smartphones)
+ Time/Space = Big Data
15. Big Data Examples
Facebook collects 500 TB per day (1)
Google processes 24 PB per day (2)
We create 2.5 EB per day (3)
(1) http://gigaom.com/data/facebook-is-collecting-your-data-500-terabytes-a-day/
(2) http://en.wikipedia.org/wiki/Petabyte (2009)
(3) http://www-01.ibm.com/software/data/bigdata/
16. How Small is our Wisdom?
Wisdom
Knowledge
Information
Big Data
Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?
T. S. Eliot, The Rock
17. V for ...
Volume: Scale, Sources
Variety: Relational, NoSQL
Velocity: Operational, Analytical
18. Scalability
Scaling up and Scaling out
Partitioning and Sharding
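The partitioning idea above can be sketched in a few lines of Python. This is a minimal illustration of hash-based sharding (the shard count and key names are made up for the example): every node can compute a key's placement independently, with no central lookup table.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key deterministically to one of num_shards partitions."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Place some illustrative user ids onto 4 shards.
shards = [[] for _ in range(4)]
for user_id in ("alice", "bob", "carol", "dave"):
    shards[shard_for(user_id, 4)].append(user_id)
```

Note that naive modulo hashing reshuffles almost every key when the shard count changes; production systems typically use consistent hashing for that reason.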
24. Key-Value Stores
(Key:string) => Value
fast read, low write latency
used for sessions, carts
Dynamo: Amazon’s Highly Available Key-value Store (2007)
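The (Key:string) => Value model on this slide can be sketched as a tiny in-memory store (class and key names are illustrative, not any particular product's API). Reads and writes are O(1) dictionary operations, which is what makes this model attractive for sessions and carts.

```python
class KVStore:
    """Minimal in-memory key-value store: (key: str) => value."""

    def __init__(self):
        self._data = {}

    def put(self, key: str, value) -> None:
        self._data[key] = value              # O(1) write

    def get(self, key: str, default=None):
        return self._data.get(key, default)  # O(1) read

# Typical use: sessions and shopping carts keyed by an opaque id.
store = KVStore()
store.put("session:42", {"user": "alice", "cart": ["book", "pen"]})
cart = store.get("session:42")["cart"]
```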
25. Bigtable Clones
Google's Distributed Storage System
(row:string, col:string, ts:int64) => string
used by Google & most companies
Bigtable: A Distributed Storage System for Structured Data (2006)
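The (row:string, col:string, ts:int64) => string signature above can be modeled as a sparse, versioned map. A rough single-machine sketch (class name and sample data are illustrative; the real system adds column families, compression, and distribution):

```python
from collections import defaultdict

class MiniBigtable:
    """Sparse map: (row: str, col: str, ts: int) => str, newest-first reads."""

    def __init__(self):
        self._cells = defaultdict(dict)  # (row, col) -> {ts: value}

    def put(self, row: str, col: str, ts: int, value: str) -> None:
        self._cells[(row, col)][ts] = value

    def get(self, row: str, col: str):
        versions = self._cells.get((row, col))
        if not versions:
            return None
        return versions[max(versions)]  # latest timestamp wins

t = MiniBigtable()
t.put("com.cnn.www", "contents:html", 1, "<html>v1</html>")
t.put("com.cnn.www", "contents:html", 2, "<html>v2</html>")
```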
26. Document Databases
document-oriented (content query)
semi-structured data (JSON)
used for web apps
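The "content query" idea above - querying semi-structured JSON by field values rather than by a fixed schema - can be sketched in plain Python (the documents and the `find` helper are illustrative, not any specific database's API):

```python
documents = [
    {"_id": 1, "type": "post", "title": "Intro to NoSQL", "tags": ["nosql"]},
    {"_id": 2, "type": "post", "title": "Graph 101", "tags": ["graph"]},
    {"_id": 3, "type": "comment", "post": 1, "body": "Nice overview"},
]

def find(coll, **criteria):
    """Return documents whose fields match all given criteria."""
    return [d for d in coll if all(d.get(k) == v for k, v in criteria.items())]

posts = find(documents, type="post")
```

Real document databases work the same way conceptually but add indexes so such queries do not scan every document.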
27. Graph Databases
property graph
index-free adjacency
used for recommendations, social networks
29. Property Graph
A property graph is a directed, labeled, attributed graph
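The definition above - directed, labeled, attributed - together with index-free adjacency can be sketched as follows (vertex names and properties are invented for the example). The key point is that `neighbors` never consults a global index: it follows the edge list stored with the vertex itself.

```python
class PropertyGraph:
    """Directed, labeled, attributed graph with index-free adjacency."""

    def __init__(self):
        self.props = {}  # vertex -> attribute dict
        self.out = {}    # vertex -> list of (label, target) edges

    def add_vertex(self, v, **props):
        self.props[v] = props
        self.out.setdefault(v, [])

    def add_edge(self, src, label, dst):
        self.out[src].append((label, dst))

    def neighbors(self, v, label):
        # No index lookup: traverse the vertex's own edge list directly.
        return [dst for (l, dst) in self.out[v] if l == label]

g = PropertyGraph()
g.add_vertex("alice", age=30)
g.add_vertex("bob", age=25)
g.add_edge("alice", "knows", "bob")
```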
30. Graph Traversal
Gremlin is jumping
- from vertex to vertex
- from vertex to edge
- from edge to vertex
https://github.com/tinkerpop/gremlin/wiki
31. DBpedia Traversal

gremlin> g = new SparqlRepositorySailGraph("http://dbpedia.org/sparql")
gremlin> r = g.v('http://dbpedia.org/resource/Tim_Berners-Lee')
gremlin> r.out('http://www.w3.org/2000/01/rdf-schema#comment').has('lang','fr').value
==>Sir Timothy John Berners-Lee est un citoyen britannique surtout connu comme le principal inventeur
du World Wide Web. En juillet 2004, il est anobli par la reine Elizabeth II pour ce travail et son nom
officiel devient Sir Timothy John Berners-Lee. Depuis 1994, il préside le World Wide Web Consortium
(W3C), organisme qu'il a fondé.
gremlin> r.in('http://dbpedia.org/ontology/influenced')
==>v[http://dbpedia.org/resource/Paul_Otlet]
gremlin> r.in('http://dbpedia.org/ontology/influenced').out('http://dbpedia.org/ontology/influenced')
==>v[http://dbpedia.org/resource/Douglas_Engelbart]
==>v[http://dbpedia.org/resource/Ted_Nelson]
==>v[http://dbpedia.org/resource/Vannevar_Bush]
==>v[http://dbpedia.org/resource/Tim_Berners-Lee]
...
32. Triple/RDF Stores
Subject-Predicate-Object
SPARQL as query language
AllegroGraph, OpenLink Virtuoso, ...
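The subject-predicate-object model above can be sketched with a set of tuples and a pattern matcher, where `None` plays the role of a variable in a SPARQL basic graph pattern (the triples reuse facts from the DBpedia traversal on the previous slide; the `match` helper is illustrative, not a real store's API):

```python
triples = {
    ("Tim_Berners-Lee", "invented", "World_Wide_Web"),
    ("Tim_Berners-Lee", "founded", "W3C"),
    ("Paul_Otlet", "influenced", "Tim_Berners-Lee"),
}

def match(s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return {t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}

# Roughly: SELECT ?who WHERE { ?who :influenced :Tim_Berners-Lee }
who = {s for (s, _, _) in match(p="influenced", o="Tim_Berners-Lee")}
```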
33. V for ...
Volume: Scale, Sources
Variety: Relational, NoSQL
Velocity: Operational, Analytical
34. Big Data Processing
Batch Processing
MapReduce
Interactive Analysis
BigQuery
35. MapReduce
MapReduce: Simplified Data Processing on Large Clusters (2004)
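The MapReduce model from the paper above can be sketched with the canonical word-count example. This is a single-process illustration of the three phases - map, shuffle (grouping by key), reduce - not a distributed implementation:

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc):
    # Emit an intermediate (word, 1) pair for every word in the document.
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    # Group intermediate pairs by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Combine all values emitted for one key.
    return (key, sum(values))

docs = ["big data big ideas", "big clusters"]
grouped = shuffle(chain.from_iterable(map_phase(d) for d in docs))
counts = dict(reduce_phase(k, v) for k, v in grouped.items())
```

In a real cluster, map tasks run in parallel over input splits and the shuffle moves each key's values to the reducer responsible for it.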
36. Apache Hadoop
Distributed Data + MapReduce
http://hadoop.apache.org/
37. Latest Trends
http://www.google.com/trends/explore#q=hadoop%2C%20mongodb%2C%20neo4j
38. NoSQL issues
No Distributed Transactions
No SQL as query language