Bigdata and data warehousing can work in synergy by applying the structure of data warehousing to the large and unstructured datasets of bigdata. While data warehousing focuses on modeling data, co-locating related information, and optimizing queries, bigdata is better suited to analyzing unstructured data at scale through distributed systems without an upfront model. The two approaches complement each other: modeling brings structure to bigdata where structure exists, and bigdata contributes the ability to analyze unstructured data at massive scale.
Big Data vs Data Warehousing
1. Bigdata vs. Data Warehousing
Synergy or Conflict?
Thomas Kejser
thomas@kejser.org
http://blog.kejser.org
@thomaskejser
2. Who is this Guy?
Thomas Kejser
http://blog.kejser.org
@thomaskejser
• Formerly: Lead SQLCAT EMEA
• Now: CTO Fusion-io EMEA
• 15 years of database experience
• Performance Tuner
3. Human Consciousness Doesn’t Scale
[Chart: world population in billions of humans (axis 5 to 10) by year, 2000 to 2250. Source: United Nations Projections]
4. Text Messages in a Table
CREATE TABLE AllTexts (
  Sender BIGINT              -- 8 B
, Receiver BIGINT            -- 8 B
, SenderLocation BIGINT      -- 8 B
, ReceiverLocation BIGINT    -- 8 B
, Time DATETIME              -- 8 B
, SMS VARCHAR(140)           -- 140 B
)
-- Row size = 5 x 8 B + 140 B = 180 bytes
5. How much do we text?
• World Average
• 6.1 Trillion Text Messages / year
• About 80% cell phone coverage
• 7 billion people
• 3 messages/day/person
• But:
• Teenagers: 50 messages/day
Source: Pew Internet Research 2010 & ITU
6. How much will we EVER text?
• 9B people acting like teenagers (in 2050)
• 50 texts/day
• That’s 450 billion texts/day
• 164 Trillion texts/year (roughly 27x today's 6.1 trillion)
• 180 bytes each
• Assume x3 compression
• Approximation: 10 Petabytes/year in 2050
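The estimate above can be reproduced with a few lines of arithmetic. The sketch below uses only the figures from the slide (9 billion people, 50 texts/day, 180 bytes per row, 3x compression); the function name and the use of decimal petabytes (10^15 bytes) are assumptions made for illustration.

```python
PETABYTE = 10**15  # assumption: decimal petabytes


def yearly_text_volume(people, texts_per_day, bytes_per_text, compression):
    """Return (texts per year, compressed bytes per year)."""
    texts_per_year = people * texts_per_day * 365
    compressed_bytes = texts_per_year * bytes_per_text / compression
    return texts_per_year, compressed_bytes


# Figures taken from the slide: 9B people, 50 texts/day, 180 B/text, x3 compression.
texts, size = yearly_text_volume(9_000_000_000, 50, 180, 3)
print(f"{texts / 1e12:.0f} trillion texts/year")
print(f"{size / PETABYTE:.1f} PB/year")
```

The result lands just under 10 PB/year, which matches the slide's rounded approximation.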
8. How Large is this/year?
Hard Disk (4TB): 2.5”
Wine Bottle (75cl): 4.0”
About 1500 Wine Bottles
9. In the Data Center
• Calculating:
• 2U Storage=24 Disks (includes compute)
• 4TB per Disk
• 100TB in 2U (a bit less)
• 10PB = 200U storage
• About six racks
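As a rough check of the data-center arithmetic, the following sketch converts 10 PB into disks, rack units and racks. The disk size and the 24 disks per 2U come from the slide; the 42U rack height is an assumption that the slide does not state, and the function name is hypothetical.

```python
import math

TB = 10**12
PB = 10**15


def storage_footprint(total_bytes, disk_bytes=4 * TB, disks_per_2u=24, rack_units=42):
    """Return (disks, rack units, racks) needed to hold total_bytes.

    rack_units=42 is an assumed rack height, not a figure from the slide.
    """
    disks = math.ceil(total_bytes / disk_bytes)
    enclosures = math.ceil(disks / disks_per_2u)  # one enclosure = 2U
    units = enclosures * 2
    racks = units / rack_units
    return disks, units, racks


disks, units, racks = storage_footprint(10 * PB)
print(disks, units, racks)
```

Without rounding 96 TB up to 100 TB per enclosure, the unit count comes out slightly above the slide's 200U. Under the 42U assumption the raw storage fills about five racks, so "about six racks" presumably leaves headroom for networking and other equipment.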
11. … And it is Becoming a Commodity
• Good Management Interfaces
• Standard SQL
• with a few extensions
• Appliances
• Support system
• Homogenous HW
• In chunks
13. PDW vs. Hive – Scan/seek
Query 1:
SELECT count(*)
FROM lineitem
Query 2:
SELECT max(l_quantity)
FROM lineitem
WHERE l_orderkey > 1000
and l_orderkey < 100000
GROUP BY l_linestatus
[Bar chart: run time in seconds (axis 0 to 1500) of Hive and PDW for Query 1 and Query 2]
14. PDW vs. Hive - Joins
SELECT max(l_orderkey)
FROM orders
JOIN lineitem
ON l_orderkey = o_orderkey
PDW-U:
• orders partitioned on c_custkey
• lineitem partitioned on l_partkey
PDW-P:
• orders partitioned on o_orderkey
• lineitem partitioned on l_orderkey
[Bar chart: run time in seconds (axis 0 to 4000) for Hive, PDW-U and PDW-P]
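The difference between PDW-U and PDW-P comes down to co-location: when both tables are hash-partitioned on the join key, matching rows already sit on the same node and nothing has to move before the join. The sketch below is a toy model of that idea, not how PDW or Hive actually work internally; the node count, the modulo hash, the sample rows and the column name o_custkey are all assumptions.

```python
NODES = 4  # assumption: a four-node cluster


def node_for(key):
    """Toy hash partitioning: assign a key to a node by modulo."""
    return key % NODES


def rows_to_move(rows, partition_col, join_col):
    """Count rows stored on a different node than their join key hashes to."""
    return sum(
        1 for row in rows
        if node_for(row[partition_col]) != node_for(row[join_col])
    )


# Hypothetical sample data: 100 orders with one line item each.
orders = [{"o_orderkey": k, "o_custkey": 3 * k} for k in range(1, 101)]
lineitem = [{"l_orderkey": k, "l_partkey": 5 * k + 1} for k in range(1, 101)]

# PDW-U style: tables partitioned on columns unrelated to the join key.
moved_unaligned = (rows_to_move(orders, "o_custkey", "o_orderkey")
                   + rows_to_move(lineitem, "l_partkey", "l_orderkey"))

# PDW-P style: both tables partitioned on the join key itself.
moved_aligned = (rows_to_move(orders, "o_orderkey", "o_orderkey")
                 + rows_to_move(lineitem, "l_orderkey", "l_orderkey"))

print(moved_unaligned, moved_aligned)
```

In the aligned layout no row needs to be reshuffled, which is the kind of structure the PDW-P configuration exploits.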
15. What does Big Data need to Catch up?
• Thread startup times
• Co-location awareness
• Files vs. optimized DB memory structures
• Column stores and other DB tech
Generic is good…
… but when there is structure, make use of it!
17. How many Pictures of Cats?
• Flickr Today:
• 300MB/month
• 2GB/year
• 51M users (too small?)
• Estimate: 102 PB / year
• 10 x text messages
Source: Wikipedia
25. Saturday, 1:39am - at The Pub
Your Semi-structured Data, For Free
26. Big Value
Extraction of meaning and insight from semi-structured data
27. Extracting Meaning from Humans
Method | Examples
Turn semi-structure to structure | Image recognition, network proximity and super nodes, social media
Needle in a haystack | Extract outliers, Fraud
Herd behaviors | Clustering, Pattern Recognition, “Customers who bought this also bought”
Text classification and search | Text indexes, syntactic counting, pagerank
Text to structure | Semantic analysis, loose structure into structure
28. Find New Customers
“Michael, who is respected among his peers, often talks about his new, cool gadgets”
[Diagram: network nodes labelled Tommy, Thomas and Michael]
34. Things to Learn for the Future
• Get good at
• Statistics (again)
• Distributed Algorithms
• Tuning
• Understand Physical Constraints
• Acquire deep domain knowledge
40. Summary
Data Warehouse | Big Data
• There is a model | • Don’t bother modeling!
• Seek Co-location | • Optional Co-Location
• Respond in seconds | • Respond in minutes
• Calculate first, query after | • Calculate while querying
• Expensive HW | • Cheap HW
• Optimise for target HW | • Good enough on all HW
• Homogenous HW | • Heterogeneous HW
• Pay vendor, expect optimised | • Free license, optimise yourself
We are at the end of the growth curve: 9B is our total population. This is an important observation because many data estimates are based on human activity and have so far assumed exponential growth. This is NOT the case anymore!
This shows the development of hard drive capacity over time.
The calculation is not meant to be read, just letting people know we did the calc and what it PHYSICALLY means (see the animation). There is a real cost to storing a lot of data, and this is one of the reasons cloud makes a lot of sense.
Wine bottles