Independent of the source of the data, the integration of event streams into an enterprise architecture is becoming increasingly important in a world of sensors, social media streams and the Internet of Things. Events have to be accepted quickly and reliably, and they have to be distributed and analysed, often with many consumers or systems interested in all or a subset of the events. Storing such huge event streams in HDFS or a NoSQL datastore is feasible and no longer much of a challenge. But if you want to be able to react fast, with minimal latency, you cannot afford to store the data first and do the analysis later; you have to be able to apply at least part of your analytics as soon as you consume the data streams. Products for event processing, such as Oracle Event Processing or Esper, have been available for quite some time and used to be called Complex Event Processing (CEP). In the past few years, another family of products has appeared, mostly out of the Big Data technology space, called Stream Processing or Streaming Analytics. These are mostly open-source products/frameworks, such as Apache Storm, Spark Streaming, Flink and Kafka Streams, as well as supporting infrastructure such as Apache Kafka. In this talk I will present the theoretical foundations of Stream Processing, discuss the core properties a Stream Processing platform should provide, and highlight the differences you might find between the more traditional CEP solutions and the more modern Stream Processing solutions.
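To make the contrast concrete, here is a minimal Python sketch (not any particular product's API) of the kind of stateful, incremental computation a stream processor maintains, so that an analytic result is available immediately after each event rather than after a store-then-analyze batch:

```python
from collections import deque

# A toy sliding-window operator: it keeps only the newest `size` events
# and emits an up-to-date average as each event is consumed.
class SlidingWindowAverage:
    def __init__(self, size):
        self.size = size
        self.window = deque()

    def on_event(self, value):
        """Consume one event; return the current windowed average."""
        self.window.append(value)
        if len(self.window) > self.size:
            self.window.popleft()  # evict the oldest event
        return sum(self.window) / len(self.window)

win = SlidingWindowAverage(size=3)
results = [win.on_event(v) for v in [10, 20, 30, 40]]
# results == [10.0, 15.0, 20.0, 30.0]
```

A real platform adds what this toy omits: partitioning across consumers, fault-tolerant state, and event-time rather than arrival-order windowing.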
Exploratory data analysis and data visualization:
Exploratory Data Analysis (EDA) is an approach/philosophy for data analysis that employs a variety of techniques (mostly graphical) to
Maximize insight into a data set.
Uncover underlying structure.
Extract important variables.
Detect outliers and anomalies.
Test underlying assumptions.
Develop parsimonious models.
Determine optimal factor settings.
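As a non-graphical illustration of those goals, here is a small Python sketch (standard library only) that summarizes a data set and flags outliers; the 1.5×IQR cutoffs are the conventional Tukey fences, an assumption of this example rather than something EDA prescribes:

```python
import statistics

def summarize(data):
    """Summary statistics plus simple IQR-based outlier detection."""
    data = sorted(data)
    q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr       # Tukey fences
    return {
        "mean": statistics.mean(data),
        "median": q2,
        "outliers": [x for x in data if x < lo or x > hi],
    }

stats = summarize([1, 2, 3, 4, 5, 100])
# stats["median"] == 3.5; stats["outliers"] == [100]
```

In practice these numbers would be paired with the graphical techniques EDA emphasizes (histograms, box plots, scatter plots) rather than read on their own.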
Big data today is a challenge to be managed, not a barrier to growing a business. Data storage is relatively inexpensive, and with ever more transactions generated by social media, machines and sensors, data volumes have grown piece by piece into petabytes.
These slides explain the challenges of Big Data (Volume, Velocity, and Variety) and offer a solution for managing them.
There are many tools that could help solve these problems, but the main focus of these slides is Apache Hadoop.
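Hadoop's core processing model, MapReduce, can be sketched in plain Python to show the idea; this simulates the map, shuffle and reduce phases in a single process, whereas real Hadoop distributes them across a cluster:

```python
from collections import defaultdict

# Word count, the canonical MapReduce example.
def map_phase(lines):
    # map: emit (word, 1) for every word in every input line
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    # shuffle: group all emitted values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # reduce: sum the grouped counts per word
    return {word: sum(counts) for word, counts in groups.items()}

counts = reduce_phase(shuffle(map_phase(["big data", "Big Data tools"])))
# counts == {"big": 2, "data": 2, "tools": 1}
```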
This presentation briefly explains the following topics:
Why is Data Analytics important?
What is Data Analytics?
Top Data Analytics Tools
How to Become a Data Analyst?
A changing market landscape and open source innovations are having a dramatic impact on the consumability and ease of use of data science tools. Join this session to learn about the impact these trends and changes will have on the future of data science. If you are a data scientist, or if your organization relies on cutting edge analytics, you won't want to miss this!
This presentation discusses the following topics:
What is Hadoop?
Need for Hadoop
History of Hadoop
Hadoop Overview
Advantages and Disadvantages of Hadoop
Hadoop Distributed File System
Comparing: RDBMS vs. Hadoop
Advantages and Disadvantages of HDFS
Hadoop frameworks
Modules of Hadoop frameworks
Features of Hadoop
Hadoop Analytics Tools
Amazon Web Services gives you fast access to flexible, low-cost IT resources, so you can rapidly scale and build virtually any big data application, including data warehousing, clickstream analytics, fraud detection, recommendation engines, event-driven ETL, serverless computing, and Internet-of-Things processing, regardless of the volume, velocity, and variety of your data.
https://aws.amazon.com/webinars/anz-webinar-series/
The data lake has become extremely popular, but there is still confusion on how it should be used. In this presentation I will cover common big data architectures that use the data lake, the characteristics and benefits of a data lake, and how it works in conjunction with a relational data warehouse. Then I’ll go into details on using Azure Data Lake Store Gen2 as your data lake, and various typical use cases of the data lake. As a bonus I’ll talk about how to organize a data lake and discuss the various products that can be used in a modern data warehouse.
HDFS is a Java-based file system that provides scalable and reliable data storage, and it was designed to span large clusters of commodity servers. HDFS has demonstrated production scalability of up to 200 PB of storage and a single cluster of 4500 servers, supporting close to a billion files and blocks.
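A back-of-the-envelope sketch of that storage model, assuming HDFS's common defaults of a 128 MB block size and 3x replication (both are configurable, so treat the numbers as illustrative):

```python
def hdfs_footprint(file_size_mb, block_size_mb=128, replication=3):
    """Return (number of blocks, total raw storage in MB) for one file."""
    blocks = -(-file_size_mb // block_size_mb)  # ceiling division
    return blocks, file_size_mb * replication

blocks, raw_mb = hdfs_footprint(1000)  # a 1000 MB file
# 8 blocks (7 full plus 1 partial); 3000 MB of raw cluster storage
```

The blocks are scattered across the commodity servers of the cluster, which is how HDFS achieves both the scalability and the reliability described above.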
2016 IBM InterConnect: Medical Devices Transformation (Elizabeth Koumpan)
Emerging technologies such as the Internet of Things and 3D printing are driving the creation of new business models and pushing the industry toward transformation. The product-centric model, in which the industry's main objective was to develop the device, is shifting to a software-and-services model focused on Big Data & Analytics, Integration and Cloud.
The maturation of technologies such as social, mobile, analytics, cloud, 3D printing, and bio- and nanotechnology is rapidly shifting the competitive landscape. These emerging technologies create an environment that is connected and open, simple and intelligent, fast and scalable. Organizations must embrace disruptive technologies to drive innovation.
A bit about Augmented Reality http://k3hamilton.com/AR/
Based on a presentation given on May 27, 2010 by Karen Hamilton and Jorge Olenenwa
Website has moved to http://k3hamilton.com/AR/ due to closing of wikispaces
This presentation, by big data guru Bernard Marr, outlines in simple terms what Big Data is and how it is used today. It covers the 5 V's of Big Data as well as a number of high value use cases.
Web, gaming, mobile: what kind of developer will you be tomorrow? (Microsoft)
This round-table conference aims to put a spotlight on developers' career challenges and how their skills must evolve to keep them employable. What are the hiring forecasts over the next two to five years? What do companies need? What skills are required? What is the impact of the cloud? What is at stake with mobility? DevOps: what changes? How do you become a Dev Hero?
Use of Chemical Characterization to Assess the Equivalency of Medical Devices... (NAMSA)
Use of Chemical Characterization to Assess the Equivalency of Medical Devices and Materials describes chemical characterization techniques and why they are important.
Meta Analysis of Medical Device Data Applications for Designing Studies and R... (NAMSA)
Meta Analysis of Medical Device Data Applications for Designing Studies and Reinforcing Clinical Evidence discusses what meta analysis is as well as the potential benefits.
Introduction to Big Data Analytics on Apache Hadoop (Avkash Chauhan)
In the age of Big Data and large-volume analytics there is a lot to cover and a lot to learn. While developing Windows HDInsight at Microsoft, and now building a one-of-a-kind Big Data product at my own company, Big Data Perspective in San Francisco, I have spent the last several years working with Big Data at many levels. This talk is tailored to help database and business intelligence (BI) professionals, programmers, Hadoop administrators, researchers, technical architects, operations engineers, data analysts, and data scientists understand the core concepts of Big Data analytics on Hadoop. The webinar will be useful for those who want to know what Hadoop is and how they can take advantage of it by spending just a few dollars to run a cluster, and it is great for those looking to deploy their first data cluster and run MapReduce jobs to discover insights.
An introduction to streaming data, the difference between batch processing and stream processing, research issues in streaming data processing, performance evaluation metrics, and tools for stream processing.
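The batch-versus-stream difference can be sketched on a single metric: a batch job recomputes over the whole stored data set, while a streaming operator keeps running state and updates it with O(1) work per event. A hypothetical minimal example, not tied to any framework:

```python
def batch_mean(stored_events):
    # batch: all data is stored first, then processed in one pass
    return sum(stored_events) / len(stored_events)

class StreamingMean:
    # streaming: only (count, total) state is kept; each event updates it
    def __init__(self):
        self.count, self.total = 0, 0.0

    def update(self, value):
        self.count += 1
        self.total += value
        return self.total / self.count

events = [4, 8, 6, 2]
sm = StreamingMean()
online = [sm.update(v) for v in events]  # a fresh answer after every event
# online == [4.0, 6.0, 6.0, 5.0]; batch_mean(events) == 5.0
```

Both converge to the same final answer; the streaming version simply had a usable answer after every event, which is the property the research issues above (latency, accuracy, bounded state) revolve around.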
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben..., Confluent)
Tinder’s Quickfire Pipeline powers all things data at Tinder. It was originally built using AWS Kinesis Firehoses and has since been extended to use both Kafka and other event buses. It is the core of Tinder’s data infrastructure. This rich data flow of both client and backend data has been extended to serve a variety of needs at Tinder, including experimentation, ML, CRM, and observability, allowing backend developers easier access to shared client-side data. We accomplish this using many systems, including Kafka, Spark, Flink, Kubernetes, and Prometheus. Many of Tinder’s systems were natively designed in an RPC-first architecture.
Topics we’ll discuss on decoupling your system at scale via event-driven architectures include:
– Powering ML, backend, observability, and analytical applications at scale, including an end to end walk through of our processes that allow non-programmers to write and deploy event-driven data flows.
– Show end to end the usage of dynamic event processing that creates other stream processes, via a dynamic control plane topology pattern and broadcasted state pattern
– How to manage the unavailability of cached data that would normally come from repeated API calls for data that’s being backfilled into Kafka, all online! (and why this is not necessarily a “good” idea)
– Integrating common OSS frameworks and libraries like Kafka Streams, Flink, Spark and friends to encourage the best design patterns for developers coming from traditional service oriented architectures, including pitfalls and lessons learned along the way.
– Why and how to avoid overloading microservices with excessive RPC calls from event-driven streaming systems
– Best practices in common data flow patterns, such as shared state via RocksDB + Kafka Streams as well as the complementary tools in the Apache Ecosystem.
– The simplicity and power of streaming SQL with microservices
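The decoupling at the heart of the list above can be sketched as a toy in-process event bus (hypothetical names; real systems use Kafka topics, consumer groups and durable logs rather than Python lists):

```python
from collections import defaultdict

class EventBus:
    """Producers publish to a topic without knowing who consumes it."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self.subscribers[topic]:
            handler(event)

bus = EventBus()
ml_features, crm_updates = [], []
bus.subscribe("swipe", ml_features.append)  # ML pipeline consumer
bus.subscribe("swipe", crm_updates.append)  # CRM consumer
bus.publish("swipe", {"user": 42, "action": "like"})
# both consumers received the event; the producer knows about neither
```

Contrast this with the RPC-first style the talk describes migrating away from, where the producer would have to call each downstream service explicitly.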
Businesses are generating more data than ever before.
Doing real time data analytics requires IT infrastructure that often needs to be scaled up quickly and running an on-premise environment in this setting has its limitations.
Organisations often require a massive amount of IT resources to analyse their data and the upfront capital cost can deter them from embarking on these projects.
What’s needed is scalable, agile and secure cloud-based infrastructure at the lowest possible cost so they can spin up servers that support their data analysis projects exactly when they are required. This infrastructure must enable them to create proof-of-concepts quickly and cheaply – to fail fast and move on.
Stream Meets Batch for Smarter Analytics: Impetus White Paper (Impetus Technologies)
For Impetus’ White Papers archive, visit http://www.impetus.com/whitepaper
The paper discusses how the traditional batch and real-time paradigms can work together to deliver smarter, quicker and better insights on large volumes of data by picking the right strategy and the right technology.
Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da... (Amazon Web Services)
Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information. In this session, we first present an end-to-end streaming data solution using Amazon Kinesis Data Streams for data ingestion, Amazon Kinesis Data Analytics for real-time processing, and Amazon Kinesis Data Firehose for persistence. We review in detail how to write SQL queries for operational monitoring using Kinesis Data Analytics.
Learn how PNNL is building the ingestion flow into their serverless data lake on the Kinesis platform: migrating existing NiFi processes to various parts of the Kinesis platform where applicable, replacing complex NiFi flows that bundle and compress the data with Kinesis Firehose, leveraging Kinesis Streams for their enrichment and transformation pipelines, and using Kinesis Analytics to filter, aggregate, and detect anomalies.
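The filter, aggregate and anomaly-detection stages mentioned can be sketched in plain Python. This illustrates the logic only, not the Kinesis Analytics API, and the twice-the-mean threshold is an arbitrary assumption for the example:

```python
def process(records, threshold=2.0):
    """Filter malformed records, aggregate a mean, flag anomalies."""
    valid = [r for r in records if r.get("value") is not None]       # filter
    values = [r["value"] for r in valid]
    mean = sum(values) / len(values)                                 # aggregate
    anomalies = [r for r in valid if r["value"] > threshold * mean]  # detect
    return mean, anomalies

mean, anomalies = process([
    {"sensor": "a", "value": 1.0},
    {"sensor": "b", "value": None},   # dropped by the filter stage
    {"sensor": "c", "value": 2.0},
    {"sensor": "d", "value": 21.0},   # flagged as anomalous
])
# mean == 8.0; the anomaly list contains only sensor "d"
```

In the Kinesis pipeline described above, the same three stages are expressed as streaming SQL over a sliding window rather than a Python pass over a list.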
Data-driven companies need to make their data easily accessible to those who analyze it. Many organizations have adopted the Looker application, with LookML on AWS: a centralized analytical database with a user-friendly interface that allows employees to ask and answer their own questions to make informed business decisions.
Join our webinar to learn how our customer, Casper, an online mattress retailer, made the switch from a transactional database to Looker’s data analytics program on Amazon Redshift. Looker on Amazon Redshift can help you greatly reduce your analytics lifecycle with a simplified infrastructure and rapid cloud scaling.
Join us to learn:
• How to utilize LookML to build reusable definitions and logic for your data
• Best practices for architecting a centralized analytical database
• How Casper leveraged Looker and Amazon Redshift to provide all their employees access to their data and metrics
Who should attend: Heads of Analytics, Heads of BI, Analytics Managers, BI Teams, Senior Analysts
In this presentation we review the basic architecture behind SQL Server StreamInsight.
Regards,
Ing. Eduardo Castro Martínez, PhD – Microsoft SQL Server MVP
http://mswindowscr.org
http://comunidadwindows.org
Costa Rica
Using real time big data analytics for competitive advantage (Amazon Web Services)
Many organisations find it challenging to successfully perform real-time data analytics using their own on premise IT infrastructure. Building a system that can adapt and scale rapidly to handle dramatic increases in transaction loads can potentially be quite a costly and time consuming exercise.
Most of the time, infrastructure is under-utilised and it’s near impossible for organisations to forecast the amount of computing power they will need in the future to serve their customers and suppliers.
To overcome these challenges, organisations can instead utilise the cloud to support their real-time data analytics activities. Scalable, agile and secure, cloud-based infrastructure enables organisations to quickly spin up infrastructure to support their data analytics projects exactly when it is needed. Importantly, they can ‘switch off’ infrastructure when it is not.
BluePi Consulting and Amazon Web Services (AWS) are giving you the opportunity to discover how organisations are using real time data analytics to gain new insights from their information to improve the customer experience and drive competitive advantage.
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre (HPCC Systems)
Data-Centric Approach: Our platform is built on the premise of absorbing data from multiple data sources and transforming it into highly intelligent social network graphs that can be processed to reveal non-obvious relationships.
Open Blueprint for Real-Time Analytics in Retail: Strata Hadoop World 2017 S... (Grid Dynamics)
This presentation outlines key business drivers for real-time analytics applications in retail and describes the emerging architectures based on In-Stream Processing (ISP) technologies. The slides present a complete open blueprint for an ISP platform - including a demo application for real-time Twitter Sentiment Analytics - designed with 100% open source components and deployable to any cloud.
To learn more, read an adjoining blog series on this topic here : https://blog.griddynamics.com/in-stream-processing-service-blueprint
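The demo's core idea, scoring a stream of tweets against sentiment word lists and keeping a running tally, can be sketched as follows; the word lists and scoring rule here are invented for illustration, while the actual blueprint composes open-source ISP components:

```python
# Tiny made-up lexicons; a real system would use a trained model or
# a proper sentiment lexicon.
POSITIVE = {"love", "great", "fast"}
NEGATIVE = {"hate", "slow", "bug"}

def score(text):
    """Score one tweet: +1 per positive word, -1 per negative word."""
    words = set(text.lower().split())
    return len(words & POSITIVE) - len(words & NEGATIVE)

tally = 0  # running sentiment state, updated per event
for tweet in ["I love this product", "so slow and full of bug reports"]:
    tally += score(tweet)
# score of the first tweet is +1; the running tally ends at -1
```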
2. Overview
Introduction
Big Data Analytics
Real Time Systems
Challenges of Real Time Analytics
Technologies
Tools
Use Cases
Future Work and Conclusion
2 Big Data Analytics for Real Time Systems
4. Where does Big Data come from?
Courtesy: http://goo.gl/JWswfj
5. What makes it Big Data?
(Figure: the V's of Big Data, including Variability. Courtesy: Oracle)
6. Evolution of Big Data
1967: Automatic Data Compression
1997: Information Explosion
Today: Our Literature Survey!
8. Big Data Analytics
“Big data analytics is the process of examining large data sets to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information.“
Predictive Analysis
Text Analysis
Data Mining
Statistical Analysis
Courtesy: smartdatacollective.com
10. Analytics & 3 V‘s
Courtesy: watalon.com
12. Real Time Systems
“A real-time system is one that processes information and produces a response within a specified time, else risks severe consequences, sometimes including failure.“
Telecommunication Systems
Anti-Lock Brakes in a Car
Air Traffic Control System
Weather Forecasting System
Courtesy: yourdon.com
13. Real-Time Analytics of Big Data
(Figure: data volume, from kilobytes/sec to exabytes, plotted against processing latency, from milliseconds to hours; Big Data occupies the high-volume region and Real Time the low-latency region. Courtesy: infochimps.com)
15. Challenges of Real Time Analytics
Expensive: complex architecture, batch processing.
Semi- and unstructured data: new sources are unpredictable; relational databases are not capable, leaving us hamstrung.
Market too dynamic to predict: subscriber preferences change; competition adds acceleration to it.
Scalability: requires sub-second response times; more than a single server can handle.
16. Thinking Beyond Hadoop!
Manage & store huge volume of any data: Hadoop File System, MapReduce
Manage streaming data: Stream Computing
Analyze unstructured data: Text Analytics Engine
Structure and control data: Data Warehousing
Integrate and govern all data sources: Integration, Data Quality, Security, Lifecycle Management, MDM
Understand and navigate federated big data sources: Federated Discovery and Navigation
Courtesy: IBM
17. Our Solution
Do the impossible: incorporate any kind of data
Scale big: scale without any complexity
Not time consuming: seconds to minutes
Real time: analyze data without expensive data warehouse loads
Powerful Analytics, In Place, In Real Time.
Courtesy: slideshare.com
19. In-Memory Computing
In-memory computing keeps data in a server's RAM so that it can be processed at much faster speeds. It uses a type of middleware software that allows one to store data in RAM, across a cluster of computers, and process it in parallel.
Courtesy: Stratecast
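The idea above can be reduced to a minimal, single-process sketch: the whole dataset lives in RAM and partitions are processed in parallel, with no disk round-trip between steps. The partitioning scheme and the per-partition computation are illustrative assumptions, not from the slides.

```python
# Illustrative sketch of in-memory parallel processing (one process
# standing in for a cluster): the dataset is held entirely in RAM and
# its partitions are processed concurrently.
from concurrent.futures import ThreadPoolExecutor

def partition(data, n):
    """Split the in-memory dataset into n roughly equal partitions."""
    size = (len(data) + n - 1) // n
    return [data[i:i + size] for i in range(0, len(data), size)]

def process_partition(part):
    """A per-partition computation (here: sum of squares)."""
    return sum(x * x for x in part)

data = list(range(1_000))            # the whole dataset stays in RAM
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(process_partition, partition(data, 4)))

total = sum(partials)                # aggregate the partial results
```

A real in-memory data grid distributes the partitions across machines; the parallel-map-then-aggregate shape stays the same.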
20. Stream Processing
Courtesy: EMC
Stream-processing systems operate on continuous data streams, e.g., click streams on web pages, user request/query streams, monitoring events, notifications, etc.
Stream processing delivers real-time analytic processing on constantly changing data in motion.
Analyse first, store later!
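"Analyse first, store later" can be sketched in a few lines: events are aggregated as they arrive, per tumbling time window, before anything would be persisted. The click-stream events and 5-second window are illustrative assumptions.

```python
# Minimal stream-processing sketch: count clicks per page within fixed,
# non-overlapping (tumbling) time windows as events flow in.
from collections import Counter

def clicks():
    """Stand-in for a continuous click stream: (page, timestamp in seconds)."""
    yield from [("home", 0), ("shop", 1), ("home", 4), ("shop", 6), ("home", 7)]

def tumbling_window_counts(stream, window_seconds):
    """Aggregate each event into its window the moment it arrives."""
    windows = {}
    for page, ts in stream:
        win = ts // window_seconds            # index of this event's window
        windows.setdefault(win, Counter())[page] += 1
    return windows

counts = tumbling_window_counts(clicks(), window_seconds=5)
```

A production system would emit each window's counts as soon as the window closes, rather than returning them all at the end; the per-event aggregation logic is the same.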
21. Complex Event Processing
Complex Event Processing (CEP) processes multiple event streams generated within the enterprise to construct data abstractions and identify meaningful patterns among those streams.
Analytics across both real-time and historical data.
Real-time event capture, filtering, pattern detection, matching, and aggregation.
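A hedged sketch of CEP-style pattern detection: flag any "withdraw" event that follows a "password_change" for the same user within a time window. The event names, the rule, and the 60-second window are illustrative assumptions, not from the slides.

```python
# CEP-style pattern matching over a merged, time-ordered event stream:
# detect the sequence (password_change -> withdraw) per user within a window.
def detect(events, window=60):
    """events: iterable of (timestamp, user, kind), assumed time-ordered."""
    last_change = {}                  # user -> time of last password change
    alerts = []
    for ts, user, kind in events:
        if kind == "password_change":
            last_change[user] = ts
        elif kind == "withdraw":
            if user in last_change and ts - last_change[user] <= window:
                alerts.append((user, ts))   # pattern matched: raise alert
    return alerts

stream = [(10, "alice", "password_change"),
          (40, "alice", "withdraw"),        # 30 s after the change
          (500, "bob", "withdraw")]         # no preceding change
alerts = detect(stream)
```

CEP engines express such rules declaratively and match them across many streams at once; this shows only the core stateful-matching idea.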
23. Tools for Real Time Analytics
Big Data is NOT new, the Tools ARE!
IBM InfoSphere Streams
24. Kafka
A high-performance distributed publish-subscribe messaging system.
Designed for processing of real-time activity stream data.
Initially developed at LinkedIn, now part of Apache.
Kafka works in combination with Apache Storm, Apache HBase and Apache Spark for real-time analysis and rendering of streaming data.
Fast, scalable, durable, fault-tolerant.
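Kafka's core idea can be modeled in a few lines: a topic is an append-only log, and each consumer tracks its own read offset, so many consumers can replay the same stream independently. This in-memory toy broker is an illustrative sketch, not Kafka's actual API.

```python
# Toy model of Kafka's log-based publish-subscribe: topics are
# append-only logs; consumers read from an offset they manage themselves.
class MiniBroker:
    def __init__(self):
        self.logs = {}                        # topic -> list of messages

    def publish(self, topic, message):
        """Append a message to the topic's log (the 'producer' side)."""
        self.logs.setdefault(topic, []).append(message)

    def consume(self, topic, offset):
        """Return (new_messages, next_offset) for a consumer at `offset`."""
        log = self.logs.get(topic, [])
        return log[offset:], len(log)

broker = MiniBroker()
broker.publish("activity", "login:alice")
broker.publish("activity", "click:home")

msgs_a, off_a = broker.consume("activity", 0)   # consumer A replays from start
msgs_b, off_b = broker.consume("activity", 1)   # consumer B already saw one
```

Because consumption does not delete messages, the same topic feeds Storm, HBase and Spark consumers at the same time, each at its own pace.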
25. Storm
A highly distributed real-time computation system.
Acquired by Twitter.
Twitter claims, “Over a million tuples processed per second per node.”
Fast, Scalable, Reliable and Fault-tolerant.
Primitives:
Stream: unbounded sequence of tuples
Spouts: pull messages
Bolts: perform the core functions of stream computing
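Storm's primitives can be mimicked in plain Python: a spout emits an unbounded stream of tuples and bolts transform or aggregate them. The names mirror Storm's vocabulary; the wiring is simplified to generator chaining, and the word-count topology is an illustrative assumption.

```python
# Sketch of a Storm-style topology: spout -> split bolt -> count bolt.
from collections import Counter

def sentence_spout():
    """Spout: pulls messages from a source (here, a fixed list)."""
    yield from ["big data is big", "real time data"]

def split_bolt(stream):
    """Bolt: split each sentence tuple into word tuples."""
    for sentence in stream:
        yield from sentence.split()

def count_bolt(stream):
    """Bolt: maintain a running word count over the word stream."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
    return counts

counts = count_bolt(split_bolt(sentence_spout()))
```

In Storm the same shape runs distributed: spouts and bolts are deployed across workers and tuples flow between them over the network.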
26. Spark Streaming
Developed in the AMPLab at UC Berkeley.
In-memory computing capabilities deliver speed.
Low latency, high throughput, fault tolerant.
New programming model: Discretized Streams (DStreams), built on Resilient Distributed Datasets.
Spark Streaming uses micro-batching to support continuous stream processing. It is an extension of Spark, which is a batch-processing system.
Courtesy: Apache Spark
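Micro-batching in miniature: the continuous stream is chopped into small batches (a DStream is, in effect, a sequence of such batches), and an ordinary batch operation runs on each one. The batch size and the sum operation are illustrative assumptions.

```python
# Micro-batching sketch: group an (in principle unbounded) stream into
# fixed-size batches, then apply a normal batch computation per batch.
def micro_batches(stream, batch_size):
    """Yield successive batches of `batch_size` items from the stream."""
    batch = []
    for item in stream:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch                   # flush the final partial batch

stream = iter(range(7))
batch_sums = [sum(b) for b in micro_batches(stream, batch_size=3)]
```

Spark Streaming batches by time interval rather than item count, but the trade-off is the same: latency is bounded below by the batch interval, in exchange for reusing the batch engine unchanged.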
27. Spring XD (XD=eXtreme Data)
Spring XD is a unified, distributed, and extensible system for data ingestion, real-time analytics, batch processing, and data export.
The Spring XD framework supports streams for the ingestion of event-driven data from a source to a sink, passing through any number of processors.
Courtesy: Infoq
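The source-processor-sink model can be sketched as plain function composition (Spring XD wires the equivalent stages with a pipe DSL such as `http | filter | file`). The stages below, including the integer-filtering processor, are illustrative assumptions.

```python
# Sketch of a Spring XD-style stream: source | processor | sink,
# expressed here as chained Python generators.
def source():
    """Source: emits raw items (a real source might be an HTTP endpoint)."""
    yield from ["10", "oops", "25"]

def parse_processor(stream):
    """Processor: keep only items that parse as integers."""
    for item in stream:
        try:
            yield int(item)
        except ValueError:
            pass                       # drop malformed input

def sink(stream):
    """Sink: collect results (a real sink might write to HDFS)."""
    return list(stream)

result = sink(parse_processor(source()))
```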
28. Comparison of Tools (1)
Definition
- Spark Streaming: a fast and general-purpose cluster computing system.
- Apache Storm: a distributed real-time computation system.
- Spring XD: a unified, distributed, and extensible system for data ingestion, real-time analytics, batch processing, and data export.
Implemented in
- Spark Streaming: Scala. Apache Storm: Clojure. Spring XD: Java.
Programming API
- Spark Streaming: Scala, Java, Python. Apache Storm: Java API, usable with any programming language. Spring XD: Java.
Development
- Spark Streaming: a full top-level Apache project. Apache Storm: an Apache project undergoing incubation. Spring XD: a Spring project by Pivotal.
Processing Model
- Spark Streaming: a batch-processing framework that also does micro-batching. Apache Storm: a stream-processing framework that processes and dispatches messages as soon as they arrive. Spring XD: a unified platform for stream processing.
Fault Tolerance
- Spark Streaming: recovery of lost work and restart of workers via the resource manager. Apache Storm: restart of workers and supervisors as if nothing ever happened. Spring XD: reassignment of work to a working container.
29. Comparison of Tools (2)
Data Processing
- Spark Streaming: messages are not lost and are delivered exactly once (small-scale batching).
- Apache Storm: keeps track of each and every record.
- Spring XD: unacknowledged messages are retried until the container comes back.
Use Cases
- Spark Streaming: combines batch and stream processing (Lambda Architecture); machine learning, improving the performance of iterative algorithms; powering real-time dashboards.
- Apache Storm: prevention of securities fraud, compliance violations, security breaches and network outages; streaming tweets to Hadoop for sentiment analysis.
- Spring XD: high-throughput distributed data ingestion into HDFS from a variety of input sources; real-time analytics at ingestion time, e.g. gathering metrics and counting values.
30. Which tools are right for you?
31. Lambda Architecture
In 2013, Nathan Marz and James Warren proposed the Lambda Architecture, which attempts to provide a methodology for building a Big Data system. Such a system balances latency, throughput, and fault tolerance by using batch processing to provide comprehensive and accurate pre-computed views, while simultaneously using real-time stream processing to provide dynamic views.
Marz, Nathan, and James Warren. Big Data: Principles and best practices of scalable real-time data systems. O'Reilly Media, 2013.
Courtesy: Trivadis
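The Lambda Architecture can be shown in miniature: a batch view precomputed from the master dataset, a speed-layer view over events that arrived since the last batch run, and a query that merges both. Per-user event counts are an illustrative choice of metric.

```python
# Lambda Architecture sketch: batch layer + speed layer + merging query.
from collections import Counter

master_dataset = ["alice", "bob", "alice"]   # all historical events
recent_events = ["alice", "carol"]           # arrived after the last batch run

batch_view = Counter(master_dataset)         # batch layer: full recompute
speed_view = Counter(recent_events)          # speed layer: incremental updates

def query(user):
    """Serving layer: merge batch and real-time views at query time."""
    return batch_view[user] + speed_view[user]
```

When the next batch run absorbs the recent events into the master dataset, the speed view for that period is discarded, which is how the architecture keeps the real-time layer simple and the batch layer authoritative.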
32. Lambda Architecture Example
Marz, Nathan, and James Warren. Big Data: Principles and best practices of scalable real-time data systems. O'Reilly Media, 2013.
Courtesy: Trivadis
34. Use Cases
Healthcare
Capture and analyze real-time data from medical monitors, alerting hospital staff to potential health problems before patients manifest clinical signs of infection or other issues. Analyze privacy-protected streams of medical device data to detect early signs of disease and identify correlations among multiple patients.
Finance
Analyze ticks, tweets, satellite imagery, weather trends, and any other type of data to inform trading algorithms in real time. Apply fraud insights to take action in real time: use analytics on streaming data to confidently differentiate legitimate actions while preventing or interrupting suspicious ones, and respond immediately to criminal patterns and activities.
35. Use Cases
Government
Identify social program fraud within seconds based on program history, citizen profile, and geospatial data. Identify items or patterns for deeper investigation in cyber-security.
Transport
Traffic managers can now respond quickly and accurately to relevant insights from real-time analytics drawn from data feeds and reports. Telematics can provide data in motion such as vehicle speed, data relating to the transmission control system, braking, air bags, tire pressure and wiper speed, as well as geospatial and current environmental conditions data. Hence, automotive companies can strengthen customer relationships.
36. Use Cases
Telecommunication
Improve customer profitability analysis, end-to-end visibility for new product rollouts, and real-time analysis to better serve network customers. Perform capacity planning for mobile networks as new high-bandwidth services are introduced. Improve customer experience.
Retail
See a product recurring in abandoned shopping carts, then run a promotion to close more sales of that product. Evaluate sales performance in real time and take measures now to achieve sales quotas. An electronic coupon delivery service sends e-mails to customers with recommendations matched to their interests, derived from their location information, membership information, and information on nearby stores.
37. (Figure. Courtesy: SAP)
39. Future Work
Increased Level of Merging
Application of Social and Digital Media
New Technologies
Further Development of Telemetric Data
Self Learning Systems
Complex Statistical Methods
40. Conclusion
(Figure: concluding trade-offs among Resources, Privacy, Security, Time, and Cost)
“Consumer data will be the biggest differentiator in the next two to three years. Whoever unlocks the reams of data and uses it strategically, will win.”
- Angela Ahrendts, CEO, Burberry
Questions?