Interactive Latency in Big Data Visualization
Zhicheng "Leo" Liu, Research Scientist at the Creative Technologies Lab at Adobe Research
January 22nd, 2014
Reducing interactive latency is a central problem in visualizing large datasets. I discuss two inter-related projects in this problem space. First, I present the imMens system and show how we can achieve real-time interaction at 50 frames per second for billions of data points by combining techniques such as data tiling and parallel processing. Second, I discuss an ongoing user study that aims to understand the effect of interactive latency on human cognitive behavior in exploratory visual analysis.
Big Data Visualization Meetup - South Bay
http://www.meetup.com/Big-Data-Visualisation-South-Bay/
2. Latency:
a measure of time delay experienced in a system
rotational latency
network latency
query latency
interactive latency
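Of these, interactive latency is the one a visualization user feels directly: the delay between an input event (for example, a brush gesture) and the updated pixels on screen. A minimal sketch of measuring it in the browser, assuming a hypothetical updateLinkedViews() callback that re-queries and redraws:

    // Rough sketch: time from a user gesture to (approximately) the next painted frame.
    // `updateLinkedViews` is a hypothetical callback that re-queries and redraws the charts.
    function onBrush(updateLinkedViews: () => void): void {
      const start = performance.now();   // timestamp of the user gesture
      updateLinkedViews();               // query + re-render the linked views
      requestAnimationFrame(() => {
        // Runs just before the browser paints the next frame.
        const latencyMs = performance.now() - start;
        console.log(`interactive latency: ${latencyMs.toFixed(1)} ms`);
      });
    }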
3.
4. Questions
How to reduce interactive latency in big data visualization?
How does interactive latency affect user behavior?
5. Questions
How to reduce interactive latency in big data visualization?
How does interactive latency affect user behavior?
6. Reducing Latency
More memory
in-memory data store
Clever indexing
cube representation schemes
Parallel processing
multicore, GPGPU, distributed platforms
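To make the "clever indexing" idea concrete: a data cube precomputes aggregates for combinations of dimension values, so a query becomes a lookup rather than a scan over raw records. A minimal in-memory sketch (illustrative field names, not imMens itself):

    // Minimal in-memory "cube": precomputed counts keyed by (carrier, year).
    // Query cost depends on the number of cells, not the number of raw records.
    type Row = { carrier: string; year: number };

    function buildCube(rows: Row[]): Map<string, number> {
      const cube = new Map<string, number>();
      for (const r of rows) {
        const key = `${r.carrier}|${r.year}`;
        cube.set(key, (cube.get(key) ?? 0) + 1);
      }
      return cube;
    }

    // Usage: answer "how many flights did OH fly in 2003?" with a single lookup.
    const cube = buildCube([{ carrier: "OH", year: 2003 }, { carrier: "AS", year: 2003 }]);
    console.log(cube.get("OH|2003") ?? 0); // 1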
7. imMens: a holistic approach
Perceptual scalability
Binned aggregation as primary data reduction strategy
Interactive scalability
Multivariate data tiles
Parallel query processing and rendering on GPU
[Liu et al. 2013]
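Binned aggregation is the data reduction step: raw values are collapsed into counts per bin, so the amount of data to transfer and draw depends on the chosen bin resolution, not on the number of records. A rough 1D sketch (imMens precomputes multidimensional bins and slices them into data tiles):

    // Reduce raw values to per-bin counts; `counts.length` depends only on binCount.
    function binnedCounts(values: number[], min: number, max: number, binCount: number): number[] {
      const counts = new Array<number>(binCount).fill(0);
      const width = (max - min) / binCount;
      for (const v of values) {
        let b = Math.floor((v - min) / width);
        if (b < 0) b = 0;
        if (b >= binCount) b = binCount - 1; // clamp v === max into the last bin
        counts[b] += 1;
      }
      return counts;
    }

    // The output length is 50 regardless of how many raw values go in.
    console.log(binnedCounts([1.2, 3.7, 3.9, 9.99], 0, 10, 50).length); // 50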
8. imMens: a holistic approach
Perceptual scalability
Binned aggregation as primary data reduction strategy
Interactive scalability
Multivariate data tiles
Parallel query processing and rendering on GPU
[Liu et al. 2013]
9. Guiding Principle
Perceptual & interactive scalability should be limited
by the chosen resolution of the visualized data,
not the number of records.
19. imMens: a holistic approach
Perceptual scalability
Binned aggregation as primary data reduction strategy
Interactive scalability
Multivariate data tiles
Parallel query processing and rendering on GPU
[Liu et al. 2013]
47. Client-Side Processing
Diagram: multivariate data tiles are packed as images (352KB for Brightkite) and bound to the WebGL context as textures, with bin values encoded in the RGBA channels. Pass 1: a fragment shader evaluates the query over the binned ranges (e.g., X [512-767], Y [768-1023]) and renders projections to an off-screen FBO. Pass 2: a second fragment shader pass renders the aggregated result to the canvas.
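A minimal sketch of the general technique (not the exact imMens encoding): per-bin counts are packed into the bytes of an RGBA image, uploaded once as a WebGL texture, and decoded in a fragment shader so that query aggregation and rendering stay on the GPU. It assumes a browser WebGL 1.0 context and a vertex shader that supplies the vUv coordinate.

    // Pack per-bin counts into RGBA bytes: 4 bytes per bin = one 32-bit count per texel.
    function packCounts(counts: Uint32Array, width: number, height: number): Uint8Array {
      const bytes = new Uint8Array(width * height * 4);
      counts.forEach((c, i) => {
        bytes[4 * i]     =  c          & 0xff; // R: low byte
        bytes[4 * i + 1] = (c >>> 8)   & 0xff; // G
        bytes[4 * i + 2] = (c >>> 16)  & 0xff; // B
        bytes[4 * i + 3] = (c >>> 24)  & 0xff; // A: high byte
      });
      return bytes;
    }

    // Upload a packed data tile as a texture the fragment shader can sample.
    function uploadTile(gl: WebGLRenderingContext, bytes: Uint8Array, width: number, height: number): WebGLTexture {
      const tex = gl.createTexture()!;
      gl.bindTexture(gl.TEXTURE_2D, tex);
      gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, width, height, 0, gl.RGBA, gl.UNSIGNED_BYTE, bytes);
      gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST); // exact texel reads, no blending
      gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
      gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
      gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
      return tex;
    }

    // Fragment shader (GLSL ES 1.0) that recovers the count stored in the sampled texel.
    // Note: highp float represents integers exactly only up to 2^24.
    const decodeFragmentShader = `
      precision highp float;
      uniform sampler2D uTile;
      varying vec2 vUv;
      void main() {
        vec4 t = texture2D(uTile, vUv) * 255.0;
        float count = t.r + t.g * 256.0 + t.b * 65536.0 + t.a * 16777216.0;
        gl_FragColor = vec4(vec3(min(count / 1024.0, 1.0)), 1.0); // arbitrary gray-scale mapping
      }
    `;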
48. Performance Benchmarks
Simulate brush & linking across plots in a scatter plot matrix
imMens vs. full data cube
60 synthesized datasets
Parameters: bin count per dimension (10, 20, 30, 40, 50); number of records (10K, 100K, 1M, 10M, 100M, 1B); number of dimensions (4, 5)
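The 60 synthesized datasets are presumably the cross product of the three parameters (5 bin counts x 6 record counts x 2 dimension counts). A small sketch enumerating that grid:

    // Benchmark grid: 5 bin counts x 6 record counts x 2 dimension counts = 60 configurations.
    const binCounts = [10, 20, 30, 40, 50];
    const recordCounts = [1e4, 1e5, 1e6, 1e7, 1e8, 1e9];
    const dimensionCounts = [4, 5];

    const configs = binCounts.flatMap(bins =>
      recordCounts.flatMap(records =>
        dimensionCounts.map(dims => ({ bins, records, dims }))));

    console.log(configs.length); // 60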
49-51. Benchmark results
Test setup: Google Chrome v.23.0.1271.95 on a quad-core 2.3 GHz MacBook Pro (OS X 10.8.2) with per-core 256K L2 caches, shared 6MB L3 cache and 8GB RAM; PCI Express NVIDIA GeForce GT 650M graphics card with 1024MB video RAM.
Chart (frames per second): 51.9, 52.3, 51.6, 52.0, 53.2, 52.1 for imMens; 5.5, 3.0, 2.2 for the full data cube comparison.
50fps querying and rendering of 1B data points
56.
Newell (1994)       | Card et al. (1983)  | Example                                | Time Range
deliberate act      | perceptual fusion   | recognize a pattern, track animation   | ~100 milliseconds
cognitive operation | unprepared response | click a link, select an object         | ~1 second
unit task           | unit task           | edit a line of text, make a chess move | ~10 seconds
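These time scales are commonly read as latency budgets for interaction: staying under ~100 ms keeps brushing and animation perceptually fused, ~1 s suits discrete actions, and ~10 s bounds a unit task. A trivial helper mapping a measured latency onto the bands in the table above (a sketch, not part of the talk):

    // Classify a measured latency against the time scales in the table above.
    function latencyBand(ms: number): string {
      if (ms <= 100) return "perceptual fusion (~100 ms): smooth brushing and animation";
      if (ms <= 1000) return "unprepared response (~1 s): fine for discrete actions";
      if (ms <= 10000) return "unit task (~10 s): a noticeable pause in the analysis flow";
      return "beyond a unit task: likely to disrupt exploratory analysis";
    }

    console.log(latencyBand(45));   // perceptual fusion (~100 ms): ...
    console.log(latencyBand(2500)); // unit task (~10 s): ...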
69. What is an insight?
"many new airlines emerged around year 2003"
"HP started in 2001, AS in 2003, PI in 2004, OH in 2003"
"OH started in 2003, and they are doing pretty well in terms of delays"
70. Questions
How to reduce interactive latency in big data visualization?
imMens: a system supporting real-time interaction
binned aggregation for perceptual scalability
multivariate data tiles & GPU processing for low latency
How does interactive latency affect user behavior?
Comparative study: quantitative & qualitative analysis
71. Questions
How to reduce interactive latency in big data visualization?
imMens: a system supporting real-time interaction
binned aggregation for perceptual scalability
multivariate data tiles & GPU processing for low latency
How does interactive latency affect user behavior?
72. Questions
How to reduce interactive latency in big data visualization?
imMens: a system supporting real-time interaction
binned aggregation for perceptual scalability
multivariate data tiles & GPU processing for low latency
How does interactive latency affect user behavior?
User study: quantitative & qualitative analysis