Model-Simulation-and-Measurement-Based Systems Engineering of Power System Sy... (Luigi Vanfretti)
The document discusses model-simulation-and-measurement-based systems engineering of power system synchrophasor systems. It outlines the speaker's background and research interests in modeling and simulation technologies for cyber-physical power systems. The talk motivates the need for these technologies to enable applications like wide-area control systems using synchronized phasor measurements. It also discusses challenges in developing smart grids as complex cyber-physical systems and the roles that modeling and simulation can play in addressing these challenges.
Design and Implementation of A Data Stream Management System (Erdi Olmezogullari)
This presentation relates to my Master's Thesis at Ozyegin University. We focused on data mining over real (non-binary) streaming data. The most popular data mining algorithm, Association Rule Mining (ARM), was implemented from scratch for this study. By the end of the thesis, we had published four national/international papers at conferences in areas such as cloud computing and big data.
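The essence of Association Rule Mining is counting frequent itemsets and then deriving rules with sufficient confidence. A minimal sketch (not the thesis implementation; the transactions and thresholds are made-up):

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return itemsets whose support (fraction of transactions) meets min_support."""
    n = len(transactions)
    counts = {}
    for t in transactions:
        for size in (1, 2):  # singletons and pairs are enough for this small demo
            for items in combinations(sorted(t), size):
                counts[items] = counts.get(items, 0) + 1
    return {items: c / n for items, c in counts.items() if c / n >= min_support}

def rules(itemsets):
    """Derive rules a -> b from frequent pairs, with confidence = supp(a,b) / supp(a)."""
    out = []
    for items, supp in itemsets.items():
        if len(items) == 2:
            a, b = items
            out.append((a, b, supp / itemsets[(a,)]))
    return out

txns = [{"milk", "bread"}, {"milk", "bread", "eggs"}, {"bread"}, {"milk", "eggs"}]
freq = frequent_itemsets(txns, min_support=0.5)
```

A streaming version would maintain these counts incrementally over a sliding window rather than re-scanning all transactions, which is where the data stream management system comes in.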
This document describes a proposed system for detecting cyber attacks using Bayesian inference. It begins with an introduction to the problem of credit/debit card theft and existing physical unclonable functions. It then discusses the disadvantages of existing cyber attack detection systems, such as performance issues and high false positive rates. The proposed system builds a directed acyclic graph to represent the probability distribution of variables related to cyber attacks. It will use modules for data collection, preprocessing, model training/testing, and attack detection. The system will be implemented in Python using frameworks like Django and evaluated using algorithms like random forest, artificial neural networks, and support vector machines.
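The core idea of Bayesian inference over such a directed acyclic graph is updating the probability of an attack given observed evidence. A two-node sketch (Attack -> Alert) with hypothetical probabilities, not the proposed system's actual model:

```python
p_attack = 0.01                            # prior P(Attack) -- assumed for illustration
p_alert_given = {True: 0.9, False: 0.05}   # P(Alert | Attack) and P(Alert | no Attack)

def posterior_attack_given_alert():
    """Bayes' rule along the DAG edge Attack -> Alert."""
    num = p_alert_given[True] * p_attack
    den = num + p_alert_given[False] * (1 - p_attack)
    return num / den

posterior = posterior_attack_given_alert()
```

Even with a 90%-sensitive detector, the low prior keeps the posterior modest, which is exactly the false-positive problem the summary mentions; a larger DAG factors the joint distribution over many such conditional tables.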
A general framework for predicting the optimal computing configuration for cl... (Scott Farley)
This document summarizes Scott Farley's master's thesis presentation on developing a framework to predict optimal computing configurations for ecological forecasting models under climate change. The presentation discusses species distribution modeling and biodiversity informatics, describes challenges posed by big biodiversity data, and proposes using computational performance models to identify the hardware configuration that maximizes model accuracy while minimizing time and costs. The goal is to efficiently run ecological forecasting models on flexible cloud computing resources.
This document discusses spatio-temporal data mining and how deep learning techniques can be applied. Spatio-temporal data relates to both space and time, such as images showing destruction from flooding over time. Traditional data mining approaches did not perform well on spatio-temporal data due to independence assumptions. Deep learning models like CNNs, RNNs, and LSTMs are better suited as they can automatically learn features and handle sequence data. The document reviews frameworks for preprocessing spatio-temporal data and selecting appropriate deep learning models to address problems like prediction, classification, and learning. Real-time applications include transportation, social networks, climate, and neuroscience.
This document is a resume for Ye Xu, who has 5 years of experience in software engineering, data engineering, and data science. Xu developed efficient solutions for embedded systems and real-time computer resource management, and has experience building data pipelines and using machine learning on large datasets. Current areas of research interest include combining real-time systems with big data in fields like the Internet of Things.
Predicting Defects Using Change Genealogies (ISSE 2013) (Kim Herzig)
This document discusses using change genealogies, which model dependencies between code changes, to predict defects. It finds that models using change genealogy metrics outperform those based on code complexity or dependency networks alone, achieving better precision while maintaining close recall. Key metrics include network efficiency and relationships between changes and dependency types. The study confirms that code entities combining functionalities from multiple older changes are more defect-prone.
1. The document discusses distributed and private machine learning techniques for medical imaging devices like smartwatches that can collaborate without sharing sensitive patient data.
2. It proposes using techniques like split learning, federated learning, and differential privacy so that devices can train models together while keeping data localized.
3. The challenges include limited compute, bandwidth, and data availability from individual devices, as well as ensuring privacy and preventing data reconstruction attacks. Addressing these challenges is important for incentivizing widespread device collaboration at scale.
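The federated learning step mentioned above can be sketched as federated averaging: each device takes a gradient step on its private data, and only the resulting weights are shared and averaged. A toy one-parameter example (fitting y = 2x; the data and learning rate are illustrative, not from the document):

```python
def local_update(weights, data, lr=0.1):
    """One gradient-descent step on least-squares loss over a device's private data."""
    grad = sum(2 * (weights * x - y) * x for x, y in data) / len(data)
    return weights - lr * grad

def federated_average(global_w, device_datasets):
    """Each device trains locally; only weights (never raw data) leave the device."""
    local = [local_update(global_w, d) for d in device_datasets]
    return sum(local) / len(local)

# Two devices, each holding points on y = 2x; the data never leaves the device.
devices = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
w = 0.0
for _ in range(200):
    w = federated_average(w, devices)
```

Differential privacy would add calibrated noise to the shared updates, trading a little accuracy for a bound on what any single patient's data can reveal.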
The document provides a summary of Quinn M. Owens' professional experience and achievements. It details their experience leading deployments for new Amazon data centers globally and developing new hardware designs. It also lists their roles and responsibilities in network engineering and technical program management positions at Amazon from 2012 to present.
Machine Learning 2 deep Learning: An Intro (Si Krishan)
The document provides an introduction to machine learning and deep learning. It discusses that machine learning involves making computers learn patterns from data without being explicitly programmed, while deep learning uses neural networks with many layers to perform end-to-end learning from raw data without engineered features. Deep learning has achieved remarkable success in applications involving computer vision, speech recognition, and natural language processing due to its ability to learn representations of the raw data. The document outlines popular deep learning models like convolutional neural networks and recurrent neural networks and provides examples of applications in areas such as image classification and prediction of heart attacks.
Shubhangi Tandon is pursuing a Master's degree in Computer Science at UC Santa Cruz with a GPA of 3.8/4. She received her Bachelor's degree in Information Technology from Delhi College of Engineering with distinction. Her technical skills include Python, TensorFlow, Java, C++ and machine learning algorithms. She has work experience as a researcher at UCSC, internships at VMware and Microsoft, and was a software developer at Goldman Sachs. Her projects include building chatbots using neural networks and conducting research on discourse coherence and argument summarization from social media.
The Challenges, Gaps and Future Trends: Network Security (Deris Stiawan)
This document discusses several challenges and future trends in network security, including network attacks, forensic investigation, cloud computing, heterogeneous networks, network graphs, network management, big data processing, and the Internet of Things. It provides examples of existing research and identifies opportunities for new research in tools and methods for defense, data analysis, clustering/classification, security mechanisms, privacy, quality of service, monitoring, and data processing in these domains.
Big Data in the Cloud: How the RISElab Enables Computers to Make Intelligent ... (Amazon Web Services)
Scientists, developers, and other technologists from many different industries are taking advantage of Amazon Web Services to run big data workloads, from analytics to data lakes for better decision making, meeting the challenges of the increasing volume, variety, and velocity of digital information. This session features UCB's RISELab (Real-time Intelligent Secure Execution), a new lab recently created at UCB to enable computers to make intelligent, real-time decisions. You will hear how the lab is building on its earlier success with AMPLab to enable applications to interact intelligently and securely with their environment in real time, wherever computing decisions need to interact with the world. From cybersecurity to coordinating fleets of self-driving cars and drones to earthquake warning systems, you will come away with insight into how RISELab uses AWS to develop and experiment with systems for important research. Learn more: https://aws.amazon.com/government-education/
This document summarizes a lecture on data integration. It discusses key challenges in data integration including providing uniform access to multiple autonomous and heterogeneous data sources. It describes common solutions like data warehousing and the virtual integration approach. Research projects on data integration and current industry solutions are also mentioned. Key concepts in data integration like wrappers, mediated schemas, query reformulation, and optimization are covered at a high level.
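The wrapper/mediator pattern from the lecture can be sketched in a few lines: each wrapper maps a source's local schema onto a mediated schema, and the mediator reformulates one query over all wrapped sources. The sources and schemas below are hypothetical:

```python
# Two autonomous sources with heterogeneous schemas (made-up data).
source_a = [{"emp": "Ada", "dept": "R&D"}]
source_b = [{"name": "Bob", "department": "Sales"}]

def wrap_a(rows):
    """Wrapper: map source A's schema onto the mediated schema (name, dept)."""
    return [{"name": r["emp"], "dept": r["dept"]} for r in rows]

def wrap_b(rows):
    """Wrapper for source B's differently named columns."""
    return [{"name": r["name"], "dept": r["department"]} for r in rows]

def mediator_query(dept=None):
    """Virtual integration: answer one query over both sources, no warehouse copy."""
    rows = wrap_a(source_a) + wrap_b(source_b)
    return [r for r in rows if dept is None or r["dept"] == dept]
```

A data warehouse would instead materialize the wrapped rows ahead of time; the virtual approach trades query latency for freshness.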
This tutorial provides an overview of recent advances in deep generative models. It will cover three types of generative models: Markov models, latent variable models, and implicit models. The tutorial aims to give attendees a full understanding of the latest developments in generative modeling and how these models can be applied to high-dimensional data. Several challenges and open questions in the field will also be discussed. The tutorial is intended for the 2017 conference of the International Society for Bayesian Analysis.
Introduction to Machine learning - DBA's to data scientists - Oct 2020 - OGBEmea (Sandesh Rao)
This session covers the basics of machine learning: the different types of machine learning and neural networks, supervised and unsupervised learning with examples, and AutoML for training models. It closes with worked examples, from predicting fraud and shopping patterns to wine picking, illustrating different algorithms along the way, and shows how to predict workload for your databases using OML in the Autonomous Database cloud. It is aimed at DBAs who want to learn about machine learning and use these tools to perform their tasks more efficiently and automatically.
Introduction to Machine Learning - From DBA's to Data Scientists - OGBEMEA (Sandesh Rao)
This session covers the basics of machine learning: the different types of machine learning and neural networks, supervised and unsupervised learning with examples, and AutoML for training models. It closes with worked examples, from predicting fraud and shopping patterns to wine picking, illustrating different algorithms along the way, and shows how to predict workload for your databases using OML in the Autonomous Database cloud. It is aimed at DBAs who want to learn about machine learning and use these tools to perform their tasks more efficiently and automatically.
This document is a resume for Mark Yashar. It summarizes his expertise in data analysis, scientific computing, physics, and high performance computing. It lists his qualifications including various programming languages, operating systems, and software. It also outlines his educational background including a PhD in Physics from UC Davis, and professional experience including roles doing data analysis, scientific modeling and simulation, and research.
This document is a resume for Mark Yashar summarizing his qualifications and experience in data analysis, scientific computing, physics, and related fields. It outlines his expertise in areas like image processing, algorithm development, data visualization, and machine learning. It also lists his proficiency with various programming languages, software applications, and high-performance computing platforms. His educational and professional background demonstrate extensive experience in scientific research, data analysis, and technical project roles.
ATI Courses Professional Development Short Course Applied Measurement Engin... (Jim Jenkins)
How do you know your test measurements are valid? Since NIST traceability actually guarantees little about your test data, how do you know? Could you prove validity to your customer? What is the right measurement solution for your testing requirements? Is it really as simple as the vendors say? What is the real cost of invalid, ambiguous data causing retest or, worst of all, hardware redesign?
This course is for engineers, scientists, and managers who must use systems to understand experimental test measurements on a daily basis. Learn how to design, buy, and operate effective automated measurement systems that provide demonstrably valid test data the first time.
Fundamental & underlying engineering principles governing the design and operation of effective automated systems are demonstrated experimentally.
Big Data Analytics for the connected home: a few use cases, some important messages, and a little example. Presentation given at CEA Cadarache - Cité des Nouvelles Energies to the strategic committee of ARCSIS (http://www.arcsis.org/missions.html).
Russell John Childs has over 20 years of experience in technical software engineering, modeling complex systems, and safety-critical C++ development. He has a PhD in Particle Physics from Birmingham University and skills in C++, algorithms, parallel programming, hardware modeling, testing, and more. His resume details roles at Microsoft, Sun Microsystems, Advantest, and more where he developed load balancing algorithms, hardware behavior models, testing frameworks, and more. He is currently seeking a role utilizing his experience in analysis, architecture, design, C++, and physics/mathematics background.
This document describes several C# projects from IEEE 2014, including summaries of each project. The projects cover topics like localizing jammers in wireless networks, network-coding based cloud storage, privacy-preserving search on encrypted cloud data, compatibility-aware cloud service composition, analyzing social media to understand students' experiences, viral marketing in social networks, opportunistic MAC for underwater sensor networks, WLAN monitoring systems, anonymous vehicle positioning in vehicular networks, and information flow control for cloud security.
Detection of Phishing Websites using Machine Learning Algorithm (IRJET Journal)
This document discusses the detection of phishing websites using machine learning algorithms. It begins with an abstract that defines phishing and explains why attackers use it. The introduction provides more details on phishing techniques and the need for anti-phishing detection methods. The document then reviews related work on phishing detection using machine learning features. It proposes using algorithms like artificial neural networks, k-nearest neighbors, support vector machines, and random forests. Features for these algorithms are discussed like URL-based, HTML/JavaScript-based, and domain-based features. The document concludes that machine learning classifiers can help detect phishing websites but future work is still needed to develop more effective detection systems.
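The URL-based features mentioned in the summary are the easiest to sketch: simple lexical signals extracted from the address before any page content is fetched. A small illustrative subset (the feature choices here are common examples, not the paper's exact feature set):

```python
import re
from urllib.parse import urlparse

def url_features(url):
    """Extract a few URL-based phishing signals often fed to classifiers."""
    parsed = urlparse(url)
    host = parsed.netloc.split(":")[0]
    return {
        "length": len(url),                      # phishing URLs tend to be long
        "has_ip_host": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "num_dots": parsed.netloc.count("."),    # many subdomains is suspicious
        "has_at_symbol": "@" in url,             # '@' can hide the real host
        "uses_https": parsed.scheme == "https",
    }

f = url_features("http://192.168.0.1/login@secure")
```

Feature dictionaries like this would then be vectorized and passed to the classifiers the document lists (random forests, SVMs, k-NN, neural networks).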
Navy security contest-bigdataforsecurity (stelligence)
This document discusses using machine learning for security monitoring. It begins with an overview of why machine learning is useful for security monitoring and provides a high-level overview of machine learning concepts. It then discusses applying machine learning to practical security use cases like fraud detection, network anomaly detection, and predicting attack behaviors. Specific machine learning techniques like supervised learning, unsupervised learning, and anomaly detection are also discussed. Finally, it provides an example workflow for using machine learning in a security data science process.
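The anomaly-detection technique mentioned here can be illustrated with the simplest statistical baseline: flag events that deviate too far from the mean. The metric and threshold below are illustrative, not from the document:

```python
import statistics

def zscore_anomalies(values, threshold=2.0):
    """Flag values more than `threshold` population standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) > threshold * sd]

# Hypothetical login-rate telemetry; the final burst is the anomaly.
logins_per_min = [4, 5, 6, 5, 4, 5, 6, 90]
suspicious = zscore_anomalies(logins_per_min)
```

Real security pipelines replace this with models robust to heavy-tailed baselines (e.g. isolation forests or learned behavioral profiles), but the workflow shape is the same: fit a notion of normal, then score deviations.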
From Efficiency to Innovation: Transforming Business Value through Gen AI (Sameer Verma)
The world of AI is undergoing a metamorphosis. Traditional AI, programmed for specific tasks like playing chess, is being eclipsed by a new era of learning AI. This new breed can adapt, analyze data, and even create content. The shift is a game-changer for enterprises: repetitive tasks can be automated, vast datasets can be analyzed for insights, and entirely new products can be AI-powered. But the workforce needs to adapt too. Collaboration with AI tools will be key, requiring new skill sets like critical thinking and problem solving. Generative AI, with its ability to craft images, music, and even code, holds immense promise. However, current offerings are in their infancy: they can be impressive but are prone to stumbles and biases.
The future of business is a partnership with AI. Businesses must carefully assess current tools and invest in human-AI collaboration and continuous learning; this will be the key to navigating the exciting but uncertain path ahead. Ultimately, we must not lose sight of the true purpose of an enterprise: to provide value to consumers, improve their lives, and do so responsibly and sustainably while delivering acceptable returns to stakeholders.
A Framework for Information Access in Rural and Remote Communities (Sameer Verma)
Access to information is predicated on access to a digital infrastructure. However, access to electricity and the Internet remains elusive for a significant percentage of the world's population, let alone sustainable access in one's local language, local context, and local culture. This paper examines the issues of resource constraints and proposes a framework to classify them. It then uses this framework to examine three case studies of offline Internet access implementations in Madagascar, Jamaica, and India.
Presented at IEEE ISTAS 2016. http://istas2016.org
The future of business is a partnership with Al. Businesses must carefully assess current tools and invest in human-Al collaboration and continuous learning. This will be the key to navigating the exciting, but uncertain path ahead. Eventually, we must not lose sight of the true purpose of an enterprise to provide value to the consumers, in order to improve their lives, and to do so responsibly, and in a sustainable way that provides acceptable returns to stakeholders.
A Framework for Information Access in Rural and Remote CommunitiesSameer Verma
Access to information is predicated on the access to a digital infrastructure. However, access to electricity and the Internet remain elusive for a significant percentage of the world's population, let alone a sustainable access in one’s local language, local context, and relating to local culture. This paper examines the issues of resource constraints, and proposes a framework to classify them. It then proceeds to utilize this framework to look at three different case studies of implementations of offline Internet access in Madagascar, Jamaica and India.
Presented at IEEE ISTAS 2016. http://istas2016.org
This document describes the XOVis learning analytics and visualization tool. XOVis collects metadata from students' work on their laptops to provide insights into learning and engagement. Student work is stored locally and then synced across schools and to the cloud using CouchDB and eventual consistency. This allows analytics even when internet is unavailable. XOVis processing and reporting is done both at local school appliances and in the cloud. The goal is to help educators better understand learning through visualized analytics on student computer usage.
Juju, LXC, OpenStack: Fun with Private CloudsSameer Verma
Description: Private clouds fill an interesting space in the cloud roadmap. They can provide a scalable, reliable, fault-tolerant cloud platform on your own infrastructure, and can be balanced with public cloud offerings. We will look at three technologies. OpenStack is a cloud operating system that controls large pools of compute, storage, and networking resources throughout a datacenter, all managed through a dashboard that gives administrators control while empowering their users to provision resources through a web interface. Juju, a cloud orchestration platform from Ubuntu, enables you to build entire environments in the cloud with only a few commands on public clouds like Amazon Web Services and HP Cloud, to private clouds built on OpenStack. LXC is the userspace control package for Linux Containers, a lightweight virtual system mechanism sometimes described as “chroot on steroids”. LXC builds up from chroot to implement complete virtual systems, adding resource management and isolation mechanisms to Linux’s existing process management infrastructure. How cool would it be, to walk around with a private cloud on your laptop?
"Computer, end program": Virtualization and the CloudSameer Verma
One does not simply explain "cloud". A continuum from virtual machines to the cloud, with a Star Trek bias. Holodeck, virtual machines, hypervisors, pulbic cloud, private cloud, hybrid cloud, VirtualBox, Ubuntu, OpenStack, and finally, Make it so!
Creativity and Innovation with One Laptop per ChildSameer Verma
How the One Laptop per Child project comes up with creative and innovative solutions to challenging problems by changing the constraints to the problems.
The document discusses the One Laptop per Child (OLPC) initiative which aims to provide low-cost and rugged laptops to empower education for children in developing areas of the world. It has distributed over 3 million laptops to children in over 40 countries speaking over 30 languages. The laptops use the Sugar interface and are designed for collaborative, joyful learning through activities like TurtleArt, Scratch, and measuring. OLPC has implementations in specific areas described like Nigeria, Thailand, India, Mongolia, Ethiopia, and more.
The Joy of Z Axis: Creativity and Innovation through 3D PrintingSameer Verma
Presentation on creativity and innovation through 3D printing. Featuring the Printrbot Jr. V2 at the College of Business, San Francisco State University.
One Laptop per Child and Sugar: Collaborative, Joyful and Self-empowered Lear...Sameer Verma
The One Laptop Per Child (OLPC) project has had several beginnings. The idea has roots in the 60s. It gained momentum in the last 15 years. OLPC released the idea to the world in 2005, and its first product in 2007. A lot has changed since then. We'll look at an update on the projects, learning through robotics, assessment through learning analytics, offline mirco-clouds, HTML5 apps, Sugar on tablets and Raspberry Pi, and other new initiatives. In a world of cheap, Android-driven tablets, how does the idea of OLPC fit? What role does the Sugar learning platform continue to play inside and outside of OLPC? Help us grow the initiatives so that children of the world may continue to have a chance at collaborative, joyful, and self-empowered learning.
Pathagar is a book publishing company founded by Sameer Verma that focuses on open source textbooks. It maintains a GitHub repository where it publishes free and open source textbooks that can be accessed and customized by students and educators around the world to help improve access to affordable education. The company aims to lower the costs of textbooks while increasing their availability.
Education and Social Inclusion through InformationSameer Verma
The document discusses the One Laptop Per Child (OLPC) organization, which aims to empower children worldwide through education. Its mission is to provide each child with a low-cost, rugged laptop to support collaborative and self-directed learning. OLPC has distributed over 3 million laptops to children in over 40 countries. The document outlines OLPC's educational approach and principles, technical specifications for its XO laptop, and its software platform and learning content. It also describes OLPC's architecture which utilizes cloud, on-site micro-cloud, and individual devices to enable learning even without internet connectivity.
Data by itself is simply a collection of numbers. It only becomes meaningful when we weave it through context. A context of relevance that creates information - provides insight, creates solutions and solves problems. The Web gives us a fabric of connectedness, but if the data isn't substantiated semantically, the information we create isn't very useful. By building effective web assets using platforms like Drupal, we build ways to solve problems across the spectrum from local to global. We not only build the Web the way it was meant to be, but we also build it to support a commons across community, enterprise and government for generations to come.
An introduction to virtualization as a concept, its implementation in VirtualBox and an extension into an OpenStack private cloud. Done at SF State University. See more at http://commons.sfsu.edu/virtualization-and-cloud
Social Justice and Equity through InformationSameer Verma
This document summarizes a presentation by Sameer Verma on social justice and equity through information. It discusses how free and open source software can help increase access to information for underserved communities and reduce the digital divide. It provides examples of how One Laptop Per Child is working to provide low-cost laptops and educational resources to children in over 40 countries worldwide, especially in rural areas lacking technology and infrastructure. The presentation emphasizes using technology and information to empower communities and further social justice and equity goals.
Social Justice and Equity through InformationSameer Verma
This document summarizes a presentation about social justice and equity through information and technology. It discusses how free and open source software can help increase access to information globally. It provides examples of the One Laptop Per Child (OLPC) initiative that aims to provide low-cost laptops to children in developing countries around the world. Specific examples of OLPC programs in countries like India, Jamaica, Afghanistan and partnerships with San Francisco State University are mentioned. The document advocates that technologies like OLPC can help more of the world gain access to education and information.
Facilitating a Digital Commons for Generations to ComeSameer Verma
This document discusses facilitating a digital commons for future generations. It covers topics such as using Creative Commons licenses to enable legal sharing and collaboration of educational resources. It provides examples of how open licensing policies have been applied to funding for educational grants and open high school curriculum development. The importance of open access platforms for curating and disseminating resources like books, music and videos is also covered. Examples discussed include the Internet Archive and low-cost solutions like Dreamplug to provide access in remote areas. The overall message is the importance of keeping educational resources open and accessible for generations to come.
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfChart Kalyan
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
Skybuffer SAM4U tool for SAP license adoptionTatiana Kojar
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on integration of Salesforce with Bonterra Impact Management.
Interested in deploying an integration with Salesforce for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfflufftailshop
When it comes to unit testing in the .NET ecosystem, developers have a wide range of options available. Among the most popular choices are NUnit, XUnit, and MSTest. These unit testing frameworks provide essential tools and features to help ensure the quality and reliability of code. However, understanding the differences between these frameworks is crucial for selecting the most suitable one for your projects.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU und die Lizenzen nach dem CCB- und CCX-Modell sind für viele in der HCL-Community seit letztem Jahr ein heißes Thema. Als Notes- oder Domino-Kunde haben Sie vielleicht mit unerwartet hohen Benutzerzahlen und Lizenzgebühren zu kämpfen. Sie fragen sich vielleicht, wie diese neue Art der Lizenzierung funktioniert und welchen Nutzen sie Ihnen bringt. Vor allem wollen Sie sicherlich Ihr Budget einhalten und Kosten sparen, wo immer möglich. Das verstehen wir und wir möchten Ihnen dabei helfen!
Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lösen können, die dazu führen können, dass mehr Benutzer gezählt werden als nötig, und wie Sie überflüssige oder ungenutzte Konten identifizieren und entfernen können, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnötigen Ausgaben führen können, z. B. wenn ein Personendokument anstelle eines Mail-Ins für geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren Lösungen. Und natürlich erklären wir Ihnen das neue Lizenzmodell.
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Überblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und überflüssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps für häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
This presentation provides valuable insights into effective cost-saving techniques on AWS. Learn how to optimize your AWS resources by rightsizing, increasing elasticity, picking the right storage class, and choosing the best pricing model. Additionally, discover essential governance mechanisms to ensure continuous cost efficiency. Whether you are new to AWS or an experienced user, this presentation provides clear and practical tips to help you reduce your cloud costs and get the most out of your budget.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
Introduction of Cybersecurity with OSS at Code Europe 2024Hiroshi SHIBATA
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on automated letter generation for Bonterra Impact Management using Google Workspace or Microsoft 365.
Interested in deploying letter generation automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Big Data Analytics: Concepts, Technologies, and Operations
1. Sameer Verma, Ph.D.
Big Data Analytics
Concepts, technologies, and operations
Sameer Verma, Ph.D.
Professor and Chair, Information Systems
Lam Family College of Business
San Francisco State University
San Francisco, CA 94132 USA
https://faculty.sfsu.edu/~sverma
sverma@sfsu.edu
3. University of the West Indies
Institutional Academic Partner
Centre of Excellence
Mona School of Business & Mgmt
University of the West Indies
Jamaica
5. Big
● Volume
  – Size of dataset
    ● Petabytes (10^15), Exabytes (10^18), Zettabytes (10^21).
● Variety
  – Complex
    ● Structured and unstructured text, audio, video, etc.
● Velocity
  – Near-real-time input, processing, and output.
● Veracity
  – Questionable quality of input, false discovery rates...
6. Sample v Population
● Sampling leads to inferences.
● We sample randomly, or in stratified modes, to work at a smaller scale.
● Extrapolate results to the population.
  – p-value is of utmost importance!
● What if we could crunch the entire population?
  – No need to sample?
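As a sketch of that contrast (hypothetical data, Python standard library only), sampling estimates what crunching the full population would measure directly:

```python
import random
import statistics

random.seed(42)  # reproducible for illustration

# Hypothetical "population": ages of every customer on record.
population = [random.randint(18, 90) for _ in range(100_000)]

# A simple random sample at a much smaller scale...
sample = random.sample(population, k=500)

# ...whose mean we extrapolate to the whole population.
print(statistics.mean(sample))       # the inference
print(statistics.mean(population))   # what crunching the entire population gives
```

With big data tooling, the second line stops being hypothetical: the whole population is computed, and no extrapolation (or p-value) is needed.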
8. Normalization
● A process of restructuring a relational database.
● A series of "normal forms" in order to reduce data redundancy and improve data integrity.
● First proposed by Edgar F. Codd as an integral part of his relational model.
9. A Bookstore Example
● Suggested fields for the bookstore:
  – Title
  – Author
  – Author Biography
  – ISBN
  – Price
  – Subject
  – Number of Pages
  – Publisher
  – Publisher Address
  – Description
  – Review
  – Reviewer Name
11. Normalizing once: 1NF
Reduce redundancy across columns. Make values in each column of a table atomic, i.e. no longer divisible.
• Author
• Bio
• Subject
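A minimal Python sketch of the 1NF idea (the record and field names here are hypothetical, not from the deck): a multi-valued Subject field is split so each row holds a single, indivisible value.

```python
# Hypothetical denormalized bookstore record: "subject" holds two values,
# so it is not atomic and violates 1NF.
book = {"isbn": "1-59059-332-4",
        "title": "Beginning MySQL Database Design and Optimization",
        "subject": "MySQL, Database Design"}

# Normalize: one row per subject value, keyed by the book's ISBN.
rows_1nf = [{"isbn": book["isbn"], "subject": s.strip()}
            for s in book["subject"].split(",")]

print(rows_1nf)
# Each row now carries exactly one subject value.
```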
14. NoSQL
● Databases built around a single table.
● No SQL-like relationships.
● Clickstream data
  – Twitter, Facebook, etc.
● Serialization: the reverse of normalization.
15. JavaScript Object Notation
● JSON, or JavaScript Object Notation:

{
  "Table1": [
    {"id": 0, "title": "Beginning MySQL Database Design and Optimization"},
    {"id": 1, "firstname": "Jon"},
    {"id": 2, "lastname": "Stephens"}
  ]
}

The same data as a table:

Title                                              First Name   Last Name
Beginning MySQL Database Design and Optimization   Jon          Stephens
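A quick round trip through Python's standard json module shows how the text form above maps to native objects and back (the appended "year" record is a hypothetical extra, not from the slide):

```python
import json

# The slide's JSON document as a text string.
doc = '''{
  "Table1": [
    {"id": 0, "title": "Beginning MySQL Database Design and Optimization"},
    {"id": 1, "firstname": "Jon"},
    {"id": 2, "lastname": "Stephens"}
  ]
}'''

data = json.loads(doc)                        # parse text -> dicts and lists
print(data["Table1"][1]["firstname"])         # -> Jon

data["Table1"].append({"id": 3, "year": 2004})  # hypothetical extra record
text = json.dumps(data)                       # serialize back to one string
```

Note that each object in "Table1" can carry different keys, which is exactly the schema flexibility the NoSQL slides rely on.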
16. JSON and BSON
● JSON is for text-like data.
● BSON is Binary JSON.
  – Serialize anything as binary!
  – Store music or video as BSON.
● More detail: https://en.wikipedia.org/wiki/NoSQL
17. Analytics
● Descriptive statistics
  – Frequency count, mean, variance, etc.
● Not inferring from sample stats.
● Usually applied to the population.
● Four stages:
  – Measure, Collect, Analyze, Report.
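A small sketch with the standard statistics module (the latency numbers are hypothetical). Note the use of pvariance, the population variance, which matches the slide's point that these are measures of the whole population rather than sample-based inferences:

```python
import statistics
from collections import Counter

# Hypothetical page-load times (ms) for the *entire* population of requests.
latencies = [120, 95, 120, 250, 95, 120, 400, 95]

print(Counter(latencies))                # frequency count
print(statistics.mean(latencies))        # mean
print(statistics.pvariance(latencies))   # population (not sample) variance
```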
18. Descriptive vs Inferential
● Inferential: as sampled and extrapolated. See Cook & Campbell (1979).
  – Statistical validity: validity of the correlation.
  – Internal validity: the correlation reflects a causal relationship.
  – Construct validity: higher-order constructs (independent, dependent variables).
  – External validity: generalization across variations.
● Descriptive: applies to the entire population, as measured.
19. Near-real time
● Input is usually near-real time.
  – Automated processes.
  – System and user logs.
● Processing has to be near-real time.
  – Mapped and distributed.
● Output is expected to be near-real time.
  – Trends, associations.
20. SQL vs NoSQL
● SQL
  – Large structured data broken into smaller atomic pieces, connected by relationships.
  – Relationships are integral to the DBMS.
  – Multiple tables and keys (primary, foreign).
● NoSQL
  – Semi-structured and unstructured data, collapsed into strings.
  – Relationships have to be handled outside the DBMS.
  – Single table, columnar. Usually indexed.
21. Column DB
● A columnar database is a table with one column (and one more for indexing).
● Collapse (serialize) multiple "fields" into one string.

Title                                              First Name   Last Name
Beginning MySQL Database Design and Optimization   Jon          Stephens

becomes

{"Table1": [{"id": 0, "title": "Beginning MySQL Database Design and Optimization"}, {"id": 1, "firstname": "Jon"}, {"id": 2, "lastname": "Stephens"}]}
24. MapReduce
● MapReduce
  – Maps data into smaller components.
  – Reduces or distills the output from each computational node.
● Runs in unison and continuously.
● Distributes the load across multiple cloud machines.
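The map and reduce steps above can be sketched in a few lines of Python with a word count, the classic MapReduce example (the shards and helper names here are hypothetical; a real cluster runs each map on a separate machine):

```python
from functools import reduce
from collections import Counter

# Hypothetical data shards, one per "node" in the cluster.
shards = [["big data big"], ["data cloud"], ["big cloud cloud"]]

def map_shard(lines):
    # Map step: each node turns its shard into a partial word count.
    return Counter(w for line in lines for w in line.split())

partials = [map_shard(s) for s in shards]      # runs in parallel on each node
total = reduce(lambda a, b: a + b, partials)   # reduce step: merge partials

print(total["big"])    # -> 3
```

The key property is that each map runs independently on its own slice of the data, so the load distributes across machines and only the small partial counts travel to the reducer.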
26. Cloud Computing
● Moore's law
  – Cost and size being constant, computing crunch doubles every 18 to 24 months.
● Metcalfe's law
  – Utility of a network is proportional to the square of the number of connected computers.
● Both observations describe rapidly compounding growth (exponential and quadratic, respectively).
● Cloud computing is the confluence of both.
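A back-of-the-envelope sketch of the two growth laws (illustrative numbers only, using an 18-month doubling period for Moore's law):

```python
# Moore's law: compute doubles every ~18 months.
# Factor gained over 6 years at that rate:
years = 6
moore_factor = 2 ** (years / 1.5)       # years / 1.5-year doubling period
print(moore_factor)                     # -> 16.0

# Metcalfe's law: network utility ~ n^2.
# Growing a network from 1,000 to 10,000 nodes (10x) multiplies
# its utility by 100x, not 10x.
utility_ratio = 10_000**2 / 1_000**2
print(utility_ratio)                    # -> 100.0
```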
27. Cloud Computing
● Infrastructure as a Service (IaaS).
  – Amazon, Azure, Google, OpenStack...
● Utility-oriented.
● Pay-as-you-go.
● Challenges: provisioning and scaling of a given architecture.
28. Orchestration
● Orchestration
  – Streamlined provisioning and scaling.
  – Distilled ops.
  – Abstracted away from cloud vendors.
● API
● Provision on any cloud platform.
  – AWS, Azure, Google, OpenStack...
29. Ubuntu Juju
● Juju
  – Canonical: makers of Ubuntu.
  – Open source.
  – Application and service modeling tool.
  – Deploy, manage, and scale on any cloud.
  – Charms: https://charmhub.io
31. Hadoop HBase via Juju
● Hadoop HBase "charm"
  – Fourteen-unit big data cluster.
  – A distributed big data store with MapReduce.
  – Runs on 8 machines in your cloud.
34. Containers vs VM
● A virtual machine includes a kernel.
● Containers logically replicate all that is the same across installs.
  – Share the kernel.
  – Account, resource, and file system isolation.
● BSD jails, chroot, Docker, LXC.
35. LXC as local cloud
● LXC can run on a laptop.
● LXD to manage LXC containers.
  – https://charmhub.io/lxd
  – juju deploy lxd
36. Kubernetes
● Container orchestration system (via Google).
● Containers can be a mix & match of VMs, Docker, etc.
● https://en.wikipedia.org/wiki/Kubernetes
37. Micro Kubernetes
● A micro installation of Kubernetes.
  – MicroK8s (aka "microkates")
  – https://microk8s.io/
● Run on your dev machine.
  – snap install microk8s
● Run on a Raspberry Pi.
● "Edge" device.
38. Conclusion
● Population data (Volume)
● Unstructured data (Variety)
● Near-real time (Velocity)
● Descriptive stats (Veracity)
● Cloud Computing = crunch + network