Google was processing 400 petabytes of data every month as far back as 2007! With users generating massive amounts of data on social networking sites like Facebook and Twitter, and with the growing use of sensor devices, the amount of data generated is only going to go up. Further, with the cost of hard disks falling, such data becoming available to everyone, and the advent of cloud computing, we now have the power to process this data ourselves.
What are the challenges of processing such massive amounts of data? With such data available to every corporation, big or small, how does this change the way we perceive data? The talk takes you through some of the technologies used to tackle these challenges.
The talk has been tailored to suit students. It helps them relate to and appreciate the subjects they learn in their curriculum - data structures, programming languages, databases, operating systems, networking, and so on. At the same time, it describes some of the interesting work being done in the software industry in areas such as databases, data analysis, and cloud computing.
Gautham's talk will aim to demystify the world of coding for the non-techie, give practical tips on where and how to get started, and be both practical and motivational, so participants can move from “I want to learn to code” to “I will code” this New Year.
Carolyn Poe is the chair of the Computer Information Technology department at Lone Star College - Montgomery. She discusses several topics in her presentation including dropping classes, FERPA laws, using the syllabus as a contract, and various websites that may be helpful for students and instructors. She recommends engaging students with tools on the internet and provides many examples of educational and creative websites like YouTube, Google Translate, and Animoto.
The document lists various technology tools and resources for educators, including websites for learning tools, direct instruction, student products, presentations, mind mapping, educational portals, podcasts, wikis, blogs, digital storytelling, and video platforms. It emphasizes that effective teaching is not just about the tools used, but how they are applied to engage students. A disclaimer is included to remind readers that the most important aspect is not which tools are used, but how they enhance student learning.
Avoid 3 things and eat 3 foods to beat prostatitis - AmandaChou9
Men can beat prostatitis by doing three things and eating three foods. Drug treatment is also important, such as the natural medicine Diuretic and Anti-inflammatory Pill.
The document discusses a class on recent technologies for both Macintosh and personal computers. John Harrison will teach the class and cover many new technologies in an unbiased way, endorsing both Apple and IBM products. The goal is not to teach how to use specific applications but to expose students to a variety of technologies that can run on different computers.
Building your own Desktop Cloud Environment - Jnaapti
As developers, we have all seen these problems:
Our development environments accumulate lots of applications and libraries over a period of months.
We are usually in the habit of installing everything on one machine.
We fear that we may screw up our development environment, and that means unproductive man-hours.
We forget that a multi-machine deployment is different from a single-machine deployment.
How about virtualization on the desktop?
In this demo, I will take you through the steps to create a multi-VM development environment.
This demo makes use of QEMU, KVM, and Virt Manager. It shows you how to create a VM image, start servers with a handful of commands, deploy your app, test everything, and tear down the environment once you are happy - all in the cosy comfort of your laptop or desktop.
The Jnaapti development environment is based on this setup.
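A minimal sketch of that bring-up/tear-down cycle, expressed as the command lines a provisioning script might build. The qemu-img/virt-install/virsh invocations are standard QEMU/libvirt tooling, but the image paths, VM names, and options here are illustrative assumptions, not the exact Jnaapti setup:

```python
def build_commands(base_image, vm_names):
    """Return the command lines for one bring-up/tear-down cycle."""
    cmds = []
    for name in vm_names:
        # Copy-on-write overlay: each VM shares the base image and writes
        # to its own qcow2 file, so the base stays pristine.
        cmds.append(["qemu-img", "create", "-f", "qcow2",
                     "-b", base_image, "-F", "qcow2", f"{name}.qcow2"])
        # Boot a VM from the overlay (names and sizes are hypothetical).
        cmds.append(["virt-install", "--name", name, "--memory", "1024",
                     "--disk", f"{name}.qcow2", "--import", "--noautoconsole"])
    # Tear down once testing is done.
    for name in vm_names:
        cmds.append(["virsh", "destroy", name])
        cmds.append(["virsh", "undefine", name])
    return cmds
```

A script would pass each entry to `subprocess.run`; building the argument lists first keeps the lifecycle easy to inspect and test.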
Metadata in a Crowd: Shared Knowledge Production - Kevin Rundblad
The document discusses human computation and how crowdsourcing can be used to generate metadata. It describes different models of human computation, including socially motivated tasks like tagging photos on Flickr, economically motivated tasks on Amazon Mechanical Turk, and tacit tasks like reCAPTCHAs. The document also discusses how human computation draws on human abilities at visual and language tasks to solve problems in parallel, in a way similar to BitTorrent networks. It argues that successful systems motivate participation through incentives, games, or the ability to contribute to a collective knowledge base.
This document provides an overview of a machine learning course. It outlines the course structure, including topics covered, assignments, and grading. The course covers fundamental machine learning algorithms for classification, regression, clustering, and dimensionality reduction. It also discusses applications of machine learning like spam filtering, recommender systems, and chess playing computers.
Christian Heilmann is a hacker and geek who is passionate about sharing his passions. He is in Atlanta to help with a Hack Day at Georgia Tech. He discusses the process of hack days - focusing first on an idea, using available data sources and APIs, and creating functional interfaces. He provides examples of past hacks using Twitter and earthquake data. The goal is for participants to work in teams on new hack projects using available Yahoo and other resources over the 24 hour period.
Christian Heilmann gave a talk on hacking and innovation at a university hack challenge. He defines hacking as altering systems to do what you want using available resources, and sees it as a way to have fun and drive unrestrained innovation. He encourages attendees to find something annoying with current systems and build workarounds. To hack effectively, one needs access to data sources, the data itself, and ways to reach users. He provides examples of his own hacks that make systems more accessible or filter data for specific uses. The talk aims to show attendees their potential and get feedback on explanations of development resources.
Presentation by Haroon Meer and Roelof Temmingh at Black Hat USA in 2006.
This presentation is about Suru, the inline proxy tool developed by Roelof Temmingh. How it works and some of its features are discussed.
The document discusses what big data is, sources of big data, the 3 V's of big data (volume, velocity, and variety), and provides an example use case of how an e-commerce company could use big data and Hadoop to analyze customer purchase trends and offer targeted promotions. Specifically, big data refers to extremely large data sets in the order of petabytes that are growing rapidly. Common sources include social media, e-commerce, weather data, telecom data, and stock market data. Hadoop is then introduced as an open source framework for storing, processing, and analyzing large datasets in a distributed fashion across commodity hardware.
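The map/reduce pattern Hadoop distributes across a cluster can be sketched in plain Python for the e-commerce example above; the purchase records and field names below are hypothetical:

```python
from collections import defaultdict

# Hypothetical purchase records standing in for the e-commerce data set.
purchases = [
    {"customer": "c1", "category": "books"},
    {"customer": "c2", "category": "books"},
    {"customer": "c1", "category": "games"},
]

def map_phase(records):
    # Map step: emit a (key, 1) pair per record, keyed by category.
    for r in records:
        yield r["category"], 1

def reduce_phase(pairs):
    # Reduce step: sum the counts for each key.
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

trends = reduce_phase(map_phase(purchases))
```

Hadoop runs the same two phases, but shards the records across commodity machines and shuffles the intermediate pairs by key before reducing.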
Nature is the ultimate complex system. Nature 1.0 is seeds & soil. *Evolving.* Nature 2.0 adds silicon & steel. *Evolving.*
Presented to Complex Systems Group, Stanford University, on May 4, 2018.
The document discusses big data and provides examples of how it can be collected and analyzed. It describes a master's thesis that collected 74,000 Dutch news articles over 2 months to analyze rare content. It also describes a bachelor's thesis that automated the coding of tweets to determine the tone politicians used when referring to opponents. The document outlines the typical process of collecting, storing, and analyzing big data and describes the infrastructure used in the workshop to collect Twitter tweets, news articles, and web snapshots.
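The automated tone coding described above can be sketched with a simple keyword lexicon; the lexicon and example tweet are hypothetical, and a real study would use a validated dictionary or a trained classifier:

```python
# Hypothetical tone lexicons; real research uses validated dictionaries.
NEGATIVE = {"incompetent", "failure", "dishonest"}
POSITIVE = {"respect", "agree", "constructive"}

def tone(tweet):
    """Score a tweet by lexicon hits and map the score to a tone label."""
    words = set(tweet.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(tone("Their plan is a failure"))  # negative
```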
The document discusses big data sources, challenges, and analytics. It describes how big data is too large to be managed by traditional databases due to its volume, velocity, variety, and veracity. Big data comes from sources like web pages, social media, sensors, and financial transactions. Analyzing big data requires distributed computing across clusters of servers to store and process the data in parallel. Frameworks like MapReduce and Hadoop were developed to perform big data analytics across clusters and address challenges of node failures, network bottlenecks, and distributed programming.
Roddy Lindsay discusses how Facebook generates large amounts of user data daily and the challenges of analyzing this data at scale. Facebook initially used Oracle and Hadoop to analyze data but developed its own SQL-like query language called Hive to allow business analysts to access data. Hive distributed queries across large Hadoop clusters, enabling decentralized access. This allowed text analytics like sentiment analysis and associations mapping. Lindsay believes such analytics could help individuals understand their own happiness patterns from personal data.
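The kind of aggregate query Hive opened up to analysts looks roughly like the following. The snippet runs on SQLite purely for illustration (HiveQL syntax differs in details, and Hive executes the query as distributed MapReduce jobs rather than locally); the table and columns are hypothetical:

```python
import sqlite3

# In-memory stand-in for a warehouse table of per-post sentiment scores.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (user_id TEXT, sentiment REAL)")
conn.executemany("INSERT INTO posts VALUES (?, ?)",
                 [("u1", 1.0), ("u1", 0.0), ("u2", -1.0)])

# The analyst-facing query: average sentiment per user.
rows = conn.execute(
    "SELECT user_id, AVG(sentiment) FROM posts "
    "GROUP BY user_id ORDER BY user_id"
).fetchall()
# [('u1', 0.5), ('u2', -1.0)]
```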
Internal training presentation about how I go about advocating Yahoo to the outside world and what gets me pretty excited about our developer offers at the moment.
William Jones and others presented on bringing information together across devices and applications. They proposed modeling information structure using itemMirror objects that could be accessed by different applications. This would allow information to remain where it is while being used across platforms. A spring project was proposed for students to build HTML5 apps that work with the same information through itemMirror objects. The goal is to separate information from applications and stores to avoid lock-in and allow mixing and matching of tools.
The document discusses the vision of the Semantic Web and how it allows data to be shared and reused across applications. It outlines some of the key components of the Semantic Web like ontology, RDF, and URIs. It also discusses some common misconceptions about the Semantic Web, including that it is not about building AI applications or that it requires large ontologies. The Semantic Web is envisioned to seamlessly integrate with the existing Web to allow easier sharing and integration of data.
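RDF's core idea, data as subject-predicate-object triples keyed by URIs, can be sketched without any library; the URIs below are hypothetical:

```python
# A tiny triple store: each fact is (subject, predicate, object),
# with URIs serving as global identifiers for things and relations.
triples = {
    ("http://ex.org/alice", "http://xmlns.com/foaf/0.1/knows", "http://ex.org/bob"),
    ("http://ex.org/bob",   "http://xmlns.com/foaf/0.1/name",  "Bob"),
}

def objects(subject, predicate):
    """All objects matching a (subject, predicate, ?) pattern."""
    return {o for s, p, o in triples if s == subject and p == predicate}

print(objects("http://ex.org/alice", "http://xmlns.com/foaf/0.1/knows"))
```

Because the identifiers are URIs rather than local column names, triples from different applications can be merged into one graph, which is what makes the data shareable and reusable.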
Nicholas Schiller presented on using APIs to customize library services. He demonstrated how to build a web application using the WorldCat Search API that automatically adds Boolean search terms to a user's query and formats the results. The application was built with PHP for server-side scripting, HTML5 for interface design, and jQuery Mobile to optimize for different devices. The presentation provided examples of APIs, guidelines for API projects, and resources for further learning about APIs and programming.
The document discusses the semantic web and its potential uses for liberal arts campuses. It provides an overview of semantic web technologies like RDF, OWL, and SPARQL. Examples are given of how semantic web tools could be used for campus projects, pedagogy, and research by exposing metadata and linking data. Challenges mentioned include complexity, lack of visible applications, and the ecological growth needed for widespread adoption.
The document discusses the evolution of the World Wide Web from static Web 1.0 to participatory Web 2.0 and the emerging Semantic Web or Web 3.0. Web 3.0 aims to add more context and meaning to online data through techniques like tagging, mapping, and natural language processing in order to better interconnect information and help computers assist users. Key aspects of the Semantic Web include using identifiers for things, representing relationships between things using languages like RDF and OWL, and using reasoners and queries to infer new conclusions and answers from semantically linked data.
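The reasoner idea, inferring new conclusions from semantically linked data, can be sketched as forward chaining over triples; the rule (that "locatedIn" is transitive) and the facts are hypothetical:

```python
# Asserted facts; the reasoner will derive what follows from them.
facts = {("Paris", "locatedIn", "France"), ("France", "locatedIn", "Europe")}

def infer_transitive(facts, pred):
    """Forward-chain a transitive predicate to a fixed point."""
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        for a, p1, b in list(inferred):
            for c, p2, d in list(inferred):
                if p1 == p2 == pred and b == c and (a, pred, d) not in inferred:
                    inferred.add((a, pred, d))
                    changed = True
    return inferred

result = infer_transitive(facts, "locatedIn")
# ("Paris", "locatedIn", "Europe") is now present, though never asserted.
```

OWL reasoners generalize this: transitivity, subclass, and inverse rules declared in the ontology drive the same kind of fixed-point derivation.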
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0! - SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
GraphRAG for Life Science to increase LLM accuracy - Tomaz Bratanic
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers.
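The retrieval step of GraphRAG can be sketched as pulling an entity's neighborhood out of a knowledge graph to build LLM context; the graph content and relation names below are hypothetical:

```python
# Toy biomedical knowledge graph: entity -> [(relation, object), ...].
graph = {
    "aspirin": [("inhibits", "COX-1"), ("treats", "pain")],
    "COX-1":   [("produces", "prostaglandins")],
}

def retrieve_context(entity, depth=1):
    """Collect (subject relation object) facts up to `depth` hops away."""
    facts, frontier = [], [entity]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for rel, obj in graph.get(node, []):
                facts.append(f"{node} {rel} {obj}")
                next_frontier.append(obj)
        frontier = next_frontier
    return "\n".join(facts)

# The retrieved facts would be prepended to the LLM prompt as grounding.
prompt_context = retrieve_context("aspirin", depth=2)
```

Grounding the prompt in retrieved graph facts is what lets the LLM answer from curated relationships instead of guessing.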
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor... - Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
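How enriching plain text with markup might look in code, with a simple glossary rule standing in for the model's tagging decisions; the element and attribute names are hypothetical:

```python
import xml.etree.ElementTree as ET

def enrich(text, glossary):
    """Wrap glossary terms in <term> elements inside a <para>."""
    para = ET.Element("para")
    para.text = ""
    last = para  # the node whose trailing text we are currently extending
    for word in text.split():
        if word in glossary:
            term = ET.SubElement(para, "term", ref=glossary[word])
            term.text = word
            term.tail = " "
            last = term
        elif last is para:
            para.text += word + " "
        else:
            last.tail += word + " "
    return ET.tostring(para, encoding="unicode").strip()

marked_up = enrich("XSLT transforms XML documents",
                   {"XSLT": "g-xslt", "XML": "g-xml"})
```

An AI-based pipeline would replace the glossary lookup with the model's span decisions, then validate the result against a schema before accepting it.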
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
TrustArc Webinar - 2024 Global Privacy Survey - TrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Threats to mobile devices are more prevalent than ever and are increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features provide convenience and capability at the expense of security. This best practices guide outlines steps users can take to better protect their personal devices and information.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
UiPath Test Automation using UiPath Test Suite series, part 6 - DianaGray10
Welcome to part 6 of the UiPath Test Automation using UiPath Test Suite series. In this session, we will cover test automation with generative AI and OpenAI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
2. A Few Guidelines
Ask questions – be active
What I cover depends on how active you are
Learn concepts before technology
You will be bombarded with several concepts, tools and technologies – just remember that you are learning to bridge concepts and technology.
After this program, you should be comfortable dabbling with these concepts on your own – even reading things that are not covered today.
http://jnaapti.com/
4. The Different Vases :(
[Image of vases labelled “Not preferable”, “Ideal!”, “Sufficient”]
Source: http://www.flickr.com/photos/bachmont/1382572541/
5. Quick Poll
How many of you are from a CS background?
Knowledge of:
Data Structures
Algorithms
Databases
Have heard of:
NoSQL
Key-Value Stores
Cloud Computing
MapReduce
Hadoop
7. What is this talk about?
Two themes in this talk:
About data – how is it stored, how do we work with it
About understanding technology via concepts learnt
8. How much data are we talking about really?
200 million Tweets per day – as of Jun 2011
Wikipedia dump
current revisions only – 31GB uncompressed
entire history runs into multiple TBs uncompressed
Common Crawl data – 10s of TBs
Tumblr – adding 3TB of new data every day
Google processes 25PB of data per day
Facebook – 135+ billion messages a month
Facebook – 130TB of logs generated per day
Vestas – wind data – 18 to 24 petabytes of data to be processed
9. We are dealing with a lot more data...
Increase in the number of sensor devices
A larger audience of users using our applications via the web and social networks results in increased data generation
The cost of storage is falling – so we never discard any of the data
10. What's in it for me?
Scrabulous case study (Source: Wikipedia)
Built by two young chaps from Kolkata
Both were in their early 20s when they built it
One was still in college
500,000 users daily – back in 2008
$25,000 in ad revenues per month
These days, lots of apps are being built by college undergraduates
If they can do it, you can do it too!
11. You have all it takes
You have access to a lot of the tools that big corporations use, for free
You have computing power available cheaply
You have access to a lot of the data for free
12. What do I need then?
All you need is a little intelligence and a lot of perseverance and you are on your way!
13. Questions to ask
Ok, you have the resources
You build a cool web application
It is an overnight hit – can you handle it?
What happens if the server has a disk crash?
Can we prevent website outages on account of hardware failures? (Image: the Slashdot effect)
14. Looking for answers
What do technology companies like Google/Facebook/Twitter use to manage data? What challenges do they face in managing such huge volumes of data? How do they analyze such data?
Image Source: http://opencompute.org/
15. From concept to technology
We learn quite a few subjects in Computer Science – data structures, algorithms, databases, networking, operating systems, graph theory, etc.
Are we ever going to use this/need this as engineers?
How do I use my knowledge of CS to understand the latest developments in the industry?
Image Source: http://www.flickr.com/photos/nics_events/2223583947/
16. From concept to technology
This talk is about connecting concepts to real world examples
Image Source: http://www.flickr.com/photos/nics_events/2223583947/
17. A few snappy examples
Analysis of question papers from various companies
Analysis of image patterns in your photos and movie
collections
Analysis of your Facebook friends
2nd degree connections
Who is active at what time?
Who talks about what?
19. What is this section all about?
Before dealing with big-data problems, we first need to know how data is handled.
This section tries to answer questions like:
How is it that 0's and 1's are sufficient to do anything that a computer does?
Why do we need data structures?
Why do we need databases – why can't I just store all data as flat files?
20. Computers – A Bit Processor
Computers only understand bits
They have a way to store and process these bits
It is up to users to give the bits a “meaning”
[Slide shows a grid of 0s and 1s]
21. Data Structures
A data structure is like a cast
Pour your bits into it and a 'shape' is created
The 'shape' helps us provide a meaning to the bits
Image Source: http://www.flickr.com/photos/andrein/3020194734/
22. Programming Languages
The human mind does not understand bits. We need higher-level constructs to process bits. This is where programming languages come in. They act as a bridge between what humans want to do and what machines understand.
Image Source: http://www.flickr.com/photos/jurvetson/5872448596/
23. Programming Languages
Variables
Types
Operators
Conditionals
Looping
Libraries
Examples (Python):
a = 10, b = 20
c = a + b
if condition:
    do_this()
for i in range(10):
    do_this()
urllib.urlopen('http://yahoo.com/').read()
[str.lower() for str in list_of_strings]
24. Primitive Types
Languages usually have two primitive types
Numbers – Integers, Floats, Doubles etc
Strings – a sequence of characters put together
Why these two types? Why not just strings?
Examples: 'bangalore', 123, 567.89, 0, -123, -567.89, '123'
25. Composite Types (or Collections)
The world is complex
We cannot model everything with only strings and numbers
We need ways to put primitive values together to form more complex types
Collections are a bag of values put together
Bottom up v/s Top down
Examples:
Name → First Name + Last Name
Phone No → (Country Code) Area Code + Subscriber Number
Address → Door No + Street + City + State + Pin Code
Composite of composites: Person → Name + Phone No + Address
Group of People
26. Collections – General Object Containers
We can represent anything in the world using collections
Collections can be mapped to bits
Computers can interpret those bits
As a matter of fact, this is what JSON allows you to do
28. Collections – Lists
Grocery shopping example
Order of items matters
Do items need to be of the same type?
The key identifier is the position of the item in the list
Operations on a list:
add an item to the list
remove an item from the list
get an item from the list at a specific position
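The list operations above can be sketched in Python (the grocery items are illustrative, not from the slides):

```python
# A grocery list: order matters, items are identified by position
groceries = []
groceries.append("toothpaste")   # add an item to the list
groceries.append("matchbox")
groceries.append("tomatoes")
first = groceries[0]             # get the item at a specific position
groceries.remove("matchbox")     # remove an item from the list
print(first, groceries)          # toothpaste ['toothpaste', 'tomatoes']
```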
29. Collections – Sets
Items in a set are unique
There is no definite order
Operations on a set:
Add items to the set
Test if an item exists in the set
Remove an item from the set
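The same three set operations in Python (the example values are made up for illustration):

```python
# Items in a set are unique; there is no definite order
seen = set()
seen.add("alice")          # add items to the set
seen.add("bob")
seen.add("alice")          # a duplicate: the set still holds one "alice"
print("alice" in seen)     # test if an item exists: True
seen.remove("bob")         # remove an item from the set
print(len(seen))           # 1
```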
30. Collections - Maps
Lots of maps in the real world
Indices are not always integers in the real world
We may want to identify properties of an item using some name
Examples:
Toothpaste – 1, Rs. 54
Matchbox – 10, Rs. 15
Tomatoes – 1kg, Rs. 10
Chips – 1, Rs. 15
Dictionary of word definitions
Phone book containing phone numbers
31. Collections – Maps contd...
Maps allow us to associate a key with a value
The name that is used to identify the set of properties is called the key
The properties identified are called the value
Examples:
Grocery list as a map: the item is the key, its properties are the values
Dictionary as a map: keys are the words, values are the definitions
Phone book as a map: keys are the names, values are the phone numbers
32. Collections – Maps contd...
Keys don't have a definite order
Operations on a map:
Put a key, value pair
Get a value for a key
Get all the keys, and look at them one by one
Important: the analogy breaks here – don't get confused by the way a map works – keys don't have an order
You look up keys, not values – you don't say “get me the word whose definition is ...”
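The phone-book example as a Python dict (the names and numbers are invented for illustration):

```python
# A phone book as a map: names are keys, numbers are values
phone_book = {}
phone_book["alice"] = "080-1234"   # put a key, value pair
phone_book["bob"] = "080-5678"
number = phone_book["alice"]       # get the value for a key
for name in phone_book:            # get all the keys, one by one
    print(name, phone_book[name])
# Note: lookup goes by key, never by value
```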
33. More composite types
List of lists
List of maps
Map of maps
...
Examples: a list of people is a list of maps; mailboxes containing mails form a map of maps
35. Hashtables
Run the key through a magic function that gives you a number
The number is a unique slot into an array
The magic function is called a “hash function” – it is chosen such that there are minimal collisions and the most uniform distribution
Image Source: Wikipedia
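A toy version of the idea, assuming a deliberately weak hash function (sum of character codes) just to make the slot computation visible – Python's built-in dict does all of this properly, and this sketch ignores collision handling:

```python
# A toy hash table: the "magic function" maps a key to an array slot
NUM_SLOTS = 8
slots = [None] * NUM_SLOTS

def slot_for(key):
    # sum of character codes modulo the array size - a (weak) hash function
    return sum(ord(c) for c in key) % NUM_SLOTS

def put(key, value):
    slots[slot_for(key)] = (key, value)   # ignores collisions for simplicity

def get(key):
    entry = slots[slot_for(key)]
    return entry[1] if entry and entry[0] == key else None

put("toothpaste", 54)
print(get("toothpaste"))  # 54
```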
36. Gmail – An Example
What data structures do we use here?
Mail
Mailbox
Person
Label
A mailbox has a list of mails
A mail can be represented using a map
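Sketching that structure in Python – a mailbox as a list of mails, each mail as a map (the field names and addresses are illustrative, not Gmail's actual schema):

```python
# A mailbox is a list of mails; each mail is a map of properties
mailbox = [
    {"from": "alice@example.com", "subject": "Hi", "labels": ["friends"]},
    {"from": "bob@example.com", "subject": "Report", "labels": ["work"]},
]
# Find all mails carrying the "work" label
work = [mail for mail in mailbox if "work" in mail["labels"]]
print(len(work))  # 1
```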
37. Gmail – An Example
What is the mailbox size? How much RAM does a system have?
If all the data of the world could fit into the RAM of a single machine, we wouldn't have a lot of the problems we face
Luckily, that's not the case!
Properties of RAM:
Limited in capacity
Volatile (data disappears on reboot)
Max data in memory is 256GB
Conclusion: We need the disk
38. Hmm... Our First “Big” Data Problem
Let us say the data is present as a huge 7 GB file on the disk.
How long does it take to read this file into memory?
How do I measure disk speeds?
40. Disk Read Speed
We can get disk read speeds close to 80MB/s
Let's round it off to 100MB/s
Reading 7000MB would take 70 seconds
Would you wait if Gmail took 70 seconds to fetch your mails?
Remember, parallel read accesses and writes slow it down further.
Hmm, ok, this doesn't work, we need something faster, solution?
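The back-of-the-envelope arithmetic from the slide, written out:

```python
# Time to read a file sequentially from disk at the rounded-off speed
file_size_mb = 7000        # a 7 GB file
read_speed_mb_s = 100      # ~100MB/s sequential read speed
seconds = file_size_mb / read_speed_mb_s
print(seconds, "seconds")  # 70.0 seconds
```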
41. How do we solve this?
Imagine a world where there are no databases – you have a hard-disk and you are asked to solve this problem.
We need to be able to read only the data we want, as quickly as we can.
How do we solve this?
42. Solution
Store data in fixed-sized records and then have a way to jump to the starting location of a specific record
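A minimal sketch of this idea: with fixed-size records, record i always starts at offset i * RECORD_SIZE, so we can seek straight to it instead of scanning the file (the record size and contents are made up for illustration):

```python
import os
import tempfile

RECORD_SIZE = 32  # every record is padded to exactly 32 bytes

def write_records(path, records):
    with open(path, "wb") as f:
        for r in records:
            f.write(r.encode().ljust(RECORD_SIZE, b" "))

def read_record(path, i):
    with open(path, "rb") as f:
        f.seek(i * RECORD_SIZE)          # jump, don't scan
        return f.read(RECORD_SIZE).decode().rstrip()

path = os.path.join(tempfile.gettempdir(), "records.dat")
write_records(path, ["alice", "bob", "carol"])
print(read_record(path, 2))  # carol
```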
44. A word about Abstraction
Reading from a disk:
Instruct the hardware to move the read head to a specific location, now read the data
Reading from a file:
Open the file, read it, close it
Reading from a database:
Connect to the DB, query for data, close the connection
One of the skills you can pick up as an engineer is being able to define an operation at every level of abstraction
45. Relational Database Design
Define entities and their relationships
Handle 1..1, 1..n and m..n relationships
Perform normalization
Take the entities and their relationships and come up with tables, fields, primary keys and foreign keys
Define queries to add, update, fetch and delete data
46. Mapping Design to Implementation
Data is stored in tables (which map to entities)
Tables contain records (rows) and fields (columns)
Records are of fixed length
Records are stored sequentially
47. Relational Databases – Storage Structure
Use hash-tables to point to records in the tables – so individual records can be retrieved without having to search the entire dataset.
This process is called “indexing”.
In theory you can have many such indexes.
Foreign keys are also indexed to speed up lookups.
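The essence of an index can be sketched as a map from key to record position, so a lookup skips the full scan (the records here are invented for illustration):

```python
# An index is a map from key to record position
records = [("alice", "Bangalore"), ("bob", "Pune"), ("carol", "Delhi")]
index = {name: pos for pos, (name, _) in enumerate(records)}  # hash-table index

pos = index["carol"]     # O(1) lookup instead of scanning every record
print(records[pos][1])   # Delhi
```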
53. Problem 1 – Too Many Requests
What if a thousand users access my server at the same time?
If the server can handle 200 such requests in parallel in one second, what if I have 400 requests per second?
1st second → 200 requests
2nd second → 600 requests (200 are from the previous second)
Results in server thrashing
Solution: Load Balanced Setup
55. Load Balancing
Load balancing is a way of parallelizing processing across multiple machines
The load balancer acts as a proxy that streams requests and responses between the client and the processing server.
Eg: HAProxy
Stateful and Stateless Architectures
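A round-robin balancer (one common strategy, though the slides don't name a specific one) can be sketched in a few lines – the server names are made up:

```python
# Round-robin load balancing: spread requests across servers in turn
import itertools

servers = ["app1", "app2", "app3"]
next_server = itertools.cycle(servers)

def route(request):
    return next(next_server)   # each request goes to the next server in turn

routed = [route(r) for r in range(5)]
print(routed)  # ['app1', 'app2', 'app3', 'app1', 'app2']
```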
56. Problem 2 – Even More Requests
What if the Load Balancer itself becomes the bottleneck?
Solution:
Round Robin DNS
Building multiple independent clusters
59. Problem 3 – The Stateful Database
A single database cannot handle all requests from all users.
Unlike front-end servers, databases are not “stateless”
If we are only reading information, it's fine; but if we are writing information, this is a problem.
60. Scale Up v/s Scale Out
Scale up means adding resources (CPUs or memory) to a single system in order to increase its processing capability
Scale up has limits on how far we can scale – but is easier to do
Scale out means adding more nodes to a system
Scale out provides linear scalability and is less expensive, but is complex compared to scale-up
62. Scale Up Solution to the DB Problem
Increase the system's capacity by adding more resources – faster disks, more RAM, faster processors, more cores etc
Introduce on-the-fly compression of data in the database
Scale up is not scalable enough
65. Scale Out Solutions to the DB Problem
Until the virtualization revolution, and until we reached the limits of hardware, we were looking at scale up solutions rather than scale out solutions
Partition your data and put it on multiple systems – a subset of the rows in each system
This is called Sharding
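Hash-based sharding (one common partitioning scheme, used here as an illustrative assumption) picks a shard from the key:

```python
# Hash-based sharding: each row lands on one of N database shards
NUM_SHARDS = 4

def shard_for(user_id):
    # The same key always maps to the same shard within a running process
    return hash(user_id) % NUM_SHARDS

# Re-sharding is hard precisely because changing NUM_SHARDS
# moves almost every key to a different shard.
print(shard_for("alice") == shard_for("alice"))  # True
```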
66. Issues with Sharding
No clear way of partitioning the data
Maintaining ACID (Atomicity, Consistency, Isolation, Durability) properties is complex
Joining data across machines is complex
Re-sharding is complex
67. Other Issues with Relational Databases
Data could be unstructured/semi-structured
Impedance mismatch (ORM issues)
Sparse values are not handled well – this wastes storage (although some engines handle this today)
Changes in schema are difficult
Not all data require ACID/transactional support
Normalization results in more queries, and that means more disk accesses – some apps can do without it
68. The NoSQL Revolution
The NoSQL revolution happened to solve the many issues faced with storing web-scale data in relational databases
NoSQL stores, as the name suggests, don't use SQL to store and retrieve data
Widely adopted in web applications these days, with several solutions available
Still an area of active research – no clear winner, and therefore it is difficult to choose among alternatives
69. Advantages of NoSQL Stores
They don't require fixed schemas
They avoid joins
Sharding (scale out) is easier – some even do it automatically
Many of the implementations replicate the data and thus avoid SPOFs (Single Points of Failure)
74. Examples of Web Scale Data Analysis
Distributed Grep - Look for a pattern in all the Tweets
Inverted Index Building - This is what is used by search
engines
Sentiment Analysis
Competition Analysis
Log Analysis
75. Understanding the problem of Analysis
Unlike in the case of retrieving data, in the case of analysis we need to read through everything – but disk reads are slow.
Let's do some simple math:
One hard disk's read speed is 100MB/s
100 hard disks read in parallel give 10GB/s!
Can we exploit this parallelism?
76. The Coin Counting Example
You have a sack full of coins, and you are asked to separate them into 1, 2, 5 and 10 Rs coins and tell how many of each are present.
Now, let's say you have a few sacks full of coins and it will take you a lot of time to count them yourself – so you call a few other people to help you out.
Now, let's say there are a few rooms full of coins (like in some large temples in India) – how will you count them?
77. Coin Counting Problem – in depth
You can't add more people to the same room – the room is already full.
You can get a few more rooms, ask people to take some coins to the other rooms, do the counting there, and come back with the coins and the final count.
This will mean a lot of “traffic” in the corridor.
So what's a better solution?
78. A Possible Solution to the Coin Counting Problem
Unload the coins in different rooms rather than in the same room.
Then get workers into the different rooms. With an increase in coins, increase the number of rooms and workers.
Let the workers in each room work independently.
This is how Map/Reduce frameworks work
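The coin-counting scheme above maps directly onto map and reduce steps – each "room" counts its own sack, then the partial counts are merged (the sack contents are made up for illustration):

```python
# Coin counting as map/reduce: count locally per room, then merge
from collections import Counter
from functools import reduce

sacks = [[1, 2, 5, 1, 10], [5, 5, 2, 1], [10, 10, 1]]  # coins per room

partials = [Counter(sack) for sack in sacks]        # "map": each room counts
totals = reduce(lambda a, b: a + b, partials)       # "reduce": merge counts
print(totals[1], totals[5], totals[10])             # 4 3 3
```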
79. Traditional Parallel Processing
Use of threads, sharing data, synchronization
Results in Deadlocks, Livelocks, Starvation etc
Handling failures is complex
Parallel Programming is hard this way.
80. Requirements from a parallel processing framework
Higher level programming constructs – we don't need to deal with sockets, threading, locking, sharing data etc
Manage failures – if a task fails or a system breaks down, we want the framework to transparently manage it
Recoverability – if a system fails, another system must be able to pick up its workload
Replication – if a system fails, we don't lose data – the framework should replicate data on multiple nodes
Scalability – adding more compute nodes should help us increase the compute capacity
81. Pulling Data or Pushing Computation?
Pulling data for computation results in a bottleneck
Every “database store” also has a “processor”.
Instead of pulling the data for computation, can we think about pushing the computation out to where the data resides?
Computation is in "bytes" – maybe a few MB of object code – which is still trivial compared to the data it works on
82. MapReduce
Concept introduced by Google in 2004
The framework is inspired by the map and reduce functions found in functional programming languages
Hadoop is an open-source implementation of MapReduce
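The classic MapReduce example is word count; here is a single-process sketch of the two phases (a real framework like Hadoop runs many map and reduce tasks across nodes, and the grouping-by-key is done by the framework):

```python
# Word count as map and reduce phases, sketched in plain Python
from collections import defaultdict

def map_phase(line):
    return [(word, 1) for word in line.split()]   # emit (key, value) pairs

def reduce_phase(pairs):
    counts = defaultdict(int)
    for word, n in pairs:        # the framework groups pairs by key for us
        counts[word] += n
    return dict(counts)

lines = ["the quick brown fox", "the lazy dog"]
pairs = [p for line in lines for p in map_phase(line)]
counts = reduce_phase(pairs)
print(counts["the"])  # 2
```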
83. MapReduce Frameworks
Data is spread across machines before starting the task
Computation is done on the nodes where the data is stored
Data is replicated on multiple machines to increase reliability
Tasks are executed on multiple nodes just in case one of them is running slow
84. Using the Common Crawl Data – A Case Study
The dump is a few 10s of TBs in size
Where/How do you download it?
Answer: You don't need to download it
Instead, you push your computation to where the data exists, perform your computation, and then fetch only the results you are interested in!
85. Recap
My knowledge of computer science:
Am I ever going to use this/need this as an engineer?
How do I use this knowledge to understand the latest developments in software engineering?
Hope you have an answer now!
86. Parting Thoughts
Technology changes very rapidly – don't expect to be spoon-fed
Practise, Practise, Practise – Katas
Concept before Technology
Try out new things – even if they are not related to your project/curriculum
Read and understand other people's code
Read a lot, for example: http://highscalability.com/
87. We at jnaapti conduct workshops and provide training on these technologies – contact us at http://jnaapti.com/ for more details