Thermopylae Sciences & Technology has developed a custom spatial indexing solution for MongoDB to allow it to index and query multi-dimensional spatial data at scale. They implemented an R-tree spatial index that stores spatial objects as minimum bounding rectangles. This allows MongoDB to efficiently store and query geometries in more than two dimensions. Their solution also includes a geo-sharding approach to distribute the R-tree across multiple servers for additional scalability. Thermopylae has seen over 300% performance improvements versus PostGIS for spatial queries on large datasets with this customized indexing solution.
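The core pruning test behind such an R-tree is simple: two minimum bounding rectangles intersect only if they overlap on every axis, which works unchanged in any number of dimensions. A minimal sketch in Python (illustrative only, not Thermopylae's actual implementation):

```python
# Minimal sketch of the minimum-bounding-rectangle (MBR) overlap test
# an R-tree uses to prune candidates; it works in any dimension.

def mbr_intersects(a, b):
    """a, b: (mins, maxs) pairs of equal-length coordinate lists."""
    a_min, a_max = a
    b_min, b_max = b
    # Two boxes overlap iff their intervals overlap on every axis.
    return all(lo1 <= hi2 and lo2 <= hi1
               for lo1, hi1, lo2, hi2 in zip(a_min, a_max, b_min, b_max))

# 3-D example: two boxes that overlap on x and y but not on z.
box_a = ([0, 0, 0], [2, 2, 2])
box_b = ([1, 1, 3], [4, 4, 5])
print(mbr_intersects(box_a, box_b))  # False: disjoint on the z axis
```

An R-tree applies this test top-down: if a node's MBR fails it, the whole subtree is skipped, which is what makes multi-dimensional queries cheap.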
Datastax Breakfast (Petit Déjeuner) 14-04-15, Courbo Spark: a Machine Learning example on... (OCTO Technology)
In recent years we have seen a major evolution of the ecosystem of data-management solutions. Usage has also evolved, on both the analytical and the transactional side: the next-day (J+1) batch is no longer inevitable!
What are the lessons learned, and what are the prospects for traditional information systems, now that event-driven technologies are increasingly accessible and widely adopted?
Courbo-Spark: a Machine Learning example on time series
Decision trees are well-known classification and regression models in the Machine Learning world. In EDF's industrial context it is often necessary to apply this kind of algorithm to time series. We will present how EDF and OCTO adapted the decision-tree implementation in Spark to process large volumes of load curves.
Analyzing big data quickly and efficiently requires a data warehouse optimized to handle and scale for large datasets. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it simple and cost-effective to analyze big data for a fraction of the cost of traditional data warehouses. In this session, we take an in-depth look at data warehousing with Amazon Redshift for big data analytics. We cover best practices for taking advantage of Amazon Redshift's columnar technology and parallel processing capabilities to deliver high throughput and query performance. We also discuss how to design optimal schemas, load data efficiently, and use workload management.
This deck leans towards Hadoop/Hive installation experience and ecosystem concepts. Its content is derived from a book in preparation, Fundamentals of Big Data.
In this paper we propose Regularised Cross-Modal Hashing (RCMH), a new cross-modal hashing model that projects annotation and visual feature descriptors into a common Hamming space. RCMH optimises the hashcode similarity of related data-points in the annotation modality using an iterative three-step hashing algorithm: in the first step, each training image is assigned a K-bit hashcode based on hyperplanes learnt at the previous iteration; in the second step, the binary bits are smoothed by a formulation of graph regularisation so that similar data-points have similar bits; in the third step, a set of binary classifiers is trained to predict the regularised bits with maximum margin. Visual descriptors are projected into the annotation Hamming space by a set of binary classifiers learnt using the bits of the corresponding annotations as labels. RCMH is shown to consistently improve retrieval effectiveness over state-of-the-art baselines.
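Step one of the algorithm described above, assigning a K-bit hashcode from K hyperplanes, amounts to thresholding K signed projections. A toy sketch (the hyperplanes here are random; in RCMH they are the max-margin classifiers learnt at the previous iteration):

```python
import random

# Illustrative sketch, not the authors' code: assign a K-bit hashcode
# to a feature vector from K hyperplanes, as in step one of RCMH.

def hashcode(x, hyperplanes):
    """Each hyperplane is (w, b); bit k is 1 iff w.x + b >= 0."""
    return tuple(1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0
                 for w, b in hyperplanes)

def hamming(h1, h2):
    """Hamming distance: number of differing bits between two codes."""
    return sum(b1 != b2 for b1, b2 in zip(h1, h2))

random.seed(0)
K, D = 8, 4  # 8-bit codes over 4-dimensional descriptors
planes = [([random.gauss(0, 1) for _ in range(D)], 0.0) for _ in range(K)]
a = hashcode([1.0, 0.2, -0.5, 0.3], planes)
b = hashcode([1.1, 0.1, -0.4, 0.2], planes)  # a near-duplicate point
print(len(a), hamming(a, b))  # nearby points should get similar codes
```

Graph regularisation (step two) then nudges these bits so that neighbours in the annotation graph agree on more of them.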
Expressing and Exploiting Multi-Dimensional Locality in DASH (Menlo Systems GmbH)
DASH is a realization of the PGAS (partitioned global address space) programming model in the form of a C++ template library. It provides a multidimensional array abstraction which is typically used as an underlying container for stencil- and dense matrix operations.
Efficiency of operations on a distributed multi-dimensional array depends strongly on the distribution of its elements to processes and on the communication strategy used to propagate values between them. Locality can only be improved by employing an optimal distribution that is specific to the implementation of the algorithm, run-time parameters such as node topology, and numerous additional aspects. Application developers are typically unaware of these implications, which may also change in future releases of DASH.
In the following, we identify fundamental properties of distribution patterns that are prevalent in existing HPC applications.
We describe a classification scheme of multi-dimensional distributions based on these properties and demonstrate how distribution patterns can be optimized for locality and communication avoidance automatically and, to a great extent, at compile time.
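To illustrate the kind of distribution pattern being classified, here is a toy sketch, in Python rather than DASH's C++, of which process owns a given array element under two common patterns; the function names are illustrative, not the DASH API:

```python
from math import ceil

# Toy sketch of the locality question above: which process owns element
# (i, j) of an N x M array under two common distribution patterns.

def owner_blocked_rows(i, j, N, M, P):
    """BLOCKED over rows: each process gets one contiguous row block."""
    rows_per_proc = ceil(N / P)
    return i // rows_per_proc

def owner_block_cyclic_rows(i, j, N, M, P, blocksize):
    """BLOCKCYCLIC over rows: row blocks dealt out round-robin."""
    return (i // blocksize) % P

N, M, P = 8, 8, 4
# BLOCKED: rows 0-1 -> proc 0, rows 2-3 -> proc 1, ...
print([owner_blocked_rows(i, 0, N, M, P) for i in range(N)])
# [0, 0, 1, 1, 2, 2, 3, 3]
# BLOCKCYCLIC(1): rows cycle over processes 0,1,2,3,0,1,2,3
print([owner_block_cyclic_rows(i, 0, N, M, P, 1) for i in range(N)])
# [0, 1, 2, 3, 0, 1, 2, 3]
```

A stencil computation favours the blocked layout (neighbours are mostly local), while a load-imbalanced sweep may favour the cyclic one; this trade-off is exactly what an automatic pattern classification can decide.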
Scaling Storage and Computation with Hadoop (yaevents)
Hadoop provides distributed storage and a framework for the analysis and transformation of very large data sets using the MapReduce paradigm. Hadoop partitions data and computation across thousands of hosts and executes application computations in parallel, close to their data. A Hadoop cluster scales computation capacity, storage capacity and I/O bandwidth by simply adding commodity servers. Hadoop is an Apache Software Foundation project; it unites hundreds of developers, and hundreds of organizations worldwide report using Hadoop. This presentation will give an overview of the Hadoop family of projects with a focus on its distributed storage solutions.
Distributed Computing with Apache Hadoop. Introduction to MapReduce. (Konstantin V. Shvachko)
Abstract: The presentation describes
- What is the BigData problem
- How Hadoop helps to solve BigData problems
- The main principles of the Hadoop architecture as a distributed computational platform
- History and definition of the MapReduce computational model
- Practical examples of how to write MapReduce programs and run them on Hadoop clusters
The talk is targeted to a wide audience of engineers who do not have experience using Hadoop.
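The MapReduce model the talk introduces can be sketched in a few lines of plain Python (map, then shuffle/sort by key, then reduce), without Hadoop's actual API:

```python
from itertools import groupby

# The canonical MapReduce example: word count, in plain Python to show
# the model (map -> shuffle/sort -> reduce), not Hadoop's Java API.

def map_phase(lines):
    """Mapper: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Shuffle/sort by key, then sum each key's counts (the reducer)."""
    shuffled = sorted(pairs)
    for word, group in groupby(shuffled, key=lambda kv: kv[0]):
        yield (word, sum(count for _, count in group))

docs = ["big data big ideas", "data pipelines"]
print(dict(reduce_phase(map_phase(docs))))
# {'big': 2, 'data': 2, 'ideas': 1, 'pipelines': 1}
```

On a real cluster the same map and reduce functions run on many hosts, with the framework handling the shuffle, fault tolerance and data locality.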
GeoServer on Steroids at FOSS4G Europe 2014 (GeoSolutions)
Setting up a GeoServer can sometimes be deceptively simple. However, going from proof of concept to production requires a number of steps to be taken in order to optimize the server in terms of availability, performance and scalability.
The presentation will show how to get from a basic setup to a battle-ready, rock-solid installation by walking through the techniques that advanced users have already mastered.
In KDD2011, Vijay Narayanan (Yahoo!) and Milind Bhandarkar (Greenplum Labs, EMC) conducted a tutorial on "Modeling with Hadoop". This is the second half of the tutorial.
3D Repo (http://3drepo.org), winner of the MongoDB Innovation Award, is a non-linear version control system that enables coordinated management of large-scale 3D models over the Internet. It is currently the only cloud-based architecture able to support maintenance and transmission of 3D models and associated metadata, as well as rendering at the scale required by the industry. With MongoDB we can deliver significant improvements in the engineering workflow that supports collaborative design, improvements not possible otherwise. Instead of architects, engineers and constructors sharing massive files in a costly and time-consuming manner, they can simply point their web browser to a shared online 3D repository. With our system, all stakeholders are able to examine their projects virtually, even on mobile devices. During the presentation, we will demonstrate the management of massive 3D models in a repository built directly atop MongoDB. We will also demonstrate our online web-browser viewer, capable of rendering 3D models directly from the DB without the need to install any plug-ins or firewall exceptions.
This presentation covers:
Based on the service model:
• SaaS (Software as a Service)
• PaaS (Platform as a Service)
• IaaS (Infrastructure as a Service)
Based on the deployment or access model:
• Public Cloud
• Private Cloud
• Hybrid Cloud
For more details you can visit:
http://vibranttechnologies.co.in/salesforce-classes-in-mumbai.html
Robotics classes in Mumbai
Best Robotics classes in Mumbai with job assistance.
Our features are:
expert guidance by IT industry professionals
lowest fees of 5000
practical exposure to handling projects
well-equipped lab
resume-writing guidance after the course
How Fannie Mae Leverages Data Quality to Improve the Business (DLT Solutions)
James Barrett, Data Quality Service Manager in Enterprise Data, Operations & Technology at Fannie Mae, shares how Fannie Mae leverages data quality to improve the business at the 2015 Informatica Government Summit.
Vibrant Technologies is headquartered in Mumbai, India. We are the best Robotics training provider in Navi Mumbai, providing Live Projects to students. We provide corporate training as well. We are the best Robotics classes in Mumbai according to our students and corporate clients.
Contact us at: http://vibranttechnologies.co.in/
Python classes in Mumbai
Best Python classes in Mumbai with job assistance.
Our features are:
expert guidance by IT industry professionals
lowest fees of 5000
practical exposure to handling projects
well-equipped lab
resume-writing guidance after the course
How to Accelerate Backup Performance with Dell DR Series Backup Appliances (DLT Solutions)
Join us for a live demonstration of Dell DR series disk backup and disaster recovery appliances. Learn how to transform your backup operations without the pain of replacing your existing backup software. DR series appliances can be deployed in any environment quickly and easily – fast and affordable data reduction with the future built in. Discover how to easily accelerate backup performance, reduce backup storage footprint, and simplify disaster recovery.
Cloud Ready Data: Speeding Your Journey to the Cloud (DLT Solutions)
Ronen Schwartz, Vice President and General Manager Informatica Cloud at Informatica, shares how to speed your journey to the cloud from the 2015 Informatica Government Summit.
GET READY FOR INTEL'S KNIGHTS LANDING
As the leading provider of code modernization and optimization training, Colfax now offers a 1-hour webinar: “Introduction to Next-Generation Intel® Xeon Phi™ Processor: Developer’s Guide to Knights Landing”.
ANOTHER LEAP IN PARALLEL PERFORMANCE
Next-generation Intel Xeon Phi processors codenamed Knights Landing (KNL) are expected to provide up to 3X higher performance than the current generation. With on-board high-bandwidth memory and an optional integrated high-speed fabric, plus the availability of a socket form factor, these powerful components will transform the fundamental building block of technical computing.
The transformation of the scalable manycore coprocessor into a standalone processor is going to be a remarkable step in the parallel computing field, and we are offering help to developers worldwide on getting the best out of the new processor. The webinar will help get you up to speed with:
- Knights Landing architecture
- New KNL features
- Code transition and modernization strategy
MULTIPLE RUNS AND FLEXIBLE TIMINGS FOR ALL GEOS
The webinar will air multiple times in different time slots, making it convenient for developers worldwide to attend.
REGISTER TODAY
at http://colfaxresearch.com/knl-webinar/
Experts from immixGroup’s Market Intelligence organization identify and explain targeted sales opportunities for COTS manufacturers and solution providers and discuss how to navigate the complex waters of DOD. Topics will include agency IT budgets, organizational landscapes, major acquisition drivers, and FY15 programs. Click here to view the full presentation: http://immixgroup.com/Resources/Webcasts/Market-Intelligence-FY15-Defense-Budget/
Linux administration classes in Mumbai
Best Linux administration classes in Mumbai with job assistance.
Our features are:
expert guidance by IT industry professionals
lowest fees of 5000
practical exposure to handling projects
well-equipped lab
resume-writing guidance after the course
Scalable Machine Learning: The Role of Stratified Data Sharding (inside-BigData.com)
In this deck from the 2019 Stanford HPC Conference, Srinivasan Parthasarathy from Ohio State University presents: Scalable Machine Learning: The Role of Stratified Data Sharding.
"With the increasing popularity of structured data stores, social networks and Web 2.0 and 3.0 applications, complex data formats, such as trees and graphs, are becoming ubiquitous. Managing and learning from such large and complex data stores, on modern computational eco-systems, to realize actionable information efficiently, is daunting. In this talk I will begin by discussing some of these challenges. Subsequently I will discuss a critical element at the heart of this challenge: the sharding, placement, storage and access of such tera- and peta-scale data. In this work we develop a novel distributed framework to ease the burden on the programmer and propose an agile and intelligent placement service layer as a flexible yet unified means to address this challenge. Central to our framework is the notion of stratification, which seeks to initially group structurally (or semantically) similar entities into strata. Subsequently, strata are partitioned within this eco-system according to the needs of the application to maximize locality, balance load, minimize data skew or even take into account energy consumption. Results on several real-world applications validate the efficacy and efficiency of our approach. (Note: joint work with Y. Wang (Airbnb) and A. Chakrabarti (MSR).)"
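The stratification idea in the abstract can be illustrated with a toy sketch (not the authors' implementation): group similar items into strata first, then deal each stratum across shards so every shard receives a balanced mix:

```python
# Toy sketch of stratified sharding: stratify by a similarity key, then
# partition each stratum round-robin so shards stay balanced.

def stratify(items, key):
    """Group items into strata by a structural/semantic key."""
    strata = {}
    for item in items:
        strata.setdefault(key(item), []).append(item)
    return strata

def shard(strata, n_shards):
    """Deal each stratum across the shards round-robin."""
    shards = [[] for _ in range(n_shards)]
    for stratum in strata.values():
        for k, item in enumerate(stratum):
            shards[k % n_shards].append(item)
    return shards

items = [("graph", i) for i in range(4)] + [("tree", i) for i in range(4)]
shards = shard(stratify(items, key=lambda it: it[0]), n_shards=2)
print(shards)  # each shard holds two graph items and two tree items
```

Real placement services weigh locality, load and skew rather than a simple round-robin, but the two-phase structure (stratify, then partition) is the same.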
Srinivasan Parthasarathy, Professor of Computer Science & Engineering, The Ohio State University
Srinivasan Parthasarathy is a Professor of Computer Science and Engineering and the director of the data mining research laboratory at Ohio State. His research interests span databases, data mining and high performance computing. He is among a handful of researchers nationwide to have won both the Department of Energy and National Science Foundation Career awards. He and his students have won multiple best paper awards or "best of" nominations from leading forums in the field including: SIAM Data Mining, ACM SIGKDD, VLDB, ISMB, WWW, ICDM, and ACM Bioinformatics. He chairs the SIAM data mining conference steering committee and serves on the action board of ACM TKDD and ACM DMKD --leading journals in the field. Since 2012 he also helped lead the creation of OSU's first-of-a-kind nationwide (USA) undergraduate major in data analytics and serves as one of its founding directors.
Watch the video: https://youtu.be/hOJI8e0p-UI
Learn more: http://web.cse.ohio-state.edu/~parthasarathy.2/
and
http://hpcadvisorycouncil.com/events/2019/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
PostGIS is a spatial extension for PostgreSQL
PostGIS aims to be an “OpenGIS Simple Features for SQL” compliant spatial database
I am the principal developer
Accelerating Data Science with Better Data Engineering on Databricks (Databricks)
Whether you’re processing IoT data from millions of sensors or building a recommendation engine to provide a more engaging customer experience, the ability to derive actionable insights from massive volumes of diverse data is critical to success. MediaMath, a leading adtech company, relies on Apache Spark to process billions of data points ranging from ads, user cookies, impressions, clicks, and more — translating to several terabytes of data per day. To support the needs of the data science teams, data engineering must build data pipelines for both ETL and feature engineering that are scalable, performant, and reliable.
Join this webinar to learn how MediaMath leverages Databricks to simplify mission-critical data engineering tasks that surface data directly to clients and drive actionable business outcomes. This webinar will cover:
- Transforming TBs of data with RDDs and PySpark responsibly
- Using the JDBC connector to write results to production databases seamlessly
- Comparisons with a similar approach using Hive
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C... (Reynold Xin)
(Berkeley CS186 guest lecture)
Big Data Analytics Systems: What Goes Around Comes Around
Introduction to MapReduce, GFS, HDFS, Spark, and differences between "Big Data" and database systems.
Advanced Non-Relational Schemas For Big Data (Victor Smirnov)
This is the presentation from barcamp in Altoros where I was explaining how various advanced non-relational schemas (or, simply, data structures) can be modelled on top of Key/Value storage. The set of covered schemas includes Dynamic Vector, File System, Searchable Bitmap, LOUDS Tree, Wavelet Tree and Inverted Index.
See https://bitbucket.org/vsmirnov/memoria/wiki/MemoriaForBigData
for additional details.
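One of the schemas listed, the inverted index, can be modelled on a key/value store in a few lines; here a Python dict stands in for the store, and the key naming is illustrative, not Memoria's API:

```python
# Sketch of an inverted index modelled on top of a plain key/value
# store (a dict stands in for the store): one key per term, whose
# value is the sorted postings list of documents containing it.

kv = {}  # the key/value store

def index_document(doc_id, text):
    for term in set(text.lower().split()):
        key = f"idx:{term}"
        postings = kv.get(key, [])
        postings.append(doc_id)
        kv[key] = sorted(postings)  # keep postings lists ordered

def search(term):
    return kv.get(f"idx:{term.lower()}", [])

index_document(1, "big data on key value stores")
index_document(2, "key value schemas for big data")
print(search("data"))     # [1, 2]
print(search("schemas"))  # [2]
```

The same pattern (composite keys plus ordered values) underlies the other structures mentioned, such as searchable bitmaps and LOUDS trees.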
Databases Basics and Spacial Matrix - Discussing Geographic Potentials of Data... (Jerin John)
A core introduction to data types, the databases around us and their use cases, and integrating geospatial arrangements into our projects. This includes speaker notes and is intended to give an overall idea of things rather than focusing on one specific topic. It includes references to relational databases like PostgreSQL and MySQL, key/value databases like Redis, document DBs like MongoDB, and search engines like ElasticSearch; the second part deals with geospatial arrangements in PostGIS.
In the session from Game Developers Conference 2011, we'll take a complete look at the terrain system in Frostbite 2 as it was applied in Battlefield 3. The session is partitioned into three parts. We begin with the scalability aspects and discuss how consistent use of hierarchies allowed us to combine high resolutions with high view distances. We then turn towards workflow aspects and describe how we achieved full in-game realtime editing. A fair amount of time is spent describing how issues were addressed.
Finally, we look at the runtime side. We describe usage of CPU, GPU and memory resources and how it was kept to a minimum. We discuss how the GPU is offloaded by caching intermediate results in a procedural virtual texture and how prioritization was done to allow for work throttling without sacrificing quality. We also go into depth about the flexible streaming system that works with both FPS and driving games.
Find out how NoSQL can help your application with practical examples and use-cases from our Cloud Data Services Developer Advocate Glynn Bird. This webinar won't dwell on the science behind the database, but will walk you through real-life use-cases for NoSQL technologies that you can start using today.
Webinar: https://youtu.be/M_Jqw
Miguel Angel Fajardo - NewSQL: the magic wand of data - Codemotion Rome 2019 (Codemotion)
New winds are blowing in the world of Data. They say there are magic systems that are capable of the impossible. Systems that guarantee the scalable performance of the NoSQLs while still maintaining the ACID transactions of relational databases. What is this kind of magic? How did it come to be? How can it be used? And more importantly, how does it work? Welcome to the world of NewSQL. Welcome to the future.
A quick tour in 16 slides of Amazon's Redshift clustered, massively parallel database.
Find out what differentiates it from the other database products Amazon has, including SimpleDB, DynamoDB and RDS (MySQL, SQL Server and Oracle).
Learn how it stores data on disk in a columnar format and how this relates to performance and interesting compression techniques.
Contrast the difference between Redshift and a MySQL instance and discover how the clustered architecture may help to dramatically reduce query time.
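The columnar storage point can be made concrete with a toy sketch: storing a column contiguously makes simple encodings such as run-length encoding effective (RLE is one of several compression encodings Redshift offers):

```python
# Toy sketch of why columnar layout compresses well: a column stored
# contiguously has long runs of equal values, which run-length
# encoding collapses into (value, run_length) pairs.

def rle_encode(column):
    runs = []
    for v in column:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1   # extend the current run
        else:
            runs.append([v, 1])  # start a new run
    return runs

# A sorted "status" column compresses far better than the same values
# interleaved row-by-row with other fields would.
status = ["ok"] * 5 + ["error"] * 2 + ["ok"] * 3
print(rle_encode(status))  # [['ok', 5], ['error', 2], ['ok', 3]]
```

This is also why sort keys matter in a columnar warehouse: sorting lengthens the runs, and scans can skip whole compressed blocks.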
Similar to High Dimensional Indexing using MongoDB (MongoSV 2012)
2024.06.01 Introducing a competency framework for language learning materials ... (Sandy Millin)
http://sandymillin.wordpress.com/iateflwebinar2024
Published classroom materials form the basis of syllabuses, drive teacher professional development, and have a potentially huge influence on learners, teachers and education systems. All teachers also create their own materials, whether a few sentences on a blackboard, a highly-structured fully-realised online course, or anything in between. Despite this, the knowledge and skills needed to create effective language learning materials are rarely part of teacher training, and are mostly learnt by trial and error.
Knowledge and skills frameworks, generally called competency frameworks, for ELT teachers, trainers and managers have existed for a few years now. However, until I created one for my MA dissertation, there wasn’t one drawing together what we need to know and do to be able to effectively produce language learning materials.
This webinar will introduce you to my framework, highlighting the key competencies I identified from my research. It will also show how anybody involved in language teaching (any language, not just English!), teacher training, managing schools or developing language learning materials can benefit from using the framework.
Embracing GenAI - A Strategic Imperative (Peter Windle)
Artificial Intelligence (AI) technologies such as Generative AI, Image Generators and Large Language Models have had a dramatic impact on teaching, learning and assessment over the past 18 months. The most immediate threat AI posed was to Academic Integrity with Higher Education Institutes (HEIs) focusing their efforts on combating the use of GenAI in assessment. Guidelines were developed for staff and students, policies put in place too. Innovative educators have forged paths in the use of Generative AI for teaching, learning and assessments leading to pockets of transformation springing up across HEIs, often with little or no top-down guidance, support or direction.
This Gasta posits a strategic approach to integrating AI into HEIs to prepare staff, students and the curriculum for an evolving world and workplace. We will highlight the advantages of working with these technologies beyond the realm of teaching, learning and assessment by considering prompt engineering skills, industry impact, curriculum changes, and the need for staff upskilling. In contrast, not engaging strategically with Generative AI poses risks, including falling behind peers, missed opportunities and failing to ensure our graduates remain employable. The rapid evolution of AI technologies necessitates a proactive and strategic approach if we are to remain relevant.
A Strategic Approach: GenAI in Education (Peter Windle)
Operation “Blue Star” is the only event in the history of independent India in which the state went to war with its own people. Even after about 40 years it is not clear whether it was the culmination of the state's anger toward the people of the region, a political game of power, or the start of a dictatorial chapter in the democratic setup.
The people of Punjab felt alienated from the mainstream due to the denial of their just demands during a long democratic struggle since independence. As happens all over the world, this led to a militant struggle with great loss of life among military, police and civilian personnel. The killing of Indira Gandhi and the massacre of innocent Sikhs in Delhi and other Indian cities were also associated with this movement.
The French Revolution, which began in 1789, was a period of radical social and political upheaval in France. It marked the decline of absolute monarchies, the rise of secular and democratic republics, and the eventual rise of Napoleon Bonaparte. This revolutionary period is crucial in understanding the transition from feudalism to modernity in Europe.
For more information, visit www.vavaclasses.com
Palestine last event orientation.pptx (RaedMohamed3)
An EFL lesson about the current events in Palestine. It is intended for intermediate students who wish to improve their listening skills through a short PowerPoint lesson.
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...Levi Shapiro
Letter from the Congress of the United States regarding Anti-Semitism sent June 3rd to MIT President Sally Kornbluth, MIT Corp Chair, Mark Gorenberg
Dear Dr. Kornbluth and Mr. Gorenberg,
The US House of Representatives is deeply concerned by ongoing and pervasive acts of antisemitic
harassment and intimidation at the Massachusetts Institute of Technology (MIT). Failing to act decisively to ensure a safe learning environment for all students would be a grave dereliction of your responsibilities as President of MIT and Chair of the MIT Corporation.
This Congress will not stand idly by and allow an environment hostile to Jewish students to persist. The House believes that your institution is in violation of Title VI of the Civil Rights Act, and the inability or
unwillingness to rectify this violation through action requires accountability.
Postsecondary education is a unique opportunity for students to learn and have their ideas and beliefs challenged. However, universities receiving hundreds of millions of federal funds annually have denied
students that opportunity and have been hijacked to become venues for the promotion of terrorism, antisemitic harassment and intimidation, unlawful encampments, and in some cases, assaults and riots.
The House of Representatives will not countenance the use of federal funds to indoctrinate students into hateful, antisemitic, anti-American supporters of terrorism. Investigations into campus antisemitism by the Committee on Education and the Workforce and the Committee on Ways and Means have been expanded into a Congress-wide probe across all relevant jurisdictions to address this national crisis. The undersigned Committees will conduct oversight into the use of federal funds at MIT and its learning environment under authorities granted to each Committee.
• The Committee on Education and the Workforce has been investigating your institution since December 7, 2023. The Committee has broad jurisdiction over postsecondary education, including its compliance with Title VI of the Civil Rights Act, campus safety concerns over disruptions to the learning environment, and the awarding of federal student aid under the Higher Education Act.
• The Committee on Oversight and Accountability is investigating the sources of funding and other support flowing to groups espousing pro-Hamas propaganda and engaged in antisemitic harassment and intimidation of students. The Committee on Oversight and Accountability is the principal oversight committee of the US House of Representatives and has broad authority to investigate “any matter” at “any time” under House Rule X.
• The Committee on Ways and Means has been investigating several universities since November 15, 2023, when the Committee held a hearing entitled From Ivory Towers to Dark Corners: Investigating the Nexus Between Antisemitism, Tax-Exempt Universities, and Terror Financing. The Committee followed the hearing with letters to those institutions on January 10, 202
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdfTechSoup
In this webinar you will learn how your organization can access TechSoup's wide variety of product discount and donation programs. From hardware to software, we'll give you a tour of the tools available to help your nonprofit with productivity, collaboration, financial management, donor tracking, security, and more.
2. Thermopylae Sciences & Technology – Who are we?
• Mixed Government (70%) and Commercial (30%) contracting company w/ ~150 employees
• Core customers:
  – SOUTHCOM, Intel & Security Command, Army Intel Sector, DOI
  – LVMS, Select Energy Oil & Gas, OSU, Cleveland Cavaliers, and STL Rams
• #1 Google Enterprise partner for Federal and partner w/ imagery providers (GeoEye / Digital Globe)
• FOSS4G contributor and 10gen Enterprise partner
WHO ARE THESE GUYS?
ACCOMPLISHING THE IMPOSSIBLE
3. “The 3D UDOP allows near real time visibility of all SOUTHCOM Directorates’ information in one location…this capability allows for unprecedented situational awareness and information sharing”
-Gen. Doug Frasier
TST PRODUCTS
4. COMMERCIAL CUSTOMERS
Commercial Examples
• Cleveland Cavaliers
• USGIF
• Las Vegas Motor Speedway
• Baltimore Grand Prix
iSpatial framework serves millions of mobile devices
5. 1. iSpatial provides a web-based interface for Multi-INT visualization and collaboration
2. Map/Reduce provides spatial statistics processing (spatial regression) and heuristics
3. Modified MongoDB provides storage and indexing of multi-dimensional spatial data at scale
TST ARCHITECTURE
iSpatial – UI/Visualization
Hadoop M/R – Processing / Analysis
MongoDB – Spatial Data Management @ Scale
6. What the…..HOW MUCH DATA?!?
• “Swimming in sensors drowning in data”
– What size data tsunami are we talking about?
• “Fix and Finish are meaningless until FIND is accomplished”
– A “Big Data” Spatial Search Problem
THAT’S A LOT OF DATA….
Sensor Type | Resolution | Data Bandwidth | TB/Hr
FMV | 640 x 480 (Std Def); 1920 x 1080 (HD) | HD: 16bit x 3 bands @ 30fps ~1Gbps | ~0.45 TB
WAMI | Constant Hawk = 96 Mpx; Gorgon Stare = 460 Mpx; Argus = 1.8 Gpx | GS @ 16bit x 3 bands @ 2fps ~15.3Gbps; Argus @ 16bit x 3 bands @ 12fps ~345.6Gbps | ~6.89 TB (GS); ~155 TB (Argus)
Satellite | NITF / JP2 resolutions: 32K x 32K; 432K x 216K | 32K x 32K @ 8bit x 3 bands @ 1 frame/5mins ~27Gbps | ~12.15 TB
7. • Horizontally scalable – Large volume / elastic
• Vertically scalable – Heterogeneous data types (“Data Stack”)
• Smartly Distributed – Reduce the distance bits must travel
• Fault Tolerant – Replication Strategy and Consistency model
• High Availability – Node recovery
• Fast – Reads or writes (can’t always have both)
BIG DATA STORAGE CHARACTERISTICS
Desired Data Store Characteristics for ‘Big Data’
8. • Cassandra
– Nice Bring Your Own Index (BYOI) design
– … but Java, Java, Java… Memory management can be a maintenance issue
– Adding new nodes can be a pain (Token Changes, nodetool)
– Key-Value store…good for simple data models
• HBase
– Nice BigTable model
– Key-Value store…good for simple data models
– Lots of Java JNI (primarily based on std:hashmap of std:hashmap)
• CouchDB
– Provides some GeoSpatial functionality (Currently being rewritten)
– HEAVILY dependent on Map-Reduce model (complicated design)
– Erlang based – poor multi-threaded heap management
NOSQL OPTIONS
Subset of Evaluated NoSQL Options
9. Why MongoDB for Thermopylae?
• Documents based on JSON – A GEOJSON match made in heaven! (OGC)
• C++ – No garbage collection overhead! Efficient memory management design reduces disk swapping and paging
• Disk storage is memory mapped, enabling fast swapping when necessary
• Built-in auto-failover with replica sets and fast recovery with journaling
• Tunable Consistency – consistency defined at the application layer
• Schema Flexible – friendly properties of SQL enable an easy port
• Provided initial spatial indexing support – point-based and limited!
WHY TST <3’S MONGODB
10. MONGODB SPATIAL INDEXER
... The Spatial Indexer wasn’t quite right
• MongoDB (like nearly all relational DBs) uses a b-Tree
  – Data structure for storing sorted data, searchable in log time
  – Great for indexing numerical and text documents (1D attribute data)
  – Cannot store multi-dimension (>2D) data – NOT COMPLEX GEOMETRY FRIENDLY
11. DIMENSIONALITY REDUCTION
How does MongoDB solve the dimensionality problem?
• Space Filling (Z) Curve
  – A continuous line that intersects every point in a two-dimensional plane
• Use Geohash to represent lat/lon values
  – Interleave the bits of a lat/long pair
  – Base32 encode the result
12. GEOHASH BTREE ISSUES
• Neighbors aren’t so close!
  – Neighboring points on the geoid may end up on opposite ends of the plane
  – Impacts search efficiency
• What about geometry?
  – Doesn’t support > 2D
  – Mongo uses multi-location documents, which really just index multiple points that link back to a single document
Issues with the Geohash b-Tree approach
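A toy demonstration of the "neighbors aren't so close" problem: quantize lat/lon to 16-bit integers, interleave the bits into a Morton (Z-order) code, and compare two points roughly 1.4 km apart that straddle the prime meridian. The 16-bit grid and sample coordinates are illustrative assumptions, not the actual MongoDB encoding.

```python
# Two nearby points on opposite sides of a curve boundary get 1-D keys
# that sort very far apart, hurting range-scan locality.

def morton(lat, lon, bits=16):
    # Map lat [-90, 90] and lon [-180, 180] onto unsigned integer grids.
    y = int((lat + 90.0) / 180.0 * ((1 << bits) - 1))
    x = int((lon + 180.0) / 360.0 * ((1 << bits) - 1))
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)      # even bits: longitude
        code |= ((y >> i) & 1) << (2 * i + 1)  # odd bits: latitude
    return code

west = morton(51.5, -0.01)  # just west of the meridian
east = morton(51.5, 0.01)   # just east, roughly 1.4 km away
# The codes land on opposite sides of a major bit boundary, so they sort
# far apart even though the points are neighbors on the geoid.
print(abs(west - east) > 10**8)  # True
```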
13. Sort Order and Multi-Dimension…a nightmare
(3D / 4D Hilbert Scanning Order)
GEO-SHARDING ALTERNATIVE
14. Mongo Multi-location Document Clipping Issues
($within search doesn’t always work w/ multi-location)
(diagram: a search polygon tested against a multi-location document (aka polygon) in four cases: two succeed, two fail)
MULTI-LOCATION CLIPPING
15. • Constrain the system to single point searches
– Multi-dimension support will be exponentially complex (won’t scale)
• Interpolate points along the edge of the shape
– Multi-dimension support will be exponentially complex (won’t scale)
• Customize the spatial indexer
– Selected approach
SOLUTIONS TO GEOHASH PROBLEM
Potential Solutions
16. CUSTOM TUNED SPATIAL INDEXER
Thermopylae Custom Tuned MongoDB for Geo
TST leverages Kriegel’s 1996 research in R* Trees
• R-Trees organize any-dimensional data by representing each object as a minimum bounding box.
• Each node bounds its children. A node can hold many objects (max: m, min: ceil(m/2))
• Splits and merges are optimized by minimizing overlaps
• The leaves point to the actual objects (typically stored on disk)
• Height balanced – search is always O(log n)
17. Spatial Indexing at Scale with R-Trees
RTREE THEORY
Spatial data represented as minimum bounding rectangles (2-dimension), cubes (3-dimension), or hyperrectangles (4-dimension)
Index represented as <I, DiskLoc>, where:
  I = (I0, I1, … In) : n = number of dimensions
  Each Ii is a range of the form [min, max] describing the MBR extent along one dimension
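The <I, DiskLoc> entry can be modeled like this. The class and field names (`IndexEntry`, `disk_loc`) are hypothetical stand-ins for illustration, not MongoDB's actual C++ types.

```python
# An index entry pairs an n-dimensional MBR with a pointer to the record
# on disk; intersection is a per-dimension range-overlap test.
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[float, float]  # [min, max] along one dimension

@dataclass
class IndexEntry:
    mbr: List[Interval]   # I = (I0, I1, ... In)
    disk_loc: int         # points at the record on disk

    def intersects(self, other: "IndexEntry") -> bool:
        # MBRs intersect iff their ranges overlap in every dimension.
        return all(a[0] <= b[1] and b[0] <= a[1]
                   for a, b in zip(self.mbr, other.mbr))

# A 3-D point observation and a 3-D search box:
obs = IndexEntry(mbr=[(10, 10), (20, 20), (5, 5)], disk_loc=0x1A2B)
box = IndexEntry(mbr=[(0, 15), (15, 25), (0, 9)], disk_loc=-1)
print(box.intersects(obs))  # True
```

Note that a point is just a degenerate MBR whose min equals its max in every dimension, which is why one entry format covers both points and complex geometry.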
18. R*-Tree Spatial Index Example
• Sample insertion result for a 4th-order tree
• Objectives:
  1. Minimize area
  2. Minimize overlaps
  3. Minimize margins
  4. Maximize inner node utilization
R*-TREE INDEX OBJECTIVES
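The four objectives can be scored with simple MBR metrics. This is a sketch assuming MBRs are lists of (min, max) pairs; the function names are my own.

```python
# Metrics an R*-tree weighs when choosing where to insert and how to
# split: area, margin (generalized perimeter), and pairwise overlap.

def area(mbr):
    p = 1.0
    for lo, hi in mbr:
        p *= (hi - lo)
    return p

def margin(mbr):
    # Sum of edge lengths along each dimension.
    return sum(hi - lo for lo, hi in mbr)

def overlap(a, b):
    # Area of the intersection MBR; 0 if disjoint in any dimension.
    p = 1.0
    for (alo, ahi), (blo, bhi) in zip(a, b):
        side = min(ahi, bhi) - max(alo, blo)
        if side <= 0:
            return 0.0
        p *= side
    return p

a = [(0, 4), (0, 3)]
b = [(2, 6), (1, 5)]
print(area(a), margin(a), overlap(a, b))  # 12.0 7 4.0
```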
19. Insert
• Similar to insertion into a B+-tree, but may insert into any leaf; the leaf splits if its capacity is exceeded.
– Which leaf to insert into?
– How to split a node?
R*-TREE INSERT EXAMPLE
20. Insert—Leaf Selection
• Follow a path from root to leaf.
• At each node, move into the subtree whose MBR area increases least with the addition of the new rectangle.
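The least-enlargement descent described above can be sketched as follows; the dict-based node shape (`mbr` / `children` / `entries`) is an illustrative assumption, not the actual index structure.

```python
# Leaf selection: walk from the root, at each level picking the child
# whose MBR area grows least if the new rectangle is added to it.

def area(mbr):
    p = 1.0
    for lo, hi in mbr:
        p *= (hi - lo)
    return p

def enlarge(mbr, rect):
    # Smallest MBR covering both mbr and rect.
    return [(min(alo, blo), max(ahi, bhi))
            for (alo, ahi), (blo, bhi) in zip(mbr, rect)]

def choose_leaf(node, rect):
    while node.get("children"):  # descend through internal nodes
        node = min(node["children"],
                   key=lambda c: area(enlarge(c["mbr"], rect)) - area(c["mbr"]))
    return node  # leaf: insert rect here (split if over capacity)

root = {"mbr": [(0, 10), (0, 10)], "children": [
    {"mbr": [(0, 4), (0, 4)], "children": [], "entries": []},
    {"mbr": [(6, 10), (6, 10)], "children": [], "entries": []},
]}
leaf = choose_leaf(root, [(1, 2), (1, 2)])
print(leaf["mbr"])  # [(0, 4), (0, 4)] since no enlargement is needed
```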
27. R*-Tree Leverages B-Tree Base Data Structures (buckets)
R*-TREE MONGODB IMPLEMENTATION
28. Spatial Index
Architecture, Organization, & Performance
(index bucket layout: BucketHeader → MBRHeader → MBRKeyNode(s) …)
Dimensions | Num Buckets | Tree Height | Read Time
3 | 3,448,276 | 3 | 190 ms
5 | 5,076,143 | 3 | 275 ms
100 | 90,909,091 | 8 | ~4.9 sec
1B Polygon Read Performance (worst case O(n))
SPATIAL INDEX ARCH & ORG
29. Geo-Sharding – (in work)
Scalable Distributed R* Tree (SD-r*Tree)
“Balanced” binary tree, with nodes distributed on a set of servers:
• Each internal node has exactly two children
• Each leaf node stores a subset of the indexed dataset
• At each node, the heights of the subtrees differ by at most one
• A mongos “routing” node maintains the binary tree
GEO-SHARDING
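A minimal sketch of the routing idea: a mongos-style router walks a binary coverage tree to the shard whose leaf region covers the query point. The shard names and the non-overlapping coverage here are illustrative assumptions; a real SD-R*-tree allows overlapping coverage and uses correction steps.

```python
# Route a spatial query to the server holding the relevant chunk by
# descending a binary tree of coverage MBRs maintained by the router.

def contains(mbr, point):
    return all(lo <= v <= hi for (lo, hi), v in zip(mbr, point))

def route(node, point):
    # Internal nodes have left/right children; leaves name a shard.
    while "shard" not in node:
        left = node["left"]
        node = left if contains(left["mbr"], point) else node["right"]
    return node["shard"]

tree = {
    "mbr": [(-180, 180), (-90, 90)],
    "left":  {"mbr": [(-180, 0), (-90, 90)], "shard": "rs-west"},
    "right": {"mbr": [(0, 180), (-90, 90)], "shard": "rs-east"},
}
print(route(tree, (-77.0, 38.9)))  # rs-west
```

Because each internal node has exactly two children and subtree heights differ by at most one, routing touches O(log s) nodes for s shards.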
SD-r*Tree Data Structure Illustration
• di = Data Node (Chunk)
• ri = Coverage Node
Leveraged work from Litwin, Mouza, Rigaux 2007
SD-r*Tree DATA STRUCTURE
32. Beyond 4-Dimensions - X-Tree
(Berchtold, Keim, Kriegel – 1996)
(diagram node types: normal internal nodes, supernodes, data nodes)
• Avoid MBR overlaps – more overlap approaches the worst-case O(n) read
• Avoid node splits (main cause for high overlap)
• Introduce new node structure: Supernodes – Large Directory nodes of variable size
BEYOND 4-DIMENSIONS
34. T-Sciences Custom Tuned Spatial Indexer
• Optimized Spatial Search – finds intersecting MBRs and recurses into those nodes
• Optimized Spatial Inserts – uses the Hilbert value of the MBR centroid to guide the search
  – 28% reduction in the number of nodes touched
• Optimized Deletes – leverages the R* split/merge approach for rebalancing the tree when nodes become over/under-full
• Low maintenance – leverages MongoDB’s automatic data compaction and partitioning
CONCLUSION
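The optimized spatial search (find intersecting MBRs, recurse into only those subtrees) can be sketched as follows, using an illustrative dict-based node layout rather than TST's actual implementation.

```python
# Branch-and-bound R-tree search: prune any subtree whose MBR does not
# intersect the query box; collect matching entries at the leaves.

def intersects(a, b):
    return all(alo <= bhi and blo <= ahi
               for (alo, ahi), (blo, bhi) in zip(a, b))

def search(node, query, hits):
    if "entries" in node:  # leaf node
        hits += [e for e in node["entries"] if intersects(e["mbr"], query)]
        return hits
    for child in node["children"]:  # internal node: prune by MBR
        if intersects(child["mbr"], query):
            search(child, query, hits)
    return hits

leaf_a = {"entries": [{"mbr": [(1, 2), (1, 2)], "doc": "a"},
                      {"mbr": [(3, 4), (3, 4)], "doc": "b"}]}
leaf_b = {"entries": [{"mbr": [(8, 9), (8, 9)], "doc": "c"}]}
root = {"children": [{"mbr": [(1, 4), (1, 4)], **leaf_a},
                     {"mbr": [(8, 9), (8, 9)], **leaf_b}]}

print([h["doc"] for h in search(root, [(0, 2), (0, 2)], [])])  # ['a']
```

The pruning step is what keeps reads near O(log n) when overlap is low; as MBR overlap grows, more subtrees must be visited and performance degrades toward the O(n) worst case noted on the X-Tree slide.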
35. Example: Mosaicked Video with KLV Footprints
• Rip through KLV metadata
• Index frame footprints and annotations as MBRs into the X(R*)-Tree
• Leverage Geo-Sharding for spatially relevant scale
36. Example Use Case – OSINT (Foursquare Data)
• Sample Foursquare data set mashed with Government Intel Data (poly reports)
• 100 million geo document test (3D points and polys)
• 4-server replica set
• ~350 ms query response
• ~300% improvement over PostGIS
EXAMPLE
37. Community Support
• Thermopylae plans to open source
– http://github.com/thermopylae
• TST working with 10gen to offer as a spatial extension
• Active developer collaboration
– IRC: #mongodb freenode.net
FIND US
40. Key Customers - Government
• US Dept of State Bureau of Diplomatic Security
  – Build and support a 30 TB Google Earth Globe, with multi-terabyte individual globes sent to embassies throughout the world. Integrated Google Earth and the iSpatial framework.
• US Army Intelligence Security Command
  – Provide expertise in managing technology integration – prime contractor providing operations, intelligence, and IT support worldwide. Partners include IBM, Lockheed Martin, Google, MIT, Carnegie Mellon. Integrated Google Earth and the iSpatial framework.
• US Southern Command
  – Coordinate intelligence management systems’ spatial data collection, indexing, and distribution. Integrated Google Earth, iSpatial, and iHarvest.
  – Index large-volume imagery and expose it to different services (Air Force, Navy, Army, Marines, Coast Guard)
GOVERNMENT CUSTOMERS
41. COMMERCIAL CUSTOMERS
Key Customers - Commercial
• Cleveland Cavaliers
• USGIF
• Las Vegas Motor Speedway
• Baltimore Grand Prix
iSpatial framework serves millions of mobile devices
42. • Expose and manage Multi-INT enterprise data in a geo-temporal user-defined environment
• Provide a flexible and scalable spatial data infrastructure (SDI) for Multi-INT data access and analysis
• Spatially referenced data visualization on a 3D globe & 2D maps
• Access real/near-real-time data feeds from forward-deployed devices
• Enable real-time information sharing and mission collaboration
ISPATIAL OVERVIEW