The document discusses the value of data and the rise of big data. It notes that Matthew Fontaine Maury in the 1800s recognized the value of analyzing ship log data collectively. Today, new sources of data like sensors have exploded the volume of data. Characteristics of big data include volume, variety, and velocity. Technological challenges include scalability, heterogeneity, and low latency. The document provides examples of non-relational databases and MapReduce as approaches to handle big data.
The document discusses the future of high performance computing (HPC). It covers several topics:
- Next generation HPC applications will involve larger problems in fields like disaster simulation, urban science, and data-intensive science. Projects like the Square Kilometer Array will generate exabytes of data daily.
- Hardware trends include using many-core processors, accelerators like GPUs, and heterogeneous computing with CPUs and GPUs. Future exascale systems may use conventional CPUs with GPUs or innovative architectures like Japan's Post-K system.
- The top supercomputers in the world currently include Summit, an IBM system at Oak Ridge combining Power9 CPUs and Nvidia Volta GPUs, and China's Sunway TaihuLight.
Multiple regression, COVID mobility and Covid-19 policy recommendation, by Kan Yuenyong
Multiple regression analysis of Covid-19 policy is a contemporary agenda. It demonstrates how to use Python for data wrangling and R for statistical analysis, in a form suitable for publication in a standard academic journal. The model examines whether lockdown policy is relevant to controlling the Covid-19 outbreak.
Architectures for Data Commons (XLDB 15 Lightning Talk), by Robert Grossman
These are the slides from a 5 minute Lightning Talk that I gave at XLDB 2015 on May 19, 2015 at Stanford. It is based in part on our experiences developing the NCI Genomic Data Commons (GDC).
Big data Mining Using Very-Large-Scale Data Processing Platforms, by IJERA Editor
Big Data consists of large-volume, complex, growing data sets with multiple, heterogeneous sources. With the tremendous development of networking, data storage, and data collection capacity, Big Data is now rapidly expanding in all science and engineering domains, including the physical, biological and biomedical sciences. MapReduce is a programming model with built-in parallel processing that allows easy development of scalable parallel applications to process big data on large clusters of commodity machines. Google’s MapReduce and its open-source equivalent Hadoop are powerful tools for building such applications.
This document discusses big data and analytics, outlining five trends and five research challenges. It begins by defining big data in terms of volume, velocity, variety, veracity and value. It then discusses the origins and evolution of big data, from early statistics to modern data science. Analytics is defined as using data to make empirically-derived, statistically valid decisions. The document outlines how hardware choices led to scaling out data processing across clusters rather than scaling up on single machines. It also provides examples of fields that generate huge volumes of data from billion dollar instruments like CERN's Large Hadron Collider and genomic sequencing facilities.
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014, by Robert Grossman
This document discusses how biomedical discovery is being disrupted by big data. Large genomic, phenotype, and environmental datasets are needed to understand complex diseases that result from combinations of many rare variants. However, analyzing large biomedical data is costly and difficult given the standard model of local computing. The document proposes creating large "commons" of community data and computing as an instrument for big data discovery. Examples are given of the Cancer Genome Atlas project, which has petabytes of research data on thousands of cancer patients, and how tumors evolve over time. Overall, the document argues that new models of shared biomedical clouds and commons are needed to enable cost-effective analysis of big biomedical data.
Building the Pacific Research Platform: Supernetworks for Big Data Science, by Larry Smarr
The document summarizes Dr. Larry Smarr's presentation on building the Pacific Research Platform (PRP) to enable big data science across research universities on the West Coast. The PRP provides 100-1000 times more bandwidth than today's internet to support research fields from particle physics to climate change. In under 2 years, the prototype PRP has connected researchers and datasets across California through optical networks and is now expanding nationally and globally. The next steps involve adding machine learning capabilities to the PRP through GPU clusters to enable new discoveries from massive datasets.
A modified k means algorithm for big data clustering, by SK Ahammad Fahad
The amount of data is getting bigger every moment, and this data comes from everywhere: social media, sensors, search engines, GPS signals, transaction records, satellites, financial markets, ecommerce sites, etc. This large volume of data may be semi-structured, unstructured or even structured, so it is important to derive meaningful information from this huge data set. Clustering is the process of categorizing data such that data are grouped in the same cluster when they are similar according to specific metrics. In this paper, we work on the k-means clustering technique to cluster big data. Several methods have been proposed for improving the performance of the k-means clustering algorithm. We propose a method for making the algorithm less time consuming, more effective and efficient, for better clustering with reduced complexity. According to our observation, the quality of the resulting clusters heavily depends on the selection of the initial centroids and on changes in data clusters in the subsequent iterations. As we know, after a certain number of iterations, only a small part of the data points change their clusters. Therefore, our proposed method first finds the initial centroids and puts an interval between those data elements which will not change their cluster and those which may change their cluster in the subsequent iterations, which reduces the workload significantly in the case of very large data sets. We evaluate our method with different sets of data and compare it with other methods as well.
This document discusses supercomputing, including its applications across multiple domains like bioinformatics, computational materials science, and computational fluid dynamics. It notes that supercomputers can solve problems too small, large, fast, or slow for normal laboratories. The document also outlines some current challenges for supercomputing, such as developing new architectures for next generation supercomputers and processing large datasets to extract useful information.
Using the Open Science Data Cloud for Data Science Research, by Robert Grossman
The Open Science Data Cloud is a petabyte scale science cloud for managing, analyzing, and sharing large datasets. We give an overview of the Open Science Data Cloud and how it can be used for data science research.
The document is a project report submitted by Suraj Sawant to his college on the topic of "Map Reduce in Big Data". It discusses the objectives, introduction and importance of big data and MapReduce. MapReduce is a programming model used for processing large datasets in a distributed manner. The document provides details about the various stages of MapReduce including mapping, shuffling and reducing data. It also includes diagrams to explain the execution process and parallel processing in MapReduce.
re:Introduce Big Data and Hadoop Eco-system, by Shakir Ali
This document provides an overview of big data and the Hadoop ecosystem. It defines big data as large and complex datasets that are difficult to process using traditional data management tools. Characteristics of big data include volume, variety, velocity and veracity. The document discusses challenges of managing big data and how Hadoop provides solutions through its distributed architecture. It also summarizes some prominent Apache projects in the Hadoop ecosystem like Pig, Hive, Spark and Hbase.
Here are a few key points on how adverse drug interactions could be avoided through database technology and the responsible departments:
- Prescription drug databases could track a patient's medications across different providers and flag potential drug interactions. This would require databases accessible by different healthcare organizations.
- Government health agencies would need to regulate and help coordinate such prescription drug databases. In the US, the Food and Drug Administration and Centers for Medicare and Medicaid Services could play a role.
- Individual healthcare providers also have a responsibility to access prescription histories from other providers, through available databases, before writing new prescriptions to avoid harmful interactions.
- With proper access controls and privacy protections, consolidated medication databases could help providers make more informed prescribing decisions and improve patient safety by preventing adverse drug interactions.
This document discusses the challenges of building a network infrastructure to support big data applications. Large amounts of data are being generated every day from a variety of sources and need to be aggregated and processed in powerful data centers. However, networks must be optimized to efficiently gather data from distributed sources, transport it to data centers over the Internet backbone, and distribute results. The unique demands of big data in terms of volume, variety and velocity are testing whether current networks can keep up. The document examines each segment of the required network from access networks to inter-data center networks and the challenges in supporting big data applications.
This document discusses the growing issue of large quantities of data from scientific research and simulations. It notes that data volumes are doubling every 9 months for genomic sequencing and that various scientific projects are now generating petabytes and exabytes of data. Current solutions for data management and analysis are often ad-hoc and inadequate for small and medium scale research. The document advocates for a national research cyberinfrastructure strategy in Canada that could provide standardized services and leverage partnerships with international organizations and commercial cloud providers to support scientists in dealing with large and complex research data. It raises questions about leadership and roles for universities, research organizations, and government in developing solutions.
A Model Design of Big Data Processing using HACE Theorem, by Anthony Otuonye
This document presents a model for big data processing using the HACE theorem. It proposes a three-tier data mining structure to provide accurate, real-time social feedback for understanding society. The model adopts Hadoop's MapReduce for big data mining and uses k-means and Naive Bayes algorithms for clustering and classification. The goal is to address challenges of big data and assist governments and businesses in using big data technology.
This document discusses enabling end-to-end eScience through integrating query, workflow, visualization, and mashups at an ocean observatory. It describes using a domain-specific query algebra to optimize queries on unstructured grid data from ocean models. It also discusses enabling rapid prototyping of scientific mashups through visual programming frameworks to facilitate data integration and analysis.
Smart Data and real-world semantic web applications (2004), by Amit Sheth
Probably the first recorded use of "smart data" for achieving the Semantic Web and for realizing productivity, efficiency, and effectiveness gains by using semantics to transform raw data into Smart Data.
2013 retake on this is discussed at: http://wiki.knoesis.org/index.php/Smart_Data
The document discusses tools for analyzing dark data and dark matter, including DeepDive and Apache Spark. DeepDive is highlighted as a system that helps extract value from dark data by creating structured data from unstructured sources and integrating it into existing databases. It allows for sophisticated relationships and inferences about entities. Apache Spark is also summarized as providing high-level abstractions for stream processing, graph analytics, and machine learning on big data.
Transforming Big Data into Smart Data for Smart Energy: Deriving Value via ha..., by Amit Sheth
Keynote at the Workshop on Building Research Collaboration: Electricity Systems. Purdue University, West Lafayette, IN. Aug 28-29, 2013.
Abstract:
Big Data has captured much interest in research and industry, with anticipation of better decisions, efficient organizations, and many new jobs. Much of the emphasis is on technology that handles volume, including storage and computational techniques to support analysis (Hadoop, NoSQL, MapReduce, etc.), and the challenges of the four Vs of Big Data: Volume, Variety, Velocity, and Veracity. However, the most important feature of data, the raison d'etre, is neither volume, variety, velocity, nor veracity -- but value. In this talk, I will emphasize the significance of Smart Data, and discuss how it can be realized by extracting value from Big Data. Accomplishing this task requires organized ways to harness and overcome the original four V-challenges; and while the technologies currently touted may provide some necessary infrastructure -- they are far from sufficient. In particular, we will need to utilize metadata, employ semantics and intelligent processing, and leverage some of the extensive work that predates Big Data.
For achieving energy sustainability, Smart Grids are known to transform the way we generate, distribute, and consume power. Unprecedented amount of data is being collected from smart meters, smart devices, and sensors all throughout the power grid. I will discuss the central question of deriving Value from the entire smart grid data deluge by discussing novel algorithms and techniques such as Semantic Perception for dealing with Velocity, use of ontologies and vocabularies for dealing with Variety, and Continuous Semantics for dealing with Velocity. I will discuss scenarios that exemplify the process of deriving Value from Big Data in the context of Smart Grid.
Additional background is at: http://wiki.knoesis.org/index.php/Smart_Data
A previous version of this talk with more technical details but not focused on energy: http://j.mp/SmatData
The story of how data became big starts many years before the current buzz around big data. The history of Big Data as a term may be brief – but many of the foundations it is built on were laid long ago. Now, let's look at a detailed account of the major milestones in the history of sizing data volumes, in the evolution of the idea of "big data", and observations pertaining to the data or information explosion:
In this workshop, we look at LinkedIn as the social network that best adds value to building a personal brand in the digital universe. Workshop held in partnership with Centro de Formação FLAG.
Communication and Marketing Strategy in the Digital Age, by Fábio Mesquita
The company announced a revolutionary new product that promises to change the market. The product uses cutting-edge technology to offer unprecedented features at an affordable price. Analysts predict the product will be widely accepted and could take the company into a new phase of growth.
Social Media Marketing & Copywriting | Content Development for Moche's Facebo..., by Cíntia Pereira
I've devoted myself to Moche, a mobile phone tariff, for 2 years. These are some of the contents I've created for its Facebook page, a community with 600,000+ fans.
The document announces an upcoming Annual Content Strategy Summit to be held in Berlin on February 16-17, 2017. It lists key speakers from companies like Microsoft, Audi, Google, SAP, Atos, Diageo, Schneider Electric, Dell Technologies, Oracle, The Economist Group, Salesforce, Cisco, and Metro AG. The summit will focus on topics like content for the customer journey, content distribution channels, creating a sustainable user experience, using a holistic content strategy, content recycling, incorporating virtual reality and IoT into content strategy, using a digital omni-channel strategy, personalizing content for business, and content creation for retail B2B.
The document discusses what WOM (Word of Mouth) is, the advantages of WOM and how to implement WOM strategies effectively. WOM consists of word-of-mouth communication between consumers. It offers advantages such as recommendations, low cost and improved brand quality and image. To implement WOM, one should know the target audience, create interesting content, involve influencers, give value to customers and measure the results.
The document summarizes the company Rotina Perfeita's strategy for Pinterest, including the creation of boards about its meal delivery, cleaning and pet care services. The strategy aims to promote the brand, publicize its services and increase traffic to the site, as well as engage customers with content and contests on social networks.
The document discusses marketing concepts, such as identifying and satisfying human needs (Kotler and Keller) and knowing the customer so well that the product sells itself (Drucker). It also addresses the goals of marketing management, such as creating value and aligning people with values, and the challenges of marketing, such as technological and behavioural change.
The document outlines 6 characteristics of compelling content: concept, structure, media, copy, stories, and challenges. It provides examples of different types of media that can be used in digital learning like videos, podcasts, slideshows, screencasts, and interactive materials. It discusses how to engage learners through relevance, stories, and goals that stretch them. The document promotes using a variety of media elements and focusing on one verbal and one visual element at a time according to multimedia learning theory. It concludes by stating compelling content can include these various applications and media for use in digital learning design programs.
This document presents a digital marketing plan for a company's corporate communication. The plan defines strategies for social networks, search marketing and mobile marketing, with the goal of increasing brand awareness in Latin America and supporting resellers and small businesses. It includes proposed content, actions and budgets for different target audiences and geographies.
The document provides an analysis of the Distrifa website in three main areas: indexing, keywords and links. The analysis concludes that the site needs to improve content indexing, optimize the use of keywords and descriptions, and develop internal and external links to increase traffic and visibility.
The document describes the business plan for developing an e-commerce platform for GadgetStore, including the target audience, competitive analysis, budget, website, and marketing and communication strategy.
Professional Identity - LinkedIn and Other Social Networks, by Márcio Miranda
The document discusses how recruiters use social networks to research candidates and the importance of building a positive professional identity online. It recommends completing profiles on networks such as LinkedIn, maintaining privacy on networks such as Facebook, and using websites and blogs to share knowledge and experience.
Machine Learning meets Granular Computing: the emergence of granular models in the Big Data era
** Presentation Slides from Dr Rafael Falcon, from Larus Technologies, for the February 2018 Ottawa Machine Learning & Artificial Intelligence Meetup
Abstract
Traditional Machine Learning (ML) models are unable to effectively cope with the challenges posed by the many V’s (volume, velocity, variety, etc.) characterizing the Big Data phenomenon. This has triggered the need to revisit the underlying principles and assumptions ML stands upon. Dimensionality reduction, feature/instance selection, increased computational power and parallel/distributed algorithm implementations are well-known approaches to deal with these large volumes of data.
In this talk we will introduce Granular Computing (GrC), a vibrant research discipline devoted to the design of high-level information granules and their inference frameworks. By adopting more symbolic constructs such as sets, intervals or similarity classes to describe numerical data, GrC has paved the way for a more human-centric manner of interacting with and reasoning about the real world. We will go over several granular models that address common ML tasks such as classification/clustering and will outline a methodology to appropriately design information granules for the problem at hand. Though not a mainstream concept yet, GrC is a promising direction for ML systems to harness Big Data.
Big Data, Big Deal: For Future Big Data Scientists, by Way-Yen Lin
Big Data, Big Deal is a document that discusses big data. It begins by defining big data as high-volume, high-velocity, and high-variety information that requires new processing methods. It then discusses the key drivers for big data, including technical drivers like increased data storage and social media, as well as business drivers like customer analytics and public opinion analysis. The document concludes by discussing challenges for big data like data quality, privacy, and the need for skilled data scientists with technical expertise, curiosity, storytelling abilities, and cleverness.
This document discusses big data, defining it as large volumes of both structured and unstructured data that are growing rapidly and are difficult to process using traditional database and software techniques. It notes that big data is characterized by high volume, velocity, and variety. The document outlines common big data architectures and technologies, provides examples of big data applications in government, private sector, and science. It also discusses challenges of big data as well as potential transformations and critiques, and proposes solutions to security issues.
The document discusses addressing data management challenges in the cloud. It begins by introducing the scale of digital data using common size prefixes like kilobyte and petabyte. It then discusses sources of massive data from sensors, social media, and scientific experiments. The challenges of big data are defined through the 3Vs model of increasing volume, velocity and variety of data types. Cloud computing architectures and delivery models like IaaS, PaaS and SaaS are introduced as ways to provide elastic resources for data management. The concept of polyglot persistence using the appropriate data store for the job is discussed over relying solely on relational databases.
Big Data is changing abruptly, and where it is likely heading, by Paco Nathan
Big Data technologies are changing rapidly due to shifts in hardware, data types, and software frameworks. Incumbent Big Data technologies do not fully leverage newer hardware like multicore processors and large memory spaces, while newer open source projects like Spark have emerged to better utilize these resources. Containers, clouds, functional programming, databases, approximations, and notebooks represent significant trends in how Big Data is managed and analyzed at large scale.
The Age of Exabytes: Tools & Approaches for Managing Big Data, by ReadWrite
This document discusses the rise of big data and innovations needed to manage exabytes of information. It covers developments in data storage from the chip level to large data centers. New databases like NoSQL are designed to handle non-relational and distributed data at web-scale. Real-time processing of big data requires distributed computing across many servers. Overall the document explores challenges posed by the growth of digital information and technological solutions emerging to process and analyze exabytes of data.
On the occasion of eGov Innovation Day 2014 - DONNÉES DE L’ADMINISTRATION, UNE MINE (qui) D’OR(t) - Philippe Cudré-Mauroux presents Big Data and eGovernment.
Big data: Challenges, Practices and Technologies, by Navneet Randhawa
This document summarizes discussions from a workshop organized by the National Institute of Standards and Technology's (NIST) Big Data Public Working Group. The workshop included four panels that discussed: 1) the current state of big data technologies; 2) future trends in big data hardware, computing models, analytics and measurement; 3) methods for improving big data sharing and collaboration; and 4) security and privacy concerns with big data. The panels featured presentations on topics such as big data reference architectures, use cases, benchmarks, data consistency issues, and approaches for enabling secure big data applications while preserving privacy.
This document discusses data mining on big data. It begins with an introduction to big data and data mining. It then discusses the characteristics of big data, known as the HACE theorem: big data is huge in volume, heterogeneous and diverse, from autonomous sources, and has complex and evolving relationships. It presents a conceptual framework for big data processing with three tiers: tier I focuses on low-level data access and computing using techniques like MapReduce; tier II concentrates on semantics, knowledge and privacy; and tier III addresses data mining algorithms. The document concludes that high performance computing platforms are needed to fully leverage big data.
This document discusses big data and data mining. It defines big data as large volumes of diverse data that require new techniques and technologies to capture, manage and analyze. Factors driving big data growth include increased data collection, sources like the internet and sensors, and data types like text, images and video. Challenges include rapid data growth and developing new algorithms and architectures to handle varied and massive datasets. Data mining extracts useful patterns from large datasets and is an interdisciplinary field drawing from machine learning, statistics, databases and other areas.
You’re not the only one still loading your data into data warehouses and building marts or cubes out of it. But today’s data requires a much more accessible environment that delivers real-time results. Prepare for this transformation because your data platform and storage choices are about to undergo a re-platforming that happens once in 30 years.
With the MapR Converged Data Platform (CDP) and Cisco Unified Compute System (UCS), you can optimize today’s infrastructure and grow to take advantage of what’s next. Uncover the range of possibilities from re-platforming by intimately understanding your options for density, performance, functionality and more.
This document discusses big data and data mining. It defines big data as large volumes of structured and unstructured data that are difficult to process using traditional techniques due to their size. It outlines the 4 Vs of big data: volume, velocity, variety, and veracity. The proposed system would use distributed parallel computing with Hadoop to identify relationships in huge amounts of data from different sources and dimensions. It discusses challenges of big data like data location, volume, privacy, and gaining insights. Solutions involve parallel programming, distributed storage, and access restrictions.
This document discusses data mining with big data. It defines big data and data mining. Big data is characterized by its volume, variety, and velocity. The amount of data in the world is growing exponentially with 2.5 quintillion bytes created daily. The proposed system would use distributed parallel computing with Hadoop to handle large volumes of varied data types. It would provide a platform to process data across dimensions and summarize results while addressing challenges such as data location, privacy, and hardware resources.
Massive Data Analysis - Challenges and Applications, by Vijay Raghavan
We highlight a few trends of massive data that are available for corporations, government agencies and researchers and some examples of opportunities that exist for turning this data into knowledge. We provide a brief overview of some of the state-of-the-art technologies in the massive data analysis landscape. Then, we describe two applications from two diverse areas in detail: recommendations in e-commerce, link discovery from biomedical literature. Finally, we present some challenges and open problems in the field of massive data analysis.
Making Small Data BIG (UT Austin, March 2016), by Kerstin Lehnert
Presentation given at the Texas Advanced Computing Center. It describes the potential of re-using small data for new science, achievements and the challenges to make small data re-usable.
Cloud Programming Models: eScience, Big Data, etc., by Alexandru Iosup
This document discusses cloud programming models. It begins by defining programming models and noting that they provide an abstraction of a computer system through a language, libraries and runtime system. It then lists some key characteristics of a cloud programming model including efficiency, scalability, fault tolerance and data models. The document outlines an agenda to cover programming models for compute-intensive and big data workloads. It provides examples of bags of tasks and workflow programming models and their applications in fields like bioinformatics.
This document discusses big data mining. It defines big data as large volumes of structured and unstructured data that are difficult to process using traditional methods due to their size. It describes the characteristics of big data including volume, variety, velocity, variability, and complexity. It also discusses challenges of big data such as data location, volume, hardware resources, and privacy. Popular tools for big data mining include Hadoop, Apache S4, Storm, Apache Mahout, and MOA. Hadoop is an open source software framework that allows distributed processing of large datasets across clusters of computers. Common algorithms for big data mining operate at the model and knowledge levels to discover patterns and correlations across distributed data sources.
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series), by eXascale Infolab
Internet Infrastructures for Big Data
Talk given at Verisign's Distinguished Speaker Series, 2014
Prof. Philippe Cudre-Mauroux
eXascale Infolab
http://exascale.info/
1. A GEO-stack for Big Data
Driving Spatial Analysis Beyond the Limits of Traditional Storage
Joana Simões
Bdigital, CASA, CICS.NOVA
March 12, 2015
2. Big Data Word Cloud; source: http://olap.com/big-data/
3. Table of Contents
1 The Value of Data
2 The Big Data Revolution
3 A Use Case
4 Final Remarks
4. Warning
* This presentation may contain tech talk, such as: databases, clusters, NoSQL, cloud, parallelism, scalability.
If you are susceptible to these concepts, you may want to leave the room now.
From this point onward, you are at your own risk! *
6. The Value of Data: Not a New Idea!
Matthew Fontaine Maury (1806-1873).
Foresaw the hidden value in captains' ship logs when analysed collectively.
Used time series data to carry out analysis that would enable him to recommend optimal shipping routes.
10. The Value of Data: Data Mining, Open-source and Crowd-sourcing
In 1848, Captain Jackson was the first person to try the route recommended by Maury, and as a result he was able to save 17 days on his outbound trip.
Apart from collecting existing logs, Maury encouraged the collection of more regular and systematic time series, by creating a template.
Collected data: longitude, latitude, currents, magnetic variation, air and water temperature, general wind direction, etc.
11. Data Analysis: A multidisciplinary field
12. Data Analysis: Traditional Stack
Spreadsheets (e.g.: Excel, OpenOffice)
RDBMS (e.g.: Oracle, PostgreSQL, MySQL)
Statistical Packages (e.g.: R, Matlab)
GIS Packages (e.g.: QGIS)
Scripting and programming languages (e.g.: Python, Java)
Libraries
13. Big Data: What changed in recent years?
Differences in the way global business and transportation are done have exploded the volume of traditional data sources.
Widespread increase in data from sensors.
17. Big Data: Smart Citizen Project
Platform to generate participatory processes of people in the cities, by connecting data, people and knowledge.
Based on geolocation, the Internet, and free hardware and software for data collection and sharing.
Relies on the production of objects to connect people with their environment and their city.
https://smartcitizen.me/
20. Big Data
A great deal of this data is actually geo-located (e.g.: satellite navigation coordinates, IP addresses).
Geography finally has the opportunity to switch from being based on guesses and samples to becoming a truly data-driven science.
Based on reports, Filippo Arrieta published maps describing a multi-stage containment plan designed to limit the plague in Bari (1694).
The iLab used the Ushahidi platform to collect and display crowd-sourced information about Ebola in Liberia (2014).
24. Big Data: What characterizes Big Data?
Volume: Large amounts of data.
Variety: Different types of structured, unstructured, and multi-structured data.
Velocity: Needs to be analyzed quickly.
25. Technological Challenges
These characteristics map into challenges:
Scalability
Heterogeneity
Low latency
When the traditional stack is no longer enough, a paradigm shift is required.
30. RDBMS vs NoSQL
NoSQL databases trade away some capabilities of relational databases (SQL) in order to improve scalability.
Advantages: NoSQL databases are simpler, can handle semi-structured and denormalized data, and have higher scalability.
Disadvantages: loss of the abstraction provided by the query optimizer, which increases the complexity of the applications.
Recently, tools were developed that bring back the full power of the SQL language to the NoSQL ecosystem (e.g.: Apache Drill, Hive).
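To make the trade-off concrete, here is a minimal sketch (not from the deck) that stores the same geo-tagged record both ways: normalized relational tables in Python's built-in sqlite3, and a denormalized JSON document of the kind a document store would keep. The table layout, field names and sample values are illustrative assumptions.

```python
# Minimal sketch: the same geo-tagged record stored two ways.
import json
import sqlite3

# --- Relational, normalized: users and tweets in separate tables, joined on demand.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tweets (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id),
                         text TEXT, lon REAL, lat REAL);
""")
db.execute("INSERT INTO users VALUES (1, 'joana')")
db.execute("INSERT INTO tweets VALUES (10, 1, 'Bon dia Barcelona', 2.17, 41.39)")
rows = db.execute("""
    SELECT u.name, t.text, t.lon, t.lat
    FROM tweets t JOIN users u ON u.id = t.user_id
""").fetchall()
print(rows)

# --- Document-oriented, denormalized: everything about the tweet in one record,
# so it can be read or sharded without a join (the scalability gain), at the cost
# of duplicated user data and of losing the query optimizer's help.
doc = {
    "tweet_id": 10,
    "user": {"id": 1, "name": "joana"},
    "text": "Bon dia Barcelona",
    "coordinates": [2.17, 41.39],
}
print(json.dumps(doc, indent=2))
```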
31. RDBMS vs NoSQL
32. Examples
35. MapReduce
Programming model for processing and generating large data sets with a parallel, distributed algorithm on a cluster.
Map(): performs filtering and sorting.
Reduce(): performs a summary operation.
The framework coordinates the processing by marshalling the distributed servers, running the tasks in parallel, and managing all communications and data transfers between the various parts of the system.
There are many libraries that implement MapReduce.
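As a minimal illustration of the model (plain Python standing in for a real framework such as Hadoop; the word-count task and the sample documents are assumptions, not taken from the deck), the sketch below runs the two phases on a toy corpus and simulates the shuffle step with a dictionary grouping.

```python
# Minimal word-count sketch of the MapReduce model in plain Python (illustration
# only; a real framework distributes these phases across a cluster).
from collections import defaultdict

def map_phase(doc_id, text):
    """Map(): emit (key, value) pairs -- here, (word, 1) for every word."""
    for word in text.lower().split():
        yield word, 1

def reduce_phase(word, counts):
    """Reduce(): summarise all values seen for one key -- here, sum the 1s."""
    return word, sum(counts)

documents = {
    1: "big data needs big infrastructure",
    2: "big data is geo located data",
}

# Shuffle: group every emitted value under its key (done by the framework in Hadoop).
grouped = defaultdict(list)
for doc_id, text in documents.items():
    for word, one in map_phase(doc_id, text):
        grouped[word].append(one)

word_counts = dict(reduce_phase(w, c) for w, c in grouped.items())
print(word_counts)   # e.g. {'big': 3, 'data': 3, ...}
```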
A Big Data Approach
Does this mean we have to throw away our traditional tools and
methods?
First, ensure that you really have, or will have, Big Data at
some point in the future.
Then identify the stages of the workflow that are bottlenecks in
terms of the current technologies.
It is possible to mix and match traditional and big data tools.
Analysing geo-located Tweets
The purpose of this use case was to analyse the stream of
geo-located Tweets as a sensor of citizen presence.
Number of tweets in Catalunya over around 3 months: approximately
6 million.
The data arrive as a continuous stream.
This amount of data is not easily assimilated by the human eye,
so we decided to create clusters of Tweets.
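To give an idea of the kind of pre-processing involved (a sketch under the
assumption that tweets arrive as JSON records with a GeoJSON-style
"coordinates" field; this is not the project's actual collector), the stream
can be reduced to (longitude, latitude) points before any clustering:

import json

def geolocated_points(lines):
    # Keep only records that actually carry a point geometry.
    for line in lines:
        tweet = json.loads(line)
        coords = tweet.get("coordinates")  # assumed GeoJSON-style field
        if coords and coords.get("type") == "Point":
            lon, lat = coords["coordinates"]
            yield lon, lat

# Two fake records stand in for the live stream:
sample = [
    '{"text": "hola", "coordinates": {"type": "Point", "coordinates": [2.17, 41.38]}}',
    '{"text": "no location", "coordinates": null}',
]
print(list(geolocated_points(sample)))  # [(2.17, 41.38)]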
Clustering
It is a descriptive data mining technique, often used for
dimensionality reduction.
It groups a set of objects in such a way that objects in the same
group are more similar to each other than to objects in other
groups.
Strictly speaking, it corresponds to a family of unsupervised
machine learning algorithms.
We wanted to implement this concept using only Hadoop, and apply
it to spatial attributes.
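A minimal sketch of how the spatial binning could be expressed as a single
Hadoop Streaming job: the mapper snaps each tweet's coordinates to a grid
cell and the reducer counts tweets per cell, producing a density grid. The
cell size, the input format (one "lon lat" pair per line) and the script
names are assumptions for illustration, not the project's actual code.

import sys

CELL = 0.01  # assumed grid resolution, in degrees

def mapper(stdin=sys.stdin, stdout=sys.stdout):
    # Emit "cell_x,cell_y<TAB>1" for the grid cell each point falls into.
    for line in stdin:
        lon, lat = map(float, line.split()[:2])
        cell = "%d,%d" % (int(lon // CELL), int(lat // CELL))
        stdout.write(cell + "\t1\n")

def reducer(stdin=sys.stdin, stdout=sys.stdout):
    # Hadoop delivers the mapper output sorted by key, so equal cells arrive
    # consecutively; summing their values yields the density per cell.
    current, count = None, 0
    for line in stdin:
        cell, value = line.rstrip("\n").split("\t")
        if cell != current and current is not None:
            stdout.write("%s\t%d\n" % (current, count))
            count = 0
        current = cell
        count += int(value)
    if current is not None:
        stdout.write("%s\t%d\n" % (current, count))

if __name__ == "__main__":
    # The same script can serve as both phases, e.g. "grid.py map" as the
    # -mapper and "grid.py reduce" as the -reducer of a streaming job.
    mapper() if sys.argv[1:] == ["map"] else reducer()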
Technological Stack
Workflow
Results
Using this workflow we were able to turn the original point cloud
into a density grid, and then into clusters.
Clusters expose patterns of presence over time, and can help us
re-define boundaries for the city that are data-driven rather
than administrative.
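One straightforward way to go from the density grid to clusters (again a
sketch of the idea, with an invented threshold, rather than the actual
implementation) is to keep only the cells above a density threshold and
merge neighbouring dense cells with a flood-fill / connected-components pass:

from collections import deque

def cluster_cells(density, threshold):
    # `density` maps (cell_x, cell_y) -> tweet count, i.e. the output of the
    # density-grid step; returns a list of clusters, each a list of cells.
    dense = {cell for cell, count in density.items() if count >= threshold}
    seen, clusters = set(), []
    for start in dense:
        if start in seen:
            continue
        seen.add(start)
        queue, cluster = deque([start]), []
        while queue:  # breadth-first flood fill over the 8 neighbouring cells
            x, y = queue.popleft()
            cluster.append((x, y))
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (x + dx, y + dy)
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        clusters.append(cluster)
    return clusters

grid = {(0, 0): 120, (0, 1): 80, (5, 5): 200, (9, 9): 3}
print(cluster_cells(grid, threshold=50))
# e.g. [[(0, 0), (0, 1)], [(5, 5)]] (cluster order depends on set iteration)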
Final Remarks
It is OK to use non-scalable tools at certain points of the
workflow.
We are working on the edge of existing technologies; some
functions are not implemented yet, and bugs are common.
Since it is a niche, this is even more true for spatial
technologies.
There are no ready-made solutions: the particular stack and
workflow should be assembled for the specific case study.
This is one stack to solve this problem; it is not the only one,
and it may not even be the "best" one.
Acknowledgements
I would like to thank Ellen Friedman (MapR, Apache Mahout, Apache
Drill) for her inspiring work on Big Data and Machine Learning.
References
Aji, A., Wang, F., Vo, H., Lee, R., Liu, Q., Zhang, X. and Saltz,
J. "Hadoop-GIS: A High Performance Spatial Data Warehousing
System over MapReduce". Proceedings of the VLDB Endowment, 6(11),
p. 1009 (2013).
Cuesta, H. "Practical Data Analysis". Packt Publishing (2013).
Dunning, T. and Friedman, E. "Time Series Databases: New Ways to
Store and Access Data". O'Reilly Media, 1st edition (October
2014).
Myatt, G. and Johnson, W. "Making Sense of Data I: A Practical
Guide to Exploratory Data Analysis and Data Mining". Wiley, 2nd
edition (2014).
Thank You!
This presentation is available at:
http://tinyurl.com/pcmgzxp
Next: 9as Jornadas de SIG Libre, 26th-27th March 2015, Girona.