This talk is about how both private enterprise and government seek to increase the value of their data and how they address this challenge. It summarizes how we think about Big Data and Open Data and their use by organizations and individuals. Big Data is explained in terms of collection, storage, analysis and valuation. This data is collected from numerous sources, including sensor networks, government data holdings, company market databases, and public profiles on social networking sites. Organizations use many data analysis techniques to study both structured and unstructured data, and the volume, velocity and variety of that data have driven the development of specific techniques: MapReduce, Hadoop and related tools such as RHadoop are widely used today.
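The MapReduce model named above can be illustrated with a minimal, framework-free Python sketch of the classic word count. This is only a single-process illustration of the three phases; a real Hadoop job distributes the map and reduce work across many nodes:

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in one document.
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(mapped_pairs):
    # Shuffle: group all emitted pairs by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts collected for each word.
    return {word: sum(counts) for word, counts in groups.items()}

documents = ["big data open data", "open data for smart cities"]
mapped = list(chain.from_iterable(map_phase(d) for d in documents))
counts = reduce_phase(shuffle_phase(mapped))
```

Because each map call touches only one document and each reduce call touches only one key, both phases parallelize naturally, which is exactly what makes the model suitable for the data volumes discussed here.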
In this talk several applications and case studies are presented as examples. Data that comes from government sources should be open, and every day more cities and countries are opening their data. Open Data is then presented as a specific case of public data with a special role in the Smartcity. The main goal of Big and Open Data in the Smartcity is to develop systems that are useful for citizens. As an example, RMap (Mapa de Recursos) is shown as an Open Data application: an open system for the Madrid City Council, available for smartphones and developed entirely by the G-TeC research group (www.tecnologiaUCM.es).
Big & Open Data Challenges for Smartcity - PIC2014 Shanghai
1. Big and Open Data
Challenges for Smartcity
Victoria López
Grupo G-TeC
www.tecnologiaUCM.es
Universidad Complutense de Madrid
2. Big and Open data. Challenges for Smartcity
• Introduction
• Fighting with Big Data: Genome Data
• Big Data. Big Projects
• Open Data. Technology Transfer Opportunities
• Smartcity. Big and Open Systems
• Madrid as Smartcity
• Conclusions
3. Introduction
Our Goal: to transfer technology and knowledge
– Mobile technologies applied to the environment
– Intelligent agents
– Optimization and forecasting from data
– Bioinformatics, Biostatistics
G-TeC group: statisticians, physicists, mathematicians,
economists and several computer scientists.
– www.tecnologiaUCM.es
4. Fighting with Big Data
• Every day we need to deal with more and more data.
• For many years, new computers with more memory and higher
speed seemed to be the solution to data growth (the "elephant vendors").
• Many research areas have long been fighting with big data:
in bioinformatics, genome data (DNA, RNA, proteins and, in
general, all biological data) has required computational monitoring
and storage in large databases in laboratories and research
centers around the world.
"The future of genomics rests on the foundation of the Human Genome Project"
5. Fighting with Big Data
• Whenever an organization or an individual is unable
to deal with its data, it is facing a big data problem.
• The Human Genome Project was managed with the same
philosophy as modern Big Data: large databases
distributed around the world, with parallel processing
when available and suitable.
• Our experience: sequence alignment and its
optimization with dynamic programming and
its heuristics.
• The accumulated body of biological data is itself a big database.
• Adding new sequences, searching and forecasting are
tasks very similar to those we face in every Big Data
problem.
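The dynamic-programming alignment mentioned above can be illustrated with a minimal Needleman-Wunsch global-alignment scorer in Python. This is an educational sketch (the function name and parameter defaults are chosen here for illustration); the +5/-4 match/mismatch scheme mirrors the FASTA scoring matrix shown later in the talk, and production tools such as BLAST and FASTA add heuristics on top to avoid filling the full dynamic-programming matrix against an entire database.

```python
def align_score(a, b, match=5, mismatch=-4, gap=-4):
    """Global alignment score via Needleman-Wunsch dynamic programming."""
    # dp[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    # Aligning a prefix against the empty string costs one gap per symbol.
    for i in range(1, len(a) + 1):
        dp[i][0] = i * gap
    for j in range(1, len(b) + 1):
        dp[0][j] = j * gap
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            # Either align a[i-1] with b[j-1], or gap one of the sequences.
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[len(a)][len(b)]
```

The quadratic cost of this matrix fill is exactly what heuristic database-search tools work around when scanning millions of sequences.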
6. Vineyards in La Geria, Lanzarote (22/05/2014)
Case of Use: Looking for a Fungus
• Application to infections in agricultural
crops when it is not possible to identify
the actual fungus.
• The person responsible needs to decide
what to do: which treatment to apply,
or which procedure is best.
– A fragment of the fungus's DNA must be
sequenced in the lab.
– The scientist then looks it up in molecular
databases by means of sequence
searching ("DB homology search").
– Alignment algorithms (BLAST, FASTA)
are executed to return the best matches.
• gtttacgctctacaaccctttgtgaacatacctacaactgttg
cttcggcgggtagggtctccgcgaccctcccggcctcccgcct
ccgggcgggtcggcgcccgccggaggataaccaaactctgatt
taacgacgtttcttctgagtggtacaagcaaataatcaaaact
tttaacaaccggatctcttggttctggcatcgatgaagaacgc
agcgaaatgcgataagtaatgtgaat
The sequence
7. Data Base and Algorithm Selection
1. EBI: European Bioinformatics Institute
2. Choose the tools available on the web site
a. Fasta3
b. Select DATABASE:
• Nucleic acids
• Fungi
c. Paste the sequence and run the query
3. A sorted (though not exhaustive) list of matches, from best
to worst similarity, is returned.
PIC 2014, Shanghai
13. The output
FASTA searches a protein or DNA sequence data bank
version 3.3t09 May 18, 2001
Please cite: W.R. Pearson & D.J. Lipman, PNAS (1988) 85:2444-2448
@:1-: 241 nt
vs EMBL Fungi library
searching /ebi/services/idata/v225/fastadb/em_fun library
104701680 residues in 66478 sequences
statistics extrapolated from 60000 to 61164 sequences
Expectation_n fit: rho(ln(x))= -1.2290+/-0.000361; mu= 72.1313+/- 0.026
mean_var=907.6270+/-295.007, 0's: 68 Z-trim: 4246 B-trim: 15652 in 3/79
Lambda= 0.0426
FASTA (3.39 May 2001) function [optimized, +5/-4 matrix (5:-4)] ktup: 6
join: 48, opt: 33, gap-pen: -16/ -4, width: 16
Scan time: 3.180
The best scores are: opt bits E(61164)
EM_FUN:CGL301988 AJ301988.1 Colletotrichum glo (1484) [f] 1184 88 5.7e-17
EM_FUN:AF090855 AF090855.1 Colletotrichum gloe ( 500) [f] 1205 88 7.3e-17
EM_FUN:CGL301986 AJ301986.1 Colletotrichum glo (1484) [f] 1166 87 1.2e-16
EM_FUN:CGL301908 AJ301908.1 Colletotrichum glo (2868) [f] 1148 87 1.3e-16
EM_FUN:CGL301909 AJ301909.1 Colletotrichum glo (2868) [f] 1148 87 1.3e-16
EM_FUN:CGL301907 AJ301907.1 Colletotrichum glo (2867) [f] 1148 87 1.3e-16
EM_FUN:CGL301919 AJ301919.1 Colletotrichum glo (1171) [f] 1166 87 1.6e-16
EM_FUN:CGL301977 AJ301977.1 Colletotrichum glo (1876) [f] 1148 86 2e-16
EM_FUN:CFR301912 AJ301912.1 Colletotrichum fra (2870) [f] 1137 86 2.1e-16
14. Our background in Bioinformatics
• Bioinformatics (Master in Research in
Informatics, UCM)
• Several Master's theses & publications
– Alignment of sequences with R and RHadoop
– Analysis & visualization with the R language and
Chernoff faces
– Others
15. Big Data
From the Data Warehouse to Big Data (large databases)
• 1970: the relational model is invented
• RDBMSs remained the declared mainstream through the 90s
• One-size-fits-all "elephant vendor" systems: heavily
encoded storage, with indexing by B-trees
16. Alex "Sandy" Pentland, director of the Media Lab at the
Massachusetts Institute of Technology (MIT):
"The big data revolution",
2013 Campus Party Europe
Nowadays business needs high
availability of data, so new
techniques must be developed:
complex analytics, graph databases
Data volume is increasing
exponentially
– a 44x increase from 2009 to 2020
– from 0.8 zettabytes to 35 ZB
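A quick back-of-the-envelope check of these growth figures (an illustrative calculation, not from the slides): going from 0.8 ZB to 35 ZB over the eleven years from 2009 to 2020 is indeed roughly a 44x increase, which corresponds to about 41% compound annual growth.

```python
# Sanity-check the slide's data-volume growth figures.
start_zb, end_zb = 0.8, 35.0       # data volume in zettabytes
years = 2020 - 2009                # 11-year span

factor = end_zb / start_zb         # total growth factor (~44x)
cagr = factor ** (1 / years) - 1   # implied compound annual growth rate

print(f"{factor:.1f}x overall, {cagr:.0%} per year")
```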
17. Who generates Big Data?
Unstructured data
Progress and innovation are no longer hampered by the ability to collect data,
but by the ability to manage, analyze, synthesize, visualize, and discover
knowledge from the collected data in a timely and scalable way.
19. From data to value
• Big Data Collection
– Monitoring
– Data cleaning and integration
– Hosted Data Platforms and the Cloud
• Big Data Storage
– Modern Data Bases
– Distributed Computing Platforms
– NoSQL, NewSQL
• Big Data Systems
– Security
– Multicore scalability
– Visualization and User Interfaces
• Big Data Analytics
– Fast algorithms
– Data compression
– Machine learning tools
– Visualization & Reporting
The MIT proposed list of stages
for dealing with Big Data
20. Big Data in use
1. High availability is now a requirement
2. Hosted (not only in-house) and cloud computing
3. Running in parallel
1. Data aggregation process
2. Analytics on data
3. Graph DBMS similarities
4. Not only SQL: Cassandra* and MongoDB**
*The Apache Cassandra database is the right choice when you need
scalability and high availability without compromising performance.
**Document-oriented storage
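To make the "document-oriented storage" point concrete, here is a minimal sketch in Python of the document model that stores like MongoDB use: schema-flexible records, where each document may carry different fields, queried by field values. The sample data and the `find` helper are invented for illustration; MongoDB's real query API differs (e.g. `collection.find({"type": "electric"})`).

```python
# Schema-flexible "documents": each record may carry different fields,
# unlike rows in a fixed relational schema. (Illustrative data.)
stations = [
    {"_id": 1, "type": "gas", "district": "Centro"},
    {"_id": 2, "type": "electric", "district": "Retiro", "plugs": 4},
]

def find(collection, **criteria):
    """Return the documents whose fields match all given criteria."""
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in criteria.items())]

electric = find(stations, type="electric")
```

The point of the model is that adding a new field (like `plugs` above) requires no schema migration: documents that lack it simply do not match queries on it.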
21. Big Data: MapReduce
• Main feature: scalability to many nodes
– Scan of 100 TB on 1 node @ 50 MB/sec = 23 days
– Scan on a cluster of 1000 nodes = 33 minutes
• MapReduce
– A parallel programming model
– A simple, smart concept, suitable for many applications
– Handles big datasets across multi-node, multiprocessor systems
– Sets of nodes: clusters or grids (distributed programming)
• By Google (2004)
– Able to process 20 PB per day
– Based on map & reduce, classical methods from functional
programming related to the classic divide & conquer
– Originating in numerical analysis (large matrix products)
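The map, shuffle and reduce phases described above can be sketched in plain Python with the classic word-count example. This is a single-process illustration of the programming model only; real frameworks such as Hadoop distribute the map and reduce tasks across the cluster's nodes and handle failures and data locality.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in one input split.
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group all intermediate values by their key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: aggregate the grouped values for one key.
    return key, sum(values)

def mapreduce(documents):
    mapped = chain.from_iterable(map_phase(d) for d in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

counts = mapreduce(["big data", "big open data"])
```

In a real cluster, each call to `map_phase` runs where its input split is stored, which is what turns the 23-day single-node scan above into a 33-minute parallel one.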
22. Big Data: MapReduce
• Friendly for non-technical users
24. More technical information
• http://www.slideshare.net/vlopezlo
• www.hortonworks.com
• www.coursera.com
• www.Bigdatauniversity.com
• www.mit.edu
25. Technology Transfer Opportunities
• A great opportunity for researchers working in technology
transfer, who can focus their efforts on
developing new techniques for the optimization of:
– Monitoring data (sensors, smartphones, …)
– Storing data (cloud computing: Amazon S3, EC2, Google
BigQuery, Tableau, …)
– Cleaning, integrating & processing data (Data Curation at
Scale: The Data Tamer System, M. Stonebraker et al., CIDR 2013)
– Analysing data (R, SAS, … but also Google, Amazon, eBay, …)
– Encryption & searching over encrypted data
– Data mining techniques (machine learning, data
clustering, predictive models, …) made compatible
with big data through complex analytics
26. Big Data. Big Projects.
• Google
• eBay
• Amazon
• Twitter
• …
• They develop big projects with their big data,
but many other businesses also analyse the
data they collect.
• Government data. Public data.
29. Academia & Industry Working Together
[Diagram: the OMUS collaboration cycle between academia and industry.
Industry contributes know-how, expertise and data collection; the
university contributes theoretical models and research; the shared Big
Data and analytics work yields jointly supervised doctoral theses,
patents, intellectual property and other output.]
30. Open Data
“Open data is data that can be freely used, reused and redistributed by anyone –
subject only, at most, to the requirement to attribute and share alike.”
(OpenDefinition.org)
• Availability and Access: the data must be available as a whole and at
no more than a reasonable reproduction cost, preferably by downloading
over the internet. The data must also be available in a convenient and
modifiable form.
• Reuse and Redistribution: the data must be provided under terms that
permit reuse and redistribution, including intermixing with other
datasets. The data must be machine-readable.
• Universal Participation: everyone must be able to use, reuse and
redistribute – there should be no discrimination against fields of
endeavour or against persons or groups. For example, ‘non-commercial’
restrictions that would prevent ‘commercial’ use, or restrictions of use
for certain purposes (e.g. only in education), are not allowed.
33. Open Data for Smartcity
• What can a citizen expect when living in a city?
• Internet of Things
– Libraries
– Public transportation, traffic monitoring
– Pets, devices, cars, even people
• Intelligent agents
– Interacting without our control
– Credit card monitoring (BBVA use case)
34. CKAN
• The Comprehensive Knowledge Archive Network (CKAN) is a
web-based open-source data management system for the storage
and distribution of data, such as spreadsheets and the contents
of databases. It is inspired by the package management
capabilities common to open-source operating systems like Linux.
• Its code base is maintained by the Open Knowledge
Foundation.
• The system is used both as a public platform on the Datahub and
in various government data catalogues (the UK's data.gov.uk, the
Dutch National Data Register, the United States government's
Data.gov and the Australian government's "Gov 2.0").
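CKAN portals expose their catalogues programmatically through the CKAN Action API (version 3), whose JSON responses are wrapped in a standard success/result envelope. A small illustrative sketch in Python (the portal URL and helper names here are chosen for this example, not taken from the talk):

```python
import json
from urllib.parse import urljoin

def action_url(portal, action):
    """Build a CKAN Action API (v3) URL for a given action name."""
    return urljoin(portal, "/api/3/action/" + action)

def parse_response(body):
    """Unwrap CKAN's standard response envelope, raising on failure."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise RuntimeError(str(payload.get("error", "CKAN request failed")))
    return payload["result"]

# Example: the URL that lists every dataset on a portal.
url = action_url("https://demo.ckan.org", "package_list")

# A successful response body looks like this envelope:
sample = '{"success": true, "result": ["dataset-a", "dataset-b"]}'
datasets = parse_response(sample)
```

Fetching `url` with any HTTP client and passing the body to `parse_response` would yield the portal's dataset identifiers; the same envelope wraps every action, such as `package_show` or `resource_search`.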
36. Smartcity concept
• Large numbers of people. Big cities.
– Spot the 7 thousand differences
• The Smartcity business.
• The role of technology in the city: efficiency & security
• Standardization of the Smartcity concept (May 2014)
– Better quality of life. Security
– Sustainability
– Innovation opportunities
– Multidisciplinary: social researchers, engineers, architects, …
• Relationships are changing, now based on mobile
technologies (smartphones, tablets, the Internet of
Things, …)
• Cross-cutting development projects: sensors and monitoring
devices, connectivity, platforms, services in the cloud.
37. Smartcity concept
• A large amount of unstructured information
• Machine learning, big data technologies, the Internet
of Things and intelligent systems are needed.
• Technology development as a service in all areas:
1. Structure:
– environment, infrastructure (water, energy, materials,
mobility, nature), built domain
2. Society:
– public space, functions, people
3. Data:
– information flows, performance
38. Mariam Saucedo
Pilar Torralbo
Daniel Sanz
Recycla.me
Ana Alfaro
Sergio Ballesteros
Lidia Sesma
Héctor Martos
Álvaro Bustillo
Arturo Callejo
Belén Abellanas
Jaime Ramos
Ignacio P. de Ziriza
Victor Torres
Alberto Segovia
Miguel Bueno
Mar Octavio de
Toledo
Antonio Sanmartín
Carlos Fernández
MAPA DE RECURSOS
RECYCLA.TE
39. Madrid – Smart City
• Parks and gardens
• Parking for
– Cars
– Motorbikes
– Bikes
• Recycling points
– Fixed
– Mobile
– Clothes
• Stations
– Bioethanol
– Gas
– Oil
– Electric
• Routes for bikes
– Cycle lanes (vías ciclistas)
– Safe streets (calles seguras)
• Residential Priority Areas
47. Conclusions
Big Data, Open Data and Smartcity
• A great opportunity for researchers working in technology
transfer, who can focus their efforts on developing
new techniques for the optimization of:
– Monitoring data
– Storing data
– Cleaning, integrating & processing data
– Analysing data
– Encryption & searching over encrypted data
– Data mining techniques
• Promising future work on the development of new smart
cities in environment, security and infrastructure.
48. Big and Open Data
Challenges for Smartcity
Victoria López
Grupo G-TeC
www.tecnologiaUCM.es
Universidad Complutense de Madrid