This document discusses how data visualization can make data science more tangible and accessible. It provides examples of using maps and interactive visualization to tell stories with data and simplify complex datasets. Visualization tools can turn raw data into insights by aggregating information over space and time. The document advocates using all dimensions of data, like integrating 3D models with spatiotemporal data, to better understand assets and dynamic systems.
Since GeoJSON is a standard for storing geographic data in JSON format, it is a best practice to adhere to this format when storing geo-coordinates in Cloudant and CouchDB.
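As a minimal sketch of what that looks like in practice, here is a Cloudant/CouchDB-style document embedding a GeoJSON geometry (the `_id`, station name, and coordinates are made-up illustrations, not values from any real database):

```python
import json

# A minimal CouchDB/Cloudant document embedding a GeoJSON geometry.
# Field names other than "type", "geometry", and "properties" are
# illustrative; GeoJSON itself only mandates the geometry structure.
doc = {
    "_id": "station-001",            # hypothetical document id
    "type": "Feature",
    "geometry": {
        "type": "Point",
        # GeoJSON coordinate order is [longitude, latitude]
        "coordinates": [-122.4194, 37.7749],
    },
    "properties": {"name": "San Francisco"},
}

# The geometry serializes to plain JSON, ready to store as-is
print(json.dumps(doc["geometry"]))
```

Storing the geometry in this shape means geospatial indexes and downstream GeoJSON-aware tools can consume the document without any translation step.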
One of the first steps of crisis management is the mapping of the emergency. The goal is to determine the area affected and the geographical constraints. This session looks at the best solutions available to map crises.
Mike King, Global Public Safety Manager, CAD/911/FirstNet, Esri & Francisco Nobre, Business Partner Coordinator, Esri
Presentation 'about the (very nearby) future of GIS' for GeoScience students, Universiteit Utrecht. I also included a few skill recommendations, which I will blog about later.
Visual Resources Association Annual Conference
March 26-29, 2019, Los Angeles
Session: Mapping New Vistas: Employing Emerging Technologies into Your Visual Resources Services
Presenter: Jon Cartledge
Collections as Data National Forum - Mary Elings
From May 7-8, the University of Nevada Las Vegas will hold a second Collections as Data national forum. During the forum a group of librarians, technologists, archivists, and disciplinary researchers will gather to share their work with collections as data, reality test project deliverables, and help frame future directions for collections as data work writ large.
My closing keynote at GISRUK 2019 - a call to arms for a human approach in a digital world, reflecting in a light-hearted and personal way on GIS industry trends, careers and how to succeed in GIS deployments and applications.
GISRUK is an annual GIS research conference attracting around 200 academic researchers from around the UK and beyond, each year held at a different university. The 2019 conference took place in Newcastle upon Tyne in April 2019. Info: https://gis.geos.ed.ac.uk/gisruk/gisruk.html
Spatial is (not) special - Adventures in location-based data - Thierry Gregorius
Delivered to the BCS Data Management forum, an overview of GIS/Geospatial trends, the need for spatial integrity, why spatial intelligence doesn't need a map, and creative curveballs like the enduring benefits of analog tools and handmade craftsmanship.
What is network analysis good for, and how can you apply it yourself using open source tools? A demo is shown, making a plot of the matatu bus network in Nairobi, Kenya.
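As a toy illustration of the kind of network analysis described above, the sketch below models a made-up mini bus network with plain Python (stops as nodes, routes as edges) and finds the busiest hub by degree; a real analysis like the Nairobi matatu demo would typically use an open source graph library such as networkx:

```python
from collections import defaultdict

# Hypothetical mini bus network: each tuple is a route segment
# connecting two stops. Stop names are invented for illustration.
routes = [
    ("CBD", "Westlands"),
    ("CBD", "Kibera"),
    ("Westlands", "Kangemi"),
    ("CBD", "Kangemi"),
]

# Count how many route segments touch each stop (node degree)
degree = defaultdict(int)
for a, b in routes:
    degree[a] += 1
    degree[b] += 1

# The busiest hub is the stop with the highest degree
hub = max(degree, key=degree.get)
print(hub)  # CBD
```

Even this tiny example shows the core move of network analysis: once data is represented as nodes and edges, questions like "which stop is the hub?" reduce to simple graph metrics.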
The Critical Role of IoT Data Integration to develop Big Data Applications (f... - Rainer Sternfeld
HP predicts that by 2020, 40% of all data ever collected by humankind will have been generated by sensors. But if you can't use the data, if you can't search and discover it, and if you can't make it machine-readable, then the investment in intelligent sensor networks will go unused.
In this presentation, I discuss different cases of data integration and discovery, and how to turn this data into usable/readable information both for humans and machines, thus allowing data professionals, executives and data vendors to all do what they do best, leaving data integration and discovery to professionals.
Opportunities in Sensor Networks and Big Data in 2014 (for NIKKEI Big Data Co... - Rainer Sternfeld
1. Market trends in some of the biggest industries using scientific sensor data
2. Technology trends
3. How Planet OS is solving these challenges
4. The Industrial Internet (GE), The Internet of Everything (Cisco)
5. Security and trust
Integrating Geospatial into the Everyday - Cybera Inc.
Geospatial data have been an integral part of the everyday at the City of Calgary since the City’s incorporation in the late 19th century. Today, geospatial professionals at the City of Calgary assemble and manage an increasing variety and volume of geospatial data, and work to integrate these data into everyday business operations, including planning for and responding to civil emergencies such as the 2013 flood event. Despite challenges to data integration, coordination with business and technology partners has resulted in the successful development, deployment and maintenance of specific business and enterprise tools that leverage rich spatial and business data with emerging technologies throughout the corporation in the 21st century.
U.S. National Arboretum - Esri User Conference 2015 Presentation - Blue Raster
Adrienne Allegretti (Blue Raster) and Joe Meny (USDA) presented on how the U.S. National Arboretum uses GIS to support its work. Learn about the Arboretum Botanical Explorer app and the new Arboretum Mobile application.
An introduction to Python tools for data science, presented for the DevX developer club at Carleton College, October 2017. Talk spans:
- Tools for interactive/collaborative programming (IPython, Jupyter)
- Tools for data wrangling/analysis (NumPy, pandas)
- Tools for visualization (matplotlib, seaborn)
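A minimal taste of the wrangling tools listed above, assuming numpy and pandas are installed; the city labels and temperature values are invented sample data:

```python
import numpy as np
import pandas as pd

# NumPy holds raw numeric arrays; pandas wraps them in labeled tables
temps = np.array([12.1, 14.3, 9.8, 11.0])
df = pd.DataFrame({"city": ["A", "B", "A", "B"], "temp": temps})

# Group-wise aggregation, the bread and butter of pandas analysis
means = df.groupby("city")["temp"].mean()
print(means)
```

From here, a single call such as `means.plot(kind="bar")` would hand the result to matplotlib for the visualization step the talk covers.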
Guest Lecture for the Data Visualization Class at Ateneo de Manila University. Basic design for Computer Science students. For educational purposes only, no copyright infringement intended.
Changing contexts: museums, audiences and technology - Mia
A presentation for the International Training Programme run by the British Museum for museum professionals from around the world. This is based on a presentation I prepared for OpenCulture 2011, but includes additional material on mobile phones/devices including the 'Hidden Histories' pilot.
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp... - Geoffrey Fox
Motivating Introduction to MOOC on Big Data from an applications point of view https://bigdatacoursespring2014.appspot.com/course
Course says:
Geoffrey motivates the study of X-informatics by describing data science and clouds. He starts with striking examples of the data deluge from research, business and the consumer. The growing number of jobs in data science is highlighted. He describes industry trends in both clouds and big data.
He introduces the cloud computing model developed at amazing speed by industry. The four paradigms of scientific research are described, with the growing importance of the data-oriented version. He covers three major X-informatics areas: physics, e-commerce and web search, followed by a broad discussion of cloud applications. Parallel computing in general, and particular features of MapReduce, are described. He comments on data science education and the benefits of using MOOCs.
ViziCities - Lessons Learnt Visualising Real-world Cities in 3D - Robin Hawkes
ViziCities is an open-source 3D city visualisation platform powered by JavaScript, WebGL and many other cutting-edge Web technologies. Think SimCity meets the real world!
In this talk, Robin Hawkes, ViziCities’ creator, will highlight the development issues experienced along the way and show you how he overcame them – ranging from how you tackle the realtime processing of thousands of 3D buildings without locking up the browser, to how you visualise the entire world without needing a server or your own geographic data source.
Data visualization is crucial to understanding the big data being generated by apps and services. Data visualization toolkits such as D3.js and charting toolkits are immensely popular but it remains difficult to create meaningful dashboards or usable analytics tools or clear data visualizations. This talk will discuss data visualization principles, present best practices, showcase excellent visualizations in practice, and share useful tips and mistakes learned.
The Next Wave of AR: Mobile Social Interaction Right Here, Right Now! - Tish Shute
I began by asking the question: Can we create an open framework for distributed augmented reality using "off the shelf" standards, e.g., the Google Wave Federation Protocol?
But the implications of this proposal go well beyond augmented reality and towards an open framework for in context mobile social communication.
Also see video here http://www.mobilemonday.nl/talks/tish-shute-the-next-wave-of-ar/
29 March 2019 presentation on the relation of digital and virtual heritage to digital humanities, covering issues and some projects, at Curtin University, Perth, Australia.
Presentation to the land-use science research group at the Swiss Federal Research Institute WSL, by Eduardo Oliveira, coordinated by Silvia Tobias.
"Big Data" is term heard more and more in industry – but what does it really mean? There is a vagueness to the term reminiscent of that experienced in the early days of cloud computing. This has led to a number of implications for various industries and enterprises. These range from identifying the actual skills needed to recruit talent to articulating the requirements of a "big data" project. Secondary implications include difficulties in finding solutions that are appropriate to the problems at hand – versus solutions looking for problems. This presentation will take a look at Big Data and offer the audience with some considerations they may use immediately to assess the use of analytics in solving their problems.
The talk begins with an idea of how big "Big Data" can be. This leads to an appreciation of how important "Management Questions" are to assessing analytic needs. The fields of data and analysis have become extremely important and impact nearly all facets of life and business. During the talk we will look at the two pillars of Big Data – Data Warehousing and Predictive Analytics. Then we will explore the open source tools and datasets available to NATO action officers working in this domain. Use cases relevant to NATO will be explored with the purpose of showing where analytics lies hidden within many of the day-to-day problems of enterprises. The presentation will close with a look at the future. Advances in the area of semantic technologies continue. The much acclaimed consultants at Gartner listed Big Data and Semantic Technologies as the first- and third-ranked top technology trends to modernize information management in the coming decade. They note there is an incredible value "locked inside all this ungoverned and underused information." HQ SACT can leverage this powerful analytic approach to capture requirement trends when establishing acquisition strategies, monitor Priority Shortfall Areas, prepare solicitations, and retrieve meaningful data from archives.
Network Mapping & Data Storytelling for Beginners - Renaud Clément
5-hour Workshop about network mapping and data storytelling.
This includes examples about data, networks, visualization, etc.
Given on Jan 31st, 2013 during a lecture in the Master Information, Technology and Territories at the Institute of Geography and Social Sciences, Toulouse 2 University, France.
Many thanks to @graphcommons for the inspiration.
If you took a Geography course over 20 years ago, you might recall the subject involving little more than memorizing the locations of continents, countries, cities, as well as climate and cultural facts. Since that time, many universities have expanded their geography programs by entering the world of Geographic Information Systems, or GIS for short. In the beginning GIS was an obscure field of specialized hardware, software, and cryptic keyboard commands that allowed a skilled professional to query data to get answers to geographic-based inquiries. Queries, such as the quantity of forested acres within an area, were the beginning of the geographic-based analysis revolution that has since unfolded. But today’s leading geography programs do more than just teach students the where, what, who, and why of our world; they also bring to the table an interdisciplinary approach to solving today’s local, regional, national, and global problems. Many of these programs are not limited to universities, now involving the K-12 space, tapping into young people’s minds to unleash innovative ideas in what is now an interdisciplinary field. To maintain a competitive advantage in today’s world, leading countries, companies, and research organizations are embracing these new capabilities and the talent that is available in the marketplace.
Social Network Analysis Introduction including Data Structure Graph overview - Doug Needham
Social Network Analysis Introduction including Data Structure Graph overview. Given in Cincinnati August 18th 2015 as part of the DataSeed Meetup group.
What's in your workflow? Bringing data science workflows to business analysis... - Domino Data Lab
While business analysis rapidly grows more data-driven, the analyst community is slow to adopt the best practices of data science workflows. Many parallels exist between data science “top topics” (e.g. reproducibility) and business pain points, but these common needs are obscured by the different “languages” of these two communities. The opportunity cost is greatest in heavily regulated industries such as finance and insurance where documentation and compliance are paramount.
In this talk, we will review our experience transitioning Capital One business analysts from legacy systems to open-source workflows by developing user-friendly tools. We incentivized business analysts to adopt the data science mindset by curating open-source tools and developing code packages which simplify workflows and eliminate pain points.
Our internal R package, tidycf, reimagines cumbersome Excel cashflow statements as dataframes and uses RMarkdown templates and the RStudio IDE for an intuitive, user-friendly experience without the overhead of maintaining a custom GUI. We tackle challenges in documentation and communication while immersing new users in the R language.
We will share best practices and lessons learned from our experience designing tools for non-technical end-users, standardizing workflows based on the RStudio IDE’s infrastructure, and evangelizing data science methods.
The Proliferation of New Database Technologies and Implications for Data Scie... - Domino Data Lab
In this talk, we’ll describe NoSQL (“not-only SQL”) and document-oriented databases and the value they provide for data science companies like Uptake. We will walk through the unique challenges such datastores pose for data science workflows. To make these challenges and lessons learned concrete, we’ll explore data science workflows through a discussion of the development efforts that led to “uptasticsearch”, an R package released by the Uptake Data Science team to reduce friction in interacting with a document store called Elasticsearch. The talk will conclude with a discussion of recent developments in NoSQL technologies and implications for data scientists.
Racial Bias in Policing: an analysis of Illinois traffic stops data - Domino Data Lab
Since 2004, Illinois has collected demographic information about traffic stops conducted by police in an effort to identify racial bias. This data has been used by groups such as the ACLU and the Stanford Open Policing Project to identify key markers that indicate racial bias in policing. We have applied exploratory data analysis to investigate whether systemic racial bias may appear and to what extent. This talk will walk the audience through the insights gleaned from the exploration of this data, along with the challenges posed and ongoing questions raised.
Data Quality Analytics: Understanding what is in your data, before using it - Domino Data Lab
Analytics and data science are ever-growing fields, as business decision makers continue to use data to drive decisions. The pinnacle of these fields is the models and their accuracy/fit; but what about the data? Is your data clean, and how do you know? Our discussion will focus on best practices for data preprocessing for analytic uses, beginning with essential distributional checks of a dataset and moving to a proposed method for an automated data validation process during ETL for transactional data.
Supporting innovation in insurance with randomized experimentation - Domino Data Lab
Recent technological advances, a dynamic competitive landscape, and an evolving regulatory environment have led to a period of rapid innovation for many insurance providers. Here, we’ll explore how data scientists may use randomized experiments to rigorously assess the causal impact of innovations on business outcomes. Particular emphasis will be placed on experimentation in “offline” channels, with some of the challenges and mitigation strategies highlighted.
Leveraging Data Science in the Automotive Industry - Domino Data Lab
Cars.com Inc. is a decision engine for car buyers and a growth engine for our partners. Data Science is the bread and butter of any decision engine and Cars is no different. In this talk, I will discuss how we quantify various parameters of a car and plan to make use of all the data in hand to put predictive models at various stages of a user’s automobile lifecycle. This talk will also cater to students looking to learn how data science is utilized at scale while still following certain processes and leading the way for business and product partners.
Summertime Analytics: Predicting E. coli and West Nile Virus - Domino Data Lab
Lake Michigan and outdoor recreation are enjoyable aspects of summers in Chicago, but they can come with the risk of E. coli in Lake Michigan or West Nile virus from mosquitoes. This summer, the City of Chicago launched two new predictive analytics projects to forecast these risks and to proactively limit them. Members of the research team, Gene Leynes and Nick Lucius, discuss the projects and how they’re being used as part of city operations.
What you will learn:
GOALS - What is the bar for data science teams
PITFALLS - What are common data science struggles
DIAGNOSES - Why so many of our efforts fail to deliver value
RECOMMENDATIONS - How to address these struggles with best practices
Presented by Mac Steele
Director of Product at Domino Data Lab
Doing your first Kaggle (Python for Big Data sets) - Domino Data Lab
You love python. You love Data Science. But the size of your data set keeps crashing your code. Is it time to bring in big data tools or simply code smarter? Lee is going to show you efficiency hacks, drawn from top Kaggle competitors, to get python to work on large data sets. Skip the hassle of creating a Big Data infrastructure. Let’s find out how far we can push our home laptop first.
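One classic hack in the spirit of that talk is to stream a large file row by row instead of loading it all into memory. The sketch below (stdlib only, with an invented two-column CSV standing in for a big file) sums a column while holding just one row at a time; with pandas, the equivalent move is `read_csv(..., chunksize=...)`:

```python
import csv
import io

# Made-up CSV content standing in for a file too large to load at once
data = "user,spend\n1,10.5\n2,3.25\n1,6.0\n"

# Stream one row at a time: memory use stays constant no matter
# how many rows the file has
total = 0.0
for row in csv.DictReader(io.StringIO(data)):
    total += float(row["spend"])

print(total)  # 19.75
```

The same pattern (iterate, accumulate, discard) underlies most "big data on a laptop" tricks: you trade a little code complexity for the ability to skip the Big Data infrastructure entirely.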
Most of analytics modeling work today focuses on the production of single-purpose "artisanal" models for predictions. This approach to analytics is fragile with respect to model consistency, reorganization, and resource availability. This talk will argue that instead the focus of analytics modeling should be toward the production of analytics interchangeable parts, which can be combined in creative ways to produce a wide variety of analytics results. This "nuts and bolts" approach allows analytics groups to produce results in an agile way where the time between ask and answer is determined by the right combination of analytics, rather than the modeling.
How I Learned to Stop Worrying and Love Linked Data - Domino Data Lab
In this presentation, Jon Loyens will share:
- Best practices for sharing context and knowledge about your data projects
- How linked data can augment your existing data science workflow and toolchain to accelerate your work
- How a social network can unlock the power of Linked Data and data collaboration
- How Linked Data can help you easily combine private and Open Data for fun and profit
Although both disciplines are unique in their own ways, Software Engineering and Data Science make heavy use of programming languages to do their respective jobs. Data Science is a relatively new discipline and many of its practitioners have not previously been professional software engineers. There are a few techniques that Data Scientists can leverage from Software Engineering to make their tooling and environments faster to design, more easily debugged and, most importantly, clearer to read. This talk will go over some practical tips that anyone can use to better understand their code, give clarity around cloud environments and their uses and drawbacks, and finally briefly touch on the Software Development Lifecycle.
Within marketing research, big data is often described as being “census” data for the population that it represents. The devil is in the details, and when we take a closer look we can see that this isn’t the case. There are many situations that are not captured within the population that big data purports to be a census of. Big data isn’t even a census of itself, since it’s not uncommon for records to be excluded either by accident during the collection process or by design in the cleaning process. Unfortunately, our industry is so enamored with the size of big data that some users of data are willing to trade off precision for tonnage. Fortunately, if the shortcomings of big data are understood and corrected, it can accurately represent the population that it measures in the correct proportion to the universe. We will discuss a method that Nielsen has developed called “Common Homes” that is designed to identify and correct the shortcomings of big data sets that represent media consumption.
Moving Data Science from an Event to A Program: Considerations in Creating Su... - Domino Data Lab
The exponential growth of Big Data and Analytics has outpaced the ability of organizations to govern their data appropriately. The ability to reuse the work done by data scientists is becoming an economic necessity. The mix of data sources is changing from traditional transactional and ERP systems to include a mix of structured, semi-structured and unstructured data. Data Governance needs to adapt to these changes. This session discusses these data changes and proposes how to adapt current data governance processes, including how the concept of a stakeholder has changed and the need for expanded communications and content management. We look at the need to consolidate data from disparate systems and how it is governed. Lastly, we will investigate how context is emerging as an important factor in governance and how it can be leveraged to provide for accurate, reliable data reuse.
Building Data Analytics pipelines in the cloud using serverless technology - Domino Data Lab
Big Data analytics is well known to uncover hidden insights that give an organization an edge over the competition. But data does not need to be big in order to be useful. Smaller companies and startups may lack the volume of data that qualifies as big data, yet the variety of data can still yield a trove of insights that helps in driving the business strategies of a company. Startups may also lack the resources to fund an additional, seemingly expensive development project. The key is in simplicity: start small and simple, and architect for scalability and performance. But how do you start? In this presentation, we share our experience in building a cost-effective AWS serverless data analytics platform that became an invaluable tool for sales, marketing and operational efficiencies.

Serverless architectures simplify development work because servers and software are managed by a third-party cloud provider. Developers can focus on just building the data wrangling and data analysis logic, while critical aspects like scalability and high availability are guaranteed by the cloud provider. Serverless services also offer the pay-as-you-go model, where you pay only for the resources you use. This turns out to be another attractive aspect, since costs can be managed based on usage.

In this presentation we will focus on techniques and best practices to build a big data analytics platform using AWS serverless services like Lambda, DynamoDB, S3, Kinesis, Athena, QuickSight and Amazon ML. We will highlight the strengths of each of these services and what role each plays in the data analytics pipeline. We compare and contrast these services with some of the other popularly used big data technologies like Hadoop, Spark and Kafka. We also demonstrate the usage of these services to build intelligent components that detect anomalies, yield recommendations, simulate chat bots and generate predictive analytics.
Leveraging Open Source Automated Data Science Tools - Domino Data Lab
The data science process seeks to transform and empower organizations by finding and exploiting market inefficiencies and potentially hidden opportunities, but this is often an expensive, tedious process. However, many steps can be automated to provide a streamlined experience for data scientists. Eduardo Arino de la Rubia explores the tools being created by the open source community to free data scientists from tedium, enabling them to work on the high-value aspects of insight creation and impact validation.
The promise of the automated statistician is almost as old as statistics itself. From the creation of vast tables, which saved the labor of calculation, to modern tools which automatically mine datasets for correlations, there has been a considerable amount of advancement in this field. Eduardo compares and contrasts a number of open source tools, including TPOT and auto-sklearn for automated model generation and scikit-feature for feature generation and other aspects of the data science workflow, evaluates their results, and discusses their place in the modern data science workflow.
Along the way, Eduardo outlines the pitfalls of automated data science and applications of the “no free lunch” theorem and dives into alternate approaches, such as end-to-end deep learning, which seek to leverage massive-scale computing and architectures to handle automatic generation of features and advanced models.
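The core loop behind tools like TPOT and auto-sklearn can be illustrated in miniature: enumerate a family of candidate models, score each against data, and keep the best. The toy "model family" below (threshold classifiers on 1-D data) is purely illustrative and does not use those libraries' APIs.

```python
import random

# Toy stand-in for automated model search: score a family of candidate
# "models" (here, decision thresholds) and keep the best performer.

def accuracy(threshold: float, data: list[tuple[float, int]]) -> float:
    """Fraction of points correctly labeled by the rule 'x >= threshold -> 1'."""
    return sum((x >= threshold) == bool(y) for x, y in data) / len(data)

def auto_select(data, candidates):
    """Exhaustively score candidate thresholds and return the best one."""
    return max(candidates, key=lambda t: accuracy(t, data))

random.seed(0)
# Synthetic labeled data whose true decision boundary is x = 0.5.
data = [(x, int(x >= 0.5)) for x in (random.random() for _ in range(200))]
best = auto_select(data, [i / 20 for i in range(21)])
print(best, accuracy(best, data))
```

Real AutoML systems search a vastly larger space (pipelines, hyperparameters, feature transforms), often with genetic programming or Bayesian optimization rather than exhaustive scoring, which is exactly where the "no free lunch" caveats Eduardo raises come into play.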
The Role and Importance of Curiosity in Data Science (Domino Data Lab)
by Alfred Lee
Lead Data Scientist, White Ops
Is curiosity useful for more than serendipitous discovery? Can curiosity be taught? How do I foster curiosity in my team? Can someone be too curious? Questions!
by Jennifer Shin
Senior Principal Data Scientist, Nielsen
With more and more data being collected from consumers, finding an efficient way to align data over time becomes increasingly difficult, and yet ever more necessary. Whether it's a change in the data collection process or an error in the system, working with big data requires tools that can account for real-world complexities.
This talk will introduce the benefits and complexities of implementing a 'fuzzy' solution using the Levenshtein algorithm. Attendees will walk away with a high-level understanding of fuzzy matching algorithms and learn how they can be effectively applied to solve real-world business problems.
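As a rough illustration of the technique (a minimal sketch, not the speaker's implementation; the sample names and the `max_dist` cutoff are hypothetical), Levenshtein-based fuzzy matching computes an edit distance and accepts the closest candidate within a tolerance:

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character edits (insert/delete/substitute)
    needed to turn string a into string b, via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]

def fuzzy_match(name: str, candidates: list[str], max_dist: int = 2):
    """Return the closest candidate within max_dist edits, or None."""
    best = min(candidates, key=lambda c: levenshtein(name, c))
    return best if levenshtein(name, best) <= max_dist else None

print(fuzzy_match("Jonh Smith", ["John Smith", "Jane Doe"]))  # John Smith
```

In practice the cutoff is often scaled to string length, and candidate sets are pre-filtered (e.g. by blocking on a key) since the pairwise distance computation is quadratic in string length.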
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... (John Andrews)
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Opendatabay - Open Data Marketplace.pptx (Opendatabay)
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay also breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay: the marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
23. The insurance industry has a long history with maps
Sanborn Fire Insurance Maps from the Library of Congress (https://www.loc.gov/)
24. Could this be the modern fire insurance map?
https://github.com/Esri/Manhattan-skyscraper-explorer
25. Index 3D Scene Layers
http://www.opengeospatial.org/standards/i3s
• An Open Standard
• For streaming large volumes of 3D content
• Designed for web, mobile, and cloud
• Works for
- 3D Objects
- Pointclouds
- Meshes
30. Dynamic aggregation of spatio-temporal data
• What if we want to do this in real-time?
• What if we want to change how the aggregation is rendered?
[Architecture diagram: Big Data / IoT feeds into ArcGIS Enterprise — GeoEvent Server, spatiotemporal big data store, GeoAnalytics Server]
31. On-the-fly aggregation
http://coolmaps.esri.com/BigData/Cube/
• Leveraging existing geohash, square, and hexagon aggregation capabilities to construct space-time bins
• Enables exploration of real-time and historic data
[Architecture diagram: Big Data / IoT feeds into ArcGIS Enterprise — GeoEvent Server, spatiotemporal big data store, GeoAnalytics Server — consumed by the ArcGIS API for JavaScript & WebGL, with client-side rendering of space-time cubes via WebGL]
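The square-bin variant of this space-time aggregation is straightforward to sketch: quantize each observation's coordinates to a grid cell and its timestamp to a slice, then count per bin. This is a simplified illustration of the general technique, not Esri's implementation; the cell size and field layout are assumptions.

```python
from collections import Counter
from datetime import datetime

def space_time_bin(lat: float, lon: float, when: datetime,
                   cell_deg: float = 0.1, hours: int = 1):
    """Assign an observation to a square spatial cell and an hourly time slice."""
    return (int(lat // cell_deg), int(lon // cell_deg),
            when.replace(minute=0, second=0, microsecond=0,
                         hour=when.hour - when.hour % hours))

def aggregate(observations):
    """Count observations per space-time bin — the cells of a 'space-time cube'."""
    return Counter(space_time_bin(lat, lon, t) for lat, lon, t in observations)

obs = [
    (40.71, -74.00, datetime(2019, 3, 1, 9, 15)),
    (40.72, -74.01, datetime(2019, 3, 1, 9, 45)),
    (40.71, -74.00, datetime(2019, 3, 1, 10, 5)),
]
print(aggregate(obs))
```

Geohash and hexagon binning follow the same pattern with a different spatial key function; doing this "on the fly" means recomputing the `Counter` as the user changes the cell size, time slice, or filter, rather than precomputing a fixed cube.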