This document discusses data visualization and big data. It defines big data as large data sets characterized by volume, velocity, and variety. Big data comes from various sources like business transactions, social media, sensors, and machine data. New technologies like Hadoop have helped store and analyze extremely large data sets to reveal patterns. The document emphasizes that visualizing data is important because humans do not understand raw data. Effective data visualization can show what is happening in the data, explain why, allow users to explore, and expose the data in a low-effort way for consumers. Examples are provided of visualizing religions in India and Australia, grocery sales data, movie ratings data, and birth data from the US and India to reveal patterns.
2. WHAT IS BIG DATA?
Volume: Organizations collect data from a variety of sources, including business transactions, social media and information from sensor or machine-to-machine data. In the past, storing it would’ve been a problem, but new technologies (such as Hadoop) have eased the burden.
Velocity: Data streams in at an unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors and smart metering are driving the need to deal with torrents of data in near-real time.
Variety: Data comes in all types of formats, from structured, numeric data in traditional databases to unstructured text documents, email, video, audio, stock ticker data and financial transactions.
3. BIG DATA also is…
Extremely large data sets that may be analysed computationally to reveal patterns, trends, and associations, especially relating to human behaviour and interactions.
Which data is “BIG”?
• Google Services
• Social Media
• E-Commerce
• Geo-location and many more…
4. WHY CARE FOR DATA?
DATA IS EVERYWHERE.
As the number of users keeps growing, the amount of data is becoming massive.
5. HOW TO USE DATA?
To predict the user behavior…
6. HOW TO USE DATA?
To make connectivity better…
7. HOW TO USE DATA?
To provide better products to buy…
8. And the whole idea is…
To give the user…
Better experience
Accuracy
Relevancy
Productivity
Opportunity
Reliability
9. WHAT TO DO WITH DATA?
DATA IS EVERYWHERE
ANALYSIS
VISUALIZE
10. WHY SHOULD WE VISUALIZE THE DATA?
We humans do not understand the language of raw DATA
11. “We have internal information. Getting information from outside is our challenge. There’s no way of doing that.”
– Senior Editor, Leading Media Company
12. SHOW me what is happening with the data
EXPLAIN to me why it’s happening
Allow me to EXPLORE and figure it out
Just EXPOSE the data to me
[Figure axes: Creator effort and Consumer effort, each running between low and high]
There are many ways to aid data consumption
13. SHOW me what is happening with the data
EXPLAIN to me why it’s happening
Allow me to EXPLORE and figure it out
Just EXPOSE the data to me
17. LET’S TAKE TESCO’S GROCERIES
category | title                                              | kJ        | rate
dairy    | Activia Pouring Natural Yogurt 1X950g              | 216       | 0.21
dairy    | Activia Pouring Strawberry Yogurt 1X950g           | 250       | 0.21
dairy    | Activia Pouring Vanilla Yogurt 1X950g              | 263       | 0.21
icecream | Almondy Daim 400G                                  | 1804      | 0.75
icecream | Almondy Toblerone 400G                             | 1850      | 0.5
cereals  | Alpen 10 Pack Lite Summer Fruits Cereal Bars 210G  | 1222      | 1.57
cereals  | Alpen 10Pk Fruit Nut And Chocolate Cereal Bars 290G | 1812     | 1.14
cereals  | Alpen Coconut And Chocolate Cereal Bars 5Pk 145G   | 1863      | 1.24
cereals  | Alpen Fruit And Nut With Chocolate Cereal Bar 5X29g | 1812     | 1.24
cereals  | Alpen High Fruit 650G                              | 1439      | 0.4
cereals  | Alpen Light Bars Chocolate And Orange 5X21g        | 1246      | 1.71
cereals  | Alpen Light Chocolate And Fudge Bar 5X21g          | 1264      | 1.71
cereals  | Alpen Light Sultana & Apple Bars 5Pk 105G          | 1197      | 1.71
cereals  | Alpen Light Summer Fruits Bars 5Pk 105G            | 1222      | 1.71
cereals  | Alpen No Added Sugar 1.3Kg                         | 1488      | 0.31
cereals  | Alpen No Added Sugar 560G                          | 1488      | 0.46
cereals  | Alpen Original 1.5Kg                               | 1509      | 0.27
cereals  | Alpen Original Muesli 750G                         | 1509      | 0.35
cereals  | Alpen Raspberry And Yoghurt Cereal Bars 5x29g      | 1748      | 1.24
cereals  | Alpen Strawberry With Yoghurt Cereal Bar 5X29g     | 1756      | 1.24
dairy    | Alpro Natural Yofu 500G                            | (missing) | 0.28
dairy    | Alpro Raspberry Vanilla Yofu 4X125g                | (missing) | 0.35
dairy    | Alpro Strawberry And Fof Soya Yofu 4X125g          | (missing) | 0.35
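One simple way to start exploring a listing like the grocery table above is to summarize it per category. The sketch below is only an illustration, not the method used in the deck. It copies a handful of rows from the table, treats a blank kJ value as missing, and averages kJ per category. The function name `mean_kj_by_category` and the tuple layout are assumptions made for this example.

```python
from collections import defaultdict

# A few rows copied from the grocery table: (category, title, kJ, rate).
# kJ is None where the source table shows no value.
ROWS = [
    ("dairy", "Activia Pouring Natural Yogurt 1X950g", 216, 0.21),
    ("dairy", "Activia Pouring Strawberry Yogurt 1X950g", 250, 0.21),
    ("dairy", "Activia Pouring Vanilla Yogurt 1X950g", 263, 0.21),
    ("icecream", "Almondy Daim 400G", 1804, 0.75),
    ("icecream", "Almondy Toblerone 400G", 1850, 0.5),
    ("dairy", "Alpro Natural Yofu 500G", None, 0.28),
]


def mean_kj_by_category(rows):
    """Return {category: mean kJ}, skipping rows whose kJ is missing."""
    totals = defaultdict(lambda: [0, 0])  # category -> [sum of kJ, row count]
    for category, _title, kj, _rate in rows:
        if kj is None:
            continue  # do not let missing values distort the average
        totals[category][0] += kj
        totals[category][1] += 1
    return {cat: total / count for cat, (total, count) in totals.items()}


if __name__ == "__main__":
    for category, mean_kj in sorted(mean_kj_by_category(ROWS).items()):
        print(f"{category}: {mean_kj:.0f} kJ on average")
```

Skipping missing values rather than treating them as zero keeps the dairy average from being pulled down by the rows that have no kJ figure.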
20. MOVIES ON THE IMDB
[Figure: movies plotted along two axes, MORE VOTES and BETTER RATED]
Legend: Many unwatched movies; Few unwatched movies; Mix of watched & unwatched; Few watched movies; Many watched movies
Labelled movies: The Shawshank Redemption; The Godfather; The Dark Knight; Titanic; The Phantom Menace; Twilight; New Moon; Wild Wild West; Transformers; The Good, The Bad, The Ugly; 12 Angry Men; 7 Samurai; Taare Zameen Par; Rang De Basanti; Yojimbo; 3 Idiots
21. SHOW me what is happening with the data
EXPLAIN to me why it’s happening
Allow me to EXPLORE and figure it out
Just EXPOSE the data to me
Simplifying access to data is a big win
22. SHOW me what is happening with the data
EXPLAIN to me why it’s happening
Allow me to EXPLORE and figure it out
Just EXPOSE the data to me
23. EDUCATION
PREDICTING MARKS
What determines a child’s marks?
Do girls score better than boys?
Does the choice of subject matter?
Does the medium of instruction matter?
Does community or religion matter?
Does their birthday matter?
Does the first letter of their name matter?
27. SHOW me what is happening with the data
EXPLAIN to me why it’s happening
Allow me to EXPLORE and figure it out
Just EXPOSE the data to me
… to make the hidden obvious
28. SHOW me what is happening with the data
EXPLAIN to me why it’s happening
Allow me to EXPLORE and figure it out
Just EXPOSE the data to me
29. Let’s look at 15 years of US Birth Data
This is a dataset (1975 to 1990) that has been around for several years, and has been studied extensively. Yet, a visualization can reveal patterns that are neither obvious nor well known.
For example,
• Are birthdays uniformly distributed?
• Do doctors or parents exercise the C-section option to move dates?
• Is there any day of the month that has unusually high or low births?
• Are there any months with relatively high or low births?
Very high births in September. But this is fairly well known. Most conceptions happen during the winter holiday season.
Relatively few births during the Christmas and Thanksgiving holidays, as well as New Year and Independence Day.
Most people prefer not to have children on the 13th of any month, given that it’s an unlucky day.
Some special days like April Fool’s day are avoided, but Valentine’s Day is quite popular.
[Figure legend: more births to fewer births, on average, for each day of the year (from 1975 to 1990)]
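The "on average, for each day of the year" view described above can be produced with a small aggregation. The following is a minimal sketch under stated assumptions. It assumes the input is a sequence of (date, births) pairs, one per calendar date. The function names are hypothetical, and the toy counts are invented purely to exercise the code. They are not taken from the US dataset.

```python
from collections import defaultdict
from datetime import date


def average_births_by_calendar_day(records):
    """Average the birth counts for each (month, day) across all years.

    `records` is an iterable of (date, births) pairs, one per calendar date.
    """
    sums = defaultdict(int)
    counts = defaultdict(int)
    for day, births in records:
        key = (day.month, day.day)
        sums[key] += births
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def relative_to_overall(averages):
    """Express each day's average as a ratio to the mean over all days."""
    overall = sum(averages.values()) / len(averages)
    return {key: value / overall for key, value in averages.items()}


if __name__ == "__main__":
    # Hypothetical counts, invented only to demonstrate the functions.
    toy = [
        (date(1975, 9, 13), 90), (date(1976, 9, 13), 110),
        (date(1975, 9, 14), 120), (date(1976, 9, 14), 80),
        (date(1975, 12, 25), 40), (date(1976, 12, 25), 60),
    ]
    ratios = relative_to_overall(average_births_by_calendar_day(toy))
    for (month, day), ratio in sorted(ratios.items()):
        print(f"{month:02d}-{day:02d}: {ratio:.2f}x the overall daily average")
```

Ratios above 1 would map to the "more births" end of a colour scale and ratios below 1 to the "fewer births" end. The colour mapping itself is left out of the sketch.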
30. The pattern in India is quite different
This is a birth date dataset that’s obtained from school admission data for over 10 million children. When we compare this with births in the US, we see none of the same patterns.
For example,
• Is there an aversion to the 13th or is there a local cultural nuance?
• Are holidays avoided for births?
• Which months have a higher propensity for births, and why?
• Are there any patterns not found in the US data?
Very few children are born in the month of August, and thereafter. Most births are concentrated in the first half of the year.
We see a large number of children born on the 5th, 10th, 15th, 20th and 25th of each month, that is, round numbered dates.
Such round numbered patterns are a typical indication of fraud. Here, birthdates are brought forward to aid early school admission.
[Figure legend: more births to fewer births, on average, for each day of the year (from 2007 to 2013)]
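The concentration on round numbered dates described above can also be checked numerically. The sketch below is a hypothetical illustration, not the analysis behind the slide. It compares the observed share of births falling on the 5th, 10th, 15th, 20th and 25th with a naive uniform expectation. The name `round_day_excess` and the mean month length of 30.4375 days (365.25 / 12) used as the baseline are assumptions of this example.

```python
from collections import Counter
from datetime import date

ROUND_DAYS = frozenset({5, 10, 15, 20, 25})  # the dates named on the slide
MEAN_MONTH_LENGTH = 365.25 / 12  # assumed baseline: 30.4375 days


def round_day_excess(birth_dates, round_days=ROUND_DAYS):
    """Ratio of the observed share of births on round days to the share
    expected if births were spread uniformly over the days of a month.

    A value near 1.0 suggests no heaping. Values well above 1.0 suggest
    that round numbered dates are over-represented.
    """
    by_day = Counter(d.day for d in birth_dates)
    total = sum(by_day.values())
    if total == 0:
        raise ValueError("no birth dates supplied")
    observed = sum(by_day[d] for d in round_days) / total
    expected = len(round_days) / MEAN_MONTH_LENGTH
    return observed / expected


if __name__ == "__main__":
    # Hypothetical dates, invented only to demonstrate the function.
    toy = [date(2007, 1, 5), date(2007, 1, 10), date(2007, 1, 11),
           date(2007, 2, 15), date(2007, 3, 3)]
    print(f"round-day excess: {round_day_excess(toy):.2f}")
```

A high ratio is only a signal worth investigating. On its own it does not establish why the dates cluster.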
31. This adversely impacts children’s marks
It’s a well established fact that older children tend to do better at school in most activities. Since many children have had their birth dates brought forward, these younger children suffer.
Children “born” on the 1st, 5th, 10th, 15th etc. of the month tend to score lower marks on average.
• Are holidays avoided for births?
• Which months have a higher propensity for births, and why?
• Are there any patterns not found in the US data?
[Figure legend: higher marks to lower marks, on average, for children born on a given day of the year (from 2007 to 2013)]
Children “born” on round numbered days score lower marks on average, due to a higher proportion of younger children.
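The comparison above amounts to splitting children into two groups by recorded day of birth and comparing mean marks. The following is a minimal sketch of that comparison, assuming records arrive as (day_of_month, marks) pairs. The function name is hypothetical, and including the 20th and 25th in the round days is an assumption drawn from the "etc." on the slide.

```python
from statistics import mean

# 1, 5, 10 and 15 are listed on the slide; 20 and 25 are assumed from "etc."
ROUND_DAYS = frozenset({1, 5, 10, 15, 20, 25})


def mean_marks_by_group(records, round_days=ROUND_DAYS):
    """Split (day_of_month, marks) pairs into round and other days.

    Returns (mean marks on round days, mean marks on all other days).
    """
    round_marks = [marks for day, marks in records if day in round_days]
    other_marks = [marks for day, marks in records if day not in round_days]
    if not round_marks or not other_marks:
        raise ValueError("both groups need at least one record")
    return mean(round_marks), mean(other_marks)


if __name__ == "__main__":
    # Hypothetical marks, invented only to demonstrate the function.
    toy = [(1, 50), (5, 60), (2, 70), (13, 80)]
    round_mean, other_mean = mean_marks_by_group(toy)
    print(f"round days: {round_mean}, other days: {other_mean}")
```

A gap between the two means would be consistent with the slide's explanation, but it would not by itself confirm that age is the cause.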
32. SHOW me what is happening with the data
EXPLAIN to me why it’s happening
Allow me to EXPLORE and figure it out
Just EXPOSE the data to me