
Course 6 (part 2): Data visualisation, by Toon Vanagt

For more info about our Big Data courses, check out our website ➡️ https://www.betacowork.com/big-data/
---------
"Data is the new oil" - Many companies and professionals do not know how to use their data or are not aware of the added value they could gain from it.

It is in response to these problems that the project “Brussels: The Beating Heart of Big Data” was born.

This project, financed by the Region of Brussels Capital and organised by Betacowork, offers 3 training cycles of 10 courses on big data, at both beginner and advanced levels. These 3 cycles will be followed by a Hackathon weekend.

No prerequisites are required to start these courses. The aim of these courses is to familiarize participants with the principles of Big Data.

  1. Slides by Gabriela Antunes Vieira
  2. Course by Toon Vanagt. Data visualization is a generic term for any attempt to help people understand data by presenting it visually. ◘ Visualizing data makes textual and numeric data much easier to analyse and understand. ◘ It saves time in decision making.
  3. Combine, correlate and improve the quality of data sets. Even very complicated data becomes easier to read once it has been visualized. Visual analytics tells viewers a story. It makes it easier for business owners and organisations to understand their large data sets in a simple format.
  4. Avoid distorting what the data has to say. Induce the viewer to think about the substance rather than about methodology, graphic design, etc. Serve a reasonably clear purpose: description, exploration, tabulation or decoration. Make large data sets coherent. Show the data. Encourage the eye to compare different pieces of data.
  7. Move up the information ladder by asking users/patients for input. Combine, correlate and improve the quality of data sets. Bring new value from raw (open) data sets. Visualise in new ways. Mine deeper to dig out “insights” (not just basic statistics). Any company can now run its “own Google”.
  9. Data cleansing is about identifying wrong or inaccurate records in a data set and making appropriate corrections to the erroneous records. Identify the incomplete, inaccurate and incorrect parts of the data (elements). Replace them with correct data or delete the incorrect data elements.
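The two steps on this slide (identify the bad elements, then replace or delete them) can be sketched in a few lines of Python. This is only an illustration: the records and field names below are made up, and the two checks are assumptions about what "incomplete" and "inaccurate" might mean for such data.

```python
# Made-up sample records; the field names are illustrative only.
records = [
    {"name": "Example A", "year": "1901"},
    {"name": "", "year": "1850"},           # incomplete: the name is missing
    {"name": "Example C", "year": "18x7"},  # inaccurate: the year is not a number
]

def problems(record):
    """Return a list of the problems found in one record."""
    found = []
    if not record["name"].strip():
        found.append("missing name")
    if not record["year"].isdigit():
        found.append("year is not numeric")
    return found

# Step 1: identify the records that need attention.
flagged = [(r, problems(r)) for r in records if problems(r)]

# Step 2: here we simply delete them; replacing the values is the alternative.
clean = [r for r in records if not problems(r)]
```

In practice you would usually review the flagged records before deleting anything, since some of them can be corrected rather than dropped.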
  11. OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: ◘ cleaning it ◘ transforming it from one format into another ◘ extending it with web services and external data.
  12. https://bit.ly/2YzBHwu
  13. https://bit.ly/2HWdmLh
  14. This tutorial will walk you through some of the most common data-manipulation tasks you’ll need to perform. When you’re done, you should know how to: • clean up spelling inconsistencies • remove leading and trailing whitespace • split cells into multiple columns
  15. Download link for OpenRefine: https://github.com/OpenRefine/OpenRefine/releases/ Download link for the exercise data: https://www.dropbox.com/s/w8gz5oifkvh376q/NJShipwrecks.csv?dl=0
  17. Click on Create Project and then Choose Files. Navigate to the NJShipwrecks.csv file and then click Next.
  18. Just click on “Create Project”.
  19. This is the main interface you’ll use to work with your data. It sort of looks like Excel, but notice it shows you only 10 records at a time. That’s because you’re not supposed to be working with your data record by record; you’ll find ways to group it into batches and then work with it. We’ll try that next.
  20. A facet is a way to isolate certain records that share features. It’s easier to see what I mean when you try it yourself. Click on the down arrow right next to the VESSEL TYPE column heading. Then select Facet, and then Text Facet.
  21. Look at the VESSEL TYPE list that appears on the left-hand side of the OpenRefine window. OpenRefine’s facet function has grouped together every term that appears in the VESSEL TYPE column, along with how many times it appears. You can sort the list of terms alphabetically by name, or by count, according to how many times each term appears. If you click on one of the terms, only the rows that contain that term will be selected. This allows you to work on your data one chunk at a time.
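If it helps to think of a text facet in code, it is essentially a count of each distinct value in a column. The Python sketch below is not how OpenRefine is implemented; the values are made up to stand in for the VESSEL TYPE column.

```python
from collections import Counter

# Made-up values standing in for the VESSEL TYPE column.
vessel_types = ["Schooner", "Bark steamer", "Schooner", "Bark Steamer", "Sloop"]

# A text facet boils down to counting each distinct value.
facet = Counter(vessel_types)

# Sort by count (highest first), then by name.
by_count = sorted(facet.items(), key=lambda item: (-item[1], item[0]))

# Clicking a term in the facet keeps only the rows with that exact value.
selected = [v for v in vessel_types if v == "Schooner"]
```

Note that "Bark steamer" and "Bark Steamer" are counted as two separate terms, because the comparison is exact.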
  22. Look closely at that list of terms. You’ll see that it includes two terms that are probably meant to be the same: Bark steamer and Bark Steamer. Even though a human can tell they’re meant to refer to the same thing, a computer doesn’t know that. So it’s important to clean up this data to create accurate visualizations and analyses. Hover over the Bark Steamer term in the facet list, so that you can see the Edit option. Press Edit and, in the box that appears, change Bark Steamer to Bark steamer and press Apply. Now the two terms will merge into one.
  23. Look again at the Facet box. You’ll see a button marked Cluster. Click it. The resulting box shows you terms that OpenRefine thinks should be merged together. Check the boxes of the terms you think should be merged and then click Merge Selected and Re-Cluster. Now experiment with some of the other items on the Method dropdown menu. What happens when you try different methods? Each uses a different algorithm to try to match terms. When you’re finished experimenting, click Close. You’ll notice you have fewer terms in your facet list.
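To get a feel for what a clustering method does, here is a simplified Python sketch loosely modelled on key-collision ("fingerprint") clustering: each value is reduced to a normalized key, and values that share a key become merge candidates. OpenRefine's own implementation differs in its details (it also normalizes accented characters, for example), and the sample values are made up.

```python
import string
from collections import defaultdict

def fingerprint(value):
    """Simplified key: trim, lowercase, drop punctuation, sort unique tokens."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)

# Made-up values standing in for the terms in the facet list.
values = ["Pilot schooner", "Pilot Schooner", "schooner, pilot", "Sloop"]

# Group the values by their key.
clusters = defaultdict(list)
for v in values:
    clusters[fingerprint(v)].append(v)

# Only keys shared by more than one distinct value are merge candidates.
candidates = {k: vs for k, vs in clusters.items() if len(set(vs)) > 1}
```

Here the three spellings of "pilot schooner" end up under one key, while "Sloop" stays on its own. The sketch only proposes candidates; as in OpenRefine, a person still decides what to merge.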
  24. A lot of the problems with the data in the VESSEL TYPE column were the result of variant cases (e.g., Pilot schooner versus Pilot Schooner). One way to eliminate these problems would be to make all of the terms lowercase. Let’s do that now. Click on the down arrow next to VESSEL TYPE. From the dropdown menu, click Edit cells, and then Common transforms. Finally, select To lowercase. Voilà! All the vessel types are now lowercase.
  25. One common problem with data is extra spaces before and after the values. Those are easy to get rid of with OpenRefine. On the Year Built column, click the down arrow, then click Edit cells, then Common transforms. Finally, click Trim leading and trailing whitespace. Much better!
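For comparison, the lowercase and trim transforms each correspond to a single string method in Python. The rows below are made up, and the exact column headers in NJShipwrecks.csv are assumed rather than checked.

```python
# Made-up rows; the column headers are assumed, not taken from the CSV file.
rows = [
    {"VESSEL TYPE": "Pilot Schooner", "Year Built": " 1852 "},
    {"VESSEL TYPE": "pilot schooner", "Year Built": "1860"},
]

for row in rows:
    # Equivalent of Common transforms > To lowercase.
    row["VESSEL TYPE"] = row["VESSEL TYPE"].lower()
    # Equivalent of Common transforms > Trim leading and trailing whitespace.
    row["Year Built"] = row["Year Built"].strip()
```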
  26. Several of our columns contain a location, formatted as City, State. But let’s say we want states to appear in their own column. That’s easy to do with OpenRefine. Scroll to the Departure Point column. Click the down arrow, then Edit column, and finally Split into several columns. The popup window asks which separator currently separates the values. Enter a comma and a space, since those are the two characters that lie between city and state. Then click OK. You now have two columns! You can rename them by clicking on the down arrow, then Edit column and then Rename.
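The same split can be sketched in Python with a comma and a space as the separator. The sample values and the function name split_city_state are hypothetical; the sketch also shows one way to handle a cell that has no separator at all.

```python
# Made-up values standing in for the Departure Point column.
departure_points = ["New York, NY", "Philadelphia, PA", "Unknown"]

def split_city_state(value, separator=", "):
    """Split 'City, State' at the first separator; state is None if absent."""
    city, found, state = value.partition(separator)
    return city, (state if found else None)

split_rows = [split_city_state(v) for v in departure_points]
```

Cells without a separator keep their whole value in the first column and leave the second one empty.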
  27. If you make a mistake in OpenRefine, no worries! It’s easy to undo. Just click on the Undo/Redo link on the left-hand side of the screen. Then click on the next-to-last step in the list. Your last action will be reversed. If you change your mind and want to redo it, you can just click the last step.
  28. Let’s say we want to add the prefix S.S. to the name of any boat that has the vessel type schooner. We’ll do that by first using our vessel type facet to select all the rows with the term schooner in the VESSEL TYPE column. Once you have all of the schooners selected, head to the SHIP’S NAME column. Click on the down arrow, then select Edit cells, and then Transform… The popup box that follows wants you to use a language called the General Refine Expression Language (GREL, formerly the Google Refine Expression Language) to transform your data. You don’t have to actually know GREL; you just have to be able to look up the pattern for the expression you want to write. When you want to add a prefix to some data in OpenRefine, the pattern looks like this: "prefix" + value. So in the blank text box, type "S.S. " + value (with straight quotes, and a space after the second period). You’ll see a preview of how your data will look in the lower right-hand column. When you’re satisfied, press OK. Now the name of every schooner is prefixed with “S.S.”!
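For readers who know a little Python, the facet selection plus the GREL expression amount to a conditional string concatenation. The ship names and dictionary keys below are made up for illustration.

```python
# Made-up rows; the keys are illustrative, not the real column headers.
ships = [
    {"name": "Example One", "vessel_type": "schooner"},
    {"name": "Example Two", "vessel_type": "sloop"},
]

for ship in ships:
    # The facet selection: only rows whose vessel type is schooner.
    if ship["vessel_type"] == "schooner":
        # The GREL expression "S.S. " + value, applied to the name cell.
        ship["name"] = "S.S. " + ship["name"]
```

Running the loop a second time would add the prefix again, which is also true of re-applying the transform in OpenRefine.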
  29. Once you’ve cleaned up your data, you’ll want to get it out of OpenRefine. To do that, click on the Export button in the upper right-hand corner. Then click on Comma-separated value. Your cleaned-up spreadsheet should begin downloading. You can download your data as many times as you want, at any stage of the project. To close OpenRefine, just close the window or tab in your browser.
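If you later work with the cleaned data in a script, writing comma-separated values is just as short in Python. This sketch writes to an in-memory buffer so that it runs without touching any files; the rows and field names are made up.

```python
import csv
import io

# Made-up cleaned rows; the field names are illustrative only.
rows = [
    {"name": "S.S. Example One", "vessel_type": "schooner"},
    {"name": "Example Two", "vessel_type": "sloop"},
]

# Write a header line followed by one line per row.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "vessel_type"])
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
```

To write a real file instead, open it with open(path, "w", newline="") and pass that file object to csv.DictWriter in place of the buffer.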
  30. These are some of the most common tasks you’ll want to perform in OpenRefine, but OpenRefine can also handle tasks of much greater complexity. To get a sense of some of these tasks, see the resources on the OpenRefine Resources page: http://miriamposner.com/classes/dh101f17/tutorials-guides/data-manipulation/openrefine-resources/
  31. • Data.gov (http://data.gov): the US Government pledged last year to make all government data freely available online. • Socrata is another interesting place to explore government-related data, with some visualisation tools built in. • European Union Open Data Portal (http://open-data.europa.eu/en/data/) • Data.gov.uk (http://data.gov.uk/): data from the UK Government, including the British National Bibliography, with metadata on all UK books and publications since 1950. • The CIA World Factbook (https://www.cia.gov/library/publications/the-world-factbook/): information on the history, population, economy, government, infrastructure and military of 267 countries. • UNICEF offers statistics on the situation of women and children worldwide. • The World Health Organization offers world hunger, health and disease statistics. • Amazon Web Services public datasets (http://aws.amazon.com/datasets): a huge resource of public data. • Face.com: a fascinating tool for facial recognition data. • Data Market is a place to check out data related to economics, healthcare, food and agriculture, and the automotive industry. • Google Public Data Explorer includes data from world development indicators, the OECD and human development indicators, mostly related to economics and the world. • Junar is a data scraping service that also includes data feeds. • Buzzdata is a social data sharing service that allows you to upload your own data and connect with others who are uploading their data. • Google Finance (https://www.google.com/finance): 40 years’ worth of stock market data, updated in real time. • DBpedia (http://wiki.dbpedia.org): Wikipedia comprises millions of pieces of data, structured and unstructured, on every subject under the sun. • UCI Machine Learning Repository is a collection of datasets pre-processed for machine learning. • And many more; these are just a few examples.
  32. http://sites.miis.edu/metalab/resources/open-refine/ https://drive.google.com/file/d/0B9OONoAXyPa5YU9kRDVMNXZrdVE/view https://drive.google.com/file/d/0B9OONoAXyPa5eV84X1hzWFJ4VEE/view?usp=sharing https://www.propublica.org/nerds/using-google-refine-for-data-cleaning
  34. VIDI lets you create a visualization of your data for free. All you have to do is upload your data, select a type, do a little customization and you are good to go. QLIK SENSE DESKTOP lets you create interactive data reports and visualizations for free. MICROSOFT BI PLATFORM allows you to update your data from different sources and builds a report from it. GOOGLE FUSION TABLES is one of the simplest tools for visualizing data: you simply upload a file and choose how to display it. And many more here: https://bit.ly/2TAlAe4
  36. On 28 April 2019, Infrabel is opening its data and holding its first hackathon. Registration: https://www.eventbrite.be/e/trackathon-an-open-data-hackathon-registration-56970650750
