OpenNews/hackdays/insideroutsider

 


If you're tweeting about this hackday, please use the [https://twitter.com/#!/search/%23datahack #datahack] hashtag.
A full writeup of the hack day is also available [http://source.mozillaopennews.org/articles/projects-opennews-mit-hack-day/ on Source].


=== Logistics ===
* <b>Where:</b> MIT Media Lab 5th Floor [https://maps.google.com/maps?q=mit+media+lab&ll=42.361,-71.087412&spn=0.01056,0.017467&oe=utf-8&client=firefox-aurora&channel=fflb&fb=1&gl=us&hq=mit+media+lab&t=v&z=16 75 Amherst Street Cambridge, MA 02139] USA
* <b>When:</b> 3pm sharp Saturday June 22 to 4pm Sunday June 23
* <b>Will there be food?</b> Yes, there will be food. We will be providing dinner on Saturday night, breakfast and lunch on Sunday, and snacks throughout.
* <b>What to bring</b> You will need to bring your own laptop and power supply. Also, bring any challenging civic data sets you've been wanting to wrangle.
* <b>We'll supply</b> the WiFi, the plugs, and collaboration and brainstorming materials like post-its, sharpies, etc.
==== Project needs/wants ====
Heading into Sunday, here are some of the requests people had for assistance with their projects.
* [http://onmit.hackdash.org/p/51c461171b2e34fe0a000002 "Judgmental" Court Decision Scraper] - could use help creating an easy search/index. Would like to do search with Elasticsearch or AWS; has experience with Sphinx/Solr, but wants something easier.
* [http://onmit.hackdash.org/p/51c60d31fc1f16342100073c CivOmega] - looking for anybody with better ideas for turning sentences into queries than just using regular expressions, e.g. natural language processing.
* [http://onmit.hackdash.org/p/51c629d8fc1f163421003057 Open Gov Data Guide] - if anyone has a particular data set they'd want to share, please add it. If you have good examples of environmental data, talk to Saul.
* [http://onmit.hackdash.org/p/51c37d52f64f402975004b6d NY Drug Price Data] - looking for an expert in data analysis; would like to know the best way to pick intervals for coloring and to normalize the data.
* [http://onmit.hackdash.org/p/51c61ccbfc1f1634210015d3 OpenOpenNewsNews] - would like help from someone interested in data visualization.
* [http://onmit.hackdash.org/p/51c5f133fc1f163421000071 DemocracyMap API] - help with naming the project. Also help with Leaflet.js, PostGIS, and JavaScript for building out the mapping.
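On the question in the list above of how to pick intervals for coloring and normalize data: one common starting point (not necessarily what the NY Drug Price Data team ended up using) is quantile breaks, which put roughly the same number of records in each color class, combined with min-max scaling. The sketch below is a minimal illustration in plain Python; the function names and the sample numbers are made up for the example.

```python
def quantile_breaks(values, n_classes):
    """Return n_classes - 1 cut points that split values into roughly equal-count bins."""
    ordered = sorted(values)
    breaks = []
    for i in range(1, n_classes):
        # Fractional rank of the i-th cut point, linearly interpolated
        # between the two nearest sorted values.
        pos = i * (len(ordered) - 1) / n_classes
        lo = int(pos)
        hi = min(lo + 1, len(ordered) - 1)
        breaks.append(ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo))
    return breaks


def classify(value, breaks):
    """Return the index of the color class a value falls into (0 = lowest)."""
    return sum(value > b for b in breaks)


def min_max_normalize(values):
    """Rescale values linearly to the range 0..1 (all zeros if values are constant)."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Illustrative numbers only, not real drug prices.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
breaks = quantile_breaks(sample, 3)
classes = [classify(v, breaks) for v in sample]
```

Quantile breaks work well when the values are skewed, since equal-width intervals would leave most records in a single color. If the values are roughly uniform, equal-width intervals are easier to explain in a legend.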


==== Schedule ====


<b>Sunday 6/23</b>
* 9am Building opens
* 9am Breakfast
* 11am Brunch
* 2:45pm Show and Tell (with pizza)
* 3:45pm Closing Circle


==== Communications ====
Want to get in touch with other hack day participants?
* Join the #opennews channel on irc.mozilla.org
* Tweet questions/ideas with #datahack hashtag. Tag @opennews with any questions about the event
* [mailto:opennews@mozillafoundation.org Email OpenNews] if there are any questions/concerns/ideas that should be emailed to the group on off hours.


=== HackDash: Project Teams and Ideas ===
 
We're going to use [http://onmit.hackdash.org/ HackDash], developed by our friends at Hacks/Hackers Buenos Aires, to help gather teams and ideas this year. [http://bahackaton.herokuapp.com/ Here's an example] from a hackathon in Buenos Aires.
 
'''How to use HackDash'''
* Go to [http://onmit.hackdash.org/ our HackDash page]. There's an [http://onmit.hackdash.org/p/51a8cdcae8b8537b1100006b example project] listed to show the basic project format.
* To create or join a project, log in with Twitter (if you don't have a Twitter account, [mailto:erika@mozillafoundation.org email Erika] for assistance).
* To create a project:
** Click create a project.
** If your project exists on GitHub, you can import some of the fields from your GitHub repo using the GitHub importer.
** Give your project a title and description. Both of these items will be shown on the project card on the HackDash page. You can also include a photo associated with your project.
** If there's a link associated with your project, you can include that as well as any topical tags.
** The final drop list is for "state" of your project, which you can update as the project progresses from brainstorming to wireframing and so on.
** Click create project!
* Once a project exists, anyone can join, like, or follow the project. The Twitter avatars for team members are displayed at the bottom of the project card.
* Each project card also includes a Disqus comment thread where team members can communicate or other people can offer feedback on the idea.
 
It should be that simple. We'll have an easy way to collaborate and see all of the projects from the hack day. Please go ahead and start adding project ideas. If a project looks interesting to you, join the team.
 
[mailto:erika@mozillafoundation.org Let Erika know] if you have any questions or run into issues with the setup. This tool is in active development by [https://github.com/danzajdband/hackdash Dan Zajdband], so we can get help with any questions and any feedback is much appreciated.


=== Data "White Whales" ===
* [https://dl.dropboxusercontent.com/u/6682410/FY%202013%20Schedule%20C%20-%20Merge%20Final1.pdf 2013 New York City Council budget document (warning large PDF download)]
* [http://www.nyc.gov/html/nypd/html/traffic_reports/motor_vehicle_accident_data.shtml NYPD Motor Vehicle Accident Data]
'''From Phil Ashlock, Civic Agency:'''
* [http://onmit.hackdash.org/p/51c5f133fc1f163421000071 City officials contact info] (per state)
This data fuels the [http://api.democracymap.org DemocracyMap API] and there's information about contributing additional scrapers at http://api.democracymap.org/#get-involved


'''From Daniel X O'Neil, Smart Chicago Collaborative/Everyblock:'''  
Daniel, check out: http://stopfrisknyc.github.io/.
* [http://www.nyc.gov/html/nypd/html/analysis_and_planning/stop_question_and_frisk_report.shtml NYPD Stop, Question and Frisk Report Database]
The data is amazingly detailed ([http://www.jjay.cuny.edu/web_images/PRIMER_electronic_version.pdf here's a great primer]), and lends itself to great visualizations ([http://www.nytimes.com/interactive/2010/07/11/nyregion/20100711-stop-and-frisk.html?ref=stopandfrisk here's one re: 2009 data]). The data itself is published in a format that is highly inaccessible to regular people (notwithstanding the fact that it is extremely well structured as an SPSS portable file). Publishing this info as an easy-to-search, RSS-ready list of items would be high value.
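As a rough idea of what an "RSS-ready list of items" could look like once the records have been exported from SPSS, here is a minimal sketch using only the Python standard library. The record fields (<code>id</code>, <code>title</code>, <code>description</code>), the feed link, and the placeholder records are all assumptions made for illustration; they are not the actual column names in the NYPD file.

```python
import xml.etree.ElementTree as ET


def records_to_rss(title, link, description, records):
    """Build an RSS 2.0 feed string from a list of dicts (hypothetical fields)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link and description on the channel.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for rec in records:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = rec["title"]
        ET.SubElement(item, "description").text = rec["description"]
        # The guid is an opaque record id here, not a URL.
        ET.SubElement(item, "guid", isPermaLink="false").text = rec["id"]
    return ET.tostring(rss, encoding="unicode")


# Placeholder records, not real data.
placeholder_records = [
    {"id": "record-1", "title": "Example record 1", "description": "Placeholder text"},
    {"id": "record-2", "title": "Example record 2", "description": "Placeholder text"},
]
feed_xml = records_to_rss(
    "Example feed",
    "http://example.com/feed",
    "Illustrative feed built from placeholder records",
    placeholder_records,
)
```

The same list of dicts could also be loaded into a search index, which would cover the "easy-to-search" half of the idea.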


Data for the Boston metro area and dozens of other cities are available for download at http://metro.teczno.com/
Other OSM JavaScript queries can be made using the Overpass library: http://wiki.openstreetmap.org/wiki/Overpass_API
==== NearbyFYI ====
* [http://said.nearbyfyi.com/docs/V1/ NearbyFYI - local government documents]
* [http://www.nearbyfyi.com Search interface for the documents we are collecting]
What would you do with 100,000+ documents and extracted text from 170 city and town municipalities in Vermont? We collect city and town documents from select board meeting minutes, planning and zoning committees, and other local government legislation. These are often published as PDFs and difficult-to-scrape HTML. We classify the documents, extract entities [People, Companies, Locations] and terms, and make them searchable. This is a corpus of partially structured raw text from hundreds of cities and towns.
==== Open Data Tech Review by ODI ====
From Marcio Vasconcelos: a [https://github.com/theodi/open-data-tech-review/wiki wiki] compiled by the Open Data Institute that reviews several tools.
==== DemocracyMap API ====
The [http://api.democracymap.org/ DemocracyMap API] aims to provide normalized structured data for all the contact details and other primary information for every government body and government official that represents you (limited to the US for now). Simply enter an address or lat/long and get back the full stack of government bodies and elected officials. Much of this API relies on third parties, so it essentially aggregates, normalizes, and caches a variety of data sources, including geospatial boundary queries and scrapers on ScraperWiki. It does not yet sync all data in a central datastore, so performance is not nearly as efficient as it could be, because much of the aggregation happens on the fly.
The current coverage includes primary contact information for every city, county, and state in the United States as well as contact information for all state and national legislators, all governors, all county officials, and over 100,000 municipal officials.
The API docs can be found at http://api.democracymap.org/ and a basic demo can be seen at http://api.democracymap.org/demo
==== Semantria ====
Thanks to Waldo, the folks at [https://semantria.com/ Semantria] have offered participants access to their Text Analytics and Sentiment Analysis APIs. Here's what they have to say:
We do have great documentation on the support section of our website. Here's a link to all of our support pages: [http://support.semantria.com http://support.semantria.com]
There are so many articles I feel like it can be a bit overwhelming. Here is one that is strictly on sentiment analysis: [http://support.semantria.com/customer/portal/articles/834168-about-semantria-s-sentiment-analysis http://support.semantria.com/customer/portal/articles/834168-about-semantria-s-sentiment-analysis]
Here is a link to our video page: [https://semantria.com/excel/tutorial https://semantria.com/excel/tutorial]
Once again, there are quite a few videos, so here are the two I recommend people to check out:
Building Categories for Survey Analysis (it's nice because it's a use case): https://www.youtube.com/watch?v=_pYsJdOqKE4&feature=player_embedded
Sentiment Analysis: [https://www.youtube.com/watch?v=Ypdf4QbokXo&feature=player_embedded https://www.youtube.com/watch?v=Ypdf4QbokXo&feature=player_embedded]
Here is a link that participants can register with: [https://semantria.com/user/login_register https://semantria.com/user/login_register]
All of our accounts come with 10k documents for free. Please let them know that if they need more than 10k, they simply have to contact someone from Semantria and we can load their account with as many calls as they need. Please let me know if there's anything else I can do to help out from my end before the event starts.
==== Google Civic Information API ====
The Google Civic Information API allows developers to build applications that display civic information including polling place, early vote location, candidate data, and election official information to users. The initial version of the API is geared towards election-related information for the United States.  We will have data for the upcoming New York City Mayoral election.
https://developers.google.com/civic-information/
==== Archive.org TV News Closed Caption Data ====
You can use this script to query Archive.org's TV News archive (http://archive.org/details/tv) and return a JSON dump. Background and example: http://www.niemanlab.org/2013/03/tracking-memes-across-television-news-a-tool-for-analyzing-how-stories-move-through-broadcast/
https://github.com/mstem/archive.org-getter
==== LazyTruth Misinformation Database ====
Get in touch with @mstem if you want to talk about creative uses for a credibility API, consisting of rumors and myths and their associated debunks.


=== Communications ===