The Definitive Guide to Do Data Science for Good

by Tobias Pfaff | 8 min read

Are you a fully equipped (or aspiring) data scientist who wants to use your skills to solve problems that really matter to the world? Welcome to the club. The good news is that there are many ways for data scientists to do good. However, the path is not always well trodden, and you may need to show some initiative. This article gives you some insight into how you can get involved, whether through group meetings and events, as a volunteer, or in paid positions.


#openimpact – The Details

Why this #openimpact marathon?

These projects just deserve it. They are innovative and they make sense. They are beacons for anyone who cares about using data for the greater good.

Why ten weeks?

We have a deadline for presenting the results of the marathon to the jury of a TEDx competition. We want to win a TEDx talk and tell a story: if we all work together to reach X replications of Y projects in ten weeks, imagine how many projects, and how much of their impact, can be replicated in a year! DataLook won’t stop after the ten weeks. The first replications will just seed the process. We will refine the framework and use it to present projects on our site. Ultimately, more projects will be replicated more easily and thus more often, furthering impact.

What is it?

DataLook is a directory of reusable data-driven projects for social good (see who we are here). Each project starts with a suboptimal condition in a community and the desire to change it through an innovative use of data. We are a non-profit project on a mission to encourage and simplify the replication of such projects. Over the past year we have collected 250+ projects on our site. Now it’s time to select the ones that stand out. We have curated a shortlist of the most impressive reusable projects. Their impact can and should be multiplied. Let’s work together and bring these projects to more communities. Let’s do the #openimpact replication marathon.

Is replication feasible?

Oh yeah! Ten weeks sounds short, but some of the projects can be replicated within a day if all the ingredients are present. Other projects require more time, but the replication process can be started immediately. To make replication easier in the future, DataLook will build a replication framework during the #openimpact marathon. Think of it as a recipe model that structures information and enables you to cook a delicious dish. Read more about the first version of the replication framework here. We want to learn from your feedback and gradually improve the framework.

Replication depends on openness: our site is based on the open source app Telescope, our customization is open source, and our data is openly accessible. The European Commission promotes the open idea, and we are supported until October by SpeedUP Europe, one of the FIWARE accelerators.

What next?

Civic hackers / data scientists: Choose a project! Join us on Slack, find a team and start working!
Organizations: Activate your community! Use your channels to spread the word about the #openimpact marathon! Organize a hackathon! Sponsor a prize for replications!

You like the idea and want to participate? Great! Get in touch with us:

Slack | Twitter | Facebook


#openimpact — Foodborne



(this is one of the projects of the #openimpact shortlist)

Foodborne Chicago accesses the Twitter API to scan for instances of the phrase “food poisoning” tweeted within the geographic bounds of Chicago. Tweets caught by a classification algorithm are manually sorted for legitimacy and relevance, and any users identified as possible victims of food poisoning are tweeted a message inviting them to visit the Foodborne Chicago website, where they can report their illness to the Chicago Department of Public Health (CDPH) via the city’s Open311 system. The health department then handles these cases the same way as those received through all other channels.
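The first step described above, keeping only tweets that mention the phrase and were sent from inside the city, can be sketched roughly as follows. This is a minimal illustration, not the project's code: the `Tweet` class, the function names and the bounding-box coordinates are assumptions made for the example, and no real Twitter API call is shown. The classification and manual review steps are left out.

```python
from dataclasses import dataclass

# Approximate bounding box for Chicago. These coordinates are an assumption
# for illustration, not the bounds the project actually uses.
CHICAGO_BOUNDS = {"south": 41.64, "north": 42.03, "west": -87.95, "east": -87.52}
PHRASE = "food poisoning"


@dataclass
class Tweet:
    """Hypothetical, simplified stand-in for a geotagged tweet."""
    user: str
    text: str
    lat: float
    lon: float


def in_bounds(tweet, bounds=CHICAGO_BOUNDS):
    """True if the tweet's coordinates fall inside the bounding box."""
    return (bounds["south"] <= tweet.lat <= bounds["north"]
            and bounds["west"] <= tweet.lon <= bounds["east"])


def candidate_tweets(tweets):
    """Keep tweets that mention the phrase and were sent from inside the bounds."""
    return [t for t in tweets if PHRASE in t.text.lower() and in_bounds(t)]


sample = [
    Tweet("a", "Got food poisoning after lunch", 41.88, -87.63),  # match, in Chicago
    Tweet("b", "Food Poisoning again...", 40.71, -74.00),         # match, outside bounds
    Tweet("c", "Great pizza tonight", 41.90, -87.65),             # no match
]
print([t.user for t in candidate_tweets(sample)])
```

In practice the candidates returned here would still go through the classifier and the manual review before anyone is contacted.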

#openimpact — FixMyStreet Platform


(this is one of the projects of the #openimpact shortlist)

FixMyStreet Platform is an open source project to help people run websites for reporting common street problems, such as potholes and broken street lights, to the appropriate authority. Users locate a problem by entering an address and placing a pin on a map, without having to know which authority is responsible for it. FixMyStreet then works out the correct authority from the problem's location and type and sends a report, by email or through a web service such as Open311. Reported problems are visible to everyone, so people can see whether something has already been reported and leave updates. Users can also subscribe to email or RSS alerts about problems in their area.
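The routing idea, picking the responsible authority from a report's location and type, might look roughly like the sketch below. It is only an illustration under simplifying assumptions: service areas are plain bounding boxes, and the `Authority` class, the authority names and the categories are all hypothetical. It does not reflect how the FixMyStreet code base itself is organised.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Authority:
    """Hypothetical authority with a rectangular service area."""
    name: str
    bounds: tuple          # (south, west, north, east)
    categories: frozenset  # problem types this authority handles


def find_authority(authorities, lat, lon, category):
    """Return the first authority covering the point and handling the category."""
    for authority in authorities:
        south, west, north, east = authority.bounds
        covers = south <= lat <= north and west <= lon <= east
        if covers and category in authority.categories:
            return authority
    return None  # no match: the report needs manual routing


# Two made-up authorities sharing the same area but handling different problems.
AREA = (51.0, -1.0, 52.0, 0.0)
authorities = [
    Authority("Example City Council", AREA, frozenset({"pothole"})),
    Authority("Example County Council", AREA, frozenset({"street light"})),
]

match = find_authority(authorities, 51.5, -0.5, "street light")
print(match.name if match else "unrouted")
```

Returning `None` for unmatched reports keeps the failure case explicit, so a site operator can decide what to do with reports that fall outside every known area.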

#openimpact — Food Inspection Forecasting

(this is one of the projects of the #openimpact shortlist)

There are over 15,000 food establishments across the City of Chicago that are subject to sanitation inspections by the Department of Public Health. Three dozen inspectors are responsible for checking these establishments, which means one inspector is responsible for nearly 470 food establishments. The Department of Public Health has systematically collected the results of nearly 100,000 sanitation inspections; meanwhile, other city departments have collected data on 311 complaints, business characteristics, and other information. With this information, the city’s advanced analytics team and Department of Public Health teamed up to forecast food establishments that are most likely to have critical violations so that they may be inspected first. The result is that food establishments with critical violations are more likely to be discovered earlier by the Department of Public Health’s inspectors.
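The prioritization idea can be sketched as scoring each establishment and visiting the highest-scoring ones first. The example below is a toy illustration only: the feature names, weights and establishments are made up, and the simple logistic score stands in for whatever model the city actually fitted.

```python
import math

# Hypothetical weights for illustration only; not the city's fitted model.
WEIGHTS = {"past_critical_violations": 0.8, "recent_311_complaints": 0.5}
BIAS = -2.0


def risk_score(features):
    """Logistic score in (0, 1); higher means a critical violation is more likely."""
    z = BIAS + sum(w * features.get(name, 0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def inspection_order(establishments):
    """Return establishment names sorted from highest to lowest risk score."""
    return sorted(establishments,
                  key=lambda name: risk_score(establishments[name]),
                  reverse=True)


establishments = {
    "Cafe A": {"past_critical_violations": 0, "recent_311_complaints": 0},
    "Diner B": {"past_critical_violations": 3, "recent_311_complaints": 2},
    "Deli C": {"past_critical_violations": 1, "recent_311_complaints": 0},
}
print(inspection_order(establishments))  # ['Diner B', 'Deli C', 'Cafe A']
```

The same inspectors visit the same establishments either way; only the order changes, which is why violations tend to be found earlier.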

Replication Framework (V1.0)




The mission of DataLook is to encourage and simplify the replication of data-driven projects for social good. How can we live up to our mission? One thing we have in mind is to identify and structure important project attributes. You can call the result of this exercise a “framework”, “meta model”, “blueprint” or “template”. For now we are going with “replication framework”.


#openimpact — Link-SF



(this is one of the projects of the #openimpact shortlist)

Link-SF is San Francisco’s first mobile-optimized website that connects homeless and low-income residents with critical and life-saving resources nearby. Focusing on basic services such as food, shelter, medical care, hygiene services, and technology access, Link-SF uses cutting-edge technology to stream the most up-to-date information to the people who need it most. Link-SF was designed with three user groups in mind: 1) a growing population of low-income San Franciscans who rely on mobile technology to meet their basic needs, 2) service providers who can use the most current data to direct clients in need, and 3) everyday people who can use this information to help refer San Francisco’s homeless population to a social service agency nearby.