Chapter 7 Conclusion
Many international workers dream of extending their time in the U.S., or even staying permanently, for diverse reasons. Whether for more and better opportunities, an improved quality of life, or the chance to learn from the best in their field, this dream is prevalent among internationals. This desire motivated us to research how we, as international students, could increase our chances of staying in the United States through our work in Data Science, which led us to analyze the world of H-1B visas.
In this project, we found insights that shed light on the types of companies that sponsor visas and the employees who receive them. As future work, it would be valuable to include demographic data on the applicants to check whether employers show biases in whom they sponsor. Doing so could help internationals decide which companies to work for and which to avoid. A natural next step would also be to investigate whether data on Green Card approvals is available, since many of them are for H-1B visa holders. Integrating such data could help us understand the chances of getting a Green Card and how to maximize them.
We learned valuable lessons in this project. Real-world data is dirty, scattered across many different sources, and not standardized. Cleaning and organizing the data took longer than expected and cut into other tasks we wanted to prioritize, which revealed flaws in our project management skills.
Furthermore, we learned that expectations and goals change over the course of a project. In the beginning, our main goal was to understand what drives H-1B visa approvals (or to see whether it is a random process), because our dataset has a field called “case status” that indicates whether the visa is approved or denied by the Department of Labor (DOL). What we didn’t know at the time was that the DOL doesn’t make the final call: that falls to U.S. Citizenship and Immigration Services (USCIS), which issues most of the visa rejections. Therefore, we couldn’t use that information and had to switch objectives in the middle of the project. This example shows how domain knowledge is paramount when analyzing data; it’s not just crunching numbers and plotting graphs.